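1. Download TSMC stock prices.
2. Normalize the data.
3. Prepare the training set and the test set:
Training-set features: arrays of opening prices over a 60-day window; consecutive samples are shifted by one day.
Training-set labels: the opening price on roughly the 70th day counted from the start of the window (index i + 10 in the code).
Test set: built from the prices that follow the last training sample.
4. Build an RNN model with the training set.
5. Visualize the training process.
6. Evaluate the model on the test set.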
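The windowing in step 3 can be sketched as a small helper. This is a minimal illustration, not the notebook's own code: the function name `make_windows` and its `window` and `horizon` parameters are assumptions introduced here, chosen to mirror the `i-60:i` and `i+10` indexing used in the notebook.

```python
import numpy as np

def make_windows(series, window=60, horizon=10):
    """Slice a 1-D series into (features, labels) pairs.

    Features for sample i are series[i-window:i]; the label is
    series[i+horizon], mirroring the indexing used in the notebook.
    """
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for i in range(window, len(series) - horizon):
        X.append(series[i - window:i])
        y.append(series[i + horizon])
    # An LSTM expects input shaped (samples, timesteps, features)
    X = np.array(X).reshape(-1, window, 1)
    return X, np.array(y)

# Toy series with a short window so the shapes are easy to check by hand
toy = np.arange(10)
X_toy, y_toy = make_windows(toy, window=3, horizon=2)
print(X_toy.shape)               # (5, 3, 1)
print(X_toy[0, :, 0], y_toy[0])  # [0. 1. 2.] 5.0
```

Note that the label sits `horizon + 1` steps after the last value in the window, because the window ends at index `i - 1`.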
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
from yahoo_historical import Fetcher

# Download prices for TSM, TSMC's ADR (NYSE-listed; a constituent of the PHLX Semiconductor Index)
data = Fetcher("TSM", [2007,1,1], [2019,1,1])
df = pd.DataFrame(data.getHistorical())
# print(data.getHistorical())
df = df.set_index('Date')
df.head()

df['Open'].plot()
# Normalize the data (column 0 is Open once Date is the index)
training_set = df.iloc[:,0:1].values
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0, 1))
# Note: the scaler is fitted on the full series, test period included
training_set_scaled = sc.fit_transform(training_set)

# Prepare the training set: 60-day windows of opening prices, label at index i+10
X_train = []
y_train = []
for i in range(60, 2035):
    X_train.append(training_set_scaled[i-60:i, 0])
    y_train.append(training_set_scaled[i+10, 0])
X_train, y_train = np.array(X_train), np.array(y_train)
# Reshape to (samples, timesteps, features) as expected by the LSTM layers
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))

# Prepare the test set from the rows after the last training sample
X_test = []
y_test = []
for i in range(2035, len(training_set_scaled)-10):
    X_test.append(training_set_scaled[i-60:i, 0])
    y_test.append(training_set_scaled[i+10, 0])
X_test, y_test = np.array(X_test), np.array(y_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

# Import Keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout

# Build the model: four stacked LSTM layers with dropout, then a single-unit output
regressor = Sequential()
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))
regressor.add(Dense(units = 1))
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error', metrics=['mae'])
train_history = regressor.fit(X_train, y_train, validation_split=0.1, epochs = 10, batch_size = 50)

# Visualize the training process
def show_train_history(train_history, train, validation):
    plt.plot(train_history.history[train])
    plt.plot(train_history.history[validation])
    plt.title('Train History')
    plt.ylabel(train)
    plt.xlabel('Epoch')
    plt.legend(['train','validation'], loc='upper left')
    plt.show()

# Plot training loss against validation loss
show_train_history(train_history, 'loss', 'val_loss')
# Evaluate the model on the test set (the MAE is in scaled units, not in price units)
score = regressor.evaluate(X_test, y_test)
print('mae=', score[1])
# Run the prediction
prediction = regressor.predict(X_test)

# Convert the predictions and labels back to the original price scale
prediction_t = sc.inverse_transform(prediction.reshape(-1,1))
y_test_t = sc.inverse_transform(y_test.reshape(-1,1))

# Plot the actual and predicted price series
plt.plot(np.arange(len(y_test)), y_test_t, label='real')
plt.plot(np.arange(len(prediction)), prediction_t, label='prediction')
plt.title('TSMC (TSM ADR) stock price: actual vs. predicted')
plt.legend()
plt.show()
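The predicted trend looks accurate, but each test prediction still takes actual prices from after the training period as input, so it is not a forecast made from training-period data alone. The model also uses only the opening price. In practice, daily trading volume and institutional investors' positions also influence future prices, so later versions of the model will add these as features.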
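One way to judge whether a fit that merely looks accurate is informative is to compare it with a naive baseline that repeats the last price in each input window. The sketch below is an illustrative assumption, not part of the original notebook; `persistence_mae` is a hypothetical helper, and the demo series is synthetic.

```python
import numpy as np

def persistence_mae(series, start, horizon=10):
    """MAE of predicting series[i+horizon] with series[i-1],
    the last value inside the input window ending at i-1."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(start, len(series) - horizon)
    last_seen = series[idx - 1]
    target = series[idx + horizon]
    return float(np.mean(np.abs(target - last_seen)))

# Synthetic linear series, purely illustrative: each step adds 1,
# so the baseline is always off by horizon + 1 = 11
demo = np.arange(100, dtype=float)
print(persistence_mae(demo, start=60))  # 11.0
```

With the notebook's variables, a call such as `persistence_mae(training_set_scaled[:, 0], start=2035)` would give a value in the same scaled units as the MAE printed by `regressor.evaluate`. If the LSTM's MAE is not clearly lower, the model may be doing little more than echoing recent prices.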
Below, we use every column returned by yahoo_historical (Open, High, Low, Close, Adj Close, Volume) as features to build the model.
After 100 training epochs, the training history is as follows:
With six features, the predictions fit the actual prices less closely than with the opening price alone.
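The six-feature setup is described here without its code, so the following is only a sketch of how the windows might be built under the same 60-day window and `i+10` label convention. The helper name `make_multifeature_windows`, the `target_col` parameter, and the random placeholder data are all assumptions for illustration.

```python
import numpy as np

def make_multifeature_windows(data, window=60, horizon=10, target_col=0):
    """Build LSTM inputs from a 2-D array shaped (days, features).

    Each sample holds `window` consecutive rows of all features; the label
    is the target column (assumed to be Open, column 0) at row i + horizon.
    """
    data = np.asarray(data, dtype=float)
    X, y = [], []
    for i in range(window, len(data) - horizon):
        X.append(data[i - window:i, :])
        y.append(data[i + horizon, target_col])
    return np.array(X), np.array(y)

# Random placeholder values standing in for six scaled columns (not real prices)
rng = np.random.default_rng(0)
fake = rng.random((200, 6))
X, y = make_multifeature_windows(fake)
print(X.shape, y.shape)  # (130, 60, 6) (130,)
```

With inputs shaped this way, the first LSTM layer's `input_shape` would be `(60, 6)` rather than `(60, 1)`. Because the label is a single column, mapping predictions back to prices needs a scaler fitted on that column alone, since a scaler fitted on all six columns expects six-column input to `inverse_transform`.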