Training and Test Data
The Fashion-MNIST image set is downloaded from Kaggle; the images are the same size as those in the MNIST handwritten-digit set.
Training set: 60,000 records. Each record is a one-dimensional array of 785 elements: the first element is the label, and the remaining 784 elements are the features.
Test set: 10,000 records.
Organize the data into four arrays: trainX, trainY, testX, testY, where trainX and testX hold the image features. Each image is reshaped from a 1D array of 784 elements into a 2D array of 28×28 elements.
import numpy as np
import pandas as pd
from keras.utils import np_utils
# Organize the data into trainX, trainY, testX, testY
train = pd.read_csv('fashionmnist/fashion-mnist_train.csv')
trainX = train.iloc[:,1:].values.reshape(60000,28,28).astype('float32')
trainY = train['label']
test = pd.read_csv('fashionmnist/fashion-mnist_test.csv')
testX = test.iloc[:,1:].values.reshape(10000,28,28).astype('float32')
testY = test['label']
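As a quick sanity check on the reshape above, here is a sketch using dummy data with the same layout; `reshape(-1, 28, 28)` infers the row count instead of hard-coding 60000:

```python
import numpy as np

# Dummy data shaped like the CSV features: 60000 rows of 784 pixels
flat = np.zeros((60000, 784), dtype='float32')
imgs = flat.reshape(-1, 28, 28)   # -1 lets NumPy infer the sample count
print(imgs.shape)                 # (60000, 28, 28)
```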
Build an image grid that compares labels with predictions
import matplotlib.pyplot as plt
def plot_images_labels_prediction(images, labels, prediction, idx, num=10):
    # images: images; labels: ground-truth answers; prediction: predicted results;
    # idx: index of the first record to display; num: number of records to display
    fig = plt.gcf()
    fig.set_size_inches(12, 14)
    if num > 25: num = 25
    for i in range(0, num):
        ax = plt.subplot(5, 5, 1 + i)
        ax.imshow(images[idx])
        title = 'label=' + str(label_class[labels[idx]])
        if len(prediction) > 0:
            title += ",predict=" + str(label_class[prediction[idx]])
        ax.set_title(title, fontsize=8)
        ax.set_xticks([]); ax.set_yticks([])
        idx += 1
    plt.show()
Build a dictionary mapping label codes to class names
label_class = dict({0: 'T-shirt/top',
                    1: 'Trouser',
                    2: 'Pullover',
                    3: 'Dress',
                    4: 'Coat',
                    5: 'Sandal',
                    6: 'Shirt',
                    7: 'Sneaker',
                    8: 'Bag',
                    9: 'Ankle boot'})
plot_images_labels_prediction(trainX,trainY,prediction=[],idx=20,num=10)
# prediction=[]: no predictions yet; idx: first record to display; num: number of records to display
# Normalize the pixel values into the range 0-1
trainX_normalize = trainX / 255
testX_normalize = testX / 255
# Add a new trailing axis to the image data; the last axis will hold the outputs of the CNN's filters
# axis 0: samples; axes 1, 2: image; axis 3: channel (result of convolving with each filter)
trainX_normalize=trainX_normalize[:,:,:,np.newaxis]
testX_normalize=testX_normalize[:,:,:,np.newaxis]
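A minimal sketch of what `np.newaxis` does to the array shape, using dummy images:

```python
import numpy as np

imgs = np.zeros((5, 28, 28), dtype='float32')   # 5 dummy 28x28 images
imgs4d = imgs[:, :, :, np.newaxis]              # append a channel axis at the end
print(imgs.shape, '->', imgs4d.shape)           # (5, 28, 28) -> (5, 28, 28, 1)
```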
# Label preprocessing: one-hot encode the digits 0-9 into vectors of ten 0/1 values, one per output neuron
trainY_OneHot=np_utils.to_categorical(trainY)
testY_OneHot=np_utils.to_categorical(testY)
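To illustrate, one-hot encoding can be reproduced in plain NumPy; this sketch is equivalent to `np_utils.to_categorical` for integer labels:

```python
import numpy as np

labels = np.array([0, 3, 9])     # hypothetical label values
one_hot = np.eye(10)[labels]     # row i of the 10x10 identity matrix encodes class i
print(one_hot[1])                # label 3 -> [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```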
Convolutional Neural Networks (CNN)
Yann LeCun proposed the LeNet-5 model in 1998, which performs image recognition with four 2D convolution/pooling layers and three dense layers. CNNs were later generalized to 1D; a 1D CNN can be used for character-level recognition of text sequences.
LeNet Architecture, 1998
2D CNN
Shallow 2D convolutions transform an image into local features such as edges; deeper 2D convolutions capture global features such as texture and style.
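To illustrate the "local edges" idea, here is a hand-rolled 2D convolution with a vertical-edge kernel; this is a NumPy-only sketch, and both the toy image and the Sobel-style kernel are made up for the example:

```python
import numpy as np

def conv2d(img, kernel):
    # Valid (no-padding) 2D convolution of a single-channel image
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

# A 6x6 toy image: left half dark (0), right half bright (1)
img = np.zeros((6, 6)); img[:, 3:] = 1.0
# Sobel-style vertical-edge kernel
kernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
edges = conv2d(img, kernel)
print(edges)   # non-zero only at window positions straddling the brightness edge
```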
1D CNN
Compared with recurrent neural networks (RNN), an advantage of 1D CNNs for text recognition is that even words absent from the training data, or text with spelling errors and little ordering, can still be mapped into the right region of the output vector space. This makes 1D CNNs well suited to loosely ordered text, e.g. social-network posts (Facebook, Twitter, Weibo) and training on slang and newly coined words.
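The core operation, sliding a filter over a character sequence, can be sketched in plain NumPy; the embedding dimension, filter width, and input values below are illustrative only:

```python
import numpy as np

# A sequence of 8 characters, each embedded as a 4-dim vector (made-up values)
seq = np.random.RandomState(0).rand(8, 4)
kernel = np.random.RandomState(1).rand(3, 4)   # one filter spanning 3 characters

# 1D convolution: dot the filter with every window of 3 consecutive characters
out = np.array([np.sum(seq[i:i+3] * kernel) for i in range(8 - 3 + 1)])
print(out.shape)   # (6,) -- one activation per window position
```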
In this example we use a LeNet-5-like model consisting of two CNN layers and two dense layers. The input to the first CNN layer is a three-dimensional array per sample: axes 0 and 1 hold the image, and axis 2 holds the channel (convolution output) dimension. The CNN layers use the 'relu' activation, and the final dense layer uses 'softmax'.
from keras.models import Sequential
from keras.layers import Dense, Conv2D,MaxPooling2D,Dropout,Flatten
#建立Sequential模型
model=Sequential()
model.add(Conv2D(filters=32,kernel_size=(3,3),padding='same',input_shape=(28,28,1),activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64,kernel_size=(3,3),padding='same',activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10,activation='softmax'))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 28, 28, 32)        320
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 14, 14, 64)        18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 7, 7, 64)          0
_________________________________________________________________
dropout_1 (Dropout)          (None, 7, 7, 64)          0
_________________________________________________________________
flatten_1 (Flatten)          (None, 3136)              0
_________________________________________________________________
dense_1 (Dense)              (None, 128)               401536
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1290
=================================================================
Total params: 421,642
Trainable params: 421,642
Non-trainable params: 0
_________________________________________________________________
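The parameter counts in the summary can be verified by hand: a Conv2D layer has (kernel_h × kernel_w × input_channels + 1 bias) × filters parameters, and a Dense layer has (inputs + 1 bias) × units:

```python
conv1 = (3 * 3 * 1 + 1) * 32      # first Conv2D: 1 input channel, 32 filters
conv2 = (3 * 3 * 32 + 1) * 64     # second Conv2D: 32 input channels, 64 filters
dense1 = (7 * 7 * 64 + 1) * 128   # Dense after Flatten: 3136 inputs
dense2 = (128 + 1) * 10           # output layer
print(conv1, conv2, dense1, dense2, conv1 + conv2 + dense1 + dense2)
# 320 18496 401536 1290 421642
```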
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
# Train the model
train_history=model.fit(x=trainX_normalize,y=trainY_OneHot, validation_split=0.2, epochs=10, batch_size=256, verbose=2)
Train on 48000 samples, validate on 12000 samples
Epoch 1/10 - 1200s - loss: 0.6915 - acc: 0.7520 - val_loss: 0.4127 - val_acc: 0.8548
Epoch 2/10 - 1133s - loss: 0.4363 - acc: 0.8429 - val_loss: 0.3371 - val_acc: 0.8826
Epoch 3/10 - 1245s - loss: 0.3793 - acc: 0.8625 - val_loss: 0.3132 - val_acc: 0.8876
Epoch 4/10 - 1973s - loss: 0.3463 - acc: 0.8747 - val_loss: 0.2943 - val_acc: 0.8958
Epoch 5/10 - 1317s - loss: 0.3255 - acc: 0.8831 - val_loss: 0.2772 - val_acc: 0.8991
Epoch 6/10 - 1346s - loss: 0.3029 - acc: 0.8910 - val_loss: 0.2649 - val_acc: 0.9057
Epoch 7/10 - 1395s - loss: 0.2887 - acc: 0.8953 - val_loss: 0.2560 - val_acc: 0.9111
Epoch 8/10 - 1295s - loss: 0.2774 - acc: 0.8991 - val_loss: 0.2488 - val_acc: 0.9106
Epoch 9/10 - 1326s - loss: 0.2647 - acc: 0.9039 - val_loss: 0.2443 - val_acc: 0.9125
Epoch 10/10 - 1305s - loss: 0.2552 - acc: 0.9062 - val_loss: 0.2395 - val_acc: 0.9123
# Visualize the training process
import matplotlib.pyplot as plt
def show_train_history(train_history, train, validation):
    plt.plot(train_history.history[train])
    plt.plot(train_history.history[validation])
    plt.title('Train History')
    plt.ylabel(train)
    plt.xlabel('Epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.show()
show_train_history(train_history, 'acc', 'val_acc')
# If training accuracy keeps rising while validation accuracy stalls, the model may be overfitting
# Plot the loss curves
show_train_history(train_history, 'loss', 'val_loss')
# Evaluate the model's accuracy on the test set
score = model.evaluate(testX_normalize, testY_OneHot)
print('accuracy=', score[1])
# Make predictions
prediction = model.predict_classes(testX_normalize)
10000/10000 [==============================] - 110s 11ms/step accuracy= 0.9154
After 10 epochs the model's accuracy is roughly 90%. Interestingly, the validation accuracy is higher than the training accuracy. A likely reason: the training accuracy reported for an epoch is the average over that epoch's 235 batches (256 records per batch). The weights are updated once per batch (one forward and one backward propagation), so within one epoch the weights are updated 235 times, yielding 235 batch accuracies, e.g. [0.2, 0.5, 0.6, ..., 0.67] with an average of 0.5. The validation accuracy, however, is computed at the end of the epoch with weights that have already been updated 235 times, so it comes out higher than the averaged training accuracy. This effect appears whenever an epoch contains a large number of batches, e.g. with very large datasets. We can expect the training and validation accuracies to converge after more epochs.
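A small numerical sketch of the averaging effect described above; the per-batch accuracies are made-up values that rise as the weights improve over the epoch:

```python
import numpy as np

# Hypothetical per-batch training accuracies across one epoch of 235 batches
batch_acc = np.linspace(0.70, 0.90, 235)

epoch_train_acc = batch_acc.mean()   # what the log reports as the epoch's training accuracy
end_of_epoch_acc = batch_acc[-1]     # accuracy with the fully updated weights (closer to validation)
print(round(epoch_train_acc, 3), round(end_of_epoch_acc, 3))   # 0.8 0.9
```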
# Display 30 prediction results (records 20 through 49)
plot_images_labels_prediction(testX,testY,prediction,20,30)
# Build a confusion matrix with pandas crosstab
import pandas as pd
pd.crosstab(testY, prediction, rownames=['label'], colnames=['prediction'])
| label \ prediction | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 934 | 0 | 14 | 12 | 0 | 1 | 33 | 1 | 5 | 0 |
| 1 | 2 | 986 | 1 | 9 | 0 | 1 | 0 | 0 | 1 | 0 |
| 2 | 19 | 0 | 877 | 13 | 53 | 0 | 38 | 0 | 0 | 0 |
| 3 | 22 | 6 | 3 | 939 | 21 | 0 | 9 | 0 | 0 | 0 |
| 4 | 0 | 1 | 55 | 21 | 881 | 0 | 41 | 0 | 1 | 0 |
| 5 | 0 | 1 | 0 | 0 | 0 | 980 | 0 | 12 | 1 | 6 |
| 6 | 209 | 3 | 62 | 21 | 58 | 0 | 640 | 0 | 7 | 0 |
| 7 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 968 | 0 | 27 |
| 8 | 6 | 1 | 2 | 1 | 2 | 0 | 3 | 3 | 982 | 0 |
| 9 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 32 | 0 | 967 |
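The diagonal of the confusion matrix holds the correctly classified counts, so per-class accuracy can be read off directly. A sketch using the rows above for class 6 (Shirt, the weakest class) and class 1 (Trouser, one of the strongest):

```python
import numpy as np

# Rows of the confusion matrix above: true label -> counts per predicted class
shirt   = np.array([209, 3, 62, 21, 58, 0, 640, 0, 7, 0])   # label 6
trouser = np.array([2, 986, 1, 9, 0, 1, 0, 0, 1, 0])        # label 1

print(shirt[6] / shirt.sum())      # 0.64  -- Shirt is often confused with T-shirt/top
print(trouser[1] / trouser.sum())  # 0.986
```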
# Filter out the misclassified images
df_prediction=pd.Series(prediction, name='prediction')
concated=pd.concat([df_prediction,testY],axis=1)
concated[concated['prediction']!=concated['label']].head()

| index | prediction | label |
|---|---|---|
| 5 | 6 | 2 |
| 11 | 2 | 4 |
| 40 | 6 | 2 |
| 43 | 4 | 6 |
| 51 | 4 | 2 |
plt.imshow(testX[5])
print('predict=', label_class[concated.loc[5, 'prediction']], 'label=', label_class[concated.loc[5, 'label']])
predict= Shirt label= Pullover