Thursday, June 13, 2019

[Natural Language] Generating Text with an LSTM

Steps:
1. Character-level neural language model:
Take strings of N characters from a corpus (Nietzsche's writings) as input and use an LSTM layer to learn the probability distribution of the (N+1)-th character, building a character-level neural language model.

2. Generate text one character at a time
Feed a seed string of N characters into the model above, predict the most likely (N+1)-th character (sampled from the predicted distribution), append it to the end of the seed string, feed the result back into the model, and so on.

import keras
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))



maxlen = 60
step = 3

sentences = []

next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen]) # input window: characters i .. i+maxlen-1
    next_chars.append(text[i + maxlen]) # target: the character at i+maxlen

print('Number of sequences:', len(sentences))
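To make the sliding-window extraction concrete, here is a standalone toy run on a short made-up string (not the Nietzsche corpus):

```python
text = "hello world"
maxlen, step = 4, 2

sentences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i:i + maxlen])   # input window: characters i .. i+maxlen-1
    next_chars.append(text[i + maxlen])    # target: the character right after the window

print(sentences)   # ['hell', 'llo ', 'o wo', 'worl']
print(next_chars)  # ['o', 'w', 'r', 'd']
```

Each window overlaps the previous one by maxlen - step characters, so the corpus yields roughly len(text) / step training pairs.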

#Build a dictionary that maps each character to an integer index
chars = sorted(list(set(text))) # every character that appears in the corpus
print('Unique characters:', len(chars))
char_indices = dict((char, chars.index(char)) for char in chars)

#One-hot encode the training data
print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool) # np.bool was removed in NumPy 1.24; use the built-in bool
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
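As a quick sanity check of the one-hot layout (a self-contained toy example with made-up windows, independent of the corpus above):

```python
import numpy as np

sentences = ["ab", "bc"]   # two toy windows, maxlen = 2
next_chars = ["c", "a"]
maxlen = 2
chars = sorted(set("".join(sentences)) | set(next_chars))  # ['a', 'b', 'c']
char_indices = {c: i for i, c in enumerate(chars)}

x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

print(x.shape)           # (2, 2, 3): (windows, timesteps, vocabulary)
print(x[0].astype(int))  # first window "ab" -> [[1, 0, 0], [0, 1, 0]]
```

Every timestep of x is a one-hot row over the vocabulary, and each row of y is the one-hot target character, which is exactly the shape the LSTM + softmax model below expects.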



from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax')) # softmax gives a probability distribution over characters


optimizer = keras.optimizers.RMSprop(learning_rate=0.01) # 'lr' was renamed to 'learning_rate' in newer Keras
model.compile(loss='categorical_crossentropy', optimizer=optimizer)



#Reweight the predicted distribution by a temperature, draw one sample from it, and return the index of the sampled character
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1) # one roll of a die whose face probabilities are preds; returns an array of per-character counts
    return np.argmax(probas) # index of the drawn character (the single 1 in the count array)
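To see what the temperature actually does to a distribution, here is a standalone demonstration on a made-up three-character distribution (the reweighting transform is the same one used inside sample() above):

```python
import numpy as np

def reweight(preds, temperature):
    # Same transform as in sample(): log, divide by temperature, re-normalize
    preds = np.exp(np.log(np.asarray(preds, dtype='float64')) / temperature)
    return preds / preds.sum()

p = [0.1, 0.2, 0.7]

low = reweight(p, 0.2)   # low temperature: sharpens toward the argmax
high = reweight(p, 2.0)  # high temperature: flattens toward uniform

print(np.round(low, 3))   # index 2 holds almost all the mass
print(np.round(high, 3))  # much closer to uniform
```

At temperature 1.0 the distribution is unchanged; as the temperature approaches 0, sampling degenerates into argmax, which is why the low-temperature output further below is so repetitive, while high temperatures produce more surprising (and more error-prone) text.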


import random
import sys

for epoch in range(1, 60):
    print('epoch', epoch)
    model.fit(x, y, batch_size=128, epochs=1)
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')
    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)  # print the seed text
        

        #Generate the next 400 characters
        for i in range(400):
            #One-hot encode the current seed string
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.
            #Run the model on the seed; verbose=0 suppresses the progress bar
            preds = model.predict(sampled, verbose=0)[0]
            #Apply temperature to the predicted distribution and sample a character index
            next_index = sample(preds, temperature)
            #Map the index back to its character
            next_char = chars[next_index]
            #Append the new character to the seed and drop the seed's first character
            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char) # print the newly generated character


#First round of training
epoch 1
Epoch 1/1
200278/200278 [==============================] - 472s 2ms/step - loss: 1.9867
#Feed the seed into the trained model and generate text at different temperatures
--- Generating with seed: "ie et sans esprit!
229. in these later ages, which may be"
------ temperature: 0.2
ie et sans esprit!
229. in these later ages, which may be the such the still and the sure and and still the present the sure the man and the presenter the for the still and the sure of the from the still the man be the string the man becount and still the the the the sure the still that the soul and a more of the sure the still the stright the stright and the still the sure the sure and the moral the still the sure the sure and the still the sure and the

------ temperature: 0.5
l the still the sure the sure and the still the sure and the perhaps so this disto the philosopher that the string and will the super; and still the same with the desilse of the shill ana
suld conter the free solition of the than the sure for the sure desilse of presentate of the soulh and the for the histances to litely than this incression, and from the preale of contention, in the precestion, that the present to the discienteness than the is and are thi

------ temperature: 1.0
hat the present to the discienteness than the is and are thing, the musf lattire tercies somolian of remord hister
dolecy, with men ye of suses, is and a
corstare--thas and sole of the consciencly to yees as lose. the denchsion in the fantific fan and stone, and trung with their cincolne, and spoled asted to som suef
agage well in regeraving of real the spirits
mistodicato high returdisming
powhing.--as yre? of sulf the doven weally froe it wat give wo le
