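In this post we use a web crawler to download daily stock prices from the Taiwan Stock Exchange (TWSE) website and save them to the local machine.
First, import the required Python modules: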
import datetime
import requests
from io import StringIO
import pandas as pd
import numpy as np
import time
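Set the start and end dates for the crawl, written in the '%d-%m-%Y' format. The generated list stops one day before the end date: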
start = datetime.datetime.strptime("01-01-2018", "%d-%m-%Y")
end = datetime.datetime.strptime("03-06-2019", "%d-%m-%Y")
date_generated = [start + datetime.timedelta(days=x) for x in range(0, (end-start).days)]
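To make the date list easier to check, here is a minimal sketch of the same list comprehension applied to a one-week range. The names `daily_dates`, `sample_dates`, `sample_strings`, and `weekdays` are made up for this illustration and are not part of the original script. The weekday filter is an optional extra, on the assumption that you want to skip requests for Saturdays and Sundays.

```python
import datetime


def daily_dates(start, end):
    """Return one datetime per day from start up to, but not including, end."""
    return [start + datetime.timedelta(days=x) for x in range((end - start).days)]


# A one-week range, using the same '%d-%m-%Y' format as the script
sample_start = datetime.datetime.strptime("01-01-2018", "%d-%m-%Y")
sample_end = datetime.datetime.strptime("08-01-2018", "%d-%m-%Y")

sample_dates = daily_dates(sample_start, sample_end)

# The crawler formats each date as YYYYMMDD for the request URL
sample_strings = [d.strftime("%Y%m%d") for d in sample_dates]

# Optional (not in the original script): keep Monday to Friday only
weekdays = [d for d in sample_dates if d.weekday() < 5]

print(sample_strings[0], sample_strings[-1], len(weekdays))
```

Because `range` stops before `(end - start).days`, the end date itself is left out. Add one day to `end` if that date should be downloaded as well.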
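Use a for loop to fetch each day's stock data and save it to disk, pausing five seconds before the next request so the remote server does not block us: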
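for i in date_generated:
    datestr = i.strftime("%Y%m%d")
    print(datestr)
    # Request the daily report for this date as CSV
    r = requests.post('http://www.twse.com.tw/exchangeReport/MI_INDEX?response=csv&date=' + datestr + '&type=ALL')
    if len(r.text) > 0:
        # Keep only lines that split into 17 parts on '",' and do not start with '=', and strip spaces
        df = pd.read_csv(StringIO("\n".join([line.translate({ord(c): None for c in ' '})
                                             for line in r.text.split('\n')
                                             if len(line.split('",')) == 17 and line[0] != '='])), header=0)
        # The 'stock' directory must already exist
        df.to_csv('stock/' + datestr)
    # Wait five seconds before the next request
    time.sleep(5)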
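The line filter inside the loop is the densest part of the script, so here is a minimal sketch that applies the same filter to a small block of text without touching the network. The sample text is entirely made up and only mimics the layout the filter expects (16 quoted fields followed by a trailing comma). `extract_price_table` and `make_row` are hypothetical helpers for this illustration. Returning `None` when no line matches is an added assumption, meant for days on which the response contains no price table.

```python
from io import StringIO

import pandas as pd


def extract_price_table(text):
    """Apply the same line filter as the loop above; return a DataFrame or None."""
    rows = [
        line.replace(" ", "")             # drop spaces, like the translate() call
        for line in text.split("\n")
        if len(line.split('",')) == 17    # 16 quoted fields plus a trailing comma
        and line[0] != "="                # skip rows that start with '='
    ]
    if not rows:
        # Nothing matched, so there is no table to parse
        return None
    return pd.read_csv(StringIO("\n".join(rows)), header=0)


def make_row(values):
    """Hypothetical helper: quote each value and add a trailing comma."""
    return ",".join('"%s"' % v for v in values) + ","


# Made-up sample text, not real exchange data
sample_text = "\n".join([
    '"made-up report title"',
    make_row(["code", "name", "volume"] + ["c%d" % k for k in range(13)]),
    make_row(["2330", "AAA", "1 000"] + ["1"] * 13),
    "=" + make_row(["0050", "BBB", "2 000"] + ["1"] * 13),
    make_row(["2317", "CCC", "3 000"] + ["1"] * 13),
    "",
])

sample_df = extract_price_table(sample_text)
print(sample_df.shape)
```

The title line and the empty line are dropped because they do not split into 17 parts, and the row starting with `=` is dropped by the second condition. The trailing comma on each line produces one extra, empty column in the resulting DataFrame.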