The goal of this post is:
to fetch basic stock information through Tushare and then process the fetched data further.
What is Tushare
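Tushare is a free, open-source Python package that provides an interface to financial data. It covers the pipeline from data collection through cleaning and processing to data storage for stocks and other financial data. It gives financial analysts fast, tidy, and varied data that is easy to analyze, which greatly reduces the work of acquiring data and lets them focus on researching and implementing strategies and models. Given the strengths the Python pandas package has shown in quantitative finance, most of the data Tushare returns is in pandas DataFrame format, which is very convenient for analysis and visualization with pandas/NumPy/Matplotlib. Of course, if you are used to analyzing data in Excel or a relational database, you can also use Tushare's data storage features to save all the data locally and analyze it afterwards. At the request of some users, starting with version 0.2.5 Tushare is compatible with both Python 2.x and Python 3.x; part of the code was refactored and some algorithms were optimized to keep data retrieval efficient and stable.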
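To make the "DataFrame in, local storage out" idea concrete, here is a minimal sketch that uses only pandas and the standard-library sqlite3 module, not any Tushare storage function. The helper name `store_and_reload`, the table name, and the two sample rows are made up for illustration; the rows are not real market data.

```python
import sqlite3

import pandas as pd


def store_and_reload(frame, table, conn):
    """Write `frame` to `table` in SQLite and read it back sorted by date.

    `table` is interpolated into the SQL text, so it must be a trusted name.
    """
    frame.to_sql(table, conn, index=False, if_exists="replace")
    return pd.read_sql_query(f'SELECT * FROM "{table}" ORDER BY date', conn)


if __name__ == "__main__":
    # Made-up values for illustration only; not real quotes.
    quotes = pd.DataFrame(
        {"date": ["2020-01-03", "2020-01-02"], "close": [2.0, 1.0]}
    )
    conn = sqlite3.connect(":memory:")
    try:
        print(store_and_reload(quotes, "demo_quotes", conn))
    finally:
        conn.close()
```

Any DataFrame with a `date` column should work the same way; for Excel users, `DataFrame.to_csv` is the simpler route.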
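Installing Tushare
Assuming a Windows platform, install Python first. I personally recommend using Cygwin; if you are not familiar with it, explore it a bit on your own or follow the official installation tutorial. To make it easier to install packages inside Cygwin, I recommend installing apt-cyg.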
- Install Python
apt-cyg install python3 python3-pip
- Install Tushare
pip install pandas bs4 lxml tushare
- Check the installed version
import tushare as ts
print(ts.__version__)
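If the import fails, it helps to know which of the packages from the install step is actually missing. The following is a small sketch, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is made up for this post. It looks up distribution metadata without importing the packages themselves.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The same distribution names that were passed to pip above.
    for name in ("pandas", "bs4", "lxml", "tushare"):
        print(name, installed_version(name) or "not installed")
```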
Using Tushare to fetch A-share indexes
Storing the data in files
Straight to the Python 3 script:
#!/usr/bin/python3
#-*- coding: utf-8 -*-
# FileName: GetIndexFromTushare.py
import sys,os
from glob import glob
from datetime import datetime
from pathlib import Path
import tushare as ts
# Get current date
current_date = datetime.now()
''' Function: years_before
    Para: current, i
        current: current date
        i: number of years to go back from the current date
    Explanation:
        We fetch data through tushare one year at a time, as suggested,
        instead of requesting the whole history at once,
        to avoid the 465 response from the data server
'''
def years_before(current, i):
    try:
        shifted = current.replace(year=current.year-i)
    except ValueError:
        # Feb 29 does not exist in a non-leap target year; fall back to Feb 28
        shifted = current.replace(year=current.year-i, day=28)
    return shifted.strftime("%Y-%m-%d")
''' Function: combine_csvs
    Para: thedir, basename, pattern
        thedir: the root directory
        basename: the file name prefix shared by the files in the root directory
        pattern: the file name pattern to search for
    Explanation:
        The data of each year is saved to a file named `basename_i.csv`, where `basename` is the stock name and `i` counts the years back from the current year;
        this function then combines them into one csv file
'''
def combine_csvs(thedir, basename, pattern):
    fname = thedir + basename + ".csv"
    # delete the combined file left over from a previous run
    if Path(fname).exists():
        os.remove(fname)
    # collect all the `./basename_i.csv` files into a list
    csv_arr = glob(thedir + basename + pattern)
    #print(csv_arr)
    # open fname for writing, a=append
    fout = open(fname, "a")
    for n, csv in enumerate(csv_arr):
        f = open(csv)
        # skip the header of every csv except the first one
        if n != 0:
            next(f, None)
        for line in f:
            fout.write(line)
        f.close()
        os.remove(csv)
    fout.close()
''' Function: pairwise
    Para: arr
    Explanation:
        Given an array, pair up every two adjacent elements.
        arr holds the yearly dates produced by the `years_before`
        function; each yielded pair is the range between two consecutive years
'''
def pairwise(arr):
    if not arr: return
    for i in range(len(arr)-1):
        yield arr[i], arr[i+1]
# define each stock's name, id and initial date
stocks = {
    #'SHS_index' : ["000001", "2016-12-19"],
    'AS_index' : ["000002", "1990-12-19"],
    #'SZ300_index' : ["000300", "2005-04-08"],
}
for name, id_date in stocks.items():
    id = id_date[0]
    date = id_date[1]
    #print(name, id, date)
    # construct the time ranges, one per year
    i = 0
    date_arr = []
    while date < years_before(current_date, i):
        date_arr.append(years_before(current_date, i))
        i = i + 1
    date_arr.append(date)
    #print(date_arr)
    # Get the data
    for date_e, date_b in pairwise(date_arr):
        #print(date_b, date_e, "\n")
        data = ts.get_h_data(id, start=date_b, end=date_e, index=True, pause=10, retry_count=5)
        # defensive check: skip this window if nothing came back
        if data is None:
            continue
        # save data to csv file
        data.to_csv(name + '_' + str(date_arr.index(date_b)) + '.csv', columns=['date', 'high', 'low', 'close', 'volume', 'amount'])
    # combine into one csv file
    combine_csvs('./', name, "*.csv")
    #sys.exit()
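One thing to watch in the script above: `glob` returns files in no guaranteed order, and adjacent yearly windows share their boundary date, so the combined file may come out unsorted and with a repeated row at each boundary. As an alternative to the line-by-line merge, here is a rough sketch that merges with pandas instead. The function name `merge_yearly_csvs` is made up, and it assumes every per-year file has a `date` column holding `YYYY-MM-DD` strings.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_yearly_csvs(directory, basename):
    """Merge `<basename>_<i>.csv` files into one DataFrame sorted by date."""
    pattern = os.path.join(directory, basename + "_*.csv")
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError("no files match " + pattern)
    # Read dates as plain strings; ISO dates sort correctly as text.
    frames = [pd.read_csv(p, dtype={"date": str}) for p in paths]
    merged = pd.concat(frames, ignore_index=True)
    # Adjacent windows share a boundary date, so keep one row per date.
    merged = merged.drop_duplicates(subset="date")
    return merged.sort_values("date").reset_index(drop=True)


if __name__ == "__main__":
    # Made-up rows for illustration only; not real index data.
    with tempfile.TemporaryDirectory() as tmp:
        pd.DataFrame(
            {"date": ["2021-01-05", "2021-01-04"], "close": [2.0, 1.0]}
        ).to_csv(os.path.join(tmp, "demo_1.csv"), index=False)
        pd.DataFrame(
            {"date": ["2021-01-04", "2020-12-31"], "close": [1.0, 0.5]}
        ).to_csv(os.path.join(tmp, "demo_2.csv"), index=False)
        merged = merge_yearly_csvs(tmp, "demo")
        merged.to_csv(os.path.join(tmp, "demo.csv"), index=False)
        print(merged)
```

Unlike `combine_csvs`, this sketch leaves the per-year files in place; delete them afterwards if you do not need them.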