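I spent a whole afternoon tinkering with this and picked up a few insights; I'm glad I didn't give up too early. It always feels like everyone else is already playing with sophisticated, elegant techniques, while I'm still working on the most basic things...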
Step 1
Change the data types used to store the columns (downcast to smaller float types)
import numpy as np
# Downcast numeric columns to smaller float types to cut memory usage
data[['lag', 'L', 'S', 'B']] = data[['lag', 'L', 'S', 'B']].astype(np.float16)
data[['T']] = data[['T']].astype(np.float32)
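To see how much Step 1 saves, here is a minimal sketch that applies the same dtype choices to synthetic data. The DataFrame contents are hypothetical; only the column names come from the code above. It also checks the precision cost: float16 keeps only about 3 significant decimal digits and overflows above 65504, so it is worth confirming the real `lag`/`L`/`S`/`B` values fit.

```python
import numpy as np
import pandas as pd

# Column names follow Step 1; the data itself is synthetic (hypothetical)
rng = np.random.default_rng(0)
n = 100_000
raw = pd.DataFrame(rng.random((n, 5)), columns=["lag", "L", "S", "B", "T"])

# Same dtype choices as Step 1, expressed as a single astype() call
small = raw.astype({"lag": np.float16, "L": np.float16, "S": np.float16,
                    "B": np.float16, "T": np.float32})

# Per-column byte counts, excluding the index
before = raw.memory_usage(index=False).sum()
after = small.memory_usage(index=False).sum()
print(before, after)  # 4000000 1200000

# float16 rounding error for values in [0, 1) stays well below 1e-3
max_err = (small["L"].astype(np.float64) - raw["L"]).abs().max()
print(max_err < 1e-3)  # True
```

Going from float64 to float16 shrinks a column 4x and to float32 2x, which is where the 4,000,000 to 1,200,000 byte drop comes from.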
Step 2
Change the storage file format: switch from CSV to HDF5 or Feather. Binary storage is not just a little faster than CSV; it is a lot faster.
pandas.read_hdf(path_or_buf, key=None, mode: str = 'r', errors: str = 'strict', where=None, start: Union[int, NoneType] = None, stop: Union[int, NoneType] = None, columns=None, iterator=False, chunksize: Union[int, NoneType] = None, **kwargs)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_hdf.html
pandas.DataFrame.to_hdf(self, path_or_buf, key: str, mode: str = 'a', complevel: Union[int, NoneType] = None, complib: Union[str, NoneType] = None, append: bool = False, format: Union[str, NoneType] = None, index: bool = True, min_itemsize: Union[int, Dict[str, int], NoneType] = None, nan_rep=None, dropna: Union[bool, NoneType] = None, data_columns: Union[List[str], NoneType] = None, errors: str = 'strict', encoding: str = 'UTF-8') → None
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_hdf.html
pandas.DataFrame.to_feather(self, path) → None
pandas.read_feather(path, columns=None, use_threads: bool = True)
import pandas as pd
data_store = pd.HDFStore('data_1215.h5')
# Put the DataFrame into the store under the key 'D1215'
data_store['D1215'] = data
data_store.close()
## use hdf to write: 41.06633472442627 s
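The write above opens and closes the store by hand. `pd.HDFStore` also works as a context manager, which closes the file even if the write raises. A minimal sketch, assuming PyTables is installed; the data is a hypothetical stand-in, and the file and key names mirror the post.

```python
import os
import tempfile

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((1_000, 2)), columns=["L", "T"])  # stand-in data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data_1215.h5")
    # The with-block closes the store automatically, even on an exception
    with pd.HDFStore(path, mode="w") as store:
        store["D1215"] = data
        keys = store.keys()  # stored keys are reported with a leading "/"
    restored = pd.read_hdf(path, key="D1215")

print(keys, restored.equals(data))
```

`mode="w"` overwrites any existing file; the default `mode="a"` would instead add to it.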
import time
import pandas as pd
time1 = time.time()
data = pd.read_hdf('data_1215.h5', key='D1215')
time2 = time.time()
print("use hdf to read:", time2 - time1, "s")
print(data.head())
## use hdf to read: 11.263915061950684 s
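The post reports HDF timings but not the CSV baseline they are compared against. The following is a sketch of a harness that times both formats on the same DataFrame, assuming PyTables is installed. The function name `time_round_trip` and the synthetic data are hypothetical, and the numbers it prints will depend entirely on the machine and data size, so no figures are claimed here.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def time_round_trip(df: pd.DataFrame, folder: str) -> dict:
    """Return {format: (write_seconds, read_seconds)} for CSV and HDF5."""
    results = {}

    csv_path = os.path.join(folder, "data.csv")
    t0 = time.perf_counter()
    df.to_csv(csv_path, index=False)
    t1 = time.perf_counter()
    pd.read_csv(csv_path)
    t2 = time.perf_counter()
    results["csv"] = (t1 - t0, t2 - t1)

    # Fixed-format HDF5, the same default format HDFStore uses; needs PyTables
    h5_path = os.path.join(folder, "data.h5")
    t0 = time.perf_counter()
    df.to_hdf(h5_path, key="D1215", mode="w")
    t1 = time.perf_counter()
    pd.read_hdf(h5_path, key="D1215")
    t2 = time.perf_counter()
    results["hdf"] = (t1 - t0, t2 - t1)

    return results


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((200_000, 5)), columns=["lag", "L", "S", "B", "T"])
with tempfile.TemporaryDirectory() as tmp:
    timings = time_round_trip(df, tmp)
for fmt, (write_s, read_s) in timings.items():
    print(f"{fmt}: write {write_s:.3f} s, read {read_s:.3f} s")
```

`time.perf_counter()` is used instead of `time.time()` because it is a monotonic, high-resolution clock intended for measuring intervals.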
Step 3
Next I need to work out how to do batch processing. To be continued.
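Assuming "batch processing" here means working through a large file piece by piece instead of loading it all at once, the `chunksize` parameter in the `read_hdf` signature is one possible starting point. This is only a sketch: it requires PyTables, and the file must be written with `format="table"` (the default fixed format cannot be read in chunks). The data, file name, and chunk size are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10_000, 2)), columns=["L", "T"])  # hypothetical data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data_table.h5")
    # format="table" is required for chunked (and where=/columns=) reads
    df.to_hdf(path, key="D1215", mode="w", format="table")

    total = 0.0
    n_rows = 0
    # Process 2,500 rows at a time; read_hdf closes the file after the last chunk
    for chunk in pd.read_hdf(path, key="D1215", chunksize=2_500):
        total += chunk["L"].sum()
        n_rows += len(chunk)

print(n_rows, np.isclose(total, df["L"].sum()))
```

The running sum stands in for whatever per-chunk work is actually needed; only one chunk is held in memory at a time.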
References:
https://zhuanlan.zhihu.com/p/56541628
https://zhuanlan.zhihu.com/p/69221436
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#performance-considerations
https://blog.csdn.net/hzau_yang/article/details/78485879