這幾天在整 一些日志數(shù)據(jù)寝优,其實經(jīng)常整 日志數(shù)據(jù)的 會知道条舔,一般 每一條 日志數(shù)據(jù) 多多少少會含有某些列含有 json 數(shù)據(jù),大部分吧乏矾,我們的json 格式有時候還不固定孟抗,在解析時還是非常困難的
我們導(dǎo)出一部分 日志數(shù)據(jù) excel 格式,然后用pandas 讀取钻心,然后 在用pandas
dataframe.to_csv() 保存為 txt格式的時候發(fā)現(xiàn) json 所在的列 全部都多加了雙引號 凄硼,包括連json里的鍵值對也都被又添加了雙引號,這就不正常了
原來的excel
期望txt 的格式
f953979eacc708d37ee3e83cd41f327b 韓彬 13439073455 412321197907150209 2018/1/15 {"CODE":"200","PHONE":"13439073455","PROVINCE":"北京","CITY":"北京","RESULTS":[{"TYPE":"EMR002","CYCLE":"2016-06-12--2018-06-12","DATA":[{"P_TYPE":"2","PLATFORMCODE":"EM_0000003265","REGISTERTIME":"2018-01-20 20:39:25"},{"P_TYPE":"2","PLATFORMCODE":"EM_0000182849","REGISTERTIME":"2018-03-20 06:31:48"},{"P_TYPE":"2","PLATFORMCODE":"EM_0000183420","REGISTERTIME":"2018-05-26 19:23:42"},{"P_TYPE":"2","PLATFORMCODE":"EM_0000003679","REGISTERTIME":"2017-04-02 12:58:30"},{"P_TYPE":"2","PLATFORMCODE":"EM_0000180836","REGISTERTIME":"2017-04-11 13:37:54"},{"P_TYPE":"2","PLATFORMCODE":"EM_0000003663","REGISTERTIME":"2018-01-26 14:50:07"},{"P_TYPE":"2","PLATFORMCODE":"EM_0000182984","REGISTERTIME":"2018-01-16 13:59:38"},{"P_TYPE":"2","PLATFORMCODE":"EM_0000181607","REGISTERTIME":"2017-12-29 12:16:39"},{"P_TYPE":"2","PLATFORMCODE":"EM_0000207201","REGISTERTIME":"2018-02-02 15:35:03"},{"P_TYPE":"2","PLATFORMCODE":"EM_0000183866","REGISTERTIME":"2018-01-20 22:06:38"}]},{"TYPE":"EMR004","CYCLE":"2016-06-12--2018-06-12","DATA":[{"P_TYPE":"2","PLATFORMCODE":"EM_0000207201","APPLICATIONTIME":"2018-02-02 15:35:03","APPLICATIONAMOUNT":"0.2W~0.5W","APPLICATIONRESULT":"Yes"}]},{"TYPE":"EMR007","CYCLE":"2016-06-12--2018-06-12","DATA":[{"P_TYPE":"2","PLATFORMCODE":"EM_0000207201","LOANLENDERSTIME":"2018-02-02 15:35:03","LOANLENDERSAMOUNT":"0.2W~0.5W"}]},{"TYPE":"EMR009","CYCLE":"2016-06-12--2018-06-12","DATA":[]},{"TYPE":"EMR012","CYCLE":"2016-06-12--2018-06-12","DATA":[]},{"TYPE":"EMR013","CYCLE":"2016-06-12--2018-06-12","DATA":[]}]}
7ee23bdbadcc92aacfa63350c487b9ae 王偉 18660714555 370882199008051615 2018/1/18 {"CODE":"200","PHONE":"18660714555","PROVINCE":"山東","CITY":"濟寧市","RESULTS":[{"TYPE":"EMR002","CYCLE":"2016-06-12--2018-06-12","DATA":[{"P_TYPE":"1","PLATFORMCODE":"EM_0000002247","REGISTERTIME":"2018-01-29 15:26:51"},{"P_TYPE":"2","PLATFORMCODE":"EM_0000207201","REGISTERTIME":"2018-02-02 15:35:07"},{"P_TYPE":"2","PLATFORMCODE":"EM_0000001851","REGISTERTIME":"2018-05-02 10:46:43"},{"P_TYPE":"2","PLATFORMCODE":"EM_0000004397","REGISTERTIME":"2018-05-06 18:35:11"},{"P_TYPE":"2","PLATFORMCODE":"EM_0000003663","REGISTERTIME":"2018-01-26 14:50:10"}]},{"TYPE":"EMR004","CYCLE":"2016-06-12--2018-06-12","DATA":[{"P_TYPE":"2","PLATFORMCODE":"EM_0000207201","APPLICATIONTIME":"2018-02-02 15:35:07","APPLICATIONAMOUNT":"0.2W~0.5W","APPLICATIONRESULT":"Yes"}]},{"TYPE":"EMR007","CYCLE":"2016-06-12--2018-06-12","DATA":[{"P_TYPE":"2","PLATFORMCODE":"EM_0000207201","LOANLENDERSTIME":"2018-02-02 15:35:07","LOANLENDERSAMOUNT":"0.2W~0.5W"}]},{"TYPE":"EMR009","CYCLE":"2016-06-12--2018-06-12","DATA":[]},{"TYPE":"EMR012","CYCLE":"2016-06-12--2018-06-12","DATA":[]},{"TYPE":"EMR013","CYCLE":"2016-06-12--2018-06-12","DATA":[]}]}
dataframe.to_csv() 異常結(jié)果 txt
后來發(fā)現(xiàn) 直接從excel 粘貼復(fù)制到 txt 反而是正常的捷沸,但是 這個要程序化摊沉,就不可能使用手工粘貼復(fù)制,那怎么辦呢
最后發(fā)現(xiàn)還是使用最古老的 with open df.apply() 來寫文件 是最好用的
import pandas as pd
import numpy as np
import json
from pandas.io.json import json_normalize
etl_Data=pd.read_excel(etl_label_path,sheet_name='Sheet1',header=0,encoding='utf-8',dtype={'mobile':np.str})
print(gandata.columns)
mergeda = pd.merge(gandata, etl_Data, how='inner', left_on='手機號1', right_on='mobile'
'', suffixes=('_r', '_y'))
ex_merge=mergeda[['gid', 'realname', 'mobile', 'certid', 'apply_time','外部接口返回json']]
ex_path='data_AA104p1.txt'
with open(ex_path,'w',encoding='utf-8') as f:
ex_merge.apply(lambda row: f.write(str(row[0])+'\t'+str(row[1])+'\t'+str(row[2])+'\t'+str(row[3])+'\t'+str(row[4])+'\t'+str(row[5])+'\n'),axis=1)