Tidying Excel panel data when the years are not fixed

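First, search each workbook to collect all of the years it contains. To match each year with its financial indicators, a link has to be built between the two. Number the indicator values by their order (1, 2, 3, 4, ...) and use that number as the key that links years to indicators, which yields panel data. The work relies on pandas operations.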
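The linking idea can be sketched on toy data before reading the full script. This is a minimal sketch under stated assumptions, not the script's own code: the file names and values are made up, the positions are 0-based (as `enumerate` produces them in the script), and it assumes the sheet lists the most recent year first, which should be checked against the real workbooks.

```python
import pandas as pd

# Hypothetical toy data: two workbooks with different year coverage.
years = pd.DataFrame({
    "excel_name": ["a.xls", "a.xls", "a.xls", "b.xls", "b.xls"],
    "year": [2018, 2019, 2020, 2019, 2020],
})

# Assumption: the most recent year comes first in the sheet, so position 0
# belongs to the latest year of each workbook.
years = years.sort_values(["excel_name", "year"], ascending=[True, False])
years["id"] = years.groupby("excel_name").cumcount()

# Indicator values numbered by their position in the row (hypothetical values).
total_asset = pd.DataFrame({
    "excel_name": ["a.xls", "a.xls", "a.xls", "b.xls", "b.xls"],
    "id": [0, 1, 2, 0, 1],
    "total_asset": [300.0, 200.0, 100.0, 50.0, 40.0],
})

# The (excel_name, id) pair is the link between a year and its indicator value.
panel = years.merge(total_asset, on=["excel_name", "id"], how="left")
panel = panel.sort_values(["excel_name", "year"]).reset_index(drop=True)
print(panel)
```

A left merge keeps every (file, year) pair even when an indicator is missing for it, so gaps show up as NaN rather than silently dropping rows.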

import xlrd
import os
import warnings
warnings.filterwarnings("ignore")
import pandas as pd 


# Build a list of 236 dicts, one per company
companys=[]
for k in range(236):
    new_company={'excel_name':'','year_2020':'','year_2019':'','year_2018':'','year_2017':'',
    'year_2016':'','year_2015':'','year_2014':'','year_2013':'','year_2012':'','year_2011':'','year_2010':''
    ,'year_2009':'','year_2008':'','year_2007':'','year_2006':'','year_2005':'','year_2004':'','year_2003':'','year_2002':''
    ,'year_2001':'','year_2000':''}
    companys.append(new_company)
def del_null(ls):
    while '' in ls:
        ls.remove('')
    return ls

files=os.listdir('D:/company')
#---------------------------------------------------Collect the years-------------
for k,j in enumerate(files):
    excel=r'D:/company/%s'%j
    companys[k]['excel_name']=j
    data=xlrd.open_workbook(excel)
    sheet=data.sheets()[0]
    for rowidx in range(sheet.nrows):
            row = sheet.row(rowidx)
            for colidx, cell in enumerate(row):
                for t in range(2000,2021):
                    if cell.value == "31/03/%s"%t:
                        companys[k]['year_%s'%t]='%s'%t
# Drop the years for which no record exists
for k in companys:
    for j in range(2000,2021):
        if k['year_%s'%j]=='':
            k.pop('year_%s'%j)
df=pd.DataFrame(companys)
#print(companys)
# Stack the per-year columns into long format: one row per (file, year).
# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once.
parts=[]
for k in range(2000,2021):
    col='year_%s'%k
    if col not in df.columns:   # no company has a record for this year
        continue
    dfnew=df.loc[df[col].notna(),['excel_name']].copy()
    dfnew['year']=k
    parts.append(dfnew)
    print(dfnew)
dfs=pd.concat(parts)
dfs.to_csv('d:/企業.csv')
#print(dfs)


#--------------------------------------Search for the indicators-------
#ta_temp,int_temp,ftemp,ctemp,clia_temp,operev_temp,sales_temp,gprofit_temp,pl_temp,rd_temp,ebitda_temp,export_temp=[]

ta_temp=[]
int_temp=[]
ftemp=[]
ctemp=[]
clia_temp=[]
operev_temp=[]
sales_temp=[]
gprofit_temp=[]
pl_temp=[]
rd_temp=[]
ebitda_temp=[]
export_temp=[]
empl_temp=[]
total_asset,int_asset,fix_asset,curr_asset,clia,operev,sales,gprofit,pl_ind,rd_ind,ebitda,export,employ=[],[],[],[],[],[],[],[],[],[],[],[],[]

id1,id2,id3,id4,id5,id6,id7,id8,id9,id10,id11,id12,id13=[],[],[],[],[],[],[],[],[],[],[],[],[]
excel1,excel2,excel3,excel4,excel5,excel6,excel7,excel8,excel9,excel10,excel11,excel12,excel13=[],[],[],[],[],[],[],[],[],[],[],[],[]

for k,j in enumerate(files):
    excel=r'D:/company/%s'%j
    data=xlrd.open_workbook(excel)
    sheet=data.sheets()[0]
    for rowidx in range(sheet.nrows):
            row = sheet.row(rowidx)
            for colidx, cell in enumerate(row):
                if cell.value == "   TOTAL ASSETS":
                    #print(j,del_null(sheet.row_values(rowidx)))
                    at=del_null(sheet.row_values(rowidx))
                    at[0]=j
                    #print(asset)
                    ta_temp.append(at)
                if cell.value == "      Intangible fixed assets":
                    ia=del_null(sheet.row_values(rowidx))
                    ia[0]=j
                    int_temp.append(ia)
                    
                if cell.value == "   Fixed assets":
                    #print(j,del_null(sheet.row_values(rowidx)))
                    ft=del_null(sheet.row_values(rowidx))
                    ft[0]=j
                    ftemp.append(ft)
                if cell.value == "   Current assets":
                    ct=del_null(sheet.row_values(rowidx))
                    ct[0]=j
                    ctemp.append(ct)
                    #print(j,del_null(sheet.row_values(rowidx)))
                if cell.value == "   Current liabilities":
                    cl=del_null(sheet.row_values(rowidx))
                    cl[0]=j
                    #print(clia)
                    clia_temp.append(cl)
                    #print(j,del_null(sheet.row_values(rowidx)))
                if cell.value == " Profit & loss account":
                    orv=del_null(sheet.row_values(rowidx+1))
                    orv[0]=j
                    operev_temp.append(orv)
                    #print(j,del_null(sheet.row_values(rowidx+1)))
                if cell.value == "      Sales":
                    #print(j,del_null(sheet.row_values(rowidx)))
                    sa=del_null(sheet.row_values(rowidx))
                    sa[0]=j
                    sales_temp.append(sa)
                if cell.value == "   Gross profit":
                    gp=del_null(sheet.row_values(rowidx))
                    gp[0]=j
                    gprofit_temp.append(gp)
                    #print(j,del_null(sheet.row_values(rowidx)))
                if cell.value == "   P/L for period [=Net income]":
                    pl=del_null(sheet.row_values(rowidx))
                    pl[0]=j
                    pl_temp.append(pl)
                    #print(j,del_null(sheet.row_values(rowidx)))
                if cell.value == "    Research & Development expenses":
                    rds=del_null(sheet.row_values(rowidx))
                    rds[0]=j
                    rd_temp.append(rds)

                    #print(j,del_null(sheet.row_values(rowidx)))
                if cell.value == "    EBITDA":
                    eb=del_null(sheet.row_values(rowidx))
                    eb[0]=j
                    ebitda_temp.append(eb)
                    #print(j,del_null(sheet.row_values(rowidx)))

                if cell.value == "    Export revenue":
                    ex=del_null(sheet.row_values(rowidx))
                    ex[0]=j
                    export_temp.append(ex)
                    
                if cell.value == "   Number of employees":
                    em=del_null(sheet.row_values(rowidx))
                    em[0]=j
                    empl_temp.append(em)
                    #print(j,del_null(sheet.row_values(rowidx)))
#---------------------------Indicator 1: total assets
for k,j in enumerate(ta_temp):
    for m,n in enumerate(j[1:]):
        total_asset.append(n)
        n=j[0]
        excel1.append(n)
        id1.append(m)
df1=pd.DataFrame(columns=['id1','excel1','total_asset'])
df1['id1']=id1
df1['total_asset']=total_asset
df1['excel1']=excel1
df1=df1.drop_duplicates()
print(df1)
df1.to_csv('d:/df1.csv')



#--------------------------Indicator 2: intangible fixed assets

for k,j in enumerate(int_temp):
    for m,n in enumerate(j[1:]):
        int_asset.append(n)
        n=j[0]
        excel2.append(n)
        id2.append(m)
df2=pd.DataFrame(columns=['id2','excel2','int_asset'])
df2['id2']=id2
df2['int_asset']=int_asset
df2['excel2']=excel2
df2=df2.drop_duplicates()
df2.to_csv('d:/df2.csv')
print(df2)
#-----------------------------Indicator 3: fixed assets

#print(ftemp)               
for k,j in enumerate(ftemp):
    for m,n in enumerate(j[1:]):
        fix_asset.append(n)
        n=j[0]
        excel3.append(n)
        id3.append(m)
#print(id3)
#print(excel3)
df3=pd.DataFrame(columns=['id3','excel3','fix_asset'])
df3['id3']=id3
df3['fix_asset']=fix_asset
df3['excel3']=excel3
df3=df3.drop_duplicates()
df3.to_csv('d:/df3.csv')
print(df3)
#------------------------------Indicator 4: current assets-------
for k,j in enumerate(ctemp):
    for m,n in enumerate(j[1:]):
        curr_asset.append(n)
        n=j[0]
        excel4.append(n)
        id4.append(m)
#print(id4)
#print(excel4)
df4=pd.DataFrame(columns=['id4','excel4','curr_asset'])
df4['id4']=id4
df4['curr_asset']=curr_asset
df4['excel4']=excel4
df4=df4.drop_duplicates()
print(df4)
df4.to_csv('d:/df4.csv')
#----------------------------Indicator 5: current liabilities

for k,j in enumerate(clia_temp):
    for m,n in enumerate(j[1:]):
        clia.append(n)
        n=j[0]
        excel5.append(n)
        id5.append(m)
#print(clia_temp)
#print(id5)
#print(excel5)
df5=pd.DataFrame(columns=['id5','excel5','clia'])
df5['id5']=id5
df5['clia']=clia
df5['excel5']=excel5
df5=df5.drop_duplicates()
print(df5)
df5.to_csv('d:/df5.csv')

#----------------------Indicator 6: operev (the row right after the Profit & loss account header)----
for k,j in enumerate(operev_temp):
    for m,n in enumerate(j[1:]):
        operev.append(n)
        n=j[0]
        excel6.append(n)
        id6.append(m)

df6=pd.DataFrame(columns=['id6','excel6','operev'])
df6['id6']=id6
df6['operev']=operev
df6['excel6']=excel6
df6=df6.drop_duplicates()
print(df6)
df6.to_csv('d:/df6.csv')
#--------------Indicator 7: sales
for k,j in enumerate(sales_temp):
    for m,n in enumerate(j[1:]):
        sales.append(n)
        n=j[0]
        excel7.append(n)
        id7.append(m)

df7=pd.DataFrame(columns=['id7','excel7','sales'])
df7['id7']=id7
df7['sales']=sales
df7['excel7']=excel7
df7=df7.drop_duplicates()
print(df7)
df7.to_csv('d:/df7.csv')

#----------------------Indicator 8: gross profit-------
for k,j in enumerate(gprofit_temp):
    for m,n in enumerate(j[1:]):
        gprofit.append(n)
        n=j[0]
        excel8.append(n)
        id8.append(m)

df8=pd.DataFrame(columns=['id8','excel8','gprofit'])
df8['id8']=id8
df8['gprofit']=gprofit
df8['excel8']=excel8
df8=df8.drop_duplicates()
print(df8)
df8.to_csv('d:/df8.csv')
#------------------------------Indicator 9: P/L for period (net income)
for k,j in enumerate(pl_temp):
    for m,n in enumerate(j[1:]):
        pl_ind.append(n)
        n=j[0]
        excel9.append(n)
        id9.append(m)

df9=pd.DataFrame(columns=['id9','excel9','pl_ind'])
df9['id9']=id9
df9['pl_ind']=pl_ind
df9['excel9']=excel9
df9=df9.drop_duplicates()
print(df9)
df9.to_csv('d:/df9.csv')
#-----------------------------Indicator 10: R&D expenses-----

for k,j in enumerate(rd_temp):
    for m,n in enumerate(j[1:]):
        rd_ind.append(n)
        n=j[0]
        excel10.append(n)
        id10.append(m)

df10=pd.DataFrame(columns=['id10','excel10','rd_ind'])
df10['id10']=id10
df10['rd_ind']=rd_ind
df10['excel10']=excel10
df10=df10.drop_duplicates()
print(df10)
df10.to_csv('d:/df10.csv')
#-------------------Indicator 11: EBITDA---------
for k,j in enumerate(ebitda_temp):
    for m,n in enumerate(j[1:]):
        ebitda.append(n)
        n=j[0]
        excel11.append(n)
        id11.append(m)

df11=pd.DataFrame(columns=['id11','excel11','ebitda'])
df11['id11']=id11
df11['ebitda']=ebitda
df11['excel11']=excel11
df11=df11.drop_duplicates()
print(df11)
df11.to_csv('d:/df11.csv')

#--------------------------Indicator 12: export revenue

for k,j in enumerate(export_temp):
    for m,n in enumerate(j[1:]):
        export.append(n)
        n=j[0]
        excel12.append(n)
        id12.append(m)

df12=pd.DataFrame(columns=['id12','excel12','export'])
df12['id12']=id12
df12['export']=export
df12['excel12']=excel12
df12=df12.drop_duplicates()
print(df12)
df12.to_csv('d:/df12.csv')

#--------------------------Indicator 13: number of employees

for k,j in enumerate(empl_temp):
    for m,n in enumerate(j[1:]):
        employ.append(n)
        n=j[0]
        excel13.append(n)
        id13.append(m)

df13=pd.DataFrame(columns=['id13','excel13','employ'])
df13['id13']=id13
df13['employ']=employ
df13['excel13']=excel13
df13=df13.drop_duplicates()
print(df13)
df13.to_csv('d:/df13.csv')
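The thirteen indicator blocks above all follow the same pattern, so they could be collapsed into one helper. The following is only a sketch of that refactoring: `rows_to_long` is a hypothetical name that does not appear in the original script, the sample rows are made up, and the single `id` / `excel_name` column names replace the numbered `id1`, `excel1`, ... columns so the frames can later be merged on a common key.

```python
import pandas as pd

def rows_to_long(rows, value_name):
    """Flatten [[file, v0, v1, ...], ...] into one row per (file, position)."""
    records = [
        {"id": pos, "excel_name": row[0], value_name: value}
        for row in rows
        for pos, value in enumerate(row[1:])
    ]
    frame = pd.DataFrame(records, columns=["id", "excel_name", value_name])
    # Same de-duplication step as in the script above.
    return frame.drop_duplicates().reset_index(drop=True)

# Hypothetical rows in the same shape as ta_temp: file name first, then values.
sample = [["a.xls", 300.0, 200.0], ["b.xls", 50.0]]
df_ta = rows_to_long(sample, "total_asset")
print(df_ta)
```

With a helper like this, each indicator needs one call, for example `rows_to_long(ta_temp, 'total_asset')`, followed by its own `to_csv`.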

Last edited: 2020.10.28 16:52:39
