Common Excel Operations in pandas (2)

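This part covers the basic pandas operations that correspond roughly to Excel's totals, subtotals, and pivot tables. pandas offers far more powerful and flexible tools for summarizing data than Excel does. One of Excel's problems is that it does not separate how data is stored from how it is displayed, whereas pandas, like a database, keeps the two apart.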

Computing totals

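The data source is the Excel file from the previous post. Suppose we need the sum of each month and of the per-row monthly totals. pandas computes these with the sum() method: called on a Series it returns a single number, and called on a DataFrame, as here, it returns one sum per column: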

import pandas as pd
import numpy as np

df = pd.read_excel('http://pbpython.com/extras/excel-comp-data.xlsx')
df['Total'] = df.Jan + df.Feb + df.Mar

# sum_row is a pandas.core.series.Series; Jan, Feb, etc. become its index
sum_row = df[['Jan', 'Feb', 'Mar', 'Total']].sum()
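As a small illustration of the two directions of summing, the sketch below uses a hypothetical three-row DataFrame in place of the Excel file (the sample values and the `months` list are assumptions made for the example). It computes the row totals with `sum(axis=1)`, an alternative to adding the columns by hand, and then the column totals as above.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel file
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "Jan": [100, 200, 300],
    "Feb": [10, 20, 30],
    "Mar": [1, 2, 3],
})

months = ["Jan", "Feb", "Mar"]

# Row totals: axis=1 sums across the month columns of each row
df["Total"] = df[months].sum(axis=1)

# Column totals: the default axis=0 returns a Series indexed by column name
sum_row = df[months + ["Total"]].sum()

print(df)
print(sum_row)
```

Selecting the numeric columns first keeps text columns such as `name` out of the sum.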

sum_row can also be converted to a DataFrame so that the totals are laid out as columns. The DataFrame's T attribute swaps rows and columns.

# Transpose into a one-row DataFrame
df_sum = pd.DataFrame(data=sum_row).T
df_sum
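To make the shape change concrete, here is a minimal sketch with a hand-built Series of totals (the values are made up for the example). Wrapping a Series in a DataFrame gives a single column, and `.T` turns it into a single row; `Series.to_frame()` is an equivalent way to do the first step.

```python
import pandas as pd

# Hypothetical totals; in the article these come from df[...].sum()
sum_row = pd.Series({"Jan": 600, "Feb": 60, "Mar": 6, "Total": 666})

as_column = pd.DataFrame(data=sum_row)  # 4 rows, 1 column
as_row = as_column.T                    # 1 row, 4 columns

# Equivalent route via Series.to_frame()
also_row = sum_row.to_frame().T

print(as_column.shape, as_row.shape)
print(as_row)
```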

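Placing the totals underneath the data takes a little more work. First use reindex() to give df_sum the same columns as df, then add the totals row after the data. Older pandas versions offered DataFrame.append() for this step, but it was removed in pandas 2.0, so pd.concat() is used here:

# Transpose into a one-row DataFrame
df_sum = pd.DataFrame(data=sum_row).T

# Give df_sum the same columns as df (columns without a total become NaN)
df_sum = df_sum.reindex(columns=df.columns)

# concat returns a new DataFrame with the totals row at the end
df_with_total = pd.concat([df, df_sum], ignore_index=True)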
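The same steps can be tried end to end on a tiny, made-up DataFrame. This sketch uses `pd.concat`, since `DataFrame.append` is no longer available from pandas 2.0 onward, and it labels the totals row in the `name` column, which is an addition for readability rather than part of the steps described above.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel file
df = pd.DataFrame({
    "name": ["A", "B"],
    "Jan": [100, 200],
    "Feb": [10, 20],
})

# Column totals as a one-row DataFrame
sum_row = df[["Jan", "Feb"]].sum()
df_sum = sum_row.to_frame().T

# Align to df's columns; "name" has no total, so it starts out as NaN
df_sum = df_sum.reindex(columns=df.columns)
df_sum["name"] = "Total"  # optional label for the totals row

# Stack the totals row under the data and renumber the index
df_with_total = pd.concat([df, df_sum], ignore_index=True)

print(df_with_total)
```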

Subtotals

Excel's Subtotal feature lives on the Data ribbon. It requires the data to be sorted first, and it mixes the subtotal rows in with the detail rows, so I rarely use it; for subtotals I generally use a pivot table instead.

In pandas, subtotals can be computed by calling the DataFrame's groupby() function and then summing the resulting pandas.core.groupby.DataFrameGroupBy object:

df_groupby = df[['state','Jan', 'Feb','Mar', 'Total']].groupby('state').sum()
df_groupby.head()
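A small sketch on made-up data shows what groupby() followed by sum() returns. The state names and figures are assumptions for the example. Note that the grouping column becomes the index of the result; `reset_index()` turns it back into an ordinary column if that is more convenient.

```python
import pandas as pd

# Hypothetical sample data: several rows per state
df = pd.DataFrame({
    "state": ["Texas", "Texas", "Iowa", "Iowa", "Maine"],
    "Jan": [100, 200, 10, 20, 5],
    "Feb": [1, 2, 3, 4, 5],
})

# groupby() only defines the groups; sum() does the aggregation
grouped = df.groupby("state")
df_groupby = grouped[["Jan", "Feb"]].sum()

# The group key is the index; reset_index() makes it a column again
flat = df_groupby.reset_index()

print(df_groupby)
print(flat)
```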

Formatting numbers

By default pandas displays numbers without thousands separators, which is inconvenient when the values are large. To format the display, define a custom function number_format() and run applymap(number_format) on the DataFrame. applymap() calls number_format on every element of the DataFrame, so number_format must accept a scalar and return a scalar. (In pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map(), which does the same thing.)

# Number formatting
def number_format(x):
    return "{:,.0f}".format(x)  # comma as thousands separator, no decimals

formatted_df = df_groupby.applymap(number_format)
formatted_df.head()
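One point worth seeing in practice is that element-wise formatting produces strings, so the formatted copy is for display only and the numeric DataFrame should be kept for further calculation. The sketch below uses made-up numbers and applies the function column by column with `Series.map`, a choice made here only so that it runs the same way on old and new pandas versions.

```python
import pandas as pd

def number_format(x):
    """Format a scalar with a comma as thousands separator and no decimals."""
    return "{:,.0f}".format(x)

# Hypothetical subtotals
df_groupby = pd.DataFrame(
    {"Jan": [1234567, 890], "Feb": [1000.4, 2500.6]},
    index=["Texas", "Iowa"],
)

# Apply the scalar function to every element, one column at a time
formatted_df = df_groupby.apply(lambda col: col.map(number_format))

print(formatted_df)
print(formatted_df.dtypes)   # the formatted columns hold strings
print(df_groupby["Jan"].sum())  # the original stays numeric
```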

Pivot tables

pandas builds pivot tables with pivot_table(). Using pivot_table() fluently takes some practice; only its most basic parameters are introduced here:

index: the column(s) to group the summary by
values: the columns to run the calculation on
aggfunc: the aggregation function, i.e. which calculation to perform

# pivot table
# pd.pivot_table returns a new DataFrame
df_pivot = pd.pivot_table(df, index=['state'], values=['Jan', 'Feb', 'Mar', 'Total'], aggfunc=np.sum)
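The three parameters can be tried on a tiny, made-up DataFrame. The sketch below passes the aggregation as the string "sum" rather than np.sum, which pandas also accepts, and adds `margins=True`, an optional parameter not covered above that appends a grand-total row (named "All" by default). The sample data is an assumption for the example.

```python
import pandas as pd

# Hypothetical sample data: several rows per state
df = pd.DataFrame({
    "state": ["Texas", "Texas", "Iowa", "Iowa"],
    "Jan": [100, 200, 10, 20],
    "Feb": [1, 2, 3, 4],
})

# One row per state, summing the listed value columns
df_pivot = pd.pivot_table(df, index=["state"], values=["Jan", "Feb"], aggfunc="sum")

# Same table with a grand-total row appended
df_pivot_total = pd.pivot_table(
    df, index=["state"], values=["Jan", "Feb"], aggfunc="sum", margins=True
)

print(df_pivot)
print(df_pivot_total)
```

With a single index column and a sum, this gives the same numbers as groupby('state').sum(); pivot_table() becomes more useful once several index or column keys are combined.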

Last edited: 2018.07.22 14:28:06
