Pandas Handbook (6) - Common pandas Operations

This post collects the pandas operations I use most. Why write it down? I am reading the book "Python for Data Analysis" and taking notes as I go.

1. Reindexing (reindex)

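Reindexing rebuilds the index of an object. While rebuilding it, we can do other things at the same time, such as filling in values for labels that did not exist before.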

DataFrame.reindex(index=None, columns=None, **kwargs)
Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

Series.reindex(index=None, **kwargs)
Conform Series to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

A small example

import numpy as np
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

obj
Out[156]: 
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

# After reindex, labels that have no value are filled with NaN by default
obj.reindex(['a','b','c','d','e'])
Out[157]: 
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64

# fill_value: a commonly used parameter, the value to fill in where there is no data
obj.reindex(['a','b','c','d','e'] , fill_value=9.9)
Out[159]: 
a   -5.3
b    7.2
c    3.6
d    4.5
e    9.9
dtype: float64


# method: a commonly used parameter, how to fill the gaps when the index is monotonically increasing or decreasing
obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

obj3
Out[165]: 
0      blue
2    purple
4    yellow
dtype: object

obj3.reindex(range(6))
Out[170]: 
0      blue
1       NaN
2    purple
3       NaN
4    yellow
5       NaN
dtype: object

# ffill: forward fill (carry the previous value forward)
obj3.reindex(range(6),method='ffill')
Out[167]: 
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object

# bfill: backward fill (carry the next value backward)
obj3.reindex(range(6),method='bfill')
Out[171]: 
0      blue
1    purple
2    purple
3    yellow
4    yellow
5       NaN
dtype: object

For a DataFrame, usage is much the same.
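As a rough sketch of the DataFrame case, the example below reindexes rows and columns separately. The frame and its labels are made up for illustration and are not from the notes above.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 frame used only for illustration.
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

# Reindex the rows: the new label 'b' has no data, so its row is NaN.
by_rows = frame.reindex(['a', 'b', 'c', 'd'])

# Reindex the columns with the columns keyword; 'Utah' is new, so it is all NaN.
by_cols = frame.reindex(columns=['Texas', 'Utah', 'California'])

# fill_value works for a DataFrame as well.
filled = frame.reindex(['a', 'b', 'c', 'd'], fill_value=0)

print(by_rows)
print(by_cols)
print(filled)
```

Passing a list positionally reindexes the rows, while the columns keyword targets the column labels, matching the signature quoted earlier.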

2. Dropping entries from an axis

This mostly comes down to using the drop method.

DataFrame.drop(labels, axis=0, level=None, inplace=False, errors='raise')
Return new object with labels in requested axis removed.

A small example

obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])

obj
Out[174]: 
a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64

obj.drop('c')
Out[175]: 
a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64

obj.drop(['b','d'])
Out[176]: 
a    0.0
c    2.0
e    4.0
dtype: float64


#DataFrame
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                 index=['Ohio', 'Colorado', 'Utah', 'New York'],
                 columns=['one', 'two', 'three', 'four'])

data
Out[178]: 
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

# By default, labels are dropped from the rows (axis=0)
data.drop(['Ohio','Utah'])
Out[179]: 
          one  two  three  four
Colorado    4    5      6     7
New York   12   13     14    15

# We can specify axis to drop from the columns instead
data.drop(['two','four'],axis=1)
Out[180]: 
          one  three
Ohio        0      2
Colorado    4      6
Utah        8     10
New York   12     14
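Two details of the drop signature quoted above are easy to miss: drop returns a new object unless inplace=True, and errors controls what happens with labels that do not exist. A minimal sketch, where 'Texas' is a made-up label that is deliberately absent from the index:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# drop returns a new object; the original frame is left unchanged.
smaller = data.drop(['two', 'four'], axis=1)
print(data.shape, smaller.shape)

# errors='ignore' skips labels that do not exist instead of raising KeyError.
safe = data.drop(['Ohio', 'Texas'], errors='ignore')
print(safe)

# With the default errors='raise', a missing label raises KeyError.
try:
    data.drop('Texas')
except KeyError as exc:
    print('KeyError:', exc)
```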

3. Arithmetic and data alignment

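The term "data alignment" comes up a lot around pandas. It means that when two objects take part in an arithmetic operation, the result is indexed by the union of their labels, the operands are aligned automatically on those labels, and every position where the labels do not overlap is filled with NaN.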

Let's look at a small example first

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])

s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

# Where the indexes do not overlap, the result is NaN
s1+s2
Out[188]: 
a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64

# With the built-in add method we can supply a default fill value; it is the same idea as fill_value in reindex above
#Series.add(other, level=None, fill_value=None, axis=0)

s1.add(s2,fill_value=0)
Out[189]: 
a    5.2
c    1.1
d    3.4
e    0.0
f    4.0
g    3.1
dtype: float64
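The same alignment happens on both axes for DataFrames. The sketch below uses two small made-up frames to show the union of labels, and that fill_value only helps where one side has a value; where both sides are missing, the result stays NaN.

```python
import numpy as np
import pandas as pd

# Two hypothetical frames that overlap only at row 'b', column 'y'.
df1 = pd.DataFrame(np.arange(4.).reshape((2, 2)), index=['a', 'b'], columns=['x', 'y'])
df2 = pd.DataFrame(np.arange(4.).reshape((2, 2)), index=['b', 'c'], columns=['y', 'z'])

# Plain + aligns on rows and columns; everything outside the overlap is NaN.
plain = df1 + df2

# add with fill_value treats a value missing on one side as 0,
# but a cell missing on both sides is still NaN.
filled = df1.add(df2, fill_value=0)

print(plain)
print(filled)
```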

4. Operations between DataFrame and Series

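This relies on the idea of broadcasting, which describes how arithmetic is carried out between arrays of different shapes. It is a powerful feature; for now we only take a quick look.
A small example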

arr = np.arange(12.).reshape((3, 4))

arr
Out[191]: 
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.]])

arr[0]
Out[192]: array([ 0.,  1.,  2.,  3.])

# A 3x4 array minus a length-4 array (treated as 1 row, 4 columns): this is broadcasting
arr - arr[0]
Out[193]: 
array([[ 0.,  0.,  0.,  0.],
       [ 4.,  4.,  4.,  4.],
       [ 8.,  8.,  8.,  8.]])
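Broadcasting in the other direction, one value per row stretched across the columns, needs the smaller operand to have shape (3, 1) rather than (3,). A minimal sketch using the same arr; subtracting row means is just an illustrative choice:

```python
import numpy as np

arr = np.arange(12.).reshape((3, 4))

# Shape (3, 4) minus shape (4,): the row arr[0] is repeated down the rows.
by_row = arr - arr[0]

# To subtract one value per row instead, the operand needs shape (3, 1),
# so that it is stretched across the 4 columns.
row_means = arr.mean(axis=1)                 # shape (3,)
demeaned = arr - row_means.reshape((3, 1))   # shape (3, 4)

print(row_means)
print(demeaned)
```

Without the reshape, arr - row_means would fail, because shapes (3, 4) and (3,) cannot be broadcast together.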

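Arithmetic between a DataFrame and a Series works the same way. By default the Series index is matched against the DataFrame's columns, and the operation is broadcast down the rows.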

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                  index=['Utah', 'Ohio', 'Texas', 'Oregon'])

frame
Out[195]: 
          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0

s = frame.iloc[0]

s
Out[197]: 
b    0.0
d    1.0
e    2.0
Name: Utah, dtype: float64

frame - s
Out[198]: 
          b    d    e
Utah    0.0  0.0  0.0
Ohio    3.0  3.0  3.0
Texas   6.0  6.0  6.0
Oregon  9.0  9.0  9.0

s = pd.Series(range(3),index=list('abc'))

frame
Out[223]: 
          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0

s
Out[224]: 
a    0
b    1
c    2
dtype: int32

frame.add(s)
Out[225]: 
         a     b   c   d   e
Utah   NaN   1.0 NaN NaN NaN
Ohio   NaN   4.0 NaN NaN NaN
Texas  NaN   7.0 NaN NaN NaN
Oregon NaN  10.0 NaN NaN NaN

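# We can use axis to control the direction of the broadcast; with axis=0 the Series index is matched against the row labels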
frame.add(s,axis=0)
Out[227]: 
         b   d   e
Ohio   NaN NaN NaN
Oregon NaN NaN NaN
Texas  NaN NaN NaN
Utah   NaN NaN NaN
a      NaN NaN NaN
b      NaN NaN NaN
c      NaN NaN NaN

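Here fill_value cannot be used to fill in a default value. I don't know why yet, but it always raises an error saying it is not supported; pandas does not appear to implement fill_value for operations between a DataFrame and a Series.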
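The axis=0 result above is all NaN only because s shares no labels with the frame's rows. The sketch below uses a Series that does match the row labels, then shows one possible workaround for the unsupported fill_value: reindex both operands to the union of labels first. The Series s and its label 'x' are made up for illustration, and the workaround is a suggestion rather than an official recipe.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

# A Series whose index matches the frame's row labels.
col_d = frame['d']

# axis=0 matches the Series against the row labels and broadcasts it
# across the columns.
diff = frame.sub(col_d, axis=0)
print(diff)

# A hypothetical Series that only partly overlaps the frame's columns.
s = pd.Series([10.0, 20.0], index=['b', 'x'])

# Possible workaround: fill both sides by reindexing to the union of the
# column labels, then add as usual.
cols = frame.columns.union(s.index)
result = frame.reindex(columns=cols, fill_value=0).add(s.reindex(cols, fill_value=0))
print(result)
```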

5. Function application and mapping

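This section mainly introduces one DataFrame method, apply. It calls the function you pass in once per column (axis=0, the default) or once per row (axis=1), handing each one to the function as a Series.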

DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)
Applies function along input axis of DataFrame.

A small example

f = lambda x: x+10

# Every cell gets 10 added (f receives a whole column at a time, and x+10 is vectorized)
frame.apply(f)
Out[230]: 
           b     d     e
Utah    10.0  11.0  12.0
Ohio    13.0  14.0  15.0
Texas   16.0  17.0  18.0
Oregon  19.0  20.0  21.0


f = lambda x: x.max() - x.min()

frame.apply(f)
Out[232]: 
b    9.0
d    9.0
e    9.0
dtype: float64

# We can specify the axis along which the function is applied
frame.apply(f,axis=1)
Out[233]: 
Utah      2.0
Ohio      2.0
Texas     2.0
Oregon    2.0
dtype: float64

There is also an applymap function (in pandas 2.1 and later it is deprecated in favor of DataFrame.map)

DataFrame.applymap(func)

Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame

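Note the difference between these two functions.
My current understanding is that applymap works element by element, while apply operates along an axis (this may not be quite right yet; I'll write more once I understand it better).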

f = lambda x: '${:,.3f}'.format(x)

frame
Out[237]: 
          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0

# We used this earlier to format values
frame.applymap(f)
Out[238]: 
             b        d        e
Utah    $0.000   $1.000   $2.000
Ohio    $3.000   $4.000   $5.000
Texas   $6.000   $7.000   $8.000
Oregon  $9.000  $10.000  $11.000
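One way to see the difference between the two functions is to inspect what each one passes to the callback. The sketch below is an illustration only; it picks DataFrame.map when the installed pandas has it and falls back to applymap otherwise.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

# apply hands the function a whole column (or row, with axis=1) as a Series,
# so the result has one entry per column.
got_series_in_apply = frame.apply(lambda x: isinstance(x, pd.Series))

# applymap hands the function one scalar at a time. Newer pandas versions
# expose the same behaviour as DataFrame.map, so use it when present.
elementwise = getattr(frame, 'map', None) or frame.applymap
got_series_elementwise = elementwise(lambda x: isinstance(x, pd.Series))

print(got_series_in_apply)
print(got_series_elementwise)
```

This is why a reducing function such as x.max() - x.min() only makes sense with apply, while a scalar formatter such as '${:,.3f}'.format belongs with applymap.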

6. Handling missing data

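Handling missing data in pandas is very easy. pandas uses the floating-point value NaN (Not a Number) to represent missing values.
Earlier we mentioned using isnull to check for NaN values.
A small example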

a = pd.Series(['one','two',np.nan,'three'])

a
Out[240]: 
0      one
1      two
2      NaN
3    three
dtype: object

a.isnull()
Out[241]: 
0    False
1    False
2     True
3    False
dtype: bool

a.notnull()
Out[242]: 
0     True
1     True
2    False
3     True
dtype: bool

# Python's built-in None is also treated as a missing value
a[4]=None

a
Out[247]: 
0      one
1      two
2      NaN
3    three
4     None
dtype: object

a.isnull()
Out[248]: 
0    False
1    False
2     True
3    False
4     True
dtype: bool

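How should we deal with data like this? Sometimes we initialize the missing entries to a default value, and sometimes we simply remove them.
To remove them we can use the dropna function, or boolean indexing.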

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

a
Out[249]: 
0      one
1      two
2      NaN
3    three
4     None
dtype: object

a.dropna()
Out[250]: 
0      one
1      two
3    three
dtype: object

a[a.notnull()]
Out[251]: 
0      one
1      two
3    three
dtype: object

# DataFrame

data = pd.DataFrame([[1., 6.5, 3.], [1., np.nan, np.nan],
                  [np.nan, np.nan, np.nan], [np.nan, 6.5, 3.]])

data
Out[254]: 
     0    1    2
0  1.0  6.5  3.0
1  1.0  NaN  NaN
2  NaN  NaN  NaN
3  NaN  6.5  3.0

# By default, every row that contains at least one NaN is dropped
data.dropna()
Out[255]: 
     0    1    2
0  1.0  6.5  3.0

# We can use the how parameter to control this
how : {'any', 'all'}

        any : if any NA values are present, drop that label
        all : if all values are NA, drop that label

data.dropna(how='all')
Out[257]: 
     0    1    2
0  1.0  6.5  3.0
1  1.0  NaN  NaN
3  NaN  6.5  3.0
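The dropna signature quoted earlier also has axis and thresh parameters. A short sketch of both, using the same data plus an extra all-NaN column added here purely for illustration:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1., 6.5, 3.], [1., np.nan, np.nan],
                     [np.nan, np.nan, np.nan], [np.nan, 6.5, 3.]])
data[3] = np.nan   # extra column that is entirely NaN (illustration only)

# axis=1 switches from dropping rows to dropping columns.
no_empty_cols = data.dropna(axis=1, how='all')

# thresh keeps only the rows with at least that many non-NaN values.
at_least_two = data.dropna(thresh=2)

print(no_empty_cols)
print(at_least_two)
```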

Sometimes we want to fill in missing values instead of dropping them, much like the fill_value parameter we used earlier.

DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
Fill NA/NaN values using the specified method
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None

data
Out[261]: 
     0    1    2
0  1.0  6.5  3.0
1  1.0  NaN  NaN
2  NaN  NaN  NaN
3  NaN  6.5  3.0

data.fillna(9.9)
Out[259]: 
     0    1    2
0  1.0  6.5  3.0
1  1.0  9.9  9.9
2  9.9  9.9  9.9
3  9.9  6.5  3.0

# Using method works the same way as with reindex earlier (newer pandas versions prefer data.ffill())
data.fillna(method='ffill')
Out[262]: 
     0    1    2
0  1.0  6.5  3.0
1  1.0  6.5  3.0
2  1.0  6.5  3.0
3  1.0  6.5  3.0
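Two other options from the fillna signature above are a per-column value and the limit parameter. The sketch below is illustrative; the fill values are arbitrary, and it uses the ffill method rather than fillna(method='ffill').

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1., 6.5, 3.], [1., np.nan, np.nan],
                     [np.nan, np.nan, np.nan], [np.nan, 6.5, 3.]])

# A dict gives a different fill value per column (keys are column labels).
per_column = data.fillna({0: 0.0, 1: -1.0, 2: 9.9})

# ffill() is the method form of forward fill; limit caps how many
# consecutive NaN values are filled in each gap.
one_step = data.ffill(limit=1)

print(per_column)
print(one_step)
```

With limit=1, only the first NaN after each valid value is filled, so longer gaps keep their remaining NaN entries.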

Last edited: 2017.12.11 02:53:27
