pandas Official Documentation (Chinese Translation) ~ Essential Basic Functionality, Part 1

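Daniao (呆鳥) says: "While learning data analysis with Python, I found that reading the official documentation directly is fantastic: it is comprehensive, rich and detailed. The core of Python data analysis is undoubtedly pandas, so I wanted to translate the pandas docs. That led me to the pypandas.cn project, which led me to join the pandas Chinese documentation translation team, which left me no time to update my WeChat official account. So I lazily decided to post the pandas docs I have translated and proofread as articles here, and from now on everyone can read them here."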

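This section covers the essential basic functionality of the pandas data structures. The code below creates the example objects used throughout (the examples assume import numpy as np and import pandas as pd have already been run):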

In [1]: index = pd.date_range('1/1/2000', periods=8)

In [2]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

In [3]: df = pd.DataFrame(np.random.randn(8, 3), index=index,
   ...:                   columns=['A', 'B', 'C'])
   ...:

Head and Tail

head() and tail() give a quick preview of a Series or DataFrame. By default they show 5 rows; you can also pass the number of rows to show.

In [4]: long_series = pd.Series(np.random.randn(1000))

In [5]: long_series.head()
Out[5]: 
0   -1.157892
1   -1.344312
2    0.844885
3    1.075770
4   -0.109050
dtype: float64

In [6]: long_series.tail(3)
Out[6]: 
997   -0.289388
998   -1.020544
999    0.589993
dtype: float64
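The same two methods work on a DataFrame. The sketch below uses a small made-up frame (the name frame is illustrative only) and also shows that a negative n passed to head() returns every row except the last |n|.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with ten rows: x = 0..9
frame = pd.DataFrame({"x": np.arange(10)})

first_three = frame.head(3)        # rows 0, 1, 2
last_two = frame.tail(2)           # rows 8, 9
all_but_last_two = frame.head(-2)  # negative n: everything except the last 2 rows

print(first_three["x"].tolist())   # [0, 1, 2]
print(last_two["x"].tolist())      # [8, 9]
print(len(all_but_last_two))       # 8
```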

Attributes and Underlying Data

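pandas objects have a number of attributes that give access to their metadata:

- shape: gives the axis dimensions of the object, consistent with ndarray
- Axis labels
  - Series: index (the only axis)
  - DataFrame: index (rows) and columns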
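A minimal sketch of reading these attributes, assuming two small made-up objects (ser and frame are illustrative names, not from the original):

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
frame = pd.DataFrame(np.zeros((4, 2)), columns=["A", "B"])

print(ser.shape)             # (3,): a Series has a single axis
print(frame.shape)           # (4, 2): (rows, columns), like ndarray.shape
print(list(ser.index))       # ['a', 'b', 'c']
print(list(frame.index))     # [0, 1, 2, 3]: the default integer index
print(list(frame.columns))   # ['A', 'B']
```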

Note: it is safe to assign to these attributes!

In [7]: df[:2]
Out[7]: 
                   A         B         C
2000-01-01 -0.173215  0.119209 -1.044236
2000-01-02 -0.861849 -2.104569 -0.494929

In [8]: df.columns = [x.lower() for x in df.columns]

In [9]: df
Out[9]: 
                   a         b         c
2000-01-01 -0.173215  0.119209 -1.044236
2000-01-02 -0.861849 -2.104569 -0.494929
2000-01-03  1.071804  0.721555 -0.706771
2000-01-04 -1.039575  0.271860 -0.424972
2000-01-05  0.567020  0.276232 -1.087401
2000-01-06 -0.673690  0.113648 -1.478427
2000-01-07  0.524988  0.404705  0.577046
2000-01-08 -1.715002 -1.039268 -0.370647

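pandas objects (Index, Series, DataFrame) can be thought of as containers for arrays, which hold the actual data and do the actual computation. For many types the underlying array is a numpy.ndarray. However, pandas and third-party libraries may extend NumPy's type system to add support for custom arrays (see the dtypes section).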

To get the actual data inside an Index or Series, use the .array property:

In [10]: s.array
Out[10]: 
<PandasArray>
[ 0.4691122999071863, -0.2828633443286633, -1.5090585031735124,
 -1.1356323710171934,  1.2121120250208506]
Length: 5, dtype: float64

In [11]: s.index.array
Out[11]: 
<PandasArray>
['a', 'b', 'c', 'd', 'e']
Length: 5, dtype: object

.array is always an ExtensionArray. What exactly an ExtensionArray is, and why pandas uses them, is beyond the scope of this section. See the dtypes section for more.
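To see that .array is an ExtensionArray regardless of the underlying data, here is a small sketch with a float Series and a categorical Series (both made up for illustration). The wrapper class for plain NumPy-backed data is named PandasArray in older pandas versions and NumpyExtensionArray in newer ones, so the sketch checks the base class rather than the name.

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray

floats = pd.Series([0.5, 1.5, 2.5])
cats = pd.Series(pd.Categorical(["low", "high", "low"]))

# For plain float data .array is a thin wrapper around a NumPy array;
# for categorical data it is a Categorical.
print(type(floats.array).__name__)   # PandasArray or NumpyExtensionArray
print(type(cats.array).__name__)     # Categorical

# Both are ExtensionArray subclasses
print(isinstance(floats.array, ExtensionArray), isinstance(cats.array, ExtensionArray))
```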

If you need a NumPy array, use to_numpy() or numpy.asarray():

In [12]: s.to_numpy()
Out[12]: array([ 0.4691, -0.2829, -1.5091, -1.1356,  1.2121])

In [13]: np.asarray(s)
Out[13]: array([ 0.4691, -0.2829, -1.5091, -1.1356,  1.2121])

When the Series or Index is backed by an ExtensionArray, to_numpy() may involve copying the data and coercing the values. See the dtypes section for details.
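A sketch of the difference, assuming small made-up Series: for plain numeric data to_numpy() and np.asarray() agree, copy=True guarantees an array that is independent of the Series, and for a categorical (ExtensionArray-backed) Series to_numpy() builds a new NumPy array of the category values.

```python
import numpy as np
import pandas as pd

floats = pd.Series([1.0, 2.0, 3.0])

# Same values either way for plain numeric data
assert np.array_equal(floats.to_numpy(), np.asarray(floats))

# copy=True returns an independent array; editing it leaves the Series alone
detached = floats.to_numpy(copy=True)
detached[0] = 100.0
print(floats.iloc[0])   # 1.0

# ExtensionArray-backed data is converted into a new NumPy array
cats = pd.Series(pd.Categorical(["x", "y", "x"]))
converted = cats.to_numpy()
print(type(converted).__name__, converted.tolist())   # ndarray ['x', 'y', 'x']
```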

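to_numpy() gives some control over the dtype of the resulting numpy.ndarray. Take timezone-aware datetimes as an example: NumPy has no dtype for timezone-aware datetimes, so pandas offers two possibly useful representations:

- An object-dtype numpy.ndarray of Timestamp objects, each carrying the correct tz.
- A datetime64[ns]-dtype numpy.ndarray, where the values have been converted to UTC and the timezone discarded.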

The timezone can be preserved with dtype=object:

In [14]: ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))

In [15]: ser.to_numpy(dtype=object)
Out[15]: 
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET', freq='D'),
       Timestamp('2000-01-02 00:00:00+0100', tz='CET', freq='D')],
      dtype=object)

or thrown away with dtype='datetime64[ns]':

In [16]: ser.to_numpy(dtype="datetime64[ns]")
Out[16]: 
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

Getting the raw data inside a DataFrame is a bit more complex. When all of the DataFrame's columns have the same dtype, DataFrame.to_numpy() returns the underlying data:

In [17]: df.to_numpy()
Out[17]: 
array([[-0.1732,  0.1192, -1.0442],
       [-0.8618, -2.1046, -0.4949],
       [ 1.0718,  0.7216, -0.7068],
       [-1.0396,  0.2719, -0.425 ],
       [ 0.567 ,  0.2762, -1.0874],
       [-0.6737,  0.1136, -1.4784],
       [ 0.525 ,  0.4047,  0.577 ],
       [-1.715 , -1.0393, -0.3706]])

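If a DataFrame contains homogeneously-typed data, the ndarray can actually be modified in place, and the changes will be reflected in the data structure. For heterogeneous data (i.e. when the DataFrame's columns do not all share the same dtype), this is not the case. Unlike the axis labels, the values themselves cannot be assigned to as an attribute.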

::: tip Note

When working with heterogeneous data, the dtype of the resulting ndarray is chosen to accommodate all of the data involved. For example, if strings are involved, the result is of object dtype. If there are only floats and integers, the result is of float dtype.

:::
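A quick check of the dtype rule in the note above, using two made-up frames:

```python
import numpy as np
import pandas as pd

ints_and_floats = pd.DataFrame({"i": [1, 2], "f": [0.5, 1.5]})
ints_and_strings = pd.DataFrame({"i": [1, 2], "s": ["a", "b"]})

print(ints_and_floats.to_numpy().dtype)    # float64: the ints are upcast to float
print(ints_and_strings.to_numpy().dtype)   # object: strings force object dtype
print(ints_and_floats.to_numpy().tolist()) # [[1.0, 0.5], [2.0, 1.5]]
```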

In the past, pandas recommended Series.values or DataFrame.values for extracting the data from a Series or DataFrame. You will still find these in older code bases and online tutorials, but pandas has since improved on them: use .array or to_numpy() instead, and avoid .values. .values has the following drawbacks:

- When a Series contains an extension type, it is unclear whether Series.values returns a NumPy array or the ExtensionArray. Series.array always returns an ExtensionArray and never copies data. Series.to_numpy() always returns a NumPy array, potentially at the cost of copying and coercing the values.
- When a DataFrame contains a mix of dtypes, DataFrame.values may copy the data and coerce the values to a common dtype, which is a relatively expensive operation. DataFrame.to_numpy(), being a method, makes it clearer that the returned NumPy array may not be a view on the same data in the DataFrame.
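A sketch of the ambiguity described in the first point, using two made-up Series: .values returns an ExtensionArray for categorical data, but a plain datetime64 ndarray (converted to UTC, timezone dropped) for timezone-aware data, whereas .array and .to_numpy() always return the same kind of object.

```python
import numpy as np
import pandas as pd

cats = pd.Series(pd.Categorical(["a", "b", "a"]))
aware = pd.Series(pd.date_range("2000-01-01", periods=2, tz="UTC"))

# .values: an ExtensionArray here ...
print(type(cats.values).__name__)                              # Categorical
# ... but a plain NumPy datetime64 array here
print(type(aware.values).__name__, aware.values.dtype.kind)   # ndarray M

# .array is always an ExtensionArray, .to_numpy() always an ndarray
print(type(aware.array).__name__)                              # DatetimeArray
print(type(cats.to_numpy()).__name__)                          # ndarray
```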

Accelerated operations

With the numexpr and bottleneck libraries, pandas can accelerate certain types of binary numerical and boolean operations.

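These libraries are especially useful with large data sets, where the speedups are substantial. numexpr uses smart chunking, caching and multiple cores. bottleneck is a set of specialized Cython routines that are especially fast on arrays containing NaNs.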

Here is a sample (using DataFrames with 100 columns x 100,000 rows):

Operation	0.11.0 (ms)	Prior version (ms)	Ratio to prior
`df1 > df2`	13.32	125.35	0.1063
`df1 * df2`	21.71	36.63	0.5928
`df1 + df2`	22.04	36.50	0.6039

You are strongly encouraged to install both libraries. See the Recommended Dependencies section for more information.

Both libraries are used by default when installed; you can control this with the following options:

New in version 0.20.0.

pd.set_option('compute.use_bottleneck', False)
pd.set_option('compute.use_numexpr', False)
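If you only want to switch the accelerators off temporarily (for example, to compare timings), a sketch using pd.option_context, which restores the previous settings when the with block exits:

```python
import pandas as pd

def accel_settings():
    # Current values of the two acceleration options
    return (pd.get_option("compute.use_bottleneck"),
            pd.get_option("compute.use_numexpr"))

before = accel_settings()

with pd.option_context("compute.use_bottleneck", False,
                       "compute.use_numexpr", False):
    inside = accel_settings()   # both disabled inside the block

after = accel_settings()        # restored on exit
print(before, inside, after)
```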

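Flexible binary operations

For binary operations between pandas data structures, there are two key points of interest:

- Broadcasting behavior between higher-dimensional (e.g. DataFrame) and lower-dimensional (e.g. Series) objects;
- Missing data in computations.

These two issues can be handled simultaneously, but below we first show how to handle them separately.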

Matching / broadcasting behavior

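DataFrame provides the methods add(), sub(), mul(), div() and the related radd(), rsub(), ... for carrying out binary operations. For broadcasting behavior, Series input is of primary interest. With these methods you can match on either the index or the columns via the axis keyword: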

In [18]: df = pd.DataFrame({
   ....:     'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
   ....:     'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
   ....:     'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})
   ....: 

In [19]: df
Out[19]: 
        one       two     three
a  1.394981  1.772517       NaN
b  0.343054  1.912123 -0.050390
c  0.695246  1.478369  1.227435
d       NaN  0.279344 -0.613172

In [20]: row = df.iloc[1]

In [21]: column = df['two']

In [22]: df.sub(row, axis='columns')
Out[22]: 
        one       two     three
a  1.051928 -0.139606       NaN
b  0.000000  0.000000  0.000000
c  0.352192 -0.433754  1.277825
d       NaN -1.632779 -0.562782

In [23]: df.sub(row, axis=1)
Out[23]: 
        one       two     three
a  1.051928 -0.139606       NaN
b  0.000000  0.000000  0.000000
c  0.352192 -0.433754  1.277825
d       NaN -1.632779 -0.562782

In [24]: df.sub(column, axis='index')
Out[24]: 
        one  two     three
a -0.377535  0.0       NaN
b -1.569069  0.0 -1.962513
c -0.783123  0.0 -0.250933
d       NaN  0.0 -0.892516

In [25]: df.sub(column, axis=0)
Out[25]: 
        one  two     three
a -0.377535  0.0       NaN
b -1.569069  0.0 -1.962513
c -0.783123  0.0 -0.250933
d       NaN  0.0 -0.892516
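The random values above make the arithmetic hard to follow, so here is the same idea on a small made-up frame with round numbers. It also shows that the plain - operator broadcasts a Series across the columns, i.e. behaves like axis="columns".

```python
import pandas as pd

frame = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [10.0, 20.0, 30.0]},
                     index=["a", "b", "c"])
row = frame.iloc[1]     # one=2.0, two=20.0, labelled by column name
column = frame["two"]   # labelled by row label

by_row = frame.sub(row, axis="columns")      # subtract the row from every row
by_column = frame.sub(column, axis="index")  # subtract the column from every column

# The operator form matches on the columns, like axis="columns"
assert (frame - row).equals(by_row)

print(by_row)      # a: -1, -10 / b: 0, 0 / c: 1, 10
print(by_column)   # one: -9, -18, -27 / two: 0, 0, 0
```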

You can also align a Series with one level of a MultiIndexed DataFrame:

In [26]: dfmi = df.copy()

In [27]: dfmi.index = pd.MultiIndex.from_tuples([(1, 'a'), (1, 'b'),
   ....:                                         (1, 'c'), (2, 'a')],
   ....:                                        names=['first', 'second'])
   ....: 

In [28]: dfmi.sub(column, axis=0, level='second')
Out[28]: 
                   one       two     three
first second                              
1     a      -0.377535  0.000000       NaN
      b      -1.569069  0.000000 -1.962513
      c      -0.783123  0.000000 -0.250933
2     a            NaN -1.493173 -2.385688
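A minimal sketch with round numbers, assuming a made-up two-level index and a Series of offsets keyed by the "second" level: each row has the offset for its "second" label subtracted, whatever its "first" label is.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, "a"), (1, "b"), (2, "a")],
                                names=["first", "second"])
frame = pd.DataFrame({"v": [10.0, 20.0, 30.0]}, index=idx)

# Hypothetical offsets, labelled by the values of the "second" level
offsets = pd.Series({"a": 1.0, "b": 2.0})

result = frame.sub(offsets, axis=0, level="second")
print(result)   # (1, a): 9.0, (1, b): 18.0, (2, a): 29.0
```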

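Series and Index also support the divmod() builtin. It performs floor division and the modulo operation at the same time, returning a 2-tuple whose elements have the same type as the left-hand side. For example: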

In [29]: s = pd.Series(np.arange(10))

In [30]: s
Out[30]: 
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64

In [31]: div, rem = divmod(s, 3)

In [32]: div
Out[32]: 
0    0
1    0
2    0
3    1
4    1
5    1
6    2
7    2
8    2
9    3
dtype: int64

In [33]: rem
Out[33]: 
0    0
1    1
2    2
3    0
4    1
5    2
6    0
7    1
8    2
9    0
dtype: int64

In [34]: idx = pd.Index(np.arange(10))

In [35]: idx
Out[35]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')

In [36]: div, rem = divmod(idx, 3)

In [37]: div
Out[37]: Int64Index([0, 0, 0, 1, 1, 1, 2, 2, 2, 3], dtype='int64')

In [38]: rem
Out[38]: Int64Index([0, 1, 2, 0, 1, 2, 0, 1, 2, 0], dtype='int64')
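A sketch that checks what divmod() computes: its two results equal s // d and s % d, so div * d + rem rebuilds the original values. Because it is floor division, a negative dividend rounds the quotient down.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10))
div, rem = divmod(s, 3)

# divmod(s, d) matches (s // d, s % d)
assert div.equals(s // 3)
assert rem.equals(s % 3)
# ... and the pair reconstructs s
assert (div * 3 + rem).equals(s)

# Floor semantics with a negative dividend
neg_div, neg_rem = divmod(pd.Series([-7]), 2)
print(neg_div.iloc[0], neg_rem.iloc[0])   # -4 1
```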

divmod() also works element-wise:

In [39]: div, rem = divmod(s, [2, 2, 3, 3, 4, 4, 5, 5, 6, 6])

In [40]: div
Out[40]: 
0    0
1    0
2    0
3    1
4    1
5    1
6    1
7    1
8    1
9    1
dtype: int64

In [41]: rem
Out[41]: 
0    0
1    1
2    2
3    0
4    0
5    1
6    1
7    2
8    2
9    3
dtype: int64

Missing data / operations with fill values

The arithmetic methods of Series and DataFrame accept a fill_value option: a value substituted for a missing value when at most one of the two values at a location is missing. For example, when adding two DataFrames you may want to treat NaN as 0 wherever only one DataFrame is missing the value; if both DataFrames are missing the value at the same location, the result is still NaN. You can later replace any remaining NaN with fillna if you wish.

Note: the official pandas docs do not show where df2 in In [43] below comes from. Judging from the output, it is a copy of df with the missing value in column three, row a, set to 1.0:

df2 = df.copy()
df2.loc['a', 'three'] = 1.0

In [42]: df
Out[42]: 
        one       two     three
a  1.394981  1.772517       NaN
b  0.343054  1.912123 -0.050390
c  0.695246  1.478369  1.227435
d       NaN  0.279344 -0.613172

In [43]: df2
Out[43]: 
        one       two     three
a  1.394981  1.772517  1.000000
b  0.343054  1.912123 -0.050390
c  0.695246  1.478369  1.227435
d       NaN  0.279344 -0.613172

In [44]: df + df2
Out[44]: 
        one       two     three
a  2.789963  3.545034       NaN
b  0.686107  3.824246 -0.100780
c  1.390491  2.956737  2.454870
d       NaN  0.558688 -1.226343

In [45]: df.add(df2, fill_value=0)
Out[45]: 
        one       two     three
a  2.789963  3.545034  1.000000
b  0.686107  3.824246 -0.100780
c  1.390491  2.956737  2.454870
d       NaN  0.558688 -1.226343
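The three cases (one side missing, both sides missing, filling afterwards) are easier to see on two small made-up Series:

```python
import numpy as np
import pandas as pd

left = pd.Series([1.0, np.nan, np.nan], index=["x", "y", "z"])
right = pd.Series([10.0, 20.0, np.nan], index=["x", "y", "z"])

plain = left + right                     # x=11, y=NaN, z=NaN
filled = left.add(right, fill_value=0)   # y=20: only one side was missing
                                         # z=NaN: both sides were missing
final = filled.fillna(-1)                # replace what is still missing

print(plain.tolist(), filled.tolist(), final.tolist())
```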

Flexible comparisons

Like the arithmetic methods in the previous section, Series and DataFrame provide the binary comparison methods eq, ne, lt, gt, le and ge:

No.	Method	Meaning
1	eq	equal to
2	ne	not equal to
3	lt	less than
4	gt	greater than
5	le	less than or equal to
6	ge	greater than or equal to

In [46]: df.gt(df2)
Out[46]: 
     one    two  three
a  False  False  False
b  False  False  False
c  False  False  False
d  False  False  False

In [47]: df2.ne(df)
Out[47]: 
     one    two  three
a  False  False   True
b  False  False  False
c  False  False  False
d   True  False  False

These operations produce a pandas object of the same type as the left-hand input, with dtype bool. These boolean objects can be used in indexing operations; see the section on boolean indexing.
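A short sketch with a made-up Series: the method form is equivalent to the operator, the result has dtype bool, and it can be used directly as an indexing mask.

```python
import pandas as pd

s = pd.Series([1, 5, 3, 8], index=list("abcd"))

mask = s.gt(3)              # same as s > 3
assert mask.equals(s > 3)

print(mask.dtype)           # bool
print(s[mask].tolist())     # [5, 8]
```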

Boolean reductions

empty, any(), all() and bool() reduce a boolean result to a summary.

In [48]: (df > 0).all()
Out[48]: 
one      False
two       True
three    False
dtype: bool

In [49]: (df > 0).any()
Out[49]: 
one      True
two      True
three    True
dtype: bool

You can reduce further, down to a single boolean value:

In [50]: (df > 0).any().any()
Out[50]: True

The empty property tells you whether a pandas object is empty:

In [51]: df.empty
Out[51]: False

In [52]: pd.DataFrame(columns=list('ABC')).empty
Out[52]: True

To evaluate a single-element pandas object in a boolean context, use bool():

In [53]: pd.Series([True]).bool()
Out[53]: True

In [54]: pd.Series([False]).bool()
Out[54]: False

In [55]: pd.DataFrame([[True]]).bool()
Out[55]: True

In [56]: pd.DataFrame([[False]]).bool()
Out[56]: False
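A combined sketch of these reductions on a made-up frame. Note that bool() is deprecated in recent pandas releases (2.1 and later), so for the single-element case this sketch extracts the scalar with .item() instead.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, -2], "b": [3, 4]})
positive = frame > 0

per_column = positive.all()        # a: False, b: True
anything = positive.any().any()    # reduce twice to a single boolean
everything = positive.all().all()

# Single-element object: .item() returns the scalar
flag = bool(pd.Series([True]).item())

print(per_column.tolist(), anything, everything, flag)   # [False, True] True False True
```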

::: danger Warning

You might be tempted to write:

>>> if df:
...     pass

or

>>> df and df2

Both raise an error, because they try to evaluate multiple values as a single truth value:

ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

:::

See the gotchas section for a more detailed discussion.
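A sketch of the error and of the explicit alternatives, using a made-up frame: instead of if frame:, state what you mean with empty or an any()/all() reduction.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2]})

try:
    if frame:          # ambiguous: which of the values should decide?
        pass
    raised = False
except ValueError:
    raised = True

# Explicit alternatives
has_rows = not frame.empty
all_positive = bool((frame > 0).all().all())
print(raised, has_rows, all_positive)   # True True True
```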

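Comparing whether objects are equivalent

There is often more than one way to compute the same result. As a simple example, consider df + df and df * 2. To test whether these two computations give the same result, the tools from the previous section suggest (df + df == df * 2).all(). But in fact, the result contains False values: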

In [57]: df + df == df * 2
Out[57]: 
     one   two  three
a   True  True  False
b   True  True   True
c   True  True   True
d  False  True   True

In [58]: (df + df == df * 2).all()
Out[58]: 
one      False
two       True
three    False
dtype: bool

Note that the boolean DataFrame df + df == df * 2 contains some False values! This is because NaNs do not compare as equal:

In [59]: np.nan == np.nan
Out[59]: False

To test for equivalence, n-dimensional objects such as Series and DataFrame provide the equals() method, which treats NaNs in corresponding locations as equal:

In [60]: (df + df).equals(df * 2)
Out[60]: True

Note: the index of the Series or DataFrame must be in the same order for the result to be True:

In [61]: df1 = pd.DataFrame({'col': ['foo', 0, np.nan]})

In [62]: df2 = pd.DataFrame({'col': [np.nan, 0, 'foo']}, index=[2, 1, 0])

In [63]: df1.equals(df2)
Out[63]: False

In [64]: df1.equals(df2.sort_index())
Out[64]: True
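A compact sketch of both behaviors with made-up Series: element-wise == treats NaN as unequal while equals() does not, and equals() also compares dtypes, so the same numbers stored as int64 and float64 are not considered equal.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan])
b = pd.Series([1.0, np.nan])

print((a == b).all())   # False: NaN != NaN element-wise
print(a.equals(b))      # True: NaN in matching positions counts as equal

# Same values, different dtypes
print(pd.Series([1, 2]).equals(pd.Series([1.0, 2.0])))   # False
```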

Comparing array-like objects

Comparing the elements of a pandas data structure with a scalar value is simple:

In [65]: pd.Series(['foo', 'bar', 'baz']) == 'foo'
Out[65]: 
0     True
1    False
2    False
dtype: bool

In [66]: pd.Index(['foo', 'bar', 'baz']) == 'foo'
Out[66]: array([ True, False, False])

pandas can also compare the elements of two array-like objects of the same length:

In [67]: pd.Series(['foo', 'bar', 'baz']) == pd.Index(['foo', 'bar', 'qux'])
Out[67]: 
0     True
1     True
2    False
dtype: bool

In [68]: pd.Series(['foo', 'bar', 'baz']) == np.array(['foo', 'bar', 'qux'])
Out[68]: 
0     True
1     True
2    False
dtype: bool

Comparing Index or Series objects of different lengths raises a ValueError:

In [55]: pd.Series(['foo', 'bar', 'baz']) == pd.Series(['foo', 'bar'])
ValueError: Series lengths must match to compare

In [56]: pd.Series(['foo', 'bar', 'baz']) == pd.Series(['foo'])
ValueError: Series lengths must match to compare
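A sketch that exercises both cases with made-up data. The exact error message varies between pandas versions, so it only checks that a ValueError is raised.

```python
import numpy as np
import pandas as pd

left = pd.Series(["foo", "bar", "baz"])

# Same length: element-wise comparison
same_len = left == np.array(["foo", "bar", "qux"])
print(same_len.tolist())   # [True, True, False]

# Different length: pandas raises instead of broadcasting
try:
    left == pd.Series(["foo", "bar"])
    raised = False
except ValueError as err:
    raised = True
    print(err)             # wording differs between pandas versions
```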

Note that this differs from NumPy, where a comparison can be broadcast:

In [69]: np.array([1, 2, 3]) == np.array([2])
Out[69]: array([False,  True, False])

If broadcasting is not possible, older NumPy versions return False (newer NumPy releases raise an error instead):

In [70]: np.array([1, 2, 3]) == np.array([1, 2])
Out[70]: False

Combining overlapping data sets

A problem that occasionally arises is combining two similar data sets where the values in one are preferred over the other. An example would be two data series representing the same economic indicator, one considered "higher quality" and the other "lower quality". However, the lower-quality series might extend further back in history or have more complete coverage. So we want to combine the two DataFrame objects such that missing values in one DataFrame are conditionally filled with like-labeled values from the other. The function implementing this operation is combine_first(), illustrated below:

In [71]: df1 = pd.DataFrame({'A': [1., np.nan, 3., 5., np.nan],
   ....:                     'B': [np.nan, 2., 3., np.nan, 6.]})
   ....: 

In [72]: df2 = pd.DataFrame({'A': [5., 2., 4., np.nan, 3., 7.],
   ....:                     'B': [np.nan, np.nan, 3., 4., 6., 8.]})
   ....: 

In [73]: df1
Out[73]: 
     A    B
0  1.0  NaN
1  NaN  2.0
2  3.0  3.0
3  5.0  NaN
4  NaN  6.0

In [74]: df2
Out[74]: 
     A    B
0  5.0  NaN
1  2.0  NaN
2  4.0  3.0
3  NaN  4.0
4  3.0  6.0
5  7.0  8.0

In [75]: df1.combine_first(df2)
Out[75]: 
     A    B
0  1.0  NaN
1  2.0  2.0
2  3.0  3.0
3  5.0  4.0
4  3.0  6.0
5  7.0  8.0
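The same method exists on Series. In this sketch (the names preferred and backup are made up to mirror the "high quality" / "low quality" example above), the preferred values win wherever they exist, the gap at label 1 is filled from the backup, and label 3, which only the backup has, is added.

```python
import numpy as np
import pandas as pd

preferred = pd.Series([1.0, np.nan, 3.0], index=[0, 1, 2])    # has a gap
backup = pd.Series([9.0, 2.0, 9.0, 4.0], index=[0, 1, 2, 3])   # longer history

merged = preferred.combine_first(backup)
print(merged.tolist())   # [1.0, 2.0, 3.0, 4.0]
```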

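General DataFrame combine

The combine_first() method above calls the more general DataFrame.combine(). This method takes another DataFrame and a combiner function, aligns the two DataFrames, and then passes pairs of Series (i.e. columns with the same name) to the combiner function.

So, for instance, the following reproduces combine_first() from above: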

In [76]: def combiner(x, y):
   ....:     return np.where(pd.isna(x), y, x)
   ....:
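A self-contained sketch of how combiner is passed to DataFrame.combine(), rebuilding df1 and df2 from In [71] and In [72] and checking that the result matches combine_first():

```python
import numpy as np
import pandas as pd

def combiner(x, y):
    # Keep x where it has a value; take y where x is missing
    return np.where(pd.isna(x), y, x)

df1 = pd.DataFrame({"A": [1., np.nan, 3., 5., np.nan],
                    "B": [np.nan, 2., 3., np.nan, 6.]})
df2 = pd.DataFrame({"A": [5., 2., 4., np.nan, 3., 7.],
                    "B": [np.nan, np.nan, 3., 4., 6., 8.]})

via_combine = df1.combine(df2, combiner)
print(via_combine)
print(via_combine.equals(df1.combine_first(df2)))   # True
```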

Last edited: 2020.10.26 10:41:15
