Python Pandas: Data Operations with [ ]
This article covers operations that use "[ ]" in Pandas, such as selecting and modifying data.
"[ ]" is arguably the most basic way to select data. You can pass it any of the following:
- a single column label;
- a list of column labels;
- a slice;
- a boolean index (mask).
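A minimal sketch of the four key types side by side. It uses a small hypothetical frame with deterministic values (instead of the article's random ones) so the results are easy to check:

```python
import numpy as np
import pandas as pd

# Hypothetical 4 x 3 frame with a date index and values 0..11
df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=pd.date_range('2020-01-01', periods=4),
                  columns=['A', 'B', 'C'])

col = df['A']               # single column label -> Series
cols = df[['C', 'A']]       # list of labels -> DataFrame, columns in list order
rows = df[1:3]              # slice -> rows at positions 1 and 2
filtered = df[df['B'] > 4]  # boolean Series -> rows where the condition holds

print(type(col).__name__, type(cols).__name__)     # Series DataFrame
print(cols.columns.tolist())                       # ['C', 'A']
print(rows['A'].tolist(), filtered['B'].tolist())  # [3, 6] [7, 10]
```

Note the asymmetry: a label or a list of labels selects columns, while a slice or a boolean Series selects rows.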
Preparing the data
import pandas as pd
import numpy as np
dates = pd.date_range('1/1/2020', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('ABCD'))
df
out:
A B C D
2020-01-01 0.336131 -0.086456 0.096903 -1.230599
2020-01-02 -0.106293 0.111821 1.165342 -1.378462
2020-01-03 -0.933779 0.898738 0.013194 -0.593243
2020-01-04 0.190229 -1.108908 0.597650 2.759475
2020-01-05 -0.647080 1.573537 1.357191 -0.536916
2020-01-06 -0.455373 1.342904 -0.316548 0.145119
2020-01-07 -1.350214 -0.044642 0.501508 1.969973
2020-01-08 -0.474602 -0.384916 1.829222 0.853519
Passing a list
Pass a list of column labels to get a DataFrame whose columns appear in the order given in the list.
df[['C','D']]
out:
C D
2020-01-01 0.096903 -1.230599
2020-01-02 1.165342 -1.378462
2020-01-03 0.013194 -0.593243
2020-01-04 0.597650 2.759475
2020-01-05 1.357191 -0.536916
2020-01-06 -0.316548 0.145119
2020-01-07 0.501508 1.969973
2020-01-08 1.829222 0.853519
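The ordering rule, and what happens with a label that is not a column, can be checked on a tiny hypothetical frame. This sketch assumes a current pandas release (1.x or 2.x), where a list containing an unknown label raises KeyError:

```python
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

sub = df[['D', 'A']]          # columns come back in the order of the list
print(sub.columns.tolist())   # ['D', 'A']

# A label that is not a column raises KeyError instead of being skipped
try:
    df[['A', 'X']]
    missing_raised = False
except KeyError as exc:
    missing_raised = True
    print('KeyError:', exc)
```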
Passing a single column
Passing a single column label returns a Series; passing a list returns a DataFrame, even when the list has only one element.
df['C']
out:
2020-01-01 0.096903
2020-01-02 1.165342
2020-01-03 0.013194
2020-01-04 0.597650
2020-01-05 1.357191
2020-01-06 -0.316548
2020-01-07 0.501508
2020-01-08 1.829222
Freq: D, Name: C, dtype: float64
df[['C']]
out:
C
2020-01-01 0.096903
2020-01-02 1.165342
2020-01-03 0.013194
2020-01-04 0.597650
2020-01-05 1.357191
2020-01-06 -0.316548
2020-01-07 0.501508
2020-01-08 1.829222
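One way to confirm the difference is to inspect the type, shape and labels of both results. This sketch rebuilds a frame like the article's df (random values, so only the structure matters); squeeze(axis='columns') is shown as one way to turn the one-column DataFrame back into a Series:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2020', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('ABCD'))

s = df['C']     # Series: one dimension, named 'C'
f = df[['C']]   # DataFrame: two dimensions, a single column 'C'

print(type(s).__name__, s.shape, s.name)              # Series (8,) C
print(type(f).__name__, f.shape, f.columns.tolist())  # DataFrame (8, 1) ['C']

# Collapse the single column to get the Series back
back = f.squeeze(axis='columns')
print(back.equals(s))  # True
```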
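This can be used to swap the values of two columns: with a column list on both sides of [], the right-hand columns are assigned to the left-hand labels by position.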
df[['A','B']] = df[['B','A']]
df
out:
A B C D
2020-01-01 -0.086456 0.336131 0.096903 -1.230599
2020-01-02 0.111821 -0.106293 1.165342 -1.378462
2020-01-03 0.898738 -0.933779 0.013194 -0.593243
2020-01-04 -1.108908 0.190229 0.597650 2.759475
2020-01-05 1.573537 -0.647080 1.357191 -0.536916
2020-01-06 1.342904 -0.455373 -0.316548 0.145119
2020-01-07 -0.044642 -1.350214 0.501508 1.969973
2020-01-08 -0.384916 -0.474602 1.829222 0.853519
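A small deterministic sketch of this swap, on a hypothetical frame whose values are easy to track:

```python
import pandas as pd

# Hypothetical frame with easy-to-track values
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [0, 0, 0]})

# Right-hand column 'B' goes into 'A', right-hand 'A' goes into 'B'
df[['A', 'B']] = df[['B', 'A']]

print(df['A'].tolist())  # [10, 20, 30]
print(df['B'].tolist())  # [1, 2, 3]
```

Column 'C' is not part of either list and is left untouched.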
The following looks like another way to swap a subset of columns:
df.loc[:, ['A', 'B']] = df[['B', 'A']]
df
out:
A B C D
2020-01-01 -0.086456 0.336131 0.096903 -1.230599
2020-01-02 0.111821 -0.106293 1.165342 -1.378462
2020-01-03 0.898738 -0.933779 0.013194 -0.593243
2020-01-04 -1.108908 0.190229 0.597650 2.759475
2020-01-05 1.573537 -0.647080 1.357191 -0.536916
2020-01-06 1.342904 -0.455373 -0.316548 0.145119
2020-01-07 -0.044642 -1.350214 0.501508 1.969973
2020-01-08 -0.384916 -0.474602 1.829222 0.853519
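The assignment above does not swap the columns: .loc aligns the right-hand DataFrame on its column labels before assigning, so A receives A and B receives B. To swap them, assign the raw values instead: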
df.loc[:, ['A', 'B']] = df[['B', 'A']].values
df
out:
A B C D
2020-01-01 0.336131 -0.086456 0.096903 -1.230599
2020-01-02 -0.106293 0.111821 1.165342 -1.378462
2020-01-03 -0.933779 0.898738 0.013194 -0.593243
2020-01-04 0.190229 -1.108908 0.597650 2.759475
2020-01-05 -0.647080 1.573537 1.357191 -0.536916
2020-01-06 -0.455373 1.342904 -0.316548 0.145119
2020-01-07 -1.350214 -0.044642 0.501508 1.969973
2020-01-08 -0.474602 -0.384916 1.829222 0.853519
to_numpy() works for the swap as well; applied to the current df, it swaps A and B once more:
df.loc[:, ['A', 'B']] = df[['B', 'A']].to_numpy()
df
out:
A B C D
2020-01-01 -0.086456 0.336131 0.096903 -1.230599
2020-01-02 0.111821 -0.106293 1.165342 -1.378462
2020-01-03 0.898738 -0.933779 0.013194 -0.593243
2020-01-04 -1.108908 0.190229 0.597650 2.759475
2020-01-05 1.573537 -0.647080 1.357191 -0.536916
2020-01-06 1.342904 -0.455373 -0.316548 0.145119
2020-01-07 -0.044642 -1.350214 0.501508 1.969973
2020-01-08 -0.384916 -0.474602 1.829222 0.853519
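The contrast between the last three assignments can be reproduced on a small hypothetical frame. A DataFrame on the right of .loc is aligned on its labels, so nothing moves; a NumPy array (from .values or .to_numpy()) carries no labels and is written by position:

```python
import pandas as pd

base = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# .loc + DataFrame: the right-hand side is aligned on column labels first,
# so 'A' gets the old 'A' and 'B' gets the old 'B'.
aligned = base.copy()
aligned.loc[:, ['A', 'B']] = aligned[['B', 'A']]

# .loc + ndarray: no labels to align on, so values land by position.
swapped = base.copy()
swapped.loc[:, ['A', 'B']] = swapped[['B', 'A']].to_numpy()

print(aligned['A'].tolist(), aligned['B'].tolist())  # [1, 2, 3] [10, 20, 30]
print(swapped['A'].tolist(), swapped['B'].tolist())  # [10, 20, 30] [1, 2, 3]
```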
Using slices
Get the first two rows
df[:2]
out:
A B C D
2020-01-01 -0.086456 0.336131 0.096903 -1.230599
2020-01-02 0.111821 -0.106293 1.165342 -1.378462
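An integer slice inside [] selects rows by position with the end excluded, matching .iloc. On a DatetimeIndex like this one, a slice of date strings selects rows by label instead, and label slices include the end point. A sketch on a frame rebuilt like the article's df:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2020', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('ABCD'))

first_two = df[:2]                    # positions 0 and 1
print(first_two.equals(df.iloc[:2]))  # True

by_label = df['2020-01-02':'2020-01-04']  # label slice: both ends included
print(by_label.index.strftime('%Y-%m-%d').tolist())
# ['2020-01-02', '2020-01-03', '2020-01-04']
```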
Setting a step
df[::2]
out:
A B C D
2020-01-01 -0.086456 0.336131 0.096903 -1.230599
2020-01-03 0.898738 -0.933779 0.013194 -0.593243
2020-01-05 1.573537 -0.647080 1.357191 -0.536916
2020-01-07 -0.044642 -1.350214 0.501508 1.969973
df[1::2]
out:
A B C D
2020-01-02 0.111821 -0.106293 1.165342 -1.378462
2020-01-04 -1.108908 0.190229 0.597650 2.759475
2020-01-06 1.342904 -0.455373 -0.316548 0.145119
2020-01-08 -0.384916 -0.474602 1.829222 0.853519
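The step counts positions, not dates: ::2 keeps positions 0, 2, 4, 6 and 1::2 keeps positions 1, 3, 5, 7. A sketch with a hypothetical frame of consecutive integers makes this easy to see:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2020', periods=8)
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=dates, columns=list('ABCD'))

even = df[::2]  # positions 0, 2, 4, 6
odd = df[1::2]  # positions 1, 3, 5, 7

print(even.index.day.tolist())  # [1, 3, 5, 7]
print(odd.index.day.tolist())   # [2, 4, 6, 8]
print(even['A'].tolist())       # [0, 8, 16, 24]
```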
Reversing the row order
df[::-1]
out:
A B C D
2020-01-08 -0.384916 -0.474602 1.829222 0.853519
2020-01-07 -0.044642 -1.350214 0.501508 1.969973
2020-01-06 1.342904 -0.455373 -0.316548 0.145119
2020-01-05 1.573537 -0.647080 1.357191 -0.536916
2020-01-04 -1.108908 0.190229 0.597650 2.759475
2020-01-03 0.898738 -0.933779 0.013194 -0.593243
2020-01-02 0.111821 -0.106293 1.165342 -1.378462
2020-01-01 -0.086456 0.336131 0.096903 -1.230599
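df[::-1] is an ordinary slice with step -1, so it reverses the rows by position. Because the index here is already sorted by date, the result should match a descending sort of the index; the sketch below (hypothetical integer data) checks both:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2020', periods=8)
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=dates, columns=list('ABCD'))

rev = df[::-1]
print(rev.index[0].date(), rev.index[-1].date())  # 2020-01-08 2020-01-01

# Same rows as a descending sort of the (already ascending) index
print(rev.equals(df.sort_index(ascending=False)))  # True
```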
Assigning through a slice
df[:2] = np.arange(8).reshape(2,4)
df
out:
A B C D
2020-01-01 0.000000 1.000000 2.000000 3.000000
2020-01-02 4.000000 5.000000 6.000000 7.000000
2020-01-03 0.898738 -0.933779 0.013194 -0.593243
2020-01-04 -1.108908 0.190229 0.597650 2.759475
2020-01-05 1.573537 -0.647080 1.357191 -0.536916
2020-01-06 1.342904 -0.455373 -0.316548 0.145119
2020-01-07 -0.044642 -1.350214 0.501508 1.969973
2020-01-08 -0.384916 -0.474602 1.829222 0.853519
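The right-hand side of a slice assignment is written into the selected rows by position, as with .iloc. A 2 x 4 array fills the two rows cell by cell, and a scalar is broadcast to every cell of the slice. A sketch on a frame rebuilt like the article's df:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2020', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('ABCD'))

df[:2] = np.arange(8).reshape(2, 4)  # rows 0-1 receive 0..7, row by row
df[2:4] = 0.0                        # rows 2-3 set to 0.0 in every column

print(df.iloc[:4])
```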
Using boolean indexing
df = pd.DataFrame(np.random.randn(8,4),index=dates,columns=list('abcd'))
df
out:
a b c d
2020-01-01 -1.749988 -0.249398 -1.165277 -0.806687
2020-01-02 0.026334 0.158118 0.341183 -1.042534
2020-01-03 0.513027 -0.127235 -0.454433 -0.162600
2020-01-04 1.719313 -1.417885 0.267647 -0.960537
2020-01-05 -0.259797 -0.851702 -0.873451 -0.476420
2020-01-06 -0.048619 -0.690095 0.759120 1.184295
2020-01-07 -0.748535 -1.252718 0.386220 -0.415996
2020-01-08 -0.497471 -0.550428 -0.867333 -0.109223
mask = df['a'] > 0
mask
out:
2020-01-01 False
2020-01-02 True
2020-01-03 True
2020-01-04 True
2020-01-05 False
2020-01-06 False
2020-01-07 False
2020-01-08 False
Freq: D, Name: a, dtype: bool
df[mask]
out:
a b c d
2020-01-02 0.026334 0.158118 0.341183 -1.042534
2020-01-03 0.513027 -0.127235 -0.454433 -0.162600
2020-01-04 1.719313 -1.417885 0.267647 -0.960537
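Passing a boolean Series to [] keeps the rows where it is True; df.loc[mask] is the explicit equivalent. A sketch on a frame rebuilt like the article's (random values, so the selected rows differ from run to run):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2020', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('abcd'))

mask = df['a'] > 0  # boolean Series on the same index as df
print(mask.dtype)   # bool

selected = df[mask]
print(selected.equals(df.loc[mask]))     # True
print(len(selected) == int(mask.sum()))  # True
```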
Multiple conditions
mask2 = df['b'] < 0
df[mask & mask2]
out:
a b c d
2020-01-03 0.513027 -0.127235 -0.454433 -0.162600
2020-01-04 1.719313 -1.417885 0.267647 -0.960537
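Conditions can also be combined inline with & (and), | (or) and ~ (not). Each comparison needs its own parentheses because & and | bind more tightly than > and <, and Python's own and / or cannot be applied to a Series. A sketch with a hypothetical frame:

```python
import pandas as pd

# Hypothetical data chosen so each combination selects different rows
df = pd.DataFrame({'a': [1.0, -2.0, 3.0, 0.5, -1.0],
                   'b': [-1.0, -1.0, 2.0, -0.5, 1.0]})

both = df[(df['a'] > 0) & (df['b'] < 0)]        # a > 0 AND b < 0
either = df[(df['a'] > 0) | (df['b'] < 0)]      # a > 0 OR b < 0
neither = df[~((df['a'] > 0) | (df['b'] < 0))]  # NOT (a > 0 OR b < 0)

print(both.index.tolist(), either.index.tolist(), neither.index.tolist())
# [0, 3] [0, 1, 2, 3] [4]

# `and` asks a whole Series for a single True/False, which raises ValueError
try:
    df[(df['a'] > 0) and (df['b'] < 0)]
    and_raised = False
except ValueError as exc:
    and_raised = True
    print('ValueError:', exc)
```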
Modifying data with a boolean index
df[mask & mask2] = np.arange(8).reshape(2,4)
df
out:
a b c d
2020-01-01 -1.749988 -0.249398 -1.165277 -0.806687
2020-01-02 0.026334 0.158118 0.341183 -1.042534
2020-01-03 0.000000 1.000000 2.000000 3.000000
2020-01-04 4.000000 5.000000 6.000000 7.000000
2020-01-05 -0.259797 -0.851702 -0.873451 -0.476420
2020-01-06 -0.048619 -0.690095 0.759120 1.184295
2020-01-07 -0.748535 -1.252718 0.386220 -0.415996
2020-01-08 -0.497471 -0.550428 -0.867333 -0.109223
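df[mask] = ... overwrites every column of the selected rows. To change only some columns of those rows, .loc[mask, column] is the usual tool. A sketch contrasting the two on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, -2.0, 3.0], 'b': [-1.0, 4.0, -5.0]})
mask = (df['a'] > 0) & (df['b'] < 0)  # True for rows 0 and 2

rows_zeroed = df.copy()
rows_zeroed[mask] = 0.0               # every column of rows 0 and 2 becomes 0.0

one_col = df.copy()
one_col.loc[mask, 'b'] = 99.0         # only column 'b' of rows 0 and 2 changes

print(rows_zeroed.to_numpy().tolist())  # [[0.0, 0.0], [-2.0, 4.0], [0.0, 0.0]]
print(one_col['b'].tolist())            # [99.0, 4.0, 99.0]
```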