Python Data Analysis - 07

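1. The DataFrame object

A DataFrame arranges several columns of data in a fixed order, and each column may hold a different data type.

A DataFrame has two index arrays. The first is associated with the rows and is very similar to the index array of a Series: each label is associated with all the elements in its row. The second contains a series of labels, each associated with one column of data.

A DataFrame can be thought of as a dict of Series: each column name is a key, and the Series that forms that column is the corresponding value.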
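
The dict-of-Series view can be tried out directly. The snippet below is only a minimal sketch and is not part of the original notes; the names colors, prices and frame, and the row labels r1 and r2, are made up for illustration.

```python
import pandas as pd

# Two Series that share the same row labels (hypothetical example data).
colors = pd.Series(['red', 'blue'], index=['r1', 'r2'])
prices = pd.Series([1.1, 1.2], index=['r1', 'r2'])

# Each dict key becomes a column name; each Series becomes that column.
frame = pd.DataFrame({'colors': colors, 'price': prices})

print(frame.columns.tolist())   # the column labels
print(frame.index.tolist())     # the row labels
print(type(frame['price']))     # a single column is itself a Series
```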

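2. Defining a DataFrame object

The most common way to create a DataFrame is to pass a dict to the DataFrame() constructor.

Each key of the dict is a column name, and each key has an array as its value.

1) Put every key-value pair of the dict into the DataFrame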

>>> import pandas as pd  # import the pandas package
>>> data={'colors':['red','blue','yellow','black'],'object':['pen','paper','ball','mug'],'price':[1.1,1.2,3.2,4]}  # each key becomes a column name of the DataFrame; each value holds that column's elements (named data so the built-in dict is not shadowed)
>>> data
{'object': ['pen', 'paper', 'ball', 'mug'], 'price': [1.1, 1.2, 3.2, 4], 'colors': ['red', 'blue', 'yellow', 'black']}
>>> s=pd.DataFrame(data)  # pass the dict to the DataFrame constructor
>>> s
   colors object  price
0     red    pen    1.1
1    blue  paper    1.2
2  yellow   ball    3.2
3   black    mug    4.0

2) Initialize the DataFrame from only some of the dict's key-value pairs

>>> import pandas as pd  # import the pandas package
>>> dic={'colors':['red','black','yellow','orange'],'object':['pen','ball','shirt','mug'],'price':[1.2,3.4,2.3,5]}  # define the dict
>>> dic
{'object': ['pen', 'ball', 'shirt', 'mug'], 'price': [1.2, 3.4, 2.3, 5], 'colors': ['red', 'black', 'yellow', 'orange']}
>>> s=pd.DataFrame(dic,columns=['price','object'])  # use only two of the dict's columns, in the order given here
>>> s
   price object
0    1.2    pen
1    3.4   ball
2    2.3  shirt
3    5.0    mug

3) Give the DataFrame a custom index (the examples above do not define one, so pandas assigns a default index starting from 0)
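
The notes give no example for this step, so here is a minimal sketch of one way to do it: pass a list of row labels through the constructor's index option. The labels 'one' to 'four' are invented for illustration; there has to be exactly one label per row.

```python
import pandas as pd

data = {'colors': ['red', 'blue', 'yellow', 'black'],
        'object': ['pen', 'paper', 'ball', 'mug'],
        'price': [1.1, 1.2, 3.2, 4]}

# Hypothetical row labels; the list length must match the number of rows.
frame = pd.DataFrame(data, index=['one', 'two', 'three', 'four'])

print(frame)
print(frame.loc['two', 'object'])   # look up a cell by row label and column name
```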

4) Define a DataFrame without a dict, using three constructor arguments

Pass three arguments in this order: the data matrix, the index option, and the columns option. Assign the array of row labels to index and the array of column names to columns. np.arange(16).reshape(4,4) is a quick way to generate the matrix.

>>> import numpy as np
>>> import pandas as pd
>>> arry=np.arange(16)
>>> arry
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
>>> arry=np.arange(16).reshape(4,4)
>>> arry
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
>>> s=pd.DataFrame(arry,index=['a','b','c','d'],columns=['A','B','C','D'])
>>> s
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15

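3. Selecting elements

1) To get the names of all the columns of a DataFrame, use its columns attribute

2) To get the list of index labels of a DataFrame, use its index attribute

3) To get the elements stored in the data structure, use its values attribute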

>>> import numpy as np
>>> import pandas as pd
>>> arry=np.arange(16)
>>> arry
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
>>> arry=np.arange(16).reshape(4,4)
>>> arry
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
>>> s=pd.DataFrame(arry,columns=['a','b','c','d'])
>>> s
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
>>> s=pd.DataFrame(arry,index=['a','b','c','d'],columns=['A','B','C','D'])
>>> s
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
>>> s.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> s.columns
Index(['A', 'B', 'C', 'D'], dtype='object')
>>> s.values
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

4) To get the contents of one column, use the column name as the index, or access the column name as an attribute

First approach

>>> s['B']
a     1
b     5
c     9
d    13
Name: B, dtype: int64

Second approach

>>> s.B
a     1
b     5
c     9
d    13
Name: B, dtype: int64

5) To get a row of a DataFrame, use the loc attribute with the row's label or the iloc attribute with its position (the ix attribute that older pandas versions offered for this has since been removed)

Getting a single row

>>> s
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
>>> s.iloc[2]
A     8
B     9
C    10
D    11
Name: c, dtype: int64
>>> s.loc['c']
A     8
B     9
C    10
D    11
Name: c, dtype: int64

Getting several rows (non-contiguous)

>>> s.iloc[[1,3]]
    A   B   C   D
b   4   5   6   7
d  12  13  14  15
>>> s.loc[['b','d']]
    A   B   C   D
b   4   5   6   7
d  12  13  14  15

Getting several rows (contiguous)

>>> s.iloc[0:3]
   A  B   C   D
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11
>>> s.loc['a':'c']
   A  B   C   D
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11

Getting a single element

>>> s['A'].iloc[1]  # note: select the column ['A'] first, then the row at position 1
4
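
The difference between label-based and position-based selection is easy to mix up, so here is a short sketch that puts the two side by side on the same 4x4 frame. It is an illustration added to these notes, not the author's code, and the variable names are arbitrary.

```python
import numpy as np
import pandas as pd

s = pd.DataFrame(np.arange(16).reshape(4, 4),
                 index=['a', 'b', 'c', 'd'],
                 columns=['A', 'B', 'C', 'D'])

row_by_label = s.loc['c']        # label-based row selection
row_by_position = s.iloc[2]      # position-based row selection
label_slice = s.loc['a':'c']     # a label slice includes the end label
position_slice = s.iloc[0:3]     # a position slice excludes the end position
cell = s.loc['b', 'A']           # one element: row label first, then column label

print(row_by_label.equals(row_by_position))
print(label_slice.equals(position_slice))
print(cell)
```

Both slices return rows a to c, which is why the label slice ends at 'c' while the position slice ends at 3.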

4. Assignment

1) Give a name to index and to columns

>>> import numpy as np
>>> import pandas as pd
>>> arry=np.arange(16)
>>> arry
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
>>> arry=np.arange(16).reshape(4,4)
>>> arry
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
>>> s=pd.DataFrame(arry,index=['a','b','c','d'],columns=['A','B','C','D'])
>>> s
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
>>> s.columns.name='item'
>>> s.index.name='id'  # the name must be the string 'id'; a bare id would assign Python's built-in id function
>>> s
item   A   B   C   D
id
a      0   1   2   3
b      4   5   6   7
c      8   9  10  11
d     12  13  14  15

2) Add a new column

>>> s
item   A   B   C   D
id
a      0   1   2   3
b      4   5   6   7
c      8   9  10  11
d     12  13  14  15
>>> s['E']=12
>>> s
item   A   B   C   D   E
id
a      0   1   2   3  12
b      4   5   6   7  12
c      8   9  10  11  12
d     12  13  14  15  12

3) Update the values of an existing column

>>> s['E']=[3,5,2,6]
>>> s
item   A   B   C   D  E
id
a      0   1   2   3  3
b      4   5   6   7  5
c      8   9  10  11  2
d     12  13  14  15  6

5. Element membership

>>> s
item   A   B   C   D   E   F
id
a      0   1   2   3 NaN NaN
b      4   5   6   7 NaN NaN
c      8   9  10  11 NaN NaN
d     12  13  14  15 NaN NaN
>>> s.isin([1,4])
item      A      B      C      D      E      F
id
a     False   True  False  False  False  False
b      True  False  False  False  False  False
c     False  False  False  False  False  False
d     False  False  False  False  False  False
>>> s[s.isin([1,4])]
item    A    B   C   D   E   F
id
a     NaN  1.0 NaN NaN NaN NaN
b     4.0  NaN NaN NaN NaN NaN
c     NaN  NaN NaN NaN NaN NaN
d     NaN  NaN NaN NaN NaN NaN

6. Deleting a column

>>> s
item   A   B   C   D   E   F
id
a      0   1   2   3 NaN NaN
b      4   5   6   7 NaN NaN
c      8   9  10  11 NaN NaN
d     12  13  14  15 NaN NaN
>>> del s['E']
>>> s
item   A   B   C   D   F
id
a      0   1   2   3 NaN
b      4   5   6   7 NaN
c      8   9  10  11 NaN
d     12  13  14  15 NaN

7. Filtering

>>> s[s<3]
item    A    B    C   D   F
id
a     0.0  1.0  2.0 NaN NaN
b     NaN  NaN  NaN NaN NaN
c     NaN  NaN  NaN NaN NaN
d     NaN  NaN  NaN NaN NaN
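
Filtering with a whole-frame condition keeps the shape and blanks out the cells that fail the test. A common variation, sketched below as an added illustration rather than something from the original notes, is to filter whole rows with a condition on one column. The threshold 4 is arbitrary.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['a', 'b', 'c', 'd'],
                     columns=['A', 'B', 'C', 'D'])

masked = frame[frame < 3]      # same shape; cells that fail the test become NaN
rows = frame[frame['A'] > 4]   # boolean Series: keeps only the rows where A > 4

print(masked)
print(rows)
```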

8. Building a DataFrame from a nested dict

When a nested dict is passed to the DataFrame constructor, pandas uses the outer keys as column names and the inner keys as row index labels. Not every position has a corresponding element; pandas fills the gaps with NaN.

>>> import pandas as pd
>>> dic={'red':{2012:22,2013:33},'white':{2011:13,2012:22,2013:16},'blue':{2017:17,2012:23,2018:18}}
>>> dic
{'blue': {2017: 17, 2018: 18, 2012: 23}, 'white': {2011: 13, 2012: 22, 2013: 16}, 'red': {2012: 22, 2013: 33}}
>>> s=pd.DataFrame(dic)
>>> s
      blue   red  white
2011   NaN   NaN   13.0
2012  23.0  22.0   22.0
2013   NaN  33.0   16.0
2017  17.0   NaN    NaN
2018  18.0   NaN    NaN
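
To check which keys end up where, the sketch below builds a smaller nested dict both ways. It is an added illustration: the first call is the constructor used above, and the second uses pandas' DataFrame.from_dict with orient='index' for the case where the outer keys should become rows instead.

```python
import pandas as pd

dic = {'red': {2012: 22, 2013: 33},
       'white': {2011: 13, 2012: 22, 2013: 16}}

by_columns = pd.DataFrame(dic)                          # outer keys become columns
by_rows = pd.DataFrame.from_dict(dic, orient='index')   # outer keys become rows

print(by_columns)
print(by_rows)
```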

9. Transposing a DataFrame

>>> s
      blue   red  white
2011   NaN   NaN   13.0
2012  23.0  22.0   22.0
2013   NaN  33.0   16.0
2017  17.0   NaN    NaN
2018  18.0   NaN    NaN
>>> s.T  # accessing the T attribute is all it takes
       2011  2012  2013  2017  2018
blue    NaN  23.0   NaN  17.0  18.0
red     NaN  22.0  33.0   NaN   NaN
white  13.0  22.0  16.0   NaN   NaN

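10. The Index object

The Index object of a Series or DataFrame is immutable: once declared, its individual labels cannot be modified in place.

11. Index-related methods

The idxmin() and idxmax() functions, called on a Series or DataFrame, return the index labels of the smallest and largest values respectively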
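
A small sketch of both points, added here for illustration with made-up data: idxmin() and idxmax() return labels rather than values, and trying to change one label of an Index in place raises a TypeError, while replacing the whole index is allowed.

```python
import pandas as pd

s = pd.Series([5, 0, 3, 8], index=['red', 'blue', 'yellow', 'white'])

lowest = s.idxmin()    # label of the smallest value
highest = s.idxmax()   # label of the largest value
print(lowest, highest)

# An Index does not support item assignment, so this raises TypeError.
error = None
try:
    s.index[0] = 'green'
except TypeError as exc:
    error = exc

# Assigning a whole new index to the Series is allowed.
s.index = ['r', 'b', 'y', 'w']
print(s)
```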

12. An index with duplicate labels

>>> import pandas as pd
>>> s=pd.Series(range(6),index=['a','a','b','c','c','d'])
>>> s
a    0
a    1
b    2
c    3
c    4
d    5
dtype: int64
>>> s['a']
a    0
a    1
dtype: int64
>>> s.index.is_unique  # True only if the index contains no duplicate labels
False

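13. Reindexing

The pandas reindex function replaces the index of a Series: it rearranges the elements of the original Series according to the new sequence of labels and returns a new Series object.

When reindexing, you can change the order of the labels in the index sequence, remove labels, or add new ones.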

>>> import pandas as pd
>>> s=pd.Series([1,2,3,4],index=['a','b','c','d'])
>>> s
a    1
b    2
c    3
d    4
dtype: int64
>>> s.reindex(['e','f','g','b'])
e    NaN
f    NaN
g    NaN
b    2.0
dtype: float64

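However, spelling out the new index by hand as above does not work well for a large DataFrame. In that case an automatic fill or interpolation method can be used, as follows: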

>>> import pandas as pd
>>> s=pd.Series([1,5,6,3],index=[0,3,5,6])
>>> s
0    1
3    5
5    6
6    3
dtype: int64
>>> s.reindex(range(6),method='ffill')  # reindex s over the labels 0 to 5; ffill gives each new label the value of the nearest smaller existing label
0    1
1    1
2    1
3    5
4    5
5    6
dtype: int64
>>> s=pd.Series([1,5,6,3],index=[0,3,5,6])
>>> s
0    1
3    5
5    6
6    3
dtype: int64
>>> s.reindex(range(6),method='bfill')  # bfill fills each new label with the value of the next existing label
0    1
1    5
2    5
3    5
4    6
5    6
dtype: int64

>>> dic={'colors':['blue','green','yellow','red','white'],'price':[1.2,1.0,0.6,0.9,1.7],'object':['ballpand','pen','pencil','paper','mug']}  # define a dict of lists
>>> dic
{'object': ['ballpand', 'pen', 'pencil', 'paper', 'mug'], 'price': [1.2, 1.0, 0.6, 0.9, 1.7], 'colors': ['blue', 'green', 'yellow', 'red', 'white']}
>>> s=pd.DataFrame(dic)  # build s from the dict
>>> s
   colors    object  price
0    blue  ballpand    1.2
1   green       pen    1.0
2  yellow    pencil    0.6
3     red     paper    0.9
4   white       mug    1.7
>>> s.reindex(range(5),method='ffill',columns=['colors','price','new','object'])  # add the column label new
   colors  price     new    object
0    blue    1.2    blue  ballpand
1   green    1.0   green       pen
2  yellow    0.6  yellow    pencil
3     red    0.9     red     paper
4   white    1.7   white       mug
>>> s=pd.DataFrame(dic,index=[1,2,3,5,7] )  # a DataFrame with a custom index
>>> s
   colors    object  price
1    blue  ballpand    1.2
2   green       pen    1.0
3  yellow    pencil    0.6
5     red     paper    0.9
7   white       mug    1.7
>>> s.reindex(range(5),method='ffill')  # redefine the row index
   colors    object  price
0     NaN       NaN    NaN
1    blue  ballpand    1.2
2   green       pen    1.0
3  yellow    pencil    0.6
4  yellow    pencil    0.6
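
Besides ffill and bfill, reindex also accepts a fill_value option that puts one constant into every newly created position. The sketch below is an added illustration on the same Series as above; choosing 0 as the constant is an arbitrary assumption.

```python
import pandas as pd

s = pd.Series([1, 5, 6, 3], index=[0, 3, 5, 6])

# Labels 1, 2 and 4 do not exist in s, so they receive the constant 0.
filled = s.reindex(range(6), fill_value=0)
print(filled)
```

Unlike the NaN case, the result can stay an integer Series, because no missing values are introduced.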

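14. Dropping index entries

1) Drop one item from a Series

2) Drop several items from a Series: pass the labels to the drop function as a list

3) Drop certain rows from a DataFrame

4) Drop columns from a DataFrame: pass axis=1, which stands for columns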

>>> import numpy as np
>>> import pandas as pd
>>> s=pd.Series(np.arange(4),index=['red','blue','yellow','white'])
>>> s
red       0
blue      1
yellow    2
white     3
dtype: int64
>>> s.drop('yellow')  # drop one label from the Series together with its element
red      0
blue     1
white    3
dtype: int64
>>> s.drop(['red','white'])  # drop several labels from the Series
blue      1
yellow    2
dtype: int64
>>> frame=pd.DataFrame(np.arange(16).reshape(4,4),index=['red','blue','yellow','white'],columns=['ball','pen','pencil','paper'])
>>> frame
        ball  pen  pencil  paper
red        0    1       2      3
blue       4    5       6      7
yellow     8    9      10     11
white     12   13      14     15
>>> frame.drop(['blue','yellow'])  # drop several rows from the DataFrame
       ball  pen  pencil  paper
red       0    1       2      3
white    12   13      14     15
>>> frame.drop(['pen','pencil'],axis=1)  # drop several columns from the DataFrame; axis=1 must be specified
        ball  paper
red        0      3
blue       4      7
yellow     8     11
white     12     15

15. Arithmetic and data alignment

1) Adding two Series objects

>>> import pandas as pd
>>> s1=pd.Series([3,2,5,1],['white','yellow','green','blue'])
>>> s2=pd.Series([1,4,7,2,1],index=['white','yellow','black','blue','brown'])
>>> s1
white     3
yellow    2
green     5
blue      1
dtype: int64
>>> s2
white     1
yellow    4
black     7
blue      2
brown     1
dtype: int64
>>> s1+s2
black     NaN
blue      3.0
brown     NaN
green     NaN
white     4.0
yellow    6.0
dtype: float64

2) Adding two DataFrame objects

>>> import numpy as np
>>> frame1=pd.DataFrame(np.arange(16).reshape(4,4),index=['red','blue','yellow','white'],columns=['ball','pen','pencil','paper'])
>>> frame2=pd.DataFrame(np.arange(12).reshape(4,3),index=['blue','green','white','yellow'],columns=['mug','pen','ball'])
>>> frame1
        ball  pen  pencil  paper
red        0    1       2      3
blue       4    5       6      7
yellow     8    9      10     11
white     12   13      14     15
>>> frame2
        mug  pen  ball
blue      0    1     2
green     3    4     5
white     6    7     8
yellow    9   10    11
>>> frame1+frame2
        ball  mug  paper   pen  pencil
blue     6.0  NaN    NaN   6.0     NaN
green    NaN  NaN    NaN   NaN     NaN
red      NaN  NaN    NaN   NaN     NaN
white   20.0  NaN    NaN  20.0     NaN
yellow  19.0  NaN    NaN  19.0     NaN

The same results can be obtained with the following methods:

1) Adding Series objects

2) Adding DataFrame objects

>>> s1.add(s2)
black     NaN
blue      3.0
brown     NaN
green     NaN
white     4.0
yellow    6.0
dtype: float64
>>> frame1.add(frame2)
        ball  mug  paper   pen  pencil
blue     6.0  NaN    NaN   6.0     NaN
green    NaN  NaN    NaN   NaN     NaN
red      NaN  NaN    NaN   NaN     NaN
white   20.0  NaN    NaN  20.0     NaN
yellow  19.0  NaN    NaN  19.0     NaN
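
One practical reason to prefer the add() method over the + operator is its fill_value option, which treats a label missing on one side as a given constant instead of producing NaN. The sketch below is an added illustration using the two Series from above; using 0 as the constant is an assumption made for the example.

```python
import pandas as pd

s1 = pd.Series([3, 2, 5, 1], index=['white', 'yellow', 'green', 'blue'])
s2 = pd.Series([1, 4, 7, 2, 1], index=['white', 'yellow', 'black', 'blue', 'brown'])

plain = s1 + s2                     # a label missing on either side gives NaN
filled = s1.add(s2, fill_value=0)   # the missing side is treated as 0

print(plain)
print(filled)
```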

16. Operations between a DataFrame and a Series

1) The Series index matches the DataFrame's column names

>>> import numpy as np
>>> import pandas as pd
>>> s=pd.Series([1,2,3,4],index=['a','b','c','d'])
>>> frame=pd.DataFrame(np.arange(16).reshape(4,4),columns=['a','b','c','d'])
>>> s
a    1
b    2
c    3
d    4
dtype: int64
>>> frame
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
>>> s+frame  # each column of frame has the value of s with the matching label added to it
    a   b   c   d
0   1   3   5   7
1   5   7   9  11
2   9  11  13  15
3  13  15  17  19
>>> frame-s  # each column of frame has the value of s with the matching label subtracted from it
    a   b   c   d
0  -1  -1  -1  -1
1   3   3   3   3
2   7   7   7   7
3  11  11  11  11

2) The Series index differs from the DataFrame's column names

>>> frame2=pd.DataFrame(np.arange(16).reshape(4,4),columns=['b','d','e','c'])
>>> s
a    1
b    2
c    3
d    4
dtype: int64
>>> frame2
    b   d   e   c
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
>>> s+frame2
    a     b     c     d   e
0 NaN   2.0   6.0   5.0 NaN
1 NaN   6.0  10.0   9.0 NaN
2 NaN  10.0  14.0  13.0 NaN
3 NaN  14.0  18.0  17.0 NaN
>>> frame2-s
    a     b     c    d   e
0 NaN  -2.0   0.0 -3.0 NaN
1 NaN   2.0   4.0  1.0 NaN
2 NaN   6.0   8.0  5.0 NaN
3 NaN  10.0  12.0  9.0 NaN
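
The examples above align the Series index with the column names. When the Series should instead be matched against the row labels, the arithmetic methods take an axis option. The following sketch is an added illustration; row_offsets and its values are hypothetical.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4), columns=['a', 'b', 'c', 'd'])
row_offsets = pd.Series([10, 20, 30, 40], index=[0, 1, 2, 3])  # hypothetical values

# axis=0 aligns the Series index with the frame's row labels,
# so every row has its own offset subtracted.
result = frame.sub(row_offsets, axis=0)
print(result)
```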

17. Take the square root of every element of a DataFrame with NumPy's sqrt function

>>> frame
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
>>> np.sqrt(frame)
          a         b         c         d
0  0.000000  1.000000  1.414214  1.732051
1  2.000000  2.236068  2.449490  2.645751
2  2.828427  3.000000  3.162278  3.316625
3  3.464102  3.605551  3.741657  3.872983

18. Functions applied by row or by column

1) Apply a custom function to each column of a DataFrame

2) Apply a custom function to each row of a DataFrame

>>> f=lambda x:x.max()-x.min()
>>> frame.apply(f)  # the function receives each column of the DataFrame
a    12
b    12
c    12
d    12
dtype: int64
>>> frame.apply(f,axis=1)  # axis=1 means f receives each row of the DataFrame
0    3
1    3
2    3
3    3
dtype: int64

3) Use apply to turn one DataFrame into another, computing several statistics at once

>>> f=lambda x:pd.Series([x.min(),x.max()],index=['min','max'])  # f takes one column x of a DataFrame and returns a Series indexed by min and max, holding that column's minimum and maximum
>>> frame.apply(f)  # f yields one Series per column; together these Series make up the resulting DataFrame
      a   b   c   d
min   0   1   2   3
max  12  13  14  15

19. Statistical functions

Most of the statistical functions for arrays still work on a DataFrame

>>> frame
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
>>> frame.sum()
a    24
b    28
c    32
d    36
dtype: int64
>>> frame.mean()
a    6.0
b    7.0
c    8.0
d    9.0
dtype: float64
>>> frame.describe()
               a          b          c          d
count   4.000000   4.000000   4.000000   4.000000
mean    6.000000   7.000000   8.000000   9.000000
std     5.163978   5.163978   5.163978   5.163978
min     0.000000   1.000000   2.000000   3.000000
25%     3.000000   4.000000   5.000000   6.000000
50%     6.000000   7.000000   8.000000   9.000000
75%     9.000000  10.000000  11.000000  12.000000
max    12.000000  13.000000  14.000000  15.000000
>>> frame.sum(axis=1)  # to apply a statistical function across each row, specify axis=1
0     6
1    22
2    38
3    54
dtype: int64

20. Sorting and ranking

1) Sorting a Series

>>> import numpy as np
>>> import pandas as pd
>>> s=pd.Series([5,0,3,8,4],index=['red','blue','yellow','white','green'])
>>> s
red       5
blue      0
yellow    3
white     8
green     4
dtype: int64
>>> s.sort_index()  # sort by index label, A to Z
blue      0
green     4
red       5
white     8
yellow    3
dtype: int64
>>> s.sort_index(ascending=False)  # ascending=False sorts in descending order
yellow    3
white     8
red       5
green     4
blue      0
dtype: int64

2) Sorting a DataFrame

>>> import numpy as np
>>> import pandas as pd
>>> frame=pd.DataFrame(np.arange(16).reshape(4,4),index=['red','blue','yellow','white'],columns=['ball','pen','pencil','paper'])
>>> frame
        ball  pen  pencil  paper
red        0    1       2      3
blue       4    5       6      7
yellow     8    9      10     11
white     12   13      14     15
>>> frame.sort_index()  # sorts by row index by default, giving the order blue, red, white, yellow
        ball  pen  pencil  paper
blue       4    5       6      7
red        0    1       2      3
white     12   13      14     15
yellow     8    9      10     11
>>> frame.sort_index(axis=1)  # axis=1 sorts by column label, giving the order ball, paper, pen, pencil; whole columns change places
        ball  paper  pen  pencil
red        0      3    1       2
blue       4      7    5       6
yellow     8     11    9      10
white     12     15   13      14

21. Everything above sorts by index; the following sorts by the contents of the object

1) Sort the elements of a Series

s.sort_values()  (old pandas versions called this s.order())

2) Sort the elements of a DataFrame

>>> frame
        ball  pen  pencil  paper
red        0    1       2      3
blue       4    5       6      7
yellow     8    9      10     11
white     12   13      14     15
>>> frame.sort_values(by='pen')  # replaces the deprecated frame.sort_index(by='pen')
        ball  pen  pencil  paper
red        0    1       2      3
blue       4    5       6      7
yellow     8    9      10     11
white     12   13      14     15
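
The frame above is already ordered by pen, so the effect of sorting by value is hard to see, and the section title mentions ranking without showing it. The sketch below is an added illustration that covers both; the small frame and its values are invented for the example.

```python
import pandas as pd

s = pd.Series([5, 0, 3, 8, 4], index=['red', 'blue', 'yellow', 'white', 'green'])

by_value = s.sort_values()                       # ascending by element value
by_value_desc = s.sort_values(ascending=False)   # descending by element value
ranks = s.rank()                                 # 1.0 marks the smallest value

# Hypothetical frame: ties in 'ball' are broken by 'pen'.
frame = pd.DataFrame({'pen': [5, 1, 9], 'ball': [2, 2, 0]},
                     index=['red', 'blue', 'white'])
sorted_frame = frame.sort_values(by=['ball', 'pen'])

print(by_value)
print(ranks)
print(sorted_frame)
```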

22. Correlation and covariance

1) Correlation and covariance between two Series objects

>>> import numpy as np
>>> import pandas as pd
>>> s1=pd.Series([3,4,3,4,5,4,3,2])
>>> s2=pd.Series([1,2,3,4,4,3,2,1])
>>> s1
0    3
1    4
2    3
3    4
4    5
5    4
6    3
7    2
dtype: int64
>>> s2
0    1
1    2
2    3
3    4
4    4
5    3
6    2
7    1
dtype: int64
>>> s1.corr(s2)  # correlation
0.7745966692414834
>>> s1.cov(s2)  # covariance
0.8571428571428571

2) Correlation and covariance within a single DataFrame

>>> frame=pd.DataFrame([[1,4,3,6],[4,5,6,1],[3,3,1,5],[4,1,6,4]],index=['red','blue','yellow','white'],columns=['ball','pen','pencil','paper'])
>>> frame
        ball  pen  pencil  paper
red        1    4       3      6
blue       4    5       6      1
yellow     3    3       1      5
white      4    1       6      4
>>> frame.corr()
            ball       pen    pencil     paper
ball    1.000000 -0.276026  0.577350 -0.763763
pen    -0.276026  1.000000 -0.079682 -0.361403
pencil  0.577350 -0.079682  1.000000 -0.692935
paper  -0.763763 -0.361403 -0.692935  1.000000
>>> frame.cov()
            ball       pen    pencil     paper
ball    2.000000 -0.666667  2.000000 -2.333333
pen    -0.666667  2.916667 -0.333333 -1.333333
pencil  2.000000 -0.333333  6.000000 -3.666667
paper  -2.333333 -1.333333 -3.666667  4.666667

3) Pairwise correlation between the rows or columns of a DataFrame and a Series or another DataFrame

>>> s
red       5
blue      0
yellow    3
white     8
green     4
dtype: int64
>>> frame
        ball  pen  pencil  paper
red        1    4       3      6
blue       4    5       6      1
yellow     3    3       1      5
white      4    1       6      4
>>> frame.corrwith(s)
ball     -0.140028
pen      -0.869657
pencil    0.080845
paper     0.595854
dtype: float64

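23. Assigning NaN to an element

>>> s=pd.Series([1,2,np.nan,3])  # np.nan is the spelling that works across NumPy versions; the np.NaN alias was removed in NumPy 2.0
>>> s
0    1.0
1    2.0
2    NaN
3    3.0
dtype: float64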

24. Filtering out NaN

>>> s
0    1.0
1    2.0
2    NaN
3    3.0
dtype: float64
>>> s.dropna()  # use the dropna function
0    1.0
1    2.0
3    3.0
dtype: float64

Alternatively, use the notnull method:

>>> s=pd.Series([1,2,np.nan,3])
>>> s
0    1.0
1    2.0
2    NaN
3    3.0
dtype: float64
>>> s[s.notnull()]
0    1.0
1    2.0
3    3.0
dtype: float64

On a DataFrame, dropna() removes a whole row (or, with axis=1, a whole column) as soon as it contains a single NaN element

>>> frame=pd.DataFrame([[6,np.nan,6],[np.nan,np.nan,np.nan],[2,np.nan,5]],index=['blue','green','red'],columns=['ball','mug','pen'])
>>> frame
       ball  mug  pen
blue    6.0  NaN  6.0
green   NaN  NaN  NaN
red     2.0  NaN  5.0
>>> frame.dropna()
Empty DataFrame
Columns: [ball, mug, pen]
Index: []

To avoid dropping entire rows or columns like this, pass the how option with the value 'all', which tells dropna to remove only the rows or columns whose elements are all NaN

>>> frame=pd.DataFrame([[6,np.nan,6],[np.nan,np.nan,np.nan],[2,np.nan,5]],index=['blue','green','red'],columns=['ball','mug','pen'])
>>> frame
       ball  mug  pen
blue    6.0  NaN  6.0
green   NaN  NaN  NaN
red     2.0  NaN  5.0
>>> frame.dropna(how='all')
      ball  mug  pen
blue   6.0  NaN  6.0
red    2.0  NaN  5.0
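
dropna has two more options that are often useful with a frame like this one: axis=1 applies the rule to columns, and thresh keeps a row only if it has at least that many non-NaN values. The sketch below is an added illustration; the threshold 2 is chosen arbitrarily.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[6, np.nan, 6], [np.nan, np.nan, np.nan], [2, np.nan, 5]],
                     index=['blue', 'green', 'red'],
                     columns=['ball', 'mug', 'pen'])

no_empty_columns = frame.dropna(axis=1, how='all')   # drop columns that are entirely NaN
mostly_complete = frame.dropna(thresh=2)             # keep rows with at least 2 non-NaN values

print(no_empty_columns)
print(mostly_complete)
```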

25. Filling NaN elements with other values

1) Replace every NaN with the same value, using the fillna function

>>> frame=pd.DataFrame([[6,np.nan,6],[np.nan,np.nan,np.nan],[2,np.nan,5]],index=['blue','green','red'],columns=['ball','mug','pen'])
>>> frame
       ball  mug  pen
blue    6.0  NaN  6.0
green   NaN  NaN  NaN
red     2.0  NaN  5.0
>>> frame.fillna(0)
       ball  mug  pen
blue    6.0  0.0  6.0
green   0.0  0.0  0.0
red     2.0  0.0  5.0

2) Replace the NaN values of different columns with different values: pass a dict that maps each column name to its replacement value

>>> frame.fillna({'ball':1,'mug':2,'pen':8})
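
The notes do not show the result of the per-column fill, so the sketch below runs it as a standalone script. Note that the column-to-value mapping has to be passed as a dict, in braces; the replacement values 1, 2 and 8 are the ones used in the notes.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[6, np.nan, 6], [np.nan, np.nan, np.nan], [2, np.nan, 5]],
                     index=['blue', 'green', 'red'],
                     columns=['ball', 'mug', 'pen'])

# Each key names a column; its value replaces the NaN entries of that column only.
filled = frame.fillna({'ball': 1, 'mug': 2, 'pen': 8})
print(filled)
```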

26. Hierarchical indexing and levels

1) Create a Series with a hierarchical index

>>> import numpy as np
>>> import pandas as pd
>>> s=pd.Series(np.random.rand(8),index=[['a','a','a','b','b','c','c','c'],['up','down','right','up','down','up','down','left']])
>>> s
a  up       0.587733
   down     0.425383
   right    0.356205
b  up       0.251802
   down     0.105830
c  up       0.253041
   down     0.140155
   left     0.425004
dtype: float64

2) Show the index attribute of a Series with a hierarchical index

>>> s.index
MultiIndex(levels=[['a', 'b', 'c'], ['down', 'left', 'right', 'up']],
           labels=[[0, 0, 0, 1, 1, 2, 2, 2], [3, 0, 2, 3, 0, 3, 0, 1]])

3) Select the elements of a hierarchically indexed Series by a first-level label

>>> s['a']
up       0.587733
down     0.425383
right    0.356205
dtype: float64

4) Select the elements of a hierarchically indexed Series by a second-level label

>>> s[:,'up']  # do not forget the comma
a    0.587733
b    0.251802
c    0.253041
dtype: float64

5) Select one specific element of a hierarchically indexed Series

>>> s['a','up']
0.5877327517004284

6) Turn a hierarchically indexed Series into a DataFrame

>>> s.unstack()
       down      left     right        up
a  0.425383       NaN  0.356205  0.587733
b  0.105830       NaN       NaN  0.251802
c  0.140155  0.425004       NaN  0.253041

7) Turn a DataFrame into a hierarchically indexed Series

>>> frame=s.unstack()
>>> frame
       down      left     right        up
a  0.425383       NaN  0.356205  0.587733
b  0.105830       NaN       NaN  0.251802
c  0.140155  0.425004       NaN  0.253041
>>> frame.stack()
a  down     0.425383
   right    0.356205
   up       0.587733
b  down     0.105830
   up       0.251802
c  down     0.140155
   left     0.425004
   up       0.253041
dtype: float64

8) Define a DataFrame whose index and columns are both hierarchical

>>> frame=pd.DataFrame(np.random.randn(16).reshape(4,4),index=[['white','white','red','red'],['up','down','up','down']],columns=[['pen','pen','paper','paper'],[1,2,1,2]])
>>> frame
                 pen               paper
                   1         2         1         2
white up   -0.487631  0.200648  0.344613  0.144835
      down  0.246683 -0.847063 -0.391592 -0.091928
red   up   -0.132962 -1.728167  1.787231  0.374895
      down -1.033622  0.354458  0.007813 -1.203889

27. Reordering and sorting levels

>>> frame
                 pen               paper
                   1         2         1         2
white up   -0.487631  0.200648  0.344613  0.144835
      down  0.246683 -0.847063 -0.391592 -0.091928
red   up   -0.132962 -1.728167  1.787231  0.374895
      down -1.033622  0.354458  0.007813 -1.203889
>>> frame.index.names=['colors','status']
>>> frame.columns.names=['objects','id']
>>> frame
objects             pen               paper
id                    1         2         1         2
colors status
white  up     -0.487631  0.200648  0.344613  0.144835
       down    0.246683 -0.847063 -0.391592 -0.091928
red    up     -0.132962 -1.728167  1.787231  0.374895
       down   -1.033622  0.354458  0.007813 -1.203889
>>> frame.swaplevel('colors','status')  # swap the order of the colors and status index levels
objects             pen               paper
id                    1         2         1         2
status colors
up     white  -0.487631  0.200648  0.344613  0.144835
down   white   0.246683 -0.847063 -0.391592 -0.091928
up     red    -0.132962 -1.728167  1.787231  0.374895
down   red    -1.033622  0.354458  0.007813 -1.203889
>>> frame
objects             pen               paper
id                    1         2         1         2
colors status
white  up     -0.487631  0.200648  0.344613  0.144835
       down    0.246683 -0.847063 -0.391592 -0.091928
red    up     -0.132962 -1.728167  1.787231  0.374895
       down   -1.033622  0.354458  0.007813 -1.203889
>>> frame.sort_index(level='colors')  # sort the labels of the colors level alphabetically; older pandas versions used the now-deprecated sortlevel() for this
objects             pen               paper
id                    1         2         1         2
colors status
red    down   -1.033622  0.354458  0.007813 -1.203889
       up     -0.132962 -1.728167  1.787231  0.374895
white  down    0.246683 -0.847063 -0.391592 -0.091928
       up     -0.487631  0.200648  0.344613  0.144835
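
Sorting by a level can be checked on deterministic data. The sketch below is an added illustration that rebuilds a frame of the same shape with np.arange instead of random numbers, an assumption made so the result is reproducible, and sorts it by each row level with sort_index(level=...).

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=[['white', 'white', 'red', 'red'],
                            ['up', 'down', 'up', 'down']],
                     columns=[['pen', 'pen', 'paper', 'paper'], [1, 2, 1, 2]])
frame.index.names = ['colors', 'status']
frame.columns.names = ['objects', 'id']

by_colors = frame.sort_index(level='colors')   # colors first, then the remaining level
by_status = frame.sort_index(level='status')   # status first, then the remaining level

print(by_colors)
print(by_status)
```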

28. Summary statistics by level

1) To aggregate by a row level, group on that level by passing its name as the level argument of groupby, then apply the statistical function (older pandas versions accepted level directly in the statistical function, as in frame.sum(level='colors'))

>>> frame
objects             pen               paper
id                    1         2         1         2
colors status
white  up     -0.487631  0.200648  0.344613  0.144835
       down    0.246683 -0.847063 -0.391592 -0.091928
red    up     -0.132962 -1.728167  1.787231  0.374895
       down   -1.033622  0.354458  0.007813 -1.203889
>>> frame.groupby(level='colors',sort=False).sum()  # sum over the colors row level
objects       pen               paper
id              1         2         1         2
colors
white   -0.240947 -0.646416 -0.046978  0.052907
red     -1.166584 -1.373709  1.795044 -0.828994

2) To aggregate by a column level

>>> frame
objects             pen               paper
id                    1         2         1         2
colors status
white  up     -0.487631  0.200648  0.344613  0.144835
       down    0.246683 -0.847063 -0.391592 -0.091928
red    up     -0.132962 -1.728167  1.787231  0.374895
       down   -1.033622  0.354458  0.007813 -1.203889
>>> frame.T.groupby(level='id').sum().T  # sum over the id column level: transpose, group on the level, transpose back (older versions: frame.sum(level='id',axis=1))
id                    1         2
colors status
white  up     -0.143017  0.345483
       down   -0.144909 -0.938991
red    up      1.654270 -1.353272
       down   -1.025809 -0.849432
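
Recent pandas releases no longer accept the level argument in sum(), so the grouping form is worth trying on deterministic data. The sketch below is an added illustration: it uses np.arange values instead of random numbers, an assumption made so the totals can be checked by hand.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=[['white', 'white', 'red', 'red'],
                            ['up', 'down', 'up', 'down']],
                     columns=[['pen', 'pen', 'paper', 'paper'], [1, 2, 1, 2]])
frame.index.names = ['colors', 'status']
frame.columns.names = ['objects', 'id']

# Row level: add up the 'up' and 'down' rows of each color.
by_color = frame.groupby(level='colors').sum()

# Column level: transpose so the column levels become row levels,
# group on 'id', then transpose back.
by_id = frame.T.groupby(level='id').sum().T

print(by_color)
print(by_id)
```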

Copyright belongs to the author. For reprints or content partnerships, please contact the author.