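Python Data Analysis - 07

1. The DataFrame object

A DataFrame arranges multiple columns of data in a fixed order, and each column may hold a different data type.

A DataFrame has two index arrays. The first is associated with the rows and closely resembles the index array of a Series: each label is associated with all the elements in its row. The second contains a series of labels, each associated with one column of data.

A DataFrame can be thought of as a dict of Series: each column name is a dict key, and the Series that forms that column is the corresponding value.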
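To make the dict-of-Series picture concrete, here is a minimal sketch. The labels and values are made up for illustration only.

```python
import pandas as pd

# Two Series that share the same index labels (illustrative data).
colors = pd.Series(['red', 'blue'], index=['a', 'b'])
prices = pd.Series([1.1, 1.2], index=['a', 'b'])

# Each dict key becomes a column name; each Series becomes that column.
frame = pd.DataFrame({'colors': colors, 'price': prices})

# Selecting a column by its key gives back a Series.
column = frame['price']
print(type(column).__name__)
print(list(frame.columns))
print(list(frame.index))
```

Looking up a column by name therefore behaves much like looking up a value in a dict.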

2. Defining a DataFrame object

The most common way to create a DataFrame is to pass a dict object to the DataFrame() constructor.

Each key of the dict is a column name, and each key's value is an array holding that column's elements.

1) Put every key-value pair of the dict into the DataFrame

>>> import pandas as pd  # import the pandas package
>>> data={'colors':['red','blue','yellow','black'],'object':['pen','paper','ball','mug'],'price':[1.1,1.2,3.2,4]}  # define a dict (named data so that the built-in dict is not shadowed): each key becomes a column name and each value holds that column's elements
>>> data
{'object': ['pen', 'paper', 'ball', 'mug'], 'price': [1.1, 1.2, 3.2, 4], 'colors': ['red', 'blue', 'yellow', 'black']}
>>> s=pd.DataFrame(data)  # pass the dict to the DataFrame constructor
>>> s
   colors object  price
0     red    pen    1.1
1    blue  paper    1.2
2  yellow   ball    3.2
3   black    mug    4.0

2) Select only some of the dict's key-value pairs to initialize the DataFrame

>>> import pandas as pd  # import the pandas package
>>> dic={'colors':['red','black','yellow','orange'],'object':['pen','ball','shirt','mug'],'price':[1.2,3.4,2.3,5]}  # define the dict
>>> dic
{'object': ['pen', 'ball', 'shirt', 'mug'], 'price': [1.2, 3.4, 2.3, 5], 'colors': ['red', 'black', 'yellow', 'orange']}
>>> s=pd.DataFrame(dic,columns=['price','object'])  # initialize the DataFrame from the dict, keeping only two columns, in the order given
>>> s
   price object
0    1.2    pen
1    3.4   ball
2    2.3  shirt
3    5.0    mug

3) Give the DataFrame a custom index (the examples above do not define one, so pandas assigns a default index starting from 0)
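A custom index can be supplied through the constructor's index argument. The sketch below reuses the dict from the first example; the labels 'one' to 'four' are arbitrary choices for illustration.

```python
import pandas as pd

data = {
    'colors': ['red', 'blue', 'yellow', 'black'],
    'object': ['pen', 'paper', 'ball', 'mug'],
    'price': [1.1, 1.2, 3.2, 4],
}

# The index list must have one label per row.
frame = pd.DataFrame(data, index=['one', 'two', 'three', 'four'])
print(frame)

# Rows can now be addressed by label instead of by 0, 1, 2, 3.
print(frame.loc['two', 'object'])
```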

4) Define a DataFrame without a dict, using three constructor arguments

Pass three arguments in this order: the data matrix, the index option, and the columns option. Assign the array of row labels to index and the array of column names to columns. np.arange(16).reshape(4,4) is a quick way to generate a matrix.

>>> import numpy as np
>>> import pandas as pd
>>> arry=np.arange(16)
>>> arry
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
>>> arry=np.arange(16).reshape(4,4)
>>> arry
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
>>> s=pd.DataFrame(arry,index=['a','b','c','d'],columns=['A','B','C','D'])
>>> s
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15

3. Selecting elements

1) To get the names of all the columns of a DataFrame, use its columns attribute

2) To get the list of index labels of a DataFrame, use its index attribute

3) To get the elements stored in the data structure, use its values attribute

>>> import numpy as np
>>> import pandas as pd
>>> arry=np.arange(16)
>>> arry
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
>>> arry=np.arange(16).reshape(4,4)
>>> arry
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
>>> s=pd.DataFrame(arry,columns=['a','b','c','d'])
>>> s
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
>>> s=pd.DataFrame(arry,index=['a','b','c','d'],columns=['A','B','C','D'])
>>> s
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
>>> s.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> s.columns
Index(['A', 'B', 'C', 'D'], dtype='object')
>>> s.values
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

4) To get the contents of one column, use the column name as an index, or access the column name as an attribute of the DataFrame

First method

>>> s['B']
a     1
b     5
c     9
d    13
Name: B, dtype: int64

Second method

>>> s.B
a     1
b     5
c     9
d    13
Name: B, dtype: int64

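5) To get a row of a DataFrame, use iloc with the row's integer position or loc with its index label (older pandas used the ix attribute for both, but ix has since been removed)

Getting a single row

>>> s
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
>>> s.iloc[2]  # by position (originally s.ix[2])
A     8
B     9
C    10
D    11
Name: c, dtype: int64
>>> s.loc['c']  # by label (originally s.ix['c'])
A     8
B     9
C    10
D    11
Name: c, dtype: int64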

Getting several rows (non-contiguous)

>>> s
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
>>> s.iloc[[1,3]]  # by position (originally s.ix[[1,3]])
    A   B   C   D
b   4   5   6   7
d  12  13  14  15
>>> s.loc[['b','d']]  # by label (originally s.ix[['b','d']])
    A   B   C   D
b   4   5   6   7
d  12  13  14  15

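Getting several rows (contiguous)

>>> s
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
>>> s.iloc[0:3]  # positional slice, end position excluded (originally s.ix[0:3])
   A  B   C   D
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11
>>> s.loc['a':'c']  # label slice, end label included (originally s.ix['a':'c'])
   A  B   C   D
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11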

Getting a single element

>>> s
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
>>> s['A'].iloc[1]  # select the column 'A' first, then the row at position 1 (originally s['A'][1])
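In current pandas, a single element can also be addressed in one step with loc (labels) or iat (positions), which avoids chaining two lookups. The following is a minimal sketch on the same 4x4 frame; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['a', 'b', 'c', 'd'],
                     columns=['A', 'B', 'C', 'D'])

# Row label first, then column label.
by_label = frame.loc['b', 'A']

# Row position first, then column position.
by_position = frame.iat[1, 0]

# Whole rows: loc uses labels, iloc uses integer positions.
row_by_label = frame.loc['c']
row_by_position = frame.iloc[2]

print(by_label, by_position)
```

Note that with loc and iat the row comes first and the column second, the reverse of the chained form.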

4. Assignment

1) Give the index and the columns a name

>>> import numpy as np
>>> import pandas as pd
>>> arry=np.arange(16)
>>> arry
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
>>> arry=np.arange(16).reshape(4,4)
>>> arry
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
>>> s=pd.DataFrame(arry,index=['a','b','c','d'],columns=['A','B','C','D'])
>>> s
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
>>> s.index.name='id'  # the name must be a quoted string; a bare id would assign Python's built-in id function
>>> s.columns.name='item'
>>> s
item   A   B   C   D
a      0   1   2   3
b      4   5   6   7
c      8   9  10  11
d     12  13  14  15

2) Add a new column

>>> s
item   A   B   C   D
a      0   1   2   3
b      4   5   6   7
c      8   9  10  11
d     12  13  14  15
>>> s['E']=12
>>> s
item   A   B   C   D   E
a      0   1   2   3  12
b      4   5   6   7  12
c      8   9  10  11  12
d     12  13  14  15  12

3) Update the values of an existing column

>>> s
item   A   B   C   D   E
a      0   1   2   3  12
b      4   5   6   7  12
c      8   9  10  11  12
d     12  13  14  15  12
>>> s['E']=[3,5,2,6]
>>> s
item   A   B   C   D  E
a      0   1   2   3  3
b      4   5   6   7  5
c      8   9  10  11  2
d     12  13  14  15  6

5. Membership of elements

>>> s
item   A   B   C   D   E   F
a      0   1   2   3 NaN NaN
b      4   5   6   7 NaN NaN
c      8   9  10  11 NaN NaN
d     12  13  14  15 NaN NaN
>>> s.isin([1,4])
item      A      B      C      D      E      F
a     False   True  False  False  False  False
b      True  False  False  False  False  False
c     False  False  False  False  False  False
d     False  False  False  False  False  False
>>> s[s.isin([1,4])]
item    A    B   C   D   E   F
a     NaN  1.0 NaN NaN NaN NaN
b     4.0  NaN NaN NaN NaN NaN
c     NaN  NaN NaN NaN NaN NaN
d     NaN  NaN NaN NaN NaN NaN
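A common use of isin() is to test a single column and keep only the matching rows, rather than masking the whole frame. A minimal sketch, assuming the plain 4x4 frame from earlier (without the E and F columns):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['a', 'b', 'c', 'd'],
                     columns=['A', 'B', 'C', 'D'])

# Boolean Series: True where column A holds one of the listed values.
mask = frame['A'].isin([0, 8])

# Indexing with a boolean Series keeps whole rows.
selected = frame[mask]
print(selected)
```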

6. Deleting a column

>>> s
item   A   B   C   D   E   F
a      0   1   2   3 NaN NaN
b      4   5   6   7 NaN NaN
c      8   9  10  11 NaN NaN
d     12  13  14  15 NaN NaN
>>> del s['E']
>>> s
item   A   B   C   D   F
a      0   1   2   3 NaN
b      4   5   6   7 NaN
c      8   9  10  11 NaN
d     12  13  14  15 NaN

7. Filtering

>>> s
item   A   B   C   D   F
a      0   1   2   3 NaN
b      4   5   6   7 NaN
c      8   9  10  11 NaN
d     12  13  14  15 NaN
>>> s[s<3]
item    A    B    C   D   F
a     0.0  1.0  2.0 NaN NaN
b     NaN  NaN  NaN NaN NaN
c     NaN  NaN  NaN NaN NaN
d     NaN  NaN  NaN NaN NaN

8. Creating a DataFrame from a nested dict

When a nested dict is passed to the DataFrame constructor, pandas uses the outer keys as column names and the inner keys as index labels. Not every position has a corresponding element, so pandas fills the gaps with NaN.

>>> import pandas as pd
>>> dic={'red':{2012:22,2013:33},'white':{2011:13,2012:22,2013:16},'blue':{2017:17,2012:23,2018:18}}
>>> dic
{'blue': {2017: 17, 2018: 18, 2012: 23}, 'white': {2011: 13, 2012: 22, 2013: 16}, 'red': {2012: 22, 2013: 33}}
>>> s=pd.DataFrame(dic)
>>> s
      blue   red  white
2011   NaN   NaN   13.0
2012  23.0  22.0   22.0
2013   NaN  33.0   16.0
2017  17.0   NaN    NaN
2018  18.0   NaN    NaN

9. Transposing a DataFrame

>>> s
      blue   red  white
2011   NaN   NaN   13.0
2012  23.0  22.0   22.0
2013   NaN  33.0   16.0
2017  17.0   NaN    NaN
2018  18.0   NaN    NaN
>>> s.T  # the T attribute returns the transpose
       2011  2012  2013  2017  2018
blue    NaN  23.0   NaN  17.0  18.0
red     NaN  22.0  33.0   NaN   NaN
white  13.0  22.0  16.0   NaN   NaN

10. The Index object

The Index object of a Series or DataFrame is immutable: once declared, its elements cannot be modified in place.
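The immutability applies to the elements of an Index, not to the attribute that holds it. As a minimal sketch with illustrative data: assigning to one position of the index fails, while replacing the whole index is allowed.

```python
import pandas as pd

s = pd.Series([5, 0, 3], index=['red', 'blue', 'yellow'])

# Changing a single label in place is rejected by pandas.
try:
    s.index[0] = 'green'
    modified_in_place = True
except TypeError:
    modified_in_place = False

# Assigning a complete new set of labels is fine.
s.index = ['green', 'blue', 'yellow']

print(modified_in_place)
print(list(s.index))
```

This is what lets several Series or DataFrame objects share one Index safely.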

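11. Methods related to the Index object

The idxmin() and idxmax() functions return the index labels of the smallest and the largest value, respectively.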
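A short sketch of the two functions on a small Series. The data is illustrative, and the comparison with s.index.min() is added only to show that the result depends on the values, not on the ordering of the labels.

```python
import pandas as pd

s = pd.Series([5, 0, 3, 8, 4],
              index=['red', 'blue', 'yellow', 'white', 'green'])

# Label of the smallest value (0) and of the largest value (8).
label_of_min = s.idxmin()
label_of_max = s.idxmax()

# By contrast, this is the alphabetically smallest label.
smallest_label = s.index.min()

print(label_of_min, label_of_max, smallest_label)
```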

12. An index with duplicate labels

>>> import pandas as pd
>>> s=pd.Series(range(6),index=['a','a','b','c','c','d'])
>>> s
a    0
a    1
b    2
c    3
c    4
d    5
dtype: int64
>>> s['a']
a    0
a    1
dtype: int64
>>> s.index.is_unique  # tells whether the index is free of duplicate labels
False

13. Reindexing

The pandas reindex() function changes the index of a Series: it rearranges the elements of the original Series according to the new sequence of labels and returns a new Series object.

When reindexing, you can change the order of the labels in the index sequence, and remove or add labels.

>>> import pandas as pd
>>> s=pd.Series([1,2,3,4],index=['a','b','c','d'])
>>> s
a    1
b    2
c    3
d    4
dtype: int64
>>> s.reindex(['e','f','g','b'])
e    NaN
f    NaN
g    NaN
b    2.0
dtype: float64

Redefining the index label by label, as above, does not scale well to a large DataFrame. In that case reindex() can fill in or interpolate the missing values automatically,

as follows:

>>> import pandas as pd
>>> s=pd.Series([1,5,6,3],index=[0,3,5,6])
>>> s
0    1
3    5
5    6
6    3
dtype: int64
>>> s.reindex(range(6),method='ffill')  # redefine the index of s as 0 to 5; ffill gives each new label the value of the nearest smaller existing label
0    1
1    1
2    1
3    5
4    5
5    6
dtype: int64
>>> s=pd.Series([1,5,6,3],index=[0,3,5,6])
>>> s
0    1
3    5
5    6
6    3
dtype: int64
>>> s.reindex(range(6),method='bfill')  # bfill gives each new label the value of the next existing label
0    1
1    5
2    5
3    5
4    6
5    6
dtype: int64
>>> dic={'colors':['blue','green','yellow','red','white'],'price':[1.2,1.0,0.6,0.9,1.7],'object':['ballpand','pen','pencil','paper','mug']}  # define a dict of lists
>>> dic
{'object': ['ballpand', 'pen', 'pencil', 'paper', 'mug'], 'price': [1.2, 1.0, 0.6, 0.9, 1.7], 'colors': ['blue', 'green', 'yellow', 'red', 'white']}
>>> s=pd.DataFrame(dic)  # build the DataFrame s from the dict
>>> s
   colors    object  price
0    blue  ballpand    1.2
1   green       pen    1.0
2  yellow    pencil    0.6
3     red     paper    0.9
4   white       mug    1.7
>>> s.reindex(range(5),method='ffill',columns=['colors','price','new','object'])  # add the column label new
   colors  price     new    object
0    blue    1.2    blue  ballpand
1   green    1.0   green       pen
2  yellow    0.6  yellow    pencil
3     red    0.9     red     paper
4   white    1.7   white       mug
>>> s=pd.DataFrame(dic,index=[1,2,3,5,7])  # a DataFrame with a custom index
>>> s
   colors    object  price
1    blue  ballpand    1.2
2   green       pen    1.0
3  yellow    pencil    0.6
5     red     paper    0.9
7   white       mug    1.7
>>> s.reindex(range(5),method='ffill')  # redefine the row index
   colors    object  price
0     NaN       NaN    NaN
1    blue  ballpand    1.2
2   green       pen    1.0
3  yellow    pencil    0.6
4  yellow    pencil    0.6
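Besides method='ffill' and method='bfill', reindex() also accepts a fill_value argument that puts one constant into every newly created position. This option does not appear in the examples above; the sketch below, with 0 as an arbitrary filler, is included only for comparison.

```python
import pandas as pd

s = pd.Series([1, 5, 6, 3], index=[0, 3, 5, 6])

# New labels 1, 2 and 4 have no source value, so they receive 0.
# Label 6 is not requested and is therefore dropped.
filled = s.reindex(range(6), fill_value=0)
print(filled)
```

Because no NaN is introduced, the result keeps its integer dtype.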

14. Dropping index labels

1) Drop one item from a Series

2) Drop several items from a Series: pass the labels to drop() as a list

3) Drop several rows from a DataFrame

4) Drop columns from a DataFrame: pass axis=1 to refer to the columns

>>> import numpy as np
>>> import pandas as pd
>>> s=pd.Series(np.arange(4),index=['red','blue','yellow','white'])
>>> s
red       0
blue      1
yellow    2
white     3
dtype: int64
>>> s.drop('yellow')  # drop one label of the Series together with its element
red      0
blue     1
white    3
dtype: int64
>>> s.drop(['red','white'])  # drop several labels of the Series
blue      1
yellow    2
dtype: int64
>>> frame=pd.DataFrame(np.arange(16).reshape(4,4),index=['red','blue','yellow','white'],columns=['ball','pen','pencil','paper'])
>>> frame
        ball  pen  pencil  paper
red        0    1       2      3
blue       4    5       6      7
yellow     8    9      10     11
white     12   13      14     15
>>> frame.drop(['blue','yellow'])  # drop several rows of the DataFrame
       ball  pen  pencil  paper
red       0    1       2      3
white    12   13      14     15
>>> frame.drop(['pen','pencil'],axis=1)  # drop several columns of the DataFrame; axis=1 is required
        ball  paper
red        0      3
blue       4      7
yellow     8     11
white     12     15

15. Arithmetic and data alignment

1) Adding two Series objects

>>> import pandas as pd
>>> s1=pd.Series([3,2,5,1],['white','yellow','green','blue'])
>>> s2=pd.Series([1,4,7,2,1],index=['white','yellow','black','blue','brown'])
>>> s1
white     3
yellow    2
green     5
blue      1
dtype: int64
>>> s2
white     1
yellow    4
black     7
blue      2
brown     1
dtype: int64
>>> s1+s2
black     NaN
blue      3.0
brown     NaN
green     NaN
white     4.0
yellow    6.0
dtype: float64

2) Adding two DataFrame objects

>>> import numpy as np
>>> frame1=pd.DataFrame(np.arange(16).reshape(4,4),index=['red','blue','yellow','white'],columns=['ball','pen','pencil','paper'])
>>> frame2=pd.DataFrame(np.arange(12).reshape(4,3),index=['blue','green','white','yellow'],columns=['mug','pen','ball'])
>>> frame1
        ball  pen  pencil  paper
red        0    1       2      3
blue       4    5       6      7
yellow     8    9      10     11
white     12   13      14     15
>>> frame2
        mug  pen  ball
blue      0    1     2
green     3    4     5
white     6    7     8
yellow    9   10    11
>>> frame1+frame2
        ball  mug  paper   pen  pencil
blue     6.0  NaN    NaN   6.0     NaN
green    NaN  NaN    NaN   NaN     NaN
red      NaN  NaN    NaN   NaN     NaN
white   20.0  NaN    NaN  20.0     NaN
yellow  19.0  NaN    NaN  19.0     NaN

The same operations can also be written with the add() method:

1) Adding two Series

2) Adding two DataFrames

>>> s1.add(s2)
black     NaN
blue      3.0
brown     NaN
green     NaN
white     4.0
yellow    6.0
dtype: float64
>>> frame1.add(frame2)
        ball  mug  paper   pen  pencil
blue     6.0  NaN    NaN   6.0     NaN
green    NaN  NaN    NaN   NaN     NaN
red      NaN  NaN    NaN   NaN     NaN
white   20.0  NaN    NaN  20.0     NaN
yellow  19.0  NaN    NaN  19.0     NaN
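One practical difference between the + operator and add() is that the method takes a fill_value argument, which stands in for a label missing on one side before the addition. The examples above do not use it; the sketch below, with 0 as the assumed filler, is only an illustration.

```python
import pandas as pd

s1 = pd.Series([3, 2, 5, 1], index=['white', 'yellow', 'green', 'blue'])
s2 = pd.Series([1, 4, 7, 2, 1],
               index=['white', 'yellow', 'black', 'blue', 'brown'])

# Labels present in only one Series are treated as 0 on the other side,
# so the result contains no NaN.
total = s1.add(s2, fill_value=0)
print(total)
```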

16. Operations between a DataFrame and a Series

1) The Series index matches the DataFrame column names

>>> import numpy as np
>>> import pandas as pd
>>> s=pd.Series([1,2,3,4],index=['a','b','c','d'])
>>> frame=pd.DataFrame(np.arange(16).reshape(4,4),columns=['a','b','c','d'])
>>> s
a    1
b    2
c    3
d    4
dtype: int64
>>> frame
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
>>> s+frame  # each element of s is added to the frame column with the matching label
    a   b   c   d
0   1   3   5   7
1   5   7   9  11
2   9  11  13  15
3  13  15  17  19
>>> frame-s  # each element of s is subtracted from the frame column with the matching label
    a   b   c   d
0  -1  -1  -1  -1
1   3   3   3   3
2   7   7   7   7
3  11  11  11  11

2) The Series index does not match the DataFrame column names (labels found on only one side produce NaN columns)

>>> frame2=pd.DataFrame(np.arange(16).reshape(4,4),columns=['b','d','e','c'])
>>> s
a    1
b    2
c    3
d    4
dtype: int64
>>> frame2
    b   d   e   c
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
>>> s+frame2
    a     b     c     d   e
0 NaN   2.0   6.0   5.0 NaN
1 NaN   6.0  10.0   9.0 NaN
2 NaN  10.0  14.0  13.0 NaN
3 NaN  14.0  18.0  17.0 NaN
>>> frame2-s
    a     b     c    d   e
0 NaN  -2.0   0.0 -3.0 NaN
1 NaN   2.0   4.0  1.0 NaN
2 NaN   6.0   8.0  5.0 NaN
3 NaN  10.0  12.0  9.0 NaN

17. Taking the square root of every element of a DataFrame with NumPy's sqrt function

>>> frame
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
>>> np.sqrt(frame)
          a         b         c         d
0  0.000000  1.000000  1.414214  1.732051
1  2.000000  2.236068  2.449490  2.645751
2  2.828427  3.000000  3.162278  3.316625
3  3.464102  3.605551  3.741657  3.872983

18. Functions applied by row or by column

1) Apply a user-defined function to each column of a DataFrame

2) Apply a user-defined function to each row of a DataFrame

>>> f=lambda x:x.max()-x.min()
>>> frame.apply(f)  # the function receives each column of the DataFrame in turn
a    12
b    12
c    12
d    12
dtype: int64
>>> frame.apply(f,axis=1)  # axis=1 means the function receives each row of the DataFrame
0    3
1    3
2    3
3    3
dtype: int64

3) Use apply() with a function that returns a Series to turn one DataFrame into another, computing several statistics at once

>>> f=lambda x:pd.Series([x.min(),x.max()],index=['min','max'])  # f takes one column of a DataFrame as x and returns a Series indexed by min and max, holding the column's minimum and maximum
>>> frame.apply(f)  # applying f to frame yields one Series per column; together they form the resulting DataFrame
     a   b   c   d
min  0   1   2   3
max 12  13  14  15

19. Statistical functions

Most of the statistical functions available for arrays also work on a DataFrame.

>>> frame
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
>>> frame.sum()
a    24
b    28
c    32
d    36
dtype: int64
>>> frame.mean()
a    6.0
b    7.0
c    8.0
d    9.0
dtype: float64
>>> frame.describe()
               a          b          c          d
count   4.000000   4.000000   4.000000   4.000000
mean    6.000000   7.000000   8.000000   9.000000
std     5.163978   5.163978   5.163978   5.163978
min     0.000000   1.000000   2.000000   3.000000
25%     3.000000   4.000000   5.000000   6.000000
50%     6.000000   7.000000   8.000000   9.000000
75%     9.000000  10.000000  11.000000  12.000000
max    12.000000  13.000000  14.000000  15.000000
>>> frame.sum(axis=1)  # to apply a statistical function to each row, pass axis=1
0     6
1    22
2    38
3    54
dtype: int64

20. Sorting and ranking

1) Sorting a Series

>>> import numpy as np
>>> import pandas as pd
>>> s=pd.Series([5,0,3,8,4],index=['red','blue','yellow','white','green'])
>>> s
red       5
blue      0
yellow    3
white     8
green     4
dtype: int64
>>> s.sort_index()  # sort by index label in alphabetical order
blue      0
green     4
red       5
white     8
yellow    3
dtype: int64
>>> s.sort_index(ascending=False)  # ascending=False sorts in descending order
yellow    3
white     8
red       5
green     4
blue      0
dtype: int64

2) Sorting a DataFrame

>>> import numpy as np
>>> import pandas as pd
>>> frame=pd.DataFrame(np.arange(16).reshape(4,4),index=['red','blue','yellow','white'],columns=['ball','pen','pencil','paper'])
>>> frame
        ball  pen  pencil  paper
red        0    1       2      3
blue       4    5       6      7
yellow     8    9      10     11
white     12   13      14     15
>>> frame.sort_index()  # sorts by the row index by default, giving blue, red, white, yellow
        ball  pen  pencil  paper
blue       4    5       6      7
red        0    1       2      3
white     12   13      14     15
yellow     8    9      10     11
>>> frame.sort_index(axis=1)  # axis=1 sorts by the column labels (ball, paper, pen, pencil); whole columns change places
        ball  paper  pen  pencil
red        0      3    1       2
blue       4      7    5       6
yellow     8     11    9      10
white     12     15   13      14

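21. The examples above sort by index; the following sort by the values stored in the object

1) Sorting the elements of a Series by value

>>> s.sort_values()  # originally s.order(), which newer pandas releases no longer provide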
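In current pandas, a Series is sorted by value with sort_values(). The sketch below uses the Series from section 20. The rank() call is an addition that is not shown in the original notes; it is included because the section title also mentions ranking.

```python
import pandas as pd

s = pd.Series([5, 0, 3, 8, 4],
              index=['red', 'blue', 'yellow', 'white', 'green'])

# Sort by the stored values; the labels travel with their values.
ascending = s.sort_values()
descending = s.sort_values(ascending=False)

# rank() keeps the original order and reports each value's position
# in the sorted sequence (1.0 = smallest).
ranks = s.rank()

print(ascending)
print(ranks)
```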

2) Sorting the elements of a DataFrame by value

>>> frame
        ball  pen  pencil  paper
red        0    1       2      3
blue       4    5       6      7
yellow     8    9      10     11
white     12   13      14     15
>>> frame.sort_values(by='pen')  # originally frame.sort_index(by='pen'), which pandas deprecated in favor of sort_values(by=...)
        ball  pen  pencil  paper
red        0    1       2      3
blue       4    5       6      7
yellow     8    9      10     11
white     12   13      14     15

22. Correlation and covariance

1) Correlation and covariance between two Series

>>> import numpy as np
>>> import pandas as pd
>>> s1=pd.Series([3,4,3,4,5,4,3,2])
>>> s2=pd.Series([1,2,3,4,4,3,2,1])
>>> s1
0    3
1    4
2    3
3    4
4    5
5    4
6    3
7    2
dtype: int64
>>> s2
0    1
1    2
2    3
3    4
4    4
5    3
6    2
7    1
dtype: int64
>>> s1.corr(s2)  # correlation
0.7745966692414834
>>> s1.cov(s2)  # covariance
0.8571428571428571

2) Correlation and covariance within a single DataFrame

>>> frame=pd.DataFrame([[1,4,3,6],[4,5,6,1],[3,3,1,5],[4,1,6,4]],index=['red','blue','yellow','white'],columns=['ball','pen','pencil','paper'])
>>> frame
        ball  pen  pencil  paper
red        1    4       3      6
blue       4    5       6      1
yellow     3    3       1      5
white      4    1       6      4
>>> frame.corr()
            ball       pen    pencil     paper
ball    1.000000 -0.276026  0.577350 -0.763763
pen    -0.276026  1.000000 -0.079682 -0.361403
pencil  0.577350 -0.079682  1.000000 -0.692935
paper  -0.763763 -0.361403 -0.692935  1.000000
>>> frame.cov()
            ball       pen    pencil     paper
ball    2.000000 -0.666667  2.000000 -2.333333
pen    -0.666667  2.916667 -0.333333 -1.333333
pencil  2.000000 -0.333333  6.000000 -3.666667
paper  -2.333333 -1.333333 -3.666667  4.666667

3) Pairwise correlation between the rows or columns of a DataFrame and a Series or another DataFrame, using corrwith()

>>> s
red       5
blue      0
yellow    3
white     8
green     4
dtype: int64
>>> frame
        ball  pen  pencil  paper
red        1    4       3      6
blue       4    5       6      1
yellow     3    3       1      5
white      4    1       6      4
>>> frame.corrwith(s)
ball     -0.140028
pen      -0.869657
pencil    0.080845
paper     0.595854
dtype: float64

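23. Assigning NaN to an element

>>> s=pd.Series([1,2,np.nan,3])  # np.nan is the spelling that works across NumPy versions; the np.NaN alias was removed in NumPy 2.0
>>> s
0    1.0
1    2.0
2    NaN
3    3.0
dtype: float64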

24. Filtering out NaN

>>> s
0    1.0
1    2.0
2    NaN
3    3.0
dtype: float64
>>> s.dropna()  # use the dropna() function
0    1.0
1    2.0
3    3.0
dtype: float64

Alternatively, filter with the notnull() method:

>>> s=pd.Series([1,2,np.nan,3])
>>> s
0    1.0
1    2.0
2    NaN
3    3.0
dtype: float64
>>> s[s.notnull()]
0    1.0
1    2.0
3    3.0
dtype: float64

On a DataFrame, dropna() removes an entire row (or column) as soon as it contains a single NaN element.

>>> frame=pd.DataFrame([[6,np.nan,6],[np.nan,np.nan,np.nan],[2,np.nan,5]],index=['blue','green','red'],columns=['ball','mug','pen'])
>>> frame
       ball  mug  pen
blue    6.0  NaN  6.0
green   NaN  NaN  NaN
red     2.0  NaN  5.0
>>> frame.dropna()
Empty DataFrame
Columns: [ball, mug, pen]
Index: []

To avoid losing rows or columns that are only partly empty, pass the how option with the value 'all', which tells dropna() to remove only the rows or columns whose elements are all NaN.

>>> frame=pd.DataFrame([[6,np.nan,6],[np.nan,np.nan,np.nan],[2,np.nan,5]],index=['blue','green','red'],columns=['ball','mug','pen'])
>>> frame
       ball  mug  pen
blue    6.0  NaN  6.0
green   NaN  NaN  NaN
red     2.0  NaN  5.0
>>> frame.dropna(how='all')
      ball  mug  pen
blue   6.0  NaN  6.0
red    2.0  NaN  5.0
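The same how='all' rule can be applied to columns by adding axis=1. A minimal sketch on the same frame follows; the combination with axis=1 is not shown in the original notes.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[6, np.nan, 6],
                      [np.nan, np.nan, np.nan],
                      [2, np.nan, 5]],
                     index=['blue', 'green', 'red'],
                     columns=['ball', 'mug', 'pen'])

# Drop only the columns in which every element is NaN (here: mug).
without_empty_columns = frame.dropna(axis=1, how='all')
print(without_empty_columns)
```

The green row survives here because only columns are examined.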

25. Filling NaN elements with other values

1) Replace every NaN with the same value, using the fillna() function

>>> frame=pd.DataFrame([[6,np.nan,6],[np.nan,np.nan,np.nan],[2,np.nan,5]],index=['blue','green','red'],columns=['ball','mug','pen'])
>>> frame
       ball  mug  pen
blue    6.0  NaN  6.0
green   NaN  NaN  NaN
red     2.0  NaN  5.0
>>> frame.fillna(0)
       ball  mug  pen
blue    6.0  0.0  6.0
green   0.0  0.0  0.0
red     2.0  0.0  5.0

2) Replace the NaN values of different columns with different values: pass a dict that maps each column name to its replacement value

>>> frame.fillna({'ball':1,'mug':2,'pen':8})
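The per-column replacement needs the column names and values wrapped in a dict. A minimal runnable sketch, using the same frame and the same replacement values:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[6, np.nan, 6],
                      [np.nan, np.nan, np.nan],
                      [2, np.nan, 5]],
                     index=['blue', 'green', 'red'],
                     columns=['ball', 'mug', 'pen'])

# Each key names a column; its value replaces the NaN entries there.
filled = frame.fillna({'ball': 1, 'mug': 2, 'pen': 8})
print(filled)
```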

26. Hierarchical indexing and levels

1) Create a Series with a hierarchical index

>>> import numpy as np
>>> import pandas as pd
>>> s=pd.Series(np.random.rand(8),index=[['a','a','a','b','b','c','c','c'],['up','down','right','up','down','up','down','left']])
>>> s
a  up       0.587733
   down     0.425383
   right    0.356205
b  up       0.251802
   down     0.105830
c  up       0.253041
   down     0.140155
   left     0.425004
dtype: float64

2) Show the index attribute of a Series with a hierarchical index

>>> s.index
MultiIndex(levels=[['a', 'b', 'c'], ['down', 'left', 'right', 'up']],
           labels=[[0, 0, 0, 1, 1, 2, 2, 2], [3, 0, 2, 3, 0, 3, 0, 1]])

3) Select the elements that belong to a label of the first index level

>>> s['a']
up       0.587733
down     0.425383
right    0.356205
dtype: float64

4) Select the elements that belong to a label of the second index level

>>> s[:,'up']  # do not forget the comma
a    0.587733
b    0.251802
c    0.253041
dtype: float64

5) Select one specific element of a Series with a hierarchical index

>>> s['a','up']
0.5877327517004284

6) Convert a Series with a hierarchical index into a DataFrame

>>> s.unstack()
       down      left     right        up
a  0.425383       NaN  0.356205  0.587733
b  0.105830       NaN       NaN  0.251802
c  0.140155  0.425004       NaN  0.253041

7) Convert a DataFrame into a Series with a hierarchical index

>>> frame=s.unstack()  # the frame shown here is the result of the previous step
>>> frame
       down      left     right        up
a  0.425383       NaN  0.356205  0.587733
b  0.105830       NaN       NaN  0.251802
c  0.140155  0.425004       NaN  0.253041
>>> frame.stack()
a  down     0.425383
   right    0.356205
   up       0.587733
b  down     0.105830
   up       0.251802
c  down     0.140155
   left     0.425004
   up       0.253041
dtype: float64

8) Define a DataFrame whose index and columns are both hierarchical

>>> frame=pd.DataFrame(np.random.randn(16).reshape(4,4),index=[['white','white','red','red'],['up','down','up','down']],columns=[['pen','pen','paper','paper'],[1,2,1,2]])
>>> frame
                 pen               paper
                   1         2         1         2
white up   -0.487631  0.200648  0.344613  0.144835
      down  0.246683 -0.847063 -0.391592 -0.091928
red   up   -0.132962 -1.728167  1.787231  0.374895
      down -1.033622  0.354458  0.007813 -1.203889

27. Reordering and sorting levels

>>> frame
                 pen               paper
                   1         2         1         2
white up   -0.487631  0.200648  0.344613  0.144835
      down  0.246683 -0.847063 -0.391592 -0.091928
red   up   -0.132962 -1.728167  1.787231  0.374895
      down -1.033622  0.354458  0.007813 -1.203889
>>> frame.index.names=['colors','status']
>>> frame.columns.names=['objects','id']
>>> frame
objects             pen               paper
id                    1         2         1         2
colors status
white  up     -0.487631  0.200648  0.344613  0.144835
       down    0.246683 -0.847063 -0.391592 -0.091928
red    up     -0.132962 -1.728167  1.787231  0.374895
       down   -1.033622  0.354458  0.007813 -1.203889
>>> frame.swaplevel('colors','status')  # swap the order of the colors and status levels
objects             pen               paper
id                    1         2         1         2
status colors
up     white  -0.487631  0.200648  0.344613  0.144835
down   white   0.246683 -0.847063 -0.391592 -0.091928
up     red    -0.132962 -1.728167  1.787231  0.374895
down   red    -1.033622  0.354458  0.007813 -1.203889
>>> frame
objects             pen               paper
id                    1         2         1         2
colors status
white  up     -0.487631  0.200648  0.344613  0.144835
       down    0.246683 -0.847063 -0.391592 -0.091928
red    up     -0.132962 -1.728167  1.787231  0.374895
       down   -1.033622  0.354458  0.007813 -1.203889
>>> frame.sort_index(level='colors')  # sort the colors level alphabetically (originally frame.sortlevel(), which pandas deprecated in favor of sort_index(level=...))
objects             pen               paper
id                    1         2         1         2
colors status
red    down   -1.033622  0.354458  0.007813 -1.203889
       up     -0.132962 -1.728167  1.787231  0.374895
white  down    0.246683 -0.847063 -0.391592 -0.091928
       up     -0.487631  0.200648  0.344613  0.144835

28. Summary statistics by level

1) To aggregate over one level of the row index, group by the level name with groupby(level=...) and then call the statistical function. Older pandas accepted the level name directly as the level argument of the statistical function, as in frame.sum(level='colors'), but that argument was removed in pandas 2.0.

>>> frame
objects             pen               paper
id                    1         2         1         2
colors status
white  up     -0.487631  0.200648  0.344613  0.144835
       down    0.246683 -0.847063 -0.391592 -0.091928
red    up     -0.132962 -1.728167  1.787231  0.374895
       down   -1.033622  0.354458  0.007813 -1.203889
>>> frame.groupby(level='colors',sort=False).sum()  # sum over the colors row level (originally frame.sum(level='colors'))
objects       pen               paper
id              1         2         1         2
colors
white   -0.240947 -0.646416 -0.046978  0.052907
red     -1.166584 -1.373709  1.795044 -0.828994

2) To aggregate over one level of the columns, transpose the frame, group by the column level, and transpose back. Older pandas wrote this as frame.sum(level='id',axis=1), where axis=1 marked the operation as column-wise.

>>> frame
objects             pen               paper
id                    1         2         1         2
colors status
white  up     -0.487631  0.200648  0.344613  0.144835
       down    0.246683 -0.847063 -0.391592 -0.091928
red    up     -0.132962 -1.728167  1.787231  0.374895
       down   -1.033622  0.354458  0.007813 -1.203889
>>> frame.T.groupby(level='id').sum().T  # sum over the id column level (originally frame.sum(level='id',axis=1))
id                    1         2
colors status
white  up     -0.143017  0.345483
       down   -0.144909 -0.938991
red    up      1.654270 -1.353272
       down   -1.025809 -0.849432
