Introduction to Pandas
pandas is a Python data analysis library. It provides two main data structures, Series and DataFrame. A Series holds one-dimensional labeled data and is commonly used for time-ordered data; a DataFrame holds structured, two-dimensional data.
Installing Pandas
pip install pandas
Reading data in different formats with Pandas
Reading a CSV file
import pandas as pd
df = pd.read_csv('file.csv')
print(df)
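As a minimal sketch of what `read_csv` does, the example below parses CSV text from an in-memory buffer instead of a file on disk, so it runs without `file.csv`. The column names and values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'.
csv_text = "name,num\nMovies,12\nSports,5\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels taken from the header row
```

The first line of the text is treated as the header, so it supplies the column labels and is not counted as a data row.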
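Reading tables from an HTML page
import pandas as pd
# read_html returns a list of DataFrames, one per <table> found on the page
tables = pd.read_html('http://www.reibang.com/u/e635858eda0b')
print(tables)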
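Data structures provided by Pandas
· Series: a one-dimensional labeled array, often used for time series data.
· DataFrame: structured two-dimensional data with an index and column labels.
· Panel: three-dimensional data. Panel is deprecated and has been removed from current pandas releases.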
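To make the one-dimensional versus two-dimensional distinction concrete, here is a small sketch using the `ndim` attribute. The sample values are illustrative only.

```python
import pandas as pd

# Illustrative data, not taken from a real dataset.
s = pd.Series([12, 5, 18])
df = pd.DataFrame({"groups": ["Movies", "Sports", "Fishing"], "num": [12, 5, 18]})

print(s.ndim)           # number of dimensions of a Series
print(df.ndim)          # number of dimensions of a DataFrame
print(type(df["num"]))  # selecting one column of a DataFrame gives a Series
```

A DataFrame can therefore be thought of as a set of Series that share the same index.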
1. Series
From an array (list)
import pandas as pd
languages = ['python', 'ruby', 'c', 'c++']  # avoid naming a variable `list`, which shadows the built-in
select = pd.Series(languages)
print(select)
Output:
0 python
1 ruby
2 c
3 c++
dtype: object
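The default index is 0, 1, 2, and so on. As a sketch, assuming you want your own labels, an `index` argument can be passed when the Series is created. The labels `a` to `d` here are hypothetical.

```python
import pandas as pd

languages = ["python", "ruby", "c", "c++"]

# Hypothetical labels chosen for illustration; one label per value.
select = pd.Series(languages, index=["a", "b", "c", "d"])

print(select["b"])     # lookup by label
print(select.iloc[0])  # lookup by position, independent of the labels
```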
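From a dictionary
import pandas as pd
data = {'key1': '1', 'key2': '2', 'key3': '3'}
# The dictionary keys become the index labels, so no explicit index is needed
select = pd.Series(data)
print(select)
Output (Python 3.7+ keeps the dictionary's insertion order):
key1 1
key2 2
key3 3
dtype: object
# Select by position with .iloc
print(select.iloc[0])
1
print(select.iloc[2])
3
# Select by label
print(select['key3'])
3
# A list of positions returns a Series
print(select.iloc[[2]])
key3 3
dtype: object
print(select.iloc[[0, 2, 1]])
key1 1
key3 3
key2 2
dtype: object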
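When a dictionary is passed together with an `index`, pandas picks the values whose keys match the index labels, in the order given. The sketch below illustrates this; the label `key9` is a hypothetical key that is deliberately missing from the dictionary.

```python
import pandas as pd

data = {"key1": "1", "key2": "2", "key3": "3"}

# 'key9' is not in the dictionary, so its value is filled with NaN.
select = pd.Series(data, index=["key3", "key1", "key9"])

print(select)
print(select.isna().sum())  # number of labels that had no matching key
```

This is one way to reorder or subset dictionary data at construction time.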
From a single (scalar) value
import pandas as pd
string = 'henry'
# A scalar is repeated for every label in the index
select = pd.Series(string, index=range(3))
print(select)
Output:
0 henry
1 henry
2 henry
dtype: object
Slice selection
print(select[1:])
1 henry
2 henry
dtype: object
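Slicing behaves slightly differently by position and by label. The following sketch, with made-up values and labels, shows that a positional slice with `.iloc` excludes the end position while a label slice with `.loc` includes the end label.

```python
import pandas as pd

# Illustrative values and labels.
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_position = s.iloc[0:2]   # positions 0 and 1; position 2 is excluded
by_label = s.loc["a":"c"]   # labels 'a' through 'c'; 'c' is included

print(list(by_position.index))
print(list(by_label.index))
```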
2. DataFrame
2.1 Creating a DataFrame
A DataFrame can be created from a dictionary or an array, or from data read from an external source.
Dictionary
import pandas as pd
groups = ['Movies', 'Sports', 'Conding', 'Fishing', 'Dancing']
num = [12, 5, 18, 99, 88]
data = {'groups': groups, 'num': num}  # each key becomes a column
df = pd.DataFrame(data)
print(df)
Output:
groups num
0 Movies 12
1 Sports 5
2 Conding 18
3 Fishing 99
4 Dancing 88
Array
import pandas as pd
array = [['Movies', 12], ['Sports', 5], ['Conding', 18], ['Fishing', 99], ['Dancing', 88]]
# Each inner list is one row; the columns argument names the columns
df = pd.DataFrame(array, columns=['name', 'num'])
print(df)
Output:
name num
0 Movies 12
1 Sports 5
2 Conding 18
3 Fishing 99
4 Dancing 88
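Another input shape that often comes up, though not covered above, is a list of dictionaries with one dictionary per row. The sketch below assumes the same `name` and `num` columns and uses a shortened, illustrative data set.

```python
import pandas as pd

# One dictionary per row; the keys become the column labels.
records = [
    {"name": "Movies", "num": 12},
    {"name": "Sports", "num": 5},
    {"name": "Fishing", "num": 99},
]

df = pd.DataFrame(records)

print(list(df.columns))
print(df.shape)
```

This layout is convenient when the rows arrive one at a time, for example from parsed JSON objects.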
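2.2 DataFrame operations
DataFrame methods and attributes
.shape returns the number of rows and columns as a tuple
.describe() returns descriptive statistics for the numeric columns
.head() returns the first rows (5 by default)
.tail() returns the last rows (5 by default)
.columns returns the column labels
.index returns the row labels
.info() prints a summary of the index, columns, dtypes, and memory usage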
import pandas as pd
groups = ['Movies', 'Sports', 'Conding', 'Fishing', 'Dancing']
num = [12, 5, 18, 99, 88]
data = {'groups': groups, 'num': num}
df = pd.DataFrame(data)
print(df.shape)
(5, 2)
print(df.describe())
num
count 5.000000
mean 44.400000
std 45.224993
min 5.000000
25% 12.000000
50% 18.000000
75% 88.000000
max 99.000000
print(df.head())
groups num
0 Movies 12
1 Sports 5
2 Conding 18
3 Fishing 99
4 Dancing 88
print(df.columns)
Index(['groups', 'num'], dtype='object')
print(df.index)
RangeIndex(start=0, stop=5, step=1)
# info is a method, so call it with parentheses; it prints its summary itself and returns None
df.info()
# prints the index range, column names, non-null counts, dtypes, and memory usage
print(df.tail(3))
groups num
2 Conding 18
3 Fishing 99
4 Dancing 88
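Unlike `.shape` or `.columns`, `.info` is a method rather than an attribute, so it has to be called. As a sketch, the summary can also be captured as text through the `buf` parameter instead of being printed. The two-row data set is illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({"groups": ["Movies", "Sports"], "num": [12, 5]})

# Calling the method writes the summary into the buffer instead of stdout.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

print(callable(df.info))        # df.info without parentheses is only the method object
print("RangeIndex" in summary)  # the summary text describes the index
```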