Contents
- I. Introduction to pandas data structures
- 1. Series
- 2. DataFrame
- (1) Creating a DataFrame
- (2) Reading a DataFrame's index labels and values
- (3) Reading the values at a specific index
- (4) Assigning values to a DataFrame
- (5) Deleting columns
- 3. Index objects
I. Introduction to pandas data structures
1. Series
A Series resembles a dictionary: it is a sequence of values, each associated with a data label (its index).
Here is the simplest possible Series:
from pandas import Series, DataFrame
import pandas as pd
obj = Series([4, 7, -5, 3])
obj
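Output:
0    4
1    7
2   -5
3    3
dtype: int64
Each value has an index label to its left. A Series object holds an array of index labels (index) and an array of values (values).
obj.values
Output: array([ 4, 7, -5, 3])
obj.index
Output: RangeIndex(start=0, stop=4, step=1)
The index labels can also be customized.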
obj2 = Series([4, 7, -5, 3], index = ['d', 'b', 'a', 'c'])
obj2
Output:
d    4
b    7
a   -5
c    3
dtype: int64
obj2['a']
Output: -5
obj2[['c', 'a', 'd']]
Output:
c    3
a   -5
d    4
dtype: int64
Operations on a Series preserve its index labels.
obj2[obj2 > 0]
Output:
d    4
b    7
c    3
dtype: int64
obj2 * 2
Output:
d     8
b    14
a   -10
c     6
dtype: int64
import numpy as np
np.exp(obj2)
Output:
d      54.598150
b    1096.633158
a       0.006738
c      20.085537
dtype: float64
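The point about preserved labels can be checked directly. The following is a minimal sketch that rebuilds obj2 from the example above; the variable names positive, doubled and exponentiated are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

# Boolean filtering keeps only the labels whose values pass the test.
positive = obj2[obj2 > 0]

# Element-wise arithmetic and NumPy ufuncs return a new Series that
# carries the original labels in the original order.
doubled = obj2 * 2
exponentiated = np.exp(obj2)

print(list(positive.index))
print(doubled["a"])
print(exponentiated.index.equals(obj2.index))
```

In each case the result is a new Series; obj2 itself is left unchanged.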
Series and dictionaries. A Series behaves much like a dictionary and can be passed to many functions that expect a dict argument. A Series can also be created from a dictionary.
'b' in obj2
Output: True
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = Series(sdata)
obj3
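Output:
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
If only a dictionary is passed, the index of the resulting Series is the dictionary's keys. If an index is passed as well, values are matched to it by key: a label with no matching key gets NaN, and a key that is not in the index is dropped.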
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = Series(sdata, index = states)
obj4
Output:
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
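The NaN entry above marks a label that had no matching key in the dictionary. A minimal sketch of how such missing entries can be detected, using the same sdata and states as above; the variable name missing is introduced here for illustration.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
states = ["California", "Ohio", "Oregon", "Texas"]

obj4 = pd.Series(sdata, index=states)

# isna() returns a boolean Series with the same index as obj4.
missing = obj4.isna()

print(missing["California"])  # True: no "California" key in sdata
print("Utah" in obj4)         # False: "Utah" is not in states
print(obj4.dtype)             # float64, because NaN is a float
```

The dtype change from int64 to float64 is a side effect of the missing value, not of the data in sdata.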
A Series has a name attribute, and its index has a name attribute of its own (index.name).
obj4.name = 'population'
obj4.index.name = 'state'
obj4
Output:
state
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
Name: population, dtype: float64
2. DataFrame
A DataFrame can be thought of as a table with both a row index and a column index, or as a collection of Series that share one index. We start with how to create one.
(1) Creating a DataFrame
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
'year': [2000, 2001, 2002, 2001, 2002],
'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
frame
Output:
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
Creating the DataFrame automatically added a row index, 0 through 4.
The column order can be specified when the DataFrame is created.
DataFrame(data, columns = ['year', 'state', 'pop'])
Output:
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
The row index labels can also be customized.
frame2 = DataFrame(data, columns = ['year', 'state', 'pop', 'debt'], index = ['one', 'two', 'three', 'four', 'five'])
frame2
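Output:
       year   state  pop debt
one    2000    Ohio  1.5  NaN
two    2001    Ohio  1.7  NaN
three  2002    Ohio  3.6  NaN
four   2001  Nevada  2.4  NaN
five   2002  Nevada  2.9  NaN
A column that is named in columns but absent from data (here, debt) is filled with NaN.
Another way to create a DataFrame is from a nested dictionary.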
pop = {'Nevada': {2001: 2.4, 2002: 2.9}, 'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = DataFrame(pop)
frame3
Output:
      Nevada  Ohio
2000     NaN   1.5
2001     2.4   1.7
2002     2.9   3.6
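In the nested-dictionary form, the outer keys become the column labels and the inner keys become the row index, with NaN wherever an inner key is missing for a column. The sketch below checks this on the same pop dictionary. The row order may differ between pandas versions, so the example sorts the index before printing instead of relying on a particular order.

```python
import pandas as pd

pop = {"Nevada": {2001: 2.4, 2002: 2.9},
       "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = pd.DataFrame(pop)

# Outer keys -> columns, union of inner keys -> row index.
print(list(frame3.columns))
print(sorted(frame3.index))

# Nevada has no entry for 2000, so that cell is NaN.
print(pd.isna(frame3.loc[2000, "Nevada"]))

# Transposing swaps the two axes: states become rows, years become columns.
print(frame3.T.shape)
```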
(2) Reading a DataFrame's index labels and values
DataFrame.columns returns the column labels.
frame2.columns
Output: Index(['year', 'state', 'pop', 'debt'], dtype='object')
DataFrame.index returns the row labels.
frame2.index
Output: Index(['one', 'two', 'three', 'four', 'five'], dtype='object')
The DataFrame.values attribute returns the underlying data, without the index, as a NumPy array.
frame2.values
Output: array([[2000, 'Ohio', 1.5, nan],
       [2001, 'Ohio', 1.7, nan],
       [2002, 'Ohio', 3.6, nan],
       [2001, 'Nevada', 2.4, nan],
       [2002, 'Nevada', 2.9, nan]], dtype=object)
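The dtype=object in the output above comes from mixing numbers and strings in one array. A small sketch of the difference, using two hypothetical frames (mixed and numeric) that are not part of the original example:

```python
import pandas as pd

# Columns of different kinds: integers and strings.
mixed = pd.DataFrame({"year": [2000, 2001], "state": ["Ohio", "Nevada"]})

# Columns that are all numeric: integers and floats.
numeric = pd.DataFrame({"year": [2000, 2001], "pop": [1.5, 2.4]})

# .values picks one dtype that can hold every column.
print(mixed.values.dtype)    # object
print(numeric.values.dtype)  # float64
print(numeric.values.shape)  # (2, 2)
```

When all columns are numeric, the array keeps a numeric dtype, which is usually what later NumPy code expects.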
(3) Reading the values at a specific index
A column can be read either with ['column name'] or with .column_name attribute access.
frame2['state']
Output:
one        Ohio
two        Ohio
three      Ohio
four     Nevada
five     Nevada
Name: state, dtype: object
frame2.year
Output:
one      2000
two      2001
three    2002
four     2001
five     2002
Name: year, dtype: int64
frame2.year['one']
Output: 2000
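The two access styles are not always interchangeable. Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute, and the pop column in this example is such a clash, since DataFrame already has a pop method. The sketch below uses a shortened, hypothetical frame to show this, along with .loc as one way to read a single cell by row and column label.

```python
import pandas as pd

data = {"state": ["Ohio", "Ohio", "Nevada"],
        "year": [2000, 2001, 2001],
        "pop": [1.5, 1.7, 2.4]}
frame = pd.DataFrame(data, index=["one", "two", "three"])

# Both forms return the same column as a Series.
by_key = frame["year"]
by_attr = frame.year
print(by_key.equals(by_attr))

# "pop" is also the name of a DataFrame method, so attribute access
# returns the method, not the column. Bracket access is unambiguous.
print(callable(frame.pop))
print(frame["pop"]["one"])

# .loc reads a whole row by label, or one cell by (row, column).
row = frame.loc["two"]
cell = frame.loc["one", "year"]
print(row["state"], cell)
```

For this reason bracket access is the safer default in scripts, with attribute access kept for interactive use.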
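(4) Assigning values to a DataFrame
A column can be assigned a scalar, which is broadcast to every row, an array whose length matches the number of rows, or a Series, which is aligned on the index so that rows without a matching label get NaN.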
frame2['debt'] = 16.5
frame2
Output:
       year   state  pop  debt
one    2000    Ohio  1.5  16.5
two    2001    Ohio  1.7  16.5
three  2002    Ohio  3.6  16.5
four   2001  Nevada  2.4  16.5
five   2002  Nevada  2.9  16.5
frame2['debt'] = np.arange(5.)
frame2
Output:
       year   state  pop  debt
one    2000    Ohio  1.5   0.0
two    2001    Ohio  1.7   1.0
three  2002    Ohio  3.6   2.0
four   2001  Nevada  2.4   3.0
five   2002  Nevada  2.9   4.0
val = Series([-1.2, -1.5, -1.7], index = ['two', 'four', 'five'])
frame2['debt'] = val
frame2
Output:
       year   state  pop  debt
one    2000    Ohio  1.5   NaN
two    2001    Ohio  1.7  -1.2
three  2002    Ohio  3.6   NaN
four   2001  Nevada  2.4  -1.5
five   2002  Nevada  2.9  -1.7
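The three kinds of assignment follow different rules, which a small example makes concrete. This is a sketch on a hypothetical three-row frame, not the frame2 above; length_error is a name introduced here to capture the exception.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"year": [2000, 2001, 2002]},
                     index=["one", "two", "three"])

# A scalar is broadcast to every row.
frame["debt"] = 16.5

# An array or list must have exactly one value per row.
frame["debt"] = np.arange(3.0)
try:
    frame["debt"] = [1.0, 2.0]  # two values for three rows
    length_error = None
except ValueError as exc:
    length_error = exc
print(type(length_error).__name__)

# A Series is aligned on the index: unmatched rows become NaN and
# labels that are not in the frame ("four") are ignored.
val = pd.Series([-1.2, -1.5], index=["two", "four"])
frame["debt"] = val
print(frame)
```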
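(5) Deleting columns
Assigning to a column name that does not yet exist creates the column; the del keyword removes one.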
frame2['eastern'] = (frame2.state == 'Ohio')
frame2
Output:
       year   state  pop  debt  eastern
one    2000    Ohio  1.5   NaN     True
two    2001    Ohio  1.7  -1.2     True
three  2002    Ohio  3.6   NaN     True
four   2001  Nevada  2.4  -1.5    False
five   2002  Nevada  2.9  -1.7    False
del frame2['eastern']
frame2.columns
Output: Index(['year', 'state', 'pop', 'debt'], dtype='object')
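The del statement modifies the frame in place. When the original should be kept, DataFrame.drop is an alternative that returns a new frame. A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

frame = pd.DataFrame({"state": ["Ohio", "Nevada"], "year": [2000, 2001]})
frame["eastern"] = frame["state"] == "Ohio"

# drop() returns a new DataFrame and leaves the original untouched.
trimmed = frame.drop(columns=["eastern"])
print(list(frame.columns))    # still includes "eastern"
print(list(trimmed.columns))

# del removes the column from the original frame itself.
del frame["eastern"]
print(list(frame.columns))
```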
3. Index objects
First, let's look at what an Index object is.
obj = Series(range(3), index = ['a', 'b', 'c'])
index = obj.index
index
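Output: Index(['a', 'b', 'c'], dtype='object')
As shown, the index attribute of a Series or DataFrame returns an Index object.
An Index object is immutable: its elements cannot be modified by the user. Immutability matters because it lets one Index be shared safely among multiple data structures.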
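Immutability is easy to see by trying to change one label. In the sketch below, the item assignment raises TypeError, while replacing the whole index of the Series is still allowed and leaves the old Index object as it was. The name error is introduced here to capture the exception.

```python
import pandas as pd

obj = pd.Series(range(3), index=["a", "b", "c"])
index = obj.index

# Changing a single element of an Index is not permitted.
try:
    index[1] = "d"
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)

# Assigning a complete new index to the Series is allowed; the old
# Index object held in `index` is not modified by this.
obj.index = ["x", "y", "z"]
print(list(obj.index))
print(list(index))
```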
Index has many commonly used methods; see the table below.