Creating a DataFrame:
import numpy as np
import pandas as pd
# Each dictionary key becomes a column; each list holds that column's values
data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                 'Lions', 'Lions'],
        'wins': [11, 8, 10, 15, 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]
        }
football = pd.DataFrame(data)
print(football)
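Output (from an older pandas release, which sorted dictionary keys alphabetically; recent releases keep the insertion order of the keys):
   losses     team  wins  year
0       5    Bears    11  2010
1       8    Bears     8  2011
2       6    Bears    10  2012
3       1  Packers    15  2011
4       5  Packers    11  2012
5      10    Lions     6  2010
6       6    Lions    10  2011
7      12    Lions     4  2012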
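Because the column order of a dictionary-built DataFrame can differ between pandas versions, it may help to pin it explicitly. A minimal sketch, assuming the same `data` dictionary as above; the particular order chosen here is only an illustration:

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                 'Lions', 'Lions'],
        'wins': [11, 8, 10, 15, 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}

# The `columns` argument fixes the column order explicitly
# (the order below is an arbitrary choice for this example).
football = pd.DataFrame(data, columns=['year', 'team', 'wins', 'losses'])
print(football.columns.tolist())
```

With `columns` given, the frame should come out in the requested order no matter how the dictionary keys would otherwise be arranged.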
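Pandas also provides several methods and attributes that help you understand basic information about a DataFrame:
- dtypes: returns the data type of each column
- describe: returns summary statistics (count, mean, standard deviation, min, quartiles, max) for the numeric columns
- head: shows the first 5 rows of the dataset
- tail: shows the last 5 rows of the dataset
Test: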
import pandas as pd

data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                 'Lions', 'Lions'],
        'wins': [11, 8, 10, 15, 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
football = pd.DataFrame(data)
print(football.dtypes)      # data type of each column (an attribute, not a method)
print("")
print(football.describe())  # summary statistics for the numeric columns
print("")
print(football.head())      # first 5 rows
print("")
print(football.tail())      # last 5 rows
Result:
# Data types
losses     int64
team      object
wins       int64
year       int64
dtype: object

# Summary statistics
          losses       wins         year
count   8.000000   8.000000     8.000000   # count
mean    6.625000   9.375000  2011.125000   # mean
std     3.377975   3.377975     0.834523   # standard deviation
min     1.000000   4.000000  2010.000000   # minimum
25%     5.000000   7.500000  2010.750000
50%     6.000000  10.000000  2011.000000
75%     8.500000  11.000000  2012.000000
max    12.000000  15.000000  2012.000000   # maximum

   losses     team  wins  year
0       5    Bears    11  2010
1       8    Bears     8  2011
2       6    Bears    10  2012
3       1  Packers    15  2011
4       5  Packers    11  2012

   losses     team  wins  year
3       1  Packers    15  2011
4       5  Packers    11  2012
5      10    Lions     6  2010
6       6    Lions    10  2011
7      12    Lions     4  2012
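To make the summary rows a little less opaque, the sketch below recomputes two of them by hand and shows that head and tail accept a row count. It assumes the same football data as above; the variable names (`stats`, `manual_mean`, `manual_std`, `top3`, `last2`) are made up for this illustration:

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                 'Lions', 'Lions'],
        'wins': [11, 8, 10, 15, 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
football = pd.DataFrame(data)

stats = football.describe()

# The 'mean' row is the plain arithmetic mean of each numeric column.
manual_mean = football['losses'].sum() / len(football)

# The 'std' row is the sample standard deviation (divides by n - 1),
# which is also what Series.std() computes by default.
manual_std = football['wins'].std(ddof=1)

print(manual_mean, stats.loc['mean', 'losses'])
print(manual_std, stats.loc['std', 'wins'])

# head() and tail() default to 5 rows but accept a row count.
top3 = football.head(3)
last2 = football.tail(2)
print(top3)
print(last2)
```

Note that the non-numeric team column is left out of the summary, which matches the three-column describe output shown above.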
Another code example:
from pandas import DataFrame, Series
#################
# Syntax Reminder:
#
# The following code would create a two-column pandas DataFrame
# named df with columns labeled 'name' and 'age':
#
# people = ['Sarah', 'Mike', 'Chrisna']
# ages = [28, 32, 25]
# df = DataFrame({'name' : Series(people),
# 'age' : Series(ages)})
def create_dataframe():
    '''
    Create a pandas dataframe called 'olympic_medal_counts_df' containing
    the data from the table of 2014 Sochi winter olympics medal counts.
    The columns for this dataframe should be called
    'country_name', 'gold', 'silver', and 'bronze'.
    There is no need to specify row indexes for this dataframe
    (in this case, the rows will automatically be assigned numbered indexes).
    You do not need to call the function in your code when running it in the
    browser - the grader will do that automatically when you submit or test it.
    '''
    countries = ['Russian Fed.', 'Norway', 'Canada', 'United States',
                 'Netherlands', 'Germany', 'Switzerland', 'Belarus',
                 'Austria', 'France', 'Poland', 'China', 'Korea',
                 'Sweden', 'Czech Republic', 'Slovenia', 'Japan',
                 'Finland', 'Great Britain', 'Ukraine', 'Slovakia',
                 'Italy', 'Latvia', 'Australia', 'Croatia', 'Kazakhstan']
    gold = [13, 11, 10, 9, 8, 8, 6, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    silver = [11, 5, 10, 7, 7, 6, 3, 0, 8, 4, 1, 4, 3, 7, 4, 2, 4, 3, 1, 0, 0, 2, 2, 2, 1, 0]
    bronze = [9, 10, 5, 12, 9, 5, 2, 1, 5, 7, 1, 2, 2, 6, 2, 4, 3, 1, 2, 1, 0, 6, 2, 1, 0, 1]
    # The docstring asks for a 'country_name' column, so use that key
    data = {
        'country_name': countries,
        'gold': gold,
        'silver': silver,
        'bronze': bronze}
    olympic_medal_counts_df = DataFrame(data)
    return olympic_medal_counts_df
Result of printing the returned DataFrame:
    bronze    country_name  gold  silver
0        9    Russian Fed.    13      11
1       10          Norway    11       5
2        5          Canada    10      10
3       12   United States     9       7
4        9     Netherlands     8       7
5        5         Germany     8       6
6        2     Switzerland     6       3
7        1         Belarus     5       0
8        5         Austria     4       8
9        7          France     4       4
10       1          Poland     4       1
11       2           China     3       4
12       2           Korea     3       3
13       6          Sweden     2       7
14       2  Czech Republic     2       4
15       4        Slovenia     2       2
16       3           Japan     1       4
17       1         Finland     1       3
18       2   Great Britain     1       1
19       1         Ukraine     1       0
20       0        Slovakia     1       0
21       6           Italy     0       2
22       2          Latvia     0       2
23       1       Australia     0       2
24       0         Croatia     0       1
25       1      Kazakhstan     0       0
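A quick way to check a frame like this against the exercise's requirements is to compare its column names with the ones the docstring asks for and then inspect it with the methods covered earlier. A rough sketch, using only the first three rows of the medal table for brevity; `expected_columns` and `medals` are names invented for this example:

```python
from pandas import DataFrame

# Only the first three rows of the medal table, to keep the example short.
countries = ['Russian Fed.', 'Norway', 'Canada']
gold = [13, 11, 10]
silver = [11, 5, 10]
bronze = [9, 10, 5]

medals = DataFrame({'country_name': countries,
                    'gold': gold,
                    'silver': silver,
                    'bronze': bronze})

# Column names requested by the exercise docstring.
expected_columns = ['country_name', 'gold', 'silver', 'bronze']

# Compare as sorted lists so the check does not depend on column order.
print(sorted(medals.columns) == sorted(expected_columns))

print(medals.dtypes)   # country_name is object, the medal counts are integers
print(medals.head())   # rows get the default numbered index 0, 1, 2
```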