Data source: https://pan.baidu.com/s/1EFqJFXf70t2Rubkh6D19aw (extraction code: syqg)
Exploring US crime data, 1960-2014
Step 1: Import the necessary libraries
import pandas as pd
import numpy as np
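Step 2: Load the dataset from the following path
path1='pandas_exercise/exercise_data/US_Crime_Rates_1960_2014.csv' # forward slashes work on every OS and avoid the invalid '\e' escape
Step 3: Read it into a DataFrame named crime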
crime=pd.read_csv(path1)
print(crime.head())
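Step 4: What is the data type of each column? Use info
print(crime.info()) # info() prints the summary itself and returns None, hence the trailing None in the output
Step 5: Convert the dtype of Year to datetime64 with pd.to_datetime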
crime['Year']=pd.to_datetime(crime.Year,format='%Y')
print(crime.head())
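As a small illustration of why the format argument matters in step 5, here is a minimal sketch on a made-up three-row frame (toy, naive and parsed are hypothetical names, not part of the exercise data). Without format='%Y', pandas treats plain integers as nanoseconds since the Unix epoch, so 1960 would land in 1970.

```python
import pandas as pd

# Hypothetical toy data standing in for the Year column of the real CSV.
toy = pd.DataFrame({"Year": [1960, 1961, 1962]})

# Integers with no format are treated as nanoseconds since 1970-01-01.
naive = pd.to_datetime(toy["Year"])

# With format="%Y" each integer is parsed as a four-digit year.
parsed = pd.to_datetime(toy["Year"], format="%Y")

print(naive.dt.year.tolist())   # years produced by the naive conversion
print(parsed.dt.year.tolist())  # years produced with the explicit format
```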
Step 6: Set the Year column as the index of the DataFrame with set_index
crime=crime.set_index('Year',drop=True)
print(crime.head())
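One practical payoff of a datetime index is that rows can be selected with year strings. The sketch below assumes a made-up four-row frame (toy and early are hypothetical names, and the counts are invented) rather than the real crime table.

```python
import pandas as pd

# Hypothetical toy frame with invented counts, one row per year.
toy = pd.DataFrame(
    {"Violent": [10, 20, 30, 40]},
    index=pd.to_datetime(["1960", "1961", "1962", "1963"], format="%Y"),
)
toy.index.name = "Year"

# Year strings select every row that falls inside that range of years.
early = toy.loc["1961":"1962"]
print(early)
```

The same kind of slice should work on crime once Year is its index, for example crime.loc['1990':'1999'].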
Step 7: Delete the column named Total with del
del crime['Total']
print(crime.head())
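del changes the DataFrame in place. If the original table should stay intact, DataFrame.drop returns a new frame instead; a minimal sketch on invented data (toy and trimmed are hypothetical names):

```python
import pandas as pd

# Hypothetical toy frame; the numbers are invented.
toy = pd.DataFrame({"Total": [3, 7], "Violent": [1, 2], "Property": [2, 5]})

# drop() returns a copy without the column and leaves toy untouched.
trimmed = toy.drop(columns="Total")

print(list(toy.columns))
print(list(trimmed.columns))
```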
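Step 8: Group the DataFrame by decade and sum, using time-series resampling (resample)
crimes=crime.resample('10AS').sum() # sum every column over each 10-year bin; recent pandas releases deprecate 'AS' in favor of 'YS'
crimes['Population']=crime['Population'].resample('10AS').max() # a summed population is meaningless, so replace it with each decade's maximum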
print(crimes)
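The sum-then-overwrite pattern in step 8 can also be written as a single pass by giving agg a per-column mapping. This is only a sketch on invented data (toy and decades are hypothetical names), and it uses the '10YS' alias, which is assumed here to be equivalent to '10AS'.

```python
import pandas as pd

# Hypothetical annual data: 20 year-start timestamps from 1960 to 1979.
years = pd.date_range("1960-01-01", periods=20, freq="YS")
toy = pd.DataFrame(
    {"Population": range(100, 120), "Violent": [1] * 20},
    index=years,
)

# Keep the peak population per decade, but sum the crime counts.
decades = toy.resample("10YS").agg({"Population": "max", "Violent": "sum"})
print(decades)
```

With many crime columns, the mapping could be built programmatically, for example 'sum' for every column except Population.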
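Step 9: When was the most dangerous period to live in in US history?
print(crime.idxmax(0)) # idxmax() returns, for each column, the index label of its maximum value along the index axis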
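Because the call in step 9 runs on the annual table, it reports peak years. For a decade-level answer, the same idxmax call could be applied to a per-decade table such as crimes from step 8. The sketch below shows the idea on invented decade totals (decade_totals and peak are hypothetical names):

```python
import pandas as pd

# Hypothetical per-decade totals; the numbers are invented for illustration.
decade_totals = pd.DataFrame(
    {"Violent": [4, 9, 14, 17, 13], "Burglary": [5, 12, 11, 8, 6]},
    index=pd.to_datetime(["1960", "1970", "1980", "1990", "2000"], format="%Y"),
)

# For each column, find the decade (index label) holding the largest total.
peak = decade_totals.idxmax(axis=0)
print(peak.dt.year)
```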
Output
# Step 3
   Year  Population    Total  ...  Burglary  Larceny_Theft  Vehicle_Theft
0  1960   179323175  3384200  ...    912100        1855400         328200
1  1961   182992000  3488000  ...    949600        1913000         336000
2  1962   185771000  3752200  ...    994300        2089600         366800
3  1963   188483000  4109500  ...   1086400        2297800         408300
4  1964   191141000  4564600  ...   1213200        2514400         472800
[5 rows x 12 columns]
# Step 4
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Year                55 non-null     int64
 1   Population          55 non-null     int64
 2   Total               55 non-null     int64
 3   Violent             55 non-null     int64
 4   Property            55 non-null     int64
 5   Murder              55 non-null     int64
 6   Forcible_Rape       55 non-null     int64
 7   Robbery             55 non-null     int64
 8   Aggravated_assault  55 non-null     int64
 9   Burglary            55 non-null     int64
 10  Larceny_Theft       55 non-null     int64
 11  Vehicle_Theft       55 non-null     int64
dtypes: int64(12)
memory usage: 5.3 KB
None
# Step 5
        Year  Population    Total  ...  Burglary  Larceny_Theft  Vehicle_Theft
0 1960-01-01   179323175  3384200  ...    912100        1855400         328200
1 1961-01-01   182992000  3488000  ...    949600        1913000         336000
2 1962-01-01   185771000  3752200  ...    994300        2089600         366800
3 1963-01-01   188483000  4109500  ...   1086400        2297800         408300
4 1964-01-01   191141000  4564600  ...   1213200        2514400         472800
[5 rows x 12 columns]
# Step 6
            Population    Total  ...  Larceny_Theft  Vehicle_Theft
Year                             ...
1960-01-01   179323175  3384200  ...        1855400         328200
1961-01-01   182992000  3488000  ...        1913000         336000
1962-01-01   185771000  3752200  ...        2089600         366800
1963-01-01   188483000  4109500  ...        2297800         408300
1964-01-01   191141000  4564600  ...        2514400         472800
[5 rows x 11 columns]
# Step 7
            Population  Violent  ...  Larceny_Theft  Vehicle_Theft
Year                             ...
1960-01-01   179323175   288460  ...        1855400         328200
1961-01-01   182992000   289390  ...        1913000         336000
1962-01-01   185771000   301510  ...        2089600         366800
1963-01-01   188483000   316970  ...        2297800         408300
1964-01-01   191141000   364220  ...        2514400         472800
[5 rows x 10 columns]
# Step 8
            Population   Violent  ...  Larceny_Theft  Vehicle_Theft
Year                              ...
1960-01-01   201385000   4134930  ...       26547700        5292100
1970-01-01   220099000   9607930  ...       53157800        9739900
1980-01-01   248239000  14074328  ...       72040253       11935411
1990-01-01   272690813  17527048  ...       77679366       14624418
2000-01-01   307006550  13968056  ...       67970291       11412834
2010-01-01   318857056   6072017  ...       30401698        3569080
[6 rows x 10 columns]
# Step 9
Population           2014-01-01
Violent              1992-01-01
Property             1991-01-01
Murder               1991-01-01
Forcible_Rape        1992-01-01
Robbery              1991-01-01
Aggravated_assault   1993-01-01
Burglary             1980-01-01
Larceny_Theft        1991-01-01
Vehicle_Theft        1991-01-01
dtype: datetime64[ns]