First, we import the libraries we will use.
import numpy as np
import pandas as pd
Beyond that, we need to read and write CSV files
from pandas import read_csv
Read our test data
df=pd.read_csv('data_study.csv')
>>> df
num class name sex english sport army math possity space
0 10 1 mary woman 80 80 90 75.0 60 65
1 28 1 land man 80 50 69 70.0 58 70
2 15 2 asnx man 80 69 80 75.0 90 94
3 18 4 david man 90 80 86 85.0 95 62
4 19 2 gry woman 90 50 64 NaN 64 85
5 20 2 kitty woman 84 58 97 94.0 63 21
6 14 3 lury woman 98 77 88 0.0 55 40
7 21 1 facy man 55 68 94 52.0 36 48
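The file data_study.csv itself is not included here, so the following is a minimal sketch that rebuilds the same table by hand, assuming the printed output above is the complete data set. Every value is copied from that output; only the construction through a dict is our own addition.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv('data_study.csv'): the values are copied from the
# printed table, including the single missing math score (NaN).
df = pd.DataFrame({
    "num": [10, 28, 15, 18, 19, 20, 14, 21],
    "class": [1, 1, 2, 4, 2, 2, 3, 1],
    "name": ["mary", "land", "asnx", "david", "gry", "kitty", "lury", "facy"],
    "sex": ["woman", "man", "man", "man", "woman", "woman", "woman", "man"],
    "english": [80, 80, 80, 90, 90, 84, 98, 55],
    "sport": [80, 50, 69, 80, 50, 58, 77, 68],
    "army": [90, 69, 80, 86, 64, 97, 88, 94],
    "math": [75.0, 70.0, 75.0, 85.0, np.nan, 94.0, 0.0, 52.0],
    "possity": [60, 58, 90, 95, 64, 63, 55, 36],
    "space": [65, 70, 94, 62, 85, 21, 40, 48],
})

print(df.shape)         # rows and columns
print(df.isna().sum())  # missing values per column
```

With this frame in place, the remaining steps can be followed without the CSV file.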
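Next, we clean the data
df.duplicated()  # flag rows that repeat an earlier row
df=df.drop_duplicates()  # drop duplicate rows (returns a new frame, so assign it back)
Fill missing values with 0
df=df.fillna(0)  # fillna also returns a new frame, so assign it back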
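One detail is easy to miss here: duplicated(), drop_duplicates() and fillna() all return new objects and leave the original frame untouched unless the result is assigned. A small sketch on a made-up frame (the name toy and its values are purely illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: row 1 repeats row 0, and row 2 has a missing value.
toy = pd.DataFrame({"a": [1, 1, 2], "b": [5.0, 5.0, np.nan]})

dup_mask = toy.duplicated()                 # boolean Series, True for repeated rows
cleaned = toy.drop_duplicates().fillna(0)   # chained calls, each returns a new frame

print(dup_mask.tolist())
print(cleaned)
print(len(toy), int(toy["b"].isna().sum()))  # the original frame is unchanged
```

Assigning the result back, as in df = df.fillna(0), is what makes the change stick.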
To make the processing easier, we make copies of the data
df1=df.copy()
df2=df.copy()
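Why copy() rather than a plain assignment? A plain assignment only creates a second name for the same object, while copy() gives an independent frame. A brief sketch with hypothetical names (original, alias, independent):

```python
import pandas as pd

original = pd.DataFrame({"math": [75, 70]})

alias = original               # same object under a second name
independent = original.copy()  # separate data

independent.loc[0, "math"] = 0  # does not touch original
alias.loc[1, "math"] = -1       # changes original as well

print(original["math"].tolist())
print(independent["math"].tolist())
```

This is why df1 and df2 can later be modified in different ways without affecting each other or df.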
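Check the data types and collect the columns that are not int
>>> ty = list(df1.columns)  # assumption: ty was not defined in the original; the column names match the output below
>>> noint = []
>>> for i in ty:
...     if df1[i].dtype=='O':  # 'O' is the object dtype, used here for strings
...         noint.append(i)
...
>>> noint
['name', 'sex']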
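The same check can probably be written without an explicit loop by using select_dtypes. The sketch below uses a small made-up frame (toy) rather than the real data, and excludes every numeric dtype instead of testing for the object dtype:

```python
import pandas as pd

# Hypothetical two-row sample with the same kinds of columns as the real data.
toy = pd.DataFrame({
    "name": ["mary", "land"],
    "sex": ["woman", "man"],
    "english": [80, 80],
    "math": [75.0, 70.0],
})

# Keep only the columns whose dtype is not numeric, then list their names.
non_numeric = toy.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```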
Add a total score column
df2['total_score']=df2['english']+df2['sport']+df2['army']+df2['math']+df2['possity']+df2['space']
>>> df2
num class name sex english sport army math possity space total_score
0 10 1 mary woman 80 80 90 75 60 65 450
1 28 1 land man 80 50 69 70 58 70 397
2 15 2 asnx man 80 69 80 75 90 94 488
3 18 4 david man 90 80 86 85 95 62 498
4 19 2 gry woman 90 50 64 0 64 85 353
5 20 2 kitty woman 84 58 97 94 63 21 417
6 14 3 lury woman 98 77 88 0 55 40 358
7 21 1 facy man 55 68 94 52 36 48 353
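Adding six columns by hand gets long quickly. As a sketch, the total can also be computed by selecting the subject columns and summing across each row; the list name subjects is our own, and the two rows are copied from the table above:

```python
import pandas as pd

# First two rows of the table above, subject columns only.
toy = pd.DataFrame({
    "english": [80, 80],
    "sport": [80, 50],
    "army": [90, 69],
    "math": [75, 70],
    "possity": [60, 58],
    "space": [65, 70],
})

subjects = ["english", "sport", "army", "math", "possity", "space"]
toy["total_score"] = toy[subjects].sum(axis=1)  # axis=1 sums across each row

print(toy["total_score"].tolist())
```

Note one difference: sum() skips NaN by default, whereas adding columns with + yields NaN for any row that has a missing value.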
Group the data into score bands
>>> bins=[df2.total_score.min()-1,400,450,df2.total_score.max()+1]
>>> label=['common','good','perfect']
>>> df2_list=pd.cut(df2.total_score,bins,right=False,labels=label)
>>> df2['catalogy']=df2_list
>>> df2
num class name sex english ... math possity space total_score catalogy
0 10 1 mary woman 80 ... 75 60 65 450 perfect
1 28 1 land man 80 ... 70 58 70 397 common
2 15 2 asnx man 80 ... 75 90 94 488 perfect
3 18 4 david man 90 ... 85 95 62 498 perfect
4 19 2 gry woman 90 ... 0 64 85 353 common
5 20 2 kitty woman 84 ... 94 63 21 417 good
6 14 3 lury woman 98 ... 0 55 40 358 common
7 21 1 facy man 55 ... 52 36 48 353 common
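To see how the bin edges behave, here is a small self-contained sketch that applies the same pd.cut call to the total scores printed above. With right=False each bin includes its left edge and excludes its right edge, which is why a score of exactly 450 lands in 'perfect' rather than 'good':

```python
import pandas as pd

# Total scores copied from the df2 output above.
scores = pd.Series([450, 397, 488, 498, 353, 417, 358, 353])

bins = [scores.min() - 1, 400, 450, scores.max() + 1]  # [352, 400, 450, 499]
labels = ["common", "good", "perfect"]

# right=False gives the intervals [352, 400), [400, 450), [450, 499)
groups = pd.cut(scores, bins, right=False, labels=labels)

print(groups.tolist())
print(groups.value_counts())
```

Padding the outer edges by 1 keeps the minimum and maximum scores inside the outer bins.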
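Of course, beyond this we also need to normalize the data. Here each score column is rescaled to the range [0, 1] with min-max scaling
>>> for i in list(df1.columns[4:]):
...     df1[i]=(df1[i]-df1[i].min())/(df1[i].max()-df1[i].min())
...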
>>> df1
num class name sex english sport army math possity space
0 10 1 mary woman 0.581395 1.000000 0.787879 0.797872 0.406780 0.602740
1 28 1 land man 0.581395 0.000000 0.151515 0.744681 0.372881 0.671233
2 15 2 asnx man 0.581395 0.633333 0.484848 0.797872 0.915254 1.000000
3 18 4 david man 0.813953 1.000000 0.666667 0.904255 1.000000 0.561644
4 19 2 gry woman 0.813953 0.000000 0.000000 0.000000 0.474576 0.876712
5 20 2 kitty woman 0.674419 0.266667 1.000000 1.000000 0.457627 0.000000
6 14 3 lury woman 1.000000 0.900000 0.727273 0.000000 0.322034 0.260274
7 21 1 facy man 0.000000 0.600000 0.909091 0.553191 0.000000 0.369863
>>> df1['total_score']=df1['english']+df1['sport']+df1['army']+df1['math']+df1['possity']+df1['space']
>>> df1
num class name sex english sport army math possity space total_score
0 10 1 mary woman 0.581395 1.000000 0.787879 0.797872 0.406780 0.602740 4.176666
1 28 1 land man 0.581395 0.000000 0.151515 0.744681 0.372881 0.671233 2.521706
2 15 2 asnx man 0.581395 0.633333 0.484848 0.797872 0.915254 1.000000 4.412704
3 18 4 david man 0.813953 1.000000 0.666667 0.904255 1.000000 0.561644 4.946519
4 19 2 gry woman 0.813953 0.000000 0.000000 0.000000 0.474576 0.876712 2.165242
5 20 2 kitty woman 0.674419 0.266667 1.000000 1.000000 0.457627 0.000000 3.398712
6 14 3 lury woman 1.000000 0.900000 0.727273 0.000000 0.322034 0.260274 3.209581
7 21 1 facy man 0.000000 0.600000 0.909091 0.553191 0.000000 0.369863 2.432145
>>> bins=[df1.total_score.min()-1,3,4,df1.total_score.max()+1]
>>> label=['common','good','perfect']
>>> df1_list=pd.cut(df1.total_score,bins,right=False,labels=label)
>>> df1['catalogy']=df1_list
>>>
>>> df1
num class name sex english ... math possity space total_score catalogy
0 10 1 mary woman 0.581395 ... 0.797872 0.406780 0.602740 4.176666 perfect
1 28 1 land man 0.581395 ... 0.744681 0.372881 0.671233 2.521706 common
2 15 2 asnx man 0.581395 ... 0.797872 0.915254 1.000000 4.412704 perfect
3 18 4 david man 0.813953 ... 0.904255 1.000000 0.561644 4.946519 perfect
4 19 2 gry woman 0.813953 ... 0.000000 0.474576 0.876712 2.165242 common
5 20 2 kitty woman 0.674419 ... 1.000000 0.457627 0.000000 3.398712 good
6 14 3 lury woman 1.000000 ... 0.000000 0.322034 0.260274 3.209581 good
7 21 1 facy man 0.000000 ... 0.553191 0.000000 0.369863 2.432145 common
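The min-max scaling loop used above can also be written in vectorized form, since pandas aligns the per-column minimum and range with the frame's columns. A minimal sketch on a made-up subset of the data (the names toy, cols, col_min and col_range are our own):

```python
import pandas as pd

# Four sample rows for two of the score columns.
toy = pd.DataFrame({"english": [80, 90, 98, 55], "sport": [80, 50, 77, 68]})
cols = ["english", "sport"]

col_min = toy[cols].min()               # per-column minimum (a Series)
col_range = toy[cols].max() - col_min   # per-column max minus min

# (x - min) / (max - min), applied to every selected column at once
scaled = (toy[cols] - col_min) / col_range

print(scaled)
```

A column whose values are all identical has a range of 0 and would come out as NaN, so such columns need separate handling.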
That covers the basics of simple data processing.