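Pandas Data Processing (Part 1): Mastering a Few Simple Functions

Anyone who has done data processing in Python is probably familiar with Pandas. It is an indispensable package for data work, and its biggest strength is efficiency. This article walks through some basic Pandas usage with examples.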

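1. Reading data

Most data can be loaded with the read_csv() function. Its sep parameter specifies the field delimiter and defaults to "," (because most CSV files separate values with commas). The file used here is delimited with "|", so sep must be set explicitly.

import pandas as pd

# Read the pipe-delimited data
users = pd.read_csv("https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user", sep='|')
users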

Raw data:

[Screenshot: the raw u.user file]

Data after loading:

[Screenshot: the loaded DataFrame]

Besides read_csv, the commonly used read_table function can also load data. Its usage is similar to read_csv, except that its default separator is a tab.
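To make the role of sep concrete without depending on a network download, here is a minimal sketch that parses a few illustrative rows laid out like u.user. The sample rows and the names raw, users_csv, and users_table are assumptions made for this example.

```python
import io
import pandas as pd

# Hypothetical sample rows in the same pipe-delimited layout as u.user
raw = (
    "user_id|age|gender|occupation|zip_code\n"
    "1|24|M|technician|85711\n"
    "2|53|F|other|94043\n"
    "3|23|M|writer|32067\n"
)

# read_csv defaults to sep=',', so the '|' delimiter must be given explicitly
users_csv = pd.read_csv(io.StringIO(raw), sep='|')

# read_table defaults to a tab separator, so sep='|' is needed here as well
users_table = pd.read_table(io.StringIO(raw), sep='|')

print(users_csv.shape)                 # (3, 5)
print(users_csv.equals(users_table))   # True
```

With an explicit sep, both functions produce the same DataFrame; they differ only in the default delimiter.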

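2. Changing the index and showing only the first few rows

set_index() changes the index. Note that you need to pass inplace = True to modify the DataFrame in place; otherwise set_index() returns a new DataFrame and leaves the original unchanged. head(n) shows only the first n rows.

users.set_index('user_id', inplace=True)
users.head(25)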


tail(n) shows only the last n rows.
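The difference between calling set_index() with and without inplace is easy to miss, so here is a small sketch on a made-up frame. The data and the name indexed are assumptions for illustration. It uses the reassignment style, which is a common alternative to inplace = True.

```python
import pandas as pd

# Hypothetical miniature version of the users table
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "age": [24, 53, 23, 33],
    "occupation": ["technician", "other", "writer", "other"],
})

# Without inplace=True, set_index returns a new DataFrame; users is unchanged
indexed = users.set_index("user_id")
print(users.index.name)     # None
print(indexed.index.name)   # user_id

# head(n) / tail(n) return the first / last n rows
print(indexed.head(2).index.tolist())   # [1, 2]
print(indexed.tail(1).index.tolist())   # [4]
```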

3. Inspecting basic row and column information

1. shape returns the number of rows and columns as a tuple.

users.shape  # (943, 4)

2. columns returns the column names.

users.columns  # Index(['age', 'gender', 'occupation', 'zip_code'], dtype='object')

3. index returns the row labels.

users.index
# Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,
#             ...
#             934, 935, 936, 937, 938, 939, 940, 941, 942, 943],
#            dtype='int64', name='user_id', length=943)

4. dtypes returns the data type of each column.

users.dtypes
# age            int64
# gender        object
# occupation    object
# zip_code      object
# dtype: object
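These four attributes can be tried on any small frame. The sketch below uses made-up rows (an assumption for this example), including one non-numeric zip code to show why a column like zip_code would not be read as an integer column.

```python
import pandas as pd

# Hypothetical rows; "T8H1N" is an invented non-numeric zip code
users = pd.DataFrame(
    {
        "age": [24, 53, 23],
        "gender": ["M", "F", "M"],
        "zip_code": ["85711", "94043", "T8H1N"],
    },
    index=pd.Index([1, 2, 3], name="user_id"),
)

n_rows, n_cols = users.shape        # shape is a plain (rows, columns) tuple
print(n_rows, n_cols)               # 3 3
print(list(users.columns))          # ['age', 'gender', 'zip_code']
print(users.index.tolist())         # [1, 2, 3]
print(users.dtypes)                 # one dtype per column
```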

4. Selecting one or more columns

Pandas offers several ways to do this. Note: users here is a Pandas DataFrame.

1. users.column_name (returns a Series)

users.occupation

2. users[['column_name']] (the double brackets return a DataFrame)

users[['occupation']]

3. users.loc[:, ['column_name']]

users.loc[:,['occupation']]


To select several columns at once:

1. users[['column_name_1', 'column_name_2']]

users[['occupation','age']]

2. users.loc[:, ['column_name_1', 'column_name_2']]

users.loc[:,['occupation','age']]
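The selection styles above differ in what they return. This is a minimal sketch on made-up data (the frame and variable names are assumptions). One caveat for attribute access: users.column_name only works when the name is a valid Python identifier and does not clash with an existing DataFrame attribute, so bracket access is the safer general form.

```python
import pandas as pd

# Hypothetical data for illustration
users = pd.DataFrame({
    "age": [24, 53, 23],
    "occupation": ["technician", "other", "writer"],
})

as_series = users["occupation"]                  # same result as users.occupation
as_frame = users[["occupation"]]                 # double brackets keep a DataFrame
via_loc = users.loc[:, ["occupation", "age"]]    # all rows, listed columns in order

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame
print(list(via_loc.columns))      # ['occupation', 'age']
```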


5. Counting distinct values in a column

1. column_name.nunique() returns the number of distinct values in a column.

users.occupation.nunique()  # 21

The same result can be obtained this way:

column_name.value_counts().count()

users.occupation.value_counts().count()  # 21

Building on item 1, to see how many times each distinct value appears in the column, use the following:

users.column_name.value_counts()

users.occupation.value_counts().head()
# student          196
# other            105
# educator          95
# administrator     79
# engineer          67
# Name: occupation, dtype: int64
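The relationship between nunique() and value_counts() can be checked on a tiny made-up Series (the values below are assumptions for illustration). Both skip missing values by default, which is why the two counts agree.

```python
import pandas as pd

# Hypothetical occupation values
occupation = pd.Series(
    ["student", "other", "student", "engineer", "student", "other"],
    name="occupation",
)

print(occupation.nunique())            # 3

# value_counts sorts by frequency, most common first
counts = occupation.value_counts()
print(counts.to_dict())                # {'student': 3, 'other': 2, 'engineer': 1}

# One entry per distinct value, so the length matches nunique()
print(counts.count() == occupation.nunique())   # True
```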

6. Summary statistics for numeric columns

users.describe() does this. By default it summarizes only the numeric columns (columns whose values are all numbers).

users.describe()


You can also summarize every column by passing include = 'all':

users.describe(include = 'all')


users.column_name.describe() summarizes a single column:

users.occupation.describe()
# count         943
# unique         21
# top       student
# freq          196
# Name: occupation, dtype: object
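Since the screenshots of the describe() output are not reproduced here, the following sketch shows the three call styles on a small made-up frame (the data and variable names are assumptions). The result of describe() is itself a DataFrame or Series, so individual statistics can be looked up by label.

```python
import pandas as pd

# Hypothetical data for illustration
users = pd.DataFrame({
    "age": [20, 30, 40],
    "occupation": ["student", "other", "student"],
})

numeric = users.describe()                    # numeric columns only
everything = users.describe(include="all")    # numeric and non-numeric columns
single = users["occupation"].describe()       # one column

print(list(numeric.columns))          # ['age']
print(numeric.loc["mean", "age"])     # 30.0
print(list(everything.columns))       # ['age', 'occupation']
print(single["top"], single["freq"])  # student 2
```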

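7. Grouping data

groupby groups the rows by the values in a column and returns a GroupBy object. It is similar to the methods in section 5; the difference is that groupby uses the grouped column as the key and lets you compute statistics on the other columns.

c = users.groupby("occupation")
c
# <pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000017673002788>

GroupBy.head(n) returns the first n rows of each group.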

c.head(5)

GroupBy.count() counts, for each group, the non-null values in each of the other columns.

c.count()

GroupBy.size() returns the number of rows in each group.

c.size()
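The difference between count() and size() only shows up when a group contains missing values, so this sketch uses a made-up frame with one missing age (the data is an assumption for illustration).

```python
import pandas as pd

# Hypothetical data; one student has a missing age
users = pd.DataFrame({
    "age": [24, None, 23, 33, 40],
    "occupation": ["student", "student", "writer", "student", "writer"],
})

grouped = users.groupby("occupation")

# size() counts rows per group, including rows with missing values
print(grouped.size().to_dict())           # {'student': 3, 'writer': 2}

# count() counts only non-null values per column
print(grouped["age"].count().to_dict())   # {'student': 2, 'writer': 2}

# Other aggregations follow the same pattern
print(grouped["age"].mean().to_dict())    # {'student': 28.5, 'writer': 31.5}

# head(1) keeps the first row of each group, so two rows in total
print(len(grouped.head(1)))               # 2
```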

Many other functions can be applied to a GroupBy object. See the official documentation for details: https://pandas.pydata.org/docs/reference/groupby.html

8. Sorting data by a column

Use data.sort_values(). It sorts in ascending order by default; pass ascending = False to sort in descending order.

users.sort_values(["age"],ascending = False)

You can also sort by multiple columns:

users.sort_values(["age","zip_code"],ascending = False)
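When sorting by several columns, ascending also accepts a list with one flag per column. This is a small sketch on made-up data (an assumption for illustration). Note that zip_code holds strings here, so it is compared as text rather than as numbers.

```python
import pandas as pd

# Hypothetical data with tied ages
users = pd.DataFrame({
    "age": [30, 25, 30, 25],
    "zip_code": ["20000", "90000", "10000", "50000"],
})

# age descending, then zip_code ascending among rows with equal age
ordered = users.sort_values(["age", "zip_code"], ascending=[False, True])

print(ordered.index.tolist())   # [2, 0, 3, 1]
print(users.index.tolist())     # [0, 1, 2, 3] (sort_values returns a new frame)
```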


9. Creating a new column

Adding a new column is simple: create a Series (with the same number of rows as the original data) and assign it to the DataFrame.

data['column_name'] = new_series. Below, I normalize the values in age and store the result in a new column, age_normalize.

[Screenshot: the age_normalize column added to users]
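The text does not spell out which normalization formula is used. As a minimal sketch, assuming min-max scaling to the range [0, 1] (an assumption, not necessarily the original approach), a new column can be created like this on a made-up frame:

```python
import pandas as pd

# Hypothetical ages for illustration
users = pd.DataFrame({"age": [20, 30, 40, 60]})

# Assumption: min-max scaling, (x - min) / (max - min)
age_min = users["age"].min()
age_max = users["age"].max()
users["age_normalize"] = (users["age"] - age_min) / (age_max - age_min)

print(users["age_normalize"].tolist())   # [0.0, 0.25, 0.5, 1.0]
```

Another common choice is the z-score, (x - mean) / std. Either way the result is a Series with the same index as users, which is what makes the assignment line up row by row.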

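10. Deleting a column

drop() removes the specified columns. It returns a new DataFrame and leaves the original unchanged unless you reassign the result or pass inplace = True.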

users.drop(['age'],axis = 1)

Here axis specifies whether to drop rows or columns. It defaults to 0; 0 means rows and 1 means columns. You can also use the following directly:

users.drop(columns =['age'])
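The two drop() spellings can be compared on a small made-up frame (the data and variable names are assumptions for illustration). The sketch also shows that the original frame is left untouched and that the default axis = 0 drops rows by index label.

```python
import pandas as pd

# Hypothetical data for illustration
users = pd.DataFrame({
    "age": [24, 53],
    "gender": ["M", "F"],
    "occupation": ["technician", "other"],
})

by_axis = users.drop(["age"], axis=1)
by_keyword = users.drop(columns=["age"])      # equivalent and more explicit

print(list(by_keyword.columns))       # ['gender', 'occupation']
print(by_axis.equals(by_keyword))     # True
print(list(users.columns))            # ['age', 'gender', 'occupation']

# With the default axis=0, drop removes rows by index label
without_first_row = users.drop([0])
print(without_first_row.index.tolist())   # [1]
```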

