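An Illustrated Guide to the Pandas assign Function

WeChat official account: 尤而小屋
Author: Peter
Editor: Peter

Hi everyone, I'm Peter~

This article introduces a very useful function in the Pandas library: assign

When processing data, we sometimes need to compute a new column from an existing one for later use, that is, derive a new column from known columns. The assign function is very convenient for this. The examples below show how to use it.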

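Pandas articles

This is the 21st article in the Pandas series, which is divided into 3 categories:

Basics: articles 1-16 cover basic and commonly used Pandas operations, such as creating data, selecting and querying, ranking and sorting, and handling missing or duplicate values

Advanced: from article 17 onward, the series covers advanced Pandas operations

Pandas vs. SQL: learning Pandas by comparing SQL and Pandas operations side by side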


Parameters

The assign function takes a single parameter: DataFrame.assign(**kwargs).

**kwargs: dict of {str: callable or Series}

A few notes on the parameter:

The keywords are the column names
If a value is callable, it is computed on the DataFrame and assigned to the new column
If a value is not callable (for example a Series, a scalar, or an array), it is assigned directly

Finally, the function returns a new DataFrame containing all the existing columns plus the newly created ones
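The rules above can be illustrated with a small sketch. The column names `doubled`, `flag`, `score`, and `col 3`, as well as the data, are made up for illustration. The last call uses ordinary Python dictionary unpacking to pass a column name that is not a valid identifier.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [12, 16, 18]})

result = df.assign(
    doubled=lambda x: x["col1"] * 2,   # callable: called with the DataFrame
    flag=1,                            # scalar: broadcast to every row
    score=np.array([0.1, 0.2, 0.3]),   # array: assigned as-is (length must match)
)

# Column names that are not valid Python identifiers can be passed
# by unpacking a dict into keyword arguments.
spaced = df.assign(**{"col 3": df["col1"] + 1})

print(result)
print(list(df.columns))  # the original still has only col1
```

Each call returns a new DataFrame, so `df` keeps its single column throughout.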

Importing libraries

import pandas as pd
import numpy as np

# Sample data

df = pd.DataFrame({
  "col1":[12, 16, 18],
  "col2":["xiaoming","peter", "mike"]})

df

   col1      col2
0    12  xiaoming
1    16     peter
2    18      mike

Examples

When a value is callable, it is computed directly on the DataFrame:

Method 1: calling on the DataFrame

# Method 1: the callable is called on the DataFrame df
# Use the col1 attribute of df to generate col3

df.assign(col3=lambda x: x.col1 / 2 + 20)

   col1      col2  col3
0    12  xiaoming  26.0
1    16     peter  28.0
2    18      mike  29.0

Checking the original df shows that it is unchanged

df  # the original DataFrame is unchanged


Operating on string data:

df.assign(col3=df["col2"].str.upper())


Method 2: using Series data

The same result can be achieved by directly referencing an existing Series or sequence:

# Method 2: compute from an existing Series

df.assign(col4=df["col1"] * 3 / 4 + 25)
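As a quick check that the two styles agree, here is a minimal sketch that builds col4 both ways and compares the results. The variable names `via_callable` and `via_series` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"col1": [12, 16, 18]})

# Method 1: a callable that receives the DataFrame
via_callable = df.assign(col4=lambda x: x["col1"] * 3 / 4 + 25)

# Method 2: a Series computed from df before assign is called
via_series = df.assign(col4=df["col1"] * 3 / 4 + 25)

print(via_callable.equals(via_series))  # True
print(via_series["col4"].tolist())      # [34.0, 37.0, 38.5]
```

The two forms differ mainly in when the expression is evaluated: the Series is computed from `df` up front, while the callable is evaluated inside assign on the DataFrame it is called on.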


df  # the original data is unchanged


In Python 3.6+, we can create multiple columns in a single assign call, and a column can depend on another column defined in the same call. In other words, columns created along the way can be used immediately:

df.assign(
    col5=lambda x: x["col1"] / 2 + 10,
    col6=lambda x: x["col5"] * 5,  # col6 uses col5 directly
    col7=lambda x: x.col2.str.upper(),
    col8=lambda x: x.col7.str.title()  # col8 uses col7
)
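One detail worth spelling out: a column can only depend on an earlier column from the same call when it is written as a callable. The sketch below, using a trimmed-down version of the sample data, contrasts the working form with a failing one that references `df["col5"]` directly.

```python
import pandas as pd

df = pd.DataFrame({"col1": [12, 16, 18]})

# Works: the lambda for col6 receives a frame that already contains col5.
ok = df.assign(
    col5=lambda x: x["col1"] / 2 + 10,
    col6=lambda x: x["col5"] * 5,
)
print(ok)

# Fails: df["col5"] is looked up on the original df before assign runs,
# and the original has no col5 column.
try:
    df.assign(col5=lambda x: x["col1"] / 2 + 10, col6=df["col5"] * 5)
except KeyError as exc:
    print("KeyError:", exc)
```

Keyword order matters as well: col6 must come after col5, because the columns are created in the order the keywords are given.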


df   # the original data is unchanged


If we assign to a column that already exists, its values are overwritten in the returned DataFrame:

df.assign(col1=df["col1"] / 2)  # col1 is overwritten in the result

   col1      col2
0   6.0  xiaoming
1   8.0     peter
2   9.0      mike
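Note that the overwrite applies only to the returned DataFrame. A minimal sketch, with the illustrative name `halved` for the result, makes this visible:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [12, 16, 18],
    "col2": ["xiaoming", "peter", "mike"],
})

halved = df.assign(col1=df["col1"] / 2)

print(halved["col1"].tolist())  # [6.0, 8.0, 9.0] in the returned DataFrame
print(df["col1"].tolist())      # [12, 16, 18] in the original DataFrame

# To keep the change, bind the result to a name,
# for example: df = df.assign(col1=df["col1"] / 2)
```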

Comparison with the apply function

In pandas, the same result can also be achieved with the apply function

df  # original data


Make copies and operate directly on the copies:

df1 = df.copy()  # make a copy and operate on it directly
df2 = df.copy()

df1


df1.assign(col3=lambda x: x.col1 / 2 + 20)


df1  # df1 remains unchanged


df1["col3"] = df1["col1"].apply(lambda x:x / 2 + 20)

df1  # df1 has changed


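We find that after the assign call the original data is unchanged, whereas after the apply-based step the data has changed. Strictly speaking, the change comes from the column assignment df1["col3"] = ..., not from apply itself, which also returns a new object.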
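To separate the two effects, the following sketch runs apply on its own, then combines it with assign, and only at the end performs the column assignment. The names `col3` and `combined` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [12, 16, 18]})

# apply returns a new Series; df1 is untouched at this point.
col3 = df1["col1"].apply(lambda x: x / 2 + 20)
print(list(df1.columns))  # ['col1']

# apply and assign can be combined, still without modifying df1.
combined = df1.assign(col3=df1["col1"].apply(lambda x: x / 2 + 20))
print(list(df1.columns))  # ['col1']

# The in-place change comes from the column assignment statement.
df1["col3"] = col3
print(list(df1.columns))  # ['col1', 'col3']
```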

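BMI

Finally, let's simulate another dataset and compute each person's BMI.

Body mass index (BMI) is an internationally used measure of how heavy or lean a person is and whether their weight is healthy.

$\mathrm{BMI} = \frac{\text{weight}}{\text{height}^2}$

where weight is in kg and height is in m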

df2 = pd.DataFrame({
    "name":["xiaoming","xiaohong","xiaosu"],
    "weight":[78,65,87],
    "height":[1.82,1.75,1.89]
})

df2

       name  weight  height
0  xiaoming      78    1.82
1  xiaohong      65    1.75
2    xiaosu      87    1.89

# Using the assign function

df2.assign(BMI=df2["weight"] / (df2["height"] ** 2))
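For reference, here is a self-contained sketch of the same calculation written as a callable. Rounding to two decimals is an addition for readability, not part of the original example, and the name `bmi` is illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    "name": ["xiaoming", "xiaohong", "xiaosu"],
    "weight": [78, 65, 87],        # kg
    "height": [1.82, 1.75, 1.89],  # m
})

# Same formula as above, written as a callable and rounded to 2 decimals.
bmi = df2.assign(BMI=lambda x: (x["weight"] / x["height"] ** 2).round(2))

print(bmi["BMI"].tolist())   # [23.55, 21.22, 24.36]
print("BMI" in df2.columns)  # False: df2 itself has no BMI column here
```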


df2  # unchanged


df2["BMI"] = df2["weight"] / (df2["height"] ** 2)

df2  # df2 now has a new column: BMI


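Summary

From the examples above, we find:

The DataFrame returned by assign is a new object; the original data is not modified
assign can create several columns in one call, and columns created along the way can be used immediately
The main difference between the assign and apply approaches shown here: assign leaves the original data unchanged, while the apply approach, through the assignment df1["col3"] = ..., adds the new column to the original data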
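Because assign returns a new DataFrame, calls can be chained with other DataFrame methods. The following is only a sketch of that pattern using the sample data from earlier; the sorting and index reset steps are illustrative additions and do not appear in the examples above.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [12, 16, 18],
    "col2": ["xiaoming", "peter", "mike"],
})

result = (
    df
    .assign(col3=lambda x: x["col1"] / 2 + 20)      # derive a numeric column
    .assign(col4=lambda x: x["col2"].str.upper())   # derive a string column
    .sort_values("col3", ascending=False)           # continue the chain
    .reset_index(drop=True)
)

print(result)
print(list(df.columns))  # ['col1', 'col2']: df is untouched
```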

© Copyright belongs to the author. For reprints or content partnerships, please contact the author.
