Pandas怎樣實現(xiàn)DataFrame的Merge
Pandas的Merge狐粱,相當(dāng)于Sql的Join施禾,將不同的表按key關(guān)聯(lián)到一個表
merge的語法:
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
left_index=False, right_index=False, sort=True,
suffixes=('_x', '_y'), copy=True, indicator=False,
validate=None)
- left,right:要merge的dataframe或者有name的Series
- how:join類型车海,'left', 'right', 'outer', 'inner'
- on:join的key赶么,left和right都需要有這個key
- left_on:left的df或者series的key
- right_on:right的df或者seires的key
- left_index,right_index:使用index而不是普通的column做join
- suffixes:兩個元素的后綴替久,如果列有重名凉泄,自動添加后綴,默認(rèn)是('_x', '_y')
文檔地址:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
本章終點
- 電影數(shù)據(jù)集的join實例
- 理解merge時一對一蚯根、一對多后众、多對多的數(shù)量對齊關(guān)系
- 理解left join、right join颅拦、inner join蒂誉、outer join的區(qū)別
- 如果出現(xiàn)非Key的字段重名怎么辦
一、電影數(shù)據(jù)集的join實例
import pandas as pd
df_ratings = pd.read_csv(
r"D:\node\nd\Pandas_study\pandas_test\ratings.dat",
sep="::",
engine='python',
names="UserID::MovieID::Rating::Timestamp".split("::")
)
ratings = df_ratings.head()
print(ratings)
df_users = pd.read_csv(
r"D:\node\nd\Pandas_study\pandas_test\users.dat",
sep="::",
engine='python',
names="UserID::Gender::Age::Occupation::Zip-code".split("::")
)
users = df_users.head()
print(users)
df_movies = pd.read_csv(
r"D:\node\nd\Pandas_study\pandas_test\movies.dat",
sep="::",
engine='python',
names="MovieID::Title::Genres".split("::")
)
movies = df_movies.head()
print(movies)
1.評分?jǐn)?shù)據(jù)和用戶數(shù)據(jù)進(jìn)行關(guān)聯(lián)
df_ratings_user = pd.merge(
df_ratings,df_users,left_on="UserID",right_on="UserID",how = "inner"
)
print(df_ratings_user.head())
2距帅、df_ratings_user形成的新表和電影表關(guān)聯(lián)
df_ratings_user_movie = pd.merge(
df_ratings_user,df_movies,left_on="MovieID",right_on="MovieID",how="inner"
)
二右锨、解merge時一對一、一對多碌秸、多對多的數(shù)量對齊關(guān)系
以下關(guān)系要正確理解:
- one-to-one:一對一關(guān)系绍移,關(guān)聯(lián)的key都是唯一的
- 比如(學(xué)號,姓名) merge (學(xué)號讥电,年齡)
-
結(jié)果條數(shù)為:1*1
left = pd.DataFrame({'sno': [11, 12, 13, 14],
'name': ['name_a', 'name_b', 'name_c', 'name_d']
})
print(left)
right = pd.DataFrame({'sno': [11, 12, 13, 14],
'age': ['21', '22', '23', '24']
})
print(right)
a = pd.merge(
left,right,on="sno"
)
print(a)
- one-to-many:一對多關(guān)系蹂窖,左邊唯一key,右邊不唯一key
- 比如(學(xué)號恩敌,姓名) merge (學(xué)號瞬测,[語文成績、數(shù)學(xué)成績、英語成績])
-
結(jié)果條數(shù)為:1*N
left = pd.DataFrame({'sno': [11, 12, 13, 14],
'name': ['name_a', 'name_b', 'name_c', 'name_d']
})
print(left)
right = pd.DataFrame({'sno': [11, 11, 11, 12, 12, 13],
'grade': ['語文88', '數(shù)學(xué)90', '英語75','語文66', '數(shù)學(xué)55', '英語29']
})
print(right)
a = pd.merge(
left,right,on="sno"
)
print(a)
- many-to-many:多對多關(guān)系月趟,左邊右邊都不是唯一的
- 比如(學(xué)號灯蝴,[語文成績、數(shù)學(xué)成績孝宗、英語成績]) merge (學(xué)號绽乔,[籃球、足球碳褒、乒乓球])
-
結(jié)果條數(shù)為:M*N
left = pd.DataFrame({'sno': [11, 11, 12, 12,12],
'愛好': ['籃球', '羽毛球', '乒乓球', '籃球', "足球"]
})
print(left)
right = pd.DataFrame({'sno': [11, 11, 11, 12, 12, 13],
'grade': ['語文88', '數(shù)學(xué)90', '英語75','語文66', '數(shù)學(xué)55', '英語29']
})
print(right)
a = pd.merge(
left,right,on="sno"
)
print(a)
三、理解left join看疗、right join沙峻、inner join、outer join的區(qū)別
3-1 inner join 默認(rèn)
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
print(left)
right = pd.DataFrame({'key': ['K0', 'K1', 'K4', 'K5'],
'C': ['C0', 'C1', 'C4', 'C5'],
'D': ['D0', 'D1', 'D4', 'D5']})
print(right)
a = pd.merge(
left,right,how="inner"
)
print(a)
3-2 left join 左邊都會出現(xiàn)在結(jié)果里两芳,右邊的如果無法匹配則為null
b = pd.merge(
left,right,how="left"
)
print(b)
3-3 right join右邊都會出現(xiàn)在結(jié)果里摔寨,左邊的如果無法匹配則為null
c = pd.merge(
left,right,how="right"
)
print(c)
3-5 outer join 左邊、右邊都會出現(xiàn)在結(jié)果里怖辆,如果無法匹配則為null
d = pd.merge(
left,right,how="outer"
)
print(d)
四是复、如果出現(xiàn)非Key的字段重名怎么辦
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K4', 'K5'],
'A': ['A10', 'A11', 'A12', 'A13'],
'D': ['D0', 'D1', 'D4', 'D5']})
print(left)
print(right)
a = pd.merge(
left,right,on="key"
)
print(a)
b = pd.merge(
#suffixes指定相同參數(shù)的后綴
left,right,on="key",suffixes=("_left","_right")
)
print(b)