31. Load the telecom customer churn dataset
df = pd.read_csv("Telco-Customer-Churn.csv")
print(df.head(5))
32. Count the missing values in each column
print(df.isnull())
print(df.isnull().sum())
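Note that isnull() only catches real NaN/None values. A column that stores missing entries as blank strings will report zero missing values. The following is a minimal sketch on made-up toy data (the frame `toy` and its values are illustrative assumptions, not rows from the dataset) showing one way to count whitespace-only cells as well:

```python
import pandas as pd

# Made-up toy data; only the column names echo the dataset above.
toy = pd.DataFrame({
    "tenure": [1, 0, 24],
    "TotalCharges": ["29.85", " ", "1889.5"],
})

# isnull() finds nothing here, because " " is a valid (non-null) string.
null_counts = toy.isnull().sum()

# Count cells that are strings made up only of whitespace.
blank_counts = toy.apply(
    lambda col: col.map(lambda v: isinstance(v, str) and v.strip() == "").sum()
)

print(null_counts)
print(blank_counts)
```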
33. Set the correct data types for the columns
print(df.info())
print(df["TotalCharges"].value_counts())
# Blank strings (" ") stand for missing values: coerce them to NaN, then fill with the median
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
median = df["TotalCharges"].median()
df["TotalCharges"] = df["TotalCharges"].fillna(median)
print()
print(df["TotalCharges"].value_counts())
34. Convert the categorical fields to the category type
print(df.columns)
number_columns = ["tenure", "MonthlyCharges", "TotalCharges"]
for column in number_columns:
    df[column] = df[column].astype(float)
for column in set(df.columns) - set(number_columns):
    df[column] = pd.Categorical(df[column])
print(df.info())
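To see what the category type stores, here is a small sketch on a made-up Series (the name `contract` and its values are illustrative assumptions). A categorical keeps the unique labels once and represents each row as an integer code:

```python
import pandas as pd

# Made-up values for illustration only.
contract = pd.Series(["Month-to-month", "One year", "Month-to-month", "Two year"])
as_cat = pd.Categorical(contract)

# The unique labels, sorted, are stored once.
print(as_cat.categories)
# Each row is an integer index into those labels.
print(as_cat.codes)
```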
35. Summary statistics for the category-type fields
print(df.describe(include=["category"]))
36. Distribution of the Churn field
print(df["Churn"].value_counts())
37. View MonthlyCharges statistics across multiple dimensions
print(df.columns)
print(df.groupby(["Churn", "PaymentMethod"])["MonthlyCharges"].mean())
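The grouped result is a Series with a two-level index. As a sketch on made-up toy data (the frame `toy` and its numbers are illustrative assumptions), unstack() can pivot one level into columns so the means read as a table:

```python
import pandas as pd

# Made-up toy data; values are not taken from the dataset.
toy = pd.DataFrame({
    "Churn": ["Yes", "No", "Yes", "No"],
    "PaymentMethod": ["Electronic check", "Mailed check",
                      "Electronic check", "Electronic check"],
    "MonthlyCharges": [70.0, 20.0, 90.0, 50.0],
})

# Mean MonthlyCharges for each (Churn, PaymentMethod) pair.
means = toy.groupby(["Churn", "PaymentMethod"])["MonthlyCharges"].mean()

# Move PaymentMethod from the row index into the columns.
table = means.unstack("PaymentMethod")
print(table)
```

Combinations that never occur in the data show up as NaN in the pivoted table.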
38. Map the values of the Churn field
print(df["Churn"].value_counts())
# Map to 0/1 and cast to int so Churn becomes a numeric column rather than a category
df["Churn"] = df["Churn"].map({"Yes": 1, "No": 0}).astype(int)
print()
print(df["Churn"].value_counts())
39. View the correlation matrix of the fields
print(df.head(3))
print(df.info())
print()
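# Only numeric columns can be correlated; recent pandas versions raise an error otherwise
print(df.corr(numeric_only=True))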
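As a rough illustration of how to read a correlation matrix, the sketch below uses made-up toy data (the frame `toy` and its values are assumptions for illustration). The `numeric_only` argument of DataFrame.corr skips non-numeric columns:

```python
import pandas as pd

# Made-up toy data; values are not taken from the dataset.
toy = pd.DataFrame({
    "tenure": [1, 2, 3, 4],
    "MonthlyCharges": [10.0, 20.0, 30.0, 40.0],
    "Churn": [1, 1, 0, 0],
    "Contract": ["a", "b", "a", "b"],
})

# Pearson correlation between the numeric columns; "Contract" is skipped.
corr = toy.corr(numeric_only=True)
print(corr)

# Correlation of every numeric column with Churn, sorted ascending.
print(corr["Churn"].sort_values())
```

In this toy frame, tenure and MonthlyCharges rise together, while Churn falls as tenure grows, so its correlation with tenure is negative.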
40. Sample rows from the dataset
print(df.sample(10))
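sample() draws different rows on every run. If a reproducible draw is wanted, one option is the `random_state` argument; `frac` samples a fraction of the rows instead of a fixed count. A minimal sketch on a made-up frame (the name `toy` and its contents are illustrative assumptions):

```python
import pandas as pd

# Made-up frame with 100 rows.
toy = pd.DataFrame({"customer": range(100)})

# The same random_state gives the same rows each time.
first = toy.sample(10, random_state=42)
second = toy.sample(10, random_state=42)
print(first.equals(second))

# Sample 10% of the rows instead of a fixed number.
tenth = toy.sample(frac=0.1, random_state=0)
print(len(tenth))
```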
Course reference link: https://ke.qq.com/course/4000626#term_id=104152097