For the Kaggle Titanic competition, I'll start by following someone else's work, to learn their feature engineering and parameter tuning.
https://www.kaggle.com/startupsci/titanic-data-science-solutions
This is the most-upvoted kernel, so let's start there.
Background: the Titanic sank in the early 20th century with 2,224 people aboard, of whom only about 32% survived. The dataset describes the passengers; the task is to clean and model that information to predict whether a given passenger survived.
# data analysis and wrangling
import pandas as pd
import numpy as np
import random as rnd
# visualization
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# machine learning
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import Perceptron
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
These are the packages we'll need.
The data: https://www.kaggle.com/c/titanic/data (download it yourself).
train_df = pd.read_csv('../input/train.csv')
test_df = pd.read_csv('../input/test.csv')
combine = [train_df, test_df]
數(shù)據(jù)讀進(jìn)來(lái)之后讓我們看看都有哪些特征呢
print(train_df.columns.values)
['PassengerId' 'Survived' 'Pclass' 'Name' 'Sex' 'Age' 'SibSp' 'Parch'
'Ticket' 'Fare' 'Cabin' 'Embarked']
Which of these features are categorical, and which are numerical?
Categorical: Survived, Sex, and Embarked. Ordinal: Pclass.
Note that Pclass is ordinal: its values carry an order.
The numerical features split into continuous and discrete values:
Continuous: Age, Fare. Discrete: SibSp, Parch.
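If you'd rather not sort the columns by eye, a quick check by dtype gets most of the way there (a small addition of mine, not in the original kernel; dtypes alone can't tell you Pclass is ordinal, that takes domain knowledge):
# split columns by dtype: numeric vs. object (string-like)
num_cols = train_df.select_dtypes(include=[np.number]).columns.tolist()
obj_cols = train_df.select_dtypes(include=['object']).columns.tolist()
print('numeric:', num_cols)
print('object :', obj_cols)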
看一下數(shù)據(jù)吧:
train_df.head()
數(shù)據(jù)里有哪些是混合型特征呢?
Ticket is a mix of numeric and alphanumeric data types. Cabin is alphanumeric.可以看到Ticket和Cabin都是字母數(shù)字混合型碴犬,但是Cabin字母是有序的
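A quick look at raw values confirms this (a small check, not in the original kernel):
# a few raw Ticket and Cabin values to show the mixed formats
print(train_df['Ticket'].head(8).tolist())
print(train_df['Cabin'].dropna().head(8).tolist())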
So which features may contain errors or typos?
That's hard to check in a large dataset, but with a small one like this we can spot problems by inspection.
For example, the Name column has spelling variants, titles, abbreviations, and so on.
Some features also need completing, since they contain null values.
Cabin > Age > Embarked features contain a number of null values, in that order, for the training dataset.
Cabin > Age are incomplete in the test dataset.
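To verify that missing-value ordering directly (again my addition, not in the original kernel):
# nulls per column, most incomplete first
print(train_df.isnull().sum().sort_values(ascending=False).head())
print(test_df.isnull().sum().sort_values(ascending=False).head())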
Good. Now look at the feature counts and dtypes:
train_df.info()
print('_'*40)
test_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId 891 non-null int64
Survived 891 non-null int64
Pclass 891 non-null int64
Name 891 non-null object
Sex 891 non-null object
Age 714 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Ticket 891 non-null object
Fare 891 non-null float64
Cabin 204 non-null object
Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
________________________________________
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 11 columns):
PassengerId 418 non-null int64
Pclass 418 non-null int64
Name 418 non-null object
Sex 418 non-null object
Age 332 non-null float64
SibSp 418 non-null int64
Parch 418 non-null int64
Ticket 418 non-null object
Fare 417 non-null float64
Cabin 91 non-null object
Embarked 418 non-null object
dtypes: float64(2), int64(4), object(5)
memory usage: 36.0+ KB
對(duì)數(shù)據(jù)做一些簡(jiǎn)單的統(tǒng)計(jì)分析吧唠椭!
Total samples are 891 or 40% of the actual number of passengers on board the Titanic (2,224).(891個(gè)樣本占總體2224個(gè)人的40%)
Survived is a categorical feature with 0 or 1 values.(是否獲救用0,1來(lái)代表,1為獲救)
Around 38% samples survived representative of the actual survival rate at 32%.(樣本中獲救率為38%忍饰,而真實(shí)獲救率為32%)
Most passengers (> 75%) did not travel with parents or children.(超過(guò)75%的人沒有和父母孩子旅行)
Nearly 30% of the passengers had siblings and/or spouse aboard.(有30%的人有配偶或兄弟姐妹在船上)
Fares varied significantly with few passengers (<1%) paying as high as $512.(少于1%付了最高可達(dá)512美刀的船費(fèi))
Few elderly passengers (<1%) within age range 65-80.(65-80歲的旅行者很少贪嫂,少于1%)
train_df.describe()
Names are unique across the dataset (count = unique = 891).
Sex has two possible values, 65% male (top=male, freq=577 / count=891).
Cabin values have several duplicates across samples; alternatively, several passengers shared a cabin.
Embarked takes three possible values; the S port was used by most passengers (top=S).
Ticket has a high ratio (22%) of duplicate values (unique=681).
train_df.describe(include=['O'])
數(shù)據(jù)分析的假設(shè)##
特征相關(guān)性分析
看看不同特征和生存率之間的關(guān)系
補(bǔ)全數(shù)據(jù)
1.需要補(bǔ)全年齡這個(gè)特征
2.港口信息也要補(bǔ)全
這兩個(gè)特征和獲救率有很大關(guān)系
修正特征
票號(hào)這個(gè)特征重復(fù)高饶深,沒有實(shí)際用途餐曹,需要?jiǎng)h除
游客編號(hào)也要?jiǎng)h掉
船艙號(hào)可能也要?jiǎng)h除
創(chuàng)造新特征
這個(gè)后面會(huì)寫到
分類
婦女,小孩敌厘,倉(cāng)位等級(jí)高的人更容易獲救
數(shù)據(jù)分組觀察##
現(xiàn)在要依據(jù)不同特征類別分組進(jìn)行觀察台猴。
train_df[['Pclass', 'Survived']].groupby(['Pclass'], as_index=False).mean().sort_values(by='Survived', ascending=False)
train_df[["Sex", "Survived"]].groupby(['Sex'], as_index=False).mean().sort_values(by='Survived', ascending=False)
Try the remaining features the same way; see the example below.
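For example, for SibSp (this one appears in the original kernel too; Parch is analogous):
train_df[['SibSp', 'Survived']].groupby(['SibSp'], as_index=False).mean().sort_values(by='Survived', ascending=False)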
After the grouped tables, it's time to visualize the data##
Tables are never as clear as charts; visualization is a key step.
Observations.
Infants (Age <= 4) had a high survival rate.
The oldest passengers (Age = 80) survived.
A large number of 15-25 year olds did not survive.
Most passengers are in the 15-35 age range.
Decisions.
We should consider Age (classifying assumption #2) in our model training.
Complete the Age feature for null values (completing #1).
We should band age groups (creating #3).
g = sns.FacetGrid(train_df, col='Survived')
g.map(plt.hist, 'Age', bins=20)
Ordinal features
Observations.
Pclass=3 had the most passengers, but most of them did not survive. Confirms classifying assumption #2.
Infant passengers in Pclass=2 and Pclass=3 mostly survived. Further qualifies classifying assumption #2.
Most passengers in Pclass=1 survived. Confirms classifying assumption #3.
Pclass varies in terms of the Age distribution of passengers.
Decisions.
Consider Pclass for model training.
# grid = sns.FacetGrid(train_df, col='Pclass', hue='Survived')
# note: seaborn >= 0.9 renamed FacetGrid's `size` argument to `height`
grid = sns.FacetGrid(train_df, col='Survived', row='Pclass', size=2.2, aspect=1.6)
grid.map(plt.hist, 'Age', alpha=.5, bins=20)
grid.add_legend();
Correlating categorical features
Observations.
Female passengers had a much better survival rate than males. Confirms classifying (#1).
Exception: in Embarked=C, males had a higher survival rate. This could be a correlation between Pclass and Embarked, and in turn between Pclass and Survived, not necessarily a direct correlation between Embarked and Survived.
Males had a better survival rate in Pclass=3 when compared with Pclass=2 for C and Q ports. Completing (#2).
Ports of embarkation have varying survival rates for Pclass=3 among male passengers. Correlating (#1).
Decisions.
Add the Sex feature to model training.
Complete the Embarked feature and add it to model training.
# grid = sns.FacetGrid(train_df, col='Embarked')
grid = sns.FacetGrid(train_df, row='Embarked', size=2.2, aspect=1.6)
grid.map(sns.pointplot, 'Pclass', 'Survived', 'Sex', palette='deep')
grid.add_legend()
相關(guān)分類和數(shù)字特征
Observations.
Higher fare paying passengers had better survival. Confirms our assumption for creating (#4) fare ranges.(船費(fèi)越貴生存率越高)
Port of embarkation correlates with survival rates. Confirms correlating (#1) and completing (#2).(登陸港口和生存率有關(guān))
Decisions.
Consider banding Fare feature.(需要考慮船費(fèi))
# grid = sns.FacetGrid(train_df, col='Embarked', hue='Survived', palette={0: 'k', 1: 'w'})
grid = sns.FacetGrid(train_df, row='Embarked', col='Survived', size=2.2, aspect=1.6)
grid.map(sns.barplot, 'Sex', 'Fare', alpha=.5, ci=None)
grid.add_legend()
修正數(shù)據(jù)##
通過(guò)觀察數(shù)據(jù)尿孔,其實(shí)我們有一些觀察結(jié)論了俊柔,現(xiàn)在可以執(zhí)行這些結(jié)論了
先刪除一些特征
print("Before", train_df.shape, test_df.shape, combine[0].shape, combine[1].shape)
train_df = train_df.drop(['Ticket', 'Cabin'], axis=1)
test_df = test_df.drop(['Ticket', 'Cabin'], axis=1)
combine = [train_df, test_df]
"After", train_df.shape, test_df.shape, combine[0].shape, combine[1].shape
然后從現(xiàn)有特征中創(chuàng)造一些新的特征
Observations.
When we plot Title, Age, and Survived, we note the following observations.
Most titles band Age groups accurately. For example: Master title has Age mean of 5 years.(頭銜和年齡鏈接緊密)
Survival among Title Age bands varies slightly.(年齡段和獲救率聯(lián)系緊密)
Certain titles mostly survived (Mme, Lady, Sir) or did not (Don, Rev, Jonkheer).(某些頭銜獲救率確實(shí)高,有些則不然)
Decision.
We decide to retain the new Title feature for model training.(保留頭銜特征)使用正則表達(dá)式
for dataset in combine:
    # grab the word ending in '.' after a space, e.g. 'Mr.', 'Miss.'
    dataset['Title'] = dataset.Name.str.extract(r' ([A-Za-z]+)\.', expand=False)
pd.crosstab(train_df['Title'], train_df['Sex'])
Replace the rare, uninformative titles with a single 'Rare' category:
for dataset in combine:
    dataset['Title'] = dataset['Title'].replace(['Lady', 'Countess', 'Capt', 'Col',
                                                 'Don', 'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer', 'Dona'], 'Rare')
    dataset['Title'] = dataset['Title'].replace('Mlle', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Ms', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Mme', 'Mrs')
train_df[['Title', 'Survived']].groupby(['Title'], as_index=False).mean()
然后數(shù)字化
title_mapping = {"Mr": 1, "Miss": 2, "Mrs": 3, "Master": 4, "Rare": 5}
for dataset in combine:
    dataset['Title'] = dataset['Title'].map(title_mapping)
    dataset['Title'] = dataset['Title'].fillna(0)
train_df.head()
然后再刪除名字和旅客ID兩個(gè)無(wú)用特征
train_df = train_df.drop(['Name', 'PassengerId'], axis=1)
test_df = test_df.drop(['Name'], axis=1)
combine = [train_df, test_df]
train_df.shape, test_df.shape
然后對(duì)性別進(jìn)行數(shù)字化
for dataset in combine:
    dataset['Sex'] = dataset['Sex'].map( {'female': 1, 'male': 0} ).astype(int)
train_df.head()
然后開始完善一些特征活合,補(bǔ)充缺失值雏婶。首先是age
有三個(gè)考慮可以使用的方法來(lái)補(bǔ)充連續(xù)性的數(shù)值特征:
1.最簡(jiǎn)單的就是在均值到標(biāo)準(zhǔn)差內(nèi)來(lái)個(gè)隨機(jī)數(shù)
2.更準(zhǔn)確一點(diǎn)的就是通過(guò)相關(guān)特征來(lái)猜測(cè)當(dāng)前特征值。在這個(gè)例子中我們發(fā)現(xiàn)年齡與性別白指、座位級(jí)別這兩個(gè)特征有關(guān)留晚,所以用這兩個(gè)特征分類的均值來(lái)代替特征值。
3.結(jié)合1告嘲、2兩個(gè)方法
由于1错维、3會(huì)引入隨機(jī)誤差,這里作者更偏向于使用2
# grid = sns.FacetGrid(train_df, col='Pclass', hue='Gender')
grid = sns.FacetGrid(train_df, row='Pclass', col='Sex', size=2.2, aspect=1.6)
grid.map(plt.hist, 'Age', alpha=.5, bins=20)
grid.add_legend()
The figure shows Age histograms (counts per age) for each Pclass/Sex combination.
Then create an array to hold the guessed values used to fill the Age nulls:
guess_ages = np.zeros((2,3))
guess_ages
然后迭代的通過(guò)性別橄唬、座席的六種組合來(lái)估計(jì)年齡均值
for dataset in combine:
    for i in range(0, 2):
        for j in range(0, 3):
            guess_df = dataset[(dataset['Sex'] == i) & \
                               (dataset['Pclass'] == j+1)]['Age'].dropna()
            # age_mean = guess_df.mean()
            # age_std = guess_df.std()
            # age_guess = rnd.uniform(age_mean - age_std, age_mean + age_std)
            age_guess = guess_df.median()
            # round the guessed age to the nearest 0.5
            guess_ages[i,j] = int( age_guess/0.5 + 0.5 ) * 0.5
    for i in range(0, 2):
        for j in range(0, 3):
            dataset.loc[ (dataset.Age.isnull()) & (dataset.Sex == i) & (dataset.Pclass == j+1),\
                        'Age'] = guess_ages[i,j]
    dataset['Age'] = dataset['Age'].astype(int)
train_df.head()
填補(bǔ)完空白值之后在對(duì)年齡離散化
train_df['AgeBand'] = pd.cut(train_df['Age'], 5)
train_df[['AgeBand', 'Survived']].groupby(['AgeBand'], as_index=False).mean().sort_values(by='AgeBand', ascending=True)
然后替換age
for dataset in combine:
    dataset.loc[ dataset['Age'] <= 16, 'Age'] = 0
    dataset.loc[(dataset['Age'] > 16) & (dataset['Age'] <= 32), 'Age'] = 1
    dataset.loc[(dataset['Age'] > 32) & (dataset['Age'] <= 48), 'Age'] = 2
    dataset.loc[(dataset['Age'] > 48) & (dataset['Age'] <= 64), 'Age'] = 3
    dataset.loc[ dataset['Age'] > 64, 'Age'] = 4  # the original kernel omits '= 4', leaving ages over 64 unbanded
train_df.head()
然后就可以移除ageband這個(gè)過(guò)渡特征了
train_df = train_df.drop(['AgeBand'], axis=1)
combine = [train_df, test_df]
train_df.head()
從現(xiàn)有特征中組合出新特征各種花樣
新特征FamiliySize是 Parch SibSp之和赋焕,加了新特征之后就可以把這兩個(gè)去掉了
for dataset in combine:
    dataset['FamilySize'] = dataset['SibSp'] + dataset['Parch'] + 1
train_df[['FamilySize', 'Survived']].groupby(['FamilySize'], as_index=False).mean().sort_values(by='Survived', ascending=False)
Since quite a few passengers traveled alone, create another new feature, IsAlone:
for dataset in combine:
    dataset['IsAlone'] = 0
    dataset.loc[dataset['FamilySize'] == 1, 'IsAlone'] = 1
train_df[['IsAlone', 'Survived']].groupby(['IsAlone'], as_index=False).mean()
Looking at the result, the author decides IsAlone is enough and drops FamilySize together with Parch and SibSp. Personally I would keep FamilySize: a commenter reported a lower generalization error when it is retained.
train_df = train_df.drop(['Parch', 'SibSp', 'FamilySize'], axis=1)
test_df = test_df.drop(['Parch', 'SibSp', 'FamilySize'], axis=1)
combine = [train_df, test_df]
train_df.head()
作者又創(chuàng)造了一個(gè)新特征是Pclass與age的乘積
for dataset in combine:
    dataset['Age*Class'] = dataset.Age * dataset.Pclass
train_df.loc[:, ['Age*Class', 'Age', 'Pclass']].head(10)
也沒說(shuō)效果缸血,感覺就是這個(gè)數(shù)字越大越完蛋蜜氨,大家可以測(cè)試下
對(duì)于Embarked 這個(gè)特征代表了游客上船的港口,但是training dataset有些值缺失捎泻,作者就直接用頻率最高的代替了
freq_port = train_df.Embarked.dropna().mode()[0]
freq_port
結(jié)果是S
for dataset in combine:
    dataset['Embarked'] = dataset['Embarked'].fillna(freq_port)
train_df[['Embarked', 'Survived']].groupby(['Embarked'], as_index=False).mean().sort_values(by='Survived', ascending=False)
Survival rate does vary with the embarkation port; perhaps passengers who boarded later ended up higher in the ship, with easier access to the deck.
Then convert this categorical feature to numeric codes:
for dataset in combine:
    dataset['Embarked'] = dataset['Embarked'].map( {'S': 0, 'C': 1, 'Q': 2} ).astype(int)
train_df.head()
然后需要填充fare這個(gè)特征,缺失值就用頻率最高的那個(gè)值代替笆豁,然后對(duì)它離散化郎汪。
test_df['Fare'].fillna(test_df['Fare'].dropna().median(), inplace=True)
test_df.head()
train_df['FareBand'] = pd.qcut(train_df['Fare'], 4)
train_df[['FareBand', 'Survived']].groupby(['FareBand'], as_index=False).mean().sort_values(by='FareBand', ascending=True)
for dataset in combine:
    dataset.loc[ dataset['Fare'] <= 7.91, 'Fare'] = 0
    dataset.loc[(dataset['Fare'] > 7.91) & (dataset['Fare'] <= 14.454), 'Fare'] = 1
    dataset.loc[(dataset['Fare'] > 14.454) & (dataset['Fare'] <= 31), 'Fare'] = 2
    dataset.loc[ dataset['Fare'] > 31, 'Fare'] = 3
    dataset['Fare'] = dataset['Fare'].astype(int)
train_df = train_df.drop(['FareBand'], axis=1)
combine = [train_df, test_df]
train_df.head(10)
test_df.head(10)
OK,到現(xiàn)在特征處理和數(shù)據(jù)分析、清洗闯狱、轉(zhuǎn)換就做完了煞赢!下面就該放入模型預(yù)測(cè)了,有木有很激動(dòng)哄孤?
Logistic Regression
KNN or k-Nearest Neighbors
Support Vector Machines
Naive Bayes classifier
Decision Tree
Random Forest
Perceptron
Artificial neural network
RVM or Relevance Vector Machine
These should need no introduction.
Model selection is a craft in itself: first pin down what kind of problem you are solving (here, supervised binary classification), then weigh the strengths and weaknesses of each algorithm.
X_train = train_df.drop("Survived", axis=1)
Y_train = train_df["Survived"]
X_test = test_df.drop("PassengerId", axis=1).copy()
X_train.shape, Y_train.shape, X_test.shape
# Logistic Regression
logreg = LogisticRegression()
logreg.fit(X_train, Y_train)
Y_pred = logreg.predict(X_test)
acc_log = round(logreg.score(X_train, Y_train) * 100, 2)
acc_log
80.359999999999999
Let's inspect the logistic regression coefficients to see how each feature pushes the prediction (a positive coefficient increases the log-odds of survival):
coeff_df = pd.DataFrame(train_df.columns.delete(0))
coeff_df.columns = ['Feature']
coeff_df["Correlation"] = pd.Series(logreg.coef_[0])
coeff_df.sort_values(by='Correlation', ascending=False)
# Support Vector Machines
svc = SVC()
svc.fit(X_train, Y_train)
Y_pred = svc.predict(X_test)
acc_svc = round(svc.score(X_train, Y_train) * 100, 2)
acc_svc
Among single models, SVM is quite strong:
83.840000000000003
# KNN
knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(X_train, Y_train)
Y_pred = knn.predict(X_test)
acc_knn = round(knn.score(X_train, Y_train) * 100, 2)
acc_knn
84.739999999999995
KNN does well too.
# Decision Tree
decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, Y_train)
Y_pred = decision_tree.predict(X_test)
acc_decision_tree = round(decision_tree.score(X_train, Y_train) * 100, 2)
acc_decision_tree
86.760000000000005
# Random Forest
random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, Y_train)
Y_pred = random_forest.predict(X_test)
random_forest.score(X_train, Y_train)
acc_random_forest = round(random_forest.score(X_train, Y_train) * 100, 2)
acc_random_forest
86.760000000000005
Surprising that RF ties DT exactly; note, though, that both numbers are training accuracy, so identical scores only mean the two models fit the training data equally well.
模型評(píng)價(jià)
models = pd.DataFrame({
'Model': ['Support Vector Machines', 'KNN', 'Logistic Regression',
'Random Forest', 'Naive Bayes', 'Perceptron',
'Stochastic Gradient Decent', 'Linear SVC',
'Decision Tree'],
'Score': [acc_svc, acc_knn, acc_log,
acc_random_forest, acc_gaussian, acc_perceptron,
acc_sgd, acc_linear_svc, acc_decision_tree]})
models.sort_values(by='Score', ascending=False)
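Keep in mind that every score in this table is training accuracy, which flatters models that overfit, like the unpruned decision tree. A quick cross-validation check (my addition, not in the original kernel) gives a more honest estimate:
from sklearn.model_selection import cross_val_score
# 5-fold cross-validated accuracy for the random forest
cv_scores = cross_val_score(RandomForestClassifier(n_estimators=100), X_train, Y_train, cv=5)
print(cv_scores.mean(), cv_scores.std())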
最后就可以提交結(jié)果咯
submission = pd.DataFrame({
"PassengerId": test_df["PassengerId"],
"Survived": Y_pred
})
# submission.to_csv('../output/submission.csv', index=False)
那么這個(gè)介紹就到這里了,謝謝大家咯晨逝,See you!