Data Mining Team Learning: Model Fusion

Team learning with DataWhale: https://tianchi.aliyun.com/notebook-ai/detail?spm=5176.12281978.0.0.6802593a2HCrSE&postId=95535

Model fusion is an important step in the later stages of a competition. Broadly, the approaches fall into the following types.

  1. 簡(jiǎn)單加權(quán)融合:
    • 回歸(分類概率):算術(shù)平均融合(Arithmetic mean)熔号,幾何平均融合(Geometric mean)损离;
    • 分類:投票(Voting)
    • 綜合:排序融合(Rank averaging)岂嗓,log融合
  1. stacking/blending:
    • 構(gòu)建多層模型,并利用預(yù)測(cè)結(jié)果再擬合預(yù)測(cè)。
  1. boosting/bagging(在xgboost,Adaboost,GBDT中已經(jīng)用到):
    • 多樹(shù)的提升方法
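For the first family, here is a minimal sketch of arithmetic-mean versus geometric-mean fusion of regression predictions (the numbers are toy values for illustration, not from any real model):

```python
import numpy as np

# Toy predictions from three hypothetical models on the same four samples
p1 = np.array([1.2, 3.2, 2.1, 6.2])
p2 = np.array([0.9, 3.1, 2.0, 5.9])
p3 = np.array([1.1, 2.9, 2.2, 6.0])

arith = (p1 + p2 + p3) / 3            # arithmetic mean fusion
geom = (p1 * p2 * p3) ** (1 / 3)      # geometric mean fusion (positive predictions only)

print(arith)
print(geom)
```

By the AM–GM inequality the geometric mean is always at or below the arithmetic mean, so it slightly damps large outlier predictions.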

Background on Stacking

1) What is stacking?

簡(jiǎn)單來(lái)說(shuō) stacking 就是當(dāng)用初始訓(xùn)練數(shù)據(jù)學(xué)習(xí)出若干個(gè)基學(xué)習(xí)器后整慎,將這幾個(gè)學(xué)習(xí)器的預(yù)測(cè)結(jié)果作為新的訓(xùn)練集,來(lái)學(xué)習(xí)一個(gè)新的學(xué)習(xí)器围苫。

[Figure: schematic of stacking — base-learner predictions become the training set for a new learner]

The method used to combine individual learners is called the combination strategy. For classification we can use voting and pick the most-output class; for regression we can average the learners' outputs.

Voting and averaging are both effective combination strategies. Another strategy is to use a separate machine learning algorithm to combine the individual learners' outputs — that method is stacking.

In stacking, the individual learners are called first-level (base) learners; the learner used to combine them is the second-level learner or meta-learner; and the data used to train the meta-learner is the second-level training set, which is produced by running the first-level learners on the training set.

2) How to do stacking

The algorithm is sketched in the figure below:

[Figure: stacking pseudocode]

Adapted from Zhou Zhihua's *Machine Learning* (the "watermelon book")

  • 過(guò)程1-3 是訓(xùn)練出來(lái)個(gè)體學(xué)習(xí)器库倘,也就是初級(jí)學(xué)習(xí)器。
  • 過(guò)程5-9是 使用訓(xùn)練出來(lái)的個(gè)體學(xué)習(xí)器來(lái)得預(yù)測(cè)的結(jié)果论矾,這個(gè)預(yù)測(cè)的結(jié)果當(dāng)做次級(jí)學(xué)習(xí)器的訓(xùn)練集教翩。
  • 過(guò)程11 是用初級(jí)學(xué)習(xí)器預(yù)測(cè)的結(jié)果訓(xùn)練出次級(jí)學(xué)習(xí)器,得到我們最后訓(xùn)練的模型贪壳。

3) Stacking explained step by step

First, let's start from a version of stacking that is "not quite right" but easy to understand.

A stacking model is essentially a layered structure. For simplicity we only analyze two-level stacking. Suppose we have two base models, Model1_1 and Model1_2, and one second-level model, Model2.

Step 1. Train base model Model1_1 on the training set train, then use it to predict the label columns of train and test, giving P1 and T1.

Training Model1_1:

$$\left(\begin{array}{c}{\vdots} \\ {X_{train}} \\ {\vdots}\end{array}\right) \overbrace{\Longrightarrow}^{\text{Model1\_1 Train}} \left(\begin{array}{c}{\vdots} \\ {Y_{True}} \\ {\vdots}\end{array}\right)$$

訓(xùn)練后的模型 Model1_1 分別在 train 和 test 上預(yù)測(cè)泵肄,得到預(yù)測(cè)標(biāo)簽分別是P1,T1

$$\left(\begin{array}{c}{\vdots} \\ {X_{train}} \\ {\vdots}\end{array}\right) \overbrace{\Longrightarrow}^{\text{Model1\_1 Predict}} \left(\begin{array}{c}{\vdots} \\ {P_{1}} \\ {\vdots}\end{array}\right)$$

$$\left(\begin{array}{c}{\vdots} \\ {X_{test}} \\ {\vdots}\end{array}\right) \overbrace{\Longrightarrow}^{\text{Model1\_1 Predict}} \left(\begin{array}{c}{\vdots} \\ {T_{1}} \\ {\vdots}\end{array}\right)$$

Step 2. Train base model Model1_2 on the training set train, then use it to predict the label columns of train and test, giving P2 and T2.

Training Model1_2:

$$\left(\begin{array}{c}{\vdots} \\ {X_{train}} \\ {\vdots}\end{array}\right) \overbrace{\Longrightarrow}^{\text{Model1\_2 Train}} \left(\begin{array}{c}{\vdots} \\ {Y_{True}} \\ {\vdots}\end{array}\right)$$

訓(xùn)練后的模型 Model1_2 分別在 train 和 test 上預(yù)測(cè)冯丙,得到預(yù)測(cè)標(biāo)簽分別是P2,T2

$$\left(\begin{array}{c}{\vdots} \\ {X_{train}} \\ {\vdots}\end{array}\right) \overbrace{\Longrightarrow}^{\text{Model1\_2 Predict}} \left(\begin{array}{c}{\vdots} \\ {P_{2}} \\ {\vdots}\end{array}\right)$$

$$\left(\begin{array}{c}{\vdots} \\ {X_{test}} \\ {\vdots}\end{array}\right) \overbrace{\Longrightarrow}^{\text{Model1\_2 Predict}} \left(\begin{array}{c}{\vdots} \\ {T_{2}} \\ {\vdots}\end{array}\right)$$

Step 3. Concatenate P1 with P2 and T1 with T2 to obtain a new training set train2 and test set test2.

$$\overbrace{\left(\begin{array}{cc}{\vdots}&{\vdots} \\ {P_{1}}&{P_{2}} \\ {\vdots}&{\vdots}\end{array}\right)}^{\text{Train\_2}} \quad \text{and} \quad \overbrace{\left(\begin{array}{cc}{\vdots}&{\vdots} \\ {T_{1}}&{T_{2}} \\ {\vdots}&{\vdots}\end{array}\right)}^{\text{Test\_2}}$$

Then train the second-level model Model2 with the true training labels as targets and train2 as features, and predict on test2 to obtain the final predicted label column Y_{Pre} for the test set.

$$\overbrace{\left(\begin{array}{cc}{\vdots}&{\vdots} \\ {P_{1}}&{P_{2}} \\ {\vdots}&{\vdots}\end{array}\right)}^{\text{Train\_2}} \overbrace{\Longrightarrow}^{\text{Model2 Train}} \left(\begin{array}{c}{\vdots} \\ {Y_{True}} \\ {\vdots}\end{array}\right)$$

$$\overbrace{\left(\begin{array}{cc}{\vdots}&{\vdots} \\ {T_{1}}&{T_{2}} \\ {\vdots}&{\vdots}\end{array}\right)}^{\text{Test\_2}} \overbrace{\Longrightarrow}^{\text{Model2 Predict}} \left(\begin{array}{c}{\vdots} \\ {Y_{Pre}} \\ {\vdots}\end{array}\right)$$

That is the basic idea behind two-level stacking: add another model on top of the predictions of different models and train again, producing the model's final prediction.

Stacking really is that direct an idea, but applying it naively can be problematic when the training and test distributions are not quite the same. The issue is that the meta-model is retrained on base-model predictions against the true labels, which inevitably overfits the training set to some degree, so generalization to the test set may suffer. The problem therefore becomes how to reduce this retraining overfitting, and there are two common remedies:

    1. Prefer a simple linear model as the second-level model.
    2. Use K-fold cross-validation.

K-fold cross-validation:
Training:

[Figure: K-fold stacking — training phase]

預(yù)測(cè):

[Figure: K-fold stacking — prediction phase]
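The K-fold procedure in the two figures can be sketched with scikit-learn's cross_val_predict, which yields exactly the out-of-fold predictions used as second-level features (a minimal sketch on synthetic data; the model choices here are illustrative, not from the competition):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

base_models = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=3, random_state=0)]

# Each training sample is predicted by a model that never saw it,
# which is what limits second-level overfitting.
oof_features = np.column_stack(
    [cross_val_predict(m, X, y, cv=5) for m in base_models])

meta_model = LinearRegression().fit(oof_features, y)  # simple second-level model
print(oof_features.shape)  # (200, 2)
```

For the test set, the usual practice (as in the code further below) is to average the predictions of the K fold-models, or to refit each base model on the full training set.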

5.4 Code Examples

5.4.1 Regression / classification-probability fusion:

1)簡(jiǎn)單加權(quán)平均慌申,結(jié)果直接融合

## Generate some toy data; test_prei is the i-th model's predictions
test_pre1 = [1.2, 3.2, 2.1, 6.2]
test_pre2 = [0.9, 3.1, 2.0, 5.9]
test_pre3 = [1.1, 2.9, 2.2, 6.0]

# y_test_true holds the true values
y_test_true = [1, 3, 2, 6] 
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

## Weighted average of the results
def Weighted_method(test_pre1,test_pre2,test_pre3,w=[1/3,1/3,1/3]):
    Weighted_result = w[0]*pd.Series(test_pre1)+w[1]*pd.Series(test_pre2)+w[2]*pd.Series(test_pre3)
    return Weighted_result
from sklearn import metrics
# MAE of each model's predictions
print('Pred1 MAE:',metrics.mean_absolute_error(y_test_true, test_pre1))
print('Pred2 MAE:',metrics.mean_absolute_error(y_test_true, test_pre2))
print('Pred3 MAE:',metrics.mean_absolute_error(y_test_true, test_pre3))
Pred1 MAE: 0.1750000000000001
Pred2 MAE: 0.07499999999999993
Pred3 MAE: 0.10000000000000009
## MAE of the weighted result
w = [0.3,0.4,0.3] # weights
Weighted_pre = Weighted_method(test_pre1,test_pre2,test_pre3,w)
print('Weighted_pre MAE:',metrics.mean_absolute_error(y_test_true, Weighted_pre))
Weighted_pre MAE: 0.05750000000000027

The weighted result improves on each individual model's result; we call this simple weighted averaging.

There are also some special forms, such as the mean and the median.

## Mean of the results
def Mean_method(test_pre1,test_pre2,test_pre3):
    Mean_result = pd.concat([pd.Series(test_pre1),pd.Series(test_pre2),pd.Series(test_pre3)],axis=1).mean(axis=1)
    return Mean_result
Mean_pre = Mean_method(test_pre1,test_pre2,test_pre3)
print('Mean_pre MAE:',metrics.mean_absolute_error(y_test_true, Mean_pre))
Mean_pre MAE: 0.06666666666666693
## Median of the results
def Median_method(test_pre1,test_pre2,test_pre3):
    Median_result = pd.concat([pd.Series(test_pre1),pd.Series(test_pre2),pd.Series(test_pre3)],axis=1).median(axis=1)
    return Median_result
Median_pre = Median_method(test_pre1,test_pre2,test_pre3)
print('Median_pre MAE:',metrics.mean_absolute_error(y_test_true, Median_pre))
Median_pre MAE: 0.07500000000000007
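Rank averaging, listed in the overview at the top but not shown above, replaces each model's raw scores with their ranks before averaging, which helps when the models output scores on different scales (a minimal sketch on toy scores; assumes scipy is available):

```python
import numpy as np
from scipy.stats import rankdata

# Toy scores from two hypothetical models on the same five samples,
# deliberately on very different scales
s1 = np.array([0.1, 0.9, 0.4, 0.8, 0.2])
s2 = np.array([10., 80., 60., 90., 20.])

# Average the ranks instead of the raw scores, then rescale to [0, 1]
avg_rank = (rankdata(s1) + rankdata(s2)) / 2
fused = (avg_rank - avg_rank.min()) / (avg_rank.max() - avg_rank.min())
print(fused)
```

Because only the ordering matters, rank averaging is insensitive to each model's calibration; it is a common trick for AUC-scored competitions.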

2) Stacking fusion (regression):

from sklearn import linear_model

def Stacking_method(train_reg1,train_reg2,train_reg3,y_train_true,test_pre1,test_pre2,test_pre3,model_L2= linear_model.LinearRegression()):
    model_L2.fit(pd.concat([pd.Series(train_reg1),pd.Series(train_reg2),pd.Series(train_reg3)],axis=1).values,y_train_true)
    Stacking_result = model_L2.predict(pd.concat([pd.Series(test_pre1),pd.Series(test_pre2),pd.Series(test_pre3)],axis=1).values)
    return Stacking_result
## Generate some toy data; train_regi is the i-th model's predictions on the training set
train_reg1 = [3.2, 8.2, 9.1, 5.2]
train_reg2 = [2.9, 8.1, 9.0, 4.9]
train_reg3 = [3.1, 7.9, 9.2, 5.0]
# y_train_true holds the true training values
y_train_true = [3, 8, 9, 5] 

test_pre1 = [1.2, 3.2, 2.1, 6.2]
test_pre2 = [0.9, 3.1, 2.0, 5.9]
test_pre3 = [1.1, 2.9, 2.2, 6.0]

# y_test_true holds the true test values
y_test_true = [1, 3, 2, 6] 
model_L2= linear_model.LinearRegression()
Stacking_pre = Stacking_method(train_reg1,train_reg2,train_reg3,y_train_true,
                               test_pre1,test_pre2,test_pre3,model_L2)
print('Stacking_pre MAE:',metrics.mean_absolute_error(y_test_true, Stacking_pre))
Stacking_pre MAE: 0.04213483146067476

The stacked result improves further on the earlier ones. One thing to note: the second-level stacking model should not be too complex, otherwise it will overfit the training set and fail to perform well on the test set.

5.4.2 Classification model fusion:

對(duì)于分類,同樣的可以使用融合方法瓢谢,比如簡(jiǎn)單投票畸写,Stacking...

from sklearn.datasets import make_blobs
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score,roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold

1) Voting:

Voting comes in two flavors, soft and hard; both follow the majority-rule principle.

'''
Hard voting: the models vote directly, with no notion of relative importance; the class with the most votes is the final prediction.
'''
iris = datasets.load_iris()

x=iris.data
y=iris.target
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3)

clf1 = XGBClassifier(learning_rate=0.1, n_estimators=150, max_depth=3, min_child_weight=2, subsample=0.7,
                     colsample_bytree=0.6, objective='binary:logistic')
clf2 = RandomForestClassifier(n_estimators=50, max_depth=1, min_samples_split=4,
                              min_samples_leaf=63,oob_score=True)
clf3 = SVC(C=0.1)

# Hard voting
eclf = VotingClassifier(estimators=[('xgb', clf1), ('rf', clf2), ('svc', clf3)], voting='hard')
for clf, label in zip([clf1, clf2, clf3, eclf], ['XGBBoosting', 'Random Forest', 'SVM', 'Ensemble']):
    scores = cross_val_score(clf, x, y, cv=5, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))
Accuracy: 0.96 (+/- 0.02) [XGBBoosting]
Accuracy: 0.33 (+/- 0.00) [Random Forest]
Accuracy: 0.95 (+/- 0.03) [SVM]
Accuracy: 0.95 (+/- 0.03) [Ensemble]
'''
Soft voting: same principle as hard voting, but with per-model weights, so different models can be given different importance.
'''
x=iris.data
y=iris.target
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3)

clf1 = XGBClassifier(learning_rate=0.1, n_estimators=150, max_depth=3, min_child_weight=2, subsample=0.8,
                     colsample_bytree=0.8, objective='binary:logistic')
clf2 = RandomForestClassifier(n_estimators=50, max_depth=1, min_samples_split=4,
                              min_samples_leaf=63,oob_score=True)
clf3 = SVC(C=0.1, probability=True)

# Soft voting
eclf = VotingClassifier(estimators=[('xgb', clf1), ('rf', clf2), ('svc', clf3)], voting='soft', weights=[2, 1, 1])
clf1.fit(x_train, y_train)

for clf, label in zip([clf1, clf2, clf3, eclf], ['XGBBoosting', 'Random Forest', 'SVM', 'Ensemble']):
    scores = cross_val_score(clf, x, y, cv=5, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))
Accuracy: 0.96 (+/- 0.02) [XGBBoosting]
Accuracy: 0.33 (+/- 0.00) [Random Forest]
Accuracy: 0.95 (+/- 0.03) [SVM]
Accuracy: 0.96 (+/- 0.02) [Ensemble]

2) Stacking / blending for classification:

Stacking is a layered model-ensembling framework.

Taking two levels as an example: the first level consists of several base learners whose input is the original training set; the second level trains a model on the first-level learners' outputs, yielding the complete stacking model. Both levels use all of the training data.

'''
5-Fold Stacking
'''
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier,GradientBoostingClassifier
import pandas as pd
#創(chuàng)建訓(xùn)練的數(shù)據(jù)集
data_0 = iris.data
data = data_0[:100,:]

target_0 = iris.target
target = target_0[:100]

# Individual models used in the fusion
clfs = [LogisticRegression(solver='lbfgs'),
        RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='entropy'),
        GradientBoostingClassifier(learning_rate=0.05, subsample=0.5, max_depth=6, n_estimators=5)]
 
# Hold out part of the data as a test set
X, X_predict, y, y_predict = train_test_split(data, target, test_size=0.3, random_state=2020)

dataset_blend_train = np.zeros((X.shape[0], len(clfs)))
dataset_blend_test = np.zeros((X_predict.shape[0], len(clfs)))

# 5-fold stacking
n_splits = 5
skf = StratifiedKFold(n_splits)
skf = skf.split(X, y)

for j, clf in enumerate(clfs):
# Train each model in turn
    dataset_blend_test_j = np.zeros((X_predict.shape[0], 5))
    for i, (train, test) in enumerate(skf):
        # 5-fold cross-training: fold i is held out for prediction while the rest trains the model;
        # the out-of-fold predictions become fold i's new feature.
        X_train, y_train, X_test, y_test = X[train], y[train], X[test], y[test]
        clf.fit(X_train, y_train)
        y_submission = clf.predict_proba(X_test)[:, 1]
        dataset_blend_train[test, j] = y_submission
        dataset_blend_test_j[:, i] = clf.predict_proba(X_predict)[:, 1]
    #對(duì)于測(cè)試集谱仪,直接用這k個(gè)模型的預(yù)測(cè)值均值作為新的特征。
    dataset_blend_test[:, j] = dataset_blend_test_j.mean(1)
    print("val auc Score: %f" % roc_auc_score(y_predict, dataset_blend_test[:, j]))

clf = LogisticRegression(solver='lbfgs')
clf.fit(dataset_blend_train, y)
y_submission = clf.predict_proba(dataset_blend_test)[:, 1]

print("Val auc Score of Stacking: %f" % (roc_auc_score(y_predict, y_submission)))

val auc Score: 1.000000
val auc Score: 0.500000
val auc Score: 0.500000
val auc Score: 0.500000
val auc Score: 0.500000
Val auc Score of Stacking: 1.000000

Blending is a multi-layer model-fusion scheme similar to stacking.

Its main idea is to split the original training set into two parts, e.g. 70% of the data as the new training set and the remaining 30% as a holdout set.

In the first layer, we train several models on the 70% and predict the labels of the 30% holdout, and also the labels of the test set.

In the second layer, we train directly on the holdout's first-layer predictions as new features; then, using the test set's first-layer predicted labels as features, we make the final prediction with the second-layer model.

Its advantages:

  • 1. Simpler than stacking (no k rounds of cross-validation to build the stacker features)
  • 2. Avoids an information leak: the generalizers and the stacker use different data

Its disadvantages:

  • 1. Uses little data (the second-stage blender sees only a fraction — e.g. 10% — of the training set)
  • 2. The blender may overfit
  • 3. Stacking with repeated cross-validation is more robust
'''
Blending
'''
 
#創(chuàng)建訓(xùn)練的數(shù)據(jù)集
#創(chuàng)建訓(xùn)練的數(shù)據(jù)集
data_0 = iris.data
data = data_0[:100,:]

target_0 = iris.target
target = target_0[:100]
 
# Individual models used in the fusion
clfs = [LogisticRegression(solver='lbfgs'),
        RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='entropy'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        #ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='entropy'),
        GradientBoostingClassifier(learning_rate=0.05, subsample=0.5, max_depth=6, n_estimators=5)]
 
# Hold out part of the data as a test set
X, X_predict, y, y_predict = train_test_split(data, target, test_size=0.3, random_state=2020)

#切分訓(xùn)練數(shù)據(jù)集為d1,d2兩部分
X_d1, X_d2, y_d1, y_d2 = train_test_split(X, y, test_size=0.5, random_state=2020)
dataset_d1 = np.zeros((X_d2.shape[0], len(clfs)))
dataset_d2 = np.zeros((X_predict.shape[0], len(clfs)))
 
for j, clf in enumerate(clfs):
    # Train each model in turn
    clf.fit(X_d1, y_d1)
    y_submission = clf.predict_proba(X_d2)[:, 1]
    dataset_d1[:, j] = y_submission
    #對(duì)于測(cè)試集锭汛,直接用這k個(gè)模型的預(yù)測(cè)值作為新的特征。
    dataset_d2[:, j] = clf.predict_proba(X_predict)[:, 1]
    print("val auc Score: %f" % roc_auc_score(y_predict, dataset_d2[:, j]))

# Model used for the second-level fusion
clf = GradientBoostingClassifier(learning_rate=0.02, subsample=0.5, max_depth=6, n_estimators=30)
clf.fit(dataset_d1, y_d2)
y_submission = clf.predict_proba(dataset_d2)[:, 1]
print("Val auc Score of Blending: %f" % (roc_auc_score(y_predict, y_submission)))
val auc Score: 1.000000
val auc Score: 1.000000
val auc Score: 1.000000
val auc Score: 1.000000
val auc Score: 1.000000
Val auc Score of Blending: 1.000000

Reference: https://blog.csdn.net/Noob_daniel/article/details/76087829

3) Classification stacking with mlxtend:

!pip install mlxtend

import warnings
warnings.filterwarnings('ignore')
import itertools
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB 
from sklearn.ensemble import RandomForestClassifier
from mlxtend.classifier import StackingClassifier

from sklearn.model_selection import cross_val_score
from mlxtend.plotting import plot_learning_curves
from mlxtend.plotting import plot_decision_regions

# Use the iris dataset (bundled with scikit-learn) as an example
iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

clf1 = KNeighborsClassifier(n_neighbors=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()
lr = LogisticRegression()
sclf = StackingClassifier(classifiers=[clf1, clf2, clf3], 
                          meta_classifier=lr)

label = ['KNN', 'Random Forest', 'Naive Bayes', 'Stacking Classifier']
clf_list = [clf1, clf2, clf3, sclf]

fig = plt.figure(figsize=(10,8))
gs = gridspec.GridSpec(2, 2)
grid = itertools.product([0,1],repeat=2)

clf_cv_mean = []
clf_cv_std = []
for clf, label, grd in zip(clf_list, label, grid):
        
    scores = cross_val_score(clf, X, y, cv=3, scoring='accuracy')
    print("Accuracy: %.2f (+/- %.2f) [%s]" %(scores.mean(), scores.std(), label))
    clf_cv_mean.append(scores.mean())
    clf_cv_std.append(scores.std())
        
    clf.fit(X, y)
    ax = plt.subplot(gs[grd[0], grd[1]])
    fig = plot_decision_regions(X=X, y=y, clf=clf)
    plt.title(label)

plt.show()
Accuracy: 0.91 (+/- 0.01) [KNN]
Accuracy: 0.93 (+/- 0.05) [Random Forest]
Accuracy: 0.92 (+/- 0.03) [Naive Bayes]
Accuracy: 0.95 (+/- 0.03) [Stacking Classifier]
[Figure: decision regions of KNN, Random Forest, Naive Bayes, and the Stacking Classifier]

With 'KNN', 'Random Forest', and 'Naive Bayes' as base models, adding a 'LogisticRegression' on top as the second-level model gives a nice improvement in test performance.
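For reference, newer versions of scikit-learn (≥ 0.22) ship a built-in StackingClassifier that builds the meta-features from out-of-fold predictions internally, so mlxtend is not strictly required — a minimal sketch with the same base/meta split as above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# cv=3 means the meta-features come from 3-fold out-of-fold predictions
sclf = StackingClassifier(
    estimators=[('knn', KNeighborsClassifier(n_neighbors=1)),
                ('rf', RandomForestClassifier(random_state=1)),
                ('gnb', GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3)

scores = cross_val_score(sclf, X, y, cv=3, scoring='accuracy')
print('Accuracy: %.2f (+/- %.2f)' % (scores.mean(), scores.std()))
```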

5.4.3 Some other methods:

Feed the features into models, transform the predictions, and append them to the original features as new features, then predict again with a model (a stacking variant).

(This can be repeated several times, adding each round's predictions to the final feature set.)

def Ensemble_add_feature(train,test,target,clfs):
    
    train_ = np.zeros((train.shape[0],len(clfs)*2))
    test_ = np.zeros((test.shape[0],len(clfs)*2))

    for j,clf in enumerate(clfs):
        '''Train each model in turn'''
        clf.fit(train,target)
        y_train = clf.predict(train)
        y_test = clf.predict(test)

        ## Generate the new features from the transformed predictions
        train_[:,j*2] = y_train**2
        test_[:,j*2] = y_test**2
        train_[:,j*2+1] = np.exp(y_train)
        test_[:,j*2+1] = np.exp(y_test)
        print('Method ',j)
    
    train_ = pd.DataFrame(train_)
    test_ = pd.DataFrame(test_)
    return train_,test_

from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()

data_0 = iris.data
data = data_0[:100,:]

target_0 = iris.target
target = target_0[:100]

x_train,x_test,y_train,y_test=train_test_split(data,target,test_size=0.3)
x_train = pd.DataFrame(x_train) ; x_test = pd.DataFrame(x_test)

# Individual models used in the fusion
clfs = [LogisticRegression(),
        RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='entropy'),
        GradientBoostingClassifier(learning_rate=0.05, subsample=0.5, max_depth=6, n_estimators=5)]

New_train,New_test = Ensemble_add_feature(x_train,x_test,y_train,clfs)

clf = LogisticRegression()
# clf = GradientBoostingClassifier(learning_rate=0.02, subsample=0.5, max_depth=6, n_estimators=30)
clf.fit(New_train, y_train)
y_emb = clf.predict_proba(New_test)[:, 1]

print("Val auc Score of stacking: %f" % (roc_auc_score(y_test, y_emb)))
Method  0
Method  1
Method  2
Method  3
Method  4
Val auc Score of stacking: 1.000000

5.4.4 An example for this competition

import pandas as pd
import numpy as np
import warnings
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

warnings.filterwarnings('ignore')
%matplotlib inline

import itertools
import matplotlib.gridspec as gridspec
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB 
from sklearn.ensemble import RandomForestClassifier
# from mlxtend.classifier import StackingClassifier
from sklearn.model_selection import cross_val_score, train_test_split
# from mlxtend.plotting import plot_learning_curves
# from mlxtend.plotting import plot_decision_regions

from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split

from sklearn import linear_model
from sklearn import preprocessing
from sklearn.svm import SVR
from sklearn.decomposition import PCA,FastICA,FactorAnalysis,SparsePCA

import lightgbm as lgb
import xgboost as xgb
from sklearn.model_selection import GridSearchCV,cross_val_score
from sklearn.ensemble import RandomForestRegressor,GradientBoostingRegressor

from sklearn.metrics import mean_squared_error, mean_absolute_error
## Load the data
Train_data = pd.read_csv('datalab/used_car_train_20200313.csv', sep=' ')
TestA_data = pd.read_csv('datalab/used_car_testA_20200313.csv', sep=' ')

print(Train_data.shape)
print(TestA_data.shape)
(150000, 31)
(50000, 30)
Train_data.head()
numerical_cols = Train_data.select_dtypes(exclude = 'object').columns
print(numerical_cols)
Index(['SaleID', 'name', 'regDate', 'model', 'brand', 'bodyType', 'fuelType',
       'gearbox', 'power', 'kilometer', 'regionCode', 'seller', 'offerType',
       'creatDate', 'price', 'v_0', 'v_1', 'v_2', 'v_3', 'v_4', 'v_5', 'v_6',
       'v_7', 'v_8', 'v_9', 'v_10', 'v_11', 'v_12', 'v_13', 'v_14'],
      dtype='object')
feature_cols = [col for col in numerical_cols if col not in ['SaleID','name','regDate','price']]
X_data = Train_data[feature_cols]
Y_data = Train_data['price']

X_test  = TestA_data[feature_cols]

print('X train shape:',X_data.shape)
print('X test shape:',X_test.shape)
X train shape: (150000, 26)
X test shape: (50000, 26)
def Sta_inf(data):
    print('_min',np.min(data))
    print('_max:',np.max(data))
    print('_mean',np.mean(data))
    print('_ptp',np.ptp(data))
    print('_std',np.std(data))
    print('_var',np.var(data))
print('Sta of label:')
Sta_inf(Y_data)
Sta of label:
_min 11
_max: 99999
_mean 5923.327333333334
_ptp 99988
_std 7501.973469876438
_var 56279605.94272992
X_data = X_data.fillna(-1)
X_test = X_test.fillna(-1)
1) Functions for building and training each model

def build_model_lr(x_train,y_train):
    reg_model = linear_model.LinearRegression()
    reg_model.fit(x_train,y_train)
    return reg_model

def build_model_ridge(x_train,y_train):
    reg_model = linear_model.Ridge(alpha=0.8)#alphas=range(1,100,5)
    reg_model.fit(x_train,y_train)
    return reg_model

def build_model_lasso(x_train,y_train):
    reg_model = linear_model.LassoCV()
    reg_model.fit(x_train,y_train)
    return reg_model

def build_model_gbdt(x_train,y_train):
    estimator = GradientBoostingRegressor(loss='ls',subsample= 0.85,max_depth= 5,n_estimators = 100)
    param_grid = { 
            'learning_rate': [0.05,0.08,0.1,0.2],
            }
    gbdt = GridSearchCV(estimator, param_grid,cv=3)
    gbdt.fit(x_train,y_train)
    print(gbdt.best_params_)
    # print(gbdt.best_estimator_ )
    return gbdt

def build_model_xgb(x_train,y_train):
    model = xgb.XGBRegressor(n_estimators=120, learning_rate=0.08, gamma=0, subsample=0.8,\
        colsample_bytree=0.9, max_depth=5) #, objective ='reg:squarederror'
    model.fit(x_train, y_train)
    return model

def build_model_lgb(x_train,y_train):
    estimator = lgb.LGBMRegressor(num_leaves=63,n_estimators = 100)
    param_grid = {
        'learning_rate': [0.01, 0.05, 0.1],
    }
    gbm = GridSearchCV(estimator, param_grid)
    gbm.fit(x_train, y_train)
    return gbm

2) XGBoost: five-fold cross-validated regression

## xgb
xgr = xgb.XGBRegressor(n_estimators=120, learning_rate=0.1, subsample=0.8,\
        colsample_bytree=0.9, max_depth=7) # ,objective ='reg:squarederror'

scores_train = []
scores = []

## 5-fold split (note: StratifiedKFold expects a categorical target; for a continuous target like price, KFold would be more appropriate, though older sklearn versions accept this)
sk=StratifiedKFold(n_splits=5,shuffle=True,random_state=0)
for train_ind,val_ind in sk.split(X_data,Y_data):
    
    train_x=X_data.iloc[train_ind].values
    train_y=Y_data.iloc[train_ind]
    val_x=X_data.iloc[val_ind].values
    val_y=Y_data.iloc[val_ind]
    
    xgr.fit(train_x,train_y)
    pred_train_xgb=xgr.predict(train_x)
    pred_xgb=xgr.predict(val_x)
    
    score_train = mean_absolute_error(train_y,pred_train_xgb)
    scores_train.append(score_train)
    score = mean_absolute_error(val_y,pred_xgb)
    scores.append(score)

print('Train mae:',np.mean(scores_train))
print('Val mae',np.mean(scores))
Train mae: 600.0127885014529
Val mae 691.9976473362078

3) Split the dataset, then train and predict with several methods

## Split data with val
x_train,x_val,y_train,y_val = train_test_split(X_data,Y_data,test_size=0.3)

## Train and Predict
print('Predict LR...')
model_lr = build_model_lr(x_train,y_train)
val_lr = model_lr.predict(x_val)
subA_lr = model_lr.predict(X_test)

print('Predict Ridge...')
model_ridge = build_model_ridge(x_train,y_train)
val_ridge = model_ridge.predict(x_val)
subA_ridge = model_ridge.predict(X_test)

print('Predict Lasso...')
model_lasso = build_model_lasso(x_train,y_train)
val_lasso = model_lasso.predict(x_val)
subA_lasso = model_lasso.predict(X_test)

print('Predict GBDT...')
model_gbdt = build_model_gbdt(x_train,y_train)
val_gbdt = model_gbdt.predict(x_val)
subA_gbdt = model_gbdt.predict(X_test)

Predict LR...
Predict Ridge...
Predict Lasso...
Predict GBDT...
{'learning_rate': 0.2}

The two methods that usually give the most significant gains in competitions:

print('predict XGB...')
model_xgb = build_model_xgb(x_train,y_train)
val_xgb = model_xgb.predict(x_val)
subA_xgb = model_xgb.predict(X_test)

print('predict lgb...')
model_lgb = build_model_lgb(x_train,y_train)
val_lgb = model_lgb.predict(x_val)
subA_lgb = model_lgb.predict(X_test)
predict XGB...
predict lgb...
print('Sta inf of lgb:')
Sta_inf(subA_lgb)
Sta inf of lgb:
_min -113.02647702199383
_max: 90367.18180594654
_mean 5926.360831805605
_ptp 90480.20828296854
_std 7352.037499240903
_var 54052455.39024443

1) Weighted fusion

def Weighted_method(test_pre1,test_pre2,test_pre3,w=[1/3,1/3,1/3]):
    Weighted_result = w[0]*pd.Series(test_pre1)+w[1]*pd.Series(test_pre2)+w[2]*pd.Series(test_pre3)
    return Weighted_result

## Init the Weight
w = [0.3,0.4,0.3]

## 測(cè)試驗(yàn)證集準(zhǔn)確度
val_pre = Weighted_method(val_lgb,val_xgb,val_gbdt,w)
MAE_Weighted = mean_absolute_error(y_val,val_pre)
print('MAE of Weighted of val:',MAE_Weighted)

## 預(yù)測(cè)數(shù)據(jù)部分
subA = Weighted_method(subA_lgb,subA_xgb,subA_gbdt,w)
print('Sta inf:')
Sta_inf(subA)
## Build the submission file
sub = pd.DataFrame()
sub['SaleID'] = X_test.index
sub['price'] = subA
sub.to_csv('./sub_Weighted.csv',index=False)
MAE of Weighted of val: 721.1704120165163
Sta inf:
_min -197.09928483735297
_max: 91079.8298898976
_mean 5928.720726400139
_ptp 91276.92917473496
_std 7341.282090664513
_var 53894422.73471152
## Compare with a simple LR (linear regression)
val_lr_pred = model_lr.predict(x_val)
MAE_lr = mean_absolute_error(y_val,val_lr_pred)
print('MAE of lr:',MAE_lr)
MAE of lr: 2601.82041433559

2) Stacking fusion

## Stacking

## First level
train_lgb_pred = model_lgb.predict(x_train)
train_xgb_pred = model_xgb.predict(x_train)
train_gbdt_pred = model_gbdt.predict(x_train)

Strak_X_train = pd.DataFrame()
Strak_X_train['Method_1'] = train_lgb_pred
Strak_X_train['Method_2'] = train_xgb_pred
Strak_X_train['Method_3'] = train_gbdt_pred

Strak_X_val = pd.DataFrame()
Strak_X_val['Method_1'] = val_lgb
Strak_X_val['Method_2'] = val_xgb
Strak_X_val['Method_3'] = val_gbdt

Strak_X_test = pd.DataFrame()
Strak_X_test['Method_1'] = subA_lgb
Strak_X_test['Method_2'] = subA_xgb
Strak_X_test['Method_3'] = subA_gbdt
Strak_X_test.head()
## Second-level model
model_lr_Stacking = build_model_lr(Strak_X_train,y_train)
## 訓(xùn)練集
train_pre_Stacking = model_lr_Stacking.predict(Strak_X_train)
print('MAE of Stacking-LR:',mean_absolute_error(y_train,train_pre_Stacking))

## 驗(yàn)證集
val_pre_Stacking = model_lr_Stacking.predict(Strak_X_val)
print('MAE of Stacking-LR:',mean_absolute_error(y_val,val_pre_Stacking))

## 預(yù)測(cè)集
print('Predict Stacking-LR...')
subA_Stacking = model_lr_Stacking.predict(Strak_X_test)

MAE of Stacking-LR: 635.088640438716
MAE of Stacking-LR: 717.0504813030163
Predict Stacking-LR...
subA_Stacking[subA_Stacking<10]=10  ## clip away implausibly small predictions

sub = pd.DataFrame()
sub['SaleID'] = TestA_data.SaleID
sub['price'] = subA_Stacking
sub.to_csv('./sub_Stacking.csv',index=False)
print('Sta inf:')
Sta_inf(subA_Stacking)
Sta inf:
_min 10.0
_max: 93069.56247871982
_mean 5926.1644584540845
_ptp 93059.56247871982
_std 7391.202609036913
_var 54629876.00783407

5.5 Takeaways

Fusion in competitions, in my view, spans several levels, and it is an important way both to gain score and to improve model robustness:

  • 1)結(jié)果層面的融合晋辆,這種是最常見(jiàn)的融合方法渠脉,其可行的融合方法也有很多,比如根據(jù)結(jié)果的得分進(jìn)行加權(quán)融合瓶佳,還可以做Log芋膘,exp處理等。在做結(jié)果融合的時(shí)候霸饲,有一個(gè)很重要的條件是模型結(jié)果的得分要比較近似为朋,然后結(jié)果的差異要比較大,這樣的結(jié)果融合往往有比較好的效果提升厚脉。

  • 2) Feature-level fusion. This level is arguably splitting rather than fusion: when training the same kind of model, the features can be split across different models, with the models or their results fused afterwards; this sometimes works well too.

  • 3) Model-level fusion, which involves model stacking and design: adding a stacking layer, feeding some models' outputs in as features, and so on. This takes experimentation and thought. Model-level fusion works best when the model types differ to some degree; fusing the same model with different hyperparameters usually yields little.
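The log/exp trick for result-level fusion mentioned in 1) can be sketched as averaging in log space, which acts like a geometric mean and damps the pull of occasional very large predictions (toy numbers; assumes strictly positive predictions, as with prices):

```python
import numpy as np

pred_a = np.array([1200., 5400., 800., 15000.])  # hypothetical model A prices
pred_b = np.array([1000., 6000., 950., 9000.])   # hypothetical model B prices

# Average in log space, then map back with exp
fused = np.exp((np.log(pred_a) + np.log(pred_b)) / 2)
print(fused)
```

This is equivalent to the element-wise geometric mean, so the fused value never exceeds the arithmetic mean of the two predictions.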
