For linear regression:
Method 1: train/test split, one of the older cross-validation approaches (it now lives in the model_selection module). It randomly partitions the data into training and test sets; by default, 25 percent of the data is assigned to the test set. This method evaluates only a single split of the data, so the resulting estimate is unreliable.
score computes the R-squared coefficient; for regressors, both score and cross_val_score use R-squared by default.
// from sklearn.model_selection import train_test_split
// from sklearn.linear_model import LinearRegression
// X_train, X_test, y_train, y_test = train_test_split(X, y)
// model = LinearRegression()
// model.fit(X_train, y_train)  # fit on the training set only, not on all of (X, y)
// model.score(X_test, y_test)  # R-squared on the held-out test set
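A minimal runnable version of the snippet above; `make_regression` is assumed here purely to generate toy data in place of a real dataset.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data (stand-in for a real X, y)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# By default, 25% of the data goes to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)       # fit on the training portion only
r2 = model.score(X_test, y_test)  # R-squared on the held-out test set
print(r2)
```

Note that this R-squared value depends entirely on which rows landed in the test set, which is exactly the weakness the text describes.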
Method 2: use cross_val_score from the model_selection module.
// from sklearn.model_selection import cross_val_score
// from sklearn.linear_model import LinearRegression
// model = LinearRegression()
// scores = cross_val_score(model, X, y, cv=5)
cv=5 means cross_val_score performs 5-fold cross validation: the data is split into 5 folds, and each fold serves as the test set exactly once, giving 5 scores.
實際上溶推,cross_val_score可以用的方法有很多,如kFold, leave-one-out, ShuffleSplit等奸攻,舉例而言:
// from sklearn.model_selection import ShuffleSplit
// cv = ShuffleSplit(n_splits=3, test_size=0.3, random_state=0)
// cross_val_score(model, X, y, cv=cv)
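A runnable sketch comparing the two splitting strategies on synthetic data (`make_regression` is again only a stand-in for real data):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = make_regression(n_samples=150, n_features=4, noise=5.0, random_state=0)
model = LinearRegression()

# cv=5 -> 5-fold cross validation; default scoring for a regressor is R-squared
kfold_scores = cross_val_score(model, X, y, cv=5)

# ShuffleSplit: 3 independent random 70/30 splits
cv = ShuffleSplit(n_splits=3, test_size=0.3, random_state=0)
shuffle_scores = cross_val_score(model, X, y, cv=cv)

print(kfold_scores.shape, shuffle_scores.shape)
```

With ShuffleSplit the test sets may overlap across splits, unlike k-fold, where every sample appears in the test set exactly once.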
For logistic regression:
Logistic regression is used for classification problems. The linear-regression style of evaluation, which measures how far a prediction is from the true value (a distance), is clearly unsuitable for classification.
The most common metrics are accuracy, precision, recall and the F1 measure, all built from the counts of true positives, true negatives, false positives and false negatives.
1睹耐、計算confusion matrix
Confusion matrix 由 true positives, true negatives, false positives以及 false negatives組成辐赞。
// from sklearn.metrics import confusion_matrix
// cm = confusion_matrix(y_test, y_pred)  # don't reassign the name confusion_matrix
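A hand-checkable example with invented toy labels, showing how the four counts are laid out:

```python
from sklearn.metrics import confusion_matrix

# Toy labels, chosen so each cell is easy to count by hand
y_test = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_test, y_pred)
print(cm)  # [[3 1]
           #  [1 3]]
```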
2. accuracy: measures the fraction of the classifier's predictions that are correct.
// from sklearn.metrics import accuracy_score
// accuracy_score(y_true, y_pred)
LogisticRegression.score() uses accuracy by default.
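A short sketch verifying that claim; `make_classification` is assumed only to generate toy binary-classification data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# score() on a classifier is the same number accuracy_score computes
print(accuracy_score(y_test, y_pred) == clf.score(X_test, y_test))  # True
```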
3硝训、precision: 比如說我們預(yù)測得了cancer中實際確實得病的百分比
// from sklearn.linear_model import LogisticRegression
// from sklearn.model_selection import cross_val_score
// classifier = LogisticRegression()
// classifier.fit(X_train, y_train)
// precisions = cross_val_score(classifier, X_train, y_train, cv=5, scoring='precision')
4响委、recall: 比如說實際得了cancer新思,被我們預(yù)測出來的百分比
// recalls = cross_val_score(classifier, X_train, y_train, cv=5, scoring='recall')
5. There is a trade-off between precision and recall. The F1 score summarizes both; the higher the F1 score, the better.
// f1s = cross_val_score(classifier, X_train, y_train, cv=5, scoring='f1')
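The three scorers can be run side by side; here `make_classification` stands in for real training data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary labels in place of a real X_train, y_train
X_train, y_train = make_classification(n_samples=300, random_state=0)
classifier = LogisticRegression()

precisions = cross_val_score(classifier, X_train, y_train, cv=5, scoring='precision')
recalls = cross_val_score(classifier, X_train, y_train, cv=5, scoring='recall')
f1s = cross_val_score(classifier, X_train, y_train, cv=5, scoring='f1')

# F1 is the harmonic mean of precision and recall, so per fold it lies
# between the two
print(precisions.mean(), recalls.mean(), f1s.mean())
```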
6. The ROC curve and the AUC value
The ROC curve plots the false positive rate (FPR) on the x-axis against the true positive rate (TPR, i.e. recall) on the y-axis.
AUC = the area under the ROC curve.
// from sklearn.linear_model import LogisticRegression
// from sklearn.metrics import roc_curve, auc
// classifier = LogisticRegression()
// classifier.fit(X_train, y_train)
// predictions = classifier.predict_proba(X_test)
// false_positive_rate, recall, thresholds = roc_curve(y_test, predictions[:, 1])
// roc_auc = auc(false_positive_rate, recall)
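A complete runnable version on synthetic data; `make_classification` is an assumption for demonstration. (sklearn's `roc_auc_score` would give the same number in a single call.)

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

classifier = LogisticRegression().fit(X_train, y_train)

# predict_proba returns one column per class; column 1 is P(class = 1)
probs = classifier.predict_proba(X_test)[:, 1]

# Sweep the decision threshold over the probabilities to trace the curve
fpr, tpr, thresholds = roc_curve(y_test, probs)
roc_auc = auc(fpr, tpr)  # area under the ROC curve
print(roc_auc)
```

A classifier that guesses at random has an AUC of about 0.5; a perfect ranker reaches 1.0.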