Logistic regression is also known as log-odds regression. Although its name says "regression", it is actually a classification method. Logistic regression can also be extended from binary classification to multiclass classification, which gives multinomial logistic regression.
1. Constructing the prediction function h(x)
The log-odds function is a sigmoid function: its output lies in (0, 1), and its midpoint value is 0.5 (reached at t = 0). The prediction function is h_θ(x) = sig(θ^T x) = 1 / (1 + e^(−θ^T x)). If sig(t) < 0.5, the sample is assigned to the negative/0 class; if sig(t) > 0.5, it is assigned to the positive/1 class. The sigmoid output can therefore be interpreted as the model's estimate of the probability that a sample belongs to the positive class.
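Below is a minimal sketch of this prediction function; the names sigmoid, predict, theta, and X are illustrative, not from the original article:

import numpy as np

def sigmoid(t):
    # logistic function: maps any real t into (0, 1); sigmoid(0) = 0.5
    return 1.0 / (1.0 + np.exp(-t))

def predict(theta, X):
    # h(x) = sigmoid(theta^T x); predict class 1 when the probability exceeds 0.5
    return (sigmoid(X @ theta) > 0.5).astype(int)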
2. Constructing the loss function
The parameters are estimated by maximum likelihood. Advantage: with large samples, the parameter estimates are stable, with small bias and small estimation variance.
Probability function (writing h_θ(x) for the predicted probability that x belongs to the positive class):
P(y | x; θ) = h_θ(x)^y · (1 − h_θ(x))^(1−y),  y ∈ {0, 1}
Since the m samples are independent, the likelihood function is:
L(θ) = ∏_{i=1}^m h_θ(x_i)^{y_i} · (1 − h_θ(x_i))^{1−y_i}
Taking the logarithm gives the log-likelihood:
l(θ) = ∑_{i=1}^m [ y_i·log h_θ(x_i) + (1 − y_i)·log(1 − h_θ(x_i)) ]
Maximum likelihood estimation then yields the Cost function and the J function:
Cost(h_θ(x), y) = −y·log h_θ(x) − (1 − y)·log(1 − h_θ(x))
J(θ) = (1/m) ∑_{i=1}^m Cost(h_θ(x_i), y_i) = −(1/m)·l(θ)
so maximizing l(θ) is equivalent to minimizing J(θ).
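As a sanity check, here is a minimal NumPy sketch of J(θ); the function name cost_J and the eps guard are illustrative additions:

import numpy as np

def cost_J(theta, X, y, eps=1e-12):
    # h_theta(x) for all m samples
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    h = np.clip(h, eps, 1.0 - eps)  # guard against log(0)
    # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))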
3. Minimizing J(θ) by gradient descent
The gradient of J(θ) is ?J(θ)/?θ_j = (1/m) ∑_{i=1}^m (h_θ(x_i) − y_i)·x_{i,j}, and each iteration updates θ_j := θ_j − α·?J(θ)/?θ_j for a learning rate α.
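A minimal sketch of batch gradient descent on J(θ); the learning rate alpha and iteration count n_iters are illustrative choices, not tuned values:

import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # h_theta(x) for every sample
        grad = X.T @ (h - y) / m                # gradient of J(theta)
        theta -= alpha * grad                   # step along the negative gradient
    return theta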
Improving the model
Avoiding overfitting --- regularization
Maximizing accuracy --- SVM
Kernel Logistic Regression
For reference, see this article: https://blog.csdn.net/qq_34993631/article/details/79345889
Accompanying video:
https://www.youtube.com/watch?v=AbaIkcQUQuo
Comparison of L1 and L2 regularization
L2 shrinks the weights; L1 induces sparsity (it drives some weights exactly to zero).
L2 is faster to optimize than L1.
When using logistic regression, it is advisable to apply at least one form of regularization and to standardize the features, as in the sketch below.
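A minimal scikit-learn sketch of both penalties with standardized features; C (the inverse regularization strength) and the solver choice are illustrative:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# L2 penalty: shrinks the weights (scikit-learn's default)
l2_model = make_pipeline(StandardScaler(), LogisticRegression(penalty='l2', C=1.0))

# L1 penalty: drives some weights exactly to zero; needs a solver that supports it
l1_model = make_pipeline(StandardScaler(), LogisticRegression(penalty='l1', C=1.0, solver='liblinear'))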
Comparison of linear regression and logistic regression
Linear regression fits a continuous target directly with least squares; logistic regression passes the linear score θ^T x through the sigmoid and fits class probabilities by maximum likelihood.
Comparison of logistic regression and naive Bayes
Naive Bayes makes stronger (conditional-independence) assumptions about the data, so it needs fewer examples to estimate its parameters; logistic regression makes weaker assumptions but typically needs more training data.
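A minimal sketch contrasting the two models on the same data, assuming GaussianNB as the naive Bayes variant; the synthetic dataset is purely illustrative:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for model in (GaussianNB(), LogisticRegression()):
    model.fit(X_train, y_train)
    print(type(model).__name__, accuracy_score(y_test, model.predict(X_test)))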
Example: predicting graduate-school admission from GMAT score, GPA, and work experience with scikit-learn.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve,roc_auc_score,accuracy_score,confusion_matrix
from sklearn.linear_model import LogisticRegression
candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,710,680,770,610,580,650,540,590,620,600,550,550,570,670,660,580,650,660,640,620,660,660,680,650,670,580,590,690],
'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,3.3,3.3,3,2.7,3.7,2.7,2.3,3.3,2,2.3,2.7,3,3.3,3.7,2.3,3.7,3.3,3,2.7,4,3.3,3.3,2.3,2.7,3.3,1.7,3.7],
'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,3,2,1,4,1,2,6,4,2,6,5,1,2,4,6,5,1,2,1,4,5],
'admitted': [1,1,1,1,1,1,0,1,1,0,0,1,1,1,1,0,0,1,0,0,0,0,0,0,0,1,1,0,1,1,0,0,1,1,1,0,0,0,0,1]
}
df = pd.DataFrame(candidates,columns= ['gmat', 'gpa','work_experience','admitted'])
X = df[['gmat', 'gpa','work_experience']]
y = df['admitted']
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.25,random_state=0)  # train on 75% of the dataset, test on 25%
logistic_regression = LogisticRegression()   # build the logistic regression model
logistic_regression.fit(X_train, y_train)    # fit it on the training set
y_pred = logistic_regression.predict(X_test) # predict labels for the test set
print(X_test)  # test dataset
print(y_pred)  # predicted values
print(confusion_matrix(y_test, y_pred))
print("Accuracy:",accuracy_score(y_test, y_pred))
# Plot the ROC curve
y_pred_proba = logistic_regression.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
auc = roc_auc_score(y_test, y_pred_proba)
plt.plot(fpr,tpr,label="data 1, auc="+str(auc))
plt.legend(loc=4)
plt.show()
The vertical axis of the ROC curve is the true positive rate, TPR = TP / (TP + FN); the horizontal axis is the false positive rate, FPR = FP / (FP + TN).