A few small notes I still owed from the earlier regression posts:
1. When doing regression, we usually use gradient descent to minimize the loss function. The gradient is a vector, and the directional derivative is largest along the gradient direction, so the gradient is the direction of fastest increase. It follows that the negative gradient is the direction of fastest descent.
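A minimal sketch of this in plain NumPy (the quadratic loss and the random data here are hypothetical, just to show the negative-gradient step):

```python
import numpy as np

# Minimal gradient descent on a least-squares loss L(w) = ||Xw - y||^2 / N.
# Each update steps against the gradient, the direction of fastest decrease.
def gradient_descent(X, y, lr=0.1, steps=100):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # dL/dw
        w -= lr * grad                         # move along -gradient
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w = np.array([3.0, -2.0])
y = X @ true_w + 0.1 * rng.normal(size=100)
print(gradient_descent(X, y))  # recovers approximately [3, -2]
```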
2. When fitting a model, more complex is not always better; extra complexity can cause overfitting. A more complex model does not always lead to better performance on testing data.
3乘瓤、損失函數(shù)往往可以加入一個正則化條件环形,比如lamda乘上一個系數(shù)平方的和,以此引入系數(shù)的波動衙傀。系數(shù)波動的越大抬吟,越不靠譜。一般來說统抬,抖動得很厲害的function火本,一般都不太對。we believe that smoother function is more likely to be correct聪建。越平滑钙畔,感覺越靠譜點。
4. Sources of model error: bias and variance. A simple model has large bias and small variance, and its function space is small, so it underfits; a complex model generally has small bias and large variance, so it overfits.
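A small simulation of this trade-off (assuming a sine ground truth and polynomial models; the degrees, sample sizes, and noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x_grid = np.linspace(0, 1, 50)
f = lambda x: np.sin(2 * np.pi * x)  # assumed ground truth

def bias2_and_variance(degree, trials=200, n=20, noise=0.3):
    preds = np.empty((trials, x_grid.size))
    for t in range(trials):
        x = rng.uniform(0, 1, n)
        y = f(x) + noise * rng.normal(size=n)
        coef = np.polyfit(x, y, degree)       # fit one sampled dataset
        preds[t] = np.polyval(coef, x_grid)
    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - f(x_grid)) ** 2)   # avg squared bias
    variance = preds.var(axis=0).mean()             # avg variance
    return bias2, variance

print("degree 1:", bias2_and_variance(1))  # large bias, small variance
print("degree 9:", bias2_and_variance(9))  # small bias, large variance
```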
5. How to tell underfitting from overfitting: if the model fits the training data well but not the test data, it is overfitting; on the other hand, if it cannot even fit the training data, it is underfitting.
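The same diagnostic in code (hypothetical data; compare training and test error as model complexity grows):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=30)
x_tr, y_tr, x_te, y_te = x[:20], y[:20], x[20:], y[20:]

for degree in (1, 3, 9):
    coef = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: np.mean((np.polyval(coef, xs) - ys) ** 2)
    # degree 1: both errors high -> underfitting
    # degree 9: train error low, test error much higher -> overfitting
    print(degree, mse(x_tr, y_tr), mse(x_te, y_te))
```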
6. How to deal with the large-variance problem:
(1) Collect more data (increase N: since E(S^2) = ((N-1)/N)·sigma^2 for the biased variance estimator S^2, the gap between E(S^2) and sigma^2 shrinks as N grows; see the first sketch after this list)
(2) Regularization
(3) Cross-validation (see the second sketch after this list)
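For (1), a quick simulation check of E(S^2) = ((N-1)/N)·sigma^2 (assuming standard normal data, so sigma^2 = 1):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0
for N in (5, 20, 100):
    # Biased variance estimator S^2 = mean((x - xbar)^2), i.e. ddof=0.
    s2 = np.array([rng.normal(size=N).var(ddof=0) for _ in range(100_000)])
    print(N, s2.mean(), (N - 1) / N * sigma2)  # the two columns should match
```

For (3), a minimal cross-validation sketch with scikit-learn (the data are generated on the spot; `Ridge` and `cross_val_score` are standard scikit-learn APIs):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=10.0,
                       random_state=0)
# 5-fold CV: each fold serves once as validation data, the rest as training,
# giving a more stable estimate of generalization error than one split.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
print(scores.mean(), scores.std())
```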
7. What to do when a conditional probability equals 0 in Naive Bayes? Use Laplacian smoothing to avoid zero conditional probabilities. Method: add one to the numerator and add the number of possible values of the attribute to the denominator.
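A tiny sketch of the smoothed estimate (the counts below are made up; k is the number of values the attribute can take):

```python
def laplace_smoothed(count_xy, count_y, k):
    """P(x = v | y) with Laplace smoothing: (count + 1) / (class count + k),
    where k is the number of possible values of the attribute.
    Guarantees no conditional probability is exactly 0."""
    return (count_xy + 1) / (count_y + k)

# An attribute with 3 possible values, never observed with this class:
print(laplace_smoothed(0, 50, 3))  # 1/53 instead of 0
```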
8. Is SVM sensitive to outliers?
"Effect does not directly mean sensitivity." That outliers may shift the hyperplane does not mean the model is sensitive to them. An outlier's effect on the hyperplane depends mainly on the parameter C. The smaller C is, the larger the margin gets and the more tolerant the model is of outliers. The larger C is, the less tolerant the model is, and it ends up with a smaller margin; that also means a smaller training error and possibly overfitting.
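A sketch of the C trade-off with scikit-learn's SVC (the toy clusters and the single mislabeled point are assumptions for illustration):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
X = np.vstack([X, [[-2.5, -2.5]]])  # one outlier deep in class 0's region...
y = np.append(y, 1)                 # ...labeled as class 1

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_)  # margin width = 2 / ||w||
    # Small C: wide margin, the outlier is mostly ignored.
    # Large C: narrow margin, the boundary bends toward the outlier
    # (lower training error, higher overfitting risk).
    print(C, margin, clf.score(X, y))
```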
9. Similarities and differences between Holt-Winters and seasonal ARIMA:
SARIMA and Holt-Winters both capture the evolving structure of a series (level, trend, seasonality), but SARIMA is more general. The additive ETS methods correspond to special cases of ARIMA, yet ARIMA does not cover all ETS solutions. For example, Holt-Winters' additive method is very close to a SARIMA model with specific parameters, but Holt-Winters' multiplicative method has no equivalent SARIMA counterpart.
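A sketch comparing the two families in statsmodels (the monthly series is simulated; the SARIMA(0,1,1)(0,1,1)_12 "airline" orders are a common nearby choice for comparison, not the exact algebraic equivalent of additive Holt-Winters):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical monthly series: linear trend + additive seasonality + noise.
rng = np.random.default_rng(0)
n, m = 120, 12
t = np.arange(n)
y = pd.Series(
    10 + 0.1 * t + 3 * np.sin(2 * np.pi * t / m) + rng.normal(0, 0.5, n),
    index=pd.date_range("2010-01", periods=n, freq="MS"),
)

hw = ExponentialSmoothing(y, trend="add", seasonal="add",
                          seasonal_periods=m).fit()
sar = SARIMAX(y, order=(0, 1, 1), seasonal_order=(0, 1, 1, m)).fit(disp=False)

# Both capture level, trend, and season, so forecasts are typically close
# here; a multiplicative Holt-Winters fit would have no SARIMA counterpart.
print(hw.forecast(12).round(2).values)
print(sar.forecast(12).round(2).values)
```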