How to Prevent Overfitting in Machine Learning Models
Very deep neural networks with a huge number of parameters are very powerful machine learning systems. But in such massive networks, overfitting is a common and serious problem. Learning how to deal with overfitting is essential to mastering machine learning. The fundamental issue in machine learning is the tension between optimization and generalization. Optimization refers to the process of adjusting a model to get the best performance possible on the training data (the *learning* in machine learning), whereas generalization refers to how well the trained model performs on data it has **never seen before** (the test set). The goal of the game is to get good generalization, but you don't control generalization directly; you can only adjust the model based on its training data.
How do you know whether a model is overfitting?
The clearest sign of overfitting is when the model's accuracy is high on the training set but drops significantly on new data or on the test set. This means the model knows the training data very well but cannot generalize, which makes it useless in production or in an A/B test in most domains.
How can you prevent overfitting?
Okay, now let’s say you have found that your model overfits. What can you do to prevent it?
Fortunately, there are many techniques you can try. Below I describe a few of the most widely used solutions for overfitting.
1. Reduce the network size
The simplest way to prevent overfitting is to reduce the size of the model: the number of learnable parameters in the model (which is determined by the number of layers and the number of units per layer).
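As a rough sketch (assuming a Keras/TensorFlow workflow and a binary-classification task, neither of which the article specifies), reducing capacity can be as simple as using fewer layers and fewer units per layer:

```python
from tensorflow import keras
from tensorflow.keras import layers

# A larger model that may easily overfit a small dataset:
# three hidden layers with 512 units each.
bigger_model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# A smaller model with far fewer learnable parameters:
# two hidden layers with 16 units each.
smaller_model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

smaller_model.compile(optimizer="rmsprop",
                      loss="binary_crossentropy",
                      metrics=["accuracy"])
```

The right size is found empirically: start small, increase capacity, and watch when the validation metrics start to degrade.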
2. Cross-Validation
In cross-validation, the initial training data is split into several smaller train-test splits, which are then used to tune the model. The most popular form of cross-validation is K-fold cross-validation, where K is the number of folds. There is a short video from Udacity that explains K-fold cross-validation very well.
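A minimal sketch with scikit-learn (the estimator and the toy dataset below are placeholders; any classifier and any feature matrix `X` with labels `y` would work the same way):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)          # toy dataset as a stand-in
model = LogisticRegression(max_iter=1000)  # any estimator would do

# 5-fold cross-validation: train on 4 folds, validate on the remaining fold,
# and rotate so every fold serves as the validation set exactly once.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kfold)

print("Per-fold accuracy:", scores)
print("Mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```

A large gap between the per-fold validation scores and the training score is another signal that the model is overfitting.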
3. Add weight regularization
Given two explanations for something, the explanation most likely to be correct is the simplest one — the one that makes fewer assumptions. This idea also applies to the models learned by neural networks: given some training data and a network architecture, multiple sets of weight values could explain the data. Simpler models are less likely to overfit than complex ones. A simple model in this context is a model where the distribution of parameter values has less entropy (or a model with fewer parameters). Thus a common way to mitigate overfitting is to put constraints on the complexity of a network by forcing its weights to take only small values, which makes the distribution of weight values more regular. This is called weight regularization, and it’s done by adding to the loss function of the network a cost associated with having large weights.
This cost comes in two flavors:
- **L1 regularization**: the cost added is proportional to the absolute value of the weight coefficients.
- **L2 regularization**: the cost added is proportional to the square of the value of the weight coefficients. L2 regularization is also called weight decay in the context of neural networks. [1]
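In Keras, for example, weight regularization is added per layer through the `kernel_regularizer` argument. The sketch below (layer sizes and the 0.001 factor are illustrative choices, not prescriptions from the article) adds an L2 penalty, so every weight coefficient contributes `0.001 * weight ** 2` to the total loss:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    # l2(0.001): each weight adds 0.001 * weight**2 to the loss,
    # pushing the network toward small weight values.
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(1, activation="sigmoid"),
])

# regularizers.l1(0.001) or regularizers.l1_l2(l1=0.001, l2=0.001)
# can be used instead for an L1 or combined penalty.
```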
4. Removing irrelevant features
Improve the data by removing irrelevant features. A dataset may contain many features that contribute little to the prediction. Removing those less important features can improve accuracy and reduce overfitting. You can use scikit-learn's feature selection module for this purpose.
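A hedged sketch using scikit-learn's feature selection module (the `SelectKBest` scorer and `k=10` are just one possible choice; the article does not prescribe a particular selector, and the synthetic dataset is only a stand-in):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data as a stand-in: 25 features, only 5 of them informative.
X, y = make_classification(n_samples=500, n_features=25,
                           n_informative=5, random_state=0)

# Keep the 10 features with the highest ANOVA F-score against the target.
selector = SelectKBest(score_func=f_classif, k=10)
X_reduced = selector.fit_transform(X, y)

print("Original shape:", X.shape)         # (500, 25)
print("Reduced shape:", X_reduced.shape)  # (500, 10)
print("Selected feature indices:", selector.get_support(indices=True))
```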
5. Adding dropout
Dropout, applied to a layer, consists of randomly dropping out (setting to zero) a number of output features of the layer during training. Let's say a given layer would normally return a vector [0.2, 0.5, 1.3, 0.8, 1.1] for a given input sample during training. After applying dropout, this vector will have a few entries zeroed out at random: for example, [0, 0.5, 1.3, 0, 1.1].
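In Keras, dropout is applied by inserting a `Dropout` layer after the layer whose outputs should be dropped. The rate of 0.5 below is a common choice rather than something mandated by the article, and the surrounding layer sizes are illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    # During training, randomly zero out 50% of this layer's output features;
    # at inference time dropout is disabled automatically.
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
```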
6. Data Augmentation
Another simple way to reduce overfitting is to increase the size of the training data; in theory, if the training set contained all possible data, the model could not overfit at all. Let's consider that we are dealing with images. In this case, there are a few ways of increasing the size of the training data: rotating the image, flipping, scaling, shifting, etc. This technique is known as data augmentation, and it usually provides a big leap in improving the accuracy of the model.
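A sketch with Keras' `ImageDataGenerator` (newer Keras versions also offer preprocessing layers such as `RandomFlip` and `RandomRotation`; the parameter values and the `x_train`/`y_train` names below are illustrative assumptions, not from the article):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Generate randomly transformed variants of each training image on the fly.
datagen = ImageDataGenerator(
    rotation_range=20,       # rotate by up to 20 degrees
    width_shift_range=0.1,   # shift horizontally by up to 10% of the width
    height_shift_range=0.1,  # shift vertically by up to 10% of the height
    zoom_range=0.2,          # zoom in or out by up to 20% (scaling)
    horizontal_flip=True,    # flip images left-right
)

# Assuming x_train and y_train are NumPy arrays of images and labels
# (hypothetical names), the generator can feed model.fit directly:
# model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=30)
```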