Introduction
PyTorch provides 6 learning-rate scheduling methods, listed below:
StepLR
MultiStepLR
ExponentialLR
CosineAnnealingLR
ReduceLROnPlateau
LambdaLR
They are used to modify the learning rate as training iterates. All 6 methods inherit from a common base class, _LRScheduler, which has three main attributes and two main methods.
The three main attributes are:
- optimizer: the associated optimizer
- last_epoch: records the number of epochs
- base_lrs: records the initial learning rates
The two main methods are:
- step(): updates the learning rate for the next epoch
- get_last_lr(): returns the most recently computed learning rate
Let's first go through the 6 learning-rate scheduling methods provided by PyTorch.
Note
The learning-rate update should only be performed in the epoch loop, not in the batch loop, since stepping per batch would make the learning rate drop far too quickly. The attribute last_epoch also shows that the adjustment is meant to happen at the epoch level.
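To make the attributes and methods above concrete, here is a minimal sketch (it uses StepLR, the first scheduler introduced below, purely as an example; the single dummy parameter exists only so an optimizer can be built):
import torch
from torch.optim.lr_scheduler import StepLR

# A single dummy parameter is enough to construct an optimizer
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = StepLR(optimizer, step_size=5, gamma=0.5)

print(scheduler.optimizer)      # the associated optimizer
print(scheduler.last_epoch)     # epoch counter, starts at 0
print(scheduler.base_lrs)       # initial learning rates: [0.1]

optimizer.step()                # normally called once per batch
scheduler.step()                # called once per epoch
print(scheduler.get_last_lr())  # most recently computed learning rate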
StepLR
Function: adjusts the learning rate at a fixed interval, i.e. every fixed number of epochs. The resulting curve is a staircase that gradually steps down (gamma less than 1) or up (gamma greater than 1, though presumably nobody sets it that way).
Main parameters:
- step_size: the interval (in epochs) between adjustments
- gamma: the adjustment factor; at every adjustment the current learning rate is multiplied by it, i.e. lr = lr * gamma
Let's look at how StepLR evolves, with gamma set to 0.5 over 50 epochs:
See the appendix for the full code.
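For a quicker check than the full appendix script, the schedule can also be dry-run without any data (a minimal sketch; step_size=5 is the value used in the appendix code):
import torch
from torch.optim.lr_scheduler import StepLR

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
scheduler = StepLR(optimizer, step_size=5, gamma=0.5)

lrs = []
for epoch in range(50):
    lrs.append(scheduler.get_last_lr()[0])
    optimizer.step()   # stands in for one epoch of training
    scheduler.step()

# 0.1 for epochs 0-4, 0.05 for epochs 5-9, 0.025 for epochs 10-14, ...
print(lrs[:15])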
MultiStepLR
Function: adjusts the learning rate at user-specified milestones.
Main parameters:
- milestones: the adjustment points; this parameter is a list of integers, each one the epoch at which the learning rate should be adjusted, e.g. [50, 125, 180] means adjustments at epochs 50, 125, and 180
- gamma: the adjustment factor, with the same meaning as gamma in StepLR
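As a minimal sketch (same dummy-optimizer trick as above), the milestone behaviour can be checked directly:
import torch
from torch.optim.lr_scheduler import MultiStepLR

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
scheduler = MultiStepLR(optimizer, milestones=[20, 25, 35], gamma=0.5)

for epoch in range(50):
    optimizer.step()
    scheduler.step()
    if epoch + 1 in (20, 25, 35):
        # the learning rate is halved each time a milestone is reached
        print(epoch + 1, scheduler.get_last_lr())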
The figure below shows the MultiStepLR curve with milestones set to [20, 25, 35]; the learning rate changes at epochs 20, 25, and 35.
ExponentialLR
Function: decays the learning rate exponentially.
Main parameters:
- gamma: the base of the exponential, usually a value close to 1 (e.g. 0.9). Update rule: lr = initial_lr * gamma ** epoch, i.e. the learning rate is multiplied by gamma once every epoch.
The figure below shows the ExponentialLR curve with gamma set to 0.9; the learning rate decays exponentially.
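A quick sanity check (a sketch, assuming an initial learning rate of 0.1) confirms that the scheduler matches the closed form initial_lr * gamma ** epoch:
import torch
from torch.optim.lr_scheduler import ExponentialLR

base_lr, gamma = 0.1, 0.9
optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=base_lr)
scheduler = ExponentialLR(optimizer, gamma=gamma)

for epoch in range(5):
    # scheduler value vs. closed-form value for the current epoch
    print(scheduler.get_last_lr()[0], base_lr * gamma ** epoch)
    optimizer.step()
    scheduler.step()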
CosineAnnealingLR
Function: adjusts the learning rate along a cosine cycle; unlike the schedules above, this one can also increase the learning rate.
Main parameters:
- T_max: the decay period; it corresponds to half of a full cosine cycle
- eta_min: the lower bound on the learning rate
Update rule: lr = eta_min + 0.5 * (initial_lr - eta_min) * (1 + cos(epoch / T_max * pi))
The figure below shows the CosineAnnealingLR curve with T_max set to 10 and eta_min left at its default of 0. The learning rate varies periodically; the period of the cosine is twice T_max, i.e. 20 epochs.
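The appendix does not include a CosineAnnealingLR listing, so here is a minimal sketch that reproduces the curve described above (T_max=10, eta_min at its default of 0, 50 epochs):
import matplotlib.pyplot as plt
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
scheduler = CosineAnnealingLR(optimizer, T_max=10)

lr_list = []
for epoch in range(50):
    lr_list.append(scheduler.get_last_lr()[0])
    optimizer.step()   # stands in for one epoch of training
    scheduler.step()

# The lr oscillates between 0.1 and 0 with a full period of 2 * T_max = 20 epochs
plt.plot(lr_list)
plt.legend(labels=['CosineAnnealingLR: T_max=10'])
plt.show()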
ReduceLROnPlateau
Function: monitors a metric and adjusts the learning rate when the metric stops improving. Very practical; the monitored metric can be the loss or the accuracy.
Main parameters:
- mode: "min" or "max"; in min mode the learning rate is reduced when the monitored metric stops decreasing, in max mode when it stops increasing
- factor: the adjustment factor, equivalent to gamma in StepLR
- patience: how many consecutive epochs without improvement to tolerate before adjusting
- cooldown: a "cool-down" period during which monitoring is paused after an adjustment
- verbose: whether to print a log message when the learning rate is adjusted
- min_lr: the lower bound on the learning rate
- eps: the minimum decay applied to the learning rate; if the change would be smaller than eps, the update is skipped
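In real training, step() is usually given the validation loss of the current epoch (the appendix instead feeds it a hand-crafted loss_value so the behaviour is easy to see). A minimal sketch of the usual pattern, with tiny stand-ins for the model and validation set:
import torch
from torch.nn import Linear
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Tiny stand-ins for a real model and validation set
net = Linear(1, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.3, patience=5,
                              cooldown=3, min_lr=1e-4)
val_x, val_y = torch.randn(20, 1), torch.randn(20, 1)

for epoch in range(50):
    # --- the training loop over train_loader would go here ---
    with torch.no_grad():
        val_loss = criterion(net(val_x), val_y).item()
    # The monitored metric is passed to step(); the learning rate is reduced
    # once the metric has failed to improve for `patience` epochs
    scheduler.step(val_loss)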
The figure below shows the ReduceLROnPlateau curve, with the parameters set as follows:
lr = 0.1
factor = 0.3
mode = "min"
patience = 5
cooldown = 3
min_lr = 1e-4
verbose = True
Here a fixed loss_value = 0.5 is used at first to simulate a loss that does not change; at the 4th epoch, loss_value is set to 0.4. The resulting curve is shown below:
The terminal prints the following messages:
Epoch 10: reducing learning rate of group 0 to 3.0000e-02.
Epoch 19: reducing learning rate of group 0 to 9.0000e-03.
Epoch 28: reducing learning rate of group 0 to 2.7000e-03.
Epoch 37: reducing learning rate of group 0 to 8.1000e-04.
Epoch 46: reducing learning rate of group 0 to 2.4300e-04.
Analysis
The terminal shows that the learning rate is first adjusted at the 10th epoch. During epochs 0, 1, 2 and 3 (the first 4 epochs) there is no adjustment; at epoch=3 (the 4th epoch) loss_value is manually lowered to 0.4 to simulate a drop in the loss, so ReduceLROnPlateau's patience counter restarts at epoch=4 (the 5th epoch). By epoch=8 (the 9th epoch) patience has reached its limit (patience=5), so at epoch=9 (the 10th epoch) the learning rate is adjusted: it is multiplied by 0.3, giving 0.03.
After that, ReduceLROnPlateau enters the cooldown state and stops monitoring the loss for 3 epochs (cooldown=3), until epoch=12 (the 13th epoch). It then watches the loss again for 5 epochs; at epoch=17 (the 18th epoch) patience reaches its limit once more (patience=5), so at epoch=18 (the 19th epoch) the learning rate is adjusted again, multiplied by 0.3 to give 0.009.
The pattern repeats, with further adjustments at epochs 28, 37 and 46.
LambdaLR
Function: a user-defined adjustment strategy.
Main parameters:
- lr_lambda: a function or a list; if a list is given, every element must be a function. The argument passed to lr_lambda is last_epoch.
Below, LambdaLR is used to mimic ExponentialLR, with gamma set to 0.95:
lambda epoch: 0.95**epoch
The resulting curve is shown in the figure below.
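lr_lambda can also be a list with one function per parameter group, which allows different schedules for different parts of the model. A minimal sketch (the two-group optimizer here is purely hypothetical, not part of the original example):
import torch
from torch.optim.lr_scheduler import LambdaLR

w1 = torch.nn.Parameter(torch.zeros(1))
w2 = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([{'params': [w1]}, {'params': [w2]}], lr=0.1)

# One lambda per parameter group: the first group decays exponentially,
# the second halves its learning rate every 10 epochs
scheduler = LambdaLR(optimizer, lr_lambda=[lambda e: 0.95 ** e,
                                           lambda e: 0.5 ** (e // 10)])

for epoch in range(3):
    optimizer.step()
    scheduler.step()
    print(scheduler.get_last_lr())  # two entries, one per parameter group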
Appendix
In the code below, Net is a dummy network with no real purpose.
StepLR code
import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, TensorDataset
lr = 0.1
# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()
train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)
class Net(torch.nn.Module):
def __init__(self, hidden_num=10):
super(Net, self).__init__()
self.layer = Sequential(
Linear(1, hidden_num),
Linear(hidden_num, 1),
)
def forward(self, x):
x = self.layer(x)
return x
net = Net()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = StepLR(optimizer=optimizer, step_size=5, gamma=0.5)
lr_list = []
for i in range(50):
for x, y in train_loader:
y_pred = net(x)
loss = criterion(y_pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()
lr_list.append(optimizer.param_groups[0]['lr'])
# Plot the learning-rate curve
plt.plot(lr_list)
plt.legend(labels=['StepLR:gamma=0.5'])
plt.show()
print(scheduler)
MultiStepLR code
import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import MultiStepLR
from torch.utils.data import DataLoader, TensorDataset
lr = 0.1
# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()
train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)
class Net(torch.nn.Module):
def __init__(self, hidden_num=10):
super(Net, self).__init__()
self.layer = Sequential(
Linear(1, hidden_num),
Linear(hidden_num, 1),
)
def forward(self, x):
x = self.layer(x)
return x
net = Net()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = MultiStepLR(optimizer=optimizer, milestones=[20, 25, 35], gamma=0.5)
lr_list = []
for i in range(50):
for x, y in train_loader:
y_pred = net(x)
loss = criterion(y_pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
lr_list.append(optimizer.param_groups[0]['lr'])
scheduler.step()
# Plot the learning-rate curve
plt.plot(lr_list)
plt.legend(labels=['MultiStepLR:gamma=0.5'])
plt.show()
print(scheduler)
ExponentialLR code
import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import ExponentialLR
from torch.utils.data import DataLoader, TensorDataset
lr = 0.1
gamma = 0.9
# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()
train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)
class Net(torch.nn.Module):
def __init__(self, hidden_num=10):
super(Net, self).__init__()
self.layer = Sequential(
Linear(1, hidden_num),
Linear(hidden_num, 1),
)
def forward(self, x):
x = self.layer(x)
return x
net = Net()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = ExponentialLR(optimizer=optimizer, gamma=gamma)
lr_list = []
for i in range(50):
for x, y in train_loader:
y_pred = net(x)
loss = criterion(y_pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
lr_list.append(optimizer.param_groups[0]['lr'])
scheduler.step()
# Plot the learning-rate curve
plt.plot(lr_list)
plt.legend(labels=['ExponentialLR: gamma={}'.format(gamma)])
plt.show()
print(scheduler)
ReduceLROnPlateau code
import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torch.utils.data import DataLoader, TensorDataset
lr = 0.1
loss_value = 0.5
factor = 0.3
mode = "min"
patience = 5
cooldown = 3
min_lr = 1e-4
verbose = True
# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()
train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)
class Net(torch.nn.Module):
def __init__(self, hidden_num=10):
super(Net, self).__init__()
self.layer = Sequential(
Linear(1, hidden_num),
Linear(hidden_num, 1),
)
def forward(self, x):
x = self.layer(x)
return x
net = Net()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = ReduceLROnPlateau(optimizer=optimizer, mode=mode, factor=factor, patience=patience,
verbose=verbose, cooldown=cooldown, min_lr=min_lr)
lr_list = []
for i in range(50):
lr_list.append(optimizer.param_groups[0]['lr'])
for x, y in train_loader:
y_pred = net(x)
loss = criterion(y_pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
    # Manually simulate the loss decreasing at epoch 3
if i == 3:
loss_value = 0.4
scheduler.step(loss_value)
# Plot the learning-rate curve
plt.plot(lr_list)
plt.legend(labels=['ReduceLROnPlateau'])
plt.show()
print(scheduler)
LambdaLR code
import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import LambdaLR
from torch.utils.data import DataLoader, TensorDataset
lr = 0.1
# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()
train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
train_data = iter(train_loader)
train_x, train_y = next(train_data)
class Net(torch.nn.Module):
def __init__(self, hidden_num=10):
super(Net, self).__init__()
self.layer = Sequential(
Linear(1, hidden_num),
Linear(hidden_num, 1),
)
def forward(self, x):
x = self.layer(x)
return x
net = Net()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = LambdaLR(optimizer=optimizer, lr_lambda=lambda epoch: 0.95**epoch)  # mimic ExponentialLR
lr_list = []
for i in range(50):
lr_list.append(scheduler.get_last_lr())
for x, y in train_loader:
y_pred = net(x)
loss = criterion(y_pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()
# Plot the learning-rate curve
plt.plot(lr_list)
plt.legend(labels=['LambdaLR'])
plt.show()
print(scheduler)