When training a DL model, it is common practice to gradually reduce the learning rate as the epochs progress, and experiments have shown that this does help training converge. The learning rate can be changed either by a hand-crafted decay schedule that controls it directly, or by an algorithm that tunes it automatically. This post covers two decay methods that ship with TF: exponential decay and polynomial decay.
Exponential decay (tf.train.exponential_decay)
Method prototype:
tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)
Parameters:
learning_rate: the initial learning rate
global_step: the global step count (one step per batch)
decay_steps: the decay period in steps, i.e. how many steps pass between learning-rate updates
decay_rate: the exponential decay factor (the α in α^t)
staircase: whether to update the learning rate in discrete jumps, i.e. whether global_step/decay_steps is left as a float or floored to an integer
Formula:
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
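
To make staircase concrete, here is a minimal sketch, assuming TensorFlow 1.x; the values 0.1 (initial rate), 100 (decay_steps) and 0.5 (decay_rate) are chosen only for illustration:

import tensorflow as tf

# Evaluate the schedule at a few global_step values via a placeholder.
step_ph = tf.placeholder(tf.int64, shape=[])
lr_stair = tf.train.exponential_decay(0.1, step_ph, 100, 0.5, staircase=True)
lr_smooth = tf.train.exponential_decay(0.1, step_ph, 100, 0.5, staircase=False)

with tf.Session() as sess:
  for step in (0, 50, 100, 150, 200):
    stair, smooth = sess.run([lr_stair, lr_smooth], feed_dict={step_ph: step})
    print(step, stair, smooth)

# staircase=True floors global_step/decay_steps, so the rate holds within each
# 100-step window: 0.1, 0.1, 0.05, 0.05, 0.025.
# staircase=False decays continuously: at step 50 the exponent is 0.5,
# giving 0.1 * 0.5**0.5 ≈ 0.0707.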
Polynomial decay (tf.train.polynomial_decay)
Method prototype:
tf.train.polynomial_decay(learning_rate, global_step, decay_steps, end_learning_rate=0.0001, power=1.0, cycle=False, name=None)
Parameters:
learning_rate: the initial learning rate
global_step: the global step count (one step per batch)
decay_steps: the decay period in steps, i.e. how many steps pass between learning-rate updates
end_learning_rate: the final value the learning rate decays to
power: the polynomial decay exponent (the α in (1-t)^α)
cycle: whether t keeps cycling after global_step exceeds decay_steps
Formula:
When cycle=False:
global_step = min(global_step, decay_steps)
decayed_learning_rate = (learning_rate - end_learning_rate) * (1 - global_step / decay_steps) ^ power + end_learning_rate
When cycle=True:
decay_steps = decay_steps * ceil(global_step / decay_steps)
decayed_learning_rate = (learning_rate - end_learning_rate) * (1 - global_step / decay_steps) ^ power + end_learning_rate
Note: ceil rounds up.
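
The effect of cycle is easiest to see numerically. A minimal sketch, again assuming TensorFlow 1.x, with illustrative values (decay from 0.1 to 0.01 over 100 steps, power=1.0, i.e. linear):

import tensorflow as tf

step_ph = tf.placeholder(tf.int64, shape=[])
# Linear decay (power=1.0) from 0.1 down to 0.01 over 100 steps.
lr_clamp = tf.train.polynomial_decay(0.1, step_ph, 100,
                                     end_learning_rate=0.01, power=1.0,
                                     cycle=False)
lr_cycle = tf.train.polynomial_decay(0.1, step_ph, 100,
                                     end_learning_rate=0.01, power=1.0,
                                     cycle=True)

with tf.Session() as sess:
  for step in (0, 50, 100, 150, 200):
    clamp, cyc = sess.run([lr_clamp, lr_cycle], feed_dict={step_ph: step})
    print(step, clamp, cyc)

# cycle=False clamps global_step at decay_steps, so the rate sits at 0.01 once
# step 100 is reached: 0.1, 0.055, 0.01, 0.01, 0.01.
# cycle=True stretches decay_steps to 100 * ceil(step / 100); at step 150 that
# is 200, giving (0.1 - 0.01) * (1 - 150/200) + 0.01 = 0.0325.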
Typical code for configuring the learning rate:
def _configure_learning_rate(num_samples_per_epoch, global_step):
  """Configures the learning rate.

  Args:
    num_samples_per_epoch: The number of samples in each epoch of training.
    global_step: The global_step tensor.

  Returns:
    A `Tensor` representing the learning rate.

  Raises:
    ValueError: if FLAGS.learning_rate_decay_type is not recognized.
  """
  # One decay period = num_epochs_per_decay epochs, converted to steps.
  decay_steps = int(num_samples_per_epoch / FLAGS.batch_size *
                    FLAGS.num_epochs_per_decay)
  if FLAGS.sync_replicas:
    # With synchronous replicas, each aggregated update covers
    # replicas_to_aggregate batches, so fewer steps pass per epoch.
    decay_steps /= FLAGS.replicas_to_aggregate

  if FLAGS.learning_rate_decay_type == 'exponential':
    return tf.train.exponential_decay(FLAGS.learning_rate,
                                      global_step,
                                      decay_steps,
                                      FLAGS.learning_rate_decay_factor,
                                      staircase=True,
                                      name='exponential_decay_learning_rate')
  elif FLAGS.learning_rate_decay_type == 'fixed':
    return tf.constant(FLAGS.learning_rate, name='fixed_learning_rate')
  elif FLAGS.learning_rate_decay_type == 'polynomial':
    return tf.train.polynomial_decay(FLAGS.learning_rate,
                                     global_step,
                                     decay_steps,
                                     FLAGS.end_learning_rate,
                                     power=1.0,
                                     cycle=False,
                                     name='polynomial_decay_learning_rate')
  else:
    raise ValueError('learning_rate_decay_type [%s] was not recognized' %
                     FLAGS.learning_rate_decay_type)
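
For completeness, a hedged sketch of how the returned tensor is typically consumed, assuming TensorFlow 1.x; the FLAGS definitions, their values, and the toy loss below are illustrative assumptions added to make the snippet self-contained, not part of the original script:

import tensorflow as tf

# Minimal FLAGS so _configure_learning_rate above is callable; the values
# here are illustrative assumptions, not recommendations.
flags = tf.app.flags
flags.DEFINE_float('learning_rate', 0.1, 'Initial learning rate.')
flags.DEFINE_integer('batch_size', 32, 'Batch size.')
flags.DEFINE_float('num_epochs_per_decay', 2.0, 'Epochs between LR decays.')
flags.DEFINE_bool('sync_replicas', False, 'Use synchronous replicas.')
flags.DEFINE_integer('replicas_to_aggregate', 1, 'Replicas to aggregate.')
flags.DEFINE_string('learning_rate_decay_type', 'exponential',
                    '"fixed", "exponential", or "polynomial".')
flags.DEFINE_float('learning_rate_decay_factor', 0.94, 'LR decay factor.')
flags.DEFINE_float('end_learning_rate', 0.0001,
                   'Final LR for polynomial decay.')
FLAGS = flags.FLAGS

w = tf.Variable(5.0)
loss = tf.square(w)  # toy loss standing in for a real model

global_step = tf.train.get_or_create_global_step()
learning_rate = _configure_learning_rate(num_samples_per_epoch=50000,
                                         global_step=global_step)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# Passing global_step makes minimize() increment it after every update,
# which is what moves the decay schedule forward.
train_op = optimizer.minimize(loss, global_step=global_step)

The key detail is passing global_step to minimize(): the optimizer increments it on each update, and that counter is what advances whichever decay schedule _configure_learning_rate returned.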