Cosine Annealing Schedule: A Classic Strategy for Smoothly Decaying the Learning Rate

FreeGuideOnline, last updated 2026-06-21

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR
import matplotlib.pyplot as plt

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = CosineAnnealingLR(optimizer, T_max=50, eta_min=0.001)

lrs = []
for epoch in range(100):
    lrs.append(optimizer.param_groups[0]['lr'])  # rate in effect for this epoch
    optimizer.step()   # stand-in for a real training step
    scheduler.step()   # past T_max the cosine continues, so the rate climbs back up

# Plot the learning-rate curve
plt.plot(lrs)
plt.xlabel('Epoch')
plt.ylabel('Learning Rate')
plt.title('Cosine Annealing Learning Rate Schedule')
plt.show()
```
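The descending half of that curve follows the closed form $\eta_t = \eta_{min} + \tfrac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos(\pi t / T_{max})\right)$. The following is a minimal pure-Python sketch of that formula, useful for sanity-checking values without PyTorch. The helper name `cosine_annealing_lr` and its defaults are assumptions chosen to mirror the example above; it is not part of any library.

```python
import math

def cosine_annealing_lr(t, eta_max=0.1, eta_min=0.001, t_max=50):
    """Closed-form cosine annealing value at epoch t (hypothetical helper)."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / t_max))

# Evaluate the schedule over one full descent, epochs 0 through t_max.
schedule = [cosine_annealing_lr(t) for t in range(51)]
print(schedule[0], schedule[25], schedule[50])
```

The rate starts at `eta_max`, passes the midpoint between the two bounds at `t_max / 2`, and reaches `eta_min` at `t_max`.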


For the warm-restart variant:

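```python
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=20, T_mult=2, eta_min=0)
```

  • T_0: the length of the first cycle
  • T_mult: each subsequent cycle is longer than the previous one by this factor (for example, with T_mult=2 the cycle lengths are 20, 40, 80, ...)
  • eta_min: the minimum learning rate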
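To see how these three parameters interact, here is a small pure-Python sketch of the warm-restart rule evaluated at whole epochs. It assumes the same cosine formula is applied within each cycle and that the peak rate is the optimizer's initial `lr=0.1`. The helper `warm_restart_lr` is hypothetical and only illustrates the arithmetic; it is not the PyTorch implementation.

```python
import math

def warm_restart_lr(epoch, eta_max=0.1, eta_min=0.0, t_0=20, t_mult=2):
    """Cosine annealing with warm restarts at an integer epoch (hypothetical helper)."""
    t_cur, t_i = epoch, t_0
    # Skip over completed cycles; each one is t_mult times longer than the last.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

# With cycle lengths 20, 40, 80 the restarts fall at epochs 20, 60 and 140.
for epoch in (0, 19, 20, 59, 60, 140):
    print(epoch, round(warm_restart_lr(epoch), 5))
```

At each restart epoch the rate jumps back to `eta_max`, and just before it the rate sits close to `eta_min`.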

Using TensorFlow/Keras

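```python
import tensorflow as tf

total_epochs = 100
steps_per_epoch = 500  # hypothetical value; use the number of batches per epoch in your data

# CosineDecay counts optimizer steps (batches), not epochs,
# so decay_steps has to cover the whole training run.
lr_schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=0.1,
    decay_steps=total_epochs * steps_per_epoch,
    alpha=0.001,  # eta_min / eta_max, so the floor here is 0.1 * 0.001 = 1e-4
)

optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)
```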
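Note that `alpha` is a fraction of the initial rate rather than an absolute learning rate, which is easy to mix up with PyTorch's `eta_min`. The sketch below reproduces the decay formula given in the Keras documentation in plain Python, assuming no warmup phase, and shows the conversion from an (`eta_max`, `eta_min`) pair. The function name `keras_cosine_decay` is hypothetical.

```python
import math

def keras_cosine_decay(step, initial_learning_rate, decay_steps, alpha):
    """Value of the CosineDecay formula at a given optimizer step (hypothetical helper)."""
    step = min(step, decay_steps)  # the rate stays at its floor after decay_steps
    cosine = 0.5 * (1 + math.cos(math.pi * step / decay_steps))
    return initial_learning_rate * ((1 - alpha) * cosine + alpha)

eta_max, eta_min = 0.1, 0.001
alpha = eta_min / eta_max  # fraction of the initial rate that remains at the end
final_lr = keras_cosine_decay(100, eta_max, 100, alpha)
print(alpha, final_lr)
```

With this conversion the final rate equals `eta_min` up to floating-point rounding, so a schedule ported from PyTorch keeps the same floor.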