## Optimizers
### AdamWConfig

Adam optimizer with decoupled weight decay.

| Parameter | Default | Description |
|---|---|---|
| lr | 3e-4 | Learning rate |
| weight_decay | 1e-2 | Weight decay coefficient (decoupled, applied directly to the weights) |
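
Below is a minimal sketch of how these fields could map onto torch.optim.AdamW. The dataclass shape and the build_adamw helper are assumptions made for illustration, not this library's actual API.

```python
from dataclasses import dataclass

import torch


@dataclass
class AdamWConfig:  # assumed shape; fields mirror the table above
    lr: float = 3e-4
    weight_decay: float = 1e-2


def build_adamw(params, cfg: AdamWConfig) -> torch.optim.AdamW:
    # AdamW applies weight decay directly to the weights (decoupled),
    # rather than folding it into the gradient as classic L2 regularization.
    return torch.optim.AdamW(params, lr=cfg.lr, weight_decay=cfg.weight_decay)


model = torch.nn.Linear(4, 2)
optimizer = build_adamw(model.parameters(), AdamWConfig())
```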
### SGDConfig

Stochastic Gradient Descent with momentum.

| Parameter | Default | Description |
|---|---|---|
| lr | 1e-3 | Learning rate |
| weight_decay | 1e-4 | L2 regularization strength |
| momentum | 0.9 | Momentum factor |
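
For comparison, the same defaults passed straight to torch.optim.SGD (the call below is illustrative; the config presumably forwards these values in a similar way):

```python
import torch

model = torch.nn.Linear(4, 2)
# Defaults from the table above; momentum keeps a running average of past
# gradients so parameter updates are smoothed across steps.
optimizer = torch.optim.SGD(
    model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4
)
```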
## Learning Rate Schedulers
### CosineDecayWithWarmupConfig

Linear warmup followed by cosine decay.

| Parameter | Default | Description |
|---|---|---|
| max_lr | 3e-4 | Peak learning rate (reached at the end of warmup) |
| min_lr | 3e-5 | Final learning rate (after decay) |
| warmup_iters | 200 | Iterations to ramp up to max_lr |
| decay_iters | 1000 | Iterations for cosine decay |
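
One common formulation of this kind of schedule is sketched below, using the table's defaults; the exact formula used by this config may differ (for example in how warmup starts or what happens after decay_iters):

```python
import math

max_lr, min_lr = 3e-4, 3e-5       # defaults from the table above
warmup_iters, decay_iters = 200, 1000


def lr_at(it: int) -> float:
    if it < warmup_iters:
        # Linear ramp up to max_lr over the first warmup_iters steps.
        return max_lr * (it + 1) / warmup_iters
    if it >= decay_iters:
        # After the decay window, hold at the floor.
        return min_lr
    # Cosine decay from max_lr down to min_lr between warmup_iters and decay_iters.
    progress = (it - warmup_iters) / (decay_iters - warmup_iters)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```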
### CosineAnnealingWarmRestartsConfig

Cosine annealing with periodic restarts.

| Parameter | Default | Description |
|---|---|---|
| T_0 | 500 | Length of the first cycle (iterations until the first restart) |
| T_mult | 1 | Cycle length multiplier (1 = every cycle has the same length) |
| eta_min | 1e-5 | Minimum learning rate |
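
The parameter names match torch.optim.lr_scheduler.CosineAnnealingWarmRestarts, so the config presumably forwards them directly; the snippet below assumes that and is only an illustration:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# With T_mult=1 every cycle is 500 steps long: the learning rate follows a
# cosine curve down to eta_min, then restarts at the optimizer's base lr.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=500, T_mult=1, eta_min=1e-5)

for step in range(2000):
    optimizer.step()   # gradient computation omitted for brevity
    scheduler.step()
```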
## Optimization Configs
For hyperparameter search, use the OptConfig variants: