minerva.optimizers.lars
Classes
LARS | Layer-wise Adaptive Rate Scaling (LARS) optimizer.
Module Contents
- class minerva.optimizers.lars.LARS(params, lr, momentum=0.9, weight_decay=1e-06, eta=0.001, epsilon=1e-08, exclude_from_layer_adaptation=None)
Bases: torch.optim.optimizer.Optimizer
Layer-wise Adaptive Rate Scaling (LARS) optimizer.
This optimizer implements the LARS algorithm, which scales the learning rate of each layer by a trust ratio derived from the ratio of that layer's weight norm to its gradient norm. Keeping each layer's update proportional to the magnitude of its weights helps stabilize training and allows larger learning rates, which is especially useful for large-batch training.
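For reference, the per-layer update from the original LARS paper (You, Gitman and Ginsburg, 2017) can be written in terms of this class's arguments: $\gamma$ is `lr`, $m$ is `momentum`, $\beta$ is `weight_decay`, $\eta$ is `eta` and $\epsilon$ is `epsilon`. This is a sketch of the published rule, not a transcription of this module's code; in particular, placing $\epsilon$ in the denominator is an assumption.

```latex
\begin{aligned}
g_t^{l} &= \nabla L(w_t^{l}) \\
\lambda^{l} &= \eta \, \frac{\lVert w_t^{l} \rVert}{\lVert g_t^{l} \rVert + \beta \lVert w_t^{l} \rVert + \epsilon} \\
v_{t+1}^{l} &= m \, v_t^{l} + \gamma \, \lambda^{l} \left( g_t^{l} + \beta \, w_t^{l} \right) \\
w_{t+1}^{l} &= w_t^{l} - v_{t+1}^{l}
\end{aligned}
```

Here $l$ indexes layers (parameter tensors) and $t$ the training step. Because $\gamma \lambda^{l} \lVert g_t^{l} \rVert \approx \gamma \eta \lVert w_t^{l} \rVert$ when weight decay is small, each layer's step size tracks its own weight norm rather than the raw gradient scale.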
Parameters
- params : Any
Parameters to optimize.
- lr : float
Base learning rate.
- momentum : float, optional, default: 0.9
Momentum factor.
- weight_decay : float, optional, default: 1e-6
Weight decay (L2 penalty) coefficient.
- eta : float, optional, default: 0.001
Trust coefficient that scales the layer-wise learning-rate ratio.
- epsilon : float, optional, default: 1e-8
Small constant for numerical stability.
- exclude_from_layer_adaptation : Set[str], optional, default: None
Set of parameter names to exclude from layer-wise adaptation (e.g., parameters of batch normalization layers and biases).
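The parameters above can be illustrated with a minimal, dependency-free sketch of one LARS step over named parameters. It assumes the update rule from the original LARS paper, treats each named list of floats as one layer, and assumes that names in the exclusion set skip the trust ratio (factor 1.0) while still receiving weight decay and momentum. The function `lars_step` and its arguments are hypothetical and are not minerva's API; the fallback to 1.0 for zero norms is a common convention, not something this page documents.

```python
import math
from typing import Dict, List, Optional, Set


def _norm(xs: List[float]) -> float:
    """Euclidean (L2) norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in xs))


def lars_step(
    params: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    velocity: Dict[str, List[float]],
    lr: float,
    momentum: float = 0.9,
    weight_decay: float = 1e-6,
    eta: float = 0.001,
    epsilon: float = 1e-8,
    exclude: Optional[Set[str]] = None,
) -> Dict[str, float]:
    """Hypothetical sketch of one LARS step (not minerva's implementation).

    Updates `params` and `velocity` in place and returns the local
    learning-rate factor (trust ratio) used for each named parameter.
    """
    exclude = exclude or set()
    ratios: Dict[str, float] = {}
    for name, w in params.items():
        g = grads[name]
        w_norm, g_norm = _norm(w), _norm(g)
        if name in exclude or w_norm == 0.0 or g_norm == 0.0:
            # Assumption: excluded or zero-norm parameters skip adaptation.
            trust = 1.0
        else:
            # Trust ratio from the LARS paper, epsilon added to the denominator.
            trust = eta * w_norm / (g_norm + weight_decay * w_norm + epsilon)
        ratios[name] = trust
        v = velocity.setdefault(name, [0.0] * len(w))
        for i in range(len(w)):
            # Momentum buffer update on the weight-decayed gradient.
            v[i] = momentum * v[i] + lr * trust * (g[i] + weight_decay * w[i])
            # Parameter update (in place).
            w[i] -= v[i]
    return ratios


# Usage: one "weight" layer is adapted, the "bias" is excluded from adaptation.
params = {"weight": [3.0, 4.0], "bias": [1.0]}
grads = {"weight": [0.6, 0.8], "bias": [0.5]}
velocity: Dict[str, List[float]] = {}
ratios = lars_step(params, grads, velocity, lr=0.1, weight_decay=0.0,
                   epsilon=0.0, exclude={"bias"})
print(ratios, params)
```

With these toy values the weight's factor is eta × ‖w‖ / ‖g‖ = 0.001 × 5 / 1 = 0.005, so its step is far smaller than the bias's, whose factor stays at 1.0.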
Attributes
- exclude_set : Set[str]
Set of parameter names excluded from layer-wise adaptation.
- Parameters:
params (Any)
lr (float)
momentum (float)
weight_decay (float)
eta (float)
epsilon (float)
exclude_from_layer_adaptation (Optional[Set[str]])