minerva.optimizers.lars

Classes

LARS

Layer-wise Adaptive Rate Scaling (LARS) optimizer.

Module Contents

class minerva.optimizers.lars.LARS(params, lr, momentum=0.9, weight_decay=1e-06, eta=0.001, epsilon=1e-08, exclude_from_layer_adaptation=None)

Bases: torch.optim.optimizer.Optimizer

Layer-wise Adaptive Rate Scaling (LARS) optimizer.

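This optimizer implements the LARS algorithm, which scales the learning rate of each layer by a local factor based on the ratio of that layer's weight norm to its gradient norm, multiplied by the trust coefficient eta. Keeping each layer's update in proportion to the size of its weights helps stabilize training, particularly with large batch sizes, and allows larger base learning rates to be used.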
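As a concrete illustration of that ratio, the following pure-Python sketch computes a per-layer trust ratio using the formulation from the original LARS paper (You, Gitman, and Ginsburg, 2017, "Large Batch Training of Convolutional Networks"): local_lr = eta * ||w|| / (||g|| + weight_decay * ||w|| + epsilon). The helper names `l2_norm` and `trust_ratio`, the placement of `epsilon`, and the fallback to 1.0 when a norm is zero are assumptions; the exact expression used inside `minerva.optimizers.lars.LARS` may differ.

```python
import math
from typing import Sequence


def l2_norm(values: Sequence[float]) -> float:
    """Euclidean norm of a flat sequence of numbers (hypothetical helper)."""
    return math.sqrt(sum(v * v for v in values))


def trust_ratio(
    weights: Sequence[float],
    grads: Sequence[float],
    eta: float = 0.001,
    weight_decay: float = 1e-6,
    epsilon: float = 1e-8,
) -> float:
    """Layer-wise factor eta * ||w|| / (||g|| + weight_decay * ||w|| + epsilon).

    Hypothetical helper; assumes the formulation of You et al. (2017).
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    # Assumed convention: fall back to 1.0 when either norm is zero, so a
    # zero-initialised layer (or one with no gradient) is not frozen at 0.
    if w_norm == 0.0 or g_norm == 0.0:
        return 1.0
    return eta * w_norm / (g_norm + weight_decay * w_norm + epsilon)


# A layer with ||w|| = 5 and ||g|| = 1 gets a local factor close to eta * 5.
ratio = trust_ratio([3.0, 4.0], [0.6, 0.8])
print(f"trust ratio: {ratio:.6f}")
```

With the default eta of 0.001, that layer's effective learning rate is therefore roughly 0.005 times the base `lr`, regardless of the absolute scale of its gradient.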


Parameters

params : Any

Parameters to optimize.

lr : float

Base learning rate.

momentum : float, optional, default: 0.9

Momentum factor.

weight_decay : float, optional, default: 1e-6

Weight decay (L2 penalty) coefficient.

eta : float, optional, default: 0.001

Trust coefficient for layer-wise rate scaling.

epsilon : float, optional, default: 1e-8

Small constant for numerical stability.

exclude_from_layer_adaptation : Set[str], optional, default: None

Set of parameter names to exclude from layer-wise adaptation (e.g., batch normalization layers and biases).
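The following sketch shows how these arguments could combine in one update of a single parameter tensor, flattened to a list of floats: the weight-decay term is added to the gradient, the trust ratio rescales the step unless the parameter is excluded from layer adaptation, and momentum accumulates the rescaled step before it is subtracted from the weights. It follows the update rule of the original LARS paper (You et al., 2017); the function `lars_update`, its `excluded` flag, and the order of operations are hypothetical and not a transcription of this class's `step`.

```python
import math
from typing import List, Tuple


def lars_update(
    weights: List[float],
    grads: List[float],
    momentum_buf: List[float],
    lr: float,
    momentum: float = 0.9,
    weight_decay: float = 1e-6,
    eta: float = 0.001,
    epsilon: float = 1e-8,
    excluded: bool = False,
) -> Tuple[List[float], List[float]]:
    """Return (new_weights, new_momentum_buf) after one LARS step.

    Hypothetical helper for a single parameter tensor; assumes the update
    rule of You et al. (2017).
    """
    w_norm = math.sqrt(sum(w * w for w in weights))
    g_norm = math.sqrt(sum(g * g for g in grads))

    # Excluded parameters (e.g. biases, batch-norm weights) skip the trust
    # ratio; zero norms also fall back to 1.0 (an assumed convention).
    if excluded or w_norm == 0.0 or g_norm == 0.0:
        local_lr = 1.0
    else:
        local_lr = eta * w_norm / (g_norm + weight_decay * w_norm + epsilon)

    # L2 penalty folded into the gradient.
    update = [g + weight_decay * w for w, g in zip(weights, grads)]

    # Heavy-ball momentum on the layer-scaled step, then apply it.
    new_buf = [momentum * b + lr * local_lr * u for b, u in zip(momentum_buf, update)]
    new_weights = [w - b for w, b in zip(weights, new_buf)]
    return new_weights, new_buf


w, buf = lars_update([3.0, 4.0], [0.6, 0.8], [0.0, 0.0], lr=1.0, weight_decay=0.0)
print(w)  # step scaled by roughly eta * ||w|| / ||g|| = 0.005
```

Passing `excluded=True` reduces the step to SGD with momentum and L2 weight decay at the base learning rate, which is the behavior this sketch assumes for entries in `exclude_from_layer_adaptation`.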

Attributes

exclude_set : Set[str]

Set of parameter names excluded from layer-wise adaptation.

step(closure=None)

Performs a single optimization step.

Parameters

closure : callable, optional

A closure that reevaluates the model and returns the loss.

Returns

loss : torch.Tensor or None

Loss from the closure if provided, otherwise None.
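The `closure` argument follows the usual `torch.optim.Optimizer.step` convention: the optimizer calls the closure, the closure re-evaluates the model and returns the loss, and `step` hands that loss back to the caller. The pure-Python sketch below mimics that contract on the quadratic f(x) = x1^2 + x2^2 so it runs without PyTorch; `ToyLARS` is a hypothetical stand-in that treats the whole parameter list as one layer and reuses the assumed LARS update rule, not the real `LARS` class.

```python
import math
from typing import Callable, List, Optional


class ToyLARS:
    """Hypothetical stand-in mimicking LARS.step(closure=None) on a list of floats.

    The whole parameter list is treated as a single layer.
    """

    def __init__(self, params: List[float], lr: float, momentum: float = 0.9,
                 weight_decay: float = 1e-6, eta: float = 0.001,
                 epsilon: float = 1e-8) -> None:
        self.params = params
        self.grads = [0.0] * len(params)  # written by the closure
        self.buf = [0.0] * len(params)    # momentum buffer
        self.lr = lr
        self.momentum = momentum
        self.weight_decay = weight_decay
        self.eta = eta
        self.epsilon = epsilon

    def step(self, closure: Optional[Callable[[], float]] = None) -> Optional[float]:
        # Re-evaluate the loss (and refresh gradients) first, if a closure was given.
        loss = closure() if closure is not None else None

        w_norm = math.sqrt(sum(p * p for p in self.params))
        g_norm = math.sqrt(sum(g * g for g in self.grads))
        if w_norm == 0.0 or g_norm == 0.0:
            local_lr = 1.0
        else:
            local_lr = self.eta * w_norm / (
                g_norm + self.weight_decay * w_norm + self.epsilon)

        for i, (p, g) in enumerate(zip(self.params, self.grads)):
            d = g + self.weight_decay * p
            self.buf[i] = self.momentum * self.buf[i] + self.lr * local_lr * d
            self.params[i] = p - self.buf[i]
        return loss


# Minimise f(x) = x1^2 + x2^2 starting from (3, 4).
opt = ToyLARS([3.0, 4.0], lr=10.0)


def closure() -> float:
    # Recompute gradients in place (df/dx = 2x) and return the current loss.
    opt.grads[:] = [2.0 * x for x in opt.params]
    return sum(x * x for x in opt.params)


losses = [opt.step(closure) for _ in range(20)]
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

With the real optimizer, the closure would typically call `optimizer.zero_grad()`, compute the loss from the model's output, call `loss.backward()`, and return the loss.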

