minerva.models.finetune_adapters

Classes

LoRA

LoRA (Low-Rank Adaptation) for Linear Layers.

Module Contents

class minerva.models.finetune_adapters.LoRA(original_module, bias=True, alpha=1, r=4)[source]

Bases: torch.nn.Module

LoRA (Low-Rank Adaptation) for Linear Layers.

This module applies low-rank adaptation to an existing linear layer. LoRA enables fine-tuning of pre-trained models efficiently by introducing learnable low-rank matrices that adapt the weights of the original layer while keeping its parameters frozen.
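A minimal sketch of the intended workflow, for orientation. Whether the wrapper freezes the wrapped layer on its own is not stated here, so the sketch freezes it explicitly; the bookkeeping at the end only illustrates that the trainable parameters are reduced to the low-rank matrices:

>>> import torch.nn as nn
>>> from minerva.models.finetune_adapters import LoRA
>>> base = nn.Linear(768, 768)            # pre-trained layer to adapt
>>> _ = base.requires_grad_(False)        # freeze the original weights explicitly
>>> lora = LoRA(base, alpha=1, r=4)       # adds trainable matrix_A and matrix_B
>>> trainable = sum(p.numel() for p in lora.parameters() if p.requires_grad)
>>> frozen = sum(p.numel() for p in lora.original_module.parameters())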

Parameters

original_module : torch.nn.Module

The original linear or transformer layer (e.g., torch.nn.Linear) to which LoRA is applied. It must have in_features and out_features attributes.

bias : bool, optional

Whether to include a bias term in the LoRA adaptation layers. Default is True.

alpha : float, optional

The scaling factor for the LoRA output. Default is 1.

r : int, optional

The rank of the low-rank matrices used for adaptation. Default is 4.

Attributes

original_module : torch.nn.Module

The original module that LoRA adapts.

matrix_A : torch.nn.Linear

The low-rank matrix A with dimensions (in_features, r).

matrix_B : torch.nn.Linear

The low-rank matrix B with dimensions (r, out_features).

scaling : float

The scaling factor applied to the LoRA adaptation output.
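The docstring does not state how scaling is derived from alpha and r. Assuming the common LoRA convention scaling = alpha / r (an assumption, not confirmed above), the value scaling = 0.25 shown in the attribute listing below matches the defaults alpha=1, r=4, and the Examples' alpha=2, r=8 give the same value:

>>> alpha, r = 1, 4
>>> alpha / r   # assumed convention: scaling = alpha / r
0.25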

Methods

init_weights():

Initializes the weights of the low-rank matrices A and B. Matrix A is initialized using Kaiming uniform initialization, and matrix B is initialized with zeros.

forward(x):

Computes the forward pass through the adapted module.

Examples

>>> import torch
>>> import torch.nn as nn
>>> from minerva.models.finetune_adapters import LoRA
>>> # Original linear layer
>>> original_layer = nn.Linear(128, 64)
>>> # Wrap the original layer with LoRA
>>> lora_layer = LoRA(original_layer, alpha=2, r=8)
>>> # Input tensor
>>> x = torch.randn(16, 128)  # batch size of 16
>>> # Forward pass
>>> output = lora_layer(x)
>>> print(output.shape)
torch.Size([16, 64])


forward(x)[source]

Forward pass of the LoRA module.

Computes the output as the sum of the original module’s output and the low-rank adaptation output, scaled by the specified scaling factor.

Parameters

x : torch.Tensor

The input tensor with shape (batch_size, in_features).

Returns

torch.Tensor

The output tensor with shape (batch_size, out_features).

Notes

The output is computed as

    y = original_module(x) + scaling * B(A(x)),

where A and B are the learnable low-rank matrices.
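Expressed as code, the behaviour described by this formula corresponds roughly to the sketch below (illustrative only, not the module's actual source):

def forward(self, x):
    # Frozen, pre-trained path.
    original_out = self.original_module(x)
    # Low-rank path: project down to rank r with matrix_A, back up with matrix_B.
    lora_out = self.matrix_B(self.matrix_A(x))
    # Add the scaled adaptation to the original output.
    return original_out + self.scaling * lora_out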

Parameters:

x (torch.Tensor)

init_weights()[source]

Initialize weights for the low-rank matrices.

Matrix A is initialized with Kaiming uniform initialization, which is suitable for layers with ReLU activations. Matrix B is initialized with zeros to ensure that the original module’s behavior is not perturbed at the start.
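A sketch of what such an initialization could look like; the exact Kaiming arguments and any bias handling are assumptions rather than details taken from the source:

import torch.nn as nn

def init_weights(self):
    # Kaiming-uniform initialization for A, as for a ReLU-fed layer
    # (the precise arguments used by the real method are not documented here).
    nn.init.kaiming_uniform_(self.matrix_A.weight, nonlinearity="relu")
    # B starts at zero, so B(A(x)) contributes nothing and the wrapped
    # module initially behaves exactly like the original layer.
    nn.init.zeros_(self.matrix_B.weight)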

matrix_A
matrix_B
original_module
scaling = 0.25
Parameters:
  • original_module (torch.nn.Module)

  • bias (bool)

  • alpha (int)

  • r (int)