minerva.models.finetune_adapters
Classes
LoRA: LoRA (Low-Rank Adaptation) for Linear Layers.
Module Contents
- class minerva.models.finetune_adapters.LoRA(original_module, bias=True, alpha=1, r=4)[source]
Bases: torch.nn.Module
LoRA (Low-Rank Adaptation) for Linear Layers.
This module applies low-rank adaptation to an existing linear layer. LoRA enables fine-tuning of pre-trained models efficiently by introducing learnable low-rank matrices that adapt the weights of the original layer while keeping its parameters frozen.
Parameters
- original_module : torch.nn.Module
The original linear or transformer layer (e.g., torch.nn.Linear) to which LoRA is applied. It must have in_features and out_features attributes.
- bias : bool, optional
Whether to include a bias term in the LoRA adaptation layers. Default is True.
- alpha : float, optional
The scaling factor for the LoRA output. Default is 1.
- r : int, optional
The rank of the low-rank matrices used for adaptation. Default is 4.
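As a quick check of how alpha and r interact, the default configuration yields the scaling value 0.25 shown in the attribute list further below, consistent with the common LoRA convention scaling = alpha / r (an assumption; the exact rule is not stated in this docstring):
>>> import torch.nn as nn
>>> from minerva.models.finetune_adapters import LoRA
>>> LoRA(nn.Linear(128, 64)).scaling  # defaults: alpha=1, r=4
0.25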
Attributes
- original_module : torch.nn.Module
The original module that LoRA adapts.
- matrix_A : torch.nn.Linear
The low-rank matrix A with dimensions (in_features, r).
- matrix_B : torch.nn.Linear
The low-rank matrix B with dimensions (r, out_features).
- scaling : float
The scaling factor applied to the LoRA adaptation output.
Methods
- init_weights():
Initializes the weights of the low-rank matrices A and B. Matrix A is initialized using Kaiming uniform initialization, and matrix B is initialized with zeros.
- forward(x):
Computes the forward pass through the adapted module.
Examples
>>> import torch
>>> import torch.nn as nn
>>> from minerva.models.finetune_adapters import LoRA
>>> # Original linear layer
>>> original_layer = nn.Linear(128, 64)
>>> # Wrap the original layer with LoRA
>>> lora_layer = LoRA(original_layer, alpha=2, r=8)
>>> # Input tensor
>>> x = torch.randn(16, 128)  # batch size of 16
>>> # Forward pass
>>> output = lora_layer(x)
>>> print(output.shape)
torch.Size([16, 64])
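Since the docstring states that the original layer's parameters stay frozen during fine-tuning, a typical training setup (a sketch; whether LoRA freezes them itself is not documented here) disables their gradients explicitly and optimizes only the LoRA parameters:
>>> # Freeze the wrapped layer; train only the LoRA matrices.
>>> for p in lora_layer.original_module.parameters():
...     _ = p.requires_grad_(False)
>>> optimizer = torch.optim.AdamW(
...     (p for p in lora_layer.parameters() if p.requires_grad), lr=1e-3
... )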
- forward(x)[source]
Forward pass of the LoRA module.
Computes the output as the original module's output plus the low-rank adaptation output, with the adaptation term scaled by the scaling factor.
Parameters
- x : torch.Tensor
The input tensor with shape (batch_size, in_features).
Returns
- torch.Tensor
The output tensor with shape (batch_size, out_features).
Notes
The output is computed as

.. math::

   y = \text{original\_module}(x) + \text{scaling} \cdot B(A(x)),

where A and B are the learnable low-rank matrices.
- Parameters:
x (torch.Tensor)
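The snippet below reuses lora_layer and x from the Examples above. It is a sketch showing that the forward pass matches the formula in the Notes, assuming the documented attribute names:
>>> y = lora_layer.original_module(x) + lora_layer.scaling * lora_layer.matrix_B(lora_layer.matrix_A(x))
>>> torch.allclose(y, lora_layer(x))
True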
- init_weights()[source]
Initialize weights for the low-rank matrices.
Matrix A is initialized with Kaiming uniform initialization, which is suitable for layers with ReLU activations. Matrix B is initialized with zeros to ensure that the original module’s behavior is not perturbed at the start.
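For illustration, this scheme can be reproduced with torch.nn.init on hypothetical stand-ins for matrix_A and matrix_B (a sketch; the layer sizes and bias handling are assumptions, not taken from the actual implementation):
>>> import torch
>>> import torch.nn as nn
>>> A = nn.Linear(128, 4, bias=False)  # hypothetical stand-in for matrix_A (r=4)
>>> B = nn.Linear(4, 64, bias=False)   # hypothetical stand-in for matrix_B
>>> _ = nn.init.kaiming_uniform_(A.weight)  # Kaiming uniform for A
>>> _ = nn.init.zeros_(B.weight)            # zeros for B
>>> torch.all(B(A(torch.randn(16, 128))) == 0)  # the adaptation starts as a no-op
tensor(True)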
- matrix_A
- matrix_B
- original_module
- scaling = 0.25
- Parameters:
original_module (torch.nn.Module)
bias (bool)
alpha (int)
r (int)