minerva.models.nets.image.vit¶
Attributes¶
Classes¶
Conv2dReLU | A sequential container.
DecoderBlock | Base class for all neural network modules.
DecoderCup | Base class for all neural network modules.
MLAHead | Base class for all neural network modules.
MMAdaptivePadding | Base class for all neural network modules.
MMFFN | Base class for all neural network modules.
MMMultiheadAttention | Base class for all neural network modules.
MMPatchEmbed | Base class for all neural network modules.
MMTransformerEncoderLayer | Base class for all neural network modules.
MaskedAutoencoderViT | Masked Autoencoder with VisionTransformer backbone.
SFM_BasePatch16_Downstream | A modular Lightning model wrapper for supervised learning tasks.
SegmentationHead | A sequential container.
SetrVitBackbone | Base class for all neural network modules.
VIT_MLAHead | Vision Transformer with support for patch or hybrid CNN input stage
VisionTransformer | Vision Transformer with support for global average pooling
Functions¶
Module Contents¶
- class minerva.models.nets.image.vit.Conv2dReLU(in_channels, out_channels, kernel_size, padding=0, stride=1, use_batchnorm=True)[source]¶
Bases: torch.nn.Sequential
A sequential container.
Modules will be added to it in the order they are passed in the constructor. Alternatively, an OrderedDict of modules can be passed in. The forward() method of Sequential accepts any input and forwards it to the first module it contains. It then “chains” outputs to inputs sequentially for each subsequent module, finally returning the output of the last module.
The value a Sequential provides over manually calling a sequence of modules is that it allows treating the whole container as a single module, such that performing a transformation on the Sequential applies to each of the modules it stores (which are each a registered submodule of the Sequential).
What’s the difference between a Sequential and a torch.nn.ModuleList? A ModuleList is exactly what it sounds like: a list for storing Modules! On the other hand, the layers in a Sequential are connected in a cascading way.
Example:
    from collections import OrderedDict

    import torch.nn as nn

    # Using Sequential to create a small model. When `model` is run,
    # input will first be passed to `Conv2d(1,20,5)`. The output of
    # `Conv2d(1,20,5)` will be used as the input to the first
    # `ReLU`; the output of the first `ReLU` will become the input
    # for `Conv2d(20,64,5)`. Finally, the output of
    # `Conv2d(20,64,5)` will be used as input to the second `ReLU`
    model = nn.Sequential(
        nn.Conv2d(1, 20, 5), nn.ReLU(), nn.Conv2d(20, 64, 5), nn.ReLU()
    )

    # Using Sequential with OrderedDict. This is functionally the
    # same as the above code
    model = nn.Sequential(
        OrderedDict(
            [
                ("conv1", nn.Conv2d(1, 20, 5)),
                ("relu1", nn.ReLU()),
                ("conv2", nn.Conv2d(20, 64, 5)),
                ("relu2", nn.ReLU()),
            ]
        )
    )
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- class minerva.models.nets.image.vit.DecoderBlock(in_channels, out_channels, skip_channels=0, use_batchnorm=True)[source]¶
Bases: torch.nn.Module
Base class for all neural network modules.
Your models should also subclass this class.
Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:
    import torch.nn as nn
    import torch.nn.functional as F

    class Model(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.conv1 = nn.Conv2d(1, 20, 5)
            self.conv2 = nn.Conv2d(20, 20, 5)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            return F.relu(self.conv2(x))
Submodules assigned in this way will be registered, and will also have their parameters converted when you call to(), etc.
Note
As per the example above, an __init__() call to the parent class must be made before assignment on the child.
- Variables:
training (bool) – Boolean represents whether this module is in training or evaluation mode.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- conv1¶
- conv2¶
- up¶
- class minerva.models.nets.image.vit.DecoderCup[source]¶
Bases: torch.nn.Module
- blocks¶
- conv_feature1¶
- conv_feature2¶
- conv_feature3¶
- conv_feature4¶
- conv_more¶
- up2¶
- up3¶
- up4¶
- class minerva.models.nets.image.vit.MLAHead(mla_channels=256, mlahead_channels=128, norm_cfg=None)[source]¶
Bases: torch.nn.Module
- head2¶
- head3¶
- head4¶
- head5¶
- class minerva.models.nets.image.vit.MMAdaptivePadding(kernel_size, stride, dilation, padding='corner')[source]¶
Bases: torch.nn.Module
- Parameters:
kernel_size (Tuple[int, int])
stride (Tuple[int, int])
dilation (Tuple[int, int])
padding (str)
with a convolutional layer using a given kernel size, stride, and dilation.
Parameters¶
- kernel_size (Tuple[int, int])
Size of the convolution kernel.
- stride (Tuple[int, int])
Stride of the convolution.
- dilation (Tuple[int, int])
Dilation rate of the convolution.
- padding (str, default="corner")
Padding mode. Options are "same" or "corner".
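The padding these parameters imply can be illustrated with a small sketch. It assumes the convention used by mmcv-style adaptive padding layers: pad just enough for the kernel to cover the whole input, with "corner" padding only the bottom and right edges and "same" splitting the padding across both sides. The helpers adaptive_pad_amounts and split_padding are hypothetical names for illustration and are not part of minerva; the actual class may behave differently.

```python
import math


def adaptive_pad_amounts(input_hw, kernel_size, stride, dilation):
    """Total padding (pad_h, pad_w) so the sliding window covers the whole input.

    Assumed convention (not confirmed by the minerva source).
    """
    pads = []
    for size, k, s, d in zip(input_hw, kernel_size, stride, dilation):
        out = math.ceil(size / s)                 # output positions wanted
        needed = (out - 1) * s + (k - 1) * d + 1  # input extent they cover
        pads.append(max(needed - size, 0))
    return tuple(pads)


def split_padding(pad, mode="corner"):
    """Return (before, after) padding along one axis."""
    if mode == "corner":
        return (0, pad)                    # everything on the bottom/right
    if mode == "same":
        return (pad // 2, pad - pad // 2)  # spread over both sides
    raise ValueError(f"unknown padding mode: {mode!r}")


# A 30x45 input with a 16x16 kernel and stride 16 needs 2 rows and 3 columns.
pad_h, pad_w = adaptive_pad_amounts((30, 45), (16, 16), (16, 16), (1, 1))
```

Under these assumptions, an input whose sides are already multiples of the stride needs no padding at all.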
- dilation¶
- kernel_size¶
- padding = 'corner'¶
- stride¶
- class minerva.models.nets.image.vit.MMFFN(embed_dims, feedforward_channels, dropout_type, dropout_params, act_type, act_params, num_fcs, ffn_drop)[source]¶
Bases: torch.nn.Module
- Parameters:
embed_dims (int)
feedforward_channels (int)
dropout_type (type)
dropout_params (Optional[dict])
act_type (type)
act_params (Optional[dict])
num_fcs (int)
ffn_drop (float)
Feed-forward network used within the Transformer encoder layer.
Parameters¶
- embed_dims (int)
Dimensionality of the token embeddings.
- feedforward_channels (int)
Number of hidden units in the feed-forward layer.
- dropout_type (type)
Dropout module class (e.g., nn.Dropout, DropPath).
- dropout_params (Optional[dict])
Parameters for the dropout layer.
- act_type (type)
Activation function class (e.g., nn.GELU).
- act_params (Optional[dict])
Parameters for the activation function.
- num_fcs (int)
Number of fully-connected layers. Only supports 2.
- ffn_drop (float)
Dropout rate applied after each FC layer.
- activate¶
- dropout_layer¶
- layers¶
- class minerva.models.nets.image.vit.MMMultiheadAttention(embed_dims, num_heads, attn_drop, proj_drop, batch_first, bias)[source]¶
Bases: torch.nn.Module
- Parameters:
embed_dims (int)
num_heads (int)
attn_drop (float)
proj_drop (float)
batch_first (bool)
bias (bool)
Parameters¶
- embed_dims (int)
Dimensionality of each token embedding.
- num_heads (int)
Number of attention heads.
- attn_drop (float)
Dropout rate for attention weights.
- proj_drop (float)
Dropout rate for output projection.
- batch_first (bool)
Whether the input is in (B, L, C) format.
- bias (bool)
If True, add bias terms to the query, key, and value projections.
- attn¶
- batch_first¶
- dropout_layer¶
- proj_drop¶
- class minerva.models.nets.image.vit.MMPatchEmbed(in_channels, embed_dims, patch_size, stride, dilation, bias, norm_type, norm_params, patch_norm, padding_type='corner')[source]¶
Bases: torch.nn.Module
- Parameters:
in_channels (int)
embed_dims (int)
patch_size (int)
stride (Optional[int])
dilation (int)
bias (bool)
norm_type (Optional[type])
norm_params (Optional[dict])
patch_norm (bool)
padding_type (str)
Parameters¶
- in_channels (int)
Number of input image channels.
- embed_dims (int)
Dimensionality of the output patch embeddings.
- patch_size (int)
Size of the square patches.
- stride (Optional[int])
Stride for the convolution. If None, defaults to patch size.
- dilation (int)
Dilation applied to the convolution.
- bias (bool)
Whether to include a bias term in the projection.
- norm_type (Optional[type])
Normalization layer class (e.g., nn.LayerNorm).
- norm_params (Optional[dict])
Parameters to initialize the normalization layer.
- patch_norm (bool)
Whether to apply normalization after patch embedding.
- padding_type (str, default="corner")
Padding strategy for adaptive padding.
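How patch_size, stride, and dilation determine the number of tokens can be sketched with the standard convolution output-size formula. This is an illustrative calculation only: patch_grid is a hypothetical helper, not a minerva function, and it assumes the input has already been padded (or needs no padding) so that no extra padding term appears.

```python
def patch_grid(img_hw, patch_size, stride=None, dilation=1):
    """Return the (rows, cols) token grid of a patch-embedding convolution."""
    # As documented above, a stride of None falls back to the patch size.
    stride = patch_size if stride is None else stride
    grid = []
    for size in img_hw:
        # Convolution output size with zero padding.
        grid.append((size - dilation * (patch_size - 1) - 1) // stride + 1)
    return tuple(grid)


rows, cols = patch_grid((512, 512), patch_size=16)
num_tokens = rows * cols  # sequence length fed to the transformer layers
```

For a 512 x 512 image and 16 x 16 patches this gives a 32 x 32 grid, so the sequence holds 1024 patch tokens before any class token is added.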
- adapt_padding¶
- projection¶
- class minerva.models.nets.image.vit.MMTransformerEncoderLayer(embed_dims, num_heads, feedforward_channels, drop_rate, attn_drop_rate, drop_path_rate, num_fcs, qkv_bias, act_type, act_params, dropout_type, dropout_params, norm_type, norm_params, batch_first, with_cp)[source]¶
Bases: torch.nn.Module
- Parameters:
embed_dims (int)
num_heads (int)
feedforward_channels (int)
drop_rate (float)
attn_drop_rate (float)
drop_path_rate (float)
num_fcs (int)
qkv_bias (bool)
act_type (type)
act_params (Optional[dict])
dropout_type (type)
dropout_params (Optional[dict])
norm_type (type)
norm_params (Optional[dict])
batch_first (bool)
with_cp (bool)
Transformer encoder block consisting of multi-head attention and FFN.
Parameters¶
- embed_dims (int)
Token embedding dimension.
- num_heads (int)
Number of attention heads.
- feedforward_channels (int)
Hidden dimension in the FFN.
- drop_rate (float)
Dropout rate after attention and FFN.
- attn_drop_rate (float)
Dropout rate for attention weights.
- drop_path_rate (float)
Stochastic depth drop path rate.
- num_fcs (int)
Number of FC layers in FFN. Must be 2.
- qkv_bias (bool)
Whether to use bias in QKV projections.
- act_type (type)
Activation function type.
- act_params (Optional[dict])
Activation function parameters.
- dropout_type (type)
Dropout class (e.g., nn.Dropout, DropPath).
- dropout_params (Optional[dict])
Dropout parameters.
- norm_type (type)
Normalization layer type.
- norm_params (Optional[dict])
Parameters for normalization layers.
- batch_first (bool)
Whether input has shape (B, L, C).
- with_cp (bool)
Whether to use checkpointing for memory savings.
- attn¶
- ffn¶
- ln1¶
- ln2¶
- with_cp¶
- class minerva.models.nets.image.vit.MaskedAutoencoderViT(img_size=224, patch_size=16, in_chans=1, embed_dim=1024, depth=24, num_heads=16, decoder_embed_dim=512, decoder_depth=8, decoder_num_heads=16, mlp_ratio=4.0, norm_layer=nn.LayerNorm, norm_pix_loss=False)[source]¶
Bases: lightning.LightningModule
Masked Autoencoder with VisionTransformer backbone.
- Args:
img_size (int): Size of input image.
patch_size (int): Size of image patch.
in_chans (int): Number of input channels.
embed_dim (int): Dimension of token embeddings.
depth (int): Number of transformer blocks.
num_heads (int): Number of attention heads.
decoder_embed_dim (int): Dimension of decoder embeddings.
decoder_depth (int): Number of decoder transformer blocks.
decoder_num_heads (int): Number of decoder attention heads.
mlp_ratio (float): Ratio of MLP hidden layer size to embedding size.
norm_layer (torch.nn.LayerNorm): Normalization layer.
norm_pix_loss (bool): Whether to normalize pixel loss.
- References:
- blocks¶
- cls_token¶
- decoder_blocks¶
- decoder_embed¶
- decoder_norm¶
- decoder_pos_embed¶
- decoder_pred¶
- forward(imgs, mask_ratio=0.75)[source]¶
Forward pass.
- Args:
imgs (torch.Tensor): Input images of shape (N, C, H, W).
mask_ratio (float): Ratio of values to mask.
- Returns:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: Loss value, predicted output, binary mask.
- forward_decoder(x, ids_restore)[source]¶
Forward pass through the decoder.
- Args:
x (torch.Tensor): Input tensor of shape (N, L, D).
ids_restore (torch.Tensor): Indices to restore the original order of patches.
- Returns:
torch.Tensor: Decoded output tensor of shape (N, L, patch_size^2 * in_chans).
- forward_encoder(x, mask_ratio)[source]¶
Forward pass through the encoder.
- Args:
x (torch.Tensor): Input tensor of shape (N, C, H, W).
mask_ratio (float): Ratio of values to mask.
- Returns:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: Encoded representation, binary mask, shuffled indices.
- forward_loss(imgs, pred, mask)[source]¶
Calculate the loss.
- Args:
imgs (torch.Tensor): Input images of shape (N, C, H, W).
pred (torch.Tensor): Predicted output of shape (N, L, patch_size^2 * in_chans).
mask (torch.Tensor): Binary mask of shape (N, L).
- Returns:
torch.Tensor: Computed loss value.
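A plain-Python sketch of a masked reconstruction loss of this kind is given here for a single image. It assumes the formulation commonly used for masked autoencoders: per-patch mean squared error, averaged over masked patches only, with optional per-patch normalization of the target when norm_pix_loss is set. The function masked_patch_mse is hypothetical and is not the minerva implementation, which operates on batched tensors.

```python
def masked_patch_mse(target, pred, mask, norm_pix_loss=False, eps=1e-6):
    """Mean squared error over masked patches of one image.

    target, pred: lists of patches, each a list of pixel values.
    mask: list with 1 for masked (reconstructed) patches and 0 for visible ones.
    Assumed formulation, for illustration only.
    """
    total, n_masked = 0.0, 0
    for t, p, m in zip(target, pred, mask):
        if norm_pix_loss:
            # Normalize each target patch by its own mean and variance.
            mean = sum(t) / len(t)
            var = sum((v - mean) ** 2 for v in t) / (len(t) - 1)
            t = [(v - mean) / (var + eps) ** 0.5 for v in t]
        patch_error = sum((a - b) ** 2 for a, b in zip(p, t)) / len(t)
        total += patch_error * m  # visible patches contribute nothing
        n_masked += m
    return total / n_masked


target = [[0.0, 0.0], [1.0, 1.0]]
pred = [[1.0, 1.0], [1.0, 3.0]]
mask = [0, 1]  # only the second patch was masked
loss = masked_patch_mse(target, pred, mask)
```

In this example the first patch is visible, so its error is ignored and the loss is the error of the second patch alone.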
- in_chans = 1¶
- mask_token¶
- norm¶
- norm_pix_loss = False¶
- patch_embed¶
- patchify(imgs)[source]¶
Extract patches from input images.
- Args:
imgs (torch.Tensor): Input images of shape (N, C, H, W).
- Returns:
torch.Tensor: Patches of shape (N, num_patches, patch_size^2 * in_chans).
- pos_embed¶
- random_masking(x, mask_ratio)[source]¶
Perform per-sample random masking by per-sample shuffling.
- Args:
x (torch.Tensor): Input tensor of shape (N, L, D).
mask_ratio (float): Ratio of values to mask.
- Returns:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: Masked input, binary mask, shuffled indices.
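Masking by shuffling can be sketched in plain Python for a single sample. The sketch assumes the usual masked-autoencoder recipe: draw one random score per patch, sort the scores, keep the patches with the lowest scores, and record the permutation needed to restore the original order. The function below is a hypothetical stand-alone helper that works on patch indices rather than tensors; it is not the minerva method.

```python
import random


def random_masking(num_patches, mask_ratio, rng):
    """Return (ids_keep, mask, ids_restore) for one sample.

    mask[i] is 0 for kept patches and 1 for masked ones, in original order.
    """
    len_keep = int(num_patches * (1 - mask_ratio))
    noise = [rng.random() for _ in range(num_patches)]
    # argsort of the noise: patches with the smallest scores come first.
    ids_shuffle = sorted(range(num_patches), key=noise.__getitem__)
    # argsort of the shuffle: position of each original patch after shuffling.
    ids_restore = sorted(range(num_patches), key=ids_shuffle.__getitem__)
    ids_keep = ids_shuffle[:len_keep]
    # Build the mask in shuffled order, then undo the shuffle.
    shuffled_mask = [0] * len_keep + [1] * (num_patches - len_keep)
    mask = [shuffled_mask[i] for i in ids_restore]
    return ids_keep, mask, ids_restore


ids_keep, mask, ids_restore = random_masking(16, 0.75, random.Random(0))
```

With 16 patches and a mask ratio of 0.75, four patches are kept and twelve are masked; because each sample draws its own scores, every sample in a batch gets a different mask.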
- training_step(batch, batch_idx)[source]¶
Training step.
- Args:
batch (Tuple[torch.Tensor]): Input batch of images and corresponding labels.
batch_idx (int): Index of the current batch.
- Returns:
Dict[str, torch.Tensor]: Dictionary containing the loss value for the current step.
- class minerva.models.nets.image.vit.SFM_BasePatch16_Downstream(img_size=(512, 512), num_classes=6, in_chans=1, loss_fn=None, learning_rate=0.001, **kwargs)[source]¶
Bases: minerva.models.nets.base.SimpleSupervisedModel
A modular Lightning model wrapper for supervised learning tasks.
This class enables the construction of supervised models by combining a backbone (feature extractor), an optional adapter, and a fully connected (FC) head. It provides a clean interface for setting up custom training, validation, and testing pipelines with pluggable loss functions, metrics, optimizers, and learning rate schedulers.
The architecture is structured as follows:
Backbone Model
v
Adapter (Optional)
- (Flatten if needed)
v
Fully Connected Head
v
Loss Function
Each training and validation step proceeds as follows:
1. Forward pass input through the backbone.
2. Pass through adapter (if provided).
3. Flatten the output (if flatten is True) before the FC head.
4. Forward through the FC head.
5. Compute loss with respect to targets.
6. Backpropagate and update parameters.
7. Compute metrics and log them.
8. Return loss.
train_loss, val_loss, and test_loss are always logged, along with any additional metrics specified in the train_metrics, val_metrics, and test_metrics dictionaries.
This wrapper is especially useful to quickly set up supervised models for various tasks, such as image classification, object detection, and segmentation. It is designed to be flexible and extensible, allowing users to easily swap out components like the backbone, adapter, and FC head as needed. The model is built with a focus on simplicity and modularity, making it easy to adapt to different use cases and requirements. The model is designed to be used with PyTorch Lightning and is compatible with its training loop.
Note: More complex architectures that do not follow the above structure should not inherit from this class.
Note: Input batches must be tuples (input_tensor, target_tensor).
Create an SFM model with a ViT base backbone. The ViT-Base-16 backbone has the following configuration:
- Patch size: 16
- Embedding dimension: 768
- Depth: 12
- Number of heads: 12
Parameters¶
- img_size (Union[int, Tuple[int, …]])
Size of the input image. Note that, to use the default pre-trained SFM model, the size should be (512, 512).
- num_classes (int)
Number of classes for segmentation head. Default is 6.
- in_chans (int)
Number of input channels. Default is 1.
- loss_fn (Optional[torch.nn.Module], optional)
Loss function, by default None
- learning_rate (float, optional)
Learning rate value, by default 1e-3
- _single_step(batch, batch_idx, step_name)[source]¶
Perform a single train/validation/test step. It consists of making a forward pass with the input data on the backbone model, computing the loss between the output and the input data, and logging the loss.
Parameters¶
- batch (torch.Tensor)
The input data. It must be a 2-element tuple of tensors, where the first tensor is the input data and the second tensor is the mask.
- batch_idx (int)
The index of the batch.
- step_name (str)
The name of the step. It will be used to log the loss. The possible values are: "train", "val" and "test". The loss will be logged as "{step_name}_loss".
Returns¶
- torch.Tensor
A tensor with the loss value.
- Parameters:
batch (torch.Tensor)
batch_idx (int)
step_name (str)
- Return type:
torch.Tensor
- predict_step(batch, batch_idx, dataloader_idx=0)[source]¶
Step function called during predict(). By default, it calls forward(). Override to add any processing logic.
The predict_step() is used to scale inference on multi-devices.
To prevent an OOM error, it is possible to use BasePredictionWriter callback to write the predictions to disk or database after each batch or on epoch end.
The BasePredictionWriter should be used while using a spawn based accelerator. This happens for Trainer(strategy="ddp_spawn") or training on 8 TPU cores with Trainer(accelerator="tpu", devices=8) as predictions won’t be returned.
- Args:
batch: The output of your data iterable, normally a DataLoader.
batch_idx: The index of this batch.
dataloader_idx: The index of the dataloader that produced this batch (only if multiple dataloaders used).
- Return:
Predicted output (optional).
Example
    class MyModel(LightningModule):
        def predict_step(self, batch, batch_idx, dataloader_idx=0):
            return self(batch)

    dm = ...
    model = MyModel()
    trainer = Trainer(accelerator="gpu", devices=2)
    predictions = trainer.predict(model, dm)
- Parameters:
batch (torch.Tensor)
batch_idx (int)
dataloader_idx (int)
- Return type:
torch.Tensor
- Parameters:
img_size (Union[int, Tuple[int, Ellipsis]])
num_classes (int)
in_chans (int)
loss_fn (Optional[torch.nn.Module])
learning_rate (float)
- class minerva.models.nets.image.vit.SegmentationHead(in_channels, out_channels, kernel_size=3, upsampling=1)[source]¶
Bases: torch.nn.Sequential
- class minerva.models.nets.image.vit.SetrVitBackbone(original_resolution, img_size, patch_size, embed_dims, interpolate_mode, in_channels, patch_norm, stride, dilatation, bias, norm_type, norm_params, padding_type, num_layers, num_heads, out_indices, drop_rate, with_cls_token, mlp_ratio, attn_drop_rate, drop_path_rate, num_fcs, qkv_bias, output_cls_token, act_type, act_params, with_cp, dropout_type, dropout_params, batch_first=True)[source]¶
Bases: torch.nn.Module
- Parameters:
original_resolution (Optional[tuple])
img_size (tuple)
patch_size (int)
embed_dims (int)
interpolate_mode (str)
in_channels (int)
patch_norm (bool)
stride (Optional[int])
dilatation (int)
bias (bool)
norm_type (type)
norm_params (Optional[dict])
padding_type (str)
num_layers (int)
num_heads (int)
out_indices (Union[int, List[int], Tuple[int, Ellipsis]])
drop_rate (float)
with_cls_token (bool)
mlp_ratio (int)
attn_drop_rate (float)
drop_path_rate (float)
num_fcs (int)
qkv_bias (bool)
output_cls_token (bool)
act_type (type)
act_params (dict)
with_cp (bool)
dropout_type (type)
dropout_params (Optional[dict])
batch_first (bool)
Parameters¶
- original_resolution (Optional[tuple])
Original training image resolution (used for interpolating positional embeddings).
- img_size (tuple)
Target image size (H, W).
- patch_size (int)
Size of square patches.
- embed_dims (int)
Dimensionality of patch embeddings.
- interpolate_mode (str)
Interpolation method for resizing positional embeddings.
- in_channels (int)
Number of input channels.
- patch_norm (bool)
Whether to apply normalization after patch embedding.
- stride (Optional[int])
Convolution stride for patch embedding.
- dilatation (int)
Dilation factor for convolution.
- bias (bool)
Whether to use bias in convolution.
- norm_type (type)
Normalization layer class.
- norm_params (Optional[dict])
Parameters for normalization layers.
- padding_type (str)
Padding type for adaptive padding ("same" or "corner").
- num_layers (int)
Number of transformer encoder layers.
- num_heads (int)
Number of attention heads.
- out_indices (Union[int, List[int], Tuple[int, …]])
Indices of layers whose outputs are returned.
- drop_rate (float)
Dropout rate after positional encoding.
- with_cls_token (bool)
Whether to use a class token in the encoder.
- mlp_ratio (int)
Expansion ratio for the hidden layer in FFN.
- attn_drop_rate (float)
Dropout rate in attention.
- drop_path_rate (float)
Stochastic depth drop rate.
- num_fcs (int)
Number of FCs in FFN. Must be 2.
- qkv_bias (bool)
Whether to use bias in QKV projections.
- output_cls_token (bool)
Whether to return the class token in outputs.
- act_type (type)
Activation function class.
- act_params (dict)
Parameters for the activation function.
- with_cp (bool)
Whether to use checkpointing for memory savings.
- dropout_type (type)
Dropout class used in FFN.
- dropout_params (Optional[dict])
Parameters for dropout.
- batch_first (bool, default=True)
If True, inputs/outputs are in shape (B, L, C).
- _pos_embeding(patched_img, hw_shape, pos_embed)[source]¶
Positional embedding method. Resizes pos_embed if the input image size does not match the training size.
- Args:
- patched_img (torch.Tensor):
The patched image, of shape [B, L1, C].
- hw_shape (tuple):
The downsampled image resolution.
- pos_embed (torch.Tensor):
The pos_embed weights, of shape [B, L2, C].
- Return:
- torch.Tensor:
The position-encoded image feature.
- Parameters:
hw_shape (tuple)
- cls_token¶
- drop_after_pos¶
- embed_dims¶
- img_size¶
- interpolate_mode¶
- layers¶
- load_backbone(path)[source]¶
Loads pretrained weights and handles positional embedding resizing if necessary.
- Parameters:
path (str)
- original_resolution¶
- output_cls_token¶
- patch_embed¶
- patch_size¶
- pos_embed¶
- static resize_pos_embed(pos_embed, input_shape, pos_shape, mode='bicubic')[source]¶
Resize pos_embed weights. The weights are resized by interpolation, bicubic by default.
- Args:
- pos_embed (torch.Tensor):
Position embedding weights.
- input_shape (tuple):
Tuple for (downsampled input image height, downsampled input image width).
- pos_shape (tuple):
The resolution of the downsampled original training image.
- mode (str):
Algorithm used for upsampling: 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: 'bicubic'
- Return:
- torch.Tensor:
The resized pos_embed of shape [B, L_new, C]
- Parameters:
mode (str)
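The resizing described above can be sketched with torch.nn.functional.interpolate. This is a minimal illustration under stated assumptions, not the minerva implementation: it assumes pos_embed has shape [1, 1 + H*W, C] with a single leading class token, and the name resize_pos_embed_sketch is hypothetical.

```python
import torch
import torch.nn.functional as F


def resize_pos_embed_sketch(pos_embed, input_shape, pos_shape, mode="bicubic"):
    """Interpolate a [1, 1 + H*W, C] position embedding to a new token grid."""
    pos_h, pos_w = pos_shape
    cls_embed = pos_embed[:, :1]                # assumed leading class token
    grid_embed = pos_embed[:, -pos_h * pos_w:]  # one embedding per patch
    # [1, H*W, C] -> [1, C, H, W] so the grid can be resized like an image.
    grid_embed = grid_embed.reshape(1, pos_h, pos_w, -1).permute(0, 3, 1, 2)
    grid_embed = F.interpolate(
        grid_embed, size=input_shape, mode=mode, align_corners=False
    )
    # Back to sequence layout: [1, C, H', W'] -> [1, H'*W', C].
    grid_embed = grid_embed.flatten(2).transpose(1, 2)
    return torch.cat([cls_embed, grid_embed], dim=1)


# Embeddings trained on a 14x14 grid, resized for a 32x32 grid.
pos_embed = torch.zeros(1, 1 + 14 * 14, 8)
resized = resize_pos_embed_sketch(pos_embed, (32, 32), (14, 14))
```

The class-token embedding is carried over unchanged because it has no spatial position; only the patch embeddings are interpolated.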
- with_cls_token¶
- class minerva.models.nets.image.vit.VIT_MLAHead(img_size=768, mla_channels=256, mlahead_channels=128, num_classes=6, norm_layer=nn.BatchNorm2d, norm_cfg=None, **kwargs)[source]¶
Bases: torch.nn.Module
Vision Transformer with support for patch or hybrid CNN input stage
- BatchNorm¶
- cls¶
- img_size = 768¶
- mla_channels = 256¶
- mlahead¶
- mlahead_channels = 128¶
- norm_cfg = None¶
- num_classes = 6¶
- class minerva.models.nets.image.vit.VisionTransformer(global_pool=False, **kwargs)[source]¶
Bases: timm.models.vision_transformer.VisionTransformer, lightning.LightningModule
Vision Transformer with support for global average pooling
- Args:
img_size: Input image size.
patch_size: Patch size.
in_chans: Number of image input channels.
num_classes: Number of classes for classification head.
global_pool: Type of global pooling for final sequence (default: ‘token’).
embed_dim: Transformer embedding dimension.
depth: Depth of transformer.
num_heads: Number of attention heads.
mlp_ratio: Ratio of mlp hidden dim to embedding dim.
qkv_bias: Enable bias for qkv projections if True.
init_values: Layer-scale init values (layer-scale enabled if not None).
class_token: Use class token.
no_embed_class: Don’t include position embeddings for class (or reg) tokens.
reg_tokens: Number of register tokens.
pre_norm: Enable norm after embeddings, before transformer blocks (standard in CLIP ViT).
final_norm: Enable norm after transformer blocks, before head (standard in most ViT).
fc_norm: Move final norm after pool (instead of before), if None, enabled when global_pool == ‘avg’.
drop_rate: Head dropout rate.
pos_drop_rate: Position embedding dropout rate.
attn_drop_rate: Attention dropout rate.
drop_path_rate: Stochastic depth rate.
weight_init: Weight initialization scheme.
fix_init: Apply weight initialization fix (scaling w/ layer index).
embed_layer: Patch embedding layer.
embed_norm_layer: Normalization layer to use / override in patch embed module.
norm_layer: Normalization layer.
act_layer: MLP activation layer.
block_fn: Transformer block layer.
- decoder¶
- forward_features(x)[source]¶
Forward pass through feature layers (embeddings, transformer blocks, post-transformer norm).
- global_pool = False¶
- loss_fn¶
- segmentation_head¶
- minerva.models.nets.image.vit.interpolate_pos_embed(model, checkpoint_model, newsize1=None, newsize2=None)[source]¶
- minerva.models.nets.image.vit.mae_vit_base_patch16¶
- minerva.models.nets.image.vit.mae_vit_base_patch16D4d256¶
- minerva.models.nets.image.vit.mae_vit_huge_patch14¶
- minerva.models.nets.image.vit.mae_vit_large_patch16¶
- minerva.models.nets.image.vit.mae_vit_large_patch16D4d256¶
- minerva.models.nets.image.vit.mae_vit_small_patch16¶