minerva.models.nets.setr

Classes

SETR_PUP

SETR model with a progressive upsampling (PUP) decoder head, packaged as a Lightning module.

_SETRMLAHead

Multi-level feature aggregation (MLA) head of SETR.

_SETRUPHead

Naive upsampling head and Progressive upsampling head of SETR.

_SetR_PUP

Initializes the SETR PUP model.

Module Contents

class minerva.models.nets.setr.SETR_PUP(image_size=512, patch_size=16, num_layers=24, num_heads=16, hidden_dim=1024, mlp_dim=4096, encoder_dropout=0.1, num_classes=1000, norm_layer=None, decoder_channels=256, num_convs=4, up_scale=2, kernel_size=3, align_corners=False, decoder_dropout=0.1, conv_norm=None, conv_act=None, interpolate_mode='bilinear', loss_fn=None, train_metrics=None, val_metrics=None, test_metrics=None, aux_output=True, aux_output_layers=[9, 14, 19], aux_weights=[0.3, 0.3, 0.3])

Bases: lightning.LightningModule

Initializes the SETR model with a progressive upsampling (PUP) decoder head.

Parameters

image_size : int or tuple[int, int]

The input image size. Defaults to 512.

patch_size : int

The size of each patch. Defaults to 16.

num_layers : int

The number of layers in the transformer encoder. Defaults to 24.

num_heads : int

The number of attention heads in the transformer encoder. Defaults to 16.

hidden_dim : int

The hidden dimension of the transformer encoder. Defaults to 1024.

mlp_dim : int

The dimension of the MLP layers in the transformer encoder. Defaults to 4096.

encoder_dropout : float

The dropout rate for the transformer encoder. Defaults to 0.1.

num_classes : int

The number of output classes. Defaults to 1000.

norm_layer : nn.Module, optional

The normalization layer to be used in the decoder. Defaults to None.

decoder_channels : int

The number of channels in the decoder. Defaults to 256.

num_convs : int

The number of convolutional layers in the decoder. Defaults to 4.

up_scale : int

The scale factor for upsampling in the decoder. Defaults to 2.

kernel_size : int

The kernel size for convolutional layers in the decoder. Defaults to 3.

align_corners : bool

Whether to align corners during interpolation in the decoder. Defaults to False.

decoder_dropout : float

The dropout rate for the decoder. Defaults to 0.1.

conv_norm : nn.Module, optional

The normalization layer to be used in the convolutional layers of the decoder. Defaults to None.

conv_act : nn.Module, optional

The activation function to be used in the convolutional layers of the decoder. Defaults to None.

interpolate_mode : str

The interpolation mode for upsampling in the decoder. Defaults to “bilinear”.

loss_fn : nn.Module, optional

The loss function to be used during training. Defaults to None.

train_metrics : Dict[str, Metric], optional

The metrics to be used for training evaluation. Defaults to None.

val_metrics : Dict[str, Metric], optional

The metrics to be used for validation evaluation. Defaults to None.

test_metrics : Dict[str, Metric], optional

The metrics to be used for testing evaluation. Defaults to None.

aux_output : bool

Whether to include auxiliary output heads in the model. Defaults to True.

aux_output_layers : list[int] | None

The indices of the layers to output auxiliary predictions. Defaults to [9, 14, 19].

aux_weights : list[float]

The weights for the auxiliary predictions. Defaults to [0.3, 0.3, 0.3].
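
As a quick sanity check on the defaults above, the following sketch works out the patch grid and token count implied by image_size and patch_size, and the per-head width implied by hidden_dim and num_heads. It assumes the usual Vision Transformer convention (non-overlapping square patches, hidden_dim split evenly across heads). The helper name `patch_grid` is hypothetical and is not part of minerva.

```python
def patch_grid(image_size, patch_size):
    """Return (rows, cols) of the patch grid for a given image size.

    Assumes non-overlapping square patches, as in a standard ViT encoder.
    `image_size` may be an int (square image) or a (height, width) tuple.
    """
    if isinstance(image_size, int):
        image_size = (image_size, image_size)
    height, width = image_size
    if height % patch_size or width % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    return height // patch_size, width // patch_size


# Documented defaults of SETR_PUP.
rows, cols = patch_grid(512, 16)
num_tokens = rows * cols   # sequence length seen by the encoder
head_dim = 1024 // 16      # hidden_dim // num_heads

print(rows, cols, num_tokens, head_dim)  # 32 32 1024 64
```

Under these assumptions, a non-square input such as (256, 512) would give a 16 by 32 grid.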

_compute_metrics(y_hat, y, step_name)
Parameters:
  • y_hat (torch.Tensor)

  • y (torch.Tensor)

  • step_name (str)

_loss_func(y_hat, y)

Calculate the loss between the output and the input data.

Parameters

y_hat : torch.Tensor

The output data from the forward pass.

y : torch.Tensor

The input data/label.

Returns

torch.Tensor

The loss value.

Parameters:
  • y_hat (torch.Tensor | Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor])

  • y (torch.Tensor)

Return type:

torch.Tensor
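
The tuple form of y_hat has four entries, which is consistent with one main prediction plus one auxiliary prediction per entry of the default aux_output_layers. The source does not state how these are combined. A common choice, assumed here, is the main loss plus a weighted sum of the auxiliary losses using aux_weights. The function `combine_losses` is a hypothetical illustration on plain floats, not minerva's implementation.

```python
def combine_losses(main_loss, aux_losses, aux_weights):
    """Main loss plus a weighted sum of auxiliary losses (assumed rule)."""
    if len(aux_losses) != len(aux_weights):
        raise ValueError("need exactly one weight per auxiliary loss")
    weighted = sum(w * loss for w, loss in zip(aux_weights, aux_losses))
    return main_loss + weighted


# One main loss and three auxiliary losses, weighted by the documented
# default aux_weights=[0.3, 0.3, 0.3]. The loss values are made up.
total = combine_losses(1.0, [2.0, 1.0, 1.0], [0.3, 0.3, 0.3])
print(round(total, 6))  # 2.2
```

With no auxiliary losses the function returns the main loss unchanged, which would correspond to aux_output=False.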

_single_step(batch, batch_idx, step_name)

Perform a single step of the training/validation loop.

Parameters

batch : torch.Tensor

The input data.

batch_idx : int

The index of the batch.

step_name : str

The name of the step, either “train” or “val”.

Returns

torch.Tensor

The loss value.

Parameters:
  • batch (torch.Tensor)

  • batch_idx (int)

  • step_name (str)

configure_optimizers()

forward(x)
Parameters:

x (torch.Tensor)

Return type:

torch.Tensor

predict_step(batch, batch_idx, dataloader_idx=None)
Parameters:
  • batch (torch.Tensor)

  • batch_idx (int)

  • dataloader_idx (int | None)

test_step(batch, batch_idx)
Parameters:
  • batch (torch.Tensor)

  • batch_idx (int)

training_step(batch, batch_idx)
Parameters:
  • batch (torch.Tensor)

  • batch_idx (int)

validation_step(batch, batch_idx)
Parameters:
  • batch (torch.Tensor)

  • batch_idx (int)

Parameters (SETR_PUP constructor, by annotated type):
  • image_size (int | tuple[int, int])

  • patch_size (int)

  • num_layers (int)

  • num_heads (int)

  • hidden_dim (int)

  • mlp_dim (int)

  • encoder_dropout (float)

  • num_classes (int)

  • norm_layer (Optional[torch.nn.Module])

  • decoder_channels (int)

  • num_convs (int)

  • up_scale (int)

  • kernel_size (int)

  • align_corners (bool)

  • decoder_dropout (float)

  • conv_norm (Optional[torch.nn.Module])

  • conv_act (Optional[torch.nn.Module])

  • interpolate_mode (str)

  • loss_fn (Optional[torch.nn.Module])

  • train_metrics (Optional[Dict[str, torchmetrics.Metric]])

  • val_metrics (Optional[Dict[str, torchmetrics.Metric]])

  • test_metrics (Optional[Dict[str, torchmetrics.Metric]])

  • aux_output (bool)

  • aux_output_layers (list[int] | None)

  • aux_weights (list[float])

class minerva.models.nets.setr._SETRMLAHead(channels, conv_norm, conv_act, in_channels, out_channels, num_classes, mla_channels=128, up_scale=4, kernel_size=3, align_corners=True, dropout=0.1, threshold=None)

Bases: torch.nn.Module

Multi-level feature aggregation (MLA) head of SETR.

Parameters:
  • channels (int)

  • conv_norm (Optional[torch.nn.Module])

  • conv_act (Optional[torch.nn.Module])

  • in_channels (list[int])

  • out_channels (int)

  • num_classes (int)

  • mla_channels (int)

  • up_scale (int)

  • kernel_size (int)

  • align_corners (bool)

  • dropout (float)

  • threshold (Optional[float])

forward(x)

class minerva.models.nets.setr._SETRUPHead(channels, in_channels, num_classes, norm_layer, conv_norm, conv_act, num_convs, up_scale, kernel_size, align_corners, dropout, interpolate_mode)

Bases: torch.nn.Module

Naive upsampling head and progressive upsampling (PUP) head of SETR.

Initializes the SETR upsampling head.

Parameters

channels : int

Number of output channels.

in_channels : int

Number of input channels.

num_classes : int

Number of output classes.

norm_layer : nn.Module

Normalization layer.

conv_norm : nn.Module

Convolutional normalization layer.

conv_act : nn.Module

Convolutional activation layer.

num_convs : int

Number of convolutional layers.

up_scale : int

Upsampling scale factor.

kernel_size : int

Kernel size for convolutional layers.

align_corners : bool

Whether to align corners during upsampling.

dropout : float

Dropout rate.

interpolate_mode : str

Interpolation mode for upsampling.

Raises

AssertionError

If kernel_size is not 1 or 3.
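
To see how num_convs and up_scale interact, the sketch below lists the feature-map side length after each decoder stage. It assumes that every convolutional stage is followed by one upsampling step of factor up_scale, which is the progressive upsampling idea; whether this head applies it in exactly this way is an assumption. `upsample_schedule` is a hypothetical helper, not part of minerva.

```python
def upsample_schedule(start_size, num_convs, up_scale):
    """Side length after each of `num_convs` upsampling stages.

    Assumes each stage multiplies the spatial size by `up_scale`.
    """
    sizes = [start_size]
    for _ in range(num_convs):
        sizes.append(sizes[-1] * up_scale)
    return sizes


# A 512-pixel image cut into 16-pixel patches gives a 32x32 feature map.
# With the SETR_PUP defaults num_convs=4 and up_scale=2, the total factor
# is 2**4 = 16, which equals the patch size and restores the input size.
print(upsample_schedule(32, num_convs=4, up_scale=2))
# [32, 64, 128, 256, 512]
```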

forward(x)

Parameters (_SETRUPHead constructor, by annotated type):
  • channels (int)

  • in_channels (int)

  • num_classes (int)

  • norm_layer (torch.nn.Module)

  • conv_norm (torch.nn.Module)

  • conv_act (torch.nn.Module)

  • num_convs (int)

  • up_scale (int)

  • kernel_size (int)

  • align_corners (bool)

  • dropout (float)

  • interpolate_mode (str)

class minerva.models.nets.setr._SetR_PUP(image_size, patch_size, num_layers, num_heads, hidden_dim, mlp_dim, num_convs, num_classes, decoder_channels, up_scale, encoder_dropout, kernel_size, decoder_dropout, norm_layer, interpolate_mode, conv_norm, conv_act, align_corners, aux_output=False, aux_output_layers=None)

Bases: torch.nn.Module

Initializes the SETR PUP model.

Parameters

image_size : int or tuple[int, int]

The size of the input image.

patch_size : int

The size of each patch in the input image.

num_layers : int

The number of layers in the transformer encoder.

num_heads : int

The number of attention heads in the transformer encoder.

hidden_dim : int

The hidden dimension of the transformer encoder.

mlp_dim : int

The dimension of the feed-forward network in the transformer encoder.

num_convs : int

The number of convolutional layers in the decoder.

num_classes : int

The number of output classes.

decoder_channels : int

The number of channels in the decoder.

up_scale : int

The scale factor for upsampling in the decoder.

encoder_dropout : float

The dropout rate for the transformer encoder.

kernel_size : int

The kernel size for the convolutional layers in the decoder.

decoder_dropout : float

The dropout rate for the decoder.

norm_layer : nn.Module

The normalization layer to be used.

interpolate_mode : str

The mode for interpolation during upsampling.

conv_norm : nn.Module

The normalization layer to be used in the decoder convolutional layers.

conv_act : nn.Module

The activation function to be used in the decoder convolutional layers.

align_corners : bool

Whether to align corners during upsampling.

aux_output : bool

Whether to include auxiliary output heads in the model. Defaults to False.

aux_output_layers : list[int] | None

The indices of the layers to output auxiliary predictions. Defaults to None.

forward(x)
Parameters:

x (torch.Tensor)
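
The role of aux_output_layers can be pictured as collecting intermediate encoder outputs during the forward pass. The sketch below shows that pattern with plain callables standing in for encoder layers. It is an illustration only: the function `run_with_aux`, the 0-based indexing, and the return shape are assumptions, and the source does not describe what forward actually returns.

```python
def run_with_aux(layers, x, aux_output_layers=None):
    """Apply `layers` in order; also return outputs at the given indices.

    Stand-in for an encoder forward pass: each "layer" is just a callable.
    Indices are treated as 0-based, which is an assumption.
    """
    wanted = set(aux_output_layers or [])
    aux = []
    for index, layer in enumerate(layers):
        x = layer(x)
        if index in wanted:
            aux.append(x)
    return x, aux


# 24 toy layers that each add 1, mirroring the default num_layers=24.
layers = [lambda value: value + 1 for _ in range(24)]
final, aux = run_with_aux(layers, 0, aux_output_layers=[9, 14, 19])
print(final, aux)  # 24 [10, 15, 20]
```

Passing aux_output_layers=None yields an empty auxiliary list, matching the documented default of this class.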

Parameters (_SetR_PUP constructor, by annotated type):
  • image_size (int | tuple[int, int])

  • patch_size (int)

  • num_layers (int)

  • num_heads (int)

  • hidden_dim (int)

  • mlp_dim (int)

  • num_convs (int)

  • num_classes (int)

  • decoder_channels (int)

  • up_scale (int)

  • encoder_dropout (float)

  • kernel_size (int)

  • decoder_dropout (float)

  • norm_layer (torch.nn.Module)

  • interpolate_mode (str)

  • conv_norm (torch.nn.Module)

  • conv_act (torch.nn.Module)

  • align_corners (bool)

  • aux_output (bool)

  • aux_output_layers (list[int] | None)