minerva.models.nets.setr

Classes

`SETR_PUP`	Initializes the SETR model.
`_SETRMLAHead`	Multi-level feature aggregation head of SETR.
`_SETRUPHead`	Naive upsampling head and Progressive upsampling head of SETR.
`_SetR_PUP`	Initializes the SETR PUP model.

Module Contents

class minerva.models.nets.setr.SETR_PUP(image_size=512, patch_size=16, num_layers=24, num_heads=16, hidden_dim=1024, mlp_dim=4096, encoder_dropout=0.1, num_classes=1000, norm_layer=None, decoder_channels=256, num_convs=4, up_scale=2, kernel_size=3, align_corners=False, decoder_dropout=0.1, conv_norm=None, conv_act=None, interpolate_mode='bilinear', loss_fn=None, train_metrics=None, val_metrics=None, test_metrics=None, aux_output=True, aux_output_layers=[9, 14, 19], aux_weights=[0.3, 0.3, 0.3])

Bases: lightning.LightningModule

Initializes the SETR model.

Parameters

image_size (int or tuple[int, int]): The input image size. Defaults to 512.
patch_size (int): The size of each patch. Defaults to 16.
num_layers (int): The number of layers in the transformer encoder. Defaults to 24.
num_heads (int): The number of attention heads in the transformer encoder. Defaults to 16.
hidden_dim (int): The hidden dimension of the transformer encoder. Defaults to 1024.
mlp_dim (int): The dimension of the MLP layers in the transformer encoder. Defaults to 4096.
encoder_dropout (float): The dropout rate for the transformer encoder. Defaults to 0.1.
num_classes (int): The number of output classes. Defaults to 1000.
norm_layer (nn.Module, optional): The normalization layer to be used in the decoder. Defaults to None.
decoder_channels (int): The number of channels in the decoder. Defaults to 256.
num_convs (int): The number of convolutional layers in the decoder. Defaults to 4.
up_scale (int): The scale factor for upsampling in the decoder. Defaults to 2.
kernel_size (int): The kernel size for convolutional layers in the decoder. Defaults to 3.
align_corners (bool): Whether to align corners during interpolation in the decoder. Defaults to False.
decoder_dropout (float): The dropout rate for the decoder. Defaults to 0.1.
conv_norm (nn.Module, optional): The normalization layer to be used in the convolutional layers of the decoder. Defaults to None.
conv_act (nn.Module, optional): The activation function to be used in the convolutional layers of the decoder. Defaults to None.
interpolate_mode (str): The interpolation mode for upsampling in the decoder. Defaults to "bilinear".
loss_fn (nn.Module, optional): The loss function to be used during training. Defaults to None.
train_metrics (Dict[str, Metric], optional): The metrics to be used for training evaluation. Defaults to None.
val_metrics (Dict[str, Metric], optional): The metrics to be used for validation evaluation. Defaults to None.
test_metrics (Dict[str, Metric], optional): The metrics to be used for testing evaluation. Defaults to None.
aux_output (bool): Whether to include auxiliary output heads in the model. Defaults to True.
aux_output_layers (list[int] | None): The indices of the layers to output auxiliary predictions. Defaults to [9, 14, 19].
aux_weights (list[float]): The weights for the auxiliary predictions. Defaults to [0.3, 0.3, 0.3].
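
To make the relationship between image_size, patch_size, hidden_dim and num_heads concrete, here is a minimal sketch of the arithmetic a ViT-style encoder typically implies. The helper names `to_pair` and `patch_grid` are hypothetical and not part of minerva, and the sketch assumes non-overlapping patches and an image size divisible by the patch size; the actual implementation may handle these cases differently.

```python
def to_pair(image_size):
    """Normalize an int or a (height, width) tuple to a (height, width) tuple."""
    if isinstance(image_size, int):
        return (image_size, image_size)
    height, width = image_size
    return (height, width)


def patch_grid(image_size, patch_size):
    """Return (rows, cols) of the patch grid.

    Assumption: patches do not overlap and tile the image exactly.
    """
    height, width = to_pair(image_size)
    if height % patch_size or width % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    return (height // patch_size, width // patch_size)


# Values taken from the documented defaults of SETR_PUP.
rows, cols = patch_grid(512, 16)
num_tokens = rows * cols   # patches fed to the transformer encoder
head_dim = 1024 // 16      # hidden_dim split evenly across num_heads
print(rows, cols, num_tokens, head_dim)  # prints: 32 32 1024 64
```

With the defaults, a 512 x 512 input yields a 32 x 32 grid of patches, that is 1024 tokens, and each attention head works on 64 of the 1024 hidden dimensions.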

_compute_metrics(y_hat, y, step_name)

Parameters:

y_hat (torch.Tensor)
y (torch.Tensor)
step_name (str)

_loss_func(y_hat, y)

Calculate the loss between the model output and the target.

Parameters

y_hat (torch.Tensor | Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]): The output of the forward pass.
y (torch.Tensor): The target data or labels.

Returns

torch.Tensor: The loss value.

_single_step(batch, batch_idx, step_name)

Perform a single step of the training/validation loop.

Parameters

batch (torch.Tensor): The input data.
batch_idx (int): The index of the batch.
step_name (str): The name of the step, either "train" or "val".

Returns

torch.Tensor: The loss value.

configure_optimizers()

forward(x)

Parameters:

x (torch.Tensor)

Return type:

torch.Tensor

predict_step(batch, batch_idx, dataloader_idx=None)

Parameters:

batch (torch.Tensor)
batch_idx (int)
dataloader_idx (int | None)

test_step(batch, batch_idx)

Parameters:

batch (torch.Tensor)
batch_idx (int)

training_step(batch, batch_idx)

Parameters:

batch (torch.Tensor)
batch_idx (int)

validation_step(batch, batch_idx)

Parameters:

batch (torch.Tensor)
batch_idx (int)

class minerva.models.nets.setr._SETRMLAHead(channels, conv_norm, conv_act, in_channels, out_channels, num_classes, mla_channels=128, up_scale=4, kernel_size=3, align_corners=True, dropout=0.1, threshold=None)

Bases: torch.nn.Module

Multi-level feature aggregation (MLA) head of SETR.

Parameters:

channels (int)
conv_norm (Optional[torch.nn.Module])
conv_act (Optional[torch.nn.Module])
in_channels (list[int])
out_channels (int)
num_classes (int)
mla_channels (int)
up_scale (int)
kernel_size (int)
align_corners (bool)
dropout (float)
threshold (Optional[float])

forward(x)

class minerva.models.nets.setr._SETRUPHead(channels, in_channels, num_classes, norm_layer, conv_norm, conv_act, num_convs, up_scale, kernel_size, align_corners, dropout, interpolate_mode)

Bases: torch.nn.Module

Naive upsampling head and progressive upsampling (PUP) head of SETR.

Initializes the SETR upsampling head.

Parameters

channels (int): Number of output channels.
in_channels (int): Number of input channels.
num_classes (int): Number of output classes.
norm_layer (nn.Module): Normalization layer.
conv_norm (nn.Module): Convolutional normalization layer.
conv_act (nn.Module): Convolutional activation layer.
num_convs (int): Number of convolutional layers.
up_scale (int): Upsampling scale factor.
kernel_size (int): Kernel size for convolutional layers.
align_corners (bool): Whether to align corners during upsampling.
dropout (float): Dropout rate.
interpolate_mode (str): Interpolation mode for upsampling.

Raises

AssertionError: If kernel_size is not 1 or 3.
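
The interaction of num_convs, up_scale and kernel_size can be sketched with plain arithmetic. The helpers `pup_output_size` and `same_padding` below are hypothetical. They assume that each of the num_convs stages upsamples by up_scale, as in the progressive upsampling design, and that convolutions use stride 1 with padding kernel_size // 2 so that they preserve spatial size. The actual head may differ in both respects.

```python
def pup_output_size(grid_size, num_convs=4, up_scale=2):
    """Spatial size after num_convs upsampling stages.

    Assumption: every stage multiplies height and width by up_scale.
    """
    rows, cols = grid_size
    factor = up_scale ** num_convs
    return (rows * factor, cols * factor)


def same_padding(kernel_size):
    """Padding that keeps spatial size unchanged for a stride-1 convolution."""
    # Mirrors the documented constraint: kernel_size must be 1 or 3.
    assert kernel_size in (1, 3), "kernel_size must be 1 or 3"
    return kernel_size // 2


# A 32 x 32 patch grid (512 / 16) upsampled by 2 four times.
print(pup_output_size((32, 32)))         # prints: (512, 512)
print(same_padding(3), same_padding(1))  # prints: 1 0
```

Under these assumptions, the default num_convs=4 and up_scale=2 give a 16x upsampling, which exactly undoes the 16-pixel patching of a 512 x 512 input.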

forward(x)

class minerva.models.nets.setr._SetR_PUP(image_size, patch_size, num_layers, num_heads, hidden_dim, mlp_dim, num_convs, num_classes, decoder_channels, up_scale, encoder_dropout, kernel_size, decoder_dropout, norm_layer, interpolate_mode, conv_norm, conv_act, align_corners, aux_output=False, aux_output_layers=None)

Bases: torch.nn.Module

Initializes the SETR PUP model.

Parameters

image_size (int or tuple[int, int]): The size of the input image.
patch_size (int): The size of each patch in the input image.
num_layers (int): The number of layers in the transformer encoder.
num_heads (int): The number of attention heads in the transformer encoder.
hidden_dim (int): The hidden dimension of the transformer encoder.
mlp_dim (int): The dimension of the feed-forward network in the transformer encoder.
num_convs (int): The number of convolutional layers in the decoder.
num_classes (int): The number of output classes.
decoder_channels (int): The number of channels in the decoder.
up_scale (int): The scale factor for upsampling in the decoder.
encoder_dropout (float): The dropout rate for the transformer encoder.
kernel_size (int): The kernel size for the convolutional layers in the decoder.
decoder_dropout (float): The dropout rate for the decoder.
norm_layer (nn.Module): The normalization layer to be used.
interpolate_mode (str): The mode for interpolation during upsampling.
conv_norm (nn.Module): The normalization layer to be used in the decoder convolutional layers.
conv_act (nn.Module): The activation function to be used in the decoder convolutional layers.
align_corners (bool): Whether to align corners during upsampling.
aux_output (bool): Whether to include auxiliary output heads in the model. Defaults to False.
aux_output_layers (list[int] | None): The indices of the layers to output auxiliary predictions. Defaults to None.

forward(x)

Parameters:

x (torch.Tensor)