minerva.models.nets.image.deeplabv3

Classes

DeepLabV3

A DeepLabV3 model with a ResNet50 backbone

DeepLabV3Backbone

A ResNet50 backbone for DeepLabV3

DeepLabV3PredictionHead

The prediction head for DeepLabV3

DeepLabV3RegressionHead

Regression head for DeepLabV3 (continuous per-pixel/voxel prediction).

Module Contents

class minerva.models.nets.image.deeplabv3.DeepLabV3(backbone=None, pred_head=None, loss_fn=None, learning_rate=0.001, num_classes=6, pretrained=False, weights_path=None, train_metrics=None, val_metrics=None, test_metrics=None, optimizer=torch.optim.Adam, optimizer_kwargs=None, lr_scheduler=None, lr_scheduler_kwargs=None, output_shape=None, freeze_backbone=False, interpolate_mode='bilinear', flatten=False, loss_squeeze=True, loss_long=True)[source]

Bases: minerva.models.nets.base.SimpleSupervisedModel

A DeepLabV3 model with a ResNet50 backbone

References

Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam. “Rethinking Atrous Convolution for Semantic Image Segmentation”, 2017

Initializes a DeepLabV3 model.

Parameters

backbone: Optional[nn.Module]

The backbone network. Defaults to None, which will use a ResNet50 backbone.

pred_head: Optional[nn.Module]

The prediction head network. Defaults to None, which will use a DeepLabV3PredictionHead with the specified number of classes.

loss_fn: Optional[nn.Module]

The loss function. Defaults to None, which will use a CrossEntropyLoss.

learning_rate: float

The learning rate for the optimizer. Defaults to 0.001.

num_classes: int

The number of classes for prediction. Defaults to 6.

pretrained: bool

Whether to use pretrained weights. Defaults to False.

weights_path: Optional[str]

Path to local pretrained weights file. If provided with pretrained=True, loads weights from this path instead of downloading. Defaults to None.

train_metrics: Optional[Dict[str, Metric]]

The metrics to be computed during training. Defaults to None.

val_metrics: Optional[Dict[str, Metric]]

The metrics to be computed during validation. Defaults to None.

test_metrics: Optional[Dict[str, Metric]]

The metrics to be computed during testing. Defaults to None.

optimizer: type

Optimizer class to be instantiated. By default, it is set to torch.optim.Adam. Should be a subclass of torch.optim.Optimizer (e.g., torch.optim.SGD).

optimizer_kwargs: Optional[Dict[str, Any]]

Additional kwargs passed to the optimizer constructor.

lr_scheduler: Optional[type]

Learning rate scheduler class to be instantiated. By default, it is set to None, which means no scheduler will be used. Should be a subclass of torch.optim.lr_scheduler.LRScheduler (e.g., torch.optim.lr_scheduler.StepLR).

lr_scheduler_kwargs: Optional[Dict[str, Any]]

Additional kwargs passed to the scheduler constructor.

output_shape: Optional[Tuple[int, …]]

The output shape of the model. Defaults to None, in which case the output shape is the same as the input shape. Set this for models that require a specific output shape that differs from the input shape.

freeze_backbone: bool

Whether to freeze the backbone weights during training. Defaults to False.

interpolate_mode: Optional[str]

The interpolation mode to use when upscaling the output to the desired output shape. Defaults to “bilinear”. Other options include “nearest”, “bicubic”, etc. See PyTorch documentation for torch.nn.functional.interpolate for all options. Use None to disable upscaling.

flatten: bool

Whether to flatten the output of the backbone before passing it to the prediction head. Defaults to False. Set to True for classification tasks where the prediction head is a fully connected layer.

loss_squeeze: bool

Whether to squeeze the target tensor in the loss function. Defaults to True. This is useful for segmentation tasks where the target tensor has a singleton channel dimension (e.g., shape (B, 1, H, W)) and the loss function expects shape (B, H, W).

loss_long: bool

Whether to convert the target tensor to long type in the loss function. Defaults to True. This is useful for classification losses such as CrossEntropyLoss, which expect class-index targets of integer (long) type.
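The optimizer and lr_scheduler arguments take classes rather than instances, with their keyword arguments passed separately. The sketch below shows how such a class-plus-kwargs pair can be combined using plain PyTorch. The wiring is an assumption about how the model consumes these arguments, not a copy of Minerva's implementation, and the nn.Linear module is only a stand-in for a real network.

```python
import torch

# Stand-in module; any nn.Module with parameters works here.
model = torch.nn.Linear(4, 2)

# Values mirroring the constructor arguments documented above.
learning_rate = 0.001
optimizer_cls = torch.optim.SGD
optimizer_kwargs = {"momentum": 0.9}
lr_scheduler_cls = torch.optim.lr_scheduler.StepLR
lr_scheduler_kwargs = {"step_size": 10, "gamma": 0.1}

# Assumed wiring: the class is instantiated with the learning rate
# plus any extra keyword arguments supplied by the caller.
optimizer = optimizer_cls(model.parameters(), lr=learning_rate, **optimizer_kwargs)

# The scheduler class receives the optimizer instance and its own kwargs.
scheduler = lr_scheduler_cls(optimizer, **lr_scheduler_kwargs)
```

Passing classes instead of instances lets the model create the optimizer after its own parameters exist, which is why the kwargs travel separately.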

_loss_func(y_hat, y)[source]

Computes the loss between predictions and ground truth.

Parameters

y_hat: Tensor

Predicted tensor of shape (batch_size, num_classes, height, width)

y: Tensor

Ground truth tensor of shape (batch_size, 1, height, width)

Parameters:
  • y_hat (torch.Tensor)

  • y (torch.Tensor)

Return type:

torch.Tensor
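The effect of loss_squeeze and loss_long on the target can be illustrated with plain PyTorch. This is a minimal sketch under the assumption that the target is squeezed along its channel dimension and cast to long before the loss is applied; the helper name prepare_and_compute_loss is hypothetical and not part of Minerva.

```python
import torch
import torch.nn as nn

def prepare_and_compute_loss(y_hat, y, loss_fn, squeeze=True, to_long=True):
    """Hypothetical helper mirroring the documented target preparation."""
    if squeeze:
        # (B, 1, H, W) -> (B, H, W): drop the singleton channel dimension.
        y = y.squeeze(1)
    if to_long:
        # CrossEntropyLoss expects class indices stored as long integers.
        y = y.long()
    return loss_fn(y_hat, y)

torch.manual_seed(0)
y_hat = torch.randn(2, 6, 8, 8)                  # (batch, num_classes, H, W)
y = torch.randint(0, 6, (2, 1, 8, 8)).float()    # class labels stored as float
loss = prepare_and_compute_loss(y_hat, y, nn.CrossEntropyLoss())
```

For a regression loss such as nn.MSELoss, the long conversion would normally be turned off so that continuous targets are not truncated.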

forward(x)[source]

Performs the forward pass of the DeepLabV3 model.

Parameters

x: Tensor

Input tensor of shape (batch_size, channels, height, width)

Returns

Tensor

Output tensor of shape (batch_size, num_classes, height, width)

Parameters:

x (torch.Tensor)

Return type:

torch.Tensor
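The backbone produces a feature map at reduced spatial resolution, so the logits are resized to reach the documented output shape. The sketch below shows that resizing step with torch.nn.functional.interpolate. The helper resize_logits and the 8x8 logit size are illustrative assumptions, not values taken from Minerva.

```python
import torch
import torch.nn.functional as F

def resize_logits(logits, size, mode="bilinear"):
    """Hypothetical helper: upsample logits, or skip when mode is None."""
    if mode is None:
        return logits
    return F.interpolate(logits, size=size, mode=mode)

x = torch.randn(2, 3, 64, 64)        # input batch (B, C, H, W)
logits = torch.randn(2, 6, 8, 8)     # low-resolution class logits (assumed size)

# Resize to the spatial size of the input, as when output_shape is None.
out = resize_logits(logits, tuple(x.shape[-2:]))

# With mode=None the logits keep their low resolution.
raw = resize_logits(logits, tuple(x.shape[-2:]), mode=None)
```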

interpolate_mode = 'bilinear'
loss_long = True
output_shape = None
squeeze_loss = True
Parameters:
  • backbone (Optional[torch.nn.Module])

  • pred_head (Optional[torch.nn.Module])

  • loss_fn (Optional[torch.nn.Module])

  • learning_rate (float)

  • num_classes (int)

  • pretrained (bool)

  • weights_path (Optional[str])

  • train_metrics (Optional[Dict[str, torchmetrics.Metric]])

  • val_metrics (Optional[Dict[str, torchmetrics.Metric]])

  • test_metrics (Optional[Dict[str, torchmetrics.Metric]])

  • optimizer (type)

  • optimizer_kwargs (Optional[Dict[str, Any]])

  • lr_scheduler (Optional[type])

  • lr_scheduler_kwargs (Optional[Dict[str, Any]])

  • output_shape (Optional[Tuple[int, Ellipsis]])

  • freeze_backbone (bool)

  • interpolate_mode (Optional[str])

  • flatten (bool)

  • loss_squeeze (bool)

  • loss_long (bool)

class minerva.models.nets.image.deeplabv3.DeepLabV3Backbone(num_classes=6, pretrained=False, weights_path=None)[source]

Bases: torch.nn.Module

A ResNet50 backbone for DeepLabV3

Initializes the DeepLabV3 backbone model.

Parameters

num_classes: int

The number of classes for classification. Default is 6.

pretrained: bool

Whether to use pretrained weights. If True and weights_path is None, will attempt to download ImageNet pretrained weights. Default is False.

weights_path: Optional[str]

Path to local pretrained weights file. If provided with pretrained=True, loads weights from this path instead of downloading. Default is None.
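Loading weights from a local file typically follows the torch.save / torch.load / load_state_dict pattern. The round trip below is a sketch that assumes the file holds a plain state dict; the checkpoint layout Minerva actually expects is not stated here, and load_local_weights is a hypothetical helper.

```python
import os
import tempfile

import torch
import torch.nn as nn

def load_local_weights(module, weights_path):
    """Hypothetical helper: load a plain state dict from disk into a module."""
    state_dict = torch.load(weights_path, map_location="cpu")
    module.load_state_dict(state_dict)
    return module

# Two layers with different random initializations.
source = nn.Conv2d(3, 8, kernel_size=3)
target = nn.Conv2d(3, 8, kernel_size=3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "backbone_weights.pth")
    torch.save(source.state_dict(), path)   # write the weights file
    load_local_weights(target, path)        # read it back into another module

weights_match = torch.equal(source.weight, target.weight)
```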

RN50model
forward(x)[source]

Performs the forward pass of the backbone.

Parameters

x: Tensor

Input tensor of shape (batch_size, channels, height, width)

Returns

Tensor

Feature map tensor from the ResNet50 backbone

freeze_weights()[source]

Freezes all parameters in the backbone, making them non-trainable.

unfreeze_weights()[source]

Unfreezes all parameters in the backbone, making them trainable.
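Freezing and unfreezing usually comes down to toggling requires_grad on every parameter. The following sketch shows that mechanism on a small stand-in module; set_requires_grad is a hypothetical helper, and the actual method bodies in Minerva may differ.

```python
import torch.nn as nn

def set_requires_grad(module, trainable):
    """Hypothetical helper: enable or disable gradients for all parameters."""
    for param in module.parameters():
        param.requires_grad = trainable

# Stand-in for a backbone: conv weight/bias plus batch-norm weight/bias.
backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())

set_requires_grad(backbone, False)
frozen_count = sum(p.requires_grad for p in backbone.parameters())

set_requires_grad(backbone, True)
trainable_count = sum(p.requires_grad for p in backbone.parameters())
```

Note that disabling gradients does not stop BatchNorm layers from updating their running statistics in training mode; calling eval() on the module is what stops that.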

Parameters:
  • num_classes (int)

  • pretrained (bool)

  • weights_path (Optional[str])

class minerva.models.nets.image.deeplabv3.DeepLabV3PredictionHead(in_channels=2048, num_classes=6, atrous_rates=(12, 24, 36))[source]

Bases: torch.nn.Sequential

The prediction head for DeepLabV3

Initializes the DeepLabV3 prediction head.

Parameters

in_channels: int

Number of input channels. Defaults to 2048.

num_classes: int

Number of output classes. Defaults to 6.

atrous_rates: Sequence[int]

A sequence of atrous rates for the ASPP module. Defaults to (12, 24, 36).
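Each atrous rate corresponds to one dilated convolution branch in the ASPP module. The sketch below assumes 3x3 kernels with padding equal to the dilation rate, which is the common ASPP arrangement; the channel counts are reduced for illustration and are not the head's real sizes.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 80, 80)   # small stand-in feature map

shapes = {}
for rate in (12, 24, 36):
    # padding == dilation keeps the spatial size unchanged for a 3x3 kernel.
    branch = nn.Conv2d(16, 4, kernel_size=3, padding=rate, dilation=rate, bias=False)
    shapes[rate] = tuple(branch(x).shape)
```

Because every branch preserves height and width, their outputs can be concatenated along the channel dimension.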

forward(input)[source]

Performs the forward pass of the prediction head.

Parameters

input: Tensor

Input tensor from the backbone

Returns

Tensor

Output tensor with class predictions

Return type:

torch.Tensor

Parameters:
  • in_channels (int)

  • num_classes (int)

  • atrous_rates (Sequence[int])

class minerva.models.nets.image.deeplabv3.DeepLabV3RegressionHead(in_channels=2048, out_channels=1, atrous_rates=(12, 24, 36))[source]

Bases: torch.nn.Sequential

Regression head for DeepLabV3 (continuous per-pixel/voxel prediction).

Parameters

in_channels: int

Number of input channels from the backbone (typically 2048 for ResNet50).

out_channels: int

Number of output channels (1 for a single regression target).

atrous_rates: Sequence[int]

Atrous (dilation) rates for ASPP.
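A dilation rate widens the area a kernel covers without adding weights. As a rough guide, assuming 3x3 kernels (an assumption, since the kernel size is not documented here), the span of a dilated kernel is k + (k - 1)(r - 1):

```python
def effective_kernel_size(kernel_size, dilation):
    """Span in pixels covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

# Spans for the default rates, assuming 3x3 kernels.
spans = {rate: effective_kernel_size(3, rate) for rate in (12, 24, 36)}
# spans == {12: 25, 24: 49, 36: 73}
```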

forward(x)[source]

Forward pass for regression head.

Parameters:

x (torch.Tensor)

Return type:

torch.Tensor

Parameters:
  • in_channels (int)

  • out_channels (int)

  • atrous_rates (Sequence[int])