minerva.models.nets.image.vit_local.patch_embed
Classes
PatchEmbed | 2D Image to Patch Embedding
Functions
resample_patch_embed | Resample the weights of the patch embedding kernel to target resolution.
Module Contents
- class minerva.models.nets.image.vit_local.patch_embed.PatchEmbed(img_size=224, patch_size=16, in_chans=3, embed_dim=768, norm_layer=None, flatten=True, output_fmt=None, bias=True, strict_img_size=True, dynamic_img_pad=False)
Bases: torch.nn.Module
2D Image to Patch Embedding
Initialize the PatchEmbed module.
Parameters
- img_size : int or Tuple[int, int], default=224
Input image size. If None, image size will be inferred dynamically.
- patch_size : int or Tuple[int, int], default=16
Size of each image patch.
- in_chans : int, default=3
Number of input channels (e.g., 3 for RGB images).
- embed_dim : int, default=768
Dimension of the output patch embeddings.
- norm_layer : Callable, optional
Normalization layer applied to the output embeddings.
- flatten : bool, default=True
If True, flattens patches into a sequence (N, L, C).
- output_fmt : str, optional
Output tensor format. If specified, overrides flatten.
- bias : bool, default=True
Whether to include a bias term in the projection layer.
- strict_img_size : bool, default=True
If True, enforces input images to match the specified size exactly.
- dynamic_img_pad : bool, default=False
If True, applies dynamic padding to images whose size is not divisible by the patch size.
- dynamic_feat_size(img_size)
Return the grid (feature map) size for a given image size, taking dynamic padding into account. Note: this method must remain TorchScript-compatible, so it uses fixed tuple indexing.
- Parameters:
img_size (Tuple[int, int])
- Return type:
Tuple[int, int]
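As a rough illustration of the grid-size computation, the sketch below assumes that dynamic padding rounds each dimension up to the next multiple of the patch size (ceiling division) and that, without padding, any remainder is dropped (floor division). The standalone function `grid_size` and its arguments are hypothetical and are not part of this module.

```python
import math
from typing import Tuple


def grid_size(
    img_size: Tuple[int, int],
    patch_size: Tuple[int, int],
    dynamic_img_pad: bool,
) -> Tuple[int, int]:
    """Hypothetical helper: number of patches along height and width."""
    if dynamic_img_pad:
        # Assumption: padding extends each side up to a multiple of the patch size.
        return (
            math.ceil(img_size[0] / patch_size[0]),
            math.ceil(img_size[1] / patch_size[1]),
        )
    # Assumption: without padding, only whole patches are counted.
    return (img_size[0] // patch_size[0], img_size[1] // patch_size[1])


print(grid_size((224, 224), (16, 16), dynamic_img_pad=False))  # (14, 14)
print(grid_size((230, 225), (16, 16), dynamic_img_pad=True))   # (15, 15)
```

Fixed indexing (`img_size[0]`, `img_size[1]`) rather than tuple unpacking in a loop mirrors the TorchScript constraint mentioned in the method description.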
- dynamic_img_pad: torch.jit.Final[bool]
- forward(x)
Forward pass that converts an input image into patch embeddings.
Parameters
- x : torch.Tensor
Input tensor of shape (B, C, H, W), where B is batch size, C is number of channels, and H, W are spatial dimensions.
Returns
- torch.Tensor
Patch embeddings tensor. Shape depends on output format:
  - If flatten=True: (B, num_patches, embed_dim)
  - If flatten=False and output_fmt='NCHW': (B, embed_dim, H_p, W_p)
  - If using another output format: tensor is converted accordingly.
- Parameters:
x (torch.Tensor)
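The shapes listed for forward can be reproduced with a minimal sketch. It assumes the projection is a convolution whose kernel size and stride both equal the patch size, which is the common design in ViT implementations; the page itself does not confirm what `proj` is, and the variable names below are illustrative only.

```python
import torch
from torch import nn

batch, in_chans, height, width = 2, 3, 224, 224
patch_size, embed_dim = 16, 768

# Assumption: the patch projection is a strided convolution (kernel == stride).
proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

x = torch.randn(batch, in_chans, height, width)
feat = proj(x)                            # (B, embed_dim, H_p, W_p)
tokens = feat.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)

print(feat.shape)    # torch.Size([2, 768, 14, 14])
print(tokens.shape)  # torch.Size([2, 196, 768])
```

Here H_p = W_p = 224 / 16 = 14, so num_patches = 14 * 14 = 196.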
- norm
- output_fmt: timm.layers.format.Format
- patch_size
- proj
- set_input_size(img_size=None, patch_size=None)
- Parameters:
img_size (Optional[Union[int, Tuple[int, int]]])
patch_size (Optional[Union[int, Tuple[int, int]]])
- strict_img_size = True
- Parameters:
img_size (Union[int, Tuple[int, int]])
patch_size (Union[int, Tuple[int, int]])
in_chans (int)
embed_dim (int)
norm_layer (Optional[Callable])
flatten (bool)
output_fmt (Optional[str])
bias (bool)
strict_img_size (bool)
dynamic_img_pad (bool)
- minerva.models.nets.image.vit_local.patch_embed.resample_patch_embed(patch_embed, new_size, interpolation='bicubic', antialias=True)
Resample the weights of the patch embedding kernel to a target resolution. The kernel is resampled by approximately inverting the effect of patch resizing.
With this resizing we can, for example, load a B/8 filter into a B/16 model and, on a 2x larger input image, the result will match.
- Args:
  - patch_embed: original parameter to be resized.
  - new_size (tuple(int, int)): target shape (height, width) only.
  - interpolation (str): interpolation mode for the resize.
  - antialias (bool): use an anti-aliasing filter in the resize.
  - verbose (bool): log the operation. (Listed in the docstring but not present in the signature above.)
- Returns:
Resized patch embedding kernel.
- Parameters:
patch_embed (torch.nn.Parameter)
new_size (List[int])
interpolation (str)
antialias (bool)
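To make "approximately inverting the effect of patch resizing" concrete, here is a minimal sketch of one way to do it: build the matrix of the linear resize operator from basis images, then map the kernel through the pseudo-inverse of its transpose. This is an illustration under stated assumptions, not necessarily how `resample_patch_embed` is implemented; `resize_matrix` and `resample_kernel_sketch` are hypothetical names.

```python
import torch
import torch.nn.functional as F


def resize_matrix(old_size, new_size, mode="bicubic", antialias=True):
    """Matrix of the linear resize operator, shape (new_h*new_w, old_h*old_w)."""
    n_old = old_size[0] * old_size[1]
    # Each basis image contains a single 1; resizing it yields one matrix column.
    basis = torch.eye(n_old).reshape(n_old, 1, *old_size)
    resized = F.interpolate(basis, size=new_size, mode=mode, antialias=antialias)
    return resized.reshape(n_old, -1).T


def resample_kernel_sketch(kernel, new_size):
    """Hypothetical helper: resample a conv kernel of shape (out, in, h, w)."""
    old_size = tuple(kernel.shape[-2:])
    resize_mat = resize_matrix(old_size, new_size)
    # Pseudo-inverse of the transposed resize matrix, shape (n_new, n_old).
    pinv = torch.linalg.pinv(resize_mat.T)
    flat = kernel.reshape(-1, old_size[0] * old_size[1])
    return (flat @ pinv.T).reshape(*kernel.shape[:-2], *new_size)


torch.manual_seed(0)
w_old = torch.randn(8, 3, 4, 4)                # kernel for 4x4 patches
w_new = resample_kernel_sketch(w_old, (8, 8))  # kernel for 8x8 patches

x = torch.randn(1, 3, 4, 4)                    # an image made of a single patch
x_big = F.interpolate(x, size=(8, 8), mode="bicubic", antialias=True)

out_old = F.conv2d(x, w_old, stride=4)
out_new = F.conv2d(x_big, w_new, stride=8)
print(torch.allclose(out_old, out_new, atol=1e-3))
```

The check uses a single-patch image so that resizing the image and resizing the patch are the same operation. On multi-patch images the interpolation mixes pixels across patch borders, so the match is only approximate.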