minerva.models.nets.image.vit_local.patch_embed
===============================================

.. py:module:: minerva.models.nets.image.vit_local.patch_embed


Classes
-------

.. autoapisummary::

   minerva.models.nets.image.vit_local.patch_embed.PatchEmbed


Functions
---------

.. autoapisummary::

   minerva.models.nets.image.vit_local.patch_embed.resample_patch_embed


Module Contents
---------------

.. py:class:: PatchEmbed(img_size = 224, patch_size = 16, in_chans = 3, embed_dim = 768, norm_layer = None, flatten = True, output_fmt = None, bias = True, strict_img_size = True, dynamic_img_pad = False)

   Bases: :py:obj:`torch.nn.Module`


   2D image-to-patch embedding.

   Initialize the PatchEmbed module.

   Parameters
   ----------
   img_size : int or Tuple[int, int], default=224
       Input image size. If None, the image size is inferred dynamically.
   patch_size : int or Tuple[int, int], default=16
       Size of each image patch.
   in_chans : int, default=3
       Number of input channels (e.g., 3 for RGB images).
   embed_dim : int, default=768
       Dimension of the output patch embeddings.
   norm_layer : Callable, optional
       Normalization layer applied to the output embeddings.
   flatten : bool, default=True
       If True, flattens the patches into a sequence of shape (N, L, C).
   output_fmt : str, optional
       Output tensor format. If specified, it overrides ``flatten``.
   bias : bool, default=True
       Whether to include a bias term in the projection layer.
   strict_img_size : bool, default=True
       If True, input images must match the specified size exactly.
   dynamic_img_pad : bool, default=False
       If True, applies dynamic padding to images whose size is not divisible by the patch size.


   .. py:method:: _init_img_size(img_size)


   .. py:method:: dynamic_feat_size(img_size)

      Get the grid (feature) size for a given image size, taking dynamic padding into account.

      NOTE: must be TorchScript compatible, so fixed tuple indexing is used.


   .. py:attribute:: dynamic_img_pad
      :type: torch.jit.Final[bool]


   .. py:method:: feat_ratio(as_scalar=True)


   .. py:method:: forward(x)

      Forward pass that converts an input image into patch embeddings.

      Parameters
      ----------
      x : torch.Tensor
          Input tensor of shape (B, C, H, W), where B is the batch size, C is the number of channels, and H, W are the spatial dimensions.

      Returns
      -------
      torch.Tensor
          Patch embeddings tensor. Its shape depends on the output format:

          - If ``flatten=True``: (B, num_patches, embed_dim)
          - If ``flatten=False`` and ``output_fmt='NCHW'``: (B, embed_dim, H_p, W_p)
          - For any other output format, the tensor is converted accordingly.


   .. py:attribute:: norm


   .. py:attribute:: output_fmt
      :type: timm.layers.format.Format


   .. py:attribute:: patch_size


   .. py:attribute:: proj


   .. py:method:: set_input_size(img_size = None, patch_size = None)


   .. py:attribute:: strict_img_size
      :value: True


.. py:function:: resample_patch_embed(patch_embed, new_size, interpolation = 'bicubic', antialias = True)

   Resample the weights of the patch embedding kernel to the target resolution.

   The patch embedding kernel is resampled by approximately inverting the effect of patch resizing.

   Code based on: https://github.com/google-research/big_vision/blob/b00544b81f8694488d5f36295aeb7972f3755ffe/big_vision/models/proj/flexi/vit.py

   With this resizing we can, for example, load a B/8 filter into a B/16 model and, on a 2x larger input image, the result will match.

   Args:
       patch_embed: original parameter to be resized.
       new_size (tuple(int, int)): target shape (height, width) only.
       interpolation (str): interpolation mode used for resizing.
       antialias (bool): whether to use an anti-aliasing filter when resizing.
       verbose (bool): whether to log the operation.

   Returns:
       Resized patch embedding kernel.
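The pseudo-inverse idea behind ``resample_patch_embed`` can be illustrated with a small NumPy sketch. This is not Minerva's implementation. It assumes NumPy only, replaces the default bicubic, anti-aliased resize with plain separable linear interpolation, and the helpers ``linear_resize_matrix`` and ``resample_kernel_pinv`` are hypothetical names. The resize is written as a matrix ``R`` acting on flattened patches, and the new kernel is the minimum-norm solution of ``R.T @ w_new = w_old``. As a result, ``<x, w_old> == <R @ x, w_new>`` holds whenever ``R`` has full column rank, as it does when upsampling 8x8 patches to 16x16.

```python
import numpy as np

# Hypothetical helpers for illustration; not part of Minerva or timm.


def linear_resize_matrix(old: int, new: int) -> np.ndarray:
    """Return the (new, old) matrix that linearly resizes a 1D signal.

    Sample positions use half-pixel centres; np.interp clamps positions
    outside the original grid to the edge values.
    """
    xp = np.arange(old, dtype=np.float64)
    xq = (np.arange(new, dtype=np.float64) + 0.5) * old / new - 0.5
    basis = np.eye(old)
    # Column j is the resized version of the j-th basis vector.
    return np.stack([np.interp(xq, xp, basis[j]) for j in range(old)], axis=1)


def resample_kernel_pinv(kernel: np.ndarray, new_size: tuple) -> np.ndarray:
    """Resample an (out_ch, in_ch, h, w) kernel to new_size via a pseudo-inverse."""
    out_ch, in_ch, h, w = kernel.shape
    new_h, new_w = new_size
    # Separable resize of a row-major flattened patch: R = kron(R_h, R_w).
    resize_mat = np.kron(linear_resize_matrix(h, new_h), linear_resize_matrix(w, new_w))
    # Minimum-norm solution of resize_mat.T @ w_new = w_old for every filter.
    pinv = np.linalg.pinv(resize_mat.T)  # shape (new_h * new_w, h * w)
    flat = kernel.reshape(out_ch * in_ch, h * w)
    return (flat @ pinv.T).reshape(out_ch, in_ch, new_h, new_w)


# Usage: turn an 8x8 kernel into a 16x16 one and compare the responses.
rng = np.random.default_rng(0)
kernel = rng.standard_normal((4, 3, 8, 8))
new_kernel = resample_kernel_pinv(kernel, (16, 16))

r = linear_resize_matrix(8, 16)
patch = rng.standard_normal((3, 8, 8))
big_patch = np.stack([r @ c @ r.T for c in patch])  # the same patch, upsampled 2x

old_out = np.einsum("chw,ochw->o", patch, kernel)
new_out = np.einsum("chw,ochw->o", big_patch, new_kernel)
print(np.allclose(old_out, new_out))
```

When the target patch is smaller than the original (downsampling), ``R`` no longer has full column rank. In that case the match becomes a least-squares approximation rather than an exact identity.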