minerva.data.readers
====================

.. py:module:: minerva.data.readers


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/minerva/data/readers/base_file_iterator/index
   /autoapi/minerva/data/readers/csv_reader/index
   /autoapi/minerva/data/readers/index_reader/index
   /autoapi/minerva/data/readers/mdio_reader/index
   /autoapi/minerva/data/readers/multi_reader/index
   /autoapi/minerva/data/readers/patched_array_reader/index
   /autoapi/minerva/data/readers/png_reader/index
   /autoapi/minerva/data/readers/reader/index
   /autoapi/minerva/data/readers/tabular_reader/index
   /autoapi/minerva/data/readers/tiff_reader/index
   /autoapi/minerva/data/readers/zarr_reader/index


Classes
-------

.. autoapisummary::

   minerva.data.readers.LazyPaddedPatchedArrayReader
   minerva.data.readers.LazyPaddedPatchedMDIOReader
   minerva.data.readers.LazyPaddedPatchedZarrReader
   minerva.data.readers.NumpyArrayReader
   minerva.data.readers.PNGReader
   minerva.data.readers.PatchedArrayReader
   minerva.data.readers.PatchedMDIOReader
   minerva.data.readers.PatchedZarrReader
   minerva.data.readers.TiffReader
   minerva.data.readers._Reader


Package Contents
----------------

.. py:class:: LazyPaddedPatchedArrayReader(data, data_shape, stride = None, pad_width = None, pad_mode = 'constant', pad_kwargs = None)

   Bases: :py:obj:`PatchedArrayReader`


   Reads patches from a NumPy array, applying padding lazily.
   This class is a subclass of `PatchedArrayReader` that pads only when the patch
   requested through `__getitem__` overlaps a padded (boundary) region.
   If no padding is necessary, use `PatchedArrayReader`.

   Parameters
   ----------
   data : ArrayLike
       The input array from which patches are generated.
   data_shape : Tuple[int, ...]
       The shape of the patches to be extracted. This will be the shape of
       the subarray that is returned when a patch is accessed using
       __getitem__.
   stride : Tuple[int, ...], optional
       The stride between consecutive patches. If `None`, the stride will
       be the same as `data_shape`. By default None.
   pad_width : Tuple[Tuple[int, int], ...], optional
       The width of padding to be applied to the data array. By default
       `None`, that is, no padding is applied. Check the documentation of
       `numpy.pad` for more information.
   pad_mode : str, optional
       The padding mode, by default "constant". Check the documentation of
       `numpy.pad` for more information.
   pad_kwargs : dict, optional
       Additional keyword arguments for padding, by default None

   Examples
   --------

   ```python
   >>> import numpy as np
   >>> # Generate a 10x10 array
   >>> data = np.arange(100).reshape(10, 10)
   >>> # Create a reader that generates 5x5 patches with a stride of 2 in the
   >>> # first dimension and 5 in the second dimension.
   >>> reader = PatchedArrayReader(
   ...     data,
   ...     data_shape=(5, 5),
   ...     stride=(2, 5),
   ... )
   >>> # Printing the number of patches that can be extracted from the data
   >>> print(len(reader))
   6
   >>> # Printing the indices of the patches
   >>> print(reader.indices)
   [(0, 0), (0, 5), (2, 0), (2, 5), (4, 0), (4, 5)]
   >>> # Fetch the first patch and print its shape
   >>> print(reader[0].shape)
   (5, 5)
   >>> # Fetch the third patch and print its content
   >>> print(reader[2])
   [[20 21 22 23 24]
    [30 31 32 33 34]
    [40 41 42 43 44]
    [50 51 52 53 54]
    [60 61 62 63 64]]
   ```



   .. py:method:: __getitem__(idx)

      Fetch a patch from the data array.

      Parameters
      ----------
      idx : int
          The index of the patch to be fetched.

      Returns
      -------
      np.ndarray
          The patch that was fetched from the data array with shape
          `data_shape`



   .. py:method:: _get_patches()

      Compute the upper-left corner indices of the patches that will be
      extracted from the data array. Consecutive patches are separated by
      the stride. A list of indices is returned, where each index is a tuple
      of integers giving the coordinates of the upper-left corner of a
      patch.

      Returns
      -------
      List[Tuple[int, ...]]
          A list of indices (coordinates) of the upper-left corners
          of the patches.



   .. py:method:: _pad_data(pad_width, mode='constant', **kwargs)

      Apply padding to the data array.

      Parameters
      ----------
      pad_width : Tuple[Tuple[int, int], ...]
          The width of padding to be applied to the data array
      mode : str, optional
          The padding mode, by default "constant"
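
The lazy padding described for `LazyPaddedPatchedArrayReader` can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `read_patch_lazily` and its signature are hypothetical, and it assumes that every patch overlaps the array (pad widths smaller than the patch). It matches eager padding for the "constant" mode; modes that depend on values outside the window may differ.

```python
import numpy as np


def read_patch_lazily(data, corner, patch_shape, pad_width, mode="constant", **pad_kwargs):
    """Return one patch, padding only the window that is actually read.

    Hypothetical helper. `corner` is the upper-left corner of the patch in
    padded coordinates. Assumes the patch overlaps `data` in every dimension.
    """
    slices, local_pad = [], []
    for start, size, (before, _after), dim in zip(corner, patch_shape, pad_width, data.shape):
        lo = start - before  # patch start in unpadded coordinates
        hi = lo + size       # patch end (exclusive) in unpadded coordinates
        clipped_lo, clipped_hi = max(lo, 0), min(hi, dim)
        slices.append(slice(clipped_lo, clipped_hi))
        # How far the patch sticks out of the array on each side.
        local_pad.append((clipped_lo - lo, hi - clipped_hi))
    window = np.asarray(data[tuple(slices)])
    # Interior patches need no padding at all.
    if any(before or after for before, after in local_pad):
        window = np.pad(window, local_pad, mode=mode, **pad_kwargs)
    return window


data = np.arange(100).reshape(10, 10)
pad_width = ((1, 1), (0, 2))
patch = read_patch_lazily(data, (0, 0), (5, 5), pad_width)
print(patch.shape)  # (5, 5)
```

Only the small window is copied and padded, so the full array never has to be materialized in memory.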



.. py:class:: LazyPaddedPatchedMDIOReader(*args, path, mdio_data = 'data/chunked_012', **kwargs)

   Bases: :py:obj:`minerva.data.readers.zarr_reader.LazyPaddedPatchedZarrReader`


   Reads patches from an MDIO array. This class is a subclass of
   `LazyPaddedPatchedZarrReader` and is designed to read patches from the
   data Zarr array inside the MDIO array subdirectory,
   performing padding lazily (padding is done in the `__getitem__` call).
   If no padding is necessary, use `PatchedMDIOReader`.

   All other parameters are the same as in the parent class.
   Please refer to the documentation of `LazyPaddedPatchedZarrReader` for more
   information.

   Parameters
   ----------
   path : PathLike
       Path to the MDIO array.
   mdio_data : str
       Name of the data array inside the MDIO subdirectory. Defaults to 'data/chunked_012'.

   Notes
   -----
   1.  The MDIO array is expected to be stored on disk. If the array is not
       stored on disk, it is recommended to use the `LazyPaddedPatchedArrayReader`
       class instead.

   2.  Padding is computed when necessary during `__getitem__` calls.

   Examples
   --------

   ```python
   >>> from pathlib import Path
   >>> data_path = Path("data.mdio")
   >>> reader = LazyPaddedPatchedMDIOReader(
   ...     path=data_path,
   ...     data_shape=(5, 5),
   ...     stride=(2, 5),
   ...     pad_width=((1,1), (0,2)),
   ... )
   >>> print(len(reader))
   >>> print(reader[0])
   ```



   .. py:attribute:: mdio_path


.. py:class:: LazyPaddedPatchedZarrReader(*args, path, **kwargs)

   Bases: :py:obj:`minerva.data.readers.patched_array_reader.LazyPaddedPatchedArrayReader`


   Reads patches from a Zarr array. This class is a subclass of
   `LazyPaddedPatchedArrayReader` and is designed to read patches from a Zarr array,
   performing padding lazily (padding is done in the `__getitem__` call).
   If no padding is necessary, use `PatchedZarrReader`.

   All other parameters are the same as in the parent class.
   Please refer to the documentation of `LazyPaddedPatchedArrayReader` for more
   information. Use this class when padding is necessary and the whole dataset does not fit into memory.

   Parameters
   ----------
   path : PathLike
       Path to the Zarr array.

   Notes
   -----
   1.  The Zarr array is expected to be stored on disk. If the array is not
       stored on disk, it is recommended to use the `LazyPaddedPatchedArrayReader`
       class instead.

   2.  Padding is computed when necessary during `__getitem__` calls.

   Examples
   --------

   ```python
   >>> from pathlib import Path
   >>> data_path = Path("data.zarr")
   >>> reader = LazyPaddedPatchedZarrReader(
   ...     path=data_path,
   ...     data_shape=(5, 5),
   ...     stride=(2, 5),
   ...     pad_width=((1,1), (0,2)),
   ... )
   >>> print(len(reader))
   >>> print(reader[0])
   ```



   .. py:attribute:: path


.. py:class:: NumpyArrayReader(data, data_shape, stride = None, pad_width = None, pad_mode = 'constant', pad_kwargs = None, allow_pickle = True, npz_key = None)

   Bases: :py:obj:`PatchedArrayReader`


   This class is used to read data from a NumPy array. It is designed to generate
   patches from the data and provides sequential access to them. This class can
   serve as a base class for other readers.

   Assumptions:

   - The input data is expected to be a NumPy-like array, that is, it should
     support NumPy-like indexing.
   - Patches are fixed-size subarrays of the data.
   - Patches can overlap.

   Parameters
   ----------
   data : ArrayLike
       The input array from which patches are generated.
   data_shape : Tuple[int, ...]
       The shape of the patches to be extracted. This will be the shape of
       the subarray that is returned when a patch is accessed using
       __getitem__.
   stride : Tuple[int, ...], optional
       The stride between consecutive patches. If `None`, the stride will
       be the same as `data_shape`. By default None.
   pad_width : Tuple[Tuple[int, int], ...], optional
       The width of padding to be applied to the data array. By default
       `None`, that is, no padding is applied. Check the documentation of
       `numpy.pad` for more information.
   pad_mode : str, optional
       The padding mode, by default "constant". Check the documentation of
       `numpy.pad` for more information.
   pad_kwargs : dict, optional
       Additional keyword arguments for padding, by default None

   Examples
   --------

   ```python
   >>> import numpy as np
   >>> # Generate a 10x10 array
   >>> data = np.arange(100).reshape(10, 10)
   >>> # Create a reader that generates 5x5 patches with a stride of 2 in the
   >>> # first dimension and 5 in the second dimension.
   >>> reader = PatchedArrayReader(
   ...     data,
   ...     data_shape=(5, 5),
   ...     stride=(2, 5),
   ... )
   >>> # Printing the number of patches that can be extracted from the data
   >>> print(len(reader))
   6
   >>> # Printing the indices of the patches
   >>> print(reader.indices)
   [(0, 0), (0, 5), (2, 0), (2, 5), (4, 0), (4, 5)]
   >>> # Fetch the first patch and print its shape
   >>> print(reader[0].shape)
   (5, 5)
   >>> # Fetch the third patch and print its content
   >>> print(reader[2])
   [[20 21 22 23 24]
    [30 31 32 33 34]
    [40 41 42 43 44]
    [50 51 52 53 54]
    [60 61 62 63 64]]
   ```



.. py:class:: PNGReader(path, sort_method = None, delimiter = None, key_index = 0, reverse = False)

   Bases: :py:obj:`minerva.data.readers.base_file_iterator.BaseFileIterator`


   Loads PNG image files from a directory, iterating over them in a custom
   sorted order.

   Parameters
   ----------
   path : Union[Path, str]
       The path to the directory containing the image files. Files will be
       searched recursively.
   sort_method : Optional[List[str]], optional
       A list specifying how to sort each part of the filename. Each
       element can be either "text" (lexicographical) or "numeric"
       (numerical). By default None, which will use "numeric" if
       numeric parts are detected.
   delimiter : Optional[str], optional
       The delimiter to split filenames into components, by default None.
   key_index : Union[int, List[int]], optional
       The index (or list of indices) of the part(s) of the filename to
       use for sorting. If a list is provided, files will be sorted
       by multiple parts in sequence: first by the part at
       index 0, then by the part at index 1, and so on. By default 0.
   reverse : bool, optional
       Whether to sort in reverse order, by default False.

   Raises
   ------
   NotADirectoryError
       If the path is not a directory.


   .. py:method:: __getitem__(index)

      Retrieve the PNG file at the specified index.



   .. py:method:: __str__()


   .. py:attribute:: root_dir
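
The sorting parameters of `PNGReader` (`sort_method`, `delimiter`, `key_index`, `reverse`) can be illustrated with a small standalone sketch. It is not the library's code: the helper `make_sort_key` is hypothetical, and it assumes that `sort_method` holds one entry per filename part and that the file extension is ignored when splitting.

```python
from pathlib import Path


def make_sort_key(delimiter, key_index, sort_method):
    """Build a key function for `sorted` (hypothetical helper)."""
    indices = [key_index] if isinstance(key_index, int) else list(key_index)

    def key(path):
        # Split the filename (without extension) into its parts.
        parts = Path(path).stem.split(delimiter)
        values = []
        for position in indices:
            part = parts[position]
            # "numeric" parts compare as integers, "text" parts as strings.
            values.append(int(part) if sort_method[position] == "numeric" else part)
        return tuple(values)

    return key


files = ["scan_10_b.png", "scan_2_b.png", "scan_2_a.png", "scan_1_c.png"]
methods = ["text", "numeric", "text"]

# Sort by the numeric part only; ties keep their original order.
by_number = sorted(files, key=make_sort_key("_", 1, methods))
# Sort by the numeric part, then by the trailing letter.
by_number_then_letter = sorted(files, key=make_sort_key("_", [1, 2], methods))
# Same ordering, reversed.
descending = sorted(files, key=make_sort_key("_", [1, 2], methods), reverse=True)
```

Numeric comparison places `scan_2_*` before `scan_10_*`, which a plain lexicographical sort would not do.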


.. py:class:: PatchedArrayReader(data, data_shape, stride = None, pad_width = None, pad_mode = 'constant', pad_kwargs = None)

   Bases: :py:obj:`minerva.data.readers.reader._Reader`


   This class is used to read data from a NumPy array. It is designed to generate
   patches from the data and provides sequential access to them. This class can
   serve as a base class for other readers.

   Assumptions:

   - The input data is expected to be a NumPy-like array, that is, it should
     support NumPy-like indexing.
   - Patches are fixed-size subarrays of the data.
   - Patches can overlap.

   Parameters
   ----------
   data : ArrayLike
       The input array from which patches are generated.
   data_shape : Tuple[int, ...]
       The shape of the patches to be extracted. This will be the shape of
       the subarray that is returned when a patch is accessed using
       __getitem__.
   stride : Tuple[int, ...], optional
       The stride between consecutive patches. If `None`, the stride will
       be the same as `data_shape`. By default None.
   pad_width : Tuple[Tuple[int, int], ...], optional
       The width of padding to be applied to the data array. By default
       `None`, that is, no padding is applied. Check the documentation of
       `numpy.pad` for more information.
   pad_mode : str, optional
       The padding mode, by default "constant". Check the documentation of
       `numpy.pad` for more information.
   pad_kwargs : dict, optional
       Additional keyword arguments for padding, by default None

   Examples
   --------

   ```python
   >>> import numpy as np
   >>> # Generate a 10x10 array
   >>> data = np.arange(100).reshape(10, 10)
   >>> # Create a reader that generates 5x5 patches with a stride of 2 in the
   >>> # first dimension and 5 in the second dimension.
   >>> reader = PatchedArrayReader(
   ...     data,
   ...     data_shape=(5, 5),
   ...     stride=(2, 5),
   ... )
   >>> # Printing the number of patches that can be extracted from the data
   >>> print(len(reader))
   6
   >>> # Printing the indices of the patches
   >>> print(reader.indices)
   [(0, 0), (0, 5), (2, 0), (2, 5), (4, 0), (4, 5)]
   >>> # Fetch the first patch and print its shape
   >>> print(reader[0].shape)
   (5, 5)
   >>> # Fetch the third patch and print its content
   >>> print(reader[2])
   [[20 21 22 23 24]
    [30 31 32 33 34]
    [40 41 42 43 44]
    [50 51 52 53 54]
    [60 61 62 63 64]]
   ```



   .. py:method:: __getitem__(idx)

      Fetch a patch from the data array.

      Parameters
      ----------
      idx : int
          The index of the patch to be fetched.

      Returns
      -------
      np.ndarray
          The patch that was fetched from the data array with shape
          `data_shape`



   .. py:method:: __len__()

      Return the number of patches that can be extracted from the data
      array.

      Returns
      -------
      int
          The number of patches that can be extracted from the data array.



   .. py:method:: __str__()


   .. py:method:: _get_patches()

      Compute the upper-left corner indices of the patches that will be
      extracted from the data array. Consecutive patches are separated by
      the stride. A list of indices is returned, where each index is a tuple
      of integers giving the coordinates of the upper-left corner of a
      patch.

      Returns
      -------
      List[Tuple[int, ...]]
          A list of indices (coordinates) of the upper-left corners
          of the patches.



   .. py:method:: _pad_data(pad_width, mode='constant', **kwargs)

      Apply padding to the data array.

      Parameters
      ----------
      pad_width : Tuple[Tuple[int, int], ...]
          The width of padding to be applied to the data array
      mode : str, optional
          The padding mode, by default "constant"



   .. py:attribute:: data


   .. py:attribute:: data_shape


   .. py:attribute:: indices
      :value: []



   .. py:attribute:: pad_kwargs


   .. py:attribute:: pad_mode
      :value: 'constant'



   .. py:attribute:: pad_width
      :value: None



   .. py:attribute:: shape


   .. py:attribute:: stride


.. py:class:: PatchedMDIOReader(*args, path, mdio_data = 'data/chunked_012', **kwargs)

   Bases: :py:obj:`minerva.data.readers.zarr_reader.PatchedZarrReader`


   Reads patches from an MDIO array. This class is a subclass of
   `PatchedZarrReader` and is designed to read patches from the
   data Zarr array inside the MDIO array subdirectory.

   All other parameters are the same as in the parent class.
   Please refer to the documentation of `PatchedZarrReader` for more
   information.

   Parameters
   ----------
   path : PathLike
       Path to the MDIO array.
   mdio_data : str
       Name of the data array inside the MDIO subdirectory. Defaults to 'data/chunked_012'.

   Notes
   -----
   1.  The MDIO array is expected to be stored on disk. If the array is not
       stored on disk, it is recommended to use the `PatchedArrayReader`
       class instead.

   2.  When using padding, the padding is applied to the entire array. This
       will load the entire array into memory. If the array is too large to
       fit into memory, it is recommended to pad the data beforehand.
       See `LazyPaddedPatchedMDIOReader` for a lazy alternative.

   Examples
   --------

   ```python
   >>> from pathlib import Path
   >>> data_path = Path("data.mdio")
   >>> reader = PatchedMDIOReader(
   ...     path=data_path,
   ...     data_shape=(5, 5),
   ...     stride=(2, 5),
   ... )
   >>> print(len(reader))
   >>> print(reader[0])
   ```



   .. py:attribute:: mdio_path


.. py:class:: PatchedZarrReader(*args, path, **kwargs)

   Bases: :py:obj:`minerva.data.readers.patched_array_reader.PatchedArrayReader`


   Reads patches from a Zarr array. This class is a subclass of
   `PatchedArrayReader` and is designed to read patches from a Zarr array.

   All other parameters are the same as in the parent class.
   Please refer to the documentation of `PatchedArrayReader` for more
   information.

   Parameters
   ----------
   path : PathLike
       Path to the Zarr array.

   Notes
   -----
   1.  The Zarr array is expected to be stored on disk. If the array is not
       stored on disk, it is recommended to use the `PatchedArrayReader`
       class instead.

   2.  When using padding, the padding is applied to the entire array. This
       will load the entire array into memory. If the array is too large to
       fit into memory, it is recommended to pad the data beforehand.
       See `LazyPaddedPatchedZarrReader` for a lazy alternative.

   Examples
   --------

   ```python
   >>> from pathlib import Path
   >>> data_path = Path("data.zarr")
   >>> reader = PatchedZarrReader(
   ...     path=data_path,
   ...     data_shape=(5, 5),
   ...     stride=(2, 5),
   ... )
   >>> print(len(reader))
   >>> print(reader[0])
   ```



   .. py:attribute:: path
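
The memory note for `PatchedZarrReader` follows from how `numpy.pad` works: it returns a new, larger array. The sketch below shows the `pad_width` format used throughout this module, one `(before, after)` pair per dimension. Passing `constant_values` through `pad_kwargs` is an assumption based on the parameter descriptions, not something this page confirms.

```python
import numpy as np

data = np.arange(6).reshape(2, 3)

# One (before, after) pair per dimension: 1 row above and below,
# no column on the left, 2 columns on the right.
pad_width = ((1, 1), (0, 2))
padded = np.pad(data, pad_width, mode="constant", constant_values=0)

print(data.shape)    # (2, 3)
print(padded.shape)  # (4, 5)

# The padded result is a separate array, so the full data is held in memory.
print(np.shares_memory(data, padded))  # False
```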


.. py:class:: TiffReader(path, sort_method = None, delimiter = None, key_index = 0, reverse = False)

   Bases: :py:obj:`minerva.data.readers.base_file_iterator.BaseFileIterator`


   Loads TIFF image files from a directory, iterating over them in a custom
   sorted order.

   Parameters
   ----------
   path : Union[Path, str]
       The path to the directory containing the image files. Files will be
       searched recursively.
   sort_method : Optional[List[str]], optional
       A list specifying how to sort each part of the filename. Each
       element can be either "text" (lexicographical) or "numeric"
       (numerical). By default None, which will use "numeric" if
       numeric parts are detected.
   delimiter : Optional[str], optional
       The delimiter to split filenames into components, by default None.
   key_index : Union[int, List[int]], optional
       The index (or list of indices) of the part(s) of the filename to
       use for sorting. If a list is provided, files will be sorted
       by multiple parts in sequence: first by the part at
       index 0, then by the part at index 1, and so on. By default 0.
   reverse : bool, optional
       Whether to sort in reverse order, by default False.

   Raises
   ------
   NotADirectoryError
       If the path is not a directory.


   .. py:method:: __getitem__(index)

      Retrieve the TIFF file at the specified index.



   .. py:method:: __str__()


   .. py:attribute:: root_dir


.. py:class:: _Reader

   Base class for readers. Readers define an ordered collection of data and
   provide methods to access it. This class primarily handles:

   1. Definition of data structure and storage.
   2. Reading data from the source.

   Access is handled by the `__getitem__` and `__len__` methods, which should be
   implemented by a subclass. Readers usually return a single item at a time,
   such as a single image or a single label.


   .. py:method:: __getitem__(index)
      :abstractmethod:


      Retrieve an item from the reader at the specified index.

      Parameters
      ----------
      index : int
          Index of the item to retrieve.

      Returns
      -------
      Any
          An item from the reader.



   .. py:method:: __len__()
      :abstractmethod:


      Get the length of the reader.

      Returns
      -------
      int
          The length of the reader.
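
The contract described for `_Reader` can be sketched with a small subclass. So that the example runs without minerva installed, it uses a stand-in abstract base (`ReaderBase`) that mirrors the two abstract methods; both `ReaderBase` and `ListReader` are hypothetical names, not part of the package.

```python
from abc import ABC, abstractmethod


class ReaderBase(ABC):
    """Stand-in for `_Reader`: an ordered collection with indexed access."""

    @abstractmethod
    def __getitem__(self, index):
        """Return the item at `index`."""

    @abstractmethod
    def __len__(self):
        """Return the number of items."""


class ListReader(ReaderBase):
    """Minimal reader that serves items from an in-memory list."""

    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, index):
        return self.items[index]

    def __len__(self):
        return len(self.items)


reader = ListReader(["a.png", "b.png", "c.png"])
print(len(reader))  # 3
print(reader[1])    # b.png
```

A subclass that omits either method cannot be instantiated, which is how the abstract base enforces the interface.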