minerva.data.readers
Submodules
Classes
- PNGReader: This class loads a PNG file from a directory.
- PatchedArrayReader: This class is used to read data from a NumPy array.
- PatchedZarrReader: Reads patches from a Zarr array.
- TiffReader: This class loads a TIFF file from a directory.
- _Reader: Base class for readers.
Package Contents
- class minerva.data.readers.PNGReader(path)
Bases:
minerva.data.readers.reader._Reader
This class loads PNG files from a directory. It assumes that each PNG file is named with a number, starting from 0, as shown below.
    /path/
    ├── 0.png
    ├── 1.png
    ├── 2.png
    └── ...
Thus, the element at index i will be the file i.png.
Parameters
- path : Union[Path, str]
Path to the directory containing the PNG files.
- __getitem__(index)
Retrieve the PNG file at the specified index. The index will be used as the filename of the PNG file.
Parameters
- index : int
Index of the PNG file to retrieve.
Returns
- np.ndarray
The PNG file as a NumPy array.
Raises
- ValueError
If the specified file does not exist in the given path.
- Parameters:
index (int)
- Return type:
numpy.ndarray
- Parameters:
path (Union[pathlib.Path, str])
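The index-to-filename convention described for PNGReader can be sketched with the standard library alone. The helper `resolve_indexed_file` below is hypothetical and not part of minerva; it only illustrates how index i maps to the file i.png and when a ValueError would be raised. Decoding the image into a NumPy array is deliberately left out.

```python
from pathlib import Path
from typing import Union


def resolve_indexed_file(
    root: Union[Path, str], index: int, suffix: str = ".png"
) -> Path:
    """Map an integer index to `<root>/<index><suffix>` (hypothetical helper)."""
    file_path = Path(root) / f"{index}{suffix}"
    if not file_path.is_file():
        # Mirrors the documented behaviour: a missing file raises ValueError.
        raise ValueError(f"File {file_path} does not exist.")
    return file_path
```

Passing `suffix=".tiff"` would give the same lookup for the layout described under TiffReader.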
- class minerva.data.readers.PatchedArrayReader(data, data_shape, stride=None, pad_width=None, pad_mode='constant', pad_kwargs=None)
Bases:
minerva.data.readers.reader._Reader
This class is used to read data from a NumPy array. It is designed to generate patches from the data and provides sequential access to them. This class can serve as a base class for other readers.
Assumptions:
- The input data is expected to be a NumPy-like array, that is, it should support NumPy-like indexing.
- Patches are fixed-size subarrays of the data.
- Patches can have overlap between them.
Reads data from a NumPy array and generates patches from it.
Parameters
- data : ArrayLike
The input array from which patches are generated.
- data_shape : Tuple[int, ...]
The shape of the patches to be extracted. This is the shape of the subarray returned when a patch is accessed using __getitem__.
- stride : Tuple[int, ...], optional
The stride between consecutive patches. If None, the stride will be the same as data_shape. By default None.
- pad_width : Tuple[Tuple[int, int], ...], optional
The width of padding to be applied to the data array. By default None, that is, no padding is applied. Check the documentation of numpy.pad for more information.
- pad_mode : str, optional
The padding mode, by default "constant". Check the documentation of numpy.pad for more information.
- pad_kwargs : dict, optional
Additional keyword arguments for padding, by default None.
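Because pad_width, pad_mode and pad_kwargs are described in terms of numpy.pad, the following sketch shows what such a call looks like on its own. How PatchedArrayReader forwards these arguments internally is an assumption here, and the values are purely illustrative.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

# One (before, after) pair per dimension, as numpy.pad expects.
pad_width = ((1, 1), (0, 2))
pad_mode = "constant"
pad_kwargs = {"constant_values": -1}

# Assumed forwarding: the extra keyword arguments go straight to numpy.pad.
padded = np.pad(data, pad_width, mode=pad_mode, **pad_kwargs)

print(padded.shape)  # (5, 6)
```

The padded shape grows by the sum of each (before, after) pair, so patch indices are computed over a 5x6 array rather than the original 3x4 one.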
Examples
```python
>>> import numpy as np
>>> from minerva.data.readers import PatchedArrayReader
>>> # Generate a 10x10 array
>>> data = np.arange(100).reshape(10, 10)
>>> # Create a reader that generates 5x5 patches with a stride of 2 in the
>>> # first dimension and 5 in the second dimension.
>>> reader = PatchedArrayReader(
...     data,
...     data_shape=(5, 5),
...     stride=(2, 5),
... )
>>> # Print the number of patches that can be extracted from the data
>>> print(len(reader))
6
>>> # Print the indices of the patches
>>> print(reader.indices)
[(0, 0), (0, 5), (2, 0), (2, 5), (4, 0), (4, 5)]
>>> # Fetch the first patch and print its shape
>>> print(reader[0].shape)
(5, 5)
>>> # Fetch the third patch and print its content
>>> print(reader[2])
[[20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]
 [50 51 52 53 54]
 [60 61 62 63 64]]
```
- __getitem__(idx)
Fetch a patch from the data array.
Parameters
- idx : int
The index of the patch to be fetched.
Returns
- np.ndarray
The patch that was fetched from the data array with shape data_shape
- Parameters:
idx (int)
- Return type:
numpy.ndarray
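Fetching a patch amounts to slicing the array starting at the patch's upper-left corner. This is a minimal sketch, assuming the reader keeps one corner coordinate per patch; `extract_patch` is a hypothetical helper and not the library's code.

```python
from typing import Tuple

import numpy as np


def extract_patch(
    data: np.ndarray, corner: Tuple[int, ...], data_shape: Tuple[int, ...]
) -> np.ndarray:
    """Return the subarray of shape `data_shape` that starts at `corner`."""
    # Build one slice per dimension: [start, start + size).
    slices = tuple(slice(c, c + s) for c, s in zip(corner, data_shape))
    return data[slices]


data = np.arange(100).reshape(10, 10)
patch = extract_patch(data, corner=(2, 0), data_shape=(5, 5))
print(patch.shape)  # (5, 5)
```

With corner (2, 0) this selects rows 2 to 6 and columns 0 to 4, which matches the third patch printed in the PatchedArrayReader example.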
- __len__()
Return the number of patches that can be extracted from the data array.
Returns
- int
The number of patches that can be extracted from the data array.
- Return type:
int
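The patch count can be reasoned about per dimension. The sketch below assumes that only full patches are counted (no partial patches at the border) and that the array is at least as large as one patch in every dimension; this is consistent with the documented example, which yields 6 patches, but `count_patches` itself is a hypothetical helper.

```python
import math
from typing import Tuple


def count_patches(
    array_shape: Tuple[int, ...],
    data_shape: Tuple[int, ...],
    stride: Tuple[int, ...],
) -> int:
    """Count full patches, assuming partial border patches are dropped."""
    # Along each dimension: floor((n - p) / s) + 1 valid start positions.
    per_dim = [
        (n - p) // s + 1 for n, p, s in zip(array_shape, data_shape, stride)
    ]
    return math.prod(per_dim)


print(count_patches((10, 10), (5, 5), (2, 5)))  # 6
```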
- _get_patches()
Compute the upper-left corner indices of the patches that will be extracted from the data array. Consecutive patches are separated by the configured stride. A list of indices is returned, where each index is a tuple of integers giving the coordinates of the upper-left corner of one patch.
Returns
- List[Tuple[int, ...]]
A list of indices (coordinates) representing the upper-left corner of each patch.
- Return type:
List[Tuple[int, ...]]
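One way to enumerate such corner coordinates is a Cartesian product of per-dimension start positions. This is a sketch under the assumption that only full patches are kept; `patch_corners` is a hypothetical name, and the actual _get_patches implementation may differ.

```python
from itertools import product
from typing import List, Tuple


def patch_corners(
    array_shape: Tuple[int, ...],
    data_shape: Tuple[int, ...],
    stride: Tuple[int, ...],
) -> List[Tuple[int, ...]]:
    """List the upper-left corner of every full patch, in row-major order."""
    # Valid start positions along each dimension.
    ranges = [
        range(0, n - p + 1, s)
        for n, p, s in zip(array_shape, data_shape, stride)
    ]
    return list(product(*ranges))


corners = patch_corners((10, 10), (5, 5), (2, 5))
print(corners)
# [(0, 0), (0, 5), (2, 0), (2, 5), (4, 0), (4, 5)]
```

For the 10x10 array with 5x5 patches and stride (2, 5), this reproduces the indices shown in the class example.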
- Parameters:
data (numpy.typing.ArrayLike)
data_shape (Tuple[int, ...])
stride (Tuple[int, ...])
pad_width (Tuple[Tuple[int, int], ...])
pad_mode (str)
pad_kwargs (dict)
- class minerva.data.readers.PatchedZarrReader(*args, path, **kwargs)
Bases:
minerva.data.readers.patched_array_reader.PatchedArrayReader
Reads patches from a Zarr array. This class is a subclass of PatchedArrayReader. Apart from path, all parameters are the same as in the parent class. Please refer to the documentation of PatchedArrayReader for more information.
Parameters
- path : PathLike
Path to the Zarr array.
Notes
The Zarr array is expected to be stored on disk. If the array is not stored on disk, it is recommended to use the PatchedArrayReader class instead.
When using padding, the padding is applied to the entire array, which loads the entire array into memory. If the array is too large to fit into memory, it is recommended to pad the data beforehand rather than through the reader.
Examples
```python
>>> from pathlib import Path
>>> from minerva.data.readers import PatchedZarrReader
>>> data_path = Path("data.zarr")
>>> reader = PatchedZarrReader(
...     path=data_path,
...     data_shape=(5, 5),
...     stride=(2, 5),
... )
>>> print(len(reader))
>>> print(reader[0])
```
- Parameters:
path (minerva.utils.typing.PathLike)
- class minerva.data.readers.TiffReader(path)
Bases:
minerva.data.readers.reader._Reader
This class loads TIFF files from a directory. It assumes that each TIFF file is named with a number, starting from 0, as shown below.
    /path/
    ├── 0.tiff
    ├── 1.tiff
    ├── 2.tiff
    └── ...
Thus, the element at index i will be the file i.tiff.
- Parameters:
path (str)
- __getitem__(index)
Retrieve the TIFF file at the specified index. The index will be used as the filename of the TIFF file.
Parameters
- index : int
Index of the TIFF file to retrieve.
Returns
- np.ndarray
The TIFF file as a NumPy array.
Raises
- ValueError
If the specified file does not exist in the given path.
- Parameters:
index (Union[int, slice])
- Return type:
numpy.ndarray
- class minerva.data.readers._Reader
Base class for readers. Readers define an ordered collection of data and provide methods to access it. This class primarily handles:
- Definition of data structure and storage.
- Reading data from the source.
Access is handled by the __getitem__ and __len__ methods, which should be implemented by a subclass. Readers usually return a single item at a time, which can be a single image, a single label, etc.
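The contract described above can be illustrated with a small subclass. In this sketch `BaseReader` is a hypothetical stand-in for _Reader (whose actual definition is not shown here), and `ListReader` is an illustrative subclass that serves items from an in-memory list.

```python
from typing import Any, List


class BaseReader:
    """Hypothetical stand-in for minerva's _Reader base class."""

    def __getitem__(self, index: int) -> Any:
        raise NotImplementedError

    def __len__(self) -> int:
        raise NotImplementedError


class ListReader(BaseReader):
    """Reads items from an in-memory list, one item at a time."""

    def __init__(self, items: List[Any]) -> None:
        self._items = list(items)

    def __getitem__(self, index: int) -> Any:
        # Return a single item, as readers usually do.
        return self._items[index]

    def __len__(self) -> int:
        return len(self._items)


reader = ListReader(["a", "b", "c"])
print(len(reader), reader[1])  # 3 b
```

Because only __getitem__ and __len__ are required, any ordered source (files on disk, array patches, labels) can sit behind the same interface.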