minerva.data.readers
Submodules
- minerva.data.readers.base_file_iterator
- minerva.data.readers.csv_reader
- minerva.data.readers.index_reader
- minerva.data.readers.mdio_reader
- minerva.data.readers.multi_reader
- minerva.data.readers.numpy_reader
- minerva.data.readers.patched_array_reader
- minerva.data.readers.png_reader
- minerva.data.readers.reader
- minerva.data.readers.tabular_reader
- minerva.data.readers.text_reader
- minerva.data.readers.tiff_reader
- minerva.data.readers.zarr_reader
Classes
- CSVReader: A TabularReader that reads its data from a CSV file (or an already-loaded DataFrame).
- IndexReader: Returns the requested index as the item. Useful for some self-supervised learning (SSL) methods.
- LazyPaddedPatchedArrayReader: Reads patches from a NumPy array, padding only the patches that fall in boundary regions.
- LazyPaddedPatchedMDIOReader: Reads patches from an MDIO array, padding lazily in __getitem__.
- LazyPaddedPatchedZarrReader: Reads patches from a Zarr array, padding lazily in __getitem__.
- NumpyArrayReader: Reads patches from a NumPy-like array.
- NumpyFolderReader: Loads NumPy files from a directory in a custom sorted order.
- PNGReader: Loads PNG files from a directory in a custom sorted order.
- PatchedArrayReader: Reads fixed-size patches from a NumPy-like array.
- PatchedMDIOReader: Reads patches from an MDIO array.
- PatchedZarrReader: Reads patches from a Zarr array.
- TabularReader: Selects columns from each row of a DataFrame and returns them as a NumPy array.
- TextReader: Loads text files from a directory in a custom sorted order.
- TiffReader: Loads TIFF files from a directory in a custom sorted order.
- _Reader: Base class for readers. Readers define an ordered collection of data and provide methods to access it.
Package Contents
- class minerva.data.readers.CSVReader(path, columns_to_select, cast_to=None, data_shape=None, reader_kwargs=None)
Bases: minerva.data.readers.tabular_reader.TabularReader
Reader to select columns from a DataFrame and return them as a NumPy array. The DataFrame is indexed by row number, and each row is treated as one sample. Thus, __getitem__ returns the selected columns of the row at the specified index as a NumPy array.
Parameters
- path : Union[PathLike, pd.DataFrame]
The path to the CSV file to read, or an already-loaded DataFrame. The data should contain the columns specified in the columns_to_select parameter.
- columns_to_select : Union[str, list[str]]
A string or a list of strings used to select columns. A string can be a column name or a regular expression pattern; all matching columns are selected. If columns_to_select is a list, the result is always a NumPy array with the columns in the same order as the list. If it is a string, the result is a NumPy array when more than one column matches; otherwise it is a single value (not a NumPy array).
- cast_to : str, optional
Cast the selected columns to the specified data type. If None, the data types of the columns are left unchanged. By default None.
- data_shape : tuple[int, ...], optional
The shape of the returned data. If None, the data is returned as a 1D array; otherwise it is reshaped to the specified shape. By default None.
- Parameters:
path (Union[minerva.utils.typing.PathLike, pandas.DataFrame])
columns_to_select (Union[str, List[str]])
cast_to (Optional[str])
data_shape (Optional[Tuple[int, Ellipsis]])
reader_kwargs (Optional[dict])
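The row-as-sample access pattern can be sketched with only the standard library. This is a hypothetical stand-in (`MiniCSVReader` is not part of minerva). It assumes the CSV has a header row, that `columns_to_select` is a list of column names, and that `cast_to` is a Python callable rather than a dtype name. The real reader works on a pandas DataFrame and returns NumPy arrays.

```python
import csv
import io
from typing import Callable, List, Optional


class MiniCSVReader:
    """Hypothetical stdlib sketch: each CSV data row is one sample."""

    def __init__(self, text: str, columns_to_select: List[str],
                 cast_to: Optional[Callable] = None):
        # Parse all rows up front; the header row supplies the column names.
        self.rows = list(csv.DictReader(io.StringIO(text)))
        self.columns_to_select = columns_to_select
        self.cast_to = cast_to

    def __len__(self) -> int:
        # One sample per data row (the header is not counted).
        return len(self.rows)

    def __getitem__(self, index: int) -> list:
        row = self.rows[index]
        # Keep the column order given by columns_to_select.
        values = [row[name] for name in self.columns_to_select]
        if self.cast_to is not None:
            values = [self.cast_to(v) for v in values]
        return values


csv_text = "a,b,label\n1,2,0\n3,4,1\n"
reader = MiniCSVReader(csv_text, columns_to_select=["b", "a"], cast_to=float)
print(len(reader))  # 2
print(reader[1])    # [4.0, 3.0]
```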
- class minerva.data.readers.IndexReader(len=None)
Bases: minerva.data.readers.reader._Reader
A reader that returns the requested index itself as the item. This is useful for some self-supervised learning (SSL) methods and techniques. If you know the length of your dataset in advance, you can set it at initialization. Otherwise __len__ returns None, in which case the built-in len() raises a TypeError. This class does not support slicing, negative indexes, or out-of-range indexes.
- Parameters:
len (Optional[int])
- __getitem__(index)
Retrieve an item from the reader at the specified index.
Parameters
- index : int
Index of the item to retrieve.
Returns
- int
The requested index itself.
- Parameters:
index (int)
- Return type:
int
- __len__()
Get the length of the reader.
Returns
- int
The length of the reader.
- Return type:
int
- len = None
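The documented IndexReader behaviour can be sketched as below. The item is the index itself, and slicing, negative indexes, and out-of-range indexes are not supported. `SketchIndexReader` is hypothetical. Raising `TypeError`/`IndexError` for unsupported indexes is an assumption, since the documentation only says these cases are not supported.

```python
from typing import Optional


class SketchIndexReader:
    """Hypothetical sketch of IndexReader's documented behaviour."""

    def __init__(self, len: Optional[int] = None):
        # The parameter name mirrors IndexReader(len=None); it shadows the
        # built-in len only inside this method.
        self.len = len

    def __getitem__(self, index: int) -> int:
        # Assumption: unsupported indexes raise instead of being handled.
        if not isinstance(index, int):
            raise TypeError("slicing is not supported")
        if index < 0 or (self.len is not None and index >= self.len):
            raise IndexError(f"index {index} is out of range")
        return index  # the item is the index itself

    def __len__(self) -> int:
        # Returns None when no length was given; the built-in len() then
        # raises TypeError because __len__ must return an int.
        return self.len  # type: ignore[return-value]


reader = SketchIndexReader(len=5)
print(reader[3], len(reader))  # 3 5
```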
- class minerva.data.readers.LazyPaddedPatchedArrayReader(data, data_shape, stride=None, pad_width=None, pad_mode='constant', pad_kwargs=None, index_bounds=None)
Bases: PatchedArrayReader
Reads patches from a NumPy array. This subclass of PatchedArrayReader applies padding only when the patch requested through __getitem__ lies in a boundary region that requires it. If no padding is necessary, use PatchedArrayReader.
Reads data from a NumPy array and generates patches from it.
Parameters
- data : ArrayLike
The input array from which patches are generated.
- data_shape : Tuple[int, ...]
The shape of the patches to be extracted. This is the shape of the subarray returned when a patch is accessed using __getitem__.
- stride : Tuple[int, ...], optional
The stride between consecutive patches. If None, the stride is the same as data_shape. By default None.
- pad_width : Tuple[Tuple[int, int], ...], optional
The width of padding to be applied to the data array. By default None, that is, no padding is applied. See the documentation of numpy.pad for more information.
- pad_mode : str, optional
The padding mode, by default "constant". See the documentation of numpy.pad for more information.
- pad_kwargs : dict, optional
Additional keyword arguments for padding, by default None.
- index_bounds : Tuple[Tuple[int, ...], Tuple[int, ...]], optional
A tuple of two tuples specifying the start and end indices for each dimension, used to restrict reading to a sub-region of the data array. If None, the entire array is used.
Examples
```python
>>> import numpy as np
>>> from minerva.data.readers import PatchedArrayReader
>>> # Generate a 10x10 array
>>> data = np.arange(100).reshape(10, 10)
>>> # Create a reader that generates 5x5 patches with a stride of 2 in the
>>> # first dimension and 5 in the second dimension.
>>> reader = PatchedArrayReader(
...     data,
...     data_shape=(5, 5),
...     stride=(2, 5),
... )
>>> # Print the number of patches that can be extracted from the data
>>> print(len(reader))
6
>>> # Print the indices of the patches
>>> print(reader.indices)
[(0, 0), (0, 5), (2, 0), (2, 5), (4, 0), (4, 5)]
>>> # Fetch the first patch and print its shape
>>> print(reader[0].shape)
(5, 5)
>>> # Fetch the third patch and print its content
>>> print(reader[2])
[[20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]
 [50 51 52 53 54]
 [60 61 62 63 64]]
```
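The lazy-padding idea is easiest to see in one dimension. Instead of materialising the padded array, a patch is read from the unpadded data, and only the positions that fall in the padding are filled in. This simplified sketch assumes constant-mode padding (pad_mode="constant") with a fill value of 0. `lazy_patch_1d` is a hypothetical helper, not part of minerva.

```python
from typing import List, Sequence, Tuple


def lazy_patch_1d(data: Sequence[int], start: int, size: int,
                  pad_width: Tuple[int, int], fill: int = 0) -> List[int]:
    """Return padded[start:start + size] without ever building `padded`.

    `start` is in padded coordinates, i.e. as if pad_width[0] fill values
    had been prepended to `data` and pad_width[1] appended.
    """
    before, after = pad_width
    padded_len = before + len(data) + after
    if start < 0 or start + size > padded_len:
        raise IndexError("patch extends past the padded array")
    patch = []
    for p in range(start, start + size):
        i = p - before  # position in the original, unpadded data
        # Positions outside [0, len(data)) belong to the padding region.
        patch.append(data[i] if 0 <= i < len(data) else fill)
    return patch


data = [1, 2, 3, 4, 5]
print(lazy_patch_1d(data, start=0, size=3, pad_width=(2, 1)))  # [0, 0, 1]
print(lazy_patch_1d(data, start=2, size=3, pad_width=(2, 1)))  # [1, 2, 3]
print(lazy_patch_1d(data, start=5, size=3, pad_width=(2, 1)))  # [4, 5, 0]
```

Interior patches (start=2 above) touch no padding. A lazy reader can therefore serve them straight from the original data and pay the padding cost only at the borders.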
- __getitem__(idx)
Fetch a patch from the data array.
Parameters
- idx : int
The index of the patch to be fetched.
Returns
- np.ndarray
The patch fetched from the data array, with shape data_shape.
- Parameters:
idx (int)
- Return type:
numpy.ndarray
- _get_patches()
Compute the upper-left corner indices of the patches to be extracted from the data array. Consecutive patches are separated by the stride. The result is a list of indices, where each index is a tuple of integers giving the coordinates of a patch's upper-left corner.
Returns
- List[Tuple[int, ...]]
A list of indices (coordinates) of the upper-left corners of the patches.
- Return type:
List[Tuple[str, Tuple[int, Ellipsis]]]
- _pad_data(pad_width, mode='constant', **kwargs)
Apply padding to the data array.
Parameters
- pad_width : Tuple[Tuple[int, int], ...]
The width of padding to be applied to the data array.
- mode : str, optional
The padding mode, by default "constant".
- Parameters:
pad_width (Tuple[Tuple[int, int], Ellipsis])
- Parameters:
data (numpy.typing.ArrayLike)
data_shape (Tuple[int, Ellipsis])
stride (Optional[Tuple[int, Ellipsis]])
pad_width (Optional[Tuple[Tuple[int, int], Ellipsis]])
pad_mode (str)
pad_kwargs (Optional[Dict])
index_bounds (Tuple[Tuple[int, Ellipsis], Tuple[int, Ellipsis]])
- class minerva.data.readers.LazyPaddedPatchedMDIOReader(*args, path, mdio_data='data/chunked_012', **kwargs)
Bases: minerva.data.readers.zarr_reader.LazyPaddedPatchedZarrReader
Reads patches from an MDIO array. This subclass of LazyPaddedPatchedZarrReader reads patches from the data Zarr array inside the MDIO array subdirectory, padding lazily (padding is done in the __getitem__ call). If no padding is necessary, use PatchedMDIOReader.
All parameters other than those listed below are the same as in the parent class; see the documentation of LazyPaddedPatchedZarrReader for more information.
Parameters
- path : PathLike
Path to the MDIO array.
- mdio_data : str
Name of the data array inside the MDIO subdirectory. Defaults to 'data/chunked_012'.
Notes
The MDIO array is expected to be stored on disk. If the array is not stored on disk, use the LazyPaddedPatchedArrayReader class instead.
Padding is computed only when necessary, during __getitem__ calls.
Examples
```python
>>> from pathlib import Path
>>> from minerva.data.readers import LazyPaddedPatchedMDIOReader
>>> data_path = Path("data.mdio")
>>> reader = LazyPaddedPatchedMDIOReader(
...     path=data_path,
...     data_shape=(5, 5),
...     stride=(2, 5),
...     pad_width=((1, 1), (0, 2)),
... )
>>> print(len(reader))
>>> print(reader[0])
```
- mdio_path
- Parameters:
path (minerva.utils.typing.PathLike)
mdio_data (str)
- class minerva.data.readers.LazyPaddedPatchedZarrReader(*args, path, **kwargs)
Bases: minerva.data.readers.patched_array_reader.LazyPaddedPatchedArrayReader
Reads patches from a Zarr array. This subclass of LazyPaddedPatchedArrayReader reads patches from a Zarr array, padding lazily (padding is done in the __getitem__ call). If no padding is necessary, use PatchedZarrReader.
All parameters other than path are the same as in the parent class; see the documentation of LazyPaddedPatchedArrayReader for more information. Use this class when padding is necessary and the whole dataset does not fit into memory.
Parameters
- path : PathLike
Path to the Zarr array.
Notes
The Zarr array is expected to be stored on disk. If the array is not stored on disk, use the LazyPaddedPatchedArrayReader class instead.
Padding is computed only when necessary, during __getitem__ calls.
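Deferring padding to __getitem__ is cheap because whether a patch needs any padding can be decided from its corner, the patch shape, the array shape, and pad_width alone. A sketch, assuming patch corners are expressed in padded coordinates. `needs_padding` is a hypothetical helper, and the shapes match the example below.

```python
from typing import Sequence, Tuple


def needs_padding(corner: Sequence[int], patch_shape: Sequence[int],
                  array_shape: Sequence[int],
                  pad_width: Sequence[Tuple[int, int]]) -> bool:
    """True if a patch (corner in padded coordinates) overlaps the padding."""
    for c, p, n, (before, _after) in zip(corner, patch_shape, array_shape, pad_width):
        # On this axis the original data occupies [before, before + n)
        # in padded coordinates; anything outside it is padding.
        if c < before or c + p > before + n:
            return True
    return False


pad = ((1, 1), (0, 2))
print(needs_padding((0, 0), (5, 5), (10, 10), pad))  # True: touches the top border
print(needs_padding((2, 0), (5, 5), (10, 10), pad))  # False: interior patch
print(needs_padding((2, 7), (5, 5), (10, 10), pad))  # True: touches the right border
```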
Examples
```python
>>> from pathlib import Path
>>> from minerva.data.readers import LazyPaddedPatchedZarrReader
>>> data_path = Path("data.zarr")
>>> reader = LazyPaddedPatchedZarrReader(
...     path=data_path,
...     data_shape=(5, 5),
...     stride=(2, 5),
...     pad_width=((1, 1), (0, 2)),
... )
>>> print(len(reader))
>>> print(reader[0])
```
- path
- Parameters:
path (minerva.utils.typing.PathLike)
- class minerva.data.readers.NumpyArrayReader(data, data_shape, stride=None, pad_width=None, pad_mode='constant', pad_kwargs=None, allow_pickle=True, npz_key=None)
Bases: minerva.data.readers.patched_array_reader.PatchedArrayReader
This class reads data from a NumPy-like array. It generates patches from the data and provides sequential access to them. It can serve as a base class for other readers.
Assumptions:
- The input data is expected to be a NumPy-like array, that is, it should support NumPy-like indexing.
- Patches are fixed-size subarrays of the data.
- Patches can overlap.
Reads data from a NumPy array and generates patches from it.
Parameters
- data : ArrayLike
The input array from which patches are generated.
- data_shape : Tuple[int, ...]
The shape of the patches to be extracted. This is the shape of the subarray returned when a patch is accessed using __getitem__.
- stride : Tuple[int, ...], optional
The stride between consecutive patches. If None, the stride is the same as data_shape. By default None.
- pad_width : Tuple[Tuple[int, int], ...], optional
The width of padding to be applied to the data array. By default None, that is, no padding is applied. See the documentation of numpy.pad for more information.
- pad_mode : str, optional
The padding mode, by default "constant". See the documentation of numpy.pad for more information.
- pad_kwargs : dict, optional
Additional keyword arguments for padding, by default None.
- index_bounds : Tuple[Tuple[int, ...], Tuple[int, ...]], optional
A tuple of two tuples specifying the start and end indices for each dimension, used to restrict reading to a sub-region of the data array. If None, the entire array is used.
Examples
```python
>>> import numpy as np
>>> from minerva.data.readers import PatchedArrayReader
>>> # Generate a 10x10 array
>>> data = np.arange(100).reshape(10, 10)
>>> # Create a reader that generates 5x5 patches with a stride of 2 in the
>>> # first dimension and 5 in the second dimension.
>>> reader = PatchedArrayReader(
...     data,
...     data_shape=(5, 5),
...     stride=(2, 5),
... )
>>> # Print the number of patches that can be extracted from the data
>>> print(len(reader))
6
>>> # Print the indices of the patches
>>> print(reader.indices)
[(0, 0), (0, 5), (2, 0), (2, 5), (4, 0), (4, 5)]
>>> # Fetch the first patch and print its shape
>>> print(reader[0].shape)
(5, 5)
>>> # Fetch the third patch and print its content
>>> print(reader[2])
[[20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]
 [50 51 52 53 54]
 [60 61 62 63 64]]
```
- Parameters:
data (Union[numpy.typing.ArrayLike, minerva.utils.typing.PathLike])
data_shape (Tuple[int, Ellipsis])
stride (Optional[Tuple[int, Ellipsis]])
pad_width (Optional[Tuple[Tuple[int, int], Ellipsis]])
pad_mode (str)
pad_kwargs (Optional[Dict])
allow_pickle (bool)
npz_key (Optional[str])
- class minerva.data.readers.NumpyFolderReader(path, sort_method=None, delimiter=None, key_index=0, reverse=False, filters=None, allow_pickle=True, array_key=None)
Bases: minerva.data.readers.base_file_iterator.BaseFileIterator
Load NumPy files from a directory, iterating over them in a custom sorted order.
Parameters
- path : Union[Path, str]
The path to the directory containing the NumPy files. Files are searched recursively.
- sort_method : Optional[List[str]], optional
A list specifying how to sort each part of the filename. Each element can be either "text" (lexicographic) or "numeric". By default None, which uses "numeric" if numeric parts are detected.
- delimiter : Optional[str], optional
The delimiter used to split filenames into components, by default None.
- key_index : Union[int, List[int]], optional
The index (or list of indices) of the part(s) of the filename to use for sorting. If a list is provided, files are sorted by multiple parts in sequence: first by the part at the first listed index, then by the part at the next one, and so on. By default 0.
- reverse : bool, optional
Whether to sort in reverse order, by default False.
- filters : Optional[Union[List[str], str]]
An optional string or list of strings containing regular expressions used to filter files by their stems. Files that match at least one pattern are kept, and the others are excluded. By default None, which means no files are excluded.
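The sorting options can be sketched as a key function over file stems. This is a hypothetical illustration of the described behaviour, not minerva code (`sort_key` is made up). The stem is split on delimiter, the parts at key_index are compared in order, and each part is compared numerically or as text. Two further assumptions: when sort_method is None, a part counts as numeric if it consists only of decimal digits, and sort_method[k] applies to the k-th entry of key_index.

```python
from typing import List, Optional, Sequence, Union


def sort_key(stem: str, delimiter: Optional[str] = None,
             key_index: Union[int, List[int]] = 0,
             sort_method: Optional[Sequence[str]] = None) -> tuple:
    """Hypothetical sort key mirroring the documented options."""
    # Without a delimiter, the whole stem is a single component.
    parts = stem.split(delimiter) if delimiter else [stem]
    indices = [key_index] if isinstance(key_index, int) else list(key_index)
    key = []
    for position, i in enumerate(indices):
        part = parts[i]
        if sort_method is not None:
            # Assumption: sort_method[k] applies to the k-th entry of key_index.
            method = sort_method[position]
        else:
            # Assumption: a part made only of decimal digits is numeric.
            method = "numeric" if part.isdecimal() else "text"
        # Tag components so an int is never compared with a str.
        key.append((0, int(part)) if method == "numeric" else (1, part))
    return tuple(key)


stems = ["img_10", "img_2", "img_1"]
print(sorted(stems, key=lambda s: sort_key(s, "_", 1)))
# ['img_1', 'img_2', 'img_10']
print(sorted(stems, key=lambda s: sort_key(s, "_", 1, ["text"])))
# ['img_1', 'img_10', 'img_2']
print(sorted(["b_2", "a_10", "a_2"], key=lambda s: sort_key(s, "_", [0, 1]), reverse=True))
# ['b_2', 'a_10', 'a_2']
```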
Raises
- NotADirectoryError
If the path is not a directory.
- __getitem__(index)
Retrieve the array stored in the NumPy file at the specified index.
- Parameters:
index (int)
- Return type:
numpy.ndarray
- allow_pickle = True
- array_key = None
- root_dir
- Parameters:
path (minerva.utils.typing.PathLike)
sort_method (Optional[List[str]])
delimiter (Optional[str])
key_index (Union[int, List[int]])
reverse (bool)
filters (Optional[Union[List[str], str]])
allow_pickle (bool)
array_key (Optional[str])
- class minerva.data.readers.PNGReader(path, sort_method=None, delimiter=None, key_index=0, reverse=False, filters=None)
Bases: minerva.data.readers.base_file_iterator.BaseFileIterator
Load PNG image files from a directory, iterating over them in a custom sorted order.
Parameters
- path : Union[Path, str]
The path to the directory containing the image files. Files are searched recursively.
- sort_method : Optional[List[str]], optional
A list specifying how to sort each part of the filename. Each element can be either "text" (lexicographic) or "numeric". By default None, which uses "numeric" if numeric parts are detected.
- delimiter : Optional[str], optional
The delimiter used to split filenames into components, by default None.
- key_index : Union[int, List[int]], optional
The index (or list of indices) of the part(s) of the filename to use for sorting. If a list is provided, files are sorted by multiple parts in sequence: first by the part at the first listed index, then by the part at the next one, and so on. By default 0.
- reverse : bool, optional
Whether to sort in reverse order, by default False.
- filters : Optional[Union[List[str], str]]
An optional string or list of strings containing regular expressions used to filter files by their stems. Files that match at least one pattern are kept, and the others are excluded. By default None, which means no files are excluded.
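The filters option keeps a file when its stem matches at least one of the regular expressions. A sketch, assuming re.search semantics (a match anywhere in the stem), since the documentation does not say whether matching is anchored. `filter_stems` is hypothetical.

```python
import re
from pathlib import PurePath
from typing import List, Optional, Union


def filter_stems(paths: List[str],
                 filters: Optional[Union[List[str], str]] = None) -> List[str]:
    """Keep paths whose stem matches at least one regular expression."""
    if filters is None:
        return list(paths)  # no filtering: every file is kept
    patterns = [filters] if isinstance(filters, str) else filters
    # The stem is the file name without its extension ("img_001.png" -> "img_001").
    return [p for p in paths
            if any(re.search(pat, PurePath(p).stem) for pat in patterns)]


files = ["train/img_001.png", "train/img_002.png", "train/mask_001.png"]
print(filter_stems(files, r"^img_"))
# ['train/img_001.png', 'train/img_002.png']
print(filter_stems(files, [r"_001$", r"mask"]))
# ['train/img_001.png', 'train/mask_001.png']
```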
Raises
- NotADirectoryError
If the path is not a directory.
- __getitem__(index)
Retrieve the PNG file at the specified index.
- Parameters:
index (int)
- Return type:
numpy.ndarray
- root_dir
- Parameters:
path (minerva.utils.typing.PathLike)
sort_method (Optional[List[str]])
delimiter (Optional[str])
key_index (Union[int, List[int]])
reverse (bool)
filters (Optional[Union[List[str], str]])
- class minerva.data.readers.PatchedArrayReader(data, data_shape, stride=None, pad_width=None, pad_mode='constant', pad_kwargs=None, index_bounds=None)
Bases: minerva.data.readers.reader._Reader
This class reads data from a NumPy-like array. It generates patches from the data and provides sequential access to them. It can serve as a base class for other readers.
Assumptions:
- The input data is expected to be a NumPy-like array, that is, it should support NumPy-like indexing.
- Patches are fixed-size subarrays of the data.
- Patches can overlap.
Reads data from a NumPy array and generates patches from it.
Parameters
- data : ArrayLike
The input array from which patches are generated.
- data_shape : Tuple[int, ...]
The shape of the patches to be extracted. This is the shape of the subarray returned when a patch is accessed using __getitem__.
- stride : Tuple[int, ...], optional
The stride between consecutive patches. If None, the stride is the same as data_shape. By default None.
- pad_width : Tuple[Tuple[int, int], ...], optional
The width of padding to be applied to the data array. By default None, that is, no padding is applied. See the documentation of numpy.pad for more information.
- pad_mode : str, optional
The padding mode, by default "constant". See the documentation of numpy.pad for more information.
- pad_kwargs : dict, optional
Additional keyword arguments for padding, by default None.
- index_bounds : Tuple[Tuple[int, ...], Tuple[int, ...]], optional
A tuple of two tuples specifying the start and end indices for each dimension, used to restrict reading to a sub-region of the data array. If None, the entire array is used.
Examples
```python
>>> import numpy as np
>>> from minerva.data.readers import PatchedArrayReader
>>> # Generate a 10x10 array
>>> data = np.arange(100).reshape(10, 10)
>>> # Create a reader that generates 5x5 patches with a stride of 2 in the
>>> # first dimension and 5 in the second dimension.
>>> reader = PatchedArrayReader(
...     data,
...     data_shape=(5, 5),
...     stride=(2, 5),
... )
>>> # Print the number of patches that can be extracted from the data
>>> print(len(reader))
6
>>> # Print the indices of the patches
>>> print(reader.indices)
[(0, 0), (0, 5), (2, 0), (2, 5), (4, 0), (4, 5)]
>>> # Fetch the first patch and print its shape
>>> print(reader[0].shape)
(5, 5)
>>> # Fetch the third patch and print its content
>>> print(reader[2])
[[20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]
 [50 51 52 53 54]
 [60 61 62 63 64]]
```
- __getitem__(idx)
Fetch a patch from the data array.
Parameters
- idx : int
The index of the patch to be fetched.
Returns
- np.ndarray
The patch fetched from the data array, with shape data_shape.
- Parameters:
idx (int)
- Return type:
numpy.ndarray
- __len__()
Return the number of patches that can be extracted from the data array.
Returns
- int
The number of patches that can be extracted from the data array.
- Return type:
int
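Assume, as the 10x10 example in this class's Examples section suggests, that only patches lying entirely inside the array are counted. The value returned by __len__ then follows from the array size $N_d$, patch size $p_d$, and stride $s_d$ on each axis $d$:

```latex
n_{\text{patches}} = \prod_{d} \left( \left\lfloor \frac{N_d - p_d}{s_d} \right\rfloor + 1 \right)
\qquad\text{e.g.}\qquad
\left( \left\lfloor \frac{10 - 5}{2} \right\rfloor + 1 \right) \cdot \left( \left\lfloor \frac{10 - 5}{5} \right\rfloor + 1 \right) = 3 \cdot 2 = 6
```

The formula presumes $N_d \ge p_d$ on every axis; the result of 6 matches the `len(reader)` output in the example.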
- _get_patches()
Compute the upper-left corner indices of the patches to be extracted from the data array. Consecutive patches are separated by the stride. The result is a list of indices, where each index is a tuple of integers giving the coordinates of a patch's upper-left corner.
Returns
- List[Tuple[int, ...]]
A list of indices (coordinates) of the upper-left corners of the patches.
- Return type:
List[Tuple[int, Ellipsis]]
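A minimal sketch of the corner computation described above. It assumes that only patches lying entirely inside the array are generated, which is consistent with the 10x10 example in this class's Examples section. `patch_corners` is a hypothetical helper, not minerva's implementation.

```python
from itertools import product
from typing import List, Sequence, Tuple


def patch_corners(shape: Sequence[int], patch_shape: Sequence[int],
                  stride: Sequence[int]) -> List[Tuple[int, ...]]:
    """Upper-left corners of all patches that fit entirely inside `shape`."""
    # Valid start positions on each axis: 0, s, 2s, ... while start + p <= n.
    starts_per_axis = [range(0, n - p + 1, s)
                       for n, p, s in zip(shape, patch_shape, stride)]
    # Cartesian product, with the last axis varying fastest (row-major order).
    return list(product(*starts_per_axis))


print(patch_corners((10, 10), (5, 5), (2, 5)))
# [(0, 0), (0, 5), (2, 0), (2, 5), (4, 0), (4, 5)]
```

The output reproduces the reader.indices value shown in the Examples section.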
- _pad_data(pad_width, mode='constant', **kwargs)
Apply padding to the data array.
Parameters
- pad_width : Tuple[Tuple[int, int], ...]
The width of padding to be applied to the data array.
- mode : str, optional
The padding mode, by default "constant".
- Parameters:
pad_width (Tuple[Tuple[int, int], Ellipsis])
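pad_width follows the numpy.pad convention of one (before, after) pair per axis, so eager padding grows each axis by before + after. The sketch below computes the padded shape and the resulting patch count. It assumes patches are computed on the padded array and only fit entirely inside it. Both helpers are hypothetical, and the shapes reuse the 10x10 example with pad_width=((1, 1), (0, 2)).

```python
from typing import Sequence, Tuple


def padded_shape(shape: Sequence[int],
                 pad_width: Sequence[Tuple[int, int]]) -> Tuple[int, ...]:
    """Shape of the array after numpy.pad-style padding."""
    # Each axis gains `before` elements at its start and `after` at its end.
    return tuple(n + before + after for n, (before, after) in zip(shape, pad_width))


def count_patches(shape: Sequence[int], patch_shape: Sequence[int],
                  stride: Sequence[int]) -> int:
    """Number of patches that fit entirely inside `shape`."""
    total = 1
    for n, p, s in zip(shape, patch_shape, stride):
        total *= max(0, (n - p) // s + 1)
    return total


shape = padded_shape((10, 10), ((1, 1), (0, 2)))
print(shape)                                 # (12, 12)
print(count_patches(shape, (5, 5), (2, 5)))  # 8
```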
- data
- data_shape
- indices = []
- pad_kwargs
- pad_mode = 'constant'
- pad_width = None
- shape
- stride
- Parameters:
data (numpy.typing.ArrayLike)
data_shape (Tuple[int, Ellipsis])
stride (Optional[Tuple[int, Ellipsis]])
pad_width (Optional[Tuple[Tuple[int, int], Ellipsis]])
pad_mode (str)
pad_kwargs (Optional[Dict])
index_bounds (Tuple[Tuple[int, Ellipsis], Tuple[int, Ellipsis]])
- class minerva.data.readers.PatchedMDIOReader(*args, path, mdio_data='data/chunked_012', **kwargs)
Bases: minerva.data.readers.zarr_reader.PatchedZarrReader
Reads patches from an MDIO array. This subclass of PatchedZarrReader reads patches from the data Zarr array inside the MDIO array subdirectory.
All parameters other than those listed below are the same as in the parent class; see the documentation of PatchedZarrReader for more information.
Parameters
- path : PathLike
Path to the MDIO array.
- mdio_data : str
Name of the data array inside the MDIO subdirectory. Defaults to 'data/chunked_012'.
Notes
The MDIO array is expected to be stored on disk. If the array is not stored on disk, use the PatchedArrayReader class instead.
When padding is used, it is applied to the entire array, which loads the entire array into memory. If the array is too large to fit into memory, pad it beforehand, or see LazyPaddedPatchedMDIOReader for a lazy alternative.
Examples
```python
>>> from pathlib import Path
>>> from minerva.data.readers import PatchedMDIOReader
>>> data_path = Path("data.mdio")
>>> reader = PatchedMDIOReader(
...     path=data_path,
...     data_shape=(5, 5),
...     stride=(2, 5),
... )
>>> print(len(reader))
>>> print(reader[0])
```
- mdio_path
- Parameters:
path (minerva.utils.typing.PathLike)
mdio_data (str)
- class minerva.data.readers.PatchedZarrReader(*args, path, **kwargs)
Bases: minerva.data.readers.patched_array_reader.PatchedArrayReader
Reads patches from a Zarr array. This class is a subclass of PatchedArrayReader.
All parameters other than path are the same as in the parent class; see the documentation of PatchedArrayReader for more information.
Parameters
- path : PathLike
Path to the Zarr array.
Notes
The Zarr array is expected to be stored on disk. If the array is not stored on disk, use the PatchedArrayReader class instead.
When padding is used, it is applied to the entire array, which loads the entire array into memory. If the array is too large to fit into memory, pad it beforehand, or see LazyPaddedPatchedZarrReader for a lazy alternative.
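To judge whether eager padding is affordable, estimate the memory the fully padded array would need from its shape and element size. A sketch: the volume shape, pad_width, and 4-byte (float32) element size below are illustrative assumptions, and `padded_nbytes` is a hypothetical helper.

```python
from math import prod
from typing import Sequence, Tuple


def padded_nbytes(shape: Sequence[int],
                  pad_width: Sequence[Tuple[int, int]],
                  itemsize: int) -> int:
    """Bytes needed to hold the fully padded array in memory."""
    # numpy.pad-style pad_width: one (before, after) pair per axis.
    padded = [n + before + after for n, (before, after) in zip(shape, pad_width)]
    return prod(padded) * itemsize


# Illustrative values: a 1000 x 1000 x 500 float32 volume (4 bytes per element)
# padded by 32 elements on both sides of every axis.
nbytes = padded_nbytes((1000, 1000, 500), ((32, 32), (32, 32), (32, 32)), itemsize=4)
print(f"{nbytes / 2**30:.2f} GiB")  # 2.38 GiB
```

If an estimate like this approaches the available RAM, LazyPaddedPatchedZarrReader avoids materialising the padded array.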
Examples
```python
>>> from pathlib import Path
>>> from minerva.data.readers import PatchedZarrReader
>>> data_path = Path("data.zarr")
>>> reader = PatchedZarrReader(
...     path=data_path,
...     data_shape=(5, 5),
...     stride=(2, 5),
... )
>>> print(len(reader))
>>> print(reader[0])
```
- path
- Parameters:
path (minerva.utils.typing.PathLike)
- class minerva.data.readers.TabularReader(df, columns_to_select, cast_to=None, data_shape=None)
Bases: minerva.data.readers.reader._Reader
Reader to select columns from a DataFrame and return them as a NumPy array. The DataFrame is indexed by row number, and each row is treated as one sample. Thus, __getitem__ returns the selected columns of the row at the specified index as a NumPy array.
Parameters
- df : pd.DataFrame
The DataFrame to select the columns from. It should contain the columns specified in the columns_to_select parameter.
- columns_to_select : Union[str, list[str]]
A string or a list of strings used to select columns from the DataFrame. A string can be a column name or a regular expression pattern; all matching columns are selected. If columns_to_select is a list, the result is always a NumPy array with the columns in the same order as the list. If it is a string, the result is a NumPy array when more than one column matches; otherwise it is a single value (not a NumPy array).
- cast_to : str, optional
Cast the selected columns to the specified data type. If None, the data types of the columns are left unchanged. By default None.
- data_shape : tuple[int, ...], optional
The shape of the returned data. If None, the data is returned as a 1D array; otherwise it is reshaped to the specified shape. By default None.
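The columns_to_select rules can be sketched over a plain list of column names. This is an illustration under assumptions, not TabularReader's code. A string is treated as a regular expression matched against each full column name with re.fullmatch (the documentation does not name the matching function), and list entries are treated as literal names. `select_columns` is hypothetical.

```python
import re
from typing import List, Union


def select_columns(columns: List[str],
                   columns_to_select: Union[str, List[str]]) -> List[str]:
    """Resolve columns_to_select to the list of selected column names."""
    if isinstance(columns_to_select, list):
        # A list keeps exactly the requested names, in the requested order.
        return list(columns_to_select)
    # A string is a column name or a regex; a plain name fully matches itself.
    # (Names containing regex metacharacters would need re.escape.)
    return [c for c in columns if re.fullmatch(columns_to_select, c)]


columns = ["id", "feat_0", "feat_1", "feat_2", "label"]
print(select_columns(columns, r"feat_\d+"))       # ['feat_0', 'feat_1', 'feat_2']
print(select_columns(columns, ["label", "id"]))  # ['label', 'id']
print(select_columns(columns, "label"))          # ['label']
```

Per the documentation, the string "label" case yields a single value per row rather than a NumPy array, while the list ["label"] still yields an array.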
- __getitem__(index)
Return the selected columns of the DataFrame row at the specified index as a NumPy array. The columns are selected according to self.columns_to_select.
Parameters
- index : int
The index of the row to select the columns from.
Returns
- np.ndarray
The selected columns of the row as a NumPy array.
- Parameters:
index (int)
- Return type:
numpy.ndarray
- __len__()
Return the number of samples in the DataFrame, which equals its number of rows.
Returns
- int
The number of samples in the DataFrame.
- Return type:
int
- __str__()
Return a string representation of the TabularReader object.
Returns
- str
A string representation of the TabularReader object.
- Return type:
str
- cast_to = None
- columns_to_select
- data_shape = None
- df
- return_single = False
- Parameters:
df (pandas.DataFrame)
columns_to_select (Union[str, List[str]])
cast_to (Optional[str])
data_shape (Optional[Tuple[int, Ellipsis]])
- class minerva.data.readers.TextReader(path, sort_method=None, delimiter=None, key_index=0, reverse=False, filters=None)
Bases: minerva.data.readers.base_file_iterator.BaseFileIterator
Load text files from a directory, iterating over them in a custom sorted order.
Parameters
- path : Union[Path, str]
The path to the directory containing the text files. Files are searched recursively.
- sort_method : Optional[List[str]], optional
A list specifying how to sort each part of the filename. Each element can be either "text" (lexicographic) or "numeric". By default None, which uses "numeric" if numeric parts are detected.
- delimiter : Optional[str], optional
The delimiter used to split filenames into components, by default None.
- key_index : Union[int, List[int]], optional
The index (or list of indices) of the part(s) of the filename to use for sorting. If a list is provided, files are sorted by multiple parts in sequence: first by the part at the first listed index, then by the part at the next one, and so on. By default 0.
- reverse : bool, optional
Whether to sort in reverse order, by default False.
- filters : Optional[Union[List[str], str]]
An optional string or list of strings containing regular expressions used to filter files by their stems. Files that match at least one pattern are kept, and the others are excluded. By default None, which means no files are excluded.
Raises
- NotADirectoryError
If the path is not a directory.
- __getitem__(index)
Retrieve the contents of the text file at the specified index.
- Parameters:
index (int)
- Return type:
numpy.ndarray
- root_dir
- Parameters:
path (minerva.utils.typing.PathLike)
sort_method (Optional[List[str]])
delimiter (Optional[str])
key_index (Union[int, List[int]])
reverse (bool)
filters (Optional[Union[List[str], str]])
- class minerva.data.readers.TiffReader(path, sort_method=None, delimiter=None, key_index=0, reverse=False, filters=None)
Bases: minerva.data.readers.base_file_iterator.BaseFileIterator
Load TIFF image files from a directory, iterating over them in a custom sorted order.
Parameters
- path : Union[Path, str]
The path to the directory containing the image files. Files are searched recursively.
- sort_method : Optional[List[str]], optional
A list specifying how to sort each part of the filename. Each element can be either "text" (lexicographic) or "numeric". By default None, which uses "numeric" if numeric parts are detected.
- delimiter : Optional[str], optional
The delimiter used to split filenames into components, by default None.
- key_index : Union[int, List[int]], optional
The index (or list of indices) of the part(s) of the filename to use for sorting. If a list is provided, files are sorted by multiple parts in sequence: first by the part at the first listed index, then by the part at the next one, and so on. By default 0.
- reverse : bool, optional
Whether to sort in reverse order, by default False.
- filters : Optional[Union[List[str], str]]
An optional string or list of strings containing regular expressions used to filter files by their stems. Files that match at least one pattern are kept, and the others are excluded. By default None, which means no files are excluded.
Raises
- NotADirectoryError
If the path is not a directory.
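Files are searched recursively under path, and NotADirectoryError is raised when path is not a directory. The sketch below shows that discovery step with the standard library. It assumes files are picked by extension and compares suffixes case-insensitively. The suffixes are illustrative, and `find_files` is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence, Union


def find_files(path: Union[str, Path],
               suffixes: Sequence[str] = (".tif", ".tiff")) -> List[Path]:
    """Hypothetical: collect matching files under `path`, including subdirectories."""
    root = Path(path)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    # rglob("*") walks the whole tree; suffixes are compared case-insensitively
    # (an assumption of this sketch).
    return [p for p in root.rglob("*")
            if p.is_file() and p.suffix.lower() in suffixes]


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    for name in ["a.tif", "sub/b.TIFF", "notes.txt"]:
        (Path(tmp) / name).touch()
    found = sorted(p.relative_to(tmp).as_posix() for p in find_files(tmp))

print(found)  # ['a.tif', 'sub/b.TIFF']
```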
- __getitem__(index)
Retrieve the TIFF file at the specified index.
- Parameters:
index (int)
- Return type:
numpy.ndarray
- root_dir
- Parameters:
path (minerva.utils.typing.PathLike)
sort_method (Optional[List[str]])
delimiter (Optional[str])
key_index (Union[int, List[int]])
reverse (bool)
filters (Optional[Union[List[str], str]])
- class minerva.data.readers._Reader
Base class for readers. Readers define an ordered collection of data and provide methods to access it. This class primarily handles:
- Definition of data structure and storage.
- Reading data from the source.
Access is provided by the __getitem__ and __len__ methods, which should be implemented by subclasses. Readers usually return a single item at a time, such as a single image or a single label.
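A subclass provides __getitem__ and __len__. The sketch below shows the shape of such a reader. To stay runnable without minerva installed, it defines a local stand-in for _Reader, assuming the base class's required interface is just these two methods. `SquaresReader` is a made-up example.

```python
from abc import ABC, abstractmethod
from typing import Any


class ReaderStandIn(ABC):
    """Local stand-in for minerva.data.readers._Reader (assumed interface)."""

    @abstractmethod
    def __getitem__(self, index: int) -> Any: ...

    @abstractmethod
    def __len__(self) -> int: ...


class SquaresReader(ReaderStandIn):
    """Hypothetical reader: item i is i squared, for i in range(n)."""

    def __init__(self, n: int):
        self.n = n

    def __getitem__(self, index: int) -> int:
        # Reject out-of-range indexes so iteration stops at the end.
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * index

    def __len__(self) -> int:
        return self.n


reader = SquaresReader(4)
print(len(reader), [reader[i] for i in range(len(reader))])  # 4 [0, 1, 4, 9]
```

With minerva installed, the subclass would inherit from minerva.data.readers._Reader instead of the stand-in.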