minerva.data.datasets.binary_tree_subset
Classes
BinaryTreeSubset | Subset of a dataset at specified indices.
Functions
build_indices(size, start, end) | Recursively builds a list of size indices that are approximately evenly distributed across the interval [start, end).
Module Contents
- class minerva.data.datasets.binary_tree_subset.BinaryTreeSubset(dataset, size)
Bases: torch.utils.data.Subset
Subset of a dataset at specified indices.
Note
When subclassing Subset and overriding __getitem__, you must also override __getitems__ to ensure DataLoader works correctly with your custom logic. If you override only __getitem__, a NotImplementedError will be raised when using DataLoader.
A simple implementation of __getitems__ can delegate to __getitem__:
def __getitems__(self, indices):
    return [self.__getitem__(idx) for idx in indices]
For better performance, consider implementing batch-aware logic in __getitems__ instead of calling __getitem__ multiple times.
- Args:
dataset (Dataset): The whole Dataset
indices (sequence): Indices in the whole set selected for subset
A subset of a PyTorch Dataset whose elements are selected using a binary tree-style midpoint sampling strategy, so that the chosen indices are spread approximately evenly across the dataset.
This is useful for tasks such as hierarchical sampling or balanced data reduction, where a representative subset of a dataset is desired while preserving coverage across the whole index range.
Parameters
- dataset : Dataset
The base dataset from which to create the subset.
- size : int
The number of samples to include in the subset. Must be positive and no greater than the length of the base dataset.
Raises
- ValueError
If size is non-positive or exceeds the size of the dataset.
- Parameters:
dataset (torch.utils.data.Dataset)
size (int)
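The size contract described above (a positive size no greater than the dataset length, otherwise ValueError) can be illustrated with a small torch-free stand-in. This is only a sketch under stated assumptions: `SubsetSketch` and `evenly_spaced` are hypothetical names, the dataset is assumed to support `len()` and integer indexing, and the index rule is a simple bucket-centre formula swapped in for the library's midpoint recursion.

```python
from typing import Any, List, Sequence


def evenly_spaced(size: int, length: int) -> List[int]:
    """Hypothetical helper: centre of each of `size` equal buckets over [0, length).

    This is a stand-in selection rule, NOT the library's midpoint recursion.
    """
    return [(2 * i + 1) * length // (2 * size) for i in range(size)]


class SubsetSketch:
    """Hypothetical, torch-free stand-in mirroring the documented size contract."""

    def __init__(self, dataset: Sequence[Any], size: int) -> None:
        # Documented behaviour: reject non-positive or oversized requests.
        if size <= 0 or size > len(dataset):
            raise ValueError(
                f"size must be in [1, {len(dataset)}], got {size}"
            )
        self.dataset = dataset
        self.indices = evenly_spaced(size, len(dataset))

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, idx: int) -> Any:
        # Map a position in the subset back to the base dataset.
        return self.dataset[self.indices[idx]]


data = list(range(100, 110))       # a 10-element toy "dataset"
subset = SubsetSketch(data, 3)
print(len(subset), subset.indices)  # 3 [1, 5, 8]
```

With the real class, the documented call shape is `BinaryTreeSubset(dataset, size)`; the exact indices it selects may differ from the ones this stand-in produces.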
- minerva.data.datasets.binary_tree_subset.build_indices(size, start, end)
Recursively builds a list of size indices that are approximately evenly distributed across the interval [start, end) using a divide-and-conquer midpoint strategy.
Parameters
- size : int
The number of indices to generate.
- start : int
The start of the interval (inclusive).
- end : int
The end of the interval (exclusive).
Returns
- List[int]
A list of indices of length size, approximately evenly spaced within the given interval.
- Parameters:
size (int)
start (int)
end (int)
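The divide-and-conquer midpoint strategy described above can be sketched in plain Python. This is an illustrative guess at one way such a recursion might work, not the library's source: the name `build_indices_sketch`, the rule for splitting the remaining count between the two halves, the ascending output order, and the ValueError for oversized requests are all assumptions.

```python
from typing import List


def build_indices_sketch(size: int, start: int, end: int) -> List[int]:
    """Pick `size` distinct indices from [start, end) by recursive midpoints.

    Hypothetical sketch; the real build_indices may order or split differently.
    """
    if size <= 0:
        return []
    if size > end - start:
        # Assumption: more indices than slots is treated as an error here.
        raise ValueError("size exceeds the number of indices in [start, end)")

    mid = (start + end) // 2      # take the midpoint of the interval
    remaining = size - 1          # indices still to place around it
    right_size = remaining // 2
    left_size = remaining - right_size  # left half is never the smaller one

    left = build_indices_sketch(left_size, start, mid)
    right = build_indices_sketch(right_size, mid + 1, end)
    return left + [mid] + right


print(build_indices_sketch(3, 0, 10))  # [2, 5, 8]
```

Giving the larger share of the remaining count to the left half matches the fact that `[start, mid)` is never shorter than `[mid + 1, end)` when `mid` is the floored midpoint, so each recursive call is asked for no more indices than its interval holds.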