minerva.data.datasets.binary_tree_subset

Classes

BinaryTreeSubset

Subset of a dataset at specified indices.

Functions

build_indices(size, start, end)

Recursively builds a list of size indices that are approximately evenly distributed across the interval [start, end).

Module Contents

class minerva.data.datasets.binary_tree_subset.BinaryTreeSubset(dataset, size)[source]

Bases: torch.utils.data.Subset

Subset of a dataset at specified indices.

Note

When subclassing Subset and overriding __getitem__, you must also override __getitems__ to ensure DataLoader works correctly with your custom logic. If you override only __getitem__, a NotImplementedError will be raised when using DataLoader.

A simple implementation of __getitems__ can delegate to __getitem__:

def __getitems__(self, indices):
    return [self.__getitem__(idx) for idx in indices]

For better performance, consider implementing batch-aware logic in __getitems__ instead of calling __getitem__ multiple times.

Args:

dataset (Dataset): The whole Dataset

indices (sequence): Indices in the whole set selected for subset

A subset of a PyTorch Dataset whose elements are selected using a binary tree-style midpoint sampling strategy for approximate even distribution.

This is useful for tasks such as hierarchical sampling or balanced data reduction, where a representative subset is needed that still covers the full index range of the base dataset.

Parameters

dataset : Dataset

The base dataset from which to create the subset.

size : int

The number of samples to include in the subset. Must be positive and no greater than the length of the base dataset.

Raises

ValueError

If size is non-positive or exceeds the size of the dataset.
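The contract described above (validate size, select midpoint-based indices, then index into the base dataset) can be illustrated with a dependency-free stand-in. This is only a sketch under stated assumptions: MidpointSubsetSketch and _midpoints are hypothetical names, the class wraps any plain sequence instead of subclassing torch.utils.data.Subset, and the index selection is one plausible midpoint scheme, not necessarily the one minerva uses.

```python
from typing import List, Sequence


def _midpoints(size: int, start: int, end: int) -> List[int]:
    # Hypothetical helper: take the midpoint of [start, end), then split
    # the remaining budget between the left and right halves.
    if size <= 0:
        return []
    mid = (start + end) // 2
    left = min((size - 1) // 2, mid - start)       # capped by left capacity
    right = min(size - 1 - left, end - mid - 1)    # capped by right capacity
    left = size - 1 - right                        # give any overflow back to the left
    return _midpoints(left, start, mid) + [mid] + _midpoints(right, mid + 1, end)


class MidpointSubsetSketch:
    """Stand-in for the documented behavior; not the minerva class itself."""

    def __init__(self, dataset: Sequence, size: int) -> None:
        # Mirrors the documented ValueError condition.
        if size <= 0 or size > len(dataset):
            raise ValueError("size must be positive and at most len(dataset)")
        self.dataset = dataset
        self.indices = _midpoints(size, 0, len(dataset))

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, idx: int):
        return self.dataset[self.indices[idx]]


data = list(range(100, 110))           # ten items: 100 .. 109
subset = MidpointSubsetSketch(data, 3)
picked = [subset[i] for i in range(len(subset))]
```

With ten items and size 3, this sketch selects positions 2, 5 and 8, so picked holds three values spread across the sequence instead of the first three.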

Parameters:
  • dataset (torch.utils.data.Dataset)

  • size (int)

__str__()[source]

minerva.data.datasets.binary_tree_subset.build_indices(size, start, end)[source]

Recursively builds a list of size indices that are approximately evenly distributed across the interval [start, end) using a divide-and-conquer midpoint strategy.

Parameters

size : int

The number of indices to generate.

start : int

The start of the interval (inclusive).

end : int

The end of the interval (exclusive).

Returns

List[int]

A list of indices of length size, approximately evenly spaced within the given interval.
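To make the divide-and-conquer idea concrete, here is a minimal sketch of a midpoint-based index builder with the same signature. It is an illustration, not the library's source: the name build_indices_sketch, the way the remaining budget is split between halves, and the ValueError for an oversized request are all assumptions.

```python
from typing import List


def build_indices_sketch(size: int, start: int, end: int) -> List[int]:
    """Pick `size` distinct indices from [start, end) by midpoint splitting."""
    if size <= 0 or start >= end:
        return []
    if size > end - start:
        # Assumption: refuse requests larger than the interval.
        raise ValueError("size exceeds the number of available indices")

    mid = (start + end) // 2
    remaining = size - 1                 # the midpoint itself uses one slot
    left_cap = mid - start               # indices available in [start, mid)
    right_cap = end - (mid + 1)          # indices available in [mid + 1, end)

    # Split the remaining budget roughly in half, respecting capacities.
    left_size = min(remaining // 2, left_cap)
    right_size = remaining - left_size
    if right_size > right_cap:
        right_size = right_cap
        left_size = remaining - right_size

    return (
        build_indices_sketch(left_size, start, mid)
        + [mid]
        + build_indices_sketch(right_size, mid + 1, end)
    )


example = build_indices_sketch(3, 0, 10)   # [2, 5, 8]
```

Because each call emits the left half, then the midpoint, then the right half, the result of this sketch comes back already sorted. Requesting every index in the interval returns the full range.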

Parameters:
  • size (int)

  • start (int)

  • end (int)