minerva.data.datasets.binary_tree_subset
========================================

.. py:module:: minerva.data.datasets.binary_tree_subset


Classes
-------

.. autoapisummary::

   minerva.data.datasets.binary_tree_subset.BinaryTreeSubset


Functions
---------

.. autoapisummary::

   minerva.data.datasets.binary_tree_subset.build_indices


Module Contents
---------------

.. py:class:: BinaryTreeSubset(dataset, size)

   Bases: :py:obj:`torch.utils.data.Subset`


   Subset of a dataset at specified indices.

   .. note::
      When subclassing `Subset` and overriding `__getitem__`, you **must** also
      override `__getitems__` to ensure `DataLoader` works correctly with your
      custom logic. If you override only `__getitem__`, a `NotImplementedError`
      will be raised when using `DataLoader`.

      A simple implementation of `__getitems__` can delegate to `__getitem__`:

      .. code-block:: python

         def __getitems__(self, indices):
             return [self.__getitem__(idx) for idx in indices]

      For better performance, consider implementing batch-aware logic in
      `__getitems__` instead of calling `__getitem__` multiple times.

   Args:
       dataset (Dataset): The whole Dataset
       indices (sequence): Indices in the whole set selected for subset

   A subset of a PyTorch Dataset whose elements are selected using a binary
   tree-style midpoint sampling strategy for approximate even distribution.

   This is useful for tasks such as hierarchical sampling or balanced data
   reduction, where a representative subset of a dataset is desired while
   preserving some notion of coverage across the index space.

   Parameters
   ----------
   dataset : Dataset
       The base dataset from which to create the subset.
   size : int
       The number of samples to include in the subset. Must be positive and
       no greater than the length of the base dataset.

   Raises
   ------
   ValueError
       If `size` is non-positive or exceeds the size of the dataset.


   .. py:method:: __str__()


.. py:function:: build_indices(size, start, end)

   Recursively builds a list of `size` indices that are approximately evenly
   distributed across the interval [start, end) using a divide-and-conquer
   midpoint strategy.

   Parameters
   ----------
   size : int
       The number of indices to generate.
   start : int
       The start of the interval (inclusive).
   end : int
       The end of the interval (exclusive).

   Returns
   -------
   List[int]
       A list of indices of length `size`, approximately evenly spaced within
       the given interval.
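
The implementation of ``build_indices`` is not reproduced on this page. As an
illustration only, the following minimal sketch shows one way a
divide-and-conquer midpoint strategy over ``[start, end)`` could work. The name
``build_indices_sketch``, the input check, and the choice to give the left half
the extra index are assumptions made for this example, not the library's
actual code.

.. code-block:: python

   from typing import List


   def build_indices_sketch(size: int, start: int, end: int) -> List[int]:
       """Hypothetical midpoint-based index selection over [start, end)."""
       # Assumption of this sketch: reject requests that cannot be satisfied.
       if size < 0 or size > end - start:
           raise ValueError("size must be between 0 and end - start")
       if size == 0:
           return []

       # Always take the midpoint of the current interval.
       mid = (start + end) // 2
       if size == 1:
           return [mid]

       # Split the remaining budget between the two halves. The left half
       # [start, mid) is never smaller than the right half [mid + 1, end),
       # so it receives the extra index when the budget is odd.
       remaining = size - 1
       right_size = remaining // 2
       left_size = remaining - right_size

       left = build_indices_sketch(left_size, start, mid)
       right = build_indices_sketch(right_size, mid + 1, end)
       return left + [mid] + right


   print(build_indices_sketch(3, 0, 100))  # [25, 50, 75]

In this sketch the result is sorted and free of duplicates, and requesting
``size == end - start`` returns every index in the interval. A list produced
this way could be passed as the ``indices`` argument of
``torch.utils.data.Subset``; whether ``BinaryTreeSubset`` does exactly this is
not stated in the source.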