minerva.data.data_modules.har_rodrigues_24

Classes

HARDataModuleCPC

A DataModule standardizes the training, val, test splits, data preparation and transforms. The main advantage is consistent data splits, data preparation and transforms across models.

Module Contents

class minerva.data.data_modules.har_rodrigues_24.HARDataModuleCPC(data_path, input_size=6, window=60, overlap=30, batch_size=64, use_train_as_val=False, use_val_with_train=True, columns=None, num_workers=8, drop_last=True, label='standard activity code', transpose_data=True)

Bases: lightning.LightningDataModule

A DataModule standardizes the training, val, test splits, data preparation and transforms. The main advantage is consistent data splits, data preparation and transforms across models.

Example:

import lightning as L
import torch  # needed for torch.Generator in setup()
import torch.utils.data as data
from lightning.pytorch.demos.boring_classes import RandomDataset

class MyDataModule(L.LightningDataModule):
    def prepare_data(self):
        # download, IO, etc. Useful with shared filesystems
        # only called on 1 GPU/TPU in distributed
        ...

    def setup(self, stage):
        # make assignments here (val/train/test split)
        # called on every process in DDP
        dataset = RandomDataset(1, 100)
        self.train, self.val, self.test = data.random_split(
            dataset, [80, 10, 10], generator=torch.Generator().manual_seed(42)
        )

    def train_dataloader(self):
        return data.DataLoader(self.train)

    def val_dataloader(self):
        return data.DataLoader(self.val)

    def test_dataloader(self):
        return data.DataLoader(self.test)

    def on_exception(self, exception):
        # clean up state after the trainer faced an exception
        ...

    def teardown(self, stage):
        # clean up state after the trainer stops, delete files...
        # called on every process in DDP
        ...

Data module for Human Activity Recognition (HAR) using Contrastive Predictive Coding (CPC).

This class handles the creation of training, validation, and test dataloaders for the HAR dataset. It uses the HARDatasetCPC class to load the data.

Parameters

data_path (Union[PathLike, List[PathLike]]): The root directory where the dataset is stored. If a list is given, the datasets are concatenated, in their respective order, for each partition (train, val, test).
input_size (int, optional): The number of input features (default is 6).
window (int, optional): The size of the sliding window (default is 60).
overlap (int, optional): The overlap size for the sliding window (default is 30).
batch_size (int, optional): The batch size for the dataloaders (default is 64).
use_val_with_train (bool): Whether to use the validation set together with the training set (default is True).
label (Optional[str]): The column to be used as the label (default is 'standard activity code'). If None, no labels are used. If 'return_index_as_label', the index of the data is used as the label.
transpose_data (bool): If True, the data is returned as an array of shape (C, T); otherwise, it is returned as an array of shape (T, C), where C is the number of channels and T the number of time steps (default is True).
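To make the window, overlap, and transpose_data parameters concrete, here is a minimal pure-Python sketch of sliding-window segmentation. It assumes that consecutive windows start window - overlap steps apart and that a trailing partial window is discarded; the helper sliding_windows is hypothetical and is not part of minerva, whose actual segmentation happens inside HARDatasetCPC and may differ.

```python
from typing import List, Sequence


# Hypothetical helper, NOT part of minerva. Assumes stride = window - overlap.
def sliding_windows(
    samples: Sequence[Sequence[float]],  # shape (N, C): N time steps, C channels
    window: int = 60,
    overlap: int = 30,
    transpose: bool = True,
) -> List[List[List[float]]]:
    """Split an (N, C) series into windows of shape (C, T) or (T, C)."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    stride = window - overlap
    windows = []
    # Only full windows are kept; a trailing partial window is dropped.
    for start in range(0, len(samples) - window + 1, stride):
        chunk = [list(row) for row in samples[start:start + window]]  # (T, C)
        if transpose:
            chunk = [list(col) for col in zip(*chunk)]  # (C, T)
        windows.append(chunk)
    return windows


# 150 time steps of a 6-channel signal (6 matches the default input_size).
series = [[float(t)] * 6 for t in range(150)]
ws = sliding_windows(series, window=60, overlap=30, transpose=True)
print(len(ws), len(ws[0]), len(ws[0][0]))  # 4 6 60
```

Under this assumption, a 150-step recording with the defaults yields (150 - 60) // 30 + 1 = 4 windows, each starting 30 steps after the previous one.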

__repr__()

batch_size = 64

data_path
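When data_path is a list, each partition is built by concatenating the corresponding partition of every root, in list order. The sketch below models that behaviour with plain lists; load_partition, build_partitions, and the root names are hypothetical stand-ins for HARDatasetCPC and real dataset directories, and a real implementation might use torch.utils.data.ConcatDataset rather than list concatenation.

```python
import os
from typing import Dict, List, Sequence, Union


# Hypothetical stand-in for loading one partition of one dataset root.
def load_partition(root: Union[str, os.PathLike], split: str) -> List[str]:
    return [f"{root}/{split}/sample_{i}" for i in range(2)]


def build_partitions(
    data_path: Union[str, os.PathLike, Sequence[Union[str, os.PathLike]]],
) -> Dict[str, List[str]]:
    """Concatenate each partition across roots, preserving the order of data_path."""
    # A single path is treated as a one-element list of roots.
    if isinstance(data_path, (str, os.PathLike)):
        roots = [data_path]
    else:
        roots = list(data_path)
    partitions: Dict[str, List[str]] = {}
    for split in ("train", "val", "test"):
        merged: List[str] = []
        for root in roots:  # list order determines concatenation order
            merged.extend(load_partition(root, split))
        partitions[split] = merged
    return partitions


parts = build_partitions(["dataset_a", "dataset_b"])
print(parts["train"])
```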

drop_last = True

label = 'standard activity code'
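The label parameter selects one of three behaviours: no label, the sample index as the label, or the value of a named column. The following sketch models them for a single tabular row; make_sample, the feature column names, and the return shapes (a bare feature list when label is None, a (features, target) tuple otherwise) are assumptions for illustration, not minerva's API.

```python
from typing import Any, Dict, Optional, Sequence


# Hypothetical sketch of the three `label` modes; not minerva's implementation.
def make_sample(
    row: Dict[str, Any],
    index: int,
    feature_columns: Sequence[str],
    label: Optional[str] = "standard activity code",
):
    features = [row[c] for c in feature_columns]
    if label is None:
        return features  # unlabeled: only the input is returned
    if label == "return_index_as_label":
        return features, index  # the sample's index stands in for a label
    return features, row[label]  # any other string names a label column


# Hypothetical column names for a single accelerometer reading.
row = {"accel-x": 0.1, "accel-y": -0.2, "standard activity code": 3}
cols = ["accel-x", "accel-y"]
print(make_sample(row, 7, cols))  # ([0.1, -0.2], 3)
```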

num_workers = 8

test_dataloader()

An iterable or collection of iterables specifying test samples.

For more information about multiple dataloaders, see the corresponding section of the Lightning documentation.

For data processing, use the following pattern: download in prepare_data(), then process and split in setup(). However, this pattern is only necessary for distributed processing.

Warning: do not assign state in prepare_data().

Related methods: Trainer.test(), prepare_data(), setup().

Note: Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Note: If you don’t need a test dataset and a test_step(), you don’t need to implement this method.
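The prepare_data()/setup() division can be illustrated without Lightning. The toy class below is a hypothetical plain-Python stand-in, not a LightningDataModule: prepare_data() only writes raw data to shared storage and sets no attributes, while setup() reads the data back and assigns the splits as attributes.

```python
import json
import os
import random
import tempfile


class ToyDataModule:
    """Hypothetical plain-Python illustration of the prepare_data/setup split."""

    def __init__(self, cache_dir: str):
        self.cache_dir = cache_dir  # configuration only, set at construction
        self.train = self.val = self.test = None

    def prepare_data(self) -> None:
        # "Download": write raw data to disk. No attributes are assigned here,
        # because in distributed training this is called on only one process,
        # so state set here would be missing on the others.
        path = os.path.join(self.cache_dir, "raw.json")
        if not os.path.exists(path):
            with open(path, "w") as f:
                json.dump(list(range(100)), f)

    def setup(self, stage: str) -> None:
        # Process and split: called on every process, so assigning state is safe.
        with open(os.path.join(self.cache_dir, "raw.json")) as f:
            data = json.load(f)
        random.Random(42).shuffle(data)  # fixed seed for a reproducible split
        self.train, self.val, self.test = data[:80], data[80:90], data[90:]


with tempfile.TemporaryDirectory() as tmp:
    dm = ToyDataModule(tmp)
    dm.prepare_data()
    dm.setup("fit")
    print(len(dm.train), len(dm.val), len(dm.test))  # 80 10 10
```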

test_dataset

train_dataloader()

An iterable or collection of iterables specifying training samples.

For more information about multiple dataloaders, see the corresponding section of the Lightning documentation.

The dataloader you return will not be reloaded unless you set the Trainer argument reload_dataloaders_every_n_epochs to a positive integer.
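The reload rule can be pictured with a simplified model, assuming the train dataloader is built once before the first epoch and rebuilt whenever at least n epochs have passed since the last build, where n is the value passed as L.Trainer(reload_dataloaders_every_n_epochs=n). The function reload_epochs is hypothetical and does not model any other reload conditions Lightning may apply.

```python
from typing import List


def reload_epochs(max_epochs: int, n: int) -> List[int]:
    """Epochs at which the train dataloader is (re)built, under a simplified model."""
    loaded = [0]  # the dataloader is always built before the first epoch
    last = 0
    for epoch in range(1, max_epochs):
        # n == 0 (the default) means the dataloader is never reloaded.
        if n > 0 and epoch - last >= n:
            loaded.append(epoch)
            last = epoch
    return loaded


print(reload_epochs(6, 0))  # [0]
print(reload_epochs(6, 2))  # [0, 2, 4]
```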

For data processing, use the following pattern: download in prepare_data(), then process and split in setup(). However, this pattern is only necessary for distributed processing.

Warning: do not assign state in prepare_data().

Related methods: Trainer.fit(), prepare_data(), setup().

Note: Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

train_dataset

transpose_data = True

val_dataloader()

An iterable or collection of iterables specifying validation samples.

For more information about multiple dataloaders, see the corresponding section of the Lightning documentation.

The dataloader you return will not be reloaded unless you set the Trainer argument reload_dataloaders_every_n_epochs to a positive integer.

It’s recommended that all data downloads and preparation happen in prepare_data().

Related methods: Trainer.fit(), Trainer.validate(), prepare_data(), setup().

Note: Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Note: If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.

val_dataset

Parameters:

data_path (Union[minerva.utils.typing.PathLike, List[minerva.utils.typing.PathLike]])
input_size (int)
window (int)
overlap (int)
batch_size (int)
use_train_as_val (bool)
use_val_with_train (bool)
columns (Optional[List[str]])
num_workers (int)
drop_last (bool)
label (Optional[str])
transpose_data (bool)