minerva.data.data_modules.har_rodrigues_24

Classes

HARDataModuleCPC

A DataModule standardizes the training, val, test splits, data preparation and transforms. The main advantage is consistent data splits, data preparation and transforms across models.

Module Contents

class minerva.data.data_modules.har_rodrigues_24.HARDataModuleCPC(data_path, input_size=6, window=60, overlap=30, batch_size=64, use_train_as_val=False, use_val_with_train=True, columns=None, num_workers=8, drop_last=True, label='standard activity code', transpose_data=True)[source]

Bases: lightning.LightningDataModule

A DataModule standardizes the training, val, test splits, data preparation and transforms. The main advantage is consistent data splits, data preparation and transforms across models.

Example:

import lightning as L
import torch
import torch.utils.data as data
from lightning.pytorch.demos.boring_classes import RandomDataset

class MyDataModule(L.LightningDataModule):
    def prepare_data(self):
        # download, IO, etc. Useful with shared filesystems
        # only called on 1 GPU/TPU in distributed
        ...

    def setup(self, stage):
        # make assignments here (val/train/test split)
        # called on every process in DDP
        dataset = RandomDataset(1, 100)
        self.train, self.val, self.test = data.random_split(
            dataset, [80, 10, 10], generator=torch.Generator().manual_seed(42)
        )

    def train_dataloader(self):
        return data.DataLoader(self.train)

    def val_dataloader(self):
        return data.DataLoader(self.val)

    def test_dataloader(self):
        return data.DataLoader(self.test)

    def on_exception(self, exception):
        # clean up state after the trainer faced an exception
        ...

    def teardown(self, stage):
        # clean up state after the trainer stops, delete files...
        # called on every process in DDP
        ...

Data module for Human Activity Recognition (HAR) using CPC.

This class handles the creation of training, validation, and test dataloaders for the HAR dataset. It uses the HARDatasetCPC class to load the data.

Parameters

data_path : Union[PathLike, List[PathLike]]

The root directory where the dataset is stored. If a list is given, the datasets are concatenated, in the order given, within each partition (train, val, test).

input_size : int, optional

The number of input features (default is 6).

window : int, optional

The size of the sliding window (default is 60).

overlap : int, optional

The overlap size for the sliding window (default is 30).

batch_size : int, optional

The batch size for the dataloaders (default is 64).

use_val_with_train : bool

Whether to use the training and validation sets together (default is True).

label : Optional[str]

The column to be used as the label (default is 'standard activity code'). If None, no labels will be used. If ‘return_index_as_label’, the index of the data will be used as the label.

transpose_data : bool

If True, the data will be returned as an array of shape (C, T); otherwise it will be returned as an array of shape (T, C) (default is True).
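To make the window, overlap, and transpose_data parameters concrete, the following is a minimal sketch in plain Python. It is not the HARDatasetCPC implementation: it assumes that consecutive windows start window - overlap samples apart and that trailing samples which do not fill a whole window are dropped. The helper name sliding_windows is hypothetical.

```python
from typing import List, Sequence


def sliding_windows(
    samples: Sequence[Sequence[float]],
    window: int = 60,
    overlap: int = 30,
    transpose: bool = True,
) -> List[List[List[float]]]:
    """Split a (T, C) recording into fixed-size windows.

    Assumption (not confirmed by the source): the stride between
    windows is `window - overlap`.
    """
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")

    windows = []
    for start in range(0, len(samples) - window + 1, step):
        # One window in (T, C) layout: `window` rows of C channel values.
        chunk = [list(row) for row in samples[start:start + window]]
        if transpose:
            # Swap axes to (C, T): one row per channel.
            chunk = [list(channel) for channel in zip(*chunk)]
        windows.append(chunk)
    return windows


# A toy recording with 120 time steps and 6 channels.
signal = [[float(t)] * 6 for t in range(120)]

ct = sliding_windows(signal)                    # each window is (C, T)
tc = sliding_windows(signal, transpose=False)   # each window is (T, C)

print(len(ct), len(ct[0]), len(ct[0][0]))  # 3 6 60
print(len(tc), len(tc[0]), len(tc[0][0]))  # 3 60 6
```

Under these assumptions, a 120-sample recording yields three windows starting at samples 0, 30, and 60.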

__repr__()[source]
batch_size = 64
data_path
drop_last = True
label = 'standard activity code'
num_workers = 8
test_dataloader()[source]

An iterable or collection of iterables specifying test samples.

For more information about multiple dataloaders, see the Lightning documentation.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

Do not assign state in prepare_data().

  • test()

  • prepare_data()

  • setup()

Note:

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note:

If you don’t need a test dataset and a test_step(), you don’t need to implement this method.

test_dataset
train_dataloader()[source]

An iterable or collection of iterables specifying training samples.

For more information about multiple dataloaders, see the Lightning documentation.

The dataloader you return will not be reloaded unless you set the Trainer argument reload_dataloaders_every_n_epochs to a positive integer.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

Do not assign state in prepare_data().

  • fit()

  • prepare_data()

  • setup()

Note:

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

train_dataset
transpose_data = True
val_dataloader()[source]

An iterable or collection of iterables specifying validation samples.

For more information about multiple dataloaders, see the Lightning documentation.

The dataloader you return will not be reloaded unless you set the Trainer argument reload_dataloaders_every_n_epochs to a positive integer.

It’s recommended that all data downloads and preparation happen in prepare_data().

  • fit()

  • validate()

  • prepare_data()

  • setup()

Note:

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note:

If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.

val_dataset
Parameters:
  • data_path (Union[minerva.utils.typing.PathLike, List[minerva.utils.typing.PathLike]])

  • input_size (int)

  • window (int)

  • overlap (int)

  • batch_size (int)

  • use_train_as_val (bool)

  • use_val_with_train (bool)

  • columns (Optional[List[str]])

  • num_workers (int)

  • drop_last (bool)

  • label (Optional[str])

  • transpose_data (bool)
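
Because data_path accepts either a single path or a list of paths, it may help to see what per-partition concatenation looks like. The sketch below uses plain Python lists in place of real datasets and shows only the ordering behavior described for data_path; concat_partitions and the sample names are hypothetical and are not part of minerva.

```python
from typing import Dict, List

PARTITIONS = ("train", "val", "test")


def concat_partitions(roots: List[Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Concatenate several datasets partition by partition, in list order.

    Each element of `roots` stands in for one dataset root directory and
    maps a partition name to its samples (a simplification for illustration).
    """
    merged: Dict[str, List[str]] = {name: [] for name in PARTITIONS}
    for root in roots:
        for name in PARTITIONS:
            # Samples from earlier roots come first within each partition.
            merged[name].extend(root.get(name, []))
    return merged


dataset_a = {"train": ["a_tr0", "a_tr1"], "val": ["a_va0"], "test": ["a_te0"]}
dataset_b = {"train": ["b_tr0"], "val": ["b_va0"], "test": ["b_te0", "b_te1"]}

merged = concat_partitions([dataset_a, dataset_b])
print(merged["train"])  # ['a_tr0', 'a_tr1', 'b_tr0']
print(merged["test"])   # ['a_te0', 'b_te0', 'b_te1']
```

Train samples are only ever merged with train samples, and likewise for val and test, so the partitions stay separate however many roots are listed.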