Getting Started with Minerva for Human Activity Recognition

Human Activity Recognition (HAR) is a challenging task that involves identifying actions performed by individuals based on sensor data—typically time-series signals from accelerometers and gyroscopes.

In this notebook, we will use Minerva to train and evaluate a 1D ResNet-SE model from scratch for classifying human activities using the DAGHAR dataset.

This notebook is a step-by-step guide to getting started with Minerva for HAR. It covers the following steps:

  1. Data Preparation

  2. Model Creation

  3. Model Training

  4. Model Evaluation

Imports

[2]:
from pathlib import Path

import lightning as L
import torch
from torchmetrics import Accuracy

from minerva.data.datasets.series_dataset import MultiModalSeriesCSVDataset
from minerva.data.data_modules.base import MinervaDataModule
from minerva.models.nets.time_series.resnet import ResNetSE1D_5

1. Data Preparation

We begin by preparing the data required for training and evaluation.

For this tutorial, we will use the standardized view of the DAGHAR Dataset, as introduced in the following paper:

Napoli, O., Duarte, D., Alves, P., Soto, D.H.P., de Oliveira, H.E., Rocha, A., Boccato, L. and Borin, E., 2024.
A benchmark for domain adaptation and generalization in smartphone-based human activity recognition.
Scientific Data, 11(1), p.1192.

This dataset includes time-series data from two tri-axial sensors—an accelerometer and a gyroscope—collected via smartphones. It is organized into six different datasets:

  • KuHar

  • MotionSense

  • RealWorld-waist

  • RealWorld-thigh

  • UCI

  • WISDM

In this notebook, we will work with the standardized view of the MotionSense dataset.

You can download and extract the dataset using the commands below (wget and unzip):

[3]:
!wget https://zenodo.org/records/13987073/files/standardized_view.zip?download=1 -O daghar_standardized_view.zip
!unzip -o daghar_standardized_view.zip -d daghar_standardized_view
--2025-03-22 17:41:07--  https://zenodo.org/records/13987073/files/standardized_view.zip?download=1
Resolving zenodo.org (zenodo.org)... 188.185.45.92, 188.185.43.25, 188.185.48.194, ...
Connecting to zenodo.org (zenodo.org)|188.185.45.92|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 191454525 (183M) [application/octet-stream]
Saving to: ‘daghar_standardized_view.zip’

daghar_standardized 100%[===================>] 182.58M  5.08MB/s    in 9m 27s

2025-03-22 17:50:35 (330 KB/s) - ‘daghar_standardized_view.zip’ saved [191454525/191454525]

Archive:  daghar_standardized_view.zip
   creating: daghar_standardized_view/standardized_view/
   creating: daghar_standardized_view/standardized_view/KuHar/
  inflating: daghar_standardized_view/standardized_view/KuHar/validation.csv
  inflating: daghar_standardized_view/standardized_view/KuHar/train.csv
  inflating: daghar_standardized_view/standardized_view/KuHar/test.csv
   creating: daghar_standardized_view/standardized_view/MotionSense/
  inflating: daghar_standardized_view/standardized_view/MotionSense/validation.csv
  inflating: daghar_standardized_view/standardized_view/MotionSense/train.csv
  inflating: daghar_standardized_view/standardized_view/MotionSense/test.csv
   creating: daghar_standardized_view/standardized_view/RealWorld_thigh/
  inflating: daghar_standardized_view/standardized_view/RealWorld_thigh/validation.csv
  inflating: daghar_standardized_view/standardized_view/RealWorld_thigh/train.csv
  inflating: daghar_standardized_view/standardized_view/RealWorld_thigh/test.csv
   creating: daghar_standardized_view/standardized_view/RealWorld_waist/
  inflating: daghar_standardized_view/standardized_view/RealWorld_waist/validation.csv
  inflating: daghar_standardized_view/standardized_view/RealWorld_waist/train.csv
  inflating: daghar_standardized_view/standardized_view/RealWorld_waist/test.csv
   creating: daghar_standardized_view/standardized_view/UCI/
  inflating: daghar_standardized_view/standardized_view/UCI/validation.csv
  inflating: daghar_standardized_view/standardized_view/UCI/train.csv
  inflating: daghar_standardized_view/standardized_view/UCI/test.csv
   creating: daghar_standardized_view/standardized_view/WISDM/
  inflating: daghar_standardized_view/standardized_view/WISDM/validation.csv
  inflating: daghar_standardized_view/standardized_view/WISDM/train.csv
  inflating: daghar_standardized_view/standardized_view/WISDM/test.csv

Once extracted, the folder structure will look like this:

daghar_standardized_view/standardized_view
├── KuHar
│   ├── train.csv
│   ├── test.csv
│   └── validation.csv
├── MotionSense
│   ├── train.csv
│   ├── test.csv
│   └── validation.csv
├── RealWorld_waist
│   ├── train.csv
│   ├── test.csv
│   └── validation.csv
├── RealWorld_thigh
│   ├── train.csv
│   ├── test.csv
│   └── validation.csv
├── UCI
│   ├── train.csv
│   ├── test.csv
│   └── validation.csv
└── WISDM
    ├── train.csv
    ├── test.csv
    └── validation.csv

Each dataset is split into train.csv, validation.csv, and test.csv files. Each file contains time-series data with the following structure:

Column Range              Description
------------------------  -------------------------------------
accel-x-0 to accel-x-59   60 time steps of accelerometer x-axis
accel-y-0 to accel-y-59   60 time steps of accelerometer y-axis
accel-z-0 to accel-z-59   60 time steps of accelerometer z-axis
gyro-x-0 to gyro-x-59     60 time steps of gyroscope x-axis
gyro-y-0 to gyro-y-59     60 time steps of gyroscope y-axis
gyro-z-0 to gyro-z-59     60 time steps of gyroscope z-axis
standard activity code    Encoded activity label

Each row represents one sample, composed of 6 channels (3 from each sensor) and 60 time steps per channel, representing 3 seconds of data at a sampling rate of 20 Hz.
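To make this layout concrete, here is a small pandas sketch (not part of the Minerva API) that rebuilds one (6, 60) sample by hand from the column names in the table above:

import numpy as np
import pandas as pd

# Rebuild the first sample manually: six 60-column blocks, one per axis.
df = pd.read_csv("daghar_standardized_view/standardized_view/MotionSense/train.csv")
row = df.iloc[0]
prefixes = ["accel-x", "accel-y", "accel-z", "gyro-x", "gyro-y", "gyro-z"]
sample = np.stack(
    [row[[f"{p}-{t}" for t in range(60)]].to_numpy(dtype=np.float32) for p in prefixes]
)
label = int(row["standard activity code"])
print(sample.shape, label)  # (6, 60) and the encoded activity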

All datasets in DAGHAR share the same structure and label set. The activity codes are mapped as follows:

Standard Activity Code   Activity
----------------------   ----------
0                        Sit
1                        Stand
2                        Walk
3                        Stair-up
4                        Stair-down
5                        Run
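
For convenience, the mapping above can also be written as a plain Python dictionary, handy later when inspecting predictions:

# Activity code -> name, taken directly from the table above.
ACTIVITY_NAMES = {
    0: "Sit",
    1: "Stand",
    2: "Walk",
    3: "Stair-up",
    4: "Stair-down",
    5: "Run",
}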

1.1. Creating Datasets

Our root-level directory is daghar_standardized_view/standardized_view. We will create three dataset objects, for training, validation, and testing, using the MultiModalSeriesCSVDataset class from Minerva.

This class allows us to read any CSV file containing multimodal time-series in the format described above. The parameters required to create the dataset are:

  • data_path: Path to the CSV file

  • feature_prefixes: List of prefixes identifying the feature columns. For instance, ['accel-x', 'accel-y', 'accel-z'] reads all columns starting with accel-x, then all columns starting with accel-y, and finally all columns starting with accel-z. Each prefix becomes one channel, so a sample here would have 3 channels: the first containing the accel-x columns, the second the accel-y columns, and the third the accel-z columns. The order of the prefixes in the list determines the order of the channels in the sample; columns matching none of the prefixes are ignored.

  • features_as_channels: If True, each prefix becomes a separate channel, as described above. If False, the features are flattened into a single channel (see the sketch after this list).

  • label: Column name of the label.
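
As a quick illustration of features_as_channels, the sketch below reads the same CSV with the flag set to False. Per the description above, the six 60-step blocks are then flattened into a single channel; the exact printed shape depends on the implementation, so treat this as a sanity check rather than a specification:

flat_dataset = MultiModalSeriesCSVDataset(
    data_path="daghar_standardized_view/standardized_view/MotionSense/train.csv",
    feature_prefixes=["accel-x", "accel-y", "accel-z", "gyro-x", "gyro-y", "gyro-z"],
    features_as_channels=False,
    label="standard activity code",
)
X_flat, _ = flat_dataset[0]
print(X_flat.shape)  # expected: a flat vector, e.g. (360,), instead of (6, 60)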

[3]:
root_data_dir = Path("daghar_standardized_view/standardized_view/MotionSense")

# Create the train dataset
train_dataset = MultiModalSeriesCSVDataset(
    data_path=root_data_dir / "train.csv",
    feature_prefixes=[
        "accel-x-",
        "accel-y",
        "accel-z",
        "gyro-x",
        "gyro-y",
        "gyro-z",
    ],
    features_as_channels=True,
    label="standard activity code",
)

# Create the validation dataset
val_dataset = MultiModalSeriesCSVDataset(
    data_path=root_data_dir / "validation.csv",
    feature_prefixes=[
        "accel-x-",
        "accel-y",
        "accel-z",
        "gyro-x",
        "gyro-y",
        "gyro-z",
    ],
    features_as_channels=True,
    label="standard activity code",
)

# Create the test dataset
test_dataset = MultiModalSeriesCSVDataset(
    data_path=root_data_dir / "test.csv",
    feature_prefixes=[
        "accel-x-",
        "accel-y",
        "accel-z",
        "gyro-x",
        "gyro-y",
        "gyro-z",
    ],
    features_as_channels=True,
    label="standard activity code",
)
[4]:
print(f"Train dataset: {train_dataset}")
print(f"Validation dataset: {val_dataset}")
print(f"Test dataset: {test_dataset}")
Train dataset: MultiModalSeriesCSVDataset at daghar_standardized_view/standardized_view/MotionSense/train.csv (3558 samples)
Validation dataset: MultiModalSeriesCSVDataset at daghar_standardized_view/standardized_view/MotionSense/validation.csv (420 samples)
Test dataset: MultiModalSeriesCSVDataset at daghar_standardized_view/standardized_view/MotionSense/test.csv (1062 samples)

Let’s take a look at the first sample of the training dataset. As each sample has 6 channels and 60 time steps, the shape of the sample will be (6, 60). The label will be a single integer representing the activity code.

[5]:
X, y = train_dataset[0]
print(f"The shape of the input is {X.shape} and the label is {y}")
The shape of the input is (6, 60) and the label is 4
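
Before training, it is often worth checking the class balance. The sketch below assumes the dataset supports len() and integer indexing (both are used elsewhere in this notebook):

from collections import Counter

# Count how many training samples each activity code has.
label_counts = Counter(int(train_dataset[i][1]) for i in range(len(train_dataset)))
print(dict(sorted(label_counts.items())))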

1.2. Creating the MinervaDataModule

Minerva models are implemented using PyTorch Lightning. Thus, to train a model we need a LightningDataModule object that handles data loading and preprocessing. Minerva provides a MinervaDataModule class that extends PyTorch Lightning's LightningDataModule and standardizes the data loading process.

We can create a MinervaDataModule object by passing the training, validation, and test datasets, as well as the batch size and the number of workers for data loading.

[6]:
data_module = MinervaDataModule(
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    test_dataset=test_dataset,
    batch_size=64,
    name="MotionSense Dataset",
    num_workers=4
)

print(data_module)
==================================================
               🆔 MotionSense Dataset
==================================================
├── Predict Split: test
└── Dataloader class: <class 'torch.utils.data.dataloader.DataLoader'>
📂 Datasets:
   ├── Train Dataset:
   │      MultiModalSeriesCSVDataset at daghar_standardized_view/standardized_view/MotionSense/train.csv (3558 samples)
   ├── Val Dataset:
   │      MultiModalSeriesCSVDataset at daghar_standardized_view/standardized_view/MotionSense/validation.csv (420 samples)
   └── Test Dataset:
          MultiModalSeriesCSVDataset at daghar_standardized_view/standardized_view/MotionSense/test.csv (1062 samples)

🛠 **Dataloader Configurations:**
   ├── Train Dataloader Kwargs:
         ├── batch_size: 64
         ├── num_workers: 4
         ├── shuffle: true
         ├── drop_last: false
   ├── Val Dataloader Kwargs:
         ├── batch_size: 64
         ├── num_workers: 4
         ├── shuffle: false
         ├── drop_last: false
   └── Test Dataloader Kwargs:
         ├── batch_size: 64
         ├── num_workers: 4
         ├── shuffle: false
         ├── drop_last: false
==================================================
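
As a sanity check, we can pull one batch from the train dataloader. Since MinervaDataModule extends LightningDataModule, setup() followed by train_dataloader() should work, though the exact setup behavior is an assumption here:

data_module.setup("fit")  # may be a no-op, depending on the implementation
X_batch, y_batch = next(iter(data_module.train_dataloader()))
print(X_batch.shape, y_batch.shape)  # expected: [64, 6, 60] and [64]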

2. Creating the Model

Minerva provides several models for Human Activity Recognition. In this notebook we will use the ResNetSE1D_5 model (https://ieeexplore.ieee.org/document/9771436), a 1D ResNet with Squeeze-and-Excitation blocks and 5 residual blocks.

To create the model, we instantiate the ResNetSE1D_5 class from Minerva with the following parameters:

  • input_shape: Shape of each input sample, in the format (channels, time_steps). In this case, it will be (6, 60).

  • num_classes: Number of classes in the dataset. In this case, it will be 6.

[7]:
model = ResNetSE1D_5(
    input_shape=(6, 60),
    num_classes=6,
)
model
[W322 18:28:16.875825496 NNPACK.cpp:62] Could not initialize NNPACK! Reason: Unsupported hardware.
[7]:
ResNetSE1D_5(
  (backbone): _ResNet1D(
    (conv_block): ConvolutionalBlock(
      (block): Sequential(
        (0): Conv1d(6, 64, kernel_size=(5,), stride=(1,))
        (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
        (3): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
    )
    (residual_blocks): Sequential(
      (0): ResNetSEBlock(
        (block): Sequential(
          (0): Conv1d(64, 32, kernel_size=(5,), stride=(1,), padding=same)
          (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
          (3): Conv1d(32, 64, kernel_size=(5,), stride=(1,), padding=same)
          (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (5): SqueezeAndExcitation1D(
            (block): Sequential(
              (0): Linear(in_features=64, out_features=32, bias=True)
              (1): ReLU()
              (2): Linear(in_features=32, out_features=64, bias=True)
              (3): Sigmoid()
            )
          )
        )
      )
      (1): ResNetSEBlock(
        (block): Sequential(
          (0): Conv1d(64, 32, kernel_size=(5,), stride=(1,), padding=same)
          (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
          (3): Conv1d(32, 64, kernel_size=(5,), stride=(1,), padding=same)
          (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (5): SqueezeAndExcitation1D(
            (block): Sequential(
              (0): Linear(in_features=64, out_features=32, bias=True)
              (1): ReLU()
              (2): Linear(in_features=32, out_features=64, bias=True)
              (3): Sigmoid()
            )
          )
        )
      )
      (2): ResNetSEBlock(
        (block): Sequential(
          (0): Conv1d(64, 32, kernel_size=(5,), stride=(1,), padding=same)
          (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
          (3): Conv1d(32, 64, kernel_size=(5,), stride=(1,), padding=same)
          (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (5): SqueezeAndExcitation1D(
            (block): Sequential(
              (0): Linear(in_features=64, out_features=32, bias=True)
              (1): ReLU()
              (2): Linear(in_features=32, out_features=64, bias=True)
              (3): Sigmoid()
            )
          )
        )
      )
      (3): ResNetSEBlock(
        (block): Sequential(
          (0): Conv1d(64, 32, kernel_size=(5,), stride=(1,), padding=same)
          (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
          (3): Conv1d(32, 64, kernel_size=(5,), stride=(1,), padding=same)
          (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (5): SqueezeAndExcitation1D(
            (block): Sequential(
              (0): Linear(in_features=64, out_features=32, bias=True)
              (1): ReLU()
              (2): Linear(in_features=32, out_features=64, bias=True)
              (3): Sigmoid()
            )
          )
        )
      )
      (4): ResNetSEBlock(
        (block): Sequential(
          (0): Conv1d(64, 32, kernel_size=(5,), stride=(1,), padding=same)
          (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU()
          (3): Conv1d(32, 64, kernel_size=(5,), stride=(1,), padding=same)
          (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (5): SqueezeAndExcitation1D(
            (block): Sequential(
              (0): Linear(in_features=64, out_features=32, bias=True)
              (1): ReLU()
              (2): Linear(in_features=32, out_features=64, bias=True)
              (3): Sigmoid()
            )
          )
        )
      )
    )
    (global_avg_pool): AdaptiveAvgPool1d(output_size=1)
  )
  (fc): Linear(in_features=64, out_features=6, bias=True)
  (loss_fn): CrossEntropyLoss()
)
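
A quick way to verify the wiring is to push a random batch through the model and check the output shape:

# Dummy batch: (batch_size, channels, time_steps).
dummy = torch.randn(8, 6, 60)
with torch.no_grad():
    logits = model(dummy)
print(logits.shape)  # expected: torch.Size([8, 6]), one logit per class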

3. Defining the Trainer

As we are using PyTorch Lightning, we need to define a Trainer object to train the model. The trainer takes the following parameters:

  • max_epochs: Maximum number of epochs to train the model.

  • accelerator: Device type to use for training. It can be cpu or gpu.

  • devices: The number of devices to use for training, or a list of specific device indices.

For this example we will disable logging and checkpointing by setting logger=False and enable_checkpointing=False.

[8]:
trainer = L.Trainer(
    max_epochs=100,
    devices=1,
    accelerator="gpu",
    logger=False,
    enable_checkpointing=False
)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
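
If no GPU is available, the same trainer can run on CPU (training will simply be slower):

# CPU variant of the trainer above.
trainer = L.Trainer(
    max_epochs=100,
    accelerator="cpu",
    logger=False,
    enable_checkpointing=False,
)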

3.1. Training the Model

To train the model we need to have three objects: the model, the data module, and the trainer. We can train the model by calling the fit method from the trainer and passing the model and the data module.

The fit method trains the model for the number of epochs defined in the trainer. The training dataloader is used for training and the validation dataloader for validation.

[9]:
trainer.fit(model, data_module)
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name     | Type             | Params | Mode
------------------------------------------------------
0 | backbone | _ResNet1D        | 126 K  | train
1 | fc       | Linear           | 390    | train
2 | loss_fn  | CrossEntropyLoss | 0      | train
------------------------------------------------------
127 K     Trainable params
0         Non-trainable params
127 K     Total params
0.509     Total estimated model params size (MB)
76        Modules in train mode
0         Modules in eval mode

/usr/local/lib/python3.10/dist-packages/lightning/pytorch/core/module.py:512: You called `self.log('val_loss', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
Epoch 0:   2%|▏         | 1/56 [00:00<00:13,  4.19it/s]
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/core/module.py:512: You called `self.log('train_loss', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
Epoch 99: 100%|██████████| 56/56 [00:03<00:00, 16.92it/s, val_loss=1.960, train_loss=0.0358]
`Trainer.fit` stopped: `max_epochs=100` reached.
Epoch 99: 100%|██████████| 56/56 [00:03<00:00, 16.90it/s, val_loss=1.960, train_loss=0.0358]
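
Since checkpointing is disabled, nothing is saved automatically. If you want to keep the trained weights, a plain PyTorch state_dict save works (the filename below is arbitrary):

torch.save(model.state_dict(), "resnet_se1d_5_motionsense.pt")
# Later: model.load_state_dict(torch.load("resnet_se1d_5_motionsense.pt"))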

4. Evaluating the Model

Once the model is trained, we can evaluate its performance on the test dataset using the accuracy metric. Evaluation involves the following steps:

  1. Perform inference on the test dataset using the trained model. This is done with the trainer.predict method, which returns a list of per-batch logits for the test dataset.

  2. Calculate the predicted labels by taking the argmax of the logits.

  3. Obtain the labels from the test dataset.

  4. Create the accuracy metric object and pass the predicted labels and the true labels.

[ ]:
# 1. Obtain predictions for the test set
predictions = trainer.predict(model, data_module)
# As predictions is a list of batches, we concatenate them along the first dimension
predictions = torch.cat(predictions, dim=0) # type: ignore
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Predicting DataLoader 0: 100%|██████████| 17/17 [00:00<00:00, 94.88it/s]
[11]:
# 2. We can use the torch.argmax function to obtain the class with the highest logit (equivalently, the highest probability)
predicted_classes = torch.argmax(predictions, dim=1) # type: ignore
# Let's print the predicted classes
print(f"The shape of the predicted labels is {predicted_classes.shape} and dtype {predicted_classes.dtype}")
The shape of the predicted labels is torch.Size([1062]) and dtype torch.int64
[12]:
# 3. Let's obtain the true labels
_, y = data_module.test_dataset[:] # type: ignore
y = torch.from_numpy(y)
# Let's print the true labels
print(f"The shape of the true labels is {y.shape} and dtype {y.dtype}")
The shape of the true labels is torch.Size([1062]) and dtype torch.int64
[ ]:
# 4. Let's create the accuracy metric object and compute the accuracy
acc_metric = Accuracy(task="multiclass", num_classes=6)
score = acc_metric(predicted_classes, y)
print(f"The accuracy of the model is {score.item()*100:.2f}%")
The accuracy of the model is 81.45%
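
Overall accuracy hides which activities get confused with each other. A confusion matrix from torchmetrics, computed from the same tensors, gives a per-class view:

from torchmetrics import ConfusionMatrix

cm_metric = ConfusionMatrix(task="multiclass", num_classes=6)
cm = cm_metric(predicted_classes, y)
print(cm)  # rows = true labels, columns = predicted labels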