Getting Started with Minerva for Human Activity Recognition
Human Activity Recognition (HAR) is a challenging task that involves identifying actions performed by individuals based on sensor data—typically time-series signals from accelerometers and gyroscopes.
In this notebook, we will use Minerva to train and evaluate a 1D ResNet-SE model from scratch for classifying human activities using the DAGHAR dataset.
This notebook is a step-by-step guide to getting started with Minerva for HAR. It covers the following steps:
Data Preparation
Model Creation
Model Training
Model Evaluation
Imports
[2]:
from pathlib import Path
import lightning as L
import torch
from torchmetrics import Accuracy
from minerva.data.datasets.series_dataset import MultiModalSeriesCSVDataset
from minerva.data.data_modules.base import MinervaDataModule
from minerva.models.nets.time_series.resnet import ResNetSE1D_5
1. Data Preparation
We begin by preparing the data required for training and evaluation.
For this tutorial, we will use the standardized view of the DAGHAR Dataset, as introduced in the following paper:
Napoli, O., Duarte, D., Alves, P., Soto, D.H.P., de Oliveira, H.E., Rocha, A., Boccato, L. and Borin, E., 2024.
A benchmark for domain adaptation and generalization in smartphone-based human activity recognition.
Scientific Data, 11(1), p.1192.
This dataset includes time-series data from two tri-axial sensors—an accelerometer and a gyroscope—collected via smartphones. It is organized into six different datasets:
KuHar
MotionSense
RealWorld-waist
RealWorld-thigh
UCI
WISDM
In this notebook, we will work with the standardized view of the MotionSense dataset.
You can download and extract the dataset using the commands below (`wget` and `unzip`):
[3]:
!wget https://zenodo.org/records/13987073/files/standardized_view.zip?download=1 -O daghar_standardized_view.zip
!unzip -o daghar_standardized_view.zip -d daghar_standardized_view
--2025-03-22 17:41:07-- https://zenodo.org/records/13987073/files/standardized_view.zip?download=1
Resolving zenodo.org (zenodo.org)... 188.185.45.92, 188.185.43.25, 188.185.48.194, ...
Connecting to zenodo.org (zenodo.org)|188.185.45.92|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 191454525 (183M) [application/octet-stream]
Saving to: ‘daghar_standardized_view.zip’
daghar_standardized 100%[===================>] 182.58M 5.08MB/s in 9m 27s
2025-03-22 17:50:35 (330 KB/s) - ‘daghar_standardized_view.zip’ saved [191454525/191454525]
Archive: daghar_standardized_view.zip
creating: daghar_standardized_view/standardized_view/
creating: daghar_standardized_view/standardized_view/KuHar/
inflating: daghar_standardized_view/standardized_view/KuHar/validation.csv
inflating: daghar_standardized_view/standardized_view/KuHar/train.csv
inflating: daghar_standardized_view/standardized_view/KuHar/test.csv
creating: daghar_standardized_view/standardized_view/MotionSense/
inflating: daghar_standardized_view/standardized_view/MotionSense/validation.csv
inflating: daghar_standardized_view/standardized_view/MotionSense/train.csv
inflating: daghar_standardized_view/standardized_view/MotionSense/test.csv
creating: daghar_standardized_view/standardized_view/RealWorld_thigh/
inflating: daghar_standardized_view/standardized_view/RealWorld_thigh/validation.csv
inflating: daghar_standardized_view/standardized_view/RealWorld_thigh/train.csv
inflating: daghar_standardized_view/standardized_view/RealWorld_thigh/test.csv
creating: daghar_standardized_view/standardized_view/RealWorld_waist/
inflating: daghar_standardized_view/standardized_view/RealWorld_waist/validation.csv
inflating: daghar_standardized_view/standardized_view/RealWorld_waist/train.csv
inflating: daghar_standardized_view/standardized_view/RealWorld_waist/test.csv
creating: daghar_standardized_view/standardized_view/UCI/
inflating: daghar_standardized_view/standardized_view/UCI/validation.csv
inflating: daghar_standardized_view/standardized_view/UCI/train.csv
inflating: daghar_standardized_view/standardized_view/UCI/test.csv
creating: daghar_standardized_view/standardized_view/WISDM/
inflating: daghar_standardized_view/standardized_view/WISDM/validation.csv
inflating: daghar_standardized_view/standardized_view/WISDM/train.csv
inflating: daghar_standardized_view/standardized_view/WISDM/test.csv
Once extracted, the folder structure will look like this:
daghar_standardized_view/standardized_view
├── KuHar
│   ├── train.csv
│   ├── test.csv
│   └── validation.csv
├── MotionSense
│   ├── train.csv
│   ├── test.csv
│   └── validation.csv
├── RealWorld_waist
│   ├── train.csv
│   ├── test.csv
│   └── validation.csv
├── RealWorld_thigh
│   ├── train.csv
│   ├── test.csv
│   └── validation.csv
├── UCI
│   ├── train.csv
│   ├── test.csv
│   └── validation.csv
└── WISDM
    ├── train.csv
    ├── test.csv
    └── validation.csv
Each dataset is split into `train.csv`, `validation.csv`, and `test.csv` files. Each file contains time-series data with the following structure:
| Column Range | Description |
|---|---|
| `accel-x-0` … `accel-x-59` | 60 time steps of accelerometer x-axis |
| `accel-y-0` … `accel-y-59` | 60 time steps of accelerometer y-axis |
| `accel-z-0` … `accel-z-59` | 60 time steps of accelerometer z-axis |
| `gyro-x-0` … `gyro-x-59` | 60 time steps of gyroscope x-axis |
| `gyro-y-0` … `gyro-y-59` | 60 time steps of gyroscope y-axis |
| `gyro-z-0` … `gyro-z-59` | 60 time steps of gyroscope z-axis |
| `standard activity code` | Encoded activity label |
Each row represents one sample, composed of 6 channels (3 from each sensor) and 60 time steps per channel, representing 3 seconds of data at a sampling rate of 20 Hz.
All datasets in DAGHAR share the same structure and label set. The activity codes are mapped as follows:

| Standard Activity Code | Activity |
|---|---|
| 0 | Sit |
| 1 | Stand |
| 2 | Walk |
| 3 | Stair-up |
| 4 | Stair-down |
| 5 | Run |
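For readable output later in the notebook, it can be handy to keep this mapping around as a small Python dictionary. This helper is not part of Minerva; it is simply transcribed from the table above:

# Mapping from standard activity codes to names, transcribed from the table above.
ACTIVITY_NAMES = {0: "Sit", 1: "Stand", 2: "Walk", 3: "Stair-up", 4: "Stair-down", 5: "Run"}
print(ACTIVITY_NAMES[4])  # "Stair-down"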
1.1. Creating Datasets
Our root-level directory is `daghar_standardized_view/standardized_view`. We will create three dataset objects, for training, validation, and testing, using the `MultiModalSeriesCSVDataset` class from Minerva.
This class can read any CSV file containing multimodal time-series in the format described above. The parameters required to create the dataset are:

- `data_path`: Path to the CSV file.
- `feature_prefixes`: List of prefixes of the feature columns to read. For instance, `['accel-x', 'accel-y', 'accel-z']` will read all columns starting with `accel-x`, then all columns starting with `accel-y`, and finally all columns starting with `accel-z`. A single sample will then be composed of 3 channels: the first containing the `accel-x` columns, the second the `accel-y` columns, and the third the `accel-z` columns. The order of the prefixes in the list determines the order of the channels in the sample; columns with other prefixes are ignored. A sketch of this grouping logic appears below.
- `features_as_channels`: If True, the features are grouped into channels as described above. If False, the features are flattened into a single channel.
- `label`: Column name of the label.
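To make the prefix-grouping behavior concrete, here is a rough sketch of what such a dataset does under the hood. This is not Minerva's actual implementation, just an illustration using pandas, and it assumes the column naming `accel-x-0` through `gyro-z-59` described earlier:

import numpy as np
import pandas as pd

# Read the raw CSV and group columns by prefix, one channel per prefix.
df = pd.read_csv("daghar_standardized_view/standardized_view/MotionSense/train.csv")
prefixes = ["accel-x", "accel-y", "accel-z", "gyro-x", "gyro-y", "gyro-z"]
channels = [df[[c for c in df.columns if c.startswith(p)]].to_numpy() for p in prefixes]

# Stack the per-prefix blocks so each sample becomes a (channels, time_steps) array.
X_all = np.stack(channels, axis=1)   # shape: (num_samples, 6, 60)
y_all = df["standard activity code"].to_numpy()
print(X_all.shape, y_all.shape)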
[3]:
root_data_dir = Path("daghar_standardized_view/standardized_view/MotionSense")
# Create the train dataset
train_dataset = MultiModalSeriesCSVDataset(
data_path=root_data_dir / "train.csv",
feature_prefixes=[
"accel-x-",
"accel-y",
"accel-z",
"gyro-x",
"gyro-y",
"gyro-z",
],
features_as_channels=True,
label="standard activity code",
)
# Create the validation dataset
val_dataset = MultiModalSeriesCSVDataset(
data_path=root_data_dir / "validation.csv",
feature_prefixes=[
"accel-x-",
"accel-y",
"accel-z",
"gyro-x",
"gyro-y",
"gyro-z",
],
features_as_channels=True,
label="standard activity code",
)
# Create the test dataset
test_dataset = MultiModalSeriesCSVDataset(
data_path=root_data_dir / "test.csv",
feature_prefixes=[
"accel-x-",
"accel-y",
"accel-z",
"gyro-x",
"gyro-y",
"gyro-z",
],
features_as_channels=True,
label="standard activity code",
)
[4]:
print(f"Train dataset: {train_dataset}")
print(f"Validation dataset: {val_dataset}")
print(f"Test dataset: {test_dataset}")
Train dataset: MultiModalSeriesCSVDataset at daghar_standardized_view/standardized_view/MotionSense/train.csv (3558 samples)
Validation dataset: MultiModalSeriesCSVDataset at daghar_standardized_view/standardized_view/MotionSense/validation.csv (420 samples)
Test dataset: MultiModalSeriesCSVDataset at daghar_standardized_view/standardized_view/MotionSense/test.csv (1062 samples)
Let’s take a look at the first sample of the training dataset. As each sample has 6 channels and 60 time steps, the shape of the sample will be `(6, 60)`. The label is a single integer representing the activity code.
[5]:
X, y = train_dataset[0]
print(f"The shape of the input is {X.shape} and the label is {y}")
The shape of the input is (6, 60) and the label is 4
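Optionally, we can plot the six channels of this sample to get a feel for the signals. This uses matplotlib (not imported above); the channel order follows the `feature_prefixes` list used when creating the dataset:

import matplotlib.pyplot as plt

channel_names = ["accel-x", "accel-y", "accel-z", "gyro-x", "gyro-y", "gyro-z"]
fig, ax = plt.subplots(figsize=(8, 4))
for name, channel in zip(channel_names, X):
    ax.plot(channel, label=name)  # one line per sensor channel (60 time steps each)
ax.set_xlabel("Time step (3 s at 20 Hz)")
ax.set_ylabel("Sensor reading")
ax.legend(ncol=3)
plt.show()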
1.2. Creating the MinervaDataModule
Minerva models are implemented using PyTorch Lightning. Thus, to train a model we create a `LightningDataModule` object that handles data loading and preprocessing. Minerva provides a `MinervaDataModule` class that extends PyTorch Lightning's `LightningDataModule` and standardizes the data loading process.
We create a `MinervaDataModule` object by passing the training, validation, and testing datasets, as well as the batch size and the number of workers for data loading.
[6]:
data_module = MinervaDataModule(
train_dataset=train_dataset,
val_dataset=val_dataset,
test_dataset=test_dataset,
batch_size=64,
name="MotionSense Dataset",
num_workers=4
)
print(data_module)
==================================================
🆔 MotionSense Dataset
==================================================
├── Predict Split: test
└── Dataloader class: <class 'torch.utils.data.dataloader.DataLoader'>
📂 Datasets:
├── Train Dataset:
│ MultiModalSeriesCSVDataset at daghar_standardized_view/standardized_view/MotionSense/train.csv (3558 samples)
├── Val Dataset:
│ MultiModalSeriesCSVDataset at daghar_standardized_view/standardized_view/MotionSense/validation.csv (420 samples)
└── Test Dataset:
MultiModalSeriesCSVDataset at daghar_standardized_view/standardized_view/MotionSense/test.csv (1062 samples)
🛠 **Dataloader Configurations:**
├── Train Dataloader Kwargs:
├── batch_size: 64
├── num_workers: 4
├── shuffle: true
├── drop_last: false
├── Val Dataloader Kwargs:
├── batch_size: 64
├── num_workers: 4
├── shuffle: false
├── drop_last: false
└── Test Dataloader Kwargs:
├── batch_size: 64
├── num_workers: 4
├── shuffle: false
├── drop_last: false
==================================================
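As a quick sanity check, we can pull a single batch from the training dataloader. Since `MinervaDataModule` follows the standard `LightningDataModule` API, `train_dataloader()` should be available (depending on the implementation, a prior call to `data_module.setup("fit")` may be required):

# Fetch one batch; shapes should match (batch_size, channels, time_steps).
batch_X, batch_y = next(iter(data_module.train_dataloader()))
print(batch_X.shape, batch_y.shape)  # expected: (64, 6, 60) and (64,)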
2. Creating the Model
Minerva provides several models for Human Activity Recognition. In this notebook we will use the ResNetSE1D_5 model (https://ieeexplore.ieee.org/document/9771436), a 1D ResNet with Squeeze-and-Excitation blocks and 5 residual blocks.
To create the model, we just call the `ResNetSE1D_5` class from Minerva with the following parameters:

- `input_shape`: Shape of each input sample, in the format `(channels, time_steps)`. In this case, `(6, 60)`.
- `num_classes`: Number of classes in the dataset. In this case, 6.
[7]:
model = ResNetSE1D_5(
input_shape=(6, 60),
num_classes=6,
)
model
[W322 18:28:16.875825496 NNPACK.cpp:62] Could not initialize NNPACK! Reason: Unsupported hardware.
[7]:
ResNetSE1D_5(
(backbone): _ResNet1D(
(conv_block): ConvolutionalBlock(
(block): Sequential(
(0): Conv1d(6, 64, kernel_size=(5,), stride=(1,))
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
)
(residual_blocks): Sequential(
(0): ResNetSEBlock(
(block): Sequential(
(0): Conv1d(64, 32, kernel_size=(5,), stride=(1,), padding=same)
(1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Conv1d(32, 64, kernel_size=(5,), stride=(1,), padding=same)
(4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): SqueezeAndExcitation1D(
(block): Sequential(
(0): Linear(in_features=64, out_features=32, bias=True)
(1): ReLU()
(2): Linear(in_features=32, out_features=64, bias=True)
(3): Sigmoid()
)
)
)
)
(1): ResNetSEBlock(
(block): Sequential(
(0): Conv1d(64, 32, kernel_size=(5,), stride=(1,), padding=same)
(1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Conv1d(32, 64, kernel_size=(5,), stride=(1,), padding=same)
(4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): SqueezeAndExcitation1D(
(block): Sequential(
(0): Linear(in_features=64, out_features=32, bias=True)
(1): ReLU()
(2): Linear(in_features=32, out_features=64, bias=True)
(3): Sigmoid()
)
)
)
)
(2): ResNetSEBlock(
(block): Sequential(
(0): Conv1d(64, 32, kernel_size=(5,), stride=(1,), padding=same)
(1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Conv1d(32, 64, kernel_size=(5,), stride=(1,), padding=same)
(4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): SqueezeAndExcitation1D(
(block): Sequential(
(0): Linear(in_features=64, out_features=32, bias=True)
(1): ReLU()
(2): Linear(in_features=32, out_features=64, bias=True)
(3): Sigmoid()
)
)
)
)
(3): ResNetSEBlock(
(block): Sequential(
(0): Conv1d(64, 32, kernel_size=(5,), stride=(1,), padding=same)
(1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Conv1d(32, 64, kernel_size=(5,), stride=(1,), padding=same)
(4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): SqueezeAndExcitation1D(
(block): Sequential(
(0): Linear(in_features=64, out_features=32, bias=True)
(1): ReLU()
(2): Linear(in_features=32, out_features=64, bias=True)
(3): Sigmoid()
)
)
)
)
(4): ResNetSEBlock(
(block): Sequential(
(0): Conv1d(64, 32, kernel_size=(5,), stride=(1,), padding=same)
(1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Conv1d(32, 64, kernel_size=(5,), stride=(1,), padding=same)
(4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): SqueezeAndExcitation1D(
(block): Sequential(
(0): Linear(in_features=64, out_features=32, bias=True)
(1): ReLU()
(2): Linear(in_features=32, out_features=64, bias=True)
(3): Sigmoid()
)
)
)
)
)
(global_avg_pool): AdaptiveAvgPool1d(output_size=1)
)
(fc): Linear(in_features=64, out_features=6, bias=True)
(loss_fn): CrossEntropyLoss()
)
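Before training, a dummy forward pass is a cheap way to confirm the model wiring. Assuming the model accepts a float tensor of shape `(batch, channels, time_steps)`, the logits should come out with shape `(batch, num_classes)`:

# Dummy batch of 8 random samples with 6 channels and 60 time steps.
dummy = torch.randn(8, 6, 60)
logits = model(dummy)
print(logits.shape)  # expected: torch.Size([8, 6])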
3. Defining the trainer
As we are using PyTorch Lightning, we need to define a `Trainer` object to train the model. We configure the trainer with the following parameters:

- `max_epochs`: Maximum number of epochs to train the model.
- `accelerator`: Device type to use for training, e.g. `cpu` or `gpu`.
- `devices`: The number (or list) of accelerator devices to use for training.

For this example we disable logging and checkpointing by setting `logger=False` and `enable_checkpointing=False`.
[8]:
trainer = L.Trainer(
max_epochs=100,
devices=1,
accelerator="gpu",
logger=False,
enable_checkpointing=False
)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
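If your machine has no GPU, an equivalent CPU trainer should work as well (training will just be slower), or you can let Lightning pick a device automatically:

# Hypothetical CPU/auto alternative to the trainer above:
# trainer = L.Trainer(
#     max_epochs=100,
#     accelerator="auto",  # picks GPU if available, otherwise CPU
#     logger=False,
#     enable_checkpointing=False,
# )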
3.1. Training the model
To train the model we need three objects: the model, the data module, and the trainer. We train the model by calling the trainer's `fit` method, passing it the model and the data module.
The `fit` method trains the model for the number of epochs defined in the trainer, using the training dataloader for optimization and the validation dataloader for validation.
[9]:
trainer.fit(model, data_module)
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params | Mode
------------------------------------------------------
0 | backbone | _ResNet1D | 126 K | train
1 | fc | Linear | 390 | train
2 | loss_fn | CrossEntropyLoss | 0 | train
------------------------------------------------------
127 K Trainable params
0 Non-trainable params
127 K Total params
0.509 Total estimated model params size (MB)
76 Modules in train mode
0 Modules in eval mode
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/core/module.py:512: You called `self.log('val_loss', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
Epoch 0: 2%|▏ | 1/56 [00:00<00:13, 4.19it/s]
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/core/module.py:512: You called `self.log('train_loss', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
Epoch 99: 100%|██████████| 56/56 [00:03<00:00, 16.92it/s, val_loss=1.960, train_loss=0.0358]
`Trainer.fit` stopped: `max_epochs=100` reached.
Epoch 99: 100%|██████████| 56/56 [00:03<00:00, 16.90it/s, val_loss=1.960, train_loss=0.0358]
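After fitting, you can also re-run the validation loop on demand via Lightning's standard `validate` method, which reports the metrics logged in the model's validation step:

# Runs the validation dataloader once and returns a list of metric dictionaries.
val_results = trainer.validate(model, data_module)
print(val_results)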
4. Evaluating the Model
Once the model is trained, we can evaluate its performance on the test dataset using the accuracy metric. To do so, we perform the following steps:

1. Perform inference on the test dataset using the trained model, via the `trainer.predict` method, which returns the predicted logits for each sample in the test dataset.
2. Calculate the predicted labels by taking the argmax of the logits.
3. Obtain the true labels from the test dataset.
4. Create the accuracy metric object and pass it the predicted labels and the true labels.
[ ]:
# 1. Obtain predictions for the test set
predictions = trainer.predict(model, data_module)
# As predictions is a list of batches, we concatenate them along the first dimension
predictions = torch.cat(predictions, dim=0) # type: ignore
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Predicting DataLoader 0: 100%|██████████| 17/17 [00:00<00:00, 94.88it/s]
[11]:
# 2. We can use the torch.argmax function to obtain the class with the highest probability
predicted_classes = torch.argmax(predictions, dim=1) # type: ignore
# Let's print the predicted classes
print(f"The shape of the predicted labels is {predicted_classes.shape} and dtype {predicted_classes.dtype}")
The shape of the predicted labels is torch.Size([1062]) and dtype torch.int64
[12]:
# 3. Let's obtain the true labels
_, y = data_module.test_dataset[:] # type: ignore
y = torch.from_numpy(y)
# Let's print the true labels
print(f"The shape of the true labels is {y.shape} and dtype {y.dtype}")
The shape of the true labels is torch.Size([1062]) and dtype torch.int64
[ ]:
# 4. Let's create the accuracy metric object and compute the accuracy
acc_metric = Accuracy(task="multiclass", num_classes=6)
score = acc_metric(predicted_classes, y)
print(f"The accuracy of the model is {score.item()*100:.2f}%")
The accuracy of the model is 81.45%
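Beyond a single accuracy number, a per-class view often helps. torchmetrics also provides a ConfusionMatrix metric that we can apply to the same tensors (rows are true labels, columns are predicted labels):

from torchmetrics import ConfusionMatrix

# 6x6 matrix: entry (i, j) counts samples of true class i predicted as class j.
cm_metric = ConfusionMatrix(task="multiclass", num_classes=6)
print(cm_metric(predicted_classes, y))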