minerva.pipelines.base

Classes

Pipeline

Pipelines provide a versatile API for automating tasks efficiently.

Module Contents

class minerva.pipelines.base.Pipeline(log_dir=None, ignore=None, cache_result=False, save_run_status=False)

Bases: lightning.pytorch.core.mixins.HyperparametersMixin

Pipelines provide a versatile API for automating tasks efficiently. They are runnable objects that keep track of their parameters, results, and status, which supports the reproducibility and traceability of experiments.

This is the base class for all pipelines. It provides the basic structure for running a pipeline and saving the results and status of the runs. Users should inherit from this class and implement the _run method.

Pipelines are clonable objects: a pipeline can be cloned to create a new pipeline with the same configuration. A cloned pipeline receives a new pipeline_id and a new run_count.

Pipelines expose their public API through properties (which are read-only) and through the run method. Users should not access or modify the internal attributes directly. The run method may set attributes (exposed as properties) that can be read during or after the run. The run method may also return a result, which is cached and made available through the result property if cache_result is set to True.
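The contract described above (inherit, implement _run, call run, read results through read-only properties) can be illustrated with a small stand-in class. SketchPipeline and AddOne are hypothetical names used only for this illustration. This is a minimal sketch of the pattern, not the Minerva implementation, which tracks considerably more state.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchPipeline(ABC):
    """Hypothetical stand-in that mimics the documented contract."""

    def __init__(self, cache_result: bool = False):
        self._cache_result = cache_result
        self._result = None

    @property
    def result(self) -> Any:
        # Read-only: no setter is defined, so assignment raises AttributeError.
        return self._result

    @abstractmethod
    def _run(self, *args, **kwargs) -> Any:
        """Pipeline logic, supplied by derived classes."""

    def run(self, *args, **kwargs) -> Any:
        # Public entry point: delegate to _run and optionally cache the output.
        output = self._run(*args, **kwargs)
        if self._cache_result:
            self._result = output
        return output


class AddOne(SketchPipeline):
    """Hypothetical derived pipeline that only implements _run."""

    def _run(self, x: int) -> int:
        return x + 1


cached = AddOne(cache_result=True)
uncached = AddOne()
cached_output = cached.run(41)
uncached_output = uncached.run(41)
```

In this sketch, the cached pipeline keeps the last output in result, while the uncached one leaves result as None even though run still returns the value.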

Create a new Pipeline object.

Parameters

log_dir : Path | str, optional

The default logging directory where all pipeline-related files are saved. By default None (uses the current working directory).

ignore : str | List[str], optional

Pipeline __init__ attributes are saved into the config attribute. This option excludes the named attributes from being saved, which is useful when an attribute is not serializable or is very large. By default None (save all __init__ attribute values).

cache_result : bool, optional

If True, the result of the last execution of the run method is stored in the result attribute. This is useful to avoid recomputing the same result multiple times. If False, the result is not stored. By default False.

save_run_status : bool, optional

If True, save the status of each run in a YAML file. This file will be saved in the working directory with the name run_{pipeline_id}.yaml. By default False.
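To make the effect of the ignore parameter concrete, the filtering it describes can be sketched as a plain function. build_config and the example argument names are hypothetical, and how Minerva actually collects the __init__ values is not shown here. The sketch only illustrates accepting either a single name or a list of names and dropping those entries from the saved configuration.

```python
from typing import Any, Dict, List, Optional, Union


def build_config(
    init_args: Dict[str, Any],
    ignore: Optional[Union[str, List[str]]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: drop ignored names from the __init__ arguments."""
    if ignore is None:
        ignored = set()        # None means: keep every argument
    elif isinstance(ignore, str):
        ignored = {ignore}     # a single name is treated as a one-item list
    else:
        ignored = set(ignore)
    return {name: value for name, value in init_args.items() if name not in ignored}


# Example arguments; "dataset" stands for a large or non-serializable object.
init_args = {"learning_rate": 0.01, "epochs": 5, "dataset": object()}
config_all = build_config(init_args)
config_small = build_config(init_args, ignore="dataset")
```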

_initialize_vars()

Initialize the internal variables of the pipeline. This method is called from __init__ and from the clone method.

abstract _run(*args, **kwargs)

Default pipeline method to be implemented in derived classes. This implements the pipeline logic.

Returns

Any

The result of the pipeline run.

Return type:

Any

_save_pipeline_info(path)

Save the pipeline information to a YAML file.

Parameters

path : PathLike

The path to save the pipeline information.

Parameters:

path (minerva.utils.typing.PathLike)

static clone(other)

Clone a pipeline object. This method creates a new pipeline object with the same configuration as the original pipeline. The new pipeline will have a new pipeline ID and a new run count.

Parameters

other : Pipeline

The pipeline object to be cloned.

Returns

Pipeline

The new pipeline object (deep-copied).

Parameters:

other (Pipeline)

Return type:

Pipeline
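The cloning behavior can be sketched as a deep copy followed by re-initialization of the per-object state. The class below is a hypothetical stand-in. Generating the ID with uuid and resetting the run count to zero are assumptions made for illustration, not confirmed details of the Minerva implementation.

```python
import copy
import uuid


class SketchPipeline:
    """Hypothetical stand-in showing clone semantics only."""

    def __init__(self, learning_rate: float = 0.01):
        self.config = {"learning_rate": learning_rate}
        self._initialize_vars()

    def _initialize_vars(self) -> None:
        # Per-object state: fresh ID and run counter (assumed to start at zero).
        self.pipeline_id = uuid.uuid4().hex
        self.run_count = 0

    def run(self) -> None:
        self.run_count += 1

    @staticmethod
    def clone(other: "SketchPipeline") -> "SketchPipeline":
        # Deep-copy the configuration, then replace the per-object state.
        duplicate = copy.deepcopy(other)
        duplicate._initialize_vars()
        return duplicate


original = SketchPipeline(learning_rate=0.1)
original.run()
duplicate = SketchPipeline.clone(original)
```

Because the copy is deep, changing the configuration of the duplicate in this sketch does not affect the original.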

property config: Dict[str, Any]

Configuration of the pipeline. This configuration includes the __init__ attributes of the pipeline, except the ones that are ignored.

Returns

Dict[str, Any]

The configuration of the pipeline.

Return type:

Dict[str, Any]

property full_info: Dict[str, Any]

Get all information about the pipeline. This includes the pipeline information, the configuration, the system information, and the status of the last run.

Returns

Dict[str, Any]

The dictionary with all information about the pipeline.

Return type:

Dict[str, Any]

property log_dir: pathlib.Path

Return the log_dir where all files produced by the pipeline should be saved.

Returns

Path

Path to the pipeline’s log_dir

Return type:

pathlib.Path

property pipeline_id: str

Return the ID of the pipeline. This ID is unique for each pipeline object and is generated at the creation of the object.

Returns

str

The pipeline ID

Return type:

str

property pipeline_info: Dict[str, str]

Return default information about the pipeline. This information includes the class name, the creation time, the pipeline ID, the working directory, and the number of runs.

Returns

Dict[str, str]

The dictionary with the pipeline information

Return type:

Dict[str, str]

property result: Any

Return the cached result of the last run. If cache_result is set to False, this property returns None.

Returns

Any

The result of the last run.

Return type:

Any

run(*args, **kwargs)

Default entry-point for running the pipeline. This method calls the _run method, which should be implemented in the derived classes. This method handles the status of the run, the caching of the result, and the saving of the run status.

Returns

Any

The result of the run, from the _run method.

Raises

Exception

Raises any exception that occurs during the run.

Return type:

Any
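The bookkeeping that run performs around _run can be sketched as follows. StatusTracker, the status dictionary keys, and the status strings are all hypothetical. The actual fields recorded by Minerva are not listed in this reference. The sketch shows one way to record the outcome of a run while still propagating any exception to the caller.

```python
import time
from typing import Any, Callable, Dict


class StatusTracker:
    """Hypothetical wrapper that records the outcome of each run."""

    def __init__(self) -> None:
        self.run_status: Dict[str, Any] = {}
        self.run_count = 0

    def run(self, task: Callable[..., Any], *args, **kwargs) -> Any:
        self.run_count += 1
        status: Dict[str, Any] = {
            "status": "RUNNING",
            "start": time.time(),
            "end": None,
            "exception": None,
        }
        self.run_status = status
        try:
            result = task(*args, **kwargs)
        except Exception as error:
            # Record the failure, then re-raise so the caller still sees it.
            status["status"] = "FAILED"
            status["exception"] = repr(error)
            raise
        else:
            status["status"] = "SUCCESS"
            return result
        finally:
            # Runs on both success and failure.
            status["end"] = time.time()


tracker = StatusTracker()
doubled = tracker.run(lambda x: x * 2, 21)
```

A status dictionary of this kind is what could then be written to a file when status saving is enabled.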

property run_status: Dict[str, Any]

Status of the last run of the pipeline.

Returns

Dict[str, Any]

Dictionary with the status of the last run.

Return type:

Dict[str, Any]

system_info()

Return system information about the host, the Python environment, and the git repository (if available).

Returns

Dict[str, Any]

The dictionary with the system information.

Return type:

Dict[str, Any]
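As a rough illustration of the kind of data such a method might gather, the sketch below collects host, Python, and git details using only the standard library. The function name collect_system_info and the dictionary layout are hypothetical, and the keys actually returned by system_info may differ.

```python
import platform
import subprocess
import sys
from typing import Any, Dict


def collect_system_info() -> Dict[str, Any]:
    """Hypothetical sketch: gather host, Python, and git information."""
    info: Dict[str, Any] = {
        "host": {
            "hostname": platform.node(),
            "os": platform.system(),
            "machine": platform.machine(),
        },
        "python": {
            "version": platform.python_version(),
            "executable": sys.executable,
        },
    }
    try:
        # Ask git for the current commit; fails outside a repository.
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
            timeout=5,
        ).stdout.strip()
        info["git"] = {"commit": commit}
    except (OSError, subprocess.SubprocessError):
        # git is not installed, timed out, or this is not a repository.
        info["git"] = None
    return info


info = collect_system_info()
```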

Parameters:
  • log_dir (pathlib.Path | str)

  • ignore (str | List[str])

  • cache_result (bool)

  • save_run_status (bool)