minerva.pipelines.base

Classes

Pipeline

Pipelines provide a versatile API for automating tasks efficiently.

Module Contents

class minerva.pipelines.base.Pipeline(log_dir=None, ignore=None, cache_result=False, save_run_status=False, seed=None)

Bases: lightning.pytorch.core.mixins.HyperparametersMixin

Pipelines provide a versatile API for automating tasks efficiently. They are runnable objects that keep track of their parameters, results, and status, which makes experiments reproducible and traceable.

This is the base class for all pipelines. It provides the basic structure for running a pipeline and saving the results and status of the runs. Users should inherit from this class and implement the _run method.

Pipelines are cloneable objects: a pipeline can be cloned to create a new pipeline with the same configuration. A cloned pipeline receives a new pipeline_id and a new run_count.

Pipelines expose their public API through properties (which are read-only) and through the run method. Users should not access or modify the internal attributes directly. The run method may set attributes (exposed as properties) that can be read during or after the run. The run method may also return a result, which is cached and accessible through the result property if cache_result is set to True.
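The split between run, _run, and the read-only result property can be illustrated with a small stand-in. This is a minimal sketch and not minerva's implementation: SketchPipeline and SumPipeline are hypothetical names, and the sketch omits the IDs, seeds, and run status that the real class tracks.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class SketchPipeline(ABC):
    """Hypothetical stand-in that mimics the run/_run split (not minerva's code)."""

    def __init__(self, cache_result: bool = False):
        self._cache_result = cache_result
        self._result: Optional[Any] = None

    @property
    def result(self) -> Any:
        # Read-only: no setter is defined, so assignment raises AttributeError.
        return self._result

    def run(self, *args, **kwargs) -> Any:
        # Public entry point: delegate to the subclass hook, then cache if asked.
        result = self._run(*args, **kwargs)
        if self._cache_result:
            self._result = result
        return result

    @abstractmethod
    def _run(self, *args, **kwargs) -> Any:
        """Subclasses put the pipeline logic here."""


class SumPipeline(SketchPipeline):
    """Hypothetical subclass: the only thing it implements is _run."""

    def _run(self, values):
        return sum(values)


cached = SumPipeline(cache_result=True)
uncached = SumPipeline(cache_result=False)
cached_return = cached.run([1, 2, 3])
uncached_return = uncached.run([1, 2, 3])
```

In this sketch both calls return the sum, but only the pipeline created with cache_result=True keeps it in result afterwards; the other one leaves result as None.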

Create a new Pipeline object.

Parameters

log_dir (PathLike, optional): The default logging directory where all related pipeline files should be saved. By default None (uses the current working directory).
ignore (Union[str, List[str]], optional): Pipeline __init__ attributes are saved into the config attribute. This option excludes selected attributes from being saved, which is useful when they are not serializable or are very large. By default None (save all __init__ attribute values).
cache_result (bool, optional): If True, the result of the last execution of the run method is stored in the result attribute. This is useful to avoid recomputing the same result multiple times. If False, the result is not stored. By default False.
save_run_status (bool, optional): If True, save the status of each run in a YAML file. This file will be saved in the working directory with the name run_{pipeline_id}.yaml. By default False.
seed (Optional[int], optional): Seed to be used by the pipeline. If None, a random seed is generated and used. By default None.
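The ignore option accepts either a single name or a list of names. The sketch below shows one way such filtering could work; filter_init_args is a hypothetical helper written for illustration and is not part of minerva.

```python
from typing import Any, Dict, List, Optional, Union


def filter_init_args(
    init_args: Dict[str, Any],
    ignore: Optional[Union[str, List[str]]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: drop ignored names from captured __init__ arguments."""
    if ignore is None:
        ignored = set()
    elif isinstance(ignore, str):
        # A bare string names one attribute; do not iterate over its characters.
        ignored = {ignore}
    else:
        ignored = set(ignore)
    return {name: value for name, value in init_args.items() if name not in ignored}


# "dataset" stands for a large or non-serializable argument.
init_args = {"lr": 0.01, "epochs": 5, "dataset": object()}
config_all = filter_init_args(init_args)
config_one = filter_init_args(init_args, ignore="dataset")
config_two = filter_init_args(init_args, ignore=["dataset", "epochs"])
```

Handling the string case separately matters because iterating over "dataset" directly would produce a set of single characters instead of one attribute name.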

_cache_result = False

_cached_run_status = []

_generate_seed()

Return type: int
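The documentation only states that a random seed is generated when seed is None. A minimal sketch of that fallback, assuming a 32-bit seed range and a hypothetical helper named resolve_seed (neither is confirmed by the source), could look like this:

```python
import random
from typing import Optional


def resolve_seed(seed: Optional[int] = None) -> int:
    """Hypothetical helper: keep an explicit seed, otherwise draw a random one."""
    if seed is not None:
        # Compare against None so that an explicit seed of 0 is preserved.
        return seed
    # The 32-bit range is an assumption made for this sketch.
    return random.SystemRandom().randint(0, 2**32 - 1)


explicit = resolve_seed(7)
zero = resolve_seed(0)
generated = resolve_seed()
```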

_initialize_vars(): Initialize the internal variables of the pipeline. This method is called by __init__ and by the clone method.

_pipeline_id

abstract _run(*args, **kwargs)

Default pipeline method to be implemented in derived classes. This implements the pipeline logic.

Returns

Any: The result of the pipeline run.

Return type: Any

_save_pipeline_info(path)

Save the pipeline information to a YAML file.

Parameters

path (PathLike): The path to save the pipeline information.

Parameters: path (minerva.utils.typing.PathLike)

_save_run_status = False

static clone(other)

Clone a pipeline object. This method creates a new pipeline object with the same configuration as the original pipeline. The new pipeline will have a new pipeline ID and a new run count.

Parameters

other (Pipeline): The pipeline object to be cloned.

Returns

Pipeline: The new pipeline object (deep-copied).

Parameters: other (Pipeline)
Return type: Pipeline
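The cloning behavior (same configuration, new ID, new run count) can be sketched with copy.deepcopy followed by a reset of the per-instance state. The class below is a hypothetical stand-in, not minerva's Pipeline; the use of uuid for the ID is an assumption made for illustration.

```python
import copy
import uuid


class SketchPipeline:
    """Hypothetical stand-in, not minerva's Pipeline."""

    def __init__(self, name: str, steps: list):
        self.name = name
        self.steps = steps
        self._initialize_vars()

    def _initialize_vars(self) -> None:
        # Per-instance state that must not be shared between clones.
        self.pipeline_id = uuid.uuid4().hex  # ID scheme is an assumption
        self.run_count = 0

    @staticmethod
    def clone(other: "SketchPipeline") -> "SketchPipeline":
        new = copy.deepcopy(other)  # same configuration, independent objects
        new._initialize_vars()      # new ID and run count for the copy
        return new


original = SketchPipeline("demo", steps=["load", "train"])
original.run_count = 3
duplicate = SketchPipeline.clone(original)
```

Because the copy is deep, mutating duplicate.steps in this sketch leaves original.steps untouched, and the original keeps its own run count.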

property config: Dict[str, Any]

Configuration of the pipeline. This configuration includes the __init__ attributes of the pipeline, except the ones that are ignored.

Returns

Dict[str, Any]: The configuration of the pipeline.

Return type: Dict[str, Any]

property full_info: Dict[str, Any]

Get all information about the pipeline. This includes the pipeline information, the configuration, the system information, and the status of the last run.

Returns

Dict[str, Any]: The dictionary with all information about the pipeline.

Return type: Dict[str, Any]

property log_dir: pathlib.Path

Return the log_dir where all files produced by the pipeline should be saved.

Returns

Path: Path to the pipeline’s log_dir

Return type: pathlib.Path

property pipeline_id: str

Return the ID of the pipeline. This ID is unique for each pipeline object and is generated at the creation of the object.

Returns

str: The pipeline ID

Return type: str

property pipeline_info: Dict[str, str | float | int]

Return default information about the pipeline. This information includes the class name, the creation time, the pipeline ID, the working directory, and the number of runs.

Returns

Dict[str, Union[str, float, int]]: The dictionary with the pipeline information.

Return type: Dict[str, Union[str, float, int]]

property result: Any

Return the cached result of the last run. If cache_result is set to False, this property returns None.

Returns

Any: The result of the last run.

Return type: Any

run(*args, **kwargs)

Default entry-point for running the pipeline. This method calls the _run method, which should be implemented in the derived classes. This method handles the status of the run, the caching of the result, and the saving of the run status.

Returns

Any: The result of the run, from the _run method.

Raises

Exception: Raises any exception that occurs during the run.

Return type: Any
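The bookkeeping described for run (recording a status and re-raising any exception) can be sketched as follows. StatusTracker is a hypothetical stand-in, and the status keys ("status", "start", "end", "error") are assumptions chosen for this example, not the keys minerva writes.

```python
import time
from typing import Any, Dict


class StatusTracker:
    """Hypothetical stand-in for the bookkeeping done around _run."""

    def __init__(self):
        self.run_count = 0
        self.run_status: Dict[str, Any] = {}

    def run(self, fn, *args, **kwargs) -> Any:
        self.run_count += 1
        self.run_status = {"status": "RUNNING", "start": time.time()}
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            self.run_status["status"] = "FAILED"
            self.run_status["error"] = repr(exc)
            raise  # propagate the original exception unchanged
        else:
            self.run_status["status"] = "SUCCESS"
            return result
        finally:
            # Runs on both paths, so the end time is always recorded.
            self.run_status["end"] = time.time()


tracker = StatusTracker()
ok = tracker.run(lambda x: x * 2, 21)
ok_status = dict(tracker.run_status)

try:
    tracker.run(lambda: 1 / 0)
except ZeroDivisionError:
    failed_status = dict(tracker.run_status)
```

The failure is recorded before the exception leaves run, so a caller that catches it can still inspect the status of the failed run.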

property run_status: Dict[str, Any]

Status of the last run of the pipeline.

Returns

Dict[str, Any]: Dictionary with the status of the last run.

Return type: Dict[str, Any]

seed

property system_info: Dict[str, Any]

System information about the host, the Python environment, and the git repository (if available).

Returns

Dict[str, Any]: The dictionary with the system information.

Return type: Dict[str, Any]

Parameters:

log_dir (Optional[minerva.utils.typing.PathLike])
ignore (Optional[Union[str, List[str]]])
cache_result (bool)
save_run_status (bool)
seed (Optional[int])