dasf.datasets.base
Base module for most of the DASF Datasets.
Classes
- Dataset: Class representing a generic dataset based on a TargeteredTransform object.
- DatasetArray: Class representing a dataset defined as an array of a defined shape.
- DatasetZarr: Class representing a dataset defined as a Zarr array of a defined shape.
- DatasetHDF5: Class representing a dataset defined as an HDF5 dataset of a defined shape.
- DatasetXarray: Class representing a dataset defined as an Xarray dataset of a defined shape.
- DatasetLabeled: A class representing a labeled dataset. Each item is a 2-element tuple of data and label.
- DatasetDataFrame: Class representing a dataset defined as a dataframe.
- DatasetParquet: Class representing a dataset defined as a Parquet file.
Module Contents
- class dasf.datasets.base.Dataset(name, download=False, root=None, *args, **kwargs)
Bases: dasf.transforms.base.TargeteredTransform
Class representing a generic dataset based on a TargeteredTransform object.
Parameters
- name (str)
Symbolic name of the dataset.
- download (bool)
Whether the dataset must be downloaded (the default is False).
- root (str)
Root download directory (the default is None).
- *args
Additional positional arguments.
- **kwargs
Additional keyword arguments.
Constructor of the object Dataset.
- _name
- _download
- _root
- _metadata
- _data = None
- _chunks = None
- __set_dataset_cache_dir()
Generate a cache directory in $HOME to store dataset(s).
- Parameters:
name (str)
download (bool)
root (str)
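The cache directory step described for `__set_dataset_cache_dir()` can be pictured with a minimal sketch. The helper name `dataset_cache_dir`, its `home` argument, and the sub-path `.cache/example/datasets` are all assumptions made for illustration; the path that DASF actually uses is not stated here.

```python
import os
import tempfile


def dataset_cache_dir(root=None, home=None):
    """Return (and create) a directory for cached datasets.

    Hypothetical helper: the sub-path ".cache/example/datasets" is an
    assumption for illustration, not the path DASF uses.
    """
    if root is None:
        # Fall back to the user's home directory when no root is given.
        home = home or os.path.expanduser("~")
        root = os.path.join(home, ".cache", "example", "datasets")
    os.makedirs(root, exist_ok=True)
    return root


# Demonstrate with a temporary directory standing in for $HOME.
with tempfile.TemporaryDirectory() as fake_home:
    cache = dataset_cache_dir(home=fake_home)
    cache_exists = os.path.isdir(cache)
    cache_is_under_home = cache.startswith(fake_home)
```

Passing an explicit `root` skips the home-directory fallback, which mirrors the role of the `root` constructor parameter.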
- class dasf.datasets.base.DatasetArray(name, download=False, root=None, chunks='auto')
Bases: Dataset
Class representing a dataset defined as an array of a defined shape.
Parameters
- name (str)
Symbolic name of the dataset.
- download (bool)
Whether the dataset must be downloaded (the default is False).
- root (str)
Root download directory (the default is None).
- chunks (Any)
Number of blocks of the array (the default is “auto”).
Constructor of the object DatasetArray.
- _chunks
- _root_file
- __operator_check__(other)
Check which type of data is being handled.
- Examples:
DatasetArray with array-like; or DatasetArray with DatasetArray
Parameters
- other (Any)
Array-like or DatasetArray operand for the operation.
Returns
- data (Any)
Data representing the internal array or the class itself.
- __array__(dtype=None)
The array interface is required to support most of the array functions.
Parameters
- dtype (Any)
Type of the internal array (the default is None; not used).
Returns
- data (Any)
Data representing the internal array or the class itself.
- __array_ufunc__(ufunc, method, *inputs, **kwargs)
Any class, array subclass or not, can define this method or set it to None in order to override the behavior of NumPy's ufuncs.
Parameters
- ufunc (Callable)
The ufunc object that was called.
- method (str)
A string indicating which ufunc method was called (one of “__call__”, “reduce”, “reduceat”, “accumulate”, “outer”, “inner”).
- inputs (Any)
A tuple of the input arguments to the ufunc.
- kwargs (Any)
A dictionary containing the optional input arguments of the ufunc. If given, any out arguments, both positional and keyword, are passed as a tuple in kwargs. See the discussion in Universal functions (ufunc) for details.
Returns
- array (array-like)
The result of the operation.
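To make the `__array__` and `__array_ufunc__` protocols concrete, here is a minimal sketch using NumPy only. `ArrayBox` is a hypothetical stand-in class, not the DatasetArray implementation, and it handles only the “__call__” method for brevity.

```python
import numpy as np


class ArrayBox:
    """Hypothetical stand-in for an array-backed dataset."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray(box) and similar functions see the inner array.
        if dtype is None:
            return self._data
        return self._data.astype(dtype)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls such as np.add(a, b) are handled in this sketch.
        if method != "__call__":
            return NotImplemented
        # Unwrap any ArrayBox inputs before delegating to NumPy.
        raw = [x._data if isinstance(x, ArrayBox) else x for x in inputs]
        return ArrayBox(ufunc(*raw, **kwargs))


box = ArrayBox([1.0, 4.0, 9.0])
roots = np.sqrt(box)          # dispatches through __array_ufunc__
total = np.add(box, 1)        # mixed ArrayBox / scalar input
as_array = np.asarray(total)  # dispatches through __array__
```

Returning a wrapper from `__array_ufunc__` keeps results in the same container type, while `__array__` gives NumPy functions a way back to the raw array.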
- __check_op_input(in_data)
Return the proper type of data for the operation, for example Result = DatasetArray + NumPy array, or Result = DatasetArray + DatasetArray.
Parameters
- in_data (Any)
Input data to be analyzed.
Returns
- data (Any)
Data representing the internal array or the class itself.
- __add__(other)
Internal function for adding two array datasets.
Parameters
- other (Any)
Data representing an array or a DatasetArray.
Returns
- DatasetArray
The sum of the two arrays.
- __sub__(other)
Internal function for subtracting two array datasets.
Parameters
- other (Any)
Data representing an array or a DatasetArray.
Returns
- DatasetArray
The difference of the two arrays.
- __mul__(other)
Internal function for multiplying two array datasets.
Parameters
- other (Any)
Data representing an array or a DatasetArray.
Returns
- DatasetArray
The product of the two arrays.
- __div__(other)
Internal function for dividing two array datasets.
Parameters
- other (Any)
Data representing an array or a DatasetArray.
Returns
- DatasetArray
The quotient of the two arrays.
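The arithmetic operators above follow a common pattern: unwrap the other operand if it is a dataset, apply the operation to the inner arrays, and wrap the result. A minimal sketch of that pattern follows; `ArrayBox` and `_unwrap` are hypothetical names, and the sketch uses `__truediv__` because Python 3 maps the `/` operator to that method rather than to `__div__`.

```python
import numpy as np


class ArrayBox:
    """Hypothetical array-backed dataset used only for illustration."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @staticmethod
    def _unwrap(other):
        # Return the inner array for ArrayBox operands, anything else as-is.
        return other._data if isinstance(other, ArrayBox) else other

    def __add__(self, other):
        return ArrayBox(self._data + self._unwrap(other))

    def __sub__(self, other):
        return ArrayBox(self._data - self._unwrap(other))

    def __mul__(self, other):
        return ArrayBox(self._data * self._unwrap(other))

    def __truediv__(self, other):
        # Python 3 maps the "/" operator to __truediv__, not __div__.
        return ArrayBox(self._data / self._unwrap(other))


a = ArrayBox([2.0, 4.0, 6.0])
b = ArrayBox([1.0, 2.0, 3.0])
summed = (a + b)._data
scaled = (a * np.array([10.0, 10.0, 10.0]))._data
ratio = (a / b)._data
```

The same unwrapping helper serves every operator, so mixed operands (wrapper with plain array) and matched operands (wrapper with wrapper) share one code path.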
- __copy_attrs_from_data()
Extends metadata to the new transformed object (after operations).
- __npy_header()
Read an array header from a file-like object.
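Reading only the header of a `.npy` file gives the shape and dtype without loading the array data. The sketch below shows one way to do this with the public helpers in `numpy.lib.format`; the function name `read_npy_header` is hypothetical, and this is not necessarily how `__npy_header()` is implemented.

```python
import io

import numpy as np


def read_npy_header(fileobj):
    """Return (shape, fortran_order, dtype) without reading the array data."""
    version = np.lib.format.read_magic(fileobj)
    if version == (1, 0):
        return np.lib.format.read_array_header_1_0(fileobj)
    if version == (2, 0):
        return np.lib.format.read_array_header_2_0(fileobj)
    raise ValueError(f"unsupported .npy format version: {version}")


# Save a small array into an in-memory buffer, then inspect its header.
buffer = io.BytesIO()
np.save(buffer, np.zeros((3, 5), dtype=np.float32))
buffer.seek(0)
shape, fortran_order, dtype = read_npy_header(buffer)
```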
- _lazy_load(xp, **kwargs)
Lazily load the dataset using a CPU Dask container.
Parameters
- xp (type)
Library used to load the file. It must follow the NumPy API.
- **kwargs
Additional keyword arguments passed to the load function.
Returns
- Any
The data (or a Future load object, for _lazy operations).
- _load(xp, **kwargs)
Load data using a CPU container.
Parameters
- xp (Module)
A module that loads data (it must implement a load function).
- **kwargs
Additional keyword arguments passed to xp.load.
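The `xp` argument lets the same loading code work with any module that exposes a NumPy-style `load` function. A minimal sketch of that dispatch, with NumPy standing in for `xp`, is shown below; `load_with` is a hypothetical helper and the lazy (Dask-based) variant is not covered.

```python
import os
import tempfile

import numpy as np


def load_with(xp, path, **kwargs):
    """Load `path` with any module exposing a NumPy-style `load` function."""
    return xp.load(path, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")
    np.save(path, np.arange(6).reshape(2, 3))
    # NumPy plays the role of `xp`; another module with the same
    # interface could be passed instead.
    loaded = load_with(np, path)
```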
- _load_meta()
Load metadata to inspect.
Returns
- dict
A dictionary with metadata information.
- Return type:
dict
- from_array(array)
Load data from an existing array.
Parameters
- array (array-like)
Input data to be initialized.
- Parameters:
name (str)
download (bool)
root (str)
- class dasf.datasets.base.DatasetZarr(name, download=False, root=None, backend=None, chunks=None)
Bases: Dataset
Class representing a dataset defined as a Zarr array of a defined shape.
Parameters
- name (str)
Symbolic name of the dataset.
- download (bool)
Whether the dataset must be downloaded (the default is False).
- root (str)
Root download directory (the default is None).
- chunks (Any)
Number of blocks of the array (the default is None).
Constructor of the object DatasetZarr.
- _backend
- _chunks
- _root_file
- _lazy_load(xp, **kwargs)
Lazily load the dataset using a CPU Dask container.
Parameters
- xp (type)
Library used to load the file. It must follow the NumPy API.
- **kwargs
Additional keyword arguments passed to the load function.
Returns
- Any
The data (or a Future load object, for _lazy operations).
- _load(xp, **kwargs)
Load data using a CPU container.
Parameters
- xp (Module)
A module that loads data (it must implement a load function).
- **kwargs
Additional keyword arguments passed to xp.load.
- _load_meta()
Load metadata to inspect.
Returns
- dict
A dictionary with metadata information.
- Return type:
dict
- __read_zarray(key)
Returns the value of the given key in the ZArray JSON metadata.
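In the Zarr v2 storage layout, array metadata such as `shape`, `chunks`, and `dtype` lives in a JSON file named `.zarray` inside the array directory. The sketch below reads one key from such a file with the standard library only; `read_zarray` is a hypothetical helper written for illustration, and it assumes a directory-based v2 store.

```python
import json
import os
import tempfile


def read_zarray(store_path, key):
    """Return one value from the `.zarray` JSON file of a Zarr v2 array."""
    with open(os.path.join(store_path, ".zarray")) as handle:
        return json.load(handle).get(key)


with tempfile.TemporaryDirectory() as store:
    # Write a minimal, hand-made metadata file for the demonstration.
    meta = {"zarr_format": 2, "shape": [100, 50], "chunks": [10, 50], "dtype": "<f4"}
    with open(os.path.join(store, ".zarray"), "w") as handle:
        json.dump(meta, handle)
    shape = tuple(read_zarray(store, "shape"))
    chunks = tuple(read_zarray(store, "chunks"))
```

Reading the JSON directly avoids opening the array itself, which is useful when only the shape or chunking is needed.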
- property shape: tuple
Returns the shape of an array.
Returns
- tuple
A tuple with the shape.
- Return type:
tuple
- metadata()
Return a dictionary with all metadata information from data.
Returns
- dict
A dictionary with metadata information.
- Return type:
dict
- __check_op_input(in_data)
Return the proper type of data for the operation, for example Result = DatasetZarr + NumPy array, or Result = DatasetZarr + DatasetZarr.
Parameters
- in_data (Any)
Input data to be analyzed.
Returns
- data (Any)
Data representing the internal array or the class itself.
- __add__(other)
Internal function for adding two array datasets.
Parameters
- other (Any)
Data representing an array or a DatasetArray.
Returns
- DatasetArray
The sum of the two arrays.
- __sub__(other)
Internal function for subtracting two array datasets.
Parameters
- other (Any)
Data representing an array or a DatasetArray.
Returns
- DatasetArray
The difference of the two arrays.
- __mul__(other)
Internal function for multiplying two array datasets.
Parameters
- other (Any)
Data representing an array or a DatasetArray.
Returns
- DatasetArray
The product of the two arrays.
- __div__(other)
Internal function for dividing two array datasets.
Parameters
- other (Any)
Data representing an array or a DatasetArray.
Returns
- DatasetArray
The quotient of the two arrays.
- __copy_attrs_from_data()
Extends metadata to the new transformed object (after operations).
- Parameters:
name (str)
download (bool)
root (str)
backend (str)
- class dasf.datasets.base.DatasetHDF5(name, download=False, root=None, chunks='auto', dataset_path=None)
Bases: Dataset
Class representing a dataset defined as an HDF5 dataset of a defined shape.
Parameters
- name (str)
Symbolic name of the dataset.
- download (bool)
Whether the dataset must be downloaded (the default is False).
- root (str)
Root download directory (the default is None).
- chunks (Any)
Number of blocks of the array (the default is “auto”).
- dataset_path (str)
Relative path of the internal HDF5 dataset (the default is None).
Constructor of the object DatasetHDF5.
- _chunks
- _root_file
- _dataset_path
- _lazy_load(xp, **kwargs)
Lazily load the dataset using a CPU Dask container.
Parameters
- xp (type)
Library used to load the file. It must follow the NumPy API.
- **kwargs
Additional keyword arguments passed to the load function.
Returns
- Any
The data (or a Future load object, for _lazy operations).
- _load(xp=None, **kwargs)
Load data using a CPU container.
Parameters
- xp (Module)
A module that loads data (it must implement a load function). Placeholder.
- **kwargs
Additional keyword arguments passed to xp.load.
- Parameters:
name (str)
download (str)
root (str)
dataset_path (str)
- class dasf.datasets.base.DatasetXarray(name, download=False, root=None, chunks=None, data_var=None)
Bases: Dataset
Class representing a dataset defined as an Xarray dataset of a defined shape.
Parameters
- name (str)
Symbolic name of the dataset.
- download (bool)
Whether the dataset must be downloaded (the default is False).
- root (str)
Root download directory (the default is None).
- chunks (Any)
Number of blocks of the array (the default is None).
- data_var (Any)
Key (or index) of the internal Xarray dataset (the default is None).
Constructor of the object DatasetXarray.
- _chunks
- _root_file
- _data_var
- _load_meta()
Load metadata to inspect.
Returns
- dict
A dictionary with metadata information.
- Return type:
dict
- Parameters:
name (str)
download (bool)
root (str)
- class dasf.datasets.base.DatasetLabeled(name, download=False, root=None, chunks='auto')
Bases: Dataset
A class representing a labeled dataset. Each item is a 2-element tuple, where the first element is an array of data and the second element is the respective label. The items can be accessed with dataset[x].
Parameters
- name (str)
Symbolic name of the dataset.
- download (bool)
Whether the dataset must be downloaded (the default is False).
- root (str)
Root download directory (the default is None).
- chunks (Any)
Number of blocks of the array (the default is “auto”).
Attributes
- __chunks (type)
Description of attribute __chunks.
Constructor of the object DatasetLabeled.
- _chunks
- metadata()
Return a dictionary with all metadata information from data (train and labels).
Returns
- dict
A dictionary with metadata information.
- Return type:
dict
- _lazy_load(xp, **kwargs)
Lazily load the dataset using a CPU Dask container.
Parameters
- xp (type)
Library used to load the file. It must follow the NumPy API.
- **kwargs
Additional keyword arguments passed to the load function.
Returns
- Tuple
A Future object that will return a tuple: (data, label).
- Return type:
tuple
- _load(xp, **kwargs)
Load data using a CPU container.
Parameters
- xp (Module)
A module that loads data (it must implement a load function).
- **kwargs
Additional keyword arguments passed to xp.load.
Returns
- Tuple
A 2-element tuple: (data, label)
- Return type:
tuple
- Parameters:
name (str)
download (bool)
root (str)
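The (data, label) item layout described for DatasetLabeled can be illustrated with a small NumPy-only sketch. `LabeledPairs` is a hypothetical container written for this example and does not reproduce the loading or chunking behavior of the real class.

```python
import numpy as np


class LabeledPairs:
    """Hypothetical container pairing a data array with a label array."""

    def __init__(self, data, labels):
        self._data = np.asarray(data)
        self._labels = np.asarray(labels)
        if len(self._data) != len(self._labels):
            raise ValueError("data and labels must have the same length")

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        # Each item is a 2-element tuple: (data, label).
        return self._data[index], self._labels[index]


pairs = LabeledPairs(np.arange(12).reshape(4, 3), [0, 1, 1, 0])
sample, label = pairs[2]
```

Indexing returns the data row and its label together, so the two can never drift out of alignment in downstream code.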
- class dasf.datasets.base.DatasetDataFrame(name, download=True, root=None, chunks='auto')
Bases: Dataset
Class representing a dataset defined as a dataframe.
Parameters
- name (str)
Symbolic name of the dataset.
- download (bool)
Whether the dataset must be downloaded (the default is True).
- root (str)
Root download directory (the default is None).
- chunks (Any)
Number of blocks of the array (the default is “auto”).
Constructor of the object DatasetDataFrame.
- _chunks
- _root_file
- _load_meta()
Load metadata to inspect.
Returns
- dict
A dictionary with metadata information.
- Return type:
dict
- metadata()
Return a dictionary with all metadata information from data.
Returns
- dict
A dictionary with metadata information.
- Return type:
dict
- Parameters:
name (str)
download (bool)
root (str)
- class dasf.datasets.base.DatasetParquet(name, download=True, root=None, chunks='auto')
Bases: DatasetDataFrame
Class representing a dataset defined as a Parquet file.
Parameters
- name (str)
Symbolic name of the dataset.
- download (bool)
Whether the dataset must be downloaded (the default is True).
- root (str)
Root download directory (the default is None).
- chunks (Any)
Number of blocks of the array (the default is “auto”).
Constructor of the object DatasetParquet.
- Parameters:
name (str)
download (bool)
root (str)