dasf.datasets.base ================== .. py:module:: dasf.datasets.base .. autoapi-nested-parse:: Base module for most of the DASF Datasets. Classes ------- .. autoapisummary:: dasf.datasets.base.Dataset dasf.datasets.base.DatasetArray dasf.datasets.base.DatasetZarr dasf.datasets.base.DatasetHDF5 dasf.datasets.base.DatasetXarray dasf.datasets.base.DatasetLabeled dasf.datasets.base.DatasetDataFrame dasf.datasets.base.DatasetParquet Module Contents --------------- .. py:class:: Dataset(name, download = False, root = None, *args, **kwargs) Bases: :py:obj:`dasf.transforms.base.TargeteredTransform` Class representing a generic dataset based on a TargeteredTransform object. Parameters ---------- name : str Symbolic name of the dataset. download : bool If the dataset must be downloaded (the default is False). root : str Root download directory (the default is None). *args : type Additional arguments without keys. **kwargs : type Additional keyworkded arguments. Constructor of the object Dataset. .. py:attribute:: _name .. py:attribute:: _download .. py:attribute:: _root .. py:attribute:: _metadata .. py:attribute:: _data :value: None .. py:attribute:: _chunks :value: None .. py:method:: __set_dataset_cache_dir() Generate cached directory in $HOME to store dataset(s). .. py:method:: download() Skeleton of the download method. .. py:method:: __len__() Return internal data length. .. py:method:: __getitem__(idx) Generic __getitem__() function based on internal data. Parameters ---------- idx : Any Key of the fetched data. It can be an integer or a tuple. .. py:class:: DatasetArray(name, download = False, root = None, chunks='auto') Bases: :py:obj:`Dataset` Class representing an dataset wich is defined as an array of a defined shape. Parameters ---------- name : str Symbolic name of the dataset. download : bool If the dataset must be downloaded (the default is False). root : str Root download directory (the default is None). chunks : Any Number of blocks of the array (the default is "auto"). Constructor of the object DatasetArray. .. py:attribute:: _chunks .. py:attribute:: _root_file .. py:method:: __operator_check__(other) Check what type of the data we are handling Examples: DatasetArray with array-like; or DatasetArray with DatasetArray Parameters ---------- other : Any array-like of DatasetArray for the operation. Returns ------- data : Any A data representing the internal array or the class itself. .. py:method:: __repr__() Return a class representation based on internal array. .. py:method:: __array__(dtype=None) Array interface is required to support most of the array functions. Parameters ---------- dtype : Any Type of the internal array, default=None (not used) Returns ------- data : Any A data representing the internal array or the class itself. .. py:method:: __array_ufunc__(ufunc, method, *inputs, **kwargs) Any class, array subclass or not, can define this method or set it to None in order to override the behavior of Arrays ufuncs. Parameters ---------- ufunc : Callable The ufunc object that was called. method : Str A string indicating which Ufunc method was called (one of "__call__", "reduce", "reduceat", "accumulate", "outer", "inner"). inputs : Any A tuple of the input arguments to the ufunc. kwargs : Any A dictionary containing the optional input arguments of the ufunc. If given, any out arguments, both positional and keyword, are passed as a tuple in kwargs. See the discussion in Universal functions (ufunc) for details. Returns ------- array : array-like The return either the result of the operation. .. py:method:: __check_op_input(in_data) Return the proper type of data for operation >>> Result = DatasetArray + Numpy; or >>> Result = DatasetArray + DatasetArray Parameters ---------- in_data : Any Input data to be analyzed. Returns ------- data : Any A data representing the internal array or the class itself. .. py:method:: __add__(other) Internal function of adding two array datasets. Parameters ---------- other : Any A data representing an array or a DatasetArray. Returns ------- DatasetArray A sum with two arrays. .. py:method:: __sub__(other) Internal function of subtracting two array datasets. Parameters ---------- other : Any A data representing an array or a DatasetArray. Returns ------- DatasetArry A subtraction of two arrays. .. py:method:: __mul__(other) Internal function of multiplication two array datasets. Parameters ---------- other : Any A data representing an array or a DatasetArray. Returns ------- DatasetArry A multiplication of two arrays. .. py:method:: __div__(other) Internal function of division two array datasets. Parameters ---------- other : Any A data representing an array or a DatasetArray. Returns ------- DatasetArry A division of two arrays. .. py:method:: __copy_attrs_from_data() Extends metadata to new transformed object (after operations). .. py:method:: __npy_header() Read an array header from a filelike object. .. py:method:: _lazy_load(xp, **kwargs) Lazy load the dataset using an CPU dask container. Parameters ---------- xp : type Library used to load the file. It must follow numpy library. **kwargs : type Additional keyworkded arguments to the load. Returns ------- Any The data (or a Future load object, for `_lazy` operations). .. py:method:: _load(xp, **kwargs) Load data using CPU container. Parameters ---------- xp : Module A module that load data (implement `load` function) **kwargs : type Additional `kwargs` to `xp.load` function. .. py:method:: _load_meta() Load metadata to inspect. Returns ------- dict A dictionary with metadata information. .. py:method:: _lazy_load_gpu() Load data with GPU container + DASK. (It does not load immediattly) .. py:method:: _lazy_load_cpu() Load data with CPU container + DASK. (It does not load immediattly) .. py:method:: _load_gpu() Load data with GPU container (e.g. cupy). .. py:method:: _load_cpu() Load data with CPU container (e.g. numpy). .. py:method:: from_array(array) Load data from an existing array. Parameters ---------- array : array-like Input data to be initialized. .. py:method:: load() Placeholder for load function. .. py:property:: shape :type: tuple Returns the shape of an array. Returns ------- tuple A tuple with the shape. .. py:method:: metadata() Return a dictionary with all metadata information from data. Returns ------- dict A dictionary with metadata information. .. py:class:: DatasetZarr(name, download = False, root = None, backend = None, chunks=None) Bases: :py:obj:`Dataset` Class representing an dataset wich is defined as a Zarr array of a defined shape. Parameters ---------- name : str Symbolic name of the dataset. download : bool If the dataset must be downloaded (the default is False). root : str Root download directory (the default is None). chunks : Any Number of blocks of the array (the default is "auto"). Constructor of the object DatasetZarr. .. py:attribute:: _backend .. py:attribute:: _chunks .. py:attribute:: _root_file .. py:method:: _lazy_load(xp, **kwargs) Lazy load the dataset using an CPU dask container. Parameters ---------- xp : type Library used to load the file. It must follow numpy library. **kwargs : type Additional keyworkded arguments to the load. Returns ------- Any The data (or a Future load object, for `_lazy` operations). .. py:method:: _load(xp, **kwargs) Load data using CPU container. Parameters ---------- xp : Module A module that load data (implement `load` function) **kwargs : type Additional `kwargs` to `xp.load` function. .. py:method:: _lazy_load_cpu() Load data with CPU container + DASK. (It does not load immediattly) .. py:method:: _lazy_load_gpu() Load data with GPU container + DASK. (It does not load immediattly) .. py:method:: _load_cpu() Load data with CPU container (e.g. numpy). .. py:method:: _load_gpu() Load data with GPU container (e.g. cupy). .. py:method:: load() Placeholder for load function. .. py:method:: _load_meta() Load metadata to inspect. Returns ------- dict A dictionary with metadata information. .. py:method:: __read_zarray(key) Returns the value of ZArray JSON metadata. .. py:property:: shape :type: tuple Returns the shape of an array. Returns ------- tuple A tuple with the shape. .. py:property:: chunksize Returns the chunksize of an array. Returns ------- tuple A tuple with the chunksize. .. py:method:: metadata() Return a dictionary with all metadata information from data. Returns ------- dict A dictionary with metadata information. .. py:method:: __repr__() Return a class representation based on internal array. .. py:method:: __check_op_input(in_data) Return the proper type of data for operation >>> Result = DatasetZarr + Numpy; or >>> Result = DatasetZarr + DatasetZarr Parameters ---------- in_data : Any Input data to be analyzed. Returns ------- data : Any A data representing the internal array or the class itself. .. py:method:: __add__(other) Internal function of adding two array datasets. Parameters ---------- other : Any A data representing an array or a DatasetArray. Returns ------- DatasetArry A sum with two arrays. .. py:method:: __sub__(other) Internal function of subtracting two array datasets. Parameters ---------- other : Any A data representing an array or a DatasetArray. Returns ------- DatasetArry A subtraction of two arrays. .. py:method:: __mul__(other) Internal function of multiplication two array datasets. Parameters ---------- other : Any A data representing an array or a DatasetArray. Returns ------- DatasetArry A multiplication of two arrays. .. py:method:: __div__(other) Internal function of division two array datasets. Parameters ---------- other : Any A data representing an array or a DatasetArray. Returns ------- DatasetArry A division of two arrays. .. py:method:: __copy_attrs_from_data() Extends metadata to new transformed object (after operations). .. py:class:: DatasetHDF5(name, download = False, root = None, chunks='auto', dataset_path = None) Bases: :py:obj:`Dataset` Class representing an dataset wich is defined as a HDF5 dataset of a defined shape. Parameters ---------- name : str Symbolic name of the dataset. download : bool If the dataset must be downloaded (the default is False). root : str Root download directory (the default is None). chunks : Any Number of blocks of the array (the default is "auto"). dataset_path : str Relative path of the internal HDF5 dataset (the default is None). Constructor of the object DatasetHDF5. .. py:attribute:: _chunks .. py:attribute:: _root_file .. py:attribute:: _dataset_path .. py:method:: _lazy_load(xp, **kwargs) Lazy load the dataset using an CPU dask container. Parameters ---------- xp : type Library used to load the file. It must follow numpy library. **kwargs : type Additional keyworkded arguments to the load. Returns ------- Any The data (or a Future load object, for `_lazy` operations). .. py:method:: _load(xp=None, **kwargs) Load data using CPU container. Parameters ---------- xp : Module A module that load data (implement `load` function) (placeholder). **kwargs : type Additional `kwargs` to `xp.load` function. .. py:method:: _lazy_load_cpu() Load data with CPU container + DASK. (It does not load immediattly) .. py:method:: _lazy_load_gpu() Load data with GPU container + DASK. (It does not load immediattly) .. py:method:: _load_cpu() Load data with CPU container (e.g. numpy). .. py:method:: _load_gpu() Load data with GPU container (e.g. cupy). .. py:method:: load() Placeholder for load function. .. py:method:: _load_meta() Load metadata to inspect. Returns ------- dict A dictionary with metadata information. .. py:method:: metadata() Return a dictionary with all metadata information from data. Returns ------- dict A dictionary with metadata information. .. py:class:: DatasetXarray(name, download = False, root = None, chunks=None, data_var=None) Bases: :py:obj:`Dataset` Class representing an dataset wich is defined as a Xarray dataset of a defined shape. Parameters ---------- name : str Symbolic name of the dataset. download : bool If the dataset must be downloaded (the default is False). root : str Root download directory (the default is None). chunks : Any Number of blocks of the array (the default is "auto"). data_var : Any Key (or index) of the internal Xarray dataset (the default is None). Constructor of the object DatasetXarray. .. py:attribute:: _chunks .. py:attribute:: _root_file .. py:attribute:: _data_var .. py:method:: _lazy_load_cpu() Load data with CPU container + DASK. (It does not load immediattly) .. py:method:: _lazy_load_gpu() Load data with GPU container + DASK. (It does not load immediattly) .. py:method:: _load_cpu() Load data with CPU container (e.g. numpy). .. py:method:: _load_gpu() Load data with GPU container (e.g. cupy). .. py:method:: load() Placeholder for load function. .. py:method:: _load_meta() Load metadata to inspect. Returns ------- dict A dictionary with metadata information. .. py:method:: metadata() Return a dictionary with all metadata information from data. Returns ------- dict A dictionary with metadata information. .. py:method:: __len__() Return internal data length. .. py:method:: __getitem__(idx) A __getitem__() function based on internal Xarray data. Parameters ---------- idx : Any Key of the fetched data. It can be an integer or a tuple. .. py:class:: DatasetLabeled(name, download = False, root = None, chunks='auto') Bases: :py:obj:`Dataset` A class representing a labeled dataset. Each item is a 2-element tuple, where the first element is a array of data and the second element is the respective label. The items can be accessed from `dataset[x]`. Parameters ---------- name : str Symbolic name of the dataset. download : bool If the dataset must be downloaded (the default is False). root : str Root download directory (the default is None). chunks : Any Number of blocks of the array (the default is "auto"). Attributes ---------- __chunks : type Description of attribute `__chunks`. Constructor of the object DatasetLabeled. .. py:attribute:: _chunks .. py:method:: download() Download the dataset. .. py:method:: metadata() Return a dictionary with all metadata information from data (train and labels). Returns ------- dict A dictionary with metadata information. .. py:method:: _lazy_load(xp, **kwargs) Lazy load the dataset using an CPU dask container. Parameters ---------- xp : type Library used to load the file. It must follow numpy library. **kwargs : type Additional keyworkded arguments to the load. Returns ------- Tuple A Future object that will return a tuple: (data, label). .. py:method:: _load(xp, **kwargs) Load data using CPU container. Parameters ---------- xp : Module A module that load data (implement `load` function) **kwargs : type Additional `kwargs` to `xp.load` function. Returns ------- Tuple A 2-element tuple: (data, label) .. py:method:: _load_meta() Load metadata to inspect. Returns ------- dict A dictionary with metadata information. .. py:method:: _lazy_load_gpu() Load data with GPU container + DASK. (It does not load immediattly) .. py:method:: _lazy_load_cpu() Load data with CPU container + DASK. (It does not load immediattly) .. py:method:: _load_gpu() Load data with GPU container (e.g. cupy). .. py:method:: _load_cpu() Load data with CPU container (e.g. numpy). .. py:method:: load() Placeholder for load function. .. py:method:: __getitem__(idx) A __getitem__() function for data and labeled data together. Parameters ---------- idx : Any Key of the fetched data. It can be an integer or a tuple. .. py:class:: DatasetDataFrame(name, download = True, root = None, chunks='auto') Bases: :py:obj:`Dataset` Class representing an dataset wich is defined as a dataframe. Parameters ---------- name : str Symbolic name of the dataset. download : bool If the dataset must be downloaded (the default is False). root : str Root download directory (the default is None). chunks : Any Number of blocks of the array (the default is "auto"). Constructor of the object DatasetDataFrame. .. py:attribute:: _chunks .. py:attribute:: _root_file .. py:method:: _load_meta() Load metadata to inspect. Returns ------- dict A dictionary with metadata information. .. py:method:: metadata() Return a dictionary with all metadata information from data. Returns ------- dict A dictionary with metadata information. .. py:method:: _lazy_load_gpu() Load data with GPU container + DASK. (It does not load immediattly) .. py:method:: _lazy_load_cpu() Load data with CPU container + DASK. (It does not load immediattly) .. py:method:: _load_gpu() Load data with GPU container (e.g. CuDF). .. py:method:: _load_cpu() Load data with CPU container (e.g. pandas). .. py:method:: load() Placeholder for load function. .. py:property:: shape :type: tuple Returns the shape of an array. Returns ------- tuple A tuple with the shape. .. py:method:: __len__() Return internal data length. .. py:method:: __getitem__(idx) A __getitem__() function based on internal dataframe. Parameters ---------- idx : Any Key of the fetched data. It can be an integer or a tuple. .. py:class:: DatasetParquet(name, download = True, root = None, chunks='auto') Bases: :py:obj:`DatasetDataFrame` Class representing an dataset wich is defined as a Parquet. Parameters ---------- name : str Symbolic name of the dataset. download : bool If the dataset must be downloaded (the default is False). root : str Root download directory (the default is None). chunks : Any Number of blocks of the array (the default is "auto"). Constructor of the object DatasetParquet. .. py:method:: _lazy_load_gpu() Load data with GPU container + DASK. (It does not load immediattly) .. py:method:: _lazy_load_cpu() Load data with CPU container + DASK. (It does not load immediattly) .. py:method:: _load_gpu() Load data with GPU container (e.g. CuDF). .. py:method:: _load_cpu() Load data with CPU container (e.g. pandas).