dasf.ml.dl.pytorch_lightning
============================

.. py:module:: dasf.ml.dl.pytorch_lightning


Classes
-------

.. autoapisummary::

   dasf.ml.dl.pytorch_lightning.TorchDataLoader
   dasf.ml.dl.pytorch_lightning.NeuralNetClassifier


Functions
---------

.. autoapisummary::

   dasf.ml.dl.pytorch_lightning.run_dask_clustered
   dasf.ml.dl.pytorch_lightning.fit


Module Contents
---------------

.. py:class:: TorchDataLoader(train, val=None, test=None, batch_size=64)

   Bases: :py:obj:`pytorch_lightning.LightningDataModule`

   A DataModule standardizes the training, val, test splits, data preparation and
   transforms. The main advantage is consistent data splits, data preparation and
   transforms across models.

   Example::

       class MyDataModule(LightningDataModule):
           def __init__(self):
               super().__init__()

           def prepare_data(self):
               # download, split, etc...
               # only called on 1 GPU/TPU in distributed

           def setup(self, stage):
               # make assignments here (val/train/test split)
               # called on every process in DDP

           def train_dataloader(self):
               train_split = Dataset(...)
               return DataLoader(train_split)

           def val_dataloader(self):
               val_split = Dataset(...)
               return DataLoader(val_split)

           def test_dataloader(self):
               test_split = Dataset(...)
               return DataLoader(test_split)

           def teardown(self):
               # clean up after fit or test
               # called on every process in DDP

   Attributes:
       prepare_data_per_node:
           If True, each LOCAL_RANK=0 will call prepare data.
           Otherwise only NODE_RANK=0, LOCAL_RANK=0 will prepare data.
       allow_zero_length_dataloader_with_multiple_devices:
           If True, dataloader with zero length within local rank is allowed.
           Default value is False.


   .. py:attribute:: _train


   .. py:attribute:: _val


   .. py:attribute:: _test


   .. py:attribute:: _batch_size


   .. py:method:: prepare_data()

      Use this to download and prepare data. Downloading and saving data with multiple
      processes (distributed settings) will result in corrupted data. Lightning ensures
      this method is called only within a single process, so you can safely add your
      downloading logic within.

      .. warning:: DO NOT set state to the model (use ``setup`` instead)
         since this is NOT called on every device

      Example::

          def prepare_data(self):
              # good
              download_data()
              tokenize()
              etc()

              # bad
              self.split = data_split
              self.some_state = some_other_state()

      In a distributed environment, ``prepare_data`` can be called in two ways
      (using :ref:`prepare_data_per_node`)

      1. Once per node. This is the default and is only called on LOCAL_RANK=0.
      2. Once in total. Only called on GLOBAL_RANK=0.

      Example::

          # DEFAULT
          # called once per node on LOCAL_RANK=0 of that node
          class LitDataModule(LightningDataModule):
              def __init__(self):
                  super().__init__()
                  self.prepare_data_per_node = True


          # call on GLOBAL_RANK=0 (great for shared file systems)
          class LitDataModule(LightningDataModule):
              def __init__(self):
                  super().__init__()
                  self.prepare_data_per_node = False

      This is called before requesting the dataloaders:

      .. code-block:: python

          model.prepare_data()
          initialize_distributed()
          model.setup(stage)
          model.train_dataloader()
          model.val_dataloader()
          model.test_dataloader()
          model.predict_dataloader()


   .. py:method:: setup(stage=None)

      Called at the beginning of fit (train + validate), validate, test, or predict.
      This is a good hook when you need to build models dynamically or adjust something
      about them. This hook is called on every process when using DDP.
      Args:
          stage: either ``'fit'``, ``'validate'``, ``'test'``, or ``'predict'``

      Example::

          class LitModel(...):
              def __init__(self):
                  self.l1 = None

              def prepare_data(self):
                  download_data()
                  tokenize()

                  # don't do this
                  self.something = else

              def setup(self, stage):
                  data = load_data(...)
                  self.l1 = nn.Linear(28, data.num_classes)


   .. py:method:: train_dataloader()

      Implement one or more PyTorch DataLoaders for training.

      Return:
          A collection of :class:`torch.utils.data.DataLoader` specifying training samples.
          In the case of multiple dataloaders, please see this :ref:`section `.

      The dataloader you return will not be reloaded unless you set
      :paramref:`~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs` to
      a positive integer.

      For data processing use the following pattern:

      - download in :meth:`prepare_data`
      - process and split in :meth:`setup`

      However, the above are only necessary for distributed processing.

      .. warning:: do not assign state in prepare_data

      - :meth:`~pytorch_lightning.trainer.trainer.Trainer.fit`
      - :meth:`prepare_data`
      - :meth:`setup`

      Note:
          Lightning adds the correct sampler for distributed and arbitrary hardware.
          There is no need to set it yourself.

      Example::

          # single dataloader
          def train_dataloader(self):
              transform = transforms.Compose([transforms.ToTensor(),
                                              transforms.Normalize((0.5,), (1.0,))])
              dataset = MNIST(root='/path/to/mnist/', train=True, transform=transform,
                              download=True)
              loader = torch.utils.data.DataLoader(
                  dataset=dataset,
                  batch_size=self.batch_size,
                  shuffle=True
              )
              return loader

          # multiple dataloaders, return as list
          def train_dataloader(self):
              mnist = MNIST(...)
              cifar = CIFAR(...)
              mnist_loader = torch.utils.data.DataLoader(
                  dataset=mnist, batch_size=self.batch_size, shuffle=True
              )
              cifar_loader = torch.utils.data.DataLoader(
                  dataset=cifar, batch_size=self.batch_size, shuffle=True
              )
              # each batch will be a list of tensors: [batch_mnist, batch_cifar]
              return [mnist_loader, cifar_loader]

          # multiple dataloaders, return as dict
          def train_dataloader(self):
              mnist = MNIST(...)
              cifar = CIFAR(...)
              mnist_loader = torch.utils.data.DataLoader(
                  dataset=mnist, batch_size=self.batch_size, shuffle=True
              )
              cifar_loader = torch.utils.data.DataLoader(
                  dataset=cifar, batch_size=self.batch_size, shuffle=True
              )
              # each batch will be a dict of tensors: {'mnist': batch_mnist, 'cifar': batch_cifar}
              return {'mnist': mnist_loader, 'cifar': cifar_loader}


   .. py:method:: val_dataloader()

      Implement one or multiple PyTorch DataLoaders for validation.

      The dataloader you return will not be reloaded unless you set
      :paramref:`~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs` to
      a positive integer.

      It's recommended that all data downloads and preparation happen in :meth:`prepare_data`.

      - :meth:`~pytorch_lightning.trainer.trainer.Trainer.fit`
      - :meth:`~pytorch_lightning.trainer.trainer.Trainer.validate`
      - :meth:`prepare_data`
      - :meth:`setup`

      Note:
          Lightning adds the correct sampler for distributed and arbitrary hardware.
          There is no need to set it yourself.

      Return:
          A :class:`torch.utils.data.DataLoader` or a sequence of them specifying validation samples.
      Examples::

          def val_dataloader(self):
              transform = transforms.Compose([transforms.ToTensor(),
                                              transforms.Normalize((0.5,), (1.0,))])
              dataset = MNIST(root='/path/to/mnist/', train=False,
                              transform=transform, download=True)
              loader = torch.utils.data.DataLoader(
                  dataset=dataset,
                  batch_size=self.batch_size,
                  shuffle=False
              )
              return loader

          # can also return multiple dataloaders
          def val_dataloader(self):
              return [loader_a, loader_b, ..., loader_n]

      Note:
          If you don't need a validation dataset and a :meth:`validation_step`, you don't
          need to implement this method.

      Note:
          In the case where you return multiple validation dataloaders, the
          :meth:`validation_step` will have an argument ``dataloader_idx`` which matches
          the order here.


   .. py:method:: test_dataloader()

      Implement one or multiple PyTorch DataLoaders for testing.

      For data processing use the following pattern:

      - download in :meth:`prepare_data`
      - process and split in :meth:`setup`

      However, the above are only necessary for distributed processing.

      .. warning:: do not assign state in prepare_data

      - :meth:`~pytorch_lightning.trainer.trainer.Trainer.test`
      - :meth:`prepare_data`
      - :meth:`setup`

      Note:
          Lightning adds the correct sampler for distributed and arbitrary hardware.
          There is no need to set it yourself.

      Return:
          A :class:`torch.utils.data.DataLoader` or a sequence of them specifying testing samples.

      Example::

          def test_dataloader(self):
              transform = transforms.Compose([transforms.ToTensor(),
                                              transforms.Normalize((0.5,), (1.0,))])
              dataset = MNIST(root='/path/to/mnist/', train=False,
                              transform=transform, download=True)
              loader = torch.utils.data.DataLoader(
                  dataset=dataset,
                  batch_size=self.batch_size,
                  shuffle=False
              )
              return loader

          # can also return multiple dataloaders
          def test_dataloader(self):
              return [loader_a, loader_b, ..., loader_n]

      Note:
          If you don't need a test dataset and a :meth:`test_step`, you don't need to
          implement this method.

      Note:
          In the case where you return multiple test dataloaders, the :meth:`test_step`
          will have an argument ``dataloader_idx`` which matches the order here.


.. py:function:: run_dask_clustered(func, client=None, **kwargs)


.. py:function:: fit(model, X, y, max_iter, accel, strategy, devices, ngpus, batch_size=32, plugins=None, meta=None)


.. py:class:: NeuralNetClassifier(model, max_iter=100, batch_size=32)

   Bases: :py:obj:`dasf.transforms.base.Fit`

   Class representing a Fit operation of the pipeline.


   .. py:attribute:: _model


   .. py:attribute:: _accel
      :value: None


   .. py:attribute:: _strategy
      :value: None


   .. py:attribute:: _max_iter


   .. py:attribute:: _devices
      :value: 0


   .. py:attribute:: _ngpus
      :value: 0


   .. py:attribute:: _batch_size


   .. py:attribute:: __trainer
      :value: False


   .. py:attribute:: __handler


   .. py:method:: _lazy_fit_generic(X, y, accel, ngpus)


   .. py:method:: _lazy_fit_gpu(X, y=None)

      Respective lazy fit mocked function for GPUs.


   .. py:method:: _lazy_fit_cpu(X, y=None)

      Respective lazy fit mocked function for CPUs.


   .. py:method:: __fit_generic(X, y, accel, ngpus)


   .. py:method:: _fit_gpu(X, y=None)

      Respective immediate fit mocked function for local GPU(s).


   .. py:method:: _fit_cpu(X, y=None)

      Respective immediate fit mocked function for local CPU(s).
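
The snippet below is a minimal sketch of how ``TorchDataLoader`` can wrap already-split
PyTorch datasets into a ``LightningDataModule``, based only on the constructor signature
shown above. The MNIST dataset, the transform, and the local path are illustrative
assumptions, not part of the DASF API.

.. code-block:: python

    from torchvision import datasets, transforms

    from dasf.ml.dl.pytorch_lightning import TorchDataLoader

    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])

    # Any map-style torch.utils.data.Dataset should work here; MNIST is only an example.
    train_set = datasets.MNIST(root="/path/to/mnist/", train=True,
                               transform=transform, download=True)
    val_set = datasets.MNIST(root="/path/to/mnist/", train=False,
                             transform=transform, download=True)

    # Keep the train/val split and batch size consistent for Lightning.
    datamodule = TorchDataLoader(train=train_set, val=val_set, batch_size=64)

    # Normally the Lightning Trainer calls these hooks; calling them directly
    # is only to illustrate what the datamodule produces.
    train_loader = datamodule.train_dataloader()
    val_loader = datamodule.val_dataloader()

A ``test`` dataset can be passed the same way when a test split is available;
``batch_size`` defaults to 64, as in the signature above.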
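
``NeuralNetClassifier`` and the module-level ``fit`` helper carry no usage notes of their
own, so the sketch below shows one plausible way to drive them. The ``LitClassifier``
module, the random arrays, and the assumption that ``_fit_cpu`` accepts raw NumPy inputs
are all illustrative guesses about DASF's internals; in a real workflow the object would
normally be added to a DASF pipeline, which selects the appropriate ``_fit_*`` or
``_lazy_fit_*`` variant for the executor instead of it being called by hand.

.. code-block:: python

    import numpy as np
    import pytorch_lightning as pl
    import torch
    from torch import nn
    from torch.nn import functional as F

    from dasf.ml.dl.pytorch_lightning import NeuralNetClassifier


    class LitClassifier(pl.LightningModule):
        # Minimal illustrative LightningModule; the architecture is arbitrary.
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(28 * 28, 10)

        def forward(self, x):
            return self.layer(x.flatten(1))

        def training_step(self, batch, batch_idx):
            x, y = batch
            return F.cross_entropy(self(x), y)

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=1e-3)


    # max_iter is the training iteration budget (assumed to map to Lightning epochs);
    # batch_size defaults to 32 as in the signature above.
    clf = NeuralNetClassifier(LitClassifier(), max_iter=5, batch_size=32)

    # Placeholder data; the lazy variants suggest Dask-backed arrays are also expected.
    X = np.random.rand(1024, 1, 28, 28).astype(np.float32)
    y = np.random.randint(0, 10, size=1024)

    # Immediate (non-lazy) fit on local CPUs; a DASF pipeline normally makes this call.
    clf._fit_cpu(X, y)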