1. Add CIFAR10 datamodule (see how MNIST is integrated with the template, and similarly integrate CIFAR10)
Create a class in file src/datamodules/cifar10_datamodule.py
Configured for instantiation in configs/datamodule/cifar10.yaml
_target_: src.datamodules.cifar10_datamodule.CIFAR10DataModule
data_dir: ${paths.data_dir}
batch_size: 128
train_val_test_split: [45_000, 5_000, 10_000]
num_workers: 0
pin_memory: False
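The three split sizes add up to 60,000, which matches CIFAR10's 50,000 training plus 10,000 test images, so the datamodule presumably pools both sets before splitting. A minimal pure-Python sketch of such a seeded split is shown here; `split_indices` is a hypothetical helper, and the real datamodule would more likely rely on `torch.utils.data.random_split`.

```python
import random
from typing import List, Sequence


def split_indices(total: int, lengths: Sequence[int], seed: int = 42) -> List[List[int]]:
    """Shuffle range(total) with a fixed seed and cut it into consecutive chunks."""
    if sum(lengths) != total:
        raise ValueError(f"split sizes {list(lengths)} do not sum to {total}")
    indices = list(range(total))
    random.Random(seed).shuffle(indices)
    splits, start = [], 0
    for length in lengths:
        splits.append(indices[start:start + length])
        start += length
    return splits


# 50,000 train + 10,000 test images, pooled and re-split as in the config
train_idx, val_idx, test_idx = split_indices(60_000, [45_000, 5_000, 10_000])
print(len(train_idx), len(val_idx), len(test_idx))
```

A fixed seed keeps the three subsets identical across runs, which matters when comparing experiments.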
Create a new LightningModule class named TIMMLitModule in src/models/timm_module.py
Configured for instantiation in configs/model/timm.yaml
_target_: src.models.timm_module.TIMMLitModule

optimizer:
  _target_: torch.optim.Adam
  _partial_: true
  lr: 0.001
  weight_decay: 0.0

net:
  _target_: timm.create_model
  model_name: resnet18
  pretrained: True
  num_classes: 10
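`_partial_: true` tells Hydra to return a `functools.partial` instead of a finished optimizer, because the model parameters are only known once the network exists. The sketch below mimics that with the standard library alone; `FakeAdam` is a stand-in invented for illustration, not the real `torch.optim.Adam`.

```python
from functools import partial
from typing import Iterable, List


class FakeAdam:
    """Hypothetical stand-in for torch.optim.Adam that just stores its arguments."""

    def __init__(self, params: Iterable[float], lr: float = 0.001, weight_decay: float = 0.0):
        self.params: List[float] = list(params)
        self.lr = lr
        self.weight_decay = weight_decay


# Roughly what instantiating an optimizer config with `_partial_: true` yields:
# the hyperparameters are bound now, the parameters are supplied later.
optimizer_factory = partial(FakeAdam, lr=0.001, weight_decay=0.0)

# Later, e.g. inside configure_optimizers(), once the network exists:
model_params = [0.1, -0.2, 0.3]  # placeholder for self.parameters()
optimizer = optimizer_factory(params=model_params)
print(optimizer.lr, optimizer.weight_decay, len(optimizer.params))
```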
Create configs/experiment/cifar.yaml
to run any experiment by overriding the model, datamodule or model params
Create configs/hparams_search/cifar10_optuna.yaml
to tune hyperparameters
Update configs/train.yaml
and configs/eval.yaml
with the model config and datamodule config
defaults:
  - _self_
  - datamodule: cifar10.yaml
  - model: timm.yaml
Create a Dockerfile
Update the Makefile
4. Include scripts train.py and eval.py for training and evaluation (metrics) of the model, run through Docker: docker run <image>:<tag> python3 src/train.py experiment=experiment_name.yaml
Volume Mount
docker run --volume `pwd`:/workspace/project/ pl-hydra-timm:latest python3 src/train.py experiment=cifar.yaml
Inference of any pretrained timm model
Very similar to git commands, but used to manage data version control
Installing DVC
sudo wget https://dvc.org/deb/dvc.list -O /etc/apt/sources.list.d/dvc.list
wget -qO - https://dvc.org/deb/iterative.asc | sudo apt-key add -
sudo apt update
sudo apt install dvc
pip install dvc
- When already in a git repository (git init has already been run; to check, run
ls -al
and look for the .git folder)
dvc init
A .dvc folder has now been created
- Add data folder to track
dvc add data
A new file data.dvc is created
It stores an md5 hash used to track all changes to the folder
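To see why a content hash is enough to detect changes, here is a small standard-library sketch that hashes a single file. It only illustrates the idea; how DVC derives the hash for a whole directory such as `data` is not reproduced here, and `file_md5` is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"version 1")
    hash_before = file_md5(sample)
    sample.write_bytes(b"version 2")  # any change to the content ...
    hash_after = file_md5(sample)     # ... yields a different hash

print(hash_before != hash_after)
```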
- autostage: if enabled, DVC will automatically stage (git add) DVC files created or modified by DVC commands.
dvc config core.autostage true
- Check the remote where we can push/pull (this is for git, we need the same for dvc)
git remote -v
- Go and create a new folder ( now in gdrive ), say lightning-hydra
get into the folder and check the url as below - https://drive.google.com/drive/u/1/folders/1t9Vs8OwPOtQGnz1aR4KyPQA2k7FbKR5A https://drive.google.com/drive/u/1/folders/1ts8OwPOtQGnz1aR4KyPQA2k7FbKR5A
folder id - 1ts8OwPOtQGnz1aR4KyPQA2k7FbKR5A
Add a remote
dvc remote add gdrive gdrive://1t9Vs8OwPOtQGnz1aR4KyPQA2k7FbKR5A
- Add folders and files to stage
git add .
git commit -m "updated dvc"
- Push the changes to gdrive ( NOTE : give permission to folders and gmail account)
-r : remote, gdrive : name of remote folder
dvc push -r gdrive
Read/ Watch here
- Need to have a dvc.yaml describing stages and dependencies, see below -
stages:
  train-mnist:
    cmd: python3 src/train.py experiment=mnist
    deps:
      - data/MNIST
- To run a pipeline, use dvc repro (short for reproduce)
dvc repro train-mnist
- ...
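- Let's just run a normal grid search before using Optuna
-m (multirun) because we need multiple runs with different batch sizes. This will launch 4 jobs, one per batch size (one after another with Hydra's default launcher; a parallel launcher plugin is needed to run them at the same time)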
python3 src/train.py -m experiment=mnist datamodule.batch_size=16,32,64,128 tags=["batch_size_exp"]
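A comma-separated override is what turns one command into several jobs: the sweep takes the Cartesian product of all such lists. The sketch below reproduces that expansion in plain Python for illustration only; `expand_sweep` is a hypothetical helper, not part of Hydra.

```python
from itertools import product
from typing import Dict, List


def expand_sweep(overrides: Dict[str, str]) -> List[List[str]]:
    """Expand comma-separated override values into one override list per job."""
    keys = list(overrides)
    value_lists = [overrides[key].split(",") for key in keys]
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*value_lists)
    ]


jobs = expand_sweep({
    "experiment": "mnist",
    "datamodule.batch_size": "16,32,64,128",
})
for number, job in enumerate(jobs):
    print(f"job {number}: {' '.join(job)}")
```

Adding a second comma-separated override (say, two learning rates) would double the job count, which is why grid search grows quickly.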
💡✨ Median Pruner : Prune if the trial’s best intermediate result is worse than median of intermediate results of previous trials at the same step.
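The median rule can be written down in a few lines. This is a simplified sketch of the rule as stated, not Optuna's implementation: `should_prune` is a hypothetical helper, the loss values are made up, and options of the real pruner such as startup trials or warm-up steps are left out.

```python
from statistics import median
from typing import List, Sequence


def should_prune(
    current: Sequence[float],
    previous_trials: Sequence[Sequence[float]],
    step: int,
    minimize: bool = True,
) -> bool:
    """Apply the median rule at `step` (0-based index into the reported values)."""
    # Best intermediate value the current trial has reached up to this step.
    so_far = current[: step + 1]
    best = min(so_far) if minimize else max(so_far)
    # Values earlier trials reported at the same step (skip trials that ended sooner).
    peers: List[float] = [trial[step] for trial in previous_trials if len(trial) > step]
    if not peers:
        return False  # nothing to compare against yet
    return best > median(peers) if minimize else best < median(peers)


history = [[0.9, 0.6, 0.4], [0.8, 0.5, 0.3], [1.0, 0.7, 0.5]]  # validation losses
bad_trial = [1.2, 0.9]
good_trial = [0.7, 0.45]
print(should_prune(bad_trial, history, step=1), should_prune(good_trial, history, step=1))
```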
Pure Optuna example code here
Hydra plugin for Optuna here
Setup hydra yaml file under configs/hparams_search/mnist_optuna.yaml
- Choose the metric to be optimised: it must be logged in the model class (the LightningModule, here MNISTLitModule)
- Mention direction: maximise or minimise, according to the metric used
- Total number of trials to run
- Choice of Sampler : TPE is bayesian
- n_startup_trials: 10 # number of random sampling runs before optimization starts
- Define hyperparameter search space
model.optimizer.lr: interval(0.0001, 0.1)
datamodule.batch_size: choice(32, 64, 128, 256)
model.net.lin1_size: choice(64, 128, 256)
model.net.lin2_size: choice(64, 128, 256)
model.net.lin3_size: choice(32, 64, 128, 256)
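During the first `n_startup_trials` runs, configurations are drawn at random from this space before the optimization proper starts. The sketch below imitates only that random phase in plain Python; `interval` and `choice` here are hypothetical local helpers named after the sweep syntax, not the plugin's implementation.

```python
import random
from typing import Any, Callable, Dict

Sampler = Callable[[random.Random], Any]


def interval(low: float, high: float) -> Sampler:
    """Continuous range: draw uniformly from [low, high]."""
    return lambda rng: rng.uniform(low, high)


def choice(*options: Any) -> Sampler:
    """Categorical set: draw one of the listed options."""
    return lambda rng: rng.choice(options)


search_space: Dict[str, Sampler] = {
    "model.optimizer.lr": interval(0.0001, 0.1),
    "datamodule.batch_size": choice(32, 64, 128, 256),
    "model.net.lin1_size": choice(64, 128, 256),
    "model.net.lin2_size": choice(64, 128, 256),
    "model.net.lin3_size": choice(32, 64, 128, 256),
}

rng = random.Random(0)
startup_trials = [
    {name: draw(rng) for name, draw in search_space.items()}
    for _ in range(10)  # n_startup_trials
]
print(startup_trials[0])
```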
To run the hyperparameter search
python3 src/train.py -m hparams_search=mnist_optuna experiment=mnist
Read here
CometLogger - Track your parameters, metrics, source code and more using Comet.
CSVLogger - Log to local file system in yaml and CSV format.
MLFlowLogger - Log using MLflow.
NeptuneLogger - Log using Neptune.
TensorBoardLogger - Log to local file system in TensorBoard format.
WandbLogger - Log using Weights and Biases.
Remote Logging with PyTorch Lightning: https://pytorch-lightning.readthedocs.io/en/stable/common/remote_fs.html
By default the logger is set to null, so nothing is logged.
To enable a logger, add the line below to the experiment-specific yaml file in the experiment folder (e.g. mnist.yaml
in the experiment folder) to use a particular logger, say the TensorBoard logger.
- override /logger: tensorboard.yaml
Run and check
python3 src/train.py experiment=mnist
Go to tensorboard folder within logs folder (bind_all : someone else can access in same network)
tensorboard --logdir . --bind_all
Open the printed URL in a local browser.
NOTE : We can change in train.yaml also which will become a default for all whether you run with experiment or without
To log to multiple loggers we have many_loggers.yaml
in the logger folder. It contains a list of loggers, say tensorboard, csv, mlflow etc. We could use
- override /logger: many_loggers.yaml
is also enabled via many_loggers.yaml
Run the experiment again python3 src/train.py experiment=mnist
Go to the mlflow folder inside the logs (must have child folder created - ./mlruns/mlruns/meta.yaml) and run below command. You will be able to see the logs for hyperparams, accuracy etc.
mlflow ui
In the model class, in the training step, set on_step=True when logging,
as shown below -
def training_step(self, batch: Any, batch_idx: int):
    loss, preds, targets = self.step(batch)
    # update and log metrics
    self.train_loss(loss)  # update the loss metric before logging it
    self.train_acc(preds, targets)
    self.log("train/loss", self.train_loss, on_step=True, on_epoch=True, prog_bar=True)
    self.log("train/acc", self.train_acc, on_step=True, on_epoch=True, prog_bar=True)
    return loss  # Lightning needs the loss returned to run backpropagation
def validation_epoch_end(self, outputs: List[Any]):
    acc = self.val_acc.compute()  # get current val acc
    self.val_acc_best(acc)  # update best so far val acc
    # log `val_acc_best` as a value through `.compute()` method, instead of as a metric object
    # otherwise metric would be reset by lightning after each epoch
    self.log("val/acc_best", self.val_acc_best.compute(), prog_bar=True)
    loss = self.val_loss.compute()  # add validation loss to hp_metric
    self.log("hp_metric", loss)
model.optimizer.lr: interval(0.0001, 0.1)
model.optimizer._target_: choice(torch.optim.SGD, torch.optim.Adam, torch.optim.RMSprop)
datamodule.batch_size: choice(64, 128, 256)
Best Params :
Batch size : 64
Learning rate : 0.068378
HP_Metric/ Validation Loss : 0.19418
Optimizer : SGD
- Overrides the model.yaml
settings in model folder
Checkout Gradio demo to get some flavour.
Add mnistDemo.py
Script is similar to train.py
or eval.py
structure wise.
Import necessary libraries
Define a function demo, called from main with the configuration provided by the config yaml, which:
a. Checks whether the model (checkpoint) path is given or not
b. Instantiates the model for inference (.pth file)
c. Loads the weights and the model
d. Defines the interface function
source = "canvas": the user can draw and we infer
image mode = "L": single channel (because we are using the MNIST dataset, which is single channel, B&W)
invert color = true: because on the canvas the digits are black and the background is white
live = true: for real-time inference applications
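The reason for inverting colors can be made concrete: MNIST digits are bright on a dark background, while a canvas drawing is dark on a light one. The sketch below shows the inversion, followed by the usual MNIST normalization (mean 0.1307, std 0.3081), on a tiny hand-made grayscale patch. `invert` and `normalize` are illustrative helpers; the real app would operate on tensors.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels in 0..255


def invert(image: Image) -> Image:
    """Swap dark and light: 0 becomes 255 and vice versa."""
    return [[255 - pixel for pixel in row] for row in image]


def normalize(image: Image, mean: float = 0.1307, std: float = 0.3081) -> List[List[float]]:
    """Scale to [0, 1], then standardize with the MNIST mean and std."""
    return [[(pixel / 255 - mean) / std for pixel in row] for row in image]


# Canvas-style patch: a black stroke (0) on a white background (255)
canvas_patch = [[255, 0, 255],
                [255, 0, 255]]
mnist_style = invert(canvas_patch)    # white stroke on black, like MNIST
model_input = normalize(mnist_style)
print(mnist_style)
```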
Add gradio in requirements.txt
Create a config file demoMnist.yaml
to fetch configurations while running app under config folder ( this is similar to eval.yaml file)
Still needs :
- callbacks: default.yaml
- experiment: null
To make the ckpt path mandatory, use ??? as below -
ckpt_path: ???
python src/demoMnist.py ckpt_path=logs/train/runs/2022-09-30_06-02-58/checkpoints/last.ckpt experiment=mnist
We need to provide the datamodule, experiment, trainer and parameters to instantiate the model every time
Create model - load weights, having model.py (e.g. mnist_module.py)
TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency.
TorchScript is a statically typed subset of Python that can either be written directly (using the torch.jit.script decorator) or generated automatically from Python code via tracing. When using tracing, code is automatically converted into this subset of Python by recording only the actual operators on tensors and simply executing and discarding the other surrounding Python code.
In other words, you can export a non-Python representation of the model that can be loaded in any environment (e.g. pure C++)
Provides efficient and portable PyTorch production deployment
torch.jit.script captures both the operations and the full conditional logic of your model, whereas torch.jit.trace
actually runs the model with the given dummy inputs and freezes the conditional logic according to the dummy values provided.
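The difference shows up as soon as a function contains a data-dependent branch. The sketch below assumes PyTorch is installed and uses a toy function, `double_or_negate`, made up for this illustration; tracing it is expected to emit a TracerWarning, which is exactly the point.

```python
import torch


def double_or_negate(x: torch.Tensor) -> torch.Tensor:
    # Data-dependent branch: which path runs depends on the input values.
    if x.sum() > 0:
        return x * 2
    return -x


# Scripting compiles the source, so the `if` survives in the exported program.
scripted = torch.jit.script(double_or_negate)

# Tracing records only the operations executed for this one positive example,
# so the `x * 2` path is frozen into the traced function.
traced = torch.jit.trace(double_or_negate, torch.ones(3))

negative = -torch.ones(3)
scripted_out = scripted(negative)  # takes the `-x` branch
traced_out = traced(negative)      # still multiplies by 2
print(scripted_out, traced_out)
```

For models whose control flow depends on the input, scripting is therefore the safer choice; tracing is fine when the computation is the same for every input.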
You can read about edge cases of Tracing here:
Compile your model to TorchScript example
LightningModule has a handy method to_torchscript()
that returns a scripted module which you can save or use directly
💡If you want to script a different method (export a function with torch script or trace), you can decorate the method with torch.jit.export()
- Imports in the mnist_module.py file, needed to export the transformations as well
import torch.nn.functional as F
from torchvision import transforms as T
Add below lines above forward function in mnist_module.py.
self.predict_transform = T.Normalize((0.1307,), (0.3081,))
Add the method below just after the forward function; the torch.jit.export decorator exports forward_jit whenever the module is scripted or traced. Here we push the necessary transforms into the model itself, so that nothing else is required at inference time.
@torch.jit.export
def forward_jit(self, x: torch.Tensor):
    with torch.no_grad():
        # transform the inputs
        x = self.predict_transform(x)
        # forward pass
        logits = self(x)
        preds = F.softmax(logits, dim=-1)
    return preds
- In train.py, to save the serialized (compiled) model, add the lines below just after train_metrics = trainer.callback_metrics
log.info("Scripting Model..")
scripted_model = model.to_torchscript(method="script")
torch.jit.save(scripted_model, f"{cfg.paths.output_dir}/model.script.pt")
log.info(f"Saved scripted model to {cfg.paths.output_dir}/model.script.pt")
- demoMnistScripted.yaml
# @package _global_
defaults:
  - _self_
  - paths: default.yaml
  - hydra: default.yaml
task_name: "demo_traced"
# checkpoint is necessary for demo
ckpt_path: ???
python3 src/demoMnistScripted.py ckpt_path=logs/train/runs/2022-09-30_12-43-23/model.script.pt
Create a Dockerfile named Dockerfile.demoMNIST
Copy deployable model to root folder ( or some deploy folder)
cp logs/train/runs/2022-09-30_12-43-23/model.script.pt model.script.pt
Add the files and folders that are not required to .dockerignore, to reduce the Docker image size
Docker build by -
docker build -f Dockerfile.demoMNIST -t testapp .
Docker run by
docker run -it -p 8080:8080 testapp:latest
- In train.py, to save the serialized (compiled) model, add the lines below just after train_metrics = trainer.callback_metrics
log.info("Scripting Model..")
scripted_model = model.to_torchscript(method="script")
torch.jit.save(scripted_model, f"{cfg.paths.output_dir}/model.script.pt")
log.info(f"Saved scripted model to {cfg.paths.output_dir}/model.script.pt")
- Create a new file
inside config folder containing following lines of code. It helps set default paths and makes it mandatory to provide model path to load.
# @package _global_
defaults:
  - _self_
  - paths: default.yaml
  - hydra: default.yaml
task_name: "demo_traced"
# checkpoint is necessary for demo
ckpt_path: ???
- Train for a few epochs (I trained for only one epoch)
python3 src/train.py experiment=cifar
- Create a new file
- Create a Gradio app interface
- Loads the model
- It must accept image from user, and give the top 10 predictions
import pyrootutils
from typing import List, Tuple
import torch
import hydra
import gradio as gr
from omegaconf import DictConfig
from torchvision import transforms
from src import utils
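The prediction step the app needs boils down to a softmax over the ten logits followed by a sort. Here is a dependency-free sketch of that step; the logits are made up, `top_k_predictions` is a hypothetical helper, and the real app would take its probabilities from the scripted model instead. A label-to-probability dictionary is the format Gradio's label output accepts.

```python
import math
from typing import Dict, List, Sequence

CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def top_k_predictions(logits: Sequence[float], k: int = 10) -> Dict[str, float]:
    """Map the k most probable class names to their probabilities, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(CIFAR10_CLASSES, probs), key=lambda pair: pair[1], reverse=True)
    return dict(ranked[:k])


example_logits = [0.1, 0.3, 2.5, 4.0, 0.2, 3.1, 0.0, -1.0, 0.5, 0.4]  # invented values
predictions = top_k_predictions(example_logits)
for label, prob in predictions.items():
    print(f"{label}: {prob:.3f}")
```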
- test
python3 src/demoCIFAR10Scripted.py ckpt_path=logs/train/runs/2022-10-05_05-12-27/model.script.pt
Create a Dockerfile named Dockerfile.demoCIFAR10
Create a requirements file named
Docker build by -
docker build -f Dockerfile.demoCIFAR10 -t vikashkr117/gradio-cifar-app .
- Docker run by
docker run -it -p 8080:8080 vikashkr117/gradio-cifar-app:latest
- Login to docker hub from CLI
docker login -u vikashkr117
At the password prompt, enter the personal access token.
- Push the image
docker push vikashkr117/gradio-cifar-app
- Go to Play with Docker and start a session. Pull the image using below command
docker pull vikashkr117/gradio-cifar-app
- Run the pulled docker image
docker run -it -p 8080:8080 vikashkr117/gradio-cifar-app:latest
Copy the Gradio link provided, e.g. https://<......>.gradio.app, and paste it in a web browser
Place an image and click on submit, you will see top 10 classes and probabilities associated with it.
Cheers 🥂
Ref : https://gradio.app/image_classification_in_pytorch/
✨💡✨ One folder or file cannot be tracked by both git and dvc. Yes/No?
To fix for gitpod.io (or just use gitpod in local VS code)
ssh -L 8080:localhost:8080
To see all files/folders, sizes, users and permissions
ls -alrth
Run the tests locally using pre-commit run --all-files
This will run all the checks defined in .pre-commit-config.yaml
To look into docker files :-
docker container run --rm -it testapp /bin/sh
docker create --name="tmp_$$" image:tag
docker export tmp_$$ | tar t
docker rm tmp_$$
Dockerfile ignore
# 1. Ignore everything
# 2. Add files and directories that should be included
# 3. Bonus step: ignore any unnecessary files that may be inside those allowed directories in 2
docker image history
RUN apt install -y
&& rm -rf /var/lib/apt/lists
CMD [ "python3", "src/demoMnistScripted.py" , "ckpt_path=model.script.pt"]
- Create a tar file
tar -czvf model.tar.gz demo_model