Hyperparameter setting for training from scratch on CIFAR-10 #134

Yuancheng-Xu · 2022-12-01T00:53:10Z

Hi,

I am trying to train a convext on CIFAR-10 for a research project that doesn't allow using BN. I use the following configuration:

python -m torch.distributed.launch --nproc_per_node=4 main.py \
  --data_set image_folder --data_path ./CIFAR-10-images/train --eval_data_path ./CIFAR-10-images/test \
  --nb_classes 10 --num_workers 8 --warmup_epochs 0 \
  --save_ckpt false \
  --cutmix 0 --mixup 0 \
  --model_ema_eval true \
  --model convnext_tiny \
  --epochs 100 --lr 4e-4 --weight_decay 5e-2 --opt 'sgd' --input_size 32\
  --output_dir results/100epochs_lr_4e-4_wd_5e-2_sgd_inputsize_32 \

And the accuracy is only 75% percent (standard ResNet18 is about 93%). If I change the optimizer from AdamW to SGD, the best accuracy actually drops to below 50%. If I use the default input size 224, the accuracy is 84%, still significantly low.

Can ConvNeXt work on CIFAR10 without fine-tuning from a pretrained model? Could you provide a recommended set of hyper parameters for CIFAR10 (that should be robust to different types of optimizers and without mix-up and cutmix)?

Also I have another question on fine-tuning on CIFAR10: it seems that in the colab file the input_size is the default 224. However CIFAR10 image is 32*32. Does this mean that in the data preparation stage the image will be padded to 224 * 224?

Thank you!

The text was updated successfully, but these errors were encountered:

slerman12 · 2022-12-05T15:04:54Z

I was also wondering about this. It seems the 32x32 size of CIFAR-10 is incompatible with this model due to the down-sampling layers.

shamikbose · 2023-05-01T18:52:00Z

@Yuancheng-Xu It seems like it can. The downsampling layers should be set to a smaller kernel and stride size (2 and 2 respectively). Without this, the output of the downsampling layers is effectively the same size as the kernel.
In addition, you might want to choose a smaller kernel and padding size for the Block convolutional layers
Here's a notebook showing the training progress https://juliusruseckas.github.io/ml/convnext-cifar10.html

shamikbose · 2023-05-04T15:30:00Z

@Yuancheng-Xu I managed to get accuracy to 87% by making a few changes to the code in the link above. Basic changes are mentioned in this repository https://github.com/shamikbose/Fujitsu_Assessment
Main changes were as follows:

The downsampling convolutional layers were modified (4x4 -> 2x2) for the smaller image size in the dataset

This improved accuracy from 70% to 80%

Keeping CIFAR-10 training recipes in mind, the architecture was modified to be a 3-block architecture instead of a 4-block one

This improved accuracy from 80% to 85%

Kernel size was changed (7 -> 3)

This improved accuracy from 85% to 87%

Yuancheng-Xu · 2023-05-05T16:51:59Z

Thanks a lot!

iamsh4shank · 2023-06-05T21:46:26Z

Hey @shamikbose, I tried training the ImageNet100 dataset for custom input_size = 32, but the accuracy that I am getting is too low. What could I change in the architecture (I tried with making the kernel and stride small)? Any other approach that might help me to get good accuracy?

shamikbose · 2023-06-05T21:48:52Z

@iamsh4shank The parameters used for ImageNet100 are mentioned in the paper. You should be able to reproduce it using those values.

iamsh4shank · 2023-06-05T21:57:32Z

Actually ig it was for input_size 224 but on changing it to 32 I get accuracy really low

shamikbose · 2023-06-05T22:11:48Z

With image size 32, try the parameters mentioned here #134 (comment)

iamsh4shank · 2023-06-05T22:22:05Z

I did try changing the Conv layer (https://github.com/facebookresearch/ConvNeXt/blob/main/models/convnext.py#L28) with kernel size 3 and padding 1. Also, I changed the downsampling layer (https://github.com/facebookresearch/ConvNeXt/blob/main/models/convnext.py#L74) with kernel size 2 and stride 2. It did not change the accuracy much. I am getting test accuracy like 4-5 percent

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hyperparameter setting for training from scratch on CIFAR-10 #134

Hyperparameter setting for training from scratch on CIFAR-10 #134

Yuancheng-Xu commented Dec 1, 2022

slerman12 commented Dec 5, 2022

shamikbose commented May 1, 2023 •

edited

Loading

shamikbose commented May 4, 2023

Yuancheng-Xu commented May 5, 2023

iamsh4shank commented Jun 5, 2023

shamikbose commented Jun 5, 2023

iamsh4shank commented Jun 5, 2023 •

edited

Loading

shamikbose commented Jun 5, 2023

iamsh4shank commented Jun 5, 2023 •

edited

Loading

Hyperparameter setting for training from scratch on CIFAR-10 #134

Hyperparameter setting for training from scratch on CIFAR-10 #134

Comments

Yuancheng-Xu commented Dec 1, 2022

slerman12 commented Dec 5, 2022

shamikbose commented May 1, 2023 • edited Loading

shamikbose commented May 4, 2023

Yuancheng-Xu commented May 5, 2023

iamsh4shank commented Jun 5, 2023

shamikbose commented Jun 5, 2023

iamsh4shank commented Jun 5, 2023 • edited Loading

shamikbose commented Jun 5, 2023

iamsh4shank commented Jun 5, 2023 • edited Loading

shamikbose commented May 1, 2023 •

edited

Loading

iamsh4shank commented Jun 5, 2023 •

edited

Loading

iamsh4shank commented Jun 5, 2023 •

edited

Loading