Code for the CVPR 2024 paper "Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation".
- It is best to start a new conda environment.
conda create -n [env_name] python=3.10
conda activate [env_name]
- Install required packages
pip install torch==1.12.1 torchvision==0.13.1
pip install -U openmim
pip install -r requirements.txt
python -m mim install mmcv-full==1.6.2 mmsegmentation==0.27.0
- Download the required NLTK resources
python
>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('averaged_perceptron_tagger')
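The installation steps above can also be checked and completed from a script. The following is a minimal sketch, not part of the repository: the helper names `check_versions` and `download_nltk_resources` are hypothetical, and the pinned versions are simply copied from the commands above. Local build suffixes such as `+cu113` are ignored when comparing versions.

```python
from importlib import metadata

# Versions pinned by the install commands in this README.
EXPECTED_VERSIONS = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "mmcv-full": "1.6.2",
    "mmsegmentation": "0.27.0",
}

# NLTK resources downloaded in the interactive session above.
NLTK_RESOURCES = ("punkt", "averaged_perceptron_tagger")


def check_versions(expected=EXPECTED_VERSIONS, get_version=metadata.version):
    """Return {package: (wanted, found)} for every mismatch; found is None if missing."""
    problems = {}
    for package, wanted in expected.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            problems[package] = (wanted, None)
            continue
        # Drop a local version suffix (e.g. "1.12.1+cu113") before comparing.
        if found.split("+")[0] != wanted:
            problems[package] = (wanted, found)
    return problems


def download_nltk_resources(resources=NLTK_RESOURCES, download=None):
    """Download each resource and return {resource: success flag}."""
    if download is None:
        import nltk  # imported lazily so the version check works without NLTK

        download = nltk.download
    return {name: bool(download(name)) for name in resources}
```

`check_versions()` returns an empty dict when every pinned package matches, and `download_nltk_resources()` falls back to `nltk.download` when no downloader is passed in.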
Note that much of this section is adapted from the data preparation section of the TCL README.
For training, we use Conceptual Captions 3M and 12M. We use the img2dataset tool and follow these instructions to download and preprocess the datasets.
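After the download finishes, it can help to confirm that a shard really contains paired images and captions. The sketch below uses only the standard library and rests on assumptions that are not stated in this README: that the shards are webdataset-style tar archives in which all files of one sample share a basename, and that images and captions use the `jpg` and `txt` extensions. The function names are hypothetical.

```python
import tarfile
from collections import defaultdict


def summarize_shard(tar_path):
    """Map each sample key to the set of file extensions stored for it."""
    samples = defaultdict(set)
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Split "dir/000123.jpg" into key "dir/000123" and extension "jpg".
            directory, _, base = member.name.rpartition("/")
            stem, _, ext = base.partition(".")
            key = f"{directory}/{stem}" if directory else stem
            samples[key].add(ext)
    return dict(samples)


def incomplete_samples(summary, required=("jpg", "txt")):
    """Return the sorted keys of samples missing any required extension (assumed names)."""
    return sorted(key for key, exts in summary.items() if not set(required) <= exts)
```

For example, `incomplete_samples(summarize_shard("data/gcc3m/gcc-train-000000.tar"))` would list the samples in the first shard that lack an image or a caption, under the assumptions above.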
In the paper, we use 6 benchmarks: PASCAL VOC, PASCAL Context, COCO-Object, COCO-Stuff, Cityscapes, and ADE20k. Because COCO-Object is derived from COCO-Stuff164k, only 5 datasets need to be prepared: PASCAL VOC, PASCAL Context, COCO-Stuff164k, Cityscapes, and ADE20k.
Please download and set up the PASCAL VOC, PASCAL Context, COCO-Stuff164k, Cityscapes, and ADE20k datasets following the MMSegmentation data preparation document.
The COCO-Object dataset uses only the object classes of the COCO-Stuff164k dataset, with labels collected from the instance segmentation annotations. Run the following command to convert the instance segmentation annotations to semantic segmentation annotations:
python convert_dataset/convert_coco.py data/coco_stuff164k/ -o data/coco_stuff164k/
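To illustrate the idea behind this conversion, here is a toy sketch of how per-instance masks can be merged into a single semantic label map. It is not the logic of `convert_dataset/convert_coco.py`: the function name, the plain nested-list mask format, the background id of 0, and the rule that later instances overwrite earlier ones are all assumptions made for illustration.

```python
def instances_to_semantic(height, width, instances, background=0):
    """Paint each instance mask with its class id onto one label map.

    instances: iterable of (class_id, mask), where mask is a list of
    `height` rows, each holding `width` values that are truthy inside
    the instance. Later instances overwrite earlier ones where they
    overlap (an assumption of this sketch).
    """
    semantic = [[background] * width for _ in range(height)]
    for class_id, mask in instances:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    semantic[y][x] = class_id
    return semantic
```

Pixels that no instance covers keep the background id, which is how stuff regions drop out when only object classes are kept.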
The overall file structure is as follows:
ImageTextCoDecomposition
├── data
│ ├── gcc3m
│ │ ├── gcc-train-000000.tar
│ │ ├── ...
│ ├── gcc12m
│ │ ├── cc-000000.tar
│ │ ├── ...
│ ├── cityscapes
│ │ ├── leftImg8bit
│ │ │ ├── train
│ │ │ ├── val
│ │ ├── gtFine
│ │ │ ├── train
│ │ │ ├── val
│ ├── VOCdevkit
│ │ ├── VOC2012
│ │ │ ├── JPEGImages
│ │ │ ├── SegmentationClass
│ │ │ ├── ImageSets
│ │ │ │ ├── Segmentation
│ │ ├── VOC2010
│ │ │ ├── JPEGImages
│ │ │ ├── SegmentationClassContext
│ │ │ ├── ImageSets
│ │ │ │ ├── SegmentationContext
│ │ │ │ │ ├── train.txt
│ │ │ │ │ ├── val.txt
│ │ │ ├── trainval_merged.json
│ │ ├── VOCaug
│ │ │ ├── dataset
│ │ │ │ ├── cls
│ ├── ade
│ │ ├── ADEChallengeData2016
│ │ │ ├── annotations
│ │ │ │ ├── training
│ │ │ │ ├── validation
│ │ │ ├── images
│ │ │ │ ├── training
│ │ │ │ ├── validation
│ ├── coco_stuff164k
│ │ ├── images
│ │ │ ├── train2017
│ │ │ ├── val2017
│ │ ├── annotations
│ │ │ ├── train2017
│ │ │ ├── val2017
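Before training or evaluation, the layout above can be checked with a short script. This is a minimal sketch with a hypothetical helper, `missing_paths`; the directory list is copied from the tree above and checks for directories only, not for individual files such as the tar shards or `trainval_merged.json`.

```python
from pathlib import Path

# Directories taken from the file structure above, relative to the data root.
EXPECTED_PATHS = [
    "gcc3m",
    "gcc12m",
    "cityscapes/leftImg8bit/train",
    "cityscapes/leftImg8bit/val",
    "cityscapes/gtFine/train",
    "cityscapes/gtFine/val",
    "VOCdevkit/VOC2012/JPEGImages",
    "VOCdevkit/VOC2012/SegmentationClass",
    "VOCdevkit/VOC2012/ImageSets/Segmentation",
    "VOCdevkit/VOC2010/JPEGImages",
    "VOCdevkit/VOC2010/SegmentationClassContext",
    "VOCdevkit/VOC2010/ImageSets/SegmentationContext",
    "VOCdevkit/VOCaug/dataset/cls",
    "ade/ADEChallengeData2016/annotations/training",
    "ade/ADEChallengeData2016/annotations/validation",
    "ade/ADEChallengeData2016/images/training",
    "ade/ADEChallengeData2016/images/validation",
    "coco_stuff164k/images/train2017",
    "coco_stuff164k/images/val2017",
    "coco_stuff164k/annotations/train2017",
    "coco_stuff164k/annotations/val2017",
]


def missing_paths(data_root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Running `missing_paths("data")` from the repository root returns an empty list when every directory is in place.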
Follow the installation steps above to set up your environment, then run the training script:
sh scripts/train.sh
We provide our official checkpoint to reproduce the main results of our paper.
- Evaluation
sh scripts/eval.sh
@article{wu2024image,
title={Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation},
author={Wu, Ji-Jia and Chang, Andy Chia-Hao and Chuang, Chieh-Yu and Chen, Chun-Pei and Liu, Yu-Lun and Chen, Min-Hung and Hu, Hou-Ning and Chuang, Yung-Yu and Lin, Yen-Yu},
journal={arXiv preprint arXiv:2404.04231},
year={2024}
}