
kevin-q2/ds598_midterm


DS598 DL4DS Midterm Project

For my project I used Microsoft's GIT Large model (Generative Image-to-text Transformer) trained on the COCO image dataset [4][5]. I found it relatively simple to implement and work with. Fine-tuning the model took the most time: I had to experiment with the attention mask, learning rate, and batch size before I got a model that performs well. I eventually found a parameter set that reached a CIDEr score [1] of ~75 after only one epoch. I had fun learning about Hugging Face and how deep learning models are implemented!
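
CIDEr [1] scores a generated caption by comparing TF-IDF-weighted n-grams (n = 1 to 4) against the human reference captions for the same image, so phrases that many images share count for little. The sketch below is a simplified, pure-Python version of that definition for intuition only. It assumes lowercase whitespace tokenization and omits the length penalty, count clipping, and x10 scaling of the CIDEr-D variant used by the standard COCO evaluation tools, so its numbers will not match the ~75 reported above. The names `cider`, `tfidf_vector`, and the dict-based inputs are hypothetical, not part of this repository.

```python
import math
from collections import Counter


def tokenize(caption):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return caption.lower().split()


def ngram_counts(tokens, n):
    """Count every contiguous n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tfidf_vector(caption, n, doc_freq, num_images):
    """TF-IDF weights for the n-grams of one caption."""
    counts = ngram_counts(tokenize(caption), n)
    total = sum(counts.values()) or 1
    return {
        gram: (count / total) * math.log(num_images / max(doc_freq[gram], 1))
        for gram, count in counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(gram, 0.0) for gram, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cider(candidates, references, max_n=4):
    """Simplified CIDEr: for each n, average the cosine similarity between the
    candidate's and each reference's TF-IDF n-gram vectors, then average over n.

    candidates: {image_id: generated caption}
    references: {image_id: [reference captions]}
    Returns (corpus_score, {image_id: score}).
    """
    image_ids = list(references)
    num_images = len(image_ids)
    scores = {image_id: 0.0 for image_id in image_ids}
    for n in range(1, max_n + 1):
        # Document frequency: how many images' reference sets contain each n-gram.
        doc_freq = Counter()
        for image_id in image_ids:
            seen = set()
            for ref in references[image_id]:
                seen.update(ngram_counts(tokenize(ref), n))
            doc_freq.update(seen)
        for image_id in image_ids:
            cand = tfidf_vector(candidates[image_id], n, doc_freq, num_images)
            sims = [cosine(cand, tfidf_vector(ref, n, doc_freq, num_images))
                    for ref in references[image_id]]
            scores[image_id] += (sum(sims) / len(sims)) / max_n
    return sum(scores.values()) / num_images, scores
```

A caption identical to its references scores 1.0 here, while a caption that only shares words appearing in every image's references (such as "a" or "on") gets no credit, because those n-grams receive zero IDF weight.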

The model is fully contained in demo/train.py and demo/test.py, but most of my experiments and work were done in experiments.ipynb.
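
The project description mentions experimenting with the attention mask during fine-tuning. The contents of demo/train.py are not reproduced here, so the following is only a hypothetical sketch of one common concern when batching captions: padded positions should get attention mask 0 and should be excluded from the loss. It assumes right padding and uses -100 as the ignored label, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The helper names `pad_and_mask` and `masked_mean_loss` are illustrative, not the author's code.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pad_and_mask(batch_token_ids, pad_id):
    """Right-pad caption token ids and build matching attention masks and labels.

    Padding positions get attention_mask 0 and label IGNORE_INDEX, so the
    model neither attends to them nor is trained to predict them.
    """
    max_len = max(len(ids) for ids in batch_token_ids)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch_token_ids:
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
        labels.append(ids + [IGNORE_INDEX] * pad)
    return input_ids, attention_mask, labels


def masked_mean_loss(token_losses, labels):
    """Average per-token losses over positions whose label is not IGNORE_INDEX,
    mirroring a mean-reduced cross-entropy with ignore_index."""
    kept = [loss
            for row_losses, row_labels in zip(token_losses, labels)
            for loss, label in zip(row_losses, row_labels)
            if label != IGNORE_INDEX]
    return sum(kept) / len(kept)


ids, mask, labels = pad_and_mask([[101, 7, 8, 102], [101, 9, 102]], pad_id=0)
# The large loss on the padded position (50.0) is ignored.
loss = masked_mean_loss([[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 50.0]], labels)
```

Without the masking, the loss on padding tokens would be averaged in and could dominate the gradient for short captions in a batch of long ones.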

References

  1. CIDEr: Consensus-based Image Description Evaluation
  2. BLEU: A Misunderstood Metric from Another Age (Medium post)
  3. BLEU Metric (Hugging Face Space)
  4. Microsoft GIT Large
  5. GIT: A Generative Image-to-text Transformer for Vision and Language, Jianfeng Wang and Zhengyuan Yang and Xiaowei Hu and Linjie Li and Kevin Lin and Zhe Gan and Zicheng Liu and Ce Liu and Lijuan Wang (2022)

About

DL4DS Spring 2024 Midterm Challenge
