DS598 DL4DS Midterm Project

For my project I used the Microsoft GIT Large model trained on the COCO image dataset [4][5]. I found it relatively simple to implement and work with. Fine-tuning the model took the most time: I had to experiment with the attention mask, learning rate, and batch size to get a model that performs well. I ended up with a parameter set that reached a CIDEr score of ~75 after only 1 epoch. I had fun learning about Hugging Face and the implementation of deep learning models!
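To make the CIDEr number more concrete, here is a minimal pure-Python sketch of the idea behind the metric [1]: TF-IDF weighted n-gram vectors for the candidate and reference captions, compared by cosine similarity and averaged over n = 1..4. This is a simplified illustration and not the scorer used for the result above. The function names (`ngram_counts`, `cosine`, `simplified_cider`) are hypothetical, tokenization is plain whitespace splitting, and there is no stemming or length penalty, so its values are not comparable to the ~75 reported here.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def simplified_cider(candidates, references, max_n=4):
    """Simplified CIDEr-style score (assumption: not the official scorer).

    candidates: one generated caption (str) per image.
    references: a non-empty list of reference captions (str) per image.
    Returns the mean score over images, scaled by 10.
    """
    cand_tokens = [c.lower().split() for c in candidates]
    ref_tokens = [[r.lower().split() for r in refs] for refs in references]
    num_images = len(candidates)
    per_image = [0.0] * num_images

    for n in range(1, max_n + 1):
        # Document frequency: in how many images' reference sets an n-gram occurs.
        doc_freq = Counter()
        for refs in ref_tokens:
            seen = set()
            for ref in refs:
                seen.update(ngram_counts(ref, n))
            doc_freq.update(seen)

        def tfidf(tokens):
            # n-grams never seen in any reference are treated as occurring once.
            return {
                g: c * math.log(num_images / max(doc_freq[g], 1))
                for g, c in ngram_counts(tokens, n).items()
            }

        for i, (cand, refs) in enumerate(zip(cand_tokens, ref_tokens)):
            cand_vec = tfidf(cand)
            sims = [cosine(cand_vec, tfidf(ref)) for ref in refs]
            # Average over references, uniform weight 1/max_n per n-gram order.
            per_image[i] += sum(sims) / len(sims) / max_n

    return 10.0 * sum(per_image) / num_images
```

Because the IDF term is log(number of images / document frequency), the score is only meaningful when it is computed over a set of several images; with a single image every weight is zero.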

The model is completely contained within demo/train.py and demo/test.py, but most of my experiments and work were done in experiments.ipynb.
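The contents of demo/train.py are not reproduced in this README. As a rough illustration only, a single fine-tuning step with Hugging Face `transformers` might look like the sketch that follows. The checkpoint name `microsoft/git-large-coco`, the learning rate and batch size values, and the names `FineTuneConfig`, `load_model_and_processor`, and `training_step` are assumptions made for this example, not the settings or code used in the project.

```python
from dataclasses import dataclass


@dataclass
class FineTuneConfig:
    # All values here are illustrative assumptions, not the project's settings.
    checkpoint: str = "microsoft/git-large-coco"
    learning_rate: float = 5e-5
    batch_size: int = 8
    epochs: int = 1


def load_model_and_processor(config):
    """Load the captioning model and its processor (downloads weights on first use)."""
    # Imported lazily so the rest of the sketch works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(config.checkpoint)
    model = AutoModelForCausalLM.from_pretrained(config.checkpoint)
    return model, processor


def training_step(model, processor, optimizer, images, captions, device="cpu"):
    """Run one optimization step on a batch of (image, caption) pairs.

    Returns the batch loss as a Python float.
    """
    # The processor tokenizes the captions (producing input_ids and an
    # attention_mask) and converts the images to pixel_values.
    batch = processor(
        images=images, text=captions, padding=True, return_tensors="pt"
    ).to(device)

    # For caption fine-tuning the labels are the caption tokens themselves.
    outputs = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        pixel_values=batch["pixel_values"],
        labels=batch["input_ids"],
    )

    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

In a real run, `optimizer` could be `torch.optim.AdamW(model.parameters(), lr=config.learning_rate)`, and `training_step` would be called once per batch drawn from a `DataLoader` with `batch_size=config.batch_size`.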

References

[1] CIDEr: Consensus-based Image Description Evaluation
[2] BLEU: A Misunderstood Metric from Another Age, Medium post
[3] BLEU Metric, Hugging Face Space
[4] Microsoft GIT Large
[5] GIT: A Generative Image-to-text Transformer for Vision and Language, Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, and Lijuan Wang (2022)

kevin-q2/ds598_midterm
