SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages

In this work, we present SailCompass, a comprehensive suite of evaluation scripts designed for robust and reproducible evaluation of multilingual language models targeting Southeast Asian languages.

SailCompass encompasses three major SEA languages and covers eight primary tasks using 14 datasets, spanning three task types: generation, multiple-choice questions, and classification.

Please refer to the SailCompass paper for more details.

Environment Setup

We use OpenCompass to evaluate the models. To clone the repository and install the required packages, run the following commands:

conda create --name sailcompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate sailcompass

git clone https://github.com/sail-sg/sailcompass sailcompass

# Initialize the git submodules
cd sailcompass
git submodule update --init --recursive

# Clone OpenCompass and copy the config
bash setup_environment.sh

# Download the evaluation data from Hugging Face
mkdir -p data
python download_eval_data.py
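
Before moving on, it can help to confirm that the setup steps above took effect. The following is a minimal sketch and not part of the SailCompass repository: the helper name `check_environment` is hypothetical, and it assumes the download script places its files under the `data` directory created above. It only checks that the directory is non-empty and that the listed packages can be imported; it does not validate the datasets themselves.

```python
import importlib.util
from pathlib import Path


def check_environment(data_dir="data", packages=("torch",)):
    """Return a dict describing whether the setup steps appear to have worked.

    Hypothetical helper: it only inspects the local filesystem and the
    importability of the listed top-level packages.
    """
    data_path = Path(data_dir)
    report = {
        # True when the directory exists and holds at least one entry.
        "data_present": data_path.is_dir() and any(data_path.iterdir()),
    }
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        report[f"{name}_installed"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

Running this from the repository root after the download step prints one line per check.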

Evaluation Script

To build the evaluation script, run the following command under this folder:

bash setup_sailcompass.sh

Run Evaluation

To run the evaluation, run the following commands under this folder:

cd opencompass
python run.py configs/eval_sailcompass.py -w outputs/sailcompass --num-gpus 1 --max-num-workers 64 --debug
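
When the evaluation is launched from another script (for example, to vary the GPU count or output directory), the command above can be assembled programmatically. This is a rough sketch under stated assumptions: `build_eval_command` and `run_evaluation` are hypothetical helper names, and the only flags used are the ones that appear in the command above, with the same default values.

```python
import shlex
import subprocess


def build_eval_command(config="configs/eval_sailcompass.py",
                       work_dir="outputs/sailcompass",
                       num_gpus=1, max_num_workers=64, debug=True):
    """Assemble the argument list for the run.py invocation shown above."""
    cmd = ["python", "run.py", config,
           "-w", work_dir,
           "--num-gpus", str(num_gpus),
           "--max-num-workers", str(max_num_workers)]
    if debug:
        cmd.append("--debug")
    return cmd


def run_evaluation(opencompass_dir="opencompass", **kwargs):
    """Run the evaluation from inside the opencompass checkout (hypothetical wrapper)."""
    # check=True raises CalledProcessError if run.py exits with a non-zero status.
    return subprocess.run(build_eval_command(**kwargs), cwd=opencompass_dir, check=True)


if __name__ == "__main__":
    # Print the command instead of executing it, so it can be inspected first.
    print(shlex.join(build_eval_command()))
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems if the work directory contains spaces.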

Acknowledgment

Thanks to the contributors of OpenCompass.

Citing this work

If you use the SailCompass benchmark in your work, please cite:

@misc{sailcompass,
      title={SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages}, 
      author={Jia Guo and Longxu Dou and Guangtao Zeng and Stanley Kok and Wei Lu and Qian Liu},
      year={2024},
}

Contact

If you have any questions, please raise an issue on our GitHub repository or contact [email protected].
