- In the initial release, there was a bug in `data_2_terminology/train.tsv`. If you used this file, you must redownload the fixed version, as the old file did not properly include all terminologies.
- I've added `diffcheck.py` so you can check that the only difference between `data_2/train.tsv` and `data_2_terminology/train.tsv` is the 608 terminology pairs (see the sketch below).
- The baseline results for using terminology have also been updated, but the differences from before are minor.

Sorry about the confusion!!
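A check along these lines takes only a few lines of Python. This is a minimal sketch of the idea, not the shipped `diffcheck.py`, and it assumes both files hold one tab-separated sentence pair per line:

```python
# Hypothetical reimplementation of the check, not the shipped diffcheck.py.
# Assumes both files hold one tab-separated sentence pair per line.

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

base = set(read_lines("data_2/train.tsv"))
term = read_lines("data_2_terminology/train.tsv")

# The lines present only in the terminology file should be exactly
# the 608 terminology pairs.
extra = [line for line in term if line not in base]
print(f"{len(extra)} extra lines in data_2_terminology/train.tsv")
assert len(extra) == 608
```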
This is a baseline for the WMT21 Machine Translation using Terminologies task. The task invites participants to explore methods to incorporate terminologies into either the training or the inference process, in order to improve both the accuracy and consistency of MT systems on a new domain.
For the baseline, we consider the English-to-French translation task, and evaluation is performed on the TICO-19 dataset, which is part of the overall evaluation for the task in WMT21.
The baseline finetunes OPUS-MT systems, which are pre-trained on the OPUS parallel data with the Marian toolkit. We use the Huggingface-ported version of the model and train with Huggingface + PyTorch.
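For reference, the ported checkpoint can be loaded straight from the Huggingface Hub; the sketch below assumes the standard `Helsinki-NLP/opus-mt-en-fr` checkpoint:

```python
from transformers import MarianMTModel, MarianTokenizer

# Huggingface port of the Marian-trained OPUS-MT English-to-French model.
model_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Quick sanity check before finetuning.
batch = tokenizer(["The patient tested positive for COVID-19."], return_tensors="pt")
outputs = model.generate(**batch)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```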
In the challenge you are allowed to use any parallel or monolingual data from previous WMT shared tasks. However, to reduce training time and resources, we finetuned the pre-trained English-to-French model from MarianMT (OPUS-MT) on the following datasets (a loading sketch follows the tables):
Medline:

Split | Num. examples |
---|---|
Training1 | 614093 |
Training2 | 6540 |

Taus:

Split | Num. examples |
---|---|
Train | 885606 |

Terminologies:

Split | Num. examples |
---|---|
Train | 608 |
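Assuming each `train.tsv` holds one tab-separated English/French pair per line with no header row (an assumption about the file layout), the files can be loaded with pandas:

```python
import csv

import pandas as pd

# Assumed layout: English source <TAB> French target, no header row.
train_df = pd.read_csv(
    "data_2_terminology/train.tsv",
    sep="\t",
    names=["en", "fr"],
    quoting=csv.QUOTE_NONE,  # sentences may contain unbalanced quotes
)
print(len(train_df), "pairs")
print(train_df.head())
```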
For the initial training dataset, we have two subsampled versions (a generation sketch follows the tables):
- data_2
Dataset | Taus | Medline1 | Medline2 | Terminologies | Total |
---|---|---|---|---|---|
Num. examples | 30000 | 30000 | 6540 | 0 | 66540 |
Note that Medline2 has at most 6540 pairs after filtering out empty examples.
- data_2_terminology
Dataset | Taus | Medline1 | Medline2 | Terminologies | Total |
---|---|---|---|---|---|
Num. examples | 30000 | 30000 | 6540 | 608 | 67148 |
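The dataset-generation code is not yet in the repo (see the to-do list at the bottom), so the following is only an illustrative sketch of the subsampling; the source file names and the seed are hypothetical:

```python
import random

def read_pairs(path):
    # Drop empty examples, as done for Medline2 in the tables above.
    with open(path, encoding="utf-8") as f:
        return [line for line in f if line.strip()]

random.seed(0)  # hypothetical; the actual seed/ordering is not specified

taus = random.sample(read_pairs("taus.tsv"), 30000)          # hypothetical path
medline1 = random.sample(read_pairs("medline1.tsv"), 30000)  # hypothetical path
medline2 = read_pairs("medline2.tsv")    # all 6540 pairs after filtering
terms = read_pairs("terminologies.tsv")  # the 608 terminology pairs

# data_2 is the same file without the terminology pairs.
with open("data_2_terminology/train.tsv", "w", encoding="utf-8") as f:
    f.writelines(taus + medline1 + medline2 + terms)
```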
During training, we evaluate on the dev set of TICO-19, and the final evaluation is performed on the test set of TICO-19 (a finetuning sketch follows the table below). Note that both sets are relatively small.
Split | Num. examples |
---|---|
Dev | 971 |
Test | 2100 |
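Finetuning then follows the standard Huggingface sequence-to-sequence recipe. Below is a condensed sketch; the hyperparameters are illustrative rather than the exact baseline settings, and `tico19_dev.tsv` is a hypothetical path for the TICO-19 dev set:

```python
import csv

import pandas as pd
from datasets import Dataset
from transformers import (
    DataCollatorForSeq2Seq,
    MarianMTModel,
    MarianTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

def load_tsv(path):
    df = pd.read_csv(path, sep="\t", names=["en", "fr"], quoting=csv.QUOTE_NONE)
    return Dataset.from_pandas(df)

def preprocess(batch):
    # Tokenize English sources and French targets.
    inputs = tokenizer(batch["en"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["fr"], truncation=True, max_length=128)
    inputs["labels"] = labels["input_ids"]
    return inputs

train_data = load_tsv("data_2_terminology/train.tsv").map(preprocess, batched=True)
dev_data = load_tsv("tico19_dev.tsv").map(preprocess, batched=True)  # hypothetical path

args = Seq2SeqTrainingArguments(
    output_dir="opus-mt-en-fr-finetuned",
    num_train_epochs=3,              # 3 or 10, matching the results below
    evaluation_strategy="epoch",     # evaluate on the TICO-19 dev set
    per_device_train_batch_size=16,  # illustrative
    learning_rate=2e-5,              # illustrative
)
Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_data,
    eval_dataset=dev_data,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
).train()
```

The resulting BLEU scores on the TICO-19 dev and test sets are: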
Training Epochs | Use terminology | Eval set | BLEU |
---|---|---|---|
3 epochs | No | TICO-19 Dev | 40.0991 |
3 epochs | Yes | TICO-19 Dev | 40.3334 |
3 epochs | No | TICO-19 Test | 37.5342 |
3 epochs | Yes | TICO-19 Test | 37.6491 |
10 epochs | No | TICO-19 Dev | 39.9382 |
10 epochs | Yes | TICO-19 Dev | 40.0829 |
10 epochs | No | TICO-19 Test | 37.4869 |
10 epochs | Yes | TICO-19 Test | 37.579 |
Note that due to the small size of the data, these results can vary with the training setup (hyperparameters, number of epochs, etc.). In general, though, the results should be better when using terminology than without.
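The exact BLEU implementation behind these numbers isn't specified in this README; as an illustrative sketch, corpus-level BLEU can be computed with sacrebleu:

```python
import sacrebleu

# hyps: one system output per segment; refs: the matching references.
hyps = ["Le patient a été testé positif.", "Les symptômes persistent."]
refs = ["Le patient a été testé positif.", "Les symptômes persistent."]

bleu = sacrebleu.corpus_bleu(hyps, [refs])  # a single reference stream
print(f"BLEU = {bleu.score:.4f}")
```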
You can run the baseline experiments from this Colab notebook. To make further changes to the code, make sure to choose "Save a copy in Drive" to save an editable copy to your own Google Drive.
- Add code for generating the datasets
- Add editable transformer code