Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training

This repo combines a Tacotron2 model with an ML-VAE (multi-level variational autoencoder) and adversarial learning to target accent conversion in TTS settings (pick a speaker A and assign them accent B).
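In a multi-level VAE, one latent code is shared by a whole group of utterances (in this setting presumably the accent), while other codes vary per utterance. The original ML-VAE formulation (Bouchacourt et al., 2018) fuses the per-utterance Gaussian posteriors of a group into a single group posterior with a product of Gaussians. The numpy sketch below shows only that accumulation step and the reparameterization trick. It is an illustration under that assumption, not a copy of this repo's mlvae.py, and the function names are hypothetical.

```python
import numpy as np

def accumulate_group_posterior(mus, logvars):
    """Fuse per-utterance posteriors N(mu_i, var_i) of one group into a single
    group posterior with a product of Gaussians: precisions add, and the group
    mean is the precision-weighted average of the utterance means."""
    mus = np.asarray(mus, dtype=np.float64)                      # (n_utts, dim)
    precisions = np.exp(-np.asarray(logvars, dtype=np.float64))  # 1 / var_i
    group_var = 1.0 / precisions.sum(axis=0)
    group_mu = group_var * (precisions * mus).sum(axis=0)
    return group_mu, np.log(group_var)

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(np.shape(mu))

if __name__ == "__main__":
    # Two utterances from the same (hypothetical) accent group, unit variances.
    mus = np.array([[0.0, 1.0], [2.0, 1.0]])
    logvars = np.zeros((2, 2))
    mu_g, logvar_g = accumulate_group_posterior(mus, logvars)
    print(mu_g, np.exp(logvar_g))  # prints [1. 1.] [0.5 0.5]
    z_group = reparameterize(mu_g, logvar_g, np.random.default_rng(0))
```

Because precisions add, every extra utterance of the same group shrinks the group variance, so the shared code becomes more confident as more evidence for that group is seen.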

Paper link: https://arxiv.org/abs/2406.01018

Samples link: https://amaai-lab.github.io/Accented-TTS-MLVAE-ADV/

This code is built upon Comprehensive-Transformer-TTS: https://github.com/keonlee9420/Comprehensive-Transformer-TTS

Training

First, download the datasets (L2Arctic and CMUArctic) and preprocess the audio into mel-spectrogram .npy arrays with the preprocess.py script. In the paper we used L2CMU, a combination of L2Arctic (24 speakers) and CMUArctic (4 speakers). Then start training with

CUDA_VISIBLE_DEVICES=X python train.py --dataset L2CMU

where X is the index of the GPU to use.
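preprocess.py turns each waveform into a mel spectrogram stored as a .npy array. Its exact settings are not reproduced here; the numpy-only sketch below shows the general computation (framed STFT, triangular mel filterbank, log compression). The sample rate, FFT size, hop length and 80 mel bands are common TTS defaults assumed for illustration, and the function names are hypothetical; the values the repo actually uses are presumably set in config/L2CMU.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins onto n_mels mel bands."""
    fmax = sr / 2 if fmax is None else fmax
    mel_points = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    bins = np.clip(bins, 0, n_fft // 2)  # keep indices inside the spectrum
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):    # rising slope
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):   # falling slope
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(wav, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Return a (n_mels, n_frames) log-mel spectrogram of a mono waveform."""
    wav = np.asarray(wav, dtype=np.float64)
    wav = np.pad(wav, (n_fft // 2, n_fft // 2), mode="reflect")  # center the frames
    n_frames = 1 + (len(wav) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([wav[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ magnitude.T  # (n_mels, n_frames)
    return np.log(np.clip(mel, 1e-5, None))                # clipping avoids log(0)

if __name__ == "__main__":
    sr = 22050
    t = np.arange(sr) / sr
    wav = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1 s test tone at 440 Hz
    mel = log_mel_spectrogram(wav, sr=sr)
    print(mel.shape)  # prints (80, 87)
```

Each result can then be written to disk with np.save(path, mel.astype(np.float32)), which produces the kind of .npy array the training script loads.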

Inference

After training, run extract_stats.py to retrieve and store the accent and speaker embeddings of your evaluation set. You can then synthesize speech with one of the synthesis scripts (e.g. synthesize_testset.py or synth_batch_extract_stats.py). :-)
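The internals of extract_stats.py are not described here. A minimal sketch, assuming that "storing the embeddings" means averaging the per-utterance embedding vectors for each accent and each speaker and saving the means to a JSON file; the averaging, the file format and all function names are hypothetical.

```python
import json
from collections import defaultdict

import numpy as np

def average_embeddings(embeddings, labels):
    """Group embedding vectors by label and return {label: mean vector}."""
    groups = defaultdict(list)
    for emb, label in zip(embeddings, labels):
        groups[label].append(np.asarray(emb, dtype=np.float64))
    return {label: np.mean(vecs, axis=0) for label, vecs in groups.items()}

def save_stats(path, accent_means, speaker_means):
    """Store per-accent and per-speaker mean embeddings as JSON lists."""
    payload = {
        "accent": {k: v.tolist() for k, v in accent_means.items()},
        "speaker": {k: v.tolist() for k, v in speaker_means.items()},
    }
    with open(path, "w") as f:
        json.dump(payload, f)

def load_stats(path):
    """Inverse of save_stats: rebuild {"accent": {...}, "speaker": {...}} of arrays."""
    with open(path) as f:
        payload = json.load(f)
    return {kind: {k: np.array(v) for k, v in table.items()}
            for kind, table in payload.items()}

if __name__ == "__main__":
    # Toy 2-D accent embeddings for three evaluation utterances.
    means = average_embeddings([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]],
                               ["Arabic", "Arabic", "Hindi"])
    print(means["Arabic"], means["Hindi"])  # prints [2. 0.] [0. 2.]
```

Averaging over many utterances smooths out per-sentence variation, which is one plausible reason to precompute these statistics once instead of re-encoding reference audio at synthesis time.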

To generate accent-converted or non-converted speech, run

CUDA_VISIBLE_DEVICES=X python synthesize.py --dataset L2Arctic --restore_step [N] --mode [batch/single] --text [TXT] --speaker_id [SPID] --accent [ACC]

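N = training step of the checkpoint to restore

TXT = the sentence to synthesize (single mode)

SPID = a speaker ID from the L2Arctic dataset, e.g. ABA, ASI, NCC, ...

ACC = one of the L2Arctic accents: Arabic, Chinese, Hindi, Korean, Spanish, Vietnamese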
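Accent conversion pairs the identity of speaker SPID with the accent ACC. A minimal sketch of how stored embeddings could be combined into one conditioning input, assuming they are loaded as a dict of the form {"speaker": {id: vector}, "accent": {name: vector}} and that the two codes are simply concatenated; both the dict layout and the concatenation are assumptions, and the model may combine the codes differently. The function name is hypothetical.

```python
import numpy as np

# Accents available in L2Arctic (the valid values of ACC).
L2ARCTIC_ACCENTS = ("Arabic", "Chinese", "Hindi", "Korean", "Spanish", "Vietnamese")

def conversion_condition(stats, speaker_id, accent):
    """Return a conditioning vector for `speaker_id` speaking with `accent`.

    Passing the speaker's own accent gives non-converted speech;
    any other accent gives accent-converted speech.
    """
    if accent not in L2ARCTIC_ACCENTS:
        raise ValueError(f"unknown accent {accent!r}; expected one of {L2ARCTIC_ACCENTS}")
    if speaker_id not in stats["speaker"]:
        raise KeyError(f"no stored embedding for speaker {speaker_id!r}")
    speaker_vec = np.asarray(stats["speaker"][speaker_id], dtype=np.float64)
    accent_vec = np.asarray(stats["accent"][accent], dtype=np.float64)
    return np.concatenate([speaker_vec, accent_vec])

if __name__ == "__main__":
    # Toy embeddings; real ones would come from the stored statistics.
    stats = {"speaker": {"ABA": np.array([0.1, 0.2])},
             "accent": {"Arabic": np.array([1.0]), "Hindi": np.array([2.0])}}
    print(conversion_condition(stats, "ABA", "Hindi"))  # prints [0.1 0.2 2. ]
```

Keeping the speaker and accent codes as separate vectors is what makes the swap possible: changing only the accent entry leaves the speaker identity untouched.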

Unfortunately, we do not provide a trained model as of now.

BibTeX citation

@article{melechovsky2024accent,
      title={Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training}, 
      author={Jan Melechovsky and Ambuj Mehrish and Berrak Sisman and Dorien Herremans},
      journal={arXiv preprint arXiv:2406.01018},
      year={2024}
}
