GitHub - yzmyyff/voicebox-pytorch: Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch

Voicebox - Pytorch (wip)

Implementation of Voicebox, the new SOTA Text-to-Speech model from MetaAI, in Pytorch. See the press release.

In this work, we will use rotary embeddings. The authors seem unaware that ALiBi cannot be straightforwardly used for bidirectional models.
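To illustrate why rotary embeddings suit a bidirectional model, here is a minimal sketch of rotary position embeddings. It is not necessarily how this repository implements them; the helper names `rotary_frequencies`, `rotate_half` and `apply_rotary`, the base `theta = 10000`, and the tensor shapes are assumptions made for the example. Queries and keys are rotated by an angle proportional to their position, so the attention score between two frames depends only on their signed offset, in either direction.

```python
import torch

def rotary_frequencies(seq_len: int, dim_head: int, theta: float = 10000.0) -> torch.Tensor:
    # one rotation frequency per pair of channels (theta is an assumed base)
    inv_freq = 1.0 / (theta ** (torch.arange(0, dim_head, 2).float() / dim_head))
    positions = torch.arange(seq_len).float()
    freqs = torch.outer(positions, inv_freq)      # (seq_len, dim_head / 2)
    return torch.cat((freqs, freqs), dim=-1)      # (seq_len, dim_head)

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # pair channel i with channel i + dim_head / 2 and rotate the pair by 90 degrees
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    # rotate every channel pair by its position-dependent angle
    return x * freqs.cos() + rotate_half(x) * freqs.sin()

q = torch.randn(1, 4, 16, 64)   # (batch, heads, frames, dim_head)
k = torch.randn(1, 4, 16, 64)

freqs = rotary_frequencies(seq_len=16, dim_head=64)
q_rot, k_rot = apply_rotary(q, freqs), apply_rotary(k, freqs)

# attention logits now carry relative position information
scores = q_rot @ k_rot.transpose(-1, -2)   # (1, 4, 16, 16)
```

Because the rotation only changes the angle of each channel pair, vector norms are preserved, and no causal or one-sided bias is baked into the scores.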

The paper also appears to subject the time embedding to relative distances incorrectly (the authors concatenate the time embedding along the frame dimension of the audio tokens). This repository will instead use adaptive normalization, as applied successfully in Paella.
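As a rough illustration of the adaptive normalization idea, the sketch below conditions a layer norm on a time embedding. The class name `AdaptiveLayerNorm`, the toy `to_time_emb` network and all dimensions are hypothetical and chosen for the example; the repository's actual module may differ. The point is that the time signal is broadcast to every frame as a scale and shift, so it never occupies a position in the sequence and is never affected by positional encoding.

```python
import torch
from torch import nn

class AdaptiveLayerNorm(nn.Module):
    """Hypothetical sketch: layer norm whose scale and shift are predicted from a conditioning vector."""

    def __init__(self, dim: int, cond_dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.to_gamma = nn.Linear(cond_dim, dim)
        self.to_beta = nn.Linear(cond_dim, dim)

        # initialise so the module starts out as a plain layer norm (gamma = 1, beta = 0)
        nn.init.zeros_(self.to_gamma.weight)
        nn.init.ones_(self.to_gamma.bias)
        nn.init.zeros_(self.to_beta.weight)
        nn.init.zeros_(self.to_beta.bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim), cond: (batch, cond_dim)
        gamma = self.to_gamma(cond).unsqueeze(1)   # (batch, 1, dim), shared by all frames
        beta = self.to_beta(cond).unsqueeze(1)
        return self.norm(x) * gamma + beta

# toy time embedding, standing in for whatever the real model uses
to_time_emb = nn.Sequential(nn.Linear(1, 128), nn.SiLU())

times = torch.rand(2)                      # one flow time in [0, 1] per batch element
time_emb = to_time_emb(times[:, None])     # (2, 128)

x = torch.randn(2, 1024, 512)              # (batch, frames, dim)
norm = AdaptiveLayerNorm(dim=512, cond_dim=128)
out = norm(x, time_emb)                    # (2, 1024, 512)
```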

Appreciation

for awarding me the Imminent Grant to advance the state of open sourced text-to-speech solutions. This project was started and will be completed under this grant.
StabilityAI for the generous sponsorship, as well as my other sponsors, for affording me the independence to open source artificial intelligence.
Bryan Chiang for the ongoing code review, sharing his expertise on TTS, and pointing me to an open sourced implementation of conditional flow matching
Manmay for getting the repository started with the alignment code
@chenht2010 for finding a bug with rotary positions, and for validating that the code in the repository converges

Install

$ pip install voicebox-pytorch

Usage

import torch

from voicebox_pytorch import (
    VoiceBox,
    ConditionalFlowMatcherWrapper
)

model = VoiceBox(
    dim = 512,
    num_phoneme_tokens = 256,
    depth = 2,
    dim_head = 64,
    heads = 16
)

cfm_wrapper = ConditionalFlowMatcherWrapper(
    voicebox = model,
    use_torchode = False   # by default will use torchdiffeq with midpoint as in paper, but can use the promising torchode package too
)

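x = torch.randn(2, 1024, 512)                    # (batch, frames, dim)
phonemes = torch.randint(0, 256, (2, 1024))      # (batch, frames), ids below num_phoneme_tokens
mask = torch.randint(0, 2, (2, 1024)).bool()     # (batch, frames) boolean mask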

loss = cfm_wrapper(
    x,
    phoneme_ids = phonemes,
    cond = x,
    mask = mask
)

loss.backward()

# after much training above...

sampled = cfm_wrapper.sample(
    phoneme_ids = phonemes,
    cond = x,
    mask = mask
) # (2, 1024, 512) <- same as cond
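For readers curious what the wrapper does conceptually, here is a toy sketch of conditional flow matching with a fixed-step midpoint sampler. It is not the repository's code: `ToyVelocityNet`, `cfm_loss`, `sample_midpoint` and the value of `SIGMA_MIN` are hypothetical, the phoneme and audio conditioning are omitted, and the hand-written midpoint loop merely stands in for the torchdiffeq solver mentioned above. It assumes the optimal-transport path described in the Voicebox paper, where the model regresses the constant velocity that carries a noise sample to a data sample.

```python
import torch
from torch import nn

SIGMA_MIN = 1e-5  # assumed small constant for the optimal-transport path

class ToyVelocityNet(nn.Module):
    """Hypothetical stand-in for the Voicebox transformer: predicts a velocity from (x_t, t)."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # broadcast the time to every frame and append it as one extra feature
        t_feat = t.reshape(-1, 1, 1).expand(x.shape[0], x.shape[1], 1)
        return self.net(torch.cat((x, t_feat), dim=-1))

def cfm_loss(model, x1: torch.Tensor) -> torch.Tensor:
    x0 = torch.randn_like(x1)                         # noise sample
    t = torch.rand(x1.shape[0])                       # one time per batch element
    t_ = t.reshape(-1, 1, 1)
    xt = (1 - (1 - SIGMA_MIN) * t_) * x0 + t_ * x1    # point on the straight path from noise to data
    target = x1 - (1 - SIGMA_MIN) * x0                # constant velocity along that path
    return nn.functional.mse_loss(model(xt, t), target)

@torch.no_grad()
def sample_midpoint(model, shape, steps: int = 8) -> torch.Tensor:
    x = torch.randn(shape)                            # start from noise at t = 0
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.tensor(i * dt)
        x_mid = x + 0.5 * dt * model(x, t)            # half step with the current velocity
        x = x + dt * model(x_mid, t + 0.5 * dt)       # full step with the midpoint velocity
    return x

toy_model = ToyVelocityNet(dim=8)

toy_loss = cfm_loss(toy_model, torch.randn(2, 16, 8))
toy_loss.backward()

toy_sampled = sample_midpoint(toy_model, (2, 16, 8))  # (2, 16, 8)
```

In the real wrapper the velocity network additionally receives the phoneme ids, the conditioning audio and the mask, as in the usage example above.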

Todo

Citations

@article{Le2023VoiceboxTM,
    title   = {Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale},
    author  = {Matt Le and Apoorv Vyas and Bowen Shi and Brian Karrer and Leda Sari and Rashel Moritz and Mary Williamson and Vimal Manohar and Yossi Adi and Jay Mahadeokar and Wei-Ning Hsu},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2306.15687},
    url     = {https://api.semanticscholar.org/CorpusID:259275061}
}

@inproceedings{dao2022flashattention,
    title   = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author  = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year    = {2022}
}

@misc{torchdiffeq,
    author  = {Chen, Ricky T. Q.},
    title   = {torchdiffeq},
    year    = {2018},
    url     = {https://github.com/rtqichen/torchdiffeq},
}

@inproceedings{lienen2022torchode,
    title     = {torchode: A Parallel {ODE} Solver for PyTorch},
    author    = {Marten Lienen and Stephan G{\"u}nnemann},
    booktitle = {The Symbiosis of Deep Learning and Differential Equations II, NeurIPS},
    year      = {2022},
    url       = {https://openreview.net/forum?id=uiKVKTiUYB0}
}

@article{siuzdak2023vocos,
    title   = {Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
    author  = {Siuzdak, Hubert},
    journal = {arXiv preprint arXiv:2306.00814},
    year    = {2023}
}

