Modern TTS models for various languages #523

snakers4 · 2021-04-02T07:32:29Z

Consider giving the Silero TTS models a try. They are published under an open license that assumes non-commercial / personal use. Our TTS models are listed here: https://github.com/snakers4/silero-models#text-to-speech (see also the accompanying article: https://habr.com/ru/post/549482/).

Most importantly, our TTS models run decently on a single CPU thread / core and have few dependencies beyond PyTorch.

Let me repost some of the benchmarks here:

- RTF (Real Time Factor): the time the synthesis takes divided by the duration of the produced audio;
- RTS (Real Time Speed) = 1 / RTF: how many times faster than real time the synthesis runs.
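A minimal sketch of the two metrics in Python; the function names are illustrative and not part of any Silero API. An RTF below 1 (equivalently, an RTS above 1) means audio is produced faster than it plays back.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF: wall-clock synthesis time divided by the duration of the produced audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def real_time_speed(rtf: float) -> float:
    """RTS = 1 / RTF: seconds of audio produced per second of compute."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf


# Example: 7 s of compute to synthesize 10 s of speech.
rtf = real_time_factor(7.0, 10.0)
print(rtf, round(real_time_speed(rtf), 2))  # 0.7 1.43
```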

We benchmarked the models on two devices using PyTorch 1.8 utilities:

- CPU: Intel Core i7-6800K @ 3.40 GHz;
- GPU: NVIDIA GTX 1080 Ti.

When measuring CPU performance, we also limited the number of threads used.
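A rough sketch of how such a thread-limited benchmark could be set up, assuming a PyTorch model. `torch.set_num_threads` is the standard PyTorch call for capping intra-op CPU threads; the `synthesize` callable, its contract (take a batch of texts, return the total seconds of audio produced) and `fake_synthesize` are hypothetical stand-ins, not the authors' benchmark code. The batch size is simply the number of texts passed in.

```python
from __future__ import annotations

import time
from typing import Callable, Sequence

try:
    import torch  # optional: only used to cap the number of CPU threads
except ImportError:
    torch = None


def benchmark(
    synthesize: Callable[[Sequence[str]], float],
    texts: Sequence[str],
    num_threads: int | None = None,
    repeats: int = 3,
    warmup: int = 1,
) -> tuple[float, float]:
    """Return (RTF, RTS); `synthesize` must return the total seconds of audio it produced."""
    if num_threads is not None and torch is not None:
        torch.set_num_threads(num_threads)  # limit PyTorch intra-op CPU threads
    for _ in range(warmup):
        synthesize(texts)  # untimed warm-up run
    compute_seconds = 0.0
    audio_seconds = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        audio_seconds += synthesize(texts)
        compute_seconds += time.perf_counter() - start
    rtf = compute_seconds / audio_seconds
    return rtf, 1.0 / rtf


def fake_synthesize(texts: Sequence[str]) -> float:
    """Hypothetical stand-in model: ~10 ms of 'work' per text, 1 s of audio per text."""
    time.sleep(0.01 * len(texts))
    return 1.0 * len(texts)


rtf, rts = benchmark(fake_synthesize, ["Hello world."] * 4, num_threads=1)
print(f"RTF={rtf:.3f} RTS={rts:.1f}")
```

The warm-up pass and the repeat count are choices of this sketch; they smooth out one-off costs such as lazy initialization.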

For the 16 kHz models, we got the following metrics:

| Batch size | Device        | RTF   | RTS   |
| ---------- | ------------- | ----- | ----- |
| 1          | CPU 1 thread  | 0.7   | 1.4   |
| 1          | CPU 2 threads | 0.4   | 2.3   |
| 1          | CPU 4 threads | 0.3   | 3.1   |
| 4          | CPU 1 thread  | 0.5   | 2.0   |
| 4          | CPU 2 threads | 0.3   | 3.2   |
| 4          | CPU 4 threads | 0.2   | 4.9   |
| 1          | GPU           | 0.06  | 16.9  |
| 4          | GPU           | 0.02  | 51.7  |
| 8          | GPU           | 0.01  | 79.4  |
| 16         | GPU           | 0.008 | 122.9 |
| 32         | GPU           | 0.006 | 161.2 |
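Since RTS is the number of audio seconds produced per second of compute, the table can be turned into rough wall-clock estimates. The sketch below does this for a 60-second clip, using RTS values copied from the 16 kHz table; it assumes that the RTS of a batch refers to the total audio produced by the whole batch, which the table itself does not spell out.

```python
# RTS values copied from the 16 kHz table, keyed by (batch size, device).
RTS_16KHZ = {
    (1, "CPU 1 thread"): 1.4,
    (1, "CPU 2 threads"): 2.3,
    (1, "CPU 4 threads"): 3.1,
    (4, "CPU 1 thread"): 2.0,
    (4, "CPU 2 threads"): 3.2,
    (4, "CPU 4 threads"): 4.9,
    (1, "GPU"): 16.9,
    (4, "GPU"): 51.7,
    (8, "GPU"): 79.4,
    (16, "GPU"): 122.9,
    (32, "GPU"): 161.2,
}


def estimated_seconds(audio_seconds: float, rts: float) -> float:
    """Wall-clock time needed to synthesize `audio_seconds` of speech at a given RTS."""
    return audio_seconds / rts


for (batch, device), rts in RTS_16KHZ.items():
    print(f"batch={batch:>2} {device:<13} ~{estimated_seconds(60, rts):5.1f} s for 60 s of audio")
```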

For the 8 kHz models, we got the following metrics:

| Batch size | Device        | RTF   | RTS   |
| ---------- | ------------- | ----- | ----- |
| 1          | CPU 1 thread  | 0.5   | 1.9   |
| 1          | CPU 2 threads | 0.3   | 3.0   |
| 1          | CPU 4 threads | 0.2   | 4.2   |
| 4          | CPU 1 thread  | 0.4   | 2.8   |
| 4          | CPU 2 threads | 0.2   | 4.4   |
| 4          | CPU 4 threads | 0.1   | 6.6   |
| 1          | GPU           | 0.06  | 17.5  |
| 4          | GPU           | 0.02  | 55.0  |
| 8          | GPU           | 0.01  | 92.1  |
| 16         | GPU           | 0.007 | 147.7 |
| 32         | GPU           | 0.004 | 227.5 |
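To compare the two sample rates, the RTS values from both tables can be divided row by row. A short sketch using a subset of rows copied from the two tables; a ratio above 1 means the 8 kHz model is faster in that configuration.

```python
# RTS values copied from the 16 kHz and 8 kHz tables, keyed by (batch size, device).
RTS_16KHZ = {
    (1, "CPU 1 thread"): 1.4,
    (1, "CPU 4 threads"): 3.1,
    (4, "CPU 4 threads"): 4.9,
    (1, "GPU"): 16.9,
    (8, "GPU"): 79.4,
    (32, "GPU"): 161.2,
}
RTS_8KHZ = {
    (1, "CPU 1 thread"): 1.9,
    (1, "CPU 4 threads"): 4.2,
    (4, "CPU 4 threads"): 6.6,
    (1, "GPU"): 17.5,
    (8, "GPU"): 92.1,
    (32, "GPU"): 227.5,
}

# Speed-up of the 8 kHz model relative to the 16 kHz model for each shared configuration.
speedup = {key: RTS_8KHZ[key] / RTS_16KHZ[key] for key in RTS_16KHZ.keys() & RTS_8KHZ.keys()}

for (batch, device), ratio in sorted(speedup.items()):
    print(f"batch={batch:>2} {device:<13} 8 kHz / 16 kHz RTS = {ratio:.2f}")
```

Only the published, rounded RTS values are used, so the ratios inherit that rounding.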
