Guidelines to Record a TTS Dataset at Home

This section describes how to get the best audio quality when recording TTS data at home.

Recommended Data

Start by recording the Harvard sentences (720 phonetically balanced sentences in 72 lists of 10), which should produce about 20 to 40 minutes of usable TTS data.

Hardware Requirements

Use a microphone like the Audio-Technica AT2020USB+ or the Blue Yeti USB mic.
Get a pop filter or windscreen.

Software Requirements

This document focuses on Audacity: it’s free, and it has a dB meter and keyboard shortcuts that make recording easy.

Reaper is also a good option but is not yet covered in this document.

Do not use recording software that lacks a dB meter with numeric markings.

Recording Prerequisites

Connect your microphone to your computer.
Open Audacity.
In Audacity, select your microphone or audio interface as the recording device.

Set the bit depth to 24-bit (preferred) or 16-bit.

Set the sampling rate to 96 kHz (preferred) or 44.1 kHz.

On the microphone, set the polar pattern to cardioid if you have that option.

Set up the pop filter or windscreen.

Choose the quietest room available, and close the windows and doors.
Eliminate external sources of noise, such as air conditioning and computer fans.
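To confirm that exported files actually use the bit depth and sampling rate set above, a minimal Python sketch can read the WAV header with the standard-library `wave` module. It assumes uncompressed PCM WAV files: `wave` cannot read 32-bit float WAV, and older Python versions may reject the WAVE_FORMAT_EXTENSIBLE header that some programs write for 24-bit files. The function names and the "preferred"/"acceptable" labels are illustrative, not part of Audacity.

```python
import wave

# Recommended settings from this guide: (bit depth in bits, sample rate in Hz).
PREFERRED = (24, 96_000)
ACCEPTABLE = (16, 44_100)  # 44.1 kHz


def read_format(source):
    """Return (bit_depth, sample_rate, channels) for a PCM WAV path or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getsampwidth() * 8, wav.getframerate(), wav.getnchannels()


def classify_format(bit_depth, sample_rate):
    """Label a format 'preferred', 'acceptable' or 'too low' against the recommendations."""
    if bit_depth >= PREFERRED[0] and sample_rate >= PREFERRED[1]:
        return "preferred"
    if bit_depth >= ACCEPTABLE[0] and sample_rate >= ACCEPTABLE[1]:
        return "acceptable"
    return "too low"


if __name__ == "__main__":
    import sys

    # Pass one or more exported WAV files on the command line.
    for path in sys.argv[1:]:
        depth, rate, channels = read_format(path)
        print(f"{path}: {depth}-bit, {rate} Hz, {channels} ch -> {classify_format(depth, rate)}")
```

Running it on a short test export before the real session catches a wrong project setting early, when it is cheap to fix.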

Adjusting the Microphone Level and Body Position Before Recording

Set your microphone gain to the 9 o'clock position, as if it were the hour hand of a clock pointing at 9.
Press the Record button in Audacity.
Make sure you’re speaking into the side of the microphone that has the brand logo.

Position yourself at least a fist away from the microphone and no more than a foot away.

Speak into the microphone with the voice you will use during the TTS data recording.
Adjust the microphone gain to optimize the signal-to-noise ratio:

If you’re recording at 24-bit, make sure the dB meter peaks between -24 dB and -6 dB while you speak.

If you’re recording at 16-bit, make sure the dB meter peaks between -12 dB and -6 dB while you speak.

Do not change the microphone gain after you’ve adjusted it, so that the level stays consistent across all recordings.
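Audacity's meter does this measurement live, but the same arithmetic can be checked offline on an exported test take. The minimal Python sketch below assumes that the meter reading you are matching is the peak level in dBFS (20·log10 of the largest absolute sample, with full scale = 1.0) and that the file is little-endian signed PCM WAV at 16, 24 or 32 bits (8-bit WAV is unsigned and not handled). All names are illustrative.

```python
import math
import wave

# Target peak ranges from this guide, in dBFS, keyed by bit depth.
TARGET_PEAK_DBFS = {24: (-24.0, -6.0), 16: (-12.0, -6.0)}


def pcm_to_floats(frames, sampwidth):
    """Decode little-endian signed PCM (16-, 24- or 32-bit) into floats in [-1.0, 1.0)."""
    full_scale = 2 ** (8 * sampwidth - 1)
    return [
        int.from_bytes(frames[i:i + sampwidth], "little", signed=True) / full_scale
        for i in range(0, len(frames), sampwidth)
    ]


def read_samples(source):
    """Return (samples, bit_depth) for a PCM WAV path or binary file object."""
    with wave.open(source, "rb") as wav:
        width = wav.getsampwidth()
        frames = wav.readframes(wav.getnframes())
    return pcm_to_floats(frames, width), width * 8


def peak_dbfs(samples):
    """Peak level in dBFS: 0 dBFS is digital full scale, silence is -inf."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def level_ok(samples, bit_depth):
    """True if the peak falls inside the recommended range (KeyError for other depths)."""
    low, high = TARGET_PEAK_DBFS[bit_depth]
    return low <= peak_dbfs(samples) <= high
```

For example, after exporting a test sentence as a hypothetical `level_test.wav`, `level_ok(*read_samples("level_test.wav"))` reports whether that take landed in range. A peak of 0.5 of full scale is about -6.02 dBFS, which is why -6 dB is the practical ceiling here.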

Positioning Yourself Just Right, Too Far or Too Close

1 fist away (just right): no distortion, minimal room sound, and a good signal-to-noise ratio.

1 inch away (too close): muffled sound from the proximity effect, and distortion from plosives such as /b/ and /p/.

2 feet away (too far): a lot of room sound and a poor signal-to-noise ratio.
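To compare these positions with numbers rather than by ear, a rough signal-to-noise estimate divides the RMS level of a speech take by the RMS level of a few seconds of room tone recorded at the same gain while you stay silent. This is a minimal Python sketch under that assumption: the speech take still contains the room noise, so the figure is approximate, and the samples are assumed to be floats already normalized to [-1.0, 1.0]. The function names are illustrative.

```python
import math


def rms(samples):
    """Root-mean-square level of normalized samples (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech, room_tone):
    """Approximate SNR in dB: 20 * log10(RMS of speech / RMS of room tone)."""
    noise = rms(room_tone)
    if noise == 0.0:
        return float("inf")  # digital silence: no measurable noise floor
    signal = rms(speech)
    if signal == 0.0:
        return float("-inf")  # nothing was recorded in the speech take
    return 20 * math.log10(signal / noise)
```

Recording the same sentence at one fist and at two feet and scoring both against the same room tone should show the closer take with the higher value, matching the comparison above.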

Recording the TTS Data

Prepare your script so that you can read it easily.
Press Shift+R to record into a new track, read the first sentence, and press the Spacebar to stop recording.
Repeat the previous step for each sentence in your dataset until you have recorded the last one.
Export all tracks by selecting File > Export > Export Multiple... to save each track as its own file.
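After exporting, it helps to check how much audio you actually have against the 20 to 40 minutes mentioned under Recommended Data. This minimal Python sketch sums the durations of the PCM WAV files in one export folder using the standard-library `wave` module; the folder you pass in is whatever you chose in the export dialog, and the function names are illustrative. Note that the total includes any silence at the start and end of each take.

```python
import wave
from pathlib import Path


def duration_seconds(source):
    """Length of one PCM WAV file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_minutes(folder):
    """Sum the durations of all .wav files directly inside `folder`, in minutes."""
    # str() keeps this compatible with Python versions whose wave.open expects str paths.
    return sum(duration_seconds(str(p)) for p in Path(folder).glob("*.wav")) / 60


if __name__ == "__main__":
    import sys

    # Pass one or more export folders on the command line.
    for folder in sys.argv[1:]:
        print(f"{folder}: {total_minutes(folder):.1f} minutes")
```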
