This repo contains the training code for Phoneme-level ASR for Voice Conversion (VC) and TTS (Text-Mel Alignment) used in StarGANv2-VC and StyleTTS.
git clone https://github.com/yl4579/AuxiliaryASR.git
cd AuxiliaryASR
pip install SoundFile torchaudio torch jiwer pyyaml click matplotlib g2p_en librosa
Put train_list.txt and val_list.txt in the Data folder (see the Training section for more details).
python train.py --config_path ./Configs/config.yml
Please specify the training and validation data in the config.yml file. The data list format needs to be filename.wav|label|speaker_number; see train_list.txt for an example (a subset of LJSpeech). Note that speaker_number can just be 0 for ASR, but it is useful to set a meaningful number for TTS training (if you need to use this repo for StyleTTS).
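As a rough illustration of the list format described above, the following sketch splits one line into its three fields. It is not code from this repo: the names DataEntry and parse_list_line and the sample line are hypothetical, and it assumes the label itself contains no "|" character.

```python
from typing import NamedTuple


class DataEntry(NamedTuple):
    """One row of a data list (hypothetical helper type, not part of the repo)."""
    wav_path: str
    label: str
    speaker_number: int


def parse_list_line(line: str) -> DataEntry:
    """Split a 'filename.wav|label|speaker_number' line into typed fields."""
    parts = line.rstrip("\n").split("|")
    if len(parts) != 3:
        # Assumption: the label never contains '|', so exactly 3 fields are expected.
        raise ValueError(f"expected 3 '|'-separated fields, got {len(parts)}: {line!r}")
    wav_path, label, speaker = parts
    return DataEntry(wav_path, label, int(speaker))


# Made-up sample line; speaker_number is 0, as suggested for ASR-only training.
entry = parse_list_line("sample_0001.wav|Hello world.|0\n")
print(entry)
```

Running a check like this over train_list.txt and val_list.txt before training can catch malformed rows early.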
Checkpoints and Tensorboard logs will be saved at log_dir. To speed up training, you may want to make batch_size as large as your GPU RAM can take. However, please note that batch_size = 64 will take around 10 GB of GPU RAM.
This repo is set up for English with the g2p_en package, but you can train it with other languages. If you would like to train on datasets in different languages, you will need to modify the meldataset.py file (L86-93) to use your own phonemizer. You also need to change the vocabulary file (word_index_dict.txt) and change n_token in config.yml to reflect the number of tokens. A recommended phonemizer for other languages is phonemizer.
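To make the relationship between the vocabulary and n_token concrete, here is a minimal sketch that collects the tokens a phonemizer produces and counts them. Everything in it is an assumption for illustration: build_token_index, toy_phonemizer, and the special tokens "<pad>" and "<unk>" are hypothetical, and the toy phonemizer simply treats each character as a token. It does not reproduce the on-disk format of word_index_dict.txt, which should follow the file shipped with the repo.

```python
from typing import Callable, Dict, Iterable, List


def build_token_index(
    texts: Iterable[str],
    phonemize_fn: Callable[[str], List[str]],
    special_tokens: List[str],
) -> Dict[str, int]:
    """Assign a unique integer index to every token seen in `texts`."""
    # Special tokens take the first indices (assumed convention, not the repo's).
    token_to_index = {tok: i for i, tok in enumerate(special_tokens)}
    for text in texts:
        for tok in phonemize_fn(text):
            if tok not in token_to_index:
                token_to_index[tok] = len(token_to_index)
    return token_to_index


def toy_phonemizer(text: str) -> List[str]:
    # Placeholder only: treats every non-space character as one "phoneme".
    # Replace with a real phonemizer for your language.
    return [ch for ch in text.lower() if not ch.isspace()]


vocab = build_token_index(["hola", "al"], toy_phonemizer, ["<pad>", "<unk>"])
n_token = len(vocab)  # the value that n_token in config.yml would need to match
print(vocab, n_token)
```

Whatever phonemizer is used, the token set it can emit and the n_token value in config.yml need to stay in sync.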
The author would like to thank @tosaka-m for his great repository and valuable discussions.