
AuxiliaryASR

This repo contains the training code for Phoneme-level ASR for Voice Conversion (VC) and TTS (Text-Mel Alignment) used in StarGANv2-VC and StyleTTS.

Pre-requisites

  1. Python >= 3.7
  2. Clone this repository:
     git clone https://github.com/yl4579/AuxiliaryASR.git
     cd AuxiliaryASR
  3. Install Python requirements:
     pip install SoundFile torchaudio torch jiwer pyyaml click matplotlib g2p_en librosa
  4. Prepare your own dataset and put train_list.txt and val_list.txt in the Data folder (see the Training section for more details).

Training

python train.py --config_path ./Configs/config.yml

Specify the training and validation data lists in config.yml. Each line of a data list has the format filename.wav|label|speaker_number; see train_list.txt for an example (a subset of LJSpeech). For ASR alone, speaker_number can simply be 0, but set a meaningful speaker number if you plan to use this repo for TTS training with StyleTTS.
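To make the data list format concrete, here is a minimal sketch of reading filename.wav|label|speaker_number lines. It is not the repo's own loader: the Entry type, the parse_data_list helper, and the sample file names are hypothetical, and the sketch assumes that labels never contain the | character.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    """One data list line (hypothetical helper type, not part of the repo)."""
    wav_path: str
    label: str
    speaker: int


def parse_data_list(lines: List[str]) -> List[Entry]:
    """Parse `filename.wav|label|speaker_number` lines, skipping blank ones."""
    entries = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        parts = line.split("|")
        # Assumption: exactly three fields, so a label must not contain '|'.
        if len(parts) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 '|'-separated fields, got {len(parts)}"
            )
        wav_path, label, speaker = parts
        entries.append(Entry(wav_path, label, int(speaker)))
    return entries


# Made-up sample lines in the documented format.
sample = [
    "audio_0001.wav|hello world|0",
    "audio_0002.wav|a second utterance|0",
]
entries = parse_data_list(sample)
print(entries[0].wav_path, entries[0].speaker)
```

Running a check like this over train_list.txt and val_list.txt before training can catch malformed lines early.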

Checkpoints and TensorBoard logs are saved in log_dir. To speed up training, make batch_size as large as your GPU memory allows; note that batch_size = 64 takes around 10 GB of GPU memory.

Languages

This repo is set up for English with the g2p_en package, but it can be trained on other languages. To do so, replace the phonemizer in meldataset.py (L86-93) with your own, update the vocabulary file (word_index_dict.txt) to cover your phoneme set, and change n_token in config.yml to match the number of tokens. A recommended phonemizer for other languages is the phonemizer package.
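As a rough illustration of the vocabulary step, the sketch below collects the distinct symbols a phonemizer emits for a set of labels, assigns each an index, and counts them. Everything in it is an assumption for illustration: build_token_index, to_csv_text, the character-level char_phonemizer stand-in, the reserved tokens, and the "symbol",index CSV layout. Check the existing word_index_dict.txt for the layout and special tokens the training code actually expects.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List, Sequence


def build_token_index(
    labels: Iterable[str],
    phonemize_fn: Callable[[str], List[str]],
    reserved: Sequence[str] = (),
) -> Dict[str, int]:
    """Map every distinct symbol to an index, reserved tokens first."""
    symbols = set()
    for label in labels:
        symbols.update(phonemize_fn(label))
    # Sort the remaining symbols so the mapping is reproducible across runs.
    ordered = list(reserved) + sorted(symbols - set(reserved))
    return {symbol: index for index, symbol in enumerate(ordered)}


def to_csv_text(token_index: Dict[str, int]) -> str:
    """Serialize as `"symbol",index` rows (assumed layout, verify before use)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for symbol, index in token_index.items():
        writer.writerow([symbol, index])
    return buffer.getvalue()


def char_phonemizer(text: str) -> List[str]:
    """Placeholder: splits into characters. Swap in a real phonemizer here."""
    return list(text.lower())


labels = ["ni hao", "hao ma"]  # made-up labels
token_index = build_token_index(labels, char_phonemizer, reserved=("<pad>", "<unk>"))
n_token = len(token_index)  # candidate value for n_token in config.yml
print(n_token)
print(to_csv_text(token_index))
```

The same phonemizer function should be used both when building the vocabulary and inside meldataset.py, otherwise training labels may contain symbols that have no index.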

Acknowledgement

The author would like to thank @tosaka-m for his great repository and valuable discussions.

About

Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)
