AuxiliaryASR

This repo contains the training code for the phoneme-level ASR model used for voice conversion (VC) in StarGANv2-VC and for text-mel alignment in TTS with StyleTTS.

Pre-requisites

Python >= 3.7
Clone this repository:

git clone https://github.com/yl4579/AuxiliaryASR.git
cd AuxiliaryASR

Install python requirements:

pip install SoundFile torchaudio torch jiwer pyyaml click matplotlib g2p_en librosa

Prepare your own dataset and put the train_list.txt and val_list.txt in the Data folder (see Training section for more details).

Training

python train.py --config_path ./Configs/config.yml

Specify the training and validation data in the config.yml file. Each line of a data list must have the format filename.wav|label|speaker_number; see train_list.txt for an example (a subset of LJSpeech). speaker_number can simply be 0 for ASR, but it is useful to set a meaningful number for TTS training (if you need to use this repo for StyleTTS).
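
Before starting a long training run, it can help to check that every line of a data list follows the filename.wav|label|speaker_number format. The following is a minimal sketch of such a check, not part of this repo: the names Utterance, parse_list_line and parse_list are hypothetical, and the sample lines are made up for illustration.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    """One entry of a data list (hypothetical helper type, not from this repo)."""
    wav_path: str
    label: str
    speaker: int


def parse_list_line(line: str) -> Utterance:
    """Split one 'filename.wav|label|speaker_number' line into its three fields."""
    parts = line.rstrip("\n").split("|")
    if len(parts) != 3:
        raise ValueError(
            f"expected 3 '|'-separated fields, got {len(parts)}: {line!r}"
        )
    wav_path, label, speaker = parts
    # int() raises ValueError if speaker_number is not an integer.
    return Utterance(wav_path, label, int(speaker))


def parse_list(text: str) -> List[Utterance]:
    """Parse the full contents of a data list, skipping blank lines."""
    return [parse_list_line(line) for line in text.splitlines() if line.strip()]


# Made-up sample in the documented format.
sample = "a.wav|hello world|0\nb.wav|good morning|3\n"
entries = parse_list(sample)
```

To check a real list, read the file and pass its contents to parse_list, for example `parse_list(open("Data/train_list.txt", encoding="utf-8").read())`. Note that this sketch assumes the label itself contains no `|` character.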

Checkpoints and TensorBoard logs will be saved at log_dir. To speed up training, set batch_size as large as your GPU memory allows. Note that batch_size = 64 takes around 10 GB of GPU memory.

Languages

This repo is set up for English with the g2p_en package, but it can be trained on other languages. To train on a dataset in another language, modify meldataset.py (L86-93) to use your own phonemizer, update the vocabulary file (word_index_dict.txt), and set n_token in config.yml to the number of tokens in that vocabulary. The phonemizer package is a recommended phonemizer for other languages.
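
When switching languages, n_token has to match the size of the new vocabulary. The following is a rough sketch of collecting the distinct tokens produced by a phonemizer and counting them. It is an illustration under assumptions: build_token_index and the special tokens "<pad>" and "<unk>" are hypothetical, and the on-disk layout of the vocabulary file is not shown here, so check how text_utils.py reads it before writing your own.

```python
from typing import Dict, Iterable, List, Sequence


def build_token_index(
    phoneme_sequences: Iterable[List[str]],
    specials: Sequence[str] = ("<pad>", "<unk>"),  # hypothetical reserved tokens
) -> Dict[str, int]:
    """Assign a unique integer index to every distinct token, in first-seen order."""
    index = {tok: i for i, tok in enumerate(specials)}
    for seq in phoneme_sequences:
        for tok in seq:
            if tok not in index:
                index[tok] = len(index)
    return index


# Made-up phonemizer output for two utterances.
sequences = [["n", "i", "h", "a", "o"], ["h", "a", "o"]]
token_index = build_token_index(sequences)

# This count is the value n_token would need to reflect.
n_token = len(token_index)
```

In practice the sequences would come from running your phonemizer over every label in the training and validation lists, so that no token seen during training is missing from the vocabulary.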

Acknowledgement

The author would like to thank @tosaka-m for his great repository and valuable discussions.
