# UnitY2 forced alignment extractor

Please refer to Section 3.3.2 of the "Seamless: Multilingual Expressive and Streaming Speech Translation" paper for more details on aligner design and training.

We provide a lightweight wrapper to extract alignments between a given text and acoustic unit sequence. The unit extractor is also available from the wrapper itself.

## Alignment extractor codebase

The entire codebase is located in /src/seamless_communication/models/aligner and is built using the fairseq2 library. We release a multilingual checkpoint (38 languages, following the SeamlessM4T v2 target languages) for the alignment toolkit. This checkpoint corresponds to the `nar_t2u_aligner` asset card.

## Usage examples

For large-scale alignment extraction, offline unit extraction is preferred. Refer to /src/seamless_communication/cli/m4t/audio_to_units for more details on offline unit extraction.

Alignment extractor initialization:

```python
from seamless_communication.models.aligner.alignment_extractor import AlignmentExtractor
from fairseq2.typing import Device
import torch

extractor = AlignmentExtractor(
    aligner_model_name_or_card="nar_t2u_aligner",
    unit_extractor_model_name_or_card="xlsr2_1b_v2",
    unit_extractor_output_layer=35,
    unit_extractor_kmeans_model_uri="https://dl.fbaipublicfiles.com/seamlessM4T/models/unit_extraction/kmeans_10k.npy",
)
```
- The large unit extractor checkpoint will be downloaded; this takes some time.
- By default the CPU device is used, but fp16 (`dtype=torch.float16`) and CUDA (`device=Device("cuda")`) are supported; see the class constructor for details and the sketch below.
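
For example, a CUDA + fp16 initialization might look like the following sketch (it assumes the constructor accepts the `device` and `dtype` keyword arguments referred to above):

```python
from seamless_communication.models.aligner.alignment_extractor import AlignmentExtractor
from fairseq2.typing import Device
import torch

# Sketch: same asset cards as above, but run the models on GPU in fp16.
extractor_gpu = AlignmentExtractor(
    aligner_model_name_or_card="nar_t2u_aligner",
    unit_extractor_model_name_or_card="xlsr2_1b_v2",
    unit_extractor_output_layer=35,
    unit_extractor_kmeans_model_uri="https://dl.fbaipublicfiles.com/seamlessM4T/models/unit_extraction/kmeans_10k.npy",
    device=Device("cuda"),
    dtype=torch.float16,
)
```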

### Extracting alignment

Ru audio example:

- audio link: https://models.silero.ai/denoise_models/sample0.wav (thanks to the Silero team for the public audio samples)

- ru_transcription: первое что меня поразило это необыкновенно яркий солнечный свет похожий на электросварку (English: "the first thing that struck me was the extraordinarily bright sunlight, like electric welding")

```python
ru_transcription = "первое что меня поразило это необыкновенно яркий солнечный свет похожий на электросварку"

alignment_durations, _, tokenized_text_tokens = extractor.extract_alignment(
    "sample0.wav", ru_transcription, plot=True, add_trailing_silence=True
)
```
- The audio will be resampled to 16 kHz for unit extraction.
- `alignment_durations` contains the number of units (20 ms frames) aligned to each token from `tokenized_text_tokens` (see the sketch after this list).
- `add_trailing_silence` appends an extra silence token at the end of the given text sequence. This is useful when the text has no terminal punctuation.
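
Once extracted, the durations can be paired with the tokens and converted to approximate per-token start times and durations in seconds (20 ms per unit frame). A minimal sketch, assuming `alignment_durations` has one entry per token and supports `.tolist()` (drop that call if it is already a plain list):

```python
# Pair each text token with its duration in unit frames / seconds
# and print a rough per-token timeline (20 ms per unit frame).
start_s = 0.0
for token, n_frames in zip(tokenized_text_tokens, alignment_durations.tolist()):
    dur_s = n_frames * 0.02
    print(f"{token}\tstart={start_s:.2f}s\tdur={dur_s:.2f}s")
    start_s += dur_s
```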

Ru alignment plot: (alignment plot image)

En audio example:

- audio link: https://dl.fbaipublicfiles.com/seamlessM4T/LJ037-0171_sr16k.wav

- en_transcription: the examination and testimony of the experts enabled the commission to conclude that five shots may have been fired.

```python
en_transcription = "the examination and testimony of the experts enabled the commission to conclude that five shots may have been fired."

alignment_durations, _, tokenized_text_tokens = extractor.extract_alignment(
    "LJ037-0171_sr16k.wav", en_transcription, plot=True, add_trailing_silence=False
)
```
- Here we set `add_trailing_silence` to False since terminal punctuation is present, but True would also work.

En alignment plot: (alignment plot image)

## Integration test

If you encounter issues with the produced alignments, please run the integration test for the alignment extraction toolkit to make sure that your environment works correctly.

Run from the repo root:

```bash
pytest -vv tests/integration/models/test_unity2_aligner.py
```
