

Evaluating SeamlessStreaming and Seamless models

SeamlessStreaming is the streaming-only model, and Seamless is the expressive streaming model.

Quick start:

Evaluation can be run with the streaming_evaluate CLI.

We use seamless_streaming_unity for loading the speech encoder and T2U models, and seamless_streaming_monotonic_decoder for loading the text decoder for streaming evaluation. These are already the defaults for the streaming_evaluate CLI, but they can be overridden with the --unity-model-name and --monotonic-decoder-model-name args if required.

Note that the numbers in our paper were obtained with single-precision floating point (fp32) evaluation, set with --dtype fp32. Also note that the results from running these evaluations might differ slightly from those reported in our paper (which will be updated soon with the new results).
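As a minimal sketch, the command below spells out the default model names and the fp32 setting explicitly. The flags and model names are the ones mentioned above; the paths and the target language are placeholder values (assumptions) that you should replace with your own. The helper function name is hypothetical, and it only prints the command (dry run), so the snippet can be checked without the CLI installed.

```shell
# Placeholder values (assumptions): replace with your own paths and language.
DATA_FILE="data/test.tsv"
AUDIO_ROOT="data/audio"
OUTPUT_DIR="eval_out"
TGT_LANG="spa"

# Hypothetical helper: prints the evaluation command instead of running it.
s2tt_fp32_cmd() {
  echo streaming_evaluate \
    --task s2tt \
    --data-file "$DATA_FILE" \
    --audio-root-dir "$AUDIO_ROOT" \
    --output "$OUTPUT_DIR" \
    --tgt-lang "$TGT_LANG" \
    --unity-model-name seamless_streaming_unity \
    --monotonic-decoder-model-name seamless_streaming_monotonic_decoder \
    --dtype fp32
}

s2tt_fp32_cmd
```

To actually run the evaluation, drop the `echo` inside the function.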

S2TT:

Set the task to s2tt for evaluating the speech-to-text translation part of the SeamlessStreaming model.

streaming_evaluate --task s2tt --data-file <path_to_data_tsv_file> --audio-root-dir <path_to_audio_root_directory> --output <path_to_evaluation_output_directory> --tgt-lang <3_letter_lang_code>

Note: The --ref-field arg can be used to specify the name of the reference column in the dataset.
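For example, if the references live in a differently named column, the S2TT command might look like the sketch below. The column name `tgt_text`, the paths, and the language code are hypothetical placeholders, not values taken from this project. The helper function name is also hypothetical, and it only prints the command (dry run).

```shell
# Hypothetical reference column name (assumption): use the header of the
# column that holds the reference translations in your own TSV file.
REF_FIELD="tgt_text"

# Hypothetical helper: prints the S2TT command with an explicit --ref-field.
s2tt_ref_field_cmd() {
  echo streaming_evaluate \
    --task s2tt \
    --data-file "data/test.tsv" \
    --audio-root-dir "data/audio" \
    --output "eval_out/s2tt" \
    --tgt-lang "spa" \
    --ref-field "$REF_FIELD"
}

s2tt_ref_field_cmd
```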

ASR:

Set the task to asr for evaluating the automatic speech recognition part of the SeamlessStreaming model. Make sure to pass the source language as the --tgt-lang arg.

streaming_evaluate --task asr --data-file <path_to_data_tsv_file> --audio-root-dir <path_to_audio_root_directory> --output <path_to_evaluation_output_directory> --tgt-lang <3_letter_source_lang_code> 
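Because the source language goes into --tgt-lang for ASR, it is easy to get wrong when evaluating several languages. The loop below is a small sketch of one way to organize such a run. The language list, the one-directory-per-language layout, and the helper function name are all assumptions for illustration; the commands are only printed (dry run).

```shell
# Hypothetical list of source languages (assumption).
SRC_LANGS="eng fra deu"

# Hypothetical helper: prints the ASR command for one source language.
# Assumed layout: data/<lang>/test.tsv and data/<lang>/audio.
asr_cmd() {
  lang="$1"
  echo streaming_evaluate \
    --task asr \
    --data-file "data/${lang}/test.tsv" \
    --audio-root-dir "data/${lang}/audio" \
    --output "eval_out/asr_${lang}" \
    --tgt-lang "$lang"  # for ASR, this is the source language
}

for l in $SRC_LANGS; do
  asr_cmd "$l"
done
```

Writing each language to its own output directory keeps the results of different runs from overwriting each other.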

S2ST:

SeamlessStreaming:

Set the task to s2st for evaluating the speech-to-speech translation part of the SeamlessStreaming model.

streaming_evaluate --task s2st --data-file <path_to_data_tsv_file> --audio-root-dir <path_to_audio_root_directory> --output <path_to_evaluation_output_directory> --tgt-lang <3_letter_lang_code>

Seamless:

The Seamless model is a unified model for streaming expressive speech-to-speech translation. Use the --expressive arg for running evaluation of this unified model.

streaming_evaluate --task s2st --data-file <path_to_data_tsv_file> --audio-root-dir <path_to_audio_root_directory> --output <path_to_evaluation_output_directory> --tgt-lang <3_letter_lang_code> --expressive --gated-model-dir <path_to_vocoder_checkpoints_dir>

By default, the Seamless model uses the 24 kHz vocoder (vocoder_pretssel). The current version of our paper uses the 16 kHz version (vocoder_pretssel_16khz) for evaluation, so to reproduce those results, please add this arg to the above command: --vocoder-name vocoder_pretssel_16khz.

The vocoder_pretssel and vocoder_pretssel_16khz checkpoints are gated; please check out this section to acquire them. Also, make sure to pass --gated-model-dir <path_to_vocoder_checkpoints_dir>.
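Putting these pieces together, an expressive evaluation with the 16 kHz vocoder might be invoked as in the sketch below. The flags and the vocoder name are the ones described above; the paths, the language code, and the helper function name are placeholders (assumptions). The command is only printed (dry run).

```shell
# Placeholder (assumption): directory holding the gated vocoder checkpoints.
GATED_MODEL_DIR="checkpoints/vocoder"

# Hypothetical helper: prints the expressive S2ST command using the
# 16 kHz vocoder (vocoder_pretssel_16khz).
expressive_16khz_cmd() {
  echo streaming_evaluate \
    --task s2st \
    --data-file "data/test.tsv" \
    --audio-root-dir "data/audio" \
    --output "eval_out/expressive" \
    --tgt-lang "spa" \
    --expressive \
    --gated-model-dir "$GATED_MODEL_DIR" \
    --vocoder-name vocoder_pretssel_16khz
}

expressive_16khz_cmd
```

Omitting --vocoder-name falls back to the default 24 kHz vocoder_pretssel.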
