SeamlessStreaming is the streaming-only model, and Seamless is the expressive streaming model.
Evaluation can be run with the streaming_evaluate CLI.
We use seamless_streaming_unity to load the speech encoder and T2U models, and seamless_streaming_monotonic_decoder to load the text decoder for streaming evaluation. These are already the defaults for the streaming_evaluate CLI, but they can be overridden with the --unity-model-name and --monotonic-decoder-model-name args if required.
Note that the numbers in our paper were evaluated in single-precision floating point format (fp32), set with --dtype fp32. Also note that the results from running these evaluations might differ slightly from the results reported in our paper (which will be updated soon with the new results).
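As a rough sketch of how these options fit together, the Python snippet below assembles a streaming_evaluate command line with --dtype fp32 and the two model-name overrides spelled out explicitly. The helper name build_eval_command, the placeholder paths, and the example language code are assumptions made for illustration; only the flags and model names come from the text above. The snippet prints the command and does not run it.

```python
import shlex


def build_eval_command(task, data_file, audio_root_dir, output_dir, tgt_lang,
                       dtype=None, unity_model_name=None,
                       monotonic_decoder_model_name=None):
    """Assemble (but do not run) a streaming_evaluate command line."""
    cmd = [
        "streaming_evaluate",
        "--task", task,
        "--data-file", data_file,
        "--audio-root-dir", audio_root_dir,
        "--output", output_dir,
        "--tgt-lang", tgt_lang,
    ]
    # Optional flags are added only when a value is given, so the CLI
    # defaults described above stay in effect otherwise.
    if dtype is not None:
        cmd += ["--dtype", dtype]
    if unity_model_name is not None:
        cmd += ["--unity-model-name", unity_model_name]
    if monotonic_decoder_model_name is not None:
        cmd += ["--monotonic-decoder-model-name", monotonic_decoder_model_name]
    return cmd


cmd = build_eval_command(
    task="s2tt",
    data_file="data/test.tsv",      # hypothetical path
    audio_root_dir="data/audio",    # hypothetical path
    output_dir="eval_out",          # hypothetical path
    tgt_lang="spa",                 # example 3-letter code
    dtype="fp32",
    unity_model_name="seamless_streaming_unity",
    monotonic_decoder_model_name="seamless_streaming_monotonic_decoder",
)
print(shlex.join(cmd))
```

Leaving the three optional arguments unset yields the plain command, which relies on the CLI defaults.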
Set the task to s2tt for evaluating the speech-to-text translation part of the SeamlessStreaming model.
streaming_evaluate --task s2tt --data-file <path_to_data_tsv_file> --audio-root-dir <path_to_audio_root_directory> --output <path_to_evaluation_output_directory> --tgt-lang <3_letter_lang_code>
Note: The --ref-field arg can be used to specify the name of the reference column in the dataset.
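Before passing --ref-field, it can help to confirm that the column actually exists in the data file. The sketch below assumes the TSV has a header row. The column names in the sample (id, audio, ref_text) and the helper name has_column are hypothetical and are not taken from the streaming_evaluate documentation.

```python
import csv
import io


def has_column(tsv_text, column):
    """Return True if the first (header) row of a TSV contains `column`."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])
    return column in header


# Hypothetical two-line data file; real column names depend on your dataset.
sample = "id\taudio\tref_text\n0\tclip0.wav\thello world\n"

ref_field = "ref_text"
if has_column(sample, ref_field):
    extra_args = ["--ref-field", ref_field]
else:
    extra_args = []
print(extra_args)
```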
Set the task to asr for evaluating the automatic speech recognition part of the SeamlessStreaming model. Make sure to pass the source language as the --tgt-lang arg.
streaming_evaluate --task asr --data-file <path_to_data_tsv_file> --audio-root-dir <path_to_audio_root_directory> --output <path_to_evaluation_output_directory> --tgt-lang <3_letter_source_lang_code>
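Because the asr task reuses --tgt-lang for the source language, a small wrapper can make that choice explicit. This is a sketch only: the function name tgt_lang_arg and the language codes are illustrative assumptions.

```python
def tgt_lang_arg(task, src_lang, tgt_lang=None):
    """Pick the value to pass as --tgt-lang for a given task.

    For asr the source language is passed; for s2tt and s2st the
    target language is passed.
    """
    if task == "asr":
        return src_lang
    if task in ("s2tt", "s2st"):
        if tgt_lang is None:
            raise ValueError(f"task {task!r} needs a target language")
        return tgt_lang
    raise ValueError(f"unknown task: {task!r}")


# Example 3-letter codes, chosen only for illustration.
print(tgt_lang_arg("asr", src_lang="eng"))
print(tgt_lang_arg("s2tt", src_lang="eng", tgt_lang="spa"))
```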
Set the task to s2st for evaluating the speech-to-speech translation part of the SeamlessStreaming model.
streaming_evaluate --task s2st --data-file <path_to_data_tsv_file> --audio-root-dir <path_to_audio_root_directory> --output <path_to_evaluation_output_directory> --tgt-lang <3_letter_lang_code>
The Seamless model is a unified model for streaming expressive speech-to-speech translation. Use the --expressive arg to run evaluation of this unified model.
streaming_evaluate --task s2st --data-file <path_to_data_tsv_file> --audio-root-dir <path_to_audio_root_directory> --output <path_to_evaluation_output_directory> --tgt-lang <3_letter_lang_code> --expressive --gated-model-dir <path_to_vocoder_checkpoints_dir>
By default, the Seamless model uses vocoder_pretssel, which is the 24 kHz version. In the current version of our paper, we use the 16 kHz version (vocoder_pretssel_16khz) for evaluation, so to reproduce those results, add this arg to the above command: --vocoder-name vocoder_pretssel_16khz.
The vocoder_pretssel and vocoder_pretssel_16khz checkpoints are gated; please check out this section to acquire these checkpoints. Also, make sure to add --gated-model-dir <path_to_vocoder_checkpoints_dir> to the command.
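The expressive-specific arguments can be collected in one place, with the 16 kHz vocoder as an opt-in. The following is a minimal sketch: the helper name expressive_args and the checkpoint directory are hypothetical, while the flags and vocoder names are the ones given above.

```python
def expressive_args(gated_model_dir, use_16khz_vocoder=False):
    """Extra streaming_evaluate args for the expressive Seamless model."""
    args = ["--expressive", "--gated-model-dir", gated_model_dir]
    if use_16khz_vocoder:
        # Selects the 16 kHz vocoder that the text says was used for the
        # paper's evaluation; omit it to keep the 24 kHz default.
        args += ["--vocoder-name", "vocoder_pretssel_16khz"]
    return args


# Hypothetical checkpoint directory.
print(" ".join(expressive_args("checkpoints/vocoder", use_16khz_vocoder=True)))
```

These arguments would be appended to an s2st command such as the one shown earlier for the Seamless model.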