Inference with SeamlessM4T models

Refer to the SeamlessM4T README for an overview of the M4T models.

Inference is run with the CLI, from the root directory of the repository.

The model can be specified with --model_name seamlessM4T_v2_large, seamlessM4T_large or seamlessM4T_medium:

S2ST:

m4t_predict <path_to_input_audio> --task s2st --tgt_lang <tgt_lang> --output_path <path_to_save_audio> --model_name seamlessM4T_v2_large

S2TT:

m4t_predict <path_to_input_audio> --task s2tt --tgt_lang <tgt_lang> --model_name seamlessM4T_v2_large

T2TT:

m4t_predict <input_text> --task t2tt --tgt_lang <tgt_lang> --src_lang <src_lang> --model_name seamlessM4T_v2_large

T2ST:

m4t_predict <input_text> --task t2st --tgt_lang <tgt_lang> --src_lang <src_lang> --output_path <path_to_save_audio> --model_name seamlessM4T_v2_large

ASR:

m4t_predict <path_to_input_audio> --task asr --tgt_lang <tgt_lang> --model_name seamlessM4T_v2_large
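
For example, a concrete S2ST invocation might look like the following (the file names input.wav and out.wav and the target language fra are illustrative, not part of the original instructions):

m4t_predict input.wav --task s2st --tgt_lang fra --output_path out.wav --model_name seamlessM4T_v2_large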

Inference breakdown

Inference uses a Translator object instantiated with a multitask UnitY or UnitY2 model, one of:

  • seamlessM4T_v2_large
  • seamlessM4T_large
  • seamlessM4T_medium

and a vocoder:

  • vocoder_v2 for seamlessM4T_v2_large.
  • vocoder_36langs for seamlessM4T_large or seamlessM4T_medium.

import torch
from seamless_communication.inference import Translator


# Initialize a Translator object with a multitask model and vocoder on the GPU.
translator = Translator("seamlessM4T_large", "vocoder_36langs", torch.device("cuda:0"), torch.float16)
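
Pairing the v2 model with its matching vocoder follows the same pattern (same arguments, different names, per the list above):

# Initialize with the v2 model and its matching vocoder.
translator = Translator("seamlessM4T_v2_large", "vocoder_v2", torch.device("cuda:0"), torch.float16)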

Now predict() can be used to run inference as many times as needed, on any of the supported tasks.

Given input audio at <path_to_input_audio>, or input text <input_text> in <src_lang>, we first set text_generation_opts and unit_generation_opts, and then translate into <tgt_lang> as follows:
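
The snippets below assume these generation options have already been constructed. A minimal sketch, assuming SequenceGeneratorOptions is exposed by the seamless_communication.inference package:

from seamless_communication.inference import SequenceGeneratorOptions

# Beam-search options for the text decoder and the unit decoder;
# defaults are used here, tune their fields as needed.
text_generation_opts = SequenceGeneratorOptions()
unit_generation_opts = SequenceGeneratorOptions()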

S2ST and T2ST (speech output):

# S2ST
text_output, speech_output = translator.predict(
    input=<path_to_input_audio>,
    task_str="S2ST",
    tgt_lang=<tgt_lang>,
    text_generation_opts=text_generation_opts,
    unit_generation_opts=unit_generation_opts
)

# T2ST
text_output, speech_output = translator.predict(
    input=<input_text>,
    task_str="T2ST",
    tgt_lang=<tgt_lang>,
    src_lang=<src_lang>,
    text_generation_opts=text_generation_opts,
    unit_generation_opts=unit_generation_opts
)

Note that <src_lang> must be specified for T2ST.

The generated units are synthesized and the output audio file is saved with:

# Save the translated audio output:
import torchaudio
torchaudio.save(
    <path_to_save_audio>,
    speech_output.audio_wavs[0][0].cpu(),
    sample_rate=speech_output.sample_rate,
)

S2TT, T2TT and ASR (text output):

# S2TT
text_output, _ = translator.predict(
    input=<path_to_input_audio>,
    task_str="S2TT",
    tgt_lang=<tgt_lang>,
    text_generation_opts=text_generation_opts,
    unit_generation_opts=None
)

# ASR
# This is equivalent to S2TT with `<tgt_lang>=<src_lang>`.
text_output, _ = translator.predict(
    input=<path_to_input_audio>,
    task_str="ASR",
    tgt_lang=<src_lang>,
    text_generation_opts=text_generation_opts,
    unit_generation_opts=None
)

# T2TT
text_output, _ = translator.predict(
    input=<input_text>,
    task_str="T2TT",
    tgt_lang=<tgt_lang>,
    src_lang=<src_lang>,
    text_generation_opts=text_generation_opts,
    unit_generation_opts=None
)

Note that <src_lang> must be specified for T2TT.
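
The translated (or transcribed) text can then be printed. A short sketch, assuming text_output is a batch-shaped list mirroring audio_wavs above:

# Print the first (and here only) text output in the batch.
print(str(text_output[0]))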
