Apart from the SeamlessM4T-LARGE (2.3B) and SeamlessM4T-MEDIUM (1.2B) models, we are also developing a small model (281M) targeting on-device inference. This folder contains an example of running an exported small model covering most tasks (ASR/S2TT/S2ST). The model can be executed on popular mobile devices with PyTorch Mobile (https://pytorch.org/mobile/home/).
[2023/8/23] Uploaded new on-device models with several fixes to reduce model size and avoid OOM. Metrics should be close to those reported below; we will rerun the evaluation and update the numbers.
Model | Checkpoint | Num Params | Disk Size | Supported Tasks | Supported Languages |
---|---|---|---|---|---|
UnitY-Small | 🤗 Model card - checkpoint | 281M | 747MB | S2ST, S2TT, ASR | eng, fra, hin, por, spa |
UnitY-Small-S2T | 🤗 Model card - checkpoint | 235M | 481MB | S2TT, ASR | eng, fra, hin, por, spa |
UnitY-Small-S2T is a pruned version of UnitY-Small without the 2nd-pass unit decoding.
Using the exported models requires neither the seamless_communication nor the fairseq2 dependency.
```python
import torch
import torchaudio

# Load waveform using torchaudio
audio_input, _ = torchaudio.load(TEST_AUDIO_PATH)

# Load exported S2T model
s2t_model = torch.jit.load("unity_on_device_s2t.ptl")
with torch.no_grad():
    # Forward call with tgt_lang specified for ASR or S2TT
    text = s2t_model(audio_input, tgt_lang=TGT_LANG)
print(text)  # Show text output

# Load exported S2ST model
s2st_model = torch.jit.load("unity_on_device.ptl")
with torch.no_grad():
    # S2ST model also returns discrete units and a waveform
    text, units, waveform = s2st_model(audio_input, tgt_lang=TGT_LANG)
print(text)
# Save output waveform to a local file
torchaudio.save(f"{OUTPUT_FOLDER}/result.wav", waveform.unsqueeze(0), sample_rate=16000)
```
Running the exported model also does not require a Python runtime. For example, you could load the model in C++ following this tutorial, or build your own on-device application similar to this example.
For ASR-BLEU, we follow the same protocol as the SeamlessM4T Large/Medium models: Whisper-large-v2 is used to transcribe Eng-X output and Whisper-medium to transcribe X-Eng output when computing ASR-BLEU.
Direction | 1st-pass BLEU (S2TT) | 2nd-pass ASR-BLEU (S2ST) |
---|---|---|
eng-hin | 10.43 | 15.06 |
eng-por | 21.54 | 17.35 |
eng-rus | 7.88 | 5.11 |
eng-spa | 12.78 | 11.75 |
hin-eng | 12.92 | 10.50 |
por-eng | 22.99 | 24.81 |
rus-eng | 18.24 | 18.24 |
spa-eng | 14.37 | 14.85 |
ASR word error rate (WER) per language:

Language | WER |
---|---|
eng | 27.3 |
hin | 41.5 |
por | 25.2 |
rus | 33.0 |
spa | 18.0 |
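For reference, WER is the word-level edit distance between the hypothesis and the reference, divided by the reference length. A minimal self-contained sketch of the metric (a simple Levenshtein-based implementation, not the exact evaluation script used for the numbers above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return 100.0 * dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, one substitution in a three-word reference yields a WER of 33.3.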