Apart from the SeamlessM4T-LARGE (2.3B) and SeamlessM4T-MEDIUM (1.2B) models, we are also developing a small model (281M) targeting on-device inference. This folder contains an example of running an exported small model covering most tasks (ASR/S2TT/S2ST). The model can be executed on popular mobile devices with PyTorch Mobile (https://pytorch.org/mobile/home/).
[2023/8/23] Uploaded new on-device models with several fixes to reduce model size and avoid OOM. Metrics should be close to those reported below; we will rerun the evaluation and update the numbers.
Model | Checkpoint | Num Params | Disk Size | Supported Tasks | Supported Languages |
---|---|---|---|---|---|
UnitY-Small | 🤗 Model card - checkpoint | 281M | 747MB | S2ST, S2TT, ASR | eng, fra, hin, por, spa |
UnitY-Small-S2T | 🤗 Model card - checkpoint | 235M | 481MB | S2TT, ASR | eng, fra, hin, por, spa |
UnitY-Small-S2T is a pruned version of UnitY-Small without the 2nd-pass unit decoding.
Using the exported models requires neither the seamless_communication nor the fairseq2 dependency.
```python
import torch
import torchaudio

# Load waveform using torchaudio
audio_input, _ = torchaudio.load(TEST_AUDIO_PATH)

# Load exported S2T model
s2t_model = torch.jit.load("unity_on_device_s2t.ptl")
with torch.no_grad():
    # Forward call with tgt_lang specified for ASR or S2TT
    text = s2t_model(audio_input, tgt_lang=TGT_LANG)
print(text)  # Show text output

# Load exported S2ST model
s2st_model = torch.jit.load("unity_on_device.ptl")
with torch.no_grad():
    # S2ST model also returns discrete units and a waveform
    text, units, waveform = s2st_model(audio_input, tgt_lang=TGT_LANG)
print(text)
# Save output waveform to a local file
torchaudio.save(f"{OUTPUT_FOLDER}/result.wav", waveform.unsqueeze(0), sample_rate=16000)
```
Running the exported model also does not require a Python runtime. For example, you could load the model in C++ following this tutorial, or build your own on-device application similar to this example.
For ASR-BLEU, we follow the same protocol as the SeamlessM4T Large/Medium models: Whisper-large-v2 is used to transcribe Eng-X output and Whisper-medium to transcribe X-Eng output when computing ASR-BLEU.
Direction | 1st-pass BLEU (S2TT) | 2nd-pass ASR-BLEU (S2ST) |
---|---|---|
eng-hin | 10.43 | 15.06 |
eng-por | 21.54 | 17.35 |
eng-rus | 7.88 | 5.11 |
eng-spa | 12.78 | 11.75 |
hin-eng | 12.92 | 10.50 |
por-eng | 22.99 | 24.81 |
rus-eng | 18.24 | 18.24 |
spa-eng | 14.37 | 14.85 |
ASR word error rate (WER) per language:

Language | WER |
---|---|
eng | 27.3 |
hin | 41.5 |
por | 25.2 |
rus | 33.0 |
spa | 18.0 |
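For reference, WER is the word-level edit distance between the hypothesis and the reference, divided by the reference length. A minimal self-contained sketch of the metric (a simple Levenshtein-based implementation, not the exact evaluation script used for the numbers above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return 100.0 * dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, one substitution in a three-word reference yields a WER of 33.3.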