Farewell to learning languages? Meta introduces SeamlessM4T, an instant translator that lets you speak in 101 languages.

We live in a time when technology seems to open doors straight out of a sci-fi movie almost every week. The latest from Meta, the company behind Instagram, Facebook, and WhatsApp, promises to let us communicate with anyone, regardless of the language they speak.
Called SeamlessM4T, it’s an AI model Meta is pitching as the first multimodal, multilingual system capable of translating and transcribing both text and speech in more than 100 languages. Are we really any closer to the universal translator Douglas Adams imagined with the Babel fish in The Hitchhiker’s Guide to the Galaxy?
The technology promises to ward off the curse that hangs over multilingual communication. According to Nature, the model allows instant translation from speech to speech or from text to speech, and vice versa, mimicking the expression and tone of the speakers.
SeamlessM4T (Massively Multilingual and Multimodal Machine Translation) outperforms traditional cascaded translation systems by integrating everything into a single unified model, improving accuracy by 8% to 23%. It is also significantly more robust to background noise and variation between speakers, with a 50% improvement in its ability to cope with these challenges.
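To make the contrast with cascaded pipelines concrete, here is a minimal sketch of how the openly released model can be driven from the Hugging Face transformers library, going from a recorded utterance to translated text in a single call. The checkpoint name, function arguments, and audio file below are assumptions based on that library’s public documentation, not details from the article or the Nature paper.

```python
# Minimal sketch (assumed API): direct speech-to-text translation with the
# SeamlessM4T v2 checkpoint published in Hugging Face transformers.
import torchaudio
from transformers import AutoProcessor, SeamlessM4Tv2Model

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# Load a recording (hypothetical file) and resample to the 16 kHz the model expects.
waveform, sample_rate = torchaudio.load("speech_in_spanish.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

# One call does the whole job: the unified model maps source speech directly
# to English text, with no separate speech-recognition and translation stages.
inputs = processor(audios=waveform.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt")
tokens = model.generate(**inputs, tgt_lang="eng", generate_speech=False)
print(processor.decode(tokens[0].tolist()[0], skip_special_tokens=True))
```

Dropping the `generate_speech=False` flag makes the same call return audio in the target language instead, which is what gives the model the speech-to-speech mode the article describes.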
The project, led by researcher Marta Costa-jussà of Meta's Fundamental AI Research (FAIR) division, trained the model on a million hours of openly available speech audio, allowing it to translate even combinations of languages not explicitly included in its training.
Meta has decided to make the model and its data publicly available for non-commercial use in order to promote research and development in the field of speech translation.
Despite its progress, SeamlessM4T still faces significant challenges. In high-stakes settings such as medicine and law, where accuracy is paramount, the translation of proper nouns, colloquial expressions, gender bias, and accent recognition all need to improve. Even so, the technology represents a crucial step toward more seamless global communication and reinforces Meta’s position in personal communications.
To train the model, the team collected millions of hours of speech audio, together with human translations and transcripts of those speeches, from the internet and other sources such as the United Nations archives.
In Nature, Tanel Alumäe, of the Language Technology Laboratory at Tallinn University of Technology (Estonia), highlights the system’s impressive ability to translate speech in real time, thanks to the 4.5 million hours of multilingual audio used to train it. “This approach allows the model to learn patterns in the data, making it easier to adapt to specific tasks without the need for large amounts of dedicated training data,” he explains.