This is the resources page for the papers that I will discuss on May 8, 2025 at the Conversational AI Reading Group at Mila.
Voicebox is a non-autoregressive generative speech model based on flow matching, trained to perform speech infilling given audio context and the corresponding text. The model can be used for zero-shot text-to-speech synthesis, noise removal, content editing, style conversion, and diverse sample generation. In this talk, we will first review the Voicebox model. We will then focus on its synthetic speech generation capability and present several use cases for these synthetic signals in applications including automatic speech recognition and spoken language understanding. Drawing on a few early studies that use Voicebox-generated speech, we will discuss the cost-saving benefits of the approach for speech data collection, as well as the potential shortcomings of using synthetic speech in these applications.
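As background for the talk, the conditional flow-matching objective (Lipman et al.) that Voicebox builds on can be sketched in a few lines. This is a toy NumPy illustration, not the Voicebox implementation: the data here are 2-D points rather than speech features, and the vector field `v` is a placeholder where the real model uses a Transformer conditioned on text and audio context.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "data": a batch of 2-D points; Voicebox would use speech feature frames.
batch, dim = 64, 2
x1 = rng.normal(loc=3.0, scale=0.5, size=(batch, dim))  # data samples
x0 = rng.normal(size=(batch, dim))                      # noise samples
t = rng.uniform(size=(batch, 1))                        # per-sample times in [0, 1]

# Linear (optimal-transport) probability path: x_t = (1 - t) * x0 + t * x1.
# Its conditional target velocity is u_t = x1 - x0, independent of t.
xt = (1.0 - t) * x0 + t * x1
target = x1 - x0

# Placeholder vector field v(x_t, t); a real model is a neural network
# trained by gradient descent on the loss below.
def v(xt, t):
    return np.zeros_like(xt)

# Conditional flow-matching loss: E_t,x0,x1 || v(x_t, t) - (x1 - x0) ||^2
loss = np.mean(np.sum((v(xt, t) - target) ** 2, axis=-1))
print(f"flow-matching loss: {loss:.3f}")
```

At inference, one samples noise and integrates the learned vector field from t = 0 to t = 1 with an ODE solver to obtain a data sample.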
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale, Le et al., paper
Audiobox: Unified Audio Generation with Natural Language Prompts, Vyas et al., paper on arXiv
Towards Selection of Text-to-speech Data to Augment ASR Training, Liu et al., paper on arXiv
Using Voicebox-based Synthetic Speech for ASR Adaptation, Dhamyal et al., paper on ISCA Archive
Improving Spoken Semantic Parsing using Synthetic Data from Large Generative Models, Sharma et al., paper on ISCA Archive
Flow Matching for Generative Modeling, Lipman et al., paper on arXiv
Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition, Kim et al., paper on arXiv