Founding Research Scientist - Speech Synthesis
Listed on 2026-01-30
IT/Tech
AI Engineer, Machine Learning/ ML Engineer, Data Scientist, Artificial Intelligence
About Nuance Labs
Nuance Labs is an early-stage deep tech startup building the first real-time human foundation model: a unified system across text, speech, and vision designed to make AI socially and emotionally intelligent.
We’re working toward AI that understands subtle human signals—a shift in tone, a hesitant pause, a quirked eyebrow—and responds in a way that feels genuinely human. This is foundational work at the intersection of speech, multimodal learning, and real-time systems.
We’re backed by a $10M seed round from Accel, South Park Commons, Lightspeed, and top angels, and our team includes world-class researchers from MIT, UW, and Oxford with decades of experience at Apple and Meta, shipping ultra-low-latency ML systems used by millions.
Why This Role Exists
Speech is at the core of human interaction, and it’s the backbone of truly human AI. While today’s voice systems have made progress on prosody and naturalness, real-time, emotionally grounded, multimodal speech generation remains unsolved.
This role exists to own and push the frontier of speech synthesis inside a broader human foundation model. As a Founding Research Scientist, you’ll help define how speech models are trained, evaluated, and integrated into a real-time system that unifies voice, language, and expression.
This is a blank-page role with real agency. You’ll help decide what problems matter, how we approach them, and how research turns into systems that actually work in the world.
What You’ll Be Building
You’ll help create the first human foundation model that operates across text, speech, facial expression, and body language in real time.
Your work will contribute to systems that:
Understand fine-grained human signals, from vocal nuance to subtle changes in expression
Generate lifelike, responsive speech that adapts frame-by-frame to context and emotion
Power real-time avatars whose voice, tone, and expression evolve naturally in interaction
This is a rare opportunity to shape foundational technology in a space where the boundaries are still being defined.
What You’ll Own
You’ll operate as a founding-level researcher with end-to-end ownership over speech synthesis research and its path to production.
You will:
Design, train, and evaluate state-of-the-art speech synthesis and audio generation models
Own the full ML pipeline, from data wrangling and rapid prototyping to large-scale training and benchmarking
Push research breakthroughs into practical, real-time systems
Explore new architectures and training strategies for expressive, low-latency speech generation
Write clean, maintainable research code that supports fast iteration
Collaborate closely with researchers across vision, language, and multimodal modeling
You’re someone who loves frontier research—but you also care deeply about whether things actually work. You’re comfortable with ambiguity, motivated by unsolved problems, and excited to chart your own course.
You likely:
Enjoy blank-page research problems and setting your own technical direction
Move quickly from ideas to experiments to results
Care about both model quality and real-world constraints like latency and stability
Thrive alongside other highly driven, deeply technical collaborators
PhD or equivalent experience in speech synthesis, audio generation, or closely related fields
Deep expertise in training speech or audio models (e.g., TTS, speech-to-speech, neural vocoders)
Strong command of modern deep learning methods and large-scale training workflows
Experience running the full ML lifecycle, from dataset construction through evaluation
Ability to translate research insights into working systems
Strong coding skills and a commitment to clean, maintainable research code
Clear communication and strong collaboration skills
Publications at top ML, speech, or audio conferences
Experience with real-time or low-latency ML systems
Prior work on multimodal models involving speech, vision, or language
Experience shipping ML systems used by real users
Joining Nuance Labs now means shaping the core research direction of a company tackling one of the hardest problems in AI: real-time, emotionally intelligent human interaction.
You’ll have outsized ownership, direct influence on foundational systems, and the chance to work in-person with a world-class team that blends frontier research with product-grade engineering. If you want your research to define a new category—not just incrementally improve an existing one—this role offers that opportunity.