
Founding Research Scientist - Speech Synthesis

Job in Seattle, King County, Washington, 98127, USA
Listing for: Rethink recruit
Full Time position
Listed on 2026-01-30
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning/ML Engineer, Data Scientist, Artificial Intelligence
Job Description

About Nuance Labs

Nuance Labs is an early-stage deep tech startup building the first real-time human foundation model: a unified system across text, speech, and vision designed to make AI socially and emotionally intelligent.

We’re working toward AI that understands subtle human signals—a shift in tone, a hesitant pause, a quirked eyebrow—and responds in a way that feels genuinely human. This is foundational work at the intersection of speech, multimodal learning, and real-time systems.

We’re backed by a $10M seed round from Accel, South Park Commons, Lightspeed, and top angels, and our team includes world-class researchers from MIT, UW, and Oxford with decades of experience at Apple and Meta, shipping ultra-low-latency ML systems used by millions.

Why This Role Exists

Speech is at the core of human interaction—and it’s the backbone of truly human AI. While today’s voice systems have made progress on prosody and naturalness, real-time, emotionally grounded, multimodal speech generation remains unsolved.

This role exists to own and push the frontier of speech synthesis inside a broader human foundation model. As a Founding Research Scientist, you’ll help define how speech models are trained, evaluated, and integrated into a real-time system that unifies voice, language, and expression.

This is a blank-page role with real agency. You’ll help decide what problems matter, how we approach them, and how research turns into systems that actually work in the world.

What You’ll Be Building

You’ll help create the first human foundation model that operates across text, speech, facial expression, and body language in real time.

Your work will contribute to systems that:

  • Understand fine-grained human signals, from vocal nuance to subtle changes in expression

  • Generate lifelike, responsive speech that adapts frame-by-frame to context and emotion

  • Power real-time avatars whose voice, tone, and expression evolve naturally in interaction

This is a rare opportunity to shape foundational technology in a space where the boundaries are still being defined.

What You’ll Own

You’ll operate as a founding-level researcher with end-to-end ownership over speech synthesis research and its path to production.

You will:

  • Design, train, and evaluate state-of-the-art speech synthesis and audio generation models

  • Own the full ML pipeline, from data wrangling and rapid prototyping to large-scale training and benchmarking

  • Push research breakthroughs into practical, real-time systems

  • Explore new architectures and training strategies for expressive, low-latency speech generation

  • Write clean, maintainable research code that supports fast iteration

  • Collaborate closely with researchers across vision, language, and multimodal modeling

Who Will Thrive Here

You’re someone who loves frontier research—but you also care deeply about whether things actually work. You’re comfortable with ambiguity, motivated by unsolved problems, and excited to chart your own course.

You likely:

  • Enjoy blank-page research problems and setting your own technical direction

  • Move quickly from ideas to experiments to results

  • Care about both model quality and real-world constraints like latency and stability

  • Thrive alongside other highly driven, deeply technical collaborators

Requirements

  • PhD or equivalent experience in speech synthesis, audio generation, or closely related fields

  • Deep expertise in training speech or audio models (e.g., TTS, speech-to-speech, neural vocoders)

  • Strong command of modern deep learning methods and large-scale training workflows

  • Experience running the full ML lifecycle, from dataset construction through evaluation

  • Ability to translate research insights into working systems

  • Strong coding skills and a commitment to clean, maintainable research code

  • Clear communication and strong collaboration skills

Nice to Have

  • Publications at top ML, speech, or audio conferences

  • Experience with real-time or low-latency ML systems

  • Prior work on multimodal models involving speech, vision, or language

  • Experience shipping ML systems used by real users

Why Join Now

Joining Nuance Labs now means shaping the core research direction of a company tackling one of the hardest problems in AI: real-time, emotionally intelligent human interaction.

You’ll have outsized ownership, direct influence on foundational systems, and the chance to work in person with a world-class team that blends frontier research with product-grade engineering. If you want your research to define a new category, not just incrementally improve an existing one, this role offers that opportunity.
