Founding Research Scientist - MLLM Training

Job in Seattle, King County, Washington, 98127, USA
Listing for: Rethink recruit
Apprenticeship/Internship position
Listed on 2026-01-30
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning/ ML Engineer, Data Scientist, Artificial Intelligence
Salary/Wage Range or Industry Benchmark: 120000 - 150000 USD Yearly
Job Description & How to Apply Below

About Nuance Labs

Nuance Labs is an early-stage deep tech startup building the first real-time human foundation model, a unified system across text, speech, and vision designed to make AI socially and emotionally intelligent.

We’re working toward AI that can read subtle human signals—a shift in tone, a glance, a pause—and respond in a way that feels natural and grounded in context. This is foundational work at the frontier of multimodal learning and real-time systems.

We’ve raised a $10M seed round backed by Accel, South Park Commons, Lightspeed, and top angels, and our team includes world-class researchers from MIT, UW, and Oxford with decades of experience at Apple and Meta, shipping ultra-low-latency ML systems used by millions.

Why This Role Exists

Large multimodal models are advancing quickly, but real-time, human-centered interaction remains unsolved. Training models that can reason across text, speech, vision, and embodied signals while operating under tight latency constraints requires new approaches to architecture, data, and optimization.

This role exists to own and define how multimodal large language models are trained inside a broader human foundation model. As a Founding Research Scientist, you’ll set technical direction, design training strategies, and turn research ideas into systems that can operate in the real world.

This is a blank-page role with real agency. You’ll decide what problems matter, how we tackle them, and how research translates into working, scalable models.

What You’ll Be Building

You’ll help build the first human foundation model that operates across text, speech, facial expression, and body language in real time.

Your work will power systems that:

  • Understand fine-grained human signals across modalities and infer meaning in context

  • Reason autoregressively over multimodal inputs in real time

  • Drive lifelike avatars whose expressions, gestures, and tone evolve frame-by-frame during interaction

The field is wide open. Existing solutions treat language, voice, and vision as separate problems. This role offers the rare chance to define how modalities are trained and unified at the foundation-model level.

What You’ll Own

You’ll operate as a founding-level researcher with end-to-end ownership over MLLM training and evaluation.

You will:

  • Design and train multimodal large language models and autoregressive architectures

  • Own the full ML pipeline, from dataset design and preprocessing to large-scale training and benchmarking

  • Develop training strategies that balance quality, generalization, and real-time performance

  • Push research breakthroughs into practical, production-oriented systems

  • Explore new architectures, objectives, and scaling strategies for multimodal reasoning

  • Write clean, maintainable research code that enables rapid iteration

  • Collaborate closely with researchers across speech, vision, and systems engineering

Who Will Thrive Here

You’re comfortable operating at the research frontier and making progress without a playbook. You care deeply about model behavior, but you’re equally motivated by getting things to work outside the lab.

You likely:

  • Enjoy blank-page research problems and defining technical direction

  • Move quickly from ideas to experiments to results

  • Think deeply about data, evaluation, and failure modes

  • Thrive in highly collaborative, cross-domain teams

Requirements
  • PhD or equivalent experience in multimodal LLMs, MLLM training, or closely related fields

  • Deep expertise in training large-scale autoregressive models

  • Strong command of modern deep learning and distributed training systems

  • Experience running the full ML lifecycle, from data curation to evaluation

  • Ability to translate research insights into practical systems

  • Strong coding skills and a commitment to clean, maintainable research code

  • Clear communication and strong collaboration skills

Nice to Have
  • Publications at top ML or multimodal AI conferences

  • Experience with real-time or low-latency ML systems

  • Prior work unifying language, vision, and/or speech models

  • Experience shipping large ML systems into production

Why Join Now

Joining Nuance Labs now means defining the training foundation of a category-defining AI system. You’ll have outsized influence over core research decisions, work in-person with a world-class team, and help solve one of the hardest problems in AI: real-time, multimodal human interaction.
