Founding Research Scientist - MLLM Training
Seattle area, Washington, USA | IT/Tech

About Nuance Labs

Nuance Labs is an early-stage deep tech startup building the first real-time human foundation model: a unified system across text, speech, and vision designed to make AI socially and emotionally intelligent.

We’re working toward AI that can read subtle human signals—a shift in tone, a glance, a pause—and respond in a way that feels natural and grounded in context. This is foundational work at the frontier of multimodal learning and real-time systems.

We’ve raised a $10M seed round backed by Accel, South Park Commons, Lightspeed, and top angels, and our team includes world-class researchers from MIT, UW, and Oxford with decades of experience at Apple and Meta, shipping ultra-low-latency ML systems used by millions.

Why This Role Exists

Large multimodal models are advancing quickly, but real-time, human-centered interaction remains unsolved. Training models that can reason across text, speech, vision, and embodied signals, while operating under tight latency constraints, requires new approaches to architecture, data, and optimization.

This role exists to own and define how multimodal large language models are trained inside a broader human foundation model. As a Founding Research Scientist, you’ll set technical direction, design training strategies, and turn research ideas into systems that can operate in the real world.

This is a blank-page role with real agency. You’ll decide what problems matter, how we tackle them, and how research translates into working, scalable models.

What You’ll Be Building

You’ll help build the first human foundation model that operates across text, speech, facial expression, and body language in real time.

Your work will power systems that:

Understand fine-grained human signals across modalities and infer meaning in context
Reason autoregressively over multimodal inputs in real time
Drive lifelike avatars whose expressions, gestures, and tone evolve frame-by-frame during interaction

The field is wide open. Existing solutions treat language, voice, and vision as separate problems. This role offers the rare chance to define how modalities are trained and unified at the foundation-model level.

What You’ll Own

You’ll operate as a founding-level researcher with end-to-end ownership over MLLM training and evaluation.

You will:

Design and train multimodal large language models and autoregressive architectures
Own the full ML pipeline, from dataset design and preprocessing to large-scale training and benchmarking
Develop training strategies that balance quality, generalization, and real-time performance
Push research breakthroughs into practical, production-oriented systems
Explore new architectures, objectives, and scaling strategies for multimodal reasoning
Write clean, maintainable research code that enables rapid iteration
Collaborate closely with researchers across speech, vision, and systems engineering

Who Will Thrive Here

You’re comfortable operating at the research frontier and making progress without a playbook. You care deeply about model behavior, but you’re equally motivated by getting things to work outside the lab.

You likely:

Enjoy blank-page research problems and defining technical direction
Move quickly from ideas to experiments to results
Think deeply about data, evaluation, and failure modes
Thrive in highly collaborative, cross-domain teams

Requirements

PhD or equivalent experience in multimodal LLMs, MLLM training, or closely related fields
Deep expertise in training large-scale autoregressive models
Strong command of modern deep learning and distributed training systems
Experience running the full ML lifecycle, from data curation to evaluation
Ability to translate research insights into practical systems
Strong coding skills and a commitment to clean, maintainable research code
Clear communication and strong collaboration skills

Nice to Have

Publications at top ML or multimodal AI conferences
Experience with real-time or low-latency ML systems
Prior work unifying language, vision, and/or speech models
Experience shipping large ML systems into production

Why Join Now

Joining Nuance Labs now means defining the training foundation of a category-defining AI system. You’ll have outsized influence over core research decisions, work in-person with a world-class team, and help solve one of the hardest problems in AI: real-time, multimodal human interaction.
