
Founding Research Scientist - Video Diffusion

Job in Seattle, King County, Washington, 98127, USA
Listing for: Rethink recruit
Full Time position
Listed on 2026-02-12
Job specializations:
  • IT/Tech
    Data Scientist, AI Engineer, Artificial Intelligence, Machine Learning / ML Engineer
Salary Range: 120,000 – 160,000 USD per year
Job Description & How to Apply Below

About Nuance Labs

Nuance Labs is an early-stage deep tech startup building the first real-time human foundation model—a unified system across text, speech, and vision designed to make AI socially and emotionally intelligent.

We’re creating AI that understands subtle human signals—a raised eyebrow, a micro-expression, a shift in posture—and responds naturally in context. This work sits at the frontier of multimodal modeling, real-time systems, and generative video.

We’ve raised a $10M seed round backed by Accel, South Park Commons, Lightspeed, and top angels, and our team includes world-class researchers from MIT, UW, and Oxford with deep experience at Apple and Meta, shipping low-latency ML products used by millions.

Why This Role Exists

High-quality video generation has progressed rapidly—but real-time, expressive, controllable video diffusion for interactive systems remains unsolved. Most approaches are offline, slow, or brittle when pushed into real-world constraints.

This role exists to define and own video diffusion research inside a broader human foundation model. As a Founding Research Scientist, you’ll shape how motion, expression, and visual coherence are learned, generated, and integrated into real-time multimodal systems.

This is a true blank-page role. You’ll decide which research paths are worth pursuing, how models are trained and evaluated, and how breakthroughs turn into systems that actually work.

What You’ll Be Building

You’ll help build the first human foundation model operating across text, speech, facial expression, and body language in real time.

Your work will enable systems that:

  • Understand fine-grained visual signals such as facial expression, gesture, and posture
  • Generate lifelike, temporally coherent video with expressive motion and identity consistency
  • Power real-time avatars whose expressions and movements evolve frame-by-frame during interaction

The field is wide open. While current systems excel at static visuals or offline generation, real-time, multimodal video generation remains a foundational challenge—and this role is about defining that future.

What You’ll Own

You’ll operate as a founding-level researcher with end-to-end ownership over video diffusion research and its path to production.

You will:

  • Design and train state-of-the-art video diffusion and generative vision models
  • Own the full ML pipeline, from dataset construction and preprocessing to large-scale training and evaluation
  • Explore architectures and objectives for temporal coherence, controllability, and identity preservation
  • Push research breakthroughs into practical, low-latency systems
  • Develop benchmarks and evaluation strategies for expressive, real-time video generation
  • Write clean, maintainable research code that supports rapid iteration
  • Collaborate closely with researchers across speech, language, and multimodal learning

Who Will Thrive Here

You enjoy frontier research, but you care deeply about real-world constraints. You’re comfortable navigating ambiguity, setting your own technical direction, and collaborating across domains.

You likely:

  • Love blank-page research problems and defining new problem spaces
  • Move quickly from ideas to experiments to insights
  • Think deeply about model behavior, failure modes, and evaluation
  • Thrive in small, highly technical, in-person teams

Requirements
  • PhD or equivalent experience in video diffusion, generative vision, or closely related fields
  • Strong background in training large-scale generative models for images or video
  • Deep expertise in modern deep learning and large-scale training systems
  • Experience running the full ML lifecycle, from data curation through evaluation
  • Ability to translate research ideas into practical systems
  • Strong coding skills and a commitment to clean, maintainable research code
  • Clear communication and strong collaboration skills

Nice to Have
  • Publications at top ML, vision, or generative modeling conferences
  • Experience with real-time or low-latency generative systems
  • Prior work on avatars, facial animation, or human motion modeling
  • Experience shipping ML systems used by real users

Why Join Now

Joining Nuance Labs now means defining the visual foundation of a category-defining AI system. You’ll have outsized influence over core research directions, work closely in person with a world-class team, and help solve one of the hardest problems in AI: real-time, multimodal human interaction.
