
Founding Research Scientist - Video Diffusion

Job in Seattle, King County, Washington, 98127, USA
Listing for: Rethink recruit
Full Time position
Listed on 2026-02-12
Job specializations:
  • IT/Tech
    Data Scientist, AI Engineer, Artificial Intelligence, Machine Learning / ML Engineer
Salary Range: 120,000 – 160,000 USD per year
Job Description & How to Apply Below

About Nuance Labs

Nuance Labs is an early-stage deep tech startup building the first real-time human foundation model—a unified system across text, speech, and vision designed to make AI socially and emotionally intelligent.

We’re creating AI that understands subtle human signals—a raised eyebrow, a micro-expression, a shift in posture—and responds naturally in context. This work sits at the frontier of multimodal modeling, real-time systems, and generative video.

We’ve raised a $10M seed round backed by Accel, South Park Commons, Lightspeed, and top angels, and our team includes world-class researchers from MIT, UW, and Oxford with deep experience at Apple and Meta, shipping low-latency ML products used by millions.

Why This Role Exists

High-quality video generation has progressed rapidly—but real-time, expressive, controllable video diffusion for interactive systems remains unsolved. Most approaches are offline, slow, or brittle when pushed into real-world constraints.

This role exists to define and own video diffusion research inside a broader human foundation model. As a Founding Research Scientist, you’ll shape how motion, expression, and visual coherence are learned, generated, and integrated into real-time multimodal systems.

This is a true blank-page role. You’ll decide which research paths are worth pursuing, how models are trained and evaluated, and how breakthroughs turn into systems that actually work.

What You’ll Be Building

You’ll help build the first human foundation model operating across text, speech, facial expression, and body language in real time.

Your work will enable systems that:

  • Understand fine-grained visual signals such as facial expression, gesture, and posture
  • Generate lifelike, temporally coherent video with expressive motion and identity consistency
  • Power real-time avatars whose expressions and movements evolve frame-by-frame during interaction

The field is wide open. While current systems excel at static visuals or offline generation, real-time, multimodal video generation remains a foundational challenge—and this role is about defining that future.

What You’ll Own

You’ll operate as a founding-level researcher with end-to-end ownership over video diffusion research and its path to production.

You will:

  • Design and train state-of-the-art video diffusion and generative vision models
  • Own the full ML pipeline, from dataset construction and preprocessing to large-scale training and evaluation
  • Explore architectures and objectives for temporal coherence, controllability, and identity preservation
  • Push research breakthroughs into practical, low-latency systems
  • Develop benchmarks and evaluation strategies for expressive, real-time video generation
  • Write clean, maintainable research code that supports rapid iteration
  • Collaborate closely with researchers across speech, language, and multimodal learning

Who Will Thrive Here

You enjoy frontier research, but you care deeply about real-world constraints. You’re comfortable navigating ambiguity, setting your own technical direction, and collaborating across domains.

You likely:

  • Love blank-page research problems and defining new problem spaces
  • Move quickly from ideas to experiments to insights
  • Think deeply about model behavior, failure modes, and evaluation
  • Thrive in small, highly technical, in-person teams

Requirements
  • PhD or equivalent experience in video diffusion, generative vision, or closely related fields
  • Strong background in training large-scale generative models for images or video
  • Deep expertise in modern deep learning and large-scale training systems
  • Experience running the full ML lifecycle, from data curation through evaluation
  • Ability to translate research ideas into practical systems
  • Strong coding skills and a commitment to clean, maintainable research code
  • Clear communication and strong collaboration skills

Nice to Have
  • Publications at top ML, vision, or generative modeling conferences
  • Experience with real-time or low-latency generative systems
  • Prior work on avatars, facial animation, or human motion modeling
  • Experience shipping ML systems used by real users

Why Join Now

Joining Nuance Labs now means defining the visual foundation of a category-defining AI system. You’ll have outsized influence over core research directions, work closely in person with a world-class team, and help solve one of the hardest problems in AI: real-time, multimodal human interaction.
