Founding Research Scientist - MLLM Training
Listed on 2026-01-30
IT/Tech
AI Engineer, Machine Learning/ ML Engineer, Data Scientist, Artificial Intelligence
About Nuance Labs
Nuance Labs is an early-stage deep tech startup building the first real-time human foundation model: a unified system across text, speech, and vision designed to make AI socially and emotionally intelligent.
We’re working toward AI that can read subtle human signals—a shift in tone, a glance, a pause—and respond in a way that feels natural and grounded in context. This is foundational work at the frontier of multimodal learning and real-time systems.
We’ve raised a $10M seed round backed by Accel, South Park Commons, Lightspeed, and top angels, and our team includes world-class researchers from MIT, UW, and Oxford with decades of experience at Apple and Meta, shipping ultra-low-latency ML systems used by millions.
Why This Role Exists
Large multimodal models are advancing quickly, but real-time, human-centered interaction remains unsolved. Training models that can reason across text, speech, vision, and embodied signals, while operating under tight latency constraints, requires new approaches to architecture, data, and optimization.
This role exists to own and define how multimodal large language models are trained inside a broader human foundation model. As a Founding Research Scientist, you’ll set technical direction, design training strategies, and turn research ideas into systems that can operate in the real world.
This is a blank-page role with real agency. You’ll decide what problems matter, how we tackle them, and how research translates into working, scalable models.
What You’ll Be Building
You’ll help build the first human foundation model that operates across text, speech, facial expression, and body language in real time.
Your work will power systems that:
Understand fine-grained human signals across modalities and infer meaning in context
Reason autoregressively over multimodal inputs in real time
Drive lifelike avatars whose expressions, gestures, and tone evolve frame-by-frame during interaction
The field is wide open. Existing solutions treat language, voice, and vision as separate problems. This role offers the rare chance to define how modalities are trained and unified at the foundation-model level.
What You’ll Own
You’ll operate as a founding-level researcher with end-to-end ownership over MLLM training and evaluation.
You will:
Design and train multimodal large language models and autoregressive architectures
Own the full ML pipeline, from dataset design and preprocessing to large-scale training and benchmarking
Develop training strategies that balance quality, generalization, and real-time performance
Push research breakthroughs into practical, production-oriented systems
Explore new architectures, objectives, and scaling strategies for multimodal reasoning
Write clean, maintainable research code that enables rapid iteration
Collaborate closely with researchers across speech, vision, and systems engineering
You’re comfortable operating at the research frontier and making progress without a playbook. You care deeply about model behavior, but you’re equally motivated by getting things to work outside the lab.
You likely:
Enjoy blank-page research problems and defining technical direction
Move quickly from ideas to experiments to results
Think deeply about data, evaluation, and failure modes
Thrive in highly collaborative, cross-domain teams
PhD or equivalent experience in multimodal LLMs, MLLM training, or closely related fields
Deep expertise in training large-scale autoregressive models
Strong command of modern deep learning and distributed training systems
Experience running the full ML lifecycle, from data curation to evaluation
Ability to translate research insights into practical systems
Strong coding skills and a commitment to clean, maintainable research code
Clear communication and strong collaboration skills
Publications at top ML or multimodal AI conferences
Experience with real-time or low-latency ML systems
Prior work unifying language, vision, and/or speech models
Experience shipping large ML systems into production
Joining Nuance Labs now means defining the training foundation of a category-defining AI system. You’ll have outsized influence over core research decisions, work in-person with a world-class team, and help solve one of the hardest problems in AI: real-time, multimodal human interaction.