Position: AI Researcher (Voice)
Location: New York

AI Researcher (Voice) – Tavus

About Us
Tavus is a research lab pioneering human computing. We’re building AI Humans: a new interface that closes the gap between people and machines, free from the friction of today’s systems. Our real‑time human simulation models let machines see, hear, respond, and even look real—enabling meaningful, face‑to‑face conversations. AI Humans combine the emotional intelligence of humans with the reach and reliability of machines, making them capable, trusted agents available 24/7, in every language, on our terms.

Imagine a therapist anyone can afford. A personal trainer that adapts to your schedule. A fleet of medical assistants that can give every patient the attention they need. With Tavus, individuals, enterprises, and developers can all build AI Humans to connect, understand, and act with empathy at scale.

We’re a Series‑A company backed by world‑class investors including Sequoia Capital, Y Combinator, and Scale Venture Partners.

Be part of shaping a future where humans and machines truly understand each other.

The Role

We’re looking for a Senior Researcher to join our core AI team. The ideal partner‑in‑crime works well in startup environments, prioritizes autonomously, and thrives on calculated risk‑taking. We’re moving fast and expect team members to pave the path, not just ride it.

Responsibilities

Lead research efforts on generative video and audio models (e.g., text‑to‑speech, speech‑to‑speech, audio‑to‑expression, and other speech and multimodal AI topics)
Collaborate with the Applied ML team to productionize our research
Stay current with the latest advancements and help create the next breakthroughs

Requirements

Proven experience with flow matching, diffusion models, and autoregressive networks in the audio domain
Experience training deep learning models ranging from medium‑sized to large
Experience building streaming text‑to‑speech or speech‑to‑speech models
Strong foundations in audio modeling and a track record of rapid prototyping and innovation
Familiarity with state‑of‑the‑art architectures in representation learning (audio or image domain, face animation) and a deep understanding of the core areas listed above
Excellent programming skills and fluency in PyTorch
Published original research in top‑tier or solid second‑tier venues (e.g., CVPR, NeurIPS, BMVC)
Enthusiasm for building lifelike, expressive avatars for real‑time applications

Preferred Experience

Skills in 3D graphics and Gaussian splatting
Additional experience with generative models
PhD or equivalent experience preferred
Experience leading research teams
Knowledge of best software development practices

Approximately 80% of the work is hybrid in San Francisco; we offer relocation and are open to remote candidates.

Benefits & Culture

At Tavus, you’ll join a diverse, supportive team driven by people. This position offers a flexible work schedule, unlimited PTO, competitive healthcare, gear stipends, and plenty of fun. We value culture creators over cultural fit, and we’re committed to diversity, equity, and inclusion.
