Machine Learning Engineer, Speech LLM Evaluations
$250,000 - $350,000 + bonus + equity
San Francisco, CA. Hybrid (3 days onsite)
Full-time / Permanent

Deep Rec has partnered with a high-growth AI company building speech intelligence products used by more than 1.5 million people worldwide.

This is a chance to define how their foundational Speech LLMs are measured, improved, and trusted. If you've ever felt model evaluation deserves the same attention as model training, you'll have the space to prove it here.

The Opportunity

You'll join an early Speech LLM team where your work shapes research decisions, product quality, and model releases. You'll own the systems that answer one of the hardest questions in AI: how do you measure something as human as conversation, expression, and understanding?

What You'll Do

- Build evaluation frameworks for speech and conversational AI models
- Define benchmarks for transcription, audio quality, and dialogue performance
- Create automated evaluation pipelines for training checkpoints
- Own dashboards that surface model health and regressions
- Partner with researchers to translate capabilities into measurable outcomes
- Investigate unexpected performance changes during model training
- Improve evaluation speed, quality, and reliability across the research lifecycle

What You'll Bring

Essential

- Strong Python engineering experience in production environments
- Experience building ML evaluation, data, or experimentation systems
- Deep understanding of statistics, benchmarking, and model performance analysis
- Ability to explain technical findings to varied audiences

Desirable

- Experience with speech metrics such as WER, CER, PESQ, or MOS
- Familiarity with LLM-as-a-Judge evaluation methods
- Experience with ML observability tools such as Weights & Biases or MLflow

We encourage you to apply even if you don't meet every requirement. The right mindset matters as much as the right CV.

What's In It For You

- Foundational role within a growing Speech LLM research team
- Ownership of evaluation strategy and technical direction
- Work on problems spanning ML, statistics, software, and research
- Comprehensive healthcare, 401(k) matching, parental leave, and unlimited PTO
- Conference, learning, and career development support
- Direct influence on products used by a global customer base