MLE SpeechLLM Evaluations

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: DEEPREC.AI
Full Time position
Listed on 2026-06-21
Job specializations:
  • IT/Tech
    Machine Learning/ML Engineer, AI Engineer (Applied/Software), AI Evaluation, Data Scientist
Machine Learning Engineer, SpeechLLM Evaluations
$250,000 - $350,000 + bonus + equity
San Francisco, CA. Hybrid (3 days onsite)
Full-time / Permanent

Deep Rec has partnered with a high-growth AI company building speech intelligence products used by more than 1.5 million people worldwide.

This is a chance to define how their foundational Speech LLMs are measured, improved, and trusted. If you've ever felt model evaluation deserves the same attention as model training, you'll have the space to prove it here.

The Opportunity

You'll join an early Speech LLM team where your work shapes research decisions, product quality, and model releases. You'll own the systems that answer one of the hardest questions in AI: how do you measure something as human as conversation, expression, and understanding?

What You'll Do

- Build evaluation frameworks for speech and conversational AI models
- Define benchmarks for transcription, audio quality, and dialogue performance
- Create automated evaluation pipelines for training checkpoints
- Own dashboards that surface model health and regressions
- Partner with researchers to translate capabilities into measurable outcomes
- Investigate unexpected performance changes during model training
- Improve evaluation speed, quality, and reliability across the research lifecycle

What You'll Bring

Essential

- Strong Python engineering experience in production environments
- Experience building ML evaluation, data, or experimentation systems
- Deep understanding of statistics, benchmarking, and model performance analysis
- Ability to explain technical findings to varied audiences

Desirable

- Experience with speech metrics such as WER, CER, PESQ, or MOS
- Familiarity with LLM-as-a-Judge evaluation methods
- Experience with ML observability tools such as Weights & Biases or MLflow

We encourage you to apply even if you don't meet every requirement. The right mindset matters as much as the right CV.

What's In It For You

- Foundational role within a growing Speech LLM research team
- Ownership of evaluation strategy and technical direction
- Work on problems spanning ML, statistics, software, and research
- Comprehensive healthcare, 401(k) matching, parental leave, and unlimited PTO
- Conference, learning, and career development support
- Direct influence on products used by a global customer base