Machine Learning Researcher, Audio

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Bland
Full Time position
Listed on 2026-05-24
Job specializations:
  • Software Development
    Software Engineer, Data Scientist, AI Engineer (Applied/Software)
Salary/Wage Range or Industry Benchmark: 160,000 - 250,000 USD per year

Location: San Francisco, CA or Remote (US)

About Bland

At Bland, our mission is to empower enterprises to build AI phone agents. Based in San Francisco, we are a fast‑growing team reimagining how customers interact with businesses through voice. We have raised $65 million from leading Silicon Valley investors, including Emergence Capital, Scale Venture Partners, Y Combinator, and the founders of Twilio, Affirm, and ElevenLabs.

Voice is quickly becoming the primary interface between businesses and their customers. We are building the models and infrastructure that make those interactions feel natural, reliable, and genuinely human.

The Role:

As a Machine Learning Researcher at Bland, you will work on foundational research and development across the core components of our voice stack: speech‑to‑text, large language models, neural audio codecs, and text‑to‑speech. Your work will define how our agents understand, reason, and speak in real time at enterprise scale.

This is not a narrow research role. You will take ideas from theory to large‑scale training to production inference systems serving millions of calls per day. You will design new modeling approaches, validate them with rigorous experimentation, and collaborate with engineering teams to deploy them into real customer environments.

What You Will Do

Build and Scale Next‑Generation TTS Systems
  • Design and train large‑scale text‑to‑speech models capable of expressive, controllable, human‑sounding output.
  • Develop neural audio codec‑based TTS architectures for efficient, high‑fidelity generation.
  • Improve prosody modeling, question inflection, emotional expression, and multi‑speaker robustness.
  • Optimize for real‑time, low‑latency inference in production.
Advance Speech‑to‑Text Modeling
  • Build and fine‑tune large‑scale ASR systems robust to accents, noise, telephony artifacts, and code‑switching.
  • Leverage self‑supervised pretraining and large‑scale weak supervision.
  • Improve transcription accuracy for real‑world enterprise scenarios, including structured extraction and conversational nuance.
Pioneer Neural Audio Codecs
  • Research and implement neural audio codecs that achieve extreme compression with minimal perceptual loss.
  • Explore discrete and continuous latent representations for scalable speech modeling.
  • Design codec architectures that enable downstream generative modeling and controllable synthesis.
Develop Scalable Training Pipelines
  • Curate and process massive audio datasets across languages, speakers, and environments.
  • Design staged training curricula and data filtering strategies.
  • Scale training across distributed GPU clusters, with a focus on cost, throughput, and reliability.
Run Rigorous Experiments
  • Design ablation studies that isolate the impact of architectural changes.
  • Measure improvements using both objective metrics and perceptual evaluations.
  • Validate ideas quickly through focused experiments that confirm or eliminate hypotheses.
What Makes You a Great Fit

Deep Research Foundations
  • Experience with self‑supervised learning, multimodal modeling, or generative modeling.
  • Ability to derive new formulations and implement them efficiently.
Expertise in Voice Modeling
  • Hands‑on experience building or scaling TTS, STT, or neural audio codec systems.
  • Familiarity with large‑scale speech datasets and real‑world audio variability.
  • Strong intuition for audio quality, prosody, and conversational dynamics.
Systems and Hardware Awareness
  • Experience training and serving large models on modern accelerators.
  • Knowledge of inference optimization techniques, including quantization, kernel optimization, and memory efficiency.
  • Understanding of real‑time constraints in telephony or streaming environments.
Experimental Rigor
  • Track record of designing controlled experiments and meaningful ablations.
  • Comfortable working with both offline benchmarks and live production metrics.
  • Ability to move quickly from hypothesis to validation.
Builder Mentality
  • Comfortable in fast‑moving startup environments.
  • Strong ownership mindset from research through deployment.
  • Excited by ambiguous, unsolved problems.
How You Show Up
  • You treat unsolved problems as…