
Applied Scientist/Research Engineer - Speech AI

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: GTN Technical Staffing
Full-time position
Listed on 2026-06-19
Job specializations:
  • IT/Tech
    Machine Learning/ML Engineer, AI Engineer (Applied/Software), AI Evaluation
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 USD yearly
Job Description

We are looking for a senior technical contributor to help develop the next generation of real-time speech and conversational AI systems. This person will work across applied research, model development, training infrastructure, evaluation, and production deployment.

The role is ideal for someone who has deep experience with modern machine learning for audio, speech, and language, and who enjoys moving beyond prototypes into systems that must perform reliably in live environments. You will work closely with engineering and product teams to improve model quality, speed, reliability, and scalability for speech-driven user experiences.

This is a hands‑on position. You should be comfortable experimenting with new model architectures, training and evaluating large models, improving inference performance, and translating research results into production‑ready capabilities.

Responsibilities
Develop High‑Quality Speech Generation Systems
  • Build and improve machine learning models for natural, expressive speech generation.
  • Work on controllability, speaker consistency, pacing, tone, and conversational timing.
  • Explore model architectures that improve output quality while keeping latency low.
  • Improve performance for real‑time use cases where responsiveness and reliability matter.
  • Partner with infrastructure and product teams to move successful approaches into production.
Improve Speech Understanding and Recognition
  • Train, adapt, and evaluate models that convert speech into accurate, usable text.
  • Improve recognition quality across varied speakers, accents, noisy conditions, phone‑quality audio, interruptions, and mixed‑language conversations.
  • Use large‑scale audio data, weak labels, self‑supervised methods, and targeted fine‑tuning strategies.
  • Improve downstream usefulness of transcripts for conversation analysis, structured output, and intent understanding.
Advance Audio Representation and Compression Methods
  • Research and implement model components that represent speech efficiently and preserve perceptual quality.
  • Explore learned audio representations that support generation, recognition, and efficient deployment.
  • Evaluate different approaches for balancing quality, speed, compute cost, and scalability.
  • Build systems that can support both experimentation and production use.
Build Training and Evaluation Workflows
  • Create reliable pipelines for collecting, cleaning, processing, and evaluating speech data.
  • Design evaluation methods that combine automated metrics, model diagnostics, and human quality review.
  • Support large‑scale training jobs across modern accelerator infrastructure.
  • Improve throughput, reproducibility, monitoring, and cost efficiency of model development workflows.
  • Design controlled experiments to test model, data, and training improvements.
  • Compare approaches using clear benchmarks and production‑relevant quality measures.
  • Move quickly from hypothesis to implementation, measurement, and iteration.
  • Communicate results clearly to research, engineering, and product stakeholders.
What We’re Looking For
  • Strong background in modern machine learning, especially speech, audio, language, generative modeling, multimodal systems, or large‑scale model training.
  • Ability to implement new model ideas efficiently and evaluate them with technical rigor.
  • Strong understanding of current techniques used in speech and language systems.
Speech and Audio Experience
  • Practical experience building or improving systems for speech generation, speech recognition, audio modeling, or related areas.
  • Experience working with large and varied audio datasets.
  • Strong judgment around speech quality, naturalness, latency, robustness, and user‑facing model behavior.
  • Familiarity with real‑world audio issues such as background noise, channel quality, interruptions, speaker variation, and conversational dynamics.
  • Experience training, deploying, or serving large models on modern compute infrastructure.
  • Understanding of practical inference constraints, including latency, memory use, throughput, quantization, and serving efficiency.
  • Comfort working with systems that need to operate reliably in live, low‑latency environments.
  • Experience designing benchmarks, ablation…