Senior Machine Learning Engineer
Listed on 2026-02-16
IT/Tech
AI Engineer, Machine Learning / ML Engineer
Overview
Sr. Machine Learning Engineer — Agentic Voice (Healthcare)
Location:
West Loop, Chicago, IL (Hybrid — 3 days/week in office)
We’re looking for a hands-on, entrepreneurial Senior Machine Learning Engineer who has already taken voice-centric AI systems (TTS, STT, LLM-driven dialog) from prototype to planet-scale production. You will own the full lifecycle of our ML stack—research, data pipelines, training, evaluation, deployment, and relentless optimization—so that millions of patients can have natural, sub-second conversations with our Agentic Voice platform. You’ll collaborate tightly with product, infra, and compliance teams, and set a high technical bar for ML excellence.
What sets this role apart: You'll specialize in creating highly optimized, domain-specific conversational AI models by fine-tuning and compressing existing LLMs and specialized conversational architectures for specific use cases. We need someone who can rapidly research, prototype, and deploy smaller, faster, cheaper models that outperform general-purpose solutions in conversational settings — achieving 10x speed improvements and 90% cost reductions — while building efficient pipelines for intent classification, dialogue management, and text-based optimization systems that improve the conversational quality of our dialogue systems.
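For candidates unfamiliar with why parameter-efficient fine-tuning enables "smaller, faster, cheaper" adaptation, the arithmetic below is an illustrative sketch of the trainable-parameter savings from a rank-r LoRA adapter on one weight matrix (the dimensions and rank are hypothetical examples, not figures from our stack):

```python
# Illustrative arithmetic: trainable parameters for full fine-tuning
# vs. a rank-r LoRA adapter on a single d_in x d_out weight matrix.
# Dimensions below are hypothetical, chosen for illustration.

def full_finetune_params(d_in: int, d_out: int) -> int:
    # Full fine-tuning updates every entry of W.
    return d_in * d_out

def lora_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA freezes W and trains two low-rank factors
    # A (d_in x r) and B (r x d_out), so W' = W + A @ B.
    return d_in * r + r * d_out

d = 4096  # a typical hidden size for a ~7B-parameter LLM
r = 8     # a commonly used LoRA rank

full = full_finetune_params(d, d)   # 16,777,216
lora = lora_params(d, d, r)         # 65,536
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")  # ratio: 256x
```

At rank 8 the adapter trains roughly 1/256 of the parameters of that layer, which is why LoRA checkpoints are small enough to swap per use case.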
Responsibilities
- Advanced Model Optimization & Fine-Tuning: Apply LoRA, QLoRA, DPO, RLHF, and parameter-efficient methods to create smaller, faster models optimized for conversational contexts; implement quantization, pruning, and knowledge distillation to reduce model size while preserving quality; work with modern conversational architectures (DeBERTa, SetFit, sentence transformers, lightweight decoder models) for domain-specific use cases; rapidly evaluate and adapt the latest research for conversational applications.
- End-to-End ML Engineering: Design, build, and maintain high-performing STT, TTS, and LLM pipelines that operate at < 800 ms end-to-end latency across thousands of concurrent calls; train and fine-tune smaller, task-specific LLMs optimized for real-time accuracy, latency, and cost.
- Inference at Scale: Optimize GPU- and CPU-based serving on EKS / Kubernetes using dynamic batching, quantization, speculative decoding, and streaming gRPC / WebSockets; extend LangGraph / LangChain flows and Model Context Protocol (MCP) schemas to orchestrate complex multi-turn healthcare conversations safely and compliantly.
- Data & Evaluation: Build robust data pipelines (Kafka → Snowflake / S3) for conversation logs; design offline and online evaluation frameworks for ASR WER, TTS MOS, and task-completion metrics.
- Technical Leadership: Establish ML best practices—versioning, monitoring, A/B gating, CI/CD for models—and mentor engineers on MLOps, audio processing, and prompt engineering.
- Cross-Functional Collaboration: Work daily with product managers, designers, compliance leads, and customer teams to translate business goals into scalable voice experiences; stay on the cutting edge of open-source speech and LLM research; run rapid POCs (e.g., Whisper-v3, Bark); explore efficient fine-tuning techniques (LoRA, DPO); continuously improve model performance in production environments.
- Reliability & Compliance: Ensure HIPAA-grade security, auditable PHI handling, guardrails, and fallback strategies to keep conversations safe and reliable 24×7.
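As a concrete reference point for the ASR evaluation work described above, word error rate (WER) is word-level edit distance divided by reference length. The following is a minimal illustrative sketch, not our evaluation framework — production scoring would also normalize casing and punctuation, and would typically use an established library:

```python
# Minimal word error rate (WER) sketch: word-level Levenshtein
# distance (substitutions + insertions + deletions) divided by
# the number of reference words.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("to" for "two") out of four reference words:
print(wer("take two tablets daily", "take to tablets daily"))  # 0.25
```

In a healthcare voice setting, single-word errors like the one above can invert meaning, which is why WER is tracked alongside task-completion metrics rather than in isolation.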
Education
- B.S. or M.S. in Computer Science, Machine Learning, or related field.
Experience
- 7+ years building production ML systems, 2+ specifically in speech / conversational AI.
- Proven track record shipping voice AI or large-scale LLM products to tens-of-millions of users or thousands of concurrent sessions.
Technical Expertise
- Advanced Fine-tuning & Model Compression: Proven experience with parameter-efficient fine-tuning techniques (LoRA, QLoRA, adapters) for conversational applications; knowledge of few-shot learning frameworks for conversational tasks with limited data; experience with model compression techniques (GPTQ/AWQ quantization, pruning, knowledge distillation) for real-time inference.
- Speech: Deep understanding of ASR (Whisper, NeMo, Kaldi) and TTS (Tacotron, Fast…
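To make the model-compression expectation concrete, here is an illustrative sketch of the simplest member of that family: symmetric per-tensor int8 post-training quantization. (GPTQ and AWQ, named above, are far more sophisticated; this toy example only shows the basic round-trip and its error bound.)

```python
# Symmetric per-tensor int8 post-training quantization sketch:
# map floats into [-127, 127] with a single scale factor, then
# reconstruct. Example weights are made up for illustration.

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

w = [0.42, -1.27, 0.08, 0.9999, -0.31]
q, scale = quantize_int8(w)          # ints fit in one byte each
w_hat = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
# Round-trip error is bounded by half a quantization step.
assert max_err <= scale / 2
```

The payoff is the same as in production systems: 4x less memory than float32 per weight, at the cost of bounded reconstruction error.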