AI Engineer - Model Performance

Job in Denver, Denver County, Colorado, 80285, USA
Listing for: Fathom
Full Time position
Listed on 2026-06-04
Job specializations:
  • Software Development
    AI Engineer, Software Engineer
Salary/Wage Range or Industry Benchmark: 120000 - 150000 USD Yearly
Job Description & How to Apply Below
Position: AI Engineer - Model Performance

Role Overview

We’re hiring a Model Performance Engineer to own the speed, cost, and reliability of our model inference stack, and to build the fine‑tuning infrastructure that makes the rest of the AI team faster.

This is not a research role. You’ll be optimizing real systems serving millions of meetings — choosing between quantization trade‑offs, debugging speculative decoding, or figuring out why one GPU family’s tail latency explodes at high concurrency while another stays stable.

Responsibilities
  • Benchmark FP8 quantization across GPU families, find that FP8 KV cache causes catastrophic repetition loops, identify static quantization as 6% faster than dynamic on certain hardware, and ship a production config that gets 1.3x speedup with less than 1% quality degradation.
  • Evaluate serving frameworks (vLLM vs SGLang) with speculative decoding — discover that ngram speculation degrades ASR quality while EAGLE3 draft models don’t, and that torch.compile makes certain GPUs 7% slower.
  • Build a fine‑tuning pipeline that takes a JSONL dataset and produces an optimized tune ready for serving, so a teammate can train a small classifier in an afternoon instead of a week.
  • Optimize GPU spend — know which GPU families are best for batch workloads (stable under high concurrency) vs latency‑sensitive paths, and when a 30% cost premium isn’t worth it.
  • Debug production inference issues — trace a quality regression to a serving framework upgrade that changed the default attention backend, or find that audio format handling in the multimodal pipeline silently drops segments.
Hard Skills
  • Deep experience with LLM serving frameworks (vLLM, SGLang, TensorRT‑LLM, or similar), not just deploying them but tuning them: attention backends, scheduling strategies, CUDA graph warmup, prefix caching.
  • Hands‑on quantization experience — you understand weight vs activation quantization, per‑channel vs per‑tensor scaling, and when dynamic quantization introduces more overhead than it saves.
  • Production fine‑tuning experience — LoRA/QLoRA SFT, familiarity with training frameworks, understanding of data formatting, learning rate schedules, and how to diagnose training failures.
  • Strong Python skills. You’ll write serving infrastructure, benchmarking harnesses, and training pipelines — not notebooks.
  • Comfort with GPU profiling and performance analysis. You should be able to look at a benchmark result and know whether the bottleneck is compute, memory bandwidth, or scheduling overhead.
Strong Signals
  • Cost modeling for GPU infrastructure — you’ve had to choose between GPU types and justify the tradeoff.
  • Experience with multimodal models (audio/vision encoders + LLM decoders).
  • Experience with Modal, Ray Serve, or similar serverless GPU platforms.
  • Understanding of audio processing (codecs, chunking, sample rates).
  • Experience building internal tooling that other engineers use — this role succeeds when the rest of the team ships faster.
Not Required
  • ML research background or publications.
  • Prompt engineering expertise.
  • Frontend or full‑stack experience.
  • Master’s/PhD (though it’s fine if you have one).
Benefits
  • The opportunity to shape the foundational software services of a growing company.
  • A role that balances innovation and incremental improvement.
  • A dynamic and collaborative engineering team.
  • Competitive compensation and benefits.
  • A supportive environment that encourages innovation and personal growth.