AI Production Engineer
Listed on 2026-02-28
Software Development
AI Engineer
About Distyl AI
Distyl AI develops production‑grade AI systems to power core operational workflows for Fortune 500 companies. Powered by a strategic partnership with OpenAI, in‑house software accelerators, and deep enterprise AI expertise, we deliver working AI systems with rapid time to value – within a quarter. Our products have helped Fortune 500 customers across diverse industries, from insurance and CPG to non‑profits. As part of our team, you will help companies identify, build, and realize value from their GenAI investments, often for the first time.
We are customer‑centric, working backward from the customer’s problem and holding ourselves accountable for both creating financial impact and improving the lives of end‑users. Distyl is led by proven leaders from top companies like Palantir and Apple and is backed by Lightspeed, Khosla, Coatue, Dell Technologies Capital, Nat Friedman (former CEO of GitHub), Brad Gerstner (Founder and CEO of Altimeter), and board members of over a dozen Fortune 500 companies.
At Distyl, AI systems live or die on latency, reliability, and operational excellence.
AI Production Engineers focus on building and operating AI systems that perform in real time, at scale, under strict reliability constraints. They work with the rest of the forward deployed team to ensure our AI systems are reliable, performant, and secure from the ground up. These engineers are hands‑on system owners who bring deep expertise in production engineering—low‑latency services, voice pipelines, batch processing, and observability—while remaining fully accountable for the behavior and value of customer‑facing AI systems.
They thrive in high‑performance environments and take personal ownership of making sure AI systems are fast, stable, and trustworthy in production.
- Own the performance and reliability characteristics of AI systems deployed in customer environments
- Design, build, and operate low‑latency AI services—including real‑time voice and interaction pipelines—as well as large‑scale batch processing workflows that execute complex AI workloads reliably
- Serve as the escalation point for performance and reliability risk, with veto power over launches that violate production constraints
- Stay deeply involved in system design, implementation, and operation, investigating performance bottlenecks, failure modes, and scaling limits across AI pipelines, APIs, orchestration layers, and infrastructure
- Design and evolve observability systems—metrics, logs, tracing, alerts—that make AI behavior understandable and actionable in production
- Work directly with Forward Deployed AI Engineers, Product Engineers, and Architects to ensure that production constraints meaningfully shape system design
- Step in on high‑risk or high‑impact issues, debug live systems, and harden AI services so they can operate continuously under real‑world load
- Help turn one‑off production solutions into reusable patterns and platform capabilities, raising the overall production bar for Distyl’s AI systems over time
- 3+ years of software engineering experience
- Deep Production Engineering Experience: Built and operated high‑scale systems (low‑latency APIs, streaming pipelines, real‑time services, or large batch processing systems) and can reason deeply about performance, throughput, and reliability. Experience with real‑time voice systems is a strong plus.
- Strong Systems and Backend Fundamentals: Write high‑quality production code and understand distributed systems concepts such as concurrency, fault tolerance, back pressure, and graceful degradation. You are comfortable optimizing systems under tight latency and throughput constraints.
- Operational Excellence Mindset: Treat observability, instrumentation, and incident response as first‑class concerns. Logging, metrics, tracing, alerting, and on‑call readiness are integral to how you design and operate systems.
- Ownership of AI Systems in Production: Take responsibility for AI systems end‑to‑end: design, deployment, monitoring, and ongoing health. When something breaks, you care about understanding why, fixing it properly, and…