Engineering Manager, Inference Developer Productivity
Listed on 2026-02-16
IT/Tech
Systems Engineer, Data Engineer
Overview
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
Role
Anthropic's Inference organization is responsible for serving Claude to millions of users and enterprise customers with the speed, reliability, and efficiency that frontier AI demands. As we scale across multiple accelerator platforms (GPUs, TPUs, and Trainium), the complexity of our development environment grows in lockstep. We're looking for an Engineering Manager to build and lead a new team focused on developer productivity within Inference: making every engineer in the org dramatically more effective at building, testing, and shipping inference software.
This is a leadership role at the intersection of infrastructure and developer experience. You'll own the toolchains, workflows, and feedback loops that Inference engineers depend on every day. You'll set priorities that keep engineering velocity high and drive the investments that keep the broader org productive. You'll partner closely with Anthropic's central Infrastructure organization (where company-wide developer productivity efforts live) while ensuring the Inference org's unique needs are met.
This role is ideal for someone who gets deep satisfaction from unblocking other engineers, who thinks in terms of systems and feedback loops, and who can lead a team that operates at the seam between ML infrastructure and software engineering productivity.
Responsibilities
- Build and lead a high-performing team focused on developer productivity for the Inference organization, hiring engineers who combine infrastructure expertise with a service-oriented mindset
- Own accelerator toolchain management across GPU (CUDA), TPU, and Trainium platforms—keeping compilers, drivers, libraries, and frameworks current, compatible, and well-tested so that Inference engineers can focus on model serving rather than environment issues
- Build infrastructure for efficient accelerator usage during development—including devbox environments, automation for pre- and post-landing validation, and shared tooling that reduces the friction of working across heterogeneous hardware
- Establish and drive productivity metrics across the Inference org, creating dashboards, alerts, and processes that surface slowdowns early (e.g., smoke tests red for extended periods, build times regressing, toolchain breakages) and ensure rapid resolution
- Identify and eliminate inefficiencies across Inference engineering workflows—proactively finding bottlenecks, toil, and friction points that slow down the org, and building systems or driving process changes to address them
- Partner with Anthropic's Infrastructure org to align on company-wide developer productivity initiatives, contribute Inference-specific requirements, and avoid duplicated effort while ensuring Inference's specialized needs (multi-accelerator support, large-scale testing) are well served
- Coach and develop engineers on your team, providing clear direction, actionable feedback, and growth opportunities in a fast-moving environment
You may be a good fit if you have:
- 3+ years of engineering management experience, ideally leading infrastructure, platform, or developer productivity teams
- Strong technical background in systems engineering, build/test infrastructure, or ML infrastructure—you can go deep on toolchain issues, CI/CD pipelines, and developer workflow optimization
- Experience managing toolchains or development environments for compute-intensive workloads (ML training/inference, HPC, large-scale distributed systems)
- Familiarity with at least one accelerator ecosystem (CUDA/GPU, TPU, or Trainium/AWS Neuron) and an appetite to learn the others
- A track record of defining and using engineering metrics to drive organizational improvement: you've built dashboards, set SLOs on developer workflows, or led initiatives that measurably improved engineering velocity
- Experience partnering across…