Engineering Manager, Accelerator Platform
San Francisco, CA NY | Seattle, WA
Listed on 2026-02-20
IT/Tech
Systems Engineer, Hardware Engineer
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About the Role
Every time someone talks to Claude (through the API, claude.ai, our cloud partners, or any of our expanding surfaces), the request lands on an AI accelerator. Not one kind, many kinds: TPUs, Trainium chips, GPUs. Each arrives with its own software stack, performance characteristics, failure modes, and operational quirks. Someone has to take raw silicon and turn it into a platform that the rest of Anthropic can build on without thinking about which chip is underneath.
That’s us.
The Accelerator Platform team owns the bring‑up and normalization of new hardware platforms for Anthropic's first‑party inference fleet. We sit between the low‑level systems teams and the serving infrastructure that runs production inference—bridging the gap so that every new accelerator generation ships as a first‑class production platform. It’s deeply technical work at the intersection of hardware enablement, distributed systems, and ML infrastructure, and it is directly on the critical path for Anthropic's compute strategy.
We’re hiring an Engineering Manager to build and lead this team. You'll inherit a small nucleus of experienced engineers and grow it into a standalone platform organization. You'll set technical direction, hire a strong team, and partner closely with hardware vendors, cloud providers, and teams across Inference to bring new accelerator generations online quickly and reliably.
Responsibilities
- Build and lead the Accelerator Platform team: hiring, developing, and retaining engineers who thrive at the hardware/software boundary
- Own the end‑to‑end bring‑up lifecycle for new accelerator platforms (multiple generations of Trainium, TPUs, and GPUs), from initial silicon availability through production‑ready inference
- Define and drive the platform normalization layer—ensuring new hardware integrates cleanly with Anthropic's inference serving stack to provide a consistent abstraction
- Partner with cloud providers (AWS, GCP, Microsoft Azure) and chip vendors on hardware roadmaps, capacity planning, and platform‑specific technical challenges
- Collaborate closely with teams across Inference and Infrastructure to ensure new platforms meet production reliability and latency requirements from day one
- Contribute to Anthropic's multi‑cloud compute strategy—helping the organization maintain optionality across accelerator families and avoid lock‑in to any single vendor
- Manage the team's priorities across competing demands: new platform bring‑up, ongoing production support for existing platforms, and longer‑term investments in tooling and automation
You may be a good fit if you
- Have significant experience managing infrastructure or platform engineering teams (3+ years in engineering management)
- Have deep technical fluency in systems programming, distributed systems, or hardware/software co‑design—understand the stack deeply enough to make sound technical and hiring decisions
- Have experience bringing up or operating heterogeneous compute infrastructure at scale—whether that’s GPU clusters, TPU pods, custom ASICs, or FPGA deployments
- Are comfortable with ambiguity and can build structure where none exists; this team will be carved out as a new entity and you’ll define its charter, processes, and culture from scratch
- Think strategically about hardware roadmaps and can translate vendor capabilities into engineering plans
- Build strong cross‑functional relationships—this role requires tight collaboration with hardware vendors, cloud partners, and multiple internal teams
- Care deeply about both technical excellence and the people doing the work
- Have direct experience with ML accelerator architectures (GPU/CUDA, TPU/XLA, Trainium/Neuron, or similar)
- Have worked on ML inference serving infrastructure at scale (1000+ accelerators)
- Have…