Senior Engineer, Datacenter Server Lifecycle

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Anthropic
Full Time position
Listed on 2026-03-02
Job specializations:
  • IT/Tech
    Systems Engineer, Data Engineer
Salary/Wage Range or Industry Benchmark: 320,000 - 405,000 USD yearly
Job Description & How to Apply Below

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the Role

Anthropic is expanding beyond cloud infrastructure, and this role sits at the heart of that effort. As a Senior Engineer on the Datacenter Machine Lifecycle team, you will own the end‑to‑end operational journey of every machine in our facility — from initial provisioning and deployment, across its working life, through maintenance and refresh, and all the way to decommissioning. This is greenfield work: you will help define the processes, tooling, and operational standards that govern how we run and retire hardware at scale.

A distinguishing aspect of this role is its deep intersection with security. The machines in our datacenter handle some of the most sensitive workloads in AI — training frontier models and serving millions of users interacting with Claude. Ensuring that every machine in the fleet is trusted, attested, and operating with a verified chain of integrity from the hardware up is a core part of the job, not an afterthought.

You will partner closely with our Infrastructure Security team to define and enforce trusted compute standards across the lifecycle, from secure provisioning through end‑of‑life handling.

Responsibilities
  • Lead the build‑out of automation to support datacenters containing tens of thousands of servers.
  • Own and define the end‑to‑end machine lifecycle strategy — from provisioning and deployment through operation, maintenance, refresh, and decommissioning — and maintain automation and operational procedures for common lifecycle events (e.g. hardware failures, firmware upgrades, fleet rotations).
  • Partner closely with Infrastructure Security to design and enforce trusted compute standards across the machine lifecycle.
  • Work closely with our Networking team to ensure end‑to‑end connectivity across all sites.
  • Build and maintain tooling to track machine health, configuration, and operational status across the full datacenter fleet.

Qualifications
  • Have 5+ years of experience in datacenter operations, hardware infrastructure management, or a closely related discipline.
  • Have deep, hands‑on experience with server hardware — including rack deployment, cabling, troubleshooting, and understanding failure modes at scale.
  • Understand hardware lifecycle management end‑to‑end: asset tracking, provisioning workflows, maintenance scheduling, and decommissioning practices.
  • Have strong proficiency in at least one programming language (e.g., Python, Rust, Go, or Java).
  • Are comfortable navigating ambiguity and working independently to drive progress on complex, cross‑functional problems.
  • Communicate clearly and can build consensus with a wide range of stakeholders.
  • Have working knowledge of modern cloud infrastructure, including Kubernetes, Infrastructure as Code, AWS, and GCP.
  • Are comfortable with occasional travel to datacenter sites across North America.

Strong Candidates May Also Have
  • Hands‑on experience with GPU or AI accelerator hardware (e.g. NVIDIA A100/H100, AMD MI300, Google TPUs, or AWS Trainium) and an understanding of their operational demands.
  • Familiarity with modern provisioning tooling such as coreboot, LinuxBoot, or u-root.
  • Experience building or contributing to datacenter automation or fleet management platforms.
  • Experience building and deploying server operating system distributions across thousands of hosts.
  • A background in large‑scale capacity planning and hardware refresh strategy, ideally at a hyperscaler or large cloud provider.
  • Experience with trusted compute and hardware security concepts such as secure boot, TPM, hardware attestation, and firmware verification — or a strong desire to develop deep expertise in this area.

Compensation

$320,000 - $405,000 USD

Logistics

Education requirements:

We require at least a Bachelor's degree in a related field or equivalent experience.

Location‑based hybrid…

Position Requirements
10+ Years work experience