Evaluation Reliability SRE

Job in Cupertino, Santa Clara County, California, 95015, USA
Listing for: Apple
Full Time position
Listed on 2026-06-02
Job specializations:
  • IT/Tech
    Systems Engineer, SRE/Site Reliability
Job Description & How to Apply Below
Weekly Hours: 40

Role Number:

Summary

Siri's quality signal drives every model and product decision before a release ships. But a signal is only as trustworthy as the infrastructure behind it.

The Evaluation Reliability Engineering (ERE) team exists to make that infrastructure bulletproof. Within ERE, Core SRE owns the production backbone: resource management, session orchestration, on-call response, and the observability systems that surface failures before they corrupt evaluation signal. We sit at the intersection of distributed systems, ML evaluation infrastructure, and operational excellence.

Description

This is a senior hands-on role. You share primary on-call as part of a global follow-the-sun rotation, lead incident investigations end-to-end, and set the operational bar the rest of the team works against. You are fluent with agentic coding tools like Claude Code, Cursor, or Copilot, and use them as a force multiplier across runbook authoring, automation, and log analysis.

Minimum Qualifications

+ 5+ years of site reliability, infrastructure, or platform engineering experience with direct on-call ownership in production systems

+ Hands-on orchestration experience (Kubernetes or equivalent): cluster health, resource management, scheduling, and failure diagnosis at scale

Preferred Qualifications

+ Experience owning or closely operating a device or VM provisioning pipeline; familiarity with virtualization-layer failure modes is a strong plus

+ Track record of improving system reliability against measurable outcomes - uptime, MTTR, incident frequency - not just responding to incidents but eliminating their causes

+ Incident command discipline: able to lead a multi-team incident from declaration to close-out

+ Depth in at least one of: distributed systems reliability, device management infrastructure, evaluation or ML platform operations

+ Demonstrated cross-team technical influence; prior experience shaping reliability practices beyond the immediate team