
Site Reliability Engineer - Info Apps

Job in Austin, Travis County, Texas, 78719, USA
Listing for: Apple Inc.
Full Time position
Listed on 2026-06-02
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability
Job Description & How to Apply Below
Do you love building and scaling infrastructure that delights millions of customers? At Apple, we believe reliability is a feature. We are looking for a Site Reliability Engineer to join our team in overseeing the performance and availability of the core backend services for our News, Stocks, Weather, Books, and Creator Studio applications. As an SRE, you won't just be responding to alerts; you will shape the evolution of our observability strategy, act as a mentor for incident management, and champion automation. You will help us refine our "Golden Signals" and ensure our Kubernetes-based ecosystem remains world-class.

In this role, you will be a key pillar of our engineering organization, ensuring that our services remain highly available and performant. Your impact will include:
System Architecture: Designing and implementing the next generation of our telemetry and alerting systems.
Reliability Engineering: Defining SLOs/SLIs and ensuring our monitoring strategy captures the true health of the user experience.
Operational Excellence: Reducing operational load through software; if you have to do it twice, you'll want to automate it.
Collaboration: Partnering with App Dev teams to influence the "design for reliability" phase of the software development lifecycle.
Mentorship: Acting as a technical lead for junior members and offshore partners, providing guidance on runbook development and disaster recovery.

Search & Data: Specialized experience operating and tuning Solr or Elasticsearch.
Networking: Strong understanding of TCP/IP, load balancing (ELB/ALB), and service mesh (Istio/Linkerd).
Data Systems: Experience with Kafka, Cassandra, or Postgres in a distributed environment.

Experience: 5+ years in SRE, DevOps, or Infrastructure roles with a proven track record of managing high-traffic, internet-facing production environments.
Kubernetes Expertise: Deep experience building and operating container orchestration systems (EKS/GKE/Vanilla K8s). You should be comfortable troubleshooting from the networking layer up to the application pod.
Observability Champion: Expert knowledge of the four Golden Signals (Latency, Traffic, Errors, and Saturation). Proficiency with tools like Prometheus, Grafana, and Splunk is essential.
Cloud Proficiency: Hands-on experience designing and maintaining resilient infrastructure on public cloud providers (AWS, GCP, or Azure).
Scripting & Automation: Strong ability to code at a scripting level (Python or Go preferred) to automate toil and build self-healing systems.
Incident Leadership: Experience leading incident response, performing Root Cause Analysis (RCA), and implementing blameless post-mortems to improve system resilience.
Infrastructure as Code: Proficient in Terraform, CloudFormation, or Pulumi to manage immutable infrastructure.