Site Reliability Engineer - Human Engineering
Job in Austin, Travis County, Texas, 78719, USA
Listed on 2026-06-03
Listing for: Apple Inc.
Full Time position
Job specializations: IT/Tech; Systems Engineer, Cloud Computing
Job Description
We're looking for a Site Reliability Engineer who thinks like a systems engineer first and an operator second. You won't just keep things running; you'll shape how our platform evolves. Our team operates 50+ services across Kubernetes and AWS, handles sensitive health and research data, and is undertaking several major architectural shifts: new service-to-service auth patterns, event-driven pipelines, and a move from on-prem to cloud-native infrastructure.
We need someone who gets excited about that kind of work, can reason about distributed systems at the design level, and is a strong enough communicator to bring the rest of the team along.
The Human Engineering Software team builds tools used across Apple for user studies, research participant management, health data collection, and privacy-preserving analytics. Our infrastructure spans Django backends, Kubernetes clusters (self-hosted and AWS), PostgreSQL, Redis, Kafka, Elasticsearch, and a growing set of internal service integrations. This is an engineering-forward SRE role: you'll spend as much time designing systems as operating them. You'll work closely with our full-stack engineers to improve how services communicate, how we observe production behavior, and how we ship changes safely. You'll have a seat at the architecture table; we want you proposing solutions, not just implementing them.
BS in Computer Science, Engineering, or equivalent practical experience, with 7+ years of experience in distributed systems
Experience with event-driven architectures (Kafka, RabbitMQ, or similar messaging systems)
Experience with service mesh or API gateway patterns (Istio, Envoy, Kong, or similar)
Familiarity with Django/Python web applications and their operational characteristics (Celery, Gunicorn, PostgreSQL)
Experience with observability tooling beyond basic monitoring: distributed tracing, SLO frameworks, structured logging
Background working with sensitive data (health data, PII) and associated compliance requirements
Experience leading incident response and building on-call culture
Contributions to internal or open-source infrastructure tooling
BS in Computer Science, Engineering, or equivalent practical experience, with 5+ years of experience in distributed systems
Deep experience with Kubernetes in production: cluster operations, networking, storage, troubleshooting
Strong proficiency designing and operating services in AWS (EC2, EKS, RDS, S3, IAM, VPC)
Hands-on infrastructure-as-code experience (Terraform, Helm, or equivalent)
Proficiency in at least one backend language (Python, Go, or similar); you can write production services, not just scripts
Experience with CI/CD pipeline design and GitOps workflows
Strong understanding of networking fundamentals: DNS, load balancing, TLS, firewall rules, service discovery
Excellent communication skills: you can explain a complex system to a room of engineers who didn't build it
Experience building internal automation or self-service tooling (Slack bots, CLI tools, workflow orchestration) that reduced manual operational work