Site Reliability Engineering (SRE) - Data Platform
Listed on 2026-02-07
IT/Tech
Cloud Computing, SRE/Site Reliability
Overview
Summary
Apple Services Engineering (ASE) designs and maintains the systems, platforms, and infrastructure that support Apple's global services, such as Apple Music, iCloud, Siri, Maps, and more. ASE services must scale globally, remain highly available, and perform consistently. If you are passionate about designing, engineering, and running systems and infrastructure that will help millions of customers, this is the place for you.
Description
Apple Services infrastructure is planetary scale. Our Data Platform Site Reliability Engineering team manages the infrastructure and applications on bare-metal and cloud computing platforms to deliver data processing, governance, and storage for many of Apple’s global products and organizations. Our platform teams work with exabytes of data, terabytes of memory, and hundreds of thousands of jobs running millions of executors to support predictable and performant data analytics.
Our platform enables key features in Apple Music, TV, Maps, News, and other world-class products. Ensuring that all of these technologies work together in harmony across geographically distributed data centers presents unique challenges. As an SRE at Apple, you’ll solve problems as they arise using empirical data, teamwork, and your own unique expertise.
Data Platform Services SREs work directly with our partner engineering teams, collaborating closely with software developers to deliver seamless experiences for our customers. We run a mix of open-source, vendor-licensed, and proprietary tools, which you will use and have opportunities to improve. The cross-functional team works together to apply a consistent incident management process across all data platform services and to provide user-journey-based SLOs derived from comprehensive observability metrics, a high-availability architecture, and deployment automation.
We think critically and strive to balance long-term optimal solutions with the business priorities for each engineering challenge we face. Good ideas are heard and results are rewarded.
- BS/MS in Computer Science or equivalent
- 5+ years of software development or production operations experience in a large-scale environment
- Proficiency in authoring and releasing code in Go, Python, or Java using common configuration management and software delivery platforms
- Experience operating production applications at scale, including well-designed performance testing, HA and disaster recovery concepts, capacity planning, and managing distributed systems on internal and public cloud infrastructure, principally Kubernetes
- Understanding of the Linux operating system, containers and virtualization, and standard networking protocols and components
- Strong sense of ownership and integrity demonstrated through clear communication and collaboration
- Excellent troubleshooting and problem-solving skills, applying the scientific method
- Proficiency with the architecture, deployment, performance tuning, and troubleshooting of open source data analytics or governance technologies such as Apache Spark, Flink, Hive, Hadoop/HDFS, Trino, and/or Druid.
- The successful candidate is frustrated with toil and has an acute drive to both automate manual operations and evolve them into automatic processes.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.
Apple accepts applications to this posting on an ongoing basis.