Engineering Manager, Platform Services
Listed on 2026-06-05
IT/Tech
SRE/Site Reliability, Cloud Computing
Austin, Texas, United States
Software and Services
At Apple, new insights often become revolutionary products, services, and customer experiences very quickly. Bring passion and dedication to your job, and there's no telling what you could accomplish. Enterprise Technology Services (ETS) is part of IS&T and delivers global-scale platforms and services that keep Apple's operations secure and running. The team manages identity, device security, and anti-abuse platforms — covering everything from manufacturing and repairs to software updates and activations.
ETS also oversees supply chain, manufacturing, and partner integration platforms, protecting data on more than 2.5 billion devices worldwide. And when Apple prepares for a global product launch, ETS owns the systems that ramp factory production — managing serial numbers, network credentials, and verified software. The Emerging Technologies team specializes in building forward-looking, extremely scalable platforms. The team is passionate about solving challenging problems, exploring new domains, and engineering transformational solutions.
The diversity of our team and thinking inspires innovation that runs through everything we do.
As a Manager for Site Reliability and Operations (SRE), you will lead a team of Site Reliability Engineers to ensure the reliability, scalability, and performance of production systems. This role combines technical expertise with leadership skills to drive operational excellence and foster a culture of collaboration and continuous improvement. As part of the role, you will work with the team to automate operations, optimize infrastructure, and troubleshoot issues in an exciting, fast‑paced environment.
This role is designed for driven individuals who:
• Love learning new technologies and thrive in solving complex challenges.
• Are comfortable in a fast‑paced, changing environment and able to manage competing priorities.
• Work effectively across teams and influence without authority.
• Are independent, motivated, and excited to take on ambitious projects.
• Excel at collaborating with engineering teams and can stay calm under pressure.
• Have a passion for delivering quality, reliable solutions in a dynamic, high‑energy workplace.
- Lead, mentor, and develop a team of SREs.
- Foster a culture of reliability and excellence within the team.
- Promote continuous learning and knowledge sharing.
- Help the team build and maintain robust, highly available systems.
- Automate CI/CD processes.
- Ensure the availability and performance of production systems.
- Oversee incident response, post‑mortem analysis, and root cause investigations.
- Implement and maintain service‑level objectives (SLOs) and service‑level indicators (SLIs).
- Work closely with development, quality, product, and other engineering teams to ensure reliability is prioritized in the development lifecycle.
- Communicate effectively with stakeholders regarding reliability metrics, incident reports, and team progress.
- Develop and execute a strategic roadmap for the SRE team.
- Identify areas for improvement and propose solutions that align with business goals.
- Optimize resource allocation and usage for operational efficiency.
- Identify and assess risks to production systems and work to mitigate them.
- BS degree or higher in Computer Science or a related field.
- 5+ years in a site reliability engineering, DevOps, or related role, with at least 2 years in a lead capacity.
- Strong understanding of systems architecture, cloud infrastructure, and monitoring tools.
- Proficiency in one or more programming languages, in particular Java.
- Proven experience in leading and mentoring engineering teams.
- Strong analytical skills and the ability to troubleshoot complex systems.
- Knowledge of the fundamentals of networking, databases, system administration, version control, and CI/CD automation.
- Machine learning experience is a plus.
- Strong problem‑solving and communication skills.
- Knowledge of container‑based technologies such as Docker, Kubernetes, or EKS.
- Knowledge of modern web services architectures and cloud platforms such as AWS and GCP.
- Exceptiona…