Site Reliability Engineer II
Listed on 2026-06-18
IT/Tech
Cloud Computing: Infrastructure & Operations, Systems Engineer, SRE/Site Reliability
Job Description
The Enterprise Technology Services organization partners with every part of the American Express business to power the company’s growth and innovation with trust and efficiency, and drive competitive differentiation with speed. We support the delivery and operations of technology, digital, and data capabilities, platforms, and services globally. Specifically, our team is responsible for the company’s technology engineering, architecture, and infrastructure, providing 24x7 support to ensure an uninterrupted, high-quality experience for customers and colleagues.
We also provide product management for core enterprise platforms, and lead technology risk and information security, enterprise data governance and platforms, digital product and design, and enterprise AI platforms on behalf of the company.
The Site Reliability Engineer II collaborates with engineering teams to enhance system resilience, scalability, and performance through feature development, automation, architectural design, resiliency testing, and disaster recovery planning, while promoting best practices for continuous improvement.
Responsibilities
- Collaborates with Software Engineering teams to design, develop, and implement features that enhance system resilience, scalability, and performance, while identifying and addressing potential system bottlenecks and failure points with guidance from senior colleagues
- Develops and implements automation tools and frameworks, including infrastructure as code (IaC) practices to streamline operational workflows, deployment processes, and infrastructure management, with guidance from peers and leaders
- Collaborates with senior engineers to contribute to the architectural design of systems, ensuring that reliability, scalability, and performance considerations are integrated into design discussions and decision‑making processes
- Collaborates in the design and execution of chaos engineering experiments and other resiliency testing, analyzing results and implementing improvements to enhance system robustness and recovery capabilities, with guidance from peers and leaders
- Develops and implements disaster recovery plans and business continuity strategies, ensuring systems can recover quickly and effectively from unexpected disruptions
- Collaborates with senior engineers to promote and implement best practices such as error budgeting, service‑level objectives (SLOs), and service‑level indicators (SLIs), contributing to a culture of continuous improvement and reliability
- Collaborates and co‑creates effectively with teams in product and the business to align technology initiatives with business objectives
- Participates in a 24x7 on‑call rotation, including a weekend shift at least once every 4–6 weeks
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or Engineering, or comparable experience; advanced degree preferred
- Knowledge of a modern observability stack (e.g., Splunk, Elasticsearch, Prometheus, Grafana)
- Knowledge of containerization technologies (e.g., Kubernetes, Docker) and microservices architecture
- Knowledge of observability tools and methodologies, including experience with logging, monitoring, tracing, and performance analysis platforms
- Knowledge of cloud‑based Site Reliability Engineering (SRE) practices and experience with public cloud platforms such as AWS, Azure, or Google Cloud
- Experience in software development, or technology operations, with a focus on Site Reliability Engineering
- Experience in Linux/Unix systems, object‑oriented programming languages (e.g., Java), scripting languages (e.g., Python, Bash), and cloud platforms (e.g., AWS, Azure, GCP)
Employment eligibility to work with American Express in the UK is required as the company will not pursue visa sponsorship for these positions.
About American Express
At American Express, our culture is built on a 175‑year history of innovation, shared values and Leadership Behaviors, and an unwavering commitment to back our customers, communities, and colleagues. From delivering differentiated products to providing world‑class customer service, we…