Site Reliability Engineer II, Greater London, England, UK (IT/Tech)

Location: Greater London

Job Description

The Enterprise Technology Services organization partners with every part of the American Express business to power the company’s growth and innovation with trust and efficiency, and drive competitive differentiation with speed. We support the delivery and operations of technology, digital, and data capabilities, platforms, and services globally. Specifically, our team is responsible for the company’s technology engineering, architecture, and infrastructure, providing 24x7 support to ensure an uninterrupted, high-quality experience for customers and colleagues.

We also provide product management for core enterprise platforms, and lead technology risk and information security, enterprise data governance and platforms, digital product and design, and enterprise AI platforms on behalf of the company.

The Site Reliability Engineer II collaborates with engineering teams to enhance system resilience, scalability, and performance through feature development, automation, architectural design, resiliency testing, and disaster recovery planning, while promoting best practices for continuous improvement.

Responsibilities

Collaborates with Software Engineering teams to design, develop, and implement features that enhance system resilience, scalability, and performance, while identifying and addressing potential system bottlenecks and failure points with guidance from senior colleagues
Develops and implements automation tools and frameworks, including infrastructure as code (IaC) practices to streamline operational workflows, deployment processes, and infrastructure management, with guidance from peers and leaders
Collaborates with senior engineers to contribute to the architectural design of systems, ensuring that reliability, scalability, and performance considerations are integrated into design discussions and decision‑making processes
Collaborates on the design and execution of chaos engineering experiments and other resiliency testing, analyzing results and implementing improvements to enhance system robustness and recovery capabilities, with guidance from peers and leaders
Develops and implements disaster recovery plans and business continuity strategies, ensuring systems can recover quickly and effectively from unexpected disruptions
Collaborates with senior engineers to promote and implement best practices such as error budgets, service-level objectives (SLOs), and service-level indicators (SLIs), contributing to a culture of continuous improvement and reliability
Collaborates and co‑creates effectively with teams in product and the business to align technology initiatives with business objectives
Participates in a 24x7 on-call rotation, including a weekend shift at least once every 4–6 weeks

Qualifications

Education

Bachelor’s degree in Computer Science, Information Technology, or Engineering, or comparable experience; advanced degree preferred
Knowledge of a modern observability stack (e.g., Splunk, Elasticsearch, Prometheus, Grafana)
Knowledge of containerization technologies (e.g., Kubernetes, Docker) and microservices architecture
Knowledge of observability tools and methodologies, including experience with logging, monitoring, tracing, and performance analysis platforms
Knowledge of cloud‑based Site Reliability Engineering (SRE) practices and experience with public cloud platforms such as AWS, Azure, or Google Cloud

Work Experience

Experience in software development or technology operations, with a focus on Site Reliability Engineering
Experience with Linux/Unix systems, object-oriented programming languages (e.g., Java), scripting languages (e.g., Python, Bash), and cloud platforms (e.g., AWS, Azure, GCP)

Employment Eligibility

Employment eligibility to work with American Express in the UK is required as the company will not pursue visa sponsorship for these positions.

About American Express

At American Express, our culture is built on a 175‑year history of innovation, shared values and Leadership Behaviors, and an unwavering commitment to back our customers, communities, and colleagues. From delivering differentiated products to providing world‑class customer service, we…