Lead Site Reliability Engineer
Listed on 2026-07-02
IT/Tech
Systems Engineer, SRE/Site Reliability, Cloud Computing: Infrastructure & Operations
Role Overview
Lead Site Reliability Engineer – The Business Operations team seeks a highly motivated and experienced Lead SRE to ensure the reliability, scalability, and performance of our applications that support Mastercard’s global operations. The role involves delivering technical expertise, fostering automation, and mentoring staff.
Responsibilities
- Develop subject‑matter expertise in Site Reliability Engineering, influencing stakeholders and advancing solutions for existing products and services.
- Implement and maintain high‑availability system solutions to ensure stability, performance, and operational continuity.
- Evaluate operational requirements and develop effective technical solutions within existing frameworks.
- Lead automation and scripting efforts to streamline operational processes and incident response workflows.
- Troubleshoot and resolve complex system issues, escalating as necessary to maintain system health and proactively address risks.
- Contribute to documentation, knowledge sharing, and best practices to improve team operational procedures.
- Conduct reviews and quality‑assurance activities to uphold organizational standards for system stability.
- Keep current with industry trends and emerging technologies relevant to system reliability and operational automation.
- Guide and mentor junior team members, fostering a culture of continuous improvement.
- Observability – Implement solutions that enable collection, analysis, and visualization of metrics, logs, and traces for incident detection and continuous improvement.
- Programming and Scripting – Write and maintain code and scripts to automate tasks, build operational tools, and support monitoring, deployment, and incident response using languages such as Python, Go, Bash, or similar.
- Systems and Network Administration – Configure, operate, and troubleshoot Linux/Unix systems and network components with knowledge of networking, security, and system reliability.
- Cloud Computing and Infrastructure – Design, deploy, and manage applications and infrastructure on cloud platforms (AWS, Azure, GCP) to ensure scalability, security, availability, and operational efficiency.
- Reliability and Scalability – Design and operate systems for high availability, fault tolerance, and disaster recovery, ensuring scalability to meet current and future demand.
- DevOps Practices – Apply DevOps principles and practices, including CI/CD pipelines, containerization, and orchestration, to accelerate software delivery and operations.
- Troubleshooting – Systematically identify, diagnose, and resolve technical issues across systems, applications, and networks to restore functionality and minimize disruption.
- Capacity Planning and Performance Optimization – Monitor resource utilization, forecast future capacity needs, and optimize system performance to support growth and efficient infrastructure usage.
- IT Service Management – Apply IT service management principles to incident, problem, and change management, ensuring reliable service delivery aligned to business needs.
- Proactive Monitoring and Improvement – Use application reliability signals to anticipate issues, identify risks, and drive preventative improvements that enhance application performance and availability.
- Abide by Mastercard’s security policies and practices.
- Ensure the confidentiality and integrity of the information being accessed.
- Report any suspected information security violation or breach.
- Complete all periodic mandatory security trainings in accordance with Mastercard’s guidelines.
In line with Mastercard’s total compensation philosophy, the successful candidate will be offered a competitive base salary and may be eligible for an annual bonus or commissions. Benefits for full‑time employees generally include insurance (medical, prescription drug, dental, vision, disability, life), a flexible spending account and health savings account, paid leave (including 16 weeks of new parent leave and up to 20 days of bereavement leave), 80 hours of paid sick and safe time, 25 days of vacation, 5 personal days, 10 U.S. paid holidays, 401(k) with a company match, deferred compensation for eligible roles, fitness reimbursement or on‑site fitness facilities, tuition reimbursement, and more.
Interns receive paid sick/safe time, jury duty leave, and on‑site fitness facilities in certain locations.
Mastercard is a merit‑based, inclusive, and equal‑opportunity employer that considers applicants without regard to gender, gender identity, sexual orientation, race, ethnicity, disability or veteran status, or any other characteristic protected by law.
Pay Ranges
O'Fallon, Missouri: $122,000 – $207,000 USD