Lead Site Reliability Engineer Job, Springfield area, Missouri, USA, IT/Tech

Title and Summary

Lead Biz Ops Engineer

The Mastercard Business Operations (Biz Ops) organization is seeking a Lead Biz Ops Engineer to serve as a technical authority and operational architect across critical platforms. This role is designed for a senior individual contributor who thrives on system‑level thinking, drives SRE and DevOps maturity at scale, and influences outcomes across programs and portfolios.

As a Lead Biz Ops Engineer, you will operate beyond a single application or team, shaping reliability strategy, defining standards, and elevating operational excellence across Mastercard’s most business‑critical services. You will partner deeply with product engineering, architecture, security, and leadership to ensure platforms are designed, delivered, and operated with resilience, scalability, and customer trust at their core.

Overview

If this describes you, you'll feel at home here: you proactively design out operational risk rather than reacting to it; you influence without authority and lead through technical credibility and data; you see CI/CD, automation, observability, and reliability as foundational engineering disciplines, not tooling exercises. Biz Ops is at the forefront of Mastercard’s Operational Resilience evolution, driving modern tooling, standardized practices, and consistent operating models across the enterprise.

Mission

Biz Ops acts as the production readiness and operational resilience steward for Mastercard platforms. Your mission is to embed reliability, operability, and compliance into platform design and delivery, ensuring services are highly available, resilient, and performant; observable, self‑healing, and automation‑driven; secure, compliant, and auditable by design; and operated through repeatable, scalable, low‑toil processes. You will provide continuous feedback loops into engineering and product teams, ensuring lessons learned from production meaningfully improve future designs and customer experience.

What We Do in Biz Ops

We deliver this mission through deep incident ownership with rigorous root‑cause analysis tied to business impact; a shift‑left operational mindset that influences architecture and design before code reaches production; enterprise‑grade risk management, controls, and compliance oversight; standardized and streamlined support models that reduce friction for partners; and by bridging product intent and operational reality to deliver reliable, customer‑centric platforms. At the Lead level, you are expected to shape these practices, not just execute them.

Key Responsibilities

Technical Leadership & Architecture

Act as a Lead‑level technical authority for reliability, operability, and production readiness across multiple platforms or programs. Influence system architecture, design patterns, and platform standards to improve resiliency, scalability, and fault tolerance. Partner with engineering and architecture teams during pre‑production and roadmap phases to guide capacity planning, failure modeling, and launch readiness. Challenge designs constructively, advocating for operational simplicity, automation, and sustainable on‑call models.

Operational Excellence & Reliability

Own and evolve availability, latency, performance, and reliability objectives for critical systems. Lead complex production events and cross‑platform investigations, reducing MTTR through systemic fixes, not workarounds. Champion blameless postmortems, ensuring remediation actions translate into measurable reliability improvements. Identify recurring failure patterns and drive engineering‑led elimination of toil.

DevOps, CI/CD & Automation

Provide leadership for CI/CD strategy, ensuring pipelines support automated validation, risk‑based gating, and safe, repeatable deployments. Drive adoption of automation‑first practices across build, deploy, test, recovery, and compliance workflows. Influence DevOps standards across teams, enabling consistent, high‑quality software delivery at scale.

Observability & Self‑Healing Systems

Define and promote standards for monitoring, alerting, SLOs, and telemetry. Enable proactive detection, predictive alerting, and self‑healing capabilities across…