Senior DevOps/SRE Engineer
Listed on 2026-05-19
IT/Tech
Systems Engineer, Cloud Computing: Infrastructure & Operations, Cybersecurity, SRE/Site Reliability
Atlanta, Austin, Chicago, Dallas, Houston
Description & Requirements
WHAT MAKES US A GREAT PLACE TO WORK
We are proud to be consistently recognized as one of the world’s best places to work. We are currently the top-ranked consulting firm on Glassdoor’s Best Places to Work list and have earned the #1 overall spot a record seven times.
Extraordinary teams are at the heart of our business strategy, but these don’t happen by chance. They require intentional focus on bringing together a broad set of backgrounds, cultures, experiences, perspectives, and skills in a supportive and inclusive work environment. We hire people with exceptional talent and create an environment in which every individual can thrive professionally and personally.
WHO YOU’LL WORK WITH
As the premier consulting partner for the private equity industry, Bain's PEG boasts a global practice that is over three times larger than any competitor. Our network of over 1,000 professionals supports private equity and institutional investor clients through every stage of the investment life cycle, from deal generation and due diligence to portfolio value creation and exit planning.
Bain & Company is developing a suite of cutting-edge data and software solutions designed to revolutionize how the private equity industry uses data for investment insights and decision-making.
The PEG Innovation team's mission is to create analytical solutions for Bain clients, teams, and the broader institutional investor space using proprietary software and data products. This includes the development, commercialization, and daily management of Bain's proprietary datasets, data, and software businesses.
WHERE YOU’LL FIT WITHIN THE TEAM
Senior DevOps/SRE Engineers own the CI/CD pipelines, GitOps infrastructure, Kubernetes operations, and reliability engineering practices that keep the PE platform running at production quality on Microsoft Azure. You make it safe to deploy frequently and easy to recover when things go wrong. You work closely with Platform Engineering, Data Platform, and Product squads to ensure every team can ship confidently and operate their services without heroics.
WHAT YOU’LL DO
Core Platform Reliability, Delivery, and Operations (80%)
- Design, build, and maintain CI/CD pipelines across all repositories using reusable GitHub Actions workflows.
- Own the ArgoCD GitOps configuration; manage application promotion from staging to production.
- Operate and upgrade the EKS cluster; manage node groups, Karpenter provisioners, and cluster add‑ons.
- Maintain the Terraform estate across all environments; review and apply infrastructure changes via Atlantis.
- Define and maintain SLOs, alerting rules, and Grafana dashboards for all platform services.
- Operate and maintain HashiCorp Vault (and/or Azure Key Vault); manage auth backends, policies, and secrets engine configuration.
- Implement and maintain supply chain security controls: image scanning, signing, SBOM generation, and OPA policy enforcement.
- Collaborate with the Security Engineer on network policy, egress controls, and compliance requirements.
- Participate in the on‑call rotation; lead incident response and the post‑incident review process.
- Automate repeatable operational work; reduce manual fixes through tooling and runbook automation.
- Document runbooks proactively and keep them current as systems evolve.
- Use AI tooling to draft infrastructure code and runbook content, validating outputs against security and compliance standards before merging.
- Partner with product and engineering teams to tune reliability practices (SLOs, alerting thresholds, deployment safety checks) and to remove friction from developer workflows.
- Communicate clearly during incidents: calm, factual, and action‑oriented.
- Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field (or equivalent practical experience).
- 6+ years of experience in DevOps, SRE, Platform Engineering, or Production Operations roles supporting cloud‑hosted, multi‑service platforms.
- Demonstrated experience owning production CI/CD, GitOps, and Kubernetes operations for multi‑service platforms.
- Experience operating and upgrading…