Lead SRE/DevOps Engineer
Listed on 2025-12-23
IT/Tech
IT Support, Cloud Computing, Systems Engineer, SRE/Site Reliability
At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Synechron’s progressive technologies and optimization strategies span end-to-end Artificial Intelligence, Consulting, Digital, Cloud & DevOps, Data, and Software Engineering, servicing an array of noteworthy financial services and technology firms. Through research and development initiatives in our FinLabs, we develop solutions for modernization, from Artificial Intelligence and Blockchain to Data Science models, Digital Underwriting, mobile-first applications, and more.
Over the last 20+ years, our company has been honored with multiple employer awards, recognizing our commitment to our talented teams. With top clients to boast about, Synechron has a global workforce of 14,500+, and has 58 offices in 21 countries within key global markets.
We are seeking a highly skilled Lead Site Reliability Engineer (SRE) / DevOps Engineer to drive the reliability, observability, and operational excellence of our platforms. This role will lead major initiatives around monitoring, automation, incident response, and performance optimization, leveraging enterprise tools such as Dynatrace, BigPanda, and LogScale/MonPro. The candidate will partner closely with engineering, operations, and product teams to build robust systems, improve service availability, and ensure a seamless user experience through proactive observability and best-in-class SRE practices.
Additional Information
The base salary for this position will vary based on geography and other factors. In accordance with law, the base salary for this role if filled within Pittsburgh, PA/Dallas, TX is $125k - $135k/year & benefits (see below).
The Role
Responsibilities:
Observability & Monitoring
- Implement and enhance proactive observability frameworks to anticipate and mitigate issues before they occur.
- Optimize experience monitoring and user interaction metrics across applications and services.
- Manage and improve the event catalog, ensuring all system events are structured and actionable.
- Build and maintain dashboards, alerts, and health reporting using tools like Dynatrace, BigPanda, MonPro, and LogScale.
- Perform service tuning to improve system performance based on real-time metrics and data analysis.
- Establish and maintain observability standards and best practices across teams.
- Conduct chaos testing and resilience validation to ensure high system availability.
- Lead anomaly detection practices to quickly identify and respond to unusual system behavior.
SRE Practices
- Ensure platform stability, performance, and reliability through proven reliability engineering principles.
- Drive SRE initiatives, including continuous improvement projects within the Site Reliability Center.
- Develop, maintain, and scale automated orchestration pipelines to streamline operations and improve efficiency.
- Create, maintain, and enforce SRE standards, including SLIs, SLOs, and operational playbooks.
- Lead and conduct root cause analysis for critical incidents and drive long-term remediation improvements.
- Own the problem management lifecycle—identifying, tracking, and resolving underlying issues to prevent recurring incidents.
- Collaborate with cross-functional teams to address systemic issues and drive operational resilience.
Requirements:
- 7+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
- Hands-on expertise with observability/monitoring tools such as:
- LogScale / MonPro / LogicMonitor or similar log and metrics platforms
- Solid experience with cloud platforms (AWS, Azure, or GCP).
- Strong proficiency in automation & orchestration (Terraform, Ansible, Jenkins, GitHub Actions, etc.).
- Proven track record in incident management, RCA, and implementing…