Lead Site Reliability Engineer
Syracuse, Onondaga County, New York, 13201, USA
Listed on 2026-05-30
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability
Could your creative thinking build the future? A Lead Site Reliability Engineer at McGraw Hill makes a difference for learners and educators across the world. Our team needs individuals with new ideas who connect with people in innovative ways.
Impact the Moment
McGraw Hill, a leading provider of digital educational resources and content, is seeking a Lead Site Reliability Engineer to lead a team of 6 Engineers for our Digital Platform Group. You will support our K‑12 learning platforms that serve millions of students and educators nationwide, ensuring their reliability, scalability, and performance. Working closely with engineering and product teams, you will leverage expertise in AWS, Terraform, and observability tools to drive automation, enhance resiliency, and maintain the health of our cloud‑based infrastructure.
Remote position – open to applicants authorized to work in the United States.
What you will be doing:
- Lead a 6‑member SRE team supporting production infrastructure and services
- Manage backlog, sprint planning, and team velocity
- Own reliability, uptime, security, cost, and performance of services
- Define and monitor SLOs for application workloads
- Plan on‑call rotations and work to reduce alert fatigue
- Forecast seasonal growth and plan capacity accordingly
- Mentor engineers and foster professional growth
- Report status and issues to leadership monthly
- Partner with development teams
- Collaborate with Cyber Security on risk mitigation
- Collaborate with FinOps on cost reduction
- Design and troubleshoot highly‑distributed, cloud‑based production systems
- Maintain infrastructure‑as‑code and monitoring‑as‑code practices
- Improve system resiliency through failure injection and chaos testing
- Participate in on‑call rotation and resolve operational issues
- Optimize existing systems for performance and cost
- Ensure telemetry provides visibility into application performance
- Support agile development practices and code reviews
What you will need:
- 5+ years of experience in SRE, DevOps, or Software Engineering roles supporting enterprise applications.
- Strong problem‑solving, triage, and root cause analysis skills with a systems engineering mindset.
- Deep expertise in the AWS ecosystem, with hands‑on experience across core services including ECS, RDS, EKS, IAM, CloudWatch, and networking configurations.
- Expertise with Terraform for managing and automating scalable cloud infrastructure.
- Skilled in CI/CD pipelines (e.g., GitHub Actions) and managing end‑to‑end software delivery life cycles.
- Strong familiarity with telemetry and observability tools (e.g., New Relic, Datadog), including querying logs and metrics for performance monitoring.
The work you do at McGraw Hill will matter. We are collectively designing content that will build the future of education. Play your part and experience a sense of fulfillment that will inspire you to even greater heights.
The pay range for this position is between $124,000 and $155,000 annually. Base pay may vary based on experience and location. A full range of medical and other benefits may be provided. Learn more about our benefit offerings.