Senior Manager, Site Reliability Engineering

Job in Dallas, Dallas County, Texas, 75215, USA
Listing for: JCPenney
Full Time position
Listed on 2026-04-29
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Salary/Wage Range or Industry Benchmark: 103,500 - 172,500 USD yearly
Job Description & How to Apply Below

Overview

Senior Manager, Site Reliability Engineering

The Site Reliability Engineering Manager is responsible for overseeing the daily operations and delivery of the Site Reliability Engineering teams. This role plays a key part in driving team productivity and ensuring the ongoing health, performance, resilience, and stability of Catalyst’s eCommerce and CRM platforms. In addition to managing operational aspects, the SRE Sr. Manager actively contributes to the technical direction of the team.

This includes shaping the automation strategy, guiding telemetry and observability practices, leading solution delivery, and managing incidents and problems affecting platform reliability. This is a hybrid leadership role that combines technical expertise with people management. The SRE Manager also contributes to both short- and long-term planning initiatives spanning systems architecture, team development, and organizational strategy.

What You Will Do
  • Provide both technical and people leadership to Site Reliability Engineering (SRE) teams through regular one-on-one meetings, team syncs, and performance reviews.
  • Manage project execution by organizing cross-functional teams, assigning responsibilities, and tracking progress against defined schedules and milestones.
  • Assist in budgeting, workforce planning, hiring, and third-party contract negotiations to support team growth and operational goals.
  • Drive continuous improvements in platform reliability, stability, and performance by overseeing the deployment of fully automated telemetry, observability, and AI-driven monitoring solutions.
  • Lead the development and enhancement of intelligent alerting and automated incident response systems to improve service restoration speed and issue detection.
  • Collaborate with administrators and platform engineers on implementation decisions to ensure highly reliable infrastructure, systems, and integrations.
  • Document all changes in accordance with change control policies and documentation standards; identify risks and recommend corrective actions when necessary.
  • Provide advanced Incident Management and Problem Management support by analyzing telemetry data and system logs to identify, remediate, and prevent reliability issues.
  • Participate in on-call escalation support rotations in alignment with the 24/7/365 support model.
  • Act as the Escalation Manager/Critical Incident Manager during major incidents, guiding teams through structured and effective service recovery.
  • Communicate timely updates and incident reports to senior leadership during and after critical events.
  • Lead conversations and provide business and engineering support for both internal stakeholders and external customers.
What You Will Need
Experience & Leadership
  • 10+ years of experience in global organizations, with a proven ability to communicate effectively across all levels—from executives to individual contributors.
  • 5+ years of hands-on Site Reliability Engineering (SRE) experience, including platform automation, telemetry, observability, and self-healing systems.
  • Demonstrated leadership and collaboration in high-availability, mission-critical digital environments.
  • Strong support knowledge and understanding of retail eCommerce flows across Web and Mobile technologies.
  • Experience working with software engineers across scrum teams and performance engineering to ensure systems meet reliability and performance standards.
  • Hands-on experience debugging and optimizing code and automation.
  • Ability to identify opportunities to adopt innovative technologies and drive continuous improvement (automation, shift-left, self-healing).
Platform & Application Support
  • Extensive experience supporting and administering digital retail and eCommerce platforms on one of the major cloud providers (AWS, Azure, or Google Cloud).
  • Demonstrated experience in application design, software development, testing, and production support of Java/J2EE-based eCommerce applications.
  • Practical experience monitoring and maintaining streaming platform technologies such as Apache Kafka.
  • Deep understanding of cloud-native architectures and platform operations.
Monitoring, Telemetry & Observability
  • Proficient with modern monitoring, logging, and telemetry…
Position Requirements
10+ Years work experience