Manager Site Reliability Engineering

Job in New York City, Richmond County, New York, 10261, USA
Listing for: Sphere Entertainment Co
Full Time position
Listed on 2025-11-27
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability, Cybersecurity
Salary/Wage Range or Industry Benchmark: 96000 - 160000 USD Yearly
Job Description & How to Apply Below

Manager Site Reliability Engineering

Location: New York City, NY

Position Type: Full-time

Sphere Entertainment Co. (NYSE: SPHR) is a premier live entertainment and media company. The Company includes Sphere, a next‑generation entertainment medium powered by cutting‑edge technologies to redefine the future of entertainment. The first Sphere venue in Las Vegas opened in September 2023. In addition, the Company includes MSG Networks, which operates two regional sports and entertainment networks, MSG Network and MSG Sportsnet, as well as a direct‑to‑consumer and authenticated streaming product, MSG+, delivering a wide range of live sports content and other programming.

More information is available at

Who are we hiring?

The Manager, Site Reliability Engineering will lead the platform stability, scalability, and security efforts for our digital sports streaming application. This is a hands‑on technical leadership role focused on maintaining the reliability of our AWS‑based infrastructure, enhancing observability and automation, and ensuring the performance and security of systems that power live and on‑demand video streaming. This role will be central to triaging video playback issues, guiding cloud architecture, and reducing mean time to recovery (MTTR).

What will you do?

  • Own the reliability, performance, and security of the platform infrastructure that supports our live and on‑demand video streaming app.
  • Lead and grow a small technical team (SRE, Video Ops) and act as a hands‑on mentor and contributor.
  • Design and maintain robust monitoring, logging, and alerting systems, using tools such as CloudWatch, Datadog, and Conviva, to ensure visibility into platform health, fast incident response, and high availability across our video streaming infrastructure.
  • Define and enforce operational best practices including disaster recovery, redundancy, backup, and failover strategies.
  • Investigate and resolve complex issues across the application stack, from infrastructure and APIs to video delivery and playback.
  • Lead incident response efforts and participate in an on‑call rotation during peak traffic events (typically evenings EST).
  • Collaborate with Product and Engineering teams to guide architectural decisions that prioritize platform resilience, scalability, and security.
  • Partner with L1 Operations and Customer Care teams to triage issues, drive incident resolution, and close the loop on recurring or systemic problems.
  • Own the implementation and continuous strengthening of platform security, including identity management, secrets handling, IAM policies, and AWS‑level hardening.
  • Evaluate and introduce new tools, technologies, and architectural patterns to improve the reliability of the system.
  • Track and improve SLAs, SLOs, and operational KPIs related to uptime, latency, video playback quality, and security posture.

What do you need to succeed?

  • 5+ years of experience in SRE, DevOps, or platform infrastructure roles, with 2+ years in a team lead or manager capacity.
  • Experience operating and scaling production environments in AWS, including services like CloudFront, Lambda, S3, API Gateway, and CloudWatch.
  • AWS Certification (Solutions Architect, DevOps Engineer, or similar) or equivalent deep hands‑on experience.
  • Strong background in system observability, with experience using tools like Conviva, CloudWatch, and Datadog for monitoring, distributed tracing, and alerting.
  • Deep understanding of video streaming architecture including HLS/DASH, CDNs, DRM, SSAI, and multi‑platform delivery (mobile, web, CTV).
  • Expertise in scripting and automation using Python, Bash, or similar, with infrastructure‑as‑code tools like Terraform or CloudFormation.
  • Proven ability to lead platform security initiatives, including IAM policy management, token handling, and securing service architecture.
  • Experience collaborating with engineering teams to improve CI/CD pipelines, automate infrastructure changes, and support safe production releases.
  • Strong analytical and troubleshooting skills across application, network, and video delivery layers.
  • Excellent communication skills with the ability to drive cross‑functional alignment and manage vendor…