Director Site Reliability Engineering
Listed on 2026-03-11
IT/Tech
Systems Engineer, SRE/Site Reliability, Cloud Computing
About Movius
At Movius, we solve a critical gap companies face with employee-to-client communication over voice and messaging. We are the leading global provider of Secure Communication as a Service (SCaaS™). Our flagship solution, Multi Line™, enhances workflows, resolves compliance gaps, and unifies cross‑channel messaging. Movius AI‑powered solutions enable businesses to build strong, lasting customer relationships in a company‑owned, controllable system. Welcome to Phone 3.0™.
Headquartered in Alpharetta, GA, with offices in New York, Silicon Valley, Bangalore, and London, Movius partners with leading carriers like T‑Mobile, Vodafone, TELUS, BT, Singtel, and more. Learn more ius.ai.
Director, Site Reliability Engineering
Role Overview
We are seeking a Director of Site Reliability Engineering (SRE) to lead the reliability, scalability, and operational excellence of our mobile‑first, SIP‑based communications SaaS platform. This platform supports mission‑critical voice, messaging, and unified communications services used by highly regulated global enterprise customers.
The Director of SRE will be responsible for ensuring carrier‑grade reliability, performance, and security of our distributed multi‑cloud infrastructure while building and leading a high‑performing SRE organization. This role partners closely with Engineering, Product, Security, and Customer Experience to deliver resilient, scalable, and observable systems.
The ideal candidate combines deep technical expertise in real‑time communications infrastructure with strong leadership and operational discipline.
Key Responsibilities
Reliability & Platform Operations
- Own availability, reliability, and performance of the communications SaaS platform supporting voice, SMS/RCS/MMS, SIP signaling, and mobile services.
- Define and manage SLOs, SLIs, and error budgets for mission‑critical services.
- Drive operational excellence through incident management, post‑mortems, and continuous improvement.
- Ensure 99.99%+ service availability for carrier and enterprise customers.
- Oversee reliability of SIP signaling infrastructure, SBCs, media servers, messaging gateways, and telecom interconnects.
- Ensure stability and scaling of real‑time voice and messaging workloads across distributed multi‑cloud environments.
- Collaborate with telecom partners and carriers to maintain high service quality and interconnect reliability.
- Lead reliability engineering across multi‑region, multi‑cloud infrastructure (AWS and/or IBM Cloud).
- Build highly available architectures with geo‑redundancy, active‑active deployments, and automated failover.
- Drive infrastructure‑as‑code, automation, and self‑healing systems.
- Establish best‑in‑class monitoring, alerting, tracing, and observability frameworks.
- Implement deep telemetry for call quality, SIP performance, messaging delivery, and system health.
- Use data‑driven insights to improve system resilience and operational response.
- Lead 24/7 operational readiness including on‑call processes and war room coordination.
- Define incident severity models, response playbooks, and escalation frameworks.
- Conduct blameless post‑incident reviews and drive systemic improvements.
- Partner with security teams to ensure platform resilience against fraud, abuse, and telecom‑specific threats.
- Maintain compliance with telecom and enterprise security standards.
- Build and scale a world‑class SRE organization across multiple regions.
- Mentor senior engineers and technical leaders.
- Drive a culture of ownership, reliability, and operational excellence.
- Work closely with software engineering, product, and customer experience teams.
- Influence architecture decisions to ensure systems are operable, scalable, and resilient.
- 10+ years of experience in site reliability engineering, cloud infrastructure, or platform operations.
- 5+ years of leadership experience managing SRE or infrastructure teams.
- Strong expertise in real‑time communications systems, including:
- SIP signaling
- SBCs
- Media infrastructure
- VoIP…