Principal SRE Engineer
Listed on 2026-02-08
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Overview
At Movius, we solve a critical gap companies face with employee-to-client communication over voice and messaging. We are the leading global provider of Secure Communication as a Service (SCaaS™). Our flagship solution, MultiLine™, enhances workflows, resolves compliance gaps, and unifies cross-channel messaging. Movius AI-powered solutions enable businesses to build strong and lasting relationships with their customers in a company-owned, controllable system.
Welcome to Phone 3.0™.
Headquartered in Alpharetta, GA, with offices in Silicon Valley, New York, London, and Bangalore, India, Movius partners with leading global wireless carriers such as T-Mobile, Vodafone, TELUS, BT, Singtel, and more.
Responsibilities
- Maintain architecture blueprints, playbooks, and templates for SLOs, postmortems, and change management.
- Lead the end-to-end SRE architecture and define technical, reliability, and automation standards.
- Drive the SRE roadmap aligned with business SLAs, platform goals, and cloud strategy (AWS preferred).
- Serve as reliability authority in design reviews and architecture boards.
- Architect and manage full-stack observability (Elastic Stack, OpenTelemetry, Prometheus) with integrated traces, metrics, and logs.
- Define and automate SLO/SLI tracking, error budgets, and incident management life cycles.
- Build event-driven, self-healing systems and automate infrastructure, deployments, and monitoring.
- Optimize distributed systems, Kubernetes workloads, and microservices for scale, performance, and cost.
- Lead chaos engineering, root cause analysis, and continuous improvement to reduce MTTR.
- Mentor engineers in reliability, automation, and architecture best practices; champion an automation-first culture.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, IT, or equivalent experience.
- 15+ years in DevOps, Infrastructure, or SRE roles.
- 4+ years in a senior/principal-level capacity driving SRE strategy and automation.
- Proven success designing and scaling large distributed, cloud-native platforms.
- Telecom domain experience is a plus.
- Deep knowledge of AWS (EKS, EC2, RDS, IAM, VPC, Kafka, CloudWatch, API Gateway, Lambda, WAF, KMS).
- Helm Chart mastery and container orchestration (EKS).
- Hands-on experience with Elastic APM and observability tools.
- Expert in Terraform, Jenkins, Bitbucket, and Python/Bash/Go scripting.
- Strong grasp of SLO/SLI frameworks, error budgets, and AIOps.
- Experience in chaos engineering, performance optimization, and resilience testing.
- Excellent documentation and system design communication skills.
- AWS Certified Solutions Architect – Professional or DevOps Engineer – Professional.
- Certified Kubernetes Administrator (CKA) or Application Developer (CKAD).
- SRE Foundation, Google SRE, Dynatrace Performance Professional, or Elastic Certified Engineer.