Director, Site Reliability Engineering & Cloud Operations; SRE
Listed on 2026-05-23
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability
At Resideo, we imagine a world where homes and buildings are good for the planet, and where technology works to simplify everyday life. In that world, people are healthy, happy, and secure. To help create this future, we will work every day to simplify the connected world so people have peace of mind and can focus on what matters most. Resideo is making a large investment in our engineering group to develop and launch new products around the world (NPI).
This is an exciting opportunity to lead cloud operations for one of the largest IoT ecosystems in the world, shaping the future of cloud infrastructure, SRE, and AI‑driven operations. You will work alongside world‑class engineering talent and cutting‑edge technologies to advance Resideo’s mission of simplifying everyday life through innovative connected products.
Job Duties
Cloud Infrastructure & SRE Strategy
- Define and execute global cloud operations and SRE strategies, ensuring 99.999%+ uptime for mission‑critical IoT services.
- Architect, implement, and optimize multi‑cloud infrastructure to support IoT devices with low‑latency data processing, scalability, and high availability.
- Drive cost optimization strategies while balancing performance, redundancy, and financial efficiency across cloud platforms (Azure).
- Develop automated deployment, monitoring, and recovery systems using technologies like Kubernetes, Terraform, Ansible, and CI/CD pipelines.
- Establish and refine SLOs, SLIs, and KPIs for service reliability, performance, and capacity planning.
- Build and optimize incident management, disaster recovery, and resilience engineering frameworks.
- Leverage AI/ML‑driven automation for proactive failure detection and remediation.
- Implement robust security practices, ensure cloud security and compliance with standards such as SOC2, GDPR, and NIST, and oversee the zero‑trust security model for IoT data protection.
- Collaborate with security and compliance teams to manage risk and ensure regulatory adherence across cloud platforms.
- Lead and mentor a global team of Cloud Engineers, SREs, and software professionals, fostering a culture of continuous learning and innovation.
- Partner with product management, software engineering, and customer support to optimize IoT device onboarding, firmware updates, and cloud‑to‑edge performance.
- Collaborate with finance and executive leadership to develop long‑term cloud investment strategies.
- 15+ years in Computer Science, Electrical Engineering, or a related field.
- 15+ years of experience in Cloud Operations, SRE, or Infrastructure Engineering, with 8+ years in technical leadership roles.
- 5+ years of experience managing large‑scale, distributed IoT cloud environments supporting billions of data points per day.
- 5+ years of deep professional experience in Azure cloud platforms, including networking, storage, compute, and database services.
- 5+ years of experience in Kubernetes, Terraform, CI/CD pipelines, and observability tools (e.g., Prometheus, Grafana, ELK).
- 5+ years of experience in large‑scale systems design and architecture, focusing on reliability, performance, and scalability of cloud‑native platforms.
- 5+ years of hands‑on experience with tools like Terraform, Ansible, CDK, Pulumi for Infrastructure‑as‑Code (IaC), and managing cloud‑native architectures.
- Strong background in AI/ML‑driven automation for cloud infrastructure monitoring, self‑healing, and optimization.
- Solid understanding of security‑first cloud architectures, DevSecOps, and compliance standards (SOC2, GDPR, NIST).
- Proven ability to manage teams across multiple global time zones, ensuring operational excellence and driving performance in large, distributed environments.
- Expertise in incident management, disaster recovery, and building resilience engineering frameworks.
- Ability and desire to review code, system designs, and engage in system engineering discussions and decisions.
- Experience managing Consumer IoT ecosystems with large‑scale sensor data processing and real‑time analytics.
- Expertise in serverless architecture,…