Site Reliability Engineer
Listed on 2026-06-27
IT/Tech
Cloud Computing: Infrastructure & Operations, SRE/Site Reliability, Systems Engineer, IT Infrastructure
Job Description
Site Reliability Engineer (SRE) for Google Cloud Platform (GCP) and Azure Platform, focused on building, maintaining, and improving the reliability, scalability, and performance of cloud infrastructure using Infrastructure as Code (IaC) and Terraform Enterprise. Supports the delivery of secure, compliant, and highly available cloud environments aligned with enterprise standards and regulatory requirements.
Works closely with engineering and platform teams to develop and maintain reusable IaC modules, Terraform configurations, and automated cloud services, enabling consistent and efficient infrastructure provisioning. Contributes to the implementation of standardized platform patterns, including networking, identity, logging, and monitoring capabilities.
Participates in the end-to-end lifecycle of cloud infrastructure, including deployment, monitoring, incident response, and continuous improvement. Helps implement and maintain CI/CD pipelines, policy‑as‑code frameworks, and automation solutions to ensure reliable and repeatable deployments.
Applies SRE principles and practices, including monitoring, alerting, incident management, and root cause analysis, to improve system reliability and reduce operational risk. Supports the definition and tracking of service performance through metrics such as availability and latency.
Collaborates with architecture, security, and engineering teams to ensure infrastructure is secure, compliant, and operationally resilient. Contributes to DevSecOps practices by integrating security and compliance controls into automated workflows.
Continuously identifies opportunities to improve system reliability, reduce manual effort, and enhance automation. Leverages emerging tools and technologies, including AI/ML where applicable, to support proactive operations, observability, and platform stability.
Key Responsibilities
- Design, develop, and maintain Google Cloud Platform (GCP) infrastructure using Infrastructure as Code (IaC) with Terraform Enterprise
- Contribute to the implementation of scalable, secure, and compliant cloud solutions aligned with enterprise standards
- Develop and maintain reusable Terraform modules and standardized infrastructure patterns to enable consistent and automated provisioning of GCP resources
- Follow and contribute to code quality standards, design patterns, and peer review practices to ensure reliable and maintainable infrastructure code
- Support the adoption and use of Terraform Enterprise for automated provisioning, policy enforcement, and infrastructure governance
- Implement and maintain cloud automation workflows, including provisioning, configuration management, and environment setup
- Build and enhance CI/CD pipelines for infrastructure delivery, ensuring automated testing, validation, and compliance checks
- Implement policy-as-code and security controls, ensuring infrastructure meets regulatory and enterprise compliance requirements
- Participate in the end-to-end lifecycle of infrastructure delivery, including deployment, monitoring, and continuous improvement
- Collaborate with architecture, security, and engineering teams to ensure secure, resilient, and compliant cloud configurations
- Apply DevSecOps and cloud-native practices to improve automation, security, and deployment efficiency
- Contribute to observability, logging, and monitoring solutions to support proactive incident detection and response
- Execute testing and validation of IaC modules, including integration and deployment verification
- Identify opportunities to automate manual processes and improve operational efficiency
- Support reliability, scalability, and performance of cloud platforms through automation and standardization
- Troubleshoot and resolve infrastructure and platform issues, contributing to root cause analysis and continuous improvement
- Work with stakeholders to implement infrastructure solutions that meet technical and business requirements
- Evaluate and adopt emerging tools and technologies to enhance automation, reliability, and platform performance
- Conduct performance testing and capacity planning to ensure systems scale reliably under load
- Optimize system performance,…