DevSecOps/Site Reliability Engineer
Listed on 2026-06-12
IT/Tech
Cybersecurity, Cloud Computing: Infrastructure & Operations, Systems Engineer, SRE/Site Reliability
We are seeking a skilled DevSecOps / Sovereign Cloud Engineer with 4–5 years of experience to design, operate, and secure regulated cloud environments across AWS and GCP. The role combines cloud operations, security engineering, and SRE practices, including Kubernetes management, Infrastructure as Code (Terraform), CI/CD automation (Git/GitLab), and secrets management (HashiCorp Vault). The ideal candidate will drive security incident response, platform reliability, observability (Datadog), and 24x7 incident management (PagerDuty), while embedding security‑by‑design principles, automation‑first practices, and continuous improvement into sovereign cloud platforms.
Strong expertise in cloud security, networking fundamentals, compliance‑driven environments, and cross‑functional incident leadership is essential.
- Deploy, manage and maintain the organization’s Sovereign Cloud strategy, ensuring compliance with regulatory and data residency requirements on AWS/GCP public cloud.
- Managing and operating Kubernetes clusters including upgrades, scaling, and workload optimization.
- Security Incident Response & Risk Mitigation:
Investigate, analyze, and remediate cloud security incidents, proactively identifying and mitigating vulnerabilities within AWS environments.
- Secure Cloud Strategy & Continuous Improvement:
Support and enhance the organization’s AWS cloud strategy by embedding security best practices and continuously improving the cloud security posture.
- Security Automation & DevSecOps Enablement:
Implement and tune security tools, automate policy‑driven responses, and advocate DevSecOps practices to ensure secure‑by‑design cloud operations.
- Implementing and managing Infrastructure as Code (Terraform) to provision, modify, and secure cloud resources.
- Maintaining and optimizing CI/CD pipelines using Git/GitLab, ensuring secure and automated deployments.
- Managing secrets and secure access using HashiCorp Vault, including token lifecycle, access policies, and secrets rotation.
- Troubleshooting complex infrastructure, networking, container, and performance issues across distributed systems.
- Observability & Reliability Engineering:
Monitor system health and performance using Datadog, define and manage SLIs/SLOs, and drive continuous reliability improvements aligned with SRE principles.
- Incident Management & Operational Governance:
Manage 24x7 alerting and incident response through PagerDuty, perform root cause analysis (RCA), and actively contribute to incident, problem, and change management processes.
- Cloud Security & Performance Optimization:
Conduct proactive system hardening, vulnerability remediation, performance tuning, and capacity planning across cloud environments.
- Automation & Continuous Improvement:
Develop automation using Python/Bash/Terraform/Ansible to reduce manual effort, improve operational efficiency, and strengthen platform resilience.
To succeed in this role, you must have:
- 4–5 years of hands‑on experience in DevSecOps or SRE roles.
- Experience with Terraform (IaC) deployments and with Git/GitLab for CI/CD and version control best practices.
- Experience working with HashiCorp Vault, Terraform, Datadog, PagerDuty, Confluence, etc.
- Strong understanding of Cloud Security principles (IAM, encryption, network security, container security, vulnerability management).
- Experience in incident management, change management, and root cause analysis processes.
- SRE & Automation Expertise:
Strong understanding of SRE principles (SLIs, SLOs, error budgets, reliability metrics) with hands‑on scripting experience in Python and/or Bash to drive automation‑first practices.
- Networking & Compliance Knowledge:
Solid grasp of networking fundamentals (TCP/IP, DNS, load balancing, firewalls, VPNs, private endpoints) with experience or exposure to regulated and compliance‑driven environments.
- Leads incident bridges during P1/P2 outages, coordinating cross‑functional teams and driving timely resolution with clear RCA.
- Maintains a strong customer and business‑focused mindset while prioritizing tasks.
- Cloud certifications are a plus.