Site Reliability Engineer - Kubernetes
Listed on 2026-05-16
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability
Location: City of Syracuse
Secure Every Identity, from AI to Human
Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. We are looking for builders and owners who operate with speed, urgency, and execution excellence.
This is an opportunity to do career‑defining work. We're all in on this mission. If you are too, let’s talk.
Okta Workforce Identity Cloud (WIC) provides easy, secure access for your workforce so you can focus on other strategic priorities—like reducing costs and doing more for your customers.
If you like to be challenged and have a passion for solving large‑scale automation, testing, and tuning problems, we would love to hear from you. The ideal candidate exemplifies the ethic of “If you have to do something more than once, automate it” and can rapidly self‑educate on new concepts and tools.
Position Overview
The Site Reliability Engineer (SRE) will play a key role in building and managing Kubernetes platforms that support cloud‑native applications and services. This position focuses on architecting and managing reliable, scalable, and secure Kubernetes‑based platforms on AWS, ensuring high availability and performance while optimizing costs and driving automation. The ideal candidate will have hands‑on experience with AWS infrastructure, Kubernetes platform creation, Helm charts, Karpenter scaling, and Istio service mesh.
Key Responsibilities
- Kubernetes Platform Creation: Design, implement, and maintain highly available, scalable, and fault‑tolerant Kubernetes platforms.
- AWS Infrastructure Management: Build, manage, and optimize AWS cloud infrastructure, including EKS, ECS, S3, VPCs, RDS, IAM, and more.
- Helm Management: Utilize Helm to automate and streamline the deployment of applications and services to Kubernetes clusters.
- Karpenter Implementation: Implement and manage Karpenter to dynamically scale Kubernetes clusters in response to workload demands.
- Istio Service Mesh Management: Configure and manage Istio to provide service‑to‑service communication, security, and observability within Kubernetes clusters.
- Platform Automation & Scaling: Automate the deployment, scaling, and management of infrastructure and applications with CI/CD pipelines.
- Incident Management & Troubleshooting: Respond to incidents, troubleshoot, and resolve system issues related to performance, availability, and security.
- Security & Compliance: Design and implement secure cloud infrastructure with appropriate access controls and compliance frameworks.
- Documentation & Knowledge Sharing: Create and maintain detailed documentation for Kubernetes platform setup, operational procedures, and best practices.
- 4+ years of experience with Kubernetes/Helm.
- 4+ years of experience with Terraform.
- 5+ years of experience with AWS.
- Experience with multi‑region cloud environments.
- Proven experience with AWS services (EC2, RDS, S3, CloudFormation, IAM, etc.) and a solid understanding of cloud-native architectures.
- Strong expertise in Kubernetes platform creation, management, and optimization.
- Hands‑on experience with Helm for Kubernetes application deployment and management.
- Practical experience with Karpenter for dynamic scaling of Kubernetes clusters and optimizing resource usage.
- Expertise in managing and securing Istio for service mesh, including traffic management, security, and observability.
- Proficiency in CI/CD pipelines and automation tools (Jenkins, GitLab, CircleCI, Terraform, Ansible, Spinnaker). Strong scripting skills in Python, Bash, or Go.
- Experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, CloudWatch, and the ELK Stack.
- Understanding of security best practices for cloud platforms and Kubernetes (RBAC, encryption, compliance frameworks).
- Familiarity with Docker and containerization principles.
- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent professional experience).
- Certifications (preferred): CKA, CKAD, or AWS Certified DevOps Engineer.
- Access to federal environments and/or protected federal data, with…