Key Responsibilities:
Cloud Infrastructure & Operations:
Build, deploy, and maintain scalable, secure, and highly available cloud infrastructure on AWS across multi-account environments.
Implement and maintain Infrastructure as Code (IaC) using Terraform and AWS CloudFormation, contributing to reusable modules used across product teams.
Optimize cloud environments for cost, performance, and reliability, supporting FinOps practices including Savings Plans, Spot strategy, and Graviton adoption.
Collaborate with engineering, data, and security teams to support resilient distributed systems.
Participate in continuous improvement initiatives across the platform.
Own incident response: on-call rotation, triage, mitigation, and blameless post-mortems.
Provide Cloud and Infrastructure support across platform teams.
Kubernetes & EKS:
Deploy, operate, and maintain Amazon EKS clusters in a multi-tenant production environment.
Support cluster upgrades, patching, and Kubernetes version lifecycle activities.
Contribute to internal Helm chart libraries and GitOps-driven cluster configuration using ArgoCD or Flux.
Security & Reliability:
Implement zero-trust network principles and enforce IAM least-privilege across AWS accounts.
Support SRE practices: contribute to SLO definitions and monitoring for EKS, API Gateway, and related services.
Participate in incident response, postmortem analysis, and blameless RCA processes for platform-level issues.
Support chaos engineering exercises and disaster recovery testing across availability zones and regions.
Collaboration & Growth:
Partner with software engineering teams to deliver end-to-end solutions from design through production.
Evaluate new AWS services and open-source tooling to improve infrastructure capabilities.
Required Qualifications:
Hands-on experience with AWS cloud services: EC2, VPC, IAM, EKS, S3, CloudWatch, API Gateway, Route 53, and more.
Experience operating Amazon EKS in production: cluster lifecycle, RBAC, IRSA, node groups, and autoscaling.
Proficiency in Infrastructure as Code with Terraform and AWS CloudFormation.
Solid understanding of containerization: Docker, Kubernetes architecture, and container lifecycle management.
Experience with monitoring and logging tools: Prometheus, Grafana, Dynatrace, OpenSearch, ELK/Loki.
Strong Linux/Unix systems administration and scripting in Bash, Python, or similar.
Good knowledge of cloud security best practices: IAM, RBAC, secrets management, and network security.
Experience with Helm and GitOps tools (ArgoCD, Flux).
Solid networking fundamentals: VPCs, subnets, load balancing, DNS, and Kubernetes ingress controllers.
Ability to troubleshoot distributed systems and debug complex production issues.
Strong problem-solving skills and the ability to work effectively in a fast-paced team environment.
Preferred Skills:
AWS Certifications: Solutions Architect Associate/Professional or DevOps Engineer Professional.
Kubernetes Certifications: CKA or CKAD.
Experience with Karpenter for EKS node provisioning.
Exposure to microservices architecture and distributed systems at financial-services scale.
Experience with AWS API Gateway and Lambda authorizers for JWT/OIDC-based auth flows.
Background in cost optimization and performance tuning (Graviton, Spot, Savings Plans).
Familiarity with identity federation: OIDC, OAuth2, SAML, Auth0 integration.
Understanding of AI/ML infrastructure: model training pipelines, deployment on EKS, and model monitoring.