Senior Platform Engineer – Kubernetes & Middleware Platforms
Listed on 2025-12-01
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability, Data Engineer
Overview
We’re seeking a strategic technologist and Kubernetes/container platform engineer with 15+ years of experience to lead and scale our Kubernetes container platform and middleware stack. You will architect, deploy, operate, and evolve high-availability Kubernetes infrastructure (EKS and OpenShift), ensuring seamless middleware operations (Kafka, Redis Enterprise Cluster, 3scale API Gateway). You will automate container deployment (ArgoCD), enforce container security and network policies, oversee capacity planning and Helm chart development, and define policy-as-code governance across the environment. Your mission is to deliver a hardened, future-ready platform that enables multiple engineering teams to develop, deploy, and scale cloud-native applications reliably and securely.
Responsibilities
- Design and implement infrastructure abstractions and APIs that simplify deploying AI workloads using Kubernetes-native operations and GitOps patterns.
- Architect, deploy, and manage Kubernetes platforms (AWS EKS and Red Hat OpenShift) across multiple environments.
- Implement GitOps workflows with ArgoCD to manage declarative application deployments.
- Design and operate middleware infrastructure:
  - Highly available Kafka clusters (mirroring, partitioning, tooling)
  - Managed Redis Enterprise clusters (sharding, high availability, replication)
  - 3scale API Gateway development and administration
- Build and manage Helm charts, including templating, parameterization, and versioning, for both platform and middleware stacks.
- Enforce container security and policy governance using policy-as-code tools (e.g. OPA, Kyverno), image scanning (e.g. Clair, Snyk), and automated admission controls.
- Implement network policies (Kubernetes NetworkPolicy / Calico) to enforce segmentation and micro-segmentation.
- Configure and manage service mesh (e.g. Istio, Linkerd) for observability, traffic controls, and secure service-to-service communication.
- Conduct capacity planning, cluster sizing, resource tuning, and autoscaling strategies.
- Conduct architecture reviews, train engineers, and drive platform best practices across teams.
- Partner with SREs to define platform SLAs, uptime targets, resilience benchmarks, and alerting/monitoring.
- Lead incident response and root cause analysis, automating recovery workflows and improving platform resiliency.
Qualifications
- 15+ years of overall engineering experience, including:
- 8+ years with Kubernetes platforms (EKS, OpenShift) in production.
- Experience managing streaming and caching infrastructure at scale (Kafka, Redis Enterprise clusters).
- Hands-on administration or development of API management/gateway platforms, preferably Red Hat 3scale.
- Demonstrated ability to collaborate with cross-functional teams to deploy AI workloads on Kubernetes or cloud-native platforms.
- Deep knowledge of DevSecOps principles, container security, governance, and compliance in enterprise environments.
- Strong automation experience: Helm, GitOps, ArgoCD, IaC (Terraform, AWS CloudFormation, Ansible).
- Experience configuring service mesh, network policy controls, and multi-tenancy in Kubernetes.
- Experience scripting with Python, Bash, Groovy, or equivalent; hands-on experience developing automation tooling, custom Kubernetes operators/controllers, or other platform-level integrations.
- Thorough understanding of core Kubernetes concepts and observability tooling.
- Experience with capacity planning, cluster sizing, and performance tuning for critical infrastructure.
- Strong troubleshooting skills across Kubernetes, middleware, and distributed systems; experience leading incident response and root cause analysis.
- EKS and/or OpenShift administration certification (CKA, AWS Certified Kubernetes Administrator, Red Hat Certified OpenShift Administrator, or equivalent).
- Knowledge of middleware architecture for high-throughput, low-latency messaging systems.
- Experience with cloud cost optimization and chargeback models.
- Familiarity with CI/CD pipelines (Jenkins, GitHub Actions) and monitoring and alerting (Prometheus, Grafana, ELK/Splunk, or similar).
- Familiarity with CNCF ecosystem tools and emerging trends in platform engineering and…