Senior Site Reliability Engineer - Managed Kubernetes

Job in Seattle, King County, Washington, 98127, USA
Listing for: Lambda
Full Time position
Listed on 2025-12-03
Job specializations:
  • IT/Tech
    Systems Engineer, SRE/Site Reliability
Job Description & How to Apply Below

Senior Site Reliability Engineer – Managed Kubernetes

Join Lambda’s mission to make compute as ubiquitous as electricity. The Senior Site Reliability Engineer role focuses on operating production Kubernetes clusters for AI/ML workloads.

What You’ll Do
  • Operate and maintain bare‑metal Kubernetes clusters scaling to thousands of nodes.
  • Handle cluster degradation, recovery, resizing, and incident response using fleet management tools.
  • Participate in a well‑managed on‑call rotation for critical incidents.
  • Assist customers with Kubernetes questions, workload integration, storage, and authentication.
  • Work closely with HPC Ops and Datacenter Ops teams for low‑level or cross‑functional issues.
  • Use Python and Go to create tooling and automate validation of platform quality.
  • Design, build, and maintain scalable control plane services, operators, and custom controllers for Kubernetes.
  • Develop automation for cluster lifecycle management: provisioning, upgrades, patching, and deletion.
  • Define and implement SLOs and SLIs for Kubernetes services, workloads, and platform reliability.
About You – Must‑Have
  • 6+ years of experience in SRE, operations engineering, or similar roles with deep Linux cluster knowledge.
  • Strong programming skills in Go and Python; experience with GitOps (ArgoCD), Helm, and Kubernetes operators.
  • Proven experience operating Kubernetes clusters in production environments (on‑prem, EKS, GKE, or similar).
  • Ability to work independently or as part of a team, handling incidents via tickets or live messaging.
  • Familiarity with observability tools such as Prometheus, Grafana, and Fluent Bit, and with CI/CD pipelines.
  • Proven experience provisioning Kubernetes using tools such as kubeadm, Cluster API, or similar.
Nice‑to‑Have
  • Deep Kubernetes expertise: CRDs, CSI, CNI, and operator coding.
  • Experience with HPC clusters, AI/ML workloads, or large‑scale GPU clusters.
  • Hybrid or multi‑cloud Kubernetes environment experience.
  • Contributions to CNCF projects or Kubernetes SIGs.
Benefits
  • Generous cash and equity compensation.
  • Health, dental, and vision coverage.
  • 401(k) with company match.
  • Paid time off and flexible paid time off plans.
  • Wellness and commuter stipends for select roles.
Equal Opportunity Employer

Lambda is an Equal Opportunity Employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation, veteran status, citizenship, or any other factor prohibited by law.

Position Requirements
10+ Years work experience