
Software Engineer - Managed Kubernetes

Remote / Online - Candidates ideally in
Bellevue, King County, Washington, 98009, USA
Listing for: AI Chopping Block, Inc.
Remote/Work from Home position
Listed on 2026-05-22
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 80,000-100,000 USD per year
Job Description
Position: Staff Software Engineer - Managed Kubernetes

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serving tens of thousands of customers. Our customers range from AI researchers to enterprises and hyperscalers. Lambda's mission is to make compute as ubiquitous as electricity and give everyone the power of superintelligence. One person, one GPU.

If you'd like to build the world's best AI cloud, join us.

Note: This position requires presence in one of our San Francisco, San Jose, or Bellevue offices 4 days per week; Lambda’s designated work-from-home day is currently Tuesday.

About the Role

Lambda is building the AI Cloud of the future. We are seeking a Staff Engineer to help develop our Managed Kubernetes platform. Think GKE, but purpose-built for AI workloads and running on bare metal. This is a foundational technical leadership role in which you will shape the infrastructure that powers the next generation of AI training and inference at scale.

As a Staff Engineer on our Orchestration team, you will help drive the technical vision for Lambda's managed orchestration services, including Managed Kubernetes, Managed Slurm on Kubernetes, and higher-level platform services for inference and AIOps. You'll work at the intersection of distributed systems, GPU-accelerated computing, and cloud-native infrastructure to build systems that are reliable, performant, and elegantly simple for our customers.

This is not a role for someone who just operates Kubernetes; it is a technical leadership role for an engineer who has synthesized the core domains of infrastructure (compute, network, storage, security) and can design holistic solutions across all of them. You'll work closely with NVIDIA's open-source ecosystem and partner with internal teams across the stack to deliver a world‑class managed platform.

What You'll Do
Product Engineering
  • Drive technical vision for Lambda's Managed Kubernetes bare‑metal platform, including control plane scalability, multi‑tenancy, cluster lifecycle management, and high availability
  • Integrate and extend NVIDIA's open‑source ecosystem: GPU Operator, Network Operator, DCGM, NCCL, and emerging projects like AICR and Topograph for topology‑aware scheduling and placement
  • Design GPU‑aware orchestration systems
  • Lead development of the services that power our managed offerings
  • Help shape networking solutions for AI workloads: CNI integration (Cilium, Multus), high‑performance fabrics (InfiniBand, RoCE), RDMA, and GPUDirect. You will work closely with our Network team to define and drive requirements
  • Help define storage architecture requirements for AI workloads. You will partner with Storage teams on what managed K8s, Slurm, and future services need
  • Build the foundation for Managed Slurm on Kubernetes, enabling traditional HPC workloads to run seamlessly alongside Kubernetes workloads
  • Design higher‑level platform services for inference, including model serving infrastructure, autoscaling based on inference load, and multi‑model deployment patterns
  • Design self‑healing systems and automation for incident response, root cause analysis, and platform resilience
  • Lead chaos engineering efforts to validate system behavior under failure conditions at scale
  • Establish operational excellence for a managed service: upgrade automation, security patching, and zero‑downtime maintenance
Cross‑Functional Infrastructure Leadership
  • Serve as the technical bridge between Orchestration and other infrastructure teams (Network, Storage, Security), translating platform requirements into actionable specifications
  • Drive infrastructure‑wide decisions that enable successful managed services. You’re someone who understands what’s needed end‑to‑end, not just at the Kubernetes layer
  • Provide input on bare‑metal provisioning, network topology, and storage systems to ensure they meet the needs of the managed services being built by the Orchestration organization
  • Champion consistency and standardization across Lambda's infrastructure stack
  • Work directly with customers and internal teams to understand existing deployments and chart a path to the managed platform
Technical Leadership
  • Set technical direction for…