Senior Platform Engineer Job: San Francisco Area, California, USA (IT/Tech)

As a Senior Platform Engineer at vCluster Labs, you aren't just maintaining infrastructure; you are the backbone of our engineering velocity. In this role, you will act as "Customer Zero" for our own products, building the internal platforms that enable our engineering teams to ship faster, more securely, and more efficiently. You will have the unique opportunity to use vCluster to create cutting‑edge internal services while providing critical feedback directly to our product teams.

As a Senior Platform Engineer, your role will include:

Infrastructure Management: Own and improve our multi‑cloud infrastructure spread across AWS, GCP, and DigitalOcean. You will manage Kubernetes clusters, handle patching, manage access, and enhance our tooling so it has robust alerts and metrics.
CI/CD Optimization: Drive the improvement of our GitHub CI pipelines. You will be responsible for creating secure, repeatable testing environments and automating pipeline updates to streamline the developer experience.
Internal Services Architecture: Architect and host infrastructure for engineering development, including internal services and vCluster‑specific platforms (e.g., loft.rocks, vCluster Cloud). You will empower engineers to build pipelines securely through education and tooling.
Customer Zero: Act as the first and most critical user of our products. You will push vCluster features to their limits to create useful internal tools, discovering bugs and providing feedback to Engineering to shape the future of our software.
Terraform Automation: Focus on automating updates and managing infrastructure as code using Terraform and Spacelift. You will give the team the ability to create infrastructure on demand, ensuring scalability and consistency.
Execution: Manage a variety of Kanban tasks via Linear, ranging from improving observability to handling GitHub policy requests, release engineering, and access management.

This role could be a fit for you if you bring:

Experience: You have 5+ years of experience in Platform Engineering or DevOps, with a focus on modern, cloud‑native technologies.
Technical Fluency: You are an expert in HCL and Terraform Modules, and you have deep experience administering Kubernetes clusters.
Pipeline Mastery: You have extensive experience with GitHub pipelines (a must‑have) and know how to optimize them for speed and security.
Cloud Proficiency: You have hands‑on experience managing and deploying in public clouds, specifically AWS or GCP.
Modern Tech Mindset: You thrive in environments that reject legacy tech. You want to work with a modern stack where you can pick your own hardware and solve a variety of problems, from pipelines to internal services.

Bonus points for:

Automation Skills: Experience writing automation scripts with Bash or Python.
Programming: Proficiency in Go or Python is a significant plus.
Kubernetes Depth: Relevant certifications such as CKA (Certified Kubernetes Administrator) or experience writing Kubernetes Operators.
Documentation: Basic experience writing technical documentation and a willingness to build AI automation into our documentation workflow to contribute to our knowledge base.
Tools: Familiarity with Linear for task management.

About vCluster Labs

We are a venture‑backed tech startup and the company pioneering Kubernetes virtualization for the AI era. We have raised more than $30M from top‑tier VCs such as Khosla Ventures (first investor in OpenAI, GitLab, Stripe, and DoorDash) and are in a hyper‑growth phase, looking for motivated people to complement our team. Our headquarters are in San Francisco (Salesforce Tower), but our team is distributed around the globe and we have a remote‑first work culture.

We are the leading platform for operating GPU infrastructure, enabling AI Cloud providers to deliver a hyperscaler‑like experience to their customers and AI factories that need to build that same experience for their internal teams. Our platform delivers the full operational stack operators need to run their GPU data centers — managed Kubernetes, fast isolated tenant provisioning, and automated node provisioning and lifecycle management — enabling them to accelerate time to value, reduce operational burden, and maximize the…