Senior Backend Engineer, Delivery: Runway; Platform Engineering
Listed on 2026-05-30
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability
Overview of this role
As a Senior Engineer on the Runway team, you'll lead the design, evolution, and operation of the Kubernetes-based platform and developer tooling that powers GitLab's engineering organization. You'll drive strategic infrastructure initiatives across platform architecture, automation, and developer experience. These include operating production Kubernetes clusters across cloud environments, scaling our ArgoCD-based GitOps workflows, and setting infrastructure-as-code practices and standards across teams.
You'll mentor engineers, influence architectural decisions, and drive platform improvements that strengthen reliability, observability, and security, including controls such as RBAC and secrets management. Your work will establish clear patterns that make it easier for application teams to adopt modern practices and ship with confidence.
What you'll do
- Manage and evolve production-grade Kubernetes clusters across cloud environments, contributing to architectural decisions on upgrades, scaling, disaster recovery, and reliability improvements.
- Implement and evolve GitOps workflows and standards across environments using ArgoCD, including ApplicationSets, sync policies, and deployment guardrails, and share best practices with teams adopting these patterns.
- Build and maintain reusable Terraform modules and practices for safe, repeatable cloud infrastructure provisioning, including state management and drift detection.
- Lead incident response as part of the on-call rotation, drive post-mortems to clear conclusions, and implement improvements to availability, performance, and resilience.
- Partner with application teams to onboard services onto the platform, writing documentation, runbooks, and self‑service tooling that improves developer productivity.
- Implement security controls such as RBAC, network policies, and secrets management that meet compliance requirements.
- Contribute to CI pipeline integrations as part of end‑to‑end delivery workflows.
What you'll bring
- Proficiency in Go for writing and maintaining production‑grade services and automation tooling, with the ability to guide others on best practices and code quality. Python or Bash experience is a plus.
- Hands‑on experience owning production Kubernetes clusters across one or more cloud environments (for example, Amazon EKS, Google GKE, or Azure AKS), including upgrades, scaling, disaster recovery, and reliability engineering.
- Experience designing and operating GitOps-based continuous delivery workflows (for example, ArgoCD or Flux) and infrastructure as code (Terraform or equivalent), including reusable modules and safe infrastructure provisioning practices.
- Solid understanding of networking fundamentals (DNS, load balancing, ingress) and the ability to reason through failure modes and design tradeoffs, not just apply familiar patterns.
- Ability to deliver well‑structured, independently driven work, including clear documentation of decisions, tradeoffs, and future considerations, in support of a team that operates asynchronously across time zones.
- Strong written and verbal communication skills, including the ability to write system documentation, establish runbooks, and share knowledge in ways that help the broader team move faster and get unblocked quickly.
About the team
The Runway team builds and operates the Kubernetes-based platform and developer tooling that GitLab engineers use to ship changes safely and reliably. We own core platform capabilities such as production cluster lifecycle management, GitOps-based delivery workflows (ArgoCD), infrastructure-as-code foundations (Terraform modules and standards), and the reliability and observability practices that keep the platform healthy. We collaborate asynchronously with application and security partners to onboard services, improve self-service workflows, and strengthen controls such as RBAC, network policies, and secrets management.
The team's focus is reducing friction…