Lead Platform/DevOps Engineer
Listed on 2025-12-16
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability, Network Engineer
Bellevue Office, Sunset Corporate Campus
About the Company
Armada is an edge computing startup that provides computing infrastructure to remote areas where connectivity and cloud infrastructure are limited, as well as areas where data needs to be processed locally for real-time analytics and AI at the edge. We’re looking to bring on the most brilliant minds to help further our mission of bridging the digital divide with advanced technology infrastructure that can be rapidly deployed anywhere.
About the Role
We are looking for an experienced and detail-oriented Lead Platform Engineer to join our growing Edge team. This is a critical role where you will leverage deep technical expertise in cloud infrastructure and Kubernetes while valuing mentorship, collaboration, and open communication.
You will be responsible for the overall architecture, design, automation, optimization, and operation of our Kubernetes-based platform, supporting our Galleon mobile data centers and cloud integration. You will work on building and managing resilient, secure, and scalable Kubernetes environments across diverse edge locations and cloud infrastructure, ensuring the reliability of our distributed computing platform.
Location. This role is office-based at our Bellevue, Washington office.
What You’ll Do (Key Responsibilities)
- Architect and lead the design, deployment, configuration, and management of highly available Kubernetes clusters across on-prem (Galleon data centers) and cloud (AWS, Azure, GCP) environments. This includes designing the cluster layout, resource allocation, and storage configurations.
- Mentor and guide team members in administering, maintaining, and monitoring the health, performance, and capacity of Kubernetes clusters and underlying infrastructure.
- Implement and manage Kubernetes networking solutions (CNI plugins, Ingress controllers) and storage solutions (PV/PVC, Storage Classes, CSI drivers).
- Design, deploy, configure, and manage Microsoft Azure Local and HCI environments.
- Maintain and monitor containerized platform services running within the clusters, along with robust monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK stack).
- Drive Infrastructure-as-Code (IaC) initiatives using tools like Terraform, Ansible, Helm, and potentially Kubernetes Operators, promoting automation, repeatability, and reliability.
- Support and troubleshoot complex issues related to the Kubernetes platform, containerized services, networking, and infrastructure.
- Implement and enforce Kubernetes security best practices (RBAC, Network Policies, Secrets Management, Security Contexts, Image Scanning).
- Automate cluster operations, deployment pipelines (CI/CD integration), and infrastructure provisioning using Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible).
- Lead the optimization of Kubernetes clusters for performance, scalability, and resource utilization, particularly in edge environments.
- Develop and maintain comprehensive documentation for cluster architecture, configurations, operational procedures, and runbooks.
- Collaborate with software engineering, DevOps, and security teams, as well as product managers, to ensure seamless integration, deployment, and secure operation of applications on Kubernetes.
- Lead the evaluation and integration of new technologies from the Kubernetes ecosystem.
- Contribute to the operational excellence of the platform, including participating in on‑call rotations, incident management, and building self‑healing capabilities.
Required Qualifications
- At least 12 years of experience in DevSecOps/SRE and platform engineering, with a significant focus on building and managing complex production environments.
- Minimum of 5 years of hands‑on experience designing, deploying, and administering production Kubernetes clusters, with experience specifically in on‑premises and bare‑metal deployments.
- Deep expertise in Linux administration and troubleshooting, demonstrated through at least 5 years of hands‑on experience managing complex Linux environments.
- Deep understanding of Kubernetes architecture, core components, operational best practices, and lifecycle management.
- Strong understanding and proven…