Infrastructure Engineer
Listed on 2026-02-16
IT/Tech
Systems Engineer, Cloud Computing
Overview
About Overland AI. Founded in 2022 and headquartered in Seattle, Washington, Overland AI is transforming land operations for modern defense. The company leverages over a decade of advanced research in robotics and machine learning, as well as a field-test forward ethos, to deliver combined capabilities for unit commanders. Our OverDrive autonomy stack enables ground vehicles to navigate and operate off-road in any terrain without GPS or direct operator control.
Our intuitive OverWatch C2 interface provides commanders with precise coordination capabilities essential for mission success.
Overland AI is looking for an experienced Infrastructure Engineer to help design, build, and operate the systems that power our AI model training, experiment management, and robotic deployments. This role spans on-premise environments, cloud infrastructure, networking, and automation. You'll work hands-on with servers, storage, firewalls, wireless equipment, and high-performance compute resources, while also developing scalable tooling that improves reliability, observability, and developer velocity.
The ideal candidate has 5+ years of experience in infrastructure engineering, DevOps, SRE, or systems engineering, with deep knowledge of on-prem environments, AWS deployments at scale, and modern infrastructure-as-code and automation practices.
What You'll Do
- Build, operate, and evolve on-premise and cloud infrastructure supporting AI/ML development and robotics programs
- Develop CI/CD pipelines using GitLab or GitHub Actions
- Deploy and manage AWS environments including IAM, EC2, VPCs, and S3
- Implement and maintain infrastructure-as-code (Terraform, Ansible, Puppet, Chef, etc.)
- Install, configure, and troubleshoot physical servers, networking equipment, and storage systems
- Support Kubernetes clusters (clusteradm, Kops, EKS) and GitOps workflows (ArgoCD, Flux, Spinnaker)
- Build custom automation and internal infrastructure tooling
- Manage observability stacks (Prometheus/Grafana, ELK, Datadog, etc.)
- Partner closely with engineering teams to ensure reliability, security, and efficient scaling
- Document systems, processes, and runbooks to support local and remote teams
- 5+ years in DevOps, SRE, infrastructure engineering, or systems engineering
- Experience with AWS orchestration and deployments at scale
- CI/CD experience with GitLab, GitHub Actions, or similar platforms
- Proficiency with infrastructure-as-code tooling (Terraform, Ansible, Puppet, Chef, etc.)
- Experience with Kubernetes and GitOps patterns
- Experience with observability and monitoring stacks
- Experience with on-prem hardware environments (VMware, Proxmox, or equivalent)
- Hands-on experience building and troubleshooting physical servers and networks
- Strong Linux administration skills
- Deep understanding of networking: firewalls, L3 switching, routing, VPNs, WAN/wireless systems
- Ability to program in Python, Go, Rust, or a similar language (in addition to shell)
- Excellent documentation, communication, and collaboration skills
- Familiarity with experiment tracking, ML infrastructure, or data visualization tooling
- Experience integrating hardware or embedded systems
- Experience deploying or supporting wireless/WAN infrastructure in field, test, or event environments
- Familiarity with ML/AI infrastructure, high-performance compute clusters, or robotics-focused environments
- Ability to travel in-state, including occasional long days during deployments or testing
- Ability to travel out-of-state for ~1–2 weeks per year
- Ability to work onsite in our Seattle office at least 3 days per week
- Ability to participate in 24x7 on-call rotation
- Ability to obtain and maintain a DoD Security Clearance
- Competitive salary: $130K – $225K annually
- Equity compensation
- Best-in-class healthcare, dental, and vision plans
- Unlimited PTO
- 401(k) with company match
- Parental leave