Senior Site Reliability Engineer
Listed on 2025-12-06
IT/Tech
Systems Engineer, Cloud Computing
Anduril Industries is a defense technology company with a mission to transform U.S. and allied military capabilities with advanced technology. By bringing the expertise, technology, and business model of the 21st century’s most innovative companies to the defense industry, Anduril is changing how military systems are designed, built, and sold. Anduril’s family of systems is powered by Lattice OS, an AI-powered operating system that turns thousands of data streams into a real-time, 3D command and control center.
As the world enters an era of strategic competition, Anduril is committed to bringing cutting-edge autonomy, AI, computer vision, sensor fusion, and networking technology to the military in months, not years.
Anduril Maritime delivers platforms, systems, and integrated effects in the maritime domain. Our autonomous vehicles (sub-surface and surface) are the cornerstone of these capabilities, and we continually strive to push the boundaries of the possible in terms of endurance, autonomy and mission capability. The Maritime team develops and maintains core products and payloads, and adapts and applies those products to serve a wide variety of defense, IC and commercial customers in US and international markets.
ABOUT THE JOB
As a Senior Site Reliability Engineer on the Maritime Digital Shipbuilding team, you will build and operate the infrastructure that keeps our digital production systems running at full speed. You’ll develop and manage CI/CD pipelines, automate infrastructure with code, and deploy applications and machine learning models across cloud and edge environments with security, traceability, and reliability in mind.
You’ll work closely with software, data, and operations engineers to turn designs into working systems—streamlining development, improving performance, and keeping production stable as we scale. You’ll also collaborate with digital, manufacturing, and corporate technology teams across Anduril in a high-tech, fast-paced culture of innovation focused on solving real problems and delivering results.
If you’re driven to build systems that last, thrive on deep technical challenges, and want to see your work directly shape how we design, build, and sustain complex platforms, you’ll be helping build the future of digital shipbuilding and the next generation of maritime vehicles.
WHAT YOU'LL DO
- Build and Manage CI/CD Pipelines: Develop and maintain CI/CD pipelines using tools like GitHub Actions and JFrog Artifactory to ensure seamless integration and deployment of machine learning models and applications.
- Infrastructure as Code (IaC): Utilize Terraform and Ansible to automate infrastructure provisioning and management on cloud platforms such as Azure, AWS, or Google Cloud Platform (GCP).
- Containerization and Orchestration: Implement containerization solutions with Docker and manage container orchestration using Kubernetes to ensure reliable deployment and scaling of applications.
- Model Management and Deployment: Set up and maintain model registries and feature stores (e.g., MLflow, Kubeflow), and manage deployment pipelines for both batch and real-time inference.
- Monitoring and Logging: Establish comprehensive monitoring and logging solutions using tools like ELK Stack (Elasticsearch, Logstash, Kibana), Prometheus, and Grafana to ensure the smooth operation of deployment environments.
- Collaborate with Cross-Functional Teams: Work closely with development, data science, and operations teams to foster collaboration and ensure the efficient and effective deployment of machine learning models.
- Optimize Performance: Utilize parallel computing frameworks such as CUDA and OpenCL to accelerate high-performance computing tasks, ensuring timely processing of large datasets and complex simulations.

- Advanced proficiency in programming languages (C++ for high-performance computing, Python for scripting and integration).
- Experience with CI/CD tools like GitHub Actions, JFrog Artifactory, and Git.
- Proficiency with IaC tools (Terraform, Ansible).
- Experience with cloud platforms (Azure, AWS, GCP).
- Proficiency in containerization (Docker) and container…