ML Infrastructure Engineer — ML Platform, Tooling & Systems

Job in Irvine, Orange County, California, 92713, USA
Listing for: Medium
Full Time position
Listed on 2025-11-03
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning/ ML Engineer, Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 70,000 - 300,000 USD yearly
Job Description & How to Apply Below
Position: 1.61 ML Infrastructure Engineer — ML Platform, Tooling & Systems

Overview

Field AI is transforming how robots interact with the real world. We are building risk-aware, reliable, and field-ready AI systems that solve the hardest challenges in autonomy and robotics, unlocking the full potential of embodied intelligence. Our solutions go beyond conventional data-driven ML or purely transformer-based architectures. They are already deployed globally, learn from experience in the field, and deliver tangible, continuously improving real-world results.

Are you excited by the challenge of supporting ML teams with robust, scalable infrastructure? Do you want to help accelerate real-time robotics through better developer workflows and reliable systems?

Field AI is hiring an ML Infrastructure Engineer to own the software platform and tooling that enables fast, reliable AI development and deployment across our ML and robotics stacks.

What You Will Get To Do
  • Build ML Infrastructure & Developer Tooling
      • Design and implement internal tools, libraries, and CLI utilities that streamline experimentation, model training, and evaluation.
      • Improve local and cloud development environments using Docker, internal package registries, and monorepos.
      • Build reusable templates and interfaces for training, evaluation, and inference pipelines.
  • Support the ML Lifecycle (Data → Models → Deployment)
      • Develop pipelines for dataset ingestion, transformation, versioning, and validation.
      • Automate model training, evaluation, packaging, and deployment to cloud and edge environments.
      • Ensure integrity and traceability across data, code, and model artifacts.
  • Improve Build Systems and Developer Experience
      • Maintain and evolve a shared monorepo across ML, robotics, and software teams.
      • Leverage Bazel or similar systems to enable fast, reproducible builds and tests.
      • Enhance developer workflows to support consistent environments and reduce friction.
  • Own CI/CD and Automation for ML Systems
      • Build and maintain CI/CD pipelines (e.g., GitHub Actions, AWS Step Functions) for ML experimentation and deployment.
      • Automate regression testing and benchmarking of models.
      • Develop observability tools: dashboards, telemetry systems, and model health monitoring.
  • Collaborate Across Engineering & Research Teams
      • Work closely with ML scientists, software engineers, and roboticists to translate high-level platform needs into robust engineering solutions.
      • Participate in code and design reviews, documentation, and cross-team planning.
What You Have
  • 3+ years of industry experience in software engineering, infrastructure, MLOps, or DevOps roles.
  • Deep familiarity with the ML lifecycle, including data preparation, model training, packaging, and deployment.
  • Strong software engineering foundations: proficiency with Git, Python, and system design.
  • Experience building and managing containerized environments (e.g., Docker) and working with orchestration tools (e.g., Kubernetes).
  • Hands-on experience with CI/CD workflows and infrastructure-as-code (e.g., Terraform, AWS CDK).
  • Experience with cloud ML platforms (AWS, GCP, or Azure).
  • A strong product mindset — building internal tools with empathy for researchers and engineers.
What Will Set You Apart
  • Experience with distributed training frameworks (e.g., PyTorch DDP, FSDP, DeepSpeed, Megatron).
  • Familiarity with orchestrating large-scale training jobs using Kubernetes-based platforms (e.g., Ray, SageMaker, EKS, Karpenter).
  • Background in hybrid edge-cloud ML deployments or infrastructure supporting robotic systems.
  • Prior work in environments requiring real-time ML performance, safety validation, or regulatory traceability.
Compensation and Benefits

Our salary range is $70,000 - $300,000 annually, but we take into consideration an individual's background…
