ML Systems Engineer, Distributed Systems Job, Irvine area, California USA, Software Development

Position: Staff ML Systems Engineer, Distributed Systems

Field AI’s Irvine team is where embodied AI meets real robots, real sensors, and real field deployments. Based in the heart of Southern California’s robotics ecosystem, we build risk‑aware, reliable, field‑ready AI systems that solve the hardest problems in robotics and unlock the full potential of embodied intelligence. If you want your work to ship, get tested on hardware, and improve through real deployments, Irvine is the place.

We go beyond typical data‑driven approaches or pure transformer‑only architectures, combining rigorous engineering with learning systems proven in globally deployed solutions that deliver results today and get better every time our robots run in the field.

We are seeking a Senior / Staff ML Systems Engineer to architect and build the distributed infrastructure that powers large‑scale machine learning workflows across the organization.

This role sits at the intersection of machine learning, distributed systems, and platform engineering. You will be responsible for designing scalable systems that support data processing, model training, evaluation, and post‑processing pipelines while enabling ML teams to efficiently develop, operate, and scale production‑grade workflows.

You will play a critical role in defining the architectural patterns, tooling, and infrastructure that underpin our machine learning platform.

What You’ll Get to Do

Design and build scalable distributed machine learning pipelines across data processing, model training, evaluation, and post‑processing workflows
Architect distributed execution systems, including parallelization strategies, workload scheduling, resource allocation, and fault tolerance mechanisms
Develop reusable abstractions, frameworks, and libraries that simplify distributed pipeline development
Optimize performance across distributed CPU and GPU environments, improving throughput, utilization, and reliability
Design systems that effectively manage data partitioning, memory utilization, serialization overhead, and compute efficiency
Partner closely with ML engineers, data engineers, and infrastructure teams to productionize research workflows and enable large‑scale model development
Establish best practices and engineering standards for distributed machine learning infrastructure
Evaluate and guide decisions around distributed computing frameworks, infrastructure technologies, and system design trade‑offs
Improve observability, debugging, monitoring, and operational tooling for distributed systems at scale

What You Have

5+ years of experience building distributed systems, backend infrastructure, machine learning platforms, or large‑scale data processing systems
Strong Python programming skills, including experience with concurrency, performance optimization, and systems development
Experience with distributed computing frameworks such as Ray, Spark, Dask, Flink, or similar technologies
Experience designing and scaling data pipelines or machine learning workflows
Strong system design skills with demonstrated expertise in scalability, reliability, and performance optimization
Experience diagnosing and resolving bottlenecks in distributed environments
Ability to work cross‑functionally and drive technical decisions across multiple teams

The Extras That Set You Apart

Experience building infrastructure for machine learning training and inference systems
Familiarity with modern ML frameworks such as PyTorch or TensorFlow
Experience with multi‑node or multi‑GPU training architectures, including DDP, FSDP, DeepSpeed, or similar technologies
Experience operating Kubernetes‑based infrastructure and large‑scale cloud systems
Deep understanding of distributed systems concepts including data locality, serialization costs, scheduling, and resource management
Experience with distributed debugging, observability, and workflow orchestration platforms
Proven ability to establish technical direction and influence architecture across organizations

Our salary range is generous. Base pay may vary based on role scope, job‑related knowledge, skills, experience, and the Irvine, California market.

Why Join Field AI in Irvine?

In Irvine, you will work where the robots are. Our local team…