Machine Learning Engineer, ML Infrastructure
Listed on 2026-06-18
IT/Tech
Machine Learning/ ML Engineer, Data Engineering
San Francisco, CA, USA
Staff Machine Learning Engineer, ML Infrastructure
Department: AI & Machine Learning
Requisition: JOBREQ-2615904
Role description
Unity Vector builds an offline ML platform that powers insight, experimentation, attribution, and AI-driven decision-making across the company. Our systems operate at scale across batch and streaming data, supporting analytics, product intelligence, machine learning pipelines, and business operations. As data volume and complexity grow, our platform also supports large-scale model training, feature generation, and experimentation workflows that power production ML systems.
To support this growth, we need strong technical ownership to ensure our ML pipelines remain reliable, scalable, and architecturally sound. We are seeking a staff ML engineer to design and evolve the large-scale offline platform. This role focuses on building reliable infrastructure for generating training datasets, orchestrating ML workflows, and enabling efficient distributed model training. You will work closely with ML engineers and platform teams to ensure our pipelines can efficiently handle growing data volumes and increasingly complex training workloads.
You will play a key role in shaping how training datasets are prepared, validated, and delivered to distributed training systems, while ensuring the reliability, scalability, and performance of our offline ML platform.
What you'll be doing
- Design and operate large-scale data pipelines that generate training datasets used for machine learning training and experimentation
- Develop infrastructure that supports distributed training workflows using technologies such as PyTorch, Ray Data, and Ray Train
- Integrate ML pipelines with workflow orchestration systems (e.g., Flyte, Airflow, or similar) to enable reliable multi-stage training workflows
- Improve reproducibility and observability of ML pipelines through dataset validation, monitoring, and automated testing
- Optimize performance and resource utilization across distributed compute systems used for data processing and model training
- Partner closely with ML engineers to enable efficient large-scale experimentation and model iteration
- Lead architectural improvements to ensure our offline ML pipelines remain scalable, reliable, and cost‑efficient
What we're looking for
- Strong experience building large-scale ML pipelines
- Experience working with distributed computing frameworks such as Ray, Spark, or Flink, and familiarity with the Ray ecosystem (Ray Data, Ray Train) for distributed data processing and model training
- Experience building infrastructure for training data generation, dataset preparation, or ML feature pipelines
- Deep experience designing and operating production‑grade data pipelines
- Strong programming skills in Python and experience working with large-scale distributed workloads
- Experience with modern data infrastructure (data lakes, warehouses, orchestration systems, streaming platforms)
- Strong systems thinking, with the ability to reason about performance, scalability, reliability, and cost trade‑offs in distributed systems
- Proven ability to lead technical direction and influence architectural decisions across teams without formal authority
Relocation support is not available for this position.
At Unity, we want our team members to thrive. We offer a wide range of benefits designed to support well‑being and work‑life balance. While specific benefits vary, here are some of the ways we strive to take care of our eligible team members globally: comprehensive health, life, and disability insurance; commute subsidy; employee stock ownership; competitive retirement/pension plans; generous vacation and personal days; support for new parents through leave and family‑care programs; office food and snacks; mental health and wellbeing programs and support; employee resource groups; global employee assistance program; training and development programs; volunteering and donation matching program.
Gross pay salary: $209,700 to $283,800 USD
Equal Opportunity
Unity is a proud equal opportunity employer. We are committed to fostering an inclusive,…