Senior Software Engineer; ML Infrastructure
Job in San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-06-11
Listing for: Nuro
Full Time position
Job specializations:
- IT/Tech
- Data Engineering, Machine Learning/ML Engineer, Systems Engineer
Job Description
Requirements
- Experience: 5+ years of professional experience in ML Infrastructure, Backend Platform Engineering, or Distributed Systems.
- Resource Provisioning: Deep familiarity with modern Infrastructure-as-Code and provisioning tools such as Terraform, Pulumi, or Crossplane.
- Workload Scheduling: Hands-on experience building or managing large-scale orchestrators for compute-heavy workloads (e.g., Kubernetes, KubeRay, Ray, Slurm, or Volcano).
- Distributed Data Processing: Proficiency in at least one distributed processing framework, such as Apache Spark or Apache Beam, for large-scale data extraction and transformation.
- Feature Management: Experience implementing or maintaining feature stores and caching layers (e.g., Feast, Hopsworks, or Redis-based custom caching).
- Systems Design: A strong understanding of distributed systems, networking, and storage bottlenecks in the context of high-performance computing.
- (Desirable) Active contributor to open-source projects in the MLOps or Cloud-Native ecosystem (e.g., CNCF, Ray, or Kubeflow communities).
- (Desirable) Experience with high-performance storage systems (e.g., Lustre, Ceph, or specialized NVMe caching) for ML data loading.
- (Desirable) Knowledge of cost-optimization strategies for large-scale GPU clusters in public clouds (AWS, GCP, or Azure).

Nuro is seeking a Software Engineer with expertise in large-scale infrastructure, workload orchestration, and data processing to join our ML Infrastructure team. In this role, you will focus on building and evolving the core platform that provides researchers and engineers with seamless access to compute and data resources. You will be responsible for executing the technical strategy for automated resource provisioning, high-performance workload scheduling, and efficient feature management to accelerate the Nuro Driver™ development lifecycle. You will build the foundation that powers Nuro’s model development, from experimentation to production.
Key responsibilities include:
- Resource Provisioning & IaC: Scaling automated infrastructure-as-code (IaC) pipelines to manage thousands of GPU/CPU nodes across diverse environments.
- Intelligent Scheduling: Designing and optimizing workload orchestration to maximize hardware utilization, minimize job wait times, and handle massive-scale distributed training.
- Data & ETL: Designing robust pipelines for the extraction and transformation of petabyte-scale sensor and telemetry data into ML-ready formats.
- Feature Management: Implementing robust feature caching and storage solutions to reduce redundant computations and ensure low-latency access to pre-computed features.
- Platform Abstraction: Contributing to a unified ML platform that abstracts complex cloud infrastructure for end users.
Position Requirements
10+ years of work experience