Senior ML Infrastructure Engineer; PyTorch, Kubernetes, GPU Training
Job in Redwood City, San Mateo County, California, 94061, USA
Listed on 2026-07-04
Listing for: Finoit Inc
Full Time, Apprenticeship/Internship position
Job specializations:
- Software Development
- Machine Learning/ML Engineer, Data Engineering
Job Description & How to Apply Below
Senior ML Infrastructure Engineer (PyTorch, Kubernetes, GPU Training)
Short Job Description
We are seeking a Senior ML Infrastructure Engineer to design and scale the infrastructure powering large-scale machine learning training workloads. In this role, you'll build high-performance GPU training platforms, optimize distributed training pipelines, and improve the developer experience for ML researchers.
Responsibilities
- Design and scale distributed ML training infrastructure for large GPU clusters.
- Build and optimize training pipelines using PyTorch, DeepSpeed, and distributed training frameworks.
- Develop and maintain job scheduling systems using Kubernetes and/or SLURM.
- Create high-throughput data pipelines for large-scale multimodal datasets.
- Optimize GPU utilization, memory efficiency, and overall system performance.
- Build low-latency inference pipelines for production ML deployments.
Qualifications
- 7+ years of experience in ML Infrastructure, HPC, or Distributed Systems.
- Strong experience with PyTorch, DeepSpeed, FSDP, ZeRO, or similar distributed training frameworks.
- Hands-on experience with Kubernetes, cloud platforms (AWS/GCP), and containerized environments.
- Strong understanding of distributed systems, GPU optimization, NCCL, memory management, and performance tuning.
- Experience building scalable ML infrastructure from development through production.
Location: Redwood City, CA (On-site)
Employment Type: Full-Time
- Experience with multimodal AI, robotics data pipelines, Triton, TensorRT, custom ML kernels, or ML compiler/runtime optimization.
Position Requirements
10+ years of work experience