Machine Learning Infrastructure Engineer Intern
Job in San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-06-18
Listing for: PlusAI
Apprenticeship/Internship position
Job specializations:
- Software Development: AI Engineer (Applied/Software), Machine Learning/ML Engineer
Job Description & How to Apply Below
Requirements
- Systems Programming: Strong proficiency in C++ and a solid understanding of memory management, computer architecture, and parallel processing principles
- Deep Learning Frameworks: Hands-on experience with PyTorch, specifically understanding custom operations, autograd, and training loops
- Performance-Oriented Mindset: Strong problem-solving skills with a deep interest in performance tuning, algorithmic efficiency, and low-level system optimization
- (Desirable) GPU Programming Experience: Practical experience writing and optimizing custom GPU kernels using CUDA or OpenAI Triton
- (Desirable) Hardware Profiling Tools: Familiarity with hardware and software profiling tools, particularly NVIDIA Nsight (Systems/Compute) and the PyTorch Profiler
- (Desirable) LLM for Code Generation: Experience using or prompting LLMs for code writing, refactoring, or exploring AI-assisted software development workflows
- (Desirable) Autonomous Vehicle Perception: A foundational understanding of Bird’s Eye View (BEV) models, 3D perception, or spatial transformers used in autonomous driving
Ready to get hands-on with real-world, large-scale data challenges? We’re seeking a Machine Learning Infrastructure Engineer Intern to join a project focused on identifying bottlenecks and implementing high-performance custom kernels (using CUDA, Triton, or C++) to accelerate BEV model training.
This internship will also explore the use of Large Language Models (LLMs) to assist in high-performance code generation, kernel optimization, and automated performance profiling with NVIDIA Nsight and the PyTorch Profiler.
- Identify Training Bottlenecks: Profile and analyze Bird’s Eye View (BEV) model training pipelines to pinpoint computational and memory bottlenecks
- Develop Custom Kernels: Design and implement high-performance custom compute kernels using CUDA, Triton, or C++ to accelerate the model training process
- Leverage LLMs for Optimization: Explore and integrate Large Language Models (LLMs) to assist in generating high-performance code and optimizing kernel logic
- Automate Profiling Workflows: Build systems to automate performance profiling and analysis using tools like NVIDIA Nsight and the PyTorch Profiler
- Iterative Performance Tuning: Continuously analyze profiling data generated by both human and LLM-assisted workflows to maximize GPU utilization and reduce training times