Machine Learning Infrastructure Engineer Intern | San Francisco Area, California, USA | Software Development

Requirements

Systems Programming:
Strong proficiency in C++ and a solid understanding of memory management, computer architecture, and parallel processing principles
Deep Learning Frameworks:
Hands‑on experience with PyTorch, including custom operations, autograd, and training loops
Performance‑Oriented Mindset:
Strong problem‑solving skills with a deep interest in performance tuning, algorithmic efficiency, and low‑level system optimization
(Desirable) GPU Programming Experience:
Practical experience writing and optimizing custom GPU kernels using CUDA or OpenAI Triton
(Desirable) Hardware Profiling Tools:
Familiarity with hardware and software profiling tools, particularly NVIDIA Nsight (Systems/Compute) and the PyTorch Profiler
(Desirable) LLMs for Code Generation:
Experience using or prompting LLMs to write and refactor code, or exploring AI‑assisted software development workflows
(Desirable) Autonomous Vehicle Perception:
A foundational understanding of Bird’s Eye View (BEV) models, 3D perception, or spatial transformers used in autonomous driving

What the job involves

Ready to get hands‑on with real‑world, large‑scale data challenges? We’re seeking a Machine Learning Infrastructure Engineer Intern to join a project focused on identifying training bottlenecks and implementing high‑performance custom kernels (using CUDA, Triton, or C++) to accelerate BEV model training.
Uniquely, this internship will also explore using large language models (LLMs) to assist with high‑performance code generation, kernel optimization, and automated performance profiling with NVIDIA Nsight and the PyTorch Profiler.
Identify Training Bottlenecks:
Profile and analyze Bird’s Eye View (BEV) model training pipelines to pinpoint computational and memory bottlenecks
Develop Custom Kernels:
Design and implement high‑performance custom compute kernels using CUDA, Triton, or C++ to accelerate the model training process
Leverage LLMs for Optimization:
Explore and integrate LLMs to assist in generating high‑performance code and optimizing kernel logic
Automate Profiling Workflows:
Build systems to automate performance profiling and analysis using tools like NVIDIA Nsight and the PyTorch Profiler
Iterative Performance Tuning:
Continuously analyze profiling data from both manual and LLM‑assisted workflows to maximize GPU utilization and reduce training time
