Senior ML Accelerator Engineer - GPU
Listed on 2026-06-12
Software Development
AI Engineer (Applied/Software)
GM’s vision of Zero Crashes, Zero Emissions, and Zero Congestion guides everything we do in autonomous and assisted driving. The AV organization is building advanced automated driving technologies, including Level 4–capable fully self-driving systems, to move us toward safer, more sustainable, and more accessible mobility.
For the AI Kernels & Compilers team, that mission shows up in the details: turning cutting‑edge perception, prediction, and planning research into production‑grade software that runs efficiently and reliably on real vehicles, and pioneering new approaches to model export, kernel development, and performance engineering so that every cycle on our accelerators translates into better situational awareness, faster reaction times, and more robust behavior on the road.
If you want your compiler and kernels work to directly influence how automated vehicles understand and react to the world — while operating at the safety, reliability and scale of a company like GM — this is where that impact becomes real.
About the Team
The AI Kernels team builds high‑performance GPU kernels and custom libraries that sit at the heart of our on‑vehicle ML inference for ADAS and autonomous driving. We are responsible for making core AI workloads faster, more reliable, and easier to maintain and deploy on real cars, under real‑world constraints.
That means:
Designing and implementing custom operators when vendor libraries hit their limits
Integrating those kernels deep into our ML runtime stack
Debugging and tuning GPU performance across the AV software stack, often on hardware‑in‑the‑loop
We partner closely with AI Solutions, AI Compilers, AI Architecture, and AI Tooling to ensure models deploy efficiently to the car while consistently meeting strict latency, throughput, and reliability targets. If you enjoy pushing GPUs to their limits and seeing your work directly impact how autonomous vehicles perceive and act in the world, this is the team for you.
What you’ll be doing (Responsibilities)
Design, implement, benchmark, and iterate on CUDA-based kernels and custom operators to squeeze every last drop of performance out of on-vehicle inference workloads.
Build and improve tooling and infrastructure that make it easier to profile, debug, and validate CUDA kernels and accelerator-backend code across the AV stack.
Partner with AI Solutions, Compilers, and Architecture to translate model and system requirements into concrete kernel roadmaps, priorities, and project plans.
Collaborate with cross-functional teams (compiler, performance tooling, runtime, deployment solutions) to deliver reusable, reliable, high-performance libraries into production.
Uphold high technical standards, methodologies, processes, and guidelines for GPU kernel development and performance engineering, including through code review.
Manage relationships with internal customers to ensure our kernels and libraries meet real-world needs.
Your Skills & Abilities (Required Qualifications)
2+ years of relevant industry experience or equivalent experience.
BS, MS, or PhD in CS or a related technical field.
Excellent GPU programming skills in CUDA, with a thorough understanding of parallel programming patterns and GPU architecture.
Hands‑on experience benchmarking, profiling, debugging, and optimizing accelerator libraries and kernels to extract optimal performance using the Nsight suite of tools or similar.
Strong background in software architecture, library design, and design patterns.
Strong C++ programming skills and comfort working in large codebases.
Solid background in system performance, high performance computing and/or architecture‑aware optimizations.
Strong communication skills and the ability to work collaboratively within a team.
Excellent analytical and problem‑solving skills.
What Will Give You A Competitive Edge (Preferred Qualifications)
2+ years of relevant industry experience or equivalent experience
Experience with tensor core programming, CUTLASS, and/or CuTe
Experience with ML model architectures, particularly transformer‑based models
Experience with low latency or real time systems
Experience with lower levels of an accelerator software stack (e.g. drivers,…