Senior Engineer GPU Kernel and Performance
Listed on 2026-06-09
IT/Tech
AI Engineer (Applied/Software), Systems Engineer
Dive in and do the best work of your career. Journey alongside a strong community of top talent who are relentless in their drive to build the simplest scalable cloud. If you have a growth mindset, naturally like to think big and bold, and are energized by the fast‑paced environment of a true industry disruptor, you’ll find your place here. We value winning together while learning, having fun, and making a profound difference for the dreamers and builders in the world.
DigitalOcean is seeking a Senior Engineer 2 to play a key technical role in our AI Inference Optimization team. DigitalOcean aims to be the Inference Cloud of choice for digitally native companies, and you will help ensure we offer industry‑leading performance for our inference services. You will be responsible for the architectural decisions that maximize throughput and minimize latency for the world’s most advanced large models.
As an IC leader, you will act as a force multiplier for the engineering organization, solving the most complex bottlenecks in memory bandwidth and compute utilization while guiding the technical roadmap for our high‑performance inference fleet.
- Performance Architecture: Lead the technical strategy for benchmarking and performance optimizations at the inference engine and GPU kernel layers, ensuring our infrastructure extracts maximum value from every TFLOP.
- Deep‑Dive Optimization: Engineer solutions for complex performance issues, including attention layer optimizations, memory and precision management, and advanced parallelization across multi‑node GPU clusters.
- Technological Innovation: Proactively implement cutting‑edge optimization techniques to keep DigitalOcean at the forefront of the Gen AI landscape. Some examples of projects you may work on:
  - Improving batch size performance using AMD’s AITER library for AMD MI355X: identify and tune AITER’s CK (composable kernel) or ASK (assembly) to optimize FP8 / BF16
  - Identify kernel fusion opportunities for GLM‑5 kernels for different layers of the Transformer block (Flash Attention, RMS Norm)
  - Tune expert gateway router kernels for MoE models like Qwen3‑235B, DeepSeek V3, GLM‑5, etc.
- Hardware & Ecosystem Mastery: Act as the subject matter expert on modern GPU families (NVIDIA/AMD) and their software stacks (CUDA, ROCm, TensorRT, OpenAI Triton), advising on hardware procurement and software integration.
- Technical Mentorship: Lead by example through high‑quality code and design reviews, elevating the technical bar for the team without the administrative overhead of direct management.
- Strategic Collaboration: Partner with Product Management and TPMs to translate "theoretical hardware limits" into "shippable product features," ensuring our platform is both powerful and developer‑friendly.
- Community Leadership: Maintain a strong presence in the GPU infrastructure and model performance optimization communities, contributing to and integrating the best of open‑source AI.
- Technical Depth: 5+ years of experience in high‑performance computing or AI infrastructure, with a proven track record of solving compute utilization and memory bandwidth bottlenecks.
- Gen AI Literacy: Deep familiarity with the Gen AI (LLM, VLM, LMM) landscape, including the specific quirks and architectural requirements of major model families.
- Optimization Expert: Hands‑on experience with attention‑layer optimizations and parallelization strategies across distributed GPU environments.
- Hardware Fluency: Comprehensive understanding of NVIDIA and AMD GPU architectures and their respective software ecosystems (CUDA, ROCm, etc.).
- Open Source Mastery: Extensive experience integrating, building with, and contributing to open‑source software projects.
- Systems Design: Excellent system design skills, particularly related to low‑level GPU programming: optimization, memory access patterns, and parallel execution.
- Leadership through Influence: Experience acting as a technical lead, driving design and delivery through cross‑functional alignment and expert‑level delegation.
- $ to $209,000
- This is a remote role
- We…