GPU Pipeline Microarchitect & RTL Designer
Listed on 2025-11-13
Engineering
Systems Engineer
Own microarchitecture and RTL for high-throughput GPU pipeline blocks. You'll translate product goals into clear specs, deliver timing-clean RTL, and partner with verification, physical design, and, critically, the GPU architecture team to land measurable PPA wins.
Responsibilities
- Microarchitecture: Define pipeline stages, flow control, queues/buffers, and interfaces; write concise design specs and lead reviews.
- RTL Design & PPA: Implement clean, synthesizable SystemVerilog; drive performance, power, and area optimizations (datapaths, arbitration, backpressure, gating).
- Architecture Collaboration: Work day-to-day with the architecture team to refine requirements, align on performance targets, and iterate on microarchitecture choices using data.
- Verification Partnership: Build unit tests, create coverage plans, and author SVA; collaborate with UVM and formal verification teams to close corner cases.
- Quality & Sign-off: Run lint/CDC/RDC; support synthesis, STA, and timing convergence; engage with PD/DFT on constraints and test.
- Bring-up & Debug: Support emulation/FPGA and silicon bring-up; instrument counters, analyze traces, and root-cause issues end-to-end.
- Communication & Teamwork: Communicate trade-offs clearly across architecture, software, and PD; mentor peers and contribute to cross-IP integration.
Qualifications
- 5+ years of industry experience on desktop, mobile, or data center GPUs, with real ownership of shipped projects.
- Proficient in RTL design (SystemVerilog) and PPA optimization across performance, power, and area.
- Team player with a strong understanding of overall GPU architecture and microarchitecture (SIMT/SIMD execution, scheduling and flow control, memory hierarchy).
- Hands-on: able to build unit tests, drive coverage-based verification (functional and code), and write robust SVA.
Depth in at least one of the following domains:
- Instruction Scheduler (warp/wavefront issue, fairness, QoS)
- Job Scheduler / Command Submission
- L1/L2 Cache Design (coherency, miss handling, prefetch)
- Command Processor (front-end, MMIO, context management)
- Tensor Core Design (matrix/tensor datapaths, mixed precision)
Nice to Have
- Experience with ray tracing, texture/sampler, ROP/blend, or MMU/TLB blocks.
- Performance modeling, perf counter design, and trace analysis.
- EDA fluency: VCS/Questa, Verdi, Jasper/IFV, DC/Genus, PrimeTime/Tempus; emulation (Palladium/Veloce) or FPGA prototyping.
- Collaboration with compiler/LLVM and driver/runtime teams.
What Success Looks Like
- Tapeout-quality RTL for one or more pipeline blocks with signed-off PPA.
- Coverage closure against a clear plan (meeting or exceeding target functional and code coverage), with SVA-backed correctness.
- Demonstrated performance/power gains over baseline on target workloads.
Seniority level: Mid-Senior level
Employment type: Full-time
Industries: Technology, Information and Internet