
Technical Lead, Runtime Software/Hardware (Spatial AI Accelerator)

Job in San Jose, Santa Clara County, California, 95199, USA
Listing for: Persimmons
Full Time position
Listed on 2026-03-09
Job specializations:
  • IT/Tech
    Hardware Engineer, AI Engineer, Systems Engineer, Machine Learning/ML Engineer
Salary Range: 100,000 – 125,000 USD per year

Who we are:

Persimmons is building the infrastructure that will power the next decade of AI. Founded in 2023 by veteran technologists from the worlds of semiconductors, AI systems, and software, we’re on a mission to enable smarter devices, more sustainable data centers, and entirely new applications the world hasn’t imagined yet.

Why join us:

We’re growing fast and looking for bold thinkers, builders, and curious problem-solvers who want to push the limits of AI hardware and software. If you’re ready to join a world-class team and play a critical role in making a global impact, we want to talk to you.

Summary of Role:

Persimmons.ai seeks a multidisciplinary Technical Lead for runtime software/hardware and compiler integration, focused on our next-generation custom spatial AI accelerator. You will architect and guide the runtime system bridging compiler, host, driver, device firmware, and control hardware: enabling high-performance, robust, and scalable execution of modern AI workloads.

This is a hands-on and technical leadership role spanning system design, cross-stack engineering, technical mentorship, and collaboration with compiler, ML framework, and hardware teams.

What you’ll do:
  • Architect, design, and implement the runtime stack for Persimmons' custom spatial accelerator, covering host drivers, device runtime, and hardware/firmware control loops.
  • Lead technical direction and decisions for runtime–hardware interface, device work and command queue infrastructure, and memory management.
  • Coordinate with compiler/backend, ML systems, and hardware architects to ensure seamless end-to-end ML model execution.
  • Define and co-design hardware support features essential to runtime: queueing structures, synchronization primitives, interrupt/event signaling, dispatching and orchestrating ML workloads on spatial execution fabric.
  • Drive performance analysis, develop tooling for tracing and bottleneck identification, and deliver runtime-level optimizations for latency, throughput, and hardware utilization.
  • Build and mentor a cross-disciplinary engineering team focused on runtime and system validation—establishing best practices, technical standards, and robust software-hardware collaboration.
  • Champion efficient tooling, simulation/emulation environments, and test infrastructure for system validation and robust runtime dev/debug.


What we’re looking for:

We do not expect candidates to meet all of the requirements listed below; strong candidates will demonstrate expertise in several key areas.

  • Deep experience architecting runtime software, device firmware, hardware interfaces, or control systems for AI accelerators and/or high-performance SoCs.
  • Hands‑on expertise developing drivers, resource managers, command/queue control, and dispatching and synchronization primitives (queues, barriers, event notifications) for custom hardware.
  • Strong understanding of C/C++ multi‑threaded programming and concurrent system design, including experience developing and debugging software that leverages threads, synchronization primitives, and parallel runtime constructs to maximize hardware utilization and performance in latency‑ and throughput‑sensitive environments.
  • Solid understanding of hardware–software co‑design principles: memory hierarchies, DMA engines, interconnects, job scheduling, on‑device synchronization.
  • Experience integrating kernel libraries into device runtime stacks—connecting optimized compute kernels (such as SIMD operations and common AI operator libraries) to runtime software through seamless invocation and well‑defined APIs, efficient scheduling and memory/resource management.
  • Experience with modern large language model (LLM) inference servers and serving stacks (e.g., vLLM, TensorRT‑LLM, Triton Inference Server, Hugging Face Text Generation Inference, Ray Serve), including their architecture, runtime scheduling, memory management, batching, streaming, and distributed deployment. Understanding of how runtime design, kernel integration, and hardware acceleration impact performance, scalability, and latency in LLM serving workloads.
  • Experience with system‑level performance tuning, debugging complex hardware–software interactions, and…