Senior Firmware Engineer, Edge AI/NPU Runtime
Job in San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-06-05
Listing for: Tacit
Full Time position
Job specializations: IT/Tech; AI Engineer (Applied/Software), Hardware Engineer, Systems Engineer, Machine Learning/ML Engineer
Job Description
About Tacit
We are an early-stage deep-tech startup based in San Francisco, developing innovative hardware that rethinks human-computer interaction. We are backed by General Catalyst, Khosla Ventures, and Greylock Partners, with a founding team from Stanford, BrainGate, Oculus, and Tesla. While we can't reveal too much just yet, our team is tackling cutting-edge engineering challenges to bring revolutionary products to life.
About the role
We're looking for a Senior Firmware Engineer, Edge AI / NPU Runtime to help architect, optimize, and ship next-generation neurotech hardware with production-grade on-device intelligence. You will own critical parts of the embedded AI stack, from realtime sensor acquisition through preprocessing, NPU/DSP-accelerated inference, postprocessing, telemetry, and product deployment.
This is a hands-on role for someone who wants to work close to the hardware while shaping the intelligence users experience in the product. You'll help define how models run on-device, how sensor data moves through the system, and how we meet tight latency, reliability, and power budgets in real-world use.
What you'll do
Edge AI & NPU Inference
- Own deployment of ML models onto embedded targets using NPUs, DSPs, MCUs, or other hardware accelerators.
- Integrate embedded inference runtimes, vendor NPU/DSP SDKs, and model deployment workflows into production firmware.
- Optimize inference latency, memory footprint, throughput, power consumption, and accelerator utilization on production hardware.
- Partner with ML teams on quantization, operator support, model architecture tradeoffs, calibration datasets, and accuracy/performance regressions.
Realtime Sensor-to-Inference Systems
- Build realtime sensor-to-inference pipelines, including acquisition, time stamping, synchronization, preprocessing, feature extraction, model execution, and postprocessing.
- Design low-latency data movement using DMA, interrupts, ring buffers, deterministic scheduling, and efficient memory layouts.
- Support streaming inference patterns such as sliding windows, temporal models, event-driven execution, and continuous sensor processing.
- Maintain inference quality and timing guarantees under real-world conditions such as sensor noise, clock drift, dropped samples, variable system load, and power-state transitions.
Power-Optimized Embedded Firmware
- Optimize end-to-end energy per inference across sensing, preprocessing, model execution, postprocessing, and idle time.
- Use low-power firmware techniques such as sleep states, duty cycling, subsystem power gating, clock scaling, batching/windowing, and dynamic power management.
- Profile and improve power consumption across sensors, CPU, NPU/DSP, memory, and supporting firmware infrastructure.
Product Quality & Debugging
- Bring up and debug firmware across sensors, accelerators, power systems, embedded compute, and production hardware.
- Use lab tools, traces, logs, telemetry, and instrumentation to root-cause complex embedded system issues.
- Translate product and customer experience goals into concrete latency, reliability, responsiveness, and power targets.
- Build diagnostics, validation hooks, and performance benchmarks to ensure reliable real-world edge inference behavior.
Qualifications
- 5+ years of experience in embedded firmware, embedded systems, or edge ML systems.
- Strong C/C++/Rust experience on resource-constrained embedded platforms.
- Experience with RTOS-based systems such as FreeRTOS, Zephyr, ThreadX, or similar.
- Experience deploying or optimizing ML inference on embedded targets, NPUs, DSPs, MCUs, or edge SoCs.
- Strong understanding of realtime embedded systems, including DMA, interrupts, concurrency, memory management, and low-latency data movement.
- Experience optimizing embedded systems for latency, memory footprint, throughput, and power consumption.
- Hands-on debugging and bring-up experience across embedded hardware and firmware systems, with strong cross-functional communication across firmware, ML, electrical, software, and product teams.
- Experience with embedded inference runtimes, deployment toolchains, or edge AI SoCs/accelerators such as TensorFlow Lite Micro, ONNX Runtime, CMSIS-NN, Qualcomm QNN/SNPE, Arm Ethos-U/Vela, TVM, ExecuTorch, Qualcomm, Arm, Cadence/Tensilica, Syntiant, Ambiq, Nordic, NXP, ST, Hailo, Google Edge TPU, or similar.
- Experience with quantized inference, fixed-point math, SIMD/DSP optimization, accelerator programming, or model conversion workflows.
- Experience with streaming or time-series ML workloads such as biosignals, sensor fusion, audio, gesture recognition, keyword spotting, or other realtime inference systems.
- Experience shipping battery-powered consumer electronics, wearable, neurotech, AR/VR, robotics, camera, IoT, or other embedded AI products.
$150,000 - $200,000/year
Benefits
- Competitive equity package
- Comprehensive medical, dental, and vision insurance
- Company size: 20-30 people
- Unlimited PTO
- Visa sponsorship
- 3% 401(k) matching
Position Requirements
10+ years work experience