Senior AI Inference Engineer - Model Optimization & Deployment

Job in Foster City, San Mateo County, California, 94420, USA
Listing for: Zoox
Full Time position
Listed on 2026-05-22
Job specializations:
  • Software Development
    AI Engineer, Machine Learning/ ML Engineer
Salary/Wage Range or Industry Benchmark: 242,000 - 290,000 USD yearly
Job Description & How to Apply Below

The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence.

As a Model Optimization & Deployment Engineer, you will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands‑on experience in compressing, accelerating, and deploying complex models (LLMs, VLMs, or FMs) for power- and thermal‑constrained vehicle SoCs. You will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real‑time, deterministic execution on edge devices.

In This Role, You Will
  • Optimize large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs) using advanced quantization (PTQ, QAT), pruning, mixed-precision inference frameworks, and parameter-efficient fine‑tuning (LoRA, QLoRA).
  • Architect and implement model conversion and compilation pipelines using TensorRT for edge deployment.
  • Perform rigorous parity checking, accuracy recovery, and latency benchmarking between PyTorch frameworks and compiled edge binaries.
  • Develop and optimize custom ML OPs and TensorRT Plugins with efficient CUDA kernels to minimize latency and maximize memory bandwidth on AI accelerators.
  • Write production‑level, low‑latency, and memory‑safe C++ and CUDA code for real‑time inference on vehicle systems.
Qualifications
  • Deep expertise in model quantization (PTQ, QAT) and mixed‑precision inference frameworks (INT8, FP8, FP4, BF16/FP16).
  • Proven experience optimizing large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs/VLAs) utilizing efficient attention mechanisms (e.g., FlashAttention, linear attention), KV‑cache optimization (e.g., PagedAttention), and speculative decoding.
  • Extensive experience with model conversion/compilation pipelines (e.g., ONNX, TensorRT, torch.compile) and with performing rigorous latency benchmarking and model quality parity evaluation.
  • Proficiency in low‑level programming for AI accelerators, specifically developing and optimizing custom ML OPs and TensorRT Plugins with efficient CUDA kernel implementations.
  • Production‑level C++ (14/17/20) and Python programming skills, with experience developing concurrent, memory‑safe, real‑time inference code for edge devices.
Bonus Qualifications
  • Familiarity with SOTA autonomous driving perception algorithms (temporal 3D object detection, BEV, 3D Occupancy Networks) and multi‑modal sensor processing (Vision, LiDAR, Radar).
  • Experience with distributed training pipelines and model/tensor parallelism (PyTorch Distributed, Ray, DeepSpeed, Megatron‑LM) and runtime efficiency optimization for GPU clusters.
  • Experience with end‑to‑end autonomous driving paradigms (VLM/VLA models, Foundation models) and edge deployment technologies (e.g., TensorRT-LLM).
Base Salary Range

$242,000 - $290,000 a year

There are three major components to compensation for this position: salary, Amazon Restricted Stock Units (RSUs), and Zoox Stock Appreciation Rights. A sign‑on bonus may be offered as part of the compensation package. The listed range applies only to the base salary. Compensation will vary based on geographic location and level. Leveling, as well as positioning within a level, is determined by a range of factors, including, but not limited to, a candidate's relevant years of experience, domain knowledge, and interview performance.

The salary range listed in this posting is representative of the range of levels Zoox is considering for this position.

Accommodations

If you need an accommodation to participate in the application or interview process, please reach out to [email protected] or your assigned recruiter.

Position Requirements
10+ years of work experience