
ML Software Engineer

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: CAPSA
Full Time position
Listed on 2026-06-15
Job specializations:
  • IT/Tech
    AI Engineer (Applied/Software), Machine Learning/ ML Engineer, Systems Engineer, Data Scientist
Salary/Wage Range or Industry Benchmark: 120,000 - 150,000 USD yearly
Job Description & How to Apply Below

About ZETIC.ai

ZETIC.ai builds an end-to-end on-device AI deployment and benchmarking platform that helps companies run their existing AI models efficiently on real consumer devices—without relying on expensive cloud GPU infrastructure.

We specialize in hardware-aware optimization and deployment across heterogeneous mobile accelerators (NPU/GPU/CPU), enabling fast iteration, clear performance decisions, and controlled production rollout at scale.

Our mission is to make high-performance on-device AI practical and shippable for every team that already has models.

Job Description

We’re hiring an ML Software Engineer (On-Device AI Model Optimizations) to drive the end-to-end effort of porting and optimizing LLMs and multimodal models (ASR, TTS, Vision encoders, etc.) onto edge devices, especially mobile NPUs.

The Role

You will own the performance roadmap (latency, memory, power/thermal), lead model-side optimization strategy, and collaborate closely with runtime/SDK and app engineers to ship real deployments.

Responsibilities
  • Lead model-side optimization and deployment for LLM + multimodal workloads (ASR/TTS/Vision encoders, etc.) on NPU/GPU/CPU paths.
  • Own performance targets and trade-offs across latency / memory / accuracy / battery.
  • Apply optimization techniques such as quantization (PTQ/QAT), pruning, distillation, operator fusion, KV-cache strategies, attention optimizations, and speculative decoding (where applicable).
  • Build and maintain evaluation + profiling pipelines: on-device benchmarks, regression tracking, correctness checks, and performance dashboards.
  • Collaborate with runtime/SDK engineers to resolve compiler/runtime constraints (ops coverage, precision, layout, scheduling).
  • Work with product/engineering to define “ready-to-ship” criteria and ensure reliable production deployment across device variants.
Qualifications
  • 3+ years (or equivalent) building and shipping ML systems, with substantial hands-on experience optimizing models for real-world deployment.
  • Strong understanding of deep learning fundamentals and performance bottlenecks (compute, memory bandwidth, cache behavior).
  • Practical experience with at least one of:
    • LLM inference optimization (quantization, attention/KV cache, decode-time performance)
    • ASR/TTS deployment (streaming, latency constraints, audio pre/post-processing)
    • Vision encoder optimization (image preprocessing, feature extraction performance)
  • Solid software engineering skills in Python + C/C++ (or equivalent low-level performance language).
  • Experience debugging numerical issues and ensuring correctness across mixed precision / quantized inference.
  • Comfortable working across ambiguous constraints and turning “it should be faster” into measurable engineering work.
Preferred Qualifications
  • Direct experience deploying to mobile/edge accelerators (NPU/DSP/GPU) and/or working with hardware vendor stacks.
  • Experience with model compilation toolchains and performance tooling (profilers, operator-level tracing, memory analysis).
  • Experience shipping SDKs or inference runtimes used by external developers.
  • Familiarity with multi-device deployment realities: device fragmentation, fallback paths, capability detection, and reproducibility.
Required Skillset
  • Edge/On-device ML optimization mindset (latency, memory, power, thermal)
  • Quantization & mixed-precision inference (PTQ/QAT; int8/fp16 strategies)
  • Performance profiling + debugging (numerical + system-level)
Preferred Skills
  • Model architecture understanding across transformers / conformers / diffusion-vocoders (as applicable)
  • Cross-functional collaboration (runtime/compiler/app/product)
Required Toolset
  • C/C++ (performance-critical components / integration work)
  • Benchmarking & profiling tools (device profilers, operator-level tracing, memory tools)
Must Have
  • Proven ability to make models materially faster/smaller on real devices (not just on GPU servers)
  • Can lead optimization efforts end-to-end with clear metrics and deliverables
  • Comfortable with heterogeneous execution (NPU/GPU/CPU fallbacks)
Compensation Range
  • Equity: meaningful early-stage option grant (role & level dependent)
  • Benefits: standard US benefits package (details shared during process)
Job Information

Company: ZETIC.ai

Location: San Francisco, CA; Seoul, South Korea

Employment Type: Full-Time

Workplace Type: On-Site
