ML Software Engineer Job, San Francisco area, California, USA, IT/Tech

About ZETIC.ai

ZETIC.ai builds an end-to-end on-device AI deployment and benchmarking platform that helps companies run their existing AI models efficiently on real consumer devices—without relying on expensive cloud GPU infrastructure.

We specialize in hardware-aware optimization and deployment across heterogeneous mobile accelerators (NPU/GPU/CPU), enabling fast iteration, clear performance decisions, and controlled production rollout at scale.

Our mission is to make high-performance on-device AI practical and shippable for every team that already has models.

Job Description

We’re hiring an ML Software Engineer (On-Device AI Model Optimizations) to drive the end-to-end effort of porting and optimizing LLMs and multimodal models (ASR, TTS, Vision encoders, etc.) onto edge devices, especially mobile NPUs.

The Role

You will own the performance roadmap (latency, memory, power/thermal), lead model-side optimization strategy, and collaborate closely with runtime/SDK and app engineers to ship real deployments.

Responsibilities

Lead model-side optimization and deployment for LLM + multimodal workloads (ASR/TTS/Vision encoders, etc.) on NPU/GPU/CPU paths.
Own performance targets and trade-offs across latency / memory / accuracy / battery.
Apply model optimization techniques: quantization (PTQ/QAT), pruning, distillation, operator fusion, KV-cache strategies, attention optimizations, speculative decoding (where applicable), etc.
Build and maintain evaluation + profiling pipelines: on-device benchmarks, regression tracking, correctness checks, and performance dashboards.
Collaborate with runtime/SDK engineers to resolve compiler/runtime constraints (ops coverage, precision, layout, scheduling).
Work with product/engineering to define “ready-to-ship” criteria and ensure reliable production deployment across device variants.

Qualifications

3+ years (or equivalent) building and shipping ML systems, with substantial hands-on experience optimizing models for real-world deployment.
Strong understanding of deep learning fundamentals and performance bottlenecks (compute, memory bandwidth, cache behavior).
Practical experience with at least one of:
- LLM inference optimization (quantization, attention/KV cache, decode-time performance)
- ASR/TTS deployment (streaming, latency constraints, audio pre/post)
- Vision encoder optimization (image preprocessing, feature extraction performance)
Solid software engineering skills in Python + C/C++ (or equivalent low-level performance language).
Experience debugging numerical issues and ensuring correctness across mixed precision / quantized inference.
Comfortable working across ambiguous constraints and turning “it should be faster” into measurable engineering work.

Preferred Qualifications

Direct experience deploying to mobile/edge accelerators (NPU/DSP/GPU) and/or working with hardware vendor stacks.
Experience with model compilation toolchains and performance tooling (profilers, operator-level tracing, memory analysis).
Experience shipping SDKs or inference runtimes used by external developers.
Familiarity with multi-device deployment realities: device fragmentation, fallback paths, capability detection, and reproducibility.

Required Skillset

Edge/On-device ML optimization mindset (latency, memory, power, thermal)
Quantization & mixed-precision inference (PTQ/QAT; int8/fp16 strategies)
Performance profiling + debugging (numerical + system-level)

Preferred Skills

Model architecture understanding across transformers / conformers / diffusion-vocoders (as applicable)
Cross-functional collaboration (runtime/compiler/app/product)

Required Toolset

C/C++ (performance-critical components / integration work)
Benchmarking & profiling tools (device profilers, operator-level tracing, memory tools)

Must Have

Proven ability to make models materially faster/smaller on real devices (not just on GPU servers)
Can lead optimization efforts end-to-end with clear metrics and deliverables
Comfortable with heterogeneous execution (NPU/GPU/CPU fallbacks)

Compensation Range

Equity: meaningful early-stage option grant (role & level dependent)
Benefits: standard US benefits package (details shared during process)

Job Information

Company ZETIC.ai

Location San Francisco, CA; Seoul, South Korea

Employment Type Full-Time

Workplace Type On-Site
