ML Software Engineer
Listed on 2026-06-15
IT/Tech
AI Engineer (Applied/Software), Machine Learning/ML Engineer, Systems Engineer, Data Scientist
About ZETIC.ai
ZETIC.ai builds an end-to-end on-device AI deployment and benchmarking platform that helps companies run their existing AI models efficiently on real consumer devices—without relying on expensive cloud GPU infrastructure.
We specialize in hardware-aware optimization and deployment across heterogeneous mobile accelerators (NPU/GPU/CPU), enabling fast iteration, clear performance decisions, and controlled production rollout at scale.
Our mission is to make high-performance on-device AI practical and shippable for every team that already has models.
Job Description
We’re hiring an ML Software Engineer (On-Device AI Model Optimizations) to drive the end-to-end effort of porting and optimizing LLMs and multimodal models (ASR, TTS, Vision encoders, etc.) onto edge devices, especially mobile NPUs.
The Role
You will own the performance roadmap (latency, memory, power/thermal), lead model-side optimization strategy, and collaborate closely with runtime/SDK and app engineers to ship real deployments.
Responsibilities
- Lead model-side optimization and deployment for LLM + multimodal workloads (ASR/TTS/Vision encoders, etc.) on NPU/GPU/CPU paths.
- Own performance targets and trade-offs across latency / memory / accuracy / battery.
- Apply optimization techniques such as quantization (PTQ/QAT), pruning, distillation, operator fusion, KV-cache strategies, attention optimizations, and speculative decoding (where applicable).
- Build and maintain evaluation + profiling pipelines: on-device benchmarks, regression tracking, correctness checks, and performance dashboards.
- Collaborate with runtime/SDK engineers to resolve compiler/runtime constraints (ops coverage, precision, layout, scheduling).
- Work with product/engineering to define “ready-to-ship” criteria and ensure reliable production deployment across device variants.
- 3+ years (or equivalent) building and shipping ML systems, with substantial hands-on experience optimizing models for real-world deployment.
- Strong understanding of deep learning fundamentals and performance bottlenecks (compute, memory bandwidth, cache behavior).
- Practical experience with at least one of:
- LLM inference optimization (quantization, attention/KV cache, decode-time performance)
- ASR/TTS deployment (streaming, latency constraints, audio pre/post)
- Vision encoder optimization (image preprocessing, feature extraction performance)
- Solid software engineering skills in Python + C/C++ (or equivalent low-level performance language).
- Experience debugging numerical issues and ensuring correctness across mixed precision / quantized inference.
- Comfortable working across ambiguous constraints and turning “it should be faster” into measurable engineering work.
- Direct experience deploying to mobile/edge accelerators (NPU/DSP/GPU) and/or working with hardware vendor stacks.
- Experience with model compilation toolchains and performance tooling (profilers, operator-level tracing, memory analysis).
- Experience shipping SDKs or inference runtimes used by external developers.
- Familiarity with multi-device deployment realities: device fragmentation, fallback paths, capability detection, and reproducibility.
- Edge/On-device ML optimization mindset (latency, memory, power, thermal)
- Quantization & mixed-precision inference (PTQ/QAT; int8/fp16 strategies)
- Performance profiling + debugging (numerical + system-level)
- Model architecture understanding across transformers / conformers / diffusion-vocoders (as applicable)
- Cross-functional collaboration (runtime/compiler/app/product)
- C/C++ (performance-critical components / integration work)
- Benchmarking & profiling tools (device profilers, operator-level tracing, memory tools)
- Proven ability to make models materially faster/smaller on real devices (not just on GPU servers)
- Can lead optimization efforts end-to-end with clear metrics and deliverables
- Comfortable with heterogeneous execution (NPU/GPU/CPU fallbacks)
- Equity: meaningful early-stage option grant (role & level dependent)
- Benefits: standard US benefits package (details shared during process)
Company ZETIC.ai
Location San Francisco, CA; Seoul, South Korea
Employment Type Full-Time
Workplace Type On-Site