Inference Optimization Engineer (local / edge runtime) - Phoenix area, Arizona, USA - Software Development

Position: Inference Optimization Engineer (local / edge runtime)

What You'll Do

Profile and optimize local inference (llama.cpp-vulkan and vLLM) for latency, throughput, and memory on edge hardware
Tune KV cache, continuous batching, and scheduling for interactive agent workloads
Drive quantization strategy (GGUF / AWQ / GPTQ) and validate quality impact with the Post-Training team
Cut CPU overhead and improve engine startup, model load, and lifecycle (start / stop / health)
Benchmark across hardware tiers and publish honest performance comparisons
Upstream fixes and patches to open‑source engines where it helps us

What You'll Learn / Grow Into

The internals of modern inference engines and where the milliseconds actually go
Hardware‑aware optimization across iGPU / CPU paths (Vulkan, SYCL, oneAPI, CUDA where relevant)
The quality‑vs‑speed‑vs‑memory trade space for small models
A deeper interest in local / edge AI and in squeezing the most out of constrained hardware

Required Qualifications

BS/MS in CS, EE, Math or related STEM field
5+ years of software development experience
Strong in C++ and/or Python; comfortable reading systems‑level code
Understands how LLM inference works (attention, KV cache, decoding)
Has profiled and optimized real performance problems (CPU or GPU) and can prove the speedup
Linux, build systems, and low‑level debugging expertise

Preferred Qualifications

Hands‑on with llama.cpp, vLLM, ggml, or similar engines
Experience with GPU / accelerator programming (Vulkan, CUDA, SYCL, Metal) or SIMD / CPU kernels
Familiarity with quantization formats and their quality trade‑offs
Open‑source contributions to inference engines

Annual Salary Range

US: $ - USD

Work Model

This role will be eligible for a hybrid work model allowing employees to split their time between working on‑site at their assigned Intel site and off‑site.

EEO Statement

All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.
