
Inference Optimization Engineer (local / edge runtime)

Job in Phoenix, Maricopa County, Arizona, 85003, USA
Listing for: Intel
Full Time position
Listed on 2026-06-24
Job specializations:
  • Software Development
    AI Engineer (Applied/Software), C++ Developer, Python, Software Engineer
Salary/Wage Range or Industry Benchmark: 200,000 - 250,000 USD per year
Job Description & How to Apply Below
Position: Inference Optimization Engineer (local / edge runtime)

What You'll Do

  • Profile and optimize local inference (llama.cpp-vulkan and vLLM) for latency, throughput, and memory on edge hardware
  • Tune KV cache, continuous batching, and scheduling for interactive agent workloads
  • Drive quantization strategy (GGUF / AWQ / GPTQ) and validate quality impact with the Post-Training team
  • Cut CPU overhead and improve engine startup, model load, and lifecycle (start / stop / health)
  • Benchmark across hardware tiers and publish honest performance comparisons
  • Upstream fixes and patches to open‑source engines where it helps us
What You'll Learn / Grow Into
  • The internals of modern inference engines and where the milliseconds actually go
  • Hardware‑aware optimization across iGPU / CPU paths (Vulkan, SYCL, oneAPI, CUDA where relevant)
  • The quality‑vs‑speed‑vs‑memory trade space for small models
  • Interest in local / edge AI and squeezing hardware
Required Qualifications
  • BS/MS in CS, EE, Math or related STEM field
  • 5+ years of software development experience
  • Strong in C++ and/or Python; comfortable reading systems‑level code
  • Understands how LLM inference works (attention, KV cache, decoding)
  • Has profiled and optimized real performance problems (CPU or GPU) and can prove the speedup
  • Linux, build systems, and low‑level debugging expertise
Preferred Qualifications
  • Hands‑on with llama.cpp, vLLM, ggml, or similar engines
  • Experience with GPU / accelerator programming (Vulkan, CUDA, SYCL, Metal) or SIMD / CPU kernels
  • Familiarity with quantization formats and their quality trade‑offs
  • Open‑source contributions to inference engines
Work Model

This role will be eligible for a hybrid work model allowing employees to split their time between working on‑site at their assigned Intel site and off‑site.

EEO Statement

All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.
