Sr. Manager, AI Model Deployment
Listed on 2026-01-12
IT/Tech
AI Engineer, Machine Learning / ML Engineer
General Information
Req #: WD
Career area:
Artificial Intelligence
Country/Region:
United States of America
State:
North Carolina
City:
Morrisville
Date:
Monday, October 20, 2025
Working time:
Full-time
Additional Locations:
- United States of America - Illinois - Chicago
- United States of America - North Carolina - Morrisville
The Lenovo AI Technology Center (LATC) - Lenovo's global AI Center of Excellence - is driving our transformation into an AI-first organization. We are assembling a world‑class team of researchers, engineers, and innovators to position Lenovo and its customers at the forefront of the generational shift toward AI. Lenovo is one of the world's leading computing companies, delivering products across the entire technology spectrum, spanning wearables, smartphones (Motorola), laptops (ThinkPad, Yoga), PCs, workstations, servers, and services/solutions.
This unmatched breadth gives us a unique canvas for AI innovation, including the ability to rapidly deploy cutting‑edge foundation models and to enable flexible, hybrid‑cloud, and agentic computing across our full product portfolio. To this end, we are building the next wave of AI core technologies and platforms that leverage and evolve with the fast‑moving AI ecosystem, including novel models and agentic orchestration & collaboration across mobile, edge, and cloud resources.
This space is evolving fast and so are we. If you're ready to shape AI at a truly global scale, with products that touch every corner of life and work, there's no better time to join us. #LATC
Lenovo is seeking a technical leader to head our AI Model Deployment & Optimization team. In this high‑impact role, you will drive the development and large‑scale deployment of cutting‑edge AI capabilities across Lenovo devices and platforms – from on‑device inference to cloud‑enabled workloads. You will be responsible for adapting, fine‑tuning, and optimizing open‑source and proprietary foundation models for performance, efficiency, and user impact, ensuring they run seamlessly across a range of computing environments including Windows and Android, and hardware architectures from Qualcomm, NVIDIA, Intel, AMD, MediaTek, and others.
Your team will sit at the intersection of AI software, hardware acceleration, and product innovation, pushing the boundaries of model compression, quantization, pruning, distillation, and hardware‑aware AI optimization. This is a unique opportunity to shape how AI reaches hundreds of millions of users globally.
Key Responsibilities
- Lead Lenovo's AI model deployment and optimization across devices, laptops, and cloud environments.
- Adapt, fine‑tune, and optimize open‑source and proprietary foundation models (e.g., from OpenAI, Google, Microsoft, Meta) for Lenovo's product portfolio.
- Drive initiatives in model compression, quantization, pruning, and distillation to achieve maximum efficiency on constrained devices while preserving model quality.
- Collaborate closely with hardware architecture teams to align AI model efficiency with device and accelerator capabilities.
- Develop hardware‑aware optimization algorithms and integrate them into model deployment pipelines.
- Utilize the latest AI frameworks and libraries from the industry to get the best inference performance out of the model and the hardware.
- Establish and maintain reproducible workflows, automation pipelines, and release‑readiness criteria for AI models.
- Build, mentor, and inspire a high‑performance applied AI engineering team.
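To illustrate one of the compression techniques named above, the sketch below shows per‑tensor symmetric int8 post‑training quantization in plain Python. This is a minimal, illustrative example only (not Lenovo's deployment pipeline); production flows would use frameworks such as ONNX Runtime or TensorRT, and the function names here are hypothetical.

```python
# Minimal sketch of symmetric int8 post-training quantization:
# map float weights to [-128, 127] with a single per-tensor scale,
# then dequantize to measure the introduced error.

def quantize_int8(weights):
    """Quantize a list of float weights to int8 with one scale factor."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid scale == 0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(max_err, 4))  # small values near zero lose the most precision
```

The largest error lands on the weight closest to zero, which is why real pipelines often use per-channel scales or asymmetric (zero-point) schemes rather than a single per-tensor scale.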
- Experience:
10+ years in production software development, including AI/ML engineering, with 3+ years in leadership roles. Proven track record in model deployment and optimization; demonstrated ability to deliver production‑grade AI models optimized for on‑device and/or cloud environments. - Optimization Techniques:
Strong expertise in quantization, pruning, distillation, graph optimization, mixed precision, and hardware‑specific tuning (NPUs, GPUs, TPUs, custom accelerators). - Inference Frameworks:
Familiarity with model inference frameworks such as ONNX Runtime, TensorRT, TVM, OpenVINO, Radeon ML, QNN, and NeuroPilot. - Data & Telemetry:
Building feedback loops from runtime telemetry to guide…