ML Application Engineer – AI Inference & Model Optimization (Senior level, KSA), Riyadh, Riyadh Region, Saudi Arabia, IT/Tech

Position: ML Application Engineer – AI Inference & Model Optimization (Staff/Senior Staff level, KSA)

Company

Qualcomm Middle East Information Technology Company LLC

Job Area

Engineering Group > Software Engineering

General Summary

About Us

Qualcomm is enabling a world where everyone and everything can be intelligently connected. You interact with products and technologies made possible by Qualcomm every day, including intelligent edge devices, next-generation computing platforms, and advanced AI solutions. Qualcomm’s leadership in AI, high‑performance compute, and connectivity is driving innovation across cloud, edge, and data center environments - delivering scalable, power‑efficient platforms that power the next generation of intelligent infrastructure.

About The Role

Qualcomm is seeking a Machine Learning Applications Engineer – AI Inference & Model Optimization to support the enablement of rack-scale deep learning workloads on advanced Qualcomm AI inference accelerators. These accelerators draw on Qualcomm's expertise in hardware-accelerated AI to deliver high-performance, energy-efficient generative AI and computer vision inference solutions for modern data centers.

This is a customer‑facing, highly technical role focused on porting, optimizing, and validating deep learning AI models on production systems, and enabling Qualcomm’s partners to develop and deploy advanced machine learning applications - including computer vision, speech, generative AI, and state-of-the-art multimodal reasoning models - using popular frameworks such as PyTorch, TensorFlow, and ONNX on Qualcomm Cloud AI accelerators.

Key responsibilities include evaluating models for throughput, latency, and accuracy; profiling and optimizing model performance; building robust application pipelines; integrating customer frameworks; and contributing to documentation, training, and demonstrations.

The role requires strong expertise in AI models, quantization, performance optimization, and deployment, plus the ability to shape architecture, workload sizing, and system design. It also requires experience with deep learning model development across hardware platforms, solid programming skills, collaboration with cross-functional teams, and proficiency in machine learning frameworks, Linux, and container orchestration tools.

The ideal candidate can effectively bridge AI model requirements ↔ hardware capabilities ↔ customer expectations, guiding customers from model selection → hardware sizing → deployment decisions → production readiness.

What You’ll Do

AI Model Porting & Optimization
Deploy, optimize and scale deep learning AI models onto accelerator‑based data center platforms, including:
Model conversion workflows
Quantization techniques (INT8 / mixed precision)
Runtime integration and optimization
Integrate ML models into Qualcomm’s Cloud AI ML stack from frameworks such as PyTorch, TensorFlow, and ONNX.
Drive improvements in model throughput, latency, and accuracy, with clear trade‑off analysis.
Build, test, and deploy scalable inference pipelines using serving frameworks such as vLLM, TGI, and Triton.
Optimize workloads for LLM and GenAI models across both multi-SoC and multi-card architectures.
Collaborate with engineering teams to analyze and refine training and inference for advanced deep learning applications.
Identify bottlenecks across compute, memory, and runtime, and guide optimization strategies.
Contribute to Qualcomm’s Cloud AI GitHub repository and developer documentation, sharing technical best practices and solutions.
Develop and integrate end-to-end ML application pipelines with customer frameworks and libraries.
Customer‑Facing Technical Engagement
Act as a trusted technical advisor for customers deploying AI workloads.
Engage in hardware sizing and architecture discussions, aligning model requirements with infrastructure capabilities.
Provide technical guidance on:
AI model selection
Deployment feasibility
System architecture and performance expectations
Lead discussions on model capabilities and limitations based on real customer use cases.
Model–Infrastructure Alignment
Assess and evaluate AI model requirements and recommend alternative model approaches…
