Principal ML Infrastructure Engineer
Location: Dallas, Dallas County, Texas, 75215, USA
Listed on: 2026-02-16
Listed by: Franklin Fitch
Full Time
Job specializations:
- IT/Tech: AI Engineer, Machine Learning/ML Engineer, Systems Engineer
Job Description
Overview
AI Infrastructure Engineer (GPU Systems & Model Deployment) (Principal and entry-level openings available)
We are seeking an AI Infrastructure Engineer to design and optimize high-performance systems that enable machine learning models to run reliably and efficiently in production environments. This role is focused on GPU-accelerated inference, low-latency model serving, and bridging the gap between research models and real-world deployment. You will work closely with ML researchers and software engineers to ensure models are production-ready, scalable, and performant.
This is a hands-on systems role with a strong emphasis on C++, CUDA, and GPU inference optimization.

Responsibilities:
- Design and maintain GPU-accelerated infrastructure for deploying machine learning models in production
- Build and optimize high-throughput, low-latency inference pipelines
- Develop and maintain performance-critical components in C++
- Optimize GPU utilization through CUDA programming and kernel tuning
- Support model conversion, optimization, and deployment using inference runtimes
- Partner with ML researchers to transition models from experimentation to production
- Diagnose and improve system performance relative to baseline benchmarks
- Ensure deployed systems are reliable, observable, and maintainable in production environments
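One responsibility above is diagnosing system performance relative to baseline benchmarks. As a minimal, CPU-only sketch of the idea, the snippet below times repeated calls to a model forward pass and reports a tail-latency percentile; `run_inference` is a hypothetical stand-in, not part of any specific runtime (a real pipeline would call into a GPU inference engine here):

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical stand-in for a model forward pass; substitute a call
// into your actual inference runtime when benchmarking for real.
double run_inference(const std::vector<float>& input) {
    double acc = 0.0;
    for (float v : input) acc += std::sqrt(std::fabs(v) + 1.0);
    return acc;
}

// Time `iters` requests and return the p99 latency in milliseconds.
double p99_latency_ms(int iters) {
    std::vector<float> input(1024, 0.5f);
    std::vector<double> samples;
    samples.reserve(iters);
    for (int i = 0; i < iters; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        volatile double sink = run_inference(input);  // keep the call live
        (void)sink;
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[static_cast<size_t>(samples.size() * 0.99)];
}
```

Tracking a tail percentile rather than the mean is the usual choice for serving systems, since occasional slow requests dominate user-visible latency.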
Qualifications:
- Master's or PhD required
- Strong C++ expertise with experience writing and optimizing production-grade systems
- Hands-on CUDA programming experience and GPU performance optimization
- Solid understanding of GPU architectures and memory management
- Experience with TensorRT or similar GPU inference runtimes
- 1–7 years of experience as a Software Development Engineer supporting production model deployment
- Experience with model optimization, quantization, or runtime acceleration techniques
- Exposure to ML frameworks (e.g., PyTorch, TensorFlow) from a systems or deployment perspective
- Experience working with containerized environments and CI/CD pipelines
Key technologies:
- C++, CUDA
- GPU inference runtimes (e.g., TensorRT)
- Linux, containers, cloud or on-prem GPU systems
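The qualifications mention model quantization. As a hedged illustration of the underlying arithmetic, here is a minimal sketch of symmetric per-tensor int8 quantization (the helper names are invented for this example and do not come from TensorRT or any other library):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Symmetric per-tensor int8 quantization: the scale maps the largest
// absolute weight onto the int8 range [-127, 127].
float compute_scale(const std::vector<float>& weights) {
    float max_abs = 0.0f;
    for (float v : weights) max_abs = std::max(max_abs, std::fabs(v));
    return max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
}

// Round to the nearest representable int8 step, clamping to the range.
int8_t quantize(float v, float scale) {
    float q = std::round(v / scale);
    return static_cast<int8_t>(std::clamp(q, -127.0f, 127.0f));
}

// Recover an approximation of the original value.
float dequantize(int8_t q, float scale) { return q * scale; }
```

The round-trip error of any in-range value is bounded by half a quantization step (`scale / 2`), which is why per-tensor or per-channel scale selection matters so much for accuracy after quantization.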