Principal ML Infrastructure Engineer
Location: Dallas, Dallas County, Texas, 75215, USA
Listed on: 2026-02-16
Listed by: Franklin Fitch
Full Time
Job specializations:
- IT/Tech: AI Engineer, Machine Learning/ML Engineer, Systems Engineer
Job Description
Overview
AI Infrastructure Engineer (GPU Systems & Model Deployment) (Principal and entry-level openings available)
We are seeking an AI Infrastructure Engineer to design and optimize high-performance systems that enable machine learning models to run reliably and efficiently in production environments. This role is focused on GPU-accelerated inference, low-latency model serving, and bridging the gap between research models and real-world deployment. You will work closely with ML researchers and software engineers to ensure models are production-ready, scalable, and performant.
This is a hands-on systems role with a strong emphasis on C++, CUDA, and GPU inference optimization.

Responsibilities:
- Design and maintain GPU-accelerated infrastructure for deploying machine learning models in production
- Build and optimize high-throughput, low-latency inference pipelines
- Develop and maintain performance-critical components in C++
- Optimize GPU utilization through CUDA programming and kernel tuning
- Support model conversion, optimization, and deployment using inference runtimes
- Partner with ML researchers to transition models from experimentation to production
- Diagnose and improve system performance relative to baseline benchmarks
- Ensure deployed systems are reliable, observable, and maintainable in production environments
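One responsibility above is diagnosing system performance relative to baseline benchmarks. As a minimal, CPU-only sketch of the idea, the snippet below times repeated calls to a model forward pass and reports a tail-latency percentile; `run_inference` is a hypothetical stand-in, not part of any specific runtime (a real pipeline would call into a GPU inference engine here):

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical stand-in for a model forward pass; substitute a call
// into your actual inference runtime when benchmarking for real.
double run_inference(const std::vector<float>& input) {
    double acc = 0.0;
    for (float v : input) acc += std::sqrt(std::fabs(v) + 1.0);
    return acc;
}

// Time `iters` requests and return the p99 latency in milliseconds.
double p99_latency_ms(int iters) {
    std::vector<float> input(1024, 0.5f);
    std::vector<double> samples;
    samples.reserve(iters);
    for (int i = 0; i < iters; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        volatile double sink = run_inference(input);  // keep the call live
        (void)sink;
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[static_cast<size_t>(samples.size() * 0.99)];
}
```

Tracking a tail percentile rather than the mean is the usual choice for serving systems, since occasional slow requests dominate user-visible latency.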
Qualifications:
- Master's or PhD required
- Strong C++ expertise with experience writing and optimizing production-grade systems
- Hands-on CUDA programming experience and GPU performance optimization
- Solid understanding of GPU architectures and memory management
- Experience with TensorRT or similar GPU inference runtimes
- 1–7 years of experience as a Software Development Engineer supporting production model deployment
- Experience with model optimization, quantization, or runtime acceleration techniques
- Exposure to ML frameworks (e.g., PyTorch, TensorFlow) from a systems or deployment perspective
- Experience working with containerized environments and CI/CD pipelines
Key technologies:
- C++, CUDA
- GPU inference runtimes (e.g., TensorRT)
- Linux, containers, cloud or on-prem GPU systems
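The qualifications mention model quantization. As a hedged illustration of the underlying arithmetic, here is a minimal sketch of symmetric per-tensor int8 quantization (the helper names are invented for this example and do not come from TensorRT or any other library):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Symmetric per-tensor int8 quantization: the scale maps the largest
// absolute weight onto the int8 range [-127, 127].
float compute_scale(const std::vector<float>& weights) {
    float max_abs = 0.0f;
    for (float v : weights) max_abs = std::max(max_abs, std::fabs(v));
    return max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
}

// Round to the nearest representable int8 step, clamping to the range.
int8_t quantize(float v, float scale) {
    float q = std::round(v / scale);
    return static_cast<int8_t>(std::clamp(q, -127.0f, 127.0f));
}

// Recover an approximation of the original value.
float dequantize(int8_t q, float scale) { return q * scale; }
```

The round-trip error of any in-range value is bounded by half a quantization step (`scale / 2`), which is why per-tensor or per-channel scale selection matters so much for accuracy after quantization.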