On-Premise LLM Inference & GPU Systems Engineer
Listed on 2026-07-01
Software Development
AI Engineer (Applied/Software), Machine Learning/ ML Engineer
Company Overview
NTT DATA strives to hire exceptional, innovative, and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now.
We are currently seeking an On-Premise LLM Inference & GPU Systems Engineer to join our team in Charlotte, North Carolina (US-NC), United States (US).
Job Description
We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. This is an enterprise private GenAI environment running on NVIDIA H200 GPU clusters and an OpenShift AI deployment ecosystem. You will manage production inference internally, including self-hosting open-source LLMs such as Llama. The role is focused exclusively on inference; it involves no model training infrastructure or fine-tuning pipelines.
Key Responsibilities
- NVIDIA GPU Runtime Optimization: Drive runtime efficiency across the token generation pipeline, specifically prefill/decode optimization and KV cache management.
- Inference Serving: Deploy and manage inference engines, including vLLM and TensorRT-LLM.
- Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Manage workload orchestration using RunAI and Kubernetes GPU orchestration.
- Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
- Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.
Qualifications
- 5+ years of experience as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
- 5+ years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
- 3+ years of experience with OpenShift AI and GPU orchestration tools such as RunAI.
- Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
- Proven track record managing the Hugging Face deployment lifecycle.
NTT DATA is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.