On-Premise LLM Inference & GPU Systems Engineer Job, Charlotte Area, North Carolina, USA, Software Development

Position: On-Premise LLM Inference & GPU Systems Engineer

Company Overview

NTT DATA strives to hire exceptional, innovative, and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now.

We are currently seeking an On-Premise LLM Inference & GPU Systems Engineer to join our team in Charlotte, North Carolina (US-NC), United States (US).

Job Description

We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-premise LLM infrastructure. This is an enterprise private GenAI environment running on NVIDIA H200 GPU clusters and an OpenShift AI deployment ecosystem. You will manage production inference in-house, including self-hosting open-source LLMs such as Llama. The team is focused exclusively on inference; this role involves no model training infrastructure or fine-tuning pipelines.

Key Responsibilities

NVIDIA GPU Runtime Optimization:
Drive runtime efficiency across the token generation pipeline, with specific ownership of prefill/decode optimization and KV cache management.
Inference Serving:
Deploy and manage inference engines including vLLM and TensorRT-LLM.
Hardware Utilization:
Tune GPU throughput, batching strategies, and latency. Manage GPU workload orchestration using RunAI and Kubernetes.
Model Lifecycle Management:
Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
Platform Operations:
Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.

Required Qualifications

5+ years of experience as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
5+ years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
3+ years of experience with OpenShift AI and GPU orchestration tools such as RunAI.
Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
Proven track record managing the Hugging Face deployment lifecycle.

Equal Employment Opportunity Statement

NTT DATA is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.
