
On-Premise LLM Inference & GPU Systems Engineer

Job in Charlotte, Mecklenburg County, North Carolina, 28245, USA
Listing for: NTT DATA North America
Full Time position
Listed on 2026-07-01
Job specializations:
  • Software Development
    AI Engineer (Applied/Software), Machine Learning/ ML Engineer
Salary/Wage Range or Industry Benchmark: 120,000 - 150,000 USD yearly
Job Description & How to Apply Below
Position: On-Premise LLM Inference & GPU Systems Engineer

Company Overview

NTT DATA strives to hire exceptional, innovative, and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now.

We are currently seeking an On-Premise LLM Inference & GPU Systems Engineer to join our team in Charlotte, North Carolina (US-NC), United States (US).

Job Description

We are seeking an AI Infrastructure Runtime Engineer to build and maintain large-scale on-prem LLM infrastructure. This is an enterprise private GenAI environment running on NVIDIA H200 GPU clusters and an OpenShift AI deployment ecosystem. You will manage production inference internally, including self-hosting open-source LLMs such as Llama. The team focuses exclusively on inference; this role involves no model training infrastructure or fine-tuning pipelines.

Key Responsibilities
  • NVIDIA GPU Runtime Optimization:
    Maximize the runtime efficiency of the token-generation pipeline, with specific ownership of prefill/decode optimization and KV cache management.
  • Inference Serving:
    Deploy and manage inference engines including vLLM and TensorRT-LLM.
  • Hardware Utilization:
    Tune GPU throughput, batching strategies, and latency. Orchestrate workloads using RunAI and Kubernetes GPU orchestration.
  • Model Lifecycle Management:
    Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
  • Platform Operations:
    Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.

Required Qualifications
  • 5+ years of experience as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
  • 5+ years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
  • 3+ years of experience with OpenShift AI and GPU orchestration tools such as RunAI.
  • Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
  • Proven track record managing the Hugging Face deployment lifecycle.

Equal Employment Opportunity Statement

NTT DATA is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.
