AI Infra Engineer

Job in Morrisville, Wake County, North Carolina, 27560, USA
Listing for: Computer Task
Full Time position
Listed on 2026-02-21
Job specializations:
  • IT/Tech
    Systems Engineer, AI Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 80,000 - 110,000 USD yearly

CTG is seeking to fill an AI Infra Engineer opening for our client in Morrisville, NC.

Location: Morrisville, NC
Duration: 12+ month contract with potential to go long term

This role combines IT operations, hardware troubleshooting, and AI infrastructure expertise. Expect to handle day‑to‑day system administration, diagnose and resolve issues, and ensure optimal performance for ML workloads.

Key Responsibilities
Hardware Management and Troubleshooting
  • Monitor and maintain GPU servers/workstations, including diagnosing and resolving hardware failures (e.g., GPU faults, power issues, cooling problems). Coordinate repairs, replacements, or upgrades as needed to ensure system uptime.
Software and Driver Management
  • Install, update, and configure CUDA drivers, Linux operating systems (e.g., Ubuntu or CentOS), and related dependencies. Ensure compatibility across hardware and software stacks for seamless ML operations.
Performance Benchmarking
  • Run and analyze MLPerf benchmarks to evaluate system performance, identify bottlenecks, and optimize configurations for ML training tasks.
System Diagnostics and Problem Resolution
  • Proactively monitor systems for issues, perform root‑cause analysis on failures or performance degradation, and implement fixes. This includes debugging kernel errors, network issues, or resource contention during LLM training.
General Infrastructure Ops
  • Implement best practices for security, backups, logging, and monitoring. Handle routine maintenance, such as firmware updates, patch management, and capacity planning for the GPU cluster.
Minimum Requirements
  • Proven experience (3+ years) in managing GPU‑accelerated servers or high‑performance computing (HPC) environments, preferably in AI/ML contexts.
  • Strong knowledge of Linux system administration, including shell scripting, package management, and networking.
  • Hands‑on experience with NVIDIA CUDA toolkit, drivers, and GPU hardware (e.g., A100, H100, or similar).
  • Familiarity with ML benchmarking tools like MLPerf and frameworks such as TensorFlow, PyTorch, or Hugging Face for LLM training.
  • Ability to diagnose hardware and software issues using tools like nvidia‑smi, dmesg, top/htop, or Prometheus/Grafana for monitoring.
  • Understanding of AI infrastructure ops, including containerization (Docker/Kubernetes) and orchestration for distributed training.
  • Excellent problem‑solving skills with a proactive approach to preventing downtime.
Preferred Qualifications
  • Experience with cluster management tools like Slurm, Kubernetes, or Ray for scaling ML workloads.
  • Knowledge of hardware diagnostics for servers (e.g., IPMI, BIOS configuration, RAID setups).
  • Background in IT operations with an AI focus, such as DevOps for ML (MLOps).
  • Certifications like RHCE (Red Hat Certified Engineer), NVIDIA certifications, or similar.
  • Ability to work independently in a remote or on‑site setup, with strong communication skills for reporting issues.

Excellent verbal and written English communication skills and the ability to interact professionally with a diverse group are required.

CTG does not accept unsolicited resumes from headhunters, recruitment agencies, or fee‑based recruitment services for this role.

To Apply

Please apply directly to this requisition using the link provided. For additional information, please contact Recruiter Jamie Robinson at

Equal Employment Opportunity Statement

CTG will consider for employment all qualified applicants including those with criminal histories in a manner consistent with the requirements of all applicable local, state, and federal laws.

CTG is an Equal Opportunity Employer. CTG will assure equal opportunity and consideration to all applicants and employees in recruitment, selection, placement, training, benefits, compensation, promotion, transfer, and release of individuals without regard to race, creed, religion, color, national origin, sex, sexual orientation, gender identity and gender expression, age, disability, marital or veteran status, citizenship status, or any other discriminatory factors as required by law.

CTG is fully committed to promoting employment opportunities for members of protected classes.
