
HPC Infrastructure Platform Engineer

Job in Oak Ridge, Anderson County, Tennessee, 37830, USA
Listing for: Oak Ridge National Laboratory
Full Time position
Listed on 2026-06-18
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing: Infrastructure & Operations, Systems Administrator
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly
Job Description & How to Apply Below

Requisition Id: 16521

Overview

The High-Performance Computing Systems Section within the National Center for Computational Sciences (NCCS) is seeking an HPC Infrastructure Platform Engineer to join the HPC Infrastructure group. The preferred candidate will possess commensurate knowledge, skills and abilities in addition to relevant education, certifications, experience and demonstrated ability to work as a member of a team.

Major Duties/Responsibilities
  • Linux Administration:
    • Deploy, configure and manage HPC-scale services in a Linux environment, primarily Red Hat and Rocky
    • Perform regular patches, updates and backups
    • Monitor systems using tools like Nagios and Grafana
    • Respond to and assist in troubleshooting issues
  • Kubernetes Administration:
    • Build and maintain foundational internal platforms and tools to enable the HPC Infrastructure team to reliably deploy, monitor and scale applications
    • Design standardized and automated workflow patterns, build and maintain CI/CD pipelines
    • Offer self-service, excellent documentation and assistance to HPC Infrastructure group members for efficient consumption of platform services
    • Develop, maintain and review high quality code for internal tools using programming languages such as Python, Golang, or Rust
  • Identity Management and Security:
    • Deploy, configure and support identity and access management services using LDAP and PingFederate
    • Maintain and enable secure access for human users and automated workloads in Kubernetes
  • Virtualization and Automation:
    • Deploy and manage resources in the NCCS VMware environment
    • Identify potential automation targets and lead efforts to automate processes
    • Define policies and procedures for automation and configuration management for the team and organization as a whole
  • Project Management and Leadership:
    • Lead small Infrastructure projects through the project lifecycle
    • Mentor and train junior staff, creating training documentation, holding knowledge sharing sessions, and fostering skill growth throughout the team
    • Propose and implement improvements to existing Infrastructure systems as well as new systems, processes and procedures
Basic Qualifications
  • Bachelor's degree in computer science or closely related field and a minimum of 5 years of experience in Linux systems and Kubernetes platform administration, or a master's degree and a minimum of 4 years of experience in Linux systems and Kubernetes platform administration
  • An equivalent combination of education and experience will be considered
Preferred Qualifications
  • Excellent interpersonal/communication skills and the ability to work within a team
  • Strong experience designing, building and maintaining Kubernetes platform tools
  • Strong working knowledge of Linux system fundamentals and common network protocols
  • Programming and scripting skills in common languages such as Python and bash
  • Understanding of versioning and code review tools like GitHub and GitLab
  • Experience implementing and supporting highly-available systems and services
  • Experience with configuration management tools such as Puppet or Ansible
  • Experience deploying and maintaining virtual environments using VMware
  • Experience deploying, maintaining and troubleshooting a variety of infrastructure services such as OpenLDAP, DNS, DHCP, etc.
  • Ability to plan, prioritize and complete assigned projects with minimal supervision
Special Requirements
  • This position requires the ability to obtain and maintain a clearance from the Department of Energy. As such, this position is a Workplace Substance Abuse Program (WSAP) testing designated position. WSAP positions require passing a pre-placement drug test and participation in an ongoing random drug testing program
Benefits
  • Prescription Drug Plan
  • Dental Plan
  • Vision Plan
  • 401(k) Retirement Plan
  • Contributory Pension Plan
  • Life Insurance
  • Disability Benefits
  • Generous Vacation and Holidays
  • Parental Leave
  • Legal Insurance with Identity Theft Protection
  • Employee Assistance Plan
  • Flexible Spending Accounts
  • Health Savings Accounts
  • Wellness Programs
  • Educational Assistance
  • Relocation Assistance
  • Employee Discounts
Equal Employment Opportunity

ORNL is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. UT-Battelle is an E-Verify employer.
