Compute Technical Consultant, Onsite; LANL Los Alamos, NM
Job in Santa Fe, Santa Fe County, New Mexico, 87503, USA
Listed on 2026-06-18
Listing for: Hewlett Packard Enterprise Company
Full Time position
Job specializations:
- IT/Tech: IT Support, Hardware Engineer, Systems Engineer, Systems Administrator
Job Description
High Performance Compute Technical Consultant, Onsite (LANL) Los Alamos, NM
This role has been designed as 'Onsite' with an expectation that you will primarily work from an HPE partner/customer office.
Key Responsibilities
- Monitor and maintain system health across large-scale HPC compute, network, and storage infrastructure
- Troubleshoot and repair hardware issues on HPC servers and supporting systems
- Perform basic Linux system administration tasks as needed
- Create, monitor, update, and close support tickets
- Perform hardware component replacements using spares
- Operate hand tools and low‑power tools for server maintenance
- Track and document hardware repairs, part replacements, and returns
- Create, update, and maintain site documentation, processes, and workflows
- Assist with new system installation and expansion activities
- Read system documentation and diagrams to locate components
- Collaborate with team members using email, Teams, Slack, and in‑person communication
- Participate in on‑call schedule to support 24x7 operations
- Maintain tools and workspace in an organized manner
Qualifications
- Ability to obtain a Q Clearance (required)
- US Citizenship (required)
- Must be able to work onsite 5 days per week in Los Alamos, NM, with additional onsite work for on‑call support. This is not a remote position.
- Strong mechanical aptitude and comfort using common hand tools (screwdrivers, pliers, wrenches, cable tools, etc.) for assembling, disassembling, and maintaining server hardware and related equipment
- Ability to lift up to 50 lbs individually and up to 75 lbs with assistance
- Solid understanding of computer hardware components (servers, drives, memory modules, power supplies, cabling, and peripherals)
- Proficiency with basic computer operations on Windows and macOS (MacBook), including OS navigation, file management, and standard productivity tools such as Slack, SharePoint, and Microsoft Office (Word, Excel, Outlook, and Teams)
- Associate's degree, some college, or technical training (BS preferred)
- 2+ years of Linux system administration experience, including strong command‑line navigation, log analysis and monitoring (journalctl, syslog, log files), troubleshooting system and application issues, and scripting/automation using Bash or Python.
- Experience using Redfish (along with IPMI) for out‑of‑band server hardware management and monitoring. This includes utilizing the Redfish RESTful API for querying system health, power/thermal monitoring, firmware inventory, component status (processors, memory, drives, NICs), event logs, and performing actions such as system resets, power control, and BIOS configuration.
- 2+ years of hands‑on experience troubleshooting and maintaining server hardware in a datacenter environment, including diagnosing hardware faults (power, thermal, storage, networking), performing component replacements (drives, memory, CPUs, PSUs, HBAs, NICs), rack mounting/decommissioning servers, and managing cable infrastructure
- 1+ year of experience with high‑speed networking concepts and troubleshooting for Ethernet, HPE Slingshot, and InfiniBand fabrics, including link diagnostics, performance tuning, cable/fiber management, switch configuration, and fault isolation in large‑scale HPC environments.
- Previous experience in a 24x7 production support environment
- Strong troubleshooting and problem‑solving skills with the ability to work independently, including systematically diagnosing complex hardware, software, and network issues through log analysis, debugging tools, and root cause analysis while minimizing downtime in high‑availability environments
- Experience reading technical diagrams, schematics, and working with ticketing systems
- Experience with Git for version control of code, scripts, configuration files, and documentation (including cloning, branching, committing, merging, and resolving conflicts)
- Experience with High‑Performance Computing (HPC) systems, clusters, or large‑scale AI infrastructure
- Experience with large‑scale storage systems, including installation, configuration, monitoring, and troubleshooting of parallel file systems, enterprise SAN/NAS solutions, object storage, and…