
HPC Engineer

Job in Milpitas, Santa Clara County, California, 95035, USA
Listing for: KLA
Full Time position
Listed on 2026-05-30
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, Data Engineer, AI Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD per year
Job Description
Position: Staff HPC Engineer

Company Overview

KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice‑controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays.

The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA focuses on innovation more than most companies do, investing 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem‑solvers work together with the world’s leading technology providers to accelerate the delivery of tomorrow’s electronic devices. Life here is exciting, and our teams thrive on tackling really hard problems.

There is never a dull moment with us.

Group / Division

The Information Technology (IT) group at KLA is involved in every aspect of the global business. IT’s mission is to enable business growth and productivity by connecting people, process, and technology. It focuses not only on enhancing the technology that enables our business to thrive but also on how employees use and are empowered by technology. This integrated approach to customer service, creativity and technological excellence enables employee productivity, business analytics, and process excellence.

Job Description / Preferred Qualifications

The Staff HPC Engineer designs, builds, optimizes, and supports large‑scale compute environments used for scientific computing, AI/ML workloads, simulation, and data‑intensive research. This role blends systems engineering, performance tuning, cluster architecture, and hands‑on troubleshooting. The engineer partners with researchers, developers, and IT teams to deliver reliable, scalable, and high‑performance compute infrastructure.

Key Responsibilities
HPC Architecture & Engineering
  • Design and implement HPC clusters, including compute, storage, networking, and job‑scheduling components.
  • Evaluate and integrate new technologies (GPUs, accelerators, interconnects, file systems).
  • Develop automation for cluster provisioning, configuration, and lifecycle management.
  • Architect solutions for large‑scale parallel workloads, AI/ML pipelines, and data‑intensive applications.
Performance Optimization
  • Profile and tune applications for CPU, GPU, memory, and I/O performance.
  • Optimize MPI, OpenMP, CUDA, and other parallel programming frameworks.
  • Benchmark hardware and software stacks to guide procurement and architecture decisions.
Operations & Reliability
  • Maintain and monitor HPC clusters, job schedulers (Slurm, PBS, LSF), and distributed file systems (Lustre, GPFS, BeeGFS).
  • Troubleshoot complex system issues across compute, storage, and network layers.
  • Implement security best practices, patching, and compliance controls.
  • Ensure high availability and efficient resource utilization.
Automation & DevOps
  • Build and maintain CI/CD pipelines for HPC‑related software and infrastructure.
  • Use tools such as Ansible, Terraform, Kubernetes, or custom scripts to automate workflows.
  • Develop monitoring and observability solutions (Prometheus, Grafana, ELK, etc.).
Collaboration & Leadership
  • Work closely with researchers, data scientists, and engineering teams to support workload optimization.
  • Provide technical leadership, mentorship, and guidance to junior engineers.
  • Document architectures, procedures, and best practices.
  • Participate in capacity planning and long‑term HPC strategy.
Required Qualifications
  • Extensive experience with Linux systems engineering in large‑scale compute environments.
  • Solid understanding of distributed systems and cloud infrastructure.
  • Deep knowledge of HPC schedulers (Slurm preferred), MPI stacks, and parallel computing models.
  • Strong understanding of high‑speed interconnects (InfiniBand, RoCE) and distributed storage systems.
  • Proficiency in scripting and programming languages (Python, Go, Bash) and automation frameworks.
  • Experience with GPUs (NVIDIA CUDA, MIG, NVLink) and…