
Senior Software Engineer – HPC & AI Advanced Development

Remote / Online - Candidates ideally in Phoenix, Maricopa County, Arizona, 85003, USA
Listing for: Hewlett Packard Enterprise
Remote/Work from Home position
Listed on 2025-12-24
Job specializations:
  • Software Development
    Senior Developer, AI Engineer, Software Engineer, Machine Learning / ML Engineer

This role has been designated as ‘Remote/Teleworker’, which means you will primarily work from home.

Who We Are

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next.

We know that people with varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.

Job Description

HPE is seeking a motivated and skilled Senior Software Engineer to join the Advanced Programming Team within the HPC & AI Advanced Development Organization. This position is remote within the United States and requires valid U.S. work authorization.

In this role, the software engineer will collaborate on solving the challenges of scaling high-fidelity, discrete-event simulations on HPE supercomputers, using distributed memory and resilient execution techniques such as checkpointing. The role also involves developing workflows for distributed, large-scale analysis of traces, logs, and telemetry data from simulations and HPC systems.
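As a rough illustration of the checkpointing pattern mentioned above (a hypothetical sketch, not HPE's implementation), a simulation loop can periodically persist its state so that a failed run resumes from the last checkpoint rather than restarting from time zero:

```python
import os
import pickle
import tempfile

# Hypothetical sketch: a discrete-event simulation loop that periodically
# checkpoints its state, so a crashed or preempted run can resume from the
# most recent checkpoint instead of restarting from scratch.

def run_simulation(end_time, ckpt_path, ckpt_interval=100):
    # Resume from an existing checkpoint if one is present.
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"time": 0, "events_processed": 0}

    while state["time"] < end_time:
        # Advance the simulation by one event (placeholder for real logic).
        state["time"] += 1
        state["events_processed"] += 1

        # Periodically persist the full state; writing to a temporary file
        # and renaming makes the checkpoint update atomic on POSIX systems.
        if state["time"] % ckpt_interval == 0:
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(ckpt_path) or ".")
            with os.fdopen(fd, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp, ckpt_path)

    return state
```

Production checkpointing at supercomputer scale adds concerns this sketch omits, such as checkpoint size optimization and coordinated snapshots across distributed-memory ranks.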

Key Responsibilities

  • Develop, experiment with, and test distributed HPC/AI workflows that enable interactive processing of large-scale telemetry datasets (terabytes to petabytes).
  • Build solutions by composing existing open-source tools, using distributed and parallel programming approaches to scale with data and simulation size.
  • Actively participate in a collaborative, consensus-driven design process.
  • Work in an Agile development environment.
  • Create documentation, collaborate with users, and present progress in writing, in slides, and verbally.

Required Skills And Qualifications

  • 6-8 years of industry or comparable experience in software engineering.
  • Proficiency in one or more programming languages such as C, C++, or Python.
  • Exposure to high-performance computing (HPC) or scientific computing.
  • Experience designing, building, or operating distributed large-scale systems in production environments.
  • Experience with software engineering workflows, including version control, code reviews, automated testing, and CI/CD pipelines.
  • Proficient in conveying technical concepts clearly and effectively through documentation, presentations, and design discussions.
  • Strong analytical and problem-solving skills.

Nice to Haves: Experience in one or more of the following areas

  • Experience collaborating with scientists or engineers on data science, data analytics, simulations, or modeling.
  • Experience with distributed-memory parallel programming on supercomputers or large-scale clusters.
  • Background in digital twin software development, including integration with visualization tools and AI/ML workflows.
  • Experience with containerization and orchestration technologies such as Docker, Podman, Apptainer, Slurm, and Kubernetes.
  • Experience developing or supporting workflows for HPC system design and operation.
  • Experience developing AI surrogates, especially for detecting HPC system errors in real time.
  • Experience incorporating and fine-tuning LLMs to provide a chat interface for an analysis or development environment.
  • Knowledge of parallel and discrete-event simulation, especially with SST.
  • Familiarity with checkpointing techniques (efficiency, size optimization, recovery, persistence).
  • Familiarity with performance debugging and optimization at scale.
  • Familiarity with Pandas, NumPy, Dask, …
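The chunked, out-of-core aggregation pattern that tools such as Dask automate can be sketched in plain Python (a hypothetical illustration, assuming telemetry samples arrive as a sequence of chunks):

```python
# Hypothetical sketch of the map-reduce aggregation pattern that libraries
# such as Dask automate: compute a global mean over telemetry samples that
# arrive in chunks too large to hold in memory all at once.

def chunk_partial(chunk):
    # Map step: reduce one chunk to a small (sum, count) partial result.
    return (sum(chunk), len(chunk))

def global_mean(chunks):
    # Reduce step: combine the per-chunk partials into a single mean.
    total, count = 0, 0
    for s, n in (chunk_partial(c) for c in chunks):
        total += s
        count += n
    return total / count
```

Dask distributes the map step across workers and tree-reduces the partial results, which is what makes interactive analysis of terabyte-scale telemetry datasets feasible.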
Position Requirements
10+ years work experience