Senior Software Developer, HPC Cluster Management
Listed on 2025-12-05
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation powered by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. Our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world.
Doing what’s never been done takes vision, innovation, and the world’s best talent. As an NVIDIAN, you will be immersed in a diverse, supportive environment where everyone is inspired to do their best work.
We have positions available for enthusiastic, hardworking, and experienced software developers to work on hardware integration and bare-metal provisioning functionality in our Linux-based cluster management software. NVIDIA's Base Command Manager (BCM) powers thousands of Linux clusters around the world, ranging from a few nodes to several thousand nodes. BCM clusters can run on-premises, entirely in the cloud, or in a hybrid environment.
What You’ll Be Doing
- Develop the head node and compute node installation and provisioning processes.
- Work on functionality for edge site deployment.
- Integrate our product with the latest hardware (e.g., GPUs, DPUs, accelerators, and high-speed interconnects such as InfiniBand).
- Develop new features in firmware management and network configuration for existing and next-generation NVIDIA platforms.
- Develop functionality that makes Bright clusters usable for a wider range of workloads and that lets clusters scale to a huge number of nodes.
- Add support for new Linux distributions.
- Improve support for alternative CPU architectures such as ARM.
- Add features to our Ansible collections for cluster installation and management.
- Assist our support team with customer requests related to the features above, and help our customers use our product more efficiently.
What We Need To See
- Degree in Computer Science or a related field (or equivalent experience).
- 7+ years of experience in software development and/or related roles.
- Our software is based on Linux. You should be very familiar with the Linux operating system, in particular with Linux networking concepts, and have good practical knowledge of the most common software installed as part of a typical Linux installation.
- You are proficient in Python and intimately familiar with object-oriented software design, design patterns, and concurrent programming techniques.
- A strong emphasis on high-quality work and clean code.
- Eager to learn and use new technologies.
- Experience with Ansible.
- Experience with high-performance computing and system administration.
- Knowledge of Kubernetes, AWS, Azure, GCE, OpenStack, Jenkins, and distributed programming.
- Proficiency in C++.
- Mid-Senior level
- Full-time
- Software Development
- Computer Hardware Manufacturing
- Software Development
- Computers and Electronics Manufacturing