GPU Infrastructure Software Engineer

Job in Sunnyvale, Santa Clara County, California, 94085, USA
Listing for: Core Weave
Full Time position
Listed on 2026-06-07
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Job Description & How to Apply Below
Position: Staff GPU Infrastructure Software Engineer
Core Weave is The Essential Cloud for AI. Built for pioneers by pioneers, Core Weave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, Core Weave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability.

Founded in 2017, Core Weave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at

About the role

As a Staff Software Engineer, you will define and drive the technical vision for GPU performance validation and infrastructure testing across Core Weave's global fleet. You will lead large-scale initiatives spanning hardware validation, performance benchmarking, Kubernetes infrastructure, and AI/ML platform reliability.

This role requires deep technical expertise combined with the ability to influence architecture, engineering practices, and organizational priorities across multiple teams. You will partner closely with Fleet Engineering, Infrastructure, Product, Hardware, and AI Platform teams to ensure Core Weave delivers industry-leading performance, reliability, and efficiency for GPU workloads at hyperscale.

What You'll Do

* Define the long-term technical strategy and architecture for Core Weave's GPU performance testing and validation platform.

* Lead the design and implementation of scalable systems for validating performance, reliability, and health across Core Weave's global infrastructure footprint.

* Drive cross-functional initiatives spanning infrastructure testing, hardware qualification, fleet provisioning, and AI infrastructure performance optimization.

* Architect and develop backend services, APIs, and automation frameworks in Go and/or Python that support large-scale testing and validation workflows.

* Design and oversee Kubernetes-native testing platforms, operators, and controllers used across thousands of GPUs and clusters.

* Establish performance benchmarks, testing methodologies, and operational standards for new hardware platforms and infrastructure deployments.

* Influence engineering standards, deployment strategies, observability practices, and reliability frameworks across multiple teams.

* Identify and solve systemic performance bottlenecks impacting customer workloads, infrastructure efficiency, and fleet utilization.

* Partner with hardware vendors and internal stakeholders to evaluate emerging technologies and shape future infrastructure investments.

* Mentor senior engineers and act as a technical leader across the organization through design reviews, architecture discussions, and strategic initiatives.

* Serve as a key technical decision-maker during critical incidents involving performance, scalability, and infrastructure reliability.

Who you are:

* 8+ years of software engineering experience, including experience leading large-scale technical initiatives.

* Strong proficiency in Go and/or Python.

* Deep hands-on experience operating Kubernetes-based infrastructure at production scale.

* Proven track record of architecting distributed systems and driving technical direction across multiple teams.

* Experience leading cross-functional efforts with significant business and engineering impact.

* Strong systems-level understanding of infrastructure performance, reliability, and scalability.

Preferred (if applicable):

* Experience building infrastructure testing, validation, or qualification platforms at scale.

* Deep understanding of GPU architectures, accelerator technologies, and performance optimization.

* HPC, distributed computing, or large-scale infrastructure experience.

* Experience with AI/ML training and inference infrastructure.

* Experience working closely with hardware vendors and datacenter operations teams.

* Contributions to open-source infrastructure, Kubernetes, or performance engineering projects.

Wondering if you're a good fit? We believe in investing in our people, and we value candidates who can bring their own diverse experiences to our teams, even if you aren't a 100% skill or experience match.

Why Core Weave?

At Core Weave, we work hard, have fun, and move fast! We're in an exciting stage of hyper-growth that you will not want to miss out on. We're not afraid of a little chaos, and we're constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:

* Be Curious at Your Core

* Act Like an Owner

* Empower Employees

* Deliver Best-in-Class Client Experiences

* Achieve More Together

We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization's growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too.

Come…