GPU Infrastructure Software Engineer
Job in Sunnyvale, Santa Clara County, California, 94085, USA
Listed on 2026-06-07
Listing for: CoreWeave
Full Time position
Job specializations:
* IT/Tech: Systems Engineer, Cloud Computing
Job Description & How to Apply Below
CoreWeave is The Essential Cloud for AI. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability.
Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at
About the role
As a Staff Software Engineer, you will define and drive the technical vision for GPU performance validation and infrastructure testing across CoreWeave's global fleet. You will lead large-scale initiatives spanning hardware validation, performance benchmarking, Kubernetes infrastructure, and AI/ML platform reliability.
This role requires deep technical expertise combined with the ability to influence architecture, engineering practices, and organizational priorities across multiple teams. You will partner closely with Fleet Engineering, Infrastructure, Product, Hardware, and AI Platform teams to ensure CoreWeave delivers industry-leading performance, reliability, and efficiency for GPU workloads at hyperscale.
What You'll Do
* Define the long-term technical strategy and architecture for CoreWeave's GPU performance testing and validation platform.
* Lead the design and implementation of scalable systems for validating performance, reliability, and health across CoreWeave's global infrastructure footprint.
* Drive cross-functional initiatives spanning infrastructure testing, hardware qualification, fleet provisioning, and AI infrastructure performance optimization.
* Architect and develop backend services, APIs, and automation frameworks in Go and/or Python that support large-scale testing and validation workflows.
* Design and oversee Kubernetes-native testing platforms, operators, and controllers used across thousands of GPUs and clusters.
* Establish performance benchmarks, testing methodologies, and operational standards for new hardware platforms and infrastructure deployments.
* Influence engineering standards, deployment strategies, observability practices, and reliability frameworks across multiple teams.
* Identify and solve systemic performance bottlenecks impacting customer workloads, infrastructure efficiency, and fleet utilization.
* Partner with hardware vendors and internal stakeholders to evaluate emerging technologies and shape future infrastructure investments.
* Mentor senior engineers and act as a technical leader across the organization through design reviews, architecture discussions, and strategic initiatives.
* Serve as a key technical decision-maker during critical incidents involving performance, scalability, and infrastructure reliability.
Who you are:
* 8+ years of software engineering experience, including experience leading large-scale technical initiatives.
* Strong proficiency in Go and/or Python.
* Deep hands-on experience operating Kubernetes-based infrastructure at production scale.
* Proven track record of architecting distributed systems and driving technical direction across multiple teams.
* Experience leading cross-functional efforts with significant business and engineering impact.
* Strong systems-level understanding of infrastructure performance, reliability, and scalability.
Preferred (if applicable):
* Experience building infrastructure testing, validation, or qualification platforms at scale.
* Deep understanding of GPU architectures, accelerator technologies, and performance optimization.
* HPC, distributed computing, or large-scale infrastructure experience.
* Experience with AI/ML training and inference infrastructure.
* Experience working closely with hardware vendors and datacenter operations teams.
* Contributions to open-source infrastructure, Kubernetes, or performance engineering projects.
Wondering if you're a good fit? We believe in investing in our people, and we value candidates who bring diverse experiences to our teams, even if you aren't a 100% skill or experience match.
Why CoreWeave?
At CoreWeave, we work hard, have fun, and move fast! We're in an exciting stage of hyper-growth that you will not want to miss out on. We're not afraid of a little chaos, and we're constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:
* Be Curious at Your Core
* Act Like an Owner
* Empower Employees
* Deliver Best-in-Class Client Experiences
* Achieve More Together
We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization's growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too.
Come…