Compute Platform Engineer, GPU Infrastructure - Rapidly growing AI start-up

Job in New York, New York County, New York, 10261, USA
Listing for: Oscar Faye
Full Time position
Listed on 2026-06-18
Job specializations:
  • IT/Tech
    IT Infrastructure, Systems Engineer
Salary/Wage Range or Industry Benchmark: 250,000 USD yearly
Job Description & How to Apply Below
Position: Compute Platform Engineer — GPU Infrastructure - Rapidly growing AI start up
Location: New York

Our client is a frontier AI company building at the cutting edge of what is possible in artificial intelligence. It is well‑funded, talent‑dense, and moving with genuine urgency. They are not building on top of someone else’s foundation; they are building the foundation itself. The team is small by design but growing fast, and every engineer they hire has a direct line to the infrastructure decisions that matter.

They are already generating significant revenue with marquee enterprise and government clients.

The Role

This is a Compute Platform engineering role focused on the GPU infrastructure layer that powers large‑scale model training. You will not be inheriting someone else’s architecture and maintaining it. You will be shaping it, working alongside the training teams to co‑design fault tolerance, cluster health strategies, and remediation workflows that determine how reliably and efficiently the company trains its models.

What You Will Be Working On

  • Cluster health monitoring, automatic node remediation, and topology‑aware scheduling across large multi‑GPU fleets.
  • GPU‑to‑GPU network performance tuning and debugging.
  • High‑performance storage management across multiple data centres, including datasets and checkpointing at petabyte scale.
  • Capacity planning and hardware preparation for next‑generation GPU deployments. Blackwell hardware is already in production.

What They Are Looking For
  • Strong systems‑level engineering experience with a focus on cluster‑wide behaviour rather than individual service reliability.
  • Hands‑on experience operating large GPU fleets, not just scheduling workloads on them, but understanding what happens at the hardware and network layer when things go wrong.
  • Experience operating and managing GPU clusters at scale (ideally 5,000+).
  • Familiarity with NCCL and GPU‑to‑GPU communication.
  • Experience with high‑performance storage products such as VAST or Lustre across multiple data centres.
  • Strong coding ability in Go, C++ or Python.
  • Kubernetes‑first mindset with the depth to operate below the abstraction when needed.
  • Prior exposure to InfiniBand or bare‑metal GPU provisioning is a significant advantage.

What Is On Offer

Base salary up to $600,000 depending on level and experience. Equity packages starting in the millions, with long‑term upside tied directly to the company’s trajectory. Comprehensive benefits. The opportunity to join a real rocket ship at the perfect time to realize real wealth creation.

On‑site in London or New York. San Francisco will be considered, but the focus is on London and New York for now.
