Senior Site Reliability Engineer - AI Infrastructure Operations

Job in Seattle, King County, Washington, 98127, USA
Listing for: Nscale
Full Time position
Listed on 2026-04-28
Job specializations:
  • IT/Tech
    Systems Engineer, SRE/Site Reliability, Cloud Computing, Network Engineer
Salary/Wage Range or Industry Benchmark: 100,000 - 165,000 USD Yearly
Job Description & How to Apply Below

Overview

About Nscale

Nscale is the GPU cloud engineered for AI, purpose-built to deliver high-performance, cost-efficient infrastructure for AI-native startups and global enterprises. We enable organizations to accelerate innovation, reduce the complexity of AI development, and achieve meaningful business outcomes through scalable, sustainable compute. Our culture is defined by ownership, accountability, and rapid innovation. We operate with urgency and transparency, and every team member contributes to building the infrastructure powering the future of AI.

The Opportunity

Nscale’s AI Infrastructure Operations team supports one of the most demanding AI platforms in the industry. We are looking for a Senior Site Reliability Engineer to help design, build, and operate reliable, scalable infrastructure across our GPU cloud.

What You’ll Be Doing
  • Design, build, and improve automation, tooling, and infrastructure systems supporting AI and HPC workloads
  • Contribute to the development of control-plane systems and operational frameworks
  • Define and implement SLOs, SLIs, and monitoring strategies to ensure system reliability
  • Participate in incident response and root cause analysis, driving improvements to reduce recurrence
  • Identify and address reliability and performance bottlenecks across systems
  • Collaborate with Engineering, Network, and Fleet teams to improve system design and operational processes
  • Drive improvements in availability, scalability, and operational efficiency
  • Mentor junior engineers and contribute to a strong engineering and reliability culture
What You Bring
  • 5–8+ years of experience in SRE, Systems Engineering, or Software Engineering in production environments
  • Strong software engineering skills with experience building automation and infrastructure tooling
  • Solid understanding of Linux systems, networking, and distributed systems
  • Experience troubleshooting issues across infrastructure, OS, networking, and application layers
  • Familiarity with monitoring, alerting, and observability tools
  • Ability to balance reliability, performance, and delivery speed
Preferred Experience
  • Experience with AI or HPC environments, including GPUs or high-performance systems
  • Exposure to high-speed networking (InfiniBand/RDMA)
  • Familiarity with Kubernetes, cloud platforms, or bare-metal environments
  • Experience with observability systems in high-scale environments

The range below reflects the base salary for the position. Actual compensation may vary based on job-related factors such as skill set, experience, education, and location. In addition to base salary, this role may be eligible for bonus, equity, and/or commission programs. Nscale may offer a competitive benefits package including medical, dental, vision, flexible paid time off, parental leave, and retirement plan participation.

Salary Range: $100,000 USD - $165,000 USD

For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.

Position Requirements
10+ Years work experience