Senior Systems Software Engineer, AI Infrastructure

Job in Coos Bay, Coos County, Oregon, 97458, USA
Listing for: NVIDIA
Full Time position
Listed on 2026-02-24
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 152,000 - 287,500 USD yearly
Job Description

Why consider this job opportunity

  • Base salary range of $152,000 - $287,500, depending on level and experience.
  • Eligibility for equity and a comprehensive benefits package.
  • Opportunity for career advancement and growth within a leading technology company.
  • Work in a diverse and supportive environment that values innovation and creativity.
  • Chance to contribute to groundbreaking AI Infrastructure projects that shape the future of computing.
Job Responsibilities
  • Develop and maintain large-scale systems for AI Infrastructure, ensuring reliability, operability, and scalability.
  • Collaborate on tooling for HPC, GPU training, and AI model training workflows.
  • Build tools and frameworks to enhance observability and improve system performance.
  • Implement SRE fundamentals, including incident management and performance optimization.
  • Work with engineering teams to deliver innovative solutions and uphold high standards for code and infrastructure.
Qualifications
  • A degree in Computer Science or a related field (or equivalent experience), plus 5+ years of experience in Software Development, SRE, or Production Engineering.
  • Proficiency in Python and at least one other programming language (C/C++, Go, Perl, Ruby).
  • Expertise in systems engineering within Linux or Windows environments and cloud platforms (AWS, Azure, GCP, or OCI).
  • Strong understanding of SRE principles, including error budgets, SLOs, and SLAs, plus experience with Infrastructure as Code tools.
  • Hands‑on experience with observability platforms and CI/CD systems.
Preferred Qualifications
  • Experience in AI training, inferencing, and data infrastructure services.
  • Proficiency in deep learning frameworks such as PyTorch, TensorFlow, JAX, and Ray.
  • Strong background in cloud or hardware health monitoring and system reliability.
  • Hands‑on expertise in operating and scaling distributed systems with stringent SLAs.
  • Knowledge of incident, change, and problem management processes.

We prioritize candidate privacy and champion equal‑opportunity employment. Central to our mission is our partnership with companies that share this commitment. We aim to foster a fair, transparent, and secure hiring environment for all. If you encounter any employer not adhering to these principles, please bring it to our attention immediately.

We are not the EOR (Employer of Record) for this position. Our role in this specific opportunity is to connect outstanding candidates with a top‑tier employer.

Position Requirements
10+ years of work experience