
Senior Site Reliability Engineer

Job in Arlington, Arlington County, Virginia, 22201, USA
Listing for: The Recruiting Guy
Full Time position
Listed on 2025-11-14
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability, Network Engineer
Salary/Wage Range or Industry Benchmark: 175,000 to 250,000 USD per year
Job Description

Senior Cloud Infrastructure Engineer

Location: San Francisco, CA (on-site only). Candidates must live within commuting distance or be willing to relocate.

Compensation: $175,000 – $250,000 per year (Salaried W2 Full‑Time).

About the Company

We are a pioneering open source technology company transforming how creators interact with generative AI. Our node‑based visual interface enables artists, developers, and innovators to design, control, and customize AI workflows with complete flexibility.

About the Role

You will lead the design, deployment, and maintenance of large-scale distributed systems that power AI workloads. You will also collaborate closely with core engineers to shape the company's long-term infrastructure vision while ensuring scalability, performance, and reliability across environments.

What You’ll Do
  • Design, build, and maintain the core infrastructure that powers AI workloads at scale.
  • Manage and automate GPU compute clusters using tools such as Python, Kubernetes, Terraform, and Ansible.
  • Architect and operate systems for orchestration, observability, distributed storage, and networking.
  • Ensure reliability, scalability, and performance across production environments.
  • Collaborate closely with core engineers to design infrastructure for new features and systems.
  • Contribute to technical strategy and long‑term infrastructure vision.
  • Drive best practices for infrastructure automation, deployment, and monitoring.

Requirements
  • 5+ years of experience as an Infrastructure Engineer or Site Reliability Engineer building and operating large-scale distributed systems.
  • Skilled in Python and comfortable working with infrastructure‑as‑code tools such as Terraform and Ansible.
  • Familiar with container orchestration systems such as Kubernetes and the surrounding tooling, including FluxCD for GitOps-based deployment and Prometheus and Grafana for monitoring.
  • Capable of managing high-performance GPU environments across cloud and bare-metal setups.
  • Highly adaptable, resourceful, and motivated by building things from the ground up.
  • Excited to work in a small, fast‑growing team where autonomy and accountability are key.
  • Comfortable working on‑site in a startup setting where collaboration and speed matter most.

Bonus Points
  • Experience contributing to or maintaining open‑source projects.
  • Background working with AI infrastructure, ML pipelines, or GPU orchestration.
  • Strong computer science fundamentals and ability to work across different programming languages or frameworks.

Skills

FluxCD, Ansible, Kubernetes, Grafana, Prometheus, Python, Terraform, infrastructure.


Position Requirements
10+ years of work experience