Senior Site Reliability Engineer Job, Raleigh area, North Carolina, USA, IT/Tech

Senior Cloud Infrastructure Engineer

Location: San Francisco, CA (On‑Site only)

Employment Type: Salaried W2 Full‑Time

Salary Range: $ – $ per year

Relocation Assistance: None – Must live within commuting distance of San Francisco or be willing to relocate.

About The Company

We are a pioneering open‑source technology company based in San Francisco, transforming the way creators interact with generative AI. Our platform is a powerful, node‑based visual interface that allows artists, developers, and innovators to design, control, and customize AI workflows with complete flexibility. Users can connect modular components, build complex pipelines, and run everything locally at impressive speed and precision. Our mission is to make generative AI open, transparent, and accessible to everyone, fostering community collaboration and creative empowerment.

About The Role

As a Senior Cloud Infrastructure Engineer, you will lead the design, deployment, and maintenance of large‑scale distributed systems that power AI workloads. You will collaborate closely with core engineers to shape the company’s long‑term infrastructure vision while ensuring scalability, performance, and reliability across environments.

What You’ll Do

Design, build, and maintain the core infrastructure that powers AI workloads at scale
Manage and automate GPU compute clusters using Python, Kubernetes, Terraform, and Ansible
Architect and operate systems for orchestration, observability, distributed storage, and networking
Ensure reliability, scalability, and performance across production environments
Collaborate closely with core engineers to design infrastructure for new features and systems
Contribute to technical strategy and long‑term infrastructure vision
Drive best practices for infrastructure automation, deployment, and monitoring

Requirements

5+ years of experience as an Infrastructure Engineer or Site Reliability Engineer building and operating large‑scale distributed systems
Skilled in Python and comfortable working with infrastructure‑as‑code tools such as Terraform and Ansible
Familiar with container orchestration systems such as Kubernetes and related tooling like FluxCD, Prometheus, and Grafana
Capable of managing high‑performance GPU environments across cloud and bare‑metal setups
Highly adaptable, resourceful, and motivated by building things from the ground up
Excited to work in a small, fast‑growing team where autonomy and accountability are key
Comfortable working on‑site in a startup setting where collaboration and speed matter most

Bonus Points

Experience contributing to or maintaining open‑source projects
Background working with AI infrastructure, ML pipelines, or GPU orchestration
Strong computer science fundamentals and ability to work across different programming languages or frameworks

Skills

fluxcd, ansible, kubernetes, grafana, prometheus, python, terraform, infrastructure

Seniority Level

Mid‑Senior level

Employment Type

Full‑time

Job Function

Engineering and Information Technology

Industry

Human Resources Services
