Site Reliability Engineer Job San Francisco California USA,IT/Tech

Site Reliability Engineer role

Only US citizens (USC) or green card (GC) holders are considered at this time.

San Francisco
- Local to the Bay Area only; the role is remote, but occasional in-person meetings are required

Latest update, 03/31/2026:

The Site Reliability Engineer role is critical for us right now - we have enterprise customers with urgent reliability issues that need immediate attention. We're excited to see candidates who can jump in and own this piece of our infrastructure!

What our team says about this role

Client is looking for 2 SREs. From a financial perspective, the business has hit $7M+ in ARR, is meaningfully profitable, and is growing exponentially.

They're looking for 2 SREs with strong programming expertise and experience with large-scale systems to own reliability and performance for enterprise customers including Nvidia, Samsara, Zapier and PwC.

Avoid candidates who are too CI/CD or DevOps focused - they need people with genuine debugging experience in production environments.

The role has a base salary of $150K - $250K + equity. They have a preference for on-site in San Francisco, but the search is also open to remote for strong candidates who are not based in the SF area.

"We are looking for a Site Reliability Engineer with strong programming expertise and experience with large-scale systems to own the reliability and performance for our enterprise customers including Nvidia, Samsara, Zapier and PwC.

You will work closely with our Co-founders and have a massive impact on our product and customer satisfaction."

Tech stack

Python, C, Rust, Kubernetes, FastAPI, Redis, Postgres, Prisma

Seniority

4-8+ years of experience in production or reliability engineering, with a focus on debugging and fixing system-level issues.

Work experience

Experience debugging memory leaks in a production environment.
Experience as a production engineer or reliability engineer with direct experience fixing issues rather than only reporting them.
Experience working with large-scale systems (at least 1k RPS).

Hard skills

Strong programming ability in C and Rust
Experience with PostgreSQL, Redis, Kubernetes, or Prometheus/Grafana

Soft skills

Excited to work at an early-stage startup and willing to work ~60 hours / week.

What you will do:

Work directly with enterprise customers to debug and resolve production issues.
Own product reliability and performance, with a focus on debugging memory leaks, connection pool issues, and other critical bugs.
Proactively improve the overall reliability of the system to prevent future issues.
Profile systems, run benchmarks, and work to improve latency and throughput.
Collaborate in a fast-paced startup environment.

Role Details

Title: Site Reliability Engineer
Core responsibilities include owning product reliability and performance, debugging memory leaks, and working closely with enterprise customers.
Reports to the co-founder and collaborates with the entire 5-person company.

Candidate Requirements

Must have experience with large-scale systems, ideally from big tech companies like Meta, Amazon, Microsoft, etc.
Strong debugging skills, particularly with memory leaks, are essential.
Programming proficiency in C, Rust, and Python is required.
Looking for candidates with 4+ years of experience, but open to more senior candidates if they fit the role.

Company Context

Client is a profitable AI company with $7 million in ARR, used by companies like Netflix, NASA, and Nvidia.
The company is a small, dynamic team focusing on open-source AI gateways.

Compensation and Logistics

Salary range set at $200,000 to $250,000.
Remote work is possible, but Bay Area candidates preferred for occasional on-site work.

Timeline and Urgency

Hiring is urgent due to customer issues and potential churn.
The interview process includes a recruiter screen, a 30-minute call, a technical round, and an on-site session.

Pain Points

Current customer issues need immediate attention to prevent churn.
Lack of dedicated personnel focusing solely on product reliability.

Ideal Candidate Profile

Preferred from big tech companies with experience in high-traffic environments.
Hands-on, eager to work in a startup environment, willing to handle long hours and high-pressure situations.