Site Reliability Engineer
Remote / Online - Candidates ideally in San Francisco, San Francisco County, California, 94102, USA
Listed on 2026-06-02
Listing for: 3B Staffing
Remote/Work from Home position
Job specializations:
- IT/Tech
Systems Engineer, SRE/Site Reliability, Cloud Computing, Network Engineer
Job Description & How to Apply Below
Only U.S. citizens (USC) or green card (GC) holders are considered at this time.
San Francisco
- Local to the Bay Area only; the role is remote, but occasional in-person meetings are required.
Latest update, 03/31/2026:
The Site Reliability Engineer role is critical for us right now - we have enterprise customers with urgent reliability issues that need immediate attention. We're excited to see candidates who can jump in and own this piece of our infrastructure!
What our team says about this role
Client is looking for 2 SREs. From a financial perspective, the business has hit $7M+ in ARR, is meaningfully profitable, and is growing exponentially.
They're looking for 2 SREs with strong programming expertise and experience with large-scale systems to own reliability and performance for enterprise customers including Nvidia, Samsara, Zapier and PwC.
Avoid candidates who are too CI/CD or DevOps focused. They need people with genuine debugging experience in production environments.
The role has a base salary of $150K - $250K plus equity. They prefer on-site in San Francisco, but the search is also open to remote for strong candidates who are not based in the SF area.
"We are looking for a Site Reliability Engineer with strong programming expertise and experience with large-scale systems to own the reliability and performance for our enterprise customers including Nvidia, Samsara, Zapier and PwC.
You will work closely with our Co-founders and have a massive impact on our product and customer satisfaction."
Tech stack
Python, C, Rust, Kubernetes, FastAPI, Redis, Postgres, Prisma
Seniority
4-8+ years of experience in production or reliability engineering, with a focus on debugging and fixing system-level issues.
Work experience
- Experience debugging memory leaks in a production environment.
- Experience as a production engineer or reliability engineer with direct experience fixing issues rather than only reporting them.
- Experience working with large-scale systems (1k+ RPS).
- Strong programming ability in C and Rust.
- Experience with PostgreSQL, Redis, Kubernetes, or Prometheus/Grafana.
- Excited to work at an early-stage startup and willing to work ~60 hours / week.
Responsibilities
- Work directly with enterprise customers to debug and resolve production issues.
- Own product reliability and performance, with a focus on debugging memory leaks, connection pool issues, and other critical bugs.
- Proactively improve the overall reliability of the system to prevent future issues.
- Profile systems, run benchmarks, and work to improve latency and throughput.
- Collaborate in a fast-paced startup environment.
- Title: Site Reliability Engineer. Core responsibilities include owning product reliability and performance, debugging memory leaks, and working closely with enterprise customers.
- Reports to the co-founder and collaborates with the entire 5-person company.
- Must have experience with large-scale systems, ideally from big tech companies like Meta, Amazon, Microsoft, etc.
- Strong debugging skills, particularly with memory leaks, are essential.
- Programming proficiency in C, Rust, and Python is required.
- Looking for candidates with 4+ years of experience, but open to more senior candidates if they fit the role.
- Client is a profitable AI company with $7 million in ARR, used by companies like Netflix, NASA, and Nvidia.
- The company is a small, dynamic team focusing on open-source AI gateways.
- Salary range set at $200,000 to $250,000.
- Remote work is possible, but Bay Area candidates preferred for occasional on-site work.
- Hiring is urgent due to customer issues and potential churn.
- Interview process includes recruiter screen, 30-minute call, technical round, and on-site session.
- Current customer issues need immediate attention to prevent churn.
- The company currently lacks dedicated personnel focused solely on product reliability.
- Candidates from big tech companies with experience in high-traffic environments are preferred.
- Candidates should be hands-on, eager to work in a startup environment, and willing to handle long hours and high-pressure situations.