
Site Reliability Engineer

Job in San Francisco, San Francisco County, California, 94102, USA
Listing for: United IT
Full Time position
Listed on 2026-07-01
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing: Infrastructure & Operations, SRE/Site Reliability
Job Description
Position: Staff Site Reliability Engineer

Staff Site Reliability Engineer (SRE)

Location:

San Francisco, CA

Job Responsibilities

As our Staff SRE, you'll be the primary expert responsible for our entire compute ecosystem, operating at the highest level of technical expertise and influence. You won't just solve problems; you'll prevent them at a fundamental level across organizational boundaries. Your key responsibilities will include:

  • Design, implement, and lead large-scale, cross-functional projects to improve the reliability, performance, and efficiency of our core services and infrastructure (10× impact).
  • Drive the reduction of toil by developing and deploying sophisticated automation tools and frameworks, championing the "everything as code" philosophy.
  • Serve as a technical escalation point for critical incidents, perform deep-dive root cause analyses (RCAs), and implement robust corrective measures to prevent recurrence.
  • Define and implement SLIs, SLOs, and error budgets for critical services. Enhance our monitoring, logging, and tracing systems to provide comprehensive visibility into system health.
  • Set the technical direction and best practices for the entire SRE and engineering organization. Mentor mid-level and senior engineers on design patterns, operational rigor, and reliability principles.

We're looking for a leader and a deep technical expert with a proven track record of solving the hardest scaling and reliability challenges.

Required Qualifications
  • 8+ years of progressive experience in Site Reliability Engineering, Production Engineering, or a closely related role.
  • Expert-level proficiency with AWS, including networking, compute, and storage.
  • Deep expertise in Kubernetes and the cloud-native ecosystem.
  • Fluency in at least one major scripting/programming language for automation and tooling (e.g., Python, Go, or Java).
  • Solid experience with monitoring and logging solutions (Datadog).
  • Proven ability to design and implement robust, highly available distributed systems.
  • Demonstrated experience with Infrastructure as Code tools like Terraform.
  • Exceptional communication skills, capable of explaining complex technical issues to both technical and non-technical audiences.
Nice-to-Have
  • Experience implementing Service Mesh technologies (e.g., Istio, Linkerd).
  • A strong understanding of security principles and practices in a cloud environment.
  • Certifications such as CKA (Certified Kubernetes Administrator) or CKAD (Certified Kubernetes Application Developer).