Senior Site Reliability Engineer

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Brahma Consulting Group
Full Time position
Listed on 2026-02-16
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability, Cybersecurity
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Job Description & How to Apply Below

We are looking for an experienced Site Reliability Engineer to help scale a data- and ML-heavy platform with reliability, observability, and operational excellence at its core. You’ll work closely with software engineers and data scientists to design, automate, and operate the infrastructure that powers data pipelines, machine learning workloads, and real-time analytics systems. This is a hands-on, high-impact role with broad ownership across the stack and significant influence on how our platform and operations evolve.

Responsibilities
  • Design, build, and maintain scalable infrastructure to support real-time analytics and ML workloads.
  • Improve system reliability and performance through automation, observability, and proactive capacity and resilience planning.
  • Own and evolve CI/CD pipelines, deployment automation, rollback mechanisms, and configuration management.
  • Implement and maintain monitoring, alerting, and incident response processes (SLOs, runbooks, on-call).
  • Collaborate closely with engineering and data science teams to drive a culture of reliability, performance, and operational excellence.
  • Ensure security, compliance, and operational readiness across cloud and on-prem infrastructure.
  • Lead post-incident reviews and drive continuous improvement initiatives.
Required Qualifications
  • 8+ years of experience in SRE, DevOps, or infrastructure engineering roles.
  • 5+ years of experience with datacenter operations and/or system and network administration.
  • Strong experience with containerization (Docker) and orchestration (Kubernetes).
  • Deep knowledge of Linux systems, networking, and systems performance tuning.
  • Solid understanding of Infrastructure as Code (e.g., Terraform, Ansible) and config management.
  • Strong scripting and coding skills, applying sound engineering principles to IaC and automation (Terraform, Ansible, Bash, Python).
  • Experience with monitoring and observability stacks (e.g., Prometheus, Grafana, Datadog, ELK, OpenTelemetry).
  • Proficiency with CI/CD tools and pipelines (e.g., GitHub Actions, ArgoCD or similar).
  • Proven ability to debug complex, distributed systems and automate robust solutions.
  • Excellent communication skills and comfort working cross-functionally in fast-moving environments.
Preferred Qualifications
  • Experience with NVIDIA DGX / POD architectures and related tooling (e.g., Base Command Manager, Mission Control, Run:AI).
  • Experience with major cloud providers and managed services (e.g., AWS).
  • Familiarity with security and compliance for cloud-native infrastructure (e.g., SOC 2 or similar environments).
  • Experience at high-growth or top-tier tech companies (FAANG or VC-backed).
What You’ll Get
  • Ownership of mission-critical infrastructure at a company solving real-world enterprise problems.
  • A front-row seat in a high-performance engineering culture that values quality and velocity.
  • The opportunity to shape how the platform scales—from deployment strategies to incident management practices.
  • An environment that emphasizes curiosity, accountability, and meaningful impact.
Position Requirements
10+ Years work experience