
Site Reliability Engineer (SRE) | DevOps Engineer

Job in Menlo Park, San Mateo County, California, 94029, USA
Listing for: BioSpace
Full Time position
Listed on 2026-06-05
Job specializations:
  • IT/Tech
    Cloud Computing: Infrastructure & Operations, Systems Engineer
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly
Job Description & How to Apply Below
Position: Staff Site Reliability Engineer (SRE) | DevOps Engineer #4770

Overview

Our mission is to detect cancer early, when it can be cured. We are working to change the trajectory of cancer mortality and bring stakeholders together to adopt innovative, safe, and effective technologies that can transform cancer care. We are a healthcare company pioneering new technologies to advance early cancer detection. We have built a multi‑disciplinary organization of scientists, engineers, and physicians and we are using the power of next‑generation sequencing (NGS), population‑scale clinical studies, and state‑of‑the‑art computer science and data science to overcome one of medicine’s greatest challenges.

On‑site Expectations

You will work on‑site full‑time at our office located in Menlo Park, California. Beginning in Fall 2026, you will work at our new headquarters in Sunnyvale, California.

Responsibilities
  • Design, build, and operate highly available, fault‑tolerant cloud infrastructure across AWS, GCP, and/or Azure.
  • Architect and maintain scalable CI/CD pipelines and deployment frameworks for enterprise‑grade software delivery.
  • Lead infrastructure‑as‑code adoption and maturity using tools such as Terraform, CloudFormation, and Ansible.
  • Own Kubernetes reliability across multi‑cluster environments, including upgrades, scaling, and workload lifecycle management.
  • Establish and evolve observability platforms (metrics, logs, traces) and define SLO/SLI frameworks across teams.
  • Lead incident response for critical outages, drive root cause analysis, and implement preventative improvements.
  • Optimize infrastructure for cost, performance, and scalability, partnering closely with engineering and finance stakeholders.
  • Define and enforce DevOps, reliability, and security best practices across the organization.
  • Partner cross‑functionally with engineering, data, QA, security, and IT teams to design resilient systems.
  • Mentor engineers and contribute to technical leadership through design reviews, standards, and knowledge sharing.
Success Measures for the First Year
  • Conduct a comprehensive assessment of the current infrastructure, drive infrastructure‑as‑code adoption to 95%+ across critical systems, and establish clear health and reliability baselines for the Kubernetes platform.
  • Standardize observability using modern tooling and implement an SLO/SLI framework adopted across multiple product teams, including defined SLAs for critical data systems.
  • Strengthen security and compliance posture across cloud environments by implementing consistent baselines, launching a compliance‑as‑code framework, and reducing mean time to resolution (MTTR) for production incidents.
  • Define, document, and drive adoption of engineering standards, best practices, and operational guidelines across platform and product teams.
  • Develop and align stakeholders on a forward‑looking platform reliability and infrastructure roadmap.
  • Demonstrate measurable mentorship and technical leadership impact across the engineering organization.
  • Evaluate and provide recommendations on emerging infrastructure needs, including support for AI/ML and advanced data workloads.
Required Qualifications
  • BS in Computer Science, Engineering, or related field, or equivalent experience.
  • 8+ years of experience in Site Reliability Engineering, DevOps, or platform engineering.
  • Strong hands‑on experience with at least one major cloud platform (AWS, GCP, or Azure).
  • Experience implementing infrastructure‑as‑code solutions (Terraform, CloudFormation, or similar).
  • Experience designing and operating CI/CD pipelines (e.g., GitLab CI, GitHub Actions, Jenkins).
  • Hands‑on experience with Kubernetes and containerized systems in production environments.
  • Proficiency in scripting or programming for automation (Python, Go, Bash, or PowerShell).
  • Experience with observability and monitoring tools (Prometheus, Grafana, OpenTelemetry, Datadog).
  • Strong understanding of networking, security, and distributed systems fundamentals.
  • Experience working in regulated environments and familiarity with frameworks such as ISO 27001, NIST, SOC 2, or HIPAA.
Preferred Qualifications
  • 10+ years of experience in SRE, DevOps, or infrastructure engineering.
  • Experience operating multi‑cluster Kubernetes…