Site Reliability Engineer
Listed on 2026-06-28
IT/Tech
Systems Engineer, SRE/Site Reliability, Cloud Computing: Infrastructure & Operations
We are looking for a Site Reliability Engineer to own the digital infrastructure that powers our research.
This includes compute resources that we rent from third parties, container registries, and dashboards. The main objective is to make sharing these resources easy and efficient, ensuring the infrastructure is reliable and accessible to the right people.
This role spans a broad spectrum of activities:
Compute Access:
Ensure easy and efficient access to compute resources for our researchers.
Resource Visibility:
Provide clear visibility into resource utilization and cluster health.
Auto-Scaling:
Enable automatic scaling of compute resources based on demand.
Access Management:
Ensure the right people have access to the right resources.
Reproducibility:
Drive towards deterministic deployments and reproducible research environments.
Process Automation:
Automate operational processes where it makes sense to increase efficiency.
Current stack:
Ansible, Kubernetes, Docker, Tailscale, Python, Grafana, Prometheus, and Talos Linux. We're not religious about any of it.
Qualifications:
Ownership:
You are comfortable being the person accountable when the cluster is unhealthy or capacity is tight.
Systems Intuition:
You understand how schedulers, containers, networking, storage, and hardware interact. You can reason about failure modes and design systems that degrade predictably.
Operational Rigor:
You value observability, reproducibility, and clear operational boundaries. You leave systems in a state that other engineers can understand, operate, and debug without you.
Pragmatism:
You can support experimental research workloads without forcing everything into a rigid "production" mold. You know when to stabilize and when to allow controlled chaos to speed up discovery.
Location & Visa:
This role is in-person in Emeryville, CA.
Visa sponsorship may be available for qualified candidates.