Site Reliability Engineer
Job in
San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-06-09
Listing for:
HappyRobot Inc
Full Time position
Job specializations:
- IT/Tech
IT Support, Systems Engineer, Cloud Computing, SRE/Site Reliability
Job Description & How to Apply Below
HappyRobot is the infrastructure for enterprises to build and orchestrate AI workforces. Our AI workers don't just communicate - they make decisions, take action, and run operations autonomously across voice, email, and enterprise systems. Born in Y Combinator (S23) and backed by a16z and Base10 with over $60M raised, we power critical operations for global enterprises.
Our platform is battle-tested in the most demanding environments - where AI has real consequences. We started in logistics, built our own voice stack, models, and orchestration layer from the ground up, and are now bringing that infrastructure to every enterprise that runs the real economy. Learn more about our vision in our manifesto.
About the Role
We're looking for a Site Reliability Engineer to take the lead on scaling our operational resilience as we grow. You'll own the stability, observability, and debugging workflows that keep our systems running smoothly. You'll be the go-to person for untangling complex failures in real time, designing tools that turn chaos into clarity, and helping us shift from reactive to proactive operations.
This is a high-impact, high-trust role where you'll shape how reliability is done - reducing incident load, building internal tooling, and directly improving developer focus and system uptime. If you love getting to the root of hard problems and making systems (and teams) stronger, this is your moment.
Must-Have
- 3+ years of hands-on experience debugging production systems (logs, traces, incidents, etc.)
- Strong problem-solving skills and ability to dive into unfamiliar backend codebases
- Strong Go and Kubernetes experience
- Familiarity with observability and monitoring tools (e.g., Grafana, Prometheus, Sentry)
- Clear, calm communication under pressure - especially during live incidents
- Experience working with distributed systems or services at scale
- Built or maintained internal tooling for on-call teams or reliability workflows
- Familiarity with deployment pipelines, CI/CD, or infra-as-code
- Experience improving system observability (e.g., custom metrics, traces, log pipelines)
What We Offer
- Opportunity to work at a high-growth AI startup, backed by top investors.
- Fast Growth - Backed by a16z and YC, on track for double-digit ARR.
- Top-Tier Compensation - Competitive salary + equity in a high-growth startup.
- Ownership & Autonomy - Take full ownership of projects and ship fast.
- Work With the Best - Join a world-class team of engineers and builders.
Extreme Ownership
We take full responsibility for our work, outcomes, and team success. No excuses, no blame-shifting - if something needs fixing, we own it and make it better. This means stepping up, even when it's not "your job." If a ball is dropped, we pick it up. If a customer is unhappy, we fix it. If a process is broken, we redesign it.
We don't wait for someone else to solve it - we lead with accountability and expect the same from those around us.
Craftsmanship
Putting care and intention into every task, striving for excellence, and taking deep ownership of the quality and outcome of your work. Craftsmanship means never settling for "just fine." We sweat the details because details compound. Whether it's a product feature, an internal doc, or a sales call - we treat it as a reflection of our standards. We aim to deliver jaw-dropping customer experiences by being curious, meticulous, and proud of what we build - even when nobody's watching.
We are "majos"
Be friendly & have fun with your coworkers. Always be genuine & honest, but kind. "Majo" is our way of saying: be a good human. Be approachable, helpful, and warm. We're building something ambitious, and it's easier (and more fun) when we enjoy the ride together. We give feedback with kindness, challenge each other with respect, and celebrate wins together without ego.
Urgency with Focus
Create the highest impact in the shortest amount of time. Move fast, but in the right direction. We operate with speed because time is our most limited resource. But speed without focus is chaos. We prioritize ruthlessly, act decisively, and stay aligned. We aim for high…