Site Reliability Engineer
Listed on 2026-06-09
IT/Tech
Systems Engineer, IT Support, Cybersecurity, SRE/Site Reliability
About Happy Robot
Happy Robot is the infrastructure for enterprises to build and orchestrate AI workforces. Our AI workers don't just communicate - they make decisions, take action, and run operations autonomously across voice, email, and enterprise systems. Born in Y Combinator (S23) and backed by a16z and Base10 with over $60M raised, we power critical operations for global enterprises.
Our platform is battle-tested in the most demanding environments - where AI has real consequences. We started in logistics, built our own voice stack, models, and orchestration layer from the ground up, and are now bringing that infrastructure to every enterprise that runs the real economy. Learn more about our vision in our manifesto.
About the Role
We're looking for an Infrastructure Engineer to take the lead on scaling our operational resilience as we grow. You’ll own the stability, observability, and debugging workflows that keep our systems running smoothly. You'll be the go-to person for untangling complex failures in real time, designing tools that turn chaos into clarity, and helping us shift from reactive to proactive operations.
This is a high-impact, high-trust role where you’ll shape how reliability is done - reducing incident load, building internal tooling, and directly improving developer focus and system uptime. If you love getting to the root of hard problems and making systems (and teams) stronger, this is your moment.
Must-Have
- 3+ years of hands-on experience debugging production systems (logs, traces, incidents, etc.)
- Strong problem-solving skills and ability to dive into unfamiliar backend codebases
- Strong Go and Kubernetes experience
- Familiarity with observability and monitoring tools (e.g., Datadog, Prometheus, Sentry)
- Clear, calm communication under pressure — especially during live incidents
- Experience working with distributed systems or services at scale
- Built or maintained internal tooling for on-call teams or reliability workflows
- Familiarity with deployment pipelines, CI/CD, or infra-as-code
- Experience improving system observability (e.g., custom metrics, traces, log pipelines)
- Opportunity to work at a high-growth AI startup, backed by top investors.
- Fast Growth: Backed by a16z and YC, on track for double-digit ARR.
- Top-Tier Compensation: Competitive salary + equity in a high-growth startup.
- Ownership & Autonomy: Take full ownership of projects and ship fast.
- Work With the Best: Join a world-class team of engineers and builders.
The personal data provided in your application and during the selection process will be processed by Happyrobot, Inc., acting as Data Controller.
By sending us your CV, you consent to the processing of your personal data for the purpose of evaluating and selecting you as a candidate for the position. Your personal data will be treated confidentially and will only be used for the recruitment process of the selected job offer.
Your personal data will be deleted after three months of inactivity, in compliance with the GDPR and applicable data protection legislation.
If you wish to exercise your rights of access, rectification, deletion, portability or opposition in relation to your personal data, you can do so through security subject to the GDPR.
For more information, see our privacy policy.
By submitting your request, you confirm that you have read and understood this clause and that you agree to the processing of your personal data as described.