Senior Site Reliability Engineer - Observability
Seattle, King County, Washington, 98127, USA
Listed on 2025-12-01
IT/Tech
Systems Engineer, SRE/Site Reliability, Cloud Computing, IT Support
Join Lambda’s AI cloud mission as a Senior Site Reliability Engineer focused on Observability. This role requires onsite presence in the San Francisco office four days a week, with a remote work day on Tuesday.
Base Pay Range: $240,000 – $401,000 per year
What You’ll Do
- Deploy and operate observability platforms for logging, metrics, and distributed tracing.
- Automate the deployment and operation of these systems.
- Set up monitoring for modern AI/HPC clusters.
- Develop platform software that improves system reliability across Lambda engineering.
- Lead other engineering teams in designing and developing solutions to monitoring challenges.
What You Bring
- 8+ years of software engineering experience, with 3+ years in Go.
- 5+ years of experience applying SRE practices.
- Proven understanding of observability tools and practices.
- Experience with Kubernetes deployment and monitoring.
- Experience building CI/CD pipelines.
- Hold the solutions you build to a high standard of quality and reliability.
- Collaborate across team boundaries to meet observability needs.
Nice to Have
- Experience monitoring AI systems or HPC clusters.
- Prometheus and PromQL queries.
- Messaging systems like NATS.
- OpenTelemetry ecosystem experience.
- Network monitoring, Ethernet, InfiniBand.
- Dashboard design principles.
- Linux fundamentals and system administration.
- Infrastructure automation with Ansible and Terraform.
What We Offer
- Generous cash & equity compensation.
- Health, dental, vision coverage.
- Wellness and commuter stipends.
- 401(k) with 2% company match (USA).
- Flexible paid time off.
Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
A Final Note
You do not need to meet all of the listed expectations to apply for this position. Lambda is committed to building a team with a variety of backgrounds, experiences, and skills.