Senior Site Reliability Engineer (SRE) Job, Redwood City area, California, USA, IT/Tech

Position: Senior Site Reliability Engineer (SRE)

At Bellota Labs, we are a fast-paced, hypergrowth startup poised to revolutionize the gaming world with ClubWPT Gold, a groundbreaking product from the World Poker Tour. Driven by innovation, game integrity, and exceptional customer experiences, we are on a mission to set new standards in online gaming.

We are seeking an experienced Senior Site Reliability Engineer (SRE) to design, build, and maintain highly reliable, scalable, and secure systems. You will play a critical role in ensuring system availability, performance, and operational excellence across our infrastructure and applications.

As a senior member of the team, you will also mentor engineers, influence architecture decisions, and drive best practices in reliability engineering, automation, and incident management.

Key Responsibilities:

Reliability & Availability

Design and implement highly available, scalable, and fault-tolerant systems.
Define and maintain SLIs, SLOs, and SLAs.
Lead incident response, root cause analysis (RCA), and postmortems.
Improve system resiliency and reduce operational toil through automation.

Observability & Monitoring

Design monitoring, alerting, and logging strategies.
Implement tools such as Prometheus, Grafana, Datadog, ELK, or similar.
Establish proactive alerting and capacity planning processes.

Performance & Scalability

Conduct performance testing and optimization.
Identify bottlenecks and implement improvements.
Support system scaling initiatives and architecture reviews.

Collaboration & Leadership

Partner with engineering teams to embed reliability into development processes.
Lead reliability initiatives and cross-functional projects.
Mentor junior engineers and promote SRE best practices.

Experience:

5+ years of experience in SRE, DevOps, or Infrastructure Engineering.
Strong experience with cloud platforms (AWS).
Deep understanding of Linux systems and networking fundamentals.
Experience with containerization and orchestration (Docker, Kubernetes).
Proficiency in scripting/programming (Python, Go, Bash, or similar).
Experience with monitoring and observability platforms (Datadog/Prometheus).

Preferred Qualifications (Nice to Have):

Experience operating high-scale production systems.
Experience with microservices architecture.
Background in database reliability (Postgres, MySQL, Redis, etc.).
Experience implementing SRE practices (error budgets, blameless postmortems).
Experience with AI-driven SRE.

$200,000 - $250,000 a year

Lead High-Impact Projects – Play a key role in delivering innovative gaming experiences to a global audience

Collaborate Across Borders – Work with talented teams across Asia and the US

Fast-Paced Growth – Be part of a hypergrowth startup with ambitious goals

Competitive Benefits – Enjoy a top-tier compensation package in a dynamic company


