Senior Site Reliability Engineer; SRE
Listed on 2026-02-28
IT/Tech
SRE/Site Reliability, Systems Engineer, Cloud Computing, Network Engineer
At Bellota Labs, we are a fast-paced, hypergrowth startup poised to revolutionize the gaming world with ClubWPT Gold, a groundbreaking product from the World Poker Tour. Driven by innovation, game integrity, and exceptional customer experiences, we are on a mission to set new standards in online gaming.
We are seeking an experienced Senior Site Reliability Engineer (SRE) to design, build, and maintain highly reliable, scalable, and secure systems. You will play a critical role in ensuring system availability, performance, and operational excellence across our infrastructure and applications.
As a senior member of the team, you will also mentor engineers, influence architecture decisions, and drive best practices in reliability engineering, automation, and incident management.
Key Responsibilities:
Reliability & Availability
- Design and implement highly available, scalable, and fault-tolerant systems.
- Define and maintain SLIs, SLOs, and SLAs.
- Lead incident response, root cause analysis (RCA), and postmortems.
- Improve system resiliency and reduce operational toil through automation.
- Design monitoring, alerting, and logging strategies.
- Implement tools such as Prometheus, Grafana, Datadog, ELK, or similar.
- Establish proactive alerting and capacity planning processes.
- Conduct performance testing and optimization.
- Identify bottlenecks and implement improvements.
- Support system scaling initiatives and architecture reviews.
- Partner with engineering teams to embed reliability into development processes.
- Lead reliability initiatives and cross-functional projects.
- Mentor junior engineers and promote SRE best practices.
Qualifications:
- 5+ years of experience in SRE, DevOps, or Infrastructure Engineering.
- Strong experience with cloud platforms (AWS).
- Deep understanding of Linux systems and networking fundamentals.
- Experience with containerization and orchestration (Docker, Kubernetes).
- Proficiency in scripting/programming (Python, Go, Bash, or similar).
- Experience with monitoring and observability platforms (Datadog/Prometheus).
- Experience operating high-scale production systems.
- Experience with microservices architecture.
- Background in database reliability (Postgres, MySQL, Redis, etc.).
- Experience implementing SRE practices (error budgets, blameless postmortems).
- Experience with AI-driven SRE.
$200,000 - $250,000 a year
Lead High-Impact Projects – Play a key role in delivering innovative gaming experiences to a global audience
Collaborate Across Borders – Work with talented teams across Asia and the US
Fast-Paced Growth – Be part of a hypergrowth startup with ambitious goals
Competitive Benefits – Enjoy a top-tier compensation package in a dynamic company