Principal Site Reliability Engineer
Listed on 2025-11-27
Playson is a leading online gaming supplier, founded in 2012, with worldwide recognition in the industry.
We provide our customers with a high‑end micro‑service‑based platform as a service that processes billions of financial transactions per day.
We offer a cross‑regional setup and focus on driving latency toward zero, investing heavily in delivering the best game experience and a smooth connection regardless of the game client's internet coverage and bandwidth.
We are currently seeking an experienced Principal Site Reliability Engineer to join our dynamic Platform Tribe.
Key Responsibilities
- Manage day‑to‑day alerts, system checks, and issue escalation as necessary.
- Provide 24x7 on‑call support for critical SaaS events.
- Document issues and remediation steps.
- Proactively create monitors within the EKS/K8s ecosystem.
- Deploy to EKS/K8s cluster using Terraform and Helm/Flux.
- Enhance infrastructure health by implementing checks and scripts to address known issues.
- Maintain and develop deployment code.
- Implement and integrate new technologies into our Cloud Infrastructure.
- Collaborate with other teams to provide top‑notch support and assistance.
- Prioritize customer focus in planning deployments/updates to ensure minimal impact.
- Conduct root cause analysis (RCA) and take corrective actions to prevent issue recurrence.
- Assign alert‑related actions to the appropriate team after investigation.
- Handle support requests for environment‑specific actions.
Requirements
- Strong experience with issue processing (RCA, post‑mortems).
- Proficiency in Kubernetes (deployment, scaling, troubleshooting).
- Familiarity with AWS, Terraform, Docker, CI/CD.
- Experience with monitoring tools such as Datadog, Prometheus, and Grafana, and with logging solutions such as the ELK Stack or Amazon CloudWatch.
- Strong understanding of networking concepts and protocols.
- Proficiency in at least one scripting language (Python, Node.js, Go).
- Experience with GitOps and configuration management tools such as Flux CD or Argo CD.
- Proficiency in Git or other version control systems.
- Familiarity with incident response and management tools (PagerDuty, Opsgenie, VictorOps).
- Ownership, proactiveness, persistence, and passion for maintaining a high‑traffic online platform.
Hiring Process
- HR Interview
- Technical Interview
- Final Interview with Head of Platform & CTO
Benefits
- Flexibility in your schedule
- Full medical insurance for you and your +1
- Unlimited paid vacation leave
- Bonus system
- Unlimited sick leave
- Courses and training reimbursement
If you're ready to embrace ambitious goals and thrive in a dynamic environment, apply now and become part of Playson's exciting journey in the iGaming world!