Cloud Operations Engineer
Listed on 2026-01-01
IT/Tech
Cloud Computing, Systems Engineer
At Cloudbeds, we're not just building software; we're transforming hospitality. Our intelligently designed platform powers properties across 150 countries, processing billions in bookings annually. From independent properties to hotel groups, we help hoteliers transform operations and uplevel their commercial strategy through a unified platform that integrates with hundreds of partners. And we do it with a completely remote team. Imagine working alongside global innovators to build AI-powered solutions that solve hoteliers' biggest challenges.
Since our founding in 2012, we've become the World's Best Hotel PMS Solutions Provider and landed on Deloitte's Technology Fast 500 again in 2024 – but we're just getting started.
As a Cloud Operations Engineer, you’ll be the frontline support for our global infrastructure, playing a key role in ensuring 24/7 operational stability across our AWS-based environment. Your core responsibilities will include monitoring critical systems through platforms such as Datadog, PagerDuty, and CloudWatch, rapidly validating alerts, and escalating verified incidents based on clearly defined protocols.
You’ll execute operational tasks, follow documented procedures for common issues, and manage standard maintenance activities. You'll also have opportunities to collaborate directly with senior engineers across SRE, DevOps, and Infrastructure teams, contributing to the resolution of a wide range of technical challenges and gaining exposure to complex, real-world systems.
Acting as the central communication point during incidents, you’ll maintain clear, timely updates to stakeholders and facilitate smooth transitions between engineering and support teams.
Our Network Operations Team
You’ll be joining a brand-new team at the ground level, helping shape the future of SaaS operations for a company undergoing exciting growth. Working closely with SRE, DevOps, Security, and various Workload teams, you’ll be at the heart of collaborative problem-solving and operational innovation. It’s a rare chance to build, influence, and grow in a highly visible and impactful role.
This role offers a rare opportunity to gain deep, hands-on experience in cloud operations and incident management while working alongside high-performing engineering teams. You'll build the foundation for growth into specialized areas like SRE, DevOps, or Infrastructure Engineering, with direct exposure to real-world systems at scale.
What You Bring to the Team
- Support Kubernetes (EKS) environments by performing operational checks, validating pod health, reviewing logs, and assisting with incident triage during deployments and scaling events
- Assist with CI/CD pipeline operations by supporting deployments, rollbacks, and release verification in collaboration with DevOps and platform engineering teams using ArgoCD and GitHub
- Execute Infrastructure as Code changes and standard operating procedures using Terraform across cloud infrastructure and application services
- Monitor, triage, and validate incidents using observability and alerting tools such as PagerDuty, Datadog, Amazon CloudWatch, Prometheus, and Grafana, escalating to SRE, DevOps, or application teams as appropriate
- Execute documented runbooks and SOPs to resolve common operational issues, including basic AWS troubleshooting, infrastructure access requests (SSO, VPN, IAM), and deployment support
- Perform routine operational tasks such as configuration changes, maintenance activities, and standard change requests across cloud infrastructure and application services
- Contribute to operational excellence by maintaining and improving runbooks, updating documentation, and participating in post-incident reviews (RCA) to drive reliability improvements
- 3-4 years of hands-on experience in DevOps, Site Reliability Engineering (SRE), or related operational roles with a focus on cloud infrastructure
- Practical experience with Amazon EKS (Elastic Kubernetes Service) or other managed Kubernetes platforms, including container orchestration and operational management
- Hands‑on experience with CI/CD and…