Site Reliability Engineer
Listed on 2026-01-06
IT/Tech
Cloud Computing
About Canonical
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers, and industry leaders in many sectors. The company is a pioneer of globally distributed collaboration, with 1200+ colleagues in 75+ countries and very few office‑based roles.
Teams meet two to four times yearly in person, in interesting locations around the world, to align on strategy and execution. The company is founder‑led, profitable, and growing.
Our goal is to perfect enterprise infrastructure DevOps practices, raising the bar on what's possible with automation by embracing a model‑driven approach, whether on‑premises or on public clouds. We run hundreds of private clouds, Kubernetes clusters, and applications for customers across both physical and public cloud estates. We identify and address incidents, monitor and observe applications, anticipate potential issues, and enable product refinement to ultimately achieve high‑quality standards in our open source portfolio.
Role Overview
Location: Globally remote role.
We deploy and run OpenStack, Kubernetes, storage solutions, and open source applications, applying DevOps practices. Your work will encompass the entire stack, from bare‑metal networking and kernel up to Kubernetes and open source applications. You will be trained in our core technologies, including OpenStack, Kubernetes, security standards, and open source products such as Kubeflow, Kafka, OpenSearch, and databases, among many others.
For us, automation is a software engineering problem that we approach with a scientific mindset, driven by metrics and code, to run operations at scale.
Responsibilities
- Deploy and maintain OpenStack, Kubernetes, storage solutions, and open source applications.
- Operate and monitor infrastructure across private cloud and public cloud estates.
- Investigate incidents, monitor performance, and anticipate potential issues to support product refinement.
- Build and maintain automation and tooling to enable operations at scale.
- Collaborate with cross‑functional teams to continuously improve infrastructure.
Requirements
- Degree in software engineering or computer science.
- Python software development experience.
- Operational experience in Linux environments.
- Experience with Kubernetes deployment or operations.
- Excellent interpersonal skills, curiosity, flexibility, and accountability.
- Ability to travel internationally twice a year for company events up to two weeks long.
Nice to have
- Familiarity with OpenStack deployment or operations.
- Familiarity with public cloud deployment or operations.
- Familiarity with private cloud management.
Benefits
- Distributed work environment with twice‑yearly team sprints in person.
- Personal learning and development budget of USD 2,000 per year.
- Compensation review every six months.
- Recognition rewards.
- Annual holiday leave.
- Maternity and paternity leave.
- Employee Assistance Programs.
- Opportunity to travel to new locations to meet your colleagues.
- Priority Pass and travel upgrades for long‑haul company events.
Canonical is an equal opportunity employer. We are proud to foster a workplace free from discrimination. Diversity of experience, perspectives, and background creates a better work environment and better products. Whatever your identity, we will give your application fair consideration.