Expert Site Reliability Engineer at Confluent
Job in
Toronto, Ontario, C6A, Canada
Listed on 2026-06-16
Listing for:
IBM
Full Time position
Job specializations:
- IT/Tech
Cloud Computing: Infrastructure & Operations, Systems Engineer, SRE/Site Reliability
Job Description & How to Apply Below
You'll work within a multi-cloud architecture to optimize performance and reliability.
This expert role blends 75% technical engineering with 25% strategy, involving the analysis of systemic failure patterns, designing reliability frameworks, and teaching best practices. You'll be instrumental in developing incident response processes that facilitate organizational success and sustainability. Join a global team dedicated to improving cloud-based reliability.
Key Responsibilities:
• Analyze systemic failure patterns and address their underlying causes
• Own configuration and workflows for incident management tools
• Define SLO/SLA frameworks to guide reliability investments
• Edit incident documents for customer clarity
• Lead training programs and coach teams through post-mortems
Requirements:
• 10+ years of experience in SRE or incident management
• Cloud experience with AWS, GCP, or Azure
• Expertise in incident management tools such as Rootly
• Strong understanding of distributed systems
• Experience in cultural change within engineering organizations
Utilize your reliability engineering expertise to drive impactful changes across Confluent's architecture.