Site Reliability Engineer LLM Platforms
Job in
Toronto, Ontario, C6A, Canada
Listed on 2026-06-20
Listing for:
Insight Global
Full Time position
Job specializations:
- IT/Tech
SRE/Site Reliability, Systems Engineer, Cloud Computing: Infrastructure & Operations
Job Description & How to Apply Below
We're seeking a Site Reliability Engineer to enhance our real-time transcription and summarization platforms powered by Large Language Models.
Your role will be vital in ensuring system reliability and performance.
As an SRE, you will confront the complexities inherent in handling diverse real-time communication formats. Your daily tasks will include monitoring system health, optimizing infrastructure, and managing incident responses. You will implement automation to maintain efficient, low-latency data pipelines across various platforms.
Key Responsibilities:
• Monitor and manage the health of LLM platforms
• Improve performance through enhanced observability
• Respond effectively to incidents when they arise
• Scale infrastructure to meet growing operational needs
• Automate processes for reliable data handling
Requirements:
• Expertise in Dynatrace, including Real User Monitoring (RUM)
• Familiarity with Elasticsearch and Grafana
• Experience with OpenShift and OpenTelemetry
• Knowledge of MongoDB and PostgreSQL
• Proficiency in GitHub Actions CI/CD and HashiCorp Vault
Become a crucial part of our team dedicated to reliability and performance.