Senior SRE
Listed on 2026-02-12
IT/Tech
Cloud Computing, Cybersecurity, Systems Engineer, IT Support
CBTS serves enterprise and midmarket clients in all industries across the United States and Canada. CBTS combines deep technical expertise with a full suite of flexible technology solutions, including Application Modernization, Managed Hybrid Cloud, Cybersecurity, Unified Communications, and Infrastructure solutions. From developing and deploying modern applications and the secure, scalable platforms on which they run, to managing, monitoring, and optimizing their operations, CBTS delivers comprehensive technology solutions for its clients' transformative business initiatives.
OnX is a leading technology solution provider that serves businesses, healthcare organizations, and government agencies across Canada. OnX combines deep technical expertise with a full suite of flexible technology solutions, including Generative AI, Application Modernization, Managed Hybrid Cloud, Cybersecurity, Unified Communications, and Infrastructure solutions. From developing and deploying modern applications and the secure, scalable platforms on which they run, to managing, monitoring, and optimizing their operations, OnX delivers comprehensive technology solutions for its clients' transformative business initiatives.
Location: Remote
Experience: 6+ years
Employment Type: Full-time
We are seeking a Senior Site Reliability Engineer (SRE) with strong experience in Splunk to ensure the reliability, scalability, and performance of our systems. The ideal candidate will design and implement monitoring solutions, automate operational tasks, and collaborate with development teams to improve system resilience and observability.
Key Responsibilities:
- Design, implement, and maintain Splunk dashboards, alerts, and reports for system monitoring and incident management.
- Develop and optimize observability solutions for infrastructure and applications.
- Automate operational processes using scripting and configuration management tools.
- Collaborate with development and operations teams to improve system reliability and performance.
- Troubleshoot and resolve production issues, ensuring minimal downtime.
- Implement incident response and root cause analysis processes.
- Drive capacity planning, performance tuning, and scalability improvements.
- Ensure compliance with security and governance standards.
Qualifications:
- Strong experience with Splunk (configuration, dashboard creation, alerting, log analysis).
- Proficiency in Linux/Unix systems administration.
- Hands-on experience with cloud platforms (AWS, Azure, or GCP).
- Strong scripting skills in Python, Shell, or similar languages.
- Familiarity with CI/CD pipelines and automation tools (Ansible, Terraform, Jenkins).
- Knowledge of monitoring and observability tools (Prometheus, Grafana, ELK).
- Excellent troubleshooting and problem-solving skills.
Skills:
- Experience with containerization and orchestration (Docker, Kubernetes).
- Exposure to incident management frameworks (ITIL, SRE best practices).
- Understanding of security monitoring and compliance.
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.