Senior SRE Job, Chattanooga area, Tennessee, USA, IT/Tech

CBTS serves enterprise and midmarket clients in all industries across the United States and Canada. CBTS combines deep technical expertise with a full suite of flexible technology solutions--including Application Modernization, Managed Hybrid Cloud, Cybersecurity, Unified Communications, and Infrastructure solutions. From developing and deploying modern applications and the secure, scalable platforms on which they run, to managing, monitoring, and optimizing their operations, CBTS delivers comprehensive technology solutions for its clients' transformative business initiatives.

For more information, please visit .

OnX is a leading technology solution provider that serves businesses, healthcare organizations, and government agencies across Canada. OnX combines deep technical expertise with a full suite of flexible technology solutions—including Generative AI, Application Modernization, Managed Hybrid Cloud, Cybersecurity, Unified Communications, and Infrastructure solutions. From developing and deploying modern applications and the secure, scalable platforms on which they run, to managing, monitoring, and optimizing their operations, OnX delivers comprehensive technology solutions for its clients’ transformative business initiatives.

For more information, please visit .

Job Title: Senior Site Reliability Engineer (SRE) – Splunk Specialist

Location: Remote
Experience: 6+ years
Employment Type: Full-time

Role Overview:

We are seeking a Senior Site Reliability Engineer (SRE) with strong experience in Splunk to ensure the reliability, scalability, and performance of our systems. The ideal candidate will design and implement monitoring solutions, automate operational tasks, and collaborate with development teams to improve system resilience and observability.

Key Responsibilities:

Design, implement, and maintain Splunk dashboards, alerts, and reports for system monitoring and incident management.
Develop and optimize observability solutions for infrastructure and applications.
Automate operational processes using scripting and configuration management tools.
Collaborate with development and operations teams to improve system reliability and performance.
Troubleshoot and resolve production issues, ensuring minimal downtime.
Implement incident response and root cause analysis processes.
Drive capacity planning, performance tuning, and scalability improvements.
Ensure compliance with security and governance standards.

Required Skills & Qualifications:

Strong experience with Splunk (configuration, dashboard creation, alerting, log analysis).
Proficiency in Linux/Unix systems administration.
Hands-on experience with cloud platforms (AWS, Azure, or GCP).
Strong scripting skills in Python, Shell, or similar languages.
Familiarity with CI/CD pipelines and automation tools (Ansible, Terraform, Jenkins).
Knowledge of monitoring and observability tools (Prometheus, Grafana, ELK).
Excellent troubleshooting and problem-solving skills.

Preferred Skills:

Experience with containerization and orchestration (Docker, Kubernetes).
Exposure to incident management frameworks (ITIL, SRE best practices).
Understanding of security monitoring and compliance.

Education:

Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.


