Sr. Site Reliability Engineer

Job in Phoenix, Maricopa County, Arizona, 85003, USA
Listing for: ExecutivePlacements.com
Full Time position
Listed on 2026-01-02
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer
Job Description & How to Apply Below

Senior Site Reliability Engineer (SRE)

Location: Phoenix, AZ

Join the Cloud Operations and Observability team as an SRE. You will drive resiliency, performance, automation, and AI‑driven observability across hybrid Azure and GCP environments. Your focus will be on designing, implementing, and managing Kubernetes infrastructure and integrating AI/LLM solutions into observability and operational workflows.

Key Responsibilities
  • Build and operate scalable, secure, and highly available infrastructure in Azure and GCP.
  • Design and maintain observability platforms leveraging Splunk, OpenTelemetry, and cloud‑native monitoring tools.
  • Develop and support AI/LLM‑driven automation solutions to improve incident triage, alert correlation, and root‑cause analysis.
  • Partner with application and data teams to define SLOs, SLIs, and error budgets.
  • Drive operational excellence through automation, chaos testing, and proactive reliability improvements.
  • Optimize Kubernetes environments (GKE/AKS) for performance, security, and cost‑efficiency.
  • Integrate observability data pipelines with LLMs for anomaly detection, summarization, and proactive remediation.
  • Participate in on‑call rotations, incident response, and post‑mortems.
  • Implement runbooks, auto‑remediation scripts, and AI copilots for operations.
Required Qualifications
  • 8+ years of experience as an SRE.
  • Strong expertise in Azure and GCP cloud platforms (certifications a plus).
  • Proficiency in Splunk (Enterprise and Observability) for monitoring, alerting, and log analytics.
  • In‑depth knowledge of Kubernetes (AKS, GKE), Helm, and container lifecycle.
  • Familiarity with AI/ML and LLM‑based tools (e.g., OpenAI, Hugging Face, Azure OpenAI) for observability or automation use cases.
  • Experience with CI/CD pipelines, GitOps, and secure deployment practices.
  • Programming/scripting skills in Python, Go, or Bash.
  • Strong understanding of SRE principles: SLAs, SLIs, SLOs, error budgets, and incident management.
Preferred Qualifications
  • Experience building AI‑enabled runbooks or copilots.
  • Exposure to FinOps or cost‑optimization strategies in cloud environments.
  • Knowledge of distributed tracing and event correlation using OpenTelemetry.
  • Familiarity with Kafka, Pub/Sub, or other messaging systems for observability data.
Seniority Level

Mid‑Senior level

Employment Type

Full‑time

Job Function

Engineering and Information Technology
