Director, Site Reliability Engineering

Job in Ann Arbor, Washtenaw County, Michigan, 48113, USA
Listing for: Barracuda
Full Time position
Listed on 2026-02-16
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Project Manager
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 USD yearly
Job Description & How to Apply Below

Come Join Our Passionate Team!

At Barracuda, we make the world a safer place. We believe every business deserves access to cloud-enabled, enterprise-grade security solutions that are easy to buy, deploy, and use. We protect email, networks, data, and applications with innovative solutions that grow and adapt with our customers’ journey. More than 220,000 organizations worldwide trust Barracuda to protect them — in ways they may not even know they are at risk — so they can focus on taking their business to the next level.

We know a diverse workforce adds to our collective value and strength as an organization.

Barracuda Networks is proud to be an Equal Opportunity Employer, committed to equal employment opportunity and equitable compensation regardless of race, gender, religion, sex, sexual orientation, national origin, or disability.

Envision Yourself At Barracuda

We are seeking a strategic and visionary Director of Site Reliability Engineering (SRE), in the Cloud Operations group, to lead global reliability initiatives across Barracuda’s SaaS portfolio. You will oversee a distributed team of Site Reliability Engineers and partner closely with Product Engineering, Security & Compliance, and other Cloud Operations teams to ensure our platforms are highly available, scalable, secure, and cost-efficient.

This role will also drive AI-powered automation and agentic systems adoption to transform reliability operations.

What Will You Be Working On
  • Strategic Leadership:
    Define and execute Barracuda’s global SRE strategy, aligning reliability goals with business objectives and customer SLAs.
  • Operational Excellence:
    Drive continuous improvement in availability, latency, performance, and cost optimization across all cloud services.
  • AI & Agentic Systems Integration:
    Implement AI-driven observability and anomaly detection for proactive incident prevention; deploy agentic automation systems to manage routine operational tasks, optimize cloud resources, and accelerate remediation workflows; explore LLM-based runbooks and autonomous agents for incident triage and root cause analysis.
  • Cross-Functional Collaboration:
    Partner with Engineering, Security, and FinOps teams to embed reliability into product design and delivery pipelines.
  • Architecture & Governance:
    Influence architectural decisions for reliability, disaster recovery, and observability systems; ensure compliance with security and regulatory standards.
  • Automation & Tooling:
    Champion Infrastructure-as-Code and CI/CD automation at scale using Terraform, CloudFormation, GitHub Actions, and Jenkins.
  • Incident & Risk Management:
    Facilitate incident response protocols, conduct executive-level postmortems, and implement proactive risk mitigation strategies.
  • Service Level Management:
    Define and enforce SLIs and SLOs across global services; report reliability metrics to executive leadership.
  • Team Development:
    Build and mentor a high-performing SRE organization; foster a culture of ownership, innovation, and collaboration across regions.
  • Cloud Optimization:
    Lead initiatives for cost governance and performance tuning in AWS and Azure environments.
  • Executive Communication:
    Present reliability roadmaps, KPIs, and risk assessments to senior leadership and stakeholders.
What You Bring to the Role
  • Experience: 12+ years in infrastructure, cloud operations, or SRE roles, including 5+ years in leadership positions managing distributed teams.
  • Cloud Expertise: Deep knowledge of AWS and Azure architectures, security, and operations in large-scale SaaS environments.
  • AI & Automation: Experience implementing AI-driven observability, predictive analytics, and autonomous remediation systems.
  • Infrastructure as Code: Proven success implementing tools such as Terraform or CloudFormation at enterprise scale.
  • CI/CD & Automation: Advanced experience with GitHub Actions, Jenkins, and deployment strategies (blue/green, canary, rolling).
  • Container Orchestration: Expertise in Kubernetes (EKS, AKS) and containerized workloads.
  • Observability & Resilience: Strong background in Prometheus, Grafana, ELK, and APM tools; experience designing self-healing systems.
  • Programming: Proficiency in Python, Go, or similar…