Site Reliability Engineering (SRE) Architect Job, Atlanta area, Georgia, USA, IT/Tech

Position: Site Reliability Engineering (SRE) Architect

Base pay range

$70.00/hr - $75.00/hr

This range is provided by STAFFWORXS. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.

Job Title: Site Reliability Engineering (SRE) Architect

Location: Atlanta, Georgia

Work Model: Hybrid (in-person presence required)

Overview

We are seeking a highly experienced Site Reliability Engineering (SRE) Architect to lead the strategic design, development, and maturity of our reliability engineering practices. This role goes beyond operational support, focusing on defining the architectural blueprint, standards, and frameworks that guide development and SRE operations teams in building resilient, scalable, and high-performing systems. The SRE Architect will influence technology decisions, enhance system observability, and foster a culture of reliability across the organization.

Key Responsibilities

Reliability Strategy & Architecture
- Architect scalable, highly available, secure, and cost-effective solutions on AWS.
- Define and promote SRE standards, best practices, and architectural blueprints across engineering teams.
- Evaluate and enhance current observability systems, identifying gaps and driving them to greater maturity for deeper insight into system behavior.
- Lead the definition and implementation of SLIs, SLOs, and error budgets for critical services.
- Design solutions to eliminate operational toil through automation and improved system architecture.
- Assess existing SRE tools, CI/CD pipelines, IaC modules, and automated remediation frameworks, and propose improvements.
- Evaluate and recommend new tools, technologies, and practices to strengthen reliability, productivity, and operational excellence.

Technical Leadership & Consultation
- Serve as a senior advisor on reliability, scalability, and performance across development and platform teams.
- Offer architectural guidance for new services to ensure reliability principles are integrated from the start.
- Mentor SREs and engineers, promoting strong engineering discipline and adherence to SRE principles.
- Lead architecture reviews and production readiness assessments for critical systems.

Resilience Engineering
- Lead blameless postmortems for major incidents and drive systemic architectural improvements.
- Advocate for and architect resilience patterns and practices, including circuit breakers, rate limiting, graceful degradation, and chaos engineering.

Required Qualifications

- Proven experience in architectural roles focused on reliability, scalability, and performance.
- Deep hands-on expertise with SRE principles (SLIs/SLOs, error budgets, automation, incident management).
- Strong AWS experience across infrastructure, networking, and security.
- Expertise with containerization, orchestration, and serverless platforms (Docker, Kubernetes, serverless).
- Experience building observability solutions (Dynatrace, Prometheus, Grafana, ELK/EFK, Jaeger, OpenTelemetry).
- Strong programming and scripting skills (Python, Go, Bash).
- Excellent analytical and strategic problem-solving skills.
- Strong communication, collaboration, and leadership abilities.

Preferred Qualifications

- Experience implementing and maturing chaos engineering practices and platforms.

Seniority level: Mid-Senior level

Employment type: Contract

Job function: Other
