Site Reliability Engineering; SRE Architect
Listed on 2025-12-25
IT/Tech
Systems Engineer, Cloud Computing, IT Support, SRE/Site Reliability
This range is provided by STAFFWORXS. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.
Base pay range: $70.00/hr - $75.00/hr
Job Title: Site Reliability Engineering (SRE) Architect
Location: Atlanta, Georgia
Work Model: Hybrid (In-person presence required)
Overview
We are seeking a highly experienced Site Reliability Engineering (SRE) Architect to lead the strategic design, development, and maturity of our reliability engineering practices. This role goes beyond operational support, focusing on defining the architectural blueprint, standards, and frameworks that guide development and SRE operations teams in building resilient, scalable, and high-performing systems. The SRE Architect will influence technology decisions, enhance system observability, and foster a culture of reliability across the organization.
Key Responsibilities
Reliability Strategy & Architecture
- Architect scalable, highly available, secure, and cost-effective solutions on AWS.
- Define and promote SRE standards, best practices, and architectural blueprints across engineering teams.
- Evaluate and enhance current observability systems, identifying gaps and driving next-level maturity to improve system insights.
- Lead the definition and implementation of SLIs, SLOs, and error budgets for critical services.
- Design solutions to eliminate operational toil through automation and improved system architecture.
- Assess existing SRE tools, CI/CD pipelines, IaC modules, and automated remediation frameworks, proposing improvements.
- Evaluate and recommend new tools, technologies, and practices to strengthen reliability, productivity, and operational excellence.
Technical Leadership & Consultation
- Serve as a senior advisor on reliability, scalability, and performance across development and platform teams.
- Offer architectural guidance for new services to ensure reliability principles are integrated from the start.
- Mentor SREs and engineers, promoting strong engineering discipline and adherence to SRE principles.
- Lead architecture reviews and production readiness assessments for critical systems.
Resilience Engineering
- Lead blameless postmortems for major incidents and drive systemic architectural improvements.
- Advocate and architect resilience patterns including circuit breakers, rate limiting, graceful degradation, and chaos engineering.
Qualifications
- Proven experience in architectural roles focused on reliability, scalability, and performance.
- Deep hands-on expertise with SRE principles (SLIs/SLOs, error budgets, automation, incident management).
- Strong AWS experience across infrastructure, networking, and security.
- Expertise with containerization and orchestration (Kubernetes, Docker, serverless).
- Experience building observability solutions (Dynatrace, Prometheus, Grafana, ELK/EFK, Jaeger, OpenTelemetry).
- Strong programming/scripting abilities (Python, Go, Bash).
- Excellent analytical and strategic problem-solving skills.
- Strong communication, collaboration, and leadership abilities.
- Experience implementing and maturing chaos engineering practices and platforms.
Seniority level: Mid-Senior level
Employment type: Contract
Job function: Other