SRE ARCHITECT
Job in Fremont, Alameda County, California, 94537, USA
Listed on 2026-06-01
Listing for: Info Way Solutions LLC
Full Time position
Job specializations:
- IT/Tech: Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Job Description & How to Apply Below
We are seeking a highly experienced Site Reliability Engineering (SRE) Architect to lead the design, implementation, and governance of highly reliable, scalable, and resilient distributed systems. This role requires a strategic thinker with deep technical expertise who can drive SRE best practices, define reliability standards, and ensure production stability across complex cloud and hybrid environments.
Key Responsibilities
Architectural Strategy
- Design and implement scalable, resilient, and high-performance infrastructure across cloud and hybrid environments
- Establish architectural standards for reliability and fault tolerance
- Define and enforce Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Collaborate with stakeholders to align reliability goals with business objectives
- Drive Infrastructure-as-Code (IaC) adoption using tools like Terraform and Ansible
- Lead automation initiatives to reduce manual operational effort ("toil")
- Enhance CI/CD pipelines and implement self-healing systems
- Design and implement observability frameworks including monitoring, logging, and distributed tracing
- Utilize tools such as Dynatrace, Grafana, and Splunk for proactive system monitoring
- Lead incident response, root cause analysis (RCA), and postmortems
- Implement chaos engineering practices to improve system resilience
- Mentor junior SREs and DevOps engineers
- Promote SRE culture, best practices, and operational excellence across teams
Qualifications
- Experience: 10-12+ years in SRE, DevOps, Software Engineering, or System Administration
- Programming/Scripting: Proficiency in Go, Python, Java, or Bash
- Cloud Platforms: Strong experience with AWS, GCP, or Azure
- Infrastructure as Code (IaC): Hands-on expertise with Terraform, Ansible
- Containerization: Deep understanding of Kubernetes and Docker
- Observability Tools: Experience with Dynatrace, Grafana, Splunk
- Strong troubleshooting, analytical, and problem-solving skills
- Experience in large-scale distributed systems
- Exposure to enterprise environments and high-availability systems
- Strong communication and stakeholder management skills