SRE ARCHITECT

Job in Fremont, Alameda County, California, 94537, USA
Listing for: Info Way Solutions LLC
Full Time position
Listed on 2026-06-01
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Job Description & How to Apply Below
Job Title: SRE Architect
Job Summary
We are seeking a highly experienced Site Reliability Engineering (SRE) Architect to lead the design, implementation, and governance of highly reliable, scalable, and resilient distributed systems. This role requires a strategic thinker with deep technical expertise who can drive SRE best practices, define reliability standards, and ensure production stability across complex cloud and hybrid environments.
Key Responsibilities
Architectural Strategy
  • Design and implement scalable, resilient, and high-performance infrastructure across cloud and hybrid environments
  • Establish architectural standards for reliability and fault tolerance
SRE Governance
  • Define and enforce Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
  • Collaborate with stakeholders to align reliability goals with business objectives
Automation & Toil Reduction
  • Drive Infrastructure-as-Code (IaC) adoption using tools like Terraform and Ansible
  • Lead automation initiatives to reduce manual operational effort ("toil")
  • Enhance CI/CD pipelines and implement self-healing systems
Observability & Monitoring
  • Design and implement observability frameworks including monitoring, logging, and distributed tracing
  • Utilize tools such as Dynatrace, Grafana, and Splunk for proactive system monitoring
Incident Management & Chaos Engineering
  • Lead incident response, root cause analysis (RCA), and postmortems
  • Implement chaos engineering practices to improve system resilience
Mentorship & Leadership
  • Mentor junior SREs and DevOps engineers
  • Promote SRE culture, best practices, and operational excellence across teams
Required Skills & Experience
  • Experience: 10-12+ years in SRE, DevOps, Software Engineering, or System Administration
  • Programming/Scripting: Proficiency in Go, Python, Java, or Bash
  • Cloud Platforms: Strong experience with AWS, GCP, or Azure
  • Infrastructure as Code (IaC): Hands-on expertise with Terraform and Ansible
  • Containerization: Deep understanding of Kubernetes and Docker
  • Observability Tools: Experience with Dynatrace, Grafana, and Splunk
  • Strong troubleshooting, analytical, and problem-solving skills
Preferred Qualifications
  • Experience in large-scale distributed systems
  • Exposure to enterprise environments and high-availability systems
  • Strong communication and stakeholder management skills