SME SRE Observability

Job in Fremont, Alameda County, California, 94538, USA
Listing for: Info Way Solutions LLC
Full Time position
Listed on 2026-06-02
Job specializations:
  • IT/Tech
    Systems Engineer, SRE/Site Reliability, Cloud Computing, IT Support
Job Description
Job Title: SME – SRE Observability Engineer
Location: Minnesota (Onsite – 4 to 5 days/week)

Job Summary:
We are seeking an experienced Subject Matter Expert (SME) in Site Reliability Engineering (SRE) with a strong focus on Observability. The ideal candidate will be responsible for designing, implementing, and optimizing observability frameworks to ensure high system reliability, performance, and scalability in a production environment.

Key Responsibilities:
  • Lead the design and implementation of observability solutions including metrics, logging, and tracing.
  • Act as an SME for SRE best practices, ensuring system reliability, availability, and performance.
  • Develop and maintain dashboards, alerts, and monitoring strategies.
  • Collaborate with development, DevOps, and infrastructure teams to improve system visibility.
  • Perform root cause analysis (RCA) and drive incident resolution.
  • Optimize system performance and reliability through proactive monitoring.
  • Implement automation to improve operational efficiency and reduce manual intervention.
  • Define and track SLIs, SLOs, and SLAs.

Required Skills & Qualifications:
  • Strong experience in Site Reliability Engineering (SRE) concepts and practices.
  • Deep expertise in Observability tools (e.g., Prometheus, Grafana, ELK Stack, Datadog, Splunk, or similar).
  • Experience with cloud platforms (AWS, Azure, or GCP).
  • Proficiency in scripting/programming (Python, Bash, or similar).
  • Hands-on experience with monitoring, alerting, and logging frameworks.
  • Strong troubleshooting and performance tuning skills.
  • Experience with CI/CD pipelines and automation tools.

Preferred Qualifications:
  • Experience working with high-availability distributed systems.
  • Knowledge of containerization and orchestration tools (Docker, Kubernetes).
  • Prior experience as an SRE SME or Lead.