SRE - Observability

Job in Denver, Denver County, Colorado, 80285, USA
Listing for: Focused Labs
Full Time position
Listed on 2025-12-23
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, Cybersecurity, IT Support
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 USD yearly
Job Description & How to Apply Below
Position: Staff SRE - Observability

At Focused, we move quickly to deliver quality software that achieves client outcomes and meets their customers' needs. We strategically partner with our clients to leverage our expertise in design and software, while our clients bring their own domain expertise. We work with a variety of clients from different industries, collaborating as we bring new products to market, modernize legacy systems, or help teams learn the skills they need to be successful.

Our values:

  • Listen first
    • We are experts in product practices but lifelong learners in the domain of our customers. We research, collaborate, and understand.
  • Learn why
    • We ask questions and talk to users to understand problem spaces, objectives, and goals, which allows us to deeply invest and drive towards the outcomes of our clients.
  • Love your craft
    • We love diving into a variety of domains and solving problems. We take pride in delivering value, in communicating progress, and in guiding our clients to success.

We are seeking an experienced Staff Observability Consultant with deep expertise in OpenTelemetry and strong Platform Engineering capabilities to help organizations implement, optimize, and scale their observability infrastructure. This role requires a seasoned consultant who can design comprehensive telemetry strategies, implement distributed tracing solutions, establish robust monitoring practices, and work closely with clients on their observability journey.

Key Responsibilities:

OpenTelemetry & Observability

  • Design and implement end-to-end OpenTelemetry solutions across diverse technology stacks
  • Configure and deploy OpenTelemetry Collectors for efficient data collection, processing, sampling, and routing
  • Establish telemetry pipelines for metrics, traces, and logs across microservices architectures
  • Optimize collector configurations for performance, reliability, and cost-effectiveness
  • Augment existing infrastructure with integrated observability solutions
  • Implement Infrastructure as Code (IaC) solutions using Terraform, Pulumi, CloudFormation, etc.
  • Architect and manage Kubernetes clusters with comprehensive monitoring and logging
  • Build CI/CD pipelines with embedded observability and automated testing

Site Reliability Engineering (SRE)

  • Establish and maintain Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs)
  • Implement error budgets, toil reduction strategies, and capacity planning
  • Support incident response procedures and post-mortem processes
  • Deploy and manage observability infrastructure across AWS, GCP, and Azure
  • Establish security, compliance, and governance frameworks for telemetry data
  • Automate agent evaluations in CI/CD pipelines and observability backends

Required Qualifications:

Core Observability & OpenTelemetry

  • 3-7 years of experience in observability, monitoring, and distributed systems
  • Deep hands-on experience with the OpenTelemetry ecosystem, including SDKs, APIs, and specifications
  • Proficiency with OpenTelemetry Collector configuration, processors, exporters, and receivers
  • Strong understanding of telemetry data models, semantic conventions, and instrumentation best practices
  • 5+ years of Platform Engineering or DevOps experience with a focus on site reliability, observability, and incident response
  • Proficiency with Infrastructure as Code tools (Terraform, Pulumi, CloudFormation, CDK)
  • Strong experience with CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
  • Hands-on experience with major cloud providers (AWS, GCP, Azure) and their observability services
  • Experience with container technologies (Docker, Podman) and container registries
  • Knowledge of networking, security, load balancing, and distributed systems concepts

Site Reliability Engineering

  • Experience implementing SRE practices including error budgets and toil metrics
  • Proficiency in incident management, on-call procedures, and post-mortem culture
  • Experience with capacity planning, performance optimization, and scalability design

Programming & Automation

  • Proficiency in multiple programming languages preferred (Go, Python, Java, Node.js, Rust)
  • Strong scripting and automation skills (Bash, Python, PowerShell)
  • Understanding of software engineering…