Site Reliability Engineer (SRE) - Observability Specialist at Vodastra, Las Vegas, NV

Position: Site Reliability Engineer (SRE) - Observability Specialist at Vodastra Las Vegas, NV

Job Description

Location:

Las Vegas, NV 89101 (Onsite)

Position Type:

Contract

Job Summary

We are seeking a skilled and passionate Site Reliability Engineer (SRE) with a strong focus on Observability to join our onsite team. In this role, you will design, implement, and maintain observability solutions to ensure the reliability, scalability, and performance of our systems. As an Observability Specialist, you will collaborate with development, operations, and business teams to drive improvements in system monitoring, logging, tracing, and alerting.

Key Responsibilities

Observability Architecture & Implementation

Design and implement observability solutions, including monitoring, logging, and distributed tracing, to provide actionable insights into system behavior and health.
Evaluate and integrate observability tools and platforms (e.g., Prometheus, Grafana, Elasticsearch, Datadog, New Relic).

Monitoring & Alerting

Define and maintain key performance indicators (KPIs) and service level objectives (SLOs) to measure system reliability and performance.
Develop robust alerting systems that minimize noise and provide meaningful, actionable alerts for critical issues.

System Reliability Engineering

Proactively identify system reliability risks through observability metrics and collaborate with teams to implement mitigation strategies.
Participate in root cause analysis (RCA) and implement solutions to prevent the recurrence of incidents.

Collaboration & Advocacy

Work closely with development and DevOps teams to embed observability best practices into the software delivery lifecycle.
Act as a champion for observability, educating teams on its importance and guiding them in its adoption.

Automation & Optimization

Automate repetitive observability tasks, such as dashboard creation, log parsing, and alert tuning.
Optimize monitoring systems to reduce overhead and enhance efficiency.

Documentation & Reporting

Create and maintain documentation for observability processes, tools, and integrations.
Develop dashboards and reports to provide visibility into system health and reliability for stakeholders.

Qualifications

Education

Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).

Experience

Proven experience in Site Reliability Engineering, DevOps, or a similar role.
Extensive hands‑on experience with observability tools and platforms (e.g., Prometheus, Grafana, Splunk, Elastic Stack, OpenTelemetry).
Experience with cloud platforms (AWS, Azure, GCP) and with containers and container orchestration (Docker, Kubernetes).

Skills

Proficiency in programming and scripting languages (e.g., Python, Go, Bash).
Strong understanding of distributed systems, microservices architecture, and networking.
Expertise in designing monitoring systems with KPIs, SLOs, and SLIs.
Experience with incident response, postmortem analysis, and reliability reporting.

Preferred Qualifications

Certifications in cloud platforms or observability tools.
Familiarity with chaos engineering principles and practices.
Hands‑on experience with Infrastructure-as-Code (e.g., Terraform, Ansible).

Key Competencies

Analytical mindset with strong problem‑solving skills.
Effective communication and collaboration abilities.
Proactive and detail‑oriented with a passion for reliability and automation.
