Site Reliability Engineer, Plano area, Texas, USA (IT/Tech)

About the Role

Job Title: Site Reliability Engineer

Key Responsibilities

Implement and manage full‑stack observability using Datadog across infrastructure, applications, and services.
Deploy and configure monitoring agents in on‑premises, cloud, and hybrid environments.
Design and deploy monitoring solutions including dashboards, alerts, monitors, SLA/SLO definitions, and anomaly detection.
Integrate Datadog with third‑party systems such as ServiceNow, SSO providers, and ITSM tools.
Instrument applications and services using OpenTelemetry to collect logs, metrics, and traces.
Build and maintain observability platforms providing deep system visibility.
Develop dashboards and alerts using Prometheus, Grafana, Splunk, and ELK Stack.
Automate monitoring configurations using Terraform, Ansible, and scripting.
Integrate observability into CI/CD pipelines (e.g., Jenkins).
Collaborate with Dev, SRE, and DevOps teams to align monitoring with business and operational goals.
Support incident response, root cause analysis, and reliability improvements.
Implement security and vulnerability management within observability platforms.

Must‑Have Skills & Qualifications

Strong hands‑on experience with Datadog (Logs, Metrics, APM, Distributed Tracing).
Hands‑on experience in cloud‑based observability solutions across AWS, Azure, and GCP.
Strong understanding of observability concepts (Logs, Metrics, Tracing).
Experience instrumenting systems using OpenTelemetry.
Proficiency in Python and/or Go for scripting and automation.
Hands‑on experience with Terraform and Ansible (IaC).
Experience with Kubernetes and containerized environments.
Knowledge of CI/CD pipelines and automation tools (e.g., Jenkins).
Solid background in system operations and software engineering.
Experience with security and vulnerability management in observability platforms.

Nice‑to‑Have Skills

Experience with additional observability tools such as Prometheus, Grafana, ELK Stack, Splunk, New Relic, and AWS CloudWatch.
Experience optimizing cloud agent instrumentation for performance and cost.
Exposure to large‑scale, distributed, or high‑availability systems.

Salary Range

The salary for this position is between $120,000 and $130,000 annually. Factors that may affect pay within this range include geography/market, skills, education, experience, and other qualifications of the successful candidate.

Benefits

Medical insurance, dental insurance, vision insurance, 401(k) retirement plan, long‑term disability insurance, short‑term disability insurance, 5 personal days accrued each calendar year, 10 to 15 days of paid vacation time, 6 paid holidays and 1 floating holiday per calendar year, and access to the Ascendion Learning Management System.

Seniority level

Mid‑Senior level

Employment type

Full‑time

Job function

Information Technology

Industries

Technology, Information and Internet

Want to change the world? Let us know.

Tell us about your experiences, education, and ambitions. Bring your knowledge, unique viewpoint, and creativity to the table. Let’s talk!
