
Site Reliability Engineer, Datadog Specialist

Job in Denver, Denver County, Colorado, 80285, USA
Listing for: S&P Global
Part Time position
Listed on 2026-02-28
Job specializations:
  • IT/Tech
    IT Support, Systems Engineer, Cybersecurity, Cloud Computing
Salary/Wage Range or Industry Benchmark: 90,000 - 122,000 USD yearly
Job Description & How to Apply Below

About the Role

Grade Level (for internal use): 09

Site Reliability Engineer – Datadog Specialist

The Team: The IT Operations team at S&P Dow Jones Indices owns and operates the Production systems that power S&P DJI’s global index platforms. Our focus is reliability, visibility, and operational excellence, ensuring critical market-facing services remain available, observable, and resilient.

Responsibilities and Impact:

This role sits at the intersection of Site Reliability Engineering and Observability, focused on the hands-on implementation and operation of enterprise telemetry platforms. The position supports application, infrastructure, and production support teams by ensuring systems are well-instrumented, observable, and diagnosable in Production environments.

We are seeking a hands-on Observability Engineer with strong experience using Datadog and modern telemetry tools. This is not a general DevOps or platform engineering role; it is a tool-focused position responsible for implementing, operating, and continuously improving observability across applications, databases, and infrastructure within an established SRE framework.

Own and evolve end-to-end observability using Datadog:

  • APM, Distributed Tracing, DBM
  • Log ingestion, parsing, pipelines, and correlation
  • Synthetic monitoring, RUM (where applicable)
  • AI-driven alerting, Watchdog, and anomaly detection

Design and enforce monitoring standards:

  • Alert quality, signal-to-noise reduction
  • Golden signals, SLO/SLA-aligned monitoring
  • Consistent tagging, naming, and telemetry hygiene

Serve as the primary Datadog platform specialist:

  • Dashboards, monitors, service catalog, integrations
  • Cost visibility and optimization of logs/APM/DBM usage
  • Enablement and onboarding of application teams

Support production incident response:

  • Use Datadog, Splunk, and logs to triage incidents
  • Lead or support root-cause analysis and post-incident reviews
  • Improve observability gaps identified during incidents
  • Integrate telemetry with other ITSM tools such as ServiceNow and PagerDuty to support incident and change workflows

Partner with engineering teams to:

  • Improve instrumentation (APM, custom metrics, logs)
  • Adopt OpenTelemetry where appropriate
  • Validate observability during releases and changes
  • Participate in DR testing, operational readiness reviews, and continuous improvement of SRE/IT Ops practices

Compensation/Benefits Information: (This section is only applicable to US candidates)

S&P Global states that the anticipated base salary range for this position is $90,000 to $122,000. Final base salary for this role will be based on the individual’s geographic location, as well as experience level, skill set, training, licenses and certifications.

In addition to base compensation, this role is eligible for an annual incentive plan. This role is not eligible for additional compensation such as an annual incentive bonus or sales commission plan.

This role is eligible to receive additional S&P Global benefits. For more information on the benefits we provide to our employees, please .

What We’re Looking For:

Basic Required Qualifications:

  • 4+ years of experience in Observability, SRE, or Production Operations roles
  • Strong, hands-on Datadog experience: APM, logs, DBM, dashboards, monitors, integrations
  • Experience working with telemetry concepts: metrics, logs, traces, log correlation, distributed tracing
  • Working knowledge of AWS environments (EC2, ECS, RDS, S3, DynamoDB, etc.)
  • Ability to read and reason about application code (Java and/or Python) to support instrumentation, troubleshooting, and telemetry design (this is not a feature-development role)
  • Experience integrating monitoring tools with PagerDuty and ServiceNow
  • Strong troubleshooting, documentation, and communication skills

Additional Preferred Qualifications:

  • Datadog certifications (APM, Logs, Fundamentals)
  • Exposure to Splunk, ELK, Dynatrace, or similar tools
  • Experience with OpenTelemetry (instrumentation or collectors)
  • Familiarity with CI/CD pipelines and containerized workloads
  • Experience supporting mission-critical, high-availability systems
  • Financial services, index, or data-platform experience

Location: This role can be hybrid 2-3 days a week at most of…
