
Monitoring and Observability Engineer

Job in Pittsburgh, Allegheny County, Pennsylvania, 15258, USA
Listing for: Veterans Sourcing Group
Full Time position
Listed on 2026-06-01
Job specializations:
  • IT/Tech
    Systems Engineer, Cybersecurity, IT Support, Cloud Computing: Infrastructure & Operations

Job Title: Monitoring and Observability Engineer
Duration: 12+ Months (Possible extension)
Location: Pittsburgh, PA 15258
Onsite Role (4 days a week)
Alternate Location: Lake Mary, FL 32746 or New York, NY 10286


Responsibilities:

  • Seeking a skilled Cloud Monitoring and Observability Engineer (Azure) to design, implement, and optimize end-to-end monitoring and observability solutions for a mission-critical application deployed in the Azure environment.
  • The ideal candidate has hands-on experience with enterprise monitoring tools, such as AppDynamics, ThousandEyes, NetScout, and SolarWinds (or equivalent alternatives), and a strong background in building scalable, secure, and compliant observability stacks for cloud deployments.
  • Will collaborate closely with application engineering, cloud platform, network, and security teams to ensure comprehensive coverage across application, infrastructure, and network layers.
  • Design and implement end-to-end monitoring, alerting, and observability for an Azure-hosted application across application, infrastructure, network, and user experience layers.
  • Configure, integrate, and maintain enterprise monitoring platforms to deliver actionable telemetry, performance baselines, and SLA/SLO tracking.
  • Build dashboards, health checks, synthetic tests, and alerting workflows; tune alerts to reduce noise and improve the signal-to-noise ratio.
  • Establish and document telemetry standards (metrics, logs, traces), data collection strategies, and service-level indicators (SLIs) aligned to reliability objectives (SLOs).
  • Integrate Azure-native services (Azure Monitor, Log Analytics, Application Insights) with enterprise tools to provide unified visibility and correlation.
  • Implement network performance monitoring, path visibility, and internet/extranet testing using NPM tools (e.g., ThousandEyes, NetScout); leverage infrastructure monitoring platforms (e.g., SolarWinds) for device and service health.
  • Instrument applications with APM tools (e.g., AppDynamics, Dynatrace, New Relic) for business transaction monitoring, dependency mapping, and root-cause analysis; tune anomaly detection and policy thresholds.
  • Collaborate with DevOps/SRE teams to embed monitoring into CI/CD and infrastructure-as-code patterns; ensure new services adhere to observability standards.
  • Define runbooks and escalation paths; support incident response and post-incident reviews with data-driven insights and remediation recommendations.
  • Ensure monitoring solutions meet applicable security and compliance requirements; support audit requests with clear documentation and evidence.
  • Conduct capacity and performance trend analysis; recommend optimization, right-sizing, and resilience improvements.
  • Provide knowledge transfer, documentation, and training on monitoring tools, best practices, and operational workflows.
Education/Experience:

  • 5+ years implementing enterprise monitoring/observability for cloud or hybrid environments, including mission-critical applications.
  • Demonstrable expertise with at least one tool in each category (or equivalent), including production deployments, advanced configuration, and operational use:
    • Application Performance Monitoring (APM): AppDynamics, Dynatrace, or New Relic.
      • Experience instrumenting services for business transaction tracing, code-level diagnostics, service maps, and anomaly detection.
      • Ability to design APM dashboards and create alert policies with appropriate thresholds and baselines.
    • Network Performance Monitoring (NPM) / Digital Experience Monitoring (DEM): ThousandEyes, NetScout, or Kentik.
      • Experience with synthetic tests, path visualization, packet-level analysis, and internet/WAN performance monitoring.
      • Ability to configure endpoint agents, BGP/DNS tests, and multi-hop path monitoring for user experience correlation.
    • Infrastructure Monitoring and Event Management: SolarWinds, Microsoft SCOM, Datadog, or Prometheus/Grafana.
      • Experience monitoring servers, containers, network devices, and cloud services; creating availability and capacity dashboards.
      • Proficiency with alert routing, de-duplication, and event correlation.
  • Strong Azure monitoring experience: Azure Monitor, Log Analytics (KQL), Application Insights, and integration with third-party tools.
  • Solid understanding of distributed tracing, metrics, and log aggregation; familiarity with OpenTelemetry concepts and data pipelines.
  • Scripting/automation skills (PowerShell, Python, or Bash) to automate monitoring configuration, agent deployment, test creation, and reporting.
  • Networking fundamentals (DNS, BGP, HTTP, TLS, TCP/IP), CDN concepts, and WAN performance monitoring; ability to correlate app and network telemetry.
  • Experience supporting incident response and performance troubleshooting across applications, infrastructure, and network layers.
  • Excellent documentation and communication skills; collaborative mindset with engineering, operations, and security stakeholders.
Preferred:
  • Background in regulated environments (financial services,…