IT Monitoring & Observability Engineer
Listed on 2026-06-06
IT/Tech
Systems Engineer, IT Support, Cybersecurity, Cloud Computing
Job Title
IT Monitoring & Observability Engineer
Location
Jackson, MI 49201
Mode
Contract Onsite
Position Summary
We are seeking an IT Monitoring / Full-Stack Observability Engineer to design, implement, and operate enterprise monitoring solutions across on-premises infrastructure and Microsoft Azure cloud environments. This role is responsible for delivering end-to-end observability covering infrastructure, applications, services, logs, metrics, traces, and alerting, with a primary focus on SolarWinds (self-hosted) for on-prem monitoring and Azure Monitor for cloud monitoring.
The ideal candidate has strong technical depth in monitoring architecture, alert strategy, dashboarding, and operational response, and can partner with infrastructure, application, and security teams to ensure high availability and performance.
- Monitoring / Observability Platform Ownership
- Own and enhance enterprise monitoring using:
- SolarWinds (Self-Hosted) for on-prem infrastructure monitoring (network, server, storage, virtualization).
- Azure Monitor for Azure cloud monitoring (metrics, logs, alerts, workbooks).
- Define and maintain standards for instrumentation, telemetry collection, alerting, and dashboards across hybrid environments.
- On-Prem Monitoring (SolarWinds)
- Administer and optimize SolarWinds platform components (e.g., Orion modules where applicable).
- Configure monitoring for:
- Network devices (routers, switches, firewalls) via SNMP/WMI/ICMP
- Windows/Linux servers
- VMware/Hyper‑V and storage platforms (as applicable)
- Build actionable alerts with escalation policies, suppressions, dependencies, and maintenance windows.
- Troubleshoot polling, credentialing, discovery, and performance issues for SolarWinds services and the SQL back‑end (as needed).
- Azure Monitoring (Azure Monitor / Log Analytics / Application Insights)
- Implement Azure‑native monitoring strategy using:
- Azure Monitor Metrics + Alerts
- Log Analytics Workspaces
- Application Insights (where applicable)
- Workbooks for visualization and reporting.
- Create and maintain KQL queries for logs/insights and operational analytics.
- Establish alert rules for Azure resources (VMs, AKS, App Services, Functions, SQL, Storage, networking, etc.).
- Full‑Stack Observability Practices
- Drive adoption of observability best practices:
- Golden signals (latency, traffic, errors, saturation)
- SLOs/SLIs (where applicable)
- Noise reduction and alert fatigue prevention
- Ensure dashboards tell an operational story (health, performance, capacity, and trends).
- Support incident response by correlating signals across on‑prem + cloud.
- Automation, Integration & ITSM
- Automate monitoring configuration and reporting through scripting (PowerShell, Python) and IaC (Terraform/Bicep as appropriate).
- Integrate monitoring alerts with ITSM tools (e.g., ServiceNow/Jira/Remedy) and collaboration channels (Teams/Email).
- Support continuous improvement through post‑incident reviews and monitoring enhancements.
- Documentation & Governance
- Maintain runbooks, SOPs, monitoring standards, and service maps.
- Ensure monitoring adheres to security and compliance requirements (access controls, logging retention, least privilege).
- 8+ years of experience in IT monitoring / observability / infrastructure operations in enterprise environments.
- Hands‑on experience with SolarWinds (Self‑Hosted) administration and monitoring configuration.
- Hands‑on experience with Azure Monitor including Log Analytics, alerts, and workbooks.
- Strong working knowledge of Windows Server and Linux fundamentals.
- Networking concepts (TCP/IP, DNS, routing, SNMP, firewalls).
- Monitoring protocols and methods (SNMP, WMI, agents, APIs, syslog).
- Experience building dashboards, defining alert thresholds, tuning signals, and reducing noise.
- Proficiency with KQL (Kusto Query Language) for Log Analytics queries.
- Strong troubleshooting and root‑cause analysis skills across hybrid systems.
- Ability to work in on‑call/after‑hours rotations (as applicable).
- SolarWinds module experience (as applicable): NPM, SAM, NCM, VMAN, NetFlow, etc.
- Azure services monitoring experience: AKS, App Service, Functions, SQL MI/DB, Key Vault, Storage, Front…