Observability and Monitoring Engineer
Listed on 2025-12-27
IT/Tech
Systems Engineer, Cloud Computing, IT Support, Cybersecurity
e&e is seeking an Observability and Monitoring Engineer for a hybrid contract opportunity in West Des Moines, IA!
The Observability and Monitoring Engineer is responsible for designing, building, and maturing enterprise-wide monitoring, logging, alerting, and observability capabilities across a cloud-based technology environment. This role defines the overall observability strategy, architecture, and implementation standards that enable proactive issue detection, faster troubleshooting, and data‑driven operational insights across applications, infrastructure, operating systems, databases, file transfers, and batch processes. The ideal candidate brings strong hands‑on engineering experience, architectural leadership, and the ability to integrate and rationalize multiple monitoring tools into a cohesive observability framework.
Responsibilities
- Define and implement standards for logs, metrics, traces, event correlation, and alerting across multiple environments.
- Design and build centralized dashboards and alerting policies providing unified visibility across:
  - Applications and services
  - Operating systems
  - Relational databases
  - File transfer platforms and managed transfer tools
  - Batch jobs and scheduled processes
- Develop actionable, noise‑free alerting thresholds, escalation policies, and operational runbooks.
- Integrate and manage multiple monitoring and logging platforms into a cohesive observability ecosystem.
- Assess existing tools and recommend consolidation, optimization, or modernization where appropriate.
- Manage the lifecycle, configuration, tuning, and health of observability platforms.
- Automate monitoring deployments using Infrastructure as Code and CI/CD pipelines; create reusable templates and standards to enable rapid onboarding of new applications.
- Build self‑service dashboards and reporting for both technical and business stakeholders.
- Define and maintain SLOs, SLIs, and reliability KPIs for critical services.
- Partner with application, infrastructure, and security teams to reduce MTTR and improve system reliability.
- Participate in incident response, root cause analysis, and problem management activities.
- Provide technical leadership and mentoring, advising teams on observability architecture and best practices.
- Develop and maintain system documentation and contribute to technical planning and strategy sessions.
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- 5+ years of experience implementing monitoring and observability solutions, including extensive hands‑on experience with Dynatrace.
- Experience working with monitoring and logging platforms such as Zabbix, Graylog, Splunk, SolarWinds, or comparable tools.
- 5+ years of hands‑on experience with cloud platforms and services, with strong emphasis on AWS architectures.
- Deep understanding of observability concepts including metrics, logs, traces, distributed tracing, and event correlation.
- Proven experience building dashboards and KPIs across application, infrastructure, and database layers.
- Strong scripting and automation skills (Python, Bash, PowerShell).
- Experience with Infrastructure as Code tools such as Terraform and/or CloudFormation.
- Solid understanding of systems architecture, network monitoring, and performance tuning.
- Familiarity with ITIL incident and problem management processes.
- Experience using AI‑enabled tools to enhance observability, alerting, and operational insights.
- Experience with containerized and microservices‑based architectures.
- Hands‑on experience with OpenTelemetry, Prometheus, Grafana, or similar observability frameworks.
- Cloud Services: Compute, storage, databases, serverless, and container services
- Monitoring & Observability Tools: Dynatrace, CloudWatch, Zabbix, SolarWinds, Graylog, Splunk
- Configuration Management: Ansible, Puppet, Chef
- CI/CD Tools: Jenkins, QuickBuild, Bitbucket
- Scripting Languages: Python, PowerShell, Bash
- Infrastructure as Code: Terraform, CloudFormation
Mid‑Senior level
Employment type: Contract
Job function: Information Technology
Industries: IT Services and IT Consulting