Senior Monitoring Engineer Computer/Software Springfield, VA
Listed on 2025-12-23
IT/Tech
Systems Engineer, Cloud Computing
Senior Monitoring Engineer
Elluminates Software provides innovation for Federal customers, including AI-driven SaaS, Cloud and On-Prem transformation, and advanced Infrastructure Automation.
For over twenty years, we have worked closely with the industry to innovate and collaborate on solutions with national and global technology platform impact.
Job Description
The Senior Monitoring Engineer is a senior-level technical expert who is accountable for the advanced troubleshooting, performance analysis, and optimization of enterprise monitoring platforms. This position is responsible for the design, implementation, and ongoing enhancement of observability solutions in hybrid environments, including on-premises, cloud, and virtual infrastructure. The Senior Monitoring Engineer serves as the final escalation point for complex monitoring issues, collaborates with other teams to ensure system reliability, and promotes best practices in observability.
Duties and Responsibilities
- Serve as the Tier 3 escalation point for issues related to any of the monitoring/observability platforms and tools.
- Lead root cause analysis (RCA) for major incidents and recurring performance issues.
- Maintain, configure, and optimize monitoring tool deployments across cloud (e.g., AWS, Azure), on-premises, and VMware environments.
- Design and implement custom dashboards, synthetic monitoring, and service-level objectives (SLOs).
- Develop and maintain alerting strategies that reduce noise and ensure actionable notifications.
- Work closely with application, infrastructure, DevOps, and security teams to define monitoring requirements and integrate observability into CI/CD pipelines.
- Analyze metrics, logs, and traces to ensure end-to-end service visibility and performance optimization.
- Assist in onboarding applications and teams into the observability platform.
- Provide training and mentorship to Tier 1 and Tier 2 support teams.
- Ensure platform resilience, availability, and compliance with internal standards and SLAs.
- Participate in on-call rotations for high-priority incidents as needed.
Qualifications
- 5+ years of experience in IT infrastructure, application performance monitoring, or site reliability engineering (SRE).
- 2+ years of hands-on experience using platforms such as Dynatrace, Zabbix, and monitoring tools in VMware Cloud Foundation (VCF).
- Solid understanding of observability concepts including metrics, logs, traces, and user experience monitoring.
- Experience supporting complex, distributed systems in cloud and hybrid environments.
- Proficient with scripting and automation (e.g., PowerShell, Python, Bash, or Ansible).
- Strong understanding of networking, Linux/Windows systems, containers, and application architectures (microservices, APIs, etc.).
- Dynatrace Associate or Professional Certification.
- Experience with Dynatrace, including OneAgent deployment, Smartscape, PurePath, and Davis AI.
- Experience integrating Dynatrace with tools such as ServiceNow, Splunk, Jira, or CI/CD pipelines.
- Experience with other observability tools (e.g., Prometheus, Grafana, New Relic, AppDynamics, Splunk, Elastic).
- Familiarity with DevOps practices and Infrastructure-as-Code (e.g., Terraform).
- Understanding of ITIL framework and change management processes.
- Excellent troubleshooting and problem-solving skills.
- Strong written and verbal communication.
- Ability to work independently and collaboratively across teams.
- Customer-focused mindset and attention to detail.
- Continuous learning and adaptability in a fast-paced environment.
- Bachelor's degree and nine (9) or more years of experience; Master's degree and seven (7) or more years of experience; or PhD or JD and four (4) or more years of experience. Additional experience may be accepted in lieu of a degree.
- Type: Full-Time
- Clearance: Secret (with ability to obtain TS)
- Location: On Site in Springfield, Virginia
- Shift: Normal Business Hours
- Type of Travel: Local