Lead Systems Engineer
Listed on 2026-06-23
IT/Tech
Cybersecurity, Systems Engineer, IT Support, Cloud Computing: Infrastructure & Operations
Are you ready to make an impact at DTCC?
Do you want to work on innovative projects, collaborate with a dynamic and supportive team, and receive investment in your professional development? At DTCC, we are at the forefront of innovation in the financial markets. We are committed to helping our employees grow and succeed. We believe that you have the skills and drive to make a real impact. We foster a thriving internal community and are committed to creating a workplace that looks like the world that we serve.
The Information Technology group delivers secure, reliable technology solutions that enable DTCC to be the trusted infrastructure of the global capital markets. The team delivers high-quality information through activities that include essential development, building infrastructure capabilities to meet client needs, and implementing data standards and governance.
Pay and Benefits:
- Competitive compensation, including base pay and annual incentive
- Comprehensive health and life insurance and well-being benefits, based on location
- Pension / Retirement benefits
- Paid Time Off and Personal/Family Care, and other leaves of absence when needed to support your physical, financial, and emotional well-being.
- DTCC offers a flexible/hybrid model of 3 days onsite and 2 days remote (onsite Tuesdays, Wednesdays and a third day unique to each team or employee).
The Impact you will have in this role:
At DTCC, the Observability team is at the forefront of ensuring the health, performance, and reliability of our critical systems and applications. We empower the organization with real-time access to infrastructure and business applications by using innovative monitoring, reporting, and visualization tools.
Our team collects and analyzes metrics, logs, and traces using platforms like Splunk and other telemetry solutions. This data is crucial for assessing application health and availability, and for enabling rapid root cause analysis when issues arise, which helps us maintain resilience in a fast-paced, high-volume trading environment.
If you're passionate about observability, data-driven problem solving, and building systems that make a real-world impact, we’d love to have you on our team.
Primary Responsibilities:
As a member of DTCC’s Observability team, you will play a pivotal role in enhancing our monitoring and telemetry capabilities across critical infrastructure and business applications. Your responsibilities will include:
- Lead the migration from OpenText monitoring tools to Grafana and other open-source platforms.
- Design and deploy monitoring rules for infrastructure and business applications.
- Develop and manage alerting rules and notification workflows.
- Build real-time dashboards to visualize system health and performance.
- Configure and manage OpenTelemetry Collectors and pipelines.
- Integrate observability tools with CI/CD, incident management, and cloud platforms.
- Deploy and manage observability agents across diverse environments.
- Perform upgrades and maintenance of observability platforms.
Qualifications:
- Minimum of 6-8 years of related experience.
- Bachelor's degree preferred or equivalent experience.
Talent needed for success:
- Demonstrable experience designing intuitive, real-time dashboards (e.g., in Grafana) that effectively communicate system health, performance trends, and business critical metrics.
- Expertise in defining and tuning monitoring rules, thresholds, and alerting logic to ensure accurate and actionable incident detection.
- Good understanding of both application-level and operating system-level metrics, including CPU, memory, disk I/O, network, and custom business metrics.
- Experience with structured log ingestion, parsing, and analysis using tools like Splunk, Fluentd, or OpenTelemetry.
- Familiarity with implementing and analyzing synthetic transactions and real user monitoring to assess end-user experience and application responsiveness.
- Hands-on experience with application tracing tools and frameworks (e.g., OpenTelemetry, Jaeger, Zipkin) to diagnose performance bottlenecks and service dependencies.
- Proficiency in configuring and using AWS CloudWatch for collecting and visualizing cloud-native metrics, logs,…