Lead Observability Engineer; Grafana Cloud
Job in Coppell, Dallas County, Texas, 75019, USA
Listed on 2026-04-18
Listing for: DTCC
Full Time position
Job specializations:
- IT/Tech
Cybersecurity, Systems Engineer, IT Support, Cloud Computing
Job Description & How to Apply Below
Are you ready to make an impact at DTCC?
Do you want to work on innovative projects, collaborate with a dynamic and supportive team, and receive investment in your professional development? At DTCC, we are at the forefront of innovation in the financial markets. We are committed to helping our employees grow and succeed. We believe that you have the skills and drive to make a real impact. We foster a thriving internal community and are committed to creating a workplace that looks like the world that we serve.
The Information Technology group delivers secure, reliable technology solutions that enable DTCC to be the trusted infrastructure of the global capital markets. The team delivers high-quality information through activities that include development of essential, building infrastructure capabilities to meet client needs and implementing data standards and governance.
Pay and Benefits:
- Competitive compensation, including base pay and annual incentive
- Comprehensive health and life insurance and well-being benefits, based on location
- Pension / Retirement benefits
- Paid Time Off and Personal/Family Care, and other leaves of absence when needed to support your physical, financial, and emotional well-being.
- DTCC offers a flexible/hybrid model of 3 days onsite and 2 days remote (onsite Tuesdays, Wednesdays and a third day unique to each team or employee).
As a member of the IT Fin Sight Delivery team, you will serve as an Observability Engineer on the Observability Engineering team. The team maintains the firm's application and infrastructure monitoring, reporting, and analytics tools. This position primarily involves working with monitoring tools such as Grafana Cloud (SaaS), Dynatrace, OpenText Operations Bridge Manager, and Splunk Cloud.
Your Primary Responsibilities:
- Design and Implementation: Developing and implementing monitoring solutions for applications and infrastructure to ensure high availability and performance.
- Monitoring and Analysis: Continuously monitoring system performance, identifying bottlenecks, and analyzing trends to proactively address potential issues.
- Incident Management: Responding to and managing incidents, performing root cause analysis, and implementing corrective actions to prevent recurrence.
- Collaboration: Working closely with development, operations, and other IT teams to ensure monitoring solutions are integrated and aligned with business needs.
- Tool Management: Selecting, configuring, and maintaining monitoring tools and platforms, ensuring they are up-to-date and effective.
- Reporting: Generating and presenting reports on system performance, incidents, and trends for stakeholders.
- Optimization: Continuously improving monitoring processes and tools to enhance efficiency and effectiveness.
- Compliance and Security: Ensuring monitoring solutions comply with security policies and regulatory requirements.
- Working on engineering and development focused projects from start to finish with minimal supervision
- Providing technical and operational support for our customer base as well as other technical areas within the company that utilize our tools
- Risk management functions such as reconciliation of vulnerabilities, security baselines as well as other risk and audit related objectives
- Administrative functions for our tools such as keeping the tool documentation current and handling service requests
- 24x7 on-call L3 support on a rotational schedule with other team members
- Participating in user training to increase awareness of observability solutions
- Ensuring incident, problem and change tickets are addressed in a timely fashion, as well as escalating technical and managerial issues
- Following DTCC's ITIL process for incident, change and problem resolution
NOTE: The responsibilities of this role are not limited to the details above.
Qualifications:
- Minimum of 6 years of relevant experience
- Bachelor's degree in Computer Science or any technical field and/or equivalent experience
- Minimum of 6 years of experience in IT infrastructure, application monitoring, and performance management.
- 5+ years of experience in Splunk engineering/support in a production environment, covering all phases of lifecycle management: planning, design, deployment, upkeep, and retirement
- Should have a developed competency with monitoring solutions in a production environment
- Monitoring Tools: Proficiency in using monitoring tools such as Grafana Cloud, Dynatrace, OpenText Operations Bridge Manager, Splunk, and others.
- Scripting and Automation: Knowledge of scripting languages like Python, Bash, or PowerShell to automate monitoring tasks and processes.
- Cloud Platforms: Experience with cloud platforms such as AWS, Azure, or Google Cloud, including their monitoring and management services.
- Networking: Understanding of network protocols, configurations, and troubleshooting.
- System Administration: Strong background in system administration for both Windows and…