Engineer – Network Observability Platform and Automation
Listed on 2025-12-16
IT/Tech
Systems Engineer, Cloud Computing
Manager – Network Observability Platform and Automation at Digital Realty
Job Description
Position Title: Network Observability Platform and Automation
Location:
Austin, Boston, Dallas, Ashburn, Chicago
Your role
A Manager – Network Observability typically leads a team of engineers focused on maintaining and improving the reliability, performance, and availability of an organization's systems and infrastructure. This role involves a mix of technical leadership, people management, and strategic planning, ensuring systems meet business and user needs.
In this role, you will be responsible for oversight of Digital Realty’s Observability stack. The ideal candidate can demonstrate a unique blend of network engineering, network operations, and software understanding through the application of engineering principles. You will focus on delivering operational discipline and embrace key operational principles including automation, agile development, and scripting.
In this unique role, you will be part of the Observability team and build and maintain a global observability infrastructure. Ideal candidates for this role will bring an understanding of carrier-class network infrastructure as well as experience working in a fast-paced development environment.
What You’ll Do
- Team Leadership:
- Manage and mentor a team of SREs, fostering their growth and development.
- Set team goals, prioritize projects, and ensure alignment with organizational objectives.
- Conduct performance reviews and provide constructive feedback.
- Build a positive and collaborative team environment.
- Technical Oversight:
- Oversee the design, implementation, and maintenance of reliable infrastructure and services.
- Collaborate with other teams to define requirements, standards, and best practices.
- Identify and address performance bottlenecks and ensure system stability.
- Implement and improve monitoring and observability frameworks.
- Operational Excellence:
- Manage on-call rotations and incident response to minimize downtime and ensure swift resolution.
- Drive automation efforts to reduce manual tasks and improve efficiency.
- Implement structured engineering and operations processes.
- Analyze and evaluate existing processes to identify opportunities for improvement.
- Strategic Planning:
- Develop and implement the long-term reliability strategy for the organization.
- Make decisions about build vs. buy for tools and technologies.
- Ensure alignment with business goals and customer expectations.
- Manage relationships with vendors and other stakeholders.
- Communication and Collaboration:
- Act as a bridge between technical teams and other departments.
- Represent the SRE team to stakeholders and communicate effectively.
- Collaborate with other engineering teams to ensure efficient workflows.
- Foster a culture of blameless postmortems and continuous learning.
Key Skills and Experience:
- Strong technical background in distributed systems, cloud computing, and related technologies.
- Proven experience in managing and mentoring technical teams.
- Excellent problem-solving and communication skills.
- Experience with monitoring, automation, and incident management.
- Understanding of SLOs, SLIs, and SLAs.
- Familiarity with DevOps and Agile practices.
- 10+ years of operations and engineering experience.
- 5+ years of team building and management.
- 3+ years of network engineering in large-scale data center environments.
- Bachelor’s degree in computer science (or equivalent training) preferred.
- Expertise in Layer 3…