SRE Technical Manager - Transport Job, Norfolk area, Virginia, USA, IT/Tech

Overview

Leidos currently has an opening on the Service Management Integration and Transport (SMIT) Contract for a Site Reliability Engineering (SRE) Technical Manager. This role leads the Transport SRE team to ensure the reliability, performance, and scalability of critical systems across 5-6 SRE Pods. You will collaborate with engineering, product, and operations to implement best practices in automation, incident management, and system monitoring. Working under the Director of Site Reliability Engineering, you will also support, migrate, automate, and optimize software development and deployment processes and infrastructure as code, and help mature the SRE program.

The SRE Technical Manager will mentor and coach technical staff, perform collaborative code reviews, and partner to align reliability objectives with business goals within the Navy’s largest IT services program.

What You’ll Get to Do

Manage and mentor 5-6 SRE teams (pods) and 60+ FTEs, providing guidance, setting performance expectations, and fostering professional development.
Collaborate with SRE Resource Managers to staff resources and achieve reliability and scalability goals for your SRE vertical teams.
Manage P&L for the Transport Services vertical, including budget planning, tool selection, and infrastructure investments to meet reliability and scalability needs.
Participate in performance reviews, interviews, and development planning with team members.
Oversee reliability, availability, and performance of critical systems by leading SRE teams in monitoring, incident response, and performance optimization.
Ensure adherence to best practices for system reliability, automation, and operational efficiency.
Drive continuous improvement by analyzing metrics (SLOs, MTTR, MTBF) and identifying enhancement opportunities.
Collaborate with operations, quality, cybersecurity, and other SRE teams to define and enforce SLOs and manage error budgets.
Act as a liaison between the SRE team and other departments to prioritize reliability and operational needs in product development.
Define the SRE strategy with senior leadership and set long-term reliability goals aligned with business objectives.
Lead efforts to reduce operational toil through automation and build/enhance automation tools for infrastructure management, monitoring, and incident response.
Oversee development and adoption of Infrastructure as Code (IaC) tools, CI/CD pipelines, and other automation processes.
Ensure SRE practices align with organizational security requirements and collaborate with security teams to integrate reliability-focused security practices.
Proactively address potential issues to meet or exceed service levels and align reliability expectations with stakeholders.
Collaborate with Developers, Security, and Operations to continuously deliver products and increase value for the organization and customers.
Advocate Agile and modern SRE practices, providing technical guidance on best practices and staying current with the latest SRE trends.

Required Qualifications

B.S. degree (or equivalent) in Cybersecurity, Information Security, IT, Network Engineering, Computer Science, or related field; or Master’s degree with 6+ years of relevant experience and 8-10 years of SRE or DevOps experience, including at least 4 years in a leadership role.
DoD Secret Clearance.
DoD 8570.01 IAT Level II certification (minimum) is required prior to onboarding and must be maintained while supporting the SMIT Contract.
Ability to support program execution in classified environments and to access SIPRNet from an NMCI location on short notice (local travel).
Excellent written and oral communication skills, including producing technical analyses/reports, presentations, and executive-level briefings for internal and external stakeholders.
Ability to review requirements, understand capabilities, and propose solutions that satisfy customer needs.
Ability to work in a highly collaborative, forward-thinking, and innovation-driven environment.
Proven experience managing teams responsible for large-scale, distributed systems with high reliability and performance demands.
Strong track record of incident management, postmortems, and…

