
Lead Site Reliability Engineer

Job in Washington, District of Columbia, 20022, USA
Listing for: Federated IT
Full Time position
Listed on 2026-02-23
Job specializations:
  • IT/Tech
    Systems Engineer, Cybersecurity, IT Support, Cloud Computing
Salary/Wage Range or Industry Benchmark: 120,000 - 150,000 USD Yearly
Job Description & How to Apply Below

About Bridge Defense

Bridge Defense is redefining how modern defense technology is delivered. Based in Washington, D.C., we are built for the dynamic mission environment facing the Department of Defense, the Intelligence Community, and federal law enforcement agencies. We provide full-spectrum national security solutions that combine secure infrastructure, cleared talent, and mission‑ready software to meet evolving defense challenges. Our services include secure software development in classified environments and the design and implementation of advanced IT and cybersecurity capabilities ranging from secure cloud architectures and enterprise infrastructure to data center operations, scientific analysis, and cutting‑edge cyber defense.

We are led by technologists and veterans with firsthand mission experience, which enables us to understand both the operational realities and the innovation needed to succeed. Our approach is agile and outcome‑based, delivering results in weeks rather than months whenever possible.

At Bridge Defense we value people and integrity, and we foster an environment where innovation thrives in support of traditional mission requirements. Our team members receive competitive compensation, robust benefits, professional development and certification opportunities, and clear paths for growth while working on the nation’s most critical projects.

Core Values:
  • Innovation & Responsiveness:
    We push beyond legacy models with efficient, tech‑led solutions built to scale and evolve.
  • Trusted Performance:
    Security, compliance, and deep experience in delivering to demanding environments guide all we do.
  • Mission Focused Expertise:
    From veteran leadership to cleared engineers, our people understand both the technology and the mission.
About the Role

As the Lead Site Reliability Engineer for our Compute Bridge Engagement, you’ll be responsible for the reliability, scalability, and performance of one of the largest hardware and AI infrastructure efforts in the U.S. defense sector. You will lead the deployment, management, and automation of a high‑performance computing mesh across multiple secure environments, ensuring operational excellence and mission continuity for a 9‑figure government program.

This is a hands‑on engineering leadership role that bridges physical infrastructure and modern DevOps automation, ideal for someone who thrives at the intersection of hardware systems, distributed computing, and AI/ML workflows.

What You’ll Do
  • Lead infrastructure design, deployment, and operations for Compute Bridge hardware clusters across secure and distributed environments
  • Install and configure physical systems, including high‑density GPU servers, networking gear, and storage arrays
  • Build and deploy secure Linux images and containerized workloads using OpenShift and other orchestration platforms
  • Develop and manage automation pipelines for provisioning, configuration management, and monitoring using modern DevOps toolchains (Ansible, Terraform, etc.)
  • Operate and maintain distributed networking meshes across multiple classified and unclassified domains
  • Implement and manage out‑of‑band management tools (IPMI, iDRAC, BMC, etc.) for remote troubleshooting and control
  • Integrate and optimize NVIDIA GPU infrastructure for AI/ML training and inference workloads
  • Collaborate with mission engineers, software teams, and government operators to ensure system readiness and performance
  • Provide on‑site technical leadership for deployments, troubleshooting, and continuous improvement
  • Mentor junior engineers and establish operational best practices across the Compute Bridge program as the contract grows
What You’ll Bring
  • 3+ years of experience in site reliability, systems engineering, or hardware operations roles
  • Deep expertise with physical infrastructure: server racking, cabling, diagnostics, and troubleshooting
  • Strong experience with Linux systems administration, imaging, and automated deployment
  • Hands‑on experience managing large‑scale clusters or distributed systems in OpenShift or Kubernetes environments
  • Familiarity with DevOps automation (Ansible, Terraform, CI/CD pipelines)
  • Experience configuring and managing networking and…