Systems Reliability Engineer
Listed on 2026-02-18
IT/Tech
Systems Engineer, Cloud Computing, IT Support, Cybersecurity
Systems Reliability Engineer at Leidos summary:
The Systems Reliability Engineer at Leidos is responsible for ensuring the availability and performance of GEOAxIS ICAM services by troubleshooting incidents, performing root cause analysis, and implementing automated monitoring and remediation solutions. The role requires expertise in COTS product integration, scripting in Linux environments, and familiarity with DevOps tools and container technologies. Candidates must hold a TS/SCI clearance, have strong communication skills, work onsite in Chantilly, VA, and be able to support off‑hour calls for high‑priority incidents.
Description
GEOAxIS is looking for a Systems Reliability Engineer to work with the rest of the operations team to help drive program technical execution, innovation, and modernization.
The GEOAxIS system provides Identity, Credential and Access Management (ICAM) for all web applications. GEOAxIS enables online, on‑demand access to NGA GEOINT content based on users' authoritative attributes and roles. Our mission is to maintain highly available ICAM services that protect those critical mission applications across all security domains. The GxNext contract was awarded to Leidos in 2021 and runs until 2031.
Responsibilities
Troubleshoot and resolve system/operational incidents
Perform root cause analysis for operational incidents
Analyze system performance and take corrective actions as needed
Coordinate with mission partners, consumer applications, and other external entities in troubleshooting enterprise incidents and integration problems
Design, develop, and implement automated solutions that proactively monitor system health, identify performance bottlenecks, and remediate system issues, reducing manual intervention and improving system reliability
Collect data, identify and analyze trends in operational incidents, and suggest ways to mitigate common issues
Work closely with Ops Tech Lead and Development Lead to identify baseline enhancements to improve operational stability
Work with deployment and ISP teams to support baseline deployments to operations
Support off‑hour calls to assist in troubleshooting when high-priority operational incidents occur
BS degree and 4+ years of prior relevant experience, or Master's degree with 2+ years of prior relevant experience
Requires a TS/SCI clearance and the ability to obtain and maintain a polygraph after hire
Strong communication skills, both verbal and written
Ability to quickly learn new software and IT concepts
Strong problem-solving and decision-making skills
Self‑starter able to work both independently and in a team environment
Intimately familiar with the COTS products that the program leverages:
Oracle Identity and Access Management (IdAM) suite, Apache webgates, and Computer Associates (CA) API Gateway
Experience scripting in a Linux environment using Shell and Bash
Deep understanding and background in COTS integration and custom code development
Experience in at least one of the following languages:
Bash
Python
Java
Node.js
Local to DMV (DC/Maryland/Virginia) with ability to be physically present at the team’s work location in Chantilly
Strong interpersonal skills and a proven track record of leading technical teams and conveying technical solutions to technical and non‑technical audiences
Candidate must be able to be physically present in Chantilly, VA, a minimum of 5 days a week to work with the team, with occasional meetings in Reston and/or Springfield, VA
All candidates must be U.S. citizens to be considered for the position
Security+ certification within 60 days of hire
Kubernetes experience using Rancher RKE2 or OpenShift
Strong understanding of containers
Experience containerizing existing custom software
Knowledge of common DevOps tools such as:
Ansible
ArgoCD
GitLab
Nexus3
Kubernetes
Certifications in any of the following:
RHCSA/RHCE
AWS Solutions Architect/DevOps Engineer
CKA/CKAD
Familiarity with modern authentication flows such as SAML, OAuth 2.0, and OIDC
At Leidos, we don’t want someone who "fits the mold"—we want someone who melts it down and builds something better. This is a role for the…