Site Reliability Engineer; AHT - R10231023
Listed on 2026-05-04
IT/Tech
Systems Engineer, Cloud Computing, IT Support, Cybersecurity
Position Overview
Northrop Grumman Defense Systems (NGDS), Beavercreek, OH, is hiring a Site Reliability Engineer. This role defines reliability from the user’s perspective, instruments and measures against targets, and builds tooling and runbooks that make failures recoverable. Candidates partner with dev teams to improve operational quality and lead problem resolution in production. The SRE will debug distributed systems, resolve incidents, and translate findings into lasting reliability improvements.
Relocation Assistance & Travel
Relocation assistance may be available. Travel: yes, 10% of the time. Clearance: Top Secret required; no clearance at start. U.S. citizenship required.
- Incident Response – lead real-time detection, triage, and resolution of production incidents; conduct post-mortems and drive corrective actions.
- Toil Reduction – identify repetitive operational work, develop automation and runbooks, and implement CI/CD pipelines to reduce manual effort.
- Reliability Evaluations – define service level objectives (SLOs) and error budget policies; assess system reliability against targets using observability data.
- Platform Enablement – build and maintain shared tooling (e.g., Kubernetes clusters, GitOps workflows); enable development teams with SDKs, instrumentation guidance, and reliability best practices.
- Engineer (Level 2): 2+ years of related experience with a Bachelor’s degree in Computer Science or a related STEM field from an accredited institution.
- Principal Engineer (Level 3): 5+ years of related experience with a Bachelor’s degree in Computer Science or a related STEM field from an accredited institution, or 3 years with a Master’s degree.
- U.S. citizenship with the ability to obtain a Top Secret security clearance.
- Systems-thinking mindset and observability fundamentals.
- Basic software-engineering skills – automation, APIs, Git workflows, code reviews.
- Linux and networking fundamentals.
- Strong communication, collaboration, and organizational abilities.
- Specialty skills (one or more), e.g., Kubernetes, GitOps, OpenTelemetry, Grafana, CI/CD, scripting, developer enablement, or alerting and anomaly detection.
- SRE-related certifications (e.g., DevOps Institute, AWS Solutions Architect, or equivalent).
- Hands-on experience with Python, Go, Kubernetes, ArgoCD, GitLab/GitHub, Jenkins, Docker, Locust/Gatling, Prometheus, Grafana.
- Experience with container orchestration, service mesh, and cloud-native infrastructure.
- Proven reliability improvements in large-scale distributed systems.
- Familiarity with security best practices for cloud and on-prem environments.
Primary Level Salary Range: 83,.00. Secondary Level Salary Range: -.
Benefits include health insurance, life and disability insurance, savings plan, company-paid holidays, and paid time off for vacation or personal business.
Northrop Grumman is an Equal Opportunity Employer. For a full EEO statement, please visit the company’s website. U.S. citizenship is required for all positions with a government clearance and other restricted positions.