Senior Reliability Engineer
Job in Albuquerque, Bernalillo County, New Mexico, 87110, USA
Listed on 2026-06-05
Listing for: RS21
Full Time position
Job specializations:
- IT/Tech: Systems Engineer, Cloud Computing, Cybersecurity
Job Description & How to Apply Below
The Senior Reliability Engineer supports RS21's space systems programs by owning the reliability, deployment, and operational continuity of cloud and hybrid infrastructure supporting real-time satellite data processing, telemetry pipelines, and ML-driven anomaly detection systems. This role is the engineering backbone behind RS21's operational space platforms, ensuring that systems deployed into defense and commercial satellite environments are stable, observable, and recoverable under real-world operational conditions.
At the Senior level, this person independently leads the SRE and DevOps practice for assigned space programs. They design monitoring and alerting architecture, own the deployment pipeline from code commit to operational environment, define SLOs and error budgets, and partner with software and data engineering teams to ensure systems are built to operate reliably from day one. They understand the constraints of classified and ops-floor environments and apply those constraints practically in every architectural and operational decision.
This role works closely with software engineers, data engineers, ML practitioners, and government stakeholders across RS21's DoD space systems portfolio, including deployments supporting AFRL, Space Force, and satellite operations floor environments. It requires someone who can hold both the engineering rigor of a senior SRE and the operational pragmatism required to deploy into highly regulated, mission-critical settings.
Key Responsibilities
Reliability Engineering & SRE Practice
- Define and maintain SLOs, SLAs, and error budgets for RS21's space systems platforms, in collaboration with engineering and government stakeholders.
- Lead incident response for operational platform failures, including triage, root cause analysis, blameless post-mortems, and follow-through on corrective actions.
- Architect and implement monitoring, alerting, and observability solutions using CloudWatch, CloudTrail, and custom telemetry pipelines that reflect the operational realities of satellite systems.
- Continuously improve system reliability through load testing, failure injection, chaos engineering practices, and proactive capacity planning.
- Ensure operational requirements including latency, throughput, and sustainment are reflected in platform architecture and delivery plans from the earliest design stages.
- Design, implement, and maintain cloud and hybrid deployment architectures for RS21's space systems platforms, including real-time ML inference pipelines, telemetry ingestion systems, and anomaly detection services.
- Own the deployment pipeline for space systems software across AWS GovCloud and on-premises or edge-adjacent environments connected to satellite operations floors.
- Architect containerized workloads using Docker and Kubernetes, including Helm chart development, cluster management, and workload scheduling for latency-sensitive satellite data processing.
- Contribute to and enforce infrastructure-as-code practices using Terraform or CDK, ensuring all infrastructure is versioned, auditable, and reproducible.
- Support classified and operationally sensitive deployments, applying zero-trust architecture principles and STIG compliance requirements throughout.
- Lead security architecture reviews for cloud and hybrid infrastructure supporting DoD space programs, applying zero-trust principles and hardening against STIG and FedRAMP requirements.
- Support ATO processes, RMF documentation, and accreditation activities in collaboration with security, legal, and government partners.
- Implement IAM policies, cross-account access controls, and audit logging architectures using AWS IAM, CloudTrail, and Macie.
- Ensure all deployment environments maintain continuous compliance posture and flag deviations proactively before they affect accreditation status.
- Design, implement, and maintain CI/CD pipelines for space systems software using GitHub Actions, GitLab CI, or Azure DevOps, including automated testing, security scanning, and deployment gate controls.
- Establish and enforce branching strategies, deployment promotion gates, and rollback procedures appropriate to operationally sensitive space environments.
- Partner with software and data engineering teams to embed reliability and security practices into the development lifecycle rather than treating them as post-deployment concerns.
- Lead the adoption of DataOps and MLOps pipeline standards for RS21's ML-based anomaly detection and predictive maintenance systems deployed in satellite contexts.
- Own the operational reliability of real-time data pipelines ingesting satellite telemetry, including Kinesis, MSK/Kafka, Lambda, and custom streaming architectures.
- Monitor and optimize pipeline performance, latency, and throughput to meet the real-time processing requirements of satellite operations floor…
Position Requirements
10+ years of work experience