Site Reliability Engineer
Job in San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-06-17
Listing for: Xona Space Systems
Full Time position
Job specializations:
- IT/Tech
Cloud Computing: Infrastructure & Operations, Systems Engineer, SRE/Site Reliability, IT Support
Job Description & How to Apply Below
Requirements
- Cloud Operations: 4+ years of experience managing production-grade environments in AWS, GCP, or Azure
- Orchestration: Expert-level proficiency with Kubernetes (EKS), including networking, ingress controllers, and service mesh management
- Automation: Strong experience with configuration management and IaC (e.g., Terraform, Ansible, Helm)
- Data Systems: Deep knowledge of SQL and NoSQL database administration, focusing on replication, backup, and disaster recovery
- Programming: Proficiency in Python and C++ for developing internal tooling and automating complex operational workflows
- Systems Internals: Strong understanding of Linux networking, storage, and kernel tuning
- (Desirable) Prior experience in Aerospace, Defense, or high-reliability sectors
- (Desirable) Familiarity with CCSDS standards or satellite ground station software
- (Desirable) Experience with secure, air-gapped, or hybrid-cloud deployments
We are seeking a Site Reliability Engineer (SRE) to architect and manage the critical ground infrastructure for our satellite constellation. This role is responsible for the "last mile" of mission success: ensuring that the software controlling our orbital assets is highly available, scalable, and seamlessly integrated with Mission Operations.
You will own the lifecycle of our production environments, from automating deployments via Infrastructure as Code (IaC) to managing the core data systems that track constellation health and user activity.
- Infrastructure as Code (IaC): Design and maintain scalable, repeatable cloud infrastructure (AWS) using tools like Terraform or CloudFormation
- Mission Ops Integration: Build and optimize the interfaces between core data management systems and Mission Operations software, ensuring reliable telemetry and command flows
- User & Data Management: Architect and maintain high-availability identity providers (IdP) and distributed databases to support global user access and real-time data processing
- Automated Deployment Pipelines: Create and manage robust CI/CD pipelines to deploy containerized applications into production with a focus on zero-downtime and rollback capabilities
- Observability & Reliability: Implement comprehensive monitoring, alerting, and logging (e.g., Prometheus, Grafana, ELK) to ensure 99.99% uptime for ground segment services
- Scalability Engineering: Perform capacity planning and performance tuning to handle the high-throughput data requirements of a growing satellite constellation