Site Reliability Engineer (Space Communications) Job, Torrance area, California, USA, IT/Tech

Position: Site Reliability Engineer (Space Communications)

Location

Los Angeles, CA

Employment Type

Full time

Location Type

On‑site

Department

Software

Compensation

$108K – $140K
• Offers Equity

Compensation at Northwood Space is based on role, level, location, and alignment with market data. Individual base pay is determined on a case‑by‑case basis and may vary depending on job‑related skills, education, experience, and technical expertise. In addition to base salary, Northwood Space offers long‑term incentives such as company stock options and discretionary performance bonuses. Benefits include equity, comprehensive health care, flexible vacation, retirement savings plans, and opportunities for professional development.

About Northwood

Northwood is on a mission to transform connectivity between Earth and space and bring the benefits of space to the masses through innovations in space communications technologies. If you like building quickly and seeing your work deployed in locations around the globe with real impact, we want you at Northwood.

Role

Northwood is looking for an Infrastructure Engineer to help build and maintain our observability infrastructure and ensure our global space communications network operates reliably. As we rapidly scale our operations and establish ground stations around the world, we need someone who can grow with us while building robust monitoring and logging systems and supporting our development teams with reliable CI/CD pipelines.

You’ll be responsible for building and maintaining our observability and monitoring infrastructure, while working closely with engineering teams to improve system reliability and deployment processes. This role offers significant growth opportunities as we scale, and you’ll collaborate with experienced engineers to establish monitoring best practices and incident response procedures. We’re seeking someone with 2-4 years of experience who thrives in a fast‑paced startup environment and is excited to take on diverse infrastructure challenges.

Responsibilities

Build and maintain an observability stack with tools such as Grafana, Prometheus, Loki, Vector, CloudWatch, and VictoriaMetrics for metrics and log ingestion across environments
Support and improve CI/CD pipelines using GitLab and ArgoCD, collaborating with development teams on deployment best practices
Help build and maintain cloud infrastructure using Terraform on AWS, contributing to the scalability and reliability of our space communication systems
Work with senior engineers to establish monitoring strategies, alerting, and incident response procedures
Deploy and manage Kubernetes applications using Helm charts, with a focus on reliability and developer experience
Collaborate with engineering teams to implement performance monitoring and troubleshooting across microservices
Support identity and access management integration with Okta and HashiCorp Vault
Assist in managing NixOS‑based infrastructure for reproducible system configurations
Participate in incident response efforts and contribute to post‑incident reviews and improvements

Basic Qualifications

2-4 years of hands‑on experience with infrastructure tools and monitoring systems in production environments
Experience with containerization (Docker, Kubernetes) and basic container orchestration
Familiarity with CI/CD tools (GitLab, Jenkins, or similar) and infrastructure-as-code concepts
Experience with cloud platforms (AWS preferred) and basic infrastructure automation
Programming skills in Python or a similar language and experience with configuration management
Startup mentality with the ability to work in fast‑paced, high‑growth environments and take on diverse responsibilities
Experience with logging and metrics collection for production systems
Understanding of system reliability principles and interest in learning SRE practices

Preferred Qualifications

Some exposure to observability tools like Vector, Loki, Grafana, Prometheus, or similar monitoring systems
Experience with Terraform or other infrastructure as code tools
Familiarity with NixOS or other declarative system configuration approaches
Basic knowledge of HashiCorp Vault, Okta, or similar identity/secrets management tools
Interest in distributed systems and…

