Infrastructure & SRE Engineer - Secure AI Platform, Equity
Location: San Francisco area, California, USA
Category: IT/Tech

Salary Range: $180K to $250K base plus equity

Company Overview

The company is small, technical, and operating at a high‑ownership stage. It is already seeing strong enterprise demand, including regulated and defense‑adjacent use cases, and is now hiring foundational infrastructure engineers who can help scale the platform.

This is a strong fit for engineers who want to work close to the metal on Kubernetes, containers, networking, cloud infrastructure, secure execution environments, observability, and distributed systems.

Role 1:
Software Engineer, Infrastructure

The Infrastructure role is focused on building the core systems that power secure AI agent execution. This person will work on the platform layer that allows agents to run workloads safely, quickly, and reliably across cloud environments.

This role is a fit for someone who enjoys building foundational infrastructure, not just maintaining it. The ideal candidate has strong hands‑on experience with Kubernetes, Docker, Linux, networking, AWS or GCP, Terraform or Pulumi, and distributed systems.

What you will work on

Build and scale secure infrastructure for AI agent workloads
Design and operate sandboxed execution environments, containerized systems, and distributed job orchestration
Improve performance across the platform, with a constant focus on speed, reliability, and efficiency
Build secure VPC deployments for enterprise and regulated customers
Work on infrastructure involving Kubernetes, Docker, Docker‑in‑Docker, microVMs, Terraform, Pulumi, AWS, GCP, Grafana, and Prometheus
Debug complex production issues across containers, networking, Linux systems, cloud primitives, and distributed services
Own systems from design through production deployment

Strong fit signals

Strong production experience with Kubernetes, Docker, cloud infrastructure, and distributed systems
Deep knowledge of at least one infrastructure layer, such as containers, networking, Linux, storage, or cloud primitives
Experience building infrastructure systems from scratch
Strong debugging ability below the surface of managed cloud tooling
Background from a strong infrastructure‑heavy company or top engineering environment
Comfortable working directly with founders in a small, fast‑moving startup

Role 2:
Site Reliability Engineer

The SRE role is focused on keeping our client’s production infrastructure reliable, observable, secure, and scalable as customer demand grows. This person will own reliability practices, monitoring, alerting, incident response, deployment safety, and automation.

This role is a fit for someone who has operated production systems at scale and can improve reliability without adding unnecessary process. The ideal candidate has hands‑on experience with Kubernetes, Terraform or Pulumi, observability, incident response, SLOs, cloud infrastructure, and automation.

What you will work on

Own production reliability across our client’s infrastructure platform
Build and improve monitoring, alerting, dashboards, and observability workflows
Lead incident response, root cause analysis, and postmortems
Automate deployments, scaling, provisioning, and recovery tasks
Improve developer experience through safer releases and better operational tooling
Work with Grafana, Prometheus, Terraform, Pulumi, Docker, Kubernetes, Python or Go, AWS, GCP, Azure, and PagerDuty‑style workflows
Help keep infrastructure highly available, secure, and ready for enterprise customers

Strong fit signals

3+ years of explicit SRE, production infrastructure, or platform reliability experience
Strong hands‑on experience with Kubernetes, Docker, Terraform or Pulumi, Grafana, and Prometheus
Experience with incident response, on‑call, SLOs, SLIs, alerting, and production debugging
Ability to automate reliability work with Python, Go, Bash, or infrastructure tooling
Experience scaling infrastructure, not just maintaining it
Background from a strong engineering company or infrastructure‑heavy environment

Ideal Candidate Background

Our client is prioritizing candidates with strong recent full‑time experience at respected infrastructure or engineering companies. Target backgrounds include companies such as:

Google, Meta, AWS,…
