Infrastructure & SRE Engineer — Secure AI Platform, Equity

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Crosscheck Staffing
Full Time position
Listed on 2026-05-27
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Systems Engineer, Cloud Computing, Network Engineer
Salary/Wage Range or Industry Benchmark: 180000 - 250000 USD Yearly

Salary Range: $180K to $250K base plus equity

Company Overview

The company is small, technical, and operating at a high‑ownership stage. They are already seeing strong enterprise demand, including regulated and defense‑adjacent use cases, and are now hiring foundational infrastructure engineers who can help scale the platform.

This is a strong fit for engineers who want to work close to the metal on Kubernetes, containers, networking, cloud infrastructure, secure execution environments, observability, and distributed systems.

Role 1:
Software Engineer, Infrastructure

The Infrastructure role is focused on building the core systems that power secure AI agent execution. This person will work on the platform layer that allows agents to run workloads safely, quickly, and reliably across cloud environments.

This role is a fit for someone who enjoys building foundational infrastructure, not just maintaining it. The ideal candidate has strong hands‑on experience with Kubernetes, Docker, Linux, networking, AWS or GCP, Terraform or Pulumi, and distributed systems.

What you will work on
  • Build and scale secure infrastructure for AI agent workloads
  • Design and operate sandboxed execution environments, containerized systems, and distributed job orchestration
  • Improve performance across the platform, with a constant focus on speed, reliability, and efficiency
  • Build secure VPC deployments for enterprise and regulated customers
  • Work on infrastructure involving Kubernetes, Docker, Docker‑in‑Docker, microVMs, Terraform, Pulumi, AWS, GCP, Grafana, and Prometheus
  • Debug complex production issues across containers, networking, Linux systems, cloud primitives, and distributed services
  • Own systems from design through production deployment
Strong fit signals
  • Strong production experience with Kubernetes, Docker, cloud infrastructure, and distributed systems
  • Deep knowledge in at least one infrastructure layer such as containers, networking, Linux, storage, or cloud primitives
  • Experience building infrastructure systems from scratch
  • Strong debugging ability below the surface of managed cloud tooling
  • Background from a strong infrastructure‑heavy company or top engineering environment
  • Comfortable working directly with founders in a small, fast‑moving startup
Role 2:
Site Reliability Engineer

The SRE role is focused on keeping our client’s production infrastructure reliable, observable, secure, and scalable as customer demand grows. This person will own reliability practices, monitoring, alerting, incident response, deployment safety, and automation.

This role is a fit for someone who has operated production systems at scale and can improve reliability without adding unnecessary process. The ideal candidate has hands‑on experience with Kubernetes, Terraform or Pulumi, observability, incident response, SLOs, cloud infrastructure, and automation.

What you will work on
  • Own production reliability across our client’s infrastructure platform
  • Build and improve monitoring, alerting, dashboards, and observability workflows
  • Lead incident response, root cause analysis, and postmortems
  • Automate deployments, scaling, provisioning, and recovery tasks
  • Improve developer experience through safer releases and better operational tooling
  • Work with Grafana, Prometheus, Terraform, Pulumi, Docker, Kubernetes, Python or Go, AWS, GCP, Azure, and PagerDuty‑style workflows
  • Help keep infrastructure highly available, secure, and ready for enterprise customers
Strong fit signals
  • 3+ years of explicit SRE, production infrastructure, or platform reliability experience
  • Strong hands‑on experience with Kubernetes, Docker, Terraform or Pulumi, Grafana, and Prometheus
  • Experience with incident response, on‑call, SLOs, SLIs, alerting, and production debugging
  • Ability to automate reliability work with Python, Go, Bash, or infrastructure tooling
  • Experience scaling infrastructure, not just maintaining it
  • Background from a strong engineering company or infrastructure‑heavy environment
Ideal Candidate Background

Our client is prioritizing candidates with strong recent full‑time experience at respected infrastructure or engineering companies. Target backgrounds include companies such as:

Google, Meta, AWS,…
