Site Reliability Engineer

Job in California, Moniteau County, Missouri, 65018, USA
Listing for: Zefr
Full Time position
Listed on 2026-02-23
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, Data Engineer, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 190,000 - 210,000 USD yearly
Job Description & How to Apply Below
Position: Staff Site Reliability Engineer
Location: California

What we do:

Zefr is the leading global technology company enabling responsible marketing in walled garden social environments. Zefr’s solutions empower brands to manage their content adjacency on scaled platforms such as YouTube, Meta, TikTok, and Snap, in accordance with industry standard frameworks. Through its patented AI technology, Zefr offers brands and agencies more accurate and transparent solutions for social walled gardens.

The company is headquartered in Los Angeles, California, with additional locations across the globe.

What you’ll do:

As a Site Reliability Engineer at Zefr, you’ll apply your expertise in cloud infrastructure, CI/CD, observability, and core SRE concepts to deliver high-quality, reliable, and scalable solutions. A significant aspect of this role involves working closely with the rest of Zefr’s Engineering and Data Science teams, ensuring the specialized infrastructure required for our services is robust, efficient, and scalable.

We’re looking for someone who combines technical expertise with strong leadership and a passion for continuous improvement and innovation. Zefr wants a candidate who champions reliability as a product feature and can translate complex technical concepts into strategy. This is a role where you’ll shape how we build and operate systems at scale.

  • Support and build systems and tools that enable other engineers to generate, deploy, and manage product features and models both quickly and safely.
  • Deploy and support a multi-cloud, microservice architecture, including infrastructure tailored for ML workloads, deployed via GitHub Actions, Argo CD, and Kubernetes.
  • Collaborate with other engineers, particularly the Machine Learning team, to architect secure, resilient, scalable, and cost-efficient applications and ML systems/pipelines in AWS and GCP.
  • Foster and push our DevOps culture and philosophy by encouraging continuous improvement across all engineering teams.
  • Proactively maintain the health of production environments, including monitoring application performance and resource utilization.
  • Participate in a 24/7 on-call rotation and respond to system performance issues and outages.
  • Debug code at the application and infrastructure level.
  • Mature our CI/CD workflows and release process.
  • Maintain a forward-thinking approach, actively researching and proposing new solutions.
  • Propose and review Engineering Requests for Comments (RFCs) to drive Engineering architecture and practices.

Technology Stack at Zefr:

Core Infrastructure & Cloud Platforms:

  • Cloud Providers: Google Cloud Platform (primary), Amazon Web Services
  • Infrastructure as Code (IaC): Terraform, Terragrunt
  • Containerization & Orchestration: Docker, Kubernetes (experience with GKE and/or EKS expected), Helm, Kustomize
  • Service Mesh: Istio

CI/CD & Automation:

  • CI/CD Pipelines: GitHub Actions
  • GitOps / Continuous Delivery: Argo CD
  • Primary Scripting/Automation Language: Python

Observability & Monitoring:

  • Monitoring & Alerting: Prometheus, Chronosphere, PagerDuty
  • Telemetry Standards: OpenTelemetry

Application & Data Ecosystem (Supporting):

  • Application Languages/Frameworks: Python, FastAPI, Flask, Node.js, React
  • Data Streaming: Apache Kafka
  • Data Processing/Transformation: Pandas, dbt
  • Workflow Orchestration: Apache Airflow, Ray

Data Stores & Databases:

  • Relational Databases: PostgreSQL (including managed versions like AWS Aurora, GCP Cloud SQL)
  • NoSQL Databases: DynamoDB
  • Search Databases: OpenSearch
  • Vector Databases: Qdrant
  • Caching: Redis
  • Data Warehousing: Snowflake

What we’re looking for:

  • 7+ years of experience designing, managing, deploying, and supporting cloud infrastructure in a production environment using major public cloud providers (GCP experience is a huge bonus)
  • Knowledge of GitOps, including an understanding of modern CI/CD pipelines, techniques, and technologies (GitHub Actions, GitLab, CircleCI, Argo CD, Flux)
  • Proficiency with IaC and configuration management tools (Terraform, Terragrunt, OpenTofu, Crossplane, Pulumi)
  • Production experience architecting, managing, deploying, and supporting container-based workloads in Kubernetes clusters
  • Strong problem-solving experience, focusing on automation
  • Proven track record of building and…