
Senior Site Reliability Engineer / DevOps Engineer

Job in Mountain View, Santa Clara County, California, 94039, USA
Listing for: Prophet Town
Full Time position
Listed on 2026-05-16
Job specializations:
  • IT/Tech
    Cloud Computing, SRE/Site Reliability, Systems Engineer, Network Engineer
Salary/Wage Range or Industry Benchmark: 130,000 - 180,000 USD yearly
Position: Senior Site Reliability Engineer / DevOps Engineer

Mountain View, United States | Posted on 05/12/2026

Location: Onsite - Mountain View, CA

Experience Required: 5+ years

Infrastructure Footprint: Global production infrastructure across AWS regions and physical data centers in South America and Europe

Role Overview

We are seeking a Senior Site Reliability Engineer / DevOps Engineer to design, scale, and operate highly available global infrastructure supporting production systems across multiple international regions.

This role is for an engineer with 5+ years of experience building and running production‑grade cloud infrastructure. The right person understands where distributed systems fail and has learned the hard lessons that come from operating Kubernetes and cloud platforms at scale.

The ideal candidate has deep hands‑on experience with Kubernetes, ArgoCD, Terraform, CI/CD pipelines, AWS infrastructure, and multi‑region platform reliability. They should understand the limitations, sharp edges, and operational failure modes of these tools.

This is an onsite role working closely with platform engineering and leadership to build resilient global infrastructure.

What You’ll Do
  • Design and operate globally distributed production infrastructure across AWS regions and physical data center environments in South America and Europe
  • Build highly available multi-region systems with strong disaster recovery and failover strategies
  • Solve cross-region networking, latency, DNS routing, replication, and reliability challenges
  • Build, scale, secure, and troubleshoot production Kubernetes clusters
  • Handle cluster lifecycle management, upgrades, node failures, networking issues, storage problems, and control‑plane troubleshooting
  • Tune workloads for resiliency, scheduling efficiency, autoscaling behavior, and resource optimization
  • Troubleshoot real-world Kubernetes failure modes, including:
    • etcd instability
    • networking overlays and CNI failures
    • node pressure and eviction behavior
    • cluster upgrade regressions
GitOps / ArgoCD Operations
  • Design and maintain GitOps workflows using ArgoCD
  • Manage promotion pipelines across environments and regions
  • Resolve drift detection issues, sync conflicts, reconciliation failures, and deployment ordering challenges
  • Build safe rollback and progressive deployment strategies

Candidates should know why ArgoCD breaks, not just how to click “Sync.”

Infrastructure as Code
  • Build and maintain reusable Terraform modules for multi‑region infrastructure
  • Manage state strategy, workspace isolation, secrets handling, and provider complexity
  • Solve real‑world Terraform pain points, including:
    • state corruption and locking conflicts
    • module version drift
    • provider upgrade regressions
    • dependency graph surprises
    • cross‑account provisioning complexity
  • Build and optimize production CI/CD pipelines
  • Improve deployment speed, safety, and repeatability
  • Troubleshoot flaky pipelines, artifact inconsistencies, race conditions, environment drift, and rollback failures
Reliability & Observability
  • Establish SLIs/SLOs and production health standards
  • Build alerting, monitoring, tracing, and incident response workflows
  • Lead root cause analysis and postmortem improvements
  • Reduce operational toil through automation
Why This Role

You’ll own foundational infrastructure decisions for globally distributed systems and help build resilient platform capabilities at international scale.

This is a hands‑on engineering role for someone who wants meaningful ownership and complex technical problems.

Required Experience

5+ years in Site Reliability Engineering, DevOps, or Platform Engineering

Deep production experience with:
  • ArgoCD
  • Terraform
  • AWS
  • CI/CD systems

Preferred Experience
  • Experience operating infrastructure across multiple continents
  • Experience with hybrid cloud or physical data center integration
  • Strong networking knowledge, including BGP, VPNs, routing, DNS, and load balancing
  • Experience with security hardening and compliance in production systems
  • Software engineering background with Go, Python, or Bash
What “Senior” Means Here

You have enough production experience to have strong opinions because you have seen failures firsthand.

You know:
  • why Terraform plans sometimes lie
  • why ArgoCD syncs can fail for non‑obvious reasons
  • why Kubernetes upgrades can ruin your week
  • why “works in staging” means very little
  • why multi‑region failover diagrams often fail in production
  • why observability usually breaks exactly when needed most

You’ve solved these problems repeatedly and improved systems because of those lessons.

Position Requirements
10+ Years work experience