Principal Site Reliability Engineer (SRE)

Job in Los Angeles, Los Angeles County, California, 90079, USA
Listing for: Stride Consulting
Full Time position
Listed on 2026-01-07
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 100000 - 125000 USD Yearly
Job Description & How to Apply Below
Position: Principal Site Reliability Engineer (SRE)

Los Angeles, CA or Remote

Overview

InStride is a public benefit corporation. We partner with leading employers to unlock opportunities for their employees through access to education programs. Our mission is to empower partners and their employees to advance careers, elevate expertise, and achieve meaningful growth. If you are passionate about making a difference and driving educational and professional advancement, InStride is the place for you.

Candidates must be located in one of the following states to be considered eligible for employment: AZ, CA, CO, CT, FL, GA, IL, IN, KS, LA, MD, MA, MI, MO, NV, NH, NJ, NY, PA, OH, OR, TX, VA, WA, WI.

What we’re looking for

We’re looking for a Principal Site Reliability Engineer (SRE) to join InStride’s growing engineering team. This is a highly technical role for an individual contributor who thrives at the intersection of cloud architecture, automation, and reliability engineering. You will be the go-to AWS expert for complex initiatives, setting technical direction, and raising the bar for operational excellence across our platform. Every system you design, every automation you implement, and every safeguard you put in place will directly support our mission of expanding access to life-changing education for working adults around the globe.

Skills we’d love to see
  • Cloud Architecture & Strategy:
    Design and optimize AWS environments that balance scalability, resilience, and cost efficiency for enterprise workloads.
  • Technical Leadership & Mentorship:
    Guide engineers on best practices in Kubernetes, DevSecOps, and AWS-native design patterns.
  • Infrastructure as Code Mastery:
    Build reusable IaC libraries with AWS CDK, Terraform, or CloudFormation to standardize deployments.
  • Security & Compliance by Design:
    Enforce least-privilege IAM, encryption-by-default, and policy-as-code guardrails to meet security and regulatory standards.
  • Observability & Reliability Engineering:
    Define SLIs/SLOs, manage error budgets, and implement monitoring strategies with Prometheus, Grafana, and AWS-native tools.
  • CI/CD Excellence:
    Optimize automated pipelines with Harness and GitHub, enabling faster, safer, and more reliable software delivery.
  • Networking & Resilience:
    Architect secure, performant VPCs, load balancing, and multi-region failover strategies with AWS networking services.
  • Automation & Self-Service Enablement:
    Deliver developer-friendly automation and Internal Developer Portal (IDP) capabilities that empower teams to provision infrastructure without SRE intervention.
Who you are
  • 10+ years of experience in SRE, DevOps, or Platform Engineering roles operating production AWS workloads.
  • Hands-on expertise with AWS EKS, Kubernetes networking, Helm, autoscaling frameworks (Karpenter/Cluster Autoscaler), serverless architectures, and API Gateways.
  • Proven delivery of service mesh solutions (Istio, Linkerd, or AWS App Mesh) for secure and observable service-to-service communication.
  • Proficiency with IaC using AWS CDK (TypeScript preferred, or Python), Terraform, or CloudFormation.
  • Strong programming and automation skills in Go, Python, or TypeScript, as well as Bash.
  • Demonstrated experience implementing policy-as-code with OPA/Rego or similar tooling integrated into CI/CD pipelines.
  • Solid understanding of SLI/SLO/error-budget methodologies and hands-on experience with monitoring and alerting stacks (Prometheus, Grafana, CloudWatch, Groundcover).
  • Deep knowledge of AWS security best practices, including IAM policies, encryption, OS hardening, and compliance enforcement.
  • Excellent communication skills with the ability to translate reliability metrics into business impact and guide incident/post-mortem discussions.
  • Experience mentoring engineers and influencing enterprise AWS and DevOps strategies without direct management responsibilities.
  • Familiarity with Internal Developer Portals (Backstage, Port, Cortex) and self-service automation is a strong plus.
How you will create impact
  • Elevate platform reliability:
    Design and operate multi-region, fault-tolerant systems that ensure InStride’s learning platform is always available for learners and partners.
  • Advance…