Site Reliability Engineer - Observability
Listed on 2026-06-02
Software Development
Secure Every Identity, from AI to Human. Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real‑world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.
Key Responsibilities
- Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools such as Terraform.
- GCP Observability Engineering: Optimize the collection, processing, and storage of observability data to ensure high reliability and low latency of our Splunk and Grafana services.
- Incident Response: Participate in on‑call rotations and lead post‑incident reviews to drive systemic improvements and observability‑driven development.
- Automation: Eliminate toil by automating the deployment and scaling of observability agents and collectors.
- GKE: Minimum of 5 years of experience scaling and managing observability on Google Cloud Platform.
- Visualization: Expertise in creating intuitive, actionable Splunk or Grafana dashboards that correlate data across multiple sources.
- SRE Mindset: Minimum of 3 years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
- Programming Proficiency: Strong coding skills in Python or Go for building internal tools and automating workflows.
- Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, load balancing), and container orchestration (Kubernetes/GKE).
- Problem Solving: A data‑driven approach to debugging complex, cross‑service performance bottlenecks.
- Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
- Grafana Loki: Experience migrating logging workloads from Splunk to Grafana Loki.
Experience managing native observability tools within AWS.
Additional Requirements
- This position requires the ability to access federal environments and/or protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. citizen, national, lawful permanent resident, refugee, or asylee; see 22 CFR 120.15) upon hire.
Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws. If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this form to request an accommodation.
Notice for New York City Applicants & Employees:
Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please review our full NYC AEDT Notice.
Okta is committed to complying with applicable data privacy and security laws and regulations. For more information, please see our Personnel and Job Candidate Privacy Notice at