Senior Site Reliability Engineer - Observability
Listed on 2026-04-29
IT/Tech
Systems Engineer, SRE/Site Reliability, Cloud Computing
Position Overview:
We are seeking a highly technical Senior Observability Site Reliability Engineer with a specialty in Splunk to own and evolve our Splunk ecosystem. In this role, you will move beyond simple monitoring to deliver a world‑class, comprehensive, scalable Observability Platform that enables our SRE teams and business partners. You will treat infrastructure as code, using Terraform and strong coding proficiency in Go, Python, or Ruby to automate the deployment of agents and collectors across complex distributed systems.
Key Responsibilities:
- Design, build, and maintain scalable observability infrastructure using tools like Terraform.
- Optimize the collection, processing, and storage of log data to ensure high reliability and low latency of our Splunk services.
- Participate in on‑call rotations and lead post‑incident reviews to drive systemic improvements and observability‑driven development.
- Automate the deployment and scaling of observability agents and collectors to eliminate toil.
Required Skills & Experience (Essentials):
- At least 5 years of experience managing and scaling Splunk Cloud in large environments (1000+ services), including Workload Management (WLM) and HTTP Event Collector (HEC) optimization.
- Expertise in creating intuitive, actionable Splunk dashboards that correlate data across multiple sources.
- At least 3 years of experience in an SRE, DevOps, or Systems Engineering role focused on high‑availability systems.
- Strong coding skills in SPL, Go, or Python for building internal tools and automating workflows.
- Deep understanding of Linux internals, networking (TCP/IP, DNS, load balancing), and container orchestration (Kubernetes/EKS).
- Data‑driven approach to debugging complex, cross‑service performance bottlenecks.
Bonus Skills (Nice‑to‑Haves):
- Hands‑on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
- Experience implementing a Splunk charge‑back app for usage reporting.
Additional Requirements:
- Must be able to access federal environments and/or protected federal data; requires U.S. Person status upon hire.
- Must attend in‑person onboarding in our San Francisco office during the first week of employment.
Salary & Benefits:
The annual base salary range for this position for candidates located in California (excluding San Francisco Bay Area), Colorado, Illinois, New York, and Washington is $147,000 USD – $202,000 USD. Okta offers equity (where applicable), bonus, and benefits, including health, dental and vision insurance, 401(k), flexible spending account, and paid leave (PTO and parental leave) in accordance with our applicable plans and policies.
Benefits:
- Supporting Your Well‑Being
- Driving Social Impact
- Developing Talent and Fostering Connection + Community
Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider qualified applicants with arrest and conviction records, consistent with applicable laws. If a reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.
Notice for New York City Applicants & Employees:
Okta may use Automated Employment Decision Tools (AEDT) as defined by New York City Local Law 144. For more information, please see our Personnel and Job Candidate Privacy Notice at