Manager, SRE FedRAMP Job, Chicago area, Illinois, USA, IT/Tech

Position: Manager, SRE FedRAMP-33539

Position Overview

Manager, Site Reliability Engineering – FedRAMP

Join us as we pursue our ground‑breaking vision to make machine data accessible, usable, and valuable to everyone. At Splunk, we are committed to our work, customers, having fun, and most significantly to each other's success.

The Splunk Observability Cloud provides full‑fidelity monitoring and fixing across infrastructure, applications, and user interfaces, in real‑time and at any scale, to help our customers keep their services reliable, innovate faster, and deliver great customer experiences. Infrastructure Software Engineers at Splunk are cloud‑native systems engineers who use infrastructure‑as‑code, microservices, automation, and efficient design to build, operate, and scale our products.

Responsibilities

Lead a team of engineers who are passionate about large‑scale distributed systems for Splunk Observability Cloud in FedRAMP environments.
Manage across the organization to deliver quality products that delight Splunk’s passionate users.
Mentor and grow teams of tight‑knit engineers who are building a state‑of‑the‑art, cloud‑based environment for massive‑scale data processing.
Partner with Talent Acquisition to recruit, interview, and hire the best engineering talent to join Splunk’s growing SRE FedRAMP team.
Coach engineers to achieve more than they thought possible; you enjoy driving teams to success and are fulfilled by the success of others.
Manage a team working on reliability projects, including HA, Business Continuity Planning, disaster recovery, backup/restore, RTO, RPO, chaos engineering, application uptime and performance, capacity management & planning, SLIs, SLOs, error budgets, and monitoring dashboards.
Own the deployment and operations of large‑scale distributed data stores and streaming services, establishing design patterns for monitoring and benchmarking.
Establish and document production runbooks and developer guidelines, and drive tooling, toil reduction, and automation for production environments.
Lead incident management, improve MTTD/MTTR for services, and drive cloud cost optimization.

Qualifications

8+ years of experience handling large‑scale cloud‑native microservices platforms.
2+ years of hands‑on experience managing teams that deploy, operate, and monitor large‑scale Kubernetes clusters in the public cloud (AWS or GCP).
Experience with, and leading a team in, infrastructure automation and scripting using Python and/or Golang.
Experience managing remote teams.
Strong hands‑on experience in monitoring tools such as Splunk, Prometheus, Grafana, ELK stack, etc. for building observability for large‑scale microservices deployments.
Experience with deployment, operations, and performance management of large‑scale clusters such as Cassandra, Kafka, Elasticsearch, MongoDB, ZooKeeper, Redis, etc.
Excellent problem‑solving, triaging, and debugging skills in large‑scale distributed systems.
Preferred: Familiarity working with and/or managing in compliance environments such as HIPAA, GovCloud, State Government, Federal Government, SOC2, or FedRAMP.
Preferred: AWS Solutions Architect certification.
Preferred: Confluent Certified Administrator for Apache Kafka and/or Apache Cassandra Administrator Associate certifications.
Preferred: Experience with Infrastructure‑as‑Code using Terraform, CloudFormation, Google Deployment Manager, Pulumi, Packer, ARM, etc.
Preferred: Experience with CI/CD frameworks and Pipeline‑as‑Code such as Jenkins, Spinnaker, GitLab, Argo, Artifactory, etc.
Proven skills to effectively work across teams and functions to influence design, operations, and deployment of highly available software.
Bachelor's/Master's in Computer Science, Computer Engineering, or related technical field, or equivalent practical experience.

Compensation & Benefits

Annual Base Pay: $ - $ USD.

U.S. employees have access to quality medical, dental and vision insurance, a 401(k) plan with a Cisco matching contribution, short and long‑term disability coverage, basic life insurance, and numerous wellbeing offerings.

Employees receive up to twelve paid holidays per calendar year (including floating holiday), birthday day off, and up to 16 days of vacation time off per year for non‑exempt employees; exempt employees participate in Cisco’s flexible Vacation Time Off program.

Employees are eligible for sick time off (80 hours provided on hire date and each January 1st thereafter).

EEO Statement

Splunk, a Cisco company, is an Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.
