Splunk Production Services Engineer
Listed on 2026-07-04
IT/Tech
Cybersecurity, Cloud Computing: Infrastructure & Operations
Job Description
Position Summary
We are seeking a highly skilled Splunk Production Services Engineer to support and operate a large-scale, business‑critical Splunk Enterprise and Splunk Cloud platform within a financial services environment.
Splunk is a foundational capability for the Information Security organization, enabling real-time security monitoring, threat detection, investigations, and regulatory reporting. This role is accountable for production stability, performance, data integrity, and security log readiness, requiring deep technical expertise and a strong operational ownership mindset.
The engineer will act as a trusted platform owner, ensuring Splunk availability, scalability, and reliability while partnering closely with Information Security, SOC, architecture, engineering, and operations teams.
Key Responsibilities
Splunk Platform Operations & Production Stability
- Own end-to-end production support for a highly distributed Splunk Enterprise and Splunk Cloud environment, including search head clusters, indexer clusters, deployers, deployment servers, and forwarders
- Ensure high availability, performance, and resiliency of the Splunk platform supporting security and operational use cases
- Lead incident response, troubleshooting, root cause analysis (RCA), and service restoration for Splunk and Cribl platforms
- Proactively identify risks, capacity constraints, and performance bottlenecks; implement preventive and tuning measures
- Serve as a key technical enabler for Information Security and SOC teams, ensuring timely, accurate, and reliable ingestion of security logs
- Onboard and normalize new data sources, supporting CIM compliance, field normalization, and SIEM best practices
- Tune ingestion pipelines using props.conf and transforms.conf, including index-time and search-time optimizations
- Build and support dashboards, searches, and alerts that enable threat detection, investigations, and reporting
- Administer and support the Cribl environment for data routing, filtering, enrichment, and cost optimization
- Ensure data integrity, reliability, and performance across Splunk ingestion pipelines
- Collaborate with architecture teams on data flow strategies and onboarding standards
- Develop and maintain runbooks, SOPs, installation guides, and operational documentation
- Adhere to change management, incident management, and SLA commitments using ITSM tools
- Operate effectively in a regulated banking environment, supporting auditability and compliance requirements
- 5+ years of hands‑on experience administering large-scale Splunk Enterprise or Splunk Cloud environments
- Indexer clustering and search head clustering
- Universal and heavy forwarder architectures
- SmartStore / S3-compatible object storage
- SPL, search optimization, summary indexing, data model acceleration
- Deep experience with security log ingestion and SIEM use cases
- Proven ability to lead production incidents, perform RCA, and drive preventive solutions
- Strong Linux administration skills and experience managing Splunk configuration and apps
- Experience working in 24x7 production environments with high availability expectations
- Excellent written and verbal communication skills, with the ability to engage senior technical and business stakeholders
- A production owner’s mindset
- Deep technical credibility in Splunk and data pipelines
- Ability to operate calmly and decisively during high‑severity security and platform incidents
- Strong partnership with Information Security, where Splunk availability and data quality are mission‑critical to protecting the bank
- Splunk certifications such as Enterprise Admin or Enterprise Architect
- Experience with Splunk Enterprise Security (ES) and SOAR (Phantom or equivalent)
- Exposure to cloud logging and security architectures (AWS, Azure, GCP)
- Knowledge of Red Hat Enterprise Linux and Windows Server administration
- Experience with monitoring, APM, and event management tools
- Strong understanding of security, network, system, and database operations
- Ability to balance…