Lead Associate Principal, Software Engineering: DevOps

Job in Aurora, Kane County, Illinois, 60505, USA
Listing for: The Options Clearing Corporation (OCC)
Full Time position
Listed on 2026-06-01
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly
Job Description & How to Apply Below

[Required] AWS EC2, Kubernetes, Kafka, Jenkins, Terraform, Ansible, HashiCorp Vault

What You’ll Do

The successful candidate will collaborate with product, infrastructure, operations, security, and production control teams to elicit and fulfill technical requirements, while driving site reliability, system observability, and operational excellence across the platform.

Primary Duties and Responsibilities
  • Guides implementation using CI/CD pipelines in a Kubernetes environment
  • Directs review, configuration, and execution of Terraform and Ansible automation pipelines delivered by product teams
  • Guides the setup of common infrastructure platforms like multi‑region Kubernetes and Kafka clusters
  • Elicits requirements for application deployment and sizing to manage expected workloads
  • Defines and enforces Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets in collaboration with product teams
  • Leads blameless post‑mortems and drives resolution of action items to reduce repeat incidents
  • Designs and implements observability frameworks covering metrics, logs, and distributed tracing across all platform services
  • Drives toil reduction initiatives by identifying and automating repetitive operational work
  • Partners with product teams to embed reliability requirements and non‑functional requirements (NFRs) early in the software development lifecycle
  • Monitors application performance and tunes systems working with product teams
  • Confers with product team leads and practitioners to create deployment and reliability plans
  • Confers with Enterprise Architecture and Renaissance architecture teams to devise implementation architecture
  • Promotes standards across application configuration towards the highest security posture
  • Collaborates with access management and security teams on setting up roles and permissions using least privilege strategies
  • Collaborates with integration/performance testing teams to leverage integrated release testing in the Release Acceptance environment
  • Collaborates with production controls teams on monitoring, failover, logging, and alerting strategies
  • Owns and continuously improves incident response runbooks, on‑call rotations, and escalation procedures
  • Conducts capacity planning and load forecasting to proactively address scalability needs
  • Implements and validates infrastructure failover scenarios
  • Confers with Network team on all connectivity plans and issue resolution (including between on‑premises and AWS)
  • Follows and enables program‑level agile practices for efficient collaboration and delivery
  • Develops documentation for ORT technical infrastructure, architecture, and reliability support.

Supervisory Responsibilities

None

Qualifications
  • [Required] Understanding of Kanban and/or Agile methodologies
  • [Required] Familiarity with SRE principles as defined by Google SRE practices (error budgets, toil elimination, reliability hierarchy)
  • [Required] Able to succeed in a fast‑paced environment with frequent changes
  • [Required] Comfortable communicating with both technical and non‑technical audiences
  • [Required] Self‑starter — takes initiative to research, learn, and deliver; anticipates the play
  • [Required] Team player — humble, collaborative, and focused on making the entire team succeed.

Technical Skills & Background
  • [Required] AWS EC2, Kubernetes, Kafka, Jenkins, Terraform, Ansible, HashiCorp Vault
  • [Required] Observability tooling such as Prometheus, Grafana, OpenTelemetry, Datadog, or equivalent
  • [Required] Incident management platforms and on‑call tooling (e.g., PagerDuty, Opsgenie)
  • [Required] Microservices and streaming data‑intensive application architecture
  • [Required] Application architecture, networking, and security in the cloud
  • [Required] Setting up platforms in AWS for high‑performance requirements
  • [Required] Broad experience in API‑based development
  • [Required] Git and Artifactory for sourcing artifacts
  • [Required] Multi‑AZ, multi‑region failover architecture
  • [Required] Chaos engineering principles and tooling (e.g., Chaos Monkey, Gremlin, LitmusChaos)
  • [Required] Fluent with different data formats and structures: JSON, Protobuf, Avro
  • [Required] SQL and NoSQL databases, in‑memory data stores
  • [Required]…

Position Requirements
10+ years of work experience