Production Reliability Engineer Job, Schiller Park area, Illinois, USA, IT/Tech

A top global trading firm is seeking a Production Reliability Engineer to join its Central Operations and Reliability Engineering team within Production Infrastructure. This group plays a critical role in managing and supporting a real-time, high-performance trading environment operating at global scale. The role sits at the intersection of infrastructure, automation, and live production support, with direct responsibility for system reliability, performance, and operational risk in a mission-critical environment.

What You’ll Do

Own and improve a large-scale production environment with a focus on reliability, performance, and operability
Proactively monitor and troubleshoot distributed, latency-sensitive systems
Build and maintain DevOps and automation tooling across configuration management, deployments, monitoring, data collection, and analysis
Use system and operational metrics to improve scalability and stability
Partner with engineers, operators, and stakeholders to investigate and resolve complex system issues
Coordinate production changes and manage incidents in collaboration with risk and operational support teams
Communicate directly with end users to manage incidents and drive technology improvements
Support reconciliation workflows related to system output and downstream processes
Evaluate and manage operational risk for production changes
Define, document, and continuously refine operational procedures
Mentor and support other reliability and operations engineers
Participate in shared operational and on-call responsibilities

What You Bring

Degree in Computer Science, Engineering, or equivalent professional experience
5+ years in DevOps, SRE, Linux Systems Engineering, or Network Engineering roles
3+ years of experience with Python and shell scripting; familiarity with C++ is a plus
Strong Linux expertise, including system internals, performance tuning, and system/network configuration
Solid understanding of networking fundamentals (routing, multicast, VLANs, Ethernet)
Detail-oriented mindset with a strong sense of ownership and urgency
Ability to support periodic on-call duties with reliable availability

Why This Role

You’ll work on high-impact systems where reliability matters, in a highly technical environment that values ownership, precision, and continuous improvement. The role offers deep exposure to complex infrastructure and real-time problem-solving without bureaucracy or brand-driven distractions.
