Senior Production Engineer; IC
Listed on 2026-05-30
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability
About Ontrac Solutions
At Ontrac Solutions, we partner with elite engineering organizations to build systems that operate at planetary scale. Our team supports complex cloud, infrastructure, automation, and production engineering initiatives for organizations modernizing critical platforms and high-availability environments.
We are seeking a highly skilled Senior Production Engineer (IC4) to support a critical customer engagement. This role is ideal for a hands-on engineer with deep experience in infrastructure modernization, Linux systems, Python automation, production support, and large-scale migration execution.
Role Overview
The Senior Production Engineer will work closely with Cloud Platform Engineering, Cloud Tech SRE, internal engineering teams, and customer stakeholders to support the modernization of legacy infrastructure into production-ready environments.
This individual will help lead complex operating system upgrades, packaging migrations, configuration management transitions, observability improvements, CI/CD hardening, and service onboarding efforts across a large-scale infrastructure footprint.
The ideal candidate is comfortable executing independently, owning technical work streams, resolving complex production issues, and documenting repeatable processes for long-term operational success.
Key Responsibilities
- Lead and execute large-scale OS modernization efforts, including migrations from RHEL7 to EL8/EL9 across approximately 1,700 systems and virtual machines.
- Support configuration management transitions, including Chef to CINC and legacy package/configuration migration from yinst to RPM.
- Build, maintain, and configure RPM packages to support infrastructure modernization and application migration efforts.
- Develop, execute, and improve automated runbooks for OS upgrades, configuration changes, service onboarding, and production support.
- Triage, own, and resolve complex production issues, including high-priority S-bugs and infrastructure-related incidents.
- Harden CI/CD pipelines, observability frameworks, and rollout/rollback mechanisms for legacy-to-modern infrastructure transitions.
- Partner closely with Cloud Tech SRE to provide follow-the-sun Tier-2 production support, including hands-on incident response and break/fix operations.
- Onboard services to modern monitoring, logging, and observability stacks.
- Support migrations from legacy monitoring tools such as Yamas to platforms such as Chronosphere, Prometheus, and Grafana.
- Assist with log management and Splunk integration strategies.
- Partner with application development teams during cloud cutovers, component migrations, and production readiness activities.
- Automate repetitive operational tasks using Python and related tooling.
- Document technical procedures, runbooks, migration steps, and operational standards.
- 5+ years of professional software engineering, production engineering, SRE, DevOps, or infrastructure engineering experience.
- Strong hands‑on experience with Python for automation, tooling, scripting, and operational workflows.
- Experience supporting Linux infrastructure in production environments, ideally including RHEL7, EL8, and EL9.
- Experience with OS modernization, infrastructure migration, or large-scale systems upgrade initiatives.
- Hands-on experience with package management and build processes, preferably including RPM packaging.
- Experience with configuration management tools such as Chef, CINC, Ansible, Puppet, or similar platforms.
- Strong understanding of production support, incident response, break/fix workflows, and Tier-2 operational support.
- Experience hardening CI/CD pipelines and supporting safe rollout/rollback processes.
- Familiarity with observability, monitoring, logging, and alerting frameworks.
- Ability to work independently, manage technical tasks, and communicate clearly with engineering and stakeholder teams.
- Strong documentation skills and the ability to create repeatable runbooks and operational procedures.
- Experience with Chef to CINC migrations.
- Experience with yinst to RPM migration or similar legacy packaging transitions.
- Experience supporting…