Production Support Engineer
Listed on 2026-02-16
IT/Tech
IT Support, Systems Engineer
Cloud Operations is a fast-paced team responsible for ensuring the reliability, security, and performance of IDX’s 24x7x365 SaaS platform.
As a Production Support Engineer, you will develop deep product and platform knowledge and apply your technical expertise to investigate, triage, and resolve production incidents and system defects. You will play a critical role in maintaining and improving the availability and performance of services our clients rely on, using monitoring, analysis, and automation to drive measurable improvements.
In this role, you will perform detailed defect analysis to assess impact and urgency, leverage specialized tools to support root cause investigations, and partner closely with Engineering and other teams to drive resolution. You will also contribute to continuous improvement efforts by streamlining manual processes through automation and cloud technologies.
You’ll be part of a highly collaborative team that responds to live incidents, proactively monitors business-critical systems, and operates with a strong sense of ownership and accountability. Together, we’re focused on protecting our clients from identity risk while building a company and workplace we’re proud of.
Role and Responsibilities
Monitoring and Alerting
- Proactively monitor production systems to ensure environment health, stability, and availability using AWS-native and third-party monitoring tools (e.g., CloudWatch, Dynatrace).
- Respond to and triage major production incidents, gathering logs, metrics, and other data to validate, reproduce, and assess impact.
- Drive incident investigations by providing clear summaries, validating issues across platforms, and escalating to appropriate teams for resolution.
- Assess, log, categorize, and track system defects in accordance with established defect management processes and standard operating procedures.
- Monitor defect status through resolution and communicate progress to stakeholders.
- Use specialized tools and analysis to provide actionable insights that support root cause identification and ongoing platform improvements.
- Communicate incident and defect analysis to Management, Information Security, Client Services, and Engineering.
- Support outage communications, including drafting and distributing updates for internal teams, executive leadership, and external customers as appropriate.
- Contribute to Root Cause Analysis (RCA) documentation, detailing incident timelines, impact, contributing factors, and corrective actions.
- Generate and maintain service availability and performance metrics for reporting and trend analysis.
- Partner closely with Engineering to troubleshoot and resolve production issues.
- Develop and maintain automation to improve reliability, reduce manual effort, and ensure repeatable operational outcomes.
- Address Information Security–related requests through manual intervention or Infrastructure as Code (Terraform).
- Participate in code and infrastructure reviews with a focus on security, reliability, and cost efficiency.
- Monitor service request queues and manage support tickets in alignment with established SLAs.
- Document systems, processes, and operational procedures to ensure clarity and knowledge sharing.
- Use diagnostic tools to identify root causes and implement effective resolutions.
- Perform additional duties as assigned to support the stability and success of the platform.
- Experience: 4+ years of progressive experience in SaaS operations, application administration, or technical support roles supporting production systems.
- Education: Bachelor’s degree in a technical field or equivalent hands-on experience supporting SaaS technologies in AWS.
- Technical Skills: Hands-on experience with AWS cloud services, Azure DevOps, Linux-based systems, containerized workloads (Docker), application monitoring tools, Infrastructure as Code (Terraform), scripting, and Git.
- Availability & Teamwork: Willingness and ability to participate in after-hours incident response and on-call rotations. Flexible, dependable, and comfortable collaborating across teams.
- Performance & Ownership: Self-motivated and results-driven, with the ability to apply specialized knowledge to solve complex operational problems.
- Attention to Detail: Able to work independently while maintaining strong communication and a collaborative presence within the team.
IDX is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other legally protected characteristic.