Production Support and QA Specialist
Essex Junction, Chittenden County, Vermont, 05452, USA
Listed on 2026-06-02
IT/Tech
IT Support, Cloud Computing
Overview
Production Support & QA Specialist role with Agile Resources, Inc. on a 1+ year contract with potential for FTE conversion. 100% Remote anywhere in the United States.
This is a DevOps-focused role supporting mission-critical production systems. It serves as an operational gatekeeper, ensuring system stability, release quality, and reliable production operations across applications and platforms. You will play a hands-on role in incident management, quality validation, deployments, and change control while partnering closely with engineering, DevOps, infrastructure, and vendor teams.
What You’ll Do
Production Support & Operations
- Provide Tier 2 / Tier 3 production support for critical applications and integrations, ensuring high availability and rapid issue resolution
- Act as an escalation point for complex incidents, performing advanced troubleshooting across application, infrastructure, and vendor layers
- Lead incident response efforts, including triage, severity assessment, coordination, and resolution
- Perform root cause analysis (RCA) and document corrective and preventive actions
- Manage ticket and case workflows to ensure proper prioritization, escalation, and SLA adherence
- Support production environments during releases, planned maintenance, and emergency fixes
- Perform user access provisioning and deprovisioning in alignment with security and audit requirements
Quality Assurance, Deployment & Change Management
- Serve as a QA gatekeeper prior to production releases, validating readiness, test results, and operational criteria
- Review and approve release artifacts, deployment plans, and rollback strategies
- Provide deployment oversight, including real-time monitoring and post-deployment verification
- Participate in Change Control Board (CCB) activities, including risk assessment and post-implementation reviews
- Ensure adherence to ITIL-based change management processes and documentation standards
Monitoring, Reliability & Performance
- Monitor production systems using logging, alerting, and observability tools
- Oversee batch jobs and scheduled processes, investigating failures as needed
- Track certificate life cycles and coordinate renewals to prevent outages
- Monitor vendor releases and patches, assessing impact and coordinating upgrades
- Track and report operational metrics such as availability, incident trends, performance, and capacity
- Analyze recurring issues to drive proactive improvements in system reliability
Communication & Documentation
- Serve as a primary communication point during incidents, outages, and maintenance windows
- Translate technical issues into clear updates for business and leadership stakeholders
- Create and maintain runbooks, SOPs, troubleshooting guides, and operational documentation
- Ensure documentation remains accurate, current, and audit-ready
- 5+ years of experience in Production Support, DevOps, or IT Operations
- Strong hands-on experience supporting live, mission-critical systems
- Demonstrated experience with incident response, RCA, and change management
- Software testing / QA experience, including functional, regression, and integration testing
- Experience validating fixes and releases prior to production deployment
- Detail-oriented problem solver with a structured troubleshooting approach
- Clear and confident communicator during incidents and escalations
- Strong documentation and reporting skills
- Working knowledge of ITIL practices (Incident, Change, and Problem Management)
- QA or software testing certification (e.g., CTFL, CSTE, CSQA, or equivalent)
- Experience supporting high-availability or regulated production environments
- Familiarity with DevOps or SRE practices, including CI/CD and observability
- ITIL certification
- Contract
- Mid-Senior level