
Application Support/SRE Lead

Job in Newark, Essex County, New Jersey, 07175, USA
Listing for: Aegistech
Full Time position
Listed on 2026-02-12
Job specializations:
  • IT/Tech
    IT Support, Cloud Computing, IT Project Manager, Systems Engineer
Salary/Wage Range or Industry Benchmark: 60,000 - 80,000 USD per year
Job Description

Our client in Northern NJ is seeking a full-time Production Support/Site Reliability Engineering Lead. This is a hybrid position, onsite in the office 4-5 days a week. Local candidates only; relocation is not offered.

This position is not eligible for visa sponsorship. Please, no third parties.

Job Overview

The Production Support & SRE Manager owns end-to-end production operations for our SaaS applications. This role leads L1/L2 application support, drives Incident and Problem Management processes, and champions Site Reliability Engineering (SRE) best practices. It is a hands-on leadership position requiring strong technical depth, operational excellence, and exceptional communication skills. The manager will collaborate closely with Development, QA, Infrastructure, and Database teams to ensure system stability, reliability, and high availability across all environments.

Key Responsibilities

Incident & Problem Management
  • Own the full Incident Management lifecycle, from detection through resolution and post-incident review.
  • Lead and coordinate incident bridge calls with customers and internal teams for high-priority issues.
  • Ensure incidents are logged accurately, prioritized correctly, and resolved within defined SLAs.
  • Maintain clear, timely communication with internal stakeholders and clients during outages and major incidents.
  • Drive Problem Management by identifying recurring issues, patterns, and systemic weaknesses.
  • Gather technical inputs from cross-functional teams to produce accurate, detailed RCA documentation.
  • Prepare and present structured RCA reports, including impact, timeline, root cause, and corrective actions.
SRE & Operational Excellence
  • Define and maintain SLIs/SLOs for critical services (availability, latency, error rates, throughput).
  • Champion observability across systems: logging, metrics, tracing, dashboards, and alerting.
  • Improve and standardize monitoring and alerting for Angular, C#, and SQL Server–based applications.
  • Identify and implement automation opportunities (runbooks, self-healing, deployment checks, validation scripts) to reduce manual toil.
  • Participate in capacity planning, performance tuning, and resilience testing.
  • Lead and mentor L1/L2 support engineers and SRE-focused team members.
  • Establish clear expectations around ticket hygiene, communication, and ownership.
  • Conduct regular operational reviews covering backlog, aged incidents, recurring issues, SLAs, and reliability metrics.
  • Partner with development managers and product owners to prioritize stability and reliability improvements alongside feature delivery.
  • Define, document, and continuously improve Incident and Problem Management processes aligned with ITIL and SRE best practices.
  • Ensure all incidents, problems, and changes are properly documented in the ticketing system.
  • Create and maintain operational dashboards and reports for leadership and key stakeholders.
  • Ensure the team builds and maintains knowledge base articles and runbooks to accelerate L1/L2 resolution.
Qualifications

Required
  • 5+ years of experience in Production Support, Application Support, SRE, or Operations for web-based/SaaS applications.
  • 3+ years in a leadership role (Manager/Lead) overseeing production support and/or SRE functions.
  • Strong experience leading P0/P1 incidents and coordinating multi-team responses.
  • Proven experience with Problem Management and Root Cause Analysis for complex, cross-functional issues.
  • Hands-on experience with web application environments, preferably Angular, C#/.NET, and SQL Server.
  • Experience with monitoring, logging, and alerting tools; strong familiarity with observability dashboards.
  • Ability to read and interpret application logs, metrics, and distributed traces.
  • Ability to analyze SQL queries and diagnose database performance issues (blocking, deadlocks, slow queries).
  • Excellent verbal and written communication skills, with the ability to explain technical issues to non-technical audiences.
  • Strong analytical, critical thinking, and problem-solving abilities.
Preferred
  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or related field.
  • Experience supporting applications for Health Plans or Insurance organizations.
  • Exposure to regulated environments such as healthcare (HIPAA/HITECH, HITRUST, NIST-based controls).