Site Reliability Engineer
Job in Wilmington, New Castle County, Delaware, 19803, USA
Listed on 2026-06-03
Listing for: 3B Staffing
Full Time position
Job specializations:
- IT/Tech: Systems Engineer, IT Support
Job Description & How to Apply Below
- The hiring manager is not looking for candidates whose skills are primarily in cloud, DevOps, or infrastructure. Do NOT send profiles with this experience.
- This is an SRE-focused role: the hiring manager is looking for someone who is highly skilled in RCA and postmortem documentation.
- A software background is ideal (e.g., Java).
As part of the Site Reliability Engineering team within the Reference Data Engineering group, you'll help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to runtime problems. In this environment, you'll take the lead on relevant projects, backed by an organization that provides the mentorship you need to learn and grow. As an SRE, you'll be part of the application development organization, building more resilient, self-healing applications that require minimal production operations.
Key Responsibilities:
- Lead and conduct detailed Root Cause Analysis (RCA) for incidents, identifying underlying issues and recommending corrective actions.
- Document and communicate findings from RCA processes, ensuring transparency and knowledge sharing across the organization.
- Develop and maintain incident postmortem reports, providing insights and actionable recommendations to stakeholders.
- Monitor system performance and reliability metrics, proactively identifying potential issues before they escalate.
- Contribute to the design and implementation of automated monitoring and alerting systems to improve incident detection and response times.
- Continuously improve the incident management process, incorporating feedback and lessons learned from RCA activities.
- Participate in incident response activities.
Qualifications:
- Bachelor's degree or equivalent experience in a software engineering discipline
- 5+ years of Software Engineering experience
- Excellent communication skills, with the ability to convey technical findings to both technical and non-technical audiences
- Excellent debugging and troubleshooting skills
- Experience in Site Reliability Engineering, DevOps, or a similar role, with a focus on incident management and RCA.
- Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Dynatrace).
- Familiarity with containerization technologies (e.g., Docker, Kubernetes).