SW Eng - Incident Management
Job in Acton, Middlesex County, Massachusetts, 01720, USA
Listed on 2026-01-01
Listing for: Insulet Corporation
Full Time position
Job specializations:
- IT/Tech: IT Support, Cloud Computing
Job Description & How to Apply Below
Position Overview
The Staff Software Engineer – Incident Management will play a critical role in strengthening Insulet’s ability to respond to and recover from major incidents impacting our platform and services. This role focuses on engineering solutions that improve incident detection, response, and resolution, while partnering closely with Incident Managers, SREs and cross-functional teams. The ideal candidate combines technical expertise with a deep understanding of incident lifecycle management and operational resilience.
Responsibilities
- Driving the incident management process and coordinating efforts with all teams involved, including SRE, R&D, IT, vendors, and stakeholders, to resolve the incident.
- Responding to incidents and initiating the incident management process.
- Prioritizing incidents according to their urgency and business impact.
- Coordinating response efforts and collaborating with the incident response team to ensure that all protocols are diligently followed.
- Communicating with internal stakeholders on major incidents and impacts.
- Producing documents that outline incident timelines and actions taken during the incident.
- Coordinating post-incident RCAs with responders and SMEs and communicating to stakeholders.
- Design and implement automation for incident detection, triage, and resolution.
- Develop and maintain runbooks, playbooks, and tooling to streamline incident response.
- Collaborate with Incident Managers to improve processes and reduce Mean Time to Recovery (MTTR).
- Participate in major incident response efforts, providing technical leadership during high-severity events.
- Lead post-incident reviews and implement preventive measures to avoid recurrence.
- Contribute to continuous improvement of incident management frameworks and best practices.
- Partner with SRE and development teams to embed reliability and resilience into system design.
Qualifications
- Strong understanding of incident management principles and frameworks (e.g., ITIL).
- Hands-on experience with incident response in complex, distributed systems.
- Hands-on experience with conducting post-incident review (blameless post-mortem) sessions.
- Strong understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and container orchestration technologies (e.g., Kubernetes).
- Hands-on experience with monitoring and alerting tools (e.g., Datadog, PagerDuty, Prometheus, Grafana).
- Strong communication and leadership skills, with the ability to collaborate effectively with cross-functional teams.
- Ability to work under pressure and make decisions during high-impact incidents.
- Excellent troubleshooting and problem-solving skills.
- Experience with cloud platforms (AWS, Azure, or GCP).
- Understanding of compliance and security requirements in regulated environments.
- Ability to mentor others on incident response best practices.
- Proficiency in scripting or automation (Python, Bash, or similar) for operational tasks.
- Bachelor’s degree required (preferred field of study: Computer Science, Engineering, or related field).
- 7+ years of experience in software engineering, operations, or reliability roles.
- Minimum of 3 years focused on incident management or operational resilience.
- Proven track record of improving incident response processes and reducing MTTR.
- NOTE: This position is eligible for hybrid working arrangements (requires on-site work from an Insulet office).
- Travel is estimated at 5% but will flex depending on business need.
Mid-Senior level
Employment Type: Full-time
Job Function: Marketing, Public Relations, and Writing/Editing
Industries: Medical Equipment Manufacturing