Incident Manager
Listed on 2026-06-18
IT/Tech
IT Support, IT Project Manager, Cloud Computing: Infrastructure & Operations, Cybersecurity
Category-defining tech. Career-defining work.
The next era of software won’t operate at human scale. Applications will create applications. Systems will coordinate with systems. Data and decisions will move faster than teams can react to them. The infrastructure powering that world cannot be fragile, reactive, or limited by the assumptions of the past.
Cockroach Labs exists to build what comes next — before the world requires it.
We created CockroachDB to survive failures, scale without compromise, and adapt to changing conditions automatically. Now we’re helping define a future where complexity fades into the background and infrastructure simply works, no matter the scale.
This is the kind of challenge that attracts people who want to shape industries, not just participate in them. The work is ambitious, the standards are high, and the impact is real.
The Role
As an Incident Manager at Cockroach Labs, you will lead the coordination and resolution of incidents across internal systems, CockroachDB Cloud, customer-hosted environments, and security/compliance events in the NA region. You will drive structured response efforts, partner with cross-functional teams to identify root causes, and help prevent recurrence in an environment where the pace is fast and the bar is high.
To be eligible for this role, you must be located in the Pacific time zone.
You Will
- Manage the full lifecycle of incidents from detection through resolution, ensuring adherence to established incident management processes.
- Lead and coordinate cross-functional response efforts to drive timely and effective incident resolution.
- Declare and escalate high-severity incidents, mobilizing appropriate stakeholders and leadership as needed.
- Serve as an escalation point for critical incidents and support crisis response activities.
- Lead structured root cause analysis and post-incident reviews, ensuring actionable follow-up items are identified.
- Track corrective actions to completion to reduce repeat incidents.
- Provide clear, timely communication to technical and non-technical stakeholders, including customer-facing updates when required.
- Contribute to incident metrics tracking (e.g., MTTR, MTTD, recurrence) and support reporting on trends and areas for improvement.
- Support ongoing improvements to incident management processes, documentation, and tooling.
- Participate in a rotational on-call schedule to ensure 24x7 coverage for high-severity incidents.
In your first 30 days, you will get to know CockroachDB, our customers, and our company. We will provide self-guided onboarding, with reading and hands-on material covering the company and some of the responsibilities of the role. During this period, you will also get acquainted with our incident management protocols and tools, and begin shadowing incident response activities to learn from other team members, with an eye to future improvements and optimizations.
After 3 months, you will be integrated into the company and will be familiar with the various systems we use. You will be able to manage incidents from both internal and customer environments and will be actively contributing to the Incident Management program. You will be leading incident response efforts, conducting root cause analyses, and participating in post-incident reviews.
You Have
- 5+ years of experience in technical operations, SRE, support, or incident management roles, including at least 2 years of direct Incident Management experience leading high-severity incidents.
- Prior experience working in a highly technical, fast-paced environment such as a cloud infrastructure, SaaS, or enterprise software company.
- Strong troubleshooting and analytical skills in a 24x7 operational environment.
- A demonstrated practice of integrating AI tools into your daily workflow to improve speed, quality, and decision-making.
- Strong judgment applied to validating, refining, or discarding AI-generated output.
- A continuous drive to experiment with and adopt better ways of working as tools evolve.
- Excellent written and verbal communication skills across technical and non-technical audiences.
- Abi…