
Major Incident Manager

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Crusoe
Full Time position
Listed on 2026-03-01
Job specializations:
  • IT/Tech
    IT Support, Technical Support, Cybersecurity, IT Project Manager
Salary/Wage Range or Industry Benchmark: 80,000 to 100,000 USD yearly
Job Description

The Incident Manager role is critical to maintaining service reliability and preserving customer trust. This position directly impacts company success by minimizing downtime, managing high-severity incidents, and ensuring rapid resolution of complex technical challenges. You will lead the response to high-visibility incidents and customer escalations, acting as a central point of coordination to drive timely, effective outcomes.

In this role, you’ll spearhead the management of critical incidents from identification through resolution, while continuously improving incident response processes and support readiness. You’ll work cross-functionally with engineering, product, and customer teams to design scalable self-service support workflows, contribute to product improvements, and develop robust incident response strategies. You’ll also play a key role in mentoring team members, delivering training, and building knowledge resources that strengthen both internal teams and customer success.

We’re looking for a technically skilled professional with strong Linux expertise, excellent communication skills, and 4–5 years of customer-facing experience. Prior experience in incident management and on-call rotations is essential.

What You’ll Be Working On
  • Diagnose and resolve complex technical issues related to InfiniBand, containerization, and distributed training environments
  • Lead high-severity incident response efforts to ensure rapid mitigation and minimal disruption to customer operations
  • Manage customer escalations with professionalism, clarity, and urgency, ensuring stakeholder confidence throughout the incident lifecycle
Implement & Optimize
  • Guide customers through the implementation, configuration, and optimization of HPC infrastructure
  • Partner with customers to improve performance, scalability, and efficiency across their environments
Educate & Empower
  • Develop and deliver internal and external training materials, including live training sessions, documentation, and knowledge base articles
  • Provide ongoing enablement to help customers effectively adopt and maximize the value of company solutions
  • Lead incident response training and preparedness initiatives for internal teams
Collaborate Internally
  • Work closely with engineering and product teams to share customer feedback and operational insights
  • Influence product enhancements and reliability improvements based on real-world incident data
  • Contribute to the continuous improvement of incident management processes and the overall customer experience
What You’ll Bring to the Team
Technical Proficiency
  • Strong hands-on experience with Linux, virtualization, Kubernetes, and managing customer incidents
  • Solid understanding of the TCP/IP stack
  • Working knowledge of Infrastructure-as-Code (IaC) practices
  • Excellent written and verbal communication skills, with the ability to clearly explain complex technical issues
  • Proven problem-solving mindset with strong diagnostic and analytical abilities
  • 3–5+ years of experience in a team leadership role, serving as a liaison between internal teams and external customers
  • 4–5 years of customer-facing experience in a technical environment
  • Direct experience participating in or leading incident management efforts and on-call rotations
Bonus Skills
  • Programming experience in one or more languages
Benefits
  • Restricted Stock Units (RSUs) in a fast-growing, well-funded technology company
  • Comprehensive health insurance options, including HDHP and PPO plans, plus vision and dental coverage for you and your dependents
  • Employer contributions to Health Savings Accounts (HSAs)
  • Paid parental leave
  • Company-paid life insurance, short-term disability, and long-term disability coverage
  • 401(k) plan with a 100% company match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Subscription to the Calm app
  • MetLife Legal benefits
  • Company-paid Commuter FSA benefit of $200 per month