
Lead, NOC & Incident Management

Job in New York City, Richmond County, New York, USA
Listing for: Fluidstack
Full Time position
Listed on 2026-02-21
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Job Description
About Fluidstack

At Fluidstack, we're building the infrastructure for abundant intelligence. We partner with top AI labs, governments, and enterprises - including Mistral, Poolside, Black Forest Labs, Meta, and more - to unlock compute at the speed of light.

We're working with urgency to make AGI a reality. As such, our team is highly motivated and committed to delivering world-class infrastructure. We treat our customers' outcomes as our own, taking pride in the systems we build and the trust we earn. If you're motivated by purpose, obsessed with excellence, and ready to work very hard to accelerate the future of intelligence, join us in building what's next.

About the Role

Fluidstack is seeking a Lead, NOC & Incident Management to build and lead our cross-functional operations center (NOC) and incident management execution function. You'll shape how Fluidstack detects, triages, and responds to operational events across our entire AI infrastructure portfolio, from datacenter facilities to network backbone to internal platform services.

This role demands equal parts operational leadership and technical capability. You'll build the 24/7 monitoring and triage function, operationalize our incident management framework, and establish the operational culture that enables Fluidstack to meet stringent customer SLAs.

Success means Fluidstack's infrastructure teams stop spending time on operational toil - alert monitoring, carrier ticket management, incident bridge setup, shift coverage gaps - and instead focus on engineering and reliability work. You're the person who ensures someone is always watching the glass, incidents are handled consistently, and post-incident learning actually happens.

Focus
  • NOC Build & Operations: Stand up the cross-functional operations center from scratch. Assist in selecting and onboarding an MSP partner for Tier 1 coverage. Build staffing models, handoff processes, KPIs, and quality standards. Own the single question: "is someone qualified watching every alert, 24/7?"
  • Incident Management Execution: Create, deploy and operationalize Fluidstack's incident management framework. Manage the Incident Manager on-call rotation. Train engineers on incident roles. Run incident bridges during SEV0/SEV1 events. Ensure post-incident reviews happen on schedule and action items actually close. Partner with the Program Manager (process owner) to continuously improve the framework based on real-world execution.
  • Operational Readiness: Own the "are we ready?" question for every new domain onboarded to the NOC. Drive runbook quality assurance with functional teams. Plan and execute tabletop exercises. Coordinate with the Platform team on incident.io tooling workflows. Onboard new infrastructure domains (Facilities, Network, Systems) into NOC coverage on a phased schedule aligned with datacenter launches.
  • Cross-Functional Orchestration: Build tight operational partnerships with Network Ops, DC Ops, Systems/Platform, and Security teams. Define clear Tier 1 → Tier 2 escalation criteria for each domain. Ensure the NOC acts as a force multiplier for engineering teams by absorbing monitoring, triage, vendor ticket management, and incident coordination.
  • Vendor & Carrier Ticket Lifecycle: Establish processes for the NOC to manage the full lifecycle of carrier and vendor tickets - creation, tracking, SLA enforcement, escalation. Work with Network Ops and DC Ops to define ticket templates, escalation triggers, and vendor communication standards. Ensure no ticket falls through the cracks and every carrier/vendor interaction is documented.
  • Metrics & Continuous Improvement: Establish operational metrics (MTTA, MTTR, escalation rate, false positive rate, runbook coverage) and reporting cadence. Use data to identify patterns, reduce alert noise, improve runbook quality, and drive down incident response times. Produce monthly operational reports for leadership and customer-facing stakeholders.

About You
  • Proven NOC/Operations Center Leadership: 5+ years in network operations, infrastructure operations, or site reliability roles with significant experience running and building a NOC, operations center, or equivalent 24/7…