Director of Fleet Management
Listed on 2026-03-01
IT/Tech
Systems Engineer, Cloud Computing, IT Project Manager
About Nscale
Nscale is taking on the hyperscalers by building a vertically integrated GenAI cloud platform. We own the data centres, software, and applications that power today's AI applications using sustainable technology solutions. We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work.
Collaboration is key, and we work together swiftly and respectfully, embracing adaptability and resilience in all we do.
The Role
We're seeking a Director of Fleet Management to lead the engineering team building Nscale's Fleet Manager platform. You'll oversee the development of workflow automations that provision, test, and remediate GPU nodes and network switches at scale—ensuring our infrastructure is healthy, reliable, and continuously optimised.
Fleet Manager is central to Nscale's operations, automating the entire lifecycle of our compute infrastructure from initial device enrolment through burn‑in testing, network standup, and GPU health monitoring with self‑healing capabilities. You'll balance leading a team of software engineers with hands‑on contribution to the production‑grade, Python‑based automation systems that keep our AI cloud running at peak performance.
What you'll do
- Lead and mentor a team of software engineers building Fleet Manager's workflow automation systems.
- Own the technical roadmap and delivery of device provisioning, validation, testing, and remediation workflows.
- Design and build workflow orchestration systems to automate GPU node and switch lifecycle management at scale.
- Partner with Infrastructure, Platform, and SRE teams to integrate Fleet Manager with DCIMs, NetBox, OpenStack, and other infrastructure tooling.
- Establish engineering standards for reliability, observability, and operational excellence across all Fleet Manager services.
- Build Python‑based (or similar) automation workflows for hardware enrolment, burn‑in testing, network configuration, and GPU health checks.
- Drive architecture decisions balancing automation complexity, reliability, and maintainability.
- Collaborate with Product and stakeholders to translate infrastructure operational needs into robust, scalable automation.
- Set execution cadence and delivery standards: sprints, reviews, incident management, and on‑call practices.
What you'll bring
- 8+ years' experience building and operating production software systems, with proven technical leadership in infrastructure or automation tooling.
- Strong Python engineering fundamentals and experience building workflow automation systems.
- Hands‑on experience with workflow orchestration tools (Temporal, Airflow, Prefect, or similar).
- Track record working with infrastructure tooling such as DCIMs, ERP systems, or data centre management platforms.
- Experience integrating with infrastructure APIs and systems (bare metal provisioning, IPMI, network automation, monitoring).
- Strong people management and mentorship skills, with ability to hire and develop high‑performing engineering teams.
- Deep understanding of operational excellence: SLOs, monitoring, alerting, incident response, and production reliability.
- Experience building systems that automate hardware lifecycle management, provisioning, or remediation workflows.
- Comfortable working closely with Infrastructure and Operations teams to understand real‑world constraints and requirements.
- Passion for building automation that eliminates toil and scales infrastructure operations.
- Build automation that scales infrastructure operations effortlessly.
- Balance pragmatic solutions with long‑term architectural vision.
- Foster a culture of ownership and operational discipline.
What we offer
- Highly competitive package (base + equity) with reviews every 12 months.
- Join the fastest‑growing tech startup. This is your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting‑edge AI.
- Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things,…