Director of DevOps
Listed on 2026-06-15
IT/Tech
IT Support, Cybersecurity, Systems Engineer, Cloud Computing: Infrastructure & Operations
The Director of DevOps sets the strategy and runs the day-to-day for Exostar’s global, 24x7 production operations. The role is the technical and operational backbone for every customer-impacting issue: it owns the engineering response that Customer Support depends on, partners closely with Support on customer outcomes, and is the single accountable leader on the engineering side of every Sev-1/Sev-2. This is a highly visible role in a fast-growing, fast-moving company: every dollar of cloud and tooling spend is reported up through this leader, and every step we take toward an AI-native operations posture is driven from this seat.
The successful candidate is an automation-first technologist, a cross-organizational operator who is equally credible with engineering, product, security, support, and finance, and a process-minded leader who treats manual toil as a defect and uses AI as the default tool to eliminate it.
Responsibilities:
24x7 Operations & Customer Issue Response
- Own production uptime, performance, and reliability across all Exostar SaaS products on a 24x7 basis, including SLO definition, on-call rotation, incident command, and stakeholder communications during major incidents.
- Own the engineering and operational response to every customer-impacting issue. Partner with Customer Support on triage, root cause, communications, and resolution — Support owns the customer relationship; this role owns the technical fix and the systemic prevention.
- Drive blameless postmortems and ensure every Sev-1/Sev-2 results in a durable engineering or process fix — not a “we’ll watch for it next time.”
- Be the single accountable engineering leader when a customer asks “what happened, and what are you doing about it?”
- Treat manual operational work as a defect. Set and enforce a target for the percentage of operational toil eliminated each quarter.
- Drive infrastructure-as-code, GitOps, automated remediation, and self-healing patterns across the production estate.
- Build a deployment platform that lets engineering teams ship safely and frequently without DevOps as a bottleneck.
- Own CI/CD pipeline strategy, golden paths, and the developer experience for shipping to production.
- Stand up and continuously evolve an AIOps practice: AI-driven anomaly detection, log summarization, intelligent alerting, and agentic incident triage.
- Deploy AI agents to draft runbooks, post first-pass postmortems, and accelerate engineering investigation of customer-reported issues.
- Mine operational and incident data with AI for recurring failure modes and capacity drift, and turn those findings into engineering bets.
- Operate as a peer to engineering, product, security, customer support, and finance leaders. This role lives at the intersection of those functions and has to be effective in all of them.
- Partner with Customer Support on the joint operating model: incident handoffs, ticket-to-engineering workflows, status communications, and shared metrics for customer experience during issues.
- Partner with Product Management and Finance on launch-readiness, capacity planning, and pricing/COGS modeling for new and existing services.
- Partner with the Security Office on compliance, audit readiness, and secure-by-default infrastructure (SOC 2, NIST 800-171, CMMC, FedRAMP-adjacent).
- Represent Operations in customer escalations, audit conversations, and revenue-impacting deals.
- Own the cost-of-goods line for Exostar’s hosted services. Forecast, track, and explain it monthly to Finance and senior leadership.
- Drive cloud cost optimization (commitments, right-sizing, idle elimination, architectural efficiency) as an ongoing discipline, not an annual project.
- Build the unit-economics views Finance needs to run the business: cost per customer, per product, per environment.
- Own vendor relationships and contracts for infrastructure, observability, and managed services; lead RFPs and renewals.
- Stand up and maintain the operating cadence: weekly ops reviews, monthly business reviews, quarterly capacity planning, incident review boards (jointly with Customer Support…