Sr. Engineering Manager Job San Francisco area, California USA, IT/Tech

The Opportunity

Every product surface at Hinge Health, each one helping 18M+ people move beyond pain, depends on the foundational infrastructure your team will own. As Senior Engineering Manager, Service Enablement, you’ll lead the platform-minded team responsible for keeping our cloud infrastructure rock-solid while accelerating the shift to AI-native engineering workflows. Your work will be a force multiplier for every engineer, product manager, and designer in the company.

This is a high-impact leadership role for someone who thrives at the intersection of platform reliability, developer enablement, and autonomous AI-driven operations.

What You’ll Accomplish

In your first 3 months:

Establish relationships with key stakeholders across Security, Product Engineering, Data Engineering, and the India-based SRE team. Audit the current infrastructure posture, on-call practices, and CI/CD pipelines to identify the highest-leverage improvement opportunities.
Assess your team of 8–10 engineers (Staff, Senior, and mid-level SREs), understand individual growth trajectories, and begin shaping a coaching-first operating cadence.

In your first 6 months:

Own Platform Reliability & Scalability
- Guarantee stability and performance across multiple EKS clusters, maintaining 99.9%+ uptime while optimizing cost-efficiency as AI workloads scale.
Drive the Harness Engineering Initiative
- Define safety rails, test harnesses, and verification systems that allow autonomous AI agents to reliably build, test, and maintain infrastructure in production-grade environments.
Scale the NestJS Monorepo & CI/CD Platform
- Accelerate migration of all NestJS services into the monorepo, ensuring NX, GitHub Actions, and Okteto deliver a world-class developer experience.

In your first year:

Execute Vendor Strategy & Cost Optimization
- Own relationships with AWS, Datadog, Cloudflare, Temporal, and Infisical; deliver significant cost savings as Hinge Health scales infrastructure and AI workloads.
Champion Operational Excellence
- Run a follow-the-sun support model, maintain 24/7 on-call coverage, lead the Incident Review Board, and continuously improve SLOs, runbooks, and observability to reduce MTTR.
Build a High-Performing, Inclusive Team
- Develop talent toward Senior and Staff levels, maintain sustainable on-call practices, and foster a culture of blameless retros and continuous improvement.

Who You Are

A Platform Thinker - You see infrastructure as a product. You obsess over developer experience metrics (DXI, DORA, PR throughput) and build systems that reduce friction for every team you serve.
AI-Forward - You’re energized—not threatened—by the shift to AI‑assisted engineering. You’ve experimented with tools like Cursor, Claude Code, or Copilot and can envision a world where engineers act as architects and auditors of AI‑generated code.
A Trust Builder - You communicate effectively across engineering, security, product, and executive stakeholders to align infrastructure decisions with business priorities.
A Learn‑it‑All - You don’t have all the answers, but you have a proven framework for finding them. You lead blameless retros that convert insights into action and measure what matters most.
Hands‑On & Accountable - You know the ins and outs of your team’s tech stack and spend ~15% of your time working in the code. You take owner‑operator pride in supporting production systems.
Resilient - You thrive in fast-paced, ambiguous environments and deliver results even with imperfect data, whether you’re navigating a breaking‑change migration at scale or an unexpected incident.

Basic Requirements

10+ years of professional experience in technology, with depth in SaaS platforms and large‑scale distributed systems.
4+ years managing engineering teams of 6+ direct reports, including Senior and Staff-level engineers.
Deep hands‑on expertise with cloud infrastructure (AWS), container orchestration (Kubernetes/EKS), and Infrastructure as Code (Terraform).
Proven track record owning platform reliability at scale - including incident management, SLO‑driven operations, and cost optimization.
Cross‑functional leadership experience spanning multiple…