Manager, Engineering; Production Orchestration
Listed on 2026-06-17
IT/Tech
Systems Engineer, SRE/Site Reliability
Location: New York
Category-defining tech. Career-defining work.
Lots of tech companies disrupt, but many fail when they try to scale. We're different. Cockroach DB makes it easier for companies to build and scale apps. This is how and why we're helping some of the most innovative companies on the planet. We tackle problems head-on and focus on solutions that create lasting impact.
Because when our customers win, we all win.
The Role
At the heart of Cockroach DB is our Production Orchestration team: the stewards of availability, reliability, and scalability across our cloud offerings and beyond. Built on a foundation of SRE principles and carrying forward years of operational practice, our core commitment is clear: ensuring our customers have a secure, reliable, and performant production service at scale.
We're looking for an Engineering Manager to lead our Production Orchestration team as part of a global Production Engineering organization. You'll drive foundational architectural changes to how we operate our fleet, champion AI-driven approaches to both development and operations, and foster a culture of operational excellence, ensuring Cockroach DB meets and exceeds our SLAs while keeping pace with rapid growth.
You will report to Tom Schmidt, Director of Production Engineering, who has led this team for 4+ years and will continue to be deeply involved in its technical direction. You'll be responsible for the growth and development of the team's engineers, day-to-day execution, and operational health, while bringing your own leadership and ideas to the table.
You Will
- Lead the Production Orchestration team, focused on the reliability, availability, and scalability of Cockroach DB in production.
- Own operational excellence. Ensure the team is meeting or exceeding our SLAs, running effective incident response, and continuously improving our operational posture. Every incident is treated as a learning opportunity.
- Partner across the global Production Engineering organization to align on shared goals, ensure smooth coordination across time zones, and drive cohesive execution.
- Drive automation and tooling. Relentlessly reduce operational toil by building systems that improve observability and scale our fleet without scaling headcount linearly.
- Leverage AI to improve how the team builds and operates. Help the team adopt AI-assisted development practices and identify applied AI opportunities to improve operational workflows, from alert triage to capacity planning to incident response.
- Contribute to foundational architecture. The team is building a new architectural initiative that will reshape how we operate our fleet. You'll help lead execution on this work and ensure the team has the space and support to deliver.
- Coach and develop your engineers. Provide direct, constructive feedback. Guide personal development and career growth beyond just technical skills. Managing performance and ensuring engineers are achieving their goals is essential to retaining a high-performing team.
- Partner with engineering and product leadership to shape the roadmap for Cockroach DB's operational capabilities and future products.
- Collaborate across teams to build and establish the tools and processes that empower everyone to make our customers successful.
In your first 30 days, you will become an integrated member of our engineering team. You'll spend time learning about the Production Orchestration team's domain, processes, and people, as well as Cockroach DB and Cockroach DB Cloud. You'll shadow on-call rotations, review recent incidents, and begin to understand the operational landscape. We believe it's essential for you to take this first month to become familiar with our technology and our company.
After 3 months, you will be fully integrated into the team and comfortable leading the Production Orchestration team's execution. You'll have built an understanding of our infrastructure, observability stack, and operational tooling. You'll understand the team's priorities and roadmap, have established working relationships with partner teams across Production Engineering, and be actively contributing to our incident response and operational review processes.
After 6 months, you'll be confidently managing the team and driving their work forward. You'll be shaping how the team approaches its new architectural work, identifying opportunities to apply AI to operational challenges, and ensuring that each member of your team is working on projects that align with both our needs and their interests. You'll be a key voice in Production Engineering's strategic direction.
You Have
- A passion for building relationships and a deep sense of responsibility for the welfare of the engineering team you manage, including their professional development and growth. We're looking for managers who want to empower their team to achieve their professional and personal goals.
- Experience leading global operations and/or incident management and response.
- Experience working on complex…