Lead DevOps Engineer, Somerville, Massachusetts, USA (IT/Tech)

This role is located in Somerville, MA. We have a hybrid work environment, with 3+ days per week in the office.

To be considered for this role, candidates must have United States Citizenship due to the nature of the assignments. If you do not have U.S. Citizenship, you will not be considered for the position.

Tulip, the leader in AI-native frontline operations, is helping companies around the world equip their workforce with composable, connected apps, leading to higher quality work, improved efficiency, and end-to-end traceability across operations. Tulip's cloud-native, no-code platform, powered by embedded AI, is driving the digital transformation of industrial environments through composable, human-centric solutions that go beyond disrupting the Manufacturing Execution System (MES) category.

A spinoff out of MIT, Tulip is headquartered in Somerville, MA, with offices in Germany, Hungary, Singapore, and Israel. Tulip has been recognized as a World Economic Forum Global Innovator, a 2024 Deloitte Technology Fast award winner, one of Energage's Top Workplaces USA, and one of Built In Boston's "Best Places to Work" and "Best Midsize Places to Work".

About You

You're a senior infrastructure engineer who believes that the best system is one that runs itself. You thrive on building automation that eliminates toil, designing for resilience at global scale, and owning the full lifecycle of cloud infrastructure, from architecture to observability. You bring a bias for action and a continuous improvement mindset to everything you do: if something can be automated, you'll automate it; if a system is fragile, you'll make it robust. You're equally comfortable diving deep into a production incident and partnering with developers to make their lives easier. In this role you will coach and mentor fellow DevOps Engineers on technical best practices, tooling, and operational patterns, helping raise the collective capabilities of the team.

What Skills Do I Need?

5-7+ years of hands‑on DevOps or infrastructure engineering experience, with demonstrated ownership of production cloud environments at scale
Proficiency with modern cloud infrastructure tooling — experience with Kubernetes, Helm, Terraform, Ansible, and major cloud providers (AWS and/or Azure) is highly relevant
Proven experience mentoring and coaching engineers — whether formally or informally — and a genuine interest in developing the people around you
Experience managing enterprise‑grade data persistence layers, including NoSQL and SQL databases, key/value stores, and messaging systems (e.g., AMQP, MQTT)
Familiarity with observability and monitoring tooling (e.g., Prometheus, Mimir, Thanos, Grafana) and a strong understanding of what good SRE practice looks like in a fast‑growing SaaS environment
Comfort driving team rituals — sprint planning, standups, retrospectives — and contributing to a high‑performing team culture
Exposure to modern programming or scripting languages used in infrastructure contexts (e.g., Go, TypeScript, Python, Bash)
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience

Key Responsibilities

Own the deployment, health, and continuous improvement of Tulip's multi‑cloud, multi‑region SaaS environments — including clusters spanning the US, Europe, and Asia
Design and evolve cloud architecture to ensure customer availability, stability, and performance as Tulip scales globally
Contribute to and help shape the infrastructure technical roadmap in partnership with engineering leadership
Own and continuously improve Tulip's CI/CD infrastructure, driving toward a fully automated, human‑interaction‑free software delivery lifecycle
Build automation tooling and internal systems that reduce operational toil and increase developer velocity, and bring the team along in building and owning them
Define and maintain observability standards across Tulip's cloud environments, including metrics, alerting, logging, and distributed tracing
Proactively identify performance degradation and capacity risks before they impact customers; lead incident response and drive root cause analysis
Mentor and coach junior and mid‑level…