Lead DevOps Engineer
Listed on 2026-05-16
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability
This role is located in Somerville, MA. We are a hybrid work environment and are in the office 3+ days per week.
To be considered for this role, candidates must have United States Citizenship due to the nature of the assignments. If you do not have U.S. Citizenship, you will not be considered for the position.
Tulip, the leader in AI-native frontline operations, is helping companies around the world equip their workforce with composable, connected apps, leading to higher quality work, improved efficiency, and end-to-end traceability across operations. Tulip's cloud-native, no-code platform, powered by embedded AI, is driving the digital transformation of industrial environments through composable, human-centric solutions that go beyond disrupting the Manufacturing Execution System (MES) category.
A spinoff out of MIT, Tulip is headquartered in Somerville, MA, with offices in Germany, Hungary, Singapore, and Israel. Tulip has been recognized as a World Economic Forum Global Innovator, a 2024 Deloitte Technology Fast award winner, one of Energage's Top Workplaces USA, and one of Built In Boston's "Best Places to Work" and "Best Midsize Places to Work".
About You
You're a senior infrastructure engineer who believes that the best system is one that runs itself. You thrive on building automation that eliminates toil, designing for resilience at global scale, and owning the full lifecycle of cloud infrastructure, from architecture to observability. You bring a bias for action and a continuous improvement mindset to everything you do: if something can be automated, you'll automate it; if a system is fragile, you'll make it robust. You're equally comfortable diving deep into a production incident and partnering with developers to make their lives easier. In this role you will coach and mentor fellow DevOps Engineers on technical best practices, tooling, and operational patterns, helping raise the collective capabilities of the team.
- 5-7+ years of hands-on DevOps or infrastructure engineering experience, with demonstrated ownership of production cloud environments at scale
- Proficiency with modern cloud infrastructure tooling — experience with Kubernetes, Helm, Terraform, Ansible, and major cloud providers (AWS and/or Azure) is highly relevant
- Proven experience mentoring and coaching engineers — whether formally or informally — and a genuine interest in developing the people around you
- Experience managing enterprise-grade data persistence layers, including NoSQL and SQL databases, key/value stores, and messaging systems (e.g., AMQP, MQTT)
- Familiarity with observability and monitoring tooling (e.g., Prometheus, Mimir, Thanos, Grafana) and a strong understanding of what good SRE practice looks like in a fast-growing SaaS environment
- Comfort driving team rituals — sprint planning, standups, retrospectives — and contributing to a high‑performing team culture
- Exposure to modern programming or scripting languages used in infrastructure contexts (e.g., Go, TypeScript, Python, Bash)
- Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
- Own the deployment, health, and continuous improvement of Tulip's multi‑cloud, multi‑region SaaS environments — including clusters spanning the US, Europe, and Asia
- Design and evolve cloud architecture to ensure customer availability, stability, and performance as Tulip scales globally
- Contribute to and help shape the infrastructure technical roadmap in partnership with engineering leadership
- Own and continuously improve Tulip's CI/CD infrastructure, driving toward a fully automated, human‑interaction‑free software delivery lifecycle
- Build automation tooling and internal systems that reduce operational toil and increase developer velocity, and bring the team along in building and owning them
- Define and maintain observability standards across Tulip's cloud environments, including metrics, alerting, logging, and distributed tracing
- Proactively identify performance degradation and capacity risks before they impact customers; lead incident response and drive root cause analysis
- Mentor and coach junior and mid‑level…