Team Lead Operations
50667 Cologne, North Rhine-Westphalia, Germany
Posted on 2026-02-27
IT/Information Technology
Cloud Computing, Systems Engineer, IT Project Manager, IT Support
About Working at envelio
Too easy is boring! Together, we are on a mission to drive forward the energy transition. We love what we do, and we are unafraid to dive in. We believe in taking ownership of our work and in continuously growing and evolving.
In short: own it, love it, grow with it.
We are a humble team of coffee and maté lovers with over 20 nationalities. With our geek humor, our love for emojis and random facts is only natural. Over 150 envelians are already on board. Dive in and thrive!
Your Role
As Team Lead Operations (all genders) you will build a deeply technical team of around 6 people focused on the stable, secure, and predictable operation of our product: the Intelligent Grid Platform (IGP).
Your team is responsible for product operations: keeping customer IGP environments healthy, managing operational processes such as incident handling and releases, and driving systematic reliability improvements based on real production signals. You work closely with Product, Customer Success, and Engineering teams. You also partner with the SRE/Infrastructure team that owns the platform foundation (cluster provisioning, deployment pipelines, observability tooling, etc.), while your team focuses on running IGP for customers day to day.
You will help evolve our operating model towards 24/7 reliability for customer environments (processes, ownership, and escalation), together with Engineering, SRE/Infrastructure, and Customer Success.
- You coach, mentor, and help your direct reports grow through 1:1s, performance reviews, and regular feedback
- You own and evolve the operational execution of the IGP across customer environments
- You ensure fast, structured handling of customer-impacting issues and incidents and drive effective follow-ups so the same issues do not recur
- You create clarity around ownership and escalation paths for production topics and coordinate efficiently across squads and Customer Success
- You drive operational excellence: pragmatic incident response, calm communication, and a continuous improvement culture with blameless learnings
- You balance short-term operational work (restore service) with long-term investments (reduce toil, improve reliability, improve tooling and runbooks)
- You shape team priorities, capacity, and roadmap: decide what gets attention now vs. what becomes a planned reliability investment
- You support hiring and team development by identifying and attracting talent, and by shaping growth paths within your team
Perfection is a myth! We're more interested in the human behind the screen, so think of these criteria as helpful directions. We're excited to see how your unique skills might fit in.
- You have strong experience operating complex applications and understand how to run services reliably under real-world constraints
- You are comfortable with incident management, root cause analysis, and prioritizing operational work under time pressure
- You have proven experience leading and developing a team in an operations-heavy environment
- You are strong at stakeholder management and coordination across teams (Engineering squads, Product, Customer Success)
- You have a continuous improvement mindset: you reduce operational toil via better processes, automation, and documentation
- You can communicate clearly in high-pressure situations and create alignment on next steps
- You are fluent in German and English
- Clear ownership for production topics, and efficient coordination across squads and Customer Success
- Structured incident handling (restore service, communicate clearly, then follow up on root causes)
- Release operations with a pragmatic risk mindset (safe changes, fast rollback when needed)
- Monitoring and alerting hygiene (signal over noise)
- Strong runbooks and automation to reduce operational toil over time
- Multi-cloud, hybrid on-prem setup with Kubernetes and Helm as the common denominator
- Application primarily written in Python and TypeScript
- Standard backing services like PostgreSQL, RabbitMQ, Redis
- GitLab & GitLab CI for managing the Software Delivery Lifecycle
- Terraform for Infrastructure as Code
- Join us fully remote (#LI-Remote) or at our lovely office in Cologne in a hybrid working mode
- Option for remote work from abroad (up to three months per year from anywhere in the EU or the USA)
- State-of-the-art technology and modern tech stack
- Excellent hardware equipment (16-inch MacBooks, 2 screens at your workplace)
- 30 holidays + 3 corporate holidays
- Support for your health through sports membership partnerships
- Flexible use of a monthly mobility budget (e.g. JobRad, public transport)
- Time and resources for individual growth
- envelio pension plan
- Regular company and team events