
Site Reliability Engineer (f/m/x)

in 50667 Cologne, North Rhine-Westphalia, Germany
Company: ilert GmbH
Full-time position
Posted on 2026-01-15
Professional specialization:
  • IT/Information Technology
    Systems Engineer, Site Reliability Engineer
Job description
Job title: Site Reliability Engineer (f/m/x)

Location: Hybrid, Cologne (Rheinauhafen), 3 days in the office, 2 remote (Tue Thu)

Team: Engineering. Reports to: CTO

Keep the world awake: build reliability at scale

ilert helps thousands of DevOps & IT teams detect, fix, and communicate incidents faster.

Our platform is mission-critical: customers rely on us 24/7 to keep their always-on businesses running.

As a Site Reliability Engineer at ilert, you'll own the reliability, performance, and scalability of our core platform across AWS, Kubernetes, Kafka, and more.

Tasks

Build & operate a highly available platform

  • Run and evolve our AWS-based infrastructure
  • Operate and optimize self-managed Kafka and ClickHouse clusters and our observability stack
  • Ensure resilience, disaster recovery, and capacity planning across the stack

Improve reliability & performance

  • Build and maintain SLOs, SLIs, error budgets, and observability dashboards
  • Debug production issues across layers (networking, Kubernetes, application, DB)
  • Improve performance of our ingestion pipeline

Automation & tooling

  • Automate operations with Terraform, Helm, Kubernetes operators, and internal tooling
  • Build tooling for safer deploys, blue/green rollouts, and automated verification
  • Strengthen incident response workflows through deep collaboration with our AI SRE agent team

Security & compliance

  • Implement best practices for workload isolation, secrets management, IAM, and auditability
  • Support our ISO 27001 posture by automating controls and hardening our infrastructure

Cross-functional impact

  • Partner with Backend, AI, and Product teams to design reliable services
  • Participate in the on-call rotation
  • Lead post-incident reviews and drive long-term reliability improvements
Requirements
  • 3 years' experience as an SRE, Platform Engineer, DevOps Engineer, or Infrastructure Engineer
  • Strong hands‑on experience with AWS, Kubernetes, Linux internals, networking, and performance tuning
  • Experience operating self-managed distributed systems, ideally Kafka or ClickHouse
  • Strong understanding of observability
  • Experience automating infrastructure with Terraform and CI/CD systems
  • Fluent English (our working language); German optional
Benefits
  • Product-centric - 100 % focused on solving a mission‑critical pain felt by every always‑on business
  • Hybrid freedom - 2 days remote by default; gorgeous Rheinauhafen roof terrace when you're in town
  • Focus > meetings - We time‑box syncs, favour async docs, and protect maker time
  • 28 days off - plus public holidays
  • Commute perks - subsidised public transport
Key Skills

Kubernetes, FMEA, Continuous Improvement, Elasticsearch, Go, Root Cause Analysis, Maximo, CMMS, Maintenance, Mechanical Engineering, Manufacturing, Troubleshooting

Employment Type: Employee

Experience: years

Vacancy: 1
