
Senior Site Reliability Engineer

in 10115 Berlin, Germany
Company: Talon.One LinkedIn Jobs
Full-time position
Posted on 2026-03-02
Professional specialization:
  • IT/Information Technology
    Systems Engineer, Site Reliability Engineer, Cloud Computing
Salary range or industry benchmark: 60,000 - 80,000 EUR per year
Job description

Talon.One is the most powerful incentives engine that unifies loyalty, promotions and gamification into one holistic platform. Backed by enterprise-grade security and scalability, Talon.One empowers companies to build personalized, profitable promotions and loyalty programs using any data.

Today, over 250 of the world’s most-loved brands including Adidas, Sephora and Carlsberg work with Talon.One to drive deeper engagement and lasting loyalty with their customers.

ABOUT THE TEAM

As our Senior Site Reliability Engineer, you will own and drive reliability across the Talon.One platform. This is a hands‑on senior role with broad impact. You will shape how we design, measure, and improve reliability across the entire engineering organization.

You will build and evolve our reliability foundations, from observability architecture and SLO frameworks to incident management and production standards. You will not only respond to incidents, but systematically eliminate their root causes. You will reduce operational toil through automation, improve signal quality across our monitoring systems, and guide engineering teams in building resilient, scalable services by design.

If you enjoy building practical systems, setting technical direction, and delivering measurable reliability improvements across a complex distributed platform, this role is for you.

ONCE YOU ARE HERE YOU WILL
  • Own reliability outcomes: availability, latency, error rates, and overall operational health.
  • Define and introduce SLOs and error budgets to establish clear reliability targets and drive engineering prioritization.
  • Guide the engineering organization with designs, standards, and best practices to ensure reliability and stability across the Talon.One product.
  • Build and evolve observability across metrics, logs, and traces, making the system understandable, not just monitored.
  • Design and improve our monitoring/observability architecture end‑to‑end, including data pipelines, signal quality, alert strategy, dashboards, SLO implementation, and cost‑aware scalability.
  • Eliminate operational toil by building reliability tooling and automation that reduces repetitive work and improves system resilience.
  • Drive structural improvements by identifying and addressing the underlying causes of incidents, not just managing their symptoms.
  • Lead and continuously improve incident management: on‑call readiness, severity handling, stakeholder communication, blameless post‑mortems, and strong follow‑through.
  • Drive continuous improvement: reduce noisy alerts, close reliability gaps, and automate recurring operational work.
  • Work deeply in Kubernetes and cloud environments, especially Google Cloud, and make deployments safer and more predictable.
  • Operate with GitOps principles: reliability changes are versioned, reviewed, traceable, and reproducible.
WHAT WE NEED YOU TO BRING TO THE TABLE
  • A strong sense of ownership for production health, proactively driving improvements in stability, performance, and resilience.
  • The ability to establish and evolve SLO‑driven reliability practices in an organization that is building this muscle.
  • Strong observability instincts with a focus on signal over noise, turning metrics, logs, and traces into actionable insight through clean dashboards, meaningful alerts, and well‑defined SLOs instead of alert fatigue.
  • Hands‑on experience with the Grafana stack, including Prometheus, Grafana Alloy, Loki, and Tempo, with practical knowledge of pipeline design, scaling considerations, and maintaining high signal quality.
  • Experience designing or significantly improving monitoring and observability architectures across collection, storage, retention, cardinality control, tagging strategy, cost awareness, and ensuring the reliability of the observability stack itself.
  • Solid understanding of Kubernetes workloads, networking, scaling patterns, and failure modes, with real‑world experience operating systems in Google Cloud environments.
  • Understanding of the OpenTelemetry protocol and its role in modern observability architectures.
  • A proactive mindset. You bring solutions, clearly articulate design options and trade‑offs, and drive initiatives through to…
Job requirements
10+ years of professional experience