Site Reliability Engineer - SCI
in
10115 Berlin, Germany
Posted on 2025-12-20
Company:
SAP SE
Full-time
Professional specialization:
IT/Information Technology
Systems Engineer, Cloud Computing, Site Reliability Engineer, Network Engineer
Job description
We help the world run better
At SAP, we keep it simple: you bring your best to us, and we'll bring out the best in you. We're builders touching over 20 industries and 80% of global commerce, and we need your unique talents to help shape what's next. The work is challenging – but it matters. You'll find a place where you can be yourself, prioritize your wellbeing, and truly belong.
What's in it for you? Constant learning, skill growth, great benefits, and a team that wants you to grow and succeed.
- Build enterprise cloud infrastructure that provides European data sovereignty and hyperscaler-grade capabilities. You'll work on SAP Cloud Infrastructure and help solve complex distributed systems challenges at scale: multi-region networking, container orchestration, storage systems, and the APIs that connect them.
- We develop solutions using Go, OpenStack, and Kubernetes, tackling problems like:
How do you auto-scale thousands of containers across regions? How do you build resilient storage systems? How do you design APIs that handle massive traffic spikes? Incidents happen, so how do you keep their number low and make life better for customers and engineers?
- Your work will power SAP's production systems and thousands of customer environments. You'll contribute to infrastructure that enables organizations to run mission-critical applications with the performance and reliability they expect from leading cloud platforms.
- In your role as Site Reliability Engineer, you'll ensure the operational excellence of SAP Cloud Infrastructure and improve monitoring, preventive alerting, and reliability engineering practices for enterprise cloud services that provide European data sovereignty and hyperscaler-grade capabilities. You'll help maintain high availability and performance standards across distributed systems serving thousands of customer environments.
- Your focus will be on challenging observability solutions for robustness, implementing chaos engineering practices, questioning gaps, finding weak spots, and establishing and questioning SLOs/SLIs for complex infrastructure challenges such as multi-region networking, container orchestration at scale, and storage systems that handle massive traffic spikes. You'll tackle operational challenges including automated remediation, performance optimization, and incident response for mission-critical systems.
- Contributing to production systems that serve SAP's global customer base, you'll establish reliability standards and contribute to operational tooling that enables organizations to run mission-critical applications with enterprise-grade performance and reliability. Your work will ensure that European organizations can maintain data sovereignty while accessing world-class cloud capabilities that remain highly available and performant under demanding production workloads.
- SRE Foundation: 5+ years of Site Reliability Engineering or operations experience with a deep understanding of SLI/SLO/SLA concepts and error budget implementation. Relevant experience in Data Engineering or Data Analysis is a plus.
- Cloud & Infrastructure: Strong knowledge of virtualized infrastructure, ideally including OpenStack, Kubernetes, and multi-cloud environments, with experience managing hyperscaler-grade platforms.
- Automation & Monitoring: Proficiency in Python, Go, and Bash for reporting automation; expertise in Prometheus, Grafana, the ELK Stack, and distributed tracing is a plus.
- Reliability Engineering: Proven experience in high availability design, fault tolerance, chaos engineering, and performance optimization.
- Incident Management: Hands-on experience with on-call duties, post-mortem analysis, and systematic toil reduction through automation.
- Data Sources & Data Analysis: Handling diverse datasets of varying data quality should come naturally to you. We use OpenSearch, Prometheus, OpenTelemetry, PostgreSQL, Redis, etc. You are familiar with Jupyter Notebooks.
You'll join Plus One Central Engineering, where innovation meets collaboration. Our culture values engineering excellence, continuous learning, and diverse perspectives. We embrace modern…