Site Reliability Engineer
in
44787, Bochum, North Rhine-Westphalia, Germany
Posted on 2026-01-19
Company:
SonarSource
Full-time
Professional specialization:
IT/Information Technology
Cybersecurity, Systems Engineer, Cloud Computing, Network Security
Job Description
What You Will Do Daily:
- System Health Monitoring, Alert Triaging, and Error Budget Management:
Dedicate time to monitoring critical security infrastructure (e.g., identity platforms, firewalls, compliance systems) and core infrastructure components. Focus on using and maintaining dashboards tied to Service Level Objectives (SLOs), triaging high-severity alerts, and analyzing the current Error Budget burn rate to guide prioritization for the rest of the day.
- Infrastructure as Code (IaC) and Policy as Code Development:
Spend the largest portion of time writing, reviewing, and testing code (e.g., Python, Go, Terraform, or proprietary tools) to automate the deployment, configuration, and security hardening of infrastructure. This involves treating infrastructure and security policies as software to ensure consistency and prevent configuration drift.
- Toil Elimination and Automation of Operational Tasks:
Identify, scope, and implement automated solutions for manual, repetitive, and time-consuming tasks (toil) related to security patching, compliance checks, certificate rotations, or infrastructure maintenance. The goal is to continuously reduce the operational workload for the team.
- Security Pipeline and Observability Maintenance:
Maintain and enhance the DevSecOps security tools integrated into the CI/CD pipelines (e.g., static analysis, vulnerability scanning, security configuration checks). Ensure the end-to-end logging, metrics, and tracing (observability) systems for both infrastructure and security tools are robust, accurate, and provide immediate diagnostic capability during incidents.
- Incident Response Engineering and Post-Mortem Action:
Participate in the on-call rotation and actively engage in engineering solutions derived from post-mortems. This means turning incident root causes into preventative measures implemented via code, improving runbooks into automated actions, and reducing Mean Time To Resolution (MTTR) for future incidents.
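For readers unfamiliar with the term, the "Error Budget burn rate" mentioned above is commonly computed as the observed error rate divided by the error rate the SLO allows. The following is a minimal sketch of that arithmetic, not SonarSource's tooling; the `SloWindow` type, the function names, and the sample numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one evaluation window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(window: SloWindow, slo_target: float) -> float:
    """Observed error rate divided by the allowed error rate.

    1.0 means the budget is being spent exactly as fast as the SLO allows;
    values above 1.0 mean it would run out before the SLO period ends.
    """
    budget = error_budget(slo_target)
    if window.total_requests == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    observed = window.failed_requests / window.total_requests
    return observed / budget


if __name__ == "__main__":
    # Illustrative numbers only: 600 failures out of 200,000 requests.
    last_hour = SloWindow(total_requests=200_000, failed_requests=600)
    rate = burn_rate(last_hour, slo_target=0.999)
    print(f"burn rate: {rate:.1f}x")
```

With these sample numbers the window burns budget at roughly three times the sustainable pace, which is the kind of signal that would typically move reliability work ahead of feature work for the day.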
- Deep IaC Expertise:
Professional experience provisioning and managing complex infrastructure using tools like Terraform or CloudFormation (AWS), or similar tools like Ansible or Puppet for configuration management.
- Cloud / Platform Experience:
Hands-on experience with a major cloud provider (AWS, GCP, Azure) or managing large-scale internal/private cloud infrastructure.
- SLO / SLI Implementation:
Practical experience defining, measuring, and reporting on Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical services.
- Logging / Metrics / Tracing Stacks:
Proven experience with modern observability platforms (e.g., Prometheus / Grafana, ELK / EFK stack, proprietary systems, or vendor solutions like Datadog / Splunk) for proactive issue identification.
- Networking:
Strong understanding of core networking concepts (TCP/IP, DNS, Load Balancing, Firewalls, Proxies) sufficient to debug complex service connectivity and latency issues.
- Automation of Security Controls:
Experience implementing security best practices via code, such as automated vulnerability scanning, configuration hardening, secret management (e.g., HashiCorp Vault), and key rotation.
- Identity and Access Management (IAM):
Practical experience managing large-scale IAM systems (e.g., implementing least-privilege policies, single sign-on).
- Incident Management:
Experience running or significantly contributing to post-incident reviews (post-mortems) and prioritizing resulting engineering work (error budget management).
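As an illustration of what "implementing security best practices via code" and least-privilege checks can look like in practice, here is a small sketch that flags overly broad statements in a policy document. It assumes the policy is a dictionary following the AWS IAM JSON layout (`Statement`, `Effect`, `Action`, `Resource`); the function names and the sample policy are hypothetical and are not taken from the posting.

```python
from __future__ import annotations

from typing import Any


def _as_list(value: Any) -> list:
    """IAM allows a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy: dict) -> list[int]:
    """Return indexes of Allow statements whose Action or Resource is exactly '*'."""
    flagged = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements do not widen access
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    # Hypothetical policy used only to exercise the check.
    sample_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::example-bucket/*",
            },
            {"Effect": "Allow", "Action": "*", "Resource": "*"},
        ],
    }
    print(find_wildcard_statements(sample_policy))
```

The check is deliberately narrow: it only catches a bare `*` and would miss service-wide patterns such as `s3:*`, so a real pipeline gate would likely need a stricter rule set.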
- Our culture and mission set us apart. We have a dynamic work culture that values respect and kindness and embraces the right to fail (and get right back up again!).
- Great people make a great company. We value people skills as much as technical skills and strive to keep things friendly while still being passionate leaders in our domains.
- We have a flexible work policy that includes 3 days in‑office and 2 days work‑from‑home each week for those located near our office locations; some locations such as Dubai, India, Japan and Australia operate fully remotely.
- We have a growth mindset. We love learning and…