Senior Site Reliability Engineer (m/f/d)
Listed on 2026-02-16
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Support
Location: Germany
Overview
Are you passionate about building reliable, scalable systems and ensuring seamless operations in the healthcare industry? We are seeking an experienced Site Reliability Engineer (m/f/d) to join our team!
Since its founding in 2019, Famedly has been committed to digitizing medical communication processes in compliance with data protection regulations, and thereby revolutionizing the healthcare system. Famedly has launched the first gematik-certified TI messenger to improve communication and collaboration within the healthcare sector. It enables medical teams to share sensitive patient information, images and other files in real time and from any location, from medication schedules and lab results to X-rays.
As a dynamic, remote-first startup based in Berlin with a growing and experienced team, we work together every day towards our vision of a healthcare system without information barriers.
We are seeking a Site Reliability Engineer (m/f/d) to join our infrastructure team. In this role, you will ensure the reliability, scalability, and performance of our healthcare-critical systems. You'll design and implement SRE practices, build robust infrastructure, and collaborate with development teams to embed operational excellence throughout the software lifecycle.
Responsibilities
- Take ownership of the reliability, observability and performance of our backend systems, spanning Rust microservices, containerised deployments on Kubernetes (K8s) and production operations in healthcare-critical environments.
- Design, implement and evolve SRE practices: define service-level indicators (SLIs), service-level objectives (SLOs) and error budgets, conduct blameless post-mortems, plan capacity and develop disaster-recovery strategies.
- Build and maintain our infrastructure as code, CI/CD pipelines and configuration management, and simplify deployment workflows to support rapid, safe product iteration.
- Collaborate closely with development teams to embed reliability and operational thinking early in the development lifecycle.
- Lead automation of incident detection, alerting, diagnostics and remediation: instrumentation of services, structured logs, metrics, tracing, dashboards.
- Work cross-functionally to drive technical standards, share best practices, and mentor engineers on operational maturity.
- Participate in incident response and root-cause investigations. Drive improvements based on findings.
- Contribute to the architecture and roadmap of our platform: propose and evaluate new technologies/services, help evolve our Kubernetes footprint and cloud strategy to meet healthcare-grade compliance and scalability demands.
Requirements
- Excellent German and English communication skills, both written and spoken.
- Good understanding of modern software architecture, APIs, and system integrations.
- 5+ years of experience in SRE/DevOps or infrastructure engineering at scale (preferably with SaaS or B2B products in regulated industries).
- Strong hands-on experience with Kubernetes, container orchestration, service meshes (e.g., Istio) and microservice architecture.
- Strong hands-on experience with observability tools and practices: metrics, logging, tracing, and alerting (e.g., Prometheus, Grafana, Tempo).
- Proficiency in infrastructure as code, GitOps practices, and CI/CD pipelines.
- Experience with incident management, on-call rotations, and conducting post-mortems to drive continuous improvement.
- Understanding of reliability engineering principles: SLIs/SLOs, error budgets, availability modelling, capacity planning.
- Familiarity with cloud environments, networking, security and compliance in regulated spaces is a strong plus.
- Self-starter mindset, with an ability to influence and uplift engineering culture in a fast-growing startup context.
- Experience with Rust backends, multi-tenant architectures, Kubernetes operators or service meshes in regulated production environments.
- Experience with Nix and related tooling.
- Work in a fast-growing, ambitious startup in an exciting phase: we have grown very quickly since 2019 and we still have big plans! Famedly has launched the first gematik-certified TI messenger and we are…