
Site Reliability Engineering (SRE) Architect - CRL - Germany

in 80331, Munich, Bavaria, Germany
Company: Infosys
Full-time position
Posted on 2026-01-16
Professional specialization:
  • IT/Information Technology
    Systems Engineer, Cloud Computing
Job description
Job title: Site Reliability Engineering (SRE) Architect - CRL - Germany

The Role

We are looking for a visionary and highly experienced SRE Architect to lead the design and implementation of our reliability and scalability strategy. You will be the principal architect responsible for creating the blueprint for our production systems, ensuring they are resilient, performant, and highly available. This is a senior-level role that combines deep technical expertise with strategic thinking to influence the entire engineering organization.

You will define the standards and frameworks that empower our SRE and development teams to build and operate world-class services.

Key Responsibilities
  • Architectural Design & Strategy: Design and architect robust, scalable, and fault-tolerant infrastructure and application services on public cloud platforms (AWS, GCP, Azure). Define the long-term vision for system reliability and performance.
  • Reliability Frameworks: Establish and govern the standards for Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets across all engineering teams.
  • Observability & Telemetry: Architect a comprehensive observability strategy. Design the systems for logging, metrics, tracing, and alerting to provide deep insights into system health and facilitate rapid incident response.
  • Automation & Infrastructure as Code (IaC): Lead the strategy for automation and IaC. Design reusable patterns and frameworks using tools like Terraform and Ansible to ensure consistent, repeatable, and secure infrastructure provisioning.
  • Resilience & Chaos Engineering: Proactively identify and mitigate reliability risks. Design and champion the implementation of resilience patterns, disaster recovery plans, and chaos engineering experiments to validate system robustness.
  • Technical Leadership & Mentoring: Act as a thought leader and subject matter expert in reliability engineering. Mentor SREs and developers, evangelize best practices, and lead architectural review sessions to ensure reliability is a core component of every feature.
  • Incident Management Evolution: While not the primary on-call responder, you will analyze major incidents to identify architectural weaknesses and drive the necessary design changes to prevent recurrence. You will help evolve our postmortem culture and incident response capabilities.
Required Qualifications & Skills
  • Experience: 10+ years of experience in software engineering, DevOps, or systems engineering, with at least 5 years in a senior SRE or systems architecture role.
  • Cloud Expertise: Expert-level knowledge of at least one major cloud provider (AWS, GCP, or Azure), including core services like compute, storage, networking, and managed databases.
  • Containerization & Orchestration: Deep, hands‑on experience designing and managing large‑scale Kubernetes clusters and container‑based microservices architectures.
  • Infrastructure as Code (IaC): Proven expertise in architecting infrastructure with Terraform. Proficiency with configuration management tools like Ansible, Chef, or Puppet.
  • Monitoring & Observability: Extensive experience designing and implementing monitoring and observability solutions using tools like Prometheus, Grafana, OpenTelemetry, Jaeger, and the ELK Stack (Elasticsearch, Logstash, Kibana) or similar commercial tools (e.g., Datadog, New Relic).
  • Programming/Scripting: Strong proficiency in a high‑level programming language such as Go or Python for automation, tooling, and building system integrations.
  • Systems Design: Deep understanding of distributed systems, networking protocols (TCP/IP, HTTP), and high‑availability design patterns.
Preferred Qualifications
  • Experience working across multiple cloud environments (multi‑cloud).
  • Professional cloud certifications (e.g., AWS Certified Solutions Architect Professional, Google Professional Cloud Architect).
  • Experience with service mesh technologies like Istio or Linkerd.
  • Knowledge of security best practices in a cloud‑native environment (DevSecOps).
  • Demonstrated experience leading large‑scale technology transformations and influencing engineering culture.
About your team

Our CRL (Consumer Goods, Retail & Logistics) practice helps some of the largest global firms and most recognizable local brands solve their biggest challenges in today’s age of constant disruption. With diverse services spanning growth strategy and new product innovation, omni‑channel customer experience, supply chain resiliency, and AI‑driven new business models, we help clients shape and achieve their growth agenda for a sustainable future.

We transform traditional organizations to digitally centric business models and drive new revenue streams.
