Senior Site Reliability Engineer Job Bellville area, South Africa, IT/Tech

Who are we?

Sanlam Fintech is a newly established digital-first business within the Sanlam Group, on a mission to democratize financial advice and solutions for everyone across the African continent. We exist to pioneer inclusive financial confidence, helping people build strong foundations to bridge the gap in generational wealth. Our culture is one of agility and constant deployment; we believe in learning fast, learning cheap and learning forward.

Our aim is to provide a work environment where knowledge workers can accelerate the development of their ideas and bring innovation to market, while offering a compelling career and development proposition that enables them to realize their dreams.

Position Overview

The Site Reliability Engineer (SRE) at Sanlam Fintech is responsible for ensuring the reliability, scalability, and performance of our cloud-native infrastructure and services. This role bridges software engineering and operations, applying engineering principles to solve complex infrastructure challenges. The SRE will focus on building and maintaining resilient systems on AWS, implementing comprehensive observability solutions, and driving automation across the infrastructure lifecycle.

Operating in a DevOps environment, the SRE takes full ownership of the systems they build and operate, ensuring high availability and an optimal customer experience. They work closely with Software Engineers, Platform Engineers, and DevSecOps teams to deliver infrastructure solutions that support Sanlam Fintech's business objectives and uphold our commitment to operational excellence.

What will you do?

Reliability & Resilience

Build highly available, fault-tolerant systems on AWS

Define SLIs, SLOs and error budgets to track and improve reliability

Plan and implement disaster recovery strategies (RTO/RPO)

Lead incident response and root cause analysis

Build self-healing systems with automated fixes for common failures

Run chaos engineering tests to find and fix weaknesses

Observability & Monitoring

Set up metrics, logs and traces for full system visibility

Build dashboards and alerts for fast incident detection

Implement distributed tracing to spot performance issues

Set monitoring standards and maintain operational runbooks

Publish regular uptime and operational metrics reports

Infrastructure Automation

Write and maintain Infrastructure as Code using Terraform and CloudFormation

Automate provisioning, configuration and deployments with DevOps/Platform teams

Build and manage CI/CD pipelines using GitHub Actions

Implement GitOps practices and self-service automation to reduce manual work

Cloud Infrastructure & Architecture

Design and optimise serverless solutions (Lambda, API Gateway, Step Functions)

Manage and optimise Kubernetes clusters

Implement cloud-native patterns like event-driven and microservices architectures

Optimise cloud costs and evaluate new AWS services

Software Engineering & Development

Build clean, well-structured automation tools and scripts

Apply Clean Architecture and Domain-Driven Design to infrastructure code

Improve internal tools to boost developer productivity

Use AI tools (Claude, GPT) to automate routine tasks

Collaboration & Knowledge Sharing

Work with cross-functional teams using Jira, Confluence and JSM

Participate in on-call rotations and incident handoffs

Mentor junior engineers in SRE practices

Document decisions and procedures, and run blameless postmortems

Qualifications and Experience

Required Experience

5+ years of experience in systems engineering, DevOps, or site reliability engineering roles

3+ years of hands-on experience with AWS cloud services in production environments

2+ years of experience with Infrastructure as Code (Terraform and/or CloudFormation)

Demonstrated experience in incident management and on-call responsibilities

Track record of implementing automation that reduced operational toil

Educational Background

Bachelor's degree in Computer Science, Information Technology, Engineering or related field; or equivalent practical experience

Relevant professional certifications are advantageous but not required

What will make you successful in this role?

Cloud Platforms & Infrastructure

Strong expertise in AWS services including EC2, ECS, EKS, Lambda, API Gateway, Step Functions, S3, RDS, DynamoDB, CloudWatch and networking services such as VPC, Route 53 and ALB/NLB

Deep understanding of serverless architecture patterns and best practices

Experience with Kubernetes cluster management, deployment strategies and service mesh concepts

Knowledge of cloud security best practices including IAM, security groups and encryption

Infrastructure as Code & Automation

Proficiency in Terraform for multi-environment infrastructure management

Experience with AWS CloudFormation for native AWS resource provisioning

Strong scripting skills in Python for automation and tooling development

Experience with configuration management tools and practices

Observability & Monitoring

Expertise in Datadog, CloudWatch and OpenTelemetry (OTel) for full-stack observability…

