Senior Site Reliability Engineer
Job in Bellville, 7530, South Africa
Listing for: Sanlam
Full Time position
Listed on 2026-02-25
Job specializations: IT/Tech (Cloud Computing, Systems Engineer)
Job Description
Who are we?
Sanlam Fintech is a newly established digital-first business within the Sanlam Group, on a mission to democratize financial advice and solutions for everyone across the African continent. We exist to pioneer inclusive financial confidence, helping people build strong foundations to bridge the gap in generational wealth. Our culture is one of agility and continuous deployment; we believe in learning fast, learning cheap and learning forward.
Our aim is to provide a work environment where knowledge workers can accelerate the development of their ideas and bring innovation to market, while also offering a compelling career and development proposition that enables them to realize their dreams.
Position Overview
The Site Reliability Engineer (SRE) at Sanlam Fintech is responsible for ensuring the reliability, scalability, and performance of our cloud-native infrastructure and services. This role bridges software engineering and operations, applying engineering principles to solve complex infrastructure challenges. The SRE will focus on building and maintaining resilient systems on AWS, implementing comprehensive observability solutions, and driving automation across the infrastructure lifecycle.
Operating in a DevOps environment, the SRE takes full ownership of the systems they build and operate, ensuring high availability and an optimal customer experience. They work closely with Software Engineers, Platform Engineers, and DevSecOps teams to deliver infrastructure solutions that support Sanlam Fintech's business objectives and uphold our commitment to operational excellence.
What will you do?
Reliability & Resilience
- Build highly available, fault-tolerant systems on AWS
- Define SLIs, SLOs and error budgets to track and improve reliability
- Plan and implement disaster recovery strategies (RTO/RPO)
- Lead incident response and root cause analysis
- Build self-healing systems with automated fixes for common failures
- Run chaos engineering tests to find and fix weaknesses
Observability & Monitoring
- Set up metrics, logs and traces for full system visibility
- Build dashboards and alerts for fast incident detection
- Implement distributed tracing to spot performance issues
- Set monitoring standards and maintain operational runbooks
- Publish regular uptime and operational metrics reports
Infrastructure Automation
- Write and maintain Infrastructure as Code using Terraform and CloudFormation
- Automate provisioning, configuration and deployments with DevOps/Platform teams
- Build and manage CI/CD pipelines using GitHub Actions
- Implement GitOps practices and self-service automation to reduce manual work
Cloud Infrastructure & Architecture
- Design and optimise serverless solutions (Lambda, API Gateway, Step Functions)
- Manage and optimise Kubernetes clusters
- Implement cloud-native patterns such as event-driven and microservices architectures
- Optimise cloud costs and evaluate new AWS services
Software Engineering & Development
- Build clean, well-structured automation tools and scripts
- Apply Clean Architecture and Domain-Driven Design to infrastructure code
- Improve internal tools to boost developer productivity
- Use AI tools (Claude, GPT) to automate routine tasks
Collaboration & Knowledge Sharing
- Work with cross-functional teams using Jira, Confluence and JSM
- Participate in on-call rotations and incident handoffs
- Mentor junior engineers in SRE practices
- Document decisions and procedures, and run blameless postmortems
Qualifications and Experience
Required Experience
- 5+ years of experience in systems engineering, DevOps, or site reliability engineering roles
- 3+ years of hands-on experience with AWS cloud services in production environments
- 2+ years of experience with Infrastructure as Code (Terraform and/or CloudFormation)
- Demonstrated experience in incident management and on-call responsibilities
- Track record of implementing automation that reduced operational toil
Educational Background
- Bachelor's degree in Computer Science, Information Technology, Engineering or a related field, or equivalent practical experience
- Relevant professional certifications are advantageous but not required
What will make you successful in this role?
Cloud Platforms & Infrastructure
- Strong expertise in AWS services including EC2, ECS, EKS, Lambda, API Gateway, Step Functions, S3, RDS, DynamoDB, CloudWatch and networking services such as VPC, Route 53 and ALB/NLB
- Deep understanding of serverless architecture patterns and best practices
- Experience with Kubernetes cluster management, deployment strategies and service mesh concepts
- Knowledge of cloud security best practices, including IAM, security groups and encryption
Infrastructure as Code & Automation
- Proficiency in Terraform for multi-environment infrastructure management
- Experience with AWS CloudFormation for native AWS resource provisioning
- Strong scripting skills in Python for automation and tooling development
- Experience with configuration management tools and practices
Observability & Monitoring
- Expertise in Datadog, CloudWatch and OpenTelemetry (OTEL) for full-stack observability…
Position Requirements
10+ years of work experience