DevOps SRE Engineer - Observability & Automation
Urgent requirement for a DevOps SRE Engineer (Observability & Automation) for our banking clients in Abu Dhabi, UAE.
Responsibilities
- Define and implement SLIs/SLOs and error budgets for business‑critical digital banking services.
- Build actionable observability (metrics, logs, traces, dashboards, and alerts) using Dynatrace, Prometheus, Grafana, and the ELK stack, while reducing alert fatigue.
- Leverage AI‑driven insights and anomaly detection (Dynatrace Davis AI or an equivalent AIOps platform) to predict and resolve reliability issues before they affect customers.
- Lead incident management, from on‑call triage and root‑cause analysis to blameless postmortems with actionable follow‑ups.
- Improve deployment safety with robust rollout/rollback strategies, canary and blue‑green deployments, and production readiness reviews.
- Support and optimize microservices‑based architectures, ensuring service reliability, scalability, and inter‑service resilience.
- Conduct capacity planning, performance tuning, and resilience testing, optimizing for reliability and cost efficiency.
- Automate operational toil: runbooks, remediation scripts, proactive health checks, and self‑healing workflows.
- Collaborate with DevOps to embed reliability gates and validations into CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI/CD, Azure DevOps).
- Own and evolve the observability and AIOps stack, driving intelligent automation and predictive alerting capabilities.
- Maintain high‑quality documentation, playbooks, and operational standards across environments.
- Ensure operational compliance and security alignment with internal controls and regulatory standards.
- Analyze system performance, availability, and cost data to continually optimize operations.
- Provide reliability support and escalation guidance for critical production systems during major incidents.
Requirements
- 5+ years of experience in SRE or DevOps roles, building and managing large‑scale, high‑availability systems across banking, fintech, e‑commerce, or other data‑intensive digital ecosystems.
- Bachelor’s degree in Computer Science or equivalent technical experience.
- Strong experience with Linux environments and performance troubleshooting.
- Proven expertise in Terraform and Infrastructure as Code (IaC) methodologies.
- Proficiency with Kubernetes and container orchestration in microservices environments.
- Hands‑on experience with AWS (preferred); exposure to Azure or GCP is an advantage.
- Deep knowledge of Dynatrace (AIOps, Davis AI), Prometheus, Grafana, and the ELK stack.
- Experience implementing AI/ML‑driven reliability or automation solutions (AIOps, anomaly detection, predictive alerting).
- Practical understanding of CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI/CD, Azure DevOps).
- Experience with Kafka, RabbitMQ, Redis, and Aurora/RDS databases.
- Strong scripting or programming skills in Python, Bash, or Go.
Key Skills
- Kafka, RabbitMQ, Redis, RDS/Aurora
- Observability (metrics, logs, traces, dashboards, alerts)
- Kubernetes, Docker, container orchestration, microservices support
- Terraform, IaC
- Linux environments, performance troubleshooting
- Banking domain experience
- Automation, reliability, DevOps