Senior Software Engineer - SRE Focused (Abu Dhabi area, UAE)

We are a B2B Wealth Tech startup based in Abu Dhabi and backed by BNY Mellon and Lunate. The company has raised $300M to build a state‑of‑the‑art wealth technology platform.

Our mission is to power and grow our clients’ Wealth franchises through differentiated experiences, financial solutions, and insights. Our digital wealth management platform will enable banks and other financial institutions in the Middle East to grow and further penetrate affluent, HNW and UHNW investor segments.

Role

We’re building a team that owns production incident response, deep debugging, and permanent fixes across application, data, and deployment layers. This is a software engineering role with real production ownership, not a tickets‑only ops role. You’ll combine engineering and operations to own outcomes end‑to‑end: investigate incidents, ship code fixes safely, and prevent repeat issues through tests, observability, and platform hardening.

Lead and execute production incident response: triage, mitigation, stakeholder communication, and coordination across teams
Debug and fix issues across Go services (mandatory) and the broader stack (Node.js services where relevant)
Work across service boundaries: GraphQL/RPC, distributed tracing, dependency failures, performance bottlenecks, and safe degradation patterns
Troubleshoot Kubernetes workloads and deployments
Diagnose PostgreSQL/CNPG issues
Handle production bugs that span application + data pipelines (ETL/Snowflake mappings), including backfills/replays and data‑quality validation
Build prevention: add regression tests, improve observability, and maintain runbooks/service passports
Drive reliability improvements: SLOs/SLIs, alert quality, release readiness checks, and operational standards across teams

Requirements

7+ years in SRE/Production Engineering/Platform Engineering (reliability‑focused)
Strong Go (mandatory): ability to read, debug, and ship production fixes in Go codebases
Proven experience debugging distributed systems in production (latency, error rates, timeouts, retries, cascading failures)
Strong hands‑on experience with Kubernetes in production environments
Experience with Helm and GitOps workflows (FluxCD preferred; ArgoCD acceptable)
Solid PostgreSQL troubleshooting experience (performance, incident patterns, migrations)
Observability experience (metrics/logging/tracing; Datadog/Grafana/Tempo/Loki experience is a plus)
Strong incident leadership: calm under pressure, clear communication, structured problem‑solving
Engineering hygiene: PR discipline, reviews, testing mindset, safe rollouts/rollbacks
Comfortable with IAM/security fundamentals in real production systems: OAuth2/OIDC basics, RBAC/least privilege, and safe secrets handling

Good to Have

Node.js backend experience in production
Experience in fintech/regulated environments/high‑availability systems (auditability, change control, incident rigor)
Data reliability experience: ETL monitoring, reconciliation, Snowflake operations, schema/mapping drift handling
Reliability patterns common to trading/fintech platforms: correctness and data integrity mindset (idempotency, reconciliation), resilient partner integrations, and strong observability for critical user journeys

Seniority level

Mid‑Senior level

Employment type

Full‑time

Job function

IT Services and IT Consulting

