Site Reliability Engineer

Job in London, Greater London, W1B, England, UK
Listing for: CMG (Capital Markets Gateway)
Full Time position
Listed on 2026-06-03
Job specializations:
  • IT/Tech
    Systems Engineer, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: GBP 100,000 - 125,000 per year
Job Description & How to Apply Below
Location: Greater London

About the Company

Capital Markets Gateway LLC (CMG) is a capital markets‑focused fintech transforming global equity capital markets (ECM) through data, technology, and connectivity. As the preferred source for ECM analytics and the first network connecting the buy‑side and sell‑side for ECM workflows, CMG is committed to reshaping how capital markets operate. Founded in 2017, CMG has completed three successful fundraising rounds and is backed by prestigious financial institutions.

The CMG platform is currently relied upon by nearly 150 buy‑side firms representing $40 trillion in AUM and 22 global investment banks. For more information, use the "Apply for this Job" box below.

The Role

CMG is looking for a Site Reliability Engineer (SRE) with a strong focus on monitoring, observability, and alerting to ensure the reliability, performance, and scalability of our infrastructure and applications. You will design, implement, and maintain monitoring solutions to provide visibility into system health and performance, proactively detect anomalies, and reduce incident response time.

Engineering Team

The CMG engineering team consists of domain experts who work collaboratively within a culture of cross‑domain knowledge sharing. Engineers are encouraged to challenge the status quo, seek improvement, and explore solutions with bleeding‑edge technologies such as AI. The team values research, prototyping, and best practices from code review to production rollouts, including pull requests, test automation, code coverage, containerization, and one‑click deployments.

Responsibilities

Monitoring & Observability
  • Design, implement, and maintain monitoring and observability solutions using Prometheus, the Grafana stack (Loki/Grafana/Tempo/Alertmanager), Datadog, and OpenTelemetry.
  • Define and implement SLOs, SLIs, and error budgets to measure system reliability.
  • Develop and optimize dashboards, alerts, and reports for system performance and business metrics.
Alerting & Incident Management
  • Design actionable alerting strategies to minimize noise and improve MTTR.
  • Integrate alerting systems with Jira.
  • Establish and refine runbooks for on‑call teams to handle alerts efficiently.
  • Empower teams to maintain observability coverage and sound incident response practices.
Performance Optimization
  • Analyze system performance metrics, identify bottlenecks, and implement optimizations to improve system efficiency, scalability, and cost‑effectiveness.
  • Help conduct load testing and capacity planning to ensure systems can handle peak traffic loads.
Automation and Tooling
  • Identify opportunities for automation and develop tools to streamline operational processes, such as fail‑over, configuration management, and monitoring.
  • Implement monitoring and alerting systems within automations to detect and resolve issues proactively.
Collaboration and Communication
  • Collaborate closely with cross‑functional teams, including software engineers, operations, and infrastructure teams, to understand system requirements, provide technical guidance, and drive solutions.
  • Communicate effectively to stakeholders about system changes, incidents, and improvements.
  • Promote and spread SRE principles and practices across the company.
Qualifications
  • Must be based in Latin America.
  • English level: C1 or C2.
  • Proven experience as a Site Reliability Engineer or similar role.
  • Proficiency in logging, metrics, and tracing frameworks (Datadog, Loki, Prometheus, OpenTelemetry).
  • Experience with cloud platforms (Azure preferred) and infrastructure‑as‑code tools (e.g., Terraform).
  • Strong programming and scripting skills (Python, Bash).
  • Proficiency in containerization technologies and orchestration tools (Docker, Kubernetes).
  • Understanding of Linux‑based systems, networking, and security principles related to containerized applications.
  • Strong problem‑solving and troubleshooting skills, with a passion for identifying and resolving complex technical issues.
  • Excellent communication and collaboration abilities.
  • Ability to thrive in a fast‑paced, constantly evolving environment.
  • Experience with PostgreSQL monitoring and optimization (optional / nice to have).
Tech Stack
  • Azure as an infrastructure provider.
  • Docker + Kubernetes for…