
Manager, Site Reliability Engineering (GCP) - Bilingual (French/English)

Job in Greater London, London, Greater London, W1B, England, UK
Listing for: Veson Nautical
Full Time position
Listed on 2026-02-09
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Project Manager
Job Description & How to Apply Below
Position: Manager, Site Reliability Engineering (GCP) - Bilingual (French/English)
Location: Greater London

Description

Manager, Site Reliability Engineering (GCP)

Who We Are:

Veson Nautical empowers the global maritime industry to navigate complexity on all sides of the trade. Veson's platform combines AI-driven workflows, trusted data, and seamless collaboration to deliver the insight and context needed for confident, competitive decision-making.

The Opportunity:

As the manager of the Site Reliability Engineering team for Google Cloud Platform (GCP) at Veson Nautical, you will be responsible for designing, building, monitoring and supporting the GCP infrastructure that underpins our rapidly growing SaaS platform (10 Kubernetes clusters, over 1,800 pods, more than 20 billion documents in Elasticsearch) and the services and products that depend upon it. This includes:

  • Oceanbolt - a dynamic data intelligence platform, tracking over 23,000 vessels in real time to deliver accurate, timely market intelligence to drive decision making
  • Shipfix - using proprietary AI-driven tools to infer cargo and vessel information, extracting, anonymizing, and aggregating billions of data points with near real-time processing of email exchanges in the shipping market

Our business and our platforms are experiencing rapid growth, which ensures we have no shortage of exciting and challenging projects to work on.

The Team:

This is a hands-on technical leadership role where you'll manage an extremely talented team of Site Reliability Engineers, while personally contributing to architecture and infrastructure initiatives.

We are looking for a leader who can think systematically and manage complex systems at scale through automation. The successful candidate will be comfortable participating in architectural discussions with software engineers, and from a scalability perspective will ensure that the platform can double in size over the next 1-2 years.

The team is committed to a DevOps culture, proactive monitoring and managing infrastructure at scale. They thrive on improving our cloud-native platforms and adopting new technologies.

This position will be in our London office in Southwark, in a hybrid model where we expect a minimum of two days/week attendance in person.

Our Stack:

  • Google Cloud Platform - primarily PaaS services (Bigtable, Cloud SQL, Dataflow, Datastore, GKE, GCS, KMS, Pub/Sub)
  • Email - ingestion through Microsoft Graph API automation and IMAP integrations
  • Elasticsearch hosted with Kubernetes Operator
  • CI/CD - GitLab Pipelines and Argo CD
  • Infrastructure-as-Code - Terraform, Terragrunt and Atlantis
  • Monitoring and Security - Cloud Armor Enterprise, Grafana / Grafana Tempo, OpenTelemetry, Opsgenie, Renovate, Sentry
  • AI Tools - Augment Code, GitHub Copilot, Claude

Key Responsibilities

  • Lead and mentor a team of Site Reliability Engineers, fostering a culture of excellence, collaboration, and continuous improvement
  • Design, implement, and manage scalable, reliable, and secure cloud infrastructure on Google Cloud Platform
  • Oversee the provisioning and management of containerized applications using Docker and Kubernetes
  • Drive automation initiatives for infrastructure provisioning and configuration management using Terraform and other IaC tools
  • Partner closely with development teams to ensure reliability, performance, and scalability of platforms
  • Establish and maintain comprehensive monitoring, alerting, and observability practices
  • Build processes and discipline to improve consistency, visibility, and documentation across infrastructure and operations
  • Lead incident response efforts and ensure service uptime
  • Develop automation, monitoring, and management solutions
  • Prepare infrastructure for integration and future growth

Skills/Experience Needed to Be Successful in This Role

Required:

  • Previous experience working on a large-scale Software-as-a-Service (SaaS) platform which supported thousands of global users in a 24x7x365 environment
  • Previous experience leading a Site Reliability Engineering / Cloud Ops / Platform / Infrastructure team
  • Operational experience with Google Cloud Platform, Kubernetes and Terraform
  • Programming or scripting experience in Python, bash, or a similar language
  • Experience with cloud cost management (budgeting, anomaly detection,…