ML/AI Engineer
Job in Manchester, Greater Manchester, M9, England, UK
Listed on 2025-12-20
Listing for: Lloyds Bank plc
Full Time position
Job specializations:
- IT/Tech: Systems Engineer, Cloud Computing, Data Engineer, AI Engineer
Job Description & How to Apply Below
Thursday 15 January 2026
**Salary Range:** £72,702 - £80,780
**Flexible Working Options:** Hybrid Working, Job Share

**Job Description**
**JOB TITLE:** ML/AI Engineer
**SALARY:** £70,929 - £85,000 per annum
**LOCATION:** Manchester
**HOURS:** Full-time – 35 hours
**WORKING PATTERN:** Our work style is hybrid, which involves spending at least two days per week, or 40% of our time, at our Manchester office.
**About this opportunity…**
This is an exciting opportunity for a hands-on ML/AI Engineer to join our Data & AI Engineering team. You’ll build, automate, and maintain scalable systems that support the full machine learning lifecycle, leading Kubernetes orchestration, CI/CD automation (including Harness), GPU optimisation, and large‑scale model deployment, and owning the path from code commit to reliable, monitored production services.
This is a unique opportunity to shape the future of AI by embedding fairness, transparency, and accountability at the heart of innovation. You’ll join us at an exciting time as we move into the next phase of our transformation. We’re looking for curious, passionate engineers who thrive on innovation and want to make a real impact.
We’re on an exciting journey and there couldn’t be a better time to join us. The investments we’re making in our people, data, and technology are leading to innovative projects, fresh possibilities, and countless new ways for our people to work, learn, and thrive.
**What you’ll do…**
* Design, build, and operate production‑grade Kubernetes clusters for high‑volume model inference and scheduled training jobs.
* Configure autoscaling, resource quotas, GPU/CPU node pools, service mesh, Helm charts, and custom operators to meet reliability and efficiency targets.
* Implement GitOps workflows for environment configuration and application releases.
* Build CI/CD pipelines in Harness (or equivalent) to automate build, test, model packaging, and deployment across environments (dev / pre‑prod / prod).
* Enable progressive delivery (blue/green, canary) and rollback strategies, integrating quality gates, unit/integration tests, and model‑evaluation checks.
* Standardise pipelines for continuous training (CT) and continuous monitoring (CM) to keep models fresh and safe in production.
* Deploy and tune GPU‑backed inference services (e.g., A100), optimise CUDA environments, and leverage TensorRT where appropriate.
* Operate scalable serving frameworks (NVIDIA Triton, TorchServe) with attention to latency, efficiency, resilience, and cost.
* Implement end‑to‑end observability for models and pipelines: drift, data quality, fairness signals, latency, GPU utilisation, error budgets, and SLOs/SLIs via Prometheus, Grafana, and Dynatrace.
* Establish actionable alerting and runbooks for on‑call operations; drive incident reviews and reliability improvements.
* Operate a model registry (e.g., MLflow) with experiment tracking, versioning, lineage, and environment‑specific artefacts.
* Enforce audit readiness: model cards, reproducible builds, provenance, and controlled promotion between stages.
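The observability duties above centre on SLOs and error budgets. As an illustrative sketch only (the SLO target, request counts, and function name below are hypothetical, not taken from the listing), the unspent portion of an availability error budget over a rolling window can be computed like this:

```python
# Hypothetical error-budget calculation for an availability SLO.
# All numbers are illustrative; real targets come from a service's SLO definition.

def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent over a rolling window."""
    allowed_failures = (1.0 - slo_target) * total_requests  # budget, in requests
    if allowed_failures == 0:
        return 0.0
    spent = failed_requests / allowed_failures              # burn so far
    return max(0.0, 1.0 - spent)

# Example: a 99.9% SLO allows 1,000 failures per 1,000,000 requests;
# 400 failures leaves 60% of the budget.
remaining = error_budget_remaining(0.999, 1_000_000, 400)
print(f"{remaining:.0%}")  # 60%
```

A value near zero is what alerting rules and promotion gates would typically key off.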
**What you’ll need…**
* Strong Python for automation, tooling, and service development.
* Deep expertise in Kubernetes, Docker, Helm, operators, node‑pool management, and autoscaling.
* Hands‑on CI/CD expertise with Harness (or similar), building multi‑stage pipelines; experience with GitOps, artefact repositories, and environment promotion.
* Practical experience with CUDA, TensorRT, Triton, TorchServe, and GPU scheduling/optimisation.
* Proficiency in Prometheus, Grafana, and Dynatrace, defining SLIs/SLOs and alert thresholds for ML systems.
* Experience operating MLflow (or equivalent) for experiment tracking, model bundling, and deployments.
* Expert use of Git, branching models, protected merges, and code‑review workflows.
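The Kubernetes autoscaling expertise listed above ultimately rests on the Horizontal Pod Autoscaler's core scaling rule, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A minimal sketch of that formula (the replica bounds and metric values here are hypothetical):

```python
import math

# Sketch of the Kubernetes HPA scaling rule:
#   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
# Values below are illustrative, not taken from any real cluster.

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Replica count the HPA would converge toward, clamped to configured bounds."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 4 pods at 90% average CPU against a 60% target -> scale out to 6 pods.
print(desired_replicas(4, 0.90, 0.60))  # 6
```

Quota and node-pool sizing decisions then follow from the clamp bounds chosen here.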
**It would be great if you had any of the following…**
* Experience with GCP (e.g., GKE, Cloud Run, Pub/Sub, BigQuery) and Vertex AI (Endpoints, Pipelines, Model Monitoring, Feature Store).
* Hooks for prompt/version management, offline/online evaluation, and human‑in‑the‑loop workflows (e.g., RLHF) to enable…