Machine Learning Systems Engineer Job, Manchester area, England, UK, IT/Tech

We are building real-time conversational AI systems for contact centres, powered by ASR, LLMs, and TTS.

As an LLM Systems Engineer, you will sit within our LLM team and focus on the systems layer that makes production Conversational AI work. You’ll design and improve the infrastructure, orchestration, and runtime systems behind low-latency conversational AI workflows.

This role focuses on solving the technical challenges associated with delivering real-time AI conversations: coordinating complex AI systems under strict latency and reliability constraints.

What you’ll do

Design and build systems that enable LLM workflows to maintain real-time responses even under peak load
Improve latency, throughput, concurrency, and reliability across our production systems
Build orchestration logic for model calls, services, queues, retries, fallbacks, and routing that balances load management with low response times
Help scale systems to support high volumes of concurrent real-time conversations
Optimise memory usage and resource efficiency across LLM-powered services
Deploy and support autoscaling for AI services running on AWS
Build observability into AI workflows, including monitoring, logging, alerting, and performance tracking
Work closely with data scientists, MLEs, prototype engineers, and backend engineers
Help turn LLM capabilities into stable, scalable production Conversational AI systems

What we’re looking for

Experience building production backend systems, distributed systems, or ML infrastructure
Strong understanding of scalability, latency, reliability, and performance engineering
Experience with cloud infrastructure, ideally AWS
Experience working with APIs, queues, service orchestration, and production monitoring
Understanding of how LLMs are used in production systems
Ability to reason about concurrency, throughput, memory usage, and failure handling

Nice to have

Experience with conversational AI, voice systems, ASR, TTS, or real-time streaming systems
Experience with model serving or inference infrastructure
Exposure to open-source LLMs or LLM orchestration frameworks
Experience with Docker, Kubernetes, ECS, or similar container orchestration tools
Experience with Redis, Kafka, Kinesis, SQS, or similar queueing/event systems
Familiarity with monitoring tools such as CloudWatch, Prometheus, or Grafana

You’ll help build the systems behind real-time AI conversations used in production contact centre environments. This is a high-impact engineering role focused on low latency, scalability, reliability, and making LLM-powered systems work under real-world load.
