Machine Learning Systems Engineer
Listed on 2026-05-16
IT/Tech
AI Software Engineer, Systems Engineer, Machine Learning/ML Engineer
We are building real-time conversational AI systems for contact centres, powered by ASR, LLMs, and TTS.
As an LLM Systems Engineer, you will sit within our LLM team and focus on the systems layer that makes production Conversational AI work. You’ll design and improve the infrastructure, orchestration, and runtime systems behind low-latency conversational AI workflows.
This role focuses on solving the technical challenges associated with delivering real-time AI conversations: coordinating complex AI systems under strict latency and reliability constraints.
What you’ll do
- Design and build systems that enable LLM workflows to maintain real-time responses even under peak load
- Improve latency, throughput, concurrency, and reliability across our production systems
- Build orchestration logic for model calls, services, queues, retries, fallbacks, and routing that balances load management with low response times
- Help scale systems to support high volumes of concurrent real-time conversations
- Optimise memory usage and resource efficiency across LLM-powered services
- Deploy and support autoscaling for AI services running on AWS
- Build observability into AI workflows, including monitoring, logging, alerting, and performance tracking
- Work closely with data scientists, MLEs, prototype engineers, and backend engineers
- Help turn LLM capabilities into stable, scalable production Conversational AI systems
- Experience building production backend systems, distributed systems, or ML infrastructure
- Strong understanding of scalability, latency, reliability, and performance engineering
- Experience with cloud infrastructure, ideally AWS
- Experience working with APIs, queues, service orchestration, and production monitoring
- Understanding of how LLMs are used in production systems
- Ability to reason about concurrency, throughput, memory usage, and failure handling
- Experience with conversational AI, voice systems, ASR, TTS, or real-time streaming systems
- Experience with model serving or inference infrastructure
- Exposure to open-source LLMs or LLM orchestration frameworks
- Experience with Docker, Kubernetes, ECS, or similar container orchestration tools
- Experience with Redis, Kafka, Kinesis, SQS, or similar queueing/event systems
- Familiarity with monitoring tools such as CloudWatch, Prometheus, or Grafana
You’ll help build the systems behind real-time AI conversations used in production contact centre environments. This is a high-impact engineering role focused on low latency, scalability, reliability, and making LLM-powered systems work under real-world load.