Systems Research Engineer - LLM Optimisation; vLLM/TensorRT-LLM Job, Edinburgh area, City of Edinburgh, Scotland, UK, IT/Tech

Position: Systems Research Engineer - LLM Optimisation (vLLM / TensorRT-LLM)
Location: City of Edinburgh

Systems Research Engineer - LLM Optimisation (vLLM / TensorRT-LLM)

Permanent

Edinburgh City Centre (On-site 5 days), walking distance from local transport links

Salary: Competitive and negotiable, with a generous benefits package

In an era where Large Language Models (LLMs) are rebuilding the foundational software stack, our client is at the forefront of reshaping how large-scale models are trained, served, and deployed. Operating at the intersection of advanced systems research and industrial-scale engineering, their Edinburgh-based team is driving new AI Infrastructure & Agentic Serving architectures.

This role is a unique opportunity to help define next-generation large-scale data centres and AI infrastructure systems, turning innovative system designs into deployable, real-world technologies.

We are seeking Systems Research Engineers with a deep passion for computer systems, distributed AI infrastructure, and performance optimization. These roles are ideal for recent PhD graduates or exceptional BSc/MSc engineers looking to build research-driven experience in operating systems, distributed systems, AI model serving, and machine learning infrastructure. You will work closely with architects to prototype and optimize the next generation of global AI clusters.

What you will be doing

Distributed Systems Research & Development: Architect, implement, and evaluate distributed system components for emerging AI and data-centric workloads. Drive modular design and scalability across GPU and NPU clusters, building highly efficient serving and scheduling systems.
Performance Optimization & Profiling: Conduct in-depth profiling and performance tuning of large-scale inference and data pipelines, focusing on KV cache management, heterogeneous memory scheduling, and high-throughput inference serving using frameworks like vLLM, Ray Serve, and modern PyTorch Distributed systems.
Scalable Model Serving Infrastructure: Develop and evaluate frameworks that enable efficient multi-tenant, low-latency, and fault-tolerant AI serving across distributed environments. Research and prototype new techniques for cache sharing, data locality, and resource orchestration and scheduling within AI clusters.
Research & Publications: Translate innovative research ideas into publishable contributions at leading venues (e.g., OSDI, NSDI, EuroSys, SoCC, MLSys, NeurIPS, ICML, ICLR) while driving internal adoption of novel methods and architectures.
Cross-Team Collaboration: Communicate technical insights, research progress, and evaluation outcomes effectively to multidisciplinary stakeholders and global research teams.

What we are looking for

Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or related field
Fresh PhD graduates in systems, distributed computing, or large-scale AI infrastructure are also welcome
At least 2 years of experience with LLM inference / serving framework optimization (vLLM / Ray Serve / TensorRT-LLM / PyTorch)
Hands-on experience with distributed KV cache optimization
Familiarity with GPUs and how they execute LLMs
Strong knowledge of distributed systems, operating systems, machine learning systems architecture, inference serving, and AI infrastructure
Solid grounding in systems research methodology, distributed algorithms, and profiling tools.
Proficiency in C/C++, with additional experience in Python for research prototyping.
Team-oriented mindset with effective technical communication skills

If this sounds like a role you can take hold of, we would love to hear from you! To apply for this role, please send your CV to Maggie Kwong.
