Master's Thesis Project, Fall
Listed on 2026-06-27
IT/Tech
Machine Learning / ML Engineer, AI Engineer (Applied/Software), Data Scientist, AI Business & Operations
Location: Gothenburg
Reinforcement Learning for Large Language Models (LLMs)
Modulai is offering a master's thesis opportunity focused on applying reinforcement learning (RL) to improve the capabilities of large language models (LLMs). RL first became pivotal for aligning LLMs with human preferences, but recent work shows that its role now extends much further. RL has become the dominant paradigm for eliciting reasoning, enabling models to acquire advanced problem-solving strategies and adapt to complex, multi-step tasks.
Recent advancements highlight the transformative role of RL in LLM post-training:
- DeepSeek-R1 demonstrated that reasoning ability can be induced through large-scale RL with verifiable rewards. This included a pure-RL variant (R1-Zero) trained with no supervised fine-tuning at all, and the work popularized RL as the central tool for building reasoning models.
- DeepSeekMath explored how reinforcement learning can help models handle multi-step mathematical reasoning. It also introduced Group Relative Policy Optimization (GRPO), the RL method now widely used across the field.
- Tülu 3 introduced a family of fully open post-trained models. These combine supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and a technique dubbed Reinforcement Learning with Verifiable Rewards (RLVR).
- DAPO released a fully open-source, large-scale RL system that refines GRPO with techniques such as Clip-Higher and dynamic sampling to stabilize long chain-of-thought training, surpassing R1-Zero-level results with substantially fewer training steps.
- ReTool applied reinforcement learning to tool use, showing how LLMs can learn when and how to interleave text-based reasoning with code-interpreter execution for complex tasks.
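Several of the methods listed above share a small common core. The sketch below illustrates three of those ideas at the sequence level:
- a rule-based verifiable reward, in the spirit of RLVR;
- GRPO's group-relative advantage, where each completion's reward is normalized against the other completions sampled for the same prompt;
- DAPO-style decoupled "Clip-Higher" bounds on a PPO-style clipped surrogate, together with a dynamic-sampling filter that drops groups whose advantages would all be zero.

The code uses plain Python rather than PyTorch for readability. It is an illustrative simplification, not the implementation from any of these papers. Several details are assumptions: the `Answer:` output format, the exact-match reward, the use of the population standard deviation, the clipping values, and all function names. Real trainers also apply the objective per token, using log-probability ratios from the policy.

```python
import re
from statistics import mean, pstdev


def verifiable_reward(completion: str, reference: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the text after the last
    'Answer:' marker exactly matches the reference answer, else 0.0."""
    matches = re.findall(r"Answer:\s*(.+)", completion)
    if not matches:
        return 0.0  # no parseable final answer counts as wrong
    return 1.0 if matches[-1].strip() == reference.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: subtract the group mean and divide by the
    group standard deviation (population std assumed here)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def keep_group(rewards):
    """Dynamic-sampling-style filter for binary rewards: keep a prompt's
    group only if some, but not all, completions were correct. Otherwise
    every advantage is zero and the group contributes no gradient."""
    return 0.0 < mean(rewards) < 1.0


def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped objective with decoupled lower/upper clip bounds
    ('Clip-Higher'); the eps values here are illustrative assumptions."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    completions = ["... Answer: 42", "... Answer: 41", "... Answer: 42", "no final answer"]
    rewards = [verifiable_reward(c, "42") for c in completions]
    print(rewards)  # [1.0, 0.0, 1.0, 0.0]
    if keep_group(rewards):
        print(group_relative_advantages(rewards))  # approximately [1, -1, 1, -1]
```

Decoupling the two clip bounds lets the upper bound sit above the lower one. This leaves more room to increase the probability of low-likelihood tokens that received positive advantages, which DAPO motivates as a way to counter entropy collapse.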
This project aims to investigate RL approaches for improving LLMs in specialized domains (such as reasoning and tool use). You will explore open-weight models, implement and compare RL methods inspired by the latest research, and evaluate how reinforcement learning impacts model capabilities. Through this work, you will contribute to the growing understanding of how RL can shape the next generation of LLMs.
ML techniques and tools
- Open-weight LLMs
- Reinforcement learning for LLMs
- Python, PyTorch, Git, Hugging Face