
Master Thesis Project FALL

Job in Gothenburg, Dawson County, Nebraska, 69138, USA
Listing for: Modulai
Apprenticeship/Internship position
Listed on 2026-06-27
Job specializations:
  • IT/Tech
    Machine Learning/ ML Engineer, AI Engineer (Applied/Software), Data Scientist, AI Business & Operations
Job Description & How to Apply Below
Position: Master Thesis Project - 2026 (FALL)
Location: Gothenburg

Reinforcement Learning for Large Language Models (LLMs)

Modulai is offering a master's thesis opportunity focused on applying Reinforcement Learning (RL) to improve the capabilities of large language models (LLMs). Reinforcement learning was first pivotal in aligning LLMs with human preferences, but recent work shows that its role now extends much further: RL has become the dominant paradigm for eliciting reasoning, enabling models to acquire advanced problem-solving strategies and adapt to complex, multi-step tasks.

Recent advancements highlight the transformative role of RL in LLM post-training:

  • DeepSeek-R1 demonstrated that reasoning ability can be induced through large-scale RL with verifiable rewards, including a pure-RL variant (R1-Zero) trained with no supervised fine-tuning at all, popularizing RL as the central tool for building reasoning models.
  • DeepSeekMath explored how reinforcement learning can enable models to handle multi-step mathematical reasoning, and introduced Group Relative Policy Optimization (GRPO), the RL method now widely used across the field.
  • Tulu 3 introduced a family of fully-open post-trained models, leveraging Supervised Fine-tuning (SFT), Direct Preference Optimization (DPO), and a technique dubbed Reinforcement Learning with Verifiable Rewards (RLVR).
  • DAPO released a fully open-source, large-scale RL system that refines GRPO with techniques such as Clip-Higher and dynamic sampling to stabilize long chain-of-thought training, surpassing R1-Zero-level results with substantially fewer training steps.
  • ReTool introduced reinforcement learning for tool use, showing how LLMs can learn to combine text-based reasoning and code interpreters for complex tasks.
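
To make the core idea behind GRPO and verifiable rewards concrete, here is a minimal sketch in plain Python. It is not taken from any of the papers' code: the function names, the toy exact-match reward, and the example completions are all illustrative assumptions. It shows only the group-relative advantage step, in which several completions sampled for the same prompt are scored and each reward is normalized against the group's mean and standard deviation.

```python
from statistics import mean, pstdev


def verifiable_reward(completion: str, reference: str) -> float:
    """Toy verifiable reward (an assumption for illustration):
    1.0 if the last whitespace-separated token equals the reference answer."""
    tokens = completion.strip().split()
    return 1.0 if tokens and tokens[-1] == reference else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its own group: (r - mean) / (std + eps).
    No learned value function is needed; the group acts as the baseline."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of completions sampled for one prompt ("What is 2 + 2?").
completions = ["2 + 2 = 4", "the answer is 5", "so we get 4", "3"]
rewards = [verifiable_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)

for text, adv in zip(completions, advantages):
    print(f"{adv:+.3f}  {text}")
```

Correct completions receive positive advantages and incorrect ones negative. When every completion in a group gets the same reward, all advantages are zero and the group contributes no learning signal, which is the situation dynamic sampling in DAPO is meant to avoid.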

This project aims to investigate RL approaches for improving LLMs in specialized domains (such as reasoning and tool use). You will explore open-weight models, implement and compare RL methods inspired by the latest research, and evaluate how reinforcement learning impacts model capabilities. Through this work, you will contribute to the growing understanding of how RL can shape the next generation of LLMs.

ML techniques and tools

  • Open-weight LLMs
  • Reinforcement learning for LLMs
  • Python, PyTorch, Git, Hugging Face