Master’s Thesis: Data-Efficient Language Model Alignment Techniques for Text Generation Tasks

15.08.2025, Diploma, Bachelor's and Master's Theses

This thesis explores data-efficient preference alignment techniques for text generation. The goal is to understand and extend hybrid approaches where models leverage limited human supervision alongside their own output feedback. By analyzing double-gradient architectures and iterative refinement strategies, this work aims to improve the efficiency and reliability of LLM alignment with reduced dependency on large, annotated datasets.

Project Background

Aligning large language models (LLMs) with human preferences is essential for safe and useful text generation. Traditional approaches such as Reinforcement Learning from Human Feedback (RLHF) are data- and resource-intensive. Recent research explores hybrid alignment methods that combine small amounts of annotated data with model-generated supervision. Techniques such as SPPO, I-SHEEP, and RS-DPO show that models can iteratively refine themselves by generating, evaluating, and learning from their own outputs, often through mechanisms that allow gradients to pass through the model multiple times. This project investigates such techniques to achieve efficient alignment with minimal human input.
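
To make the iterative-refinement idea concrete, below is a minimal, illustrative sketch of one self-refinement round with a DPO-style preference update, in the spirit of methods like SPIN, SPPO, and RS-DPO. All specifics are assumptions for illustration: the gpt2 checkpoint, the hyperparameters, and the judge() preference function (in practice a sparse human label or a learned judge model) are placeholders, not part of the project description.

    import copy
    import torch
    import torch.nn.functional as F
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def seq_logprob(model, prompt_ids, response_ids):
        """Sum of token log-probabilities of the response given the prompt."""
        input_ids = torch.cat([prompt_ids, response_ids], dim=-1)
        logits = model(input_ids).logits[:, :-1, :]          # next-token logits
        targets = input_ids[:, 1:]
        logps = F.log_softmax(logits, dim=-1).gather(-1, targets.unsqueeze(-1)).squeeze(-1)
        return logps[:, prompt_ids.shape[-1] - 1:].sum(-1)   # response tokens only

    def judge(prompt_ids, candidates):
        """Placeholder preference signal: a toy length heuristic here; in
        practice a limited human label or a learned reward/judge model."""
        return 0 if candidates[0].shape[-1] >= candidates[1].shape[-1] else 1

    tok = AutoTokenizer.from_pretrained("gpt2")              # placeholder model
    policy = AutoModelForCausalLM.from_pretrained("gpt2")
    ref = copy.deepcopy(policy).eval()                       # frozen reference copy
    for p in ref.parameters():
        p.requires_grad_(False)

    opt = torch.optim.AdamW(policy.parameters(), lr=1e-6)
    beta = 0.1                                               # DPO temperature

    prompt = tok("Explain gradient descent:", return_tensors="pt").input_ids
    # 1) The policy generates two candidate responses (self-generated data).
    cands = [policy.generate(prompt, do_sample=True, max_new_tokens=64,
                             pad_token_id=tok.eos_token_id)[:, prompt.shape[-1]:]
             for _ in range(2)]
    # 2) A limited preference signal ranks the candidates.
    w = judge(prompt, cands)
    win, lose = cands[w], cands[1 - w]
    # 3) DPO-style update: widen the policy's log-probability margin on the
    #    preferred response relative to the frozen reference model.
    pi_w, pi_l = seq_logprob(policy, prompt, win), seq_logprob(policy, prompt, lose)
    with torch.no_grad():
        ref_w, ref_l = seq_logprob(ref, prompt, win), seq_logprob(ref, prompt, lose)
    loss = -F.logsigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l))).mean()
    opt.zero_grad(); loss.backward(); opt.step()

Iterating this loop, regenerating candidates from the updated policy each round, is the self-play pattern the methods above share; they differ mainly in how candidates are sampled, how preferences are obtained, and how the reference model is refreshed.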

Your Tasks

  • Survey and categorize hybrid alignment methods, focusing on those enabling double gradient flow and iterative refinement (e.g., SPIN, SPPO, I-SHEEP, SPO, RS-DPO).
  • Develop and implement a comparative framework for evaluating selected alignment strategies on text generation tasks (a minimal sketch of such a harness follows this list).
  • Conduct empirical analysis on alignment quality, data efficiency, and training stability using open datasets and benchmarks (e.g., AlpacaEval, OpenAssistant).
  • Investigate the impact of limited supervision on preference model quality and generalization.
  • Propose and evaluate refinements or novel techniques where applicable.
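
As a starting point for the comparative framework mentioned above, here is a hedged sketch of a pairwise win-rate harness; the gen_a/gen_b generation callables and the judge preference function are assumed interfaces for illustration, not prescribed by the project:

    from collections import Counter

    def win_rate(prompts, gen_a, gen_b, judge):
        """Fraction of prompts on which strategy A's output is preferred.
        gen_a/gen_b map a prompt string to a response string; judge returns
        0 if the first response is preferred, 1 otherwise."""
        tally = Counter()
        for p in prompts:
            a, b = gen_a(p), gen_b(p)
            tally["a" if judge(p, a, b) == 0 else "b"] += 1
        return tally["a"] / max(1, len(prompts))

    # Usage (hypothetical): compare an SPPO-trained checkpoint against an
    # RS-DPO one on a shared evaluation prompt set.
    # rate = win_rate(eval_prompts, sppo_model.respond, rsdpo_model.respond, judge)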

What We Offer

  • Access to computing resources (GPU clusters) and relevant datasets.
  • Regular supervision and support from researchers with expertise in language model training and alignment.
  • Opportunity to collaborate on publications or contribute to open-source implementations.
  • A focused and relevant research environment in alignment and generative AI.

Project Details

  • Prerequisites: Strong background in deep learning and NLP; experience with PyTorch/JAX and the Hugging Face ecosystem. Familiarity with large language model fine-tuning and alignment is a plus.
  • Preferred Start Date: October 2025 (flexible).
  • How to Apply: Submit a CV, a brief statement of motivation highlighting relevant background/experience, and transcripts to marton.szep@tum.de. Please indicate any prior experience with language models or alignment methods.

Contact: marton.szep@tum.de
