Technical University of Munich


Master's Thesis: Chunking German Legal Texts at the Legal Tech Professorship

15.10.2025, Diploma, Bachelor's and Master's Theses

This project's goal is to design and evaluate smarter chunking methods for German legal texts so that LLMs keep crucial context (such as cross-references and conditions). You'll analyze the structure of legal texts, implement several chunking strategies (fixed, sliding window, semantic/agentic), and build benchmarks for retrieval and QA. Some NLP/IR familiarity helps, but all skills can be learned.

Project: Chunking German Legal Texts

A project to develop and evaluate chunking strategies for German legal documents in LLM applications.

Project description

This project aims to develop a framework for chunking German legal texts for Large Language Models (LLMs). Legal texts present several challenges: they contain complex cross-references, hierarchical structures, and semantically dense passages where preserving context is necessary for correct interpretation. Traditional fixed-length chunking often breaks legal arguments apart or separates critical citations from their context.

Chunking is crucial for LLMs because it determines how documents are segmented to fit within context windows. Poor chunking strategies lose semantic relationships and produce inaccurate interpretations, which in turn cause retrieval failures. For instance, when asked whether a contract permits early termination, an LLM using poorly chunked text might retrieve only the section mentioning "termination" but miss the critical conditions, leading to incorrect legal advice. The goal is to develop and evaluate strategies that enable more accurate legal document retrieval and question-answering systems.
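To make the termination example concrete, here is a minimal Python sketch that contrasts fixed-length chunking with a sliding window on a single sentence. The sample clause, the function names (`fixed_chunks`, `sliding_chunks`), and the window sizes are illustrative assumptions, not part of the project; a real system would count tokens rather than whitespace-separated words.

```python
# Illustrative sketch only: word-based chunking on a made-up contract clause.

def fixed_chunks(words, size):
    """Split words into consecutive, non-overlapping chunks of `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def sliding_chunks(words, size, stride):
    """Split words into overlapping windows of `size` words, moving by `stride`."""
    if not 0 < stride <= size:
        raise ValueError("stride must be between 1 and size")
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(words[start:start + size])
        if start + size >= len(words):  # last window already reaches the end
            break
    return chunks


def keeps_together(chunks, term_a, term_b):
    """Return True if at least one chunk contains both terms."""
    return any(term_a in chunk and term_b in chunk for chunk in chunks)


# Hypothetical clause: the right ("terminate") and its condition ("written notice").
clause = ("The tenant may terminate the contract early only if "
          "written notice is given three months in advance.")
words = clause.split()

fixed = fixed_chunks(words, size=8)
sliding = sliding_chunks(words, size=8, stride=3)

print("fixed keeps right and condition together:  ",
      keeps_together(fixed, "terminate", "written"))
print("sliding keeps right and condition together:",
      keeps_together(sliding, "terminate", "written"))
```

With these particular settings the fixed chunker cuts between the right and its condition, while one of the overlapping windows contains both. Overlap only helps when the related passages fit into a single window, which is one reason structure-aware and semantic strategies are worth comparing.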

Objectives

  • Analyze the characteristics of legal texts to identify suitable chunking strategies.
  • Implement and compare multiple chunking strategies (e.g., fixed, sliding window, semantic-based, agentic-based).
  • Create a benchmarking setup to assess chunking quality on legal QA and retrieval tasks.
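
One simple way to score a chunking strategy on a retrieval task is the hit rate at k: the share of questions for which at least one relevant chunk appears among the top k retrieved chunks. The sketch below assumes this metric and uses made-up question and chunk identifiers; the posting does not prescribe a particular metric or data format.

```python
# Illustrative sketch only: hit rate at k over hypothetical retrieval results.

def hit_rate_at_k(ranked, relevant, k):
    """Fraction of questions with at least one relevant chunk in the top k.

    ranked:   dict mapping question id -> list of chunk ids, best first
    relevant: dict mapping question id -> set of relevant chunk ids
    """
    if not relevant:
        return 0.0
    hits = 0
    for question_id, relevant_ids in relevant.items():
        top_k = ranked.get(question_id, [])[:k]
        if any(chunk_id in relevant_ids for chunk_id in top_k):
            hits += 1
    return hits / len(relevant)


# Made-up results for two questions under one chunking strategy.
ranked = {
    "q1": ["c3", "c1", "c7"],
    "q2": ["c2", "c9", "c4"],
}
relevant = {
    "q1": {"c1"},
    "q2": {"c4"},
}

for k in (1, 2, 3):
    print(f"hit rate at {k}: {hit_rate_at_k(ranked, relevant, k):.2f}")
```

Running the same questions against the chunks produced by each strategy and comparing the scores gives a first, coarse comparison; a fuller benchmark would also assess answer quality on the QA task.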

Requirements

  • Understanding of legal document structures.
  • Experience with natural language processing and text segmentation.
  • Familiarity with semantic similarity measures and embedding models.
  • Basic knowledge of information retrieval.

How to apply

All necessary skills can be learned during the project, so feel free to apply. Contact max.prior@tum.de with your CV and university transcripts.

Contact: max.prior@tum.de

More Information

https://www.cs.cit.tum.de/en/lt/tum-legal-tech-working-group/
