Master's Thesis: Chunking German Legal Texts
15.10.2025, Theses, Bachelor's and Master's Theses
Project description
This project aims to develop a framework for chunking German legal texts for use with Large Language Models (LLMs). Legal texts pose several challenges: they contain complex cross-references, hierarchical structures, and semantically dense passages in which context must be preserved for correct interpretation. Traditional fixed-length chunking often splits legal arguments or separates critical citations from their context.
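Because statutes are already divided into sections (§) that cite one another, one natural baseline is to split at section boundaries rather than at a fixed length. The Python sketch below illustrates this idea under simplifying assumptions: it assumes each section starts on its own line with a heading such as `§ 2`, and it detects only simple citations of the form `§ <number>`. The function names `split_by_section` and `cross_references` and the sample statute are hypothetical, invented for illustration; real sources and real citation patterns (paragraph and sentence references, citations of other statutes) would need more robust handling.

```python
import re

# Assumption: every section heading sits at the start of a line, e.g. "§ 2 Kündigungsfrist".
SECTION_START = re.compile(r"^§\s*\d+[a-z]?\b", re.MULTILINE)
# Assumption: citations look like "§ 2" or "§ 12a"; a real citation grammar is far richer.
CROSS_REF = re.compile(r"§\s*(\d+[a-z]?)")


def split_by_section(text: str) -> list[str]:
    """Split a statute at section (§) headings so that every section stays in one chunk."""
    starts = [m.start() for m in SECTION_START.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading as its own chunk
    bounds = starts + [len(text)]
    chunks = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c]


def cross_references(chunk: str) -> list[str]:
    """Return the section numbers cited in a chunk, ignoring the chunk's own heading."""
    heading = SECTION_START.match(chunk)
    body = chunk[heading.end():] if heading else chunk
    return CROSS_REF.findall(body)


# Hypothetical two-section statute, invented for this example.
statute = (
    "§ 1 Kündigung\n"
    "Der Vertrag kann nach Maßgabe des § 2 gekündigt werden.\n"
    "§ 2 Kündigungsfrist\n"
    "Die Kündigungsfrist beträgt drei Monate.\n"
)

for section in split_by_section(statute):
    print(section.splitlines()[0], "->", cross_references(section))
# § 1 Kündigung -> ['2']
# § 2 Kündigungsfrist -> []
```

The extracted references could, for example, be used to attach a cited section to the chunk that cites it at retrieval time; whether that actually improves answers is the kind of question a benchmark would have to settle.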
Chunking is crucial for LLMs because it determines how documents are segmented to fit within context windows. Poor chunking strategies lose the semantic relationships between passages, which leads to retrieval failures and inaccurate interpretations. For instance, when asked whether a contract permits early termination, an LLM working with poorly chunked text might retrieve only the section mentioning “termination” but miss the critical conditions attached to it, leading to incorrect legal advice. The goal is to develop and evaluate chunking strategies that enable more accurate legal document retrieval and question answering.
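The termination example can be made concrete in a few lines of Python. The sketch assumes plain fixed-length character chunking, a short contract clause invented for illustration, and a deliberately naive keyword match standing in for a real retriever (none of these come from the project itself). The only chunk that matches "terminat" is the one that lacks the notice period and the waiting period, and the word "Termination" itself is cut in half.

```python
def fixed_length_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive, non-overlapping chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical contract clause, invented for this example.
clause = (
    "Either party may terminate this agreement early. "
    "Termination requires three months' written notice "
    "and is only permitted after the first contract year."
)

chunks = fixed_length_chunks(clause, size=50)

# Naive stand-in for a retriever: keep chunks whose text mentions termination.
hits = [c for c in chunks if "terminat" in c.lower()]
print(hits)  # ['Either party may terminate this agreement early. T']
```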
Objectives
- Analyze the characteristics of German legal texts to identify the most suitable chunking strategies.
- Implement and compare multiple chunking strategies (e.g., fixed-length, sliding-window, semantic, and agentic chunking).
- Create a benchmarking setup to assess chunking quality on legal QA and retrieval tasks.
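The comparison and benchmarking objectives can be illustrated with a small sketch. It assumes that chunks and gold evidence are both represented as character offsets, so that strategies producing different chunk boundaries can be scored against the same annotations, and it uses a simple span-based recall@k (the fraction of gold evidence spans lying entirely inside one of the top-k chunks) as a stand-in for whatever metrics the benchmark eventually adopts. The helper names, offsets, and metric definition are illustrative assumptions. The ranking step, which a real setup would obtain from a retriever such as BM25 or an embedding model, is omitted here.

```python
Span = tuple[int, int]  # (start, end) character offsets, end exclusive


def sliding_window_spans(n_chars: int, size: int, stride: int) -> list[Span]:
    """Offsets of windows of `size` chars advancing by `stride`; stride == size means no overlap."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    spans: list[Span] = []
    for start in range(0, n_chars, stride):
        end = min(start + size, n_chars)
        spans.append((start, end))
        if end == n_chars:  # the last window reaches the end of the text
            break
    return spans


def span_recall_at_k(ranked: list[Span], gold: list[Span], k: int) -> float:
    """Fraction of gold evidence spans lying entirely inside one of the top-k chunks."""
    if not gold:
        return 0.0
    top = ranked[:k]
    found = sum(any(cs <= gs and ge <= ce for cs, ce in top) for gs, ge in gold)
    return found / len(gold)


n_chars = 151      # e.g. the length of a short clause
gold = [(30, 70)]  # hypothetical evidence span that crosses the 50-character boundary

fixed = sliding_window_spans(n_chars, size=50, stride=50)
overlapping = sliding_window_spans(n_chars, size=50, stride=25)

# k = number of chunks isolates the effect of chunking from any retriever ranking.
print(span_recall_at_k(fixed, gold, k=len(fixed)))              # 0.0
print(span_recall_at_k(overlapping, gold, k=len(overlapping)))  # 1.0
```

Setting `stride == size` reproduces fixed-length chunking, so both strategies are scored by the same code. The overlap recovers the boundary-crossing span at the cost of more chunks to index (six instead of four here).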
Requirements
- Understanding of legal document structures.
- Experience with natural language processing and text segmentation.
- Familiarity with semantic similarity measures and embedding models.
- Basic knowledge of information retrieval.
How to apply
All necessary skills can be learned during the project, so feel free to apply. Send your CV and university transcripts to max.prior@tum.de.
Contact: max.prior@tum.de
More information
https://www.cs.cit.tum.de/en/lt/tum-legal-tech-working-group/