Technische Universität München

Master's Thesis: Chunking German Legal Texts

15.10.2025, Theses, Bachelor's and Master's Theses

A project to develop and evaluate chunking strategies for German legal documents in LLM applications.

Project description

This project aims to develop a framework for chunking German legal texts for use with Large Language Models (LLMs). Legal texts present several challenges: they contain complex cross-references, hierarchical structures, and semantically dense passages where preserving context is necessary for correct interpretation. Traditional fixed-length chunking often breaks legal arguments apart or separates critical citations from their context.
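To make the contrast concrete, the following is a minimal sketch, not part of the project itself. It compares fixed-length chunking with a simple structure-aware split at section markers. The function names, the `SAMPLE` text, and the assumption that every section starts on its own line with `§` followed by a number are all illustrative; real statutes and contracts have richer structure (paragraphs, sentences, numbered items).

```python
import re


def fixed_length_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def section_chunks(text: str) -> list[str]:
    """Split text before each line that starts with a marker such as '§ 1'.

    Assumption for this sketch: sections begin at the start of a line.
    A '§' reference in the middle of a line is not treated as a boundary.
    """
    parts = re.split(r"(?m)^(?=§\s*\d+)", text)
    return [part.strip() for part in parts if part.strip()]


# Hypothetical two-section example with a cross-reference from § 1 to § 2.
SAMPLE = (
    "§ 1 Kündigung\n"
    "Der Vertrag kann nur unter den Voraussetzungen des § 2 gekündigt werden.\n"
    "§ 2 Voraussetzungen\n"
    "Die Kündigung bedarf der Schriftform.\n"
)

if __name__ == "__main__":
    for chunk in fixed_length_chunks(SAMPLE, 40):
        print(repr(chunk))
    for chunk in section_chunks(SAMPLE):
        print(repr(chunk))
```

The fixed-length variant cuts wherever the character budget runs out, so a sentence and the citation it depends on can land in different chunks. The section-based variant keeps each section whole, but it still does not resolve the cross-reference from § 1 to § 2.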

Chunking is crucial for LLMs because it determines how documents are segmented to fit within context windows. Poor chunking strategies lead to lost semantic relationships, retrieval failures, and inaccurate interpretations. For instance, when asked whether a contract permits early termination, an LLM using poorly chunked text might retrieve only the section mentioning “termination” but miss the critical conditions, leading to incorrect legal advice. The goal is to develop and evaluate strategies that enable more accurate legal document retrieval and question-answering systems.

Objectives

  • Analyze characteristics of legal texts to identify suitable chunk boundaries and chunking strategies.
  • Implement and compare multiple chunking strategies (e.g., fixed-length, sliding-window, semantic, agentic).
  • Create a benchmarking setup to assess chunking quality on legal QA and retrieval tasks.
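As a rough illustration of the second and third objectives, the sketch below shows a sliding-window chunker over tokens and a simple hit-rate-at-k measure for retrieval. Both are assumptions made for illustration: the announcement does not prescribe a tokenizer, a window size, or an evaluation metric, and the names `sliding_window_chunks` and `hit_rate_at_k` are hypothetical.

```python
def sliding_window_chunks(
    tokens: list[str], size: int, overlap: int
) -> list[list[str]]:
    """Return windows of `size` tokens, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + size >= len(tokens):
            break
    return chunks


def hit_rate_at_k(
    ranked_ids: list[list[str]], relevant_ids: list[set[str]], k: int
) -> float:
    """Fraction of queries with at least one relevant chunk id among the top k results."""
    if len(ranked_ids) != len(relevant_ids):
        raise ValueError("one ranking and one relevance set per query")
    if not ranked_ids:
        return 0.0
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids, relevant_ids)
        if relevant & set(ranked[:k])
    )
    return hits / len(ranked_ids)


if __name__ == "__main__":
    # Whitespace tokenization is a simplification for this sketch.
    words = "Die Kündigung bedarf der Schriftform und ist zu begründen".split()
    print(sliding_window_chunks(words, size=4, overlap=2))
    print(hit_rate_at_k([["c1", "c2"], ["c3", "c4"]], [{"c2"}, {"c9"}], k=2))
```

A benchmark in this spirit would run each chunking strategy over the same corpus, index the resulting chunks, and compare scores on a fixed set of questions with annotated relevant passages. The overlap trades index size against the chance that a sentence and its conditions end up in the same chunk.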

Requirements

  • Understanding of legal document structures.
  • Experience with natural language processing and text segmentation.
  • Familiarity with semantic similarity measures and embedding models.
  • Basic knowledge of information retrieval.

How to apply

All necessary skills can be learned during the project, so feel free to apply. Contact max.prior@tum.de with your CV and university transcripts.

Contact: max.prior@tum.de

More information

https://www.cs.cit.tum.de/en/lt/tum-legal-tech-working-group/
