MyTUM-Portal
Technische Universität München


Master thesis or research internship - Influence of Novel Text Data on Large Language Models

22.01.2025, Theses, Bachelor's and Master's Theses

External master thesis or research internship supervised by the Chair of Media Technology and Sureel.ai

Generative AI models demonstrate strong performance across many domains. Models such as GPT-4, trained on billions of text documents, can generate correct text samples even for highly complex and challenging prompts. However, lawsuits such as The New York Times suing OpenAI have shown that these models face significant copyright challenges.

One solution to these copyright challenges is to properly assess the influence of the training data on any generated text. While this is practically infeasible for the foundational training data of billions of text documents, assessing the influence of smaller fine-tuning data sets is more attainable. At Sureel Inc., a framework for identifying the influence of new image and music data on generative AI models is already available. Assessing the influence of text documents and making the output of LLMs more interpretable remains an open research question.

To that end, the concepts used to relate generated image and audio data to their training data should be transferred to text synthesis. The focus lies on identifying the impact that new text documents supplied to LLMs such as Llama 3 have on the text samples subsequently synthesized by the model. Existing methods for identifying the sources these models use in their predictions should be investigated and combined with the techniques already available at Sureel. The goal is to develop a framework that computes the percentage-wise influence of specific texts supplied to Llama 3, as a representative state-of-the-art LLM, on its generated output.
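
To illustrate what "percentage-wise influence" could mean in practice, the following minimal Python sketch turns raw per-document influence scores into percentage shares. It is only an illustration under stated assumptions, not Sureel's framework or a method prescribed by this posting: the function name `percentage_influence`, the document identifiers, and the example scores are hypothetical, and the choice to clip negative scores to zero before normalizing is one possible convention among several. How the raw scores are obtained from the LLM is exactly the open question of the thesis and is not covered here.

```python
from typing import Dict


def percentage_influence(raw_scores: Dict[str, float]) -> Dict[str, float]:
    """Convert raw per-document influence scores into percentage shares.

    Assumption: raw_scores come from some attribution method (not shown).
    Negative scores are clipped to zero, which is an illustrative choice.
    """
    # Clip negative scores so that shares are non-negative.
    clipped = {doc: max(score, 0.0) for doc, score in raw_scores.items()}
    total = sum(clipped.values())
    if total == 0.0:
        # No document has positive influence; report zero for all of them.
        return {doc: 0.0 for doc in clipped}
    # Normalize so that the shares sum to 100.
    return {doc: 100.0 * score / total for doc, score in clipped.items()}


# Hypothetical scores for three fine-tuning documents.
shares = percentage_influence({"doc_a": 3.0, "doc_b": 1.0, "doc_c": -0.5})
print(shares)
```

With these made-up inputs, `doc_a` receives three times the share of `doc_b`, and `doc_c` receives no share because its score is negative.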

This thesis will be conducted externally at Sureel Inc., a Silicon Valley-based startup with an office in Munich specializing in explainable and legal generative AI content.

Requirements: Experience with Python and generative AI

Project type: Master thesis or research internship

Contact: christopher@sureel.ai

More information

http://www.sureel.ai
