Technical University of Munich


Master Thesis - Large Vision-Language Models for Autonomous System

21.07.2025, Diploma, Bachelor's and Master's Theses

Are you captivated by the power of Vision-Language Models to transform autonomous driving and intelligent traffic systems? Join us in exploring how large-scale multimodal models can interpret complex traffic scenes and enhance 3D perception using real-world data.

Vision-Language Models are emerging as powerful tools for bridging perception and reasoning in complex multimodal environments. Their application in autonomous driving and intelligent infrastructure offers new ways to enhance safety, interpret traffic scenes, and build structured knowledge for decision-making.

We are currently looking for highly motivated students to join our research in the following directions:

  • Explore how vision-language models can enhance perception, reasoning, and decision-making in real-world traffic scenarios. Focus areas include multimodal scene understanding, vision-language training, and fine-grained spatio-temporal reasoning in dynamic urban environments.
  • Develop and evaluate simulation-ready 3D world models from real traffic sensor data, involving dynamic scene generation, trajectory modeling, and structured representations to support downstream tasks such as planning and prediction in autonomous systems.

Topic Description:

Vision-Language Models in Autonomous Driving and Intelligent Traffic Infrastructure:
Leveraging large multimodal LLMs to interpret complex traffic scenes. Key directions include enhancing perception and reasoning through visual-linguistic alignment, spatio-temporal video understanding, and traffic-aware retrieval using real-world sensor data.

World Models and Data Generation for 3D Traffic Environment Understanding:
Building 3D world models from multi-sensor infrastructure data. Tasks include dynamic scene generation, structured environment representation, and trajectory modeling to support planning, prediction, and decision-making in autonomous systems.

Requirements:

  • Background in computer science, electrical engineering, physics, or mathematics
  • Solid understanding of machine learning and deep learning
  • Good programming skills in Python and at least one deep learning framework
  • Self-motivation and the ability to work independently
  • Bonus: Familiarity with Docker, Git, Slurm

What you will gain:

  • Hands-on experience with state-of-the-art vision-language models
  • Access to real sensor infrastructure data from urban and highway environments
  • Opportunity to work in a highly interdisciplinary research team
  • Potential to contribute to top-tier publications

Reference:
X. Zhou et al., “Vision Language Models in Autonomous Driving: A Survey and Outlook”, IEEE Transactions on Intelligent Vehicles, 2024. Available at: https://ieeexplore.ieee.org/document/10531702

Contact: xingcheng.zhou@tum.de
