Towards More Informative 3D Scene Graphs for Visual Reasoning
13.10.2025, Student Assistants, Internships, Student Theses
Recent advances in vision–language models have enhanced our ability to interpret complex visual scenes. However, these systems often struggle with structured and spatial reasoning. This project aims to develop methods for improved visual understanding that combine vision and language representations for interpretable, structured perception and reasoning.
You will explore modern foundation models and scene-graph methods such as BLIP-2, PRISM-0, HOV-SG, ROOT, and Panoptic Scene Graph Generation, and investigate how their reasoning capabilities can be extended to create more coherent and spatially aware representations for real-world robotic or AI applications.
Start Date: Winter Semester 2025
Location: Technical University of Munich (in-person participation required; occasional remote work possible)
Application Deadline: 24.10.2025
Requirements:
- Strong programming skills in Python; experience with PyTorch or similar deep learning frameworks
- Familiarity with computer vision and vision–language models
- Interest in multi-modal reasoning, representation learning, or robotic perception
Contact:
Panagiotis Petropoulakis
Chair of Robotics, Artificial Intelligence and Real-Time Systems
Technical University of Munich
Keywords: Visual Reasoning, Multi-Modal AI, Computer Vision, Robotics, Deep Learning, Representation Learning, Vision-Language Models
Contact email: panagiotis.petropoulakis@tum.de
More Information
MSc: Towards More Informative 3D Scene Graphs for Visual Reasoning (PDF file, 267.5 kB)