Master’s Thesis – Real-Time Multi-Modal Sensor Fusion with BEV Models in NVIDIA DeepStream
19.08.2025, Theses, Bachelor's and Master's Theses
The Chair of Robotics, Artificial Intelligence, and Real-Time Systems offers a Master's thesis opportunity in the field of multi-modal sensor fusion. The project focuses on extending BEVFusion to integrate 8 cameras, 1 LiDAR, and 4 RADAR sensors within a real-time NVIDIA DeepStream pipeline. The goal is to reduce inference latency from roughly 35 ms to around 20 ms while improving robustness and accuracy in 360° perception.
Motivation & Relevance
Autonomous driving relies on accurate and fast perception of the environment. State-of-the-art Bird's Eye View (BEV) models such as BEVFusion and NVIDIA's CUDA-BEVFusion enable joint reasoning over camera and LiDAR inputs, but current setups often exceed the latency budgets of safety-critical applications. By integrating RADAR and optimizing the fusion pipeline within NVIDIA's DeepStream framework, we aim to achieve real-time performance with enhanced reliability across modalities.
Project Description
You will design and implement a multi-modal fusion system capable of:
- Extending BEVFusion to handle 8 cameras, 1 LiDAR, and 4 RADAR sensors
- Integrating all sensors into a real-time NVIDIA DeepStream pipeline (a minimal pipeline sketch follows this list)
- Optimizing inference for < 20 ms latency
- Evaluating trade-offs between accuracy, robustness, and runtime
- Benchmarking performance on simulated or recorded multi-sensor datasets
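To give a flavor of the DeepStream side, the sketch below shows one way to batch several camera streams through nvstreammux and feed them to a TensorRT engine via nvinfer, using DeepStream's Python bindings. This is a minimal illustration under assumptions, not the thesis codebase: the source URIs and the nvinfer config file are placeholders, and LiDAR/RADAR ingestion (which requires custom sources or plugins) is omitted.

```python
# Hedged sketch: batching multiple camera streams with DeepStream's
# nvstreammux and running a TensorRT engine via nvinfer.
# All file paths are placeholders; LiDAR/RADAR ingestion is omitted.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import GLib, Gst

Gst.init(None)

NUM_CAMERAS = 8  # matches the sensor setup described above

pipeline = Gst.Pipeline.new("bev-pipeline")

# nvstreammux gathers one frame per source into a single batched buffer,
# so nvinfer can run one TensorRT inference over all camera views at once.
mux = Gst.ElementFactory.make("nvstreammux", "mux")
mux.set_property("batch-size", NUM_CAMERAS)
mux.set_property("width", 1920)
mux.set_property("height", 1080)
mux.set_property("batched-push-timeout", 40000)  # µs to wait for a full batch
pipeline.add(mux)

for i in range(NUM_CAMERAS):
    # uridecodebin handles demuxing/decoding; swap in live camera URIs later.
    src = Gst.ElementFactory.make("uridecodebin", f"cam-{i}")
    src.set_property("uri", f"file:///data/cam{i}.mp4")  # placeholder clips
    pipeline.add(src)

    # Link each decoded stream to its requested muxer pad once the decoder
    # exposes a source pad (binding sink_pad avoids late-closure bugs).
    def on_pad_added(_, pad, sink_pad=mux.get_request_pad(f"sink_{i}")):
        pad.link(sink_pad)

    src.connect("pad-added", on_pad_added)

# nvinfer wraps the TensorRT engine; its config file would point at the
# (hypothetical) BEV engine plus pre-/post-processing settings.
infer = Gst.ElementFactory.make("nvinfer", "bev-infer")
infer.set_property("config-file-path", "bev_infer_config.txt")  # placeholder
sink = Gst.ElementFactory.make("fakesink", "sink")  # replace with OSD/display
pipeline.add(infer)
pipeline.add(sink)
mux.link(infer)
infer.link(sink)

pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()
```

Depending on the platform and decoder, an nvvideoconvert stage into NVMM memory may also be needed between the decoders and the muxer.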
Your Tasks
- Adapt and optimize the BEVFusion model for multi-modal input
- Profile and reduce inference latency using TensorRT and DeepStream tooling (see the engine-build sketch after this list)
- Evaluate fusion performance with respect to detection accuracy and timing
- Document results and propose strategies for real-world deployment
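As an example of the TensorRT part of the optimization work, the following hedged sketch builds an FP16 engine from an exported ONNX model using the TensorRT 8.x Python API. FP16 is only one lever (others include INT8 quantization, layer fusion, and custom plugins), and "bev_model.onnx" is a placeholder; real BEVFusion deployments typically split the network into several engines.

```python
# Hedged sketch (TensorRT 8.x API): build an FP16 engine from ONNX.
# "bev_model.onnx" is a placeholder, not an artifact of this project.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("bev_model.onnx", "rb") as f:  # placeholder model file
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # FP16 often roughly halves latency
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB

# Serialize the optimized engine for loading by nvinfer or the TRT runtime.
engine_bytes = builder.build_serialized_network(network, config)
with open("bev_model_fp16.engine", "wb") as f:
    f.write(engine_bytes)
```

Profiling the resulting engine with trtexec or NVIDIA Nsight Systems then shows where the remaining milliseconds are spent.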
Your Profile
- Master’s student in Computer Science, Robotics, Electrical Engineering, or related field
- Solid programming skills in C++; Python experience is a plus
- Familiarity with NVIDIA DeepStream and TensorRT is beneficial
- Knowledge of multi-modal perception (camera, LiDAR, RADAR) is a plus
What You Will Gain
- Hands-on experience with state-of-the-art multi-modal fusion techniques
- Practical expertise in NVIDIA’s DeepStream and TensorRT optimization
- Contribution to real-time perception for autonomous driving
- Insights into deploying AI models in safety-critical environments
How to Apply
Please include your CV and a transcript of your grades in your application.
Contact: erik-leo.hass@tum.de