MyTUM-Portal
Technical University of Munich

Deep fusion for Multiple-view 6D pose estimation in the Operating Room

09.09.2024, Diploma, Bachelor's and Master's Theses

The Human-Centered Computing and Extended Reality Lab of the Professorship for Machine Intelligence in Orthopedics seeks applicants for a Bachelor's or Master's thesis in the winter semester 2024/25. Applications are accepted until 30 September 2024.

Overview

For artificial intelligence to handle real data as well as possible, the training data must resemble the real data as closely as possible. This is particularly challenging in medical applications, where image data is highly varied and diverse. This project aims to create maximally realistic data of surgical instruments before and during an operation in order to learn to estimate their poses. Instruments that are covered in blood or have reflective metallic surfaces are particularly challenging. The project builds on preliminary work and includes creating AI models for surgical assistance.

Background & Motivation

The situation in the operating room is often very complex, as many different instruments have to be prepared and used. To support the staff, the KARVIMIO project investigates how multi-view RGB-D cameras can learn to identify individual instrument parts and their poses without any optical markers. This information is passed on to augmented reality head-mounted displays, which show directly which parts need to be picked and how they are to be assembled and applied. Instruments with bloody or reflective metallic surfaces should be detected and their poses estimated using multiple-view sensor fusion. The proposed project can build on preliminary work, which it should optimize and evaluate to this end.

Related Work and Approach

Multi-sensor fusion is essential for an accurate and reliable augmented reality system. Recent approaches are based on fusing color and depth cameras; this can be realized with convolutional neural networks in an efficient, generic multi-task multi-sensor fusion framework. Our current framework enables synthetic rendering for training pose estimation and can be extended with novel fusion approaches.
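Multi-view fusion can happen at the feature level, as in the frameworks cited below, or, most simply, at the decision level. As a naive baseline for the latter, per-view 6D pose estimates expressed in a common world frame can be fused by averaging: a weighted mean for the translations and eigenvector-based quaternion averaging (Markley's method) for the rotations. A minimal NumPy sketch; the function names are illustrative and not part of the lab's framework:

```python
import numpy as np

def average_quaternions(quats, weights=None):
    """Average unit quaternions (w, x, y, z) via the eigenvector of the
    weighted sum of outer products; this handles the q/-q sign ambiguity."""
    quats = np.asarray(quats, dtype=float)
    if weights is None:
        weights = np.ones(len(quats))
    M = np.zeros((4, 4))
    for q, w in zip(quats, weights):
        q = q / np.linalg.norm(q)
        M += w * np.outer(q, q)
    # Eigenvector of the largest eigenvalue (eigh returns ascending order).
    avg = np.linalg.eigh(M)[1][:, -1]
    return avg if avg[0] >= 0 else -avg

def fuse_poses(rotations_q, translations, weights=None):
    """Fuse per-view pose estimates (already in a common world frame)
    into one pose: quaternion averaging plus weighted mean translation."""
    t = np.average(np.asarray(translations, dtype=float), axis=0, weights=weights)
    q = average_quaternions(rotations_q, weights)
    return q, t
```

For two equally weighted views this reduces to the geodesic midpoint of the two rotations; in practice the weights could come from per-view detection confidences.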

Student’s Task

The primary objectives of this thesis are:

  • Integrate real datasets from the operating room: Incorporate real multi-view medical scenes recorded in operating rooms to test the accuracy of the framework. This includes basic camera calibration, scene reconstruction, and 6D pose labeling.
  • Develop a 3D vision network for multi-view fusion: Based on the currently available framework, develop an enhanced framework for efficient RGB-D fusion that performs real-time 6D pose estimation from multiple views.
  • Pipeline evaluation and benchmarking: Finally, apply the newly developed pipeline to real medical scenes and compare it with current state-of-the-art approaches.
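For the benchmarking step, a standard score in the 6D pose estimation literature (used, e.g., in the MV6D and SymFM6D papers listed below) is the ADD metric: the mean distance between the object's model points transformed by the predicted pose and by the ground-truth pose, with a pose counted as correct when this error is below 10% of the object diameter. A minimal NumPy sketch with illustrative function names:

```python
import numpy as np

def add_metric(R_pred, t_pred, R_gt, t_gt, model_points):
    """ADD error: mean distance between model points under the
    predicted pose and under the ground-truth pose."""
    pred = model_points @ R_pred.T + t_pred
    gt = model_points @ R_gt.T + t_gt
    return np.linalg.norm(pred - gt, axis=1).mean()

def add_accuracy(errors, diameters, threshold=0.1):
    """Fraction of poses whose ADD error is below `threshold` times
    the object diameter (the common ADD-0.1d benchmark score)."""
    errors = np.asarray(errors, dtype=float)
    diameters = np.asarray(diameters, dtype=float)
    return float(np.mean(errors < threshold * diameters))
```

Note that ADD assumes asymmetric objects; for symmetric instruments the ADD-S variant (nearest-point distance) is normally reported instead.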

Technical Prerequisites

Students should be familiar with PyTorch and have basic knowledge of 3D vision (e.g., stereo camera calibration) as well as of 3D vision networks (e.g., PointNet++, 3D CNNs). They should be strongly motivated for technical development.
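The calibration prerequisite ultimately rests on the pinhole camera model, which maps a 3D point in the camera frame to pixel coordinates via the intrinsic matrix K; multi-view reasoning chains this with the extrinsic transform between cameras. A minimal NumPy sketch (names are illustrative):

```python
import numpy as np

def project_points(points_cam, K):
    """Pinhole projection: divide camera-frame points by depth Z,
    then apply the intrinsic matrix K to get pixel coordinates."""
    points_cam = np.asarray(points_cam, dtype=float)
    uv = (K @ (points_cam / points_cam[:, 2:3]).T).T
    return uv[:, :2]

def transform_points(points, R, t):
    """Rigid transform between frames, e.g. world -> camera: p' = R p + t."""
    return np.asarray(points, dtype=float) @ np.asarray(R).T + np.asarray(t)
```

A point on the optical axis projects to the principal point (cx, cy); stereo calibration estimates K per camera plus the rigid transform (R, t) between the two views.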

Please send your transcript of records, CV, and letter of motivation to Shiyu Li (shiyu.li@tum.de), with CC to hex-thesis.ortho@mh.tum.de.


Literature
Fabian Duffhauss et al. "SymFM6D: Symmetry-aware multi-directional fusion for multi-view 6D object pose estimation". IEEE Robotics and Automation Letters (2023).
Zhijian Liu et al. "BEVFusion: Multi-task multi-sensor fusion with unified bird's-eye view representation". 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 2774–2781.
Danila Rukhovich, Anna Vorontsova, and Anton Konushin. "TR3D: Towards real-time indoor 3D object detection". 2023 IEEE International Conference on Image Processing (ICIP).
Fabian Duffhauss, Tobias Demmler, and Gerhard Neumann. "MV6D: Multi-view 6D pose estimation on RGB-D frames using a deep point-wise voting network". 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

Contact: hex-thesis.ortho@mh.tum.de, shiyu.li@tum.de

More Information

https://hex-lab.io