Technische Universität München


Deep fusion for Multiple-view 6D pose estimation in the Operating Room

09.09.2024, Theses, Bachelor's and Master's Theses

Overview

For artificial intelligence to handle real data well, the training data must resemble the real data as closely as possible. This is particularly challenging in medical applications because the image data is highly diverse. This project aims to generate realistic data of surgical instruments before and during an operation in order to learn to estimate their positions. Instruments covered in blood and instruments with reflective metallic surfaces are particularly challenging. The project builds on preliminary work and includes creating AI models for surgical assistance.

Background & Motivation

The situation in the operating room is often very complex, as many different instruments have to be prepared and used. To support the staff, the KARVIMIO project is researching how multi-view RGB-D camera setups can be used to identify the individual parts and their poses without optical markers. This information is passed on to augmented reality head-mounted displays, which show directly which parts need to be picked and how they are to be assembled and applied. Instruments with bloody and reflective metallic surfaces should be detected and their poses estimated using multi-view sensor fusion. The proposed project can build on preliminary work and should optimize and evaluate it to this end.

Related Work and Approach

Multi-sensor fusion is essential for an accurate and reliable augmented reality system. Recent approaches fuse color and depth camera data. This can be realized with convolutional neural networks in an efficient and generic multi-task multi-sensor fusion framework. Our current framework supports synthetic rendering for training pose estimation and can be extended with novel fusion approaches.
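
As a rough illustration of the geometry that underlies multi-view depth fusion, the sketch below lifts each depth image to 3D points with a pinhole camera model and merges all views in a common world frame. It is not the project's framework, which is learning-based; the function names, the `(depth, intrinsics, cam_to_world)` view layout, and the convention that non-positive depth means "no measurement" are assumptions made for this example. It is written in plain Python to stay self-contained; in practice the same steps would typically be vectorized with NumPy or PyTorch.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]  # row-major 4x4 camera-to-world transform


def backproject(depth: Sequence[Sequence[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point]:
    """Lift a depth map to 3D points in the camera frame (pinhole model)."""
    points = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column
            if z <= 0.0:                   # assumption: non-positive depth = missing
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def transform(points: Sequence[Point], cam_to_world: Matrix) -> List[Point]:
    """Apply a rigid 4x4 transform (rotation and translation) to each point."""
    out = []
    for x, y, z in points:
        out.append(tuple(
            cam_to_world[r][0] * x + cam_to_world[r][1] * y
            + cam_to_world[r][2] * z + cam_to_world[r][3]
            for r in range(3)
        ))
    return out


def fuse_views(views) -> List[Point]:
    """Merge calibrated depth views into one world-frame point cloud.

    Each view is a (depth, (fx, fy, cx, cy), cam_to_world) triple.
    """
    cloud = []
    for depth, (fx, fy, cx, cy), cam_to_world in views:
        cloud.extend(transform(backproject(depth, fx, fy, cx, cy), cam_to_world))
    return cloud
```

The camera-to-world matrices come from extrinsic calibration, so the quality of the merged cloud depends directly on how well the cameras are calibrated against each other.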

Student’s Task

The primary objectives of this thesis are:

  • Integrate real datasets for the operating room: Incorporate real multi-view medical scenes from operating rooms to test the accuracy of the framework. This includes basic camera calibration, scene reconstruction, and 6D pose labeling.
  • Develop a 3D vision network for multi-view fusion: Building on the currently available framework, the student should develop an enhanced framework for efficient RGB-D fusion that achieves real-time 6D pose estimation from multiple views.
  • Pipeline evaluation and benchmarking: Finally, the newly developed pipeline should be applied to real medical scenes and compared with current state-of-the-art approaches.
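
For the benchmarking step, the posting does not name an evaluation metric. One commonly used option in the 6D pose literature is the average distance of model points (ADD) between the ground-truth and the predicted pose, sketched below under that assumption. The function names and the 10% object-diameter threshold are illustrative choices, not requirements of the project, and symmetric instruments would likely need a symmetry-aware variant instead.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
# A pose is a (3x3 rotation matrix, translation vector) pair.
Pose = Tuple[Sequence[Sequence[float]], Sequence[float]]


def apply_pose(pose: Pose, p: Point) -> Point:
    """Transform a model point into the camera frame: R @ p + t."""
    rot, trans = pose
    return tuple(
        rot[r][0] * p[0] + rot[r][1] * p[1] + rot[r][2] * p[2] + trans[r]
        for r in range(3)
    )


def add_error(model_points: Sequence[Point], gt_pose: Pose, pred_pose: Pose) -> float:
    """Mean Euclidean distance between corresponding points under both poses."""
    total = 0.0
    for p in model_points:
        total += math.dist(apply_pose(gt_pose, p), apply_pose(pred_pose, p))
    return total / len(model_points)


def is_correct(add: float, diameter: float, threshold: float = 0.1) -> bool:
    """Count a pose as correct if ADD is below a fraction of the object diameter."""
    return add < threshold * diameter
```

Reporting the fraction of test frames for which `is_correct` holds gives a single accuracy figure that can be compared across methods, provided all methods are evaluated on the same model points and threshold.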

Technical Prerequisites

Students should be familiar with PyTorch and have basic knowledge of 3D vision, e.g. stereo camera calibration, and of 3D vision networks, e.g. PointNet++ and 3D CNNs. They should be strongly motivated to do technical development.

Please send your transcript of records, CV, and a statement of motivation to Shiyu Li (shiyu.li@tum.de), with CC to hex-thesis.ortho@mh.tum.de.


Literature
Duffhauss, Fabian, et al. "SyMFM6D: Symmetry-aware multi-directional fusion for multi-view 6D object pose estimation." IEEE Robotics and Automation Letters, 2023.
Liu, Zhijian, et al. "BEVFusion: Multi-task multi-sensor fusion with unified bird's-eye view representation." 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 2774–2781.
Rukhovich, Danila, Anna Vorontsova, and Anton Konushin. "TR3D: Towards real-time indoor 3D object detection." 2023 IEEE International Conference on Image Processing (ICIP). IEEE, 2023.
Duffhauss, Fabian, Tobias Demmler, and Gerhard Neumann. "MV6D: Multi-view 6D pose estimation on RGB-D frames using a deep point-wise voting network." 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022.

Contact: hex-thesis.ortho@mh.tum.de, shiyu.li@tum.de

More information

https://hex-lab.io