Master's Thesis: Vision-Language Pretraining for Bone Tumor Classification
04.12.2024, Diploma Theses, Bachelor's and Master's Theses
This Master's thesis focuses on improving bone tumor classification from X-ray images using vision-language pretraining. By leveraging large public datasets and incorporating anatomical context through captions, it aims to address the challenges of data scarcity and anatomical heterogeneity. The work involves building a supervised baseline, pretraining a self-supervised vision-language model, and evaluating fine-tuning and zero-shot strategies.
Abstract:
Bone tumor classification presents significant challenges, even for expert radiologists, because the visual differences among tumor entities are subtle. This thesis aims to enhance diagnostic capabilities by using vision-language pretraining to classify bone tumors from X-ray images. By pretraining on large public datasets such as MURA and incorporating anatomical context through captions, it seeks to address the key limitations that data scarcity and anatomical heterogeneity impose on bone tumor classification.
Methodology:
- Review the literature on state-of-the-art techniques for bone tumor classification and self-supervised vision-language pretraining.
- Implement a supervised model for bone tumor classification from X-ray images to serve as a baseline.
- Pretrain a vision-language model in a self-supervised manner to serve as a general-purpose model for downstream tasks.
- Evaluate several fine-tuning strategies for bone tumor classification and assess zero-shot capabilities.
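As a rough illustration of the pretraining and zero-shot steps above, the following minimal Python sketch shows a CLIP-style symmetric contrastive objective (in the spirit of Radford et al.) and zero-shot classification by comparing an image embedding against caption-prompt embeddings. It is not the thesis implementation: the function names, the temperature of 0.07, the class labels, and the toy two-dimensional embeddings are all assumptions made for illustration, and the image and text encoders that would produce real embeddings are omitted.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_logits(image_embs, text_embs, temperature=0.07):
    """Temperature-scaled cosine similarity of every image with every caption."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
            for i in imgs]


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy; image i is assumed to match caption i."""
    logits = cosine_logits(image_embs, text_embs, temperature)
    n = len(logits)
    # Image-to-text direction: each row should peak on its diagonal entry.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: same idea on the transposed matrix.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


def zero_shot_predict(image_emb, prompt_embs, class_names, temperature=0.07):
    """Pick the class whose prompt embedding is most similar to the image."""
    logits = cosine_logits([image_emb], prompt_embs, temperature)[0]
    probs = [math.exp(x) for x in log_softmax(logits)]
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs


if __name__ == "__main__":
    # Toy embeddings standing in for encoder outputs (hypothetical values).
    images = [[1.0, 0.0], [0.0, 1.0]]
    captions = [[1.0, 0.0], [0.0, 1.0]]
    print("loss for matched pairs:", contrastive_loss(images, captions))

    # Hypothetical class prompts, e.g. "an X-ray of a benign bone tumor".
    label, probs = zero_shot_predict([0.9, 0.1], captions, ["benign", "malignant"])
    print("zero-shot prediction:", label, probs)
```

In this sketch the loss is small when each image embedding lies closest to its own caption embedding and large when the pairs are mismatched, which is the property the pretraining step relies on. Zero-shot classification then needs no tumor labels at training time, only one text prompt per class.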
Prerequisites:
- Advanced knowledge of deep learning with imaging data.
- Beneficial but not necessary: experience in medicine/oncology.
- Preferred starting date: January-February 2025 (with flexibility).
What we offer:
- Very rare medical data with high potential for publication.
- Highly educated & interdisciplinary environment.
- Top-level hardware for scientific computing.
- Constant feedback from medical and computer science experts.
How to apply:
Send an email to anna.curto-vilalta@tum.de with your CV and a short introduction about yourself and your motivation.
References:
A. Radford et al., "Learning Transferable Visual Models From Natural Language Supervision," arXiv:2103.00020, Feb. 26, 2021. doi: 10.48550/arXiv.2103.00020.
H. Q. Vo et al., "Frozen Large-scale Pretrained Vision-Language Models are the Effective Foundational Backbone for Multimodal Breast Cancer Prediction," IEEE Journal of Biomedical and Health Informatics. doi: 10.1109/JBHI.2024.3507638.
Contact: anna.curto-vilalta@tum.de