Master Thesis: Analytical Research on the Interplay of Global Spatial Tissue Structure and Local Microenvironment in Spatial Omics Data
08.12.2025, Theses, Bachelor's and Master's Theses
Introduction
The development of single-cell sequencing technologies has enabled the measurement of gene expression at cellular resolution, providing insights into cellular diversity at an unprecedented scale. More recently, spatial omics technologies have extended this capability by preserving the spatial coordinates of each cell, allowing researchers to analyze gene expression in the context of the tissue’s physical architecture.
A key goal in understanding tissue organization is the identification of cellular niches: groups of neighboring cells that coordinate specific functions. Graph neural networks (GNNs) are a common approach for modeling them. Typically, a k-nearest-neighbor (KNN) graph is constructed from the cells’ two-dimensional spatial coordinates: each cell becomes a node, its gene expression vector serves as the node feature, and edges connect each cell to its k nearest spatial neighbors. Message-passing modules then aggregate information from these local neighborhoods to generate node embeddings.
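The construction above can be sketched in a few lines of NumPy: a KNN graph over 2D coordinates, followed by one message-passing layer that averages each cell's expression with that of its neighbors. This is a minimal illustration under stated assumptions, not the group's model: the toy data, the mean aggregator, the untrained weights, and the function names `knn_graph` and `mean_message_passing` are all hypothetical. In practice one would typically use PyTorch Geometric (for example `torch_geometric.nn.knn_graph` together with a layer such as `SAGEConv`) and train the weights.

```python
import numpy as np

def knn_graph(coords: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest spatial neighbors of every cell (self excluded)."""
    # Dense pairwise Euclidean distances: O(n^2) memory, fine for a toy example.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    return np.argsort(dist, axis=1, kind="stable")[:, :k]  # shape (n_cells, k)

def mean_message_passing(x: np.ndarray, neighbors: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One layer: average each cell's features with its neighbors', then linear map + ReLU."""
    agg = (x + x[neighbors].sum(axis=1)) / (neighbors.shape[1] + 1)
    return np.maximum(agg @ weight, 0.0)

rng = np.random.default_rng(0)
n_cells, n_genes, dim = 50, 20, 8
coords = rng.uniform(0, 100, size=(n_cells, 2))             # toy 2D cell positions (hypothetical)
expr = np.log1p(rng.poisson(2.0, size=(n_cells, n_genes)))  # toy log-transformed counts (hypothetical)
nbrs = knn_graph(coords, k=6)
W = rng.normal(size=(n_genes, dim))                         # untrained weights, for illustration only
h = mean_message_passing(expr, nbrs, W)
print(h.shape)  # (50, 8)
```

Stacking L such layers lets each embedding depend on the cell's L-hop neighborhood, but nothing in this computation uses the absolute position of a cell, only who its neighbors are.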
The Challenge: A well-known limitation of standard GNN architectures is their predominantly local receptive field. Because each message-passing layer propagates information only between adjacent nodes, a network with a few layers sees only a few hops around each cell, and the resulting embeddings often fail to represent the cell’s global position within the tissue. Consequently, cells with similar local neighborhoods can end up with indistinguishable embeddings even if they occupy distant regions of the tissue. This loss of global spatial context obscures large-scale tissue organization, such as gradients or zonation, in the latent space. Such organization is relevant to understanding several biological phenomena, including morphogen signaling, hypoxia, and immune cell recruitment.
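The limitation can be reproduced with a toy experiment: place two identical copies of a small cell arrangement far apart, give corresponding cells identical expression, and run a two-layer mean-aggregation GNN over the KNN graph. The sketch below assumes hypothetical integer coordinates (so distances are computed exactly in both copies), random untrained weights, and illustrative function names; it is not the group's architecture. Because every cell's 2-hop neighborhood lies entirely inside its own copy, the two copies receive identical embeddings despite being 1000 units apart.

```python
import numpy as np

def knn_graph(coords, k):
    """Indices of the k nearest neighbors of each cell (self excluded, ties broken by index)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

def gnn_embed(x, neighbors, weights):
    """Mean-aggregation message passing; one layer per weight matrix, ReLU between layers."""
    h = x
    for i, w in enumerate(weights):
        agg = (h + h[neighbors].sum(axis=1)) / (neighbors.shape[1] + 1)
        h = agg @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

# Hypothetical local arrangement of 8 cells, copied to a distant tissue region.
motif = np.array([[0, 0], [1, 0], [0, 2], [3, 1], [2, 3], [4, 4], [1, 5], [5, 2]], dtype=float)
n = len(motif)
coords = np.vstack([motif, motif + np.array([1000.0, 0.0])])

rng = np.random.default_rng(0)
motif_expr = rng.normal(size=(n, 5))      # same "cell types" in both copies
expr = np.vstack([motif_expr, motif_expr])

nbrs = knn_graph(coords, k=3)
weights = [rng.normal(size=(5, 8)), rng.normal(size=(8, 8))]  # 2 layers -> 2-hop receptive field
h = gnn_embed(expr, nbrs, weights)
print(np.allclose(h[:n], h[n:]))  # True: distant copies are indistinguishable
```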
Our Solution: To address this gap, we have developed a method that integrates both n-hop neighborhood information and global positional context into the learned embeddings. As a result, the latent space preserves not only local transcriptional features but also broader tissue-level structure.
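One generic way to inject global positional context, shown here only to illustrate the idea and not as a description of the group's method, is to concatenate a sinusoidal (Fourier-feature) encoding of each cell's tissue-normalized coordinates to its expression vector before message passing. The sketch below reuses the toy two-copy setup; the encoding, the number of frequencies (`n_freqs=4`), and all function names are assumptions. Stacking layers still provides the n-hop neighborhood, while the added features let otherwise identical neighborhoods at different global locations map to different embeddings.

```python
import numpy as np

def knn_graph(coords, k):
    """Indices of the k nearest neighbors of each cell (self excluded, ties broken by index)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

def gnn_embed(x, neighbors, weights):
    """Mean-aggregation message passing; one layer per weight matrix, ReLU between layers."""
    h = x
    for i, w in enumerate(weights):
        agg = (h + h[neighbors].sum(axis=1)) / (neighbors.shape[1] + 1)
        h = agg @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

def positional_encoding(coords, n_freqs=4):
    """Hypothetical sinusoidal encoding of coordinates rescaled to [0, 1] over the whole section."""
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    u = (coords - lo) / np.where(hi > lo, hi - lo, 1.0)
    freqs = np.pi * 2.0 ** np.arange(n_freqs)             # pi, 2pi, 4pi, 8pi
    angles = u[:, :, None] * freqs                        # (n_cells, 2, n_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(len(coords), -1)                   # (n_cells, 4 * n_freqs)

# Same toy setup: one cell arrangement copied to a distant region.
motif = np.array([[0, 0], [1, 0], [0, 2], [3, 1], [2, 3], [4, 4], [1, 5], [5, 2]], dtype=float)
n = len(motif)
coords = np.vstack([motif, motif + np.array([1000.0, 0.0])])
rng = np.random.default_rng(0)
motif_expr = rng.normal(size=(n, 5))
expr = np.vstack([motif_expr, motif_expr])

nbrs = knn_graph(coords, k=3)
pe = positional_encoding(coords)
x_global = np.concatenate([expr, pe], axis=1)             # local features + global position
weights = [rng.normal(size=(x_global.shape[1], 8)), rng.normal(size=(8, 8))]
h_global = gnn_embed(x_global, nbrs, weights)
# The lowest-frequency cosine term differs between the two copies, so their embeddings now differ.
print(np.abs(h_global[:n] - h_global[n:]).max())
```

The low frequencies separate distant regions such as the two ends of a gradient, while the higher frequencies resolve finer position; how many frequencies are useful for a given tissue is a design choice, not something this sketch settles.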
Thesis Project Overview
The Hypothesis
We hypothesize that incorporating global spatial location alongside local context improves the interpretation of biological signals across tissue niches. Specifically, this approach should resolve variations among niches that share similar local microenvironments but occupy different global locations.
Proof-of-Concept Example: In our analysis of ductal carcinoma in situ (DCIS), we observed that normal duct cells spatially proximal to tumor cells overexpress the EGR1 gene. Our model successfully distinguished these cells from their distant counterparts based on global position, despite their identical local cell-type neighborhoods. This observation warrants further investigation.
Your Role
This thesis project moves beyond method development and focuses on biological discovery and validation. Your primary objective is to demonstrate the unique capabilities of our model by applying it to new, unexplored biological contexts. You will:
- Survey available spatial omics repositories to identify datasets where global spatial logic is biologically relevant.
- Curate and preprocess selected datasets for analysis.
- Train our global-context GNN on these datasets to generate refined cell embeddings.
- Analyze the outcomes to determine whether the global embeddings reveal novel biological insights.
Expected Start Time: As soon as possible
Expected Duration: 6 months
Candidate Profile
- Background in biology or bioinformatics
- Experience with single-cell data analysis
- Proficiency in Python and PyTorch
- Experience with spatial omics data (preferred)
- Familiarity with the PyTorch Geometric library (preferred)
What We Offer
- Joint computational and biological supervision
- Opportunities for co-authorship on resulting publications
- Access to Helmholtz Munich computational and research infrastructure
How to Apply
Please submit your CV, transcripts (if applicable), and a cover letter using this Google form.
What to Include in Your Cover Letter:
- A concise description (one paragraph) of a method you are familiar with that applies a GNN to spatial omics data.
- A clear explanation of one limitation of the method described above.
- A proposal of one suitable spatial omics dataset on which you believe our method could be applied, along with the specific biological or analytical question you would aim to address.
Supervisors:
Mostafa Shahhosseini
Dr. Sergio Marco Salas
Prof. Fabian Theis
Contact: mostafa.shahhosseini@helmholtz-munich.de


