Feature extraction and spatial analysis

Feature extraction and spatial analysis#

Here, we will load a GTEx Small Intestine slide as example.

from huggingface_hub import hf_hub_download

slide = hf_hub_download(
    "rendeirolab/lazyslide-data", 
    "GTEX-11DXX-1626.svs", 
    repo_type="dataset",
    cache_dir="."
)

from wsidata import open_wsi
import lazyslide as zs

import matplotlib.pyplot as plt

Let’s open the wsi!

wsi = open_wsi(slide)
wsi

WSI: datasets--rendeirolab--lazyslide-data/snapshots/3ae589f240b9897db973b706f77636559b100696/GTEX-11DXX-1626.svs
Reader: openslide
Dimensions: 38717×51791 (h×w), 3 Pyramids
Pixel physical size: 0.49 MPP (20X)

SpatialData object
└── Images
      └── 'wsi_thumbnail': DataArray[cyx] (3, 1118, 1495)
with coordinate systems:
    ▸ 'global', with elements:
        wsi_thumbnail (Images)

What does the tissue look like?

zs.pl.tissue(wsi)

../_images/e53d95f1853780b85346f77abaa9a360343cda50e4cea5a354d09ba419c57f7d.png

Let’s first find and tile the tissue, we will request tiny tile size of 64*64 px, this may take a while.

zs.pp.find_tissues(wsi)
zs.pp.tile_tissues(wsi, 128)

Morphological feature extraction#

Feature extraction is to transform the image into a the numeric representation, which comprises of different morphological features. Typically, this is done by feeding the tiles into a vision model.

LazySlide supports automatic mix-precision inference which may reduce memory usuage and have faster inference spped, try amp=True. Since we are working on a big slide with tiny tile sizes, this may take 10mins to finish (MacBook M3 Max).

zs.tl.feature_extraction(wsi, "plip", amp=True)

::{note} Autocast doesn’t work well with mps backend though, you may get all nan results.

zs.tl.feature_extraction(wsi, "plip")

Features are saved as AnnData store with a convention of “{model name}_{tiles key}”. For example, h0-mini_tiles

Feature aggregation#

To perform analysis across dataset, a usual way is to pool features into a 1D vector that can represent the entire slide. By default, the mean pooling is applied. Advanced slide encoders will be introduced later.

zs.tl.feature_aggregation(wsi, feature_key="plip")

You can retrieve specific feature with the fetch accessor. This will return a copy of the anndata.

adata = wsi.fetch.features_anndata("plip")

Pre-computed results#

If you don’t want to run feature extraction, you can simply load the pre-computed one

wsi = zs.datasets.gtex_small_intestine()

wsi

WSI: /home/runner/.cache/huggingface/hub/datasets--RendeiroLab--LazySlide-data/snapshots/3ae589f240b9897db973b706f77636559b100696/GTEX-11DXX-1626.svs
Reader: openslide
Dimensions: 38717×51791 (h×w), 3 Pyramids
Pixel physical size: 0.49 MPP (20X)

SpatialData object, with associated Zarr store: /home/runner/.cache/huggingface/hub/datasets--RendeiroLab--LazySlide-data/snapshots/3ae589f240b9897db973b706f77636559b100696/GTEX-11DXX-1626.zarr
├── Images
│     └── 'wsi_thumbnail': DataArray[cyx] (3, 1118, 1495)
├── Shapes
│     ├── 'tiles': GeoDataFrame shape: (19038, 3) (2D shapes)
│     └── 'tissues': GeoDataFrame shape: (6, 2) (2D shapes)
└── Tables
      ├── 'plip_tiles': AnnData (19038, 512)
      └── 'plip_tiles_text_similarity': AnnData (19038, 4)
with coordinate systems:
    ▸ 'global', with elements:
        wsi_thumbnail (Images), tiles (Shapes), tissues (Shapes)

Examination of feature space#

You may need to install scanpy and igraph to run the following command.

import scanpy as sc

adata = wsi["plip_tiles"]
sc.pp.scale(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata, flavor="igraph", resolution=0.2)

sc.pl.umap(adata, color="leiden")

../_images/d37ae824e9221219ca0ab4909bb2fc12332e543c5a85294e22c7f975db83c50f.png

sc.tl.rank_genes_groups(adata, groupby="leiden")

features = set()
for i in adata.obs["leiden"].unique():
    names = sc.get.rank_genes_groups_df(adata, i).names
    features.update(list(names[0:10]) + list(names[-10:]))
features = list(features)

import marsilea as ma
import marsilea.plotter as mp
from scipy.stats import zscore

key = "leiden"
h = ma.Heatmap(zscore(adata[:, features].X.T), height=2, width=4, label="Feature")
order = sorted(adata.obs[key].unique())
h.group_cols(adata.obs[key], order=order)
h.add_top(mp.Chunk(order, fill_colors=adata.uns[f"{key}_colors"], padding=2), pad=0.05)
h.add_dendrogram("right", method="average", linewidth=0.1)
h.add_legends()
h.render()

../_images/3ae6353f2d9591bef3a165f6ba0dc01091ea66901ab41b837d592f72742c3f84.png

Identification of spatial domains#

The leiden clustering on features from foundational model can already recover the spatial domains of tissues pretty well.

zs.pl.tiles(
    wsi,
    feature_key="plip",
    color="leiden",
    alpha=0.5,
    palette=adata.uns[f"{key}_colors"],
    show_contours=False,
)

../_images/d6ab14071ae9b0e192a29e2d735763cc8bbb6a0ff4c015e7a79115239c700647.png

For simplicity, you can simply run spatial domain through:

zs.tl.spatial_domain(wsi, feature_key="plip", resolution=0.2)

The spatial domain is equivalent to the following process:

wsi.write()

Integration of spatial information with UTAG#

In this example, you may notice the border of domain is not very smooth, this can be improved by integrating spatial information.

UTAG is a method develop to discovery spatial domain with unsupervised learning.

zs.pp.tile_graph(wsi)
zs.tl.spatial_features(wsi, "plip")

zs.tl.spatial_domain(wsi, layer="spatial_features", feature_key="plip", resolution=0.2)

zs.pl.tiles(wsi, color="domain", alpha=0.5)

../_images/a777a24b906e6e6f6eb513b2d719661ec5b87135a028adeacb57e1462fc90420.png

Text feature extraction#

Apart from deriving morphological features from vision models, you can also run multimodal to derive text features.

Currently, there are two vision-language models for pathology

PLIP, Nature Medicine, 2023
CONCH, Nature Medicine, 2024

Since we’ve extracted the plip vision features for our WSI, we only need to extract features for the texts.

terms = ["mucosa", "submucosa", "musclaris", "lymphocyte"]

embeddings = zs.tl.text_embedding(terms, model="plip")
zs.tl.text_image_similarity(wsi, embeddings, model="plip", softmax=True)

zs.pl.tiles(
    wsi,
    feature_key="plip_tiles_text_similarity",
    color=terms,
    cmap="rainbow",
    show_image=False,
    tissue_id=3,
    alpha=0.7,
)

../_images/2c5c788d4bcb24cb0f1bd449d2f28ae7d835e0f0d33f97f0489f4b4559c69c76.png