Feature extraction and spatial analysis#

Here, we will load a GTEx Small Intestine slide as example.

from huggingface_hub import hf_hub_download

slide = hf_hub_download(
    "rendeirolab/lazyslide-data", 
    "GTEX-11DXX-1626.svs", 
    repo_type="dataset",
    cache_dir="."
)
from wsidata import open_wsi
import lazyslide as zs

import matplotlib.pyplot as plt

Let’s open the wsi!

wsi = open_wsi(slide)
wsi
WSI: datasets--rendeirolab--lazyslide-data/snapshots/3ae589f240b9897db973b706f77636559b100696/GTEX-11DXX-1626.svs
Reader: openslide
Dimensions: 38717×51791 (h×w), 3 Pyramids
Pixel physical size: 0.49 MPP (20X)
SpatialData object
└── Images
      └── 'wsi_thumbnail': DataArray[cyx] (3, 1118, 1495)
with coordinate systems:
    ▸ 'global', with elements:
        wsi_thumbnail (Images)

What does the tissue look like?

zs.pl.tissue(wsi)
../_images/e53d95f1853780b85346f77abaa9a360343cda50e4cea5a354d09ba419c57f7d.png

Let’s first find and tile the tissue, we will request tiny tile size of 64*64 px, this may take a while.

zs.pp.find_tissues(wsi)
zs.pp.tile_tissues(wsi, 128)

Morphological feature extraction#

Feature extraction is to transform the image into a the numeric representation, which comprises of different morphological features. Typically, this is done by feeding the tiles into a vision model.

LazySlide supports automatic mix-precision inference which may reduce memory usuage and have faster inference spped, try amp=True. Since we are working on a big slide with tiny tile sizes, this may take 10mins to finish (MacBook M3 Max).

zs.tl.feature_extraction(wsi, "plip", amp=True)

::{note} Autocast doesn’t work well with mps backend though, you may get all nan results.

zs.tl.feature_extraction(wsi, "plip")

Features are saved as AnnData store with a convention of “{model name}_{tiles key}”. For example, h0-mini_tiles

Feature aggregation#

To perform analysis across dataset, a usual way is to pool features into a 1D vector that can represent the entire slide. By default, the mean pooling is applied. Advanced slide encoders will be introduced later.

zs.tl.feature_aggregation(wsi, feature_key="plip")

You can retrieve specific feature with the fetch accessor. This will return a copy of the anndata.

adata = wsi.fetch.features_anndata("plip")

Pre-computed results#

If you don’t want to run feature extraction, you can simply load the pre-computed one

wsi = zs.datasets.gtex_small_intestine()
wsi
WSI: /home/runner/.cache/huggingface/hub/datasets--RendeiroLab--LazySlide-data/snapshots/3ae589f240b9897db973b706f77636559b100696/GTEX-11DXX-1626.svs
Reader: openslide
Dimensions: 38717×51791 (h×w), 3 Pyramids
Pixel physical size: 0.49 MPP (20X)
SpatialData object, with associated Zarr store: /home/runner/.cache/huggingface/hub/datasets--RendeiroLab--LazySlide-data/snapshots/3ae589f240b9897db973b706f77636559b100696/GTEX-11DXX-1626.zarr
├── Images
│     └── 'wsi_thumbnail': DataArray[cyx] (3, 1118, 1495)
├── Shapes
│     ├── 'tiles': GeoDataFrame shape: (19038, 3) (2D shapes)
│     └── 'tissues': GeoDataFrame shape: (6, 2) (2D shapes)
└── Tables
      ├── 'plip_tiles': AnnData (19038, 512)
      └── 'plip_tiles_text_similarity': AnnData (19038, 4)
with coordinate systems:
    ▸ 'global', with elements:
        wsi_thumbnail (Images), tiles (Shapes), tissues (Shapes)

Examination of feature space#

You may need to install scanpy and igraph to run the following command.

import scanpy as sc

adata = wsi["plip_tiles"]
sc.pp.scale(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata, flavor="igraph", resolution=0.2)
sc.pl.umap(adata, color="leiden")
../_images/d37ae824e9221219ca0ab4909bb2fc12332e543c5a85294e22c7f975db83c50f.png
sc.tl.rank_genes_groups(adata, groupby="leiden")

features = set()
for i in adata.obs["leiden"].unique():
    names = sc.get.rank_genes_groups_df(adata, i).names
    features.update(list(names[0:10]) + list(names[-10:]))
features = list(features)
import marsilea as ma
import marsilea.plotter as mp
from scipy.stats import zscore

key = "leiden"
h = ma.Heatmap(zscore(adata[:, features].X.T), height=2, width=4, label="Feature")
order = sorted(adata.obs[key].unique())
h.group_cols(adata.obs[key], order=order)
h.add_top(mp.Chunk(order, fill_colors=adata.uns[f"{key}_colors"], padding=2), pad=0.05)
h.add_dendrogram("right", method="average", linewidth=0.1)
h.add_legends()
h.render()
../_images/3ae6353f2d9591bef3a165f6ba0dc01091ea66901ab41b837d592f72742c3f84.png

Identification of spatial domains#

The leiden clustering on features from foundational model can already recover the spatial domains of tissues pretty well.

zs.pl.tiles(
    wsi,
    feature_key="plip",
    color="leiden",
    alpha=0.5,
    palette=adata.uns[f"{key}_colors"],
    show_contours=False,
)
../_images/d6ab14071ae9b0e192a29e2d735763cc8bbb6a0ff4c015e7a79115239c700647.png

For simplicity, you can simply run spatial domain through:

zs.tl.spatial_domain(wsi, feature_key="plip", resolution=0.2)

The spatial domain is equivalent to the following process:

wsi.write()

Integration of spatial information with UTAG#

In this example, you may notice the border of domain is not very smooth, this can be improved by integrating spatial information.

UTAG is a method develop to discovery spatial domain with unsupervised learning.

zs.pp.tile_graph(wsi)
zs.tl.spatial_features(wsi, "plip")
zs.tl.spatial_domain(wsi, layer="spatial_features", feature_key="plip", resolution=0.2)
zs.pl.tiles(wsi, color="domain", alpha=0.5)
../_images/a777a24b906e6e6f6eb513b2d719661ec5b87135a028adeacb57e1462fc90420.png

Text feature extraction#

Apart from deriving morphological features from vision models, you can also run multimodal to derive text features.

Currently, there are two vision-language models for pathology

  • PLIP, Nature Medicine, 2023

  • CONCH, Nature Medicine, 2024

Since we’ve extracted the plip vision features for our WSI, we only need to extract features for the texts.

terms = ["mucosa", "submucosa", "musclaris", "lymphocyte"]
embeddings = zs.tl.text_embedding(terms, model="plip")
zs.tl.text_image_similarity(wsi, embeddings, model="plip", softmax=True)
zs.pl.tiles(
    wsi,
    feature_key="plip_tiles_text_similarity",
    color=terms,
    cmap="rainbow",
    show_image=False,
    tissue_id=3,
    alpha=0.7,
)
../_images/2c5c788d4bcb24cb0f1bd449d2f28ae7d835e0f0d33f97f0489f4b4559c69c76.png