Virtual Staining: Zero-Cost Multiplexed Imaging from H&E Images
This tutorial introduces one of the advanced features of LazySlide: the ability to transform a standard H&E (Hematoxylin and Eosin) histology image into a multiplexed image with multiple predicted biomarker channels. This powerful technique enables researchers to extract rich molecular information from routine histological slides without the need for expensive immunohistochemistry or specialized staining protocols.
# Import LazySlide and load a sample whole slide image (WSI)
import lazyslide as zs
wsi = zs.datasets.sample()
wsi
Reader: openslide
Dimensions: 2967×2220 (h×w), 1 Pyramid
Pixel physical size: 0.50 MPP (20X)
SpatialData object, with associated Zarr store: /home/runner/.cache/huggingface/hub/datasets--RendeiroLab--LazySlide-data/snapshots/61b923cd2be1e50ed7116ecb93de47b8b4a5c947/sample.zarr
├── Images
│ └── 'wsi_thumbnail': DataArray[cyx] (3, 1496, 1119)
├── Shapes
│ ├── 'annotations': GeoDataFrame shape: (3, 4) (2D shapes)
│ ├── 'dl-tissue': GeoDataFrame shape: (2, 2) (2D shapes)
│ ├── 'tiles': GeoDataFrame shape: (31, 3) (2D shapes)
│ └── 'tissues': GeoDataFrame shape: (1, 2) (2D shapes)
└── Tables
└── 'resnet50_tiles': AnnData (31, 2048)
with coordinate systems:
▸ 'global', with elements:
wsi_thumbnail (Images), annotations (Shapes), dl-tissue (Shapes), tiles (Shapes), tissues (Shapes)
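# Plot the slide together with its detected tissue region
zs.pl.tissue(wsi)
# Tile the tissue into 64-px tiles with a 32-px stride (adjacent tiles overlap by half)
zs.pp.tile_tissues(wsi, 64, stride_px=32)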
# Apply virtual staining using the ROSIE model
zs.tl.virtual_stain(wsi, model="rosie")
Downloading: "https://download.pytorch.org/models/convnext_small-0c510722.pth" to /home/runner/.cache/torch/hub/checkpoints/convnext_small-0c510722.pth
Examining the Results
Let’s examine our WSI object to see how the virtual staining has enhanced our data:
wsi
Reader: openslide
Dimensions: 2967×2220 (h×w), 1 Pyramid
Pixel physical size: 0.50 MPP (20X)
SpatialData object, with associated Zarr store: /home/runner/.cache/huggingface/hub/datasets--RendeiroLab--LazySlide-data/snapshots/61b923cd2be1e50ed7116ecb93de47b8b4a5c947/sample.zarr
├── Images
│ ├── 'rosie_prediction': DataArray[cyx] (50, 92, 69)
│ └── 'wsi_thumbnail': DataArray[cyx] (3, 1496, 1119)
├── Shapes
│ ├── 'annotations': GeoDataFrame shape: (3, 4) (2D shapes)
│ ├── 'dl-tissue': GeoDataFrame shape: (2, 2) (2D shapes)
│ ├── 'tiles': GeoDataFrame shape: (2181, 3) (2D shapes)
│ └── 'tissues': GeoDataFrame shape: (1, 2) (2D shapes)
└── Tables
└── 'resnet50_tiles': AnnData (31, 2048)
with coordinate systems:
▸ 'global', with elements:
rosie_prediction (Images), wsi_thumbnail (Images), annotations (Shapes), dl-tissue (Shapes), tiles (Shapes), tissues (Shapes)
with the following elements not in the Zarr store:
▸ rosie_prediction (Images)
Understanding Virtual Staining Output
The virtual staining process generates predicted biomarker images that are stored in the images slot of the WSI data object (SpatialData format).
How ROSIE works:
The ROSIE model makes predictions at the tile level, generating a mean expression value for each biomarker within each tile.
This approach provides spatial protein expression predictions across the entire tissue section.
For higher-resolution results, use smaller tiles with more overlap (a smaller stride_px).
Technical note: To replicate the exact settings from the original ROSIE paper, use:
zs.pp.tile_tissues(wsi, 128, stride_px=8)
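The effect of the stride on output resolution can be estimated with simple arithmetic. The sketch below assumes the prediction image has roughly one pixel per stride step along each axis, which matches the (92, 69) grid obtained above with stride_px=32 on the 2967×2220 sample slide. `prediction_grid_shape` is a hypothetical helper written for this illustration; it is not part of LazySlide.

```python
def prediction_grid_shape(slide_hw, stride_px):
    """Estimate (rows, cols) of the prediction image for a given stride.

    Assumption (not taken from the LazySlide source): one output pixel
    per stride step along each axis, rounded down.
    """
    height, width = slide_hw
    return height // stride_px, width // stride_px


slide_hw = (2967, 2220)  # h x w of the sample slide, as reported above

coarse = prediction_grid_shape(slide_hw, stride_px=32)  # settings used in this tutorial
fine = prediction_grid_shape(slide_hw, stride_px=8)     # stride from the technical note

print(coarse)
print(fine)

# Ratio of output pixels between the two settings
ratio = (fine[0] * fine[1]) / (coarse[0] * coarse[1])
print(round(ratio, 1))
```

Under this assumption, moving from a stride of 32 to a stride of 8 yields about 16 times as many output pixels, and likely a comparable increase in the number of tiles to run through the model, so expect inference time to grow accordingly.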
Let’s explore the generated predictions:
# Access the virtual staining predictions stored in the images dictionary
wsi.images["rosie_prediction"]
<xarray.DataArray 'image' (c: 50, y: 92, x: 69)> Size: 317kB
dask.array<array, shape=(50, 92, 69), dtype=uint8, chunksize=(50, 92, 69), chunktype=numpy.ndarray>
Coordinates:
* c (c) <U11 2kB 'DAPI' 'CD45' 'CD68' ... 'HLA-E' 'CollagenIV' 'CD66'
* y (y) float64 736B 0.5 1.5 2.5 3.5 4.5 ... 87.5 88.5 89.5 90.5 91.5
* x (x) float64 552B 0.5 1.5 2.5 3.5 4.5 ... 64.5 65.5 66.5 67.5 68.5
Attributes:
transform: {'global': Scale (y, x)\n [32.25 32.17391304]}
# Display all available biomarker channels predicted by the ROSIE model
wsi.images["rosie_prediction"].c.data
array(['DAPI', 'CD45', 'CD68', 'CD14', 'PD1', 'FoxP3', 'CD8', 'HLA-DR',
'PanCK', 'CD3e', 'CD4', 'aSMA', 'CD31', 'Vimentin', 'CD45RO',
'Ki67', 'CD20', 'CD11c', 'Podoplanin', 'PDL1', 'GranzymeB', 'CD38',
'CD141', 'CD21', 'CD163', 'BCL2', 'LAG3', 'EpCAM', 'CD44', 'ICOS',
'GATA3', 'Gal3', 'CD39', 'CD34', 'TIGIT', 'ECad', 'CD40', 'VISTA',
'HLA-A', 'MPO', 'PCNA', 'ATM', 'TP63', 'IFNg', 'Keratin8/18',
'IDO1', 'CD79a', 'HLA-E', 'CollagenIV', 'CD66'], dtype='<U11')
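Because the channel axis is labelled by marker name, it is often convenient to look channels up by name rather than by position. The following sketch uses plain Python on the channel list printed above; the T-cell panel is a hypothetical selection chosen only for illustration.

```python
# Channel order copied from the output above
channels = [
    "DAPI", "CD45", "CD68", "CD14", "PD1", "FoxP3", "CD8", "HLA-DR",
    "PanCK", "CD3e", "CD4", "aSMA", "CD31", "Vimentin", "CD45RO",
    "Ki67", "CD20", "CD11c", "Podoplanin", "PDL1", "GranzymeB", "CD38",
    "CD141", "CD21", "CD163", "BCL2", "LAG3", "EpCAM", "CD44", "ICOS",
    "GATA3", "Gal3", "CD39", "CD34", "TIGIT", "ECad", "CD40", "VISTA",
    "HLA-A", "MPO", "PCNA", "ATM", "TP63", "IFNg", "Keratin8/18",
    "IDO1", "CD79a", "HLA-E", "CollagenIV", "CD66",
]

# Hypothetical marker panel, chosen for illustration only
t_cell_panel = ["CD3e", "CD4", "CD8", "FoxP3"]

# Map each marker name to its position along the channel axis
index_of = {name: i for i, name in enumerate(channels)}
panel_indices = [index_of[name] for name in t_cell_panel]

print(panel_indices)
```

On the real object, xarray's label-based selection should give the same subset directly, for example `wsi.images["rosie_prediction"].sel(c=t_cell_panel)`.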
# Visualize all predicted biomarker channels in a grid layout
import matplotlib.pyplot as plt
# Create a 5x10 subplot grid to display all 50 biomarker channels
fig, axs = plt.subplots(5, 10, figsize=(20, 10))
# Plot each biomarker channel with its corresponding marker name as title
for i, ax in enumerate(axs.flatten()):
    ax.imshow(wsi.images["rosie_prediction"].data[i])
    ax.set_title(wsi.images["rosie_prediction"].c.data[i], fontsize=8)
    ax.axis("off")
plt.tight_layout()
plt.show()
You can also use spatialdata-plot to overlay specific biomarker channels on the original H&E image for better visualization.
import spatialdata_plot # noqa
(
    wsi.pl.render_images("wsi_thumbnail")
    .pl.render_images("rosie_prediction", channel=["DAPI"], alpha=0.5)
    .pl.show()
)
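The overlay lines up because the prediction image carries a Scale transform into the 'global' coordinate system (about 32.25 along y and 32.17 along x, as printed earlier). The sketch below reproduces those factors from the slide and prediction shapes and maps one prediction pixel back to slide coordinates. It assumes a pure scale with no translation, and `prediction_pixel_to_slide_box` is a hypothetical helper, not a LazySlide or SpatialData function.

```python
def prediction_pixel_to_slide_box(row, col, scale_yx):
    """Return (y0, x0, y1, x1) in slide pixels covered by one prediction pixel.

    Assumes a pure Scale transform with no translation.
    """
    sy, sx = scale_yx
    return (row * sy, col * sx, (row + 1) * sy, (col + 1) * sx)


slide_hw = (2967, 2220)  # slide dimensions (h, w) reported for the sample
pred_hw = (92, 69)       # spatial shape of 'rosie_prediction'

# Scale factors: slide extent divided by prediction extent along each axis
scale_yx = (slide_hw[0] / pred_hw[0], slide_hw[1] / pred_hw[1])
print(scale_yx)

# Slide region covered by the prediction pixel at row 10, column 20
box = prediction_pixel_to_slide_box(10, 20, scale_yx)
print(box)
```

Each prediction pixel therefore summarizes a region of roughly 32×32 slide pixels, which is worth keeping in mind when comparing predicted marker patterns against cell-level morphology in the H&E image.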
Summary
Congratulations! You’ve successfully applied virtual staining to transform a standard H&E image into a rich, multiplexed dataset. This technique opens up numerous possibilities for:
Biomarker discovery: Identify spatial patterns of predicted protein expression without expensive assays
Digital pathology: Enhance routine histological analysis with molecular insights
Research acceleration: Generate hypotheses about tissue biology from existing slide archives
Cost-effective screening: Prioritize samples for expensive molecular assays
The virtual staining approach represents a powerful bridge between traditional histopathology and modern molecular biology, enabling researchers to extract maximum value from standard histological preparations.