Glossary
This glossary contains definitions of terms used throughout the LazySlide documentation.
Histopathology
- annotation
Additional information or labels associated with regions of interest in a WSI, such as marking tumor boundaries, cell types, or pathological features. Annotations can be manual (created by pathologists) or automated (generated by AI models) and are typically stored as polygons, points, or other geometric shapes.
- artifact
Unwanted features or distortions in histopathology images that can interfere with analysis, such as air bubbles, dust, folded tissue, staining irregularities, or scanning artifacts. These need to be identified and excluded from analysis to ensure accurate results.
- digital pathology
The practice of digitizing glass slides into high-resolution WSIs and using computational tools for analysis, diagnosis, and research. Digital pathology enables remote consultation, quantitative analysis, artificial intelligence applications, and improved workflow efficiency compared to traditional microscopy.
- H&E
Hematoxylin and eosin staining, the most common tissue staining method in histopathology. Hematoxylin stains cell nuclei blue/purple (binding to DNA/RNA) and eosin stains cytoplasm, extracellular matrix, and other structures pink/red. H&E slides are the standard for disease diagnosis and are widely used in digital pathology analysis.
- histopathology
The study of diseased tissue at the microscopic level to understand the manifestations of disease. It involves examining tissue sections that have been processed, sectioned, and stained to identify cellular and structural abnormalities for diagnosis and research.
- magnification
The zoom level of a WSI, expressed as how many times the original tissue is magnified (e.g., 20x, 40x). Higher magnification provides more detail but requires more computational resources for analysis.
- mpp
Microns per pixel: the physical size, in micrometers, that each pixel of a digital image represents. Lower mpp values indicate higher resolution (more detail), while higher mpp values indicate lower resolution.
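Because mpp links pixels to physical size, it is what lets a fixed physical tile size be converted into a pixel count for slides scanned at different resolutions. A minimal sketch; the function names are hypothetical and not part of the LazySlide API.

```python
def pixels_to_microns(n_pixels: float, mpp: float) -> float:
    """Physical length (in micrometers) covered by n_pixels at the given mpp."""
    return n_pixels * mpp


def microns_to_pixels(length_um: float, mpp: float) -> int:
    """Number of pixels needed to span length_um micrometers at the given mpp."""
    return round(length_um / mpp)


# A 512-pixel-wide tile at 0.5 mpp covers 256 micrometers of tissue.
width_um = pixels_to_microns(512, mpp=0.5)

# Covering the same 256 micrometers on a finer scan (0.25 mpp) needs twice as many pixels.
tile_px = microns_to_pixels(width_um, mpp=0.25)
print(width_um, tile_px)  # 256.0 1024
```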
- patch
A small rectangular region extracted from a WSI for analysis. Patches enable efficient processing of high-resolution WSIs by breaking them into manageable sub-images.
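A minimal sketch of breaking an image into patches: it computes the top-left coordinates of a regular grid of tiles, with an optional stride for overlapping tiles. The function name and the choice to skip incomplete tiles at the image border are assumptions for illustration, not LazySlide's tiling implementation.

```python
def tile_coordinates(width, height, tile_size, stride=None):
    """Return (x, y) top-left corners of tiles covering a width x height image.

    stride defaults to tile_size (non-overlapping tiles); a smaller stride
    produces overlapping tiles. Tiles that would extend past the image edge
    are skipped in this sketch.
    """
    stride = stride or tile_size
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, stride)
        for x in range(0, width - tile_size + 1, stride)
    ]


# A 1000 x 600 pixel region split into non-overlapping 256-pixel tiles.
coords = tile_coordinates(1000, 600, tile_size=256)
print(len(coords), coords[:3])  # 6 [(0, 0), (256, 0), (512, 0)]
```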
- polygon
A geometric shape defined by a series of connected points forming a closed boundary. In LazySlide, polygons are used to represent tissue regions, contours, tiles, and annotations as vector shapes that can be manipulated and analyzed geometrically. See the Shapely documentation for more details.
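- pyramid
- pyramid structure
A multi-resolution representation of a WSI that stores the same image at several zoom levels, from full resolution down to progressively downsampled versions, so that a region can be read at the resolution a task needs without loading the full-resolution image.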
- segmentation level
The resolution level of a WSI used for segmentation. Level 0 has the highest resolution, and higher levels have progressively lower resolution. It can be set with the level parameter of LazySlide’s tissue segmentation function. By default, the level is determined automatically based on the available memory, but it can be set manually for consistency.
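One way to pick a level automatically is to take the highest-resolution level whose pixel count fits a budget. This is a hypothetical sketch: the level dimensions and the pixel budget (a stand-in for available memory) are made up, and LazySlide's actual selection logic may differ.

```python
def choose_level(level_dims, max_pixels):
    """Return the lowest level index (highest resolution) whose width * height
    does not exceed max_pixels; fall back to the coarsest level otherwise.

    level_dims: list of (width, height), where index 0 is full resolution.
    """
    for level, (w, h) in enumerate(level_dims):
        if w * h <= max_pixels:
            return level
    return len(level_dims) - 1


# Hypothetical pyramid: each level is 4x smaller per side than the previous one.
dims = [(80_000, 60_000), (20_000, 15_000), (5_000, 3_750)]
print(choose_level(dims, max_pixels=50_000_000))  # 2
```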
- tile
Synonym for patch - a small image region extracted from a WSI for processing.
- WSI
Whole slide image: a high-resolution digital image of an entire tissue section.
- contours
Closed boundaries outlining tissue regions in a WSI (e.g., tumor areas, stroma), often represented as polygons.
- holes
Empty spaces within a contour (e.g., artifacts, fat deposits, or non-tissue regions like lumens). They are typically excluded from further analysis.
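Excluding holes matters for quantities such as tissue area: the area of a contour is computed with the shoelace formula, and the areas of its holes are subtracted. In practice a geometry library such as Shapely handles this (a Shapely Polygon can be built from an outer ring and a list of holes); the plain-Python version below is only a sketch, and the coordinates are made up.

```python
def ring_area(points):
    """Area of a simple closed ring given as [(x, y), ...] (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def tissue_area(contour, holes=()):
    """Area of a contour minus the areas of the holes inside it."""
    return ring_area(contour) - sum(ring_area(h) for h in holes)


# A 100 x 100 tissue contour containing a 20 x 10 hole (e.g. a lumen).
contour = [(0, 0), (100, 0), (100, 100), (0, 100)]
hole = [(40, 40), (60, 40), (60, 50), (40, 50)]
print(tissue_area(contour, [hole]))  # 9800.0
```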
Data structures
- AnnData
Annotated data object used for storing and manipulating single-cell and spatial omics data. See the anndata documentation for more details.
- embedding
- feature embedding
A numerical representation of data (such as image patches) in a lower-dimensional space, typically produced by neural networks. These embeddings capture semantic information and can be used for downstream tasks like clustering, classification, or similarity search.
- features
Numerical representations or measurements extracted from data, such as pixel intensities, texture descriptors, or learned representations from neural networks. In histopathology, features can describe visual properties of patches or geometric properties of tissue regions.
- GeoDataFrame
A pandas DataFrame subclass, provided by GeoPandas, for working with geospatial data: each row represents a spatial feature, and a geometry column stores its shape. Used extensively in LazySlide to store and manipulate spatial objects like tissue contours, tiles, and cell boundaries as polygons.
- geometric features
Quantitative measurements of shape and spatial properties of objects, such as area, perimeter, convexity, solidity, and eccentricity. In LazySlide, these are computed for tissue contours and provide morphological characterization of tissue regions.
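The definitions of a few of these features can be illustrated for a single contour using only the standard library: area (shoelace formula), perimeter, and solidity (area divided by convex hull area, with the hull computed by Andrew's monotone chain). This is a sketch of the definitions only; which features LazySlide computes, and how, is not shown here.

```python
import math


def area(pts):
    """Shoelace area of a closed polygon given as [(x, y), ...]."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))) / 2.0


def perimeter(pts):
    """Sum of edge lengths, including the closing edge."""
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:] + pts[:1]))


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def solidity(pts):
    """Ratio of polygon area to convex hull area (1.0 for convex shapes)."""
    return area(pts) / area(convex_hull(pts))


# An L-shaped (non-convex) contour.
shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(area(shape), perimeter(shape), round(solidity(shape), 3))  # 3.0 8.0 0.857
```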
- Hugging Face
A popular platform and ecosystem for sharing and deploying machine learning models, particularly natural language processing and computer vision models. Many foundation models in LazySlide are hosted on Hugging Face.
- multi-channel image
An image containing multiple channels of information beyond standard RGB, such as fluorescence microscopy images with different stains or markers. Each channel typically represents a different biological target or imaging modality, enabling simultaneous analysis of multiple features within the same tissue sample.
- multiplexed images
Images that capture multiple biological targets or markers simultaneously within a single sample, typically through techniques like immunofluorescence or mass spectrometry imaging. These images provide rich, multi-dimensional data for studying cellular interactions and tissue architecture at high resolution.
- omics data
High-throughput biological data from genomics, transcriptomics, proteomics, and other molecular profiling technologies. In spatial biology, omics data can be integrated with histopathology images to provide comprehensive molecular and morphological characterization of tissues.
- SpatialData
A framework for handling spatially resolved omics data, used as LazySlide’s data foundation. See the SpatialData documentation for more details.
- WSIData
LazySlide’s data structure for storing WSI data and associated annotations. See the WSIData documentation for more details.
Model types & machine learning
- embedding
A dense vector representation of data in a continuous vector space, typically learned by machine learning models. In LazySlide, embeddings can represent patches, tiles, or entire WSIs as numerical vectors that capture semantic information for downstream analysis.
- foundation model
A large-scale pre-trained model (e.g., UNI, CONCH) that can be adapted for various downstream tasks.
- Leiden clustering
A community detection algorithm for clustering nodes in graphs, commonly used for spatial clustering of tiles or cells. Often applied after constructing a spatial tile graph to identify spatially coherent regions.
- multimodal model
A model that can process and integrate multiple types of input data, such as both images and text.
- pretrained model
A neural network model that has been trained on a large dataset and can be fine-tuned or used as a feature extractor for new tasks. Foundation models are a type of pretrained model designed for broad applicability.
- segmentation model
A model that performs image segmentation tasks, partitioning images into meaningful regions.
- tile prediction model
A model that makes predictions on image tiles or patches, typically for classification or regression tasks.
- transform function
A preprocessing function that converts raw image data into the format expected by a vision model, typically including normalization, resizing, and tensor conversion. Each model defines its own transform function via the get_transform() method.
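A hypothetical transform function of this kind, written with NumPy: it scales 8-bit RGB pixels to [0, 1], normalizes each channel with a mean and standard deviation, and reorders the array to channels-first. The ImageNet statistics used as defaults are a common choice but an assumption here; each model's get_transform() defines its own steps, which often also include resizing (omitted below).

```python
import numpy as np

# Commonly used ImageNet channel statistics (an assumption; models may differ).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def transform(tile_rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Convert an (H, W, 3) uint8 tile into a normalized (3, H, W) float32 array."""
    x = tile_rgb.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - mean) / std                     # per-channel normalization, broadcast over H and W
    return np.transpose(x, (2, 0, 1))        # HWC -> CHW, the layout PyTorch vision models expect


tile = np.full((224, 224, 3), 255, dtype=np.uint8)  # an all-white tile
out = transform(tile)
print(out.shape)  # (3, 224, 224)
```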
- vision model
A model designed to process and analyze visual data, typically images.
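- zero-shot learning
A machine learning approach in which a model makes predictions for classes it did not see during training, for example by comparing an image embedding with text embeddings of class descriptions in a multimodal model.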
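A minimal sketch of zero-shot classification with a multimodal model: an image embedding is compared with text embeddings of class prompts by cosine similarity, and the most similar prompt is the prediction. The embeddings below are made-up toy vectors; in practice both would come from a model's image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def zero_shot_classify(image_emb, prompt_embs):
    """Return the prompt whose text embedding is most similar to the image embedding."""
    scores = {label: cosine(image_emb, emb) for label, emb in prompt_embs.items()}
    return max(scores, key=scores.get), scores


# Toy embeddings (hypothetical); no labeled training examples of either class are used.
prompts = {
    "tumor tissue": [0.9, 0.1, 0.0],
    "normal tissue": [0.1, 0.9, 0.2],
}
label, scores = zero_shot_classify([0.8, 0.2, 0.1], prompts)
print(label)  # tumor tissue
```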
Image analysis
- artifact segmentation
The process of identifying and delineating artifacts within histopathology images to exclude them from analysis. Can be performed using rule-based methods (e.g., filtering non-reddish regions) or machine learning approaches to automatically detect various types of artifacts.
- binary mask
A mask where each pixel has only two possible values: 0 (background/negative) or 1 (foreground/positive). Used in segmentation tasks to represent regions of interest, with pixels set to 1 indicating the presence of the target object or tissue type.
- cell segmentation
The process of identifying and delineating individual cells within histopathology images. Can be performed using traditional computer vision methods or deep learning models, typically as instance segmentation to distinguish between individual cell instances rather than just classifying cell vs. background pixels.
- cell type segmentation
A specialized form of cell segmentation that not only identifies and delineates individual cells but also classifies them into specific cell types (e.g., lymphocytes, epithelial cells, stromal cells). Combines instance segmentation with cell type classification to provide both spatial boundaries and cellular identity.
- confusion matrix
A table used to evaluate the performance of classification models, showing the counts of true positives, true negatives, false positives, and false negatives. Essential for computing various metrics like precision, recall, and accuracy in segmentation and classification tasks.
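A minimal sketch of a binary confusion matrix computed from a predicted and a ground-truth mask, and the metrics derived from it. Plain Python lists of 0/1 labels stand in for flattened mask arrays; the values are made up.

```python
def confusion_counts(pred, truth):
    """Count TP, TN, FP, FN for two equal-length sequences of 0/1 labels."""
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))
    tn = sum(p == 0 and t == 0 for p, t in zip(pred, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, truth))
    return tp, tn, fp, fn


pred = [1, 1, 0, 0, 1, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(pred, truth)

precision = tp / (tp + fp)         # fraction of predicted positives that are correct
recall = tp / (tp + fn)            # fraction of actual positives that were found
accuracy = (tp + tn) / len(truth)  # fraction of all pixels labeled correctly
print(tp, tn, fp, fn)              # 3 3 1 1
print(precision, recall, accuracy)  # 0.75 0.75 0.75
```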
- Delaunay triangulation
A geometric method for creating a triangular mesh from a set of points, where no point lies inside the circumcircle of any triangle. Used in spatial tile graph construction to define neighborhood relationships between tiles based on natural spatial connectivity rather than fixed distance thresholds. It can be set with the use_delaunay parameter of pp.tile_graph.
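The empty-circumcircle property that defines a Delaunay triangulation can be checked with the standard in-circle determinant. This sketch only tests the property for one triangle and one point; building a full triangulation is usually left to a library (for example scipy.spatial.Delaunay).

```python
def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle (a, b, c).

    Uses the in-circle determinant; the triangle is first reordered to
    counterclockwise, because the sign of the determinant depends on orientation.
    """
    def orient(p, q, r):
        return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

    if orient(a, b, c) < 0:  # make (a, b, c) counterclockwise
        b, c = c, b
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0


tri = ((0, 0), (1, 0), (0, 1))
# (0.9, 0.9) is outside the triangle but inside its circumcircle, so a
# triangulation containing both this triangle and this point is not Delaunay.
print(in_circumcircle(*tri, (0.9, 0.9)))  # True
print(in_circumcircle(*tri, (2, 2)))      # False
```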
- feature aggregation
The process of combining features from multiple sources or spatial locations, such as aggregating patch features within tissue regions or combining features from neighboring tiles in a spatial tile graph.
- feature extraction
The process of computing numerical representations (features) from raw data, such as extracting embeddings from image patches using vision models or computing geometric features from tissue contours.
- Hungarian algorithm
An algorithm that finds the optimal one-to-one matching between the items of two groups, minimizing the total cost of the matching. In segmentation evaluation, it pairs predicted objects with ground truth objects so that performance metrics are computed on the best possible correspondence. Intuitively, it answers “which predicted cell should be matched to which real cell?” so that the total matching cost over all pairs is as low as possible.
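The assignment problem that the Hungarian algorithm solves can be illustrated with a brute-force search over all one-to-one matchings. This gives the same optimal answer for tiny inputs but scales factorially, so it is a stand-in for illustration, not the Hungarian algorithm itself; real pipelines typically call an optimized solver such as scipy.optimize.linear_sum_assignment. The IoU values are made up.

```python
from itertools import permutations


def best_matching(cost):
    """Minimum-cost one-to-one assignment for a square cost matrix (brute force).

    Returns (total_cost, pairs) where each pair is (row, col).
    """
    n = len(cost)
    best = None
    for cols in permutations(range(n)):
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if best is None or total < best[0]:
            best = (total, list(enumerate(cols)))
    return best


# IoU between predicted cells (rows) and ground truth cells (columns).
iou = [
    [0.8, 0.6, 0.0],
    [0.7, 0.1, 0.0],
    [0.0, 0.0, 0.9],
]
# Maximizing total IoU is the same as minimizing total (1 - IoU).
cost = [[1 - v for v in row] for row in iou]
total, pairs = best_matching(cost)
print(pairs)  # [(0, 1), (1, 0), (2, 2)]
```

A greedy choice that pairs prediction 0 with its best match (IoU 0.8) would force prediction 1 onto the poor match (IoU 0.1), giving a total IoU of 1.8 instead of the optimal 2.2.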
- instance
An individual object or entity within an image, such as a single cell, nucleus, or tissue structure.
- instance mask
A labeled mask in which each pixel value is the ID of the object instance it belongs to, with 0 typically representing background. Used in instance segmentation to distinguish between individual objects of the same class (e.g., different cells): each cell has its own unique pixel value in the mask.
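A binary mask can be turned into an instance mask by labeling its connected components. A minimal sketch using 4-connectivity and a breadth-first search; libraries such as scipy.ndimage.label or skimage.measure.label do this efficiently, and the plain-Python version here is for illustration only.

```python
from collections import deque


def label_instances(binary):
    """Label 4-connected foreground regions of a 2D 0/1 mask.

    Returns an instance mask of the same shape in which background is 0 and
    each connected object gets a unique ID 1, 2, ...
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_id = 0
    for i in range(h):
        for j in range(w):
            if binary[i][j] and not labels[i][j]:
                next_id += 1  # start a new instance at an unlabeled foreground pixel
                labels[i][j] = next_id
                queue = deque([(i, j)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
    return labels


binary = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
for row in label_instances(binary):
    print(row)
# [1, 1, 0, 0]
# [1, 0, 0, 2]
# [0, 0, 2, 2]
```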
- instance segmentation
A type of segmentation that identifies and delineates each individual instance of an object (e.g., each cell) in an image. It is similar to semantic segmentation (associating each pixel with a class label) but distinguishes between different instances of the same class.
- intersection over union
- IoU
A metric used to evaluate the accuracy of object detection and segmentation models, calculated as the area of overlap between predicted and ground truth regions divided by the area of their union. Values range from 0 (no overlap) to 1 (perfect overlap). Also known as the Jaccard index.
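A minimal sketch of IoU for two binary masks, representing each mask as a set of foreground pixel coordinates so that intersection and union map directly onto set operations. The masks are toy examples.

```python
def mask_to_pixels(mask):
    """Set of (row, col) coordinates where a 2D 0/1 mask is foreground."""
    return {(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v}


def iou(mask_a, mask_b):
    """Intersection over union of two binary masks."""
    a, b = mask_to_pixels(mask_a), mask_to_pixels(mask_b)
    union = a | b
    # Convention assumed here: two empty masks count as identical (some tools return 0 or NaN).
    return len(a & b) / len(union) if union else 1.0


pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
truth = [[0, 1, 1],
         [0, 1, 1],
         [0, 0, 0]]
print(iou(pred, truth))  # 0.3333333333333333
```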
- mask
A 2D array or image where pixel values indicate different regions, classes, or properties. In LazySlide, masks are used to represent segmentation results, tissue boundaries, or regions of interest. Can be binary masks, multi-class masks, or instance masks.
- neighborhood graph construction
The process of building a graph structure that represents spatial relationships between objects (such as tiles or cells), where edges connect spatially proximate nodes. Used as a foundation for spatial analysis methods.
- non-maximum suppression
- NMS
A post-processing technique used in object detection and instance segmentation to remove duplicate or overlapping detections. Among detections whose overlap (typically measured by IoU) exceeds a threshold, NMS keeps only the one with the highest confidence score, eliminating redundant predictions and producing cleaner final results.
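A minimal sketch of greedy NMS on axis-aligned bounding boxes: detections are visited in order of decreasing score, and a box is kept only if its IoU with every already-kept box is at most a threshold. The 0.5 threshold and the (x1, y1, x2, y2) box format are assumptions for the example.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept after greedy non-maximum suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two overlapping detections of the same cell plus one separate cell.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```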
- panoptic quality
- PQ
A metric that measures how well a model performs both object detection and segmentation combined. It asks two questions: “Did you find all the objects?” and “How accurately did you outline them?” PQ combines these into one score from 0 to 1, where 1 means perfect detection with perfect boundaries. It is commonly used to evaluate instance segmentation models, with the Hungarian algorithm used to match predicted and ground truth objects.
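PQ is usually computed as the sum of the IoUs of matched (true positive) pairs divided by TP + 0.5 FP + 0.5 FN, which factors into segmentation quality (the mean IoU of matches) times detection quality (an F1-style score). A minimal sketch, assuming the matching has already been done (a prediction and a ground truth object are conventionally counted as a match when their IoU exceeds 0.5); the IoU values are made up.

```python
def panoptic_quality(matched_ious, n_pred, n_true):
    """PQ = sum(IoU of matched pairs) / (TP + 0.5 * FP + 0.5 * FN).

    matched_ious: IoU of each matched (prediction, ground truth) pair.
    n_pred, n_true: total number of predicted and ground truth objects.
    """
    tp = len(matched_ious)
    fp = n_pred - tp  # predictions with no matching ground truth object
    fn = n_true - tp  # ground truth objects that were missed
    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        return 0.0
    sq = sum(matched_ious) / tp if tp else 0.0  # segmentation quality: mean IoU of matches
    dq = tp / denom                             # detection quality: F1-style score
    return sq * dq


# 3 matched cells, 1 spurious prediction, 1 missed cell.
pq = panoptic_quality([0.9, 0.8, 0.7], n_pred=4, n_true=4)
print(round(pq, 4))  # 0.6
```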
- segmentation
The task of partitioning an image into meaningful regions, such as identifying individual cells or tissue structures.
- semantic segmentation
A type of segmentation that classifies each pixel in an image into a category (e.g., tumor vs. normal tissue) without distinguishing between individual instances.
- spatial feature smoothing
A technique for reducing noise and creating spatial coherence in feature maps by averaging or interpolating values across neighboring locations in a spatial tile graph. Helps create smoother spatial patterns and reduce the impact of outlier measurements.
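A minimal sketch of spatial feature smoothing: each tile's feature vector is replaced by the mean of its own vector and its neighbors' vectors in a tile graph. The adjacency list and features are toy values, and plain neighbor averaging is only one of several possible smoothing schemes.

```python
def smooth_features(features, neighbors):
    """Average each node's feature vector with those of its graph neighbors.

    features:  {node: [f1, f2, ...]}
    neighbors: {node: [adjacent nodes]}
    """
    smoothed = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbors.get(node, [])]
        smoothed[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return smoothed


# Three tiles in a row (0 - 1 - 2); tile 1 has an outlier value.
features = {0: [1.0], 1: [10.0], 2: [1.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
print(smooth_features(features, neighbors))  # {0: [5.5], 1: [4.0], 2: [5.5]}
```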
- spatial tile graph
A graph representing neighborhood relationships between tiles extracted from a WSI. Each tile becomes a node, and edges connect spatially adjacent tiles, defined either by a distance threshold or by Delaunay triangulation. Representing tiles as a graph enables analysis methods based on graph theory and graph neural networks.
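A minimal sketch of building a spatial tile graph with the distance-based rule: tiles are nodes, and two tiles are connected when their centers lie within a given radius. The function and coordinates are hypothetical and do not reproduce LazySlide's pp.tile_graph implementation.

```python
import math
from itertools import combinations


def tile_graph(centers, radius):
    """Connect tiles whose centers are at most `radius` apart.

    centers: list of (x, y) tile centers; node i corresponds to centers[i].
    Returns an adjacency list {node: sorted list of neighbors}.
    """
    adj = {i: [] for i in range(len(centers))}
    for i, j in combinations(range(len(centers)), 2):
        if math.dist(centers[i], centers[j]) <= radius:
            adj[i].append(j)
            adj[j].append(i)
    return {i: sorted(n) for i, n in adj.items()}


# Centers of 256-pixel tiles on a 2 x 2 grid plus one isolated tile.
centers = [(128, 128), (384, 128), (128, 384), (384, 384), (2000, 2000)]
# A radius just above the tile size links horizontal and vertical neighbors only;
# a radius above 256 * sqrt(2) (about 362) would also link diagonal neighbors.
print(tile_graph(centers, radius=260))
# {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2], 4: []}
```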
- tissue segmentation
The process of identifying and delineating tissue regions within a WSI, typically distinguishing tissue from background areas. Used as a preprocessing step to focus analysis on relevant tissue regions and exclude empty slide areas, artifacts, or non-tissue regions.
- unsupervised spatial domain segmentation
A machine learning approach for automatically identifying distinct spatial regions or domains in tissue without prior labeled examples, typically using clustering methods applied to spatial features and neighborhood graph construction.