========
Glossary
========

This glossary contains definitions of terms used throughout the LazySlide documentation.


Histopathology 
===============

.. glossary::
   :sorted:

   annotation
      Additional information or labels associated with regions of interest in a :term:`WSI`, such as marking tumor boundaries, cell types, or pathological features. 
      Annotations can be manual (created by pathologists) or automated (generated by AI models) and are typically stored as polygons, points, or other geometric shapes.

   artifact
      Unwanted features or distortions in histopathology images that can interfere with analysis, such as air bubbles, dust, folded tissue, staining irregularities, or scanning artifacts. 
      These need to be identified and excluded from analysis to ensure accurate results.

   digital pathology
      The practice of digitizing glass slides into high-resolution :term:`WSI`\s and using computational tools for analysis, diagnosis, and research. 
      Digital pathology enables remote consultation, quantitative analysis, artificial intelligence applications, and improved workflow efficiency compared to traditional microscopy.

   histopathology
      The study of diseased tissue at the microscopic level to understand the manifestations of disease. 
      Involves examining tissue sections that have been processed, sectioned, and stained to identify cellular and structural abnormalities for diagnosis and research.

   ``H&E``
      Hematoxylin and eosin staining, the most common tissue staining method in histopathology. 
      Hematoxylin stains cell nuclei blue/purple (binding to DNA/RNA) and eosin stains cytoplasm, extracellular matrix, and other structures pink/red.
      H&E slides are the standard for disease diagnosis and are widely used in digital pathology analysis.

   patch
      A small rectangular region extracted from a whole slide image for analysis. 
      Patches enable efficient processing of high-resolution :term:`WSIs <WSI>` by breaking them into manageable sub-images.
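
      As a minimal sketch of how a slide can be broken into patches, the snippet below lists the top-left
      pixel coordinates of a regular, non-overlapping grid. The function name ``tile_origins`` and its
      parameters are hypothetical and are not part of LazySlide's API.

      .. code-block:: python

         def tile_origins(width, height, tile_px, stride_px=None):
             """Return top-left (x, y) coordinates of tiles that fit fully inside the image."""
             # Assumption: with no stride given, tiles do not overlap.
             stride_px = stride_px or tile_px
             return [
                 (x, y)
                 for y in range(0, height - tile_px + 1, stride_px)
                 for x in range(0, width - tile_px + 1, stride_px)
             ]


         # A 1000 x 600 pixel image holds a 3 x 2 grid of 256-pixel tiles.
         origins = tile_origins(width=1000, height=600, tile_px=256)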

   segmentation level
      The resolution level of a :term:`WSI` used for segmentation. Level 0 has the highest resolution,
      and higher levels have progressively lower resolution. It can be set with the ``level`` parameter
      of LazySlide's tissue segmentation function. By default, the level is chosen automatically based
      on the available memory, but it can be set manually for consistency.
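
      To make the relationship between levels concrete, the sketch below derives the pixel dimensions of
      each level from the level-0 size. The downsample factors ``[1, 4, 16]`` are an assumed example;
      real slides store their own factors, and ``level_dimensions`` is a hypothetical helper.

      .. code-block:: python

         def level_dimensions(base_width, base_height, downsamples):
             """Return (width, height) for each pyramid level given its downsample factor."""
             return [(base_width // d, base_height // d) for d in downsamples]


         # Assumed example: level 0 is 80000 x 60000 pixels, each level is 4x smaller.
         dims = level_dimensions(80000, 60000, downsamples=[1, 4, 16])
         # dims[0] is level 0 (full resolution); dims[2] is (5000, 3750)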

   magnification
      The zoom level of a :term:`WSI`, typically expressed as how many times the original tissue is
      magnified (e.g., 20x, 40x). Higher magnification provides more detail but requires more
      computational resources for analysis.
   
   ``mpp``
      Microns per pixel - a unit measuring the physical size represented by each pixel in a digital image. 
      Lower mpp values indicate higher resolution (more detail), while higher mpp values indicate lower resolution. 
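
      The arithmetic is simple enough to sketch. The helpers below are hypothetical and only illustrate
      the unit conversion: physical size is pixel count times mpp, and a tile requested at a coarser mpp
      covers proportionally more level-0 pixels.

      .. code-block:: python

         def pixels_to_microns(n_pixels, mpp):
             """Physical length in microns covered by n_pixels at the given mpp."""
             return n_pixels * mpp


         def tile_size_at_level0(tile_px, target_mpp, slide_mpp):
             """Level-0 pixels needed to cover a tile of tile_px pixels at target_mpp."""
             return round(tile_px * target_mpp / slide_mpp)


         # A 256-pixel tile at 0.5 mpp spans 128 microns of tissue.
         width_um = pixels_to_microns(256, 0.5)
         # On a slide scanned at 0.25 mpp, that tile covers 512 level-0 pixels.
         level0_px = tile_size_at_level0(256, target_mpp=0.5, slide_mpp=0.25)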

   polygon
      A geometric shape defined by a series of connected points forming a closed boundary. 
      In LazySlide, polygons are used to represent tissue regions, :term:`contours`, :term:`tiles <tile>`, and :term:`annotations <annotation>`
      as vector shapes that can be manipulated and analyzed geometrically. See `Shapely`_ documentation for more details.
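
      As a small illustration of treating a polygon geometrically, the sketch below computes the area of
      a closed ring of vertices with the shoelace formula. It is written in plain Python for clarity;
      ``polygon_area`` is a hypothetical helper, and Shapely exposes the same quantity as ``Polygon.area``.

      .. code-block:: python

         def polygon_area(points):
             """Area of a simple polygon given as a list of (x, y) vertices (shoelace formula)."""
             twice_area = 0.0
             n = len(points)
             for i in range(n):
                 x1, y1 = points[i]
                 x2, y2 = points[(i + 1) % n]  # wrap around to close the ring
                 twice_area += x1 * y2 - x2 * y1
             return abs(twice_area) / 2.0


         # A 4 x 3 rectangle has area 12.
         rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
         area = polygon_area(rectangle)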

   pyramid
   pyramid structure
      A multi-resolution representation of a :term:`WSI` that stores the image at several zoom levels, from full resolution down to small overviews.
      
   tile
      Synonym for :term:`patch` - a small image region extracted from a WSI for processing.
    
   ``WSI``
      Whole Slide Image - a high-resolution digital image of an entire tissue section.

   contours
      Closed boundaries outlining tissue regions in a :term:`WSI` (e.g., tumor areas, stroma),
      often represented as :term:`polygons <polygon>`.

   holes
      Empty spaces within a :term:`contour <contours>` (e.g., artifacts, fat deposits, or non-tissue regions such as lumens).
      They are typically excluded from further analysis.


Data structures
===============

.. glossary::
   :sorted:

   ``AnnData``
      Annotated data object used for storing and manipulating single-cell and spatial omics data. See `Anndata`_ documentation for more details.

   ``GeoDataFrame``
      A pandas DataFrame extension for working with geospatial data, where each row represents a spatial feature with a geometry column. 
      Used extensively in LazySlide to store and manipulate spatial objects like tissue :term:`contours`, :term:`tiles <tile>`, 
      and cell boundaries as :term:`polygons <polygon>`.

   feature embedding
      A numerical representation of data (such as image :term:`patches <patch>`) in a lower-dimensional space, typically produced by neural networks. 
      These embeddings capture semantic information and can be used for downstream tasks like clustering, classification, or similarity search.

   features
      Numerical representations or measurements extracted from data, such as pixel intensities, texture descriptors, or learned representations from neural networks. 
      In histopathology, features can describe visual properties of :term:`patches <patch>` or geometric properties of tissue regions.

   geometric features
      Quantitative measurements of shape and spatial properties of objects, such as area, perimeter, convexity, solidity, and eccentricity. 
      In LazySlide, these are computed for tissue :term:`contours` and provide morphological characterization of tissue regions.

   ``Hugging Face``
      A popular platform and ecosystem for sharing and deploying machine learning models, particularly natural language processing and computer vision models. 
      Many :term:`foundation models <foundation model>` in LazySlide are hosted on Hugging Face.

   ``SpatialData``
      A framework for handling spatially resolved omics data, used as LazySlide's data foundation. See `SpatialData`_ documentation for more details.

   ``WSIData``
      LazySlide's data structure for storing WSI data and associated annotations. See `WSIData`_ documentation for more details.

   multi-channel image
      An image containing multiple channels of information beyond standard RGB, such as fluorescence microscopy images with different stains or markers. 
      Each channel typically represents a different biological target or imaging modality, enabling simultaneous analysis of multiple features within the same tissue sample.

   multiplexed images
      Images that capture multiple biological targets or markers simultaneously within a single sample, typically through techniques like immunofluorescence or mass spectrometry imaging. 
      These images provide rich, multi-dimensional data for studying cellular interactions and tissue architecture at high resolution.

   omics data
      High-throughput biological data including genomics, transcriptomics, proteomics, and other molecular profiling technologies. 
      In spatial biology, omics data can be integrated with histopathology images to provide comprehensive molecular and morphological characterization of tissues.


Model types & machine learning
===============================

.. glossary::
   :sorted:

   embedding
      A dense vector representation of data in a continuous vector space, typically learned by machine learning models. 
      In LazySlide, embeddings can represent :term:`patches <patch>`, :term:`tiles <tile>`, or entire :term:`WSIs <WSI>` 
      as numerical vectors that capture semantic information for downstream analysis.

   foundation model
      A large-scale pre-trained model (e.g., UNI, CONCH) that can be adapted for various downstream tasks.

   Leiden clustering
      A community detection algorithm for clustering nodes in graphs, commonly used for spatial clustering of :term:`tiles <tile>` or cells. 
      Often applied after constructing a :term:`spatial tile graph` to identify spatially coherent regions.

   multimodal model
      A model that can process and integrate multiple types of input data, such as both images and text.

   pretrained model
      A neural network model that has been trained on a large dataset and can be fine-tuned or used as a feature extractor for new tasks. 
      :term:`Foundation models <foundation model>` are a type of pretrained model designed for broad applicability.

   segmentation model
      A model that performs image segmentation tasks, partitioning images into meaningful regions.

   tile prediction model
      A model that makes predictions on image tiles or patches, typically for classification or regression tasks.

   transform function
      A preprocessing function that converts raw image data into the format expected by a :term:`vision model`, typically including normalization, resizing, and tensor conversion. 
      Each model defines its own transform function via the ``get_transform()`` method.
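
      The normalization step can be sketched for a single pixel as follows. This is only an illustration:
      the mean and standard deviation are the widely used ImageNet statistics, which a given model may or
      may not use, and ``normalize_pixel`` is a hypothetical helper rather than anything a model returns.

      .. code-block:: python

         # Assumption: ImageNet channel statistics; individual models may differ.
         IMAGENET_MEAN = (0.485, 0.456, 0.406)
         IMAGENET_STD = (0.229, 0.224, 0.225)


         def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
             """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
             return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))


         white = normalize_pixel((255, 255, 255))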

   vision model
      A model designed to process and analyze visual data, typically images.

   zero-shot learning
      Machine learning approach where models make predictions on classes not seen during training.

Image analysis 
===============

.. glossary::
   :sorted:

   artifact segmentation
      The process of identifying and delineating artifacts within histopathology images to exclude them from analysis. 
      Can be performed using rule-based methods (e.g., filtering non-reddish regions) or machine learning approaches to automatically detect various types of artifacts.

   binary mask
      A mask where each pixel has only two possible values: 0 (background/negative) or 1 (foreground/positive). 
      Used in :term:`segmentation` tasks to represent regions of interest, with pixels set to 1 indicating the presence of the target object or tissue type.
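
      A minimal sketch of producing such a mask is to threshold a grayscale image. The example assumes
      tissue is darker than the bright slide background; ``to_binary_mask`` is a hypothetical helper and
      not the method LazySlide uses.

      .. code-block:: python

         def to_binary_mask(gray, threshold):
             """Mark pixels darker than the threshold as foreground (1), others as background (0)."""
             return [[1 if value < threshold else 0 for value in row] for row in gray]


         # Assumed toy image: bright background (about 240-250), darker tissue (about 70-90).
         gray = [
             [250, 240, 90],
             [245, 80, 70],
         ]
         mask = to_binary_mask(gray, threshold=200)
         # mask == [[0, 0, 1], [0, 1, 1]]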

   cell segmentation
      The process of identifying and delineating individual cells within histopathology images. 
      Can be performed using traditional computer vision methods or deep learning models, typically as :term:`instance segmentation` 
      to distinguish between individual cell instances rather than just classifying cell vs. background pixels.

   cell type segmentation
      A specialized form of :term:`cell segmentation` that not only identifies and delineates individual cells but also classifies them into specific cell types 
      (e.g., lymphocytes, epithelial cells, stromal cells). Combines :term:`instance segmentation` with cell type classification to provide both spatial boundaries and cellular identity.

   Delaunay triangulation
      A geometric method for creating a triangular mesh from a set of points, where no point lies inside the circumcircle of any triangle. 
      Used in :term:`spatial tile graph` construction to define neighborhood relationships between :term:`tiles <tile>` based on natural 
      spatial connectivity rather than fixed distance thresholds. Can be enabled with the ``use_delaunay`` parameter of ``pp.tile_graph``.

   feature aggregation
      The process of combining :term:`features` from multiple sources or spatial locations, such as aggregating :term:`patch` features within tissue regions 
      or combining features from neighboring :term:`tiles <tile>` in a :term:`spatial tile graph`.

   feature extraction
      The process of computing numerical representations (:term:`features`) from raw data, such as extracting embeddings from image :term:`patches <patch>` 
      using :term:`vision models <vision model>` or computing :term:`geometric features` from tissue :term:`contours`.

   instance
      An individual object or entity within an image, such as a single cell, nucleus, or tissue structure. 
      
   instance mask
      A labeled mask where each pixel value is the ID of a specific object instance, with 0 typically representing background.
      Used in :term:`instance segmentation` to distinguish between individual objects (e.g., different cells) of the same class, 
      where each cell would have a unique pixel value in the mask.
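
      As an illustration, the sketch below counts the pixels belonging to each instance ID in a small
      labeled mask, treating 0 as background. ``instance_areas`` is a hypothetical helper.

      .. code-block:: python

         from collections import Counter


         def instance_areas(mask):
             """Return {instance_id: pixel_count}, ignoring background pixels (value 0)."""
             return dict(Counter(v for row in mask for v in row if v != 0))


         # Two instances (IDs 1 and 2) on a background of zeros.
         mask = [
             [0, 1, 1, 0],
             [0, 1, 0, 2],
             [0, 0, 2, 2],
         ]
         areas = instance_areas(mask)
         # areas == {1: 3, 2: 3}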

   instance segmentation
      A type of segmentation that identifies and delineates each individual instance of an object (e.g., each cell) in an image. 
      It is similar to semantic segmentation (associating each pixel with a class label) but distinguishes between different instances of the same class.

   mask
      A 2D array or image where pixel values indicate different regions, classes, or properties. 
      In LazySlide, masks are used to represent :term:`segmentation` results, tissue boundaries, or regions of interest. 
      Can be :term:`binary masks <binary mask>`, multi-class masks, or :term:`instance masks <instance mask>`.

   neighborhood graph construction
      The process of building a graph structure that represents spatial relationships between objects (such as :term:`tiles <tile>` or cells), 
      where edges connect spatially proximate nodes. Used as a foundation for spatial analysis methods.

   non-maximum suppression
   NMS
      A post-processing technique used in object detection and :term:`instance segmentation` to remove duplicate or overlapping detections. 
      NMS keeps only the detection with the highest confidence score among overlapping objects, helping to eliminate redundant predictions 
      and produce cleaner final results.
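
      A minimal greedy NMS over axis-aligned boxes can be sketched as below. Boxes are assumed to be
      ``(x1, y1, x2, y2)`` tuples, and the function names are hypothetical; production code would
      normally rely on a library implementation.

      .. code-block:: python

         def box_iou(a, b):
             """Intersection over union of two (x1, y1, x2, y2) boxes."""
             inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
             inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
             inter = inter_w * inter_h
             area_a = (a[2] - a[0]) * (a[3] - a[1])
             area_b = (b[2] - b[0]) * (b[3] - b[1])
             union = area_a + area_b - inter
             return inter / union if union > 0 else 0.0


         def nms(boxes, scores, iou_threshold=0.5):
             """Return indices of kept boxes, visiting boxes from highest to lowest score."""
             order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
             keep = []
             for i in order:
                 # Keep a box only if it does not overlap too much with any kept box.
                 if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
                     keep.append(i)
             return keep


         boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
         scores = [0.9, 0.8, 0.7]
         kept = nms(boxes, scores)
         # The second box overlaps the first (IoU about 0.68) and is suppressed: kept == [0, 2]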

   panoptic quality
   PQ
      A metric that measures how well a model performs both object detection and segmentation combined. 
      It asks two questions: "Did you find all the objects?" and "How accurately did you outline them?" 
      PQ combines these into one score from 0 to 1, where 1 means perfect detection with perfect boundaries. 
      Commonly used to evaluate :term:`instance segmentation` models using the :term:`Hungarian algorithm` for optimal matching.
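
      In its commonly used form, PQ is the sum of IoU values over matched pairs divided by
      ``TP + 0.5 * FP + 0.5 * FN``. The sketch below assumes the matching has already been done and only
      the IoU of each matched pair is known; ``panoptic_quality`` is a hypothetical helper.

      .. code-block:: python

         def panoptic_quality(matched_ious, n_pred, n_true):
             """PQ from the IoUs of matched pairs and the total predicted / ground truth counts."""
             tp = len(matched_ious)   # matched pairs
             fp = n_pred - tp         # predictions without a match
             fn = n_true - tp         # ground truth objects without a match
             denom = tp + 0.5 * fp + 0.5 * fn
             return sum(matched_ious) / denom if denom else 0.0


         # Assumed example: 3 predictions, 2 ground truth objects, 2 matches.
         pq = panoptic_quality([0.8, 0.6], n_pred=3, n_true=2)
         # (0.8 + 0.6) / (2 + 0.5 * 1 + 0.5 * 0), approximately 0.56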

   segmentation
      The task of partitioning an image into meaningful regions, such as identifying individual cells or tissue structures.

   semantic segmentation
      A type of segmentation that classifies each pixel in an image into a category (e.g., tumor vs. normal tissue) without distinguishing between individual instances.

   spatial feature smoothing
      A technique for reducing noise and creating spatial coherence in :term:`feature <features>` maps by averaging or interpolating values across neighboring locations
      in a :term:`spatial tile graph`. Helps create smoother spatial patterns and reduce the impact of outlier measurements.
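
      One simple form of smoothing, sketched below, replaces each tile's value with the mean of itself
      and its graph neighbors. The edge list and the ``smooth_features`` helper are assumptions made for
      illustration; LazySlide's own smoothing may differ.

      .. code-block:: python

         def smooth_features(values, edges):
             """Average each node's value with the values of its direct neighbors."""
             neighbors = {i: [i] for i in range(len(values))}  # each node includes itself
             for i, j in edges:
                 neighbors[i].append(j)
                 neighbors[j].append(i)
             return [
                 sum(values[k] for k in neighbors[i]) / len(neighbors[i])
                 for i in range(len(values))
             ]


         # Four tiles on a 2 x 2 grid; tile 3 carries an outlier value.
         values = [1.0, 1.0, 1.0, 9.0]
         edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
         smoothed = smooth_features(values, edges)
         # The outlier is pulled toward its neighbors: 9.0 becomes 11 / 3.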

   spatial tile graph
      A spatial graph structure representing neighborhood relationships between :term:`tiles <tile>` extracted from a :term:`WSI`. 
      Each tile becomes a node, and edges connect spatially adjacent tiles based on distance or :term:`Delaunay triangulation`. 
      This transformation enables analysis methods based on graph theory and graph neural networks.
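
      The distance-based variant can be sketched in a few lines: tiles become nodes, and an edge is added
      whenever two tile origins lie within a chosen distance. The helper ``tile_graph_edges`` below is
      hypothetical and is not the implementation behind LazySlide's tile graph function.

      .. code-block:: python

         def tile_graph_edges(coords, max_dist):
             """Return index pairs (i, j), i < j, of tiles whose origins are within max_dist."""
             edges = []
             for i in range(len(coords)):
                 for j in range(i + 1, len(coords)):
                     dx = coords[i][0] - coords[j][0]
                     dy = coords[i][1] - coords[j][1]
                     if dx * dx + dy * dy <= max_dist * max_dist:
                         edges.append((i, j))
             return edges


         # Four 256-pixel tiles on a 2 x 2 grid; diagonal neighbors are too far apart.
         coords = [(0, 0), (256, 0), (0, 256), (256, 256)]
         edges = tile_graph_edges(coords, max_dist=256)
         # edges == [(0, 1), (0, 2), (1, 3), (2, 3)]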

   tissue segmentation
      The process of identifying and delineating tissue regions within a :term:`WSI`, typically distinguishing tissue from background areas. 
      Used as a preprocessing step to focus analysis on relevant tissue regions and exclude empty slide areas, artifacts, or non-tissue regions.

   unsupervised spatial domain segmentation
      A machine learning approach for automatically identifying distinct spatial regions or domains in tissue without prior labeled examples, 
      typically using clustering methods applied to spatial :term:`features` and :term:`neighborhood graph construction`.

   confusion matrix
      A table used to evaluate the performance of classification models, showing the counts of true positives, true negatives, false positives, and false negatives. 
      Essential for computing metrics such as precision, recall, and accuracy in segmentation and classification tasks.
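
      For a binary problem, the standard metrics follow directly from the four counts. The sketch below
      uses assumed example counts, and ``classification_metrics`` is a hypothetical helper.

      .. code-block:: python

         def classification_metrics(tp, fp, fn, tn):
             """Precision, recall, and accuracy from binary confusion matrix counts."""
             precision = tp / (tp + fp) if (tp + fp) else 0.0
             recall = tp / (tp + fn) if (tp + fn) else 0.0
             total = tp + fp + fn + tn
             accuracy = (tp + tn) / total if total else 0.0
             return {"precision": precision, "recall": recall, "accuracy": accuracy}


         # Assumed counts for illustration only.
         metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
         # precision = 40 / 50, recall = 40 / 60, accuracy = 70 / 100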

   Hungarian algorithm
      An algorithm that finds the best way to match items from two groups, minimizing the total cost of matching. 
      In segmentation evaluation, it optimally pairs predicted objects with ground truth objects to compute accurate performance metrics.
      Think of it as answering "which predicted cell should be matched to which real cell?" so that the overall matching is as good as possible.
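
      The problem being solved can be illustrated with a brute-force search over all possible pairings.
      This is not the Hungarian algorithm itself, which reaches the same optimum far more efficiently
      (for example via ``scipy.optimize.linear_sum_assignment``); it only shows what "minimum total cost"
      means. The cost values, assumed here to be ``1 - IoU``, and the helper name are hypothetical.

      .. code-block:: python

         from itertools import permutations


         def best_assignment(cost):
             """Exhaustively find the column assigned to each row with minimal total cost."""
             n = len(cost)
             best = min(
                 permutations(range(n)),
                 key=lambda cols: sum(cost[row][cols[row]] for row in range(n)),
             )
             total = sum(cost[row][best[row]] for row in range(n))
             return list(best), total


         # Rows are predicted cells, columns are ground truth cells, entries are 1 - IoU.
         cost = [
             [0.1, 0.9, 0.8],
             [0.7, 0.2, 0.9],
             [0.9, 0.8, 0.3],
         ]
         assignment, total_cost = best_assignment(cost)
         # assignment == [0, 1, 2]: each prediction is paired with the cell it overlaps most.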

   intersection over union
   IoU
      A metric used to evaluate the accuracy of object detection and segmentation models, calculated as the area of overlap between predicted and ground truth regions divided by the area of their union. 
      Values range from 0 (no overlap) to 1 (perfect overlap). Also known as the Jaccard index.
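
      For two binary masks of equal shape, the definition translates directly into code. The sketch below
      uses plain nested lists, and ``mask_iou`` is a hypothetical helper.

      .. code-block:: python

         def mask_iou(pred, truth):
             """IoU of two binary masks given as equally sized nested lists of 0/1 values."""
             intersection = 0
             union = 0
             for row_p, row_t in zip(pred, truth):
                 for p, t in zip(row_p, row_t):
                     intersection += 1 if (p and t) else 0
                     union += 1 if (p or t) else 0
             return intersection / union if union else 0.0


         pred = [[1, 1, 0], [1, 1, 0]]
         truth = [[0, 1, 1], [0, 1, 1]]
         iou = mask_iou(pred, truth)
         # 2 overlapping pixels out of 6 covered pixels: IoU is 1 / 3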



.. _scverse: https://scverse.org/
.. _WSIData: https://wsidata.readthedocs.io/
.. _SpatialData: https://spatialdata.scverse.org/
.. _Scanpy: https://scanpy.readthedocs.io/
.. _Anndata: https://anndata.readthedocs.io/
.. _Squidpy: https://squidpy.readthedocs.io/
.. _Shapely: https://shapely.readthedocs.io/