lazyslide_models.vision.Moozy


class Moozy(model_path=None, token=None)

Bases: SlideEncoderModel

MOOZY slide and case encoder.

Params: 85.77M. License: CC-BY-NC-SA-4.0. Reference: [Kotp et al., 2026] A patient-first foundation model for computational pathology.

The slide encoder requires spatial coordinates and patch sizes for its ALiBi position bias. Pass coords (xy positions) and patch_sizes as keyword arguments to encode_slide().
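As an illustration of the coordinates described above, the following minimal sketch builds xy positions for a regular H x W patch grid. The helper name `grid_coords` is hypothetical and not part of lazyslide_models. It assumes that positions are top-left patch corners in level-0 pixels and that x runs along columns; neither assumption is confirmed by this page.

```python
def grid_coords(height, width, patch_size=224.0, origin=(0.0, 0.0)):
    """Return a nested [H][W][2] list of (x, y) positions for a regular patch grid.

    Hypothetical helper: assumes top-left corners in level-0 pixels,
    with x increasing along columns and y increasing along rows.
    """
    x0, y0 = origin
    return [
        [[x0 + col * patch_size, y0 + row * patch_size] for col in range(width)]
        for row in range(height)
    ]


# A 2 x 3 grid of 224-pixel patches starting at the slide origin.
coords = grid_coords(2, 3)
```

The nested list can be converted with `torch.tensor(coords)` to obtain a tensor of shape [H, W, 2].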

The case transformer aggregates multiple slide embeddings into a single patient-level representation via encode_case().

encode_case(slide_embeddings)

Aggregate slide embeddings into a case-level embedding.

Parameters:

slide_embeddings (torch.Tensor): Slide-level CLS embeddings of shape [S, 768], where S is the number of slides in the patient case.

Returns:

torch.Tensor: Case embedding of shape [768].

encode_slide(embeddings, coords=None, **kwargs)

Encode patch features into a slide-level embedding.

Parameters:

embeddings (torch.Tensor)

Patch features. Accepted shapes:

[B, H, W, 384]: spatial grid layout (native format).
[H, W, 384]: spatial grid for a single slide (a batch dimension is added).
[B, T, 384]: flat sequence, reshaped to a square grid. If T is not a perfect square, the sequence is zero-padded.
[T, 384]: flat sequence for a single slide.
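The padding for flat sequences can be estimated ahead of time. The sketch below assumes the grid is the smallest square that holds all T tokens; the page only states that the sequence is zero-padded, so the exact rule is an assumption, and `square_grid_side` is a hypothetical helper name.

```python
import math


def square_grid_side(num_tokens):
    """Return (side, pad) with side * side == num_tokens + pad and side minimal.

    Assumption: padding goes to the smallest square grid that fits all tokens.
    """
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    # Integer ceil(sqrt(n)), avoiding floating-point rounding issues.
    side = math.isqrt(num_tokens - 1) + 1
    return side, side * side - num_tokens
```

For example, `square_grid_side(10)` returns `(4, 6)`: ten tokens need a 4 x 4 grid with six padded cells under this assumption.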

coords (torch.Tensor)

Spatial coordinates (xy) for each patch token, matching the layout of embeddings: shape [B, H, W, 2] or [H, W, 2] for grid inputs, and [B, T, 2] or [T, 2] for flat inputs. Required, even though the signature defaults it to None: the ALiBi position bias needs real-space positions.
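The shape rules above can be checked before calling encode_slide(). The following is a minimal sketch of such a check, not the library's own validation; `check_slide_inputs` and `FEATURE_DIM` are hypothetical names. It works on plain shape tuples such as `tuple(tensor.shape)` and does not try to tell a [H, W, 384] grid from a [B, T, 384] sequence, since both are 3-D.

```python
FEATURE_DIM = 384  # patch feature width stated in the accepted shapes


def check_slide_inputs(emb_shape, coords_shape):
    """Validate that coords mirrors the embeddings layout with a trailing 2."""
    emb_shape, coords_shape = tuple(emb_shape), tuple(coords_shape)
    if len(emb_shape) not in (2, 3, 4):
        raise ValueError(f"embeddings must be 2-D, 3-D or 4-D, got {emb_shape}")
    if emb_shape[-1] != FEATURE_DIM:
        raise ValueError(f"last embeddings dim must be {FEATURE_DIM}, got {emb_shape[-1]}")
    expected = emb_shape[:-1] + (2,)
    if coords_shape != expected:
        raise ValueError(f"coords shape {coords_shape} does not match expected {expected}")
    return True
```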

**kwargs

patch_sizes (float or torch.Tensor, optional): Patch size in level-0 pixels. Defaults to 224.
invalid_mask (torch.Tensor, optional): Boolean mask of shape [B, H, W], where True marks invalid or background patches.

Returns:

dict: {"embeddings": cls_output}, where cls_output has shape [B, 768].
