lazyslide_models.vision.Moozy

class Moozy(model_path=None, token=None)

Bases: SlideEncoderModel

MOOZY slide and case encoder: a patient-first foundation model for computational pathology [Kotp et al., 2026]. Params: 85.77M. License: CC-BY-NC-SA-4.0.

The slide encoder requires spatial coordinates and patch sizes for its ALiBi position bias. Pass coords (xy positions) and patch_sizes as keyword arguments to encode_slide().

The case transformer aggregates multiple slide embeddings into a single patient-level representation via encode_case().

encode_case(slide_embeddings)

Aggregate slide embeddings into a case-level embedding.

Parameters:
slide_embeddings : torch.Tensor

Slide-level CLS embeddings. Shape [S, 768] where S is the number of slides for a patient case.

Returns:
torch.Tensor

Case embedding of shape [768].
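encode_slide() returns a dict whose "embeddings" entry has shape [B, 768], while encode_case() expects a single [S, 768] tensor. A minimal sketch of bridging the two, assuming each slide was encoded on its own (B = 1). stack_slide_outputs is a hypothetical helper, not part of lazyslide_models, and random tensors stand in for real model outputs:

```python
import torch

def stack_slide_outputs(outputs: list[dict]) -> torch.Tensor:
    """Concatenate per-slide {"embeddings": [B, 768]} dicts into one [S, 768] tensor."""
    return torch.cat([out["embeddings"] for out in outputs], dim=0)

# Random tensors stand in for three encode_slide() results (one slide each).
fake_outputs = [{"embeddings": torch.randn(1, 768)} for _ in range(3)]
slide_embeddings = stack_slide_outputs(fake_outputs)
print(slide_embeddings.shape)  # torch.Size([3, 768])
```

The stacked tensor is in the layout encode_case() documents as its input, so it can be passed straight through to obtain the [768] case embedding.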

encode_slide(embeddings, coords=None, **kwargs)

Encode patch features into a slide-level embedding.

Parameters:
embeddings : torch.Tensor

Patch features. Accepted shapes:

  • [B, H, W, 384] — spatial grid layout (native format).

  • [H, W, 384] — single slide spatial grid (will be unsqueezed).

  • [B, T, 384] — flat sequence; will be reshaped to a square grid (if T is not a perfect square, the sequence is zero-padded first).

  • [T, 384] — single slide flat sequence.

coords : torch.Tensor

Spatial coordinates (xy) for each patch token. Must match the spatial layout of embeddings: shape [B, H, W, 2] or [H, W, 2] for grid inputs, and [B, T, 2] or [T, 2] for flat inputs. Required, even though the signature defaults to None: the ALiBi position bias needs real-space positions.

**kwargs
patch_sizes : float or torch.Tensor, optional

Patch size in level-0 pixels. Defaults to 224.

invalid_mask : torch.Tensor, optional

Boolean mask of shape [B, H, W], where True marks an invalid/background patch.
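For flat [B, T, 384] inputs, the perfect-square rule comes down to a little arithmetic. A minimal sketch, assuming the padding target is the smallest square grid that holds all T tokens (the documentation only says the sequence is zero-padded, so the exact target is an assumption). square_grid_for is a hypothetical helper, not part of lazyslide_models:

```python
import math

def square_grid_for(num_tokens: int) -> tuple[int, int]:
    """Return (side, num_pad) for the smallest side*side grid holding num_tokens."""
    if num_tokens < 1:
        raise ValueError("num_tokens must be positive")
    side = math.isqrt(num_tokens)      # floor of the square root
    if side * side < num_tokens:       # not a perfect square: round the side up
        side += 1
    return side, side * side - num_tokens

print(square_grid_for(16))  # (4, 0): already a perfect square
print(square_grid_for(20))  # (5, 5): 5 x 5 grid with 5 padded positions
```

Under that assumption, the padded positions carry no tissue, which is the kind of cell an invalid_mask marks as True on grid inputs.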

Returns:
dict

{"embeddings": cls_output}, where cls_output has shape [B, 768].
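Putting the parameters together, a call might look like the sketch below. It assumes patches tile the slide on a regular, non-overlapping grid and that coords hold top-left corners in level-0 pixels; neither convention is confirmed by the documentation. grid_coords and encode_one_slide are hypothetical helpers, and StubEncoder is a stand-in that mimics only the documented return shape so the example runs without model weights:

```python
import torch

def grid_coords(height, width, patch_size=224.0):
    """Return [H, W, 2] (x, y) positions for a regular, non-overlapping patch grid."""
    ys = torch.arange(height, dtype=torch.float32) * patch_size
    xs = torch.arange(width, dtype=torch.float32) * patch_size
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")  # each [H, W]
    return torch.stack([grid_x, grid_y], dim=-1)            # last dim is (x, y)

def encode_one_slide(model, feats, coords, patch_size=224.0, invalid_mask=None):
    """Call model.encode_slide() and return the [B, 768] slide embedding."""
    kwargs = {"patch_sizes": patch_size}
    if invalid_mask is not None:       # only forward the mask when one is given
        kwargs["invalid_mask"] = invalid_mask
    out = model.encode_slide(feats, coords=coords, **kwargs)
    return out["embeddings"]

class StubEncoder:
    """Hypothetical stand-in: returns zeros in the documented output shape."""
    def encode_slide(self, embeddings, coords=None, **kwargs):
        if coords is None:
            raise ValueError("coords is required")
        return {"embeddings": torch.zeros(embeddings.shape[0], 768)}

feats = torch.randn(1, 8, 8, 384)               # [B, H, W, 384] patch features
coords = grid_coords(8, 8).unsqueeze(0)         # [1, 8, 8, 2]
mask = torch.zeros(1, 8, 8, dtype=torch.bool)   # no patch marked invalid
slide_vec = encode_one_slide(StubEncoder(), feats, coords, invalid_mask=mask)
print(slide_vec.shape)  # torch.Size([1, 768])
```

With a real Moozy instance, pass it in place of StubEncoder(); the helper only relies on the documented encode_slide() signature and return dict.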