Albumentations in Life Sciences: Who Actually Uses It

Vladimir Iglovikov
Maintainer
10 min read
life-sciences, biomedical-imaging, microscopy, histopathology, radiology, bioimage-analysis, adoption

Albumentations is shared image-augmentation infrastructure for life-sciences AI. It shows up in radiology, histopathology, microscopy, endoscopy, ophthalmology, infectious-disease imaging, neuroscience imaging, and cell-analysis workflows.

This post is the receipts: how many life-sciences papers cite it, which OSS library declares it as a direct dependency, which named organizations import it in public repositories, and where it appears in public Hugging Face model and dataset cards.

All numbers below come from an internal evidence pipeline over public sources: citation metadata, GitHub Code Search, the Hugging Face Hub, and root-level packaging files (requirements.txt, pyproject.toml, etc.) in each OSS repo. The derived CSVs used for this audit are not published with the blog post, so treat the tables as an evidence brief rather than a fully self-contained replication package. The org-scoped GitHub query is org:<name> "import albumentations".

Headline

  • 563 life-sciences papers cite Albumentations
  • 1 OSS life-sciences library declares it as a direct dependency
  • 12 public repositories across 3 named life-sciences organizations import it
  • 33 Hugging Face artifacts in the life-sciences / biomedical-imaging tag space reference it

"Albumentations" here means the project stewarded by Albumentations LLC: the legacy MIT albumentations package (archived June 2025) plus the maintained successor albumentationsx (AGPL-3.0 + Commercial), which preserves API compatibility. See the dual-licensing post for context.

This is broader than the earlier medical-imaging audit. "Life sciences" includes clinical imaging, but also bioimage analysis, microscopy, cell biology, infectious-disease imaging, neuroscience imaging, and high-content screening.

Why Life Sciences Pulls in an Augmentation Library at All

Life-sciences image data is messy in a very specific way. It is not just "photos, but harder." A training sample might be a pathology tile, a fluorescence microscopy stack, a phase-contrast video frame, an OCT slice, a retinal image, a CT patch, a bacterial colony image, a cell mask, a polyp box, a landmarked organ view, or a multichannel assay plate.

Three details make augmentation infrastructure matter:

  1. Labels and images have to move together. Masks for nuclei, organs, lesions, cells, plaques, cysts, vessels, and tissue regions have to stay pixel-aligned with the image. The same is true for bounding boxes and keypoints. Albumentations is built around Compose over (image, mask, bboxes, keypoints), which is why it appears in segmentation, detection, and measurement pipelines rather than only in image-classification scripts.
  2. The valid invariances are biological and clinical, not generic. A square symmetry can be reasonable for histology tiles or microscopy crops. A horizontal flip can be wrong for laterality-sensitive radiology, ophthalmology, or surgical-orientation tasks. Brightness and contrast jitter may model staining, illumination, or scanner variation, but it is not a substitute for physics-aware acquisition modeling. The library gives you the mechanism; the domain decides what variation preserves the label.
  3. Multichannel throughput matters. Life-sciences data often goes beyond RGB: fluorescence channels, CT window stacks, multispectral microscopy, derived masks, and auxiliary channels. Augmentation usually runs CPU-side inside a data loader and has to feed the GPU. In the current 9-channel CPU benchmark, AlbumentationsX is fastest on 30 of 42 transforms, with pairwise wins on 33 of 41 transforms vs Kornia and 15 of 23 transforms vs Torchvision. That benchmark is not a biomedical benchmark by itself, but the arbitrary-channel constraint is directly relevant to life-sciences workflows.

Concretely, a conservative microscopy or pathology segmentation pipeline can look like this:

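```python
import albumentations as A
import numpy as np

# Pre-extracted tile (H, W, C) and its pixel-aligned label mask (H, W).
image = np.load("microscopy_tile.npy")
mask = np.load("cell_mask.npy")

transform = A.Compose([
    # Tiles must be at least 512x512 for this crop.
    A.RandomCrop(height=512, width=512),
    # All 8 flips / 90-degree rotations: no canonical "up" in a tissue or cell crop.
    A.SquareSymmetry(p=1.0),
    # Mild geometric jitter; the mask is warped with the same sampled parameters.
    A.Affine(
        scale=(0.9, 1.1),
        translate_percent=(-0.03, 0.03),
        rotate=(-10, 10),
        shear=(-3, 3),
        p=0.5,
    ),
    # Small intensity jitter; applied to the image only, never to the mask.
    A.RandomBrightnessContrast(
        brightness_limit=(-0.08, 0.08),
        contrast_limit=(-0.08, 0.08),
        p=0.4,
    ),
    A.GaussNoise(std_range=(0.01, 0.04), p=0.2),
])

out = transform(image=image, mask=mask)
tile, label = out["image"], out["mask"]
```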

In order, that pipeline is RandomCrop -> SquareSymmetry -> Affine -> RandomBrightnessContrast -> GaussNoise. For tissue patches, microscopy tiles, or cell-imaging crops, square symmetries are often defensible because there is no canonical camera-up direction. For chest X-ray, retinal laterality, surgical views, or acquisition-protocol-sensitive tasks, the same transform can be a bug.

When A.Compose is constructed with bbox_params and keypoint_params, the same pipeline also accepts bboxes=... and keypoints=... and keeps them aligned with the image and mask.

OSS Life-Sciences Libraries That Depend on Albumentations

These are repository-rooted facts. The dependency is declared in packaging files, not inferred from a citation graph or README mention.

Of 18 verified life-sciences OSS projects, 1 project declares albumentations as a direct dependency:

| Library | Org | Evidence file(s) | Repo |
|---|---|---|---|
| TIAToolbox | Tissue Image Analytics Centre | requirements/requirements.txt | TissueImageAnalytics/tiatoolbox |

TIAToolbox matters because it is a reusable pathology toolkit, not a one-off experiment repository. Direct dependency counts are conservative by design. They miss internal pharmaceutical, hospital, biotechnology, and research pipelines, plus public repositories that import Albumentations in training scripts without packaging it as a reusable library.
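The audit's packaging parser is not published. What follows is only a sketch of a direct-dependency check over a requirements.txt-style file.

Assumptions in this sketch:
- declares_direct_dependency is a hypothetical helper.
- It normalizes project names per PEP 503.
- Following the definition of "Albumentations" used in this post, it treats both albumentations and albumentationsx as matches.
- It does not parse pyproject.toml or setup.cfg, which a real audit would also need.

```python
import re


def declares_direct_dependency(
    requirements_text: str,
    names: tuple[str, ...] = ("albumentations", "albumentationsx"),
) -> bool:
    """Hypothetical helper: does a requirements.txt-style file list one of `names` directly?"""
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        if not line or line.startswith("-"):     # skip blanks and pip options such as -r
            continue
        # The project name ends at the first whitespace, extra, version, marker, or URL character.
        name = re.split(r"[\s\[<>=!~;@]", line, maxsplit=1)[0]
        # PEP 503 normalization: lowercase, collapse runs of -, _ and . into "-".
        if re.sub(r"[-_.]+", "-", name).lower() in names:
            return True
    return False


print(declares_direct_dependency("numpy>=1.23\nalbumentations>=1.3.1  # augmentation\n"))  # True
print(declares_direct_dependency("torch\n# albumentations\n-r extra.txt"))                 # False
```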

Named Life-Sciences Organizations Using It

Org-scoped GitHub Code Search (org:<name> "import albumentations") found the import statement in 12 repositories across 3 organizations. The organizations come from a hand-curated tier-1 list of life-sciences organizations: medical AI toolkits, bioimage-analysis projects, microscopy and pathology tooling, clinical-imaging OSS, and life-science research labs.

| Organization | Repos | Type |
|---|---|---|
| MIC-DKFZ | 9 | Organization |
| bowang-lab | 2 | Organization |
| TissueImageAnalytics | 1 | Organization |

MIC-DKFZ is the largest public-code cluster in this audit. TissueImageAnalytics is the clearest reusable-library signal because TIAToolbox declares Albumentations as a dependency and imports it in stain-augmentation tooling. bowang-lab contributes public medical and biological imaging training code where Albumentations appears in the data pipeline.

A representative path list from the search:

| Repo | File |
|---|---|
| MIC-DKFZ/AGGC2022 | data/test_augs.py |
| MIC-DKFZ/BodyPartRegression | bpreg/preprocessing/nrrd2npy.py |
| MIC-DKFZ/diabetes-xai | feature_extraction/extract_features_fp_timm.py |
| MIC-DKFZ/generalized_yolov5 | utils/augmentations.py |
| MIC-DKFZ/help_a_hematologist_out_challenge | augmentation/policies/cifar.py |
| MIC-DKFZ/image_classification | src/glovita/augmentation/policies/dataset_specific/aid.py |
| MIC-DKFZ/perovskite-xai | data/augmentations/perov_2d.py |
| MIC-DKFZ/radioactive | src/radioa/model/SAMMed2D.py |
| MIC-DKFZ/semantic_segmentation | src/semantic_segmentation/datasets/base_dataset.py |
| TissueImageAnalytics/tiatoolbox | tiatoolbox/tools/stainaugment.py |
| bowang-lab/EchoJEPA | data/batch_depth_attenuation.py |
| bowang-lab/MedSAMSlicer | MedSAMLite/Resources/server_essentials/medsam_interface/engines/src/data/medsam_datamodule.py |
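The per-organization repository counts can be re-derived from this path list by counting distinct repositories per organization, not files. The sketch uses only the standard library, with the (repo, file) pairs copied from the list above:

```python
from collections import Counter

# (repo, file) pairs from the code-search results above.
hits = [
    ("MIC-DKFZ/AGGC2022", "data/test_augs.py"),
    ("MIC-DKFZ/BodyPartRegression", "bpreg/preprocessing/nrrd2npy.py"),
    ("MIC-DKFZ/diabetes-xai", "feature_extraction/extract_features_fp_timm.py"),
    ("MIC-DKFZ/generalized_yolov5", "utils/augmentations.py"),
    ("MIC-DKFZ/help_a_hematologist_out_challenge", "augmentation/policies/cifar.py"),
    ("MIC-DKFZ/image_classification", "src/glovita/augmentation/policies/dataset_specific/aid.py"),
    ("MIC-DKFZ/perovskite-xai", "data/augmentations/perov_2d.py"),
    ("MIC-DKFZ/radioactive", "src/radioa/model/SAMMed2D.py"),
    ("MIC-DKFZ/semantic_segmentation", "src/semantic_segmentation/datasets/base_dataset.py"),
    ("TissueImageAnalytics/tiatoolbox", "tiatoolbox/tools/stainaugment.py"),
    ("bowang-lab/EchoJEPA", "data/batch_depth_attenuation.py"),
    ("bowang-lab/MedSAMSlicer",
     "MedSAMLite/Resources/server_essentials/medsam_interface/engines/src/data/medsam_datamodule.py"),
]

# Count each repository once, even if several files in it match the query.
repos = {repo for repo, _path in hits}
per_org = Counter(repo.split("/", 1)[0] for repo in repos)

print(len(repos))             # 12
print(per_org.most_common())  # [('MIC-DKFZ', 9), ('bowang-lab', 2), ('TissueImageAnalytics', 1)]
```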

Academic Citations

Albumentations is cited by 563 unique life-sciences / biomedical-imaging papers. The count is filtered from an internal citation export containing 2,470 unique citing papers and 12,371 author-paper-affiliation rows.

The citation data is deduplicated by paper URL, with paper title as fallback. That detail matters because the raw citation export contains one row per (paper x author x affiliation), so counting rows would overstate adoption.
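Here is a minimal sketch of that deduplication rule. The internal export format is not published, so two parts of it are assumptions for illustration:
- the row schema: dicts with url, title, and author keys;
- the title normalization: lowercasing and collapsing whitespace.

A simple URL-then-title key has one limitation. If a paper has a URL in one row but not in another, it is counted twice.

```python
def count_unique_papers(rows: list[dict]) -> int:
    """Count distinct papers in a (paper x author x affiliation) export.

    Hypothetical schema: each row is a dict with optional "url" and "title" keys.
    The dedup key is the URL when present, otherwise a normalized title.
    """
    seen = set()
    for row in rows:
        url = (row.get("url") or "").strip()
        title = " ".join((row.get("title") or "").lower().split())
        key = ("url", url) if url else ("title", title)
        if key != ("title", ""):  # ignore rows with neither URL nor title
            seen.add(key)
    return len(seen)


# Five author rows describing two papers (all values are made up).
rows = [
    {"url": "https://example.org/paper-a", "title": "Paper A", "author": "Author 1"},
    {"url": "https://example.org/paper-a", "title": "Paper A", "author": "Author 2"},
    {"url": "https://example.org/paper-a", "title": "Paper A", "author": "Author 3"},
    {"url": "", "title": "Paper  B", "author": "Author 4"},
    {"url": None, "title": "paper b", "author": "Author 5"},
]
print(len(rows), count_unique_papers(rows))  # 5 2
```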

Year-over-Year Growth

| Year | Life-sciences papers citing Albumentations |
|---|---|
| 2020 | 18 |
| 2021 | 40 |
| 2022 | 78 |
| 2023 | 97 |
| 2024 | 113 |
| 2025 | 148 |
| 2026 | 69 |

The visible pattern is steady growth through 2025, with 2026 already substantial as of May 12. The conservative interpretation is simple: life-sciences ML papers increasingly publish code, increasingly use standard augmentation libraries instead of local one-off transforms, and increasingly cite the tooling that sits in the training pipeline.
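The growth figures can be recomputed directly from the table. This sketch sums the yearly counts and computes year-over-year change for complete years only, because 2026 covers only January through May 12:

```python
papers_per_year = {2020: 18, 2021: 40, 2022: 78, 2023: 97, 2024: 113, 2025: 148, 2026: 69}

total = sum(papers_per_year.values())
print(total)  # 563

# Year-over-year change for complete years only; 2026 is partial (through May 12).
complete = [y for y in sorted(papers_per_year) if y <= 2025]
growth = {
    cur: papers_per_year[cur] / papers_per_year[prev] - 1
    for prev, cur in zip(complete, complete[1:])
}
for year, g in growth.items():
    print(f"{year}: {g:+.0%}")
# 2021: +122%
# 2022: +95%
# 2023: +24%
# 2024: +16%
# 2025: +31%
```

The sum matches the 563 headline figure, and every complete year shows a positive change over the previous one.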

Top-Cited Life-Sciences Papers (Sample)

The truncated titles are exactly what the citation export returned in this audit. The point of the table is not bibliographic polish; it is a concrete sample of life-sciences papers where Albumentations appears in the citation trail.

Top Affiliations

Affiliations with at least three life-sciences papers in the filtered citation set:

| Affiliation | Papers |
|---|---|
| Radboud University Medical Center | 6 |
| University of Electronic Science and Technology of China | 6 |
| University College London | 5 |
| University of Pennsylvania | 5 |
| Memorial Sloan Kettering Cancer Center | 4 |
| Technical University of Munich | 4 |
| University of Oxford | 4 |
| University of Ulsan College of Medicine, Seoul | 4 |
| Affiliated Hospital of Hubei University of Arts and Science | 3 |
| Beihang University | 3 |
| Case Western Reserve University | 3 |
| Chinese Academy of Sciences, Shenzhen | 3 |
| Chulalongkorn University | 3 |
| Concordia University | 3 |
| First Affiliated Hospital of Jinan University | 3 |

Hugging Face Ecosystem

Across Hugging Face Hub artifacts tagged medical / medical-imaging / radiology / histopathology / microscopy / healthcare / biology / bioimage / cell-segmentation / drug-discovery, 33 artifacts reference Albumentations in their model or dataset card: 32 models and 1 dataset.

The absolute download counts are small for most of these cards, which is normal for specialized biomedical artifacts on Hugging Face. The useful signal is not popularity ranking. The useful signal is that Albumentations appears in public training recipes across radiology, histopathology, endoscopy, pressure-sore classification, polyp segmentation, cell segmentation, and related biomedical tasks.

| Kind | ID | Downloads | Likes | Tags |
|---|---|---|---|---|
| model | Snarcy/RedDino-large | 423 | 1 | medical-imaging |
| dataset | LosHuesitos9-9/Huesitos | 66 | 1 | medical |
| model | Lab-Rasool/PRIMER | 9 | 1 | radiology |
| model | ibrahim313/ducknet-polyp-segmentation | 4 | 1 | medical-imaging |
| model | RuthvikBandari/DiaFootAI | 4 | 0 | medical-imaging |
| model | Thiyaga158/Custom_CNN_For_Pneumonia_Detection_Using_Check_X-Ray | 0 | 0 | healthcare; medical-imaging |
| model | dheeren-tejani/DiabeticRetinpathyClassifier | 0 | 0 | medical-imaging |
| model | adelelsayed1991/chexpert-mae-densenet-fpn | 0 | 0 | healthcare; medical-imaging |
| model | ayanahmedkhan/VIT-gi-endoscopy-classifier | 0 | 0 | medical-imaging |
| model | RuthvikBandari/DiaFoot.AI-v2 | 0 | 0 | medical-imaging |
| model | tanishq74/retinasense-vit | 0 | 1 | medical-imaging |
| model | MrCzaro/Pressure_sore_cascade_classifier_Torch | 0 | 0 | medical-imaging |
| model | csmp-hub/cellpose-histo-hgsc-nuc-v1 | 0 | 0 | histopathology |
| model | csmp-hub/hovernet-histo-hgsc-nuc-v1 | 0 | 0 | histopathology |
| model | csmp-hub/stardist-histo-hgsc-nuc-v1 | 0 | 0 | histopathology |
| model | csmp-hub/cellvit-histo-hgsc-nuc-v1 | 0 | 0 | histopathology |
| model | csmp-hub/cppnet-histo-hgsc-nuc-v1 | 0 | 0 | histopathology |
| model | histolytics-hub/hovernet-histo-hgsc-pan-v1 | 0 | 0 | histopathology |
| model | histolytics-hub/cellpose-histo-hgsc-pan-v1 | 0 | 0 | histopathology |
| model | histolytics-hub/stardist-histo-hgsc-pan-v1 | 0 | 0 | histopathology |

Life-Sciences Subcategory Rollup

A single paper or repository can match more than one subcategory, so these are evidence rollups rather than mutually exclusive totals.
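Here is a toy illustration of why the rollups are not mutually exclusive. The keyword map and titles are hypothetical; the audit's real matching rules are not published.

One title mentions both radiology and pathology, so it increments two subcategories. As a result, the subcategory counts sum to more than the number of titles.

```python
from collections import Counter

# Hypothetical keyword map; matching is a plain substring test on lowercased titles.
SUBCATEGORY_KEYWORDS = {
    "Histopathology and digital pathology": ("histopathology", "whole slide", "pathology"),
    "Microscopy and bioimage analysis": ("microscopy", "bioimage"),
    "Radiology and clinical imaging": ("radiology", "x-ray", "computed tomography"),
}


def rollup(titles: list[str]) -> Counter:
    counts = Counter()
    for title in titles:
        text = title.lower()
        for subcategory, keywords in SUBCATEGORY_KEYWORDS.items():
            if any(k in text for k in keywords):
                counts[subcategory] += 1  # one title can increment several subcategories
    return counts


titles = [
    "Nuclei segmentation in fluorescence microscopy",
    "Stain augmentation for histopathology whole slide images",
    "Correlating radiology and pathology findings in lung cancer",
]
counts = rollup(titles)
print(len(titles), sum(counts.values()))  # 3 4
```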

Academic Papers

| Subcategory | Count |
|---|---|
| Radiology and clinical imaging | 259 |
| Biomedical imaging | 75 |
| Microscopy and bioimage analysis | 56 |
| Histopathology and digital pathology | 44 |
| Infectious disease and immunology imaging | 43 |
| Neuroscience imaging | 29 |
| Cell and developmental biology imaging | 6 |
| Therapeutics discovery and high-content screening | 2 |

Public Repositories

| Subcategory | Count |
|---|---|
| Histopathology and digital pathology | 1 |

Hugging Face Artifacts

| Subcategory | Count |
|---|---|
| Histopathology and digital pathology | 20 |
| Microscopy and bioimage analysis | 9 |
| Radiology and clinical imaging | 2 |

What This Means

Life-sciences image workflows depend on label-preserving transforms: microscopy channels, histopathology tiles, radiology slices, endoscopy frames, cell masks, organ masks, boxes, landmarks, and metadata all have to stay aligned. The public evidence above shows the Albumentations ecosystem acting as shared infrastructure across those workflows, not as a single-purpose medical-imaging script.

The most important caveat is that biological and clinical augmentation is less forgiving than generic computer vision. A transform can be technically correct and scientifically wrong. HorizontalFlip can be harmless for many tissue patches and harmful for laterality-sensitive tasks. RandomBrightnessContrast can model nuisance variation in illumination or staining, but it does not replace scanner or assay physics. ElasticTransform can help in some microscopy and histology segmentation settings and can destroy morphology in others.

Every named organization in the table above has public code that imports Albumentations. Installing TIAToolbox pulls in Albumentations as a transitive dependency for its users. The 563-paper citation count is a lower bound because it only counts papers whose metadata explicitly contains life-sciences or biomedical-imaging keywords. It does not attempt to count private clinical, pharmaceutical, biotechnology, or research usage.

If you maintain a life-sciences OSS project, foundation model, or training pipeline and want to be added to or removed from this evidence set, ping me. The audit is scripted internally and can be rerun on request.


This brief is generated from an internal evidence pipeline over public APIs and public repository files. The derived artifacts are not published with this post. Last regenerated 2026-05-12.

Hero image: cropped and resized from An Image of Microorganisms by turek on Pexels.