Albumentations in Life Sciences: Who Actually Uses It

Albumentations is shared image-augmentation infrastructure for life-sciences AI. It shows up in radiology, histopathology, microscopy, endoscopy, ophthalmology, infectious-disease imaging, neuroscience imaging, and cell-analysis workflows.

This post is the receipts: how many life-sciences papers cite it, which OSS library declares it as a direct dependency, which named organizations import it in public repositories, and where it appears in public Hugging Face model and dataset cards.

All numbers below come from an internal evidence pipeline over public sources: citation metadata, GitHub Code Search, the Hugging Face Hub, and root-level packaging files (requirements.txt, pyproject.toml, etc.) in each OSS repo. The derived CSVs used for this audit are not published with the blog post, so treat the tables as an evidence brief rather than a fully self-contained replication package. The org-scoped GitHub query is org:<name> "import albumentations".

Headline

563 life-sciences papers cite Albumentations
1 OSS life-sciences library declares it as a direct dependency
12 public repositories across 3 named life-sciences organizations import it
33 Hugging Face artifacts in the life-sciences / biomedical-imaging tag space reference it

"Albumentations" here means the project stewarded by Albumentations LLC: the legacy MIT albumentations package (archived June 2025) plus the maintained successor albumentationsx (AGPL-3.0 + Commercial), which preserves API compatibility. See the dual-licensing post for context.

This is broader than the earlier medical-imaging audit. "Life sciences" includes clinical imaging, but also bioimage analysis, microscopy, cell biology, infectious-disease imaging, neuroscience imaging, and high-content screening.

Why Life Sciences Pulls in an Augmentation Library at All

Life-sciences image data is messy in a very specific way. It is not just "photos, but harder." A training sample might be a pathology tile, a fluorescence microscopy stack, a phase-contrast video frame, an OCT slice, a retinal image, a CT patch, a bacterial colony image, a cell mask, a polyp box, a landmarked organ view, or a multichannel assay plate.

Three details make augmentation infrastructure matter:

Labels and images have to move together. Masks for nuclei, organs, lesions, cells, plaques, cysts, vessels, and tissue regions have to stay pixel-aligned with the image. The same is true for bounding boxes and keypoints. Albumentations is built around Compose over (image, mask, bboxes, keypoints), which is why it appears in segmentation, detection, and measurement pipelines rather than only in image-classification scripts.
The valid invariances are biological and clinical, not generic. A square symmetry can be reasonable for histology tiles or microscopy crops. A horizontal flip can be wrong for laterality-sensitive radiology, ophthalmology, or surgical-orientation tasks. Brightness and contrast jitter may model staining, illumination, or scanner variation, but it is not a substitute for physics-aware acquisition modeling. The library gives you the mechanism; the domain decides what variation preserves the label.
Multichannel throughput matters. Life-sciences data often goes beyond RGB: fluorescence channels, CT window stacks, multispectral microscopy, derived masks, and auxiliary channels. Augmentation usually runs CPU-side inside a data loader and has to feed the GPU. In the current 9-channel CPU benchmark, AlbumentationsX is fastest on 30 of 42 transforms, with pairwise wins on 33 of 41 transforms vs Kornia and 15 of 23 transforms vs Torchvision. That benchmark is not a biomedical benchmark by itself, but the arbitrary-channel constraint is directly relevant to life-sciences workflows.

Concretely, a conservative microscopy or pathology segmentation pipeline can look like this:

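import albumentations as A
import numpy as np

# Placeholder paths: an image tile and its pixel-aligned label mask.
# Both arrays must be at least 512 x 512 for the crop below.
image = np.load("microscopy_tile.npy")
mask = np.load("cell_mask.npy")

transform = A.Compose([
    # Spatial transforms are applied identically to the image and the mask.
    A.RandomCrop(height=512, width=512),
    # Random flip/rotation drawn from the symmetries of a square.
    # Only appropriate when the data has no canonical orientation.
    A.SquareSymmetry(p=1.0),
    # Mild geometric jitter.
    A.Affine(
        scale=(0.9, 1.1),
        translate_percent=(-0.03, 0.03),
        rotate=(-10, 10),
        shear=(-3, 3),
        p=0.5,
    ),
    # Pixel-level transforms change the image only; the mask is left untouched.
    A.RandomBrightnessContrast(
        brightness_range=(-0.08, 0.08),
        contrast_range=(-0.08, 0.08),
        p=0.4,
    ),
    A.GaussNoise(std_range=(0.01, 0.04), p=0.2),
])

# Passing image and mask in one call is what keeps them aligned.
out = transform(image=image, mask=mask)
tile, label = out["image"], out["mask"]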

In order, that pipeline is RandomCrop -> SquareSymmetry -> Affine -> RandomBrightnessContrast -> GaussNoise. For tissue patches, microscopy tiles, or cell-imaging crops, square symmetries are often defensible because there is no canonical camera-up direction. For chest X-ray, retinal laterality, surgical views, or acquisition-protocol-sensitive tasks, the same transform can be a bug.

The same Compose pipeline would also accept bboxes=... and keypoints=... and keep them aligned.

OSS Life-Sciences Libraries That Depend on Albumentations

These are repository-rooted facts. The dependency is declared in packaging files, not inferred from a citation graph or README mention.

Of 18 verified life-sciences OSS projects, 1 project declares albumentations as a direct dependency:

Library	Org	Evidence file(s)	Repo
TIAToolbox	Tissue Image Analytics Centre	`requirements/requirements.txt`	TissueImageAnalytics/tiatoolbox

TIAToolbox matters because it is a reusable pathology toolkit, not a one-off experiment repository. Direct dependency counts are conservative by design. They miss internal pharmaceutical, hospital, biotechnology, and research pipelines, plus public repositories that import Albumentations in training scripts without packaging it as a reusable library.
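
As a rough illustration of what "declared in packaging files" means, the sketch below checks requirements.txt-style text for a direct dependency on either package name. The function name, the regular expression, and the sample file contents are all assumptions for illustration; the audit's actual parser is not published.

```python
import re


def declares_dependency(requirements_text, packages=("albumentations", "albumentationsx")):
    """Return True if requirements.txt-style text lists one of `packages`."""
    names = "|".join(re.escape(p) for p in packages)
    # The name must be followed by end of line, an extras bracket,
    # a version specifier, an environment marker, a URL pin, or a comment.
    pattern = re.compile(rf"^({names})\s*($|[\[<>=!~;#@])", re.IGNORECASE)
    for raw_line in requirements_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "-")):
            continue  # skip blanks, comments, and pip options such as -r / -e
        if pattern.match(line):
            return True
    return False


# Hypothetical file contents, not copied from any real repository.
sample = """\
# hypothetical requirements file
numpy>=1.23
albumentations>=1.3.0
scikit-image
"""

print(declares_dependency(sample))
```

A README mention or a commented-out line does not count under this check, which is the sense in which the dependency number is conservative.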

Named Life-Sciences Organizations Using It

Org-scoped GitHub Code Search (org:<name> "import albumentations") found import albumentations in 12 repositories across 3 organizations from a hand-curated tier-1 life-sciences list: medical AI toolkits, bioimage-analysis projects, microscopy and pathology tooling, clinical-imaging OSS, and life-science research labs.

Organization	Repos	Type
MIC-DKFZ	9	Organization
bowang-lab	2	Organization
TissueImageAnalytics	1	Organization

MIC-DKFZ is the largest public-code cluster in this audit. TissueImageAnalytics is the clearest reusable-library signal because TIAToolbox declares Albumentations as a dependency and imports it in stain-augmentation tooling. bowang-lab contributes public medical and biological imaging training code where Albumentations appears in the data pipeline.

A representative path list from the search:

Repo	File
MIC-DKFZ/AGGC2022	`data/test_augs.py`
MIC-DKFZ/BodyPartRegression	`bpreg/preprocessing/nrrd2npy.py`
MIC-DKFZ/diabetes-xai	`feature_extraction/extract_features_fp_timm.py`
MIC-DKFZ/generalized_yolov5	`utils/augmentations.py`
MIC-DKFZ/help_a_hematologist_out_challenge	`augmentation/policies/cifar.py`
MIC-DKFZ/image_classification	`src/glovita/augmentation/policies/dataset_specific/aid.py`
MIC-DKFZ/perovskite-xai	`data/augmentations/perov_2d.py`
MIC-DKFZ/radioactive	`src/radioa/model/SAMMed2D.py`
MIC-DKFZ/semantic_segmentation	`src/semantic_segmentation/datasets/base_dataset.py`
TissueImageAnalytics/tiatoolbox	`tiatoolbox/tools/stainaugment.py`
bowang-lab/EchoJEPA	`data/batch_depth_attenuation.py`
bowang-lab/MedSAMSlicer	`MedSAMLite/Resources/server_essentials/medsam_interface/engines/src/data/medsam_datamodule.py`
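
For readers who want to repeat the org-scoped search, here is a minimal sketch that only builds the query strings and request URLs for the GitHub REST code-search endpoint. It makes no network calls; authentication, pagination, and rate-limit handling are left out, and the helper names are hypothetical rather than taken from the audit pipeline.

```python
from urllib.parse import urlencode

GITHUB_CODE_SEARCH = "https://api.github.com/search/code"


def org_query(org, needle="import albumentations"):
    """Build the org-scoped query: org:<name> "import albumentations"."""
    return f'org:{org} "{needle}"'


def search_url(org, per_page=100):
    """Build the request URL; sending it requires an authenticated client."""
    params = {"q": org_query(org), "per_page": per_page}
    return f"{GITHUB_CODE_SEARCH}?{urlencode(params)}"


# The three organizations named in the table above.
orgs = ["MIC-DKFZ", "bowang-lab", "TissueImageAnalytics"]
urls = {org: search_url(org) for org in orgs}

for org, url in urls.items():
    print(org, url)
```

Code-search results are per file, so counting repositories means collapsing hits by repository name before tallying.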

Academic Citations

Albumentations is cited by 563 unique life-sciences / biomedical-imaging papers. The count is filtered from an internal citation export containing 2,470 unique citing papers and 12,371 author-paper-affiliation rows.

The citation data is deduplicated by paper URL, with paper title as fallback. That detail matters because the raw citation export contains one row per (paper × author × affiliation), so counting rows would overstate adoption.
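
The deduplication rule can be sketched in a few lines. The field names (paper_url, title) and the sample rows are hypothetical; the real export schema is not published with this post.

```python
def count_unique_papers(rows):
    """Count papers, keyed by URL when present and by normalized title otherwise."""
    seen = set()
    for row in rows:
        url = (row.get("paper_url") or "").strip().lower()
        # Collapse case and whitespace so small title variations still match.
        title = " ".join((row.get("title") or "").lower().split())
        key = ("url", url) if url else ("title", title)
        if key[1]:
            seen.add(key)
    return len(seen)


# Hypothetical export rows: one row per (paper, author, affiliation).
rows = [
    {"paper_url": "https://example.org/p1", "title": "Paper One", "author": "A", "affiliation": "X"},
    {"paper_url": "https://example.org/p1", "title": "Paper One", "author": "B", "affiliation": "Y"},
    {"paper_url": "", "title": "Paper  Two", "author": "C", "affiliation": "Z"},
    {"paper_url": None, "title": "paper two", "author": "D", "affiliation": "Z"},
]

print(len(rows), count_unique_papers(rows))
```

Four rows collapse to two papers here, which is the same effect that turns 12,371 rows into 2,470 unique citing papers in the export.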

Year-over-Year Growth

Year	Life-sciences papers citing Albumentations
2020	18
2021	40
2022	78
2023	97
2024	113
2025	148
2026 (through May 12)	69

The visible pattern is steady growth through 2025, with 2026 already substantial as of May 12. The conservative interpretation is simple: life-sciences ML papers increasingly publish code, increasingly use standard augmentation libraries instead of local one-off transforms, and increasingly cite the tooling that sits in the training pipeline.

Top-Cited Life-Sciences Papers (Sample)

Citations	Year	Paper	Matched keyword
6	2025	Rapid label-free identification of seven bacterial species using microfluidics, single-cell time-lapse phase-contrast mi	microscopy
6	2024	Rapid label-free identification of seven bacterial species using microfluidics, single-cell time-lapse phase-contrast mi	microscopy
5	2021	Semi-supervised training of deep convolutional neural networks with heterogeneous data and few local annotations: An exp	histopathology
5	2024	Multimodal representations of biomedical knowledge from limited training whole slide images and reports using deep learn	whole slide
5	2025	Automatic labels are as effective as manual labels in digital pathology images classification with deep learning	digital pathology
5	2021	Impact of Lung Segmentation on the Diagnosis and Explanation of COVID-19 in Chest X-ray Images	covid
5	2024	MTANet: Multi-Type Attention Ensemble for Malaria Parasite Detection	malaria
5	2025	Segmentation and quantification of atherosclerotic plaques in optical coherence tomography	optical coherence tomography
5	2026	A Transformer-Based Framework for OCT Cyst Segmentation	oct
4	2023	AUTOMATIC POLYP SEMANTIC SEGMENTATION USING WIRELESS CAPSULE ENDOSCOPY IMAGES WITH VARIOUS CONVOLUTIONAL NEURAL NETWORK	endoscopy

The truncated titles are exactly what the citation export returned in this audit. The point of the table is not bibliographic polish; it is a concrete sample of life-sciences papers where Albumentations appears in the citation trail.

Top Affiliations

Affiliations with at least three life-sciences papers in the filtered citation set:

Affiliation	Papers
Radboud University Medical Center	6
University of Electronic Science and Technology of China	6
University College London	5
University of Pennsylvania	5
Memorial Sloan Kettering Cancer Center	4
Technical University of Munich	4
University of Oxford	4
University of Ulsan College of Medicine, Seoul	4
Affiliated Hospital of Hubei University of Arts and Science	3
Beihang University	3
Case Western Reserve University	3
Chinese Academy of Sciences, Shenzhen	3
Chulalongkorn University	3
Concordia University	3
First Affiliated Hospital of Jinan University	3

Hugging Face Ecosystem

Across Hugging Face Hub artifacts tagged medical / medical-imaging / radiology / histopathology / microscopy / healthcare / biology / bioimage / cell-segmentation / drug-discovery, 33 artifacts reference Albumentations in their model or dataset card: 32 models and 1 dataset.

The absolute download counts are small for most of these cards, which is normal for specialized biomedical artifacts on Hugging Face. The useful signal is not popularity ranking. The useful signal is that Albumentations appears in public training recipes across radiology, histopathology, endoscopy, pressure-sore classification, polyp segmentation, cell segmentation, and related biomedical tasks. The table below lists 20 of the 33 artifacts, sorted by downloads.

Kind	ID	Downloads	Likes	Tags
model	Snarcy/RedDino-large	423	1	medical-imaging
dataset	LosHuesitos9-9/Huesitos	66	1	medical
model	Lab-Rasool/PRIMER	9	1	radiology
model	ibrahim313/ducknet-polyp-segmentation	4	1	medical-imaging
model	RuthvikBandari/DiaFootAI	4	0	medical-imaging
model	Thiyaga158/Custom_CNN_For_Pneumonia_Detection_Using_Check_X-Ray	0	0	healthcare; medical-imaging
model	dheeren-tejani/DiabeticRetinpathyClassifier	0	0	medical-imaging
model	adelelsayed1991/chexpert-mae-densenet-fpn	0	0	healthcare; medical-imaging
model	ayanahmedkhan/VIT-gi-endoscopy-classifier	0	0	medical-imaging
model	RuthvikBandari/DiaFoot.AI-v2	0	0	medical-imaging
model	tanishq74/retinasense-vit	0	1	medical-imaging
model	MrCzaro/Pressure_sore_cascade_classifier_Torch	0	0	medical-imaging
model	csmp-hub/cellpose-histo-hgsc-nuc-v1	0	0	histopathology
model	csmp-hub/hovernet-histo-hgsc-nuc-v1	0	0	histopathology
model	csmp-hub/stardist-histo-hgsc-nuc-v1	0	0	histopathology
model	csmp-hub/cellvit-histo-hgsc-nuc-v1	0	0	histopathology
model	csmp-hub/cppnet-histo-hgsc-nuc-v1	0	0	histopathology
model	histolytics-hub/hovernet-histo-hgsc-pan-v1	0	0	histopathology
model	histolytics-hub/cellpose-histo-hgsc-pan-v1	0	0	histopathology
model	histolytics-hub/stardist-histo-hgsc-pan-v1	0	0	histopathology

Life-Sciences Subcategory Rollup

A single paper or repository can match more than one subcategory, so these are evidence rollups rather than mutually exclusive totals.

Academic Papers

Subcategory	Count
Radiology and clinical imaging	259
Biomedical imaging	75
Microscopy and bioimage analysis	56
Histopathology and digital pathology	44
Infectious disease and immunology imaging	43
Neuroscience imaging	29
Cell and developmental biology imaging	6
Therapeutics discovery and high-content screening	2

Public Repositories

Subcategory	Count
Histopathology and digital pathology	1

Hugging Face Artifacts

Subcategory	Count
Histopathology and digital pathology	20
Microscopy and bioimage analysis	9
Radiology and clinical imaging	2
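
Why the rollup counts can add up to more than the number of items: a minimal sketch of keyword-based, multi-label assignment. The keyword map and the two titles are invented for illustration and are not the audit's actual matching rules.

```python
from collections import Counter

# Hypothetical keyword map; the audit's real keyword lists are not published.
SUBCATEGORY_KEYWORDS = {
    "Histopathology and digital pathology": ("histopathology", "whole slide", "digital pathology"),
    "Microscopy and bioimage analysis": ("microscopy",),
    "Radiology and clinical imaging": ("chest x-ray", "radiograph"),
}


def rollup(titles):
    """Count each title once per subcategory whose keywords it contains."""
    counts = Counter()
    for title in titles:
        text = title.lower()
        for subcategory, keywords in SUBCATEGORY_KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                counts[subcategory] += 1
    return counts


# Invented titles: the first one matches two subcategories.
titles = [
    "Fluorescence microscopy of whole slide tissue sections",
    "Chest X-ray triage with a small CNN",
]

counts = rollup(titles)
print(len(titles), sum(counts.values()), dict(counts))
```

Two titles produce three subcategory hits, so the per-subcategory numbers should be read as coverage, not as a partition of the total.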

What This Means

Life-sciences image workflows depend on label-preserving transforms: microscopy channels, histopathology tiles, radiology slices, endoscopy frames, cell masks, organ masks, boxes, landmarks, and metadata all have to stay aligned. The public evidence above shows the Albumentations ecosystem acting as shared infrastructure across those workflows, not as a single-purpose medical-imaging script.

The most important caveat is that biological and clinical augmentation is less forgiving than generic computer vision. A transform can be technically correct and scientifically wrong. HorizontalFlip can be harmless for many tissue patches and harmful for laterality-sensitive tasks. RandomBrightnessContrast can model nuisance variation in illumination or staining, but it does not replace scanner or assay physics. ElasticTransform can help in some microscopy and histology segmentation settings and can destroy morphology in others.
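
A toy illustration of "technically correct and scientifically wrong", using only NumPy. The image, the synthetic finding, and the brighter_side labeling rule are all assumptions; the point is only that a flip changes a side-dependent label unless the label is updated with it.

```python
import numpy as np


def brighter_side(image):
    """Toy laterality label: which half of the image holds more intensity."""
    mid = image.shape[1] // 2
    left, right = image[:, :mid].sum(), image[:, mid:].sum()
    return "left" if left > right else "right"


# Synthetic image with a bright "finding" placed in the left half.
image = np.zeros((64, 64), dtype=np.float32)
image[20:40, 5:25] = 1.0

label = brighter_side(image)

# The flip itself is a perfectly valid array operation...
flipped = np.fliplr(image)
# ...but the finding now sits in the right half, so a stored "left"
# label no longer describes the flipped image.
flipped_label = brighter_side(flipped)

print(label, flipped_label)
```

For an orientation-free tissue patch there is no such label to break, which is why the same transform can be safe in one pipeline and a bug in another.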

Every named organization in the organizations table is a current, public-code user. Because TIAToolbox declares Albumentations as a direct dependency, installing TIAToolbox pulls Albumentations in transitively for its users. The 563-paper citation count is a lower bound: it counts only papers whose metadata explicitly contains life-sciences or biomedical-imaging keywords, and it does not attempt to count private clinical, pharmaceutical, biotechnology, or research usage.

If you maintain a life-sciences OSS project, foundation model, or training pipeline and want to be added to or removed from this evidence set, ping me. The audit is scripted internally and can be rerun on request.

This brief is generated from an internal evidence pipeline over public APIs and public repository files. The derived artifacts are not published with this post. Last regenerated 2026-05-12.

Hero image: cropped and resized from An Image of Microorganisms by turek on Pexels.