AlbumentationsX vs Torchvision

Compare AlbumentationsX with Torchvision transforms: API differences, RGB image benchmark results, and a PyTorch migration guide.

What Is Different?

Torchvision is the native augmentation layer for the PyTorch ecosystem. AlbumentationsX is framework-independent and optimized around NumPy/OpenCV image augmentation before tensors enter the model.

  • Torchvision commonly operates on PIL images or tensors; AlbumentationsX operates on NumPy arrays and can emit PyTorch tensors with ToTensorV2.
  • Torchvision integrates tightly with PyTorch datasets and model examples; AlbumentationsX focuses on faster CPU augmentation and richer computer-vision target handling.
  • AlbumentationsX pipelines pass named targets such as image, mask, bboxes, and keypoints; Torchvision's original (v1) transforms are mostly image-first, and joint handling of other targets requires the newer transforms.v2 API with tv_tensors.
  • AlbumentationsX tends to be easier when geometric transforms must stay consistent across masks, boxes, and keypoints.
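
To make the alignment point concrete, here is a minimal NumPy sketch of what "staying consistent" means for a horizontal flip. It is an illustration only, not AlbumentationsX's implementation: the helper name hflip_targets and the (x_min, y_min, x_max, y_max) pixel box format are assumptions made for this example.

```python
import numpy as np


def hflip_targets(image, mask, bboxes):
    """Flip image, mask, and boxes left-right together (hypothetical helper)."""
    width = image.shape[1]
    flipped_image = image[:, ::-1]  # reverse the column axis (HWC layout)
    flipped_mask = mask[:, ::-1]    # the mask must receive the same flip
    # Mirror box x-coordinates; x_min and x_max swap roles after the flip.
    flipped_boxes = [
        (width - x_max, y_min, width - x_min, y_max)
        for x_min, y_min, x_max, y_max in bboxes
    ]
    return flipped_image, flipped_mask, flipped_boxes


image = np.zeros((80, 100, 3), dtype=np.uint8)
mask = np.zeros((80, 100), dtype=np.uint8)
mask[20:60, 10:50] = 1  # object occupies the same region as the box
boxes = [(10, 20, 50, 60)]

_, new_mask, new_boxes = hflip_targets(image, mask, boxes)
print(new_boxes)  # [(50, 20, 90, 60)]
```

If any one of the three targets skips the flip, the box and mask no longer describe the object in the image. A pipeline that takes named targets applies the same sampled transform to all of them in one call.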

Benchmark vs Torchvision

The benchmark data is generated during the site build from the published benchmark artifacts. Results are shown for the full PyTorch DataLoader pipeline and for the single-operation RGB micro benchmark. GPU results appear in the DataLoader table where published artifacts exist; the RGB micro table on this page is CPU-only.

DataLoader Pipeline

End-to-end, in-memory DataLoader throughput with 8 workers and batch size 256. Normalization, tensor conversion, batching, and collation are included in the timing. Each pipeline varies a single transform, and the table lists only that transform's name. Higher images/second is better. CPU and GPU columns are shown where published artifacts exist.

  • 54 / 57 pipelines where AlbumentationsX is faster
  • 2.2x average speedup (average computed from benchmark table rows)
  • 2.2.6 vs 0.26.0 library versions (from generated benchmark metadata)

Transform | Speedup (Albx / best) | AlbumentationsX (CPU · 2.2.6) | Torchvision (CPU · 0.26.0) | Torchvision (GPU · 0.26.0)
--- | --- | --- | --- | ---
Elastic | 12x | 2878 ± 0 | 234 ± 0 | 118 ± 0
PhotoMetricDistort | 3.5x | 4269 ± 0 | 1234 ± 0 | 580 ± 2
ColorJitter | 3.4x | 4299 ± 0 | 1263 ± 0 | 629 ± 3
ColorJiggle | 3.4x | 4255 ± 0 | 1267 ± 0 | 626 ± 3
Sharpen | 2.1-2.2x | 4978 ± 0 | 1899 ± 0 | 2304 ± 13
GaussianBlur | 1.9x | 5029 ± 0 | 2654 ± 0 | 1957 ± 14
AutoContrast | 1.8-1.9x | 4646 ± 0 | 2424 ± 0 | 2562 ± 78
Contrast | 1.6x | 5435 ± 0 | 3313 ± 0 | 2282 ± 15
Perspective | 1.6x | 4132 ± 0 | 2640 ± 0 | 792 ± 4
Rotate | 1.5x | 4865 ± 0 | 3141 ± 0 | 1273 ± 8
Equalize | 1.5x | 4511 ± 0 | 3062 ± 0 | 1518 ± 5
Affine | 1.4x | 4528 ± 0 | 3175 ± 0 | 1221 ± 4
Brightness | 1.4x | 5310 ± 0 | 3888 ± 0 | 3097 ± 16
Erasing | 1.3x | 5415 ± 0 | 4068 ± 0 | 2150 ± 27
Solarize | 1.3x | 5407 ± 0 | 3569 ± 0 | 4130 ± 90
Grayscale | 1.3x | 5378 ± 0 | 4188 ± 0 | 3960 ± 122
JpegCompression | 1.2x | 4267 ± 0 | 3447 ± 0 | -
RandomResizedCrop | 1.2-1.3x | 5056 ± 0 | 3870 ± 0 | 4140 ± 182
ChannelShuffle | 1.2x | 5467 ± 0 | 4543 ± 0 | 3631 ± 51
Posterize | 1.1-1.2x | 5431 ± 0 | 4126 ± 0 | 4726 ± 73
Invert | 1.1-1.2x | 5569 ± 0 | 4065 ± 0 | 4868 ± 104
VerticalFlip | 1.1x | 5512 ± 0 | 4008 ± 0 | 5020 ± 58
Pad | 1.1x | 4996 ± 0 | 4162 ± 0 | 4634 ± 36
HorizontalFlip | 1.0x | 5002 ± 0 | 4024 ± 0 | 5050 ± 58
RandomCrop224 | 0.8x | 5130 ± 0 | 4367 ± 0 | 6253 ± 97
Resize | 0.5x | 1350 ± 0 | 1235 ± 0 | 2829 ± 98
Blur | - | 5507 ± 0 | - | -
CLAHE | - | 3505 ± 0 | - | -
ChannelDropout | - | 5314 ± 0 | - | -
CornerIllumination | - | 3969 ± 0 | - | -
EnhanceDetail | - | 5033 ± 0 | - | -
EnhanceEdge | - | 4923 ± 0 | - | -
GaussianIllumination | - | 3656 ± 0 | - | -
GaussianNoise | - | 3415 ± 0 | - | -
Hue | - | 4723 ± 0 | - | -
LinearIllumination | - | 4179 ± 0 | - | -
LongestMaxSize | - | 1363 ± 0 | - | -
MedianBlur | - | 4038 ± 0 | - | -
MotionBlur | - | 4785 ± 0 | - | -
OpticalDistortion | - | 3579 ± 0 | - | -
PlankianJitter | - | 4979 ± 0 | - | -
PlasmaBrightness | - | 2727 ± 0 | - | -
PlasmaContrast | - | 2291 ± 0 | - | -
PlasmaShadow | - | 2797 ± 0 | - | -
RGBShift | - | 4977 ± 0 | - | -
Rain | - | 4198 ± 0 | - | -
RandomGamma | - | 5450 ± 0 | - | -
RandomJigsaw | - | 4863 ± 0 | - | -
RandomRotate90 | - | 5087 ± 0 | - | -
SaltAndPepper | - | 4508 ± 0 | - | -
Saturation | - | 4646 ± 0 | - | -
Shear | - | 4141 ± 0 | - | -
SmallestMaxSize | - | 1370 ± 0 | - | -
Snow | - | 4135 ± 0 | - | -
ThinPlateSpline | - | 743 ± 0 | - | -
Transpose | - | 5338 ± 0 | - | -
UnsharpMask | - | 4635 ± 0 | - | -
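
The page does not spell out how the Speedup (Albx / best) column is derived. The sketch below assumes it divides the AlbumentationsX throughput by the best Torchvision throughput (CPU or GPU) and widens that into a range using the ± values. This reading is inferred from the rows, not taken from the benchmark source, and the function name speedup_range is hypothetical.

```python
def speedup_range(albx, albx_err, candidates):
    """Return (point, low, high) speedup of albx over the best candidate.

    candidates: list of (mean, err) throughputs in images/second.
    """
    best_mean, best_err = max(candidates, key=lambda c: c[0])
    point = albx / best_mean
    low = (albx - albx_err) / (best_mean + best_err)   # pessimistic bound
    high = (albx + albx_err) / (best_mean - best_err)  # optimistic bound
    return point, low, high


# Sharpen row of the DataLoader table: AlbumentationsX CPU 4978 ± 0,
# Torchvision CPU 1899 ± 0, Torchvision GPU 2304 ± 13.
point, low, high = speedup_range(4978, 0, [(1899, 0), (2304, 13)])
print(f"{point:.2f}x ({low:.1f}-{high:.1f}x)")  # 2.16x (2.1-2.2x)
```

Under this assumption the Sharpen row reproduces the published 2.1-2.2x. It also explains why a row such as Resize shows 0.5x even though AlbumentationsX leads the Torchvision CPU column: the comparison is against the faster GPU result.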

Micro Benchmark

The benchmark below compares single-threaded RGB image augmentation throughput. Torchvision receives PIL images; AlbumentationsX receives OpenCV-loaded RGB NumPy arrays. Higher images/second is better.
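
For readers who want to sanity-check a single transform on their own machine, a throughput measurement of this kind can be sketched in a few lines. This is a hypothetical harness, not the published benchmark code: the function name images_per_second, the best-of-N policy, and the toy transform are all assumptions for illustration.

```python
import time


def images_per_second(transform, images, repeats=3):
    """Best-of-N single-threaded throughput for a callable transform."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        for image in images:
            transform(image)
        # Guard against a zero reading on very coarse clocks.
        elapsed = max(time.perf_counter() - start, 1e-9)
        best = max(best, len(images) / elapsed)
    return best


# Toy stand-in: reverse a list of 64 values, 1000 times per repeat.
fake_images = [[0] * 64 for _ in range(1000)]
rate = images_per_second(lambda img: img[::-1], fake_images)
print(f"{rate:.0f} images/second")
```

To compare libraries fairly, pass each one the input type it expects, as the benchmark does (PIL images for Torchvision, NumPy arrays for AlbumentationsX), and keep image loading outside the timed loop.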

  • 26 / 26 transforms where AlbumentationsX is faster
  • 5.61x median speedup (IQR 3.78x-12.01x)
  • 2.2.6 vs 0.26.0 library versions (from generated benchmark metadata)

Transform | Speedup (Albx / best) | AlbumentationsX (2.2.6) | Torchvision (0.26.0)
--- | --- | --- | ---
GaussianBlur | 27x | 2343 ± 4 | 86 ± 0
Sharpen | 18-19x | 1388 ± 5 | 75 ± 0
Solarize | 18x | 9760 ± 34 | 545 ± 10
Contrast | 14-15x | 6933 ± 30 | 475 ± 7
ColorJitter | 14x | 641 ± 1 | 47 ± 0
ColorJiggle | 13-14x | 639 ± 5 | 47 ± 0
PhotoMetricDistort | 13x | 581 ± 4 | 45 ± 0
Elastic | ≥9.5x | 191 ± 0 | ≤20
Brightness | 8.4-8.8x | 6912 ± 13 | 804 ± 15
AutoContrast | 7.7-7.9x | 1243 ± 19 | 159 ± 0
Rotate | 6.1-6.5x | 1408 ± 40 | 223 ± 1
Invert | 5.4-6.1x | 15095 ± 61 | 2619 ± 152
VerticalFlip | 5.3-6.0x | 14051 ± 55 | 2490 ± 149
Posterize | 5.2-5.9x | 14399 ± 58 | 2598 ± 137
Pad | 5.1-5.8x | 13181 ± 118 | 2420 ± 122
Erasing | 4.9-5.3x | 9511 ± 74 | 1872 ± 71
RandomCrop224 | 3.9-5.3x | 38380 ± 192 | 8492 ± 1207
Grayscale | 4.2-4.5x | 5194 ± 1 | 1198 ± 32
HorizontalFlip | 4.0-4.4x | 8416 ± 19 | 1999 ± 82
Affine | 3.6-3.7x | 872 ± 8 | 240 ± 1
Perspective | 2.7-2.8x | 559 ± 2 | 202 ± 2
Equalize | 2.6x | 807 ± 3 | 313 ± 1
RandomResizedCrop | 2.4-2.7x | 7150 ± 19 | 2823 ± 172
Resize | 2.4-2.6x | 2463 ± 37 | 979 ± 23
ChannelShuffle | 2.2-2.4x | 4337 ± 13 | 1866 ± 72
JpegCompression | 1.3-1.4x | 692 ± 7 | 512 ± 4
Blur | - | 4449 ± 17 | -
CLAHE | - | 283 ± 1 | -
ChannelDropout | - | 6810 ± 65 | -
CornerIllumination | - | 425 ± 2 | -
EnhanceDetail | - | 2148 ± 13 | -
EnhanceEdge | - | 1373 ± 16 | -
GaussianIllumination | - | 388 ± 1 | -
GaussianNoise | - | 225 ± 0 | -
Hue | - | 967 ± 1 | -
LinearIllumination | - | 521 ± 1 | -
LongestMaxSize | - | 2825 ± 42 | -
MedianBlur | - | 843 ± 4 | -
MotionBlur | - | 1953 ± 21 | -
OpticalDistortion | - | 274 ± 1 | -
PlankianJitter | - | 2253 ± 17 | -
PlasmaBrightness | - | 267 ± 1 | -
PlasmaContrast | - | 143 ± 0 | -
PlasmaShadow | - | 420 ± 3 | -
RGBShift | - | 2292 ± 3 | -
Rain | - | 1259 ± 2 | -
RandomGamma | - | 9938 ± 46 | -
RandomJigsaw | - | 5172 ± 16 | -
RandomRotate90 | - | 5990 ± 85 | -
SaltAndPepper | - | 738 ± 10 | -
Saturation | - | 847 ± 17 | -
Shear | - | 784 ± 6 | -
SmallestMaxSize | - | 2017 ± 25 | -
Snow | - | 489 ± 3 | -
ThinPlateSpline | - | 52 ± 0 | -
Transpose | - | 4627 ± 26 | -
UnsharpMask | - | 906 ± 2 | -

See the aggregate image benchmark or inspect the benchmark source code.

Conversion Guide

In PyTorch projects, the usual migration is to keep your Dataset and DataLoader, replace torchvision.transforms with AlbumentationsX, then finish with ToTensorV2.

  • Read images as NumPy arrays, usually with OpenCV plus BGR to RGB conversion.
  • Replace transform lists with A.Compose.
  • For classification, return transformed['image'] after ToTensorV2.
  • For detection or segmentation, pass masks, bboxes, labels, and keypoint params through Compose instead of updating them manually.
Torchvision
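from torchvision import transforms

# Image-only pipeline; expects a PIL image as input.
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    # ColorJitter has no p argument: it is applied on every call.
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    # Converts to a CHW float tensor scaled to [0, 1].
    transforms.ToTensor(),
])

image = transform(pil_image)  # pil_image: a PIL.Image loaded elsewhere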
AlbumentationsX
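import albumentations as A
from albumentations.pytorch import ToTensorV2

# Named-target pipeline; expects an RGB NumPy array in HWC layout.
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    # p=0.5 applies this half of the time; use p=1.0 to mirror ColorJitter.
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
    # Converts HWC to a CHW tensor without rescaling pixel values,
    # unlike torchvision's ToTensor; add a normalization step if needed.
    ToTensorV2(),
])

# Compose takes keyword targets and returns a dict.
image = transform(image=image_np)["image"]  # image_np: RGB uint8 array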
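
The first migration step and the final tensor conversion are mostly array-layout bookkeeping. The NumPy sketch below shows the two operations involved: reversing OpenCV's BGR channel order to RGB, and moving from height-width-channel (HWC) layout to the channel-first (CHW) layout PyTorch expects. The synthetic bgr array stands in for what cv2.imread would return, and the helper names are hypothetical; in a real pipeline cv2.cvtColor and ToTensorV2 do this work.

```python
import numpy as np


def bgr_to_rgb(image_bgr):
    """Reverse the channel axis; same pixels as cv2.COLOR_BGR2RGB would give."""
    return np.ascontiguousarray(image_bgr[:, :, ::-1])


def hwc_to_chw(image_hwc):
    """Move channels first, the layout PyTorch models consume."""
    return np.transpose(image_hwc, (2, 0, 1))


# Synthetic stand-in for cv2.imread output: a pure-blue 4x6 image in BGR order.
bgr = np.zeros((4, 6, 3), dtype=np.uint8)
bgr[:, :, 0] = 255  # channel 0 is blue in BGR

rgb = bgr_to_rgb(bgr)  # blue now sits in channel 2
chw = hwc_to_chw(rgb)  # shape (3, 4, 6)
print(rgb[0, 0].tolist(), chw.shape)  # [0, 0, 255] (3, 4, 6)
```

Skipping the BGR to RGB step does not raise an error. It silently swaps red and blue, which matters when the model or the normalization statistics assume RGB input.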

Use AlbumentationsX When

  • CPU augmentation speed matters in your PyTorch training pipeline.
  • You work on detection, segmentation, keypoints, or multi-input augmentation where targets must stay aligned.
  • You want the same augmentation library across PyTorch, TensorFlow, Keras, and custom training loops.

Use Torchvision When

  • You are building simple PyTorch classification baselines that already follow torchvision examples.
  • Your workflow is tensor-native and relies on torchvision transforms, tv_tensors, or PyTorch-only deployment assumptions.