# AlbumentationsX vs Torchvision
Compare AlbumentationsX with Torchvision transforms: API differences, RGB image benchmark results, and a PyTorch migration guide.
## What Is Different?
Torchvision is the native augmentation layer for the PyTorch ecosystem. AlbumentationsX is framework-independent and optimized for NumPy/OpenCV image augmentation that runs before tensors enter the model.
- Torchvision commonly operates on PIL images or tensors; AlbumentationsX operates on NumPy arrays and can emit PyTorch tensors with `ToTensorV2`.
- Torchvision integrates tightly with PyTorch datasets and model examples; AlbumentationsX focuses on faster CPU augmentation and richer computer-vision target handling.
- AlbumentationsX pipelines pass named targets such as `image`, `mask`, `bboxes`, and `keypoints`; Torchvision v1-style pipelines are mostly image-first unless you use the newer `tv_tensors` stack.
- AlbumentationsX tends to be easier when geometric transforms must stay consistent across masks, boxes, and keypoints.
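To see why unified target handling matters, here is a plain-NumPy illustration (not AlbumentationsX code): a horizontal flip must mirror the image, the mask, and every box x-coordinate in lockstep, which is what a single pipeline call does for you behind the scenes.

```python
import numpy as np

# Illustration only (plain NumPy, not the AlbumentationsX API): a horizontal
# flip must be applied identically to every target to keep them aligned.
def hflip_targets(image, mask, bboxes):
    """Flip an HxWxC image, an HxW mask, and pascal_voc-style boxes together."""
    h, w = image.shape[:2]
    flipped_image = image[:, ::-1]            # mirror the columns
    flipped_mask = mask[:, ::-1]              # same geometric op on the mask
    flipped_bboxes = [
        (w - x_max, y_min, w - x_min, y_max)  # mirror the x-coordinates
        for (x_min, y_min, x_max, y_max) in bboxes
    ]
    return flipped_image, flipped_mask, flipped_bboxes

image = np.arange(2 * 4 * 3, dtype=np.uint8).reshape(2, 4, 3)
mask = np.array([[0, 0, 1, 1], [0, 0, 1, 1]], dtype=np.uint8)
img_f, mask_f, boxes_f = hflip_targets(image, mask, [(2, 0, 4, 2)])
```

Forgetting any one of the three updates silently desynchronizes the targets, which is the class of bug the named-target API is designed to prevent.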
## Benchmark vs Torchvision
The benchmark below compares single-threaded RGB image augmentation throughput. Torchvision receives PIL images; AlbumentationsX receives OpenCV-loaded RGB NumPy arrays. Higher images/second is better.
| Transform | AlbumentationsX 2.2.5 (CPU, macOS arm64), img/s | Torchvision 0.26.0 (CPU, macOS arm64), img/s | Speedup AlbumentationsX / Torchvision (range) |
|---|---|---|---|
| Elastic | 453 ± 2 | 8 ± 0 | 53-55x |
| Resize | 3542 ± 11 | 288 ± 2 | 12-12x |
| Solarize | 13505 ± 442 | 1297 ± 2 | 10-11x |
| ColorJitter | 1221 ± 10 | 131 ± 2 | 9.1-9.5x |
| ColorJiggle | 1208 ± 16 | 133 ± 1 | 8.9-9.2x |
| PhotoMetricDistort | 1070 ± 19 | 129 ± 0 | 8.1-8.5x |
| Contrast | 10045 ± 119 | 1228 ± 4 | 8.1-8.3x |
| HorizontalFlip | 13200 ± 430 | 1678 ± 15 | 7.5-8.2x |
| GaussianBlur | 2462 ± 11 | 336 ± 4 | 7.2-7.4x |
| Pad | 34979 ± 3274 | 5072 ± 68 | 6.2-7.6x |
| Erasing | 27849 ± 4028 | 4175 ± 10 | 5.7-7.7x |
| Rotate | 2996 ± 12 | 451 ± 9 | 6.5-6.8x |
| Invert | 31753 ± 1327 | 5195 ± 84 | 5.8-6.5x |
| VerticalFlip | 29169 ± 2657 | 5023 ± 29 | 5.2-6.4x |
| Grayscale | 19593 ± 350 | 3409 ± 8 | 5.6-5.9x |
| Posterize | 28724 ± 3259 | 5162 ± 6 | 4.9-6.2x |
| Brightness | 9849 ± 99 | 1989 ± 9 | 4.9-5.0x |
| RandomResizedCrop | 4354 ± 22 | 939 ± 23 | 4.5-4.8x |
| Perspective | 1185 ± 9 | 268 ± 6 | 4.3-4.6x |
| Affine | 1456 ± 23 | 331 ± 7 | 4.2-4.6x |
| Sharpen | 2221 ± 35 | 525 ± 1 | 4.2-4.3x |
| AutoContrast | 1619 ± 44 | 399 ± 2 | 3.9-4.2x |
| RandomCrop128 | 93574 ± 1964 | 31208 ± 564 | 2.9-3.1x |
| CenterCrop128 | 95346 ± 1281 | 34279 ± 307 | 2.7-2.8x |
| ChannelShuffle | 8235 ± 86 | 5383 ± 12 | 1.5-1.5x |
| Normalize | 1642 ± 26 | 1091 ± 6 | 1.5-1.5x |
| JpegCompression | 1351 ± 11 | 925 ± 3 | 1.4-1.5x |
| Equalize | 1086 ± 12 | 946 ± 5 | 1.1-1.2x |
| Blur | 7544 ± 134 | — | — |
| CLAHE | 644 ± 5 | — | — |
| ChannelDropout | 11971 ± 434 | — | — |
| Colorize | 3858 ± 11 | — | — |
| CornerIllumination | 866 ± 28 | — | — |
| Dithering | 6 ± 0 | — | — |
| GaussianIllumination | 773 ± 21 | — | — |
| GaussianNoise | 328 ± 20 | — | — |
| Hue | 1908 ± 18 | — | — |
| LinearIllumination | 557 ± 18 | — | — |
| LongestMaxSize | 3847 ± 62 | — | — |
| MedianBlur | 1546 ± 16 | — | — |
| MotionBlur | 3847 ± 49 | — | — |
| OpticalDistortion | 395 ± 4 | — | — |
| PlankianJitter | 3278 ± 13 | — | — |
| PlasmaBrightness | 394 ± 9 | — | — |
| PlasmaContrast | 250 ± 6 | — | — |
| PlasmaShadow | 526 ± 8 | — | — |
| RGBShift | 5025 ± 48 | — | — |
| Rain | 2169 ± 27 | — | — |
| RandomGamma | 14482 ± 424 | — | — |
| RandomJigsaw | 9413 ± 136 | — | — |
| RandomRotate90 | 8652 ± 167 | — | — |
| SaltAndPepper | 946 ± 4 | — | — |
| Saturation | 1389 ± 27 | — | — |
| Shear | 1322 ± 7 | — | — |
| SmallestMaxSize | 2676 ± 7 | — | — |
| Snow | 754 ± 4 | — | — |
| ThinPlateSpline | 92 ± 1 | — | — |
| Transpose | 8184 ± 199 | — | — |
| UnsharpMask | 3063 ± 37 | — | — |
See the aggregate image benchmark or inspect the benchmark source code.
## Conversion Guide
In PyTorch projects, the usual migration is to keep your `Dataset` and `DataLoader`, replace `torchvision.transforms` with AlbumentationsX, then finish with `ToTensorV2`.
- Read images as NumPy arrays, usually with OpenCV plus BGR-to-RGB conversion.
- Replace transform lists with `A.Compose`.
- For classification, return `transformed['image']` after `ToTensorV2`.
- For detection or segmentation, pass `masks`, `bboxes`, labels, and keypoint params through `Compose` instead of updating them manually.
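The first step above is the one that most often trips people up: OpenCV loads images in BGR channel order, while AlbumentationsX expects RGB. A minimal sketch of the conversion (NumPy only; in real code `cv2.imread` does the loading and `cv2.cvtColor` is the equivalent conversion):

```python
import numpy as np

# cv2.imread returns images in BGR order; AlbumentationsX expects RGB uint8
# NumPy arrays. Reversing the last axis converts BGR -> RGB, equivalent to
# cv2.cvtColor(img, cv2.COLOR_BGR2RGB).
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255  # pure blue in BGR order (channel 0 is B)

rgb = bgr[..., ::-1]  # the blue value now sits in the last channel
```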
**Torchvision:**

```python
from torchvision import transforms

transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

image = transform(pil_image)
```

**AlbumentationsX:**

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
    ToTensorV2(),
])

image = transform(image=image_np)["image"]
```

## Use AlbumentationsX When
- PyTorch training pipelines where CPU augmentation speed matters.
- Detection, segmentation, keypoints, and multi-input augmentation where targets must stay aligned.
- Projects that want the same augmentation library across PyTorch, TensorFlow, Keras, and custom training loops.
## Use Torchvision When
- Simple PyTorch classification baselines that already use torchvision examples.
- Tensor-native workflows that rely on torchvision transforms, tv_tensors, or PyTorch-only deployment assumptions.