Albumentations to Torchvision Transform Mapping

On this page

This page maps Albumentations transforms to torchvision equivalents where torchvision has a direct benchmark implementation. The benchmark source of truth is benchmark/transforms/torchvision_impl.py, with shared canonical parameters in benchmark/transforms/specs.py.

If the torchvision column is -, the benchmark has no direct torchvision v2 implementation for that Albumentations transform. That does not always mean the operation is impossible in PyTorch; it means torchvision does not expose it as the same tested augmentation primitive in the benchmark.

Migration Rules

  • Albumentations receives NumPy arrays in H,W,C layout. Torchvision usually receives PIL images or C,H,W tensors.
  • Albumentations uses transform names plus p; torchvision often encodes randomness in class names such as RandomHorizontalFlip.
  • Albumentations can update masks, boxes, keypoints, rotated boxes, volumes, and videos through A.Compose; torchvision image transforms alone do not give you the same pipeline contract.
  • Torchvision color transforms often follow PIL semantics. Albumentations often uses OpenCV or explicit dtype-aware behavior, so tiny pixel differences are expected.

Geometric Transforms

Albumentations transformTorchvision transformNotes
ResizeResizeDirect resize benchmark mapping. Albumentations also propagates masks and other targets.
RandomCropRandomCropBenchmark uses a 224x224 random crop. Albumentations can update boxes and keypoints.
RandomResizedCropRandomResizedCropSimilar crop-and-resize policy; Albumentations adds mask interpolation and target propagation.
CenterCropCenterCropBenchmark uses a 224x224 center crop.
HorizontalFlipRandomHorizontalFlipSame image operation when p=1; Albumentations also updates supported targets.
VerticalFlipRandomVerticalFlipSame image operation when p=1; Albumentations also updates supported targets.
PadPadDirect padding benchmark mapping.
RotateRandomRotationAlbumentations exposes bbox rotation method, crop behavior, border mode, image fill, and mask fill.
AffineRandomAffineTorchvision translation is fractional; Albumentations supports percent and pixel translation plus richer axis-specific parameter formats.
PerspectiveRandomPerspectiveTorchvision uses distortion_scale; Albumentations uses a corner movement scale range and has keep_size / fit_output.
ElasticTransformElasticTransformSimilar concept; parameter meanings and defaults differ. Verify visually when migrating.
RandomRotate90-No direct torchvision v2 benchmark implementation. Use Albumentations when 90-degree rotations are part of the augmentation policy.
SquareSymmetry-No direct torchvision benchmark implementation for the full 8-way square symmetry group.
Transpose-No direct torchvision benchmark implementation.
SafeRotate-No direct torchvision benchmark implementation for rotation that keeps the whole image content.
RandomScale-No direct torchvision benchmark implementation.
ShiftScaleRotate-Approximate with affine logic if needed, but the benchmark has no direct torchvision primitive for the combined transform.
GridDistortion-No direct torchvision benchmark implementation for grid-based non-rigid distortion.
ThinPlateSpline-No direct torchvision benchmark implementation for smooth control-point deformation.
OpticalDistortion-No direct torchvision benchmark implementation for lens/fisheye distortion.
RandomGridShuffle-No direct torchvision benchmark implementation.
Morphological-No direct torchvision benchmark implementation for morphology operations.
PadIfNeeded-No direct torchvision benchmark implementation for padding to minimum size constraints.
CropAndPad-No direct torchvision benchmark implementation.
RandomSizedCrop-No direct torchvision benchmark implementation.
LongestMaxSize-No direct torchvision benchmark implementation for this named aspect-ratio primitive.
SmallestMaxSize-No direct torchvision benchmark implementation for this named aspect-ratio primitive.

Color and Pixel Transforms

Albumentations transformTorchvision transformNotes
ColorJitterColorJitterBoth adjust brightness, contrast, saturation, and hue. Color-space details differ. RGB-specific.
ChannelShuffleRandomChannelPermutationSame idea. For arbitrary-channel data, verify the operation is semantically valid for your channels.
ToGrayGrayscaleTorchvision outputs 1 or 3 channels. Albumentations has configurable output channels and multiple grayscale methods.
GaussianBlurGaussianBlurAlbumentations can sample kernel and sigma ranges.
InvertImgRandomInvertAlbumentations determines maximum value from dtype.
PosterizeRandomPosterizeAlbumentations supports per-channel and ranged bit settings.
SolarizeRandomSolarizeAlbumentations uses normalized threshold ranges and dtype-aware scaling.
SharpenRandomAdjustSharpnessSimilar goal, different parameterization and implementation methods.
AutoContrastRandomAutocontrastClose image-only equivalent.
EqualizeRandomEqualizeAlbumentations adds algorithm choice and optional masks.
NormalizeNormalizeAlbumentations accepts uint8 and handles scaling through max_pixel_value; torchvision usually works on tensors.
ErasingRandomErasingAlbumentations adds mask fill and inpainting options.
ImageCompressionJPEGAlbumentations supports JPEG and WebP compression simulation.
RandomBrightnessContrastColorJitterBenchmark maps standalone brightness and contrast specs to torchvision ColorJitter. Albumentations commonly keeps them in one transform.
RandomOrder + ColorJitter + ChannelShuffleRandomPhotometricDistortRecreate the SSD-style policy as a composition when needed.
ToRGB-No direct torchvision benchmark implementation. Use only when converting grayscale-like inputs to RGB is intentional.
GaussNoise-No direct torchvision benchmark implementation for sampled Gaussian noise.
RandomGamma-No direct torchvision benchmark implementation.
PlanckianJitter-No direct torchvision benchmark implementation for physics-based color temperature jitter.
MedianBlur-No direct torchvision benchmark implementation.
Blur-No direct torchvision benchmark implementation for box blur in this benchmark.
MotionBlur-No direct torchvision benchmark implementation.
CLAHE-No direct torchvision benchmark implementation.
HueSaturationValue-No direct torchvision benchmark implementation for this HSV-style transform.
ChannelDropout-No direct torchvision benchmark implementation.
Downscale-No direct torchvision benchmark implementation for downscale-upscale degradation.
PixelSpread-No direct torchvision benchmark implementation.
Emboss-No direct torchvision benchmark implementation.
ChromaticAberration-No direct torchvision benchmark implementation.
ISONoise-No direct torchvision benchmark implementation.
ShotNoise-No direct torchvision benchmark implementation.
MultiplicativeNoise-No direct torchvision benchmark implementation.
AdditiveNoise-No direct torchvision benchmark implementation.
RandomToneCurve-No direct torchvision benchmark implementation.
RingingOvershoot-No direct torchvision benchmark implementation.
Spatter-No direct torchvision benchmark implementation.
UnsharpMask-No direct torchvision benchmark implementation.
FancyPCA-No direct torchvision benchmark implementation.
Superpixels-No direct torchvision benchmark implementation.
ToSepia-No direct torchvision benchmark implementation.

Weather, Illumination, and Dropout

Albumentations transformTorchvision transformNotes
RandomFog-No direct torchvision benchmark implementation.
RandomRain-No direct torchvision benchmark implementation.
RandomSnow-No direct torchvision benchmark implementation.
RandomShadow-No direct torchvision benchmark implementation.
RandomSunFlare-No direct torchvision benchmark implementation.
Illumination-No direct torchvision benchmark implementation for linear, corner, or Gaussian illumination modes.
PlasmaBrightnessContrast-No direct torchvision benchmark implementation.
PlasmaShadow-No direct torchvision benchmark implementation.
RandomGravel-No direct torchvision benchmark implementation.
AtmosphericFog-No direct torchvision benchmark implementation.
Vignetting-No direct torchvision benchmark implementation.
Dithering-No direct torchvision benchmark implementation.
FilmGrain-No direct torchvision benchmark implementation.
Halftone-No direct torchvision benchmark implementation.
LensFlare-No direct torchvision benchmark implementation.
GridMask-No direct torchvision benchmark implementation.
GridDropout-No direct torchvision benchmark implementation.
PixelDropout-No direct torchvision benchmark implementation.
CoarseDropout-No direct torchvision benchmark implementation. Torchvision has RandomErasing, but not the same sampled multi-hole dropout primitive.
ConstrainedCoarseDropout-No direct torchvision benchmark implementation for object-constrained dropout.
CopyAndPaste-No direct torchvision benchmark implementation for annotation-aware copy-paste.
WaterRefraction-No direct torchvision benchmark implementation.

Additional Albumentations-Only Coverage

These transforms exist in Albumentations but have no direct torchvision benchmark implementation. This is the practical catalog gap: staying on torchvision means skipping these policies or implementing them yourself.

Albumentations transformTorchvision transformNotes
AtLeastOneBBoxRandomCrop-Bbox-aware crop that preserves at least one object.
BBoxSafeRandomCrop-Bbox-safe crop policy for detection datasets.
RandomSizedBBoxSafeCrop-Sampled crop-and-resize with bbox safety rules.
CropNonEmptyMaskIfExists-Segmentation crop that prefers non-empty mask regions.
RandomCropNearBBox-Crop sampled near a selected bounding box.
RandomCropFromBorders-Random border crop policy.
LetterBox-Resize-and-pad policy common in detection pipelines.
MaskDropout-Drop objects using mask regions and keep targets aligned.
Mosaic-Multi-image detection augmentation with target handling.
AdvancedBlur-Sampled blur kernel beyond standard Gaussian blur.
Defocus-Camera defocus simulation.
GlassBlur-Glass-like local distortion blur.
ZoomBlur-Zoom-motion blur simulation.
RGBShift-RGB channel shift augmentation.
SaltAndPepper-Salt-and-pepper corruption.
PhotoMetricDistort-Photometric distortion policy commonly used in detection.
FDA-Fourier domain adaptation.
HistogramMatching-Match image histogram to reference images.
PixelDistributionAdaptation-Pixel distribution adaptation to reference images.
HEStain-H&E stain augmentation for histopathology.
FrequencyMasking-Frequency-domain masking for spectrogram-like inputs.
TimeMasking-Time-axis masking for spectrogram-like inputs.
TimeReverse-Reverse temporal axis.
CenterCrop3D-3D center crop for volumetric data.
RandomCrop3D-3D random crop for volumetric data.
CoarseDropout3D-3D coarse dropout for volumes.
CubicSymmetry-Cubic symmetry transform for 3D data.
GridShuffle3D-Grid shuffle augmentation for 3D data.
Pad3D-3D padding.
PadIfNeeded3D-3D minimum-size padding.

Channel Notes

Albumentations is a safer default for non-RGB image tensors when the transform is channel-agnostic: resize, crop, flip, affine, perspective, blur, noise, dropout, and normalization can preserve the channel dimension. Be careful with RGB/color-space transforms such as ColorJitter, HueSaturationValue, ToGray, ToRGB, RandomRain, and RandomSnow.

Do not map a 9-channel multispectral tensor through an RGB transform unless that conversion is part of your data design.

Migration Example

import albumentations as A

pipeline = A.Compose([
    A.RandomResizedCrop(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 4 / 3), p=1.0),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(brightness_range=(0.5, 1.5), contrast_range=(0, 2.5), saturation_range=(0, 2.5), hue_range=(-0.5, 0.5), p=0.5),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

For detection:

pipeline = A.Compose(
    [
        A.RandomResizedCrop(size=(512, 512), scale=(0.8, 1.0), p=1.0),
        A.HorizontalFlip(p=0.5),
    ],
    bbox_params=A.BboxParams(coord_format="coco", label_fields=["labels"]),
)

Benchmark Pages