Albumentations to Torchvision Transform Mapping

This page maps each Albumentations transform to the closest Torchvision v2 operation, where Torchvision has a practical direct equivalent. Use it as a migration and capability guide, not as a performance comparison.

How to Read This Page

  • A named Torchvision transform means Torchvision v2 has a practical direct operation for the same idea.
  • A dash (-) means Torchvision does not support that transform as a built-in augmentation primitive.
  • (partial) means Torchvision covers part of the behavior, but not the full Albumentations transform contract.
  • The table shows built-in Torchvision operations. Custom PyTorch implementations are not counted as Torchvision support.

Migration Rules

  • Albumentations receives NumPy arrays in OpenCV-style H,W,C channel-last layout. Torchvision usually receives PIL images or C,H,W tensors.
  • Albumentations uses transform names plus p; Torchvision often encodes randomness in class names such as RandomHorizontalFlip.
  • Albumentations can update masks, boxes, keypoints, oriented bounding boxes (OBB), volumes, and videos through A.Compose. Torchvision v2 can update supported TVTensor targets, but its policy model and serialization/debug contract are different.
  • Torchvision color transforms often follow PIL semantics. Albumentations often uses OpenCV or explicit dtype-aware behavior, so small pixel differences are expected.
  • Albumentations Mosaic and CopyAndPaste are per-sample multi-image policies with target handling. They run at a different stage than batch-level MixUp or CutMix, which are applied after collation, so the two groups are not interchangeable.
  • MixUp, CutMix, GPU-side normalization, and other batch-level tensor policies are a good reason to combine Albumentations with Torchvision/PyTorch tensor code, not a reason to give up Albumentations for the rest of the pipeline.
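The layout and batch-level points above can be sketched with NumPy alone. This is a minimal illustration, not either library's implementation: `hwc_to_chw` and `mixup_batch` are hypothetical helpers, the random arrays stand in for per-sample augmentation output, and the fixed `lam=0.7` replaces the random mixing weight a real MixUp policy would sample.

```python
import numpy as np


def hwc_to_chw(image: np.ndarray) -> np.ndarray:
    """Convert one H,W,C uint8 image to a C,H,W float32 array scaled to [0, 1]."""
    chw = np.ascontiguousarray(image.transpose(2, 0, 1))
    return chw.astype(np.float32) / 255.0


def mixup_batch(images: np.ndarray, labels: np.ndarray, lam: float):
    """Blend every sample with its neighbour in the batch (hypothetical helper)."""
    perm = np.roll(np.arange(len(images)), 1)  # pair sample i with sample i - 1
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels


# Stand-ins for per-sample augmentation output: channel-last uint8 arrays.
rng = np.random.default_rng(0)
samples = [rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8) for _ in range(4)]

# Collation: convert each sample, then stack into an N,C,H,W batch.
batch = np.stack([hwc_to_chw(sample) for sample in samples])
labels = np.eye(4, dtype=np.float32)  # one-hot labels, one class per sample

# Batch-level policy: runs after collation, on the whole batch at once.
mixed, mixed_labels = mixup_batch(batch, labels, lam=0.7)
```

The split mirrors the rule above: per-sample work happens on channel-last arrays before collation, and batch-level mixing happens on the stacked channel-first batch afterwards.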

Geometry and Size

Color, Intensity, and Tensor Operations

Weather, Illumination, and Dropout

Target-Aware and Multi-Image Policies

Volume and Domain-Specific Transforms

Migration Example

Torchvision v2 classification pipelines map cleanly when the policy is image-only:

import torch
import torchvision.transforms.v2 as T

pipeline = T.Compose(
    [
        T.RandomResizedCrop(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 4 / 3)),
        T.RandomHorizontalFlip(p=0.5),
        T.ColorJitter(brightness=0.5, contrast=1.5, saturation=1.5, hue=0.5),
        T.ToImage(),
        T.ToDtype(torch.float32, scale=True),
        T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ],
)

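The equivalent Albumentations policy keeps the same structure. It returns an H,W,C NumPy array, so add a tensor conversion step if the model expects C,H,W tensors: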

import albumentations as A

pipeline = A.Compose(
    [
        A.RandomResizedCrop(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 4 / 3), p=1.0),
        A.HorizontalFlip(p=0.5),
        A.ColorJitter(
            brightness_range=(0.5, 1.5),
            contrast_range=(0, 2.5),
            saturation_range=(0, 2.5),
            hue_range=(-0.5, 0.5),
            p=1.0,  # Torchvision's ColorJitter has no p and always applies
        ),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ],
)
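The scalar arguments in the Torchvision ColorJitter call and the explicit ranges in the Albumentations call describe the same sampling intervals. The sketch below assumes Torchvision's documented rule that a scalar brightness, contrast, or saturation value v becomes the interval [max(0, 1 - v), 1 + v], and a scalar hue value h (with 0 <= h <= 0.5) becomes [-h, h]. The helper names are hypothetical and belong to neither library.

```python
def jitter_range(value: float) -> tuple[float, float]:
    """Expand a Torchvision-style scalar into an explicit (low, high) factor range."""
    if value < 0:
        raise ValueError("jitter value must be non-negative")
    return (max(0.0, 1.0 - value), 1.0 + value)


def hue_range(value: float) -> tuple[float, float]:
    """Expand a Torchvision-style scalar hue into a symmetric (low, high) shift range."""
    if not 0 <= value <= 0.5:
        raise ValueError("hue value must be in [0, 0.5]")
    return (-value, value)


# Scalars taken from the Torchvision pipeline above.
brightness = jitter_range(0.5)  # (0.5, 1.5)
contrast = jitter_range(1.5)    # (0.0, 2.5)
saturation = jitter_range(1.5)  # (0.0, 2.5)
hue = hue_range(0.5)            # (-0.5, 0.5)
```

Writing the ranges out explicitly makes it easier to check that both pipelines sample from the same intervals, especially where the lower bound is clamped at zero.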

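For detection, the Torchvision version relies on the inputs being wrapped as TVTensors (for example, tv_tensors.BoundingBoxes), so the pipeline itself carries no target configuration: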

import torchvision.transforms.v2 as T

pipeline = T.Compose(
    [
        T.RandomResizedCrop(size=(512, 512), scale=(0.8, 1.0)),
        T.RandomHorizontalFlip(p=0.5),
    ],
)

The Albumentations version declares the target contract in A.Compose through bbox_params:

import albumentations as A

pipeline = A.Compose(
    [
        A.RandomResizedCrop(size=(512, 512), scale=(0.8, 1.0), p=1.0),
        A.HorizontalFlip(p=0.5),
    ],
    bbox_params=A.BboxParams(coord_format="coco", label_fields=["labels"]),
)
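To make the target contract concrete, here is a minimal plain-Python sketch of what a target-aware horizontal flip has to do to a COCO-style box, given as (x_min, y_min, width, height) in pixels. `hflip_coco_box` is a hypothetical helper for illustration only. It is not how either library implements the operation, and it ignores clipping and filtering of boxes that leave the image.

```python
def hflip_coco_box(box, image_width):
    """Mirror one COCO-style (x_min, y_min, width, height) box around the vertical axis."""
    x_min, y_min, width, height = box
    # The old right edge (x_min + width) becomes the new left edge after mirroring.
    return (image_width - x_min - width, y_min, width, height)


box = (10, 20, 30, 40)  # spans x in [10, 40] on a 100-pixel-wide image
flipped = hflip_coco_box(box, image_width=100)
restored = hflip_coco_box(flipped, image_width=100)
```

Flipping twice returns the original box, which is a quick sanity check when comparing box handling between the two libraries.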