Transforms: The Building Blocks of Augmentation

The Core Idea

A transform is one augmentation operation. It receives one or more named targets, such as image, mask, bboxes, or keypoints, applies its operation to the targets it supports, and returns a dictionary with the transformed targets.

Examples of transforms include HorizontalFlip, RandomBrightnessContrast, RandomCrop, and GaussianBlur.

The important contract is:

named targets in -> transform samples parameters -> named targets out

For most real pipelines, you combine transforms with A.Compose. This page focuses on individual transform mechanics; the Pipelines guide covers composition.
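As a rough illustration of this contract, the toy class below mimics the shape of a transform call. `ToyHorizontalFlip` is a hypothetical stand-in written for this page, not Albumentations code; it only shows named targets going in, one parameter-sampling step, and a dictionary coming out.

```python
import numpy as np


class ToyHorizontalFlip:
    """Hypothetical stand-in for a transform; not the Albumentations implementation."""

    def __init__(self, p=0.5, seed=None):
        self.p = p
        self.rng = np.random.default_rng(seed)

    def __call__(self, **targets):
        # Sample parameters once per call (here, only "apply or skip").
        apply = self.rng.random() < self.p
        # Use the same decision for every named target and return a dictionary.
        return {
            name: (array[:, ::-1] if apply else array)
            for name, array in targets.items()
        }


image = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
mask = np.array([[0, 1], [2, 3]], dtype=np.uint8)

result = ToyHorizontalFlip(p=1.0, seed=137)(image=image, mask=mask)
print(sorted(result))   # the keys mirror the names passed in
print(result["mask"])   # columns are mirrored
```

Because the toy accepts only `**targets`, a positional call fails immediately, which is the same habit the real library asks for.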

Calling a Transform

Instantiate a transform with its configuration, then call it with keyword arguments. The most common target is image.

import albumentations as A
import numpy as np

rng = np.random.default_rng(137)
image = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)

transform = A.HorizontalFlip(p=1.0)

result = transform(image=image)
flipped_image = result["image"]

Always pass targets by keyword. Do not call a transform as transform(image); Albumentations expects named targets so it knows which processing rules to use.
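One way to picture why names matter is a lookup from target name to processing rule. The sketch below is a hypothetical illustration, not how Albumentations is implemented: `RULES`, `flip_array`, `flip_boxes`, and `toy_flip` are invented names, and boxes are assumed to be pixel-coordinate (x_min, y_min, x_max, y_max) tuples.

```python
import numpy as np


def flip_array(array, width):
    # Arrays are mirrored along the width axis; `width` is unused here
    # but kept so every rule shares one signature.
    return array[:, ::-1]


def flip_boxes(boxes, width):
    # Box x-coordinates are mirrored around the image width.
    return [
        (width - x_max, y_min, width - x_min, y_max)
        for x_min, y_min, x_max, y_max in boxes
    ]


# Hypothetical name-to-rule table: the target name selects the rule.
RULES = {"image": flip_array, "mask": flip_array, "bboxes": flip_boxes}


def toy_flip(**targets):
    unknown = set(targets) - set(RULES)
    if unknown:
        # Refuse targets that have no defined rule instead of guessing.
        raise KeyError(f"No processing rule for targets: {sorted(unknown)}")
    width = targets["image"].shape[1]
    return {name: RULES[name](value, width) for name, value in targets.items()}


image = np.zeros((4, 10, 3), dtype=np.uint8)
result = toy_flip(image=image, bboxes=[(1, 0, 4, 2)])
print(result["bboxes"])  # [(6, 0, 9, 2)]
```

A bare positional argument carries no name, so a table like this would have nothing to look up.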

The Return Value Is a Dictionary

Transforms return dictionaries, even when you pass only one image.

result = transform(image=image)
image_after = result["image"]

When you pass multiple targets, the same dictionary contains each transformed target.

rng = np.random.default_rng(137)
image = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)
mask = rng.integers(0, 4, size=(100, 100), dtype=np.uint8)

transform = A.HorizontalFlip(p=1.0)

result = transform(image=image, mask=mask)
flipped_image = result["image"]
flipped_mask = result["mask"]

This dictionary contract is the same for single transforms and for A.Compose pipelines.

Probability: p

Every transform has a probability parameter p. It sets the chance that the transform is applied on any given call.

| Value | Meaning |
|-------|---------|
| p=1.0 | Always apply the transform. |
| p=0.0 | Never apply the transform. |
| p=0.5 | Apply the transform on about half of calls. |

always_flip = A.HorizontalFlip(p=1.0)
sometimes_flip = A.HorizontalFlip(p=0.5)
rare_blur = A.GaussianBlur(p=0.1)

The probability check happens independently each time the transform is called. For nested probabilities and pipeline-level probability, see Setting Probabilities.
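The independent check can be simulated with the standard library alone. This is a sketch of the idea under the assumption that each call draws one uniform number and compares it with p; `count_applications` is a hypothetical helper, not part of Albumentations.

```python
import random


def count_applications(p, calls, seed):
    """Count how many of `calls` independent checks would apply the transform."""
    rng = random.Random(seed)
    # rng.random() lies in [0.0, 1.0), so p=1.0 always passes and p=0.0 never does.
    return sum(rng.random() < p for _ in range(calls))


calls = 10_000
always = count_applications(1.0, calls, seed=137)
never = count_applications(0.0, calls, seed=137)
half = count_applications(0.5, calls, seed=137)

print(always, never, half / calls)
```

With p=0.5 the applied fraction lands near one half, but any individual call is still either fully applied or fully skipped.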

Parameter Values and Ranges

Many transforms accept either fixed values or ranges. When a transform is applied, Albumentations samples concrete parameter values from those ranges for that call.

brightness = A.RandomBrightnessContrast(
    brightness_range=(-0.2, 0.3),
    contrast_range=(-0.1, 0.1),
    p=1.0,
)

Here, the transform always runs because p=1.0, but the brightness and contrast values vary between calls.

Some parameters can be fixed by giving a range with identical endpoints:

fixed_blur = A.GaussianBlur(blur_range=(3, 3), p=1.0)
random_blur = A.GaussianBlur(blur_range=(3, 7), p=1.0)

The first blur always uses the same value, 3, because both endpoints of its range are equal. The second samples a new value from the configured range each time it runs.
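The effect of a range versus identical endpoints can be sketched with plain uniform sampling. How each transform actually samples (for example, whether it rounds to integers) is not specified here, so `sample_from_range` is a hypothetical simplification.

```python
import random


def sample_from_range(value_range, rng):
    """Draw one concrete value from a (low, high) range."""
    low, high = value_range
    return rng.uniform(low, high)


rng = random.Random(137)

# A real range yields a different value on each call.
varying = [sample_from_range((-0.2, 0.3), rng) for _ in range(5)]

# Identical endpoints leave nothing to sample, so the value is fixed.
fixed = [sample_from_range((3, 3), rng) for _ in range(5)]

print(varying)
print(fixed)
```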

For size-dependent transforms, prefer parameters that scale with the image when the API supports them. For example, CoarseDropout accepts fractional hole_height_range and hole_width_range values in (0, 1], so dropout holes scale with image size.
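The arithmetic behind fractional sizes is simple to sketch. The helper below is hypothetical, and the rounding rule is an assumption made for illustration; it only shows why one fractional range produces proportionally larger holes on larger images.

```python
def hole_size_in_pixels(fraction_range, image_side):
    """Convert a fractional (low, high) range into pixel bounds for one image side."""
    low, high = fraction_range
    # Assumed rounding: nearest integer pixel count.
    return (round(low * image_side), round(high * image_side))


small = hole_size_in_pixels((0.1, 0.2), image_side=100)
large = hole_size_in_pixels((0.1, 0.2), image_side=400)

print(small)  # (10, 20)
print(large)  # (40, 80)
```

A fixed pixel range would cover a fifth of the small image but only a twentieth of the large one, which is why the fractional form is preferable when image sizes vary.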

Transform Families

Transforms are easiest to understand by the kind of change they make and which targets they can safely affect.

| Family | What changes | Typical targets affected | Examples |
|--------|--------------|--------------------------|----------|
| Pixel transforms | Pixel values, not geometry | Image-like targets | RandomBrightnessContrast, ColorJitter, GaussianBlur, GaussNoise |
| Spatial transforms | Geometry, position, size, orientation | Images, masks, boxes, keypoints, supported volumes | HorizontalFlip, RandomCrop, Resize, Affine |
| 3D transforms | Volumetric geometry or volumetric regions | volume, volumes, mask3d, masks3d | RandomCrop3D, Pad3D, CoarseDropout3D |
| Mixed transforms | More than one kind of effect | Depends on the transform | RandomResizedCrop, ShiftScaleRotate |

Pixel transforms usually leave masks, bounding boxes, and keypoints unchanged because they do not move pixels through space. Spatial transforms must update every supported spatial target so all annotations still match the transformed image.
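The difference between the two families can be seen on a tiny array. Both functions below are hypothetical NumPy stand-ins, not Albumentations transforms: one changes pixel values and passes the mask through, the other mirrors image and mask together.

```python
import numpy as np

image = np.array([[10, 20, 30], [40, 50, 60]], dtype=np.uint8)
mask = np.array([[0, 1, 2], [0, 1, 2]], dtype=np.uint8)


def toy_brighten(image, mask, delta=25):
    # Pixel transform: values change, positions do not, so the mask is untouched.
    brighter = np.clip(image.astype(np.int16) + delta, 0, 255).astype(np.uint8)
    return {"image": brighter, "mask": mask}


def toy_horizontal_flip(image, mask):
    # Spatial transform: every spatial target moves the same way.
    return {"image": image[:, ::-1], "mask": mask[:, ::-1]}


bright = toy_brighten(image=image, mask=mask)
flipped = toy_horizontal_flip(image=image, mask=mask)

print(bright["mask"])   # same labels in the same places
print(flipped["mask"])  # labels mirrored to follow the image
```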

3D behavior depends on the transform. Dedicated 3D transforms operate on volumetric targets. Some 2D transforms can be applied to volumes slice-wise with shared parameters; see Volumetric Augmentation for the full workflow.

Mixed transforms combine effects. For example, RandomResizedCrop changes geometry by cropping and resizing, and the resize step interpolates, so pixel values are resampled as well.

Targets and Compatibility

Not every transform supports every target type. Support means the transform has a defined, target-specific contract. A spatial transform that supports masks must preserve mask semantics. A transform that supports bounding boxes must update coordinates correctly. A transform that does not know how to update a target safely should be rejected instead of silently corrupting supervision.

import albumentations as A
import numpy as np

rng = np.random.default_rng(137)
image = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
mask = rng.integers(0, 3, size=(128, 128), dtype=np.uint8)

transform = A.RandomCrop(height=96, width=96, p=1.0)

result = transform(image=image, mask=mask)
cropped_image = result["image"]
cropped_mask = result["mask"]

In this example, the same crop is applied to image and mask, but each target uses its own processing rules. The image is cropped as image data; the mask is cropped as label data.
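The phrase "the same crop" means the crop window is sampled once and reused for every target. A minimal NumPy sketch of that idea follows; `toy_random_crop` is a hypothetical helper, and the mask is derived from the image only so that alignment can be checked afterwards.

```python
import numpy as np


def toy_random_crop(height, width, rng, **targets):
    """Sample one crop window and apply it to every named target."""
    full_h, full_w = targets["image"].shape[:2]
    top = int(rng.integers(0, full_h - height + 1))
    left = int(rng.integers(0, full_w - width + 1))
    return {
        name: array[top:top + height, left:left + width]
        for name, array in targets.items()
    }


rng = np.random.default_rng(137)
image = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
mask = (image[..., 0] > 127).astype(np.uint8)  # label derived from the red channel

result = toy_random_crop(96, 96, rng, image=image, mask=mask)

# If both targets used the same window, the label rule still holds after cropping.
aligned = np.array_equal(
    result["mask"], (result["image"][..., 0] > 127).astype(np.uint8)
)
print(result["image"].shape, result["mask"].shape, aligned)
```

Sampling a separate window per target would keep the shapes equal but break the pixel-to-label correspondence, which is the silent corruption this section warns about.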

For the conceptual model, read what "supported" means for targets. For the exact support matrix, use the Supported Targets by Transform reference.

From One Transform to a Pipeline

A single transform is useful for demos and simple deterministic operations. Most training workflows use A.Compose to run several transforms in sequence and configure target processors, seeds, additional targets, and debugging options.

import albumentations as A

pipeline = A.Compose(
    [
        A.RandomCrop(height=224, width=224, p=1.0),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.2),
    ],
    seed=137,
)

Use this page to understand what each transform contributes. Use the Pipelines guide to learn how A.Compose, OneOf, SomeOf, RandomOrder, and other composition tools control the whole policy.
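Because every transform takes named targets and returns a dictionary, running several in sequence reduces to feeding each output into the next call. The sketch below shows only that chaining idea with plain Python lists; `toy_compose`, `crop_top_left`, and `flip` are hypothetical names, and A.Compose does considerably more than this.

```python
def toy_compose(steps):
    """Chain dict-in, dict-out steps; a bare-bones stand-in for a pipeline."""
    def run(**targets):
        for step in steps:
            targets = step(**targets)
        return targets
    return run


def crop_top_left(size):
    def step(**targets):
        # Keep the top-left size x size block of every target.
        return {
            name: [row[:size] for row in grid[:size]]
            for name, grid in targets.items()
        }
    return step


def flip(**targets):
    # Mirror each row of every target.
    return {name: [row[::-1] for row in grid] for name, grid in targets.items()}


pipeline = toy_compose([crop_top_left(2), flip])
result = pipeline(image=[[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result["image"])  # [[2, 1], [5, 4]]
```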

Pipeline design itself is a separate question:

Where to Go Next?