Transforms: The Building Blocks of Augmentation
The Core Idea
A transform is one augmentation operation. It receives one or more named targets, such as image, mask,
bboxes, or keypoints, applies its operation to the targets it supports, and returns a dictionary with the
transformed targets.
Examples of transforms include HorizontalFlip, RandomBrightnessContrast, RandomCrop, and GaussianBlur.
The important contract is:
named targets in -> transform samples parameters -> named targets out
For most real pipelines, you combine transforms with A.Compose. This page focuses on individual transform
mechanics; the Pipelines guide covers composition.
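The contract can be made concrete with a minimal pure-NumPy sketch of a transform. `ToyRandomCrop` is a hypothetical class written only for illustration and is not Albumentations' implementation. It checks `p`, samples one crop window per call, applies that window to every target it knows about, and returns a new dictionary.

```python
import numpy as np


class ToyRandomCrop:
    """Hypothetical, simplified model of a transform (not Albumentations' code)."""

    def __init__(self, height, width, p=1.0, seed=None):
        self.height = height
        self.width = width
        self.p = p
        self.rng = np.random.default_rng(seed)

    def __call__(self, **targets):
        # One probability check per call decides whether anything happens.
        if self.rng.random() >= self.p:
            return dict(targets)
        # Sample the parameters once per call, based on the image size.
        rows, cols = targets["image"].shape[:2]
        y = int(self.rng.integers(0, rows - self.height + 1))
        x = int(self.rng.integers(0, cols - self.width + 1))
        out = dict(targets)
        # Apply the same sampled window to every target this toy supports.
        for name in ("image", "mask"):
            if name in targets:
                out[name] = targets[name][y : y + self.height, x : x + self.width]
        return out


rng = np.random.default_rng(137)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
mask = image[..., 0].copy()  # mask mirrors channel 0 so alignment is easy to check

result = ToyRandomCrop(32, 32, seed=0)(image=image, mask=mask)
untouched = ToyRandomCrop(32, 32, p=0.0)(image=image)
```

With `p=0.0` the toy returns its inputs unchanged; with `p=1.0` both targets receive the same crop window.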
Calling a Transform
Instantiate a transform with its configuration, then call it with keyword arguments. The most common target is
image.
```python
import albumentations as A
import numpy as np

rng = np.random.default_rng(137)
image = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)

transform = A.HorizontalFlip(p=1.0)
result = transform(image=image)
flipped_image = result["image"]
```
Always pass targets by keyword. Do not call a transform as transform(image); Albumentations expects named
targets so it knows which processing rules to use.
The Return Value Is a Dictionary
Transforms return dictionaries, even when you pass only one image.
```python
result = transform(image=image)
image_after = result["image"]
```
When you pass multiple targets, the same dictionary contains each transformed target.
```python
import albumentations as A
import numpy as np

rng = np.random.default_rng(137)
image = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)
mask = rng.integers(0, 4, size=(100, 100), dtype=np.uint8)

transform = A.HorizontalFlip(p=1.0)
result = transform(image=image, mask=mask)
flipped_image = result["image"]
flipped_mask = result["mask"]
```
This dictionary contract is the same for single transforms and for A.Compose pipelines.
Probability: p
Every transform has a probability parameter p. It controls whether the transform runs each time it is
called.
| Value | Meaning |
|---|---|
| p=1.0 | Always apply the transform. |
| p=0.0 | Never apply the transform. |
| p=0.5 | Apply the transform on about half of calls. |
```python
import albumentations as A

always_flip = A.HorizontalFlip(p=1.0)
sometimes_flip = A.HorizontalFlip(p=0.5)
rare_blur = A.GaussianBlur(p=0.1)
```
The probability check happens independently each time the transform is called. For nested probabilities and pipeline-level probability, see Setting Probabilities.
Parameter Values and Ranges
Many transforms accept either fixed values or ranges. When a transform is applied, Albumentations samples concrete parameter values from those ranges for that call.
```python
import albumentations as A

brightness_contrast = A.RandomBrightnessContrast(
    brightness_limit=(-0.2, 0.3),
    contrast_limit=(-0.1, 0.1),
    p=1.0,
)
```
Here, the transform always runs because p=1.0, but the brightness and contrast values vary between calls.
Some parameters can be fixed by giving a range with identical endpoints:
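```python
fixed_blur = A.GaussianBlur(blur_limit=(3, 3), p=1.0)   # kernel size always 3
random_blur = A.GaussianBlur(blur_limit=(3, 7), p=1.0)  # kernel size sampled per call
```
The first blur always uses a 3x3 kernel. The second samples an odd kernel size from 3 to 7 each time it runs. In both cases the Gaussian sigma is sampled separately from sigma_limit unless that range is fixed as well.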
For size-dependent transforms, prefer parameters that scale with the image when the API supports them. For
example, CoarseDropout accepts fractional
hole_height_range and hole_width_range values in (0, 1], so dropout holes scale with image size.
Transform Families
Transforms are easiest to understand by the kind of change they make and which targets they can safely affect.
| Family | What changes | Typical targets affected | Examples |
|---|---|---|---|
| Pixel transforms | Pixel values, not geometry | Image-like targets | RandomBrightnessContrast, ColorJitter, GaussianBlur, GaussNoise |
| Spatial transforms | Geometry, position, size, orientation | Images, masks, boxes, keypoints, supported volumes | HorizontalFlip, RandomCrop, Resize, Affine |
| 3D transforms | Volumetric geometry or volumetric regions | volume, volumes, mask3d, masks3d | RandomCrop3D, Pad3D, CoarseDropout3D |
| Mixed transforms | More than one kind of effect | Depends on the transform | RandomResizedCrop, ShiftScaleRotate |
Pixel transforms usually leave masks, bounding boxes, and keypoints unchanged because they do not move pixels through space. Spatial transforms must update every supported spatial target so all annotations still match the transformed image.
3D behavior depends on the transform. Dedicated 3D transforms operate on volumetric targets. Some 2D transforms can be applied to volumes slice-wise with shared parameters; see Volumetric Augmentation for the full workflow.
Mixed transforms combine effects. For example, RandomResizedCrop changes geometry by cropping and resizing, and the resize step also resamples pixel values through interpolation.
Targets and Compatibility
Not every transform supports every target type. Support means the transform has a defined, target-specific contract. A spatial transform that supports masks must preserve mask semantics. A transform that supports bounding boxes must update coordinates correctly. If a transform cannot update a target safely, that combination should be rejected with an error rather than allowed to silently corrupt the annotations.
```python
import albumentations as A
import numpy as np

rng = np.random.default_rng(137)
image = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
mask = rng.integers(0, 3, size=(128, 128), dtype=np.uint8)

transform = A.RandomCrop(height=96, width=96, p=1.0)
result = transform(image=image, mask=mask)
cropped_image = result["image"]
cropped_mask = result["mask"]
```
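In this example, the same crop window is applied to the image and the mask, but each target follows its own processing rules: the image is treated as pixel data and the mask as label data. The distinction matters most for transforms that interpolate, such as Resize or Affine, where masks are resampled with nearest-neighbor interpolation by default so class labels are not blended.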
For the conceptual model, read what "supported" means for targets. For the exact support matrix, use the Supported Targets by Transform reference.
From One Transform to a Pipeline
A single transform is useful for demos and simple deterministic operations. Most training workflows use
A.Compose to run several transforms in sequence and configure target processors, seeds, additional targets,
and debugging options.
```python
import albumentations as A

pipeline = A.Compose(
    [
        A.RandomCrop(height=224, width=224, p=1.0),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.2),
    ],
    seed=137,
)
```
Use this page to understand what each transform contributes. Use the Pipelines guide to learn
how A.Compose, OneOf, SomeOf, RandomOrder, and other composition tools control the whole policy.
Pipeline design itself is a separate question:
- Use Choosing Augmentations for Model Generalization to decide which transforms belong in a training policy.
- Use Performance Tuning for crop-first ordering, uint8 handling, OpenCV threading, and other speed concerns.
- Use Targets when your pipeline includes masks, bounding boxes, keypoints, volumes, or additional targets.
Where to Go Next?
- Pipelines: Combine transforms with A.Compose and composition utilities.
- Probabilities: Understand transform and pipeline probability behavior.
- Targets: Learn how transforms interact with images, masks, boxes, keypoints, and volumes.
- Choosing Augmentations: Build an effective augmentation policy for a task.
- Performance Tuning: Make augmentation pipelines fast enough for training.
- Explore Transforms Visually: Inspect individual transforms and their parameters.