Using Additional Targets in Albumentations

The Core Idea

additional_targets lets you pass extra named inputs to a pipeline and tell Albumentations to treat each one like an existing target type. The transform parameters are sampled once, then applied consistently to the standard target and to every additional target.

Use it when your sample has separate named fields that must stay aligned:

  • stereo pairs: image and right_image
  • RGB plus depth or thermal data: image and depth
  • restoration pairs: degraded input and clean target
  • multiple named masks: mask, parts_mask, edge_mask

The important choice is the mapping. Mapping a name to image is not the same as mapping it to mask. That choice controls which transforms affect the target and which interpolation behavior is used.

Choose the Right Mechanism

Albumentations has several ways to handle more than one input. Choose the smallest mechanism that matches your data shape and semantics.

| Situation | Use | Why |
| --- | --- | --- |
| One image plus one standard target | image, mask, bboxes, keypoints | The standard target name is already enough. |
| An unnamed stack or sequence | images, masks, volumes, masks3d | Items share sampled parameters across the first dimension. |
| Separate named arrays in one sample | additional_targets | Each field keeps its own name while reusing standard target behavior. |
| Extra labels for boxes or keypoints | label_fields in bbox_params or keypoint_params | Labels stay synchronized with filtered boxes or keypoints. |
| Custom metadata or geometry | custom apply_to_<key> methods | The target cannot be treated exactly like an image, mask, bbox, or keypoint. |

For example, use images when a single field contains a stack of frames. Use additional_targets when the sample has separate named fields such as image, right_image, and depth.

Mapping Names to Target Semantics

additional_targets is a dictionary passed to A.Compose:

transform = A.Compose(
    [...],
    additional_targets={"right_image": "image", "depth": "mask"},
    seed=137,
)

The keys are the extra keyword argument names you will pass when calling the pipeline. The values are the standard target types whose behavior should be reused.

| Mapping | Geometric transforms | Pixel/color/intensity transforms | Interpolation behavior | Common use |
| --- | --- | --- | --- | --- |
| "image" | Applied | Applied | Image interpolation | Stereo image, second RGB image, image restoration input |
| "mask" | Applied | Skipped | Mask interpolation | Segmentation masks, clean restoration target, depth when you want mask-like routing |
| "bboxes" | Applied through bbox processor | Skipped | Coordinate processing | Separate named bbox set |
| "keypoints" | Applied through keypoint processor | Skipped | Coordinate processing | Separate named keypoint set |
| "volume" | Applied as volume/image-like data | Applied when compatible | Volume/image behavior | Named 3D image-like input |
| "mask3d" | Applied as 3D mask-like data | Skipped | 3D mask behavior | Named 3D label volume |

Mapping to image means color jitter, blur, noise, compression, brightness, and other image-only transforms can affect that target. Mapping to mask means those transforms skip the target, while spatial transforms still keep it aligned with the image.

This is the main safety rule: map by semantics, not by array shape. A clean image in a restoration pair may have shape (H, W, C), but if degradation transforms must not touch it, it should usually be routed as mask or handled in a split pipeline.

For continuous targets such as depth maps or heatmaps, mask-like routing is often useful because it skips color and degradation transforms. However, the default mask interpolation is nearest-neighbor, which is right for categorical masks but often wrong for continuous values. Use an explicit mask interpolation when continuous targets should be interpolated.

Examples

Stereo Pair: Named Image-Like Target

A stereo pair usually needs the same crop, resize, and flip applied to both views. If both images should also receive the same photometric transform, map the second view to image.

import albumentations as A
import numpy as np

transform = A.Compose(
    [
        A.RandomResizedCrop(size=(256, 512), scale=(0.8, 1.0), p=1.0),
        A.HorizontalFlip(p=0.5),
        A.ColorJitter(p=0.3),
    ],
    additional_targets={"right_image": "image"},
    seed=137,
)

left_image = np.random.randint(0, 256, (300, 600, 3), dtype=np.uint8)
right_image = np.random.randint(0, 256, (300, 600, 3), dtype=np.uint8)

result = transform(image=left_image, right_image=right_image)

aug_left = result["image"]
aug_right = result["right_image"]

Both outputs receive the same sampled geometry. Because right_image is mapped to image, image-only transforms such as ColorJitter also apply to it.

RGB and Depth: Named Mask-Like Target

Depth maps usually need the same geometry as the RGB image, but they should not receive RGB color transforms. Route the depth map as mask. If depth values are continuous, set mask interpolation explicitly.

import cv2
import albumentations as A
import numpy as np

transform = A.Compose(
    [
        A.Resize(height=256, width=256, p=1.0),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.3),
    ],
    additional_targets={"depth": "mask"},
    mask_interpolation=cv2.INTER_LINEAR,
    seed=137,
)

image = np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8)
depth = np.random.rand(300, 300).astype(np.float32)

result = transform(image=image, depth=depth)

aug_image = result["image"]
aug_depth = result["depth"]

The RGB image can receive brightness and contrast changes. The depth map receives the same resize and flip, but not the brightness or contrast transform.

For categorical masks, keep nearest-neighbor interpolation. For continuous targets such as depth or heatmaps, choose interpolation deliberately based on what the values mean.

Restoration Pair: Degrade Input, Preserve Clean Target

In denoising, deblurring, compression removal, and similar restoration tasks, the clean target must stay geometrically aligned with the degraded input but should not receive the degradation transforms. Map the clean target to mask when the pipeline contains image-only degradation transforms.

import albumentations as A
import numpy as np

transform = A.Compose(
    [
        A.RandomCrop(height=256, width=256, p=1.0),
        A.HorizontalFlip(p=0.5),
        A.GaussNoise(std_range=(0.05, 0.12), p=1.0),
        A.GaussianBlur(blur_limit=(3, 5), p=0.5),
    ],
    additional_targets={"clean": "mask"},
    seed=137,
)

clean_image = np.random.randint(0, 256, (320, 320, 3), dtype=np.uint8)

result = transform(image=clean_image, clean=clean_image)

degraded_input = result["image"]
clean_target = result["clean"]

The crop and flip are shared. Noise and blur affect only image, so clean_target remains clean and aligned. If your restoration pipeline includes resizing, rotation, or warping, decide whether mask interpolation is acceptable for the clean target or whether a split pipeline is clearer.

Multiple Named Masks

Use additional_targets when masks have different meanings and should keep separate names.

import albumentations as A
import numpy as np

transform = A.Compose(
    [
        A.RandomCrop(height=128, width=128, p=1.0),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.3),
    ],
    additional_targets={"parts_mask": "mask", "edge_mask": "mask"},
    seed=137,
)

image = np.random.randint(0, 256, (160, 160, 3), dtype=np.uint8)
semantic_mask = np.random.randint(0, 5, (160, 160), dtype=np.uint8)
parts_mask = np.random.randint(0, 3, (160, 160), dtype=np.uint8)
edge_mask = np.random.randint(0, 2, (160, 160), dtype=np.uint8)

result = transform(
    image=image,
    mask=semantic_mask,
    parts_mask=parts_mask,
    edge_mask=edge_mask,
)

All masks receive the same crop and flip. Only the image receives brightness and contrast changes.

Dynamic Targets with add_targets

Use add_targets when you need to add target mappings after constructing a pipeline. This is useful when a pipeline is created by shared code and target names are known later.

import albumentations as A
import numpy as np

transform = A.Compose([A.VerticalFlip(p=1.0)], seed=137)
transform.add_targets({"image2": "image", "mask2": "mask"})

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
image2 = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
mask2 = np.random.randint(0, 2, (100, 100), dtype=np.uint8)

result = transform(image=image, image2=image2, mask2=mask2)

Prefer declaring additional_targets in A.Compose when the mapping is known upfront; it keeps the pipeline configuration easier to inspect, serialize, and review.

Common Mistakes

  • Mapping a clean target to image in a degradation pipeline. Image-only transforms such as noise, blur, compression, and color jitter will affect it. Use mask routing or split the pipeline.
  • Mapping continuous targets to mask without thinking about interpolation. Nearest-neighbor interpolation preserves class IDs, but it may be wrong for depth, heatmaps, or dense regression targets.
  • Using additional_targets for an unnamed sequence. If the input is a stack of frames or masks, prefer plural targets such as images or masks.
  • Using additional_targets for custom metadata. Camera intrinsics, calibration matrices, crop provenance, or domain-specific annotations usually need custom apply_to_<key> routing.
  • Forgetting processors for coordinate targets. Mappings to bboxes and keypoints still require bbox_params and keypoint_params, including any needed label_fields.
  • Reusing a standard target name. Additional target names must not clash with reserved names such as image, mask, masks, bboxes, keypoints, volume, volumes, mask3d, or masks3d.

Where to Go Next?