Using Additional Targets in Albumentations 🔗
Motivation 🔗
Standard Albumentations pipelines seamlessly synchronize augmentations between an image and its corresponding `mask`, `bboxes`, or `keypoints`. However, sometimes you need to apply the exact same geometric transformations (and potentially other transforms) with identical random parameters to multiple image-like or mask-like arrays simultaneously.
Common use cases include:
- Stereo Images: Applying the same crop, rotation, or flip to both left and right stereo images. (Note: If you only need identical processing for multiple images, you could alternatively stack them into a single NumPy array and pass them via the plural `images` argument, treating it like a video clip. See the Video Augmentation Guide for details on that approach.)
- Multi-modal Data: Consistently transforming an image and its corresponding depth map or thermal map (treating them as image-like or mask-like depending on the desired interpolation and whether color transforms should apply).
- Multiple Masks: Augmenting an image along with several different segmentation masks (e.g., foreground mask, parts mask) using the same geometric warp.
- Image Restoration Tasks: Applying geometric transforms identically to a clean target image and a degraded input version, while applying degradation transforms (like noise or blur) only to the input.
Albumentations provides the `additional_targets` argument in `A.Compose` and the `add_targets` method specifically for these situations where multiple named arrays need synchronized augmentation parameters.
Note on Other Multi-Input Scenarios:
Albumentations has other mechanisms for different multi-input scenarios, covered in other guides:
- For sequences (like video frames or volumetric slices) where the same parameters should apply across the sequence dimension, use the plural arguments (`images`, `masks`, `volumes`, `masks3d`). See the Video and Volumetric guides.
- For associating multiple labels or attributes with individual bounding boxes or keypoints, use `bbox_params` or `keypoint_params` with `label_fields`. See the Bounding Box and Keypoint guides.
This guide focuses only on the `additional_targets` feature.
Core Mechanism: `additional_targets` in `A.Compose` 🔗
Albumentations recognizes several standard target types that have specific processing rules:
- `image`: The primary input image(s) (e.g., `[H, W, C]`). Receives geometric, color, and intensity transforms. Uses standard interpolation for geometric transforms.
- `mask`: Segmentation mask(s) (e.g., `[H, W]`). Receives geometric transforms using nearest-neighbor interpolation. Does not receive color/intensity transforms.
- `masks`: Multiple segmentation masks passed together (e.g., `[N, H, W]`). Processed like `mask`.
- `bboxes`: Bounding boxes. Processed according to `bbox_params`. Requires `bbox_params` to be set.
- `keypoints`: Keypoints. Processed according to `keypoint_params`. Requires `keypoint_params` to be set.
- `volume`: A 3D volume (e.g., `[D, H, W, C]`). Receives 3D geometric transforms, and applicable 2D transforms slice-wise. Color/intensity transforms are applied as if it were an `'image'`.
- `volumes`: Multiple 3D volumes (e.g., `[N, D, H, W, C]`). Processed like `volume` across the first dimension.
- `mask3d`: A 3D mask (e.g., `[D, H, W]`). Receives 3D geometric transforms using nearest-neighbor interpolation. Does not receive color/intensity transforms.
- `masks3d`: Multiple 3D masks (e.g., `[N, D, H, W]`). Processed like `mask3d` across the first dimension.
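For reference, a minimal call using only the built-in `image` and `mask` targets (a small sketch with dummy data; shapes are illustrative) looks like this:

```python
import albumentations as A
import numpy as np

# A standard call with built-in targets: the mask automatically receives the
# same flip as the image, using nearest-neighbor rules for mask processing.
transform = A.Compose([A.HorizontalFlip(p=1.0)])

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)

augmented = transform(image=image, mask=mask)
print(augmented['image'].shape, augmented['mask'].shape)  # (100, 100, 3) (100, 100)
```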
The `additional_targets` argument in `A.Compose` allows you to define new keyword argument names for your transform calls and map them to the processing logic of one of the standard single-instance target types (`'image'`, `'mask'`, `'volume'`, `'mask3d'`, etc.). This tells Albumentations: "Treat the data passed under this new name exactly like you would treat data passed under the standard target name specified in the mapping."
The `additional_targets` argument takes a dictionary:

- Keys: Define the new keyword argument names you'll use when calling the transform (e.g., `'right_image'`, `'depth'`, `'mask2'`). These names must not clash with the standard names listed above.
- Values: Specify the standard target type whose processing logic should be applied. You can map to `'image'`, `'mask'`, `'bboxes'`, or `'keypoints'` (and potentially volumetric types like `'volume'`, `'mask3d'`):
  - Mapping to `'image'`: Applies geometric transforms (standard interpolation) and color/intensity transforms.
  - Mapping to `'mask'`: Applies geometric transforms (nearest-neighbor interpolation) but not color/intensity transforms.
  - Mapping to `'bboxes'`: Applies geometric transforms according to the rules defined in `bbox_params`. Requires `bbox_params` to be set in `Compose`.
  - Mapping to `'keypoints'`: Applies geometric transforms according to the rules defined in `keypoint_params`. Requires `keypoint_params` to be set in `Compose`.
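Since the examples below only cover the image and mask mappings, here is a minimal sketch of the keypoints case: a second keypoint set (the name `keypoints2` is arbitrary) synchronized with the primary one. Note that `keypoint_params` must be provided for the mapping to work:

```python
import albumentations as A
import numpy as np

# 'keypoints2' is processed with the same rules and the same random
# parameters as the standard 'keypoints' target.
transform = A.Compose(
    [A.HorizontalFlip(p=1.0)],
    keypoint_params=A.KeypointParams(format='xy'),
    additional_targets={'keypoints2': 'keypoints'},
)

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
keypoints = [(10, 20), (30, 40)]
keypoints2 = [(50, 60)]

augmented = transform(image=image, keypoints=keypoints, keypoints2=keypoints2)
print(augmented['keypoints'], augmented['keypoints2'])  # both flipped horizontally
```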
Example: Augmenting Stereo Images
```python
import albumentations as A
import numpy as np

# Treat 'right_image' exactly like the primary 'image'
transform = A.Compose([
    A.RandomResizedCrop(size=(256, 512), scale=(0.8, 1.0), p=1.0),  # size=(H, W) in Albumentations 1.4+
    A.ColorJitter(p=0.5),
    A.HorizontalFlip(p=0.5),
], additional_targets={'right_image': 'image'})

# Dummy stereo data
left_image = np.random.randint(0, 256, (300, 600, 3), dtype=np.uint8)
right_image = np.random.randint(0, 256, (300, 600, 3), dtype=np.uint8)

# Apply transform, passing both images
augmented = transform(image=left_image, right_image=right_image)
aug_left = augmented['image']
aug_right = augmented['right_image']

print(f"Left shape: {aug_left.shape}, Right shape: {aug_right.shape}")
# Both images received the exact same crop parameters, color jitter values, and flip decision.
```
Example: Augmenting Image and Multiple Masks
```python
import albumentations as A
import numpy as np

# Define pipeline with two additional mask targets
transform = A.Compose([
    A.ShiftScaleRotate(p=0.5),
    A.RandomCrop(width=80, height=80, p=1.0),
    A.GaussianBlur(p=0.5)  # Will only affect 'image'
], additional_targets={'semantic_mask': 'mask', 'parts_mask': 'mask'})

# Create dummy data
image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
semantic_mask = np.random.randint(0, 5, (100, 100), dtype=np.uint8)
parts_mask = np.random.randint(0, 3, (100, 100), dtype=np.uint8)

# Apply transform passing all targets
augmented = transform(image=image,
                      semantic_mask=semantic_mask,
                      parts_mask=parts_mask)

aug_image = augmented['image']
aug_semantic_mask = augmented['semantic_mask']
aug_parts_mask = augmented['parts_mask']

print(f"Augmented Semantic Mask shape: {aug_semantic_mask.shape}")
print(f"Augmented Parts Mask shape: {aug_parts_mask.shape}")
# Image, semantic_mask, and parts_mask received identical ShiftScaleRotate/RandomCrop.
# Only image was potentially blurred by GaussianBlur.
```
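For the image-restoration use case from the motivation, one possible design (a sketch, not a built-in feature) is to chain two pipelines: a shared geometric stage using `additional_targets`, followed by a degradation-only stage applied just to the input:

```python
import albumentations as A
import numpy as np

# Stage 1: geometry shared by the degraded input and the clean target.
geometric = A.Compose(
    [A.RandomCrop(height=80, width=80, p=1.0), A.HorizontalFlip(p=0.5)],
    additional_targets={'clean_image': 'image'},
)
# Stage 2: degradations applied only to the input.
degrade = A.Compose([A.GaussNoise(p=1.0), A.GaussianBlur(p=0.5)])

clean = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

geo = geometric(image=clean.copy(), clean_image=clean)
degraded_input = degrade(image=geo['image'])['image']
clean_target = geo['clean_image']  # geometrically aligned, but not degraded
```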
Dynamic Addition: `add_targets` Method 🔗
Add targets to an existing `Compose` object dynamically.
```python
import albumentations as A
import numpy as np

transform_dynamic = A.Compose([A.VerticalFlip(p=1.0)])

# Add targets later
new_targets = {'image2': 'image', 'mask_extra': 'mask'}
transform_dynamic.add_targets(new_targets)

# Prepare data
image1 = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
image2 = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
mask_extra = np.random.randint(0, 2, (100, 100), dtype=np.uint8)

# Call with dynamically added arguments
augmented = transform_dynamic(image=image1, image2=image2, mask_extra=mask_extra)

aug_image1 = augmented['image']
aug_image2 = augmented['image2']
aug_mask_extra = augmented['mask_extra']

print("Image1, Image2, and Mask_extra all vertically flipped identically.")
```
Important Considerations 🔗
- Shape Consistency: All input arrays passed in a single call to the transform (e.g., `image`, `right_image`, `depth`, `mask2`) must have the same spatial dimensions (Height and Width for 2D; Depth, Height, and Width for 3D) before transformations are applied. Albumentations validates this by default (see the sketch after this list).
- Primary Use: Best suited for multiple image-like or mask-like arrays needing synchronized parameters.
- Params Requirement: If mapping to `'bboxes'` or `'keypoints'`, ensure the corresponding `bbox_params` or `keypoint_params` are provided to `Compose`.
- Naming: Keys in `additional_targets` must be unique and not clash with standard arguments (`image`, `mask`, `masks`, `bboxes`, `keypoints`, `volume`, `volumes`, `mask3d`, `masks3d`).
- Transform Support: Check individual transform documentation (`Targets` section) to confirm support for the target types you are using.
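As noted under Shape Consistency above, mismatched spatial dimensions are rejected at call time. A quick sketch (the exact exception type and message depend on your Albumentations version):

```python
import albumentations as A
import numpy as np

transform = A.Compose(
    [A.HorizontalFlip(p=1.0)],
    additional_targets={'image2': 'image'},
)

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
image2 = np.random.randint(0, 256, (50, 50, 3), dtype=np.uint8)  # mismatched H, W

try:
    transform(image=image, image2=image2)
except Exception as exc:
    print(f"Rejected: {exc}")
```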
When to Use Which Mechanism? 🔗
- Single image + associated mask/bboxes/keypoints: Use standard arguments (`image`, `mask`, `bboxes`, `keypoints`) with `bbox_params`/`keypoint_params` if needed.
- Sequence/Batch (video, volumetric): Use plural arguments (`images`, `masks`, `volumes`, `masks3d`) to apply identical parameters across the first dimension (see the sketch after this list).
- Multiple Attributes per BBox/Keypoint: Use `label_fields` within `bbox_params`/`keypoint_params` to handle multiple labels or flags per box/keypoint.
- Multiple Distinct Arrays (Image/Mask/BBox/Keypoint): Use `additional_targets` to synchronize parameters across separate inputs when you need them treated according to a standard target processing logic (e.g., stereo pairs, multi-modal data, multiple masks, potentially multiple sets of bboxes/keypoints needing identical geometric transforms). Remember to provide `bbox_params` or `keypoint_params` if mapping to those types.
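As a quick illustration of the plural-argument alternative mentioned in the list above, frames stacked along the first dimension and passed via `images` all receive identical parameters (a sketch; see the Video guide for details):

```python
import albumentations as A
import numpy as np

transform = A.Compose([A.HorizontalFlip(p=1.0)])

# Four frames stacked along the first dimension, treated as one sequence.
frames = np.random.randint(0, 256, (4, 100, 100, 3), dtype=np.uint8)

augmented = transform(images=frames)
print(augmented['images'].shape)  # (4, 100, 100, 3)
```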
Where to Go Next? 🔗
After learning how to use additional targets, consider these resources and next steps:
- Integrate into Task Pipelines: Apply this technique in relevant basic usage scenarios like Video Augmentation (if treating frames separately), multi-modal data, or image restoration setups.
- Review Core Concepts: Revisit Targets and Pipelines to solidify your understanding of how `A.Compose` manages different inputs.
- Explore Other Advanced Guides:
  - Serialization: Save pipelines that use additional targets.
  - Custom Transforms: Learn how to handle additional targets within your own custom transforms.
- Consult API Documentation: Refer to the Compose API documentation and the Transforms Overview list for detailed information on parameters and target support.