Using Additional Targets in Albumentations 🔗
Motivation 🔗
Standard Albumentations pipelines seamlessly synchronize augmentations between an image and its corresponding mask, bboxes, or keypoints. However, sometimes you need to apply the exact same geometric transformations (and potentially other transforms) with identical random parameters to multiple image-like or mask-like arrays simultaneously.
Common use cases include:
- Stereo Images: Applying the same crop, rotation, or flip to both left and right stereo images. (Note: if you only need identical processing for multiple images, you could alternatively stack them into a single NumPy array and pass them via the plural `images` argument, treating it like a video clip. See the Video Augmentation Guide for details on that approach.)
- Multi-modal Data: Consistently transforming an image and its corresponding depth map or thermal map (treating them as image-like or mask-like depending on the desired interpolation and whether color transforms should apply).
- Multiple Masks: Augmenting an image along with several different segmentation masks (e.g., a foreground mask and a parts mask) using the same geometric warp.
- Image Restoration Tasks: Applying geometric transforms identically to a clean target image and a degraded input version, while applying degradation transforms (like noise or blur) only to the input.
Albumentations provides the `additional_targets` argument in `A.Compose` and the `add_targets` method specifically for these situations, where multiple named arrays need synchronized augmentation parameters.
Note on Other Multi-Input Scenarios:
Albumentations has other mechanisms for different multi-input scenarios, covered in other guides:
- For sequences (like video frames or volumetric slices) where the same parameters should apply across the sequence dimension, use the plural arguments (`images`, `masks`, `volumes`, `masks3d`). See the Video and Volumetric guides.
- For associating multiple labels or attributes with individual bounding boxes or keypoints, use `bbox_params` or `keypoint_params` with `label_fields`. See the Bounding Box and Keypoint guides.
This guide focuses only on the additional_targets feature.
Core Mechanism: additional_targets in A.Compose 🔗
Albumentations recognizes several standard target types that have specific processing rules:
- `image`: The primary input image (e.g., `[H, W, C]`). Receives geometric, color, and intensity transforms. Uses standard interpolation for geometric transforms.
- `mask`: A segmentation mask (e.g., `[H, W]`). Receives geometric transforms using nearest-neighbor interpolation. Does not receive color/intensity transforms.
- `masks`: Multiple segmentation masks passed together (e.g., `[N, H, W]`). Processed like `mask`.
- `bboxes`: Bounding boxes. Processed according to `bbox_params`. Requires `bbox_params` to be set.
- `keypoints`: Keypoints. Processed according to `keypoint_params`. Requires `keypoint_params` to be set.
- `volume`: A 3D volume (e.g., `[D, H, W, C]`). Receives 3D geometric transforms, and applicable 2D transforms slice-wise. Color/intensity transforms are applied as for `image`.
- `volumes`: Multiple 3D volumes (e.g., `[N, D, H, W, C]`). Processed like `volume` across the first dimension.
- `mask3d`: A 3D mask (e.g., `[D, H, W]`). Receives 3D geometric transforms using nearest-neighbor interpolation. Does not receive color/intensity transforms.
- `masks3d`: Multiple 3D masks (e.g., `[N, D, H, W]`). Processed like `mask3d` across the first dimension.
The `additional_targets` argument in `A.Compose` allows you to define new keyword argument names for your transform calls and map them to the processing logic of one of the standard single-instance target types (`'image'`, `'mask'`, `'volume'`, `'mask3d'`, etc.). This tells Albumentations: "Treat the data passed under this new name exactly like you would treat data passed under the standard target name specified in the mapping."
The `additional_targets` argument takes a dictionary:

- Keys: Define the new keyword argument names you'll use when calling the transform (e.g., `'right_image'`, `'depth'`, `'mask2'`). These names must not clash with the standard names listed above.
- Values: Specify the standard target type whose processing logic should be applied. You can map to `'image'`, `'mask'`, `'bboxes'`, or `'keypoints'` (and potentially volumetric types like `'volume'` and `'mask3d'`):
    - Mapping to `'image'`: Applies geometric transforms (standard interpolation) and color/intensity transforms.
    - Mapping to `'mask'`: Applies geometric transforms (nearest-neighbor interpolation) but not color/intensity transforms.
    - Mapping to `'bboxes'`: Applies geometric transforms according to the rules defined in `bbox_params`. Requires `bbox_params` to be set in `Compose`.
    - Mapping to `'keypoints'`: Applies geometric transforms according to the rules defined in `keypoint_params`. Requires `keypoint_params` to be set in `Compose`.
Example: Augmenting Stereo Images

```python
import albumentations as A
import numpy as np

# Treat 'right_image' exactly like the primary 'image'
transform = A.Compose([
    A.RandomResizedCrop(size=(256, 512), scale=(0.8, 1.0), p=1.0),
    A.ColorJitter(p=0.5),
    A.HorizontalFlip(p=0.5),
], additional_targets={'right_image': 'image'})

# Dummy stereo data
left_image = np.random.randint(0, 256, (300, 600, 3), dtype=np.uint8)
right_image = np.random.randint(0, 256, (300, 600, 3), dtype=np.uint8)

# Apply the transform, passing both images
augmented = transform(image=left_image, right_image=right_image)

aug_left = augmented['image']
aug_right = augmented['right_image']

print(f"Left shape: {aug_left.shape}, Right shape: {aug_right.shape}")
# Both images received the exact same crop parameters, color jitter values, and flip decision.
```
Example: Augmenting Image and Multiple Masks

```python
import albumentations as A
import numpy as np

# Define a pipeline with two additional mask targets
transform = A.Compose([
    A.ShiftScaleRotate(p=0.5),
    A.RandomCrop(width=80, height=80, p=1.0),
    A.GaussianBlur(p=0.5),  # Will only affect 'image'
], additional_targets={'semantic_mask': 'mask', 'parts_mask': 'mask'})

# Create dummy data
image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
semantic_mask = np.random.randint(0, 5, (100, 100), dtype=np.uint8)
parts_mask = np.random.randint(0, 3, (100, 100), dtype=np.uint8)

# Apply the transform, passing all targets
augmented = transform(image=image,
                      semantic_mask=semantic_mask,
                      parts_mask=parts_mask)

aug_image = augmented['image']
aug_semantic_mask = augmented['semantic_mask']
aug_parts_mask = augmented['parts_mask']

print(f"Augmented Semantic Mask shape: {aug_semantic_mask.shape}")
print(f"Augmented Parts Mask shape: {aug_parts_mask.shape}")
# image, semantic_mask, and parts_mask received identical ShiftScaleRotate/RandomCrop parameters.
# Only image was potentially blurred by GaussianBlur.
```
Dynamic Addition: add_targets Method 🔗
You can also add targets to an existing `Compose` object dynamically:

```python
import albumentations as A
import numpy as np

transform_dynamic = A.Compose([A.VerticalFlip(p=1.0)])

# Add targets later
new_targets = {'image2': 'image', 'mask_extra': 'mask'}
transform_dynamic.add_targets(new_targets)

# Prepare data
image1 = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
image2 = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
mask_extra = np.random.randint(0, 2, (100, 100), dtype=np.uint8)

# Call with the dynamically added arguments
augmented = transform_dynamic(image=image1, image2=image2, mask_extra=mask_extra)

aug_image1 = augmented['image']
aug_image2 = augmented['image2']
aug_mask_extra = augmented['mask_extra']

print("image1, image2, and mask_extra were all vertically flipped identically.")
```
Important Considerations 🔗
- Shape Consistency: All input arrays passed in a single call to the transform (e.g., `image`, `right_image`, `depth`, `mask2`) must have the same spatial dimensions (Height and Width for 2D; Depth, Height, and Width for 3D) before transformations are applied. Albumentations validates this by default.
- Primary Use: Best suited for multiple image-like or mask-like arrays needing synchronized parameters.
- Params Requirement: If mapping to `'bboxes'` or `'keypoints'`, ensure the corresponding `bbox_params` or `keypoint_params` are provided to `Compose`.
- Naming: Keys in `additional_targets` must be unique and must not clash with the standard arguments (`image`, `mask`, `masks`, `bboxes`, `keypoints`, `volume`, `volumes`, `mask3d`, `masks3d`).
- Transform Support: Check the individual transform documentation (the Targets section) to confirm support for the target types you are using.
When to Use Which Mechanism? 🔗
- Single image + associated mask/bboxes/keypoints: Use the standard arguments (`image`, `mask`, `bboxes`, `keypoints`) with `bbox_params`/`keypoint_params` if needed.
- Sequence/Batch (video, volumetric): Use the plural arguments (`images`, `masks`, `volumes`, `masks3d`) to apply identical parameters across the first dimension.
- Multiple Attributes per BBox/Keypoint: Use `label_fields` within `bbox_params`/`keypoint_params` to handle multiple labels or flags per box/keypoint.
- Multiple Distinct Arrays (Image/Mask/BBox/Keypoint): Use `additional_targets` to synchronize parameters across separate inputs when you need them treated according to a standard target's processing logic (e.g., stereo pairs, multi-modal data, multiple masks, or multiple sets of bboxes/keypoints needing identical geometric transforms). Remember to provide `bbox_params` or `keypoint_params` if mapping to those types.
Where to Go Next? 🔗
After learning how to use additional targets, consider these resources and next steps:
- Integrate into Task Pipelines: Apply this technique in relevant basic usage scenarios like Video Augmentation (if treating frames separately), multi-modal data, or image restoration setups.
- Review Core Concepts: Revisit Targets and Pipelines to solidify your understanding of how `A.Compose` manages different inputs.
- Explore Other Advanced Guides:
    - Serialization: Save pipelines that use additional targets.
    - Custom Transforms: Learn how to handle additional targets within your own custom transforms.
- Consult API Documentation: Refer to the Compose API documentation and the Transforms Overview list for detailed information on parameters and target support.