Working with Multiple Data Targets 🔗
Albumentations pipelines can apply augmentations consistently across various types of related data, called targets.
Beyond just augmenting the input image, you can simultaneously augment corresponding masks, bounding boxes, keypoints, and volumetric data, ensuring that spatial transformations stay synchronized.
Quick Reference 🔗
Key Concepts:
- `image` or `images`: Required - the primary input data
- Spatial transforms: Affect all targets (image, mask, bboxes, keypoints, volumes)
- Pixel transforms: Only affect image-like targets (image, images, volumes)
- Synchronized augmentation: All targets receive identical spatial transformations
- Format configuration: `bbox_params` and `keypoint_params` are required for their respective targets
Common Target Combinations:
- Classification: `image`
- Segmentation: `image` + `mask`
- Object Detection: `image` + `bboxes` + `bbox_params`
- Keypoint Detection: `image` + `keypoints` + `keypoint_params`
- 3D Data Processing: `volume` + `mask3d`
Core Requirements 🔗
Mandatory Input 🔗
You must provide either an `image` or an `images` keyword argument. Other targets such as `mask`, `bboxes`, etc., are optional and depend on your specific task. All data is passed as keyword arguments to the pipeline call.
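For example, a minimal call (shapes are illustrative):

```python
import albumentations as A
import numpy as np

transform = A.Compose([A.HorizontalFlip(p=0.5)])

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)

# Everything is passed as keyword arguments; `image` (or `images`) is mandatory
result = transform(image=image, mask=mask)
```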
Data Type Requirements 🔗
All data passed to Albumentations must be NumPy arrays:
- Image-like data (`image`, `images`, `volume`, `volumes`): Must be `uint8` or `float32` NumPy arrays
- Masks: Can be any integer-type NumPy array
- Bounding boxes: Must be NumPy arrays, typically `float32` with shape `(num_boxes, 4)`
- Keypoints: Must be NumPy arrays, typically `float32` with shape `(num_keypoints, 2+)`
- Labels: Must be NumPy arrays (string or numeric dtype)
Note: Lists are no longer supported for any data type - all inputs must be NumPy arrays.
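A short sketch of inputs that satisfy these requirements (shapes and values are illustrative):

```python
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)  # uint8 image
mask = np.zeros((100, 100), dtype=np.int32)                       # any integer dtype
bboxes = np.array([[10, 10, 50, 50]], dtype=np.float32)           # (num_boxes, 4)
keypoints = np.array([[30.0, 40.0]], dtype=np.float32)            # (num_keypoints, 2)
labels = np.array(['cat'])                                        # string dtype is fine
```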
Grayscale Image Handling 🔗
Albumentations' `Compose` automatically handles grayscale images for convenience:
Important Context:
- All individual transforms require grayscale images to have an explicit channel dimension `(H, W, 1)`
- `Compose` provides the convenience layer by automatically handling both formats
- This eliminates the need for boilerplate code to check and add channel dimensions
Automatic Channel Dimension Management:
- Input flexibility: You can pass grayscale images with or without an explicit channel dimension to `Compose`
- Internal preprocessing: `Compose` automatically adds a channel dimension if missing:
  - `(H, W)` → `(H, W, 1)` for single images
  - `(N, H, W)` → `(N, H, W, 1)` for multiple images
  - `(D, H, W)` → `(D, H, W, 1)` for volumes
- Consistent processing: All transforms then operate on consistent dimensions:
  - Single images: `ndim=3`
  - Multiple images/single volumes: `ndim=4`
  - Multiple volumes: `ndim=5`
- Automatic cleanup: If a channel dimension was added, `Compose` removes it in post-processing, returning data in the original format
Why This Matters:
```python
import albumentations as A
import numpy as np

# Using Compose - handles both formats automatically
transform = A.Compose([A.HorizontalFlip(p=1.0)])

grayscale_2d = np.random.randint(0, 256, (100, 100), dtype=np.uint8)     # (H, W)
grayscale_3d = np.random.randint(0, 256, (100, 100, 1), dtype=np.uint8)  # (H, W, 1)

# Both work with Compose - no boilerplate needed
result_2d = transform(image=grayscale_2d)  # Compose handles conversion
result_3d = transform(image=grayscale_3d)  # Already in correct format

# Direct transform usage - REQUIRES channel dimension
flip = A.HorizontalFlip(p=1.0)

# This will fail - transforms expect (H, W, 1)
# flipped_2d = flip(image=grayscale_2d)  # ❌ Will not work

# Must add channel dimension for direct usage
grayscale_with_channel = grayscale_2d[..., np.newaxis]  # (H, W) -> (H, W, 1)
flipped = flip(image=grayscale_with_channel)  # ✅ Works correctly
```
This design eliminates the boilerplate of checking and adding channel dimensions in every transform, while maintaining consistency across the library.
2D Targets 🔗
Single Image Data 🔗
`image`: Primary Input Image 🔗
Description: The main input image for augmentation.
Format: NumPy array with shape:
- `(height, width, channels)` for color images (e.g., RGB)
- `(height, width)` for grayscale images
Example:
```python
import numpy as np
import albumentations as A

# Color image
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)

# Grayscale image
image_gray = np.random.randint(0, 256, (224, 224), dtype=np.uint8)

transform = A.Compose([A.HorizontalFlip(p=0.5)])
result = transform(image=image)
```
`mask`: Segmentation Mask 🔗
Description: A segmentation mask corresponding to the input image.
Format: NumPy array with the same height and width as the input image:
- `(height, width)` for single-class or multiclass masks
- `(height, width, num_classes)` for multi-channel masks
Behavior:
- Spatial transforms: Applied identically to image and mask
- Pixel transforms: Do not affect masks
Example:
```python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
mask = np.random.randint(0, 5, (100, 100), dtype=np.uint8)  # 5 classes

transform = A.Compose([
    A.HorizontalFlip(p=1.0),            # Affects both image and mask
    A.RandomBrightnessContrast(p=1.0),  # Affects only the image
])

result = transform(image=image, mask=mask)
transformed_image = result['image']
transformed_mask = result['mask']  # Flipped but not brightness-adjusted
```
Multiple Image Data 🔗
`images`: Batch of Images 🔗
Description: Multiple images that receive identical augmentation parameters. Essential for video frames, stereo pairs, or multi-channel data requiring synchronized transformations.
Format: NumPy array with shape:
- `(num_images, height, width, channels)` for color images
- `(num_images, height, width)` for grayscale images
Key Feature: All images receive the exact same sequence and parameters of augmentations.
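A minimal sketch, assuming four video frames stacked along the first axis:

```python
import albumentations as A
import numpy as np

frames = np.random.randint(0, 256, (4, 128, 128, 3), dtype=np.uint8)  # (num_images, H, W, C)

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.5),
])

# Every frame is flipped (or not) together and gets the same brightness/contrast shift
result = transform(images=frames)
augmented_frames = result['images']  # shape (4, 128, 128, 3)
```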
`masks`: Multiple Masks 🔗
Description: Multiple segmentation masks for instance segmentation or multi-object scenarios.
Format: `(num_masks, height, width)` - each slice `[i, :, :]` represents one mask.
Use Case: Instance segmentation where each mask represents a different object instance.
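A brief sketch for instance masks (two object instances; shapes are illustrative):

```python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
instance_masks = np.random.randint(0, 2, (2, 100, 100), dtype=np.uint8)  # one mask per instance

transform = A.Compose([A.HorizontalFlip(p=1.0)])
result = transform(image=image, masks=instance_masks)
flipped_masks = result['masks']  # each instance mask flipped in sync with the image
```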
Coordinate-Based Targets 🔗
`bboxes`: Bounding Boxes 🔗
Description: Object bounding boxes with configurable coordinate formats.
Requirements:
- Must specify `bbox_params` in `A.Compose`
- Coordinates and labels are handled separately
Supported Formats: `pascal_voc`, `albumentations`, `coco`, `yolo`
Example:
```python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
bboxes = np.array([[10, 10, 50, 50], [60, 60, 90, 90]])  # [x_min, y_min, x_max, y_max]
class_labels = np.array([1, 2])  # labels must be NumPy arrays as well

transform = A.Compose([
    A.HorizontalFlip(p=1.0),
    A.RandomBrightnessContrast(p=0.5),
], bbox_params=A.BboxParams(
    format='pascal_voc',
    label_fields=['class_labels']
))

result = transform(image=image, bboxes=bboxes, class_labels=class_labels)
```
`keypoints`: Keypoints/Landmarks 🔗
Description: Points of interest with configurable formats and coordinate systems.
Requirements:
- Must specify `keypoint_params` in `A.Compose`
- Support for 2D and 3D coordinates
Common Formats: `[x, y]`, `[x, y, angle, scale]`, `[x, y, z]`
3D Note: For XYZ keypoints, 2D transforms only modify x, y coordinates; z remains unchanged.
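A minimal sketch of an `xy`-format keypoint pipeline:

```python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
keypoints = np.array([[20.0, 30.0], [70.0, 80.0]], dtype=np.float32)  # (num_keypoints, 2)

transform = A.Compose([
    A.HorizontalFlip(p=1.0),
], keypoint_params=A.KeypointParams(format='xy'))

result = transform(image=image, keypoints=keypoints)
flipped_keypoints = result['keypoints']  # x coordinates mirrored by the flip
```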
3D Targets (Volumetric Data) 🔗
Single Volume Data 🔗
`volume`: 3D Data Volumes 🔗
Description: 3D data volumes for various applications including medical imaging (CT, MRI), scientific simulations, geospatial data, computer graphics, and other volumetric datasets.
Format:
- `(depth, height, width, channels)` (DHWC)
- `(depth, height, width)` (DHW)
Transform Behavior:
- 2D transforms: Applied slice-wise with identical parameters across all slices
- 3D transforms: Applied to the entire volume
- Pixel transforms: Applied slice-wise
`mask3d`: 3D Segmentation Mask 🔗
Description: 3D segmentation mask corresponding to a volume, used for labeling regions in 3D space.
Format: `(depth, height, width)`
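A hedged sketch combining both 3D targets (assumes an Albumentations version with 3D support; `A.CenterCrop3D` is one of its 3D transforms):

```python
import albumentations as A
import numpy as np

volume = np.random.randint(0, 256, (32, 64, 64), dtype=np.uint8)  # (D, H, W)
mask3d = np.random.randint(0, 2, (32, 64, 64), dtype=np.uint8)

transform = A.Compose([
    A.CenterCrop3D(size=(16, 48, 48), p=1.0),  # 3D transform: crops the whole volume
    A.HorizontalFlip(p=0.5),                   # 2D transform: applied slice-wise with identical parameters
])

result = transform(volume=volume, mask3d=mask3d)
cropped_volume = result['volume']  # (16, 48, 48)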
Multiple Volume Data 🔗
`volumes`: Batch of 3D Volumes 🔗
Format: `(num_volumes, depth, height, width, channels)` or `(num_volumes, depth, height, width)`
`masks3d`: Multiple 3D Masks 🔗
Format: `(num_masks, depth, height, width)`
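A brief sketch of batched 3D data, following the same pattern (assuming a version where `volumes`/`masks3d` are built-in targets):

```python
import albumentations as A
import numpy as np

volumes = np.random.randint(0, 256, (2, 32, 64, 64), dtype=np.uint8)  # (num_volumes, D, H, W)
masks3d = np.random.randint(0, 2, (2, 32, 64, 64), dtype=np.uint8)

transform = A.Compose([A.HorizontalFlip(p=0.5)])
result = transform(volumes=volumes, masks3d=masks3d)  # identical parameters for every volume
```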
Target Compatibility Matrix 🔗
This simplified matrix shows the general compatibility between transform categories and different targets. Note that individual transforms within each category may have specific limitations.
| Transform Category | Description | image | mask | bboxes | keypoints | volume | mask3d |
|---|---|---|---|---|---|---|---|
| Pixel-level | Modify pixel values only (color, brightness, noise, blur) | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
| Spatial-level | Modify geometry (flip, rotate, crop, resize) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| 3D-specific | Designed for volumetric data | ❌ | ❌ | ❌ | Sometimes | ✅ | ✅ |
Key Points:
- Pixel-level transforms: Only affect pixel values in images/volumes, leaving spatial information unchanged
- Spatial-level transforms: Apply geometric changes consistently across all supported targets
- 3D transforms: Specifically designed for volumetric data (medical imaging, etc.)
⚠️ Important: This is a simplified overview. Individual transforms may have specific requirements or limitations. For a complete, up-to-date reference of which transforms support which targets, see the Supported Targets by Transform Reference.
Practical Examples 🔗
Semantic Segmentation 🔗
```python
import albumentations as A
import numpy as np

# Prepare data
image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.random.randint(0, 21, (256, 256), dtype=np.uint8)  # 21 classes

# Create pipeline
transform = A.Compose([
    A.RandomCrop(width=224, height=224),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
    A.Normalize(),
])

# Apply augmentations
result = transform(image=image, mask=mask)
```
Object Detection 🔗
```python
import albumentations as A
import numpy as np

# Prepare data
image = np.random.randint(0, 256, (416, 416, 3), dtype=np.uint8)
bboxes = np.array([[50, 50, 150, 150], [200, 200, 350, 350]])
labels = np.array(['person', 'car'])  # labels must be NumPy arrays

# Create pipeline with bbox parameters
transform = A.Compose([
    A.RandomSizedBBoxSafeCrop(width=320, height=320),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
], bbox_params=A.BboxParams(
    format='pascal_voc',
    label_fields=['labels'],
    min_area=1024,
    min_visibility=0.1
))

# Apply augmentations
result = transform(image=image, bboxes=bboxes, labels=labels)
```
Multi-Target Pipeline 🔗
```python
import albumentations as A
import numpy as np

# Prepare all data types
image = np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (300, 300), dtype=np.uint8)
bboxes = np.array([[25, 25, 100, 100], [150, 150, 250, 250]])
keypoints = np.array([[50, 50], [200, 200]])
class_labels = np.array([1, 2])                  # labels as NumPy arrays
keypoint_labels = np.array(['nose', 'eye'])

# Create comprehensive pipeline
transform = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=15, p=0.5),
    A.RandomBrightnessContrast(p=0.3),
],
    bbox_params=A.BboxParams(format='pascal_voc', label_fields=['class_labels']),
    keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_labels'])
)

# Apply to all targets
result = transform(
    image=image,
    mask=mask,
    bboxes=bboxes,
    keypoints=keypoints,
    class_labels=class_labels,
    keypoint_labels=keypoint_labels
)
```
Best Practices 🔗
Data Consistency 🔗
- Shape Matching: Ensure all spatial targets have matching dimensions
- Data Types: Use `uint8` for images and appropriate dtypes for other targets
- Coordinate Systems: Verify that bbox/keypoint formats match your data
- Array Format: All targets must be NumPy arrays - lists are no longer supported for bboxes, keypoints, or labels
Performance Optimization 🔗
- Batch Processing: Use `images`/`masks` for multiple related items
- Target Selection: Only include targets you actually need
- Memory Management: Consider data types and array sizes
Common Pitfalls 🔗
- Missing Parameters: Always provide `bbox_params`/`keypoint_params` when using bboxes/keypoints
- Format Mismatches: Ensure coordinate formats match your data
- Shape Inconsistencies: All targets must have compatible spatial dimensions
Where to Go Next? 🔗
Now that you understand how Albumentations handles different data targets, you can:
Task-Specific Guides:
- Semantic Segmentation - Working with `image` and `mask`
- Object Detection - Using `image`, `bboxes`, and `bbox_params`
- Keypoint Detection - Handling `image`, `keypoints`, and `keypoint_params`
- Volumetric Data - Working with `volume`, `mask3d`, and 3D transforms
Advanced Topics:
- Additional Targets - Define custom data types beyond standard targets
- Pipelines - Understand how `A.Compose` orchestrates transforms across targets
- Supported Targets by Transform - Complete reference of transform-target compatibility
Interactive Learning:
- Explore Transforms - Visualize how different transforms affect various targets