Frequently Asked Questions 🔗
This FAQ covers common questions about Albumentations, from basic setup to advanced usage. You'll find information about:
- Installation troubleshooting and configuration
- Working with different data formats (images, video, volumetric data)
- Advanced usage patterns and best practices
- Integration with other tools and migration from other libraries
If you don't find an answer to your question, please check the GitHub Issues for AlbumentationsX or the original Albumentations, or join our Discord community.
Installation 🔗
I am receiving an error message Failed building wheel for imagecodecs when I am trying to install Albumentations. How can I fix the problem? 🔗
Try to update pip by running the following command:
python -m pip install --upgrade pip
How to disable automatic checks for new versions? 🔗
To disable automatic checks for new versions, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.
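For example, the variable can be set from Python (a minimal sketch; setting it in your shell before running the script works just as well):
import os
# Set before albumentations is imported so the version check sees it
os.environ["NO_ALBUMENTATIONS_UPDATE"] = "1"
import albumentations as A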
How to make Albumentations use one CPU core? 🔗
Albumentations does not use multithreading by default, but the libraries it depends on (like OpenCV) may use multithreading. To make Albumentations use one CPU core, you can set the following environment variables:
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["VECLIB_MAXIMUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
# Disable OpenCV multithreading and OpenCL
import cv2
cv2.setNumThreads(0)
cv2.ocl.setUseOpenCL(False)
This should be done at the beginning of your script or before creating the DataLoader. Note that this solution may not be necessary for all users, and you should only apply it if you're experiencing performance problems with your specific setup.
Experiencing slow performance with PyTorch DataLoader multi-processing? 🔗
Some users have reported performance issues when using Albumentations with PyTorch's DataLoader in a multi-processing setup. This can occur on certain hardware/software configurations because OpenCV (cv2), which Albumentations uses under the hood, may spawn multiple threads within each DataLoader worker process. These threads can potentially interfere with each other, leading to CPU blocking and slower data loading.
If you encounter this issue, you can try disabling OpenCV's internal multithreading and OpenCL acceleration by calling:
import cv2
cv2.setNumThreads(0)
cv2.ocl.setUseOpenCL(False)
This should be done at the beginning of your script or before creating the DataLoader. Note that this solution may not be necessary for all users, and you should only apply it if you're experiencing performance problems with your specific setup.
AlbumentationsX and Licensing 🔗
What is AlbumentationsX? 🔗
AlbumentationsX is the next-generation successor to Albumentations. It's a 100% drop-in replacement that offers:
- Active maintenance and bug fixes
- Performance improvements
- New features and transforms
- Professional support options
- Dual licensing (AGPL/Commercial)
The original Albumentations remains MIT licensed but is no longer actively maintained.
Do I need to change my code when switching to AlbumentationsX? 🔗
No! AlbumentationsX is designed as a complete drop-in replacement. Simply uninstall albumentations and install albumentationsx. Your existing code will work without any modifications.
What is dual licensing and why was it introduced? 🔗
AlbumentationsX uses dual licensing:
- AGPL-3.0: Free for open source projects that are also AGPL-licensed
- Commercial License: Required for proprietary/commercial use or projects with non-AGPL licenses
This model ensures sustainable development of the library. After 7 years of MIT licensing with minimal financial support, dual licensing provides resources for full-time maintenance and development. See our License Guide for details.
Do I need to pay to use AlbumentationsX? 🔗
You need a commercial license if:
- You're using it in proprietary/commercial software
- Your open source project uses a non-AGPL license (MIT, Apache, BSD, etc.)
- You want to keep your code private
You can use it for free only if your entire project is licensed under AGPL-3.0.
Can I continue using the original MIT-licensed Albumentations? 🔗
Yes! The original Albumentations remains available and MIT licensed. However, it's no longer actively maintained, so you won't receive bug fixes or new features.
What is AGPL and how does it differ from MIT? 🔗
AGPL (Affero General Public License) is a copyleft license that requires:
- If you use AGPL software, your entire project must also be AGPL
- You cannot mix AGPL code with MIT/Apache/BSD licensed code without converting everything to AGPL
- If you distribute the software or run it as a network service, you must provide source code
MIT license has no such requirements - you can use MIT code in any project with any license.
How do I disable telemetry in AlbumentationsX? 🔗
AlbumentationsX includes anonymous usage telemetry to improve the library. To disable it:
Globally:
export ALBUMENTATIONS_NO_TELEMETRY=1
Or per-pipeline:
transform = A.Compose([...], telemetry=False)
The telemetry only collects anonymous usage statistics (transform names, parameters) and never collects images or personal data.
How to disable automatic version checks in AlbumentationsX? 🔗
Similar to the original Albumentations, you can disable automatic checks for new versions:
export NO_ALBUMENTATIONS_UPDATE=1
Or disable both telemetry and update checks:
export ALBUMENTATIONS_OFFLINE=1
Data Formats and Basic Usage 🔗
Supported Image Types 🔗
Albumentations works with images of type uint8 and float32. uint8 images should be in the [0, 255] range, and float32 images should be in the [0, 1] range. If float32 images lie outside of the [0, 1] range, they will be automatically clipped to the [0, 1] range.
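A small sketch showing both supported dtypes (random data purely for illustration):
import albumentations as A
import numpy as np

transform = A.HorizontalFlip(p=1.0)

uint8_image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)  # uint8, values in [0, 255]
float32_image = np.random.rand(256, 256, 3).astype(np.float32)          # float32, values in [0, 1]

out_uint8 = transform(image=uint8_image)["image"]      # output stays uint8
out_float32 = transform(image=float32_image)["image"]  # output stays float32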
Why do you call cv2.cvtColor(image, cv2.COLOR_BGR2RGB) in your examples? 🔗
For historical reasons, OpenCV reads an image in BGR format (so color channels of the image have the following order: Blue, Green, Red). Albumentations uses the most common and popular RGB image format. So when using OpenCV, we need to convert the image format to RGB explicitly.
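A typical loading pattern (the path "image.jpg" is just a placeholder):
import albumentations as A
import cv2

# OpenCV loads images in BGR channel order
image = cv2.imread("image.jpg")
# Convert to the RGB format that Albumentations expects
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

transform = A.RandomBrightnessContrast(p=1.0)
augmented = transform(image=image)["image"]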
How to have reproducible augmentations? 🔗
To have reproducible augmentations, set the seed parameter in your transform pipeline. This will ensure that the same random parameters are used for each augmentation, resulting in the same output for the same input.
transform = A.Compose([
A.RandomCrop(height=256, width=256),
A.HorizontalFlip(p=0.5),
A.RandomBrightnessContrast(p=0.2),
], seed=42)
How does Albumentations handle grayscale images? 🔗
Albumentations' Compose automatically manages channel dimensions for grayscale images:
The Key Point:
- All individual transforms in Albumentations require grayscale images to have a channel dimension (H, W, 1)
- Compose provides convenience by automatically handling both (H, W) and (H, W, 1) formats
- This design eliminates boilerplate code for checking and adding channel dimensions
With Compose (recommended):
- You can pass grayscale images as either (H, W) or (H, W, 1)
- Compose automatically adds a channel dimension if missing during preprocessing
- After applying transforms, it removes any added channel dimension, returning the original format
import albumentations as A
import numpy as np
transform = A.Compose([
A.RandomCrop(224, 224),
A.HorizontalFlip(p=0.5),
])
# Both formats work with Compose
grayscale_2d = np.random.randint(0, 256, (256, 256), dtype=np.uint8) # (H, W)
grayscale_3d = np.random.randint(0, 256, (256, 256, 1), dtype=np.uint8) # (H, W, 1)
result_2d = transform(image=grayscale_2d)['image'] # Output shape: (224, 224)
result_3d = transform(image=grayscale_3d)['image'] # Output shape: (224, 224, 1)
Without Compose (direct transform usage):
- You must ensure grayscale images have an explicit channel dimension (H, W, 1)
- Transforms will fail with the (H, W) format
# Direct transform usage
flip = A.HorizontalFlip(p=1.0)
# ❌ This will fail - individual transforms require (H, W, 1)
# flipped_wrong = flip(image=grayscale_2d)['image']
# ✅ Add channel dimension for direct usage
grayscale_with_channel = grayscale_2d[..., np.newaxis] # (H, W) -> (H, W, 1)
flipped = flip(image=grayscale_with_channel)['image'] # Works correctly
This also applies to batch processing:
- (N, H, W) → (N, H, W, 1) for multiple images
- (D, H, W) → (D, H, W, 1) for volumes
What data formats does Albumentations accept? 🔗
All inputs to Albumentations must be NumPy arrays. Lists are no longer supported for any data type.
Required formats:
- Images: NumPy arrays with shape (H, W) or (H, W, C)
- Masks: NumPy arrays with shape (H, W) or (H, W, num_classes)
- Bounding boxes: NumPy arrays with shape (num_boxes, 4)
- Keypoints: NumPy arrays with shape (num_keypoints, 2+)
- Labels: NumPy arrays (can be string or numeric dtype)
import numpy as np
import albumentations as A
# ✅ Correct - using NumPy arrays
bboxes = np.array([[10, 10, 50, 50], [60, 60, 90, 90]], dtype=np.float32)
labels = np.array(['cat', 'dog'])
keypoints = np.array([[25, 25], [75, 75]], dtype=np.float32)
# ❌ Incorrect - lists are not supported
# bboxes = [[10, 10, 50, 50], [60, 60, 90, 90]] # Will fail
# labels = ['cat', 'dog'] # Will fail
# keypoints = [[25, 25], [75, 75]] # Will fail
Working with Different Data Types 🔗
How to process video data with Albumentations? 🔗
Albumentations can process video data by treating it as a sequence of frames in numpy array format:
- (N, H, W) - Grayscale video (N frames)
- (N, H, W, C) - Color video (N frames)
When you pass a video array, Albumentations will apply the same transform with identical parameters to each frame, ensuring temporal consistency.
import albumentations as A
import numpy as np

video = np.random.rand(32, 256, 256, 3).astype(np.float32)  # 32 RGB frames, float32 in [0, 1]
transform = A.Compose([
    A.RandomCrop(height=224, width=224),
    A.HorizontalFlip(p=0.5)
], seed=137)
transformed = transform(images=video)['images']
See Working with Video Data for more info.
How to process volumetric data with Albumentations? 🔗
Albumentations can process volumetric data by treating it as a sequence of 2D slices. When you pass volumetric data as a numpy array, Albumentations will apply the same transform with identical parameters to each slice, ensuring consistency across slices.
See Working with Volumetric Data (3D) for more info.
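A minimal sketch, assuming the volume target described in the guide linked above (the shape (D, H, W, C) and transform choice are just examples):
import albumentations as A
import numpy as np

volume = np.random.randint(0, 256, (64, 256, 256, 3), dtype=np.uint8)  # (D, H, W, C): 64 slices

transform = A.Compose([
    A.RandomCrop(height=224, width=224),
    A.HorizontalFlip(p=0.5),
], seed=137)

# Each slice receives the same crop window and the same flip decision
transformed_volume = transform(volume=volume)['volume']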
Which transforms work with my data type? 🔗
Different transforms support different combinations of targets (images, masks, bboxes, keypoints, volumes). Before building your pipeline, check the Supported Targets by Transform reference to ensure your chosen transforms are compatible with your data types.
This reference is particularly useful when:
- Working with multiple targets simultaneously (e.g., images + masks + bboxes)
- Planning volumetric (3D) augmentation pipelines
- Debugging "unsupported target" errors
- Choosing transforms for specific computer vision tasks
My computer vision pipeline works with a sequence of images. I want to apply the same augmentations with the same parameters to each image in the sequence. Can Albumentations do it? 🔗
Yes. You can define additional images, masks, bounding boxes, or keypoints through the additional_targets argument to Compose. You can then pass those additional targets to the augmentation pipeline, and Albumentations will augment them in the same way. See this example for more info.
But if you only want to apply the same augmentations to a sequence of images, you can simply use the images target, which accepts list[numpy.ndarray] or a np.ndarray with shape (N, H, W, C) / (N, H, W), as shown in the sketch below.
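A minimal sketch of the additional_targets approach (the key names image1 and image2 are arbitrary and only used here for illustration):
import albumentations as A
import numpy as np

transform = A.Compose([
    A.RandomCrop(height=224, width=224),
    A.HorizontalFlip(p=0.5),
], additional_targets={'image1': 'image', 'image2': 'image'}, seed=137)

frame0 = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
frame1 = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
frame2 = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

# All frames get the same crop window and the same flip decision
out = transform(image=frame0, image1=frame1, image2=frame2)
augmented_frames = [out['image'], out['image1'], out['image2']]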
Advanced Usage 🔗
How to have reproducible augmentations? 🔗
To have reproducible augmentations, set the seed parameter in your transform pipeline. This will ensure that the same random parameters are used for each augmentation, resulting in the same output for the same input.
Note that Albumentations uses its own internal random state that is completely independent from global random seeds. This means:
- Setting np.random.seed() or random.seed() will NOT affect Albumentations' randomization
- Two Compose instances with the same seed will produce identical augmentation sequences
- Each call to the same Compose instance still produces random augmentations, but these sequences are reproducible between different instances
Example of reproducible augmentations:
import random
import albumentations as A
import numpy as np

# These two transforms will produce identical sequences
transform1 = A.Compose([
A.RandomCrop(height=256, width=256),
A.HorizontalFlip(p=0.5),
A.RandomBrightnessContrast(p=0.2),
], seed=137)
transform2 = A.Compose([
A.RandomCrop(height=256, width=256),
A.HorizontalFlip(p=0.5),
A.RandomBrightnessContrast(p=0.2),
], seed=137)
# This will NOT affect Albumentations randomization
np.random.seed(137)
random.seed(137)
How can I find which augmentations were applied to the input data and which parameters they used? 🔗
You may pass save_applied_params=True to Compose to save the parameters of the applied augmentations. You can access them later using applied_transforms.
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)

transform = A.Compose([
    A.RandomCrop(256, 256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.5),
    A.RandomGamma(p=0.5),
    A.Normalize(),
], save_applied_params=True, seed=42)

transformed = transform(image=image)
print(transformed["applied_transforms"])
How to apply CutOut augmentation? 🔗
Albumentations provides the CoarseDropout transform, which is a generalization of the CutOut and Random Erasing augmentation techniques. If you are looking for CutOut or Random Erasing, you should use CoarseDropout.
CoarseDropout generalizes these techniques by allowing:
- Variable number of holes: You specify a range for the number of holes (num_holes_range) instead of a fixed number or just one.
- Variable hole size: Holes can be rectangular, with ranges for height (hole_height_range) and width (hole_width_range). Sizes can be specified in pixels (int) or as fractions of image dimensions (float).
- Flexible fill values: You can fill the holes with a constant value (int/float), per-channel values (tuple), random noise per pixel ('random'), a single random color per hole ('random_uniform'), or using OpenCV inpainting methods ('inpaint_telea', 'inpaint_ns').
- Optional mask filling: You can specify a separate fill_mask value to fill corresponding areas in the mask, or leave the mask unchanged (None).
Example using random uniform fill:
import albumentations as A
import numpy as np
image = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)
# Apply CoarseDropout with 3-6 holes, each 10-20 pixels in size,
# filled with a random uniform color.
transform = A.CoarseDropout(
num_holes_range=(3, 6),
hole_height_range=(10, 20),
hole_width_range=(10, 20),
fill="random_uniform",
p=1.0
)
augmented_image = transform(image=image)['image']
This transform randomly removes rectangular regions from an image, similar to CutOut, but with more configuration options.
To specifically mimic the original CutOut behavior, you can configure CoarseDropout as follows:
- Set num_holes_range=(1, 1) to always create exactly one hole.
- Set hole_height_range and hole_width_range to the same fixed value (e.g., (16, 16) for a 16x16 pixel square or (0.1, 0.1) for a square 10% of the image size).
- Set fill=0 to fill the hole with black.
Example mimicking CutOut with a fixed 16x16 hole:
cutout_transform = A.CoarseDropout(
num_holes_range=(1, 1),
hole_height_range=(16, 16),
hole_width_range=(16, 16),
fill=0,
p=1.0 # Apply always, or adjust probability as needed
)
How to perform balanced scaling? 🔗
The default scaling logic in RandomScale, ShiftScaleRotate, and Affine transformations is biased towards upscaling.
For example, if scale_limit = (0.5, 2), a user might expect that the image will be scaled down in half of the cases and scaled up in the other half. However, in reality, the image will be scaled up in 75% of the cases and scaled down in only 25% of the cases. This is because the default behavior samples uniformly from the interval [0.5, 2], and the interval [0.5, 1] is three times smaller than [1, 2].
To achieve balanced scaling, you can use Affine with balanced_scale=True, which ensures that the probability of scaling up and scaling down is equal.
balanced_scale_transform = A.Affine(scale=(0.5, 2), balanced_scale=True)
or use the OneOf transform as follows:
balanced_scale_transform = A.OneOf([
A.Affine(scale=(0.5, 1), p=0.5),
A.Affine(scale=(1, 2), p=0.5)])
This approach ensures that, whenever scaling is applied, upscaling and downscaling are equally likely.
How do I know which transforms support my combination of targets? 🔗
When working with multiple targets (images + masks + bboxes + keypoints), you need to ensure all transforms in your pipeline support your specific combination. The Supported Targets by Transform reference shows exactly which transforms work with which target types.
This is especially important when:
- Building pipelines with bounding boxes or keypoints
- Working with volumetric (3D) data
- Debugging "unsupported target" errors
- Migrating from other libraries
For example, if you're working with images and bounding boxes, you'll need to use spatial-level (dual) transforms rather than pixel-level transforms, as pixel-level transforms don't modify bounding box coordinates.
Augmentations have a parameter named p that sets the probability of applying that augmentation. How does p work in nested containers? 🔗
The p parameter sets the probability of applying a specific augmentation. When augmentations are nested within a top-level container like Compose, the effective probability of each augmentation is the product of the container's probability and the augmentation's probability.
Let's look at an example when a container Compose contains one augmentation Resize:
transform = A.Compose([
A.Resize(height=256, width=256, p=1.0),
], p=0.9)
In this case, Resize has a 90% chance to be applied. This is because there is a 90% chance for Compose to be applied (p=0.9). If Compose is applied, then Resize is applied with 100% probability (p=1.0).
To visualize:
- Probability of Compose being applied: 0.9
- Probability of Resize being applied given Compose is applied: 1.0
- Effective probability of Resize being applied: 0.9 * 1.0 = 0.9 (or 90%)
This means that the effective probability of Resize being applied is the product of the probabilities of Compose and Resize, which is 0.9 * 1.0 = 0.9 or 90%. This principle applies to other transformations as well, where the overall probability is the product of the individual probabilities within the transformation pipeline.
Here's another example:
transform = A.Compose([
A.Resize(height=256, width=256, p=0.5),
], p=0.9)
In this example, Resize has an effective probability of being applied of 0.9 * 0.5 = 0.45, or 45%. This is because Compose is applied 90% of the time, and within that 90%, Resize is applied 50% of the time.
I created annotations for bounding boxes using labeling service or labeling software. How can I use those annotations in Albumentations? 🔗
You need to convert those annotations to one of the formats supported by Albumentations. For the list of formats, please refer to this article.
Note: Not all transforms support bounding boxes. Check the Supported Targets by Transform reference to see which transforms work with your specific combination of data types.
Consult the documentation of the labeling service to see how you can export annotations in those formats.
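Once the annotations are converted, pass them alongside the image together with a bbox_params specification. A minimal sketch, assuming boxes exported in pascal_voc format ([x_min, y_min, x_max, y_max] in pixels):
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
bboxes = np.array([[30, 40, 200, 220], [250, 100, 400, 300]], dtype=np.float32)  # pascal_voc boxes
labels = np.array(['cat', 'dog'])

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.5),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))

out = transform(image=image, bboxes=bboxes, labels=labels)
augmented_image, augmented_bboxes = out['image'], out['bboxes']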
Integration and Migration 🔗
How to save and load augmentation transforms to HuggingFace Hub? 🔗
import albumentations as A
import numpy as np
transform = A.Compose([
A.RandomCrop(256, 256),
A.HorizontalFlip(),
A.RandomBrightnessContrast(),
A.RGBShift(),
A.Normalize(),
])
transform.save_pretrained("qubvel-hf/albu", key="train")
# The 'key' parameter specifies the context or purpose of the saved transform,
# allowing for organized and context-specific retrieval.
# ^ this will save the transform to a directory "qubvel-hf/albu"
# with filename "albumentations_config_train.json"
transform.save_pretrained("qubvel-hf/albu", key="train", push_to_hub=True)
# ^ this will save the transform to a directory "qubvel-hf/albu"
# with filename "albumentations_config_train.json"
# + push the transform to the Hub to the repository "qubvel-hf/albu"
transform.push_to_hub("qubvel-hf/albu", key="train")
# ^ this will push the transform to the Hub to the repository "qubvel-hf/albu"
# (without saving it locally)
# In short: use `save_pretrained` to save the transform locally and optionally push to the Hub,
# and use `push_to_hub` to directly push the transform to the Hub without saving it locally.
loaded_transform = A.Compose.from_pretrained("qubvel-hf/albu", key="train")
# ^ this will load the transform from the local folder if it exists, or from the Hub
# repository "qubvel-hf/albu"
See this example for more info.
How do I migrate from other augmentation libraries to Albumentations? 🔗
If you're migrating from other libraries like torchvision or Kornia, you can refer to our Library Comparison guide. This guide provides:
- Mapping tables showing equivalent transforms between libraries
- Performance benchmarks demonstrating Albumentations' speed advantages
- Code examples for common migration scenarios
- Key differences in implementation and parameter handling
For a quick visual comparison of different augmentations, you can also use our interactive tool at explore.albumentations.ai to see how transforms affect images before implementing them.
For specific migration examples, see: