albumentations.augmentations.mixing.transforms
Transforms that combine multiple images and their associated annotations. This module contains transformations that take multiple input sources (e.g., a primary image and additional images provided via metadata) and combine them into a single output. Examples include overlaying elements (`OverlayElements`) or creating complex compositions like `Mosaic`.
Members
- class Mosaic
- class OverlayElements
Mosaic class
Combine multiple images and their annotations into a single image using a mosaic grid layout. Mosaic creates a grid of images by placing the primary image and additional images from metadata into cells of a larger canvas, then crops a region to produce the final output. This is commonly used in object detection training to increase data diversity and help models learn to detect objects at different scales and contexts. The transform takes a primary input image (and its annotations) and combines it with additional images/annotations provided via metadata. It calculates the geometry for a mosaic grid, selects additional items, preprocesses annotations consistently (handling label encoding updates), applies geometric transformations, and assembles the final output.
Parameters
Name | Type | Default | Description |
---|---|---|---|
grid_yx | tuple[int, int] | (2, 2) | The number of rows (y) and columns (x) in the mosaic grid. Determines the maximum number of images involved (grid_yx[0] * grid_yx[1]). Default: (2, 2). |
target_size | tuple[int, int] | (512, 512) | The desired output (height, width) for the final mosaic image after cropping the mosaic grid. Default: (512, 512). |
cell_shape | tuple[int, int] | (512, 512) | The (height, width) of each cell in the mosaic grid. Default: (512, 512). |
center_range | tuple[float, float] | (0.3, 0.7) | Range [0.0-1.0] to sample the center point of the mosaic view relative to the valid central region of the conceptual large grid. This affects which parts of the assembled grid are visible in the final crop. Default: (0.3, 0.7). |
fit_mode | Literal["cover", "contain"] | cover | How to fit images into mosaic cells. "cover": scale the image to fill the entire cell, potentially cropping parts. "contain": scale the image to fit entirely within the cell, potentially adding padding. Default: "cover". |
interpolation | int | 1 | OpenCV interpolation flag used for resizing images during geometric processing. Default: cv2.INTER_LINEAR. |
mask_interpolation | int | 0 | OpenCV interpolation flag used for resizing masks during geometric processing. Default: cv2.INTER_NEAREST. |
fill | float or tuple[float, ...] | 0 | Value used for padding images if needed during geometric processing. Default: 0. |
fill_mask | float or tuple[float, ...] | 0 | Value used for padding masks if needed during geometric processing. Default: 0. |
metadata_key | str | mosaic_metadata | Key in the input dictionary specifying the list of additional data dictionaries for the mosaic. Each dictionary in the list represents one potential additional item. Expected keys: 'image' (required, np.ndarray) and, optionally, 'mask' (np.ndarray), 'bboxes' (np.ndarray), 'keypoints' (np.ndarray), and their corresponding label fields. Label field names must match those specified in `Compose`'s processor params: for bboxes the field name should match `bbox_params.label_fields`, and for keypoints it should match `keypoint_params.label_fields`. Default: "mosaic_metadata". |
p | float | 0.5 | Probability of applying the transform. Default: 0.5. |
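The relationships between `grid_yx`, `cell_shape`, and `target_size` can be checked with a little arithmetic. The sketch below is plain Python that restates the documented geometry (the variable names are ours, not part of the API):

```python
# Back-of-the-envelope check of the mosaic grid geometry.
grid_yx = (2, 2)          # rows, columns
cell_shape = (512, 512)   # (height, width) of each grid cell
target_size = (512, 512)  # (height, width) of the final crop

n_cells = grid_yx[0] * grid_yx[1]       # total cells in the grid: 4
max_additional = n_cells - 1            # up to 3 images come from metadata
canvas = (grid_yx[0] * cell_shape[0],   # conceptual canvas: 1024 x 1024
          grid_yx[1] * cell_shape[1])

# The final output is a target_size crop of this canvas, so the crop
# must fit inside it in both dimensions.
assert target_size[0] <= canvas[0] and target_size[1] <= canvas[1]
```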
Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare primary data
>>> primary_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> primary_mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> primary_bboxes = np.array([[10, 10, 40, 40], [50, 50, 90, 90]], dtype=np.float32)
>>> primary_bbox_classes = [1, 2]
>>> primary_keypoints = np.array([[25, 25], [75, 75]], dtype=np.float32)
>>> primary_keypoint_classes = ['eye', 'nose']
>>>
>>> # Prepare additional images for mosaic
>>> additional_image1 = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> additional_mask1 = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> additional_bboxes1 = np.array([[20, 20, 60, 60]], dtype=np.float32)
>>> additional_bbox_classes1 = [3]
>>> additional_keypoints1 = np.array([[40, 40]], dtype=np.float32)
>>> additional_keypoint_classes1 = ['mouth']
>>>
>>> additional_image2 = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> additional_mask2 = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> additional_bboxes2 = np.array([[30, 30, 70, 70]], dtype=np.float32)
>>> additional_bbox_classes2 = [4]
>>> additional_keypoints2 = np.array([[50, 50], [65, 65]], dtype=np.float32)
>>> additional_keypoint_classes2 = ['eye', 'eye']
>>>
>>> additional_image3 = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> additional_mask3 = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> additional_bboxes3 = np.array([[5, 5, 45, 45]], dtype=np.float32)
>>> additional_bbox_classes3 = [5]
>>> additional_keypoints3 = np.array([[25, 25]], dtype=np.float32)
>>> additional_keypoint_classes3 = ['nose']
>>>
>>> # Create metadata for additional images - structured as a list of dicts
>>> # Note: label field names must match those in bbox_params/keypoint_params
>>> mosaic_metadata = [
... {
... 'image': additional_image1,
... 'mask': additional_mask1,
... 'bboxes': additional_bboxes1,
... 'bbox_classes': additional_bbox_classes1,
... 'keypoints': additional_keypoints1,
... 'keypoint_classes': additional_keypoint_classes1
... },
... {
... 'image': additional_image2,
... 'mask': additional_mask2,
... 'bboxes': additional_bboxes2,
... 'bbox_classes': additional_bbox_classes2,
... 'keypoints': additional_keypoints2,
... 'keypoint_classes': additional_keypoint_classes2
... },
... {
... 'image': additional_image3,
... 'mask': additional_mask3,
... 'bboxes': additional_bboxes3,
... 'bbox_classes': additional_bbox_classes3,
... 'keypoints': additional_keypoints3,
... 'keypoint_classes': additional_keypoint_classes3
... }
... ]
>>>
>>> # Create the transform with Mosaic
>>> transform = A.Compose([
... A.Mosaic(
... grid_yx=(2, 2),
... target_size=(200, 200),
... cell_shape=(120, 120),
... center_range=(0.4, 0.6),
... fit_mode="cover",
... p=1.0
... ),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_classes']),
... keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_classes']))
>>>
>>> # Apply the transform
>>> transformed = transform(
... image=primary_image,
... mask=primary_mask,
... bboxes=primary_bboxes,
... bbox_classes=primary_bbox_classes,
... keypoints=primary_keypoints,
... keypoint_classes=primary_keypoint_classes,
... mosaic_metadata=mosaic_metadata # Pass the metadata using the default key
... )
>>>
>>> # Access the transformed data
>>> mosaic_image = transformed['image'] # Combined mosaic image
>>> mosaic_mask = transformed['mask'] # Combined mosaic mask
>>> mosaic_bboxes = transformed['bboxes'] # Combined and repositioned bboxes
>>> mosaic_bbox_classes = transformed['bbox_classes'] # Combined bbox labels from all images
>>> mosaic_keypoints = transformed['keypoints'] # Combined and repositioned keypoints
>>> mosaic_keypoint_classes = transformed['keypoint_classes'] # Combined keypoint labels
Notes
If fewer additional images are provided than needed to fill the grid, the primary image will be replicated to fill the remaining cells. For example, with a 2x2 grid, if only one additional image is provided, the mosaic will contain the primary image in two cells and the additional image in one cell, with one visible cell selected from these three.
OverlayElements class
Apply overlay elements such as images and masks onto an input image. This transformation can be used to add various objects (e.g., stickers, logos) to images with optional masks and bounding boxes for better placement control.
Parameters
Name | Type | Default | Description |
---|---|---|---|
metadata_key | str | overlay_metadata | Key in the input dictionary specifying the list of overlay metadata dictionaries to apply. Default: "overlay_metadata". |
p | float | 0.5 | Probability of applying the transformation. Default: 0.5. |
Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare primary data (base image and mask)
>>> image = np.zeros((300, 300, 3), dtype=np.uint8)
>>> mask = np.zeros((300, 300), dtype=np.uint8)
>>>
>>> # 1. Create a simple overlay image (a red square)
>>> overlay_image1 = np.zeros((50, 50, 3), dtype=np.uint8)
>>> overlay_image1[:, :, 0] = 255 # Red color
>>>
>>> # 2. Create another overlay with a mask (a blue circle with transparency)
>>> overlay_image2 = np.zeros((80, 80, 3), dtype=np.uint8)
>>> overlay_image2[:, :, 2] = 255 # Blue color
>>> overlay_mask2 = np.zeros((80, 80), dtype=np.uint8)
>>> # Create a circular mask
>>> cv2.circle(overlay_mask2, (40, 40), 30, 255, -1)
>>>
>>> # 3. Create an overlay with both bbox and mask_id
>>> overlay_image3 = np.zeros((60, 120, 3), dtype=np.uint8)
>>> overlay_image3[:, :, 1] = 255 # Green color
>>> # Create a rectangular mask
>>> overlay_mask3 = np.zeros((60, 120), dtype=np.uint8)
>>> cv2.rectangle(overlay_mask3, (10, 10), (110, 50), 255, -1)
>>>
>>> # Create the metadata list - each item is a dictionary with overlay information
>>> overlay_metadata = [
... {
... 'image': overlay_image1,
... # No bbox provided - will be placed randomly
... },
... {
... 'image': overlay_image2,
... 'bbox': [0.6, 0.1, 0.9, 0.4], # Normalized coordinates [x_min, y_min, x_max, y_max]
... 'mask': overlay_mask2,
... 'mask_id': 1 # This overlay will update the mask with id 1
... },
... {
... 'image': overlay_image3,
... 'bbox': [0.1, 0.7, 0.5, 0.9], # Bottom left placement
... 'mask': overlay_mask3,
... 'mask_id': 2 # This overlay will update the mask with id 2
... }
... ]
>>>
>>> # Create the transform
>>> transform = A.Compose([
... A.OverlayElements(p=1.0),
... ])
>>>
>>> # Apply the transform
>>> result = transform(
... image=image,
... mask=mask,
... overlay_metadata=overlay_metadata # Pass metadata using the default key
... )
>>>
>>> # Get results with overlays applied
>>> result_image = result['image'] # Image with the three overlays applied
>>> result_mask = result['mask'] # Mask with regions labeled using the mask_id values
>>>
>>> # Let's verify the mask contains the specified mask_id values
>>> has_mask_id_1 = np.any(result_mask == 1) # Should be True
>>> has_mask_id_2 = np.any(result_mask == 2) # Should be True
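The placement arithmetic behind a normalized `bbox` can be sketched in plain NumPy. This is an illustration of the concept only, not OverlayElements' internal code; `paste_overlay` is a hypothetical helper:

```python
import numpy as np

def paste_overlay(base, overlay, bbox, mask=None):
    """Paste `overlay` into `base` at a normalized [x_min, y_min, x_max, y_max]
    bbox. Hypothetical helper illustrating the placement idea, not the API."""
    h, w = base.shape[:2]
    x1, y1 = int(bbox[0] * w), int(bbox[1] * h)
    x2, y2 = int(bbox[2] * w), int(bbox[3] * h)
    region = base[y1:y2, x1:x2]  # view into base; writes propagate
    # Nearest-neighbor resize of the overlay to the target region via indexing.
    ys = np.linspace(0, overlay.shape[0] - 1, y2 - y1).astype(int)
    xs = np.linspace(0, overlay.shape[1] - 1, x2 - x1).astype(int)
    resized = overlay[ys][:, xs]
    if mask is not None:
        m = mask[ys][:, xs] > 0       # only copy pixels where the mask is set
        region[m] = resized[m]
    else:
        region[:] = resized
    return base

base = np.zeros((100, 100, 3), dtype=np.uint8)
sticker = np.full((10, 10, 3), 255, dtype=np.uint8)
out = paste_overlay(base, sticker, [0.5, 0.5, 0.7, 0.7])
assert out[55, 55].tolist() == [255, 255, 255]  # inside the bbox region
assert out[10, 10].tolist() == [0, 0, 0]        # outside stays untouched
```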
References
- doc-augmentation: https://github.com/danaaubakirova/doc-augmentation