albumentations.augmentations.mixing.transforms
Paste object instances onto the primary image, updating all annotations (instance masks, bboxes, keypoints). Designed for instance segmentation training.
Members
- class CopyAndPaste
- class Mosaic
- class OverlayElements
class CopyAndPaste
CopyAndPaste(
min_visibility_after_paste: float = 0.05,
    blend_mode: 'hard' | 'gaussian' = 'hard',
blend_sigma_range: tuple[float, float] = (1.0, 3.0),
scale_range: tuple[float, float] = (1.0, 1.0),
min_paste_area: int = 1,
    metadata_key: str = 'copy_paste_metadata',
p: float = 0.5
)
Paste object instances onto the primary image, updating all annotations (instance masks, bboxes, keypoints). Designed for instance segmentation training. Each donor object is tight-cropped to its `mask` (or its `bbox` rect for bbox-only donors, optionally expanded to include `keypoints`), shrunk to fit the target image with aspect ratio preserved (no upscaling), optionally jittered by `scale_range`, and stamped at a uniformly random location inside the target. Existing instances that become sufficiently occluded by pasted objects are removed from the annotations. All per-object **content** augmentation (rotation, flip, color jitter, scaling up beyond fit) is the user's responsibility — the transform only does crop -> shrink-fit -> optional scale jitter -> uniform random placement -> stamp.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| min_visibility_after_paste | float | 0.05 | Minimum mask area ratio (area_after / area_before) for an existing instance to survive after occlusion by pasted objects. Instances whose remaining visible area falls below this threshold are removed from masks and bboxes. Default: 0.05. |
| blend_mode | One of: "hard", "gaussian" | "hard" | How to blend pasted pixels. "hard" does a direct pixel copy (paper default). "gaussian" applies gaussian blur to the alpha mask for soft edges at instance boundaries. Default: "hard". |
| blend_sigma_range | tuple[float, float] | (1.0, 3.0) | Sigma range for gaussian blur when blend_mode="gaussian". Ignored when blend_mode="hard". Default: (1.0, 3.0). |
| scale_range | tuple[float, float] | (1.0, 1.0) | Multiplicative scale jitter applied on top of the shrink-to-fit scale. Sampled uniformly from this range and capped at the fit scale, so the result can shrink the donor further but never exceed fit-to-target. Default: `(1.0, 1.0)` (pure shrink-to-fit, no jitter). |
| min_paste_area | int | 1 | Minimum scaled paste footprint area (pixels). Donors whose final scaled `H*W` falls below this value are silently dropped — useful to avoid pasting tiny blob-noise from huge donors onto small targets. Default: 1. |
| metadata_key | str | copy_paste_metadata | Key in the Compose call data dict containing the list of object dictionaries to paste. Default: "copy_paste_metadata". |
| p | float | 0.5 | Probability of applying the transform. Default: 0.5. |
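The shrink-to-fit and scale-jitter interaction described above can be sketched as follows (a hypothetical helper illustrating the documented behavior, not the library's internal API):

```python
import numpy as np

def fit_and_jitter_scale(donor_hw, target_hw, scale_range, rng):
    """Shrink-to-fit scale (aspect preserved, never upscaled), then jitter.

    Assumed logic per the docs: the jittered scale is capped at the fit
    scale, so donors can shrink further but never exceed fit-to-target.
    """
    dh, dw = donor_hw
    th, tw = target_hw
    fit = min(th / dh, tw / dw, 1.0)   # never upscale beyond 1.0
    jitter = rng.uniform(*scale_range)
    return min(fit * jitter, fit)      # capped at the fit scale

rng = np.random.default_rng(0)
# A 200x400 donor must shrink by at least 4x to fit a 100x100 target.
s = fit_and_jitter_scale((200, 400), (100, 100), (0.5, 1.0), rng)
assert 0 < s <= 0.25
```

With the default `scale_range=(1.0, 1.0)` this reduces to pure shrink-to-fit.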
Examples
>>> import numpy as np
>>> import albumentations as A
>>>
>>> # Primary data (target image is 100x100)
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> instance_masks = np.zeros((1, 100, 100), dtype=np.uint8)
>>> instance_masks[0, 10:30, 10:30] = 1
>>> bboxes = np.array([[10, 10, 30, 30]], dtype=np.float32)
>>> class_labels = [1]
>>>
>>> # Donor 1: tight 40x40 mask-based donor (any donor dims work).
>>> donor1_image = np.full((40, 40, 3), 200, dtype=np.uint8)
>>> donor1_mask = np.ones((40, 40), dtype=np.uint8)
>>>
>>> # Donor 2: bbox-only donor on a 60x80 image (rectangle paste footprint).
>>> donor2_image = np.random.randint(0, 256, (60, 80, 3), dtype=np.uint8)
>>>
>>> transform = A.Compose([
... A.CopyAndPaste(
... min_visibility_after_paste=0.05,
... scale_range=(0.5, 1.0), # randomly shrink donors to 50%-100% of fit
... p=1.0,
... ),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['class_labels']))
>>>
>>> result = transform(
... image=image,
... masks=instance_masks,
... bboxes=bboxes,
... class_labels=class_labels,
... copy_paste_metadata=[
... {
... 'image': donor1_image,
... 'mask': donor1_mask,
... 'bbox_labels': {'class_labels': 2},
... },
... {
... 'image': donor2_image,
... 'bbox': [10, 5, 70, 55], # pascal_voc on 60x80 donor dims
... 'bbox_labels': {'class_labels': 3},
... },
... ],
... )
>>> result_image = result['image']
>>> result_masks = result['masks'] # (N_surviving + K, H, W)
>>> result_bboxes = result['bboxes'] # Updated bboxes (in pascal_voc, target dims)
>>> result_labels = result['class_labels'] # Updated labels
Notes
Most Copy-Paste implementations (e.g. detectron2) accept a single donor image with all its instance masks and internally sample a random subset of instances to paste, coupling donor selection, instance sampling, and pasting into one opaque step. This implementation separates those concerns: donor selection and instance selection are done by the user externally, and the transform pastes every object in the provided list. The metadata format is `list[dict]` (one dict per object), consistent with `Mosaic`.
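The `min_visibility_after_paste` rule can be illustrated with a small sketch (assumed logic matching the parameter description, not the library's internal code):

```python
import numpy as np

def filter_occluded(instance_masks, pasted_mask, min_visibility=0.05):
    """Keep instance i iff visible_area / original_area >= min_visibility.

    instance_masks: (N, H, W) boolean masks of existing instances.
    pasted_mask: (H, W) boolean union of all pasted object footprints.
    """
    keep = []
    for i, m in enumerate(instance_masks):
        before = int(m.sum())
        after = int((m & ~pasted_mask).sum())   # pixels not covered by pastes
        if before > 0 and after / before >= min_visibility:
            keep.append(i)
    return keep

masks = np.zeros((2, 10, 10), dtype=bool)
masks[0, 0:4, 0:4] = True       # instance 0: 16 px
masks[1, 6:10, 6:10] = True     # instance 1: 16 px
paste = np.zeros((10, 10), dtype=bool)
paste[0:4, 0:4] = True          # fully covers instance 0
print(filter_occluded(masks, paste))  # [1]
```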
class Mosaic
Mosaic(
grid_yx: tuple[int, int] = (2, 2),
target_size: tuple[int, int] = (512, 512),
cell_shape: tuple[int, int] = (512, 512),
center_range: tuple[float, float] = (0.3, 0.7),
    fit_mode: 'cover' | 'contain' = 'cover',
interpolation: 0 | 6 | 1 | 2 | 3 | 4 | 5 = 1,
mask_interpolation: 0 | 6 | 1 | 2 | 3 | 4 | 5 = 0,
fill: tuple[float, ...] | float = 0,
fill_mask: tuple[float, ...] | float = 0,
    metadata_key: str = 'mosaic_metadata',
p: float = 0.5
)
Combine multiple images and annotations into one image via a mosaic grid. Uses metadata for additional images; common in object detection training. Mosaic creates a grid of images by placing the primary image and additional images from metadata into cells of a larger canvas, then crops a region to produce the final output. This is commonly used in object detection training to increase data diversity and help models learn to detect objects at different scales and contexts. The transform takes a primary input image (and its annotations) and combines it with additional images/annotations provided via metadata. It calculates the geometry for a mosaic grid, selects additional items, preprocesses annotations consistently (handling label encoding updates), applies geometric transformations, and assembles the final output.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| grid_yx | tuple[int, int] | (2, 2) | The number of rows (y) and columns (x) in the mosaic grid. Determines the maximum number of images involved (grid_yx[0] * grid_yx[1]). Default: (2, 2). |
| target_size | tuple[int, int] | (512, 512) | The desired output (height, width) of the final mosaic image, produced by cropping from the assembled grid. Default: (512, 512). |
| cell_shape | tuple[int, int] | (512, 512) | Shape (height, width) of each cell in the mosaic grid. Default: (512, 512). |
| center_range | tuple[float, float] | (0.3, 0.7) | Range [0.0-1.0] to sample the center point of the mosaic view relative to the valid central region of the conceptual large grid. This affects which parts of the assembled grid are visible in the final crop. Default: (0.3, 0.7). |
| fit_mode | One of: "cover", "contain" | "cover" | How to fit images into mosaic cells. "cover": scale the image to fill the entire cell, potentially cropping parts. "contain": scale the image to fit entirely within the cell, potentially adding padding. Default: "cover". |
| interpolation | One of: 0, 1, 2, 3, 4, 5, 6 | 1 | OpenCV interpolation flag used for resizing images during geometric processing. Default: cv2.INTER_LINEAR. |
| mask_interpolation | One of: 0, 1, 2, 3, 4, 5, 6 | 0 | OpenCV interpolation flag used for resizing masks during geometric processing. Default: cv2.INTER_NEAREST. |
| fill | tuple[float, ...] or float | 0 | Value used for padding images if needed during geometric processing. Default: 0. |
| fill_mask | tuple[float, ...] or float | 0 | Value used for padding masks if needed during geometric processing. Default: 0. |
| metadata_key | str | mosaic_metadata | Key in the input dictionary specifying the list of additional data dictionaries for the mosaic. Each dictionary in the list should represent one potential additional item. Expected keys: 'image' (required, np.ndarray), and optionally 'mask' (np.ndarray), 'masks' (np.ndarray, stacked instance masks), 'bboxes' (np.ndarray), 'keypoints' (np.ndarray), and label fields supplied via the `bbox_labels` and `keypoint_labels` wrapper dicts (see Metadata Format below). Default: "mosaic_metadata". |
| p | float | 0.5 | Probability of applying the transform. Default: 0.5. |
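The interaction of `grid_yx`, `cell_shape`, `target_size`, and `center_range` can be sketched as follows (a hypothetical geometry helper based on the parameter descriptions, not the library's internal implementation):

```python
import numpy as np

def mosaic_crop_origin(grid_yx, cell_shape, target_size, center_range, rng):
    """Sample the top-left corner of the output crop over the mosaic canvas.

    Assumed logic: the full canvas is grid_yx * cell_shape; a center point
    is sampled from center_range (as fractions of the canvas extent), and a
    target_size window around it is clamped to the canvas bounds.
    """
    canvas_h = grid_yx[0] * cell_shape[0]
    canvas_w = grid_yx[1] * cell_shape[1]
    cy = rng.uniform(*center_range) * canvas_h
    cx = rng.uniform(*center_range) * canvas_w
    y0 = int(np.clip(cy - target_size[0] / 2, 0, canvas_h - target_size[0]))
    x0 = int(np.clip(cx - target_size[1] / 2, 0, canvas_w - target_size[1]))
    return y0, x0

rng = np.random.default_rng(0)
# 2x2 grid of 120x120 cells -> 240x240 canvas; crop a 200x200 window.
y0, x0 = mosaic_crop_origin((2, 2), (120, 120), (200, 200), (0.4, 0.6), rng)
assert 0 <= y0 <= 40 and 0 <= x0 <= 40
```

A tight `center_range` such as `(0.4, 0.6)` keeps the crop near the canvas center, so all four cells usually contribute to the output.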
Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare primary data
>>> primary_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> primary_mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> primary_bboxes = np.array([[10, 10, 40, 40], [50, 50, 90, 90]], dtype=np.float32)
>>> primary_bbox_classes = [1, 2]
>>> primary_keypoints = np.array([[25, 25], [75, 75]], dtype=np.float32)
>>> primary_keypoint_classes = ['eye', 'nose']
>>>
>>> # Prepare additional images for mosaic.
>>> # bbox_labels and keypoint_labels are dicts mapping field name -> list of values.
>>> mosaic_metadata = [
... {
... 'image': np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8),
... 'mask': np.random.randint(0, 2, (100, 100), dtype=np.uint8),
... 'bboxes': np.array([[20, 20, 60, 60]], dtype=np.float32),
... 'bbox_labels': {'bbox_classes': [3]},
... 'keypoints': np.array([[40, 40]], dtype=np.float32),
... 'keypoint_labels': {'keypoint_classes': ['mouth']},
... },
... {
... 'image': np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8),
... 'mask': np.random.randint(0, 2, (100, 100), dtype=np.uint8),
... 'bboxes': np.array([[30, 30, 70, 70]], dtype=np.float32),
... 'bbox_labels': {'bbox_classes': [4]},
... 'keypoints': np.array([[50, 50], [65, 65]], dtype=np.float32),
... 'keypoint_labels': {'keypoint_classes': ['eye', 'eye']},
... },
... ]
>>>
>>> transform = A.Compose([
... A.Mosaic(
... grid_yx=(2, 2),
... target_size=(200, 200),
... cell_shape=(120, 120),
... center_range=(0.4, 0.6),
... fit_mode="cover",
... p=1.0
... ),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_classes']),
... keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_classes']))
>>>
>>> transformed = transform(
... image=primary_image,
... mask=primary_mask,
... bboxes=primary_bboxes,
... bbox_classes=primary_bbox_classes,
... keypoints=primary_keypoints,
... keypoint_classes=primary_keypoint_classes,
... mosaic_metadata=mosaic_metadata,
... )
>>>
>>> mosaic_image = transformed['image']
>>> mosaic_bboxes = transformed['bboxes']
>>> mosaic_bbox_classes = transformed['bbox_classes']
>>> mosaic_keypoints = transformed['keypoints']
>>> mosaic_keypoint_classes = transformed['keypoint_classes']
Notes
If fewer additional images are provided than needed to fill the grid, the primary image will be replicated to fill the remaining cells. For example, with a 2x2 grid, if only one additional image is provided, the mosaic will contain the primary image in two cells and the additional image in one cell, with one visible cell selected from these three. Stacked instance masks on the `masks` key (N, H, W) are transformed via `apply_to_masks` like other DualTransforms; `_targets` only lists `Targets` enum values (no `Targets.MASKS`).
class OverlayElements
OverlayElements(
    metadata_key: str = 'overlay_metadata',
p: float = 0.5
)
Apply overlay images onto an input image (e.g. stickers, logos, watermarks). Each overlay may optionally supply a `bbox` (normalized placement rectangle) and a `mask` (shape/alpha mask); overlays without a `bbox` are placed randomly. The list of overlays is read from the data dict under `metadata_key`.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| metadata_key | str | overlay_metadata | Additional target key for metadata. Default `overlay_metadata`. |
| p | float | 0.5 | Probability of applying the transformation. Default: 0.5. |
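The core stamping behavior (paste masked overlay pixels, write `mask_id` into the segmentation mask at the same locations) can be sketched as follows; this is an assumed illustration of the documented behavior, not the library's internal code:

```python
import numpy as np

def stamp_overlay(image, mask, overlay, x0, y0, overlay_mask=None, mask_id=None):
    """Paste overlay pixels (optionally shape-masked) and label the mask.

    Where overlay_mask > 0, the overlay replaces the base pixels; if
    mask_id is given, it is written into the segmentation mask there.
    """
    h, w = overlay.shape[:2]
    if overlay_mask is None:
        overlay_mask = np.full((h, w), 255, dtype=np.uint8)  # fully opaque
    alpha = overlay_mask > 0
    region = image[y0:y0 + h, x0:x0 + w]
    region[alpha] = overlay[alpha]
    if mask_id is not None:
        mask[y0:y0 + h, x0:x0 + w][alpha] = mask_id
    return image, mask

img = np.zeros((100, 100, 3), dtype=np.uint8)
msk = np.zeros((100, 100), dtype=np.uint8)
patch = np.full((20, 20, 3), 255, dtype=np.uint8)
img, msk = stamp_overlay(img, msk, patch, x0=10, y0=10, mask_id=1)
assert msk[15, 15] == 1 and msk[0, 0] == 0
```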
Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare primary data (base image and mask)
>>> image = np.zeros((300, 300, 3), dtype=np.uint8)
>>> mask = np.zeros((300, 300), dtype=np.uint8)
>>>
>>> # 1. Create a simple overlay image (a red square)
>>> overlay_image1 = np.zeros((50, 50, 3), dtype=np.uint8)
>>> overlay_image1[:, :, 0] = 255 # Red color
>>>
>>> # 2. Create another overlay with a mask (a blue circle with transparency)
>>> overlay_image2 = np.zeros((80, 80, 3), dtype=np.uint8)
>>> overlay_image2[:, :, 2] = 255 # Blue color
>>> overlay_mask2 = np.zeros((80, 80), dtype=np.uint8)
>>> # Create a circular mask (center (40, 40), radius 30)
>>> overlay_mask2 = cv2.circle(overlay_mask2, (40, 40), 30, 255, -1)
>>>
>>> # 3. Create an overlay with both bbox and mask_id
>>> overlay_image3 = np.zeros((60, 120, 3), dtype=np.uint8)
>>> overlay_image3[:, :, 1] = 255 # Green color
>>> # Create a rectangular mask inset from the overlay borders
>>> overlay_mask3 = np.zeros((60, 120), dtype=np.uint8)
>>> cv2.rectangle(overlay_mask3, (10, 10), (110, 50), 255, -1)
>>>
>>> # Create the metadata list - each item is a dictionary with overlay information
>>> overlay_metadata = [
... {
... 'image': overlay_image1,
... # No bbox provided - will be placed randomly
... },
... {
... 'image': overlay_image2,
... 'bbox': [0.6, 0.1, 0.9, 0.4], # Normalized coordinates [x_min, y_min, x_max, y_max]
... 'mask': overlay_mask2,
... 'mask_id': 1 # This overlay will update the mask with id 1
... },
... {
... 'image': overlay_image3,
... 'bbox': [0.1, 0.7, 0.5, 0.9], # Bottom left placement
... 'mask': overlay_mask3,
... 'mask_id': 2 # This overlay will update the mask with id 2
... }
... ]
>>>
>>> # Create the transform
>>> transform = A.Compose([
... A.OverlayElements(p=1.0),
... ])
>>>
>>> # Apply the transform
>>> result = transform(
... image=image,
... mask=mask,
... overlay_metadata=overlay_metadata # Pass metadata using the default key
... )
>>>
>>> # Get results with overlays applied
>>> result_image = result['image'] # Image with the three overlays applied
>>> result_mask = result['mask'] # Mask with regions labeled using the mask_id values
>>>
>>> # Let's verify the mask contains the specified mask_id values
>>> has_mask_id_1 = np.any(result_mask == 1) # Should be True
>>> has_mask_id_2 = np.any(result_mask == 2) # Should be True
References
- doc-augmentation: https://github.com/danaaubakirova/doc-augmentation