albumentations.augmentations.mixing.copy_paste


CopyAndPaste mixing transform.

Members

CopyAndPaste (class)

CopyAndPaste(
    min_visibility_after_paste: float = 0.05,
    blend_mode: Literal['hard', 'gaussian'] = 'hard',
    blend_sigma_range: tuple[float, float] = (1.0, 3.0),
    scale_range: tuple[float, float] = (1.0, 1.0),
    min_paste_area: int = 1,
    metadata_key: str = 'copy_paste_metadata',
    p: float = 0.5
)

Paste object instances onto the primary image, updating all annotations (instance masks, bboxes, keypoints). Designed for instance segmentation training. Each donor object is tight-cropped to its `mask` (or `bbox` rect for bbox-only donors, optionally expanded to include `keypoints`), shrunk to fit the target image with aspect ratio preserved (no upscaling), optionally jittered by `scale_range`, and stamped at a uniformly random location inside the target. Existing instances that become sufficiently occluded by pasted objects are removed from the annotations. All per-object **content** augmentation (rotation, flip, color jitter, scale-up beyond fit) is the user's responsibility — the transform only does crop -> shrink-fit -> optional scale jitter -> uniform random placement -> stamp.

Note: Most Copy-Paste implementations (e.g. detectron2) accept a single donor image with all its instance masks and internally sample a random subset of instances to paste, coupling donor selection, instance sampling, and pasting into one opaque step. This implementation separates those concerns: donor selection and instance selection are done by the user externally, and the transform pastes every object in the provided list. The metadata format is `list[dict]` (one dict per object), consistent with `Mosaic`.

Args:
    min_visibility_after_paste (float): Minimum mask area ratio (area_after / area_before) for an existing instance to survive after occlusion by pasted objects. Instances whose remaining visible area falls below this threshold are removed from masks and bboxes. Default: 0.05.
    blend_mode (Literal["hard", "gaussian"]): How to blend pasted pixels. "hard" does a direct pixel copy (paper default). "gaussian" applies a gaussian blur to the alpha mask for soft edges at instance boundaries. Default: "hard".
    blend_sigma_range (tuple[float, float]): Sigma range for the gaussian blur when blend_mode="gaussian". Ignored when blend_mode="hard". Default: (1.0, 3.0).
    scale_range (tuple[float, float]): Multiplicative scale jitter applied on top of the shrink-to-fit scale. Sampled uniformly from this range and capped at the fit scale, so the result can shrink the donor further but never exceed fit-to-target. Default: (1.0, 1.0) (pure shrink-to-fit, no jitter).
    min_paste_area (int): Minimum scaled paste footprint area in pixels. Donors whose final scaled `H*W` falls below this value are silently dropped — useful to avoid pasting tiny blob-noise from huge donors onto small targets. Default: 1.
    metadata_key (str): Key in the Compose call data dict containing the list of object dictionaries to paste. Default: "copy_paste_metadata".
    p (float): Probability of applying the transform. Default: 0.5.

Metadata Format:
    The value at `metadata_key` must be a list of dicts. Each dict describes one donor object; donor image dimensions can differ from the target image dimensions and from each other. Coordinates for `bbox` / `keypoints` MUST be in the same `coord_format` declared in the pipeline's `BboxParams` / `KeypointParams`, normalized (where applicable) to the **donor** image dimensions — exactly as you would provide them if the donor were the primary image. The transform handles internal coordinate conversions.

    - image (np.ndarray): Donor image (Hd, Wd, C) containing the object. Required.
    - mask (np.ndarray): Binary **instance** mask (Hd, Wd) defining the paste footprint. Optional when `bbox` is provided. Empty masks (no positive pixels) are dropped.
    - bbox (np.ndarray | list): Horizontal bounding box of the object in `BboxParams.coord_format` on donor dims. Required for bbox-only donors; optional otherwise (a tight box is derived from `mask` if absent).
    - semantic_mask (np.ndarray): Optional semantic label map (Hd, Wd), same dims as `image`. When provided AND the pipeline passes a `mask` target, the donor's class ids replace the primary semantic mask inside the paste footprint.
      When the pipeline has a `mask` target but no donor supplies `semantic_mask`, a `UserWarning` fires once.
    - keypoints (np.ndarray): Keypoints in `KeypointParams.coord_format` on donor dims. Optional. Keypoints outside the mask/bbox tight crop expand the crop bounds so they are preserved into the target.
    - bbox_labels (dict[str, Any]): Label values for this donor's bbox, keyed by the names declared in `BboxParams.label_fields`. E.g. `{"class_id": 3, "is_crowd": 0}`.
    - keypoint_labels (dict[str, Any]): Label values for this donor's keypoints, keyed by the names declared in `KeypointParams.label_fields`. A list value is accepted when the object has multiple keypoints.

Targets:
    image, mask, bboxes, keypoints

Keypoints vs instance masks:
    When the pipeline supplies instance masks as `masks` (N, H, W) and `paste_surviving_indices` is computed from them, primary keypoints are filtered only if `keypoints.shape[0]` equals N (one row per instance, same order as the stacked masks). Otherwise existing keypoints are left unchanged and pasted keypoints are still appended.

Image types:
    uint8, float32

Supported bboxes:
    hbb

Reference:
    Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation: https://arxiv.org/abs/2012.07177

Examples:
    >>> import numpy as np
    >>> import albumentations as A
    >>>
    >>> # Primary data (target image is 100x100)
    >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
    >>> instance_masks = np.zeros((1, 100, 100), dtype=np.uint8)
    >>> instance_masks[0, 10:30, 10:30] = 1
    >>> bboxes = np.array([[10, 10, 30, 30]], dtype=np.float32)
    >>> class_labels = [1]
    >>>
    >>> # Donor 1: tight 40x40 mask-based donor (any donor dims work).
    >>> donor1_image = np.full((40, 40, 3), 200, dtype=np.uint8)
    >>> donor1_mask = np.ones((40, 40), dtype=np.uint8)
    >>>
    >>> # Donor 2: bbox-only donor on a 60x80 image (rectangle paste footprint).
    >>> donor2_image = np.random.randint(0, 256, (60, 80, 3), dtype=np.uint8)
    >>>
    >>> transform = A.Compose([
    ...     A.CopyAndPaste(
    ...         min_visibility_after_paste=0.05,
    ...         scale_range=(0.5, 1.0),  # randomly shrink donors to 50%-100% of fit
    ...         p=1.0,
    ...     ),
    ... ], bbox_params=A.BboxParams(coord_format='pascal_voc', label_fields=['class_labels']))
    >>>
    >>> result = transform(
    ...     image=image,
    ...     masks=instance_masks,
    ...     bboxes=bboxes,
    ...     class_labels=class_labels,
    ...     copy_paste_metadata=[
    ...         {
    ...             'image': donor1_image,
    ...             'mask': donor1_mask,
    ...             'bbox_labels': {'class_labels': 2},
    ...         },
    ...         {
    ...             'image': donor2_image,
    ...             'bbox': [10, 5, 70, 55],  # pascal_voc on 60x80 donor dims
    ...             'bbox_labels': {'class_labels': 3},
    ...         },
    ...     ],
    ... )
    >>> result_image = result['image']
    >>> result_masks = result['masks']  # (N_surviving + K, H, W)
    >>> result_bboxes = result['bboxes']  # Updated bboxes (in pascal_voc, target dims)
    >>> result_labels = result['class_labels']  # Updated labels
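Because donor selection and instance selection are left to the user, a small helper can turn one donor sample (image, stacked instance masks, labels) into the expected metadata list. The sketch below is a hypothetical helper, not part of albumentations; it assumes per-instance binary masks and a `class_labels` label field as in the example above:

```python
import numpy as np

def donor_objects_from_sample(donor_image, donor_masks, donor_labels,
                              num_objects, rng=None):
    """Sample a subset of a donor's instances and pack them as the
    list-of-dicts metadata expected under `copy_paste_metadata`."""
    rng = rng if rng is not None else np.random.default_rng()
    # Skip empty instance masks up front (the transform would drop them anyway).
    valid = [i for i, m in enumerate(donor_masks) if m.any()]
    chosen = rng.choice(valid, size=min(num_objects, len(valid)), replace=False)
    return [
        {
            "image": donor_image,
            "mask": donor_masks[i],
            "bbox_labels": {"class_labels": donor_labels[i]},
        }
        for i in chosen
    ]
```

The returned list can be passed directly as `copy_paste_metadata=...` in the Compose call; lists built from several different donor images can simply be concatenated.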

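The shrink-to-fit plus `scale_range` jitter rule can be sketched as follows. This is a hypothetical illustration of the documented behavior, not the library's internal code; `fit` is capped at 1.0 so donors are never upscaled, and the jittered result is capped at `fit`:

```python
import numpy as np

def sample_paste_scale(donor_hw, target_hw, scale_range, rng):
    """Shrink-to-fit scale with optional multiplicative jitter.

    donor_hw / target_hw: (height, width) pairs.
    """
    # Largest scale that fits the donor inside the target; never upscale.
    fit = min(target_hw[0] / donor_hw[0], target_hw[1] / donor_hw[1], 1.0)
    jitter = rng.uniform(scale_range[0], scale_range[1])
    # Jitter may shrink the donor further but never exceeds the fit scale.
    return min(fit * jitter, fit)
```

With the default `scale_range=(1.0, 1.0)` this degenerates to pure shrink-to-fit; `(0.5, 1.0)` as in the example above pastes donors at 50%-100% of their fit size.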
Parameters

| Name | Type | Default | Description |
|------|------|---------|-------------|
| min_visibility_after_paste | float | 0.05 | - |
| blend_mode | One of: 'hard', 'gaussian' | 'hard' | - |
| blend_sigma_range | tuple[float, float] | (1.0, 3.0) | - |
| scale_range | tuple[float, float] | (1.0, 1.0) | - |
| min_paste_area | int | 1 | - |
| metadata_key | str | 'copy_paste_metadata' | - |
| p | float | 0.5 | - |
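To make `min_visibility_after_paste` concrete, here is a rough NumPy sketch of the survival rule for existing instances (a hypothetical illustration, not the library's internal implementation):

```python
import numpy as np

def filter_occluded(instance_masks, pasted_footprint, min_visibility=0.05):
    """Indices of existing instances that keep at least `min_visibility`
    of their original mask area after pasted objects are stamped on top.

    instance_masks: (N, H, W) binary masks of existing instances.
    pasted_footprint: (H, W) binary union of all pasted object masks.
    """
    n = len(instance_masks)
    areas_before = instance_masks.reshape(n, -1).sum(axis=1)
    visible = instance_masks * (1 - pasted_footprint)  # pixels not overwritten
    areas_after = visible.reshape(n, -1).sum(axis=1)
    ratio = areas_after / np.maximum(areas_before, 1)  # guard against empty masks
    return np.flatnonzero(ratio >= min_visibility)
```

An instance fully covered by a pasted object has ratio 0 and is removed; the default threshold 0.05 removes any instance with less than 5% of its original mask area still visible.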