albumentations.augmentations.mixing.transforms
Combine multiple images and their annotations into a single image using a mosaic grid layout.
Members
- class Mosaic
Mosaic class
Mosaic(
grid_yx: tuple[int, int] = (2, 2),
target_size: tuple[int, int] = (512, 512),
cell_shape: tuple[int, int] = (512, 512),
center_range: tuple[float, float] = (0.3, 0.7),
fit_mode: 'cover' | 'contain' = 'cover',
interpolation: 0 | 6 | 1 | 2 | 3 | 4 | 5 = 1,
mask_interpolation: 0 | 6 | 1 | 2 | 3 | 4 | 5 = 0,
fill: tuple[float, ...] | float = 0,
fill_mask: tuple[float, ...] | float = 0,
metadata_key: str = 'mosaic_metadata',
p: float = 0.5
)
Combine multiple images and their annotations into a single image using a mosaic grid layout.
Mosaic creates a grid of images by placing the primary image and additional images from metadata into cells of a larger canvas, then crops a region to produce the final output. This is commonly used in object detection training to increase data diversity and help models learn to detect objects at different scales and in different contexts.
The transform takes a primary input image (and its annotations) and combines it with additional images and annotations provided via metadata. It calculates the geometry for a mosaic grid, selects additional items, preprocesses annotations consistently (handling label encoding updates), applies geometric transformations, and assembles the final output.
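The relationship between `grid_yx`, `cell_shape`, and `target_size` can be illustrated with a little arithmetic. The sketch below assumes the conceptual canvas is simply `rows x cols` cells of size `cell_shape`; that assumption and the helper `mosaic_canvas_shape` are illustrative only and are not taken from the library's implementation.

```python
def mosaic_canvas_shape(grid_yx, cell_shape):
    """Return the (height, width) of a grid of equally sized cells.

    Hypothetical helper for illustration; not part of Albumentations.
    """
    rows, cols = grid_yx
    cell_h, cell_w = cell_shape
    return rows * cell_h, cols * cell_w


# Values taken from the example on this page: a 2x2 grid of 120x120 cells
canvas_hw = mosaic_canvas_shape(grid_yx=(2, 2), cell_shape=(120, 120))
target_hw = (200, 200)

# Under the stated assumption, the crop can only be taken if it fits in the canvas
fits = target_hw[0] <= canvas_hw[0] and target_hw[1] <= canvas_hw[1]
print(canvas_hw, fits)
```

With these values the assumed canvas is larger than the requested output, so a 200x200 region can be cropped from it.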
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| grid_yx | tuple[int, int] | (2, 2) | The number of rows (y) and columns (x) in the mosaic grid. Determines the maximum number of images involved (grid_yx[0] * grid_yx[1]). Default: (2, 2). |
| target_size | tuple[int, int] | (512, 512) | The desired output (height, width) of the final mosaic image after cropping the mosaic grid. |
| cell_shape | tuple[int, int] | (512, 512) | The (height, width) of each cell in the mosaic grid. |
| center_range | tuple[float, float] | (0.3, 0.7) | Range [0.0-1.0] to sample the center point of the mosaic view relative to the valid central region of the conceptual large grid. This affects which parts of the assembled grid are visible in the final crop. Default: (0.3, 0.7). |
| fit_mode | One of: "cover", "contain" | cover | How to fit images into mosaic cells. "cover": scale the image to fill the entire cell, potentially cropping parts. "contain": scale the image to fit entirely within the cell, potentially adding padding. Default: "cover". |
| interpolation | One of: 0, 1, 2, 3, 4, 5, 6 | 1 | OpenCV interpolation flag used for resizing images during geometric processing. Default: cv2.INTER_LINEAR. |
| mask_interpolation | One of: 0, 1, 2, 3, 4, 5, 6 | 0 | OpenCV interpolation flag used for resizing masks during geometric processing. Default: cv2.INTER_NEAREST. |
| fill | tuple[float, ...] or float | 0 | Value used for padding images if needed during geometric processing. Default: 0. |
| fill_mask | tuple[float, ...] or float | 0 | Value used for padding masks if needed during geometric processing. Default: 0. |
| metadata_key | str | mosaic_metadata | Key in the input dictionary specifying the list of additional data dictionaries for the mosaic. Each dictionary in the list should represent one potential additional item. Expected keys: 'image' (required, np.ndarray), and optionally 'mask' (np.ndarray), 'bboxes' (np.ndarray), 'keypoints' (np.ndarray), and their corresponding label fields. Label field names must match those specified in `Compose`'s processor params: - For bboxes: field name should match `bbox_params.label_fields` - For keypoints: field name should match `keypoint_params.label_fields` Default: "mosaic_metadata". |
| p | float | 0.5 | Probability of applying the transform. Default: 0.5. |
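The two `fit_mode` options match the usual cover/contain scaling rule: "cover" takes the larger of the two per-axis scale factors, "contain" the smaller. The following is a minimal sketch of that rule in plain Python. `fit_scale` is a hypothetical helper written for this illustration, and the library's internal computation may differ in detail.

```python
def fit_scale(image_hw, cell_hw, fit_mode="cover"):
    """Return the uniform scale factor that fits an image into a cell.

    Hypothetical helper for illustration; not part of Albumentations.
    """
    img_h, img_w = image_hw
    cell_h, cell_w = cell_hw
    scale_h = cell_h / img_h
    scale_w = cell_w / img_w
    if fit_mode == "cover":
        # Fill the whole cell; the longer side overflows and is cropped
        return max(scale_h, scale_w)
    if fit_mode == "contain":
        # Keep the whole image visible; the shorter side is padded
        return min(scale_h, scale_w)
    raise ValueError(f"unknown fit_mode: {fit_mode!r}")


# A 100x200 (height x width) image placed into a 120x120 cell
cover = fit_scale((100, 200), (120, 120), "cover")      # scales to 120x240, width is cropped
contain = fit_scale((100, 200), (120, 120), "contain")  # scales to 60x120, height is padded
```

In the "contain" case the leftover area is where `fill` (for images) and `fill_mask` (for masks) would apply.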
Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare primary data
>>> primary_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> primary_mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> primary_bboxes = np.array([[10, 10, 40, 40], [50, 50, 90, 90]], dtype=np.float32)
>>> primary_bbox_classes = [1, 2]
>>> primary_keypoints = np.array([[25, 25], [75, 75]], dtype=np.float32)
>>> primary_keypoint_classes = ['eye', 'nose']
>>>
>>> # Prepare additional images for mosaic
>>> additional_image1 = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> additional_mask1 = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> additional_bboxes1 = np.array([[20, 20, 60, 60]], dtype=np.float32)
>>> additional_bbox_classes1 = [3]
>>> additional_keypoints1 = np.array([[40, 40]], dtype=np.float32)
>>> additional_keypoint_classes1 = ['mouth']
>>>
>>> additional_image2 = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> additional_mask2 = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> additional_bboxes2 = np.array([[30, 30, 70, 70]], dtype=np.float32)
>>> additional_bbox_classes2 = [4]
>>> additional_keypoints2 = np.array([[50, 50], [65, 65]], dtype=np.float32)
>>> additional_keypoint_classes2 = ['eye', 'eye']
>>>
>>> additional_image3 = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> additional_mask3 = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> additional_bboxes3 = np.array([[5, 5, 45, 45]], dtype=np.float32)
>>> additional_bbox_classes3 = [5]
>>> additional_keypoints3 = np.array([[25, 25]], dtype=np.float32)
>>> additional_keypoint_classes3 = ['nose']
>>>
>>> # Create metadata for additional images - structured as a list of dicts
>>> # Note: label field names must match those in bbox_params/keypoint_params
>>> mosaic_metadata = [
... {
... 'image': additional_image1,
... 'mask': additional_mask1,
... 'bboxes': additional_bboxes1,
... 'bbox_classes': additional_bbox_classes1,
... 'keypoints': additional_keypoints1,
... 'keypoint_classes': additional_keypoint_classes1
... },
... {
... 'image': additional_image2,
... 'mask': additional_mask2,
... 'bboxes': additional_bboxes2,
... 'bbox_classes': additional_bbox_classes2,
... 'keypoints': additional_keypoints2,
... 'keypoint_classes': additional_keypoint_classes2
... },
... {
... 'image': additional_image3,
... 'mask': additional_mask3,
... 'bboxes': additional_bboxes3,
... 'bbox_classes': additional_bbox_classes3,
... 'keypoints': additional_keypoints3,
... 'keypoint_classes': additional_keypoint_classes3
... }
... ]
>>>
>>> # Create the transform with Mosaic
>>> transform = A.Compose([
... A.Mosaic(
... grid_yx=(2, 2),
... target_size=(200, 200),
... cell_shape=(120, 120),
... center_range=(0.4, 0.6),
... fit_mode="cover",
... p=1.0
... ),
... ], bbox_params=A.BboxParams(coord_format='pascal_voc', label_fields=['bbox_classes']),
... keypoint_params=A.KeypointParams(coord_format='xy', label_fields=['keypoint_classes']))
>>>
>>> # Apply the transform
>>> transformed = transform(
... image=primary_image,
... mask=primary_mask,
... bboxes=primary_bboxes,
... bbox_classes=primary_bbox_classes,
... keypoints=primary_keypoints,
... keypoint_classes=primary_keypoint_classes,
... mosaic_metadata=mosaic_metadata # Pass the metadata using the default key
... )
>>>
>>> # Access the transformed data
>>> mosaic_image = transformed['image'] # Combined mosaic image
>>> mosaic_mask = transformed['mask'] # Combined mosaic mask
>>> mosaic_bboxes = transformed['bboxes'] # Combined and repositioned bboxes
>>> mosaic_bbox_classes = transformed['bbox_classes'] # Combined bbox labels from all images
>>> mosaic_keypoints = transformed['keypoints'] # Combined and repositioned keypoints
>>> mosaic_keypoint_classes = transformed['keypoint_classes'] # Combined keypoint labels
Notes
If fewer additional images are provided than needed to fill the grid, the primary image will be replicated to fill the remaining cells. For example, with a 2x2 grid, if only one additional image is provided, the mosaic will contain the primary image in two cells and the additional image in one cell, with one visible cell selected from these three.
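Before passing a metadata list to the transform, it can help to check how many additional items the grid can use. The sketch below assumes that one cell is taken by the primary image, so at most `grid_yx[0] * grid_yx[1] - 1` additional items are used; `summarize_mosaic_metadata` is a hypothetical helper and not part of the library.

```python
def summarize_mosaic_metadata(metadata, grid_yx=(2, 2)):
    """Validate metadata items and report how many grid cells they can fill.

    Hypothetical helper for illustration; not part of Albumentations.
    """
    rows, cols = grid_yx
    for index, item in enumerate(metadata):
        # 'image' is the only key documented as required for each item
        if "image" not in item:
            raise KeyError(f"metadata item {index} has no 'image' key")
    # Assumption: the primary image occupies one cell of the grid
    max_additional = rows * cols - 1
    used = min(len(metadata), max_additional)
    return {
        "max_additional": max_additional,
        "used": used,
        "missing": max_additional - used,
    }


# One additional item for a 2x2 grid; object() stands in for an image array
summary = summarize_mosaic_metadata([{"image": object()}], grid_yx=(2, 2))
print(summary)
```

A non-zero `missing` count indicates the situation described in the note above, where the provided items do not fill every remaining cell.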