albumentations.augmentations.mixing.mosaic
Mosaic mixing transform.
Members
- class Mosaic
Mosaic class
Mosaic(
grid_yx: tuple[int, int] = (2, 2),
target_size: tuple[int, int] = (512, 512),
cell_shape: tuple[int, int] = (512, 512),
center_range: tuple[float, float] = (0.3, 0.7),
fit_mode: Literal['cover', 'contain'] = 'cover',
interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_NEAREST_EXACT, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_LINEAR_EXACT] = cv2.INTER_LINEAR,
mask_interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_NEAREST_EXACT, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_LINEAR_EXACT] = cv2.INTER_NEAREST,
fill: tuple[float, ...] | float = 0,
fill_mask: tuple[float, ...] | float = 0,
metadata_key: str = 'mosaic_metadata',
p: float = 0.5
)

Combine multiple images and their annotations into one image via a mosaic grid. Additional images are supplied via metadata; the technique is common in object detection training.

Mosaic creates a grid of images by placing the primary image and additional images from metadata into cells of a larger canvas, then crops a region to produce the final output. This is commonly used in object detection training to increase data diversity and to help models learn to detect objects at different scales and in different contexts.

The transform takes a primary input image (and its annotations) and combines it with additional images/annotations provided via metadata. It calculates the geometry for a mosaic grid, selects additional items, preprocesses annotations consistently (handling label encoding updates), applies geometric transformations, and assembles the final output.

Args:
- grid_yx (tuple[int, int]): Number of rows (y) and columns (x) in the mosaic grid. Determines the maximum number of images involved (grid_yx[0] * grid_yx[1]). Default: (2, 2).
- target_size (tuple[int, int]): Desired output (height, width) of the final mosaic image after cropping the mosaic grid. Default: (512, 512).
- cell_shape (tuple[int, int]): Shape (height, width) of each cell in the mosaic grid. Default: (512, 512).
- fit_mode (Literal['cover', 'contain']): How to fit images into mosaic cells. "cover" scales the image to fill the entire cell, potentially cropping parts of it; "contain" scales the image to fit entirely within the cell, potentially adding padding. Default: "cover".
- metadata_key (str): Key in the input dictionary specifying the list of additional data dictionaries for the mosaic. Each dictionary in the list represents one potential additional item. Expected keys: 'image' (required, np.ndarray), and optionally 'mask' (np.ndarray), 'masks' (np.ndarray, stacked instance masks), 'bboxes' (np.ndarray), 'keypoints' (np.ndarray), and label fields supplied via the `bbox_labels` and `keypoint_labels` wrapper dicts (see Metadata Format below). Default: "mosaic_metadata".
- center_range (tuple[float, float]): Range [0.0, 1.0] from which the center point of the mosaic view is sampled, relative to the valid central region of the conceptual large grid. This affects which parts of the assembled grid are visible in the final crop. Default: (0.3, 0.7).
- interpolation (int): OpenCV interpolation flag used for resizing images during geometric processing. Default: cv2.INTER_LINEAR.
- mask_interpolation (int): OpenCV interpolation flag used for resizing masks during geometric processing. Default: cv2.INTER_NEAREST.
- fill (tuple[float, ...] | float): Value used for padding images if needed during geometric processing. Default: 0.
- fill_mask (tuple[float, ...] | float): Value used for padding masks if needed during geometric processing. Default: 0.
- p (float): Probability of applying the transform. Default: 0.5.
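To make the `fit_mode` behavior concrete, here is a minimal, illustrative sketch of how a resize scale could be chosen for a single cell. This is not the transform's internal code: `compute_cell_scale` is a hypothetical helper, and the image and cell sizes are made up.

```python
import cv2
import numpy as np

def compute_cell_scale(image_hw, cell_hw, fit_mode):
    """Illustrative only: choose a resize scale for one mosaic cell."""
    ih, iw = image_hw
    ch, cw = cell_hw
    if fit_mode == "cover":
        # Fill the whole cell; the larger ratio wins and any overflow is cropped.
        return max(ch / ih, cw / iw)
    # "contain": fit entirely inside the cell; the smaller ratio wins and the
    # remaining area is padded with `fill` / `fill_mask`.
    return min(ch / ih, cw / iw)

image = np.zeros((100, 160, 3), dtype=np.uint8)  # (H, W, C)
scale = compute_cell_scale(image.shape[:2], (120, 120), "cover")
resized = cv2.resize(image, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
print(resized.shape)  # (120, 192, 3): covers the 120x120 cell, then would be cropped to it
```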
Workflow (`get_params_dependent_on_data`):
1. Calculate Geometry & Visible Cells: Determine which grid cells are visible in the final `target_size` crop and their placement coordinates on the output canvas.
2. Validate Raw Additional Metadata: Filter the list provided via `metadata_key`, keeping only valid items (dicts with an 'image' key).
3. Select Subset of Raw Additional Metadata: Choose a subset of the valid raw items based on the number of visible cells requiring additional data.
4. Preprocess Selected Raw Additional Items: Preprocess bboxes/keypoints for the *selected* additional items *only*. This uses shared processors from `Compose`, updating their internal state (e.g., `LabelEncoder`) based on labels in these selected items.
5. Prepare Primary Data: Extract the preprocessed primary data fields from the input `data` dictionary into a `primary` dictionary.
6. Determine & Perform Replication: If fewer additional items were selected than needed, replicate the preprocessed primary data as required.
7. Combine Final Items: Create the list of all preprocessed items (primary, selected additional, replicated primary) that will be used.
8. Assign Items to Visible Grid Cells.
9. Process Geometry & Shift Coordinates: For each assigned item:
   a. Apply geometric transforms to the image/mask based on `fit_mode`:
      - "cover": Resize by the larger scale ratio so the image covers the cell, then crop to the cell size.
      - "contain": Resize by the smaller scale ratio so the image fits inside the cell, then pad to the cell size.
   b. Apply a geometric shift to the *preprocessed* bboxes/keypoints based on the cell placement.
10. Return Parameters: Return the processed cell data (image, mask, shifted bboxes, shifted keypoints) keyed by placement coordinates.

Label Handling:
- The transform relies on the `bbox_processor` and `keypoint_processor` provided by `Compose`.
- `Compose.preprocess` initially fits the processors' `LabelEncoder` on the primary data.
- This transform (`Mosaic`) preprocesses the *selected* additional raw items using the same processors. If new labels are found, the shared `LabelEncoder` state is updated via its `update` method.
- `Compose.postprocess` uses the final updated encoder state to decode all labels present in the mosaic output for the current `Compose` call.
- The encoder state is transient per `Compose` call.

Note: If fewer additional images are provided than needed to fill the grid, the primary image is replicated to fill the remaining cells. For example, with a 2x2 grid and only one additional image, the mosaic contains the primary image in two cells and the additional image in one cell, with the content of each visible cell drawn from these three items (see the counting sketch below). Stacked instance masks on the `masks` key (N, H, W) are transformed via `apply_to_masks` like other DualTransforms; `_targets` only lists `Targets` enum values (no `Targets.MASKS`).

Targets: image, mask, bboxes, keypoints
Image types: uint8, float32
Supported bboxes: hbb, obb

Reference: YOLOv4: Optimal Speed and Accuracy of Object Detection, https://arxiv.org/pdf/2004.10934
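Returning to the replication rule in the note above, the following sketch counts how many additional items are used versus how many times the primary item is replicated. `plan_cell_items` is a hypothetical name for illustration only, not part of the transform.

```python
def plan_cell_items(num_visible_cells, num_valid_metadata_items):
    """Illustrative only: one visible cell always holds the primary item."""
    needed_additional = num_visible_cells - 1
    used_additional = min(needed_additional, num_valid_metadata_items)
    replicated_primary = needed_additional - used_additional
    return used_additional, replicated_primary

# 2x2 grid, three visible cells after the crop, one metadata item provided:
# the additional image fills one cell and the primary image is replicated once,
# so the primary appears in two cells in total (matching the note above).
print(plan_cell_items(num_visible_cells=3, num_valid_metadata_items=1))  # (1, 1)
```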
Metadata Format: Each dict in the metadata list represents one additional image and must contain:
- image (np.ndarray): Additional image. Required.
- mask (np.ndarray): Semantic mask for the additional image. Optional.
- masks (np.ndarray): Stacked instance masks (N, H, W) for the additional image. Optional; same geometry as the image. Use with instance_binding / the pipeline's masks target.
- bboxes (np.ndarray): Bounding boxes in the **same coordinate format** as the `format` declared in `BboxParams` of `Compose`. Optional.
- keypoints (np.ndarray): Keypoints in the **same format** as the `format` declared in `KeypointParams` of `Compose`. Optional.
- bbox_labels (dict[str, Any]): Label lists for bboxes, keyed by label field name as declared in `BboxParams.label_fields`. Each value must be a list with one entry per bbox, e.g. `{"class_id": [3, 7], "is_crowd": [0, 1]}`.
- keypoint_labels (dict[str, Any]): Label lists for keypoints, keyed by label field name as declared in `KeypointParams.label_fields`. Each value must be a list with one entry per keypoint, e.g. `{"joint_name": ["left_eye", "nose"]}`.

Examples:

```python
import numpy as np
import albumentations as A
import cv2

# Prepare primary data
primary_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
primary_mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
primary_bboxes = np.array([[10, 10, 40, 40], [50, 50, 90, 90]], dtype=np.float32)
primary_bbox_classes = [1, 2]
primary_keypoints = np.array([[25, 25], [75, 75]], dtype=np.float32)
primary_keypoint_classes = ['eye', 'nose']

# Prepare additional images for the mosaic.
# bbox_labels and keypoint_labels are dicts mapping field name -> list of values.
mosaic_metadata = [
    {
        'image': np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8),
        'mask': np.random.randint(0, 2, (100, 100), dtype=np.uint8),
        'bboxes': np.array([[20, 20, 60, 60]], dtype=np.float32),
        'bbox_labels': {'bbox_classes': [3]},
        'keypoints': np.array([[40, 40]], dtype=np.float32),
        'keypoint_labels': {'keypoint_classes': ['mouth']},
    },
    {
        'image': np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8),
        'mask': np.random.randint(0, 2, (100, 100), dtype=np.uint8),
        'bboxes': np.array([[30, 30, 70, 70]], dtype=np.float32),
        'bbox_labels': {'bbox_classes': [4]},
        'keypoints': np.array([[50, 50], [65, 65]], dtype=np.float32),
        'keypoint_labels': {'keypoint_classes': ['eye', 'eye']},
    },
]

transform = A.Compose([
    A.Mosaic(
        grid_yx=(2, 2),
        target_size=(200, 200),
        cell_shape=(120, 120),
        center_range=(0.4, 0.6),
        fit_mode="cover",
        p=1.0
    ),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_classes']),
   keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_classes']))

transformed = transform(
    image=primary_image,
    mask=primary_mask,
    bboxes=primary_bboxes,
    bbox_classes=primary_bbox_classes,
    keypoints=primary_keypoints,
    keypoint_classes=primary_keypoint_classes,
    mosaic_metadata=mosaic_metadata,
)

mosaic_image = transformed['image']
mosaic_bboxes = transformed['bboxes']
mosaic_bbox_classes = transformed['bbox_classes']
mosaic_keypoints = transformed['keypoints']
mosaic_keypoint_classes = transformed['keypoint_classes']
```
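The example above uses a semantic `mask`; a minimal sketch of the stacked instance-mask variant described under Metadata Format might look like the following. It assumes the pipeline accepts a stacked `(N, H, W)` array for the `masks` target, as stated above; all shapes and values here are made up for illustration.

```python
import numpy as np
import albumentations as A

primary_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
# Two instance masks for the primary image, stacked along axis 0 -> (N, H, W).
primary_masks = np.stack([
    np.random.randint(0, 2, (100, 100), dtype=np.uint8),
    np.random.randint(0, 2, (100, 100), dtype=np.uint8),
])

mosaic_metadata = [{
    'image': np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8),
    # Stacked instance masks for this additional item, same (H, W) as its image.
    'masks': np.random.randint(0, 2, (1, 100, 100), dtype=np.uint8),
}]

transform = A.Compose([
    A.Mosaic(grid_yx=(2, 2), target_size=(200, 200), cell_shape=(120, 120), p=1.0),
])

out = transform(image=primary_image, masks=primary_masks, mosaic_metadata=mosaic_metadata)
mosaic_masks = out['masks']  # instance masks mapped onto the mosaic canvas
```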
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| grid_yx | tuple[int, int] | (2, 2) | - |
| target_size | tuple[int, int] | (512, 512) | - |
| cell_shape | tuple[int, int] | (512, 512) | - |
| center_range | tuple[float, float] | (0.3, 0.7) | - |
| fit_mode | One of: 'cover', 'contain' | cover | - |
| interpolation | One of: cv2.INTER_NEAREST, cv2.INTER_NEAREST_EXACT, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_LINEAR_EXACT | cv2.INTER_LINEAR | - |
| mask_interpolation | One of: cv2.INTER_NEAREST, cv2.INTER_NEAREST_EXACT, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_LINEAR_EXACT | cv2.INTER_NEAREST | - |
| fill | One of: tuple[float, ...], float | 0 | - |
| fill_mask | One of: tuple[float, ...], float | 0 | - |
| metadata_key | str | mosaic_metadata | - |
| p | float | 0.5 | - |