albumentations.augmentations.mixing.functional


Functional implementations for image mixing operations. This module provides utility functions for blending and combining images, such as copy-and-paste operations with masking.

ProcessedMosaicItem class

ProcessedMosaicItem()

Represents a single data item (primary or additional) after preprocessing. Includes the original image/mask and the *preprocessed* annotations.

copy_and_paste_blend function

copy_and_paste_blend(
    base_image: np.ndarray,
    overlay_image: np.ndarray,
    overlay_mask: np.ndarray,
    offset: tuple[int, int]
)

Blend images by copying pixels from an overlay image to a base image using a mask.

This function copies pixels from the overlay image to the base image only where the mask has non-zero values. The overlay is placed at the specified offset from the top-left corner of the base image.

Args:
  • base_image (np.ndarray): The destination image that will be modified.
  • overlay_image (np.ndarray): The source image containing pixels to copy.
  • overlay_mask (np.ndarray): Binary mask indicating which pixels to copy from the overlay. Pixels are copied where mask > 0.
  • offset (tuple[int, int]): The (y, x) offset specifying where to place the top-left corner of the overlay relative to the base image.

Returns:
  • np.ndarray: The blended image with the overlay applied to the base image.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| base_image | np.ndarray | - | - |
| overlay_image | np.ndarray | - | - |
| overlay_mask | np.ndarray | - | - |
| offset | tuple[int, int] | - | - |
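
The masked copy described above can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the library's source: the name `copy_and_paste_blend_sketch` is hypothetical, and the sketch assumes the overlay fits inside the base image at the given offset and that the input is copied rather than modified in place.

```python
import numpy as np


def copy_and_paste_blend_sketch(base_image, overlay_image, overlay_mask, offset):
    """Hypothetical stand-in for copy_and_paste_blend (assumptions noted above)."""
    y_offset, x_offset = offset  # offset is documented as (y, x)
    blended = base_image.copy()  # assumption: leave the caller's array untouched
    rows, cols = np.nonzero(overlay_mask > 0)  # overlay pixels selected by the mask
    # Copy the selected overlay pixels to their shifted location in the base.
    blended[rows + y_offset, cols + x_offset] = overlay_image[rows, cols]
    return blended


base = np.zeros((4, 4), dtype=np.uint8)
overlay = np.full((2, 2), 9, dtype=np.uint8)
mask = np.array([[1, 0], [0, 1]], dtype=np.uint8)
result = copy_and_paste_blend_sketch(base, overlay, mask, offset=(1, 2))
# Only base positions (1, 2) and (2, 3) receive the overlay value 9.
```

Integer-array indexing keeps the copy vectorized; pixels where the mask is zero are never touched.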

calculate_mosaic_center_point function

calculate_mosaic_center_point(
    grid_yx: tuple[int, int],
    cell_shape: tuple[int, int],
    target_size: tuple[int, int],
    center_range: tuple[float, float],
    py_random: random.Random
)

Calculates the center point for the mosaic crop using proportional sampling within the valid zone.

Ensures the center point allows a crop of target_size to overlap all grid cells, applying randomness based on center_range proportionally within the valid region where the center can lie.

Args:
  • grid_yx (tuple[int, int]): The (rows, cols) of the mosaic grid.
  • cell_shape (tuple[int, int]): Shape of each cell in the mosaic grid.
  • target_size (tuple[int, int]): The final output (height, width).
  • center_range (tuple[float, float]): Range [0.0-1.0] for sampling center proportionally within the valid zone.
  • py_random (random.Random): Random state instance.

Returns:
  • tuple[int, int]: The calculated (x, y) center point relative to the top-left of the conceptual large grid.

Parameters

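| Name | Type | Default | Description |
| --- | --- | --- | --- |
| grid_yx | tuple[int, int] | - | - |
| cell_shape | tuple[int, int] | - | - |
| target_size | tuple[int, int] | - | - |
| center_range | tuple[float, float] | - | - |
| py_random | random.Random | - | - |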
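
A rough sketch of proportional sampling follows. The function name `mosaic_center_point_sketch` is hypothetical, and the definition of the valid zone used here (all centers for which the crop window stays inside the large grid) is an assumption; the library may bound the zone differently to guarantee overlap with every cell.

```python
import random


def mosaic_center_point_sketch(grid_yx, cell_shape, target_size, center_range, py_random):
    """Hypothetical illustration of proportional center sampling."""
    rows, cols = grid_yx
    cell_h, cell_w = cell_shape
    target_h, target_w = target_size
    grid_h, grid_w = rows * cell_h, cols * cell_w

    # Assumed valid zone: the crop window must fit inside the large grid.
    min_cx, max_cx = target_w // 2, grid_w - (target_w + 1) // 2
    min_cy, max_cy = target_h // 2, grid_h - (target_h + 1) // 2

    # Sample one proportion per axis, then map it onto the valid zone.
    rel_x = py_random.uniform(*center_range)
    rel_y = py_random.uniform(*center_range)
    center_x = min_cx + int(max(0, max_cx - min_cx) * rel_x)
    center_y = min_cy + int(max(0, max_cy - min_cy) * rel_y)
    return center_x, center_y  # (x, y), relative to the large grid's top-left


rng = random.Random(0)
cx, cy = mosaic_center_point_sketch((2, 2), (100, 100), (100, 100), (0.3, 0.7), rng)
fixed = mosaic_center_point_sketch((2, 2), (100, 100), (100, 100), (0.5, 0.5), rng)
```

With a 2x2 grid of 100x100 cells and a 100x100 target, the assumed valid zone spans 50 to 150 on each axis, so a `center_range` of (0.3, 0.7) keeps the center between 80 and 120.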

calculate_cell_placements function

calculate_cell_placements(
    grid_yx: tuple[int, int],
    cell_shape: tuple[int, int],
    target_size: tuple[int, int],
    center_xy: tuple[int, int]
)

Calculates placements by clipping arange-defined grid lines to the crop window.

Args:
  • grid_yx (tuple[int, int]): The (rows, cols) of the mosaic grid.
  • cell_shape (tuple[int, int]): Shape of each cell in the mosaic grid.
  • target_size (tuple[int, int]): The final output (height, width).
  • center_xy (tuple[int, int]): The calculated (x, y) center of the final crop window, relative to the top-left of the conceptual large grid.

Returns:
  • list[tuple[int, int, int, int]]: A list containing placement coordinates `(x_min, y_min, x_max, y_max)` for each resulting cell part on the final output canvas.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| grid_yx | tuple[int, int] | - | - |
| cell_shape | tuple[int, int] | - | - |
| target_size | tuple[int, int] | - | - |
| center_xy | tuple[int, int] | - | - |
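
To make the grid-line clipping concrete, here is a minimal pure-Python sketch. The name `cell_placements_sketch` and the way the crop window is derived from the center (`center - target // 2`) are assumptions made for illustration, not the library's code.

```python
def cell_placements_sketch(grid_yx, cell_shape, target_size, center_xy):
    """Hypothetical illustration: clip grid lines to the crop window."""
    rows, cols = grid_yx
    cell_h, cell_w = cell_shape
    target_h, target_w = target_size
    center_x, center_y = center_xy

    # Crop window in large-grid coordinates (assumed derivation from the center).
    crop_x_min = center_x - target_w // 2
    crop_y_min = center_y - target_h // 2
    crop_x_max = crop_x_min + target_w
    crop_y_max = crop_y_min + target_h

    def clip_lines(n_cells, cell_size, lo, hi):
        # Grid lines at 0, cell_size, ..., n_cells * cell_size are clipped to
        # [lo, hi], de-duplicated, and shifted so the crop origin becomes 0.
        lines = {min(max(i * cell_size, lo), hi) for i in range(n_cells + 1)}
        return [value - lo for value in sorted(lines)]

    ys = clip_lines(rows, cell_h, crop_y_min, crop_y_max)
    xs = clip_lines(cols, cell_w, crop_x_min, crop_x_max)

    # Consecutive line pairs on both axes bound one visible cell part each.
    return [
        (x0, y0, x1, y1)
        for y0, y1 in zip(ys, ys[1:])
        for x0, x1 in zip(xs, xs[1:])
    ]


placements = cell_placements_sketch((2, 2), (100, 100), (100, 100), (100, 100))
# -> [(0, 0, 50, 50), (50, 0, 100, 50), (0, 50, 50, 100), (50, 50, 100, 100)]
single = cell_placements_sketch((2, 2), (100, 100), (100, 100), (50, 50))
# -> [(0, 0, 100, 100)]
```

Grid lines that fall outside the window collapse onto its edges, so cells that the crop does not touch produce no placement.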

filter_valid_metadata function

filter_valid_metadata(
    metadata_input: Sequence[dict[str, Any]] | None,
    metadata_key_name: str,
    data: dict[str, Any]
)

Filters a list of metadata dicts, keeping only valid ones based on data compatibility.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| metadata_input | One of: Sequence[dict[str, Any]], None | - | - |
| metadata_key_name | str | - | - |
| data | dict[str, Any] | - | - |

assign_items_to_grid_cells function

assign_items_to_grid_cells(
    num_items: int,
    cell_placements: list[tuple[int, int, int, int]],
    py_random: random.Random
)

Assigns item indices to placement coordinate tuples.

Assigns the primary item (index 0) to the placement with the largest area, and assigns the remaining items (indices 1 to num_items-1) randomly to the remaining placements.

Args:
  • num_items (int): The total number of items to assign (primary + additional + replicas).
  • cell_placements (list[tuple[int, int, int, int]]): List of placement coords (x1, y1, x2, y2) for cells to be filled.
  • py_random (random.Random): Random state instance.

Returns:
  • dict[tuple[int, int, int, int], int]: Dict mapping placement coords (x1, y1, x2, y2) to assigned item index.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| num_items | int | - | - |
| cell_placements | list[tuple[int, int, int, int]] | - | - |
| py_random | random.Random | - | - |
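
The assignment rule can be sketched as below. The name `assign_items_sketch` is hypothetical, and using a shuffle for the random part is an assumption; the sketch also assumes placements are unique, as they are when they come from a grid.

```python
import random


def assign_items_sketch(num_items, cell_placements, py_random):
    """Hypothetical illustration: primary item gets the largest placement."""
    if not cell_placements:
        return {}

    def area(placement):
        x1, y1, x2, y2 = placement
        return (x2 - x1) * (y2 - y1)

    # Item 0 (the primary) goes to the placement with the largest area.
    primary = max(cell_placements, key=area)
    assignment = {primary: 0}

    # Remaining items are shuffled, then paired with the remaining placements.
    remaining_placements = [p for p in cell_placements if p != primary]
    remaining_items = list(range(1, num_items))
    py_random.shuffle(remaining_items)
    assignment.update(zip(remaining_placements, remaining_items))
    return assignment


cells = [(0, 0, 30, 30), (30, 0, 100, 30), (0, 30, 30, 100), (30, 30, 100, 100)]
mapping = assign_items_sketch(4, cells, random.Random(42))
```

Giving the largest placement to the primary item keeps the original image the most visible part of the mosaic.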

preprocess_selected_mosaic_items function

preprocess_selected_mosaic_items(
    selected_raw_items: list[dict[str, Any]],
    bbox_processor: BboxProcessor | None,
    keypoint_processor: KeypointsProcessor | None
)

Preprocesses bboxes/keypoints for selected raw additional items. Iterates through items, preprocesses annotations individually using processors (updating label encoders), and returns a list of dicts with original image/mask and the corresponding preprocessed bboxes/keypoints.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| selected_raw_items | list[dict[str, Any]] | - | - |
| bbox_processor | One of: BboxProcessor, None | - | - |
| keypoint_processor | One of: KeypointsProcessor, None | - | - |

get_opposite_crop_coords function

get_opposite_crop_coords(
    cell_size: tuple[int, int],
    crop_size: tuple[int, int],
    cell_position: Literal['top_left', 'top_right', 'center', 'bottom_left', 'bottom_right']
)

Calculates crop coordinates positioned opposite to the specified cell_position.

Given a cell of `cell_size`, this function determines the top-left (x_min, y_min) and bottom-right (x_max, y_max) coordinates for a crop of `crop_size`, such that the crop is located in the corner or center opposite to `cell_position`. For example, if `cell_position` is "top_left", the crop coordinates will correspond to the bottom-right region of the cell.

Args:
  • cell_size: The (height, width) of the cell from which to crop.
  • crop_size: The (height, width) of the desired crop.
  • cell_position: The reference position within the cell. The crop will be taken from the opposite position.

Returns:
  • tuple[int, int, int, int]: (x_min, y_min, x_max, y_max) representing the crop coordinates.

Raises:
  • ValueError: If crop_size is larger than cell_size in either dimension.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| cell_size | tuple[int, int] | - | - |
| crop_size | tuple[int, int] | - | - |
| cell_position | One of: 'top_left', 'top_right', 'center', 'bottom_left', 'bottom_right' | - | - |
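
The "opposite corner" rule is easy to get backwards, so a small worked sketch may help. The function name `opposite_crop_coords_sketch` is hypothetical, and integer floor division for the centered case is an assumption.

```python
def opposite_crop_coords_sketch(cell_size, crop_size, cell_position):
    """Hypothetical illustration: crop from the corner opposite cell_position."""
    cell_h, cell_w = cell_size
    crop_h, crop_w = crop_size
    if crop_h > cell_h or crop_w > cell_w:
        raise ValueError("crop_size must not exceed cell_size in either dimension")

    if cell_position == "center":
        # Centered crop (assumption: floor division for odd margins).
        x_min = (cell_w - crop_w) // 2
        y_min = (cell_h - crop_h) // 2
    else:
        vertical, horizontal = cell_position.split("_")
        # A "left" position crops from the right edge, a "top" one from the bottom.
        x_min = cell_w - crop_w if horizontal == "left" else 0
        y_min = cell_h - crop_h if vertical == "top" else 0
    return x_min, y_min, x_min + crop_w, y_min + crop_h


print(opposite_crop_coords_sketch((100, 200), (40, 60), "top_left"))      # (140, 60, 200, 100)
print(opposite_crop_coords_sketch((100, 200), (40, 60), "bottom_right"))  # (0, 0, 60, 40)
print(opposite_crop_coords_sketch((100, 200), (40, 60), "center"))        # (70, 30, 130, 70)
```

Note the mixed conventions: sizes are passed as (height, width), while the returned coordinates are ordered x first.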

process_cell_geometry function

process_cell_geometry(
    cell_shape: tuple[int, int],
    item: ProcessedMosaicItem,
    target_shape: tuple[int, int],
    fill: float | tuple[float, ...],
    fill_mask: float | tuple[float, ...],
    fit_mode: Literal['cover', 'contain'],
    interpolation: int,
    mask_interpolation: int,
    cell_position: Literal['top_left', 'top_right', 'center', 'bottom_left', 'bottom_right']
)

Applies geometric transformations (padding and/or cropping) to a single mosaic item.

Uses a Compose pipeline with PadIfNeeded and Crop to ensure the output matches the target cell dimensions exactly, handling both padding and cropping cases.

Args:
  • cell_shape (tuple[int, int]): Shape of the cell.
  • item (ProcessedMosaicItem): The preprocessed mosaic item dictionary.
  • target_shape (tuple[int, int]): Target shape of the cell.
  • fill (float | tuple[float, ...]): Fill value for image padding.
  • fill_mask (float | tuple[float, ...]): Fill value for mask padding.
  • fit_mode (Literal["cover", "contain"]): Fit mode for the mosaic.
  • interpolation (int): Interpolation method for image.
  • mask_interpolation (int): Interpolation method for mask.
  • cell_position (Literal["top_left", "top_right", "center", "bottom_left", "bottom_right"]): Position of the cell.

Returns:
  • ProcessedMosaicItem: Dictionary containing the geometrically processed image, mask, bboxes, and keypoints, fitting the target dimensions.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| cell_shape | tuple[int, int] | - | - |
| item | ProcessedMosaicItem | - | - |
| target_shape | tuple[int, int] | - | - |
| fill | One of: float, tuple[float, ...] | - | - |
| fill_mask | One of: float, tuple[float, ...] | - | - |
| fit_mode | One of: 'cover', 'contain' | - | - |
| interpolation | int | - | - |
| mask_interpolation | int | - | - |
| cell_position | One of: 'top_left', 'top_right', 'center', 'bottom_left', 'bottom_right' | - | - |

shift_cell_coordinates function

shift_cell_coordinates(
    processed_item_geom: ProcessedMosaicItem,
    placement_coords: tuple[int, int, int, int]
)

Shifts the coordinates of geometrically processed bboxes and keypoints.

Args:
  • processed_item_geom (ProcessedMosaicItem): The output from process_cell_geometry.
  • placement_coords (tuple[int, int, int, int]): The (x1, y1, x2, y2) placement on the final canvas.

Returns:
  • ProcessedMosaicItem: A dictionary with keys 'bboxes' and 'keypoints', containing the shifted numpy arrays (potentially empty).

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| processed_item_geom | ProcessedMosaicItem | - | - |
| placement_coords | tuple[int, int, int, int] | - | - |
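
As a rough illustration of the shift, the sketch below translates cell-local coordinates by the placement's top-left corner. The name `shift_coordinates_sketch` is hypothetical, and the array layouts are assumptions: bbox rows start with pixel `x_min, y_min, x_max, y_max` and keypoint rows start with `x, y`, with any extra columns (labels, angles) left untouched.

```python
import numpy as np


def shift_coordinates_sketch(bboxes, keypoints, placement_coords):
    """Hypothetical illustration: move cell-local coords onto the canvas."""
    x1, y1, _, _ = placement_coords  # only the top-left corner drives the shift
    shifted_bboxes = bboxes.astype(float)  # astype returns a copy
    shifted_keypoints = keypoints.astype(float)
    if shifted_bboxes.size:
        shifted_bboxes[:, [0, 2]] += x1  # x_min, x_max
        shifted_bboxes[:, [1, 3]] += y1  # y_min, y_max
    if shifted_keypoints.size:
        shifted_keypoints[:, 0] += x1
        shifted_keypoints[:, 1] += y1
    return {"bboxes": shifted_bboxes, "keypoints": shifted_keypoints}


local_boxes = np.array([[10.0, 20.0, 30.0, 40.0, 1.0]])  # last column: a label id
no_keypoints = np.zeros((0, 2))
shifted = shift_coordinates_sketch(local_boxes, no_keypoints, (50, 60, 150, 160))
# The box moves to [60, 80, 80, 100]; the label column stays 1.
```

Empty arrays pass through unchanged, which matches the "potentially empty" note in the return description.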

assemble_mosaic_from_processed_cells function

assemble_mosaic_from_processed_cells(
    processed_cells: dict[tuple[int, int, int, int], dict[str, Any]],
    target_shape: tuple[int, ...],
    dtype: np.dtype,
    data_key: Literal['image', 'mask'],
    fill: float | tuple[float, ...] | None
)

Assembles the final mosaic image or mask from processed cell data onto a canvas.

Initializes the canvas with the fill value and overwrites with processed segments. Handles potentially multi-channel masks. Addresses potential broadcasting errors if mask segments have unexpected dimensions. Assumes input data is valid and correctly sized.

Args:
  • processed_cells (dict[tuple[int, int, int, int], dict[str, Any]]): Dictionary mapping placement coords to processed cell data.
  • target_shape (tuple[int, ...]): The target shape of the output canvas (e.g., (H, W) or (H, W, C)).
  • dtype (np.dtype): NumPy dtype for the canvas.
  • data_key (Literal["image", "mask"]): Specifies whether to assemble 'image' or 'mask'.
  • fill (float | tuple[float, ...] | None): Value used to initialize the canvas (image fill or mask fill). Should be a float/int or a tuple matching the number of channels. If None, defaults to 0.

Returns:
  • np.ndarray: The assembled mosaic canvas.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| processed_cells | dict[tuple[int, int, int, int], dict[str, Any]] | - | - |
| target_shape | tuple[int, ...] | - | - |
| dtype | np.dtype | - | - |
| data_key | One of: 'image', 'mask' | - | - |
| fill | One of: float, tuple[float, ...], None | - | - |
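
The fill-then-overwrite assembly can be sketched as follows. The name `assemble_mosaic_sketch` is hypothetical; the sketch assumes every segment already has exactly the shape of its placement and skips the dimension handling the description mentions for masks.

```python
import numpy as np


def assemble_mosaic_sketch(processed_cells, target_shape, dtype, data_key, fill=None):
    """Hypothetical illustration: paint cell segments onto a filled canvas."""
    # A None fill falls back to 0, as documented for the real function.
    canvas = np.full(target_shape, 0 if fill is None else fill, dtype=dtype)
    for (x1, y1, x2, y2), cell in processed_cells.items():
        segment = cell.get(data_key)
        if segment is None:
            continue  # assumption: cells without this key leave the fill visible
        canvas[y1:y2, x1:x2] = segment  # rows are indexed by y, columns by x
    return canvas


cells = {
    (0, 0, 2, 2): {"image": np.ones((2, 2), dtype=np.uint8)},
    (2, 0, 4, 2): {"image": np.full((2, 2), 5, dtype=np.uint8)},
}
canvas = assemble_mosaic_sketch(cells, (3, 4), np.uint8, "image", fill=7)
# Rows 0 and 1 become [1, 1, 5, 5]; the uncovered bottom row keeps the fill 7.
```

Because placements are given as (x1, y1, x2, y2) while NumPy indexes rows first, the slice order `[y1:y2, x1:x2]` is the detail most worth double-checking.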

process_all_mosaic_geometries function

process_all_mosaic_geometries(
    canvas_shape: tuple[int, int],
    cell_shape: tuple[int, int],
    placement_to_item_index: dict[tuple[int, int, int, int], int],
    final_items_for_grid: list[ProcessedMosaicItem],
    fill: float | tuple[float, ...],
    fill_mask: float | tuple[float, ...],
    fit_mode: Literal['cover', 'contain'],
    interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_NEAREST_EXACT, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_LINEAR_EXACT],
    mask_interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_NEAREST_EXACT, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_LINEAR_EXACT]
)

Processes the geometry (cropping/padding) for all assigned mosaic cells.

Iterates through assigned placements, applies geometric transforms via process_cell_geometry, and returns a dictionary mapping final placement coordinates to the processed item data. The bbox/keypoint coordinates in the returned dict are *not* shifted yet.

Args:
  • canvas_shape (tuple[int, int]): The shape of the canvas.
  • cell_shape (tuple[int, int]): Shape of each cell in the mosaic grid.
  • placement_to_item_index (dict[tuple[int, int, int, int], int]): Mapping from placement coordinates (x1, y1, x2, y2) to assigned item index.
  • final_items_for_grid (list[ProcessedMosaicItem]): List of all preprocessed items available.
  • fill (float | tuple[float, ...]): Fill value for image padding.
  • fill_mask (float | tuple[float, ...]): Fill value for mask padding.
  • fit_mode (Literal["cover", "contain"]): Fit mode for the mosaic.
  • interpolation (int): Interpolation method for image.
  • mask_interpolation (int): Interpolation method for mask.

Returns:
  • dict[tuple[int, int, int, int], ProcessedMosaicItem]: Dictionary mapping final placement coordinates (x1, y1, x2, y2) to the geometrically processed item data (image, mask, un-shifted bboxes/kps).

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| canvas_shape | tuple[int, int] | - | - |
| cell_shape | tuple[int, int] | - | - |
| placement_to_item_index | dict[tuple[int, int, int, int], int] | - | - |
| final_items_for_grid | list[ProcessedMosaicItem] | - | - |
| fill | One of: float, tuple[float, ...] | - | - |
| fill_mask | One of: float, tuple[float, ...] | - | - |
| fit_mode | One of: 'cover', 'contain' | - | - |
| interpolation | One of: cv2.INTER_NEAREST, cv2.INTER_NEAREST_EXACT, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_LINEAR_EXACT | - | - |
| mask_interpolation | One of: cv2.INTER_NEAREST, cv2.INTER_NEAREST_EXACT, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_LINEAR_EXACT | - | - |

get_cell_relative_position function

get_cell_relative_position(
    placement_coords: tuple[int, int, int, int],
    target_shape: tuple[int, int]
)

Determines the position of a cell relative to the center of the target canvas.

Compares the cell center to the canvas center and returns its quadrant or "center" if it lies on or very close to a central axis.

Args:
  • placement_coords (tuple[int, int, int, int]): The (x_min, y_min, x_max, y_max) coordinates of the cell.
  • target_shape (tuple[int, int]): The (height, width) of the overall target canvas.

Returns:
  • Literal["top_left", "top_right", "center", "bottom_left", "bottom_right"]: The position of the cell relative to the center of the target canvas.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| placement_coords | tuple[int, int, int, int] | - | - |
| target_shape | tuple[int, int] | - | - |
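
A minimal sketch of the quadrant test is shown below. The name `cell_relative_position_sketch` is hypothetical, and using `math.isclose` with its default tolerance for "very close to a central axis" is an assumption.

```python
import math


def cell_relative_position_sketch(placement_coords, target_shape):
    """Hypothetical illustration: classify a cell by its center's quadrant."""
    x_min, y_min, x_max, y_max = placement_coords
    target_h, target_w = target_shape
    cell_cx, cell_cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    canvas_cx, canvas_cy = target_w / 2, target_h / 2

    # A cell centered on either central axis is reported as "center".
    if math.isclose(cell_cx, canvas_cx) or math.isclose(cell_cy, canvas_cy):
        return "center"
    vertical = "top" if cell_cy < canvas_cy else "bottom"
    horizontal = "left" if cell_cx < canvas_cx else "right"
    return f"{vertical}_{horizontal}"


print(cell_relative_position_sketch((0, 0, 40, 40), (100, 100)))      # top_left
print(cell_relative_position_sketch((40, 40, 100, 100), (100, 100)))  # bottom_right
print(cell_relative_position_sketch((0, 0, 100, 100), (100, 100)))    # center
```

A single cell covering the whole canvas is therefore reported as "center", which is the case where no opposite corner exists to crop from.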

shift_all_coordinates function

shift_all_coordinates(
    processed_cells_geom: dict[tuple[int, int, int, int], ProcessedMosaicItem],
    canvas_shape: tuple[int, int]
)

Shifts coordinates for all geometrically processed cells.

Iterates through the processed cells (keyed by placement coords), applies coordinate shifting to bboxes/keypoints, and returns a new dictionary with the same keys but updated ProcessedMosaicItem values containing the *shifted* coordinates.

Args:
  • processed_cells_geom (dict[tuple[int, int, int, int], ProcessedMosaicItem]): Output from process_all_mosaic_geometries (keyed by placement coords).
  • canvas_shape (tuple[int, int]): The shape of the canvas.

Returns:
  • dict[tuple[int, int, int, int], ProcessedMosaicItem]: Final dictionary mapping placement coords (x1, y1, x2, y2) to processed cell data with shifted coordinates.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| processed_cells_geom | dict[tuple[int, int, int, int], ProcessedMosaicItem] | - | - |
| canvas_shape | tuple[int, int] | - | - |