albumentations.core.bbox_utils


Utilities for handling bounding box operations during image augmentation. This module provides tools for processing bounding boxes in various formats (COCO, Pascal VOC, YOLO, cxcywh), converting between coordinate systems, normalizing and denormalizing coordinates, filtering boxes based on visibility and size criteria, and performing transformations on boxes to match image augmentations. It forms the core functionality for all bounding box-related operations in the albumentations library.

BboxParams class

BboxParams(
    coord_format: Literal['coco', 'pascal_voc', 'albumentations', 'yolo', 'cxcywh'],
    label_fields: Sequence[Any] | None = None,
    bbox_type: Literal['hbb', 'obb'] = 'hbb',
    min_area: float = 0.0,
    min_visibility: float = 0.0,
    min_width: float = 0.0,
    min_height: float = 0.0,
    check_each_transform: bool = True,
    filter_invalid_bboxes: bool = False,
    max_accept_ratio: float | None = None,
    clip_bboxes_on_input: bool = False,
    clip_after_transform: bool = True
)

Parameters for bounding box transforms.

Args:
    coord_format (Literal["coco", "pascal_voc", "albumentations", "yolo", "cxcywh"]): Coordinate format of bounding boxes. Should be one of:
        - 'coco': [x_min, y_min, width, height], e.g. [97, 12, 150, 200].
        - 'pascal_voc': [x_min, y_min, x_max, y_max], e.g. [97, 12, 247, 212].
        - 'albumentations': like pascal_voc but normalized to the [0, 1] range, e.g. [0.2, 0.3, 0.4, 0.5].
        - 'yolo': [x_center, y_center, width, height] normalized to the [0, 1] range, e.g. [0.1, 0.2, 0.3, 0.4].
        - 'cxcywh': [x_center, y_center, width, height] in pixel coordinates, e.g. [50, 50, 40, 60].
    bbox_type (Literal["hbb", "obb"]): Bounding box type.
        - 'hbb': axis-aligned boxes with 4 coords (default).
        - 'obb': oriented boxes with the angle as the 5th coord.
    label_fields (Sequence[Any] | None): List of fields that are joined with boxes, e.g., ['class_labels', 'scores']. Default: None.
    min_area (float): Minimum area of a bounding box. All bounding boxes whose visible area in pixels is less than this value will be removed. Default: 0.0.
    min_visibility (float): Minimum fraction of a bounding box's area that must remain visible for the box to be kept in the result. Should be in the [0.0, 1.0] range. Default: 0.0.
    min_width (float): Minimum width of a bounding box in pixels or normalized units. Bounding boxes with width less than this value will be removed. Default: 0.0.
    min_height (float): Minimum height of a bounding box in pixels or normalized units. Bounding boxes with height less than this value will be removed. Default: 0.0.
    check_each_transform (bool): If True, performs checks for each dual transform. Default: True.
    clip_bboxes_on_input (bool): If True, clips bounding boxes to image boundaries once at pipeline start (during preprocessing). Use this to fix invalid input data (e.g., YOLO coordinates like -1e-6). For OBB, clipping is lossy: boxes with corners outside [0, 1] become axis-aligned (angle=0). False is recommended for OBB when using Affine or rotation transforms. Default: False.
    filter_invalid_bboxes (bool): If True, filters out invalid bounding boxes (e.g., boxes with negative dimensions or boxes where x_max < x_min or y_max < y_min) at the beginning of the pipeline. If clip_bboxes_on_input=True, filtering is applied after clipping. Default: False.
    max_accept_ratio (float | None): Maximum allowed aspect ratio for bounding boxes. The aspect ratio is calculated as max(width/height, height/width), so it is always >= 1. Boxes with an aspect ratio greater than this value will be filtered out. For example, if max_accept_ratio=3.0, boxes with width:height or height:width ratios greater than 3:1 will be removed. Set to None to disable aspect ratio filtering. Default: None.
    clip_after_transform (bool): If True, clips bounding boxes to image bounds after each transform in the augmentation pipeline. If False, boxes may temporarily go outside the [0, 1] bounds. This is different from `clip_bboxes_on_input`, which only runs once before the pipeline. When True: for HBB, clips (x_min, y_min, x_max, y_max) to [0, 1]; for OBB, clips all 4 rotated corners to [0, 1] and returns a wrapping axis-aligned bounding box (angle set to 0). Default: True.

Note:
    The processing order for bounding boxes is:
    1. Convert to albumentations format (normalized pascal_voc)
    2. Clip boxes to image boundaries (if clip_bboxes_on_input=True). This is the pre-pipeline step that fixes invalid input.
    3. Filter invalid boxes (if filter_invalid_bboxes=True)
    4. Apply transformations
    5. After each transform: clip (if clip_after_transform=True) and filter boxes based on min_area, min_visibility, min_width, min_height
    6. Convert back to the original format

    **clip_bboxes_on_input vs clip_after_transform:**
    - `clip_bboxes_on_input=True`: happens once before the pipeline (fixes YOLO coords like -1e-6)
    - `clip_after_transform`: happens after each transform (handles augmentation-induced excursions)

Examples:
    >>> from albumentations.core.bbox_utils import BboxParams
    >>> # Create BboxParams for COCO format with class labels
    >>> bbox_params = BboxParams(
    ...     coord_format='coco',
    ...     label_fields=['class_labels'],
    ...     min_area=1024,
    ...     min_visibility=0.1
    ... )
    >>> # Create BboxParams that clips and filters invalid boxes
    >>> bbox_params = BboxParams(
    ...     coord_format='pascal_voc',
    ...     clip_bboxes_on_input=True,
    ...     filter_invalid_bboxes=True
    ... )
    >>> # Create BboxParams that filters extremely elongated boxes
    >>> bbox_params = BboxParams(
    ...     coord_format='yolo',
    ...     max_accept_ratio=5.0,  # Filter boxes with aspect ratio > 5:1
    ...     clip_bboxes_on_input=True
    ... )
    >>> # Create BboxParams for OBB with clipping after transforms
    >>> bbox_params = BboxParams(
    ...     coord_format='albumentations',
    ...     bbox_type='obb',
    ...     clip_after_transform=True,  # Clip all corners inside bounds
    ... )
    >>> # Create BboxParams with lenient clipping (allows temporary excursions)
    >>> bbox_params = BboxParams(
    ...     coord_format='yolo',
    ...     clip_bboxes_on_input=True,  # Fix input errors
    ...     clip_after_transform=False  # Allow boxes to go outside temporarily
    ... )
    >>> # Create BboxParams for cxcywh (center + wh in pixels)
    >>> bbox_params = BboxParams(
    ...     coord_format='cxcywh',
    ...     label_fields=['class_ids'],
    ... )
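The coordinate formats differ only in how the same rectangle is encoded. The following dependency-free sketch reproduces the example values from the `coord_format` description and the `max_accept_ratio` rule. The helper names (`coco_to_pascal_voc`, `cxcywh_to_pascal_voc`, `aspect_ratio`) are hypothetical and not part of albumentations; the library itself works on NumPy arrays.

```python
def coco_to_pascal_voc(box):
    """[x_min, y_min, width, height] -> [x_min, y_min, x_max, y_max]."""
    x_min, y_min, width, height = box
    return [x_min, y_min, x_min + width, y_min + height]


def cxcywh_to_pascal_voc(box):
    """[x_center, y_center, width, height] in pixels -> corner form."""
    cx, cy, width, height = box
    return [cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2]


def aspect_ratio(box):
    """max(width/height, height/width) for a pascal_voc box; always >= 1."""
    x_min, y_min, x_max, y_max = box
    width, height = x_max - x_min, y_max - y_min
    return max(width / height, height / width)


voc_box = coco_to_pascal_voc([97, 12, 150, 200])
print(voc_box)                                 # [97, 12, 247, 212]
print(cxcywh_to_pascal_voc([50, 50, 40, 60]))  # [30.0, 20.0, 70.0, 80.0]
print(aspect_ratio(voc_box) > 3.0)             # False: 200/150 is about 1.33
```

With `max_accept_ratio=3.0`, the box above would be kept because its ratio stays below 3:1.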

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| coord_format | One of: 'coco', 'pascal_voc', 'albumentations', 'yolo', 'cxcywh' | - | - |
| label_fields | One of: Sequence[Any], None | None | - |
| bbox_type | One of: 'hbb', 'obb' | 'hbb' | - |
| min_area | float | 0.0 | - |
| min_visibility | float | 0.0 | - |
| min_width | float | 0.0 | - |
| min_height | float | 0.0 | - |
| check_each_transform | bool | True | - |
| filter_invalid_bboxes | bool | False | - |
| max_accept_ratio | One of: float, None | None | - |
| clip_bboxes_on_input | bool | False | - |
| clip_after_transform | bool | True | - |

BboxProcessor class

BboxProcessor(
    params: BboxParams,
    additional_targets: dict[str, str] | None = None
)

Processor for bounding box transformations.

This class handles the preprocessing and postprocessing of bounding boxes during an augmentation pipeline, including format conversion, validation, clipping, and filtering.

Args:
    params (BboxParams): Parameters that control bounding box processing. See the BboxParams class for details.
    additional_targets (dict[str, str] | None): Dictionary with additional targets to process. Keys are names of additional targets, values are their types. For example, {'bbox2': 'bboxes'} will handle 'bbox2' as another bounding box target. Default: None.

Note:
    The processing order for bounding boxes is:
    1. Convert to albumentations format (normalized pascal_voc)
    2. Clip boxes to image boundaries (if params.clip_bboxes_on_input=True)
    3. Filter invalid boxes (if params.filter_invalid_bboxes=True)
    4. Apply transformations
    5. Filter boxes based on min_area, min_visibility, min_width, min_height
    6. Convert back to the original format

Examples:
    >>> import albumentations as A
    >>> from albumentations.core.bbox_utils import BboxProcessor
    >>> # Process COCO format bboxes with class labels
    >>> params = A.BboxParams(
    ...     coord_format='coco',
    ...     label_fields=['class_labels'],
    ...     min_area=1024,
    ...     min_visibility=0.1
    ... )
    >>> processor = BboxProcessor(params)
    >>>
    >>> # Process multiple bbox fields
    >>> params = A.BboxParams('pascal_voc')
    >>> processor = BboxProcessor(
    ...     params,
    ...     additional_targets={'bbox2': 'bboxes'}
    ... )
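To make the processing order concrete, here is a minimal pure-Python sketch of steps 1 to 3 and step 6 for pascal_voc input. It is not the BboxProcessor implementation: `preprocess` and `postprocess` are hypothetical names, and the validity rule (drop boxes whose max corner lies before the min corner) is taken from the `filter_invalid_bboxes` description.

```python
def preprocess(boxes, shape, clip_on_input=False, filter_invalid=False):
    """Steps 1-3 for pascal_voc input: normalize, optionally clip, optionally filter."""
    height, width = shape
    # Step 1: convert pixel corners to the normalized albumentations format.
    out = [[x1 / width, y1 / height, x2 / width, y2 / height]
           for x1, y1, x2, y2 in boxes]
    if clip_on_input:
        # Step 2: clamp every coordinate to the image, i.e. to [0, 1].
        out = [[min(max(v, 0.0), 1.0) for v in box] for box in out]
    if filter_invalid:
        # Step 3: drop boxes where x_max < x_min or y_max < y_min.
        out = [b for b in out if b[2] >= b[0] and b[3] >= b[1]]
    return out


def postprocess(boxes, shape):
    """Step 6: convert normalized boxes back to pascal_voc pixel corners."""
    height, width = shape
    return [[x1 * width, y1 * height, x2 * width, y2 * height]
            for x1, y1, x2, y2 in boxes]


shape = (128, 256)  # (height, width)
raw = [[32, 16, 288, 96], [200, 64, 100, 32]]  # second box has x_max < x_min
normalized = preprocess(raw, shape, clip_on_input=True, filter_invalid=True)
# Steps 4-5 (the transforms and per-transform filtering) would run here.
print(postprocess(normalized, shape))  # [[32.0, 16.0, 256.0, 96.0]]
```

The first box extends past the right edge (x_max=288 on a 256-pixel-wide image) and is clipped to the border; the second is discarded as invalid.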

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| params | BboxParams | - | - |
| additional_targets | One of: dict[str, str], None | None | - |

normalize_bboxes function

normalize_bboxes(
    bboxes: np.ndarray,
    shape: tuple[int, int]
)

Normalize denormalized (pixel-coordinate) bounding boxes to the [0, 1] range using the image size.

Args:
    bboxes (np.ndarray): Denormalized bounding boxes `[(x_min, y_min, x_max, y_max, ...)]`.
    shape (tuple[int, int]): Image shape `(height, width)`.

Returns:
    np.ndarray: Normalized bounding boxes `[(x_min, y_min, x_max, y_max, ...)]`.
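As an illustration of what normalization means here, the sketch below divides x coordinates by the image width and y coordinates by the image height, leaving extra columns untouched. `normalize_boxes` is a hypothetical pure-Python stand-in, not the library function, which operates on NumPy arrays.

```python
def normalize_boxes(boxes, shape):
    """Scale pixel [x_min, y_min, x_max, y_max, ...] rows into the [0, 1] range."""
    height, width = shape  # note the (height, width) order
    return [
        [x_min / width, y_min / height, x_max / width, y_max / height, *rest]
        for x_min, y_min, x_max, y_max, *rest in boxes
    ]


# One box with a class id in the fifth column, on a 400 x 500 (H x W) image.
print(normalize_boxes([[97, 12, 247, 212, 3]], (400, 500)))
# [[0.194, 0.03, 0.494, 0.53, 3]]
```

A common mistake is passing the shape as (width, height); the x values must be divided by the second element of `shape`.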

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| bboxes | np.ndarray | - | - |
| shape | tuple[int, int] | - | - |

obb_to_polygons function

obb_to_polygons(
    bboxes: np.ndarray
)

Convert oriented bounding boxes to corner polygons (vectorized).

Uses the same convention as cv2.minAreaRect/cv2.boxPoints, for consistency with polygons_to_obb: the base rectangle corners [-w/2, -h/2], [w/2, -h/2], [w/2, h/2], [-w/2, h/2] are rotated by the angle and translated to the box center.

Args:
    bboxes (np.ndarray): Array of shape (N, >=5) where each row is [x_min, y_min, x_max, y_max, angle_deg, ...]. Coordinate-system agnostic. Additional columns beyond the first 5 are accepted but not used.

Returns:
    np.ndarray: Array of shape (N, 4, 2) containing the corner coordinates of each bounding box. Each corner is [x, y] in the same coordinate system as the input.
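The corner construction described above can be sketched for a single box as follows. This is an illustrative, non-vectorized version with a hypothetical name (`obb_to_corners`). It assumes that the first four values describe the unrotated rectangle and that a positive angle applies the standard 2D rotation matrix in image coordinates; the library's exact sign convention is not stated on this page.

```python
import math


def obb_to_corners(box):
    """Return the 4 corners of [x_min, y_min, x_max, y_max, angle_deg, ...]."""
    x_min, y_min, x_max, y_max, angle_deg = box[:5]
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    w, h = x_max - x_min, y_max - y_min
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    # Base rectangle corners around the origin, in the order given in the docs.
    base = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each corner by the angle, then translate to the box center.
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in base
    ]


print(obb_to_corners([10, 20, 50, 40, 0]))
# [(10.0, 20.0), (50.0, 20.0), (50.0, 40.0), (10.0, 40.0)]
```

With an angle of 0 the corners coincide with the axis-aligned box; at 90 degrees the 40 x 20 rectangle occupies a 20 x 40 extent around the same center.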

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| bboxes | np.ndarray | - | - |

polygons_to_obb function

polygons_to_obb(
    polygons: np.ndarray,
    extra_fields: np.ndarray | None
)

Fit oriented bounding boxes to corner polygons.

Uses cv2.minAreaRect only to obtain the 4 corners (via cv2.boxPoints). From those corners, (w, h, angle) is derived using this module's convention: width is the edge more parallel to the horizontal axis, and angle lies in [-90, 90). This ensures obb_to_polygons and cv2.boxPoints produce visually correct results regardless of minAreaRect's internal (w, h, angle) representation. The function is coordinate-system agnostic: it preserves the input coordinate system.

Args:
    polygons (np.ndarray): Array of shape (N, 4, 2) with corners in any coordinate system.
    extra_fields (np.ndarray | None): Optional array of shape (N, M) to append after the bbox coordinates and angle.

Returns:
    Array of OBB bounding boxes in the same coordinate system as the input polygons. Format: [x_min, y_min, x_max, y_max, angle, *extra_fields].

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| polygons | np.ndarray | - | - |
| extra_fields | One of: np.ndarray, None | - | - |

denormalize_bboxes function

denormalize_bboxes(
    bboxes: np.ndarray,
    shape: tuple[int, int]
)

Denormalize an array of bounding boxes, converting normalized coordinates to pixel coordinates using the image size.

Args:
    bboxes (np.ndarray): Normalized bounding boxes `[(x_min, y_min, x_max, y_max, ...)]`.
    shape (tuple[int, int]): Image shape `(height, width)`.

Returns:
    np.ndarray: Denormalized bounding boxes `[(x_min, y_min, x_max, y_max, ...)]`.
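Denormalization is the inverse scaling: x values are multiplied by the image width and y values by the image height. The sketch below uses a hypothetical pure-Python helper (`denormalize_boxes`) to show this; it is not the library's NumPy implementation.

```python
def denormalize_boxes(boxes, shape):
    """Scale normalized [x_min, y_min, x_max, y_max, ...] rows to pixel units."""
    height, width = shape  # (height, width), as in the documented signature
    return [
        [x_min * width, y_min * height, x_max * width, y_max * height, *rest]
        for x_min, y_min, x_max, y_max, *rest in boxes
    ]


# A normalized box with a label id in the fifth column, on a 480 x 640 image.
print(denormalize_boxes([[0.25, 0.5, 0.75, 1.0, 7]], (480, 640)))
# [[160.0, 240.0, 480.0, 480.0, 7]]
```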

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| bboxes | np.ndarray | - | - |
| shape | tuple[int, int] | - | - |

calculate_bbox_areas_in_pixels function

calculate_bbox_areas_in_pixels(
    bboxes: np.ndarray,
    shape: tuple[int, int]
)

Calculate areas for multiple bounding boxes.

This function computes the areas of bounding boxes given their normalized coordinates and the dimensions of the image they belong to. The bounding boxes are expected to be in the format [x_min, y_min, x_max, y_max] with normalized coordinates (0 to 1).

Args:
    bboxes (np.ndarray): A numpy array of shape (N, 4+) where N is the number of bounding boxes. Each row contains [x_min, y_min, x_max, y_max] in normalized coordinates. Additional columns beyond the first 4 are ignored.
    shape (tuple[int, int]): A tuple containing the height and width of the image (height, width).

Returns:
    np.ndarray: A 1D numpy array of shape (N,) containing the areas of the bounding boxes in pixels. Returns an empty array if the input `bboxes` is empty.

Note:
    - The function assumes that the input bounding boxes are valid (i.e., x_max > x_min and y_max > y_min). Invalid bounding boxes may result in negative areas.
    - The function preserves the input array and creates a copy for internal calculations.
    - The returned areas are in pixel units, not normalized.

Examples:
    >>> import numpy as np
    >>> from albumentations.core.bbox_utils import calculate_bbox_areas_in_pixels
    >>> bboxes = np.array([[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.8, 0.8]])
    >>> image_shape = (100, 100)
    >>> areas = calculate_bbox_areas_in_pixels(bboxes, image_shape)
    >>> print(areas)
    [1600. 3600.]

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| bboxes | np.ndarray | - | - |
| shape | tuple[int, int] | - | - |

convert_bboxes_to_albumentations function

convert_bboxes_to_albumentations(
    bboxes: np.ndarray,
    source_format: Literal['coco', 'pascal_voc', 'yolo', 'cxcywh'],
    shape: tuple[int, int],
    bbox_type: Literal['hbb', 'obb'],
    check_validity: bool = False
)

Convert bounding boxes from a specified format to the format used by albumentations: normalized coordinates of the top-left and bottom-right corners of the bounding box in the form `(x_min, y_min, x_max, y_max)`, e.g. `(0.15, 0.27, 0.67, 0.5)`.

Args:
    bboxes (np.ndarray): A numpy array of bounding boxes with shape (num_bboxes, 4+).
    source_format (Literal["coco", "pascal_voc", "yolo", "cxcywh"]): Format of the input bounding boxes.
    shape (tuple[int, int]): Image shape (height, width).
    bbox_type (Literal["hbb", "obb"]): Bounding box type; required for cxcywh OBB conversion.
    check_validity (bool): Check if all boxes are valid boxes.

Returns:
    np.ndarray: An array of bounding boxes in albumentations format with shape (num_bboxes, 4+).

Raises:
    ValueError: If `source_format` is not 'coco', 'pascal_voc', 'yolo' or 'cxcywh'.
    ValueError: If in YOLO format, any coordinates are not in the range (0, 1].
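For intuition, the sketch below converts single YOLO and COCO boxes to the albumentations layout in plain Python. The helper names are hypothetical, and the YOLO range check mirrors the `(0, 1]` condition listed under Raises; the library function itself is vectorized over NumPy arrays and may validate differently.

```python
def yolo_to_albumentations(box):
    """[x_center, y_center, w, h, ...] (normalized) -> [x_min, y_min, x_max, y_max, ...]."""
    cx, cy, w, h = box[:4]
    if not all(0 < v <= 1 for v in (cx, cy, w, h)):
        raise ValueError("YOLO coordinates must be in the range (0, 1]")
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, *box[4:]]


def coco_to_albumentations(box, shape):
    """[x_min, y_min, w, h, ...] in pixels -> normalized corner form."""
    height, width = shape
    x_min, y_min, w, h = box[:4]
    return [x_min / width, y_min / height,
            (x_min + w) / width, (y_min + h) / height, *box[4:]]


print(yolo_to_albumentations([0.5, 0.5, 0.25, 0.5, "cat"]))
# [0.375, 0.25, 0.625, 0.75, 'cat']
print(coco_to_albumentations([64, 32, 128, 64], (256, 512)))
# [0.125, 0.125, 0.375, 0.375]
```

Only COCO needs the image shape here, because YOLO boxes are already normalized.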

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| bboxes | np.ndarray | - | - |
| source_format | One of: 'coco', 'pascal_voc', 'yolo', 'cxcywh' | - | - |
| shape | tuple[int, int] | - | - |
| bbox_type | One of: 'hbb', 'obb' | - | - |
| check_validity | bool | False | - |

convert_bboxes_from_albumentations function

convert_bboxes_from_albumentations(
    bboxes: np.ndarray,
    target_format: Literal['coco', 'pascal_voc', 'yolo', 'cxcywh'],
    shape: tuple[int, int],
    bbox_type: Literal['hbb', 'obb'],
    check_validity: bool = False
)

Convert bounding boxes from the format used by albumentations to a specified format.

Args:
    bboxes (np.ndarray): A numpy array of albumentations bounding boxes with shape (num_bboxes, 4+). The first 4 columns are [x_min, y_min, x_max, y_max].
    target_format (Literal["coco", "pascal_voc", "yolo", "cxcywh"]): Required format of the output bounding boxes.
    shape (tuple[int, int]): Image shape (height, width).
    bbox_type (Literal["hbb", "obb"]): Bounding box type; required for cxcywh OBB conversion.
    check_validity (bool): Check if all boxes are valid boxes.

Returns:
    np.ndarray: An array of bounding boxes in the target format with shape (num_bboxes, 4+).

Raises:
    ValueError: If `target_format` is not 'coco', 'pascal_voc', 'yolo' or 'cxcywh'.
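The reverse direction can be sketched the same way. The two helpers below are hypothetical pure-Python illustrations for single HBB boxes (they ignore `bbox_type` and validity checking) and are not the library's implementation.

```python
def albumentations_to_coco(box, shape):
    """Normalized corners -> [x_min, y_min, width, height, ...] in pixels."""
    height, width = shape
    x_min, y_min, x_max, y_max = box[:4]
    return [x_min * width, y_min * height,
            (x_max - x_min) * width, (y_max - y_min) * height, *box[4:]]


def albumentations_to_yolo(box):
    """Normalized corners -> normalized [x_center, y_center, width, height, ...]."""
    x_min, y_min, x_max, y_max = box[:4]
    return [(x_min + x_max) / 2, (y_min + y_max) / 2,
            x_max - x_min, y_max - y_min, *box[4:]]


box = [0.25, 0.25, 0.75, 0.5]
print(albumentations_to_coco(box, (200, 400)))  # [100.0, 50.0, 200.0, 50.0]
print(albumentations_to_yolo(box))              # [0.5, 0.375, 0.5, 0.25]
```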

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| bboxes | np.ndarray | - | - |
| target_format | One of: 'coco', 'pascal_voc', 'yolo', 'cxcywh' | - | - |
| shape | tuple[int, int] | - | - |
| bbox_type | One of: 'hbb', 'obb' | - | - |
| check_validity | bool | False | - |

check_bboxes function

check_bboxes(
    bboxes: np.ndarray
)

Check if bounding boxes are valid.

Args:
    bboxes (np.ndarray): A numpy array of bounding boxes with shape (num_bboxes, 4+).

Raises:
    ValueError: If any bounding box is invalid.

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| bboxes | np.ndarray | - | - |

clip_bboxes function

clip_bboxes(
    bboxes: np.ndarray,
    shape: tuple[int, int]
)

Clip bounding boxes to the image shape.

Args:
    bboxes (np.ndarray): A numpy array of bounding boxes with shape (num_bboxes, 4+).
    shape (tuple[int, int]): The shape of the image (height, width).

Returns:
    np.ndarray: A numpy array of bounding boxes with shape (num_bboxes, 4+).
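A minimal sketch of coordinate clipping follows, under the assumption that the boxes are in the normalized albumentations format, so that clipping to the image amounts to clamping the first four values to [0, 1]. `clip_boxes` is a hypothetical helper, not the library function, and it omits the `shape` argument because normalized input makes it unnecessary here.

```python
def clip_boxes(boxes):
    """Clamp x_min, y_min, x_max, y_max to [0, 1]; keep extra columns as-is."""
    return [
        [min(max(v, 0.0), 1.0) for v in box[:4]] + list(box[4:])
        for box in boxes
    ]


boxes = [
    [0.2, 0.3, 1.2, 0.8, "dog"],   # sticks out past the right edge
    [-0.1, 0.4, 0.5, 1.3, "cat"],  # sticks out on the left and the bottom
]
print(clip_boxes(boxes))
# [[0.2, 0.3, 1.0, 0.8, 'dog'], [0.0, 0.4, 0.5, 1.0, 'cat']]
```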

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| bboxes | np.ndarray | - | - |
| shape | tuple[int, int] | - | - |

clip_bboxes_geometry function

clip_bboxes_geometry(
    bboxes: np.ndarray,
    shape: tuple[int, int],
    bbox_type: Literal['hbb', 'obb']
)

Clip bounding boxes based on actual geometry.

This function provides geometry-aware clipping that works correctly for both HBB and OBB:
- For HBB: clips (x_min, y_min, x_max, y_max) coordinates to [0, 1] (fast path).
- For OBB: clips all 4 rotated corners and returns the axis-aligned wrapping box with angle=0.

Args:
    bboxes (np.ndarray): Array of bounding boxes in albumentations format (normalized). Shape: (N, 4+) for HBB or (N, 5+) for OBB.
    shape (tuple[int, int]): Image shape (height, width).
    bbox_type (Literal["hbb", "obb"]): Either "hbb" or "obb".

Returns:
    np.ndarray: Clipped bounding boxes. For OBB, returns (N, 5+) with angle set to 0.

Note:
    For HBB, this is equivalent to clip_bboxes() (fast coordinate clipping). For OBB, it clips the 4 rotated corners and returns the axis-aligned bounding box that wraps them, with angle set to 0 since the result is axis-aligned. cv2.minAreaRect is NOT used for clipping, only for actual rotations.

Examples:
    >>> # HBB: simple coordinate clipping
    >>> hbb = np.array([[0.2, 0.3, 1.2, 0.8]])
    >>> clipped = clip_bboxes_geometry(hbb, (100, 100), "hbb")
    >>> # Result: [[0.2, 0.3, 1.0, 0.8]]
    >>> # OBB: clips corners and returns wrapping HBB with angle=0
    >>> obb = np.array([[0.2, 0.3, 1.2, 0.8, 45.0]])  # rotated 45 degrees
    >>> clipped = clip_bboxes_geometry(obb, (100, 100), "obb")
    >>> # Result: [[x_min, y_min, x_max, y_max, 0.0]], angle reset to 0

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| bboxes | np.ndarray | - | - |
| shape | tuple[int, int] | - | - |
| bbox_type | One of: 'hbb', 'obb' | - | - |

filter_bboxes function

filter_bboxes(
    bboxes: np.ndarray,
    shape: tuple[int, int],
    bbox_type: Literal['hbb', 'obb'],
    min_area: float = 0.0,
    min_visibility: float = 0.0,
    min_width: float = 1.0,
    min_height: float = 1.0,
    max_accept_ratio: float | None = None,
    clip_after_transform: bool = True
)

Remove bounding boxes whose remaining visible fraction is below `min_visibility`, whose area in pixels is below `min_area`, whose width or height is below `min_width` or `min_height`, or whose aspect ratio exceeds `max_accept_ratio`. Boxes are also clipped to the final image size when `clip_after_transform` is True.

Args:
    bboxes (np.ndarray): A numpy array of bounding boxes with shape (num_bboxes, 4+).
    shape (tuple[int, int]): The shape of the image (height, width).
    bbox_type (Literal["hbb", "obb"]): Type of bounding boxes. Used for geometry-aware clipping. Required parameter, no default.
    min_area (float): Minimum area of a bounding box in pixels. Default: 0.0.
    min_visibility (float): Minimum fraction of area for a bounding box to remain. Default: 0.0.
    min_width (float): Minimum width of a bounding box in pixels. Default: 1.0.
    min_height (float): Minimum height of a bounding box in pixels. Default: 1.0.
    max_accept_ratio (float | None): Maximum allowed aspect ratio, calculated as max(width/height, height/width). Boxes with higher ratios will be filtered out. Default: None.
    clip_after_transform (bool): If True, clip bounding boxes to image bounds (HBB: coords, OBB: corners). If False, boxes may extend outside [0, 1]. Default: True.

Returns:
    np.ndarray: Filtered bounding boxes.
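The size and shape criteria can be illustrated with a small pure-Python sketch. `filter_boxes` is a hypothetical helper that applies `min_width`, `min_height`, `min_area` and `max_accept_ratio` to normalized HBB rows; it leaves out `min_visibility` (which needs the pre-transform area) and clipping, and it drops zero-size boxes as a simplification of this sketch.

```python
def filter_boxes(boxes, shape, min_area=0.0, min_width=1.0, min_height=1.0,
                 max_accept_ratio=None):
    """Keep normalized [x_min, y_min, x_max, y_max, ...] rows that pass all checks."""
    height, width = shape
    kept = []
    for box in boxes:
        x_min, y_min, x_max, y_max = box[:4]
        w_px = (x_max - x_min) * width    # box width in pixels
        h_px = (y_max - y_min) * height   # box height in pixels
        if w_px <= 0 or h_px <= 0:
            continue  # degenerate box (sketch-level simplification)
        if w_px < min_width or h_px < min_height:
            continue
        if w_px * h_px < min_area:
            continue
        ratio = max(w_px / h_px, h_px / w_px)
        if max_accept_ratio is not None and ratio > max_accept_ratio:
            continue
        kept.append(box)
    return kept


boxes = [
    [0.25, 0.25, 0.75, 0.75, "ok"],       # 64 x 64 px
    [0.0, 0.0, 0.00390625, 0.5, "thin"],  # 0.5 px wide
    [0.0, 0.5, 1.0, 0.625, "long"],       # 128 x 16 px, ratio 8
    [0.5, 0.5, 0.5625, 0.5625, "small"],  # 8 x 8 px, area 64
]
kept = filter_boxes(boxes, (128, 128), min_area=100, max_accept_ratio=5.0)
print([box[4] for box in kept])  # ['ok']
```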

Parameters

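| Name | Type | Default | Description |
|---|---|---|---|
| bboxes | np.ndarray | - | - |
| shape | tuple[int, int] | - | - |
| bbox_type | One of: 'hbb', 'obb' | - | - |
| min_area | float | 0.0 | - |
| min_visibility | float | 0.0 | - |
| min_width | float | 1.0 | - |
| min_height | float | 1.0 | - |
| max_accept_ratio | One of: float, None | None | - |
| clip_after_transform | bool | True | - |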

union_of_bboxes function

union_of_bboxes(
    bboxes: np.ndarray,
    erosion_rate: float
)

Calculate the union of bounding boxes. Boxes can be in albumentations or Pascal VOC format.

Args:
    bboxes (np.ndarray): Array of bounding boxes.
    erosion_rate (float): How much each bounding box can be shrunk, useful for erosive cropping. Set this in the range [0, 1]. A value of 0 applies no erosion at all; 1.0 can shrink any bbox until it loses its volume.

Returns:
    np.ndarray | None: A bounding box `(x_min, y_min, x_max, y_max)`, or None if no bboxes are given or if the bounding boxes become invalid after erosion.
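The following sketch shows one way to read the erosion parameter. It assumes that each box is shrunk symmetrically about its center by `erosion_rate` times its width and height before the enclosing box is taken, which matches the statement that 1.0 removes a box's volume; the library's exact erosion formula is not given on this page, and `union_of_boxes` is a hypothetical name.

```python
def union_of_boxes(boxes, erosion_rate=0.0):
    """Smallest box enclosing all (optionally eroded) boxes, or None."""
    if not boxes:
        return None
    eroded = []
    for x_min, y_min, x_max, y_max, *_ in boxes:
        # Assumption: shrink each side by half of erosion_rate * side length.
        dx = (x_max - x_min) * erosion_rate / 2
        dy = (y_max - y_min) * erosion_rate / 2
        eroded.append((x_min + dx, y_min + dy, x_max - dx, y_max - dy))
    x_min = min(b[0] for b in eroded)
    y_min = min(b[1] for b in eroded)
    x_max = max(b[2] for b in eroded)
    y_max = max(b[3] for b in eroded)
    if x_min >= x_max or y_min >= y_max:
        return None  # erosion collapsed the union
    return (x_min, y_min, x_max, y_max)


boxes = [[0.0, 0.0, 0.5, 0.5], [0.25, 0.25, 1.0, 1.0]]
print(union_of_boxes(boxes))                        # (0.0, 0.0, 1.0, 1.0)
print(union_of_boxes(boxes, erosion_rate=0.5))      # (0.125, 0.125, 0.8125, 0.8125)
print(union_of_boxes(boxes[:1], erosion_rate=1.0))  # None
```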

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| bboxes | np.ndarray | - | - |
| erosion_rate | float | - | - |

bboxes_from_masks function

bboxes_from_masks(
    masks: np.ndarray
)

Create bounding boxes from binary masks (fast version).

Args:
    masks (np.ndarray): Binary masks of shape (H, W) or (N, H, W) where N is the number of masks, and H, W are the height and width of each mask.

Returns:
    np.ndarray: An array of bounding boxes with shape (N, 4), where each row is (x_min, y_min, x_max, y_max).
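Conceptually, the box for a mask is the tight extent of its nonzero pixels. The sketch below shows this for a single mask given as nested lists. `box_from_mask` is a hypothetical helper; treating the max edge as exclusive (last index + 1) and returning None for an empty mask are choices of this sketch, and the library's conventions may differ.

```python
def box_from_mask(mask):
    """Tight (x_min, y_min, x_max, y_max) around the nonzero pixels of one mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None  # empty mask: no box (sketch-level choice)
    # Assumption: the max edge is exclusive, hence the + 1.
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(box_from_mask(mask))  # (1, 1, 4, 3)
```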

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| masks | np.ndarray | - | - |

masks_from_bboxes function

masks_from_bboxes(
    bboxes: np.ndarray,
    shape: tuple[int, int]
)

Convert bounding boxes to masks.

Args:
    bboxes (np.ndarray): A numpy array of bounding boxes with shape (num_bboxes, 4+).
    shape (tuple[int, int]): Image shape (height, width).

Returns:
    np.ndarray: A numpy array of masks with shape (num_bboxes, height, width).

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| bboxes | np.ndarray | - | - |
| shape | tuple[int, int] | - | - |

bboxes_to_mask function

bboxes_to_mask(
    bboxes: np.ndarray,
    image_shape: tuple[int, int]
)

Convert bounding boxes to a single mask.

Args:
    bboxes (np.ndarray): A numpy array of bounding boxes with shape (num_bboxes, 4+).
    image_shape (tuple[int, int]): Image shape (height, width).

Returns:
    np.ndarray: A numpy array of shape (height, width) with 1s where any bounding box is present.

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| bboxes | np.ndarray | - | - |
| image_shape | tuple[int, int] | - | - |

mask_to_bboxes function

mask_to_bboxes(
    masks: np.ndarray,
    original_bboxes: np.ndarray,
    bbox_type: Literal['hbb', 'obb']
)

Convert masks back to bounding boxes.

Args:
    masks (np.ndarray): A numpy array of masks with shape (num_masks, height, width).
    original_bboxes (np.ndarray): Original bounding boxes with shape (num_bboxes, 4+) for HBB or (num_bboxes, 5+) for OBB.
    bbox_type (Literal["hbb", "obb"]): Type of bounding box: "hbb" for axis-aligned or "obb" for oriented. Default: "hbb".

Returns:
    np.ndarray: A numpy array of bounding boxes with shape (num_masks, 4+) for HBB or (num_masks, 5+) for OBB.

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| masks | np.ndarray | - | - |
| original_bboxes | np.ndarray | - | - |
| bbox_type | One of: 'hbb', 'obb' | - | - |