albumentations.core.bbox_utils
Utilities for handling bounding box operations during image augmentation. This module provides tools for processing bounding boxes in various formats (COCO, Pascal VOC, YOLO), converting between coordinate systems, normalizing and denormalizing coordinates, filtering boxes based on visibility and size criteria, and performing transformations on boxes to match image augmentations. It forms the core functionality for all bounding box-related operations in the albumentations library.
Members
- class BboxParams
- class BboxProcessor
- function bboxes_from_masks
- function bboxes_to_mask
- function calculate_bbox_areas_in_pixels
- function check_bboxes
- function clip_bboxes
- function convert_bboxes_from_albumentations
- function convert_bboxes_to_albumentations
- function denormalize_bboxes
- function filter_bboxes
- function mask_to_bboxes
- function masks_from_bboxes
- function normalize_bboxes
- function union_of_bboxes
BboxParams class
BboxParams(
format: Literal['coco', 'pascal_voc', 'albumentations', 'yolo'],
label_fields: Sequence[Any] | None = None,
min_area: float = 0.0,
min_visibility: float = 0.0,
min_width: float = 0.0,
min_height: float = 0.0,
check_each_transform: bool = True,
clip: bool = False,
filter_invalid_bboxes: bool = False,
max_accept_ratio: float | None = None
)
Parameters for bounding box transforms.
Parameters
Name | Type | Default | Description |
---|---|---|---|
format | Literal['coco', 'pascal_voc', 'albumentations', 'yolo'] | - | Format of bounding boxes. Should be one of: 'coco': [x_min, y_min, width, height], e.g. [97, 12, 150, 200]; 'pascal_voc': [x_min, y_min, x_max, y_max], e.g. [97, 12, 247, 212]; 'albumentations': like pascal_voc but normalized to the [0, 1] range, e.g. [0.2, 0.3, 0.4, 0.5]; 'yolo': [x_center, y_center, width, height] normalized to the [0, 1] range, e.g. [0.1, 0.2, 0.3, 0.4]. |
label_fields | Sequence[Any] or None | None | List of fields that are joined with boxes, e.g., ['class_labels', 'scores']. Default: None. |
min_area | float | 0.0 | Minimum area of a bounding box. All bounding boxes whose visible area in pixels is less than this value will be removed. Default: 0.0. |
min_visibility | float | 0.0 | Minimum fraction of a bounding box's area that must remain visible for the box to be kept in the result. Should be in [0.0, 1.0] range. Default: 0.0. |
min_width | float | 0.0 | Minimum width of a bounding box in pixels or normalized units. Bounding boxes with width less than this value will be removed. Default: 0.0. |
min_height | float | 0.0 | Minimum height of a bounding box in pixels or normalized units. Bounding boxes with height less than this value will be removed. Default: 0.0. |
check_each_transform | bool | True | If True, performs checks for each dual transform. Default: True. |
clip | bool | False | If True, clips bounding boxes to image boundaries before applying any transform. Default: False. |
filter_invalid_bboxes | bool | False | If True, filters out invalid bounding boxes (e.g., boxes with negative dimensions or boxes where x_max < x_min or y_max < y_min) at the beginning of the pipeline. If clip=True, filtering is applied after clipping. Default: False. |
max_accept_ratio | float or None | None | Maximum allowed aspect ratio for bounding boxes. The aspect ratio is calculated as max(width/height, height/width), so it's always >= 1. Boxes with aspect ratio greater than this value will be filtered out. For example, if max_accept_ratio=3.0, boxes with width:height or height:width ratios greater than 3:1 will be removed. Set to None to disable aspect ratio filtering. Default: None. |
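The aspect-ratio rule described for max_accept_ratio can be sketched in NumPy as below. This is an illustrative re-implementation, not the library's code: the helper name `aspect_ratio_keep_mask` is hypothetical, and the boxes are assumed to be [x_min, y_min, x_max, y_max] with positive width and height.

```python
import numpy as np

def aspect_ratio_keep_mask(bboxes, max_accept_ratio):
    """Hypothetical helper: True for boxes whose aspect ratio is acceptable."""
    widths = bboxes[:, 2] - bboxes[:, 0]
    heights = bboxes[:, 3] - bboxes[:, 1]
    # max(w/h, h/w) is always >= 1 for boxes with positive width and height.
    ratios = np.maximum(widths / heights, heights / widths)
    return ratios <= max_accept_ratio

# Boxes as [x_min, y_min, x_max, y_max] (assumed layout for this sketch).
boxes = np.array([
    [0.0, 0.0, 30.0, 10.0],  # 3:1, exactly at the threshold, kept
    [0.0, 0.0, 10.0, 40.0],  # 1:4, above the threshold, removed
    [5.0, 5.0, 25.0, 25.0],  # 1:1, kept
])
keep = aspect_ratio_keep_mask(boxes, max_accept_ratio=3.0)
filtered = boxes[keep]
```

Because the ratio is symmetric in width and height, tall and wide boxes are treated the same way.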
BboxProcessor class
BboxProcessor(
params: BboxParams,
additional_targets: dict[str, str] | None = None
)
Processor for bounding box transformations. This class handles the preprocessing and postprocessing of bounding boxes in an augmentation pipeline, including format conversion, validation, clipping, and filtering.
Parameters
Name | Type | Default | Description |
---|---|---|---|
params | BboxParams | - | Parameters that control bounding box processing. See BboxParams class for details. |
additional_targets | dict[str, str] or None | None | Dictionary with additional targets to process. Keys are names of additional targets, values are their types. For example: {'bbox2': 'bboxes'} will handle 'bbox2' as another bounding box target. Default: None. |
bboxes_from_masks function
bboxes_from_masks(
masks: np.ndarray
)
Create bounding boxes from binary masks (fast version).
Parameters
Name | Type | Default | Description |
---|---|---|---|
masks | np.ndarray | - | Binary masks of shape (H, W) or (N, H, W) where N is the number of masks, and H, W are the height and width of each mask. |
Returns
- np.ndarray: An array of bounding boxes with shape (N, 4), where each row is
bboxes_to_mask function
bboxes_to_mask(
bboxes: np.ndarray,
image_shape: tuple[int, int]
)
Convert bounding boxes to a single mask.
Parameters
Name | Type | Default | Description |
---|---|---|---|
bboxes | np.ndarray | - | A numpy array of bounding boxes with shape (num_bboxes, 4+). |
image_shape | tuple[int, int] | - | Image shape (height, width). |
Returns
- np.ndarray: A numpy array of shape (height, width) with 1s where any bounding box is present.
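A minimal sketch of the idea, assuming the boxes are given in integer pixel coordinates with an exclusive max edge (an assumption of this example, not something the reference states). The helper name `boxes_to_single_mask` is hypothetical.

```python
import numpy as np

def boxes_to_single_mask(bboxes, image_shape):
    """Hypothetical sketch: paint every box into one (height, width) mask."""
    height, width = image_shape
    mask = np.zeros((height, width), dtype=np.uint8)
    for box in bboxes:
        # Only the first four columns are coordinates; extras are ignored.
        x_min, y_min, x_max, y_max = (int(v) for v in box[:4])
        mask[y_min:y_max, x_min:x_max] = 1
    return mask

boxes = np.array([[1, 1, 3, 3, 7], [2, 2, 5, 4, 9]])  # fifth column: a label
mask = boxes_to_single_mask(boxes, (6, 6))
```

Overlapping boxes simply write 1 twice, so the result records where any box is present rather than how many.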
calculate_bbox_areas_in_pixels function
calculate_bbox_areas_in_pixels(
bboxes: np.ndarray,
shape: ShapeType
)
Calculate areas for multiple bounding boxes. This function computes the areas of bounding boxes given their normalized coordinates and the dimensions of the image they belong to. The bounding boxes are expected to be in the format [x_min, y_min, x_max, y_max] with normalized coordinates (0 to 1).
Parameters
Name | Type | Default | Description |
---|---|---|---|
bboxes | np.ndarray | - | A numpy array of shape (N, 4+) where N is the number of bounding boxes. Each row contains [x_min, y_min, x_max, y_max] in normalized coordinates. Additional columns beyond the first 4 are ignored. |
shape | ShapeType | - | A tuple containing the height and width of the image (height, width). |
Returns
- np.ndarray: A 1D numpy array of shape (N,) containing the areas of the bounding boxes in pixels.
Example
>>> bboxes = np.array([[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.8, 0.8]])
>>> image_shape = (100, 100)
>>> areas = calculate_bbox_areas_in_pixels(bboxes, image_shape)
>>> print(areas)
[1600. 3600.]
Notes
- The function assumes that the input bounding boxes are valid (i.e., x_max > x_min and y_max > y_min). Invalid bounding boxes may result in negative areas.
- The function preserves the input array and creates a copy for internal calculations.
- The returned areas are in pixel units, not normalized.
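The arithmetic behind these notes can be sketched as follows. This is a stand-alone illustration rather than the library's implementation: `areas_in_pixels` is a hypothetical name, and height and width are passed as plain integers for simplicity.

```python
import numpy as np

def areas_in_pixels(bboxes, height, width):
    """Hypothetical sketch of the area computation for normalized boxes."""
    box_widths = (bboxes[:, 2] - bboxes[:, 0]) * width
    box_heights = (bboxes[:, 3] - bboxes[:, 1]) * height
    return box_widths * box_heights

# Same boxes as in the example above: 40x40 and 60x60 pixels on a 100x100 image.
valid = np.array([[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.8, 0.8]])
valid_areas = areas_in_pixels(valid, height=100, width=100)

# Here x_max < x_min, so the computed "area" comes out negative.
invalid = np.array([[0.5, 0.1, 0.1, 0.5]])
negative_area = areas_in_pixels(invalid, height=100, width=100)
```

The second call shows why invalid boxes should be filtered or checked before their areas are used.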
check_bboxes function
check_bboxes(
bboxes: np.ndarray
)
Check if bounding boxes are valid.
Parameters
Name | Type | Default | Description |
---|---|---|---|
bboxes | np.ndarray | - | A numpy array of bounding boxes with shape (num_bboxes, 4+). |
clip_bboxes function
clip_bboxes(
bboxes: np.ndarray,
shape: ShapeType
)
Clip bounding boxes to the image shape.
Parameters
Name | Type | Default | Description |
---|---|---|---|
bboxes | np.ndarray | - | A numpy array of bounding boxes with shape (num_bboxes, 4+). |
shape | ShapeType | - | The shape of the image/volume: - For 2D: {'height': int, 'width': int} - For 3D: {'height': int, 'width': int, 'depth': int} |
Returns
- np.ndarray: A numpy array of bounding boxes with shape (num_bboxes, 4+).
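For boxes in normalized coordinates, clipping to the image amounts to clamping the first four columns to [0, 1]. The sketch below assumes normalized input and uses a hypothetical helper name; it is not the library's implementation.

```python
import numpy as np

def clip_normalized(bboxes):
    """Hypothetical sketch: clamp normalized coordinates to [0, 1]."""
    clipped = bboxes.astype(float)  # astype returns a copy, input is untouched
    clipped[:, :4] = np.clip(clipped[:, :4], 0.0, 1.0)
    return clipped

boxes = np.array([[-0.1, 0.2, 0.5, 1.3, 4.0]])  # fifth column: a label
clipped = clip_normalized(boxes)
```

Columns beyond the first four (labels, scores) pass through unchanged.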
convert_bboxes_from_albumentations function
convert_bboxes_from_albumentations(
bboxes: np.ndarray,
target_format: Literal['coco', 'pascal_voc', 'yolo'],
shape: ShapeType,
check_validity: bool = False
)
Convert bounding boxes from the format used by albumentations to a specified format.
Parameters
Name | Type | Default | Description |
---|---|---|---|
bboxes | np.ndarray | - | A numpy array of albumentations bounding boxes with shape (num_bboxes, 4+). The first 4 columns are [x_min, y_min, x_max, y_max]. |
target_format | Literal['coco', 'pascal_voc', 'yolo'] | - | Required format of the output bounding boxes. |
shape | ShapeType | - | Image shape (height, width). |
check_validity | bool | False | Check if all boxes are valid boxes. |
Returns
- np.ndarray: An array of bounding boxes in the target format with shape (num_bboxes, 4+).
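The conversion out of the normalized corner format can be sketched with plain NumPy arithmetic, following the format definitions given for BboxParams. The function name `from_albumentations_sketch` and its height/width arguments are hypothetical; this is an illustration, not the library's code.

```python
import numpy as np

def from_albumentations_sketch(bboxes, target_format, height, width):
    """Hypothetical sketch: normalized corner boxes -> coco, pascal_voc or yolo."""
    out = bboxes.astype(float)
    x_min, y_min, x_max, y_max = (bboxes[:, i].astype(float) for i in range(4))
    if target_format == "yolo":
        # yolo stays normalized: center point plus width and height.
        out[:, 0] = (x_min + x_max) / 2
        out[:, 1] = (y_min + y_max) / 2
        out[:, 2] = x_max - x_min
        out[:, 3] = y_max - y_min
        return out
    if target_format not in ("coco", "pascal_voc"):
        raise ValueError(f"Unknown target format: {target_format}")
    # coco and pascal_voc are both expressed in pixels.
    out[:, 0], out[:, 1] = x_min * width, y_min * height
    if target_format == "pascal_voc":
        out[:, 2], out[:, 3] = x_max * width, y_max * height
    else:  # coco stores width and height instead of the max corner
        out[:, 2], out[:, 3] = (x_max - x_min) * width, (y_max - y_min) * height
    return out

boxes = np.array([[0.25, 0.5, 0.75, 1.0]])
coco = from_albumentations_sketch(boxes, "coco", height=200, width=400)
voc = from_albumentations_sketch(boxes, "pascal_voc", height=200, width=400)
yolo = from_albumentations_sketch(boxes, "yolo", height=200, width=400)
```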
convert_bboxes_to_albumentations function
convert_bboxes_to_albumentations(
bboxes: np.ndarray,
source_format: Literal['coco', 'pascal_voc', 'yolo'],
shape: ShapeType,
check_validity: bool = False
)
Convert bounding boxes from a specified format to the format used by albumentations: normalized coordinates of top-left and bottom-right corners of the bounding box in the form of `(x_min, y_min, x_max, y_max)` e.g. `(0.15, 0.27, 0.67, 0.5)`.
Parameters
Name | Type | Default | Description |
---|---|---|---|
bboxes | np.ndarray | - | A numpy array of bounding boxes with shape (num_bboxes, 4+). |
source_format | Literal['coco', 'pascal_voc', 'yolo'] | - | Format of the input bounding boxes. |
shape | ShapeType | - | Image shape (height, width). |
check_validity | bool | False | Check if all boxes are valid boxes. |
Returns
- np.ndarray: An array of bounding boxes in albumentations format with shape (num_bboxes, 4+).
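As a rough illustration of the conversion into normalized corners, the sketch below reuses the example boxes from the BboxParams format description ([97, 12, 150, 200] in coco and [97, 12, 247, 212] in pascal_voc). The function name and the 500x400 image size are assumptions made for this example only.

```python
import numpy as np

def to_albumentations_sketch(bboxes, source_format, height, width):
    """Hypothetical sketch: coco, pascal_voc or yolo -> normalized corner boxes."""
    out = bboxes.astype(float)
    a, b, c, d = (bboxes[:, i].astype(float) for i in range(4))
    if source_format == "yolo":
        # Already normalized: turn center and size into corners.
        out[:, 0], out[:, 1] = a - c / 2, b - d / 2
        out[:, 2], out[:, 3] = a + c / 2, b + d / 2
        return out
    if source_format == "coco":
        x_max, y_max = a + c, b + d  # x_min + width, y_min + height
    elif source_format == "pascal_voc":
        x_max, y_max = c, d
    else:
        raise ValueError(f"Unknown source format: {source_format}")
    out[:, 0], out[:, 1] = a / width, b / height
    out[:, 2], out[:, 3] = x_max / width, y_max / height
    return out

coco = np.array([[97, 12, 150, 200]])
voc = np.array([[97, 12, 247, 212]])
from_coco = to_albumentations_sketch(coco, "coco", height=400, width=500)
from_voc = to_albumentations_sketch(voc, "pascal_voc", height=400, width=500)
```

Both inputs describe the same rectangle, so they map to the same normalized box.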
denormalize_bboxes function
denormalize_bboxes(
bboxes: np.ndarray,
shape: ShapeType | tuple[int, int]
)
Denormalize array of bounding boxes.
Parameters
Name | Type | Default | Description |
---|---|---|---|
bboxes | np.ndarray | - | Normalized bounding boxes `[(x_min, y_min, x_max, y_max, ...)]`. |
shape | ShapeType or tuple[int, int] | - | Image shape `(height, width)`. |
Returns
- np.ndarray: Denormalized bounding boxes `[(x_min, y_min, x_max, y_max, ...)]`.
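Denormalization scales x coordinates by the image width and y coordinates by the image height. A minimal sketch, assuming a `(height, width)` tuple and a hypothetical helper name:

```python
import numpy as np

def denormalize_sketch(bboxes, shape):
    """Hypothetical sketch: normalized corners -> pixel corners."""
    height, width = shape
    out = bboxes.astype(float)  # copy, so the input stays unchanged
    out[:, [0, 2]] *= width     # x coordinates scale with image width
    out[:, [1, 3]] *= height    # y coordinates scale with image height
    return out

boxes = np.array([[0.25, 0.5, 0.75, 1.0, 3.0]])  # fifth column: a label
pixels = denormalize_sketch(boxes, (200, 400))
```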
filter_bboxes function
filter_bboxes(
bboxes: np.ndarray,
shape: ShapeType,
min_area: float = 0.0,
min_visibility: float = 0.0,
min_width: float = 1.0,
min_height: float = 1.0,
max_accept_ratio: float | None = None
)
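Remove bounding boxes whose visible fraction falls below `min_visibility` or whose area in pixels is under the threshold set by `min_area`. Boxes narrower than `min_width`, shorter than `min_height`, or with an aspect ratio above `max_accept_ratio` are removed as well. Also crops boxes to final image size.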
Parameters
Name | Type | Default | Description |
---|---|---|---|
bboxes | np.ndarray | - | A numpy array of bounding boxes with shape (num_bboxes, 4+). |
shape | ShapeType | - | The shape of the image/volume: - For 2D: {'height': int, 'width': int} - For 3D: {'height': int, 'width': int, 'depth': int} |
min_area | float | 0.0 | Minimum area of a bounding box in pixels. Default: 0.0. |
min_visibility | float | 0.0 | Minimum fraction of area for a bounding box to remain. Default: 0.0. |
min_width | float | 1.0 | Minimum width of a bounding box in pixels. Default: 1.0. |
min_height | float | 1.0 | Minimum height of a bounding box in pixels. Default: 1.0. |
max_accept_ratio | float or None | None | Maximum allowed aspect ratio, calculated as max(width/height, height/width). Boxes with higher ratios will be filtered out. Default: None. |
Returns
- np.ndarray: Filtered bounding boxes.
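The size and visibility criteria can be sketched as below. This is an independent illustration under stated assumptions: boxes are normalized corners, visibility is taken as clipped area divided by unclipped area, and thresholds use inclusive comparisons. The name `filter_sketch` is hypothetical and the library's exact rules may differ.

```python
import numpy as np

def filter_sketch(bboxes, height, width, min_area=0.0, min_visibility=0.0,
                  min_width=1.0, min_height=1.0):
    """Hypothetical sketch of size and visibility filtering on normalized boxes."""
    scale = np.array([width, height, width, height], dtype=float)
    original = bboxes[:, :4] * scale                    # pixels, before clipping
    clipped = np.clip(bboxes[:, :4], 0.0, 1.0) * scale  # pixels, inside the image

    def area(b):
        return (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])

    original_area = area(original)
    clipped_area = area(clipped)
    # Fraction of each box that is still inside the image (0 for degenerate boxes).
    visibility = np.divide(clipped_area, original_area,
                           out=np.zeros_like(clipped_area),
                           where=original_area > 0)
    keep = (
        (clipped_area >= min_area)
        & (visibility >= min_visibility)
        & (clipped[:, 2] - clipped[:, 0] >= min_width)
        & (clipped[:, 3] - clipped[:, 1] >= min_height)
    )
    result = bboxes[keep].astype(float)
    result[:, :4] = clipped[keep] / scale  # surviving boxes are cropped to the image
    return result

boxes = np.array([
    [0.1, 0.1, 0.5, 0.5],    # fully inside the image
    [0.8, 0.8, 1.6, 1.6],    # mostly outside, low visibility
    [0.2, 0.2, 0.205, 0.6],  # only half a pixel wide on a 100-pixel image
])
kept = filter_sketch(boxes, height=100, width=100, min_visibility=0.5)
```

In this run only the first box survives: the second keeps 1/16 of its area after clipping, and the third is narrower than min_width.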
mask_to_bboxes function
mask_to_bboxes(
masks: np.ndarray,
original_bboxes: np.ndarray
)
Convert masks back to bounding boxes.
Parameters
Name | Type | Default | Description |
---|---|---|---|
masks | np.ndarray | - | A numpy array of masks with shape (num_masks, height, width). |
original_bboxes | np.ndarray | - | Original bounding boxes with shape (num_bboxes, 4+). |
Returns
- np.ndarray: A numpy array of bounding boxes with shape (num_masks, 4+).
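One way to picture this step: take the tight box around the nonzero pixels of each mask, then reattach the extra columns (labels, scores) from the original boxes. The sketch below is an assumption-laden illustration, not the library's code. In particular, the exclusive max edge and the [-1, -1, -1, -1] placeholder for empty masks are choices made for this example.

```python
import numpy as np

def masks_to_boxes_sketch(masks, original_bboxes):
    """Hypothetical sketch: tight pixel box per mask, extra columns carried over."""
    boxes = []
    for mask, original in zip(masks, original_bboxes):
        rows, cols = np.nonzero(mask)
        if rows.size == 0:
            # Assumption of this sketch: mark empty masks with -1 coordinates.
            coords = [-1, -1, -1, -1]
        else:
            # +1 makes the max edge exclusive, matching slice semantics.
            coords = [cols.min(), rows.min(), cols.max() + 1, rows.max() + 1]
        boxes.append(np.concatenate([coords, original[4:]]))
    return np.array(boxes, dtype=float)

masks = np.zeros((2, 5, 5), dtype=np.uint8)
masks[0, 1:3, 2:5] = 1  # rows 1-2, columns 2-4
masks[1, 0:4, 0:1] = 1  # rows 0-3, column 0
originals = np.array([[0, 0, 0, 0, 7], [0, 0, 0, 0, 9]])  # fifth column: a label
recovered = masks_to_boxes_sketch(masks, originals)
```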
masks_from_bboxes function
masks_from_bboxes(
bboxes: np.ndarray,
shape: ShapeType | tuple[int, int]
)
Convert bounding boxes to masks.
Parameters
Name | Type | Default | Description |
---|---|---|---|
bboxes | np.ndarray | - | A numpy array of bounding boxes with shape (num_bboxes, 4+). |
shape | ShapeType or tuple[int, int] | - | Image shape (height, width). |
Returns
- np.ndarray: A numpy array of masks with shape (num_bboxes, height, width).
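A short sketch of the per-box variant, where each box gets its own mask. It assumes integer pixel coordinates and a `(height, width)` tuple; the helper name is hypothetical and this is not the library's implementation.

```python
import numpy as np

def boxes_to_masks_sketch(bboxes, shape):
    """Hypothetical sketch: one (height, width) mask per bounding box."""
    height, width = shape
    masks = np.zeros((len(bboxes), height, width), dtype=np.uint8)
    for i, box in enumerate(bboxes):
        x_min, y_min, x_max, y_max = (int(v) for v in box[:4])
        masks[i, y_min:y_max, x_min:x_max] = 1
    return masks

boxes = np.array([[1, 1, 3, 3], [0, 2, 4, 5]])
masks = boxes_to_masks_sketch(boxes, (6, 6))
```

Keeping one mask per box preserves box identity, which a single merged mask cannot do when boxes overlap.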
normalize_bboxes function
normalize_bboxes(
bboxes: np.ndarray,
shape: ShapeType | tuple[int, int]
)
Normalize array of bounding boxes.
Parameters
Name | Type | Default | Description |
---|---|---|---|
bboxes | np.ndarray | - | Denormalized bounding boxes `[(x_min, y_min, x_max, y_max, ...)]`. |
shape | ShapeType or tuple[int, int] | - | Image shape `(height, width)`. |
Returns
- np.ndarray: Normalized bounding boxes `[(x_min, y_min, x_max, y_max, ...)]`.
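Normalization divides x coordinates by the image width and y coordinates by the image height. A minimal sketch, assuming a `(height, width)` tuple and a hypothetical helper name:

```python
import numpy as np

def normalize_sketch(bboxes, shape):
    """Hypothetical sketch: pixel corners -> normalized corners."""
    height, width = shape
    out = bboxes.astype(float)  # copy, so the input stays unchanged
    out[:, [0, 2]] /= width     # x coordinates are divided by image width
    out[:, [1, 3]] /= height    # y coordinates are divided by image height
    return out

pixel_boxes = np.array([[100, 100, 300, 200, 3]])  # fifth column: a label
normalized = normalize_sketch(pixel_boxes, (200, 400))
```

Only the first four columns are rescaled; any extra columns are left as they are.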
union_of_bboxes function
union_of_bboxes(
bboxes: np.ndarray,
erosion_rate: float
)
Calculate the union of bounding boxes. Boxes can be in albumentations or Pascal VOC format.
Parameters
Name | Type | Default | Description |
---|---|---|---|
bboxes | np.ndarray | - | List of bounding boxes |
erosion_rate | float | - | How much each bounding box can be shrunk, useful for erosive cropping. Set this in range [0, 1]. A value of 0 applies no erosion at all; 1.0 can shrink any bbox until it has no area left. |
Returns
- np.ndarray | None: A bounding box `(x_min, y_min, x_max, y_max)` or None if no bboxes are given or if