
albumentations.augmentations.geometric.functional

Functional implementations of geometric image transformations. This module provides low-level functions for geometric operations such as rotation, resizing, flipping, perspective transforms, and affine transformations on images, bounding boxes and keypoints.

Members

function resize_bboxes
function bboxes_rot90
function bboxes_d4
function keypoints_rot90
function keypoints_d4
function resize
function resize_pyvips
function resize_pil
function scale
function keypoints_scale
function perspective
function perspective_images
function perspective_bboxes
function rotation2d_matrix_to_euler_angles
function perspective_keypoints
function is_identity_matrix
function keypoints_affine
function apply_affine_to_points
function calculate_affine_transform_padding
function bboxes_affine_largest_box
function bboxes_affine_ellipse
function bboxes_affine
function to_distance_maps
function validate_if_not_found_coords
function from_distance_maps
function d4
function transpose
function transpose_images
function transpose_volumes
function rot90
function rot90_images
function bboxes_vflip
function bboxes_hflip
function bboxes_transpose
function keypoints_vflip
function keypoints_hflip
function keypoints_transpose
function pad
function pad_with_params
function pad_images_with_params
function remap_keypoints_via_mask
function remap_keypoints
function generate_inverse_distortion_map
function upscale_distortion_maps
function remap_bboxes
function generate_displacement_fields
function pad_bboxes
function validate_bboxes
function shift_bboxes
function get_pad_grid_dimensions
function generate_reflected_bboxes
function flip_bboxes
function distort_image
function bbox_distort_image
function distort_image_keypoints
function generate_distorted_grid_polygons
function pad_keypoints
function validate_keypoints
function shift_keypoints
function generate_reflected_keypoints
function flip_keypoints
function create_affine_transformation_matrix
function compute_transformed_image_bounds
function compute_affine_warp_output_shape
function center
function center_bbox
function generate_grid
function normalize_grid_distortion_steps
function almost_equal_intervals
function generate_shuffled_splits
function split_uniform_grid
function generate_perspective_points
function order_points
function compute_perspective_params
function expand_transform
function create_piecewise_affine_maps
function bboxes_piecewise_affine
function get_dimension_padding
function get_padding_params
function adjust_padding_by_position
function swap_tiles_on_keypoints
function swap_tiles_on_image
function is_valid_component
function bboxes_grid_shuffle
function create_shape_groups
function shuffle_tiles_within_shape_groups
function compute_pairwise_distances
function compute_tps_weights
function tps_transform
function get_camera_matrix_distortion_maps
function get_fisheye_distortion_maps
function generate_control_points
function hflip_images
function vflip_images
function hflip_volumes
function vflip_volumes
function rot90_volumes
function erode
function dilate
function morphology
function bboxes_morphology
function d4_images

resize_bboxes (function)

resize_bboxes(
    bboxes: np.ndarray,
    image_shape: tuple[int, int],
    output_shape: tuple[int, int],
    bbox_type: Literal['hbb', 'obb']
)

Resize bounding boxes to follow an image resize from image_shape to output_shape. Boxes are in normalized coordinates; OBBs also support non-uniform scaling.

Args:
    bboxes (np.ndarray): Bounding boxes in normalized coordinates [x_min, y_min, x_max, y_max, (angle), ...].
    image_shape (tuple[int, int]): Original image shape (height, width).
    output_shape (tuple[int, int]): Target image shape (height, width).
    bbox_type (Literal['hbb', 'obb']): Bounding box type, "hbb" or "obb".
Returns:
    np.ndarray: Resized bounding boxes in normalized coordinates.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
image_shape	tuple[int, int]	-	-
output_shape	tuple[int, int]	-	-
bbox_type	One of: 'hbb' 'obb'	-	-

bboxes_rot90 (function)

bboxes_rot90(
    bboxes: np.ndarray,
    group_element: Literal['e', 'r90', 'r180', 'r270'],
    bbox_type: Literal['hbb', 'obb']
)

Rotate bounding boxes by a multiple of 90° counterclockwise, consistent with np.rot90 applied to the image. Both HBB and OBB are supported; for OBB, the center, size, and angle are updated.

Args:
    bboxes (np.ndarray): Bounding boxes with shape (num_boxes, 4+).
    group_element (Literal['e', 'r90', 'r180', 'r270']): C4 group element to apply.
    bbox_type (Literal['hbb', 'obb']): Bounding box type; OBB uses a center/size/angle update.
Returns:
    np.ndarray: Rotated bounding boxes.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
group_element	One of: 'e' 'r90' 'r180' 'r270'	-	-
bbox_type	One of: 'hbb' 'obb'	-	-
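For the HBB case, a quarter turn can be expressed directly on normalized coordinates. The following is a minimal NumPy sketch, not the library's implementation; the helper name `hbb_rot90_normalized` and the integer `k` (0, 1, 2, 3 standing for 'e', 'r90', 'r180', 'r270') are illustrative assumptions. It only covers axis-aligned boxes and copies any extra columns unchanged.

```python
import numpy as np


def hbb_rot90_normalized(bboxes: np.ndarray, k: int) -> np.ndarray:
    """Rotate normalized HBBs [x_min, y_min, x_max, y_max, ...] by k * 90 degrees CCW.

    Hypothetical helper: the mapping mirrors np.rot90 applied to the image,
    and columns after the first four are copied unchanged.
    """
    out = np.asarray(bboxes, dtype=float).copy()
    for _ in range(k % 4):
        x_min, y_min = out[:, 0].copy(), out[:, 1].copy()
        x_max, y_max = out[:, 2].copy(), out[:, 3].copy()
        # One CCW quarter turn maps a normalized point (x, y) to (y, 1 - x).
        out[:, 0], out[:, 1] = y_min, 1.0 - x_max
        out[:, 2], out[:, 3] = y_max, 1.0 - x_min
    return out
```

Because the coordinates are normalized, the swap of height and width caused by the rotation needs no special handling.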

bboxes_d4 (function)

bboxes_d4(
    bboxes: np.ndarray,
    group_member: Literal['e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't'],
    bbox_type: Literal['hbb', 'obb']
)

Apply a D4 symmetry (rotation or reflection) to bounding boxes. Each box is transformed according to the specified member of the D_4 group: 'e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', or 't'. Both HBB and OBB are supported.

Args:
    bboxes (np.ndarray): Bounding boxes with shape (num_bboxes, 4+). Each row is (x_min, y_min, x_max, y_max, ...).
    group_member (Literal['e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't']): Identifier of the D_4 group transformation to apply.
    bbox_type (Literal['hbb', 'obb']): Bounding box type; OBB uses a center/size/angle update.
Returns:
    np.ndarray: The transformed bounding boxes.
Raises:
    ValueError: If an invalid group member is specified.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
group_member	One of: 'e' 'r90' 'r180' 'r270' 'v' 'hvt' 'h' 't'	-	-
bbox_type	One of: 'hbb' 'obb'	-	-

keypoints_rot90 (function)

keypoints_rot90(
    keypoints: np.ndarray,
    group_element: Literal['e', 'r90', 'r180', 'r270'],
    image_shape: tuple[int, int]
)

Rotate keypoints by a multiple of 90° counterclockwise. The x, y, and angle values are updated; image_shape is required because keypoints are in pixel coordinates.

Args:
    keypoints (np.ndarray): Keypoints with shape (N, 4+) in the format (x, y, angle, scale, ...).
    group_element (Literal['e', 'r90', 'r180', 'r270']): C4 group element to apply.
    image_shape (tuple[int, int]): The shape of the image (height, width).
Returns:
    np.ndarray: The rotated keypoints, with the same shape as the input.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
group_element	One of: 'e' 'r90' 'r180' 'r270'	-	-
image_shape	tuple[int, int]	-	-
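Why image_shape is needed can be seen from how np.rot90 moves a single pixel. The sketch below is a hypothetical helper (`keypoints_rot90_xy`, with an integer `k` for the number of quarter turns), not the library's code; it assumes integer pixel indices, hence the `width - 1` term, and it leaves the angle column untouched, whereas the documented function also updates angles.

```python
import numpy as np


def keypoints_rot90_xy(keypoints: np.ndarray, k: int, image_shape: tuple[int, int]) -> np.ndarray:
    """Rotate keypoint (x, y) pixel coordinates by k * 90 degrees CCW, matching np.rot90.

    Hypothetical sketch: assumes pixel-index coordinates and does not update angles.
    """
    out = np.asarray(keypoints, dtype=float).copy()
    height, width = image_shape
    for _ in range(k % 4):
        x, y = out[:, 0].copy(), out[:, 1].copy()
        # Pixel (row=y, col=x) lands at (row=width - 1 - x, col=y) after np.rot90.
        out[:, 0], out[:, 1] = y, (width - 1) - x
        # The rotated image has swapped dimensions for the next quarter turn.
        height, width = width, height
    return out
```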

keypoints_d4 (function)

keypoints_d4(
    keypoints: np.ndarray,
    group_member: Literal['e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't'],
    image_shape: tuple[int, int],
    **params: Any
)

Apply a D4 symmetry (rotation or reflection) to keypoints. Keypoint coordinates are adjusted according to the specified D_4 group transformation; image_shape is used because keypoints are in pixel coordinates, so that transformed keypoints stay within the image boundaries.

Args:
    keypoints (np.ndarray): Keypoints with shape (N, 4+) in the format (x, y, angle, scale, ...).
    group_member (Literal['e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't']): Identifier of the D_4 group transformation to apply. Valid values are 'e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't'.
    image_shape (tuple[int, int]): The shape of the image (height, width).
    params (Any): Not used.
Returns:
    np.ndarray: The transformed keypoints.
Raises:
    ValueError: If an invalid group member is specified.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
group_member	One of: 'e' 'r90' 'r180' 'r270' 'v' 'hvt' 'h' 't'	-	-
image_shape	tuple[int, int]	-	-
**params	Any	-	-

resize (function)

resize(
    img: ImageType,
    target_shape: tuple[int, int],
    interpolation: int
)

Resize an image to target_shape using the backend selected by the ALBUMENTATIONS_RESIZE environment variable. If the image already has the target size, it is returned unchanged.

Args:
    img (ImageType): Input image.
    target_shape (tuple[int, int]): Target (height, width).
    interpolation (int): Interpolation method.
Returns:
    np.ndarray: Resized image whose spatial size is target_shape; the original channel dimensions are preserved.
Raises:
    NotImplementedError: If the selected backend is not supported.

Parameters

Name	Type	Default	Description
img	ImageType	-	-
target_shape	tuple[int, int]	-	-
interpolation	int	-	-

resize_pyvips (function)

resize_pyvips(
    img: ImageType,
    target_shape: tuple[int, int],
    interpolation: int = 1
)

Resize an image to target_shape using pyvips. The output keeps the input dtype.

Args:
    img (ImageType): The input image as a NumPy array.
    target_shape (tuple[int, int]): The desired output shape (height, width).
    interpolation (int): Interpolation method: 0 for nearest-neighbor, 1 for bilinear (default), 2 for bicubic.
Returns:
    np.ndarray: The resized image as a NumPy array with the original dtype.

Parameters

Name	Type	Default	Description
img	ImageType	-	-
target_shape	tuple[int, int]	-	-
interpolation	int	1	-

resize_pil (function)

resize_pil(
    img: ImageType,
    target_shape: tuple[int, int],
    interpolation: int
)

Resize an image to target_shape using PIL. Grayscale, RGB, RGBA, and other multi-channel images are handled.

Args:
    img (ImageType): The input image as a NumPy array.
    target_shape (tuple[int, int]): The desired output shape (height, width).
    interpolation (int): OpenCV interpolation flag; it is mapped to the corresponding PIL.Image.Resampling constant.
Returns:
    np.ndarray: The resized image as a NumPy array.

Parameters

Name	Type	Default	Description
img	ImageType	-	-
target_shape	tuple[int, int]	-	-
interpolation	int	-	-

scale (function)

scale(
    img: ImageType,
    scale: float,
    interpolation: int
)

Scale an image by the same factor along height and width, preserving the aspect ratio. Calls resize internally.

Args:
    img (ImageType): Input image to scale.
    scale (float): Scale factor. Values > 1 enlarge the image; values < 1 shrink it.
    interpolation (int): Interpolation method (cv2 interpolation flag).
Returns:
    ImageType: Scaled image.

Parameters

Name	Type	Default	Description
img	ImageType	-	-
scale	float	-	-
interpolation	int	-	-

keypoints_scale (function)

keypoints_scale(
    keypoints: np.ndarray,
    scale_x: float,
    scale_y: float
)

Multiply keypoint x coordinates by scale_x and y coordinates by scale_y, for example to map keypoints after a resize or crop. The angle and any other extra columns are unchanged.

Args:
    keypoints (np.ndarray): Keypoints with shape (num_keypoints, 2+).
    scale_x (float): Scale factor for x coordinates.
    scale_y (float): Scale factor for y coordinates.
Returns:
    np.ndarray: Scaled keypoints.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
scale_x	float	-	-
scale_y	float	-	-
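The behavior described above amounts to two column-wise multiplications. A minimal sketch follows; `keypoints_scale_sketch` is a hypothetical name, and the copy step is a choice made here so the input array is not modified.

```python
import numpy as np


def keypoints_scale_sketch(keypoints: np.ndarray, scale_x: float, scale_y: float) -> np.ndarray:
    """Scale x by scale_x and y by scale_y; all other columns are copied unchanged."""
    out = np.array(keypoints, dtype=float, copy=True)
    out[:, 0] *= scale_x
    out[:, 1] *= scale_y
    return out
```

After resizing an image from (old_h, old_w) to (new_h, new_w), the matching factors are scale_x = new_w / old_w and scale_y = new_h / old_h.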

perspective (function)

perspective(
    img: ImageType,
    matrix: np.ndarray,
    max_width: int,
    max_height: int,
    border_val: float | list[float] | np.ndarray,
    border_mode: int,
    keep_size: bool,
    interpolation: int
)

Warp an image with a 3x3 perspective transformation matrix. The output either keeps the original dimensions (keep_size=True) or uses max_width and max_height. Used by the Perspective transform.

Args:
    img (ImageType): Input image to transform.
    matrix (np.ndarray): 3x3 perspective transformation matrix.
    max_width (int): Maximum width of the output image if keep_size is False.
    max_height (int): Maximum height of the output image if keep_size is False.
    border_val (float | list[float] | np.ndarray): Border value(s) used to fill areas outside the transformed image.
    border_mode (int): OpenCV border mode (e.g., cv2.BORDER_CONSTANT, cv2.BORDER_REFLECT).
    keep_size (bool): If True, keep the original image dimensions.
    interpolation (int): Interpolation method for resampling (cv2 interpolation flag).
Returns:
    np.ndarray: Perspective-transformed image.

Parameters

Name	Type	Default	Description
img	ImageType	-	-
matrix	np.ndarray	-	-
max_width	int	-	-
max_height	int	-	-
border_val	One of: float list[float] np.ndarray	-	-
border_mode	int	-	-
keep_size	bool	-	-
interpolation	int	-	-

perspective_images (function)

perspective_images(
    images: np.ndarray,
    matrix: np.ndarray,
    max_width: int,
    max_height: int,
    border_val: float | list[float] | np.ndarray,
    border_mode: int,
    keep_size: bool,
    interpolation: int
)

Apply a perspective transformation to a batch of images of shape (N, H, W, C). A single warp call is used when the images are grayscale and small.

Args:
    images (np.ndarray): Batch of images of shape (N, H, W, C).
    matrix (np.ndarray): 3x3 perspective transformation matrix.
    max_width (int): Maximum width of the output image if keep_size is False.
    max_height (int): Maximum height of the output image if keep_size is False.
    border_val (float | list[float] | np.ndarray): Border value(s) used to fill areas outside the transformed image.
    border_mode (int): OpenCV border mode (e.g., cv2.BORDER_CONSTANT).
    keep_size (bool): If True, keep the original image dimensions.
    interpolation (int): Interpolation method for resampling (cv2 interpolation flag).
Returns:
    np.ndarray: Batch of perspective-transformed images, with the same shape as the input when keep_size is True, or (N, max_height, max_width, C) when it is False.

Parameters

Name	Type	Default	Description
images	np.ndarray	-	-
matrix	np.ndarray	-	-
max_width	int	-	-
max_height	int	-	-
border_val	One of: float list[float] np.ndarray	-	-
border_mode	int	-	-
keep_size	bool	-	-
interpolation	int	-	-

perspective_bboxes (function)

perspective_bboxes(
    bboxes: np.ndarray,
    image_shape: tuple[int, int],
    matrix: np.ndarray,
    max_width: int,
    max_height: int,
    keep_size: bool,
    bbox_type: Literal['hbb', 'obb']
)

Apply a perspective transformation to bounding boxes. Both HBB and OBB are supported; OBBs are transformed through their corner polygons. Attributes beyond the standard coordinates are preserved.

Args:
    bboxes (np.ndarray): Bounding boxes with shape (num_bboxes, 4+). Each row is (x_min, y_min, x_max, y_max, ...). Columns beyond the first 4 are preserved unchanged.
    image_shape (tuple[int, int]): The shape of the image (height, width).
    matrix (np.ndarray): The perspective transformation matrix.
    max_width (int): The maximum width of the output image.
    max_height (int): The maximum height of the output image.
    keep_size (bool): If True, keep the original image size after the transformation.
    bbox_type (Literal['hbb', 'obb']): Bounding box type; the OBB path uses polygons.
Returns:
    np.ndarray: Transformed bounding boxes with the same shape as the input. The first 4 columns hold the transformed coordinates; any additional columns are copied from the input.
Note:
    - Only the coordinate columns (the first 4) are modified; additional attributes are kept unchanged.
    - Coordinates are denormalized and renormalized internally.
Examples:
    >>> bboxes = np.array([[0.1, 0.1, 0.3, 0.3, 1], [0.5, 0.5, 0.8, 0.8, 2]])
    >>> image_shape = (100, 100)
    >>> matrix = np.array([[1.5, 0.2, -20], [-0.1, 1.3, -10], [0.002, 0.001, 1]])
    >>> transformed_bboxes = perspective_bboxes(bboxes, image_shape, matrix, 150, 150, False, "hbb")

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
image_shape	tuple[int, int]	-	-
matrix	np.ndarray	-	-
max_width	int	-	-
max_height	int	-	-
keep_size	bool	-	-
bbox_type	One of: 'hbb' 'obb'	-	-
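For axis-aligned boxes, one common way to warp a box with a homography is to map its four corners (including the division by the homogeneous coordinate) and take their axis-aligned bounds. The sketch below assumes pixel coordinates and omits the normalization, keep_size rescaling, and OBB handling described above; `perspective_hbb_sketch` is a hypothetical name and this is not necessarily how the library computes the result.

```python
import numpy as np


def perspective_hbb_sketch(bboxes_px: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Warp pixel-space HBBs with a 3x3 homography and return the box around the warped corners."""
    out = np.asarray(bboxes_px, dtype=float).copy()
    x_min, y_min, x_max, y_max = (out[:, i] for i in range(4))
    # Corners in order TL, TR, BR, BL -> shape (N, 4, 2).
    corners = np.stack(
        [np.stack(p, axis=-1) for p in ((x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max))],
        axis=1,
    )
    homog = np.concatenate([corners, np.ones(corners.shape[:2] + (1,))], axis=-1)  # (N, 4, 3)
    warped = homog @ np.asarray(matrix, dtype=float).T
    warped = warped[..., :2] / warped[..., 2:3]  # perspective division
    out[:, 0:2] = warped.min(axis=1)
    out[:, 2:4] = warped.max(axis=1)
    return out
```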

rotation2d_matrix_to_euler_angles (function)

rotation2d_matrix_to_euler_angles(
    matrix: np.ndarray,
    y_up: bool
)

Extract the rotation angle, in radians, from a 2D rotation matrix. Used by perspective_keypoints to update keypoint angles.

Args:
    matrix (np.ndarray): 2x2 or 3x3 rotation matrix.
    y_up (bool): True if the Y axis points up.
Returns:
    float: Rotation angle in radians.

Parameters

Name	Type	Default	Description
matrix	np.ndarray	-	-
y_up	bool	-	-
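For a matrix whose top-left block is R = [[cos θ, -sin θ], [sin θ, cos θ]] (possibly times a uniform scale), θ can be recovered with atan2. The sketch below is a hypothetical helper that only illustrates this formula; it does not model the y_up flag, whose exact sign convention is not documented here.

```python
import numpy as np


def rotation_angle_sketch(matrix: np.ndarray) -> float:
    """Angle in radians of the rotation in the top-left 2x2 block (y_up handling omitted)."""
    m = np.asarray(matrix, dtype=float)
    # atan2(sin, cos) is unaffected by a uniform scale factor on both entries.
    return float(np.arctan2(m[1, 0], m[0, 0]))
```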

perspective_keypoints (function)

perspective_keypoints(
    keypoints: np.ndarray,
    image_shape: tuple[int, int],
    matrix: np.ndarray,
    max_width: int,
    max_height: int,
    keep_size: bool
)

Apply a perspective transformation to keypoints, updating x, y, angle, and scale.

Args:
    keypoints (np.ndarray): Array of shape (N, 5+) in the format [x, y, z, angle, scale, ...].
    image_shape (tuple[int, int]): Original image shape (height, width).
    matrix (np.ndarray): 3x3 perspective transformation matrix.
    max_width (int): Maximum width after the transformation.
    max_height (int): Maximum height after the transformation.
    keep_size (bool): Whether to keep the original size.
Returns:
    np.ndarray: Transformed keypoints with the same shape as the input.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
image_shape	tuple[int, int]	-	-
matrix	np.ndarray	-	-
max_width	int	-	-
max_height	int	-	-
keep_size	bool	-	-

is_identity_matrix (function)

is_identity_matrix(
    matrix: np.ndarray
)

Check whether a 3x3 matrix is the identity, i.e. whether np.allclose(matrix, np.eye(3)) holds. Used to skip no-op affine transforms.

Args:
    matrix (np.ndarray): A 3x3 affine transformation matrix.
Returns:
    bool: True if the matrix is an identity matrix, False otherwise.

Parameters

Name	Type	Default	Description
matrix	np.ndarray	-	-

keypoints_affine (function)

keypoints_affine(
    keypoints: np.ndarray,
    matrix: np.ndarray,
    image_shape: tuple[int, int],
    scale: dict[str, float],
    border_mode: int
)

Apply an affine transformation to keypoints. Coordinates, angles, and scales are updated, and reflection padding is applied when the border mode requires it.

Args:
    keypoints (np.ndarray): Keypoints with shape (N, 4+), where N is the number of keypoints. Each keypoint is [x, y, angle, scale, ...].
    matrix (np.ndarray): The 2x3 or 3x3 affine transformation matrix.
    image_shape (tuple[int, int]): Shape of the image (height, width).
    scale (dict[str, float]): Scale factors for the x and y directions, under the keys 'x' and 'y'.
    border_mode (int): Border mode for keypoints near the image edges, e.g. cv2.BORDER_REFLECT_101 or cv2.BORDER_REFLECT.
Returns:
    np.ndarray: Transformed keypoints with the same shape as the input.
Notes:
    - Reflection padding is applied if border_mode is in REFLECT_BORDER_MODES.
    - Coordinates (x, y) are transformed with the affine matrix.
    - Angles are adjusted by the rotation component of the affine transformation.
    - Scales are multiplied by the larger of the x and y scale factors.
    - The @angle_2pi_range decorator keeps angles in the [0, 2π] range.
Examples:
    >>> import cv2
    >>> import numpy as np
    >>> keypoints = np.array([[100, 100, 0, 1]])
    >>> matrix = np.array([[1.5, 0, 10], [0, 1.2, 20]])
    >>> scale = {'x': 1.5, 'y': 1.2}
    >>> transformed_keypoints = keypoints_affine(keypoints, matrix, (480, 640), scale, cv2.BORDER_REFLECT_101)

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
matrix	np.ndarray	-	-
image_shape	tuple[int, int]	-	-
scale	dict[str, float]	-	-
border_mode	int	-	-
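The coordinate, angle, and scale updates listed in the notes can be sketched without the reflection-padding step. This is a hypothetical helper (`keypoints_affine_sketch`), not the library's code; in particular, adding the matrix's rotation angle to each keypoint angle is an assumed sign convention, and border handling is left out entirely.

```python
import numpy as np


def keypoints_affine_sketch(keypoints: np.ndarray, matrix: np.ndarray, scale: dict) -> np.ndarray:
    """Update [x, y, angle, scale, ...] keypoints under a 2x3 or 3x3 affine matrix.

    Assumptions: no reflection padding, and angles are shifted by +atan2(m10, m00).
    """
    out = np.array(keypoints, dtype=float, copy=True)
    m = np.asarray(matrix, dtype=float)[:2]  # the 2x3 affine part
    # Coordinates: linear part plus translation.
    out[:, :2] = out[:, :2] @ m[:, :2].T + m[:, 2]
    # Angles: add the rotation component, wrapped into [0, 2*pi).
    out[:, 2] = np.mod(out[:, 2] + np.arctan2(m[1, 0], m[0, 0]), 2 * np.pi)
    # Scales: multiplied by the larger of the two scale factors.
    out[:, 3] *= max(scale["x"], scale["y"])
    return out
```

With the documented example inputs, the keypoint (100, 100) maps to (160, 140) and its scale becomes 1.5.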

apply_affine_to_points (function)

apply_affine_to_points(
    points: np.ndarray,
    matrix: np.ndarray
)

Apply an affine transformation matrix to a set of (x, y) points of shape (N, 2). The points are mapped in homogeneous coordinates; zero values of the homogeneous coordinate are replaced with a small epsilon to avoid division by zero.

Args:
    points (np.ndarray): Points with shape (N, 2).
    matrix (np.ndarray): 3x3 affine transformation matrix.
Returns:
    np.ndarray: Transformed points with shape (N, 2).

Parameters

Name	Type	Default	Description
points	np.ndarray	-	-
matrix	np.ndarray	-	-
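The homogeneous mapping and the division guard can be sketched as follows. The helper name and the epsilon value of 1e-10 are assumptions for illustration; the documentation above does not state which epsilon the library uses.

```python
import numpy as np


def apply_matrix_to_points_sketch(points: np.ndarray, matrix: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Map (N, 2) points through a 3x3 matrix in homogeneous coordinates."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 3)
    mapped = homog @ np.asarray(matrix, dtype=float).T
    w = mapped[:, 2:3]
    # Replace zero homogeneous coordinates with a small epsilon (value assumed).
    w = np.where(w == 0, eps, w)
    return mapped[:, :2] / w
```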

calculate_affine_transform_padding (function)

calculate_affine_transform_padding(
    matrix: np.ndarray,
    image_shape: tuple[int, int]
)

Calculate the padding needed so that an affine transformation produces no empty or cropped regions. The padding is derived from the image corners mapped through the inverse affine matrix.

Returns:
    (pad_top, pad_bottom, pad_left, pad_right)

Parameters

Name	Type	Default	Description
matrix	np.ndarray	-	-
image_shape	tuple[int, int]	-	-
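One way to read "from inverse affine corners": map the output image corners back through the inverse matrix and pad the input by however far those source points fall outside it. The sketch below is a hypothetical reconstruction under that reading, assuming the output has the same shape as the input; the library's exact rule may differ.

```python
import numpy as np


def _ceil_nonneg(value: float) -> int:
    # Clamp at zero and round away tiny floating-point noise before the ceiling.
    return int(np.ceil(np.round(max(0.0, float(value)), 6)))


def affine_padding_sketch(matrix: np.ndarray, image_shape: tuple[int, int]) -> tuple[int, int, int, int]:
    """Hypothetical: padding so every output corner maps back inside the padded input."""
    height, width = image_shape
    inv = np.linalg.inv(np.asarray(matrix, dtype=float))
    corners = np.array([[0, 0, 1], [width, 0, 1], [width, height, 1], [0, height, 1]], dtype=float)
    src = corners @ inv.T
    src = src[:, :2] / src[:, 2:3]
    pad_left = _ceil_nonneg(-src[:, 0].min())
    pad_right = _ceil_nonneg(src[:, 0].max() - width)
    pad_top = _ceil_nonneg(-src[:, 1].min())
    pad_bottom = _ceil_nonneg(src[:, 1].max() - height)
    return pad_top, pad_bottom, pad_left, pad_right
```

For example, a translation of +10 pixels in x pulls source content from x = -10, so the sketch asks for 10 pixels of left padding.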

bboxes_affine_largest_box (function)

bboxes_affine_largest_box(
    bboxes: np.ndarray,
    matrix: np.ndarray,
    bbox_type: Literal['hbb', 'obb']
)

Apply an affine transformation to bounding boxes using the "largest box" method. Each corner of every box is transformed by the matrix, and the new box is the axis-aligned box that fully encloses the four transformed corners.

Args:
    bboxes (np.ndarray): Bounding boxes with shape (N, 4+), where N is the number of boxes. Each row contains [x_min, y_min, x_max, y_max] followed by any additional attributes (e.g., class labels).
    matrix (np.ndarray): The 3x3 affine transformation matrix to apply.
    bbox_type (Literal['hbb', 'obb']): Bounding box type; the OBB path uses a polygon transform.
Returns:
    np.ndarray: Transformed bounding boxes with the same shape as the input. Each row contains [new_x_min, new_y_min, new_x_max, new_y_max] followed by any additional attributes from the input.
Note:
    - Input boxes are assumed to be in the format [x_min, y_min, x_max, y_max].
    - Each result is the smallest axis-aligned box enclosing the transformed corners. When the transformation includes a rotation, this box can be noticeably larger than the object it contains.
    - Any additional attributes beyond the first 4 coordinates are preserved unchanged.
    - The name "largest box" reflects that the result covers the entire transformed rectangle, so it is never smaller than the ellipse-based estimate of bboxes_affine_ellipse.
Examples:
    >>> bboxes = np.array([[10, 10, 20, 20, 1], [30, 30, 40, 40, 2]])  # Two boxes with class labels
    >>> matrix = np.array([[2, 0, 5], [0, 2, 5], [0, 0, 1]])  # Scale by 2 and translate by (5, 5)
    >>> transformed_bboxes = bboxes_affine_largest_box(bboxes, matrix, "hbb")
    >>> print(transformed_bboxes)
    [[25. 25. 45. 45.  1.]
     [65. 65. 85. 85.  2.]]

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
matrix	np.ndarray	-	-
bbox_type	One of: 'hbb' 'obb'	-	-

bboxes_affine_ellipse (function)

bboxes_affine_ellipse(
    bboxes: np.ndarray,
    matrix: np.ndarray,
    bbox_type: Literal['hbb', 'obb']
)

Apply an affine transformation to bounding boxes using an ellipse approximation. Each box is approximated by an ellipse, points along the ellipse's circumference are transformed, and the new box is the axis-aligned box enclosing the transformed ellipse.

Args:
    bboxes (np.ndarray): Bounding boxes with shape (N, 4+), where N is the number of boxes. Each row contains [x_min, y_min, x_max, y_max] followed by any additional attributes (e.g., class labels).
    matrix (np.ndarray): The 3x3 affine transformation matrix to apply.
    bbox_type (Literal['hbb', 'obb']): Bounding box type; the OBB path uses a polygon transform.
Returns:
    np.ndarray: Transformed bounding boxes with the same shape as the input. Each row contains [new_x_min, new_y_min, new_x_max, new_y_max] followed by any additional attributes from the input.
Note:
    - Input boxes are assumed to be in the format [x_min, y_min, x_max, y_max].
    - The ellipse approximation can give a tighter box than the largest-box method, especially for rotations.
    - Each ellipse is approximated with 360 points, a balance between accuracy and computational cost.
    - Any additional attributes beyond the first 4 coordinates are preserved unchanged.
    - This method may be better suited to objects that are roughly elliptical in shape.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
matrix	np.ndarray	-	-
bbox_type	One of: 'hbb' 'obb'	-	-
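The difference between the two HBB methods shows up clearly under a 45° rotation of a square box. The sketch below implements both ideas as described above (corners versus 360 points on an ellipse); it assumes the ellipse is the one inscribed in each box and works in pixel coordinates. All helper names are hypothetical, and this is not the library's implementation.

```python
import numpy as np


def _apply_affine(points: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Apply the 2x3 part of a 3x3 affine matrix to points of shape (..., 2)."""
    return points @ matrix[:2, :2].T + matrix[:2, 2]


def _enclosing_boxes(bboxes: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Axis-aligned boxes around points of shape (N, P, 2); extra bbox columns are kept."""
    out = np.asarray(bboxes, dtype=float).copy()
    out[:, 0:2] = points.min(axis=1)
    out[:, 2:4] = points.max(axis=1)
    return out


def largest_box_sketch(bboxes: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    b = np.asarray(bboxes, dtype=float)
    x0, y0, x1, y1 = (b[:, i] for i in range(4))
    corners = np.stack(
        [np.stack(p, axis=-1) for p in ((x0, y0), (x1, y0), (x1, y1), (x0, y1))], axis=1
    )  # (N, 4, 2)
    return _enclosing_boxes(b, _apply_affine(corners, matrix))


def ellipse_box_sketch(bboxes: np.ndarray, matrix: np.ndarray, num_points: int = 360) -> np.ndarray:
    b = np.asarray(bboxes, dtype=float)
    cx, cy = (b[:, 0] + b[:, 2]) / 2, (b[:, 1] + b[:, 3]) / 2
    rx, ry = (b[:, 2] - b[:, 0]) / 2, (b[:, 3] - b[:, 1]) / 2
    t = np.linspace(0.0, 2 * np.pi, num_points, endpoint=False)
    # Points on the ellipse inscribed in each box: (N, num_points, 2).
    ellipse = np.stack(
        [cx[:, None] + rx[:, None] * np.cos(t), cy[:, None] + ry[:, None] * np.sin(t)], axis=-1
    )
    return _enclosing_boxes(b, _apply_affine(ellipse, matrix))
```

Rotating the box [10, 10, 20, 20] by 45° about its center, the corner method grows it to about [7.93, 7.93, 22.07, 22.07], while the ellipse (here a circle) keeps it at [10, 10, 20, 20].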

bboxes_affine (function)

bboxes_affine(
    bboxes: np.ndarray,
    matrix: np.ndarray,
    rotate_method: Literal['largest_box', 'ellipse'],
    image_shape: tuple[int, int],
    border_mode: int,
    output_shape: tuple[int, int],
    bbox_type: Literal['hbb', 'obb']
)

Apply an affine transformation to bounding boxes. For HBB, rotate_method selects the largest-box or the ellipse method; OBB always uses a polygon transformation.

For reflection border modes (cv2.BORDER_REFLECT_101, cv2.BORDER_REFLECT), this function:
    1. Calculates the padding needed to avoid information loss.
    2. Applies the padding to the bounding boxes.
    3. Adjusts the transformation matrix to account for the padding.
    4. Applies the affine transformation.
    5. Validates the transformed bounding boxes.
For other border modes, the affine transformation is applied directly, without padding.

Args:
    bboxes (np.ndarray): Input bounding boxes.
    matrix (np.ndarray): Affine transformation matrix.
    rotate_method (Literal['largest_box', 'ellipse']): Method for transforming bounding boxes, 'largest_box' or 'ellipse'. Only applies to HBB (axis-aligned) boxes; ignored for OBB.
    image_shape (tuple[int, int]): Shape of the input image.
    border_mode (int): OpenCV border mode.
    output_shape (tuple[int, int]): Shape of the output image.
    bbox_type (Literal['hbb', 'obb']): Bounding box type. OBB uses a polygon transformation regardless of rotate_method.
Returns:
    np.ndarray: Transformed bounding boxes in normalized coordinates.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
matrix	np.ndarray	-	-
rotate_method	One of: 'largest_box' 'ellipse'	-	-
image_shape	tuple[int, int]	-	-
border_mode	int	-	-
output_shape	tuple[int, int]	-	-
bbox_type	One of: 'hbb' 'obb'	-	-

to_distance_maps (function)

to_distance_maps(
    keypoints: np.ndarray,
    image_shape: tuple[int, int],
    inverted: bool = False
)

Generate an (H, W, N) array of Euclidean distance maps, one per keypoint. This is a helper for image-only augmentations that need keypoint information.

Args:
    keypoints (np.ndarray): Array of shape (N, 2+), where N is the number of keypoints. Each row holds a keypoint's (x, y) coordinates.
    image_shape (tuple[int, int]): Shape of the image (height, width).
    inverted (bool): If True, inverted distance maps are returned: each distance d is replaced by 1/(d+1), so values lie in (0.0, 1.0] and 1.0 marks the exact position of the keypoint.
Returns:
    np.ndarray: A float32 array of shape (H, W, N) containing N distance maps for N keypoints. The value at (y, x, n) is the Euclidean distance from (x, y) to the n-th keypoint, or 1/(d+1) if inverted is True. The height and width match image_shape.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
image_shape	tuple[int, int]	-	-
inverted	bool	False	-
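A vectorized NumPy sketch of these maps follows, assuming distances are measured from integer pixel positions (x = column, y = row) to each keypoint. The name `to_distance_maps_sketch` is hypothetical and the code is not the library's implementation.

```python
import numpy as np


def to_distance_maps_sketch(keypoints: np.ndarray, image_shape: tuple[int, int], inverted: bool = False) -> np.ndarray:
    """Return (H, W, N) Euclidean distance maps, optionally inverted as 1 / (d + 1)."""
    height, width = image_shape
    ys, xs = np.mgrid[0:height, 0:width]  # pixel grid, each (H, W)
    kx = np.asarray(keypoints, dtype=float)[:, 0]
    ky = np.asarray(keypoints, dtype=float)[:, 1]
    # Broadcast (H, W, 1) against (N,) -> (H, W, N).
    d = np.sqrt((xs[..., None] - kx) ** 2 + (ys[..., None] - ky) ** 2)
    if inverted:
        d = 1.0 / (d + 1.0)  # 1.0 exactly at the keypoint, decaying towards 0
    return d.astype(np.float32)
```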

validate_if_not_found_coords (function)

validate_if_not_found_coords(
    if_not_found_coords: Sequence[int] | dict[str, Any] | None
)

Validate and process if_not_found_coords parameter for keypoint transforms. Returns (fill_value, replace_mask). Raises on invalid input.

Parameters

Name	Type	Default	Description
if_not_found_coords	One of: Sequence[int] dict[str, Any] None	-	-

from_distance_maps (function)

from_distance_maps(
    distance_maps: np.ndarray,
    inverted: bool,
    if_not_found_coords: Sequence[int] | dict[str, Any] | None,
    threshold: float | None
)

Convert distance maps of shape (H, W, N) back to keypoint coordinates. This is the inverse of to_distance_maps: for each channel, the keypoint is located at the minimum of a regular distance map or at the maximum of an inverted one. Keypoints that are not found, or whose peak fails the threshold, can be dropped or replaced with fixed coordinates.

Args:
    distance_maps (np.ndarray): A 3D array of shape (height, width, nb_keypoints); each channel is the distance map of one keypoint.
    inverted (bool): If True, the maps are treated as inverted (higher values mean closer to the keypoint). If False, they are treated as regular distance maps (lower values mean closer).
    if_not_found_coords (Sequence[int] | dict[str, Any] | None): Coordinates used for keypoints that are not found or fail the threshold. Can be:
        - None: drop keypoints that are not found.
        - A sequence of two integers: use them as the (x, y) coordinates.
        - A dict with 'x' and 'y' keys: use those values.
    threshold (float | None): Threshold for valid keypoints. For inverted maps, values >= threshold are valid; for regular maps, values <= threshold are valid. If None, all keypoints are considered valid.
Returns:
    np.ndarray: A 2D array of shape (nb_keypoints, 2) with the (x, y) coordinates of the reconstructed keypoints. When if_not_found_coords is None, keypoints that are not found are dropped, so the output may have fewer rows.
Raises:
    ValueError: If distance_maps is not a 3D array.
Notes:
    - The implementation is vectorized, which improves performance with large numbers of keypoints.
    - When threshold is None, all keypoints are considered valid and if_not_found_coords is not used.
    - The distance maps are assumed to be properly normalized and scaled according to the original image dimensions.
Examples:
    >>> distance_maps = np.random.rand(100, 100, 3)  # 3 keypoints
    >>> inverted = True
    >>> if_not_found_coords = [0, 0]
    >>> threshold = 0.5
    >>> keypoints = from_distance_maps(distance_maps, inverted, if_not_found_coords, threshold)
    >>> print(keypoints.shape)
    (3, 2)

Parameters

Name	Type	Default	Description
distance_maps	np.ndarray	-	-
inverted	bool	-	-
if_not_found_coords	One of: Sequence[int] dict[str, Any] None	-	-
threshold	One of: float None	-	-
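The peak search and threshold logic described above can be sketched with argmin/argmax over the flattened spatial axes. This hypothetical helper only supports the None and (x, y)-sequence forms of if_not_found_coords (not the dict form) and is not the library's implementation.

```python
import numpy as np


def from_distance_maps_sketch(distance_maps, inverted, if_not_found_coords=None, threshold=None):
    """Recover (x, y) per channel: argmax for inverted maps, argmin for regular ones."""
    if distance_maps.ndim != 3:
        raise ValueError("distance_maps must have shape (H, W, N)")
    height, width, n = distance_maps.shape
    flat = distance_maps.reshape(-1, n)
    idx = flat.argmax(axis=0) if inverted else flat.argmin(axis=0)
    ys, xs = np.unravel_index(idx, (height, width))
    coords = np.stack([xs, ys], axis=1).astype(float)
    if threshold is None:
        return coords  # every keypoint counts as found
    best = flat[idx, np.arange(n)]
    valid = best >= threshold if inverted else best <= threshold
    if if_not_found_coords is None:
        return coords[valid]  # drop keypoints that were not found
    coords[~valid] = if_not_found_coords  # substitute fixed (x, y) coordinates
    return coords
```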

d4 (function)

d4(
    img: ImageType,
    group_member: Literal['e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't']
)

Apply a D4 symmetry (rotation or reflection) to an image. Each of the eight D_4 dihedral group operations is identified by a group member code. The function is intended for square inputs, for which the output has the same shape; for non-square images, 'r90', 'r270', 't', and 'hvt' swap height and width.

Args:
    img (ImageType): The input image array to transform.
    group_member (Literal['e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't']): Identifier of the transformation to apply. Valid codes:
        - 'e': Identity (no transformation).
        - 'r90': Rotate 90 degrees counterclockwise.
        - 'r180': Rotate 180 degrees.
        - 'r270': Rotate 270 degrees counterclockwise.
        - 'v': Vertical flip.
        - 'hvt': Transpose over the anti-diagonal (second diagonal).
        - 'h': Horizontal flip.
        - 't': Transpose (reflect over the main diagonal).
Returns:
    ImageType: The transformed image array.

Parameters

Name	Type	Default	Description
img	ImageType	-	-
group_member	One of: 'e' 'r90' 'r180' 'r270' 'v' 'hvt' 'h' 't'	-	-
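The sketch below expresses each of the eight group members with plain NumPy operations on the first two axes. `d4_sketch` and `D4_OPS` are hypothetical names. The rotation direction assumes the same counterclockwise convention as `np.rot90`. This is an illustration, not the library's implementation.

```python
import numpy as np

# Each D4 group member maps to a NumPy operation on the (H, W) axes;
# any trailing channel axis is carried along unchanged.
D4_OPS = {
    "e": lambda img: img,                                   # identity
    "r90": lambda img: np.rot90(img, 1),                    # 90 degrees counterclockwise
    "r180": lambda img: np.rot90(img, 2),
    "r270": lambda img: np.rot90(img, 3),
    "v": lambda img: img[::-1],                             # vertical flip: reverse rows
    "h": lambda img: img[:, ::-1],                          # horizontal flip: reverse columns
    "t": lambda img: np.swapaxes(img, 0, 1),                # main-diagonal transpose
    "hvt": lambda img: np.swapaxes(img[::-1, ::-1], 0, 1),  # anti-diagonal transpose
}


def d4_sketch(img, group_member):
    """Hypothetical sketch of a D4 transform; returns a contiguous copy."""
    return np.ascontiguousarray(D4_OPS[group_member](img))
```

For a square image every member preserves the shape. A rectangular image shows which members swap H and W.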

transposefunction

transpose(
    img: ImageType
)

Transpose the first two dimensions (H, W) of an array, so (H, W, ...) becomes (W, H, ...). The order of any additional dimensions is retained. Args: img (ImageType): Input array. Returns: ImageType: Transposed array.

Parameters

Name	Type	Default	Description
img	ImageType	-	-

transpose_imagesfunction

transpose_images(
    images: ImageType
)

Transpose each image in a batch by swapping its H and W axes. This is equivalent to applying `transpose` to every image. The batch and channel axes are unchanged. Args: images (ImageType): Batch of images to transpose with shape: - (N, H, W) for grayscale images - (N, H, W, C) for multi-channel images where N is the batch size, H is height, W is width, C is channels Returns: ImageType: Transposed batch of images with shape: - (N, W, H) for grayscale images - (N, W, H, C) for multi-channel images

Parameters

Name	Type	Default	Description
images	ImageType	-	-

transpose_volumesfunction

transpose_volumes(
    volumes: np.ndarray
)

Transpose each volume in a batch by swapping its H and W axes. The batch, depth and channel axes are unchanged, so this is equivalent to applying `transpose` to every depth slice. Args: volumes (np.ndarray): Batch of volumes to transpose with shape: - (N, D, H, W) for grayscale volumes - (N, D, H, W, C) for multi-channel volumes where N is the batch size, D is depth, H is height, W is width, C is channels Returns: np.ndarray: Transposed batch of volumes with shape: - (N, D, W, H) for grayscale volumes - (N, D, W, H, C) for multi-channel volumes

Parameters

Name	Type	Default	Description
volumes	np.ndarray	-	-
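Both batch transposes reduce to swapping two axes. The hypothetical helpers below use `np.swapaxes`, which returns a view rather than a copy. Call `np.ascontiguousarray` on the result if downstream code, for example OpenCV, needs contiguous memory.

```python
import numpy as np


def transpose_images_sketch(images):
    """Hypothetical helper: (N, H, W[, C]) -> (N, W, H[, C]) by swapping axes 1 and 2."""
    return np.swapaxes(images, 1, 2)


def transpose_volumes_sketch(volumes):
    """Hypothetical helper: (N, D, H, W[, C]) -> (N, D, W, H[, C]) by swapping axes 2 and 3."""
    return np.swapaxes(volumes, 2, 3)
```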

rot90function

rot90(
    img: ImageType,
    group_element: Literal['e', 'r90', 'r180', 'r270']
)

Rotate an image counterclockwise by a multiple of 90°. The multiple is selected by a C4 group element ('e', 'r90', 'r180', 'r270'), which matches `np.rot90` with k = 0, 1, 2, 3. Used for D4-style augmentation. The dtype is preserved, and 'r90' and 'r270' swap height and width. Args: img (ImageType): The input image to rotate. group_element (Literal['e', 'r90', 'r180', 'r270']): C4 group element to apply. Returns: ImageType: The rotated image.

Parameters

Name	Type	Default	Description
img	ImageType	-	-
group_element	One of: 'e' 'r90' 'r180' 'r270'	-	-

rot90_imagesfunction

rot90_images(
    images: ImageType,
    group_element: Literal['e', 'r90', 'r180', 'r270']
)

Rotate every image in a batch counterclockwise by the same multiple of 90°, selected by a C4 group element. This is equivalent to applying `rot90` to each image. The dtype is preserved, and 'r90' and 'r270' swap H and W. Args: images (ImageType): Batch of images to rotate with shape: - (N, H, W) for grayscale images - (N, H, W, C) for multi-channel images where N is the batch size, H is height, W is width, C is channels group_element (Literal['e', 'r90', 'r180', 'r270']): C4 group element to apply. Returns: ImageType: Rotated batch of images with shape: - (N, W, H) for grayscale images when group_element is r90 or r270 - (N, H, W) for grayscale images when group_element is e or r180 - (N, W, H, C) for multi-channel images when group_element is r90 or r270 - (N, H, W, C) for multi-channel images when group_element is e or r180

Parameters

Name	Type	Default	Description
images	ImageType	-	-
group_element	One of: 'e' 'r90' 'r180' 'r270'	-	-
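One way to rotate a whole batch at once is to pass `axes=(1, 2)` to `np.rot90`, so the rotation happens in the (H, W) plane and the batch axis is left alone. The sketch below does this. `rot90_images_sketch` and `QUARTER_TURNS` are hypothetical names, not part of the library.

```python
import numpy as np

# Number of counterclockwise quarter turns for each C4 group element.
QUARTER_TURNS = {"e": 0, "r90": 1, "r180": 2, "r270": 3}


def rot90_images_sketch(images, group_element):
    """Hypothetical sketch: rotate every image in an (N, H, W[, C]) batch."""
    # axes=(1, 2) rotates the H/W plane, leaving N (and any C axis) untouched.
    return np.rot90(images, k=QUARTER_TURNS[group_element], axes=(1, 2))
```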

bboxes_vflipfunction

bboxes_vflip(
    bboxes: np.ndarray,
    bbox_type: Literal['hbb', 'obb']
)

Flip bounding boxes vertically. In normalized coordinates, y_min and y_max are mirrored and exchanged: new y_min = 1 - y_max and new y_max = 1 - y_min. Supports HBB and OBB; for OBB the center, size and angle are updated. Used by VerticalFlip. Args: bboxes (np.ndarray): Array of bounding boxes with shape (num_boxes, 4+). bbox_type (Literal['hbb', 'obb']): Bounding box type; OBB uses a center/size/angle update. Returns: np.ndarray: Vertically flipped bounding boxes.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
bbox_type	One of: 'hbb' 'obb'	-	-

bboxes_hflipfunction

bboxes_hflip(
    bboxes: np.ndarray,
    bbox_type: Literal['hbb', 'obb']
)

Flip bounding boxes horizontally. In normalized coordinates, x_min and x_max are mirrored and exchanged: new x_min = 1 - x_max and new x_max = 1 - x_min. Supports HBB and OBB; for OBB the center, size and angle are updated. Used by HorizontalFlip. Args: bboxes (np.ndarray): Array of bounding boxes with shape (num_boxes, 4+). bbox_type (Literal['hbb', 'obb']): Bounding box type; OBB uses a center/size/angle update. Returns: np.ndarray: Horizontally flipped bounding boxes.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
bbox_type	One of: 'hbb' 'obb'	-	-
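The HBB case of the two normalized flips reduces to a mirror-and-swap of two columns. The hypothetical helpers below cover only axis-aligned boxes `[x_min, y_min, x_max, y_max, ...]` in normalized coordinates. The OBB center/size/angle update is not shown, and extra columns are copied unchanged.

```python
import numpy as np


def bboxes_hflip_hbb_sketch(bboxes):
    """Hypothetical sketch: horizontal flip of normalized axis-aligned boxes."""
    flipped = bboxes.copy()
    # The new x_min comes from the old x_max (and vice versa), mirrored about x = 0.5.
    flipped[:, 0] = 1.0 - bboxes[:, 2]
    flipped[:, 2] = 1.0 - bboxes[:, 0]
    return flipped


def bboxes_vflip_hbb_sketch(bboxes):
    """Hypothetical sketch: vertical flip of normalized axis-aligned boxes."""
    flipped = bboxes.copy()
    # The new y_min comes from the old y_max (and vice versa), mirrored about y = 0.5.
    flipped[:, 1] = 1.0 - bboxes[:, 3]
    flipped[:, 3] = 1.0 - bboxes[:, 1]
    return flipped
```

Flipping twice returns the original boxes, which is a quick sanity check for any flip implementation.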

bboxes_transposefunction

bboxes_transpose(
    bboxes: np.ndarray,
    bbox_type: Literal['hbb', 'obb']
)

Transpose bounding boxes along the main diagonal by swapping their x and y coordinates (normalized). For OBB the angle is updated as well. Used by the Transpose transform. Args: bboxes (np.ndarray): Array of bounding boxes with shape (num_boxes, 4+). bbox_type (Literal['hbb', 'obb']): Bounding box type; OBB uses a center/size/angle update. Returns: np.ndarray: Transposed bounding boxes.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
bbox_type	One of: 'hbb' 'obb'	-	-

keypoints_vflipfunction

keypoints_vflip(
    keypoints: np.ndarray,
    rows: int
)

Flip keypoints vertically in pixel coordinates: y becomes (rows - 1) - y. The angle is negated to match the mirrored orientation, and the scale is preserved. Used by the VerticalFlip transform. Args: keypoints (np.ndarray): Array of keypoints with shape (num_keypoints, 2+). rows (int): Number of rows (height) in the image. Returns: np.ndarray: Vertically flipped keypoints.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
rows	int	-	-

keypoints_hflipfunction

keypoints_hflip(
    keypoints: np.ndarray,
    cols: int
)

Flip keypoints horizontally in pixel coordinates: x becomes (cols - 1) - x. The angle becomes pi - angle to match the mirrored orientation, and the scale is preserved. Used by the HorizontalFlip transform. Args: keypoints (np.ndarray): Array of keypoints with shape (num_keypoints, 2+). cols (int): Number of columns (width) in the image. Returns: np.ndarray: Horizontally flipped keypoints.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
cols	int	-	-
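The coordinate part of the two keypoint flips follows directly from the formulas above. The hypothetical helpers below only touch the x or y column and deliberately leave angle handling out. Every other column is copied unchanged, so they are not a full replacement for the library functions.

```python
import numpy as np


def keypoints_hflip_xy_sketch(keypoints, cols):
    """Hypothetical sketch: mirror x about the vertical center line, x -> (cols - 1) - x."""
    flipped = keypoints.astype(np.float32, copy=True)
    flipped[:, 0] = (cols - 1) - keypoints[:, 0]
    return flipped


def keypoints_vflip_xy_sketch(keypoints, rows):
    """Hypothetical sketch: mirror y about the horizontal center line, y -> (rows - 1) - y."""
    flipped = keypoints.astype(np.float32, copy=True)
    flipped[:, 1] = (rows - 1) - keypoints[:, 1]
    return flipped
```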

keypoints_transposefunction

keypoints_transpose(
    keypoints: np.ndarray
)

Transpose keypoints along the main diagonal by swapping their x and y coordinates; the angle is updated accordingly. Used by the Transpose transform. Args: keypoints (np.ndarray): Array of keypoints with shape (num_keypoints, 2+). Returns: np.ndarray: Transposed keypoints.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-

padfunction

pad(
    img: ImageType,
    min_height: int,
    min_width: int,
    border_mode: int,
    value: tuple[float, ...] | float | None
)

Pad an image so that it is at least min_height pixels high and min_width pixels wide. Padding is added only along a dimension that is smaller than the requested minimum. It is split as evenly as possible between opposite sides, and when the missing amount is odd the extra pixel goes to the bottom or right side. Args: img (ImageType): Input image to pad. min_height (int): Minimum height of the output image. min_width (int): Minimum width of the output image. border_mode (int): OpenCV border mode for padding. value (tuple[float, ...] | float | None): Value(s) to fill the border pixels. Returns: np.ndarray: Padded image with dimensions at least (min_height, min_width).

Parameters

Name	Type	Default	Description
img	ImageType	-	-
min_height	int	-	-
min_width	int	-	-
border_mode	int	-	-
value	One of: tuple[float, ...] float None	-	-
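The even split of the padding can be sketched with `np.pad`. The real function accepts an OpenCV border mode, but this sketch only covers a constant fill. `even_padding` and `pad_constant_sketch` are hypothetical names. The floor-division split, which puts the odd pixel at the bottom or right, is an assumption that matches the description above.

```python
import numpy as np


def even_padding(size, min_size):
    """Split the missing pixels between the two sides; an odd pixel goes to the second side."""
    if size >= min_size:
        return 0, 0
    before = (min_size - size) // 2
    return before, min_size - size - before


def pad_constant_sketch(img, min_height, min_width, value=0):
    """Hypothetical sketch of minimum-size padding with a constant fill."""
    height, width = img.shape[:2]
    top, bottom = even_padding(height, min_height)
    left, right = even_padding(width, min_width)
    # Pad only H and W; any channel axis gets (0, 0).
    pad_width = [(top, bottom), (left, right)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad_width, mode="constant", constant_values=value)
```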

pad_with_paramsfunction

pad_with_params(
    img: ImageType,
    h_pad_top: int,
    h_pad_bottom: int,
    w_pad_left: int,
    w_pad_right: int,
    border_mode: int,
    value: tuple[float, ...] | float | None
)

Pad an image with an explicit number of pixels on each side. Used by Pad and PadIfNeeded. Args: img (ImageType): Input image to pad. h_pad_top (int): Number of pixels to add at the top. h_pad_bottom (int): Number of pixels to add at the bottom. w_pad_left (int): Number of pixels to add on the left. w_pad_right (int): Number of pixels to add on the right. border_mode (int): OpenCV border mode for padding. value (tuple[float, ...] | float | None): Value(s) to fill the border pixels. Returns: np.ndarray: Padded image.

Parameters

Name	Type	Default	Description
img	ImageType	-	-
h_pad_top	int	-	-
h_pad_bottom	int	-	-
w_pad_left	int	-	-
w_pad_right	int	-	-
border_mode	int	-	-
value	One of: tuple[float, ...] float None	-	-

pad_images_with_paramsfunction

pad_images_with_params(
    images: ImageType,
    h_pad_top: int,
    h_pad_bottom: int,
    w_pad_left: int,
    w_pad_right: int,
    border_mode: int,
    value: tuple[float, ...] | float | None
)

Pad every image in a batch of shape (N, H, W, C) with an explicit number of pixels on each side. It takes the same parameters as `pad_with_params` and applies the same padding to each image in the batch. Args: images (ImageType): Input batch of images to pad. h_pad_top (int): Number of pixels to add at the top. h_pad_bottom (int): Number of pixels to add at the bottom. w_pad_left (int): Number of pixels to add on the left. w_pad_right (int): Number of pixels to add on the right. border_mode (int): OpenCV border mode for padding. value (tuple[float, ...] | float | None): Value(s) to fill the border pixels. Returns: np.ndarray: Padded batch of images.

Parameters

Name	Type	Default	Description
images	ImageType	-	-
h_pad_top	int	-	-
h_pad_bottom	int	-	-
w_pad_left	int	-	-
w_pad_right	int	-	-
border_mode	int	-	-
value	One of: tuple[float, ...] float None	-	-

remap_keypoints_via_maskfunction

remap_keypoints_via_mask(
    keypoints: np.ndarray,
    map_x: np.ndarray,
    map_y: np.ndarray,
    image_shape: tuple[int, int]
)

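Remap keypoints through the coordinate maps map_x and map_y by way of a mask and cv2.remap, which yields the new (x, y) position of each keypoint. image_shape is the (height, width) of the image. Used by distortion transforms that carry keypoints.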

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
map_x	np.ndarray	-	-
map_y	np.ndarray	-	-
image_shape	tuple[int, int]	-	-

remap_keypointsfunction

remap_keypoints(
    keypoints: np.ndarray,
    map_x: np.ndarray,
    map_y: np.ndarray,
    image_shape: tuple[int, int]
)

Transform keypoints using a coordinate mapping (map_x, map_y), as done by remap-based distortions. This function applies the inverse of the mapping defined by map_x and map_y to keypoint coordinates. The inverse is necessary because, as with cv2.remap, the maps give for every destination pixel the source position it samples from. Keypoints, by contrast, must be moved from their source positions to destination positions. Args: keypoints (np.ndarray): Array of keypoints with shape (N, 2+), where the first two columns are x and y coordinates. map_x (np.ndarray): Map of x-coordinates with shape equal to image_shape. map_y (np.ndarray): Map of y-coordinates with shape equal to image_shape. image_shape (tuple[int, int]): Shape (height, width) of the original image. Returns: np.ndarray: Transformed keypoints with the same shape as the input keypoints. Returns an empty array if the input keypoints array is empty.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
map_x	np.ndarray	-	-
map_y	np.ndarray	-	-
image_shape	tuple[int, int]	-	-
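To make the inverse-mapping argument concrete, here is a brute-force sketch. It is not the library's method, which may interpolate instead. For each source keypoint it searches for the output pixel whose sampled source position `(map_x, map_y)` is closest. `remap_keypoints_nearest_sketch` is a hypothetical name, and the result is quantized to whole pixels.

```python
import numpy as np


def remap_keypoints_nearest_sketch(keypoints, map_x, map_y):
    """Hypothetical sketch: invert a remap for keypoints by nearest-neighbour search.

    Output pixel (v, u) samples the source at (map_x[v, u], map_y[v, u]), so a
    source keypoint (x, y) lands on the output pixel whose sample position is
    closest to (x, y). Extra columns (angle, scale, ...) are copied unchanged.
    """
    if len(keypoints) == 0:
        return keypoints.copy()
    out = keypoints.astype(np.float32, copy=True)
    for i, (x, y) in enumerate(keypoints[:, :2]):
        dist2 = (map_x - x) ** 2 + (map_y - y) ** 2
        v, u = np.unravel_index(np.argmin(dist2), dist2.shape)
        out[i, 0], out[i, 1] = u, v
    return out
```

With maps that sample two pixels to the right (`map_x = x + 2`), image content moves two pixels to the left. A keypoint at source x = 5 therefore ends up at x = 3.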

generate_inverse_distortion_mapfunction

generate_inverse_distortion_map(
    map_x: np.ndarray,
    map_y: np.ndarray,
    shape: tuple[int, int]
)

Generate an inverse mapping for strong distortions from the forward maps map_x and map_y. The result is the inverse map used for sampling. Used by PiecewiseAffine and similar transforms.

Parameters

Name	Type	Default	Description
map_x	np.ndarray	-	-
map_y	np.ndarray	-	-
shape	tuple[int, int]	-	-

upscale_distortion_mapsfunction

upscale_distortion_maps(
    map_x: np.ndarray,
    map_y: np.ndarray,
    target_shape: tuple[int, int],
    interpolation: int = cv2.INTER_LINEAR
)

Upscale distortion maps from a lower resolution to target_shape, resampling with the given OpenCV interpolation method. This is used when distortion maps are generated at a lower resolution for performance and then upscaled to the original image size. Args: map_x (np.ndarray): X-coordinate distortion map (generated at lower resolution). map_y (np.ndarray): Y-coordinate distortion map (generated at lower resolution). target_shape (tuple[int, int]): Target shape (height, width) to upscale to. interpolation (int): OpenCV interpolation method. Returns: tuple[np.ndarray, np.ndarray]: Upscaled distortion maps with target_shape.

Parameters

Name	Type	Default	Description
map_x	np.ndarray	-	-
map_y	np.ndarray	-	-
target_shape	tuple[int, int]	-	-
interpolation	int	cv2.INTER_LINEAR	-

remap_bboxesfunction

remap_bboxes(
    bboxes: np.ndarray,
    map_x: np.ndarray,
    map_y: np.ndarray,
    image_shape: tuple[int, int],
    bbox_type: Literal['hbb', 'obb']
)

Remap bounding boxes through the maps map_x and map_y, for use in distortion transforms. Each box is converted to a mask, the mask is remapped, and the box is recovered from the result. Both HBB and OBB types are supported. Args: bboxes (np.ndarray): Bounding boxes array. map_x (np.ndarray): X displacement map. map_y (np.ndarray): Y displacement map. image_shape (tuple[int, int]): Image shape (height, width). bbox_type (Literal['hbb', 'obb']): Type of bounding box: "hbb" for axis-aligned or "obb" for oriented. Returns: np.ndarray: Remapped bounding boxes.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
map_x	np.ndarray	-	-
map_y	np.ndarray	-	-
image_shape	tuple[int, int]	-	-
bbox_type	One of: 'hbb' 'obb'	-	-

generate_displacement_fieldsfunction

generate_displacement_fields(
    image_shape: tuple[int, int],
    alpha: float,
    sigma: float,
    same_dxdy: bool,
    kernel_size: tuple[int, int],
    random_generator: np.random.Generator,
    noise_distribution: Literal['gaussian', 'uniform']
)

Generate displacement fields for the elastic transform. This function draws noise from either a Gaussian or a uniform distribution and normalizes it to the range [-1, 1]. It then shapes the noise into displacement fields using alpha, sigma and kernel_size, and random_generator makes the result reproducible. Args: image_shape (tuple[int, int]): The shape of the image as (height, width). alpha (float): The alpha parameter for the elastic transform. sigma (float): The sigma parameter for the elastic transform. same_dxdy (bool): Whether to use the same displacement field for both x and y directions. kernel_size (tuple[int, int]): The size of the kernel for the elastic transform. random_generator (np.random.Generator): The random number generator to use. noise_distribution (Literal['gaussian', 'uniform']): The distribution of the noise. Returns: tuple[np.ndarray, np.ndarray]: The displacement fields along x and along y. When same_dxdy is True, the same field is used for both directions.

Parameters

Name	Type	Default	Description
image_shape	tuple[int, int]	-	-
alpha	float	-	-
sigma	float	-	-
same_dxdy	bool	-	-
kernel_size	tuple[int, int]	-	-
random_generator	np.random.Generator	-	-
noise_distribution	One of: 'gaussian' 'uniform'	-	-
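The classic elastic-deformation recipe is to draw random noise, smooth it with a Gaussian and scale it by alpha. The NumPy-only sketch below follows that recipe and is not the library's code. It draws standard-normal or uniform [-1, 1] noise and skips any extra normalization step. It also assumes odd kernel sizes. All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel_1d(size, sigma):
    """Normalized 1D Gaussian kernel; size is assumed to be odd."""
    radius = size // 2
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def smooth(field, kernel_size, sigma):
    """Separable Gaussian blur (rows, then columns) with reflect padding."""
    for axis, size in ((0, kernel_size[0]), (1, kernel_size[1])):
        kernel = gaussian_kernel_1d(size, sigma)
        pad = [(0, 0), (0, 0)]
        pad[axis] = (size // 2, size // 2)
        padded = np.pad(field, pad, mode="reflect")
        # 'valid' convolution of the padded line gives back the original length.
        field = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), axis, padded)
    return field


def displacement_fields_sketch(image_shape, alpha, sigma, same_dxdy, kernel_size,
                               random_generator, noise_distribution="gaussian"):
    """Hypothetical sketch: return (dx, dy) displacement fields of shape image_shape."""
    n_fields = 1 if same_dxdy else 2
    shape = (n_fields, *image_shape)
    if noise_distribution == "gaussian":
        noise = random_generator.standard_normal(shape)
    else:
        noise = random_generator.uniform(-1.0, 1.0, shape)
    fields = [smooth(f, kernel_size, sigma) * alpha for f in noise]
    dx = fields[0]
    dy = fields[0] if same_dxdy else fields[1]
    return dx, dy
```

A larger sigma gives smoother, more rubber-like warps. alpha sets how many pixels the warp moves things.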

pad_bboxesfunction

pad_bboxes(
    bboxes: np.ndarray,
    pad_top: int,
    pad_bottom: int,
    pad_left: int,
    pad_right: int,
    border_mode: int,
    image_shape: tuple[int, int]
)

Adjust bounding boxes for an image that is padded by pad_top, pad_bottom, pad_left and pad_right pixels. This function handles both reflection and plain padding. Boxes are shifted by the padding and, for reflective border modes, also reflected into the padded area. Args: bboxes (np.ndarray): The bounding boxes to pad. pad_top (int): Number of pixels added at the top of the image. pad_bottom (int): Number of pixels added at the bottom of the image. pad_left (int): Number of pixels added on the left of the image. pad_right (int): Number of pixels added on the right of the image. border_mode (int): The border mode to use. image_shape (tuple[int, int]): The shape of the image as (height, width). Returns: np.ndarray: The padded bounding boxes.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
pad_top	int	-	-
pad_bottom	int	-	-
pad_left	int	-	-
pad_right	int	-	-
border_mode	int	-	-
image_shape	tuple[int, int]	-	-
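For a constant (non-reflective) border, padding only moves the image origin, so the boxes are shifted by the top and left pads. The hypothetical helper below shows that case for pixel-coordinate boxes. Working in pixels rather than normalized coordinates is an assumption of this sketch, and the reflection branch is not covered.

```python
import numpy as np


def pad_bboxes_constant_sketch(bboxes, pad_top, pad_left):
    """Hypothetical sketch: shift pixel-coordinate boxes for a constant-border pad."""
    shifted = bboxes.astype(np.float32, copy=True)
    shifted[:, [0, 2]] += pad_left  # x_min and x_max move right by the left pad
    shifted[:, [1, 3]] += pad_top   # y_min and y_max move down by the top pad
    return shifted
```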

validate_bboxesfunction

validate_bboxes(
    bboxes: np.ndarray,
    image_shape: Sequence[int]
)

Validate bounding boxes and remove invalid ones. Boxes are checked against the image bounds, and empty or out-of-image boxes are removed; only the array of valid boxes is returned. Args: bboxes (np.ndarray): Array of bounding boxes with shape (n, 4) where each row is [x_min, y_min, x_max, y_max]. image_shape (Sequence[int]): Shape of the image as (height, width). Returns: np.ndarray: Array of valid bounding boxes, potentially with fewer boxes than the input. Examples: >>> bboxes = np.array([[10, 20, 30, 40], [-10, -10, 5, 5], [100, 100, 120, 120]]) >>> valid_bboxes = validate_bboxes(bboxes, (100, 100)) >>> print(valid_bboxes) [[10 20 30 40]]

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
image_shape	Sequence[int]	-	-

shift_bboxesfunction

shift_bboxes(
    bboxes: np.ndarray,
    shift_vector: np.ndarray
)

Shift bounding boxes by a given (dx, dy) offset, as needed by crop and shift transforms. The offset is supplied as a 4-element shift_vector so that it can be added directly to [x_min, y_min, x_max, y_max]. Args: bboxes (np.ndarray): Array of bounding boxes with shape (n, m) where n is the number of bboxes and m >= 4. The first 4 columns are [x_min, y_min, x_max, y_max]. shift_vector (np.ndarray): Vector to shift the bounding boxes by, with shape (4,) for [shift_x, shift_y, shift_x, shift_y]. Returns: np.ndarray: Shifted bounding boxes with the same shape as input.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
shift_vector	np.ndarray	-	-

get_pad_grid_dimensionsfunction

get_pad_grid_dimensions(
    pad_top: int,
    pad_bottom: int,
    pad_left: int,
    pad_right: int,
    image_shape: tuple[int, int]
)

Calculate the grid dimensions for reflection padding, together with the position of the original image within that grid. Args: pad_top (int): Number of pixels to pad above the image. pad_bottom (int): Number of pixels to pad below the image. pad_left (int): Number of pixels to pad to the left of the image. pad_right (int): Number of pixels to pad to the right of the image. image_shape (tuple[int, int]): Shape of the original image as (height, width). Returns: dict[str, tuple[int, int]]: A dictionary containing: - 'grid_shape': A tuple (grid_rows, grid_cols) where: - grid_rows (int): Number of times the image needs to be repeated vertically. - grid_cols (int): Number of times the image needs to be repeated horizontally. - 'original_position': A tuple (original_row, original_col) where: - original_row (int): Row index of the original image in the grid. - original_col (int): Column index of the original image in the grid.

Parameters

Name	Type	Default	Description
pad_top	int	-	-
pad_bottom	int	-	-
pad_left	int	-	-
pad_right	int	-	-
image_shape	tuple[int, int]	-	-
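A natural way to size the reflection grid is to add, on each side, enough whole copies of the image to cover that side's padding. The sketch below computes it that way and returns the same dictionary layout as described above. `pad_grid_dimensions_sketch` is a hypothetical name, and the ceiling-based count is an assumption about how the library sizes the grid.

```python
import math


def pad_grid_dimensions_sketch(pad_top, pad_bottom, pad_left, pad_right, image_shape):
    """Hypothetical sketch: grid of image copies needed to cover the padding."""
    rows, cols = image_shape[:2]
    # Whole image copies needed above, below, left and right of the original.
    tiles_up = math.ceil(pad_top / rows)
    tiles_down = math.ceil(pad_bottom / rows)
    tiles_left = math.ceil(pad_left / cols)
    tiles_right = math.ceil(pad_right / cols)
    return {
        "grid_shape": (1 + tiles_up + tiles_down, 1 + tiles_left + tiles_right),
        "original_position": (tiles_up, tiles_left),
    }
```

For example, padding a 20 x 10 image by 10 pixels on top, 25 on the left and 5 on the right needs one copy above, three to the left and one to the right.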

generate_reflected_bboxesfunction

generate_reflected_bboxes(
    bboxes: np.ndarray,
    grid_dims: dict[str, tuple[int, int]],
    image_shape: tuple[int, int],
    center_in_origin: bool = False
)

Generate reflected bounding boxes for the entire reflection grid from the original boxes and the grid layout. Used by Mosaic and reflection-based crops. Args: bboxes (np.ndarray): Original bounding boxes. grid_dims (dict[str, tuple[int, int]]): Grid dimensions and original position. image_shape (tuple[int, int]): Shape of the original image as (height, width). center_in_origin (bool): If True, center the grid at the origin. Default is False. Returns: np.ndarray: Array of reflected and shifted bounding boxes for the entire grid.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
grid_dims	dict[str, tuple[int, int]]	-	-
image_shape	tuple[int, int]	-	-
center_in_origin	bool	False	-

flip_bboxesfunction

flip_bboxes(
    bboxes: np.ndarray,
    flip_horizontal: bool = False,
    flip_vertical: bool = False,
    image_shape: tuple[int, int] = (0, 0)
)

Flip bounding boxes horizontally and/or vertically, as selected by flip_horizontal and flip_vertical, within an image of shape image_shape. Args: bboxes (np.ndarray): Array of bounding boxes with shape (n, m) where each row is [x_min, y_min, x_max, y_max, ...]. flip_horizontal (bool): Whether to flip horizontally. flip_vertical (bool): Whether to flip vertically. image_shape (tuple[int, int]): Shape of the image as (height, width). Returns: np.ndarray: Flipped bounding boxes.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
flip_horizontal	bool	False	-
flip_vertical	bool	False	-
image_shape	tuple[int, int]	(0, 0)	-

distort_imagefunction

distort_image(
    image: np.ndarray,
    generated_mesh: np.ndarray,
    interpolation: int
)

Apply a perspective distortion to an image according to a generated mesh. Each mesh cell is warped separately, and interpolation selects the resampling method. Used by PiecewiseAffine-style transforms. This function applies a perspective transformation to each cell of the image defined by the generated mesh. The distortion is applied using OpenCV's perspective transformation and blending techniques. Args: image (np.ndarray): The input image to be distorted. Can be a 2D grayscale image or a 3D color image. generated_mesh (np.ndarray): A 2D array where each row represents a quadrilateral cell as [x1, y1, x2, y2, dst_x1, dst_y1, dst_x2, dst_y2, dst_x3, dst_y3, dst_x4, dst_y4]. The first four values define the source rectangle, and the last eight values define the destination quadrilateral. interpolation (int): Interpolation method to be used in the perspective transformation. Should be one of the OpenCV interpolation flags (e.g., cv2.INTER_LINEAR). Returns: np.ndarray: The distorted image with the same shape and dtype as the input image. Note: - The function preserves the channel dimension of the input image. - Each cell of the generated mesh is transformed independently and then blended into the output image. - The distortion is applied using perspective transformation, which allows for more complex distortions compared to affine transformations. Examples: >>> image = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8) >>> mesh = np.array([[0, 0, 50, 50, 5, 5, 45, 5, 45, 45, 5, 45]]) >>> distorted = distort_image(image, mesh, cv2.INTER_LINEAR) >>> distorted.shape (100, 100, 3)

Parameters

Name	Type	Default	Description
image	np.ndarray	-	-
generated_mesh	np.ndarray	-	-
interpolation	int	-	-

bbox_distort_imagefunction

bbox_distort_image(
    bboxes: np.ndarray,
    generated_mesh: np.ndarray,
    image_shape: tuple[int, int]
)

Distort bounding boxes according to a generated mesh. Each box is warped by the mesh cells and then clipped to image_shape. Used by PiecewiseAffine with bounding boxes. This function applies a perspective transformation to each bounding box based on the provided generated mesh. It ensures that the bounding boxes are clipped to the image boundaries after transformation. Args: bboxes (np.ndarray): The bounding boxes to distort. generated_mesh (np.ndarray): The generated mesh to distort the bounding boxes with. image_shape (tuple[int, int]): The shape of the image as (height, width). Returns: np.ndarray: The distorted bounding boxes.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
generated_mesh	np.ndarray	-	-
image_shape	tuple[int, int]	-	-

distort_image_keypointsfunction

distort_image_keypoints(
    keypoints: np.ndarray,
    generated_mesh: np.ndarray,
    image_shape: tuple[int, int]
)

Distort keypoints according to a generated mesh. Each keypoint receives a new (x, y) from the perspective transformation of the mesh cells, while its angle and any extra columns are left unchanged. Used with PiecewiseAffine. This function applies a perspective transformation to each keypoint based on the provided generated mesh and clips the keypoints to the image boundaries after transformation. Args: keypoints (np.ndarray): The keypoints to distort. generated_mesh (np.ndarray): The generated mesh to distort the keypoints with. image_shape (tuple[int, int]): The shape of the image as (height, width). Returns: np.ndarray: The distorted keypoints.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
generated_mesh	np.ndarray	-	-
image_shape	tuple[int, int]	-	-

generate_distorted_grid_polygonsfunction

generate_distorted_grid_polygons(
    dimensions: np.ndarray,
    magnitude: int,
    random_generator: np.random.Generator
)

Generate distorted grid polygons from cell dimensions and a displacement magnitude. Internal vertices are randomized while boundary vertices stay fixed, which is used for PiecewiseAffine mesh generation. This function creates a grid of polygons and applies random distortions to the internal vertices. The distortion is applied consistently across shared vertices to avoid gaps or overlaps in the resulting grid. Args: dimensions (np.ndarray): A 3D array of shape (grid_height, grid_width, 4) where each element is [x_min, y_min, x_max, y_max] representing the dimensions of a grid cell. magnitude (int): Maximum pixel-wise displacement for distortion. The actual displacement will be randomly chosen in the range [-magnitude, magnitude]. random_generator (np.random.Generator): A random number generator. Returns: np.ndarray: A 2D array of shape (total_cells, 8) where each row represents a distorted polygon as [x1, y1, x2, y1, x2, y2, x1, y2]. total_cells equals grid_height * grid_width. Note: - Only internal grid points are distorted; boundary points remain fixed. - The function ensures consistent distortion across shared vertices of adjacent cells. - The distortion is applied to the following points of each internal cell: * Bottom-right of the cell above and to the left * Bottom-left of the cell above * Top-right of the cell to the left * Top-left of the current cell - In the diagram below, each square represents a cell and each X marks a vertex where displacement occurs:

    +--+--+--+--+
    |  |  |  |  |
    +--X--X--X--+
    |  |  |  |  |
    +--X--X--X--+
    |  |  |  |  |
    +--X--X--X--+
    |  |  |  |  |
    +--+--+--+--+

- For each X, the coordinates of the left, right, top, and bottom edges in the four adjacent cells are displaced. Examples: >>> dimensions = np.array([[[0, 0, 50, 50], [50, 0, 100, 50]], ... [[0, 50, 50, 100], [50, 50, 100, 100]]]) >>> distorted = generate_distorted_grid_polygons(dimensions, magnitude=10, random_generator=np.random.default_rng(0)) >>> distorted.shape (4, 8)

Parameters

Name	Type	Default	Description
dimensions	np.ndarray	-	-
magnitude	int	-	-
random_generator	np.random.Generator	-	-
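One way to guarantee that shared vertices move together is to distort a single vertex lattice and then read each cell's four corners from it. The sketch below takes that approach. `distorted_grid_polygons_sketch` is a hypothetical name, and the lattice construction is an assumed technique rather than the library's exact code. Corners are emitted as top-left, top-right, bottom-right, bottom-left, which matches `[x1, y1, x2, y1, x2, y2, x1, y2]` before distortion.

```python
import numpy as np


def distorted_grid_polygons_sketch(dimensions, magnitude, random_generator):
    """Hypothetical sketch: distort internal lattice vertices, then emit per-cell quads."""
    grid_h, grid_w = dimensions.shape[:2]
    # Vertex coordinates along each axis, taken from the cell boundaries.
    xs = np.append(dimensions[0, :, 0], dimensions[0, -1, 2]).astype(np.float64)
    ys = np.append(dimensions[:, 0, 1], dimensions[-1, 0, 3]).astype(np.float64)
    # Shared lattice of shape (grid_h + 1, grid_w + 1, 2) holding (x, y).
    vertices = np.stack(np.meshgrid(xs, ys), axis=-1)
    # Displace only internal vertices so the outer boundary stays fixed.
    offsets = random_generator.integers(-magnitude, magnitude + 1, size=(grid_h - 1, grid_w - 1, 2))
    vertices[1:-1, 1:-1] += offsets
    # Each cell reads its corners from the shared lattice, so neighbours stay glued.
    tl = vertices[:-1, :-1]
    tr = vertices[:-1, 1:]
    br = vertices[1:, 1:]
    bl = vertices[1:, :-1]
    return np.concatenate([tl, tr, br, bl], axis=-1).reshape(-1, 8)
```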

pad_keypointsfunction

pad_keypoints(
    keypoints: np.ndarray,
    pad_top: int,
    pad_bottom: int,
    pad_left: int,
    pad_right: int,
    border_mode: int,
    image_shape: tuple[int, int]
)

Adjust keypoints for an image that is padded by pad_top, pad_bottom, pad_left and pad_right pixels. Keypoints are shifted by the padding and, depending on border_mode, reflected into the padded area. Used by Pad with keypoints. Args: keypoints (np.ndarray): The keypoints to pad. pad_top (int): Number of pixels added at the top of the image. pad_bottom (int): Number of pixels added at the bottom of the image. pad_left (int): Number of pixels added on the left of the image. pad_right (int): Number of pixels added on the right of the image. border_mode (int): The border mode to use. image_shape (tuple[int, int]): The shape of the image as (height, width). Returns: np.ndarray: The padded keypoints.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
pad_top	int	-	-
pad_bottom	int	-	-
pad_left	int	-	-
pad_right	int	-	-
border_mode	int	-	-
image_shape	tuple[int, int]	-	-

validate_keypointsfunction

validate_keypoints(
    keypoints: np.ndarray,
    image_shape: tuple[int, int]
)

Drop keypoints outside image bounds. image_shape (H,W). Keeps points with x in [0,W), y in [0,H). Use after transforms that may move points out of frame. Args: keypoints (np.ndarray): Array of keypoints with shape (N, M) where N is the number of keypoints and M >= 2. The first two columns represent x and y coordinates. image_shape (tuple[int, int]): Shape of the image as (height, width). Returns: np.ndarray: Array of valid keypoints that fall within the image boundaries. Note: This function only checks the x and y coordinates (first two columns) of the keypoints. Any additional columns (e.g., angle, scale) are preserved for valid keypoints.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
image_shape	tuple[int, int]	-	-
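The bounds check described above is a single boolean mask over the first two columns. The hypothetical helper below uses it to keep points with x in [0, W) and y in [0, H). Boolean indexing keeps every column of the surviving rows, so angle, scale and any extra columns are preserved.

```python
import numpy as np


def validate_keypoints_sketch(keypoints, image_shape):
    """Hypothetical sketch: keep keypoints inside [0, W) x [0, H)."""
    height, width = image_shape[:2]
    x, y = keypoints[:, 0], keypoints[:, 1]
    inside = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    return keypoints[inside]
```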

shift_keypointsfunction

shift_keypoints(
    keypoints: np.ndarray,
    shift_vector: np.ndarray
)

Translate keypoints by shift_vector (dx, dy, dz). Use when mapping keypoints after crop or shift. Angle, scale, and other extra columns unchanged. This function shifts the keypoints by a given shift vector. It only shifts the x, y and z coordinates of the keypoints. Args: keypoints (np.ndarray): The keypoints to shift. shift_vector (np.ndarray): The shift vector to apply to the keypoints. Returns: np.ndarray: The shifted keypoints.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
shift_vector	np.ndarray	-	-

generate_reflected_keypointsfunction

generate_reflected_keypoints(
    keypoints: np.ndarray,
    grid_dims: dict[str, tuple[int, int]],
    image_shape: tuple[int, int],
    center_in_origin: bool = False
)

Generate reflected keypoints for the entire reflection grid. Used by Mosaic and by reflection padding with keypoints. This function creates a grid of keypoints by reflecting and shifting the original keypoints. It handles both centered and non-centered grids based on the `center_in_origin` parameter. Args: keypoints (np.ndarray): Original keypoints array of shape (N, 4+), where N is the number of keypoints, and each keypoint is represented by at least 4 values (x, y, angle, scale, ...). grid_dims (dict[str, tuple[int, int]]): A dictionary containing grid dimensions and original position. It should have the following keys: - "grid_shape": tuple[int, int] representing (grid_rows, grid_cols) - "original_position": tuple[int, int] representing (original_row, original_col) image_shape (tuple[int, int]): Shape of the original image as (height, width). center_in_origin (bool, optional): If True, center the grid at the origin. Default is False. Returns: np.ndarray: Array of reflected and shifted keypoints for the entire grid. The shape is (N * grid_rows * grid_cols, 4+), where N is the number of original keypoints. Note: - The function handles keypoint flipping and shifting to create a grid of reflected keypoints. - It preserves the angle and scale information of the keypoints during transformations. - The resulting grid can be either centered at the origin or positioned based on the original grid.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
grid_dims	dict[str, tuple[int, int]]	-	-
image_shape	tuple[int, int]	-	-
center_in_origin	bool	False	-

flip_keypointsfunction

flip_keypoints(
    keypoints: np.ndarray,
    flip_horizontal: bool = False,
    flip_vertical: bool = False,
    image_shape: tuple[int, int] = (0, 0)
)

Flip keypoints horizontally and/or vertically, as selected by flip_horizontal and flip_vertical. image_shape gives the image size for pixel coordinates. It also flips the angle of the keypoints when flipping horizontally. Args: keypoints (np.ndarray): The keypoints to flip. flip_horizontal (bool): Whether to flip horizontally. flip_vertical (bool): Whether to flip vertically. image_shape (tuple[int, int]): The shape of the image as (height, width). Returns: np.ndarray: The flipped keypoints.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
flip_horizontal	bool	False	-
flip_vertical	bool	False	-
image_shape	tuple[int, int]	(0, 0)	-

create_affine_transformation_matrixfunction

create_affine_transformation_matrix(
    translate: Mapping[str, float],
    shear: dict[str, float],
    scale: dict[str, float],
    rotate: float,
    shift: tuple[float, float]
)

Build a 3x3 affine transformation matrix from translation, shear, scale, rotation and shift. The transformations are applied in this order: shift to the top-left origin, scale, rotate, shear, translate, and shift back to the center. Args: translate (Mapping[str, float]): Translation in x and y directions. shear (dict[str, float]): Shear in x and y directions (in degrees). scale (dict[str, float]): Scale factors for x and y directions. rotate (float): Rotation angle in degrees. shift (tuple[float, float]): Shift to apply before and after transformations. Returns: np.ndarray: The resulting 3x3 affine transformation matrix.

Parameters

Name	Type	Default	Description
translate	Mapping[str, float]	-	-
shear	dict[str, float]	-	-
scale	dict[str, float]	-	-
rotate	float	-	-
shift	tuple[float, float]	-	-
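
The composition order can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the helper name `affine_matrix_sketch`, the "x"/"y" dictionary keys, the use of `shift` as the point moved to the origin first and back last, and the sign conventions for rotation and shear are all assumptions.

```python
import numpy as np


def affine_matrix_sketch(translate, shear, scale, rotate, shift):
    """Hypothetical sketch: shift to origin, scale, rotate, shear, translate, shift back."""
    cx, cy = shift
    # Move the center (shift) to the top-left origin.
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=np.float64)
    scaling = np.diag([scale["x"], scale["y"], 1.0])
    theta = np.deg2rad(rotate)
    # Standard rotation matrix; its visual direction depends on the y-axis convention (assumed).
    rotation = np.array(
        [[np.cos(theta), -np.sin(theta), 0], [np.sin(theta), np.cos(theta), 0], [0, 0, 1]]
    )
    # Shear angles in degrees, converted to slopes with tan (assumed convention).
    shearing = np.array(
        [
            [1, np.tan(np.deg2rad(shear["x"])), 0],
            [np.tan(np.deg2rad(shear["y"])), 1, 0],
            [0, 0, 1],
        ]
    )
    translation = np.array(
        [[1, 0, translate["x"]], [0, 1, translate["y"]], [0, 0, 1]], dtype=np.float64
    )
    # Move the origin back to the center.
    to_center = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=np.float64)
    # Matrices act on column vectors, so the first step is the rightmost factor.
    return to_center @ translation @ shearing @ rotation @ scaling @ to_origin
```

Because the shift is undone at the end, any pure rotation or scaling leaves the point `shift` fixed, which is what makes it behave as the center of the transform.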

compute_transformed_image_bounds (function)

compute_transformed_image_bounds(
    matrix: np.ndarray,
    image_shape: tuple[int, int]
)

Compute the bounds of an image after applying an affine transformation. The 3x3 matrix is applied to the corners of an image of shape (H, W), and the minimum and maximum transformed coordinates are returned.

Args:
    matrix (np.ndarray): The 3x3 affine transformation matrix.
    image_shape (tuple[int, int]): The shape of the image as (height, width).

Returns:
    tuple[np.ndarray, np.ndarray]: A tuple containing:
        - min_coords: An array with the minimum x and y coordinates.
        - max_coords: An array with the maximum x and y coordinates.

Parameters

Name	Type	Default	Description
matrix	np.ndarray	-	-
image_shape	tuple[int, int]	-	-
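
A minimal sketch of the corner-based computation, assuming corners at pixel indices 0..W-1 and 0..H-1; the helper name `transformed_bounds_sketch` is hypothetical and this is not the library's code.

```python
import numpy as np


def transformed_bounds_sketch(matrix, image_shape):
    """Hypothetical sketch: transform the four image corners and take min/max."""
    height, width = image_shape[:2]
    # Corner pixel coordinates as (x, y); using width - 1 / height - 1 is an assumption.
    corners = np.array(
        [[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]], dtype=np.float64
    )
    # Homogeneous coordinates: append a column of ones, then apply the 3x3 matrix.
    homogeneous = np.hstack([corners, np.ones((4, 1))])
    transformed = (matrix @ homogeneous.T).T
    # Divide by the last coordinate (equal to 1 for a true affine matrix).
    xy = transformed[:, :2] / transformed[:, 2:3]
    return xy.min(axis=0), xy.max(axis=0)
```

For an affine map the extreme values of a transformed rectangle always occur at its corners, so four points are enough.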

compute_affine_warp_output_shape (function)

compute_affine_warp_output_shape(
    matrix: np.ndarray,
    input_shape: tuple[int, ...]
)

Compute the output shape of an affine warp. Used by Affine with keep_size=False.

The function computes the bounds of the transformed image from the 3x3 matrix and input_shape (height, width[, channels]), derives the output shape from those bounds, and returns an adjusted matrix so that the warped image fits in the output.

Args:
    matrix (np.ndarray): The 3x3 affine transformation matrix.
    input_shape (tuple[int, ...]): The shape of the input image as (height, width, ...).

Returns:
    tuple[np.ndarray, tuple[int, int]]: A tuple containing:
        - matrix: The adjusted 3x3 affine transformation matrix.
        - output_shape: The output shape of the affine warp.

Parameters

Name	Type	Default	Description
matrix	np.ndarray	-	-
input_shape	tuple[int, ...]	-	-

center (function)

center(
    image_shape: tuple[int, int]
)

Calculate the center coordinates of the image as (width / 2 - 0.5, height / 2 - 0.5), i.e. the center in pixel-index coordinates. Used as the rotation and affine center.

Args:
    image_shape (tuple[int, int]): The shape of the image as (height, width).

Returns:
    tuple[float, float]: (center_x, center_y).

Parameters

Name	Type	Default	Description
image_shape	tuple[int, int]	-	-

center_bbox (function)

center_bbox(
    image_shape: tuple[int, int]
)

Calculate the center coordinates of the image for bounding boxes as (width / 2, height / 2). Used for bounding-box centers in OBB handling and cropping.

Args:
    image_shape (tuple[int, int]): The shape of the image as (height, width).

Returns:
    tuple[float, float]: (center_x, center_y).

Parameters

Name	Type	Default	Description
image_shape	tuple[int, int]	-	-
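
The two center conventions differ by half a pixel. A minimal sketch of both formulas, with hypothetical helper names:

```python
def center_sketch(image_shape):
    height, width = image_shape[:2]
    # Pixel centers sit at integer indices 0..W-1, so the middle is (W - 1) / 2.
    return width / 2 - 0.5, height / 2 - 0.5


def center_bbox_sketch(image_shape):
    height, width = image_shape[:2]
    # Continuous coordinates spanning [0, W] x [0, H], as used for boxes.
    return width / 2, height / 2


print(center_sketch((4, 6)))       # (2.5, 1.5)
print(center_bbox_sketch((4, 6)))  # (3.0, 2.0)
```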

generate_grid (function)

generate_grid(
    image_shape: tuple[int, int],
    steps_x: list[float],
    steps_y: list[float],
    num_steps: int
)

Generate a distorted grid (map_x, map_y) for use with remap. Used by GridDistortion.

This function creates two 2D arrays (map_x and map_y) that represent a distorted version of the original image grid. These arrays can be used with OpenCV's remap function to apply grid distortion to an image.

Args:
    image_shape (tuple[int, int]): The shape of the image as (height, width).
    steps_x (list[float]): Step sizes for the x-axis distortion, of length num_steps + 1. Each value is the relative step size of one grid segment in the x direction.
    steps_y (list[float]): Step sizes for the y-axis distortion, of length num_steps + 1. Each value is the relative step size of one grid segment in the y direction.
    num_steps (int): The number of steps each axis is divided into; this sets the granularity of the distortion grid.

Returns:
    tuple[np.ndarray, np.ndarray]: Two 2D float32 arrays:
        - map_x: The x-coordinates of the distorted grid.
        - map_y: The y-coordinates of the distorted grid.

Note:
    - Each grid cell can be distorted independently.
    - steps_x and steps_y control how far each grid line is shifted.
    - map_x and map_y can be passed directly to cv2.remap() to distort an image.
    - Within each grid cell, the distortion is applied smoothly using linear interpolation.

Examples:
    >>> import cv2
    >>> import numpy as np
    >>> image = np.zeros((100, 100, 3), dtype=np.uint8)
    >>> image_shape = (100, 100)
    >>> steps_x = [1.1, 0.9, 1.0, 1.2, 0.95, 1.05]
    >>> steps_y = [0.9, 1.1, 1.0, 1.1, 0.9, 1.0]
    >>> num_steps = 5
    >>> map_x, map_y = generate_grid(image_shape, steps_x, steps_y, num_steps)
    >>> distorted_image = cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR)

Parameters

Name	Type	Default	Description
image_shape	tuple[int, int]	-	-
steps_x	list[float]	-	-
steps_y	list[float]	-	-
num_steps	int	-	-
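
A simplified sketch of the idea, not the library's implementation: each axis is split into num_steps equal destination cells, each cell is stretched by its step factor in the source, and np.interp turns the cell edges into a per-pixel coordinate. The helper name is hypothetical, and this sketch uses only the first num_steps entries of each steps list.

```python
import numpy as np


def grid_maps_sketch(image_shape, steps_x, steps_y, num_steps):
    height, width = image_shape[:2]

    def axis_map(size, steps):
        # Destination grid lines: num_steps equal cells covering [0, size].
        dst_edges = np.linspace(0, size, num_steps + 1)
        cell_sizes = np.diff(dst_edges)
        # Source grid lines: each cell stretched or squeezed by its step factor.
        stretched = cell_sizes * np.asarray(steps[:num_steps], dtype=np.float64)
        src_edges = np.concatenate([[0.0], np.cumsum(stretched)])
        # Piecewise-linear map from destination pixel index to source coordinate.
        return np.interp(np.arange(size), dst_edges, src_edges).astype(np.float32)

    xx = axis_map(width, steps_x)
    yy = axis_map(height, steps_y)
    # meshgrid returns (height, width) arrays, the layout cv2.remap expects.
    map_x, map_y = np.meshgrid(xx, yy)
    return map_x, map_y
```

With all step factors equal to 1 the maps reduce to the identity grid. Factors above 1 can push source coordinates past the image edge, which is the situation normalize_grid_distortion_steps is meant to prevent.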

normalize_grid_distortion_steps (function)

normalize_grid_distortion_steps(
    image_shape: tuple[int, int],
    num_steps: int,
    x_steps: list[float],
    y_steps: list[float]
)

Normalize grid distortion steps so that the distortion stays within the image bounds. Returns a dict with the keys steps_x and steps_y.

The function compensates for a smaller last step in the source image and rescales the steps so that the distortion never leaves the image bounds.

Args:
    image_shape (tuple[int, int]): The shape of the image as (height, width).
    num_steps (int): The number of steps each axis is divided into; this sets the granularity of the distortion grid.
    x_steps (list[float]): Step sizes for the x-axis distortion, of length num_steps + 1. Each value is the relative step size of one grid segment in the x direction.
    y_steps (list[float]): Step sizes for the y-axis distortion, of length num_steps + 1. Each value is the relative step size of one grid segment in the y direction.

Returns:
    dict[str, np.ndarray]: The normalized step sizes for the x and y axes, under the keys steps_x and steps_y.

Parameters

Name	Type	Default	Description
image_shape	tuple[int, int]	-	-
num_steps	int	-	-
x_steps	list[float]	-	-
y_steps	list[float]	-	-

almost_equal_intervals (function)

almost_equal_intervals(
    n: int,
    parts: int
)

Split n into `parts` nearly equal integer intervals. The parts sum to n, and any two parts differ by at most one. Used to split the height or width into grid rows or columns.

Args:
    n (int): The total value to be split.
    parts (int): The number of parts to split into.

Returns:
    np.ndarray: A 1D integer array of part sizes.

Examples:
    >>> almost_equal_intervals(20, 3)  # 20 split into 7, 7 and 6
    array([7, 7, 6])
    >>> almost_equal_intervals(16, 4)  # 16 split into four equal parts
    array([4, 4, 4, 4])

Parameters

Name	Type	Default	Description
n	int	-	-
parts	int	-	-
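
A divmod-based sketch that reproduces both examples above. The helper name is hypothetical, and giving the extra units to the leading parts is an assumption consistent with [7, 7, 6].

```python
import numpy as np


def almost_equal_intervals_sketch(n, parts):
    # Every part gets the floor share; the first `remainder` parts get one extra unit.
    part_size, remainder = divmod(n, parts)
    return np.array([part_size + 1 if i < remainder else part_size for i in range(parts)])


print(almost_equal_intervals_sketch(20, 3))  # [7 7 6]
```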

generate_shuffled_splits (function)

generate_shuffled_splits(
    size: int,
    divisions: int,
    random_generator: np.random.Generator
)

Generate shuffled splits for one dimension. The dimension of length `size` is divided into `divisions` intervals, the interval sizes are shuffled with random_generator, and the cumulative edges are returned. Used by GridDistortion and Mosaic.

Args:
    size (int): Total size of the dimension (height or width).
    divisions (int): Number of divisions (rows or columns).
    random_generator (np.random.Generator): The random generator used to shuffle the splits. If None, the splits are not shuffled.

Returns:
    np.ndarray: Cumulative edges of the shuffled intervals.

Parameters

Name	Type	Default	Description
size	int	-	-
divisions	int	-	-
random_generator	np.random.Generator	-	-

split_uniform_grid (function)

split_uniform_grid(
    image_shape: tuple[int, int],
    grid: tuple[int, int],
    random_generator: np.random.Generator
)

Split an image shape into a uniform grid of (rows, cols) tiles, using shuffled splits. Used by GridShuffle and Mosaic.

Args:
    image_shape (tuple[int, int]): The shape of the image as (height, width).
    grid (tuple[int, int]): The grid size as (rows, columns).
    random_generator (np.random.Generator): The random generator used to shuffle the splits. If None, the splits are not shuffled.

Returns:
    np.ndarray: The tile coordinates, one row per tile, in the format (start_y, start_x, end_y, end_x).

Note:
    The function uses `generate_shuffled_splits` to generate the splits for the height and width of the image, then uses those splits to compute the tile coordinates.

Parameters

Name	Type	Default	Description
image_shape	tuple[int, int]	-	-
grid	tuple[int, int]	-	-
random_generator	np.random.Generator	-	-
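
A minimal sketch of the splits-then-tiles procedure described above. The helper name is hypothetical; the nearly equal interval sizes (as in almost_equal_intervals), the leading 0 edge and the row-major tile order are assumptions.

```python
import numpy as np


def split_uniform_grid_sketch(image_shape, grid, random_generator=None):
    height, width = image_shape[:2]
    rows, cols = grid

    def shuffled_edges(size, divisions):
        # Nearly equal interval sizes, optionally shuffled, then cumulative edges 0..size.
        part_size, remainder = divmod(size, divisions)
        intervals = np.array([part_size + (1 if i < remainder else 0) for i in range(divisions)])
        if random_generator is not None:
            random_generator.shuffle(intervals)
        return np.concatenate([[0], np.cumsum(intervals)])

    h_edges = shuffled_edges(height, rows)
    w_edges = shuffled_edges(width, cols)
    # One row per tile: (start_y, start_x, end_y, end_x), in row-major order.
    return np.array(
        [
            (h_edges[i], w_edges[j], h_edges[i + 1], w_edges[j + 1])
            for i in range(rows)
            for j in range(cols)
        ]
    )
```

Shuffling only changes where the larger and smaller intervals fall; the tiles still cover the whole image without overlap.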

generate_perspective_points (function)

generate_perspective_points(
    image_shape: tuple[int, int],
    scale: float,
    random_generator: np.random.Generator
)

Generate four perspective corner points for an image of shape image_shape. Used by the Perspective transform.

The points are drawn from a normal distribution with the given scale using random_generator, then adjusted to lie within the image bounds.

Args:
    image_shape (tuple[int, int]): The shape of the image as (height, width).
    scale (float): The scale of the perspective points.
    random_generator (np.random.Generator): The random generator used to generate the points.

Returns:
    np.ndarray: The perspective points.

Parameters

Name	Type	Default	Description
image_shape	tuple[int, int]	-	-
scale	float	-	-
random_generator	np.random.Generator	-	-

order_points (function)

order_points(
    pts: np.ndarray
)

Order four points clockwise as top-left, top-right, bottom-right, bottom-left, so that they are in the correct order for the source or destination quad of a perspective transform.

Args:
    pts (np.ndarray): The points to order, with shape (4, 2).

Returns:
    np.ndarray: The ordered points.

Parameters

Name	Type	Default	Description
pts	np.ndarray	-	-
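
One common way to obtain this ordering is the sum/difference heuristic sketched below. It is not necessarily what the library does, and the helper name is hypothetical. In image coordinates the top-left corner minimizes x + y, the bottom-right maximizes it, the top-right minimizes y - x and the bottom-left maximizes it.

```python
import numpy as np


def order_points_sketch(pts):
    pts = np.asarray(pts, dtype=np.float32)
    ordered = np.zeros((4, 2), dtype=np.float32)
    s = pts.sum(axis=1)        # x + y
    d = pts[:, 1] - pts[:, 0]  # y - x
    ordered[0] = pts[np.argmin(s)]  # top-left: smallest x + y
    ordered[1] = pts[np.argmin(d)]  # top-right: smallest y - x
    ordered[2] = pts[np.argmax(s)]  # bottom-right: largest x + y
    ordered[3] = pts[np.argmax(d)]  # bottom-left: largest y - x
    return ordered
```

The heuristic can produce ties for quads rotated by about 45 degrees, so it suits the mildly jittered corners used for perspective augmentation rather than arbitrary quadrilaterals.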

compute_perspective_params (function)

compute_perspective_params(
    points: np.ndarray,
    image_shape: tuple[int, int]
)

Compute the perspective transformation matrix and output dimensions for a set of four corner points and an image shape. The dimensions are adjusted so that the transformed image keeps its size. Called from Perspective or similar transforms.

Args:
    points (np.ndarray): The four corner points to compute the perspective transformation parameters for.
    image_shape (tuple[int, int]): The shape of the image.

Returns:
    tuple[np.ndarray, int, int]: The perspective transformation matrix, followed by the maximum width and height of the transformed image.

Parameters

Name	Type	Default	Description
points	np.ndarray	-	-
image_shape	tuple[int, int]	-	-

expand_transform (function)

expand_transform(
    matrix: np.ndarray,
    shape: tuple[int, int]
)

Expand a transformation matrix to include padding. Used by Perspective with keep_size.

The function computes the destination points of the transformed image of shape (height, width), adjusts the matrix to include padding so that the transformed image retains its original dimensions, and returns the expanded matrix together with the maximum width and height of the transformed image.

Args:
    matrix (np.ndarray): The transformation matrix to expand.
    shape (tuple[int, int]): The shape of the image.

Returns:
    tuple[np.ndarray, int, int]: The expanded matrix, followed by the maximum width and height of the transformed image.

Parameters

Name	Type	Default	Description
matrix	np.ndarray	-	-
shape	tuple[int, int]	-	-

create_piecewise_affine_maps (function)

create_piecewise_affine_maps(
    image_shape: tuple[int, int],
    grid: tuple[int, int],
    scale: float,
    absolute_scale: bool,
    random_generator: np.random.Generator
)

Create the map_x and map_y arrays for PiecewiseAffine. A jittered grid of control points and IDW interpolation yield full-resolution remap maps, which the transform passes to OpenCV's remap.

The function first generates the control points for the transformation, then builds the transformation maps from them.

Args:
    image_shape (tuple[int, int]): The shape of the image as (height, width).
    grid (tuple[int, int]): The grid size as (rows, columns).
    scale (float): The scale of the transformation.
    absolute_scale (bool): Whether to use absolute scale.
    random_generator (np.random.Generator): The random generator used to generate the points.

Returns:
    tuple[np.ndarray | None, np.ndarray | None]: The transformation maps (map_x, map_y).

Parameters

Name	Type	Default	Description
image_shape	tuple[int, int]	-	-
grid	tuple[int, int]	-	-
scale	float	-	-
absolute_scale	bool	-	-
random_generator	np.random.Generator	-	-

bboxes_piecewise_affine (function)

bboxes_piecewise_affine(
    bboxes: np.ndarray,
    map_x: np.ndarray,
    map_y: np.ndarray,
    border_mode: int,
    image_shape: tuple[int, int]
)

Apply a piecewise affine transformation, given by map_x and map_y, to bounding boxes. Used by PiecewiseAffine with bounding boxes.

Each bounding box is converted to a mask, the mask is remapped with map_x and map_y using border_mode, and the transformed mask is converted back to a bounding box.

Args:
    bboxes (np.ndarray): The bounding boxes to transform.
    map_x (np.ndarray): The x-coordinate map of the transformation.
    map_y (np.ndarray): The y-coordinate map of the transformation.
    border_mode (int): The border mode used for the transformation.
    image_shape (tuple[int, int]): The shape of the image.

Returns:
    np.ndarray: The transformed bounding boxes.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
map_x	np.ndarray	-	-
map_y	np.ndarray	-	-
border_mode	int	-	-
image_shape	tuple[int, int]	-	-

get_dimension_padding (function)

get_dimension_padding(
    current_size: int,
    min_size: int | None,
    divisor: int | None
)

Calculate the padding (pad_before, pad_after) for one dimension of size current_size, given an optional minimum size or divisor. Used by PadIfNeeded and for padding to divisible sizes.

Args:
    current_size (int): Current size of the dimension.
    min_size (int | None): Minimum size requirement, if any.
    divisor (int | None): Divisor that the padded size must be divisible by, if any.

Returns:
    tuple[int, int]: (pad_before, pad_after).

Parameters

Name	Type	Default	Description
current_size	int	-	-
min_size	One of: int None	-	-
divisor	One of: int None	-	-

get_padding_params (function)

get_padding_params(
    image_shape: tuple[int, int],
    min_height: int | None,
    min_width: int | None,
    pad_height_divisor: int | None,
    pad_width_divisor: int | None
)

Calculate the padding (pad_top, pad_bottom, pad_left, pad_right) for an image from its shape and the optional min_height, min_width, pad_height_divisor and pad_width_divisor. Used by PadIfNeeded.

Args:
    image_shape (tuple[int, int]): (height, width) of the image.
    min_height (int | None): Minimum height requirement, if any.
    min_width (int | None): Minimum width requirement, if any.
    pad_height_divisor (int | None): Divisor for height padding, if any.
    pad_width_divisor (int | None): Divisor for width padding, if any.

Returns:
    tuple[int, int, int, int]: (pad_top, pad_bottom, pad_left, pad_right).

Parameters

Name	Type	Default	Description
image_shape	tuple[int, int]	-	-
min_height	One of: int None	-	-
min_width	One of: int None	-	-
pad_height_divisor	One of: int None	-	-
pad_width_divisor	One of: int None	-	-
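
A minimal sketch of the per-dimension rule and how the two dimensions combine. The helper names are hypothetical; treating min_size and divisor as alternatives and putting the extra pixel of an odd pad after the image are assumptions, not confirmed library behavior.

```python
def dimension_padding_sketch(current_size, min_size=None, divisor=None):
    if min_size is not None:
        # Grow the dimension up to min_size; never shrink it.
        pad = max(min_size - current_size, 0)
    elif divisor is not None:
        # Smallest non-negative pad that makes the size a multiple of divisor.
        pad = (divisor - current_size % divisor) % divisor
    else:
        pad = 0
    # Split evenly; the extra pixel of an odd pad goes after (assumed).
    pad_before = pad // 2
    return pad_before, pad - pad_before


def padding_params_sketch(image_shape, min_height=None, min_width=None,
                          pad_height_divisor=None, pad_width_divisor=None):
    height, width = image_shape[:2]
    pad_top, pad_bottom = dimension_padding_sketch(height, min_height, pad_height_divisor)
    pad_left, pad_right = dimension_padding_sketch(width, min_width, pad_width_divisor)
    return pad_top, pad_bottom, pad_left, pad_right


print(padding_params_sketch((100, 101), min_height=128, min_width=128))  # (14, 14, 13, 14)
```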

adjust_padding_by_position (function)

adjust_padding_by_position(
    h_top: int,
    h_bottom: int,
    w_left: int,
    w_right: int,
    position: Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random'],
    py_random: np.random.RandomState
)

Redistribute the padding (h_top, h_bottom, w_left, w_right) according to position: 'center', 'top_left', 'top_right', 'bottom_left', 'bottom_right' or 'random'. py_random is used when position is 'random'. Used by PadIfNeeded.

Parameters

Name	Type	Default	Description
h_top	int	-	-
h_bottom	int	-	-
w_left	int	-	-
w_right	int	-	-
position	One of: 'center' 'top_left' 'top_right' 'bottom_left' 'bottom_right' 'random'	-	-
py_random	np.random.RandomState	-	-

swap_tiles_on_keypoints (function)

swap_tiles_on_keypoints(
    keypoints: np.ndarray,
    tiles: np.ndarray,
    mapping: np.ndarray
)

Reposition keypoints according to a tile-swap mapping. Keypoints in tile i move to the corresponding position in tile mapping[i]. Used by GridShuffle.

Args:
    keypoints (np.ndarray): A 2D array of shape (N, 2), where N is the number of keypoints and each row holds a keypoint's (x, y) coordinates.
    tiles (np.ndarray): A 2D array of shape (M, 4), where M is the number of tiles and each row holds a tile's (start_y, start_x, end_y, end_x) coordinates.
    mapping (np.ndarray): A 1D array of shape (M,); element i is the index of the tile that tile i is swapped with.

Returns:
    np.ndarray: A 2D array of the same shape as the input keypoints, containing the new keypoint positions after the tile swap.

Warns:
    RuntimeWarning: If any keypoint is not found within any tile.

Notes:
    - Keypoints that do not fall within any tile remain unchanged.
    - The function assumes that the tiles do not overlap and cover the entire image.

Parameters

Name	Type	Default	Description
keypoints	np.ndarray	-	-
tiles	np.ndarray	-	-
mapping	np.ndarray	-	-

swap_tiles_on_image (function)

swap_tiles_on_image(
    image: np.ndarray,
    tiles: np.ndarray,
    mapping: list[int] | None
)

Swap tiles on an image according to a mapping and return the new image. Used by GridShuffle.

Args:
    image (np.ndarray): Input image.
    tiles (np.ndarray): Array of tiles with shape (M, 4), each tile given as [start_y, start_x, end_y, end_x].
    mapping (list[int] | None): The new tile index for each tile.

Returns:
    np.ndarray: Output image with the tiles swapped according to the mapping.

Parameters

Name	Type	Default	Description
image	np.ndarray	-	-
tiles	np.ndarray	-	-
mapping	One of: list[int] None	-	-
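
A minimal NumPy sketch of the tile swap, assuming that output tile i receives the content of input tile mapping[i], that swapped tiles have the same shape, and that a None mapping leaves the image unchanged. The helper name is hypothetical.

```python
import numpy as np


def swap_tiles_sketch(image, tiles, mapping):
    if mapping is None:
        return image.copy()
    output = np.empty_like(image)
    for index, source_index in enumerate(mapping):
        # Output tile `index` receives the content of tile `mapping[index]` (assumed direction).
        sy, sx, ey, ex = tiles[source_index]
        ty, tx, tey, tex = tiles[index]
        output[ty:tey, tx:tex] = image[sy:ey, sx:ex]
    return output


image = np.arange(8).reshape(2, 4)
tiles = np.array([[0, 0, 2, 2], [0, 2, 2, 4]])
print(swap_tiles_sketch(image, tiles, [1, 0]))  # left and right halves exchanged
```

The same-shape requirement is why GridShuffle groups tiles by shape before shuffling.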

is_valid_component (function)

is_valid_component(
    component_area: float,
    original_area: float,
    min_area: float | None,
    min_visibility: float | None
)

Return True if a component satisfies the min_area and min_visibility thresholds, given its component_area and the original_area. A threshold set to None always passes. Used to filter bounding boxes in GridShuffle.

Parameters

Name	Type	Default	Description
component_area	float	-	-
original_area	float	-	-
min_area	One of: float None	-	-
min_visibility	One of: float None	-	-
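
A minimal sketch of the check, with visibility taken as component_area / original_area. The helper name is hypothetical, and treating both thresholds as inclusive lower bounds is an assumption.

```python
def is_valid_component_sketch(component_area, original_area, min_area=None, min_visibility=None):
    # A threshold set to None is skipped.
    if min_area is not None and component_area < min_area:
        return False
    # Visibility: fraction of the original box area that survives (inclusive bound assumed).
    if min_visibility is not None and component_area / original_area < min_visibility:
        return False
    return True
```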

bboxes_grid_shuffle (function)

bboxes_grid_shuffle(
    bboxes: np.ndarray,
    tiles: np.ndarray,
    mapping: list[int],
    image_shape: tuple[int, int],
    min_area: float,
    min_visibility: float,
    bbox_type: Literal['hbb', 'obb']
)

Shuffle bounding boxes according to a grid tile mapping. Each bounding box is converted to a mask, the tiles of the mask are swapped, and the resulting components are converted back to bounding boxes, filtered by min_area and min_visibility. Used by GridShuffle with bounding boxes.

Args:
    bboxes (np.ndarray): Array of bounding boxes with shape (num_boxes, 4+).
    tiles (np.ndarray): Array of grid tiles.
    mapping (list[int]): Mapping of tile indices.
    image_shape (tuple[int, int]): Shape of the image as (height, width).
    min_area (float): Minimum area of a bounding box to keep.
    min_visibility (float): Minimum visibility ratio of a bounding box to keep.
    bbox_type (Literal['hbb', 'obb']): Bounding box type; OBB is not supported here.

Returns:
    np.ndarray: Shuffled bounding boxes.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
tiles	np.ndarray	-	-
mapping	list[int]	-	-
image_shape	tuple[int, int]	-	-
min_area	float	-	-
min_visibility	float	-	-
bbox_type	One of: 'hbb' 'obb'	-	-

create_shape_groups (function)

create_shape_groups(
    tiles: np.ndarray
)

Group tiles by (height, width) and return a dict mapping each shape to the list of indices of tiles with that shape. Used by GridShuffle so that tiles are only shuffled among tiles of the same shape.

Parameters

Name	Type	Default	Description
tiles	np.ndarray	-	-

shuffle_tiles_within_shape_groups (function)

shuffle_tiles_within_shape_groups(
    shape_groups: dict[tuple[int, int], list[int]],
    random_generator: np.random.Generator
)

Shuffle tile indices within each group of same-shaped tiles and build a list in which each index points to the tile it should be mapped to.

Args:
    shape_groups (dict[tuple[int, int], list[int]]): Groups of tile indices keyed by tile shape.
    random_generator (np.random.Generator): The random generator used to shuffle the indices. If None, a new random generator is used.

Returns:
    list[int]: For each tile index, the new tile index after shuffling.

Parameters

Name	Type	Default	Description
shape_groups	dict[tuple[int, int], list[int]]	-	-
random_generator	np.random.Generator	-	-
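
The grouping and the within-group shuffle can be sketched together. The helper names are hypothetical, and the sketch assumes tile indices run from 0 to M-1 with every tile in exactly one group.

```python
from collections import defaultdict

import numpy as np


def create_shape_groups_sketch(tiles):
    groups = defaultdict(list)
    for index, (start_y, start_x, end_y, end_x) in enumerate(tiles):
        # Key each tile by its (height, width).
        groups[(int(end_y - start_y), int(end_x - start_x))].append(index)
    return dict(groups)


def shuffle_within_groups_sketch(shape_groups, random_generator):
    num_tiles = sum(len(indices) for indices in shape_groups.values())
    mapping = list(range(num_tiles))
    for indices in shape_groups.values():
        # Permute only within the group, so a tile is never mapped to a differently shaped one.
        shuffled = random_generator.permutation(indices)
        for original, new in zip(indices, shuffled):
            mapping[original] = int(new)
    return mapping
```

The resulting mapping is a permutation of all tile indices that never sends a tile to a slot of a different shape.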

compute_pairwise_distances (function)

compute_pairwise_distances(
    points1: np.ndarray,
    points2: np.ndarray
)

Compute the pairwise squared Euclidean distances between points1 (N, 2) and points2 (M, 2), using cv2.gemm. Used by ThinPlateSpline and for nearest-neighbor lookups.

Args:
    points1 (np.ndarray): First set of points, with shape (N, 2).
    points2 (np.ndarray): Second set of points, with shape (M, 2).

Returns:
    np.ndarray: Matrix of pairwise squared distances, with shape (N, M).

Parameters

Name	Type	Default	Description
points1	np.ndarray	-	-
points2	np.ndarray	-	-
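
The usual trick behind a matrix-product implementation is the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b. The sketch below uses a NumPy matrix product in place of cv2.gemm; the helper name is hypothetical.

```python
import numpy as np


def pairwise_sq_distances_sketch(points1, points2):
    p1 = np.asarray(points1, dtype=np.float64)
    p2 = np.asarray(points2, dtype=np.float64)
    sq1 = np.sum(p1**2, axis=1)[:, None]  # (N, 1)
    sq2 = np.sum(p2**2, axis=1)[None, :]  # (1, M)
    # ||a||^2 + ||b||^2 - 2 a.b for every pair, via one matrix product.
    d2 = sq1 + sq2 - 2.0 * p1 @ p2.T      # (N, M)
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.maximum(d2, 0.0)
```

A single matrix product avoids materializing the (N, M, 2) difference array that a broadcast subtraction would need.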

compute_tps_weights (function)

compute_tps_weights(
    src_points: np.ndarray,
    dst_points: np.ndarray
)

Compute Thin Plate Spline (TPS) weights that map src_points to dst_points. Returns (nonlinear_weights, affine_weights) for the TPS warp. Used by ThinPlateSpline.

Args:
    src_points (np.ndarray): Source control points, with shape (num_points, 2).
    dst_points (np.ndarray): Destination control points, with shape (num_points, 2).

Returns:
    tuple[np.ndarray, np.ndarray]: (nonlinear_weights, affine_weights), where:
        - nonlinear_weights: TPS kernel weights for the nonlinear deformation, with shape (num_points, 2).
        - affine_weights: Weights of the affine part, with shape (3, 2), ordered as [constant term, x scale/shear, y scale/shear].

Note:
    The TPS interpolation is decomposed into:
        1. A nonlinear part, controlled by the kernel weights.
        2. An affine part (global scaling, rotation and translation).

Parameters

Name	Type	Default	Description
src_points	np.ndarray	-	-
dst_points	np.ndarray	-	-

tps_transform (function)

tps_transform(
    target_points: np.ndarray,
    control_points: np.ndarray,
    nonlinear_weights: np.ndarray,
    affine_weights: np.ndarray
)

Apply the TPS transformation defined by control_points, nonlinear_weights and affine_weights to target_points. All arrays are float32. Used by ThinPlateSpline to build the remap maps.

Parameters

Name	Type	Default	Description
target_points	np.ndarray	-	-
control_points	np.ndarray	-	-
nonlinear_weights	np.ndarray	-	-
affine_weights	np.ndarray	-	-
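
The weight computation and the transform can be sketched with the standard TPS linear system [[K, P], [P^T, 0]] [w; a] = [dst; 0], where K holds kernel values between control points and P = [1, x, y]. This is a textbook formulation, not necessarily the library's: the kernel U(r) = r^2 log(r^2), the float64 arithmetic and the helper names are assumptions.

```python
import numpy as np


def _tps_kernel(d2):
    # U(r) = r^2 log(r^2) with U(0) = 0 (one common TPS kernel choice).
    safe = np.where(d2 > 0, d2, 1.0)
    return np.where(d2 > 0, d2 * np.log(safe), 0.0)


def _sq_dists(a, b):
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)


def tps_weights_sketch(src_points, dst_points):
    src = np.asarray(src_points, dtype=np.float64)
    dst = np.asarray(dst_points, dtype=np.float64)
    n = len(src)
    kernel = _tps_kernel(_sq_dists(src, src))         # (n, n)
    affine_basis = np.hstack([np.ones((n, 1)), src])  # (n, 3): [1, x, y]
    system = np.zeros((n + 3, n + 3))
    system[:n, :n] = kernel
    system[:n, n:] = affine_basis
    system[n:, :n] = affine_basis.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    solution = np.linalg.solve(system, rhs)
    return solution[:n], solution[n:]  # nonlinear (n, 2), affine (3, 2)


def tps_transform_sketch(target_points, control_points, nonlinear_weights, affine_weights):
    target = np.asarray(target_points, dtype=np.float64)
    control = np.asarray(control_points, dtype=np.float64)
    kernel = _tps_kernel(_sq_dists(target, control))  # (m, n)
    affine_basis = np.hstack([np.ones((len(target), 1)), target])
    # Nonlinear part plus affine part, as in the decomposition above.
    return kernel @ nonlinear_weights + affine_basis @ affine_weights
```

By construction the spline passes exactly through the control points, and when dst_points is an affine image of src_points the nonlinear weights come out as zero.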

get_camera_matrix_distortion_maps (function)

get_camera_matrix_distortion_maps(
    image_shape: tuple[int, int],
    k: float
)

Generate (map_x, map_y) distortion maps from a camera-matrix model with distortion coefficient k, in the style of cv2.initUndistortRectifyMap. Used by OpticalDistortion.

Args:
    image_shape (tuple[int, int]): Image shape as (height, width).
    k (float): Distortion coefficient.

Returns:
    tuple[np.ndarray, np.ndarray]: The distortion maps (map_x, map_y).

Parameters

Name	Type	Default	Description
image_shape	tuple[int, int]	-	-
k	float	-	-

get_fisheye_distortion_maps (function)

get_fisheye_distortion_maps(
    image_shape: tuple[int, int],
    k: float
)

Generate (map_x, map_y) distortion maps from a fisheye model with distortion coefficient k. The radial distortion is r * (1 + k * r_norm^2). Used by OpticalDistortion in fisheye mode.

Args:
    image_shape (tuple[int, int]): Image shape as (height, width).
    k (float): Distortion coefficient.

Returns:
    tuple[np.ndarray, np.ndarray]: The distortion maps (map_x, map_y).

Parameters

Name	Type	Default	Description
image_shape	tuple[int, int]	-	-
k	float	-	-

generate_control_points (function)

generate_control_points(
    num_control_points: int
)

Generate TPS control points in the unit square, with num_control_points points per side. As a special case, num_control_points=2 produces the 4 corners plus the center. Used by ThinPlateSpline.

Args:
    num_control_points (int): Number of control points per side.

Returns:
    np.ndarray: Control points with shape (N, 2).

Parameters

Name	Type	Default	Description
num_control_points	int	-	-
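
A minimal sketch of the described layout. The helper name is hypothetical, and the order in which points are listed is an assumption.

```python
import numpy as np


def control_points_sketch(num_control_points):
    if num_control_points == 2:
        # Special case: the 4 corners of the unit square plus its center.
        return np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=np.float32)
    # Regular grid with num_control_points points per side in [0, 1].
    side = np.linspace(0, 1, num_control_points, dtype=np.float32)
    xs, ys = np.meshgrid(side, side)
    return np.stack([xs.ravel(), ys.ravel()], axis=1)
```

Without the special case, num_control_points=2 would give only the four corners, and with four corner correspondences the spline has no interior freedom, so the added center point is what makes a nonlinear warp possible.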

hflip_images (function)

hflip_images(
    volume: np.ndarray
)

Flip a single volume of shape (D, H, W) or (D, H, W, C) horizontally, i.e. along the width axis (axis=2). Used for Transforms3D HorizontalFlip.

Args:
    volume (np.ndarray): Input volume.

Returns:
    np.ndarray: Horizontally flipped volume.

Parameters

Name	Type	Default	Description
volume	np.ndarray	-	-

vflip_images (function)

vflip_images(
    volume: np.ndarray
)

Flip a single volume of shape (D, H, W) or (D, H, W, C) vertically, i.e. along the height axis (axis=1). Used for Transforms3D VerticalFlip.

Args:
    volume (np.ndarray): Input volume.

Returns:
    np.ndarray: Vertically flipped volume.

Parameters

Name	Type	Default	Description
volume	np.ndarray	-	-

hflip_volumes (function)

hflip_volumes(
    volumes: np.ndarray
)

Flip a batch of volumes of shape (B, D, H, W) or (B, D, H, W, C) horizontally, i.e. along the width axis (axis=3). Used for Transforms3D batch HorizontalFlip.

Args:
    volumes (np.ndarray): Input batch of volumes.

Returns:
    np.ndarray: Horizontally flipped batch of volumes.

Parameters

Name	Type	Default	Description
volumes	np.ndarray	-	-

vflip_volumes (function)

vflip_volumes(
    volumes: np.ndarray
)

Flip a batch of volumes of shape (B, D, H, W) or (B, D, H, W, C) vertically, i.e. along the height axis (axis=2). Used for Transforms3D batch VerticalFlip.

Args:
    volumes (np.ndarray): Input batch of volumes.

Returns:
    np.ndarray: Vertically flipped batch of volumes.

Parameters

Name	Type	Default	Description
volumes	np.ndarray	-	-
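
The axis choices for the four flip helpers above can be summarized with np.flip. These are hypothetical one-line sketches, not the library's code. np.flip returns a view, so wrap the result in np.ascontiguousarray if a contiguous array is needed.

```python
import numpy as np


def hflip_volume_sketch(volume):    # (D, H, W) or (D, H, W, C)
    return np.flip(volume, axis=2)  # width axis


def vflip_volume_sketch(volume):
    return np.flip(volume, axis=1)  # height axis


def hflip_volumes_sketch(volumes):  # (B, D, H, W) or (B, D, H, W, C)
    return np.flip(volumes, axis=3) # width axis, shifted by the batch axis


def vflip_volumes_sketch(volumes):
    return np.flip(volumes, axis=2) # height axis, shifted by the batch axis
```

An optional trailing channel axis does not affect these indices because it always comes after H and W.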

rot90_volumes (function)

rot90_volumes(
    volumes: np.ndarray,
    group_element: Literal['e', 'r90', 'r180', 'r270']
)

Rotate a batch of volumes of shape (B, D, H, W) or (B, D, H, W, C) counterclockwise in the height-width plane (axes 2 and 3) by the multiple of 90° given by group_element: 'e' (no rotation), 'r90', 'r180' or 'r270'. Used for Transforms3D D4/C4.

Args:
    volumes (np.ndarray): Input batch of volumes.
    group_element (Literal['e', 'r90', 'r180', 'r270']): C4 group element to apply.

Returns:
    np.ndarray: Rotated batch of volumes.

Parameters

Name	Type	Default	Description
volumes	np.ndarray	-	-
group_element	One of: 'e' 'r90' 'r180' 'r270'	-	-
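
A minimal sketch of the rotation with np.rot90, which rotates from the first listed axis toward the second (counterclockwise for (rows, cols)). The lookup table and helper name are hypothetical.

```python
import numpy as np

# Number of counterclockwise quarter turns per C4 group element.
QUARTER_TURNS = {"e": 0, "r90": 1, "r180": 2, "r270": 3}


def rot90_volumes_sketch(volumes, group_element):
    # volumes: (B, D, H, W) or (B, D, H, W, C); rotate in the H-W plane.
    return np.rot90(volumes, k=QUARTER_TURNS[group_element], axes=(2, 3))
```

For 'r90' and 'r270' the H and W axes swap sizes, so non-square volumes change shape.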

erode (function)

erode(
    img: ImageType,
    kernel: np.ndarray
)

Apply one iteration of morphological erosion using cv2.erode. Erosion shrinks bright regions. The output has the same shape and channel count as the input. Used for mask and bounding-box morphology.

Args:
    img (ImageType): Input image as a numpy array.
    kernel (np.ndarray): Structuring element (kernel) as a numpy array.

Returns:
    ImageType: The eroded image.

Parameters

Name	Type	Default	Description
img	ImageType	-	-
kernel	np.ndarray	-	-

dilate (function)

dilate(
    img: ImageType,
    kernel: np.ndarray
)

Apply one iteration of morphological dilation using cv2.dilate. Dilation expands bright regions. The output has the same shape and channel count as the input. Used for mask and bounding-box morphology.

Args:
    img (ImageType): Input image as a numpy array.
    kernel (np.ndarray): Structuring element (kernel) as a numpy array.

Returns:
    ImageType: The dilated image.

Parameters

Name	Type	Default	Description
img	ImageType	-	-
kernel	np.ndarray	-	-

morphology (function)

morphology(
    img: ImageType,
    kernel: np.ndarray,
    operation: Literal['dilation', 'erosion']
)

Apply dilation or erosion to an image using cv2.morphologyEx. operation selects 'dilation' or 'erosion', and kernel is the structuring element. Used by BboxMorphology and for mask cleanup.

Args:
    img (ImageType): Input image as a numpy array.
    kernel (np.ndarray): Structuring element (kernel) as a numpy array.
    operation (Literal['dilation', 'erosion']): The operation to apply.

Returns:
    np.ndarray: The image after the morphological operation.

Parameters

Name	Type	Default	Description
img	ImageType	-	-
kernel	np.ndarray	-	-
operation	One of: 'dilation' 'erosion'	-	-

bboxes_morphology (function)

bboxes_morphology(
    bboxes: np.ndarray,
    kernel: np.ndarray,
    operation: Literal['dilation', 'erosion'],
    image_shape: tuple[int, int],
    bbox_type: Literal['hbb', 'obb']
)

Apply dilation or erosion to bounding boxes via a mask: the bounding boxes are converted to a mask, the morphological operation is applied to the mask, and the result is converted back to bounding boxes. Used by BboxMorphology.

Args:
    bboxes (np.ndarray): Bounding boxes as a numpy array.
    kernel (np.ndarray): Structuring element (kernel) as a numpy array.
    operation (Literal['dilation', 'erosion']): The operation to apply.
    image_shape (tuple[int, int]): The shape of the image.
    bbox_type (Literal['hbb', 'obb']): Bounding box type; OBB is not supported here.

Returns:
    np.ndarray: The bounding boxes after the morphological operation.

Parameters

Name	Type	Default	Description
bboxes	np.ndarray	-	-
kernel	np.ndarray	-	-
operation	One of: 'dilation' 'erosion'	-	-
image_shape	tuple[int, int]	-	-
bbox_type	One of: 'hbb' 'obb'	-	-
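
To make the bbox to mask to morphology to bbox pipeline concrete, here is a pure-NumPy sketch for a single axis-aligned box, with a hand-written binary dilation standing in for cv2.dilate. The box format (x_min, y_min, x_max, y_max) in integer pixels with exclusive max is an assumption, and the helper names are hypothetical.

```python
import numpy as np


def dilate_mask_sketch(mask, kernel):
    # Binary dilation as an OR of shifted copies (correlation form;
    # identical to dilation for symmetric kernels like the square one below).
    kh, kw = kernel.shape
    pad_y, pad_x = kh // 2, kw // 2
    padded = np.pad(mask, ((pad_y, pad_y), (pad_x, pad_x)))
    out = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    for dy in range(kh):
        for dx in range(kw):
            if kernel[dy, dx]:
                out |= padded[dy:dy + h, dx:dx + w]
    return out


def bbox_dilate_sketch(bbox, kernel, image_shape):
    # Rasterize the box, dilate the mask, then take the bounding box of the result.
    x_min, y_min, x_max, y_max = bbox
    mask = np.zeros(image_shape, dtype=bool)
    mask[y_min:y_max, x_min:x_max] = True
    mask = dilate_mask_sketch(mask, kernel)
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1


kernel = np.ones((3, 3), dtype=np.uint8)
print(bbox_dilate_sketch((3, 3, 6, 6), kernel, (10, 10)))  # (2, 2, 7, 7)
```

A 3x3 kernel grows the box by one pixel on each side, and the mask's image bounds clip the growth at the border.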

d4_images (function)

d4_images(
    img: ImageType,
    group_member: Literal['e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't']
)

Apply one of the eight D4 symmetries of a square (rotations and flips) to a batch of images of shape (N, H, W[, C]). Each transformation corresponds to a `D_4` dihedral group operation and is identified by a group member code.

Args:
    img (ImageType): The input batch of images, with shape:
        - (N, H, W) for grayscale images
        - (N, H, W, C) for multi-channel images
        where N is the batch size, H the height, W the width and C the number of channels.
    group_member (Literal['e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't']): Code of the transformation to apply:
        - 'e': Identity (no transformation).
        - 'r90': Rotate 90 degrees counterclockwise.
        - 'r180': Rotate 180 degrees.
        - 'r270': Rotate 270 degrees counterclockwise.
        - 'v': Vertical flip.
        - 'hvt': Transpose over the second (anti-)diagonal.
        - 'h': Horizontal flip.
        - 't': Transpose (reflect over the main diagonal).

Returns:
    np.ndarray: The transformed batch of images.

Parameters

Name	Type	Default	Description
img	ImageType	-	-
group_member	One of: 'e' 'r90' 'r180' 'r270' 'v' 'hvt' 'h' 't'	-	-
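
The eight group members can be expressed with NumPy's rot90, flip and swapaxes on the height and width axes (1 and 2). This is a sketch with a hypothetical helper name, not the library's dispatch code. 'hvt' is built as a main-diagonal transpose followed by a 180-degree rotation, which equals reflection over the anti-diagonal.

```python
import numpy as np


def d4_images_sketch(images, group_member):
    # images: (N, H, W) or (N, H, W, C); axes 1 and 2 are height and width.
    ops = {
        "e": lambda x: x,
        "r90": lambda x: np.rot90(x, 1, axes=(1, 2)),  # 90 degrees counterclockwise
        "r180": lambda x: np.rot90(x, 2, axes=(1, 2)),
        "r270": lambda x: np.rot90(x, 3, axes=(1, 2)),
        "v": lambda x: np.flip(x, axis=1),             # vertical flip: rows reversed
        "h": lambda x: np.flip(x, axis=2),             # horizontal flip: columns reversed
        "t": lambda x: np.swapaxes(x, 1, 2),            # main-diagonal transpose
        # Anti-diagonal transpose = transpose followed by a 180-degree rotation.
        "hvt": lambda x: np.rot90(np.swapaxes(x, 1, 2), 2, axes=(1, 2)),
    }
    return ops[group_member](images)
```

The five reflections plus 'r180' are involutions (applying them twice gives the identity), while 'r90' and 'r270' undo each other; these identities make a quick sanity check for any D4 implementation.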
