albumentations.augmentations.geometric.functional


Functional implementations of geometric image transformations. This module provides low-level functions for geometric operations such as rotation, resizing, flipping, perspective transforms, and affine transformations on images, bounding boxes and keypoints.

Members

adjust_padding_by_positionfunction

adjust_padding_by_position(
    h_top: int,
    h_bottom: int,
    w_left: int,
    w_right: int,
    position: Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random'],
    py_random: np.random.RandomState
)

Adjust padding values based on desired position.

Parameters

NameTypeDefaultDescription
h_topint--
h_bottomint--
w_leftint--
w_rightint--
position
One of:
  • 'center'
  • 'top_left'
  • 'top_right'
  • 'bottom_left'
  • 'bottom_right'
  • 'random'
--
py_randomnp.random.RandomState--

almost_equal_intervalsfunction

almost_equal_intervals(
    n: int,
    parts: int
)

Generates an array of nearly equal integer intervals that sum up to `n`. This function divides the number `n` into `parts` nearly equal parts. It ensures that the sum of all parts equals `n`, and the difference between any two parts is at most one. This is useful for distributing a total amount into nearly equal discrete parts.

Parameters

NameTypeDefaultDescription
nint-The total value to be split.
partsint-The number of parts to split into.

Returns

  • np.ndarray: An array of integers where each integer represents the size of a part.

Example

>>> almost_equal_intervals(20, 3)
array([7, 7, 6])  # Splits 20 into three parts: 7, 7, and 6
>>> almost_equal_intervals(16, 4)
array([4, 4, 4, 4])  # Splits 16 into four equal parts

apply_affine_to_pointsfunction

apply_affine_to_points(
    points: np.ndarray,
    matrix: np.ndarray
)

Apply affine transformation to a set of points. This function handles potential division by zero by replacing zero values in the homogeneous coordinate with a small epsilon value.

Parameters

NameTypeDefaultDescription
pointsnp.ndarray-Array of points with shape (N, 2).
matrixnp.ndarray-3x3 affine transformation matrix.

Returns

  • np.ndarray: Transformed points with shape (N, 2).
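The homogeneous-coordinate bookkeeping described above can be sketched with plain NumPy; this is an illustrative snippet matching the description, not the library's exact internal implementation.

import numpy as np

points = np.array([[10.0, 20.0], [30.0, 40.0]])                # (N, 2)
matrix = np.array([[1.0, 0.0, 5.0],
                   [0.0, 1.0, -3.0],
                   [0.0, 0.0, 1.0]])                            # 3x3 affine matrix

homogeneous = np.hstack([points, np.ones((len(points), 1))])    # (N, 3)
transformed = homogeneous @ matrix.T                            # (N, 3)
w = transformed[:, 2:3]
w = np.where(w == 0, np.finfo(float).eps, w)                    # guard against division by zero
result = transformed[:, :2] / w                                 # back to (N, 2)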

bbox_distort_imagefunction

bbox_distort_image(
    bboxes: np.ndarray,
    generated_mesh: np.ndarray,
    image_shape: tuple[int, int]
)

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray--
generated_meshnp.ndarray--
image_shapetuple[int, int]--

bboxes_affinefunction

bboxes_affine(
    bboxes: np.ndarray,
    matrix: np.ndarray,
    rotate_method: Literal['largest_box', 'ellipse'],
    image_shape: tuple[int, int],
    border_mode: int,
    output_shape: tuple[int, int]
)

Apply an affine transformation to bounding boxes. For reflection border modes (cv2.BORDER_REFLECT_101, cv2.BORDER_REFLECT), this function:
1. Calculates the necessary padding to avoid information loss
2. Applies padding to the bounding boxes
3. Adjusts the transformation matrix to account for padding
4. Applies the affine transformation
5. Validates the transformed bounding boxes
For other border modes, it directly applies the affine transformation without padding.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray-Input bounding boxes
matrixnp.ndarray-Affine transformation matrix
rotate_method
One of:
  • 'largest_box'
  • 'ellipse'
-Method for rotating bounding boxes ('largest_box' or 'ellipse')
image_shapetuple[int, int]-Shape of the input image
border_modeint-OpenCV border mode
output_shapetuple[int, int]-Shape of the output image

Returns

  • np.ndarray: Transformed and normalized bounding boxes

bboxes_affine_ellipsefunction

bboxes_affine_ellipse(
    bboxes: np.ndarray,
    matrix: np.ndarray
)

Apply an affine transformation to bounding boxes using an ellipse approximation method. This function transforms bounding boxes by approximating each box with an ellipse, transforming points along the ellipse's circumference, and then computing the new bounding box that encloses the transformed ellipse.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray-An array of bounding boxes with shape (N, 4+) where N is the number of bounding boxes. Each row should contain [x_min, y_min, x_max, y_max] followed by any additional attributes (e.g., class labels).
matrixnp.ndarray-The 3x3 affine transformation matrix to apply.

Returns

  • np.ndarray: An array of transformed bounding boxes with the same shape as the input.

Notes

- This function assumes that the input bounding boxes are in the format [x_min, y_min, x_max, y_max].
- The ellipse approximation method can provide a tighter bounding box compared to the largest box method, especially for rotations.
- 360 points are used to approximate each ellipse, which provides a good balance between accuracy and computational efficiency.
- Any additional attributes beyond the first 4 coordinates are preserved unchanged.
- This method may be more suitable for objects that are roughly elliptical in shape.
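The idea can be sketched for a single box: sample points on the inscribed ellipse, push them through the matrix, and take the enclosing axis-aligned box. This is an illustrative sketch of the approach described above, not the library's implementation.

import numpy as np

x_min, y_min, x_max, y_max = 10.0, 10.0, 30.0, 20.0
cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
a, b = (x_max - x_min) / 2, (y_max - y_min) / 2

angles = np.deg2rad(np.arange(360))                          # 360 sample points on the ellipse
pts = np.stack([cx + a * np.cos(angles),
                cy + b * np.sin(angles)], axis=1)

matrix = np.array([[0.866, -0.5, 0.0],
                   [0.5, 0.866, 0.0],
                   [0.0, 0.0, 1.0]])                          # ~30 degree rotation
homo = np.hstack([pts, np.ones((len(pts), 1))])
transformed = (homo @ matrix.T)[:, :2]

new_box = [transformed[:, 0].min(), transformed[:, 1].min(),
           transformed[:, 0].max(), transformed[:, 1].max()]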

bboxes_affine_largest_boxfunction

bboxes_affine_largest_box(
    bboxes: np.ndarray,
    matrix: np.ndarray
)

Apply an affine transformation to bounding boxes and return the largest enclosing boxes. This function transforms each corner of every bounding box using the given affine transformation matrix, then computes the new bounding boxes that fully enclose the transformed corners.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray-An array of bounding boxes with shape (N, 4+) where N is the number of bounding boxes. Each row should contain [x_min, y_min, x_max, y_max] followed by any additional attributes (e.g., class labels).
matrixnp.ndarray-The 3x3 affine transformation matrix to apply.

Returns

  • np.ndarray: An array of transformed bounding boxes with the same shape as the input.

Example

>>> bboxes = np.array([[10, 10, 20, 20, 1], [30, 30, 40, 40, 2]])  # Two boxes with class labels
>>> matrix = np.array([[2, 0, 5], [0, 2, 5], [0, 0, 1]])  # Scale by 2 and translate by (5, 5)
>>> transformed_bboxes = bboxes_affine_largest_box(bboxes, matrix)
>>> print(transformed_bboxes)
[[ 25.  25.  45.  45.   1.]
 [ 65.  65.  85.  85.   2.]]

Notes

- This function assumes that the input bounding boxes are in the format [x_min, y_min, x_max, y_max].
- The resulting bounding boxes are the smallest axis-aligned boxes that completely enclose the transformed original boxes. They may be larger than the minimal possible bounding box if the original box becomes rotated.
- Any additional attributes beyond the first 4 coordinates are preserved unchanged.
- This method is called "largest box" because it returns the largest axis-aligned box that encloses all corners of the transformed bounding box.

bboxes_d4function

bboxes_d4(
    bboxes: np.ndarray,
    group_member: Literal['e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't']
)

Applies a `D_4` symmetry group transformation to bounding boxes. The function transforms bounding boxes according to the specified group member from the `D_4` group. These transformations include rotations and reflections, specified to work on an image's bounding boxes given its dimensions. `bboxes` is a numpy array with shape (num_bboxes, 4+), where each row represents a bounding box (x_min, y_min, x_max, y_max, ...), and `group_member` is a string identifier for the `D_4` group transformation to apply. Raises ValueError if an invalid group member is specified.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray--
group_member
One of:
  • 'e'
  • 'r90'
  • 'r180'
  • 'r270'
  • 'v'
  • 'hvt'
  • 'h'
  • 't'
--

Returns

  • np.ndarray: The transformed bounding boxes.

bboxes_grid_shufflefunction

bboxes_grid_shuffle(
    bboxes: np.ndarray,
    tiles: np.ndarray,
    mapping: list[int],
    image_shape: tuple[int, int],
    min_area: float,
    min_visibility: float
)

Shuffle bounding boxes according to grid mapping.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray-Array of bounding boxes with shape (num_boxes, 4+)
tilesnp.ndarray-Array of grid tiles
mappinglist[int]-Mapping of tile indices
image_shapetuple[int, int]-Shape of the image (height, width)
min_areafloat-Minimum area of a bounding box to keep
min_visibilityfloat-Minimum visibility ratio of a bounding box to keep

Returns

  • np.ndarray: Shuffled bounding boxes

bboxes_hflipfunction

bboxes_hflip(
    bboxes: np.ndarray
)

Flip bounding boxes horizontally.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray-Array of bounding boxes with shape (num_boxes, 4+)

Returns

  • np.ndarray: Horizontally flipped bounding boxes

bboxes_piecewise_affinefunction

bboxes_piecewise_affine(
    bboxes: np.ndarray,
    map_x: np.ndarray,
    map_y: np.ndarray,
    border_mode: int,
    image_shape: tuple[int, int]
)

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray--
map_xnp.ndarray--
map_ynp.ndarray--
border_modeint--
image_shapetuple[int, int]--

bboxes_rot90function

bboxes_rot90(
    bboxes: np.ndarray,
    factor: int
)

Rotates bounding boxes by 90 degrees CCW (see np.rot90)

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray-Array of bounding boxes with shape (num_boxes, 4+)
factorint-Number of 90-degree rotations (1, 2, or 3)

Returns

  • np.ndarray: Rotated bounding boxes

bboxes_transposefunction

bboxes_transpose(
    bboxes: np.ndarray
)

Transpose bounding boxes along the main diagonal.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray-Array of bounding boxes with shape (num_boxes, 4+)

Returns

  • np.ndarray: Transposed bounding boxes

bboxes_vflipfunction

bboxes_vflip(
    bboxes: np.ndarray
)

Flip bounding boxes vertically.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray-Array of bounding boxes with shape (num_boxes, 4+)

Returns

  • np.ndarray: Vertically flipped bounding boxes

calculate_affine_transform_paddingfunction

calculate_affine_transform_padding(
    matrix: np.ndarray,
    image_shape: tuple[int, int]
)

Calculate the necessary padding for an affine transformation to avoid empty spaces.

Parameters

NameTypeDefaultDescription
matrixnp.ndarray--
image_shapetuple[int, int]--

centerfunction

center(
    image_shape: tuple[int, int]
)

Calculate the center coordinates of an image. Used by images, masks, and keypoints.

Parameters

NameTypeDefaultDescription
image_shapetuple[int, int]-The shape of the image.

Returns

  • tuple[float, float]: center_x, center_y

center_bboxfunction

center_bbox(
    image_shape: tuple[int, int]
)

Calculate the center coordinates of an image for bounding boxes.

Parameters

NameTypeDefaultDescription
image_shapetuple[int, int]-The shape of the image.

Returns

  • tuple[float, float]: center_x, center_y

compute_affine_warp_output_shapefunction

compute_affine_warp_output_shape(
    matrix: np.ndarray,
    input_shape: tuple[int, ...]
)

Parameters

NameTypeDefaultDescription
matrixnp.ndarray--
input_shapetuple[int, ...]--

compute_pairwise_distancesfunction

compute_pairwise_distances(
    points1: np.ndarray,
    points2: np.ndarray
)

Compute pairwise distances between two sets of points.

Parameters

NameTypeDefaultDescription
points1np.ndarray-First set of points with shape (N, 2)
points2np.ndarray-Second set of points with shape (M, 2)

Returns

  • np.ndarray: Matrix of pairwise distances with shape (N, M)
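A minimal NumPy broadcasting sketch of the same computation; the array names are illustrative only.

import numpy as np

points1 = np.array([[0.0, 0.0], [1.0, 1.0]])                   # (N, 2)
points2 = np.array([[0.0, 1.0], [2.0, 2.0], [3.0, 0.0]])       # (M, 2)

diff = points1[:, None, :] - points2[None, :, :]                # (N, M, 2)
distances = np.sqrt((diff ** 2).sum(axis=-1))                   # (N, M) pairwise Euclidean distances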

compute_perspective_paramsfunction

compute_perspective_params(
    points: np.ndarray,
    image_shape: tuple[int, int]
)

Parameters

NameTypeDefaultDescription
pointsnp.ndarray--
image_shapetuple[int, int]--

compute_tps_weightsfunction

compute_tps_weights(
    src_points: np.ndarray,
    dst_points: np.ndarray
)

Compute Thin Plate Spline weights.

Parameters

NameTypeDefaultDescription
src_pointsnp.ndarray-Source control points with shape (num_points, 2)
dst_pointsnp.ndarray-Destination control points with shape (num_points, 2)

Returns

  • tuple[np.ndarray, np.ndarray]: Tuple of (nonlinear_weights, affine_weights)
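One common way to obtain TPS weights is to solve the standard linear system built from the radial basis U(r) = r^2 log(r). The sketch below assumes that formulation; it is not necessarily identical to the library's internals.

import numpy as np

def tps_weights_sketch(src_points, dst_points):
    # src_points, dst_points: (num_points, 2) arrays of control points
    n = len(src_points)
    r = np.linalg.norm(src_points[:, None] - src_points[None, :], axis=-1)
    K = r ** 2 * np.log(r + 1e-12)                    # radial basis matrix, ~0 on the diagonal
    P = np.hstack([np.ones((n, 1)), src_points])      # affine part: [1, x, y]

    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.vstack([dst_points, np.zeros((3, 2))])

    solution = np.linalg.solve(A, b)
    return solution[:n], solution[n:]                 # (nonlinear_weights, affine_weights)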

compute_transformed_image_boundsfunction

compute_transformed_image_bounds(
    matrix: np.ndarray,
    image_shape: tuple[int, int]
)

Compute the bounds of an image after applying an affine transformation.

Parameters

NameTypeDefaultDescription
matrixnp.ndarray-The 3x3 affine transformation matrix.
image_shapetuple[int, int]-The shape of the image as (height, width).

Returns

  • tuple[np.ndarray, np.ndarray]: A tuple containing:

copy_make_border_with_value_extensionfunction

copy_make_border_with_value_extension(
    img: np.ndarray,
    top: int,
    bottom: int,
    left: int,
    right: int,
    border_mode: int,
    value: tuple[float, ...] | float
)

Parameters

NameTypeDefaultDescription
imgnp.ndarray--
topint--
bottomint--
leftint--
rightint--
border_modeint--
value
One of:
  • tuple[float, ...]
  • float
--

create_affine_transformation_matrixfunction

create_affine_transformation_matrix(
    translate: Mapping[str, float],
    shear: dict[str, float],
    scale: dict[str, float],
    rotate: float,
    shift: tuple[float, float]
)

Create an affine transformation matrix combining translation, shear, scale, and rotation.

Parameters

NameTypeDefaultDescription
translateMapping[str, float]-Translation in x and y directions.
sheardict[str, float]-Shear in x and y directions (in degrees).
scaledict[str, float]-Scale factors for x and y directions.
rotatefloat-Rotation angle in degrees.
shifttuple[float, float]-Shift to apply before and after transformations.

Returns

  • np.ndarray: The resulting 3x3 affine transformation matrix.
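An illustrative sketch of composing such a 3x3 matrix from rotation, scale, shear, and translation around a shift point. The dict keys ('x', 'y') and the composition order used here are assumptions and may differ from the library's convention.

import numpy as np

def compose_affine_sketch(translate, shear, scale, rotate, shift):
    cx, cy = shift
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)

    theta = np.deg2rad(rotate)
    rotation = np.array([[np.cos(theta), -np.sin(theta), 0],
                         [np.sin(theta),  np.cos(theta), 0],
                         [0, 0, 1]])
    scaling = np.diag([scale["x"], scale["y"], 1.0])
    shearing = np.array([[1, np.tan(np.deg2rad(shear["x"])), 0],
                         [np.tan(np.deg2rad(shear["y"])), 1, 0],
                         [0, 0, 1]])
    translation = np.array([[1, 0, translate["x"]],
                            [0, 1, translate["y"]],
                            [0, 0, 1]], dtype=float)

    # shift to origin, apply scale/shear/rotation/translation, shift back
    return back @ translation @ rotation @ shearing @ scaling @ to_origin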

create_piecewise_affine_mapsfunction

create_piecewise_affine_maps(
    image_shape: tuple[int, int],
    grid: tuple[int, int],
    scale: float,
    absolute_scale: bool,
    random_generator: np.random.Generator
)

Create maps for piecewise affine transformation using OpenCV's remap function.

Parameters

NameTypeDefaultDescription
image_shapetuple[int, int]--
gridtuple[int, int]--
scalefloat--
absolute_scalebool--
random_generatornp.random.Generator--

create_shape_groupsfunction

create_shape_groups(
    tiles: np.ndarray
)

Groups tiles by their shape and stores the indices for each shape.

Parameters

NameTypeDefaultDescription
tilesnp.ndarray--

d4function

d4(
    img: np.ndarray,
    group_member: Literal['e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't']
)

Applies a `D_4` symmetry group transformation to an image array. This function manipulates an image using transformations such as rotations and flips, corresponding to the `D_4` dihedral group symmetry operations. Each transformation is identified by a unique group member code.

Parameters

NameTypeDefaultDescription
imgnp.ndarray-The input image array to transform.
group_member
One of:
  • 'e'
  • 'r90'
  • 'r180'
  • 'r270'
  • 'v'
  • 'hvt'
  • 'h'
  • 't'
-A string identifier indicating the specific transformation to apply. Valid codes include:
  - 'e': Identity (no transformation).
  - 'r90': Rotate 90 degrees counterclockwise.
  - 'r180': Rotate 180 degrees.
  - 'r270': Rotate 270 degrees counterclockwise.
  - 'v': Vertical flip.
  - 'hvt': Transpose over the second diagonal.
  - 'h': Horizontal flip.
  - 't': Transpose (reflect over the main diagonal).

Returns

  • np.ndarray: The transformed image array.
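A sketch of how the eight D_4 members can be realised with NumPy operations on an (H, W) or (H, W, C) array; the mapping of codes to operations is an assumption based on the descriptions above.

import numpy as np

d4_ops = {
    "e":    lambda x: x,                                  # identity
    "r90":  lambda x: np.rot90(x, 1),                     # 90 degrees CCW
    "r180": lambda x: np.rot90(x, 2),
    "r270": lambda x: np.rot90(x, 3),
    "v":    lambda x: x[::-1, ...],                       # vertical flip
    "h":    lambda x: x[:, ::-1, ...],                    # horizontal flip
    "t":    lambda x: x.swapaxes(0, 1),                   # main-diagonal transpose
    "hvt":  lambda x: np.rot90(x, 2).swapaxes(0, 1),      # second-diagonal transpose
}

img = np.arange(12).reshape(3, 4)
rotated = d4_ops["r90"](img)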

distort_imagefunction

distort_image(
    image: np.ndarray,
    generated_mesh: np.ndarray,
    interpolation: int
)

Apply perspective distortion to an image based on a generated mesh. This function applies a perspective transformation to each cell of the image defined by the generated mesh. The distortion is applied using OpenCV's perspective transformation and blending techniques.

Parameters

NameTypeDefaultDescription
imagenp.ndarray-The input image to be distorted. Can be a 2D grayscale image or a 3D color image.
generated_meshnp.ndarray-A 2D array where each row represents a quadrilateral cell as [x1, y1, x2, y2, dst_x1, dst_y1, dst_x2, dst_y2, dst_x3, dst_y3, dst_x4, dst_y4]. The first four values define the source rectangle, and the last eight values define the destination quadrilateral.
interpolationint-Interpolation method to be used in the perspective transformation. Should be one of the OpenCV interpolation flags (e.g., cv2.INTER_LINEAR).

Returns

  • np.ndarray: The distorted image with the same shape and dtype as the input image.

Example

>>> image = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
>>> mesh = np.array([[0, 0, 50, 50, 5, 5, 45, 5, 45, 45, 5, 45]])
>>> distorted = distort_image(image, mesh, cv2.INTER_LINEAR)
>>> distorted.shape
(100, 100, 3)

Notes

- The function preserves the channel dimension of the input image.
- Each cell of the generated mesh is transformed independently and then blended into the output image.
- The distortion is applied using perspective transformation, which allows for more complex distortions compared to affine transformations.

distort_image_keypointsfunction

distort_image_keypoints(
    keypoints: np.ndarray,
    generated_mesh: np.ndarray,
    image_shape: tuple[int, int]
)

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray--
generated_meshnp.ndarray--
image_shapetuple[int, int]--

expand_transformfunction

expand_transform(
    matrix: np.ndarray,
    shape: tuple[int, int]
)

Parameters

NameTypeDefaultDescription
matrixnp.ndarray--
shapetuple[int, int]--

extend_valuefunction

extend_value(
    value: tuple[float, ...] | float,
    num_channels: int
)

Parameters

NameTypeDefaultDescription
value
One of:
  • tuple[float, ...]
  • float
--
num_channelsint--

find_keypointfunction

find_keypoint(
    position: tuple[int, int],
    distance_map: np.ndarray,
    threshold: float | None,
    inverted: bool
)

Determine if a valid keypoint can be found at the given position.

Parameters

NameTypeDefaultDescription
positiontuple[int, int]--
distance_mapnp.ndarray--
threshold
One of:
  • float
  • None
--
invertedbool--

flip_bboxesfunction

flip_bboxes(
    bboxes: np.ndarray,
    flip_horizontal: bool = False,
    flip_vertical: bool = False,
    image_shape: tuple[int, int] = (0, 0)
)

Flip bounding boxes horizontally and/or vertically.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray-Array of bounding boxes with shape (n, m) where each row is [x_min, y_min, x_max, y_max, ...].
flip_horizontalboolFalseWhether to flip horizontally.
flip_verticalboolFalseWhether to flip vertically.
image_shapetuple[int, int](0, 0)Shape of the image as (height, width).

Returns

  • np.ndarray: Flipped bounding boxes.

flip_keypointsfunction

flip_keypoints(
    keypoints: np.ndarray,
    flip_horizontal: bool = False,
    flip_vertical: bool = False,
    image_shape: tuple[int, int] = (0, 0)
)

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray--
flip_horizontalboolFalse-
flip_verticalboolFalse-
image_shapetuple[int, int](0, 0)-

from_distance_mapsfunction

from_distance_maps(
    distance_maps: np.ndarray,
    inverted: bool,
    if_not_found_coords: Sequence[int] | dict[str, Any] | None = None,
    threshold: float | None = None
)

Convert distance maps back to keypoints coordinates. This function is the inverse of `to_distance_maps`. It takes distance maps generated for a set of keypoints and reconstructs the original keypoint coordinates. The function supports both regular and inverted distance maps, and can handle cases where keypoints are not found or fall outside a specified threshold.

Parameters

NameTypeDefaultDescription
distance_mapsnp.ndarray-A 3D numpy array of shape (height, width, nb_keypoints) containing distance maps for each keypoint. Each channel represents the distance map for one keypoint.
invertedbool-If True, treats the distance maps as inverted (where higher values indicate closer proximity to keypoints). If False, treats them as regular distance maps (where lower values indicate closer proximity).
if_not_found_coords
One of:
  • Sequence[int]
  • dict[str, Any]
  • None
NoneCoordinates to use for keypoints that are not found or fall outside the threshold. Can be:
  - None: Drop keypoints that are not found.
  - Sequence of two integers: Use these as (x, y) coordinates for not found keypoints.
  - Dict with 'x' and 'y' keys: Use these values for not found keypoints.
Defaults to None.
threshold
One of:
  • float
  • None
NoneA threshold value to determine valid keypoints. For inverted maps, values >= threshold are considered valid. For regular maps, values <= threshold are considered valid. If None, all keypoints are considered valid. Defaults to None.

Returns

  • np.ndarray: A 2D numpy array of shape (nb_keypoints, 2) containing the (x, y) coordinates

Example

>>> distance_maps = np.random.rand(100, 100, 3)  # 3 keypoints
>>> inverted = True
>>> if_not_found_coords = [0, 0]
>>> threshold = 0.5
>>> keypoints = from_distance_maps(distance_maps, inverted, if_not_found_coords, threshold)
>>> print(keypoints.shape)
(3, 2)

Notes

- The function uses vectorized operations for improved performance, especially with large numbers of keypoints.
- When `threshold` is None, all keypoints are considered valid, and `if_not_found_coords` is not used.
- The function assumes that the input distance maps are properly normalized and scaled according to the original image dimensions.

generate_control_pointsfunction

generate_control_points(
    num_control_points: int
)

Generate control points for TPS transformation.

Parameters

NameTypeDefaultDescription
num_control_pointsint-Number of control points per side

Returns

  • np.ndarray: Control points with shape (N, 2)

generate_displacement_fieldsfunction

generate_displacement_fields(
    image_shape: tuple[int, int],
    alpha: float,
    sigma: float,
    same_dxdy: bool,
    kernel_size: tuple[int, int],
    random_generator: np.random.Generator,
    noise_distribution: Literal['gaussian', 'uniform']
)

Generate displacement fields for elastic transform.

Parameters

NameTypeDefaultDescription
image_shapetuple[int, int]--
alphafloat--
sigmafloat--
same_dxdybool--
kernel_sizetuple[int, int]--
random_generatornp.random.Generator--
noise_distribution
One of:
  • 'gaussian'
  • 'uniform'
--

generate_distorted_grid_polygonsfunction

generate_distorted_grid_polygons(
    dimensions: np.ndarray,
    magnitude: int,
    random_generator: np.random.Generator
)

Generate distorted grid polygons based on input dimensions and magnitude. This function creates a grid of polygons and applies random distortions to the internal vertices, while keeping the boundary vertices fixed. The distortion is applied consistently across shared vertices to avoid gaps or overlaps in the resulting grid.

Parameters

NameTypeDefaultDescription
dimensionsnp.ndarray-A 3D array of shape (grid_height, grid_width, 4) where each element is [x_min, y_min, x_max, y_max] representing the dimensions of a grid cell.
magnitudeint-Maximum pixel-wise displacement for distortion. The actual displacement will be randomly chosen in the range [-magnitude, magnitude].
random_generatornp.random.Generator-A random number generator.

Returns

  • np.ndarray: A 2D array of shape (total_cells, 8) where each row represents a distorted polygon

Example

>>> dimensions = np.array([[[0, 0, 50, 50], [50, 0, 100, 50]],
...                        [[0, 50, 50, 100], [50, 50, 100, 100]]])
>>> distorted = generate_distorted_grid_polygons(dimensions, magnitude=10, random_generator=np.random.default_rng())
>>> distorted.shape
(4, 8)

Notes

- Only internal grid points are distorted; boundary points remain fixed.
- The function ensures consistent distortion across shared vertices of adjacent cells.
- The distortion is applied to the following points of each internal cell:
  * Bottom-right of the cell above and to the left
  * Bottom-left of the cell above
  * Top-right of the cell to the left
  * Top-left of the current cell
- Each square represents a cell, and the X marks indicate the coordinates where displacement occurs:

  +--+--+--+--+
  |  |  |  |  |
  +--X--X--X--+
  |  |  |  |  |
  +--X--X--X--+
  |  |  |  |  |
  +--X--X--X--+
  |  |  |  |  |
  +--+--+--+--+

- For each X, the coordinates of the left, right, top, and bottom edges in the four adjacent cells are displaced.

generate_gridfunction

generate_grid(
    image_shape: tuple[int, int],
    steps_x: list[float],
    steps_y: list[float],
    num_steps: int
)

Generate a distorted grid for image transformation based on given step sizes. This function creates two 2D arrays (map_x and map_y) that represent a distorted version of the original image grid. These arrays can be used with OpenCV's remap function to apply grid distortion to an image.

Parameters

NameTypeDefaultDescription
image_shapetuple[int, int]-The shape of the image as (height, width).
steps_xlist[float]-List of step sizes for the x-axis distortion. The length should be num_steps + 1. Each value represents the relative step size for a segment of the grid in the x direction.
steps_ylist[float]-List of step sizes for the y-axis distortion. The length should be num_steps + 1. Each value represents the relative step size for a segment of the grid in the y direction.
num_stepsint-The number of steps to divide each axis into. This determines the granularity of the distortion grid.

Returns

  • tuple[np.ndarray, np.ndarray]: A tuple containing two 2D numpy arrays:

Example

>>> image_shape = (100, 100)
>>> steps_x = [1.1, 0.9, 1.0, 1.2, 0.95, 1.05]
>>> steps_y = [0.9, 1.1, 1.0, 1.1, 0.9, 1.0]
>>> num_steps = 5
>>> map_x, map_y = generate_grid(image_shape, steps_x, steps_y, num_steps)
>>> distorted_image = cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR)

Notes

- The function generates a grid where each cell can be distorted independently.
- The distortion is controlled by the steps_x and steps_y parameters, which determine how much each grid line is shifted.
- The resulting map_x and map_y can be used directly with cv2.remap() to apply the distortion to an image.
- The distortion is applied smoothly across each grid cell using linear interpolation.

generate_inverse_distortion_mapfunction

generate_inverse_distortion_map(
    map_x: np.ndarray,
    map_y: np.ndarray,
    shape: tuple[int, int]
)

Generate inverse mapping for strong distortions.

Parameters

NameTypeDefaultDescription
map_xnp.ndarray--
map_ynp.ndarray--
shapetuple[int, int]--

generate_perspective_pointsfunction

generate_perspective_points(
    image_shape: tuple[int, int],
    scale: float,
    random_generator: np.random.Generator
)

Parameters

NameTypeDefaultDescription
image_shapetuple[int, int]--
scalefloat--
random_generatornp.random.Generator--

generate_reflected_bboxesfunction

generate_reflected_bboxes(
    bboxes: np.ndarray,
    grid_dims: dict[str, tuple[int, int]],
    image_shape: tuple[int, int],
    center_in_origin: bool = False
)

Generate reflected bounding boxes for the entire reflection grid.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray-Original bounding boxes.
grid_dimsdict[str, tuple[int, int]]-Grid dimensions and original position.
image_shapetuple[int, int]-Shape of the original image as (height, width).
center_in_originboolFalseIf True, center the grid at the origin. Default is False.

Returns

  • np.ndarray: Array of reflected and shifted bounding boxes for the entire grid.

generate_reflected_keypointsfunction

generate_reflected_keypoints(
    keypoints: np.ndarray,
    grid_dims: dict[str, tuple[int, int]],
    image_shape: tuple[int, int],
    center_in_origin: bool = False
)

Generate reflected keypoints for the entire reflection grid. This function creates a grid of keypoints by reflecting and shifting the original keypoints. It handles both centered and non-centered grids based on the `center_in_origin` parameter.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray-Original keypoints array of shape (N, 4+), where N is the number of keypoints, and each keypoint is represented by at least 4 values (x, y, angle, scale, ...).
grid_dimsdict[str, tuple[int, int]]-A dictionary containing grid dimensions and original position. It should have the following keys: - "grid_shape": tuple[int, int] representing (grid_rows, grid_cols) - "original_position": tuple[int, int] representing (original_row, original_col)
image_shapetuple[int, int]-Shape of the original image as (height, width).
center_in_originboolFalseIf True, center the grid at the origin. Default is False.

Returns

  • np.ndarray: Array of reflected and shifted keypoints for the entire grid. The shape is

Notes

- The function handles keypoint flipping and shifting to create a grid of reflected keypoints.
- It preserves the angle and scale information of the keypoints during transformations.
- The resulting grid can be either centered at the origin or positioned based on the original grid.

generate_shuffled_splitsfunction

generate_shuffled_splits(
    size: int,
    divisions: int,
    random_generator: np.random.Generator
)

Generate shuffled splits for a given dimension size and number of divisions.

Parameters

NameTypeDefaultDescription
sizeint-Total size of the dimension (height or width).
divisionsint-Number of divisions (rows or columns).
random_generatornp.random.Generator-The random generator to use for shuffling the splits. If None, the splits are not shuffled.

Returns

  • np.ndarray: Cumulative edges of the shuffled intervals.

get_camera_matrix_distortion_mapsfunction

get_camera_matrix_distortion_maps(
    image_shape: tuple[int, int],
    k: float
)

Generate distortion maps using camera matrix model.

Parameters

NameTypeDefaultDescription
image_shapetuple[int, int]-Image shape (height, width)
kfloat-Distortion coefficient

Returns

  • tuple[np.ndarray, np.ndarray]: Tuple of (map_x, map_y) distortion maps

get_dimension_paddingfunction

get_dimension_padding(
    current_size: int,
    min_size: int | None,
    divisor: int | None
)

Calculate padding for a single dimension.

Parameters

NameTypeDefaultDescription
current_sizeint-Current size of the dimension
min_size
One of:
  • int
  • None
-Minimum size requirement, if any
divisor
One of:
  • int
  • None
-Divisor for padding to make size divisible, if any

Returns

  • tuple[int, int]: (pad_before, pad_after)

get_fisheye_distortion_mapsfunction

get_fisheye_distortion_maps(
    image_shape: tuple[int, int],
    k: float
)

Generate distortion maps using fisheye model.

Parameters

NameTypeDefaultDescription
image_shapetuple[int, int]-Image shape (height, width)
kfloat-Distortion coefficient

Returns

  • tuple[np.ndarray, np.ndarray]: Tuple of (map_x, map_y) distortion maps

get_pad_grid_dimensionsfunction

get_pad_grid_dimensions(
    pad_top: int,
    pad_bottom: int,
    pad_left: int,
    pad_right: int,
    image_shape: tuple[int, int]
)

Calculate the dimensions of the grid needed for reflection padding and the position of the original image.

Parameters

NameTypeDefaultDescription
pad_topint-Number of pixels to pad above the image.
pad_bottomint-Number of pixels to pad below the image.
pad_leftint-Number of pixels to pad to the left of the image.
pad_rightint-Number of pixels to pad to the right of the image.
image_shapetuple[int, int]-Shape of the original image as (height, width).

Returns

  • dict[str, tuple[int, int]]: A dictionary containing:

get_padding_paramsfunction

get_padding_params(
    image_shape: tuple[int, int],
    min_height: int | None,
    min_width: int | None,
    pad_height_divisor: int | None,
    pad_width_divisor: int | None
)

Calculate padding parameters based on target dimensions.

Parameters

NameTypeDefaultDescription
image_shapetuple[int, int]-(height, width) of the image
min_height
One of:
  • int
  • None
-Minimum height requirement, if any
min_width
One of:
  • int
  • None
-Minimum width requirement, if any
pad_height_divisor
One of:
  • int
  • None
-Divisor for height padding, if any
pad_width_divisor
One of:
  • int
  • None
-Divisor for width padding, if any

Returns

  • tuple[int, int, int, int]: (pad_top, pad_bottom, pad_left, pad_right)

is_identity_matrixfunction

is_identity_matrix(
    matrix: np.ndarray
)

Check if the given matrix is an identity matrix.

Parameters

NameTypeDefaultDescription
matrixnp.ndarray-A 3x3 affine transformation matrix.

Returns

  • bool: True if the matrix is an identity matrix, False otherwise.

is_valid_componentfunction

is_valid_component(
    component_area: float,
    original_area: float,
    min_area: float | None,
    min_visibility: float | None
)

Validate if a component meets the minimum requirements.

Parameters

NameTypeDefaultDescription
component_areafloat--
original_areafloat--
min_area
One of:
  • float
  • None
--
min_visibility
One of:
  • float
  • None
--

keypoints_affinefunction

keypoints_affine(
    keypoints: np.ndarray,
    matrix: np.ndarray,
    image_shape: tuple[int, int],
    scale: dict[str, float],
    border_mode: int
)

Apply an affine transformation to keypoints. This function transforms keypoints using the given affine transformation matrix. It handles reflection padding if necessary, updates coordinates, angles, and scales.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray-Array of keypoints with shape (N, 4+) where N is the number of keypoints. Each keypoint is represented as [x, y, angle, scale, ...].
matrixnp.ndarray-The 2x3 or 3x3 affine transformation matrix.
image_shapetuple[int, int]-Shape of the image (height, width).
scaledict[str, float]-Dictionary containing scale factors for x and y directions. Expected keys are 'x' and 'y'.
border_modeint-Border mode for handling keypoints near image edges. Use cv2.BORDER_REFLECT_101, cv2.BORDER_REFLECT, etc.

Returns

  • np.ndarray: Transformed keypoints array with the same shape as input.

Example

>>> keypoints = np.array([[100, 100, 0, 1]])
>>> matrix = np.array([[1.5, 0, 10], [0, 1.2, 20]])
>>> scale = {'x': 1.5, 'y': 1.2}
>>> transformed_keypoints = keypoints_affine(keypoints, matrix, (480, 640), scale, cv2.BORDER_REFLECT_101)

Notes

- The function applies reflection padding if the mode is in REFLECT_BORDER_MODES.
- Coordinates (x, y) are transformed using the affine matrix.
- Angles are adjusted based on the rotation component of the affine transformation.
- Scales are multiplied by the maximum of the x and y scale factors.
- The @angle_2pi_range decorator ensures angles remain in the [0, 2π] range.

keypoints_d4function

keypoints_d4(
    keypoints: np.ndarray,
    group_member: Literal['e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't'],
    image_shape: tuple[int, int]
)

Applies a `D_4` symmetry group transformation to a keypoint. This function adjusts a keypoint's coordinates according to the specified `D_4` group transformation, which includes rotations and reflections suitable for image processing tasks. These transformations account for the dimensions of the image to ensure the keypoint remains within its boundaries.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray-An array of keypoints with shape (N, 4+) in the format (x, y, angle, scale, ...).
group_member
One of:
  • 'e'
  • 'r90'
  • 'r180'
  • 'r270'
  • 'v'
  • 'hvt'
  • 'h'
  • 't'
-A string identifier for the `D_4` group transformation to apply. Valid values are 'e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't'.
image_shapetuple[int, int]-The shape of the image.

Returns

  • np.ndarray: The transformed keypoints.

keypoints_hflipfunction

keypoints_hflip(
    keypoints: np.ndarray,
    cols: int
)

Flip keypoints horizontally.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray-Array of keypoints with shape (num_keypoints, 2+)
colsint-Number of columns in the image

Returns

  • np.ndarray: Horizontally flipped keypoints

keypoints_rot90function

keypoints_rot90(
    keypoints: np.ndarray,
    factor: Literal[0, 1, 2, 3],
    image_shape: tuple[int, int]
)

Rotate keypoints by 90 degrees counter-clockwise (CCW) a specified number of times.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray-An array of keypoints with shape (N, 4+) in the format (x, y, angle, scale, ...).
factor
One of:
  • 0
  • 1
  • 2
  • 3
-The number of 90 degree CCW rotations to apply. Must be in the range [0, 3].
image_shapetuple[int, int]-The shape of the image (height, width).

Returns

  • np.ndarray: The rotated keypoints with the same shape as the input.

keypoints_scalefunction

keypoints_scale(
    keypoints: np.ndarray,
    scale_x: float,
    scale_y: float
)

Scale keypoints by given factors.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray-Array of keypoints with shape (num_keypoints, 2+)
scale_xfloat-Scale factor for x coordinates
scale_yfloat-Scale factor for y coordinates

Returns

  • np.ndarray: Scaled keypoints

keypoints_transposefunction

keypoints_transpose(
    keypoints: np.ndarray
)

Transpose keypoints along the main diagonal.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray-Array of keypoints with shape (num_keypoints, 2+)

Returns

  • np.ndarray: Transposed keypoints

keypoints_vflipfunction

keypoints_vflip(
    keypoints: np.ndarray,
    rows: int
)

Flip keypoints vertically.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray-Array of keypoints with shape (num_keypoints, 2+)
rowsint-Number of rows in the image

Returns

  • np.ndarray: Vertically flipped keypoints

normalize_grid_distortion_stepsfunction

normalize_grid_distortion_steps(
    image_shape: tuple[int, int],
    num_steps: int,
    x_steps: list[float],
    y_steps: list[float]
)

Parameters

NameTypeDefaultDescription
image_shapetuple[int, int]--
num_stepsint--
x_stepslist[float]--
y_stepslist[float]--

order_pointsfunction

order_points(
    pts: np.ndarray
)

Parameters

NameTypeDefaultDescription
ptsnp.ndarray--

padfunction

pad(
    img: np.ndarray,
    min_height: int,
    min_width: int,
    border_mode: int,
    value: tuple[float, ...] | float | None
)

Pad an image to ensure minimum dimensions. This function adds padding to an image if its dimensions are smaller than the specified minimum dimensions. Padding is added evenly on all sides.

Parameters

NameTypeDefaultDescription
imgnp.ndarray-Input image to pad.
min_heightint-Minimum height of the output image.
min_widthint-Minimum width of the output image.
border_modeint-OpenCV border mode for padding.
value
One of:
  • tuple[float, ...]
  • float
  • None
-Value(s) to fill the border pixels.

Returns

  • np.ndarray: Padded image with dimensions at least (min_height, min_width).
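The even-padding idea can be sketched with cv2.copyMakeBorder; how an odd remainder is split between the two sides is an assumption here.

import cv2
import numpy as np

img = np.zeros((90, 110, 3), dtype=np.uint8)
min_height, min_width = 128, 128

pad_h = max(min_height - img.shape[0], 0)
pad_w = max(min_width - img.shape[1], 0)
top, bottom = pad_h // 2, pad_h - pad_h // 2          # split padding between both sides
left, right = pad_w // 2, pad_w - pad_w // 2

padded = cv2.copyMakeBorder(img, top, bottom, left, right,
                            borderType=cv2.BORDER_CONSTANT, value=0)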

pad_bboxesfunction

pad_bboxes(
    bboxes: np.ndarray,
    pad_top: int,
    pad_bottom: int,
    pad_left: int,
    pad_right: int,
    border_mode: int,
    image_shape: tuple[int, int]
)

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray--
pad_topint--
pad_bottomint--
pad_leftint--
pad_rightint--
border_modeint--
image_shapetuple[int, int]--

pad_keypointsfunction

pad_keypoints(
    keypoints: np.ndarray,
    pad_top: int,
    pad_bottom: int,
    pad_left: int,
    pad_right: int,
    border_mode: int,
    image_shape: tuple[int, int]
)

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray--
pad_topint--
pad_bottomint--
pad_leftint--
pad_rightint--
border_modeint--
image_shapetuple[int, int]--

pad_with_paramsfunction

pad_with_params(
    img: np.ndarray,
    h_pad_top: int,
    h_pad_bottom: int,
    w_pad_left: int,
    w_pad_right: int,
    border_mode: int,
    value: tuple[float, ...] | float | None
)

Pad an image with explicitly defined padding on each side. This function adds specified amounts of padding to each side of the image.

Parameters

NameTypeDefaultDescription
imgnp.ndarray-Input image to pad.
h_pad_topint-Number of pixels to add at the top.
h_pad_bottomint-Number of pixels to add at the bottom.
w_pad_leftint-Number of pixels to add on the left.
w_pad_rightint-Number of pixels to add on the right.
border_modeint-OpenCV border mode for padding.
value
One of:
  • tuple[float, ...]
  • float
  • None
-Value(s) to fill the border pixels.

Returns

  • np.ndarray: Padded image.

perspectivefunction

perspective(
    img: np.ndarray,
    matrix: np.ndarray,
    max_width: int,
    max_height: int,
    border_val: float | list[float] | np.ndarray,
    border_mode: int,
    keep_size: bool,
    interpolation: int
)

Apply perspective transformation to an image. This function warps an image according to a perspective transformation matrix. It can either maintain the original dimensions or use the specified max dimensions.

Parameters

NameTypeDefaultDescription
imgnp.ndarray-Input image to transform.
matrixnp.ndarray-3x3 perspective transformation matrix.
max_widthint-Maximum width of the output image if keep_size is False.
max_heightint-Maximum height of the output image if keep_size is False.
border_val
One of:
  • float
  • list[float]
  • np.ndarray
-Border value(s) to fill areas outside the transformed image.
border_modeint-OpenCV border mode (e.g., cv2.BORDER_CONSTANT, cv2.BORDER_REFLECT).
keep_sizebool-If True, maintain the original image dimensions.
interpolationint-Interpolation method for resampling (cv2 interpolation flag).

Returns

  • np.ndarray: Perspective-transformed image.
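A hedged sketch of a perspective warp built from OpenCV primitives (not the library's exact implementation); cv2.getPerspectiveTransform produces the 3x3 matrix from four point correspondences, and cv2.warpPerspective expects the output size as (width, height).

import cv2
import numpy as np

img = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
src = np.float32([[0, 0], [99, 0], [99, 99], [0, 99]])    # source corners
dst = np.float32([[5, 5], [95, 10], [90, 95], [10, 90]])  # distorted corners
matrix = cv2.getPerspectiveTransform(src, dst)             # 3x3 perspective matrix

warped = cv2.warpPerspective(
    img, matrix, (100, 100),
    flags=cv2.INTER_LINEAR,
    borderMode=cv2.BORDER_CONSTANT,
    borderValue=0,
)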

perspective_bboxesfunction

perspective_bboxes(
    bboxes: np.ndarray,
    image_shape: tuple[int, int],
    matrix: np.ndarray,
    max_width: int,
    max_height: int,
    keep_size: bool
)

Applies perspective transformation to bounding boxes. This function transforms bounding boxes using the given perspective transformation matrix. It handles bounding boxes with additional attributes beyond the standard coordinates.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray-An array of bounding boxes with shape (num_bboxes, 4+). Each row represents a bounding box (x_min, y_min, x_max, y_max, ...). Additional columns beyond the first 4 are preserved unchanged.
image_shapetuple[int, int]-The shape of the image (height, width).
matrixnp.ndarray-The perspective transformation matrix.
max_widthint-The maximum width of the output image.
max_heightint-The maximum height of the output image.
keep_sizebool-If True, maintains the original image size after transformation.

Returns

  • np.ndarray: An array of transformed bounding boxes with the same shape as input.

Example

>>> bboxes = np.array([[0.1, 0.1, 0.3, 0.3, 1], [0.5, 0.5, 0.8, 0.8, 2]])
>>> image_shape = (100, 100)
>>> matrix = np.array([[1.5, 0.2, -20], [-0.1, 1.3, -10], [0.002, 0.001, 1]])
>>> transformed_bboxes = perspective_bboxes(bboxes, image_shape, matrix, 150, 150, False)

Notes

- This function modifies only the coordinate columns (first 4) of the input bounding boxes.
- Any additional attributes (columns beyond the first 4) are kept unchanged.
- The function handles denormalization and renormalization of coordinates internally.

perspective_keypointsfunction

perspective_keypoints(
    keypoints: np.ndarray,
    image_shape: tuple[int, int],
    matrix: np.ndarray,
    max_width: int,
    max_height: int,
    keep_size: bool
)

Apply perspective transformation to keypoints.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray-Array of shape (N, 5+) in format [x, y, z, angle, scale, ...]
image_shapetuple[int, int]-Original image shape (height, width)
matrixnp.ndarray-3x3 perspective transformation matrix
max_widthint-Maximum width after transformation
max_heightint-Maximum height after transformation
keep_sizebool-Whether to keep original size

Returns

  • np.ndarray: Transformed keypoints array with same shape as input

remapfunction

remap(
    img: np.ndarray,
    map_x: np.ndarray,
    map_y: np.ndarray,
    interpolation: int,
    border_mode: int,
    value: tuple[float, ...] | float | None = None
)

Remap an image according to given coordinate maps. This function applies a generic geometrical transformation using mapping functions that specify the position of each pixel in the output image.

Parameters

NameTypeDefaultDescription
imgnp.ndarray-Input image to transform.
map_xnp.ndarray-Map of x-coordinates with same height and width as the input image.
map_ynp.ndarray-Map of y-coordinates with same height and width as the input image.
interpolationint-Interpolation method for resampling.
border_modeint-OpenCV border mode for handling pixels outside the image boundaries.
value
One of:
  • tuple[float, ...]
  • float
  • None
NoneBorder value(s) if border_mode is BORDER_CONSTANT.

Returns

  • np.ndarray: Remapped image with the same shape as the input image.
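A minimal sketch of an identity-plus-shift remap using cv2.remap; the coordinate maps must be float32 and match the image's height and width. Each output pixel samples 5 px to its right, so the visible content shifts left.

import cv2
import numpy as np

img = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
h, w = img.shape[:2]
map_y, map_x = np.mgrid[0:h, 0:w].astype(np.float32)      # identity mapping
map_x += 5.0                                               # sample 5 px to the right

shifted = cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR,
                    borderMode=cv2.BORDER_CONSTANT, borderValue=0)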

remap_bboxesfunction

remap_bboxes(
    bboxes: np.ndarray,
    map_x: np.ndarray,
    map_y: np.ndarray,
    image_shape: tuple[int, int]
)

Remap bounding boxes using displacement maps.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray--
map_xnp.ndarray--
map_ynp.ndarray--
image_shapetuple[int, int]--

remap_keypointsfunction

remap_keypoints(
    keypoints: np.ndarray,
    map_x: np.ndarray,
    map_y: np.ndarray,
    image_shape: tuple[int, int]
)

Transform keypoints using coordinate mapping functions. This function applies the inverse of the mapping defined by map_x and map_y to keypoint coordinates. The inverse mapping is necessary because the mapping functions define how pixels move from the source to the destination image, while keypoints need to be transformed from the destination back to the source.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray-Array of keypoints with shape (N, 2+), where the first two columns are x and y coordinates.
map_xnp.ndarray-Map of x-coordinates with shape equal to image_shape.
map_ynp.ndarray-Map of y-coordinates with shape equal to image_shape.
image_shapetuple[int, int]-Shape (height, width) of the original image.

Returns

  • np.ndarray: Transformed keypoints with the same shape as the input keypoints.

remap_keypoints_via_maskfunction

remap_keypoints_via_mask(
    keypoints: np.ndarray,
    map_x: np.ndarray,
    map_y: np.ndarray,
    image_shape: tuple[int, int]
)

Remap keypoints using mask and cv2.remap method.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray--
map_xnp.ndarray--
map_ynp.ndarray--
image_shapetuple[int, int]--

resizefunction

resize(
    img: np.ndarray,
    target_shape: tuple[int, int],
    interpolation: int
)

Resize an image to the specified dimensions. This function resizes an input image to the target shape using the specified interpolation method. If the image is already the target size, it is returned unchanged.

Parameters

NameTypeDefaultDescription
imgnp.ndarray-Input image to resize.
target_shapetuple[int, int]-Target (height, width) dimensions.
interpolationint-Interpolation method to use (cv2 interpolation flag). Examples: cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_NEAREST, etc.

Returns

  • np.ndarray: Resized image with shape target_shape + original channel dimensions.
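A usage sketch with cv2.resize; note that OpenCV expects the size as (width, height), while target_shape above is (height, width).

import cv2
import numpy as np

img = np.random.randint(0, 255, (100, 150, 3), dtype=np.uint8)
target_height, target_width = 50, 75
resized = cv2.resize(img, (target_width, target_height), interpolation=cv2.INTER_LINEAR)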

rot90function

rot90(
    img: np.ndarray,
    factor: Literal[0, 1, 2, 3]
)

Parameters

NameTypeDefaultDescription
imgnp.ndarray--
factor
One of:
  • 0
  • 1
  • 2
  • 3
--

rotation2d_matrix_to_euler_anglesfunction

rotation2d_matrix_to_euler_angles(
    matrix: np.ndarray,
    y_up: bool
)

Extract the Euler angle from a 2D rotation matrix. The `y_up` flag indicates whether the Y axis points up (True) or down (False).

Parameters

NameTypeDefaultDescription
matrixnp.ndarray--
y_upbool--

scalefunction

scale(
    img: np.ndarray,
    scale: float,
    interpolation: int
)

Scale an image by a factor while preserving aspect ratio. This function scales both height and width dimensions of the image by the same factor.

Parameters

NameTypeDefaultDescription
imgnp.ndarray-Input image to scale.
scalefloat-Scale factor. Values > 1 will enlarge the image, values < 1 will shrink it.
interpolationint-Interpolation method to use (cv2 interpolation flag).

Returns

  • np.ndarray: Scaled image.

shift_bboxesfunction

shift_bboxes(
    bboxes: np.ndarray,
    shift_vector: np.ndarray
)

Shift bounding boxes by a given vector.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray-Array of bounding boxes with shape (n, m) where n is the number of bboxes and m >= 4. The first 4 columns are [x_min, y_min, x_max, y_max].
shift_vectornp.ndarray-Vector to shift the bounding boxes by, with shape (4,) for [shift_x, shift_y, shift_x, shift_y].

Returns

  • np.ndarray: Shifted bounding boxes with the same shape as input.
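A small NumPy sketch of the operation described above: only the first four (coordinate) columns are shifted, and any extra columns are preserved.

import numpy as np

bboxes = np.array([[10.0, 20.0, 30.0, 40.0, 1.0],
                   [5.0, 5.0, 15.0, 15.0, 2.0]])
shift_vector = np.array([7.0, -3.0, 7.0, -3.0])   # [shift_x, shift_y, shift_x, shift_y]

shifted = bboxes.copy()
shifted[:, :4] = shifted[:, :4] + shift_vector     # extra columns stay unchanged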

shift_keypointsfunction

shift_keypoints(
    keypoints: np.ndarray,
    shift_vector: np.ndarray
)

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray--
shift_vectornp.ndarray--

shuffle_tiles_within_shape_groupsfunction

shuffle_tiles_within_shape_groups(
    shape_groups: dict[tuple[int, int], list[int]],
    random_generator: np.random.Generator
)

Shuffles indices within each group of similar shapes and creates a list where each index points to the index of the tile it should be mapped to.

Parameters

NameTypeDefaultDescription
shape_groupsdict[tuple[int, int], list[int]]-Groups of tile indices categorized by shape.
random_generatornp.random.Generator-The random generator to use for shuffling the indices. If None, a new random generator will be used.

Returns

  • list[int]: A list where each index is mapped to the new index of the tile after shuffling.

split_uniform_gridfunction

split_uniform_grid(
    image_shape: tuple[int, int],
    grid: tuple[int, int],
    random_generator: np.random.Generator
)

Splits an image shape into a uniform grid specified by the grid dimensions.

Parameters

NameTypeDefaultDescription
image_shapetuple[int, int]-The shape of the image as (height, width).
gridtuple[int, int]-The grid size as (rows, columns).
random_generatornp.random.Generator-The random generator to use for shuffling the splits. If None, the splits are not shuffled.

Returns

  • np.ndarray: An array containing the tiles' coordinates in the format (start_y, start_x, end_y, end_x).

Notes

The function uses `generate_shuffled_splits` to generate the splits for the height and width of the image. The splits are then used to calculate the coordinates of the tiles.

swap_tiles_on_imagefunction

swap_tiles_on_image(
    image: np.ndarray,
    tiles: np.ndarray,
    mapping: list[int] | None = None
)

Swap tiles on the image according to the new format.

Parameters

NameTypeDefaultDescription
imagenp.ndarray-Input image.
tilesnp.ndarray-Array of tiles with each tile as [start_y, start_x, end_y, end_x].
mapping
One of:
  • list[int]
  • None
Nonelist of new tile indices.

Returns

  • np.ndarray: Output image with tiles swapped according to the random shuffle.
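A minimal sketch of copying tile contents according to a mapping; it assumes equal-sized tiles and that mapping[i] gives the source tile for destination tile i, which is an interpretation of the parameter description above.

import numpy as np

image = np.arange(16).reshape(4, 4)
tiles = np.array([[0, 0, 2, 2], [0, 2, 2, 4],
                  [2, 0, 4, 2], [2, 2, 4, 4]])     # (start_y, start_x, end_y, end_x)
mapping = [1, 0, 3, 2]                             # swap tiles pairwise

result = image.copy()
for dst_idx, src_idx in enumerate(mapping):
    dy0, dx0, dy1, dx1 = tiles[dst_idx]
    sy0, sx0, sy1, sx1 = tiles[src_idx]
    result[dy0:dy1, dx0:dx1] = image[sy0:sy1, sx0:sx1]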

swap_tiles_on_keypointsfunction

swap_tiles_on_keypoints(
    keypoints: np.ndarray,
    tiles: np.ndarray,
    mapping: np.ndarray
)

Swap the positions of keypoints based on a tile mapping. This function takes a set of keypoints and repositions them according to a mapping of tile swaps. Keypoints are moved from their original tiles to new positions in the swapped tiles.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray-A 2D numpy array of shape (N, 2) where N is the number of keypoints. Each row represents a keypoint's (x, y) coordinates.
tilesnp.ndarray-A 2D numpy array of shape (M, 4) where M is the number of tiles. Each row represents a tile's (start_y, start_x, end_y, end_x) coordinates.
mappingnp.ndarray-A 1D numpy array of shape (M,) where M is the number of tiles. Each element i contains the index of the tile that tile i should be swapped with.

Returns

  • np.ndarray: A 2D numpy array of the same shape as the input keypoints, containing the new positions

Notes

- Keypoints that do not fall within any tile will remain unchanged.
- The function assumes that the tiles do not overlap and cover the entire image space.

to_distance_mapsfunction

to_distance_maps(
    keypoints: np.ndarray,
    image_shape: tuple[int, int],
    inverted: bool = False
)

Generate a ``(H,W,N)`` array of distance maps for ``N`` keypoints. The ``n``-th distance map contains at every location ``(y, x)`` the euclidean distance to the ``n``-th keypoint. This function can be used as a helper when augmenting keypoints with a method that only supports the augmentation of images.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray-A numpy array of shape (N, 2+) where N is the number of keypoints. Each row represents a keypoint's (x, y) coordinates.
image_shapetuple[int, int]-Shape of the image (height, width)
invertedboolFalseIf ``True``, inverted distance maps are returned where each distance value d is replaced by ``1/(d+1)``, i.e. the distance maps have values in the range ``(0.0, 1.0]`` with ``1.0`` denoting exactly the position of the respective keypoint.

Returns

  • np.ndarray: A float32 array of shape (H, W, N) containing ``N`` distance maps for ``N`` keypoints.
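A hedged NumPy broadcasting sketch of building (H, W, N) Euclidean distance maps as described above; variable names are illustrative.

import numpy as np

keypoints = np.array([[30.0, 40.0], [70.0, 10.0]])     # (N, 2) as (x, y)
height, width = 100, 120

yy, xx = np.mgrid[0:height, 0:width]
dx = xx[..., None] - keypoints[:, 0]                    # (H, W, N)
dy = yy[..., None] - keypoints[:, 1]
distance_maps = np.sqrt(dx ** 2 + dy ** 2).astype(np.float32)

inverted_maps = 1.0 / (distance_maps + 1)               # 1.0 exactly at each keypoint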

tps_transformfunction

tps_transform(
    target_points: np.ndarray,
    control_points: np.ndarray,
    nonlinear_weights: np.ndarray,
    affine_weights: np.ndarray
)

Apply TPS transformation with consistent types.

Parameters

NameTypeDefaultDescription
target_pointsnp.ndarray--
control_pointsnp.ndarray--
nonlinear_weightsnp.ndarray--
affine_weightsnp.ndarray--

transposefunction

transpose(
    img: np.ndarray
)

Transposes the first two dimensions of an array of any dimensionality. Retains the order of any additional dimensions.

Parameters

NameTypeDefaultDescription
imgnp.ndarray-Input array.

Returns

  • np.ndarray: Transposed array.

validate_bboxesfunction

validate_bboxes(
    bboxes: np.ndarray,
    image_shape: Sequence[int]
)

Validate bounding boxes and remove invalid ones.

Parameters

NameTypeDefaultDescription
bboxesnp.ndarray-Array of bounding boxes with shape (n, 4) where each row is [x_min, y_min, x_max, y_max].
image_shapeSequence[int]-Shape of the image as (height, width).

Returns

  • np.ndarray: Array of valid bounding boxes, potentially with fewer boxes than the input.

Example

>>> bboxes = np.array([[10, 20, 30, 40], [-10, -10, 5, 5], [100, 100, 120, 120]])
>>> valid_bboxes = validate_bboxes(bboxes, (100, 100))
>>> print(valid_bboxes)
[[10 20 30 40]]

validate_if_not_found_coordsfunction

validate_if_not_found_coords(
    if_not_found_coords: Sequence[int] | dict[str, Any] | None
)

Validate and process `if_not_found_coords` parameter.

Parameters

NameTypeDefaultDescription
if_not_found_coords
One of:
  • Sequence[int]
  • dict[str, Any]
  • None
--

validate_keypointsfunction

validate_keypoints(
    keypoints: np.ndarray,
    image_shape: tuple[int, int]
)

Validate keypoints and remove those that fall outside the image boundaries.

Parameters

NameTypeDefaultDescription
keypointsnp.ndarray-Array of keypoints with shape (N, M) where N is the number of keypoints and M >= 2. The first two columns represent x and y coordinates.
image_shapetuple[int, int]-Shape of the image as (height, width).

Returns

  • np.ndarray: Array of valid keypoints that fall within the image boundaries.

Notes

This function only checks the x and y coordinates (first two columns) of the keypoints. Any additional columns (e.g., angle, scale) are preserved for valid keypoints.

volume_hflipfunction

volume_hflip(
    volume: np.ndarray
)

Perform horizontal flip on a volume (numpy array). Flips the volume along the width axis (axis=2). Handles inputs with shapes (D, H, W) or (D, H, W, C).

Parameters

NameTypeDefaultDescription
volumenp.ndarray-Input volume.

Returns

  • np.ndarray: Horizontally flipped volume.
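A short slicing sketch of a width-axis flip for a (D, H, W, C) volume; the same slicing works for (D, H, W).

import numpy as np

volume = np.random.rand(4, 32, 64, 3)     # (D, H, W, C)
flipped = volume[:, :, ::-1, ...]         # flip along axis=2 (width)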

volume_rot90function

volume_rot90(
    volume: np.ndarray,
    factor: Literal[0, 1, 2, 3]
)

Rotate a volume 90 degrees counter-clockwise multiple times. Rotates the volume in the height-width plane (axes 1 and 2). Handles inputs with shapes (D, H, W) or (D, H, W, C).

Parameters

NameTypeDefaultDescription
volumenp.ndarray-Input volume.
factor
One of:
  • 0
  • 1
  • 2
  • 3
-Number of 90-degree rotations.

Returns

  • np.ndarray: Rotated volume.

volume_vflipfunction

volume_vflip(
    volume: np.ndarray
)

Perform vertical flip on a volume (numpy array). Flips the volume along the height axis (axis=1). Handles inputs with shapes (D, H, W) or (D, H, W, C).

Parameters

NameTypeDefaultDescription
volumenp.ndarray-Input volume.

Returns

  • np.ndarray: Vertically flipped volume.

volumes_hflipfunction

volumes_hflip(
    volumes: np.ndarray
)

Perform horizontal flip on a batch of volumes (numpy array). Flips the volumes along the width axis (axis=3). Handles inputs with shapes (B, D, H, W) or (B, D, H, W, C).

Parameters

NameTypeDefaultDescription
volumesnp.ndarray-Input batch of volumes.

Returns

  • np.ndarray: Horizontally flipped batch of volumes.

volumes_rot90function

volumes_rot90(
    volumes: np.ndarray,
    factor: Literal[0, 1, 2, 3]
)

Rotate a batch of volumes 90 degrees counter-clockwise multiple times. Rotates the volumes in the height-width plane (axes 2 and 3). Handles inputs with shapes (B, D, H, W) or (B, D, H, W, C).

Parameters

NameTypeDefaultDescription
volumesnp.ndarray-Input batch of volumes.
factor
One of:
  • 0
  • 1
  • 2
  • 3
-Number of 90-degree rotations

Returns

  • np.ndarray: Rotated batch of volumes.

volumes_vflipfunction

volumes_vflip(
    volumes: np.ndarray
)

Perform vertical flip on a batch of volumes (numpy array). Flips the volumes along the height axis (axis=2). Handles inputs with shapes (B, D, H, W) or (B, D, H, W, C).

Parameters

NameTypeDefaultDescription
volumesnp.ndarray-Input batch of volumes.

Returns

  • np.ndarray: Vertically flipped batch of volumes.

warp_affinefunction

warp_affine(
    image: np.ndarray,
    matrix: np.ndarray,
    interpolation: int,
    fill: tuple[float, ...] | float,
    border_mode: int,
    output_shape: tuple[int, int]
)

Apply an affine transformation to an image. This function transforms an image using the specified affine transformation matrix. If the transformation matrix is an identity matrix, the original image is returned.

Parameters

NameTypeDefaultDescription
imagenp.ndarray-Input image to transform.
matrixnp.ndarray-2x3 or 3x3 affine transformation matrix.
interpolationint-Interpolation method for resampling.
fill
One of:
  • tuple[float, ...]
  • float
-Border value(s) to fill areas outside the transformed image.
border_modeint-OpenCV border mode for handling pixels outside the image boundaries.
output_shapetuple[int, int]-Shape (height, width) of the output image.

Returns

  • np.ndarray: Affine-transformed image with dimensions specified by output_shape.
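A hedged usage sketch built on cv2.warpAffine (not the library's exact code path); OpenCV expects a 2x3 matrix and the output size as (width, height).

import cv2
import numpy as np

image = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
matrix = np.array([[1.0, 0.0, 10.0],
                   [0.0, 1.0, 20.0]])          # translate by (10, 20)
output_height, output_width = 120, 130

warped = cv2.warpAffine(
    image, matrix, (output_width, output_height),
    flags=cv2.INTER_LINEAR,
    borderMode=cv2.BORDER_CONSTANT,
    borderValue=0,
)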

warp_affine_with_value_extensionfunction

warp_affine_with_value_extension(
    image: np.ndarray,
    matrix: np.ndarray,
    dsize: tuple[int, int],
    flags: int,
    border_mode: int,
    border_value: tuple[float, ...] | float
)

Parameters

NameTypeDefaultDescription
imagenp.ndarray--
matrixnp.ndarray--
dsizetuple[int, int]--
flagsint--
border_modeint--
border_value
One of:
  • tuple[float, ...]
  • float
--