albumentations.augmentations.geometric.transforms


Geometric transformation classes for image augmentation. This module provides a collection of transforms that modify the geometric properties of images and associated data (masks, bounding boxes, keypoints). Includes implementations for flipping, transposing, affine transformations, distortions, padding, and more complex transformations like grid shuffling and thin plate splines.

Affine (class)

Affine(
    scale: tuple[float, float] | float | dict[str, float | tuple[float, float]] = (1.0, 1.0),
    translate_percent: tuple[float, float] | float | dict[str, float | tuple[float, float]] | None = None,
    translate_px: tuple[int, int] | int | dict[str, int | tuple[int, int]] | None = None,
    rotate: tuple[float, float] | float = 0.0,
    shear: tuple[float, float] | float | dict[str, float | tuple[float, float]] = (0.0, 0.0),
    interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 1,
    mask_interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 0,
    fit_output: bool = False,
    keep_ratio: bool = False,
    rotate_method: Literal['largest_box', 'ellipse'] = 'largest_box',
    balanced_scale: bool = False,
    border_mode: Literal[cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101] = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0,
    p: float = 0.5
)

Augmentation to apply affine transformations to images. Affine transformations involve:

- Translation ("move" the image on the x-/y-axis)
- Rotation
- Scaling ("zoom" in/out)
- Shear (move one side of the image, turning a square into a trapezoid)

All such transformations can create "new" pixels in the image without a defined content, e.g. if the image is translated to the left, pixels are created on the right. A method has to be defined to deal with these pixel values; the parameters `fill` and `fill_mask` of this class handle this. Some transformations involve interpolation between several pixels of the input image to generate output pixel values; the parameters `interpolation` and `mask_interpolation` select the interpolation method used for this.

Parameters

scale (tuple[float, float] | float | dict[str, float | tuple[float, float]], default (1.0, 1.0)): Scaling factor to use, where ``1.0`` denotes "no change" and ``0.5`` zooms out to ``50`` percent of the original size.
    * If a single number, then that value will be used for all images.
    * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``. The same range will be used for both x- and y-axis. To keep the aspect ratio, set ``keep_ratio=True``; the same sampled value will then be used for both axes.
    * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``. Each of these keys can take the same values as described above. Using a dictionary allows setting different ranges for the two axes, and sampling then happens *independently* per axis, resulting in samples that differ between the axes. Note that when ``keep_ratio=True``, the x- and y-axis ranges should be the same.
translate_percent (tuple[float, float] | float | dict[str, float | tuple[float, float]] | None, default None): Translation as a fraction of the image height/width (x-translation, y-translation), where ``0`` denotes "no change" and ``0.5`` denotes "half of the axis size".
    * If ``None``, then equivalent to ``0.0`` unless `translate_px` has a value other than ``None``.
    * If a single number, then that value will be used for all images.
    * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``. That sampled fraction will be used identically for both x- and y-axis.
    * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``, each taking the same values as described above. Sampling then happens *independently* per axis.
translate_px (tuple[int, int] | int | dict[str, int | tuple[int, int]] | None, default None): Translation in pixels.
    * If ``None``, then equivalent to ``0`` unless `translate_percent` has a value other than ``None``.
    * If a single int, then that value will be used for all images.
    * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the discrete interval ``[a..b]``. That number will be used identically for both x- and y-axis.
    * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``, each taking the same values as described above. Sampling then happens *independently* per axis.
rotate (tuple[float, float] | float, default 0.0): Rotation in degrees (**NOT** radians), i.e. the expected value range is around ``[-360, 360]``. Rotation happens around the *center* of the image, not the top left corner as in some other frameworks.
    * If a number, then that value will be used for all images.
    * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]`` and used as the rotation value.
shear (tuple[float, float] | float | dict[str, float | tuple[float, float]], default (0.0, 0.0)): Shear in degrees (**NOT** radians), i.e. the expected value range is around ``[-360, 360]``, with reasonable values in the range ``[-45, 45]``.
    * If a number, then that value will be used for all images as the shear on the x-axis (no shear on the y-axis will be done).
    * If a tuple ``(a, b)``, then two values will be uniformly sampled per image from the interval ``[a, b]`` and used as the x- and y-shear values.
    * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``, each taking the same values as described above. Sampling then happens *independently* per axis.
interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_LINEAR): OpenCV interpolation flag used for the image.
mask_interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_NEAREST): OpenCV interpolation flag used for masks.
fit_output (bool, default False): If True, the image plane size and position will be adjusted to tightly capture the whole image after the affine transformation (`translate_percent` and `translate_px` are ignored). Otherwise (``False``), parts of the transformed image may end up outside the image plane. Fitting the output shape can be useful to avoid corners of the image being outside the image plane after applying rotations.
keep_ratio (bool, default False): When True, the original aspect ratio will be kept when the random scale is applied.
rotate_method ('largest_box' | 'ellipse', default 'largest_box'): Rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse".
balanced_scale (bool, default False): When True, scaling factors are chosen to be either entirely below or entirely above 1, ensuring balanced scaling. This matters because direct sampling tends to lean towards upscaling: for a zoom range such as [0.5, 2], the interval [0.5, 1] is three times smaller than [1, 2], so values above 1 would be picked three times more often. With `balanced_scale`, half the time the scaling factor is sampled from below 1 (zooming out) and half the time from above 1 (zooming in). See the sketch after the example below.
border_mode (cv2.BORDER_CONSTANT | cv2.BORDER_REPLICATE | cv2.BORDER_REFLECT | cv2.BORDER_WRAP | cv2.BORDER_REFLECT_101, default cv2.BORDER_CONSTANT): OpenCV border flag.
fill (tuple[float, ...] | float, default 0): The constant value to use when filling in newly created pixels (e.g. translating by 1px to the right creates a new 1px-wide column of pixels on the left of the image). Only used when ``border_mode=cv2.BORDER_CONSTANT``. The expected value range is ``[0, 255]`` for ``uint8`` images.
fill_mask (tuple[float, ...] | float, default 0): Same as `fill` but only for masks.
p (float, default 0.5): Probability of applying the transform.
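
Example

A minimal usage sketch (the synthetic image and parameter values here are illustrative):

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Compose([
...     A.Affine(scale=(0.8, 1.2), rotate=(-30, 30), shear=(-10, 10), p=1.0),
... ])
>>> transformed = transform(image=image)
>>> transformed_image = transformed['image']

A conceptual sketch of the `balanced_scale` sampling described above (illustrative, not the library's internals):

>>> import random
>>> def sample_balanced_scale(low, high):
...     # Half the time sample from [low, 1] (zoom out), half from [1, high] (zoom in).
...     if random.random() < 0.5:
...         return random.uniform(low, 1.0)
...     return random.uniform(1.0, high)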

BaseDistortion (class)

BaseDistortion(
    interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4],
    mask_interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4],
    keypoint_remapping_method: Literal['direct', 'mask'],
    p: float,
    border_mode: Literal[cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101] = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0
)

Base class for distortion-based transformations. This class provides a foundation for implementing various types of image distortions, such as optical distortions, grid distortions, and elastic transformations. It handles the common operations of applying distortions to images, masks, bounding boxes, and keypoints.

Parameters

interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4): Interpolation method to be used for image transformation. Should be one of the OpenCV interpolation types (e.g., cv2.INTER_LINEAR, cv2.INTER_CUBIC).
mask_interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4): Flag that is used to specify the interpolation algorithm for masks.
keypoint_remapping_method ('direct' | 'mask'): Method to use for keypoint remapping.
    - "mask": Uses mask-based remapping. Faster, especially for many keypoints, but may be less accurate for large distortions. Recommended for large images or many keypoints.
    - "direct": Uses inverse mapping. More accurate for large distortions but slower.
p (float): Probability of applying the transform.
border_mode (cv2.BORDER_CONSTANT | cv2.BORDER_REPLICATE | cv2.BORDER_REFLECT | cv2.BORDER_WRAP | cv2.BORDER_REFLECT_101, default cv2.BORDER_CONSTANT): OpenCV border flag used for pixels mapped from outside the image.
fill (tuple[float, ...] | float, default 0): Padding value if border_mode is cv2.BORDER_CONSTANT.
fill_mask (tuple[float, ...] | float, default 0): Padding value for masks if border_mode is cv2.BORDER_CONSTANT.

Notes

- This is an abstract base class and should not be used directly.
- Subclasses should implement the `get_params_dependent_on_data` method to generate the distortion maps (map_x and map_y).
- The distortion is applied consistently across all targets (image, mask, bboxes, keypoints) to maintain coherence in the augmented data.
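
As a hypothetical illustration of the subclassing contract above, the sketch below assumes the `get_params_dependent_on_data(params, data)` hook receives the target shape via `params["shape"]` and must return the `map_x`/`map_y` remap grids mentioned in the notes; the exact hook signature may differ between library versions.

>>> import numpy as np
>>> from albumentations.augmentations.geometric.transforms import BaseDistortion
>>> class SineWaveDistortion(BaseDistortion):
...     """Hypothetical subclass: shifts each row horizontally by a sine offset."""
...     def get_params_dependent_on_data(self, params, data):
...         height, width = params["shape"][:2]  # assumed to hold (H, W, ...)
...         # Identity mapping: map_x[i, j] = j and map_y[i, j] = i.
...         map_x, map_y = np.meshgrid(
...             np.arange(width, dtype=np.float32),
...             np.arange(height, dtype=np.float32),
...         )
...         # Smooth horizontal wobble; amplitude and period are illustrative.
...         map_x += 5.0 * np.sin(2.0 * np.pi * map_y / 60.0)
...         return {"map_x": map_x, "map_y": map_y}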

D4 (class)

D4(
    p: float = 1
)

Applies one of the eight possible D4 dihedral group transformations to a square-shaped input, maintaining the square shape. These transformations correspond to the symmetries of a square, including rotations and reflections:

- 'e' (identity): no transformation is applied
- 'r90': rotation by 90 degrees counterclockwise
- 'r180': rotation by 180 degrees
- 'r270': rotation by 270 degrees counterclockwise
- 'v': reflection across the vertical midline
- 'hvt': reflection across the anti-diagonal
- 'h': reflection across the horizontal midline
- 't': reflection across the main diagonal

Even if the probability (`p`) of applying the transform is set to 1, the identity transformation 'e' may still be selected, so the input remains unchanged in one out of eight cases.

Parameters

p (float, default 1.0): Probability of applying the transform.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Compose([
...     A.D4(p=1.0),
... ])
>>> transformed = transform(image=image)
>>> transformed_image = transformed['image']
# The resulting image will be one of the 8 possible D4 transformations of the input

Notes

- This transform is particularly useful for augmenting data that does not have a clear orientation, such as top-view satellite or drone imagery, or certain types of medical images.
- The input image should be square-shaped for optimal results. Non-square inputs may lead to unexpected behavior or distortions.
- When applied to bounding boxes or keypoints, their coordinates will be adjusted according to the selected transformation.
- This transform preserves the aspect ratio and size of the input.

ElasticTransform (class)

ElasticTransform(
    alpha: float = 1,
    sigma: float = 50,
    interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 1,
    approximate: bool = False,
    same_dxdy: bool = False,
    mask_interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 0,
    noise_distribution: Literal['gaussian', 'uniform'] = 'gaussian',
    keypoint_remapping_method: Literal['direct', 'mask'] = 'mask',
    border_mode: Literal[cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101] = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0,
    p: float = 0.5
)

Apply elastic deformation to images, masks, bounding boxes, and keypoints. This transformation introduces random elastic distortions to the input data. It's particularly useful for data augmentation in training deep learning models, especially for tasks like image segmentation or object detection where you want to maintain the relative positions of features while introducing realistic deformations. The transform works by generating random displacement fields and applying them to the input. These fields are smoothed using a Gaussian filter to create more natural-looking distortions.
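
A conceptual sketch of this mechanism (an illustrative re-implementation, not albumentations' internals; the alpha/sigma values are arbitrary):

>>> import cv2
>>> import numpy as np
>>> def elastic_sketch(image, alpha=30.0, sigma=8.0, rng=None):
...     rng = rng or np.random.default_rng()
...     height, width = image.shape[:2]
...     # Random gaussian displacement fields, smoothed so neighbours move together.
...     dx = cv2.GaussianBlur(rng.normal(0, 1, (height, width)).astype(np.float32),
...                           (0, 0), sigmaX=sigma) * alpha
...     dy = cv2.GaussianBlur(rng.normal(0, 1, (height, width)).astype(np.float32),
...                           (0, 0), sigmaX=sigma) * alpha
...     # Identity grid plus displacements gives the sampling maps for remap.
...     map_x, map_y = np.meshgrid(np.arange(width, dtype=np.float32),
...                                np.arange(height, dtype=np.float32))
...     return cv2.remap(image, map_x + dx, map_y + dy,
...                      interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT)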

Parameters

alpha (float, default 1): Scaling factor for the random displacement fields. Higher values result in more pronounced distortions.
sigma (float, default 50): Standard deviation of the Gaussian filter used to smooth the displacement fields. Higher values result in smoother, more global distortions.
interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_LINEAR): Interpolation method to be used for image transformation. Should be one of the OpenCV interpolation types.
approximate (bool, default False): Whether to use an approximate version of the elastic transform. If True, uses a fixed kernel size for Gaussian smoothing, which can be faster but potentially less accurate for large sigma values.
same_dxdy (bool, default False): Whether to use the same random displacement field for both x and y directions. Can speed up the transform at the cost of less diverse distortions.
mask_interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_NEAREST): Interpolation algorithm used for masks.
noise_distribution ('gaussian' | 'uniform', default 'gaussian'): Distribution used to generate the displacement fields. "gaussian" generates fields using a normal distribution (more natural deformations); "uniform" generates fields using a uniform distribution (more mechanical deformations).
keypoint_remapping_method ('direct' | 'mask', default 'mask'): Method to use for keypoint remapping.
    - "mask": Uses mask-based remapping. Faster, especially for many keypoints, but may be less accurate for large distortions. Recommended for large images or many keypoints.
    - "direct": Uses inverse mapping. More accurate for large distortions but slower.
border_mode (cv2.BORDER_CONSTANT | cv2.BORDER_REPLICATE | cv2.BORDER_REFLECT | cv2.BORDER_WRAP | cv2.BORDER_REFLECT_101, default cv2.BORDER_CONSTANT): OpenCV border flag.
fill (tuple[float, ...] | float, default 0): Padding value if border_mode is cv2.BORDER_CONSTANT.
fill_mask (tuple[float, ...] | float, default 0): Padding value for masks if border_mode is cv2.BORDER_CONSTANT.
p (float, default 0.5): Probability of applying the transform.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> transform = A.Compose(
...     [A.ElasticTransform(alpha=1, sigma=50, p=0.5)],
...     bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']),
...     keypoint_params=A.KeypointParams(format='xy'),
... )
>>> transformed = transform(image=image, mask=mask, bboxes=[(10, 10, 50, 50)],
...     labels=[1], keypoints=[(20, 30)])
>>> transformed_image = transformed['image']
>>> transformed_mask = transformed['mask']
>>> transformed_bboxes = transformed['bboxes']
>>> transformed_keypoints = transformed['keypoints']

Notes

- The transform maintains consistency across all targets (image, mask, bboxes, keypoints) by using the same displacement fields for all.
- The 'approximate' parameter determines whether to use a precise or approximate method for generating displacement fields. The approximate method can be faster but may be less accurate for large sigma values.
- Bounding boxes that end up outside the image after transformation will be removed.
- Keypoints that end up outside the image after transformation will be removed.

GridDistortion (class)

GridDistortion(
    num_steps: int = 5,
    distort_limit: tuple[float, float] | float = (-0.3, 0.3),
    interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 1,
    normalized: bool = True,
    mask_interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 0,
    keypoint_remapping_method: Literal['direct', 'mask'] = 'mask',
    p: float = 0.5,
    border_mode: Literal[cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101] = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0
)

Apply grid distortion to images, masks, bounding boxes, and keypoints. This transformation divides the image into a grid and randomly distorts each cell, creating localized warping effects. It's particularly useful for data augmentation in tasks like medical image analysis, OCR, and other domains where local geometric variations are meaningful.

Parameters

num_steps (int, default 5): Number of grid cells on each side of the image. Higher values create more granular distortions. Must be at least 1.
distort_limit (tuple[float, float] | float, default (-0.3, 0.3)): Range of distortion. If a single float is provided, the range will be (-distort_limit, distort_limit). Higher values create stronger distortions. Should be in the range of -1 to 1.
interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_LINEAR): OpenCV interpolation method used for image transformation.
normalized (bool, default True): If True, ensures that the distortion does not move pixels outside the image boundaries. This can result in less extreme distortions but guarantees that no information is lost.
mask_interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_NEAREST): Interpolation algorithm used for masks.
keypoint_remapping_method ('direct' | 'mask', default 'mask'): Method to use for keypoint remapping.
    - "mask": Uses mask-based remapping. Faster, especially for many keypoints, but may be less accurate for large distortions. Recommended for large images or many keypoints.
    - "direct": Uses inverse mapping. More accurate for large distortions but slower.
p (float, default 0.5): Probability of applying the transform.
border_mode (cv2.BORDER_CONSTANT | cv2.BORDER_REPLICATE | cv2.BORDER_REFLECT | cv2.BORDER_WRAP | cv2.BORDER_REFLECT_101, default cv2.BORDER_CONSTANT): OpenCV border flag.
fill (tuple[float, ...] | float, default 0): Padding value if border_mode is cv2.BORDER_CONSTANT.
fill_mask (tuple[float, ...] | float, default 0): Padding value for masks if border_mode is cv2.BORDER_CONSTANT.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> transform = A.Compose(
...     [A.GridDistortion(num_steps=5, distort_limit=0.3, p=1.0)],
...     bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']),
...     keypoint_params=A.KeypointParams(format='xy'),
... )
>>> transformed = transform(image=image, mask=mask, bboxes=[(10, 10, 50, 50)],
...     labels=[1], keypoints=[(20, 30)])
>>> transformed_image = transformed['image']
>>> transformed_mask = transformed['mask']
>>> transformed_bboxes = transformed['bboxes']
>>> transformed_keypoints = transformed['keypoints']

Notes

- The same distortion is applied to all targets (image, mask, bboxes, keypoints) to maintain consistency.
- When normalized=True, the distortion is adjusted to ensure all pixels remain within the image boundaries.

GridElasticDeform (class)

GridElasticDeform(
    num_grid_xy: tuple[int, int],
    magnitude: int,
    interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 1,
    mask_interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 0,
    p: float = 1.0
)

Apply elastic deformations to images, masks, bounding boxes, and keypoints using a grid-based approach. This transformation overlays a grid on the input and applies random displacements to the grid points, resulting in local elastic distortions. The granularity and intensity of the distortions can be controlled using the dimensions of the overlaying distortion grid and the magnitude parameter.

Parameters

num_grid_xy (tuple[int, int]): Number of grid cells along the width and height, specified as (grid_width, grid_height). Each value must be greater than 1.
magnitude (int): Maximum pixel-wise displacement for the distortion. Must be greater than 0.
interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_LINEAR): Interpolation method to be used for the image transformation.
mask_interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_NEAREST): Interpolation method to be used for mask transformation.
p (float, default 1.0): Probability of applying the transform.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> transform = A.GridElasticDeform(num_grid_xy=(4, 4), magnitude=10, p=1.0)
>>> result = transform(image=image, mask=mask)
>>> transformed_image, transformed_mask = result['image'], result['mask']

Notes

This transformation is particularly useful for data augmentation in medical imaging and other domains where elastic deformations can simulate realistic variations.

HorizontalFlip (class)

HorizontalFlip(
    p: float = 0.5
)

Flip the input horizontally around the y-axis.

Parameters

p (float, default 0.5): Probability of applying the transform.
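
Example

A small sketch in the style of the VerticalFlip example later in this module:

>>> import numpy as np
>>> import albumentations as A
>>> image = np.array([
...     [[1, 2, 3], [4, 5, 6]],
...     [[7, 8, 9], [10, 11, 12]]
... ])
>>> transform = A.HorizontalFlip(p=1.0)
>>> result = transform(image=image)
>>> flipped_image = result['image']
# Each row's pixels are reversed left-to-right:
# [[[4, 5, 6], [1, 2, 3]],
#  [[10, 11, 12], [7, 8, 9]]]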

OpticalDistortion (class)

OpticalDistortion(
    distort_limit: tuple[float, float] | float = (-0.05, 0.05),
    interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 1,
    mask_interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 0,
    mode: Literal['camera', 'fisheye'] = 'camera',
    keypoint_remapping_method: Literal['direct', 'mask'] = 'mask',
    p: float = 0.5,
    border_mode: Literal[cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101] = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0
)

Apply optical distortion to images, masks, bounding boxes, and keypoints. Two distortion models are supported:

1. Camera matrix model (original): uses OpenCV's camera calibration model with k1 = k2 = k distortion coefficients.
2. Fisheye model: direct radial distortion, r_dist = r * (1 + gamma * r²).
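
A worked sketch of the fisheye radial mapping above (the formula only, not the library's remap code):

>>> import numpy as np
>>> def fisheye_radius(r, gamma):
...     # Direct radial model: r_dist = r * (1 + gamma * r**2)
...     return r * (1.0 + gamma * r ** 2)
>>> r_dist = fisheye_radius(np.array([0.0, 0.5, 1.0]), gamma=0.3)
>>> # r_dist is approximately [0.0, 0.5375, 1.3]: points farther from the
>>> # center are pushed outward more strongly for positive gamma.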

Parameters

distort_limit (tuple[float, float] | float, default (-0.05, 0.05)): Range of the distortion coefficient. For the camera model the recommended range is (-0.05, 0.05); for the fisheye model it is (-0.3, 0.3).
interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_LINEAR): Interpolation method used for image transformation.
mask_interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_NEAREST): Interpolation algorithm used for masks.
mode ('camera' | 'fisheye', default 'camera'): Distortion model to use.
    - 'camera': original camera matrix model
    - 'fisheye': fisheye lens model
keypoint_remapping_method ('direct' | 'mask', default 'mask'): Method to use for keypoint remapping.
    - "mask": Uses mask-based remapping. Faster, especially for many keypoints, but may be less accurate for large distortions. Recommended for large images or many keypoints.
    - "direct": Uses inverse mapping. More accurate for large distortions but slower.
p (float, default 0.5): Probability of applying the transform.
border_mode (cv2.BORDER_CONSTANT | cv2.BORDER_REPLICATE | cv2.BORDER_REFLECT | cv2.BORDER_WRAP | cv2.BORDER_REFLECT_101, default cv2.BORDER_CONSTANT): OpenCV border flag.
fill (tuple[float, ...] | float, default 0): Padding value if border_mode is cv2.BORDER_CONSTANT.
fill_mask (tuple[float, ...] | float, default 0): Padding value for masks if border_mode is cv2.BORDER_CONSTANT.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> transform = A.Compose(
...     [A.OpticalDistortion(distort_limit=0.1, p=1.0)],
...     bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']),
...     keypoint_params=A.KeypointParams(format='xy'),
... )
>>> transformed = transform(image=image, mask=mask, bboxes=[(10, 10, 50, 50)],
...     labels=[1], keypoints=[(20, 30)])
>>> transformed_image = transformed['image']
>>> transformed_mask = transformed['mask']
>>> transformed_bboxes = transformed['bboxes']
>>> transformed_keypoints = transformed['keypoints']

Notes

- The distortion is applied using OpenCV's initUndistortRectifyMap and remap functions.
- The distortion coefficient (k) is randomly sampled from the distort_limit range.
- Bounding boxes and keypoints are transformed along with the image to maintain consistency.
- The fisheye model applies radial distortion directly.

Pad (class)

Pad(
    padding: int | tuple[int, int] | tuple[int, int, int, int] = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0,
    border_mode: Literal[cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101] = 0,
    p: float = 1.0
)

Pad the sides of an image by a specified number of pixels.

Parameters

padding (int | tuple[int, int] | tuple[int, int, int, int], default 0): Padding values. Can be:
    * int: pad all sides by this value
    * tuple[int, int]: (pad_x, pad_y) pads left/right by pad_x and top/bottom by pad_y
    * tuple[int, int, int, int]: (left, top, right, bottom) specific padding per side
fill (tuple[float, ...] | float, default 0): Padding value if border_mode is cv2.BORDER_CONSTANT.
fill_mask (tuple[float, ...] | float, default 0): Padding value for masks if border_mode is cv2.BORDER_CONSTANT.
border_mode (cv2.BORDER_CONSTANT | cv2.BORDER_REPLICATE | cv2.BORDER_REFLECT | cv2.BORDER_WRAP | cv2.BORDER_REFLECT_101, default cv2.BORDER_CONSTANT): OpenCV border mode.
p (float, default 1.0): Probability of applying the transform.
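
Example

A short sketch of the three padding forms described above (values illustrative):

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> pad_all = A.Pad(padding=10, p=1.0)             # all sides by 10
>>> pad_xy = A.Pad(padding=(10, 20), p=1.0)        # left/right 10, top/bottom 20
>>> pad_each = A.Pad(padding=(1, 2, 3, 4), p=1.0)  # (left, top, right, bottom)
>>> pad_all(image=image)['image'].shape
(120, 120, 3)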

PadIfNeeded (class)

PadIfNeeded(
    min_height: int | None = 1024,
    min_width: int | None = 1024,
    pad_height_divisor: int | None = None,
    pad_width_divisor: int | None = None,
    position: Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random'] = 'center',
    border_mode: Literal[cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101] = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0,
    p: float = 1.0
)

Pads the sides of an image if the image dimensions are less than the specified minimum dimensions. If the `pad_height_divisor` or `pad_width_divisor` is specified, the function additionally ensures that the image dimensions are divisible by these values.

Parameters

min_height (int | None, default 1024): Minimum desired height of the image. Ensures image height is at least this value. If not specified, pad_height_divisor must be provided.
min_width (int | None, default 1024): Minimum desired width of the image. Ensures image width is at least this value. If not specified, pad_width_divisor must be provided.
pad_height_divisor (int | None, default None): If set, pads the image height to make it divisible by this value. If not specified, min_height must be provided.
pad_width_divisor (int | None, default None): If set, pads the image width to make it divisible by this value. If not specified, min_width must be provided.
position ('center' | 'top_left' | 'top_right' | 'bottom_left' | 'bottom_right' | 'random', default 'center'): Position where the image is to be placed after padding.
border_mode (cv2.BORDER_CONSTANT | cv2.BORDER_REPLICATE | cv2.BORDER_REFLECT | cv2.BORDER_WRAP | cv2.BORDER_REFLECT_101, default cv2.BORDER_CONSTANT): Border mode to use if padding is required.
fill (tuple[float, ...] | float, default 0): Value to fill the border pixels if the border mode is cv2.BORDER_CONSTANT.
fill_mask (tuple[float, ...] | float, default 0): Same as `fill` but used for padding masks.
p (float, default 1.0): Probability of applying the transform.

Example

>>> import cv2
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (800, 600, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (800, 600), dtype=np.uint8)
>>> transform = A.Compose(
...     [A.PadIfNeeded(min_height=1024, min_width=1024, border_mode=cv2.BORDER_CONSTANT, fill=0)],
...     bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']),
...     keypoint_params=A.KeypointParams(format='xy'),
... )
>>> transformed = transform(image=image, mask=mask, bboxes=[(10, 10, 100, 100)],
...     labels=[1], keypoints=[(50, 50)])
>>> padded_image = transformed['image']
>>> padded_mask = transformed['mask']
>>> adjusted_bboxes = transformed['bboxes']
>>> adjusted_keypoints = transformed['keypoints']

Notes

- Either `min_height` or `pad_height_divisor` must be set, but not both.
- Either `min_width` or `pad_width_divisor` must be set, but not both.
- If `border_mode` is set to `cv2.BORDER_CONSTANT`, `fill` must be provided.
- The transform will maintain consistency across all targets (image, mask, bboxes, keypoints, volume).
- For bounding boxes, the coordinates will be adjusted to account for the padding.
- For keypoints, their positions will be shifted according to the padding.

Perspective (class)

Perspective(
    scale: tuple[float, float] | float = (0.05, 0.1),
    keep_size: bool = True,
    fit_output: bool = False,
    interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 1,
    mask_interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 0,
    border_mode: Literal[cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101] = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0,
    p: float = 0.5
)

Apply a random four-point perspective transformation to the input.

Parameters

scale (tuple[float, float] | float, default (0.05, 0.1)): Standard deviation of the normal distributions used to sample the random distances of the subimage's corners from the full image's corners. If scale is a single float value, the range will be (0, scale).
keep_size (bool, default True): Whether to resize the image back to its original size after applying the perspective transform. If set to False, the resulting images may end up having different shapes.
fit_output (bool, default False): If True, the image plane size and position will be adjusted to still capture the whole image after the perspective transformation. This is followed by image resizing if keep_size is set to True. If False, parts of the transformed image may be outside of the image plane. This setting should not be set to True when using large scale values, as it could lead to very large images.
interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_LINEAR): Interpolation method to be used for image transformation. Should be one of the OpenCV interpolation types.
mask_interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_NEAREST): Interpolation algorithm used for masks.
border_mode (cv2.BORDER_CONSTANT | cv2.BORDER_REPLICATE | cv2.BORDER_REFLECT | cv2.BORDER_WRAP | cv2.BORDER_REFLECT_101, default cv2.BORDER_CONSTANT): OpenCV border mode used for padding.
fill (tuple[float, ...] | float, default 0): Padding value if border_mode is cv2.BORDER_CONSTANT.
fill_mask (tuple[float, ...] | float, default 0): Padding value for masks if border_mode is cv2.BORDER_CONSTANT.
p (float, default 0.5): Probability of applying the transform.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Compose([
...     A.Perspective(scale=(0.05, 0.1), keep_size=True, p=0.5),
... ])
>>> result = transform(image=image)
>>> transformed_image = result['image']

Notes

This transformation creates a perspective effect by randomly moving the four corners of the image. The amount of movement is controlled by the 'scale' parameter. When 'keep_size' is True, the output image will have the same size as the input image, which may cause some parts of the transformed image to be cut off or padded. When 'fit_output' is True, the transformation ensures that the entire transformed image is visible, which may result in a larger output image if keep_size is False.

PiecewiseAffine (class)

PiecewiseAffine(
    scale: tuple[float, float] | float = (0.03, 0.05),
    nb_rows: tuple[int, int] | int = (4, 4),
    nb_cols: tuple[int, int] | int = (4, 4),
    interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 1,
    mask_interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 0,
    absolute_scale: bool = False,
    keypoint_remapping_method: Literal['direct', 'mask'] = 'mask',
    p: float = 0.5,
    border_mode: Literal[cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101] = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0
)

Apply piecewise affine transformations to the input image. This augmentation places a regular grid of points on an image and randomly moves the neighborhood of these points around via affine transformations. This leads to local distortions in the image.

Parameters

scale (tuple[float, float] | float, default (0.03, 0.05)): Standard deviation of the normal distributions used to sample how far each grid point is moved, as a fraction of the image height/width (or in absolute pixels if `absolute_scale=True`). If scale is a single float value, the range will be (0, scale). Recommended values are in the range (0.01, 0.05) for small distortions and (0.05, 0.1) for larger distortions.
nb_rows (tuple[int, int] | int, default (4, 4)): Number of rows of points that the regular grid should have. Must be at least 2. For large images, you might want to pick a higher value than 4. If a single int, then that value will always be used as the number of rows. If a tuple (a, b), then a value from the discrete interval [a..b] will be uniformly sampled per image.
nb_cols (tuple[int, int] | int, default (4, 4)): Number of columns of points that the regular grid should have. Must be at least 2. For large images, you might want to pick a higher value than 4. If a single int, then that value will always be used as the number of columns. If a tuple (a, b), then a value from the discrete interval [a..b] will be uniformly sampled per image.
interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_LINEAR): Flag that is used to specify the interpolation algorithm.
mask_interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_NEAREST): Interpolation algorithm used for masks.
absolute_scale (bool, default False): If True, the value of the scale parameter is treated as an absolute pixel value. If False, it is treated as a fraction of the image height and width.
keypoint_remapping_method ('direct' | 'mask', default 'mask'): Method to use for keypoint remapping.
    - "mask": Uses mask-based remapping. Faster, especially for many keypoints, but may be less accurate for large distortions. Recommended for large images or many keypoints.
    - "direct": Uses inverse mapping. More accurate for large distortions but slower.
p (float, default 0.5): Probability of applying the transform.
border_mode (cv2.BORDER_CONSTANT | cv2.BORDER_REPLICATE | cv2.BORDER_REFLECT | cv2.BORDER_WRAP | cv2.BORDER_REFLECT_101, default cv2.BORDER_CONSTANT): OpenCV border flag.
fill (tuple[float, ...] | float, default 0): Padding value if border_mode is cv2.BORDER_CONSTANT.
fill_mask (tuple[float, ...] | float, default 0): Padding value for masks if border_mode is cv2.BORDER_CONSTANT.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Compose([
...     A.PiecewiseAffine(scale=(0.03, 0.05), nb_rows=4, nb_cols=4, p=0.5),
... ])
>>> transformed = transform(image=image)
>>> transformed_image = transformed["image"]

Notes

- This augmentation is very slow. Consider using `ElasticTransform` instead, which is at least 10x faster.
- The augmentation may not always produce visible effects, especially with small scale values.
- For keypoints and bounding boxes, the transformation might move them outside the image boundaries. In such cases, the keypoints will be set to (-1, -1) and the bounding boxes will be removed.

RandomGridShuffle (class)

RandomGridShuffle(
    grid: tuple[int, int] = (3, 3),
    p: float = 0.5
)

Randomly shuffles the grid's cells on an image, mask, or keypoints, effectively rearranging patches within the image. This transformation divides the image into a grid and then permutes these grid cells based on a random mapping.

Parameters

grid (tuple[int, int], default (3, 3)): Size of the grid for splitting the image into cells. Each cell is shuffled randomly. For example, (3, 3) will divide the image into a 3x3 grid, resulting in 9 cells to be shuffled.
p (float, default 0.5): Probability that the transform will be applied. Should be in the range [0, 1].

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.array([
...     [1, 1, 1, 2, 2, 2],
...     [1, 1, 1, 2, 2, 2],
...     [1, 1, 1, 2, 2, 2],
...     [3, 3, 3, 4, 4, 4],
...     [3, 3, 3, 4, 4, 4],
...     [3, 3, 3, 4, 4, 4]
... ])
>>> transform = A.RandomGridShuffle(grid=(2, 2), p=1.0)
>>> result = transform(image=image)
>>> transformed_image = result['image']
# The resulting image might look like this (one possible outcome):
# [[4, 4, 4, 2, 2, 2],
#  [4, 4, 4, 2, 2, 2],
#  [4, 4, 4, 2, 2, 2],
#  [3, 3, 3, 1, 1, 1],
#  [3, 3, 3, 1, 1, 1],
#  [3, 3, 3, 1, 1, 1]]

Notes

- This transform maintains consistency across all targets. If applied to an image and its corresponding mask or keypoints, the same shuffling will be applied to all.
- The number of cells in the grid should be at least 2 (i.e., the grid should be at least (1, 2), (2, 1), or (2, 2)) for the transform to have any effect.
- Keypoints are moved along with their corresponding grid cell.
- This transform can be useful when only micro features are important for the model, and memorizing the global structure could be harmful. For example:
    - Identifying the type of cell phone used to take a picture based on micro artifacts generated by phone post-processing algorithms, rather than the semantic features of the photo. See more at https://ieeexplore.ieee.org/abstract/document/8622031
    - Identifying stress, glucose, or hydration levels based on skin images.

ShiftScaleRotate (class)

ShiftScaleRotate(
    shift_limit: tuple[float, float] | float = (-0.0625, 0.0625),
    scale_limit: tuple[float, float] | float = (-0.1, 0.1),
    rotate_limit: tuple[float, float] | float = (-45, 45),
    interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 1,
    border_mode: int = 0,
    shift_limit_x: tuple[float, float] | float | None = None,
    shift_limit_y: tuple[float, float] | float | None = None,
    rotate_method: Literal['largest_box', 'ellipse'] = 'largest_box',
    mask_interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0,
    p: float = 0.5
)

Randomly apply affine transforms: translate, scale and rotate the input.

Parameters

shift_limit (tuple[float, float] | float, default (-0.0625, 0.0625)): Shift factor range for both height and width. If shift_limit is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and upper bounds should lie in the range [-1, 1].
scale_limit (tuple[float, float] | float, default (-0.1, 0.1)): Scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1. If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high).
rotate_limit (tuple[float, float] | float, default (-45, 45)): Rotation range. If rotate_limit is a single int value, the range will be (-rotate_limit, rotate_limit).
interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_LINEAR): Flag that is used to specify the interpolation algorithm.
border_mode (int, default cv2.BORDER_CONSTANT): Flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
shift_limit_x (tuple[float, float] | float | None, default None): Shift factor range for width. If set, this value is used for shifting width instead of shift_limit. If shift_limit_x is a single float value, the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in the range [-1, 1].
shift_limit_y (tuple[float, float] | float | None, default None): Shift factor range for height. If set, this value is used for shifting height instead of shift_limit. If shift_limit_y is a single float value, the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie in the range [-1, 1].
rotate_method ('largest_box' | 'ellipse', default 'largest_box'): Rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse".
mask_interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_NEAREST): Interpolation algorithm used for masks.
fill (tuple[float, ...] | float, default 0): Padding value if border_mode is cv2.BORDER_CONSTANT.
fill_mask (tuple[float, ...] | float, default 0): Padding value for masks if border_mode is cv2.BORDER_CONSTANT.
p (float, default 0.5): Probability of applying the transform.
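
Example

A minimal usage sketch (the synthetic image and limits are illustrative):

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Compose([
...     A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1, rotate_limit=45, p=0.5),
... ])
>>> transformed = transform(image=image)
>>> transformed_image = transformed['image']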

SquareSymmetry (class)

SquareSymmetry(
    p: float = 1
)

Applies one of the eight possible square symmetry transformations to a square-shaped input. This is an alias for the D4 transform with a more intuitive name for those not familiar with group theory. The square symmetry transformations include:

- Identity: no transformation is applied
- 90° rotation: rotate 90 degrees counterclockwise
- 180° rotation: rotate 180 degrees
- 270° rotation: rotate 270 degrees counterclockwise
- Vertical flip: mirror across the vertical axis
- Anti-diagonal flip: mirror across the anti-diagonal
- Horizontal flip: mirror across the horizontal axis
- Main diagonal flip: mirror across the main diagonal

Parameters

p (float, default 1.0): Probability of applying the transform.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Compose([
...     A.SquareSymmetry(p=1.0),
... ])
>>> transformed = transform(image=image)
>>> transformed_image = transformed['image']
# The resulting image will be one of the 8 possible square symmetry transformations of the input

Notes

- This transform is particularly useful for augmenting data that does not have a clear orientation, such as top-view satellite or drone imagery, or certain types of medical images.
- The input image should be square-shaped for optimal results. Non-square inputs may lead to unexpected behavior or distortions.
- When applied to bounding boxes or keypoints, their coordinates will be adjusted according to the selected transformation.
- This transform preserves the aspect ratio and size of the input.

ThinPlateSpline (class)

ThinPlateSpline(
    scale_range: tuple[float, float] = (0.2, 0.4),
    num_control_points: int = 4,
    interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 1,
    mask_interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4] = 0,
    keypoint_remapping_method: Literal['direct', 'mask'] = 'mask',
    p: float = 0.5,
    border_mode: Literal[cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101] = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0
)

Apply Thin Plate Spline (TPS) transformation to create smooth, non-rigid deformations. Imagine the image printed on a thin metal plate that can be bent and warped smoothly:

- Control points act like pins pushing or pulling the plate
- The plate resists sharp bending, creating smooth deformations
- The transformation maintains continuity (no tears or folds)
- Areas between control points are interpolated naturally

The transform works by:

1. Creating a regular grid of control points (like pins in the plate)
2. Randomly displacing these points (like pushing/pulling the pins)
3. Computing a smooth interpolation (like the plate bending)
4. Applying the resulting deformation to the image
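
A conceptual sketch of steps 1-4 using SciPy's thin-plate-spline RBF (an illustrative re-implementation under those assumptions, not albumentations' own code):

>>> import cv2
>>> import numpy as np
>>> from scipy.interpolate import RBFInterpolator
>>> def tps_sketch(image, scale=0.3, n=4, rng=None):
...     rng = rng or np.random.default_rng()
...     h, w = image.shape[:2]
...     # 1. Regular n x n grid of control points in normalized coordinates.
...     gx, gy = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
...     src = np.stack([gx.ravel(), gy.ravel()], axis=1)
...     # 2. Randomly displace the control points.
...     dst = src + rng.uniform(-scale, scale, src.shape)
...     # 3. Fit a smooth TPS interpolant from the displaced points back to the
...     #    originals, i.e. an inverse warp usable with cv2.remap.
...     interp = RBFInterpolator(dst, src, kernel='thin_plate_spline')
...     yy, xx = np.mgrid[0:h, 0:w]
...     pts = np.stack([xx.ravel() / (w - 1), yy.ravel() / (h - 1)], axis=1)
...     mapped = interp(pts)
...     map_x = (mapped[:, 0] * (w - 1)).reshape(h, w).astype(np.float32)
...     map_y = (mapped[:, 1] * (h - 1)).reshape(h, w).astype(np.float32)
...     # 4. Apply the resulting deformation to the image.
...     return cv2.remap(image, map_x, map_y, interpolation=cv2.INTER_LINEAR)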

Parameters

scale_range (tuple[float, float], default (0.2, 0.4)): Range for the random displacement of control points. Values should be in [0.0, 1.0]:
    - 0.0: no displacement (identity transform)
    - 0.1: subtle warping
    - 0.2-0.4: moderate deformation (recommended range)
    - 0.5+: strong warping
num_control_points (int, default 4): Number of control points per side, creating a grid of num_control_points x num_control_points points. Must be >= 2.
    - 2: minimal deformation (affine-like)
    - 3-4: moderate flexibility (recommended)
    - 5+: more local deformation control
interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_LINEAR): OpenCV interpolation flag used for image sampling.
mask_interpolation (cv2.INTER_NEAREST | cv2.INTER_LINEAR | cv2.INTER_CUBIC | cv2.INTER_AREA | cv2.INTER_LANCZOS4, default cv2.INTER_NEAREST): OpenCV interpolation flag used for mask sampling.
keypoint_remapping_method ('direct' | 'mask', default 'mask'): Method to use for keypoint remapping.
    - "mask": Uses mask-based remapping. Faster, especially for many keypoints, but may be less accurate for large distortions. Recommended for large images or many keypoints.
    - "direct": Uses inverse mapping. More accurate for large distortions but slower.
p (float, default 0.5): Probability of applying the transform.
border_mode (cv2.BORDER_CONSTANT | cv2.BORDER_REPLICATE | cv2.BORDER_REFLECT | cv2.BORDER_WRAP | cv2.BORDER_REFLECT_101, default cv2.BORDER_CONSTANT): OpenCV border flag.
fill (tuple[float, ...] | float, default 0): Padding value if border_mode is cv2.BORDER_CONSTANT.
fill_mask (tuple[float, ...] | float, default 0): Padding value for masks if border_mode is cv2.BORDER_CONSTANT.

Example

>>> import albumentations as A
>>> # Basic usage
>>> transform = A.ThinPlateSpline()
>>>
>>> # Subtle deformation
>>> transform = A.ThinPlateSpline(
...     scale_range=(0.1, 0.2),
...     num_control_points=3
... )
>>>
>>> # Strong warping with fine control
>>> transform = A.ThinPlateSpline(
...     scale_range=(0.3, 0.5),
...     num_control_points=5,
... )

Notes

- The transformation preserves smoothness and continuity.
- Stronger scale values may create more extreme deformations.
- A higher number of control points allows more local deformations.
- The same deformation is applied consistently to all targets.

Transpose (class)

Transpose(
    p: float = 0.5
)

Transpose the input by swapping its rows and columns. This transform flips the image over its main diagonal, effectively switching its width and height. It's equivalent to a 90-degree clockwise rotation followed by a horizontal flip.

Parameters

p (float, default 0.5): Probability of applying the transform.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.array([
...     [[1, 2, 3], [4, 5, 6]],
...     [[7, 8, 9], [10, 11, 12]]
... ])
>>> transform = A.Transpose(p=1.0)
>>> result = transform(image=image)
>>> transposed_image = result['image']
>>> print(transposed_image)
[[[ 1  2  3]
  [ 7  8  9]]
 [[ 4  5  6]
  [10 11 12]]]
# The 2x2 image keeps its shape (it is square), but its rows and columns are swapped

Notes

- The dimensions of the output will be swapped compared to the input. For example, an input image of shape (100, 200, 3) will result in an output of shape (200, 100, 3).
- This transform is its own inverse. Applying it twice will return the original input.
- For multi-channel images (like RGB), the channels are preserved in their original order.
- Bounding boxes will have their coordinates adjusted to match the new image dimensions.
- Keypoints will have their x and y coordinates swapped.

VerticalFlip (class)

VerticalFlip(
    p: float = 0.5
)

Flip the input vertically around the x-axis.

Parameters

p (float, default 0.5): Probability of applying the transform.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.array([
...     [[1, 2, 3], [4, 5, 6]],
...     [[7, 8, 9], [10, 11, 12]]
... ])
>>> transform = A.VerticalFlip(p=1.0)
>>> result = transform(image=image)
>>> flipped_image = result['image']
>>> print(flipped_image)
[[[ 7  8  9]
  [10 11 12]]
 [[ 1  2  3]
  [ 4  5  6]]]
# The original image is flipped vertically, with rows reversed

Notes

- This transform flips the image upside down. The top of the image becomes the bottom and vice versa.
- The dimensions of the image remain unchanged.
- For multi-channel images (like RGB), each channel is flipped independently.
- Bounding boxes are adjusted to match their new positions in the flipped image.
- Keypoints are moved to their new positions in the flipped image.