albumentations.augmentations.geometric.distortion


Apply elastic deformation to images, masks, bounding boxes, and keypoints.

ElasticTransform class

ElasticTransform(
    alpha: float = 1,
    sigma: float = 50,
    interpolation: 0 | 1 | 2 | 3 | 4 = 1,
    approximate: bool = False,
    same_dxdy: bool = False,
    mask_interpolation: 0 | 1 | 2 | 3 | 4 = 0,
    noise_distribution: 'gaussian' | 'uniform' = 'gaussian',
    keypoint_remapping_method: 'direct' | 'mask' = 'mask',
    border_mode: 0 | 1 | 2 | 3 | 4 = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0,
    map_resolution_range: tuple[float, float] = (1.0, 1.0),
    p: float = 0.5
)

Apply elastic deformation to images, masks, bounding boxes, and keypoints. This transformation introduces random elastic distortions to the input data. It is particularly useful for data augmentation when training deep learning models, especially for tasks like image segmentation or object detection where you want to maintain the relative positions of features while introducing realistic deformations.

The transform works by generating random displacement fields and applying them to the input. These fields are smoothed using a Gaussian filter to create more natural-looking distortions.

Targets: image, mask, bboxes, keypoints, volume, mask3d
Image types: uint8, float32
Supported bboxes: hbb, obb
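The displacement-field idea described above can be illustrated with a small NumPy sketch. This is not the library's implementation: the helper names (`gaussian_kernel_1d`, `smooth`, `displacement_fields`), the 3-sigma kernel truncation, and the zero-padded borders are assumptions made for illustration only.

```python
import numpy as np

def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel, truncated at about 3 sigma (assumed)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

def smooth(field: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur: filter every row, then every column.

    Assumes the kernel is shorter than each side of the field.
    """
    kernel = gaussian_kernel_1d(sigma)
    field = np.apply_along_axis(np.convolve, 1, field, kernel, mode="same")
    field = np.apply_along_axis(np.convolve, 0, field, kernel, mode="same")
    return field

def displacement_fields(height, width, alpha, sigma, rng):
    """Random noise per axis, smoothed by sigma and scaled by alpha."""
    # Gaussian noise here; rng.uniform(-1, 1, ...) would be the uniform variant.
    dx = smooth(rng.standard_normal((height, width)), sigma) * alpha
    dy = smooth(rng.standard_normal((height, width)), sigma) * alpha
    return dx, dy

rng = np.random.default_rng(0)
dx, dy = displacement_fields(64, 64, alpha=10.0, sigma=4.0, rng=rng)

# Sampling coordinates: each output pixel reads from (x + dx, y + dy).
ys, xs = np.mgrid[0:64, 0:64].astype(np.float64)
map_x, map_y = xs + dx, ys + dy
```

In this sketch a larger `sigma` averages the noise over a wider neighborhood, so neighboring pixels move together, while `alpha` only rescales how far they move.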

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| alpha | float | 1 | Scaling factor for the random displacement fields. Higher values result in more pronounced distortions. Default: 1.0 |
| sigma | float | 50 | Standard deviation of the Gaussian filter used to smooth the displacement fields. Higher values result in smoother, more global distortions. Default: 50.0 |
| interpolation | One of: 0, 1, 2, 3, 4 | 1 | Interpolation method to be used for image transformation. Should be one of the OpenCV interpolation types. Default: cv2.INTER_LINEAR |
| approximate | bool | False | Whether to use an approximate version of the elastic transform. If True, uses a fixed kernel size for Gaussian smoothing, which can be faster but potentially less accurate for large sigma values. Default: False |
| same_dxdy | bool | False | Whether to use the same random displacement field for both x and y directions. Can speed up the transform at the cost of less diverse distortions. Default: False |
| mask_interpolation | One of: 0, 1, 2, 3, 4 | 0 | Flag that specifies the interpolation algorithm for the mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST. |
| noise_distribution | One of: 'gaussian', 'uniform' | 'gaussian' | Distribution used to generate the displacement fields. "gaussian" generates fields using a normal distribution (more natural deformations). "uniform" generates fields using a uniform distribution (more mechanical deformations). Default: "gaussian". |
| keypoint_remapping_method | One of: 'direct', 'mask' | 'mask' | Method to use for keypoint remapping. "mask": uses mask-based remapping; faster, especially for many keypoints, but may be less accurate for large distortions; recommended for large images or many keypoints. "direct": uses inverse mapping; more accurate for large distortions but slower. Default: "mask" |
| border_mode | One of: 0, 1, 2, 3, 4 | 0 | - |
| fill | One of: tuple[float, ...], float | 0 | - |
| fill_mask | One of: tuple[float, ...], float | 0 | - |
| map_resolution_range | tuple[float, float] | (1.0, 1.0) | Range for downsampling the distortion map before applying it. Values should be in (0, 1], where 1.0 means full resolution. Lower values generate smaller distortion maps, which are faster to compute but may result in less precise distortions. The actual resolution is sampled uniformly from this range. Default: (1.0, 1.0). |
| p | float | 0.5 | Probability of applying the transform. Default: 0.5 |
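To make the `map_resolution_range` description concrete, the sketch below samples a resolution scale uniformly from the range and derives the size of the reduced distortion map. The function name `sample_map_shape` and the rounding rule are hypothetical; how the library actually resizes and upsamples the map is not specified here.

```python
import numpy as np

def sample_map_shape(image_shape, map_resolution_range, rng):
    """Pick a scale uniformly from the range and return the reduced map size."""
    low, high = map_resolution_range
    if not (0 < low <= high <= 1):
        raise ValueError("map_resolution_range must satisfy 0 < low <= high <= 1")
    scale = float(rng.uniform(low, high))
    height, width = image_shape[:2]
    # Keep at least one pixel per side (assumed rounding rule).
    map_h = max(1, int(round(height * scale)))
    map_w = max(1, int(round(width * scale)))
    return map_h, map_w, scale

rng = np.random.default_rng(0)
# A map between one quarter and one half of the image resolution.
map_h, map_w, scale = sample_map_shape((512, 512), (0.25, 0.5), rng)
# The default (1.0, 1.0) always keeps the full resolution.
full_h, full_w, full_scale = sample_map_shape((512, 512), (1.0, 1.0), rng)
```

A smaller map has fewer displacement values to generate and smooth, which is where the speed-up comes from; the cost is that fine detail in the distortion is lost when the map is scaled back to the image size.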

Examples

>>> import albumentations as A
>>> transform = A.Compose([
...     A.ElasticTransform(alpha=1, sigma=50, p=0.5),
... ])
>>> transformed = transform(image=image, mask=mask, bboxes=bboxes, keypoints=keypoints)
>>> transformed_image = transformed['image']
>>> transformed_mask = transformed['mask']
>>> transformed_bboxes = transformed['bboxes']
>>> transformed_keypoints = transformed['keypoints']

Notes

- The transform maintains consistency across all targets (image, mask, bboxes, keypoints) by using the same displacement fields for all of them.
- The 'approximate' parameter determines whether to use a precise or approximate method for generating displacement fields. The approximate method can be faster but may be less accurate for large sigma values.
- Bounding boxes that end up outside the image after transformation will be removed.
- Keypoints that end up outside the image after transformation will be removed.
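As a rough illustration of the last note, the following sketch drops keypoints whose warped coordinates fall outside the image. The helper `keep_inside` is hypothetical, and the boundary convention (0 <= x < width, 0 <= y < height) is an assumption rather than the library's documented rule.

```python
import numpy as np

def keep_inside(keypoints, height, width):
    """Return only the (x, y) keypoints that lie inside the image."""
    keypoints = np.asarray(keypoints, dtype=np.float64).reshape(-1, 2)
    x, y = keypoints[:, 0], keypoints[:, 1]
    inside = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    return keypoints[inside]

# Hypothetical keypoint positions after a deformation of a 100x100 image.
warped = np.array([[50.0, 50.0], [-3.2, 10.0], [99.9, 20.0], [40.0, 100.5]])
kept = keep_inside(warped, height=100, width=100)
```

Because targets can be dropped this way, any per-keypoint or per-box labels need to be filtered with the same mask, which is why the label fields are passed through `Compose` alongside the coordinates.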

GridDistortion class

GridDistortion(
    num_steps: int = 5,
    distort_limit: tuple[float, float] | float = (-0.3, 0.3),
    interpolation: 0 | 1 | 2 | 3 | 4 = 1,
    normalized: bool = True,
    mask_interpolation: 0 | 1 | 2 | 3 | 4 = 0,
    keypoint_remapping_method: 'direct' | 'mask' = 'mask',
    p: float = 0.5,
    border_mode: 0 | 1 | 2 | 3 | 4 = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0,
    map_resolution_range: tuple[float, float] = (1.0, 1.0)
)

Apply grid distortion to images, masks, bounding boxes, and keypoints. This transformation divides the image into a grid and randomly distorts each cell, creating localized warping effects. It's particularly useful for data augmentation in tasks like medical image analysis, OCR, and other domains where local geometric variations are meaningful.
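One way to picture the per-cell distortion is a piecewise-linear coordinate map along each axis. The sketch below is an illustration under assumptions, not the library's code: `axis_mapping` is a hypothetical helper, each cell is stretched by a factor 1 + u with u drawn from `distort_limit`, and `normalized=True` is modeled as rescaling the warped cell edges so they still span the full axis.

```python
import numpy as np

def axis_mapping(size, num_steps, distort_limit, rng, normalized=True):
    """Piecewise-linear coordinate map for one image axis."""
    low, high = distort_limit
    # One stretch factor per grid cell; positive for limits inside (-1, 1).
    factors = 1.0 + rng.uniform(low, high, size=num_steps)
    edges = np.linspace(0.0, size, num_steps + 1)
    warped_edges = np.concatenate([[0.0], np.cumsum(np.diff(edges) * factors)])
    if normalized:
        # Rescale so the last edge lands on the image border again.
        warped_edges *= size / warped_edges[-1]
    # Interpolate linearly between the warped cell edges for every pixel.
    return np.interp(np.arange(size), edges, warped_edges)

rng = np.random.default_rng(0)
height, width = 80, 120
x_map = axis_mapping(width, num_steps=5, distort_limit=(-0.3, 0.3), rng=rng)
y_map = axis_mapping(height, num_steps=5, distort_limit=(-0.3, 0.3), rng=rng)

# Full 2-D sampling maps of shape (height, width).
map_x, map_y = np.meshgrid(x_map, y_map)
```

With more grid cells the stretch factor changes more often along the axis, which matches the description of higher `num_steps` giving more granular distortions.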

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| num_steps | int | 5 | Number of grid cells on each side of the image. Higher values create more granular distortions. Must be at least 1. Default: 5. |
| distort_limit | One of: tuple[float, float], float | (-0.3, 0.3) | Range of distortion. If a single float is provided, the range will be (-distort_limit, distort_limit). Higher values create stronger distortions. Should be in the range of -1 to 1. Default: (-0.3, 0.3). |
| interpolation | One of: 0, 1, 2, 3, 4 | 1 | OpenCV interpolation method used for image transformation. Options include cv2.INTER_LINEAR, cv2.INTER_CUBIC, etc. Default: cv2.INTER_LINEAR. |
| normalized | bool | True | If True, ensures that the distortion does not move pixels outside the image boundaries. This can result in less extreme distortions but guarantees that no information is lost. Default: True. |
| mask_interpolation | One of: 0, 1, 2, 3, 4 | 0 | Flag that specifies the interpolation algorithm for the mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST. |
| keypoint_remapping_method | One of: 'direct', 'mask' | 'mask' | Method to use for keypoint remapping. "mask": uses mask-based remapping; faster, especially for many keypoints, but may be less accurate for large distortions; recommended for large images or many keypoints. "direct": uses inverse mapping; more accurate for large distortions but slower. Default: "mask" |
| p | float | 0.5 | Probability of applying the transform. Default: 0.5. |
| border_mode | One of: 0, 1, 2, 3, 4 | 0 | - |
| fill | One of: tuple[float, ...], float | 0 | - |
| fill_mask | One of: tuple[float, ...], float | 0 | - |
| map_resolution_range | tuple[float, float] | (1.0, 1.0) | Range for downsampling the distortion map before applying it. Values should be in (0, 1], where 1.0 means full resolution. Lower values generate smaller distortion maps, which are faster to compute but may result in less precise distortions. The actual resolution is sampled uniformly from this range. Default: (1.0, 1.0). |

Examples

>>> import albumentations as A
>>> transform = A.Compose([
...     A.GridDistortion(num_steps=5, distort_limit=0.3, p=1.0),
... ])
>>> transformed = transform(image=image, mask=mask, bboxes=bboxes, keypoints=keypoints)
>>> transformed_image = transformed['image']
>>> transformed_mask = transformed['mask']
>>> transformed_bboxes = transformed['bboxes']
>>> transformed_keypoints = transformed['keypoints']

Notes

- The same distortion is applied to all targets (image, mask, bboxes, keypoints) to maintain consistency.
- When normalized=True, the distortion is adjusted to ensure all pixels remain within the image boundaries.

OpticalDistortion class

OpticalDistortion(
    distort_limit: tuple[float, float] | float = (-0.05, 0.05),
    interpolation: 0 | 1 | 2 | 3 | 4 = 1,
    mask_interpolation: 0 | 1 | 2 | 3 | 4 = 0,
    mode: 'camera' | 'fisheye' = 'camera',
    keypoint_remapping_method: 'direct' | 'mask' = 'mask',
    p: float = 0.5,
    border_mode: 0 | 1 | 2 | 3 | 4 = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0,
    map_resolution_range: tuple[float, float] = (1.0, 1.0)
)

Apply optical distortion to images, masks, bounding boxes, and keypoints. Supports two distortion models:

1. Camera matrix model (original): Uses OpenCV's camera calibration model with k1=k2=k distortion coefficients.
2. Fisheye model: Direct radial distortion: r_dist = r * (1 + gamma * r²)
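The fisheye formula r_dist = r * (1 + gamma * r²) can be turned into sampling maps as in the sketch below. Only the formula comes from the description above; the helper names, the choice of image center, and the normalization of the radius by the larger half-extent are assumptions for illustration.

```python
import numpy as np

def fisheye_radius(r, gamma):
    """Radial distortion from the description: r_dist = r * (1 + gamma * r^2)."""
    return r * (1.0 + gamma * r ** 2)

def fisheye_maps(height, width, gamma):
    """Build sampling maps by scaling each pixel's offset from the center."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    # Normalize offsets so the radius is about 1 at the nearer border (assumed).
    norm = max(cx, cy)
    x, y = (xs - cx) / norm, (ys - cy) / norm
    r = np.sqrt(x ** 2 + y ** 2)
    # r_dist / r, written without dividing so the center (r = 0) is safe.
    scale = 1.0 + gamma * r ** 2
    return x * scale * norm + cx, y * scale * norm + cy

map_x, map_y = fisheye_maps(101, 101, gamma=0.3)
identity_x, identity_y = fisheye_maps(101, 101, gamma=0.0)
```

The center pixel never moves because the scale factor is applied to the offset from the center, and the displacement grows with the cube of the radius, so the effect is concentrated near the image borders.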

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| distort_limit | One of: tuple[float, float], float | (-0.05, 0.05) | Range of distortion coefficient. For the camera model, the recommended range is (-0.05, 0.05). For the fisheye model, the recommended range is (-0.3, 0.3). Default: (-0.05, 0.05) |
| interpolation | One of: 0, 1, 2, 3, 4 | 1 | Interpolation method used for image transformation. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
| mask_interpolation | One of: 0, 1, 2, 3, 4 | 0 | Flag that specifies the interpolation algorithm for the mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST. |
| mode | One of: 'camera', 'fisheye' | 'camera' | Distortion model to use. 'camera': original camera matrix model. 'fisheye': fisheye lens model. Default: 'camera' |
| keypoint_remapping_method | One of: 'direct', 'mask' | 'mask' | Method to use for keypoint remapping. "mask": uses mask-based remapping; faster, especially for many keypoints, but may be less accurate for large distortions; recommended for large images or many keypoints. "direct": uses inverse mapping; more accurate for large distortions but slower. Default: "mask" |
| p | float | 0.5 | Probability of applying the transform. Default: 0.5. |
| border_mode | One of: 0, 1, 2, 3, 4 | 0 | - |
| fill | One of: tuple[float, ...], float | 0 | - |
| fill_mask | One of: tuple[float, ...], float | 0 | - |
| map_resolution_range | tuple[float, float] | (1.0, 1.0) | Range for downsampling the distortion map before applying it. Values should be in (0, 1], where 1.0 means full resolution. Lower values generate smaller distortion maps, which are faster to compute but may result in less precise distortions. The actual resolution is sampled uniformly from this range. Default: (1.0, 1.0). |

Examples

>>> import albumentations as A
>>> transform = A.Compose([
...     A.OpticalDistortion(distort_limit=0.1, p=1.0),
... ])
>>> transformed = transform(image=image, mask=mask, bboxes=bboxes, keypoints=keypoints)
>>> transformed_image = transformed['image']
>>> transformed_mask = transformed['mask']
>>> transformed_bboxes = transformed['bboxes']
>>> transformed_keypoints = transformed['keypoints']

Notes

- The distortion is applied using OpenCV's initUndistortRectifyMap and remap functions.
- The distortion coefficient (k) is randomly sampled from the distort_limit range.
- Bounding boxes and keypoints are transformed along with the image to maintain consistency.
- The fisheye model directly applies radial distortion.
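For the camera model, the radial part of OpenCV's distortion polynomial with k1 = k2 = k scales a normalized point by 1 + k·r² + k·r⁴. The NumPy sketch below applies that factor to a few points after sampling k from `distort_limit`. It does not call OpenCV and says nothing about the direction of the mapping used with remap; the helper `camera_distort` and the intrinsics (fx = width, fy = height, principal point at the image center) are assumptions.

```python
import numpy as np

def camera_distort(points, k, fx, fy, cx, cy):
    """Apply the radial polynomial with k1 = k2 = k to pixel coordinates."""
    points = np.asarray(points, dtype=np.float64).reshape(-1, 2)
    # Pixel coordinates -> normalized camera coordinates.
    x = (points[:, 0] - cx) / fx
    y = (points[:, 1] - cy) / fy
    r2 = x ** 2 + y ** 2
    # Radial factor 1 + k1*r^2 + k2*r^4 with k1 = k2 = k (no tangential terms).
    factor = 1.0 + k * r2 + k * r2 ** 2
    # Back to pixel coordinates.
    return np.column_stack([x * factor * fx + cx, y * factor * fy + cy])

rng = np.random.default_rng(0)
k = float(rng.uniform(-0.05, 0.05))  # sampled from distort_limit

width, height = 200, 100
cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
points = np.array([[cx, cy], [0.0, 0.0], [199.0, 99.0]])
distorted = camera_distort(points, k, fx=width, fy=height, cx=cx, cy=cy)
```

With these assumed intrinsics the normalized radius stays below 1 inside the image, so coefficients in the recommended (-0.05, 0.05) range move border pixels by only a few percent of the image size.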

ThinPlateSpline class

ThinPlateSpline(
    scale_range: tuple[float, float] = (0.2, 0.4),
    num_control_points: int = 4,
    interpolation: 0 | 1 | 2 | 3 | 4 = 1,
    mask_interpolation: 0 | 1 | 2 | 3 | 4 = 0,
    keypoint_remapping_method: 'direct' | 'mask' = 'mask',
    p: float = 0.5,
    border_mode: 0 | 1 | 2 | 3 | 4 = 0,
    fill: tuple[float, ...] | float = 0,
    fill_mask: tuple[float, ...] | float = 0,
    map_resolution_range: tuple[float, float] = (1.0, 1.0)
)

Apply Thin Plate Spline (TPS) transformation to create smooth, non-rigid deformations.

Imagine the image printed on a thin metal plate that can be bent and warped smoothly:

- Control points act like pins pushing or pulling the plate
- The plate resists sharp bending, creating smooth deformations
- The transformation maintains continuity (no tears or folds)
- Areas between control points are interpolated naturally

The transform works by:

1. Creating a regular grid of control points (like pins in the plate)
2. Randomly displacing these points (like pushing/pulling the pins)
3. Computing a smooth interpolation (like the plate bending)
4. Applying the resulting deformation to the image
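The four steps above can be sketched with the standard TPS formulation, which uses the radial kernel U(r) = r² log r plus an affine term. This is a textbook-style illustration in normalized [0, 1] coordinates, not the library's code: the function names, the uniform displacement of up to 0.1, and the 32x32 evaluation grid are assumptions.

```python
import numpy as np

def tps_kernel(r):
    """TPS radial basis U(r) = r^2 * log(r), with U(0) = 0."""
    out = np.zeros_like(r)
    nonzero = r > 0
    out[nonzero] = r[nonzero] ** 2 * np.log(r[nonzero])
    return out

def fit_tps(src, dst):
    """Solve for TPS weights and affine terms that map src exactly onto dst."""
    n = len(src)
    dists = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    basis = np.hstack([np.ones((n, 1)), src])  # columns: 1, x, y
    system = np.zeros((n + 3, n + 3))
    system[:n, :n] = tps_kernel(dists)
    system[:n, n:] = basis
    system[n:, :n] = basis.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = dst
    params = np.linalg.solve(system, rhs)
    return params[:n], params[n:]

def apply_tps(points, src, weights, affine):
    """Evaluate the fitted spline at arbitrary points."""
    dists = np.linalg.norm(points[:, None, :] - src[None, :, :], axis=-1)
    basis = np.hstack([np.ones((len(points), 1)), points])
    return tps_kernel(dists) @ weights + basis @ affine

rng = np.random.default_rng(0)

# 1. Regular 4x4 grid of control points in normalized coordinates.
grid = np.linspace(0.0, 1.0, 4)
src = np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, 2)

# 2. Randomly displace the control points (displacement size is illustrative).
dst = src + rng.uniform(-0.1, 0.1, size=src.shape)

# 3. Fit the smooth interpolation.
weights, affine = fit_tps(src, dst)

# 4. Evaluate it on a dense grid to obtain the deformation for every pixel.
dense = np.linspace(0.0, 1.0, 32)
pixels = np.stack(np.meshgrid(dense, dense), axis=-1).reshape(-1, 2)
warped = apply_tps(pixels, src, weights, affine).reshape(32, 32, 2)
```

The fitted spline passes exactly through the displaced control points and bends as little as possible in between, which is the "plate resists sharp bending" behavior described above.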

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| scale_range | tuple[float, float] | (0.2, 0.4) | Range for random displacement of control points. Values should be in [0.0, 1.0]. 0.0: no displacement (identity transform); 0.1: subtle warping; 0.2-0.4: moderate deformation (recommended range); 0.5+: strong warping. Default: (0.2, 0.4) |
| num_control_points | int | 4 | Number of control points per side. Creates a grid of num_control_points x num_control_points points. 2: minimal deformation (affine-like); 3-4: moderate flexibility (recommended); 5+: more local deformation control. Must be >= 2. Default: 4 |
| interpolation | One of: 0, 1, 2, 3, 4 | 1 | OpenCV interpolation flag. Used for image sampling. See also: cv2.INTER_*. Default: cv2.INTER_LINEAR |
| mask_interpolation | One of: 0, 1, 2, 3, 4 | 0 | OpenCV interpolation flag. Used for mask sampling. See also: cv2.INTER_*. Default: cv2.INTER_NEAREST |
| keypoint_remapping_method | One of: 'direct', 'mask' | 'mask' | Method to use for keypoint remapping. "mask": uses mask-based remapping; faster, especially for many keypoints, but may be less accurate for large distortions; recommended for large images or many keypoints. "direct": uses inverse mapping; more accurate for large distortions but slower. Default: "mask" |
| p | float | 0.5 | Probability of applying the transform. Default: 0.5 |
| border_mode | One of: 0, 1, 2, 3, 4 | 0 | - |
| fill | One of: tuple[float, ...], float | 0 | - |
| fill_mask | One of: tuple[float, ...], float | 0 | - |
| map_resolution_range | tuple[float, float] | (1.0, 1.0) | Range for downsampling the distortion map before applying it. Values should be in (0, 1], where 1.0 means full resolution. Lower values generate smaller distortion maps, which are faster to compute but may result in less precise distortions. The actual resolution is sampled uniformly from this range. Default: (1.0, 1.0). |

Examples

>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Create sample data
>>> image = np.zeros((100, 100, 3), dtype=np.uint8)
>>> mask = np.zeros((100, 100), dtype=np.uint8)
>>> mask[25:75, 25:75] = 1  # Square mask
>>> bboxes = np.array([[10, 10, 40, 40]])  # Single box
>>> bbox_labels = [1]
>>> keypoints = np.array([[50, 50]])  # Single keypoint at center
>>> keypoint_labels = [0]
>>>
>>> # Set up transform with Compose to handle all targets
>>> transform = A.Compose([
...     A.ThinPlateSpline(scale_range=(0.2, 0.4), p=1.0)
... ], bbox_params=A.BboxParams(coord_format='pascal_voc', label_fields=['bbox_labels']),
...    keypoint_params=A.KeypointParams(coord_format='xy', label_fields=['keypoint_labels']))
>>>
>>> # Apply to all targets
>>> result = transform(
...     image=image,
...     mask=mask,
...     bboxes=bboxes,
...     bbox_labels=bbox_labels,
...     keypoints=keypoints,
...     keypoint_labels=keypoint_labels
... )
>>>
>>> # Access transformed results
>>> transformed_image = result['image']
>>> transformed_mask = result['mask']
>>> transformed_bboxes = result['bboxes']
>>> transformed_bbox_labels = result['bbox_labels']
>>> transformed_keypoints = result['keypoints']
>>> transformed_keypoint_labels = result['keypoint_labels']

Notes

- The transformation preserves smoothness and continuity.
- Stronger scale values may create more extreme deformations.
- A higher number of control points allows more local deformations.
- The same deformation is applied consistently to all targets.

References

  • "Principal Warps: Thin-Plate Splines and the Decomposition of Deformations" by F.L. Bookstein: https://doi.org/10.1109/34.24792
  • Thin Plate Splines in Computer Vision: https://en.wikipedia.org/wiki/Thin_plate_spline
  • Similar implementation in Kornia: https://kornia.readthedocs.io/en/latest/augmentation.html#kornia.augmentation.RandomThinPlateSpline