# Geometric transforms (augmentations.geometric.transforms)¶

## class albumentations.augmentations.geometric.transforms.Affine (scale=None, translate_percent=None, translate_px=None, rotate=None, shear=None, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode=0, fit_output=False, always_apply=False, p=0.5)  [view source on GitHub] ¶

Augmentation to apply affine transformations to images. This is mostly a wrapper around the corresponding classes and functions in OpenCV.

Affine transformations involve:

- Translation ("move" image on the x-/y-axis)
- Rotation
- Scaling ("zoom" in/out)
- Shear (move one side of the image, turning a square into a trapezoid)


All such transformations can create "new" pixels in the image without a defined content, e.g. if the image is translated to the left, pixels are created on the right. A method has to be defined to deal with these pixel values. The parameters cval and mode of this class deal with this.

Some transformations involve interpolations between several pixels of the input image to generate output pixel values. The parameters interpolation and mask_interpolation deals with the method of interpolation used for this.

Parameters:

Name Type Description
scale number, tuple of number or dict

Scaling factor to use, where 1.0 denotes "no change" and 0.5 is zoomed out to 50 percent of the original size. * If a single number, then that value will be used for all images. * If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b]. That value will be used identically for both x- and y-axis. * If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows to set different values for the two axis and sampling will then happen independently per axis, resulting in samples that differ between the axes.

translate_percent None, number, tuple of number or dict

Translation as a fraction of the image height/width (x-translation, y-translation), where 0 denotes "no change" and 0.5 denotes "half of the axis size". * If None then equivalent to 0.0 unless translate_px has a value other than None. * If a single number, then that value will be used for all images. * If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b]. That sampled fraction value will be used identically for both x- and y-axis. * If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows to set different values for the two axis and sampling will then happen independently per axis, resulting in samples that differ between the axes.

translate_px None, int, tuple of int or dict

Translation in pixels. * If None then equivalent to 0 unless translate_percent has a value other than None. * If a single int, then that value will be used for all images. * If a tuple (a, b), then a value will be uniformly sampled per image from the discrete interval [a..b]. That number will be used identically for both x- and y-axis. * If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows to set different values for the two axis and sampling will then happen independently per axis, resulting in samples that differ between the axes.

rotate number or tuple of number

Rotation in degrees (NOT radians), i.e. expected value range is around [-360, 360]. Rotation happens around the center of the image, not the top left corner as in some other frameworks. * If a number, then that value will be used for all images. * If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b] and used as the rotation value.

shear number, tuple of number or dict

Shear in degrees (NOT radians), i.e. expected value range is around [-360, 360], with reasonable values being in the range of [-45, 45]. * If a number, then that value will be used for all images as the shear on the x-axis (no shear on the y-axis will be done). * If a tuple (a, b), then two value will be uniformly sampled per image from the interval [a, b] and be used as the x- and y-shear value. * If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows to set different values for the two axis and sampling will then happen independently per axis, resulting in samples that differ between the axes.

interpolation int

OpenCV interpolation flag.

mask_interpolation int

OpenCV interpolation flag.

cval number or sequence of number

The constant value to use when filling in newly created pixels. (E.g. translating by 1px to the right will create a new 1px-wide column of pixels on the left of the image). The value is only used when mode=constant. The expected value range is [0, 255] for uint8 images.

cval_mask number or tuple of number

Same as cval but only for masks.

mode int

OpenCV border flag.

fit_output bool

Whether to modify the affine transformation so that the whole output image is always contained in the image plane (True) or accept parts of the image being outside the image plane (False). This can be thought of as first applying the affine transformation and then applying a second transformation to "zoom in" on the new image so that it fits the image plane, This is useful to avoid corners of the image being outside of the image plane after applying rotations. It will however negate translation and scaling.

p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, keypoints, bboxes

Image types: uint8, float32

## class albumentations.augmentations.geometric.transforms.ElasticTransform (alpha=1, sigma=50, alpha_affine=50, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, approximate=False, same_dxdy=False, p=0.5)  [view source on GitHub] ¶

Elastic deformation of images as described in [Simard2003]_ (with modifications). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.

Parameters:

Name Type Description
alpha float
sigma float

Gaussian filter parameter.

alpha_affine float

The range will be (-alpha_affine, alpha_affine)

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

approximate boolean

Whether to smooth displacement map with fixed kernel size. Enabling this option gives ~2X speedup on large images.

same_dxdy boolean

Whether to use same random generated shift for x and y. Enabling this option gives ~2X speedup.

Image types: uint8, float32

## class albumentations.augmentations.geometric.transforms.Perspective (scale=(0.05, 0.1), keep_size=True, pad_mode=0, pad_val=0, mask_pad_val=0, fit_output=False, interpolation=1, always_apply=False, p=0.5)  [view source on GitHub] ¶

Perform a random four point perspective transform of the input.

Parameters:

Name Type Description
scale float or [float, float]

standard deviation of the normal distributions. These are used to sample the random distances of the subimage's corners from the full image's corners. If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1).

keep_size bool

Whether to resize image’s back to their original size after applying the perspective transform. If set to False, the resulting images may end up having different shapes and will always be a list, never an array. Default: True

pad_mode OpenCV flag

OpenCV border mode.

pad_val int, float, list of int, list of float

padding value if border_mode is cv2.BORDER_CONSTANT. Default: 0

mask_pad_val int, float, list of int, list of float

padding value for mask if border_mode is cv2.BORDER_CONSTANT. Default: 0

fit_output bool

If True, the image plane size and position will be adjusted to still capture the whole image after perspective transformation. (Followed by image resizing if keep_size is set to True.) Otherwise, parts of the transformed image may be outside of the image plane. This setting should not be set to True when using large scale values as it could lead to very large images. Default: False

p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, keypoints, bboxes

Image types: uint8, float32

## class albumentations.augmentations.geometric.transforms.PiecewiseAffine (scale=(0.03, 0.05), nb_rows=4, nb_cols=4, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode='constant', absolute_scale=False, always_apply=False, keypoints_threshold=0.01, p=0.5)  [view source on GitHub] ¶

Apply affine transformations that differ between local neighbourhoods. This augmentation places a regular grid of points on an image and randomly moves the neighbourhood of these point around via affine transformations. This leads to local distortions.

This is mostly a wrapper around scikit-image's PiecewiseAffine. See also Affine for a similar technique.

Note: This augmenter is very slow. Try to use ElasticTransformation instead, which is at least 10x faster.

Note: For coordinate-based inputs (keypoints, bounding boxes, polygons, ...), this augmenter still has to perform an image-based augmentation, which will make it significantly slower and not fully correct for such inputs than other transforms.

Parameters:

Name Type Description
scale float, tuple of float

Each point on the regular grid is moved around via a normal distribution. This scale factor is equivalent to the normal distribution's sigma. Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of the image if absolute_scale=False (default), so this scale can be the same for different sized images. Recommended values are in the range 0.01 to 0.05 (weak to strong augmentations). * If a single float, then that value will always be used as the scale. * If a tuple (a, b) of float s, then a random value will be uniformly sampled per image from the interval [a, b].

nb_rows int, tuple of int

Number of rows of points that the regular grid should have. Must be at least 2. For large images, you might want to pick a higher value than 4. You might have to then adjust scale to lower values. * If a single int, then that value will always be used as the number of rows. * If a tuple (a, b), then a value from the discrete interval [a..b] will be uniformly sampled per image.

nb_cols int, tuple of int

Number of columns. Analogous to nb_rows.

interpolation int

The order of interpolation. The order has to be in the range 0-5: - 0: Nearest-neighbor - 1: Bi-linear (default) - 2: Bi-quadratic - 3: Bi-cubic - 4: Bi-quartic - 5: Bi-quintic

mask_interpolation int

same as interpolation but for mask.

cval number

The constant value to use when filling in newly created pixels.

cval_mask number

Same as cval but only for masks.

mode str

{'constant', 'edge', 'symmetric', 'reflect', 'wrap'}, optional Points outside the boundaries of the input are filled according to the given mode. Modes match the behaviour of numpy.pad.

absolute_scale bool

Take scale as an absolute value rather than a relative value.

keypoints_threshold float

Used as threshold in conversion from distance maps to keypoints. The search for keypoints works by searching for the argmin (non-inverted) or argmax (inverted) in each channel. This parameters contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit as a keypoint. Use None to use no min/max. Default: 0.01

Targets: image, mask, keypoints, bboxes

Image types: uint8, float32

## class albumentations.augmentations.geometric.transforms.ShiftScaleRotate (shift_limit=0.0625, scale_limit=0.1, rotate_limit=45, interpolation=1, border_mode=4, value=None, mask_value=None, shift_limit_x=None, shift_limit_y=None, always_apply=False, p=0.5)  [view source on GitHub] ¶

Randomly apply affine transforms: translate, scale and rotate the input.

Parameters:

Name Type Description
shift_limit [float, float] or float

shift factor range for both height and width. If shift_limit is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and upper bounds should lie in range [0, 1]. Default: (-0.0625, 0.0625).

scale_limit [float, float] or float

scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Default: (-0.1, 0.1).

rotate_limit [int, int] or int

rotation range. If rotate_limit is a single int value, the range will be (-rotate_limit, rotate_limit). Default: (-45, 45).

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of int, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of int, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

shift_limit_x [float, float] or float

shift factor range for width. If it is set then this value instead of shift_limit will be used for shifting width. If shift_limit_x is a single float value, the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in the range [0, 1]. Default: None.

shift_limit_y [float, float] or float

shift factor range for height. If it is set then this value instead of shift_limit will be used for shifting height. If shift_limit_y is a single float value, the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie in the range [0, 1]. Default: None.

p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, keypoints

Image types: uint8, float32