Geometric transforms (augmentations.geometric.transforms)
class albumentations.augmentations.geometric.transforms.Affine (scale=None, translate_percent=None, translate_px=None, rotate=None, shear=None, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode=0, fit_output=False, keep_ratio=False, rotate_method='largest_box', always_apply=False, p=0.5)
Augmentation to apply affine transformations to images. This is mostly a wrapper around the corresponding classes and functions in OpenCV.
Affine transformations involve:
- Translation ("move" the image along the x-/y-axis)
- Rotation
- Scaling ("zoom" in/out)
- Shear (move one side of the image, turning a square into a trapezoid)
All such transformations can create "new" pixels in the image without defined content, e.g. if the image is translated to the left, pixels are created on the right. A method has to be defined to deal with these pixel values; the parameters cval and mode of this class handle this.
Some transformations involve interpolation between several pixels of the input image to generate output pixel values. The parameters interpolation and mask_interpolation deal with the method of interpolation used for this.
Parameters:
Name | Type | Description |
---|---|---|
scale | number, tuple of number or dict | Scaling factor to use, where 1.0 denotes "no change" and 0.5 is zoomed out to 50 percent of the original size. |
translate_percent | None, number, tuple of number or dict | Translation as a fraction of the image height/width (x-translation, y-translation), where 0 denotes "no change" and 0.5 denotes "half of the axis size". |
translate_px | None, int, tuple of int or dict | Translation in pixels. |
rotate | number or tuple of number | Rotation in degrees (NOT radians), i.e. the expected value range is around [-360, 360]. |
shear | number, tuple of number or dict | Shear in degrees (NOT radians), i.e. the expected value range is around [-360, 360]. |
interpolation | int | OpenCV interpolation flag. |
mask_interpolation | int | OpenCV interpolation flag; used for masks. |
cval | number or sequence of number | The constant value to use when filling in newly created pixels. (E.g. translating by 1px to the right will create a new 1px-wide column of pixels on the left of the image.) The value is only used when mode=cv2.BORDER_CONSTANT. |
cval_mask | number or tuple of number | Same as cval but only for masks. |
mode | int | OpenCV border flag. |
fit_output | bool | If True, the image plane size and position will be adjusted to tightly capture the whole image after affine transformation (translate_percent and translate_px are ignored). Default: False. |
keep_ratio | bool | When True, the original aspect ratio will be kept when the random scale is applied. Default: False. |
rotate_method | str | Rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse" [1]. Default: "largest_box". |
p | float | Probability of applying the transform. Default: 0.5. |
Targets: image, mask, keypoints, bboxes
Image types: uint8, float32
Reference: [1] https://arxiv.org/abs/2109.13488
class albumentations.augmentations.geometric.transforms.ElasticTransform (alpha=1, sigma=50, alpha_affine=50, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, approximate=False, same_dxdy=False, p=0.5)
Elastic deformation of images as described in [Simard2003] (with modifications). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5
[Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.
Parameters:
Name | Type | Description |
---|---|---|
alpha | float | |
sigma | float | Gaussian filter parameter. |
alpha_affine | float | The range will be (-alpha_affine, alpha_affine). |
interpolation | OpenCV flag | Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
border_mode | OpenCV flag | Flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101. |
value | int, float, list of int, list of float | Padding value if border_mode is cv2.BORDER_CONSTANT. |
mask_value | int, float, list of int, list of float | Padding value if border_mode is cv2.BORDER_CONSTANT, applied to masks. |
approximate | bool | Whether to smooth the displacement map with a fixed kernel size. Enabling this option gives ~2X speedup on large images. |
same_dxdy | bool | Whether to use the same randomly generated shift for x and y. Enabling this option gives ~2X speedup. |
Targets: image, mask, bbox
Image types: uint8, float32
class albumentations.augmentations.geometric.transforms.Flip
Flip the input either horizontally, vertically or both horizontally and vertically.
Parameters:
Name | Type | Description |
---|---|---|
p |
float |
probability of applying the transform. Default: 0.5. |
Targets: image, mask, bboxes, keypoints
Image types: uint8, float32
albumentations.augmentations.geometric.transforms.Flip.apply (self, img, d=0, **params)
d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping, -1 for both vertical and horizontal flipping (which can also be seen as rotating the input by 180 degrees).
class albumentations.augmentations.geometric.transforms.GridDistortion (num_steps=5, distort_limit=0.3, interpolation=1, border_mode=4, value=None, mask_value=None, normalized=False, always_apply=False, p=0.5)
Parameters:
Name | Type | Description |
---|---|---|
num_steps | int | Count of grid cells on each side. |
distort_limit | float, [float, float] | If distort_limit is a single float, the range will be (-distort_limit, distort_limit). Default: (-0.03, 0.03). |
interpolation | OpenCV flag | Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
border_mode | OpenCV flag | Flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101. |
value | int, float, list of int, list of float | Padding value if border_mode is cv2.BORDER_CONSTANT. |
mask_value | int, float, list of int, list of float | Padding value if border_mode is cv2.BORDER_CONSTANT, applied to masks. |
normalized | bool | If True, the distortion is normalized so that it does not go outside the image. Default: False. See https://github.com/albumentations-team/albumentations/pull/722 for more information. |
Targets: image, mask
Image types: uint8, float32
class albumentations.augmentations.geometric.transforms.HorizontalFlip
Flip the input horizontally around the y-axis.
Parameters:
Name | Type | Description |
---|---|---|
p |
float |
probability of applying the transform. Default: 0.5. |
Targets: image, mask, bboxes, keypoints
Image types: uint8, float32
class albumentations.augmentations.geometric.transforms.OpticalDistortion (distort_limit=0.05, shift_limit=0.05, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, p=0.5)
Parameters:
Name | Type | Description |
---|---|---|
distort_limit | float, [float, float] | If distort_limit is a single float, the range will be (-distort_limit, distort_limit). Default: (-0.05, 0.05). |
shift_limit | float, [float, float] | If shift_limit is a single float, the range will be (-shift_limit, shift_limit). Default: (-0.05, 0.05). |
interpolation | OpenCV flag | Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
border_mode | OpenCV flag | Flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101. |
value | int, float, list of int, list of float | Padding value if border_mode is cv2.BORDER_CONSTANT. |
mask_value | int, float, list of int, list of float | Padding value if border_mode is cv2.BORDER_CONSTANT, applied to masks. |
Targets: image, mask, bbox
Image types: uint8, float32
class albumentations.augmentations.geometric.transforms.PadIfNeeded (min_height=1024, min_width=1024, pad_height_divisor=None, pad_width_divisor=None, position=<PositionType.CENTER: 'center'>, border_mode=4, value=None, mask_value=None, always_apply=False, p=1.0)
Pad the sides of the image if its size is less than the desired minimum, or until it is divisible by the given divisors.
Parameters:
Name | Type | Description |
---|---|---|
min_height | int | Minimal result image height. |
min_width | int | Minimal result image width. |
pad_height_divisor | int | If not None, ensures the image height is divisible by this value. |
pad_width_divisor | int | If not None, ensures the image width is divisible by this value. |
position | Union[str, PositionType] | Position of the image. Should be one of PositionType.CENTER, PositionType.TOP_LEFT, PositionType.TOP_RIGHT, PositionType.BOTTOM_LEFT, PositionType.BOTTOM_RIGHT or PositionType.RANDOM. Default: PositionType.CENTER. |
border_mode | OpenCV flag | OpenCV border mode. |
value | int, float, list of int, list of float | Padding value if border_mode is cv2.BORDER_CONSTANT. |
mask_value | int, float, list of int, list of float | Padding value for mask if border_mode is cv2.BORDER_CONSTANT. |
p | float | Probability of applying the transform. Default: 1.0. |
Targets: image, mask, bbox, keypoints
Image types: uint8, float32
class albumentations.augmentations.geometric.transforms.Perspective (scale=(0.05, 0.1), keep_size=True, pad_mode=0, pad_val=0, mask_pad_val=0, fit_output=False, interpolation=1, always_apply=False, p=0.5)
Perform a random four point perspective transform of the input.
Parameters:
Name | Type | Description |
---|---|---|
scale | float or [float, float] | Standard deviation of the normal distributions. These are used to sample the random distances of the subimage's corners from the full image's corners. If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1). |
keep_size | bool | Whether to resize images back to their original size after applying the perspective transform. If set to False, the resulting images may end up having different shapes and will always be a list, never an array. Default: True. |
pad_mode | OpenCV flag | OpenCV border mode. |
pad_val | int, float, list of int, list of float | Padding value if border_mode is cv2.BORDER_CONSTANT. Default: 0. |
mask_pad_val | int, float, list of int, list of float | Padding value for mask if border_mode is cv2.BORDER_CONSTANT. Default: 0. |
fit_output | bool | If True, the image plane size and position will be adjusted to still capture the whole image after perspective transformation. (Followed by image resizing if keep_size is set to True.) Otherwise, parts of the transformed image may be outside of the image plane. This setting should not be set to True when using large scale values, as it could lead to very large images. Default: False. |
p | float | Probability of applying the transform. Default: 0.5. |
Targets: image, mask, keypoints, bboxes
Image types: uint8, float32
class albumentations.augmentations.geometric.transforms.PiecewiseAffine (scale=(0.03, 0.05), nb_rows=4, nb_cols=4, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode='constant', absolute_scale=False, always_apply=False, keypoints_threshold=0.01, p=0.5)
Apply affine transformations that differ between local neighbourhoods. This augmentation places a regular grid of points on an image and randomly moves the neighbourhood of these points around via affine transformations. This leads to local distortions.
This is mostly a wrapper around scikit-image's PiecewiseAffine. See also Affine for a similar technique.
Note: This augmenter is very slow. Try to use ElasticTransform instead, which is at least 10x faster.
Note: For coordinate-based inputs (keypoints, bounding boxes, polygons, ...), this augmenter still has to perform an image-based augmentation, which makes it significantly slower for such inputs than other transforms, and not fully correct.
Parameters:
Name | Type | Description |
---|---|---|
scale | float, tuple of float | Each point on the regular grid is moved around via a normal distribution. This scale factor is equivalent to the normal distribution's sigma. Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of the image if absolute_scale=False (default). |
nb_rows | int, tuple of int | Number of rows of points that the regular grid should have. Must be at least 2. |
nb_cols | int, tuple of int | Number of columns. Analogous to nb_rows. |
interpolation | int | The order of interpolation. The order has to be in the range 0-5: 0: Nearest-neighbor; 1: Bi-linear (default); 2: Bi-quadratic; 3: Bi-cubic; 4: Bi-quartic; 5: Bi-quintic. |
mask_interpolation | int | Same as interpolation but for masks. |
cval | number | The constant value to use when filling in newly created pixels. |
cval_mask | number | Same as cval but only for masks. |
mode | str | {'constant', 'edge', 'symmetric', 'reflect', 'wrap'}, optional. Points outside the boundaries of the input are filled according to the given mode. Modes match the behaviour of numpy.pad. |
absolute_scale | bool | Take scale as an absolute value rather than a relative value. Default: False. |
keypoints_threshold | float | Used as threshold in conversion from distance maps to keypoints. The search for keypoints works by searching for the argmin (non-inverted) or argmax (inverted) in each channel. This parameter contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit as a keypoint. Default: 0.01. |
Targets: image, mask, keypoints, bboxes
Image types: uint8, float32
class albumentations.augmentations.geometric.transforms.ShiftScaleRotate (shift_limit=0.0625, scale_limit=0.1, rotate_limit=45, interpolation=1, border_mode=4, value=None, mask_value=None, shift_limit_x=None, shift_limit_y=None, rotate_method='largest_box', always_apply=False, p=0.5)
Randomly apply affine transforms: translate, scale and rotate the input.
Parameters:
Name | Type | Description |
---|---|---|
shift_limit | [float, float] or float | Shift factor range for both height and width. If shift_limit is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and upper bounds should lie in range [0, 1]. Default: (-0.0625, 0.0625). |
scale_limit | [float, float] or float | Scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1. If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high). Default: (-0.1, 0.1). |
rotate_limit | [int, int] or int | Rotation range. If rotate_limit is a single int value, the range will be (-rotate_limit, rotate_limit). Default: (-45, 45). |
interpolation | OpenCV flag | Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
border_mode | OpenCV flag | Flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101. |
value | int, float, list of int, list of float | Padding value if border_mode is cv2.BORDER_CONSTANT. |
mask_value | int, float, list of int, list of float | Padding value if border_mode is cv2.BORDER_CONSTANT, applied to masks. |
shift_limit_x | [float, float] or float | Shift factor range for width. If it is set, then this value instead of shift_limit will be used for shifting width. If shift_limit_x is a single float value, the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in the range [0, 1]. Default: None. |
shift_limit_y | [float, float] or float | Shift factor range for height. If it is set, then this value instead of shift_limit will be used for shifting height. If shift_limit_y is a single float value, the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie in the range [0, 1]. Default: None. |
rotate_method | str | Rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse". Default: "largest_box". |
p | float | Probability of applying the transform. Default: 0.5. |
Targets: image, mask, keypoints
Image types: uint8, float32
class albumentations.augmentations.geometric.transforms.Transpose
Transpose the input by swapping rows and columns.
Parameters:
Name | Type | Description |
---|---|---|
p |
float |
probability of applying the transform. Default: 0.5. |
Targets: image, mask, bboxes, keypoints
Image types: uint8, float32
class albumentations.augmentations.geometric.transforms.VerticalFlip
Flip the input vertically around the x-axis.
Parameters:
Name | Type | Description |
---|---|---|
p |
float |
probability of applying the transform. Default: 0.5. |
Targets: image, mask, bboxes, keypoints
Image types: uint8, float32