Geometric transforms (augmentations.geometric.transforms)
class albumentations.augmentations.geometric.transforms.Affine(scale=None, translate_percent=None, translate_px=None, rotate=None, shear=None, interpolation=1, cval=0, cval_mask=0, mode=0, fit_output=False, always_apply=False, p=0.5)
Augmentation to apply affine transformations to images. This is mostly a wrapper around the corresponding classes and functions in OpenCV.
Affine transformations involve:
 Translation ("move" image along the x/y-axis)
 Rotation
 Scaling ("zoom" in/out)
 Shear (move one side of the image, turning a square into a parallelogram)
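The four components listed above can be combined into a single 2x3 matrix. The following is a minimal sketch in plain Python, not the code this class runs: the helper names `affine_matrix` and `apply_affine`, the order of operations (shear, then scale, then rotation, then translation) and the sign conventions are all assumptions made for illustration.

```python
import math

def affine_matrix(scale=1.0, rotate_deg=0.0, shear_x_deg=0.0, tx=0.0, ty=0.0):
    """Return a 2x3 matrix [[a, b, tx], [c, d, ty]].

    Assumed order (illustrative only): shear, then scale, then rotation,
    then translation. Angles are given in degrees.
    """
    theta = math.radians(rotate_deg)
    k = math.tan(math.radians(shear_x_deg))  # horizontal shear factor
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Linear part = scale * Rotation @ Shear, with Shear = [[1, k], [0, 1]]
    a = scale * cos_t
    b = scale * (cos_t * k - sin_t)
    c = scale * sin_t
    d = scale * (sin_t * k + cos_t)
    return [[a, b, tx], [c, d, ty]]

def apply_affine(m, point):
    """Apply the 2x3 matrix m to a single (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Corners of a unit square, before and after a 45-degree horizontal shear.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
sheared = [apply_affine(affine_matrix(shear_x_deg=45), p) for p in square]
```

In this sketch the bottom edge of the square stays where it is while the top edge slides sideways, so opposite sides remain parallel.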
All such transformations can create "new" pixels in the image without defined content, e.g. if the image is translated to the left, pixels are created on the right.
A method has to be defined to deal with these pixel values.
The parameters cval and mode of this class deal with this.
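To make the role of a fill value and a border mode concrete, here is a small sketch on a single row of pixels. It is only an illustration: the helper `translate_row` and the string mode names "constant" and "replicate" are hypothetical stand-ins (the class itself takes an OpenCV border flag), chosen to mirror the behaviour usually associated with cv2.BORDER_CONSTANT and cv2.BORDER_REPLICATE.

```python
def translate_row(row, shift, mode="constant", cval=0):
    """Shift a 1-D row of pixels by `shift` positions (positive = right).

    Output positions with no source pixel are filled with `cval`
    ("constant") or with the nearest edge pixel ("replicate").
    """
    n = len(row)
    out = []
    for i in range(n):
        src = i - shift  # input index that lands on output position i
        if 0 <= src < n:
            out.append(row[src])
        elif mode == "constant":
            out.append(cval)
        elif mode == "replicate":
            out.append(row[0] if src < 0 else row[-1])
        else:
            raise ValueError(f"unsupported mode: {mode}")
    return out

row = [10, 20, 30, 40]
# Translating to the left creates one undefined pixel on the right.
left_constant = translate_row(row, -1, mode="constant", cval=0)
left_replicate = translate_row(row, -1, mode="replicate")
```

With a constant fill the new pixel takes the value of `cval`; with edge replication it copies its nearest neighbour, and `cval` is ignored.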
Some transformations involve interpolations between several pixels
of the input image to generate output pixel values. The parameter interpolation
deals with the method of interpolation used for this.
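As a rough illustration of what interpolating between pixels means, the sketch below samples a row of pixels at a fractional position in two ways. The helpers `sample_nearest` and `sample_linear` are hypothetical and one-dimensional; they are meant to convey the idea behind nearest-neighbour and linear interpolation, not to reproduce OpenCV's exact results.

```python
import math

def sample_nearest(row, x):
    """Return the value of the pixel whose index is closest to x."""
    idx = min(max(int(math.floor(x + 0.5)), 0), len(row) - 1)
    return row[idx]

def sample_linear(row, x):
    """Blend the two neighbouring pixels, weighted by their distance to x."""
    x = min(max(x, 0.0), len(row) - 1.0)
    left = int(math.floor(x))
    right = min(left + 1, len(row) - 1)
    t = x - left  # fractional distance from the left neighbour
    return (1.0 - t) * row[left] + t * row[right]

row = [0, 100, 200]
nearest = sample_nearest(row, 0.25)  # picks an existing pixel value
linear = sample_linear(row, 0.25)    # mixes pixels 0 and 1
```

Nearest-neighbour sampling never produces values that were absent from the input, which is why it is commonly preferred for label masks, while linear sampling gives smoother images.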
Parameters:
Name  Type  Description 

scale 
number, tuple of number or dict 
Scaling factor to use, where 
translate_percent 
None, number, tuple of number or dict 
Translation as a fraction of the image height/width
(x-translation, y-translation), where 
translate_px 
None, int, tuple of int or dict 
Translation in pixels.
* If 
rotate 
number or tuple of number 
Rotation in degrees (NOT radians), i.e. expected value range is
around 
shear 
number, tuple of number or dict 
Shear in degrees (NOT radians), i.e. expected value range is
around 
interpolation 
int 
OpenCV interpolation flag. 
mask_interpolation 
int 
OpenCV interpolation flag. 
cval 
number or sequence of number 
The constant value to use when filling in newly created pixels.
(E.g. translating by 1px to the right will create a new 1px-wide column of pixels
on the left of the image).
The value is only used when 
cval_mask 
number or tuple of number 
Same as cval but only for masks. 
mode 
int 
OpenCV border flag. 
fit_output 
bool 
Whether to modify the affine transformation so that the whole output image is always
contained in the image plane ( 
p 
float 
probability of applying the transform. Default: 0.5. 
Targets: image, mask, keypoints, bboxes
Image types: uint8, float32
class albumentations.augmentations.geometric.transforms.ElasticTransform(alpha=1, sigma=50, alpha_affine=50, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, approximate=False, p=0.5)
Elastic deformation of images as described in [Simard2003] (with modifications). Based on https://gist.github.com/erniejunior/601cdf56d2b424757de5
[Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.
Parameters:
Name  Type  Description 

alpha 
float 

sigma 
float 
Gaussian filter parameter. 
alpha_affine 
float 
The range will be (-alpha_affine, alpha_affine) 
interpolation 
OpenCV flag 
flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. 
border_mode 
OpenCV flag 
flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101 
value 
int, float, list of ints, list of float 
padding value if border_mode is cv2.BORDER_CONSTANT. 
mask_value 
int, float, list of ints, list of float 
padding value if border_mode is cv2.BORDER_CONSTANT applied for masks. 
approximate 
boolean 
Whether to smooth displacement map with fixed kernel size. Enabling this option gives ~2X speedup on large images. 
Targets: image, mask
Image types: uint8, float32
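The idea behind the elastic deformation in the cited paper can be sketched in one dimension: draw random offsets in [-1, 1], smooth them with a Gaussian of width sigma, and scale the result by alpha. The code below is a simplified illustration under those assumptions. The helper names are hypothetical, the affine part controlled by alpha_affine is left out, and this is not the library's implementation.

```python
import math
import random

def gaussian_kernel(sigma, radius=None):
    """Return a normalised 1-D Gaussian kernel."""
    if radius is None:
        radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(values, kernel):
    """Convolve `values` with `kernel`, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out

def elastic_displacement(length, alpha, sigma, rng):
    """Uniform noise in [-1, 1], Gaussian-smoothed, then scaled by alpha."""
    noise = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    return [alpha * v for v in smooth(noise, gaussian_kernel(sigma))]

rng = random.Random(0)
displacement = elastic_displacement(length=32, alpha=5.0, sigma=2.0, rng=rng)
```

In this sketch a larger sigma makes neighbouring offsets more alike, so the deformation looks smoother, while alpha bounds how far any pixel can move.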
class albumentations.augmentations.geometric.transforms.Perspective(scale=(0.05, 0.1), keep_size=True, pad_mode=0, pad_val=0, mask_pad_val=0, fit_output=False, interpolation=1, always_apply=False, p=0.5)
Perform a random four point perspective transform of the input.
Parameters:
Name  Type  Description 

scale 
float or [float, float] 
standard deviation of the normal distributions. These are used to sample the random distances of the sub-image's corners from the full image's corners. If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1). 
keep_size 
bool 
Whether to resize images back to their original size after applying the perspective transform. If set to False, the resulting images may end up having different shapes and will always be a list, never an array. Default: True 
pad_mode 
OpenCV flag 
OpenCV border mode. 
pad_val 
int, float, list of int, list of float 
padding value if pad_mode is cv2.BORDER_CONSTANT. Default: 0 
mask_pad_val 
int, float, list of int, list of float 
padding value for mask if pad_mode is cv2.BORDER_CONSTANT. Default: 0 
fit_output 
bool 
If True, the image plane size and position will be adjusted to still capture the whole image after perspective transformation. (Followed by image resizing if keep_size is set to True.) Otherwise, parts of the transformed image may be outside of the image plane. This setting should not be set to True when using large scale values as it could lead to very large images. Default: False 
p 
float 
probability of applying the transform. Default: 0.5. 
Targets: image, mask, keypoints, bboxes
Image types: uint8, float32
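The description of scale suggests that each corner of the output quadrilateral lies at a random distance from the matching image corner. The sketch below shows one way such corners could be sampled. The helper `jittered_corners`, the use of absolute values, the clipping to half the image size and the sampling order are all assumptions for illustration, and the step that turns the four points into a perspective warp is omitted.

```python
import random

def jittered_corners(width, height, scale=(0.05, 0.1), rng=None):
    """Return four corner points, each moved inward by a random distance.

    Assumed procedure (illustrative only): draw one standard deviation from
    the `scale` range, then draw each distance as |N(0, std)|, expressed as
    a fraction of the image width or height.
    """
    rng = rng or random.Random()
    std = rng.uniform(*scale)
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    result = []
    for cx, cy in corners:
        # Clip the fraction so a corner never crosses the image centre.
        dx = min(abs(rng.gauss(0.0, std)), 0.5) * width
        dy = min(abs(rng.gauss(0.0, std)), 0.5) * height
        x = cx + dx if cx == 0 else cx - dx
        y = cy + dy if cy == 0 else cy - dy
        result.append((x, y))
    return result

points = jittered_corners(640, 480, rng=random.Random(42))
```

Larger scale values push the corners further inward in this sketch, which corresponds to a stronger perspective effect.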
class albumentations.augmentations.geometric.transforms.PiecewiseAffine(scale=(0.03, 0.05), nb_rows=4, nb_cols=4, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode='constant', absolute_scale=False, always_apply=False, keypoints_threshold=0.01, p=0.5)
Apply affine transformations that differ between local neighbourhoods. This augmentation places a regular grid of points on an image and randomly moves the neighbourhood of these points around via affine transformations. This leads to local distortions.
This is mostly a wrapper around scikit-image's PiecewiseAffine.
See also Affine for a similar technique.
Note: This augmenter is very slow. Try to use ElasticTransform instead, which is at least 10x faster.
Note: For coordinate-based inputs (keypoints, bounding boxes, polygons, ...), this augmenter still has to perform an image-based augmentation, which makes it significantly slower than other transforms and not fully correct for such inputs.
Parameters:
Name  Type  Description 

scale 
float, tuple of float 
Each point on the regular grid is moved around via a normal distribution.
This scale factor is equivalent to the normal distribution's sigma.
Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of
the image if 
nb_rows 
int, tuple of int 
Number of rows of points that the regular grid should have.
Must be at least 
nb_cols 
int, tuple of int 
Number of columns. Analogous to 
interpolation 
int 
The order of interpolation. The order has to be in the range 0-5:
 0: Nearest-neighbor
 1: Bilinear (default)
 2: Biquadratic
 3: Bicubic
 4: Biquartic
 5: Biquintic
mask_interpolation 
int 
same as interpolation but for mask. 
cval 
number 
The constant value to use when filling in newly created pixels. 
cval_mask 
number 
Same as cval but only for masks. 
mode 
str 
{'constant', 'edge', 'symmetric', 'reflect', 'wrap'}, optional
Points outside the boundaries of the input are filled according
to the given mode. Modes match the behaviour of 
absolute_scale 
bool 
Take 
keypoints_threshold 
float 
Used as threshold in conversion from distance maps to keypoints.
The search for keypoints works by searching for the
argmin (non-inverted) or argmax (inverted) in each channel. This
parameter contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit
as a keypoint. Use 
Targets: image, mask, keypoints, bboxes
Image types: uint8, float32
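The grid-and-jitter idea described for this transform can be sketched without any image library. In the code below, `jittered_grid` is a hypothetical helper, and treating the jitter as a fraction of the image size whenever absolute_scale is False is an assumption. The actual warping between the source and destination points is not shown.

```python
import random

def jittered_grid(width, height, nb_rows=4, nb_cols=4, scale=0.04,
                  absolute_scale=False, rng=None):
    """Return (source, destination) lists of grid points.

    Source points form a regular nb_rows x nb_cols grid (this sketch needs
    at least 2 rows and 2 columns). Each destination point is its source
    point plus normally distributed jitter with sigma = scale.
    """
    rng = rng or random.Random()
    xs = [c * (width - 1) / (nb_cols - 1) for c in range(nb_cols)]
    ys = [r * (height - 1) / (nb_rows - 1) for r in range(nb_rows)]
    source, destination = [], []
    for y in ys:
        for x in xs:
            jx, jy = rng.gauss(0.0, scale), rng.gauss(0.0, scale)
            if not absolute_scale:
                # Assumed: relative jitter is a fraction of the image size.
                jx, jy = jx * width, jy * height
            source.append((x, y))
            destination.append((x + jx, y + jy))
    return source, destination

src, dst = jittered_grid(100, 80, rng=random.Random(1))
```

More rows and columns give finer, more local distortions in this sketch, at the cost of more grid cells to warp.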
class albumentations.augmentations.geometric.transforms.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1, rotate_limit=45, interpolation=1, border_mode=4, value=None, mask_value=None, shift_limit_x=None, shift_limit_y=None, always_apply=False, p=0.5)
Randomly apply affine transforms: translate, scale and rotate the input.
Parameters:
Name  Type  Description 

shift_limit 
[float, float] or float 
shift factor range for both height and width. If shift_limit is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and upper bounds should lie in range [0, 1]. Default: (-0.0625, 0.0625). 
scale_limit 
[float, float] or float 
scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Default: (-0.1, 0.1). 
rotate_limit 
[int, int] or int 
rotation range. If rotate_limit is a single int value, the range will be (-rotate_limit, rotate_limit). Default: (-45, 45). 
interpolation 
OpenCV flag 
flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. 
border_mode 
OpenCV flag 
flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101 
value 
int, float, list of int, list of float 
padding value if border_mode is cv2.BORDER_CONSTANT. 
mask_value 
int, float, list of int, list of float 
padding value if border_mode is cv2.BORDER_CONSTANT applied for masks. 
shift_limit_x 
[float, float] or float 
shift factor range for width. If it is set then this value instead of shift_limit will be used for shifting width. If shift_limit_x is a single float value, the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in the range [0, 1]. Default: None. 
shift_limit_y 
[float, float] or float 
shift factor range for height. If it is set then this value instead of shift_limit will be used for shifting height. If shift_limit_y is a single float value, the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie in the range [0, 1]. Default: None. 
p 
float 
probability of applying the transform. Default: 0.5. 
Targets: image, mask, keypoints
Image types: uint8, float32
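The rule that a single number is expanded to a symmetric range can be sketched as follows. The helpers `to_range` and `sample_params` are hypothetical, and centring the sampled scale factor on 1.0 (so that 1.0 means "no scaling") is an assumption about how the scale range is interpreted, not something stated in the table above.

```python
import random

def to_range(limit, bias=0.0):
    """Expand a scalar limit to (-limit, limit); shift both ends by `bias`."""
    if isinstance(limit, (int, float)):
        low, high = -abs(limit), abs(limit)
    else:
        low, high = min(limit), max(limit)
    return (low + bias, high + bias)

def sample_params(shift_limit=0.0625, scale_limit=0.1, rotate_limit=45, rng=None):
    """Draw one set of shift, scale and rotation parameters."""
    rng = rng or random.Random()
    return {
        "dx": rng.uniform(*to_range(shift_limit)),   # fraction of the width
        "dy": rng.uniform(*to_range(shift_limit)),   # fraction of the height
        # Assumed: the scale factor is centred on 1.0 (no scaling).
        "scale": rng.uniform(*to_range(scale_limit, bias=1.0)),
        "angle": rng.uniform(*to_range(rotate_limit)),  # degrees
    }

params = sample_params(rng=random.Random(7))
```

Passing a two-element sequence instead of a scalar keeps the range as given in this sketch, which allows asymmetric limits such as (-10, 30) degrees.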