Full API Reference on a single page

Pixel-level transforms

Here is a list of all available pixel-level transforms. You can apply a pixel-level transform to any target, and under the hood, the transform will change only the input image and return any other input targets such as masks, bounding boxes, or keypoints unchanged.

Spatial-level transforms

Here is a table with spatial-level transforms and targets they support. If you try to apply a spatial-level transform to an unsupported target, Albumentations will raise an error.

Transform Image Masks BBoxes Keypoints
Affine
BBoxSafeRandomCrop
CenterCrop
CoarseDropout
Crop
CropAndPad
CropNonEmptyMaskIfExists
ElasticTransform
Flip
GridDistortion
GridDropout
HorizontalFlip
Lambda
LongestMaxSize
MaskDropout
NoOp
OpticalDistortion
PadIfNeeded
Perspective
PiecewiseAffine
PixelDropout
RandomCrop
RandomCropFromBorders
RandomCropNearBBox
RandomGridShuffle
RandomResizedCrop
RandomRotate90
RandomScale
RandomSizedBBoxSafeCrop
RandomSizedCrop
Resize
Rotate
SafeRotate
ShiftScaleRotate
SmallestMaxSize
Transpose
VerticalFlip

albumentations.augmentations special

albumentations.augmentations.blur special

albumentations.augmentations.blur.transforms

class albumentations.augmentations.blur.transforms.AdvancedBlur (blur_limit=(3, 7), sigmaX_limit=(0.2, 1.0), sigmaY_limit=(0.2, 1.0), rotate_limit=90, beta_limit=(0.5, 8.0), noise_limit=(0.9, 1.1), always_apply=False, p=0.5) [view source on GitHub]

Blur the input image using a Generalized Normal filter with randomly selected parameters. This transform also adds multiplicative noise to the generated kernel before convolution.

Parameters:

Name Type Description
blur_limit

maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1. If set to a single value, blur_limit will be in range (0, blur_limit). Default: (3, 7).

sigmaX_limit

Gaussian kernel standard deviation. Must be in range [0, inf). If set to a single value, sigmaX_limit will be in range (0, sigmaX_limit). If set to 0, sigma will be computed as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. Default: (0.2, 1.0).

sigmaY_limit

Same as sigmaX_limit for another dimension.

rotate_limit

Range from which a random angle used to rotate the Gaussian kernel is picked. If the limit is a single int, an angle is picked from (-rotate_limit, rotate_limit). Default: 90, i.e. the range (-90, 90).

beta_limit

Distribution shape parameter, 1 is the normal distribution. Values below 1.0 make distribution tails heavier than normal, values above 1.0 make it lighter than normal. Default: (0.5, 8.0).

noise_limit

Multiplicative factor that controls the strength of kernel noise. Must be positive and preferably centered around 1.0. If set to a single value, noise_limit will be in range (0, noise_limit). Default: (0.9, 1.1).

p float

probability of applying the transform. Default: 0.5.

Reference: https://arxiv.org/abs/2107.10833

Targets: image

Image types: uint8, float32

class albumentations.augmentations.blur.transforms.Blur (blur_limit=7, always_apply=False, p=0.5) [view source on GitHub]

Blur the input image using a random-sized kernel.

Parameters:

Name Type Description
blur_limit int, [int, int]

maximum kernel size for blurring the input image. Should be in range [3, inf). Default: (3, 7).

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.blur.transforms.Defocus (radius=(3, 10), alias_blur=(0.1, 0.5), always_apply=False, p=0.5) [view source on GitHub]

Apply defocus transform. See https://arxiv.org/abs/1903.12261.

Parameters:

Name Type Description
radius [int, int] or int

range for radius of defocusing. If limit is a single int, the range will be [1, limit]. Default: (3, 10).

alias_blur [float, float] or float

range for alias_blur of defocusing (sigma of gaussian blur). If limit is a single float, the range will be (0, limit). Default: (0.1, 0.5).

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: Any

class albumentations.augmentations.blur.transforms.GaussianBlur (blur_limit=(3, 7), sigma_limit=0, always_apply=False, p=0.5) [view source on GitHub]

Blur the input image using a Gaussian filter with a random kernel size.

Parameters:

Name Type Description
blur_limit int, [int, int]

maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1. If set to a single value, blur_limit will be in range (0, blur_limit). Default: (3, 7).

sigma_limit float, [float, float]

Gaussian kernel standard deviation. Must be in range [0, inf). If set to a single value, sigma_limit will be in range (0, sigma_limit). If set to 0, sigma will be computed as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. Default: 0.

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32
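The two fallback formulas quoted in the parameter descriptions above can be transcribed directly. The sketch below is a literal transcription for illustration only; the function names are hypothetical and are not part of the library.

```python
def kernel_size_from_sigma(sigma: float, is_uint8: bool = True) -> int:
    """Kernel size used when blur_limit is 0 (formula as quoted in the docs)."""
    factor = 3 if is_uint8 else 4
    return round(sigma * factor * 2 + 1) + 1


def sigma_from_kernel_size(ksize: int) -> float:
    """Sigma used when sigma_limit is 0 (formula as quoted in the docs)."""
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8


if __name__ == "__main__":
    # A 3x3 kernel maps to sigma = 0.8 under the second formula.
    print(sigma_from_kernel_size(3))
    print(kernel_size_from_sigma(0.5))
```

The second formula gives a sigma that grows linearly with the kernel size, so larger kernels produce proportionally stronger smoothing when no explicit sigma is given.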

class albumentations.augmentations.blur.transforms.GlassBlur (sigma=0.7, max_delta=4, iterations=2, always_apply=False, mode='fast', p=0.5) [view source on GitHub]

Apply glass noise to the input image.

Parameters:

Name Type Description
sigma float

standard deviation for Gaussian kernel.

max_delta int

max distance between pixels which are swapped.

iterations int

number of repeats. Should be in range [1, inf). Default: 2.

mode str

mode of computation: fast or exact. Default: "fast".

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

Reference: | https://arxiv.org/abs/1903.12261 | https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

class albumentations.augmentations.blur.transforms.MedianBlur (blur_limit=7, always_apply=False, p=0.5) [view source on GitHub]

Blur the input image using a median filter with a random aperture linear size.

Parameters:

Name Type Description
blur_limit int

maximum aperture linear size for blurring the input image. Must be odd and in range [3, inf). Default: (3, 7).

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32
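Because the aperture size must be odd and at least 3, one plausible way to draw it is to sample only from the odd integers in the range. This is a minimal sketch under that assumption; `sample_odd_kernel_size` is a hypothetical helper, not a library function.

```python
import random


def sample_odd_kernel_size(low: int = 3, high: int = 7, rng=random) -> int:
    """Pick an odd kernel size from [low, high], both ends inclusive."""
    if low < 3 or low % 2 == 0:
        raise ValueError("low must be odd and >= 3")
    if high < low:
        raise ValueError("high must be >= low")
    # Stepping by 2 from an odd start visits only odd sizes.
    return rng.choice(range(low, high + 1, 2))


if __name__ == "__main__":
    print(sample_odd_kernel_size(3, 7))
```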

class albumentations.augmentations.blur.transforms.MotionBlur (blur_limit=7, allow_shifted=True, always_apply=False, p=0.5) [view source on GitHub]

Apply motion blur to the input image using a random-sized kernel.

Parameters:

Name Type Description
blur_limit int

maximum kernel size for blurring the input image. Should be in range [3, inf). Default: (3, 7).

allow_shifted bool

if set to True, creates randomly shifted kernels; otherwise creates only non-shifted kernels. Default: True.

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.blur.transforms.ZoomBlur (max_factor=1.31, step_factor=(0.01, 0.03), always_apply=False, p=0.5) [view source on GitHub]

Apply zoom blur transform. See https://arxiv.org/abs/1903.12261.

Parameters:

Name Type Description
max_factor [float, float] or float

range for max factor for blurring. If max_factor is a single float, the range will be (1, limit). Default: (1, 1.31). All max_factor values should be larger than 1.

step_factor [float, float] or float

If single float will be used as step parameter for np.arange. If tuple of float step_factor will be in range [step_factor[0], step_factor[1]). Default: (0.01, 0.03). All step_factor values should be positive.

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: Any
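The step_factor description refers to np.arange. As a rough illustration, the zoom levels could be enumerated as below; that the sequence starts at 1.0 and stops before max_factor is an assumption of this sketch, and `zoom_factors` is a hypothetical helper written in plain Python rather than NumPy.

```python
from typing import List


def zoom_factors(max_factor: float, step_factor: float) -> List[float]:
    """Zoom levels 1.0, 1.0 + step, ... strictly below max_factor."""
    if max_factor <= 1:
        raise ValueError("max_factor should be larger than 1")
    if step_factor <= 0:
        raise ValueError("step_factor should be positive")
    factors = []
    i = 0
    # Multiply instead of accumulating to limit floating-point drift.
    while 1.0 + i * step_factor < max_factor:
        factors.append(1.0 + i * step_factor)
        i += 1
    return factors


if __name__ == "__main__":
    print(zoom_factors(1.5, 0.25))
```

A smaller step yields more zoom levels to average over, which gives a smoother blur at a higher computational cost.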

albumentations.augmentations.crops special

albumentations.augmentations.crops.functional

def albumentations.augmentations.crops.functional.bbox_crop (bbox, x_min, y_min, x_max, y_max, rows, cols) [view source on GitHub]

Crop a bounding box.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

x_min int
y_min int
x_max int
y_max int
rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
tuple

A cropped bounding box (x_min, y_min, x_max, y_max).
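A minimal sketch of what cropping a bounding box involves, assuming the box is stored in coordinates normalized to [0, 1] by image width and height (an assumption of this example, not stated above). `bbox_crop_sketch` is a hypothetical name; it is not the library implementation.

```python
def bbox_crop_sketch(bbox, x_min, y_min, x_max, y_max, rows, cols):
    """Re-express a normalized bbox relative to the crop window."""
    crop_width = x_max - x_min
    crop_height = y_max - y_min
    bx1, by1, bx2, by2 = bbox
    # Convert to pixels in the original image.
    px1, py1, px2, py2 = bx1 * cols, by1 * rows, bx2 * cols, by2 * rows
    # Shift into the crop's frame, then normalize by the crop size.
    return (
        (px1 - x_min) / crop_width,
        (py1 - y_min) / crop_height,
        (px2 - x_min) / crop_width,
        (py2 - y_min) / crop_height,
    )


if __name__ == "__main__":
    print(bbox_crop_sketch((0.25, 0.25, 0.75, 0.75), 0, 0, 50, 50, 100, 100))
```

The result can fall outside [0, 1] when the box extends beyond the crop window; clipping or filtering such boxes is a separate step that this sketch does not perform.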

def albumentations.augmentations.crops.functional.crop_bbox_by_coords (bbox, crop_coords, crop_height, crop_width, rows, cols) [view source on GitHub]

Crop a bounding box using the provided coordinates of bottom-left and top-right corners in pixels and the required height and width of the crop.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

crop_coords Tuple[int, int, int, int]

Crop coordinates (x1, y1, x2, y2).

crop_height int
crop_width int
rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
tuple

A cropped bounding box (x_min, y_min, x_max, y_max).

def albumentations.augmentations.crops.functional.crop_keypoint_by_coords (keypoint, crop_coords) [view source on GitHub]

Crop a keypoint using the provided coordinates of bottom-left and top-right corners in pixels.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

crop_coords Tuple[int, int, int, int]

Crop box coords (x1, y1, x2, y2).

Returns:

Type Description

A keypoint (x, y, angle, scale).
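Cropping a keypoint amounts to shifting its position into the crop's coordinate frame. The sketch below assumes crop_coords is ordered (x1, y1, x2, y2) and that angle and scale are unaffected by a pure crop; `crop_keypoint_sketch` is a hypothetical name.

```python
def crop_keypoint_sketch(keypoint, crop_coords):
    """Shift a keypoint so the crop's top-left corner becomes the origin."""
    x, y, angle, scale = keypoint
    x1, y1, _x2, _y2 = crop_coords
    # Only the position changes; a crop neither rotates nor rescales.
    return (x - x1, y - y1, angle, scale)


if __name__ == "__main__":
    print(crop_keypoint_sketch((30, 40, 0.0, 1.0), (10, 20, 60, 80)))
```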

def albumentations.augmentations.crops.functional.keypoint_center_crop (keypoint, crop_height, crop_width, rows, cols) [view source on GitHub]

Keypoint center crop.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

crop_height int

Crop height.

crop_width int

Crop width.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
tuple

A keypoint (x, y, angle, scale).
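For a center crop, the crop window can be derived from the image and crop sizes, after which the keypoint is shifted into the window's frame. This is a sketch assuming integer floor division for the window offset; both function names are hypothetical.

```python
def center_crop_coords(rows, cols, crop_height, crop_width):
    """Return (x1, y1, x2, y2) of a crop centered in a rows x cols image."""
    y1 = (rows - crop_height) // 2
    x1 = (cols - crop_width) // 2
    return x1, y1, x1 + crop_width, y1 + crop_height


def keypoint_center_crop_sketch(keypoint, crop_height, crop_width, rows, cols):
    """Shift a keypoint into the frame of the centered crop window."""
    x, y, angle, scale = keypoint
    x1, y1, _, _ = center_crop_coords(rows, cols, crop_height, crop_width)
    return (x - x1, y - y1, angle, scale)


if __name__ == "__main__":
    print(keypoint_center_crop_sketch((50, 50, 0.0, 1.0), 40, 60, 100, 100))
```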

def albumentations.augmentations.crops.functional.keypoint_random_crop (keypoint, crop_height, crop_width, h_start, w_start, rows, cols) [view source on GitHub]

Keypoint random crop.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

crop_height int

Crop height.

crop_width int

Crop width.

h_start float

Crop height start.

w_start float

Crop width start.

rows int

Image height.

cols int

Image width.

Returns:

Type Description

A keypoint (x, y, angle, scale).
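Here h_start and w_start are floats, which suggests they are fractions in [0, 1] that position the crop window within the available slack. The sketch below works under that assumption; the exact rounding used by the library may differ, and both function names are hypothetical.

```python
def random_crop_coords(rows, cols, crop_height, crop_width, h_start, w_start):
    """Map start fractions in [0, 1] to a crop window (x1, y1, x2, y2)."""
    # Assumption: the fraction scales the slack between image and crop size.
    y1 = int((rows - crop_height) * h_start)
    x1 = int((cols - crop_width) * w_start)
    return x1, y1, x1 + crop_width, y1 + crop_height


def keypoint_random_crop_sketch(keypoint, crop_height, crop_width,
                                h_start, w_start, rows, cols):
    """Shift a keypoint into the frame of the selected crop window."""
    x, y, angle, scale = keypoint
    x1, y1, _, _ = random_crop_coords(
        rows, cols, crop_height, crop_width, h_start, w_start
    )
    return (x - x1, y - y1, angle, scale)


if __name__ == "__main__":
    print(keypoint_random_crop_sketch((60, 40, 0.0, 1.0), 50, 100, 0.5, 0.25, 100, 200))
```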

albumentations.augmentations.crops.transforms

class albumentations.augmentations.crops.transforms.BBoxSafeRandomCrop (erosion_rate=0.0, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input without loss of bboxes.

Parameters:

Name Type Description
erosion_rate float

erosion rate applied on input image height before crop.

p float

probability of applying the transform. Default: 1.

Targets: image, mask, bboxes

Image types: uint8, float32

class albumentations.augmentations.crops.transforms.CenterCrop (height, width, always_apply=False, p=1.0) [view source on GitHub]

Crop the central part of the input.

Parameters:

Name Type Description
height int

height of the crop.

width int

width of the crop.

p float

probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

Note: It is recommended to use uint8 images as input. Otherwise the operation will require internal conversion float32 -> uint8 -> float32 that causes worse performance.

class albumentations.augmentations.crops.transforms.Crop (x_min=0, y_min=0, x_max=1024, y_max=1024, always_apply=False, p=1.0) [view source on GitHub]

Crop region from image.

Parameters:

Name Type Description
x_min int

Minimum upper left x coordinate.

y_min int

Minimum upper left y coordinate.

x_max int

Maximum lower right x coordinate.

y_max int

Maximum lower right y coordinate.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.crops.transforms.CropAndPad (px=None, percent=None, pad_mode=0, pad_cval=0, pad_cval_mask=0, keep_size=True, sample_independently=True, interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Crop and pad images by pixel amounts or fractions of image sizes. Cropping removes pixels at the sides (i.e. extracts a subimage from a given full image). Padding adds pixels to the sides (e.g. black pixels). This transformation will never crop images below a height or width of 1.

Note: This transformation automatically resizes images back to their original size. To deactivate this, add the parameter keep_size=False.

Parameters:

Name Type Description
px int or tuple

The number of pixels to crop (negative values) or pad (positive values) on each side of the image. Either this or the parameter percent may be set, not both at the same time.
* If None, then pixel-based cropping/padding will not be used.
* If int, then that exact number of pixels will always be cropped/padded.
* If a tuple of two ints with values a and b, then each side will be cropped/padded by a random amount sampled uniformly per image and side from the interval [a, b]. If however sample_independently is set to False, only one value will be sampled per image and used for all sides.
* If a tuple of four entries, then the entries represent top, right, bottom, left. Each entry may be a single int (always crop/pad by exactly that value), a tuple of two ints a and b (crop/pad by an amount within [a, b]), or a list of ints (crop/pad by a random value that is contained in the list).

percent float or tuple

The number of pixels to crop (negative values) or pad (positive values) on each side of the image given as a fraction of the image height/width. E.g. if this is set to -0.1, the transformation will always crop away 10% of the image's height at both the top and the bottom (both 10% each), as well as 10% of the width at the right and left. Expected value range is (-1.0, inf). Either this or the parameter px may be set, not both at the same time.
* If None, then fraction-based cropping/padding will not be used.
* If float, then that fraction will always be cropped/padded.
* If a tuple of two floats with values a and b, then each side will be cropped/padded by a random fraction sampled uniformly per image and side from the interval [a, b]. If however sample_independently is set to False, only one value will be sampled per image and used for all sides.
* If a tuple of four entries, then the entries represent top, right, bottom, left. Each entry may be a single float (always crop/pad by exactly that percent value), a tuple of two floats a and b (crop/pad by a fraction from [a, b]), or a list of floats (crop/pad by a random value that is contained in the list).

pad_mode int

OpenCV border mode.

pad_cval number, Sequence[number]

The constant value to use if the pad mode is BORDER_CONSTANT.
* If number, then that value will be used.
* If a tuple of two numbers and at least one of them is a float, then a random number will be uniformly sampled per image from the continuous interval [a, b] and used as the value. If both numbers are ints, the interval is discrete.
* If a list of numbers, then a random value will be chosen from the elements of the list and used as the value.

pad_cval_mask number, Sequence[number]

Same as pad_cval but only for masks.

keep_size bool

After cropping and padding, the result image will usually have a different height/width compared to the original input image. If this parameter is set to True, then the cropped/padded image will be resized to the input image's size, i.e. the output shape is always identical to the input shape.

sample_independently bool

If False and the values for px/percent result in exactly one probability distribution for all image sides, only one single value will be sampled from that probability distribution and used for all sides. I.e. the crop/pad amount then is the same for all sides. If True, four values will be sampled independently, one per side.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

Targets: image, mask, bboxes, keypoints

Image types: any
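The rules for the px parameter can be summarized in a short sketch that resolves a px value into one amount per side. This only illustrates the sampling rules described above; `sample_px` and `_sample_side` are hypothetical helpers, not the library's internals.

```python
import random


def _sample_side(entry, rng):
    """Resolve one side's entry: an int, an (a, b) tuple, or a list of ints."""
    if isinstance(entry, int):
        return entry
    if isinstance(entry, tuple):
        a, b = entry
        return rng.randint(a, b)  # uniform over the discrete interval [a, b]
    if isinstance(entry, list):
        return rng.choice(entry)
    raise TypeError("unsupported entry type")


def sample_px(px, sample_independently=True, rng=random):
    """Return crop/pad amounts as (top, right, bottom, left)."""
    if isinstance(px, int):
        return (px, px, px, px)
    if len(px) == 2:
        a, b = px
        if sample_independently:
            return tuple(rng.randint(a, b) for _ in range(4))
        value = rng.randint(a, b)  # one draw shared by all sides
        return (value, value, value, value)
    if len(px) == 4:
        return tuple(_sample_side(entry, rng) for entry in px)
    raise ValueError("px must be an int, a 2-tuple, or a 4-tuple")


if __name__ == "__main__":
    print(sample_px(-5))
    print(sample_px((0, 10), sample_independently=False))
```

Negative amounts crop and positive amounts pad, so a call such as `sample_px((-10, 10))` can crop some sides and pad others in the same image.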

class albumentations.augmentations.crops.transforms.CropNonEmptyMaskIfExists (height, width, ignore_values=None, ignore_channels=None, always_apply=False, p=1.0) [view source on GitHub]

Crop area with mask if mask is non-empty, else make random crop.

Parameters:

Name Type Description
height int

vertical size of crop in pixels

width int

horizontal size of crop in pixels

ignore_values list of int

values to ignore in mask, 0 values are always ignored (e.g. if background value is 5 set ignore_values=[5] to ignore)

ignore_channels list of int

channels to ignore in mask (e.g. if background is a first channel set ignore_channels=[0] to ignore)

p float

probability of applying the transform. Default: 1.0.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.crops.transforms.RandomCrop (height, width, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input.

Parameters:

Name Type Description
height int

height of the crop.

width int

width of the crop.

p float

probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.crops.transforms.RandomCropFromBorders (crop_left=0.1, crop_right=0.1, crop_top=0.1, crop_bottom=0.1, always_apply=False, p=1.0) [view source on GitHub]

Crop the image by randomly cutting parts from its borders, without resizing at the end.

Parameters:

Name Type Description
crop_left float

single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut

crop_right float

single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut

crop_top float

single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut

crop_bottom float

single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut

p float

probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32
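One plausible reading of the crop_* parameters is that each gives the maximum fraction of the image that may be cut from that border. The sketch below samples a crop window under that assumption; `sample_border_crop` is a hypothetical helper and the library's exact sampling may differ.

```python
import random


def sample_border_crop(rows, cols, crop_left=0.1, crop_right=0.1,
                       crop_top=0.1, crop_bottom=0.1, rng=random):
    """Sample (x_min, y_min, x_max, y_max) by trimming each border randomly."""
    # Left/top: cut between 0 and the given fraction of the image size.
    x_min = rng.randint(0, int(crop_left * cols))
    y_min = rng.randint(0, int(crop_top * rows))
    # Right/bottom: keep at least one pixel between min and max.
    x_max = rng.randint(max(x_min + 1, int((1 - crop_right) * cols)), cols)
    y_max = rng.randint(max(y_min + 1, int((1 - crop_bottom) * rows)), rows)
    return x_min, y_min, x_max, y_max


if __name__ == "__main__":
    print(sample_border_crop(100, 200))
```

Since there is no resize at the end, the output size varies from sample to sample, which matters when batching images downstream.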

class albumentations.augmentations.crops.transforms.RandomCropNearBBox (max_part_shift=(0.3, 0.3), cropping_box_key='cropping_bbox', always_apply=False, p=1.0) [view source on GitHub]

Crop bbox from image with random shift by x,y coordinates

Parameters:

Name Type Description
max_part_shift float, [float, float]

Max shift in height and width dimensions relative to cropping_bbox dimension. If max_part_shift is a single float, the range will be (max_part_shift, max_part_shift). Default (0.3, 0.3).

cropping_box_key str

Additional target key for cropping box. Default cropping_bbox

p float

probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

Examples:

>>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_box_key='test_box')],
...              bbox_params=BboxParams("pascal_voc"))
>>> result = aug(image=image, bboxes=bboxes, test_box=[0, 5, 10, 20])

class albumentations.augmentations.crops.transforms.RandomResizedCrop (height, width, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Torchvision's variant of cropping a random part of the input and rescaling it to some size.

Parameters:

Name Type Description
height int

height after crop and resize.

width int

width after crop and resize.

scale [float, float]

range of the size of the crop relative to the original image size.

ratio [float, float]

range of the aspect ratio of the crop.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32
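The torchvision-style sampling that this transform is modeled on can be sketched as follows: draw a target area and a log-uniform aspect ratio, retry a few times until the box fits, and otherwise fall back to a central crop. The attempt count and the fallback are assumptions borrowed from torchvision's approach, and `sample_resized_crop_box` is a hypothetical helper.

```python
import math
import random


def sample_resized_crop_box(rows, cols, scale=(0.08, 1.0),
                            ratio=(0.75, 4.0 / 3.0), attempts=10, rng=random):
    """Sample a crop box (x1, y1, x2, y2) before resizing to (height, width)."""
    area = rows * cols
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(*log_ratio))  # width / height
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= cols and 0 < h <= rows:
            y1 = rng.randint(0, rows - h)
            x1 = rng.randint(0, cols - w)
            return x1, y1, x1 + w, y1 + h
    # Fallback: largest central crop whose aspect ratio is within bounds.
    in_ratio = cols / rows
    if in_ratio < ratio[0]:
        w, h = cols, int(round(cols / ratio[0]))
    elif in_ratio > ratio[1]:
        w, h = int(round(rows * ratio[1])), rows
    else:
        w, h = cols, rows
    y1, x1 = (rows - h) // 2, (cols - w) // 2
    return x1, y1, x1 + w, y1 + h


if __name__ == "__main__":
    print(sample_resized_crop_box(480, 640))
```

Sampling the aspect ratio in log space makes a ratio of 3/4 as likely as 4/3, so neither tall nor wide crops are favored.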

class albumentations.augmentations.crops.transforms.RandomSizedBBoxSafeCrop (height, width, erosion_rate=0.0, interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input and rescale it to some size without loss of bboxes.

Parameters:

Name Type Description
height int

height after crop and resize.

width int

width after crop and resize.

erosion_rate float

erosion rate applied on input image height before crop.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets: image, mask, bboxes

Image types: uint8, float32

class albumentations.augmentations.crops.transforms.RandomSizedCrop (min_max_height, height, width, w2h_ratio=1.0, interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input and rescale it to some size.

Parameters:

Name Type Description
min_max_height [int, int]

crop size limits.

height int

height after crop and resize.

width int

width after crop and resize.

w2h_ratio float

aspect ratio of crop.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32
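The relationship between min_max_height and w2h_ratio can be illustrated with a short sketch: the crop height is drawn from the given limits and the crop width follows from the aspect ratio. Truncating the width to an integer is an assumption of this example, and `sample_crop_size` is a hypothetical helper.

```python
import random


def sample_crop_size(min_max_height, w2h_ratio=1.0, rng=random):
    """Return (crop_height, crop_width) prior to resizing to (height, width)."""
    crop_height = rng.randint(min_max_height[0], min_max_height[1])
    # Width is tied to height through the requested width-to-height ratio.
    crop_width = int(crop_height * w2h_ratio)
    return crop_height, crop_width


if __name__ == "__main__":
    print(sample_crop_size((50, 50), w2h_ratio=1.5))
```

The sampled crop is then resized to the fixed output height and width, so a w2h_ratio that differs from width / height changes the apparent aspect ratio of the content.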

albumentations.augmentations.domain_adaptation

class albumentations.augmentations.domain_adaptation.FDA (reference_images, beta_limit=0.1, read_fn=read_rgb_image, always_apply=False, p=0.5) [view source on GitHub]

Fourier Domain Adaptation from https://github.com/YanchaoYang/FDA. Simple "style transfer".

Parameters:

Name Type Description
reference_images List[str] or List[np.ndarray]

List of file paths for reference images or list of reference images.

beta_limit float or tuple of float

coefficient beta from the paper. Recommended to be less than 0.3.

read_fn Callable

User-defined function to read an image. The function should take an image path and return a numpy array of image pixels.

Targets: image

Image types: uint8, float32

Reference: https://github.com/YanchaoYang/FDA https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_FDA_Fourier_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2020_paper.pdf

Examples:

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
>>> result = aug(image=image)

class albumentations.augmentations.domain_adaptation.HistogramMatching (reference_images, blend_ratio=(0.5, 1.0), read_fn=read_rgb_image, always_apply=False, p=0.5) [view source on GitHub]

Apply histogram matching. It manipulates the pixels of an input image so that its histogram matches the histogram of the reference image. If the images have multiple channels, the matching is done independently for each channel, as long as the number of channels is equal in the input image and the reference.

Histogram matching can be used as a lightweight normalisation for image processing, such as feature matching, especially in circumstances where the images have been taken from different sources or in different conditions (e.g. lighting).

See: https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_histogram_matching.html

Parameters:

Name Type Description
reference_images List[str] or List[np.ndarray]

List of file paths for reference images or list of reference images.

blend_ratio [float, float]

Tuple of min and max blend ratio. Matched image will be blended with original with random blend factor for increased diversity of generated images.

read_fn Callable

User-defined function to read an image. The function should take an image path and return a numpy array of image pixels.

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, uint16, float32

class albumentations.augmentations.domain_adaptation.PixelDistributionAdaptation (reference_images, blend_ratio=(0.25, 1.0), read_fn=read_rgb_image, transform_type='pca', always_apply=False, p=0.5) [view source on GitHub]

Another naive and quick pixel-level domain adaptation. It fits a simple transform (such as PCA, StandardScaler or MinMaxScaler) on both original and reference image, transforms original image with transform trained on this image and then performs inverse transformation using transform fitted on reference image.

Parameters:

Name Type Description
reference_images List[str] or List[np.ndarray]

List of file paths for reference images or list of reference images.

blend_ratio [float, float]

Tuple of min and max blend ratio. Matched image will be blended with original with random blend factor for increased diversity of generated images.

read_fn Callable

User-defined function to read an image. The function should take an image path and return a numpy array of image pixels. Usually it is the default read_rgb_image when image paths are used as references; otherwise it could be the identity function lambda x: x if the reference images have been read in advance.

transform_type str

type of transform; "pca", "standard", "minmax" are allowed.

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

See also: https://github.com/arsenyinfo/qudida

def albumentations.augmentations.domain_adaptation.fourier_domain_adaptation (img, target_img, beta) [view source on GitHub]

Fourier Domain Adaptation from https://github.com/YanchaoYang/FDA

Parameters:

Name Type Description
img ndarray

source image

target_img ndarray

target image for domain adaptation

beta float

coefficient from source paper

Returns:

Type Description
ndarray

transformed image

albumentations.augmentations.dropout special

albumentations.augmentations.dropout.channel_dropout

class albumentations.augmentations.dropout.channel_dropout.ChannelDropout (channel_drop_range=(1, 1), fill_value=0, always_apply=False, p=0.5) [view source on GitHub]

Randomly Drop Channels in the input Image.

Parameters:

Name Type Description
channel_drop_range [int, int]

range from which we choose the number of channels to drop.

fill_value int, float

pixel value for the dropped channel.

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, uint16, uint32, float32

albumentations.augmentations.dropout.coarse_dropout

class albumentations.augmentations.dropout.coarse_dropout.CoarseDropout (max_holes=8, max_height=8, max_width=8, min_holes=None, min_height=None, min_width=None, fill_value=0, mask_fill_value=None, always_apply=False, p=0.5) [view source on GitHub]

CoarseDropout of the rectangular regions in the image.

Parameters:

Name Type Description
max_holes int

Maximum number of regions to zero out.

max_height int, float

Maximum height of the hole.

max_width int, float

Maximum width of the hole.

min_holes int

Minimum number of regions to zero out. If None, min_holes is set to max_holes. Default: None.

min_height int, float

Minimum height of the hole. If None, min_height is set to max_height. Default: None. If float, it is calculated as a fraction of the image height.

min_width int, float

Minimum width of the hole. If None, min_width is set to max_width. Default: None. If float, it is calculated as a fraction of the image width.

fill_value int, float, list of int, list of float

value for dropped pixels.

mask_fill_value int, float, list of int, list of float

fill value for dropped pixels in mask. If None - mask is not affected. Default: None.

Targets: image, mask, keypoints

Image types: uint8, float32

Reference: | https://arxiv.org/abs/1708.04552 | https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py | https://github.com/aleju/imgaug/blob/master/imgaug/augmenters/arithmetic.py
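The interplay of the min/max parameters can be sketched as a hole sampler: missing minimums default to the maximums, float sizes are read as fractions of the image size, and each hole is placed at a random position. This is an illustration under those assumptions; `sample_holes` is a hypothetical helper, not the library's implementation.

```python
import random


def sample_holes(rows, cols, max_holes=8, max_height=8, max_width=8,
                 min_holes=None, min_height=None, min_width=None, rng=random):
    """Return a list of hole rectangles as (x1, y1, x2, y2) in pixels."""
    min_holes = max_holes if min_holes is None else min_holes
    min_height = max_height if min_height is None else min_height
    min_width = max_width if min_width is None else min_width

    def to_pixels(value, size):
        # Floats are interpreted as a fraction of the image dimension.
        return int(value * size) if isinstance(value, float) else value

    holes = []
    for _ in range(rng.randint(min_holes, max_holes)):
        h = rng.randint(to_pixels(min_height, rows), to_pixels(max_height, rows))
        w = rng.randint(to_pixels(min_width, cols), to_pixels(max_width, cols))
        y1 = rng.randint(0, rows - h)
        x1 = rng.randint(0, cols - w)
        holes.append((x1, y1, x1 + w, y1 + h))
    return holes


if __name__ == "__main__":
    for hole in sample_holes(100, 100, max_holes=3, min_holes=1):
        print(hole)
```

Each rectangle would then be filled with fill_value in the image and, if mask_fill_value is not None, with that value in the mask.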

albumentations.augmentations.dropout.cutout

class albumentations.augmentations.dropout.cutout.Cutout (num_holes=8, max_h_size=8, max_w_size=8, fill_value=0, always_apply=False, p=0.5) [view source on GitHub]

CoarseDropout of the square regions in the image.

Parameters:

Name Type Description
num_holes int

number of regions to zero out

max_h_size int

maximum height of the hole

max_w_size int

maximum width of the hole

fill_value int, float, list of int, list of float

value for dropped pixels.

Targets: image

Image types: uint8, float32

Reference: | https://arxiv.org/abs/1708.04552 | https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py | https://github.com/aleju/imgaug/blob/master/imgaug/augmenters/arithmetic.py

albumentations.augmentations.dropout.grid_dropout

class albumentations.augmentations.dropout.grid_dropout.GridDropout (ratio=0.5, unit_size_min=None, unit_size_max=None, holes_number_x=None, holes_number_y=None, shift_x=0, shift_y=0, random_offset=False, fill_value=0, mask_fill_value=None, always_apply=False, p=0.5) [view source on GitHub]

GridDropout, drops out rectangular regions of an image and the corresponding mask in a grid fashion.

Parameters:

Name Type Description
ratio float

the ratio of the mask holes to the unit_size (same for horizontal and vertical directions). Must be between 0 and 1. Default: 0.5.

unit_size_min int

minimum size of the grid unit. Must be between 2 and the image shorter edge. If 'None', holes_number_x and holes_number_y are used to setup the grid. Default: None.

unit_size_max int

maximum size of the grid unit. Must be between 2 and the image shorter edge. If 'None', holes_number_x and holes_number_y are used to setup the grid. Default: None.

holes_number_x int

the number of grid units in x direction. Must be between 1 and image width//2. If 'None', grid unit width is set as image_width//10. Default: None.

holes_number_y int

the number of grid units in y direction. Must be between 1 and image height//2. If None, grid unit height is set equal to the grid unit width or image height, whichever is smaller.

shift_x int

offsets of the grid start in x direction from (0,0) coordinate. Clipped between 0 and grid unit_width - hole_width. Default: 0.

shift_y int

offsets of the grid start in y direction from (0,0) coordinate. Clipped between 0 and grid unit height - hole_height. Default: 0.

random_offset boolean

whether to offset the grid randomly between 0 and grid unit size - hole size. If 'True', entered shift_x, shift_y are ignored and set randomly. Default: False.

fill_value int

value for the dropped pixels. Default: 0.

mask_fill_value int

value for the dropped pixels in mask. If None, transformation is not applied to the mask. Default: None.

Targets: image, mask

Image types: uint8, float32

References: https://arxiv.org/abs/2001.04086
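The grid layout described by ratio, the unit size, and the shifts can be sketched as follows, assuming a square grid unit for simplicity and a hole side of unit_size * ratio. `grid_holes` is a hypothetical helper that only illustrates the geometry.

```python
def grid_holes(rows, cols, unit_size, ratio=0.5, shift_x=0, shift_y=0):
    """Return hole rectangles (x1, y1, x2, y2) laid out on a regular grid."""
    if unit_size < 2:
        raise ValueError("unit_size must be at least 2")
    # Hole side is a fraction of the grid unit, kept within [1, unit_size - 1].
    hole = min(max(int(unit_size * ratio), 1), unit_size - 1)
    # Shifts are clipped so the first hole stays inside the first grid unit.
    shift_x = min(max(0, shift_x), unit_size - hole)
    shift_y = min(max(0, shift_y), unit_size - hole)
    holes = []
    for y1 in range(shift_y, rows, unit_size):
        for x1 in range(shift_x, cols, unit_size):
            # Clip holes that would run past the image border.
            holes.append((x1, y1, min(x1 + hole, cols), min(y1 + hole, rows)))
    return holes


if __name__ == "__main__":
    for hole in grid_holes(4, 4, unit_size=2):
        print(hole)
```

With ratio=0.5 each hole covers half of the unit's side, so roughly a quarter of the image area is dropped.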

albumentations.augmentations.dropout.mask_dropout

class albumentations.augmentations.dropout.mask_dropout.MaskDropout (max_objects=1, image_fill_value=0, mask_fill_value=0, always_apply=False, p=0.5) [view source on GitHub]

Image & mask augmentation that zeroes out mask and image regions corresponding to randomly chosen object instances from the mask.

Mask must be single-channel image, zero values treated as background. Image can be any number of channels.

Inspired by https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/114254

Parameters:

Name Type Description
max_objects

Maximum number of labels that can be zeroed out. Can be tuple, in this case it's [min, max]

image_fill_value

Fill value to use when filling image. Can be 'inpaint' to apply inpainting (works only for 3-channel images)

mask_fill_value

Fill value to use when filling mask.

Targets: image, mask

Image types: uint8, float32
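
The behaviour described for MaskDropout can be approximated in plain NumPy. The following is a minimal sketch, not the library's implementation: it assumes the mask already stores one positive integer id per object (the real transform may derive instances differently), and the name `mask_dropout_sketch` and its `rng` argument are hypothetical. It always drops `min(max_objects, number of objects)` objects, which is a simplification.

```python
import numpy as np


def mask_dropout_sketch(image, mask, max_objects=1,
                        image_fill_value=0, mask_fill_value=0, rng=None):
    """Fill the regions of up to `max_objects` labelled objects.

    Assumption: `mask` holds one positive integer id per object and
    0 for background.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.unique(mask)
    labels = labels[labels != 0]           # 0 is treated as background
    out_image, out_mask = image.copy(), mask.copy()
    if labels.size == 0:
        return out_image, out_mask         # nothing to drop
    count = min(max_objects, labels.size)
    chosen = rng.choice(labels, size=count, replace=False)
    dropped = np.isin(mask, chosen)        # (H, W) boolean region
    out_image[dropped] = image_fill_value  # applies to every channel
    out_mask[dropped] = mask_fill_value
    return out_image, out_mask


image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[0:2, 0:2] = 1                         # first object
mask[2:4, 2:4] = 2                         # second object
new_image, new_mask = mask_dropout_sketch(
    image, mask, rng=np.random.default_rng(0)
)
```

With `max_objects=1`, exactly one of the two 2x2 objects is removed from both the image and the mask; which one depends on the random generator.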

albumentations.augmentations.functional

def albumentations.augmentations.functional.add_fog (img, fog_coef, alpha_coef, haze_list) [view source on GitHub]

Add fog to the image.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
img numpy.ndarray

Image.

fog_coef float

Fog coefficient.

alpha_coef float

Alpha coefficient.

haze_list list

Returns:

Type Description
numpy.ndarray

Image.

def albumentations.augmentations.functional.add_gravel (img, gravels) [view source on GitHub]

Add gravel to the image.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
img ndarray

image to add gravel to

gravels list

list of gravel parameters. (float, float, float, float): (top-left x, top-left y, bottom-right x, bottom-right y)

Returns:

Type Description
numpy.ndarray

def albumentations.augmentations.functional.add_rain (img, slant, drop_length, drop_width, drop_color, blur_value, brightness_coefficient, rain_drops) [view source on GitHub]

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
img numpy.ndarray

Image.

slant int
drop_length
drop_width
drop_color
blur_value int

Rainy views are blurry.

brightness_coefficient float

Rainy days are usually shady.

rain_drops

Returns:

Type Description
numpy.ndarray

Image.

def albumentations.augmentations.functional.add_shadow (img, vertices_list) [view source on GitHub]

Add shadows to the image.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
img numpy.ndarray
vertices_list list

Returns:

Type Description
numpy.ndarray

def albumentations.augmentations.functional.add_snow (img, snow_point, brightness_coeff) [view source on GitHub]

Bleach out pixels to imitate snow.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
img numpy.ndarray

Image.

snow_point

Number of snow points.

brightness_coeff

Brightness coefficient.

Returns:

Type Description
numpy.ndarray

Image.

def albumentations.augmentations.functional.add_sun_flare (img, flare_center_x, flare_center_y, src_radius, src_color, circles) [view source on GitHub]

Add sun flare.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
img numpy.ndarray
flare_center_x float
flare_center_y float
src_radius
src_color int, int, int
circles list

Returns:

Type Description
numpy.ndarray

def albumentations.augmentations.functional.bbox_from_mask (mask) [view source on GitHub]

Create bounding box from binary mask (fast version)

Parameters:

Name Type Description
mask numpy.ndarray

binary mask.

Returns:

Type Description
tuple

A bounding box tuple (x_min, y_min, x_max, y_max).
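
One way to derive such a box is to look for the first and last non-empty row and column of the mask. The sketch below illustrates that idea in NumPy. The name `bbox_from_mask_sketch`, the exclusive `x_max`/`y_max` convention, and returning `None` for an empty mask are assumptions made for the example, not documented behaviour of `bbox_from_mask`.

```python
import numpy as np


def bbox_from_mask_sketch(mask):
    """Return (x_min, y_min, x_max, y_max) of the non-zero region.

    Assumptions: x_max and y_max are exclusive, and an empty mask
    yields None.
    """
    rows = np.any(mask, axis=1)   # True for rows containing any non-zero pixel
    cols = np.any(mask, axis=0)   # True for columns containing any non-zero pixel
    if not rows.any():
        return None
    y_idx = np.where(rows)[0]
    x_idx = np.where(cols)[0]
    return int(x_idx[0]), int(y_idx[0]), int(x_idx[-1]) + 1, int(y_idx[-1]) + 1


mask = np.zeros((5, 6), dtype=np.uint8)
mask[1:3, 2:5] = 1
box = bbox_from_mask_sketch(mask)
```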

def albumentations.augmentations.functional.equalize (img, mask=None, mode='cv', by_channels=True) [view source on GitHub]

Equalize the image histogram.

Parameters:

Name Type Description
img numpy.ndarray

RGB or grayscale image.

mask numpy.ndarray

An optional mask. If given, only the pixels selected by the mask are included in the analysis. May be a 1-channel or 3-channel array.

mode str

{'cv', 'pil'}. Use OpenCV or Pillow equalization method.

by_channels bool

If True, use equalization by channels separately, else convert image to YCbCr representation and use equalization by Y channel.

Returns:

Type Description
numpy.ndarray

Equalized image.

def albumentations.augmentations.functional.fancy_pca (img, alpha=0.1) [view source on GitHub]

Perform 'Fancy PCA' augmentation from: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Parameters:

Name Type Description
img numpy.ndarray

numpy array with (h, w, rgb) shape, as ints between 0-255

alpha float

how much to perturb/scale the eigenvectors and eigenvalues; the paper used std=0.1

Returns:

Type Description
numpy.ndarray

numpy image-like array as uint8 range(0, 255)

def albumentations.augmentations.functional.iso_noise (image, color_shift=0.05, intensity=0.5, random_state=None, ** kwargs) [view source on GitHub]

Apply poisson noise to image to simulate camera sensor noise.

Parameters:

Name Type Description
image numpy.ndarray

Input image, currently, only RGB, uint8 images are supported.

color_shift float
intensity float

Multiplication factor for noise values. Values of ~0.5 produce a noticeable, yet acceptable level of noise.

random_state
**kwargs

Returns:

Type Description
numpy.ndarray

Noised image

def albumentations.augmentations.functional.mask_from_bbox (img, bbox) [view source on GitHub]

Create binary mask from bounding box

Parameters:

Name Type Description
img numpy.ndarray

input image

bbox

A bounding box tuple (x_min, y_min, x_max, y_max)

Returns:

Type Description
mask (numpy.ndarray)

binary mask

def albumentations.augmentations.functional.move_tone_curve (img, low_y, high_y) [view source on GitHub]

Rescales the relationship between bright and dark areas of the image by manipulating its tone curve.

Parameters:

Name Type Description
img numpy.ndarray

RGB or grayscale image.

low_y float

y-position of a Bezier control point used to adjust the tone curve, must be in range [0, 1]

high_y float

y-position of a Bezier control point used to adjust image tone curve, must be in range [0, 1]

def albumentations.augmentations.functional.multiply (img, multiplier) [view source on GitHub]

Parameters:

Name Type Description
img numpy.ndarray

Image.

multiplier numpy.ndarray

Multiplier coefficient.

Returns:

Type Description
numpy.ndarray

Image multiplied by multiplier coefficient.

def albumentations.augmentations.functional.posterize (img, bits) [view source on GitHub]

Reduce the number of bits for each color channel.

Parameters:

Name Type Description
img numpy.ndarray

image to posterize.

bits int

number of high bits. Must be in range [0, 8]

Returns:

Type Description
numpy.ndarray

Image with reduced color channels.
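
Keeping only the `bits` high bits of each channel amounts to a bitwise AND with a mask. The following is a minimal sketch for uint8 images; the function name `posterize_sketch` is hypothetical and the library's own implementation may differ (for example, it may use a lookup table).

```python
import numpy as np


def posterize_sketch(img, bits):
    """Keep the `bits` most significant bits of every uint8 value."""
    if img.dtype != np.uint8:
        raise TypeError("this sketch only handles uint8 images")
    if not 0 <= bits <= 8:
        raise ValueError("bits must be in range [0, 8]")
    # For bits=3 the mask is 0b11100000; for bits=0 it is 0.
    keep = np.uint8((0xFF << (8 - bits)) & 0xFF)
    return img & keep


pixels = np.array([[0, 37, 200, 255]], dtype=np.uint8)
reduced = posterize_sketch(pixels, 3)
```

With `bits=3`, every value is rounded down to a multiple of 32, so the four example pixels become 0, 32, 192 and 224.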

def albumentations.augmentations.functional.solarize (img, threshold=128) [view source on GitHub]

Invert all pixel values above a threshold.

Parameters:

Name Type Description
img numpy.ndarray

The image to solarize.

threshold int

All pixels above this greyscale level are inverted.

Returns:

Type Description
numpy.ndarray

Solarized image.
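
Solarization can be sketched in NumPy as a conditional inversion. The example below handles uint8 input only and treats values greater than or equal to the threshold as "above" it; that boundary choice and the name `solarize_sketch` are assumptions of this example.

```python
import numpy as np


def solarize_sketch(img, threshold=128):
    """Invert uint8 pixel values at or above `threshold`."""
    img = np.asarray(img, dtype=np.uint8)
    # 255 - value is the inverted intensity for 8-bit images.
    return np.where(img >= threshold, 255 - img, img)


pixels = np.array([[0, 127, 128, 255]], dtype=np.uint8)
solarized = solarize_sketch(pixels)
```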

def albumentations.augmentations.functional.swap_tiles_on_image (image, tiles) [view source on GitHub]

Swap tiles on image.

Parameters:

Name Type Description
image np.ndarray

Input image.

tiles np.ndarray

array of tuples( current_left_up_corner_row, current_left_up_corner_col, old_left_up_corner_row, old_left_up_corner_col, height_tile, width_tile)

Returns:

Type Description
np.ndarray

Output image.
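
The tile tuple layout above suggests copying each tile from its old position in the input to its current position in the output. A minimal sketch under that reading (the name `swap_tiles_sketch` is hypothetical):

```python
import numpy as np


def swap_tiles_sketch(image, tiles):
    """Copy each tile from its old corner to its current corner.

    Each row of `tiles` is (current_row, current_col, old_row, old_col,
    tile_height, tile_width).
    """
    new_image = image.copy()
    for cur_row, cur_col, old_row, old_col, height, width in tiles:
        # Always read from the untouched input so tiles can be exchanged.
        new_image[cur_row:cur_row + height, cur_col:cur_col + width] = (
            image[old_row:old_row + height, old_col:old_col + width]
        )
    return new_image


image = np.arange(16).reshape(4, 4)
# Exchange the top-left and bottom-right 2x2 tiles.
tiles = np.array([(0, 0, 2, 2, 2, 2), (2, 2, 0, 0, 2, 2)])
swapped = swap_tiles_sketch(image, tiles)
```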

albumentations.augmentations.geometric special

albumentations.augmentations.geometric.functional

def albumentations.augmentations.geometric.functional.bbox_flip (bbox, d, rows, cols) [view source on GitHub]

Flip a bounding box either vertically, horizontally or both depending on the value of d.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

d int

dimension. 0 for vertical flip, 1 for horizontal flip, -1 for both vertical and horizontal flip

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

Exceptions:

Type Description
ValueError

if value of d is not -1, 0 or 1.
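
The flip codes can be illustrated with a small sketch. It assumes the box coordinates are normalized to [0, 1] (so `rows` and `cols` are not needed); the `*_sketch` names are hypothetical and this is not the library's code.

```python
def bbox_hflip_sketch(bbox):
    """Mirror a normalized box left to right."""
    x_min, y_min, x_max, y_max = bbox
    return (1 - x_max, y_min, 1 - x_min, y_max)


def bbox_vflip_sketch(bbox):
    """Mirror a normalized box top to bottom."""
    x_min, y_min, x_max, y_max = bbox
    return (x_min, 1 - y_max, x_max, 1 - y_min)


def bbox_flip_sketch(bbox, d):
    """Dispatch on d: 0 vertical, 1 horizontal, -1 both."""
    if d == 0:
        return bbox_vflip_sketch(bbox)
    if d == 1:
        return bbox_hflip_sketch(bbox)
    if d == -1:
        return bbox_vflip_sketch(bbox_hflip_sketch(bbox))
    raise ValueError(f"Invalid d value {d}. Valid values are -1, 0 and 1")


box = (0.125, 0.25, 0.5, 0.5)
flipped_h = bbox_flip_sketch(box, 1)
flipped_v = bbox_flip_sketch(box, 0)
flipped_both = bbox_flip_sketch(box, -1)
```

Note that the minimum and maximum swap roles under a flip, which is why `1 - x_max` becomes the new `x_min`.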

def albumentations.augmentations.geometric.functional.bbox_hflip (bbox, rows, cols) [view source on GitHub]

Flip a bounding box horizontally around the y-axis.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

def albumentations.augmentations.geometric.functional.bbox_rot90 (bbox, factor, rows, cols) [view source on GitHub]

Rotates a bounding box by 90 degrees CCW (see np.rot90)

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box tuple (x_min, y_min, x_max, y_max).

factor int

Number of CCW rotations. Must be in set {0, 1, 2, 3}. See np.rot90.

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box tuple (x_min, y_min, x_max, y_max).
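
For boxes with coordinates normalized to [0, 1], each quarter turn reduces to swapping and mirroring coordinates. The sketch below follows the `np.rot90` convention (counter-clockwise turns); the normalization and the name `bbox_rot90_sketch` are assumptions of this example.

```python
def bbox_rot90_sketch(bbox, factor):
    """Rotate a normalized box by `factor` counter-clockwise quarter turns."""
    if factor not in {0, 1, 2, 3}:
        raise ValueError("factor must be in set {0, 1, 2, 3}")
    x_min, y_min, x_max, y_max = bbox
    if factor == 1:
        # Old y becomes new x; old x is mirrored into new y.
        return (y_min, 1 - x_max, y_max, 1 - x_min)
    if factor == 2:
        return (1 - x_max, 1 - y_max, 1 - x_min, 1 - y_min)
    if factor == 3:
        return (1 - y_max, x_min, 1 - y_min, x_max)
    return (x_min, y_min, x_max, y_max)


box = (0.125, 0.25, 0.375, 0.5)
once = bbox_rot90_sketch(box, 1)
twice = bbox_rot90_sketch(box, 2)
```

Applying `factor=1` twice gives the same result as a single call with `factor=2`.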

def albumentations.augmentations.geometric.functional.bbox_rotate (bbox, angle, method, rows, cols) [view source on GitHub]

Rotates a bounding box by angle degrees.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

angle float

Angle of rotation in degrees.

method str

Rotation method used. Should be one of: "largest_box", "ellipse". Default: "largest_box".

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

References: https://arxiv.org/abs/2109.13488

def albumentations.augmentations.geometric.functional.bbox_shift_scale_rotate (bbox, angle, scale, dx, dy, rotate_method, rows, cols, ** kwargs) [view source on GitHub]

Rotates, shifts and scales a bounding box. Rotation is made by angle degrees, scaling is made by scale factor and shifting is made by dx and dy.

Parameters:

Name Type Description
bbox tuple

A bounding box (x_min, y_min, x_max, y_max).

angle int

Angle of rotation in degrees.

scale int

Scale factor.

dx int

Shift along x-axis in pixel units.

dy int

Shift along y-axis in pixel units.

rotate_method str

Rotation method used. Should be one of: "largest_box", "ellipse". Default: "largest_box".

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description

A bounding box (x_min, y_min, x_max, y_max).

def albumentations.augmentations.geometric.functional.bbox_transpose (bbox, axis, rows, cols) [view source on GitHub]

Transposes a bounding box along given axis.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

axis int

0 - main axis, 1 - secondary axis.

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box tuple (x_min, y_min, x_max, y_max).

Exceptions:

Type Description
ValueError

If axis not equal to 0 or 1.

def albumentations.augmentations.geometric.functional.bbox_vflip (bbox, rows, cols) [view source on GitHub]

Flip a bounding box vertically around the x-axis.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

def albumentations.augmentations.geometric.functional.elastic_transform (img, alpha, sigma, alpha_affine, interpolation=1, border_mode=4, value=None, random_state=None, approximate=False, same_dxdy=False) [view source on GitHub]

Elastic deformation of images as described in [Simard2003] (with modifications). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

[Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.

def albumentations.augmentations.geometric.functional.elastic_transform_approx (img, alpha, sigma, alpha_affine, interpolation=1, border_mode=4, value=None, random_state=None) [view source on GitHub]

Elastic deformation of images as described in [Simard2003] (with modifications for speed). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

[Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.

def albumentations.augmentations.geometric.functional.from_distance_maps (distance_maps, inverted, if_not_found_coords, threshold=None) [view source on GitHub]

Convert outputs of to_distance_maps() to KeypointsOnImage. This is the inverse of to_distance_maps.

Parameters:

Name Type Description
distance_maps ndarray

The distance maps. N is the number of keypoints.

inverted bool

Whether the given distance maps were generated in inverted mode (i.e. to_distance_maps was called with inverted=True) or in non-inverted mode.

if_not_found_coords Union[Sequence[int], dict]

Coordinates to use for keypoints that cannot be found in distance_maps.

  • If this is a list/tuple, it must contain two int values.
  • If it is a dict, it must contain the keys x and y with each containing one int value.
  • If this is None, then the keypoint will not be added.

threshold Optional[float]

The search for keypoints works by searching for the argmin (non-inverted) or argmax (inverted) in each channel. This parameter contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit as a keypoint. Use None to use no min/max.

nb_channels None, int

Number of channels of the image on which the keypoints are placed. Some keypoint augmenters require that information. If set to None, the keypoint's shape will be set to (height, width), otherwise (height, width, nb_channels).

def albumentations.augmentations.geometric.functional.grid_distortion (img, num_steps=10, xsteps=(), ysteps=(), interpolation=1, border_mode=4, value=None) [view source on GitHub]

Perform a grid distortion of an input image.

Reference: http://pythology.blogspot.sg/2014/03/interpolation-on-regular-distorted-grid.html

def albumentations.augmentations.geometric.functional.keypoint_flip (keypoint, d, rows, cols) [view source on GitHub]

Flip a keypoint either vertically, horizontally or both depending on the value of d.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

d int

Flip code. Must be -1, 0 or 1:

  • 0: vertical flip.
  • 1: horizontal flip.
  • -1: vertical and horizontal flip.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Exceptions:

Type Description
ValueError

if value of d is not -1, 0 or 1.
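
A keypoint flip in pixel coordinates can be sketched as follows. The mirroring `(cols - 1) - x` assumes integer pixel indices starting at 0, and the angle updates (negation for a vertical flip, `pi - angle` for a horizontal flip) assume the angle is in radians; both are assumptions of this example rather than documented behaviour, and the `*_sketch` names are hypothetical.

```python
import math


def keypoint_hflip_sketch(keypoint, rows, cols):
    """Mirror a keypoint left to right in an image of width `cols`."""
    x, y, angle, scale = keypoint
    return ((cols - 1) - x, y, math.pi - angle, scale)


def keypoint_vflip_sketch(keypoint, rows, cols):
    """Mirror a keypoint top to bottom in an image of height `rows`."""
    x, y, angle, scale = keypoint
    return (x, (rows - 1) - y, -angle, scale)


def keypoint_flip_sketch(keypoint, d, rows, cols):
    """Dispatch on d: 0 vertical, 1 horizontal, -1 both."""
    if d == 0:
        return keypoint_vflip_sketch(keypoint, rows, cols)
    if d == 1:
        return keypoint_hflip_sketch(keypoint, rows, cols)
    if d == -1:
        flipped = keypoint_hflip_sketch(keypoint, rows, cols)
        return keypoint_vflip_sketch(flipped, rows, cols)
    raise ValueError(f"Invalid d value {d}. Valid values are -1, 0 and 1")


point = (2.0, 3.0, 0.0, 1.0)
flipped_h = keypoint_flip_sketch(point, 1, rows=10, cols=20)
flipped_v = keypoint_flip_sketch(point, 0, rows=10, cols=20)
flipped_both = keypoint_flip_sketch(point, -1, rows=10, cols=20)
```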

def albumentations.augmentations.geometric.functional.keypoint_hflip (keypoint, rows, cols) [view source on GitHub]

Flip a keypoint horizontally around the y-axis.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

def albumentations.augmentations.geometric.functional.keypoint_rot90 (keypoint, factor, rows, cols, ** params) [view source on GitHub]

Rotates a keypoint by 90 degrees CCW (see np.rot90)

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

factor int

Number of CCW rotations. Must be in set {0, 1, 2, 3}. See np.rot90.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Exceptions:

Type Description
ValueError

if factor not in set {0, 1, 2, 3}
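
The coordinate part of this rotation can be checked against `np.rot90` directly. The sketch below maps only `(x, y)` and leaves angle and scale out; the name `keypoint_rot90_xy` is hypothetical, and integer pixel coordinates starting at 0 are assumed.

```python
import numpy as np


def keypoint_rot90_xy(x, y, factor, rows, cols):
    """Map pixel (x, y) through `factor` counter-clockwise quarter turns."""
    if factor not in {0, 1, 2, 3}:
        raise ValueError("factor must be in set {0, 1, 2, 3}")
    if factor == 1:
        return y, (cols - 1) - x
    if factor == 2:
        return (cols - 1) - x, (rows - 1) - y
    if factor == 3:
        return (rows - 1) - y, x
    return x, y


# Mark one pixel, rotate the image, and look the pixel up again.
img = np.zeros((3, 5), dtype=np.uint8)
img[1, 4] = 1                      # keypoint at x=4, y=1
new_x, new_y = keypoint_rot90_xy(4, 1, 1, rows=3, cols=5)
rotated = np.rot90(img, 1)         # rotated[new_y, new_x] is the marked pixel
```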

def albumentations.augmentations.geometric.functional.keypoint_rotate (keypoint, angle, rows, cols, ** params) [view source on GitHub]

Rotate a keypoint by angle.

Parameters:

Name Type Description
keypoint tuple

A keypoint (x, y, angle, scale).

angle float

Rotation angle.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
tuple

A keypoint (x, y, angle, scale).

def albumentations.augmentations.geometric.functional.keypoint_scale (keypoint, scale_x, scale_y) [view source on GitHub]

Scales a keypoint by scale_x and scale_y.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

scale_x float

Scale coefficient x-axis.

scale_y float

Scale coefficient y-axis.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

def albumentations.augmentations.geometric.functional.keypoint_transpose (keypoint) [view source on GitHub]

Transpose a keypoint.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

def albumentations.augmentations.geometric.functional.keypoint_vflip (keypoint, rows, cols) [view source on GitHub]

Flip a keypoint vertically around the x-axis.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

def albumentations.augmentations.geometric.functional.py3round (number) [view source on GitHub]

Unified rounding in all python versions.

def albumentations.augmentations.geometric.functional.rotation2DMatrixToEulerAngles (matrix, y_up=False) [view source on GitHub]

Parameters:

Name Type Description
matrix ndarray

Rotation matrix

y_up bool

whether the Y axis points up or down

def albumentations.augmentations.geometric.functional.to_distance_maps (keypoints, height, width, inverted=False) [view source on GitHub]

Generate a (H,W,N) array of distance maps for N keypoints.

The n-th distance map contains at every location (y, x) the euclidean distance to the n-th keypoint.

This function can be used as a helper when augmenting keypoints with a method that only supports the augmentation of images.

Parameters:

Name Type Description
keypoints

keypoint coordinates

height int

image height

width int

image width

inverted bool

If True, inverted distance maps are returned where each distance value d is replaced by 1/(d+1), i.e. the distance maps have values in the range (0.0, 1.0] with 1.0 denoting exactly the position of the respective keypoint.

Returns:

Type Description
ndarray

A float32 array of shape (H, W, N) containing N distance maps for N keypoints. Each location (y, x, n) in the array denotes the euclidean distance at (y, x) to the n-th keypoint. If inverted is True, the distance d is replaced by 1/(d+1). The height and width of the array match the height and width arguments.
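
The distance maps described here can be sketched in a few lines of NumPy. The example assumes keypoints are given as `(x, y)` pairs and uses 1/(d+1) for the inverted form, which is the mapping consistent with the stated range (0.0, 1.0] and a value of 1.0 at the keypoint. The name `to_distance_maps_sketch` is hypothetical.

```python
import numpy as np


def to_distance_maps_sketch(keypoints, height, width, inverted=False):
    """Build an (H, W, N) float32 array of distances to N keypoints."""
    maps = np.zeros((height, width, len(keypoints)), dtype=np.float32)
    yy = np.arange(height, dtype=np.float32).reshape(-1, 1)  # column of y values
    xx = np.arange(width, dtype=np.float32).reshape(1, -1)   # row of x values
    for n, (x, y) in enumerate(keypoints):
        # Broadcasting yields the euclidean distance at every (y, x).
        maps[:, :, n] = np.sqrt((xx - x) ** 2 + (yy - y) ** 2)
    if inverted:
        return 1.0 / (maps + 1.0)
    return maps


maps = to_distance_maps_sketch([(1, 2)], height=4, width=5)
inverted_maps = to_distance_maps_sketch([(1, 2)], height=4, width=5, inverted=True)
# The keypoint is recovered as the argmin (or argmax when inverted).
found_y, found_x = np.unravel_index(np.argmin(maps[:, :, 0]), (4, 5))
```

Recovering the keypoint through `argmin` is the idea behind the inverse conversion: after an image-only augmentation has been applied to the maps, the extremum of each channel marks the new keypoint position.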

albumentations.augmentations.geometric.resize

class albumentations.augmentations.geometric.resize.LongestMaxSize (max_size=1024, interpolation=1, always_apply=False, p=1) [view source on GitHub]

Rescale an image so that maximum side is equal to max_size, keeping the aspect ratio of the initial image.

Parameters:

Name Type Description
max_size int, list of int

maximum size of the image after the transformation. When using a list, max size will be randomly selected from the values in the list.

interpolation OpenCV flag

interpolation method. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.geometric.resize.RandomScale (scale_limit=0.1, interpolation=1, always_apply=False, p=0.5) [view source on GitHub]

Randomly resize the input. Output image size is different from the input image size.

Parameters:

Name Type Description
scale_limit [float, float] or float

scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1. If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high). Default: (-0.1, 0.1).

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32
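
The "biased by 1" rule for `scale_limit` can be illustrated with a small sampling helper. This is a sketch of the rule as described above, not the library's code; `sample_scale` is a hypothetical name.

```python
import random


def sample_scale(scale_limit=0.1, rng=random):
    """Sample a scale factor from (1 + low, 1 + high)."""
    if isinstance(scale_limit, (int, float)):
        low, high = -scale_limit, scale_limit   # single value: symmetric range
    else:
        low, high = scale_limit                 # explicit (low, high) tuple
    return rng.uniform(1 + low, 1 + high)


symmetric = [sample_scale(0.1) for _ in range(100)]      # values in [0.9, 1.1]
upscale_only = [sample_scale((0.0, 0.5)) for _ in range(100)]  # values in [1.0, 1.5]
```

A factor of 1.0 leaves the size unchanged, so `scale_limit=(0.0, 0.5)` only ever enlarges the input.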

class albumentations.augmentations.geometric.resize.Resize (height, width, interpolation=1, always_apply=False, p=1) [view source on GitHub]

Resize the input to the given height and width.

Parameters:

Name Type Description
height int

desired height of the output.

width int

desired width of the output.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.geometric.resize.SmallestMaxSize (max_size=1024, interpolation=1, always_apply=False, p=1) [view source on GitHub]

Rescale an image so that minimum side is equal to max_size, keeping the aspect ratio of the initial image.

Parameters:

Name Type Description
max_size int, list of int

maximum size of smallest side of the image after the transformation. When using a list, max size will be randomly selected from the values in the list.

interpolation OpenCV flag

interpolation method. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32
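
LongestMaxSize and SmallestMaxSize differ only in which side is matched to `max_size`. The output size computation can be sketched as below; the helper name `scaled_size`, its `side` argument, and the use of Python's built-in `round` are assumptions made for this example.

```python
def scaled_size(height, width, max_size, side="longest"):
    """Return (new_height, new_width) with the aspect ratio preserved."""
    if side == "longest":
        reference = max(height, width)
    elif side == "smallest":
        reference = min(height, width)
    else:
        raise ValueError("side must be 'longest' or 'smallest'")
    scale = max_size / reference
    return round(height * scale), round(width * scale)


longest = scaled_size(600, 800, 400, side="longest")
smallest = scaled_size(600, 800, 400, side="smallest")
```

For a 600x800 input and `max_size=400`, matching the longest side halves both dimensions, while matching the smallest side scales the image by 2/3.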

albumentations.augmentations.geometric.rotate

class albumentations.augmentations.geometric.rotate.RandomRotate90 [view source on GitHub]

Randomly rotate the input by 90 degrees zero or more times.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

albumentations.augmentations.geometric.rotate.RandomRotate90.apply (self, img, factor=0, **params)

Parameters:

Name Type Description
factor int

number of times the input will be rotated by 90 degrees.
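
The effect of `factor` can be reproduced with `np.rot90`, which rotates counter-clockwise by 90 degrees per step. A minimal sketch (the wrapper name `rot90_sketch` is hypothetical):

```python
import numpy as np


def rot90_sketch(img, factor=0):
    """Rotate `img` counter-clockwise by 90 degrees `factor` times."""
    # np.rot90 returns a view; make the result contiguous for later use.
    return np.ascontiguousarray(np.rot90(img, factor))


img = np.arange(6).reshape(2, 3)
rotated_once = rot90_sketch(img, 1)
rotated_four_times = rot90_sketch(img, 4)
```

One rotation turns the 2x3 input into a 3x2 output, and four rotations return the original array.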

class albumentations.augmentations.geometric.rotate.Rotate (limit=90, interpolation=1, border_mode=4, value=None, mask_value=None, rotate_method='largest_box', crop_border=False, always_apply=False, p=0.5) [view source on GitHub]

Rotate the input by an angle selected randomly from the uniform distribution.

Parameters:

Name Type Description
limit [int, int] or int

range from which a random angle is picked. If limit is a single int, an angle is picked from (-limit, limit). Default: (-90, 90)

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

rotate_method str

rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse". Default: "largest_box"

crop_border bool

If True, makes the largest possible crop within the rotated image.

p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.geometric.rotate.SafeRotate (limit=90, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, p=0.5) [view source on GitHub]

Rotate the input inside the input's frame by an angle selected randomly from the uniform distribution.

The resulting image may have artifacts in it. After rotation, the image may have a different aspect ratio, and it is then resized back to its original shape with the original aspect ratio. For this reason, some artifacts may be visible.

Parameters:

Name Type Description
limit [int, int] or int

range from which a random angle is picked. If limit is a single int, an angle is picked from (-limit, limit). Default: (-90, 90)

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

albumentations.augmentations.geometric.transforms

class albumentations.augmentations.geometric.transforms.Affine (scale=None, translate_percent=None, translate_px=None, rotate=None, shear=None, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode=0, fit_output=False, keep_ratio=False, rotate_method='largest_box', always_apply=False, p=0.5) [view source on GitHub]

Augmentation to apply affine transformations to images. This is mostly a wrapper around the corresponding classes and functions in OpenCV.

Affine transformations involve:

- Translation ("move" image on the x-/y-axis)
- Rotation
- Scaling ("zoom" in/out)
- Shear (move one side of the image, turning a square into a parallelogram)

All such transformations can create "new" pixels in the image without a defined content, e.g. if the image is translated to the left, pixels are created on the right. A method has to be defined to deal with these pixel values. The parameters cval and mode of this class deal with this.

Some transformations involve interpolations between several pixels of the input image to generate output pixel values. The parameters interpolation and mask_interpolation deal with the method of interpolation used for this.

Parameters:

Name Type Description
scale number, tuple of number or dict

Scaling factor to use, where 1.0 denotes "no change" and 0.5 is zoomed out to 50 percent of the original size.

  • If a single number, then that value will be used for all images.
  • If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b]. The same range will be used for both x- and y-axis. To keep the aspect ratio, set keep_ratio=True, then the same value will be used for both x- and y-axis.
  • If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes, and sampling will then happen independently per axis, resulting in samples that differ between the axes. Note that when keep_ratio=True, the x- and y-axis ranges should be the same.

translate_percent None, number, tuple of number or dict

Translation as a fraction of the image height/width (x-translation, y-translation), where 0 denotes "no change" and 0.5 denotes "half of the axis size".

  • If None, then equivalent to 0.0 unless translate_px has a value other than None.
  • If a single number, then that value will be used for all images.
  • If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b]. That sampled fraction value will be used identically for both x- and y-axis.
  • If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes, and sampling will then happen independently per axis, resulting in samples that differ between the axes.

translate_px None, int, tuple of int or dict

Translation in pixels.

  • If None, then equivalent to 0 unless translate_percent has a value other than None.
  • If a single int, then that value will be used for all images.
  • If a tuple (a, b), then a value will be uniformly sampled per image from the discrete interval [a..b]. That number will be used identically for both x- and y-axis.
  • If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes, and sampling will then happen independently per axis, resulting in samples that differ between the axes.

rotate number or tuple of number

Rotation in degrees (NOT radians), i.e. expected value range is around [-360, 360]. Rotation happens around the center of the image, not the top left corner as in some other frameworks.

  • If a number, then that value will be used for all images.
  • If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b] and used as the rotation value.

shear number, tuple of number or dict

Shear in degrees (NOT radians), i.e. expected value range is around [-360, 360], with reasonable values being in the range of [-45, 45].

  • If a number, then that value will be used for all images as the shear on the x-axis (no shear on the y-axis will be done).
  • If a tuple (a, b), then two values will be uniformly sampled per image from the interval [a, b] and be used as the x- and y-shear values.
  • If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes, and sampling will then happen independently per axis, resulting in samples that differ between the axes.

interpolation int

OpenCV interpolation flag.

mask_interpolation int

OpenCV interpolation flag.

cval number or sequence of number

The constant value to use when filling in newly created pixels. (E.g. translating by 1px to the right will create a new 1px-wide column of pixels on the left of the image). The value is only used when mode=constant. The expected value range is [0, 255] for uint8 images.

cval_mask number or tuple of number

Same as cval but only for masks.

mode int

OpenCV border flag.

fit_output bool

If True, the image plane size and position will be adjusted to tightly capture the whole image after affine transformation (translate_percent and translate_px are ignored). Otherwise (False), parts of the transformed image may end up outside the image plane. Fitting the output shape can be useful to avoid corners of the image being outside the image plane after applying rotations. Default: False

keep_ratio bool

When True, the original aspect ratio will be kept when the random scale is applied. Default: False.

rotate_method str

rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse"[1]. Default: "largest_box"

p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, keypoints, bboxes

Image types: uint8, float32

Reference: [1] https://arxiv.org/abs/2109.13488
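
The statement that rotation happens around the image center, not the top left corner, can be made concrete with a homogeneous 3x3 matrix: shift the center to the origin, rotate, and shift back. The sketch below is illustrative only. The helper name `rotation_about_center`, the pixel-center convention `(size - 1) / 2`, and the sign of the angle are assumptions and may not match what Affine does internally.

```python
import numpy as np


def rotation_about_center(angle_deg, height, width):
    """Return a 3x3 matrix rotating points (x, y, 1) about the image center."""
    theta = np.deg2rad(angle_deg)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0   # assumed center convention
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    rotate = np.array([[cos_t, -sin_t, 0], [sin_t, cos_t, 0], [0, 0, 1]])
    back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    # Matrices apply right to left: shift, rotate, shift back.
    return back @ rotate @ to_origin


matrix = rotation_about_center(90, height=5, width=5)
center = matrix @ np.array([2.0, 2.0, 1.0])   # the center stays fixed
corner = matrix @ np.array([0.0, 0.0, 1.0])   # a corner moves to another corner
```

Because the center is a fixed point of the matrix, corners of a square image map onto other corners instead of leaving the image plane, which is the practical difference from rotating about the top left corner.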

class albumentations.augmentations.geometric.transforms.ElasticTransform (alpha=1, sigma=50, alpha_affine=50, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, approximate=False, same_dxdy=False, p=0.5) [view source on GitHub]

Elastic deformation of images as described in [Simard2003] (with modifications). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

[Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.

Parameters:

Name Type Description
alpha float
sigma float

Gaussian filter parameter.

alpha_affine float

The range will be (-alpha_affine, alpha_affine)

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

approximate boolean

Whether to smooth displacement map with fixed kernel size. Enabling this option gives ~2X speedup on large images.

same_dxdy boolean

Whether to use same random generated shift for x and y. Enabling this option gives ~2X speedup.

Targets: image, mask, bboxes

Image types: uint8, float32

class albumentations.augmentations.geometric.transforms.Flip [view source on GitHub]

Flip the input either horizontally, vertically or both horizontally and vertically.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

albumentations.augmentations.geometric.transforms.Flip.apply (self, img, d=0, **params)

d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping, -1 for both vertical and horizontal flipping (which can also be seen as rotating the input by 180 degrees).
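The meaning of the flip code d can be illustrated on a small nested list. This is a plain-Python sketch of the three cases, not the code Flip.apply runs; the function name flip is chosen for illustration only.

```python
def flip(rows, d):
    """Flip a 2-D list of pixel rows according to the flip code d."""
    if d == 0:
        # Vertical flip: reverse the order of the rows.
        return [list(row) for row in reversed(rows)]
    if d == 1:
        # Horizontal flip: reverse the values inside each row.
        return [list(reversed(row)) for row in rows]
    if d == -1:
        # Both flips together, which equals a 180-degree rotation.
        return [list(reversed(row)) for row in reversed(rows)]
    raise ValueError("d must be 0, 1 or -1")


image = [[1, 2, 3],
         [4, 5, 6]]
vertical = flip(image, 0)
horizontal = flip(image, 1)
both = flip(image, -1)
```

Applying the horizontal and the vertical flip one after the other gives the same result as d=-1.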

class albumentations.augmentations.geometric.transforms.GridDistortion (num_steps=5, distort_limit=0.3, interpolation=1, border_mode=4, value=None, mask_value=None, normalized=False, always_apply=False, p=0.5) [view source on GitHub]

Parameters:

Name Type Description
num_steps int

count of grid cells on each side.

distort_limit float, [float, float]

If distort_limit is a single float, the range will be (-distort_limit, distort_limit). Default: (-0.3, 0.3).

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

normalized bool

if True, the distortion will be normalized so that it does not go outside the image. Default: False. For more information, see: https://github.com/albumentations-team/albumentations/pull/722

Targets: image, mask

Image types: uint8, float32

class albumentations.augmentations.geometric.transforms.HorizontalFlip [view source on GitHub]

Flip the input horizontally around the y-axis.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.geometric.transforms.OpticalDistortion (distort_limit=0.05, shift_limit=0.05, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, p=0.5) [view source on GitHub]

Parameters:

Name Type Description
distort_limit float, [float, float]

If distort_limit is a single float, the range will be (-distort_limit, distort_limit). Default: (-0.05, 0.05).

shift_limit float, [float, float]

If shift_limit is a single float, the range will be (-shift_limit, shift_limit). Default: (-0.05, 0.05).

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

Targets: image, mask, bboxes

Image types: uint8, float32

class albumentations.augmentations.geometric.transforms.PadIfNeeded (min_height=1024, min_width=1024, pad_height_divisor=None, pad_width_divisor=None, position=<PositionType.CENTER: 'center'>, border_mode=4, value=None, mask_value=None, always_apply=False, p=1.0) [view source on GitHub]

Pad the sides of the image if a side is smaller than the desired size.

Parameters:

Name Type Description
min_height int

minimal result image height.

min_width int

minimal result image width.

pad_height_divisor int

if not None, ensures image height is divisible by the value of this argument.

pad_width_divisor int

if not None, ensures image width is divisible by the value of this argument.

position Union[str, PositionType]

Position of the image. Should be one of PositionType.CENTER, PositionType.TOP_LEFT, PositionType.TOP_RIGHT, PositionType.BOTTOM_LEFT, PositionType.BOTTOM_RIGHT, or PositionType.RANDOM. Default: PositionType.CENTER.

border_mode OpenCV flag

OpenCV border mode.

value int, float, list of int, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of int, list of float

padding value for mask if border_mode is cv2.BORDER_CONSTANT.

p float

probability of applying the transform. Default: 1.0.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32
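How much padding a side receives follows from either the minimal size or the divisor. The sketch below computes the padding for one side length under two assumptions that are not stated in this reference: the image is centered, and any odd leftover pixel goes to the bottom or right. The helper name center_padding is hypothetical.

```python
def center_padding(size, min_size=None, divisor=None):
    """Return (before, after) padding for one side length (hypothetical helper)."""
    if min_size is not None:
        # Pad up to the minimal size; larger inputs are left alone.
        total = max(min_size - size, 0)
    elif divisor is not None:
        # Pad up to the next multiple of the divisor.
        total = (-size) % divisor
    else:
        raise ValueError("either min_size or divisor is required")
    before = total // 2
    # Assumption: an odd remainder is added on the bottom/right side.
    return before, total - before


top, bottom = center_padding(100, min_size=128)
left, right = center_padding(100, divisor=32)
unchanged = center_padding(256, min_size=128)
odd = center_padding(101, min_size=128)
```

A side of 100 pixels needs 28 pixels in both calls: 128 is the requested minimum in the first and the next multiple of 32 in the second.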

class albumentations.augmentations.geometric.transforms.PadIfNeeded.PositionType

An enumeration.

class albumentations.augmentations.geometric.transforms.Perspective (scale=(0.05, 0.1), keep_size=True, pad_mode=0, pad_val=0, mask_pad_val=0, fit_output=False, interpolation=1, always_apply=False, p=0.5) [view source on GitHub]

Perform a random four point perspective transform of the input.

Parameters:

Name Type Description
scale float or [float, float]

standard deviation of the normal distributions. These are used to sample the random distances of the subimage's corners from the full image's corners. If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1).

keep_size bool

Whether to resize images back to their original size after applying the perspective transform. If set to False, the resulting images may end up having different shapes and will always be a list, never an array. Default: True

pad_mode OpenCV flag

OpenCV border mode.

pad_val int, float, list of int, list of float

padding value if pad_mode is cv2.BORDER_CONSTANT. Default: 0

mask_pad_val int, float, list of int, list of float

padding value for mask if pad_mode is cv2.BORDER_CONSTANT. Default: 0

fit_output bool

If True, the image plane size and position will be adjusted to still capture the whole image after perspective transformation. (Followed by image resizing if keep_size is set to True.) Otherwise, parts of the transformed image may be outside of the image plane. This setting should not be set to True when using large scale values as it could lead to very large images. Default: False

p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, keypoints, bboxes

Image types: uint8, float32

class albumentations.augmentations.geometric.transforms.PiecewiseAffine (scale=(0.03, 0.05), nb_rows=4, nb_cols=4, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode='constant', absolute_scale=False, always_apply=False, keypoints_threshold=0.01, p=0.5) [view source on GitHub]

Apply affine transformations that differ between local neighbourhoods. This augmentation places a regular grid of points on an image and randomly moves the neighbourhood of these points around via affine transformations. This leads to local distortions.

This is mostly a wrapper around scikit-image's PiecewiseAffine. See also Affine for a similar technique.

Note: This augmenter is very slow. Try to use ElasticTransform instead, which is at least 10x faster.

Note: For coordinate-based inputs (keypoints, bounding boxes, polygons, ...), this augmenter still has to perform an image-based augmentation, which makes it significantly slower than other transforms and not fully correct for such inputs.

Parameters:

Name Type Description
scale float, tuple of float

Each point on the regular grid is moved around via a normal distribution. This scale factor is equivalent to the normal distribution's sigma. Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of the image if absolute_scale=False (default), so this scale can be the same for different sized images. Recommended values are in the range 0.01 to 0.05 (weak to strong augmentations). * If a single float, then that value will always be used as the scale. * If a tuple (a, b) of floats, then a random value will be uniformly sampled per image from the interval [a, b].

nb_rows int, tuple of int

Number of rows of points that the regular grid should have. Must be at least 2. For large images, you might want to pick a higher value than 4. You might have to then adjust scale to lower values. * If a single int, then that value will always be used as the number of rows. * If a tuple (a, b), then a value from the discrete interval [a..b] will be uniformly sampled per image.

nb_cols int, tuple of int

Number of columns. Analogous to nb_rows.

interpolation int

The order of interpolation. The order has to be in the range 0-5: - 0: Nearest-neighbor - 1: Bi-linear (default) - 2: Bi-quadratic - 3: Bi-cubic - 4: Bi-quartic - 5: Bi-quintic

mask_interpolation int

same as interpolation but for mask.

cval number

The constant value to use when filling in newly created pixels.

cval_mask number

Same as cval but only for masks.

mode str

{'constant', 'edge', 'symmetric', 'reflect', 'wrap'}, optional Points outside the boundaries of the input are filled according to the given mode. Modes match the behaviour of numpy.pad.

absolute_scale bool

Take scale as an absolute value rather than a relative value.

keypoints_threshold float

Used as threshold in conversion from distance maps to keypoints. The search for keypoints works by searching for the argmin (non-inverted) or argmax (inverted) in each channel. This parameter contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit as a keypoint. Use None to use no min/max. Default: 0.01

Targets: image, mask, keypoints, bboxes

Image types: uint8, float32

class albumentations.augmentations.geometric.transforms.ShiftScaleRotate (shift_limit=0.0625, scale_limit=0.1, rotate_limit=45, interpolation=1, border_mode=4, value=None, mask_value=None, shift_limit_x=None, shift_limit_y=None, rotate_method='largest_box', always_apply=False, p=0.5) [view source on GitHub]

Randomly apply affine transforms: translate, scale and rotate the input.

Parameters:

Name Type Description
shift_limit [float, float] or float

shift factor range for both height and width. If shift_limit is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and upper bounds should lie in range [0, 1]. Default: (-0.0625, 0.0625).

scale_limit [float, float] or float

scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1. If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high). Default: (-0.1, 0.1).

rotate_limit [int, int] or int

rotation range. If rotate_limit is a single int value, the range will be (-rotate_limit, rotate_limit). Default: (-45, 45).

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of int, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of int, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

shift_limit_x [float, float] or float

shift factor range for width. If it is set then this value instead of shift_limit will be used for shifting width. If shift_limit_x is a single float value, the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in the range [0, 1]. Default: None.

shift_limit_y [float, float] or float

shift factor range for height. If it is set then this value instead of shift_limit will be used for shifting height. If shift_limit_y is a single float value, the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie in the range [0, 1]. Default: None.

rotate_method str

rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse". Default: "largest_box"

p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, keypoints, bboxes

Image types: uint8, float32
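The limit parameters above all follow one convention: a single value v stands for the range (-v, v), and scale_limit is additionally biased by 1. A minimal sketch of that convention in plain Python follows; the helper name to_range is hypothetical and this is not the library's own code.

```python
import random


def to_range(limit, bias=0.0):
    """Expand a limit into a (low, high) pair, shifted by bias (hypothetical helper)."""
    if isinstance(limit, (int, float)):
        # A single value v means the symmetric range (-v, v).
        low, high = -limit, limit
    else:
        low, high = limit
    return (low + bias, high + bias)


shift_range = to_range(0.0625)
scale_range = to_range(0.1, bias=1.0)  # scale_limit is biased by 1
rotate_range = to_range(45)

rng = random.Random(0)
scale = rng.uniform(*scale_range)  # scaling factor, close to 1
angle = rng.uniform(*rotate_range)  # rotation, in the units of rotate_limit
```

With the default scale_limit=0.1, the scaling factor is therefore drawn from roughly (0.9, 1.1) rather than from (-0.1, 0.1).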

class albumentations.augmentations.geometric.transforms.Transpose [view source on GitHub]

Transpose the input by swapping rows and columns.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.geometric.transforms.VerticalFlip [view source on GitHub]

Flip the input vertically around the x-axis.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

albumentations.augmentations.transforms

class albumentations.augmentations.transforms.ChannelShuffle [view source on GitHub]

Randomly rearrange channels of the input RGB image.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.CLAHE (clip_limit=4.0, tile_grid_size=(8, 8), always_apply=False, p=0.5) [view source on GitHub]

Apply Contrast Limited Adaptive Histogram Equalization to the input image.

Parameters:

Name Type Description
clip_limit float or [float, float]

upper threshold value for contrast limiting. If clip_limit is a single float value, the range will be (1, clip_limit). Default: (1, 4).

tile_grid_size [int, int]

size of grid for histogram equalization. Default: (8, 8).

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8

class albumentations.augmentations.transforms.ColorJitter (brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2, always_apply=False, p=0.5) [view source on GitHub]

Randomly changes the brightness, contrast, saturation, and hue of an image. Compared to ColorJitter from torchvision, this transform gives slightly different results because Pillow (used in torchvision) and OpenCV (used in Albumentations) convert an image to HSV format using different formulas. Another difference: Pillow uses uint8 overflow, whereas Albumentations uses value saturation.

Parameters:

Name Type Description
brightness float or tuple of float (min, max)

How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non negative numbers.

contrast float or tuple of float (min, max)

How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non negative numbers.

saturation float or tuple of float (min, max)

How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non negative numbers.

hue float or tuple of float (min, max)

How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
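The intervals described for the four parameters can be written down directly. The sketch below only derives the sampling interval for a factor from the parameter value, in plain Python; the helper name factor_range and its arguments center and clip_at_zero are illustrative and not part of the library.

```python
def factor_range(value, center=1.0, clip_at_zero=True):
    """Turn a jitter amount into the interval a factor is drawn from (hypothetical helper)."""
    if isinstance(value, (int, float)):
        if value < 0:
            raise ValueError("a single jitter amount should be non negative")
        low, high = center - value, center + value
        if clip_at_zero:
            # Brightness, contrast and saturation use max(0, 1 - value).
            low = max(0.0, low)
        return (low, high)
    # An explicit (min, max) pair is used as given.
    low, high = value
    return (float(low), float(high))


brightness = factor_range(0.2)
strong_contrast = factor_range(1.5)
hue = factor_range(0.2, center=0.0, clip_at_zero=False)
explicit = factor_range((0.8, 1.0))
```

Hue is the exception: its interval is centered on 0 and is not clipped at zero, matching the [-hue, hue] description above.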

class albumentations.augmentations.transforms.Downscale (scale_min=0.25, scale_max=0.25, interpolation=None, always_apply=False, p=0.5) [view source on GitHub]

Decreases image quality by downscaling and upscaling back.

Parameters:

Name Type Description
scale_min float

lower bound on the image scale. Should be < 1.

scale_max float

upper bound on the image scale. Should be < 1.

interpolation

cv2 interpolation method. Could be: - single cv2 interpolation flag - selected method will be used for downscale and upscale. - dict(downscale=flag, upscale=flag) - Downscale.Interpolation(downscale=flag, upscale=flag) - Default: Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST)

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.Emboss (alpha=(0.2, 0.5), strength=(0.2, 0.7), always_apply=False, p=0.5) [view source on GitHub]

Emboss the input image and overlay the result with the original image.

Parameters:

Name Type Description
alpha [float, float]

range to choose the visibility of the embossed image. At 0, only the original image is visible; at 1.0, only its embossed version is visible. Default: (0.2, 0.5).

strength [float, float]

strength range of the embossing. Default: (0.2, 0.7).

p float

probability of applying the transform. Default: 0.5.

Targets: image

class albumentations.augmentations.transforms.Equalize (mode='cv', by_channels=True, mask=None, mask_params=(), always_apply=False, p=0.5) [view source on GitHub]

Equalize the image histogram.

Parameters:

Name Type Description
mode str

{'cv', 'pil'}. Use OpenCV or Pillow equalization method.

by_channels bool

If True, use equalization by channels separately, else convert image to YCbCr representation and use equalization by Y channel.

mask np.ndarray, callable

If given, only the pixels selected by the mask are included in the analysis. May be a 1-channel or 3-channel array, or a callable. The function signature must include an image argument.

mask_params list of str

Params for mask function.

Targets: image

Image types: uint8

class albumentations.augmentations.transforms.FancyPCA (alpha=0.1, always_apply=False, p=0.5) [view source on GitHub]

Augment RGB image using FancyPCA from Krizhevsky's paper "ImageNet Classification with Deep Convolutional Neural Networks"

Parameters:

Name Type Description
alpha float

how much to perturb/scale the eigenvectors and eigenvalues. The scale is sampled from a Gaussian distribution (mu=0, sigma=alpha).

Targets: image

Image types: 3-channel uint8 images only

Credit: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf https://deshanadesai.github.io/notes/Fancy-PCA-with-Scikit-Image https://pixelatedbrian.github.io/2018-04-29-fancy_pca/

class albumentations.augmentations.transforms.FromFloat (dtype='uint16', max_value=None, always_apply=False, p=1.0) [view source on GitHub]

Take an input array where all values should lie in the range [0, 1.0], multiply them by max_value and then cast the resulting value to a type specified by dtype. If max_value is None the transform will try to infer the maximum value for the data type from the dtype argument.

This is the inverse transform for albumentations.augmentations.transforms.ToFloat.

Parameters:

Name Type Description
max_value float

maximum possible input value. Default: None.

dtype string or numpy data type

data type of the output. See the 'Data types' page from the NumPy docs. Default: 'uint16'.

p float

probability of applying the transform. Default: 1.0.

Targets: image

Image types: float32

'Data types' page from the NumPy docs: https://docs.scipy.org/doc/numpy/user/basics.types.html
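The multiply-then-cast step described for FromFloat can be sketched on a flat list of values. This plain-Python sketch makes two assumptions that are not taken from this reference: the cast truncates toward zero, and the maxima in MAX_BY_DTYPE are the ones inferred for those type names. The table and both function names are illustrative.

```python
# Assumed maxima for a few unsigned integer types; the table is illustrative.
MAX_BY_DTYPE = {"uint8": 255, "uint16": 65535, "uint32": 4294967295}


def from_float(values, dtype="uint16", max_value=None):
    """Scale values from [0, 1.0] by max_value and cast them to integers."""
    if max_value is None:
        # Infer the maximum from the requested data type.
        max_value = MAX_BY_DTYPE[dtype]
    # Assumption: the cast truncates the fractional part.
    return [int(v * max_value) for v in values]


def to_float(values, max_value):
    """Inverse direction: divide by max_value to return to [0, 1.0]."""
    return [v / max_value for v in values]


pixels = from_float([0.0, 0.5, 1.0], dtype="uint8")
restored = to_float(pixels, 255)
```

The round trip is only approximate for values such as 0.5, because the integer cast discards the fractional part.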

class albumentations.augmentations.transforms.GaussNoise (var_limit=(10.0, 50.0), mean=0, per_channel=True, always_apply=False, p=0.5) [view source on GitHub]

Apply gaussian noise to the input image.

Parameters:

Name Type Description
var_limit [float, float] or float

variance range for noise. If var_limit is a single float, the range will be (0, var_limit). Default: (10.0, 50.0).

mean float

mean of the noise. Default: 0

per_channel bool

if set to True, noise will be sampled for each channel independently. Otherwise, the noise will be sampled once for all channels. Default: True

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32
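Because var_limit is a variance, the standard deviation of the noise is its square root. The following plain-Python sketch draws one variance from the range and adds noise to a flat list of 8-bit values. Clipping the result to [0, 255] is an assumption made for this example, and the function name gauss_noise is hypothetical.

```python
import random


def gauss_noise(pixels, var_limit=(10.0, 50.0), mean=0.0, rng=None):
    """Add Gaussian noise to a flat list of 8-bit pixel values (sketch)."""
    if rng is None:
        rng = random.Random()
    if isinstance(var_limit, (int, float)):
        # A single value means the range (0, var_limit).
        var_limit = (0.0, float(var_limit))
    var = rng.uniform(*var_limit)
    sigma = var ** 0.5  # standard deviation is the square root of the variance
    noisy = [p + rng.gauss(mean, sigma) for p in pixels]
    # Assumption: values are clipped back into the uint8 range.
    return [min(255.0, max(0.0, v)) for v in noisy], var


rng = random.Random(0)
noisy, var = gauss_noise([0.0, 128.0, 255.0], rng=rng)
```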

class albumentations.augmentations.transforms.HueSaturationValue (hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, always_apply=False, p=0.5) [view source on GitHub]

Randomly change hue, saturation and value of the input image.

Parameters:

Name Type Description
hue_shift_limit [int, int] or int

range for changing hue. If hue_shift_limit is a single int, the range will be (-hue_shift_limit, hue_shift_limit). Default: (-20, 20).

sat_shift_limit [int, int] or int

range for changing saturation. If sat_shift_limit is a single int, the range will be (-sat_shift_limit, sat_shift_limit). Default: (-30, 30).

val_shift_limit [int, int] or int

range for changing value. If val_shift_limit is a single int, the range will be (-val_shift_limit, val_shift_limit). Default: (-20, 20).

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.ImageCompression (quality_lower=99, quality_upper=100, compression_type=<ImageCompressionType.JPEG: 0>, always_apply=False, p=0.5) [view source on GitHub]

Decreases image quality by applying JPEG or WebP compression to an image.

Parameters:

Name Type Description
quality_lower float

lower bound on the image quality. Should be in [0, 100] range for jpeg and [1, 100] for webp.

quality_upper float

upper bound on the image quality. Should be in [0, 100] range for jpeg and [1, 100] for webp.

compression_type ImageCompressionType

should be ImageCompressionType.JPEG or ImageCompressionType.WEBP. Default: ImageCompressionType.JPEG

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.ImageCompression.ImageCompressionType

An enumeration.

class albumentations.augmentations.transforms.InvertImg [view source on GitHub]

Invert the input image by subtracting pixel values from 255.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.ISONoise (color_shift=(0.01, 0.05), intensity=(0.1, 0.5), always_apply=False, p=0.5) [view source on GitHub]

Apply camera sensor noise.

Parameters:

Name Type Description
color_shift [float, float]

variance range for color hue change. Measured as a fraction of 360 degree Hue angle in HLS colorspace.

intensity [float, float]

Multiplicative factor that controls the strength of color and luminance noise.

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8

class albumentations.augmentations.transforms.JpegCompression (quality_lower=99, quality_upper=100, always_apply=False, p=0.5) [view source on GitHub]

Decreases image quality by Jpeg compression of an image.

Parameters:

Name Type Description
quality_lower float

lower bound on the jpeg quality. Should be in [0, 100] range

quality_upper float

upper bound on the jpeg quality. Should be in [0, 100] range

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.Lambda (image=None, mask=None, keypoint=None, bbox=None, name=None, always_apply=False, p=1.0) [view source on GitHub]

A flexible transformation class for using user-defined transformation functions per target. The function signature must include **kwargs to accept optional arguments such as interpolation method, image size, etc.

Parameters:

Name Type Description
image callable

Image transformation function.

mask callable

Mask transformation function.

keypoint callable

Keypoint transformation function.

bbox callable

BBox transformation function.

always_apply bool

Indicates whether this transformation should be always applied.

p float

probability of applying the transform. Default: 1.0.

Targets: image, mask, bboxes, keypoints

Image types: Any
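A function intended for Lambda needs a signature that tolerates extra keyword arguments. The sketch below shows two such functions and calls them directly, without importing Albumentations; the function names and the keyword arguments passed in the call are illustrative assumptions.

```python
def invert_image(image, **kwargs):
    """Image function: **kwargs absorbs optional arguments it does not use."""
    # The image is assumed here to be a nested list of 8-bit values.
    return [[255 - value for value in row] for row in image]


def keep_mask(mask, **kwargs):
    """Mask function that returns its input unchanged."""
    return mask


# With Albumentations installed, these would be passed by keyword, e.g.
# Lambda(image=invert_image, mask=keep_mask). Here they are called directly,
# with extra keyword arguments standing in for what a pipeline might pass.
out = invert_image([[0, 10], [200, 255]], interpolation=1, rows=2, cols=2)
mask_out = keep_mask([[1, 0]], rows=1, cols=2)
```

Without **kwargs in the signature, the direct call above would fail with a TypeError as soon as an unexpected keyword argument is passed.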

class albumentations.augmentations.transforms.MultiplicativeNoise (multiplier=(0.9, 1.1), per_channel=False, elementwise=False, always_apply=False, p=0.5) [view source on GitHub]

Multiply the image by a random number or an array of numbers.

Parameters:

Name Type Description
multiplier float or tuple of floats

If a single float, the image will be multiplied by this number. If a tuple of floats, the multiplier will be in the range [multiplier[0], multiplier[1]). Default: (0.9, 1.1).

per_channel bool

If False, the same values will be used for all channels. If True, separate values are sampled for each channel. Default: False.

elementwise bool

If False, multiply all pixels in an image by a random value sampled once. If True, multiply image pixels by values that are randomly sampled per pixel. Default: False.

Targets: image

Image types: Any

class albumentations.augmentations.transforms.Normalize (mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, always_apply=False, p=1.0) [view source on GitHub]

Normalization is applied by the formula: img = (img - mean * max_pixel_value) / (std * max_pixel_value)

Parameters:

Name Type Description
mean float, list of float

mean values

std float, list of float

std values

max_pixel_value float

maximum possible pixel value

Targets: image

Image types: uint8, float32
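The normalization formula given above can be checked on individual pixels. This is a plain-Python sketch of the formula applied per channel, using the default mean and std from the signature; the function name normalize_pixel is hypothetical.

```python
def normalize_pixel(pixel, mean, std, max_pixel_value=255.0):
    """Apply (value - mean * max_pixel_value) / (std * max_pixel_value) per channel."""
    return [
        (value - m * max_pixel_value) / (s * max_pixel_value)
        for value, m, s in zip(pixel, mean, std)
    ]


mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

# A pixel exactly at the scaled mean maps to zero in every channel.
at_mean = normalize_pixel([0.485 * 255.0, 0.456 * 255.0, 0.406 * 255.0], mean, std)
# A white pixel ends up (1 - mean) / std standard deviations above the mean.
white = normalize_pixel([255.0, 255.0, 255.0], mean, std)
```

Because mean and std are multiplied by max_pixel_value, they are expressed on a [0, 1] scale even when the input image is uint8.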

class albumentations.augmentations.transforms.PixelDropout (dropout_prob=0.01, per_channel=False, drop_value=0, mask_drop_value=None, always_apply=False, p=0.5) [view source on GitHub]

Set pixels to 0 with some probability.

Parameters:

Name Type Description
dropout_prob float

pixel drop probability. Default: 0.01

per_channel bool

if set to True, the drop mask will be sampled for each channel; otherwise the same mask will be sampled for all channels. Default: False

drop_value number or sequence of numbers or None

Value that will be set in dropped place. If set to None value will be sampled randomly, default ranges will be used: - uint8 - [0, 255] - uint16 - [0, 65535] - uint32 - [0, 4294967295] - float, double - [0, 1] Default: 0

mask_drop_value number or sequence of numbers or None

Value that will be set in dropped place in masks. If set to None masks will be unchanged. Default: None

p float

probability of applying the transform. Default: 0.5.

Targets: image, mask

Image types: any
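The role of dropout_prob and drop_value can be shown on a flat list of pixel values. The sketch below is plain Python with a hypothetical function name; it leaves out per_channel and mask handling and is not the library's implementation.

```python
import random


def pixel_dropout(pixels, dropout_prob=0.01, drop_value=0, rng=None):
    """Replace each pixel by drop_value with probability dropout_prob (sketch)."""
    if rng is None:
        rng = random.Random()
    # Each pixel gets its own independent draw.
    return [drop_value if rng.random() < dropout_prob else p for p in pixels]


rng = random.Random(0)
pixels = list(range(10, 20))
all_dropped = pixel_dropout(pixels, dropout_prob=1.0, rng=rng)
untouched = pixel_dropout(pixels, dropout_prob=0.0, rng=rng)
some_dropped = pixel_dropout(pixels, dropout_prob=0.5, drop_value=255, rng=rng)
```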

class albumentations.augmentations.transforms.Posterize (num_bits=4, always_apply=False, p=0.5) [view source on GitHub]

Reduce the number of bits for each color channel.

Parameters:

Name Type Description
num_bits [int, int] or int, or list of ints [r, g, b], or list of ints [[r1, r2], [g1, g2], [b1, b2]]

number of high bits. If num_bits is a single value, the range will be [num_bits, num_bits]. Must be in range [0, 8]. Default: 4.

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8
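Keeping only the high bits of an 8-bit value amounts to a bit mask. The sketch below shows this for a single value in plain Python; it illustrates the idea behind num_bits and is not the library's implementation. The function name posterize_value is hypothetical.

```python
def posterize_value(value, num_bits):
    """Keep the num_bits highest bits of an 8-bit value and zero the rest."""
    if not 0 <= num_bits <= 8:
        raise ValueError("num_bits must be in range [0, 8]")
    # Example: num_bits=4 gives the mask 0b11110000.
    mask = (0xFF << (8 - num_bits)) & 0xFF
    return value & mask


four_bits = posterize_value(0b10110111, 4)
all_bits = posterize_value(0b10110111, 8)
no_bits = posterize_value(0b10110111, 0)
```

With num_bits=4, each channel can take only 16 distinct values instead of 256.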

class albumentations.augmentations.transforms.RandomBrightness (limit=0.2, always_apply=False, p=0.5) [view source on GitHub]

Randomly change brightness of the input image.

Parameters:

Name Type Description
limit [float, float] or float

factor range for changing brightness. If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomBrightnessContrast (brightness_limit=0.2, contrast_limit=0.2, brightness_by_max=True, always_apply=False, p=0.5) [view source on GitHub]

Randomly change brightness and contrast of the input image.

Parameters:

Name Type Description
brightness_limit [float, float] or float

factor range for changing brightness. If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).

contrast_limit [float, float] or float

factor range for changing contrast. If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).

brightness_by_max Boolean

If True adjust contrast by image dtype maximum, else adjust contrast by image mean.

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomContrast (limit=0.2, always_apply=False, p=0.5) [view source on GitHub]

Randomly change contrast of the input image.

Parameters:

Name Type Description
limit [float, float] or float

factor range for changing contrast. If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomFog (fog_coef_lower=0.3, fog_coef_upper=1, alpha_coef=0.08, always_apply=False, p=0.5) [view source on GitHub]

Simulates fog for the image

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
fog_coef_lower float

lower limit for fog intensity coefficient. Should be in [0, 1] range.

fog_coef_upper float

upper limit for fog intensity coefficient. Should be in [0, 1] range.

alpha_coef float

transparency of the fog circles. Should be in [0, 1] range.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomGamma (gamma_limit=(80, 120), eps=None, always_apply=False, p=0.5) [view source on GitHub]

Parameters:

Name Type Description
gamma_limit float or [float, float]

If gamma_limit is a single float value, the range will be (-gamma_limit, gamma_limit). Default: (80, 120).

eps

Deprecated.

Targets: image

Image types: uint8, float32
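The default gamma_limit of (80, 120) is easier to read as a percentage. The sketch below rests on an assumption that is not stated in this reference: the sampled value is divided by 100 and used as the exponent on intensities scaled to [0, 1]. It is plain Python with a hypothetical function name, operating on a flat list of 8-bit values.

```python
import random


def apply_gamma(pixels, gamma_limit=(80, 120), rng=None):
    """Gamma-correct a flat list of 8-bit values (sketch under stated assumptions)."""
    if rng is None:
        rng = random.Random()
    # Assumption: the limit is a percentage, so 80..120 means gamma 0.8..1.2.
    gamma = rng.uniform(*gamma_limit) / 100.0
    corrected = [round(((p / 255.0) ** gamma) * 255.0) for p in pixels]
    return corrected, gamma


rng = random.Random(0)
corrected, gamma = apply_gamma([0, 64, 128, 255], rng=rng)
```

Under that assumption, black and white stay fixed and only the mid-tones shift.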

class albumentations.augmentations.transforms.RandomGravel (gravel_roi=(0.1, 0.4, 0.9, 0.9), number_of_patches=2, always_apply=False, p=0.5) [view source on GitHub]

Add gravel.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
gravel_roi float, float, float, float

(top-left x, top-left y, bottom-right x, bottom-right y). Should be in [0, 1] range.

number_of_patches int

number of gravel patches required.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomGridShuffle (grid=(3, 3), always_apply=False, p=0.5) [view source on GitHub]

Randomly shuffle the grid's cells on the image.

Parameters:

Name Type Description
grid [int, int]

size of grid for splitting image.

Targets: image, mask, keypoints

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomRain (slant_lower=-10, slant_upper=10, drop_length=20, drop_width=1, drop_color=(200, 200, 200), blur_value=7, brightness_coefficient=0.7, rain_type=None, always_apply=False, p=0.5) [view source on GitHub]

Adds rain effects.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
slant_lower

should be in range [-20, 20].

slant_upper

should be in range [-20, 20].

drop_length

should be in range [0, 100].

drop_width

should be in range [1, 5].

drop_color list of (r, g, b)

rain lines color.

blur_value int

rainy views are blurry

brightness_coefficient float

rainy days are usually shady. Should be in range [0, 1].

rain_type

One of [None, "drizzle", "heavy", "torrential"]

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomShadow (shadow_roi=(0, 0.5, 1, 1), num_shadows_lower=1, num_shadows_upper=2, shadow_dimension=5, always_apply=False, p=0.5) [view source on GitHub]

Simulates shadows for the image

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
shadow_roi float, float, float, float

region of the image where shadows will appear (x_min, y_min, x_max, y_max). All values should be in range [0, 1].

num_shadows_lower int

Lower limit for the possible number of shadows. Should be in range [0, num_shadows_upper].

num_shadows_upper int

Upper limit for the possible number of shadows. Should be in range [num_shadows_lower, inf].

shadow_dimension int

number of edges in the shadow polygons

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomSnow (snow_point_lower=0.1, snow_point_upper=0.3, brightness_coeff=2.5, always_apply=False, p=0.5) [view source on GitHub]

Bleach out some pixel values simulating snow.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
snow_point_lower float

lower bound of the amount of snow. Should be in [0, 1] range

snow_point_upper float

upper bound of the amount of snow. Should be in [0, 1] range

brightness_coeff float

a larger number will lead to more snow on the image. Should be >= 0

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomSunFlare (flare_roi=(0, 0, 1, 0.5), angle_lower=0, angle_upper=1, num_flare_circles_lower=6, num_flare_circles_upper=10, src_radius=400, src_color=(255, 255, 255), always_apply=False, p=0.5) [view source on GitHub]

Simulates Sun Flare for the image

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
flare_roi float, float, float, float

region of the image where flare will appear (x_min, y_min, x_max, y_max). All values should be in range [0, 1].

angle_lower float

should be in range [0, angle_upper].

angle_upper float

should be in range [angle_lower, 1].

num_flare_circles_lower int

lower limit for the number of flare circles. Should be in range [0, num_flare_circles_upper].

num_flare_circles_upper int

upper limit for the number of flare circles. Should be in range [num_flare_circles_lower, inf].

src_radius int
src_color int, int, int

color of the flare

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomToneCurve (scale=0.1, always_apply=False, p=0.5) [view source on GitHub]

Randomly change the relationship between bright and dark areas of the image by manipulating its tone curve.

Parameters:

Name Type Description
scale float

standard deviation of the normal distribution. Used to sample random distances to move two control points that modify the image's curve. Values should be in range [0, 1]. Default: 0.1

Targets: image

Image types: uint8

class albumentations.augmentations.transforms.RGBShift (r_shift_limit=20, g_shift_limit=20, b_shift_limit=20, always_apply=False, p=0.5) [view source on GitHub]

Randomly shift values for each channel of the input RGB image.

Parameters:

Name Type Description
r_shift_limit [int, int] or int

range for changing values for the red channel. If r_shift_limit is a single int, the range will be (-r_shift_limit, r_shift_limit). Default: (-20, 20).

g_shift_limit [int, int] or int

range for changing values for the green channel. If g_shift_limit is a single int, the range will be (-g_shift_limit, g_shift_limit). Default: (-20, 20).

b_shift_limit [int, int] or int

range for changing values for the blue channel. If b_shift_limit is a single int, the range will be (-b_shift_limit, b_shift_limit). Default: (-20, 20).

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32
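
The range rule described above (a single int becomes a symmetric range) can be illustrated with a minimal pure-Python sketch. The helper names to_range and shift_pixel are hypothetical, and rounding and clipping to [0, 255] is an assumption for uint8 data; this is not the library's implementation.

```python
import random


def to_range(limit):
    """Expand a single number into (-limit, limit); pass pairs through."""
    if isinstance(limit, (int, float)):
        return (-limit, limit)
    low, high = limit
    return (low, high)


def shift_pixel(pixel, r_limit=20, g_limit=20, b_limit=20, rng=random):
    """Shift each channel of an (r, g, b) uint8 pixel by a sampled offset."""
    shifted = []
    for value, limit in zip(pixel, (r_limit, g_limit, b_limit)):
        low, high = to_range(limit)
        offset = rng.uniform(low, high)
        # Assumption: results are rounded and clipped to the uint8 range.
        shifted.append(int(min(255, max(0, round(value + offset)))))
    return tuple(shifted)


print(to_range(20))
print(shift_pixel((250, 128, 3)))
```

Passing a pair such as (-5, 10) to to_range leaves it unchanged, which mirrors the two accepted argument forms.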

class albumentations.augmentations.transforms.RingingOvershoot (blur_limit=(7, 15), cutoff=(0.7853981633974483, 1.5707963267948966), always_apply=False, p=0.5) [view source on GitHub]

Create ringing or overshoot artefacts by convolving the image with a 2D sinc filter.

Parameters:

Name Type Description
blur_limit int, [int, int]

maximum kernel size for sinc filter. Should be in range [3, inf). Default: (7, 15).

cutoff float, [float, float]

range to choose the cutoff frequency in radians. Should be in range (0, np.pi) Default: (np.pi / 4, np.pi / 2).

p float

probability of applying the transform. Default: 0.5.

Reference: dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter https://arxiv.org/abs/2107.10833

Targets: image

class albumentations.augmentations.transforms.Sharpen (alpha=(0.2, 0.5), lightness=(0.5, 1.0), always_apply=False, p=0.5) [view source on GitHub]

Sharpen the input image and overlay the result with the original image.

Parameters:

Name Type Description
alpha [float, float]

range to choose the visibility of the sharpened image. At 0, only the original image is visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5).

lightness [float, float]

range to choose the lightness of the sharpened image. Default: (0.5, 1.0).

p float

probability of applying the transform. Default: 0.5.

Targets: image

class albumentations.augmentations.transforms.Solarize (threshold=128, always_apply=False, p=0.5) [view source on GitHub]

Invert all pixel values above a threshold.

Parameters:

Name Type Description
threshold [int, int] or int, or [float, float] or float

range for solarizing threshold. If threshold is a single value, the range will be [threshold, threshold]. Default: 128.

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: any
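
As a rough illustration of solarization on uint8 values, the sketch below inverts every value at or above the threshold. The function name solarize is hypothetical, and two details are assumptions: that the threshold value itself is inverted, and that inversion means max_value - value.

```python
def solarize(values, threshold=128, max_value=255):
    """Invert every value at or above `threshold`; keep the rest unchanged."""
    # Assumption: the comparison is >= and inversion is max_value - value.
    return [max_value - v if v >= threshold else v for v in values]


row = [0, 100, 127, 200, 255]
print(solarize(row))
```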

class albumentations.augmentations.transforms.Spatter (mean=0.65, std=0.3, gauss_sigma=2, cutout_threshold=0.68, intensity=0.6, mode='rain', color=None, always_apply=False, p=0.5) [view source on GitHub]

Apply spatter transform. It simulates corruption which can occlude a lens in the form of rain or mud.

Parameters:

Name Type Description
mean float, or tuple of floats

Mean value of normal distribution for generating liquid layer. If single float it will be used as mean. If tuple of float mean will be sampled from range [mean[0], mean[1]). Default: (0.65).

std float, or tuple of floats

Standard deviation value of normal distribution for generating liquid layer. If single float it will be used as std. If tuple of float std will be sampled from range [std[0], std[1]). Default: (0.3).

gauss_sigma float, or tuple of floats

Sigma value for gaussian filtering of liquid layer. If single float it will be used as gauss_sigma. If tuple of float gauss_sigma will be sampled from range [sigma[0], sigma[1]). Default: (2).

cutout_threshold float, or tuple of floats

Threshold for filtering liquid layer (determines number of drops). If single float it will be used as cutout_threshold. If tuple of float cutout_threshold will be sampled from range [cutout_threshold[0], cutout_threshold[1]). Default: (0.68).

intensity float, or tuple of floats

Intensity of corruption. If single float it will be used as intensity. If tuple of float intensity will be sampled from range [intensity[0], intensity[1]). Default: (0.6).

mode string, or list of strings

Type of corruption. Currently, supported options are 'rain' and 'mud'. If a list is provided, the type of corruption will be sampled from the list. Default: ("rain").

color list of (r, g, b) or dict or None

Corruption elements color. If list uses provided list as color for specified mode. If dict uses provided color for specified mode. Color for each specified mode should be provided in dict. If None uses default colors (rain: (238, 238, 175), mud: (20, 42, 63)).

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

Reference: | https://arxiv.org/pdf/1903.12261.pdf | https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

class albumentations.augmentations.transforms.Superpixels (p_replace=0.1, n_segments=100, max_size=128, interpolation=1, always_apply=False, p=0.5) [view source on GitHub]

Transform images partially/completely to their superpixel representation. This implementation uses skimage's version of the SLIC algorithm.

Parameters:

Name Type Description
p_replace float or tuple of float

Defines for any segment the probability that the pixels within that segment are replaced by their average color (otherwise, the pixels are not changed). Examples: * A probability of 0.0 would mean, that the pixels in no segment are replaced by their average color (image is not changed at all). * A probability of 0.5 would mean, that around half of all segments are replaced by their average color. * A probability of 1.0 would mean, that all segments are replaced by their average color (resulting in a voronoi image). Behaviour based on chosen data types for this parameter: * If a float, then that float will always be used. * If tuple (a, b), then a random probability will be sampled from the interval [a, b] per image.

n_segments int, or tuple of int

Rough target number of how many superpixels to generate (the algorithm may deviate from this number). Lower value will lead to coarser superpixels. Higher values are computationally more intensive and will hence lead to a slowdown * If a single int, then that value will always be used as the number of segments. * If a tuple (a, b), then a value from the discrete interval [a..b] will be sampled per image.

max_size int or None

Maximum image size at which the augmentation is performed. If the width or height of an image exceeds this value, it will be downscaled before the augmentation so that the longest side matches max_size. This is done to speed up the process. The final output image has the same size as the input image. Note that in case p_replace is below 1.0, the down-/upscaling will affect the not-replaced pixels too. Use None to apply no down-/upscaling.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 0.5.

Targets: image

class albumentations.augmentations.transforms.TemplateTransform (templates, img_weight=0.5, template_weight=0.5, template_transform=None, name=None, always_apply=False, p=0.5) [view source on GitHub]

Apply blending of input image with specified templates

Parameters:

Name Type Description
templates numpy array or list of numpy arrays

Images as template for transform.

img_weight [float, float] or float

If single float will be used as weight for input image. If tuple of float img_weight will be in range [img_weight[0], img_weight[1]). Default: 0.5.

template_weight [float, float] or float

If single float will be used as weight for template. If tuple of float template_weight will be in range [template_weight[0], template_weight[1]). Default: 0.5.

template_transform

transformation object which could be applied to template, must produce template the same size as input image.

name string

(Optional) Name of transform, used only for deserialization.

p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32
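
The blending controlled by img_weight and template_weight can be pictured as a per-pixel weighted sum. The following is a minimal sketch on nested lists, assuming uint8-like data that is rounded and clipped to [0, max_value]; the function name blend and the max_value parameter are hypothetical and not part of the library.

```python
def blend(image, template, img_weight=0.5, template_weight=0.5, max_value=255):
    """Blend two equally sized 2D lists pixel by pixel."""
    same_rows = len(image) == len(template)
    if not same_rows or any(len(a) != len(b) for a, b in zip(image, template)):
        raise ValueError("template must have the same size as the image")
    return [
        [
            # Assumption: the weighted sum is rounded and clipped.
            min(max_value, max(0, round(p * img_weight + t * template_weight)))
            for p, t in zip(img_row, tpl_row)
        ]
        for img_row, tpl_row in zip(image, template)
    ]


print(blend([[100, 0]], [[200, 254]]))
```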

class albumentations.augmentations.transforms.ToFloat (max_value=None, always_apply=False, p=1.0) [view source on GitHub]

Divide pixel values by max_value to get a float32 output array where all values lie in the range [0, 1.0]. If max_value is None the transform will try to infer the maximum value by inspecting the data type of the input image.

See also: albumentations.augmentations.transforms.FromFloat

Parameters:

Name Type Description
max_value float

maximum possible input value. Default: None.

p float

probability of applying the transform. Default: 1.0.

Targets: image

Image types: any type
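
A minimal sketch of the scaling described above, working on a flat list of values. The table of per-dtype maximum values and the name to_float are assumptions made for illustration; the library may infer max_value differently.

```python
# Assumption: maximum values used when max_value is not given.
MAX_VALUE_BY_DTYPE = {
    "uint8": 255,
    "uint16": 65535,
    "uint32": 4294967295,
    "float32": 1.0,
}


def to_float(values, dtype, max_value=None):
    """Divide values by max_value, inferring it from the dtype name if needed."""
    if max_value is None:
        if dtype not in MAX_VALUE_BY_DTYPE:
            raise ValueError(f"cannot infer max_value for dtype {dtype!r}")
        max_value = MAX_VALUE_BY_DTYPE[dtype]
    return [v / max_value for v in values]


print(to_float([0, 51, 255], "uint8"))
print(to_float([512], "uint16", max_value=1024))
```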

class albumentations.augmentations.transforms.ToGray [view source on GitHub]

Convert the input RGB image to grayscale. If the mean pixel value for the resulting image is greater than 127, invert the resulting grayscale image.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.ToRGB (always_apply=True, p=1.0) [view source on GitHub]

Convert the input grayscale image to RGB.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 1.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.ToSepia (always_apply=False, p=0.5) [view source on GitHub]

Applies sepia filter to the input RGB image

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.UnsharpMask (blur_limit=(3, 7), sigma_limit=0.0, alpha=(0.2, 0.5), threshold=10, always_apply=False, p=0.5) [view source on GitHub]

Sharpen the input image using unsharp masking and overlay the result with the original image.

Parameters:

Name Type Description
blur_limit int, [int, int]

maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1. If set single value blur_limit will be in range (0, blur_limit). Default: (3, 7).

sigma_limit float, [float, float]

Gaussian kernel standard deviation. Must be in range [0, inf). If set single value sigma_limit will be in range (0, sigma_limit). If set to 0 sigma will be computed as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. Default: 0.

alpha float, [float, float]

range to choose the visibility of the sharpened image. At 0, only the original image is visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5).

threshold int

Value to limit sharpening only for areas with high pixel difference between original image and its smoothed version. Higher threshold means less sharpening on flat areas. Must be in range [0, 255]. Default: 10.

p float

probability of applying the transform. Default: 0.5.

Reference: arxiv.org/pdf/2107.10833.pdf

Targets: image
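
The two fallback formulas quoted in the blur_limit and sigma_limit descriptions can be evaluated directly. The sketch below transcribes them literally; the function names and the is_uint8 flag are hypothetical stand-ins for the img.dtype check.

```python
def kernel_size_from_sigma(sigma, is_uint8=True):
    """Kernel size used when blur_limit is 0, as stated in the description."""
    return round(sigma * (3 if is_uint8 else 4) * 2 + 1) + 1


def sigma_from_kernel_size(ksize):
    """Sigma used when sigma_limit is 0, as stated in the description."""
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8


print(kernel_size_from_sigma(1.0))
print(kernel_size_from_sigma(1.0, is_uint8=False))
print(sigma_from_kernel_size(7))
```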

albumentations.augmentations.utils

def albumentations.augmentations.utils.ensure_contiguous (func) [view source on GitHub]

Ensure that input img is contiguous.

def albumentations.augmentations.utils.get_opencv_dtype_from_numpy (value) [view source on GitHub]

Return the OpenCV dtype that corresponds to a numpy dtype. value is the dtype of the input numpy array; the return value is the matching OpenCV dtype.

def albumentations.augmentations.utils.preserve_channel_dim (func) [view source on GitHub]

Preserve dummy channel dim.

def albumentations.augmentations.utils.preserve_shape (func) [view source on GitHub]

Preserve shape of the image

albumentations.core special

albumentations.core.bbox_utils

class albumentations.core.bbox_utils.BboxParams (format, label_fields=None, min_area=0.0, min_visibility=0.0, min_width=0.0, min_height=0.0, check_each_transform=True) [view source on GitHub]

Parameters of bounding boxes

Parameters:

Name Type Description
format str

format of bounding boxes. Should be 'coco', 'pascal_voc', 'albumentations' or 'yolo'.

The coco format [x_min, y_min, width, height], e.g. [97, 12, 150, 200]. The pascal_voc format [x_min, y_min, x_max, y_max], e.g. [97, 12, 247, 212]. The albumentations format is like pascal_voc, but normalized, in other words: [x_min, y_min, x_max, y_max], e.g. [0.2, 0.3, 0.4, 0.5]. The yolo format [x, y, width, height], e.g. [0.1, 0.2, 0.3, 0.4]; x, y - normalized bbox center; width, height - normalized bbox width and height.

label_fields list

list of fields that are joined with boxes, e.g. labels. Should be same type as boxes.

min_area float

minimum area of a bounding box. All bounding boxes whose visible area in pixels is less than this value will be removed. Default: 0.0.

min_visibility float

minimum fraction of a bounding box's area that must remain visible for the box to be kept in the list. Default: 0.0.

min_width float

Minimum width of a bounding box. All bounding boxes whose width is less than this value will be removed. Default: 0.0.

min_height float

Minimum height of a bounding box. All bounding boxes whose height is less than this value will be removed. Default: 0.0.

check_each_transform bool

if True, then bboxes will be checked after each dual transform. Default: True

def albumentations.core.bbox_utils.calculate_bbox_area (bbox, rows, cols) [view source on GitHub]

Calculate the area of a bounding box in (fractional) pixels.

Parameters:

Name Type Description
bbox Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

A bounding box (x_min, y_min, x_max, y_max).

rows int

Image height.

cols int

Image width.

Returns:

Type Description
float

Area in (fractional) pixels of the (denormalized) bounding box.
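
A minimal sketch of this computation, assuming the box is given in normalized (x_min, y_min, x_max, y_max) form, optionally followed by extra fields such as a label. The name bbox_area is hypothetical.

```python
def bbox_area(bbox, rows, cols):
    """Area in pixels of a normalized (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = bbox[:4]
    width_px = (x_max - x_min) * cols   # x is scaled by the image width
    height_px = (y_max - y_min) * rows  # y is scaled by the image height
    return width_px * height_px


print(bbox_area((0.2, 0.3, 0.4, 0.5), rows=100, cols=200))
```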

def albumentations.core.bbox_utils.check_bbox (bbox) [view source on GitHub]

Check if bbox boundaries are in the range [0, 1] and minimums are less than maximums

def albumentations.core.bbox_utils.check_bboxes (bboxes) [view source on GitHub]

Check if bboxes boundaries are in the range [0, 1] and minimums are less than maximums
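
The two checks can be sketched as follows. This is an illustrative pure-Python version that raises ValueError on the first violation; the exact error messages and the choice of exception are assumptions.

```python
def check_bbox(bbox):
    """Raise ValueError unless bbox is a valid normalized box."""
    names = ("x_min", "y_min", "x_max", "y_max")
    for name, value in zip(names, bbox[:4]):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    x_min, y_min, x_max, y_max = bbox[:4]
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")


def check_bboxes(bboxes):
    """Apply check_bbox to every box in a sequence."""
    for bbox in bboxes:
        check_bbox(bbox)


check_bboxes([(0.1, 0.2, 0.3, 0.4), (0.0, 0.0, 1.0, 1.0, "label")])
```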

def albumentations.core.bbox_utils.convert_bbox_from_albumentations (bbox, target_format, rows, cols, check_validity=False) [view source on GitHub]

Convert a bounding box from the format used by albumentations to a format, specified in target_format.

Parameters:

Name Type Description
bbox Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

An albumentations bounding box (x_min, y_min, x_max, y_max).

target_format str

required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.

rows int

Image height.

cols int

Image width.

check_validity bool

Check if all boxes are valid boxes.

Returns:

Type Description
Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

tuple: A bounding box.

Note: The coco format of a bounding box looks like [x_min, y_min, width, height], e.g. [97, 12, 150, 200]. The pascal_voc format of a bounding box looks like [x_min, y_min, x_max, y_max], e.g. [97, 12, 247, 212]. The yolo format of a bounding box looks like [x, y, width, height], e.g. [0.3, 0.1, 0.05, 0.07].

Exceptions:

Type Description
ValueError

if target_format is not equal to coco, pascal_voc or yolo.
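
To make the format definitions in the note concrete, here is a minimal pure-Python sketch of the conversion out of the normalized albumentations format. The function name from_albumentations is hypothetical, validity checking is omitted, and the library's own rounding and edge-case handling may differ.

```python
def from_albumentations(bbox, target_format, rows, cols):
    """Convert a normalized (x_min, y_min, x_max, y_max) box to target_format."""
    if target_format not in {"coco", "pascal_voc", "yolo"}:
        raise ValueError(f"Unknown target_format {target_format!r}")
    x_min, y_min, x_max, y_max = bbox[:4]
    extra = tuple(bbox[4:])  # labels or other attached fields pass through
    if target_format == "yolo":
        # yolo stays normalized: box center plus width and height.
        w, h = x_max - x_min, y_max - y_min
        return (x_min + w / 2, y_min + h / 2, w, h) + extra
    # coco and pascal_voc use pixel coordinates.
    x_min, x_max = x_min * cols, x_max * cols
    y_min, y_max = y_min * rows, y_max * rows
    if target_format == "coco":
        return (x_min, y_min, x_max - x_min, y_max - y_min) + extra
    return (x_min, y_min, x_max, y_max) + extra


box = (0.2, 0.3, 0.4, 0.5)
for fmt in ("coco", "pascal_voc", "yolo"):
    print(fmt, from_albumentations(box, fmt, rows=100, cols=200))
```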

def albumentations.core.bbox_utils.convert_bbox_to_albumentations (bbox, source_format, rows, cols, check_validity=False) [view source on GitHub]

Convert a bounding box from a format specified in source_format to the format used by albumentations: normalized coordinates of top-left and bottom-right corners of the bounding box in a form of (x_min, y_min, x_max, y_max) e.g. (0.15, 0.27, 0.67, 0.5).

Parameters:

Name Type Description
bbox Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

A bounding box tuple.

source_format str

format of the bounding box. Should be 'coco', 'pascal_voc', or 'yolo'.

check_validity bool

Check if all boxes are valid boxes.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

tuple: A bounding box (x_min, y_min, x_max, y_max).

Note: The coco format of a bounding box looks like (x_min, y_min, width, height), e.g. (97, 12, 150, 200). The pascal_voc format of a bounding box looks like (x_min, y_min, x_max, y_max), e.g. (97, 12, 247, 212). The yolo format of a bounding box looks like (x, y, width, height), e.g. (0.3, 0.1, 0.05, 0.07); where x, y coordinates of the center of the box, all values normalized to 1 by image height and width.

Exceptions:

Type Description
ValueError

if source_format is not equal to coco, pascal_voc, or yolo.

ValueError

If the format is yolo and not all values are in range (0, 1).
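
The conversion into the normalized format can be sketched in a few lines of plain Python. The function name to_albumentations is hypothetical and the validity checks (including the YOLO range check) are left out, so treat this as an illustration of the arithmetic rather than the library's implementation.

```python
def to_albumentations(bbox, source_format, rows, cols):
    """Convert a box in source_format to normalized (x_min, y_min, x_max, y_max)."""
    if source_format not in {"coco", "pascal_voc", "yolo"}:
        raise ValueError(f"Unknown source_format {source_format!r}")
    a, b, c, d = bbox[:4]
    extra = tuple(bbox[4:])  # labels or other attached fields pass through
    if source_format == "yolo":
        # Already normalized: (center x, center y, width, height).
        return (a - c / 2, b - d / 2, a + c / 2, b + d / 2) + extra
    if source_format == "coco":
        x_min, y_min, x_max, y_max = a, b, a + c, b + d
    else:  # pascal_voc
        x_min, y_min, x_max, y_max = a, b, c, d
    return (x_min / cols, y_min / rows, x_max / cols, y_max / rows) + extra


# The coco and pascal_voc examples from the note describe the same box.
print(to_albumentations((97, 12, 150, 200), "coco", rows=400, cols=500))
print(to_albumentations((97, 12, 247, 212), "pascal_voc", rows=400, cols=500))
```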

def albumentations.core.bbox_utils.convert_bboxes_from_albumentations (bboxes, target_format, rows, cols, check_validity=False) [view source on GitHub]

Convert a list of bounding boxes from the format used by albumentations to a format, specified in target_format.

Parameters:

Name Type Description
bboxes Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

List of albumentations bounding boxes (x_min, y_min, x_max, y_max).

target_format str

required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.

rows int

Image height.

cols int

Image width.

check_validity bool

Check if all boxes are valid boxes.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

List of bounding boxes.

def albumentations.core.bbox_utils.convert_bboxes_to_albumentations (bboxes, source_format, rows, cols, check_validity=False) [view source on GitHub]

Convert a list of bounding boxes from a format specified in source_format to the format used by albumentations.

def albumentations.core.bbox_utils.denormalize_bbox (bbox, rows, cols) [view source on GitHub]

Denormalize coordinates of a bounding box. Multiply x-coordinates by image width and y-coordinates by image height. This is the inverse operation of albumentations.core.bbox_utils.normalize_bbox.

Parameters:

Name Type Description
bbox ~TBox

Normalized bounding box (x_min, y_min, x_max, y_max).

rows int

Image height.

cols int

Image width.

Returns:

Type Description
~TBox

Denormalized bounding box (x_min, y_min, x_max, y_max).

Exceptions:

Type Description
ValueError

If rows or cols is less than or equal to zero

def albumentations.core.bbox_utils.denormalize_bboxes (bboxes, rows, cols) [view source on GitHub]

Denormalize a list of bounding boxes.

Parameters:

Name Type Description
bboxes Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Normalized bounding boxes [(x_min, y_min, x_max, y_max)].

rows int

Image height.

cols int

Image width.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

List: Denormalized bounding boxes [(x_min, y_min, x_max, y_max)].

def albumentations.core.bbox_utils.filter_bboxes (bboxes, rows, cols, min_area=0.0, min_visibility=0.0, min_width=0.0, min_height=0.0) [view source on GitHub]

Remove bounding boxes that either lie outside of the visible area by more than min_visibility or whose area in pixels is under the threshold set by min_area. Also crops boxes to the final image size.

Parameters:

Name Type Description
bboxes Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

List of albumentations bounding boxes (x_min, y_min, x_max, y_max).

rows int

Image height.

cols int

Image width.

min_area float

Minimum area of a bounding box. All bounding boxes whose visible area in pixels is less than this value will be removed. Default: 0.0.

min_visibility float

Minimum fraction of a bounding box's area that must remain visible for the box to be kept in the list. Default: 0.0.

min_width float

Minimum width of a bounding box. All bounding boxes whose width is less than this value will be removed. Default: 0.0.

min_height float

Minimum height of a bounding box. All bounding boxes whose height is less than this value will be removed. Default: 0.0.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

List of bounding boxes.
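
A minimal sketch of this filtering logic on normalized boxes, written in plain Python. The order of the checks and the handling of zero-area boxes are assumptions; the library's implementation may differ in those details.

```python
def bbox_area(bbox, rows, cols):
    x_min, y_min, x_max, y_max = bbox[:4]
    return (x_max - x_min) * cols * (y_max - y_min) * rows


def filter_bboxes(bboxes, rows, cols, min_area=0.0, min_visibility=0.0,
                  min_width=0.0, min_height=0.0):
    """Clip boxes to the image and drop those that fail any threshold."""
    kept = []
    for bbox in bboxes:
        full_area = bbox_area(bbox, rows, cols)
        # Crop the box to the image by clipping coordinates to [0, 1].
        clipped = tuple(min(1.0, max(0.0, v)) for v in bbox[:4])
        visible_area = bbox_area(clipped, rows, cols)
        width = (clipped[2] - clipped[0]) * cols
        height = (clipped[3] - clipped[1]) * rows
        if full_area <= 0 or visible_area <= 0:
            continue  # assumption: degenerate boxes are dropped
        if visible_area < min_area or visible_area / full_area < min_visibility:
            continue
        if width < min_width or height < min_height:
            continue
        kept.append(clipped + tuple(bbox[4:]))
    return kept


boxes = [(0.1, 0.1, 0.5, 0.5), (0.8, 0.8, 1.2, 1.2), (-0.5, 0.0, 0.05, 1.0)]
print(filter_bboxes(boxes, rows=100, cols=100, min_visibility=0.3))
```

In this example the second box keeps about a quarter of its area after clipping and the third about a tenth, so both fall below min_visibility=0.3 and only the first box remains.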

def albumentations.core.bbox_utils.filter_bboxes_by_visibility (original_shape, bboxes, transformed_shape, transformed_bboxes, threshold=0.0, min_area=0.0) [view source on GitHub]

Filter bounding boxes and return only those boxes whose visibility after transformation is above the threshold and whose area in pixels is more than min_area.

Parameters:

Name Type Description
original_shape Sequence[int]

Original image shape (height, width, ...).

bboxes Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Original bounding boxes [(x_min, y_min, x_max, y_max)].

transformed_shape Sequence[int]

Transformed image shape (height, width).

transformed_bboxes Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Transformed bounding boxes [(x_min, y_min, x_max, y_max)].

threshold float

visibility threshold. Should be a value in the range [0.0, 1.0].

min_area float

Minimal area threshold.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Filtered bounding boxes [(x_min, y_min, x_max, y_max)].

def albumentations.core.bbox_utils.normalize_bbox (bbox, rows, cols) [view source on GitHub]

Normalize coordinates of a bounding box. Divide x-coordinates by image width and y-coordinates by image height.

Parameters:

Name Type Description
bbox ~TBox

Denormalized bounding box (x_min, y_min, x_max, y_max).

rows int

Image height.

cols int

Image width.

Returns:

Type Description
~TBox

Normalized bounding box (x_min, y_min, x_max, y_max).

Exceptions:

Type Description
ValueError

If rows or cols is less than or equal to zero
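
Normalization and its inverse reduce to a division or multiplication by the image size. The following pure-Python sketch mirrors the documented behaviour, including the ValueError for non-positive dimensions; the private helper _check_size is hypothetical.

```python
def _check_size(rows, cols):
    if rows <= 0:
        raise ValueError("Argument rows must be a positive integer")
    if cols <= 0:
        raise ValueError("Argument cols must be a positive integer")


def normalize_bbox(bbox, rows, cols):
    """Divide x by the image width (cols) and y by the image height (rows)."""
    _check_size(rows, cols)
    x_min, y_min, x_max, y_max = bbox[:4]
    return (x_min / cols, y_min / rows, x_max / cols, y_max / rows) + tuple(bbox[4:])


def denormalize_bbox(bbox, rows, cols):
    """Multiply x by the image width (cols) and y by the image height (rows)."""
    _check_size(rows, cols)
    x_min, y_min, x_max, y_max = bbox[:4]
    return (x_min * cols, y_min * rows, x_max * cols, y_max * rows) + tuple(bbox[4:])


normalized = normalize_bbox((97, 12, 247, 212), rows=400, cols=500)
print(normalized)
print(denormalize_bbox(normalized, rows=400, cols=500))
```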

def albumentations.core.bbox_utils.normalize_bboxes (bboxes, rows, cols) [view source on GitHub]

Normalize a list of bounding boxes.

Parameters:

Name Type Description
bboxes Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Denormalized bounding boxes [(x_min, y_min, x_max, y_max)].

rows int

Image height.

cols int

Image width.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Normalized bounding boxes [(x_min, y_min, x_max, y_max)].

def albumentations.core.bbox_utils.union_of_bboxes (height, width, bboxes, erosion_rate=0.0) [view source on GitHub]

Calculate union of bounding boxes.

Parameters:

Name Type Description
height int

Height of image or space.

width int

Width of image or space.

bboxes Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

List like bounding boxes. Format is [(x_min, y_min, x_max, y_max)].

erosion_rate float

How much each bounding box can be shrunk, useful for erosive cropping. Set this in range [0, 1]. 0 will not be erosive at all; at 1.0 any bbox can lose its entire area.

Returns:

Type Description
Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

tuple: A bounding box (x_min, y_min, x_max, y_max).
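
The union with optional erosion can be sketched as follows. This is an illustrative version that assumes box coordinates are expressed in the same units as height and width; how the library initializes and clamps the result is not confirmed here.

```python
def union_of_bboxes(height, width, bboxes, erosion_rate=0.0):
    """Smallest box covering all (optionally eroded) boxes."""
    x1, y1 = width, height  # start from an "empty" union
    x2, y2 = 0, 0
    for bbox in bboxes:
        x_min, y_min, x_max, y_max = bbox[:4]
        w, h = x_max - x_min, y_max - y_min
        # Each side may move inwards by erosion_rate times the box size.
        x1 = min(x1, x_min + erosion_rate * w)
        y1 = min(y1, y_min + erosion_rate * h)
        x2 = max(x2, x_max - erosion_rate * w)
        y2 = max(y2, y_max - erosion_rate * h)
    return x1, y1, x2, y2


boxes = [(10, 10, 20, 20), (15, 5, 30, 25)]
print(union_of_bboxes(100, 100, boxes))
print(union_of_bboxes(100, 100, boxes, erosion_rate=0.2))
```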

albumentations.core.composition

class albumentations.core.composition.Compose (transforms, bbox_params=None, keypoint_params=None, additional_targets=None, p=1.0, is_check_shapes=True) [view source on GitHub]

Compose transforms and handle all transformations regarding bounding boxes

Parameters:

Name Type Description
transforms list

list of transformations to compose.

bbox_params BboxParams

Parameters for bounding boxes transforms

keypoint_params KeypointParams

Parameters for keypoints transforms

additional_targets dict

Dict with keys - new target name, values - old target name. ex: {'image2': 'image'}

p float

probability of applying all list of transforms. Default: 1.0.

is_check_shapes bool

If True, the shape consistency of images/mask/masks is checked on each call. To disable this check, pass False (do this only if you are sure your data is consistent).

class albumentations.core.composition.OneOf (transforms, p=0.5) [view source on GitHub]

Select one of the transforms to apply. The selected transform will be called with force_apply=True. Transform probabilities will be normalized so that they sum to 1, so in this case transform probabilities work as weights.

Parameters:

Name Type Description
transforms list

list of transformations to compose.

p float

probability of applying selected transform. Default: 0.5.
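
The weighting behaviour described above can be pictured with a small stand-alone sketch: the individual p values are normalized into weights, and the block's own p decides whether anything is selected at all. The function names and the use of plain strings in place of transform objects are hypothetical.

```python
import random


def normalized_weights(probabilities):
    """Scale per-transform probabilities so that they sum to 1."""
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("at least one probability must be positive")
    return [p / total for p in probabilities]


def pick_one(transforms, probabilities, p=0.5, rng=random):
    """Return one transform, or None when the whole block is skipped."""
    if rng.random() >= p:
        return None
    weights = normalized_weights(probabilities)
    return rng.choices(transforms, weights=weights, k=1)[0]


print(normalized_weights([0.2, 0.6]))
print(pick_one(["blur", "noise"], [0.2, 0.6], p=1.0))
```

With probabilities 0.2 and 0.6, the second transform is chosen about three times as often as the first, regardless of the absolute values.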

class albumentations.core.composition.OneOrOther (first=None, second=None, transforms=None, p=0.5) [view source on GitHub]

Select one or another transform to apply. Selected transform will be called with force_apply=True.

class albumentations.core.composition.PerChannel (transforms, channels=None, p=0.5) [view source on GitHub]

Apply transformations per-channel

Parameters:

Name Type Description
transforms list

list of transformations to compose.

channels sequence

channels to apply the transform to. Pass None to apply to all. Default: None (apply to all)

p float

probability of applying the transform. Default: 0.5.

class albumentations.core.composition.Sequential (transforms, p=0.5) [view source on GitHub]

Sequentially applies all transforms to targets.

Note: This transform is not intended to be a replacement for Compose. Instead, it should be used inside Compose the same way OneOf or OneOrOther are used. For instance, you can combine OneOf with Sequential to create an augmentation pipeline that contains multiple sequences of augmentations and applies one randomly chosen sequence to input data (see the Examples section for an example definition of such a pipeline).

Examples:

>>> import albumentations as A
>>> transform = A.Compose([
...    A.OneOf([
...        A.Sequential([
...            A.HorizontalFlip(p=0.5),
...            A.ShiftScaleRotate(p=0.5),
...        ]),
...        A.Sequential([
...            A.VerticalFlip(p=0.5),
...            A.RandomBrightnessContrast(p=0.5),
...        ]),
...    ], p=1)
... ])

class albumentations.core.composition.SomeOf (transforms, n, replace=True, p=1) [view source on GitHub]

Select N transforms to apply. The selected transforms will be called with force_apply=True. Transform probabilities will be normalized so that they sum to 1, so in this case transform probabilities work as weights.

Parameters:

Name Type Description
transforms list

list of transformations to compose.

n int

number of transforms to apply.

replace bool

Whether the transforms are sampled with replacement (True) or without replacement (False). Default: True.

p float

probability of applying selected transform. Default: 1.

albumentations.core.keypoints_utils

class albumentations.core.keypoints_utils.KeypointParams (format, label_fields=None, remove_invisible=True, angle_in_degrees=True, check_each_transform=True) [view source on GitHub]

Parameters of keypoints

Parameters:

Name Type Description
format str

format of keypoints. Should be one of 'xy', 'yx', 'xya', 'xys', 'xyas', 'xysa'.

x - X coordinate,

y - Y coordinate

s - Keypoint scale

a - Keypoint orientation in radians or degrees (depending on KeypointParams.angle_in_degrees)

label_fields list

list of fields that are joined with keypoints, e.g. labels. Should be same type as keypoints.

remove_invisible bool

to remove invisible points after transform or not

angle_in_degrees bool

angle in degrees or radians in 'xya', 'xyas', 'xysa' keypoints

check_each_transform bool

if True, then keypoints will be checked after each dual transform. Default: True
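
To show how the format string maps onto the values of a keypoint, here is a minimal sketch that reads a keypoint into named fields. The function name parse_keypoint, the output dictionary, the conversion of angles to radians, and the default of 0.0 for a missing angle or scale are all assumptions made for illustration.

```python
import math

KEYPOINT_FORMATS = {"xy", "yx", "xya", "xys", "xyas", "xysa"}


def parse_keypoint(keypoint, fmt, angle_in_degrees=True):
    """Map keypoint values to x, y, angle (radians) and scale by format letters."""
    if fmt not in KEYPOINT_FORMATS:
        raise ValueError(f"Unknown keypoint format {fmt!r}")
    # Each letter of the format names the value at the same position;
    # any values after the format's length (e.g. labels) are ignored here.
    fields = dict(zip(fmt, keypoint))
    angle = fields.get("a", 0.0)
    if angle_in_degrees:
        angle = math.radians(angle)
    return {
        "x": fields["x"],
        "y": fields["y"],
        "angle": angle,
        "scale": fields.get("s", 0.0),
    }


print(parse_keypoint((10, 20), "yx"))
print(parse_keypoint((1, 2, 3, 180), "xysa"))
```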

def albumentations.core.keypoints_utils.check_keypoint (kp, rows, cols) [view source on GitHub]

Check if keypoint coordinates are less than the image dimensions

def albumentations.core.keypoints_utils.check_keypoints (keypoints, rows, cols) [view source on GitHub]

Check that the coordinates of all keypoints lie within the image dimensions.
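
A bounds check of this kind can be sketched as follows. The helper names are hypothetical, and the half-open ranges `[0, cols)` for x and `[0, rows)` for y are an assumption about what "within the image" means, not a statement of the library's exact rule.

```python
def assert_keypoint_inside(kp, rows, cols):
    """Raise ValueError if the keypoint's x or y falls outside the image."""
    x, y = kp[:2]
    for name, value, size in (("x", x, cols), ("y", y, rows)):
        if not 0 <= value < size:
            raise ValueError(f"{name}={value} is outside [0, {size}) for keypoint {kp}")


def keypoints_inside(keypoints, rows, cols):
    """Return True when every keypoint passes the bounds check."""
    try:
        for kp in keypoints:
            assert_keypoint_inside(kp, rows, cols)
    except ValueError:
        return False
    return True
```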

albumentations.core.serialization

class albumentations.core.serialization.Serializable [view source on GitHub]

albumentations.core.serialization.Serializable.to_dict (self, on_not_implemented_error='raise')

Take a transform pipeline and convert it to a serializable representation that uses only standard python data types: dictionaries, lists, strings, integers, and floats.

Parameters:

Name Type Description
self

A transform that should be serialized. If the transform doesn't implement the to_dict method and on_not_implemented_error equals 'raise', then NotImplementedError is raised. If on_not_implemented_error equals 'warn', then NotImplementedError will be ignored but no transform parameters will be serialized.

on_not_implemented_error str

'raise' or 'warn'.

class albumentations.core.serialization.SerializableMeta [view source on GitHub]

A metaclass that is used to register classes in SERIALIZABLE_REGISTRY or NON_SERIALIZABLE_REGISTRY so they can be found later, by their full class names, while deserializing a transformation pipeline.

albumentations.core.serialization.SerializableMeta.__new__ (mcs, name, bases, *args, **kwargs) special staticmethod

Create and return a new object. See help(type) for accurate signature.
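
The registration idea can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `REGISTRY`, `RegisteringMeta`, `Transform`, `Blur`, the `_abstract` flag and the `"class"` dictionary key are all hypothetical names chosen for this example.

```python
REGISTRY = {}


def full_name(cls):
    """Build the lookup key from the module path and the class name."""
    return f"{cls.__module__}.{cls.__qualname__}"


class RegisteringMeta(type):
    """Metaclass that records every concrete class under its full name."""

    def __new__(mcs, name, bases, namespace, **kwargs):
        cls = super().__new__(mcs, name, bases, namespace, **kwargs)
        # Classes that declare _abstract = True in their own body are skipped.
        if not namespace.get("_abstract", False):
            REGISTRY[full_name(cls)] = cls
        return cls


class Transform(metaclass=RegisteringMeta):
    _abstract = True

    def __init__(self, **params):
        self.params = params

    def to_dict(self):
        return {"class": full_name(type(self)), "params": dict(self.params)}


class Blur(Transform):
    pass


def from_dict(data):
    """Look the class up by its full name and rebuild the instance."""
    cls = REGISTRY[data["class"]]
    return cls(**data["params"])
```

Because registration happens when the class body is executed, a class only becomes restorable after its module has been imported.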

def albumentations.core.serialization.from_dict (transform_dict, nonserializable=None, lambda_transforms='deprecated') [view source on GitHub]

Parameters:

Name Type Description
transform_dict Dict[str, Any]

A dictionary with serialized transform pipeline.

nonserializable Optional[Dict[str, Any]]

A dictionary that contains non-serializable transforms. This dictionary is required when you are restoring a pipeline that contains non-serializable transforms. Keys in that dictionary should match the name arguments of the respective transforms in the serialized pipeline.

lambda_transforms Union[Dict[str, Any], NoneType, str]

Deprecated. Use 'nonserializable' instead.
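
The role of the nonserializable dictionary can be sketched as a lookup by name. `resolve_transform` and the entry keys `"name"` and `"nonserializable"` are hypothetical and exist only for this illustration; the real serialized layout may differ.

```python
def resolve_transform(entry, nonserializable=None):
    """Return the live object for one serialized pipeline entry.

    Entries flagged as non-serializable carry only a name, so the caller
    has to supply the matching object through the `nonserializable` dict.
    """
    if not entry.get("nonserializable", False):
        # In this sketch a regular entry is simply returned unchanged.
        return entry
    name = entry["name"]
    if nonserializable is None or name not in nonserializable:
        raise ValueError(f"no object supplied for non-serializable transform {name!r}")
    return nonserializable[name]
```

The key point is that the dictionary key must equal the name stored in the serialized pipeline; otherwise the object cannot be matched to its place.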

def albumentations.core.serialization.load (filepath, data_format='json', nonserializable=None, lambda_transforms='deprecated') [view source on GitHub]

Load a serialized pipeline from a json or yaml file and construct a transform pipeline.

Parameters:

Name Type Description
filepath str

Filepath to read from.

data_format str

Serialization format. Should be either 'json' or 'yaml'.

nonserializable Optional[Dict[str, Any]]

A dictionary that contains non-serializable transforms. This dictionary is required when you are restoring a pipeline that contains non-serializable transforms. Keys in that dictionary should match the name arguments of the respective transforms in the serialized pipeline.

lambda_transforms Union[Dict[str, Any], NoneType, str]

Deprecated. Use 'nonserializable' instead.

def albumentations.core.serialization.register_additional_transforms () [view source on GitHub]

Register transforms that are not imported directly into the albumentations module.

def albumentations.core.serialization.save (transform, filepath, data_format='json', on_not_implemented_error='raise') [view source on GitHub]

Take a transform pipeline, serialize it and save a serialized version to a file using either json or yaml format.

Parameters:

Name Type Description
transform Serializable

Transform to serialize.

filepath str

Filepath to write to.

data_format str

Serialization format. Should be either 'json' or 'yaml'.

on_not_implemented_error str

Parameter that describes what to do if a transform doesn't implement the to_dict method. If 'raise', then NotImplementedError is raised; if 'warn', then the exception will be ignored and no transform arguments will be saved.
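
The save and load pair amounts to writing a plain dictionary to disk and reading it back. The sketch below shows that round trip for an already serialized pipeline; `save_pipeline` and `load_pipeline` are hypothetical helpers, and the YAML branch assumes the third-party PyYAML package is installed.

```python
import json


def save_pipeline(pipeline_dict, filepath, data_format="json"):
    """Write a serialized pipeline (plain dict) to a json or yaml file."""
    if data_format == "json":
        with open(filepath, "w") as f:
            json.dump(pipeline_dict, f, indent=2)
    elif data_format == "yaml":
        import yaml  # PyYAML, imported lazily because it is optional

        with open(filepath, "w") as f:
            yaml.safe_dump(pipeline_dict, f)
    else:
        raise ValueError(f"data_format must be 'json' or 'yaml', got {data_format!r}")


def load_pipeline(filepath, data_format="json"):
    """Read a serialized pipeline back into a plain dict."""
    if data_format == "json":
        with open(filepath) as f:
            return json.load(f)
    if data_format == "yaml":
        import yaml

        with open(filepath) as f:
            return yaml.safe_load(f)
    raise ValueError(f"data_format must be 'json' or 'yaml', got {data_format!r}")
```

The same data_format has to be passed to both calls, since the file is not inspected to detect its format in this sketch.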

def albumentations.core.serialization.to_dict (transform, on_not_implemented_error='raise') [view source on GitHub]

Take a transform pipeline and convert it to a serializable representation that uses only standard python data types: dictionaries, lists, strings, integers, and floats.

Parameters:

Name Type Description
transform Serializable

A transform that should be serialized. If the transform doesn't implement the to_dict method and on_not_implemented_error equals 'raise', then NotImplementedError is raised. If on_not_implemented_error equals 'warn', then NotImplementedError will be ignored but no transform parameters will be serialized.

on_not_implemented_error str

'raise' or 'warn'.
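
The difference between the two modes can be sketched like this. `serialize`, `Flip` and `Opaque` are hypothetical stand-ins, and returning an empty dict in 'warn' mode is an assumption used to represent "no transform parameters will be serialized".

```python
import warnings


class Flip:
    def to_dict(self):
        return {"transform": "Flip", "p": 0.5}


class Opaque:
    """Stand-in for a transform that cannot describe itself."""

    def to_dict(self):
        raise NotImplementedError("Opaque has no serializable parameters")


def serialize(transform, on_not_implemented_error="raise"):
    """Serialize a transform, either raising or warning on failure."""
    if on_not_implemented_error not in ("raise", "warn"):
        raise ValueError("on_not_implemented_error must be 'raise' or 'warn'")
    try:
        return transform.to_dict()
    except NotImplementedError:
        if on_not_implemented_error == "raise":
            raise
        # 'warn' mode: report the problem and continue without parameters.
        warnings.warn(f"{type(transform).__name__} could not be serialized")
        return {}
```

In 'warn' mode the output is still valid but lossy, so a pipeline restored from it will not reproduce the skipped transform's settings.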

albumentations.core.transforms_interface

class albumentations.core.transforms_interface.BasicTransform (always_apply=False, p=0.5) [view source on GitHub]

albumentations.core.transforms_interface.BasicTransform.add_targets (self, additional_targets)

Add targets that should be transformed the same way as one of the existing targets, e.g. {'target_image': 'image'} or {'obj1_mask': 'mask', 'obj2_mask': 'mask'}. The input must contain at least one object with the key 'image'.

Parameters:

Name Type Description
additional_targets Dict[str, str]

keys are the new target names, values are the existing target names they should follow, e.g. {'image2': 'image'}.
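
The mapping can be pictured as a dispatch table from input keys to target kinds. The sketch below is an assumption-laden illustration: `apply_with_targets` and the `handlers` dictionary are hypothetical and do not mirror the library's internals.

```python
def apply_with_targets(data, handlers, additional_targets=None):
    """Apply handlers[kind] to every input, resolving extra keys by mapping.

    data: dict of inputs, must contain the key 'image'.
    handlers: dict such as {'image': fn, 'mask': fn}.
    additional_targets: dict of new key -> existing kind.
    """
    if "image" not in data:
        raise ValueError("at least one input with the key 'image' is required")
    # Built-in kinds map to themselves; extra keys map to an existing kind.
    key_to_kind = {kind: kind for kind in handlers}
    key_to_kind.update(additional_targets or {})
    result = {}
    for key, value in data.items():
        kind = key_to_kind.get(key)
        # Unknown keys are passed through untouched in this sketch.
        result[key] = handlers[kind](value) if kind in handlers else value
    return result
```

With `additional_targets={'image2': 'image'}`, the value stored under `image2` goes through the same function as `image`.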

class albumentations.core.transforms_interface.DualTransform [view source on GitHub]

Transform for segmentation task.

class albumentations.core.transforms_interface.ImageOnlyTransform [view source on GitHub]

Transform applied to image only.

class albumentations.core.transforms_interface.NoOp [view source on GitHub]

Does nothing

def albumentations.core.transforms_interface.to_tuple (param, low=None, bias=None) [view source on GitHub]

Convert input argument to min-max tuple

Parameters:

Name Type Description
param scalar, tuple or list of 2+ elements

Input value. If the value is a scalar, the return value is (offset - value, offset + value). If the value is a tuple, the return value is value + offset (broadcast over the elements). Here offset refers to the bias argument.

low

Second element of tuple can be passed as optional argument

bias

An offset factor added to each element
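
The documented behaviour can be sketched as below. `to_range` is a hypothetical re-implementation written from the parameter descriptions above; treating low and bias as mutually exclusive and ordering the pair when low is given are assumptions, not confirmed library behaviour.

```python
from collections.abc import Sequence


def to_range(param, low=None, bias=None):
    """Turn a scalar or a sequence into a (min, max) style tuple."""
    if low is not None and bias is not None:
        # Assumption: the two options are not meant to be combined.
        raise ValueError("pass either low or bias, not both")
    if isinstance(param, (int, float)):
        if low is None:
            # Symmetric range around zero.
            result = (-param, param)
        else:
            # Assumption: the smaller value comes first.
            result = (min(low, param), max(low, param))
    elif isinstance(param, Sequence) and not isinstance(param, str):
        result = tuple(param)
    else:
        raise TypeError("param must be a scalar, tuple or list")
    if bias is not None:
        # The offset is added to each element.
        result = tuple(bias + x for x in result)
    return result
```

For instance, a scalar 0.5 with bias=1 yields a range centred on 1, which is a convenient form for multiplicative factors.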

albumentations.imgaug special

albumentations.imgaug.transforms

class albumentations.imgaug.transforms.IAAAdditiveGaussianNoise (loc=0, scale=(2.5500000000000003, 12.75), per_channel=False, always_apply=False, p=0.5) [view source on GitHub]

Add gaussian noise to the input image.

This augmentation is deprecated. Please use GaussNoise instead.

Parameters:

Name Type Description
loc int

mean of the normal distribution that generates the noise. Default: 0.

scale [float, float]

standard deviation of the normal distribution that generates the noise. Default: (0.01 * 255, 0.05 * 255).

p float

probability of applying the transform. Default: 0.5.

Targets: image
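
The default scale is written as (0.01 * 255, 0.05 * 255), which is why the signature shows a floating-point value close to 2.55. The sketch below illustrates additive Gaussian noise on a flat list of 8-bit pixel values. `add_gaussian_noise` is hypothetical, and drawing a single standard deviation per call and clipping to [0, 255] are assumptions, not a description of the deprecated transform's internals.

```python
import random

DEFAULT_SCALE = (0.01 * 255, 0.05 * 255)


def add_gaussian_noise(pixels, loc=0.0, scale=DEFAULT_SCALE, rng=None):
    """Add noise drawn from N(loc, sigma) to each pixel and clip to [0, 255]."""
    rng = rng or random.Random()
    # Assumption: one standard deviation is sampled for the whole input.
    sigma = rng.uniform(*scale)
    noisy = []
    for value in pixels:
        noisy_value = value + rng.gauss(loc, sigma)
        noisy.append(min(255, max(0, round(noisy_value))))
    return noisy
```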

class albumentations.imgaug.transforms.IAAAffine (scale=1.0, translate_percent=None, translate_px=None, rotate=0.0, shear=0.0, order=1, cval=0, mode='reflect', always_apply=False, p=0.5) [view source on GitHub]

Place a regular grid of points on the input and randomly move the neighbourhood of these points around via affine transformations.

This augmentation is deprecated. Please use Affine instead.

Note: This class introduces interpolation artifacts to the mask if it has values other than {0;1}

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets: image, mask

class albumentations.imgaug.transforms.IAACropAndPad (px=None, percent=None, pad_mode='constant', pad_cval=0, keep_size=True, always_apply=False, p=1) [view source on GitHub]

This augmentation is deprecated. Please use CropAndPad instead.

class albumentations.imgaug.transforms.IAAEmboss (alpha=(0.2, 0.5), strength=(0.2, 0.7), always_apply=False, p=0.5) [view source on GitHub]

Emboss the input image and overlay the result with the original image. This augmentation is deprecated. Please use Emboss instead.

Parameters:

Name Type Description
alpha [float, float]

range to choose the visibility of the embossed image. At 0, only the original image is visible, at 1.0 only its embossed version is visible. Default: (0.2, 0.5).

strength [float, float]

strength range of the embossing. Default: (0.2, 0.7).

p float

probability of applying the transform. Default: 0.5.

Targets: image
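
The alpha parameter describes an overlay of the processed image on the original. A common way to realise such an overlay is a linear blend, sketched below on flat lists of pixel values. `blend` is a hypothetical helper, and the linear formula is an assumption that is consistent with the endpoints described above (0 shows only the original, 1.0 only the processed image).

```python
def blend(original, processed, alpha):
    """Linear overlay: alpha=0 keeps the original, alpha=1 keeps the processed image."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [(1 - alpha) * o + alpha * p for o, p in zip(original, processed)]
```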

class albumentations.imgaug.transforms.IAAFliplr (always_apply=False, p=0.5) [view source on GitHub]

This augmentation is deprecated. Please use HorizontalFlip instead.

class albumentations.imgaug.transforms.IAAFlipud (always_apply=False, p=0.5) [view source on GitHub]

This augmentation is deprecated. Please use VerticalFlip instead.

class albumentations.imgaug.transforms.IAAPerspective (scale=(0.05, 0.1), keep_size=True, always_apply=False, p=0.5) [view source on GitHub]

Perform a random four point perspective transform of the input. This augmentation is deprecated. Please use Perspective instead.

Note: This class introduces interpolation artifacts to the mask if it has values other than {0;1}

Parameters:

Name Type Description
scale [float, float]

standard deviation of the normal distributions. These are used to sample the random distances of the subimage's corners from the full image's corners. Default: (0.05, 0.1).

p float

probability of applying the transform. Default: 0.5.

Targets: image, mask

class albumentations.imgaug.transforms.IAAPiecewiseAffine (scale=(0.03, 0.05), nb_rows=4, nb_cols=4, order=1, cval=0, mode='constant', always_apply=False, p=0.5) [view source on GitHub]

Place a regular grid of points on the input and randomly move the neighbourhood of these points around via affine transformations.

This augmentation is deprecated. Please use PiecewiseAffine instead.

Note: This class introduces interpolation artifacts to the mask if it has values other than {0;1}

Parameters:

Name Type Description
scale [float, float]

factor range that determines how far each point is moved. Default: (0.03, 0.05).

nb_rows int

number of rows of points that the regular grid should have. Default: 4.

nb_cols int

number of columns of points that the regular grid should have. Default: 4.

p float

probability of applying the transform. Default: 0.5.

Targets: image, mask

class albumentations.imgaug.transforms.IAASharpen (alpha=(0.2, 0.5), lightness=(0.5, 1.0), always_apply=False, p=0.5) [view source on GitHub]

Sharpen the input image and overlay the result with the original image. This augmentation is deprecated. Please use Sharpen instead.

Parameters:

Name Type Description
alpha [float, float]

range to choose the visibility of the sharpened image. At 0, only the original image is visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5).

lightness [float, float]

range to choose the lightness of the sharpened image. Default: (0.5, 1.0).

p float

probability of applying the transform. Default: 0.5.

Targets: image

class albumentations.imgaug.transforms.IAASuperpixels (p_replace=0.1, n_segments=100, always_apply=False, p=0.5) [view source on GitHub]

Completely or partially transform the input image to its superpixel representation. Uses skimage's version of the SLIC algorithm. May be slow.

This augmentation is deprecated. Please use Superpixels instead.

Parameters:

Name Type Description
p_replace float

defines the probability of any superpixel area being replaced by the superpixel, i.e. by the average pixel color within its area. Default: 0.1.

n_segments int

target number of superpixels to generate. Default: 100.

p float

probability of applying the transform. Default: 0.5.

Targets: image

albumentations.pytorch special

albumentations.pytorch.transforms

class albumentations.pytorch.transforms.ToTensor (num_classes=1, sigmoid=True, normalize=None) [view source on GitHub]

Convert image and mask to torch.Tensor and divide by 255 if the image or mask is of uint8 type. This transform has been removed from Albumentations. If you need it, downgrade the library to version 0.5.2.

Parameters:

Name Type Description
num_classes int

only for segmentation

sigmoid bool

only for segmentation, transform mask to LongTensor or not.

normalize dict

dict with the keys mean and std, which are passed to torchvision.normalize

class albumentations.pytorch.transforms.ToTensorV2 (transpose_mask=False, always_apply=True, p=1.0) [view source on GitHub]

Convert image and mask to torch.Tensor. The numpy HWC image is converted to a pytorch CHW tensor. If the image is in HW format (grayscale image), it will be converted to a pytorch HW tensor. This is a simplified and improved version of the old ToTensor transform (ToTensor was deprecated and is no longer present in Albumentations; use ToTensorV2 instead).

Parameters:

Name Type Description
transpose_mask bool

If True and an input mask has three dimensions, this transform will transpose dimensions so the shape [height, width, num_channels] becomes [num_channels, height, width]. The latter format is a standard format for PyTorch Tensors. Default: False.

always_apply bool

Indicates whether this transformation should be always applied. Default: True.

p float

Probability of applying the transform. Default: 1.0.
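
The HWC to CHW conversion is a reordering of axes. The sketch below shows the layout change on nested Python lists so that it runs without numpy or torch; `hwc_to_chw` is a hypothetical helper. With numpy arrays the same reordering is usually written as `image.transpose(2, 0, 1)`.

```python
def hwc_to_chw(image):
    """Reorder a nested list from [height][width][channel] to [channel][height][width]."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]
```

After the conversion, each channel is a complete height by width plane, which is the layout that the transpose_mask option also produces for three-dimensional masks.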