Full API Reference on a single page

Pixel-level transforms

Here is a list of all available pixel-level transforms. You can apply a pixel-level transform to any target, and under the hood, the transform will change only the input image and return any other input targets such as masks, bounding boxes, or keypoints unchanged.
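
For example, a minimal sketch (the `image` and `mask` arrays here are placeholders) showing that a pixel-level transform such as `Blur` modifies the image while passing the mask through untouched:

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> mask = np.random.randint(0, 2, [100, 100], dtype=np.uint8)
>>> aug = A.Compose([A.Blur(blur_limit=7, p=1.0)])
>>> result = aug(image=image, mask=mask)
>>> assert np.array_equal(result["mask"], mask)  # mask is returned unchanged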

Spatial-level transforms

Here is a table with spatial-level transforms and targets they support. If you try to apply a spatial-level transform to an unsupported target, Albumentations will raise an error.

Transform                   Image   Masks   BBoxes   Keypoints
Affine                      ✓       ✓       ✓        ✓
BBoxSafeRandomCrop          ✓       ✓       ✓
CenterCrop                  ✓       ✓       ✓        ✓
CoarseDropout               ✓       ✓                ✓
Crop                        ✓       ✓       ✓        ✓
CropAndPad                  ✓       ✓       ✓        ✓
CropNonEmptyMaskIfExists    ✓       ✓       ✓        ✓
ElasticTransform            ✓       ✓       ✓
Flip                        ✓       ✓       ✓        ✓
GridDistortion              ✓       ✓       ✓
GridDropout                 ✓       ✓
HorizontalFlip              ✓       ✓       ✓        ✓
Lambda                      ✓       ✓       ✓        ✓
LongestMaxSize              ✓       ✓       ✓        ✓
MaskDropout                 ✓       ✓
NoOp                        ✓       ✓       ✓        ✓
OpticalDistortion           ✓       ✓       ✓
PadIfNeeded                 ✓       ✓       ✓        ✓
Perspective                 ✓       ✓       ✓        ✓
PiecewiseAffine             ✓       ✓       ✓        ✓
PixelDropout                ✓       ✓
RandomCrop                  ✓       ✓       ✓        ✓
RandomCropFromBorders       ✓       ✓       ✓        ✓
RandomCropNearBBox          ✓       ✓       ✓        ✓
RandomGridShuffle           ✓       ✓                ✓
RandomResizedCrop           ✓       ✓       ✓        ✓
RandomRotate90              ✓       ✓       ✓        ✓
RandomScale                 ✓       ✓       ✓        ✓
RandomSizedBBoxSafeCrop     ✓       ✓       ✓
RandomSizedCrop             ✓       ✓       ✓        ✓
Resize                      ✓       ✓       ✓        ✓
Rotate                      ✓       ✓       ✓        ✓
SafeRotate                  ✓       ✓       ✓        ✓
ShiftScaleRotate            ✓       ✓       ✓        ✓
SmallestMaxSize             ✓       ✓       ✓        ✓
Transpose                   ✓       ✓       ✓        ✓
VerticalFlip                ✓       ✓       ✓        ✓
XYMasking                   ✓       ✓                ✓
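
For example, a spatial-level transform must be composed with `bbox_params` (and/or `keypoint_params`) so that boxes and keypoints are updated together with the image; a minimal sketch with placeholder data:

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> bboxes = [(10, 10, 50, 50)]
>>> aug = A.Compose([A.HorizontalFlip(p=1.0)],
>>>                 bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]))
>>> result = aug(image=image, bboxes=bboxes, labels=["car"])
>>> result["bboxes"]  # boxes are flipped together with the image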

albumentations.augmentations special

albumentations.augmentations.blur special

albumentations.augmentations.blur.transforms

class albumentations.augmentations.blur.transforms.AdvancedBlur (blur_limit=(3, 7), sigma_x_limit=(0.2, 1.0), sigma_y_limit=(0.2, 1.0), sigmaX_limit=(0.2, 1.0), sigmaY_limit=(0.2, 1.0), rotate_limit=90, beta_limit=(0.5, 8.0), noise_limit=(0.9, 1.1), always_apply=False, p=0.5) [view source on GitHub]

Blurs the input image using a Generalized Normal filter with randomly selected parameters.

This transform also adds multiplicative noise to the generated kernel before convolution, affecting the image in a unique way that combines blurring and noise injection for enhanced data augmentation.


blur_limit (ScaleIntType, optional): Maximum Gaussian kernel size for blurring the input image.
    Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma
    as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
    If a single value is provided, `blur_limit` will be in the range (0, blur_limit).
    Defaults to (3, 7).
sigma_x_limit (ScaleFloatType, optional): Gaussian kernel standard deviation for the X dimension.
    Must be in range [0, inf). If a single value is provided, `sigma_x_limit` will be in the range
    (0, sigma_x_limit). If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`.
    Defaults to (0.2, 1.0).
sigma_y_limit (ScaleFloatType, optional): Gaussian kernel standard deviation for the Y dimension.
    Must follow the same rules as `sigma_x_limit`.
    Defaults to (0.2, 1.0).
rotate_limit (ScaleIntType, optional): Range from which a random angle used to rotate the Gaussian kernel
    is picked. If limit is a single int, an angle is picked from (-rotate_limit, rotate_limit).
    Defaults to (-90, 90).
beta_limit (ScaleFloatType, optional): Distribution shape parameter. 1 represents the normal distribution.
    Values below 1.0 make distribution tails heavier than normal, and values above 1.0 make it
    lighter than normal.
    Defaults to (0.5, 8.0).
noise_limit (ScaleFloatType, optional): Multiplicative factor that controls the strength of kernel noise.
    Must be positive and preferably centered around 1.0. If a single value is provided,
    `noise_limit` will be in the range (0, noise_limit).
    Defaults to (0.9, 1.1).
p (float, optional): Probability of applying the transform.
    Defaults to 0.5.

Reference: "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data", available at https://arxiv.org/abs/2107.10833

Targets: image

Image types: uint8, float32
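
Usage sketch (parameter values chosen for illustration only):

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.AdvancedBlur(blur_limit=(3, 7), rotate_limit=25, beta_limit=(0.5, 8.0), p=1.0)
>>> blurred = aug(image=image)["image"]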

class albumentations.augmentations.blur.transforms.Blur (blur_limit=7, always_apply=False, p=0.5) [view source on GitHub]

Blur the input image using a random-sized kernel.


blur_limit: maximum kernel size for blurring the input image.
    Should be in range [3, inf). Default: (3, 7).
p: probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.blur.transforms.Defocus (radius=(3, 10), alias_blur=(0.1, 0.5), always_apply=False, p=0.5) [view source on GitHub]

Apply defocus transform. See https://arxiv.org/abs/1903.12261.


radius ([int, int] or int): range for the radius of defocusing.
    If radius is a single int, the range will be [1, radius]. Default: (3, 10).
alias_blur ([float, float] or float): range for alias_blur of defocusing (sigma of gaussian blur).
    If alias_blur is a single float, the range will be (0, alias_blur). Default: (0.1, 0.5).
p (float): probability of applying the transform. Default: 0.5.

Targets: image

Image types: Any

class albumentations.augmentations.blur.transforms.GaussianBlur (blur_limit=(3, 7), sigma_limit=0, always_apply=False, p=0.5) [view source on GitHub]

Blur the input image using a Gaussian filter with a random kernel size.


blur_limit (int, [int, int]): maximum Gaussian kernel size for blurring the input image.
    Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma
    as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
    If a single value is provided, `blur_limit` will be in the range (0, blur_limit).
    Default: (3, 7).
sigma_limit (float, [float, float]): Gaussian kernel standard deviation. Must be in range [0, inf).
    If a single value is provided, `sigma_limit` will be in the range (0, sigma_limit).
    If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
p (float): probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32
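
Usage sketch (values for illustration; a tuple for `sigma_limit` samples sigma from that range):

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.GaussianBlur(blur_limit=(3, 7), sigma_limit=(0.1, 2.0), p=1.0)
>>> blurred = aug(image=image)["image"]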

class albumentations.augmentations.blur.transforms.GlassBlur (sigma=0.7, max_delta=4, iterations=2, always_apply=False, mode='fast', p=0.5) [view source on GitHub]

Apply glass noise to the input image.


sigma (float): standard deviation for Gaussian kernel.
max_delta (int): max distance between pixels which are swapped.
iterations (int): number of repeats.
    Should be in range [1, inf). Default: 2.
mode (str): mode of computation: fast or exact. Default: "fast".
p (float): probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

References: https://arxiv.org/abs/1903.12261, https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

class albumentations.augmentations.blur.transforms.MedianBlur (blur_limit=7, always_apply=False, p=0.5) [view source on GitHub]

Blur the input image using a median filter with a random aperture linear size.


blur_limit (int): maximum aperture linear size for blurring the input image.
    Must be odd and in range [3, inf). Default: (3, 7).
p (float): probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.blur.transforms.MotionBlur (blur_limit=7, allow_shifted=True, always_apply=False, p=0.5) [view source on GitHub]

Apply motion blur to the input image using a random-sized kernel.


blur_limit (int): maximum kernel size for blurring the input image.
    Should be in range [3, inf). Default: (3, 7).
allow_shifted (bool): if set to False, only non-shifted (centered) kernels are generated;
    otherwise kernels may be randomly shifted. Default: True.
p (float): probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.blur.transforms.ZoomBlur (max_factor=1.31, step_factor=(0.01, 0.03), always_apply=False, p=0.5) [view source on GitHub]

Apply zoom blur transform. See https://arxiv.org/abs/1903.12261.


max_factor ([float, float] or float): range for max factor for blurring.
    If max_factor is a single float, the range will be (1, max_factor). Default: (1, 1.31).
    All max_factor values should be larger than 1.
step_factor ([float, float] or float): If single float will be used as step parameter for np.arange.
    If tuple of float step_factor will be in range `[step_factor[0], step_factor[1])`. Default: (0.01, 0.03).
    All step_factor values should be positive.
p (float): probability of applying the transform. Default: 0.5.

Targets: image

Image types: Any
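
The blur transforms above are often combined with `OneOf` so that at most one of them fires per call; a sketch with illustrative limits:

>>> import albumentations as A
>>> blur_aug = A.OneOf([
>>>     A.MotionBlur(blur_limit=7),
>>>     A.MedianBlur(blur_limit=5),
>>>     A.GaussianBlur(blur_limit=(3, 7)),
>>> ], p=0.5)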

albumentations.augmentations.crops special

albumentations.augmentations.crops.functional

def albumentations.augmentations.crops.functional.bbox_crop (bbox, x_min, y_min, x_max, y_max, rows, cols) [view source on GitHub]

Crop a bounding box.


bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
x_min: Crop min x coordinate.
y_min: Crop min y coordinate.
x_max: Crop max x coordinate.
y_max: Crop max y coordinate.
rows: Image rows.
cols: Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A cropped bounding box `(x_min, y_min, x_max, y_max)`.
def albumentations.augmentations.crops.functional.crop_bbox_by_coords (bbox, crop_coords, crop_height, crop_width, rows, cols) [view source on GitHub]

Crop a bounding box using the provided coordinates of the top-left and bottom-right corners in pixels and the required height and width of the crop.


bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
crop_coords: Crop coordinates `(x1, y1, x2, y2)`.
crop_height: Crop height.
crop_width: Crop width.
rows: Image rows.
cols: Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A cropped bounding box `(x_min, y_min, x_max, y_max)`.
def albumentations.augmentations.crops.functional.crop_keypoint_by_coords (keypoint, crop_coords) [view source on GitHub]

Crop a keypoint using the provided coordinates of the top-left and bottom-right corners of the crop in pixels.


keypoint (tuple): A keypoint `(x, y, angle, scale)`.
crop_coords (tuple): Crop box coords `(x1, y1, x2, y2)`.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint `(x, y, angle, scale)`.
def albumentations.augmentations.crops.functional.keypoint_center_crop (keypoint, crop_height, crop_width, rows, cols) [view source on GitHub]

Keypoint center crop.


keypoint: A keypoint `(x, y, angle, scale)`.
crop_height: Crop height.
crop_width: Crop width.
rows: Image height.
cols: Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint `(x, y, angle, scale)`.
def albumentations.augmentations.crops.functional.keypoint_random_crop (keypoint, crop_height, crop_width, h_start, w_start, rows, cols) [view source on GitHub]

Keypoint random crop.


keypoint: (tuple): A keypoint `(x, y, angle, scale)`.
crop_height (int): Crop height.
crop_width (int): Crop width.
h_start (int): Crop height start.
w_start (int): Crop width start.
rows (int): Image height.
cols (int): Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint `(x, y, angle, scale)`.

albumentations.augmentations.crops.transforms

class albumentations.augmentations.crops.transforms.BBoxSafeRandomCrop (erosion_rate=0.0, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input without loss of bboxes.


erosion_rate: erosion rate applied on input image height before crop.
p: probability of applying the transform. Default: 1.

Targets: image, mask, bboxes

Image types: uint8, float32

class albumentations.augmentations.crops.transforms.CenterCrop (height, width, always_apply=False, p=1.0) [view source on GitHub]

Crop the central part of the input.


height: height of the crop.
width: width of the crop.
p: probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

Note:
It is recommended to use uint8 images as input.
Otherwise the operation will require internal conversion
float32 -> uint8 -> float32 that causes worse performance.
class albumentations.augmentations.crops.transforms.Crop (x_min=0, y_min=0, x_max=1024, y_max=1024, always_apply=False, p=1.0) [view source on GitHub]

Crop region from image.


x_min: Minimum upper left x coordinate.
y_min: Minimum upper left y coordinate.
x_max: Maximum lower right x coordinate.
y_max: Maximum lower right y coordinate.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32
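
Usage sketch (coordinates for illustration; the crop region is fixed, not random):

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Crop(x_min=10, y_min=10, x_max=74, y_max=74, p=1.0)
>>> patch = aug(image=image)["image"]  # patch has shape (64, 64, 3)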

class albumentations.augmentations.crops.transforms.CropAndPad (px=None, percent=None, pad_mode=0, pad_cval=0, pad_cval_mask=0, keep_size=True, sample_independently=True, interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Crop and pad images by pixel amounts or fractions of image sizes. Cropping removes pixels at the sides (i.e. extracts a subimage from a given full image). Padding adds pixels to the sides (e.g. black pixels). This transformation will never crop images below a height or width of 1.

Note:
This transformation automatically resizes images back to their original size. To deactivate this, add the
parameter ``keep_size=False``.

px (int or tuple):
    The number of pixels to crop (negative values) or pad (positive values)
    on each side of the image. Either this or the parameter `percent` may
    be set, not both at the same time.
        * If ``None``, then pixel-based cropping/padding will not be used.
        * If ``int``, then that exact number of pixels will always be cropped/padded.
        * If a ``tuple`` of two ``int`` s with values ``a`` and ``b``,
          then each side will be cropped/padded by a random amount sampled
          uniformly per image and side from the interval ``[a, b]``. If
          however `sample_independently` is set to ``False``, only one
          value will be sampled per image and used for all sides.
        * If a ``tuple`` of four entries, then the entries represent top,
          right, bottom, left. Each entry may be a single ``int`` (always
          crop/pad by exactly that value), a ``tuple`` of two ``int`` s
          ``a`` and ``b`` (crop/pad by an amount within ``[a, b]``), a
          ``list`` of ``int`` s (crop/pad by a random value that is
          contained in the ``list``).
percent (float or tuple):
    The number of pixels to crop (negative values) or pad (positive values)
    on each side of the image given as a *fraction* of the image
    height/width. E.g. if this is set to ``-0.1``, the transformation will
    always crop away ``10%`` of the image's height at both the top and the
    bottom (both ``10%`` each), as well as ``10%`` of the width at the
    right and left.
    Expected value range is ``(-1.0, inf)``.
    Either this or the parameter `px` may be set, not both
    at the same time.
        * If ``None``, then fraction-based cropping/padding will not be
          used.
        * If ``float``, then that fraction will always be cropped/padded.
        * If a ``tuple`` of two ``float`` s with values ``a`` and ``b``,
          then each side will be cropped/padded by a random fraction
          sampled uniformly per image and side from the interval
          ``[a, b]``. If however `sample_independently` is set to
          ``False``, only one value will be sampled per image and used for
          all sides.
        * If a ``tuple`` of four entries, then the entries represent top,
          right, bottom, left. Each entry may be a single ``float``
          (always crop/pad by exactly that percent value), a ``tuple`` of
          two ``float`` s ``a`` and ``b`` (crop/pad by a fraction from
          ``[a, b]``), a ``list`` of ``float`` s (crop/pad by a random
          value that is contained in the list).
pad_mode (int): OpenCV border mode.
pad_cval (number, Sequence[number]):
    The constant value to use if the pad mode is ``BORDER_CONSTANT``.
        * If ``number``, then that value will be used.
        * If a ``tuple`` of two ``number`` s and at least one of them is
          a ``float``, then a random number will be uniformly sampled per
          image from the continuous interval ``[a, b]`` and used as the
          value. If both ``number`` s are ``int`` s, the interval is
          discrete.
        * If a ``list`` of ``number``, then a random value will be chosen
          from the elements of the ``list`` and used as the value.
pad_cval_mask (number, Sequence[number]): Same as pad_cval but only for masks.
keep_size (bool):
    After cropping and padding, the result image will usually have a
    different height/width compared to the original input image. If this
    parameter is set to ``True``, then the cropped/padded image will be
    resized to the input image's size, i.e. the output shape is always identical to the input shape.
sample_independently (bool):
    If ``False`` *and* the values for `px`/`percent` result in exactly
    *one* probability distribution for all image sides, only one single
    value will be sampled from that probability distribution and used for
    all sides. I.e. the crop/pad amount then is the same for all sides.
    If ``True``, four values will be sampled independently, one per side.
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.

Targets: image, mask, bboxes, keypoints

Image types: any
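
Usage sketch (values for illustration): negative `percent` values crop, positive ones pad, and with `keep_size=True` the result is resized back to the input shape:

>>> import cv2
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.CropAndPad(percent=(-0.1, 0.1), pad_mode=cv2.BORDER_CONSTANT,
>>>                    pad_cval=0, keep_size=True, p=1.0)
>>> out = aug(image=image)["image"]  # same shape as the input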

class albumentations.augmentations.crops.transforms.CropNonEmptyMaskIfExists (height, width, ignore_values=None, ignore_channels=None, always_apply=False, p=1.0) [view source on GitHub]

Crop area with mask if mask is non-empty, else make random crop.


height: vertical size of crop in pixels
width: horizontal size of crop in pixels
ignore_values (list of int): values to ignore in mask, `0` values are always ignored
    (e.g. if background value is 5 set `ignore_values=[5]` to ignore)
ignore_channels (list of int): channels to ignore in mask
    (e.g. if background is a first channel set `ignore_channels=[0]` to ignore)
p: probability of applying the transform. Default: 1.0.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32
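
Usage sketch (shapes for illustration): when the mask has non-zero pixels, the crop is placed around them; otherwise a random crop is taken:

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> mask = np.zeros([100, 100], dtype=np.uint8)
>>> mask[20:40, 20:40] = 1  # a single object instance
>>> aug = A.CropNonEmptyMaskIfExists(height=64, width=64, p=1.0)
>>> result = aug(image=image, mask=mask)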

class albumentations.augmentations.crops.transforms.RandomCrop (height, width, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input.


height: height of the crop.
width: width of the crop.
p: probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.crops.transforms.RandomCropFromBorders (crop_left=0.1, crop_right=0.1, crop_top=0.1, crop_bottom=0.1, always_apply=False, p=1.0) [view source on GitHub]

Crop the input by randomly cutting parts from its borders, without resizing at the end.


crop_left (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
from left side in range [0, crop_left * width)
crop_right (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
from right side in range [(1 - crop_right) * width, width)
crop_top (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
from top side in range [0, crop_top * height)
crop_bottom (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
from bottom side in range [(1 - crop_bottom) * height, height)
p (float): probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.crops.transforms.RandomCropNearBBox (max_part_shift=(0.3, 0.3), cropping_box_key='cropping_bbox', always_apply=False, p=1.0) [view source on GitHub]

Crop an area from the image around a given bounding box, with a random shift along the x and y coordinates.


max_part_shift (float, [float, float]): Max shift in `height` and `width` dimensions relative
    to `cropping_bbox` dimension.
    If max_part_shift is a single float, the range will be (max_part_shift, max_part_shift).
    Default (0.3, 0.3).
cropping_box_key (str): Additional target key for cropping box. Default: `cropping_bbox`.
p (float): probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

Examples:


>>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_box_key='test_box')],
>>>              bbox_params=BboxParams("pascal_voc"))
>>> result = aug(image=image, bboxes=bboxes, test_box=[0, 5, 10, 20])
class albumentations.augmentations.crops.transforms.RandomResizedCrop (height, width, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Torchvision's variant: crop a random part of the input and rescale it to the given size.


height (int): height after crop and resize.
width (int): width after crop and resize.
scale ([float, float]): range of the crop area, relative to the area of the original image
ratio ([float, float]): range of aspect ratios of the crop
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32
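
Usage sketch with values commonly used for classification training (chosen for illustration):

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [256, 256, 3], dtype=np.uint8)
>>> aug = A.RandomResizedCrop(height=224, width=224, scale=(0.08, 1.0), p=1.0)
>>> out = aug(image=image)["image"]  # out has shape (224, 224, 3)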

class albumentations.augmentations.crops.transforms.RandomSizedBBoxSafeCrop (height, width, erosion_rate=0.0, interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input and rescale it to some size without loss of bboxes.


height: height after crop and resize.
width: width after crop and resize.
erosion_rate: erosion rate applied on input image height before crop.
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 1.

Targets: image, mask, bboxes

Image types: uint8, float32

class albumentations.augmentations.crops.transforms.RandomSizedCrop (min_max_height, height, width, w2h_ratio=1.0, interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input and rescale it to some size.


min_max_height ([int, int]): crop size limits.
height (int): height after crop and resize.
width (int): width after crop and resize.
w2h_ratio (float): aspect ratio of crop.
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

albumentations.augmentations.domain_adaptation

class albumentations.augmentations.domain_adaptation.FDA (reference_images, beta_limit=0.1, read_fn=read_rgb_image, always_apply=False, p=0.5) [view source on GitHub]

Fourier Domain Adaptation (FDA) from https://github.com/YanchaoYang/FDA. A simple "style transfer".


reference_images (Sequence[Any]): Sequence of objects that will be converted to images by `read_fn`. By default,
it expects a sequence of paths to images.
beta_limit (float or tuple of float): coefficient beta from the paper. Values below 0.3 are recommended.
read_fn (Callable): User-defined function to read image. Function should get an element of `reference_images`
and return numpy array of image pixels. Default: takes as input a path to an image and returns a numpy array.

Targets: image

Image types: uint8, float32

Reference: https://github.com/YanchaoYang/FDA https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_FDA_Fourier_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2020_paper.pdf

Examples:


>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
>>> result = aug(image=image)

class albumentations.augmentations.domain_adaptation.HistogramMatching (reference_images, blend_ratio=(0.5, 1.0), read_fn=read_rgb_image, always_apply=False, p=0.5) [view source on GitHub]

Apply histogram matching. It manipulates the pixels of an input image so that its histogram matches the histogram of the reference image. If the images have multiple channels, the matching is done independently for each channel, as long as the number of channels is equal in the input image and the reference.

Histogram matching can be used as a lightweight normalization for image processing, such as feature matching, especially in circumstances where the images have been taken from different sources or in different conditions (i.e. lighting).

See: https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_histogram_matching.html


reference_images (Sequence[Any]): Sequence of objects that will be converted to images by `read_fn`. By default,
it expects a sequence of paths to images.
blend_ratio: Tuple of min and max blend ratio. Matched image will be blended with original
    with random blend factor for increased diversity of generated images.
read_fn (Callable): User-defined function to read image. Function should get an element of `reference_images`
and return numpy array of image pixels. Default: takes as input a path to an image and returns a numpy array.
p: probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, uint16, float32
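
Usage sketch, mirroring the FDA example above (in-memory reference image with an identity `read_fn`):

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> reference = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.HistogramMatching([reference], blend_ratio=(0.5, 1.0),
>>>                                      read_fn=lambda x: x, p=1.0)])
>>> result = aug(image=image)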

class albumentations.augmentations.domain_adaptation.PixelDistributionAdaptation (reference_images, blend_ratio=(0.25, 1.0), read_fn=read_rgb_image, transform_type='pca', always_apply=False, p=0.5) [view source on GitHub]

Another naive and quick pixel-level domain adaptation. It fits a simple transform (such as PCA, StandardScaler, or MinMaxScaler) on both the original and the reference image, transforms the original image with the transform fitted on it, and then applies the inverse transformation using the transform fitted on the reference image.


reference_images (Sequence[Any]): Sequence of objects that will be converted to images by `read_fn`. By default,
it expects a sequence of paths to images.
blend_ratio [float, float]: Tuple of min and max blend ratio. Matched image will be blended with original
    with random blend factor for increased diversity of generated images.
read_fn (Callable): User-defined function to read image. Function should get an element of `reference_images`
and return numpy array of image pixels. Default: takes as input a path to an image and returns a numpy array.
transform_type (str): type of transform; "pca", "standard", "minmax" are allowed.
p (float): probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

See also: https://github.com/arsenyinfo/qudida

def albumentations.augmentations.domain_adaptation.fourier_domain_adaptation (img, target_img, beta) [view source on GitHub]

Fourier Domain Adaptation from https://github.com/YanchaoYang/FDA


img:  source image
target_img:  target image for domain adaptation
beta: coefficient from source paper

Returns:

Type Description
ndarray

transformed image

albumentations.augmentations.dropout special

albumentations.augmentations.dropout.channel_dropout

class albumentations.augmentations.dropout.channel_dropout.ChannelDropout (channel_drop_range=(1, 1), fill_value=0, always_apply=False, p=0.5) [view source on GitHub]

Randomly drop channels in the input image.


channel_drop_range [int, int]: range from which we choose the number of channels to drop.
fill_value (int, float): pixel value for the dropped channel.
p (float): probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, uint16, uint32, float32

albumentations.augmentations.dropout.coarse_dropout

class albumentations.augmentations.dropout.coarse_dropout.CoarseDropout (max_holes=8, max_height=8, max_width=8, min_holes=None, min_height=None, min_width=None, fill_value=0, mask_fill_value=None, always_apply=False, p=0.5) [view source on GitHub]

CoarseDropout of the rectangular regions in the image.


max_holes (int): Maximum number of regions to zero out.
max_height (int, float): Maximum height of the hole.
If float, it is calculated as a fraction of the image height.
max_width (int, float): Maximum width of the hole.
If float, it is calculated as a fraction of the image width.
min_holes (int): Minimum number of regions to zero out. If `None`,
    `min_holes` is set to `max_holes`. Default: `None`.
min_height (int, float): Minimum height of the hole. If `None`,
    `min_height` is set to `max_height`. Default: `None`.
    If float, it is calculated as a fraction of the image height.
min_width (int, float): Minimum width of the hole. If `None`, `min_width` is
    set to `max_width`. Default: `None`.
    If float, it is calculated as a fraction of the image width.

fill_value (int, float, list of int, list of float): value for dropped pixels.
mask_fill_value (int, float, list of int, list of float): fill value for dropped pixels
    in mask. If `None` - mask is not affected. Default: `None`.

Targets: image, mask, keypoints

Image types: uint8, float32

References: https://arxiv.org/abs/1708.04552, https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py, https://github.com/aleju/imgaug/blob/master/imgaug/augmenters/arithmetic.py
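
Usage sketch (hole counts and sizes for illustration); setting `mask_fill_value` makes the same holes appear in the mask:

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> mask = np.random.randint(0, 2, [100, 100], dtype=np.uint8)
>>> aug = A.CoarseDropout(max_holes=8, max_height=16, max_width=16,
>>>                       fill_value=0, mask_fill_value=0, p=1.0)
>>> result = aug(image=image, mask=mask)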

albumentations.augmentations.dropout.grid_dropout

class albumentations.augmentations.dropout.grid_dropout.GridDropout (ratio=0.5, unit_size_min=None, unit_size_max=None, holes_number_x=None, holes_number_y=None, shift_x=0, shift_y=0, random_offset=False, fill_value=0, mask_fill_value=None, always_apply=False, p=0.5) [view source on GitHub]

GridDropout, drops out rectangular regions of an image and the corresponding mask in a grid fashion.


ratio: the ratio of the mask holes to the unit_size (same for horizontal and vertical directions).
    Must be between 0 and 1. Default: 0.5.
unit_size_min (int): minimum size of the grid unit. Must be between 2 and the image shorter edge.
    If `None`, holes_number_x and holes_number_y are used to set up the grid. Default: `None`.
unit_size_max (int): maximum size of the grid unit. Must be between 2 and the image shorter edge.
    If `None`, holes_number_x and holes_number_y are used to set up the grid. Default: `None`.
holes_number_x (int): the number of grid units in x direction. Must be between 1 and image width//2.
    If 'None', grid unit width is set as image_width//10. Default: `None`.
holes_number_y (int): the number of grid units in y direction. Must be between 1 and image height//2.
    If `None`, grid unit height is set equal to the grid unit width or image height, whatever is smaller.
shift_x (int): offsets of the grid start in x direction from (0,0) coordinate.
    Clipped between 0 and grid unit_width - hole_width. Default: 0.
shift_y (int): offsets of the grid start in y direction from (0,0) coordinate.
    Clipped between 0 and grid unit height - hole_height. Default: 0.
random_offset (boolean): whether to offset the grid randomly between 0 and grid unit size - hole size.
    If `True`, entered shift_x, shift_y are ignored and set randomly. Default: `False`.
fill_value (int): value for the dropped pixels. Default: 0.
mask_fill_value (int): value for the dropped pixels in mask.
    If `None`, transformation is not applied to the mask. Default: `None`.

Targets: image, mask

Image types: uint8, float32

References:
https://arxiv.org/abs/2001.04086
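
Usage sketch (unit sizes for illustration):

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.GridDropout(ratio=0.5, unit_size_min=10, unit_size_max=20,
>>>                     random_offset=True, p=1.0)
>>> out = aug(image=image)["image"]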

albumentations.augmentations.dropout.mask_dropout

class albumentations.augmentations.dropout.mask_dropout.MaskDropout (max_objects=1, image_fill_value=0, mask_fill_value=0, always_apply=False, p=0.5) [view source on GitHub]

Image & mask augmentation that zeroes out mask and image regions corresponding to a randomly chosen object instance in the mask.

The mask must be a single-channel image; zero values are treated as background. The image can have any number of channels.

Inspired by https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/114254


max_objects: Maximum number of labels that can be zeroed out. Can be a tuple, in which case it is treated as [min, max].
image_fill_value: Fill value to use when filling image.
    Can be 'inpaint' to apply inpainting (works only for 3-channel images).
mask_fill_value: Fill value to use when filling mask.

Targets: image, mask

Image types: uint8, float32

albumentations.augmentations.dropout.xy_masking

class albumentations.augmentations.dropout.xy_masking.XYMasking (num_masks_x=0, num_masks_y=0, mask_x_length=0, mask_y_length=0, fill_value=0, mask_fill_value=0, always_apply=False, p=0.5) [view source on GitHub]

Applies masking strips to an image, either horizontally (X axis) or vertically (Y axis), simulating occlusions. This transform is useful for training models to recognize images with varied visibility conditions. It's particularly effective for spectrogram images, allowing spectral and frequency masking to improve model robustness.

At least one of mask_x_length or mask_y_length must be specified, dictating the mask's maximum size along each axis.


num_masks_x (Union[int, Tuple[int, int]]): Number or range of horizontal regions to mask. Defaults to 0.
num_masks_y (Union[int, Tuple[int, int]]): Number or range of vertical regions to mask. Defaults to 0.
mask_x_length (Union[int, Tuple[int, int]]): Specifies the length of the masks along
    the X (horizontal) axis. If an integer is provided, it sets a fixed mask length.
    If a tuple of two integers (min, max) is provided,
    the mask length is randomly chosen within this range for each mask.
    This allows for variable-length masks in the horizontal direction.
mask_y_length (Union[int, Tuple[int, int]]): Specifies the height of the masks along
    the Y (vertical) axis. Similar to `mask_x_length`, an integer sets a fixed mask height,
    while a tuple (min, max) allows for variable-height masks, chosen randomly
    within the specified range for each mask. This flexibility facilitates creating masks of various
    sizes in the vertical direction.
fill_value (Union[int, float, List[int], List[float]]): Value to fill image masks. Defaults to 0.
mask_fill_value (Optional[Union[int, float, List[int], List[float]]]): Value to fill masks in the mask.
    If `None`, the mask is not affected. Default: `None`.
p (float): Probability of applying the transform. Defaults to 0.5.

Targets: image, mask, keypoints

Image types: uint8, float32

Note: Either mask_x_length or mask_y_length, or both, must be defined.
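
Usage sketch for the spectrogram use case mentioned above (array shape and mask lengths are for illustration):

>>> import numpy as np
>>> import albumentations as A
>>> spec = np.random.rand(128, 256).astype(np.float32)  # e.g. a mel spectrogram
>>> aug = A.XYMasking(num_masks_x=(1, 3), num_masks_y=(1, 3),
>>>                   mask_x_length=(5, 20), mask_y_length=(5, 20), p=1.0)
>>> masked = aug(image=spec)["image"]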

albumentations.augmentations.dropout.xy_masking.XYMasking.validate_mask_length (self, mask_length, dimension_size, dimension_name)

Validate the mask length against the corresponding image dimension size.


mask_length (Optional[Union[int, Tuple[int, int]]]): The length of the mask to be validated.
dimension_size (int): The size of the image dimension (width or height)
    against which to validate the mask length.
dimension_name (str): The name of the dimension ('width' or 'height') for error messaging.

albumentations.augmentations.functional

def albumentations.augmentations.functional.add_fog (img, fog_coef, alpha_coef, haze_list) [view source on GitHub]

Add fog to the image.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library


img: Image.
fog_coef: Fog coefficient.
alpha_coef: Alpha coefficient.
haze_list:

Returns:

Type Description
ndarray

Image.

def albumentations.augmentations.functional.add_gravel (img, gravels) [view source on GitHub]

Add gravel to the image.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library


img (numpy.ndarray): image to add gravel to
gravels (list): list of gravel parameters. (float, float, float, float):
    (top-left x, top-left y, bottom-right x, bottom-right y)

Returns:

Type Description
ndarray

numpy.ndarray: Image with added gravel.

def albumentations.augmentations.functional.add_rain (img, slant, drop_length, drop_width, drop_color, blur_value, brightness_coefficient, rain_drops) [view source on GitHub]

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library


img: Image.
slant:
drop_length:
drop_width:
drop_color:
blur_value: Rainy views are blurry.
brightness_coefficient: Rainy days are usually shady.
rain_drops:

Returns:

Type Description
ndarray

Image.

def albumentations.augmentations.functional.add_shadow (img, vertices_list) [view source on GitHub]

Add shadows to the image.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library


img (numpy.ndarray):
vertices_list (list):

Returns:

Type Description
ndarray

numpy.ndarray: Image with added shadows.

def albumentations.augmentations.functional.add_snow (img, snow_point, brightness_coeff) [view source on GitHub]

Bleaches out pixels, imitating snow.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library


img: Image.
snow_point: Number of snow points.
brightness_coeff: Brightness coefficient.

Returns:

Type Description
ndarray

Image.

def albumentations.augmentations.functional.add_sun_flare (img, flare_center_x, flare_center_y, src_radius, src_color, circles) [view source on GitHub]

Add sun flare.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library


img (numpy.ndarray):
flare_center_x (float):
flare_center_y (float):
src_radius:
src_color (int, int, int):
circles (list):

Returns:

Type Description
ndarray

numpy.ndarray: Image with sun flare.

def albumentations.augmentations.functional.bbox_from_mask (mask) [view source on GitHub]

Create bounding box from binary mask (fast version)


mask (numpy.ndarray): binary mask.

Returns:

Type Description
Tuple[int, int, int, int]

tuple: A bounding box tuple `(x_min, y_min, x_max, y_max)`.

def albumentations.augmentations.functional.fancy_pca (img, alpha=0.1) [view source on GitHub]

Perform 'Fancy PCA' augmentation from: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf


img: numpy array with (h, w, rgb) shape, as ints between 0-255
alpha: how much to perturb/scale the eigenvectors and eigenvalues;
        the paper used std=0.1

Returns:

Type Description
ndarray

numpy image-like array as uint8 range(0, 255)

def albumentations.augmentations.functional.iso_noise (image, color_shift=0.05, intensity=0.5, random_state=None, **kwargs) [view source on GitHub]

Apply Poisson noise to an image to simulate camera sensor noise.


image (numpy.ndarray): Input image, currently, only RGB, uint8 images are supported.
color_shift (float):
intensity (float): Multiplication factor for noise values. Values around 0.5 produce a noticeable,
           yet acceptable, level of noise.
random_state:
**kwargs:

Returns:

Type Description
ndarray

numpy.ndarray: Noised image

def albumentations.augmentations.functional.mask_from_bbox (img, bbox) [view source on GitHub]

Create binary mask from bounding box


img: input image
bbox: A bounding box tuple `(x_min, y_min, x_max, y_max)`

Returns:

Type Description
ndarray

mask: binary mask
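
Usage sketch (the bounding box is given in pixel coordinates; values for illustration):

>>> import numpy as np
>>> from albumentations.augmentations.functional import mask_from_bbox
>>> img = np.zeros([100, 100, 3], dtype=np.uint8)
>>> mask = mask_from_bbox(img, bbox=(10, 10, 50, 50))  # 1 inside the box, 0 elsewhere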

def albumentations.augmentations.functional.move_tone_curve (img, low_y, high_y) [view source on GitHub]

Rescales the relationship between bright and dark areas of the image by manipulating its tone curve.


img: RGB or grayscale image.
low_y: y-position of a Bezier control point used
    to adjust the tone curve, must be in range [0, 1]
high_y: y-position of a Bezier control point used
    to adjust image tone curve, must be in range [0, 1]

def albumentations.augmentations.functional.multiply (img, multiplier) [view source on GitHub]

Multiply the image by a multiplier coefficient.


img: Image.
multiplier: Multiplier coefficient.
Returns:

Type Description
ndarray

Image multiplied by `multiplier` coefficient.

def albumentations.augmentations.functional.posterize (img, bits) [view source on GitHub]

Reduce the number of bits for each color channel.


img: image to posterize.
bits: number of high bits. Must be in range [0, 8]

Returns:

Type Description
ndarray

Image with reduced color channels.

def albumentations.augmentations.functional.solarize (img, threshold=128) [view source on GitHub]

Invert all pixel values above a threshold.


img: The image to solarize.
threshold: All pixels above this grayscale level are inverted.

Returns:

Type Description
ndarray

Solarized image.
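
Usage sketch combining the two functional ops above (threshold and bit count for illustration):

>>> import numpy as np
>>> from albumentations.augmentations.functional import posterize, solarize
>>> img = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> out = solarize(img, threshold=128)
>>> out = posterize(out, bits=4)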

def albumentations.augmentations.functional.swap_tiles_on_image (image, tiles) [view source on GitHub]

Swap tiles on image.


image: Input image.
tiles: array of tuples(
    current_left_up_corner_row, current_left_up_corner_col,
    old_left_up_corner_row, old_left_up_corner_col,
    height_tile, width_tile)

Returns:

Type Description
ndarray

np.ndarray: Output image.

albumentations.augmentations.geometric special

albumentations.augmentations.geometric.functional

def albumentations.augmentations.geometric.functional.bbox_flip (bbox, d, rows, cols) [view source on GitHub]

Flip a bounding box either vertically, horizontally or both depending on the value of d.


bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
d: dimension. 0 for vertical flip, 1 for horizontal, -1 for both vertical and horizontal flips
rows: Image rows.
cols: Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box `(x_min, y_min, x_max, y_max)`.

Raises:
ValueError: if value of `d` is not -1, 0 or 1.
def albumentations.augmentations.geometric.functional.bbox_hflip (bbox, rows, cols) [view source on GitHub]

Flip a bounding box horizontally around the y-axis.


bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
rows: Image rows.
cols: Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box `(x_min, y_min, x_max, y_max)`.
def albumentations.augmentations.geometric.functional.bbox_rot90 (bbox, factor, rows, cols) [view source on GitHub]

Rotates a bounding box by 90 degrees CCW (see np.rot90)


bbox: A bounding box tuple (x_min, y_min, x_max, y_max).
factor: Number of CCW rotations. Must be in the set {0, 1, 2, 3}. See np.rot90.
rows: Image rows.
cols: Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

tuple: A bounding box tuple (x_min, y_min, x_max, y_max).
def albumentations.augmentations.geometric.functional.bbox_rotate (bbox, angle, method, rows, cols) [view source on GitHub]

Rotates a bounding box by angle degrees.


bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
angle: Angle of rotation in degrees.
method: Rotation method used. Should be one of: "largest_box", "ellipse". Default: "largest_box".
rows: Image rows.
cols: Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box `(x_min, y_min, x_max, y_max)`.
References:
https://arxiv.org/abs/2109.13488
def albumentations.augmentations.geometric.functional.bbox_shift_scale_rotate (bbox, angle, scale, dx, dy, rotate_method, rows, cols, **kwargs) [view source on GitHub]

Rotates, shifts and scales a bounding box. Rotation is made by angle degrees, scaling is made by scale factor and shifting is made by dx and dy.


bbox (tuple): A bounding box `(x_min, y_min, x_max, y_max)`.
angle (int): Angle of rotation in degrees.
scale (float): Scale factor.
dx (int): Shift along x-axis in pixel units.
dy (int): Shift along y-axis in pixel units.
rotate_method(str): Rotation method used. Should be one of: "largest_box", "ellipse".
    Default: "largest_box".
rows (int): Image rows.
cols (int): Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box `(x_min, y_min, x_max, y_max)`.
def albumentations.augmentations.geometric.functional.bbox_transpose (bbox, axis, rows, cols) [view source on GitHub]

Transposes a bounding box along given axis.


bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
axis: 0 - main axis, 1 - secondary axis.
rows: Image rows.
cols: Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box tuple `(x_min, y_min, x_max, y_max)`.

Raises:
ValueError: If axis is not equal to 0 or 1.
def albumentations.augmentations.geometric.functional.bbox_vflip (bbox, rows, cols) [view source on GitHub]

Flip a bounding box vertically around the x-axis.


bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
rows: Image rows.
cols: Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

tuple: A bounding box `(x_min, y_min, x_max, y_max)`.
def albumentations.augmentations.geometric.functional.elastic_transform (img, alpha, sigma, alpha_affine, interpolation=1, border_mode=4, value=None, random_state=None, approximate=False, same_dxdy=False) [view source on GitHub]

Elastic deformation of images as described in [Simard2003]_ (with modifications). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.

def albumentations.augmentations.geometric.functional.elastic_transform_approx (img, alpha, sigma, alpha_affine, interpolation=1, border_mode=4, value=None, random_state=None) [view source on GitHub]

Elastic deformation of images as described in [Simard2003]_ (with modifications for speed). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.

def albumentations.augmentations.geometric.functional.find_keypoint (position, distance_map, threshold, inverted) [view source on GitHub]

Determine if a valid keypoint can be found at the given position.

def albumentations.augmentations.geometric.functional.from_distance_maps (distance_maps, inverted, if_not_found_coords, threshold=None) [view source on GitHub]

Convert outputs of to_distance_maps to KeypointsOnImage. This is the inverse of to_distance_maps.

def albumentations.augmentations.geometric.functional.grid_distortion (img, num_steps=10, xsteps=(), ysteps=(), interpolation=1, border_mode=4, value=None) [view source on GitHub]

Perform a grid distortion of an input image.

Reference: http://pythology.blogspot.sg/2014/03/interpolation-on-regular-distorted-grid.html

def albumentations.augmentations.geometric.functional.keypoint_flip (keypoint, d, rows, cols) [view source on GitHub]

Flip a keypoint either vertically, horizontally or both depending on the value of d.


keypoint: A keypoint `(x, y, angle, scale)`.
d: Flip direction. Must be -1, 0 or 1:
    * 0 - vertical flip,
    * 1 - horizontal flip,
    * -1 - vertical and horizontal flip.
rows: Image height.
cols: Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint `(x, y, angle, scale)`.

Raises:
ValueError: if value of `d` is not -1, 0 or 1.
def albumentations.augmentations.geometric.functional.keypoint_hflip (keypoint, rows, cols) [view source on GitHub]

Flip a keypoint horizontally around the y-axis.


keypoint: A keypoint `(x, y, angle, scale)`.
rows: Image height.
cols: Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint `(x, y, angle, scale)`.
def albumentations.augmentations.geometric.functional.keypoint_rot90 (keypoint, factor, rows, cols, **params) [view source on GitHub]

Rotates a keypoint by 90 degrees CCW (see np.rot90)


keypoint: A keypoint `(x, y, angle, scale)`.
factor: Number of CCW rotations. Must be in the set {0, 1, 2, 3}. See np.rot90.
rows: Image height.
cols: Image width.

Returns:

Type Description
Tuple[float, float, float, float]

tuple: A keypoint `(x, y, angle, scale)`.

Raises:
ValueError: if factor is not in the set {0, 1, 2, 3}.
def albumentations.augmentations.geometric.functional.keypoint_rotate (keypoint, angle, rows, cols, **params) [view source on GitHub]

Rotate a keypoint by angle.


keypoint: A keypoint `(x, y, angle, scale)`.
angle: Rotation angle.
rows: Image height.
cols: Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint `(x, y, angle, scale)`.
def albumentations.augmentations.geometric.functional.keypoint_scale (keypoint, scale_x, scale_y) [view source on GitHub]

Scales a keypoint by scale_x and scale_y.


keypoint: A keypoint `(x, y, angle, scale)`.
scale_x: Scale coefficient x-axis.
scale_y: Scale coefficient y-axis.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint `(x, y, angle, scale)`.
def albumentations.augmentations.geometric.functional.keypoint_transpose (keypoint) [view source on GitHub]

Transpose a keypoint along the main diagonal of the image.


keypoint: A keypoint `(x, y, angle, scale)`.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint `(x, y, angle, scale)`.
def albumentations.augmentations.geometric.functional.keypoint_vflip (keypoint, rows, cols) [view source on GitHub]

Flip a keypoint vertically around the x-axis.


keypoint: A keypoint `(x, y, angle, scale)`.
rows: Image height.
cols: Image width.

Returns:

Type Description
Tuple[float, float, float, float]

tuple: A keypoint `(x, y, angle, scale)`.
def albumentations.augmentations.geometric.functional.rotation2d_matrix_to_euler_angles (matrix, y_up=False) [view source on GitHub]

Convert a 2D rotation matrix to an Euler angle.


matrix (np.ndarray): Rotation matrix.
y_up (bool): whether the Y axis points up (True) or down (False).
def albumentations.augmentations.geometric.functional.to_distance_maps (keypoints, height, width, inverted=False) [view source on GitHub]

Generate a (H,W,N) array of distance maps for N keypoints.

The n-th distance map contains at every location (y, x) the euclidean distance to the n-th keypoint.

This function can be used as a helper when augmenting keypoints with a method that only supports the augmentation of images.


keypoints: keypoint coordinates
height: image height
width: image width
inverted (bool): If ``True``, inverted distance maps are returned where each
    distance value d is replaced by ``d/(d+1)``, i.e. the distance
    maps have values in the range ``(0.0, 1.0]`` with ``1.0`` denoting
    exactly the position of the respective keypoint.

Returns:

Type Description
ndarray

(H, W, N) ndarray
    A ``float32`` array containing ``N`` distance maps for ``N``
    keypoints. Each location ``(y, x, n)`` in the array denotes the
    euclidean distance at ``(y, x)`` to the ``n``-th keypoint.
    If `inverted` is ``True``, the distance ``d`` is replaced
    by ``d/(d+1)``. The height and width of the array match the
    height and width in ``KeypointsOnImage.shape``.
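
Usage sketch (keypoints given as `(x, y)` pairs; values for illustration):

>>> from albumentations.augmentations.geometric.functional import to_distance_maps
>>> maps = to_distance_maps([(10.0, 20.0), (30.0, 40.0)], height=64, width=64, inverted=True)
>>> maps.shape  # (64, 64, 2)
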
def albumentations.augmentations.geometric.functional.validate_if_not_found_coords (if_not_found_coords) [view source on GitHub]

Validate and process if_not_found_coords parameter.

albumentations.augmentations.geometric.resize

class albumentations.augmentations.geometric.resize.LongestMaxSize (max_size=1024, interpolation=1, always_apply=False, p=1) [view source on GitHub]

Rescale an image so that maximum side is equal to max_size, keeping the aspect ratio of the initial image.


max_size (int, list of int): maximum size of the image after the transformation. When using a list, max size
    will be randomly selected from the values in the list.
interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32
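
A common "letterbox" pattern pairs this transform with PadIfNeeded so every output has the same square size (sizes for illustration):

>>> import cv2
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [300, 500, 3], dtype=np.uint8)
>>> aug = A.Compose([A.LongestMaxSize(max_size=512),
>>>                  A.PadIfNeeded(min_height=512, min_width=512,
>>>                                border_mode=cv2.BORDER_CONSTANT, value=0)])
>>> out = aug(image=image)["image"]  # out has shape (512, 512, 3)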

class albumentations.augmentations.geometric.resize.RandomScale (scale_limit=0.1, interpolation=1, always_apply=False, p=0.5) [view source on GitHub]

Randomly resize the input. Output image size is different from the input image size.


scale_limit ([float, float] or float): scaling factor range. If scale_limit is a single float value, the
    range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1.
    If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high).
    Default: (-0.1, 0.1).
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.geometric.resize.Resize (height, width, interpolation=1, always_apply=False, p=1) [view source on GitHub]

Resize the input to the given height and width.


height (int): desired height of the output.
width (int): desired width of the output.
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.geometric.resize.SmallestMaxSize (max_size=1024, interpolation=1, always_apply=False, p=1) [view source on GitHub]

Rescale an image so that minimum side is equal to max_size, keeping the aspect ratio of the initial image.


max_size (int, list of int): maximum size of smallest side of the image after the transformation. When using a
    list, max size will be randomly selected from the values in the list.
interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 1.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

albumentations.augmentations.geometric.rotate

class albumentations.augmentations.geometric.rotate.RandomRotate90 [view source on GitHub]

Randomly rotate the input by 90 degrees zero or more times.


p: probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

albumentations.augmentations.geometric.rotate.RandomRotate90.apply (self, img, factor=0, **params)

factor (int): number of times the input will be rotated by 90 degrees.
class albumentations.augmentations.geometric.rotate.Rotate (limit=90, interpolation=1, border_mode=4, value=None, mask_value=None, rotate_method='largest_box', crop_border=False, always_apply=False, p=0.5) [view source on GitHub]

Rotate the input by an angle selected randomly from the uniform distribution.


limit: range from which a random angle is picked. If limit is a single int
    an angle is picked from (-limit, limit). Default: (-90, 90)
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.
border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
    cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
    Default: cv2.BORDER_REFLECT_101
value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
rotate_method (str): rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse".
    Default: "largest_box"
crop_border (bool): If True, makes the largest possible crop within the rotated image. Default: False.
p (float): probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32
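
Usage sketch (limit and border handling for illustration):

>>> import cv2
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Rotate(limit=30, border_mode=cv2.BORDER_CONSTANT, value=0,
>>>                rotate_method="ellipse", p=1.0)
>>> out = aug(image=image)["image"]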

class albumentations.augmentations.geometric.rotate.SafeRotate (limit=90, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, p=0.5) [view source on GitHub]

Rotate the input inside the input's frame by an angle selected randomly from the uniform distribution.

The resulting image may have artifacts in it. After rotation, the image may have a different aspect ratio, and after resizing, it returns to its original shape with the original aspect ratio of the image. For this reason, some artifacts may appear.


limit ([int, int] or int): range from which a random angle is picked. If limit is a single int
    an angle is picked from (-limit, limit). Default: (-90, 90)
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.
border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
    cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
    Default: cv2.BORDER_REFLECT_101
value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
            list of ints,
            list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
p (float): probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

albumentations.augmentations.geometric.transforms

class albumentations.augmentations.geometric.transforms.Affine (scale=None, translate_percent=None, translate_px=None, rotate=None, shear=None, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode=0, fit_output=False, keep_ratio=False, rotate_method='largest_box', always_apply=False, p=0.5) [view source on GitHub]

Augmentation to apply affine transformations to images. This is mostly a wrapper around the corresponding classes and functions in OpenCV.

Affine transformations involve:

- Translation ("move" image on the x-/y-axis)
- Rotation
- Scaling ("zoom" in/out)
- Shear (move one side of the image, turning a square into a trapezoid)

All such transformations can create "new" pixels in the image without defined content, e.g. if the image is translated to the left, pixels are created on the right. The parameters cval and mode of this class control how these pixels are filled.

Some transformations involve interpolations between several pixels of the input image to generate output pixel values. The parameters interpolation and mask_interpolation control the interpolation method used for this.


scale (number, tuple of number or dict): Scaling factor to use, where ``1.0`` denotes "no change" and
    ``0.5`` is zoomed out to ``50`` percent of the original size.
        * If a single number, then that value will be used for all images.
        * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``.
          The same range will be used for both the x- and y-axis. To keep the aspect ratio, set
          ``keep_ratio=True``; then the same sampled value will be used for both axes.
        * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
          Each of these keys can have the same values as described above.
          Using a dictionary allows setting different values for the two axes, and sampling will then happen
          *independently* per axis, resulting in samples that differ between the axes. Note that when
          ``keep_ratio=True``, the x- and y-axis ranges should be the same.
translate_percent (None, number, tuple of number or dict): Translation as a fraction of the image height/width
    (x-translation, y-translation), where ``0`` denotes "no change"
    and ``0.5`` denotes "half of the axis size".
        * If ``None`` then equivalent to ``0.0`` unless `translate_px` has a value other than ``None``.
        * If a single number, then that value will be used for all images.
        * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``.
          That sampled fraction value will be used identically for both x- and y-axis.
        * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
          Each of these keys can have the same values as described above.
          Using a dictionary allows setting different values for the two axes, and sampling will then happen
          *independently* per axis, resulting in samples that differ between the axes.
translate_px (None, int, tuple of int or dict): Translation in pixels.
        * If ``None`` then equivalent to ``0`` unless `translate_percent` has a value other than ``None``.
        * If a single int, then that value will be used for all images.
        * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from
          the discrete interval ``[a..b]``. That number will be used identically for both x- and y-axis.
        * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
          Each of these keys can have the same values as described above.
          Using a dictionary allows setting different values for the two axes, and sampling will then happen
          *independently* per axis, resulting in samples that differ between the axes.
rotate (number or tuple of number): Rotation in degrees (**NOT** radians), i.e. expected value range is
    around ``[-360, 360]``. Rotation happens around the *center* of the image,
    not the top left corner as in some other frameworks.
        * If a number, then that value will be used for all images.
        * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``
          and used as the rotation value.
shear (number, tuple of number or dict): Shear in degrees (**NOT** radians), i.e. expected value range is
    around ``[-360, 360]``, with reasonable values being in the range of ``[-45, 45]``.
        * If a number, then that value will be used for all images as
          the shear on the x-axis (no shear on the y-axis will be done).
        * If a tuple ``(a, b)``, then two values will be uniformly sampled per image
          from the interval ``[a, b]`` and used as the x- and y-shear values.
        * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
          Each of these keys can have the same values as described above.
          Using a dictionary allows setting different values for the two axes, and sampling will then happen
          *independently* per axis, resulting in samples that differ between the axes.
interpolation (int): OpenCV interpolation flag.
mask_interpolation (int): OpenCV interpolation flag.
cval (number or sequence of number): The constant value to use when filling in newly created pixels.
    (E.g. translating by 1px to the right will create a new 1px-wide column of pixels
    on the left of the image).
    The value is only used when `mode=constant`. The expected value range is ``[0, 255]`` for ``uint8`` images.
cval_mask (number or tuple of number): Same as cval but only for masks.
mode (int): OpenCV border flag.
fit_output (bool): If True, the image plane size and position will be adjusted to tightly capture
    the whole image after affine transformation (`translate_percent` and `translate_px` are ignored).
    Otherwise (``False``), parts of the transformed image may end up outside the image plane.
    Fitting the output shape can be useful to avoid corners of the image being outside the image plane
    after applying rotations. Default: False
keep_ratio (bool): When True, the original aspect ratio will be kept when the random scale is applied.
                   Default: False.
rotate_method (str): rotation method used for the bounding boxes. Should be one of "largest_box" or
    "ellipse"[1].
    Default: "largest_box"
p (float): probability of applying the transform. Default: 0.5.

Targets: image, mask, keypoints, bboxes

Image types: uint8, float32

Reference: [1] https://arxiv.org/abs/2109.13488
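
A sketch of the dictionary form of the parameters, which samples each axis independently (the ranges below are illustrative):

```python
import albumentations as A

transform = A.Affine(
    scale={"x": (0.8, 1.2), "y": (0.8, 1.2)},                # per-axis zoom
    translate_percent={"x": (-0.1, 0.1), "y": (-0.1, 0.1)},  # per-axis shift
    rotate=(-15, 15),                                        # degrees, around the image center
    shear={"x": (-10, 10)},                                  # x-shear only
    cval=0,                                                  # fill value for new pixels
    mode=0,                                                  # cv2.BORDER_CONSTANT
    p=1.0,
)
```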

class albumentations.augmentations.geometric.transforms.ElasticTransform (alpha=1, sigma=50, alpha_affine=50, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, approximate=False, same_dxdy=False, p=0.5) [view source on GitHub]

Elastic deformation of images as described in [Simard2003]_ (with modifications). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.


alpha (float): scaling factor for the displacement field; larger values produce stronger deformations.
sigma (float): Gaussian filter parameter.
alpha_affine (float): The range will be (-alpha_affine, alpha_affine)
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.
border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
    cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
    Default: cv2.BORDER_REFLECT_101
value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
            list of ints,
            list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
approximate (boolean): Whether to smooth the displacement map with a fixed kernel size.
                       Enabling this option gives ~2X speedup on large images.
same_dxdy (boolean): Whether to use the same randomly generated shift for x and y.
                     Enabling this option gives ~2X speedup.

Targets: image, mask, bboxes

Image types: uint8, float32
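
A sketch combining the speed-oriented flags described above (parameter values are illustrative):

```python
import albumentations as A
import cv2

transform = A.ElasticTransform(
    alpha=120,                 # strength of the displacement field
    sigma=120 * 0.05,          # smoothing of the displacement field
    alpha_affine=120 * 0.03,   # range of the preceding random affine shift
    interpolation=cv2.INTER_LINEAR,
    border_mode=cv2.BORDER_REFLECT_101,
    approximate=True,          # ~2x faster on large images
    same_dxdy=True,            # another ~2x, same shift reused for x and y
    p=1.0,
)
```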

class albumentations.augmentations.geometric.transforms.Flip [view source on GitHub]

Flip the input either horizontally, vertically or both horizontally and vertically.


p (float): probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

albumentations.augmentations.geometric.transforms.Flip.apply (self, img, d=0, **params)

d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping, -1 for both vertical and horizontal flipping (which can also be seen as rotating the input by 180 degrees).

class albumentations.augmentations.geometric.transforms.GridDistortion (num_steps=5, distort_limit=0.3, interpolation=1, border_mode=4, value=None, mask_value=None, normalized=False, always_apply=False, p=0.5) [view source on GitHub]

num_steps (int): count of grid cells on each side.
distort_limit (float, [float, float]): If distort_limit is a single float, the range
    will be (-distort_limit, distort_limit). Default: (-0.3, 0.3).
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.
border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
    cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
    Default: cv2.BORDER_REFLECT_101
value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
            list of ints,
            list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
normalized (bool): if True, the distortion is normalized so that it does not extend outside the image. Default: False.
    See for more information: https://github.com/albumentations-team/albumentations/pull/722

Targets: image, mask

Image types: uint8, float32

class albumentations.augmentations.geometric.transforms.HorizontalFlip [view source on GitHub]

Flip the input horizontally around the y-axis.


p (float): probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.geometric.transforms.OpticalDistortion (distort_limit=0.05, shift_limit=0.05, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, p=0.5) [view source on GitHub]

distort_limit (float, [float, float]): If distort_limit is a single float, the range
    will be (-distort_limit, distort_limit). Default: (-0.05, 0.05).
shift_limit (float, [float, float])): If shift_limit is a single float, the range
    will be (-shift_limit, shift_limit). Default: (-0.05, 0.05).
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.
border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
    cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
    Default: cv2.BORDER_REFLECT_101
value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
            list of ints,
            list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

Targets: image, mask, bboxes

Image types: uint8, float32

class albumentations.augmentations.geometric.transforms.PadIfNeeded (min_height=1024, min_width=1024, pad_height_divisor=None, pad_width_divisor=None, position=<PositionType.CENTER: 'center'>, border_mode=4, value=None, mask_value=None, always_apply=False, p=1.0) [view source on GitHub]

Pad the sides of the image if a side is less than the desired size (or is not divisible by the given divisor).


min_height (int): minimal result image height.
min_width (int): minimal result image width.
pad_height_divisor (int): if not None, ensures image height is divisible by this value.
pad_width_divisor (int): if not None, ensures image width is divisible by this value.
position (Union[str, PositionType]): position of the image. Should be PositionType.CENTER, PositionType.TOP_LEFT,
    PositionType.TOP_RIGHT, PositionType.BOTTOM_LEFT, PositionType.BOTTOM_RIGHT,
    or PositionType.RANDOM. Default: PositionType.CENTER.
border_mode (OpenCV flag): OpenCV border mode.
value (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
            list of int,
            list of float): padding value for mask if border_mode is cv2.BORDER_CONSTANT.
p (float): probability of applying the transform. Default: 1.0.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32
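
Two common usage patterns, sketched below. Note that for each dimension exactly one of the `min_*` / `pad_*_divisor` pair may be set:

```python
import albumentations as A
import cv2

# Pad to at least 512x512, image centered in the padded canvas.
pad_fixed = A.PadIfNeeded(min_height=512, min_width=512,
                          border_mode=cv2.BORDER_CONSTANT, value=0, p=1.0)

# Pad so height and width become divisible by 32 (a common CNN stride).
pad_divisor = A.PadIfNeeded(min_height=None, min_width=None,
                            pad_height_divisor=32, pad_width_divisor=32,
                            border_mode=cv2.BORDER_REFLECT_101, p=1.0)
```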

class albumentations.augmentations.geometric.transforms.PadIfNeeded.PositionType

Enumerates the types of positions for placing an object within a container.

This Enum class is utilized to define specific anchor positions that an object can assume relative to a container. It's particularly useful in image processing, UI layout, and graphic design to specify the alignment and positioning of elements.

Attributes
CENTER (str): Specifies that the object should be placed at the center.
TOP_LEFT (str): Specifies that the object should be placed at the top-left corner.
TOP_RIGHT (str): Specifies that the object should be placed at the top-right corner.
BOTTOM_LEFT (str): Specifies that the object should be placed at the bottom-left corner.
BOTTOM_RIGHT (str): Specifies that the object should be placed at the bottom-right corner.
RANDOM (str): Indicates that the object's position should be determined randomly.
class albumentations.augmentations.geometric.transforms.Perspective (scale=(0.05, 0.1), keep_size=True, pad_mode=0, pad_val=0, mask_pad_val=0, fit_output=False, interpolation=1, always_apply=False, p=0.5) [view source on GitHub]

Perform a random four point perspective transform of the input.


scale: standard deviation of the normal distributions. These are used to sample
    the random distances of the subimage's corners from the full image's corners.
    If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1).
keep_size: Whether to resize images back to their original size after applying the perspective
    transform. If set to False, the resulting images may end up having different shapes
    and will always be a list, never an array. Default: True
pad_mode (OpenCV flag): OpenCV border mode.
pad_val (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
    Default: 0
mask_pad_val (int, float, list of int, list of float): padding value for mask
    if border_mode is cv2.BORDER_CONSTANT. Default: 0
fit_output (bool): If True, the image plane size and position will be adjusted to still capture
    the whole image after perspective transformation. (Followed by image resizing if keep_size is set to True.)
    Otherwise, parts of the transformed image may be outside of the image plane.
    This setting should not be set to True when using large scale values as it could lead to very large images.
    Default: False
p (float): probability of applying the transform. Default: 0.5.

Targets: image, mask, keypoints, bboxes

Image types: uint8, float32

class albumentations.augmentations.geometric.transforms.PiecewiseAffine (scale=(0.03, 0.05), nb_rows=4, nb_cols=4, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode='constant', absolute_scale=False, always_apply=False, keypoints_threshold=0.01, p=0.5) [view source on GitHub]

Apply affine transformations that differ between local neighbourhoods. This augmentation places a regular grid of points on an image and randomly moves the neighbourhood of these points around via affine transformations. This leads to local distortions.

This is mostly a wrapper around scikit-image's PiecewiseAffine. See also Affine for a similar technique.

Note:
This augmenter is very slow. Try to use ``ElasticTransform`` instead, which is at least 10x faster.
Note:
For coordinate-based inputs (keypoints, bounding boxes, polygons, ...),
this augmenter still has to perform an image-based augmentation,
which makes it significantly slower and not fully exact for such inputs compared to other transforms.

scale (float, tuple of float): Each point on the regular grid is moved around via a normal distribution.
    This scale factor is equivalent to the normal distribution's sigma.
    Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of
    the image if ``absolute_scale=False`` (default), so this scale can be the same for different sized images.
    Recommended values are in the range ``0.01`` to ``0.05`` (weak to strong augmentations).
        * If a single ``float``, then that value will always be used as the scale.
        * If a tuple ``(a, b)`` of ``float`` s, then a random value will
          be uniformly sampled per image from the interval ``[a, b]``.
nb_rows (int, tuple of int): Number of rows of points that the regular grid should have.
    Must be at least ``2``. For large images, you might want to pick a higher value than ``4``.
    You might have to then adjust scale to lower values.
        * If a single ``int``, then that value will always be used as the number of rows.
        * If a tuple ``(a, b)``, then a value from the discrete interval
          ``[a..b]`` will be uniformly sampled per image.
nb_cols (int, tuple of int): Number of columns. Analogous to `nb_rows`.
interpolation (int): The order of interpolation. The order has to be in the range 0-5:
     - 0: Nearest-neighbor
     - 1: Bi-linear (default)
     - 2: Bi-quadratic
     - 3: Bi-cubic
     - 4: Bi-quartic
     - 5: Bi-quintic
mask_interpolation (int): same as interpolation but for mask.
cval (number): The constant value to use when filling in newly created pixels.
cval_mask (number): Same as cval but only for masks.
mode (str): {'constant', 'edge', 'symmetric', 'reflect', 'wrap'}, optional
    Points outside the boundaries of the input are filled according
    to the given mode.  Modes match the behaviour of `numpy.pad`.
absolute_scale (bool): Take `scale` as an absolute value rather than a relative value.
keypoints_threshold (float): Used as threshold in conversion from distance maps to keypoints.
    The search for keypoints works by searching for the
    argmin (non-inverted) or argmax (inverted) in each channel. This
    parameter contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit
    as a keypoint. Use ``None`` to use no min/max. Default: 0.01

Targets: image, mask, keypoints, bboxes

Image types: uint8, float32

class albumentations.augmentations.geometric.transforms.ShiftScaleRotate (shift_limit=0.0625, scale_limit=0.1, rotate_limit=45, interpolation=1, border_mode=4, value=None, mask_value=None, shift_limit_x=None, shift_limit_y=None, rotate_method='largest_box', always_apply=False, p=0.5) [view source on GitHub]

Randomly apply affine transforms: translate, scale and rotate the input.


shift_limit ([float, float] or float): shift factor range for both height and width. If shift_limit
    is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and
    upper bounds should lie in range [0, 1]. Default: (-0.0625, 0.0625).
scale_limit ([float, float] or float): scaling factor range. If scale_limit is a single float value, the
    range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1.
    If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high).
    Default: (-0.1, 0.1).
rotate_limit ([int, int] or int): rotation range. If rotate_limit is a single int value, the
    range will be (-rotate_limit, rotate_limit). Default: (-45, 45).
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.
border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
    cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
    Default: cv2.BORDER_REFLECT_101
value (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
            list of int,
            list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
shift_limit_x ([float, float] or float): shift factor range for width. If it is set then this value
    instead of shift_limit will be used for shifting width.  If shift_limit_x is a single float value,
    the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in
    the range [0, 1]. Default: None.
shift_limit_y ([float, float] or float): shift factor range for height. If it is set then this value
    instead of shift_limit will be used for shifting height.  If shift_limit_y is a single float value,
    the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie
    in the range [0, 1]. Default: None.
rotate_method (str): rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse".
    Default: "largest_box"
p (float): probability of applying the transform. Default: 0.5.

Targets: image, mask, keypoints, bboxes

Image types: uint8, float32
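
A sketch with mild settings (values illustrative); note that `scale_limit=0.1` samples a scale factor from (0.9, 1.1) because the limit is biased by 1:

```python
import albumentations as A
import cv2

transform = A.ShiftScaleRotate(
    shift_limit=0.0625,
    scale_limit=0.1,
    rotate_limit=45,
    interpolation=cv2.INTER_LINEAR,
    border_mode=cv2.BORDER_REFLECT_101,
    rotate_method="ellipse",   # often tighter boxes for round objects
    p=0.5,
)
```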

class albumentations.augmentations.geometric.transforms.Transpose [view source on GitHub]

Transpose the input by swapping rows and columns.


p (float): probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

class albumentations.augmentations.geometric.transforms.VerticalFlip [view source on GitHub]

Flip the input vertically around the x-axis.


p (float): probability of applying the transform. Default: 0.5.

Targets: image, mask, bboxes, keypoints

Image types: uint8, float32

albumentations.augmentations.transforms

class albumentations.augmentations.transforms.ChannelShuffle [view source on GitHub]

Randomly rearrange channels of the input RGB image.


p: probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.CLAHE (clip_limit=4.0, tile_grid_size=(8, 8), always_apply=False, p=0.5) [view source on GitHub]

Apply Contrast Limited Adaptive Histogram Equalization to the input image.


clip_limit: upper threshold value for contrast limiting.
    If clip_limit is a single float value, the range will be (1, clip_limit). Default: (1, 4).
tile_grid_size: size of grid for histogram equalization. Default: (8, 8).
p: probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8

class albumentations.augmentations.transforms.ColorJitter (brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2, always_apply=False, p=0.5) [view source on GitHub]

Randomly changes the brightness, contrast, and saturation of an image. Compared to ColorJitter from torchvision, this transform gives slightly different results because Pillow (used in torchvision) and OpenCV (used in Albumentations) convert an image to HSV format with different formulas. Another difference: Pillow uses uint8 overflow, while we use value saturation.


brightness (float or tuple of float (min, max)): How much to jitter brightness.
    brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness]
    or the given [min, max]. Should be non-negative numbers.
contrast (float or tuple of float (min, max)): How much to jitter contrast.
    contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast]
    or the given [min, max]. Should be non-negative numbers.
saturation (float or tuple of float (min, max)): How much to jitter saturation.
    saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation]
    or the given [min, max]. Should be non-negative numbers.
hue (float or tuple of float (min, max)): How much to jitter hue.
    hue_factor is chosen uniformly from [-hue, hue] or the given [min, max].
    Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
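
A minimal sketch; with these values, brightness/contrast/saturation factors are drawn from [0.8, 1.2] and the hue shift from [-0.1, 0.1]:

```python
import albumentations as A

transform = A.ColorJitter(brightness=0.2, contrast=0.2,
                          saturation=0.2, hue=0.1, p=0.5)
```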

class albumentations.augmentations.transforms.Downscale (scale_min=0.25, scale_max=0.25, interpolation=None, always_apply=False, p=0.5) [view source on GitHub]

Decreases image quality by downscaling and upscaling back.


scale_min: lower bound on the image scale. Should be < 1.
scale_max: upper bound on the image scale. Should be < 1 and >= scale_min.
interpolation: cv2 interpolation method. Could be:
    - a single cv2 interpolation flag - the selected method will be used for both downscale and upscale.
    - dict(downscale=flag, upscale=flag)
    - Downscale.Interpolation(downscale=flag, upscale=flag)
    Default: Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST)

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.Emboss (alpha=(0.2, 0.5), strength=(0.2, 0.7), always_apply=False, p=0.5) [view source on GitHub]

Emboss the input image and overlay the result with the original image.


alpha: range to choose the visibility of the embossed image. At 0, only the original image is
    visible, at 1.0 only its embossed version is visible. Default: (0.2, 0.5).
strength: strength range of the embossing. Default: (0.2, 0.7).
p: probability of applying the transform. Default: 0.5.

Targets: image

class albumentations.augmentations.transforms.Equalize (mode='cv', by_channels=True, mask=None, mask_params=(), always_apply=False, p=0.5) [view source on GitHub]

Equalize the image histogram.


mode (str): {'cv', 'pil'}. Use OpenCV or Pillow equalization method.
by_channels (bool): If True, use equalization by channels separately,
    else convert image to YCbCr representation and use equalization by `Y` channel.
mask (np.ndarray, callable): If given, only the pixels selected by
    the mask are included in the analysis. May be a 1-channel or 3-channel array or a callable.
    Function signature must include `image` argument.
mask_params (list of str): Params for mask function.

Targets: image

Image types: uint8

class albumentations.augmentations.transforms.FancyPCA (alpha=0.1, always_apply=False, p=0.5) [view source on GitHub]

Augment RGB image using FancyPCA from Krizhevsky's paper "ImageNet Classification with Deep Convolutional Neural Networks"


alpha: how much to perturb/scale the eigenvectors and eigenvalues.
    The scale is sampled from a Gaussian distribution (mu=0, sigma=alpha).

Targets: image

Image types: 3-channel uint8 images only

Credit: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf https://deshanadesai.github.io/notes/Fancy-PCA-with-Scikit-Image https://pixelatedbrian.github.io/2018-04-29-fancy_pca/

class albumentations.augmentations.transforms.FromFloat (dtype='uint16', max_value=None, always_apply=False, p=1.0) [view source on GitHub]

Take an input array where all values should lie in the range [0, 1.0], multiply them by max_value, and then cast the resulting values to a type specified by dtype. If max_value is None, the transform will try to infer the maximum value for the data type from the dtype argument.

This is the inverse transform for :class:`~albumentations.augmentations.transforms.ToFloat`.


max_value: maximum possible input value. Default: None.
dtype: data type of the output. See the `'Data types' page from the NumPy docs`_.
    Default: 'uint16'.
p: probability of applying the transform. Default: 1.0.

Targets: image

Image types: float32

.. _'Data types' page from the NumPy docs: https://docs.scipy.org/doc/numpy/user/basics.types.html

class albumentations.augmentations.transforms.GaussNoise (var_limit=(10.0, 50.0), mean=0, per_channel=True, always_apply=False, p=0.5) [view source on GitHub]

Apply Gaussian noise to the input image.


var_limit: variance range for noise. If var_limit is a single float, the range
    will be (0, var_limit). Default: (10.0, 50.0).
mean: mean of the noise. Default: 0
per_channel: if set to True, noise will be sampled for each channel independently.
    Otherwise, the noise will be sampled once for all channels. Default: True
p: probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.HueSaturationValue (hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, always_apply=False, p=0.5) [view source on GitHub]

Randomly change hue, saturation and value of the input image.


hue_shift_limit: range for changing hue. If hue_shift_limit is a single int, the range
    will be (-hue_shift_limit, hue_shift_limit). Default: (-20, 20).
sat_shift_limit: range for changing saturation. If sat_shift_limit is a single int,
    the range will be (-sat_shift_limit, sat_shift_limit). Default: (-30, 30).
val_shift_limit: range for changing value. If val_shift_limit is a single int, the range
    will be (-val_shift_limit, val_shift_limit). Default: (-20, 20).
p (float): probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.ImageCompression (quality_lower=99, quality_upper=100, compression_type=<ImageCompressionType.JPEG: 0>, always_apply=False, p=0.5) [view source on GitHub]

Decreases image quality by applying JPEG or WebP compression to the image.


quality_lower: lower bound on the image quality. Should be in [0, 100] range for jpeg and [1, 100] for webp.
quality_upper: upper bound on the image quality. Should be in [0, 100] range for jpeg and [1, 100] for webp.
compression_type (ImageCompressionType): should be ImageCompressionType.JPEG or ImageCompressionType.WEBP.
    Default: ImageCompressionType.JPEG

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.ImageCompression.ImageCompressionType

Defines the types of image compression.

This Enum class is used to specify the image compression format.

Attributes
JPEG (int): Represents the JPEG image compression format.
WEBP (int): Represents the WEBP image compression format.

class albumentations.augmentations.transforms.InvertImg [view source on GitHub]

Invert the input image by subtracting pixel values from the maximum value for the image type, i.e., 255 for uint8 and 1.0 for float32.


p: probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.ISONoise (color_shift=(0.01, 0.05), intensity=(0.1, 0.5), always_apply=False, p=0.5) [view source on GitHub]

Apply camera sensor noise.


color_shift ([float, float]): variance range for color hue change.
    Measured as a fraction of the 360 degree hue angle in the HLS colorspace.
intensity ([float, float]): Multiplicative factor that controls the strength
    of color and luminance noise.
p (float): probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8

class albumentations.augmentations.transforms.Lambda (image=None, mask=None, keypoint=None, bbox=None, name=None, always_apply=False, p=1.0) [view source on GitHub]

A flexible transformation class for using user-defined transformation functions per target. Function signature must include **kwargs to accept optional arguments like interpolation method, image size, etc.:


image: Image transformation function.
mask: Mask transformation function.
keypoint: Keypoint transformation function.
bbox: BBox transformation function.
always_apply: Indicates whether this transformation should be always applied.
p: probability of applying the transform. Default: 1.0.

Targets: image, mask, bboxes, keypoints

Image types: Any
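
A sketch with illustrative user-defined functions (`solarize_image` and `passthrough_mask` are made-up names, not part of the library):

```python
import albumentations as A
import numpy as np

def solarize_image(image, **kwargs):
    # **kwargs must be accepted; it carries optional params (e.g. shape, interpolation).
    return np.where(image >= 128, 255 - image, image).astype(image.dtype)

def passthrough_mask(mask, **kwargs):
    # Leave masks unchanged.
    return mask

# `name` matters only if the pipeline is serialized and later deserialized.
transform = A.Lambda(image=solarize_image, mask=passthrough_mask,
                     name="custom_solarize", p=1.0)
```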

class albumentations.augmentations.transforms.MultiplicativeNoise (multiplier=(0.9, 1.1), per_channel=False, elementwise=False, always_apply=False, p=0.5) [view source on GitHub]

Multiply the image by a random number or array of numbers.


multiplier: If a single float, the image will be multiplied by this number.
    If a tuple of floats, the multiplier will be sampled from the range `[multiplier[0], multiplier[1])`. Default: (0.9, 1.1).
per_channel: If `False`, the same values will be used for all channels.
    If `True`, values will be sampled for each channel independently. Default: False.
elementwise: If `False`, all pixels in the image are multiplied by a single random value sampled once.
    If `True`, image pixels are multiplied by values that are sampled pixelwise. Default: False.

Targets: image

Image types: Any

class albumentations.augmentations.transforms.Normalize (mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, always_apply=False, p=1.0) [view source on GitHub]

Normalization is applied by the formula: img = (img - mean * max_pixel_value) / (std * max_pixel_value)


mean: mean values
std: std values
max_pixel_value: maximum possible pixel value

Targets: image

Image types: uint8, float32
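
A minimal sketch with the default ImageNet statistics; for a uint8 input this yields a standardized float32 image:

```python
import albumentations as A

# Computes img = (img - mean * 255) / (std * 255) for uint8 inputs.
transform = A.Normalize(mean=(0.485, 0.456, 0.406),
                        std=(0.229, 0.224, 0.225),
                        max_pixel_value=255.0, p=1.0)
```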

class albumentations.augmentations.transforms.PixelDropout (dropout_prob=0.01, per_channel=False, drop_value=0, mask_drop_value=None, always_apply=False, p=0.5) [view source on GitHub]

Set pixels to 0 with some probability.


dropout_prob (float): pixel drop probability. Default: 0.01
per_channel (bool): if set to `True` drop mask will be sampled for each channel,
    otherwise the same mask will be sampled for all channels. Default: False
drop_value (number or sequence of numbers or None): Value that will be set in dropped place.
    If set to None value will be sampled randomly, default ranges will be used:
        - uint8 - [0, 255]
        - uint16 - [0, 65535]
        - uint32 - [0, 4294967295]
        - float, double - [0, 1]
    Default: 0
mask_drop_value (number or sequence of numbers or None): Value that will be set in dropped place in masks.
    If set to None masks will be unchanged. Default: 0
p (float): probability of applying the transform. Default: 0.5.

Targets: image, mask

Image types: any

class albumentations.augmentations.transforms.Posterize (num_bits=4, always_apply=False, p=0.5) [view source on GitHub]

Reduce the number of bits for each color channel.


num_bits ([int, int] or int,
          or list of ints [r, g, b],
          or list of ints [[r1, r1], [g1, g2], [b1, b2]]): number of high bits.
    If num_bits is a single value, the range will be [num_bits, num_bits].
    Must be in range [0, 8]. Default: 4.
p: probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8

class albumentations.augmentations.transforms.RandomBrightnessContrast (brightness_limit=0.2, contrast_limit=0.2, brightness_by_max=True, always_apply=False, p=0.5) [view source on GitHub]

Randomly change brightness and contrast of the input image.


brightness_limit: factor range for changing brightness.
    If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
contrast_limit: factor range for changing contrast.
    If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
brightness_by_max: If True adjust contrast by image dtype maximum,
    else adjust contrast by image mean.
p: probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomFog (fog_coef_lower=0.3, fog_coef_upper=1, alpha_coef=0.08, always_apply=False, p=0.5) [view source on GitHub]

Simulates fog for the image

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library


fog_coef_lower: lower limit for fog intensity coefficient. Should be in [0, 1] range.
fog_coef_upper: upper limit for fog intensity coefficient. Should be in [0, 1] range.
alpha_coef: transparency of the fog circles. Should be in [0, 1] range.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomGamma (gamma_limit=(80, 120), always_apply=False, p=0.5) [view source on GitHub]

Applies random gamma correction to an image as a form of data augmentation.

This class adjusts the luminance of an image by applying gamma correction with a randomly selected gamma value from a specified range. Gamma correction can simulate various lighting conditions, potentially enhancing model generalization. For more details on gamma correction, see: https://en.wikipedia.org/wiki/Gamma_correction

Attributes
gamma_limit (Union[int, Tuple[int, int]]): The range for gamma adjustment. If `gamma_limit` is a single
    int, the range will be interpreted as (-gamma_limit, gamma_limit), defining how much
    to adjust the image's gamma. Default is (80, 120).
always_apply (bool): If `True`, the transform will always be applied, regardless of `p`.
    Default is `False`.
p (float): The probability that the transform will be applied. Default is 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomGravel (gravel_roi=(0.1, 0.4, 0.9, 0.9), number_of_patches=2, always_apply=False, p=0.5) [view source on GitHub]

Add gravel to the image.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library


gravel_roi: (top-left x, top-left y,
    bottom-right x, bottom-right y). Should be in [0, 1] range.
number_of_patches: number of gravel patches required.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomGridShuffle (grid=(3, 3), always_apply=False, p=0.5) [view source on GitHub]

Randomly shuffle the grid's cells on the image.


grid ([int, int]): size of grid for splitting image.

Targets: image, mask, keypoints

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomRain (slant_lower=-10, slant_upper=10, drop_length=20, drop_width=1, drop_color=(200, 200, 200), blur_value=7, brightness_coefficient=0.7, rain_type=None, always_apply=False, p=0.5) [view source on GitHub]

Adds rain effects.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library


slant_lower: should be in range [-20, 20].
slant_upper: should be in range [-20, 20].
drop_length: should be in range [0, 100].
drop_width: should be in range [1, 5].
drop_color (list of (r, g, b)): rain lines color.
blur_value (int): rainy views are blurry.
brightness_coefficient (float): rainy days are usually shady. Should be in range [0, 1].
rain_type: One of [None, "drizzle", "heavy", "torrential"]

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomShadow (shadow_roi=(0, 0.5, 1, 1), num_shadows_lower=1, num_shadows_upper=2, shadow_dimension=5, always_apply=False, p=0.5) [view source on GitHub]

Simulates shadows for the image

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library


shadow_roi: region of the image where shadows
    will appear. All values should be in range [0, 1].
num_shadows_lower: Lower limit for the possible number of shadows.
    Should be in range [0, `num_shadows_upper`].
num_shadows_upper: Upper limit for the possible number of shadows.
    Should be in range [`num_shadows_lower`, inf].
shadow_dimension: number of edges in the shadow polygons

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomSnow (snow_point_lower=0.1, snow_point_upper=0.3, brightness_coeff=2.5, always_apply=False, p=0.5) [view source on GitHub]

Bleach out some pixel values simulating snow.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library


snow_point_lower: lower bound of the amount of snow. Should be in [0, 1] range.
snow_point_upper: upper bound of the amount of snow. Should be in [0, 1] range.
brightness_coeff: a larger number will lead to more snow on the image. Should be >= 0.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomSunFlare (flare_roi=(0, 0, 1, 0.5), angle_lower=0, angle_upper=1, num_flare_circles_lower=6, num_flare_circles_upper=10, src_radius=400, src_color=(255, 255, 255), always_apply=False, p=0.5) [view source on GitHub]

Simulates Sun Flare for the image

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library


flare_roi: region of the image where flare will appear (x_min, y_min, x_max, y_max).
    All values should be in range [0, 1].
angle_lower: should be in range [0, `angle_upper`].
angle_upper: should be in range [`angle_lower`, 1].
num_flare_circles_lower: lower limit for the number of flare circles.
    Should be in range [0, `num_flare_circles_upper`].
num_flare_circles_upper: upper limit for the number of flare circles.
    Should be in range [`num_flare_circles_lower`, inf].
src_radius: radius of the flare source circle, in pixels. Default: 400.
src_color: color of the flare

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RandomToneCurve (scale=0.1, always_apply=False, p=0.5) [view source on GitHub]

Randomly change the relationship between bright and dark areas of the image by manipulating its tone curve.


scale: standard deviation of the normal distribution.
    Used to sample random distances to move two control points that modify the image's curve.
    Values should be in range [0, 1]. Default: 0.1

Targets: image

Image types: uint8

class albumentations.augmentations.transforms.RGBShift (r_shift_limit=20, g_shift_limit=20, b_shift_limit=20, always_apply=False, p=0.5) [view source on GitHub]

Randomly shift values for each channel of the input RGB image.


r_shift_limit: range for changing values for the red channel. If r_shift_limit is a single
    int, the range will be (-r_shift_limit, r_shift_limit). Default: (-20, 20).
g_shift_limit: range for changing values for the green channel. If g_shift_limit is a
    single int, the range  will be (-g_shift_limit, g_shift_limit). Default: (-20, 20).
b_shift_limit: range for changing values for the blue channel. If b_shift_limit is a single
    int, the range will be (-b_shift_limit, b_shift_limit). Default: (-20, 20).
p: probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.RingingOvershoot (blur_limit=(7, 15), cutoff=(0.7853981633974483, 1.5707963267948966), always_apply=False, p=0.5) [view source on GitHub]

Create ringing or overshoot artifacts by convolving the image with a 2D sinc filter.


blur_limit: maximum kernel size for sinc filter.
    Should be in range [3, inf). Default: (7, 15).
cutoff: range to choose the cutoff frequency in radians.
    Should be in range (0, np.pi)
    Default: (np.pi / 4, np.pi / 2).
p: probability of applying the transform. Default: 0.5.

Reference: dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter https://arxiv.org/abs/2107.10833

Targets: image

class albumentations.augmentations.transforms.Sharpen (alpha=(0.2, 0.5), lightness=(0.5, 1.0), always_apply=False, p=0.5) [view source on GitHub]

Sharpen the input image and overlay the result with the original image.


alpha: range to choose the visibility of the sharpened image. At 0, only the original image is
    visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5).
lightness: range to choose the lightness of the sharpened image. Default: (0.5, 1.0).
p: probability of applying the transform. Default: 0.5.

Targets: image

class albumentations.augmentations.transforms.Solarize (threshold=128, always_apply=False, p=0.5) [view source on GitHub]

Invert all pixel values above a threshold.


threshold: range for solarizing threshold.
    If threshold is a single value, the range will be [threshold, threshold]. Default: 128.
p: probability of applying the transform. Default: 0.5.

Targets: image

Image types: any

class albumentations.augmentations.transforms.Spatter (mean=0.65, std=0.3, gauss_sigma=2, cutout_threshold=0.68, intensity=0.6, mode='rain', color=None, always_apply=False, p=0.5) [view source on GitHub]

Apply spatter transform. It simulates corruption which can occlude a lens in the form of rain or mud.


mean (float, or tuple of floats): Mean value of normal distribution for generating liquid layer.
    If a single float, it will be used as the mean.
    If a tuple of floats, the mean will be sampled from the range `[mean[0], mean[1])`. Default: 0.65.
std (float, or tuple of floats): Standard deviation of normal distribution for generating liquid layer.
    If a single float, it will be used as the std.
    If a tuple of floats, the std will be sampled from the range `[std[0], std[1])`. Default: 0.3.
gauss_sigma (float, or tuple of floats): Sigma value for gaussian filtering of the liquid layer.
    If a single float, it will be used as gauss_sigma.
    If a tuple of floats, gauss_sigma will be sampled from the range `[sigma[0], sigma[1])`. Default: 2.
cutout_threshold (float, or tuple of floats): Threshold for filtering the liquid layer
    (determines the number of drops). If a single float, it will be used as cutout_threshold.
    If a tuple of floats, cutout_threshold will be sampled from the range `[cutout_threshold[0], cutout_threshold[1])`.
    Default: 0.68.
intensity (float, or tuple of floats): Intensity of corruption.
    If a single float, it will be used as the intensity.
    If a tuple of floats, the intensity will be sampled from the range `[intensity[0], intensity[1])`. Default: 0.6.
mode (string, or list of strings): Type of corruption. Currently supported options are 'rain' and 'mud'.
    If a list is provided, the type of corruption will be sampled from the list. Default: "rain".
color (list of (r, g, b) or dict or None): Corruption elements color.
    If a list, uses the provided list as the color for the specified mode.
    If a dict, uses the provided color for the specified mode. A color for each specified mode should be provided in the dict.
    If None, uses the default colors (rain: (238, 238, 175), mud: (20, 42, 63)).
p (float): probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

Reference: https://arxiv.org/pdf/1903.12261.pdf and https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

class albumentations.augmentations.transforms.Superpixels (p_replace=0.1, n_segments=100, max_size=128, interpolation=1, always_apply=False, p=0.5) [view source on GitHub]

Transform images partially/completely to their superpixel representation. This implementation uses skimage's version of the SLIC algorithm.


p_replace (float or tuple of float): Defines for any segment the probability that the pixels within that
    segment are replaced by their average color (otherwise, the pixels are not changed).
    Examples:
        * A probability of ``0.0`` would mean that the pixels in no
          segment are replaced by their average color (image is not
          changed at all).
        * A probability of ``0.5`` would mean that around half of all
          segments are replaced by their average color.
        * A probability of ``1.0`` would mean that all segments are
          replaced by their average color (resulting in a voronoi
          image).
    Behaviour based on chosen data types for this parameter:
        * If a ``float``, then that value will always be used as the probability.
        * If a ``tuple`` ``(a, b)``, then a random probability will be
          sampled from the interval ``[a, b]`` per image.
n_segments (int, or tuple of int): Rough target number of how many superpixels to generate (the algorithm
    may deviate from this number). Lower values will lead to coarser superpixels.
    Higher values are computationally more intensive and will hence lead to a slowdown.
    * If a single ``int``, then that value will always be used as the
      number of segments.
    * If a ``tuple`` ``(a, b)``, then a value from the discrete
      interval ``[a..b]`` will be sampled per image.
max_size (int or None): Maximum image size at which the augmentation is performed.
    If the width or height of an image exceeds this value, it will be
    downscaled before the augmentation so that the longest side matches `max_size`.
    This is done to speed up the process. The final output image has the same size as the input image.
    Note that in case `p_replace` is below ``1.0``,
    the down-/upscaling will affect the not-replaced pixels too.
    Use ``None`` to apply no down-/upscaling.
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 0.5.

Targets: image

class albumentations.augmentations.transforms.TemplateTransform (templates, img_weight=0.5, template_weight=0.5, template_transform=None, name=None, always_apply=False, p=0.5) [view source on GitHub]

Apply blending of the input image with the specified templates.


templates (numpy array or list of numpy arrays): images to use as templates for the transform.
img_weight ([float, float] or float): If a single float, it will be used as the weight for the input image.
    If a tuple of floats, img_weight will be sampled from the range [img_weight[0], img_weight[1]). Default: 0.5.
template_weight ([float, float] or float): If a single float, it will be used as the weight for the template.
    If a tuple of floats, template_weight will be sampled from the range [template_weight[0], template_weight[1]).
    Default: 0.5.
template_transform: transformation object that can be applied to the template; must produce a template of the same
    size as the input image.
name (str): (optional) name of the transform, used only for deserialization.
p (float): probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.ToFloat (max_value=None, always_apply=False, p=1.0) [view source on GitHub]

Divide pixel values by max_value to get a float32 output array where all values lie in the range [0, 1.0]. If max_value is None the transform will try to infer the maximum value by inspecting the data type of the input image.

See Also:
:class:`~albumentations.augmentations.transforms.FromFloat`

max_value: maximum possible input value. Default: None.
p: probability of applying the transform. Default: 1.0.

Targets: image

Image types: any type
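
A round-trip sketch pairing ToFloat with FromFloat (the synthetic image is a placeholder):

```python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)

as_float = A.ToFloat(max_value=255.0, p=1.0)(image=image)["image"]          # float32 in [0, 1]
restored = A.FromFloat(dtype="uint8", max_value=255.0, p=1.0)(image=as_float)["image"]

assert as_float.dtype == np.float32 and restored.dtype == np.uint8
```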

class albumentations.augmentations.transforms.ToGray [view source on GitHub]

Convert the input RGB image to grayscale. If the mean pixel value for the resulting image is greater than 127, invert the resulting grayscale image.


p: probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.ToRGB (always_apply=True, p=1.0) [view source on GitHub]

Convert the input grayscale image to RGB.


p: probability of applying the transform. Default: 1.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.ToSepia (always_apply=False, p=0.5) [view source on GitHub]

Applies a sepia filter to the input RGB image.


p: probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8, float32

class albumentations.augmentations.transforms.UnsharpMask (blur_limit=(3, 7), sigma_limit=0.0, alpha=(0.2, 0.5), threshold=10, always_apply=False, p=0.5) [view source on GitHub]

Sharpen the input image using Unsharp Masking processing and overlay the result with the original image.


blur_limit: maximum Gaussian kernel size for blurring the input image.
    Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma
    as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
    If a single value is provided, `blur_limit` will be in the range (0, blur_limit).
    Default: (3, 7).
sigma_limit: Gaussian kernel standard deviation. Must be in range [0, inf).
    If a single value is provided, `sigma_limit` will be in the range (0, sigma_limit).
    If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
alpha: range to choose the visibility of the sharpened image.
    At 0, only the original image is visible, at 1.0 only its sharpened version is visible.
    Default: (0.2, 0.5).
threshold: Value to limit sharpening only to areas with a high pixel difference between the original image
    and its smoothed version. A higher threshold means less sharpening on flat areas.
    Must be in range [0, 255]. Default: 10.
p: probability of applying the transform. Default: 0.5.

Reference: arxiv.org/pdf/2107.10833.pdf

Targets: image

albumentations.augmentations.utils

def albumentations.augmentations.utils.ensure_contiguous (func) [view source on GitHub]

Ensure that input img is contiguous.

def albumentations.augmentations.utils.get_opencv_dtype_from_numpy (value) [view source on GitHub]

Return the corresponding OpenCV dtype for a NumPy dtype.

:param value: input dtype of a NumPy array
:return: the corresponding OpenCV dtype

def albumentations.augmentations.utils.preserve_channel_dim (func) [view source on GitHub]

Preserve dummy channel dim.

def albumentations.augmentations.utils.preserve_shape (func) [view source on GitHub]

Preserve shape of the image

albumentations.core special

albumentations.core.bbox_utils

class albumentations.core.bbox_utils.BboxParams (format, label_fields=None, min_area=0.0, min_visibility=0.0, min_width=0.0, min_height=0.0, check_each_transform=True) [view source on GitHub]

Parameters of bounding boxes


format (str): format of bounding boxes. Should be 'coco', 'pascal_voc', 'albumentations' or 'yolo'.

    The `coco` format
        `[x_min, y_min, width, height]`, e.g. [97, 12, 150, 200].
    The `pascal_voc` format
        `[x_min, y_min, x_max, y_max]`, e.g. [97, 12, 247, 212].
    The `albumentations` format
        is like `pascal_voc`, but normalized,
        in other words: `[x_min, y_min, x_max, y_max]`, e.g. [0.2, 0.3, 0.4, 0.5].
    The `yolo` format
        `[x, y, width, height]`, e.g. [0.1, 0.2, 0.3, 0.4];
        `x`, `y` - normalized bbox center; `width`, `height` - normalized bbox width and height.
label_fields (list): list of fields that are joined with boxes, e.g. labels.
    Should be the same type as boxes.
min_area (float): minimum area of a bounding box. All bounding boxes whose
    visible area in pixels is less than this value will be removed. Default: 0.0.
min_visibility (float): minimum fraction of its original area a bounding box must retain
    after augmentation to remain in the list. Default: 0.0.
min_width (float): Minimum width of a bounding box. All bounding boxes whose width is
    less than this value will be removed. Default: 0.0.
min_height (float): Minimum height of a bounding box. All bounding boxes whose height is
    less than this value will be removed. Default: 0.0.
check_each_transform (bool): if `True`, then bboxes will be checked after each dual transform.
    Default: `True`
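
A sketch wiring BboxParams into a pipeline (transform choices and thresholds are illustrative):

```python
import albumentations as A

transform = A.Compose(
    [A.HorizontalFlip(p=0.5), A.RandomCrop(height=256, width=256)],
    bbox_params=A.BboxParams(
        format="coco",                  # boxes given as [x_min, y_min, width, height]
        label_fields=["class_labels"],  # labels travel alongside the boxes
        min_area=16,                    # drop boxes smaller than 16 px after transforms
        min_visibility=0.2,             # drop boxes that lost more than 80% of their area
    ),
)
# result = transform(image=image, bboxes=[[97, 12, 150, 200]], class_labels=["dog"])
```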

def albumentations.core.bbox_utils.calculate_bbox_area (bbox, rows, cols) [view source on GitHub]

Calculate the area of a bounding box in (fractional) pixels.


bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
rows: Image height.
cols: Image width.

Returns:

    float: Area in (fractional) pixels of the (denormalized) bounding box.

def albumentations.core.bbox_utils.check_bbox (bbox) [view source on GitHub]

Check if bbox boundaries are in the range [0, 1] and minimums are less than maximums.

def albumentations.core.bbox_utils.check_bboxes (bboxes) [view source on GitHub]

Check if bboxes boundaries are in the range [0, 1] and minimums are less than maximums.

def albumentations.core.bbox_utils.convert_bbox_from_albumentations (bbox, target_format, rows, cols, check_validity=False) [view source on GitHub]

Convert a bounding box from the format used by albumentations to a format specified in target_format.


bbox: An albumentations bounding box `(x_min, y_min, x_max, y_max)`.
target_format: required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.
rows: Image height.
cols: Image width.
check_validity: Check if all boxes are valid boxes.

Returns:

Type Description
Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

tuple: A bounding box.
Note:
The `coco` format of a bounding box looks like `[x_min, y_min, width, height]`, e.g. [97, 12, 150, 200].
The `pascal_voc` format of a bounding box looks like `[x_min, y_min, x_max, y_max]`, e.g. [97, 12, 247, 212].
The `yolo` format of a bounding box looks like `[x, y, width, height]`, e.g. [0.3, 0.1, 0.05, 0.07].

ValueError: if `target_format` is not equal to `coco`, `pascal_voc` or `yolo`.

def albumentations.core.bbox_utils.convert_bbox_to_albumentations (bbox, source_format, rows, cols, check_validity=False) [view source on GitHub]

Convert a bounding box from a format specified in `source_format` to the format used by albumentations: normalized coordinates of the top-left and bottom-right corners of the bounding box in the form `(x_min, y_min, x_max, y_max)`, e.g. (0.15, 0.27, 0.67, 0.5).


bbox: A bounding box tuple.
source_format: format of the bounding box. Should be 'coco', 'pascal_voc', or 'yolo'.
check_validity: Check if all boxes are valid boxes.
rows: Image height.
cols: Image width.

Returns:

Type Description
Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

tuple: A bounding box `(x_min, y_min, x_max, y_max)`.
Note:
The `coco` format of a bounding box looks like `(x_min, y_min, width, height)`, e.g. (97, 12, 150, 200).
The `pascal_voc` format of a bounding box looks like `(x_min, y_min, x_max, y_max)`, e.g. (97, 12, 247, 212).
The `yolo` format of a bounding box looks like `(x, y, width, height)`, e.g. (0.3, 0.1, 0.05, 0.07);
where `x`, `y` coordinates of the center of the box, all values normalized to 1 by image height and width.

ValueError: If `source_format` is not equal to `coco`, `pascal_voc`, or `yolo`.
ValueError: If the bbox is in the `yolo` format and its coordinates are not in the range (0, 1).
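
A sketch of a round trip through the albumentations format; the round trip should recover the original box up to floating-point rounding:

>>> from albumentations.core.bbox_utils import convert_bbox_to_albumentations
>>> from albumentations.core.bbox_utils import convert_bbox_from_albumentations
>>> coco_box = (97, 12, 150, 200)  # [x_min, y_min, width, height] on a 240x320 image
>>> alb_box = convert_bbox_to_albumentations(coco_box, 'coco', rows=240, cols=320)
>>> convert_bbox_from_albumentations(alb_box, 'coco', rows=240, cols=320)  # ~(97.0, 12.0, 150.0, 200.0)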

def albumentations.core.bbox_utils.convert_bboxes_from_albumentations (bboxes, target_format, rows, cols, check_validity=False) [view source on GitHub]

Convert a list of bounding boxes from the format used by albumentations to the format specified in `target_format`.


bboxes: List of albumentations bounding box `(x_min, y_min, x_max, y_max)`.
target_format: required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.
rows: Image height.
cols: Image width.
check_validity: Check if all boxes are valid boxes.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

List of bounding boxes.

def albumentations.core.bbox_utils.convert_bboxes_to_albumentations (bboxes, source_format, rows, cols, check_validity=False) [view source on GitHub]

Convert a list of bounding boxes from the format specified in `source_format` to the format used by albumentations.

def albumentations.core.bbox_utils.denormalize_bbox (bbox, rows, cols) [view source on GitHub]

Denormalize coordinates of a bounding box. Multiply x-coordinates by image width and y-coordinates by image height. This is the inverse operation of `normalize_bbox`.


bbox: Normalized bounding box `(x_min, y_min, x_max, y_max)`.
rows: Image height.
cols: Image width.

Returns:

Type Description
Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

Denormalized bounding box `(x_min, y_min, x_max, y_max)`.

ValueError: If `rows` or `cols` is less than or equal to zero.

def albumentations.core.bbox_utils.denormalize_bboxes (bboxes, rows, cols) [view source on GitHub]

Denormalize a list of bounding boxes.


bboxes: Normalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.
rows: Image height.
cols: Image width.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

List: Denormalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.

def albumentations.core.bbox_utils.filter_bboxes (bboxes, rows, cols, min_area=0.0, min_visibility=0.0, min_width=0.0, min_height=0.0) [view source on GitHub]

Remove bounding boxes that either lie outside the visible area by more than `min_visibility` or whose area in pixels is under the threshold set by `min_area`. It also clips boxes to the final image size.


bboxes: List of albumentations bounding boxes `(x_min, y_min, x_max, y_max)`.
rows: Image height.
cols: Image width.
min_area: Minimum area of a bounding box. All bounding boxes whose visible area in pixels
    is less than this value will be removed. Default: 0.0.
min_visibility: Minimum fraction of its original area a bounding box must retain to remain in the list. Default: 0.0.
min_width: Minimum width of a bounding box. All bounding boxes whose width is
    less than this value will be removed. Default: 0.0.
min_height: Minimum height of a bounding box. All bounding boxes whose height is
    less than this value will be removed. Default: 0.0.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

List of bounding boxes.
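
A minimal sketch: on a 100x100 image, the first box below covers only about 1 px and is dropped by `min_area`, while the second is clipped to the image and kept:

>>> from albumentations.core.bbox_utils import filter_bboxes
>>> boxes = [(0.10, 0.10, 0.11, 0.11), (0.50, 0.50, 1.20, 1.20)]
>>> filter_bboxes(boxes, rows=100, cols=100, min_area=4.0)  # ~[(0.5, 0.5, 1.0, 1.0)]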

def albumentations.core.bbox_utils.filter_bboxes_by_visibility (original_shape, bboxes, transformed_shape, transformed_bboxes, threshold=0.0, min_area=0.0) [view source on GitHub]

Filter bounding boxes and return only those boxes whose visibility after the transformation is above the threshold and whose area in pixels is greater than `min_area`.


original_shape: Original image shape `(height, width, ...)`.
bboxes: Original bounding boxes `[(x_min, y_min, x_max, y_max)]`.
transformed_shape: Transformed image shape `(height, width)`.
transformed_bboxes: Transformed bounding boxes `[(x_min, y_min, x_max, y_max)]`.
threshold: visibility threshold. Should be a value in the range [0.0, 1.0].
min_area: Minimal area threshold.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Filtered bounding boxes `[(x_min, y_min, x_max, y_max)]`.

def albumentations.core.bbox_utils.normalize_bbox (bbox, rows, cols) [view source on GitHub]

Normalize coordinates of a bounding box. Divide x-coordinates by image width and y-coordinates by image height.


bbox: Denormalized bounding box `(x_min, y_min, x_max, y_max)`.
rows: Image height.
cols: Image width.

Returns:

Type Description
Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

Normalized bounding box `(x_min, y_min, x_max, y_max)`.

ValueError: If `rows` or `cols` is less than or equal to zero.

def albumentations.core.bbox_utils.normalize_bboxes (bboxes, rows, cols) [view source on GitHub]

Normalize a list of bounding boxes.


bboxes: Denormalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.
rows: Image height.
cols: Image width.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Normalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.
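
For example, normalization simply divides x-coordinates by `cols` (image width) and y-coordinates by `rows` (image height); a small sketch:

>>> from albumentations.core.bbox_utils import normalize_bbox, denormalize_bbox
>>> nb = normalize_bbox((40, 50, 360, 150), rows=200, cols=400)  # -> (0.1, 0.25, 0.9, 0.75)
>>> denormalize_bbox(nb, rows=200, cols=400)                     # back to pixel coordinates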

def albumentations.core.bbox_utils.union_of_bboxes (height, width, bboxes, erosion_rate=0.0) [view source on GitHub]

Calculate union of bounding boxes.


height (float): Height of image or space.
width (float): Width of image or space.
bboxes (List[tuple]): List like bounding boxes. Format is `[(x_min, y_min, x_max, y_max)]`.
erosion_rate (float): How much each bounding box can be shrunk, useful for erosive cropping.
    Set this in the range [0, 1]: 0.0 applies no erosion, while 1.0 may shrink any bbox to zero area.

Returns:

Type Description
Tuple[float, float, float, float]

tuple: A bounding box `(x_min, y_min, x_max, y_max)`.
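
With `erosion_rate=0.0`, the result is simply the smallest box covering all inputs, e.g.:

>>> from albumentations.core.bbox_utils import union_of_bboxes
>>> boxes = [(0.1, 0.1, 0.4, 0.4), (0.3, 0.3, 0.8, 0.9)]
>>> union_of_bboxes(height=100, width=100, bboxes=boxes)  # -> (0.1, 0.1, 0.8, 0.9)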

albumentations.core.composition

class albumentations.core.composition.Compose (transforms, bbox_params=None, keypoint_params=None, additional_targets=None, p=1.0, is_check_shapes=True) [view source on GitHub]

Compose transforms and handle all transformations regarding bounding boxes


transforms (list): list of transformations to compose.
bbox_params (BboxParams): Parameters for bounding boxes transforms
keypoint_params (KeypointParams): Parameters for keypoints transforms
additional_targets (dict): Dict with keys - new target name, values - old target name. ex: {'image2': 'image'}
p (float): probability of applying all list of transforms. Default: 1.0.
is_check_shapes (bool): If True, the shape consistency of images/masks will be checked on each call. Pass False
    to disable this check (do so only if you are sure your data is consistent).
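
A minimal sketch of `additional_targets`: a second image under the hypothetical key 'image2' receives exactly the same augmentation parameters as 'image':

>>> import albumentations as A
>>> import numpy as np
>>> transform = A.Compose(
>>>     [A.RandomCrop(128, 128), A.HorizontalFlip(p=0.5)],
>>>     additional_targets={'image2': 'image'},
>>> )
>>> img = np.zeros((256, 256, 3), dtype=np.uint8)
>>> out = transform(image=img, image2=img.copy())  # both crops use the same coordinates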

class albumentations.core.composition.OneOf (transforms, p=0.5) [view source on GitHub]

Select one of the transforms to apply. The selected transform will be called with `force_apply=True`. Transform probabilities will be normalized to sum to 1, so in this case they work as weights.


transforms (list): list of transformations to compose.
p (float): probability of applying selected transform. Default: 0.5.
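
For example, with the inner probabilities below, Blur is selected twice as often as GaussNoise (weights 1.0 and 0.5 normalize to 2/3 and 1/3), and `OneOf` itself fires 90% of the time (a sketch):

>>> import albumentations as A
>>> transform = A.Compose([
>>>     A.OneOf([
>>>         A.Blur(p=1.0),        # weight 1.0 -> picked with probability 2/3
>>>         A.GaussNoise(p=0.5),  # weight 0.5 -> picked with probability 1/3
>>>     ], p=0.9),
>>> ])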

class albumentations.core.composition.OneOrOther (first=None, second=None, transforms=None, p=0.5) [view source on GitHub]

Select one or another transform to apply. The selected transform will be called with `force_apply=True`.

class albumentations.core.composition.PerChannel (transforms, channels=None, p=0.5) [view source on GitHub]

Apply transformations per-channel


transforms (list): list of transformations to compose.
channels (sequence): channels to apply the transform to. Pass None to apply to all channels.
    Default: None (apply to all).
p (float): probability of applying the transform. Default: 0.5.

class albumentations.core.composition.Sequential (transforms, p=0.5) [view source on GitHub]

Sequentially applies all transforms to targets.

Note:
This transform is not intended to be a replacement for `Compose`. Instead, it should be used inside `Compose`
the same way `OneOf` or `OneOrOther` are used. For instance, you can combine `OneOf` with `Sequential` to
create an augmentation pipeline that contains multiple sequences of augmentations and applies one randomly
chosen sequence to input data (see the `Example` section for an example definition of such a pipeline).

Examples:


>>> import albumentations as A
>>> transform = A.Compose([
>>>    A.OneOf([
>>>        A.Sequential([
>>>            A.HorizontalFlip(p=0.5),
>>>            A.ShiftScaleRotate(p=0.5),
>>>        ]),
>>>        A.Sequential([
>>>            A.VerticalFlip(p=0.5),
>>>            A.RandomBrightnessContrast(p=0.5),
>>>        ]),
>>>    ], p=1)
>>> ])

class albumentations.core.composition.SomeOf (transforms, n, replace=True, p=1) [view source on GitHub]

Select `n` transforms to apply. Selected transforms will be called with `force_apply=True`. Transform probabilities will be normalized to sum to 1, so in this case they work as weights.


transforms (list): list of transformations to compose.
n (int): number of transforms to apply.
replace (bool): Whether to sample transforms with or without replacement. Default: True.
p (float): probability of applying selected transform. Default: 1.
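
A sketch: apply exactly two of three transforms, sampled without replacement:

>>> import albumentations as A
>>> transform = A.Compose([
>>>     A.SomeOf([
>>>         A.Blur(p=1.0),
>>>         A.ToGray(p=1.0),
>>>         A.ChannelShuffle(p=1.0),
>>>     ], n=2, replace=False),
>>> ])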

albumentations.core.keypoints_utils

class albumentations.core.keypoints_utils.KeypointParams (format, label_fields=None, remove_invisible=True, angle_in_degrees=True, check_each_transform=True) [view source on GitHub]

Parameters of keypoints


format (str): format of keypoints. Should be 'xy', 'yx', 'xya', 'xys', 'xyas', 'xysa'.

    x - X coordinate

    y - Y coordinate

    s - Keypoint scale

    a - Keypoint orientation in radians or degrees (depending on KeypointParams.angle_in_degrees)
label_fields (list): list of fields that are joined with keypoints, e.g. labels.
    Should be the same type as keypoints.
remove_invisible (bool): whether to remove keypoints that are invisible after the transform
angle_in_degrees (bool): whether the angle in 'xya', 'xyas', 'xysa' keypoints is in degrees (True) or radians (False)
check_each_transform (bool): if `True`, then keypoints will be checked after each dual transform.
    Default: `True`
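
A minimal sketch (dummy image, 'xy' format) of passing `KeypointParams` to `Compose`:

>>> import albumentations as A
>>> import numpy as np
>>> transform = A.Compose(
>>>     [A.Rotate(limit=30, p=1.0)],
>>>     keypoint_params=A.KeypointParams(format='xy', remove_invisible=True),
>>> )
>>> image = np.zeros((100, 100, 3), dtype=np.uint8)
>>> out = transform(image=image, keypoints=[(20, 30), (77, 40)])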

class albumentations.core.keypoints_utils.KeypointsProcessor (params, additional_targets=None) [view source on GitHub]

albumentations.core.keypoints_utils.KeypointsProcessor.filter (self, data, rows, cols)

The function filters a sequence of data based on the number of rows and columns, and returns a sequence of keypoints.

data (Sequence[Sequence]): A sequence of sequences; each inner sequence represents a set of keypoints.
rows (int): The number of rows (image height) used for filtering the keypoints.
cols (int): The number of columns (image width) used for filtering the keypoints.

Returns: a sequence of KeypointType objects.

def albumentations.core.keypoints_utils.check_keypoint (kp, rows, cols) [view source on GitHub]

Check that keypoint coordinates lie within the image bounds

def albumentations.core.keypoints_utils.check_keypoints (keypoints, rows, cols) [view source on GitHub]

Check that the coordinates of all keypoints lie within the image bounds

albumentations.core.serialization

class albumentations.core.serialization.Serializable [view source on GitHub]

albumentations.core.serialization.Serializable.to_dict (self, on_not_implemented_error='raise')

Take a transform pipeline and convert it to a serializable representation that uses only standard Python data types: dictionaries, lists, strings, integers, and floats.


self: A transform that should be serialized. If the transform doesn't implement the `to_dict`
    method and `on_not_implemented_error` equals 'raise', then `NotImplementedError` is raised.
    If `on_not_implemented_error` equals 'warn', then the `NotImplementedError` is ignored
    but no transform parameters are serialized.
on_not_implemented_error (str): `raise` or `warn`.

class albumentations.core.serialization.SerializableMeta [view source on GitHub]

A metaclass that is used to register classes in SERIALIZABLE_REGISTRY or NON_SERIALIZABLE_REGISTRY so they can be found later, while deserializing a transform pipeline, using the classes' full names.

albumentations.core.serialization.SerializableMeta.__new__ (cls, name, bases, *args, **kwargs) special staticmethod

Create and return a new object. See help(type) for accurate signature.

def albumentations.core.serialization.from_dict (transform_dict, nonserializable=None) [view source on GitHub]


Construct a transform pipeline from a serialized representation (such as one produced by `to_dict`).

transform_dict: A dictionary with a serialized transform pipeline.
nonserializable (dict): A dictionary that contains non-serializable transforms.
    This dictionary is required when you are restoring a pipeline that contains non-serializable transforms.
Keys in that dictionary should be named the same as the `name` arguments in respective transforms from
    a serialized pipeline.

def albumentations.core.serialization.get_shortest_class_fullname (cls) [view source on GitHub]

The function get_shortest_class_fullname takes a class object as input and returns its shortened full name.

cls (Type[BasicCompose]): A class object that is a subclass of BasicCompose.

Returns: a string, the shortened version of the full class name.

def albumentations.core.serialization.load (filepath_or_buffer, data_format='json', nonserializable=None) [view source on GitHub]

Load a serialized pipeline from a file or file-like object and construct a transform pipeline.


filepath_or_buffer (Union[str, Path, TextIO]): The file path or file-like object to read the serialized
    data from.
    If a string is provided, it is interpreted as a path to a file. If a file-like object is provided,
    the serialized data will be read from it directly.
data_format (str): The format of the serialized data. Valid options are 'json' and 'yaml'.
    Defaults to 'json'.
nonserializable (Optional[Dict[str, Any]]): A dictionary that contains non-serializable transforms.
    This dictionary is required when restoring a pipeline that contains non-serializable transforms.
    Keys in the dictionary should be named the same as the `name` arguments in respective transforms
    from the serialized pipeline. Defaults to None.

Returns:

Type Description
object

object: The deserialized transform pipeline.

ValueError: If `data_format` is 'yaml' but PyYAML is not installed.

def albumentations.core.serialization.register_additional_transforms () [view source on GitHub]

Register transforms that are not imported directly into the albumentations module by checking the availability of optional dependencies.

def albumentations.core.serialization.save (transform, filepath_or_buffer, data_format='json', on_not_implemented_error='raise') [view source on GitHub]

Serialize a transform pipeline and save it to either a file specified by a path or a file-like object in either JSON or YAML format.


transform (Serializable): The transform pipeline to serialize.
filepath_or_buffer (Union[str, Path, TextIO]): The file path or file-like object to write the serialized
    data to.
    If a string is provided, it is interpreted as a path to a file. If a file-like object is provided,
    the serialized data will be written to it directly.
data_format (str): The format to serialize the data in. Valid options are 'json' and 'yaml'.
    Defaults to 'json'.
on_not_implemented_error (str): Determines the behavior if a transform does not implement the `to_dict` method.
    If set to 'raise', a `NotImplementedError` is raised. If set to 'warn', the exception is ignored, and
    no transform arguments are saved. Defaults to 'raise'.

ValueError: If `data_format` is 'yaml' but PyYAML is not installed.
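
A sketch of a save/load round trip (the file path here is arbitrary):

>>> import albumentations as A
>>> transform = A.Compose([A.HorizontalFlip(p=0.5), A.RandomBrightnessContrast(p=0.3)])
>>> A.save(transform, '/tmp/transform.json')  # 'json' is the default data_format
>>> restored = A.load('/tmp/transform.json')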

def albumentations.core.serialization.to_dict (transform, on_not_implemented_error='raise') [view source on GitHub]

Take a transform pipeline and convert it to a serializable representation that uses only standard Python data types: dictionaries, lists, strings, integers, and floats.


transform: A transform that should be serialized. If the transform doesn't implement the `to_dict`
    method and `on_not_implemented_error` equals 'raise', then `NotImplementedError` is raised.
    If `on_not_implemented_error` equals 'warn', then the `NotImplementedError` is ignored
    but no transform parameters are serialized.
on_not_implemented_error (str): `raise` or `warn`.
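
A sketch of an in-memory round trip with `to_dict`/`from_dict`:

>>> import albumentations as A
>>> transform = A.Compose([A.HorizontalFlip(p=0.5)])
>>> d = A.to_dict(transform)   # only dicts, lists, strings, ints, and floats
>>> restored = A.from_dict(d)  # reconstruct an equivalent pipeline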

albumentations.core.transforms_interface

class albumentations.core.transforms_interface.BasicTransform (always_apply=False, p=0.5) [view source on GitHub]

albumentations.core.transforms_interface.BasicTransform.add_targets (self, additional_targets)

Add targets to transform them the same way as one of the existing targets, e.g. {'target_image': 'image'} or {'obj1_mask': 'mask', 'obj2_mask': 'mask'}. Note that you must have at least one target with the key 'image'.


additional_targets (dict): keys - new target name, values - old target name. ex: {'image2': 'image'}

class albumentations.core.transforms_interface.DualTransform [view source on GitHub]

Transform for segmentation task.

class albumentations.core.transforms_interface.ImageOnlyTransform [view source on GitHub]

Transform applied to image only.

class albumentations.core.transforms_interface.NoOp [view source on GitHub]

Does nothing

def albumentations.core.transforms_interface.to_tuple (param, low=None, bias=None) [view source on GitHub]

Convert input argument to a min-max tuple.


param: Input value which could be a scalar or a sequence of exactly 2 scalars.
low: Second element of the tuple, provided as an optional argument for when `param` is a scalar.
bias: An offset added to both elements of the tuple.

Returns:

Type Description
Union[Tuple[int, int], Tuple[float, float]]

A tuple of two scalars, optionally adjusted by `bias`.
Raises ValueError for invalid combinations or types of arguments.
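
A sketch of the expected behavior, assuming the current implementation (a scalar without `low` is mirrored around zero; `low` and `bias` are mutually exclusive):

>>> from albumentations.core.transforms_interface import to_tuple
>>> to_tuple(10)              # scalar mirrored around zero -> (-10, 10)
>>> to_tuple(10, low=3)       # explicit second element -> (3, 10)
>>> to_tuple((1, 5), bias=2)  # bias shifts both ends -> (3, 7)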

albumentations.pytorch special

albumentations.pytorch.transforms

class albumentations.pytorch.transforms.ToTensorV2 (transpose_mask=False, always_apply=True, p=1.0) [view source on GitHub]

Converts images/masks to PyTorch Tensors, inheriting from BasicTransform. Supports images in numpy HWC format and converts them to PyTorch CHW format. If the image is in HW format, it will be converted to PyTorch HW.

Attributes
transpose_mask (bool): If True, transposes 3D input mask dimensions from `[height, width, num_channels]` to
    `[num_channels, height, width]`.
always_apply (bool): Indicates if this transformation should be always applied. Default: True.
p (float): Probability of applying the transform. Default: 1.0.
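
A minimal sketch: an HWC uint8 numpy image comes out as a CHW `torch.Tensor`:

>>> import albumentations as A
>>> import numpy as np
>>> from albumentations.pytorch import ToTensorV2
>>> transform = A.Compose([A.Normalize(), ToTensorV2()])
>>> out = transform(image=np.zeros((224, 224, 3), dtype=np.uint8))
>>> out['image'].shape  # -> torch.Size([3, 224, 224])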