
Full API Reference on a single page

Pixel-level transforms

Here is a list of all available pixel-level transforms. You can apply a pixel-level transform to any target: under the hood, the transform changes only the input image and returns all other targets, such as masks, bounding boxes, or keypoints, unchanged. A usage example follows.
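
For illustration, a minimal sketch (with synthetic data) using RandomBrightnessContrast, one of the library's pixel-level transforms, to show that non-image targets pass through unchanged:

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)

# RandomBrightnessContrast is pixel-level: only the image is modified.
transform = A.RandomBrightnessContrast(p=1.0)
result = transform(image=image, mask=mask)
assert np.array_equal(result["mask"], mask)  # the mask passes through unchanged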

Spatial-level transforms

Here is a list of spatial-level transforms. Each spatial-level transform supports only a subset of targets (image, mask, bboxes, keypoints, global label). If you try to apply a spatial-level transform to an unsupported target, Albumentations will raise an error. A usage example with bounding boxes follows the list.

Affine
BBoxSafeRandomCrop
CenterCrop
CoarseDropout
Crop
CropAndPad
CropNonEmptyMaskIfExists
ElasticTransform
Flip
GridDistortion
GridDropout
HorizontalFlip
Lambda
LongestMaxSize
MaskDropout
MixUp
NoOp
OpticalDistortion
PadIfNeeded
Perspective
PiecewiseAffine
PixelDropout
RandomCrop
RandomCropFromBorders
RandomGridShuffle
RandomResizedCrop
RandomRotate90
RandomScale
RandomSizedBBoxSafeCrop
RandomSizedCrop
Resize
Rotate
SafeRotate
ShiftScaleRotate
SmallestMaxSize
Transpose
VerticalFlip
XYMasking
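
A minimal sketch (with synthetic data) of a spatial-level transform applied together with bounding boxes; the transform updates the boxes along with the image:

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
bboxes = [(32, 32, 128, 128)]  # pascal_voc format: (x_min, y_min, x_max, y_max)

transform = A.Compose(
    [A.HorizontalFlip(p=1.0)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)
result = transform(image=image, bboxes=bboxes, labels=["cat"])
# The bounding box is flipped together with the image.
print(result["bboxes"])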

augmentations

blur

transforms

class AdvancedBlur (blur_limit=(3, 7), sigma_x_limit=(0.2, 1.0), sigma_y_limit=(0.2, 1.0), sigmaX_limit=None, sigmaY_limit=None, rotate_limit=90, beta_limit=(0.5, 8.0), noise_limit=(0.9, 1.1), always_apply=False, p=0.5) [view source on GitHub]

Blurs the input image using a Generalized Normal filter with randomly selected parameters.

This transform also adds multiplicative noise to the generated kernel before convolution, affecting the image in a unique way that combines blurring and noise injection for enhanced data augmentation.

Parameters:

Name Type Description
blur_limit ScaleIntType

Maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1. If a single value is provided, blur_limit will be in the range (0, blur_limit). Defaults to (3, 7).

sigma_x_limit ScaleFloatType

Gaussian kernel standard deviation for the X dimension. Must be in range [0, inf). If a single value is provided, sigma_x_limit will be in the range (0, sigma_x_limit). If set to 0, sigma will be computed as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. Defaults to (0.2, 1.0).

sigma_y_limit ScaleFloatType

Gaussian kernel standard deviation for the Y dimension. Must follow the same rules as sigma_x_limit. Defaults to (0.2, 1.0).

rotate_limit ScaleIntType

Range from which a random angle used to rotate the Gaussian kernel is picked. If limit is a single int, an angle is picked from (-rotate_limit, rotate_limit). Defaults to (-90, 90).

beta_limit ScaleFloatType

Distribution shape parameter. 1 represents the normal distribution. Values below 1.0 make distribution tails heavier than normal, and values above 1.0 make it lighter than normal. Defaults to (0.5, 8.0).

noise_limit ScaleFloatType

Multiplicative factor that controls the strength of kernel noise. Must be positive and preferably centered around 1.0. If a single value is provided, noise_limit will be in the range (0, noise_limit). Defaults to (0.9, 1.1).

p float

Probability of applying the transform. Defaults to 0.5.

Reference

"Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data", available at https://arxiv.org/abs/2107.10833

Targets

image

Image types: uint8, float32
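
A minimal usage sketch (synthetic input; parameters are the documented defaults written out explicitly):

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

transform = A.AdvancedBlur(
    blur_limit=(3, 7),
    sigma_x_limit=(0.2, 1.0),
    sigma_y_limit=(0.2, 1.0),
    rotate_limit=90,
    beta_limit=(0.5, 8.0),
    noise_limit=(0.9, 1.1),
    p=1.0,
)
blurred = transform(image=image)["image"]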

Source code in albumentations/augmentations/blur/transforms.py
Python
class AdvancedBlur(ImageOnlyTransform):
    """Blurs the input image using a Generalized Normal filter with randomly selected parameters.

    This transform also adds multiplicative noise to the generated kernel before convolution,
    affecting the image in a unique way that combines blurring and noise injection for enhanced
    data augmentation.

    Args:
        blur_limit (ScaleIntType, optional): Maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If a single value is provided, `blur_limit` will be in the range (0, blur_limit).
            Defaults to (3, 7).
        sigma_x_limit ScaleFloatType: Gaussian kernel standard deviation for the X dimension.
            Must be in range [0, inf). If a single value is provided, `sigma_x_limit` will be in the range
            (0, sigma_limit). If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`.
            Defaults to (0.2, 1.0).
        sigma_y_limit ScaleFloatType: Gaussian kernel standard deviation for the Y dimension.
            Must follow the same rules as `sigma_x_limit`.
            Defaults to (0.2, 1.0).
        rotate_limit (ScaleIntType, optional): Range from which a random angle used to rotate the Gaussian kernel
            is picked. If limit is a single int, an angle is picked from (-rotate_limit, rotate_limit).
            Defaults to (-90, 90).
        beta_limit (ScaleFloatType, optional): Distribution shape parameter. 1 represents the normal distribution.
            Values below 1.0 make distribution tails heavier than normal, and values above 1.0 make it
            lighter than normal.
            Defaults to (0.5, 8.0).
        noise_limit (ScaleFloatType, optional): Multiplicative factor that controls the strength of kernel noise.
            Must be positive and preferably centered around 1.0. If a single value is provided,
            `noise_limit` will be in the range (0, noise_limit).
            Defaults to (0.9, 1.1).
        p (float, optional): Probability of applying the transform.
            Defaults to 0.5.

    Reference:
        "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data",
        available at https://arxiv.org/abs/2107.10833

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_x_limit: ScaleFloatType = (0.2, 1.0),
        sigma_y_limit: ScaleFloatType = (0.2, 1.0),
        sigmaX_limit: Optional[ScaleFloatType] = None,  # noqa: N803
        sigmaY_limit: Optional[ScaleFloatType] = None,  # noqa: N803
        rotate_limit: ScaleIntType = 90,
        beta_limit: ScaleFloatType = (0.5, 8.0),
        noise_limit: ScaleFloatType = (0.9, 1.1),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.blur_limit = cast(Tuple[int, int], to_tuple(blur_limit, 3))

        # Handle deprecation of sigmaX_limit and sigmaY_limit
        if sigmaX_limit is not None:
            warnings.warn("sigmaX_limit is deprecated; use sigma_x_limit instead.", DeprecationWarning)
            sigma_x_limit = sigmaX_limit

        if sigmaY_limit is not None:
            warnings.warn("sigmaY_limit is deprecated; use sigma_y_limit instead.", DeprecationWarning)
            sigma_y_limit = sigmaY_limit

        self.sigma_x_limit = self.__check_values(to_tuple(sigma_x_limit, 0.0), name="sigma_x_limit")
        self.sigma_y_limit = self.__check_values(to_tuple(sigma_y_limit, 0.0), name="sigma_y_limit")
        self.rotate_limit = to_tuple(rotate_limit)
        self.beta_limit = to_tuple(beta_limit, low=0.0)
        self.noise_limit = self.__check_values(to_tuple(noise_limit, 0.0), name="noise_limit")

        if (self.blur_limit[0] != 0 and self.blur_limit[0] % 2 != 1) or (
            self.blur_limit[1] != 0 and self.blur_limit[1] % 2 != 1
        ):
            msg = "AdvancedBlur supports only odd blur limits."
            raise ValueError(msg)

        if self.sigma_x_limit[0] == 0 and self.sigma_y_limit[0] == 0:
            msg = "sigma_x_limit and sigma_y_limit minimum value cannot be both equal to 0."
            raise ValueError(msg)

        if not (self.beta_limit[0] < 1.0 < self.beta_limit[1]):
            msg = "Beta limit is expected to include 1.0."
            raise ValueError(msg)

    @staticmethod
    def __check_values(
        value: Sequence[float], name: str, bounds: Tuple[float, float] = (0, float("inf"))
    ) -> Sequence[float]:
        if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
            raise ValueError(f"{name} values should be between {bounds}")
        return value

    def apply(self, img: np.ndarray, kernel: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
        return FMain.convolve(img, kernel=kernel)

    def get_params(self) -> Dict[str, np.ndarray]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
        sigma_x = random.uniform(*self.sigma_x_limit)
        sigma_y = random.uniform(*self.sigma_y_limit)
        angle = np.deg2rad(random.uniform(*self.rotate_limit))

        # Split into 2 cases to avoid selection of narrow kernels (beta > 1) too often.
        beta = (
            random.uniform(self.beta_limit[0], 1) if random.random() < HALF else random.uniform(1, self.beta_limit[1])
        )

        noise_matrix = random_utils.uniform(self.noise_limit[0], self.noise_limit[1], size=[ksize, ksize])

        # Generate mesh grid centered at zero.
        ax = np.arange(-ksize // 2 + 1.0, ksize // 2 + 1.0)
        # > Shape (ksize, ksize, 2)
        grid = np.stack(np.meshgrid(ax, ax), axis=-1)

        # Calculate rotated sigma matrix
        d_matrix = np.array([[sigma_x**2, 0], [0, sigma_y**2]])
        u_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
        sigma_matrix = np.dot(u_matrix, np.dot(d_matrix, u_matrix.T))

        inverse_sigma = np.linalg.inv(sigma_matrix)
        # Described in "Parameter Estimation For Multivariate Generalized Gaussian Distributions"
        kernel = np.exp(-0.5 * np.power(np.sum(np.dot(grid, inverse_sigma) * grid, 2), beta))
        # Add noise
        kernel *= noise_matrix

        # Normalize kernel
        kernel = kernel.astype(np.float32) / np.sum(kernel)
        return {"kernel": kernel}

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str, str]:
        return (
            "blur_limit",
            "sigma_x_limit",
            "sigma_y_limit",
            "rotate_limit",
            "beta_limit",
            "noise_limit",
        )
class Blur (blur_limit=7, always_apply=False, p=0.5) [view source on GitHub]

Blur the input image using a random-sized kernel.

Parameters:

Name Type Description
blur_limit Union[int, Tuple[int, int]]

maximum kernel size for blurring the input image. Should be in range [3, inf). Default: (3, 7).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32
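
A minimal usage sketch (synthetic input); the kernel size is sampled from the odd values in [3, 7]:

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
transform = A.Blur(blur_limit=(3, 7), p=1.0)
blurred = transform(image=image)["image"]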

Source code in albumentations/augmentations/blur/transforms.py
Python
class Blur(ImageOnlyTransform):
    """Blur the input image using a random-sized kernel.

    Args:
        blur_limit: maximum kernel size for blurring the input image.
            Should be in range [3, inf). Default: (3, 7).
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(self, blur_limit: ScaleIntType = 7, always_apply: bool = False, p: float = 0.5):
        super().__init__(always_apply, p)
        self.blur_limit = cast(Tuple[int, int], to_tuple(blur_limit, 3))

    def apply(self, img: np.ndarray, kernel: int = 3, **params: Any) -> np.ndarray:
        return F.blur(img, kernel)

    def get_params(self) -> Dict[str, Any]:
        return {"ksize": int(random.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2))))}

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("blur_limit",)
class Defocus (radius=(3, 10), alias_blur=(0.1, 0.5), always_apply=False, p=0.5) [view source on GitHub]

Apply defocus transform. See https://arxiv.org/abs/1903.12261.

Parameters:

Name Type Description
radius (int, int) or int

range for radius of defocusing. If limit is a single int, the range will be [1, limit]. Default: (3, 10).

alias_blur (float, float) or float

range for alias_blur of defocusing (sigma of gaussian blur). If limit is a single float, the range will be (0, limit). Default: (0.1, 0.5).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: Any
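
A minimal usage sketch (synthetic input):

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
# radius is sampled from [3, 10]; alias_blur (the sigma of the aliasing Gaussian)
# is sampled from [0.1, 0.5].
transform = A.Defocus(radius=(3, 10), alias_blur=(0.1, 0.5), p=1.0)
defocused = transform(image=image)["image"]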

Source code in albumentations/augmentations/blur/transforms.py
Python
class Defocus(ImageOnlyTransform):
    """Apply defocus transform. See https://arxiv.org/abs/1903.12261.

    Args:
        radius ((int, int) or int): range for radius of defocusing.
            If limit is a single int, the range will be [1, limit]. Default: (3, 10).
        alias_blur ((float, float) or float): range for alias_blur of defocusing (sigma of gaussian blur).
            If limit is a single float, the range will be (0, limit). Default: (0.1, 0.5).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        Any

    """

    def __init__(
        self,
        radius: ScaleIntType = (3, 10),
        alias_blur: ScaleFloatType = (0.1, 0.5),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.radius = to_tuple(radius, low=1)
        self.alias_blur = to_tuple(alias_blur, low=0)

        if self.radius[0] <= 0:
            msg = "Parameter radius must be positive"
            raise ValueError(msg)

        if self.alias_blur[0] < 0:
            msg = "Parameter alias_blur must be non-negative"
            raise ValueError(msg)

    def apply(self, img: np.ndarray, radius: int = 3, alias_blur: float = 0.5, **params: Any) -> np.ndarray:
        return F.defocus(img, radius, alias_blur)

    def get_params(self) -> Dict[str, Any]:
        return {
            "radius": random_utils.randint(self.radius[0], self.radius[1] + 1),
            "alias_blur": random_utils.uniform(self.alias_blur[0], self.alias_blur[1]),
        }

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("radius", "alias_blur")
class GaussianBlur (blur_limit=(3, 7), sigma_limit=0, always_apply=False, p=0.5) [view source on GitHub]

Blur the input image using a Gaussian filter with a random kernel size.

Parameters:

Name Type Description
blur_limit int or (int, int)

maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1. If a single value is provided, blur_limit will be in the range (0, blur_limit). Default: (3, 7).

sigma_limit float or (float, float)

Gaussian kernel standard deviation. Must be in range [0, inf). If a single value is provided, sigma_limit will be in the range (0, sigma_limit). If set to 0, sigma will be computed as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. Default: 0.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32
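
Two minimal configuration sketches: one samples the kernel size and derives sigma from it, the other fixes blur_limit=0 so the kernel size is computed from the sampled sigma, as described above:

Python
import albumentations as A

# Kernel size sampled from the odd values in [3, 7]; sigma derived from ksize.
blur_by_ksize = A.GaussianBlur(blur_limit=(3, 7), sigma_limit=0, p=1.0)

# blur_limit=0: the kernel size is computed from the sampled sigma.
blur_by_sigma = A.GaussianBlur(blur_limit=0, sigma_limit=(0.5, 3.0), p=1.0)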

Source code in albumentations/augmentations/blur/transforms.py
Python
class GaussianBlur(ImageOnlyTransform):
    """Blur the input image using a Gaussian filter with a random kernel size.

    Args:
        blur_limit (int, (int, int)): maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If set single value `blur_limit` will be in range (0, blur_limit).
            Default: (3, 7).
        sigma_limit (float, (float, float)): Gaussian kernel standard deviation. Must be in range [0, inf).
            If set single value `sigma_limit` will be in range (0, sigma_limit).
            If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_limit: ScaleFloatType = 0,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.blur_limit = cast(Tuple[int, int], to_tuple(blur_limit, 0))
        self.sigma_limit = to_tuple(sigma_limit if sigma_limit is not None else 0, 0)

        if self.blur_limit[0] == 0 and self.sigma_limit[0] == 0:
            self.blur_limit = 3, max(3, self.blur_limit[1])
            warnings.warn(
                "blur_limit and sigma_limit minimum value can not be both equal to 0. "
                "blur_limit minimum value changed to 3."
            )

        if (self.blur_limit[0] != 0 and self.blur_limit[0] % 2 != 1) or (
            self.blur_limit[1] != 0 and self.blur_limit[1] % 2 != 1
        ):
            msg = "GaussianBlur supports only odd blur limits."
            raise ValueError(msg)

    def apply(self, img: np.ndarray, ksize: int = 3, sigma: float = 0, **params: Any) -> np.ndarray:
        return F.gaussian_blur(img, ksize, sigma=sigma)

    def get_params(self) -> Dict[str, float]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1)
        if ksize != 0 and ksize % 2 != 1:
            ksize = (ksize + 1) % (self.blur_limit[1] + 1)

        return {"ksize": ksize, "sigma": random.uniform(*self.sigma_limit)}

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("blur_limit", "sigma_limit")
class GlassBlur (sigma=0.7, max_delta=4, iterations=2, always_apply=False, mode='fast', p=0.5) [view source on GitHub]

Apply glass noise to the input image.

Parameters:

Name Type Description
sigma float

standard deviation for Gaussian kernel.

max_delta int

max distance between pixels which are swapped.

iterations int

number of repeats. Should be in range [1, inf). Default: 2.

mode str

mode of computation: fast or exact. Default: "fast".

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Reference: https://arxiv.org/abs/1903.12261 and https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

Source code in albumentations/augmentations/blur/transforms.py
Python
class GlassBlur(Blur):
    """Apply glass noise to the input image.

    Args:
        sigma (float): standard deviation for Gaussian kernel.
        max_delta (int): max distance between pixels which are swapped.
        iterations (int): number of repeats.
            Should be in range [1, inf). Default: (2).
        mode (str): mode of computation: fast or exact. Default: "fast".
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
    |  https://arxiv.org/abs/1903.12261
    |  https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

    """

    def __init__(
        self,
        sigma: float = 0.7,
        max_delta: int = 4,
        iterations: int = 2,
        always_apply: bool = False,
        mode: str = "fast",
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        if iterations < 1:
            raise ValueError(f"Iterations should be more or equal to 1, but we got {iterations}")

        if mode not in ["fast", "exact"]:
            raise ValueError(f"Mode should be 'fast' or 'exact', but we got {mode}")

        self.sigma = sigma
        self.max_delta = max_delta
        self.iterations = iterations
        self.mode = mode

    def apply(self, img: np.ndarray, *args: Any, dxy: np.ndarray = None, **params: Any) -> np.ndarray:
        if dxy is None:
            msg = "dxy is None"
            raise ValueError(msg)

        return F.glass_blur(img, self.sigma, self.max_delta, self.iterations, dxy, self.mode)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, np.ndarray]:
        img = params["image"]

        # generate array containing all necessary values for transformations
        width_pixels = img.shape[0] - self.max_delta * 2
        height_pixels = img.shape[1] - self.max_delta * 2
        total_pixels = int(width_pixels * height_pixels)
        dxy = random_utils.randint(-self.max_delta, self.max_delta, size=(total_pixels, self.iterations, 2))

        return {"dxy": dxy}

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return ("sigma", "max_delta", "iterations", "mode")

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]
class MedianBlur (blur_limit=7, always_apply=False, p=0.5) [view source on GitHub]

Blur the input image using a median filter with a random aperture linear size.

Parameters:

Name Type Description
blur_limit int

maximum aperture linear size for blurring the input image. Must be odd and in range [3, inf). Default: (3, 7).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/blur/transforms.py
Python
class MedianBlur(Blur):
    """Blur the input image using a median filter with a random aperture linear size.

    Args:
        blur_limit (int): maximum aperture linear size for blurring the input image.
            Must be odd and in range [3, inf). Default: (3, 7).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(self, blur_limit: ScaleIntType = 7, always_apply: bool = False, p: float = 0.5):
        super().__init__(blur_limit, always_apply, p)

        if self.blur_limit[0] % 2 != 1 or self.blur_limit[1] % 2 != 1:
            msg = "MedianBlur supports only odd blur limits."
            raise ValueError(msg)

    def apply(self, img: np.ndarray, kernel: int = 3, **params: Any) -> np.ndarray:
        return F.median_blur(img, kernel)
class MotionBlur (blur_limit=7, allow_shifted=True, always_apply=False, p=0.5) [view source on GitHub]

Apply motion blur to the input image using a random-sized kernel.

Parameters:

Name Type Description
blur_limit int

maximum kernel size for blurring the input image. Should be in range [3, inf). Default: (3, 7).

allow_shifted bool

if set to True, the kernel may be randomly shifted from the center; if set to False, only centered (non-shifted) kernels are generated, which requires odd blur limits. Default: True.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32
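
A minimal usage sketch (synthetic input); with allow_shifted=False the kernel line is centered, which requires odd blur limits:

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
transform = A.MotionBlur(blur_limit=(3, 7), allow_shifted=False, p=1.0)
blurred = transform(image=image)["image"]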

Source code in albumentations/augmentations/blur/transforms.py
Python
class MotionBlur(Blur):
    """Apply motion blur to the input image using a random-sized kernel.

    Args:
        blur_limit (int): maximum kernel size for blurring the input image.
            Should be in range [3, inf). Default: (3, 7).
        allow_shifted (bool): if set to True, the kernel may be randomly shifted from the center;
            if set to False, only centered (non-shifted) kernels are generated. Default: True.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        blur_limit: ScaleIntType = 7,
        allow_shifted: bool = True,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(blur_limit=blur_limit, always_apply=always_apply, p=p)
        self.allow_shifted = allow_shifted

        if not allow_shifted and self.blur_limit[0] % 2 != 1 or self.blur_limit[1] % 2 != 1:
            raise ValueError(f"Blur limit must be odd when centered=True. Got: {self.blur_limit}")

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (*super().get_transform_init_args_names(), "allow_shifted")

    def apply(self, img: np.ndarray, kernel: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
        return FMain.convolve(img, kernel=kernel)

    def get_params(self) -> Dict[str, Any]:
        ksize = random.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))
        if ksize <= TWO:
            raise ValueError(f"ksize must be > 2. Got: {ksize}")
        kernel = np.zeros((ksize, ksize), dtype=np.uint8)
        x1, x2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)
        if x1 == x2:
            y1, y2 = random.sample(range(ksize), 2)
        else:
            y1, y2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)

        def make_odd_val(v1: int, v2: int) -> Tuple[int, int]:
            len_v = abs(v1 - v2) + 1
            if len_v % 2 != 1:
                if v2 > v1:
                    v2 -= 1
                else:
                    v1 -= 1
            return v1, v2

        if not self.allow_shifted:
            x1, x2 = make_odd_val(x1, x2)
            y1, y2 = make_odd_val(y1, y2)

            xc = (x1 + x2) / 2
            yc = (y1 + y2) / 2

            center = ksize / 2 - 0.5
            dx = xc - center
            dy = yc - center
            x1, x2 = (int(i - dx) for i in [x1, x2])
            y1, y2 = (int(i - dy) for i in [y1, y2])

        cv2.line(kernel, (x1, y1), (x2, y2), 1, thickness=1)

        # Normalize kernel
        return {"kernel": kernel.astype(np.float32) / np.sum(kernel)}
class ZoomBlur (max_factor=1.31, step_factor=(0.01, 0.03), always_apply=False, p=0.5) [view source on GitHub]

Apply zoom blur transform. See https://arxiv.org/abs/1903.12261.

Parameters:

Name Type Description
max_factor (float, float) or float

range for the max blurring factor. If max_factor is a single float, the range will be (1, max_factor). Default: (1, 1.31). All max_factor values should be larger than 1.

step_factor (float, float) or float

If a single float, it is used as the step parameter for np.arange. If a tuple of floats, the step will be sampled from [step_factor[0], step_factor[1]). Default: (0.01, 0.03). All step_factor values should be positive.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: Any
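
A minimal usage sketch (synthetic input). As the source below shows, the averaged zoom factors are np.arange(1.0, max_factor, step_factor) with both values sampled from the configured ranges:

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
transform = A.ZoomBlur(max_factor=(1.05, 1.31), step_factor=(0.01, 0.03), p=1.0)
blurred = transform(image=image)["image"]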

Source code in albumentations/augmentations/blur/transforms.py
Python
class ZoomBlur(ImageOnlyTransform):
    """Apply zoom blur transform. See https://arxiv.org/abs/1903.12261.

    Args:
        max_factor ((float, float) or float): range for max factor for blurring.
            If max_factor is a single float, the range will be (1, limit). Default: (1, 1.31).
            All max_factor values should be larger than 1.
        step_factor ((float, float) or float): If single float will be used as step parameter for np.arange.
            If tuple of float step_factor will be in range `[step_factor[0], step_factor[1])`. Default: (0.01, 0.03).
            All step_factor values should be positive.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        Any

    """

    def __init__(
        self,
        max_factor: ScaleFloatType = 1.31,
        step_factor: ScaleFloatType = (0.01, 0.03),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.max_factor = to_tuple(max_factor, low=1.0)
        self.step_factor = to_tuple(step_factor, step_factor)

        if self.max_factor[0] < 1:
            msg = "Max factor must be larger or equal 1"
            raise ValueError(msg)
        if self.step_factor[0] <= 0:
            msg = "Step factor must be positive"
            raise ValueError(msg)

    def apply(self, img: np.ndarray, zoom_factors: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
        if zoom_factors is None:
            msg = "zoom_factors is None"
            raise ValueError(msg)

        return F.zoom_blur(img, zoom_factors)

    def get_params(self) -> Dict[str, Any]:
        max_factor = random.uniform(self.max_factor[0], self.max_factor[1])
        step_factor = random.uniform(self.step_factor[0], self.step_factor[1])
        return {"zoom_factors": np.arange(1.0, max_factor, step_factor)}

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("max_factor", "step_factor")

crops

functional

def bbox_crop (bbox, x_min, y_min, x_max, y_max, rows, cols) [view source on GitHub]

Crop a bounding box.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

x_min int

Crop x_min coordinate in pixels.

y_min int

Crop y_min coordinate in pixels.

x_max int

Crop x_max coordinate in pixels.

y_max int

Crop y_max coordinate in pixels.

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A cropped bounding box (x_min, y_min, x_max, y_max).
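
A worked sketch, assuming the normalized-bbox convention used by denormalize_bbox/normalize_bbox in the source below (coordinates in [0, 1] relative to the image, then to the crop):

Python
from albumentations.augmentations.crops.functional import bbox_crop

# A normalized bbox on a 100x100 image, i.e. pixel coordinates (20, 20, 50, 50).
bbox = (0.2, 0.2, 0.5, 0.5)

# Crop the pixel region (10, 10)-(60, 60): the bbox is shifted by the crop origin
# and re-normalized to the 50x50 crop.
cropped = bbox_crop(bbox, x_min=10, y_min=10, x_max=60, y_max=60, rows=100, cols=100)
# cropped == (0.2, 0.2, 0.8, 0.8)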

Source code in albumentations/augmentations/crops/functional.py
Python
def bbox_crop(
    bbox: BoxInternalType, x_min: int, y_min: int, x_max: int, y_max: int, rows: int, cols: int
) -> BoxInternalType:
    """Crop a bounding box.

    Args:
        bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
        x_min:
        y_min:
        x_max:
        y_max:
        rows: Image rows.
        cols: Image cols.

    Returns:
        A cropped bounding box `(x_min, y_min, x_max, y_max)`.

    """
    crop_coords = x_min, y_min, x_max, y_max
    crop_height = y_max - y_min
    crop_width = x_max - x_min
    return crop_bbox_by_coords(bbox, crop_coords, crop_height, crop_width, rows, cols)
def crop_bbox_by_coords (bbox, crop_coords, crop_height, crop_width, rows, cols) [view source on GitHub]

Crop a bounding box using the provided coordinates of the top-left and bottom-right corners in pixels and the required height and width of the crop.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A cropped box (x_min, y_min, x_max, y_max).

crop_coords Tuple[int, int, int, int]

Crop coordinates (x1, y1, x2, y2).

crop_height int

Height of the crop in pixels.

crop_width int

Width of the crop in pixels.

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A cropped bounding box (x_min, y_min, x_max, y_max).

Source code in albumentations/augmentations/crops/functional.py
Python
def crop_bbox_by_coords(
    bbox: BoxInternalType,
    crop_coords: Tuple[int, int, int, int],
    crop_height: int,
    crop_width: int,
    rows: int,
    cols: int,
) -> BoxInternalType:
    """Crop a bounding box using the provided coordinates of bottom-left and top-right corners in pixels and the
    required height and width of the crop.

    Args:
        bbox: A cropped box `(x_min, y_min, x_max, y_max)`.
        crop_coords: Crop coordinates `(x1, y1, x2, y2)`.
        crop_height:
        crop_width:
        rows: Image rows.
        cols: Image cols.

    Returns:
        A cropped bounding box `(x_min, y_min, x_max, y_max)`.

    """
    normalized_bbox = denormalize_bbox(bbox, rows, cols)
    x_min, y_min, x_max, y_max = normalized_bbox[:4]
    x1, y1 = crop_coords[:2]
    cropped_bbox = x_min - x1, y_min - y1, x_max - x1, y_max - y1
    return cast(BoxInternalType, normalize_bbox(cropped_bbox, crop_height, crop_width))
def crop_keypoint_by_coords (keypoint, crop_coords) [view source on GitHub]

Crop a keypoint using the provided coordinates of the top-left and bottom-right corners in pixels and the required height and width of the crop.

Parameters:

Name Type Description
keypoint tuple

A keypoint (x, y, angle, scale).

crop_coords tuple

Crop box coords (x1, y1, x2, y2).

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Source code in albumentations/augmentations/crops/functional.py
Python
def crop_keypoint_by_coords(
    keypoint: KeypointInternalType, crop_coords: Tuple[int, int, int, int]
) -> KeypointInternalType:
    """Crop a keypoint using the provided coordinates of bottom-left and top-right corners in pixels and the
    required height and width of the crop.

    Args:
        keypoint (tuple): A keypoint `(x, y, angle, scale)`.
        crop_coords (tuple): Crop box coords `(x1, y1, x2, y2)`.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    """
    x, y, angle, scale = keypoint[:4]
    x1, y1 = crop_coords[:2]
    return x - x1, y - y1, angle, scale
def keypoint_center_crop (keypoint, crop_height, crop_width, rows, cols) [view source on GitHub]

Keypoint center crop.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

crop_height int

Crop height.

crop_width int

Crop width.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Source code in albumentations/augmentations/crops/functional.py
Python
def keypoint_center_crop(
    keypoint: KeypointInternalType, crop_height: int, crop_width: int, rows: int, cols: int
) -> KeypointInternalType:
    """Keypoint center crop.

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.
        crop_height: Crop height.
        crop_width: Crop width.
        rows: Image height.
        cols: Image width.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    """
    crop_coords = get_center_crop_coords(rows, cols, crop_height, crop_width)
    return crop_keypoint_by_coords(keypoint, crop_coords)
def keypoint_random_crop (keypoint, crop_height, crop_width, h_start, w_start, rows, cols) [view source on GitHub]

Keypoint random crop.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

crop_height int

Crop height.

crop_width int

Crop width.

h_start int

Crop height start.

w_start int

Crop width start.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Source code in albumentations/augmentations/crops/functional.py
Python
def keypoint_random_crop(
    keypoint: KeypointInternalType,
    crop_height: int,
    crop_width: int,
    h_start: float,
    w_start: float,
    rows: int,
    cols: int,
) -> KeypointInternalType:
    """Keypoint random crop.

    Args:
        keypoint: (tuple): A keypoint `(x, y, angle, scale)`.
        crop_height (int): Crop height.
        crop_width (int): Crop width.
        h_start (int): Crop height start.
        w_start (int): Crop width start.
        rows (int): Image height.
        cols (int): Image width.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    """
    crop_coords = get_random_crop_coords(rows, cols, crop_height, crop_width, h_start, w_start)
    return crop_keypoint_by_coords(keypoint, crop_coords)

transforms

class BBoxSafeRandomCrop (erosion_rate=0.0, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input without loss of bboxes.

Parameters:

Name Type Description
erosion_rate float

erosion rate applied on input image height before crop.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes

Image types: uint8, float32
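
A minimal usage sketch (synthetic data); with erosion_rate=0.0 every input box survives the crop intact:

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8)
bboxes = [(50, 60, 200, 220)]  # pascal_voc format

transform = A.Compose(
    [A.BBoxSafeRandomCrop(erosion_rate=0.0, p=1.0)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)
result = transform(image=image, bboxes=bboxes, labels=[1])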

Source code in albumentations/augmentations/crops/transforms.py
Python
class BBoxSafeRandomCrop(DualTransform):
    """Crop a random part of the input without loss of bboxes.

    Args:
        erosion_rate: erosion rate applied on input image height before crop.
        p: probability of applying the transform. Default: 1.
    Targets:
        image, mask, bboxes
    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)

    def __init__(self, erosion_rate: float = 0.0, always_apply: bool = False, p: float = 1.0):
        super().__init__(always_apply, p)
        self.erosion_rate = erosion_rate

    def apply(
        self,
        img: np.ndarray,
        crop_height: int = 0,
        crop_width: int = 0,
        h_start: int = 0,
        w_start: int = 0,
        **params: Any,
    ) -> np.ndarray:
        return F.random_crop(img, crop_height, crop_width, h_start, w_start)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Union[int, float]]:
        img_h, img_w = params["image"].shape[:2]
        if len(params["bboxes"]) == 0:  # less likely, this class is for use with bboxes.
            erosive_h = int(img_h * (1.0 - self.erosion_rate))
            crop_height = img_h if erosive_h >= img_h else random.randint(erosive_h, img_h)
            return {
                "h_start": random.random(),
                "w_start": random.random(),
                "crop_height": crop_height,
                "crop_width": int(crop_height * img_w / img_h),
            }
        # get union of all bboxes
        x, y, x2, y2 = union_of_bboxes(
            width=img_w, height=img_h, bboxes=params["bboxes"], erosion_rate=self.erosion_rate
        )
        # find bigger region
        bx, by = x * random.random(), y * random.random()
        bx2, by2 = x2 + (1 - x2) * random.random(), y2 + (1 - y2) * random.random()
        bw, bh = bx2 - bx, by2 - by
        crop_height = img_h if bh >= 1.0 else int(img_h * bh)
        crop_width = img_w if bw >= 1.0 else int(img_w * bw)
        h_start = np.clip(0.0 if bh >= 1.0 else by / (1.0 - bh), 0.0, 1.0)
        w_start = np.clip(0.0 if bw >= 1.0 else bx / (1.0 - bw), 0.0, 1.0)
        return {"h_start": h_start, "w_start": w_start, "crop_height": crop_height, "crop_width": crop_width}

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        crop_height: int = 0,
        crop_width: int = 0,
        h_start: int = 0,
        w_start: int = 0,
        rows: int = 0,
        cols: int = 0,
        **params: Any,
    ) -> BoxInternalType:
        return F.bbox_random_crop(bbox, crop_height, crop_width, h_start, w_start, rows, cols)

    @property
    def targets_as_params(self) -> List[str]:
        return ["image", "bboxes"]

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("erosion_rate",)
class CenterCrop (height, width, always_apply=False, p=1.0) [view source on GitHub]

Crop the central part of the input.

Parameters:

Name Type Description
height int

height of the crop.

width int

width of the crop.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

It is recommended to use uint8 images as input. Otherwise the operation will require internal conversion float32 -> uint8 -> float32 that causes worse performance.

Source code in albumentations/augmentations/crops/transforms.py
Python
class CenterCrop(DualTransform):
    """Crop the central part of the input.

    Args:
        height: height of the crop.
        width: width of the crop.
        p: probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        It is recommended to use uint8 images as input.
        Otherwise the operation will require internal conversion
        float32 -> uint8 -> float32 that causes worse performance.

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(self, height: int, width: int, always_apply: bool = False, p: float = 1.0):
        super().__init__(always_apply, p)
        self.height = height
        self.width = width

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return F.center_crop(img, self.height, self.width)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return F.bbox_center_crop(bbox, self.height, self.width, **params)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return F.keypoint_center_crop(keypoint, self.height, self.width, **params)

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("height", "width")
class Crop (x_min=0, y_min=0, x_max=1024, y_max=1024, always_apply=False, p=1.0) [view source on GitHub]

Crop region from image.

Parameters:

Name Type Description
x_min int

Minimum upper left x coordinate.

y_min int

Minimum upper left y coordinate.

x_max int

Maximum lower right x coordinate.

y_max int

Maximum lower right y coordinate.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
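
A minimal usage sketch (synthetic input):

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (1024, 1024, 3), dtype=np.uint8)
transform = A.Crop(x_min=0, y_min=0, x_max=512, y_max=512, p=1.0)
cropped = transform(image=image)["image"]  # shape (512, 512, 3)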

Source code in albumentations/augmentations/crops/transforms.py
Python
class Crop(DualTransform):
    """Crop region from image.

    Args:
        x_min: Minimum upper left x coordinate.
        y_min: Minimum upper left y coordinate.
        x_max: Maximum lower right x coordinate.
        y_max: Maximum lower right y coordinate.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        x_min: int = 0,
        y_min: int = 0,
        x_max: int = 1024,
        y_max: int = 1024,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(always_apply, p)
        self.x_min = x_min
        self.y_min = y_min
        self.x_max = x_max
        self.y_max = y_max

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return F.crop(img, x_min=self.x_min, y_min=self.y_min, x_max=self.x_max, y_max=self.y_max)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return F.bbox_crop(bbox, x_min=self.x_min, y_min=self.y_min, x_max=self.x_max, y_max=self.y_max, **params)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return F.crop_keypoint_by_coords(keypoint, crop_coords=(self.x_min, self.y_min, self.x_max, self.y_max))

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return ("x_min", "y_min", "x_max", "y_max")
class CropAndPad (px=None, percent=None, pad_mode=0, pad_cval=0, pad_cval_mask=0, keep_size=True, sample_independently=True, interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Crop and pad images by pixel amounts or fractions of image sizes. Cropping removes pixels at the sides (i.e. extracts a subimage from a given full image). Padding adds pixels to the sides (e.g. black pixels). This transformation will never crop images below a height or width of 1.

Note

This transformation automatically resizes images back to their original size. To deactivate this, add the parameter keep_size=False.

Parameters:

Name Type Description
px int or tuple

The number of pixels to crop (negative values) or pad (positive values) on each side of the image. Either this or the parameter percent may be set, not both at the same time.

* If None, then pixel-based cropping/padding will not be used.
* If int, then that exact number of pixels will always be cropped/padded.
* If a tuple of two ints with values a and b, then each side will be cropped/padded by a random amount sampled uniformly per image and side from the interval [a, b]. If however sample_independently is set to False, only one value will be sampled per image and used for all sides.
* If a tuple of four entries, then the entries represent top, right, bottom, left. Each entry may be a single int (always crop/pad by exactly that value), a tuple of two ints a and b (crop/pad by an amount within [a, b]), or a list of ints (crop/pad by a random value that is contained in the list).

percent float or tuple

The number of pixels to crop (negative values) or pad (positive values) on each side of the image given as a fraction of the image height/width. E.g. if this is set to -0.1, the transformation will always crop away 10% of the image's height at both the top and the bottom (both 10% each), as well as 10% of the width at the right and left. Expected value range is (-1.0, inf). Either this or the parameter px may be set, not both at the same time.

* If None, then fraction-based cropping/padding will not be used.
* If float, then that fraction will always be cropped/padded.
* If a tuple of two floats with values a and b, then each side will be cropped/padded by a random fraction sampled uniformly per image and side from the interval [a, b]. If however sample_independently is set to False, only one value will be sampled per image and used for all sides.
* If a tuple of four entries, then the entries represent top, right, bottom, left. Each entry may be a single float (always crop/pad by exactly that fraction), a tuple of two floats a and b (crop/pad by a fraction from [a, b]), or a list of floats (crop/pad by a random value that is contained in the list).

pad_mode int

OpenCV border mode.

pad_cval number, Sequence[number]

The constant value to use if the pad mode is BORDER_CONSTANT.

* If number, then that value will be used.
* If a tuple of two numbers and at least one of them is a float, then a random number will be uniformly sampled per image from the continuous interval [a, b] and used as the value. If both numbers are ints, the interval is discrete.
* If a list of numbers, then a random value will be chosen from the elements of the list and used as the value.

pad_cval_mask number, Sequence[number]

Same as pad_cval but only for masks.

keep_size bool

After cropping and padding, the result image will usually have a different height/width compared to the original input image. If this parameter is set to True, then the cropped/padded image will be resized to the input image's size, i.e. the output shape is always identical to the input shape.

sample_independently bool

If False and the values for px/percent result in exactly one probability distribution for all image sides, only one single value will be sampled from that probability distribution and used for all sides. I.e. the crop/pad amount then is the same for all sides. If True, four values will be sampled independently, one per side.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

Targets

image, mask, bboxes, keypoints

Image types: any
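
Two minimal configuration sketches; negative percent values crop, positive values pad, and keep_size=True resizes the result back to the input shape:

Python
import cv2
import albumentations as A

# Pad every side by 10% of the image size with constant (black) pixels.
pad = A.CropAndPad(percent=0.1, pad_mode=cv2.BORDER_CONSTANT, pad_cval=0, p=1.0)

# Crop 10% off every side; the result is resized back to the input shape.
crop = A.CropAndPad(percent=-0.1, keep_size=True, p=1.0)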

Source code in albumentations/augmentations/crops/transforms.py
Python
class CropAndPad(DualTransform):
    """Crop and pad images by pixel amounts or fractions of image sizes.
    Cropping removes pixels at the sides (i.e. extracts a subimage from a given full image).
    Padding adds pixels to the sides (e.g. black pixels).
    This transformation will never crop images below a height or width of ``1``.

    Note:
        This transformation automatically resizes images back to their original size. To deactivate this, add the
        parameter ``keep_size=False``.

    Args:
        px (int or tuple):
            The number of pixels to crop (negative values) or pad (positive values)
            on each side of the image. Either this or the parameter `percent` may
            be set, not both at the same time.
                * If ``None``, then pixel-based cropping/padding will not be used.
                * If ``int``, then that exact number of pixels will always be cropped/padded.
                * If a ``tuple`` of two ``int`` s with values ``a`` and ``b``,
                  then each side will be cropped/padded by a random amount sampled
                  uniformly per image and side from the interval ``[a, b]``. If
                  however `sample_independently` is set to ``False``, only one
                  value will be sampled per image and used for all sides.
                * If a ``tuple`` of four entries, then the entries represent top,
                  right, bottom, left. Each entry may be a single ``int`` (always
                  crop/pad by exactly that value), a ``tuple`` of two ``int`` s
                  ``a`` and ``b`` (crop/pad by an amount within ``[a, b]``), a
                  ``list`` of ``int`` s (crop/pad by a random value that is
                  contained in the ``list``).
        percent (float or tuple):
            The number of pixels to crop (negative values) or pad (positive values)
            on each side of the image given as a *fraction* of the image
            height/width. E.g. if this is set to ``-0.1``, the transformation will
            always crop away ``10%`` of the image's height at both the top and the
            bottom (both ``10%`` each), as well as ``10%`` of the width at the
            right and left.
            Expected value range is ``(-1.0, inf)``.
            Either this or the parameter `px` may be set, not both
            at the same time.
                * If ``None``, then fraction-based cropping/padding will not be
                  used.
                * If ``float``, then that fraction will always be cropped/padded.
                * If a ``tuple`` of two ``float`` s with values ``a`` and ``b``,
                  then each side will be cropped/padded by a random fraction
                  sampled uniformly per image and side from the interval
                  ``[a, b]``. If however `sample_independently` is set to
                  ``False``, only one value will be sampled per image and used for
                  all sides.
                * If a ``tuple`` of four entries, then the entries represent top,
                  right, bottom, left. Each entry may be a single ``float``
                  (always crop/pad by exactly that percent value), a ``tuple`` of
                  two ``float`` s ``a`` and ``b`` (crop/pad by a fraction from
                  ``[a, b]``), a ``list`` of ``float`` s (crop/pad by a random
                  value that is contained in the list).
        pad_mode (int): OpenCV border mode.
        pad_cval (number, Sequence[number]):
            The constant value to use if the pad mode is ``BORDER_CONSTANT``.
                * If ``number``, then that value will be used.
                * If a ``tuple`` of two ``number`` s and at least one of them is
                  a ``float``, then a random number will be uniformly sampled per
                  image from the continuous interval ``[a, b]`` and used as the
                  value. If both ``number`` s are ``int`` s, the interval is
                  discrete.
                * If a ``list`` of ``number``, then a random value will be chosen
                  from the elements of the ``list`` and used as the value.
        pad_cval_mask (number, Sequence[number]): Same as pad_cval but only for masks.
        keep_size (bool):
            After cropping and padding, the result image will usually have a
            different height/width compared to the original input image. If this
            parameter is set to ``True``, then the cropped/padded image will be
            resized to the input image's size, i.e. the output shape is always identical to the input shape.
        sample_independently (bool):
            If ``False`` *and* the values for `px`/`percent` result in exactly
            *one* probability distribution for all image sides, only one single
            value will be sampled from that probability distribution and used for
            all sides. I.e. the crop/pad amount then is the same for all sides.
            If ``True``, four values will be sampled independently, one per side.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        any

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        px: Optional[Union[int, List[int]]] = None,
        percent: Optional[Union[float, List[float]]] = None,
        pad_mode: int = cv2.BORDER_CONSTANT,
        pad_cval: Union[float, Sequence[float]] = 0,
        pad_cval_mask: Union[float, Sequence[float]] = 0,
        keep_size: bool = True,
        sample_independently: bool = True,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(always_apply, p)

        if px is None and percent is None:
            msg = "px and percent are empty!"
            raise ValueError(msg)
        if px is not None and percent is not None:
            msg = "Only px or percent may be set!"
            raise ValueError(msg)

        self.px = px
        self.percent = percent

        self.pad_mode = pad_mode
        self.pad_cval = pad_cval
        self.pad_cval_mask = pad_cval_mask

        self.keep_size = keep_size
        self.sample_independently = sample_independently

        self.interpolation = interpolation

    def apply(
        self,
        img: np.ndarray,
        crop_params: Sequence[int] = (),
        pad_params: Sequence[int] = (),
        pad_value: float = 0,
        rows: int = 0,
        cols: int = 0,
        interpolation: int = cv2.INTER_LINEAR,
        **params: Any,
    ) -> np.ndarray:
        return F.crop_and_pad(
            img, crop_params, pad_params, pad_value, rows, cols, interpolation, self.pad_mode, self.keep_size
        )

    def apply_to_mask(
        self,
        mask: np.ndarray,
        crop_params: Optional[Sequence[int]] = None,
        pad_params: Optional[Sequence[int]] = None,
        pad_value_mask: Optional[float] = None,
        rows: int = 0,
        cols: int = 0,
        interpolation: int = cv2.INTER_NEAREST,
        **params: Any,
    ) -> np.ndarray:
        return F.crop_and_pad(
            mask, crop_params, pad_params, pad_value_mask, rows, cols, interpolation, self.pad_mode, self.keep_size
        )

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        crop_params: Optional[Sequence[int]] = None,
        pad_params: Optional[Sequence[int]] = None,
        rows: int = 0,
        cols: int = 0,
        result_rows: int = 0,
        result_cols: int = 0,
        **params: Any,
    ) -> BoxInternalType:
        return F.crop_and_pad_bbox(bbox, crop_params, pad_params, rows, cols, result_rows, result_cols)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        crop_params: Optional[Sequence[int]] = None,
        pad_params: Optional[Sequence[int]] = None,
        rows: int = 0,
        cols: int = 0,
        result_rows: int = 0,
        result_cols: int = 0,
        **params: Any,
    ) -> KeypointInternalType:
        return F.crop_and_pad_keypoint(
            keypoint, crop_params, pad_params, rows, cols, result_rows, result_cols, self.keep_size
        )

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    @staticmethod
    def __prevent_zero(val1: int, val2: int, max_val: int) -> Tuple[int, int]:
        regain = abs(max_val) + 1
        regain1 = regain // 2
        regain2 = regain // 2
        if regain1 + regain2 < regain:
            regain1 += 1

        if regain1 > val1:
            diff = regain1 - val1
            regain1 = val1
            regain2 += diff
        elif regain2 > val2:
            diff = regain2 - val2
            regain2 = val2
            regain1 += diff

        val1 = val1 - regain1
        val2 = val2 - regain2

        return val1, val2

    @staticmethod
    def _prevent_zero(crop_params: List[int], height: int, width: int) -> List[int]:
        top, right, bottom, left = crop_params

        remaining_height = height - (top + bottom)
        remaining_width = width - (left + right)

        if remaining_height < 1:
            top, bottom = CropAndPad.__prevent_zero(top, bottom, height)
        if remaining_width < 1:
            left, right = CropAndPad.__prevent_zero(left, right, width)

        return [max(top, 0), max(right, 0), max(bottom, 0), max(left, 0)]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        height, width = params["image"].shape[:2]

        if self.px is not None:
            new_params = self._get_px_params()
        else:
            percent_params = self._get_percent_params()
            new_params = [
                int(percent_params[0] * height),
                int(percent_params[1] * width),
                int(percent_params[2] * height),
                int(percent_params[3] * width),
            ]

        pad_params = [max(i, 0) for i in new_params]

        crop_params = self._prevent_zero([-min(i, 0) for i in new_params], height, width)

        top, right, bottom, left = crop_params
        crop_params = [left, top, width - right, height - bottom]
        result_rows = crop_params[3] - crop_params[1]
        result_cols = crop_params[2] - crop_params[0]
        if result_cols == width and result_rows == height:
            crop_params = []

        top, right, bottom, left = pad_params
        pad_params = [top, bottom, left, right]
        if any(pad_params):
            result_rows += top + bottom
            result_cols += left + right
        else:
            pad_params = []

        return {
            "crop_params": crop_params or None,
            "pad_params": pad_params or None,
            "pad_value": None if pad_params is None else self._get_pad_value(self.pad_cval),
            "pad_value_mask": None if pad_params is None else self._get_pad_value(self.pad_cval_mask),
            "result_rows": result_rows,
            "result_cols": result_cols,
        }

    def _get_px_params(self) -> List[int]:
        if self.px is None:
            msg = "px is not set"
            raise ValueError(msg)

        if isinstance(self.px, int):
            params = [self.px] * 4
        elif len(self.px) == TWO:
            if self.sample_independently:
                params = [random.randrange(*self.px) for _ in range(4)]
            else:
                px = random.randrange(*self.px)
                params = [px] * 4
        elif isinstance(self.px[0], int):
            params = self.px
        else:
            params = [random.randrange(*i) for i in self.px]

        return params

    def _get_percent_params(self) -> List[float]:
        if self.percent is None:
            msg = "percent is not set"
            raise ValueError(msg)

        if isinstance(self.percent, float):
            params = [self.percent] * 4
        elif len(self.percent) == TWO:
            if self.sample_independently:
                params = [random.uniform(*self.percent) for _ in range(4)]
            else:
                px = random.uniform(*self.percent)
                params = [px] * 4
        elif isinstance(self.percent[0], (int, float)):
            params = self.percent
        else:
            params = [random.uniform(*i) for i in self.percent]

        return params  # params = [top, right, bottom, left]

    @staticmethod
    def _get_pad_value(pad_value: Union[float, Sequence[float]]) -> Union[int, float]:
        if isinstance(pad_value, (int, float)):
            return pad_value

        if len(pad_value) == TWO:
            a, b = pad_value
            if isinstance(a, int) and isinstance(b, int):
                return random.randint(a, b)

            return random.uniform(a, b)

        return random.choice(pad_value)

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (
            "px",
            "percent",
            "pad_mode",
            "pad_cval",
            "pad_cval_mask",
            "keep_size",
            "sample_independently",
            "interpolation",
        )
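A minimal usage sketch for CropAndPad (the image shape and parameter values below are illustrative, not prescribed by the transform): negative px values crop from the sides, positive values pad, and keep_size=True resizes the result back to the input shape.

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

# Sample an independent value in [-10, 10] for each side: negative values
# crop, positive values pad; keep_size=True resizes back to 100x100.
aug = A.Compose([A.CropAndPad(px=(-10, 10), keep_size=True, p=1.0)])
result = aug(image=image)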
class CropNonEmptyMaskIfExists (height, width, ignore_values=None, ignore_channels=None, always_apply=False, p=1.0) [view source on GitHub]

Crop an area that contains the mask if the mask is non-empty; otherwise, make a random crop.

Parameters:

Name Type Description
height int

vertical size of crop in pixels

width int

horizontal size of crop in pixels

ignore_values list of int

values to ignore in the mask; 0 values are always ignored (e.g., if the background value is 5, set ignore_values=[5] to ignore it)

ignore_channels list of int

channels to ignore in the mask (e.g., if the background is the first channel, set ignore_channels=[0] to ignore it)

p float

probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/crops/transforms.py
Python
class CropNonEmptyMaskIfExists(DualTransform):
    """Crop area with mask if mask is non-empty, else make random crop.

    Args:
        height: vertical size of crop in pixels
        width: horizontal size of crop in pixels
        ignore_values (list of int): values to ignore in mask, `0` values are always ignored
            (e.g. if background value is 5 set `ignore_values=[5]` to ignore)
        ignore_channels (list of int): channels to ignore in mask
            (e.g. if background is a first channel set `ignore_channels=[0]` to ignore)
        p: probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        height: int,
        width: int,
        ignore_values: Optional[List[int]] = None,
        ignore_channels: Optional[List[int]] = None,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(always_apply, p)

        if ignore_values is not None and not isinstance(ignore_values, list):
            raise ValueError(f"Expected `ignore_values` of type `list`, got `{type(ignore_values)}`")
        if ignore_channels is not None and not isinstance(ignore_channels, list):
            raise ValueError(f"Expected `ignore_channels` of type `list`, got `{type(ignore_channels)}`")

        self.height = height
        self.width = width
        self.ignore_values = ignore_values
        self.ignore_channels = ignore_channels

    def apply(
        self, img: np.ndarray, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params: Any
    ) -> np.ndarray:
        return F.crop(img, x_min, y_min, x_max, y_max)

    def apply_to_bbox(
        self, bbox: BoxInternalType, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params: Any
    ) -> BoxInternalType:
        return F.bbox_crop(
            bbox, x_min=x_min, x_max=x_max, y_min=y_min, y_max=y_max, rows=params["rows"], cols=params["cols"]
        )

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        x_min: int = 0,
        x_max: int = 0,
        y_min: int = 0,
        y_max: int = 0,
        **params: Any,
    ) -> KeypointInternalType:
        return F.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))

    def _preprocess_mask(self, mask: np.ndarray) -> np.ndarray:
        mask_height, mask_width = mask.shape[:2]

        if self.ignore_values is not None:
            ignore_values_np = np.array(self.ignore_values)
            mask = np.where(np.isin(mask, ignore_values_np), 0, mask)

        if mask.ndim == THREE and self.ignore_channels is not None:
            target_channels = np.array([ch for ch in range(mask.shape[-1]) if ch not in self.ignore_channels])
            mask = np.take(mask, target_channels, axis=-1)

        if self.height > mask_height or self.width > mask_width:
            raise ValueError(
                f"Crop size ({self.height},{self.width}) is larger than image ({mask_height},{mask_width})"
            )

        return mask

    def update_params(self, params: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
        super().update_params(params, **kwargs)
        if "mask" in kwargs:
            mask = self._preprocess_mask(kwargs["mask"])
        elif "masks" in kwargs and len(kwargs["masks"]):
            masks = kwargs["masks"]
            mask = self._preprocess_mask(np.copy(masks[0]))  # need copy as we perform in-place mod afterwards
            for m in masks[1:]:
                mask |= self._preprocess_mask(m)
        else:
            msg = "Can not find mask for CropNonEmptyMaskIfExists"
            raise RuntimeError(msg)

        mask_height, mask_width = mask.shape[:2]

        if mask.any():
            mask = mask.sum(axis=-1) if mask.ndim == THREE else mask
            non_zero_yx = np.argwhere(mask)
            y, x = random.choice(non_zero_yx)
            x_min = x - random.randint(0, self.width - 1)
            y_min = y - random.randint(0, self.height - 1)
            x_min = np.clip(x_min, 0, mask_width - self.width)
            y_min = np.clip(y_min, 0, mask_height - self.height)
        else:
            x_min = random.randint(0, mask_width - self.width)
            y_min = random.randint(0, mask_height - self.height)

        x_max = x_min + self.width
        y_max = y_min + self.height

        params.update({"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max})
        return params

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return ("height", "width", "ignore_values", "ignore_channels")
class RandomCrop (height, width, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input.

Parameters:

Name Type Description
height int

height of the crop.

width int

width of the crop.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomCrop(DualTransform):
    """Crop a random part of the input.

    Args:
        height: height of the crop.
        width: width of the crop.
        p: probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(self, height: int, width: int, always_apply: bool = False, p: float = 1.0):
        super().__init__(always_apply, p)
        self.height = height
        self.width = width

    def apply(self, img: np.ndarray, h_start: int = 0, w_start: int = 0, **params: Any) -> np.ndarray:
        return F.random_crop(img, self.height, self.width, h_start, w_start)

    def get_params(self) -> Dict[str, float]:
        return {"h_start": random.random(), "w_start": random.random()}

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return F.bbox_random_crop(bbox, self.height, self.width, **params)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return F.keypoint_random_crop(keypoint, self.height, self.width, **params)

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("height", "width")
class RandomCropFromBorders (crop_left=0.1, crop_right=0.1, crop_top=0.1, crop_bottom=0.1, always_apply=False, p=1.0) [view source on GitHub]

Crop the input by randomly cutting parts from its borders, without resizing at the end.

Parameters:

Name Type Description
crop_left float

single float value in the (0.0, 1.0) range. Default 0.1. The image will be randomly cut from the left side in the range [0, crop_left * width).

crop_right float

single float value in the (0.0, 1.0) range. Default 0.1. The image will be randomly cut from the right side in the range [(1 - crop_right) * width, width).

crop_top float

single float value in the (0.0, 1.0) range. Default 0.1. The image will be randomly cut from the top side in the range [0, crop_top * height).

crop_bottom float

single float value in the (0.0, 1.0) range. Default 0.1. The image will be randomly cut from the bottom side in the range [(1 - crop_bottom) * height, height).

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomCropFromBorders(DualTransform):
    """Crop bbox from image randomly cut parts from borders without resize at the end

    Args:
        crop_left (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
        from left side in range [0, crop_left * width)
        crop_right (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
        from right side in range [(1 - crop_right) * width, width)
        crop_top (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
        from top side in range [0, crop_top * height)
        crop_bottom (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
        from bottom side in range [(1 - crop_bottom) * height, height)
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        crop_left: float = 0.1,
        crop_right: float = 0.1,
        crop_top: float = 0.1,
        crop_bottom: float = 0.1,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(always_apply, p)
        self.crop_left = crop_left
        self.crop_right = crop_right
        self.crop_top = crop_top
        self.crop_bottom = crop_bottom

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, int]:
        img = params["image"]
        x_min = random.randint(0, int(self.crop_left * img.shape[1]))
        x_max = random.randint(max(x_min + 1, int((1 - self.crop_right) * img.shape[1])), img.shape[1])
        y_min = random.randint(0, int(self.crop_top * img.shape[0]))
        y_max = random.randint(max(y_min + 1, int((1 - self.crop_bottom) * img.shape[0])), img.shape[0])
        return {"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max}

    def apply(
        self, img: np.ndarray, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params: Any
    ) -> np.ndarray:
        return F.clamping_crop(img, x_min, y_min, x_max, y_max)

    def apply_to_mask(
        self, mask: np.ndarray, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params: Any
    ) -> np.ndarray:
        return F.clamping_crop(mask, x_min, y_min, x_max, y_max)

    def apply_to_bbox(
        self, bbox: BoxInternalType, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params: Any
    ) -> BoxInternalType:
        rows, cols = params["rows"], params["cols"]
        return F.bbox_crop(bbox, x_min, y_min, x_max, y_max, rows, cols)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        x_min: int = 0,
        x_max: int = 0,
        y_min: int = 0,
        y_max: int = 0,
        **params: Any,
    ) -> KeypointInternalType:
        return F.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return "crop_left", "crop_right", "crop_top", "crop_bottom"
class RandomCropNearBBox (max_part_shift=(0.3, 0.3), cropping_bbox_key='cropping_bbox', cropping_box_key=None, always_apply=False, p=1.0) [view source on GitHub]

Crop the input around a given bounding box, with a random shift along the x and y coordinates.

Parameters:

Name Type Description
max_part_shift float, (float, float)

Max shift in height and width dimensions relative to cropping_bbox dimension. If max_part_shift is a single float, the range will be (max_part_shift, max_part_shift). Default (0.3, 0.3).

cropping_bbox_key str

Additional target key for cropping box. Default cropping_bbox.

cropping_box_key str

[Deprecated] Use cropping_bbox_key instead.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Examples:

Python
>>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_box')],
>>>              bbox_params=BboxParams("pascal_voc"))
>>> result = aug(image=image, bboxes=bboxes, test_box=[0, 5, 10, 20])
Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomCropNearBBox(DualTransform):
    """Crop bbox from image with random shift by x,y coordinates

    Args:
        max_part_shift (float, (float, float)): Max shift in `height` and `width` dimensions relative
            to `cropping_bbox` dimension.
            If max_part_shift is a single float, the range will be (max_part_shift, max_part_shift).
            Default (0.3, 0.3).
        cropping_bbox_key (str): Additional target key for cropping box. Default `cropping_bbox`.
        cropping_box_key (str): [Deprecated] Use `cropping_bbox_key` instead.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Examples:
        >>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_box')],
        >>>              bbox_params=BboxParams("pascal_voc"))
        >>> result = aug(image=image, bboxes=bboxes, test_box=[0, 5, 10, 20])

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        max_part_shift: ScaleFloatType = (0.3, 0.3),
        cropping_bbox_key: str = "cropping_bbox",
        cropping_box_key: Optional[str] = None,  # Deprecated
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(always_apply, p)
        self.max_part_shift = to_tuple(max_part_shift, low=max_part_shift)

        # Check for deprecated parameter and issue warning
        if cropping_box_key is not None:
            warn(
                "The parameter 'cropping_box_key' is deprecated and will be removed in future versions. "
                "Use 'cropping_bbox_key' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Ensure the new parameter is used even if the old one is passed
            cropping_bbox_key = cropping_box_key

        self.cropping_bbox_key = cropping_bbox_key

        if min(self.max_part_shift) < 0 or max(self.max_part_shift) > 1:
            raise ValueError(f"Invalid max_part_shift. Got: {max_part_shift}")

    def apply(
        self, img: np.ndarray, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params: Any
    ) -> np.ndarray:
        return F.clamping_crop(img, x_min, y_min, x_max, y_max)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, int]:
        bbox = params[self.cropping_bbox_key]
        h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
        w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])

        x_min = bbox[0] - random.randint(-w_max_shift, w_max_shift)
        x_max = bbox[2] + random.randint(-w_max_shift, w_max_shift)

        y_min = bbox[1] - random.randint(-h_max_shift, h_max_shift)
        y_max = bbox[3] + random.randint(-h_max_shift, h_max_shift)

        x_min = max(0, x_min)
        y_min = max(0, y_min)

        return {"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max}

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return F.bbox_crop(bbox, **params)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        x_min: int = 0,
        x_max: int = 0,
        y_min: int = 0,
        y_max: int = 0,
        **params: Any,
    ) -> KeypointInternalType:
        return F.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))

    @property
    def targets_as_params(self) -> List[str]:
        return [self.cropping_bbox_key]

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("max_part_shift", "cropping_bbox_key")
class RandomResizedCrop (height, width, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Torchvision's variant of cropping a random part of the input and rescaling it to some size.

Parameters:

Name Type Description
height int

height after crop and resize.

width int

width after crop and resize.

scale float, float

range of the size of the cropped region, as a fraction of the original image area

ratio float, float

range of the aspect ratio (width/height) of the cropped region

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomResizedCrop(_BaseRandomSizedCrop):
    """Torchvision's variant of crop a random part of the input and rescale it to some size.

    Args:
        height (int): height after crop and resize.
        width (int): width after crop and resize.
        scale ((float, float)): range of the size of the cropped region, as a fraction of the original image area
        ratio ((float, float)): range of the aspect ratio (width/height) of the cropped region
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        height: int,
        width: int,
        scale: Tuple[float, float] = (0.08, 1.0),
        ratio: Tuple[float, float] = (0.75, 1.3333333333333333),
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(height=height, width=width, interpolation=interpolation, always_apply=always_apply, p=p)
        self.scale = scale
        self.ratio = ratio

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Union[int, float]]:
        img = params["image"]
        area = img.shape[0] * img.shape[1]

        for _ in range(10):
            target_area = random.uniform(*self.scale) * area
            log_ratio = (math.log(self.ratio[0]), math.log(self.ratio[1]))
            aspect_ratio = math.exp(random.uniform(*log_ratio))

            width = int(round(math.sqrt(target_area * aspect_ratio)))
            height = int(round(math.sqrt(target_area / aspect_ratio)))

            if 0 < width <= img.shape[1] and 0 < height <= img.shape[0]:
                i = random.randint(0, img.shape[0] - height)
                j = random.randint(0, img.shape[1] - width)
                return {
                    "crop_height": height,
                    "crop_width": width,
                    "h_start": i * 1.0 / (img.shape[0] - height + 1e-10),
                    "w_start": j * 1.0 / (img.shape[1] - width + 1e-10),
                }

        # Fallback to central crop
        in_ratio = img.shape[1] / img.shape[0]
        if in_ratio < min(self.ratio):
            width = img.shape[1]
            height = int(round(width / min(self.ratio)))
        elif in_ratio > max(self.ratio):
            height = img.shape[0]
            width = int(round(height * max(self.ratio)))
        else:  # whole image
            width = img.shape[1]
            height = img.shape[0]
        i = (img.shape[0] - height) // 2
        j = (img.shape[1] - width) // 2
        return {
            "crop_height": height,
            "crop_width": width,
            "h_start": i * 1.0 / (img.shape[0] - height + 1e-10),
            "w_start": j * 1.0 / (img.shape[1] - width + 1e-10),
        }

    def get_params(self) -> Dict[str, Any]:
        return {}

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str]:
        return "height", "width", "scale", "ratio", "interpolation"
class RandomSizedBBoxSafeCrop (height, width, erosion_rate=0.0, interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input and rescale it to some size without loss of bboxes.

Parameters:

Name Type Description
height int

height after crop and resize.

width int

width after crop and resize.

erosion_rate float

erosion rate applied on input image height before crop.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes

Image types: uint8, float32

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomSizedBBoxSafeCrop(BBoxSafeRandomCrop):
    """Crop a random part of the input and rescale it to some size without loss of bboxes.

    Args:
        height: height after crop and resize.
        width: width after crop and resize.
        erosion_rate: erosion rate applied on input image height before crop.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.
    Targets:
        image, mask, bboxes
    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)

    def __init__(
        self,
        height: int,
        width: int,
        erosion_rate: float = 0.0,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(erosion_rate, always_apply, p)
        self.height = height
        self.width = width
        self.interpolation = interpolation

    def apply(
        self,
        img: np.ndarray,
        crop_height: int = 0,
        crop_width: int = 0,
        h_start: int = 0,
        w_start: int = 0,
        interpolation: int = cv2.INTER_LINEAR,
        **params: Any,
    ) -> np.ndarray:
        crop = F.random_crop(img, crop_height, crop_width, h_start, w_start)
        return FGeometric.resize(crop, self.height, self.width, interpolation)

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (*super().get_transform_init_args_names(), "height", "width", "interpolation")
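A minimal usage sketch (the image, box, and sizes below are illustrative). With erosion_rate=0.0, the sampled crop contains every input bbox, so no boxes are lost:

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
bboxes = [(20, 30, 60, 80)]  # pascal_voc format: x_min, y_min, x_max, y_max

aug = A.Compose(
    [A.RandomSizedBBoxSafeCrop(height=64, width=64, erosion_rate=0.0, p=1.0)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)
result = aug(image=image, bboxes=bboxes, labels=["object"])
assert len(result["bboxes"]) == len(bboxes)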
class RandomSizedCrop (min_max_height, height, width, w2h_ratio=1.0, interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input and rescale it to some size.

Parameters:

Name Type Description
min_max_height int, int

crop size limits.

height int

height after crop and resize.

width int

width after crop and resize.

w2h_ratio float

aspect ratio of crop.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomSizedCrop(_BaseRandomSizedCrop):
    """Crop a random part of the input and rescale it to some size.

    Args:
        min_max_height ((int, int)): crop size limits.
        height (int): height after crop and resize.
        width (int): width after crop and resize.
        w2h_ratio (float): aspect ratio of crop.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        min_max_height: Tuple[int, int],
        height: int,
        width: int,
        w2h_ratio: float = 1.0,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(height=height, width=width, interpolation=interpolation, always_apply=always_apply, p=p)

        self.min_max_height = min_max_height
        self.w2h_ratio = w2h_ratio

    def get_params(self) -> Dict[str, Union[int, float]]:
        crop_height = random.randint(self.min_max_height[0], self.min_max_height[1])
        return {
            "h_start": random.random(),
            "w_start": random.random(),
            "crop_height": crop_height,
            "crop_width": int(crop_height * self.w2h_ratio),
        }

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str]:
        return "min_max_height", "height", "width", "w2h_ratio", "interpolation"

domain_adaptation

class FDA (reference_images, beta_limit=0.1, read_fn=read_rgb_image, always_apply=False, p=0.5) [view source on GitHub]

Fourier Domain Adaptation (FDA) for simple "style transfer" in the context of unsupervised domain adaptation (UDA). FDA manipulates the frequency components of images to reduce the domain gap between source and target datasets, effectively adapting images from one domain to closely resemble those from another without altering their semantic content.

This transform is particularly beneficial in scenarios where the training (source) and testing (target) images come from different distributions, such as synthetic versus real images, or day versus night scenes. Unlike traditional domain adaptation methods that may require complex adversarial training, FDA achieves domain alignment by swapping low-frequency components of the Fourier transform between the source and target images. This technique has been shown to improve the performance of models on the target domain, particularly for tasks like semantic segmentation, without additional training for domain invariance.

The 'beta_limit' parameter controls the extent of frequency component swapping, with lower values preserving more of the original image's characteristics and higher values leading to more pronounced adaptation effects. It is recommended to use beta values less than 0.3 to avoid introducing artifacts.

Parameters:

Name Type Description
reference_images Sequence[Any]

Sequence of objects to be converted into images by read_fn. This typically involves paths to images that serve as target domain examples for adaptation.

beta_limit float or tuple of float

Coefficient beta from the paper, controlling the swapping extent of frequency components. Values should be less than 0.5.

read_fn Callable

User-defined function for reading images. It takes an element from reference_images and returns a numpy array of image pixels. By default, it is expected to take a path to an image and return a numpy array.

Targets

image

Image types: uint8, float32

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
>>> result = aug(image=image)

Note

FDA is a powerful tool for domain adaptation, particularly in unsupervised settings where annotated target domain samples are unavailable. It enables significant improvements in model generalization by aligning the low-level statistics of source and target images through a simple yet effective Fourier-based method.

Source code in albumentations/augmentations/domain_adaptation.py
Python
class FDA(ImageOnlyTransform):
    """Fourier Domain Adaptation (FDA) for simple "style transfer" in the context of unsupervised domain adaptation
    (UDA). FDA manipulates the frequency components of images to reduce the domain gap between source
    and target datasets, effectively adapting images from one domain to closely resemble those from another without
    altering their semantic content.

    This transform is particularly beneficial in scenarios where the training (source) and testing (target) images
    come from different distributions, such as synthetic versus real images, or day versus night scenes.
    Unlike traditional domain adaptation methods that may require complex adversarial training, FDA achieves domain
    alignment by swapping low-frequency components of the Fourier transform between the source and target images.
    This technique has been shown to improve the performance of models on the target domain, particularly for tasks
    like semantic segmentation, without additional training for domain invariance.

    The 'beta_limit' parameter controls the extent of frequency component swapping, with lower values preserving more
    of the original image's characteristics and higher values leading to more pronounced adaptation effects.
    It is recommended to use beta values less than 0.3 to avoid introducing artifacts.

    Args:
        reference_images (Sequence[Any]): Sequence of objects to be converted into images by `read_fn`. This typically
            involves paths to images that serve as target domain examples for adaptation.
        beta_limit (float or tuple of float): Coefficient beta from the paper, controlling the swapping extent of
            frequency components. Values should be less than 0.5.
        read_fn (Callable): User-defined function for reading images. It takes an element from `reference_images` and
            returns a numpy array of image pixels. By default, it is expected to take a path to an image and return a
            numpy array.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        - https://github.com/YanchaoYang/FDA
        - https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_FDA_Fourier_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2020_paper.pdf

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
        >>> result = aug(image=image)

    Note:
        FDA is a powerful tool for domain adaptation, particularly in unsupervised settings where annotated target
        domain samples are unavailable. It enables significant improvements in model generalization by aligning
        the low-level statistics of source and target images through a simple yet effective Fourier-based method.
    """

    def __init__(
        self,
        reference_images: Sequence[np.ndarray],
        beta_limit: ScaleFloatType = 0.1,
        read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        self.reference_images = reference_images
        self.read_fn = read_fn
        if isinstance(beta_limit, float) and not 0 <= beta_limit <= MAX_BETA_LIMIT:
            msg = "The beta_limit should be within [0, 0.5]."
            raise ValueError(msg)

        self.beta_limit = to_tuple(beta_limit, low=0)

    def apply(
        self, img: np.ndarray, target_image: Optional[np.ndarray] = None, beta: float = 0.1, **params: Any
    ) -> np.ndarray:
        return fourier_domain_adaptation(img, target_image, beta)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, np.ndarray]:
        img = params["image"]
        target_img = self.read_fn(random.choice(self.reference_images))
        target_img = cv2.resize(target_img, dsize=(img.shape[1], img.shape[0]))

        return {"target_image": target_img}

    def get_params(self) -> Dict[str, float]:
        return {"beta": random.uniform(self.beta_limit[0], self.beta_limit[1])}

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return "reference_images", "beta_limit", "read_fn"

    def to_dict_private(self) -> Dict[str, Any]:
        msg = "FDA can not be serialized."
        raise NotImplementedError(msg)

class HistogramMatching (reference_images, blend_ratio=(0.5, 1.0), read_fn=read_rgb_image, always_apply=False, p=0.5) [view source on GitHub]

Implements histogram matching, a technique that adjusts the pixel values of an input image to match the histogram of a reference image. This adjustment ensures that the output image has a similar tone and contrast to the reference. The process is applied independently to each channel of multi-channel images, provided both the input and reference images have the same number of channels.

Histogram matching serves as an effective normalization method in image processing tasks such as feature matching. It is particularly useful when images originate from varied sources or are captured under different lighting conditions, helping to standardize the images' appearance before further processing.

Parameters:

Name Type Description
reference_images Sequence[Any]

A sequence of objects to be converted into images by read_fn. Typically, this is a sequence of image paths.

blend_ratio Tuple[float, float]

Specifies the minimum and maximum blend ratio for blending the matched image with the original image. A random blend factor within this range is chosen for each image to increase the diversity of the output images.

read_fn Callable[[Any], np.ndarray]

A user-defined function for reading images, which accepts an element from reference_images and returns a numpy array of image pixels. By default, this is expected to take a file path and return an image as a numpy array.

p float

The probability of applying the transform to any given image. Defaults to 0.5.

Targets

image

Image types: uint8, float32

Note

This class cannot be serialized directly due to its dynamic nature and dependency on external image data. An attempt to serialize it will raise a NotImplementedError.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.HistogramMatching([target_image], p=1, read_fn=lambda x: x)])
>>> result = aug(image=image)
Source code in albumentations/augmentations/domain_adaptation.py
Python
class HistogramMatching(ImageOnlyTransform):
    """Implements histogram matching, a technique that adjusts the pixel values of an input image
    to match the histogram of a reference image. This adjustment ensures that the output image
    has a similar tone and contrast to the reference. The process is applied independently to
    each channel of multi-channel images, provided both the input and reference images have the
    same number of channels.

    Histogram matching serves as an effective normalization method in image processing tasks such
    as feature matching. It is particularly useful when images originate from varied sources or are
    captured under different lighting conditions, helping to standardize the images' appearance
    before further processing.

    Args:
        reference_images (Sequence[Any]): A sequence of objects to be converted into images by `read_fn`.
            Typically, this is a sequence of image paths.
        blend_ratio (Tuple[float, float]): Specifies the minimum and maximum blend ratio for blending the matched
            image with the original image. A random blend factor within this range is chosen for each image to
            increase the diversity of the output images.
        read_fn (Callable[[Any], np.ndarray]): A user-defined function for reading images, which accepts an
            element from `reference_images` and returns a numpy array of image pixels. By default, this is expected
            to take a file path and return an image as a numpy array.
        p (float): The probability of applying the transform to any given image. Defaults to 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        This class cannot be serialized directly due to its dynamic nature and dependency on external image data.
        An attempt to serialize it will raise a NotImplementedError.

    Reference:
        https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_histogram_matching.html

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> aug = A.Compose([A.HistogramMatching([target_image], p=1, read_fn=lambda x: x)])
        >>> result = aug(image=image)
    """

    def __init__(
        self,
        reference_images: Sequence[Any],
        blend_ratio: Tuple[float, float] = (0.5, 1.0),
        read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        self.reference_images = reference_images
        self.read_fn = read_fn
        self.blend_ratio = blend_ratio

    def apply(
        self,
        img: np.ndarray,
        reference_image: Optional[np.ndarray] = None,
        blend_ratio: float = 0.5,
        **params: Any,
    ) -> np.ndarray:
        return apply_histogram(img, reference_image, blend_ratio)

    def get_params(self) -> Dict[str, np.ndarray]:
        return {
            "reference_image": self.read_fn(random.choice(self.reference_images)),
            "blend_ratio": random.uniform(self.blend_ratio[0], self.blend_ratio[1]),
        }

    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return ("reference_images", "blend_ratio", "read_fn")

    def to_dict_private(self) -> Dict[str, Any]:
        msg = "HistogramMatching can not be serialized."
        raise NotImplementedError(msg)

class PixelDistributionAdaptation (reference_images, blend_ratio=(0.25, 1.0), read_fn=read_rgb_image, transform_type='pca', always_apply=False, p=0.5) [view source on GitHub]

Performs pixel-level domain adaptation by aligning the pixel value distribution of an input image with that of a reference image. This process involves fitting a simple statistical transformation (such as PCA, StandardScaler, or MinMaxScaler) to both the original and the reference images, transforming the original image with the transformation trained on it, and then applying the inverse transformation using the transform fitted on the reference image. The result is an adapted image that retains the original content while mimicking the pixel value distribution of the reference domain.

The process can be visualized as two main steps: 1. Adjusting the original image to a standard distribution space using a selected transform. 2. Moving the adjusted image into the distribution space of the reference image by applying the inverse of the transform fitted on the reference image.

This technique is especially useful in scenarios where images from different domains (e.g., synthetic vs. real images, day vs. night scenes) need to be harmonized for better consistency or performance in image processing tasks.

Parameters:

Name Type Description
reference_images Sequence[Any]

A sequence of objects (typically image paths) that will be converted into images by read_fn. These images serve as references for the domain adaptation.

blend_ratio Tuple[float, float]

Specifies the minimum and maximum blend ratio for mixing the adapted image with the original, enhancing the diversity of the output images.

read_fn Callable

A user-defined function for reading and converting the objects in reference_images into numpy arrays. By default, it assumes these objects are image paths.

transform_type str

Specifies the type of statistical transformation to apply. Supported values are "pca" for Principal Component Analysis, "standard" for StandardScaler, and "minmax" for MinMaxScaler.

p float

The probability of applying the transform to any given image. Default is 0.5.

Targets

image

Image types: uint8, float32

Reference

For more information on the underlying approach, see: https://github.com/arsenyinfo/qudida

Note

The PixelDistributionAdaptation transform is a novel way to perform domain adaptation at the pixel level, suitable for adjusting images across different conditions without complex modeling. It is effective for preparing images before more advanced processing or analysis.

Source code in albumentations/augmentations/domain_adaptation.py
Python
class PixelDistributionAdaptation(ImageOnlyTransform):
    """Performs pixel-level domain adaptation by aligning the pixel value distribution of an input image
    with that of a reference image. This process involves fitting a simple statistical transformation
    (such as PCA, StandardScaler, or MinMaxScaler) to both the original and the reference images,
    transforming the original image with the transformation trained on it, and then applying the inverse
    transformation using the transform fitted on the reference image. The result is an adapted image
    that retains the original content while mimicking the pixel value distribution of the reference domain.

    The process can be visualized as two main steps:
    1. Adjusting the original image to a standard distribution space using a selected transform.
    2. Moving the adjusted image into the distribution space of the reference image by applying the inverse
       of the transform fitted on the reference image.

    This technique is especially useful in scenarios where images from different domains (e.g., synthetic
    vs. real images, day vs. night scenes) need to be harmonized for better consistency or performance in
    image processing tasks.

    Args:
        reference_images (Sequence[Any]): A sequence of objects (typically image paths) that will be
            converted into images by `read_fn`. These images serve as references for the domain adaptation.
        blend_ratio (Tuple[float, float]): Specifies the minimum and maximum blend ratio for mixing
            the adapted image with the original, enhancing the diversity of the output images.
        read_fn (Callable): A user-defined function for reading and converting the objects in
            `reference_images` into numpy arrays. By default, it assumes these objects are image paths.
        transform_type (str): Specifies the type of statistical transformation to apply. Supported values
            are "pca" for Principal Component Analysis, "standard" for StandardScaler, and "minmax" for
            MinMaxScaler.
        p (float): The probability of applying the transform to any given image. Default is 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        For more information on the underlying approach, see: https://github.com/arsenyinfo/qudida

    Note:
        The PixelDistributionAdaptation transform is a novel way to perform domain adaptation at the pixel level,
        suitable for adjusting images across different conditions without complex modeling. It is effective
        for preparing images before more advanced processing or analysis.
    """

    def __init__(
        self,
        reference_images: Sequence[Any],
        blend_ratio: Tuple[float, float] = (0.25, 1.0),
        read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
        transform_type: Literal["pca", "standard", "minmax"] = "pca",
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        self.reference_images = reference_images
        self.read_fn = read_fn
        self.blend_ratio = blend_ratio
        expected_transformers = ("pca", "standard", "minmax")
        if transform_type not in expected_transformers:
            raise ValueError(f"Got unexpected transform_type {transform_type}. Expected one of {expected_transformers}")
        self.transform_type = transform_type

    @staticmethod
    def _validate_shape(img: np.ndarray) -> None:
        if is_grayscale_image(img) or is_multispectral_image(img):
            raise ValueError(
                f"Unexpected image shape: expected 3 dimensions, got {len(img.shape)}."
                f"Is it a grayscale or multispectral image? It's not supported for now."
            )

    def ensure_uint8(self, img: np.ndarray) -> Tuple[np.ndarray, bool]:
        if img.dtype == np.float32:
            if img.min() < 0 or img.max() > 1:
                message = (
                    "PixelDistributionAdaptation uses uint8 under the hood, so float32 should be converted. "
                    "Can not do it automatically when the image is out of the [0..1] range."
                )
                raise TypeError(message)
            return (img * 255).astype("uint8"), True
        return img, False

    def apply(self, img: np.ndarray, reference_image: np.ndarray, blend_ratio: float, **params: Any) -> np.ndarray:
        self._validate_shape(img)
        reference_image, _ = self.ensure_uint8(reference_image)
        img, needs_reconvert = self.ensure_uint8(img)

        adapted = adapt_pixel_distribution(
            img,
            ref=reference_image,
            weight=blend_ratio,
            transform_type=self.transform_type,
        )
        if needs_reconvert:
            adapted = adapted.astype("float32") * (1 / 255)
        return adapted

    def get_params(self) -> Dict[str, Any]:
        return {
            "reference_image": self.read_fn(random.choice(self.reference_images)),
            "blend_ratio": random.uniform(self.blend_ratio[0], self.blend_ratio[1]),
        }

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return "reference_images", "blend_ratio", "read_fn", "transform_type"

    def to_dict_private(self) -> Dict[str, Any]:
        msg = "PixelDistributionAdaptation can not be serialized."
        raise NotImplementedError(msg)
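A minimal usage sketch (random arrays stand in for real source and reference images; read_fn=lambda x: x treats each reference entry as an already-loaded array):

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
reference = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

aug = A.Compose(
    [A.PixelDistributionAdaptation([reference], transform_type="pca", read_fn=lambda x: x, p=1.0)]
)
result = aug(image=image)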

domain_adaptation_functional

class DomainAdapter (transformer, ref_img, color_conversions=(None, None)) [view source on GitHub]

Source: https://github.com/arsenyinfo/qudida by Arseny Kravchenko

Source code in albumentations/augmentations/domain_adaptation_functional.py
Python
class DomainAdapter:
    """Source: https://github.com/arsenyinfo/qudida by Arseny Kravchenko"""

    def __init__(
        self,
        transformer: TransformerInterface,
        ref_img: np.ndarray,
        color_conversions: Tuple[None, None] = (None, None),
    ):
        self.color_in, self.color_out = color_conversions
        self.source_transformer = deepcopy(transformer)
        self.target_transformer = transformer
        self.target_transformer.fit(self.flatten(ref_img))

    def to_colorspace(self, img: np.ndarray) -> np.ndarray:
        return img if self.color_in is None else cv2.cvtColor(img, self.color_in)

    def from_colorspace(self, img: np.ndarray) -> np.ndarray:
        if self.color_out is None:
            return img
        return cv2.cvtColor(img.astype("uint8"), self.color_out)

    def flatten(self, img: np.ndarray) -> np.ndarray:
        img = self.to_colorspace(img)
        img = img.astype("float32") / 255.0
        return img.reshape(-1, 3)

    def reconstruct(self, pixels: np.ndarray, height: int, width: int) -> np.ndarray:
        pixels = (np.clip(pixels, 0, 1) * 255).astype("uint8")
        return self.from_colorspace(pixels.reshape(height, width, 3))

    @staticmethod
    def _pca_sign(x: np.ndarray) -> np.ndarray:
        return np.sign(np.trace(x.components_))

    def __call__(self, image: np.ndarray) -> np.ndarray:
        height, width = image.shape[:2]
        pixels = self.flatten(image)
        self.source_transformer.fit(pixels)

        # dirty hack to make sure colors are not inverted
        if (
            hasattr(self.target_transformer, "components_")
            and hasattr(self.source_transformer, "components_")
            and self._pca_sign(self.target_transformer) != self._pca_sign(self.source_transformer)
        ):
            self.target_transformer.components_ *= -1

        representation = self.source_transformer.transform(pixels)
        result = self.target_transformer.inverse_transform(representation)
        return self.reconstruct(result, height, width)
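A minimal sketch of using DomainAdapter directly, assuming scikit-learn's PCA as the transformer; any object with fit/transform/inverse_transform should satisfy TransformerInterface:

Python
import numpy as np
from sklearn.decomposition import PCA
from albumentations.augmentations.domain_adaptation_functional import DomainAdapter

source = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
reference = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)

# The target transformer is fitted on the reference image at construction;
# calling the adapter maps the source pixels into the reference distribution.
adapter = DomainAdapter(transformer=PCA(n_components=3), ref_img=reference)
adapted = adapter(source)
assert adapted.shape == source.shape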

dropout special

channel_dropout

class ChannelDropout (channel_drop_range=(1, 1), fill_value=0, always_apply=False, p=0.5) [view source on GitHub]

Randomly drop channels in the input image.

Parameters:

Name Type Description
channel_drop_range int, int

range from which we choose the number of channels to drop.

fill_value int, float

pixel value for the dropped channel.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, uint16, uint32, float32

Source code in albumentations/augmentations/dropout/channel_dropout.py
Python
class ChannelDropout(ImageOnlyTransform):
    """Randomly Drop Channels in the input Image.

    Args:
        channel_drop_range (int, int): range from which we choose the number of channels to drop.
        fill_value (int, float): pixel value for the dropped channel.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, uint16, uint32, float32

    """

    def __init__(
        self,
        channel_drop_range: Tuple[int, int] = (1, 1),
        fill_value: float = 0,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        self.channel_drop_range = channel_drop_range

        self.min_channels = channel_drop_range[0]
        self.max_channels = channel_drop_range[1]

        if not 1 <= self.min_channels <= self.max_channels:
            raise ValueError(f"Invalid channel_drop_range. Got: {channel_drop_range}")

        self.fill_value = fill_value

    def apply(self, img: np.ndarray, channels_to_drop: Tuple[int, ...] = (0,), **params: Any) -> np.ndarray:
        return channel_dropout(img, channels_to_drop, self.fill_value)

    def get_params_dependent_on_targets(self, params: Mapping[str, Any]) -> Dict[str, Any]:
        img = params["image"]

        num_channels = img.shape[-1]

        if len(img.shape) == TWO or num_channels == 1:
            msg = "Images has one channel. ChannelDropout is not defined."
            raise NotImplementedError(msg)

        if self.max_channels >= num_channels:
            msg = "Can not drop all channels in ChannelDropout."
            raise ValueError(msg)

        num_drop_channels = random.randint(self.min_channels, self.max_channels)

        channels_to_drop = random.sample(range(num_channels), k=num_drop_channels)

        return {"channels_to_drop": channels_to_drop}

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return "channel_drop_range", "fill_value"

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

coarse_dropout

class CoarseDropout (max_holes=8, max_height=8, max_width=8, min_holes=None, min_height=None, min_width=None, fill_value=0, mask_fill_value=None, always_apply=False, p=0.5) [view source on GitHub]

CoarseDropout of the rectangular regions in the image.

Parameters:

Name Type Description
max_holes int

Maximum number of regions to zero out.

max_height int, float

Maximum height of the hole. If float, it is calculated as a fraction of the image height.

max_width int, float

Maximum width of the hole. If float, it is calculated as a fraction of the image width.

min_holes int

Minimum number of regions to zero out. If None, min_holes is set to max_holes. Default: None.

min_height int, float

Minimum height of the hole. If None, min_height is set to max_height. Default: None. If float, it is calculated as a fraction of the image height.

min_width int, float

Minimum width of the hole. If None, min_width is set to max_width. Default: None. If float, it is calculated as a fraction of the image width.

fill_value int, float, list of int, list of float

value for dropped pixels.

mask_fill_value int, float, list of int, list of float

fill value for dropped pixels in mask. If None - mask is not affected. Default: None.

Targets

image, mask, keypoints

Image types: uint8, float32

Reference

https://arxiv.org/abs/1708.04552
https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py
https://github.com/aleju/imgaug/blob/master/imgaug/augmenters/arithmetic.py

Source code in albumentations/augmentations/dropout/coarse_dropout.py
Python
class CoarseDropout(DualTransform):
    """CoarseDropout of the rectangular regions in the image.

    Args:
        max_holes (int): Maximum number of regions to zero out.
        max_height (int, float): Maximum height of the hole.
            If float, it is calculated as a fraction of the image height.
        max_width (int, float): Maximum width of the hole.
            If float, it is calculated as a fraction of the image width.
        min_holes (int): Minimum number of regions to zero out. If `None`,
            `min_holes` is set to `max_holes`. Default: `None`.
        min_height (int, float): Minimum height of the hole. If `None`,
            `min_height` is set to `max_height`. Default: `None`.
            If float, it is calculated as a fraction of the image height.
        min_width (int, float): Minimum width of the hole. If `None`, `min_width` is
            set to `max_width`. Default: `None`.
            If float, it is calculated as a fraction of the image width.

        fill_value (int, float, list of int, list of float): value for dropped pixels.
        mask_fill_value (int, float, list of int, list of float): fill value for dropped pixels
            in mask. If `None` - mask is not affected. Default: `None`.

    Targets:
        image, mask, keypoints

    Image types:
        uint8, float32

    Reference:
    |  https://arxiv.org/abs/1708.04552
    |  https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py
    |  https://github.com/aleju/imgaug/blob/master/imgaug/augmenters/arithmetic.py

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS)

    def __init__(
        self,
        max_holes: int = 8,
        max_height: int = 8,
        max_width: int = 8,
        min_holes: Optional[int] = None,
        min_height: Optional[int] = None,
        min_width: Optional[int] = None,
        fill_value: int = 0,
        mask_fill_value: Optional[int] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.max_holes = max_holes
        self.max_height = max_height
        self.max_width = max_width
        self.min_holes = min_holes if min_holes is not None else max_holes
        self.min_height = min_height if min_height is not None else max_height
        self.min_width = min_width if min_width is not None else max_width
        self.fill_value = fill_value
        self.mask_fill_value = mask_fill_value
        if not 0 < self.min_holes <= self.max_holes:
            raise ValueError(f"Invalid combination of min_holes and max_holes. Got: {[min_holes, max_holes]}")

        self.check_range(self.max_height)
        self.check_range(self.min_height)
        self.check_range(self.max_width)
        self.check_range(self.min_width)

        if not 0 < self.min_height <= self.max_height:
            raise ValueError(f"Invalid combination of min_height and max_height. Got: {[min_height, max_height]}")
        if not 0 < self.min_width <= self.max_width:
            raise ValueError(f"Invalid combination of min_width and max_width. Got: {[min_width, max_width]}")

    @staticmethod
    def check_range(dimension: ScalarType) -> None:
        if isinstance(dimension, float) and not 0 <= dimension < 1.0:
            raise ValueError(f"Invalid value {dimension}. If using floats, the value should be in the range [0.0, 1.0)")

    def apply(
        self,
        img: np.ndarray,
        fill_value: ScalarType = 0,
        holes: Iterable[Tuple[int, int, int, int]] = (),
        **params: Any,
    ) -> np.ndarray:
        return cutout(img, holes, fill_value)

    def apply_to_mask(
        self,
        mask: np.ndarray,
        mask_fill_value: ScalarType = 0,
        holes: Iterable[Tuple[int, int, int, int]] = (),
        **params: Any,
    ) -> np.ndarray:
        if mask_fill_value is None:
            return mask
        return cutout(mask, holes, mask_fill_value)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        img = params["image"]
        height, width = img.shape[:2]

        holes = []
        for _ in range(random.randint(self.min_holes, self.max_holes)):
            if all(
                [
                    isinstance(self.min_height, int),
                    isinstance(self.min_width, int),
                    isinstance(self.max_height, int),
                    isinstance(self.max_width, int),
                ]
            ):
                hole_height = random.randint(self.min_height, self.max_height)
                hole_width = random.randint(self.min_width, self.max_width)
            elif all(
                [
                    isinstance(self.min_height, float),
                    isinstance(self.min_width, float),
                    isinstance(self.max_height, float),
                    isinstance(self.max_width, float),
                ]
            ):
                hole_height = int(height * random.uniform(self.min_height, self.max_height))
                hole_width = int(width * random.uniform(self.min_width, self.max_width))
            else:
                msg = "Min width, max width, \
                    min height and max height \
                    should all either be ints or floats. \
                    Got: {} respectively".format(
                    [
                        type(self.min_width),
                        type(self.max_width),
                        type(self.min_height),
                        type(self.max_height),
                    ]
                )
                raise ValueError(msg)

            y1 = random.randint(0, height - hole_height)
            x1 = random.randint(0, width - hole_width)
            y2 = y1 + hole_height
            x2 = x1 + hole_width
            holes.append((x1, y1, x2, y2))

        return {"holes": holes}

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def apply_to_keypoints(
        self, keypoints: Sequence[KeypointType], holes: Iterable[Tuple[int, int, int, int]] = (), **params: Any
    ) -> List[KeypointType]:
        return [keypoint for keypoint in keypoints if not any(keypoint_in_hole(keypoint, hole) for hole in holes)]

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (
            "max_holes",
            "max_height",
            "max_width",
            "min_holes",
            "min_height",
            "min_width",
            "fill_value",
            "mask_fill_value",
        )
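
A minimal usage sketch (not part of the library source) showing CoarseDropout applied to an image together with a mask; because mask_fill_value is set, the holes are also burned into the mask. Shapes and parameter values are illustrative assumptions.

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
mask = np.ones((128, 128), dtype=np.uint8)

# Cut 4 holes of 16x16 pixels (min_* parameters default to the corresponding max_*)
aug = A.CoarseDropout(
    max_holes=4, max_height=16, max_width=16,
    fill_value=0, mask_fill_value=0, p=1.0,
)
out = aug(image=image, mask=mask)
image_aug, mask_aug = out["image"], out["mask"]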

grid_dropout

class GridDropout (ratio=0.5, unit_size_min=None, unit_size_max=None, holes_number_x=None, holes_number_y=None, shift_x=0, shift_y=0, random_offset=False, fill_value=0, mask_fill_value=None, always_apply=False, p=0.5) [view source on GitHub]

GridDropout, drops out rectangular regions of an image and the corresponding mask in a grid fashion.

Parameters:

Name Type Description
ratio float

the ratio of the mask holes to the unit_size (same for horizontal and vertical directions). Must be between 0 and 1. Default: 0.5.

unit_size_min int

minimum size of the grid unit. Must be between 2 and the image shorter edge. If 'None', holes_number_x and holes_number_y are used to setup the grid. Default: None.

unit_size_max int

maximum size of the grid unit. Must be between 2 and the image shorter edge. If 'None', holes_number_x and holes_number_y are used to setup the grid. Default: None.

holes_number_x int

the number of grid units in x direction. Must be between 1 and image width//2. If 'None', grid unit width is set as image_width//10. Default: None.

holes_number_y int

the number of grid units in y direction. Must be between 1 and image height//2. If None, grid unit height is set equal to the grid unit width or image height, whichever is smaller. Default: None.

shift_x int

offsets of the grid start in x direction from (0,0) coordinate. Clipped between 0 and grid unit_width - hole_width. Default: 0.

shift_y int

offsets of the grid start in y direction from (0,0) coordinate. Clipped between 0 and grid unit height - hole_height. Default: 0.

random_offset boolean

whether to offset the grid randomly between 0 and grid unit size - hole size. If True, the provided shift_x and shift_y are ignored and set randomly. Default: False.

fill_value int

value for the dropped pixels. Default: 0.

mask_fill_value int

value for the dropped pixels in mask. If None, transformation is not applied to the mask. Default: None.

Targets

image, mask

Image types: uint8, float32

Source code in albumentations/augmentations/dropout/grid_dropout.py
Python
class GridDropout(DualTransform):
    """GridDropout, drops out rectangular regions of an image and the corresponding mask in a grid fashion.

    Args:
        ratio: the ratio of the mask holes to the unit_size (same for horizontal and vertical directions).
            Must be between 0 and 1. Default: 0.5.
        unit_size_min (int): minimum size of the grid unit. Must be between 2 and the image shorter edge.
            If 'None', holes_number_x and holes_number_y are used to setup the grid. Default: `None`.
        unit_size_max (int): maximum size of the grid unit. Must be between 2 and the image shorter edge.
            If 'None', holes_number_x and holes_number_y are used to setup the grid. Default: `None`.
        holes_number_x (int): the number of grid units in x direction. Must be between 1 and image width//2.
            If 'None', grid unit width is set as image_width//10. Default: `None`.
        holes_number_y (int): the number of grid units in y direction. Must be between 1 and image height//2.
            If `None`, grid unit height is set equal to the grid unit width or image height, whichever is smaller.
        shift_x (int): offsets of the grid start in x direction from (0,0) coordinate.
            Clipped between 0 and grid unit_width - hole_width. Default: 0.
        shift_y (int): offsets of the grid start in y direction from (0,0) coordinate.
            Clipped between 0 and grid unit height - hole_height. Default: 0.
        random_offset (boolean): whether to offset the grid randomly between 0 and grid unit size - hole size.
            If `True`, the provided shift_x and shift_y are ignored and set randomly. Default: `False`.
        fill_value (int): value for the dropped pixels. Default: 0.
        mask_fill_value (int): value for the dropped pixels in mask.
            If `None`, transformation is not applied to the mask. Default: `None`.

    Targets:
        image, mask

    Image types:
        uint8, float32

    References:
        https://arxiv.org/abs/2001.04086

    """

    _targets = (Targets.IMAGE, Targets.MASK)

    def __init__(
        self,
        ratio: float = 0.5,
        unit_size_min: Optional[int] = None,
        unit_size_max: Optional[int] = None,
        holes_number_x: Optional[int] = None,
        holes_number_y: Optional[int] = None,
        shift_x: int = 0,
        shift_y: int = 0,
        random_offset: bool = False,
        fill_value: int = 0,
        mask_fill_value: Optional[ScalarType] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.ratio = ratio
        self.unit_size_min = unit_size_min
        self.unit_size_max = unit_size_max
        self.holes_number_x = holes_number_x
        self.holes_number_y = holes_number_y
        self.shift_x = shift_x
        self.shift_y = shift_y
        self.random_offset = random_offset
        self.fill_value = fill_value
        self.mask_fill_value = mask_fill_value
        if not 0 < self.ratio <= 1:
            msg = "ratio must be between 0 and 1."
            raise ValueError(msg)

    def apply(self, img: np.ndarray, holes: Iterable[Tuple[int, int, int, int]] = (), **params: Any) -> np.ndarray:
        return F.cutout(img, holes, self.fill_value)

    def apply_to_mask(
        self, mask: np.ndarray, holes: Iterable[Tuple[int, int, int, int]] = (), **params: Any
    ) -> np.ndarray:
        if self.mask_fill_value is None:
            return mask

        return F.cutout(mask, holes, self.mask_fill_value)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        img = params["image"]
        height, width = img.shape[:2]
        unit_width, unit_height = self._calculate_unit_dimensions(width, height)
        hole_width, hole_height = self._calculate_hole_dimensions(unit_width, unit_height)
        shift_x, shift_y = self._calculate_shifts(unit_width, unit_height, hole_width, hole_height)
        holes = self._generate_holes(width, height, unit_width, unit_height, hole_width, hole_height, shift_x, shift_y)
        return {"holes": holes}

    def _calculate_unit_dimensions(self, width: int, height: int) -> Tuple[int, int]:
        """Calculates the dimensions of the grid units."""
        if self.unit_size_min is not None and self.unit_size_max is not None:
            self._validate_unit_sizes(height, width)
            unit_size = random.randint(self.unit_size_min, self.unit_size_max)
            return unit_size, unit_size

        return self._calculate_dimensions_based_on_holes(width, height)

    def _validate_unit_sizes(self, height: int, width: int) -> None:
        """Validates the minimum and maximum unit sizes."""
        if self.unit_size_min is not None and self.unit_size_max is not None:
            if not TWO <= self.unit_size_min <= self.unit_size_max:
                msg = "Max unit size should be >= min size, both at least 2 pixels."
                raise ValueError(msg)
            if self.unit_size_max > min(height, width):
                msg = "Grid size limits must be within the shortest image edge."
                raise ValueError(msg)
        else:
            msg = "unit_size_min and unit_size_max must not be None."
            raise ValueError(msg)

    def _calculate_dimensions_based_on_holes(self, width: int, height: int) -> Tuple[int, int]:
        """Calculates dimensions based on the number of holes specified."""
        unit_width = self._calculate_dimension(width, self.holes_number_x, 10)
        unit_height = self._calculate_dimension(height, self.holes_number_y, unit_width)
        return unit_width, unit_height

    def _calculate_dimension(self, dimension: int, holes_number: Optional[int], fallback: int) -> int:
        """Helper function to calculate unit width or height."""
        if holes_number is None:
            return max(2, dimension // fallback)

        if not 1 <= holes_number <= dimension // 2:
            raise ValueError(f"The number of holes must be between 1 and {dimension // 2}.")
        return dimension // holes_number

    def _calculate_hole_dimensions(self, unit_width: int, unit_height: int) -> Tuple[int, int]:
        """Calculates the dimensions of the holes to be dropped out."""
        hole_width = int(unit_width * self.ratio)
        hole_height = int(unit_height * self.ratio)
        hole_width = min(max(hole_width, 1), unit_width - 1)
        hole_height = min(max(hole_height, 1), unit_height - 1)
        return hole_width, hole_height

    def _calculate_shifts(
        self, unit_width: int, unit_height: int, hole_width: int, hole_height: int
    ) -> Tuple[int, int]:
        """Calculates the shifts for the grid start."""
        if self.random_offset:
            shift_x = random.randint(0, unit_width - hole_width)
            shift_y = random.randint(0, unit_height - hole_height)
        else:
            shift_x = 0 if self.shift_x is None else min(max(0, self.shift_x), unit_width - hole_width)
            shift_y = 0 if self.shift_y is None else min(max(0, self.shift_y), unit_height - hole_height)
        return shift_x, shift_y

    def _generate_holes(
        self,
        width: int,
        height: int,
        unit_width: int,
        unit_height: int,
        hole_width: int,
        hole_height: int,
        shift_x: int,
        shift_y: int,
    ) -> List[Tuple[int, int, int, int]]:
        """Generates the list of holes to be dropped out."""
        holes = []
        for i in range(width // unit_width + 1):
            for j in range(height // unit_height + 1):
                x1 = min(shift_x + unit_width * i, width)
                y1 = min(shift_y + unit_height * j, height)
                x2 = min(x1 + hole_width, width)
                y2 = min(y1 + hole_height, height)
                holes.append((x1, y1, x2, y2))
        return holes

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (
            "ratio",
            "unit_size_min",
            "unit_size_max",
            "holes_number_x",
            "holes_number_y",
            "shift_x",
            "shift_y",
            "random_offset",
            "fill_value",
            "mask_fill_value",
        )
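
A minimal usage sketch (not part of the library source): dropping half of every 20x20 grid unit with a random grid offset. The array shape and parameter values are illustrative assumptions.

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

# Each grid unit is 20x20; ratio=0.5 drops a 10x10 hole per unit
aug = A.GridDropout(
    ratio=0.5, unit_size_min=20, unit_size_max=20,
    random_offset=True, fill_value=0, p=1.0,
)
dropped = aug(image=image)["image"]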

mask_dropout

class MaskDropout (max_objects=1, image_fill_value=0, mask_fill_value=0, always_apply=False, p=0.5) [view source on GitHub]

Image & mask augmentation that zero out mask and image regions corresponding to randomly chosen object instance from mask.

Mask must be single-channel image, zero values treated as background. Image can be any number of channels.

Inspired by https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/114254

Parameters:

Name Type Description
max_objects int

Maximum number of labels that can be zeroed out. Can be a tuple, in which case the range is [min, max].

image_fill_value Union[float, str]

Fill value to use when filling image. Can be 'inpaint' to apply inpainting (works only for 3-channel images)

mask_fill_value Union[int, float]

Fill value to use when filling mask.

Targets

image, mask

Image types: uint8, float32

Source code in albumentations/augmentations/dropout/mask_dropout.py
Python
class MaskDropout(DualTransform):
    """Image & mask augmentation that zero out mask and image regions corresponding
    to randomly chosen object instance from mask.

    Mask must be single-channel image, zero values treated as background.
    Image can be any number of channels.

    Inspired by https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/114254

    Args:
        max_objects: Maximum number of labels that can be zeroed out. Can be a tuple,
            in which case the range is [min, max].
        image_fill_value: Fill value to use when filling image.
            Can be 'inpaint' to apply inpainting (works only for 3-channel images).
        mask_fill_value: Fill value to use when filling mask.

    Targets:
        image, mask

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK)

    def __init__(
        self,
        max_objects: int = 1,
        image_fill_value: Union[float, str] = 0,
        mask_fill_value: ScalarType = 0,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.max_objects = to_tuple(max_objects, 1)
        self.image_fill_value = image_fill_value
        self.mask_fill_value = mask_fill_value

    @property
    def targets_as_params(self) -> List[str]:
        return ["mask"]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        mask = params["mask"]

        label_image, num_labels = label(mask, return_num=True)

        if num_labels == 0:
            dropout_mask = None
        else:
            objects_to_drop = random.randint(int(self.max_objects[0]), int(self.max_objects[1]))
            objects_to_drop = min(num_labels, objects_to_drop)

            if objects_to_drop == num_labels:
                dropout_mask = mask > 0
            else:
                labels_index = random.sample(range(1, num_labels + 1), objects_to_drop)
                dropout_mask = np.zeros((mask.shape[0], mask.shape[1]), dtype=bool)
                for label_index in labels_index:
                    dropout_mask |= label_image == label_index

        params.update({"dropout_mask": dropout_mask})
        del params["mask"]
        return params

    def apply(self, img: np.ndarray, dropout_mask: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
        if dropout_mask is None:
            return img

        if self.image_fill_value == "inpaint":
            dropout_mask = dropout_mask.astype(np.uint8)
            _, _, width, height = cv2.boundingRect(dropout_mask)
            radius = min(3, max(width, height) // 2)
            return cv2.inpaint(img, dropout_mask, radius, cv2.INPAINT_NS)

        img = img.copy()
        img[dropout_mask] = self.image_fill_value

        return img

    def apply_to_mask(self, mask: np.ndarray, dropout_mask: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
        if dropout_mask is None:
            return mask

        mask = mask.copy()
        mask[dropout_mask] = self.mask_fill_value
        return mask

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return "max_objects", "image_fill_value", "mask_fill_value"

xy_masking

class XYMasking (num_masks_x=0, num_masks_y=0, mask_x_length=0, mask_y_length=0, fill_value=0, mask_fill_value=0, always_apply=False, p=0.5) [view source on GitHub]

Applies masking strips to an image, either horizontally (X axis) or vertically (Y axis), simulating occlusions. This transform is useful for training models to recognize images with varied visibility conditions. It's particularly effective for spectrogram images, allowing time and frequency masking to improve model robustness.

At least one of mask_x_length or mask_y_length must be specified, dictating the mask's maximum size along each axis.

Parameters:

Name Type Description
num_masks_x Union[int, Tuple[int, int]]

Number or range of horizontal regions to mask. Defaults to 0.

num_masks_y Union[int, Tuple[int, int]]

Number or range of vertical regions to mask. Defaults to 0.

mask_x_length Union[int, Tuple[int, int]]

Specifies the length of the masks along the X (horizontal) axis. If an integer is provided, it sets a fixed mask length. If a tuple of two integers (min, max) is provided, the mask length is randomly chosen within this range for each mask. This allows for variable-length masks in the horizontal direction.

mask_y_length Union[int, Tuple[int, int]]

Specifies the height of the masks along the Y (vertical) axis. Similar to mask_x_length, an integer sets a fixed mask height, while a tuple (min, max) allows for variable-height masks, chosen randomly within the specified range for each mask. This flexibility facilitates creating masks of various sizes in the vertical direction.

fill_value Union[int, float, List[int], List[float]]

Value to fill image masks. Defaults to 0.

mask_fill_value Optional[Union[int, float, List[int], List[float]]]

Value to fill dropped regions in the mask target. If None, the mask is not affected. Default: 0.

p float

Probability of applying the transform. Defaults to 0.5.

Targets

image, mask, keypoints

Image types: uint8, float32

Note: Either mask_x_length or mask_y_length or both must be defined.

Source code in albumentations/augmentations/dropout/xy_masking.py
Python
class XYMasking(DualTransform):
    """Applies masking strips to an image, either horizontally (X axis) or vertically (Y axis),
    simulating occlusions. This transform is useful for training models to recognize images
    with varied visibility conditions. It's particularly effective for spectrogram images,
    allowing time and frequency masking to improve model robustness.

    At least one of `mask_x_length` or `mask_y_length` must be specified, dictating the mask's
    maximum size along each axis.

    Args:
        num_masks_x (Union[int, Tuple[int, int]]): Number or range of horizontal regions to mask. Defaults to 0.
        num_masks_y (Union[int, Tuple[int, int]]): Number or range of vertical regions to mask. Defaults to 0.
        mask_x_length (Union[int, Tuple[int, int]]): Specifies the length of the masks along
            the X (horizontal) axis. If an integer is provided, it sets a fixed mask length.
            If a tuple of two integers (min, max) is provided,
            the mask length is randomly chosen within this range for each mask.
            This allows for variable-length masks in the horizontal direction.
        mask_y_length (Union[int, Tuple[int, int]]): Specifies the height of the masks along
            the Y (vertical) axis. Similar to `mask_x_length`, an integer sets a fixed mask height,
            while a tuple (min, max) allows for variable-height masks, chosen randomly
            within the specified range for each mask. This flexibility facilitates creating masks of various
            sizes in the vertical direction.
        fill_value (Union[int, float, List[int], List[float]]): Value to fill image masks. Defaults to 0.
        mask_fill_value (Optional[Union[int, float, List[int], List[float]]]): Value to fill
            dropped regions in the mask target. If `None`, the mask is not affected. Default: 0.
        p (float): Probability of applying the transform. Defaults to 0.5.

    Targets:
        image, mask, keypoints

    Image types:
        uint8, float32

    Note: Either `mask_x_length` or `mask_y_length` or both must be defined.

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS)

    def __init__(
        self,
        num_masks_x: ScaleIntType = 0,
        num_masks_y: ScaleIntType = 0,
        mask_x_length: ScaleIntType = 0,
        mask_y_length: ScaleIntType = 0,
        fill_value: ColorType = 0,
        mask_fill_value: ColorType = 0,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        if (
            isinstance(mask_x_length, (int, float))
            and mask_x_length <= 0
            and isinstance(mask_y_length, (int, float))
            and mask_y_length <= 0
        ):
            msg = "At least one of `mask_x_length` or `mask_y_length` Should be a positive number."
            raise ValueError(msg)

        if isinstance(num_masks_x, int) and num_masks_x <= 0 and isinstance(num_masks_y, int) and num_masks_y <= 0:
            msg = (
                "At least one of `num_masks_x` or `num_masks_y` "
                "should be a positive number or tuple of two positive numbers."
            )
            raise ValueError(msg)

        if isinstance(num_masks_x, (tuple, list)) and min(num_masks_x) <= 0:
            msg = "All values in `num_masks_x` should be non negative integers."
            raise ValueError(msg)

        if isinstance(num_masks_y, (tuple, list)) and min(num_masks_y) <= 0:
            msg = "All values in `num_masks_y` should be non negative integers."
            raise ValueError(msg)

        self.num_masks_x = num_masks_x
        self.num_masks_y = num_masks_y

        self.mask_x_length = mask_x_length
        self.mask_y_length = mask_y_length
        self.fill_value = fill_value
        self.mask_fill_value = mask_fill_value

    def apply(
        self,
        img: np.ndarray,
        masks_x: List[Tuple[int, int, int, int]],
        masks_y: List[Tuple[int, int, int, int]],
        **params: Any,
    ) -> np.ndarray:
        return cutout(img, masks_x + masks_y, self.fill_value)

    def apply_to_mask(
        self,
        mask: np.ndarray,
        masks_x: List[Tuple[int, int, int, int]],
        masks_y: List[Tuple[int, int, int, int]],
        **params: Any,
    ) -> np.ndarray:
        if self.mask_fill_value is None:
            return mask
        return cutout(mask, masks_x + masks_y, self.mask_fill_value)

    def validate_mask_length(
        self, mask_length: Optional[ScaleIntType], dimension_size: int, dimension_name: str
    ) -> None:
        """Validate the mask length against the corresponding image dimension size.

        Args:
            mask_length (Optional[Union[int, Tuple[int, int]]]): The length of the mask to be validated.
            dimension_size (int): The size of the image dimension (width or height)
                against which to validate the mask length.
            dimension_name (str): The name of the dimension ('width' or 'height') for error messaging.

        """
        if mask_length is not None:
            if isinstance(mask_length, (tuple, list)):
                if mask_length[0] < 0 or mask_length[1] > dimension_size:
                    raise ValueError(
                        f"{dimension_name} range {mask_length} is out of valid range [0, {dimension_size}]"
                    )
            elif mask_length < 0 or mask_length > dimension_size:
                raise ValueError(f"{dimension_name} {mask_length} exceeds image {dimension_name} {dimension_size}")

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, List[Tuple[int, int, int, int]]]:
        img = params["image"]
        height, width = img.shape[:2]

        # Use the helper method to validate mask lengths against image dimensions
        self.validate_mask_length(self.mask_x_length, width, "mask_x_length")
        self.validate_mask_length(self.mask_y_length, height, "mask_y_length")

        masks_x = self.generate_masks(self.num_masks_x, width, height, self.mask_x_length, axis="x")
        masks_y = self.generate_masks(self.num_masks_y, width, height, self.mask_y_length, axis="y")

        return {"masks_x": masks_x, "masks_y": masks_y}

    @staticmethod
    def generate_mask_size(mask_length: ScaleIntType) -> int:
        if isinstance(mask_length, int):
            return mask_length  # Use fixed size or adjust to dimension size

        return random.randint(min(mask_length), max(mask_length))

    def generate_masks(
        self,
        num_masks: ScaleIntType,
        width: int,
        height: int,
        max_length: Optional[ScaleIntType],
        axis: str,
    ) -> List[Tuple[int, int, int, int]]:
        if max_length is None or max_length == 0 or isinstance(num_masks, (int, float)) and num_masks == 0:
            return []

        masks = []

        num_masks_integer = num_masks if isinstance(num_masks, int) else random.randint(num_masks[0], num_masks[1])

        for _ in range(num_masks_integer):
            length = self.generate_mask_size(max_length)

            if axis == "x":
                x1 = random.randint(0, width - length)
                y1 = 0
                x2, y2 = x1 + length, height
            else:  # axis == 'y'
                y1 = random.randint(0, height - length)
                x1 = 0
                x2, y2 = width, y1 + length

            masks.append((x1, y1, x2, y2))
        return masks

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def apply_to_keypoints(
        self,
        keypoints: Sequence[KeypointType],
        masks_x: List[Tuple[int, int, int, int]],
        masks_y: List[Tuple[int, int, int, int]],
        **params: Any,
    ) -> List[KeypointType]:
        return [
            keypoint
            for keypoint in keypoints
            if not any(keypoint_in_hole(keypoint, hole) for hole in masks_x + masks_y)
        ]

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (
            "num_masks_x",
            "num_masks_y",
            "mask_x_length",
            "mask_y_length",
            "fill_value",
            "mask_fill_value",
        )
validate_mask_length (self, mask_length, dimension_size, dimension_name)

Validate the mask length against the corresponding image dimension size.

Parameters:

Name Type Description
mask_length Optional[Union[int, Tuple[int, int]]]

The length of the mask to be validated.

dimension_size int

The size of the image dimension (width or height) against which to validate the mask length.

dimension_name str

The name of the dimension ('width' or 'height') for error messaging.

Source code in albumentations/augmentations/dropout/xy_masking.py
Python
def validate_mask_length(
    self, mask_length: Optional[ScaleIntType], dimension_size: int, dimension_name: str
) -> None:
    """Validate the mask length against the corresponding image dimension size.

    Args:
        mask_length (Optional[Union[int, Tuple[int, int]]]): The length of the mask to be validated.
        dimension_size (int): The size of the image dimension (width or height)
            against which to validate the mask length.
        dimension_name (str): The name of the dimension ('width' or 'height') for error messaging.

    """
    if mask_length is not None:
        if isinstance(mask_length, (tuple, list)):
            if mask_length[0] < 0 or mask_length[1] > dimension_size:
                raise ValueError(
                    f"{dimension_name} range {mask_length} is out of valid range [0, {dimension_size}]"
                )
        elif mask_length < 0 or mask_length > dimension_size:
            raise ValueError(f"{dimension_name} {mask_length} exceeds image {dimension_name} {dimension_size}")

functional

def add_fog (img, fog_coef, alpha_coef, haze_list) [view source on GitHub]

Add fog to the image.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
img ndarray

Image.

fog_coef float

Fog coefficient.

alpha_coef float

Alpha coefficient.

haze_list List[Tuple[int, int]]

Coordinates of the haze points around which fog circles are drawn.

Returns:

Type Description
ndarray

Image.

Source code in albumentations/augmentations/functional.py
Python
@preserve_shape
def add_fog(img: np.ndarray, fog_coef: float, alpha_coef: float, haze_list: List[Tuple[int, int]]) -> np.ndarray:
    """Add fog to the image.

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        img: Image.
        fog_coef: Fog coefficient.
        alpha_coef: Alpha coefficient.
        haze_list: Coordinates of the haze points around which fog circles are drawn.

    Returns:
        Image.

    """
    non_rgb_warning(img)

    input_dtype = img.dtype
    needs_float = False

    if input_dtype == np.float32:
        img = from_float(img, dtype=np.dtype("uint8"))
        needs_float = True
    elif input_dtype not in (np.uint8, np.float32):
        raise ValueError(f"Unexpected dtype {input_dtype} for RandomFog augmentation")

    width = img.shape[1]

    hw = max(int(width // 3 * fog_coef), 10)

    for haze_points in haze_list:
        x, y = haze_points
        overlay = img.copy()
        output = img.copy()
        alpha = alpha_coef * fog_coef
        rad = hw // 2
        point = (x + hw // 2, y + hw // 2)
        cv2.circle(overlay, point, int(rad), (255, 255, 255), -1)
        cv2.addWeighted(overlay, alpha, output, 1 - alpha, 0, output)

        img = output.copy()

    image_rgb = cv2.blur(img, (hw // 10, hw // 10))

    if needs_float:
        image_rgb = to_float(image_rgb, max_value=255)

    return image_rgb
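
A minimal sketch of a direct call (assumptions: the function is imported from albumentations.augmentations.functional as the path above indicates, and the haze points are hand-picked for illustration).

Python
import numpy as np
from albumentations.augmentations.functional import add_fog

image = np.random.randint(0, 256, (200, 300, 3), dtype=np.uint8)

# Two hand-picked (x, y) haze points; fog circles are drawn around them
foggy = add_fog(image, fog_coef=0.5, alpha_coef=0.08, haze_list=[(50, 40), (150, 120)])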

def add_gravel (img, gravels) [view source on GitHub]

Add gravel to the image.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
img numpy.ndarray

image to add gravel to

gravels list

list of gravel parameters. (float, float, float, float): (top-left x, top-left y, bottom-right x, bottom-right y)

Returns:

Type Description
numpy.ndarray
Source code in albumentations/augmentations/functional.py
Python
@ensure_contiguous
@preserve_shape
def add_gravel(img: np.ndarray, gravels: List[Any]) -> np.ndarray:
    """Add gravel to the image.

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        img (numpy.ndarray): image to add gravel to
        gravels (list): list of gravel parameters. (float, float, float, float):
            (top-left x, top-left y, bottom-right x, bottom-right y)

    Returns:
        numpy.ndarray:

    """
    non_rgb_warning(img)
    input_dtype = img.dtype
    needs_float = False

    if input_dtype == np.float32:
        img = from_float(img, dtype=np.dtype("uint8"))
        needs_float = True
    elif input_dtype not in (np.uint8, np.float32):
        raise ValueError(f"Unexpected dtype {input_dtype} for AddGravel augmentation")

    image_hls = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)

    for gravel in gravels:
        y1, y2, x1, x2, sat = gravel
        image_hls[x1:x2, y1:y2, 1] = sat

    image_rgb = cv2.cvtColor(image_hls, cv2.COLOR_HLS2RGB)

    if needs_float:
        image_rgb = to_float(image_rgb, max_value=255)

    return image_rgb

def add_rain (img, slant, drop_length, drop_width, drop_color, blur_value, brightness_coefficient, rain_drops) [view source on GitHub]

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
img ndarray

Image.

slant int
drop_length int
drop_width int
drop_color Tuple[int, int, int]
blur_value int

Rainy views are blurry.

brightness_coefficient float

Rainy days are usually shady.

rain_drops List[Tuple[int, int]]

Returns:

Type Description
ndarray

Image

Source code in albumentations/augmentations/functional.py
Python
@preserve_shape
def add_rain(
    img: np.ndarray,
    slant: int,
    drop_length: int,
    drop_width: int,
    drop_color: Tuple[int, int, int],
    blur_value: int,
    brightness_coefficient: float,
    rain_drops: List[Tuple[int, int]],
) -> np.ndarray:
    """From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        img: Image.
        slant:
        drop_length:
        drop_width:
        drop_color:
        blur_value: Rainy views are blurry.
        brightness_coefficient: Rainy days are usually shady.
        rain_drops:

    Returns:
        Image

    """
    non_rgb_warning(img)

    input_dtype = img.dtype
    needs_float = False

    if input_dtype == np.float32:
        img = from_float(img, dtype=np.dtype("uint8"))
        needs_float = True
    elif input_dtype not in (np.uint8, np.float32):
        raise ValueError(f"Unexpected dtype {input_dtype} for RandomRain augmentation")

    image = img.copy()

    for rain_drop_x0, rain_drop_y0 in rain_drops:
        rain_drop_x1 = rain_drop_x0 + slant
        rain_drop_y1 = rain_drop_y0 + drop_length

        cv2.line(
            image,
            (rain_drop_x0, rain_drop_y0),
            (rain_drop_x1, rain_drop_y1),
            drop_color,
            drop_width,
        )

    image = cv2.blur(image, (blur_value, blur_value))  # rainy views are blurry
    image_hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float32)
    image_hsv[:, :, 2] *= brightness_coefficient

    image_rgb = cv2.cvtColor(image_hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)

    if needs_float:
        return to_float(image_rgb, max_value=255)

    return image_rgb

def add_shadow (img, vertices_list) [view source on GitHub]

Add shadows to the image.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
img numpy.ndarray
vertices_list list

Returns:

Type Description
numpy.ndarray
Source code in albumentations/augmentations/functional.py
Python
@ensure_contiguous
@preserve_shape
def add_shadow(img: np.ndarray, vertices_list: List[List[Tuple[int, int]]]) -> np.ndarray:
    """Add shadows to the image.

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        img (numpy.ndarray):
        vertices_list (list):

    Returns:
        numpy.ndarray:

    """
    non_rgb_warning(img)
    input_dtype = img.dtype
    needs_float = False

    if input_dtype == np.float32:
        img = from_float(img, dtype=np.dtype("uint8"))
        needs_float = True
    elif input_dtype not in (np.uint8, np.float32):
        raise ValueError(f"Unexpected dtype {input_dtype} for RandomShadow augmentation")

    image_hls = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)
    mask = np.zeros_like(img)

    # adding all shadow polygons on empty mask, single 255 denotes only red channel
    for vertices in vertices_list:
        cv2.fillPoly(mask, vertices, 255)

    # if red channel is hot, image's "Lightness" channel's brightness is lowered
    red_max_value_ind = mask[:, :, 0] == MAX_VALUES_BY_DTYPE[np.dtype("uint8")]
    image_hls[:, :, 1][red_max_value_ind] = image_hls[:, :, 1][red_max_value_ind] * 0.5

    image_rgb = cv2.cvtColor(image_hls, cv2.COLOR_HLS2RGB)

    if needs_float:
        return to_float(image_rgb, max_value=255)

    return image_rgb

def add_snow (img, snow_point, brightness_coeff) [view source on GitHub]

Bleaches out pixels, imitating snow.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
img ndarray

Image.

snow_point float

Number of snow points.

brightness_coeff float

Brightness coefficient.

Returns:

Type Description
ndarray

Image.

Source code in albumentations/augmentations/functional.py
Python
@preserve_shape
def add_snow(img: np.ndarray, snow_point: float, brightness_coeff: float) -> np.ndarray:
    """Bleaches out pixels, imitation snow.

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        img: Image.
        snow_point: Number of snow points.
        brightness_coeff: Brightness coefficient.

    Returns:
        Image.

    """
    non_rgb_warning(img)

    input_dtype = img.dtype
    needs_float = False

    snow_point *= 127.5  # = 255 / 2
    snow_point += 85  # = 255 / 3

    if input_dtype == np.float32:
        img = from_float(img, dtype=np.dtype("uint8"))
        needs_float = True
    elif input_dtype not in (np.uint8, np.float32):
        raise ValueError(f"Unexpected dtype {input_dtype} for RandomSnow augmentation")

    image_hls = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)
    image_hls = np.array(image_hls, dtype=np.float32)

    image_hls[:, :, 1][image_hls[:, :, 1] < snow_point] *= brightness_coeff

    image_hls[:, :, 1] = clip(image_hls[:, :, 1], np.uint8, 255)

    image_hls = np.array(image_hls, dtype=np.uint8)

    image_rgb = cv2.cvtColor(image_hls, cv2.COLOR_HLS2RGB)

    if needs_float:
        image_rgb = to_float(image_rgb, max_value=255)

    return image_rgb

def add_sun_flare (img, flare_center_x, flare_center_y, src_radius, src_color, circles) [view source on GitHub]

Add sun flare.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
img numpy.ndarray
flare_center_x float
flare_center_y float
src_radius int
src_color int, int, int
circles list

Returns:

Type Description
numpy.ndarray
Source code in albumentations/augmentations/functional.py
Python
@preserve_shape
def add_sun_flare(
    img: np.ndarray,
    flare_center_x: float,
    flare_center_y: float,
    src_radius: int,
    src_color: ColorType,
    circles: List[Any],
) -> np.ndarray:
    """Add sun flare.

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        img (numpy.ndarray):
        flare_center_x (float):
        flare_center_y (float):
        src_radius:
        src_color (int, int, int):
        circles (list):

    Returns:
        numpy.ndarray:

    """
    non_rgb_warning(img)

    input_dtype = img.dtype
    needs_float = False

    if input_dtype == np.float32:
        img = from_float(img, dtype=np.dtype("uint8"))
        needs_float = True
    elif input_dtype not in (np.uint8, np.float32):
        raise ValueError(f"Unexpected dtype {input_dtype} for RandomSunFlareaugmentation")

    overlay = img.copy()
    output = img.copy()

    for alpha, (x, y), rad3, (r_color, g_color, b_color) in circles:
        cv2.circle(overlay, (x, y), rad3, (r_color, g_color, b_color), -1)

        cv2.addWeighted(overlay, alpha, output, 1 - alpha, 0, output)

    point = (int(flare_center_x), int(flare_center_y))

    overlay = output.copy()
    num_times = src_radius // 10
    alpha = np.linspace(0.0, 1, num=num_times)
    rad = np.linspace(1, src_radius, num=num_times)
    for i in range(num_times):
        cv2.circle(overlay, point, int(rad[i]), src_color, -1)
        alp = alpha[num_times - i - 1] * alpha[num_times - i - 1] * alpha[num_times - i - 1]
        cv2.addWeighted(overlay, alp, output, 1 - alp, 0, output)

    image_rgb = output

    if needs_float:
        image_rgb = to_float(image_rgb, max_value=255)

    return image_rgb

def bbox_from_mask (mask) [view source on GitHub]

Create bounding box from binary mask (fast version)

Parameters:

Name Type Description
mask numpy.ndarray

binary mask.

Returns:

Type Description
tuple

A bounding box tuple (x_min, y_min, x_max, y_max).

Source code in albumentations/augmentations/functional.py
Python
def bbox_from_mask(mask: np.ndarray) -> Tuple[int, int, int, int]:
    """Create bounding box from binary mask (fast version)

    Args:
        mask (numpy.ndarray): binary mask.

    Returns:
        tuple: A bounding box tuple `(x_min, y_min, x_max, y_max)`.

    """
    rows = np.any(mask, axis=1)
    if not rows.any():
        return -1, -1, -1, -1
    cols = np.any(mask, axis=0)
    y_min, y_max = np.where(rows)[0][[0, -1]]
    x_min, x_max = np.where(cols)[0][[0, -1]]
    return x_min, y_min, x_max + 1, y_max + 1
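
A quick worked example (a sketch, not from the original docs): the returned x_max and y_max are exclusive, i.e. one past the last foreground column/row.

Python
import numpy as np
from albumentations.augmentations.functional import bbox_from_mask

mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:5, 3:7] = 1  # rows 2-4, columns 3-6 are foreground

print(bbox_from_mask(mask))  # (3, 2, 7, 5) as (x_min, y_min, x_max, y_max)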

def fancy_pca (img, alpha=0.1) [view source on GitHub]

Perform 'Fancy PCA' augmentation from: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Parameters:

Name Type Description
img ndarray

numpy array with (h, w, rgb) shape, as ints between 0-255

alpha float

how much to perturb/scale the eigenvectors and eigenvalues; the paper used std=0.1.

Returns:

Type Description
ndarray

numpy image-like array as uint8 range(0, 255)

Source code in albumentations/augmentations/functional.py
Python
def fancy_pca(img: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Perform 'Fancy PCA' augmentation from:
    http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

    Args:
        img: numpy array with (h, w, rgb) shape, as ints between 0-255
        alpha: how much to perturb/scale the eigenvectors and eigenvalues;
                the paper used std=0.1

    Returns:
        numpy image-like array as uint8 range(0, 255)

    """
    if not is_rgb_image(img) or img.dtype != np.uint8:
        msg = "Image must be RGB image in uint8 format."
        raise TypeError(msg)

    orig_img = img.astype(float).copy()

    img = img / 255.0  # rescale to 0 to 1 range

    # flatten image to columns of RGB
    img_rs = img.reshape(-1, 3)
    # img_rs shape (640000, 3)

    # center mean
    img_centered = img_rs - np.mean(img_rs, axis=0)

    # paper says 3x3 covariance matrix
    img_cov = np.cov(img_centered, rowvar=False)

    # eigen values and eigen vectors
    eig_vals, eig_vecs = np.linalg.eigh(img_cov)

    # sort values and vector
    sort_perm = eig_vals[::-1].argsort()
    eig_vals[::-1].sort()
    eig_vecs = eig_vecs[:, sort_perm]

    # > get [p1, p2, p3]
    m1 = np.column_stack(eig_vecs)

    # get 3x1 matrix of eigen values multiplied by random variable draw from normal
    # distribution with mean of 0 and standard deviation of 0.1
    m2 = np.zeros((3, 1))
    # according to the paper alpha should only be draw once per augmentation (not once per channel)
    # > alpha = np.random.normal(0, alpha_std)

    # broad cast to speed things up
    m2[:, 0] = alpha * eig_vals[:]

    # this is the vector that we're going to add to each pixel in a moment
    add_vect = np.array(m1) @ np.array(m2)

    for idx in range(3):  # RGB
        orig_img[..., idx] += add_vect[idx] * 255

    # for image processing it was found that working with float 0.0 to 1.0
    # was easier than integers between 0-255
    # > orig_img /= 255.0
    orig_img = np.clip(orig_img, 0.0, 255.0)

    # > orig_img *= 255
    return orig_img.astype(np.uint8)
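
A minimal sketch of a direct call, assuming the import path shown above. The paper draws alpha from N(0, 0.1) once per image; here it is fixed for illustration.

Python
import numpy as np
from albumentations.augmentations.functional import fancy_pca

image = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)

# Shift each pixel along the RGB principal components, scaled by alpha
shifted = fancy_pca(image, alpha=0.1)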

def iso_noise (image, color_shift=0.05, intensity=0.5, random_state=None, ** kwargs) [view source on GitHub]

Apply Poisson noise to the image to simulate camera sensor noise.

Parameters:

Name Type Description
image numpy.ndarray

Input image, currently, only RGB, uint8 images are supported.

color_shift float

Standard deviation factor for the random hue shift applied in HLS space.

intensity float

Multiplication factor for noise values. Values of ~0.5 produce a noticeable, yet acceptable level of noise.

random_state Optional[int]
**kwargs Any

Returns:

Type Description
numpy.ndarray

Noised image

Source code in albumentations/augmentations/functional.py
Python
@clipped
def iso_noise(
    image: np.ndarray,
    color_shift: float = 0.05,
    intensity: float = 0.5,
    random_state: Optional[int] = None,
    **kwargs: Any,
) -> np.ndarray:
    """Apply poisson noise to image to simulate camera sensor noise.

    Args:
        image (numpy.ndarray): Input image, currently, only RGB, uint8 images are supported.
        color_shift (float): Standard deviation factor for the random hue shift applied in HLS space.
        intensity (float): Multiplication factor for noise values. Values of ~0.5 produce a noticeable,
                   yet acceptable level of noise.
        random_state:
        **kwargs:

    Returns:
        numpy.ndarray: Noised image

    """
    if image.dtype != np.uint8:
        msg = "Image must have uint8 channel type"
        raise TypeError(msg)
    if not is_rgb_image(image):
        msg = "Image must be RGB"
        raise TypeError(msg)

    one_over_255 = float(1.0 / 255.0)
    image = np.multiply(image, one_over_255, dtype=np.float32)
    hls = cv2.cvtColor(image, cv2.COLOR_RGB2HLS)
    _, stddev = cv2.meanStdDev(hls)

    luminance_noise = random_utils.poisson(stddev[1] * intensity * 255, size=hls.shape[:2], random_state=random_state)
    color_noise = random_utils.normal(0, color_shift * 360 * intensity, size=hls.shape[:2], random_state=random_state)

    hue = hls[..., 0]
    hue += color_noise
    hue %= 360

    luminance = hls[..., 1]
    luminance += (luminance_noise / 255) * (1.0 - luminance)

    image = cv2.cvtColor(hls, cv2.COLOR_HLS2RGB) * 255
    return image.astype(np.uint8)
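
A minimal sketch of a direct call under the same assumptions (RGB uint8 input, import path as above): Poisson luminance noise plus Gaussian hue noise, both applied in HLS space.

Python
import numpy as np
from albumentations.augmentations.functional import iso_noise

image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)

noisy = iso_noise(image, color_shift=0.05, intensity=0.5)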

def mask_from_bbox (img, bbox) [view source on GitHub]

Create binary mask from bounding box

Parameters:

Name Type Description
img ndarray

input image

bbox Tuple[int, int, int, int]

A bounding box tuple (x_min, y_min, x_max, y_max)

Returns:

Type Description
mask

binary mask

Source code in albumentations/augmentations/functional.py
Python
def mask_from_bbox(img: np.ndarray, bbox: Tuple[int, int, int, int]) -> np.ndarray:
    """Create binary mask from bounding box

    Args:
        img: input image
        bbox: A bounding box tuple `(x_min, y_min, x_max, y_max)`

    Returns:
        mask: binary mask

    """
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    x_min, y_min, x_max, y_max = bbox
    mask[y_min:y_max, x_min:x_max] = 1
    return mask
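
A quick worked example (a sketch): the bbox coordinates are pixel indices with exclusive x_max/y_max, so the mask below has 4 x 3 = 12 foreground pixels.

Python
import numpy as np
from albumentations.augmentations.functional import mask_from_bbox

img = np.zeros((10, 10, 3), dtype=np.uint8)
mask = mask_from_bbox(img, bbox=(3, 2, 7, 5))

assert mask.shape == (10, 10)
assert mask.sum() == (7 - 3) * (5 - 2)  # 12 pixels set to 1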

def move_tone_curve (img, low_y, high_y) [view source on GitHub]

Rescales the relationship between bright and dark areas of the image by manipulating its tone curve.

Parameters:

Name Type Description
img ndarray

RGB or grayscale image.

low_y float

y-position of a Bezier control point used to adjust the tone curve, must be in range [0, 1]

high_y float

y-position of a Bezier control point used to adjust image tone curve, must be in range [0, 1]

Source code in albumentations/augmentations/functional.py
Python
@preserve_shape
def move_tone_curve(img: np.ndarray, low_y: float, high_y: float) -> np.ndarray:
    """Rescales the relationship between bright and dark areas of the image by manipulating its tone curve.

    Args:
        img: RGB or grayscale image.
        low_y: y-position of a Bezier control point used
            to adjust the tone curve, must be in range [0, 1]
        high_y: y-position of a Bezier control point used
            to adjust image tone curve, must be in range [0, 1]

    """
    input_dtype = img.dtype

    if not 0 <= low_y <= 1:
        msg = "low_shift must be in range [0, 1]"
        raise ValueError(msg)
    if not 0 <= high_y <= 1:
        msg = "high_shift must be in range [0, 1]"
        raise ValueError(msg)

    if input_dtype != np.uint8:
        raise ValueError(f"Unsupported image type {input_dtype}")

    t = np.linspace(0.0, 1.0, 256)

    # Defines response of a four-point Bezier curve
    def evaluate_bez(t: np.ndarray) -> np.ndarray:
        return 3 * (1 - t) ** 2 * t * low_y + 3 * (1 - t) * t**2 * high_y + t**3

    evaluate_bez = np.vectorize(evaluate_bez)
    remapping = np.rint(evaluate_bez(t) * 255).astype(np.uint8)

    lut_fn = _maybe_process_in_chunks(cv2.LUT, lut=remapping)
    return lut_fn(img)
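
A minimal sketch of a direct call (uint8 input required, as validated above); the parameter values are illustrative.

Python
import numpy as np
from albumentations.augmentations.functional import move_tone_curve

image = np.random.randint(0, 256, (50, 50, 3), dtype=np.uint8)

# low_y/high_y are the y-positions of the two inner Bezier control points in [0, 1]
adjusted = move_tone_curve(image, low_y=0.25, high_y=0.75)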

def multiply (img, multiplier) [view source on GitHub]

Parameters:

Name Type Description
img ndarray

Image.

multiplier ndarray

Multiplier coefficient.

Returns:

Type Description
ndarray

Image multiplied by multiplier coefficient.

Source code in albumentations/augmentations/functional.py
Python
def multiply(img: np.ndarray, multiplier: np.ndarray) -> np.ndarray:
    """Args:

        img: Image.
        multiplier: Multiplier coefficient.

    Returns:
        Image multiplied by `multiplier` coefficient.

    """
    if img.dtype == np.uint8:
        if len(multiplier.shape) == 1:
            return _multiply_uint8_optimized(img, multiplier)

        return _multiply_uint8(img, multiplier)

    return _multiply_non_uint8(img, multiplier)

def posterize (img, bits) [view source on GitHub]

Reduce the number of bits for each color channel.

Parameters:

Name Type Description
img ndarray

image to posterize.

bits int

number of high bits. Must be in range [0, 8]

Returns:

Type Description
ndarray

Image with reduced color channels.

Source code in albumentations/augmentations/functional.py
Python
@preserve_shape
def posterize(img: np.ndarray, bits: int) -> np.ndarray:
    """Reduce the number of bits for each color channel.

    Args:
        img: image to posterize.
        bits: number of high bits. Must be in range [0, 8]

    Returns:
        Image with reduced color channels.

    """
    bits_array = np.uint8(bits)

    if img.dtype != np.uint8:
        msg = "Image must have uint8 channel type"
        raise TypeError(msg)
    if np.any((bits_array < 0) | (bits_array > EIGHT)):
        msg = "bits must be in range [0, 8]"
        raise ValueError(msg)

    if not bits_array.shape or len(bits_array) == 1:
        if bits_array == 0:
            return np.zeros_like(img)
        if bits_array == EIGHT:
            return img.copy()

        lut = np.arange(0, 256, dtype=np.uint8)
        mask = ~np.uint8(2 ** (8 - bits_array) - 1)
        lut &= mask

        return cv2.LUT(img, lut)

    if not is_rgb_image(img):
        msg = "If bits is iterable image must be RGB"
        raise TypeError(msg)

    result_img = np.empty_like(img)
    for i, channel_bits in enumerate(bits_array):
        if channel_bits == 0:
            result_img[..., i] = np.zeros_like(img[..., i])
        elif channel_bits == EIGHT:
            result_img[..., i] = img[..., i].copy()
        else:
            lut = np.arange(0, 256, dtype=np.uint8)
            mask = ~np.uint8(2 ** (8 - channel_bits) - 1)
            lut &= mask

            result_img[..., i] = cv2.LUT(img[..., i], lut)

    return result_img
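
A quick check (a sketch) that posterize with an integer bits value keeps only the high bits of each channel, matching the LUT construction above.

Python
import numpy as np
from albumentations.augmentations.functional import posterize

image = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)

poster = posterize(image, bits=4)  # keep the 4 high bits
assert np.array_equal(poster, image & np.uint8(0b11110000))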

def solarize (img, threshold=128) [view source on GitHub]

Invert all pixel values above a threshold.

Parameters:

Name Type Description
img ndarray

The image to solarize.

threshold int

All pixels above this grayscale level are inverted.

Returns:

Type Description
ndarray

Solarized image.

Source code in albumentations/augmentations/functional.py
Python
def solarize(img: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Invert all pixel values above a threshold.

    Args:
        img: The image to solarize.
        threshold: All pixels above this grayscale level are inverted.

    Returns:
        Solarized image.

    """
    dtype = img.dtype
    max_val = MAX_VALUES_BY_DTYPE[dtype]

    if dtype == np.dtype("uint8"):
        lut = [(i if i < threshold else max_val - i) for i in range(int(max_val) + 1)]

        prev_shape = img.shape
        img = cv2.LUT(img, np.array(lut, dtype=dtype))

        if len(prev_shape) != len(img.shape):
            img = np.expand_dims(img, -1)
        return img

    result_img = img.copy()
    cond = img >= threshold
    result_img[cond] = max_val - result_img[cond]
    return result_img
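
A quick worked example (a sketch) on a grayscale ramp: values below the threshold pass through, values at or above it are inverted.

Python
import numpy as np
from albumentations.augmentations.functional import solarize

image = np.arange(256, dtype=np.uint8).reshape(16, 16)

out = solarize(image, threshold=128)
assert out[7, 15] == 127  # 127 < 128: unchanged
assert out[15, 15] == 0   # 255 >= 128: inverted to 255 - 255 = 0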

def split_uniform_grid (image_shape, grid) [view source on GitHub]

Splits an image shape into a uniform grid specified by the grid dimensions.

Parameters:

Name Type Description
image_shape Tuple[int, int]

The shape of the image as (height, width).

grid Tuple[int, int]

The grid size as (rows, columns).

Returns:

Type Description
np.ndarray

An array containing the tiles' coordinates in the format (start_y, start_x, end_y, end_x).

Source code in albumentations/augmentations/functional.py
Python
def split_uniform_grid(image_shape: Tuple[int, int], grid: Tuple[int, int]) -> np.ndarray:
    """Splits an image shape into a uniform grid specified by the grid dimensions.

    Args:
        image_shape (Tuple[int, int]): The shape of the image as (height, width).
        grid (Tuple[int, int]): The grid size as (rows, columns).

    Returns:
        np.ndarray: An array containing the tiles' coordinates in the format (start_y, start_x, end_y, end_x).
    """
    height, width = image_shape
    n_rows, n_cols = grid

    # Compute split points for the grid
    height_splits = np.linspace(0, height, n_rows + 1, dtype=int)
    width_splits = np.linspace(0, width, n_cols + 1, dtype=int)

    # Calculate tiles coordinates
    tiles = [
        (height_splits[i], width_splits[j], height_splits[i + 1], width_splits[j + 1])
        for i in range(n_rows)
        for j in range(n_cols)
    ]

    return np.array(tiles)
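
A quick sketch of the output format (same assumed import path): a (100, 60) image shape split into a 2x2 grid yields four (start_y, start_x, end_y, end_x) rows.

Python
from albumentations.augmentations.functional import split_uniform_grid

split_uniform_grid((100, 60), (2, 2))
# array([[  0,   0,  50,  30],
#        [  0,  30,  50,  60],
#        [ 50,   0, 100,  30],
#        [ 50,  30, 100,  60]])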

def swap_tiles_on_image (image, tiles) [view source on GitHub]

Swap tiles on the image according to the provided tile coordinates.

Parameters:

Name Type Description
image ndarray

Input image.

tiles ndarray

Array of tiles with each tile as [start_y, start_x, end_y, end_x].

Returns:

Type Description
np.ndarray

Output image with tiles swapped according to the random shuffle.

Source code in albumentations/augmentations/functional.py
Python
def swap_tiles_on_image(image: np.ndarray, tiles: np.ndarray) -> np.ndarray:
    """Swap tiles on the image according to the new format.

    Args:
        image: Input image.
        tiles: Array of tiles with each tile as [start_y, start_x, end_y, end_x].

    Returns:
        np.ndarray: Output image with tiles swapped according to the random shuffle.
    """
    # If no tiles are provided, return a copy of the original image
    if tiles.size == 0:
        return image.copy()

    # Create a copy of the image to retain original for reference
    new_image = np.empty_like(image)
    for start_y, start_x, end_y, end_x in tiles:
        # Assign the corresponding tile from the original image to the new image
        new_image[start_y:end_y, start_x:end_x] = image[start_y:end_y, start_x:end_x]

    return new_image

geometric special

functional

def bbox_flip (bbox, d, rows, cols) [view source on GitHub]

Flip a bounding box either vertically, horizontally or both depending on the value of d.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

d int

dimension. 0 for vertical flip, 1 for horizontal flip, -1 for both vertical and horizontal flip.

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

Exceptions:

Type Description
ValueError

if value of d is not -1, 0 or 1.

Source code in albumentations/augmentations/geometric/functional.py
Python
def bbox_flip(bbox: BoxInternalType, d: int, rows: int, cols: int) -> BoxInternalType:
    """Flip a bounding box either vertically, horizontally or both depending on the value of `d`.

    Args:
        bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
        d: dimension. 0 for vertical flip, 1 for horizontal, -1 for transpose
        rows: Image rows.
        cols: Image cols.

    Returns:
        A bounding box `(x_min, y_min, x_max, y_max)`.

    Raises:
        ValueError: if value of `d` is not -1, 0 or 1.

    """
    if d == 0:
        bbox = bbox_vflip(bbox, rows, cols)
    elif d == 1:
        bbox = bbox_hflip(bbox, rows, cols)
    elif d == -1:
        bbox = bbox_hflip(bbox, rows, cols)
        bbox = bbox_vflip(bbox, rows, cols)
    else:
        raise ValueError(f"Invalid d value {d}. Valid values are -1, 0 and 1")
    return bbox
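
A minimal sketch; bounding boxes here use normalized coordinates in [0, 1], so rows and cols do not change the arithmetic (values below are approximate due to floating point).

Python
from albumentations.augmentations.geometric.functional import bbox_flip

bbox = (0.1, 0.2, 0.4, 0.5)  # normalized (x_min, y_min, x_max, y_max)
bbox_flip(bbox, d=1, rows=100, cols=100)  # horizontal: ~(0.6, 0.2, 0.9, 0.5)
bbox_flip(bbox, d=0, rows=100, cols=100)  # vertical:   ~(0.1, 0.5, 0.4, 0.8)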
def bbox_hflip (bbox, rows, cols) [view source on GitHub]

Flip a bounding box horizontally around the y-axis.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

Source code in albumentations/augmentations/geometric/functional.py
Python
def bbox_hflip(bbox: BoxInternalType, rows: int, cols: int) -> BoxInternalType:
    """Flip a bounding box horizontally around the y-axis.

    Args:
        bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
        rows: Image rows.
        cols: Image cols.

    Returns:
        A bounding box `(x_min, y_min, x_max, y_max)`.

    """
    x_min, y_min, x_max, y_max = bbox[:4]
    return 1 - x_max, y_min, 1 - x_min, y_max
def bbox_rot90 (bbox, factor, rows, cols) [view source on GitHub]

Rotates a bounding box by 90 degrees CCW (see np.rot90)

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box tuple (x_min, y_min, x_max, y_max).

factor int

Number of CCW rotations. Must be in the set {0, 1, 2, 3}. See np.rot90.

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
tuple

A bounding box tuple (x_min, y_min, x_max, y_max).

Source code in albumentations/augmentations/geometric/functional.py
Python
def bbox_rot90(bbox: BoxInternalType, factor: int, rows: int, cols: int) -> BoxInternalType:
    """Rotates a bounding box by 90 degrees CCW (see np.rot90)

    Args:
        bbox: A bounding box tuple (x_min, y_min, x_max, y_max).
        factor: Number of CCW rotations. Must be in set {0, 1, 2, 3} See np.rot90.
        rows: Image rows.
        cols: Image cols.

    Returns:
        tuple: A bounding box tuple (x_min, y_min, x_max, y_max).

    """
    if factor not in {0, 1, 2, 3}:
        msg = "Parameter n must be in set {0, 1, 2, 3}"
        raise ValueError(msg)
    x_min, y_min, x_max, y_max = bbox[:4]
    if factor == 1:
        bbox = y_min, 1 - x_max, y_max, 1 - x_min
    elif factor == TWO:
        bbox = 1 - x_max, 1 - y_max, 1 - x_min, 1 - y_min
    elif factor == THREE:
        bbox = 1 - y_max, x_min, 1 - y_min, x_max
    return bbox
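
For example, one CCW turn maps (x_min, y_min, x_max, y_max) to (y_min, 1 - x_max, y_max, 1 - x_min) in normalized coordinates:

Python
from albumentations.augmentations.geometric.functional import bbox_rot90

bbox_rot90((0.1, 0.2, 0.4, 0.5), factor=1, rows=100, cols=100)
# -> approximately (0.2, 0.6, 0.5, 0.9)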
def bbox_rotate (bbox, angle, method, rows, cols) [view source on GitHub]

Rotates a bounding box by angle degrees.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

angle float

Angle of rotation in degrees.

method str

Rotation method used. Should be one of: "largest_box", "ellipse". Default: "largest_box".

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

Source code in albumentations/augmentations/geometric/functional.py
Python
def bbox_rotate(bbox: BoxInternalType, angle: float, method: str, rows: int, cols: int) -> BoxInternalType:
    """Rotates a bounding box by angle degrees.

    Args:
        bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
        angle: Angle of rotation in degrees.
        method: Rotation method used. Should be one of: "largest_box", "ellipse". Default: "largest_box".
        rows: Image rows.
        cols: Image cols.

    Returns:
        A bounding box `(x_min, y_min, x_max, y_max)`.

    References:
        https://arxiv.org/abs/2109.13488

    """
    x_min, y_min, x_max, y_max = bbox[:4]
    scale = cols / float(rows)
    if method == "largest_box":
        x = np.array([x_min, x_max, x_max, x_min]) - 0.5
        y = np.array([y_min, y_min, y_max, y_max]) - 0.5
    elif method == "ellipse":
        w = (x_max - x_min) / 2
        h = (y_max - y_min) / 2
        data = np.arange(0, 360, dtype=np.float32)
        x = w * np.sin(np.radians(data)) + (w + x_min - 0.5)
        y = h * np.cos(np.radians(data)) + (h + y_min - 0.5)
    else:
        raise ValueError(f"Method {method} is not a valid rotation method.")
    angle = np.deg2rad(angle)
    x_t = (np.cos(angle) * x * scale + np.sin(angle) * y) / scale
    y_t = -np.sin(angle) * x * scale + np.cos(angle) * y
    x_t = x_t + 0.5
    y_t = y_t + 0.5

    x_min, x_max = min(x_t), max(x_t)
    y_min, y_max = min(y_t), max(y_t)

    return x_min, y_min, x_max, y_max
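
A short sketch: with the "largest_box" method the function returns the axis-aligned box enclosing the four rotated corners, so the returned box typically grows with the rotation angle.

Python
from albumentations.augmentations.geometric.functional import bbox_rotate

bbox = (0.3, 0.3, 0.7, 0.6)
rotated = bbox_rotate(bbox, 45, "largest_box", 100, 100)
# rotated is a larger axis-aligned rectangle that encloses the rotated corners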
def bbox_shift_scale_rotate (bbox, angle, scale, dx, dy, rotate_method, rows, cols, ** kwargs) [view source on GitHub]

Rotates, shifts, and scales a bounding box: rotation is by angle degrees, scaling by the scale factor, and shifting by dx and dy.

Parameters:

Name Type Description
bbox tuple

A bounding box (x_min, y_min, x_max, y_max).

angle int

Angle of rotation in degrees.

scale int

Scale factor.

dx int

Shift along x-axis in pixel units.

dy int

Shift along y-axis in pixel units.

rotate_method str

Rotation method used. Should be one of: "largest_box", "ellipse". Default: "largest_box".

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

Source code in albumentations/augmentations/geometric/functional.py
Python
def bbox_shift_scale_rotate(
    bbox: BoxInternalType,
    angle: float,
    scale: float,
    dx: int,
    dy: int,
    rotate_method: str,
    rows: int,
    cols: int,
    **kwargs: Any,
) -> BoxInternalType:
    """Rotates, shifts and scales a bounding box. Rotation is made by angle degrees,
    scaling is made by scale factor and shifting is made by dx and dy.


    Args:
        bbox (tuple): A bounding box `(x_min, y_min, x_max, y_max)`.
        angle (int): Angle of rotation in degrees.
        scale (int): Scale factor.
        dx (int): Shift along x-axis in pixel units.
        dy (int): Shift along y-axis in pixel units.
        rotate_method(str): Rotation method used. Should be one of: "largest_box", "ellipse".
            Default: "largest_box".
        rows (int): Image rows.
        cols (int): Image cols.

    Returns:
        A bounding box `(x_min, y_min, x_max, y_max)`.

    """
    height, width = rows, cols
    center = (width / 2, height / 2)
    if rotate_method == "ellipse":
        x_min, y_min, x_max, y_max = bbox_rotate(bbox, angle, rotate_method, rows, cols)
        matrix = cv2.getRotationMatrix2D(center, 0, scale)
    else:
        x_min, y_min, x_max, y_max = bbox[:4]
        matrix = cv2.getRotationMatrix2D(center, angle, scale)
    matrix[0, 2] += dx * width
    matrix[1, 2] += dy * height
    x = np.array([x_min, x_max, x_max, x_min])
    y = np.array([y_min, y_min, y_max, y_max])
    ones = np.ones(shape=(len(x)))
    points_ones = np.vstack([x, y, ones]).transpose()
    points_ones[:, 0] *= width
    points_ones[:, 1] *= height
    tr_points = matrix.dot(points_ones.T).T
    tr_points[:, 0] /= width
    tr_points[:, 1] /= height

    x_min, x_max = min(tr_points[:, 0]), max(tr_points[:, 0])
    y_min, y_max = min(tr_points[:, 1]), max(tr_points[:, 1])

    return x_min, y_min, x_max, y_max
def bbox_transpose (bbox, axis, rows, cols) [view source on GitHub]

Transposes a bounding box along the given axis.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

axis int

0 - main axis, 1 - secondary axis.

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
Tuple[float, float, float, float]

A bounding box tuple (x_min, y_min, x_max, y_max).

Exceptions:

Type Description
ValueError

If axis not equal to 0 or 1.

Source code in albumentations/augmentations/geometric/functional.py
Python
def bbox_transpose(bbox: KeypointInternalType, axis: int, rows: int, cols: int) -> KeypointInternalType:
    """Transposes a bounding box along given axis.

    Args:
        bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
        axis: 0 - main axis, 1 - secondary axis.
        rows: Image rows.
        cols: Image cols.

    Returns:
        A bounding box tuple `(x_min, y_min, x_max, y_max)`.

    Raises:
        ValueError: If axis not equal to 0 or 1.

    """
    x_min, y_min, x_max, y_max = bbox[:4]
    if axis not in {0, 1}:
        msg = "Axis must be either 0 or 1."
        raise ValueError(msg)
    if axis == 0:
        bbox = (y_min, x_min, y_max, x_max)
    if axis == 1:
        bbox = (1 - y_max, 1 - x_max, 1 - y_min, 1 - x_min)
    return bbox
def bbox_vflip (bbox, rows, cols) [view source on GitHub]

Flip a bounding box vertically around the x-axis.

Parameters:

Name Type Description
bbox Tuple[float, float, float, float]

A bounding box (x_min, y_min, x_max, y_max).

rows int

Image rows.

cols int

Image cols.

Returns:

Type Description
tuple

A bounding box (x_min, y_min, x_max, y_max).

Source code in albumentations/augmentations/geometric/functional.py
Python
def bbox_vflip(bbox: BoxInternalType, rows: int, cols: int) -> BoxInternalType:
    """Flip a bounding box vertically around the x-axis.

    Args:
        bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
        rows: Image rows.
        cols: Image cols.

    Returns:
        tuple: A bounding box `(x_min, y_min, x_max, y_max)`.

    """
    x_min, y_min, x_max, y_max = bbox[:4]
    return x_min, 1 - y_max, x_max, 1 - y_min
def elastic_transform (img, alpha, sigma, alpha_affine, interpolation=1, border_mode=4, value=None, random_state=None, approximate=False, same_dxdy=False) [view source on GitHub]

Elastic deformation of images as described in [Simard2003]_ (with modifications). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.

Source code in albumentations/augmentations/geometric/functional.py
Python
@preserve_shape
def elastic_transform(
    img: np.ndarray,
    alpha: float,
    sigma: float,
    alpha_affine: float,
    interpolation: int = cv2.INTER_LINEAR,
    border_mode: int = cv2.BORDER_REFLECT_101,
    value: Optional[ImageColorType] = None,
    random_state: Optional[np.random.RandomState] = None,
    approximate: bool = False,
    same_dxdy: bool = False,
) -> np.ndarray:
    """Elastic deformation of images as described in [Simard2003]_ (with modifications).
    Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

    .. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for
         Convolutional Neural Networks applied to Visual Document Analysis", in
         Proc. of the International Conference on Document Analysis and
         Recognition, 2003.
    """
    height, width = img.shape[:2]

    # Random affine
    center_square = np.array((height, width), dtype=np.float32) // 2
    square_size = min((height, width)) // 3
    alpha = float(alpha)
    sigma = float(sigma)
    alpha_affine = float(alpha_affine)

    pts1 = np.array(
        [
            center_square + square_size,
            [center_square[0] + square_size, center_square[1] - square_size],
            center_square - square_size,
        ],
        dtype=np.float32,
    )
    pts2 = pts1 + random_utils.uniform(-alpha_affine, alpha_affine, size=pts1.shape, random_state=random_state).astype(
        np.float32
    )
    matrix = cv2.getAffineTransform(pts1, pts2)

    warp_fn = _maybe_process_in_chunks(
        cv2.warpAffine, M=matrix, dsize=(width, height), flags=interpolation, borderMode=border_mode, borderValue=value
    )
    img = warp_fn(img)

    if approximate:
        # Approximate computation smooth displacement map with a large enough kernel.
        # On large images (512+) this is approximately 2X times faster
        dx = random_utils.rand(height, width, random_state=random_state).astype(np.float32) * 2 - 1
        cv2.GaussianBlur(dx, (17, 17), sigma, dst=dx)
        dx *= alpha
        if same_dxdy:
            # Speed up even more
            dy = dx
        else:
            dy = random_utils.rand(height, width, random_state=random_state).astype(np.float32) * 2 - 1
            cv2.GaussianBlur(dy, (17, 17), sigma, dst=dy)
            dy *= alpha
    else:
        dx = np.float32(
            gaussian_filter((random_utils.rand(height, width, random_state=random_state) * 2 - 1), sigma) * alpha
        )
        if same_dxdy:
            # Speed up
            dy = dx
        else:
            dy = np.float32(
                gaussian_filter((random_utils.rand(height, width, random_state=random_state) * 2 - 1), sigma) * alpha
            )

    x, y = np.meshgrid(np.arange(width), np.arange(height))

    map_x = np.float32(x + dx)
    map_y = np.float32(y + dy)

    remap_fn = _maybe_process_in_chunks(
        cv2.remap, map1=map_x, map2=map_y, interpolation=interpolation, borderMode=border_mode, borderValue=value
    )
    return remap_fn(img)
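
A minimal usage sketch; a fixed random_state makes the deformation reproducible, and approximate=True trades a little accuracy for roughly 2x speed on large images.

Python
import numpy as np

from albumentations.augmentations.geometric.functional import elastic_transform

img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
rs = np.random.RandomState(42)
out = elastic_transform(img, alpha=40.0, sigma=6.0, alpha_affine=10.0, random_state=rs, approximate=True)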
def elastic_transform_approx (img, alpha, sigma, alpha_affine, interpolation=1, border_mode=4, value=None, random_state=None) [view source on GitHub]

Elastic deformation of images as described in [Simard2003]_ (with modifications for speed). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.

Source code in albumentations/augmentations/geometric/functional.py
Python
@preserve_shape
def elastic_transform_approx(
    img: np.ndarray,
    alpha: float,
    sigma: float,
    alpha_affine: float,
    interpolation: int = cv2.INTER_LINEAR,
    border_mode: int = cv2.BORDER_REFLECT_101,
    value: Optional[ImageColorType] = None,
    random_state: Optional[np.random.RandomState] = None,
) -> np.ndarray:
    """Elastic deformation of images as described in [Simard2003]_ (with modifications for speed).
    Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

    .. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for
         Convolutional Neural Networks applied to Visual Document Analysis", in
         Proc. of the International Conference on Document Analysis and
         Recognition, 2003.
    """
    height, width = img.shape[:2]

    # Random affine
    center_square = np.array((height, width), dtype=np.float32) // 2
    square_size = min((height, width)) // 3
    alpha = float(alpha)
    sigma = float(sigma)
    alpha_affine = float(alpha_affine)

    pts1 = np.array(
        [
            center_square + square_size,
            [center_square[0] + square_size, center_square[1] - square_size],
            center_square - square_size,
        ],
        dtype=np.float32,
    )
    pts2 = pts1 + random_utils.uniform(-alpha_affine, alpha_affine, size=pts1.shape, random_state=random_state).astype(
        np.float32
    )
    matrix = cv2.getAffineTransform(pts1, pts2)

    warp_fn = _maybe_process_in_chunks(
        cv2.warpAffine,
        M=matrix,
        dsize=(width, height),
        flags=interpolation,
        borderMode=border_mode,
        borderValue=value,
    )
    img = warp_fn(img)

    dx = random_utils.rand(height, width, random_state=random_state).astype(np.float32) * 2 - 1
    cv2.GaussianBlur(dx, (17, 17), sigma, dst=dx)
    dx *= alpha

    dy = random_utils.rand(height, width, random_state=random_state).astype(np.float32) * 2 - 1
    cv2.GaussianBlur(dy, (17, 17), sigma, dst=dy)
    dy *= alpha

    x, y = np.meshgrid(np.arange(width), np.arange(height))

    map_x = np.float32(x + dx)
    map_y = np.float32(y + dy)

    remap_fn = _maybe_process_in_chunks(
        cv2.remap,
        map1=map_x,
        map2=map_y,
        interpolation=interpolation,
        borderMode=border_mode,
        borderValue=value,
    )
    return remap_fn(img)
def find_keypoint (position, distance_map, threshold, inverted) [view source on GitHub]

Determine if a valid keypoint can be found at the given position.

Source code in albumentations/augmentations/geometric/functional.py
Python
def find_keypoint(
    position: Tuple[int, int], distance_map: np.ndarray, threshold: Optional[float], inverted: bool
) -> Optional[Tuple[float, float]]:
    """Determine if a valid keypoint can be found at the given position."""
    y, x = position
    value = distance_map[y, x]
    if not inverted and threshold is not None and value >= threshold:
        return None
    if inverted and threshold is not None and value < threshold:
        return None
    return float(x), float(y)
def from_distance_maps (distance_maps, inverted, if_not_found_coords, threshold=None) [view source on GitHub]

Convert outputs of to_distance_maps back to a list of keypoint coordinates. This is the inverse of to_distance_maps.

Source code in albumentations/augmentations/geometric/functional.py
Python
def from_distance_maps(
    distance_maps: np.ndarray,
    inverted: bool,
    if_not_found_coords: Optional[Union[Sequence[int], Dict[str, Any]]],
    threshold: Optional[float] = None,
) -> List[Tuple[float, float]]:
    """Convert outputs of `to_distance_maps` to `KeypointsOnImage`.
    This is the inverse of `to_distance_maps`.
    """
    if distance_maps.ndim != THREE:
        msg = f"Expected three-dimensional input, got {distance_maps.ndim} dimensions and shape {distance_maps.shape}."
        raise ValueError(msg)
    height, width, nb_keypoints = distance_maps.shape

    drop_if_not_found, if_not_found_x, if_not_found_y = validate_if_not_found_coords(if_not_found_coords)

    keypoints = []
    for i in range(nb_keypoints):
        hitidx_flat = np.argmax(distance_maps[..., i]) if inverted else np.argmin(distance_maps[..., i])
        hitidx_ndim = np.unravel_index(hitidx_flat, (height, width))
        keypoint = find_keypoint(hitidx_ndim, distance_maps[:, :, i], threshold, inverted)
        if keypoint:
            keypoints.append(keypoint)
        elif not drop_if_not_found:
            keypoints.append((if_not_found_x, if_not_found_y))

    return keypoints
def grid_distortion (img, num_steps=10, xsteps=(), ysteps=(), interpolation=1, border_mode=4, value=None) [view source on GitHub]

Perform a grid distortion of an input image.

Source code in albumentations/augmentations/geometric/functional.py
Python
@preserve_shape
def grid_distortion(
    img: np.ndarray,
    num_steps: int = 10,
    xsteps: Tuple[()] = (),
    ysteps: Tuple[()] = (),
    interpolation: int = cv2.INTER_LINEAR,
    border_mode: int = cv2.BORDER_REFLECT_101,
    value: Optional[ImageColorType] = None,
) -> np.ndarray:
    """Perform a grid distortion of an input image.

    Reference:
        http://pythology.blogspot.sg/2014/03/interpolation-on-regular-distorted-grid.html
    """
    height, width = img.shape[:2]

    x_step = width // num_steps
    xx = np.zeros(width, np.float32)
    prev = 0
    for idx in range(num_steps + 1):
        x = idx * x_step
        start = int(x)
        end = int(x) + x_step
        if end > width:
            end = width
            cur = width
        else:
            cur = prev + x_step * xsteps[idx]

        xx[start:end] = np.linspace(prev, cur, end - start)
        prev = cur

    y_step = height // num_steps
    yy = np.zeros(height, np.float32)
    prev = 0
    for idx in range(num_steps + 1):
        y = idx * y_step
        start = int(y)
        end = int(y) + y_step
        if end > height:
            end = height
            cur = height
        else:
            cur = prev + y_step * ysteps[idx]

        yy[start:end] = np.linspace(prev, cur, end - start)
        prev = cur

    map_x, map_y = np.meshgrid(xx, yy)
    map_x = map_x.astype(np.float32)
    map_y = map_y.astype(np.float32)

    remap_fn = _maybe_process_in_chunks(
        cv2.remap,
        map1=map_x,
        map2=map_y,
        interpolation=interpolation,
        borderMode=border_mode,
        borderValue=value,
    )
    return remap_fn(img)
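
A minimal sketch: xsteps and ysteps are per-cell stretch multipliers (values near 1.0 give mild distortion); num_steps + 1 values per axis are enough to cover every grid line.

Python
import numpy as np

from albumentations.augmentations.geometric.functional import grid_distortion

img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
steps = np.random.uniform(0.8, 1.2, size=11).tolist()  # num_steps + 1 multipliers
out = grid_distortion(img, num_steps=10, xsteps=steps, ysteps=steps)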
def keypoint_flip (keypoint, d, rows, cols) [view source on GitHub]

Flip a keypoint either vertically, horizontally or both depending on the value of d.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

d int

Flip direction. Must be -1, 0 or 1:

* 0 - vertical flip,
* 1 - horizontal flip,
* -1 - vertical and horizontal flip.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Exceptions:

Type Description
ValueError

if value of d is not -1, 0 or 1.

Source code in albumentations/augmentations/geometric/functional.py
Python
def keypoint_flip(keypoint: KeypointInternalType, d: int, rows: int, cols: int) -> KeypointInternalType:
    """Flip a keypoint either vertically, horizontally or both depending on the value of `d`.

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.
        d: Number of flip. Must be -1, 0 or 1:
            * 0 - vertical flip,
            * 1 - horizontal flip,
            * -1 - vertical and horizontal flip.
        rows: Image height.
        cols: Image width.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    Raises:
        ValueError: if value of `d` is not -1, 0 or 1.

    """
    if d == 0:
        keypoint = keypoint_vflip(keypoint, rows, cols)
    elif d == 1:
        keypoint = keypoint_hflip(keypoint, rows, cols)
    elif d == -1:
        keypoint = keypoint_hflip(keypoint, rows, cols)
        keypoint = keypoint_vflip(keypoint, rows, cols)
    else:
        raise ValueError(f"Invalid d value {d}. Valid values are -1, 0 and 1")
    return keypoint
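
A minimal sketch; keypoints use pixel coordinates, so a horizontal flip maps x to (cols - 1) - x and mirrors the angle.

Python
from albumentations.augmentations.geometric.functional import keypoint_flip

kp = (10.0, 20.0, 0.0, 1.0)  # (x, y, angle, scale)
keypoint_flip(kp, d=1, rows=100, cols=100)
# -> (89.0, 20.0, pi, 1.0): x becomes 99 - 10, angle becomes pi - 0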
def keypoint_hflip (keypoint, rows, cols) [view source on GitHub]

Flip a keypoint horizontally around the y-axis.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Source code in albumentations/augmentations/geometric/functional.py
Python
@angle_2pi_range
def keypoint_hflip(keypoint: KeypointInternalType, rows: int, cols: int) -> KeypointInternalType:
    """Flip a keypoint horizontally around the y-axis.

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.
        rows: Image height.
        cols: Image width.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    """
    x, y, angle, scale = keypoint[:4]
    angle = math.pi - angle
    return (cols - 1) - x, y, angle, scale
def keypoint_rot90 (keypoint, factor, rows, cols, ** params) [view source on GitHub]

Rotates a keypoint by 90 degrees CCW (see np.rot90)

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

factor int

Number of CCW rotations. Must be in the set {0, 1, 2, 3}. See np.rot90.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
tuple

A keypoint (x, y, angle, scale).

Exceptions:

Type Description
ValueError

if factor not in set {0, 1, 2, 3}

Source code in albumentations/augmentations/geometric/functional.py
Python
@angle_2pi_range
def keypoint_rot90(
    keypoint: KeypointInternalType, factor: int, rows: int, cols: int, **params: Any
) -> KeypointInternalType:
    """Rotates a keypoint by 90 degrees CCW (see np.rot90)

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.
        factor: Number of CCW rotations. Must be in range [0;3] See np.rot90.
        rows: Image height.
        cols: Image width.

    Returns:
        tuple: A keypoint `(x, y, angle, scale)`.

    Raises:
        ValueError: if factor not in set {0, 1, 2, 3}

    """
    x, y, angle, scale = keypoint[:4]

    if factor not in {0, 1, 2, 3}:
        msg = "Parameter n must be in set {0, 1, 2, 3}"
        raise ValueError(msg)

    if factor == 1:
        x, y, angle = y, (cols - 1) - x, angle - math.pi / 2
    elif factor == TWO:
        x, y, angle = (cols - 1) - x, (rows - 1) - y, angle - math.pi
    elif factor == THREE:
        x, y, angle = (rows - 1) - y, x, angle + math.pi / 2

    return x, y, angle, scale
def keypoint_rotate (keypoint, angle, rows, cols, ** params) [view source on GitHub]

Rotate a keypoint by angle.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

angle float

Rotation angle.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Source code in albumentations/augmentations/geometric/functional.py
Python
@angle_2pi_range
def keypoint_rotate(
    keypoint: KeypointInternalType, angle: float, rows: int, cols: int, **params: Any
) -> KeypointInternalType:
    """Rotate a keypoint by angle.

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.
        angle: Rotation angle.
        rows: Image height.
        cols: Image width.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    """
    center = (cols - 1) * 0.5, (rows - 1) * 0.5
    matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
    x, y, a, s = keypoint[:4]
    x, y = cv2.transform(np.array([[[x, y]]]), matrix).squeeze()
    return x, y, a + math.radians(angle), s
def keypoint_scale (keypoint, scale_x, scale_y) [view source on GitHub]

Scales a keypoint by scale_x and scale_y.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

scale_x float

Scale coefficient x-axis.

scale_y float

Scale coefficient y-axis.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Source code in albumentations/augmentations/geometric/functional.py
Python
def keypoint_scale(keypoint: KeypointInternalType, scale_x: float, scale_y: float) -> KeypointInternalType:
    """Scales a keypoint by scale_x and scale_y.

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.
        scale_x: Scale coefficient x-axis.
        scale_y: Scale coefficient y-axis.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    """
    x, y, angle, scale = keypoint[:4]
    return x * scale_x, y * scale_y, angle, scale * max(scale_x, scale_y)
def keypoint_transpose (keypoint) [view source on GitHub]

Transpose a keypoint by swapping its x and y coordinates and adjusting the angle.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Source code in albumentations/augmentations/geometric/functional.py
Python
def keypoint_transpose(keypoint: KeypointInternalType) -> KeypointInternalType:
    """Rotate a keypoint by angle.

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    """
    x, y, angle, scale = keypoint[:4]

    angle = np.pi - angle if angle <= np.pi else 3 * np.pi - angle

    return y, x, angle, scale
def keypoint_vflip (keypoint, rows, cols) [view source on GitHub]

Flip a keypoint vertically around the x-axis.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

rows int

Image height.

cols int

Image width.

Returns:

Type Description
tuple

A keypoint (x, y, angle, scale).

Source code in albumentations/augmentations/geometric/functional.py
Python
@angle_2pi_range
def keypoint_vflip(keypoint: KeypointInternalType, rows: int, cols: int) -> KeypointInternalType:
    """Flip a keypoint vertically around the x-axis.

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.
        rows: Image height.
        cols: Image width.

    Returns:
        tuple: A keypoint `(x, y, angle, scale)`.

    """
    x, y, angle, scale = keypoint[:4]
    angle = -angle
    return x, (rows - 1) - y, angle, scale
def optical_distortion (img, k=0, dx=0, dy=0, interpolation=1, border_mode=4, value=None) [view source on GitHub]

Barrel / pincushion distortion. Unconventional augment.

Source code in albumentations/augmentations/geometric/functional.py
Python
@preserve_shape
def optical_distortion(
    img: np.ndarray,
    k: int = 0,
    dx: int = 0,
    dy: int = 0,
    interpolation: int = cv2.INTER_LINEAR,
    border_mode: int = cv2.BORDER_REFLECT_101,
    value: Optional[ImageColorType] = None,
) -> np.ndarray:
    """Barrel / pincushion distortion. Unconventional augment.

    Reference:
        |  https://stackoverflow.com/questions/6199636/formulas-for-barrel-pincushion-distortion
        |  https://stackoverflow.com/questions/10364201/image-transformation-in-opencv
        |  https://stackoverflow.com/questions/2477774/correcting-fisheye-distortion-programmatically
        |  http://www.coldvision.io/2017/03/02/advanced-lane-finding-using-opencv/
    """
    height, width = img.shape[:2]

    fx = width
    fy = height

    cx = width * 0.5 + dx
    cy = height * 0.5 + dy

    camera_matrix = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float32)

    distortion = np.array([k, k, 0, 0, 0], dtype=np.float32)
    map1, map2 = cv2.initUndistortRectifyMap(camera_matrix, distortion, None, None, (width, height), cv2.CV_32FC1)
    return cv2.remap(img, map1, map2, interpolation=interpolation, borderMode=border_mode, borderValue=value)
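
A minimal usage sketch; the sign of k selects the direction of the distortion (barrel vs. pincushion), and dx, dy shift the distortion center away from the image center.

Python
import numpy as np

from albumentations.augmentations.geometric.functional import optical_distortion

img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
out_a = optical_distortion(img, k=0.2)   # distortion in one direction
out_b = optical_distortion(img, k=-0.2)  # distortion in the opposite direction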
def rotation2d_matrix_to_euler_angles (matrix, y_up=False) [view source on GitHub]

Compute the rotation angle in radians from a 2D rotation matrix.

Parameters:

Name Type Description
matrix np.ndarray

Rotation matrix.

y_up bool

Whether the Y axis points up (True) or down (False). Default: False.

Source code in albumentations/augmentations/geometric/functional.py
Python
def rotation2d_matrix_to_euler_angles(matrix: np.ndarray, y_up: bool = False) -> float:
    """Args:
    matrix (np.ndarray): Rotation matrix
    y_up (bool): is Y axis looks up or down

    """
    if y_up:
        return np.arctan2(matrix[1, 0], matrix[0, 0])
    return np.arctan2(-matrix[1, 0], matrix[0, 0])
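
A small sketch recovering the angle from an OpenCV rotation matrix (only the 2x2 rotation part is passed in):

Python
import cv2
import numpy as np

from albumentations.augmentations.geometric.functional import rotation2d_matrix_to_euler_angles

matrix = cv2.getRotationMatrix2D((0, 0), 30, 1.0)  # 2x3 affine matrix for a 30-degree rotation
angle = rotation2d_matrix_to_euler_angles(matrix[:2, :2])
np.degrees(angle)  # ~30.0 with the default y-down image convention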
def to_distance_maps (keypoints, height, width, inverted=False) [view source on GitHub]

Generate a (H,W,N) array of distance maps for N keypoints.

The n-th distance map contains at every location (y, x) the euclidean distance to the n-th keypoint.

This function can be used as a helper when augmenting keypoints with a method that only supports the augmentation of images.

Parameters:

Name Type Description
keypoints Sequence[Tuple[float, float]]

keypoint coordinates

height int

image height

width int

image width

inverted bool

If True, inverted distance maps are returned where each distance value d is replaced by d/(d+1), i.e. the distance maps have values in the range (0.0, 1.0] with 1.0 denoting exactly the position of the respective keypoint.

Returns:

Type Description
ndarray

A (H, W, N) float32 array containing N distance maps for N keypoints. Each location (y, x, n) in the array denotes the euclidean distance at (y, x) to the n-th keypoint. If inverted is True, the distance d is replaced by d/(d+1). The height and width of the array match the height and width in KeypointsOnImage.shape.

Source code in albumentations/augmentations/geometric/functional.py
Python
def to_distance_maps(
    keypoints: Sequence[Tuple[float, float]], height: int, width: int, inverted: bool = False
) -> np.ndarray:
    """Generate a ``(H,W,N)`` array of distance maps for ``N`` keypoints.

    The ``n``-th distance map contains at every location ``(y, x)`` the
    euclidean distance to the ``n``-th keypoint.

    This function can be used as a helper when augmenting keypoints with a
    method that only supports the augmentation of images.

    Args:
        keypoint: keypoint coordinates
        height: image height
        width: image width
        inverted (bool): If ``True``, inverted distance maps are returned where each
            distance value d is replaced by ``d/(d+1)``, i.e. the distance
            maps have values in the range ``(0.0, 1.0]`` with ``1.0`` denoting
            exactly the position of the respective keypoint.

    Returns:
        (H, W, N) ndarray
            A ``float32`` array containing ``N`` distance maps for ``N``
            keypoints. Each location ``(y, x, n)`` in the array denotes the
            euclidean distance at ``(y, x)`` to the ``n``-th keypoint.
            If `inverted` is ``True``, the distance ``d`` is replaced
            by ``d/(d+1)``. The height and width of the array match the
            height and width in ``KeypointsOnImage.shape``.

    """
    distance_maps = np.zeros((height, width, len(keypoints)), dtype=np.float32)

    yy = np.arange(0, height)
    xx = np.arange(0, width)
    grid_xx, grid_yy = np.meshgrid(xx, yy)

    for i, (x, y) in enumerate(keypoints):
        distance_maps[:, :, i] = (grid_xx - x) ** 2 + (grid_yy - y) ** 2

    distance_maps = np.sqrt(distance_maps)
    if inverted:
        return 1 / (distance_maps + 1)
    return distance_maps
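
A round-trip sketch with from_distance_maps above; keypoints at integer coordinates inside the image are recovered exactly.

Python
from albumentations.augmentations.geometric.functional import (
    from_distance_maps,
    to_distance_maps,
)

keypoints = [(10.0, 20.0), (35.0, 5.0)]
maps = to_distance_maps(keypoints, height=50, width=50, inverted=True)
from_distance_maps(maps, inverted=True, if_not_found_coords={"x": -1, "y": -1})
# -> [(10.0, 20.0), (35.0, 5.0)]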
def validate_if_not_found_coords (if_not_found_coords) [view source on GitHub]

Validate and process if_not_found_coords parameter.

Source code in albumentations/augmentations/geometric/functional.py
Python
def validate_if_not_found_coords(
    if_not_found_coords: Optional[Union[Sequence[int], Dict[str, Any]]],
) -> Tuple[bool, int, int]:
    """Validate and process `if_not_found_coords` parameter."""
    if if_not_found_coords is None:
        return True, -1, -1
    if isinstance(if_not_found_coords, (tuple, list)):
        if len(if_not_found_coords) != TWO:
            msg = "Expected tuple/list 'if_not_found_coords' to contain exactly two entries."
            raise ValueError(msg)
        return False, if_not_found_coords[0], if_not_found_coords[1]
    if isinstance(if_not_found_coords, dict):
        return False, if_not_found_coords["x"], if_not_found_coords["y"]

    msg = "Expected if_not_found_coords to be None, tuple, list, or dict."
    raise ValueError(msg)

resize

class LongestMaxSize (max_size=1024, interpolation=1, always_apply=False, p=1) [view source on GitHub]

Rescale an image so that maximum side is equal to max_size, keeping the aspect ratio of the initial image.

Parameters:

Name Type Description
max_size int, list of int

maximum size of the image after the transformation. When using a list, max size will be randomly selected from the values in the list.

interpolation OpenCV flag

interpolation method. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/geometric/resize.py
Python
class LongestMaxSize(DualTransform):
    """Rescale an image so that maximum side is equal to max_size, keeping the aspect ratio of the initial image.

    Args:
        max_size (int, list of int): maximum size of the image after the transformation. When using a list, max size
            will be randomly selected from the values in the list.
        interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        max_size: Union[int, Sequence[int]] = 1024,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 1,
    ):
        super().__init__(always_apply, p)
        self.interpolation = interpolation
        self.max_size = max_size

    def apply(
        self, img: np.ndarray, max_size: int = 1024, interpolation: int = cv2.INTER_LINEAR, **params: Any
    ) -> np.ndarray:
        return F.longest_max_size(img, max_size=max_size, interpolation=interpolation)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        # Bounding box coordinates are scale invariant
        return bbox

    def apply_to_keypoint(
        self, keypoint: KeypointInternalType, max_size: int = 1024, **params: Any
    ) -> KeypointInternalType:
        height = params["rows"]
        width = params["cols"]

        scale = max_size / max([height, width])
        return F.keypoint_scale(keypoint, scale, scale)

    def get_params(self) -> Dict[str, int]:
        return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("max_size", "interpolation")
class RandomScale (scale_limit=0.1, interpolation=1, always_apply=False, p=0.5) [view source on GitHub]

Randomly resize the input. Output image size is different from the input image size.

Parameters:

Name Type Description
scale_limit (float, float) or float

scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1. If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high). Default: (-0.1, 0.1).

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/geometric/resize.py
Python
class RandomScale(DualTransform):
    """Randomly resize the input. Output image size is different from the input image size.

    Args:
        scale_limit ((float, float) or float): scaling factor range. If scale_limit is a single float value, the
            range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1.
            If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high).
            Default: (-0.1, 0.1).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        scale_limit: ScaleFloatType = 0.1,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.scale_limit = to_tuple(scale_limit, bias=1.0)
        self.interpolation = interpolation

    def get_params(self) -> Dict[str, float]:
        return {"scale": random.uniform(self.scale_limit[0], self.scale_limit[1])}

    def apply(
        self, img: np.ndarray, scale: float = 0, interpolation: int = cv2.INTER_LINEAR, **params: Any
    ) -> np.ndarray:
        return F.scale(img, scale, interpolation)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        # Bounding box coordinates are scale invariant
        return bbox

    def apply_to_keypoint(
        self, keypoint: KeypointInternalType, scale: float = 0, **params: Any
    ) -> KeypointInternalType:
        return F.keypoint_scale(keypoint, scale, scale)

    def get_transform_init_args(self) -> Dict[str, Any]:
        return {"interpolation": self.interpolation, "scale_limit": to_tuple(self.scale_limit, bias=-1.0)}
class Resize (height, width, interpolation=1, always_apply=False, p=1) [view source on GitHub]

Resize the input to the given height and width.

Parameters:

Name Type Description
height int

desired height of the output.

width int

desired width of the output.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/geometric/resize.py
Python
class Resize(DualTransform):
    """Resize the input to the given height and width.

    Args:
        height (int): desired height of the output.
        width (int): desired width of the output.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)

    def __init__(
        self, height: int, width: int, interpolation: int = cv2.INTER_LINEAR, always_apply: bool = False, p: float = 1
    ):
        super().__init__(always_apply, p)
        self.height = height
        self.width = width
        self.interpolation = interpolation

    def apply(self, img: np.ndarray, interpolation: int = cv2.INTER_LINEAR, **params: Any) -> np.ndarray:
        return F.resize(img, height=self.height, width=self.width, interpolation=interpolation)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        # Bounding box coordinates are scale invariant
        return bbox

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        height = params["rows"]
        width = params["cols"]
        scale_x = self.width / width
        scale_y = self.height / height
        return F.keypoint_scale(keypoint, scale_x, scale_y)

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("height", "width", "interpolation")
class SmallestMaxSize (max_size=1024, interpolation=1, always_apply=False, p=1) [view source on GitHub]

Rescale an image so that minimum side is equal to max_size, keeping the aspect ratio of the initial image.

Parameters:

Name Type Description
max_size int, list of int

maximum size of smallest side of the image after the transformation. When using a list, max size will be randomly selected from the values in the list.

interpolation OpenCV flag

interpolation method. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/geometric/resize.py
Python
class SmallestMaxSize(DualTransform):
    """Rescale an image so that minimum side is equal to max_size, keeping the aspect ratio of the initial image.

    Args:
        max_size (int, list of int): maximum size of smallest side of the image after the transformation. When using a
            list, max size will be randomly selected from the values in the list.
        interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)

    def __init__(
        self,
        max_size: Union[int, Sequence[int]] = 1024,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 1,
    ):
        super().__init__(always_apply, p)
        self.interpolation = interpolation
        self.max_size = max_size

    def apply(
        self, img: np.ndarray, max_size: int = 1024, interpolation: int = cv2.INTER_LINEAR, **params: Any
    ) -> np.ndarray:
        return F.smallest_max_size(img, max_size=max_size, interpolation=interpolation)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return bbox

    def apply_to_keypoint(
        self, keypoint: KeypointInternalType, max_size: int = 1024, **params: Any
    ) -> KeypointInternalType:
        height = params["rows"]
        width = params["cols"]

        scale = max_size / min([height, width])
        return F.keypoint_scale(keypoint, scale, scale)

    def get_params(self) -> Dict[str, int]:
        return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("max_size", "interpolation")

rotate

class RandomRotate90 [view source on GitHub]

Randomly rotate the input by 90 degrees zero or more times.

Parameters:

Name Type Description
p

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/geometric/rotate.py
Python
class RandomRotate90(DualTransform):
    """Randomly rotate the input by 90 degrees zero or more times.

    Args:
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(self, img: np.ndarray, factor: float = 0, **params: Any) -> np.ndarray:
        """Args:
        factor (int): number of times the input will be rotated by 90 degrees.

        """
        return np.ascontiguousarray(np.rot90(img, factor))

    def get_params(self) -> Dict[str, int]:
        # Random int in the range [0, 3]
        return {"factor": random.randint(0, 3)}

    def apply_to_bbox(self, bbox: BoxInternalType, factor: int = 0, **params: Any) -> BoxInternalType:
        return F.bbox_rot90(bbox, factor, **params)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, factor: int = 0, **params: Any) -> BoxInternalType:
        return F.keypoint_rot90(keypoint, factor, **params)

    def get_transform_init_args_names(self) -> Tuple[()]:
        return ()
apply (self, img, factor=0, **params)

factor (int): number of times the input will be rotated by 90 degrees.

Source code in albumentations/augmentations/geometric/rotate.py
Python
def apply(self, img: np.ndarray, factor: float = 0, **params: Any) -> np.ndarray:
    """Args:
    factor (int): number of times the input will be rotated by 90 degrees.

    """
    return np.ascontiguousarray(np.rot90(img, factor))
class Rotate (limit=90, interpolation=1, border_mode=4, value=None, mask_value=None, rotate_method='largest_box', crop_border=False, always_apply=False, p=0.5) [view source on GitHub]

Rotate the input by an angle selected randomly from the uniform distribution.

Parameters:

Name Type Description
limit Union[int, Tuple[int, int]]

range from which a random angle is picked. If limit is a single int an angle is picked from (-limit, limit). Default: (-90, 90)

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

rotate_method str

rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse". Default: "largest_box"

crop_border bool

If True, makes the largest possible crop within the rotated image.

p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/geometric/rotate.py
Python
class Rotate(DualTransform):
    """Rotate the input by an angle selected randomly from the uniform distribution.

    Args:
        limit: range from which a random angle is picked. If limit is a single int
            an angle is picked from (-limit, limit). Default: (-90, 90)
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
        rotate_method (str): rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse".
            Default: "largest_box"
        crop_border (bool): If True would make a largest possible crop within rotated image
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        limit: ScaleIntType = 90,
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: Optional[Union[int, float, Tuple[int, int], Tuple[float, float]]] = None,
        mask_value: Optional[Union[int, float, Tuple[int, int], Tuple[float, float]]] = None,
        rotate_method: str = "largest_box",
        crop_border: bool = False,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.limit = to_tuple(limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value
        self.rotate_method = rotate_method
        self.crop_border = crop_border

        if rotate_method not in ["largest_box", "ellipse"]:
            raise ValueError(f"Rotation method {self.rotate_method} is not valid.")

    def apply(
        self,
        img: np.ndarray,
        angle: float = 0,
        interpolation: int = cv2.INTER_LINEAR,
        x_min: Optional[int] = None,
        x_max: Optional[int] = None,
        y_min: Optional[int] = None,
        y_max: Optional[int] = None,
        **params: Any,
    ) -> np.ndarray:
        img_out = F.rotate(img, angle, interpolation, self.border_mode, self.value)
        if self.crop_border and x_min is not None and x_max is not None and y_min is not None and y_max is not None:
            return FCrops.crop(img_out, x_min, y_min, x_max, y_max)
        return img_out

    def apply_to_mask(
        self,
        mask: np.ndarray,
        angle: float,
        x_min: Optional[int] = None,
        x_max: Optional[int] = None,
        y_min: Optional[int] = None,
        y_max: Optional[int] = None,
        **params: Any,
    ) -> np.ndarray:
        img_out = F.rotate(mask, angle, cv2.INTER_NEAREST, self.border_mode, self.mask_value)
        if self.crop_border and x_min is not None and x_max is not None and y_min is not None and y_max is not None:
            return FCrops.crop(img_out, x_min, y_min, x_max, y_max)
        return img_out

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        angle: float = 0,
        x_min: Optional[int] = None,
        x_max: Optional[int] = None,
        y_min: Optional[int] = None,
        y_max: Optional[int] = None,
        cols: int = 0,
        rows: int = 0,
        **params: Any,
    ) -> np.ndarray:
        bbox_out = F.bbox_rotate(bbox, angle, self.rotate_method, rows, cols)
        if self.crop_border and x_min is not None and x_max is not None and y_min is not None and y_max is not None:
            return FCrops.bbox_crop(bbox_out, x_min, y_min, x_max, y_max, rows, cols)
        return bbox_out

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        angle: float = 0,
        x_min: Optional[int] = None,
        x_max: Optional[int] = None,
        y_min: Optional[int] = None,
        y_max: Optional[int] = None,
        cols: int = 0,
        rows: int = 0,
        **params: Any,
    ) -> KeypointInternalType:
        keypoint_out = F.keypoint_rotate(keypoint, angle, rows, cols, **params)
        if self.crop_border and x_min is not None and x_max is not None and y_min is not None and y_max is not None:
            return FCrops.crop_keypoint_by_coords(keypoint_out, (x_min, y_min, x_max, y_max))
        return keypoint_out

    @staticmethod
    def _rotated_rect_with_max_area(h: int, w: int, angle: float) -> Dict[str, int]:
        """Given a rectangle of size wxh that has been rotated by 'angle' (in
        degrees), computes the width and height of the largest possible
        axis-aligned rectangle (maximal area) within the rotated rectangle.

        Code from: https://stackoverflow.com/questions/16702966/rotate-image-and-crop-out-black-borders
        """
        angle = math.radians(angle)
        width_is_longer = w >= h
        side_long, side_short = (w, h) if width_is_longer else (h, w)

        # since the solutions for angle, -angle and 180-angle are all the same,
        # it is sufficient to look at the first quadrant and the absolute values of sin,cos:
        sin_a, cos_a = abs(math.sin(angle)), abs(math.cos(angle))
        if side_short <= 2.0 * sin_a * cos_a * side_long or abs(sin_a - cos_a) < SMALL_NUMBER:
            # half constrained case: two crop corners touch the longer side,
            # the other two corners are on the mid-line parallel to the longer line
            x = 0.5 * side_short
            wr, hr = (x / sin_a, x / cos_a) if width_is_longer else (x / cos_a, x / sin_a)
        else:
            # fully constrained case: crop touches all 4 sides
            cos_2a = cos_a * cos_a - sin_a * sin_a
            wr, hr = (w * cos_a - h * sin_a) / cos_2a, (h * cos_a - w * sin_a) / cos_2a

        return {
            "x_min": max(0, int(w / 2 - wr / 2)),
            "x_max": min(w, int(w / 2 + wr / 2)),
            "y_min": max(0, int(h / 2 - hr / 2)),
            "y_max": min(h, int(h / 2 + hr / 2)),
        }

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        out_params = {"angle": random.uniform(self.limit[0], self.limit[1])}
        if self.crop_border:
            h, w = params["image"].shape[:2]
            out_params.update(self._rotated_rect_with_max_area(h, w, out_params["angle"]))
        return out_params

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("limit", "interpolation", "border_mode", "value", "mask_value", "rotate_method", "crop_border")
class SafeRotate (limit=90, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, p=0.5) [view source on GitHub]

Rotate the input inside the input's frame by an angle selected randomly from the uniform distribution.

The resulting image may have artifacts in it. After rotation, the image may have a different aspect ratio, and after resizing, it returns to its original shape with the original aspect ratio of the image. For these reasons, some artifacts may appear.

Parameters:

Name Type Description
limit (int, int) or int

range from which a random angle is picked. If limit is a single int, an angle is picked from (-limit, limit). Default: (-90, 90)

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
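
Example

A minimal usage sketch, assuming a uint8 RGB NumPy array as input; the output keeps the input shape because the rotated result is scaled back into the original frame:

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

transform = A.SafeRotate(limit=30, p=1.0)
rotated = transform(image=image)["image"]
assert rotated.shape == image.shape  # shape is preserved by design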

Source code in albumentations/augmentations/geometric/rotate.py
Python
class SafeRotate(DualTransform):
    """Rotate the input inside the input's frame by an angle selected randomly from the uniform distribution.

    The resulting image may have artifacts in it. After rotation, the image may have a different aspect ratio, and
    after resizing, it returns to its original shape with the original aspect ratio of the image. For these reasons we
    may see some artifacts.

    Args:
        limit ((int, int) or int): range from which a random angle is picked. If limit is a single int
            an angle is picked from (-limit, limit). Default: (-90, 90)
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        limit: Union[float, Tuple[float, float]] = 90,
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: Optional[ColorType] = None,
        mask_value: Optional[ColorType] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.limit = to_tuple(limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def apply(self, img: np.ndarray, matrix: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
        return F.safe_rotate(img, matrix, cast(int, self.interpolation), self.value, self.border_mode)

    def apply_to_mask(self, mask: np.ndarray, matrix: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
        return F.safe_rotate(mask, matrix, cv2.INTER_NEAREST, self.mask_value, self.border_mode)

    def apply_to_bbox(self, bbox: BoxInternalType, cols: int = 0, rows: int = 0, **params: Any) -> BoxInternalType:
        return F.bbox_safe_rotate(bbox, params["matrix"], cols, rows)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        angle: float = 0,
        scale_x: float = 0,
        scale_y: float = 0,
        cols: int = 0,
        rows: int = 0,
        **params: Any,
    ) -> KeypointInternalType:
        return F.keypoint_safe_rotate(keypoint, params["matrix"], angle, scale_x, scale_y, cols, rows)

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        angle = random.uniform(self.limit[0], self.limit[1])

        image = params["image"]
        height, width = image.shape[:2]

        # https://stackoverflow.com/questions/43892506/opencv-python-rotate-image-without-cropping-sides
        image_center = (width / 2, height / 2)

        # Rotation Matrix
        rotation_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)

        # rotation calculates the cos and sin, taking absolutes of those.
        abs_cos = abs(rotation_mat[0, 0])
        abs_sin = abs(rotation_mat[0, 1])

        # find the new width and height bounds
        new_w = math.ceil(height * abs_sin + width * abs_cos)
        new_h = math.ceil(height * abs_cos + width * abs_sin)

        scale_x = width / new_w
        scale_y = height / new_h

        # Shift the image to create padding
        rotation_mat[0, 2] += new_w / 2 - image_center[0]
        rotation_mat[1, 2] += new_h / 2 - image_center[1]

        # Rescale to original size
        scale_mat = np.diag(np.ones(3))
        scale_mat[0, 0] *= scale_x
        scale_mat[1, 1] *= scale_y
        _tmp = np.diag(np.ones(3))
        _tmp[:2] = rotation_mat
        _tmp = scale_mat @ _tmp
        rotation_mat = _tmp[:2]

        return {"matrix": rotation_mat, "angle": angle, "scale_x": scale_x, "scale_y": scale_y}

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str]:
        return ("limit", "interpolation", "border_mode", "value", "mask_value")

transforms

class Affine (scale=None, translate_percent=None, translate_px=None, rotate=None, shear=None, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode=0, fit_output=False, keep_ratio=False, rotate_method='largest_box', always_apply=False, p=0.5) [view source on GitHub]

Augmentation to apply affine transformations to images. This is mostly a wrapper around the corresponding classes and functions in OpenCV.

Affine transformations involve:

- Translation ("move" image on the x-/y-axis)
- Rotation
- Scaling ("zoom" in/out)
- Shear (move one side of the image, turning a square into a trapezoid)

All such transformations can create "new" pixels in the image without a defined content, e.g. if the image is translated to the left, pixels are created on the right. A method has to be defined to deal with these pixel values. The parameters cval and mode of this class deal with this.

Some transformations involve interpolations between several pixels of the input image to generate output pixel values. The parameters interpolation and mask_interpolation deal with the method of interpolation used for this.

Parameters:

Name Type Description
scale number, tuple of number or dict

Scaling factor to use, where 1.0 denotes "no change" and 0.5 zooms out to 50 percent of the original size.

* If a single number, then that value will be used for all images.
* If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b]. The same range will be used for both x- and y-axis. To keep the aspect ratio, set keep_ratio=True; then the same value will be used for both axes.
* If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes; sampling then happens independently per axis, resulting in samples that differ between the axes. Note that when keep_ratio=True, the x- and y-axis ranges should be the same.

translate_percent None, number, tuple of number or dict

Translation as a fraction of the image height/width (x-translation, y-translation), where 0 denotes "no change" and 0.5 denotes "half of the axis size".

* If None, then equivalent to 0.0 unless translate_px has a value other than None.
* If a single number, then that value will be used for all images.
* If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b]. That sampled fraction value will be used identically for both x- and y-axis.
* If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes; sampling then happens independently per axis, resulting in samples that differ between the axes.

translate_px None, int, tuple of int or dict

Translation in pixels.

* If None, then equivalent to 0 unless translate_percent has a value other than None.
* If a single int, then that value will be used for all images.
* If a tuple (a, b), then a value will be uniformly sampled per image from the discrete interval [a..b]. That number will be used identically for both x- and y-axis.
* If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes; sampling then happens independently per axis, resulting in samples that differ between the axes.

rotate number or tuple of number

Rotation in degrees (NOT radians), i.e. expected value range is around [-360, 360]. Rotation happens around the center of the image, not the top left corner as in some other frameworks.

* If a number, then that value will be used for all images.
* If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b] and used as the rotation value.

shear number, tuple of number or dict

Shear in degrees (NOT radians), i.e. expected value range is around [-360, 360], with reasonable values being in the range of [-45, 45].

* If a number, then that value will be used for all images as the shear on the x-axis (no shear on the y-axis will be done).
* If a tuple (a, b), then two values will be uniformly sampled per image from the interval [a, b] and used as the x- and y-shear values.
* If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes; sampling then happens independently per axis, resulting in samples that differ between the axes.

interpolation int

OpenCV interpolation flag.

mask_interpolation int

OpenCV interpolation flag.

cval number or sequence of number

The constant value to use when filling in newly created pixels. (E.g. translating by 1px to the right will create a new 1px-wide column of pixels on the left of the image). The value is only used when mode=cv2.BORDER_CONSTANT. The expected value range is [0, 255] for uint8 images.

cval_mask number or tuple of number

Same as cval but only for masks.

mode int

OpenCV border flag.

fit_output bool

If True, the image plane size and position will be adjusted to tightly capture the whole image after affine transformation (translate_percent and translate_px are ignored). Otherwise (False), parts of the transformed image may end up outside the image plane. Fitting the output shape can be useful to avoid corners of the image being outside the image plane after applying rotations. Default: False

keep_ratio bool

When True, the original aspect ratio will be kept when the random scale is applied. Default: False.

rotate_method str

rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse"[1]. Default: "largest_box"

p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, keypoints, bboxes

Image types: uint8, float32
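
Example

A minimal usage sketch combining several of the parameters above; the values are illustrative, not recommended defaults:

Python
import albumentations as A
import cv2
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

transform = A.Affine(
    scale=(0.9, 1.1),                 # zoom in/out by up to 10%
    translate_percent={"x": (-0.1, 0.1), "y": (-0.1, 0.1)},
    rotate=(-15, 15),                 # degrees, around the image center
    shear=(-10, 10),                  # degrees, applied to both axes
    mode=cv2.BORDER_CONSTANT,         # fill newly created pixels with cval
    cval=0,
    keep_ratio=True,                  # same sampled scale for x and y
    p=1.0,
)
augmented = transform(image=image)["image"]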

Source code in albumentations/augmentations/geometric/transforms.py
Python
class Affine(DualTransform):
    """Augmentation to apply affine transformations to images.
    This is mostly a wrapper around the corresponding classes and functions in OpenCV.

    Affine transformations involve:

        - Translation ("move" image on the x-/y-axis)
        - Rotation
        - Scaling ("zoom" in/out)
        - Shear (move one side of the image, turning a square into a trapezoid)

    All such transformations can create "new" pixels in the image without a defined content, e.g.
    if the image is translated to the left, pixels are created on the right.
    A method has to be defined to deal with these pixel values.
    The parameters `cval` and `mode` of this class deal with this.

    Some transformations involve interpolations between several pixels
    of the input image to generate output pixel values. The parameters `interpolation` and
    `mask_interpolation` deal with the method of interpolation used for this.

    Args:
        scale (number, tuple of number or dict): Scaling factor to use, where ``1.0`` denotes "no change" and
            ``0.5`` is zoomed out to ``50`` percent of the original size.
                * If a single number, then that value will be used for all images.
                * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``.
                  That the same range will be used for both x- and y-axis. To keep the aspect ratio, set
                  ``keep_ratio=True``, then the same value will be used for both x- and y-axis.
                * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
                  Each of these keys can have the same values as described above.
                  Using a dictionary allows to set different values for the two axis and sampling will then happen
                  *independently* per axis, resulting in samples that differ between the axes. Note that when
                  the ``keep_ratio=True``, the x- and y-axis ranges should be the same.
        translate_percent (None, number, tuple of number or dict): Translation as a fraction of the image height/width
            (x-translation, y-translation), where ``0`` denotes "no change"
            and ``0.5`` denotes "half of the axis size".
                * If ``None`` then equivalent to ``0.0`` unless `translate_px` has a value other than ``None``.
                * If a single number, then that value will be used for all images.
                * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``.
                  That sampled fraction value will be used identically for both x- and y-axis.
                * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
                  Each of these keys can have the same values as described above.
                  Using a dictionary allows to set different values for the two axis and sampling will then happen
                  *independently* per axis, resulting in samples that differ between the axes.
        translate_px (None, int, tuple of int or dict): Translation in pixels.
                * If ``None`` then equivalent to ``0`` unless `translate_percent` has a value other than ``None``.
                * If a single int, then that value will be used for all images.
                * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from
                  the discrete interval ``[a..b]``. That number will be used identically for both x- and y-axis.
                * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
                  Each of these keys can have the same values as described above.
                  Using a dictionary allows to set different values for the two axis and sampling will then happen
                  *independently* per axis, resulting in samples that differ between the axes.
        rotate (number or tuple of number): Rotation in degrees (**NOT** radians), i.e. expected value range is
            around ``[-360, 360]``. Rotation happens around the *center* of the image,
            not the top left corner as in some other frameworks.
                * If a number, then that value will be used for all images.
                * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``
                  and used as the rotation value.
        shear (number, tuple of number or dict): Shear in degrees (**NOT** radians), i.e. expected value range is
            around ``[-360, 360]``, with reasonable values being in the range of ``[-45, 45]``.
                * If a number, then that value will be used for all images as
                  the shear on the x-axis (no shear on the y-axis will be done).
                * If a tuple ``(a, b)``, then two value will be uniformly sampled per image
                  from the interval ``[a, b]`` and be used as the x- and y-shear value.
                * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
                  Each of these keys can have the same values as described above.
                  Using a dictionary allows to set different values for the two axis and sampling will then happen
                  *independently* per axis, resulting in samples that differ between the axes.
        interpolation (int): OpenCV interpolation flag.
        mask_interpolation (int): OpenCV interpolation flag.
        cval (number or sequence of number): The constant value to use when filling in newly created pixels.
            (E.g. translating by 1px to the right will create a new 1px-wide column of pixels
            on the left of the image).
            The value is only used when `mode=constant`. The expected value range is ``[0, 255]`` for ``uint8`` images.
        cval_mask (number or tuple of number): Same as cval but only for masks.
        mode (int): OpenCV border flag.
        fit_output (bool): If True, the image plane size and position will be adjusted to tightly capture
            the whole image after affine transformation (`translate_percent` and `translate_px` are ignored).
            Otherwise (``False``),  parts of the transformed image may end up outside the image plane.
            Fitting the output shape can be useful to avoid corners of the image being outside the image plane
            after applying rotations. Default: False
        keep_ratio (bool): When True, the original aspect ratio will be kept when the random scale is applied.
                           Default: False.
        rotate_method (str): rotation method used for the bounding boxes. Should be one of "largest_box" or
            "ellipse"[1].
            Default: "largest_box"
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    Reference:
        [1] https://arxiv.org/abs/2109.13488

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        scale: Optional[Union[ScaleFloatType, Dict[str, Any]]] = None,
        translate_percent: Optional[Union[float, Tuple[float, float], Dict[str, Any]]] = None,
        translate_px: Optional[Union[int, Tuple[int, int], Dict[str, Any]]] = None,
        rotate: Optional[ScaleFloatType] = None,
        shear: Optional[Union[ScaleFloatType, Dict[str, Any]]] = None,
        interpolation: int = cv2.INTER_LINEAR,
        mask_interpolation: int = cv2.INTER_NEAREST,
        cval: Union[float, Tuple[float, float]] = 0,
        cval_mask: Union[float, Tuple[float, float]] = 0,
        mode: int = cv2.BORDER_CONSTANT,
        fit_output: bool = False,
        keep_ratio: bool = False,
        rotate_method: str = "largest_box",
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)

        params = [scale, translate_percent, translate_px, rotate, shear]
        if all(p is None for p in params):
            scale = {"x": (0.9, 1.1), "y": (0.9, 1.1)}
            translate_percent = {"x": (-0.1, 0.1), "y": (-0.1, 0.1)}
            rotate = (-15, 15)
            shear = {"x": (-10, 10), "y": (-10, 10)}
        else:
            scale = scale if scale is not None else 1.0
            rotate = rotate if rotate is not None else 0.0
            shear = shear if shear is not None else 0.0

        self.interpolation = interpolation
        self.mask_interpolation = mask_interpolation
        self.cval = cval
        self.cval_mask = cval_mask
        self.mode = mode
        self.scale = self._handle_dict_arg(scale, "scale")
        self.translate_percent, self.translate_px = self._handle_translate_arg(translate_px, translate_percent)
        self.rotate = to_tuple(rotate, rotate)
        self.fit_output = fit_output
        self.shear = self._handle_dict_arg(shear, "shear")
        self.keep_ratio = keep_ratio
        self.rotate_method = rotate_method

        if self.keep_ratio and self.scale["x"] != self.scale["y"]:
            raise ValueError(f"When keep_ratio is True, the x and y scale range should be identical. got {self.scale}")

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (
            "interpolation",
            "mask_interpolation",
            "cval",
            "mode",
            "scale",
            "translate_percent",
            "translate_px",
            "rotate",
            "fit_output",
            "shear",
            "cval_mask",
            "keep_ratio",
            "rotate_method",
        )

    @staticmethod
    def _handle_dict_arg(
        val: Union[float, Tuple[float, float], Dict[str, Any]], name: str, default: float = 1.0
    ) -> Dict[str, Any]:
        if isinstance(val, dict):
            if "x" not in val and "y" not in val:
                raise ValueError(
                    f'Expected {name} dictionary to contain at least key "x" or ' 'key "y". Found neither of them.'
                )
            x = val.get("x", default)
            y = val.get("y", default)
            return {"x": to_tuple(x, x), "y": to_tuple(y, y)}
        return {"x": to_tuple(val, val), "y": to_tuple(val, val)}

    @classmethod
    def _handle_translate_arg(
        cls,
        translate_px: Optional[Union[float, Tuple[float, float], Dict[str, Any]]],
        translate_percent: Optional[Union[float, Tuple[float, float], Dict[str, Any]]],
    ) -> Any:
        if translate_percent is None and translate_px is None:
            translate_px = 0

        if translate_percent is not None and translate_px is not None:
            msg = "Expected either translate_percent or translate_px to be " "provided, " "but neither of them was."
            raise ValueError(msg)

        if translate_percent is not None:
            # translate by percent
            return cls._handle_dict_arg(translate_percent, "translate_percent", default=0.0), translate_px

        if translate_px is None:
            msg = "translate_px is None."
            raise ValueError(msg)
        # translate by pixels
        return translate_percent, cls._handle_dict_arg(translate_px, "translate_px")

    def apply(
        self,
        img: np.ndarray,
        matrix: skimage.transform.ProjectiveTransform = None,
        output_shape: Sequence[int] = (),
        **params: Any,
    ) -> np.ndarray:
        return F.warp_affine(
            img,
            matrix,
            interpolation=cast(int, self.interpolation),
            cval=self.cval,
            mode=self.mode,
            output_shape=output_shape,
        )

    def apply_to_mask(
        self,
        mask: np.ndarray,
        matrix: skimage.transform.ProjectiveTransform = None,
        output_shape: Sequence[int] = (),
        **params: Any,
    ) -> np.ndarray:
        return F.warp_affine(
            mask,
            matrix,
            interpolation=self.mask_interpolation,
            cval=self.cval_mask,
            mode=self.mode,
            output_shape=output_shape,
        )

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        matrix: skimage.transform.ProjectiveTransform = None,
        rows: int = 0,
        cols: int = 0,
        output_shape: Sequence[int] = (),
        **params: Any,
    ) -> BoxInternalType:
        return F.bbox_affine(bbox, matrix, self.rotate_method, rows, cols, output_shape)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        matrix: Optional[skimage.transform.ProjectiveTransform] = None,
        scale: Optional[Dict[str, Any]] = None,
        **params: Any,
    ) -> KeypointInternalType:
        if scale is None:
            msg = "Expected scale to be provided, but got None."
            raise ValueError(msg)
        if matrix is None:
            msg = "Expected matrix to be provided, but got None."
            raise ValueError(msg)

        return F.keypoint_affine(keypoint, matrix=matrix, scale=scale)

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        height, width = params["image"].shape[:2]

        translate: Dict[str, Union[int, float]]
        if self.translate_px is not None:
            translate = {key: random.randint(*value) for key, value in self.translate_px.items()}
        elif self.translate_percent is not None:
            translate = {key: random.uniform(*value) for key, value in self.translate_percent.items()}
            translate["x"] = translate["x"] * width
            translate["y"] = translate["y"] * height
        else:
            translate = {"x": 0, "y": 0}

        # Look to issue https://github.com/albumentations-team/albumentations/issues/1079
        shear = {key: -random.uniform(*value) for key, value in self.shear.items()}
        scale = {key: random.uniform(*value) for key, value in self.scale.items()}
        if self.keep_ratio:
            scale["y"] = scale["x"]

        # Look to issue https://github.com/albumentations-team/albumentations/issues/1079
        rotate = -random.uniform(*self.rotate)

        # for images we use additional shifts of (0.5, 0.5) as otherwise
        # we get an ugly black border for 90deg rotations
        shift_x = width / 2 - 0.5
        shift_y = height / 2 - 0.5

        matrix_to_topleft = skimage.transform.SimilarityTransform(translation=[-shift_x, -shift_y])
        matrix_shear_y_rot = skimage.transform.AffineTransform(rotation=-np.pi / 2)
        matrix_shear_y = skimage.transform.AffineTransform(shear=np.deg2rad(shear["y"]))
        matrix_shear_y_rot_inv = skimage.transform.AffineTransform(rotation=np.pi / 2)
        matrix_transforms = skimage.transform.AffineTransform(
            scale=(scale["x"], scale["y"]),
            translation=(translate["x"], translate["y"]),
            rotation=np.deg2rad(rotate),
            shear=np.deg2rad(shear["x"]),
        )
        matrix_to_center = skimage.transform.SimilarityTransform(translation=[shift_x, shift_y])
        matrix = (
            matrix_to_topleft
            + matrix_shear_y_rot
            + matrix_shear_y
            + matrix_shear_y_rot_inv
            + matrix_transforms
            + matrix_to_center
        )
        if self.fit_output:
            matrix, output_shape = self._compute_affine_warp_output_shape(matrix, params["image"].shape)
        else:
            output_shape = params["image"].shape

        return {
            "rotate": rotate,
            "scale": scale,
            "matrix": matrix,
            "output_shape": output_shape,
        }

    @staticmethod
    def _compute_affine_warp_output_shape(
        matrix: skimage.transform.ProjectiveTransform, input_shape: Sequence[int]
    ) -> Tuple[skimage.transform.ProjectiveTransform, Sequence[int]]:
        height, width = input_shape[:2]

        if height == 0 or width == 0:
            return matrix, input_shape

        # determine shape of output image
        corners = np.array([[0, 0], [0, height - 1], [width - 1, height - 1], [width - 1, 0]])
        corners = matrix(corners)
        minc = corners[:, 0].min()
        minr = corners[:, 1].min()
        maxc = corners[:, 0].max()
        maxr = corners[:, 1].max()
        out_height = maxr - minr + 1
        out_width = maxc - minc + 1
        if len(input_shape) == THREE:
            output_shape = np.ceil((out_height, out_width, input_shape[2]))
        else:
            output_shape = np.ceil((out_height, out_width))
        output_shape_tuple = tuple([int(v) for v in output_shape.tolist()])
        # fit output image in new shape
        translation = (-minc, -minr)
        matrix_to_fit = skimage.transform.SimilarityTransform(translation=translation)
        matrix = matrix + matrix_to_fit
        return matrix, output_shape_tuple
class ElasticTransform (alpha=1, sigma=50, alpha_affine=50, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, approximate=False, same_dxdy=False, p=0.5) [view source on GitHub]

Elastic deformation of images as described in [Simard2003]_ (with modifications). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.

Parameters:

Name Type Description
alpha float

Scaling factor for the random displacement field that controls the intensity of the deformation.

sigma float

Gaussian filter parameter.

alpha_affine float

The range will be (-alpha_affine, alpha_affine)

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

approximate boolean

Whether to smooth the displacement map with a fixed kernel size. Enabling this option gives a ~2X speedup on large images.

same_dxdy boolean

Whether to use the same randomly generated shift for x and y. Enabling this option gives a ~2X speedup.

Targets

image, mask, bboxes

Image types: uint8, float32
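
Example

A minimal usage sketch, assuming a uint8 image and a single-channel mask; the mask is warped with the same displacement field but with nearest-neighbor interpolation:

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)

transform = A.ElasticTransform(alpha=1, sigma=50, alpha_affine=50, p=1.0)
out = transform(image=image, mask=mask)
deformed_image, deformed_mask = out["image"], out["mask"]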

Source code in albumentations/augmentations/geometric/transforms.py
Python
class ElasticTransform(DualTransform):
    """Elastic deformation of images as described in [Simard2003]_ (with modifications).
    Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

    .. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for
         Convolutional Neural Networks applied to Visual Document Analysis", in
         Proc. of the International Conference on Document Analysis and
         Recognition, 2003.

    Args:
        alpha (float):
        sigma (float): Gaussian filter parameter.
        alpha_affine (float): The range will be (-alpha_affine, alpha_affine)
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
        approximate (boolean): Whether to smooth displacement map with fixed kernel size.
                               Enabling this option gives ~2X speedup on large images.
        same_dxdy (boolean): Whether to use same random generated shift for x and y.
                             Enabling this option gives ~2X speedup.

    Targets:
        image, mask, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)

    def __init__(
        self,
        alpha: float = 1,
        sigma: float = 50,
        alpha_affine: float = 50,
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: Optional[Union[int, float, List[int], List[float]]] = None,
        mask_value: Optional[Union[int, float, List[int], List[float]]] = None,
        always_apply: bool = False,
        approximate: bool = False,
        same_dxdy: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.alpha = alpha
        self.alpha_affine = alpha_affine
        self.sigma = sigma
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value
        self.approximate = approximate
        self.same_dxdy = same_dxdy

    def apply(
        self, img: np.ndarray, random_state: Optional[int] = None, interpolation: int = cv2.INTER_LINEAR, **params: Any
    ) -> np.ndarray:
        return F.elastic_transform(
            img,
            self.alpha,
            self.sigma,
            self.alpha_affine,
            interpolation,
            self.border_mode,
            self.value,
            np.random.RandomState(random_state),
            self.approximate,
            self.same_dxdy,
        )

    def apply_to_mask(self, mask: np.ndarray, random_state: Optional[int] = None, **params: Any) -> np.ndarray:
        return F.elastic_transform(
            mask,
            self.alpha,
            self.sigma,
            self.alpha_affine,
            cv2.INTER_NEAREST,
            self.border_mode,
            self.mask_value,
            np.random.RandomState(random_state),
            self.approximate,
            self.same_dxdy,
        )

    def apply_to_bbox(
        self, bbox: BoxInternalType, random_state: Optional[int] = None, **params: Any
    ) -> BoxInternalType:
        rows, cols = params["rows"], params["cols"]
        mask = np.zeros((rows, cols), dtype=np.uint8)
        bbox_denorm = F.denormalize_bbox(bbox, rows, cols)
        x_min, y_min, x_max, y_max = bbox_denorm[:4]
        x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
        mask[y_min:y_max, x_min:x_max] = 1
        mask = F.elastic_transform(
            mask,
            self.alpha,
            self.sigma,
            self.alpha_affine,
            cv2.INTER_NEAREST,
            self.border_mode,
            self.mask_value,
            np.random.RandomState(random_state),
            self.approximate,
        )
        bbox_returned = bbox_from_mask(mask)
        return cast(BoxInternalType, F.normalize_bbox(bbox_returned, rows, cols))

    def get_params(self) -> Dict[str, int]:
        return {"random_state": random.randint(0, 10000)}

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (
            "alpha",
            "sigma",
            "alpha_affine",
            "interpolation",
            "border_mode",
            "value",
            "mask_value",
            "approximate",
            "same_dxdy",
        )
class Flip [view source on GitHub]

Flip the input either horizontally, vertically or both horizontally and vertically.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
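
Example

A minimal usage sketch; the flip direction d is sampled internally from {-1, 0, 1} as described in apply below, so even with p=1.0 the direction itself is random:

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

transform = A.Flip(p=1.0)  # always flips, but the direction is random
flipped = transform(image=image)["image"]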

Source code in albumentations/augmentations/geometric/transforms.py
Python
class Flip(DualTransform):
    """Flip the input either horizontally, vertically or both horizontally and vertically.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(self, img: np.ndarray, d: int = 0, **params: Any) -> np.ndarray:
        """Args:
        d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping,
                -1 for both vertical and horizontal flipping (which could also be seen as rotating the input by
                180 degrees).
        """
        return F.random_flip(img, d)

    def get_params(self) -> Dict[str, int]:
        # Random int in the range [-1, 1]
        return {"d": random.randint(-1, 1)}

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return F.bbox_flip(bbox, **params)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return F.keypoint_flip(keypoint, **params)

    def get_transform_init_args_names(self) -> Tuple[()]:
        return ()
apply (self, img, d=0, **params)

d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping, -1 for both vertical and horizontal flipping (which could also be seen as rotating the input by 180 degrees).

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(self, img: np.ndarray, d: int = 0, **params: Any) -> np.ndarray:
    """Args:
    d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping,
            -1 for both vertical and horizontal flipping (which could also be seen as rotating the input by
            180 degrees).
    """
    return F.random_flip(img, d)
class GridDistortion (num_steps=5, distort_limit=0.3, interpolation=1, border_mode=4, value=None, mask_value=None, normalized=False, always_apply=False, p=0.5) [view source on GitHub]

Parameters:

Name Type Description
num_steps int

count of grid cells on each side.

distort_limit float or (float, float)

If distort_limit is a single float, the range will be (-distort_limit, distort_limit). Default: (-0.3, 0.3).

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

normalized bool

if true, the distortion will be normalized so that it does not go outside the image. Default: False. See https://github.com/albumentations-team/albumentations/pull/722 for more information.

Targets

image, mask, bboxes

Image types: uint8, float32
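
Example

A minimal usage sketch; with normalized=True the sampled distortion is rescaled so that it never leaves the image bounds:

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

transform = A.GridDistortion(num_steps=5, distort_limit=0.3, normalized=True, p=1.0)
distorted = transform(image=image)["image"]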

Source code in albumentations/augmentations/geometric/transforms.py
Python
class GridDistortion(DualTransform):
    """Args:
        num_steps (int): count of grid cells on each side.
        distort_limit (float, (float, float)): If distort_limit is a single float, the range
            will be (-distort_limit, distort_limit). Default: (-0.3, 0.3).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
        normalized (bool): if true, distortion will be normalized so that it does not go outside the image.
            Default: False. See for more information: https://github.com/albumentations-team/albumentations/pull/722

    Targets:
        image, mask, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)

    def __init__(
        self,
        num_steps: int = 5,
        distort_limit: ScaleFloatType = 0.3,
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: Optional[ImageColorType] = None,
        mask_value: Optional[ImageColorType] = None,
        normalized: bool = False,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        self.num_steps = num_steps
        self.distort_limit = to_tuple(distort_limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value
        self.normalized = normalized

    def apply(
        self,
        img: np.ndarray,
        stepsx: Tuple[()] = (),
        stepsy: Tuple[()] = (),
        interpolation: int = cv2.INTER_LINEAR,
        **params: Any,
    ) -> np.ndarray:
        return F.grid_distortion(img, self.num_steps, stepsx, stepsy, interpolation, self.border_mode, self.value)

    def apply_to_mask(
        self, mask: np.ndarray, stepsx: Tuple[()] = (), stepsy: Tuple[()] = (), **params: Any
    ) -> np.ndarray:
        return F.grid_distortion(
            mask, self.num_steps, stepsx, stepsy, cv2.INTER_NEAREST, self.border_mode, self.mask_value
        )

    def apply_to_bbox(
        self, bbox: BoxInternalType, stepsx: Tuple[()] = (), stepsy: Tuple[()] = (), **params: Any
    ) -> BoxInternalType:
        rows, cols = params["rows"], params["cols"]
        mask = np.zeros((rows, cols), dtype=np.uint8)
        bbox_denorm = F.denormalize_bbox(bbox, rows, cols)
        x_min, y_min, x_max, y_max = bbox_denorm[:4]
        x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
        mask[y_min:y_max, x_min:x_max] = 1
        mask = F.grid_distortion(
            mask, self.num_steps, stepsx, stepsy, cv2.INTER_NEAREST, self.border_mode, self.mask_value
        )
        bbox_returned = bbox_from_mask(mask)
        return cast(BoxInternalType, F.normalize_bbox(bbox_returned, rows, cols))

    def _normalize(self, h: int, w: int, xsteps: List[float], ysteps: List[float]) -> Dict[str, Any]:
        # compensate for smaller last steps in source image.
        x_step = w // self.num_steps
        last_x_step = min(w, ((self.num_steps + 1) * x_step)) - (self.num_steps * x_step)
        xsteps[-1] *= last_x_step / x_step

        y_step = h // self.num_steps
        last_y_step = min(h, ((self.num_steps + 1) * y_step)) - (self.num_steps * y_step)
        ysteps[-1] *= last_y_step / y_step

        # now normalize such that distortion never leaves image bounds.
        tx = w / math.floor(w / self.num_steps)
        ty = h / math.floor(h / self.num_steps)
        xsteps = np.array(xsteps) * (tx / np.sum(xsteps))
        ysteps = np.array(ysteps) * (ty / np.sum(ysteps))

        return {"stepsx": xsteps, "stepsy": ysteps}

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        height, width = params["image"].shape[:2]

        stepsx = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]
        stepsy = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]

        if self.normalized:
            return self._normalize(height, width, stepsx, stepsy)

        return {"stepsx": stepsx, "stepsy": stepsy}

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return "num_steps", "distort_limit", "interpolation", "border_mode", "value", "mask_value", "normalized"
class HorizontalFlip [view source on GitHub]

Flip the input horizontally around the y-axis.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
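
Example

A minimal usage sketch; in a training pipeline this transform is typically left at its default p=0.5 so that roughly half of the samples are flipped:

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

transform = A.Compose([A.HorizontalFlip(p=0.5)])
maybe_flipped = transform(image=image)["image"]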

Source code in albumentations/augmentations/geometric/transforms.py
Python
class HorizontalFlip(DualTransform):
    """Flip the input horizontally around the y-axis.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        if img.ndim == THREE and img.shape[2] > 1 and img.dtype == np.uint8:
            # OpenCV is faster than NumPy only in the case of
            # non-grayscale 8-bit images
            return F.hflip_cv2(img)

        return F.hflip(img)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return F.bbox_hflip(bbox, **params)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return F.keypoint_hflip(keypoint, **params)

    def get_transform_init_args_names(self) -> Tuple[()]:
        return ()
class OpticalDistortion (distort_limit=0.05, shift_limit=0.05, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, p=0.5) [view source on GitHub]

Parameters:

Name Type Description
distort_limit float or (float, float)

If distort_limit is a single float, the range will be (-distort_limit, distort_limit). Default: (-0.05, 0.05).

shift_limit float or (float, float)

If shift_limit is a single float, the range will be (-shift_limit, shift_limit). Default: (-0.05, 0.05).

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

Targets

image, mask, bboxes

Image types: uint8, float32
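
Example

A minimal usage sketch with the default distortion and shift ranges:

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

transform = A.OpticalDistortion(distort_limit=0.05, shift_limit=0.05, p=1.0)
distorted = transform(image=image)["image"]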

Source code in albumentations/augmentations/geometric/transforms.py
Python
class OpticalDistortion(DualTransform):
    """Args:
        distort_limit (float, (float, float)): If distort_limit is a single float, the range
            will be (-distort_limit, distort_limit). Default: (-0.05, 0.05).
        shift_limit (float, (float, float)): If shift_limit is a single float, the range
            will be (-shift_limit, shift_limit). Default: (-0.05, 0.05).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

    Targets:
        image, mask, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)

    def __init__(
        self,
        distort_limit: ScaleFloatType = 0.05,
        shift_limit: ScaleFloatType = 0.05,
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: Optional[ImageColorType] = None,
        mask_value: Optional[ImageColorType] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.shift_limit = to_tuple(shift_limit)
        self.distort_limit = to_tuple(distort_limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def apply(
        self,
        img: np.ndarray,
        k: int = 0,
        dx: int = 0,
        dy: int = 0,
        interpolation: int = cv2.INTER_LINEAR,
        **params: Any,
    ) -> np.ndarray:
        return F.optical_distortion(img, k, dx, dy, interpolation, self.border_mode, self.value)

    def apply_to_mask(self, mask: np.ndarray, k: int = 0, dx: int = 0, dy: int = 0, **params: Any) -> np.ndarray:
        return F.optical_distortion(mask, k, dx, dy, cv2.INTER_NEAREST, self.border_mode, self.mask_value)

    def apply_to_bbox(
        self, bbox: BoxInternalType, k: int = 0, dx: int = 0, dy: int = 0, **params: Any
    ) -> BoxInternalType:
        rows, cols = params["rows"], params["cols"]
        mask = np.zeros((rows, cols), dtype=np.uint8)
        bbox_denorm = F.denormalize_bbox(bbox, rows, cols)
        x_min, y_min, x_max, y_max = bbox_denorm[:4]
        x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
        mask[y_min:y_max, x_min:x_max] = 1
        mask = F.optical_distortion(mask, k, dx, dy, cv2.INTER_NEAREST, self.border_mode, self.mask_value)
        bbox_returned = bbox_from_mask(mask)
        return cast(BoxInternalType, F.normalize_bbox(bbox_returned, rows, cols))

    def get_params(self) -> Dict[str, Any]:
        return {
            "k": random.uniform(self.distort_limit[0], self.distort_limit[1]),
            "dx": round(random.uniform(self.shift_limit[0], self.shift_limit[1])),
            "dy": round(random.uniform(self.shift_limit[0], self.shift_limit[1])),
        }

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (
            "distort_limit",
            "shift_limit",
            "interpolation",
            "border_mode",
            "value",
            "mask_value",
        )
class PadIfNeeded (min_height=1024, min_width=1024, pad_height_divisor=None, pad_width_divisor=None, position=<PositionType.CENTER: 'center'>, border_mode=4, value=None, mask_value=None, always_apply=False, p=1.0) [view source on GitHub]

Pad the sides of the image if its size is less than the desired minimum.

Parameters:

Name Type Description
min_height int

minimal result image height.

min_width int

minimal result image width.

pad_height_divisor int

if not None, ensures that the image height is divisible by this value.

pad_width_divisor int

if not None, ensures that the image width is divisible by this value.

position Union[str, PositionType]

Position of the image. Should be one of PositionType.CENTER, PositionType.TOP_LEFT, PositionType.TOP_RIGHT, PositionType.BOTTOM_LEFT, PositionType.BOTTOM_RIGHT, or PositionType.RANDOM. Default: PositionType.CENTER.

border_mode OpenCV flag

OpenCV border mode.

value int, float, list of int, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of int, list of float

padding value for mask if border_mode is cv2.BORDER_CONSTANT.

p float

probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
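
Example

A minimal usage sketch showing both padding modes: padding to a fixed minimum size, and padding to the nearest multiple of a divisor (min_height and min_width must be set to None when the divisor form is used):

Python
import albumentations as A
import cv2
import numpy as np

image = np.random.randint(0, 256, (500, 700, 3), dtype=np.uint8)

# Pad to at least 1024 x 1024, filling the border with black.
pad_to_size = A.PadIfNeeded(
    min_height=1024, min_width=1024,
    border_mode=cv2.BORDER_CONSTANT, value=0, p=1.0,
)
assert pad_to_size(image=image)["image"].shape[:2] == (1024, 1024)

# Pad so that both sides become divisible by 32.
pad_to_divisor = A.PadIfNeeded(
    min_height=None, min_width=None,
    pad_height_divisor=32, pad_width_divisor=32, p=1.0,
)
padded = pad_to_divisor(image=image)["image"]
assert padded.shape[0] % 32 == 0 and padded.shape[1] % 32 == 0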

Source code in albumentations/augmentations/geometric/transforms.py
Python
class PadIfNeeded(DualTransform):
    """Pad side of the image / max if side is less than desired number.

    Args:
        min_height (int): minimal result image height.
        min_width (int): minimal result image width.
        pad_height_divisor (int): if not None, ensures image height is divisible by value of this argument.
        pad_width_divisor (int): if not None, ensures image width is divisible by value of this argument.
        position (Union[str, PositionType]): Position of the image. should be PositionType.CENTER or
            PositionType.TOP_LEFT or PositionType.TOP_RIGHT or PositionType.BOTTOM_LEFT or PositionType.BOTTOM_RIGHT.
            or PositionType.RANDOM. Default: PositionType.CENTER.
        border_mode (OpenCV flag): OpenCV border mode.
        value (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of int,
                    list of float): padding value for mask if border_mode is cv2.BORDER_CONSTANT.
        p (float): probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    class PositionType(Enum):
        """Enumerates the types of positions for placing an object within a container.

        This Enum class is utilized to define specific anchor positions that an object can
        assume relative to a container. It's particularly useful in image processing, UI layout,
        and graphic design to specify the alignment and positioning of elements.

        Attributes:
            CENTER (str): Specifies that the object should be placed at the center.
            TOP_LEFT (str): Specifies that the object should be placed at the top-left corner.
            TOP_RIGHT (str): Specifies that the object should be placed at the top-right corner.
            BOTTOM_LEFT (str): Specifies that the object should be placed at the bottom-left corner.
            BOTTOM_RIGHT (str): Specifies that the object should be placed at the bottom-right corner.
            RANDOM (str): Indicates that the object's position should be determined randomly.

        """

        CENTER = "center"
        TOP_LEFT = "top_left"
        TOP_RIGHT = "top_right"
        BOTTOM_LEFT = "bottom_left"
        BOTTOM_RIGHT = "bottom_right"
        RANDOM = "random"

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        min_height: Optional[int] = 1024,
        min_width: Optional[int] = 1024,
        pad_height_divisor: Optional[int] = None,
        pad_width_divisor: Optional[int] = None,
        position: Union[PositionType, str] = PositionType.CENTER,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: Optional[ImageColorType] = None,
        mask_value: Optional[ImageColorType] = None,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        if (min_height is None) == (pad_height_divisor is None):
            msg = "Only one of 'min_height' and 'pad_height_divisor' parameters must be set"
            raise ValueError(msg)

        if (min_width is None) == (pad_width_divisor is None):
            msg = "Only one of 'min_width' and 'pad_width_divisor' parameters must be set"
            raise ValueError(msg)

        super().__init__(always_apply, p)
        self.min_height = min_height
        self.min_width = min_width
        self.pad_width_divisor = pad_width_divisor
        self.pad_height_divisor = pad_height_divisor
        self.position = PadIfNeeded.PositionType(position)
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def update_params(self, params: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
        params = super().update_params(params, **kwargs)
        rows = params["rows"]
        cols = params["cols"]

        if self.min_height is not None:
            if rows < self.min_height:
                h_pad_top = int((self.min_height - rows) / 2.0)
                h_pad_bottom = self.min_height - rows - h_pad_top
            else:
                h_pad_top = 0
                h_pad_bottom = 0
        else:
            pad_remained = rows % self.pad_height_divisor
            pad_rows = self.pad_height_divisor - pad_remained if pad_remained > 0 else 0

            h_pad_top = pad_rows // 2
            h_pad_bottom = pad_rows - h_pad_top

        if self.min_width is not None:
            if cols < self.min_width:
                w_pad_left = int((self.min_width - cols) / 2.0)
                w_pad_right = self.min_width - cols - w_pad_left
            else:
                w_pad_left = 0
                w_pad_right = 0
        else:
            pad_remainder = cols % self.pad_width_divisor
            pad_cols = self.pad_width_divisor - pad_remainder if pad_remainder > 0 else 0

            w_pad_left = pad_cols // 2
            w_pad_right = pad_cols - w_pad_left

        h_pad_top, h_pad_bottom, w_pad_left, w_pad_right = self.__update_position_params(
            h_top=h_pad_top, h_bottom=h_pad_bottom, w_left=w_pad_left, w_right=w_pad_right
        )

        params.update(
            {
                "pad_top": h_pad_top,
                "pad_bottom": h_pad_bottom,
                "pad_left": w_pad_left,
                "pad_right": w_pad_right,
            }
        )
        return params

    def apply(
        self,
        img: np.ndarray,
        pad_top: int = 0,
        pad_bottom: int = 0,
        pad_left: int = 0,
        pad_right: int = 0,
        **params: Any,
    ) -> np.ndarray:
        return F.pad_with_params(
            img,
            pad_top,
            pad_bottom,
            pad_left,
            pad_right,
            border_mode=self.border_mode,
            value=self.value,
        )

    def apply_to_mask(
        self,
        mask: np.ndarray,
        pad_top: int = 0,
        pad_bottom: int = 0,
        pad_left: int = 0,
        pad_right: int = 0,
        **params: Any,
    ) -> np.ndarray:
        return F.pad_with_params(
            mask,
            pad_top,
            pad_bottom,
            pad_left,
            pad_right,
            border_mode=self.border_mode,
            value=self.mask_value,
        )

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        pad_top: int = 0,
        pad_bottom: int = 0,
        pad_left: int = 0,
        pad_right: int = 0,
        rows: int = 0,
        cols: int = 0,
        **params: Any,
    ) -> BoxInternalType:
        x_min, y_min, x_max, y_max = denormalize_bbox(bbox, rows, cols)[:4]
        bbox = x_min + pad_left, y_min + pad_top, x_max + pad_left, y_max + pad_top
        return cast(BoxInternalType, normalize_bbox(bbox, rows + pad_top + pad_bottom, cols + pad_left + pad_right))

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        pad_top: int = 0,
        pad_bottom: int = 0,
        pad_left: int = 0,
        pad_right: int = 0,
        **params: Any,
    ) -> KeypointInternalType:
        x, y, angle, scale = keypoint[:4]
        return x + pad_left, y + pad_top, angle, scale

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (
            "min_height",
            "min_width",
            "pad_height_divisor",
            "pad_width_divisor",
            "position",
            "border_mode",
            "value",
            "mask_value",
        )

    def __update_position_params(
        self, h_top: int, h_bottom: int, w_left: int, w_right: int
    ) -> Tuple[int, int, int, int]:
        if self.position == PadIfNeeded.PositionType.TOP_LEFT:
            h_bottom += h_top
            w_right += w_left
            h_top = 0
            w_left = 0

        elif self.position == PadIfNeeded.PositionType.TOP_RIGHT:
            h_bottom += h_top
            w_left += w_right
            h_top = 0
            w_right = 0

        elif self.position == PadIfNeeded.PositionType.BOTTOM_LEFT:
            h_top += h_bottom
            w_right += w_left
            h_bottom = 0
            w_left = 0

        elif self.position == PadIfNeeded.PositionType.BOTTOM_RIGHT:
            h_top += h_bottom
            w_left += w_right
            h_bottom = 0
            w_right = 0

        elif self.position == PadIfNeeded.PositionType.RANDOM:
            h_pad = h_top + h_bottom
            w_pad = w_left + w_right
            h_top = random.randint(0, h_pad)
            h_bottom = h_pad - h_top
            w_left = random.randint(0, w_pad)
            w_right = w_pad - w_left

        return h_top, h_bottom, w_left, w_right
class PositionType

Enumerates the types of positions for placing an object within a container.

This Enum class is utilized to define specific anchor positions that an object can assume relative to a container. It's particularly useful in image processing, UI layout, and graphic design to specify the alignment and positioning of elements.

Attributes:

Name Type Description
CENTER str

Specifies that the object should be placed at the center.

TOP_LEFT str

Specifies that the object should be placed at the top-left corner.

TOP_RIGHT str

Specifies that the object should be placed at the top-right corner.

BOTTOM_LEFT str

Specifies that the object should be placed at the bottom-left corner.

BOTTOM_RIGHT str

Specifies that the object should be placed at the bottom-right corner.

RANDOM str

Indicates that the object's position should be determined randomly.
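
As a small sketch (parameter values are illustrative), position accepts either an enum member or its string value, since PadIfNeeded.__init__ converts strings via PadIfNeeded.PositionType(position):

Python
import albumentations as A

# Both forms configure the same placement of the original image within the padded canvas.
pad_enum = A.PadIfNeeded(min_height=512, min_width=512,
                         position=A.PadIfNeeded.PositionType.TOP_LEFT, p=1.0)
pad_str = A.PadIfNeeded(min_height=512, min_width=512, position="top_left", p=1.0)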

Source code in albumentations/augmentations/geometric/transforms.py
Python
class PositionType(Enum):
    """Enumerates the types of positions for placing an object within a container.

    This Enum class is utilized to define specific anchor positions that an object can
    assume relative to a container. It's particularly useful in image processing, UI layout,
    and graphic design to specify the alignment and positioning of elements.

    Attributes:
        CENTER (str): Specifies that the object should be placed at the center.
        TOP_LEFT (str): Specifies that the object should be placed at the top-left corner.
        TOP_RIGHT (str): Specifies that the object should be placed at the top-right corner.
        BOTTOM_LEFT (str): Specifies that the object should be placed at the bottom-left corner.
        BOTTOM_RIGHT (str): Specifies that the object should be placed at the bottom-right corner.
        RANDOM (str): Indicates that the object's position should be determined randomly.

    """

    CENTER = "center"
    TOP_LEFT = "top_left"
    TOP_RIGHT = "top_right"
    BOTTOM_LEFT = "bottom_left"
    BOTTOM_RIGHT = "bottom_right"
    RANDOM = "random"
class Perspective (scale=(0.05, 0.1), keep_size=True, pad_mode=0, pad_val=0, mask_pad_val=0, fit_output=False, interpolation=1, always_apply=False, p=0.5) [view source on GitHub]

Perform a random four point perspective transform of the input.

Parameters:

Name Type Description
scale Union[float, Tuple[float, float]]

standard deviation of the normal distributions. These are used to sample the random distances of the subimage's corners from the full image's corners. If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1).

keep_size bool

Whether to resize the image back to its original size after applying the perspective transform. If set to False, the resulting images may end up having different shapes and will always be returned as a list, never an array. Default: True

pad_mode OpenCV flag

OpenCV border mode.

pad_val int, float, list of int, list of float

padding value if border_mode is cv2.BORDER_CONSTANT. Default: 0

mask_pad_val int, float, list of int, list of float

padding value for mask if border_mode is cv2.BORDER_CONSTANT. Default: 0

fit_output bool

If True, the image plane size and position will be adjusted to still capture the whole image after perspective transformation. (Followed by image resizing if keep_size is set to True.) Otherwise, parts of the transformed image may be outside of the image plane. This setting should not be set to True when using large scale values as it could lead to very large images. Default: False

p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, keypoints, bboxes

Image types: uint8, float32
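
A minimal usage sketch (the image size and keypoints are illustrative assumptions): with keep_size=True the warped result is resized back to the input resolution, and keypoints are transformed with the same matrix.

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

aug = A.Compose(
    [A.Perspective(scale=(0.05, 0.1), keep_size=True, p=1.0)],
    keypoint_params=A.KeypointParams(format="xy"),
)
out = aug(image=image, keypoints=[(64, 64), (192, 128)])
print(out["image"].shape)  # (256, 256, 3) because keep_size=True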

Source code in albumentations/augmentations/geometric/transforms.py
Python
class Perspective(DualTransform):
    """Perform a random four point perspective transform of the input.

    Args:
        scale: standard deviation of the normal distributions. These are used to sample
            the random distances of the subimage's corners from the full image's corners.
            If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1).
        keep_size: Whether to resize image back to their original size after applying the perspective
            transform. If set to False, the resulting images may end up having different shapes
            and will always be a list, never an array. Default: True
        pad_mode (OpenCV flag): OpenCV border mode.
        pad_val (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
            Default: 0
        mask_pad_val (int, float, list of int, list of float): padding value for mask
            if border_mode is cv2.BORDER_CONSTANT. Default: 0
        fit_output (bool): If True, the image plane size and position will be adjusted to still capture
            the whole image after perspective transformation. (Followed by image resizing if keep_size is set to True.)
            Otherwise, parts of the transformed image may be outside of the image plane.
            This setting should not be set to True when using large scale values as it could lead to very large images.
            Default: False
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)

    def __init__(
        self,
        scale: ScaleFloatType = (0.05, 0.1),
        keep_size: bool = True,
        pad_mode: int = cv2.BORDER_CONSTANT,
        pad_val: Union[float, List[float]] = 0,
        mask_pad_val: Union[float, List[float]] = 0,
        fit_output: bool = False,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.scale = to_tuple(scale, 0)
        self.keep_size = keep_size
        self.pad_mode = pad_mode
        self.pad_val = pad_val
        self.mask_pad_val = mask_pad_val
        self.fit_output = fit_output
        self.interpolation = interpolation

    def apply(
        self,
        img: np.ndarray,
        matrix: np.ndarray,
        max_height: int,
        max_width: int,
        **params: Any,
    ) -> np.ndarray:
        return F.perspective(
            img, matrix, max_width, max_height, self.pad_val, self.pad_mode, self.keep_size, params["interpolation"]
        )

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        matrix: np.ndarray,
        max_height: int,
        max_width: int,
        **params: Any,
    ) -> BoxInternalType:
        return F.perspective_bbox(bbox, params["rows"], params["cols"], matrix, max_width, max_height, self.keep_size)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        matrix: np.ndarray,
        max_height: int,
        max_width: int,
        **params: Any,
    ) -> np.ndarray:
        return F.perspective_keypoint(
            keypoint, params["rows"], params["cols"], matrix, max_width, max_height, self.keep_size
        )

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        height, width = params["image"].shape[:2]

        scale = random_utils.uniform(*self.scale)
        points = random_utils.normal(0, scale, [4, 2])
        points = np.mod(np.abs(points), 0.32)

        # top left -- no changes needed, just use jitter
        # top right
        points[1, 0] = 1.0 - points[1, 0]  # w = 1.0 - jitter
        # bottom right
        points[2] = 1.0 - points[2]  # w = 1.0 - jitter
        # bottom left
        points[3, 1] = 1.0 - points[3, 1]  # h = 1.0 - jitter

        points[:, 0] *= width
        points[:, 1] *= height

        # Obtain a consistent order of the points and unpack them individually.
        # Warning: don't just do (tl, tr, br, bl) = _order_points(...)
        # here, because the reordered points is used further below.
        points = self._order_points(points)
        tl, tr, br, bl = points

        # compute the width of the new image, which will be the
        # maximum distance between bottom-right and bottom-left
        # x-coordinates or the top-right and top-left x-coordinates
        min_width = None
        max_width = None
        while min_width is None or min_width < TWO:
            width_top = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
            width_bottom = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
            max_width = int(max(width_top, width_bottom))
            min_width = int(min(width_top, width_bottom))
            if min_width < TWO:
                step_size = (2 - min_width) / 2
                tl[0] -= step_size
                tr[0] += step_size
                bl[0] -= step_size
                br[0] += step_size

        # compute the height of the new image, which will be the maximum distance between the top-right
        # and bottom-right y-coordinates or the top-left and bottom-left y-coordinates
        min_height = None
        max_height = None
        while min_height is None or min_height < TWO:
            height_right = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
            height_left = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
            max_height = int(max(height_right, height_left))
            min_height = int(min(height_right, height_left))
            if min_height < TWO:
                step_size = (2 - min_height) / 2
                tl[1] -= step_size
                tr[1] -= step_size
                bl[1] += step_size
                br[1] += step_size

        # now that we have the dimensions of the new image, construct
        # the set of destination points to obtain a "birds eye view",
        # (i.e. top-down view) of the image, again specifying points
        # in the top-left, top-right, bottom-right, and bottom-left order
        # do not use width-1 or height-1 here, as for e.g. width=3, height=2
        # the bottom right coordinate is at (3.0, 2.0) and not (2.0, 1.0)
        dst = np.array([[0, 0], [max_width, 0], [max_width, max_height], [0, max_height]], dtype=np.float32)

        # compute the perspective transform matrix and then apply it
        m = cv2.getPerspectiveTransform(points, dst)

        if self.fit_output:
            m, max_width, max_height = self._expand_transform(m, (height, width))

        return {"matrix": m, "max_height": max_height, "max_width": max_width, "interpolation": self.interpolation}

    @classmethod
    def _expand_transform(cls, matrix: np.ndarray, shape: SizeType) -> Tuple[np.ndarray, int, int]:
        height, width = shape[:2]
        # do not use width-1 or height-1 here, as for e.g. width=3, height=2, max_height
        # the bottom right coordinate is at (3.0, 2.0) and not (2.0, 1.0)
        rect = np.array([[0, 0], [width, 0], [width, height], [0, height]], dtype=np.float32)
        dst = cv2.perspectiveTransform(np.array([rect]), matrix)[0]

        # get min x, y over transformed 4 points
        # then modify target points by subtracting these minima  => shift to (0, 0)
        dst -= dst.min(axis=0, keepdims=True)
        dst = np.around(dst, decimals=0)

        matrix_expanded = cv2.getPerspectiveTransform(rect, dst)
        max_width, max_height = dst.max(axis=0)
        return matrix_expanded, int(max_width), int(max_height)

    @staticmethod
    def _order_points(pts: np.ndarray) -> np.ndarray:
        pts = np.array(sorted(pts, key=lambda x: x[0]))
        left = pts[:2]  # points with smallest x coordinate - left points
        right = pts[2:]  # points with greatest x coordinate - right points

        if left[0][1] < left[1][1]:
            tl, bl = left
        else:
            bl, tl = left

        if right[0][1] < right[1][1]:
            tr, br = right
        else:
            br, tr = right

        return np.array([tl, tr, br, bl], dtype=np.float32)

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return "scale", "keep_size", "pad_mode", "pad_val", "mask_pad_val", "fit_output", "interpolation"
class PiecewiseAffine (scale=(0.03, 0.05), nb_rows=4, nb_cols=4, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode='constant', absolute_scale=False, always_apply=False, keypoints_threshold=0.01, p=0.5) [view source on GitHub]

Apply affine transformations that differ between local neighbourhoods. This augmentation places a regular grid of points on an image and randomly moves the neighbourhood of these point around via affine transformations. This leads to local distortions.

This is mostly a wrapper around scikit-image's PiecewiseAffine. See also Affine for a similar technique.

Note

This augmenter is very slow. Try to use ElasticTransformation instead, which is at least 10x faster.

Note

For coordinate-based inputs (keypoints, bounding boxes, polygons, ...), this augmenter still has to perform an image-based augmentation, which makes it significantly slower for such inputs than other transforms, and not fully correct for them.

Parameters:

Name Type Description
scale float, tuple of float

Each point on the regular grid is moved around via a normal distribution. This scale factor is equivalent to the normal distribution's sigma. Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of the image if absolute_scale=False (default), so this scale can be the same for different sized images. Recommended values are in the range 0.01 to 0.05 (weak to strong augmentations).
- If a single float, then that value will always be used as the scale.
- If a tuple (a, b) of floats, then a random value will be uniformly sampled per image from the interval [a, b].

nb_rows int, tuple of int

Number of rows of points that the regular grid should have. Must be at least 2. For large images, you might want to pick a higher value than 4. You might have to then adjust scale to lower values.
- If a single int, then that value will always be used as the number of rows.
- If a tuple (a, b), then a value from the discrete interval [a..b] will be uniformly sampled per image.

nb_cols int, tuple of int

Number of columns. Analogous to nb_rows.

interpolation int

The order of interpolation. The order has to be in the range 0-5:
- 0: Nearest-neighbor
- 1: Bi-linear (default)
- 2: Bi-quadratic
- 3: Bi-cubic
- 4: Bi-quartic
- 5: Bi-quintic

mask_interpolation int

same as interpolation but for mask.

cval number

The constant value to use when filling in newly created pixels.

cval_mask number

Same as cval but only for masks.

mode str

One of {'constant', 'edge', 'symmetric', 'reflect', 'wrap'}. Points outside the boundaries of the input are filled according to the given mode. Modes match the behaviour of numpy.pad.

absolute_scale bool

Take scale as an absolute value rather than a relative value.

keypoints_threshold float

Used as threshold in conversion from distance maps to keypoints. The search for keypoints works by searching for the argmin (non-inverted) or argmax (inverted) in each channel. This parameter contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit as a keypoint. Use None to apply no min/max. Default: 0.01

Targets

image, mask, keypoints, bboxes

Image types: uint8, float32
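
A short usage sketch (a small image is chosen deliberately, since this transform is slow; all values are illustrative):

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (128, 128), dtype=np.uint8)

# A 4x4 grid of control points, each jittered by a sigma sampled from [0.03, 0.05]
# of the image size (absolute_scale=False by default).
aug = A.Compose([A.PiecewiseAffine(scale=(0.03, 0.05), nb_rows=4, nb_cols=4, p=1.0)])
out = aug(image=image, mask=mask)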

Source code in albumentations/augmentations/geometric/transforms.py
Python
class PiecewiseAffine(DualTransform):
    """Apply affine transformations that differ between local neighbourhoods.
    This augmentation places a regular grid of points on an image and randomly moves the neighbourhood of these point
    around via affine transformations. This leads to local distortions.

    This is mostly a wrapper around scikit-image's ``PiecewiseAffine``.
    See also ``Affine`` for a similar technique.

    Note:
        This augmenter is very slow. Try to use ``ElasticTransformation`` instead, which is at least 10x faster.

    Note:
        For coordinate-based inputs (keypoints, bounding boxes, polygons, ...),
        this augmenter still has to perform an image-based augmentation,
        which will make it significantly slower and not fully correct for such inputs than other transforms.

    Args:
        scale (float, tuple of float): Each point on the regular grid is moved around via a normal distribution.
            This scale factor is equivalent to the normal distribution's sigma.
            Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of
            the image if ``absolute_scale=False`` (default), so this scale can be the same for different sized images.
            Recommended values are in the range ``0.01`` to ``0.05`` (weak to strong augmentations).
                * If a single ``float``, then that value will always be used as the scale.
                * If a tuple ``(a, b)`` of ``float`` s, then a random value will
                  be uniformly sampled per image from the interval ``[a, b]``.
        nb_rows (int, tuple of int): Number of rows of points that the regular grid should have.
            Must be at least ``2``. For large images, you might want to pick a higher value than ``4``.
            You might have to then adjust scale to lower values.
                * If a single ``int``, then that value will always be used as the number of rows.
                * If a tuple ``(a, b)``, then a value from the discrete interval
                  ``[a..b]`` will be uniformly sampled per image.
        nb_cols (int, tuple of int): Number of columns. Analogous to `nb_rows`.
        interpolation (int): The order of interpolation. The order has to be in the range 0-5:
             - 0: Nearest-neighbor
             - 1: Bi-linear (default)
             - 2: Bi-quadratic
             - 3: Bi-cubic
             - 4: Bi-quartic
             - 5: Bi-quintic
        mask_interpolation (int): same as interpolation but for mask.
        cval (number): The constant value to use when filling in newly created pixels.
        cval_mask (number): Same as cval but only for masks.
        mode (str): {'constant', 'edge', 'symmetric', 'reflect', 'wrap'}, optional
            Points outside the boundaries of the input are filled according
            to the given mode.  Modes match the behaviour of `numpy.pad`.
        absolute_scale (bool): Take `scale` as an absolute value rather than a relative value.
        keypoints_threshold (float): Used as threshold in conversion from distance maps to keypoints.
            The search for keypoints works by searching for the
            argmin (non-inverted) or argmax (inverted) in each channel. This
            parameters contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit
            as a keypoint. Use ``None`` to use no min/max. Default: 0.01

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(
        self,
        scale: ScaleFloatType = (0.03, 0.05),
        nb_rows: Union[ScaleIntType] = 4,
        nb_cols: Union[ScaleIntType] = 4,
        interpolation: int = 1,
        mask_interpolation: int = 0,
        cval: int = 0,
        cval_mask: int = 0,
        mode: str = "constant",
        absolute_scale: bool = False,
        always_apply: bool = False,
        keypoints_threshold: float = 0.01,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        self.scale = to_tuple(scale, scale)
        self.nb_rows = to_tuple(nb_rows, nb_rows)
        self.nb_cols = to_tuple(nb_cols, nb_cols)
        self.interpolation = interpolation
        self.mask_interpolation = mask_interpolation
        self.cval = cval
        self.cval_mask = cval_mask
        self.mode = mode
        self.absolute_scale = absolute_scale
        self.keypoints_threshold = keypoints_threshold

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (
            "scale",
            "nb_rows",
            "nb_cols",
            "interpolation",
            "mask_interpolation",
            "cval",
            "cval_mask",
            "mode",
            "absolute_scale",
            "keypoints_threshold",
        )

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        height, width = params["image"].shape[:2]

        nb_rows = np.clip(random.randint(*self.nb_rows), 2, None)
        nb_cols = np.clip(random.randint(*self.nb_cols), 2, None)
        nb_cells = nb_cols * nb_rows
        scale = random.uniform(*self.scale)

        jitter: np.ndarray = random_utils.normal(0, scale, (nb_cells, 2))
        if not np.any(jitter > 0):
            for _ in range(10):  # See: https://github.com/albumentations-team/albumentations/issues/1442
                jitter = random_utils.normal(0, scale, (nb_cells, 2))
                if np.any(jitter > 0):
                    break
            if not np.any(jitter > 0):
                return {"matrix": None}

        y = np.linspace(0, height, nb_rows)
        x = np.linspace(0, width, nb_cols)

        # (H, W) and (H, W) for H=rows, W=cols
        xx_src, yy_src = np.meshgrid(x, y)

        # (1, HW, 2) => (HW, 2) for H=rows, W=cols
        points_src = np.dstack([yy_src.flat, xx_src.flat])[0]

        if self.absolute_scale:
            jitter[:, 0] = jitter[:, 0] / height if height > 0 else 0.0
            jitter[:, 1] = jitter[:, 1] / width if width > 0 else 0.0

        jitter[:, 0] = jitter[:, 0] * height
        jitter[:, 1] = jitter[:, 1] * width

        points_dest = np.copy(points_src)
        points_dest[:, 0] = points_dest[:, 0] + jitter[:, 0]
        points_dest[:, 1] = points_dest[:, 1] + jitter[:, 1]

        # Restrict all destination points to be inside the image plane.
        # This is necessary, as otherwise keypoints could be augmented
        # outside of the image plane and these would be replaced by
        # (-1, -1), which would not conform with the behaviour of the other augmenters.
        points_dest[:, 0] = np.clip(points_dest[:, 0], 0, height - 1)
        points_dest[:, 1] = np.clip(points_dest[:, 1], 0, width - 1)

        matrix = skimage.transform.PiecewiseAffineTransform()
        matrix.estimate(points_src[:, ::-1], points_dest[:, ::-1])

        return {
            "matrix": matrix,
        }

    def apply(
        self, img: np.ndarray, matrix: Optional[skimage.transform.PiecewiseAffineTransform] = None, **params: Any
    ) -> np.ndarray:
        return F.piecewise_affine(img, matrix, cast(int, self.interpolation), self.mode, self.cval)

    def apply_to_mask(
        self, mask: np.ndarray, matrix: Optional[skimage.transform.PiecewiseAffineTransform] = None, **params: Any
    ) -> np.ndarray:
        return F.piecewise_affine(mask, matrix, self.mask_interpolation, self.mode, self.cval_mask)

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        rows: int = 0,
        cols: int = 0,
        matrix: Optional[skimage.transform.PiecewiseAffineTransform] = None,
        **params: Any,
    ) -> BoxInternalType:
        return F.bbox_piecewise_affine(bbox, matrix, rows, cols, self.keypoints_threshold)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        rows: int = 0,
        cols: int = 0,
        matrix: Optional[skimage.transform.PiecewiseAffineTransform] = None,
        **params: Any,
    ) -> KeypointInternalType:
        return F.keypoint_piecewise_affine(keypoint, matrix, rows, cols, self.keypoints_threshold)
class ShiftScaleRotate (shift_limit=0.0625, scale_limit=0.1, rotate_limit=45, interpolation=1, border_mode=4, value=None, mask_value=None, shift_limit_x=None, shift_limit_y=None, rotate_method='largest_box', always_apply=False, p=0.5) [view source on GitHub]

Randomly apply affine transforms: translate, scale and rotate the input.

Parameters:

Name Type Description
shift_limit (float, float) or float

shift factor range for both height and width. If shift_limit is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and upper bounds should lie in range [0, 1]. Default: (-0.0625, 0.0625).

scale_limit (float, float) or float

scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1. If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high). Default: (-0.1, 0.1).

rotate_limit (int, int) or int

rotation range. If rotate_limit is a single int value, the range will be (-rotate_limit, rotate_limit). Default: (-45, 45).

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of int, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of int, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

shift_limit_x (float, float) or float

shift factor range for width. If it is set then this value instead of shift_limit will be used for shifting width. If shift_limit_x is a single float value, the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in the range [0, 1]. Default: None.

shift_limit_y (float, float) or float

shift factor range for height. If it is set then this value instead of shift_limit will be used for shifting height. If shift_limit_y is a single float value, the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie in the range [0, 1]. Default: None.

rotate_method str

rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse". Default: "largest_box"

p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, keypoints, bboxes

Image types: uint8, float32
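
A minimal usage sketch (image and bounding box values are illustrative assumptions). Note that scale_limit=0.1 means scale factors are sampled from (0.9, 1.1), because the limit is biased by 1:

Python
import albumentations as A
import cv2
import numpy as np

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

aug = A.Compose(
    [A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1, rotate_limit=45,
                        border_mode=cv2.BORDER_CONSTANT, value=0, p=1.0)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)
out = aug(image=image, bboxes=[(32, 32, 96, 96)], labels=[1])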

Source code in albumentations/augmentations/geometric/transforms.py
Python
class ShiftScaleRotate(DualTransform):
    """Randomly apply affine transforms: translate, scale and rotate the input.

    Args:
        shift_limit ((float, float) or float): shift factor range for both height and width. If shift_limit
            is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and
            upper bounds should lie in range [0, 1]. Default: (-0.0625, 0.0625).
        scale_limit ((float, float) or float): scaling factor range. If scale_limit is a single float value, the
            range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1.
            If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high).
            Default: (-0.1, 0.1).
        rotate_limit ((int, int) or int): rotation range. If rotate_limit is a single int value, the
            range will be (-rotate_limit, rotate_limit). Default: (-45, 45).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of int,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
        shift_limit_x ((float, float) or float): shift factor range for width. If it is set then this value
            instead of shift_limit will be used for shifting width.  If shift_limit_x is a single float value,
            the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in
            the range [0, 1]. Default: None.
        shift_limit_y ((float, float) or float): shift factor range for height. If it is set then this value
            instead of shift_limit will be used for shifting height.  If shift_limit_y is a single float value,
            the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie
            in the range [0, 1]. Default: None.
        rotate_method (str): rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse".
            Default: "largest_box"
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)

    def __init__(
        self,
        shift_limit: ScaleFloatType = 0.0625,
        scale_limit: ScaleFloatType = 0.1,
        rotate_limit: int = 45,
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: Optional[Tuple[int, ...]] = None,
        mask_value: Optional[Tuple[int, ...]] = None,
        shift_limit_x: Optional[ScaleFloatType] = None,
        shift_limit_y: Optional[ScaleFloatType] = None,
        rotate_method: str = "largest_box",
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.shift_limit_x = to_tuple(shift_limit_x if shift_limit_x is not None else shift_limit)
        self.shift_limit_y = to_tuple(shift_limit_y if shift_limit_y is not None else shift_limit)
        self.scale_limit = to_tuple(scale_limit, bias=1.0)
        self.rotate_limit = to_tuple(rotate_limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value
        self.rotate_method = rotate_method

        if self.rotate_method not in ["largest_box", "ellipse"]:
            raise ValueError(f"Rotation method {self.rotate_method} is not valid.")

    def apply(
        self,
        img: np.ndarray,
        angle: float = 0,
        scale: float = 0,
        dx: int = 0,
        dy: int = 0,
        interpolation: int = cv2.INTER_LINEAR,
        **params: Any,
    ) -> np.ndarray:
        return F.shift_scale_rotate(img, angle, scale, dx, dy, interpolation, self.border_mode, self.value)

    def apply_to_mask(
        self, mask: np.ndarray, angle: float = 0, scale: float = 0, dx: int = 0, dy: int = 0, **params: Any
    ) -> np.ndarray:
        return F.shift_scale_rotate(mask, angle, scale, dx, dy, cv2.INTER_NEAREST, self.border_mode, self.mask_value)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        angle: float = 0,
        scale: float = 0,
        dx: int = 0,
        dy: int = 0,
        rows: int = 0,
        cols: int = 0,
        **params: Any,
    ) -> KeypointInternalType:
        return F.keypoint_shift_scale_rotate(keypoint, angle, scale, dx, dy, rows, cols)

    def get_params(self) -> Dict[str, Any]:
        return {
            "angle": random.uniform(self.rotate_limit[0], self.rotate_limit[1]),
            "scale": random.uniform(self.scale_limit[0], self.scale_limit[1]),
            "dx": random.uniform(self.shift_limit_x[0], self.shift_limit_x[1]),
            "dy": random.uniform(self.shift_limit_y[0], self.shift_limit_y[1]),
        }

    def apply_to_bbox(
        self, bbox: BoxInternalType, angle: float, scale: float, dx: int, dy: int, **params: Any
    ) -> BoxInternalType:
        return F.bbox_shift_scale_rotate(bbox, angle, scale, dx, dy, self.rotate_method, **params)

    def get_transform_init_args(self) -> Dict[str, Any]:
        return {
            "shift_limit_x": self.shift_limit_x,
            "shift_limit_y": self.shift_limit_y,
            "scale_limit": to_tuple(self.scale_limit, bias=-1.0),
            "rotate_limit": self.rotate_limit,
            "interpolation": self.interpolation,
            "border_mode": self.border_mode,
            "value": self.value,
            "mask_value": self.mask_value,
            "rotate_method": self.rotate_method,
        }
class Transpose (always_apply=False, p=0.5) [view source on GitHub]

Transpose the input by swapping rows and columns.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
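
A quick sketch of the effect on shapes (values are illustrative):

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 200, 3), dtype=np.uint8)
out = A.Transpose(p=1.0)(image=image)
print(out["image"].shape)  # (200, 100, 3): rows and columns are swapped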

Source code in albumentations/augmentations/geometric/transforms.py
Python
class Transpose(DualTransform):
    """Transpose the input by swapping rows and columns.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(self, always_apply: bool = False, p: float = 0.5):
        super().__init__(always_apply, p)

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return F.transpose(img)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return F.bbox_transpose(bbox, 0, **params)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return F.keypoint_transpose(keypoint)

    def get_transform_init_args_names(self) -> Tuple[()]:
        return ()
class VerticalFlip [view source on GitHub]

Flip the input vertically around the x-axis.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
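
A quick sketch (assuming the flip reduces to reversing the row axis, as F.vflip does):

Python
import albumentations as A
import numpy as np

image = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
out = A.VerticalFlip(p=1.0)(image=image)
assert (out["image"] == image[::-1, ...]).all()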

Source code in albumentations/augmentations/geometric/transforms.py
Python
class VerticalFlip(DualTransform):
    """Flip the input vertically around the x-axis.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return F.vflip(img)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return F.bbox_vflip(bbox, **params)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return F.keypoint_vflip(keypoint, **params)

    def get_transform_init_args_names(self) -> Tuple[()]:
        return ()

mixing special

transforms

class MixUp (reference_data=None, read_fn=<function MixUp.<lambda>>, alpha=0.4, mix_coef_return_name='mix_coef', always_apply=False, p=0.5) [view source on GitHub]

Performs MixUp data augmentation, blending images, masks, and class labels with reference data.

MixUp augmentation linearly combines an input (image, mask, and class label) with another set from a predefined reference dataset. The mixing degree is controlled by a parameter λ (lambda), sampled from a Beta distribution. This method is known for improving model generalization by promoting linear behavior between classes and smoothing decision boundaries.

Reference

Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2018). mixup: Beyond Empirical Risk Minimization. In International Conference on Learning Representations. https://arxiv.org/abs/1710.09412

Parameters:

Name Type Description
reference_data Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]]

A sequence or generator of dictionaries containing the reference data for mixing. If None or an empty sequence is provided, no operation is performed and a warning is issued.

read_fn Callable[[ReferenceImage], Dict[str, Any]]

A function to process items from reference_data. It should accept items from reference_data and return a dictionary containing processed data:
- The returned dictionary must include an 'image' key with a numpy array value.
- It may also include 'mask' and 'global_label', each associated with numpy array values.
Defaults to a function that assumes the input dictionary contains numpy arrays and directly returns it.

mix_coef_return_name str

Name used for the applied alpha coefficient in the returned dictionary. Defaults to "mix_coef".

alpha float

The alpha parameter for the Beta distribution, influencing the mix's balance. Must be ≥ 0. Higher values lead to more uniform mixing. Defaults to 0.4.

p float

The probability of applying the transformation. Defaults to 0.5.

Targets

image, mask, global_label

Image types: uint8, float32

Exceptions:

Type Description
ValueError

If the alpha parameter is negative.

NotImplementedError

If the transform is applied to bounding boxes or keypoints.

Notes

  • If no reference data is provided, a warning is issued, and the transform acts as a no-op.
  • If images are in float32 format, they should be within the [0, 1] range.

Example Usage:

Python
import albumentations as A
import numpy as np
from albumentations.core.types import ReferenceImage

# Prepare reference data
# Note: This code generates random reference data for demonstration purposes only.
# In real-world applications, it's crucial to use meaningful and representative data.
# The quality and relevance of your input data significantly impact the effectiveness
# of the augmentation process. Ensure your data closely aligns with your specific
# use case and application requirements.
reference_data = [ReferenceImage(image=np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8),
                                 mask=np.random.randint(0, 4, (100, 100, 1), dtype=np.uint8),
                                 global_label=np.random.choice([0, 1], size=3)) for i in range(10)]

# In this example, the lambda function simply returns its input, which works well for
# data already in the expected format. For more complex scenarios, where the data might not be in
# the required format or additional processing is needed, a more sophisticated function can be implemented.
# Below is a hypothetical example where the input data is a file path, and the function reads the image
# file, converts it to a specific format, and possibly performs other preprocessing steps.

# Example of a more complex read_fn that reads an image from a file path, converts it to RGB, and resizes it.
# def custom_read_fn(file_path):
#     from PIL import Image
#     image = Image.open(file_path).convert('RGB')
#     image = image.resize((100, 100))  # Example resize, adjust as needed.
#     return np.array(image)

aug = A.Compose([A.RandomRotate90(), A.MixUp(p=1, reference_data=reference_data, read_fn=lambda x: x)])

# For simplicity, the original lambda function is used in this example.
# Replace `lambda x: x` with `custom_read_fn` if you need to process the data more extensively.

# Apply augmentations
image = np.empty([100, 100, 3], dtype=np.uint8)
mask = np.empty([100, 100], dtype=np.uint8)
global_label = np.array([0, 1, 0])
data = aug(image=image, global_label=global_label, mask=mask)
transformed_image = data["image"]
transformed_mask = data["mask"]
transformed_global_label = data["global_label"]

# Print applied mix coefficient
print(data["mix_coef"])  # Output: e.g., 0.9991580344142427
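
For intuition, here is a standalone restatement of the blend that MixUp performs (a conceptual sketch, not the library's internal mix_arrays implementation): λ is drawn from Beta(alpha, alpha) and the output is λ * x + (1 - λ) * x_ref.

Python
import numpy as np

rng = np.random.default_rng(0)
lam = rng.beta(0.4, 0.4)     # mixing coefficient, lambda ~ Beta(alpha, alpha)
x = np.full((2, 2), 1.0)     # stand-in for the input image (float32 in [0, 1])
x_ref = np.zeros((2, 2))     # stand-in for the reference image
mixed = lam * x + (1 - lam) * x_ref
print(lam, mixed[0, 0])      # every blended pixel equals lam here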
Source code in albumentations/augmentations/mixing/transforms.py
Python
class MixUp(ReferenceBasedTransform):
    """Performs MixUp data augmentation, blending images, masks, and class labels with reference data.

    MixUp augmentation linearly combines an input (image, mask, and class label) with another set from a predefined
    reference dataset. The mixing degree is controlled by a parameter λ (lambda), sampled from a Beta distribution.
    This method is known for improving model generalization by promoting linear behavior between classes and
    smoothing decision boundaries.

    Reference:
        Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2018). mixup: Beyond Empirical Risk Minimization.
        In International Conference on Learning Representations. https://arxiv.org/abs/1710.09412

    Args:
        reference_data (Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]]):
            A sequence or generator of dictionaries containing the reference data for mixing
            If None or an empty sequence is provided, no operation is performed and a warning is issued.
        read_fn (Callable[[ReferenceImage], Dict[str, Any]]):
            A function to process items from reference_data. It should accept items from reference_data
            and return a dictionary containing processed data:
                - The returned dictionary must include an 'image' key with a numpy array value.
                - It may also include 'mask', 'global_label' each associated with numpy array values.
            Defaults to a function that assumes input dictionary contains numpy arrays and directly returns it.
        mix_coef_return_name (str): Name used for the applied alpha coefficient in the returned dictionary.
            Defaults to "mix_coef".
        alpha (float):
            The alpha parameter for the Beta distribution, influencing the mix's balance. Must be ≥ 0.
            Higher values lead to more uniform mixing. Defaults to 0.4.
        p (float):
            The probability of applying the transformation. Defaults to 0.5.

    Targets:
        image, mask, global_label

    Image types:
        - uint8, float32

    Raises:
        - ValueError: If the alpha parameter is negative.
        - NotImplementedError: If the transform is applied to bounding boxes or keypoints.

    Notes:
        - If no reference data is provided, a warning is issued, and the transform acts as a no-op.
        - Notes if images are in float32 format, they should be within [0, 1] range.

    Example Usage:
        import albumentations as A
        import numpy as np
        from albumentations.core.types import ReferenceImage

        # Prepare reference data
        # Note: This code generates random reference data for demonstration purposes only.
        # In real-world applications, it's crucial to use meaningful and representative data.
        # The quality and relevance of your input data significantly impact the effectiveness
        # of the augmentation process. Ensure your data closely aligns with your specific
        # use case and application requirements.
        reference_data = [ReferenceImage(image=np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8),
                                         mask=np.random.randint(0, 4, (100, 100, 1), dtype=np.uint8),
                                         global_label=np.random.choice([0, 1], size=3)) for i in range(10)]

        # In this example, the lambda function simply returns its input, which works well for
        # data already in the expected format. For more complex scenarios, where the data might not be in
        # the required format or additional processing is needed, a more sophisticated function can be implemented.
        # Below is a hypothetical example where the input data is a file path, # and the function reads the image
        # file, converts it to a specific format, and possibly performs other preprocessing steps.

        # Example of a more complex read_fn that reads an image from a file path, converts it to RGB, and resizes it.
        # def custom_read_fn(file_path):
        #     from PIL import Image
        #     image = Image.open(file_path).convert('RGB')
        #     image = image.resize((100, 100))  # Example resize, adjust as needed.
        #     return np.array(image)

        # aug = A.Compose([A.RandomRotate90(), A.MixUp(p=1, reference_data=reference_data, read_fn=lambda x: x)])

        # For simplicity, the original lambda function is used in this example.
        # Replace `lambda x: x` with `custom_read_fn`if you need to process the data more extensively.

        # Apply augmentations
        image = np.empty([100, 100, 3], dtype=np.uint8)
        mask = np.empty([100, 100], dtype=np.uint8)
        global_label = np.array([0, 1, 0])
        data = aug(image=image, global_label=global_label, mask=mask)
        transformed_image = data["image"]
        transformed_mask = data["mask"]
        transformed_global_label = data["global_label"]

        # Print applied mix coefficient
        print(data["mix_coef"])  # Output: e.g., 0.9991580344142427
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.GLOBAL_LABEL)

    def __init__(
        self,
        reference_data: Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]] = None,
        read_fn: Callable[[ReferenceImage], Any] = lambda x: {"image": x, "mask": None, "class_label": None},
        alpha: float = 0.4,
        mix_coef_return_name: str = "mix_coef",
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.mix_coef_return_name = mix_coef_return_name

        if alpha < 0:
            msg = "Alpha must be >= 0."
            raise ValueError(msg)

        self.read_fn = read_fn
        self.alpha = alpha

        if reference_data is None:
            warn("No reference data provided for MixUp. This transform will act as a no-op.")
            # Create an empty generator
            self.reference_data: List[Any] = []
        elif (
            isinstance(reference_data, types.GeneratorType)
            or isinstance(reference_data, Iterable)
            and not isinstance(reference_data, str)
        ):
            self.reference_data = reference_data  # type: ignore[assignment]
        else:
            msg = "reference_data must be a list, tuple, generator, or None."
            raise TypeError(msg)

    def apply(self, img: np.ndarray, mix_data: ReferenceImage, mix_coef: float, **params: Any) -> np.ndarray:
        mix_img = mix_data.get("image")

        if mix_img is not None and not is_grayscale_image(img) and img.shape != mix_img.shape:
            msg = "The shape of the reference image should be the same as the input image."
            raise ValueError(msg)

        return mix_arrays(img, mix_img, mix_coef) if mix_img is not None else img

    def apply_to_mask(self, mask: np.ndarray, mix_data: ReferenceImage, mix_coef: float, **params: Any) -> np.ndarray:
        mix_mask = mix_data.get("mask")
        return mix_arrays(mask, mix_mask, mix_coef) if mix_mask is not None else mask

    def apply_to_global_label(
        self, label: np.ndarray, mix_data: ReferenceImage, mix_coef: float, **params: Any
    ) -> np.ndarray:
        mix_label = mix_data.get("global_label")
        if mix_label is not None and label is not None:
            return mix_coef * label + (1 - mix_coef) * mix_label
        return label

    def apply_to_bboxes(self, bboxes: Sequence[BoxType], mix_data: ReferenceImage, **params: Any) -> Sequence[BoxType]:
        msg = "MixUp does not support bounding boxes yet, feel free to submit pull request to https://github.com/albumentations-team/albumentations/."
        raise NotImplementedError(msg)

    def apply_to_keypoints(
        self, keypoints: Sequence[KeypointType], *args: Any, **params: Any
    ) -> Sequence[KeypointType]:
        msg = "MixUp does not support keypoints yet, feel free to submit pull request to https://github.com/albumentations-team/albumentations/."
        raise NotImplementedError(msg)

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return "reference_data", "alpha"

    def get_params(self) -> Dict[str, Union[None, float, Dict[str, Any]]]:
        mix_data = None
        # Check if reference_data is not empty and is a sequence (list, tuple, np.array)
        if isinstance(self.reference_data, Sequence) and not isinstance(self.reference_data, (str, bytes)):
            if len(self.reference_data) > 0:  # Additional check to ensure it's not empty
                mix_idx = random.randint(0, len(self.reference_data) - 1)
                mix_data = self.reference_data[mix_idx]
        # Check if reference_data is an iterator or generator
        elif isinstance(self.reference_data, Iterator):
            try:
                mix_data = next(self.reference_data)  # Attempt to get the next item
            except StopIteration:
                warn(
                    "Reference data iterator/generator has been exhausted. "
                    "Further mixing augmentations will not be applied.",
                    RuntimeWarning,
                )
                return {"mix_data": {}, "mix_coef": 1}

        # If mix_data is None or empty after the above checks, return default values
        if mix_data is None:
            return {"mix_data": {}, "mix_coef": 1}

        # If mix_data is not None, calculate mix_coef and apply read_fn
        mix_coef = beta(self.alpha, self.alpha)  # Sample the mixing coefficient from a Beta(alpha, alpha) distribution
        return {"mix_data": self.read_fn(mix_data), "mix_coef": mix_coef}

    def apply_with_params(self, params: Dict[str, Any], *args: Any, **kwargs: Any) -> Dict[str, Any]:
        res = super().apply_with_params(params, *args, **kwargs)
        if self.mix_coef_return_name:
            res[self.mix_coef_return_name] = params["mix_coef"]
        return res
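
A minimal usage sketch, assuming the 1.4-era API documented here; the in-memory pool and random arrays are stand-ins for real reference samples:

Python
import numpy as np
import albumentations as A

# Hypothetical in-memory pool; read_fn is an identity because items are already decoded dicts.
pool = [
    {"image": np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8),
     "global_label": np.array([0.0, 1.0])}
    for _ in range(4)
]

aug = A.MixUp(reference_data=pool, read_fn=lambda item: item, alpha=0.4, p=1.0)
out = aug(image=np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8),
          global_label=np.array([1.0, 0.0]))
# out["image"] is the blend; out["mix_coef"] holds the sampled Beta(alpha, alpha) coefficient.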

transforms

class CLAHE (clip_limit=4.0, tile_grid_size=(8, 8), always_apply=False, p=0.5) [view source on GitHub]

Apply Contrast Limited Adaptive Histogram Equalization to the input image.

Parameters:

Name Type Description
clip_limit Union[float, Tuple[float, float]]

upper threshold value for contrast limiting. If clip_limit is a single float value, the range will be (1, clip_limit). Default: (1, 4).

tile_grid_size Tuple[int, int]

size of grid for histogram equalization. Default: (8, 8).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8

Source code in albumentations/augmentations/transforms.py
Python
class CLAHE(ImageOnlyTransform):
    """Apply Contrast Limited Adaptive Histogram Equalization to the input image.

    Args:
        clip_limit: upper threshold value for contrast limiting.
            If clip_limit is a single float value, the range will be (1, clip_limit). Default: (1, 4).
        tile_grid_size: size of grid for histogram equalization. Default: (8, 8).
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8

    """

    def __init__(
        self,
        clip_limit: ScaleFloatType = 4.0,
        tile_grid_size: Tuple[int, int] = (8, 8),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.clip_limit = to_tuple(clip_limit, 1)
        self.tile_grid_size = cast(Tuple[int, int], tuple(tile_grid_size))

    def apply(self, img: np.ndarray, clip_limit: float = 2, **params: Any) -> np.ndarray:
        if not is_rgb_image(img) and not is_grayscale_image(img):
            msg = "CLAHE transformation expects 1-channel or 3-channel images."
            raise TypeError(msg)

        return F.clahe(img, clip_limit, self.tile_grid_size)

    def get_params(self) -> Dict[str, float]:
        return {"clip_limit": random.uniform(self.clip_limit[0], self.clip_limit[1])}

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("clip_limit", "tile_grid_size")

class ChannelShuffle [view source on GitHub]

Randomly rearrange channels of the input RGB image.

Parameters:

Name Type Description
p

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class ChannelShuffle(ImageOnlyTransform):
    """Randomly rearrange channels of the input RGB image.

    Args:
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def apply(self, img: np.ndarray, channels_shuffled: Tuple[int, int, int] = (0, 1, 2), **params: Any) -> np.ndarray:
        return F.channel_shuffle(img, channels_shuffled)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        img = params["image"]
        ch_arr = list(range(img.shape[2]))
        random.shuffle(ch_arr)
        return {"channels_shuffled": ch_arr}

    def get_transform_init_args_names(self) -> Tuple[()]:
        return ()
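
A short sketch; the permutation is drawn from the image's own channel count on every call:

Python
import numpy as np
import albumentations as A

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
shuffled = A.ChannelShuffle(p=1.0)(image=img)["image"]  # fresh permutation per call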

class ChromaticAberration (primary_distortion_limit=0.02, secondary_distortion_limit=0.05, mode='green_purple', interpolation=1, always_apply=False, p=0.5) [view source on GitHub]

Add lateral chromatic aberration by distorting the red and blue channels of the input image.

Parameters:

Name Type Description
primary_distortion_limit Union[float, Tuple[float, float]]

range of the primary radial distortion coefficient. If primary_distortion_limit is a single float value, the range will be (-primary_distortion_limit, primary_distortion_limit). Controls the distortion in the center of the image (positive values result in pincushion distortion, negative values result in barrel distortion). Default: 0.02.

secondary_distortion_limit Union[float, Tuple[float, float]]

range of the secondary radial distortion coefficient. If secondary_distortion_limit is a single float value, the range will be (-secondary_distortion_limit, secondary_distortion_limit). Controls the distortion in the corners of the image (positive values result in pincushion distortion, negative values result in barrel distortion). Default: 0.05.

mode Literal['green_purple', 'red_blue', 'random']

type of color fringing. Supported modes are 'green_purple', 'red_blue' and 'random'. 'random' will choose one of the modes 'green_purple' or 'red_blue' randomly. Default: 'green_purple'.

interpolation int

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class ChromaticAberration(ImageOnlyTransform):
    """Add lateral chromatic aberration by distorting the red and blue channels of the input image.

    Args:
        primary_distortion_limit: range of the primary radial distortion coefficient.
            If primary_distortion_limit is a single float value, the range will be
            (-primary_distortion_limit, primary_distortion_limit).
            Controls the distortion in the center of the image (positive values result in pincushion distortion,
            negative values result in barrel distortion).
            Default: 0.02.
        secondary_distortion_limit: range of the secondary radial distortion coefficient.
            If secondary_distortion_limit is a single float value, the range will be
            (-secondary_distortion_limit, secondary_distortion_limit).
            Controls the distortion in the corners of the image (positive values result in pincushion distortion,
            negative values result in barrel distortion).
            Default: 0.05.
        mode: type of color fringing.
            Supported modes are 'green_purple', 'red_blue' and 'random'.
            'random' will choose one of the modes 'green_purple' or 'red_blue' randomly.
            Default: 'green_purple'.
        interpolation: flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p: probability of applying the transform.
            Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        primary_distortion_limit: ScaleFloatType = 0.02,
        secondary_distortion_limit: ScaleFloatType = 0.05,
        mode: ChromaticAberrationMode = "green_purple",
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.primary_distortion_limit = to_tuple(primary_distortion_limit)
        self.secondary_distortion_limit = to_tuple(secondary_distortion_limit)
        self.mode = self._validate_mode(mode)
        self.interpolation = interpolation

    @staticmethod
    def _validate_mode(
        mode: ChromaticAberrationMode,
    ) -> ChromaticAberrationMode:
        valid_modes = ["green_purple", "red_blue", "random"]
        if mode not in valid_modes:
            msg = f"Unsupported mode: {mode}. Supported modes are 'green_purple', 'red_blue', 'random'."
            raise ValueError(msg)
        return mode

    def apply(
        self,
        img: np.ndarray,
        primary_distortion_red: float = -0.02,
        secondary_distortion_red: float = -0.05,
        primary_distortion_blue: float = -0.02,
        secondary_distortion_blue: float = -0.05,
        **params: Any,
    ) -> np.ndarray:
        return F.chromatic_aberration(
            img,
            primary_distortion_red,
            secondary_distortion_red,
            primary_distortion_blue,
            secondary_distortion_blue,
            cast(int, self.interpolation),
        )

    def get_params(self) -> Dict[str, float]:
        primary_distortion_red = random_utils.uniform(*self.primary_distortion_limit)
        secondary_distortion_red = random_utils.uniform(*self.secondary_distortion_limit)
        primary_distortion_blue = random_utils.uniform(*self.primary_distortion_limit)
        secondary_distortion_blue = random_utils.uniform(*self.secondary_distortion_limit)

        secondary_distortion_red = self._match_sign(primary_distortion_red, secondary_distortion_red)
        secondary_distortion_blue = self._match_sign(primary_distortion_blue, secondary_distortion_blue)

        if self.mode == "green_purple":
            # distortion coefficients of the red and blue channels have the same sign
            primary_distortion_blue = self._match_sign(primary_distortion_red, primary_distortion_blue)
            secondary_distortion_blue = self._match_sign(secondary_distortion_red, secondary_distortion_blue)
        if self.mode == "red_blue":
            # distortion coefficients of the red and blue channels have the opposite sign
            primary_distortion_blue = self._unmatch_sign(primary_distortion_red, primary_distortion_blue)
            secondary_distortion_blue = self._unmatch_sign(secondary_distortion_red, secondary_distortion_blue)

        return {
            "primary_distortion_red": primary_distortion_red,
            "secondary_distortion_red": secondary_distortion_red,
            "primary_distortion_blue": primary_distortion_blue,
            "secondary_distortion_blue": secondary_distortion_blue,
        }

    @staticmethod
    def _match_sign(a: float, b: float) -> float:
        # Match the sign of b to a
        if (a < 0 < b) or (a > 0 > b):
            b = -b
        return b

    @staticmethod
    def _unmatch_sign(a: float, b: float) -> float:
        # Unmatch the sign of b to a
        if (a < 0 and b < 0) or (a > 0 and b > 0):
            b = -b
        return b

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return "primary_distortion_limit", "secondary_distortion_limit", "mode", "interpolation"

class ColorJitter (brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2, always_apply=False, p=0.5) [view source on GitHub]

Randomly changes the brightness, contrast, saturation, and hue of an image. Compared to ColorJitter from torchvision, this transform gives slightly different results because Pillow (used in torchvision) and OpenCV (used in Albumentations) convert an image to HSV format with different formulas. Another difference: Pillow uses uint8 overflow, while Albumentations uses value saturation.

Parameters:

Name Type Description
brightness float or tuple of float (min, max)

How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non-negative numbers.

contrast float or tuple of float (min, max)

How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non-negative numbers.

saturation float or tuple of float (min, max)

How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non-negative numbers.

hue float or tuple of float (min, max)

How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.

Source code in albumentations/augmentations/transforms.py
Python
class ColorJitter(ImageOnlyTransform):
    """Randomly changes the brightness, contrast, and saturation of an image. Compared to ColorJitter from torchvision,
    this transform gives a little bit different results because Pillow (used in torchvision) and OpenCV (used in
    Albumentations) transform an image to HSV format by different formulas. Another difference - Pillow uses uint8
    overflow, but we use value saturation.

    Args:
        brightness (float or tuple of float (min, max)): How much to jitter brightness.
            brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness]
            or the given [min, max]. Should be non-negative numbers.
        contrast (float or tuple of float (min, max)): How much to jitter contrast.
            contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast]
            or the given [min, max]. Should be non-negative numbers.
        saturation (float or tuple of float (min, max)): How much to jitter saturation.
            saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation]
            or the given [min, max]. Should be non-negative numbers.
        hue (float or tuple of float (min, max)): How much to jitter hue.
            hue_factor is chosen uniformly from [-hue, hue] or the given [min, max].
            Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.

    """

    def __init__(
        self,
        brightness: ScaleFloatType = 0.2,
        contrast: ScaleFloatType = 0.2,
        saturation: ScaleFloatType = 0.2,
        hue: ScaleFloatType = 0.2,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)

        self.brightness = self.__check_values(brightness, "brightness")
        self.contrast = self.__check_values(contrast, "contrast")
        self.saturation = self.__check_values(saturation, "saturation")
        self.hue = self.__check_values(hue, "hue", offset=0, bounds=(-0.5, 0.5), clip=False)

        self.transforms = [
            F.adjust_brightness_torchvision,
            F.adjust_contrast_torchvision,
            F.adjust_saturation_torchvision,
            F.adjust_hue_torchvision,
        ]

    @staticmethod
    def __check_values(
        value: ScaleFloatType,
        name: str,
        offset: float = 1,
        bounds: Tuple[float, float] = (0, float("inf")),
        clip: bool = True,
    ) -> Tuple[float, float]:
        if isinstance(value, numbers.Number):
            if value < 0:
                raise ValueError(f"If {name} is a single number, it must be non negative.")
            value = [offset - value, offset + value]
            if clip:
                value[0] = max(value[0], 0)
        elif isinstance(value, (tuple, list)) and len(value) == TWO:
            if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
                raise ValueError(f"{name} values should be between {bounds}")
        else:
            raise TypeError(f"{name} should be a single number or a list/tuple with length 2.")

        return value

    def get_params(self) -> Dict[str, Any]:
        brightness = random.uniform(self.brightness[0], self.brightness[1])
        contrast = random.uniform(self.contrast[0], self.contrast[1])
        saturation = random.uniform(self.saturation[0], self.saturation[1])
        hue = random.uniform(self.hue[0], self.hue[1])

        order = [0, 1, 2, 3]
        random.shuffle(order)

        return {
            "brightness": brightness,
            "contrast": contrast,
            "saturation": saturation,
            "hue": hue,
            "order": order,
        }

    def apply(
        self,
        img: np.ndarray,
        brightness: float = 1.0,
        contrast: float = 1.0,
        saturation: float = 1.0,
        hue: float = 0,
        order: Optional[List[int]] = None,
        **params: Any,
    ) -> np.ndarray:
        if order is None:
            order = [0, 1, 2, 3]
        if not is_rgb_image(img) and not is_grayscale_image(img):
            msg = "ColorJitter transformation expects 1-channel or 3-channel images."
            raise TypeError(msg)
        color_transforms = [brightness, contrast, saturation, hue]
        for i in order:
            img = self.transforms[i](img, color_transforms[i])  # type: ignore[operator]
        return img

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return ("brightness", "contrast", "saturation", "hue")

class Downscale (scale_min=0.25, scale_max=0.25, interpolation=None, always_apply=False, p=0.5) [view source on GitHub]

Decreases image quality by downscaling and upscaling back.

Parameters:

Name Type Description
scale_min float

lower bound on the image scale. Should be <= scale_max.

scale_max float

upper bound on the image scale. Should be < 1.

interpolation Union[int, albumentations.core.transforms_interface.Interpolation, Dict[str, int]]

cv2 interpolation method. Could be:
- a single cv2 interpolation flag: the selected method will be used for both downscale and upscale;
- dict(downscale=flag, upscale=flag);
- Downscale.Interpolation(downscale=flag, upscale=flag).
Default: Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST)

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class Downscale(ImageOnlyTransform):
    """Decreases image quality by downscaling and upscaling back.

    Args:
        scale_min: lower bound on the image scale. Should be <= scale_max.
        scale_max: upper bound on the image scale. Should be < 1.
        interpolation: cv2 interpolation method. Could be:
            - single cv2 interpolation flag - selected method will be used for downscale and upscale.
            - dict(downscale=flag, upscale=flag)
            - Downscale.Interpolation(downscale=flag, upscale=flag) -
            Default: Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST)

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        scale_min: float = 0.25,
        scale_max: float = 0.25,
        interpolation: Optional[Union[int, Interpolation, Dict[str, int]]] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        if interpolation is None:
            self.interpolation = Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST)
            warnings.warn(
                "Using default interpolation INTER_NEAREST, which is sub-optimal. "
                "Please specify interpolation mode for downscale and upscale explicitly. "
                "For additional information see this PR https://github.com/albumentations-team/albumentations/pull/584"
            )
        elif isinstance(interpolation, int):
            self.interpolation = Interpolation(downscale=interpolation, upscale=interpolation)
        elif isinstance(interpolation, Interpolation):
            self.interpolation = interpolation
        elif isinstance(interpolation, dict):
            self.interpolation = Interpolation(**interpolation)
        else:
            raise ValueError(
                "Wrong interpolation data type. Supported types: `Optional[Union[int, Interpolation, Dict[str, int]]]`."
                f" Got: {type(interpolation)}"
            )

        if scale_min > scale_max:
            raise ValueError(f"Expected scale_min be less or equal scale_max, got {scale_min} {scale_max}")
        if scale_max >= 1:
            raise ValueError(f"Expected scale_max to be less than 1, got {scale_max}")
        self.scale_min = scale_min
        self.scale_max = scale_max

    def apply(self, img: np.ndarray, scale: float, **params: Any) -> np.ndarray:
        if isinstance(self.interpolation, int):
            msg = "Should not be here, added for typing purposes. Please report this issue."
            raise TypeError(msg)
        return F.downscale(
            img,
            scale=scale,
            down_interpolation=self.interpolation.downscale,
            up_interpolation=self.interpolation.upscale,
        )

    def get_params(self) -> Dict[str, Any]:
        return {"scale": random.uniform(self.scale_min, self.scale_max)}

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return "scale_min", "scale_max"

    def to_dict_private(self) -> Dict[str, Any]:
        if isinstance(self.interpolation, int):
            msg = "Should not be here, added for typing purposes. Please report this issue."
            raise TypeError(msg)
        result = super().to_dict_private()
        result["interpolation"] = {"upscale": self.interpolation.upscale, "downscale": self.interpolation.downscale}
        return result
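
A sketch using the dict form of interpolation; the chosen flags are illustrative:

Python
import cv2
import albumentations as A

# Passing explicit modes avoids the default INTER_NEAREST warning.
aug = A.Downscale(
    scale_min=0.25,
    scale_max=0.5,
    interpolation={"downscale": cv2.INTER_AREA, "upscale": cv2.INTER_LINEAR},
    p=1.0,
)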

class Emboss (alpha=(0.2, 0.5), strength=(0.2, 0.7), always_apply=False, p=0.5) [view source on GitHub]

Emboss the input image and overlay the result with the original image.

Parameters:

Name Type Description
alpha Tuple[float, float]

range to choose the visibility of the embossed image. At 0, only the original image is visible; at 1.0, only its embossed version is visible. Default: (0.2, 0.5).

strength Tuple[float, float]

strength range of the embossing. Default: (0.2, 0.7).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Source code in albumentations/augmentations/transforms.py
Python
class Emboss(ImageOnlyTransform):
    """Emboss the input image and overlays the result with the original image.

    Args:
        alpha: range to choose the visibility of the embossed image. At 0, only the original image is
            visible; at 1.0, only its embossed version is visible. Default: (0.2, 0.5).
        strength: strength range of the embossing. Default: (0.2, 0.7).
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    """

    def __init__(
        self,
        alpha: Tuple[float, float] = (0.2, 0.5),
        strength: Tuple[float, float] = (0.2, 0.7),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.alpha = self.__check_values(to_tuple(alpha, 0.0), name="alpha", bounds=(0.0, 1.0))
        self.strength = self.__check_values(to_tuple(strength, 0.0), name="strength")

    @staticmethod
    def __check_values(
        value: Tuple[float, float], name: str, bounds: Tuple[float, float] = (0, float("inf"))
    ) -> Tuple[float, float]:
        if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
            raise ValueError(f"{name} values should be between {bounds}")
        return value

    @staticmethod
    def __generate_emboss_matrix(alpha_sample: np.ndarray, strength_sample: np.ndarray) -> np.ndarray:
        matrix_nochange = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
        matrix_effect = np.array(
            [
                [-1 - strength_sample, 0 - strength_sample, 0],
                [0 - strength_sample, 1, 0 + strength_sample],
                [0, 0 + strength_sample, 1 + strength_sample],
            ],
            dtype=np.float32,
        )
        return (1 - alpha_sample) * matrix_nochange + alpha_sample * matrix_effect

    def get_params(self) -> Dict[str, np.ndarray]:
        alpha = random.uniform(*self.alpha)
        strength = random.uniform(*self.strength)
        emboss_matrix = self.__generate_emboss_matrix(alpha_sample=alpha, strength_sample=strength)
        return {"emboss_matrix": emboss_matrix}

    def apply(self, img: np.ndarray, emboss_matrix: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
        return F.convolve(img, emboss_matrix)

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("alpha", "strength")

class Equalize (mode='cv', by_channels=True, mask=None, mask_params=(), always_apply=False, p=0.5) [view source on GitHub]

Equalize the image histogram.

Parameters:

Name Type Description
mode str

{'cv', 'pil'}. Use OpenCV or Pillow equalization method.

by_channels bool

If True, use equalization by channels separately, else convert image to YCbCr representation and use equalization by Y channel.

mask np.ndarray, callable

If given, only the pixels selected by the mask are included in the analysis. May be a 1-channel or 3-channel array, or a callable. The function signature must include an image argument.

mask_params list of str

Params for mask function.

Targets

image

Image types: uint8

Source code in albumentations/augmentations/transforms.py
Python
class Equalize(ImageOnlyTransform):
    """Equalize the image histogram.

    Args:
        mode (str): {'cv', 'pil'}. Use OpenCV or Pillow equalization method.
        by_channels (bool): If True, use equalization by channels separately,
            else convert image to YCbCr representation and use equalization by `Y` channel.
        mask (np.ndarray, callable): If given, only the pixels selected by
            the mask are included in the analysis. May be a 1-channel or 3-channel array, or a callable.
            Function signature must include `image` argument.
        mask_params (list of str): Params for mask function.

    Targets:
        image

    Image types:
        uint8

    """

    def __init__(
        self,
        mode: ImageMode = "cv",
        by_channels: bool = True,
        mask: Optional[np.ndarray] = None,
        mask_params: Tuple[()] = (),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        if mode not in image_modes:
            raise ValueError(f"Unsupported equalization mode. Supports: {image_modes}. " f"Got: {mode}")

        super().__init__(always_apply, p)
        self.mode = mode
        self.by_channels = by_channels
        self.mask = mask
        self.mask_params = mask_params

    def apply(self, img: np.ndarray, mask: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
        return F.equalize(img, mode=self.mode, by_channels=self.by_channels, mask=mask)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        if not callable(self.mask):
            return {"mask": self.mask}

        return {"mask": self.mask(**params)}

    @property
    def targets_as_params(self) -> List[str]:
        return ["image", *list(self.mask_params)]

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("mode", "by_channels", "mask", "mask_params")

class FancyPCA (alpha=0.1, always_apply=False, p=0.5) [view source on GitHub]

Augment RGB image using FancyPCA from Krizhevsky's paper "ImageNet Classification with Deep Convolutional Neural Networks"

Parameters:

Name Type Description
alpha float

how much to perturb/scale the eigenvectors and eigenvalues. The scale is sampled from a Gaussian distribution (mu=0, sigma=alpha).

Targets

image

Image types: 3-channel uint8 images only

Source code in albumentations/augmentations/transforms.py
Python
class FancyPCA(ImageOnlyTransform):
    """Augment RGB image using FancyPCA from Krizhevsky's paper
    "ImageNet Classification with Deep Convolutional Neural Networks"

    Args:
        alpha: how much to perturb/scale the eigenvectors and eigenvalues.
            The scale is sampled from a Gaussian distribution (mu=0, sigma=alpha).

    Targets:
        image

    Image types:
        3-channel uint8 images only

    Credit:
        http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
        https://deshanadesai.github.io/notes/Fancy-PCA-with-Scikit-Image
        https://pixelatedbrian.github.io/2018-04-29-fancy_pca/

    """

    def __init__(self, alpha: float = 0.1, always_apply: bool = False, p: float = 0.5):
        super().__init__(always_apply=always_apply, p=p)
        self.alpha = alpha

    def apply(self, img: np.ndarray, alpha: float = 0.1, **params: Any) -> np.ndarray:
        return F.fancy_pca(img, alpha)

    def get_params(self) -> Dict[str, float]:
        return {"alpha": random.gauss(0, self.alpha)}

    def get_transform_init_args_names(self) -> Tuple[str]:
        return ("alpha",)

class FromFloat (dtype='uint16', max_value=None, always_apply=False, p=1.0) [view source on GitHub]

Take an input array where all values should lie in the range [0, 1.0], multiply them by max_value and then cast the resulting values to the type specified by dtype. If max_value is None, the transform will try to infer the maximum value for the data type from the dtype argument.

This is the inverse transform for :class:~albumentations.augmentations.transforms.ToFloat.

Parameters:

Name Type Description
max_value Optional[float]

maximum possible input value. Default: None.

dtype str

data type of the output. See the 'Data types' page from the NumPy docs: https://docs.scipy.org/doc/numpy/user/basics.types.html. Default: 'uint16'.

p float

probability of applying the transform. Default: 1.0.

Targets

image

Image types: float32


Source code in albumentations/augmentations/transforms.py
Python
class FromFloat(ImageOnlyTransform):
    """Take an input array where all values should lie in the range [0, 1.0], multiply them by `max_value` and then
    cast the resulted value to a type specified by `dtype`. If `max_value` is None the transform will try to infer
    the maximum value for the data type from the `dtype` argument.

    This is the inverse transform for :class:`~albumentations.augmentations.transforms.ToFloat`.

    Args:
        max_value: maximum possible input value. Default: None.
        dtype: data type of the output. See the `'Data types' page from the NumPy docs`_.
            Default: 'uint16'.
        p: probability of applying the transform. Default: 1.0.

    Targets:
        image

    Image types:
        float32

    .. _'Data types' page from the NumPy docs:
       https://docs.scipy.org/doc/numpy/user/basics.types.html

    """

    def __init__(
        self, dtype: str = "uint16", max_value: Optional[float] = None, always_apply: bool = False, p: float = 1.0
    ):
        super().__init__(always_apply, p)
        self.dtype = np.dtype(dtype)
        self.max_value = max_value

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return F.from_float(img, self.dtype, self.max_value)

    def get_transform_init_args(self) -> Dict[str, Any]:
        return {"dtype": self.dtype.name, "max_value": self.max_value}

class GaussNoise (var_limit=(10.0, 50.0), mean=0, per_channel=True, always_apply=False, p=0.5) [view source on GitHub]

Apply Gaussian noise to the input image.

Parameters:

Name Type Description
var_limit Union[float, Tuple[float, float]]

variance range for noise. If var_limit is a single float, the range will be (0, var_limit). Default: (10.0, 50.0).

mean float

mean of the noise. Default: 0

per_channel bool

if set to True, noise will be sampled for each channel independently. Otherwise, the noise will be sampled once for all channels. Default: True

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class GaussNoise(ImageOnlyTransform):
    """Apply gaussian noise to the input image.

    Args:
        var_limit: variance range for noise. If var_limit is a single float, the range
            will be (0, var_limit). Default: (10.0, 50.0).
        mean: mean of the noise. Default: 0
        per_channel: if set to True, noise will be sampled for each channel independently.
            Otherwise, the noise will be sampled once for all channels. Default: True
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        var_limit: ScaleFloatType = (10.0, 50.0),
        mean: float = 0,
        per_channel: bool = True,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        if isinstance(var_limit, (tuple, list)):
            if var_limit[0] < 0:
                msg = "Lower var_limit should be non negative."
                raise ValueError(msg)
            if var_limit[1] < 0:
                msg = "Upper var_limit should be non negative."
                raise ValueError(msg)
            self.var_limit = var_limit
        elif isinstance(var_limit, (int, float)):
            if var_limit < 0:
                msg = "var_limit should be non negative."
                raise ValueError(msg)

            self.var_limit = (0, var_limit)
        else:
            raise TypeError(f"Expected var_limit type to be one of (int, float, tuple, list), got {type(var_limit)}")

        self.mean = mean
        self.per_channel = per_channel

    def apply(self, img: np.ndarray, gauss: Optional[float] = None, **params: Any) -> np.ndarray:
        return F.gauss_noise(img, gauss=gauss)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, float]:
        image = params["image"]
        var = random.uniform(self.var_limit[0], self.var_limit[1])
        sigma = var**0.5

        if self.per_channel:
            gauss = random_utils.normal(self.mean, sigma, image.shape)
        else:
            gauss = random_utils.normal(self.mean, sigma, image.shape[:2])
            if len(image.shape) == THREE:
                gauss = np.expand_dims(gauss, -1)

        return {"gauss": gauss}

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return ("var_limit", "per_channel", "mean")

class HueSaturationValue (hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, always_apply=False, p=0.5) [view source on GitHub]

Randomly change hue, saturation and value of the input image.

Parameters:

Name Type Description
hue_shift_limit Union[int, Tuple[int, int]]

range for changing hue. If hue_shift_limit is a single int, the range will be (-hue_shift_limit, hue_shift_limit). Default: (-20, 20).

sat_shift_limit Union[int, Tuple[int, int]]

range for changing saturation. If sat_shift_limit is a single int, the range will be (-sat_shift_limit, sat_shift_limit). Default: (-30, 30).

val_shift_limit Union[int, Tuple[int, int]]

range for changing value. If val_shift_limit is a single int, the range will be (-val_shift_limit, val_shift_limit). Default: (-20, 20).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class HueSaturationValue(ImageOnlyTransform):
    """Randomly change hue, saturation and value of the input image.

    Args:
        hue_shift_limit: range for changing hue. If hue_shift_limit is a single int, the range
            will be (-hue_shift_limit, hue_shift_limit). Default: (-20, 20).
        sat_shift_limit: range for changing saturation. If sat_shift_limit is a single int,
            the range will be (-sat_shift_limit, sat_shift_limit). Default: (-30, 30).
        val_shift_limit: range for changing value. If val_shift_limit is a single int, the range
            will be (-val_shift_limit, val_shift_limit). Default: (-20, 20).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        hue_shift_limit: ScaleIntType = 20,
        sat_shift_limit: ScaleIntType = 30,
        val_shift_limit: ScaleIntType = 20,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.hue_shift_limit = to_tuple(hue_shift_limit)
        self.sat_shift_limit = to_tuple(sat_shift_limit)
        self.val_shift_limit = to_tuple(val_shift_limit)

    def apply(
        self, img: np.ndarray, hue_shift: int = 0, sat_shift: int = 0, val_shift: int = 0, **params: Any
    ) -> np.ndarray:
        if not is_rgb_image(img) and not is_grayscale_image(img):
            msg = "HueSaturationValue transformation expects 1-channel or 3-channel images."
            raise TypeError(msg)
        return F.shift_hsv(img, hue_shift, sat_shift, val_shift)

    def get_params(self) -> Dict[str, float]:
        return {
            "hue_shift": random.uniform(self.hue_shift_limit[0], self.hue_shift_limit[1]),
            "sat_shift": random.uniform(self.sat_shift_limit[0], self.sat_shift_limit[1]),
            "val_shift": random.uniform(self.val_shift_limit[0], self.val_shift_limit[1]),
        }

    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return ("hue_shift_limit", "sat_shift_limit", "val_shift_limit")

class ISONoise (color_shift=(0.01, 0.05), intensity=(0.1, 0.5), always_apply=False, p=0.5) [view source on GitHub]

Apply camera sensor noise.

Parameters:

Name Type Description
color_shift float, float

variance range for color hue change. Measured as a fraction of 360 degree Hue angle in HLS colorspace.

intensity float, float

Multiplicative factor that controls the strength of color and luminance noise.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8

Source code in albumentations/augmentations/transforms.py
Python
class ISONoise(ImageOnlyTransform):
    """Apply camera sensor noise.

    Args:
        color_shift (float, float): variance range for color hue change.
            Measured as a fraction of 360 degree Hue angle in HLS colorspace.
        intensity (float, float): Multiplicative factor that controls the strength
            of color and luminance noise.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8

    """

    def __init__(
        self,
        color_shift: Tuple[float, float] = (0.01, 0.05),
        intensity: Tuple[float, float] = (0.1, 0.5),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.intensity = intensity
        self.color_shift = color_shift

    def apply(
        self,
        img: np.ndarray,
        color_shift: float = 0.05,
        intensity: float = 1.0,
        random_state: Optional[int] = None,
        **params: Any,
    ) -> np.ndarray:
        return F.iso_noise(img, color_shift, intensity, np.random.RandomState(random_state))

    def get_params(self) -> Dict[str, Any]:
        return {
            "color_shift": random.uniform(self.color_shift[0], self.color_shift[1]),
            "intensity": random.uniform(self.intensity[0], self.intensity[1]),
            "random_state": random.randint(0, 65536),
        }

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("intensity", "color_shift")

class ImageCompression (quality_lower=99, quality_upper=100, compression_type=<ImageCompressionType.JPEG: 0>, always_apply=False, p=0.5) [view source on GitHub]

Decreases image quality by applying JPEG or WebP compression to the image.

Parameters:

Name Type Description
quality_lower int

lower bound on the image quality. Should be in [0, 100] range for jpeg and [1, 100] for webp.

quality_upper int

upper bound on the image quality. Should be in [0, 100] range for jpeg and [1, 100] for webp.

compression_type ImageCompressionType

should be ImageCompressionType.JPEG or ImageCompressionType.WEBP. Default: ImageCompressionType.JPEG

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class ImageCompression(ImageOnlyTransform):
    """Decreases image quality by Jpeg, WebP compression of an image.

    Args:
        quality_lower: lower bound on the image quality. Should be in [0, 100] range for jpeg and [1, 100] for webp.
        quality_upper: upper bound on the image quality. Should be in [0, 100] range for jpeg and [1, 100] for webp.
        compression_type (ImageCompressionType): should be ImageCompressionType.JPEG or ImageCompressionType.WEBP.
            Default: ImageCompressionType.JPEG

    Targets:
        image

    Image types:
        uint8, float32

    """

    class ImageCompressionType(IntEnum):
        """Defines the types of image compression.

        This Enum class is used to specify the image compression format.

        Attributes:
            JPEG (int): Represents the JPEG image compression format.
            WEBP (int): Represents the WEBP image compression format.

        """

        JPEG = 0
        WEBP = 1

    def __init__(
        self,
        quality_lower: int = 99,
        quality_upper: int = 100,
        compression_type: ImageCompressionType = ImageCompressionType.JPEG,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        self.compression_type = ImageCompression.ImageCompressionType(compression_type)
        low_thresh_quality_assert = 0

        if self.compression_type == ImageCompression.ImageCompressionType.WEBP:
            low_thresh_quality_assert = 1

        if not low_thresh_quality_assert <= quality_lower <= MAX_JPEG_QUALITY:
            raise ValueError(f"Invalid quality_lower. Got: {quality_lower}")
        if not low_thresh_quality_assert <= quality_upper <= MAX_JPEG_QUALITY:
            raise ValueError(f"Invalid quality_upper. Got: {quality_upper}")

        self.quality_lower = quality_lower
        self.quality_upper = quality_upper

    def apply(self, img: np.ndarray, quality: int = 100, image_type: str = ".jpg", **params: Any) -> np.ndarray:
        if img.ndim != TWO and img.shape[-1] not in (1, 3, 4):
            msg = "ImageCompression transformation expects 1, 3 or 4 channel images."
            raise TypeError(msg)
        return F.image_compression(img, quality, image_type)

    def get_params(self) -> Dict[str, Any]:
        image_type = ".jpg"

        if self.compression_type == ImageCompression.ImageCompressionType.WEBP:
            image_type = ".webp"

        return {
            "quality": random.randint(self.quality_lower, self.quality_upper),
            "image_type": image_type,
        }

    def get_transform_init_args(self) -> Dict[str, Any]:
        return {
            "quality_lower": self.quality_lower,
            "quality_upper": self.quality_upper,
            "compression_type": self.compression_type.value,
        }
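
A sketch selecting WebP compression; the quality bounds are illustrative:

Python
import albumentations as A

aug = A.ImageCompression(
    quality_lower=40,
    quality_upper=80,
    compression_type=A.ImageCompression.ImageCompressionType.WEBP,
    p=1.0,
)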
class ImageCompressionType

Defines the types of image compression.

This Enum class is used to specify the image compression format.

Attributes:

Name Type Description
JPEG int

Represents the JPEG image compression format.

WEBP int

Represents the WEBP image compression format.

Source code in albumentations/augmentations/transforms.py
Python
class ImageCompressionType(IntEnum):
    """Defines the types of image compression.

    This Enum class is used to specify the image compression format.

    Attributes:
        JPEG (int): Represents the JPEG image compression format.
        WEBP (int): Represents the WEBP image compression format.

    """

    JPEG = 0
    WEBP = 1

class InvertImg [view source on GitHub]

Invert the input image by subtracting pixel values from the maximum value of the image type, i.e., 255 for uint8 and 1.0 for float32.

Parameters:

Name Type Description
p

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class InvertImg(ImageOnlyTransform):
    """Invert the input image by subtracting pixel values from max values of the image types,
    i.e., 255 for uint8 and 1.0 for float32.

    Args:
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return F.invert(img)

    def get_transform_init_args_names(self) -> Tuple[()]:
        return ()
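
A tiny sketch verifying the uint8 behavior:

Python
import numpy as np
import albumentations as A

img = np.full((8, 8, 3), 200, dtype=np.uint8)
out = A.InvertImg(p=1.0)(image=img)["image"]
assert int(out[0, 0, 0]) == 55  # 255 - 200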

class Lambda (image=None, mask=None, keypoint=None, bbox=None, global_label=None, name=None, always_apply=False, p=1.0) [view source on GitHub]

A flexible transformation class for using user-defined transformation functions per target. The function signature must include **kwargs to accept optional arguments like interpolation method, image size, etc.

Parameters:

Name Type Description
image Optional[Callable[..., Any]]

Image transformation function.

mask Optional[Callable[..., Any]]

Mask transformation function.

keypoint Optional[Callable[..., Any]]

Keypoint transformation function.

bbox Optional[Callable[..., Any]]

BBox transformation function.

global_label Optional[Callable[..., Any]]

Global label transformation function.

always_apply bool

Indicates whether this transformation should be always applied.

p float

probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints, global_label

Image types: Any

Source code in albumentations/augmentations/transforms.py
Python
class Lambda(NoOp):
    """A flexible transformation class for using user-defined transformation functions per targets.
    Function signature must include **kwargs to accept optional arguments like interpolation method, image size, etc:

    Args:
        image: Image transformation function.
        mask: Mask transformation function.
        keypoint: Keypoint transformation function.
        bbox: BBox transformation function.
        global_label: Global label transformation function.
        always_apply: Indicates whether this transformation should be always applied.
        p: probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints, global_label

    Image types:
        Any

    """

    def __init__(
        self,
        image: Optional[Callable[..., Any]] = None,
        mask: Optional[Callable[..., Any]] = None,
        keypoint: Optional[Callable[..., Any]] = None,
        bbox: Optional[Callable[..., Any]] = None,
        global_label: Optional[Callable[..., Any]] = None,
        name: Optional[str] = None,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(always_apply, p)

        self.name = name
        self.custom_apply_fns = {
            target_name: F.noop for target_name in ("image", "mask", "keypoint", "bbox", "global_label")
        }
        for target_name, custom_apply_fn in {
            "image": image,
            "mask": mask,
            "keypoint": keypoint,
            "bbox": bbox,
            "global_label": global_label,
        }.items():
            if custom_apply_fn is not None:
                if isinstance(custom_apply_fn, LambdaType) and custom_apply_fn.__name__ == "<lambda>":
                    warnings.warn(
                        "Using lambda is incompatible with multiprocessing. "
                        "Consider using regular functions or partial()."
                    )

                self.custom_apply_fns[target_name] = custom_apply_fn

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        fn = self.custom_apply_fns["image"]
        return fn(img, **params)

    def apply_to_mask(self, mask: np.ndarray, **params: Any) -> np.ndarray:
        fn = self.custom_apply_fns["mask"]
        return fn(mask, **params)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        fn = self.custom_apply_fns["bbox"]
        return fn(bbox, **params)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        fn = self.custom_apply_fns["keypoint"]
        return fn(keypoint, **params)

    def apply_to_global_label(self, label: np.ndarray, **params: Any) -> np.ndarray:
        fn = self.custom_apply_fns["global_label"]
        return fn(label, **params)

    @classmethod
    def is_serializable(cls) -> bool:
        return False

    def to_dict_private(self) -> Dict[str, Any]:
        if self.name is None:
            msg = (
                "To make a Lambda transform serializable you should provide the `name` argument, "
                "e.g. `Lambda(name='my_transform', image=<some func>, ...)`."
            )
            raise ValueError(msg)
        return {"__class_fullname__": self.get_class_fullname(), "__name__": self.name}

    def __repr__(self) -> str:
        state = {"name": self.name}
        state.update(self.custom_apply_fns.items())  # type: ignore[arg-type]
        state.update(self.get_base_init_args())
        return f"{self.__class__.__name__}({format_args(state)})"

class MultiplicativeNoise (multiplier=(0.9, 1.1), per_channel=False, elementwise=False, always_apply=False, p=0.5) [view source on GitHub]

Multiply the image by a random number or an array of numbers.

Parameters:

Name Type Description
multiplier Union[float, Tuple[float, float]]

If a single float, the image will be multiplied by this number. If a tuple of floats, the multiplier will be sampled from the range [multiplier[0], multiplier[1]). Default: (0.9, 1.1).

per_channel bool

If False, the same values will be used for all channels. If True, sample a value for each channel. Default: False.

elementwise bool

If False, multiply all pixels in the image by a single random value sampled once. If True, multiply image pixels by values that are pixelwise randomly sampled. Default: False.

Targets

image

Image types: Any

Source code in albumentations/augmentations/transforms.py
Python
class MultiplicativeNoise(ImageOnlyTransform):
    """Multiply image to random number or array of numbers.

    Args:
        multiplier: If single float image will be multiplied to this number.
            If tuple of float multiplier will be in range `[multiplier[0], multiplier[1])`. Default: (0.9, 1.1).
        per_channel: If `False`, same values for all channels will be used.
            If `True` use sample values for each channels. Default False.
        elementwise: If `False` multiply multiply all pixels in an image with a random value sampled once.
            If `True` Multiply image pixels with values that are pixelwise randomly sampled. Default: False.

    Targets:
        image

    Image types:
        Any

    """

    def __init__(
        self,
        multiplier: ScaleFloatType = (0.9, 1.1),
        per_channel: bool = False,
        elementwise: bool = False,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.multiplier = to_tuple(multiplier, multiplier)
        self.per_channel = per_channel
        self.elementwise = elementwise

    def apply(self, img: np.ndarray, multiplier: float = np.array([1]), **kwargs: Any) -> np.ndarray:
        return F.multiply(img, multiplier)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        if self.multiplier[0] == self.multiplier[1]:
            return {"multiplier": np.array([self.multiplier[0]])}

        img = params["image"]

        height, width = img.shape[:2]

        num_channels = (1 if is_grayscale_image(img) else img.shape[-1]) if self.per_channel else 1

        shape = [height, width, num_channels] if self.elementwise else [num_channels]

        multiplier = random_utils.uniform(self.multiplier[0], self.multiplier[1], tuple(shape))
        if is_grayscale_image(img) and img.ndim == TWO:
            multiplier = np.squeeze(multiplier)

        return {"multiplier": multiplier}

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return "multiplier", "per_channel", "elementwise"

class Normalize (mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, always_apply=False, p=1.0) [view source on GitHub]

Normalization is applied by the formula: img = (img - mean * max_pixel_value) / (std * max_pixel_value)

Parameters:

Name Type Description
mean Union[float, Sequence[float]]

mean values

std Union[float, Sequence[float]]

std values

max_pixel_value float

maximum possible pixel value

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class Normalize(ImageOnlyTransform):
    """Normalization is applied by the formula: `img = (img - mean * max_pixel_value) / (std * max_pixel_value)`

    Args:
        mean: mean values
        std: std values
        max_pixel_value: maximum possible pixel value

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        mean: Union[float, Sequence[float]] = (0.485, 0.456, 0.406),
        std: Union[float, Sequence[float]] = (0.229, 0.224, 0.225),
        max_pixel_value: float = 255.0,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(always_apply, p)
        self.mean = mean
        self.std = std
        self.max_pixel_value = max_pixel_value

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return F.normalize(img, self.mean, self.std, self.max_pixel_value)

    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return ("mean", "std", "max_pixel_value")

class PixelDropout (dropout_prob=0.01, per_channel=False, drop_value=0, mask_drop_value=None, always_apply=False, p=0.5) [view source on GitHub]

Set pixels to 0 with some probability.

Parameters:

Name Type Description
dropout_prob float

pixel drop probability. Default: 0.01

per_channel bool

if set to True drop mask will be sampled for each channel, otherwise the same mask will be sampled for all channels. Default: False

drop_value number or sequence of numbers or None

Value that will be set in the dropped place. If set to None, the value will be sampled randomly from default ranges:
- uint8 - [0, 255]
- uint16 - [0, 65535]
- uint32 - [0, 4294967295]
- float, double - [0, 1]
Default: 0

mask_drop_value number or sequence of numbers or None

Value that will be set in the dropped place in masks. If set to None, masks will be unchanged. Default: 0

p float

probability of applying the transform. Default: 0.5.

Targets

image, mask

Image types: any

Source code in albumentations/augmentations/transforms.py
Python
class PixelDropout(DualTransform):
    """Set pixels to 0 with some probability.

    Args:
        dropout_prob (float): pixel drop probability. Default: 0.01
        per_channel (bool): if set to `True` drop mask will be sampled for each channel,
            otherwise the same mask will be sampled for all channels. Default: False
        drop_value (number or sequence of numbers or None): Value that will be set in dropped place.
            If set to None value will be sampled randomly, default ranges will be used:
                - uint8 - [0, 255]
                - uint16 - [0, 65535]
                - uint32 - [0, 4294967295]
                - float, double - [0, 1]
            Default: 0
        mask_drop_value (number or sequence of numbers or None): Value that will be set in dropped place in masks.
            If set to None masks will be unchanged. Default: 0
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask
    Image types:
        any

    """

    _targets = (Targets.IMAGE, Targets.MASK)

    def __init__(
        self,
        dropout_prob: float = 0.01,
        per_channel: bool = False,
        drop_value: Optional[ScaleFloatType] = 0,
        mask_drop_value: Optional[ScaleFloatType] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.dropout_prob = dropout_prob
        self.per_channel = per_channel
        self.drop_value = drop_value
        self.mask_drop_value = mask_drop_value

        if self.mask_drop_value is not None and self.per_channel:
            msg = "PixelDropout supports mask only with per_channel=False"
            raise ValueError(msg)

    def apply(
        self,
        img: np.ndarray,
        drop_mask: Optional[np.ndarray] = None,
        drop_value: Union[float, Sequence[float]] = (),
        **params: Any,
    ) -> np.ndarray:
        return F.pixel_dropout(img, drop_mask, drop_value)

    def apply_to_mask(self, mask: np.ndarray, drop_mask: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
        if self.mask_drop_value is None:
            return mask

        if mask.ndim == TWO:
            drop_mask = np.squeeze(drop_mask)

        return F.pixel_dropout(mask, drop_mask, self.mask_drop_value)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return bbox

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return keypoint

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        img = params["image"]
        shape = img.shape if self.per_channel else img.shape[:2]

        rnd = np.random.RandomState(random.randint(0, 1 << 31))
        # Use choice to create boolean matrix, if we will use binomial after that we will need type conversion
        drop_mask = rnd.choice([True, False], shape, p=[self.dropout_prob, 1 - self.dropout_prob])

        drop_value: Union[float, Sequence[float], np.ndarray]
        if drop_mask.ndim != img.ndim:
            drop_mask = np.expand_dims(drop_mask, -1)
        if self.drop_value is None:
            drop_shape = 1 if is_grayscale_image(img) else int(img.shape[-1])

            if img.dtype in (np.uint8, np.uint16, np.uint32):
                drop_value = rnd.randint(0, int(F.MAX_VALUES_BY_DTYPE[img.dtype]), drop_shape, img.dtype)
            elif img.dtype in [np.float32, np.double]:
                drop_value = rnd.uniform(0, 1, drop_shape).astype(img.dtype)
            else:
                raise ValueError(f"Unsupported dtype: {img.dtype}")
        else:
            drop_value = self.drop_value

        return {"drop_mask": drop_mask, "drop_value": drop_value}

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return ("dropout_prob", "per_channel", "drop_value", "mask_drop_value")

class Posterize (num_bits=4, always_apply=False, p=0.5) [view source on GitHub]

Reduce the number of bits for each color channel.

Parameters:

Name Type Description
num_bits (int, int) or int, or list of ints [r, g, b], or list of ints [[r1, r2], [g1, g2], [b1, b2]]

number of high bits. If num_bits is a single value, the range will be [num_bits, num_bits]. Must be in range [0, 8]. Default: 4.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8

Source code in albumentations/augmentations/transforms.py
Python
class Posterize(ImageOnlyTransform):
    """Reduce the number of bits for each color channel.

    Args:
        num_bits ((int, int) or int,
                  or list of ints [r, g, b],
                  or list of ints [[r1, r2], [g1, g2], [b1, b2]]): number of high bits.
            If num_bits is a single value, the range will be [num_bits, num_bits].
            Must be in range [0, 8]. Default: 4.
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8

    """

    def __init__(
        self,
        num_bits: Union[int, Tuple[int, int], Tuple[int, int, int]] = 4,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        if isinstance(num_bits, int):
            self.num_bits = to_tuple(num_bits, num_bits)
        elif isinstance(num_bits, Sequence) and len(num_bits) == THREE:
            self.num_bits = [to_tuple(i, 0) for i in num_bits]  # type: ignore[assignment]
        else:
            self.num_bits = to_tuple(num_bits, 0)  # type: ignore[arg-type]

    def apply(self, img: np.ndarray, num_bits: int = 1, **params: Any) -> np.ndarray:
        return F.posterize(img, num_bits)

    def get_params(self) -> Dict[str, Any]:
        if len(self.num_bits) == THREE:
            return {"num_bits": [random.randint(int(i[0]), int(i[1])) for i in self.num_bits]}  # type: ignore[index]
        num_bits = self.num_bits
        return {"num_bits": random.randint(int(num_bits[0]), int(num_bits[1]))}

    def get_transform_init_args_names(self) -> Tuple[str]:
        return ("num_bits",)

class RGBShift (r_shift_limit=20, g_shift_limit=20, b_shift_limit=20, always_apply=False, p=0.5) [view source on GitHub]

Randomly shift values for each channel of the input RGB image.

Parameters:

Name Type Description
r_shift_limit Union[int, Tuple[int, int]]

range for changing values for the red channel. If r_shift_limit is a single int, the range will be (-r_shift_limit, r_shift_limit). Default: (-20, 20).

g_shift_limit Union[int, Tuple[int, int]]

range for changing values for the green channel. If g_shift_limit is a single int, the range will be (-g_shift_limit, g_shift_limit). Default: (-20, 20).

b_shift_limit Union[int, Tuple[int, int]]

range for changing values for the blue channel. If b_shift_limit is a single int, the range will be (-b_shift_limit, b_shift_limit). Default: (-20, 20).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class RGBShift(ImageOnlyTransform):
    """Randomly shift values for each channel of the input RGB image.

    Args:
        r_shift_limit: range for changing values for the red channel. If r_shift_limit is a single
            int, the range will be (-r_shift_limit, r_shift_limit). Default: (-20, 20).
        g_shift_limit: range for changing values for the green channel. If g_shift_limit is a
            single int, the range  will be (-g_shift_limit, g_shift_limit). Default: (-20, 20).
        b_shift_limit: range for changing values for the blue channel. If b_shift_limit is a single
            int, the range will be (-b_shift_limit, b_shift_limit). Default: (-20, 20).
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        r_shift_limit: ScaleIntType = 20,
        g_shift_limit: ScaleIntType = 20,
        b_shift_limit: ScaleIntType = 20,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.r_shift_limit = to_tuple(r_shift_limit)
        self.g_shift_limit = to_tuple(g_shift_limit)
        self.b_shift_limit = to_tuple(b_shift_limit)

    def apply(self, img: np.ndarray, r_shift: int = 0, g_shift: int = 0, b_shift: int = 0, **params: Any) -> np.ndarray:
        if not is_rgb_image(img):
            msg = "RGBShift transformation expects 3-channel images."
            raise TypeError(msg)
        return F.shift_rgb(img, r_shift, g_shift, b_shift)

    def get_params(self) -> Dict[str, Any]:
        return {
            "r_shift": random.uniform(self.r_shift_limit[0], self.r_shift_limit[1]),
            "g_shift": random.uniform(self.g_shift_limit[0], self.g_shift_limit[1]),
            "b_shift": random.uniform(self.b_shift_limit[0], self.b_shift_limit[1]),
        }

    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return ("r_shift_limit", "g_shift_limit", "b_shift_limit")

class RandomBrightnessContrast (brightness_limit=0.2, contrast_limit=0.2, brightness_by_max=True, always_apply=False, p=0.5) [view source on GitHub]

Randomly change brightness and contrast of the input image.

Parameters:

Name Type Description
brightness_limit Union[float, Tuple[float, float]]

factor range for changing brightness. If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).

contrast_limit Union[float, Tuple[float, float]]

factor range for changing contrast. If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).

brightness_by_max bool

If True adjust brightness by image dtype maximum, else adjust brightness by image mean. Default: True.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class RandomBrightnessContrast(ImageOnlyTransform):
    """Randomly change brightness and contrast of the input image.

    Args:
        brightness_limit: factor range for changing brightness.
            If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
        contrast_limit: factor range for changing contrast.
            If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
        brightness_by_max: If True adjust brightness by image dtype maximum,
            else adjust brightness by image mean. Default: True.
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        brightness_limit: ScaleFloatType = 0.2,
        contrast_limit: ScaleFloatType = 0.2,
        brightness_by_max: bool = True,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.brightness_limit = to_tuple(brightness_limit)
        self.contrast_limit = to_tuple(contrast_limit)
        self.brightness_by_max = brightness_by_max

    def apply(self, img: np.ndarray, alpha: float = 1.0, beta: float = 0.0, **params: Any) -> np.ndarray:
        return F.brightness_contrast_adjust(img, alpha, beta, self.brightness_by_max)

    def get_params(self) -> Dict[str, float]:
        return {
            "alpha": 1.0 + random.uniform(self.contrast_limit[0], self.contrast_limit[1]),
            "beta": 0.0 + random.uniform(self.brightness_limit[0], self.brightness_limit[1]),
        }

    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return ("brightness_limit", "contrast_limit", "brightness_by_max")

class RandomFog (fog_coef_lower=0.3, fog_coef_upper=1, alpha_coef=0.08, always_apply=False, p=0.5) [view source on GitHub]

Simulates fog for the image

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
fog_coef_lower float

lower limit for fog intensity coefficient. Should be in [0, 1] range.

fog_coef_upper float

upper limit for fog intensity coefficient. Should be in [0, 1] range.

alpha_coef float

transparency of the fog circles. Should be in [0, 1] range.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class RandomFog(ImageOnlyTransform):
    """Simulates fog for the image

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        fog_coef_lower: lower limit for fog intensity coefficient. Should be in [0, 1] range.
        fog_coef_upper: upper limit for fog intensity coefficient. Should be in [0, 1] range.
        alpha_coef: transparency of the fog circles. Should be in [0, 1] range.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        fog_coef_lower: float = 0.3,
        fog_coef_upper: float = 1,
        alpha_coef: float = 0.08,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        if not 0 <= fog_coef_lower <= fog_coef_upper <= 1:
            raise ValueError(
                f"Invalid combination if fog_coef_lower and fog_coef_upper. Got: {(fog_coef_lower, fog_coef_upper)}"
            )
        if not 0 <= alpha_coef <= 1:
            raise ValueError(f"alpha_coef must be in range [0, 1]. Got: {alpha_coef}")

        self.fog_coef_lower = fog_coef_lower
        self.fog_coef_upper = fog_coef_upper
        self.alpha_coef = alpha_coef

    def apply(
        self,
        img: np.ndarray,
        fog_coef: np.ndarray = 0.1,
        haze_list: Optional[List[Tuple[int, int]]] = None,
        **params: Any,
    ) -> np.ndarray:
        if haze_list is None:
            haze_list = []
        return F.add_fog(img, fog_coef, self.alpha_coef, haze_list)

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        img = params["image"]
        fog_coef = random.uniform(self.fog_coef_lower, self.fog_coef_upper)

        height, width = imshape = img.shape[:2]

        hw = max(1, int(width // 3 * fog_coef))

        haze_list = []
        midx = width // 2 - 2 * hw
        midy = height // 2 - hw
        index = 1

        while midx > -hw or midy > -hw:
            for _ in range(hw // 10 * index):
                x = random.randint(midx, width - midx - hw)
                y = random.randint(midy, height - midy - hw)
                haze_list.append((x, y))

            midx -= 3 * hw * width // sum(imshape)
            midy -= 3 * hw * height // sum(imshape)
            index += 1

        return {"haze_list": haze_list, "fog_coef": fog_coef}

    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return ("fog_coef_lower", "fog_coef_upper", "alpha_coef")

class RandomGamma (gamma_limit=(80, 120), always_apply=False, p=0.5) [view source on GitHub]

Applies random gamma correction to an image as a form of data augmentation.

This class adjusts the luminance of an image by applying gamma correction with a randomly selected gamma value from a specified range. Gamma correction can simulate various lighting conditions, potentially enhancing model generalization. For more details on gamma correction, see: https://en.wikipedia.org/wiki/Gamma_correction

Attributes:

Name Type Description
gamma_limit Union[int, Tuple[int, int]]

The range for gamma adjustment. If gamma_limit is a single int, the range will be interpreted as (-gamma_limit, gamma_limit), defining how much to adjust the image's gamma. Default is (80, 120).

always_apply bool

If True, the transform will always be applied, regardless of p. Default is False.

p float

The probability that the transform will be applied. Default is 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class RandomGamma(ImageOnlyTransform):
    """Applies random gamma correction to an image as a form of data augmentation.

    This class adjusts the luminance of an image by applying gamma correction with a randomly
    selected gamma value from a specified range. Gamma correction can simulate various lighting
    conditions, potentially enhancing model generalization. For more details on gamma correction,
    see: https://en.wikipedia.org/wiki/Gamma_correction

    Attributes:
        gamma_limit (Union[int, Tuple[int, int]]): The range for gamma adjustment. If `gamma_limit` is a single
            int, the range will be interpreted as (-gamma_limit, gamma_limit), defining how much
            to adjust the image's gamma. Default is (80, 120).
        always_apply (bool): If `True`, the transform will always be applied, regardless of `p`.
            Default is `False`.
        p (float): The probability that the transform will be applied. Default is 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        gamma_limit: ScaleIntType = (80, 120),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.gamma_limit = to_tuple(gamma_limit)

    def apply(self, img: np.ndarray, gamma: float = 1, **params: Any) -> np.ndarray:
        return F.gamma_transform(img, gamma=gamma)

    def get_params(self) -> Dict[str, float]:
        return {"gamma": random.uniform(self.gamma_limit[0], self.gamma_limit[1]) / 100.0}

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("gamma_limit",)

class RandomGravel (gravel_roi=(0.1, 0.4, 0.9, 0.9), number_of_patches=2, always_apply=False, p=0.5) [view source on GitHub]

Adds gravel to the image.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
gravel_roi Tuple[float, float, float, float]

(top-left x, top-left y, bottom-right x, bottom right y). Should be in [0, 1] range

number_of_patches int

number of gravel patches required

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class RandomGravel(ImageOnlyTransform):
    """Add gravels.

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        gravel_roi: (top-left x, top-left y,
            bottom-right x, bottom right y). Should be in [0, 1] range
        number_of_patches: number of gravel patches required

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        gravel_roi: Tuple[float, float, float, float] = (0.1, 0.4, 0.9, 0.9),
        number_of_patches: int = 2,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        (gravel_lower_x, gravel_lower_y, gravel_upper_x, gravel_upper_y) = gravel_roi

        if not 0 <= gravel_lower_x < gravel_upper_x <= 1 or not 0 <= gravel_lower_y < gravel_upper_y <= 1:
            raise ValueError(f"Invalid gravel_roi. Got: {gravel_roi}.")
        if number_of_patches < 1:
            raise ValueError(f"Invalid gravel number_of_patches. Got: {number_of_patches}.")

        self.gravel_roi = gravel_roi
        self.number_of_patches = number_of_patches

    def generate_gravel_patch(self, rectangular_roi: Tuple[int, int, int, int]) -> np.ndarray:
        x1, y1, x2, y2 = rectangular_roi
        area = abs((x2 - x1) * (y2 - y1))
        count = area // 10
        gravels = np.empty([count, 2], dtype=np.int64)
        gravels[:, 0] = random_utils.randint(x1, x2, count)
        gravels[:, 1] = random_utils.randint(y1, y2, count)
        return gravels

    def apply(self, img: np.ndarray, gravels_infos: Optional[List[Any]] = None, **params: Any) -> np.ndarray:
        if gravels_infos is None:
            gravels_infos = []
        return F.add_gravel(img, gravels_infos)

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, np.ndarray]:
        img = params["image"]
        height, width = img.shape[:2]

        x_min, y_min, x_max, y_max = self.gravel_roi
        x_min = int(x_min * width)
        x_max = int(x_max * width)
        y_min = int(y_min * height)
        y_max = int(y_max * height)

        max_height = 200
        max_width = 30

        rectangular_rois = np.zeros([self.number_of_patches, 4], dtype=np.int64)
        xx1 = random_utils.randint(x_min + 1, x_max, self.number_of_patches)  # xmax
        xx2 = random_utils.randint(x_min, xx1)  # xmin
        yy1 = random_utils.randint(y_min + 1, y_max, self.number_of_patches)  # ymax
        yy2 = random_utils.randint(y_min, yy1)  # ymin

        rectangular_rois[:, 0] = xx2
        rectangular_rois[:, 1] = yy2
        rectangular_rois[:, 2] = [min(tup) for tup in zip(xx1, xx2 + max_height)]
        rectangular_rois[:, 3] = [min(tup) for tup in zip(yy1, yy2 + max_width)]

        minx = []
        maxx = []
        miny = []
        maxy = []
        val = []
        for roi in rectangular_rois:
            gravels = self.generate_gravel_patch(roi)
            x = gravels[:, 0]
            y = gravels[:, 1]
            r = random_utils.randint(1, 4, len(gravels))
            sat = random_utils.randint(0, 255, len(gravels))
            # Clamp gravel extents to the image bounds.
            miny.append(np.maximum(y - r, 0))
            maxy.append(np.minimum(y + r, height))
            minx.append(np.maximum(x - r, 0))
            maxx.append(np.minimum(x + r, width))
            val.append(sat)

        return {
            "gravels_infos": np.stack(
                [
                    np.concatenate(miny),
                    np.concatenate(maxy),
                    np.concatenate(minx),
                    np.concatenate(maxx),
                    np.concatenate(val),
                ],
                1,
            )
        }

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return "gravel_roi", "number_of_patches"

class RandomGridShuffle (grid=(3, 3), always_apply=False, p=0.5) [view source on GitHub]

Randomly shuffles the grid's cells on the image.

Parameters:

Name Type Description
grid (int, int)

size of grid for splitting image.

Targets

image, mask, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class RandomGridShuffle(DualTransform):
    """Random shuffle grid's cells on image.

    Args:
        grid ((int, int)): size of grid for splitting image.

    Targets:
        image, mask, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS)

    def __init__(self, grid: Tuple[int, int] = (3, 3), always_apply: bool = False, p: float = 0.5):
        super().__init__(always_apply, p)

        n, m = grid

        if not all(isinstance(dim, int) and dim > 0 for dim in [n, m]):
            raise ValueError(f"Grid dimensions must be positive integers. Current grid dimensions: [{n}, {m}]")

        self.grid = grid

    def apply(self, img: np.ndarray, tiles: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
        return F.swap_tiles_on_image(img, tiles)

    def apply_to_mask(self, mask: np.ndarray, tiles: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
        return F.swap_tiles_on_image(mask, tiles)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        tiles: np.ndarray,
        mapping: Dict[int, int],
        **params: Any,
    ) -> KeypointInternalType:
        x, y = keypoint[:2]

        # Find which original tile the keypoint belongs to
        for original_index, (start_y, start_x, end_y, end_x) in enumerate(tiles):
            if start_y <= y < end_y and start_x <= x < end_x:
                # Find this tile's new index after shuffling
                new_index = mapping[original_index]
                # Get the new tile's coordinates
                new_start_y, new_start_x = tiles[new_index][:2]

                # Map the keypoint to the new tile's position
                new_x = (x - start_x) + new_start_x
                new_y = (y - start_y) + new_start_y

                return (new_x, new_y, *keypoint[2:])

        # If the keypoint wasn't in any tile (shouldn't happen), log a warning for debugging purposes
        warn(
            "Keypoint not in any tile, returning it unchanged. This is unexpected and should be investigated.",
            RuntimeWarning,
        )
        return keypoint

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        # Generate the original grid
        original_tiles = split_uniform_grid(params["image"].shape[:2], self.grid)

        # Copy the original grid to keep track of the initial positions
        indexed_tiles = np.array(list(enumerate(original_tiles)), dtype=object)

        # Shuffle the tiles while keeping track of original indices
        random_utils.shuffle(indexed_tiles)

        # Create a mapping from original positions to new positions
        mapping = {original_index: i for i, (original_index, tile) in enumerate(indexed_tiles)}

        # Extract the shuffled tiles without indices
        shuffled_tiles = np.array([tile for _, tile in indexed_tiles])

        return {"tiles": shuffled_tiles, "mapping": mapping}

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("grid",)

class RandomRain (slant_lower=-10, slant_upper=10, drop_length=20, drop_width=1, drop_color=(200, 200, 200), blur_value=7, brightness_coefficient=0.7, rain_type=None, always_apply=False, p=0.5) [view source on GitHub]

Adds rain effects.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
slant_lower int

should be in range [-20, 20].

slant_upper int

should be in range [-20, 20].

drop_length int

should be in range [0, 100].

drop_width int

should be in range [1, 5].

drop_color list of (r, g, b)

rain lines color.

blur_value int

rainy views are blurry

brightness_coefficient float

rainy days are usually shady. Should be in range [0, 1].

rain_type Optional[str]

One of [None, "drizzle", "heavy", "torrential"]

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class RandomRain(ImageOnlyTransform):
    """Adds rain effects.

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        slant_lower: should be in range [-20, 20].
        slant_upper: should be in range [-20, 20].
        drop_length: should be in range [0, 100].
        drop_width: should be in range [1, 5].
        drop_color (list of (r, g, b)): rain lines color.
        blur_value (int): rainy views are blurry
        brightness_coefficient (float): rainy days are usually shady. Should be in range [0, 1].
        rain_type: One of [None, "drizzle", "heavy", "torrential"]

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        slant_lower: int = -10,
        slant_upper: int = 10,
        drop_length: int = 20,
        drop_width: int = 1,
        drop_color: Tuple[int, int, int] = (200, 200, 200),
        blur_value: int = 7,
        brightness_coefficient: float = 0.7,
        rain_type: Optional[str] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        if rain_type not in ["drizzle", "heavy", "torrential", None]:
            msg = "raint_type must be one of ({}). Got: {}".format(["drizzle", "heavy", "torrential", None], rain_type)
            raise ValueError(msg)
        if not -TWENTY <= slant_lower <= slant_upper <= TWENTY:
            raise ValueError(f"Invalid combination of slant_lower and slant_upper. Got: {(slant_lower, slant_upper)}")
        if not 1 <= drop_width <= FIVE:
            raise ValueError(f"drop_width must be in range [1, 5]. Got: {drop_width}")
        # MAX_JPEG_QUALITY is reused here simply as the constant 100.
        if not 0 <= drop_length <= MAX_JPEG_QUALITY:
            raise ValueError(f"drop_length must be in range [0, 100]. Got: {drop_length}")
        if not 0 <= brightness_coefficient <= 1:
            raise ValueError(f"brightness_coefficient must be in range [0, 1]. Got: {brightness_coefficient}")

        self.slant_lower = slant_lower
        self.slant_upper = slant_upper

        self.drop_length = drop_length
        self.drop_width = drop_width
        self.drop_color = drop_color
        self.blur_value = blur_value
        self.brightness_coefficient = brightness_coefficient
        self.rain_type = rain_type

    def apply(
        self,
        img: np.ndarray,
        slant: int = 10,
        drop_length: int = 20,
        rain_drops: Optional[List[Tuple[int, int]]] = None,
        **params: Any,
    ) -> np.ndarray:
        if rain_drops is None:
            rain_drops = []
        return F.add_rain(
            img,
            slant,
            drop_length,
            self.drop_width,
            self.drop_color,
            self.blur_value,
            self.brightness_coefficient,
            rain_drops,
        )

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        img = params["image"]
        slant = int(random.uniform(self.slant_lower, self.slant_upper))

        height, width = img.shape[:2]
        area = height * width

        if self.rain_type == "drizzle":
            num_drops = area // 770
            drop_length = 10
        elif self.rain_type == "heavy":
            num_drops = width * height // 600
            drop_length = 30
        elif self.rain_type == "torrential":
            num_drops = area // 500
            drop_length = 60
        else:
            drop_length = self.drop_length
            num_drops = area // 600

        rain_drops = []

        for _ in range(num_drops):  # If You want heavy rain, try increasing this
            x = random.randint(slant, width) if slant < 0 else random.randint(0, width - slant)

            y = random.randint(0, height - drop_length)

            rain_drops.append((x, y))

        return {"drop_length": drop_length, "slant": slant, "rain_drops": rain_drops}

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (
            "slant_lower",
            "slant_upper",
            "drop_length",
            "drop_width",
            "drop_color",
            "blur_value",
            "brightness_coefficient",
            "rain_type",
        )
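
A minimal usage sketch. As get_params_dependent_on_targets above shows, a preset rain_type overrides drop_length and the drop count:

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

transform = A.RandomRain(rain_type="heavy", brightness_coefficient=0.8, p=1.0)
rainy = transform(image=image)["image"]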

class RandomShadow (shadow_roi=(0, 0.5, 1, 1), num_shadows_lower=1, num_shadows_upper=2, shadow_dimension=5, always_apply=False, p=0.5) [view source on GitHub]

Simulates shadows for the image

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
shadow_roi Tuple[float, float, float, float]

region of the image where shadows will appear. All values should be in range [0, 1].

num_shadows_lower int

Lower limit for the possible number of shadows. Should be in range [0, num_shadows_upper].

num_shadows_upper int

Upper limit for the possible number of shadows. Should be in range [num_shadows_lower, inf].

shadow_dimension int

number of edges in the shadow polygons

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class RandomShadow(ImageOnlyTransform):
    """Simulates shadows for the image

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        shadow_roi: region of the image where shadows
            will appear. All values should be in range [0, 1].
        num_shadows_lower: Lower limit for the possible number of shadows.
            Should be in range [0, `num_shadows_upper`].
        num_shadows_upper: Upper limit for the possible number of shadows.
            Should be in range [`num_shadows_lower`, inf].
        shadow_dimension: number of edges in the shadow polygons

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        shadow_roi: Tuple[float, float, float, float] = (0, 0.5, 1, 1),
        num_shadows_lower: int = 1,
        num_shadows_upper: int = 2,
        shadow_dimension: int = 5,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        (shadow_lower_x, shadow_lower_y, shadow_upper_x, shadow_upper_y) = shadow_roi

        if not 0 <= shadow_lower_x <= shadow_upper_x <= 1 or not 0 <= shadow_lower_y <= shadow_upper_y <= 1:
            raise ValueError(f"Invalid shadow_roi. Got: {shadow_roi}")
        if not 0 <= num_shadows_lower <= num_shadows_upper:
            msg = "Invalid combination of num_shadows_lower nad num_shadows_upper. "
            f"Got: {(num_shadows_lower, num_shadows_upper)}"
            raise ValueError(msg)

        self.shadow_roi = shadow_roi

        self.num_shadows_lower = num_shadows_lower
        self.num_shadows_upper = num_shadows_upper

        self.shadow_dimension = shadow_dimension

    def apply(
        self, img: np.ndarray, vertices_list: Optional[List[List[Tuple[int, int]]]] = None, **params: Any
    ) -> np.ndarray:
        if vertices_list is None:
            vertices_list = []
        return F.add_shadow(img, vertices_list)

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, List[np.ndarray]]:
        img = params["image"]
        height, width = img.shape[:2]

        num_shadows = random.randint(self.num_shadows_lower, self.num_shadows_upper)

        x_min, y_min, x_max, y_max = self.shadow_roi

        x_min = int(x_min * width)
        x_max = int(x_max * width)
        y_min = int(y_min * height)
        y_max = int(y_max * height)

        vertices_list = []

        for _ in range(num_shadows):
            vertex = [
                (random.randint(x_min, x_max), random.randint(y_min, y_max)) for _ in range(self.shadow_dimension)
            ]

            vertices = np.array([vertex], dtype=np.int32)
            vertices_list.append(vertices)

        return {"vertices_list": vertices_list}

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return (
            "shadow_roi",
            "num_shadows_lower",
            "num_shadows_upper",
            "shadow_dimension",
        )
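
A minimal usage sketch (ROI and counts are illustrative); each shadow is a random polygon with shadow_dimension vertices drawn inside shadow_roi:

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

# Restrict shadows to the bottom half of the image.
transform = A.RandomShadow(shadow_roi=(0, 0.5, 1, 1), num_shadows_lower=1, num_shadows_upper=2, p=1.0)
shadowed = transform(image=image)["image"]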

class RandomSnow (snow_point_lower=0.1, snow_point_upper=0.3, brightness_coeff=2.5, always_apply=False, p=0.5) [view source on GitHub]

Bleach out some pixel values simulating snow.

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
snow_point_lower float

lower bound of the amount of snow. Should be in [0, 1] range

snow_point_upper float

upper bound of the amount of snow. Should be in [0, 1] range

brightness_coeff float

a larger number will lead to more snow on the image. Should be >= 0

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class RandomSnow(ImageOnlyTransform):
    """Bleach out some pixel values simulating snow.

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        snow_point_lower: lower bound of the amount of snow. Should be in [0, 1] range
        snow_point_upper: upper bound of the amount of snow. Should be in [0, 1] range
        brightness_coeff: a larger number will lead to more snow on the image. Should be >= 0

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        snow_point_lower: float = 0.1,
        snow_point_upper: float = 0.3,
        brightness_coeff: float = 2.5,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        if not 0 <= snow_point_lower <= snow_point_upper <= 1:
            msg = (
                "Invalid combination of snow_point_lower and snow_point_upper. "
                f"Got: {(snow_point_lower, snow_point_upper)}"
            )
            raise ValueError(msg)
        if brightness_coeff < 0:
            raise ValueError(f"brightness_coeff must be greater than 0. Got: {brightness_coeff}")

        self.snow_point_lower = snow_point_lower
        self.snow_point_upper = snow_point_upper
        self.brightness_coeff = brightness_coeff

    def apply(self, img: np.ndarray, snow_point: float = 0.1, **params: Any) -> np.ndarray:
        return F.add_snow(img, snow_point, self.brightness_coeff)

    def get_params(self) -> Dict[str, np.ndarray]:
        return {"snow_point": random.uniform(self.snow_point_lower, self.snow_point_upper)}

    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return ("snow_point_lower", "snow_point_upper", "brightness_coeff")

class RandomSunFlare (flare_roi=(0, 0, 1, 0.5), angle_lower=0, angle_upper=1, num_flare_circles_lower=6, num_flare_circles_upper=10, src_radius=400, src_color=(255, 255, 255), always_apply=False, p=0.5) [view source on GitHub]

Simulates Sun Flare for the image

From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Parameters:

Name Type Description
flare_roi Tuple[float, float, float, float]

region of the image where flare will appear (x_min, y_min, x_max, y_max). All values should be in range [0, 1].

angle_lower float

should be in range [0, angle_upper].

angle_upper float

should be in range [angle_lower, 1].

num_flare_circles_lower int

lower limit for the number of flare circles. Should be in range [0, num_flare_circles_upper].

num_flare_circles_upper int

upper limit for the number of flare circles. Should be in range [num_flare_circles_lower, inf].

src_radius int

radius of the flare source circle, in pixels. Default: 400.

src_color Tuple[int, int, int]

color of the flare

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class RandomSunFlare(ImageOnlyTransform):
    """Simulates Sun Flare for the image

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        flare_roi: region of the image where flare will appear (x_min, y_min, x_max, y_max).
            All values should be in range [0, 1].
        angle_lower: should be in range [0, `angle_upper`].
        angle_upper: should be in range [`angle_lower`, 1].
        num_flare_circles_lower: lower limit for the number of flare circles.
            Should be in range [0, `num_flare_circles_upper`].
        num_flare_circles_upper: upper limit for the number of flare circles.
            Should be in range [`num_flare_circles_lower`, inf].
        src_radius: radius of the flare source circle, in pixels. Default: 400.
        src_color: color of the flare

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        flare_roi: Tuple[float, float, float, float] = (0, 0, 1, 0.5),
        angle_lower: float = 0,
        angle_upper: float = 1,
        num_flare_circles_lower: int = 6,
        num_flare_circles_upper: int = 10,
        src_radius: int = 400,
        src_color: Tuple[int, int, int] = (255, 255, 255),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        (
            flare_center_lower_x,
            flare_center_lower_y,
            flare_center_upper_x,
            flare_center_upper_y,
        ) = flare_roi

        if (
            not 0 <= flare_center_lower_x < flare_center_upper_x <= 1
            or not 0 <= flare_center_lower_y < flare_center_upper_y <= 1
        ):
            raise ValueError(f"Invalid flare_roi. Got: {flare_roi}")
        if not 0 <= angle_lower < angle_upper <= 1:
            raise ValueError(f"Invalid combination of angle_lower nad angle_upper. Got: {(angle_lower, angle_upper)}")
        if not 0 <= num_flare_circles_lower < num_flare_circles_upper:
            msg = (
                "Invalid combination of num_flare_circles_lower and num_flare_circles_upper. "
                f"Got: {(num_flare_circles_lower, num_flare_circles_upper)}"
            )
            raise ValueError(msg)

        self.flare_center_lower_x = flare_center_lower_x
        self.flare_center_upper_x = flare_center_upper_x

        self.flare_center_lower_y = flare_center_lower_y
        self.flare_center_upper_y = flare_center_upper_y

        self.angle_lower = angle_lower
        self.angle_upper = angle_upper
        self.num_flare_circles_lower = num_flare_circles_lower
        self.num_flare_circles_upper = num_flare_circles_upper

        self.src_radius = src_radius
        self.src_color = src_color

    def apply(
        self,
        img: np.ndarray,
        flare_center_x: float = 0.5,
        flare_center_y: float = 0.5,
        circles: Optional[List[Any]] = None,
        **params: Any,
    ) -> np.ndarray:
        if circles is None:
            circles = []
        return F.add_sun_flare(
            img,
            flare_center_x,
            flare_center_y,
            self.src_radius,
            self.src_color,
            circles,
        )

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        img = params["image"]
        height, width = img.shape[:2]

        angle = 2 * math.pi * random.uniform(self.angle_lower, self.angle_upper)

        flare_center_x = random.uniform(self.flare_center_lower_x, self.flare_center_upper_x)
        flare_center_y = random.uniform(self.flare_center_lower_y, self.flare_center_upper_y)

        flare_center_x = int(width * flare_center_x)
        flare_center_y = int(height * flare_center_y)

        num_circles = random.randint(self.num_flare_circles_lower, self.num_flare_circles_upper)

        circles = []

        x = []
        y = []

        def line(t: float) -> Tuple[float, float]:
            return (flare_center_x + t * math.cos(angle), flare_center_y + t * math.sin(angle))

        for t_val in range(-flare_center_x, width - flare_center_x, 10):
            rand_x, rand_y = line(t_val)
            x.append(rand_x)
            y.append(rand_y)

        for _ in range(num_circles):
            alpha = random.uniform(0.05, 0.2)
            r = random.randint(0, len(x) - 1)
            rad = random.randint(1, max(height // 100 - 2, 2))

            r_color = random.randint(max(self.src_color[0] - 50, 0), self.src_color[0])
            g_color = random.randint(max(self.src_color[1] - 50, 0), self.src_color[1])
            b_color = random.randint(max(self.src_color[2] - 50, 0), self.src_color[2])

            circles += [
                (
                    alpha,
                    (int(x[r]), int(y[r])),
                    pow(rad, 3),
                    (r_color, g_color, b_color),
                )
            ]

        return {
            "circles": circles,
            "flare_center_x": flare_center_x,
            "flare_center_y": flare_center_y,
        }

    def get_transform_init_args(self) -> Dict[str, Any]:
        return {
            "flare_roi": (
                self.flare_center_lower_x,
                self.flare_center_lower_y,
                self.flare_center_upper_x,
                self.flare_center_upper_y,
            ),
            "angle_lower": self.angle_lower,
            "angle_upper": self.angle_upper,
            "num_flare_circles_lower": self.num_flare_circles_lower,
            "num_flare_circles_upper": self.num_flare_circles_upper,
            "src_radius": self.src_radius,
            "src_color": self.src_color,
        }
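
A minimal sketch; note from get_params_dependent_on_targets above that angle_lower and angle_upper are fractions of a full turn, mapped to [0, 2*pi] radians:

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)

# Flare center is sampled inside the top half; src_radius is in pixels.
transform = A.RandomSunFlare(flare_roi=(0, 0, 1, 0.5), src_radius=150, p=1.0)
flared = transform(image=image)["image"]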

class RandomToneCurve (scale=0.1, always_apply=False, p=0.5) [view source on GitHub]

Randomly change the relationship between bright and dark areas of the image by manipulating its tone curve.

Parameters:

Name Type Description
scale float

standard deviation of the normal distribution. Used to sample random distances to move two control points that modify the image's curve. Values should be in range [0, 1]. Default: 0.1

Targets

image

Image types: uint8

Source code in albumentations/augmentations/transforms.py
Python
class RandomToneCurve(ImageOnlyTransform):
    """Randomly change the relationship between bright and dark areas of the image by manipulating its tone curve.

    Args:
        scale: standard deviation of the normal distribution.
            Used to sample random distances to move two control points that modify the image's curve.
            Values should be in range [0, 1]. Default: 0.1


    Targets:
        image

    Image types:
        uint8

    """

    def __init__(
        self,
        scale: float = 0.1,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.scale = scale

    def apply(self, img: np.ndarray, low_y: float, high_y: float, **params: Any) -> np.ndarray:
        return F.move_tone_curve(img, low_y, high_y)

    def get_params(self) -> Dict[str, float]:
        return {
            "low_y": np.clip(random_utils.normal(loc=0.25, scale=self.scale), 0, 1),
            "high_y": np.clip(random_utils.normal(loc=0.75, scale=self.scale), 0, 1),
        }

    def get_transform_init_args_names(self) -> Tuple[str]:
        return ("scale",)

class RingingOvershoot (blur_limit=(7, 15), cutoff=(0.7853981633974483, 1.5707963267948966), always_apply=False, p=0.5) [view source on GitHub]

Create ringing or overshoot artefacts by convolving the image with a 2D sinc filter.

Parameters:

Name Type Description
blur_limit Union[int, Tuple[int, int]]

maximum kernel size for sinc filter. Should be in range [3, inf). Default: (7, 15).

cutoff Union[float, Tuple[float, float]]

range to choose the cutoff frequency in radians. Should be in range (0, np.pi) Default: (np.pi / 4, np.pi / 2).

p float

probability of applying the transform. Default: 0.5.

Reference

dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter https://arxiv.org/abs/2107.10833

Targets

image

Source code in albumentations/augmentations/transforms.py
Python
class RingingOvershoot(ImageOnlyTransform):
    """Create ringing or overshoot artefacts by conlvolving image with 2D sinc filter.

    Args:
        blur_limit: maximum kernel size for sinc filter.
            Should be in range [3, inf). Default: (7, 15).
        cutoff: range to choose the cutoff frequency in radians.
            Should be in range (0, np.pi)
            Default: (np.pi / 4, np.pi / 2).
        p: probability of applying the transform. Default: 0.5.

    Reference:
        dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter
        https://arxiv.org/abs/2107.10833

    Targets:
        image

    """

    def __init__(
        self,
        blur_limit: ScaleIntType = (7, 15),
        cutoff: ScaleFloatType = (np.pi / 4, np.pi / 2),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.blur_limit = cast(Tuple[int, int], to_tuple(blur_limit, 3))
        self.cutoff = self.__check_values(to_tuple(cutoff, np.pi / 2), name="cutoff", bounds=(0, np.pi))

    @staticmethod
    def __check_values(
        value: Tuple[float, float], name: str, bounds: Tuple[float, float] = (0, float("inf"))
    ) -> Tuple[float, float]:
        if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
            raise ValueError(f"{name} values should be between {bounds}")
        return value

    def get_params(self) -> Dict[str, np.ndarray]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
        if ksize % 2 == 0:
            raise ValueError(f"Kernel size must be odd. Got: {ksize}")

        cutoff = random.uniform(*self.cutoff)

        # From dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter
        with np.errstate(divide="ignore", invalid="ignore"):
            kernel = np.fromfunction(
                lambda x, y: cutoff
                * special.j1(cutoff * np.sqrt((x - (ksize - 1) / 2) ** 2 + (y - (ksize - 1) / 2) ** 2))
                / (2 * np.pi * np.sqrt((x - (ksize - 1) / 2) ** 2 + (y - (ksize - 1) / 2) ** 2)),
                [ksize, ksize],
            )
        kernel[(ksize - 1) // 2, (ksize - 1) // 2] = cutoff**2 / (4 * np.pi)

        # Normalize kernel
        kernel = kernel.astype(np.float32) / np.sum(kernel)

        return {"kernel": kernel}

    def apply(self, img: np.ndarray, kernel: Optional[int] = None, **params: Any) -> np.ndarray:
        return F.convolve(img, kernel)

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("blur_limit", "cutoff")

class Sharpen (alpha=(0.2, 0.5), lightness=(0.5, 1.0), always_apply=False, p=0.5) [view source on GitHub]

Sharpens the input image and overlays the result with the original image.

Parameters:

Name Type Description
alpha Tuple[float, float]

range to choose the visibility of the sharpened image. At 0, only the original image is visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5).

lightness Tuple[float, float]

range to choose the lightness of the sharpened image. Default: (0.5, 1.0).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Source code in albumentations/augmentations/transforms.py
Python
class Sharpen(ImageOnlyTransform):
    """Sharpen the input image and overlays the result with the original image.

    Args:
        alpha: range to choose the visibility of the sharpened image. At 0, only the original image is
            visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5).
        lightness: range to choose the lightness of the sharpened image. Default: (0.5, 1.0).
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    """

    def __init__(
        self,
        alpha: Tuple[float, float] = (0.2, 0.5),
        lightness: Tuple[float, float] = (0.5, 1.0),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.alpha = self.__check_values(to_tuple(alpha, 0.0), name="alpha", bounds=(0.0, 1.0))
        self.lightness = self.__check_values(to_tuple(lightness, 0.0), name="lightness")

    @staticmethod
    def __check_values(
        value: Tuple[float, float], name: str, bounds: Tuple[float, float] = (0, float("inf"))
    ) -> Tuple[float, float]:
        if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
            raise ValueError(f"{name} values should be between {bounds}")
        return value

    @staticmethod
    def __generate_sharpening_matrix(alpha_sample: np.ndarray, lightness_sample: np.ndarray) -> np.ndarray:
        matrix_nochange = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
        matrix_effect = np.array(
            [[-1, -1, -1], [-1, 8 + lightness_sample, -1], [-1, -1, -1]],
            dtype=np.float32,
        )

        return (1 - alpha_sample) * matrix_nochange + alpha_sample * matrix_effect

    def get_params(self) -> Dict[str, np.ndarray]:
        alpha = random.uniform(*self.alpha)
        lightness = random.uniform(*self.lightness)
        sharpening_matrix = self.__generate_sharpening_matrix(alpha_sample=alpha, lightness_sample=lightness)
        return {"sharpening_matrix": sharpening_matrix}

    def apply(self, img: np.ndarray, sharpening_matrix: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
        return F.convolve(img, sharpening_matrix)

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("alpha", "lightness")

class Solarize (threshold=128, always_apply=False, p=0.5) [view source on GitHub]

Invert all pixel values above a threshold.

Parameters:

Name Type Description
threshold Union[float, Tuple[float, float], int, Tuple[int, int]]

range for solarizing threshold. If threshold is a single value, the range will be [threshold, threshold]. Default: 128.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: any

Source code in albumentations/augmentations/transforms.py
Python
class Solarize(ImageOnlyTransform):
    """Invert all pixel values above a threshold.

    Args:
        threshold: range for solarizing threshold.
            If threshold is a single value, the range will be [threshold, threshold]. Default: 128.
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        any

    """

    def __init__(self, threshold: ScaleType = 128, always_apply: bool = False, p: float = 0.5):
        super().__init__(always_apply, p)

        if isinstance(threshold, (int, float)):
            self.threshold = to_tuple(threshold, low=threshold)
        else:
            self.threshold = to_tuple(threshold, low=0)

    def apply(self, img: np.ndarray, threshold: int = 0, **params: Any) -> np.ndarray:
        return F.solarize(img, threshold)

    def get_params(self) -> Dict[str, float]:
        return {"threshold": random.uniform(self.threshold[0], self.threshold[1])}

    def get_transform_init_args_names(self) -> Tuple[str]:
        return ("threshold",)

class Spatter (mean=0.65, std=0.3, gauss_sigma=2, cutout_threshold=0.68, intensity=0.6, mode='rain', color=None, always_apply=False, p=0.5) [view source on GitHub]

Apply spatter transform. It simulates corruption which can occlude a lens in the form of rain or mud.

Parameters:

Name Type Description
mean float, or tuple of floats

Mean value of normal distribution for generating liquid layer. If single float it will be used as mean. If tuple of float mean will be sampled from range [mean[0], mean[1]). Default: (0.65).

std float, or tuple of floats

Standard deviation value of normal distribution for generating liquid layer. If single float it will be used as std. If tuple of float std will be sampled from range [std[0], std[1]). Default: (0.3).

gauss_sigma float, or tuple of floats

Sigma value for gaussian filtering of liquid layer. If single float it will be used as gauss_sigma. If tuple of float gauss_sigma will be sampled from range [sigma[0], sigma[1]). Default: (2).

cutout_threshold float, or tuple of floats

Threshold for filtering liquid layer (determines number of drops). If single float it will be used as cutout_threshold. If tuple of float cutout_threshold will be sampled from range [cutout_threshold[0], cutout_threshold[1]). Default: (0.68).

intensity float, or tuple of floats

Intensity of corruption. If single float it will be used as intensity. If tuple of float intensity will be sampled from range [intensity[0], intensity[1]). Default: (0.6).

mode string, or list of strings

Type of corruption. Currently, supported options are 'rain' and 'mud'. If a list is provided, the type of corruption will be sampled from the list. Default: ("rain").

color list of (r, g, b) or dict or None

Corruption elements color. If list uses provided list as color for specified mode. If dict uses provided color for specified mode. Color for each specified mode should be provided in dict. If None uses default colors (rain: (238, 238, 175), mud: (20, 42, 63)).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Reference: | https://arxiv.org/pdf/1903.12261.pdf | https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

Source code in albumentations/augmentations/transforms.py
Python
class Spatter(ImageOnlyTransform):
    """Apply spatter transform. It simulates corruption which can occlude a lens in the form of rain or mud.

    Args:
        mean (float, or tuple of floats): Mean value of normal distribution for generating liquid layer.
            If single float it will be used as mean.
            If tuple of float mean will be sampled from range `[mean[0], mean[1])`. Default: (0.65).
        std (float, or tuple of floats): Standard deviation value of normal distribution for generating liquid layer.
            If single float it will be used as std.
            If tuple of float std will be sampled from range `[std[0], std[1])`. Default: (0.3).
        gauss_sigma (float, or tuple of floats): Sigma value for gaussian filtering of liquid layer.
            If single float it will be used as gauss_sigma.
            If tuple of float gauss_sigma will be sampled from range `[sigma[0], sigma[1])`. Default: (2).
        cutout_threshold (float, or tuple of floats): Threshold for filtering liquid layer
            (determines number of drops). If single float it will be used as cutout_threshold.
            If tuple of float cutout_threshold will be sampled from range `[cutout_threshold[0], cutout_threshold[1])`.
            Default: (0.68).
        intensity (float, or tuple of floats): Intensity of corruption.
            If single float it will be used as intensity.
            If tuple of float intensity will be sampled from range `[intensity[0], intensity[1])`. Default: (0.6).
        mode (string, or list of strings): Type of corruption. Currently, supported options are 'rain' and 'mud'.
             If a list is provided, the type of corruption will be sampled from the list. Default: ("rain").
        color (list of (r, g, b) or dict or None): Corruption elements color.
            If list uses provided list as color for specified mode.
            If dict uses provided color for specified mode. Color for each specified mode should be provided in dict.
            If None uses default colors (rain: (238, 238, 175), mud: (20, 42, 63)).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
    |  https://arxiv.org/pdf/1903.12261.pdf
    |  https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

    """

    def __init__(
        self,
        mean: ScaleFloatType = 0.65,
        std: ScaleFloatType = 0.3,
        gauss_sigma: ScaleFloatType = 2,
        cutout_threshold: ScaleFloatType = 0.68,
        intensity: ScaleFloatType = 0.6,
        mode: Union[SpatterMode, Sequence[SpatterMode]] = "rain",
        color: Optional[Union[Sequence[int], Dict[str, Sequence[int]]]] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)

        self.mean = to_tuple(mean, mean)
        self.std = to_tuple(std, std)
        self.gauss_sigma = to_tuple(gauss_sigma, gauss_sigma)
        self.intensity = to_tuple(intensity, intensity)
        self.cutout_threshold = to_tuple(cutout_threshold, cutout_threshold)
        self.color = (
            color
            if color is not None
            else {
                "rain": [238, 238, 175],
                "mud": [20, 42, 63],
            }
        )
        self.mode = mode if isinstance(mode, (list, tuple)) else [mode]

        if len(set(self.mode)) > 1 and not isinstance(self.color, dict):
            raise ValueError(f"Unsupported color: {self.color}. Please specify color for each mode (use dict for it).")

        for i in self.mode:
            if i not in ["rain", "mud"]:
                raise ValueError(f"Unsupported color mode: {mode}. Transform supports only `rain` and `mud` mods.")
            if isinstance(self.color, dict):
                if i not in self.color:
                    raise ValueError(f"Wrong color definition: {self.color}. Color for mode: {i} not specified.")
                if len(self.color[i]) != THREE:
                    raise ValueError(
                        f"Unsupported color: {self.color[i]} for mode {i}. Color should be presented in RGB format."
                    )

        if isinstance(self.color, (list, tuple)):
            if len(self.color) != THREE:
                raise ValueError(f"Unsupported color: {self.color}. Color should be presented in RGB format.")
            self.color = {self.mode[0]: self.color}

    def apply(
        self,
        img: np.ndarray,
        non_mud: Optional[np.ndarray] = None,
        mud: Optional[np.ndarray] = None,
        drops: Optional[np.ndarray] = None,
        mode: SpatterMode = "mud",
        **params: Dict[str, Any],
    ) -> np.ndarray:
        return F.spatter(img, non_mud, mud, drops, mode)

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        height, width = params["image"].shape[:2]

        mean = random.uniform(self.mean[0], self.mean[1])
        std = random.uniform(self.std[0], self.std[1])
        cutout_threshold = random.uniform(self.cutout_threshold[0], self.cutout_threshold[1])
        sigma = random.uniform(self.gauss_sigma[0], self.gauss_sigma[1])
        mode = random.choice(self.mode)
        intensity = random.uniform(self.intensity[0], self.intensity[1])
        color = np.array(self.color[mode]) / 255.0

        liquid_layer = random_utils.normal(size=(height, width), loc=mean, scale=std)
        liquid_layer = gaussian_filter(liquid_layer, sigma=sigma, mode="nearest")
        liquid_layer[liquid_layer < cutout_threshold] = 0

        if mode == "rain":
            liquid_layer = (liquid_layer * 255).astype(np.uint8)
            dist = 255 - cv2.Canny(liquid_layer, 50, 150)
            dist = cv2.distanceTransform(dist, cv2.DIST_L2, 5)
            _, dist = cv2.threshold(dist, 20, 20, cv2.THRESH_TRUNC)
            dist = blur(dist, 3).astype(np.uint8)
            dist = F.equalize(dist)

            ker = np.array([[-2, -1, 0], [-1, 1, 1], [0, 1, 2]])
            dist = F.convolve(dist, ker)
            dist = blur(dist, 3).astype(np.float32)

            m = liquid_layer * dist
            m *= 1 / np.max(m, axis=(0, 1))

            drops = m[:, :, None] * color * intensity
            mud = None
            non_mud = None
        else:
            m = np.where(liquid_layer > cutout_threshold, 1, 0)
            m = gaussian_filter(m.astype(np.float32), sigma=sigma, mode="nearest")
            m[m < 1.2 * cutout_threshold] = 0
            m = m[..., np.newaxis]

            mud = m * color
            non_mud = 1 - m
            drops = None

        return {
            "non_mud": non_mud,
            "mud": mud,
            "drops": drops,
            "mode": mode,
        }

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str, str, str]:
        return "mean", "std", "gauss_sigma", "intensity", "cutout_threshold", "mode", "color"

class Superpixels (p_replace=0.1, n_segments=100, max_size=128, interpolation=1, always_apply=False, p=0.5) [view source on GitHub]

Transform images partially/completely to their superpixel representation. This implementation uses skimage's version of the SLIC algorithm.

Parameters:

Name Type Description
p_replace float or tuple of float

Defines for any segment the probability that the pixels within that segment are replaced by their average color (otherwise, the pixels are not changed).

Examples:

  • A probability of 0.0 would mean that the pixels in no segment are replaced by their average color (image is not changed at all).
  • A probability of 0.5 would mean that around half of all segments are replaced by their average color.
  • A probability of 1.0 would mean that all segments are replaced by their average color (resulting in a Voronoi image).

Behaviour based on chosen data types for this parameter:

  • If a float, then that float will always be used.
  • If a tuple (a, b), then a random probability will be sampled from the interval [a, b] per image.

n_segments int, or tuple of int

Rough target number of superpixels to generate (the algorithm may deviate from this number). Lower values lead to coarser superpixels; higher values are computationally more intensive and hence slower. If a single int, then that value will always be used as the number of segments. If a tuple (a, b), then a value from the discrete interval [a..b] will be sampled per image.

max_size int or None

Maximum image size at which the augmentation is performed. If the width or height of an image exceeds this value, it will be downscaled before the augmentation so that the longest side matches max_size. This is done to speed up the process. The final output image has the same size as the input image. Note that in case p_replace is below 1.0, the down-/upscaling will affect the not-replaced pixels too. Use None to apply no down-/upscaling.

interpolation OpenCV flag

Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Source code in albumentations/augmentations/transforms.py
Python
class Superpixels(ImageOnlyTransform):
    """Transform images partially/completely to their superpixel representation.
    This implementation uses skimage's version of the SLIC algorithm.

    Args:
        p_replace (float or tuple of float): Defines for any segment the probability that the pixels within that
            segment are replaced by their average color (otherwise, the pixels are not changed).

    Examples:
                * A probability of ``0.0`` would mean that the pixels in no
                  segment are replaced by their average color (image is not
                  changed at all).
                * A probability of ``0.5`` would mean that around half of all
                  segments are replaced by their average color.
                * A probability of ``1.0`` would mean that all segments are
                  replaced by their average color (resulting in a Voronoi
                  image).
            Behaviour based on chosen data types for this parameter:
                * If a ``float``, then that ``float`` will always be used.
                * If ``tuple`` ``(a, b)``, then a random probability will be
                  sampled from the interval ``[a, b]`` per image.
        n_segments (int, or tuple of int): Rough target number of how many superpixels to generate (the algorithm
            may deviate from this number). Lower value will lead to coarser superpixels.
            Higher values are computationally more intensive and will hence lead to a slowdown
            * If a single ``int``, then that value will always be used as the
              number of segments.
            * If a ``tuple`` ``(a, b)``, then a value from the discrete
              interval ``[a..b]`` will be sampled per image.
        max_size (int or None): Maximum image size at which the augmentation is performed.
            If the width or height of an image exceeds this value, it will be
            downscaled before the augmentation so that the longest side matches `max_size`.
            This is done to speed up the process. The final output image has the same size as the input image.
            Note that in case `p_replace` is below ``1.0``,
            the down-/upscaling will affect the not-replaced pixels too.
            Use ``None`` to apply no down-/upscaling.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    """

    def __init__(
        self,
        p_replace: ScaleFloatType = 0.1,
        n_segments: ScaleIntType = 100,
        max_size: Optional[int] = 128,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        self.p_replace = to_tuple(p_replace, p_replace)
        self.n_segments = to_tuple(n_segments, n_segments)
        self.max_size = max_size
        self.interpolation = interpolation

        if min(self.n_segments) < 1:
            raise ValueError(f"n_segments must be >= 1. Got: {n_segments}")

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return ("p_replace", "n_segments", "max_size", "interpolation")

    def get_params(self) -> Dict[str, Any]:
        n_segments = random.randint(*self.n_segments)
        p = random.uniform(*self.p_replace)
        return {"replace_samples": random_utils.random(n_segments) < p, "n_segments": n_segments}

    def apply(
        self, img: np.ndarray, replace_samples: Sequence[bool] = (False,), n_segments: int = 1, **kwargs: Any
    ) -> np.ndarray:
        return F.superpixels(img, n_segments, replace_samples, self.max_size, cast(int, self.interpolation))
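A hedged usage sketch (dummy data; parameter values are examples only):

Python
import albumentations as A
import cv2
import numpy as np

image = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)

# Replace roughly a quarter to three quarters of the segments with their
# average color; cap the working resolution at 128 px to keep SLIC fast.
transform = A.Superpixels(
    p_replace=(0.25, 0.75),
    n_segments=(50, 150),
    max_size=128,
    interpolation=cv2.INTER_LINEAR,
    p=1.0,
)
result = transform(image=image)["image"]
assert result.shape == image.shape  # the output keeps the input size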

class TemplateTransform (templates, img_weight=0.5, template_weight=0.5, template_transform=None, name=None, always_apply=False, p=0.5) [view source on GitHub]

Apply blending of the input image with the specified templates.

Parameters:

Name Type Description
templates numpy array or list of numpy arrays

Images as template for transform.

img_weight Union[float, Tuple[float, float]]

If single float will be used as weight for input image. If tuple of float img_weight will be in range [img_weight[0], img_weight[1]). Default: 0.5.

template_weight Union[float, Tuple[float, float]]

If single float will be used as weight for template. If tuple of float template_weight will be in range [template_weight[0], template_weight[1]). Default: 0.5.

template_transform Optional[Callable[..., Any]]

transformation object to apply to the template; must produce a template of the same size as the input image.

name Optional[str]

(Optional) Name of transform, used only for deserialization.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class TemplateTransform(ImageOnlyTransform):
    """Apply blending of input image with specified templates
    Args:
        templates (numpy array or list of numpy arrays): Images as template for transform.
        img_weight: If single float will be used as weight for input image.
            If tuple of float img_weight will be in range `[img_weight[0], img_weight[1])`. Default: 0.5.
        template_weight: If single float will be used as weight for template.
            If tuple of float template_weight will be in range `[template_weight[0], template_weight[1])`.
            Default: 0.5.
        template_transform: transformation object which could be applied to template,
            must produce template the same size as input image.
        name: (Optional) Name of transform, used only for deserialization.
        p: probability of applying the transform. Default: 0.5.
    Targets:
        image
    Image types:
        uint8, float32
    """

    def __init__(
        self,
        templates: Union[np.ndarray, List[np.ndarray]],
        img_weight: ScaleFloatType = 0.5,
        template_weight: ScaleFloatType = 0.5,
        template_transform: Optional[Callable[..., Any]] = None,
        name: Optional[str] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        self.templates = templates if isinstance(templates, (list, tuple)) else [templates]
        self.img_weight = to_tuple(img_weight, img_weight)
        self.template_weight = to_tuple(template_weight, template_weight)
        self.template_transform = template_transform
        self.name = name

    def apply(
        self,
        img: np.ndarray,
        template: Optional[np.ndarray] = None,
        img_weight: float = 0.5,
        template_weight: float = 0.5,
        **params: Any,
    ) -> np.ndarray:
        return F.add_weighted(img, img_weight, template, template_weight)

    def get_params(self) -> Dict[str, float]:
        return {
            "img_weight": random.uniform(self.img_weight[0], self.img_weight[1]),
            "template_weight": random.uniform(self.template_weight[0], self.template_weight[1]),
        }

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        img = params["image"]
        template = random.choice(self.templates)

        if self.template_transform is not None:
            template = self.template_transform(image=template)["image"]

        if get_num_channels(template) not in [1, get_num_channels(img)]:
            msg = (
                "Template must be a single channel or "
                "has the same number of channels as input "
                f"image ({get_num_channels(img)}), got {get_num_channels(template)}"
            )
            raise ValueError(msg)

        if template.dtype != img.dtype:
            msg = "Image and template must be the same image type"
            raise ValueError(msg)

        if img.shape[:2] != template.shape[:2]:
            raise ValueError(f"Image and template must be the same size, got {img.shape[:2]} and {template.shape[:2]}")

        if get_num_channels(template) == 1 and get_num_channels(img) > 1:
            template = np.stack((template,) * get_num_channels(img), axis=-1)

        # in order to support grayscale image with dummy dim
        template = template.reshape(img.shape)

        return {"template": template}

    @classmethod
    def is_serializable(cls) -> bool:
        return False

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def to_dict_private(self) -> Dict[str, Any]:
        if self.name is None:
            msg = (
                "To make a TemplateTransform serializable you should provide the `name` argument, "
                "e.g. `TemplateTransform(name='my_transform', ...)`."
            )
            raise ValueError(msg)
        return {"__class_fullname__": self.get_class_fullname(), "__name__": self.name}

class ToFloat (max_value=None, always_apply=False, p=1.0) [view source on GitHub]

Divide pixel values by max_value to get a float32 output array where all values lie in the range [0, 1.0]. If max_value is None the transform will try to infer the maximum value by inspecting the data type of the input image.

See Also: albumentations.augmentations.transforms.FromFloat

Parameters:

Name Type Description
max_value Optional[float]

maximum possible input value. Default: None.

p float

probability of applying the transform. Default: 1.0.

Targets

image

Image types: any type

Source code in albumentations/augmentations/transforms.py
Python
class ToFloat(ImageOnlyTransform):
    """Divide pixel values by `max_value` to get a float32 output array where all values lie in the range [0, 1.0].
    If `max_value` is None the transform will try to infer the maximum value by inspecting the data type of the input
    image.

    See Also:
        :class:`~albumentations.augmentations.transforms.FromFloat`

    Args:
        max_value: maximum possible input value. Default: None.
        p: probability of applying the transform. Default: 1.0.

    Targets:
        image

    Image types:
        any type

    """

    def __init__(self, max_value: Optional[float] = None, always_apply: bool = False, p: float = 1.0):
        super().__init__(always_apply, p)
        self.max_value = max_value

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return F.to_float(img, self.max_value)

    def get_transform_init_args_names(self) -> Tuple[str]:
        return ("max_value",)

class ToGray [view source on GitHub]

Convert the input RGB image to grayscale. If the mean pixel value for the resulting image is greater than 127, invert the resulting grayscale image.

Parameters:

Name Type Description
p

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class ToGray(ImageOnlyTransform):
    """Convert the input RGB image to grayscale. If the mean pixel value for the resulting image is greater
    than 127, invert the resulting grayscale image.

    Args:
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        if is_grayscale_image(img):
            warnings.warn("The image is already gray.")
            return img
        if not is_rgb_image(img):
            msg = "ToGray transformation expects 3-channel images."
            raise TypeError(msg)

        return F.to_gray(img)

    def get_transform_init_args_names(self) -> Tuple[()]:
        return ()

class ToRGB (always_apply=True, p=1.0) [view source on GitHub]

Convert the input grayscale image to RGB.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 1.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class ToRGB(ImageOnlyTransform):
    """Convert the input grayscale image to RGB.

    Args:
        p: probability of applying the transform. Default: 1.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(self, always_apply: bool = True, p: float = 1.0):
        super().__init__(always_apply=always_apply, p=p)

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        if is_rgb_image(img):
            warnings.warn("The image is already an RGB.")
            return img
        if not is_grayscale_image(img):
            msg = "ToRGB transformation expects 2-dim images or 3-dim with the last dimension equal to 1."
            raise TypeError(msg)

        return F.gray_to_rgb(img)

    def get_transform_init_args_names(self) -> Tuple[()]:
        return ()

class ToSepia (always_apply=False, p=0.5) [view source on GitHub]

Applies sepia filter to the input RGB image

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py
Python
class ToSepia(ImageOnlyTransform):
    """Applies sepia filter to the input RGB image

    Args:
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(self, always_apply: bool = False, p: float = 0.5):
        super().__init__(always_apply, p)
        self.sepia_transformation_matrix = np.array(
            [[0.393, 0.769, 0.189], [0.349, 0.686, 0.168], [0.272, 0.534, 0.131]]
        )

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        if not is_rgb_image(img):
            msg = "ToSepia transformation expects 3-channel images."
            raise TypeError(msg)
        return F.linear_transformation_rgb(img, self.sepia_transformation_matrix)

    def get_transform_init_args_names(self) -> Tuple[()]:
        return ()
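The three color-space transforms above (ToGray, ToRGB, ToSepia) share the same call pattern; a combined sketch with dummy data:

Python
import albumentations as A
import numpy as np

rgb = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
gray = np.random.randint(0, 256, (64, 64), dtype=np.uint8)

gray_out = A.ToGray(p=1.0)(image=rgb)["image"]    # RGB -> grayscale
rgb_out = A.ToRGB(p=1.0)(image=gray)["image"]     # grayscale -> 3 channels
sepia_out = A.ToSepia(p=1.0)(image=rgb)["image"]  # RGB -> sepia-toned RGB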

class UnsharpMask (blur_limit=(3, 7), sigma_limit=0.0, alpha=(0.2, 0.5), threshold=10, always_apply=False, p=0.5) [view source on GitHub]

Sharpen the input image using Unsharp Masking and overlay the result with the original image.

Parameters:

Name Type Description
blur_limit Union[int, Tuple[int, int]]

maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1. If set single value blur_limit will be in range (0, blur_limit). Default: (3, 7).

sigma_limit Union[float, Tuple[float, float]]

Gaussian kernel standard deviation. Must be in range [0, inf). If set single value sigma_limit will be in range (0, sigma_limit). If set to 0 sigma will be computed as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. Default: 0.

alpha Union[float, Tuple[float, float]]

range to choose the visibility of the sharpened image. At 0, only the original image is visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5).

threshold int

Value to limit sharpening only to areas with a high pixel difference between the original image and its smoothed version. A higher threshold means less sharpening on flat areas. Must be in range [0, 255]. Default: 10.

p float

probability of applying the transform. Default: 0.5.

Reference

arxiv.org/pdf/2107.10833.pdf

Targets

image

Source code in albumentations/augmentations/transforms.py
Python
class UnsharpMask(ImageOnlyTransform):
    """Sharpen the input image using Unsharp Masking processing and overlays the result with the original image.

    Args:
        blur_limit: maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If set single value `blur_limit` will be in range (0, blur_limit).
            Default: (3, 7).
        sigma_limit: Gaussian kernel standard deviation. Must be in range [0, inf).
            If set single value `sigma_limit` will be in range (0, sigma_limit).
            If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
        alpha: range to choose the visibility of the sharpened image.
            At 0, only the original image is visible, at 1.0 only its sharpened version is visible.
            Default: (0.2, 0.5).
        threshold: Value to limit sharpening only to areas with a high pixel difference between the original image
            and its smoothed version. A higher threshold means less sharpening on flat areas.
            Must be in range [0, 255]. Default: 10.
        p: probability of applying the transform. Default: 0.5.

    Reference:
        arxiv.org/pdf/2107.10833.pdf

    Targets:
        image

    """

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_limit: ScaleFloatType = 0.0,
        alpha: ScaleFloatType = (0.2, 0.5),
        threshold: int = 10,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.blur_limit = cast(Tuple[int, int], to_tuple(blur_limit, 3))
        self.sigma_limit = self.__check_values(to_tuple(sigma_limit, 0.0), name="sigma_limit")
        self.alpha = self.__check_values(to_tuple(alpha, 0.0), name="alpha", bounds=(0.0, 1.0))
        self.threshold = threshold

        if self.blur_limit[0] == 0 and self.sigma_limit[0] == 0:
            self.blur_limit = 3, max(3, self.blur_limit[1])
            msg = "blur_limit and sigma_limit minimum value can not be both equal to 0."
            raise ValueError(msg)

        if (self.blur_limit[0] != 0 and self.blur_limit[0] % 2 != 1) or (
            self.blur_limit[1] != 0 and self.blur_limit[1] % 2 != 1
        ):
            msg = "UnsharpMask supports only odd blur limits."
            raise ValueError(msg)

    @staticmethod
    def __check_values(
        value: Union[Tuple[int, int], Tuple[float, float]], name: str, bounds: Tuple[float, float] = (0, float("inf"))
    ) -> Tuple[float, float]:
        if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
            raise ValueError(f"{name} values should be between {bounds}")
        return value

    def get_params(self) -> Dict[str, Any]:
        return {
            "ksize": random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2),
            "sigma": random.uniform(*self.sigma_limit),
            "alpha": random.uniform(*self.alpha),
        }

    def apply(self, img: np.ndarray, ksize: int = 3, sigma: int = 0, alpha: float = 0.2, **params: Any) -> np.ndarray:
        return F.unsharp_mask(img, ksize, sigma=sigma, alpha=alpha, threshold=self.threshold)

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return "blur_limit", "sigma_limit", "alpha", "threshold"

utils

def ensure_contiguous (func) [view source on GitHub]

Ensure that input img is contiguous.

Source code in albumentations/augmentations/utils.py
Python
def ensure_contiguous(
    func: Callable[Concatenate[np.ndarray, P], np.ndarray],
) -> Callable[Concatenate[np.ndarray, P], np.ndarray]:
    """Ensure that input img is contiguous."""

    @wraps(func)
    def wrapped_function(img: np.ndarray, *args: P.args, **kwargs: P.kwargs) -> np.ndarray:
        img = np.require(img, requirements=["C_CONTIGUOUS"])
        return func(img, *args, **kwargs)

    return wrapped_function

def get_opencv_dtype_from_numpy (value) [view source on GitHub]

Return the corresponding OpenCV dtype for a numpy dtype.

Parameters:

Name Type Description
value Union[np.ndarray, int, np.dtype, object]

Input numpy array or dtype.

Returns:

Type Description
int

Corresponding OpenCV dtype.

Source code in albumentations/augmentations/utils.py
Python
def get_opencv_dtype_from_numpy(value: Union[np.ndarray, int, np.dtype, object]) -> int:
    """Return a corresponding OpenCV dtype for a numpy's dtype
    :param value: Input dtype of numpy array
    :return: Corresponding dtype for OpenCV
    """
    if isinstance(value, np.ndarray):
        value = value.dtype
    return NPDTYPE_TO_OPENCV_DTYPE[value]

def preserve_channel_dim (func) [view source on GitHub]

Preserve dummy channel dim.

Source code in albumentations/augmentations/utils.py
Python
def preserve_channel_dim(
    func: Callable[Concatenate[np.ndarray, P], np.ndarray],
) -> Callable[Concatenate[np.ndarray, P], np.ndarray]:
    """Preserve dummy channel dim."""

    @wraps(func)
    def wrapped_function(img: np.ndarray, *args: P.args, **kwargs: P.kwargs) -> np.ndarray:
        shape = img.shape
        result = func(img, *args, **kwargs)
        if len(shape) == THREE and shape[-1] == 1 and len(result.shape) == TWO:
            result = np.expand_dims(result, axis=-1)
        return result

    return wrapped_function

def preserve_shape (func) [view source on GitHub]

Preserve shape of the image

Source code in albumentations/augmentations/utils.py
Python
def preserve_shape(
    func: Callable[Concatenate[np.ndarray, P], np.ndarray],
) -> Callable[Concatenate[np.ndarray, P], np.ndarray]:
    """Preserve shape of the image"""

    @wraps(func)
    def wrapped_function(img: np.ndarray, *args: P.args, **kwargs: P.kwargs) -> np.ndarray:
        shape = img.shape
        result = func(img, *args, **kwargs)
        return result.reshape(shape)

    return wrapped_function
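These decorators are internal helpers, but the pattern is easy to reuse; a sketch of wrapping a cv2-style function with preserve_channel_dim (the wrapped function here is hypothetical):

Python
import numpy as np
from albumentations.augmentations.utils import preserve_channel_dim

@preserve_channel_dim
def drop_to_2d(img: np.ndarray) -> np.ndarray:
    # Many cv2 ops return (H, W) for single-channel input; the decorator
    # restores the trailing dummy channel if the input had one.
    return img[..., 0] if img.ndim == 3 else img

out = drop_to_2d(np.zeros((4, 4, 1), dtype=np.uint8))
assert out.shape == (4, 4, 1)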

core special

bbox_utils

class BboxParams (format, label_fields=None, min_area=0.0, min_visibility=0.0, min_width=0.0, min_height=0.0, check_each_transform=True) [view source on GitHub]

Parameters of bounding boxes

Parameters:

Name Type Description
format str

format of bounding boxes. Should be 'coco', 'pascal_voc', 'albumentations' or 'yolo'.

The coco format [x_min, y_min, width, height], e.g. [97, 12, 150, 200]. The pascal_voc format [x_min, y_min, x_max, y_max], e.g. [97, 12, 247, 212]. The albumentations format is like pascal_voc, but normalized, in other words: [x_min, y_min, x_max, y_max], e.g. [0.2, 0.3, 0.4, 0.5]. The yolo format [x, y, width, height], e.g. [0.1, 0.2, 0.3, 0.4]; x, y - normalized bbox center; width, height - normalized bbox width and height.

label_fields list

list of fields that are joined with boxes, e.g. labels. Should be the same type as boxes.

min_area float

minimum area of a bounding box. All bounding boxes whose visible area in pixels is less than this value will be removed. Default: 0.0.

min_visibility float

minimum fraction of its area for a bounding box to remain in the list. Default: 0.0.

min_width float

Minimum width of a bounding box. All bounding boxes whose width is less than this value will be removed. Default: 0.0.

min_height float

Minimum height of a bounding box. All bounding boxes whose height is less than this value will be removed. Default: 0.0.

check_each_transform bool

if True, then bboxes will be checked after each dual transform. Default: True

Source code in albumentations/core/bbox_utils.py
Python
class BboxParams(Params):
    """Parameters of bounding boxes

    Args:
        format (str): format of bounding boxes. Should be 'coco', 'pascal_voc', 'albumentations' or 'yolo'.

            The `coco` format
                `[x_min, y_min, width, height]`, e.g. [97, 12, 150, 200].
            The `pascal_voc` format
                `[x_min, y_min, x_max, y_max]`, e.g. [97, 12, 247, 212].
            The `albumentations` format
                is like `pascal_voc`, but normalized,
                in other words: `[x_min, y_min, x_max, y_max]`, e.g. [0.2, 0.3, 0.4, 0.5].
            The `yolo` format
                `[x, y, width, height]`, e.g. [0.1, 0.2, 0.3, 0.4];
                `x`, `y` - normalized bbox center; `width`, `height` - normalized bbox width and height.
        label_fields (list): list of fields that are joined with boxes, e.g. labels.
            Should be the same type as boxes.
        min_area (float): minimum area of a bounding box. All bounding boxes whose
            visible area in pixels is less than this value will be removed. Default: 0.0.
        min_visibility (float): minimum fraction of its area for a bounding box
            to remain in the list. Default: 0.0.
        min_width (float): Minimum width of a bounding box. All bounding boxes whose width is
            less than this value will be removed. Default: 0.0.
        min_height (float): Minimum height of a bounding box. All bounding boxes whose height is
            less than this value will be removed. Default: 0.0.
        check_each_transform (bool): if `True`, then bboxes will be checked after each dual transform.
            Default: `True`

    """

    def __init__(
        self,
        format: str,
        label_fields: Optional[Sequence[str]] = None,
        min_area: float = 0.0,
        min_visibility: float = 0.0,
        min_width: float = 0.0,
        min_height: float = 0.0,
        check_each_transform: bool = True,
    ):
        super().__init__(format, label_fields)
        self.min_area = min_area
        self.min_visibility = min_visibility
        self.min_width = min_width
        self.min_height = min_height
        self.check_each_transform = check_each_transform

    def to_dict_private(self) -> Dict[str, Any]:
        data = super().to_dict_private()
        data.update(
            {
                "min_area": self.min_area,
                "min_visibility": self.min_visibility,
                "min_width": self.min_width,
                "min_height": self.min_height,
                "check_each_transform": self.check_each_transform,
            }
        )
        return data

    @classmethod
    def is_serializable(cls) -> bool:
        return True

    @classmethod
    def get_class_fullname(cls) -> str:
        return "BboxParams"

def calculate_bbox_area (bbox, rows, cols) [view source on GitHub]

Calculate the area of a bounding box in (fractional) pixels.

Parameters:

Name Type Description
bbox Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

A bounding box (x_min, y_min, x_max, y_max).

rows int

Image height.

cols int

Image width.

Returns:

Type Description
float

Area in (fractional) pixels of the (denormalized) bounding box.

Source code in albumentations/core/bbox_utils.py
Python
def calculate_bbox_area(bbox: BoxType, rows: int, cols: int) -> float:
    """Calculate the area of a bounding box in (fractional) pixels.

    Args:
        bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
        rows: Image height.
        cols: Image width.

    Returns:
        Area in (fractional) pixels of the (denormalized) bounding box.

    """
    bbox = denormalize_bbox(bbox, rows, cols)
    x_min, y_min, x_max, y_max = bbox[:4]
    return (x_max - x_min) * (y_max - y_min)

def check_bbox (bbox) [view source on GitHub]

Check if bbox boundaries are in the range [0, 1] and minimums are less than maximums

Source code in albumentations/core/bbox_utils.py
Python
def check_bbox(bbox: BoxType) -> None:
    """Check if bbox boundaries are in range 0, 1 and minimums are lesser then maximums"""
    for name, value in zip(["x_min", "y_min", "x_max", "y_max"], bbox[:4]):
        if not 0 <= value <= 1 and not np.isclose(value, 0) and not np.isclose(value, 1):
            raise ValueError(f"Expected {name} for bbox {bbox} to be in the range [0.0, 1.0], got {value}.")
    x_min, y_min, x_max, y_max = bbox[:4]
    if x_max <= x_min:
        raise ValueError(f"x_max is less than or equal to x_min for bbox {bbox}.")
    if y_max <= y_min:
        raise ValueError(f"y_max is less than or equal to y_min for bbox {bbox}.")

def check_bboxes (bboxes) [view source on GitHub]

Check if bboxes boundaries are in the range [0, 1] and minimums are less than maximums

Source code in albumentations/core/bbox_utils.py
Python
def check_bboxes(bboxes: Sequence[BoxType]) -> None:
    """Check if bboxes boundaries are in range 0, 1 and minimums are lesser then maximums"""
    for bbox in bboxes:
        check_bbox(bbox)

def convert_bbox_from_albumentations (bbox, target_format, rows, cols, check_validity=False) [view source on GitHub]

Convert a bounding box from the format used by albumentations to the format specified in target_format.

Parameters:

Name Type Description
bbox Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

An albumentations bounding box (x_min, y_min, x_max, y_max).

target_format str

required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.

rows int

Image height.

cols int

Image width.

check_validity bool

Check if all boxes are valid boxes.

Returns:

Type Description
tuple

A bounding box.

Note

The coco format of a bounding box looks like [x_min, y_min, width, height], e.g. [97, 12, 150, 200]. The pascal_voc format of a bounding box looks like [x_min, y_min, x_max, y_max], e.g. [97, 12, 247, 212]. The yolo format of a bounding box looks like [x, y, width, height], e.g. [0.3, 0.1, 0.05, 0.07].

Exceptions:

Type Description
ValueError

if target_format is not equal to coco, pascal_voc or yolo.

Source code in albumentations/core/bbox_utils.py
Python
def convert_bbox_from_albumentations(
    bbox: BoxType, target_format: str, rows: int, cols: int, check_validity: bool = False
) -> BoxType:
    """Convert a bounding box from the format used by albumentations to a format, specified in `target_format`.

    Args:
        bbox: An albumentations bounding box `(x_min, y_min, x_max, y_max)`.
        target_format: required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.
        rows: Image height.
        cols: Image width.
        check_validity: Check if all boxes are valid boxes.

    Returns:
        tuple: A bounding box.

    Note:
        The `coco` format of a bounding box looks like `[x_min, y_min, width, height]`, e.g. [97, 12, 150, 200].
        The `pascal_voc` format of a bounding box looks like `[x_min, y_min, x_max, y_max]`, e.g. [97, 12, 247, 212].
        The `yolo` format of a bounding box looks like `[x, y, width, height]`, e.g. [0.3, 0.1, 0.05, 0.07].

    Raises:
        ValueError: if `target_format` is not equal to `coco`, `pascal_voc` or `yolo`.

    """
    if target_format not in {"coco", "pascal_voc", "yolo"}:
        raise ValueError(
            f"Unknown target_format {target_format}. Supported formats are: 'coco', 'pascal_voc' and 'yolo'"
        )
    if check_validity:
        check_bbox(bbox)

    if target_format != "yolo":
        bbox = denormalize_bbox(bbox, rows, cols)
    if target_format == "coco":
        (x_min, y_min, x_max, y_max), tail = bbox[:4], tuple(bbox[4:])
        width = x_max - x_min
        height = y_max - y_min
        bbox = cast(BoxType, (x_min, y_min, width, height, *tail))
    elif target_format == "yolo":
        (x_min, y_min, x_max, y_max), tail = bbox[:4], bbox[4:]
        x = (x_min + x_max) / 2.0
        y = (y_min + y_max) / 2.0
        w = x_max - x_min
        h = y_max - y_min
        bbox = cast(BoxType, (x, y, w, h, *tail))
    return bbox

def convert_bbox_to_albumentations (bbox, source_format, rows, cols, check_validity=False) [view source on GitHub]

Convert a bounding box from a format specified in source_format to the format used by albumentations: normalized coordinates of top-left and bottom-right corners of the bounding box in a form of (x_min, y_min, x_max, y_max) e.g. (0.15, 0.27, 0.67, 0.5).

Parameters:

Name Type Description
bbox Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

A bounding box tuple.

source_format str

format of the bounding box. Should be 'coco', 'pascal_voc', or 'yolo'.

check_validity bool

Check if all boxes are valid boxes.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
tuple

A bounding box (x_min, y_min, x_max, y_max).

Note

The coco format of a bounding box looks like (x_min, y_min, width, height), e.g. (97, 12, 150, 200). The pascal_voc format of a bounding box looks like (x_min, y_min, x_max, y_max), e.g. (97, 12, 247, 212). The yolo format of a bounding box looks like (x, y, width, height), e.g. (0.3, 0.1, 0.05, 0.07); where x, y coordinates of the center of the box, all values normalized to 1 by image height and width.

Exceptions:

Type Description
ValueError

if target_format is not equal to coco, pascal_voc, or yolo.

ValueError

if, when using the YOLO format, any coordinate is not in the range (0, 1].

Source code in albumentations/core/bbox_utils.py
Python
def convert_bbox_to_albumentations(
    bbox: BoxType, source_format: str, rows: int, cols: int, check_validity: bool = False
) -> BoxType:
    """Convert a bounding box from a format specified in `source_format` to the format used by albumentations:
    normalized coordinates of top-left and bottom-right corners of the bounding box in a form of
    `(x_min, y_min, x_max, y_max)` e.g. `(0.15, 0.27, 0.67, 0.5)`.

    Args:
        bbox: A bounding box tuple.
        source_format: format of the bounding box. Should be 'coco', 'pascal_voc', or 'yolo'.
        check_validity: Check if all boxes are valid boxes.
        rows: Image height.
        cols: Image width.

    Returns:
        tuple: A bounding box `(x_min, y_min, x_max, y_max)`.

    Note:
        The `coco` format of a bounding box looks like `(x_min, y_min, width, height)`, e.g. (97, 12, 150, 200).
        The `pascal_voc` format of a bounding box looks like `(x_min, y_min, x_max, y_max)`, e.g. (97, 12, 247, 212).
        The `yolo` format of a bounding box looks like `(x, y, width, height)`, e.g. (0.3, 0.1, 0.05, 0.07);
        where `x`, `y` coordinates of the center of the box, all values normalized to 1 by image height and width.

    Raises:
        ValueError: if `target_format` is not equal to `coco`, `pascal_voc`, or `yolo`.
        ValueError: if, when using the YOLO format, any coordinate is not in the range (0, 1].

    """
    if source_format not in {"coco", "pascal_voc", "yolo"}:
        raise ValueError(
            f"Unknown source_format {source_format}. Supported formats are: 'coco', 'pascal_voc' and 'yolo'"
        )

    if source_format == "coco":
        (x_min, y_min, width, height), tail = bbox[:4], bbox[4:]
        x_max = x_min + width
        y_max = y_min + height
    elif source_format == "yolo":
        # https://github.com/pjreddie/darknet/blob/f6d861736038da22c9eb0739dca84003c5a5e275/scripts/voc_label.py#L12
        _bbox = np.array(bbox[:4])
        if check_validity and np.any((_bbox <= 0) | (_bbox > 1)):
            msg = "In YOLO format all coordinates must be float and in range (0, 1]"
            raise ValueError(msg)

        (x, y, w, h), tail = bbox[:4], bbox[4:]

        w_half, h_half = w / 2, h / 2
        x_min = x - w_half
        y_min = y - h_half
        x_max = x_min + w
        y_max = y_min + h
    else:
        (x_min, y_min, x_max, y_max), tail = bbox[:4], bbox[4:]

    bbox = (x_min, y_min, x_max, y_max, *tuple(tail))

    if source_format != "yolo":
        bbox = normalize_bbox(bbox, rows, cols)
    if check_validity:
        check_bbox(bbox)
    return bbox
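A round-trip sketch through both conversion helpers (numbers chosen so the normalized values come out exact):

Python
from albumentations.core.bbox_utils import (
    convert_bbox_from_albumentations,
    convert_bbox_to_albumentations,
)

rows, cols = 100, 200  # image height, width
coco_box = (20, 30, 60, 40)  # x_min, y_min, width, height in pixels

alb_box = convert_bbox_to_albumentations(coco_box, "coco", rows, cols)
# normalized corners: (0.1, 0.3, 0.4, 0.7)

yolo_box = convert_bbox_from_albumentations(alb_box, "yolo", rows, cols)
# normalized center/size: (0.25, 0.5, 0.3, 0.4)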

def convert_bboxes_from_albumentations (bboxes, target_format, rows, cols, check_validity=False) [view source on GitHub]

Convert a list of bounding boxes from the format used by albumentations to the format specified in target_format.

Parameters:

Name Type Description
bboxes Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

List of albumentations bounding box (x_min, y_min, x_max, y_max).

target_format str

required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.

rows int

Image height.

cols int

Image width.

check_validity bool

Check if all boxes are valid boxes.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

List of bounding boxes.

Source code in albumentations/core/bbox_utils.py
Python
def convert_bboxes_from_albumentations(
    bboxes: Sequence[BoxType], target_format: str, rows: int, cols: int, check_validity: bool = False
) -> List[BoxType]:
    """Convert a list of bounding boxes from the format used by albumentations to a format, specified
    in `target_format`.

    Args:
        bboxes: List of albumentations bounding box `(x_min, y_min, x_max, y_max)`.
        target_format: required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.
        rows: Image height.
        cols: Image width.
        check_validity: Check if all boxes are valid boxes.

    Returns:
        List of bounding boxes.

    """
    return [convert_bbox_from_albumentations(bbox, target_format, rows, cols, check_validity) for bbox in bboxes]

def convert_bboxes_to_albumentations (bboxes, source_format, rows, cols, check_validity=False) [view source on GitHub]

Convert a list of bounding boxes from a format specified in source_format to the format used by albumentations

Source code in albumentations/core/bbox_utils.py
Python
def convert_bboxes_to_albumentations(
    bboxes: Sequence[BoxType], source_format: str, rows: int, cols: int, check_validity: bool = False
) -> List[BoxType]:
    """Convert a list bounding boxes from a format specified in `source_format` to the format used by albumentations"""
    return [convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity) for bbox in bboxes]

def denormalize_bbox (bbox, rows, cols) [view source on GitHub]

Denormalize coordinates of a bounding box. Multiply x-coordinates by image width and y-coordinates by image height. This is the inverse operation of normalize_bbox.

Parameters:

Name Type Description
bbox Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

Normalized bounding box (x_min, y_min, x_max, y_max).

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

Denormalized bounding box (x_min, y_min, x_max, y_max).

Exceptions:

Type Description
ValueError

If rows or cols is less than or equal to zero

Source code in albumentations/core/bbox_utils.py
Python
def denormalize_bbox(bbox: BoxType, rows: int, cols: int) -> BoxType:
    """Denormalize coordinates of a bounding box. Multiply x-coordinates by image width and y-coordinates
    by image height. This is an inverse operation for :func:`~albumentations.augmentations.bbox.normalize_bbox`.

    Args:
        bbox: Normalized bounding box `(x_min, y_min, x_max, y_max)`.
        rows: Image height.
        cols: Image width.

    Returns:
        Denormalized bounding box `(x_min, y_min, x_max, y_max)`.

    Raises:
        ValueError: If rows or cols is less than or equal to zero

    """
    tail: Tuple[Any, ...]
    (x_min, y_min, x_max, y_max), tail = bbox[:4], tuple(bbox[4:])

    if rows <= 0:
        msg = "Argument rows must be positive integer"
        raise ValueError(msg)
    if cols <= 0:
        msg = "Argument cols must be positive integer"
        raise ValueError(msg)

    x_min, x_max = x_min * cols, x_max * cols
    y_min, y_max = y_min * rows, y_max * rows

    return cast(BoxType, (x_min, y_min, x_max, y_max, *tail))

def denormalize_bboxes (bboxes, rows, cols) [view source on GitHub]

Denormalize a list of bounding boxes.

Parameters:

Name Type Description
bboxes Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Normalized bounding boxes [(x_min, y_min, x_max, y_max)].

rows int

Image height.

cols int

Image width.

Returns:

Type Description
List

Denormalized bounding boxes [(x_min, y_min, x_max, y_max)].

Source code in albumentations/core/bbox_utils.py
Python
def denormalize_bboxes(bboxes: Sequence[BoxType], rows: int, cols: int) -> List[BoxType]:
    """Denormalize a list of bounding boxes.

    Args:
        bboxes: Normalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.
        rows: Image height.
        cols: Image width.

    Returns:
        List: Denormalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.

    """
    return [denormalize_bbox(bbox, rows, cols) for bbox in bboxes]

def filter_bboxes (bboxes, rows, cols, min_area=0.0, min_visibility=0.0, min_width=0.0, min_height=0.0) [view source on GitHub]

Remove bounding boxes that either lie outside of the visible area by more than min_visibility or whose area in pixels is under the threshold set by min_area. Also clips boxes to the final image size.

Parameters:

Name Type Description
bboxes Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

List of albumentations bounding box (x_min, y_min, x_max, y_max).

rows int

Image height.

cols int

Image width.

min_area float

Minimum area of a bounding box. All bounding boxes whose visible area in pixels is less than this value will be removed. Default: 0.0.

min_visibility float

Minimum fraction of its area for a bounding box to remain in the list. Default: 0.0.

min_width float

Minimum width of a bounding box. All bounding boxes whose width is less than this value will be removed. Default: 0.0.

min_height float

Minimum height of a bounding box. All bounding boxes whose height is less than this value will be removed. Default: 0.0.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

List of bounding boxes.

Source code in albumentations/core/bbox_utils.py
Python
def filter_bboxes(
    bboxes: Sequence[BoxType],
    rows: int,
    cols: int,
    min_area: float = 0.0,
    min_visibility: float = 0.0,
    min_width: float = 0.0,
    min_height: float = 0.0,
) -> List[BoxType]:
    """Remove bounding boxes that either lie outside of the visible area by more then min_visibility
    or whose area in pixels is under the threshold set by `min_area`. Also it crops boxes to final image size.

    Args:
        bboxes: List of albumentations bounding box `(x_min, y_min, x_max, y_max)`.
        rows: Image height.
        cols: Image width.
        min_area: Minimum area of a bounding box. All bounding boxes whose visible area in pixels
            is less than this value will be removed. Default: 0.0.
        min_visibility: Minimum fraction of its area for a bounding box to remain in the list. Default: 0.0.
        min_width: Minimum width of a bounding box. All bounding boxes whose width is
            less than this value will be removed. Default: 0.0.
        min_height: Minimum height of a bounding box. All bounding boxes whose height is
            less than this value will be removed. Default: 0.0.

    Returns:
        List of bounding boxes.

    """
    resulting_boxes: List[BoxType] = []
    for i in range(len(bboxes)):
        bbox = bboxes[i]
        # Calculate areas of bounding box before and after clipping.
        transformed_box_area = calculate_bbox_area(bbox, rows, cols)
        bbox, tail = cast(BoxType, tuple(np.clip(bbox[:4], 0, 1.0))), tuple(bbox[4:])
        clipped_box_area = calculate_bbox_area(bbox, rows, cols)

        # Calculate width and height of the clipped bounding box.
        x_min, y_min, x_max, y_max = denormalize_bbox(bbox, rows, cols)[:4]
        clipped_width, clipped_height = x_max - x_min, y_max - y_min

        if (
            clipped_box_area != 0  # to ensure transformed_box_area!=0 and to handle min_area=0 or min_visibility=0
            and clipped_box_area >= min_area
            and clipped_box_area / transformed_box_area >= min_visibility
            and clipped_width >= min_width
            and clipped_height >= min_height
        ):
            resulting_boxes.append(cast(BoxType, bbox + tail))
    return resulting_boxes
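A small sketch of the clipping-and-filtering behavior (normalized albumentations boxes; values illustrative):

Python
from albumentations.core.bbox_utils import filter_bboxes

rows, cols = 100, 100
bboxes = [
    (0.1, 0.1, 0.5, 0.5),    # fully visible: kept
    (-0.2, 0.4, 0.05, 0.6),  # mostly outside: clipped, visibility 0.2 < 0.5
]
kept = filter_bboxes(bboxes, rows, cols, min_area=4, min_visibility=0.5)
# kept == [(0.1, 0.1, 0.5, 0.5)]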

def filter_bboxes_by_visibility (original_shape, bboxes, transformed_shape, transformed_bboxes, threshold=0.0, min_area=0.0) [view source on GitHub]

Filter bounding boxes and return only those boxes whose visibility after transformation is above the threshold and whose minimal area in pixels is more than min_area.

Parameters:

Name Type Description
original_shape Sequence[int]

Original image shape (height, width, ...).

bboxes Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Original bounding boxes [(x_min, y_min, x_max, y_max)].

transformed_shape Sequence[int]

Transformed image shape (height, width).

transformed_bboxes Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Transformed bounding boxes [(x_min, y_min, x_max, y_max)].

threshold float

visibility threshold. Should be a value in the range [0.0, 1.0].

min_area float

Minimal area threshold.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Filtered bounding boxes [(x_min, y_min, x_max, y_max)].

Source code in albumentations/core/bbox_utils.py
Python
def filter_bboxes_by_visibility(
    original_shape: Sequence[int],
    bboxes: Sequence[BoxType],
    transformed_shape: Sequence[int],
    transformed_bboxes: Sequence[BoxType],
    threshold: float = 0.0,
    min_area: float = 0.0,
) -> List[BoxType]:
    """Filter bounding boxes and return only those boxes whose visibility after transformation is above
    the threshold and minimal area of bounding box in pixels is more then min_area.

    Args:
        original_shape: Original image shape `(height, width, ...)`.
        bboxes: Original bounding boxes `[(x_min, y_min, x_max, y_max)]`.
        transformed_shape: Transformed image shape `(height, width)`.
        transformed_bboxes: Transformed bounding boxes `[(x_min, y_min, x_max, y_max)]`.
        threshold: visibility threshold. Should be a value in the range [0.0, 1.0].
        min_area: Minimal area threshold.

    Returns:
        Filtered bounding boxes `[(x_min, y_min, x_max, y_max)]`.

    """
    img_height, img_width = original_shape[:2]
    transformed_img_height, transformed_img_width = transformed_shape[:2]

    visible_bboxes = []
    for bbox, transformed_bbox in zip(bboxes, transformed_bboxes):
        if not all(0.0 <= value <= 1.0 for value in transformed_bbox[:4]):
            continue
        bbox_area = calculate_bbox_area(bbox, img_height, img_width)
        transformed_bbox_area = calculate_bbox_area(transformed_bbox, transformed_img_height, transformed_img_width)
        if transformed_bbox_area < min_area:
            continue
        visibility = transformed_bbox_area / bbox_area
        if visibility >= threshold:
            visible_bboxes.append(transformed_bbox)
    return visible_bboxes

def normalize_bbox (bbox, rows, cols) [view source on GitHub]

Normalize coordinates of a bounding box. Divide x-coordinates by image width and y-coordinates by image height.

Parameters:

Name Type Description
bbox Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

Denormalized bounding box (x_min, y_min, x_max, y_max).

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]

Normalized bounding box (x_min, y_min, x_max, y_max).

Exceptions:

Type Description
ValueError

If rows or cols is less than or equal to zero

Source code in albumentations/core/bbox_utils.py
Python
def normalize_bbox(bbox: BoxType, rows: int, cols: int) -> BoxType:
    """Normalize coordinates of a bounding box. Divide x-coordinates by image width and y-coordinates
    by image height.

    Args:
        bbox: Denormalized bounding box `(x_min, y_min, x_max, y_max)`.
        rows: Image height.
        cols: Image width.

    Returns:
        Normalized bounding box `(x_min, y_min, x_max, y_max)`.

    Raises:
        ValueError: If rows or cols is less than or equal to zero

    """
    if rows <= 0:
        msg = "Argument rows must be positive integer"
        raise ValueError(msg)
    if cols <= 0:
        msg = "Argument cols must be positive integer"
        raise ValueError(msg)

    tail: Tuple[Any, ...]
    (x_min, y_min, x_max, y_max), tail = bbox[:4], tuple(bbox[4:])
    x_min /= cols
    x_max /= cols
    y_min /= rows
    y_max /= rows

    return cast(BoxType, (x_min, y_min, x_max, y_max, *tail))

def normalize_bboxes (bboxes, rows, cols) [view source on GitHub]

Normalize a list of bounding boxes.

Parameters:

Name Type Description
bboxes Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Denormalized bounding boxes [(x_min, y_min, x_max, y_max)].

rows int

Image height.

cols int

Image width.

Returns:

Type Description
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]]

Normalized bounding boxes [(x_min, y_min, x_max, y_max)].

Source code in albumentations/core/bbox_utils.py
Python
def normalize_bboxes(bboxes: Sequence[BoxType], rows: int, cols: int) -> List[BoxType]:
    """Normalize a list of bounding boxes.

    Args:
        bboxes: Denormalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.
        rows: Image height.
        cols: Image width.

    Returns:
        Normalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.

    """
    return [normalize_bbox(bbox, rows, cols) for bbox in bboxes]
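A round-trip sketch of normalization (illustrative values):

Python
from albumentations.core.bbox_utils import denormalize_bboxes, normalize_bboxes

rows, cols = 100, 200  # image height, width
pixel_boxes = [(20, 10, 180, 90)]

norm = normalize_bboxes(pixel_boxes, rows, cols)  # [(0.1, 0.1, 0.9, 0.9)]
back = denormalize_bboxes(norm, rows, cols)       # [(20.0, 10.0, 180.0, 90.0)]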

def union_of_bboxes (height, width, bboxes, erosion_rate=0.0) [view source on GitHub]

Calculate union of bounding boxes.

Parameters:

Name Type Description
height int

Height of image or space.

width int

Width of image or space.

bboxes List[tuple]

List like bounding boxes. Format is [(x_min, y_min, x_max, y_max)].

erosion_rate float

How much each bounding box can be shrunk, useful for erosive cropping. Set this in range [0, 1]. 0 will not be erosive at all, 1.0 can make any bbox lose its volume.

Returns:

Type Description
tuple

A bounding box (x_min, y_min, x_max, y_max).

Source code in albumentations/core/bbox_utils.py
Python
def union_of_bboxes(height: int, width: int, bboxes: Sequence[BoxType], erosion_rate: float = 0.0) -> BoxInternalType:
    """Calculate union of bounding boxes.

    Args:
        height (float): Height of image or space.
        width (float): Width of image or space.
        bboxes (List[tuple]): List like bounding boxes. Format is `[(x_min, y_min, x_max, y_max)]`.
        erosion_rate (float): How much each bounding box can be shrunk, useful for erosive cropping.
            Set this in range [0, 1]. 0 will not be erosive at all, 1.0 can make any bbox lose its area entirely.

    Returns:
        tuple: A bounding box `(x_min, y_min, x_max, y_max)`.

    """
    x1, y1 = width, height
    x2, y2 = 0, 0
    for bbox in bboxes:
        x_min, y_min, x_max, y_max = bbox[:4]
        w, h = x_max - x_min, y_max - y_min
        lim_x1, lim_y1 = x_min + erosion_rate * w, y_min + erosion_rate * h
        lim_x2, lim_y2 = x_max - erosion_rate * w, y_max - erosion_rate * h
        x1, y1 = np.min([x1, lim_x1]), np.min([y1, lim_y1])
        x2, y2 = np.max([x2, lim_x2]), np.max([y2, lim_y2])
    return x1, y1, x2, y2
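A small worked example of the arithmetic above (values chosen for illustration): with erosion_rate=0.0 the result is simply the bounding box of all boxes; with erosion_rate=0.5 each box contributes only its central region, so the union shrinks.

Python
from albumentations.core.bbox_utils import union_of_bboxes

boxes = [(10, 10, 50, 50), (40, 40, 80, 80)]
union_of_bboxes(height=100, width=100, bboxes=boxes)
# -> (10, 10, 80, 80)
union_of_bboxes(height=100, width=100, bboxes=boxes, erosion_rate=0.5)
# -> (30, 30, 60, 60): each 40x40 box is eroded by 20 px on every side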

composition

class Compose (transforms, bbox_params=None, keypoint_params=None, additional_targets=None, p=1.0, is_check_shapes=True) [view source on GitHub]

Compose transforms and handle all transformations regarding bounding boxes

Parameters:

Name Type Description
transforms list

list of transformations to compose.

bbox_params BboxParams

Parameters for bounding box transforms.

keypoint_params KeypointParams

Parameters for keypoint transforms.

additional_targets dict

Dict whose keys are new target names and whose values are existing target names, e.g. {'image2': 'image'}.

p float

probability of applying the whole list of transforms. Default: 1.0.

is_check_shapes bool

If True, the shape consistency of images/mask/masks is checked on each call. Pass False to disable this check (do so only if you are sure your data is consistent).

Source code in albumentations/core/composition.py
Python
class Compose(BaseCompose):
    """Compose transforms and handle all transformations regarding bounding boxes

    Args:
        transforms (list): list of transformations to compose.
        bbox_params (BboxParams): Parameters for bounding box transforms
        keypoint_params (KeypointParams): Parameters for keypoint transforms
        additional_targets (dict): Dict with keys - new target name, values - old target name, e.g. {'image2': 'image'}
        p (float): probability of applying the whole list of transforms. Default: 1.0.
        is_check_shapes (bool): If True, the shape consistency of images/mask/masks is checked on each call. Pass
            False to disable this check (do so only if you are sure your data is consistent).

    """

    def __init__(
        self,
        transforms: TransformsSeqType,
        bbox_params: Optional[Union[Dict[str, Any], "BboxParams"]] = None,
        keypoint_params: Optional[Union[Dict[str, Any], "KeypointParams"]] = None,
        additional_targets: Optional[Dict[str, str]] = None,
        p: float = 1.0,
        is_check_shapes: bool = True,
    ):
        super().__init__(transforms, p)

        self.processors: Dict[str, Union[BboxProcessor, KeypointsProcessor]] = {}
        if bbox_params:
            if isinstance(bbox_params, dict):
                b_params = BboxParams(**bbox_params)
            elif isinstance(bbox_params, BboxParams):
                b_params = bbox_params
            else:
                msg = "unknown format of bbox_params, please use `dict` or `BboxParams`"
                raise ValueError(msg)
            self.processors["bboxes"] = BboxProcessor(b_params, additional_targets)

        if keypoint_params:
            if isinstance(keypoint_params, dict):
                k_params = KeypointParams(**keypoint_params)
            elif isinstance(keypoint_params, KeypointParams):
                k_params = keypoint_params
            else:
                msg = "unknown format of keypoint_params, please use `dict` or `KeypointParams`"
                raise ValueError(msg)
            self.processors["keypoints"] = KeypointsProcessor(k_params, additional_targets)

        if additional_targets is None:
            additional_targets = {}

        self.additional_targets = additional_targets

        for proc in self.processors.values():
            proc.ensure_transforms_valid(self.transforms)

        self.add_targets(additional_targets)

        self.is_check_args = True
        self._disable_check_args_for_transforms(self.transforms)

        self.is_check_shapes = is_check_shapes

    @staticmethod
    def _disable_check_args_for_transforms(transforms: TransformsSeqType) -> None:
        for transform in transforms:
            if isinstance(transform, BaseCompose):
                Compose._disable_check_args_for_transforms(transform.transforms)
            if isinstance(transform, Compose):
                transform.disable_check_args_private()

    def disable_check_args_private(self) -> None:
        self.is_check_args = False

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> Dict[str, Any]:
        if args:
            msg = "You have to pass data to augmentations as named arguments, for example: aug(image=image)"
            raise KeyError(msg)
        if self.is_check_args:
            self._check_args(**data)

        if not isinstance(force_apply, (bool, int)):
            msg = "force_apply must have bool or int type"
            raise TypeError(msg)

        need_to_run = force_apply or random.random() < self.p

        for p in self.processors.values():
            p.ensure_data_valid(data)
        transforms = self.transforms if need_to_run else get_always_apply(self.transforms)

        check_each_transform = any(
            getattr(item.params, "check_each_transform", False) for item in self.processors.values()
        )

        for p in self.processors.values():
            p.preprocess(data)

        for t in transforms:
            data = t(**data)

            if check_each_transform:
                data = self._check_data_post_transform(data)
        data = Compose._make_targets_contiguous(data)  # ensure output targets are contiguous

        for p in self.processors.values():
            p.postprocess(data)

        return data

    def _check_data_post_transform(self, data: Any) -> Dict[str, Any]:
        rows, cols = get_shape(data["image"])

        for p in self.processors.values():
            if not getattr(p.params, "check_each_transform", False):
                continue

            for data_name in p.data_fields:
                data[data_name] = p.filter(data[data_name], rows, cols)
        return data

    def to_dict_private(self) -> Dict[str, Any]:
        dictionary = super().to_dict_private()
        bbox_processor = self.processors.get("bboxes")
        keypoints_processor = self.processors.get("keypoints")
        dictionary.update(
            {
                "bbox_params": bbox_processor.params.to_dict_private() if bbox_processor else None,
                "keypoint_params": (keypoints_processor.params.to_dict_private() if keypoints_processor else None),
                "additional_targets": self.additional_targets,
                "is_check_shapes": self.is_check_shapes,
            }
        )
        return dictionary

    def get_dict_with_id(self) -> Dict[str, Any]:
        dictionary = super().get_dict_with_id()
        bbox_processor = self.processors.get("bboxes")
        keypoints_processor = self.processors.get("keypoints")
        dictionary.update(
            {
                "bbox_params": bbox_processor.params.to_dict_private() if bbox_processor else None,
                "keypoint_params": (keypoints_processor.params.to_dict_private() if keypoints_processor else None),
                "additional_targets": self.additional_targets,
                "params": None,
                "is_check_shapes": self.is_check_shapes,
            }
        )
        return dictionary

    def _check_args(self, **kwargs: Any) -> None:
        checked_single = ["image", "mask"]
        checked_multi = ["masks"]
        check_bbox_param = ["bboxes"]
        shapes = []
        for data_name, data in kwargs.items():
            internal_data_name = self.additional_targets.get(data_name, data_name)
            if internal_data_name in checked_single:
                if not isinstance(data, np.ndarray):
                    raise TypeError(f"{data_name} must be numpy array type")
                shapes.append(data.shape[:2])
            if internal_data_name in checked_multi and data is not None and len(data):
                if not isinstance(data[0], np.ndarray):
                    raise TypeError(f"{data_name} must be list of numpy arrays")
                shapes.append(data[0].shape[:2])
            if internal_data_name in check_bbox_param and self.processors.get("bboxes") is None:
                msg = "bbox_params must be specified for bbox transformations"
                raise ValueError(msg)

        if self.is_check_shapes and shapes and shapes.count(shapes[0]) != len(shapes):
            msg = (
                "Height and Width of image, mask or masks should be equal. You can disable shapes check "
                "by setting a parameter is_check_shapes=False of Compose class (do it only if you are sure "
                "about your data consistency)."
            )
            raise ValueError(msg)

    @staticmethod
    def _make_targets_contiguous(data: Any) -> Dict[str, Any]:
        result = {}
        for key, value in data.items():
            if isinstance(value, np.ndarray):
                result[key] = np.ascontiguousarray(value)
            else:
                result[key] = value

        return result
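A typical usage sketch (transform choices and shapes are illustrative): Compose builds the bbox processor from bbox_params and forwards additional_targets, so image2 below receives the same transform parameters as image.

Python
import numpy as np
import albumentations as A

transform = A.Compose(
    [A.HorizontalFlip(p=0.5), A.RandomBrightnessContrast(p=0.2)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
    additional_targets={"image2": "image"},
)

image = np.zeros((100, 100, 3), dtype=np.uint8)
out = transform(image=image, image2=image.copy(), bboxes=[(10, 10, 50, 50)], labels=[1])
out["image"], out["image2"], out["bboxes"]  # transformed consistently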

class OneOf (transforms, p=0.5) [view source on GitHub]

Select one of the given transforms to apply. The selected transform will be called with force_apply=True. Transform probabilities will be normalized to sum to 1, so in this case they act as weights.

Parameters:

Name Type Description
transforms list

list of transformations to compose.

p float

probability of applying selected transform. Default: 0.5.

Source code in albumentations/core/composition.py
Python
class OneOf(BaseCompose):
    """Select one of transforms to apply. Selected transform will be called with `force_apply=True`.
    Transform probabilities will be normalized to sum to 1, so in this case they act as weights.

    Args:
        transforms (list): list of transformations to compose.
        p (float): probability of applying selected transform. Default: 0.5.

    """

    def __init__(self, transforms: TransformsSeqType, p: float = 0.5):
        super().__init__(transforms, p)
        transforms_ps = [t.p for t in self.transforms]
        s = sum(transforms_ps)
        self.transforms_ps = [t / s for t in transforms_ps]

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> Dict[str, Any]:
        if self.replay_mode:
            for t in self.transforms:
                data = t(**data)
            return data

        if self.transforms_ps and (force_apply or random.random() < self.p):
            idx: int = random_utils.choice(len(self.transforms), p=self.transforms_ps)
            t = self.transforms[idx]
            data = t(force_apply=True, **data)
        return data
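Because the children's p values are renormalized, they act as relative weights rather than independent probabilities. A minimal sketch (transforms chosen for illustration):

Python
import albumentations as A

transform = A.Compose([
    A.OneOf(
        [
            A.Blur(p=0.75),    # picked ~75% of the times OneOf fires
            A.ToGray(p=0.25),  # picked ~25% of the times OneOf fires
        ],
        p=0.9,  # OneOf itself fires with probability 0.9
    ),
])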

class OneOrOther (first=None, second=None, transforms=None, p=0.5) [view source on GitHub]

Select one or another transform to apply. Selected transform will be called with force_apply=True.

Source code in albumentations/core/composition.py
Python
class OneOrOther(BaseCompose):
    """Select one or another transform to apply. Selected transform will be called with `force_apply=True`."""

    def __init__(
        self,
        first: Optional[TransformType] = None,
        second: Optional[TransformType] = None,
        transforms: Optional[TransformsSeqType] = None,
        p: float = 0.5,
    ):
        if transforms is None:
            if first is None or second is None:
                msg = "You must set both first and second or set transforms argument."
                raise ValueError(msg)
            transforms = [first, second]
        super().__init__(transforms, p)
        if len(self.transforms) != TWO:
            warnings.warn("Length of transforms is not equal to 2.")

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> Dict[str, Any]:
        if self.replay_mode:
            for t in self.transforms:
                data = t(**data)
            return data

        if random.random() < self.p:
            return self.transforms[0](force_apply=True, **data)

        return self.transforms[-1](force_apply=True, **data)
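A minimal sketch (transforms chosen for illustration): with probability p the first transform is applied, otherwise the second.

Python
import albumentations as A

transform = A.OneOrOther(
    first=A.HorizontalFlip(p=1.0),
    second=A.VerticalFlip(p=1.0),
    p=0.5,  # probability of picking `first`
)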

class PerChannel (transforms, channels=None, p=0.5) [view source on GitHub]

Apply transformations per-channel

Parameters:

Name Type Description
transforms list

list of transformations to compose.

channels sequence

channels to apply the transform to. Pass None to apply to all. Default: None (apply to all).

p float

probability of applying the transform. Default: 0.5.

Source code in albumentations/core/composition.py
Python
class PerChannel(BaseCompose):
    """Apply transformations per-channel

    Args:
        transforms (list): list of transformations to compose.
        channels (sequence): channels to apply the transform to. Pass None to apply to all.
        Default: None (apply to all)
        p (float): probability of applying the transform. Default: 0.5.

    """

    def __init__(self, transforms: TransformsSeqType, channels: Optional[Sequence[int]] = None, p: float = 0.5):
        super().__init__(transforms, p)
        self.channels = channels

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> Dict[str, Any]:
        if force_apply or random.random() < self.p:
            image = data["image"]

            # Expand mono images to have a single channel
            if len(image.shape) == TWO:
                image = np.expand_dims(image, -1)

            if self.channels is None:
                self.channels = range(image.shape[2])

            for c in self.channels:
                for t in self.transforms:
                    image[:, :, c] = t(image=image[:, :, c])["image"]

            data["image"] = image

        return data
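A minimal sketch (transform and channel choice are illustrative): blur only the first channel of an RGB image, leaving the other two untouched.

Python
import numpy as np
import albumentations as A

transform = A.PerChannel([A.Blur(p=1.0)], channels=[0], p=1.0)
image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
out = transform(image=image)["image"]  # channel 0 blurred, channels 1 and 2 unchanged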

class Sequential (transforms, p=0.5) [view source on GitHub]

Sequentially applies all transforms to targets.

Note

This transform is not intended to be a replacement for Compose. Instead, it should be used inside Compose the same way OneOf or OneOrOther are used. For instance, you can combine OneOf with Sequential to create an augmentation pipeline that contains multiple sequences of augmentations and applies one randomly chosen sequence to the input data (see the Example section for an example definition of such a pipeline).

Examples:

Python
>>> import albumentations as A
>>> transform = A.Compose([
>>>    A.OneOf([
>>>        A.Sequential([
>>>            A.HorizontalFlip(p=0.5),
>>>            A.ShiftScaleRotate(p=0.5),
>>>        ]),
>>>        A.Sequential([
>>>            A.VerticalFlip(p=0.5),
>>>            A.RandomBrightnessContrast(p=0.5),
>>>        ]),
>>>    ], p=1)
>>> ])
Source code in albumentations/core/composition.py
Python
class Sequential(BaseCompose):
    """Sequentially applies all transforms to targets.

    Note:
        This transform is not intended to be a replacement for `Compose`. Instead, it should be used inside `Compose`
        the same way `OneOf` or `OneOrOther` are used. For instance, you can combine `OneOf` with `Sequential` to
        create an augmentation pipeline that contains multiple sequences of augmentations and applies one randomly
        chosen sequence to the input data (see the `Example` section for an example definition of such a pipeline).

    Example:
        >>> import albumentations as A
        >>> transform = A.Compose([
        >>>    A.OneOf([
        >>>        A.Sequential([
        >>>            A.HorizontalFlip(p=0.5),
        >>>            A.ShiftScaleRotate(p=0.5),
        >>>        ]),
        >>>        A.Sequential([
        >>>            A.VerticalFlip(p=0.5),
        >>>            A.RandomBrightnessContrast(p=0.5),
        >>>        ]),
        >>>    ], p=1)
        >>> ])

    """

    def __init__(self, transforms: TransformsSeqType, p: float = 0.5):
        super().__init__(transforms, p)

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> Dict[str, Any]:
        for t in self.transforms:
            data = t(**data)
        return data

class SomeOf (transforms, n, replace=True, p=1) [view source on GitHub]

Select N transforms to apply. The selected transforms will be called with force_apply=True. Transform probabilities will be normalized to sum to 1, so in this case they act as weights.

Parameters:

Name Type Description
transforms list

list of transformations to compose.

n int

number of transforms to apply.

replace bool

Whether the sampled transforms are with or without replacement. Default: True.

p float

probability of applying selected transform. Default: 1.

Source code in albumentations/core/composition.py
Python
class SomeOf(BaseCompose):
    """Select N transforms to apply. Selected transforms will be called with `force_apply=True`.
    Transform probabilities will be normalized to sum to 1, so in this case they act as weights.

    Args:
        transforms (list): list of transformations to compose.
        n (int): number of transforms to apply.
        replace (bool): Whether the sampled transforms are with or without replacement. Default: True.
        p (float): probability of applying selected transform. Default: 1.

    """

    def __init__(self, transforms: TransformsSeqType, n: int, replace: bool = True, p: float = 1):
        super().__init__(transforms, p)
        self.n = n
        self.replace = replace
        transforms_ps = [t.p for t in self.transforms]
        s = sum(transforms_ps)
        self.transforms_ps = [t / s for t in transforms_ps]

    def __call__(self, *arg: Any, force_apply: bool = False, **data: Any) -> Dict[str, Any]:
        if self.replay_mode:
            for t in self.transforms:
                data = t(**data)
            return data

        if self.transforms_ps and (force_apply or random.random() < self.p):
            idx = random_utils.choice(len(self.transforms), size=self.n, replace=self.replace, p=self.transforms_ps)
            for i in idx:
                t = self.transforms[i]
                data = t(force_apply=True, **data)
        return data

    def to_dict_private(self) -> Dict[str, Any]:
        dictionary = super().to_dict_private()
        dictionary.update({"n": self.n, "replace": self.replace})
        return dictionary
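A minimal sketch (transforms chosen for illustration): apply exactly two distinct transforms per call by sampling without replacement.

Python
import albumentations as A

transform = A.SomeOf(
    [A.HorizontalFlip(p=1.0), A.Blur(p=1.0), A.ToGray(p=1.0)],
    n=2,
    replace=False,  # sample without replacement: two distinct transforms
    p=1.0,
)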

keypoints_utils

class KeypointParams (format, label_fields=None, remove_invisible=True, angle_in_degrees=True, check_each_transform=True) [view source on GitHub]

Parameters of keypoints

Parameters:

Name Type Description
format str

format of keypoints. Should be 'xy', 'yx', 'xya', 'xys', 'xyas', 'xysa'.

x - X coordinate,

y - Y coordinate

s - Keypoint scale

a - Keypoint orientation in radians or degrees (depending on KeypointParams.angle_in_degrees)

label_fields list

list of fields that are joined with keypoints, e.g. labels. Should be of the same type as the keypoints.

remove_invisible bool

whether to remove keypoints that become invisible after the transform.

angle_in_degrees bool

whether the angle in 'xya', 'xyas', 'xysa' keypoints is given in degrees (True) or radians (False).

check_each_transform bool

if True, then keypoints will be checked after each dual transform. Default: True

Source code in albumentations/core/keypoints_utils.py
Python
class KeypointParams(Params):
    """Parameters of keypoints

    Args:
        format (str): format of keypoints. Should be 'xy', 'yx', 'xya', 'xys', 'xyas', 'xysa'.

            x - X coordinate,

            y - Y coordinate

            s - Keypoint scale

            a - Keypoint orientation in radians or degrees (depending on KeypointParams.angle_in_degrees)
        label_fields (list): list of fields that are joined with keypoints, e.g. labels.
            Should be of the same type as the keypoints.
        remove_invisible (bool): whether to remove keypoints that become invisible after the transform
        angle_in_degrees (bool): whether the angle in 'xya', 'xyas', 'xysa' keypoints is in degrees or radians
        check_each_transform (bool): if `True`, then keypoints will be checked after each dual transform.
            Default: `True`

    """

    def __init__(
        self,
        format: str,
        label_fields: Optional[Sequence[str]] = None,
        remove_invisible: bool = True,
        angle_in_degrees: bool = True,
        check_each_transform: bool = True,
    ):
        super().__init__(format, label_fields)
        self.remove_invisible = remove_invisible
        self.angle_in_degrees = angle_in_degrees
        self.check_each_transform = check_each_transform

    def to_dict_private(self) -> Dict[str, Any]:
        data = super().to_dict_private()
        data.update(
            {
                "remove_invisible": self.remove_invisible,
                "angle_in_degrees": self.angle_in_degrees,
                "check_each_transform": self.check_each_transform,
            }
        )
        return data

    @classmethod
    def is_serializable(cls) -> bool:
        return True

    @classmethod
    def get_class_fullname(cls) -> str:
        return "KeypointParams"

class KeypointsProcessor (params, additional_targets=None) [view source on GitHub]

Source code in albumentations/core/keypoints_utils.py
Python
class KeypointsProcessor(DataProcessor):
    def __init__(self, params: KeypointParams, additional_targets: Optional[Dict[str, str]] = None):
        super().__init__(params, additional_targets)

    @property
    def default_data_name(self) -> str:
        return "keypoints"

    def ensure_data_valid(self, data: Dict[str, Any]) -> None:
        if self.params.label_fields and not all(i in data for i in self.params.label_fields):
            msg = "Your 'label_fields' are not valid - them must have same names as params in " "'keypoint_params' dict"
            raise ValueError(msg)

    def filter(self, data: Sequence[KeypointType], rows: int, cols: int) -> Sequence[KeypointType]:
        """The function filters a sequence of data based on the number of rows and columns, and returns a
        sequence of keypoints.

        :param data: The `data` parameter is a sequence of sequences. Each inner sequence represents a
        set of keypoints
        :type data: Sequence[Sequence]
        :param rows: The `rows` parameter represents the number of rows in the data matrix. It specifies
        the number of rows that will be used for filtering the keypoints
        :type rows: int
        :param cols: The parameter "cols" represents the number of columns in the grid that the
        keypoints will be filtered on
        :type cols: int
        :return: a sequence of KeypointType objects.
        """
        self.params: KeypointParams
        return filter_keypoints(data, rows, cols, remove_invisible=self.params.remove_invisible)

    def check(self, data: Sequence[KeypointType], rows: int, cols: int) -> None:
        check_keypoints(data, rows, cols)

    def convert_from_albumentations(self, data: Sequence[KeypointType], rows: int, cols: int) -> List[KeypointType]:
        params = self.params
        return convert_keypoints_from_albumentations(
            data,
            params.format,
            rows,
            cols,
            check_validity=params.remove_invisible,
            angle_in_degrees=params.angle_in_degrees,
        )

    def convert_to_albumentations(self, data: Sequence[KeypointType], rows: int, cols: int) -> List[KeypointType]:
        params = self.params
        return convert_keypoints_to_albumentations(
            data,
            params.format,
            rows,
            cols,
            check_validity=params.remove_invisible,
            angle_in_degrees=params.angle_in_degrees,
        )
filter (self, data, rows, cols)

Filter a sequence of keypoints, removing those that fall outside the image bounds when remove_invisible is enabled.

Parameters:

Name Type Description
data Sequence[KeypointType]

Sequence of keypoints to filter.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Sequence[KeypointType]

The filtered keypoints.

Source code in albumentations/core/keypoints_utils.py
Python
def filter(self, data: Sequence[KeypointType], rows: int, cols: int) -> Sequence[KeypointType]:
    """The function filters a sequence of data based on the number of rows and columns, and returns a
    sequence of keypoints.

    :param data: The `data` parameter is a sequence of sequences. Each inner sequence represents a
    set of keypoints
    :type data: Sequence[Sequence]
    :param rows: The `rows` parameter represents the number of rows in the data matrix. It specifies
    the number of rows that will be used for filtering the keypoints
    :type rows: int
    :param cols: The parameter "cols" represents the number of columns in the grid that the
    keypoints will be filtered on
    :type cols: int
    :return: a sequence of KeypointType objects.
    """
    self.params: KeypointParams
    return filter_keypoints(data, rows, cols, remove_invisible=self.params.remove_invisible)

def check_keypoint (kp, rows, cols) [view source on GitHub]

Check that keypoint coordinates lie within the image bounds

Source code in albumentations/core/keypoints_utils.py
Python
def check_keypoint(kp: KeypointType, rows: int, cols: int) -> None:
    """Check if keypoint coordinates are less than image shapes"""
    for name, value, size in zip(["x", "y"], kp[:2], [cols, rows]):
        if not 0 <= value < size:
            raise ValueError(f"Expected {name} for keypoint {kp} " f"to be in the range [0.0, {size}], got {value}.")

    angle = kp[2]
    if not (0 <= angle < 2 * math.pi):
        raise ValueError(f"Keypoint angle must be in range [0, 2 * PI). Got: {angle}")

def check_keypoints (keypoints, rows, cols) [view source on GitHub]

Check that all keypoint coordinates lie within the image bounds

Source code in albumentations/core/keypoints_utils.py
Python
def check_keypoints(keypoints: Sequence[KeypointType], rows: int, cols: int) -> None:
    """Check if keypoints boundaries are less than image shapes"""
    for kp in keypoints:
        check_keypoint(kp, rows, cols)

serialization

class Serializable [view source on GitHub]

Source code in albumentations/core/serialization.py
Python
class Serializable(metaclass=SerializableMeta):
    @classmethod
    @abstractmethod
    def is_serializable(cls) -> bool:
        raise NotImplementedError

    @classmethod
    @abstractmethod
    def get_class_fullname(cls) -> str:
        raise NotImplementedError

    @abstractmethod
    def to_dict_private(self) -> Dict[str, Any]:
        raise NotImplementedError

    def to_dict(self, on_not_implemented_error: str = "raise") -> Dict[str, Any]:
        """Take a transform pipeline and convert it to a serializable representation that uses only standard
        python data types: dictionaries, lists, strings, integers, and floats.

        Args:
            self: A transform that should be serialized. If the transform doesn't implement the `to_dict`
                method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised.
                If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored
                but no transform parameters will be serialized.
            on_not_implemented_error (str): `raise` or `warn`.

        """
        if on_not_implemented_error not in {"raise", "warn"}:
            msg = f"Unknown on_not_implemented_error value: {on_not_implemented_error}. Supported values are: 'raise' "
            "and 'warn'"
            raise ValueError(msg)
        try:
            transform_dict = self.to_dict_private()
        except NotImplementedError:
            if on_not_implemented_error == "raise":
                raise

            transform_dict = {}
            warnings.warn(
                f"Got NotImplementedError while trying to serialize {self}. Object arguments are not preserved. "
                f"Implement either '{self.__class__.__name__}.get_transform_init_args_names' "
                f"or '{self.__class__.__name__}.get_transform_init_args' "
                "method to make the transform serializable"
            )
        return {"__version__": __version__, "transform": transform_dict}
to_dict (self, on_not_implemented_error='raise')

Take a transform pipeline and convert it to a serializable representation that uses only standard python data types: dictionaries, lists, strings, integers, and floats.

Parameters:

Name Type Description
self

A transform that should be serialized. If the transform doesn't implement the to_dict method and on_not_implemented_error equals 'raise', then NotImplementedError is raised. If on_not_implemented_error equals 'warn', then the NotImplementedError is ignored but no transform parameters are serialized.

on_not_implemented_error str

raise or warn.

Source code in albumentations/core/serialization.py
Python
def to_dict(self, on_not_implemented_error: str = "raise") -> Dict[str, Any]:
    """Take a transform pipeline and convert it to a serializable representation that uses only standard
    python data types: dictionaries, lists, strings, integers, and floats.

    Args:
        self: A transform that should be serialized. If the transform doesn't implement the `to_dict`
            method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised.
            If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored
            but no transform parameters will be serialized.
        on_not_implemented_error (str): `raise` or `warn`.

    """
    if on_not_implemented_error not in {"raise", "warn"}:
        msg = f"Unknown on_not_implemented_error value: {on_not_implemented_error}. Supported values are: 'raise' "
        "and 'warn'"
        raise ValueError(msg)
    try:
        transform_dict = self.to_dict_private()
    except NotImplementedError:
        if on_not_implemented_error == "raise":
            raise

        transform_dict = {}
        warnings.warn(
            f"Got NotImplementedError while trying to serialize {self}. Object arguments are not preserved. "
            f"Implement either '{self.__class__.__name__}.get_transform_init_args_names' "
            f"or '{self.__class__.__name__}.get_transform_init_args' "
            "method to make the transform serializable"
        )
    return {"__version__": __version__, "transform": transform_dict}

class SerializableMeta [view source on GitHub]

A metaclass that is used to register classes in SERIALIZABLE_REGISTRY or NON_SERIALIZABLE_REGISTRY so they can be found later while deserializing transformation pipeline using classes full names.

Source code in albumentations/core/serialization.py
Python
class SerializableMeta(ABCMeta):
    """A metaclass that is used to register classes in `SERIALIZABLE_REGISTRY` or `NON_SERIALIZABLE_REGISTRY`
    so they can be found later while deserializing transformation pipeline using classes full names.
    """

    def __new__(cls, name: str, bases: Tuple[type, ...], *args: Any, **kwargs: Any) -> "SerializableMeta":
        cls_obj = super().__new__(cls, name, bases, *args, **kwargs)
        if name != "Serializable" and ABC not in bases:
            if cls_obj.is_serializable():
                SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
            else:
                NON_SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
        return cls_obj

    @classmethod
    def is_serializable(cls) -> bool:
        return False

    @classmethod
    def get_class_fullname(cls) -> str:
        return get_shortest_class_fullname(cls)

    @classmethod
    def _to_dict(cls) -> Dict[str, Any]:
        return {}
__new__ (cls, name, bases, *args, **kwargs) special staticmethod

Create and return a new object. See help(type) for accurate signature.

Source code in albumentations/core/serialization.py
Python
def __new__(cls, name: str, bases: Tuple[type, ...], *args: Any, **kwargs: Any) -> "SerializableMeta":
    cls_obj = super().__new__(cls, name, bases, *args, **kwargs)
    if name != "Serializable" and ABC not in bases:
        if cls_obj.is_serializable():
            SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
        else:
            NON_SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
    return cls_obj

def from_dict (transform_dict, nonserializable=None) [view source on GitHub]

Restore a transform pipeline from its serialized representation.

Parameters:

Name Type Description
transform_dict Dict[str, Any]

A dictionary with a serialized transform pipeline.

nonserializable Optional[Dict[str, Any]]

A dictionary that contains non-serializable transforms. This dictionary is required when you are restoring a pipeline that contains non-serializable transforms. Keys in that dictionary should be named the same as the name arguments in the respective transforms from the serialized pipeline.

Source code in albumentations/core/serialization.py
Python
def from_dict(
    transform_dict: Dict[str, Any], nonserializable: Optional[Dict[str, Any]] = None
) -> Optional[Serializable]:
    """Args:
    transform_dict: A dictionary with serialized transform pipeline.
    nonserializable (dict): A dictionary that contains non-serializable transforms.
        This dictionary is required when you are restoring a pipeline that contains non-serializable transforms.
        Keys in that dictionary should be named same as `name` arguments in respective transforms from
        a serialized pipeline.

    """
    register_additional_transforms()
    transform = transform_dict["transform"]
    lmbd = instantiate_nonserializable(transform, nonserializable)
    if lmbd:
        return lmbd
    name = transform["__class_fullname__"]
    args = {k: v for k, v in transform.items() if k != "__class_fullname__"}
    cls = SERIALIZABLE_REGISTRY[shorten_class_name(name)]
    if "transforms" in args:
        args["transforms"] = [from_dict({"transform": t}, nonserializable=nonserializable) for t in args["transforms"]]
    return cls(**args)
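A round-trip sketch using the module-level helpers documented on this page (pipeline contents are illustrative):

Python
import albumentations as A

transform = A.Compose([A.HorizontalFlip(p=0.5), A.RandomBrightnessContrast(p=0.2)])
serialized = A.to_dict(transform)   # plain dicts/lists/strings/numbers
restored = A.from_dict(serialized)  # an equivalent pipeline instance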

def get_shortest_class_fullname (cls) [view source on GitHub]

Return the shortened full name of a class object.

Parameters:

Name Type Description
cls Type[Any]

The class whose full name should be shortened.

Returns:

Type Description
str

The shortened version of the full class name.

Source code in albumentations/core/serialization.py
Python
def get_shortest_class_fullname(cls: Type[Any]) -> str:
    """The function `get_shortest_class_fullname` takes a class object as input and returns its shortened
    full name.

    :param cls: The parameter `cls` is of type `Type[BasicCompose]`, which means it expects a class that
    is a subclass of `BasicCompose`
    :type cls: Type[BasicCompose]
    :return: a string, which is the shortened version of the full class name.
    """
    class_fullname = f"{cls.__module__}.{cls.__name__}"
    return shorten_class_name(class_fullname)

def load (filepath_or_buffer, data_format='json', nonserializable=None) [view source on GitHub]

Load a serialized pipeline from a file or file-like object and construct a transform pipeline.

Parameters:

Name Type Description
filepath_or_buffer Union[str, Path, TextIO]

The file path or file-like object to read the serialized data from. If a string is provided, it is interpreted as a path to a file. If a file-like object is provided, the serialized data will be read from it directly.

data_format str

The format of the serialized data. Valid options are 'json' and 'yaml'. Defaults to 'json'.

nonserializable Optional[Dict[str, Any]]

A dictionary that contains non-serializable transforms. This dictionary is required when restoring a pipeline that contains non-serializable transforms. Keys in the dictionary should be named the same as the name arguments in respective transforms from the serialized pipeline. Defaults to None.

Returns:

Type Description
object

The deserialized transform pipeline.

Exceptions:

Type Description
ValueError

If data_format is 'yaml' but PyYAML is not installed.

Source code in albumentations/core/serialization.py
Python
def load(
    filepath_or_buffer: Union[str, Path, TextIO],
    data_format: str = "json",
    nonserializable: Optional[Dict[str, Any]] = None,
) -> object:
    """Load a serialized pipeline from a file or file-like object and construct a transform pipeline.

    Args:
        filepath_or_buffer (Union[str, Path, TextIO]): The file path or file-like object to read the serialized
            data from.
            If a string is provided, it is interpreted as a path to a file. If a file-like object is provided,
            the serialized data will be read from it directly.
        data_format (str): The format of the serialized data. Valid options are 'json' and 'yaml'.
            Defaults to 'json'.
        nonserializable (Optional[Dict[str, Any]]): A dictionary that contains non-serializable transforms.
            This dictionary is required when restoring a pipeline that contains non-serializable transforms.
            Keys in the dictionary should be named the same as the `name` arguments in respective transforms
            from the serialized pipeline. Defaults to None.

    Returns:
        object: The deserialized transform pipeline.

    Raises:
        ValueError: If `data_format` is 'yaml' but PyYAML is not installed.

    """
    check_data_format(data_format)

    if isinstance(filepath_or_buffer, (str, Path)):  # Assume it's a filepath
        with open(filepath_or_buffer) as f:
            if data_format == "json":
                transform_dict = json.load(f)
            else:
                if not yaml_available:
                    msg = "You need to install PyYAML to load a pipeline in yaml format"
                    raise ValueError(msg)
                transform_dict = yaml.safe_load(f)
    elif data_format == "json":
        transform_dict = json.load(filepath_or_buffer)
    else:
        if not yaml_available:
            msg = "You need to install PyYAML to load a pipeline in yaml format"
            raise ValueError(msg)
        transform_dict = yaml.safe_load(filepath_or_buffer)

    return from_dict(transform_dict, nonserializable=nonserializable)

def register_additional_transforms () [view source on GitHub]

Register transforms that are not imported directly into the albumentations module by checking the availability of optional dependencies.

Source code in albumentations/core/serialization.py
Python
def register_additional_transforms() -> None:
    """Register transforms that are not imported directly into the `albumentations` module by checking
    the availability of optional dependencies.
    """
    if importlib.util.find_spec("torch") is not None:
        try:
            # Import `albumentations.pytorch` only if `torch` is installed.
            import albumentations.pytorch

            # Use a dummy operation to acknowledge the use of the imported module and avoid linting errors.
            _ = albumentations.pytorch.ToTensorV2
        except ImportError:
            pass

def save (transform, filepath_or_buffer, data_format='json', on_not_implemented_error='raise') [view source on GitHub]

Serialize a transform pipeline and save it to either a file specified by a path or a file-like object in either JSON or YAML format.

Parameters:

Name Type Description
transform Serializable

The transform pipeline to serialize.

filepath_or_buffer Union[str, Path, TextIO]

The file path or file-like object to write the serialized data to. If a string is provided, it is interpreted as a path to a file. If a file-like object is provided, the serialized data will be written to it directly.

data_format str

The format to serialize the data in. Valid options are 'json' and 'yaml'. Defaults to 'json'.

on_not_implemented_error str

Determines the behavior if a transform does not implement the to_dict method. If set to 'raise', a NotImplementedError is raised. If set to 'warn', the exception is ignored, and no transform arguments are saved. Defaults to 'raise'.

Exceptions:

Type Description
ValueError

If data_format is 'yaml' but PyYAML is not installed.

Source code in albumentations/core/serialization.py
Python
def save(
    transform: "Serializable",
    filepath_or_buffer: Union[str, Path, TextIO],
    data_format: str = "json",
    on_not_implemented_error: str = "raise",
) -> None:
    """Serialize a transform pipeline and save it to either a file specified by a path or a file-like object
    in either JSON or YAML format.

    Args:
        transform (Serializable): The transform pipeline to serialize.
        filepath_or_buffer (Union[str, Path, TextIO]): The file path or file-like object to write the serialized
            data to.
            If a string is provided, it is interpreted as a path to a file. If a file-like object is provided,
            the serialized data will be written to it directly.
        data_format (str): The format to serialize the data in. Valid options are 'json' and 'yaml'.
            Defaults to 'json'.
        on_not_implemented_error (str): Determines the behavior if a transform does not implement the `to_dict` method.
            If set to 'raise', a `NotImplementedError` is raised. If set to 'warn', the exception is ignored, and
            no transform arguments are saved. Defaults to 'raise'.

    Raises:
        ValueError: If `data_format` is 'yaml' but PyYAML is not installed.

    """
    check_data_format(data_format)
    transform_dict = transform.to_dict(on_not_implemented_error=on_not_implemented_error)
    transform_dict = serialize_enum(transform_dict)

    # Determine whether to write to a file or a file-like object
    if isinstance(filepath_or_buffer, (str, Path)):  # It's a filepath
        with open(filepath_or_buffer, "w") as f:
            if data_format == "yaml":
                if not yaml_available:
                    msg = "You need to install PyYAML to save a pipeline in YAML format"
                    raise ValueError(msg)
                yaml.safe_dump(transform_dict, f, default_flow_style=False)
            elif data_format == "json":
                json.dump(transform_dict, f)
    elif data_format == "yaml":
        if not yaml_available:
            msg = "You need to install PyYAML to save a pipeline in YAML format"
            raise ValueError(msg)
        yaml.safe_dump(transform_dict, filepath_or_buffer, default_flow_style=False)
    elif data_format == "json":
        json.dump(transform_dict, filepath_or_buffer)

def serialize_enum (obj) [view source on GitHub]

Recursively search for Enum objects and convert them to their value. Also handle any Mapping or Sequence types.

Source code in albumentations/core/serialization.py
Python
def serialize_enum(obj: Any) -> Any:
    """Recursively search for Enum objects and convert them to their value.
    Also handle any Mapping or Sequence types.
    """
    if isinstance(obj, Mapping):
        return {k: serialize_enum(v) for k, v in obj.items()}
    if isinstance(obj, Sequence) and not isinstance(obj, str):  # exclude strings since they're also sequences
        return [serialize_enum(v) for v in obj]
    return obj.value if isinstance(obj, Enum) else obj

def to_dict (transform, on_not_implemented_error='raise') [view source on GitHub]

Take a transform pipeline and convert it to a serializable representation that uses only standard python data types: dictionaries, lists, strings, integers, and floats.

Parameters:

Name Type Description
transform Serializable

A transform that should be serialized. If the transform doesn't implement the to_dict method and on_not_implemented_error equals 'raise', then NotImplementedError is raised. If on_not_implemented_error equals 'warn', then the NotImplementedError is ignored but no transform parameters are serialized.

on_not_implemented_error str

raise or warn.

Source code in albumentations/core/serialization.py
Python
def to_dict(transform: Serializable, on_not_implemented_error: str = "raise") -> Dict[str, Any]:
    """Take a transform pipeline and convert it to a serializable representation that uses only standard
    python data types: dictionaries, lists, strings, integers, and floats.

    Args:
        transform: A transform that should be serialized. If the transform doesn't implement the `to_dict`
            method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised.
            If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored
            but no transform parameters will be serialized.
        on_not_implemented_error (str): `raise` or `warn`.

    """
    return transform.to_dict(on_not_implemented_error)

transforms_interface

class BasicTransform (always_apply=False, p=0.5) [view source on GitHub]

Source code in albumentations/core/transforms_interface.py
Python
class BasicTransform(Serializable):
    call_backup = None
    interpolation: Union[int, Interpolation]
    fill_value: ColorType
    mask_fill_value: Optional[ColorType]

    def __init__(self, always_apply: bool = False, p: float = 0.5):
        self.p = p
        self.always_apply = always_apply
        self._additional_targets: Dict[str, str] = {}

        # replay mode params
        self.deterministic = False
        self.save_key = "replay"
        self.params: Dict[Any, Any] = {}
        self.replay_mode = False
        self.applied_in_replay = False

    def __call__(self, *args: Any, force_apply: bool = False, **kwargs: Any) -> Any:
        if args:
            msg = "You have to pass data to augmentations as named arguments, for example: aug(image=image)"
            raise KeyError(msg)
        if self.replay_mode:
            if self.applied_in_replay:
                return self.apply_with_params(self.params, **kwargs)

            return kwargs

        if force_apply or self.always_apply or (random.random() < self.p):
            params = self.get_params()

            if self.targets_as_params:
                if not all(key in kwargs for key in self.targets_as_params):
                    msg = f"{self.__class__.__name__} requires {self.targets_as_params}"
                    raise ValueError(msg)

                targets_as_params = {k: kwargs[k] for k in self.targets_as_params}
                params_dependent_on_targets = self.get_params_dependent_on_targets(targets_as_params)
                params.update(params_dependent_on_targets)
            if self.deterministic:
                if self.targets_as_params:
                    warn(
                        self.get_class_fullname() + " could work incorrectly in ReplayMode for other input data"
                        " because its' params depend on targets."
                    )
                kwargs[self.save_key][id(self)] = deepcopy(params)
            return self.apply_with_params(params, **kwargs)

        return kwargs

    def apply_with_params(self, params: Dict[str, Any], *args: Any, **kwargs: Any) -> Dict[str, Any]:
        if params is None:
            return kwargs
        params = self.update_params(params, **kwargs)
        res = {}
        for key, arg in kwargs.items():
            if arg is not None:
                target_function = self._get_target_function(key)
                target_dependencies = {k: kwargs[k] for k in self.target_dependence.get(key, [])}
                res[key] = target_function(arg, **dict(params, **target_dependencies))
            else:
                res[key] = None
        return res

    def set_deterministic(self, flag: bool, save_key: str = "replay") -> "BasicTransform":
        if save_key == "params":
            msg = "params save_key is reserved"
            raise KeyError(msg)

        self.deterministic = flag
        self.save_key = save_key
        return self

    def __repr__(self) -> str:
        state = self.get_base_init_args()
        state.update(self.get_transform_init_args())
        return f"{self.__class__.__name__}({format_args(state)})"

    def _get_target_function(self, key: str) -> Callable[..., Any]:
        transform_key = key
        if key in self._additional_targets:
            transform_key = self._additional_targets.get(key, key)

        return self.targets.get(transform_key, lambda x, **p: x)

    def apply(self, img: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
        raise NotImplementedError

    def get_params(self) -> Dict[str, Any]:
        return {}

    @property
    def targets(self) -> Dict[str, Callable[..., Any]]:
        # You must specify targets in the subclass,
        # for example:
        # >>  ('image', 'mask')
        # >>  ('image', 'boxes')
        raise NotImplementedError

    def update_params(self, params: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
        if hasattr(self, "interpolation"):
            params["interpolation"] = self.interpolation
        if hasattr(self, "fill_value"):
            params["fill_value"] = self.fill_value
        if hasattr(self, "mask_fill_value"):
            params["mask_fill_value"] = self.mask_fill_value
        params.update({"cols": kwargs["image"].shape[1], "rows": kwargs["image"].shape[0]})
        return params

    @property
    def target_dependence(self) -> Dict[str, Any]:
        return {}

    def add_targets(self, additional_targets: Dict[str, str]) -> None:
        """Add targets to transform them the same way as one of existing targets
        ex: {'target_image': 'image'}
        ex: {'obj1_mask': 'mask', 'obj2_mask': 'mask'}
        by the way you must have at least one object with key 'image'

        Args:
            additional_targets (dict): keys - new target name, values - old target name. ex: {'image2': 'image'}

        """
        self._additional_targets = additional_targets

    @property
    def targets_as_params(self) -> List[str]:
        return []

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError(
            "Method get_params_dependent_on_targets is not implemented in class " + self.__class__.__name__
        )

    @classmethod
    def get_class_fullname(cls) -> str:
        return get_shortest_class_fullname(cls)

    @classmethod
    def is_serializable(cls) -> bool:
        return True

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        msg = f"Class {self.get_class_fullname()} is not serializable because the `get_transform_init_args_names` "
        "method is not implemented"
        raise NotImplementedError(msg)

    def get_base_init_args(self) -> Dict[str, Any]:
        return {"always_apply": self.always_apply, "p": self.p}

    def get_transform_init_args(self) -> Dict[str, Any]:
        return {k: getattr(self, k) for k in self.get_transform_init_args_names()}

    def to_dict_private(self) -> Dict[str, Any]:
        state = {"__class_fullname__": self.get_class_fullname()}
        state.update(self.get_base_init_args())
        state.update(self.get_transform_init_args())
        return state

    def get_dict_with_id(self) -> Dict[str, Any]:
        d = self.to_dict_private()
        d["id"] = id(self)
        return d
add_targets (self, additional_targets)

Add new targets to transform them the same way as one of the existing targets, e.g. {'target_image': 'image'} or {'obj1_mask': 'mask', 'obj2_mask': 'mask'}. Note that you must always pass at least one object with the key 'image'.

Parameters:

Name Type Description
additional_targets dict

keys - new target name, values - old target name. ex: {'image2': 'image'}

Source code in albumentations/core/transforms_interface.py
Python
def add_targets(self, additional_targets: Dict[str, str]) -> None:
    """Add targets to transform them the same way as one of existing targets
    ex: {'target_image': 'image'}
    ex: {'obj1_mask': 'mask', 'obj2_mask': 'mask'}
    by the way you must have at least one object with key 'image'

    Args:
        additional_targets (dict): keys - new target name, values - old target name. ex: {'image2': 'image'}

    """
    self._additional_targets = additional_targets
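In practice, add_targets is usually invoked indirectly via the additional_targets argument of Compose. A minimal sketch (target names chosen for illustration):

Python
import numpy as np
import albumentations as A

transform = A.Compose([A.HorizontalFlip(p=1.0)], additional_targets={"image2": "image"})
img = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)
out = transform(image=img, image2=img.copy())
assert (out["image"] == out["image2"]).all()  # both flipped identically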

class DualTransform [view source on GitHub]

A base class for transformations that should be applied both to an image and its corresponding properties such as masks, bounding boxes, and keypoints. This class ensures that when a transform is applied to an image, all associated entities are transformed accordingly to maintain consistency between the image and its annotations.

Properties

targets (Dict[str, Callable[..., Any]]): Defines the types of targets (e.g., image, mask, bboxes, keypoints) that the transform should be applied to and maps them to the corresponding methods.

Methods

apply_to_bbox(bbox: BoxInternalType, *args: Any, **params: Any) -> BoxInternalType: Applies the transform to a single bounding box. Should be implemented in the subclass.

apply_to_keypoint(keypoint: KeypointInternalType, *args: Any, **params: Any) -> KeypointInternalType: Applies the transform to a single keypoint. Should be implemented in the subclass.

apply_to_bboxes(bboxes: Sequence[BoxType], *args: Any, **params: Any) -> Sequence[BoxType]: Applies the transform to a list of bounding boxes. Delegates to apply_to_bbox for each bounding box.

apply_to_keypoints(keypoints: Sequence[KeypointType], *args: Any, **params: Any) -> Sequence[KeypointType]: Applies the transform to a list of keypoints. Delegates to apply_to_keypoint for each keypoint.

apply_to_mask(mask: np.ndarray, *args: Any, **params: Any) -> np.ndarray: Applies the transform specifically to a single mask.

apply_to_masks(masks: Sequence[np.ndarray], **params: Any) -> List[np.ndarray]: Applies the transform to a list of masks. Delegates to apply_to_mask for each mask.

Note

This class is intended to be subclassed and should not be used directly. Subclasses are expected to implement the specific logic for each type of target (e.g., image, mask, bboxes, keypoints) in the corresponding apply_to_* methods.

Source code in albumentations/core/transforms_interface.py
Python
class DualTransform(BasicTransform):
    """A base class for transformations that should be applied both to an image and its corresponding properties
    such as masks, bounding boxes, and keypoints. This class ensures that when a transform is applied to an image,
    all associated entities are transformed accordingly to maintain consistency between the image and its annotations.

    Properties:
        targets (Dict[str, Callable[..., Any]]): Defines the types of targets (e.g., image, mask, bboxes, keypoints)
            that the transform should be applied to and maps them to the corresponding methods.

    Methods:
        apply_to_bbox(bbox: BoxInternalType, *args: Any, **params: Any) -> BoxInternalType:
            Applies the transform to a single bounding box. Should be implemented in the subclass.

        apply_to_keypoint(keypoint: KeypointInternalType, *args: Any, **params: Any) -> KeypointInternalType:
            Applies the transform to a single keypoint. Should be implemented in the subclass.

        apply_to_bboxes(bboxes: Sequence[BoxType], *args: Any, **params: Any) -> Sequence[BoxType]:
            Applies the transform to a list of bounding boxes. Delegates to `apply_to_bbox` for each bounding box.

        apply_to_keypoints(keypoints: Sequence[KeypointType], *args: Any, **params: Any) -> Sequence[KeypointType]:
            Applies the transform to a list of keypoints. Delegates to `apply_to_keypoint` for each keypoint.

        apply_to_mask(mask: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
            Applies the transform specifically to a single mask.

        apply_to_masks(masks: Sequence[np.ndarray], **params: Any) -> List[np.ndarray]:
            Applies the transform to a list of masks. Delegates to `apply_to_mask` for each mask.

    Note:
        This class is intended to be subclassed and should not be used directly. Subclasses are expected to
        implement the specific logic for each type of target (e.g., image, mask, bboxes, keypoints) in the
        corresponding `apply_to_*` methods.

    """

    @property
    def targets(self) -> Dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "bboxes": self.apply_to_bboxes,
            "keypoints": self.apply_to_keypoints,
        }

    def apply_to_bbox(self, bbox: BoxInternalType, *args: Any, **params: Any) -> BoxInternalType:
        msg = f"Method apply_to_bbox is not implemented in class {self.__class__.__name__}"
        raise NotImplementedError(msg)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, *args: Any, **params: Any) -> KeypointInternalType:
        msg = f"Method apply_to_keypoint is not implemented in class {self.__class__.__name__}"
        raise NotImplementedError(msg)

    def apply_to_global_label(self, label: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
        msg = f"Method apply_to_global_label is not implemented in class {self.__class__.__name__}"
        raise NotImplementedError(msg)

    def apply_to_bboxes(self, bboxes: Sequence[BoxType], *args: Any, **params: Any) -> Sequence[BoxType]:
        return [
            self.apply_to_bbox(cast(BoxInternalType, tuple(cast(BoxInternalType, bbox[:4]))), **params)
            + tuple(bbox[4:])
            for bbox in bboxes
        ]

    def apply_to_keypoints(
        self, keypoints: Sequence[KeypointType], *args: Any, **params: Any
    ) -> Sequence[KeypointType]:
        return [
            self.apply_to_keypoint(cast(KeypointInternalType, tuple(keypoint[:4])), **params) + tuple(keypoint[4:])
            for keypoint in keypoints
        ]

    def apply_to_mask(self, mask: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
        return self.apply(mask, **{k: cv2.INTER_NEAREST if k == "interpolation" else v for k, v in params.items()})

    def apply_to_masks(self, masks: Sequence[np.ndarray], **params: Any) -> List[np.ndarray]:
        return [self.apply_to_mask(mask, **params) for mask in masks]

    def apply_to_global_labels(self, labels: Sequence[np.ndarray], **params: Any) -> List[np.ndarray]:
        return [self.apply_to_global_label(label, **params) for label in labels]

class ImageOnlyTransform [view source on GitHub]

Transform applied to image only.

Source code in albumentations/core/transforms_interface.py
Python
class ImageOnlyTransform(BasicTransform):
    """Transform applied to image only."""

    _targets = Targets.IMAGE

    @property
    def targets(self) -> Dict[str, Callable[..., Any]]:
        return {"image": self.apply}

class NoOp [view source on GitHub]

Does nothing

Targets

image, mask, bboxes, keypoints, global_label

Source code in albumentations/core/transforms_interface.py
Python
class NoOp(DualTransform):
    """Does nothing

    Targets:
        image, mask, bboxes, keypoints, global_label
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS, Targets.GLOBAL_LABEL)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return keypoint

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return bbox

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return img

    def apply_to_mask(self, mask: np.ndarray, **params: Any) -> np.ndarray:
        return mask

    def apply_to_global_label(self, label: np.ndarray, **params: Any) -> np.ndarray:
        return label

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ()
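
Despite doing nothing, NoOp is handy as an identity branch, for example inside OneOf so that a sample is sometimes passed through unchanged while the pipeline structure stays fixed. A usage sketch:

Python
import albumentations as A

# OneOf always fires; half of the time it flips,
# half of the time it leaves the sample unchanged.
transform = A.Compose([
    A.OneOf([A.HorizontalFlip(p=1.0), A.NoOp(p=1.0)], p=1.0),
])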

def to_tuple (param, low=None, bias=None) [view source on GitHub]

Convert input argument to a min-max tuple.

Parameters:

Name Type Description
param Union[float, Tuple[float, float], int, Tuple[int, int]]

Input value which could be a scalar or a sequence of exactly 2 scalars.

low Union[float, Tuple[float, float], int, Tuple[int, int]]

Second element of the tuple, provided as an optional argument for when param is a scalar.

bias Union[int, float]

An offset added to both elements of the tuple.

Returns:

Type Description
Union[Tuple[int, int], Tuple[float, float]]

A tuple of two scalars, optionally adjusted by bias. Raises ValueError for invalid combinations or types of arguments.

Source code in albumentations/core/transforms_interface.py
Python
def to_tuple(
    param: ScaleType,
    low: Optional[ScaleType] = None,
    bias: Optional[ScalarType] = None,
) -> Union[Tuple[int, int], Tuple[float, float]]:
    """Convert input argument to a min-max tuple.

    Args:
        param: Input value which could be a scalar or a sequence of exactly 2 scalars.
        low: Second element of the tuple, provided as an optional argument for when `param` is a scalar.
        bias: An offset added to both elements of the tuple.

    Returns:
        A tuple of two scalars, optionally adjusted by `bias`.
        Raises ValueError for invalid combinations or types of arguments.

    """
    # Validate mutually exclusive arguments
    if low is not None and bias is not None:
        msg = "Arguments 'low' and 'bias' cannot be used together."
        raise ValueError(msg)

    if isinstance(param, Sequence) and len(param) == PAIR:
        min_val, max_val = min(param), max(param)

    # Handle scalar input
    elif isinstance(param, (int, float)):
        if isinstance(low, (int, float)):
            # Use low and param to create a tuple
            min_val, max_val = (low, param) if low < param else (param, low)
        else:
            # Create a symmetric tuple around 0
            min_val, max_val = -param, param
    else:
        msg = "Argument 'param' must be either a scalar or a sequence of 2 elements."
        raise ValueError(msg)

    # Apply bias if provided
    if bias is not None:
        return (bias + min_val, bias + max_val)

    return min_val, max_val
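
The behavior follows directly from the code above; a few illustrative calls:

Python
from albumentations.core.transforms_interface import to_tuple

to_tuple(10)          # (-10, 10): a scalar becomes a symmetric range around 0
to_tuple((5, 1))      # (1, 5): a 2-element sequence is sorted into (min, max)
to_tuple(10, low=3)   # (3, 10): `low` supplies the other endpoint
to_tuple(10, bias=1)  # (-9, 11): `bias` shifts both endpoints
# to_tuple(10, low=3, bias=1) raises ValueError: `low` and `bias` are exclusive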

types

class Targets

An enumeration.

Source code in albumentations/core/types.py
Python
class Targets(Enum):
    IMAGE = "Image"
    MASK = "Mask"
    BBOXES = "BBoxes"
    KEYPOINTS = "Keypoints"
    GLOBAL_LABEL = "Global Label"

pytorch special

transforms

class ToTensorV2 (transpose_mask=False, always_apply=True, p=1.0) [view source on GitHub]

Converts images/masks to PyTorch Tensors, inheriting from BasicTransform. Supports images in numpy HWC format and converts them to PyTorch CHW format. If the image is in HW format, a channel dimension is added first, so the result is a [1, H, W] tensor.

Attributes:

Name Type Description
transpose_mask bool

If True, transposes 3D input mask dimensions from [height, width, num_channels] to [num_channels, height, width].

always_apply bool

Indicates if this transformation should be always applied. Default: True.

p float

Probability of applying the transform. Default: 1.0.

Source code in albumentations/pytorch/transforms.py
Python
class ToTensorV2(BasicTransform):
    """Converts images/masks to PyTorch Tensors, inheriting from BasicTransform. Supports images in numpy `HWC` format
    and converts them to PyTorch `CHW` format. If the image is in `HW` format, it will be converted to PyTorch `HW`.

    Attributes:
        transpose_mask (bool): If True, transposes 3D input mask dimensions from `[height, width, num_channels]` to
            `[num_channels, height, width]`.
        always_apply (bool): Indicates if this transformation should be always applied. Default: True.
        p (float): Probability of applying the transform. Default: 1.0.

    """

    def __init__(self, transpose_mask: bool = False, always_apply: bool = True, p: float = 1.0):
        super().__init__(always_apply=always_apply, p=p)
        self.transpose_mask = transpose_mask

    @property
    def targets(self) -> Dict[str, Any]:
        return {"image": self.apply, "mask": self.apply_to_mask, "masks": self.apply_to_masks}

    def apply(self, img: np.ndarray, **params: Any) -> torch.Tensor:
        if len(img.shape) not in [2, 3]:
            msg = "Albumentations only supports images in HW or HWC format"
            raise ValueError(msg)

        if len(img.shape) == TWO:
            img = np.expand_dims(img, 2)

        return torch.from_numpy(img.transpose(2, 0, 1))

    def apply_to_mask(self, mask: np.ndarray, **params: Any) -> torch.Tensor:
        if self.transpose_mask and mask.ndim == THREE:
            mask = mask.transpose(2, 0, 1)
        return torch.from_numpy(mask)

    def apply_to_masks(self, masks: List[np.ndarray], **params: Any) -> List[torch.Tensor]:
        return [self.apply_to_mask(mask, **params) for mask in masks]

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("transpose_mask",)

    def get_params_dependent_on_targets(self, params: Any) -> Dict[str, Any]:
        return {}
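
A typical usage sketch, placing ToTensorV2 at the end of a pipeline; the shapes in the comments follow from the code above:

Python
import numpy as np
import albumentations as A
from albumentations.pytorch import ToTensorV2

transform = A.Compose([A.Normalize(), ToTensorV2(transpose_mask=True)])

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256, 5), dtype=np.uint8)  # multi-channel mask

out = transform(image=image, mask=mask)
# out["image"].shape == torch.Size([3, 256, 256])  (HWC -> CHW)
# out["mask"].shape  == torch.Size([5, 256, 256])  (transpose_mask=True)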

random_utils

def shuffle (a, random_state=None) [view source on GitHub]

Shuffles an array in-place, using a specified random state or creating a new one if not provided.

Parameters:

Name Type Description
a np.ndarray

The array to be shuffled.

random_state Optional[np.random.RandomState]

The random state used for shuffling. Defaults to None.

Returns:

Type Description
np.ndarray

The shuffled array (note: the shuffle is in-place, so the original array is modified).

Source code in albumentations/random_utils.py
Python
def shuffle(
    a: np.ndarray,
    random_state: Optional[np.random.RandomState] = None,
) -> np.ndarray:
    """Shuffles an array in-place, using a specified random state or creating a new one if not provided.

    Args:
        a (np.ndarray): The array to be shuffled.
        random_state (Optional[np.random.RandomState], optional): The random state used for shuffling. Defaults to None.

    Returns:
        np.ndarray: The shuffled array (note: the shuffle is in-place, so the original array is modified).
    """
    if random_state is None:
        random_state = get_random_state()
    random_state.shuffle(a)
    return a
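
A short usage sketch; note that the return value is the same object as the input:

Python
import numpy as np
from albumentations import random_utils

a = np.arange(10)
shuffled = random_utils.shuffle(a, random_state=np.random.RandomState(42))
assert shuffled is a  # shuffled in-place; the original array is modified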