Full API Reference on a single page

Pixel-level transforms

Here is a list of all available pixel-level transforms. You can apply a pixel-level transform to any target, and under the hood, the transform will change only the input image and return any other input targets such as masks, bounding boxes, or keypoints unchanged.

AdvancedBlur
Blur
CLAHE
ChannelDropout
ChannelShuffle
ChromaticAberration
ColorJitter
Defocus
Downscale
Emboss
Equalize
FDA
FancyPCA
FromFloat
GaussNoise
GaussianBlur
GlassBlur
HistogramMatching
HueSaturationValue
ISONoise
ImageCompression
InvertImg
MedianBlur
MotionBlur
MultiplicativeNoise
Normalize
PixelDistributionAdaptation
PlanckianJitter
Posterize
RGBShift
RandomBrightnessContrast
RandomFog
RandomGamma
RandomGravel
RandomRain
RandomShadow
RandomSnow
RandomSunFlare
RandomToneCurve
RingingOvershoot
Sharpen
Solarize
Spatter
Superpixels
TemplateTransform
TextImage
ToFloat
ToGray
ToRGB
ToSepia
UnsharpMask
ZoomBlur
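
The pass-through behaviour described above can be illustrated with a toy sketch. This is not Albumentations' own code: `apply_pixel_level` and `invert` are hypothetical names, and images are plain nested lists to keep the example dependency-free.

```python
def apply_pixel_level(pixel_fn, **targets):
    """Apply pixel_fn to the image only; pass every other target through unchanged."""
    result = dict(targets)
    result["image"] = pixel_fn(targets["image"])
    return result


def invert(image):
    # Hypothetical pixel-level operation on a nested-list uint8 image.
    return [[255 - value for value in row] for row in image]


sample = {
    "image": [[0, 64], [128, 255]],
    "mask": [[0, 1], [1, 0]],
    "bboxes": [(0, 0, 1, 1)],
}
augmented = apply_pixel_level(invert, **sample)
```

Only `augmented["image"]` differs from the input; the mask and bounding boxes come back exactly as they were passed in.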

Spatial-level transforms

Here is a table with spatial-level transforms and targets they support. If you try to apply a spatial-level transform to an unsupported target, Albumentations will raise an error.

Transform	Image	Mask	BBoxes	Keypoints	Global Label
Affine	✓	✓	✓	✓
BBoxSafeRandomCrop	✓	✓	✓	✓
CenterCrop	✓	✓	✓	✓
CoarseDropout	✓	✓		✓
Crop	✓	✓	✓	✓
CropAndPad	✓	✓	✓	✓
CropNonEmptyMaskIfExists	✓	✓	✓	✓
D4	✓	✓	✓	✓
ElasticTransform	✓	✓	✓
Flip	✓	✓	✓	✓
GridDistortion	✓	✓	✓
GridDropout	✓	✓
HorizontalFlip	✓	✓	✓	✓
Lambda	✓	✓	✓	✓	✓
LongestMaxSize	✓	✓	✓	✓
MaskDropout	✓	✓
MixUp	✓	✓			✓
Morphological	✓	✓
NoOp	✓	✓	✓	✓	✓
OpticalDistortion	✓	✓	✓
OverlayElements	✓	✓
PadIfNeeded	✓	✓	✓	✓
Perspective	✓	✓	✓	✓
PiecewiseAffine	✓	✓	✓	✓
PixelDropout	✓	✓
RandomCrop	✓	✓	✓	✓
RandomCropFromBorders	✓	✓	✓	✓
RandomGridShuffle	✓	✓		✓
RandomResizedCrop	✓	✓	✓	✓
RandomRotate90	✓	✓	✓	✓
RandomScale	✓	✓	✓	✓
RandomSizedBBoxSafeCrop	✓	✓	✓	✓
RandomSizedCrop	✓	✓	✓	✓
Resize	✓	✓	✓	✓
Rotate	✓	✓	✓	✓
SafeRotate	✓	✓	✓	✓
ShiftScaleRotate	✓	✓	✓	✓
SmallestMaxSize	✓	✓	✓	✓
Transpose	✓	✓	✓	✓
VerticalFlip	✓	✓	✓	✓
XYMasking	✓	✓		✓
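
One way to read the table is as a lookup from transform name to supported targets. The sketch below copies three rows into a dictionary and rejects unsupported targets. It is illustrative only: `SUPPORTED_TARGETS` and `check_targets` are hypothetical names, and the exception type is an assumption, since the text above only says that an error is raised.

```python
# Hypothetical excerpt of the table above: transform name -> supported targets.
SUPPORTED_TARGETS = {
    "HorizontalFlip": {"image", "mask", "bboxes", "keypoints"},
    "CoarseDropout": {"image", "mask", "keypoints"},
    "GridDropout": {"image", "mask"},
}


def check_targets(transform_name, requested_targets):
    """Raise if any requested target is not supported by the transform."""
    unsupported = set(requested_targets) - SUPPORTED_TARGETS[transform_name]
    if unsupported:
        # The exception type is an assumption; the documentation only says "an error".
        raise ValueError(f"{transform_name} does not support: {sorted(unsupported)}")


check_targets("HorizontalFlip", ["image", "bboxes"])  # supported, so nothing is raised
```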

`augmentations` `special`

`blur` `special`

`transforms`

`class AdvancedBlur` `(blur_limit=(3, 7), sigma_x_limit=(0.2, 1.0), sigma_y_limit=(0.2, 1.0), sigmaX_limit=None, sigmaY_limit=None, rotate_limit=90, beta_limit=(0.5, 8.0), noise_limit=(0.9, 1.1), always_apply=None, p=0.5)`

Blurs the input image using a Generalized Normal filter with randomly selected parameters.

This transform also adds multiplicative noise to the generated kernel before convolution, affecting the image in a unique way that combines blurring and noise injection for enhanced data augmentation.
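
To make the idea concrete, here is a simplified sketch of such a kernel. It assumes an isotropic kernel (equal standard deviation in X and Y, no rotation) and uses plain Python lists rather than NumPy, so it is a reduced illustration of the technique rather than the library's implementation. The function name and the `seed` argument are hypothetical.

```python
import math
import random


def noisy_generalized_normal_kernel(ksize, sigma, beta, noise_limit=(0.9, 1.1), seed=0):
    """Build a normalized ksize x ksize kernel (ksize is expected to be odd)."""
    rng = random.Random(seed)
    half = ksize // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Squared distance scaled by the variance, raised to the shape parameter beta.
            scaled = (x * x + y * y) / sigma**2
            value = math.exp(-0.5 * scaled**beta)
            # Multiplicative noise on each kernel entry, applied before normalization.
            row.append(value * rng.uniform(*noise_limit))
        kernel.append(row)
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


kernel = noisy_generalized_normal_kernel(ksize=5, sigma=1.0, beta=1.0)
```

With `beta=1.0` the unperturbed kernel is an ordinary Gaussian. Because the noise is applied to the kernel rather than to the image, the result is still a single convolution, just with slightly irregular weights.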

Parameters:

Name	Type	Description
`blur_limit`	`ScaleIntType`	Maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`. If a single value is provided, `blur_limit` will be in the range (0, blur_limit). Defaults to (3, 7).
`sigma_x_limit`	`ScaleFloatType`	Gaussian kernel standard deviation for the X dimension. Must be in range [0, inf). If a single value is provided, `sigma_x_limit` will be in the range (0, sigma_x_limit). If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Defaults to (0.2, 1.0).
`sigma_y_limit`	`ScaleFloatType`	Gaussian kernel standard deviation for the Y dimension. Must follow the same rules as `sigma_x_limit`. Defaults to (0.2, 1.0).
`rotate_limit`	`ScaleIntType`	Range from which a random angle used to rotate the Gaussian kernel is picked. If limit is a single int, an angle is picked from (-rotate_limit, rotate_limit). Defaults to (-90, 90).
`beta_limit`	`ScaleFloatType`	Distribution shape parameter. 1 represents the normal distribution. Values below 1.0 make distribution tails heavier than normal, and values above 1.0 make it lighter than normal. Defaults to (0.5, 8.0).
`noise_limit`	`ScaleFloatType`	Multiplicative factor that controls the strength of kernel noise. Must be positive and preferably centered around 1.0. If a single value is provided, `noise_limit` will be in the range (0, noise_limit). Defaults to (0.9, 1.1), matching the constructor signature.
`p`	`float`	Probability of applying the transform. Defaults to 0.5.
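
The class source on this page requires `beta_limit` to straddle 1.0 and then draws `beta` from one of two sub-ranges with equal probability. The sketch below reproduces that logic with the standard library. `sample_beta` is a hypothetical helper, and the 0.5 threshold assumes that the `HALF` constant in the source equals 0.5.

```python
import random


def sample_beta(beta_limit=(0.5, 8.0), rng=random):
    low, high = beta_limit
    # Mirrors the validator: the range has to straddle 1.0.
    if not (low < 1.0 < high):
        raise ValueError("beta_limit is expected to include 1.0.")
    # Half of the draws come from [low, 1], the other half from [1, high].
    if rng.random() < 0.5:
        return rng.uniform(low, 1.0)
    return rng.uniform(1.0, high)


rng = random.Random(42)
draws = [sample_beta(rng=rng) for _ in range(1000)]
share_below_one = sum(beta < 1.0 for beta in draws) / len(draws)
```

Sampling uniformly over the whole default range (0.5, 8.0) would put most draws above 1.0. Splitting the range keeps roughly half of them below it, which is the purpose stated in the source comment.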

Reference

"Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data", available at https://arxiv.org/abs/2107.10833

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/blur/transforms.py

Python

class AdvancedBlur(ImageOnlyTransform):
    """Blurs the input image using a Generalized Normal filter with randomly selected parameters.

    This transform also adds multiplicative noise to the generated kernel before convolution,
    affecting the image in a unique way that combines blurring and noise injection for enhanced
    data augmentation.

    Args:
        blur_limit (ScaleIntType, optional): Maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If a single value is provided, `blur_limit` will be in the range (0, blur_limit).
            Defaults to (3, 7).
        sigma_x_limit ScaleFloatType: Gaussian kernel standard deviation for the X dimension.
            Must be in range [0, inf). If a single value is provided, `sigma_x_limit` will be in the range
            (0, sigma_limit). If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`.
            Defaults to (0.2, 1.0).
        sigma_y_limit ScaleFloatType: Gaussian kernel standard deviation for the Y dimension.
            Must follow the same rules as `sigma_x_limit`.
            Defaults to (0.2, 1.0).
        rotate_limit (ScaleIntType, optional): Range from which a random angle used to rotate the Gaussian kernel
            is picked. If limit is a single int, an angle is picked from (-rotate_limit, rotate_limit).
            Defaults to (-90, 90).
        beta_limit (ScaleFloatType, optional): Distribution shape parameter. 1 represents the normal distribution.
            Values below 1.0 make distribution tails heavier than normal, and values above 1.0 make it
            lighter than normal.
            Defaults to (0.5, 8.0).
        noise_limit (ScaleFloatType, optional): Multiplicative factor that controls the strength of kernel noise.
            Must be positive and preferably centered around 1.0. If a single value is provided,
            `noise_limit` will be in the range (0, noise_limit).
            Defaults to (0.75, 1.25).
        p (float, optional): Probability of applying the transform.
            Defaults to 0.5.

    Reference:
        "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data",
        available at https://arxiv.org/abs/2107.10833

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BlurInitSchema):
        sigma_x_limit: NonNegativeFloatRangeType = (0.2, 1.0)
        sigma_y_limit: NonNegativeFloatRangeType = (0.2, 1.0)
        beta_limit: NonNegativeFloatRangeType = (0.5, 8.0)
        noise_limit: NonNegativeFloatRangeType = (0.75, 1.25)
        rotate_limit: SymmetricRangeType = (-90, 90)

        @field_validator("beta_limit")
        @classmethod
        def check_beta_limit(cls, value: ScaleFloatType) -> tuple[float, float]:
            result = to_tuple(value, low=0)
            if not (result[0] < 1.0 < result[1]):
                msg = "beta_limit is expected to include 1.0."
                raise ValueError(msg)
            return result

        @model_validator(mode="after")
        def validate_limits(self) -> Self:
            if (
                isinstance(self.sigma_x_limit, (tuple, list))
                and self.sigma_x_limit[0] == 0
                and isinstance(self.sigma_y_limit, (tuple, list))
                and self.sigma_y_limit[0] == 0
            ):
                msg = "sigma_x_limit and sigma_y_limit minimum value cannot be both equal to 0."
                raise ValueError(msg)
            return self

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_x_limit: ScaleFloatType = (0.2, 1.0),
        sigma_y_limit: ScaleFloatType = (0.2, 1.0),
        sigmaX_limit: ScaleFloatType | None = None,  # noqa: N803
        sigmaY_limit: ScaleFloatType | None = None,  # noqa: N803
        rotate_limit: ScaleIntType = 90,
        beta_limit: ScaleFloatType = (0.5, 8.0),
        noise_limit: ScaleFloatType = (0.9, 1.1),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)

        if sigmaX_limit is not None:
            warnings.warn("sigmaX_limit is deprecated; use sigma_x_limit instead.", DeprecationWarning, stacklevel=2)
            sigma_x_limit = sigmaX_limit

        if sigmaY_limit is not None:
            warnings.warn("sigmaY_limit is deprecated; use sigma_y_limit instead.", DeprecationWarning, stacklevel=2)
            sigma_y_limit = sigmaY_limit

        self.blur_limit = cast(Tuple[int, int], blur_limit)
        self.sigma_x_limit = cast(Tuple[float, float], sigma_x_limit)
        self.sigma_y_limit = cast(Tuple[float, float], sigma_y_limit)
        self.rotate_limit = cast(Tuple[int, int], rotate_limit)
        self.beta_limit = cast(Tuple[float, float], beta_limit)
        self.noise_limit = cast(Tuple[float, float], noise_limit)

    def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.convolve(img, kernel=kernel)

    def get_params(self) -> dict[str, np.ndarray]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
        sigma_x = random.uniform(*self.sigma_x_limit)
        sigma_y = random.uniform(*self.sigma_y_limit)
        angle = np.deg2rad(random.uniform(*self.rotate_limit))

        # Split into 2 cases to avoid selection of narrow kernels (beta > 1) too often.
        beta = (
            random.uniform(self.beta_limit[0], 1) if random.random() < HALF else random.uniform(1, self.beta_limit[1])
        )

        noise_matrix = random_utils.uniform(self.noise_limit[0], self.noise_limit[1], size=[ksize, ksize])

        # Generate mesh grid centered at zero.
        ax = np.arange(-ksize // 2 + 1.0, ksize // 2 + 1.0)
        # > Shape (ksize, ksize, 2)
        grid = np.stack(np.meshgrid(ax, ax), axis=-1)

        # Calculate rotated sigma matrix
        d_matrix = np.array([[sigma_x**2, 0], [0, sigma_y**2]])
        u_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
        sigma_matrix = np.dot(u_matrix, np.dot(d_matrix, u_matrix.T))

        inverse_sigma = np.linalg.inv(sigma_matrix)
        # Described in "Parameter Estimation For Multivariate Generalized Gaussian Distributions"
        kernel = np.exp(-0.5 * np.power(np.sum(np.dot(grid, inverse_sigma) * grid, 2), beta))
        # Add noise
        kernel *= noise_matrix

        # Normalize kernel
        kernel = kernel.astype(np.float32) / np.sum(kernel)
        return {"kernel": kernel}

    def get_transform_init_args_names(self) -> tuple[str, str, str, str, str, str]:
        return (
            "blur_limit",
            "sigma_x_limit",
            "sigma_y_limit",
            "rotate_limit",
            "beta_limit",
            "noise_limit",
        )

`apply (self, img, kernel, **params)`

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py

Python

def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.convolve(img, kernel=kernel)

`get_params (self)`

Returns parameters independent of input.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_params(self) -> dict[str, np.ndarray]:
    ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
    sigma_x = random.uniform(*self.sigma_x_limit)
    sigma_y = random.uniform(*self.sigma_y_limit)
    angle = np.deg2rad(random.uniform(*self.rotate_limit))

    # Split into 2 cases to avoid selection of narrow kernels (beta > 1) too often.
    beta = (
        random.uniform(self.beta_limit[0], 1) if random.random() < HALF else random.uniform(1, self.beta_limit[1])
    )

    noise_matrix = random_utils.uniform(self.noise_limit[0], self.noise_limit[1], size=[ksize, ksize])

    # Generate mesh grid centered at zero.
    ax = np.arange(-ksize // 2 + 1.0, ksize // 2 + 1.0)
    # > Shape (ksize, ksize, 2)
    grid = np.stack(np.meshgrid(ax, ax), axis=-1)

    # Calculate rotated sigma matrix
    d_matrix = np.array([[sigma_x**2, 0], [0, sigma_y**2]])
    u_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
    sigma_matrix = np.dot(u_matrix, np.dot(d_matrix, u_matrix.T))

    inverse_sigma = np.linalg.inv(sigma_matrix)
    # Described in "Parameter Estimation For Multivariate Generalized Gaussian Distributions"
    kernel = np.exp(-0.5 * np.power(np.sum(np.dot(grid, inverse_sigma) * grid, 2), beta))
    # Add noise
    kernel *= noise_matrix

    # Normalize kernel
    kernel = kernel.astype(np.float32) / np.sum(kernel)
    return {"kernel": kernel}

`get_transform_init_args_names (self)`

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str, str, str, str, str]:
    return (
        "blur_limit",
        "sigma_x_limit",
        "sigma_y_limit",
        "rotate_limit",
        "beta_limit",
        "noise_limit",
    )

`class Blur` `(blur_limit=7, p=0.5, always_apply=None)`

Blur the input image using a random-sized kernel.

Parameters:

Name	Type	Description
`blur_limit`	`ScaleIntType`	maximum kernel size for blurring the input image. Should be in range [3, inf). Default: (3, 7).
`p`	`float`	probability of applying the transform. Default: 0.5.
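
According to the `get_params` source on this page, the kernel size is chosen from the values reached by stepping from the lower bound to the upper bound in steps of 2. A minimal sketch of that selection follows. `candidate_kernel_sizes` is a hypothetical helper, and the standard library's `random.choice` stands in for the library's own random utilities.

```python
import random


def candidate_kernel_sizes(blur_limit=(3, 7)):
    """Kernel sizes reachable by stepping from the lower bound in steps of 2."""
    low, high = blur_limit
    return list(range(low, high + 1, 2))


sizes = candidate_kernel_sizes((3, 7))
kernel_size = random.choice(sizes)
```

For the default range the candidates are 3, 5, and 7. An even upper bound such as 8 is never selected when the lower bound is odd.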

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/blur/transforms.py

Python

class Blur(ImageOnlyTransform):
    """Blur the input image using a random-sized kernel.

    Args:
        blur_limit: maximum kernel size for blurring the input image.
            Should be in range [3, inf). Default: (3, 7).
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BlurInitSchema):
        pass

    def __init__(self, blur_limit: ScaleIntType = 7, p: float = 0.5, always_apply: bool | None = None):
        super().__init__(p, always_apply)
        self.blur_limit = cast(Tuple[int, int], blur_limit)

    def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
        return fblur.blur(img, kernel)

    def get_params(self) -> dict[str, Any]:
        return {"kernel": random_utils.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("blur_limit",)

`apply (self, img, kernel, **params)`

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py

Python

def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
    return fblur.blur(img, kernel)

`get_params (self)`

Returns parameters independent of input.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_params(self) -> dict[str, Any]:
    return {"kernel": random_utils.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))}

`get_transform_init_args_names (self)`

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("blur_limit",)

`class Defocus` `(radius=(3, 10), alias_blur=(0.1, 0.5), always_apply=None, p=0.5)`

Apply defocus transform.

Parameters:

Name	Type	Description
`radius`	`(int, int) or int`	range for radius of defocusing. If limit is a single int, the range will be [1, limit]. Default: (3, 10).
`alias_blur`	`(float, float) or float`	range for alias_blur of defocusing (sigma of Gaussian blur). If limit is a single float, the range will be (0, limit). Default: (0.1, 0.5).
`p`	`float`	probability of applying the transform. Default: 0.5.
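
The scalar-to-range rules in the table, combined with the sampling in `get_params`, can be sketched as follows. `as_range` and `sample_defocus_params` are hypothetical helpers written for this illustration. The library performs the scalar conversion through its own validation types, which are not shown on this page.

```python
import random


def as_range(value, low):
    """Expand a scalar into (low, value); pass a 2-tuple through unchanged."""
    if isinstance(value, (int, float)):
        return (low, value)
    return tuple(value)


def sample_defocus_params(radius=(3, 10), alias_blur=(0.1, 0.5), rng=random):
    radius_range = as_range(radius, low=1)
    alias_range = as_range(alias_blur, low=0.0)
    return {
        "radius": rng.randint(*radius_range),  # randint is inclusive on both ends
        "alias_blur": rng.uniform(*alias_range),
    }


params = sample_defocus_params(radius=5, alias_blur=0.3, rng=random.Random(0))
```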

Targets

image

Image types: uint8, float32

Reference

https://arxiv.org/abs/1903.12261

Source code in albumentations/augmentations/blur/transforms.py

Python

class Defocus(ImageOnlyTransform):
    """Apply defocus transform.

    Args:
        radius ((int, int) or int): range for radius of defocusing.
            If limit is a single int, the range will be [1, limit]. Default: (3, 10).
        alias_blur ((float, float) or float): range for alias_blur of defocusing (sigma of gaussian blur).
            If limit is a single float, the range will be (0, limit). Default: (0.1, 0.5).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://arxiv.org/abs/1903.12261
    """

    class InitSchema(BaseTransformInitSchema):
        radius: OnePlusIntRangeType = (3, 10)
        alias_blur: NonNegativeFloatRangeType = (0.1, 0.5)

    def __init__(
        self,
        radius: ScaleIntType = (3, 10),
        alias_blur: ScaleFloatType = (0.1, 0.5),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.radius = cast(Tuple[int, int], radius)
        self.alias_blur = cast(Tuple[float, float], alias_blur)

    def apply(self, img: np.ndarray, radius: int, alias_blur: float, **params: Any) -> np.ndarray:
        return fblur.defocus(img, radius, alias_blur)

    def get_params(self) -> dict[str, Any]:
        return {
            "radius": random.randint(self.radius[0], self.radius[1]),
            "alias_blur": random.uniform(self.alias_blur[0], self.alias_blur[1]),
        }

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("radius", "alias_blur")

`apply (self, img, radius, alias_blur, **params)`

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py

Python

def apply(self, img: np.ndarray, radius: int, alias_blur: float, **params: Any) -> np.ndarray:
    return fblur.defocus(img, radius, alias_blur)

`get_params (self)`

Returns parameters independent of input.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_params(self) -> dict[str, Any]:
    return {
        "radius": random.randint(self.radius[0], self.radius[1]),
        "alias_blur": random.uniform(self.alias_blur[0], self.alias_blur[1]),
    }

`get_transform_init_args_names (self)`

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("radius", "alias_blur")

`class GaussianBlur` `(blur_limit=(3, 7), sigma_limit=0, always_apply=None, p=0.5)`

Blur the input image using a Gaussian filter with a random kernel size.

Parameters:

Name	Type	Description
`blur_limit`	`int or (int, int)`	Maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`. If a single value is provided, `blur_limit` will be in the range (0, blur_limit). Default: (3, 7).
`sigma_limit`	`float or (float, float)`	Gaussian kernel standard deviation. Must be in range [0, inf). If a single value is provided, `sigma_limit` will be in the range (0, sigma_limit). If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
`p`	`float`	probability of applying the transform. Default: 0.5.
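
When `sigma_limit` is 0, the standard deviation is derived from the kernel size as `0.3*((ksize-1)*0.5 - 1) + 0.8`. The short sketch below evaluates that formula for the default kernel sizes. `sigma_from_ksize` is a hypothetical helper name, not a library function.

```python
def sigma_from_ksize(ksize):
    """Standard deviation implied by a kernel size when sigma is left at 0."""
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8


for ksize in (3, 5, 7):
    print(ksize, round(sigma_from_ksize(ksize), 2))
```

For kernel sizes 3, 5, and 7 this gives roughly 0.8, 1.1, and 1.4, so larger kernels are paired with proportionally wider Gaussians.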

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/blur/transforms.py

Python

class GaussianBlur(ImageOnlyTransform):
    """Blur the input image using a Gaussian filter with a random kernel size.

    Args:
        blur_limit (int, (int, int)): maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If set single value `blur_limit` will be in range (0, blur_limit).
            Default: (3, 7).
        sigma_limit (float, (float, float)): Gaussian kernel standard deviation. Must be in range [0, inf).
            If set single value `sigma_limit` will be in range (0, sigma_limit).
            If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BlurInitSchema):
        sigma_limit: NonNegativeFloatRangeType = 0

        @field_validator("blur_limit")
        @classmethod
        def process_blur(cls, value: ScaleIntType, info: ValidationInfo) -> tuple[int, int]:
            return process_blur_limit(value, info, min_value=0)

        @model_validator(mode="after")
        def validate_limits(self) -> Self:
            if (
                isinstance(self.blur_limit, (tuple, list))
                and self.blur_limit[0] == 0
                and isinstance(self.sigma_limit, (tuple, list))
                and self.sigma_limit[0] == 0
            ):
                self.blur_limit = 3, max(3, self.blur_limit[1])
                warnings.warn(
                    "blur_limit and sigma_limit minimum value can not be both equal to 0. "
                    "blur_limit minimum value changed to 3.",
                    stacklevel=2,
                )

            if isinstance(self.blur_limit, tuple):
                for v in self.blur_limit:
                    if v != 0 and v % 2 != 1:
                        raise ValueError(f"Blur limit must be 0 or odd. Got: {self.blur_limit}")

            return self

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_limit: ScaleFloatType = 0,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.blur_limit = cast(Tuple[int, int], blur_limit)
        self.sigma_limit = cast(Tuple[float, float], sigma_limit)

    def apply(self, img: np.ndarray, ksize: int, sigma: float, **params: Any) -> np.ndarray:
        return fblur.gaussian_blur(img, ksize, sigma=sigma)

    def get_params(self) -> dict[str, float]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1)
        if ksize != 0 and ksize % 2 != 1:
            ksize = (ksize + 1) % (self.blur_limit[1] + 1)

        return {"ksize": ksize, "sigma": random.uniform(*self.sigma_limit)}

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("blur_limit", "sigma_limit")

`apply (self, img, ksize, sigma, **params)`

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py

Python

def apply(self, img: np.ndarray, ksize: int, sigma: float, **params: Any) -> np.ndarray:
    return fblur.gaussian_blur(img, ksize, sigma=sigma)

`get_params (self)`

Returns parameters independent of input.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_params(self) -> dict[str, float]:
    ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1)
    if ksize != 0 and ksize % 2 != 1:
        ksize = (ksize + 1) % (self.blur_limit[1] + 1)

    return {"ksize": ksize, "sigma": random.uniform(*self.sigma_limit)}
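
The kernel-size adjustment in `get_params` can be isolated into a small function to see what it does to each possible draw. `to_odd_or_zero` is a hypothetical name used only for this sketch. The arithmetic is copied from the listing above.

```python
def to_odd_or_zero(ksize, upper):
    """Bump an even, non-zero ksize to the next odd value, wrapping modulo upper + 1."""
    if ksize != 0 and ksize % 2 != 1:
        ksize = (ksize + 1) % (upper + 1)
    return ksize


# Every draw from 0..7 with an upper bound of 7.
adjusted = {k: to_odd_or_zero(k, upper=7) for k in range(0, 8)}
```

With an odd upper bound, even draws move up to the next odd size (2 to 3, 4 to 5, 6 to 7) while 0 and odd values are left alone. The modulo only wraps a value to 0 when the upper bound itself is even, which the schema's validator rejects for tuple inputs.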

`get_transform_init_args_names (self)`

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("blur_limit", "sigma_limit")

`class GlassBlur` `(sigma=0.7, max_delta=4, iterations=2, mode='fast', always_apply=None, p=0.5)`

Apply glass noise to the input image.

Parameters:

Name	Type	Description
`sigma`	`float`	standard deviation for Gaussian kernel.
`max_delta`	`int`	max distance between pixels which are swapped.
`iterations`	`int`	number of repeats. Should be in range [1, inf). Default: 2.
`mode`	`str`	mode of computation: fast or exact. Default: "fast".
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Reference

https://arxiv.org/abs/1903.12261

https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

Source code in albumentations/augmentations/blur/transforms.py

Python

class GlassBlur(ImageOnlyTransform):
    """Apply glass noise to the input image.

    Args:
        sigma (float): standard deviation for Gaussian kernel.
        max_delta (int): max distance between pixels which are swapped.
        iterations (int): number of repeats.
            Should be in range [1, inf). Default: (2).
        mode (str): mode of computation: fast or exact. Default: "fast".
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://arxiv.org/abs/1903.12261
        https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

    """

    class InitSchema(BaseTransformInitSchema):
        sigma: float = Field(default=0.7, ge=0, description="Standard deviation for the Gaussian kernel.")
        max_delta: int = Field(default=4, ge=1, description="Maximum distance between pixels that are swapped.")
        iterations: int = Field(default=2, ge=1, description="Number of times the glass noise effect is applied.")
        mode: Literal["fast", "exact"] = "fast"

    def __init__(
        self,
        sigma: float = 0.7,
        max_delta: int = 4,
        iterations: int = 2,
        mode: Literal["fast", "exact"] = "fast",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.sigma = sigma
        self.max_delta = max_delta
        self.iterations = iterations
        self.mode = mode

    def apply(self, img: np.ndarray, *args: Any, dxy: np.ndarray, **params: Any) -> np.ndarray:
        if dxy is None:
            msg = "dxy is None"
            raise ValueError(msg)

        return fblur.glass_blur(img, self.sigma, self.max_delta, self.iterations, dxy, self.mode)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
        height, width = params["shape"][:2]

        # generate array containing all necessary values for transformations
        width_pixels = height - self.max_delta * 2
        height_pixels = width - self.max_delta * 2
        total_pixels = int(width_pixels * height_pixels)
        dxy = random_utils.randint(-self.max_delta, self.max_delta, size=(total_pixels, self.iterations, 2))

        return {"dxy": dxy}

    def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
        return ("sigma", "max_delta", "iterations", "mode")

`apply (self, img, *args, dxy, **params)`

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py

Python

def apply(self, img: np.ndarray, *args: Any, dxy: np.ndarray, **params: Any) -> np.ndarray:
    if dxy is None:
        msg = "dxy is None"
        raise ValueError(msg)

    return fblur.glass_blur(img, self.sigma, self.max_delta, self.iterations, dxy, self.mode)

`get_params_dependent_on_data (self, params, data)`

Returns parameters dependent on input.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
    height, width = params["shape"][:2]

    # generate array containing all necessary values for transformations
    width_pixels = height - self.max_delta * 2
    height_pixels = width - self.max_delta * 2
    total_pixels = int(width_pixels * height_pixels)
    dxy = random_utils.randint(-self.max_delta, self.max_delta, size=(total_pixels, self.iterations, 2))

    return {"dxy": dxy}
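
The size of the `dxy` array follows directly from the image shape, `max_delta`, and `iterations`. The sketch below computes only the shape, without generating random offsets. `dxy_shape` is a hypothetical helper. Note that in the listing above the names `width_pixels` and `height_pixels` hold values derived from the height and width respectively, which does not affect their product.

```python
def dxy_shape(image_shape, max_delta=4, iterations=2):
    """Shape of the offset array generated for an image of the given shape."""
    height, width = image_shape[:2]
    # A margin of max_delta pixels on every side is subtracted, as in the source.
    inner_pixels = (height - 2 * max_delta) * (width - 2 * max_delta)
    # One (dy, dx)-style pair of offsets per inner pixel and per iteration.
    return (inner_pixels, iterations, 2)


shape = dxy_shape((32, 48, 3))
```

For a 32 x 48 image with the default settings this is `(960, 2, 2)`, since (32 - 8) * (48 - 8) = 960.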

`get_transform_init_args_names (self)`

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
    return ("sigma", "max_delta", "iterations", "mode")

`class MedianBlur` `(blur_limit=7, p=0.5, always_apply=None)`

Blur the input image using a median filter with a random aperture linear size.

Parameters:

Name	Type	Description
`blur_limit`	`int`	maximum aperture linear size for blurring the input image. Must be odd and in range [3, inf). Default: (3, 7).
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/blur/transforms.py

Python

class MedianBlur(Blur):
    """Blur the input image using a median filter with a random aperture linear size.

    Args:
        blur_limit (int): maximum aperture linear size for blurring the input image.
            Must be odd and in range [3, inf). Default: (3, 7).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(self, blur_limit: ScaleIntType = 7, p: float = 0.5, always_apply: bool | None = None):
        super().__init__(blur_limit, p, always_apply)

    def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
        return fblur.median_blur(img, kernel)

`__init__ (self, blur_limit=7, p=0.5, always_apply=None)` `special`

Initialize self. See help(type(self)) for accurate signature.

Source code in albumentations/augmentations/blur/transforms.py

Python

def __init__(self, blur_limit: ScaleIntType = 7, p: float = 0.5, always_apply: bool | None = None):
    super().__init__(blur_limit, p, always_apply)

`apply (self, img, kernel, **params)`

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py

Python

def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
    return fblur.median_blur(img, kernel)

`class MotionBlur` `(blur_limit=7, allow_shifted=True, always_apply=None, p=0.5)`

Apply motion blur to the input image using a random-sized kernel.

Parameters:

Name	Type	Description
`blur_limit`	`int`	maximum kernel size for blurring the input image. Should be in range [3, inf). Default: (3, 7).
`allow_shifted`	`bool`	if set to True, kernels may be randomly shifted off-center; if set to False, only centered (non-shifted) kernels are created. Default: True.
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/blur/transforms.py

Python

class MotionBlur(Blur):
    """Apply motion blur to the input image using a random-sized kernel.

    Args:
        blur_limit (int): maximum kernel size for blurring the input image.
            Should be in range [3, inf). Default: (3, 7).
        allow_shifted (bool): if set to True, kernels may be randomly shifted off-center;
            if set to False, only centered (non-shifted) kernels are created. Default: True.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BaseTransformInitSchema):
        allow_shifted: bool = Field(
            default=True,
            description="If set to True, kernels may be randomly shifted; if False, only centered (non-shifted) kernels are created.",
        )
        blur_limit: ScaleIntType = Field(
            default=(3, 7),
            description="Maximum kernel size for blurring the input image.",
        )

        @model_validator(mode="after")
        def process_blur(self) -> Self:
            self.blur_limit = cast(Tuple[int, int], to_tuple(self.blur_limit, 3))

            if self.allow_shifted and isinstance(self.blur_limit, tuple) and any(x % 2 != 1 for x in self.blur_limit):
                raise ValueError(f"Blur limit must be odd when centered=True. Got: {self.blur_limit}")

            return self

    def __init__(
        self,
        blur_limit: ScaleIntType = 7,
        allow_shifted: bool = True,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(blur_limit=blur_limit, p=p, always_apply=always_apply)
        self.allow_shifted = allow_shifted
        self.blur_limit = cast(Tuple[int, int], blur_limit)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (*super().get_transform_init_args_names(), "allow_shifted")

    def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.convolve(img, kernel=kernel)

    def get_params(self) -> dict[str, Any]:
        ksize = random.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))
        if ksize <= TWO:
            raise ValueError(f"ksize must be > 2. Got: {ksize}")
        kernel = np.zeros((ksize, ksize), dtype=np.uint8)
        x1, x2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)
        if x1 == x2:
            y1, y2 = random.sample(range(ksize), 2)
        else:
            y1, y2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)

        def make_odd_val(v1: int, v2: int) -> tuple[int, int]:
            len_v = abs(v1 - v2) + 1
            if len_v % 2 != 1:
                if v2 > v1:
                    v2 -= 1
                else:
                    v1 -= 1
            return v1, v2

        if not self.allow_shifted:
            x1, x2 = make_odd_val(x1, x2)
            y1, y2 = make_odd_val(y1, y2)

            xc = (x1 + x2) / 2
            yc = (y1 + y2) / 2

            center = ksize / 2 - 0.5
            dx = xc - center
            dy = yc - center
            x1, x2 = (int(i - dx) for i in [x1, x2])
            y1, y2 = (int(i - dy) for i in [y1, y2])

        cv2.line(kernel, (x1, y1), (x2, y2), 1, thickness=1)

        # Normalize kernel
        return {"kernel": kernel.astype(np.float32) / np.sum(kernel)}

`apply (self, img, kernel, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py

Python

def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.convolve(img, kernel=kernel)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_params(self) -> dict[str, Any]:
    ksize = random.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))
    if ksize <= TWO:
        raise ValueError(f"ksize must be > 2. Got: {ksize}")
    kernel = np.zeros((ksize, ksize), dtype=np.uint8)
    x1, x2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)
    if x1 == x2:
        y1, y2 = random.sample(range(ksize), 2)
    else:
        y1, y2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)

    def make_odd_val(v1: int, v2: int) -> tuple[int, int]:
        len_v = abs(v1 - v2) + 1
        if len_v % 2 != 1:
            if v2 > v1:
                v2 -= 1
            else:
                v1 -= 1
        return v1, v2

    if not self.allow_shifted:
        x1, x2 = make_odd_val(x1, x2)
        y1, y2 = make_odd_val(y1, y2)

        xc = (x1 + x2) / 2
        yc = (y1 + y2) / 2

        center = ksize / 2 - 0.5
        dx = xc - center
        dy = yc - center
        x1, x2 = (int(i - dx) for i in [x1, x2])
        y1, y2 = (int(i - dy) for i in [y1, y2])

    cv2.line(kernel, (x1, y1), (x2, y2), 1, thickness=1)

    # Normalize kernel
    return {"kernel": kernel.astype(np.float32) / np.sum(kernel)}
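
The centering branch taken when `allow_shifted` is `False` can be traced in isolation. The sketch below repeats the `make_odd_val` arithmetic and the re-centering step for a single axis. `center_segment` is a hypothetical name used only here, and the example assumes an odd `ksize`.

```python
from __future__ import annotations


def make_odd_val(v1: int, v2: int) -> tuple[int, int]:
    """Shorten the segment by one pixel if it spans an even number of pixels."""
    len_v = abs(v1 - v2) + 1
    if len_v % 2 != 1:
        if v2 > v1:
            v2 -= 1
        else:
            v1 -= 1
    return v1, v2


def center_segment(v1: int, v2: int, ksize: int) -> tuple[int, int]:
    """Shift a 1-D segment so its midpoint sits on the kernel center (hypothetical helper)."""
    v1, v2 = make_odd_val(v1, v2)
    midpoint = (v1 + v2) / 2
    center = ksize / 2 - 0.5  # index of the central pixel for an odd ksize
    shift = midpoint - center
    return int(v1 - shift), int(v2 - shift)


# Endpoints 0 and 3 in a 7-pixel kernel: trimmed to (0, 2), then moved to (2, 4).
print(center_segment(0, 3, 7))  # (2, 4)
# Endpoints 6 and 3: trimmed to (5, 3), then moved to (4, 2).
print(center_segment(6, 3, 7))  # (4, 2)
```

Both results are symmetric around index 3, the center of a 7-pixel kernel, so a line drawn between the adjusted endpoints passes through the kernel center.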

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (*super().get_transform_init_args_names(), "allow_shifted")

`class ZoomBlur` `(max_factor=(1, 1.31), step_factor=(0.01, 0.03), always_apply=None, p=0.5)` [view source on GitHub] ¶

Apply zoom blur transform.

Parameters:

Name	Type	Description
`max_factor`	`(float, float) or float`	Range from which the maximum zoom factor is sampled. If `max_factor` is a single float, the range is (1, max_factor). All `max_factor` values should be no smaller than 1. Default: (1, 1.31).
`step_factor`	`(float, float) or float`	Step passed to `np.arange` when generating the zoom factors. If a single float, it is used as the step directly. If a tuple of floats, the step is sampled from the range `[step_factor[0], step_factor[1])`. All `step_factor` values should be positive. Default: (0.01, 0.03).
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Reference

https://arxiv.org/abs/1903.12261

Source code in albumentations/augmentations/blur/transforms.py

Python

class ZoomBlur(ImageOnlyTransform):
    """Apply zoom blur transform.

    Args:
        max_factor ((float, float) or float): range from which the maximum zoom factor is sampled.
            If max_factor is a single float, the range will be (1, max_factor). Default: (1, 1.31).
            All max_factor values should be no smaller than 1.
        step_factor ((float, float) or float): If a single float, it is used as the step parameter for np.arange.
            If a tuple of floats, step_factor is sampled from the range `[step_factor[0], step_factor[1])`.
            Default: (0.01, 0.03). All step_factor values should be positive.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://arxiv.org/abs/1903.12261
    """

    class InitSchema(BaseTransformInitSchema):
        max_factor: OnePlusFloatRangeType = (1, 1.31)
        step_factor: NonNegativeFloatRangeType = (0.01, 0.03)

    def __init__(
        self,
        max_factor: ScaleFloatType = (1, 1.31),
        step_factor: ScaleFloatType = (0.01, 0.03),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.max_factor = cast(Tuple[float, float], max_factor)
        self.step_factor = cast(Tuple[float, float], step_factor)

    def apply(self, img: np.ndarray, zoom_factors: np.ndarray, **params: Any) -> np.ndarray:
        return fblur.zoom_blur(img, zoom_factors)

    def get_params(self) -> dict[str, Any]:
        max_factor = random.uniform(self.max_factor[0], self.max_factor[1])
        step_factor = random.uniform(self.step_factor[0], self.step_factor[1])
        return {"zoom_factors": np.arange(1.0, max_factor, step_factor)}

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("max_factor", "step_factor")

`apply (self, img, zoom_factors, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py

Python

def apply(self, img: np.ndarray, zoom_factors: np.ndarray, **params: Any) -> np.ndarray:
    return fblur.zoom_blur(img, zoom_factors)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_params(self) -> dict[str, Any]:
    max_factor = random.uniform(self.max_factor[0], self.max_factor[1])
    step_factor = random.uniform(self.step_factor[0], self.step_factor[1])
    return {"zoom_factors": np.arange(1.0, max_factor, step_factor)}
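
To see what `get_params` hands to `apply`, the sketch below evaluates the same `np.arange` call for fixed values instead of random draws. `zoom_factors` is a hypothetical wrapper introduced for this example only.

```python
import numpy as np


def zoom_factors(max_factor: float, step_factor: float) -> np.ndarray:
    """Zoom levels for one draw of (max_factor, step_factor), mirroring get_params."""
    return np.arange(1.0, max_factor, step_factor)


factors = zoom_factors(max_factor=1.31, step_factor=0.1)
print(len(factors))  # 4 zoom levels, roughly 1.0, 1.1, 1.2, 1.3

# A max_factor equal to 1.0 leaves no zoom levels at all.
print(zoom_factors(max_factor=1.0, step_factor=0.01).size)  # 0
```

A smaller `step_factor` or a larger `max_factor` produces more zoom levels. When the sampled `max_factor` is at or very near 1.0, the array can be empty or hold a single entry.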

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("max_factor", "step_factor")

`crops` `special` ¶

`functional` ¶

`def crop_keypoint_by_coords (keypoint, crop_coords)` [view source on GitHub]¶

Shift a keypoint into the coordinate frame of a crop. The crop box is given in pixels as `(x1, y1, x2, y2)`; only its first corner `(x1, y1)` is used, and it is subtracted from the keypoint's `x` and `y`.

Parameters:

Name	Type	Description
`keypoint`	`tuple`	A keypoint `(x, y, angle, scale)`.
`crop_coords`	`tuple`	Crop box coords `(x1, y1, x2, y2)`.

Returns:

Type	Description
`KeypointInternalType`	A keypoint `(x, y, angle, scale)`.

Source code in albumentations/augmentations/crops/functional.py

Python

def crop_keypoint_by_coords(
    keypoint: KeypointInternalType,
    crop_coords: tuple[int, int, int, int],
) -> KeypointInternalType:
    """Crop a keypoint using the provided coordinates of bottom-left and top-right corners in pixels and the
    required height and width of the crop.

    Args:
        keypoint (tuple): A keypoint `(x, y, angle, scale)`.
        crop_coords (tuple): Crop box coords `(x1, y1, x2, y2)`.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    """
    x, y, angle, scale = keypoint[:4]
    x1, y1 = crop_coords[:2]
    return x - x1, y - y1, angle, scale
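
A small worked example makes the coordinate shift concrete. The function below repeats the arithmetic shown above without the library's type aliases, so that it runs on its own. The sample keypoints and crop box are made up for illustration.

```python
def crop_keypoint_by_coords(keypoint, crop_coords):
    """Same arithmetic as the listing above, without the library's type aliases."""
    x, y, angle, scale = keypoint[:4]
    x1, y1 = crop_coords[:2]
    return x - x1, y - y1, angle, scale


# Crop box given as (x1, y1, x2, y2): x from 10 to 110, y from 20 to 120.
shifted = crop_keypoint_by_coords((50, 40, 0.0, 1.0), (10, 20, 110, 120))
print(shifted)  # (40, 20, 0.0, 1.0)

# A keypoint outside the box is not removed here; it just gets negative coordinates.
outside = crop_keypoint_by_coords((5, 5, 0.0, 1.0), (10, 20, 110, 120))
print(outside)  # (-5, -15, 0.0, 1.0)
```

The angle and scale pass through unchanged, and the second pair of crop coordinates plays no role in the result.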

`transforms` ¶

`class BBoxSafeRandomCrop` `(erosion_rate=0.0, p=1.0, always_apply=None)` [view source on GitHub] ¶

Crop a random part of the input without loss of bboxes.

Parameters:

Name	Type	Description
`erosion_rate`	`float`	erosion rate applied to the input image height before the crop. Must be in the range [0.0, 1.0]. Default: 0.0.
`p`	`float`	probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/crops/transforms.py

Python

class BBoxSafeRandomCrop(_BaseCrop):
    """Crop a random part of the input without loss of bboxes.

    Args:
        erosion_rate: erosion rate applied on input image height before crop.
        p: probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        erosion_rate: float = Field(
            default=0.0,
            ge=0.0,
            le=1.0,
            description="Erosion rate applied on input image height before crop.",
        )
        p: ProbabilityType = 1

    def __init__(self, erosion_rate: float = 0.0, p: float = 1.0, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.erosion_rate = erosion_rate

    def _get_coords_no_bbox(self, image_height: int, image_width: int) -> tuple[int, int, int, int]:
        erosive_h = int(image_height * (1.0 - self.erosion_rate))
        crop_height = image_height if erosive_h >= image_height else random.randint(erosive_h, image_height)

        crop_width = int(crop_height * image_width / image_height)

        h_start = random.random()
        w_start = random.random()

        return fcrops.get_crop_coords(image_height, image_width, crop_height, crop_width, h_start, w_start)

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        image_height, image_width = params["shape"][:2]

        if len(data["bboxes"]) == 0:  # less likely, this class is for use with bboxes.
            crop_coords = self._get_coords_no_bbox(image_height, image_width)
            return {"crop_coords": crop_coords}

        bbox_union = union_of_bboxes(bboxes=data["bboxes"], erosion_rate=self.erosion_rate)

        if bbox_union is None:
            crop_coords = self._get_coords_no_bbox(image_height, image_width)
            return {"crop_coords": crop_coords}

        x_min, y_min, x_max, y_max = bbox_union

        x_min = np.clip(x_min, 0, 1)
        y_min = np.clip(y_min, 0, 1)
        x_max = np.clip(x_max, x_min, 1)
        y_max = np.clip(y_max, y_min, 1)

        crop_x_min = int(x_min * random.random() * image_width)
        crop_y_min = int(y_min * random.random() * image_height)

        bbox_xmax = x_max + (1 - x_max) * random.random()
        bbox_ymax = y_max + (1 - y_max) * random.random()
        crop_x_max = int(bbox_xmax * image_width)
        crop_y_max = int(bbox_ymax * image_height)

        return {"crop_coords": (crop_x_min, crop_y_min, crop_x_max, crop_y_max)}

    @property
    def targets_as_params(self) -> list[str]:
        return ["bboxes"]

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("erosion_rate",)

`targets_as_params: list[str]` `property` `readonly` ¶

Targets used to get params dependent on targets. This is used to check input has all required targets.

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    image_height, image_width = params["shape"][:2]

    if len(data["bboxes"]) == 0:  # less likely, this class is for use with bboxes.
        crop_coords = self._get_coords_no_bbox(image_height, image_width)
        return {"crop_coords": crop_coords}

    bbox_union = union_of_bboxes(bboxes=data["bboxes"], erosion_rate=self.erosion_rate)

    if bbox_union is None:
        crop_coords = self._get_coords_no_bbox(image_height, image_width)
        return {"crop_coords": crop_coords}

    x_min, y_min, x_max, y_max = bbox_union

    x_min = np.clip(x_min, 0, 1)
    y_min = np.clip(y_min, 0, 1)
    x_max = np.clip(x_max, x_min, 1)
    y_max = np.clip(y_max, y_min, 1)

    crop_x_min = int(x_min * random.random() * image_width)
    crop_y_min = int(y_min * random.random() * image_height)

    bbox_xmax = x_max + (1 - x_max) * random.random()
    bbox_ymax = y_max + (1 - y_max) * random.random()
    crop_x_max = int(bbox_xmax * image_width)
    crop_y_max = int(bbox_ymax * image_height)

    return {"crop_coords": (crop_x_min, crop_y_min, crop_x_max, crop_y_max)}
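
The sampling above can be read as "pick each crop edge somewhere between the image border and the union of the boxes". The sketch below reproduces that arithmetic with a seeded generator. `bbox_safe_crop_coords` is a hypothetical helper, the boxes are assumed to be normalized `(x_min, y_min, x_max, y_max)` tuples, and the union is a plain min/max that ignores `erosion_rate`, unlike `union_of_bboxes`.

```python
import random


def bbox_safe_crop_coords(bboxes, image_height, image_width, rng):
    """Sample (x_min, y_min, x_max, y_max) in pixels around the union of normalized bboxes.

    Hypothetical helper: the union is a plain min/max and ignores erosion_rate.
    """
    x_min = min(b[0] for b in bboxes)
    y_min = min(b[1] for b in bboxes)
    x_max = max(b[2] for b in bboxes)
    y_max = max(b[3] for b in bboxes)

    # Left/top edge: anywhere between the image border and the union.
    crop_x_min = int(x_min * rng.random() * image_width)
    crop_y_min = int(y_min * rng.random() * image_height)
    # Right/bottom edge: anywhere between the union and the image border.
    crop_x_max = int((x_max + (1 - x_max) * rng.random()) * image_width)
    crop_y_max = int((y_max + (1 - y_max) * rng.random()) * image_height)
    return crop_x_min, crop_y_min, crop_x_max, crop_y_max


rng = random.Random(0)
boxes = [(0.2, 0.3, 0.5, 0.6), (0.4, 0.1, 0.7, 0.4)]
coords = bbox_safe_crop_coords(boxes, image_height=100, image_width=200, rng=rng)
print(coords)
```

Whatever the random draws are, the left and top edges stay at or before the union's top-left corner (40, 10) in pixels, and the right and bottom edges stay at or after its bottom-right corner, up to integer truncation.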

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("erosion_rate",)

`class CenterCrop` `(height, width, p=1.0, always_apply=None)` [view source on GitHub] ¶

Crop the central part of the input.

Parameters:

Name	Type	Description
`height`	`int`	height of the crop.
`width`	`int`	width of the crop.
`p`	`float`	probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/crops/transforms.py

Python

class CenterCrop(_BaseCrop):
    """Crop the central part of the input.

    Args:
        height: height of the crop.
        width: width of the crop.
        p: probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    class InitSchema(CropInitSchema):
        pass

    def __init__(self, height: int, width: int, p: float = 1.0, always_apply: bool | None = None):
        super().__init__(p, always_apply)
        self.height = height
        self.width = width

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "height", "width"

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        image_height, image_width = params["shape"][:2]
        crop_coords = fcrops.get_center_crop_coords(image_height, image_width, self.height, self.width)

        return {"crop_coords": crop_coords}

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    image_height, image_width = params["shape"][:2]
    crop_coords = fcrops.get_center_crop_coords(image_height, image_width, self.height, self.width)

    return {"crop_coords": crop_coords}
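
`fcrops.get_center_crop_coords` is not listed on this page. Assuming it splits the leftover margin evenly with floor division and returns `(x_min, y_min, x_max, y_max)` like the other crop transforms here, a stand-in could look like the sketch below. The name `center_crop_coords` and the size check are illustrative assumptions.

```python
def center_crop_coords(image_height, image_width, crop_height, crop_width):
    """Return (x_min, y_min, x_max, y_max) of a centered crop (illustrative stand-in)."""
    if crop_height > image_height or crop_width > image_width:
        raise ValueError("Crop size must not exceed the image size.")
    y_min = (image_height - crop_height) // 2
    x_min = (image_width - crop_width) // 2
    return x_min, y_min, x_min + crop_width, y_min + crop_height


# A 50x100 crop from a 100x200 image leaves 25 rows above and 50 columns to the left.
print(center_crop_coords(100, 200, 50, 100))  # (50, 25, 150, 75)
```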

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "height", "width"

`class Crop` `(x_min=0, y_min=0, x_max=1024, y_max=1024, always_apply=None, p=1.0)` [view source on GitHub] ¶

Crop region from image.

Parameters:

Name	Type	Description
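`x_min`	`int`	x coordinate of the upper-left corner of the crop. Default: 0.
`y_min`	`int`	y coordinate of the upper-left corner of the crop. Default: 0.
`x_max`	`int`	x coordinate of the lower-right corner of the crop. Must be greater than `x_min`. Default: 1024.
`y_max`	`int`	y coordinate of the lower-right corner of the crop. Must be greater than `y_min`. Default: 1024.
`p`	`float`	probability of applying the transform. Default: 1.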

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/crops/transforms.py

Python

class Crop(_BaseCrop):
    """Crop region from image.

    Args:
        x_min: Minimum upper left x coordinate.
        y_min: Minimum upper left y coordinate.
        x_max: Maximum lower right x coordinate.
        y_max: Maximum lower right y coordinate.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    class InitSchema(BaseTransformInitSchema):
        x_min: Annotated[int, Field(ge=0, description="Minimum upper left x coordinate")]
        y_min: Annotated[int, Field(ge=0, description="Minimum upper left y coordinate")]
        x_max: Annotated[int, Field(gt=0, description="Maximum lower right x coordinate")]
        y_max: Annotated[int, Field(gt=0, description="Maximum lower right y coordinate")]
        p: ProbabilityType = 1

        @model_validator(mode="after")
        def validate_coordinates(self) -> Self:
            if not self.x_min < self.x_max:
                msg = "x_max must be greater than x_min"
                raise ValueError(msg)
            if not self.y_min < self.y_max:
                msg = "y_max must be greater than y_min"
                raise ValueError(msg)
            return self

    def __init__(
        self,
        x_min: int = 0,
        y_min: int = 0,
        x_max: int = 1024,
        y_max: int = 1024,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.x_min = x_min
        self.y_min = y_min
        self.x_max = x_max
        self.y_max = y_max

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "x_min", "y_min", "x_max", "y_max"

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        return {"crop_coords": (self.x_min, self.y_min, self.x_max, self.y_max)}

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    return {"crop_coords": (self.x_min, self.y_min, self.x_max, self.y_max)}
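
The fixed `crop_coords` tuple maps onto array slicing as sketched below, with rows indexed by `y` and columns by `x`. `crop_region` is a hypothetical helper. It repeats the ordering checks from `InitSchema.validate_coordinates` and relies on plain NumPy slicing, which may differ from how the library handles a box larger than the image.

```python
import numpy as np


def crop_region(img, x_min, y_min, x_max, y_max):
    """Slice img to the box (x_min, y_min, x_max, y_max); hypothetical helper."""
    # Same ordering checks as InitSchema.validate_coordinates.
    if not x_min < x_max:
        raise ValueError("x_max must be greater than x_min")
    if not y_min < y_max:
        raise ValueError("y_max must be greater than y_min")
    # Rows are indexed by y, columns by x.
    return img[y_min:y_max, x_min:x_max]


image = np.zeros((100, 200, 3), dtype=np.uint8)
patch = crop_region(image, x_min=10, y_min=20, x_max=110, y_max=70)
print(patch.shape)  # (50, 100, 3)
```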

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "x_min", "y_min", "x_max", "y_max"

`class CropAndPad` `(px=None, percent=None, pad_mode=0, pad_cval=0, pad_cval_mask=0, keep_size=True, sample_independently=True, interpolation=1, always_apply=None, p=1.0)` [view source on GitHub] ¶

Crop and pad images by pixel amounts or fractions of image sizes. Cropping removes pixels at the sides (i.e., extracts a subimage from a given full image). Padding adds pixels to the sides (e.g., black pixels). This transformation will never crop images below a height or width of 1.

Note

This transformation automatically resizes images back to their original size. To deactivate this, add the parameter keep_size=False.

Parameters:

Name	Type	Description
`px`	`int, tuple[int, int], tuple[int, int, int, int], tuple[Union[int, tuple[int, int], list[int]], Union[int, tuple[int, int], list[int]], Union[int, tuple[int, int], list[int]], Union[int, tuple[int, int], list[int]]]`	The number of pixels to crop (negative values) or pad (positive values) on each side of the image. Either this or the parameter `percent` may be set, not both at the same time. * If `None`, then pixel-based cropping/padding will not be used. * If `int`, then that exact number of pixels will always be cropped/padded. * If a `tuple` of two `int`s with values `a` and `b`, then each side will be cropped/padded by a random amount sampled uniformly per image and side from the interval `[a, b]`. If `sample_independently` is set to `False`, only one value will be sampled per image and used for all sides. * If a `tuple` of four entries, then the entries represent top, right, bottom, and left. Each entry may be: - A single `int` (always crop/pad by exactly that value). - A `tuple` of two `int`s `a` and `b` (crop/pad by an amount within `[a, b]`). - A `list` of `int`s (crop/pad by a random value that is contained in the `list`).
`percent`	`float, tuple[float, float], tuple[float, float, float, float], tuple[Union[float, tuple[float, float], list[float]], Union[float, tuple[float, float], list[float]], Union[float, tuple[float, float], list[float]], Union[float, tuple[float, float], list[float]]]`	The number of pixels to crop (negative values) or pad (positive values) on each side of the image given as a fraction of the image height/width. E.g. if this is set to `-0.1`, the transformation will always crop away `10%` of the image's height at both the top and the bottom (both `10%` each), as well as `10%` of the width at the right and left. Expected value range is `(-1.0, inf)`. Either this or the parameter `px` may be set, not both at the same time. * If `None`, then fraction-based cropping/padding will not be used. * If `float`, then that fraction will always be cropped/padded. * If a `tuple` of two `float`s with values `a` and `b`, then each side will be cropped/padded by a random fraction sampled uniformly per image and side from the interval `[a, b]`. If `sample_independently` is set to `False`, only one value will be sampled per image and used for all sides. * If a `tuple` of four entries, then the entries represent top, right, bottom, and left. Each entry may be: - A single `float` (always crop/pad by exactly that percent value). - A `tuple` of two `float`s `a` and `b` (crop/pad by a fraction from `[a, b]`). - A `list` of `float`s (crop/pad by a random value that is contained in the `list`).
`pad_mode`	`int`	OpenCV border mode.
`pad_cval`	`Union[int, float, tuple[Union[int, float], Union[int, float]], list[Union[int, float]]]`	The constant value to use if the pad mode is `BORDER_CONSTANT`. * If `number`, then that value will be used. * If a `tuple` of two numbers and at least one of them is a `float`, then a random number will be uniformly sampled per image from the continuous interval `[a, b]` and used as the value. If both numbers are `int`s, the interval is discrete. * If a `list` of numbers, then a random value will be chosen from the elements of the `list` and used as the value.
`pad_cval_mask`	`Union[int, float, tuple[Union[int, float], Union[int, float]], list[Union[int, float]]]`	Same as `pad_cval` but only for masks.
`keep_size`	`bool`	After cropping and padding, the resulting image will usually have a different height/width compared to the original input image. If this parameter is set to `True`, then the cropped/padded image will be resized to the input image's size, i.e., the output shape is always identical to the input shape.
`sample_independently`	`bool`	If `False` and the values for `px`/`percent` result in exactly one probability distribution for all image sides, only one single value will be sampled from that probability distribution and used for all sides. I.e., the crop/pad amount then is the same for all sides. If `True`, four values will be sampled independently, one per side.
`interpolation`	`int`	OpenCV flag that is used to specify the interpolation algorithm for images. Should be one of: `cv2.INTER_NEAREST`, `cv2.INTER_LINEAR`, `cv2.INTER_CUBIC`, `cv2.INTER_AREA`, `cv2.INTER_LANCZOS4`. Default: `cv2.INTER_LINEAR`.
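
The sign convention for `px` (negative crops, positive pads, sides ordered top, right, bottom, left) can be illustrated with a NumPy-only sketch for fixed integer amounts. `crop_and_pad_px` is a hypothetical helper, not the library's implementation: it omits random sampling and the `keep_size` resize, and it raises an error where the transform is documented to keep at least one pixel.

```python
import numpy as np


def crop_and_pad_px(img, top, right, bottom, left, pad_cval=0):
    """Apply fixed per-side pixel amounts: negative crops, positive pads (hypothetical helper)."""
    height, width = img.shape[:2]
    # Negative amounts remove rows/columns from that side.
    crop_top, crop_bottom = max(-top, 0), max(-bottom, 0)
    crop_left, crop_right = max(-left, 0), max(-right, 0)
    if crop_top + crop_bottom >= height or crop_left + crop_right >= width:
        raise ValueError("Cropping would leave no pixels.")
    out = img[crop_top : height - crop_bottom, crop_left : width - crop_right]
    # Positive amounts add constant-valued rows/columns to that side.
    pad_width = [(max(top, 0), max(bottom, 0)), (max(left, 0), max(right, 0))]
    pad_width += [(0, 0)] * (img.ndim - 2)
    return np.pad(out, pad_width, mode="constant", constant_values=pad_cval)


image = np.ones((10, 10), dtype=np.uint8)
# Order is top, right, bottom, left: crop 2 rows at the top, pad 3 columns on the right,
# leave the bottom alone, crop 1 column on the left.
result = crop_and_pad_px(image, top=-2, right=3, bottom=0, left=-1)
print(result.shape)  # (8, 12)
```

With `keep_size=True`, the transform would additionally resize such a result back to the input shape.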

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/crops/transforms.py

Python

class CropAndPad(DualTransform):
    """Crop and pad images by pixel amounts or fractions of image sizes.
    Cropping removes pixels at the sides (i.e., extracts a subimage from a given full image).
    Padding adds pixels to the sides (e.g., black pixels).
    This transformation will never crop images below a height or width of 1.

    Note:
        This transformation automatically resizes images back to their original size. To deactivate this, add the
        parameter `keep_size=False`.

    Args:
        px (int,
            tuple[int, int],
            tuple[int, int, int, int],
            tuple[Union[int, tuple[int, int], list[int]],
                  Union[int, tuple[int, int], list[int]],
                  Union[int, tuple[int, int], list[int]],
                  Union[int, tuple[int, int], list[int]]]):
            The number of pixels to crop (negative values) or pad (positive values) on each side of the image.
                Either this or the parameter `percent` may be set, not both at the same time.

                * If `None`, then pixel-based cropping/padding will not be used.
                * If `int`, then that exact number of pixels will always be cropped/padded.
                * If a `tuple` of two `int`s with values `a` and `b`, then each side will be cropped/padded by a
                    random amount sampled uniformly per image and side from the interval `[a, b]`.
                    If `sample_independently` is set to `False`, only one value will be sampled per
                        image and used for all sides.
                * If a `tuple` of four entries, then the entries represent top, right, bottom, and left.
                    Each entry may be:
                    - A single `int` (always crop/pad by exactly that value).
                    - A `tuple` of two `int`s `a` and `b` (crop/pad by an amount within `[a, b]`).
                    - A `list` of `int`s (crop/pad by a random value that is contained in the `list`).

        percent (float,
                 tuple[float, float],
                 tuple[float, float, float, float],
                 tuple[Union[float, tuple[float, float], list[float]],
                       Union[float, tuple[float, float], list[float]],
                       Union[float, tuple[float, float], list[float]],
                       Union[float, tuple[float, float], list[float]]]):
            The number of pixels to crop (negative values) or pad (positive values) on each side of the image given
                as a *fraction* of the image height/width. E.g. if this is set to `-0.1`, the transformation will
                always crop away `10%` of the image's height at both the top and the bottom (both `10%` each),
                as well as `10%` of the width at the right and left. Expected value range is `(-1.0, inf)`.
                Either this or the parameter `px` may be set, not both at the same time.

                * If `None`, then fraction-based cropping/padding will not be used.
                * If `float`, then that fraction will always be cropped/padded.
                * If a `tuple` of two `float`s with values `a` and `b`, then each side will be cropped/padded by a
                random fraction sampled uniformly per image and side from the interval `[a, b]`.
                If `sample_independently` is set to `False`, only one value will be sampled per image and used
                for all sides.
                * If a `tuple` of four entries, then the entries represent top, right, bottom, and left.
                    Each entry may be:
                    - A single `float` (always crop/pad by exactly that percent value).
                    - A `tuple` of two `float`s `a` and `b` (crop/pad by a fraction from `[a, b]`).
                    - A `list` of `float`s (crop/pad by a random value that is contained in the `list`).

        pad_mode (int): OpenCV border mode.
        pad_cval (Union[int, float, tuple[Union[int, float], Union[int, float]], list[Union[int, float]]]):
            The constant value to use if the pad mode is `BORDER_CONSTANT`.
                * If `number`, then that value will be used.
                * If a `tuple` of two numbers and at least one of them is a `float`, then a random number
                    will be uniformly sampled per image from the continuous interval `[a, b]` and used as the value.
                    If both numbers are `int`s, the interval is discrete.
                * If a `list` of numbers, then a random value will be chosen from the elements of the `list` and
                    used as the value.

        pad_cval_mask (Union[int, float, tuple[Union[int, float], Union[int, float]], list[Union[int, float]]]):
            Same as `pad_cval` but only for masks.

        keep_size (bool):
            After cropping and padding, the resulting image will usually have a different height/width compared to
            the original input image. If this parameter is set to `True`, then the cropped/padded image will be
            resized to the input image's size, i.e., the output shape is always identical to the input shape.

        sample_independently (bool):
            If `False` and the values for `px`/`percent` result in exactly one probability distribution for all
            image sides, only one single value will be sampled from that probability distribution and used for
            all sides. I.e., the crop/pad amount then is the same for all sides. If `True`, four values
            will be sampled independently, one per side.

        interpolation (int):
            OpenCV flag that is used to specify the interpolation algorithm for images. Should be one of:
            `cv2.INTER_NEAREST`, `cv2.INTER_LINEAR`, `cv2.INTER_CUBIC`, `cv2.INTER_AREA`, `cv2.INTER_LANCZOS4`.
            Default: `cv2.INTER_LINEAR`.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        px: PxType | None = Field(
            default=None,
            description="Number of pixels to crop (negative) or pad (positive).",
        )
        percent: PercentType | None = Field(
            default=None,
            description="Fraction of image size to crop (negative) or pad (positive).",
        )
        pad_mode: BorderModeType = cv2.BORDER_CONSTANT
        pad_cval: ScalarType | tuple[ScalarType, ScalarType] | list[ScalarType] = Field(
            default=0,
            description="Padding value if pad_mode is BORDER_CONSTANT.",
        )
        pad_cval_mask: ScalarType | tuple[ScalarType, ScalarType] | list[ScalarType] = Field(
            default=0,
            description="Padding value for masks if pad_mode is BORDER_CONSTANT.",
        )
        keep_size: bool = Field(
            default=True,
            description="Whether to resize the image back to the original size after cropping and padding.",
        )
        sample_independently: bool = Field(
            default=True,
            description="Whether to sample the crop/pad size independently for each side.",
        )
        interpolation: InterpolationType = cv2.INTER_LINEAR
        p: ProbabilityType = 1

        @model_validator(mode="after")
        def check_px_percent(self) -> Self:
            if self.px is None and self.percent is None:
                msg = "Both px and percent parameters cannot be None simultaneously."
                raise ValueError(msg)
            if self.px is not None and self.percent is not None:
                msg = "Only px or percent may be set!"
                raise ValueError(msg)
            return self

    def __init__(
        self,
        px: int | list[int] | None = None,
        percent: float | list[float] | None = None,
        pad_mode: int = cv2.BORDER_CONSTANT,
        pad_cval: ScalarType | tuple[ScalarType, ScalarType] | list[ScalarType] = 0,
        pad_cval_mask: ScalarType | tuple[ScalarType, ScalarType] | list[ScalarType] = 0,
        keep_size: bool = True,
        sample_independently: bool = True,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.px = px
        self.percent = percent

        self.pad_mode = pad_mode
        self.pad_cval = pad_cval
        self.pad_cval_mask = pad_cval_mask

        self.keep_size = keep_size
        self.sample_independently = sample_independently

        self.interpolation = interpolation

    def apply(
        self,
        img: np.ndarray,
        crop_params: Sequence[int],
        pad_params: Sequence[int],
        pad_value: ColorType,
        rows: int,
        cols: int,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_and_pad(
            img,
            crop_params,
            pad_params,
            pad_value,
            rows,
            cols,
            interpolation,
            self.pad_mode,
            self.keep_size,
        )

    def apply_to_mask(
        self,
        mask: np.ndarray,
        crop_params: Sequence[int],
        pad_params: Sequence[int],
        pad_value_mask: float,
        rows: int,
        cols: int,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_and_pad(
            mask,
            crop_params,
            pad_params,
            pad_value_mask,
            rows,
            cols,
            interpolation,
            self.pad_mode,
            self.keep_size,
        )

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        crop_params: Sequence[int],
        pad_params: Sequence[int],
        rows: int,
        cols: int,
        result_rows: int,
        result_cols: int,
        **params: Any,
    ) -> BoxInternalType:
        return fcrops.crop_and_pad_bbox(bbox, crop_params, pad_params, rows, cols, result_rows, result_cols)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        crop_params: Sequence[int],
        pad_params: Sequence[int],
        rows: int,
        cols: int,
        result_rows: int,
        result_cols: int,
        **params: Any,
    ) -> KeypointInternalType:
        return fcrops.crop_and_pad_keypoint(
            keypoint,
            crop_params,
            pad_params,
            rows,
            cols,
            result_rows,
            result_cols,
            self.keep_size,
        )

    @staticmethod
    def __prevent_zero(val1: int, val2: int, max_val: int) -> tuple[int, int]:
        regain = abs(max_val) + 1
        regain1 = regain // 2
        regain2 = regain // 2
        if regain1 + regain2 < regain:
            regain1 += 1

        if regain1 > val1:
            diff = regain1 - val1
            regain1 = val1
            regain2 += diff
        elif regain2 > val2:
            diff = regain2 - val2
            regain2 = val2
            regain1 += diff

        return val1 - regain1, val2 - regain2

    @staticmethod
    def _prevent_zero(crop_params: list[int], height: int, width: int) -> list[int]:
        top, right, bottom, left = crop_params

        remaining_height = height - (top + bottom)
        remaining_width = width - (left + right)

        if remaining_height < 1:
            top, bottom = CropAndPad.__prevent_zero(top, bottom, height)
        if remaining_width < 1:
            left, right = CropAndPad.__prevent_zero(left, right, width)

        return [max(top, 0), max(right, 0), max(bottom, 0), max(left, 0)]

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        if self.px is not None:
            new_params = self._get_px_params()
        else:
            percent_params = self._get_percent_params()
            new_params = [
                int(percent_params[0] * height),
                int(percent_params[1] * width),
                int(percent_params[2] * height),
                int(percent_params[3] * width),
            ]

        pad_params = [max(i, 0) for i in new_params]

        crop_params = self._prevent_zero([-min(i, 0) for i in new_params], height, width)

        top, right, bottom, left = crop_params
        crop_params = [left, top, width - right, height - bottom]
        result_rows = crop_params[3] - crop_params[1]
        result_cols = crop_params[2] - crop_params[0]
        if result_cols == width and result_rows == height:
            crop_params = []

        top, right, bottom, left = pad_params
        pad_params = [top, bottom, left, right]
        if any(pad_params):
            result_rows += top + bottom
            result_cols += left + right
        else:
            pad_params = []

        return {
            "crop_params": crop_params or None,
            "pad_params": pad_params or None,
            "pad_value": None if pad_params is None else self._get_pad_value(self.pad_cval),
            "pad_value_mask": None if pad_params is None else self._get_pad_value(self.pad_cval_mask),
            "result_rows": result_rows,
            "result_cols": result_cols,
        }

    def _get_px_params(self) -> list[int]:
        if self.px is None:
            msg = "px is not set"
            raise ValueError(msg)

        if isinstance(self.px, int):
            params = [self.px] * 4
        elif len(self.px) == PAIR:
            if self.sample_independently:
                params = [random.randrange(*self.px) for _ in range(4)]
            else:
                px = random.randrange(*self.px)
                params = [px] * 4
        elif isinstance(self.px[0], int):
            params = self.px
        elif len(self.px[0]) == PAIR:
            params = [random.randrange(*i) for i in self.px]
        else:
            params = [random.choice(i) for i in self.px]

        return params

    def _get_percent_params(self) -> list[float]:
        if self.percent is None:
            msg = "percent is not set"
            raise ValueError(msg)

        if isinstance(self.percent, float):
            params = [self.percent] * 4
        elif len(self.percent) == PAIR:
            if self.sample_independently:
                params = [random.uniform(*self.percent) for _ in range(4)]
            else:
                px = random.uniform(*self.percent)
                params = [px] * 4
        elif isinstance(self.percent[0], (int, float)):
            params = self.percent
        elif len(self.percent[0]) == PAIR:
            params = [random.uniform(*i) for i in self.percent]
        else:
            params = [random.choice(i) for i in self.percent]

        return params  # params = [top, right, bottom, left]

    @staticmethod
    def _get_pad_value(
        pad_value: ScalarType | tuple[ScalarType, ScalarType] | list[ScalarType],
    ) -> ScalarType:
        if isinstance(pad_value, (int, float)):
            return pad_value

        if len(pad_value) == PAIR:
            a, b = pad_value
            if isinstance(a, int) and isinstance(b, int):
                return random.randint(a, b)

            return random.uniform(a, b)

        return random.choice(pad_value)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "px",
            "percent",
            "pad_mode",
            "pad_cval",
            "pad_cval_mask",
            "keep_size",
            "sample_independently",
            "interpolation",
        )
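
The `pad_cval` rules from the docstring can be illustrated with a small standalone sketch. It mirrors the `_get_pad_value` helper in the listing above using only the standard library; the name `sample_pad_value` is hypothetical and not part of Albumentations, and the sketch assumes the `PAIR` constant equals 2. Under that assumption, any two-element sequence takes the interval branch, so a two-item `list` behaves like a range rather than a set of choices.

```python
import random


def sample_pad_value(pad_value):
    """Hypothetical stand-in for the pad-value sampling shown in the listing."""
    # A plain number is used as-is.
    if isinstance(pad_value, (int, float)):
        return pad_value

    # Two entries are read as an interval [a, b].
    if len(pad_value) == 2:  # assumption: PAIR == 2
        a, b = pad_value
        if isinstance(a, int) and isinstance(b, int):
            # Both ints: discrete interval, both endpoints included.
            return random.randint(a, b)
        # At least one float: continuous interval.
        return random.uniform(a, b)

    # Any other sequence: pick one of its elements.
    return random.choice(pad_value)


fixed = sample_pad_value(7)
discrete = sample_pad_value((0, 255))
continuous = sample_pad_value((0.0, 1.0))
chosen = sample_pad_value([0, 128, 255])
```

The same sampling is applied separately to `pad_cval` and `pad_cval_mask`, so image and mask padding values are drawn independently.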

`apply (self, img, crop_params, pad_params, pad_value, rows, cols, interpolation, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/crops/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    crop_params: Sequence[int],
    pad_params: Sequence[int],
    pad_value: ColorType,
    rows: int,
    cols: int,
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fcrops.crop_and_pad(
        img,
        crop_params,
        pad_params,
        pad_value,
        rows,
        cols,
        interpolation,
        self.pad_mode,
        self.keep_size,
    )

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    if self.px is not None:
        new_params = self._get_px_params()
    else:
        percent_params = self._get_percent_params()
        new_params = [
            int(percent_params[0] * height),
            int(percent_params[1] * width),
            int(percent_params[2] * height),
            int(percent_params[3] * width),
        ]

    pad_params = [max(i, 0) for i in new_params]

    crop_params = self._prevent_zero([-min(i, 0) for i in new_params], height, width)

    top, right, bottom, left = crop_params
    crop_params = [left, top, width - right, height - bottom]
    result_rows = crop_params[3] - crop_params[1]
    result_cols = crop_params[2] - crop_params[0]
    if result_cols == width and result_rows == height:
        crop_params = []

    top, right, bottom, left = pad_params
    pad_params = [top, bottom, left, right]
    if any(pad_params):
        result_rows += top + bottom
        result_cols += left + right
    else:
        pad_params = []

    return {
        "crop_params": crop_params or None,
        "pad_params": pad_params or None,
        "pad_value": None if pad_params is None else self._get_pad_value(self.pad_cval),
        "pad_value_mask": None if pad_params is None else self._get_pad_value(self.pad_cval_mask),
        "result_rows": result_rows,
        "result_cols": result_cols,
    }
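
As a worked example of the arithmetic in the listing above, the following sketch splits signed per-side amounts into a crop box and padding sizes. It is a simplified, standalone re-implementation: the name `split_crop_pad` is hypothetical, the `_prevent_zero` safeguard is omitted (the crops are assumed to leave at least one pixel), and empty results are not converted to `None`.

```python
def split_crop_pad(amounts, height, width):
    """amounts = [top, right, bottom, left]; negative crops, positive pads."""
    # Positive entries become padding, negative entries become cropping.
    pad_top, pad_right, pad_bottom, pad_left = [max(a, 0) for a in amounts]
    crop_top, crop_right, crop_bottom, crop_left = [-min(a, 0) for a in amounts]

    # Crop box in (x_min, y_min, x_max, y_max) order.
    crop_box = (crop_left, crop_top, width - crop_right, height - crop_bottom)

    # Padding in (top, bottom, left, right) order, as in the listing.
    pads = (pad_top, pad_bottom, pad_left, pad_right)

    # Size after cropping and padding, before any keep_size resize.
    result_rows = crop_box[3] - crop_box[1] + pad_top + pad_bottom
    result_cols = crop_box[2] - crop_box[0] + pad_left + pad_right
    return crop_box, pads, (result_rows, result_cols)


# Crop 10 px from the top and 20 px from the left, pad 5 px on the right.
crop_box, pads, size = split_crop_pad([-10, 5, 0, -20], height=100, width=200)
# crop_box == (20, 10, 200, 100), pads == (0, 0, 0, 5), size == (90, 185)
```

Note the reordering: the input follows top, right, bottom, left, while the crop box uses x/y corner order and the padding uses top, bottom, left, right.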

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "px",
        "percent",
        "pad_mode",
        "pad_cval",
        "pad_cval_mask",
        "keep_size",
        "sample_independently",
        "interpolation",
    )

`class CropNonEmptyMaskIfExists` `(height, width, ignore_values=None, ignore_channels=None, always_apply=None, p=1.0)` [view source on GitHub] ¶

Crop an area that contains part of the mask if the mask is non-empty; otherwise make a random crop.

Parameters:

Name	Type	Description
`height`	`int`	vertical size of crop in pixels
`width`	`int`	horizontal size of crop in pixels
`ignore_values`	`list of int`	values to ignore in mask; `0` values are always ignored (e.g. if the background value is 5, set `ignore_values=[5]` to ignore it)
`ignore_channels`	`list of int`	channels to ignore in mask (e.g. if the background is the first channel, set `ignore_channels=[0]` to ignore it)
`p`	`float`	probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
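
The crop placement can be sketched without the library. The version below follows the logic of `update_params` in the source listing for this class, but works on a nested-list mask and skips the `ignore_values` / `ignore_channels` preprocessing; the name `pick_crop_window` is hypothetical.

```python
import random


def pick_crop_window(mask, crop_h, crop_w):
    """Return (x_min, y_min, x_max, y_max) for a 2D nested-list mask."""
    mask_h, mask_w = len(mask), len(mask[0])
    if crop_h > mask_h or crop_w > mask_w:
        raise ValueError("Crop size is larger than the mask")

    non_zero = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if non_zero:
        # Pick a random foreground pixel and place the window around it.
        y, x = random.choice(non_zero)
        x_min = x - random.randint(0, crop_w - 1)
        y_min = y - random.randint(0, crop_h - 1)
        # Keep the window inside the mask.
        x_min = min(max(x_min, 0), mask_w - crop_w)
        y_min = min(max(y_min, 0), mask_h - crop_h)
    else:
        # Empty mask: fall back to a uniformly random window.
        x_min = random.randint(0, mask_w - crop_w)
        y_min = random.randint(0, mask_h - crop_h)

    return x_min, y_min, x_min + crop_w, y_min + crop_h


mask = [[0] * 8 for _ in range(6)]
mask[4][6] = 1  # single foreground pixel at row 4, column 6
x_min, y_min, x_max, y_max = pick_crop_window(mask, crop_h=3, crop_w=4)
```

Because the offset is drawn from `[0, size - 1]` before clipping, the chosen foreground pixel always ends up inside the returned window.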

Source code in albumentations/augmentations/crops/transforms.py

Python

class CropNonEmptyMaskIfExists(_BaseCrop):
    """Crop area with mask if mask is non-empty, else make random crop.

    Args:
        height: vertical size of crop in pixels
        width: horizontal size of crop in pixels
        ignore_values (list of int): values to ignore in mask, `0` values are always ignored
            (e.g. if background value is 5 set `ignore_values=[5]` to ignore)
        ignore_channels (list of int): channels to ignore in mask
            (e.g. if background is a first channel set `ignore_channels=[0]` to ignore)
        p: probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    class InitSchema(CropInitSchema):
        ignore_values: list[int] | None = Field(
            default=None,
            description="Values to ignore in mask, `0` values are always ignored",
        )
        ignore_channels: list[int] | None = Field(default=None, description="Channels to ignore in mask")

    def __init__(
        self,
        height: int,
        width: int,
        ignore_values: list[int] | None = None,
        ignore_channels: list[int] | None = None,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p, always_apply)

        self.height = height
        self.width = width
        self.ignore_values = ignore_values
        self.ignore_channels = ignore_channels

    def _preprocess_mask(self, mask: np.ndarray) -> np.ndarray:
        mask_height, mask_width = mask.shape[:2]

        if self.ignore_values is not None:
            ignore_values_np = np.array(self.ignore_values)
            mask = np.where(np.isin(mask, ignore_values_np), 0, mask)

        if mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS and self.ignore_channels is not None:
            target_channels = np.array([ch for ch in range(mask.shape[-1]) if ch not in self.ignore_channels])
            mask = np.take(mask, target_channels, axis=-1)

        if self.height > mask_height or self.width > mask_width:
            raise ValueError(
                f"Crop size ({self.height},{self.width}) is larger than image ({mask_height},{mask_width})",
            )

        return mask

    def update_params(self, params: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
        super().update_params(params, **kwargs)
        if "mask" in kwargs:
            mask = self._preprocess_mask(kwargs["mask"])
        elif "masks" in kwargs and len(kwargs["masks"]):
            masks = kwargs["masks"]
            mask = self._preprocess_mask(np.copy(masks[0]))  # need copy as we perform in-place mod afterwards
            for m in masks[1:]:
                mask |= self._preprocess_mask(m)
        else:
            msg = "Can not find mask for CropNonEmptyMaskIfExists"
            raise RuntimeError(msg)

        mask_height, mask_width = mask.shape[:2]

        if mask.any():
            mask = mask.sum(axis=-1) if mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS else mask
            non_zero_yx = np.argwhere(mask)
            y, x = random.choice(non_zero_yx)
            x_min = x - random.randint(0, self.width - 1)
            y_min = y - random.randint(0, self.height - 1)
            x_min = np.clip(x_min, 0, mask_width - self.width)
            y_min = np.clip(y_min, 0, mask_height - self.height)
        else:
            x_min = random.randint(0, mask_width - self.width)
            y_min = random.randint(0, mask_height - self.height)

        x_max = x_min + self.width
        y_max = y_min + self.height

        crop_coords = x_min, y_min, x_max, y_max

        params["crop_coords"] = crop_coords
        return params

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "height", "width", "ignore_values", "ignore_channels"

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "height", "width", "ignore_values", "ignore_channels"

`update_params (self, params, **kwargs)` ¶

Update parameters with transform-specific params. This method is deprecated; use `get_params` for transform-specific params like interpolation and `update_params_shape` for data like shape.

Source code in albumentations/augmentations/crops/transforms.py

Python

def update_params(self, params: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
    super().update_params(params, **kwargs)
    if "mask" in kwargs:
        mask = self._preprocess_mask(kwargs["mask"])
    elif "masks" in kwargs and len(kwargs["masks"]):
        masks = kwargs["masks"]
        mask = self._preprocess_mask(np.copy(masks[0]))  # need copy as we perform in-place mod afterwards
        for m in masks[1:]:
            mask |= self._preprocess_mask(m)
    else:
        msg = "Can not find mask for CropNonEmptyMaskIfExists"
        raise RuntimeError(msg)

    mask_height, mask_width = mask.shape[:2]

    if mask.any():
        mask = mask.sum(axis=-1) if mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS else mask
        non_zero_yx = np.argwhere(mask)
        y, x = random.choice(non_zero_yx)
        x_min = x - random.randint(0, self.width - 1)
        y_min = y - random.randint(0, self.height - 1)
        x_min = np.clip(x_min, 0, mask_width - self.width)
        y_min = np.clip(y_min, 0, mask_height - self.height)
    else:
        x_min = random.randint(0, mask_width - self.width)
        y_min = random.randint(0, mask_height - self.height)

    x_max = x_min + self.width
    y_max = y_min + self.height

    crop_coords = x_min, y_min, x_max, y_max

    params["crop_coords"] = crop_coords
    return params

`class RandomCrop` `(height, width, p=1.0, always_apply=None)` [view source on GitHub] ¶

Crop a random part of the input.

Parameters:

Name	Type	Description
`height`	`int`	height of the crop.
`width`	`int`	width of the crop.
`p`	`float`	probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
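
A rough sketch of how two random fractions could become crop coordinates is shown below. The helper `fcrops.get_crop_coords` used by this class is not reproduced in this section, so the mapping here is an assumption rather than the library's implementation, and the name `crop_coords_from_fractions` is hypothetical.

```python
import random


def crop_coords_from_fractions(image_h, image_w, crop_h, crop_w, h_start, w_start):
    """Map fractions in [0, 1) to an (x_min, y_min, x_max, y_max) crop box."""
    if crop_h > image_h or crop_w > image_w:
        raise ValueError("Crop size exceeds image dimensions")

    # Assumed mapping: scale each fraction to the range of valid offsets.
    y_min = int((image_h - crop_h + 1) * h_start)
    x_min = int((image_w - crop_w + 1) * w_start)
    return x_min, y_min, x_min + crop_w, y_min + crop_h


# random.random() returns a value in [0, 1), so the box stays inside the image.
coords = crop_coords_from_fractions(100, 200, 50, 60, random.random(), random.random())
top_left = crop_coords_from_fractions(100, 200, 50, 60, 0.0, 0.0)
```

As in the class itself, a crop larger than the image is rejected up front rather than clipped.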

Source code in albumentations/augmentations/crops/transforms.py

Python

class RandomCrop(_BaseCrop):
    """Crop a random part of the input.

    Args:
        height: height of the crop.
        width: width of the crop.
        p: probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    class InitSchema(CropInitSchema):
        pass

    def __init__(self, height: int, width: int, p: float = 1.0, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.height = height
        self.width = width

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        shape = params["shape"]

        image_height, image_width = shape[:2]

        if self.height > image_height or self.width > image_width:
            raise CropSizeError(
                f"Crop size (height, width) exceeds image dimensions (height, width):"
                f" {(self.height, self.width)} vs {shape[:2]}",
            )

        h_start = random.random()
        w_start = random.random()
        crop_coords = fcrops.get_crop_coords(image_height, image_width, self.height, self.width, h_start, w_start)
        return {"crop_coords": crop_coords}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "height", "width"

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    shape = params["shape"]

    image_height, image_width = shape[:2]

    if self.height > image_height or self.width > image_width:
        raise CropSizeError(
            f"Crop size (height, width) exceeds image dimensions (height, width):"
            f" {(self.height, self.width)} vs {shape[:2]}",
        )

    h_start = random.random()
    w_start = random.random()
    crop_coords = fcrops.get_crop_coords(image_height, image_width, self.height, self.width, h_start, w_start)
    return {"crop_coords": crop_coords}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "height", "width"

`class RandomCropFromBorders` `(crop_left=0.1, crop_right=0.1, crop_top=0.1, crop_bottom=0.1, always_apply=None, p=1.0)` [view source on GitHub] ¶

Randomly crops parts of the image from the borders without resizing at the end. The cropped regions are defined as fractions of the original image dimensions, specified for each side of the image (left, right, top, bottom).

Parameters:

Name	Type	Description
`crop_left`	`float`	Fraction of the width to randomly crop from the left side. Must be in the range [0.0, 1.0]. Default is 0.1.
`crop_right`	`float`	Fraction of the width to randomly crop from the right side. Must be in the range [0.0, 1.0]. Default is 0.1.
`crop_top`	`float`	Fraction of the height to randomly crop from the top side. Must be in the range [0.0, 1.0]. Default is 0.1.
`crop_bottom`	`float`	Fraction of the height to randomly crop from the bottom side. Must be in the range [0.0, 1.0]. Default is 0.1.
`p`	`float`	Probability of applying the transform. Default is 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
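
To make the sampling concrete, the sketch below reproduces the coordinate selection from the source listing for this class as a standalone function. The name `sample_border_crop` is hypothetical, and the validation done by `InitSchema` (each fraction in [0.0, 1.0], opposite sides summing to at most 1) is assumed to have already passed.

```python
import random


def sample_border_crop(height, width, crop_left=0.1, crop_right=0.1,
                       crop_top=0.1, crop_bottom=0.1):
    """Return (x_min, y_min, x_max, y_max) of the region that is kept."""
    # Left edge: anywhere within the first crop_left fraction of the width.
    x_min = random.randint(0, int(crop_left * width))
    # Right edge: within the last crop_right fraction, and right of x_min.
    x_max = random.randint(max(x_min + 1, int((1 - crop_right) * width)), width)

    # Same procedure vertically.
    y_min = random.randint(0, int(crop_top * height))
    y_max = random.randint(max(y_min + 1, int((1 - crop_bottom) * height)), height)

    return x_min, y_min, x_max, y_max


x_min, y_min, x_max, y_max = sample_border_crop(height=100, width=200)
```

Each fraction is therefore an upper bound on how much may be removed from that side, not a fixed amount, and the output size varies from call to call.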

Source code in albumentations/augmentations/crops/transforms.py

Python

class RandomCropFromBorders(_BaseCrop):
    """Randomly crops parts of the image from the borders without resizing at the end. The cropped regions are defined
    as fractions of the original image dimensions, specified for each side of the image (left, right, top, bottom).

    Args:
        crop_left (float): Fraction of the width to randomly crop from the left side. Must be in the range [0.0, 1.0].
                            Default is 0.1.
        crop_right (float): Fraction of the width to randomly crop from the right side. Must be in the range [0.0, 1.0].
                            Default is 0.1.
        crop_top (float): Fraction of the height to randomly crop from the top side. Must be in the range [0.0, 1.0].
                          Default is 0.1.
        crop_bottom (float): Fraction of the height to randomly crop from the bottom side.
                             Must be in the range [0.0, 1.0]. Default is 0.1.
        p (float): Probability of applying the transform. Default is 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        crop_left: float = Field(
            default=0.1,
            ge=0.0,
            le=1.0,
            description="Fraction of width to randomly crop from the left side.",
        )
        crop_right: float = Field(
            default=0.1,
            ge=0.0,
            le=1.0,
            description="Fraction of width to randomly crop from the right side.",
        )
        crop_top: float = Field(
            default=0.1,
            ge=0.0,
            le=1.0,
            description="Fraction of height to randomly crop from the top side.",
        )
        crop_bottom: float = Field(
            default=0.1,
            ge=0.0,
            le=1.0,
            description="Fraction of height to randomly crop from the bottom side.",
        )
        p: ProbabilityType = 1

        @model_validator(mode="after")
        def validate_crop_values(self) -> Self:
            if self.crop_left + self.crop_right > 1.0:
                msg = "The sum of crop_left and crop_right must be <= 1."
                raise ValueError(msg)
            if self.crop_top + self.crop_bottom > 1.0:
                msg = "The sum of crop_top and crop_bottom must be <= 1."
                raise ValueError(msg)
            return self

    def __init__(
        self,
        crop_left: float = 0.1,
        crop_right: float = 0.1,
        crop_top: float = 0.1,
        crop_bottom: float = 0.1,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p, always_apply)
        self.crop_left = crop_left
        self.crop_right = crop_right
        self.crop_top = crop_top
        self.crop_bottom = crop_bottom

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        height, width = params["shape"][:2]

        x_min = random.randint(0, int(self.crop_left * width))
        x_max = random.randint(max(x_min + 1, int((1 - self.crop_right) * width)), width)

        y_min = random.randint(0, int(self.crop_top * height))
        y_max = random.randint(max(y_min + 1, int((1 - self.crop_bottom) * height)), height)

        crop_coords = x_min, y_min, x_max, y_max

        return {"crop_coords": crop_coords}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "crop_left", "crop_right", "crop_top", "crop_bottom"

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    height, width = params["shape"][:2]

    x_min = random.randint(0, int(self.crop_left * width))
    x_max = random.randint(max(x_min + 1, int((1 - self.crop_right) * width)), width)

    y_min = random.randint(0, int(self.crop_top * height))
    y_max = random.randint(max(y_min + 1, int((1 - self.crop_bottom) * height)), height)

    crop_coords = x_min, y_min, x_max, y_max

    return {"crop_coords": crop_coords}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "crop_left", "crop_right", "crop_top", "crop_bottom"

`class RandomCropNearBBox` `(max_part_shift=(0, 0.3), cropping_bbox_key='cropping_bbox', cropping_box_key=None, always_apply=None, p=1.0)` [view source on GitHub] ¶

Crop a bounding box from the image, with a random shift applied along the x and y coordinates.

Parameters:

Name	Type	Description
`max_part_shift`	`float, (float, float)`	Max shift in `height` and `width` dimensions relative to `cropping_bbox` dimension. If max_part_shift is a single float, the range will be (0, max_part_shift). Default (0, 0.3).
`cropping_bbox_key`	`str`	Additional target key for cropping box. Default `cropping_bbox`.
`cropping_box_key`	`str`	[Deprecated] Use `cropping_bbox_key` instead.
`p`	`float`	probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Examples:

Python

>>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_bbox')],
...              bbox_params=BboxParams("pascal_voc"))
>>> result = aug(image=image, bboxes=bboxes, test_bbox=[0, 5, 10, 20])
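
The shift sampling can also be sketched on its own. The code below follows the visible part of `get_params_dependent_on_data` in the source listing for this class: the first entry of `max_part_shift` scales the vertical shift and the second the horizontal shift. The names `clip_box` and `sample_crop_near_box` are hypothetical, and the fallback used when the resulting box is degenerate is left out.

```python
import random


def clip_box(box, height, width):
    """Clamp (x_min, y_min, x_max, y_max) to the image, keeping max >= min."""
    x_min, y_min, x_max, y_max = box
    x_min = min(max(x_min, 0), width)
    y_min = min(max(y_min, 0), height)
    x_max = min(max(x_max, x_min), width)
    y_max = min(max(y_max, y_min), height)
    return x_min, y_min, x_max, y_max


def sample_crop_near_box(box, height, width, max_part_shift):
    box = clip_box(box, height, width)

    # Maximum shifts in pixels, relative to the box height and width.
    h_shift = round((box[3] - box[1]) * max_part_shift[0])
    w_shift = round((box[2] - box[0]) * max_part_shift[1])

    # Each edge is moved independently by a random amount in [-shift, shift].
    x_min = box[0] - random.randint(-w_shift, w_shift)
    x_max = box[2] + random.randint(-w_shift, w_shift)
    y_min = box[1] - random.randint(-h_shift, h_shift)
    y_max = box[3] + random.randint(-h_shift, h_shift)

    return clip_box((x_min, y_min, x_max, y_max), height, width)


shifted = sample_crop_near_box((0, 5, 10, 20), height=100, width=100, max_part_shift=(0.3, 0.3))
unshifted = sample_crop_near_box((0, 5, 10, 20), height=100, width=100, max_part_shift=(0, 0))
```

With a zero shift the crop is exactly the clipped bounding box; larger values let each edge move inward or outward independently.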

Source code in albumentations/augmentations/crops/transforms.py

Python

class RandomCropNearBBox(_BaseCrop):
    """Crop bbox from image with random shift by x,y coordinates

    Args:
        max_part_shift (float, (float, float)): Max shift in `height` and `width` dimensions relative
            to `cropping_bbox` dimension.
            If max_part_shift is a single float, the range will be (0, max_part_shift).
            Default (0, 0.3).
        cropping_bbox_key (str): Additional target key for cropping box. Default `cropping_bbox`.
        cropping_box_key (str): [Deprecated] Use `cropping_bbox_key` instead.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Examples:
        >>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_bbox')],
        >>>              bbox_params=BboxParams("pascal_voc"))
        >>> result = aug(image=image, bboxes=bboxes, test_bbox=[0, 5, 10, 20])

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        max_part_shift: ZeroOneRangeType = (0, 0.3)
        cropping_bbox_key: str = Field(default="cropping_bbox", description="Additional target key for cropping box.")
        p: ProbabilityType = 1

    def __init__(
        self,
        max_part_shift: ScaleFloatType = (0, 0.3),
        cropping_bbox_key: str = "cropping_bbox",
        cropping_box_key: str | None = None,  # Deprecated
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p=p, always_apply=always_apply)
        # Check for deprecated parameter and issue warning
        if cropping_box_key is not None:
            warn(
                "The parameter 'cropping_box_key' is deprecated and will be removed in future versions. "
                "Use 'cropping_bbox_key' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Ensure the new parameter is used even if the old one is passed
            cropping_bbox_key = cropping_box_key

        self.max_part_shift = cast(Tuple[float, float], max_part_shift)
        self.cropping_bbox_key = cropping_bbox_key

    @staticmethod
    def _clip_bbox(bbox: BoxInternalType, height: int, width: int) -> BoxInternalType:
        x_min, y_min, x_max, y_max = bbox
        x_min = np.clip(x_min, 0, width)
        y_min = np.clip(y_min, 0, height)

        x_max = np.clip(x_max, x_min, width)
        y_max = np.clip(y_max, y_min, height)
        return x_min, y_min, x_max, y_max

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[float, ...]]:
        bbox = data[self.cropping_bbox_key]

        height, width = params["shape"][:2]

        bbox = self._clip_bbox(bbox, height, width)

        h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
        w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])

        x_min = bbox[0] - random.randint(-w_max_shift, w_max_shift)
        x_max = bbox[2] + random.randint(-w_max_shift, w_max_shift)

        y_min = bbox[1] - random.randint(-h_max_shift, h_max_shift)
        y_max = bbox[3] + random.randint(-h_max_shift, h_max_shift)

        crop_coords = self._clip_bbox((x_min, y_min, x_max, y_max), height, width)

        if crop_coords[0] == crop_coords[2] or crop_coords[1] == crop_coords[3]:
            crop_coords = fcrops.get_center_crop_coords(height, width, bbox[3] - bbox[1], bbox[2] - bbox[0])

        return {"crop_coords": crop_coords}

    @property
    def targets_as_params(self) -> list[str]:
        return [self.cropping_bbox_key]

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "max_part_shift", "cropping_bbox_key"

`targets_as_params: list[str]` `property` `readonly` ¶

Names of the targets whose values are needed to compute target-dependent parameters. The list is used to check that the input contains all required targets.
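For `RandomCropNearBBox`, the only required extra target is the box passed under `cropping_bbox_key`. As a rough illustration of how that box becomes crop coordinates, here is a stand-alone sketch that mirrors the class source above using plain Python instead of NumPy. The names `clip_bbox` and `crop_near_bbox` are hypothetical, and the fallback to a center crop for degenerate results is left out.

```python
import random


def clip_bbox(bbox, height, width):
    """Clamp (x_min, y_min, x_max, y_max) to the image, keeping max >= min."""
    x_min, y_min, x_max, y_max = bbox
    x_min = min(max(x_min, 0), width)
    y_min = min(max(y_min, 0), height)
    x_max = min(max(x_max, x_min), width)
    y_max = min(max(y_max, y_min), height)
    return x_min, y_min, x_max, y_max


def crop_near_bbox(bbox, height, width, max_part_shift=(0.0, 0.3), rng=random):
    """Jitter each side of `bbox` independently and clamp the result (hypothetical helper)."""
    bbox = clip_bbox(bbox, height, width)
    # As in the listing: index 0 scales the box height, index 1 the box width.
    h_max_shift = round((bbox[3] - bbox[1]) * max_part_shift[0])
    w_max_shift = round((bbox[2] - bbox[0]) * max_part_shift[1])

    x_min = bbox[0] - rng.randint(-w_max_shift, w_max_shift)
    x_max = bbox[2] + rng.randint(-w_max_shift, w_max_shift)
    y_min = bbox[1] - rng.randint(-h_max_shift, h_max_shift)
    y_max = bbox[3] + rng.randint(-h_max_shift, h_max_shift)
    return clip_bbox((x_min, y_min, x_max, y_max), height, width)


# Same box and shift limits as the docstring example.
print(crop_near_bbox([0, 5, 10, 20], height=100, width=100, max_part_shift=(0.1, 0.5)))
```

Each side moves independently, so the crop can be larger or smaller than the box, not only shifted.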

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[float, ...]]:
    bbox = data[self.cropping_bbox_key]

    height, width = params["shape"][:2]

    bbox = self._clip_bbox(bbox, height, width)

    h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
    w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])

    x_min = bbox[0] - random.randint(-w_max_shift, w_max_shift)
    x_max = bbox[2] + random.randint(-w_max_shift, w_max_shift)

    y_min = bbox[1] - random.randint(-h_max_shift, h_max_shift)
    y_max = bbox[3] + random.randint(-h_max_shift, h_max_shift)

    crop_coords = self._clip_bbox((x_min, y_min, x_max, y_max), height, width)

    if crop_coords[0] == crop_coords[2] or crop_coords[1] == crop_coords[3]:
        crop_coords = fcrops.get_center_crop_coords(height, width, bbox[3] - bbox[1], bbox[2] - bbox[0])

    return {"crop_coords": crop_coords}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "max_part_shift", "cropping_bbox_key"

`class RandomResizedCrop` `(size=None, width=None, height=None, *, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1, always_apply=None, p=1.0)` ¶

Torchvision's variant of cropping a random part of the input and rescaling it to a given size.

Parameters:

Name	Type	Description
`size`	`(int, int)`	Expected output size of the crop as (height, width). In the source below, an int `size` is accepted only together with the deprecated `width` argument, giving (size, width); an int `size` on its own raises the TypeError shown in `InitSchema.process`.
`scale`	`float, float`	Specifies the lower and upper bounds for the random area of the crop, before resizing. The scale is defined with respect to the area of the original image.
`ratio`	`float, float`	lower and upper bounds for the random aspect ratio of the crop, before resizing.
`interpolation`	`OpenCV flag`	flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
`p`	`float`	probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
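To make the roles of `scale` and `ratio` concrete, the following stand-alone sketch reproduces the sampling loop from the source listing below using only the standard library. The function name `sample_crop_size` and its `attempts` and `rng` parameters are hypothetical. Returning `None` stands in for the point where the library falls back to a central crop.

```python
import math
import random


def sample_crop_size(image_height, image_width, scale=(0.08, 1.0),
                     ratio=(0.75, 4 / 3), attempts=10, rng=random):
    """Return (height, width) of a crop, or None if no attempt fits the image."""
    area = image_height * image_width
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        # `scale` bounds the crop area as a fraction of the image area.
        target_area = rng.uniform(*scale) * area
        # Sampling in log space: with the default range, r and 1/r are equally likely.
        aspect_ratio = math.exp(rng.uniform(*log_ratio))
        width = int(round(math.sqrt(target_area * aspect_ratio)))
        height = int(round(math.sqrt(target_area / aspect_ratio)))
        if 0 < width <= image_width and 0 < height <= image_height:
            return height, width
    return None


print(sample_crop_size(200, 300))
```

The sampled crop is then resized to `size`, so `scale` and `ratio` describe the crop before resizing, not the output.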

Source code in albumentations/augmentations/crops/transforms.py

Python

class RandomResizedCrop(_BaseRandomSizedCrop):
    """Torchvision's variant of crop a random part of the input and rescale it to some size.

    Args:
        size (int, int): expected output size of the crop, for each edge. If size is an int instead of sequence
            like (height, width), a square output size (size, size) is made. If provided a sequence of length 1,
            it will be interpreted as (size[0], size[0]).
        scale ((float, float)): Specifies the lower and upper bounds for the random area of the crop, before resizing.
            The scale is defined with respect to the area of the original image.
        ratio ((float, float)): lower and upper bounds for the random aspect ratio of the crop, before resizing.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        scale: Annotated[tuple[float, float], AfterValidator(check_01)] = (0.08, 1.0)
        ratio: Annotated[tuple[float, float], AfterValidator(check_0plus)] = (0.75, 1.3333333333333333)
        width: int | None = Field(
            None,
            deprecated="Initializing with 'height' and 'width' is deprecated. Use size instead.",
        )
        height: int | None = Field(
            None,
            deprecated="Initializing with 'height' and 'width' is deprecated. Use size instead.",
        )
        size: ScaleIntType | None = None
        p: ProbabilityType = 1
        interpolation: InterpolationType = cv2.INTER_LINEAR

        @model_validator(mode="after")
        def process(self) -> Self:
            if isinstance(self.size, int):
                if isinstance(self.width, int):
                    self.size = (self.size, self.width)
                else:
                    msg = "If size is an integer, width as integer must be specified."
                    raise TypeError(msg)

            if self.size is None:
                if self.height is None or self.width is None:
                    message = "If 'size' is not provided, both 'height' and 'width' must be specified."
                    raise ValueError(message)
                self.size = (self.height, self.width)

            return self

    def __init__(
        self,
        # NOTE @zetyquickly: when (width, height) are deprecated, make 'size' non optional
        size: ScaleIntType | None = None,
        width: int | None = None,
        height: int | None = None,
        *,
        scale: tuple[float, float] = (0.08, 1.0),
        ratio: tuple[float, float] = (0.75, 1.3333333333333333),
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(size=cast(Tuple[int, int], size), interpolation=interpolation, p=p, always_apply=always_apply)
        self.scale = scale
        self.ratio = ratio

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        image_height, image_width = params["shape"][:2]
        area = image_height * image_width

        for _ in range(10):
            target_area = random.uniform(*self.scale) * area
            log_ratio = (math.log(self.ratio[0]), math.log(self.ratio[1]))
            aspect_ratio = math.exp(random.uniform(*log_ratio))

            width = int(round(math.sqrt(target_area * aspect_ratio)))
            height = int(round(math.sqrt(target_area / aspect_ratio)))

            if 0 < width <= image_width and 0 < height <= image_height:
                i = random.randint(0, image_height - height)
                j = random.randint(0, image_width - width)

                h_start = i * 1.0 / (image_height - height + 1e-10)
                w_start = j * 1.0 / (image_width - width + 1e-10)

                crop_coords = fcrops.get_crop_coords(image_height, image_width, height, width, h_start, w_start)

                return {"crop_coords": crop_coords}

        # Fallback to central crop
        in_ratio = image_width / image_height
        if in_ratio < min(self.ratio):
            width = image_width
            height = int(round(image_width / min(self.ratio)))
        elif in_ratio > max(self.ratio):
            height = image_height
            width = int(round(height * max(self.ratio)))
        else:  # whole image
            width = image_width
            height = image_height

        i = (image_height - height) // 2
        j = (image_width - width) // 2

        h_start = i * 1.0 / (image_height - height + 1e-10)
        w_start = j * 1.0 / (image_width - width + 1e-10)

        crop_coords = fcrops.get_crop_coords(image_height, image_width, height, width, h_start, w_start)

        return {"crop_coords": crop_coords}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "size", "scale", "ratio", "interpolation"

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    image_height, image_width = params["shape"][:2]
    area = image_height * image_width

    for _ in range(10):
        target_area = random.uniform(*self.scale) * area
        log_ratio = (math.log(self.ratio[0]), math.log(self.ratio[1]))
        aspect_ratio = math.exp(random.uniform(*log_ratio))

        width = int(round(math.sqrt(target_area * aspect_ratio)))
        height = int(round(math.sqrt(target_area / aspect_ratio)))

        if 0 < width <= image_width and 0 < height <= image_height:
            i = random.randint(0, image_height - height)
            j = random.randint(0, image_width - width)

            h_start = i * 1.0 / (image_height - height + 1e-10)
            w_start = j * 1.0 / (image_width - width + 1e-10)

            crop_coords = fcrops.get_crop_coords(image_height, image_width, height, width, h_start, w_start)

            return {"crop_coords": crop_coords}

    # Fallback to central crop
    in_ratio = image_width / image_height
    if in_ratio < min(self.ratio):
        width = image_width
        height = int(round(image_width / min(self.ratio)))
    elif in_ratio > max(self.ratio):
        height = image_height
        width = int(round(height * max(self.ratio)))
    else:  # whole image
        width = image_width
        height = image_height

    i = (image_height - height) // 2
    j = (image_width - width) // 2

    h_start = i * 1.0 / (image_height - height + 1e-10)
    w_start = j * 1.0 / (image_width - width + 1e-10)

    crop_coords = fcrops.get_crop_coords(image_height, image_width, height, width, h_start, w_start)

    return {"crop_coords": crop_coords}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "size", "scale", "ratio", "interpolation"

`class RandomSizedBBoxSafeCrop` `(height, width, erosion_rate=0.0, interpolation=1, always_apply=None, p=1.0)` ¶

Crop a random part of the input and rescale it to the given height and width without losing any bounding boxes.

Parameters:

Name	Type	Description
`height`	`int`	height after crop and resize.
`width`	`int`	width after crop and resize.
`erosion_rate`	`float`	erosion rate applied on input image height before crop.
`interpolation`	`OpenCV flag`	flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
`p`	`float`	probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
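Because the crop is followed by a resize, point coordinates have to be shifted and then scaled. The sketch below shows that arithmetic for a single (x, y) point, following the `apply_to_keypoint` method in the source listing below. The helper name `crop_and_resize_point` is hypothetical, and the angle and scale components of a keypoint are ignored here.

```python
def crop_and_resize_point(x, y, crop_coords, out_height, out_width):
    """Map a point from the original image into the cropped and resized image."""
    x_min, y_min, x_max, y_max = crop_coords
    # Scale factors are the output size divided by the crop size on each axis.
    scale_x = out_width / (x_max - x_min)
    scale_y = out_height / (y_max - y_min)
    # Shift into the crop's frame first, then apply the resize scale.
    return (x - x_min) * scale_x, (y - y_min) * scale_y


# A 100x50 crop resized to width 200 and height 100 doubles both axes.
print(crop_and_resize_point(30, 40, (10, 20, 110, 70), out_height=100, out_width=200))
# → (40.0, 40.0)
```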

Source code in albumentations/augmentations/crops/transforms.py

Python

class RandomSizedBBoxSafeCrop(BBoxSafeRandomCrop):
    """Crop a random part of the input and rescale it to some size without loss of bboxes.

    Args:
        height: height after crop and resize.
        width: width after crop and resize.
        erosion_rate: erosion rate applied on input image height before crop.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(CropInitSchema):
        erosion_rate: float = Field(
            default=0.0,
            ge=0.0,
            le=1.0,
            description="Erosion rate applied on input image height before crop.",
        )
        interpolation: InterpolationType = cv2.INTER_LINEAR

    def __init__(
        self,
        height: int,
        width: int,
        erosion_rate: float = 0.0,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(erosion_rate=erosion_rate, p=p, always_apply=always_apply)
        self.height = height
        self.width = width
        self.interpolation = interpolation

    def apply(
        self,
        img: np.ndarray,
        crop_coords: tuple[int, int, int, int],
        **params: Any,
    ) -> np.ndarray:
        crop = fcrops.crop(img, *crop_coords)
        return fgeometric.resize(crop, self.height, self.width, self.interpolation)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        crop_coords: tuple[int, int, int, int],
        **params: Any,
    ) -> KeypointInternalType:
        keypoint = fcrops.crop_keypoint_by_coords(keypoint, crop_coords)

        crop_height = crop_coords[3] - crop_coords[1]
        crop_width = crop_coords[2] - crop_coords[0]

        scale_y = self.height / crop_height
        scale_x = self.width / crop_width
        return fgeometric.keypoint_scale(keypoint, scale_x=scale_x, scale_y=scale_y)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (*super().get_transform_init_args_names(), "height", "width", "interpolation")

`apply (self, img, crop_coords, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/crops/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    crop_coords: tuple[int, int, int, int],
    **params: Any,
) -> np.ndarray:
    crop = fcrops.crop(img, *crop_coords)
    return fgeometric.resize(crop, self.height, self.width, self.interpolation)

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (*super().get_transform_init_args_names(), "height", "width", "interpolation")

`class RandomSizedCrop` `(min_max_height, size=None, width=None, height=None, *, w2h_ratio=1.0, interpolation=1, always_apply=None, p=1.0)` ¶

Crop a random portion of the input and rescale it to a specific size.

Parameters:

Name	Type	Description
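`min_max_height`	`int, int`	lower and upper bounds for the height of the crop, before resizing.
`size`	`int, int`	target size for the output image, i.e. (height, width) after crop and resize
`w2h_ratio`	`float`	width-to-height aspect ratio of the crop; the crop width is the sampled crop height multiplied by this value.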
`interpolation`	`OpenCV flag`	flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
`p`	`float`	probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
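As a rough sketch of how `min_max_height` and `w2h_ratio` determine the crop, the snippet below follows `get_params_dependent_on_data` from the source listing below using only the standard library. The name `sample_sized_crop` is hypothetical, and the placement formula is an assumption about what `fcrops.get_crop_coords` does, since that helper is not shown on this page.

```python
import random


def sample_sized_crop(image_height, image_width, min_max_height, w2h_ratio=1.0, rng=random):
    """Return (x_min, y_min, x_max, y_max) for a randomly sized and placed crop."""
    crop_height = rng.randint(min_max_height[0], min_max_height[1])
    crop_width = int(crop_height * w2h_ratio)

    h_start = rng.random()
    w_start = rng.random()
    # Assumed placement rule: the start fractions pick an offset among all valid positions.
    y_min = int((image_height - crop_height + 1) * h_start)
    x_min = int((image_width - crop_width + 1) * w_start)
    return x_min, y_min, x_min + crop_width, y_min + crop_height


print(sample_sized_crop(100, 200, min_max_height=(50, 80), w2h_ratio=1.5))
```

The sketch assumes the sampled crop fits inside the image; it does not guard against `min_max_height` or `w2h_ratio` values that make the crop larger than the input.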

Source code in albumentations/augmentations/crops/transforms.py

Python

class RandomSizedCrop(_BaseRandomSizedCrop):
    """Crop a random portion of the input and rescale it to a specific size.

    Args:
        min_max_height ((int, int)): crop size limits.
        size ((int, int)): target size for the output image, i.e. (height, width) after crop and resize
        w2h_ratio (float): aspect ratio of crop.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        interpolation: InterpolationType = cv2.INTER_LINEAR
        p: ProbabilityType = 1
        min_max_height: OnePlusIntRangeType
        w2h_ratio: Annotated[float, Field(gt=0, description="Aspect ratio of crop.")]
        width: int | None = Field(
            None,
            deprecated=(
                "Initializing with 'size' as an integer and a separate 'width' is deprecated. "
                "Please use a tuple (height, width) for the 'size' argument."
            ),
        )
        height: int | None = Field(
            None,
            deprecated=(
                "Initializing with 'height' and 'width' is deprecated. "
                "Please use a tuple (height, width) for the 'size' argument."
            ),
        )
        size: ScaleIntType | None = None

        @model_validator(mode="after")
        def process(self) -> Self:
            if isinstance(self.size, int):
                if isinstance(self.width, int):
                    self.size = (self.size, self.width)
                else:
                    msg = "If size is an integer, width as integer must be specified."
                    raise TypeError(msg)

            if self.size is None:
                if self.height is None or self.width is None:
                    message = "If 'size' is not provided, both 'height' and 'width' must be specified."
                    raise ValueError(message)
                self.size = (self.height, self.width)
            return self

    def __init__(
        self,
        min_max_height: tuple[int, int],
        # NOTE @zetyquickly: when (width, height) are deprecated, make 'size' non optional
        size: ScaleIntType | None = None,
        width: int | None = None,
        height: int | None = None,
        *,
        w2h_ratio: float = 1.0,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(size=cast(Tuple[int, int], size), interpolation=interpolation, p=p, always_apply=always_apply)
        self.min_max_height = min_max_height
        self.w2h_ratio = w2h_ratio

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        image_height, image_width = params["shape"][:2]

        crop_height = random.randint(self.min_max_height[0], self.min_max_height[1])
        crop_width = int(crop_height * self.w2h_ratio)

        h_start = random.random()
        w_start = random.random()

        crop_coords = fcrops.get_crop_coords(image_height, image_width, crop_height, crop_width, h_start, w_start)

        return {"crop_coords": crop_coords}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "min_max_height", "size", "w2h_ratio", "interpolation"

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    image_height, image_width = params["shape"][:2]

    crop_height = random.randint(self.min_max_height[0], self.min_max_height[1])
    crop_width = int(crop_height * self.w2h_ratio)

    h_start = random.random()
    w_start = random.random()

    crop_coords = fcrops.get_crop_coords(image_height, image_width, crop_height, crop_width, h_start, w_start)

    return {"crop_coords": crop_coords}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "min_max_height", "size", "w2h_ratio", "interpolation"

`domain_adaptation` ¶

`class FDA` `(reference_images, beta_limit=(0, 0.1), read_fn=read_rgb_image, always_apply=None, p=0.5)` ¶

Fourier Domain Adaptation (FDA) for simple "style transfer" in the context of unsupervised domain adaptation (UDA). FDA manipulates the frequency components of images to reduce the domain gap between source and target datasets, effectively adapting images from one domain to closely resemble those from another without altering their semantic content.

This transform is particularly beneficial in scenarios where the training (source) and testing (target) images come from different distributions, such as synthetic versus real images, or day versus night scenes. Unlike traditional domain adaptation methods that may require complex adversarial training, FDA achieves domain alignment by swapping low-frequency components of the Fourier transform between the source and target images. This technique has been shown to improve the performance of models on the target domain, particularly for tasks like semantic segmentation, without additional training for domain invariance.

The 'beta_limit' parameter controls the extent of frequency component swapping, with lower values preserving more of the original image's characteristics and higher values leading to more pronounced adaptation effects. It is recommended to use beta values less than 0.3 to avoid introducing artifacts.
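To give a feel for what a given beta means, the sketch below computes the size of the swapped low-frequency window in the centered spectrum. It assumes the window rule used by the reference implementation at https://github.com/YanchaoYang/FDA (a square of half-width `floor(min(h, w) * beta)` around the spectrum center); the library's `fourier_domain_adaptation` function is not shown on this page and may differ in detail. The name `low_freq_window` is hypothetical.

```python
import math


def low_freq_window(height, width, beta):
    """Return (y1, y2, x1, x2) slice bounds of the swapped block in the shifted spectrum."""
    # Assumed rule: the half-width grows linearly with beta and the shorter image side.
    border = math.floor(min(height, width) * beta)
    center_y, center_x = height // 2, width // 2
    return center_y - border, center_y + border + 1, center_x - border, center_x + border + 1


# With beta = 0 only the single center coefficient is swapped.
print(low_freq_window(100, 100, 0.0))   # → (50, 51, 50, 51)
# With beta = 0.25 roughly a quarter of the spectrum area is swapped.
print(low_freq_window(100, 100, 0.25))  # → (25, 76, 25, 76)
```

Under this rule the swapped area grows with the square of beta, which is consistent with the recommendation to keep beta small.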

Parameters:

Name	Type	Description
`reference_images`	`Sequence[Any]`	Sequence of objects to be converted into images by `read_fn`. This typically involves paths to images that serve as target domain examples for adaptation.
`beta_limit`	`float or tuple of float`	Coefficient beta from the paper, controlling the swapping extent of frequency components. Values should be less than 0.5.
`read_fn`	`Callable`	User-defined function for reading images. It takes an element from `reference_images` and returns a numpy array of image pixels. By default, it is expected to take a path to an image and return a numpy array.

Targets

image

Image types: uint8, float32

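Reference

- https://github.com/YanchaoYang/FDA
- https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_FDA_Fourier_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2020_paper.pdf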

Examples:

Python

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
>>> result = aug(image=image)

Note

FDA is a powerful tool for domain adaptation, particularly in unsupervised settings where annotated target domain samples are unavailable. It enables significant improvements in model generalization by aligning the low-level statistics of source and target images through a simple yet effective Fourier-based method.

Source code in albumentations/augmentations/domain_adaptation.py

Python

class FDA(ImageOnlyTransform):
    """Fourier Domain Adaptation (FDA) for simple "style transfer" in the context of unsupervised domain adaptation
    (UDA). FDA manipulates the frequency components of images to reduce the domain gap between source
    and target datasets, effectively adapting images from one domain to closely resemble those from another without
    altering their semantic content.

    This transform is particularly beneficial in scenarios where the training (source) and testing (target) images
    come from different distributions, such as synthetic versus real images, or day versus night scenes.
    Unlike traditional domain adaptation methods that may require complex adversarial training, FDA achieves domain
    alignment by swapping low-frequency components of the Fourier transform between the source and target images.
    This technique has shown to improve the performance of models on the target domain, particularly for tasks
    like semantic segmentation, without additional training for domain invariance.

    The 'beta_limit' parameter controls the extent of frequency component swapping, with lower values preserving more
    of the original image's characteristics and higher values leading to more pronounced adaptation effects.
    It is recommended to use beta values less than 0.3 to avoid introducing artifacts.

    Args:
        reference_images (Sequence[Any]): Sequence of objects to be converted into images by `read_fn`. This typically
            involves paths to images that serve as target domain examples for adaptation.
        beta_limit (float or tuple of float): Coefficient beta from the paper, controlling the swapping extent of
            frequency components. Values should be less than 0.5.
        read_fn (Callable): User-defined function for reading images. It takes an element from `reference_images` and
            returns a numpy array of image pixels. By default, it is expected to take a path to an image and return a
            numpy array.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        - https://github.com/YanchaoYang/FDA
        - https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_FDA_Fourier_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2020_paper.pdf

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
        >>> result = aug(image=image)

    Note:
        FDA is a powerful tool for domain adaptation, particularly in unsupervised settings where annotated target
        domain samples are unavailable. It enables significant improvements in model generalization by aligning
        the low-level statistics of source and target images through a simple yet effective Fourier-based method.
    """

    class InitSchema(BaseTransformInitSchema):
        reference_images: Sequence[Any]
        read_fn: Callable[[Any], np.ndarray]
        beta_limit: NonNegativeFloatRangeType = (0, 0.1)

        @field_validator("beta_limit")
        @classmethod
        def check_ranges(cls, value: tuple[float, float]) -> tuple[float, float]:
            bounds = 0, MAX_BETA_LIMIT
            if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
                raise ValueError(f"Values should be in the range {bounds} got {value} ")
            return value

    def __init__(
        self,
        reference_images: Sequence[Any],
        beta_limit: ScaleFloatType = (0, 0.1),
        read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.reference_images = reference_images
        self.read_fn = read_fn
        self.beta_limit = cast(Tuple[float, float], beta_limit)

    def apply(
        self,
        img: np.ndarray,
        target_image: np.ndarray,
        beta: float,
        **params: Any,
    ) -> np.ndarray:
        return fourier_domain_adaptation(img, target_image, beta)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
        target_img = self.read_fn(random.choice(self.reference_images))
        target_img = cv2.resize(target_img, dsize=(params["shape"][1], params["shape"][0]))

        return {"target_image": target_img}

    def get_params(self) -> dict[str, float]:
        return {"beta": random.uniform(self.beta_limit[0], self.beta_limit[1])}

    def get_transform_init_args_names(self) -> tuple[str, str, str]:
        return "reference_images", "beta_limit", "read_fn"

    def to_dict_private(self) -> dict[str, Any]:
        msg = "FDA can not be serialized."
        raise NotImplementedError(msg)

`apply (self, img, target_image, beta, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/domain_adaptation.py

Python

def apply(
    self,
    img: np.ndarray,
    target_image: np.ndarray,
    beta: float,
    **params: Any,
) -> np.ndarray:
    return fourier_domain_adaptation(img, target_image, beta)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/domain_adaptation.py

Python

def get_params(self) -> dict[str, float]:
    return {"beta": random.uniform(self.beta_limit[0], self.beta_limit[1])}

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/domain_adaptation.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
    target_img = self.read_fn(random.choice(self.reference_images))
    target_img = cv2.resize(target_img, dsize=(params["shape"][1], params["shape"][0]))

    return {"target_image": target_img}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/domain_adaptation.py

Python

def get_transform_init_args_names(self) -> tuple[str, str, str]:
    return "reference_images", "beta_limit", "read_fn"

`class HistogramMatching` `(reference_images, blend_ratio=(0.5, 1.0), read_fn=read_rgb_image, always_apply=None, p=0.5)` [view source on GitHub] ¶

Implements histogram matching, a technique that adjusts the pixel values of an input image to match the histogram of a reference image. This adjustment ensures that the output image has a similar tone and contrast to the reference. The process is applied independently to each channel of multi-channel images, provided both the input and reference images have the same number of channels.

Histogram matching serves as an effective normalization method in image processing tasks such as feature matching. It is particularly useful when images originate from varied sources or are captured under different lighting conditions, helping to standardize the images' appearance before further processing.

Parameters:

Name	Type	Description
`reference_images`	`Sequence[Any]`	A sequence of objects to be converted into images by `read_fn`. Typically, this is a sequence of image paths.
`blend_ratio`	`tuple[float, float]`	Specifies the minimum and maximum blend ratio for blending the matched image with the original image. A random blend factor within this range is chosen for each image to increase the diversity of the output images.
`read_fn`	`Callable[[Any], np.ndarray]`	A user-defined function for reading images, which accepts an element from `reference_images` and returns a numpy array of image pixels. By default, this is expected to take a file path and return an image as a numpy array.
`p`	`float`	The probability of applying the transform to any given image. Defaults to 0.5.

Targets

image

Image types: uint8, float32

Note

This class cannot be serialized directly due to its dynamic nature and dependency on external image data. An attempt to serialize it will raise a NotImplementedError.

Reference

https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_histogram_matching.html
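
The matching and blending described for this transform can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: the helper names `match_channel` and `match_and_blend` are hypothetical, the sketch assumes uint8 images with the same number of channels, and the blend direction (`blend_ratio` weighting the matched image against the original) is an assumption.

```python
import numpy as np


def match_channel(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Remap one channel so its empirical CDF follows the reference CDF."""
    src_values, src_index, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution of each channel.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For every distinct source value, look up the reference value
    # that sits at the same quantile.
    matched_values = np.interp(src_cdf, ref_cdf, ref_values)
    return matched_values[src_index].reshape(source.shape)


def match_and_blend(img: np.ndarray, reference: np.ndarray, blend_ratio: float) -> np.ndarray:
    """Match each channel independently, then blend with the original (assumed direction)."""
    matched = np.stack(
        [match_channel(img[..., c], reference[..., c]) for c in range(img.shape[-1])],
        axis=-1,
    )
    blended = blend_ratio * matched + (1.0 - blend_ratio) * img.astype(np.float64)
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)
```

In this sketch, `blend_ratio=0.0` returns the input unchanged and `blend_ratio=1.0` returns the fully matched image; values in between mix the two.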

Examples:

Python

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.HistogramMatching([target_image], p=1, read_fn=lambda x: x)])
>>> result = aug(image=image)

Source code in albumentations/augmentations/domain_adaptation.py

Python

class HistogramMatching(ImageOnlyTransform):
    """Implements histogram matching, a technique that adjusts the pixel values of an input image
    to match the histogram of a reference image. This adjustment ensures that the output image
    has a similar tone and contrast to the reference. The process is applied independently to
    each channel of multi-channel images, provided both the input and reference images have the
    same number of channels.

    Histogram matching serves as an effective normalization method in image processing tasks such
    as feature matching. It is particularly useful when images originate from varied sources or are
    captured under different lighting conditions, helping to standardize the images' appearance
    before further processing.

    Args:
        reference_images (Sequence[Any]): A sequence of objects to be converted into images by `read_fn`.
            Typically, this is a sequence of image paths.
        blend_ratio (tuple[float, float]): Specifies the minimum and maximum blend ratio for blending the matched
            image with the original image. A random blend factor within this range is chosen for each image to
            increase the diversity of the output images.
        read_fn (Callable[[Any], np.ndarray]): A user-defined function for reading images, which accepts an
            element from `reference_images` and returns a numpy array of image pixels. By default, this is expected
            to take a file path and return an image as a numpy array.
        p (float): The probability of applying the transform to any given image. Defaults to 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        This class cannot be serialized directly due to its dynamic nature and dependency on external image data.
        An attempt to serialize it will raise a NotImplementedError.

    Reference:
        https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_histogram_matching.html

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> aug = A.Compose([A.HistogramMatching([target_image], p=1, read_fn=lambda x: x)])
        >>> result = aug(image=image)
    """

    class InitSchema(BaseTransformInitSchema):
        reference_images: Sequence[Any]
        blend_ratio: ZeroOneRangeType = (0.5, 1.0)
        read_fn: Callable[[Any], np.ndarray]

    def __init__(
        self,
        reference_images: Sequence[Any],
        blend_ratio: tuple[float, float] = (0.5, 1.0),
        read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.reference_images = reference_images
        self.read_fn = read_fn
        self.blend_ratio = blend_ratio

    def apply(
        self,
        img: np.ndarray,
        reference_image: np.ndarray,
        blend_ratio: float,
        **params: Any,
    ) -> np.ndarray:
        return apply_histogram(img, reference_image, blend_ratio)

    def get_params(self) -> dict[str, np.ndarray]:
        return {
            "reference_image": self.read_fn(random.choice(self.reference_images)),
            "blend_ratio": random.uniform(self.blend_ratio[0], self.blend_ratio[1]),
        }

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "reference_images", "blend_ratio", "read_fn"

    def to_dict_private(self) -> dict[str, Any]:
        msg = "HistogramMatching can not be serialized."
        raise NotImplementedError(msg)

`apply (self, img, reference_image, blend_ratio, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/domain_adaptation.py

Python

def apply(
    self,
    img: np.ndarray,
    reference_image: np.ndarray,
    blend_ratio: float,
    **params: Any,
) -> np.ndarray:
    return apply_histogram(img, reference_image, blend_ratio)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/domain_adaptation.py

Python

def get_params(self) -> dict[str, np.ndarray]:
    return {
        "reference_image": self.read_fn(random.choice(self.reference_images)),
        "blend_ratio": random.uniform(self.blend_ratio[0], self.blend_ratio[1]),
    }

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/domain_adaptation.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "reference_images", "blend_ratio", "read_fn"

`class PixelDistributionAdaptation` `(reference_images, blend_ratio=(0.25, 1.0), read_fn=read_rgb_image, transform_type='pca', always_apply=None, p=0.5)` [view source on GitHub] ¶

Performs pixel-level domain adaptation by aligning the pixel value distribution of an input image with that of a reference image. This process involves fitting a simple statistical transformation (such as PCA, StandardScaler, or MinMaxScaler) to both the original and the reference images, transforming the original image with the transformation trained on it, and then applying the inverse transformation using the transform fitted on the reference image. The result is an adapted image that retains the original content while mimicking the pixel value distribution of the reference domain.

The process can be visualized as two main steps:

1. Adjusting the original image to a standard distribution space using a selected transform.
2. Moving the adjusted image into the distribution space of the reference image by applying the inverse of the transform fitted on the reference image.

This technique is especially useful in scenarios where images from different domains (e.g., synthetic vs. real images, day vs. night scenes) need to be harmonized for better consistency or performance in image processing tasks.

Parameters:

Name	Type	Description
`reference_images`	`Sequence[Any]`	A sequence of objects (typically image paths) that will be converted into images by `read_fn`. These images serve as references for the domain adaptation.
`blend_ratio`	`tuple[float, float]`	Specifies the minimum and maximum blend ratio for mixing the adapted image with the original, enhancing the diversity of the output images.
`read_fn`	`Callable`	A user-defined function for reading and converting the objects in `reference_images` into numpy arrays. By default, it assumes these objects are image paths.
`transform_type`	`str`	Specifies the type of statistical transformation to apply. Supported values are "pca" for Principal Component Analysis, "standard" for StandardScaler, and "minmax" for MinMaxScaler.
`p`	`float`	The probability of applying the transform to any given image. Default is 0.5.

Targets

image

Image types: uint8, float32

Reference

For more information on the underlying approach, see: https://github.com/arsenyinfo/qudida

Note

The PixelDistributionAdaptation transform is a novel way to perform domain adaptation at the pixel level, suitable for adjusting images across different conditions without complex modeling. It is effective for preparing images before more advanced processing or analysis.
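
The two steps described for this transform can be sketched in plain NumPy for the "standard" case. This is an illustrative sketch under stated assumptions, not the library's code: `adapt_standard` is a hypothetical helper, it standardizes each channel with its own mean and standard deviation instead of using a fitted scaler object, it expects 3-channel uint8 images, and it assumes the blend ratio weights the adapted image against the original.

```python
import numpy as np


def adapt_standard(img: np.ndarray, ref: np.ndarray, weight: float) -> np.ndarray:
    """Move per-channel mean/std of `img` toward those of `ref`, then blend."""
    # Flatten both images to (num_pixels, 3) in the [0, 1] range.
    src = img.reshape(-1, 3).astype(np.float64) / 255.0
    tgt = ref.reshape(-1, 3).astype(np.float64) / 255.0

    src_mean, src_std = src.mean(axis=0), src.std(axis=0)
    tgt_mean, tgt_std = tgt.mean(axis=0), tgt.std(axis=0)
    src_std = np.where(src_std == 0, 1.0, src_std)  # avoid division by zero

    # Step 1: bring the source pixels into a standard distribution space.
    standardized = (src - src_mean) / src_std
    # Step 2: apply the inverse of the transform fitted on the reference.
    adapted = standardized * tgt_std + tgt_mean
    adapted = np.clip(adapted, 0.0, 1.0).reshape(img.shape) * 255.0

    # Blend the adapted image with the original (assumed direction).
    blended = weight * adapted + (1.0 - weight) * img.astype(np.float64)
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)
```

With `weight=0.0` the sketch returns the original image; with `weight=1.0` the per-channel mean of the output closely follows that of the reference, as long as little clipping occurs.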

Source code in albumentations/augmentations/domain_adaptation.py

Python

class PixelDistributionAdaptation(ImageOnlyTransform):
    """Performs pixel-level domain adaptation by aligning the pixel value distribution of an input image
    with that of a reference image. This process involves fitting a simple statistical transformation
    (such as PCA, StandardScaler, or MinMaxScaler) to both the original and the reference images,
    transforming the original image with the transformation trained on it, and then applying the inverse
    transformation using the transform fitted on the reference image. The result is an adapted image
    that retains the original content while mimicking the pixel value distribution of the reference domain.

    The process can be visualized as two main steps:
    1. Adjusting the original image to a standard distribution space using a selected transform.
    2. Moving the adjusted image into the distribution space of the reference image by applying the inverse
       of the transform fitted on the reference image.

    This technique is especially useful in scenarios where images from different domains (e.g., synthetic
    vs. real images, day vs. night scenes) need to be harmonized for better consistency or performance in
    image processing tasks.

    Args:
        reference_images (Sequence[Any]): A sequence of objects (typically image paths) that will be
            converted into images by `read_fn`. These images serve as references for the domain adaptation.
        blend_ratio (tuple[float, float]): Specifies the minimum and maximum blend ratio for mixing
            the adapted image with the original, enhancing the diversity of the output images.
        read_fn (Callable): A user-defined function for reading and converting the objects in
            `reference_images` into numpy arrays. By default, it assumes these objects are image paths.
        transform_type (str): Specifies the type of statistical transformation to apply. Supported values
            are "pca" for Principal Component Analysis, "standard" for StandardScaler, and "minmax" for
            MinMaxScaler.
        p (float): The probability of applying the transform to any given image. Default is 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        For more information on the underlying approach, see: https://github.com/arsenyinfo/qudida

    Note:
        The PixelDistributionAdaptation transform is a novel way to perform domain adaptation at the pixel level,
        suitable for adjusting images across different conditions without complex modeling. It is effective
        for preparing images before more advanced processing or analysis.
    """

    class InitSchema(BaseTransformInitSchema):
        reference_images: Sequence[Any]
        blend_ratio: ZeroOneRangeType = (0.25, 1.0)
        read_fn: Callable[[Any], np.ndarray]
        transform_type: Literal["pca", "standard", "minmax"]

    def __init__(
        self,
        reference_images: Sequence[Any],
        blend_ratio: tuple[float, float] = (0.25, 1.0),
        read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
        transform_type: Literal["pca", "standard", "minmax"] = "pca",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.reference_images = reference_images
        self.read_fn = read_fn
        self.blend_ratio = blend_ratio
        self.transform_type = transform_type

    @staticmethod
    def _validate_shape(img: np.ndarray) -> None:
        if is_grayscale_image(img) or is_multispectral_image(img):
            raise ValueError(
                f"Unexpected image shape: expected 3 dimensions, got {len(img.shape)}. "
                f"Is it a grayscale or multispectral image? It's not supported for now.",
            )

    def ensure_uint8(self, img: np.ndarray) -> tuple[np.ndarray, bool]:
        if img.dtype == np.float32:
            if img.min() < 0 or img.max() > 1:
                message = (
                    "PixelDistributionAdaptation uses uint8 under the hood, so float32 should be converted. "
                    "Can not do it automatically when the image is out of [0..1] range."
                )
                raise TypeError(message)
            return clip(img * 255, np.uint8), True
        return img, False

    def apply(self, img: np.ndarray, reference_image: np.ndarray, blend_ratio: float, **params: Any) -> np.ndarray:
        self._validate_shape(img)
        reference_image, _ = self.ensure_uint8(reference_image)
        img, needs_reconvert = self.ensure_uint8(img)

        adapted = adapt_pixel_distribution(
            img,
            ref=reference_image,
            weight=blend_ratio,
            transform_type=self.transform_type,
        )

        return fmain.to_float(adapted) if needs_reconvert else adapted

    def get_params(self) -> dict[str, Any]:
        return {
            "reference_image": self.read_fn(random.choice(self.reference_images)),
            "blend_ratio": random.uniform(self.blend_ratio[0], self.blend_ratio[1]),
        }

    def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
        return "reference_images", "blend_ratio", "read_fn", "transform_type"

    def to_dict_private(self) -> dict[str, Any]:
        msg = "PixelDistributionAdaptation can not be serialized."
        raise NotImplementedError(msg)

`apply (self, img, reference_image, blend_ratio, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/domain_adaptation.py

Python

def apply(self, img: np.ndarray, reference_image: np.ndarray, blend_ratio: float, **params: Any) -> np.ndarray:
    self._validate_shape(img)
    reference_image, _ = self.ensure_uint8(reference_image)
    img, needs_reconvert = self.ensure_uint8(img)

    adapted = adapt_pixel_distribution(
        img,
        ref=reference_image,
        weight=blend_ratio,
        transform_type=self.transform_type,
    )

    return fmain.to_float(adapted) if needs_reconvert else adapted

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/domain_adaptation.py

Python

def get_params(self) -> dict[str, Any]:
    return {
        "reference_image": self.read_fn(random.choice(self.reference_images)),
        "blend_ratio": random.uniform(self.blend_ratio[0], self.blend_ratio[1]),
    }

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/domain_adaptation.py

Python

def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
    return "reference_images", "blend_ratio", "read_fn", "transform_type"

`domain_adaptation_functional` ¶

`class DomainAdapter` `(transformer, ref_img, color_conversions=(None, None))` [view source on GitHub] ¶

Source: https://github.com/arsenyinfo/qudida by Arseny Kravchenko

Source code in albumentations/augmentations/domain_adaptation_functional.py

Python

class DomainAdapter:
    """Source: https://github.com/arsenyinfo/qudida by Arseny Kravchenko"""

    def __init__(
        self,
        transformer: TransformerInterface,
        ref_img: np.ndarray,
        color_conversions: tuple[None, None] = (None, None),
    ):
        self.color_in, self.color_out = color_conversions
        self.source_transformer = deepcopy(transformer)
        self.target_transformer = transformer
        self.target_transformer.fit(self.flatten(ref_img))

    def to_colorspace(self, img: np.ndarray) -> np.ndarray:
        return img if self.color_in is None else cv2.cvtColor(img, self.color_in)

    def from_colorspace(self, img: np.ndarray) -> np.ndarray:
        if self.color_out is None:
            return img
        return cv2.cvtColor(clip(img, np.uint8), self.color_out)

    def flatten(self, img: np.ndarray) -> np.ndarray:
        img = self.to_colorspace(img)
        img = fmain.to_float(img)
        return img.reshape(-1, 3)

    def reconstruct(self, pixels: np.ndarray, height: int, width: int) -> np.ndarray:
        pixels = (np.clip(pixels, 0, 1) * 255).astype("uint8")
        return self.from_colorspace(pixels.reshape(height, width, 3))

    @staticmethod
    def _pca_sign(x: np.ndarray) -> np.ndarray:
        return np.sign(np.trace(x.components_))

    def __call__(self, image: np.ndarray) -> np.ndarray:
        height, width = image.shape[:2]
        pixels = self.flatten(image)
        self.source_transformer.fit(pixels)

        # dirty hack to make sure colors are not inverted
        if (
            hasattr(self.target_transformer, "components_")
            and hasattr(self.source_transformer, "components_")
            and self._pca_sign(self.target_transformer) != self._pca_sign(self.source_transformer)
        ):
            self.target_transformer.components_ *= -1

        representation = self.source_transformer.transform(pixels)
        result = self.target_transformer.inverse_transform(representation)
        return self.reconstruct(result, height, width)

`dropout` `special` ¶

`channel_dropout` ¶

`class ChannelDropout` `(channel_drop_range=(1, 1), fill_value=0, always_apply=None, p=0.5)` [view source on GitHub] ¶

Randomly drops channels in the input image.

Parameters:

Name	Type	Description
`channel_drop_range`	`tuple[int, int]`	Range from which the number of channels to drop is chosen. The upper bound must be smaller than the number of image channels.
`fill_value`	`int, float`	Pixel value for the dropped channel.
`p`	`float`	Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, uint16, uint32, float32

Source code in albumentations/augmentations/dropout/channel_dropout.py

Python

class ChannelDropout(ImageOnlyTransform):
    """Randomly drops channels in the input image.

    Args:
        channel_drop_range (int, int): range from which we choose the number of channels to drop.
        fill_value (int, float): pixel value for the dropped channel.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, uint16, uint32, float32

    """

    class InitSchema(BaseTransformInitSchema):
        channel_drop_range: OnePlusIntRangeType = (1, 1)
        fill_value: Annotated[ColorType, Field(description="Pixel value for the dropped channel.")]

    def __init__(
        self,
        channel_drop_range: tuple[int, int] = (1, 1),
        fill_value: float = 0,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.channel_drop_range = channel_drop_range
        self.fill_value = fill_value

    def apply(self, img: np.ndarray, channels_to_drop: tuple[int, ...], **params: Any) -> np.ndarray:
        return channel_dropout(img, channels_to_drop, self.fill_value)

    def get_params_dependent_on_data(self, params: Mapping[str, Any], data: Mapping[str, Any]) -> dict[str, Any]:
        image = data["image"] if "image" in data else data["images"][0]
        num_channels = get_num_channels(image)

        if num_channels == 1:
            msg = "Image has one channel. ChannelDropout is not defined."
            raise NotImplementedError(msg)

        if self.channel_drop_range[1] >= num_channels:
            msg = "Can not drop all channels in ChannelDropout."
            raise ValueError(msg)

        num_drop_channels = random.randint(self.channel_drop_range[0], self.channel_drop_range[1])

        channels_to_drop = random.sample(range(num_channels), k=num_drop_channels)

        return {"channels_to_drop": channels_to_drop}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "channel_drop_range", "fill_value"

`apply (self, img, channels_to_drop, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/dropout/channel_dropout.py

Python

def apply(self, img: np.ndarray, channels_to_drop: tuple[int, ...], **params: Any) -> np.ndarray:
    return channel_dropout(img, channels_to_drop, self.fill_value)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/dropout/channel_dropout.py

Python

def get_params_dependent_on_data(self, params: Mapping[str, Any], data: Mapping[str, Any]) -> dict[str, Any]:
    image = data["image"] if "image" in data else data["images"][0]
    num_channels = get_num_channels(image)

    if num_channels == 1:
        msg = "Image has one channel. ChannelDropout is not defined."
        raise NotImplementedError(msg)

    if self.channel_drop_range[1] >= num_channels:
        msg = "Can not drop all channels in ChannelDropout."
        raise ValueError(msg)

    num_drop_channels = random.randint(self.channel_drop_range[0], self.channel_drop_range[1])

    channels_to_drop = random.sample(range(num_channels), k=num_drop_channels)

    return {"channels_to_drop": channels_to_drop}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/dropout/channel_dropout.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "channel_drop_range", "fill_value"

`coarse_dropout` ¶

`class CoarseDropout` `(max_holes=None, max_height=None, max_width=None, min_holes=None, min_height=None, min_width=None, fill_value=0, mask_fill_value=None, num_holes_range=(1, 1), hole_height_range=(8, 8), hole_width_range=(8, 8), always_apply=None, p=0.5)` [view source on GitHub] ¶

CoarseDropout randomly drops out rectangular regions from the image and optionally, the corresponding regions in an associated mask, to simulate the occlusion and varied object sizes found in real-world settings. This transformation is an evolution of CutOut and RandomErasing, offering more flexibility in the size, number of dropout regions, and fill values.

Parameters:

Name	Type	Description
`num_holes_range`	`tuple[int, int]`	Specifies the range (minimum and maximum) of the number of rectangular regions to zero out. This allows for dynamic variation in the number of regions removed per transformation instance.
`hole_height_range`	`tuple[ScalarType, ScalarType]`	Defines the minimum and maximum heights of the dropout regions, providing variability in their vertical dimensions.
`hole_width_range`	`tuple[ScalarType, ScalarType]`	Defines the minimum and maximum widths of the dropout regions, providing variability in their horizontal dimensions.
`fill_value`	`ColorType, Literal["random"]`	Specifies the value used to fill the dropout regions. This can be a constant value, a tuple specifying pixel intensity across channels, or 'random' which fills the region with random noise.
`mask_fill_value`	`ColorType \| None`	Specifies the fill value for dropout regions in the mask. If set to `None`, the mask regions corresponding to the image dropout regions are left unchanged.

Targets

image, mask, keypoints

Image types: uint8, float32

Reference

https://arxiv.org/abs/1708.04552
https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py
https://github.com/aleju/imgaug/blob/master/imgaug/augmenters/arithmetic.py

Source code in albumentations/augmentations/dropout/coarse_dropout.py

Python

class CoarseDropout(DualTransform):
    """CoarseDropout randomly drops out rectangular regions from the image and optionally,
    the corresponding regions in an associated mask, to simulate the occlusion and
    varied object sizes found in real-world settings. This transformation is an
    evolution of CutOut and RandomErasing, offering more flexibility in the size,
    number of dropout regions, and fill values.

    Args:
        num_holes_range (tuple[int, int]): Specifies the range (minimum and maximum)
            of the number of rectangular regions to zero out. This allows for dynamic
            variation in the number of regions removed per transformation instance.
        hole_height_range (tuple[ScalarType, ScalarType]): Defines the minimum and
            maximum heights of the dropout regions, providing variability in their vertical dimensions.
        hole_width_range (tuple[ScalarType, ScalarType]): Defines the minimum and
            maximum widths of the dropout regions, providing variability in their horizontal dimensions.
        fill_value (ColorType, Literal["random"]): Specifies the value used to fill the dropout regions.
            This can be a constant value, a tuple specifying pixel intensity across channels, or 'random'
            which fills the region with random noise.
        mask_fill_value (ColorType | None): Specifies the fill value for dropout regions in the mask.
            If set to `None`, the mask regions corresponding to the image dropout regions are left unchanged.


    Targets:
        image, mask, keypoints

    Image types:
        uint8, float32

    Reference:
        https://arxiv.org/abs/1708.04552
        https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py
        https://github.com/aleju/imgaug/blob/master/imgaug/augmenters/arithmetic.py

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        min_holes: int | None = Field(
            default=None,
            ge=0,
            description="Minimum number of regions to zero out.",
        )
        max_holes: int | None = Field(
            default=8,
            ge=0,
            description="Maximum number of regions to zero out.",
        )
        num_holes_range: Annotated[tuple[int, int], AfterValidator(check_1plus), AfterValidator(nondecreasing)] = (1, 1)

        min_height: ScalarType | None = Field(
            default=None,
            ge=0,
            description="Minimum height of the hole.",
        )
        max_height: ScalarType | None = Field(
            default=8,
            ge=0,
            description="Maximum height of the hole.",
        )
        hole_height_range: tuple[ScalarType, ScalarType] = (8, 8)

        min_width: ScalarType | None = Field(
            default=None,
            ge=0,
            description="Minimum width of the hole.",
        )
        max_width: ScalarType | None = Field(
            default=8,
            ge=0,
            description="Maximum width of the hole.",
        )
        hole_width_range: tuple[ScalarType, ScalarType] = (8, 8)

        fill_value: ColorType | Literal["random"] = Field(default=0, description="Value for dropped pixels.")
        mask_fill_value: ColorType | None = Field(default=None, description="Fill value for dropped pixels in mask.")

        @staticmethod
        def update_range(
            min_value: NumericType | None,
            max_value: NumericType | None,
            default_range: tuple[NumericType, NumericType],
        ) -> tuple[NumericType, NumericType]:
            if max_value is not None:
                return (min_value or max_value, max_value)

            return default_range

        # Validation for hole dimension ranges
        @staticmethod
        def validate_range(range_value: tuple[ScalarType, ScalarType], range_name: str, minimum: float = 0) -> None:
            if not minimum <= range_value[0] <= range_value[1]:
                raise ValueError(
                    f"First value in {range_name} should be less than or equal to the second value "
                    f"and at least {minimum}. Got: {range_value}",
                )
            if isinstance(range_value[0], float) and not all(0 <= x <= 1 for x in range_value):
                raise ValueError(f"All values in {range_name} should be in [0, 1] range. Got: {range_value}")

        @model_validator(mode="after")
        def check_num_holes_and_dimensions(self) -> Self:
            if self.min_holes is not None:
                warn("`min_holes` is deprecated. Use num_holes_range instead.", DeprecationWarning, stacklevel=2)

            if self.max_holes is not None:
                warn("`max_holes` is deprecated. Use num_holes_range instead.", DeprecationWarning, stacklevel=2)

            if self.min_height is not None:
                warn("`min_height` is deprecated. Use hole_height_range instead.", DeprecationWarning, stacklevel=2)

            if self.max_height is not None:
                warn("`max_height` is deprecated. Use hole_height_range instead.", DeprecationWarning, stacklevel=2)

            if self.min_width is not None:
                warn("`min_width` is deprecated. Use hole_width_range instead.", DeprecationWarning, stacklevel=2)

            if self.max_width is not None:
                warn("`max_width` is deprecated. Use hole_width_range instead.", DeprecationWarning, stacklevel=2)

            if self.max_holes is not None:
                # Update ranges for holes, heights, and widths
                self.num_holes_range = self.update_range(self.min_holes, self.max_holes, self.num_holes_range)

            self.validate_range(self.num_holes_range, "num_holes_range", minimum=1)

            if self.max_height is not None:
                self.hole_height_range = self.update_range(self.min_height, self.max_height, self.hole_height_range)
            self.validate_range(self.hole_height_range, "hole_height_range")

            if self.max_width is not None:
                self.hole_width_range = self.update_range(self.min_width, self.max_width, self.hole_width_range)
            self.validate_range(self.hole_width_range, "hole_width_range")

            return self

    def __init__(
        self,
        max_holes: int | None = None,
        max_height: ScalarType | None = None,
        max_width: ScalarType | None = None,
        min_holes: int | None = None,
        min_height: ScalarType | None = None,
        min_width: ScalarType | None = None,
        fill_value: ColorType | Literal["random"] = 0,
        mask_fill_value: ColorType | None = None,
        num_holes_range: tuple[int, int] = (1, 1),
        hole_height_range: tuple[ScalarType, ScalarType] = (8, 8),
        hole_width_range: tuple[ScalarType, ScalarType] = (8, 8),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.num_holes_range = num_holes_range
        self.hole_height_range = hole_height_range
        self.hole_width_range = hole_width_range

        self.fill_value = fill_value  # type: ignore[assignment]
        self.mask_fill_value = mask_fill_value

    def apply(
        self,
        img: np.ndarray,
        fill_value: ColorType | Literal["random"],
        holes: Iterable[tuple[int, int, int, int]],
        **params: Any,
    ) -> np.ndarray:
        return cutout(img, holes, fill_value)

    def apply_to_mask(
        self,
        mask: np.ndarray,
        mask_fill_value: ScalarType,
        holes: Iterable[tuple[int, int, int, int]],
        **params: Any,
    ) -> np.ndarray:
        if mask_fill_value is None:
            return mask
        return cutout(mask, holes, mask_fill_value)

    @staticmethod
    def calculate_hole_dimensions(
        height: int,
        width: int,
        height_range: tuple[ScalarType, ScalarType],
        width_range: tuple[ScalarType, ScalarType],
    ) -> tuple[int, int]:
        """Calculate random hole dimensions based on the provided ranges."""
        if isinstance(height_range[0], int):
            min_height = height_range[0]
            max_height = height_range[1]

            min_width = width_range[0]
            max_width = width_range[1]
            max_height = min(max_height, height)
            max_width = min(max_width, width)
            hole_height = random.randint(int(min_height), int(max_height))
            hole_width = random.randint(int(min_width), int(max_width))

        else:  # Assume float
            hole_height = int(height * random.uniform(height_range[0], height_range[1]))
            hole_width = int(width * random.uniform(width_range[0], width_range[1]))
        return hole_height, hole_width

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        holes = []
        num_holes = random.randint(self.num_holes_range[0], self.num_holes_range[1])

        for _ in range(num_holes):
            hole_height, hole_width = self.calculate_hole_dimensions(
                height,
                width,
                self.hole_height_range,
                self.hole_width_range,
            )

            y1 = random.randint(0, height - hole_height)
            x1 = random.randint(0, width - hole_width)
            y2 = y1 + hole_height
            x2 = x1 + hole_width
            holes.append((x1, y1, x2, y2))

        return {"holes": holes}

    def apply_to_keypoints(
        self,
        keypoints: Sequence[KeypointType],
        holes: Iterable[tuple[int, int, int, int]],
        **params: Any,
    ) -> list[KeypointType]:
        return [keypoint for keypoint in keypoints if not any(keypoint_in_hole(keypoint, hole) for hole in holes)]

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "num_holes_range",
            "hole_height_range",
            "hole_width_range",
            "fill_value",
            "mask_fill_value",
        )

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "keypoints": self.apply_to_keypoints,
        }

`apply (self, img, fill_value, holes, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/dropout/coarse_dropout.py

Python

def apply(
    self,
    img: np.ndarray,
    fill_value: ColorType | Literal["random"],
    holes: Iterable[tuple[int, int, int, int]],
    **params: Any,
) -> np.ndarray:
    return cutout(img, holes, fill_value)

`calculate_hole_dimensions (height, width, height_range, width_range)` `staticmethod` ¶

Calculate random hole dimensions based on the provided ranges.

Source code in albumentations/augmentations/dropout/coarse_dropout.py

Python

@staticmethod
def calculate_hole_dimensions(
    height: int,
    width: int,
    height_range: tuple[ScalarType, ScalarType],
    width_range: tuple[ScalarType, ScalarType],
) -> tuple[int, int]:
    """Calculate random hole dimensions based on the provided ranges."""
    if isinstance(height_range[0], int):
        min_height = height_range[0]
        max_height = height_range[1]

        min_width = width_range[0]
        max_width = width_range[1]
        max_height = min(max_height, height)
        max_width = min(max_width, width)
        hole_height = random.randint(int(min_height), int(max_height))
        hole_width = random.randint(int(min_width), int(max_width))

    else:  # Assume float
        hole_height = int(height * random.uniform(height_range[0], height_range[1]))
        hole_width = int(width * random.uniform(width_range[0], width_range[1]))
    return hole_height, hole_width
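
The two branches above treat the ranges differently: integer ranges are absolute pixel sizes clamped to the image, while float ranges are fractions of the image height and width. A minimal standalone sketch of that logic follows; `sample_hole_size` is a hypothetical helper that mirrors the listing rather than calling the library.

```python
import random


def sample_hole_size(height, width, height_range, width_range):
    """Standalone sketch of the sampling logic shown above (hypothetical helper)."""
    if isinstance(height_range[0], int):
        # Integer ranges are absolute pixel sizes, clamped to the image size.
        max_h = min(height_range[1], height)
        max_w = min(width_range[1], width)
        return (
            random.randint(height_range[0], max_h),
            random.randint(width_range[0], max_w),
        )
    # Float ranges are fractions of the image height and width.
    return (
        int(height * random.uniform(*height_range)),
        int(width * random.uniform(*width_range)),
    )


random.seed(0)
abs_size = sample_hole_size(100, 200, (8, 16), (8, 16))
rel_size = sample_hole_size(100, 200, (0.1, 0.2), (0.1, 0.2))
print(abs_size, rel_size)
```

With the float range `(0.1, 0.2)` on a 100x200 image, hole heights fall between 10 and 20 pixels and hole widths between 20 and 40 pixels.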

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/dropout/coarse_dropout.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    holes = []
    num_holes = random.randint(self.num_holes_range[0], self.num_holes_range[1])

    for _ in range(num_holes):
        hole_height, hole_width = self.calculate_hole_dimensions(
            height,
            width,
            self.hole_height_range,
            self.hole_width_range,
        )

        y1 = random.randint(0, height - hole_height)
        x1 = random.randint(0, width - hole_width)
        y2 = y1 + hole_height
        x2 = x1 + hole_width
        holes.append((x1, y1, x2, y2))

    return {"holes": holes}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/dropout/coarse_dropout.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "num_holes_range",
        "hole_height_range",
        "hole_width_range",
        "fill_value",
        "mask_fill_value",
    )

`functional` ¶

`def cutout (img, holes, fill_value=0)` [view source on GitHub]¶

Apply cutout augmentation to the image by cutting out holes and filling them with either a given value or random noise.

Parameters:

Name	Type	Description
`img`	`np.ndarray`	The image to augment.
`holes`	`Iterable[tuple[int, int, int, int]]`	An iterable of tuples where each tuple contains the coordinates of the top-left and bottom-right corners of the rectangular hole (x1, y1, x2, y2).
`fill_value`	`Union[ColorType, Literal["random"]]`	The fill value to use for the hole. Can be a single integer, a tuple or list of numbers for multichannel, or the string "random" to fill with random noise.

Returns:

Type	Description
`np.ndarray`	The augmented image.

Source code in albumentations/augmentations/dropout/functional.py

Python

def cutout(
    img: np.ndarray,
    holes: Iterable[tuple[int, int, int, int]],
    fill_value: ColorType | Literal["random"] = 0,
) -> np.ndarray:
    """Apply cutout augmentation to the image by cutting out holes and filling them
    with either a given value or random noise.

    Args:
        img (np.ndarray): The image to augment.
        holes (Iterable[tuple[int, int, int, int]]): An iterable of tuples where each
            tuple contains the coordinates of the top-left and bottom-right corners of
            the rectangular hole (x1, y1, x2, y2).
        fill_value (Union[ColorType, Literal["random"]]): The fill value to use for the hole. Can be
            a single integer, a tuple or list of numbers for multichannel,
            or the string "random" to fill with random noise.

    Returns:
        np.ndarray: The augmented image.
    """
    img = img.copy()

    if isinstance(fill_value, (int, float, tuple, list)):
        fill_value = np.array(fill_value, dtype=img.dtype)

    for x1, y1, x2, y2 in holes:
        if isinstance(fill_value, str) and fill_value == "random":
            shape = (y2 - y1, x2 - x1) if img.ndim == MONO_CHANNEL_DIMENSIONS else (y2 - y1, x2 - x1, img.shape[2])
            random_fill = generate_random_fill(img.dtype, shape)
            img[y1:y2, x1:x2] = random_fill
        else:
            img[y1:y2, x1:x2] = fill_value

    return img
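
To illustrate the `(x1, y1, x2, y2)` hole format, here is a minimal NumPy-only sketch of the fixed-value path. `fill_holes` is a hypothetical stand-in for `cutout`, written so the snippet runs without Albumentations installed; it does not cover the "random" option.

```python
import numpy as np


def fill_holes(img, holes, fill_value=0):
    """Hypothetical stand-in for the fixed-value branch of `cutout`."""
    out = img.copy()  # leave the input untouched, as the listing above does
    fill = np.array(fill_value, dtype=img.dtype)
    for x1, y1, x2, y2 in holes:
        # Holes are (x1, y1, x2, y2); rows are indexed by y, columns by x.
        out[y1:y2, x1:x2] = fill
    return out


image = np.full((4, 6, 3), 255, dtype=np.uint8)
result = fill_holes(image, [(1, 0, 3, 2)], fill_value=(0, 128, 255))
print(result[0, 1], result[3, 5])
```

Note that the slice is `[y1:y2, x1:x2]`: the hole tuple lists x first, but NumPy arrays are indexed row (y) first.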

`def generate_random_fill (dtype, shape)` [view source on GitHub]¶

Generate a random fill based on dtype and target shape.

Source code in albumentations/augmentations/dropout/functional.py

Python

def generate_random_fill(dtype: np.dtype, shape: tuple[int, ...]) -> np.ndarray:
    """Generate a random fill based on dtype and target shape."""
    max_value = MAX_VALUES_BY_DTYPE[dtype]
    if np.issubdtype(dtype, np.integer):
        return random_utils.randint(0, max_value + 1, size=shape, dtype=dtype)
    if np.issubdtype(dtype, np.floating):
        return random_utils.uniform(0, max_value, size=shape).astype(dtype)
    raise ValueError(f"Unsupported dtype: {dtype}")

`grid_dropout` ¶

`class GridDropout` `(ratio=0.5, unit_size_min=None, unit_size_max=None, holes_number_x=None, holes_number_y=None, shift_x=None, shift_y=None, random_offset=False, fill_value=0, mask_fill_value=None, unit_size_range=None, holes_number_xy=None, shift_xy=(0, 0), always_apply=None, p=0.5)` [view source on GitHub] ¶

GridDropout drops out rectangular regions of an image and the corresponding mask in a grid fashion.

Parameters:

Name	Type	Description
`ratio`	`float`	The ratio of the mask holes to the unit_size (same for horizontal and vertical directions). Must be between 0 and 1. Default: 0.5.
`random_offset`	`bool`	Whether to offset the grid randomly between 0 and grid unit size - hole size. If True, `shift_xy` is ignored and the shifts are set randomly. Default: False.
`fill_value`	`Optional[ColorType]`	Value for the dropped pixels. Default: 0.
`mask_fill_value`	`Optional[ColorType]`	Value for the dropped pixels in mask. If None, transformation is not applied to the mask. Default: None.
`unit_size_range`	`Optional[tuple[int, int]]`	Range from which to sample the grid unit size. Both values must be between 2 and the length of the image's shorter edge. Default: None.
`holes_number_xy`	`Optional[tuple[int, int]]`	The number of grid units in x and y directions. The first value should be between 1 and image width // 2, the second between 1 and image height // 2. Default: None.
`shift_xy`	`tuple[int, int]`	Offsets of the grid start in x and y directions from the (0, 0) coordinate. Default: (0, 0).
`p`	`float`	Probability of applying the transform. Default: 0.5.

Targets

image, mask

Image types: uint8, float32

Reference

https://arxiv.org/abs/2001.04086

Source code in albumentations/augmentations/dropout/grid_dropout.py

Python

class GridDropout(DualTransform):
    """GridDropout, drops out rectangular regions of an image and the corresponding mask in a grid fashion.

    Args:
        ratio (float): The ratio of the mask holes to the unit_size (same for horizontal and vertical directions).
            Must be between 0 and 1. Default: 0.5.
        random_offset (bool): Whether to offset the grid randomly between 0 and grid unit size - hole size.
            If True, entered shift_x and shift_y are ignored and set randomly. Default: False.
        fill_value (Optional[ColorType]): Value for the dropped pixels. Default: 0.
        mask_fill_value (Optional[ColorType]): Value for the dropped pixels in mask.
            If None, transformation is not applied to the mask. Default: None.
        unit_size_range (Optional[tuple[int, int]]): Range from which to sample grid size. Default: None.
             Must be between 2 and the image shorter edge.
        holes_number_xy (Optional[tuple[int, int]]): The number of grid units in x and y directions.
            First value should be between 1 and image width//2,
            Second value should be between 1 and image height//2.
            Default: None.
        shift_xy (tuple[int, int]): Offsets of the grid start in x and y directions.
            Offsets of the grid start in x and y directions from (0,0) coordinate.
            Default: (0, 0).

        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image, mask

    Image types:
        uint8, float32

    Reference:
        https://arxiv.org/abs/2001.04086

    """

    _targets = (Targets.IMAGE, Targets.MASK)

    class InitSchema(BaseTransformInitSchema):
        ratio: float = Field(description="The ratio of the mask holes to the unit_size.", gt=0, le=1)

        unit_size_min: int | None = Field(None, description="Minimum size of the grid unit.", ge=2)
        unit_size_max: int | None = Field(None, description="Maximum size of the grid unit.", ge=2)

        holes_number_x: int | None = Field(None, description="The number of grid units in x direction.", ge=1)
        holes_number_y: int | None = Field(None, description="The number of grid units in y direction.", ge=1)

        shift_x: int | None = Field(0, description="Offsets of the grid start in x direction.", ge=0)
        shift_y: int | None = Field(0, description="Offsets of the grid start in y direction.", ge=0)

        random_offset: bool = Field(False, description="Whether to offset the grid randomly.")
        fill_value: ColorType | None = Field(0, description="Value for the dropped pixels.")
        mask_fill_value: ColorType | None = Field(None, description="Value for the dropped pixels in mask.")
        unit_size_range: (
            Annotated[tuple[int, int], AfterValidator(check_1plus), AfterValidator(nondecreasing)] | None
        ) = None
        shift_xy: Annotated[tuple[int, int], AfterValidator(check_0plus)] = Field(
            (0, 0),
            description="Offsets of the grid start in x and y directions.",
        )
        holes_number_xy: Annotated[tuple[int, int], AfterValidator(check_1plus)] | None = Field(
            None,
            description="The number of grid units in x and y directions.",
        )

        @model_validator(mode="after")
        def validate_normalization(self) -> Self:
            if self.unit_size_min is not None and self.unit_size_max is not None:
                self.unit_size_range = self.unit_size_min, self.unit_size_max
                warn(
                    "unit_size_min and unit_size_max are deprecated. Use unit_size_range instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )

            if self.shift_x is not None and self.shift_y is not None:
                self.shift_xy = self.shift_x, self.shift_y
                warn("shift_x and shift_y are deprecated. Use shift_xy instead.", DeprecationWarning, stacklevel=2)

            if self.holes_number_x is not None and self.holes_number_y is not None:
                self.holes_number_xy = self.holes_number_x, self.holes_number_y
                warn(
                    "holes_number_x and holes_number_y are deprecated. Use holes_number_xy instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )

            if self.unit_size_range and not MIN_UNIT_SIZE <= self.unit_size_range[0] <= self.unit_size_range[1]:
                raise ValueError("Max unit size should be >= min size, both at least 2 pixels.")

            return self

    def __init__(
        self,
        ratio: float = 0.5,
        unit_size_min: int | None = None,
        unit_size_max: int | None = None,
        holes_number_x: int | None = None,
        holes_number_y: int | None = None,
        shift_x: int | None = None,
        shift_y: int | None = None,
        random_offset: bool = False,
        fill_value: ColorType = 0,
        mask_fill_value: ColorType | None = None,
        unit_size_range: tuple[int, int] | None = None,
        holes_number_xy: tuple[int, int] | None = None,
        shift_xy: tuple[int, int] = (0, 0),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.ratio = ratio
        self.unit_size_range = unit_size_range
        self.holes_number_xy = holes_number_xy
        self.random_offset = random_offset
        self.fill_value = fill_value
        self.mask_fill_value = mask_fill_value
        self.shift_xy = shift_xy

    def apply(self, img: np.ndarray, holes: Iterable[tuple[int, int, int, int]], **params: Any) -> np.ndarray:
        return fdropout.cutout(img, holes, self.fill_value)

    def apply_to_mask(
        self,
        mask: np.ndarray,
        holes: Iterable[tuple[int, int, int, int]],
        **params: Any,
    ) -> np.ndarray:
        if self.mask_fill_value is None:
            return mask

        return fdropout.cutout(mask, holes, self.mask_fill_value)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]
        unit_width, unit_height = self._calculate_unit_dimensions(width, height)
        hole_width, hole_height = self._calculate_hole_dimensions(unit_width, unit_height)
        shift_x, shift_y = self._calculate_shifts(unit_width, unit_height, hole_width, hole_height)
        holes = self._generate_holes(width, height, unit_width, unit_height, hole_width, hole_height, shift_x, shift_y)
        return {"holes": holes}

    def _calculate_unit_dimensions(self, width: int, height: int) -> tuple[int, int]:
        """Calculates the dimensions of the grid units."""
        if self.unit_size_range is not None:
            self._validate_unit_sizes(height, width)
            unit_size = random.randint(*self.unit_size_range)
            return unit_size, unit_size

        return self._calculate_dimensions_based_on_holes(width, height)

    def _validate_unit_sizes(self, height: int, width: int) -> None:
        """Validates the minimum and maximum unit sizes."""
        if self.unit_size_range is None:
            raise ValueError("unit_size_range must not be None.")
        if self.unit_size_range[1] > min(height, width):
            msg = "Grid size limits must be within the shortest image edge."
            raise ValueError(msg)

    def _calculate_dimensions_based_on_holes(self, width: int, height: int) -> tuple[int, int]:
        """Calculates dimensions based on the number of holes specified."""
        holes_number_x, holes_number_y = self.holes_number_xy or (None, None)
        unit_width = self._calculate_dimension(width, holes_number_x, 10)
        unit_height = self._calculate_dimension(height, holes_number_y, unit_width)
        return unit_width, unit_height

    @staticmethod
    def _calculate_dimension(dimension: int, holes_number: int | None, fallback: int) -> int:
        """Helper function to calculate unit width or height."""
        if holes_number is None:
            return max(2, dimension // fallback)

        if not 1 <= holes_number <= dimension // 2:
            raise ValueError(f"The number of holes must be between 1 and {dimension // 2}.")
        return dimension // holes_number

    def _calculate_hole_dimensions(self, unit_width: int, unit_height: int) -> tuple[int, int]:
        """Calculates the dimensions of the holes to be dropped out."""
        hole_width = int(unit_width * self.ratio)
        hole_height = int(unit_height * self.ratio)
        hole_width = min(max(hole_width, 1), unit_width - 1)
        hole_height = min(max(hole_height, 1), unit_height - 1)
        return hole_width, hole_height

    def _calculate_shifts(
        self,
        unit_width: int,
        unit_height: int,
        hole_width: int,
        hole_height: int,
    ) -> tuple[int, int]:
        """Calculates the shifts for the grid start."""
        if self.random_offset:
            shift_x = random.randint(0, unit_width - hole_width)
            shift_y = random.randint(0, unit_height - hole_height)
            return shift_x, shift_y

        if isinstance(self.shift_xy, Sequence) and len(self.shift_xy) == PAIR:
            shift_x = min(max(0, self.shift_xy[0]), unit_width - hole_width)
            shift_y = min(max(0, self.shift_xy[1]), unit_height - hole_height)
            return shift_x, shift_y

        return 0, 0

    def _generate_holes(
        self,
        width: int,
        height: int,
        unit_width: int,
        unit_height: int,
        hole_width: int,
        hole_height: int,
        shift_x: int,
        shift_y: int,
    ) -> list[tuple[int, int, int, int]]:
        """Generates the list of holes to be dropped out."""
        holes = []
        for i in range(width // unit_width + 1):
            for j in range(height // unit_height + 1):
                x1 = min(shift_x + unit_width * i, width)
                y1 = min(shift_y + unit_height * j, height)
                x2 = min(x1 + hole_width, width)
                y2 = min(y1 + hole_height, height)
                holes.append((x1, y1, x2, y2))
        return holes

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "ratio",
            "unit_size_range",
            "holes_number_xy",
            "shift_xy",
            "random_offset",
            "fill_value",
            "mask_fill_value",
        )

`apply (self, img, holes, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/dropout/grid_dropout.py

Python

def apply(self, img: np.ndarray, holes: Iterable[tuple[int, int, int, int]], **params: Any) -> np.ndarray:
    return fdropout.cutout(img, holes, self.fill_value)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/dropout/grid_dropout.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]
    unit_width, unit_height = self._calculate_unit_dimensions(width, height)
    hole_width, hole_height = self._calculate_hole_dimensions(unit_width, unit_height)
    shift_x, shift_y = self._calculate_shifts(unit_width, unit_height, hole_width, hole_height)
    holes = self._generate_holes(width, height, unit_width, unit_height, hole_width, hole_height, shift_x, shift_y)
    return {"holes": holes}
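
The grid layout can be traced with a small pure-Python sketch. `grid_holes` is a hypothetical helper that mirrors the private methods listed earlier for the square-unit case (a fixed `unit_size` instead of a sampled one); it is not part of the library.

```python
def grid_holes(width, height, unit_size, ratio=0.5, shift=(0, 0)):
    """Sketch of the grid layout for square units (hypothetical helper)."""
    # Hole side: a fraction of the unit, kept within [1, unit_size - 1].
    hole = min(max(int(unit_size * ratio), 1), unit_size - 1)
    # Shifts are clamped so the hole stays inside its grid unit.
    shift_x = min(max(0, shift[0]), unit_size - hole)
    shift_y = min(max(0, shift[1]), unit_size - hole)
    holes = []
    for i in range(width // unit_size + 1):
        for j in range(height // unit_size + 1):
            # Coordinates are clipped to the image borders.
            x1 = min(shift_x + unit_size * i, width)
            y1 = min(shift_y + unit_size * j, height)
            x2 = min(x1 + hole, width)
            y2 = min(y1 + hole, height)
            holes.append((x1, y1, x2, y2))
    return holes


holes = grid_holes(width=8, height=4, unit_size=4, ratio=0.5)
non_empty = [h for h in holes if h[2] > h[0] and h[3] > h[1]]
print(non_empty)
```

Because the loops run one step past the last full unit, some of the generated holes are clipped to zero area at the image border; they select an empty slice and have no effect when filled.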

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/dropout/grid_dropout.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "ratio",
        "unit_size_range",
        "holes_number_xy",
        "shift_xy",
        "random_offset",
        "fill_value",
        "mask_fill_value",
    )

`mask_dropout` ¶

`class MaskDropout` `(max_objects=(1, 1), image_fill_value=0, mask_fill_value=0, always_apply=None, p=0.5)` [view source on GitHub] ¶

Image and mask augmentation that zeroes out the mask and image regions corresponding to randomly chosen object instances from the mask.

The mask must be a single-channel image; zero values are treated as background. The image can have any number of channels.

Parameters:

Name	Type	Description
`max_objects`	`ScaleIntType`	Maximum number of labels that can be zeroed out. Can be a tuple, in which case it is interpreted as [min, max].
`image_fill_value`	`float \| Literal['inpaint']`	Fill value to use when filling the image. Can be 'inpaint' to apply inpainting (works only for 3-channel images).
`mask_fill_value`	`ScalarType`	Fill value to use when filling the mask.

Targets

image, mask

Image types: uint8, float32

Reference

https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/114254

Source code in albumentations/augmentations/dropout/mask_dropout.py

Python

class MaskDropout(DualTransform):
    """Image & mask augmentation that zero out mask and image regions corresponding
    to randomly chosen object instance from mask.

    Mask must be single-channel image, zero values treated as background.
    Image can be any number of channels.

    Args:
        max_objects: Maximum number of labels that can be zeroed out. Can be tuple, in this case it's [min, max]
        image_fill_value: Fill value to use when filling image.
            Can be 'inpaint' to apply inpainting (works only  for 3-channel images)
        mask_fill_value: Fill value to use when filling mask.

    Targets:
        image, mask

    Image types:
        uint8, float32

    Reference:
        https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/114254

    """

    _targets = (Targets.IMAGE, Targets.MASK)

    class InitSchema(BaseTransformInitSchema):
        max_objects: OnePlusIntRangeType = (1, 1)

        image_fill_value: float | Literal["inpaint"] = Field(
            default=0,
            description=(
                "Fill value to use when filling image. "
                "Can be 'inpaint' to apply inpainting (works only for 3-channel images)."
            ),
        )
        mask_fill_value: float = Field(default=0, description="Fill value to use when filling mask.")

    def __init__(
        self,
        max_objects: ScaleIntType = (1, 1),
        image_fill_value: float | Literal["inpaint"] = 0,
        mask_fill_value: ScalarType = 0,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.max_objects = cast(Tuple[int, int], max_objects)
        self.image_fill_value = image_fill_value
        self.mask_fill_value = mask_fill_value

    @property
    def targets_as_params(self) -> list[str]:
        return ["mask"]

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        mask = data["mask"]

        label_image, num_labels = label(mask, return_num=True)

        if num_labels == 0:
            dropout_mask = None
        else:
            objects_to_drop = random.randint(self.max_objects[0], self.max_objects[1])
            objects_to_drop = min(num_labels, objects_to_drop)

            if objects_to_drop == num_labels:
                dropout_mask = mask > 0
            else:
                labels_index = random.sample(range(1, num_labels + 1), objects_to_drop)
                dropout_mask = np.zeros((mask.shape[0], mask.shape[1]), dtype=bool)
                for label_index in labels_index:
                    dropout_mask |= label_image == label_index

        params.update({"dropout_mask": dropout_mask})
        return params

    def apply(self, img: np.ndarray, dropout_mask: np.ndarray, **params: Any) -> np.ndarray:
        if dropout_mask is None:
            return img

        if self.image_fill_value == "inpaint":
            dropout_mask = dropout_mask.astype(np.uint8)
            _, _, width, height = cv2.boundingRect(dropout_mask)
            radius = min(3, max(width, height) // 2)
            return cv2.inpaint(img, dropout_mask, radius, cv2.INPAINT_NS)

        img = img.copy()
        img[dropout_mask] = self.image_fill_value

        return img

    def apply_to_mask(self, mask: np.ndarray, dropout_mask: np.ndarray, **params: Any) -> np.ndarray:
        if dropout_mask is None:
            return mask

        mask = mask.copy()
        mask[dropout_mask] = self.mask_fill_value
        return mask

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "max_objects", "image_fill_value", "mask_fill_value"

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
        }

`targets_as_params: list[str]` `property` `readonly` ¶

Targets used to get params dependent on targets. This is used to check input has all required targets.

`apply (self, img, dropout_mask, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/dropout/mask_dropout.py

Python

def apply(self, img: np.ndarray, dropout_mask: np.ndarray, **params: Any) -> np.ndarray:
    if dropout_mask is None:
        return img

    if self.image_fill_value == "inpaint":
        dropout_mask = dropout_mask.astype(np.uint8)
        _, _, width, height = cv2.boundingRect(dropout_mask)
        radius = min(3, max(width, height) // 2)
        return cv2.inpaint(img, dropout_mask, radius, cv2.INPAINT_NS)

    img = img.copy()
    img[dropout_mask] = self.image_fill_value

    return img

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/dropout/mask_dropout.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    mask = data["mask"]

    label_image, num_labels = label(mask, return_num=True)

    if num_labels == 0:
        dropout_mask = None
    else:
        objects_to_drop = random.randint(self.max_objects[0], self.max_objects[1])
        objects_to_drop = min(num_labels, objects_to_drop)

        if objects_to_drop == num_labels:
            dropout_mask = mask > 0
        else:
            labels_index = random.sample(range(1, num_labels + 1), objects_to_drop)
            dropout_mask = np.zeros((mask.shape[0], mask.shape[1]), dtype=bool)
            for label_index in labels_index:
                dropout_mask |= label_image == label_index

    params.update({"dropout_mask": dropout_mask})
    return params
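
The object-selection step can be isolated from the image handling. The sketch below assumes the connected components have already been counted (the listing uses `label(mask, return_num=True)` for that); `pick_labels_to_drop` is a hypothetical helper, not a library function.

```python
import random


def pick_labels_to_drop(num_labels, max_objects=(1, 1)):
    """Sketch of the label selection step (hypothetical helper)."""
    if num_labels == 0:
        return []  # nothing to drop when the mask has no objects
    count = random.randint(max_objects[0], max_objects[1])
    count = min(num_labels, count)  # cannot drop more objects than exist
    # Connected-component labels start at 1; 0 is background.
    return sorted(random.sample(range(1, num_labels + 1), count))


random.seed(42)
chosen = pick_labels_to_drop(num_labels=5, max_objects=(2, 3))
everything = pick_labels_to_drop(num_labels=2, max_objects=(4, 6))
print(chosen, everything)
```

When the sampled count reaches the number of objects, every label is selected. The listing above handles that case with a shortcut, using `mask > 0` as the dropout mask instead of sampling labels.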

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/dropout/mask_dropout.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "max_objects", "image_fill_value", "mask_fill_value"

`xy_masking` ¶

`class XYMasking` `(num_masks_x=0, num_masks_y=0, mask_x_length=0, mask_y_length=0, fill_value=0, mask_fill_value=0, always_apply=None, p=0.5)` [view source on GitHub] ¶

Applies masking strips to an image, either horizontally (X axis) or vertically (Y axis), simulating occlusions. This transform is useful for training models to recognize images with varied visibility conditions. It's particularly effective for spectrogram images, allowing spectral and frequency masking to improve model robustness.

At least one of `mask_x_length` or `mask_y_length` must be specified, dictating the mask's maximum size along each axis.

Parameters:

Name	Type	Description
`num_masks_x`	`Union[int, tuple[int, int]]`	Number or range of horizontal regions to mask. Defaults to 0.
`num_masks_y`	`Union[int, tuple[int, int]]`	Number or range of vertical regions to mask. Defaults to 0.
`mask_x_length`	`Union[int, tuple[int, int]]`	Specifies the length of the masks along the X (horizontal) axis. If an integer is provided, it sets a fixed mask length. If a tuple of two integers (min, max) is provided, the mask length is randomly chosen within this range for each mask. This allows for variable-length masks in the horizontal direction.
`mask_y_length`	`Union[int, tuple[int, int]]`	Specifies the height of the masks along the Y (vertical) axis. Similar to `mask_x_length`, an integer sets a fixed mask height, while a tuple (min, max) allows for variable-height masks, chosen randomly within the specified range for each mask. This flexibility facilitates creating masks of various sizes in the vertical direction.
`fill_value`	`Union[int, float, list[int], list[float]]`	Value to fill image masks. Defaults to 0.
`mask_fill_value`	`Optional[Union[int, float, list[int], list[float]]]`	Value to fill masks in the mask. If `None`, the mask is not affected. Defaults to 0.
`p`	`float`	Probability of applying the transform. Defaults to 0.5.

Targets

image, mask, keypoints

Image types: uint8, float32

Note: At least one of `mask_x_length` or `mask_y_length` must be defined.
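
To make the strip geometry concrete, here is a minimal NumPy-only sketch of the effect on a small array. It does not call Albumentations: `fill_rectangles` is a hypothetical stand-in for the fill step, and the rectangles are written by hand rather than sampled. Following the class source, a mask generated along the X axis spans the full image height, and a mask generated along the Y axis spans the full width.

```python
import numpy as np


def fill_rectangles(img, rectangles, fill_value):
    """Hypothetical stand-in for the fill step: overwrite each (x1, y1, x2, y2) box."""
    out = img.copy()
    for x1, y1, x2, y2 in rectangles:
        out[y1:y2, x1:x2] = fill_value
    return out


# A fake 6x8 "spectrogram" laid out as (rows, columns).
spectrogram = np.arange(48, dtype=np.float32).reshape(6, 8)
height, width = spectrogram.shape

# One strip limited along X (full height) and one limited along Y (full width).
masks_x = [(2, 0, 4, height)]
masks_y = [(0, 1, width, 2)]

masked = fill_rectangles(spectrogram, masks_x + masks_y, fill_value=0)
```

Columns 2 and 3 and row 1 of `masked` are filled with `fill_value`, while the input array is left unchanged.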

Source code in albumentations/augmentations/dropout/xy_masking.py

Python

class XYMasking(DualTransform):
    """Applies masking strips to an image, either horizontally (X axis) or vertically (Y axis),
    simulating occlusions. This transform is useful for training models to recognize images
    with varied visibility conditions. It's particularly effective for spectrogram images,
    allowing spectral and frequency masking to improve model robustness.

    At least one of `mask_x_length` or `mask_y_length` must be a positive number; these set
    the mask size along each axis.

    Args:
        num_masks_x (Union[int, tuple[int, int]]): Number or range of horizontal regions to mask. Defaults to 0.
        num_masks_y (Union[int, tuple[int, int]]): Number or range of vertical regions to mask. Defaults to 0.
        mask_x_length (Union[int, tuple[int, int]]): Specifies the length of the masks along
            the X (horizontal) axis. If an integer is provided, it sets a fixed mask length.
            If a tuple of two integers (min, max) is provided,
            the mask length is randomly chosen within this range for each mask.
            This allows for variable-length masks in the horizontal direction.
        mask_y_length (Union[int, tuple[int, int]]): Specifies the height of the masks along
            the Y (vertical) axis. Similar to `mask_x_length`, an integer sets a fixed mask height,
            while a tuple (min, max) allows for variable-height masks, chosen randomly
            within the specified range for each mask. This flexibility facilitates creating masks of various
            sizes in the vertical direction.
        fill_value (Union[int, float, list[int], list[float]]): Value to fill image masks. Defaults to 0.
        mask_fill_value (Optional[Union[int, float, list[int], list[float]]]): Value to fill masks in the mask.
            If `None`, the mask is not affected. Defaults to 0.
        p (float): Probability of applying the transform. Defaults to 0.5.

    Targets:
        image, mask, keypoints

    Image types:
        uint8, float32

    Note: At least one of `mask_x_length` or `mask_y_length` must be defined.

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        num_masks_x: NonNegativeIntRangeType = 0
        num_masks_y: NonNegativeIntRangeType = 0
        mask_x_length: NonNegativeIntRangeType = 0
        mask_y_length: NonNegativeIntRangeType = 0

        fill_value: ColorType = Field(default=0, description="Value to fill image masks.")
        mask_fill_value: ColorType = Field(default=0, description="Value to fill masks in the mask.")

        @model_validator(mode="after")
        def check_mask_length(self) -> Self:
            if (
                isinstance(self.mask_x_length, int)
                and self.mask_x_length <= 0
                and isinstance(self.mask_y_length, int)
                and self.mask_y_length <= 0
            ):
                msg = "At least one of `mask_x_length` or `mask_y_length` Should be a positive number."
                raise ValueError(msg)
            return self

    def __init__(
        self,
        num_masks_x: ScaleIntType = 0,
        num_masks_y: ScaleIntType = 0,
        mask_x_length: ScaleIntType = 0,
        mask_y_length: ScaleIntType = 0,
        fill_value: ColorType = 0,
        mask_fill_value: ColorType = 0,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.num_masks_x = cast(Tuple[int, int], num_masks_x)
        self.num_masks_y = cast(Tuple[int, int], num_masks_y)

        self.mask_x_length = cast(Tuple[int, int], mask_x_length)
        self.mask_y_length = cast(Tuple[int, int], mask_y_length)
        self.fill_value = fill_value
        self.mask_fill_value = mask_fill_value

    def apply(
        self,
        img: np.ndarray,
        masks_x: list[tuple[int, int, int, int]],
        masks_y: list[tuple[int, int, int, int]],
        **params: Any,
    ) -> np.ndarray:
        return cutout(img, masks_x + masks_y, self.fill_value)

    def apply_to_mask(
        self,
        mask: np.ndarray,
        masks_x: list[tuple[int, int, int, int]],
        masks_y: list[tuple[int, int, int, int]],
        **params: Any,
    ) -> np.ndarray:
        if self.mask_fill_value is None:
            return mask
        return cutout(mask, masks_x + masks_y, self.mask_fill_value)

    def validate_mask_length(
        self,
        mask_length: tuple[int, int] | None,
        dimension_size: int,
        dimension_name: str,
    ) -> None:
        """Validate the mask length against the corresponding image dimension size.

        Args:
            mask_length (Optional[tuple[int, int]]): The length of the mask to be validated.
            dimension_size (int): The size of the image dimension (width or height)
                against which to validate the mask length.
            dimension_name (str): The name of the dimension ('width' or 'height') for error messaging.

        """
        if mask_length is not None:
            if isinstance(mask_length, (tuple, list)):
                if mask_length[0] < 0 or mask_length[1] > dimension_size:
                    raise ValueError(
                        f"{dimension_name} range {mask_length} is out of valid range [0, {dimension_size}]",
                    )
            elif mask_length < 0 or mask_length > dimension_size:
                raise ValueError(f"{dimension_name} {mask_length} exceeds image {dimension_name} {dimension_size}")

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, list[tuple[int, int, int, int]]]:
        height, width = params["shape"][:2]

        # Use the helper method to validate mask lengths against image dimensions
        self.validate_mask_length(self.mask_x_length, width, "mask_x_length")
        self.validate_mask_length(self.mask_y_length, height, "mask_y_length")

        masks_x = self.generate_masks(self.num_masks_x, width, height, self.mask_x_length, axis="x")
        masks_y = self.generate_masks(self.num_masks_y, width, height, self.mask_y_length, axis="y")

        return {"masks_x": masks_x, "masks_y": masks_y}

    @staticmethod
    def generate_mask_size(mask_length: tuple[int, int]) -> int:
        return random.randint(mask_length[0], mask_length[1])

    def generate_masks(
        self,
        num_masks: tuple[int, int],
        width: int,
        height: int,
        max_length: tuple[int, int] | None,
        axis: str,
    ) -> list[tuple[int, int, int, int]]:
        if max_length is None or max_length == 0 or (isinstance(num_masks, (int, float)) and num_masks == 0):
            return []

        masks = []

        num_masks_integer = num_masks if isinstance(num_masks, int) else random.randint(num_masks[0], num_masks[1])

        for _ in range(num_masks_integer):
            length = self.generate_mask_size(max_length)

            if axis == "x":
                x1 = random.randint(0, width - length)
                y1 = 0
                x2, y2 = x1 + length, height
            else:  # axis == 'y'
                y1 = random.randint(0, height - length)
                x1 = 0
                x2, y2 = width, y1 + length

            masks.append((x1, y1, x2, y2))
        return masks

    def apply_to_keypoints(
        self,
        keypoints: Sequence[KeypointType],
        masks_x: list[tuple[int, int, int, int]],
        masks_y: list[tuple[int, int, int, int]],
        **params: Any,
    ) -> list[KeypointType]:
        return [
            keypoint
            for keypoint in keypoints
            if not any(keypoint_in_hole(keypoint, hole) for hole in masks_x + masks_y)
        ]

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "num_masks_x",
            "num_masks_y",
            "mask_x_length",
            "mask_y_length",
            "fill_value",
            "mask_fill_value",
        )

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "keypoints": self.apply_to_keypoints,
        }

`apply (self, img, masks_x, masks_y, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/dropout/xy_masking.py

Python

def apply(
    self,
    img: np.ndarray,
    masks_x: list[tuple[int, int, int, int]],
    masks_y: list[tuple[int, int, int, int]],
    **params: Any,
) -> np.ndarray:
    return cutout(img, masks_x + masks_y, self.fill_value)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/dropout/xy_masking.py

Python

def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, list[tuple[int, int, int, int]]]:
    height, width = params["shape"][:2]

    # Use the helper method to validate mask lengths against image dimensions
    self.validate_mask_length(self.mask_x_length, width, "mask_x_length")
    self.validate_mask_length(self.mask_y_length, height, "mask_y_length")

    masks_x = self.generate_masks(self.num_masks_x, width, height, self.mask_x_length, axis="x")
    masks_y = self.generate_masks(self.num_masks_y, width, height, self.mask_y_length, axis="y")

    return {"masks_x": masks_x, "masks_y": masks_y}
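
The values returned as `masks_x` and `masks_y` are lists of `(x1, y1, x2, y2)` rectangles. The sketch below restates the sampling from `generate_masks` in the class listing as a free function so it can be run on its own. The name `generate_strips` and the explicit `rng` argument are hypothetical; the library method uses the module-level `random` functions instead.

```python
import random


def generate_strips(num_masks, width, height, length_range, axis, rng):
    """Hypothetical restatement of the strip sampling shown in the class listing."""
    strips = []
    for _ in range(num_masks):
        # Sample the strip size, then a start position that keeps it inside the image.
        length = rng.randint(length_range[0], length_range[1])
        if axis == "x":
            x1 = rng.randint(0, width - length)
            strips.append((x1, 0, x1 + length, height))
        else:  # axis == "y"
            y1 = rng.randint(0, height - length)
            strips.append((0, y1, width, y1 + length))
    return strips


rng = random.Random(42)
masks_x = generate_strips(3, width=100, height=40, length_range=(5, 10), axis="x", rng=rng)
masks_y = generate_strips(2, width=100, height=40, length_range=(4, 8), axis="y", rng=rng)
```

Because the start position is drawn from `[0, dimension - length]`, every strip stays inside the image as long as the length does not exceed the corresponding dimension, which is what `validate_mask_length` checks beforehand.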

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/dropout/xy_masking.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "num_masks_x",
        "num_masks_y",
        "mask_x_length",
        "mask_y_length",
        "fill_value",
        "mask_fill_value",
    )

`validate_mask_length (self, mask_length, dimension_size, dimension_name)` ¶

Validate the mask length against the corresponding image dimension size.

Parameters:

Name	Type	Description
`mask_length`	`Optional[tuple[int, int]]`	The length of the mask to be validated.
`dimension_size`	`int`	The size of the image dimension (width or height) against which to validate the mask length.
`dimension_name`	`str`	The name of the dimension ('width' or 'height') for error messaging.

Source code in albumentations/augmentations/dropout/xy_masking.py

Python

def validate_mask_length(
    self,
    mask_length: tuple[int, int] | None,
    dimension_size: int,
    dimension_name: str,
) -> None:
    """Validate the mask length against the corresponding image dimension size.

    Args:
        mask_length (Optional[tuple[int, int]]): The length of the mask to be validated.
        dimension_size (int): The size of the image dimension (width or height)
            against which to validate the mask length.
        dimension_name (str): The name of the dimension ('width' or 'height') for error messaging.

    """
    if mask_length is not None:
        if isinstance(mask_length, (tuple, list)):
            if mask_length[0] < 0 or mask_length[1] > dimension_size:
                raise ValueError(
                    f"{dimension_name} range {mask_length} is out of valid range [0, {dimension_size}]",
                )
        elif mask_length < 0 or mask_length > dimension_size:
            raise ValueError(f"{dimension_name} {mask_length} exceeds image {dimension_name} {dimension_size}")
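
As an illustration of when the check fails, the sketch below copies the validation into a standalone function (without `self`) and calls it with one valid and one invalid range. The 64-pixel dimension is an arbitrary example value.

```python
def validate_mask_length(mask_length, dimension_size, dimension_name):
    """Standalone copy of the check above, written as a free function for illustration."""
    if mask_length is None:
        return
    if isinstance(mask_length, (tuple, list)):
        if mask_length[0] < 0 or mask_length[1] > dimension_size:
            raise ValueError(
                f"{dimension_name} range {mask_length} is out of valid range [0, {dimension_size}]",
            )
    elif mask_length < 0 or mask_length > dimension_size:
        raise ValueError(f"{dimension_name} {mask_length} exceeds image {dimension_name} {dimension_size}")


# A range that fits inside a 64-pixel dimension passes silently.
validate_mask_length((10, 50), 64, "mask_x_length")

# An upper bound larger than the dimension raises ValueError.
error_message = None
try:
    validate_mask_length((10, 80), 64, "mask_x_length")
except ValueError as exc:
    error_message = str(exc)
```

Note that only the lower bound is compared with 0 and only the upper bound with the dimension size.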

`functional` ¶

`def add_fog (img, fog_coef, alpha_coef, haze_list)` [view source on GitHub]¶

Add fog to an image using the provided coefficients and haze points.

Parameters:

Name	Type	Description
`img`	`np.ndarray`	The input image, expected to be a numpy array.
`fog_coef`	`float`	The fog coefficient, used to determine the intensity of the fog.
`alpha_coef`	`float`	The alpha coefficient, used to determine the transparency of the fog.
`haze_list`	`list[tuple[int, int]]`	A list of tuples, where each tuple represents the x and y coordinates of a haze point.

Returns:

Type	Description
`np.ndarray`	The output image with added fog, as a numpy array.

Exceptions:

Type	Description
`ValueError`	If the input image's dtype is not uint8 or float32.

Reference

https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Source code in albumentations/augmentations/functional.py

Python

@preserve_channel_dim
def add_fog(img: np.ndarray, fog_coef: float, alpha_coef: float, haze_list: list[tuple[int, int]]) -> np.ndarray:
    """Add fog to an image using the provided coefficients and haze points.

    Args:
        img (np.ndarray): The input image, expected to be a numpy array.
        fog_coef (float): The fog coefficient, used to determine the intensity of the fog.
        alpha_coef (float): The alpha coefficient, used to determine the transparency of the fog.
        haze_list (list[tuple[int, int]]): A list of tuples, where each tuple represents the x and y
            coordinates of a haze point.

    Returns:
        np.ndarray: The output image with added fog, as a numpy array.

    Raises:
        ValueError: If the input image's dtype is not uint8 or float32.

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
    """
    non_rgb_warning(img)

    input_dtype = img.dtype
    needs_float = False

    if input_dtype == np.float32:
        img = from_float(img, dtype=np.dtype("uint8"))
        needs_float = True
    elif input_dtype not in (np.uint8, np.float32):
        raise ValueError(f"Unexpected dtype {input_dtype} for RandomFog augmentation")

    width = img.shape[1]

    hw = max(int(width // 3 * fog_coef), 10)

    for haze_points in haze_list:
        x, y = haze_points
        overlay = img.copy()
        output = img.copy()
        alpha = alpha_coef * fog_coef
        rad = hw // 2
        point = (x + hw // 2, y + hw // 2)
        cv2.circle(overlay, point, int(rad), (255, 255, 255), -1)
        output = add_weighted(overlay, alpha, output, 1 - alpha)

        img = output.copy()

    image_rgb = cv2.blur(img, (hw // 10, hw // 10))

    return to_float(image_rgb, max_value=255) if needs_float else image_rgb
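
The per-point blending can be followed with plain NumPy. The sketch below is a simplified illustration for a single haze point, not the library implementation: a boolean disc replaces the `cv2.circle` call, the hypothetical `blend` helper stands in for `add_weighted`, and the final blur is omitted.

```python
import numpy as np


def blend(overlay, alpha, base):
    """Hypothetical weighted sum: alpha * overlay + (1 - alpha) * base, rounded to uint8."""
    mixed = overlay.astype(np.float32) * alpha + base.astype(np.float32) * (1 - alpha)
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)


img = np.full((60, 90, 3), 100, dtype=np.uint8)
fog_coef, alpha_coef = 0.5, 0.1

hw = max(int(img.shape[1] // 3 * fog_coef), 10)  # patch size, as in the listing
rad = hw // 2
x, y = 20, 10  # one haze point
cx, cy = x + hw // 2, y + hw // 2

# Boolean disc replacing the filled white circle drawn with OpenCV.
yy, xx = np.ogrid[: img.shape[0], : img.shape[1]]
disc = (xx - cx) ** 2 + (yy - cy) ** 2 <= rad**2

overlay = img.copy()
overlay[disc] = 255
fogged = blend(overlay, alpha_coef * fog_coef, img)
```

With these values the blend weight is `alpha_coef * fog_coef = 0.05`, so pixels inside the disc move only slightly toward white (from 100 to 108) and pixels outside it are unchanged. Repeating this for every point in `haze_list` accumulates the effect.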

`def add_gravel (img, gravels)` [view source on GitHub]¶

Add gravel to the image.

Parameters:

Name	Type	Description
`img`	`numpy.ndarray`	Image to add gravel to.
`gravels`	`list`	List of gravel parameters. The implementation unpacks each entry as five values, `(y1, y2, x1, x2, sat)`: two coordinate pairs bounding the patch and the value written to channel 1 of the HLS image.

Returns:

Type	Description
`numpy.ndarray`	Image with gravel added.

Reference

https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Source code in albumentations/augmentations/functional.py

Python

@contiguous
@preserve_channel_dim
def add_gravel(img: np.ndarray, gravels: list[Any]) -> np.ndarray:
    """Add gravel to the image.

    Args:
        img (numpy.ndarray): Image to add gravel to.
        gravels (list): List of gravel parameters. Each entry is unpacked as five values,
            (y1, y2, x1, x2, sat).

    Returns:
        numpy.ndarray: Image with gravel added.

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
    """
    non_rgb_warning(img)
    input_dtype = img.dtype
    needs_float = False

    if input_dtype == np.float32:
        img = from_float(img, dtype=np.dtype("uint8"))
        needs_float = True
    elif input_dtype not in (np.uint8, np.float32):
        raise ValueError(f"Unexpected dtype {input_dtype} for AddGravel augmentation")

    image_hls = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)

    for gravel in gravels:
        y1, y2, x1, x2, sat = gravel
        image_hls[x1:x2, y1:y2, 1] = sat

    image_rgb = cv2.cvtColor(image_hls, cv2.COLOR_HLS2RGB)

    return to_float(image_rgb, max_value=255) if needs_float else image_rgb
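
The variable names in the loop are easy to misread, so here is a small NumPy-only sketch of the indexing step on a zero-filled stand-in for the HLS image. No color conversion is performed, and the gravel entry is an arbitrary example.

```python
import numpy as np

# Stand-in for an HLS image; channel 1 is the one the listing writes to.
image_hls = np.zeros((8, 10, 3), dtype=np.uint8)

# Each entry unpacks as (y1, y2, x1, x2, sat). The first slice uses the x pair,
# so with NumPy's (row, column) order x1:x2 selects rows and y1:y2 selects columns.
gravels = [(2, 5, 1, 3, 200)]

for y1, y2, x1, x2, sat in gravels:
    image_hls[x1:x2, y1:y2, 1] = sat

rows, cols = np.nonzero(image_hls[:, :, 1])
```

For this entry the value 200 is written to rows 1 to 2 and columns 2 to 4 of channel 1; the other channels are untouched.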

`def add_rain (img, slant, drop_length, drop_width, drop_color, blur_value, brightness_coefficient, rain_drops)` [view source on GitHub]¶

Adds rain drops to the image.

Parameters:

Name	Type	Description
`img`	`np.ndarray`	Input image.
`slant`	`int`	The angle of the rain drops.
`drop_length`	`int`	The length of each rain drop.
`drop_width`	`int`	The width of each rain drop.
`drop_color`	`tuple[int, int, int]`	The color of the rain drops in RGB format.
`blur_value`	`int`	The size of the kernel used to blur the image. Rainy views are blurry.
`brightness_coefficient`	`float`	Coefficient to adjust the brightness of the image. Rainy days are usually shady.
`rain_drops`	`list[tuple[int, int]]`	A list of tuples where each tuple represents the (x, y) coordinates of the starting point of a rain drop.

Returns:

Type	Description
`np.ndarray`	Image with rain effect added.

Reference

https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Source code in albumentations/augmentations/functional.py

Python

@preserve_channel_dim
def add_rain(
    img: np.ndarray,
    slant: int,
    drop_length: int,
    drop_width: int,
    drop_color: tuple[int, int, int],
    blur_value: int,
    brightness_coefficient: float,
    rain_drops: list[tuple[int, int]],
) -> np.ndarray:
    """Adds rain drops to the image.

    Args:
        img (np.ndarray): Input image.
        slant (int): The angle of the rain drops.
        drop_length (int): The length of each rain drop.
        drop_width (int): The width of each rain drop.
        drop_color (tuple[int, int, int]): The color of the rain drops in RGB format.
        blur_value (int): The size of the kernel used to blur the image. Rainy views are blurry.
        brightness_coefficient (float): Coefficient to adjust the brightness of the image. Rainy days are usually shady.
        rain_drops (list[tuple[int, int]]): A list of tuples where each tuple represents the (x, y)
            coordinates of the starting point of a rain drop.

    Returns:
        np.ndarray: Image with rain effect added.

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
    """
    non_rgb_warning(img)

    input_dtype = img.dtype
    needs_float = False

    if input_dtype == np.float32:
        img = from_float(img, dtype=np.dtype("uint8"))
        needs_float = True

    image = img.copy()

    for rain_drop_x0, rain_drop_y0 in rain_drops:
        rain_drop_x1 = rain_drop_x0 + slant
        rain_drop_y1 = rain_drop_y0 + drop_length

        cv2.line(
            image,
            (rain_drop_x0, rain_drop_y0),
            (rain_drop_x1, rain_drop_y1),
            drop_color,
            drop_width,
        )

    image = cv2.blur(image, (blur_value, blur_value))  # rainy views are blurry
    image_hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float32)
    image_hsv[:, :, 2] *= brightness_coefficient

    image_rgb = cv2.cvtColor(image_hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)

    return to_float(image_rgb, max_value=255) if needs_float else image_rgb

`def add_shadow (img, vertices_list)` [view source on GitHub]¶

Add shadows to the image by reducing the intensity of the RGB values in specified regions.

Parameters:

Name	Type	Description
`img`	`np.ndarray`	Input image.
`vertices_list`	`list[np.ndarray]`	list of vertices for shadow polygons.

Returns:

Type	Description
`np.ndarray`	Image with shadows added.

Reference

https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Source code in albumentations/augmentations/functional.py

Python

@contiguous
@preserve_channel_dim
def add_shadow(img: np.ndarray, vertices_list: list[np.ndarray]) -> np.ndarray:
    """Add shadows to the image by reducing the intensity of the RGB values in specified regions.

    Args:
        img (np.ndarray): Input image.
        vertices_list (list[np.ndarray]): list of vertices for shadow polygons.

    Returns:
        np.ndarray: Image with shadows added.

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
    """
    non_rgb_warning(img)
    input_dtype = img.dtype
    needs_float = False

    max_value = MAX_VALUES_BY_DTYPE[np.uint8]

    if input_dtype == np.float32:
        img = from_float(img, dtype=np.dtype("uint8"))
        needs_float = True

    mask = np.zeros_like(img, dtype=np.uint8)
    cv2.fillPoly(mask, vertices_list, (max_value, max_value, max_value))

    # Apply shadow to the RGB channels directly
    # It could be tempting to convert to HLS and apply the shadow to the L channel, but it creates artifacts
    shadow_intensity = 0.5  # Adjust this value to control the shadow intensity
    img_shadowed = img.copy()
    shadowed_indices = mask[:, :, 0] == max_value
    img_shadowed[shadowed_indices] = clip(img_shadowed[shadowed_indices] * shadow_intensity, np.uint8)

    if needs_float:
        return to_float(img_shadowed, max_value=max_value)

    return img_shadowed
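
The darkening step itself is a masked multiplication. The sketch below shows it with NumPy only, assuming a hand-made boolean region in place of the polygon mask that the library rasterizes with `cv2.fillPoly`.

```python
import numpy as np

img = np.full((4, 6, 3), 200, dtype=np.uint8)

# Boolean region standing in for the rasterized shadow polygons.
shadowed = np.zeros(img.shape[:2], dtype=bool)
shadowed[1:3, 2:5] = True

shadow_intensity = 0.5  # same fixed factor as in the listing
out = img.copy()
# Scale all three channels of the selected pixels, then return to uint8.
out[shadowed] = np.clip(out[shadowed] * shadow_intensity, 0, 255).astype(np.uint8)
```

Pixels inside the region drop from 200 to 100 in every channel; pixels outside keep their original values.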

`def add_snow (img, snow_point, brightness_coeff)` [view source on GitHub]¶

Bleaches out pixels, imitating snow.

Parameters:

Name	Type	Description
`img`	`np.ndarray`	Input image.
`snow_point`	`float`	A float in the range [0, 1], scaled and adjusted to determine the threshold for pixel modification.
`brightness_coeff`	`float`	Coefficient applied to increase the brightness of pixels below the snow_point threshold. Larger values lead to more pronounced snow effects.

Returns:

Type	Description
`np.ndarray`	Image with simulated snow effect.

Reference

https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Source code in albumentations/augmentations/functional.py

Python

@preserve_channel_dim
def add_snow(img: np.ndarray, snow_point: float, brightness_coeff: float) -> np.ndarray:
    """Bleaches out pixels, imitating snow.

    Args:
        img (np.ndarray): Input image.
        snow_point (float): A float in the range [0, 1], scaled and adjusted to determine
            the threshold for pixel modification.
        brightness_coeff (float): Coefficient applied to increase the brightness of pixels below the snow_point
            threshold. Larger values lead to more pronounced snow effects.

    Returns:
        np.ndarray: Image with simulated snow effect.

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    """
    non_rgb_warning(img)

    input_dtype = img.dtype
    needs_float = False

    snow_point *= 127.5  # = 255 / 2
    snow_point += 85  # = 255 / 3

    if input_dtype == np.float32:
        img = from_float(img, dtype=np.dtype("uint8"))
        needs_float = True

    image_hls = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)
    image_hls = np.array(image_hls, dtype=np.float32)

    image_hls[:, :, 1][image_hls[:, :, 1] < snow_point] *= brightness_coeff

    image_hls[:, :, 1] = clip(image_hls[:, :, 1], np.uint8)

    image_hls = np.array(image_hls, dtype=np.uint8)

    image_rgb = cv2.cvtColor(image_hls, cv2.COLOR_HLS2RGB)

    return to_float(image_rgb, max_value=255) if needs_float else image_rgb
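
The threshold arithmetic maps `snow_point` from [0, 1] to a lightness value between 85 and 212.5. The sketch below applies it to a hand-written lightness channel, skipping the RGB to HLS conversion that the library performs with OpenCV. The helper name `snow_threshold` is hypothetical.

```python
import numpy as np


def snow_threshold(snow_point):
    """Map snow_point from [0, 1] to a lightness threshold, as in the listing."""
    return snow_point * 127.5 + 85


# Lightness values only; the library obtains this channel from an HLS conversion.
lightness = np.array([[40, 120, 200]], dtype=np.float32)
threshold = snow_threshold(0.5)  # 0.5 * 127.5 + 85 = 148.75
brightness_coeff = 2.5

brightened = lightness.copy()
# Only pixels darker than the threshold are scaled, then values are clipped to uint8 range.
brightened[brightened < threshold] *= brightness_coeff
brightened = np.clip(brightened, 0, 255).astype(np.uint8)
```

Here 40 becomes 100, 120 would become 300 and is clipped to 255, and 200 is above the threshold and stays as it is.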

`def add_sun_flare (img, flare_center, src_radius, src_color, circles)` [view source on GitHub]¶

Add a sun flare effect to an image.

Parameters:

Name	Type	Description
`img`	`np.ndarray`	The input image.
`flare_center`	`tuple[float, float]`	(x, y) coordinates of the flare center
`src_radius`	`int`	The radius of the source of the flare.
`src_color`	`ColorType`	The color of the flare, represented as a tuple of RGB values.
`circles`	`list[Any]`	A list of tuples, each representing a circle that contributes to the flare effect. Each tuple contains the alpha value, the center coordinates, the radius, and the color of the circle.

Returns:

Type	Description
`np.ndarray`	The output image with the sun flare effect added.

Reference

https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Source code in albumentations/augmentations/functional.py

Python

@preserve_channel_dim
def add_sun_flare(
    img: np.ndarray,
    flare_center: tuple[float, float],
    src_radius: int,
    src_color: ColorType,
    circles: list[Any],
) -> np.ndarray:
    """Add a sun flare effect to an image.

    Args:
        img (np.ndarray): The input image.
        flare_center (tuple[float, float]): (x, y) coordinates of the flare center
        src_radius (int): The radius of the source of the flare.
        src_color (ColorType): The color of the flare, represented as a tuple of RGB values.
        circles (list[Any]): A list of tuples, each representing a circle that contributes to the flare effect.
            Each tuple contains the alpha value, the center coordinates, the radius, and the color of the circle.

    Returns:
        np.ndarray: The output image with the sun flare effect added.

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
    """
    non_rgb_warning(img)

    input_dtype = img.dtype
    needs_float = False

    if input_dtype == np.float32:
        img = from_float(img, dtype=np.dtype("uint8"))
        needs_float = True

    overlay = img.copy()
    output = img.copy()

    for alpha, (x, y), rad3, (r_color, g_color, b_color) in circles:
        cv2.circle(overlay, (x, y), rad3, (r_color, g_color, b_color), -1)
        output = add_weighted(overlay, alpha, output, 1 - alpha)

    point = [int(x) for x in flare_center]

    overlay = output.copy()
    num_times = src_radius // 10
    alpha = np.linspace(0.0, 1, num=num_times)
    rad = np.linspace(1, src_radius, num=num_times)
    for i in range(num_times):
        cv2.circle(overlay, point, int(rad[i]), src_color, -1)
        alp = alpha[num_times - i - 1] * alpha[num_times - i - 1] * alpha[num_times - i - 1]
        output = add_weighted(overlay, alp, output, 1 - alp)

    return to_float(output, max_value=255) if needs_float else output
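
The flare source is built from concentric circles whose blend weights fall off with the cube of a linear ramp. The sketch below only computes that radius and weight schedule for an example `src_radius` of 50; it draws nothing, and the name `schedule` is introduced here for illustration.

```python
import numpy as np

src_radius = 50
num_times = src_radius // 10  # 5 concentric circles

alpha = np.linspace(0.0, 1, num=num_times)
rad = np.linspace(1, src_radius, num=num_times)

# Circle i is drawn with radius rad[i] and blended with weight alpha[num_times - i - 1] ** 3,
# so the smallest circle gets the largest weight and the largest circle gets weight 0.
schedule = [(int(rad[i]), float(alpha[num_times - i - 1] ** 3)) for i in range(num_times)]
radii = [r for r, _ in schedule]
weights = [w for _, w in schedule]
```

For this radius the circles have radii 1, 13, 25, 37, and 50 with weights 1, 0.421875, 0.125, 0.015625, and 0.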

`def almost_equal_intervals (n, parts)` [view source on GitHub]¶

Generates an array of nearly equal integer intervals that sum up to `n`.

This function divides the number `n` into `parts` nearly equal parts. It ensures that the sum of all parts equals `n`, and the difference between any two parts is at most one. This is useful for distributing a total amount into nearly equal discrete parts.

Parameters:

Name	Type	Description
`n`	`int`	The total value to be split.
`parts`	`int`	The number of parts to split into.

Returns:

Type	Description
`np.ndarray`	An array of integers where each integer represents the size of a part.

Examples:

Python

>>> almost_equal_intervals(20, 3)
array([7, 7, 6])  # Splits 20 into three parts: 7, 7, and 6
>>> almost_equal_intervals(16, 4)
array([4, 4, 4, 4])  # Splits 16 into four equal parts

Source code in albumentations/augmentations/functional.py

Python

def almost_equal_intervals(n: int, parts: int) -> np.ndarray:
    """Generates an array of nearly equal integer intervals that sum up to `n`.

    This function divides the number `n` into `parts` nearly equal parts. It ensures that
    the sum of all parts equals `n`, and the difference between any two parts is at most one.
    This is useful for distributing a total amount into nearly equal discrete parts.

    Args:
        n (int): The total value to be split.
        parts (int): The number of parts to split into.

    Returns:
        np.ndarray: An array of integers where each integer represents the size of a part.

    Example:
        >>> almost_equal_intervals(20, 3)
        array([7, 7, 6])  # Splits 20 into three parts: 7, 7, and 6
        >>> almost_equal_intervals(16, 4)
        array([4, 4, 4, 4])  # Splits 16 into four equal parts
    """
    part_size, remainder = divmod(n, parts)
    # Create an array with the base part size and adjust the first `remainder` parts by adding 1
    return np.array([part_size + 1 if i < remainder else part_size for i in range(parts)])

`def bbox_from_mask (mask)` [view source on GitHub]¶

Create a bounding box from a binary mask (fast version).

Parameters:

Name	Type	Description
`mask`	`numpy.ndarray`	binary mask.

Returns:

Type	Description
`tuple`	A bounding box tuple `(x_min, y_min, x_max, y_max)`.

Source code in albumentations/augmentations/functional.py

Python

def bbox_from_mask(mask: np.ndarray) -> tuple[int, int, int, int]:
    """Create bounding box from binary mask (fast version)

    Args:
        mask (numpy.ndarray): binary mask.

    Returns:
        tuple: A bounding box tuple `(x_min, y_min, x_max, y_max)`.

    """
    rows = np.any(mask, axis=1)
    if not rows.any():
        return -1, -1, -1, -1
    cols = np.any(mask, axis=0)
    y_min, y_max = np.where(rows)[0][[0, -1]]
    x_min, x_max = np.where(cols)[0][[0, -1]]
    return x_min, y_min, x_max + 1, y_max + 1
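
A short usage sketch follows. The function body is repeated from the listing above so that the example runs on its own; the mask contents are arbitrary.

```python
import numpy as np


def bbox_from_mask(mask):
    """Copy of the function listed above, repeated so the example is self-contained."""
    rows = np.any(mask, axis=1)
    if not rows.any():
        return -1, -1, -1, -1
    cols = np.any(mask, axis=0)
    y_min, y_max = np.where(rows)[0][[0, -1]]
    x_min, x_max = np.where(cols)[0][[0, -1]]
    return x_min, y_min, x_max + 1, y_max + 1


mask = np.zeros((5, 7), dtype=np.uint8)
mask[1:3, 2:5] = 1  # foreground in rows 1-2, columns 2-4

box = tuple(int(v) for v in bbox_from_mask(mask))
empty_box = bbox_from_mask(np.zeros((5, 7), dtype=np.uint8))
```

Here `box` is `(2, 1, 5, 3)`: the maximum coordinates are one past the last foreground pixel, so the box can be used directly as slice bounds. An all-zero mask yields `(-1, -1, -1, -1)`.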

`def center (width, height)` [view source on GitHub]¶

Calculate the center coordinates of the image. Used by images, masks and keypoints.

Parameters:

Name	Type	Description
`width`	`NumericType`	The width of the rectangle.
`height`	`NumericType`	The height of the rectangle.

Returns:

Type	Description
`tuple[float, float]`	The center coordinates.

Source code in albumentations/augmentations/functional.py

Python

def center(width: NumericType, height: NumericType) -> tuple[float, float]:
    """Calculate the center coordinates of the image. Used by images, masks and keypoints.

    Args:
        width (NumericType): The width of the rectangle.
        height (NumericType): The height of the rectangle.

    Returns:
        tuple[float, float]: The center coordinates.
    """
    return width / 2 - 0.5, height / 2 - 0.5

`def center_bbox (width, height)` [view source on GitHub]¶

Calculate the center coordinates of the image for bounding boxes.

Parameters:

Name	Type	Description
`width`	`NumericType`	The width of the rectangle.
`height`	`NumericType`	The height of the rectangle.

Returns:

Type	Description
`tuple[float, float]`	The center coordinates.

Source code in albumentations/augmentations/functional.py

Python

def center_bbox(width: NumericType, height: NumericType) -> tuple[float, float]:
    """Calculate the center coordinates of the image for bounding boxes.

    Args:
        width (NumericType): The width of the rectangle.
        height (NumericType): The height of the rectangle.

    Returns:
        tuple[float, float]: The center coordinates.
    """
    return width / 2, height / 2
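
The two center helpers differ by half a pixel. The sketch below repeats both one-line bodies, without type hints, and evaluates them for a 4x4 image.

```python
def center(width, height):
    # Same arithmetic as `center` in the listing above.
    return width / 2 - 0.5, height / 2 - 0.5


def center_bbox(width, height):
    # Same arithmetic as `center_bbox` in the listing above.
    return width / 2, height / 2


pixel_center = center(4, 4)
bbox_center = center_bbox(4, 4)
```

One way to read the result: pixel indices 0 to 3 have their midpoint at 1.5, which is what `center` returns, whereas box coordinates spanning 0 to 4 have their midpoint at 2.0, which is what `center_bbox` returns.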

`def create_shape_groups (tiles)` [view source on GitHub]¶

Groups tiles by their shape and stores the indices for each shape.

Source code in albumentations/augmentations/functional.py

Python

def create_shape_groups(tiles: np.ndarray) -> dict[tuple[int, int], list[int]]:
    """Groups tiles by their shape and stores the indices for each shape."""
    shape_groups = defaultdict(list)
    for index, (start_y, start_x, end_y, end_x) in enumerate(tiles):
        shape = (end_y - start_y, end_x - start_x)
        shape_groups[shape].append(index)
    return shape_groups
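
As an illustration, the grouping can be exercised on a small hand-written tile list. The function body is copied from the listing above; the tile coordinates are made up for the example.

```python
from collections import defaultdict


def create_shape_groups(tiles):
    # Map each (height, width) shape to the indices of the tiles that have it.
    shape_groups = defaultdict(list)
    for index, (start_y, start_x, end_y, end_x) in enumerate(tiles):
        shape = (end_y - start_y, end_x - start_x)
        shape_groups[shape].append(index)
    return shape_groups


# A 5x4 area split into two rows (heights 3 and 2) and two columns (width 2 each).
tiles = [
    (0, 0, 3, 2),
    (0, 2, 3, 4),
    (3, 0, 5, 2),
    (3, 2, 5, 4),
]
groups = dict(create_shape_groups(tiles))
print(groups)  # {(3, 2): [0, 1], (2, 2): [2, 3]}
```

Only tiles that share a shape end up in the same group, which is what allows them to be exchanged later without resizing.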

`def fancy_pca (img, alpha=0.1)` [view source on GitHub]¶

Perform 'Fancy PCA' augmentation

Parameters:

Name	Type	Description
`img`	`np.ndarray`	numpy array with (h, w, rgb) shape, as ints between 0-255
`alpha`	`float`	How much to perturb/scale the eigenvectors and eigenvalues. The paper used std=0.1.

Returns:

Type	Description
`np.ndarray`	numpy image-like array as uint8 range(0, 255)

Reference

http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Source code in albumentations/augmentations/functional.py

Python

@clipped
def fancy_pca(img: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Perform 'Fancy PCA' augmentation

    Args:
        img: numpy array with (h, w, rgb) shape, as ints between 0-255
        alpha: how much to perturb/scale the eigen vectors and values
                the paper used std=0.1

    Returns:
        numpy image-like array as uint8 range(0, 255)

    Reference:
        http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
    """
    if not is_rgb_image(img) or img.dtype != np.uint8:
        msg = "Image must be RGB image in uint8 format."
        raise TypeError(msg)

    orig_img = img.astype(float).copy()

    img = to_float(img)  # rescale to 0 to 1 range

    # flatten image to columns of RGB
    img_rs = img.reshape(-1, 3)
    # img_rs shape (640000, 3)

    # center mean
    img_centered = img_rs - np.mean(img_rs, axis=0)

    # paper says 3x3 covariance matrix
    img_cov = np.cov(img_centered, rowvar=False)

    # eigen values and eigen vectors
    eig_vals, eig_vecs = np.linalg.eigh(img_cov)

    # sort values and vector
    sort_perm = eig_vals[::-1].argsort()
    eig_vals[::-1].sort()
    eig_vecs = eig_vecs[:, sort_perm]

    # > get [p1, p2, p3]
    m1 = np.column_stack(eig_vecs)

    # get 3x1 matrix of eigen values multiplied by random variable draw from normal
    # distribution with mean of 0 and standard deviation of 0.1
    m2 = np.zeros((3, 1))
    # according to the paper alpha should only be draw once per augmentation (not once per channel)
    # > alpha = np.random.normal(0, alpha_std)

    # broad cast to speed things up
    m2[:, 0] = alpha * eig_vals[:]

    # this is the vector that we're going to add to each pixel in a moment
    add_vect = np.array(m1) @ np.array(m2)

    for idx in range(3):  # RGB
        orig_img[..., idx] += add_vect[idx] * 255

    # for image processing it was found that working with float 0.0 to 1.0
    # was easier than integers between 0-255
    return orig_img
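
The core of the technique can be sketched with NumPy alone. The helper below is hypothetical (`pca_color_shift` is not a library function) and follows the formula from the referenced paper, adding the eigenvectors of the RGB covariance matrix weighted by `alpha` times their eigenvalues. It omits the sorting and type checks of the listing above, so treat it as an illustration rather than a drop-in replacement.

```python
import numpy as np


def pca_color_shift(img, alpha=0.1):
    """Per-channel offset (on the 0-255 scale) from a PCA of the RGB values."""
    pixels = img.reshape(-1, 3).astype(np.float64) / 255.0
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)      # 3x3 covariance matrix
    eig_vals, eig_vecs = np.linalg.eigh(cov)  # columns of eig_vecs are eigenvectors
    return (eig_vecs @ (alpha * eig_vals)) * 255.0


rng = np.random.RandomState(0)
img = rng.randint(0, 256, size=(8, 8, 3)).astype(np.uint8)
shift = pca_color_shift(img, alpha=0.1)
augmented = np.clip(img.astype(np.float64) + shift, 0, 255).astype(np.uint8)

# A constant image has zero covariance, so nothing is added to it.
flat = np.full((8, 8, 3), 128, dtype=np.uint8)
flat_shift = pca_color_shift(flat)
```

The same three-element offset is added to every pixel, so the augmentation changes the overall color cast without altering spatial detail.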

`def generate_shuffled_splits (size, divisions, random_state=None)` [view source on GitHub]¶

Generate shuffled splits for a given dimension size and number of divisions.

Parameters:

Name	Type	Description
`size`	`int`	Total size of the dimension (height or width).
`divisions`	`int`	Number of divisions (rows or columns).
`random_state`	`Optional[np.random.RandomState]`	Seed for the random number generator for reproducibility.

Returns:

Type	Description
`np.ndarray`	Cumulative edges of the shuffled intervals.

Source code in albumentations/augmentations/functional.py

Python

def generate_shuffled_splits(
    size: int,
    divisions: int,
    random_state: np.random.RandomState | None = None,
) -> np.ndarray:
    """Generate shuffled splits for a given dimension size and number of divisions.

    Args:
        size (int): Total size of the dimension (height or width).
        divisions (int): Number of divisions (rows or columns).
        random_state (Optional[np.random.RandomState]): Seed for the random number generator for reproducibility.

    Returns:
        np.ndarray: Cumulative edges of the shuffled intervals.
    """
    intervals = almost_equal_intervals(size, divisions)
    intervals = random_utils.shuffle(intervals, random_state=random_state)
    return np.insert(np.cumsum(intervals), 0, 0)
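
A minimal standalone sketch of the same idea follows. `almost_equal_intervals` is not shown in this section, so the version here is a hypothetical stand-in that spreads the remainder over the first intervals, and `RandomState.permutation` stands in for `random_utils.shuffle`.

```python
import numpy as np


def almost_equal_intervals(size, divisions):
    # Hypothetical stand-in: interval lengths that differ by at most one.
    base, remainder = divmod(size, divisions)
    return np.array([base + 1] * remainder + [base] * (divisions - remainder))


def generate_shuffled_splits(size, divisions, random_state=None):
    rng = random_state if random_state is not None else np.random.RandomState()
    intervals = rng.permutation(almost_equal_intervals(size, divisions))
    # Prepend 0 so the result lists every edge, including both borders.
    return np.insert(np.cumsum(intervals), 0, 0)


edges = generate_shuffled_splits(10, 3, np.random.RandomState(0))
print(edges)  # four edges, starting at 0 and ending at 10
```

Whatever the shuffle order, the first edge is 0, the last edge equals `size`, and consecutive edges are 3 or 4 apart in this example.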

`def iso_noise (image, color_shift=0.05, intensity=0.5, random_state=None)` [view source on GitHub]¶

Apply poisson noise to an image to simulate camera sensor noise.

Parameters:

Name	Type	Description
`image`	`np.ndarray`	Input image. Currently, only RGB images are supported.
`color_shift`	`float`	The amount of color shift to apply. Default is 0.05.
`intensity`	`float`	Multiplication factor for noise values. Values of ~0.5 produce a noticeable, yet acceptable level of noise. Default is 0.5.
`random_state`	`Optional[np.random.RandomState]`	If specified, this will be random state used for noise generation.

Returns:

Type	Description
`np.ndarray`	The noised image.

Exceptions:

Type	Description
`TypeError`	If the input image is not an RGB image.

Source code in albumentations/augmentations/functional.py

Python

@clipped
def iso_noise(
    image: np.ndarray,
    color_shift: float = 0.05,
    intensity: float = 0.5,
    random_state: np.random.RandomState | None = None,
) -> np.ndarray:
    """Apply poisson noise to an image to simulate camera sensor noise.

    Args:
        image (np.ndarray): Input image. Currently, only RGB images are supported.
        color_shift (float): The amount of color shift to apply. Default is 0.05.
        intensity (float): Multiplication factor for noise values. Values of ~0.5 produce a noticeable,
                           yet acceptable level of noise. Default is 0.5.
        random_state (Optional[np.random.RandomState]): If specified, this will be random state used
            for noise generation.

    Returns:
        np.ndarray: The noised image.

    Raises:
        TypeError: If the input image is not an RGB image.
    """
    if not is_rgb_image(image):
        msg = "Image must be RGB"
        raise TypeError(msg)

    input_dtype = image.dtype
    factor = 1

    if input_dtype == np.uint8:
        image = to_float(image)
        factor = MAX_VALUES_BY_DTYPE[input_dtype]

    hls = cv2.cvtColor(image, cv2.COLOR_RGB2HLS)
    _, stddev = cv2.meanStdDev(hls)

    luminance_noise = random_utils.poisson(stddev[1] * intensity * 255, size=hls.shape[:2], random_state=random_state)
    color_noise = random_utils.normal(0, color_shift * 360 * intensity, size=hls.shape[:2], random_state=random_state)

    hue = hls[..., 0]
    hue += color_noise
    hue %= 360

    luminance = hls[..., 1]
    luminance += (luminance_noise / 255) * (1.0 - luminance)

    return cv2.cvtColor(hls, cv2.COLOR_HLS2RGB) * factor

`def mask_from_bbox (img, bbox)` [view source on GitHub]¶

Create binary mask from bounding box

Parameters:

Name	Type	Description
`img`	`np.ndarray`	input image
`bbox`	`tuple[int, int, int, int]`	A bounding box tuple `(x_min, y_min, x_max, y_max)`

Returns:

Type	Description
`np.ndarray`	Binary mask with the same height and width as `img`.

Source code in albumentations/augmentations/functional.py

Python

def mask_from_bbox(img: np.ndarray, bbox: tuple[int, int, int, int]) -> np.ndarray:
    """Create binary mask from bounding box

    Args:
        img: input image
        bbox: A bounding box tuple `(x_min, y_min, x_max, y_max)`

    Returns:
        mask: binary mask

    """
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    x_min, y_min, x_max, y_max = bbox
    mask[y_min:y_max, x_min:x_max] = 1
    return mask
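
For illustration, the function can be run on a small array. The body is copied from the listing above; note that the bounding box is given in integer pixel coordinates and that `x_max` and `y_max` are exclusive because they are used as slice ends.

```python
import numpy as np


def mask_from_bbox(img, bbox):
    # Ones inside the box, zeros elsewhere; only the first two image dims are used.
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    x_min, y_min, x_max, y_max = bbox
    mask[y_min:y_max, x_min:x_max] = 1
    return mask


img = np.zeros((4, 6, 3), dtype=np.uint8)
mask = mask_from_bbox(img, (1, 1, 4, 3))
print(mask)
# [[0 0 0 0 0 0]
#  [0 1 1 1 0 0]
#  [0 1 1 1 0 0]
#  [0 0 0 0 0 0]]
```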

`def move_tone_curve (img, low_y, high_y)` [view source on GitHub]¶

Rescales the relationship between bright and dark areas of the image by manipulating its tone curve.

Parameters:

Name	Type	Description
`img`	`np.ndarray`	Input image with any number of channels.
`low_y`	`float \| np.ndarray`	per-channel or single y-position of a Bezier control point used to adjust the tone curve, must be in range [0, 1]
`high_y`	`float \| np.ndarray`	per-channel or single y-position of a Bezier control point used to adjust image tone curve, must be in range [0, 1]

Source code in albumentations/augmentations/functional.py

Python

@preserve_channel_dim
def move_tone_curve(
    img: np.ndarray,
    low_y: float | np.ndarray,
    high_y: float | np.ndarray,
) -> np.ndarray:
    """Rescales the relationship between bright and dark areas of the image by manipulating its tone curve.

    Args:
        img: np.ndarray. Any number of channels
        low_y: per-channel or single y-position of a Bezier control point used
            to adjust the tone curve, must be in range [0, 1]
        high_y: per-channel or single y-position of a Bezier control point used
            to adjust image tone curve, must be in range [0, 1]

    """
    input_dtype = img.dtype
    needs_float = False

    if input_dtype in [np.float32, np.float64, np.float16]:
        img = from_float(img, dtype=np.uint8)
        needs_float = True

    t = np.linspace(0.0, 1.0, 256)

    def evaluate_bez(t: np.ndarray, low_y: float | np.ndarray, high_y: float | np.ndarray) -> np.ndarray:
        one_minus_t = 1 - t
        return (3 * one_minus_t**2 * t * low_y + 3 * one_minus_t * t**2 * high_y + t**3) * 255

    num_channels = get_num_channels(img)

    if np.isscalar(low_y) and np.isscalar(high_y):
        lut = clip(np.rint(evaluate_bez(t, low_y, high_y)), np.uint8)
        output = cv2.LUT(img, lut)
    elif isinstance(low_y, np.ndarray) and isinstance(high_y, np.ndarray):
        luts = clip(np.rint(evaluate_bez(t[:, np.newaxis], low_y, high_y).T), np.uint8)
        output = cv2.merge([cv2.LUT(img[:, :, i], luts[i]) for i in range(num_channels)])
    else:
        raise TypeError(
            f"low_y and high_y must both be of type float or np.ndarray. Got {type(low_y)} and {type(high_y)}",
        )

    return to_float(output, max_value=255) if needs_float else output
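
The lookup table at the heart of this function is the y-coordinate of a cubic Bezier curve whose end points are fixed at 0 and 1. The sketch below builds that table with NumPy only; `tone_curve_lut` is a hypothetical helper name, and NumPy indexing replaces `cv2.LUT`.

```python
import numpy as np


def tone_curve_lut(low_y, high_y):
    # Evaluate the Bezier polynomial at 256 evenly spaced points in [0, 1].
    t = np.linspace(0.0, 1.0, 256)
    one_minus_t = 1 - t
    curve = 3 * one_minus_t**2 * t * low_y + 3 * one_minus_t * t**2 * high_y + t**3
    return np.clip(np.rint(curve * 255), 0, 255).astype(np.uint8)


identity = tone_curve_lut(1 / 3, 2 / 3)  # control points on the diagonal
brighter = tone_curve_lut(0.6, 0.9)      # control points above the diagonal

img = np.array([[0, 64, 128, 255]], dtype=np.uint8)
out = brighter[img]  # apply the table per pixel
```

With `low_y = 1/3` and `high_y = 2/3` the polynomial reduces to `t`, so the table leaves the image unchanged. Raising both control points lifts the mid-tones while black and white stay fixed.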

`def posterize (img, bits)` [view source on GitHub]¶

Reduce the number of bits for each color channel.

Parameters:

Name	Type	Description
`img`	`np.ndarray`	image to posterize.
`bits`	`int`	number of high bits. Must be in range [0, 8]

Returns:

Type	Description
`np.ndarray`	Image with reduced color channels.

Source code in albumentations/augmentations/functional.py

Python

@preserve_channel_dim
def posterize(img: np.ndarray, bits: int) -> np.ndarray:
    """Reduce the number of bits for each color channel.

    Args:
        img: image to posterize.
        bits: number of high bits. Must be in range [0, 8]

    Returns:
        Image with reduced color channels.

    """
    bits_array = np.uint8(bits)

    if img.dtype != np.uint8:
        msg = "Image must have uint8 channel type"
        raise TypeError(msg)
    if np.any((bits_array < 0) | (bits_array > EIGHT)):
        msg = "bits must be in range [0, 8]"
        raise ValueError(msg)

    if not bits_array.shape or len(bits_array) == 1:
        if bits_array == 0:
            return np.zeros_like(img)
        if bits_array == EIGHT:
            return img.copy()

        lut = np.arange(0, 256, dtype=np.uint8)
        mask = ~np.uint8(2 ** (8 - bits_array) - 1)
        lut &= mask

        return cv2.LUT(img, lut)

    if not is_rgb_image(img):
        msg = "If bits is iterable image must be RGB"
        raise TypeError(msg)

    result_img = np.empty_like(img)
    for i, channel_bits in enumerate(bits_array):
        if channel_bits == 0:
            result_img[..., i] = np.zeros_like(img[..., i])
        elif channel_bits == EIGHT:
            result_img[..., i] = img[..., i].copy()
        else:
            lut = np.arange(0, 256, dtype=np.uint8)
            mask = ~np.uint8(2 ** (8 - channel_bits) - 1)
            lut &= mask

            result_img[..., i] = cv2.LUT(img[..., i], lut)

    return result_img
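
The scalar branch amounts to masking off the low bits of every 8-bit value. A minimal sketch, using NumPy indexing in place of `cv2.LUT` and a hypothetical helper name `posterize_lut`:

```python
import numpy as np


def posterize_lut(bits):
    # Keep the `bits` most significant bits of each 8-bit value.
    mask = 0xFF & ~((1 << (8 - bits)) - 1)
    return np.arange(256, dtype=np.uint8) & np.uint8(mask)


img = np.array([[31, 100, 200, 255]], dtype=np.uint8)
lut = posterize_lut(3)  # mask is 0b11100000
posterized = lut[img]
print(posterized)  # [[  0  96 192 224]]
```

With 3 bits kept, each channel can take only 8 distinct values (multiples of 32).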

`def shuffle_tiles_within_shape_groups (shape_groups, random_state=None)` [view source on GitHub]¶

Shuffles indices within each group of similar shapes and creates a list where each index points to the index of the tile it should be mapped to.

Parameters:

Name	Type	Description
`shape_groups`	`dict[tuple[int, int], list[int]]`	Groups of tile indices categorized by shape.
`random_state`	`Optional[np.random.RandomState]`	Seed for the random number generator for reproducibility.

Returns:

Type	Description
`list[int]`	A list where each index is mapped to the new index of the tile after shuffling.

Source code in albumentations/augmentations/functional.py

Python

def shuffle_tiles_within_shape_groups(
    shape_groups: dict[tuple[int, int], list[int]],
    random_state: np.random.RandomState | None = None,
) -> list[int]:
    """Shuffles indices within each group of similar shapes and creates a list where each
    index points to the index of the tile it should be mapped to.

    Args:
        shape_groups (dict[tuple[int, int], list[int]]): Groups of tile indices categorized by shape.
        random_state (Optional[np.random.RandomState]): Seed for the random number generator for reproducibility.

    Returns:
        list[int]: A list where each index is mapped to the new index of the tile after shuffling.
    """
    # Initialize the output list with the same size as the total number of tiles, filled with -1
    num_tiles = sum(len(indices) for indices in shape_groups.values())
    mapping = [-1] * num_tiles

    # Prepare the random number generator

    for indices in shape_groups.values():
        shuffled_indices = random_utils.shuffle(indices.copy(), random_state=random_state)
        for old, new in zip(indices, shuffled_indices):
            mapping[old] = new

    return mapping
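
A standalone sketch of the same logic, with `RandomState.permutation` standing in for `random_utils.shuffle` (an assumption, since that helper is not shown here). The shape groups are made up for the example.

```python
import numpy as np


def shuffle_tiles_within_shape_groups(shape_groups, random_state=None):
    rng = random_state if random_state is not None else np.random.RandomState()
    num_tiles = sum(len(indices) for indices in shape_groups.values())
    mapping = [-1] * num_tiles
    for indices in shape_groups.values():
        # Tiles are only exchanged with tiles of the same shape.
        shuffled = rng.permutation(indices).tolist()
        for old, new in zip(indices, shuffled):
            mapping[old] = new
    return mapping


shape_groups = {(3, 2): [0, 1, 4], (2, 2): [2, 3]}
mapping = shuffle_tiles_within_shape_groups(shape_groups, np.random.RandomState(42))
print(mapping)
```

The result is always a permutation of the tile indices in which indices 0, 1 and 4 map among themselves, as do indices 2 and 3.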

`def solarize (img, threshold=128)` [view source on GitHub]¶

Invert all pixel values at or above a threshold.

Parameters:

Name	Type	Description
`img`	`np.ndarray`	The image to solarize.
`threshold`	`int`	All pixels at or above this grayscale level are inverted.

Returns:

Type	Description
`np.ndarray`	Solarized image.

Source code in albumentations/augmentations/functional.py

Python

def solarize(img: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Invert all pixel values above a threshold.

    Args:
        img: The image to solarize.
        threshold: All pixels above this grayscale level are inverted.

    Returns:
        Solarized image.

    """
    dtype = img.dtype
    max_val = MAX_VALUES_BY_DTYPE[dtype]

    if dtype == np.uint8:
        lut = [(i if i < threshold else max_val - i) for i in range(int(max_val) + 1)]

        prev_shape = img.shape
        img = cv2.LUT(img, np.array(lut, dtype=dtype))

        if len(prev_shape) != len(img.shape):
            img = np.expand_dims(img, -1)
        return img

    result_img = img.copy()
    cond = img >= threshold
    result_img[cond] = max_val - result_img[cond]
    return result_img
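
A short worked example of the `uint8` branch, with NumPy indexing in place of `cv2.LUT` (`solarize_uint8` is a hypothetical name for this reduced version):

```python
import numpy as np


def solarize_uint8(img, threshold=128):
    # Values below the threshold are kept; the rest are replaced by 255 - value.
    lut = np.array([i if i < threshold else 255 - i for i in range(256)], dtype=np.uint8)
    return lut[img]


img = np.array([[0, 100, 127, 128, 200, 255]], dtype=np.uint8)
solarized = solarize_uint8(img)
print(solarized)  # [[  0 100 127 127  55   0]]
```

The value 128 itself is inverted, which shows that the comparison in the listing above is inclusive.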

`def split_uniform_grid (image_shape, grid, random_state=None)` [view source on GitHub]¶

Splits an image shape into a uniform grid specified by the grid dimensions.

Parameters:

Name	Type	Description
`image_shape`	`tuple[int, int]`	The shape of the image as (height, width).
`grid`	`tuple[int, int]`	The grid size as (rows, columns).
`random_state`	`Optional[np.random.RandomState]`	The random state to use for shuffling the splits. If None, the splits are not shuffled.

Returns:

Type	Description
`np.ndarray`	An array containing the tiles' coordinates in the format (start_y, start_x, end_y, end_x).

Note

The function uses generate_shuffled_splits to generate the splits for the height and width of the image. The splits are then used to calculate the coordinates of the tiles.

Source code in albumentations/augmentations/functional.py

Python

def split_uniform_grid(
    image_shape: tuple[int, int],
    grid: tuple[int, int],
    random_state: np.random.RandomState | None = None,
) -> np.ndarray:
    """Splits an image shape into a uniform grid specified by the grid dimensions.

    Args:
        image_shape (tuple[int, int]): The shape of the image as (height, width).
        grid (tuple[int, int]): The grid size as (rows, columns).
        random_state (Optional[np.random.RandomState]): The random state to use for shuffling the splits.
            If None, the splits are not shuffled.

    Returns:
        np.ndarray: An array containing the tiles' coordinates in the format (start_y, start_x, end_y, end_x).

    Note:
        The function uses `generate_shuffled_splits` to generate the splits for the height and width of the image.
        The splits are then used to calculate the coordinates of the tiles.
    """
    n_rows, n_cols = grid

    height_splits = generate_shuffled_splits(image_shape[0], grid[0], random_state)
    width_splits = generate_shuffled_splits(image_shape[1], grid[1], random_state)

    # Calculate tiles coordinates
    tiles = [
        (height_splits[i], width_splits[j], height_splits[i + 1], width_splits[j + 1])
        for i in range(n_rows)
        for j in range(n_cols)
    ]

    return np.array(tiles)
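
To make the tile layout concrete, here is a simplified, unshuffled variant. It is a sketch under the assumption that evenly spaced edges are acceptable; `split_grid` is a hypothetical name, and `np.linspace` replaces `generate_shuffled_splits`.

```python
import numpy as np


def split_grid(image_shape, grid):
    n_rows, n_cols = grid
    # Evenly spaced edges along each axis, including both borders.
    height_splits = np.linspace(0, image_shape[0], n_rows + 1).astype(int)
    width_splits = np.linspace(0, image_shape[1], n_cols + 1).astype(int)
    return np.array(
        [
            (height_splits[i], width_splits[j], height_splits[i + 1], width_splits[j + 1])
            for i in range(n_rows)
            for j in range(n_cols)
        ]
    )


tiles = split_grid((4, 6), (2, 3))
print(tiles)
# [[0 0 2 2]
#  [0 2 2 4]
#  [0 4 2 6]
#  [2 0 4 2]
#  [2 2 4 4]
#  [2 4 4 6]]
```

Tiles are listed row by row, each as `(start_y, start_x, end_y, end_x)`.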

`def swap_tiles_on_image (image, tiles, mapping=None)` [view source on GitHub]¶

Swap tiles on the image according to the given mapping of tile indices.

Parameters:

Name	Type	Description
`image`	`np.ndarray`	Input image.
`tiles`	`np.ndarray`	Array of tiles with each tile as [start_y, start_x, end_y, end_x].
`mapping`	`list[int] \| None`	list of new tile indices.

Returns:

Type	Description
`np.ndarray`	Output image with tiles swapped according to the random shuffle.

Source code in albumentations/augmentations/functional.py

Python

def swap_tiles_on_image(image: np.ndarray, tiles: np.ndarray, mapping: list[int] | None = None) -> np.ndarray:
    """Swap tiles on the image according to the new format.

    Args:
        image: Input image.
        tiles: Array of tiles with each tile as [start_y, start_x, end_y, end_x].
        mapping: list of new tile indices.

    Returns:
        np.ndarray: Output image with tiles swapped according to the random shuffle.
    """
    # If no tiles are provided, return a copy of the original image
    if tiles.size == 0 or mapping is None:
        return image.copy()

    # Create a copy of the image to retain original for reference
    new_image = np.empty_like(image)
    for num, new_index in enumerate(mapping):
        start_y, start_x, end_y, end_x = tiles[new_index]
        start_y_orig, start_x_orig, end_y_orig, end_x_orig = tiles[num]
        # Assign the corresponding tile from the original image to the new image
        new_image[start_y:end_y, start_x:end_x] = image[start_y_orig:end_y_orig, start_x_orig:end_x_orig]

    return new_image
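
A small example shows how `mapping` is interpreted: the tile at position `num` in the original image is written to the location of tile `mapping[num]` in the output. The function body is taken from the listing above; the image and tiles are made up.

```python
import numpy as np


def swap_tiles_on_image(image, tiles, mapping=None):
    if tiles.size == 0 or mapping is None:
        return image.copy()
    new_image = np.empty_like(image)
    for num, new_index in enumerate(mapping):
        start_y, start_x, end_y, end_x = tiles[new_index]
        start_y_orig, start_x_orig, end_y_orig, end_x_orig = tiles[num]
        # Copy the original tile `num` into the slot of tile `new_index`.
        new_image[start_y:end_y, start_x:end_x] = image[start_y_orig:end_y_orig, start_x_orig:end_x_orig]
    return new_image


image = np.array([[1, 1, 2, 2],
                  [1, 1, 2, 2]], dtype=np.uint8)
tiles = np.array([[0, 0, 2, 2], [0, 2, 2, 4]])  # left half, right half
swapped = swap_tiles_on_image(image, tiles, mapping=[1, 0])
print(swapped)
# [[2 2 1 1]
#  [2 2 1 1]]
```

Because the output starts from `np.empty_like`, the mapping has to cover every tile, and swapped tiles must have equal shapes.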

`geometric` `special` ¶

`functional` ¶

`def bbox_d4 (bbox, group_member, rows=None, cols=None)` [view source on GitHub]¶

Applies a D_4 symmetry group transformation to a bounding box.

The function transforms a bounding box according to the specified group member from the D_4 group. These transformations include rotations and reflections, specified to work on an image's bounding box given its dimensions.

Parameters:

Name	Type	Description
`bbox`	`BoxInternalType`	The bounding box to transform. This should be a structure specifying coordinates like (xmin, ymin, xmax, ymax).
`group_member`	`D4Type`	A string identifier for the D_4 group transformation to apply. Valid values are 'e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't'.
`rows`	`int \| None`	The number of rows in the image, used to adjust transformations that depend on image dimensions.
`cols`	`int \| None`	The number of columns in the image, used for the same purposes as rows.

Returns:

Type	Description
`BoxInternalType`	The transformed bounding box.

Exceptions:

Type	Description
`ValueError`	If an invalid group member is specified.

Examples:

Applying a 90-degree rotation: `bbox_d4((10, 20, 110, 120), 'r90', 100, 100)`. This would rotate the bounding box 90 degrees within a 100x100 image.

Source code in albumentations/augmentations/geometric/functional.py

Python

def bbox_d4(
    bbox: BoxInternalType,
    group_member: D4Type,
    rows: int | None = None,
    cols: int | None = None,
) -> BoxInternalType:
    """Applies a `D_4` symmetry group transformation to a bounding box.

    The function transforms a bounding box according to the specified group member from the `D_4` group.
    These transformations include rotations and reflections, specified to work on an image's bounding box given
    its dimensions.

    Parameters:
    - bbox (BoxInternalType): The bounding box to transform. This should be a structure specifying coordinates
        like (xmin, ymin, xmax, ymax).
    - group_member (D4Type): A string identifier for the `D_4` group transformation to apply.
        Valid values are 'e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't'.
    - rows (int): The number of rows in the image, used to adjust transformations that depend on image dimensions.
    - cols (int): The number of columns in the image, used for the same purposes as rows.

    Returns:
    - BoxInternalType: The transformed bounding box.

    Raises:
    - ValueError: If an invalid group member is specified.

    Examples:
    - Applying a 90-degree rotation:
      `bbox_d4((10, 20, 110, 120), 'r90', 100, 100)`
      This would rotate the bounding box 90 degrees within a 100x100 image.
    """
    transformations = {
        "e": lambda x: x,  # Identity transformation
        "r90": lambda x: bbox_rot90(x, 1),  # Rotate 90 degrees
        "r180": lambda x: bbox_rot90(x, 2),  # Rotate 180 degrees
        "r270": lambda x: bbox_rot90(x, 3),  # Rotate 270 degrees
        "v": lambda x: bbox_vflip(x, rows, cols),  # Vertical flip
        "hvt": lambda x: bbox_transpose(bbox_rot90(x, 2)),  # Reflect over anti-diagonal
        "h": lambda x: bbox_hflip(x),  # Horizontal flip
        "t": lambda x: bbox_transpose(x),  # Transpose (reflect over main diagonal)
    }

    # Execute the appropriate transformation
    if group_member in transformations:
        return transformations[group_member](bbox)

    raise ValueError(f"Invalid group member: {group_member}")

`def bbox_flip (bbox, d, rows=None, cols=None)` [view source on GitHub]¶

Flip a bounding box either vertically, horizontally or both depending on the value of d.

Parameters:

Name	Type	Description
`bbox`	`BoxInternalType`	A bounding box `(x_min, y_min, x_max, y_max)`.
`d`	`int`	Flip direction. 0 for vertical flip, 1 for horizontal flip, -1 for both vertical and horizontal flip.
`rows`	`int \| None`	Image rows.
`cols`	`int \| None`	Image cols.

Returns:

Type	Description
`BoxInternalType`	A bounding box `(x_min, y_min, x_max, y_max)`.

Exceptions:

Type	Description
`ValueError`	if value of `d` is not -1, 0 or 1.

Source code in albumentations/augmentations/geometric/functional.py

Python

def bbox_flip(bbox: BoxInternalType, d: int, rows: int | None = None, cols: int | None = None) -> BoxInternalType:
    """Flip a bounding box either vertically, horizontally or both depending on the value of `d`.

    Args:
        bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
        d: Flip direction. 0 for vertical flip, 1 for horizontal flip, -1 for both vertical and horizontal flip.
        rows: Image rows.
        cols: Image cols.

    Returns:
        A bounding box `(x_min, y_min, x_max, y_max)`.

    Raises:
        ValueError: if value of `d` is not -1, 0 or 1.

    """
    if d == 0:
        bbox = bbox_vflip(bbox)
    elif d == 1:
        bbox = bbox_hflip(bbox)
    elif d == -1:
        bbox = bbox_hflip(bbox)
        bbox = bbox_vflip(bbox)
    else:
        raise ValueError(f"Invalid d value {d}. Valid values are -1, 0 and 1")
    return bbox
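
The effect of each `d` value can be checked on a normalized box. The sketch below re-implements the three functions from the listings in this section, dropping the unused `rows` and `cols` arguments; the coordinates are chosen to be exact in binary floating point.

```python
def bbox_hflip(bbox):
    x_min, y_min, x_max, y_max = bbox[:4]
    return 1 - x_max, y_min, 1 - x_min, y_max


def bbox_vflip(bbox):
    x_min, y_min, x_max, y_max = bbox[:4]
    return x_min, 1 - y_max, x_max, 1 - y_min


def bbox_flip(bbox, d):
    if d == 0:
        return bbox_vflip(bbox)
    if d == 1:
        return bbox_hflip(bbox)
    if d == -1:
        # Both flips in sequence, equivalent to a 180-degree rotation.
        return bbox_vflip(bbox_hflip(bbox))
    raise ValueError(f"Invalid d value {d}. Valid values are -1, 0 and 1")


bbox = (0.125, 0.25, 0.5, 0.5)
vertical = bbox_flip(bbox, 0)    # (0.125, 0.5, 0.5, 0.75)
horizontal = bbox_flip(bbox, 1)  # (0.5, 0.25, 0.875, 0.5)
both = bbox_flip(bbox, -1)       # (0.5, 0.5, 0.875, 0.75)
```

The coordinates are fractions of the image size, which is why the flips subtract from 1 rather than from the image width or height.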

`def bbox_hflip (bbox, rows=None, cols=None)` [view source on GitHub]¶

Flip a bounding box horizontally around the y-axis.

Parameters:

Name	Type	Description
`bbox`	`BoxInternalType`	A bounding box `(x_min, y_min, x_max, y_max)`.
`rows`	`int \| None`	Image rows.
`cols`	`int \| None`	Image cols.

Returns:

Type	Description
`BoxInternalType`	A bounding box `(x_min, y_min, x_max, y_max)`.

Source code in albumentations/augmentations/geometric/functional.py

Python

def bbox_hflip(bbox: BoxInternalType, rows: int | None = None, cols: int | None = None) -> BoxInternalType:
    """Flip a bounding box horizontally around the y-axis.

    Args:
        bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
        rows: Image rows.
        cols: Image cols.

    Returns:
        A bounding box `(x_min, y_min, x_max, y_max)`.

    """
    x_min, y_min, x_max, y_max = bbox[:4]
    return 1 - x_max, y_min, 1 - x_min, y_max

`def bbox_rot90 (bbox, factor, rows=None, cols=None)` [view source on GitHub]¶

Rotates a bounding box by 90 degrees CCW (see np.rot90)

Parameters:

Name	Type	Description
`bbox`	`BoxInternalType`	A bounding box tuple (x_min, y_min, x_max, y_max).
`factor`	`int`	Number of CCW rotations. Must be in set {0, 1, 2, 3} See np.rot90.
`rows`	`int \| None`	Image rows.
`cols`	`int \| None`	Image cols.

Returns:

Type	Description
`tuple`	A bounding box tuple (x_min, y_min, x_max, y_max).

Source code in albumentations/augmentations/geometric/functional.py

Python

def bbox_rot90(bbox: BoxInternalType, factor: int, rows: int | None = None, cols: int | None = None) -> BoxInternalType:
    """Rotates a bounding box by 90 degrees CCW (see np.rot90)

    Args:
        bbox: A bounding box tuple (x_min, y_min, x_max, y_max).
        factor: Number of CCW rotations. Must be in set {0, 1, 2, 3} See np.rot90.
        rows: Image rows.
        cols: Image cols.

    Returns:
        tuple: A bounding box tuple (x_min, y_min, x_max, y_max).

    """
    if factor not in {0, 1, 2, 3}:
        msg = "Parameter n must be in set {0, 1, 2, 3}"
        raise ValueError(msg)
    x_min, y_min, x_max, y_max = bbox[:4]
    if factor == 1:
        bbox = y_min, 1 - x_max, y_max, 1 - x_min
    elif factor == ROT90_180_FACTOR:
        bbox = 1 - x_max, 1 - y_max, 1 - x_min, 1 - y_min
    elif factor == ROT90_270_FACTOR:
        bbox = 1 - y_max, x_min, 1 - y_min, x_max
    return bbox
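
One way to sanity-check the three formulas is to confirm that they compose like rotations. The sketch below re-implements the function with literal factors 2 and 3 in place of `ROT90_180_FACTOR` and `ROT90_270_FACTOR` (assumed here to equal 2 and 3) and applies a single quarter turn repeatedly.

```python
def bbox_rot90(bbox, factor):
    if factor not in {0, 1, 2, 3}:
        raise ValueError("factor must be in set {0, 1, 2, 3}")
    x_min, y_min, x_max, y_max = bbox[:4]
    if factor == 1:
        return y_min, 1 - x_max, y_max, 1 - x_min
    if factor == 2:
        return 1 - x_max, 1 - y_max, 1 - x_min, 1 - y_min
    if factor == 3:
        return 1 - y_max, x_min, 1 - y_min, x_max
    return bbox


bbox = (0.125, 0.25, 0.5, 0.5)
once = bbox_rot90(bbox, 1)    # (0.25, 0.5, 0.5, 0.875)
twice = bbox_rot90(once, 1)   # (0.5, 0.5, 0.875, 0.75)
thrice = bbox_rot90(twice, 1) # (0.5, 0.125, 0.75, 0.5)
full_turn = bbox_rot90(thrice, 1)
```

Two and three single turns give the same boxes as `factor=2` and `factor=3`, and four turns return the original box.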

`def bbox_rotate (bbox, angle, method, rows, cols)` [view source on GitHub]¶

Rotates a bounding box by angle degrees.

Parameters:

Name	Type	Description
`bbox`	`BoxInternalType`	A bounding box `(x_min, y_min, x_max, y_max)`.
`angle`	`float`	Angle of rotation in degrees.
`method`	`str`	Rotation method used. Should be one of: "largest_box", "ellipse".
`rows`	`int`	Image rows.
`cols`	`int`	Image cols.

Returns:

Type	Description
`BoxInternalType`	A bounding box `(x_min, y_min, x_max, y_max)`.

Reference

https://arxiv.org/abs/2109.13488

Source code in albumentations/augmentations/geometric/functional.py

Python

def bbox_rotate(bbox: BoxInternalType, angle: float, method: str, rows: int, cols: int) -> BoxInternalType:
    """Rotates a bounding box by angle degrees.

    Args:
        bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
        angle: Angle of rotation in degrees.
        method: Rotation method used. Should be one of: "largest_box", "ellipse". Default: "largest_box".
        rows: Image rows.
        cols: Image cols.

    Returns:
        A bounding box `(x_min, y_min, x_max, y_max)`.

    Reference:
        https://arxiv.org/abs/2109.13488

    """
    x_min, y_min, x_max, y_max = bbox[:4]
    scale = cols / float(rows)
    if method == "largest_box":
        x = np.array([x_min, x_max, x_max, x_min]) - 0.5
        y = np.array([y_min, y_min, y_max, y_max]) - 0.5
    elif method == "ellipse":
        w = (x_max - x_min) / 2
        h = (y_max - y_min) / 2
        data = np.arange(0, 360, dtype=np.float32)
        x = w * np.sin(np.radians(data)) + (w + x_min - 0.5)
        y = h * np.cos(np.radians(data)) + (h + y_min - 0.5)
    else:
        raise ValueError(f"Method {method} is not a valid rotation method.")
    angle = np.deg2rad(angle)
    x_t = (np.cos(angle) * x * scale + np.sin(angle) * y) / scale
    y_t = -np.sin(angle) * x * scale + np.cos(angle) * y
    x_t = x_t + 0.5
    y_t = y_t + 0.5

    x_min, x_max = min(x_t), max(x_t)
    y_min, y_max = min(y_t), max(y_t)

    return x_min, y_min, x_max, y_max
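
The "largest_box" branch can be sketched in a few lines of NumPy: the four corners are shifted so that the image center is the origin, rotated with the aspect ratio taken into account, and the axis-aligned bounds of the result are returned. `bbox_rotate_largest_box` is a hypothetical name for this reduced version.

```python
import numpy as np


def bbox_rotate_largest_box(bbox, angle, rows, cols):
    x_min, y_min, x_max, y_max = bbox[:4]
    scale = cols / float(rows)  # aspect ratio, since coordinates are normalized
    x = np.array([x_min, x_max, x_max, x_min]) - 0.5
    y = np.array([y_min, y_min, y_max, y_max]) - 0.5
    theta = np.deg2rad(angle)
    x_t = (np.cos(theta) * x * scale + np.sin(theta) * y) / scale + 0.5
    y_t = -np.sin(theta) * x * scale + np.cos(theta) * y + 0.5
    return float(x_t.min()), float(y_t.min()), float(x_t.max()), float(y_t.max())


bbox = (0.125, 0.25, 0.5, 0.5)
unchanged = bbox_rotate_largest_box(bbox, 0, rows=100, cols=200)
half_turn = bbox_rotate_largest_box(bbox, 180, rows=100, cols=200)
quarter_turn = bbox_rotate_largest_box(bbox, 90, rows=100, cols=100)
diagonal = bbox_rotate_largest_box((0.4, 0.4, 0.6, 0.6), 45, rows=100, cols=100)
```

For a square image, a 90-degree rotation matches `bbox_rot90` with `factor=1` up to floating-point error. At 45 degrees the enclosing box grows, because it must contain the rotated corners.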

`def bbox_transpose (bbox, rows=None, cols=None)` [view source on GitHub]¶

Transposes a bounding box by swapping its x and y coordinates.

Parameters:

Name	Type	Description
`bbox`	`KeypointInternalType`	A bounding box `(x_min, y_min, x_max, y_max)`.
`rows`	`int \| None`	Image rows.
`cols`	`int \| None`	Image cols.

Returns:

Type	Description
`KeypointInternalType`	A bounding box tuple `(x_min, y_min, x_max, y_max)`.

Exceptions:

Type	Description
`ValueError`	If axis not equal to 0 or 1.

Source code in albumentations/augmentations/geometric/functional.py

Python

def bbox_transpose(
    bbox: KeypointInternalType,
    rows: int | None = None,
    cols: int | None = None,
) -> KeypointInternalType:
    """Transposes a bounding box along given axis.

    Args:
        bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
        rows: Image rows.
        cols: Image cols.

    Returns:
        A bounding box tuple `(x_min, y_min, x_max, y_max)`.

    Raises:
        ValueError: If axis not equal to 0 or 1.

    """
    x_min, y_min, x_max, y_max = bbox[:4]
    return (y_min, x_min, y_max, x_max)

`def bbox_vflip (bbox, rows=None, cols=None)` [view source on GitHub]¶

Flip a bounding box vertically around the x-axis.

Parameters:

Name	Type	Description
`bbox`	`BoxInternalType`	A bounding box `(x_min, y_min, x_max, y_max)`.
`rows`	`int \| None`	Image rows.
`cols`	`int \| None`	Image cols.

Returns:

Type	Description
`tuple`	A bounding box `(x_min, y_min, x_max, y_max)`.

Source code in albumentations/augmentations/geometric/functional.py

Python

def bbox_vflip(bbox: BoxInternalType, rows: int | None = None, cols: int | None = None) -> BoxInternalType:
    """Flip a bounding box vertically around the x-axis.

    Args:
        bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
        rows: Image rows.
        cols: Image cols.

    Returns:
        tuple: A bounding box `(x_min, y_min, x_max, y_max)`.

    """
    x_min, y_min, x_max, y_max = bbox[:4]
    return x_min, 1 - y_max, x_max, 1 - y_min

`def d4 (img, group_member)` [view source on GitHub]¶

Applies a D_4 symmetry group transformation to an image array.

This function manipulates an image using transformations such as rotations and flips, corresponding to the D_4 dihedral group symmetry operations. Each transformation is identified by a unique group member code.

Parameters:

Name	Type	Description
`img`	`np.ndarray`	The input image array to transform.
`group_member`	`D4Type`	A string identifier indicating the specific transformation to apply. Valid codes are listed in the table below.

Valid codes for `group_member`:

Code	Transformation
`'e'`	Identity (no transformation).
`'r90'`	Rotate 90 degrees counterclockwise.
`'r180'`	Rotate 180 degrees.
`'r270'`	Rotate 270 degrees counterclockwise.
`'v'`	Vertical flip.
`'hvt'`	Transpose over the second diagonal.
`'h'`	Horizontal flip.
`'t'`	Transpose (reflect over the main diagonal).

Returns:

Type	Description
`np.ndarray`	The transformed image array.

Exceptions:

Type	Description
`ValueError`	If an invalid group member is specified.

Examples:

Rotating an image by 90 degrees: `transformed_image = d4(original_image, 'r90')`
Applying a horizontal flip to an image: `transformed_image = d4(original_image, 'h')`

Source code in albumentations/augmentations/geometric/functional.py

Python

def d4(img: np.ndarray, group_member: D4Type) -> np.ndarray:
    """Applies a `D_4` symmetry group transformation to an image array.

    This function manipulates an image using transformations such as rotations and flips,
    corresponding to the `D_4` dihedral group symmetry operations.
    Each transformation is identified by a unique group member code.

    Parameters:
    - img (np.ndarray): The input image array to transform.
    - group_member (D4Type): A string identifier indicating the specific transformation to apply. Valid codes include:
      - 'e': Identity (no transformation).
      - 'r90': Rotate 90 degrees counterclockwise.
      - 'r180': Rotate 180 degrees.
      - 'r270': Rotate 270 degrees counterclockwise.
      - 'v': Vertical flip.
      - 'hvt': Transpose over second diagonal
      - 'h': Horizontal flip.
      - 't': Transpose (reflect over the main diagonal).

    Returns:
    - np.ndarray: The transformed image array.

    Raises:
    - ValueError: If an invalid group member is specified.

    Examples:
    - Rotating an image by 90 degrees:
      `transformed_image = d4(original_image, 'r90')`
    - Applying a horizontal flip to an image:
      `transformed_image = d4(original_image, 'h')`
    """
    transformations = {
        "e": lambda x: x,  # Identity transformation
        "r90": lambda x: rot90(x, 1),  # Rotate 90 degrees
        "r180": lambda x: rot90(x, 2),  # Rotate 180 degrees
        "r270": lambda x: rot90(x, 3),  # Rotate 270 degrees
        "v": vflip,  # Vertical flip
        "hvt": lambda x: transpose(rot90(x, 2)),  # Reflect over anti-diagonal
        "h": hflip,  # Horizontal flip
        "t": transpose,  # Transpose (reflect over main diagonal)
    }

    # Execute the appropriate transformation
    if group_member in transformations:
        return np.ascontiguousarray(transformations[group_member](img))

    raise ValueError(f"Invalid group member: {group_member}")
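
The eight group members can be illustrated on a tiny array. The following is a minimal sketch rather than the library implementation: `d4_sketch` is a hypothetical name, and it assumes that the library's `rot90`, `vflip`, `hflip` and `transpose` helpers behave like `np.rot90`, `np.flipud`, `np.fliplr` and a swap of the first two axes.

```python
import numpy as np


def d4_sketch(img: np.ndarray, group_member: str) -> np.ndarray:
    """Illustrative stand-in for d4, built only from NumPy calls (assumption)."""
    transformations = {
        "e": lambda x: x,  # identity
        "r90": lambda x: np.rot90(x, 1),  # 90 degrees counterclockwise
        "r180": lambda x: np.rot90(x, 2),
        "r270": lambda x: np.rot90(x, 3),
        "v": np.flipud,  # vertical flip (rows reversed)
        "hvt": lambda x: np.swapaxes(np.rot90(x, 2), 0, 1),  # anti-diagonal reflection
        "h": np.fliplr,  # horizontal flip (columns reversed)
        "t": lambda x: np.swapaxes(x, 0, 1),  # main-diagonal reflection
    }
    if group_member not in transformations:
        raise ValueError(f"Invalid group member: {group_member}")
    return np.ascontiguousarray(transformations[group_member](img))


img = np.array([[1, 2], [3, 4]])
names = ("e", "r90", "r180", "r270", "v", "hvt", "h", "t")
results = {name: d4_sketch(img, name).tolist() for name in names}
for name, value in results.items():
    print(name, value)
```

On this 2x2 input every group member yields a different array, which is a quick way to check that the eight codes are distinct operations.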

`def elastic_transform (img, alpha, sigma, interpolation, border_mode, value=None, random_state=None, approximate=False, same_dxdy=False)` [view source on GitHub]¶

Apply an elastic transformation to an image.

Source code in albumentations/augmentations/geometric/functional.py

Python

@preserve_channel_dim
def elastic_transform(
    img: np.ndarray,
    alpha: float,
    sigma: float,
    interpolation: int,
    border_mode: int,
    value: ColorType | None = None,
    random_state: np.random.RandomState | None = None,
    approximate: bool = False,
    same_dxdy: bool = False,
) -> np.ndarray:
    """Apply an elastic transformation to an image."""
    if approximate:
        return elastic_transform_approximate(
            img,
            alpha,
            sigma,
            interpolation,
            border_mode,
            value,
            random_state,
            same_dxdy,
        )
    return elastic_transform_precise(
        img,
        alpha,
        sigma,
        interpolation,
        border_mode,
        value,
        random_state,
        same_dxdy,
    )

`def elastic_transform_approximate (img, alpha, sigma, interpolation, border_mode, value, random_state, same_dxdy=False)` [view source on GitHub]¶

Apply an approximate elastic transformation to an image.

Source code in albumentations/augmentations/geometric/functional.py

Python

def elastic_transform_approximate(
    img: np.ndarray,
    alpha: float,
    sigma: float,
    interpolation: int,
    border_mode: int,
    value: ColorType | None,
    random_state: np.random.RandomState | None,
    same_dxdy: bool = False,
) -> np.ndarray:
    """Apply an approximate elastic transformation to an image."""
    return elastic_transform_helper(
        img,
        alpha,
        sigma,
        interpolation,
        border_mode,
        value,
        random_state,
        same_dxdy,
        kernel_size=(17, 17),
    )

`def elastic_transform_precise (img, alpha, sigma, interpolation, border_mode, value, random_state, same_dxdy=False)` [view source on GitHub]¶

Apply a precise elastic transformation to an image.

This function applies an elastic deformation to the input image using a precise method. The transformation involves creating random displacement fields, smoothing them using Gaussian blur with adaptive kernel size, and then remapping the image according to the smoothed displacement fields.

Parameters:

Name	Type	Description
`img`	`np.ndarray`	Input image.
`alpha`	`float`	Scaling factor for the random displacement fields.
`sigma`	`float`	Standard deviation for Gaussian blur applied to the displacement fields.
`interpolation`	`int`	Interpolation method to be used (e.g., cv2.INTER_LINEAR).
`border_mode`	`int`	Pixel extrapolation method (e.g., cv2.BORDER_CONSTANT).
`value`	`ColorType \| None`	Border value if border_mode is cv2.BORDER_CONSTANT.
`random_state`	`np.random.RandomState \| None`	Random state for reproducibility.
`same_dxdy`	`bool`	If True, use the same displacement field for both x and y directions.

Returns:

Type	Description
`np.ndarray`	Transformed image with precise elastic deformation applied.

Source code in albumentations/augmentations/geometric/functional.py

Python

def elastic_transform_precise(
    img: np.ndarray,
    alpha: float,
    sigma: float,
    interpolation: int,
    border_mode: int,
    value: ColorType | None,
    random_state: np.random.RandomState | None,
    same_dxdy: bool = False,
) -> np.ndarray:
    """Apply a precise elastic transformation to an image.

    This function applies an elastic deformation to the input image using a precise method.
    The transformation involves creating random displacement fields, smoothing them using Gaussian
    blur with adaptive kernel size, and then remapping the image according to the smoothed displacement fields.

    Args:
        img (np.ndarray): Input image.
        alpha (float): Scaling factor for the random displacement fields.
        sigma (float): Standard deviation for Gaussian blur applied to the displacement fields.
        interpolation (int): Interpolation method to be used (e.g., cv2.INTER_LINEAR).
        border_mode (int): Pixel extrapolation method (e.g., cv2.BORDER_CONSTANT).
        value (ColorType | None): Border value if border_mode is cv2.BORDER_CONSTANT.
        random_state (np.random.RandomState | None): Random state for reproducibility.
        same_dxdy (bool, optional): If True, use the same displacement field for both x and y directions.

    Returns:
        np.ndarray: Transformed image with precise elastic deformation applied.
    """
    return elastic_transform_helper(
        img,
        alpha,
        sigma,
        interpolation,
        border_mode,
        value,
        random_state,
        same_dxdy,
        kernel_size=(0, 0),
    )
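
The three steps named in the description (random displacement fields, Gaussian smoothing, remapping) can be sketched with NumPy alone. This is only an illustration under stated assumptions: `elastic_transform_helper` is not shown in this reference, so the uniform noise in [-1, 1], the sigma-derived kernel radius and the nearest-neighbour remapping used here are stand-ins, and `gaussian_smooth` and `elastic_sketch` are hypothetical names.

```python
import numpy as np


def gaussian_smooth(field: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur whose kernel radius is derived from sigma."""
    radius = max(1, int(round(3 * sigma)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-(offsets**2) / (2 * sigma**2))
    kernel /= kernel.sum()
    # mode="same" keeps the shape as long as the kernel is not longer than a row or column.
    rows_done = np.apply_along_axis(np.convolve, 1, field, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows_done, kernel, mode="same")


def elastic_sketch(
    img: np.ndarray,
    alpha: float,
    sigma: float,
    random_state: np.random.RandomState,
    same_dxdy: bool = False,
) -> np.ndarray:
    height, width = img.shape[:2]
    # 1. Random displacement fields, smoothed and scaled by alpha.
    dx = gaussian_smooth(random_state.uniform(-1, 1, (height, width)), sigma) * alpha
    if same_dxdy:
        dy = dx
    else:
        dy = gaussian_smooth(random_state.uniform(-1, 1, (height, width)), sigma) * alpha
    # 2. Source coordinates for every output pixel, clipped to the image.
    grid_y, grid_x = np.mgrid[0:height, 0:width]
    map_x = np.clip(np.rint(grid_x + dx), 0, width - 1).astype(int)
    map_y = np.clip(np.rint(grid_y + dy), 0, height - 1).astype(int)
    # 3. Nearest-neighbour remap (the library takes an interpolation flag instead).
    return img[map_y, map_x]


image = (np.arange(32 * 32) % 256).astype(np.uint8).reshape(32, 32)
warped = elastic_sketch(image, alpha=5.0, sigma=2.0, random_state=np.random.RandomState(0))
unchanged = elastic_sketch(image, alpha=0.0, sigma=2.0, random_state=np.random.RandomState(0))
print(warped.shape, np.array_equal(unchanged, image))
```

With `alpha=0.0` the displacement vanishes and the input comes back unchanged, which makes the role of `alpha` as a scaling factor easy to see.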

`def find_keypoint (position, distance_map, threshold, inverted)` [view source on GitHub]¶

Determine if a valid keypoint can be found at the given position.

Source code in albumentations/augmentations/geometric/functional.py

Python

def find_keypoint(
    position: tuple[int, int],
    distance_map: np.ndarray,
    threshold: float | None,
    inverted: bool,
) -> tuple[float, float] | None:
    """Determine if a valid keypoint can be found at the given position."""
    y, x = position
    value = distance_map[y, x]
    if not inverted and threshold is not None and value >= threshold:
        return None
    if inverted and threshold is not None and value < threshold:
        return None
    return float(x), float(y)

`def from_distance_maps (distance_maps, inverted, if_not_found_coords, threshold)` [view source on GitHub]¶

Convert the output of to_distance_maps back into a list of `(x, y)` keypoint coordinates. This is the inverse of to_distance_maps.

Source code in albumentations/augmentations/geometric/functional.py

Python

def from_distance_maps(
    distance_maps: np.ndarray,
    inverted: bool,
    if_not_found_coords: Sequence[int] | dict[str, Any] | None,
    threshold: float | None,
) -> list[tuple[float, float]]:
    """Convert outputs of `to_distance_maps` to `KeypointsOnImage`.
    This is the inverse of `to_distance_maps`.
    """
    if distance_maps.ndim != NUM_MULTI_CHANNEL_DIMENSIONS:
        msg = f"Expected three-dimensional input, got {distance_maps.ndim} dimensions and shape {distance_maps.shape}."
        raise ValueError(msg)
    height, width, nb_keypoints = distance_maps.shape

    drop_if_not_found, if_not_found_x, if_not_found_y = validate_if_not_found_coords(if_not_found_coords)

    keypoints = []
    for i in range(nb_keypoints):
        hitidx_flat = np.argmax(distance_maps[..., i]) if inverted else np.argmin(distance_maps[..., i])
        hitidx_ndim = np.unravel_index(hitidx_flat, (height, width))
        keypoint = find_keypoint(hitidx_ndim, distance_maps[:, :, i], threshold, inverted)
        if keypoint:
            keypoints.append(keypoint)
        elif not drop_if_not_found:
            keypoints.append((if_not_found_x, if_not_found_y))

    return keypoints

`def keypoint_d4 (keypoint, group_member, rows, cols, ** params)` [view source on GitHub]¶

Applies a D_4 symmetry group transformation to a keypoint.

This function adjusts a keypoint's coordinates according to the specified D_4 group transformation, which includes rotations and reflections suitable for image processing tasks. These transformations account for the dimensions of the image to ensure the keypoint remains within its boundaries.

Parameters:

Name	Type	Description
`keypoint`	`KeypointInternalType`	The keypoint to transform. This should be a structure or tuple specifying coordinates like (x, y, [additional parameters]).
`group_member`	`D4Type`	A string identifier for the D_4 group transformation to apply. Valid values are 'e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't'.
`rows`	`int`	The number of rows in the image.
`cols`	`int`	The number of columns in the image.
`**params`	`Any`	Not used.

Returns:

Type	Description
`KeypointInternalType`	The transformed keypoint.

Exceptions:

Type	Description
`ValueError`	If an invalid group member is specified, indicating that the specified transformation does not exist.

Examples:

Rotating a keypoint by 90 degrees in a 100x100 image: `keypoint_d4((50, 30, 0, 1), 'r90', 100, 100)`. This moves the keypoint from (50, 30) to (30, 49), following the formulas in keypoint_rot90.

Source code in albumentations/augmentations/geometric/functional.py

Python

def keypoint_d4(
    keypoint: KeypointInternalType,
    group_member: D4Type,
    rows: int,
    cols: int,
    **params: Any,
) -> KeypointInternalType:
    """Applies a `D_4` symmetry group transformation to a keypoint.

    This function adjusts a keypoint's coordinates according to the specified `D_4` group transformation,
    which includes rotations and reflections suitable for image processing tasks. These transformations account
    for the dimensions of the image to ensure the keypoint remains within its boundaries.

    Parameters:
    - keypoint (KeypointInternalType): The keypoint to transform.
        This should be a structure or tuple specifying coordinates
        like (x, y, [additional parameters]).
    - group_member (D4Type): A string identifier for the `D_4` group transformation to apply.
        Valid values are 'e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't'.
    - rows (int): The number of rows in the image.
    - cols (int): The number of columns in the image.
    - params (Any): Not used

    Returns:
    - KeypointInternalType: The transformed keypoint.

    Raises:
    - ValueError: If an invalid group member is specified, indicating that the specified transformation does not exist.

    Examples:
    - Rotating a keypoint by 90 degrees in a 100x100 image:
      `keypoint_d4((50, 30, 0, 1), 'r90', 100, 100)`
      This moves the keypoint from (50, 30) to (30, 49), following the formulas in `keypoint_rot90`.
    """
    transformations = {
        "e": lambda x: x,  # Identity transformation
        "r90": lambda x: keypoint_rot90(x, 1, rows, cols),  # Rotate 90 degrees
        "r180": lambda x: keypoint_rot90(x, 2, rows, cols),  # Rotate 180 degrees
        "r270": lambda x: keypoint_rot90(x, 3, rows, cols),  # Rotate 270 degrees
        "v": lambda x: keypoint_vflip(x, rows, cols),  # Vertical flip
        "hvt": lambda x: keypoint_transpose(keypoint_rot90(x, 2, rows, cols), rows, cols),  # Reflect over anti diagonal
        "h": lambda x: keypoint_hflip(x, rows, cols),  # Horizontal flip
        "t": lambda x: keypoint_transpose(x, rows, cols),  # Transpose (reflect over main diagonal)
    }
    # Execute the appropriate transformation
    if group_member in transformations:
        return transformations[group_member](keypoint)

    raise ValueError(f"Invalid group member: {group_member}")
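
To make the coordinate mapping concrete, here is a position-only sketch of the eight transformations, derived from the formulas in `keypoint_rot90`, `keypoint_hflip`, `keypoint_vflip` and `keypoint_transpose` as listed in this reference. `point_d4` is a hypothetical helper; the library functions operate on full `(x, y, angle, scale)` tuples and also update the angle, which this sketch leaves out.

```python
def point_d4(x: float, y: float, group_member: str, rows: int, cols: int) -> tuple[float, float]:
    """Position-only illustration of the D4 keypoint mappings (hypothetical helper)."""
    last_col, last_row = cols - 1, rows - 1
    transformations = {
        "e": (x, y),
        "r90": (y, last_col - x),
        "r180": (last_col - x, last_row - y),
        "r270": (last_row - y, x),
        "v": (x, last_row - y),
        "hvt": (last_row - y, last_col - x),  # rotate 180 degrees, then swap x and y
        "h": (last_col - x, y),
        "t": (y, x),
    }
    if group_member not in transformations:
        raise ValueError(f"Invalid group member: {group_member}")
    return transformations[group_member]


# A keypoint at (50, 30) in a 100x100 image.
rotated = point_d4(50, 30, "r90", rows=100, cols=100)
anti_diagonal = point_d4(50, 30, "hvt", rows=100, cols=100)
print(rotated, anti_diagonal)
```

With these formulas, a 90 degree rotation sends (50, 30) to (30, 49), and the anti-diagonal reflection sends it to (69, 49).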

`def keypoint_flip (keypoint, d, rows, cols)` [view source on GitHub]¶

Flip a keypoint either vertically, horizontally or both depending on the value of d.

Parameters:

Name	Type	Description
`keypoint`	`KeypointInternalType`	A keypoint `(x, y, angle, scale)`.
`d`	`int`	Flip code. Must be -1, 0 or 1: 0 for a vertical flip, 1 for a horizontal flip, -1 for both a vertical and a horizontal flip.
`rows`	`int`	Image height.
`cols`	`int`	Image width.

Returns:

Type	Description
`KeypointInternalType`	A keypoint `(x, y, angle, scale)`.

Exceptions:

Type	Description
`ValueError`	if value of `d` is not -1, 0 or 1.

Source code in albumentations/augmentations/geometric/functional.py

Python

@angle_2pi_range
def keypoint_flip(keypoint: KeypointInternalType, d: int, rows: int, cols: int) -> KeypointInternalType:
    """Flip a keypoint either vertically, horizontally or both depending on the value of `d`.

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.
        d: Number of flip. Must be -1, 0 or 1:
            * 0 - vertical flip,
            * 1 - horizontal flip,
            * -1 - vertical and horizontal flip.
        rows: Image height.
        cols: Image width.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    Raises:
        ValueError: if value of `d` is not -1, 0 or 1.

    """
    if d == 0:
        keypoint = keypoint_vflip(keypoint, rows, cols)
    elif d == 1:
        keypoint = keypoint_hflip(keypoint, rows, cols)
    elif d == -1:
        keypoint = keypoint_hflip(keypoint, rows, cols)
        keypoint = keypoint_vflip(keypoint, rows, cols)
    else:
        raise ValueError(f"Invalid d value {d}. Valid values are -1, 0 and 1")
    return keypoint
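
A short worked example of the three flip codes may help. This is a sketch, not the library code: `hflip`, `vflip` and `flip` are hypothetical stand-ins, and the modulo operation assumes that the `angle_2pi_range` decorator (not shown here) wraps the angle into [0, 2π).

```python
import math

TWO_PI = 2 * math.pi


def hflip(keypoint, cols):
    x, y, angle, scale = keypoint
    # Mirror x; the angle is reflected about the vertical axis.
    return (cols - 1) - x, y, (math.pi - angle) % TWO_PI, scale


def vflip(keypoint, rows):
    x, y, angle, scale = keypoint
    # Mirror y; the angle changes sign.
    return x, (rows - 1) - y, (-angle) % TWO_PI, scale


def flip(keypoint, d, rows, cols):
    if d == 0:
        return vflip(keypoint, rows)
    if d == 1:
        return hflip(keypoint, cols)
    if d == -1:
        return vflip(hflip(keypoint, cols), rows)
    raise ValueError(f"Invalid d value {d}. Valid values are -1, 0 and 1")


keypoint = (10.0, 20.0, math.pi / 4, 1.0)
rows, cols = 100, 200
vertical = flip(keypoint, 0, rows, cols)
horizontal = flip(keypoint, 1, rows, cols)
both = flip(keypoint, -1, rows, cols)
print(vertical, horizontal, both)
```

Flipping along both axes (`d=-1`) has the same effect as a 180 degree rotation: the position is mirrored through the image center and the angle shifts by π.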

`def keypoint_hflip (keypoint, rows, cols)` [view source on GitHub]¶

Flip a keypoint horizontally around the y-axis.

Parameters:

Name	Type	Description
`keypoint`	`KeypointInternalType`	A keypoint `(x, y, angle, scale)`.
`rows`	`int`	Image height.
`cols`	`int`	Image width.

Returns:

Type	Description
`KeypointInternalType`	A keypoint `(x, y, angle, scale)`.

Source code in albumentations/augmentations/geometric/functional.py

Python

@angle_2pi_range
def keypoint_hflip(keypoint: KeypointInternalType, rows: int, cols: int) -> KeypointInternalType:
    """Flip a keypoint horizontally around the y-axis.

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.
        rows: Image height.
        cols: Image width.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    """
    x, y, angle, scale = keypoint[:4]
    angle = math.pi - angle
    return (cols - 1) - x, y, angle, scale

`def keypoint_rot90 (keypoint, factor, rows, cols, ** params)` [view source on GitHub]¶

Rotate a keypoint by 90 degrees counter-clockwise (CCW) a specified number of times.

Parameters:

Name	Type	Description
`keypoint`	`KeypointInternalType`	A keypoint in the format `(x, y, angle, scale)`.
`factor`	`int`	The number of 90 degree CCW rotations to apply. Must be in the range [0, 3].
`rows`	`int`	The height of the image the keypoint belongs to.
`cols`	`int`	The width of the image the keypoint belongs to.
`**params`	`Any`	Additional parameters.

Returns:

Type	Description
`KeypointInternalType`	The rotated keypoint in the format `(x, y, angle, scale)`.

Exceptions:

Type	Description
`ValueError`	If the factor is not in the set {0, 1, 2, 3}.

Source code in albumentations/augmentations/geometric/functional.py

Python

@angle_2pi_range
def keypoint_rot90(
    keypoint: KeypointInternalType,
    factor: int,
    rows: int,
    cols: int,
    **params: Any,
) -> KeypointInternalType:
    """Rotate a keypoint by 90 degrees counter-clockwise (CCW) a specified number of times.

    Args:
        keypoint (KeypointInternalType): A keypoint in the format `(x, y, angle, scale)`.
        factor (int): The number of 90 degree CCW rotations to apply. Must be in the range [0, 3].
        rows (int): The height of the image the keypoint belongs to.
        cols (int): The width of the image the keypoint belongs to.
        **params: Additional parameters.

    Returns:
        KeypointInternalType: The rotated keypoint in the format `(x, y, angle, scale)`.

    Raises:
        ValueError: If the factor is not in the set {0, 1, 2, 3}.
    """
    x, y, angle, scale = keypoint

    if factor not in {0, 1, 2, 3}:
        raise ValueError("Parameter factor must be in set {0, 1, 2, 3}")

    if factor == 1:
        x, y, angle = y, (cols - 1) - x, angle - math.pi / 2
    elif factor == ROT90_180_FACTOR:
        x, y, angle = (cols - 1) - x, (rows - 1) - y, angle - math.pi
    elif factor == ROT90_270_FACTOR:
        x, y, angle = (rows - 1) - y, x, angle + math.pi / 2

    return x, y, angle, scale
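
One way to sanity-check the coordinate formulas above is to rotate an image containing a single marked pixel and confirm that the rotated keypoint lands on it. The sketch below assumes the image is rotated with `np.rot90` (counter-clockwise); `rot90_point` is a hypothetical position-only helper.

```python
import numpy as np


def rot90_point(x: int, y: int, factor: int, rows: int, cols: int) -> tuple[int, int]:
    """Position part of the 90 degree CCW keypoint rotation (hypothetical helper)."""
    if factor == 1:
        return y, (cols - 1) - x
    if factor == 2:
        return (cols - 1) - x, (rows - 1) - y
    if factor == 3:
        return (rows - 1) - y, x
    return x, y


rows, cols = 4, 6
x, y = 1, 2
image = np.zeros((rows, cols), dtype=np.uint8)
image[y, x] = 255  # mark the keypoint location

matches = []
for factor in range(4):
    rotated = np.rot90(image, factor)
    new_x, new_y = rot90_point(x, y, factor, rows, cols)
    matches.append(bool(rotated[new_y, new_x] == 255))
print(matches)
```

A non-square image is used on purpose, so that mixing up `rows` and `cols` would show up as a mismatch or an index error.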

`def keypoint_rotate (keypoint, angle, rows, cols, ** params)` [view source on GitHub]¶

Rotate a keypoint by a specified angle.

Parameters:

Name	Type	Description
`keypoint`	`KeypointInternalType`	A keypoint in the format `(x, y, angle, scale)`.
`angle`	`float`	The angle by which to rotate the keypoint, in degrees.
`rows`	`int`	The height of the image the keypoint belongs to.
`cols`	`int`	The width of the image the keypoint belongs to.
`**params`	`Any`	Additional parameters.

Returns:

Type	Description
`KeypointInternalType`	The rotated keypoint in the format `(x, y, angle, scale)`.

Note

The rotation is performed around the center of the image.

Source code in albumentations/augmentations/geometric/functional.py

Python

@angle_2pi_range
def keypoint_rotate(
    keypoint: KeypointInternalType,
    angle: float,
    rows: int,
    cols: int,
    **params: Any,
) -> KeypointInternalType:
    """Rotate a keypoint by a specified angle.

    Args:
        keypoint (KeypointInternalType): A keypoint in the format `(x, y, angle, scale)`.
        angle (float): The angle by which to rotate the keypoint, in degrees.
        rows (int): The height of the image the keypoint belongs to.
        cols (int): The width of the image the keypoint belongs to.
        **params: Additional parameters.

    Returns:
        KeypointInternalType: The rotated keypoint in the format `(x, y, angle, scale)`.

    Note:
        The rotation is performed around the center of the image.
    """
    image_center = center(cols, rows)
    matrix = cv2.getRotationMatrix2D(image_center, angle, 1.0)
    x, y, a, s = keypoint[:4]
    x, y = cv2.transform(np.array([[[x, y]]]), matrix).squeeze()
    return x, y, a + math.radians(angle), s
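
The rotation can also be written out without OpenCV. The sketch below applies the 2x3 matrix that `cv2.getRotationMatrix2D` documents for a scale of 1.0. Two things are assumptions here: `rotate_point` is a hypothetical name, and the image center is taken as `(cols / 2 - 0.5, rows / 2 - 0.5)` because the `center` helper is not shown in this reference.

```python
import math


def rotate_point(x: float, y: float, angle_deg: float, center_x: float, center_y: float) -> tuple[float, float]:
    """Apply the affine rotation matrix for scale 1.0 to a single point."""
    theta = math.radians(angle_deg)
    alpha, beta = math.cos(theta), math.sin(theta)
    new_x = alpha * x + beta * y + (1 - alpha) * center_x - beta * center_y
    new_y = -beta * x + alpha * y + beta * center_x + (1 - alpha) * center_y
    return new_x, new_y


rows, cols = 100, 100
center_x, center_y = cols / 2 - 0.5, rows / 2 - 0.5  # assumed pixel-center convention

new_x, new_y = rotate_point(50, 30, 90, center_x, center_y)
# keypoint_rotate also adds the rotation, in radians, to the keypoint angle.
new_angle = 0.0 + math.radians(90)
print(round(new_x, 6), round(new_y, 6), new_angle)
```

Under that center assumption, a 90 degree rotation agrees with the `factor == 1` branch of `keypoint_rot90` on a square image: (50, 30) ends up at (30, 49).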

`def keypoint_scale (keypoint, scale_x, scale_y)` [view source on GitHub]¶

Scales a keypoint by scale_x and scale_y.

Parameters:

Name	Type	Description
`keypoint`	`KeypointInternalType`	A keypoint `(x, y, angle, scale)`.
`scale_x`	`float`	Scale coefficient x-axis.
`scale_y`	`float`	Scale coefficient y-axis.

Returns:

Type	Description
`KeypointInternalType`	A keypoint `(x, y, angle, scale)`.

Source code in albumentations/augmentations/geometric/functional.py

Python

def keypoint_scale(keypoint: KeypointInternalType, scale_x: float, scale_y: float) -> KeypointInternalType:
    """Scales a keypoint by scale_x and scale_y.

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.
        scale_x: Scale coefficient x-axis.
        scale_y: Scale coefficient y-axis.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    """
    x, y, angle, scale = keypoint[:4]
    return x * scale_x, y * scale_y, angle, scale * max(scale_x, scale_y)
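
A small worked example shows how the keypoint's own `scale` is handled when the two axes are scaled differently. `scale_keypoint` below is a hypothetical re-statement of the formula in the listing, used only for illustration.

```python
def scale_keypoint(keypoint, scale_x, scale_y):
    x, y, angle, scale = keypoint
    # Position scales per axis; the keypoint scale follows the larger coefficient.
    return x * scale_x, y * scale_y, angle, scale * max(scale_x, scale_y)


result = scale_keypoint((40.0, 80.0, 0.0, 2.0), scale_x=0.5, scale_y=0.25)
print(result)  # (20.0, 20.0, 0.0, 1.0)
```

The angle is left untouched, and with anisotropic scaling the keypoint scale is multiplied by `max(scale_x, scale_y)`, here 0.5.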

`def keypoint_transpose (keypoint, rows, cols)` [view source on GitHub]¶

Transposes a keypoint over the main diagonal by swapping its x and y coordinates and adjusting its angle accordingly.

Parameters:

Name	Type	Description
`keypoint`	`KeypointInternalType`	A keypoint `(x, y, angle, scale)`.
`rows`	`int`	Total number of rows (height) in the image.
`cols`	`int`	Total number of columns (width) in the image.

Returns:

Type	Description
`KeypointInternalType`	A transformed keypoint `(x, y, angle, scale)`.

Source code in albumentations/augmentations/geometric/functional.py

Python

@angle_2pi_range
def keypoint_transpose(keypoint: KeypointInternalType, rows: int, cols: int) -> KeypointInternalType:
    """Transposes a keypoint along a specified axis: main diagonal

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.
        rows: Total number of rows (height) in the image.
        cols: Total number of columns (width) in the image.

    Returns:
        A transformed keypoint `(x, y, angle, scale)`.

    Raises:
        ValueError: If axis is not 0 or 1.

    """
    x, y, angle, scale = keypoint[:4]

    # Transpose over the main diagonal: swap x and y.
    new_x, new_y = y, x
    # Adjust angle to reflect the coordinate swap.
    angle = np.pi / 2 - angle if angle <= np.pi else 3 * np.pi / 2 - angle

    return new_x, new_y, angle, scale

`def keypoint_vflip (keypoint, rows, cols)` [view source on GitHub]¶

Flip a keypoint vertically around the x-axis.

Parameters:

Name	Type	Description
`keypoint`	`KeypointInternalType`	A keypoint `(x, y, angle, scale)`.
`rows`	`int`	Image height.
`cols`	`int`	Image width.

Returns:

Type	Description
`tuple`	A keypoint `(x, y, angle, scale)`.

Source code in albumentations/augmentations/geometric/functional.py

Python

@angle_2pi_range
def keypoint_vflip(keypoint: KeypointInternalType, rows: int, cols: int) -> KeypointInternalType:
    """Flip a keypoint vertically around the x-axis.

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.
        rows: Image height.
        cols: Image width.

    Returns:
        tuple: A keypoint `(x, y, angle, scale)`.

    """
    x, y, angle, scale = keypoint[:4]
    angle = -angle
    return x, (rows - 1) - y, angle, scale

`def optical_distortion (img, k, dx, dy, interpolation, border_mode, value=None)` [view source on GitHub]¶

Apply barrel or pincushion distortion to an image. This is an unconventional augmentation.

Reference

https://stackoverflow.com/questions/6199636/formulas-for-barrel-pincushion-distortion
https://stackoverflow.com/questions/10364201/image-transformation-in-opencv
https://stackoverflow.com/questions/2477774/correcting-fisheye-distortion-programmatically
http://www.coldvision.io/2017/03/02/advanced-lane-finding-using-opencv/

Source code in albumentations/augmentations/geometric/functional.py

Python

@preserve_channel_dim
def optical_distortion(
    img: np.ndarray,
    k: int,
    dx: int,
    dy: int,
    interpolation: int,
    border_mode: int,
    value: ColorType | None = None,
) -> np.ndarray:
    """Barrel / pincushion distortion. Unconventional augment.

    Reference:
        |  https://stackoverflow.com/questions/6199636/formulas-for-barrel-pincushion-distortion
        |  https://stackoverflow.com/questions/10364201/image-transformation-in-opencv
        |  https://stackoverflow.com/questions/2477774/correcting-fisheye-distortion-programmatically
        |  http://www.coldvision.io/2017/03/02/advanced-lane-finding-using-opencv/
    """
    height, width = img.shape[:2]

    fx = width
    fy = height

    cx = width * 0.5 + dx
    cy = height * 0.5 + dy

    camera_matrix = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float32)

    distortion = np.array([k, k, 0, 0, 0], dtype=np.float32)
    map1, map2 = cv2.initUndistortRectifyMap(camera_matrix, distortion, None, None, (width, height), cv2.CV_32FC1)
    return cv2.remap(img, map1, map2, interpolation=interpolation, borderMode=border_mode, borderValue=value)

`def rotation2d_matrix_to_euler_angles (matrix, y_up)` [view source on GitHub]¶

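Compute the rotation angle, in radians, encoded in a 2D rotation matrix.

Parameters:

Name	Type	Description
`matrix`	`np.ndarray`	Rotation matrix.
`y_up`	`bool`	Whether the Y axis points up (`True`) or down (`False`).

Returns:

Type	Description
`float`	The rotation angle in radians.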

Source code in albumentations/augmentations/geometric/functional.py

Python

def rotation2d_matrix_to_euler_angles(matrix: np.ndarray, y_up: bool) -> float:
    """Args:
    matrix (np.ndarray): Rotation matrix
    y_up (bool): is Y axis looks up or down

    """
    if y_up:
        return np.arctan2(matrix[1, 0], matrix[0, 0])
    return np.arctan2(-matrix[1, 0], matrix[0, 0])
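
The effect of `y_up` is easiest to see by building a rotation matrix for a known angle and recovering it. This sketch uses plain nested lists and `math.atan2` instead of NumPy; `rotation_angle` is a hypothetical stand-in that mirrors the two branches above.

```python
import math


def rotation_angle(matrix, y_up):
    # The sign of the sine term flips when the Y axis points down.
    sin_term = matrix[1][0] if y_up else -matrix[1][0]
    return math.atan2(sin_term, matrix[0][0])


theta = math.radians(30)
matrix = [
    [math.cos(theta), -math.sin(theta)],
    [math.sin(theta), math.cos(theta)],
]

angle_y_up = rotation_angle(matrix, y_up=True)
angle_y_down = rotation_angle(matrix, y_up=False)
print(math.degrees(angle_y_up), math.degrees(angle_y_down))
```

The same matrix reads as +30 degrees when Y points up and as -30 degrees when Y points down, as in image coordinates.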

`def to_distance_maps (keypoints, height, width, inverted=False)` [view source on GitHub]¶

Generate a (H,W,N) array of distance maps for N keypoints.

The n-th distance map contains at every location (y, x) the euclidean distance to the n-th keypoint.

This function can be used as a helper when augmenting keypoints with a method that only supports the augmentation of images.

Parameters:

Name	Type	Description
`keypoints`	`Sequence[tuple[float, float]]`	keypoint coordinates
`height`	`int`	image height
`width`	`int`	image width
`inverted`	`bool`	If `True`, inverted distance maps are returned where each distance value d is replaced by `1/(d+1)`, i.e. the distance maps have values in the range `(0.0, 1.0]` with `1.0` denoting exactly the position of the respective keypoint.

Returns:

Type	Description
`np.ndarray`	(H, W, N) ndarray. A `float32` array containing `N` distance maps for `N` keypoints. Each location `(y, x, n)` in the array denotes the euclidean distance at `(y, x)` to the `n`-th keypoint. If `inverted` is `True`, the distance `d` is replaced by `1/(d+1)`. The height and width of the array match the `height` and `width` arguments.

Source code in albumentations/augmentations/geometric/functional.py

Python

def to_distance_maps(
    keypoints: Sequence[tuple[float, float]],
    height: int,
    width: int,
    inverted: bool = False,
) -> np.ndarray:
    """Generate a ``(H,W,N)`` array of distance maps for ``N`` keypoints.

    The ``n``-th distance map contains at every location ``(y, x)`` the
    euclidean distance to the ``n``-th keypoint.

    This function can be used as a helper when augmenting keypoints with a
    method that only supports the augmentation of images.

    Args:
        keypoints: keypoint coordinates
        height: image height
        width: image width
        inverted (bool): If ``True``, inverted distance maps are returned where each
            distance value d is replaced by ``1/(d+1)``, i.e. the distance
            maps have values in the range ``(0.0, 1.0]`` with ``1.0`` denoting
            exactly the position of the respective keypoint.

    Returns:
        (H, W, N) ndarray
            A ``float32`` array containing ``N`` distance maps for ``N``
            keypoints. Each location ``(y, x, n)`` in the array denotes the
            euclidean distance at ``(y, x)`` to the ``n``-th keypoint.
            If `inverted` is ``True``, the distance ``d`` is replaced
            by ``1/(d+1)``. The height and width of the array match the
            ``height`` and ``width`` arguments.

    """
    distance_maps = np.zeros((height, width, len(keypoints)), dtype=np.float32)

    yy = np.arange(0, height)
    xx = np.arange(0, width)
    grid_xx, grid_yy = np.meshgrid(xx, yy)

    for i, (x, y) in enumerate(keypoints):
        distance_maps[:, :, i] = (grid_xx - x) ** 2 + (grid_yy - y) ** 2

    distance_maps = np.sqrt(distance_maps)
    if inverted:
        return 1 / (distance_maps + 1)
    return distance_maps
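
The round trip between keypoints and distance maps can be sketched as follows. This is a simplified illustration with hypothetical names (`distance_maps`, `recover_keypoints`); it omits the `threshold` and `if_not_found_coords` handling that `from_distance_maps` performs.

```python
import numpy as np


def distance_maps(keypoints, height, width, inverted=False):
    grid_y, grid_x = np.mgrid[0:height, 0:width]
    maps = np.stack(
        [np.sqrt((grid_x - x) ** 2 + (grid_y - y) ** 2) for x, y in keypoints],
        axis=-1,
    ).astype(np.float32)
    # Inverted maps peak at 1.0 exactly on the keypoint.
    return 1 / (maps + 1) if inverted else maps


def recover_keypoints(maps, inverted=False):
    height, width, count = maps.shape
    points = []
    for i in range(count):
        # The keypoint sits at the minimum distance (or the maximum, if inverted).
        flat = np.argmax(maps[..., i]) if inverted else np.argmin(maps[..., i])
        y, x = np.unravel_index(flat, (height, width))
        points.append((float(x), float(y)))
    return points


keypoints = [(3, 1), (0, 4)]
plain = distance_maps(keypoints, height=5, width=6)
inverted_maps = distance_maps(keypoints, height=5, width=6, inverted=True)
recovered = recover_keypoints(plain)
recovered_inverted = recover_keypoints(inverted_maps, inverted=True)
print(plain.shape, recovered, recovered_inverted)
```

Because the maps are sampled on the integer pixel grid, only keypoints with integer coordinates are recovered exactly; others snap to the nearest pixel.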

`def transpose (img)` [view source on GitHub]¶

Transposes the first two dimensions of an array of any dimensionality. Retains the order of any additional dimensions.

Parameters:

Name	Type	Description
`img`	`np.ndarray`	Input array.

Returns:

Type	Description
`np.ndarray`	Transposed array.

Source code in albumentations/augmentations/geometric/functional.py

Python

def transpose(img: np.ndarray) -> np.ndarray:
    """Transposes the first two dimensions of an array of any dimensionality.
    Retains the order of any additional dimensions.

    Args:
        img (np.ndarray): Input array.

    Returns:
        np.ndarray: Transposed array.
    """
    # Generate the new axes order
    new_axes = list(range(img.ndim))
    new_axes[0], new_axes[1] = 1, 0  # Swap the first two dimensions

    # Transpose the array using the new axes order
    return img.transpose(new_axes)

`def validate_if_not_found_coords (if_not_found_coords)` [view source on GitHub]¶

Validate and process if_not_found_coords parameter.

Source code in albumentations/augmentations/geometric/functional.py

Python

def validate_if_not_found_coords(
    if_not_found_coords: Sequence[int] | dict[str, Any] | None,
) -> tuple[bool, int, int]:
    """Validate and process `if_not_found_coords` parameter."""
    if if_not_found_coords is None:
        return True, -1, -1
    if isinstance(if_not_found_coords, (tuple, list)):
        if len(if_not_found_coords) != TWO:
            msg = "Expected tuple/list 'if_not_found_coords' to contain exactly two entries."
            raise ValueError(msg)
        return False, if_not_found_coords[0], if_not_found_coords[1]
    if isinstance(if_not_found_coords, dict):
        return False, if_not_found_coords["x"], if_not_found_coords["y"]

    msg = "Expected if_not_found_coords to be None, tuple, list, or dict."
    raise ValueError(msg)

`resize` ¶

`class LongestMaxSize` `(max_size=1024, interpolation=1, always_apply=None, p=1)` [view source on GitHub] ¶

Rescale an image so that maximum side is equal to max_size, keeping the aspect ratio of the initial image.

Parameters:

Name	Type	Description
`max_size`	`int, list of int`	maximum size of the image after the transformation. When using a list, max size will be randomly selected from the values in the list.
`interpolation`	`OpenCV flag`	interpolation method. Default: cv2.INTER_LINEAR.
`p`	`float`	probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/geometric/resize.py

Python

class LongestMaxSize(DualTransform):
    """Rescale an image so that maximum side is equal to max_size, keeping the aspect ratio of the initial image.

    Args:
        max_size (int, list of int): maximum size of the image after the transformation. When using a list, max size
            will be randomly selected from the values in the list.
        interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(MaxSizeInitSchema):
        pass

    def __init__(
        self,
        max_size: int | Sequence[int] = 1024,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1,
    ):
        super().__init__(p, always_apply)
        self.interpolation = interpolation
        self.max_size = max_size

    def apply(
        self,
        img: np.ndarray,
        max_size: int,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.longest_max_size(img, max_size=max_size, interpolation=interpolation)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        # Bounding box coordinates are scale invariant
        return bbox

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        max_size: int,
        **params: Any,
    ) -> KeypointInternalType:
        height = params["rows"]
        width = params["cols"]

        scale = max_size / max([height, width])
        return fgeometric.keypoint_scale(keypoint, scale, scale)

    def get_params(self) -> dict[str, int]:
        return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("max_size", "interpolation")
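
The parameter handling in `get_params` and `apply_to_keypoint` can be followed with a small worked example. The helpers `pick_max_size` and `longest_side_scale` are hypothetical names that restate the logic above; the actual image resizing in `fgeometric.longest_max_size` is not reproduced here.

```python
import random


def pick_max_size(max_size, rng):
    # An int is used as is; a sequence means one value is drawn at random.
    return max_size if isinstance(max_size, int) else rng.choice(max_size)


def longest_side_scale(height, width, max_size):
    # The longest side is mapped to max_size; the same factor applies to both axes.
    return max_size / max(height, width)


rng = random.Random(0)
chosen = pick_max_size([256, 512], rng)

scale = longest_side_scale(height=200, width=400, max_size=100)
x, y, angle, keypoint_scale = (40.0, 80.0, 0.0, 2.0)
scaled_keypoint = (x * scale, y * scale, angle, keypoint_scale * scale)
print(chosen, scale, scaled_keypoint)
```

For a 200x400 image and `max_size=100` the factor is 0.25, so the keypoint at (40, 80) moves to (10, 20). Bounding boxes are returned unchanged by `apply_to_bbox`, as the listing notes.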

`apply (self, img, max_size, interpolation, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/resize.py

Python

def apply(
    self,
    img: np.ndarray,
    max_size: int,
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.longest_max_size(img, max_size=max_size, interpolation=interpolation)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/resize.py

Python

def get_params(self) -> dict[str, int]:
    return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/resize.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("max_size", "interpolation")

`class RandomScale` `(scale_limit=0.1, interpolation=1, always_apply=None, p=0.5)` [view source on GitHub] ¶

Randomly resize the input. Output image size is different from the input image size.

Parameters:

Name	Type	Description
`scale_limit`	`(float, float) or float`	scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). If scale_limit is a tuple (low, high), that range is used as given. In both cases the range is biased by 1, so the scaling factor is sampled from (1 + low, 1 + high). Default: (-0.1, 0.1).
`interpolation`	`OpenCV flag`	flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
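
The `scale_limit` rule described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the library code: `resolve_scale_range` and `scaled_shape` are hypothetical helpers, and truncating the new size with `int()` is an assumption about how the output shape is derived.

```python
import random


def resolve_scale_range(scale_limit):
    """Turn scale_limit into the (min, max) scaling factor range, biased by 1."""
    if isinstance(scale_limit, (int, float)):
        low, high = -scale_limit, scale_limit
    else:
        low, high = scale_limit
    return 1 + low, 1 + high


def scaled_shape(height, width, scale):
    """Output (height, width) for a given factor; int() truncation is assumed."""
    return int(height * scale), int(width * scale)


factor_range = resolve_scale_range((-0.5, 0.25))
print(factor_range)                   # (0.5, 1.25)

# One random draw, as get_params would make for each call.
scale = random.uniform(*factor_range)
print(scaled_shape(100, 200, scale))

print(scaled_shape(100, 200, 1.25))   # (125, 250)
```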

Source code in albumentations/augmentations/geometric/resize.py

Python

class RandomScale(DualTransform):
    """Randomly resize the input. Output image size is different from the input image size.

    Args:
        scale_limit ((float, float) or float): scaling factor range. If scale_limit is a single float value, the
            range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1.
            If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high).
            Default: (-0.1, 0.1).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        scale_limit: ScaleFloatType = Field(
            default=0.1,
            description="Scaling factor range. If a single float value => (1-scale_limit, 1 + scale_limit).",
        )
        interpolation: InterpolationType = cv2.INTER_LINEAR

        @field_validator("scale_limit")
        @classmethod
        def check_scale_limit(cls, v: ScaleFloatType) -> tuple[float, float]:
            return to_tuple(v, bias=1.0)

    def __init__(
        self,
        scale_limit: ScaleFloatType = 0.1,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.scale_limit = cast(Tuple[float, float], scale_limit)
        self.interpolation = interpolation

    def get_params(self) -> dict[str, float]:
        return {"scale": random.uniform(self.scale_limit[0], self.scale_limit[1])}

    def apply(
        self,
        img: np.ndarray,
        scale: float,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.scale(img, scale, interpolation)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        # Bounding box coordinates are scale invariant
        return bbox

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        scale: float,
        **params: Any,
    ) -> KeypointInternalType:
        return fgeometric.keypoint_scale(keypoint, scale, scale)

    def get_transform_init_args(self) -> dict[str, Any]:
        return {"interpolation": self.interpolation, "scale_limit": to_tuple(self.scale_limit, bias=-1.0)}

`apply (self, img, scale, interpolation, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/resize.py

Python

def apply(
    self,
    img: np.ndarray,
    scale: float,
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.scale(img, scale, interpolation)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/resize.py

Python

def get_params(self) -> dict[str, float]:
    return {"scale": random.uniform(self.scale_limit[0], self.scale_limit[1])}

`class Resize` `(height, width, interpolation=1, always_apply=None, p=1)` [view source on GitHub] ¶

Resize the input to the given height and width.

Parameters:

Name	Type	Description
`height`	`int`	desired height of the output.
`width`	`int`	desired width of the output.
`interpolation`	`OpenCV flag`	flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
`p`	`float`	probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
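
A small sketch of how the other targets follow a resize, in plain Python. It assumes that the internal bounding-box format is normalized to the image size, which is one reading of the "scale invariant" comment in `apply_to_bbox`; the helper names are hypothetical.

```python
def resize_scales(rows, cols, height, width):
    """Per-axis factors (scale_x, scale_y) from a rows x cols image to height x width."""
    return width / cols, height / rows


def normalize_bbox(bbox, rows, cols):
    """Pixel (x_min, y_min, x_max, y_max) -> fractions of the image size."""
    x_min, y_min, x_max, y_max = bbox
    return x_min / cols, y_min / rows, x_max / cols, y_max / rows


def denormalize_bbox(bbox, rows, cols):
    """Fractions of the image size -> pixel coordinates."""
    x_min, y_min, x_max, y_max = bbox
    return x_min * cols, y_min * rows, x_max * cols, y_max * rows


# Resize a 256x512 (rows x cols) image to height=128, width=128.
scale_x, scale_y = resize_scales(256, 512, 128, 128)
print(scale_x, scale_y)                    # 0.25 0.5

# A keypoint is scaled per axis.
print(64 * scale_x, 32 * scale_y)          # 16.0 16.0

# A normalized box is untouched by the resize; only denormalization changes.
box = normalize_bbox((64, 32, 128, 96), 256, 512)
print(box)                                 # (0.125, 0.125, 0.25, 0.375)
print(denormalize_bbox(box, 128, 128))     # (16.0, 16.0, 32.0, 48.0)
```

Note that the two factors differ whenever the target aspect ratio differs from the input's, so keypoints are stretched along with the image.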

Source code in albumentations/augmentations/geometric/resize.py

Python

class Resize(DualTransform):
    """Resize the input to the given height and width.

    Args:
        height (int): desired height of the output.
        width (int): desired width of the output.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)

    class InitSchema(BaseTransformInitSchema):
        height: int = Field(ge=1, description="Desired height of the output.")
        width: int = Field(ge=1, description="Desired width of the output.")
        interpolation: InterpolationType = cv2.INTER_LINEAR
        p: ProbabilityType = 1

    def __init__(
        self,
        height: int,
        width: int,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1,
    ):
        super().__init__(p, always_apply)
        self.height = height
        self.width = width
        self.interpolation = interpolation

    def apply(self, img: np.ndarray, interpolation: int, **params: Any) -> np.ndarray:
        return fgeometric.resize(img, height=self.height, width=self.width, interpolation=interpolation)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        # Bounding box coordinates are scale invariant
        return bbox

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        height = params["rows"]
        width = params["cols"]
        scale_x = self.width / width
        scale_y = self.height / height
        return fgeometric.keypoint_scale(keypoint, scale_x, scale_y)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("height", "width", "interpolation")

`apply (self, img, interpolation, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/resize.py

Python

def apply(self, img: np.ndarray, interpolation: int, **params: Any) -> np.ndarray:
    return fgeometric.resize(img, height=self.height, width=self.width, interpolation=interpolation)

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/resize.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("height", "width", "interpolation")

`class SmallestMaxSize` `(max_size=1024, interpolation=1, always_apply=None, p=1)` [view source on GitHub] ¶

Rescale an image so that minimum side is equal to max_size, keeping the aspect ratio of the initial image.

Parameters:

Name	Type	Description
`max_size`	`int, list of int`	maximum size of smallest side of the image after the transformation. When using a list, max size will be randomly selected from the values in the list.
`interpolation`	`OpenCV flag`	interpolation method. Default: cv2.INTER_LINEAR.
`p`	`float`	probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
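
To make the `max_size` semantics concrete, here is a plain-Python sketch of the size computation, including the case where `max_size` is a list. The helper names are hypothetical, and rounding the scaled sides with `round()` is an assumption; the library may round differently.

```python
import random


def pick_max_size(max_size):
    """Use an int as-is; otherwise draw one value from the sequence."""
    return max_size if isinstance(max_size, int) else random.choice(max_size)


def smallest_max_size_shape(height, width, max_size):
    """Output (height, width) when the shorter side is scaled to max_size."""
    scale = max_size / min(height, width)
    return round(height * scale), round(width * scale)


print(smallest_max_size_shape(600, 800, 300))   # (300, 400)

# With a list, each call may produce a different target size.
chosen = pick_max_size([256, 512])
print(chosen, smallest_max_size_shape(600, 800, chosen))
```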

Source code in albumentations/augmentations/geometric/resize.py

Python

class SmallestMaxSize(DualTransform):
    """Rescale an image so that minimum side is equal to max_size, keeping the aspect ratio of the initial image.

    Args:
        max_size (int, list of int): maximum size of smallest side of the image after the transformation. When using a
            list, max size will be randomly selected from the values in the list.
        interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)

    class InitSchema(MaxSizeInitSchema):
        pass

    def __init__(
        self,
        max_size: int | Sequence[int] = 1024,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1,
    ):
        super().__init__(p, always_apply)
        self.interpolation = interpolation
        self.max_size = max_size

    def apply(
        self,
        img: np.ndarray,
        max_size: int,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.smallest_max_size(img, max_size=max_size, interpolation=interpolation)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return bbox

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        max_size: int,
        **params: Any,
    ) -> KeypointInternalType:
        height = params["rows"]
        width = params["cols"]

        scale = max_size / min([height, width])
        return fgeometric.keypoint_scale(keypoint, scale, scale)

    def get_params(self) -> dict[str, int]:
        return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("max_size", "interpolation")

`apply (self, img, max_size, interpolation, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/resize.py

Python

def apply(
    self,
    img: np.ndarray,
    max_size: int,
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.smallest_max_size(img, max_size=max_size, interpolation=interpolation)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/resize.py

Python

def get_params(self) -> dict[str, int]:
    return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/resize.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("max_size", "interpolation")

`rotate` ¶

`class RandomRotate90` [view source on GitHub] ¶

Randomly rotate the input by 90 degrees zero or more times.

Parameters:

Name	Type	Description
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
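
"Zero or more times" means a factor is drawn from 0 to 3 and the input is turned by that many quarter turns. The sketch below shows the idea on a nested list in plain Python; it assumes counter-clockwise turns (the convention of `numpy.rot90`), and the `rot90` helper here is illustrative, not the library function.

```python
import random


def rot90(grid, factor):
    """Rotate a list-of-rows grid counter-clockwise by factor quarter turns."""
    result = [list(row) for row in grid]
    for _ in range(factor % 4):
        # Transpose, then reverse the row order: one counter-clockwise quarter turn.
        result = [list(row) for row in zip(*result)][::-1]
    return result


image = [[1, 2],
         [3, 4]]
print(rot90(image, 1))   # [[2, 4], [1, 3]]
print(rot90(image, 2))   # [[4, 3], [2, 1]]

# Random int in the range [0, 3], both ends inclusive.
factor = random.randint(0, 3)
print(factor, rot90(image, factor))
```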

Source code in albumentations/augmentations/geometric/rotate.py

Python

class RandomRotate90(DualTransform):
    """Randomly rotate the input by 90 degrees zero or more times.

    Args:
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(self, img: np.ndarray, factor: float, **params: Any) -> np.ndarray:
        return fgeometric.rot90(img, factor)

    def get_params(self) -> dict[str, int]:
        # Random int in the range [0, 3]
        return {"factor": random.randint(0, 3)}

    def apply_to_bbox(self, bbox: BoxInternalType, factor: int, **params: Any) -> BoxInternalType:
        return fgeometric.bbox_rot90(bbox, factor, params["shape"][0], params["shape"][1])

    def apply_to_keypoint(self, keypoint: KeypointInternalType, factor: int, **params: Any) -> KeypointInternalType:
        return fgeometric.keypoint_rot90(keypoint, factor, params["shape"][0], params["shape"][1])

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()

`apply (self, img, factor, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/rotate.py

Python

def apply(self, img: np.ndarray, factor: float, **params: Any) -> np.ndarray:
    return fgeometric.rot90(img, factor)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/rotate.py

Python

def get_params(self) -> dict[str, int]:
    # Random int in the range [0, 3]
    return {"factor": random.randint(0, 3)}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/rotate.py

Python

def get_transform_init_args_names(self) -> tuple[()]:
    return ()

`class Rotate` `(limit=(-90, 90), interpolation=1, border_mode=4, value=None, mask_value=None, rotate_method='largest_box', crop_border=False, always_apply=None, p=0.5)` [view source on GitHub] ¶

Rotate the input by an angle selected randomly from the uniform distribution.

Parameters:

Name	Type	Description
`limit`	`ScaleFloatType`	range from which a random angle is picked. If limit is a single int an angle is picked from (-limit, limit). Default: (-90, 90)
`interpolation`	`OpenCV flag`	flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
`border_mode`	`OpenCV flag`	flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101
`value`	`int, float, list of ints, list of float`	padding value if border_mode is cv2.BORDER_CONSTANT.
`mask_value`	`int, float, list of ints, list of float`	padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
`rotate_method`	`str`	rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse". Default: "largest_box"
`crop_border`	`bool`	If True, crops the largest possible axis-aligned rectangle that fits inside the rotated image. Default: False.
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
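
The effect of `crop_border=True` can be explored without OpenCV. The sketch below restates the largest-inscribed-rectangle computation as a standalone function in plain Python; the function name `largest_inner_rect` and the tolerance value `SMALL_NUMBER = 1e-10` are assumptions made for this example.

```python
import math

SMALL_NUMBER = 1e-10  # assumed tolerance; the library defines its own constant


def largest_inner_rect(height, width, angle_deg):
    """Crop window of the largest axis-aligned rectangle inside a rotated image."""
    angle = math.radians(angle_deg)
    width_is_longer = width >= height
    side_long, side_short = (width, height) if width_is_longer else (height, width)

    # Only the absolute values of sin and cos matter (first-quadrant symmetry).
    sin_a, cos_a = abs(math.sin(angle)), abs(math.cos(angle))
    if side_short <= 2.0 * sin_a * cos_a * side_long or abs(sin_a - cos_a) < SMALL_NUMBER:
        # Half-constrained case: two crop corners touch the longer side.
        x = 0.5 * side_short
        wr, hr = (x / sin_a, x / cos_a) if width_is_longer else (x / cos_a, x / sin_a)
    else:
        # Fully constrained case: the crop touches all four sides.
        cos_2a = cos_a * cos_a - sin_a * sin_a
        wr = (width * cos_a - height * sin_a) / cos_2a
        hr = (height * cos_a - width * sin_a) / cos_2a

    return {
        "x_min": max(0, int(width / 2 - wr / 2)),
        "x_max": min(width, int(width / 2 + wr / 2)),
        "y_min": max(0, int(height / 2 - hr / 2)),
        "y_max": min(height, int(height / 2 + hr / 2)),
    }


print(largest_inner_rect(100, 200, 0))
# {'x_min': 0, 'x_max': 200, 'y_min': 0, 'y_max': 100}
print(largest_inner_rect(100, 100, 45))
# {'x_min': 14, 'x_max': 85, 'y_min': 14, 'y_max': 85}
```

With no rotation the whole image is kept, while a square rotated by 45 degrees keeps a centered square of roughly 100 / sqrt(2) pixels per side.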

Source code in albumentations/augmentations/geometric/rotate.py

Python

class Rotate(DualTransform):
    """Rotate the input by an angle selected randomly from the uniform distribution.

    Args:
        limit: range from which a random angle is picked. If limit is a single int
            an angle is picked from (-limit, limit). Default: (-90, 90)
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
        rotate_method (str): rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse".
            Default: "largest_box"
        crop_border (bool): If True, crops the largest possible axis-aligned rectangle within the rotated image.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(RotateInitSchema):
        rotate_method: Literal["largest_box", "ellipse"] = "largest_box"
        crop_border: bool = Field(
            default=False,
            description="If True, makes a largest possible crop within the rotated image.",
        )

    def __init__(
        self,
        limit: ScaleFloatType = (-90, 90),
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ColorType | None = None,
        mask_value: ColorType | None = None,
        rotate_method: Literal["largest_box", "ellipse"] = "largest_box",
        crop_border: bool = False,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.limit = cast(Tuple[float, float], limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value
        self.rotate_method = rotate_method
        self.crop_border = crop_border

    def apply(
        self,
        img: np.ndarray,
        angle: float,
        interpolation: int,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> np.ndarray:
        img_out = fgeometric.rotate(img, angle, interpolation, self.border_mode, self.value)
        if self.crop_border:
            return fcrops.crop(img_out, x_min, y_min, x_max, y_max)
        return img_out

    def apply_to_mask(
        self,
        mask: np.ndarray,
        angle: float,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> np.ndarray:
        img_out = fgeometric.rotate(mask, angle, cv2.INTER_NEAREST, self.border_mode, self.mask_value)
        if self.crop_border:
            return fcrops.crop(img_out, x_min, y_min, x_max, y_max)
        return img_out

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        angle: float,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        cols: int,
        rows: int,
        **params: Any,
    ) -> np.ndarray:
        bbox_out = fgeometric.bbox_rotate(bbox, angle, self.rotate_method, rows, cols)
        if self.crop_border:
            return fcrops.crop_bbox_by_coords(bbox_out, (x_min, y_min, x_max, y_max), rows, cols)
        return bbox_out

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        angle: float,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        cols: int,
        rows: int,
        **params: Any,
    ) -> KeypointInternalType:
        keypoint_out = fgeometric.keypoint_rotate(keypoint, angle, rows, cols, **params)
        if self.crop_border:
            return fcrops.crop_keypoint_by_coords(keypoint_out, (x_min, y_min, x_max, y_max))
        return keypoint_out

    @staticmethod
    def _rotated_rect_with_max_area(height: int, width: int, angle: float) -> dict[str, int]:
        """Given a rectangle of size wxh that has been rotated by 'angle' (in
        degrees), computes the width and height of the largest possible
        axis-aligned rectangle (maximal area) within the rotated rectangle.

        Reference:
            https://stackoverflow.com/questions/16702966/rotate-image-and-crop-out-black-borders
        """
        angle = math.radians(angle)
        width_is_longer = width >= height
        side_long, side_short = (width, height) if width_is_longer else (height, width)

        # since the solutions for angle, -angle and 180-angle are all the same,
        # it is sufficient to look at the first quadrant and the absolute values of sin,cos:
        sin_a, cos_a = abs(math.sin(angle)), abs(math.cos(angle))
        if side_short <= 2.0 * sin_a * cos_a * side_long or abs(sin_a - cos_a) < SMALL_NUMBER:
            # half constrained case: two crop corners touch the longer side,
            # the other two corners are on the mid-line parallel to the longer line
            x = 0.5 * side_short
            wr, hr = (x / sin_a, x / cos_a) if width_is_longer else (x / cos_a, x / sin_a)
        else:
            # fully constrained case: crop touches all 4 sides
            cos_2a = cos_a * cos_a - sin_a * sin_a
            wr, hr = (width * cos_a - height * sin_a) / cos_2a, (height * cos_a - width * sin_a) / cos_2a

        return {
            "x_min": max(0, int(width / 2 - wr / 2)),
            "x_max": min(width, int(width / 2 + wr / 2)),
            "y_min": max(0, int(height / 2 - hr / 2)),
            "y_max": min(height, int(height / 2 + hr / 2)),
        }

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        out_params = {"angle": random.uniform(self.limit[0], self.limit[1])}
        if self.crop_border:
            height, width = params["shape"][:2]
            out_params.update(self._rotated_rect_with_max_area(height, width, out_params["angle"]))
        else:
            out_params.update({"x_min": -1, "x_max": -1, "y_min": -1, "y_max": -1})

        return out_params

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "limit", "interpolation", "border_mode", "value", "mask_value", "rotate_method", "crop_border"

`apply (self, img, angle, interpolation, x_min, x_max, y_min, y_max, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/rotate.py

Python

def apply(
    self,
    img: np.ndarray,
    angle: float,
    interpolation: int,
    x_min: int,
    x_max: int,
    y_min: int,
    y_max: int,
    **params: Any,
) -> np.ndarray:
    img_out = fgeometric.rotate(img, angle, interpolation, self.border_mode, self.value)
    if self.crop_border:
        return fcrops.crop(img_out, x_min, y_min, x_max, y_max)
    return img_out

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/geometric/rotate.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    out_params = {"angle": random.uniform(self.limit[0], self.limit[1])}
    if self.crop_border:
        height, width = params["shape"][:2]
        out_params.update(self._rotated_rect_with_max_area(height, width, out_params["angle"]))
    else:
        out_params.update({"x_min": -1, "x_max": -1, "y_min": -1, "y_max": -1})

    return out_params

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/rotate.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "limit", "interpolation", "border_mode", "value", "mask_value", "rotate_method", "crop_border"

`class SafeRotate` `(limit=(-90, 90), interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=None, p=0.5)` [view source on GitHub] ¶

Rotate the input inside the input's frame by an angle selected randomly from the uniform distribution.

The resulting image may contain artifacts. After rotation, the frame that holds the whole image may have a different aspect ratio, and resizing it back to the original shape restores the original aspect ratio by stretching the content. For this reason, some artifacts may appear.

Parameters:

Name	Type	Description
`limit`	`(int, int) or int`	range from which a random angle is picked. If limit is a single int, an angle is picked from (-limit, limit). Default: (-90, 90)
`interpolation`	`OpenCV flag`	flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
`border_mode`	`OpenCV flag`	flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101
`value`	`int, float, list of ints, list of float`	padding value if border_mode is cv2.BORDER_CONSTANT.
`mask_value`	`int, float, list of ints, list of float`	padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
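
The source of the artifacts mentioned above is the rescale step: the rotated image needs a larger frame, which is then squeezed back to the original size with separate x and y factors. A plain-Python sketch of that arithmetic follows; `safe_rotate_bounds` is a hypothetical helper that uses `math` directly instead of reading the values from an OpenCV rotation matrix.

```python
import math


def safe_rotate_bounds(height, width, angle_deg):
    """Frame size that holds the rotated image, and the factors that shrink it back."""
    angle = math.radians(angle_deg)
    abs_cos, abs_sin = abs(math.cos(angle)), abs(math.sin(angle))

    # Bounding frame of the rotated image.
    new_w = math.ceil(height * abs_sin + width * abs_cos)
    new_h = math.ceil(height * abs_cos + width * abs_sin)

    # Factors that map the frame back onto the original width and height.
    return new_w, new_h, width / new_w, height / new_h


print(safe_rotate_bounds(100, 200, 0))    # (200, 100, 1.0, 1.0)

new_w, new_h, scale_x, scale_y = safe_rotate_bounds(100, 200, 45)
print(new_w, new_h)                       # 213 213
print(scale_x, scale_y)                   # unequal factors: content is squeezed more along y
```

In the 45-degree case the frame is square while the original image is not, so `scale_x` and `scale_y` differ and the rotated content is distorted when it is resized back.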

Source code in albumentations/augmentations/geometric/rotate.py

Python

class SafeRotate(DualTransform):
    """Rotate the input inside the input's frame by an angle selected randomly from the uniform distribution.

    The resulting image may have artifacts in it. After rotation, the image may have a different aspect ratio, and
    after resizing, it returns to its original shape with the original aspect ratio of the image. For this reason,
    some artifacts may appear.

    Args:
        limit ((int, int) or int): range from which a random angle is picked. If limit is a single int
            an angle is picked from (-limit, limit). Default: (-90, 90)
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(RotateInitSchema):
        pass

    def __init__(
        self,
        limit: ScaleFloatType = (-90, 90),
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ColorType | None = None,
        mask_value: ColorType | None = None,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.limit = cast(Tuple[float, float], limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def apply(self, img: np.ndarray, matrix: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.safe_rotate(img, matrix, self.interpolation, self.value, self.border_mode)

    def apply_to_mask(self, mask: np.ndarray, matrix: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.safe_rotate(mask, matrix, cv2.INTER_NEAREST, self.mask_value, self.border_mode)

    def apply_to_bbox(self, bbox: BoxInternalType, cols: int, rows: int, **params: Any) -> BoxInternalType:
        return fgeometric.bbox_safe_rotate(bbox, params["matrix"], cols, rows)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        angle: float,
        scale_x: float,
        scale_y: float,
        cols: int,
        rows: int,
        **params: Any,
    ) -> KeypointInternalType:
        return fgeometric.keypoint_safe_rotate(keypoint, params["matrix"], angle, scale_x, scale_y, cols, rows)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        angle = random.uniform(*self.limit)

        # https://stackoverflow.com/questions/43892506/opencv-python-rotate-image-without-cropping-sides
        image_center = center(width, height)

        # Rotation Matrix
        rotation_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)

        # rotation calculates the cos and sin, taking absolutes of those.
        abs_cos = abs(rotation_mat[0, 0])
        abs_sin = abs(rotation_mat[0, 1])

        # find the new width and height bounds
        new_w = math.ceil(height * abs_sin + width * abs_cos)
        new_h = math.ceil(height * abs_cos + width * abs_sin)

        scale_x = width / new_w
        scale_y = height / new_h

        # Shift the image to create padding
        rotation_mat[0, 2] += new_w / 2 - image_center[0]
        rotation_mat[1, 2] += new_h / 2 - image_center[1]

        # Rescale to original size
        scale_mat = np.diag(np.ones(3))
        scale_mat[0, 0] *= scale_x
        scale_mat[1, 1] *= scale_y
        _tmp = np.diag(np.ones(3))
        _tmp[:2] = rotation_mat
        _tmp = scale_mat @ _tmp
        rotation_mat = _tmp[:2]

        return {"matrix": rotation_mat, "angle": angle, "scale_x": scale_x, "scale_y": scale_y}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "limit", "interpolation", "border_mode", "value", "mask_value"

`apply (self, img, matrix, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/rotate.py

Python

def apply(self, img: np.ndarray, matrix: np.ndarray, **params: Any) -> np.ndarray:
    return fgeometric.safe_rotate(img, matrix, self.interpolation, self.value, self.border_mode)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/geometric/rotate.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    angle = random.uniform(*self.limit)

    # https://stackoverflow.com/questions/43892506/opencv-python-rotate-image-without-cropping-sides
    image_center = center(width, height)

    # Rotation Matrix
    rotation_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)

    # rotation calculates the cos and sin, taking absolutes of those.
    abs_cos = abs(rotation_mat[0, 0])
    abs_sin = abs(rotation_mat[0, 1])

    # find the new width and height bounds
    new_w = math.ceil(height * abs_sin + width * abs_cos)
    new_h = math.ceil(height * abs_cos + width * abs_sin)

    scale_x = width / new_w
    scale_y = height / new_h

    # Shift the image to create padding
    rotation_mat[0, 2] += new_w / 2 - image_center[0]
    rotation_mat[1, 2] += new_h / 2 - image_center[1]

    # Rescale to original size
    scale_mat = np.diag(np.ones(3))
    scale_mat[0, 0] *= scale_x
    scale_mat[1, 1] *= scale_y
    _tmp = np.diag(np.ones(3))
    _tmp[:2] = rotation_mat
    _tmp = scale_mat @ _tmp
    rotation_mat = _tmp[:2]

    return {"matrix": rotation_mat, "angle": angle, "scale_x": scale_x, "scale_y": scale_y}
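
As a rough illustration of the computation above, the following NumPy-only sketch rebuilds the same kind of matrix without OpenCV and maps the frame corners through it. The helper name `safe_rotation_matrix` and the use of `(width / 2, height / 2)` as the rotation center are assumptions made for this example; the library's `center` helper may follow a different pixel convention.

```python
import math

import numpy as np


def safe_rotation_matrix(width: int, height: int, angle_deg: float) -> np.ndarray:
    """Return a 2x3 matrix that rotates, pads, and rescales to the input size."""
    # Assumption: rotate about the geometric center of the frame.
    cx, cy = width / 2, height / 2
    theta = math.radians(angle_deg)
    alpha, beta = math.cos(theta), math.sin(theta)

    # Same layout as the 2x3 matrix from cv2.getRotationMatrix2D with scale 1.0.
    rot = np.array(
        [
            [alpha, beta, (1 - alpha) * cx - beta * cy],
            [-beta, alpha, beta * cx + (1 - alpha) * cy],
        ]
    )

    # Size of the axis-aligned box that contains the rotated frame.
    abs_cos, abs_sin = abs(alpha), abs(beta)
    new_w = math.ceil(height * abs_sin + width * abs_cos)
    new_h = math.ceil(height * abs_cos + width * abs_sin)

    # Re-center inside the enlarged box, then squeeze back to the input size.
    rot[0, 2] += new_w / 2 - cx
    rot[1, 2] += new_h / 2 - cy
    scale = np.diag([width / new_w, height / new_h, 1.0])
    return (scale @ np.vstack([rot, [0.0, 0.0, 1.0]]))[:2]


# Corners of a 200x100 frame in homogeneous coordinates (x, y, 1).
corners = np.array([[0, 0, 1], [200, 0, 1], [200, 100, 1], [0, 100, 1]], dtype=float)
mapped = corners @ safe_rotation_matrix(200, 100, 30).T
print(mapped.round(2))
```

Because the enlarged bounding box is scaled back to the original width and height, every rotated corner stays inside the original frame, at the cost of non-uniform scaling (`scale_x` and `scale_y` generally differ).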

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/rotate.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "limit", "interpolation", "border_mode", "value", "mask_value"

`transforms` ¶

`class Affine` `(scale=None, translate_percent=None, translate_px=None, rotate=None, shear=None, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode=0, fit_output=False, keep_ratio=False, rotate_method='largest_box', balanced_scale=False, always_apply=None, p=0.5)` [view source on GitHub] ¶

Augmentation to apply affine transformations to images.

Affine transformations involve:

- Translation ("move" image on the x-/y-axis)
- Rotation
- Scaling ("zoom" in/out)
- Shear (move one side of the image, turning a square into a parallelogram)

All such transformations can create "new" pixels in the image without defined content, e.g. if the image is translated to the left, pixels are created on the right. A method has to be defined to fill these pixels. The parameters cval and mode of this class control how this is done.
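
To make the idea of "new" pixels concrete, here is a minimal NumPy sketch of a leftward translation with constant fill. The helper `translate_left` is hypothetical and only mimics what a constant border mode does conceptually; it is not how the library warps images.

```python
import numpy as np


def translate_left(img: np.ndarray, shift: int, cval: int = 0) -> np.ndarray:
    """Shift columns to the left and fill the exposed columns with `cval`."""
    out = np.full_like(img, cval)
    width = img.shape[1]
    if shift < width:
        # Columns that remain visible keep their content.
        out[:, : width - shift] = img[:, shift:]
    return out


img = np.arange(1, 13).reshape(3, 4)
shifted = translate_left(img, shift=1, cval=255)
print(shifted)
```

The rightmost column has no source pixel after the shift, so it takes the constant value. Other border modes would fill it from image content instead, for example by reflection.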

Some transformations involve interpolations between several pixels of the input image to generate output pixel values. The parameters interpolation and mask_interpolation select the interpolation method used for images and masks, respectively.
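
The signature defaults (`interpolation=1`, `mask_interpolation=0`) correspond to `cv2.INTER_LINEAR` and `cv2.INTER_NEAREST` in the source listing. The one-dimensional sketch below, which uses only NumPy and made-up label values, suggests why nearest-neighbour sampling is the usual choice for masks: linear interpolation can produce a class id that does not exist in the mask.

```python
import numpy as np

# One row of a segmentation mask holding class ids 0 and 2 (illustrative values).
labels = np.array([0.0, 0.0, 2.0, 2.0])
positions = np.arange(labels.size)
query = 1.5  # sample location halfway between pixels 1 and 2

# Linear interpolation blends the two neighbouring class ids.
linear = float(np.interp(query, positions, labels))

# Nearest-neighbour sampling picks one of the existing class ids.
nearest = float(labels[int(np.floor(query + 0.5))])

print(linear, nearest)
```

Here the linear result is `1.0`, a label that never appears in the mask, while the nearest-neighbour result is one of the original ids.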

Parameters:

Name	Type	Description
`scale`	`number, tuple of number or dict`	Scaling factor to use, where `1.0` denotes "no change" and `0.5` is zoomed out to `50` percent of the original size. * If a single number, then that value will be used for all images. * If a tuple `(a, b)`, then a value will be uniformly sampled per image from the interval `[a, b]`. The same range will be used for both x- and y-axis. To keep the aspect ratio, set `keep_ratio=True`; the same sampled value will then be used for both x- and y-axis. * If a dictionary, then it is expected to have the keys `x` and/or `y`. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes; sampling then happens independently per axis, resulting in samples that differ between the axes. Note that when `keep_ratio=True`, the x- and y-axis ranges must be identical.
`translate_percent`	`None, number, tuple of number or dict`	Translation as a fraction of the image width (x-translation) and height (y-translation), where `0` denotes "no change" and `0.5` denotes "half of the axis size". * If `None` then equivalent to `0.0` unless `translate_px` has a value other than `None`. * If a single number, then that value will be used for all images. * If a tuple `(a, b)`, then a value will be uniformly sampled per image from the interval `[a, b]`. That sampled fraction value will be used identically for both x- and y-axis. * If a dictionary, then it is expected to have the keys `x` and/or `y`. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes; sampling then happens independently per axis, resulting in samples that differ between the axes.
`translate_px`	`None, int, tuple of int or dict`	Translation in pixels. * If `None` then equivalent to `0` unless `translate_percent` has a value other than `None`. * If a single int, then that value will be used for all images. * If a tuple `(a, b)`, then a value will be uniformly sampled per image from the discrete interval `[a..b]`. That number will be used identically for both x- and y-axis. * If a dictionary, then it is expected to have the keys `x` and/or `y`. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes; sampling then happens independently per axis, resulting in samples that differ between the axes.
`rotate`	`number or tuple of number`	Rotation in degrees (NOT radians), i.e. expected value range is around `[-360, 360]`. Rotation happens around the center of the image, not the top left corner as in some other frameworks. * If a number, then that value will be used for all images. * If a tuple `(a, b)`, then a value will be uniformly sampled per image from the interval `[a, b]` and used as the rotation value.
`shear`	`number, tuple of number or dict`	Shear in degrees (NOT radians), i.e. expected value range is around `[-360, 360]`, with reasonable values being in the range of `[-45, 45]`. * If a number, then that value will be used for all images as the shear on the x-axis (no shear on the y-axis will be done). * If a tuple `(a, b)`, then two values will be uniformly sampled per image from the interval `[a, b]` and used as the x- and y-shear values. * If a dictionary, then it is expected to have the keys `x` and/or `y`. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes; sampling then happens independently per axis, resulting in samples that differ between the axes.
`interpolation`	`int`	OpenCV interpolation flag used for images. Default: `cv2.INTER_LINEAR`.
`mask_interpolation`	`int`	OpenCV interpolation flag used for masks. Default: `cv2.INTER_NEAREST`.
`cval`	`number or sequence of number`	The constant value to use when filling in newly created pixels. (E.g. translating by 1px to the right will create a new 1px-wide column of pixels on the left of the image). The value is only used when `mode` is `cv2.BORDER_CONSTANT`. The expected value range is `[0, 255]` for `uint8` images.
`cval_mask`	`number or tuple of number`	Same as cval but only for masks.
`mode`	`int`	OpenCV border flag. Default: `cv2.BORDER_CONSTANT`.
`fit_output`	`bool`	If True, the image plane size and position will be adjusted to tightly capture the whole image after affine transformation (`translate_percent` and `translate_px` are ignored). Otherwise (`False`), parts of the transformed image may end up outside the image plane. Fitting the output shape can be useful to avoid corners of the image being outside the image plane after applying rotations. Default: False
`keep_ratio`	`bool`	When True, the original aspect ratio will be kept when the random scale is applied. Default: False.
`rotate_method`	`Literal["largest_box", "ellipse"]`	Rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse"[1]. Default: "largest_box"
`balanced_scale`	`bool`	When True, scaling factors are chosen to be either entirely below or above 1, ensuring balanced scaling. Default: False. This is important because without it, scaling tends to lean towards upscaling. For example, if we want the image to zoom in and out by 2x, we may pick an interval [0.5, 2]. Since the interval [0.5, 1] is half the length of [1, 2], values above 1 are picked twice as often if sampled directly from [0.5, 2]. With `balanced_scale`, the function ensures that half the time, the scaling factor is picked from below 1 (zooming out), and the other half from above 1 (zooming in). This makes the zooming in and out process more balanced.
`p`	`float`	Probability of applying the transform. Default: 0.5.
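
The effect of `balanced_scale` can be checked numerically. The sketch below is a simplified, standalone re-creation of the sampling idea (the helper `sample_scale` is hypothetical, not the library function): it compares how often a factor above 1 is drawn from `[0.5, 2]` with and without balancing.

```python
import random


def sample_scale(low: float, high: float, balanced: bool, rng: random.Random) -> float:
    """Draw one scale factor, optionally balancing zoom-out and zoom-in."""
    if not balanced:
        return rng.uniform(low, high)

    # Split the range at 1.0 and pick one side with equal probability.
    intervals = []
    if low < 1:
        intervals.append((low, 1.0))
    if high > 1:
        intervals.append((1.0, high))
    if not intervals:
        raise ValueError("The range must extend below or above 1.")
    return rng.uniform(*rng.choice(intervals))


rng = random.Random(0)
n = 100_000
direct = sum(sample_scale(0.5, 2.0, False, rng) > 1 for _ in range(n)) / n
balanced = sum(sample_scale(0.5, 2.0, True, rng) > 1 for _ in range(n)) / n
print(f"above 1 without balancing: {direct:.3f}, with balancing: {balanced:.3f}")
```

Sampling directly lands above 1 about two thirds of the time, because `[1, 2]` covers two thirds of the interval's length; with balancing the share is close to one half.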

Targets

image, mask, keypoints, bboxes

Image types: uint8, float32

Reference

[1] https://arxiv.org/abs/2109.13488

Source code in albumentations/augmentations/geometric/transforms.py

Python

class Affine(DualTransform):
    """Augmentation to apply affine transformations to images.

    Affine transformations involve:

        - Translation ("move" image on the x-/y-axis)
        - Rotation
        - Scaling ("zoom" in/out)
        - Shear (move one side of the image, turning a square into a parallelogram)

    All such transformations can create "new" pixels in the image without a defined content, e.g.
    if the image is translated to the left, pixels are created on the right.
    A method has to be defined to deal with these pixel values.
    The parameters `cval` and `mode` of this class deal with this.

    Some transformations involve interpolations between several pixels
    of the input image to generate output pixel values. The parameters `interpolation` and
    `mask_interpolation` deal with the method of interpolation used for this.

    Args:
        scale (number, tuple of number or dict): Scaling factor to use, where ``1.0`` denotes "no change" and
            ``0.5`` is zoomed out to ``50`` percent of the original size.
                * If a single number, then that value will be used for all images.
                * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``.
                  The same range will be used for both x- and y-axis. To keep the aspect ratio, set
                  ``keep_ratio=True``, then the same value will be used for both x- and y-axis.
                * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
                  Each of these keys can have the same values as described above.
                  Using a dictionary allows setting different values for the two axes and sampling will then happen
                  *independently* per axis, resulting in samples that differ between the axes. Note that when
                  ``keep_ratio=True``, the x- and y-axis ranges should be the same.
        translate_percent (None, number, tuple of number or dict): Translation as a fraction of the image height/width
            (x-translation, y-translation), where ``0`` denotes "no change"
            and ``0.5`` denotes "half of the axis size".
                * If ``None`` then equivalent to ``0.0`` unless `translate_px` has a value other than ``None``.
                * If a single number, then that value will be used for all images.
                * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``.
                  That sampled fraction value will be used identically for both x- and y-axis.
                * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
                  Each of these keys can have the same values as described above.
                  Using a dictionary allows setting different values for the two axes and sampling will then happen
                  *independently* per axis, resulting in samples that differ between the axes.
        translate_px (None, int, tuple of int or dict): Translation in pixels.
                * If ``None`` then equivalent to ``0`` unless `translate_percent` has a value other than ``None``.
                * If a single int, then that value will be used for all images.
                * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from
                  the discrete interval ``[a..b]``. That number will be used identically for both x- and y-axis.
                * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
                  Each of these keys can have the same values as described above.
                  Using a dictionary allows setting different values for the two axes and sampling will then happen
                  *independently* per axis, resulting in samples that differ between the axes.
        rotate (number or tuple of number): Rotation in degrees (**NOT** radians), i.e. expected value range is
            around ``[-360, 360]``. Rotation happens around the *center* of the image,
            not the top left corner as in some other frameworks.
                * If a number, then that value will be used for all images.
                * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``
                  and used as the rotation value.
        shear (number, tuple of number or dict): Shear in degrees (**NOT** radians), i.e. expected value range is
            around ``[-360, 360]``, with reasonable values being in the range of ``[-45, 45]``.
                * If a number, then that value will be used for all images as
                  the shear on the x-axis (no shear on the y-axis will be done).
                * If a tuple ``(a, b)``, then two values will be uniformly sampled per image
                  from the interval ``[a, b]`` and be used as the x- and y-shear values.
                * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
                  Each of these keys can have the same values as described above.
                  Using a dictionary allows setting different values for the two axes and sampling will then happen
                  *independently* per axis, resulting in samples that differ between the axes.
        interpolation (int): OpenCV interpolation flag.
        mask_interpolation (int): OpenCV interpolation flag.
        cval (number or sequence of number): The constant value to use when filling in newly created pixels.
            (E.g. translating by 1px to the right will create a new 1px-wide column of pixels
            on the left of the image).
            The value is only used when `mode=constant`. The expected value range is ``[0, 255]`` for ``uint8`` images.
        cval_mask (number or tuple of number): Same as cval but only for masks.
        mode (int): OpenCV border flag.
        fit_output (bool): If True, the image plane size and position will be adjusted to tightly capture
            the whole image after affine transformation (`translate_percent` and `translate_px` are ignored).
            Otherwise (``False``),  parts of the transformed image may end up outside the image plane.
            Fitting the output shape can be useful to avoid corners of the image being outside the image plane
            after applying rotations. Default: False
        keep_ratio (bool): When True, the original aspect ratio will be kept when the random scale is applied.
            Default: False.
        rotate_method (Literal["largest_box", "ellipse"]): rotation method used for the bounding boxes.
            Should be one of "largest_box" or "ellipse"[1]. Default: "largest_box"
        balanced_scale (bool): When True, scaling factors are chosen to be either entirely below or above 1,
            ensuring balanced scaling. Default: False.

            This is important because without it, scaling tends to lean towards upscaling. For example, if we want
            the image to zoom in and out by 2x, we may pick an interval [0.5, 2]. Since the interval [0.5, 1] is
            half the length of [1, 2], values above 1 are picked twice as often if sampled directly
            from [0.5, 2]. With `balanced_scale`, the function ensures that half the time, the scaling
            factor is picked from below 1 (zooming out), and the other half from above 1 (zooming in).
            This makes the zooming in and out process more balanced.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    Reference:
        [1] https://arxiv.org/abs/2109.13488

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        scale: ScaleFloatType | dict[str, Any] | None = Field(
            default=None,
            description="Scaling factor or dictionary for independent axis scaling.",
        )
        translate_percent: ScaleFloatType | dict[str, Any] | None = Field(
            default=None,
            description="Translation as a fraction of the image dimension.",
        )
        translate_px: ScaleIntType | dict[str, Any] | None = Field(
            default=None,
            description="Translation in pixels.",
        )
        rotate: ScaleFloatType | None = Field(default=None, description="Rotation angle in degrees.")
        shear: ScaleFloatType | dict[str, Any] | None = Field(
            default=None,
            description="Shear angle in degrees.",
        )
        interpolation: InterpolationType = cv2.INTER_LINEAR
        mask_interpolation: InterpolationType = cv2.INTER_NEAREST

        cval: ColorType = Field(default=0, description="Value used for constant padding.")
        cval_mask: ColorType = Field(default=0, description="Value used for mask constant padding.")
        mode: BorderModeType = cv2.BORDER_CONSTANT
        fit_output: Annotated[bool, Field(default=False, description="Adjust output to capture whole image.")]
        keep_ratio: Annotated[bool, Field(default=False, description="Maintain aspect ratio when scaling.")]
        rotate_method: Literal["largest_box", "ellipse"] = "largest_box"
        balanced_scale: Annotated[bool, Field(default=False, description="Use balanced scaling.")]

    def __init__(
        self,
        scale: ScaleFloatType | dict[str, Any] | None = None,
        translate_percent: ScaleFloatType | dict[str, Any] | None = None,
        translate_px: ScaleIntType | dict[str, Any] | None = None,
        rotate: ScaleFloatType | None = None,
        shear: ScaleFloatType | dict[str, Any] | None = None,
        interpolation: int = cv2.INTER_LINEAR,
        mask_interpolation: int = cv2.INTER_NEAREST,
        cval: ColorType = 0,
        cval_mask: ColorType = 0,
        mode: int = cv2.BORDER_CONSTANT,
        fit_output: bool = False,
        keep_ratio: bool = False,
        rotate_method: Literal["largest_box", "ellipse"] = "largest_box",
        balanced_scale: bool = False,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        params = [scale, translate_percent, translate_px, rotate, shear]
        if all(p is None for p in params):
            scale = {"x": (0.9, 1.1), "y": (0.9, 1.1)}
            translate_percent = {"x": (-0.1, 0.1), "y": (-0.1, 0.1)}
            rotate = (-15, 15)
            shear = {"x": (-10, 10), "y": (-10, 10)}
        else:
            scale = scale if scale is not None else 1.0
            rotate = rotate if rotate is not None else 0.0
            shear = shear if shear is not None else 0.0

        self.interpolation = interpolation
        self.mask_interpolation = mask_interpolation
        self.cval = cval
        self.cval_mask = cval_mask
        self.mode = mode
        self.scale = self._handle_dict_arg(scale, "scale")
        self.translate_percent, self.translate_px = self._handle_translate_arg(translate_px, translate_percent)
        self.rotate = to_tuple(rotate, rotate)
        self.fit_output = fit_output
        self.shear = self._handle_dict_arg(shear, "shear")
        self.keep_ratio = keep_ratio
        self.rotate_method = rotate_method
        self.balanced_scale = balanced_scale

        if self.keep_ratio and self.scale["x"] != self.scale["y"]:
            raise ValueError(f"When keep_ratio is True, the x and y scale range should be identical. got {self.scale}")

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "interpolation",
            "mask_interpolation",
            "cval",
            "mode",
            "scale",
            "translate_percent",
            "translate_px",
            "rotate",
            "fit_output",
            "shear",
            "cval_mask",
            "keep_ratio",
            "rotate_method",
            "balanced_scale",
        )

    @staticmethod
    def _handle_dict_arg(
        val: float | tuple[float, float] | dict[str, Any],
        name: str,
        default: float = 1.0,
    ) -> dict[str, Any]:
        if isinstance(val, dict):
            if "x" not in val and "y" not in val:
                raise ValueError(
                    f'Expected {name} dictionary to contain at least key "x" or key "y". Found neither of them.',
                )
            x = val.get("x", default)
            y = val.get("y", default)
            return {"x": to_tuple(x, x), "y": to_tuple(y, y)}
        return {"x": to_tuple(val, val), "y": to_tuple(val, val)}

    @classmethod
    def _handle_translate_arg(
        cls,
        translate_px: ScaleFloatType | dict[str, Any] | None,
        translate_percent: ScaleFloatType | dict[str, Any] | None,
    ) -> Any:
        if translate_percent is None and translate_px is None:
            translate_px = 0

        if translate_percent is not None and translate_px is not None:
            msg = "Expected either translate_percent or translate_px to be provided, but both were provided."
            raise ValueError(msg)

        if translate_percent is not None:
            # translate by percent
            return cls._handle_dict_arg(translate_percent, "translate_percent", default=0.0), translate_px

        if translate_px is None:
            msg = "translate_px is None."
            raise ValueError(msg)
        # translate by pixels
        return translate_percent, cls._handle_dict_arg(translate_px, "translate_px")

    def apply(
        self,
        img: np.ndarray,
        matrix: skimage.transform.ProjectiveTransform,
        output_shape: SizeType,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.warp_affine(
            img,
            matrix,
            interpolation=self.interpolation,
            cval=self.cval,
            mode=self.mode,
            output_shape=output_shape,
        )

    def apply_to_mask(
        self,
        mask: np.ndarray,
        matrix: skimage.transform.ProjectiveTransform,
        output_shape: SizeType,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.warp_affine(
            mask,
            matrix,
            interpolation=self.mask_interpolation,
            cval=self.cval_mask,
            mode=self.mode,
            output_shape=output_shape,
        )

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        bbox_matrix: skimage.transform.ProjectiveTransform,
        rows: int,
        cols: int,
        output_shape: SizeType,
        **params: Any,
    ) -> BoxInternalType:
        return fgeometric.bbox_affine(bbox, bbox_matrix, self.rotate_method, rows, cols, output_shape)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        matrix: skimage.transform.ProjectiveTransform,
        scale: dict[str, Any],
        **params: Any,
    ) -> KeypointInternalType:
        if scale is None:
            msg = "Expected scale to be provided, but got None."
            raise ValueError(msg)
        if matrix is None:
            msg = "Expected matrix to be provided, but got None."
            raise ValueError(msg)

        return fgeometric.keypoint_affine(keypoint, matrix=matrix, scale=scale)

    @staticmethod
    def get_scale(scale: dict[str, tuple[float, float]], keep_ratio: bool, balanced_scale: bool) -> dict[str, float]:
        result_scale = {}
        if balanced_scale:
            for key, value in scale.items():
                lower_interval = (value[0], 1.0) if value[0] < 1 else None
                upper_interval = (1.0, value[1]) if value[1] > 1 else None

                if lower_interval is not None and upper_interval is not None:
                    selected_interval = random.choice([lower_interval, upper_interval])
                elif lower_interval is not None:
                    selected_interval = lower_interval
                elif upper_interval is not None:
                    selected_interval = upper_interval
                else:
                    raise ValueError(f"Both lower_interval and upper_interval are None for key: {key}")

                result_scale[key] = random.uniform(*selected_interval)
        else:
            result_scale = {key: random.uniform(*value) for key, value in scale.items()}

        if keep_ratio:
            result_scale["y"] = result_scale["x"]

        return result_scale

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        translate: dict[str, int | float]
        if self.translate_px is not None:
            translate = {key: random.randint(*value) for key, value in self.translate_px.items()}
        elif self.translate_percent is not None:
            translate = {key: random.uniform(*value) for key, value in self.translate_percent.items()}
            translate["x"] = translate["x"] * width
            translate["y"] = translate["y"] * height
        else:
            translate = {"x": 0, "y": 0}

        shear = {key: -random.uniform(*value) for key, value in self.shear.items()}

        scale = self.get_scale(self.scale, self.keep_ratio, self.balanced_scale)
        rotate = -random.uniform(*self.rotate)

        shift_x, shift_y = center(width, height)
        shift_x_bbox, shift_y_bbox = center_bbox(width, height)

        # Image transformation matrix
        matrix_to_topleft = skimage.transform.SimilarityTransform(translation=[-shift_x, -shift_y])
        matrix_shear_y_rot = skimage.transform.AffineTransform(rotation=-np.pi / 2)
        matrix_shear_y = skimage.transform.AffineTransform(shear=np.deg2rad(shear["y"]))
        matrix_shear_y_rot_inv = skimage.transform.AffineTransform(rotation=np.pi / 2)
        matrix_transforms = skimage.transform.AffineTransform(
            scale=(scale["x"], scale["y"]),
            translation=(translate["x"], translate["y"]),
            rotation=np.deg2rad(rotate),
            shear=np.deg2rad(shear["x"]),
        )
        matrix_to_center = skimage.transform.SimilarityTransform(translation=[shift_x, shift_y])
        matrix = (
            matrix_to_topleft
            + matrix_shear_y_rot
            + matrix_shear_y
            + matrix_shear_y_rot_inv
            + matrix_transforms
            + matrix_to_center
        )

        # Bounding box transformation matrix
        matrix_to_topleft_bbox = skimage.transform.SimilarityTransform(translation=[-shift_x_bbox, -shift_y_bbox])
        matrix_to_center_bbox = skimage.transform.SimilarityTransform(translation=[shift_x_bbox, shift_y_bbox])
        bbox_matrix = (
            matrix_to_topleft_bbox
            + matrix_shear_y_rot
            + matrix_shear_y
            + matrix_shear_y_rot_inv
            + matrix_transforms
            + matrix_to_center_bbox
        )

        if self.fit_output:
            matrix, output_shape = self._compute_affine_warp_output_shape(matrix, params["shape"])
        else:
            output_shape = params["shape"]

        return {
            "rotate": rotate,
            "scale": scale,
            "matrix": matrix,
            "bbox_matrix": bbox_matrix,
            "output_shape": output_shape,
        }

    @staticmethod
    def _compute_affine_warp_output_shape(
        matrix: skimage.transform.ProjectiveTransform,
        input_shape: SizeType,
    ) -> tuple[skimage.transform.ProjectiveTransform, SizeType]:
        height, width = input_shape[:2]

        if height == 0 or width == 0:
            return matrix, input_shape

        # determine shape of output image
        corners = np.array([[0, 0], [0, height - 1], [width - 1, height - 1], [width - 1, 0]])
        corners = matrix(corners)

        minc = corners[:, 0].min()
        minr = corners[:, 1].min()
        maxc = corners[:, 0].max()
        maxr = corners[:, 1].max()

        out_height = maxr - minr + 1
        out_width = maxc - minc + 1

        if len(input_shape) == NUM_MULTI_CHANNEL_DIMENSIONS:
            output_shape = np.ceil((out_height, out_width, input_shape[2]))
        else:
            output_shape = np.ceil((out_height, out_width))

        output_shape_tuple = tuple(int(v) for v in output_shape.tolist())
        # fit output image in new shape
        translation = -minc, -minr
        matrix_to_fit = skimage.transform.SimilarityTransform(translation=translation)
        matrix += matrix_to_fit
        return matrix, output_shape_tuple
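
The `fit_output` logic in `_compute_affine_warp_output_shape` can be followed with plain 3x3 matrices. The sketch below is a NumPy-only approximation under stated assumptions: the helper name `fit_output` is hypothetical, the transform is given as a homogeneous matrix rather than a scikit-image object, and an exact 90-degree rotation matrix is used to avoid floating-point noise before the `ceil`.

```python
import numpy as np


def fit_output(matrix: np.ndarray, height: int, width: int):
    """Return (shifted matrix, (out_height, out_width)) enclosing the warped frame."""
    # Pixel-center corners of the input frame as homogeneous (x, y, 1) rows.
    corners = np.array(
        [[0, 0, 1], [0, height - 1, 1], [width - 1, height - 1, 1], [width - 1, 0, 1]],
        dtype=float,
    )
    mapped = (matrix @ corners.T).T[:, :2]

    min_xy = mapped.min(axis=0)
    max_xy = mapped.max(axis=0)
    out_w, out_h = np.ceil(max_xy - min_xy + 1).astype(int)

    # Translate so that the smallest mapped coordinate lands on 0.
    shift = np.array([[1.0, 0.0, -min_xy[0]], [0.0, 1.0, -min_xy[1]], [0.0, 0.0, 1.0]])
    return shift @ matrix, (int(out_h), int(out_w))


# Exact 90-degree rotation about the origin: (x, y) -> (-y, x).
rot90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
fitted, shape = fit_output(rot90, height=100, width=200)
print(shape)
```

For a 200-pixel-wide, 100-pixel-high input, the fitted output is 200 pixels high and 100 pixels wide, and the added translation moves all warped corners into non-negative coordinates.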

`apply (self, img, matrix, output_shape, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    matrix: skimage.transform.ProjectiveTransform,
    output_shape: SizeType,
    **params: Any,
) -> np.ndarray:
    return fgeometric.warp_affine(
        img,
        matrix,
        interpolation=self.interpolation,
        cval=self.cval,
        mode=self.mode,
        output_shape=output_shape,
    )

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    translate: dict[str, int | float]
    if self.translate_px is not None:
        translate = {key: random.randint(*value) for key, value in self.translate_px.items()}
    elif self.translate_percent is not None:
        translate = {key: random.uniform(*value) for key, value in self.translate_percent.items()}
        translate["x"] = translate["x"] * width
        translate["y"] = translate["y"] * height
    else:
        translate = {"x": 0, "y": 0}

    shear = {key: -random.uniform(*value) for key, value in self.shear.items()}

    scale = self.get_scale(self.scale, self.keep_ratio, self.balanced_scale)
    rotate = -random.uniform(*self.rotate)

    shift_x, shift_y = center(width, height)
    shift_x_bbox, shift_y_bbox = center_bbox(width, height)

    # Image transformation matrix
    matrix_to_topleft = skimage.transform.SimilarityTransform(translation=[-shift_x, -shift_y])
    matrix_shear_y_rot = skimage.transform.AffineTransform(rotation=-np.pi / 2)
    matrix_shear_y = skimage.transform.AffineTransform(shear=np.deg2rad(shear["y"]))
    matrix_shear_y_rot_inv = skimage.transform.AffineTransform(rotation=np.pi / 2)
    matrix_transforms = skimage.transform.AffineTransform(
        scale=(scale["x"], scale["y"]),
        translation=(translate["x"], translate["y"]),
        rotation=np.deg2rad(rotate),
        shear=np.deg2rad(shear["x"]),
    )
    matrix_to_center = skimage.transform.SimilarityTransform(translation=[shift_x, shift_y])
    matrix = (
        matrix_to_topleft
        + matrix_shear_y_rot
        + matrix_shear_y
        + matrix_shear_y_rot_inv
        + matrix_transforms
        + matrix_to_center
    )

    # Bounding box transformation matrix
    matrix_to_topleft_bbox = skimage.transform.SimilarityTransform(translation=[-shift_x_bbox, -shift_y_bbox])
    matrix_to_center_bbox = skimage.transform.SimilarityTransform(translation=[shift_x_bbox, shift_y_bbox])
    bbox_matrix = (
        matrix_to_topleft_bbox
        + matrix_shear_y_rot
        + matrix_shear_y
        + matrix_shear_y_rot_inv
        + matrix_transforms
        + matrix_to_center_bbox
    )

    if self.fit_output:
        matrix, output_shape = self._compute_affine_warp_output_shape(matrix, params["shape"])
    else:
        output_shape = params["shape"]

    return {
        "rotate": rotate,
        "scale": scale,
        "matrix": matrix,
        "bbox_matrix": bbox_matrix,
        "output_shape": output_shape,
    }
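
The matrix chain in the listing above follows a common pattern: translate the image center to the origin, apply shear, scale, and rotation there, then translate back. The sketch below reproduces that pattern with plain NumPy 3x3 homogeneous matrices rather than skimage transforms. The helper names (`translation`, `rotation`, `about_center`) are illustrative and not part of Albumentations, and the sketch assumes that skimage's `+` applies the left-hand transform first, which corresponds to right-to-left matrix multiplication here.

```python
import numpy as np


def translation(tx: float, ty: float) -> np.ndarray:
    """3x3 homogeneous matrix that shifts points by (tx, ty)."""
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])


def rotation(angle_rad: float) -> np.ndarray:
    """3x3 homogeneous matrix that rotates points about the origin."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def about_center(transform: np.ndarray, cx: float, cy: float) -> np.ndarray:
    """Apply `transform` around (cx, cy) instead of around the origin."""
    # Matrices act right to left: shift center to origin, transform, shift back.
    return translation(cx, cy) @ transform @ translation(-cx, -cy)


# Hypothetical center (50, 30); rotate by 90 degrees around it.
matrix = about_center(rotation(np.deg2rad(90)), cx=50.0, cy=30.0)
center_after = matrix @ np.array([50.0, 30.0, 1.0])
point_after = matrix @ np.array([60.0, 30.0, 1.0])
```

The center stays fixed while every other point moves around it.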

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "interpolation",
        "mask_interpolation",
        "cval",
        "mode",
        "scale",
        "translate_percent",
        "translate_px",
        "rotate",
        "fit_output",
        "shear",
        "cval_mask",
        "keep_ratio",
        "rotate_method",
        "balanced_scale",
    )

`class D4` `(always_apply=None, p=1)` [view source on GitHub] ¶

Applies one of the eight possible D4 dihedral group transformations to a square-shaped input, maintaining the square shape. These transformations correspond to the symmetries of a square, including rotations and reflections.

The D4 group transformations include:

- 'e' (identity): No transformation is applied.
- 'r90' (rotation by 90 degrees counterclockwise)
- 'r180' (rotation by 180 degrees)
- 'r270' (rotation by 270 degrees counterclockwise)
- 'v' (reflection across the vertical midline)
- 'hvt' (reflection across the anti-diagonal)
- 'h' (reflection across the horizontal midline)
- 't' (reflection across the main diagonal)

Even if the probability (p) of applying the transform is set to 1, the identity transformation 'e' may still occur, which means the input will remain unchanged in one out of eight cases.
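
As an illustration of the eight symmetries of a square, here is a minimal NumPy sketch. It is not the library's `fgeometric.d4`; the dictionary keys are descriptive names chosen for this example, and no mapping to the library's group-element codes is implied.

```python
import numpy as np

# Eight symmetries of a square, written with basic NumPy operations.
# Keys are illustrative names, not Albumentations identifiers.
SQUARE_SYMMETRIES = {
    "identity": lambda a: a,
    "rot90_ccw": lambda a: np.rot90(a, 1),
    "rot180": lambda a: np.rot90(a, 2),
    "rot270_ccw": lambda a: np.rot90(a, 3),
    "flip_left_right": lambda a: a[:, ::-1],
    "flip_up_down": lambda a: a[::-1, :],
    # Reflection across the main diagonal.
    "transpose": lambda a: np.swapaxes(a, 0, 1),
    # Reflection across the anti-diagonal.
    "anti_transpose": lambda a: np.swapaxes(np.rot90(a, 2), 0, 1),
}

square = np.arange(9).reshape(3, 3)
results = {name: op(square) for name, op in SQUARE_SYMMETRIES.items()}

# Every symmetry keeps the square shape and yields a different arrangement.
distinct = {r.tobytes() for r in results.values()}
```

Because one of the eight elements is the identity, a uniformly random pick leaves the input unchanged with probability 1/8.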

Parameters:

Name	Type	Description
`p`	`float`	Probability of applying the transform. Default is 1, meaning the transform is applied every time it is called.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

This transform is particularly useful when augmenting data that does not have a clear orientation:

- Top view satellite or drone imagery
- Medical images

Source code in albumentations/augmentations/geometric/transforms.py

Python

class D4(DualTransform):
    """Applies one of the eight possible D4 dihedral group transformations to a square-shaped input,
        maintaining the square shape. These transformations correspond to the symmetries of a square,
        including rotations and reflections.

    The D4 group transformations include:
    - 'e' (identity): No transformation is applied.
    - 'r90' (rotation by 90 degrees counterclockwise)
    - 'r180' (rotation by 180 degrees)
    - 'r270' (rotation by 270 degrees counterclockwise)
    - 'v' (reflection across the vertical midline)
    - 'hvt' (reflection across the anti-diagonal)
    - 'h' (reflection across the horizontal midline)
    - 't' (reflection across the main diagonal)

    Even if the probability (`p`) of applying the transform is set to 1, the identity transformation
    'e' may still occur, which means the input will remain unchanged in one out of eight cases.

    Args:
        p (float): Probability of applying the transform. Default is 1, meaning the
                   transform is applied every time it is called.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        This transform is particularly useful when augmenting data that does not have a clear orientation:
        - Top view satellite or drone imagery
        - Medical images

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        p: ProbabilityType = 1

    def __init__(
        self,
        always_apply: bool | None = None,
        p: float = 1,
    ):
        super().__init__(p, always_apply)

    def apply(self, img: np.ndarray, group_element: D4Type, **params: Any) -> np.ndarray:
        return fgeometric.d4(img, group_element)

    def apply_to_bbox(self, bbox: BoxInternalType, group_element: D4Type, **params: Any) -> BoxInternalType:
        return fgeometric.bbox_d4(bbox, group_element, params["shape"][0], params["shape"][1])

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        group_element: D4Type,
        **params: Any,
    ) -> KeypointInternalType:
        return fgeometric.keypoint_d4(keypoint, group_element, params["shape"][0], params["shape"][1])

    def get_params(self) -> dict[str, D4Type]:
        return {
            "group_element": random_utils.choice(d4_group_elements),
        }

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()

`apply (self, img, group_element, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def apply(self, img: np.ndarray, group_element: D4Type, **params: Any) -> np.ndarray:
    return fgeometric.d4(img, group_element)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_params(self) -> dict[str, D4Type]:
    return {
        "group_element": random_utils.choice(d4_group_elements),
    }

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[()]:
    return ()

`class ElasticTransform` `(alpha=3, sigma=50, alpha_affine=None, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=None, approximate=False, same_dxdy=False, p=0.5)` [view source on GitHub] ¶

Apply elastic deformation to images, masks, and bounding boxes as described in [Simard2003] (see Reference below).

This transformation introduces random elastic distortions to images, which can be useful for data augmentation in training convolutional neural networks. The transformation can be applied in an approximate or precise manner, with an option to use the same displacement field for both x and y directions to speed up the process.
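
A rough sketch of the idea: draw a random displacement field, smooth it with a Gaussian of width `sigma`, scale it by `alpha`, and resample the image at the displaced coordinates. The code below is a simplified NumPy illustration with nearest-neighbour sampling and edge clamping, not the library's `fgeometric.elastic_transform`. The function names are hypothetical, and the `approximate`, `same_dxdy`, interpolation, and border options are not modelled.

```python
import numpy as np


def gaussian_smooth(field: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D array using only NumPy."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x**2) / (2.0 * sigma**2))
    kernel /= kernel.sum()
    out = field.astype(np.float64)
    for axis in (0, 1):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="edge")
        # "valid" convolution on the padded array restores the original length.
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
        )
    return out


def elastic_sketch(img: np.ndarray, alpha: float, sigma: float, rng) -> np.ndarray:
    """Warp `img` with a smoothed random displacement field (nearest neighbour)."""
    h, w = img.shape[:2]
    dx = alpha * gaussian_smooth(rng.uniform(-1, 1, size=(h, w)), sigma)
    dy = alpha * gaussian_smooth(rng.uniform(-1, 1, size=(h, w)), sigma)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Clamp source coordinates to the image so lookups never go out of bounds.
    src_x = np.clip(np.rint(xs + dx), 0, w - 1).astype(np.intp)
    src_y = np.clip(np.rint(ys + dy), 0, h - 1).astype(np.intp)
    return img[src_y, src_x]


image = (np.arange(32 * 32) % 256).astype(np.uint8).reshape(32, 32)
mask = (image > 127).astype(np.uint8)
seed = 42
warped_image = elastic_sketch(image, alpha=3, sigma=2, rng=np.random.RandomState(seed))
warped_mask = elastic_sketch(mask, alpha=3, sigma=2, rng=np.random.RandomState(seed))
```

Re-seeding with the same value is what keeps the image and its mask aligned. The class source on this page does the same by building `np.random.RandomState(random_seed)` in both `apply` and `apply_to_mask`.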

Parameters:

Name	Type	Description
`alpha`	`float`	Scaling factor for the random displacement fields.
`sigma`	`float`	Standard deviation for Gaussian filter applied to the displacement fields.
`interpolation`	`int`	Interpolation method to be used. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default is cv2.INTER_LINEAR.
`border_mode`	`int`	Pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default is cv2.BORDER_REFLECT_101.
`value`	`int, float, list of int, list of float`	Padding value if border_mode is cv2.BORDER_CONSTANT.
`mask_value`	`int, float, list of int, list of float`	Padding value if border_mode is cv2.BORDER_CONSTANT, applied to masks.
`approximate`	`bool`	Whether to smooth displacement map with a fixed kernel size. Enabling this option gives ~2X speedup on large images. Default is False.
`same_dxdy`	`bool`	Whether to use the same random displacement for x and y directions. Enabling this option gives ~2X speedup. Default is False.

Targets

image, mask, bboxes

Image types: uint8, float32
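
Elastic warps do not map rectangles to rectangles, so the class source on this page handles a bounding box indirectly: it paints the box into a binary mask, warps the mask, and takes the bounding box of what remains. A minimal sketch of that idea follows, using pixel coordinates and a simple shift as a stand-in warp. `bbox_via_mask` and `shift_warp` are hypothetical names, and returning `None` for an empty result is an assumption of this sketch.

```python
import numpy as np


def bbox_via_mask(bbox, shape, warp):
    """Transform a pixel-space box (x_min, y_min, x_max, y_max) through `warp`."""
    rows, cols = shape
    x_min, y_min, x_max, y_max = (int(v) for v in bbox)
    # Rasterize the box as a filled rectangle.
    mask = np.zeros((rows, cols), dtype=np.uint8)
    mask[y_min:y_max, x_min:x_max] = 1
    warped = warp(mask)
    # The new box is the tightest rectangle around the surviving pixels.
    ys, xs = np.nonzero(warped)
    if ys.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)


def shift_warp(mask):
    """Stand-in for a real deformation: move 2 rows down and 3 columns right."""
    return np.roll(mask, shift=(2, 3), axis=(0, 1))


new_bbox = bbox_via_mask((1, 1, 4, 3), shape=(10, 10), warp=shift_warp)
```

With a real elastic warp the mask region becomes irregular, so the resulting box is generally an axis-aligned approximation of the deformed region.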

Reference

Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003. https://gist.github.com/ernestum/601cdf56d2b424757de5

Source code in albumentations/augmentations/geometric/transforms.py

Python

class ElasticTransform(DualTransform):
    """Apply elastic deformation to images, masks, and bounding boxes as described in [Simard2003]_.

    This transformation introduces random elastic distortions to images, which can be useful for data augmentation
    in training convolutional neural networks. The transformation can be applied in an approximate or precise manner,
    with an option to use the same displacement field for both x and y directions to speed up the process.

    Args:
        alpha (float): Scaling factor for the random displacement fields.
        sigma (float): Standard deviation for Gaussian filter applied to the displacement fields.
        interpolation (int): Interpolation method to be used. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default is cv2.INTER_LINEAR.
        border_mode (int): Pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default is cv2.BORDER_REFLECT_101.
        value (int, float, list of int, list of float, optional): Padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float, list of int, list of float, optional): Padding value if border_mode is
            cv2.BORDER_CONSTANT, applied to masks.
        approximate (bool, optional): Whether to smooth displacement map with a fixed kernel size.
            Enabling this option gives ~2X speedup on large images. Default is False.
        same_dxdy (bool, optional): Whether to use the same random displacement for x and y directions.
            Enabling this option gives ~2X speedup. Default is False.

    Targets:
        image, mask, bboxes

    Image types:
        uint8, float32

    Reference:
        Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to
        Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.
        https://gist.github.com/ernestum/601cdf56d2b424757de5
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)

    class InitSchema(BaseTransformInitSchema):
        alpha: Annotated[float, Field(description="Alpha parameter.", ge=0)]
        sigma: Annotated[float, Field(default=50, description="Sigma parameter for Gaussian filter.", ge=1)]
        alpha_affine: None = Field(
            description="Alpha affine parameter.",
            deprecated="Use Affine transform to get affine effects",
        )
        interpolation: InterpolationType = cv2.INTER_LINEAR
        border_mode: BorderModeType = cv2.BORDER_REFLECT_101
        value: int | float | list[int] | list[float] | None = Field(
            default=None,
            description="Padding value if border_mode is cv2.BORDER_CONSTANT.",
        )
        mask_value: float | list[int] | list[float] | None = Field(
            default=None,
            description="Padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.",
        )
        approximate: Annotated[bool, Field(default=False, description="Approximate displacement map smoothing.")]
        same_dxdy: Annotated[bool, Field(default=False, description="Use same shift for x and y.")]

    def __init__(
        self,
        alpha: float = 3,
        sigma: float = 50,
        alpha_affine: None = None,
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ScalarType | list[ScalarType] | None = None,
        mask_value: ScalarType | list[ScalarType] | None = None,
        always_apply: bool | None = None,
        approximate: bool = False,
        same_dxdy: bool = False,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.alpha = alpha
        self.sigma = sigma
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value
        self.approximate = approximate
        self.same_dxdy = same_dxdy

    def apply(
        self,
        img: np.ndarray,
        random_seed: int,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.elastic_transform(
            img,
            self.alpha,
            self.sigma,
            interpolation,
            self.border_mode,
            self.value,
            np.random.RandomState(random_seed),
            self.approximate,
            self.same_dxdy,
        )

    def apply_to_mask(self, mask: np.ndarray, random_seed: int, **params: Any) -> np.ndarray:
        return fgeometric.elastic_transform(
            mask,
            self.alpha,
            self.sigma,
            cv2.INTER_NEAREST,
            self.border_mode,
            self.mask_value,
            np.random.RandomState(random_seed),
            self.approximate,
            self.same_dxdy,
        )

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        random_seed: int,
        **params: Any,
    ) -> BoxInternalType:
        rows, cols = params["rows"], params["cols"]
        mask = np.zeros((rows, cols), dtype=np.uint8)
        bbox_denorm = fgeometric.denormalize_bbox(bbox, rows, cols)
        x_min, y_min, x_max, y_max = bbox_denorm[:4]
        x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
        mask[y_min:y_max, x_min:x_max] = 1
        mask = fgeometric.elastic_transform(
            mask,
            self.alpha,
            self.sigma,
            cv2.INTER_NEAREST,
            self.border_mode,
            self.mask_value,
            np.random.RandomState(random_seed),
            self.approximate,
        )
        bbox_returned = bbox_from_mask(mask)
        return cast(BoxInternalType, fgeometric.normalize_bbox(bbox_returned, rows, cols))

    def get_params(self) -> dict[str, int]:
        return {"random_seed": random_utils.get_random_seed()}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "alpha",
            "sigma",
            "interpolation",
            "border_mode",
            "value",
            "mask_value",
            "approximate",
            "same_dxdy",
        )

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "bboxes": self.apply_to_bboxes,
        }

`apply (self, img, random_seed, interpolation, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    random_seed: int,
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.elastic_transform(
        img,
        self.alpha,
        self.sigma,
        interpolation,
        self.border_mode,
        self.value,
        np.random.RandomState(random_seed),
        self.approximate,
        self.same_dxdy,
    )

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_params(self) -> dict[str, int]:
    return {"random_seed": random_utils.get_random_seed()}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "alpha",
        "sigma",
        "interpolation",
        "border_mode",
        "value",
        "mask_value",
        "approximate",
        "same_dxdy",
    )

`class Flip` [view source on GitHub] ¶

Flip the input either horizontally, vertically or both horizontally and vertically.

Parameters:

Name	Type	Description
`p`	`float`	Probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/geometric/transforms.py

Python

class Flip(DualTransform):
    """Flip the input either horizontally, vertically or both horizontally and vertically.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(self, img: np.ndarray, d: int, **params: Any) -> np.ndarray:
        """Args:
        d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping,
                -1 for both vertical and horizontal flipping (which can also be seen as rotating the input by
                180 degrees).
        """
        return fgeometric.random_flip(img, d)

    def get_params(self) -> dict[str, int]:
        # Random int in the range [-1, 1]
        return {"d": random.randint(-1, 1)}

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return fgeometric.bbox_flip(bbox, params["d"], params["shape"][0], params["shape"][1])

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return fgeometric.keypoint_flip(keypoint, params["d"], params["shape"][0], params["shape"][1])

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()

`apply (self, img, d, **params)` ¶

`d` (int): Code that specifies how to flip the input: 0 for vertical flipping, 1 for horizontal flipping, -1 for both vertical and horizontal flipping (which can also be seen as rotating the input by 180 degrees).
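
The three flip codes can be illustrated with plain NumPy slicing. This is a sketch of the convention described above, not the library's `fgeometric.random_flip`; `flip_by_code` is a hypothetical helper.

```python
import random

import numpy as np


def flip_by_code(img: np.ndarray, d: int) -> np.ndarray:
    """Flip an (H, W) or (H, W, C) array according to the code `d`."""
    if d == 0:
        return img[::-1, ...]        # vertical flip: reverse rows
    if d == 1:
        return img[:, ::-1, ...]     # horizontal flip: reverse columns
    if d == -1:
        return img[::-1, ::-1, ...]  # both: same as a 180-degree rotation
    raise ValueError(f"unsupported flip code: {d}")


img = np.array([[1, 2], [3, 4]])
vertical = flip_by_code(img, 0)
horizontal = flip_by_code(img, 1)
both = flip_by_code(img, -1)

# random.randint includes both endpoints, so -1, 0, and 1 are equally likely.
d = random.randint(-1, 1)
```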

Source code in albumentations/augmentations/geometric/transforms.py

Python

def apply(self, img: np.ndarray, d: int, **params: Any) -> np.ndarray:
    """Args:
    d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping,
            -1 for both vertical and horizontal flipping (which can also be seen as rotating the input by
            180 degrees).
    """
    return fgeometric.random_flip(img, d)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_params(self) -> dict[str, int]:
    # Random int in the range [-1, 1]
    return {"d": random.randint(-1, 1)}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[()]:
    return ()

`class GridDistortion` `(num_steps=5, distort_limit=(-0.3, 0.3), interpolation=1, border_mode=4, value=None, mask_value=None, normalized=False, always_apply=None, p=0.5)` [view source on GitHub] ¶

Applies grid distortion augmentation to images, masks, and bounding boxes. This technique involves dividing the image into a grid of cells and randomly displacing the intersection points of the grid, resulting in localized distortions.

Parameters:

Name	Type	Description
`num_steps`	`int`	Number of grid cells on each side (minimum 1).
`distort_limit`	`float, (float, float)`	Range of distortion limits. If a single float is provided, the range will be (-distort_limit, distort_limit). Default: (-0.3, 0.3).
`interpolation`	`OpenCV flag`	Interpolation algorithm used for image transformation. Options are: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
`border_mode`	`OpenCV flag`	Pixel extrapolation method used when pixels outside the image are required. Options are: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101.
`value`	`int, float, list of ints, list of floats`	Value used for padding when border_mode is cv2.BORDER_CONSTANT.
`mask_value`	`int, float, list of ints, list of floats`	Padding value for masks when border_mode is cv2.BORDER_CONSTANT.
`normalized`	`bool`	If True, ensures that distortion does not exceed image boundaries. Default: False. Reference: https://github.com/albumentations-team/albumentations/pull/722

Targets

image, mask, bboxes

Image types: uint8, float32

Note

This transform is helpful in medical imagery, Optical Character Recognition, and other tasks where local distance may not be preserved.
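
How `num_steps` and `distort_limit` interact can be sketched as follows: one multiplicative step factor, centred on 1, is drawn per grid line along each axis. The snippet mirrors the sampling in `get_params_dependent_on_data` from the class source on this page, but `symmetric_range` and `sample_steps` are illustrative helpers rather than library functions, and the single-float handling is an assumption based on the parameter description.

```python
import random


def symmetric_range(limit):
    """Turn a single number into (-limit, limit); pass a pair through unchanged."""
    if isinstance(limit, (int, float)):
        return (-abs(limit), abs(limit))
    low, high = limit
    return (low, high)


def sample_steps(num_steps, distort_limit, rng):
    """Draw num_steps + 1 step factors, each equal to 1 plus a random offset."""
    low, high = symmetric_range(distort_limit)
    return [1 + rng.uniform(low, high) for _ in range(num_steps + 1)]


rng = random.Random(0)
stepsx = sample_steps(5, 0.3, rng)          # single float -> (-0.3, 0.3)
stepsy = sample_steps(5, (-0.3, 0.3), rng)  # explicit range
```

A factor above 1 stretches the corresponding grid cell and a factor below 1 compresses it, so larger limits give stronger local distortion.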

Source code in albumentations/augmentations/geometric/transforms.py

Python

class GridDistortion(DualTransform):
    """Applies grid distortion augmentation to images, masks, and bounding boxes. This technique involves dividing
    the image into a grid of cells and randomly displacing the intersection points of the grid,
    resulting in localized distortions.

    Args:
        num_steps (int): Number of grid cells on each side (minimum 1).
        distort_limit (float, (float, float)): Range of distortion limits. If a single float is provided,
            the range will be from (-distort_limit, distort_limit). Default: (-0.3, 0.3).
        interpolation (OpenCV flag): Interpolation algorithm used for image transformation. Options are:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): Pixel extrapolation method used when pixels outside the image are required.
            Options are: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP,
            cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101.
        value (int, float, list of ints, list of floats, optional): Value used for padding when
            border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float, list of ints, list of floats, optional): Padding value for masks when
            border_mode is cv2.BORDER_CONSTANT.
        normalized (bool): If True, ensures that distortion does not exceed image boundaries. Default: False.
            Reference: https://github.com/albumentations-team/albumentations/pull/722

    Targets:
        image, mask, bboxes

    Image types:
        uint8, float32

    Note:
        This transform is helpful in medical imagery, Optical Character Recognition, and other tasks where local
        distance may not be preserved.
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)

    class InitSchema(BaseTransformInitSchema):
        num_steps: Annotated[int, Field(ge=1, description="Count of grid cells on each side.")]
        distort_limit: SymmetricRangeType = (-0.3, 0.3)
        interpolation: InterpolationType = cv2.INTER_LINEAR
        border_mode: BorderModeType = cv2.BORDER_REFLECT_101
        value: ColorType | None = Field(
            default=None,
            description="Padding value if border_mode is cv2.BORDER_CONSTANT.",
        )
        mask_value: ColorType | None = Field(
            default=None,
            description="Padding value for mask if border_mode is cv2.BORDER_CONSTANT.",
        )
        normalized: bool = Field(
            default=False,
            description="If true, distortion will be normalized to not go outside the image.",
        )

        @field_validator("distort_limit")
        @classmethod
        def check_limits(cls, v: tuple[float, float], info: ValidationInfo) -> tuple[float, float]:
            bounds = -1, 1
            result = to_tuple(v)
            check_range(result, *bounds, info.field_name)
            return result

    def __init__(
        self,
        num_steps: int = 5,
        distort_limit: ScaleFloatType = (-0.3, 0.3),
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ColorType | None = None,
        mask_value: ColorType | None = None,
        normalized: bool = False,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)

        self.num_steps = num_steps
        self.distort_limit = cast(Tuple[float, float], distort_limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value
        self.normalized = normalized

    def apply(
        self,
        img: np.ndarray,
        stepsx: tuple[()],
        stepsy: tuple[()],
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.grid_distortion(
            img,
            self.num_steps,
            stepsx,
            stepsy,
            interpolation,
            self.border_mode,
            self.value,
        )

    def apply_to_mask(
        self,
        mask: np.ndarray,
        stepsx: tuple[()],
        stepsy: tuple[()],
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.grid_distortion(
            mask,
            self.num_steps,
            stepsx,
            stepsy,
            cv2.INTER_NEAREST,
            self.border_mode,
            self.mask_value,
        )

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        stepsx: tuple[()],
        stepsy: tuple[()],
        **params: Any,
    ) -> BoxInternalType:
        rows, cols = params["rows"], params["cols"]
        mask = np.zeros((rows, cols), dtype=np.uint8)
        bbox_denorm = fgeometric.denormalize_bbox(bbox, rows, cols)
        x_min, y_min, x_max, y_max = bbox_denorm[:4]
        x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
        mask[y_min:y_max, x_min:x_max] = 1
        mask = fgeometric.grid_distortion(
            mask,
            self.num_steps,
            stepsx,
            stepsy,
            cv2.INTER_NEAREST,
            self.border_mode,
            self.mask_value,
        )
        bbox_returned = bbox_from_mask(mask)
        return cast(BoxInternalType, fgeometric.normalize_bbox(bbox_returned, rows, cols))

    def _normalize(self, h: int, w: int, xsteps: list[float], ysteps: list[float]) -> dict[str, Any]:
        # compensate for smaller last steps in source image.
        x_step = w // self.num_steps
        last_x_step = min(w, ((self.num_steps + 1) * x_step)) - (self.num_steps * x_step)
        xsteps[-1] *= last_x_step / x_step

        y_step = h // self.num_steps
        last_y_step = min(h, ((self.num_steps + 1) * y_step)) - (self.num_steps * y_step)
        ysteps[-1] *= last_y_step / y_step

        # now normalize such that distortion never leaves image bounds.
        tx = w / math.floor(w / self.num_steps)
        ty = h / math.floor(h / self.num_steps)
        xsteps = np.array(xsteps) * (tx / np.sum(xsteps))
        ysteps = np.array(ysteps) * (ty / np.sum(ysteps))

        return {"stepsx": xsteps, "stepsy": ysteps}

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        stepsx = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]
        stepsy = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]

        if self.normalized:
            return self._normalize(height, width, stepsx, stepsy)

        return {"stepsx": stepsx, "stepsy": stepsy}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "num_steps", "distort_limit", "interpolation", "border_mode", "value", "mask_value", "normalized"

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "bboxes": self.apply_to_bboxes,
        }

`apply (self, img, stepsx, stepsy, interpolation, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    stepsx: tuple[()],
    stepsy: tuple[()],
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.grid_distortion(
        img,
        self.num_steps,
        stepsx,
        stepsy,
        interpolation,
        self.border_mode,
        self.value,
    )

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    stepsx = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]
    stepsy = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]

    if self.normalized:
        return self._normalize(height, width, stepsx, stepsy)

    return {"stepsx": stepsx, "stepsy": stepsy}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "num_steps", "distort_limit", "interpolation", "border_mode", "value", "mask_value", "normalized"

`class HorizontalFlip` [view source on GitHub] ¶

Flip the input horizontally around the y-axis.

Parameters:

Name	Type	Description
`p`	`float`	Probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
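
For bounding boxes, a horizontal flip only mirrors the x coordinates. Assuming boxes are stored as normalized `(x_min, y_min, x_max, y_max)` values in [0, 1] (an assumption of this sketch, and the helper name is hypothetical), the mapping looks like this:

```python
def hflip_bbox_normalized(bbox):
    """Mirror a normalized (x_min, y_min, x_max, y_max) box left to right."""
    x_min, y_min, x_max, y_max = bbox
    # Mirroring swaps the roles of the left and right edges.
    return (1 - x_max, y_min, 1 - x_min, y_max)


original = (0.1, 0.2, 0.4, 0.6)
flipped = hflip_bbox_normalized(original)
restored = hflip_bbox_normalized(flipped)
```

Flipping twice returns the original box (up to floating-point rounding), and the y coordinates are never touched.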

Source code in albumentations/augmentations/geometric/transforms.py

Python

class HorizontalFlip(DualTransform):
    """Flip the input horizontally around the y-axis.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        if get_num_channels(img) > 1 and img.dtype == np.uint8:
            # Opencv is faster than numpy only in case of
            # non-gray scale 8bits images
            return fgeometric.hflip_cv2(img)

        return fgeometric.hflip(img)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return fgeometric.bbox_hflip(bbox, params["shape"][0], params["shape"][1])

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return fgeometric.keypoint_hflip(keypoint, params["shape"][0], params["shape"][1])

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()

`apply (self, img, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    if get_num_channels(img) > 1 and img.dtype == np.uint8:
        # Opencv is faster than numpy only in case of
        # non-gray scale 8bits images
        return fgeometric.hflip_cv2(img)

    return fgeometric.hflip(img)

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[()]:
    return ()

`class OpticalDistortion` `(distort_limit=(-0.05, 0.05), shift_limit=(-0.05, 0.05), interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=None, p=0.5)` [view source on GitHub] ¶

Parameters:

Name	Type	Description
`distort_limit`	`float, (float, float)`	If distort_limit is a single float, the range will be (-distort_limit, distort_limit). Default: (-0.05, 0.05).
`shift_limit`	`float, (float, float)`	If shift_limit is a single float, the range will be (-shift_limit, shift_limit). Default: (-0.05, 0.05).
`interpolation`	`OpenCV flag`	flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
`border_mode`	`OpenCV flag`	flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101
`value`	`int, float, list of ints, list of float`	padding value if border_mode is cv2.BORDER_CONSTANT.
`mask_value`	`int, float, list of ints, list of float`	padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

Targets

image, mask, bboxes

Image types: uint8, float32
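
The rule that a single float expands to a symmetric range can be sketched as follows. This is an illustration of the documented behavior only, not the library's own conversion code; the helper name `to_symmetric_range_sketch` is hypothetical, and taking the absolute value (so the lower bound never exceeds the upper one) is an assumption of this sketch.

```python
import random


def to_symmetric_range_sketch(limit):
    """Expand a single number into (-limit, limit); pass 2-tuples through as floats."""
    if isinstance(limit, (int, float)):
        magnitude = abs(float(limit))  # assumption: keep low <= high for negative input
        return (-magnitude, magnitude)
    low, high = limit
    return (float(low), float(high))


rng = random.Random(0)
distort_range = to_symmetric_range_sketch(0.3)
k = rng.uniform(*distort_range)  # one distortion coefficient drawn from the range

print(distort_range)                             # (-0.3, 0.3)
print(to_symmetric_range_sketch((-0.05, 0.05)))  # (-0.05, 0.05)
```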

Source code in albumentations/augmentations/geometric/transforms.py

Python

class OpticalDistortion(DualTransform):
    """Args:
        distort_limit (float, (float, float)): If distort_limit is a single float, the range
            will be (-distort_limit, distort_limit). Default: (-0.05, 0.05).
        shift_limit (float, (float, float)): If shift_limit is a single float, the range
            will be (-shift_limit, shift_limit). Default: (-0.05, 0.05).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

    Targets:
        image, mask, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)

    class InitSchema(BaseTransformInitSchema):
        distort_limit: SymmetricRangeType = (-0.05, 0.05)
        shift_limit: SymmetricRangeType = (-0.05, 0.05)
        interpolation: InterpolationType = cv2.INTER_LINEAR
        border_mode: BorderModeType = cv2.BORDER_REFLECT_101
        value: ColorType | None = Field(
            default=None,
            description="Padding value if border_mode is cv2.BORDER_CONSTANT.",
        )
        mask_value: ColorType | None = Field(
            default=None,
            description="Padding value for mask if border_mode is cv2.BORDER_CONSTANT.",
        )

    def __init__(
        self,
        distort_limit: ScaleFloatType = (-0.05, 0.05),
        shift_limit: ScaleFloatType = (-0.05, 0.05),
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ColorType | None = None,
        mask_value: ColorType | None = None,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.shift_limit = cast(Tuple[float, float], shift_limit)
        self.distort_limit = cast(Tuple[float, float], distort_limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def apply(
        self,
        img: np.ndarray,
        k: int,
        dx: int,
        dy: int,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.optical_distortion(img, k, dx, dy, interpolation, self.border_mode, self.value)

    def apply_to_mask(self, mask: np.ndarray, k: int, dx: int, dy: int, **params: Any) -> np.ndarray:
        return fgeometric.optical_distortion(mask, k, dx, dy, cv2.INTER_NEAREST, self.border_mode, self.mask_value)

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        k: int,
        dx: int,
        dy: int,
        **params: Any,
    ) -> BoxInternalType:
        rows, cols = params["rows"], params["cols"]
        mask = np.zeros((rows, cols), dtype=np.uint8)
        bbox_denorm = fgeometric.denormalize_bbox(bbox, rows, cols)
        x_min, y_min, x_max, y_max = bbox_denorm[:4]
        x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
        mask[y_min:y_max, x_min:x_max] = 1
        mask = fgeometric.optical_distortion(mask, k, dx, dy, cv2.INTER_NEAREST, self.border_mode, self.mask_value)
        bbox_returned = bbox_from_mask(mask)
        return cast(BoxInternalType, fgeometric.normalize_bbox(bbox_returned, rows, cols))

    def get_params(self) -> dict[str, Any]:
        return {
            "k": random.uniform(self.distort_limit[0], self.distort_limit[1]),
            "dx": round(random.uniform(self.shift_limit[0], self.shift_limit[1])),
            "dy": round(random.uniform(self.shift_limit[0], self.shift_limit[1])),
        }

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "distort_limit",
            "shift_limit",
            "interpolation",
            "border_mode",
            "value",
            "mask_value",
        )

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "bboxes": self.apply_to_bboxes,
        }

`apply (self, img, k, dx, dy, interpolation, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    k: int,
    dx: int,
    dy: int,
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.optical_distortion(img, k, dx, dy, interpolation, self.border_mode, self.value)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_params(self) -> dict[str, Any]:
    return {
        "k": random.uniform(self.distort_limit[0], self.distort_limit[1]),
        "dx": round(random.uniform(self.shift_limit[0], self.shift_limit[1])),
        "dy": round(random.uniform(self.shift_limit[0], self.shift_limit[1])),
    }
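
One consequence of the listing as written is worth noting: `dx` and `dy` pass through `round()`, so with the default `shift_limit` of `(-0.05, 0.05)` every draw rounds to 0. The sketch below reproduces only that sampling arithmetic with the standard library. `sample_params_sketch` is a hypothetical name, not part of Albumentations, and the wider `(-5, 5)` range is an illustrative value rather than a recommendation.

```python
import random


def sample_params_sketch(distort_limit, shift_limit, rng):
    """Mirror the sampling arithmetic shown in the get_params listing."""
    return {
        "k": rng.uniform(distort_limit[0], distort_limit[1]),
        "dx": round(rng.uniform(shift_limit[0], shift_limit[1])),
        "dy": round(rng.uniform(shift_limit[0], shift_limit[1])),
    }


rng = random.Random(42)
defaults = [sample_params_sketch((-0.05, 0.05), (-0.05, 0.05), rng) for _ in range(1000)]
# Every value in [-0.05, 0.05] rounds to the integer 0.
print({(p["dx"], p["dy"]) for p in defaults})  # {(0, 0)}

# A wider shift range is needed before round() can yield a non-zero shift.
wide = [sample_params_sketch((-0.05, 0.05), (-5, 5), rng) for _ in range(1000)]
print(sorted({p["dx"] for p in wide}))
```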

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "distort_limit",
        "shift_limit",
        "interpolation",
        "border_mode",
        "value",
        "mask_value",
    )

`class PadIfNeeded` `(min_height=1024, min_width=1024, pad_height_divisor=None, pad_width_divisor=None, position=<PositionType.CENTER: 'center'>, border_mode=4, value=None, mask_value=None, always_apply=None, p=1.0)` [view source on GitHub] ¶

Pads the sides of an image if the image dimensions are less than the specified minimum dimensions. If the pad_height_divisor or pad_width_divisor is specified, the function additionally ensures that the image dimensions are divisible by these values.

Parameters:

Name	Type	Description
`min_height`	`int`	Minimum desired height of the image. Ensures image height is at least this value.
`min_width`	`int`	Minimum desired width of the image. Ensures image width is at least this value.
`pad_height_divisor`	`int`	If set, pads the image height to make it divisible by this value.
`pad_width_divisor`	`int`	If set, pads the image width to make it divisible by this value.
`position`	`Union[str, PositionType]`	Position where the image is to be placed after padding. Can be one of 'center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', or 'random'. Default is 'center'.
`border_mode`	`int`	Specifies the border mode to use if padding is required. The default is `cv2.BORDER_REFLECT_101`. If `value` is provided and `border_mode` is set to a mode that does not use a constant value, it should be manually set to `cv2.BORDER_CONSTANT`.
`value`	`Union[int, float, list[int], list[float]]`	Value to fill the border pixels if the border mode is `cv2.BORDER_CONSTANT`. Default is None.
`mask_value`	`Union[int, float, list[int], list[float]]`	Similar to `value` but used for padding masks. Default is None.
`p`	`float`	Probability of applying the transform. Default is 1.0.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
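
The per-axis padding arithmetic described above can be sketched in plain Python. This is a simplified illustration of the centered case under stated assumptions, not the library implementation: `pad_amounts_sketch` is a hypothetical helper that handles one axis at a time and accepts either a minimum size or a divisor, never both.

```python
def pad_amounts_sketch(size, min_size=None, divisor=None):
    """Return (pad_before, pad_after) for one axis, split as evenly as possible."""
    # Exactly one of the two constraints may be set per axis.
    if (min_size is None) == (divisor is None):
        raise ValueError("Set exactly one of min_size and divisor")
    if min_size is not None:
        total = max(min_size - size, 0)  # pad only when the axis is too short
    else:
        remainder = size % divisor
        total = divisor - remainder if remainder > 0 else 0
    before = total // 2
    return before, total - before  # any odd pixel goes to the trailing side


print(pad_amounts_sketch(101, min_size=128))  # (13, 14)
print(pad_amounts_sketch(70, divisor=32))     # (13, 13)
print(pad_amounts_sketch(200, min_size=128))  # (0, 0)
```

Because the constructor defaults are `min_height=1024` and `min_width=1024`, and the `InitSchema` validator in the class source rejects an axis with both or neither constraint set, using a divisor appears to require passing `min_height=None` or `min_width=None` explicitly.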

Source code in albumentations/augmentations/geometric/transforms.py

Python

class PadIfNeeded(DualTransform):
    """Pads the sides of an image if the image dimensions are less than the specified minimum dimensions.
    If the `pad_height_divisor` or `pad_width_divisor` is specified, the function additionally ensures
    that the image dimensions are divisible by these values.

    Args:
        min_height (int): Minimum desired height of the image. Ensures image height is at least this value.
        min_width (int): Minimum desired width of the image. Ensures image width is at least this value.
        pad_height_divisor (int, optional): If set, pads the image height to make it divisible by this value.
        pad_width_divisor (int, optional): If set, pads the image width to make it divisible by this value.
        position (Union[str, PositionType]): Position where the image is to be placed after padding.
            Can be one of 'center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', or 'random'.
            Default is 'center'.
        border_mode (int): Specifies the border mode to use if padding is required.
            The default is `cv2.BORDER_REFLECT_101`. If `value` is provided and `border_mode` is set to a mode
            that does not use a constant value, it should be manually set to `cv2.BORDER_CONSTANT`.
        value (Union[int, float, list[int], list[float]], optional): Value to fill the border pixels if
            the border mode is `cv2.BORDER_CONSTANT`. Default is None.
        mask_value (Union[int, float, list[int], list[float]], optional): Similar to `value` but used for padding masks.
            Default is None.
        p (float): Probability of applying the transform. Default is 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    class PositionType(Enum):
        """Enumerates the types of positions for placing an object within a container.

        This Enum class is utilized to define specific anchor positions that an object can
        assume relative to a container. It's particularly useful in image processing, UI layout,
        and graphic design to specify the alignment and positioning of elements.

        Attributes:
            CENTER (str): Specifies that the object should be placed at the center.
            TOP_LEFT (str): Specifies that the object should be placed at the top-left corner.
            TOP_RIGHT (str): Specifies that the object should be placed at the top-right corner.
            BOTTOM_LEFT (str): Specifies that the object should be placed at the bottom-left corner.
            BOTTOM_RIGHT (str): Specifies that the object should be placed at the bottom-right corner.
            RANDOM (str): Indicates that the object's position should be determined randomly.

        """

        CENTER = "center"
        TOP_LEFT = "top_left"
        TOP_RIGHT = "top_right"
        BOTTOM_LEFT = "bottom_left"
        BOTTOM_RIGHT = "bottom_right"
        RANDOM = "random"

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        min_height: int | None = Field(default=None, ge=1, description="Minimal result image height.")
        min_width: int | None = Field(default=None, ge=1, description="Minimal result image width.")
        pad_height_divisor: int | None = Field(
            default=None,
            ge=1,
            description="Ensures image height is divisible by this value.",
        )
        pad_width_divisor: int | None = Field(
            default=None,
            ge=1,
            description="Ensures image width is divisible by this value.",
        )
        position: str = Field(default="center", description="Position of the padded image.")
        border_mode: BorderModeType = cv2.BORDER_REFLECT_101
        value: ColorType | None = Field(default=None, description="Value for border if BORDER_CONSTANT is used.")
        mask_value: ColorType | None = Field(
            default=None,
            description="Value for mask border if BORDER_CONSTANT is used.",
        )
        p: ProbabilityType = 1.0

        @model_validator(mode="after")
        def validate_divisibility(self) -> Self:
            if (self.min_height is None) == (self.pad_height_divisor is None):
                msg = "Only one of 'min_height' and 'pad_height_divisor' parameters must be set"
                raise ValueError(msg)
            if (self.min_width is None) == (self.pad_width_divisor is None):
                msg = "Only one of 'min_width' and 'pad_width_divisor' parameters must be set"
                raise ValueError(msg)

            if self.value is not None and self.border_mode in {cv2.BORDER_REFLECT_101, cv2.BORDER_REFLECT101}:
                self.border_mode = cv2.BORDER_CONSTANT

            if self.border_mode == cv2.BORDER_CONSTANT and self.value is None:
                msg = "If 'border_mode' is set to 'BORDER_CONSTANT', 'value' must be provided."
                raise ValueError(msg)

            return self

    def __init__(
        self,
        min_height: int | None = 1024,
        min_width: int | None = 1024,
        pad_height_divisor: int | None = None,
        pad_width_divisor: int | None = None,
        position: PositionType | str = PositionType.CENTER,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ColorType | None = None,
        mask_value: ColorType | None = None,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p, always_apply)
        self.min_height = min_height
        self.min_width = min_width
        self.pad_width_divisor = pad_width_divisor
        self.pad_height_divisor = pad_height_divisor
        self.position = PadIfNeeded.PositionType(position)
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def update_params(self, params: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
        params = super().update_params(params, **kwargs)
        rows = params["rows"]
        cols = params["cols"]

        if self.min_height is not None:
            if rows < self.min_height:
                h_pad_top = int((self.min_height - rows) / 2.0)
                h_pad_bottom = self.min_height - rows - h_pad_top
            else:
                h_pad_top = 0
                h_pad_bottom = 0
        else:
            pad_remained = rows % self.pad_height_divisor
            pad_rows = self.pad_height_divisor - pad_remained if pad_remained > 0 else 0

            h_pad_top = pad_rows // 2
            h_pad_bottom = pad_rows - h_pad_top

        if self.min_width is not None:
            if cols < self.min_width:
                w_pad_left = int((self.min_width - cols) / 2.0)
                w_pad_right = self.min_width - cols - w_pad_left
            else:
                w_pad_left = 0
                w_pad_right = 0
        else:
            pad_remainder = cols % self.pad_width_divisor
            pad_cols = self.pad_width_divisor - pad_remainder if pad_remainder > 0 else 0

            w_pad_left = pad_cols // 2
            w_pad_right = pad_cols - w_pad_left

        h_pad_top, h_pad_bottom, w_pad_left, w_pad_right = self.__update_position_params(
            h_top=h_pad_top,
            h_bottom=h_pad_bottom,
            w_left=w_pad_left,
            w_right=w_pad_right,
        )

        params.update(
            {
                "pad_top": h_pad_top,
                "pad_bottom": h_pad_bottom,
                "pad_left": w_pad_left,
                "pad_right": w_pad_right,
            },
        )
        return params

    def apply(
        self,
        img: np.ndarray,
        pad_top: int,
        pad_bottom: int,
        pad_left: int,
        pad_right: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.pad_with_params(
            img,
            pad_top,
            pad_bottom,
            pad_left,
            pad_right,
            border_mode=self.border_mode,
            value=self.value,
        )

    def apply_to_mask(
        self,
        mask: np.ndarray,
        pad_top: int,
        pad_bottom: int,
        pad_left: int,
        pad_right: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.pad_with_params(
            mask,
            pad_top,
            pad_bottom,
            pad_left,
            pad_right,
            border_mode=self.border_mode,
            value=self.mask_value,
        )

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        pad_top: int,
        pad_bottom: int,
        pad_left: int,
        pad_right: int,
        rows: int,
        cols: int,
        **params: Any,
    ) -> BoxInternalType:
        x_min, y_min, x_max, y_max = denormalize_bbox(bbox, rows, cols)[:4]
        bbox = x_min + pad_left, y_min + pad_top, x_max + pad_left, y_max + pad_top
        return cast(BoxInternalType, normalize_bbox(bbox, rows + pad_top + pad_bottom, cols + pad_left + pad_right))

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        pad_top: int,
        pad_bottom: int,
        pad_left: int,
        pad_right: int,
        **params: Any,
    ) -> KeypointInternalType:
        x, y, angle, scale = keypoint[:4]
        return x + pad_left, y + pad_top, angle, scale

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "min_height",
            "min_width",
            "pad_height_divisor",
            "pad_width_divisor",
            "position",
            "border_mode",
            "value",
            "mask_value",
        )

    def __update_position_params(
        self,
        h_top: int,
        h_bottom: int,
        w_left: int,
        w_right: int,
    ) -> tuple[int, int, int, int]:
        if self.position == PadIfNeeded.PositionType.TOP_LEFT:
            h_bottom += h_top
            w_right += w_left
            h_top = 0
            w_left = 0

        elif self.position == PadIfNeeded.PositionType.TOP_RIGHT:
            h_bottom += h_top
            w_left += w_right
            h_top = 0
            w_right = 0

        elif self.position == PadIfNeeded.PositionType.BOTTOM_LEFT:
            h_top += h_bottom
            w_right += w_left
            h_bottom = 0
            w_left = 0

        elif self.position == PadIfNeeded.PositionType.BOTTOM_RIGHT:
            h_top += h_bottom
            w_left += w_right
            h_bottom = 0
            w_right = 0

        elif self.position == PadIfNeeded.PositionType.RANDOM:
            h_pad = h_top + h_bottom
            w_pad = w_left + w_right
            h_top = random.randint(0, h_pad)
            h_bottom = h_pad - h_top
            w_left = random.randint(0, w_pad)
            w_right = w_pad - w_left

        return h_top, h_bottom, w_left, w_right

`class PositionType` ¶

Enumerates the types of positions for placing an object within a container.

This Enum class is utilized to define specific anchor positions that an object can assume relative to a container. It's particularly useful in image processing, UI layout, and graphic design to specify the alignment and positioning of elements.

Attributes:

Name	Type	Description
`CENTER`	`str`	Specifies that the object should be placed at the center.
`TOP_LEFT`	`str`	Specifies that the object should be placed at the top-left corner.
`TOP_RIGHT`	`str`	Specifies that the object should be placed at the top-right corner.
`BOTTOM_LEFT`	`str`	Specifies that the object should be placed at the bottom-left corner.
`BOTTOM_RIGHT`	`str`	Specifies that the object should be placed at the bottom-right corner.
`RANDOM`	`str`	Indicates that the object's position should be determined randomly.
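
How each position value redistributes the padding can be sketched as follows. This is a compact restatement of the branch logic in the class source under stated assumptions, not the library code: `reposition_padding_sketch` is a hypothetical helper, and it takes the position as a plain string rather than a `PositionType` member.

```python
import random


def reposition_padding_sketch(position, top, bottom, left, right, rng=random):
    """Redistribute centered padding so the image sits at the requested anchor."""
    height_pad, width_pad = top + bottom, left + right
    if position == "center":
        return top, bottom, left, right
    if position == "random":
        top = rng.randint(0, height_pad)
        left = rng.randint(0, width_pad)
        return top, height_pad - top, left, width_pad - left
    vertical, horizontal = position.split("_")  # e.g. "top_left"
    # An image anchored at the top gets all vertical padding below it, and so on.
    top, bottom = (0, height_pad) if vertical == "top" else (height_pad, 0)
    left, right = (0, width_pad) if horizontal == "left" else (width_pad, 0)
    return top, bottom, left, right


print(reposition_padding_sketch("top_left", 13, 14, 5, 5))      # (0, 27, 0, 10)
print(reposition_padding_sketch("bottom_right", 13, 14, 5, 5))  # (27, 0, 10, 0)
```

In every case the total padding per axis is unchanged; only its split between the two sides differs.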

Source code in albumentations/augmentations/geometric/transforms.py

Python

class PositionType(Enum):
    """Enumerates the types of positions for placing an object within a container.

    This Enum class is utilized to define specific anchor positions that an object can
    assume relative to a container. It's particularly useful in image processing, UI layout,
    and graphic design to specify the alignment and positioning of elements.

    Attributes:
        CENTER (str): Specifies that the object should be placed at the center.
        TOP_LEFT (str): Specifies that the object should be placed at the top-left corner.
        TOP_RIGHT (str): Specifies that the object should be placed at the top-right corner.
        BOTTOM_LEFT (str): Specifies that the object should be placed at the bottom-left corner.
        BOTTOM_RIGHT (str): Specifies that the object should be placed at the bottom-right corner.
        RANDOM (str): Indicates that the object's position should be determined randomly.

    """

    CENTER = "center"
    TOP_LEFT = "top_left"
    TOP_RIGHT = "top_right"
    BOTTOM_LEFT = "bottom_left"
    BOTTOM_RIGHT = "bottom_right"
    RANDOM = "random"

`apply (self, img, pad_top, pad_bottom, pad_left, pad_right, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    pad_top: int,
    pad_bottom: int,
    pad_left: int,
    pad_right: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.pad_with_params(
        img,
        pad_top,
        pad_bottom,
        pad_left,
        pad_right,
        border_mode=self.border_mode,
        value=self.value,
    )

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "min_height",
        "min_width",
        "pad_height_divisor",
        "pad_width_divisor",
        "position",
        "border_mode",
        "value",
        "mask_value",
    )

`update_params (self, params, **kwargs)` ¶

Update parameters with transform-specific params. This method is deprecated; use `get_params` for transform-specific params such as interpolation, and `update_params_shape` for data such as shape.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def update_params(self, params: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
    params = super().update_params(params, **kwargs)
    rows = params["rows"]
    cols = params["cols"]

    if self.min_height is not None:
        if rows < self.min_height:
            h_pad_top = int((self.min_height - rows) / 2.0)
            h_pad_bottom = self.min_height - rows - h_pad_top
        else:
            h_pad_top = 0
            h_pad_bottom = 0
    else:
        pad_remained = rows % self.pad_height_divisor
        pad_rows = self.pad_height_divisor - pad_remained if pad_remained > 0 else 0

        h_pad_top = pad_rows // 2
        h_pad_bottom = pad_rows - h_pad_top

    if self.min_width is not None:
        if cols < self.min_width:
            w_pad_left = int((self.min_width - cols) / 2.0)
            w_pad_right = self.min_width - cols - w_pad_left
        else:
            w_pad_left = 0
            w_pad_right = 0
    else:
        pad_remainder = cols % self.pad_width_divisor
        pad_cols = self.pad_width_divisor - pad_remainder if pad_remainder > 0 else 0

        w_pad_left = pad_cols // 2
        w_pad_right = pad_cols - w_pad_left

    h_pad_top, h_pad_bottom, w_pad_left, w_pad_right = self.__update_position_params(
        h_top=h_pad_top,
        h_bottom=h_pad_bottom,
        w_left=w_pad_left,
        w_right=w_pad_right,
    )

    params.update(
        {
            "pad_top": h_pad_top,
            "pad_bottom": h_pad_bottom,
            "pad_left": w_pad_left,
            "pad_right": w_pad_right,
        },
    )
    return params
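
The pad amounts computed here also determine how annotations move. As a rough illustration of the box arithmetic in `apply_to_bbox` from the class source, the sketch below denormalizes a box, shifts it by the left and top padding, and renormalizes against the padded size. `pad_bbox_sketch` is a hypothetical helper, and it assumes boxes in normalized `(x_min, y_min, x_max, y_max)` form.

```python
def pad_bbox_sketch(bbox, rows, cols, pad_top, pad_bottom, pad_left, pad_right):
    """Shift a normalized (x_min, y_min, x_max, y_max) box into the padded image frame."""
    x_min, y_min, x_max, y_max = bbox
    # Denormalize to pixels in the original image.
    x_min, x_max = x_min * cols, x_max * cols
    y_min, y_max = y_min * rows, y_max * rows
    # Padding on the left/top moves the box; padding on the right/bottom does not.
    x_min, x_max = x_min + pad_left, x_max + pad_left
    y_min, y_max = y_min + pad_top, y_max + pad_top
    # Renormalize against the padded size.
    new_rows = rows + pad_top + pad_bottom
    new_cols = cols + pad_left + pad_right
    return (x_min / new_cols, y_min / new_rows, x_max / new_cols, y_max / new_rows)


# A 100x100 image padded to 200x200 with all horizontal padding on the right.
padded_box = pad_bbox_sketch((0.0, 0.0, 0.5, 0.5), 100, 100, 50, 50, 0, 100)
print(padded_box)  # (0.0, 0.25, 0.25, 0.5)
```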

`class Perspective` `(scale=(0.05, 0.1), keep_size=True, pad_mode=0, pad_val=0, mask_pad_val=0, fit_output=False, interpolation=1, always_apply=None, p=0.5)` [view source on GitHub] ¶

Perform a random four point perspective transform of the input.

Parameters:

Name	Type	Description
`scale`	`ScaleFloatType`	standard deviation of the normal distributions. These are used to sample the random distances of the subimage's corners from the full image's corners. If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1).
`keep_size`	`bool`	Whether to resize images back to their original size after applying the perspective transform. If set to False, the resulting images may end up having different shapes and will always be a list, never an array. Default: True
`pad_mode`	`OpenCV flag`	OpenCV border mode.
`pad_val`	`int, float, list of int, list of float`	padding value if pad_mode is cv2.BORDER_CONSTANT. Default: 0
`mask_pad_val`	`int, float, list of int, list of float`	padding value for mask if pad_mode is cv2.BORDER_CONSTANT. Default: 0
`fit_output`	`bool`	If True, the image plane size and position will be adjusted to still capture the whole image after perspective transformation. (Followed by image resizing if keep_size is set to True.) Otherwise, parts of the transformed image may be outside of the image plane. This setting should not be set to True when using large scale values as it could lead to very large images. Default: False
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image, mask, keypoints, bboxes

Image types: uint8, float32
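
To make the role of `scale` concrete, the sketch below samples four jittered corners following the description above: one standard deviation is drawn from the `scale` range, each corner is displaced by the absolute value of a normal sample, and the displacement is wrapped below 0.32 of the image size. This is a simplified standard-library illustration, not the library code: it uses `random.gauss` in place of the library's NumPy-based sampling, stops before point ordering and the homography step, and `sample_corners_sketch` is a hypothetical name.

```python
import random


def sample_corners_sketch(width, height, scale_range=(0.05, 0.1), rng=random):
    """Sample four jittered corner points (tl, tr, br, bl) in pixel coordinates."""
    scale = rng.uniform(*scale_range)
    # Absolute normal jitter, wrapped into [0, 0.32) of the image size.
    jitter = [[abs(rng.gauss(0, scale)) % 0.32 for _ in range(2)] for _ in range(4)]
    tl = (jitter[0][0], jitter[0][1])
    tr = (1.0 - jitter[1][0], jitter[1][1])
    br = (1.0 - jitter[2][0], 1.0 - jitter[2][1])
    bl = (jitter[3][0], 1.0 - jitter[3][1])
    # Scale the unit-square corners to pixel coordinates.
    return [(x * width, y * height) for x, y in (tl, tr, br, bl)]


corners = sample_corners_sketch(200, 100, rng=random.Random(0))
for name, point in zip(("tl", "tr", "br", "bl"), corners):
    print(name, point)
```

Larger `scale` values push the sampled corners further from the image corners, which produces a stronger perspective effect.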

Source code in albumentations/augmentations/geometric/transforms.py

Python

class Perspective(DualTransform):
    """Perform a random four point perspective transform of the input.

    Args:
        scale: standard deviation of the normal distributions. These are used to sample
            the random distances of the subimage's corners from the full image's corners.
            If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1).
        keep_size: Whether to resize image back to their original size after applying the perspective
            transform. If set to False, the resulting images may end up having different shapes
            and will always be a list, never an array. Default: True
        pad_mode (OpenCV flag): OpenCV border mode.
        pad_val (int, float, list of int, list of float): padding value if pad_mode is cv2.BORDER_CONSTANT.
            Default: 0
        mask_pad_val (int, float, list of int, list of float): padding value for mask
            if pad_mode is cv2.BORDER_CONSTANT. Default: 0
        fit_output (bool): If True, the image plane size and position will be adjusted to still capture
            the whole image after perspective transformation. (Followed by image resizing if keep_size is set to True.)
            Otherwise, parts of the transformed image may be outside of the image plane.
            This setting should not be set to True when using large scale values as it could lead to very large images.
            Default: False
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)

    class InitSchema(BaseTransformInitSchema):
        scale: NonNegativeFloatRangeType = (0.05, 0.1)
        keep_size: Annotated[bool, Field(default=True, description="Keep size after transform.")]
        pad_mode: BorderModeType = cv2.BORDER_CONSTANT
        pad_val: ColorType | None = Field(
            default=0,
            description="Padding value if border_mode is cv2.BORDER_CONSTANT.",
        )
        mask_pad_val: ColorType | None = Field(
            default=0,
            description="Mask padding value if border_mode is cv2.BORDER_CONSTANT.",
        )
        fit_output: Annotated[bool, Field(default=False, description="Adjust image plane to capture whole image.")]
        interpolation: InterpolationType = cv2.INTER_LINEAR

    def __init__(
        self,
        scale: ScaleFloatType = (0.05, 0.1),
        keep_size: bool = True,
        pad_mode: int = cv2.BORDER_CONSTANT,
        pad_val: ColorType = 0,
        mask_pad_val: ColorType = 0,
        fit_output: bool = False,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.scale = cast(Tuple[float, float], scale)
        self.keep_size = keep_size
        self.pad_mode = pad_mode
        self.pad_val = pad_val
        self.mask_pad_val = mask_pad_val
        self.fit_output = fit_output
        self.interpolation = interpolation

    def apply(
        self,
        img: np.ndarray,
        matrix: np.ndarray,
        max_height: int,
        max_width: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.perspective(
            img,
            matrix,
            max_width,
            max_height,
            self.pad_val,
            self.pad_mode,
            self.keep_size,
            params["interpolation"],
        )

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        matrix: np.ndarray,
        max_height: int,
        max_width: int,
        **params: Any,
    ) -> BoxInternalType:
        return fgeometric.perspective_bbox(
            bbox,
            params["rows"],
            params["cols"],
            matrix,
            max_width,
            max_height,
            self.keep_size,
        )

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        matrix: np.ndarray,
        max_height: int,
        max_width: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.perspective_keypoint(
            keypoint,
            params["rows"],
            params["cols"],
            matrix,
            max_width,
            max_height,
            self.keep_size,
        )

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        scale = random.uniform(*self.scale)
        points = random_utils.normal(0, scale, [4, 2])
        points = np.mod(np.abs(points), 0.32)

        # top left -- no changes needed, just use jitter
        # top right
        points[1, 0] = 1.0 - points[1, 0]  # w = 1.0 - jitter
        # bottom right
        points[2] = 1.0 - points[2]  # w, h = 1.0 - jitter
        # bottom left
        points[3, 1] = 1.0 - points[3, 1]  # h = 1.0 - jitter

        points[:, 0] *= width
        points[:, 1] *= height

        # Obtain a consistent order of the points and unpack them individually.
        # Warning: don't just do (tl, tr, br, bl) = _order_points(...)
        # here, because the reordered points are used further below.
        points = self._order_points(points)
        tl, tr, br, bl = points

        # compute the width of the new image, which will be the
        # maximum distance between bottom-right and bottom-left
        # x-coordinates or the top-right and top-left x-coordinates
        min_width = None
        max_width = None
        while min_width is None or min_width < TWO:
            width_top = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
            width_bottom = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
            max_width = int(max(width_top, width_bottom))
            min_width = int(min(width_top, width_bottom))
            if min_width < TWO:
                step_size = (2 - min_width) / 2
                tl[0] -= step_size
                tr[0] += step_size
                bl[0] -= step_size
                br[0] += step_size

        # compute the height of the new image, which will be the maximum distance between the top-right
        # and bottom-right y-coordinates or the top-left and bottom-left y-coordinates
        min_height = None
        max_height = None
        while min_height is None or min_height < TWO:
            height_right = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
            height_left = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
            max_height = int(max(height_right, height_left))
            min_height = int(min(height_right, height_left))
            if min_height < TWO:
                step_size = (2 - min_height) / 2
                tl[1] -= step_size
                tr[1] -= step_size
                bl[1] += step_size
                br[1] += step_size

        # now that we have the dimensions of the new image, construct
        # the set of destination points to obtain a "birds eye view",
        # (i.e. top-down view) of the image, again specifying points
        # in the top-left, top-right, bottom-right, and bottom-left order
        # do not use width-1 or height-1 here, as for e.g. width=3, height=2
        # the bottom right coordinate is at (3.0, 2.0) and not (2.0, 1.0)
        dst = np.array([[0, 0], [max_width, 0], [max_width, max_height], [0, max_height]], dtype=np.float32)

        # compute the perspective transform matrix and then apply it
        m = cv2.getPerspectiveTransform(points, dst)

        if self.fit_output:
            m, max_width, max_height = self._expand_transform(m, (height, width))

        return {"matrix": m, "max_height": max_height, "max_width": max_width, "interpolation": self.interpolation}

    @classmethod
    def _expand_transform(cls, matrix: np.ndarray, shape: SizeType) -> tuple[np.ndarray, int, int]:
        height, width = shape[:2]
        # do not use width-1 or height-1 here, as for e.g. width=3, height=2
        # the bottom right coordinate is at (3.0, 2.0) and not (2.0, 1.0)
        rect = np.array([[0, 0], [width, 0], [width, height], [0, height]], dtype=np.float32)
        dst = cv2.perspectiveTransform(np.array([rect]), matrix)[0]

        # get min x, y over transformed 4 points
        # then modify target points by subtracting these minima  => shift to (0, 0)
        dst -= dst.min(axis=0, keepdims=True)
        dst = np.around(dst, decimals=0)

        matrix_expanded = cv2.getPerspectiveTransform(rect, dst)
        max_width, max_height = dst.max(axis=0)
        return matrix_expanded, int(max_width), int(max_height)

    @staticmethod
    def _order_points(pts: np.ndarray) -> np.ndarray:
        pts = np.array(sorted(pts, key=lambda x: x[0]))
        left = pts[:2]  # points with smallest x coordinate - left points
        right = pts[2:]  # points with greatest x coordinate - right points

        if left[0][1] < left[1][1]:
            tl, bl = left
        else:
            bl, tl = left

        if right[0][1] < right[1][1]:
            tr, br = right
        else:
            br, tr = right

        return np.array([tl, tr, br, bl], dtype=np.float32)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "scale", "keep_size", "pad_mode", "pad_val", "mask_pad_val", "fit_output", "interpolation"
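The corner ordering performed by `_order_points` in the listing above can be reproduced without NumPy. The following is a minimal sketch, not the library's code: `order_points` is a hypothetical stand-alone name, and it works on plain `(x, y)` tuples rather than arrays.

```python
def order_points(pts):
    """Return four (x, y) points ordered as [tl, tr, br, bl]."""
    # Sort by x: the first two points form the left side, the last two the right side.
    by_x = sorted(pts, key=lambda p: p[0])
    left, right = by_x[:2], by_x[2:]

    # On each side, the point with the smaller y is the top one (image y grows downward).
    tl, bl = (left[0], left[1]) if left[0][1] < left[1][1] else (left[1], left[0])
    tr, br = (right[0], right[1]) if right[0][1] < right[1][1] else (right[1], right[0])
    return [tl, tr, br, bl]


corners = order_points([(10, 9), (0, 0), (0, 10), (9, 1)])
print(corners)
```

The fixed top-left, top-right, bottom-right, bottom-left order matters because the source points are later paired index by index with the destination rectangle.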

`apply (self, img, matrix, max_height, max_width, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    matrix: np.ndarray,
    max_height: int,
    max_width: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.perspective(
        img,
        matrix,
        max_width,
        max_height,
        self.pad_val,
        self.pad_mode,
        self.keep_size,
        params["interpolation"],
    )

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    scale = random.uniform(*self.scale)
    points = random_utils.normal(0, scale, [4, 2])
    points = np.mod(np.abs(points), 0.32)

    # top left -- no changes needed, just use jitter
    # top right
    points[1, 0] = 1.0 - points[1, 0]  # w = 1.0 - jitter
    # bottom right
    points[2] = 1.0 - points[2]  # w, h = 1.0 - jitter
    # bottom left
    points[3, 1] = 1.0 - points[3, 1]  # h = 1.0 - jitter

    points[:, 0] *= width
    points[:, 1] *= height

    # Obtain a consistent order of the points and unpack them individually.
    # Warning: don't just do (tl, tr, br, bl) = _order_points(...)
    # here, because the reordered points are used further below.
    points = self._order_points(points)
    tl, tr, br, bl = points

    # compute the width of the new image, which will be the
    # maximum distance between bottom-right and bottom-left
    # x-coordinates or the top-right and top-left x-coordinates
    min_width = None
    max_width = None
    while min_width is None or min_width < TWO:
        width_top = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
        width_bottom = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
        max_width = int(max(width_top, width_bottom))
        min_width = int(min(width_top, width_bottom))
        if min_width < TWO:
            step_size = (2 - min_width) / 2
            tl[0] -= step_size
            tr[0] += step_size
            bl[0] -= step_size
            br[0] += step_size

    # compute the height of the new image, which will be the maximum distance between the top-right
    # and bottom-right y-coordinates or the top-left and bottom-left y-coordinates
    min_height = None
    max_height = None
    while min_height is None or min_height < TWO:
        height_right = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
        height_left = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
        max_height = int(max(height_right, height_left))
        min_height = int(min(height_right, height_left))
        if min_height < TWO:
            step_size = (2 - min_height) / 2
            tl[1] -= step_size
            tr[1] -= step_size
            bl[1] += step_size
            br[1] += step_size

    # now that we have the dimensions of the new image, construct
    # the set of destination points to obtain a "birds eye view",
    # (i.e. top-down view) of the image, again specifying points
    # in the top-left, top-right, bottom-right, and bottom-left order
    # do not use width-1 or height-1 here, as for e.g. width=3, height=2
    # the bottom right coordinate is at (3.0, 2.0) and not (2.0, 1.0)
    dst = np.array([[0, 0], [max_width, 0], [max_width, max_height], [0, max_height]], dtype=np.float32)

    # compute the perspective transform matrix and then apply it
    m = cv2.getPerspectiveTransform(points, dst)

    if self.fit_output:
        m, max_width, max_height = self._expand_transform(m, (height, width))

    return {"matrix": m, "max_height": max_height, "max_width": max_width, "interpolation": self.interpolation}
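The first half of the method above (corner jitter, then output size) can be illustrated with the standard library alone. This is a rough sketch under stated assumptions: `jittered_corners` and `output_size` are hypothetical helper names, `random.gauss` stands in for `random_utils.normal`, and the loops that widen degenerate quadrilaterals until each side is at least 2 pixels are left out.

```python
import math
import random


def jittered_corners(width, height, scale, rng=None, max_jitter=0.32):
    """Sample four source corners (tl, tr, br, bl) in pixel coordinates."""
    rng = rng or random.Random()
    # One (x, y) offset per corner, folded into [0, max_jitter) as a fraction of the image.
    j = [[abs(rng.gauss(0.0, scale)) % max_jitter for _ in range(2)] for _ in range(4)]
    relative = [
        (j[0][0], j[0][1]),              # top left: offset from (0, 0)
        (1.0 - j[1][0], j[1][1]),        # top right: offset from (1, 0)
        (1.0 - j[2][0], 1.0 - j[2][1]),  # bottom right: offset from (1, 1)
        (j[3][0], 1.0 - j[3][1]),        # bottom left: offset from (0, 1)
    ]
    return [(x * width, y * height) for x, y in relative]


def output_size(corners):
    """Output width/height: the longer of each pair of opposite sides, truncated to int."""
    tl, tr, br, bl = corners
    max_width = int(max(math.dist(tr, tl), math.dist(br, bl)))
    max_height = int(max(math.dist(tr, br), math.dist(tl, bl)))
    return max_width, max_height


corners = jittered_corners(width=100, height=50, scale=0.1, rng=random.Random(0))
print(corners, output_size(corners))
```

Because every offset is folded below `0.32`, each sampled corner stays within roughly the nearest third of the image along each axis.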

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "scale", "keep_size", "pad_mode", "pad_val", "mask_pad_val", "fit_output", "interpolation"

`class PiecewiseAffine` `(scale=(0.03, 0.05), nb_rows=4, nb_cols=4, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode='constant', absolute_scale=False, always_apply=None, keypoints_threshold=0.01, p=0.5)` ¶

Apply affine transformations that differ between local neighborhoods. This augmentation places a regular grid of points on an image and randomly moves the neighborhood of these points around via affine transformations. This leads to local distortions.

This is mostly a wrapper around scikit-image's PiecewiseAffine. See also Affine for a similar technique.

Note

This augmenter is very slow. Try to use ElasticTransform instead, which is at least 10x faster.

Note

For coordinate-based inputs (keypoints, bounding boxes, polygons, ...), this augmenter still has to perform an image-based augmentation, which makes it significantly slower than other transforms and not fully correct for such inputs.

Parameters:

Name	Type	Description
`scale`	`float, tuple of float`	Each point on the regular grid is moved around via a normal distribution. This scale factor is equivalent to the normal distribution's sigma. Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of the image if `absolute_scale=False` (default), so this scale can be the same for different sized images. Recommended values are in the range `0.01` to `0.05` (weak to strong augmentations). * If a single `float`, then that value will always be used as the scale. * If a tuple `(a, b)` of `float`s, then a random value will be uniformly sampled per image from the interval `[a, b]`.
`nb_rows`	`int, tuple of int`	Number of rows of points that the regular grid should have. Must be at least `2`. For large images, you might want to pick a higher value than `4`. You might have to then adjust scale to lower values. * If a single `int`, then that value will always be used as the number of rows. * If a tuple `(a, b)`, then a value from the discrete interval `[a..b]` will be uniformly sampled per image.
`nb_cols`	`int, tuple of int`	Number of columns. Analogous to `nb_rows`.
`interpolation`	`int`	The order of interpolation. The order has to be in the range 0-5: - 0: Nearest-neighbor - 1: Bi-linear (default) - 2: Bi-quadratic - 3: Bi-cubic - 4: Bi-quartic - 5: Bi-quintic
`mask_interpolation`	`int`	Same as `interpolation`, but for masks.
`cval`	`number`	The constant value to use when filling in newly created pixels.
`cval_mask`	`number`	Same as cval but only for masks.
`mode`	`str`	{'constant', 'edge', 'symmetric', 'reflect', 'wrap'}, optional Points outside the boundaries of the input are filled according to the given mode. Modes match the behaviour of `numpy.pad`.
`absolute_scale`	`bool`	Take `scale` as an absolute value rather than a relative value.
`keypoints_threshold`	`float`	Used as threshold in conversion from distance maps to keypoints. The search for keypoints works by searching for the argmin (non-inverted) or argmax (inverted) in each channel. This parameter contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit as a keypoint. Use `None` to use no min/max. Default: 0.01

Targets

image, mask, keypoints, bboxes

Image types: uint8, float32
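To make the roles of `scale`, `nb_rows`, `nb_cols` and `absolute_scale` concrete, here is a small standard-library sketch of how a regular grid of control points could be jittered. It is an illustration under assumptions, not the library code: `jittered_grid` is a hypothetical helper, `random.gauss` replaces the NumPy sampler, and the final piecewise-affine estimation (done by scikit-image in the real transform) is omitted.

```python
import random


def jittered_grid(height, width, nb_rows, nb_cols, scale, absolute_scale=False, rng=None):
    """Return (src, dst) lists of (y, x) control points; nb_rows and nb_cols must be >= 2."""
    rng = rng or random.Random()
    # Evenly spaced grid lines from 0 to height / width, both ends included.
    ys = [height * i / (nb_rows - 1) for i in range(nb_rows)]
    xs = [width * i / (nb_cols - 1) for i in range(nb_cols)]

    src, dst = [], []
    for y in ys:
        for x in xs:
            dy, dx = rng.gauss(0.0, scale), rng.gauss(0.0, scale)
            if not absolute_scale:
                # Relative scale: sigma is a fraction of the image size.
                dy, dx = dy * height, dx * width
            src.append((y, x))
            # Keep destination points inside the image plane.
            dst.append((min(max(y + dy, 0.0), height - 1), min(max(x + dx, 0.0), width - 1)))
    return src, dst


src, dst = jittered_grid(100, 200, nb_rows=4, nb_cols=4, scale=0.03, rng=random.Random(0))
print(len(src), src[0], src[-1])
```

With `absolute_scale=False`, a `scale` of `0.03` moves points by about 3% of the image size on average, which is why the same value can be reused across image resolutions.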

Source code in albumentations/augmentations/geometric/transforms.py

Python

class PiecewiseAffine(DualTransform):
    """Apply affine transformations that differ between local neighborhoods.
    This augmentation places a regular grid of points on an image and randomly moves the neighborhood of these points
    around via affine transformations. This leads to local distortions.

    This is mostly a wrapper around scikit-image's ``PiecewiseAffine``.
    See also ``Affine`` for a similar technique.

    Note:
        This augmenter is very slow. Try to use ``ElasticTransform`` instead, which is at least 10x faster.

    Note:
        For coordinate-based inputs (keypoints, bounding boxes, polygons, ...),
        this augmenter still has to perform an image-based augmentation,
        which makes it significantly slower than other transforms and not fully correct for such inputs.

    Args:
        scale (float, tuple of float): Each point on the regular grid is moved around via a normal distribution.
            This scale factor is equivalent to the normal distribution's sigma.
            Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of
            the image if ``absolute_scale=False`` (default), so this scale can be the same for different sized images.
            Recommended values are in the range ``0.01`` to ``0.05`` (weak to strong augmentations).
                * If a single ``float``, then that value will always be used as the scale.
                * If a tuple ``(a, b)`` of ``float`` s, then a random value will
                  be uniformly sampled per image from the interval ``[a, b]``.
        nb_rows (int, tuple of int): Number of rows of points that the regular grid should have.
            Must be at least ``2``. For large images, you might want to pick a higher value than ``4``.
            You might have to then adjust scale to lower values.
                * If a single ``int``, then that value will always be used as the number of rows.
                * If a tuple ``(a, b)``, then a value from the discrete interval
                  ``[a..b]`` will be uniformly sampled per image.
        nb_cols (int, tuple of int): Number of columns. Analogous to `nb_rows`.
        interpolation (int): The order of interpolation. The order has to be in the range 0-5:
             - 0: Nearest-neighbor
             - 1: Bi-linear (default)
             - 2: Bi-quadratic
             - 3: Bi-cubic
             - 4: Bi-quartic
             - 5: Bi-quintic
        mask_interpolation (int): same as interpolation but for mask.
        cval (number): The constant value to use when filling in newly created pixels.
        cval_mask (number): Same as cval but only for masks.
        mode (str): {'constant', 'edge', 'symmetric', 'reflect', 'wrap'}, optional
            Points outside the boundaries of the input are filled according
            to the given mode.  Modes match the behaviour of `numpy.pad`.
        absolute_scale (bool): Take `scale` as an absolute value rather than a relative value.
        keypoints_threshold (float): Used as threshold in conversion from distance maps to keypoints.
            The search for keypoints works by searching for the
            argmin (non-inverted) or argmax (inverted) in each channel. This
            parameter contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit
            as a keypoint. Use ``None`` to use no min/max. Default: 0.01

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        scale: NonNegativeFloatRangeType = (0.03, 0.05)
        nb_rows: ScaleIntType = Field(default=4, description="Number of rows in the regular grid.")
        nb_cols: ScaleIntType = Field(default=4, description="Number of columns in the regular grid.")
        interpolation: InterpolationType = cv2.INTER_LINEAR
        mask_interpolation: InterpolationType = cv2.INTER_NEAREST
        cval: int = Field(default=0, description="Constant value used for newly created pixels.")
        cval_mask: int = Field(default=0, description="Constant value used for newly created mask pixels.")
        mode: Literal["constant", "edge", "symmetric", "reflect", "wrap"] = "constant"
        absolute_scale: bool = Field(
            default=False,
            description="Whether scale is an absolute value rather than relative.",
        )
        keypoints_threshold: float = Field(
            default=0.01,
            description="Threshold for conversion from distance maps to keypoints.",
        )

        @field_validator("nb_rows", "nb_cols")
        @classmethod
        def process_range(cls, value: ScaleFloatType, info: ValidationInfo) -> tuple[float, float]:
            bounds = 2, BIG_INTEGER
            result = to_tuple(value, value)
            check_range(result, *bounds, info.field_name)
            return result

    def __init__(
        self,
        scale: ScaleFloatType = (0.03, 0.05),
        nb_rows: ScaleIntType = 4,
        nb_cols: ScaleIntType = 4,
        interpolation: int = cv2.INTER_LINEAR,
        mask_interpolation: int = cv2.INTER_NEAREST,
        cval: int = 0,
        cval_mask: int = 0,
        mode: Literal["constant", "edge", "symmetric", "reflect", "wrap"] = "constant",
        absolute_scale: bool = False,
        always_apply: bool | None = None,
        keypoints_threshold: float = 0.01,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)

        warn(
            "This augmenter is very slow. Try to use ``ElasticTransformation`` instead, which is at least 10x faster.",
            stacklevel=2,
        )

        self.scale = cast(Tuple[float, float], scale)
        self.nb_rows = cast(Tuple[int, int], nb_rows)
        self.nb_cols = cast(Tuple[int, int], nb_cols)
        self.interpolation = interpolation
        self.mask_interpolation = mask_interpolation
        self.cval = cval
        self.cval_mask = cval_mask
        self.mode = mode
        self.absolute_scale = absolute_scale
        self.keypoints_threshold = keypoints_threshold

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "scale",
            "nb_rows",
            "nb_cols",
            "interpolation",
            "mask_interpolation",
            "cval",
            "cval_mask",
            "mode",
            "absolute_scale",
            "keypoints_threshold",
        )

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        nb_rows = np.clip(random.randint(*self.nb_rows), 2, None)
        nb_cols = np.clip(random.randint(*self.nb_cols), 2, None)
        nb_cells = nb_cols * nb_rows
        scale = random.uniform(*self.scale)

        jitter: np.ndarray = random_utils.normal(0, scale, (nb_cells, 2))
        if not np.any(jitter > 0):
            for _ in range(10):  # See: https://github.com/albumentations-team/albumentations/issues/1442
                jitter = random_utils.normal(0, scale, (nb_cells, 2))
                if np.any(jitter > 0):
                    break
            if not np.any(jitter > 0):
                return {"matrix": None}

        y = np.linspace(0, height, nb_rows)
        x = np.linspace(0, width, nb_cols)

        # (H, W) and (H, W) for H=rows, W=cols
        xx_src, yy_src = np.meshgrid(x, y)

        # (1, HW, 2) => (HW, 2) for H=rows, W=cols
        points_src = np.dstack([yy_src.flat, xx_src.flat])[0]

        if self.absolute_scale:
            jitter[:, 0] = jitter[:, 0] / height if height > 0 else 0.0
            jitter[:, 1] = jitter[:, 1] / width if width > 0 else 0.0

        jitter[:, 0] = jitter[:, 0] * height
        jitter[:, 1] = jitter[:, 1] * width

        points_dest = np.copy(points_src)
        points_dest[:, 0] = points_dest[:, 0] + jitter[:, 0]
        points_dest[:, 1] = points_dest[:, 1] + jitter[:, 1]

        # Restrict all destination points to be inside the image plane.
        # This is necessary, as otherwise keypoints could be augmented
        # outside of the image plane and these would be replaced by
        # (-1, -1), which would not conform with the behaviour of the other augmenters.
        points_dest[:, 0] = np.clip(points_dest[:, 0], 0, height - 1)
        points_dest[:, 1] = np.clip(points_dest[:, 1], 0, width - 1)

        matrix = skimage.transform.PiecewiseAffineTransform()
        matrix.estimate(points_src[:, ::-1], points_dest[:, ::-1])

        return {
            "matrix": matrix,
        }

    def apply(
        self,
        img: np.ndarray,
        matrix: skimage.transform.PiecewiseAffineTransform,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.piecewise_affine(img, matrix, self.interpolation, self.mode, self.cval)

    def apply_to_mask(
        self,
        mask: np.ndarray,
        matrix: skimage.transform.PiecewiseAffineTransform,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.piecewise_affine(mask, matrix, self.mask_interpolation, self.mode, self.cval_mask)

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        rows: int,
        cols: int,
        matrix: skimage.transform.PiecewiseAffineTransform,
        **params: Any,
    ) -> BoxInternalType:
        return fgeometric.bbox_piecewise_affine(bbox, matrix, rows, cols, self.keypoints_threshold)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        rows: int,
        cols: int,
        matrix: skimage.transform.PiecewiseAffineTransform,
        **params: Any,
    ) -> KeypointInternalType:
        return fgeometric.keypoint_piecewise_affine(keypoint, matrix, rows, cols, self.keypoints_threshold)

`apply (self, img, matrix, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    matrix: skimage.transform.PiecewiseAffineTransform,
    **params: Any,
) -> np.ndarray:
    return fgeometric.piecewise_affine(img, matrix, self.interpolation, self.mode, self.cval)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    nb_rows = np.clip(random.randint(*self.nb_rows), 2, None)
    nb_cols = np.clip(random.randint(*self.nb_cols), 2, None)
    nb_cells = nb_cols * nb_rows
    scale = random.uniform(*self.scale)

    jitter: np.ndarray = random_utils.normal(0, scale, (nb_cells, 2))
    if not np.any(jitter > 0):
        for _ in range(10):  # See: https://github.com/albumentations-team/albumentations/issues/1442
            jitter = random_utils.normal(0, scale, (nb_cells, 2))
            if np.any(jitter > 0):
                break
        if not np.any(jitter > 0):
            return {"matrix": None}

    y = np.linspace(0, height, nb_rows)
    x = np.linspace(0, width, nb_cols)

    # (H, W) and (H, W) for H=rows, W=cols
    xx_src, yy_src = np.meshgrid(x, y)

    # (1, HW, 2) => (HW, 2) for H=rows, W=cols
    points_src = np.dstack([yy_src.flat, xx_src.flat])[0]

    if self.absolute_scale:
        jitter[:, 0] = jitter[:, 0] / height if height > 0 else 0.0
        jitter[:, 1] = jitter[:, 1] / width if width > 0 else 0.0

    jitter[:, 0] = jitter[:, 0] * height
    jitter[:, 1] = jitter[:, 1] * width

    points_dest = np.copy(points_src)
    points_dest[:, 0] = points_dest[:, 0] + jitter[:, 0]
    points_dest[:, 1] = points_dest[:, 1] + jitter[:, 1]

    # Restrict all destination points to be inside the image plane.
    # This is necessary, as otherwise keypoints could be augmented
    # outside of the image plane and these would be replaced by
    # (-1, -1), which would not conform with the behaviour of the other augmenters.
    points_dest[:, 0] = np.clip(points_dest[:, 0], 0, height - 1)
    points_dest[:, 1] = np.clip(points_dest[:, 1], 0, width - 1)

    matrix = skimage.transform.PiecewiseAffineTransform()
    matrix.estimate(points_src[:, ::-1], points_dest[:, ::-1])

    return {
        "matrix": matrix,
    }
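The guard at the top of this method resamples the jitter a bounded number of times and gives up (returning a `None` matrix) if no positive offset ever appears. That pattern can be sketched generically as follows; `sample_with_retries` is a hypothetical helper written for illustration, with `random.gauss` standing in for the NumPy sampler.

```python
import random


def sample_with_retries(sample, is_usable, retries=10):
    """Draw once, then redraw up to `retries` times until `is_usable` holds; else None."""
    value = sample()
    for _ in range(retries):
        if is_usable(value):
            return value
        value = sample()
    return value if is_usable(value) else None


rng = random.Random(0)
jitter = sample_with_retries(
    sample=lambda: [rng.gauss(0.0, 0.04) for _ in range(8)],
    is_usable=lambda values: any(v > 0 for v in values),
)
print(jitter is not None)
```

Bounding the number of attempts keeps the transform from looping indefinitely when `scale` is zero, in which case no draw can contain a positive value.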

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "scale",
        "nb_rows",
        "nb_cols",
        "interpolation",
        "mask_interpolation",
        "cval",
        "cval_mask",
        "mode",
        "absolute_scale",
        "keypoints_threshold",
    )

`class ShiftScaleRotate` `(shift_limit=(-0.0625, 0.0625), scale_limit=(-0.1, 0.1), rotate_limit=(-45, 45), interpolation=1, border_mode=4, value=0, mask_value=0, shift_limit_x=None, shift_limit_y=None, rotate_method='largest_box', always_apply=None, p=0.5)` ¶

Randomly apply affine transforms: translate, scale and rotate the input.

Parameters:

Name	Type	Description
`shift_limit`	`(float, float) or float`	shift factor range for both height and width. If shift_limit is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and upper bounds should lie in the range [-1, 1]. Default: (-0.0625, 0.0625).
`scale_limit`	`(float, float) or float`	scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1. If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high). Default: (-0.1, 0.1).
`rotate_limit`	`(int, int) or int`	rotation range. If rotate_limit is a single int value, the range will be (-rotate_limit, rotate_limit). Default: (-45, 45).
`interpolation`	`OpenCV flag`	flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
`border_mode`	`OpenCV flag`	flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101
`value`	`int, float, list of int, list of float`	padding value if border_mode is cv2.BORDER_CONSTANT.
`mask_value`	`int, float, list of int, list of float`	padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
`shift_limit_x`	`(float, float) or float`	shift factor range for width. If it is set then this value instead of shift_limit will be used for shifting width. If shift_limit_x is a single float value, the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in the range [-1, 1]. Default: None.
`shift_limit_y`	`(float, float) or float`	shift factor range for height. If it is set then this value instead of shift_limit will be used for shifting height. If shift_limit_y is a single float value, the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie in the range [-1, 1]. Default: None.
`rotate_method`	`str`	rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse". Default: "largest_box"
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image, mask, keypoints, bboxes

Image types: uint8, float32
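The limit conventions in the table (a single value expands to a symmetric range, `scale_limit` is biased by 1, and `shift_limit_x` / `shift_limit_y` fall back to `shift_limit`) can be summarized in a short sketch. Both `to_range` and `affine_kwargs` are hypothetical helpers written for this illustration. They only build a dictionary that mirrors the keyword names this class forwards to `Affine`, and they do not import or call the library.

```python
def to_range(limit, bias=0.0):
    """Expand a scalar to (-limit, limit), keep a pair as (low, high), then add bias."""
    if isinstance(limit, (int, float)):
        low, high = -limit, limit
    else:
        low, high = min(limit), max(limit)
    return (low + bias, high + bias)


def affine_kwargs(shift_limit=(-0.0625, 0.0625), scale_limit=(-0.1, 0.1),
                  rotate_limit=(-45, 45), shift_limit_x=None, shift_limit_y=None):
    """Assumed mapping from ShiftScaleRotate-style limits to Affine-style arguments."""
    shift_x = shift_limit_x if shift_limit_x is not None else shift_limit
    shift_y = shift_limit_y if shift_limit_y is not None else shift_limit
    return {
        "scale": to_range(scale_limit, bias=1.0),  # (-0.1, 0.1) becomes (0.9, 1.1)
        "translate_percent": {"x": to_range(shift_x), "y": to_range(shift_y)},
        "rotate": to_range(rotate_limit),
        "shear": (0, 0),  # ShiftScaleRotate never shears
    }


print(affine_kwargs(shift_limit=0.05, scale_limit=0.2, rotate_limit=30))
```

Since the class emits a deprecation warning in favor of `Affine`, a mapping of this shape is a reasonable starting point when migrating existing pipelines, though the exact argument handling should be checked against the installed version.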

Source code in albumentations/augmentations/geometric/transforms.py

Python

class ShiftScaleRotate(Affine):
    """Randomly apply affine transforms: translate, scale and rotate the input.

    Args:
        shift_limit ((float, float) or float): shift factor range for both height and width. If shift_limit
            is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and
            upper bounds should lie in range [-1, 1]. Default: (-0.0625, 0.0625).
        scale_limit ((float, float) or float): scaling factor range. If scale_limit is a single float value, the
            range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1.
            If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high).
            Default: (-0.1, 0.1).
        rotate_limit ((int, int) or int): rotation range. If rotate_limit is a single int value, the
            range will be (-rotate_limit, rotate_limit). Default: (-45, 45).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of int,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
        shift_limit_x ((float, float) or float): shift factor range for width. If it is set then this value
            instead of shift_limit will be used for shifting width.  If shift_limit_x is a single float value,
            the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in
            the range [-1, 1]. Default: None.
        shift_limit_y ((float, float) or float): shift factor range for height. If it is set then this value
            instead of shift_limit will be used for shifting height.  If shift_limit_y is a single float value,
            the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie
            in the range [-1, 1]. Default: None.
        rotate_method (str): rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse".
            Default: "largest_box"
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)

    class InitSchema(BaseTransformInitSchema):
        shift_limit: SymmetricRangeType = (-0.0625, 0.0625)
        scale_limit: SymmetricRangeType = (-0.1, 0.1)
        rotate_limit: SymmetricRangeType = (-45, 45)
        interpolation: InterpolationType = cv2.INTER_LINEAR
        border_mode: BorderModeType = cv2.BORDER_REFLECT_101
        value: ColorType = 0
        mask_value: ColorType = 0
        shift_limit_x: ScaleFloatType | None = Field(default=None)
        shift_limit_y: ScaleFloatType | None = Field(default=None)
        rotate_method: Literal["largest_box", "ellipse"] = "largest_box"

        @model_validator(mode="after")
        def check_shift_limit(self) -> Self:
            bounds = -1, 1
            self.shift_limit_x = to_tuple(self.shift_limit_x if self.shift_limit_x is not None else self.shift_limit)
            check_range(self.shift_limit_x, *bounds, "shift_limit_x")
            self.shift_limit_y = to_tuple(self.shift_limit_y if self.shift_limit_y is not None else self.shift_limit)
            check_range(self.shift_limit_y, *bounds, "shift_limit_y")
            return self

        @field_validator("scale_limit")
        @classmethod
        def check_scale_limit(cls, value: ScaleFloatType, info: ValidationInfo) -> ScaleFloatType:
            bounds = 0, float("inf")
            result = to_tuple(value, bias=1.0)
            check_range(result, *bounds, str(info.field_name))
            return result

    def __init__(
        self,
        shift_limit: ScaleFloatType = (-0.0625, 0.0625),
        scale_limit: ScaleFloatType = (-0.1, 0.1),
        rotate_limit: ScaleFloatType = (-45, 45),
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ColorType = 0,
        mask_value: ColorType = 0,
        shift_limit_x: ScaleFloatType | None = None,
        shift_limit_y: ScaleFloatType | None = None,
        rotate_method: Literal["largest_box", "ellipse"] = "largest_box",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(
            scale=scale_limit,
            translate_percent={"x": shift_limit_x, "y": shift_limit_y},
            rotate=rotate_limit,
            shear=(0, 0),
            interpolation=interpolation,
            mask_interpolation=cv2.INTER_NEAREST,
            cval=value,
            cval_mask=mask_value,
            mode=border_mode,
            fit_output=False,
            keep_ratio=False,
            rotate_method=rotate_method,
            always_apply=always_apply,
            p=p,
        )
        warn(
            "ShiftScaleRotate is deprecated. Please use Affine transform instead .",
            DeprecationWarning,
            stacklevel=2,
        )
        self.shift_limit_x = cast(Tuple[float, float], shift_limit_x)
        self.shift_limit_y = cast(Tuple[float, float], shift_limit_y)
        self.scale_limit = cast(Tuple[float, float], scale_limit)
        self.rotate_limit = cast(Tuple[int, int], rotate_limit)
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def get_transform_init_args(self) -> dict[str, Any]:
        return {
            "shift_limit_x": self.shift_limit_x,
            "shift_limit_y": self.shift_limit_y,
            "scale_limit": to_tuple(self.scale_limit, bias=-1.0),
            "rotate_limit": self.rotate_limit,
            "interpolation": self.interpolation,
            "border_mode": self.border_mode,
            "value": self.value,
            "mask_value": self.mask_value,
            "rotate_method": self.rotate_method,
        }
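
The `scale_limit` handling in the source above is easy to misread: the validator appears to store the range shifted by +1, so it acts as a multiplicative factor, and `get_transform_init_args` shifts it back by -1. The following is a minimal standalone sketch of that round trip. It assumes `to_tuple` expands a scalar `v` to `(-v, v)` before adding the bias; `symmetric_range` is a hypothetical stand-in, not the library helper.

```python
from __future__ import annotations


def symmetric_range(value: float | tuple[float, float], bias: float = 0.0) -> tuple[float, float]:
    """Hypothetical stand-in for to_tuple: expand a scalar to (-v, v), then add bias."""
    low, high = (-value, value) if isinstance(value, (int, float)) else value
    return (low + bias, high + bias)


# What the validator stores: limits become multiplicative scale factors.
stored = symmetric_range((-0.1, 0.1), bias=1.0)  # roughly (0.9, 1.1)
# What serialization reports: the factors shifted back to limits.
restored = symmetric_range(stored, bias=-1.0)  # roughly (-0.1, 0.1)
# A scalar limit expands to the same symmetric range.
scalar_stored = symmetric_range(0.1, bias=1.0)
```

Under this reading, `scale_limit=0.1` and `scale_limit=(-0.1, 0.1)` both describe scale factors between about 0.9 and 1.1.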

`class Transpose` [view source on GitHub] ¶

Transpose the input by swapping rows and columns.

Parameters:

Name	Type	Description
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/geometric/transforms.py

Python

class Transpose(DualTransform):
    """Transpose the input by swapping rows and columns.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.transpose(img)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return fgeometric.bbox_transpose(bbox, params["shape"][0], params["shape"][1])

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return fgeometric.keypoint_transpose(keypoint, params["shape"][0], params["shape"][1])

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()

`apply (self, img, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return fgeometric.transpose(img)

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[()]:
    return ()
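
To illustrate what swapping rows and columns means for each target, here is a small NumPy sketch. It does not call Albumentations; the bbox is assumed to be in normalized `(x_min, y_min, x_max, y_max)` form and the keypoint is assumed to be an integer pixel position `(x, y)`.

```python
import numpy as np

# A 2x3 single-channel "image": 2 rows (height), 3 columns (width).
image = np.arange(6, dtype=np.uint8).reshape(2, 3)
transposed = image.T  # rows become columns; shape is now (3, 2)

# Normalized bbox (x_min, y_min, x_max, y_max) and a pixel keypoint (x, y).
bbox = (0.1, 0.2, 0.5, 0.9)
keypoint = (2, 1)

# Swapping rows and columns swaps the x and y coordinates.
bbox_t = (bbox[1], bbox[0], bbox[3], bbox[2])
keypoint_t = (keypoint[1], keypoint[0])

# The pixel under the keypoint is the same before and after the swap.
same_pixel = image[keypoint[1], keypoint[0]] == transposed[keypoint_t[1], keypoint_t[0]]
```

For a multi-channel image the equivalent NumPy call would be `np.transpose(image, (1, 0, 2))`, which leaves the channel axis in place.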

`class VerticalFlip` [view source on GitHub] ¶

Flip the input vertically around the x-axis.

Parameters:

Name	Type	Description
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Source code in albumentations/augmentations/geometric/transforms.py

Python

class VerticalFlip(DualTransform):
    """Flip the input vertically around the x-axis.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.vflip(img)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return fgeometric.bbox_vflip(bbox, params["shape"][0], params["shape"][1])

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return fgeometric.keypoint_vflip(keypoint, params["shape"][0], params["shape"][1])

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()

`apply (self, img, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return fgeometric.vflip(img)

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[()]:
    return ()
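
A vertical flip reverses the row order, so only y coordinates change. The NumPy sketch below shows the effect on each target under the same assumptions as elsewhere on this page (normalized `(x_min, y_min, x_max, y_max)` bboxes, integer pixel keypoints); it is an illustration, not the library's implementation.

```python
import numpy as np

image = np.arange(12, dtype=np.uint8).reshape(4, 3)  # 4 rows, 3 columns
flipped = image[::-1, ...]  # reverse the row order; columns are untouched

rows = image.shape[0]

# Normalized bbox: y is mirrored, so the old y_max becomes the new y_min.
x_min, y_min, x_max, y_max = (0.1, 0.2, 0.5, 0.6)
bbox_flipped = (x_min, 1 - y_max, x_max, 1 - y_min)

# Pixel keypoint (x, y): the row index is mirrored, the column index is kept.
x, y = (2, 0)
keypoint_flipped = (x, (rows - 1) - y)
```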

`mixing` `special` ¶

`transforms` ¶

`class MixUp` `(reference_data=None, read_fn=<lambda>, alpha=0.4, mix_coef_return_name='mix_coef', always_apply=None, p=0.5)` [view source on GitHub] ¶

Performs MixUp data augmentation, blending images, masks, and class labels with reference data.

MixUp augmentation linearly combines an input (image, mask, and class label) with another set from a predefined reference dataset. The mixing degree is controlled by a parameter λ (lambda), sampled from a Beta distribution. This method is known for improving model generalization by promoting linear behavior between classes and smoothing decision boundaries.
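
The linear combination described above can be sketched with plain NumPy: draw λ from Beta(α, α), then form λ·x + (1 − λ)·x_ref for the image and the label. This is a minimal illustration rather than the library's code path; `mixup_blend` and the sample arrays are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.4

# lambda ~ Beta(alpha, alpha): the weight given to the original sample.
lam = float(rng.beta(alpha, alpha))


def mixup_blend(x: np.ndarray, x_ref: np.ndarray, lam: float) -> np.ndarray:
    """Hypothetical helper: convex combination lam * x + (1 - lam) * x_ref."""
    return lam * x.astype(np.float32) + (1.0 - lam) * x_ref.astype(np.float32)


image = np.full((4, 4, 3), 200, dtype=np.uint8)
reference = np.full((4, 4, 3), 100, dtype=np.uint8)
label = np.array([0.0, 1.0, 0.0])
reference_label = np.array([1.0, 0.0, 0.0])

mixed_image = mixup_blend(image, reference, lam)  # values lie between 100 and 200
mixed_label = lam * label + (1.0 - lam) * reference_label  # soft label, sums to 1
```

A λ close to 1 keeps the result close to the original sample, while a λ close to 0 keeps it close to the reference.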

Reference

Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2018). mixup: Beyond Empirical Risk Minimization. In International Conference on Learning Representations. https://arxiv.org/abs/1710.09412

Parameters:

Name	Type	Description
`reference_data`	`Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]]`	A sequence or generator of dictionaries containing the reference data for mixing. If None or an empty sequence is provided, no operation is performed and a warning is issued.
`read_fn`	`Callable[[ReferenceImage], dict[str, Any]]`	A function to process items from reference_data. It should accept an item from reference_data and return a dictionary of processed data. The returned dictionary must include an 'image' key with a numpy array value and may also include 'mask' and 'global_label' keys, each with a numpy array value. The default in the `__init__` signature wraps each item as `{"image": x, "mask": None, "class_label": None}`, so pass `read_fn=lambda x: x` when items are already dictionaries of numpy arrays.
`mix_coef_return_name`	`str`	Name used for the applied mixing coefficient in the returned dictionary. Defaults to "mix_coef".
`alpha`	`float`	The alpha parameter for the Beta distribution, influencing the mix's balance. Must be ≥ 0; the `InitSchema` in the source additionally caps it at 1. Higher values lead to more uniform mixing. Defaults to 0.4.
`p`	`float`	The probability of applying the transformation. Defaults to 0.5.
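
To make the effect of `alpha` concrete, the sketch below draws mixing coefficients from Beta(α, α) using only the standard library and measures how far they sit from an even 0.5 mix on average. The helper name and the sample count are illustrative choices, not part of Albumentations.

```python
import random

random.seed(0)


def mean_distance_from_half(alpha: float, n: int = 10_000) -> float:
    """Average |lambda - 0.5| over n draws from Beta(alpha, alpha)."""
    return sum(abs(random.betavariate(alpha, alpha) - 0.5) for _ in range(n)) / n


# Small alpha: draws cluster near 0 or 1, so one sample dominates each mix.
spread_small = mean_distance_from_half(0.1)
# alpha = 1: Beta(1, 1) is uniform on [0, 1], so mixes are more even on average.
spread_uniform = mean_distance_from_half(1.0)
```

With the default `alpha=0.4`, most draws are still close to 0 or 1, so a typical output looks mostly like one of the two inputs.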

Targets

image, mask, global_label

Image types: uint8, float32

Exceptions:

Type	Description
`ValueError`	If the alpha parameter is negative.
`NotImplementedError`	If the transform is applied to bounding boxes or keypoints.

Notes

If no reference data is provided, a warning is issued, and the transform acts as a no-op.
If images are in float32 format, they should be within the [0, 1] range.

Example Usage:

import albumentations as A
import numpy as np
from albumentations.core.types import ReferenceImage

# Prepare reference data
# Note: This code generates random reference data for demonstration purposes only.
# In real-world applications, it's crucial to use meaningful and representative data.
# The quality and relevance of your input data significantly impact the effectiveness
# of the augmentation process. Ensure your data closely aligns with your specific
# use case and application requirements.
reference_data = [ReferenceImage(image=np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8),
                                 mask=np.random.randint(0, 4, (100, 100, 1), dtype=np.uint8),
                                 global_label=np.random.choice([0, 1], size=3)) for i in range(10)]

# In this example, the lambda function simply returns its input, which works well for
# data already in the expected format. For more complex scenarios, where the data might not be in
# the required format or additional processing is needed, a more sophisticated function can be implemented.
# Below is a hypothetical example where the input data is a file path, and the function reads the image
# file, converts it to a specific format, and possibly performs other preprocessing steps.

# Example of a more complex read_fn that reads an image from a file path, converts it to RGB, and resizes it.
# def custom_read_fn(file_path):
#     from PIL import Image
#     image = Image.open(file_path).convert('RGB')
#     image = image.resize((100, 100))  # Example resize, adjust as needed.
#     return {"image": np.array(image)}  # read_fn must return a dict with an 'image' key

aug = A.Compose([A.RandomRotate90(), A.MixUp(p=1, reference_data=reference_data, read_fn=lambda x: x)])

# For simplicity, the identity lambda is used in this example.
# Replace `lambda x: x` with `custom_read_fn` if you need to process the data more extensively.

# Apply augmentations
image = np.empty([100, 100, 3], dtype=np.uint8)
mask = np.empty([100, 100], dtype=np.uint8)
global_label = np.array([0, 1, 0])
data = aug(image=image, global_label=global_label, mask=mask)
transformed_image = data["image"]
transformed_mask = data["mask"]
transformed_global_label = data["global_label"]

# Print applied mix coefficient
print(data["mix_coef"])  # Output: e.g., 0.9991580344142427

Source code in albumentations/augmentations/mixing/transforms.py

Python

class MixUp(ReferenceBasedTransform):
    """Performs MixUp data augmentation, blending images, masks, and class labels with reference data.

    MixUp augmentation linearly combines an input (image, mask, and class label) with another set from a predefined
    reference dataset. The mixing degree is controlled by a parameter λ (lambda), sampled from a Beta distribution.
    This method is known for improving model generalization by promoting linear behavior between classes and
    smoothing decision boundaries.

    Reference:
        - Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2018). mixup: Beyond Empirical Risk Minimization.
        In International Conference on Learning Representations. https://arxiv.org/abs/1710.09412

    Args:
        reference_data (Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]]):
            A sequence or generator of dictionaries containing the reference data for mixing.
            If None or an empty sequence is provided, no operation is performed and a warning is issued.
        read_fn (Callable[[ReferenceImage], dict[str, Any]]):
            A function to process items from reference_data. It should accept items from reference_data
            and return a dictionary containing processed data:
                - The returned dictionary must include an 'image' key with a numpy array value.
                - It may also include 'mask', 'global_label' each associated with numpy array values.
            Defaults to a function that assumes input dictionary contains numpy arrays and directly returns it.
        mix_coef_return_name (str): Name used for the applied mixing coefficient in the returned dictionary.
            Defaults to "mix_coef".
        alpha (float):
            The alpha parameter for the Beta distribution, influencing the mix's balance. Must be ≥ 0
            (InitSchema additionally caps it at 1). Higher values lead to more uniform mixing. Defaults to 0.4.
        p (float):
            The probability of applying the transformation. Defaults to 0.5.

    Targets:
        image, mask, global_label

    Image types:
        uint8, float32

    Raises:
        - ValueError: If the alpha parameter is negative.
        - NotImplementedError: If the transform is applied to bounding boxes or keypoints.

    Notes:
        - If no reference data is provided, a warning is issued, and the transform acts as a no-op.
        - If images are in float32 format, they should be within the [0, 1] range.

    Example Usage:
        import albumentations as A
        import numpy as np
        from albumentations.core.types import ReferenceImage

        # Prepare reference data
        # Note: This code generates random reference data for demonstration purposes only.
        # In real-world applications, it's crucial to use meaningful and representative data.
        # The quality and relevance of your input data significantly impact the effectiveness
        # of the augmentation process. Ensure your data closely aligns with your specific
        # use case and application requirements.
        reference_data = [ReferenceImage(image=np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8),
                                         mask=np.random.randint(0, 4, (100, 100, 1), dtype=np.uint8),
                                         global_label=np.random.choice([0, 1], size=3)) for i in range(10)]

        # In this example, the lambda function simply returns its input, which works well for
        # data already in the expected format. For more complex scenarios, where the data might not be in
        # the required format or additional processing is needed, a more sophisticated function can be implemented.
        # Below is a hypothetical example where the input data is a file path, and the function reads the image
        # file, converts it to a specific format, and possibly performs other preprocessing steps.

        # Example of a more complex read_fn that reads an image from a file path, converts it to RGB, and resizes it.
        # def custom_read_fn(file_path):
        #     from PIL import Image
        #     image = Image.open(file_path).convert('RGB')
        #     image = image.resize((100, 100))  # Example resize, adjust as needed.
        #     return {"image": np.array(image)}  # read_fn must return a dict with an 'image' key

        aug = A.Compose([A.RandomRotate90(), A.MixUp(p=1, reference_data=reference_data, read_fn=lambda x: x)])

        # For simplicity, the identity lambda is used in this example.
        # Replace `lambda x: x` with `custom_read_fn` if you need to process the data more extensively.

        # Apply augmentations
        image = np.empty([100, 100, 3], dtype=np.uint8)
        mask = np.empty([100, 100], dtype=np.uint8)
        global_label = np.array([0, 1, 0])
        data = aug(image=image, global_label=global_label, mask=mask)
        transformed_image = data["image"]
        transformed_mask = data["mask"]
        transformed_global_label = data["global_label"]

        # Print applied mix coefficient
        print(data["mix_coef"])  # Output: e.g., 0.9991580344142427
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.GLOBAL_LABEL)

    class InitSchema(BaseTransformInitSchema):
        reference_data: Generator[Any, None, None] | Sequence[Any] | None = None
        read_fn: Callable[[ReferenceImage], Any]
        alpha: Annotated[float, Field(default=0.4, ge=0, le=1)]
        mix_coef_return_name: str = "mix_coef"

    def __init__(
        self,
        reference_data: Generator[Any, None, None] | Sequence[Any] | None = None,
        read_fn: Callable[[ReferenceImage], Any] = lambda x: {"image": x, "mask": None, "class_label": None},
        alpha: float = 0.4,
        mix_coef_return_name: str = "mix_coef",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.mix_coef_return_name = mix_coef_return_name

        self.read_fn = read_fn
        self.alpha = alpha

        if reference_data is None:
            warn("No reference data provided for MixUp. This transform will act as a no-op.", stacklevel=2)
            # Create an empty generator
            self.reference_data: list[Any] = []
        elif (
            isinstance(reference_data, types.GeneratorType)
            or isinstance(reference_data, Iterable)
            and not isinstance(reference_data, str)
        ):
            self.reference_data = reference_data  # type: ignore[assignment]
        else:
            msg = "reference_data must be a list, tuple, generator, or None."
            raise TypeError(msg)

    def apply(self, img: np.ndarray, mix_data: ReferenceImage, mix_coef: float, **params: Any) -> np.ndarray:
        if not mix_data:
            return img

        mix_img = mix_data["image"]

        if img.shape != mix_img.shape and not is_grayscale_image(img):
            msg = "The shape of the reference image should be the same as the input image."
            raise ValueError(msg)

        return add_weighted(img, mix_coef, mix_img.reshape(img.shape), 1 - mix_coef) if mix_img is not None else img

    def apply_to_mask(self, mask: np.ndarray, mix_data: ReferenceImage, mix_coef: float, **params: Any) -> np.ndarray:
        mix_mask = mix_data.get("mask")
        return (
            add_weighted(mask, mix_coef, mix_mask.reshape(mask.shape), 1 - mix_coef) if mix_mask is not None else mask
        )

    def apply_to_global_label(
        self,
        label: np.ndarray,
        mix_data: ReferenceImage,
        mix_coef: float,
        **params: Any,
    ) -> np.ndarray:
        mix_label = mix_data.get("global_label")
        if mix_label is not None and label is not None:
            return mix_coef * label + (1 - mix_coef) * mix_label
        return label

    def apply_to_bboxes(self, bboxes: Sequence[BoxType], mix_data: ReferenceImage, **params: Any) -> Sequence[BoxType]:
        msg = "MixUp does not support bounding boxes yet, feel free to submit pull request to https://github.com/albumentations-team/albumentations/."
        raise NotImplementedError(msg)

    def apply_to_keypoints(
        self,
        keypoints: Sequence[KeypointType],
        *args: Any,
        **params: Any,
    ) -> Sequence[KeypointType]:
        msg = "MixUp does not support keypoints yet, feel free to submit pull request to https://github.com/albumentations-team/albumentations/."
        raise NotImplementedError(msg)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "reference_data", "alpha"

    def get_params(self) -> dict[str, None | float | dict[str, Any]]:
        mix_data = None
        # Check if reference_data is not empty and is a sequence (list, tuple, np.array)
        if isinstance(self.reference_data, Sequence) and not isinstance(self.reference_data, (str, bytes)):
            if len(self.reference_data) > 0:  # Additional check to ensure it's not empty
                mix_idx = random.randint(0, len(self.reference_data) - 1)
                mix_data = self.reference_data[mix_idx]
        # Check if reference_data is an iterator or generator
        elif isinstance(self.reference_data, Iterator):
            try:
                mix_data = next(self.reference_data)  # Attempt to get the next item
            except StopIteration:
                warn(
                    "Reference data iterator/generator has been exhausted. "
                    "Further mixing augmentations will not be applied.",
                    RuntimeWarning,
                    stacklevel=2,
                )
                return {"mix_data": {}, "mix_coef": 1}

        # If mix_data is None or empty after the above checks, return default values
        if mix_data is None:
            return {"mix_data": {}, "mix_coef": 1}

        # If mix_data is not None, calculate mix_coef and apply read_fn
        mix_coef = beta(self.alpha, self.alpha)  # Assuming beta is defined elsewhere
        return {"mix_data": self.read_fn(mix_data), "mix_coef": mix_coef}

    def apply_with_params(self, params: dict[str, Any], *args: Any, **kwargs: Any) -> dict[str, Any]:
        res = super().apply_with_params(params, *args, **kwargs)
        if self.mix_coef_return_name:
            res[self.mix_coef_return_name] = params["mix_coef"]
        return res

`apply (self, img, mix_data, mix_coef, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/mixing/transforms.py

Python

def apply(self, img: np.ndarray, mix_data: ReferenceImage, mix_coef: float, **params: Any) -> np.ndarray:
    if not mix_data:
        return img

    mix_img = mix_data["image"]

    if img.shape != mix_img.shape and not is_grayscale_image(img):
        msg = "The shape of the reference image should be the same as the input image."
        raise ValueError(msg)

    return add_weighted(img, mix_coef, mix_img.reshape(img.shape), 1 - mix_coef) if mix_img is not None else img

`apply_with_params (self, params, *args, **kwargs)` ¶

Apply transforms with parameters.

Source code in albumentations/augmentations/mixing/transforms.py

Python

def apply_with_params(self, params: dict[str, Any], *args: Any, **kwargs: Any) -> dict[str, Any]:
    res = super().apply_with_params(params, *args, **kwargs)
    if self.mix_coef_return_name:
        res[self.mix_coef_return_name] = params["mix_coef"]
    return res

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/mixing/transforms.py

Python

def get_params(self) -> dict[str, None | float | dict[str, Any]]:
    mix_data = None
    # Check if reference_data is not empty and is a sequence (list, tuple, np.array)
    if isinstance(self.reference_data, Sequence) and not isinstance(self.reference_data, (str, bytes)):
        if len(self.reference_data) > 0:  # Additional check to ensure it's not empty
            mix_idx = random.randint(0, len(self.reference_data) - 1)
            mix_data = self.reference_data[mix_idx]
    # Check if reference_data is an iterator or generator
    elif isinstance(self.reference_data, Iterator):
        try:
            mix_data = next(self.reference_data)  # Attempt to get the next item
        except StopIteration:
            warn(
                "Reference data iterator/generator has been exhausted. "
                "Further mixing augmentations will not be applied.",
                RuntimeWarning,
                stacklevel=2,
            )
            return {"mix_data": {}, "mix_coef": 1}

    # If mix_data is None or empty after the above checks, return default values
    if mix_data is None:
        return {"mix_data": {}, "mix_coef": 1}

    # If mix_data is not None, calculate mix_coef and apply read_fn
    mix_coef = beta(self.alpha, self.alpha)  # Assuming beta is defined elsewhere
    return {"mix_data": self.read_fn(mix_data), "mix_coef": mix_coef}
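
The two branches of `get_params` behave differently over time: a sequence is sampled at random on every call, while an iterator or generator is consumed and eventually runs out. The standalone sketch below mirrors only that selection logic; `pick_reference` is a hypothetical helper and the string "images" are placeholders.

```python
import random
from collections.abc import Iterator, Sequence
from typing import Any


def pick_reference(reference_data: Any) -> Any:
    """Hypothetical helper mirroring the selection branches of get_params."""
    if isinstance(reference_data, Sequence) and not isinstance(reference_data, (str, bytes)):
        # Sequences are sampled at random, with replacement, and never run out.
        return random.choice(reference_data) if len(reference_data) > 0 else None
    if isinstance(reference_data, Iterator):
        # Iterators and generators yield one item per call until exhausted.
        return next(reference_data, None)
    return None


as_list = [{"image": "a"}, {"image": "b"}]
as_generator = (item for item in as_list)

list_picks = [pick_reference(as_list) for _ in range(5)]  # five items, with repeats
generator_picks = [pick_reference(as_generator) for _ in range(5)]  # two items, then None
```

Once a generator is exhausted, the source above returns `mix_coef` 1 with empty `mix_data`, so later calls leave the input unchanged. A list, or a generator that never terminates, avoids that silent fallback.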

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/mixing/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "reference_data", "alpha"

`class OverlayElements` `(metadata_key='overlay_metadata', p=0.5, always_apply=None)` [view source on GitHub] ¶

Apply overlay elements such as images and masks onto an input image. This transformation can be used to add various objects (e.g., stickers, logos) to images with optional masks and bounding boxes for better placement control.

Parameters:

Name	Type	Description
`metadata_key`	`str`	Additional target key for metadata. Default `overlay_metadata`.
`p`	`float`	Probability of applying the transformation. Default: 0.5.

Possible Metadata Fields:

- image (np.ndarray): The overlay image to be applied. This is a required field.
- bbox (list[float]): The bounding box specifying the region where the overlay should be applied. It should contain four floats: [x_min, y_min, x_max, y_max]. If label_id is provided, it should be appended as the fifth element in the bbox. The bbox should be in Albumentations format, which is the same as normalized Pascal VOC format: [x_min / width, y_min / height, x_max / width, y_max / height].
- mask (np.ndarray): An optional mask that defines the non-rectangular region of the overlay image. If not provided, the entire overlay image is used.
- mask_id (int): An optional identifier for the mask. If provided, the regions specified by the mask will be labeled with this identifier in the output mask.
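
As a worked example of how a normalized bbox turns into a placement, the sketch below follows the same arithmetic as `preprocess_metadata` in the source: scale by the image size, truncate to integers, then derive the paste offset and the size the overlay is resized to. `placement_from_bbox` is a hypothetical helper written for illustration only.

```python
from __future__ import annotations


def placement_from_bbox(
    bbox: tuple[float, float, float, float],
    image_height: int,
    image_width: int,
) -> dict[str, tuple[int, int]]:
    """Hypothetical helper: map a normalized bbox to a pixel offset and overlay size."""
    x_min = int(bbox[0] * image_width)
    y_min = int(bbox[1] * image_height)
    x_max = int(bbox[2] * image_width)
    y_max = int(bbox[3] * image_height)
    return {
        "offset": (y_min, x_min),  # (row, column) of the top-left corner
        "size": (x_max - x_min, y_max - y_min),  # (width, height) after resizing
    }


# A bbox covering the bottom-centre of a 200 x 400 (height x width) image.
placement = placement_from_bbox((0.25, 0.5, 0.75, 1.0), image_height=200, image_width=400)
```

Here the overlay would be resized to 200 x 100 pixels (width x height) and pasted with its top-left corner at row 100, column 100.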

Targets

image, mask

Image types: uint8, float32

Source code in albumentations/augmentations/mixing/transforms.py

Python

class OverlayElements(ReferenceBasedTransform):
    """Apply overlay elements such as images and masks onto an input image. This transformation can be used to add
    various objects (e.g., stickers, logos) to images with optional masks and bounding boxes for better placement
    control.

    Args:
        metadata_key (str): Additional target key for metadata. Default `overlay_metadata`.
        p (float): Probability of applying the transformation. Default: 0.5.

    Possible Metadata Fields:
        - image (np.ndarray): The overlay image to be applied. This is a required field.
        - bbox (list[float]): The bounding box specifying the region where the overlay should be applied. It should
                              contain four floats: [x_min, y_min, x_max, y_max]. If `label_id` is provided, it should
                              be appended as the fifth element in the bbox. BBox should be in Albumentations format,
                              that is the same as normalized Pascal VOC format
                              [x_min / width, y_min / height, x_max / width, y_max / height]
        - mask (np.ndarray): An optional mask that defines the non-rectangular region of the overlay image. If not
                             provided, the entire overlay image is used.
        - mask_id (int): An optional identifier for the mask. If provided, the regions specified by the mask will
                         be labeled with this identifier in the output mask.

    Targets:
        image, mask

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK)

    class InitSchema(BaseTransformInitSchema):
        metadata_key: str

    def __init__(
        self,
        metadata_key: str = "overlay_metadata",
        p: float = 0.5,
        always_apply: bool | None = None,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.metadata_key = metadata_key

    @property
    def targets_as_params(self) -> list[str]:
        return [self.metadata_key]

    @staticmethod
    def preprocess_metadata(metadata: dict[str, Any], img_shape: SizeType) -> dict[str, Any]:
        overlay_image = metadata["image"]
        overlay_height, overlay_width = overlay_image.shape[:2]
        image_height, image_width = img_shape[:2]

        if "bbox" in metadata:
            bbox = metadata["bbox"]
            check_bbox(bbox)
            denormalized_bbox = denormalize_bbox(bbox[:4], rows=image_height, cols=image_width)

            x_min, y_min, x_max, y_max = (int(x) for x in denormalized_bbox[:4])

            if "mask" in metadata:
                mask = metadata["mask"]
                mask = cv2.resize(mask, (x_max - x_min, y_max - y_min), interpolation=cv2.INTER_NEAREST)
            else:
                mask = np.ones((y_max - y_min, x_max - x_min), dtype=np.uint8)

            overlay_image = cv2.resize(overlay_image, (x_max - x_min, y_max - y_min), interpolation=cv2.INTER_AREA)
            offset = (y_min, x_min)

            if len(bbox) == LENGTH_RAW_BBOX and "bbox_id" in metadata:
                bbox = [x_min, y_min, x_max, y_max, metadata["bbox_id"]]
            else:
                bbox = (x_min, y_min, x_max, y_max, *bbox[4:])
        else:
            if image_height < overlay_height or image_width < overlay_width:
                overlay_image = cv2.resize(overlay_image, (image_width, image_height), interpolation=cv2.INTER_AREA)
                overlay_height, overlay_width = overlay_image.shape[:2]

            mask = metadata["mask"] if "mask" in metadata else np.ones_like(overlay_image, dtype=np.uint8)

            max_x_offset = image_width - overlay_width
            max_y_offset = image_height - overlay_height

            offset_x = random.randint(0, max_x_offset)
            offset_y = random.randint(0, max_y_offset)

            offset = (offset_y, offset_x)

            bbox = [
                offset_x,
                offset_y,
                offset_x + overlay_width,
                offset_y + overlay_height,
            ]

            if "bbox_id" in metadata:
                bbox = [*bbox, metadata["bbox_id"]]

        result = {
            "overlay_image": overlay_image,
            "overlay_mask": mask,
            "offset": offset,
            "bbox": bbox,
        }

        if "mask_id" in metadata:
            result["mask_id"] = metadata["mask_id"]

        return result

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        metadata = data[self.metadata_key]
        img_shape = params["shape"]

        if isinstance(metadata, list):
            overlay_data = [self.preprocess_metadata(md, img_shape) for md in metadata]
        else:
            overlay_data = [self.preprocess_metadata(metadata, img_shape)]

        return {
            "overlay_data": overlay_data,
        }

    def apply(
        self,
        img: np.ndarray,
        overlay_data: list[dict[str, Any]],
        **params: Any,
    ) -> np.ndarray:
        for data in overlay_data:
            overlay_image = data["overlay_image"]
            overlay_mask = data["overlay_mask"]
            offset = data["offset"]
            img = fmixing.copy_and_paste_blend(img, overlay_image, overlay_mask, offset=offset)
        return img

    def apply_to_mask(
        self,
        mask: np.ndarray,
        overlay_data: list[dict[str, Any]],
        **params: Any,
    ) -> np.ndarray:
        for data in overlay_data:
            if "mask_id" in data and data["mask_id"] is not None:
                overlay_mask = data["overlay_mask"]
                offset = data["offset"]
                mask_id = data["mask_id"]

                y_min, x_min = offset
                y_max = y_min + overlay_mask.shape[0]
                x_max = x_min + overlay_mask.shape[1]

                mask_section = mask[y_min:y_max, x_min:x_max]
                mask_section[overlay_mask > 0] = mask_id

        return mask

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("metadata_key",)

`targets_as_params: list[str]` `property` `readonly` ¶

Names of the targets used to compute data-dependent parameters. The list is also used to check that the input contains all required targets.

`apply (self, img, overlay_data, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/mixing/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    overlay_data: list[dict[str, Any]],
    **params: Any,
) -> np.ndarray:
    for data in overlay_data:
        overlay_image = data["overlay_image"]
        overlay_mask = data["overlay_mask"]
        offset = data["offset"]
        img = fmixing.copy_and_paste_blend(img, overlay_image, overlay_mask, offset=offset)
    return img
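The `apply_to_mask` method in the class listing above writes `mask_id` into the region covered by the overlay. Below is a minimal NumPy sketch of that step, assuming the overlay fits entirely inside the mask. The function name `paste_mask_id` is hypothetical and is not part of Albumentations.

```python
import numpy as np


def paste_mask_id(mask, overlay_mask, offset, mask_id):
    """Write mask_id into `mask` wherever `overlay_mask` is positive.

    `offset` is (y_min, x_min), matching the (offset_y, offset_x) order
    used by the transform. The overlay is assumed to fit inside `mask`.
    """
    y_min, x_min = offset
    y_max = y_min + overlay_mask.shape[0]
    x_max = x_min + overlay_mask.shape[1]

    # Basic slicing returns a view, so assigning into it updates `mask` in place.
    section = mask[y_min:y_max, x_min:x_max]
    section[overlay_mask > 0] = mask_id
    return mask


mask = np.zeros((4, 5), dtype=np.uint8)
overlay_mask = np.array([[1, 0], [1, 1]], dtype=np.uint8)
result = paste_mask_id(mask, overlay_mask, offset=(1, 2), mask_id=7)
```

Only pixels where the overlay mask is positive are overwritten, so the zero entry of `overlay_mask` leaves the underlying mask value untouched.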

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/mixing/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    metadata = data[self.metadata_key]
    img_shape = params["shape"]

    if isinstance(metadata, list):
        overlay_data = [self.preprocess_metadata(md, img_shape) for md in metadata]
    else:
        overlay_data = [self.preprocess_metadata(metadata, img_shape)]

    return {
        "overlay_data": overlay_data,
    }

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/mixing/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("metadata_key",)

`text` `special` ¶

`functional` ¶

`def convert_image_to_pil (image)` [view source on GitHub]¶

Convert a NumPy array image to a PIL image.

Source code in albumentations/augmentations/text/functional.py

Python

def convert_image_to_pil(image: np.ndarray) -> Image:
    """Convert a NumPy array image to a PIL image."""
    try:
        from PIL import Image
    except ImportError:
        raise ImportError("Pillow is not installed") from ImportError

    if len(image.shape) == MONO_CHANNEL_DIMENSIONS:  # (height, width)
        return Image.fromarray(image)
    if len(image.shape) == NUM_MULTI_CHANNEL_DIMENSIONS and image.shape[2] == 1:  # (height, width, 1)
        return Image.fromarray(image[:, :, 0], mode="L")
    if len(image.shape) == NUM_MULTI_CHANNEL_DIMENSIONS and image.shape[2] == NUM_RGB_CHANNELS:  # (height, width, 3)
        return Image.fromarray(image)

    raise TypeError(f"Unsupported image shape: {image.shape}")
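The shape checks above decide which arrays can be converted. The sketch below mirrors that dispatch on a plain shape tuple, without Pillow. The values 2, 3, and 3 for `MONO_CHANNEL_DIMENSIONS`, `NUM_MULTI_CHANNEL_DIMENSIONS`, and `NUM_RGB_CHANNELS` are assumptions inferred from the inline comments, and `describe_pil_conversion` is a hypothetical helper.

```python
def describe_pil_conversion(shape):
    """Describe how an array of this shape would be handled, or raise TypeError."""
    if len(shape) == 2:  # (height, width)
        return "grayscale, converted as is"
    if len(shape) == 3 and shape[2] == 1:  # (height, width, 1)
        return "single channel, squeezed to (height, width) with mode 'L'"
    if len(shape) == 3 and shape[2] == 3:  # (height, width, 3)
        return "three channels, converted as is"
    raise TypeError(f"Unsupported image shape: {shape}")


gray = describe_pil_conversion((32, 64))
single = describe_pil_conversion((32, 64, 1))
rgb = describe_pil_conversion((32, 64, 3))

try:
    describe_pil_conversion((32, 64, 4))
    four_channel_rejected = False
except TypeError:
    four_channel_rejected = True
```

Under these assumptions, images with more than three channels are rejected here, which is the case `draw_text_on_multi_channel_image` handles channel by channel.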

`def draw_text_on_multi_channel_image (image, metadata_list)` [view source on GitHub]¶

Draw text on a multi-channel image with more than three channels.

Source code in albumentations/augmentations/text/functional.py

Python

def draw_text_on_multi_channel_image(image: np.ndarray, metadata_list: list[dict[str, Any]]) -> np.ndarray:
    """Draw text on a multi-channel image with more than three channels."""
    try:
        from PIL import ImageDraw, Image
    except ImportError:
        raise ImportError("Pillow is not installed") from ImportError

    channels = [Image.fromarray(image[:, :, i]) for i in range(image.shape[2])]
    pil_images = [ImageDraw.Draw(channel) for channel in channels]

    for metadata in metadata_list:
        bbox_coords = metadata["bbox_coords"]
        text = metadata["text"]
        font = metadata["font"]
        font_color = metadata["font_color"]
        if isinstance(font_color, Sequence):
            font_color = tuple(int(c) for c in font_color)
        position = bbox_coords[:2]

        for channel_id, pil_image in enumerate(pil_images):
            pil_image.text(position, text, font=font, fill=font_color[channel_id])

    return np.stack([np.array(channel) for channel in channels], axis=2)

`def draw_text_on_pil_image (pil_image, metadata_list)` [view source on GitHub]¶

Draw text on a PIL image using metadata information.

Source code in albumentations/augmentations/text/functional.py

Python

def draw_text_on_pil_image(pil_image: Image, metadata_list: list[dict[str, Any]]) -> Image:
    """Draw text on a PIL image using metadata information."""
    try:
        from PIL import ImageDraw
    except ImportError:
        raise ImportError("Pillow is not installed") from ImportError

    draw = ImageDraw.Draw(pil_image)
    for metadata in metadata_list:
        bbox_coords = metadata["bbox_coords"]
        text = metadata["text"]
        font = metadata["font"]
        font_color = metadata["font_color"]
        if isinstance(font_color, (list, tuple)):
            font_color = tuple(int(c) for c in font_color)
        elif isinstance(font_color, float):
            font_color = int(font_color)
        position = bbox_coords[:2]
        draw.text(position, text, font=font, fill=font_color)
    return pil_image
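Before drawing, the function coerces numeric colors to integers. A small standalone sketch of that normalization follows; `normalize_font_color` is a hypothetical name used only for illustration.

```python
def normalize_font_color(font_color):
    """Coerce a font color to the integer form used for the `fill` argument."""
    if isinstance(font_color, (list, tuple)):
        # Per-channel values: truncate each component to an int.
        return tuple(int(c) for c in font_color)
    if isinstance(font_color, float):
        # Single float value: truncate to an int.
        return int(font_color)
    # Anything else (for example a color name such as "red") is left unchanged.
    return font_color


rgb_color = normalize_font_color([255.0, 0.0, 12.7])
gray_color = normalize_font_color(128.9)
named_color = normalize_font_color("red")
```

Note that `int()` truncates toward zero, so `12.7` becomes `12` and `128.9` becomes `128`.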

`transforms` ¶

`class TextImage` `(font_path, stopwords=None, augmentations=(None,), fraction_range=(1.0, 1.0), font_size_fraction_range=(0.8, 0.9), font_color='black', clear_bg=False, metadata_key='textimage_metadata', always_apply=None, p=0.5)` [view source on GitHub] ¶

Apply text rendering transformations on images.

This class supports rendering text directly onto images using a variety of configurations, such as custom fonts, font sizes, colors, and augmentation methods. The text can be placed inside specified bounding boxes.

Parameters:

Name	Type	Description
`font_path`	`str \| Path`	Path to the font file to use for rendering text.
`stopwords`	`list[str] \| None`	List of stopwords for text augmentation.
`augmentations`	`tuple[str \| None, ...] \| list[str \| None]`	List of text augmentations to apply. `None`: text is printed as is; `"insertion"`: insert random stop words into the text; `"swap"`: swap random words in the text; `"deletion"`: delete random words from the text.
`fraction_range`	`tuple[float, float]`	Range for selecting a fraction of bounding boxes to modify.
`font_size_fraction_range`	`tuple[float, float]`	Range for selecting the font size as a fraction of bounding box height.
`font_color`	`list[str] \| str`	List of possible font colors or a single font color.
`clear_bg`	`bool`	Whether to clear the background before rendering text.
`metadata_key`	`str`	Key to access metadata in the parameters.
`p`	`float`	Probability of applying the transform.

Targets

image

Image types: uint8, float32

Examples:

Python

>>> from pathlib import Path
>>> import albumentations as A
>>> transform = A.Compose([
...     A.TextImage(
...         font_path=Path("/path/to/font.ttf"),
...         stopwords=["the", "is", "in"],
...         augmentations=("insertion", "deletion"),
...         fraction_range=(0.5, 1.0),
...         font_size_fraction_range=(0.5, 0.9),
...         font_color=["red", "green", "blue"],
...         metadata_key="text_metadata",
...         p=0.5,
...     )
... ])
>>> transformed = transform(image=my_image, text_metadata=my_metadata)
>>> image = transformed['image']
# This renders text on `my_image` using the bounding boxes and strings provided in `my_metadata`.
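The example above does not show what `my_metadata` looks like. Judging from the source listing below, each entry is a dict with a `"bbox"` and a `"text"` key, and the bounding box is in normalized coordinates that get scaled by the image size. The sketch below illustrates this under those assumptions; `denormalize` is a simplified stand-in for the library's `denormalize_bbox`, not the library function itself.

```python
def denormalize(bbox, image_height, image_width):
    """Scale a normalized (x_min, y_min, x_max, y_max) box to integer pixels."""
    x_min, y_min, x_max, y_max = bbox[:4]
    return (
        int(x_min * image_width),
        int(y_min * image_height),
        int(x_max * image_width),
        int(y_max * image_height),
    )


# Assumed metadata layout: one dict per text box.
my_metadata = [
    {"bbox": (0.25, 0.25, 0.75, 0.5), "text": "Hello, world!"},
    {"bbox": (0.125, 0.5, 0.875, 0.75), "text": "Second line"},
]

image_height, image_width = 200, 400
pixel_boxes = [denormalize(m["bbox"], image_height, image_width) for m in my_metadata]
```

A single dict is also accepted in place of the list; the transform wraps it into a one-element list.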

Source code in albumentations/augmentations/text/transforms.py

Python

class TextImage(ImageOnlyTransform):
    """Apply text rendering transformations on images.

    This class supports rendering text directly onto images using a variety of configurations,
    such as custom fonts, font sizes, colors, and augmentation methods. The text can be placed
    inside specified bounding boxes.

    Args:
        font_path (str | Path): Path to the font file to use for rendering text.
        stopwords (list[str] | None): List of stopwords for text augmentation.
        augmentations (tuple[str | None, ...] | list[str | None]): List of text augmentations to apply.
            None: text is printed as is
            "insertion": insert random stop words into the text.
            "swap": swap random words in the text.
            "deletion": delete random words from the text.
        fraction_range (tuple[float, float]): Range for selecting a fraction of bounding boxes to modify.
        font_size_fraction_range (tuple[float, float]): Range for selecting the font size as a fraction of
            bounding box height.
        font_color (list[str] | str): List of possible font colors or a single font color.
        clear_bg (bool): Whether to clear the background before rendering text.
        metadata_key (str): Key to access metadata in the parameters.
        p (float): Probability of applying the transform.

    Targets:
        image

    Image types:
        uint8, float32

    Examples:
        >>> import albumentations as A
        >>> transform = A.Compose([
            A.TextImage(
                font_path=Path("/path/to/font.ttf"),
                stopwords=["the", "is", "in"],
                augmentations=("insertion", "deletion"),
                fraction_range=(0.5, 1.0),
                font_size_fraction_range=(0.5, 0.9),
                font_color=["red", "green", "blue"],
                metadata_key="text_metadata",
                p=0.5
            )
        ])
        >>> transformed = transform(image=my_image, text_metadata=my_metadata)
        >>> image = transformed['image']
        # This will render text on `my_image` based on the metadata provided in `my_metadata`.
    """

    class InitSchema(BaseTransformInitSchema):
        font_path: str
        stopwords: list[str] | None
        augmentations: tuple[str | None, ...] | list[str | None]
        fraction_range: Annotated[tuple[float, float], AfterValidator(nondecreasing), AfterValidator(check_01)]
        font_size_fraction_range: Annotated[
            tuple[float, float],
            AfterValidator(nondecreasing),
            AfterValidator(check_01),
        ]
        font_color: list[ColorType | str] | ColorType | str
        clear_bg: bool
        metadata_key: str

        @model_validator(mode="after")
        def validate_input(self) -> Self:
            if not self.stopwords:
                self.augmentations = [aug for aug in self.augmentations if aug != "insertion"]

            self.stopwords = self.stopwords or ["the", "is", "in", "at", "of"]

            return self

    def __init__(
        self,
        font_path: str,
        stopwords: list[str] | None = None,
        augmentations: tuple[Literal["insertion", "swap", "deletion"] | None] = (None,),
        fraction_range: tuple[float, float] = (1.0, 1.0),
        font_size_fraction_range: tuple[float, float] = (0.8, 0.9),
        font_color: list[ColorType | str] | ColorType | str = "black",
        clear_bg: bool = False,
        metadata_key: str = "textimage_metadata",
        always_apply: bool | None = None,
        p: float = 0.5,
    ) -> None:
        super().__init__(p=p, always_apply=always_apply)
        self.metadata_key = metadata_key
        self.font_path = font_path
        self.fraction_range = fraction_range
        self.stopwords = stopwords
        self.augmentations = list(augmentations)
        self.font_size_fraction_range = font_size_fraction_range
        self.font_color = font_color
        self.clear_bg = clear_bg

    @property
    def targets_as_params(self) -> list[str]:
        return [self.metadata_key]

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "font_path",
            "stopwords",
            "augmentations",
            "fraction_range",
            "font_size_fraction_range",
            "font_color",
            "metadata_key",
            "clear_bg",
        )

    def random_aug(
        self,
        text: str,
        fraction: float,
        choice: Literal["insertion", "swap", "deletion"],
    ) -> str:
        words = [word for word in text.strip().split() if word]
        num_words = len(words)
        num_words_to_modify = max(1, int(fraction * num_words))

        if choice == "insertion":
            result_sentence = ftext.insert_random_stopwords(words, num_words_to_modify, self.stopwords)
        elif choice == "swap":
            result_sentence = ftext.swap_random_words(words, num_words_to_modify)
        elif choice == "deletion":
            result_sentence = ftext.delete_random_words(words, num_words_to_modify)
        else:
            raise ValueError("Invalid choice. Choose from 'insertion', 'swap', or 'deletion'.")

        result_sentence = re.sub(" +", " ", result_sentence).strip()
        return result_sentence if result_sentence != text else ""

    def preprocess_metadata(self, image: np.ndarray, bbox: BoxType, text: str) -> dict[str, Any]:
        image_height, image_width = image.shape[:2]

        check_bbox(bbox)
        denormalized_bbox = denormalize_bbox(bbox[:4], rows=image_height, cols=image_width)

        x_min, y_min, x_max, y_max = (int(x) for x in denormalized_bbox[:4])
        bbox_height = y_max - y_min

        font_size_fraction = random.uniform(*self.font_size_fraction_range)

        font = ImageFont.truetype(str(self.font_path), int(font_size_fraction * bbox_height))

        if not self.augmentations or self.augmentations is None:
            augmented_text = text
        else:
            augmentation = random.choice(self.augmentations)

            augmented_text = text if augmentation is None else self.random_aug(text, 0.5, choice=augmentation)

        font_color = random.choice(self.font_color) if isinstance(self.font_color, list) else self.font_color

        return {
            "bbox_coords": (x_min, y_min, x_max, y_max),
            "text": augmented_text,
            "font": font,
            "font_color": font_color,
        }

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        image = data["image"]

        metadata = data[self.metadata_key]

        if metadata == []:
            return {
                "overlay_data": [],
            }

        if isinstance(metadata, dict):
            metadata = [metadata]

        fraction = random.uniform(*self.fraction_range)

        num_bboxes_to_modify = int(len(metadata) * fraction)

        bbox_indices_to_update = random.sample(range(len(metadata)), num_bboxes_to_modify)

        overlay_data = [
            self.preprocess_metadata(image, metadata[index]["bbox"], metadata[index]["text"])
            for index in bbox_indices_to_update
        ]

        return {
            "overlay_data": overlay_data,
        }

    def apply(
        self,
        img: np.ndarray,
        overlay_data: list[dict[str, Any]],
        **params: Any,
    ) -> np.ndarray:
        return ftext.render_text(img, overlay_data, clear_bg=self.clear_bg)
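In `random_aug` above, the number of words to change is derived from a fraction of the word count and is never less than one. `preprocess_metadata` calls it with a fixed fraction of 0.5. A standalone sketch of just that arithmetic, with the hypothetical helper name `words_to_modify`:

```python
def words_to_modify(text, fraction=0.5):
    """Return how many words a text augmentation would touch."""
    words = [word for word in text.strip().split() if word]
    # int() truncates, and max() guarantees at least one word is modified.
    return max(1, int(fraction * len(words)))


five_words = words_to_modify("the quick brown fox jumps")
one_word = words_to_modify("hello")
```

Also note from the listing that `random_aug` returns an empty string when the augmented sentence is identical to the input text.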

`targets_as_params: list[str]` `property` `readonly` ¶

Names of the targets used to compute data-dependent parameters. The list is also used to check that the input contains all required targets.

`apply (self, img, overlay_data, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/text/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    overlay_data: list[dict[str, Any]],
    **params: Any,
) -> np.ndarray:
    return ftext.render_text(img, overlay_data, clear_bg=self.clear_bg)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/text/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    image = data["image"]

    metadata = data[self.metadata_key]

    if metadata == []:
        return {
            "overlay_data": [],
        }

    if isinstance(metadata, dict):
        metadata = [metadata]

    fraction = random.uniform(*self.fraction_range)

    num_bboxes_to_modify = int(len(metadata) * fraction)

    bbox_indices_to_update = random.sample(range(len(metadata)), num_bboxes_to_modify)

    overlay_data = [
        self.preprocess_metadata(image, metadata[index]["bbox"], metadata[index]["text"])
        for index in bbox_indices_to_update
    ]

    return {
        "overlay_data": overlay_data,
    }
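The selection logic above can be sketched in isolation: a fraction is drawn from `fraction_range`, truncated to a box count, and that many indices are sampled without replacement. The library uses the module-level `random` functions; the explicit `rng` argument and the name `select_boxes` below are assumptions made so the sketch is reproducible.

```python
import random


def select_boxes(num_boxes, fraction_range, rng):
    """Pick the indices of the bounding boxes that will receive text."""
    fraction = rng.uniform(*fraction_range)
    # int() truncates, so small fractions of a short list can select nothing.
    num_to_modify = int(num_boxes * fraction)
    return rng.sample(range(num_boxes), num_to_modify)


rng = random.Random(0)
all_boxes = select_boxes(4, (1.0, 1.0), rng)
half_boxes = select_boxes(10, (0.5, 0.5), rng)
no_boxes = select_boxes(1, (0.5, 0.5), rng)
```

With the default `fraction_range=(1.0, 1.0)` every box is selected, although in random order.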

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/text/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "font_path",
        "stopwords",
        "augmentations",
        "fraction_range",
        "font_size_fraction_range",
        "font_color",
        "metadata_key",
        "clear_bg",
    )

`transforms` ¶

`class CLAHE` `(clip_limit=4.0, tile_grid_size=(8, 8), always_apply=None, p=0.5)` [view source on GitHub] ¶

Apply Contrast Limited Adaptive Histogram Equalization to the input image.

Parameters:

Name	Type	Description
`clip_limit`	`ScaleFloatType`	upper threshold value for contrast limiting. If clip_limit is a single float value, the range will be (1, clip_limit). Default: (1, 4).
`tile_grid_size`	`tuple[int, int]`	size of grid for histogram equalization. Default: (8, 8).
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8

Source code in albumentations/augmentations/transforms.py

Python

class CLAHE(ImageOnlyTransform):
    """Apply Contrast Limited Adaptive Histogram Equalization to the input image.

    Args:
        clip_limit: upper threshold value for contrast limiting.
            If clip_limit is a single float value, the range will be (1, clip_limit). Default: (1, 4).
        tile_grid_size: size of grid for histogram equalization. Default: (8, 8).
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8

    """

    class InitSchema(BaseTransformInitSchema):
        clip_limit: OnePlusFloatRangeType = (1.0, 4.0)
        tile_grid_size: OnePlusIntRangeType = (8, 8)

    def __init__(
        self,
        clip_limit: ScaleFloatType = 4.0,
        tile_grid_size: tuple[int, int] = (8, 8),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.clip_limit = cast(Tuple[float, float], clip_limit)
        self.tile_grid_size = tile_grid_size

    def apply(self, img: np.ndarray, clip_limit: float, **params: Any) -> np.ndarray:
        if not is_rgb_image(img) and not is_grayscale_image(img):
            msg = "CLAHE transformation expects 1-channel or 3-channel images."
            raise TypeError(msg)

        return fmain.clahe(img, clip_limit, self.tile_grid_size)

    def get_params(self) -> dict[str, float]:
        return {"clip_limit": random.uniform(self.clip_limit[0], self.clip_limit[1])}

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("clip_limit", "tile_grid_size")

`apply (self, img, clip_limit, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, clip_limit: float, **params: Any) -> np.ndarray:
    if not is_rgb_image(img) and not is_grayscale_image(img):
        msg = "CLAHE transformation expects 1-channel or 3-channel images."
        raise TypeError(msg)

    return fmain.clahe(img, clip_limit, self.tile_grid_size)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, float]:
    return {"clip_limit": random.uniform(self.clip_limit[0], self.clip_limit[1])}
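The parameter description says a single float `clip_limit` is treated as the range `(1, clip_limit)`, and `get_params` then samples uniformly from that range. The sketch below illustrates both steps. The helpers `to_clip_range` and `sample_clip_limit` are hypothetical; in the library the conversion presumably happens during argument validation rather than in a function like this.

```python
import random


def to_clip_range(clip_limit):
    """Turn a scalar or a (low, high) pair into a (low, high) range."""
    if isinstance(clip_limit, (int, float)):
        # A scalar is interpreted as the upper bound; the lower bound is 1.
        return (1.0, float(clip_limit))
    low, high = clip_limit
    return (float(low), float(high))


def sample_clip_limit(clip_limit, rng):
    """Draw one clip limit uniformly from the resolved range."""
    low, high = to_clip_range(clip_limit)
    return rng.uniform(low, high)


rng = random.Random(0)
default_range = to_clip_range(4.0)
custom_range = to_clip_range((2, 3))
sampled = sample_clip_limit(4.0, rng)
```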

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("clip_limit", "tile_grid_size")

`class ChannelShuffle` [view source on GitHub] ¶

Randomly rearrange channels of the image.

Parameters:

Name	Type	Description
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py

Python

class ChannelShuffle(ImageOnlyTransform):
    """Randomly rearrange channels of the image.

    Args:
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def apply(self, img: np.ndarray, channels_shuffled: tuple[int, ...], **params: Any) -> np.ndarray:
        return fmain.channel_shuffle(img, channels_shuffled)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        ch_arr = list(range(params["shape"][2]))
        ch_arr = random_utils.shuffle(ch_arr)
        return {"channels_shuffled": ch_arr}

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()

`apply (self, img, channels_shuffled, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, channels_shuffled: tuple[int, ...], **params: Any) -> np.ndarray:
    return fmain.channel_shuffle(img, channels_shuffled)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    ch_arr = list(range(params["shape"][2]))
    ch_arr = random_utils.shuffle(ch_arr)
    return {"channels_shuffled": ch_arr}
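The method above builds a random permutation of the channel indices, and `apply` reorders the channels accordingly. Here is a pure-Python sketch of the idea on a single pixel. The helper names and the explicit `rng` argument are assumptions; the library uses its own `random_utils.shuffle`.

```python
import random


def shuffled_channel_order(num_channels, rng):
    """Return a random permutation of the channel indices."""
    order = list(range(num_channels))
    rng.shuffle(order)
    return order


def reorder_pixel(pixel, order):
    """Rearrange the channel values of one pixel according to `order`."""
    return tuple(pixel[i] for i in order)


rng = random.Random(42)
order = shuffled_channel_order(3, rng)
pixel = reorder_pixel((10, 20, 30), order)
```

For a whole NumPy image the equivalent reordering would be `img[..., order]`. The identity permutation is a possible outcome, so the image can come back unchanged. Because the listing reads `params["shape"][2]`, it appears to expect an image with an explicit channel axis.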

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[()]:
    return ()

`class ChromaticAberration` `(primary_distortion_limit=(-0.02, 0.02), secondary_distortion_limit=(-0.05, 0.05), mode='green_purple', interpolation=1, always_apply=None, p=0.5)` [view source on GitHub] ¶

Add lateral chromatic aberration by distorting the red and blue channels of the input image.

Parameters:

Name	Type	Description
`primary_distortion_limit`	`ScaleFloatType`	range of the primary radial distortion coefficient. If primary_distortion_limit is a single float value, the range will be (-primary_distortion_limit, primary_distortion_limit). Controls the distortion in the center of the image (positive values result in pincushion distortion, negative values result in barrel distortion). Default: 0.02.
`secondary_distortion_limit`	`ScaleFloatType`	range of the secondary radial distortion coefficient. If secondary_distortion_limit is a single float value, the range will be (-secondary_distortion_limit, secondary_distortion_limit). Controls the distortion in the corners of the image (positive values result in pincushion distortion, negative values result in barrel distortion). Default: 0.05.
`mode`	`ChromaticAberrationMode`	type of color fringing. Supported modes are 'green_purple', 'red_blue' and 'random'. 'random' will choose one of the modes 'green_purple' or 'red_blue' randomly. Default: 'green_purple'.
`interpolation`	`int`	flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py

Python

class ChromaticAberration(ImageOnlyTransform):
    """Add lateral chromatic aberration by distorting the red and blue channels of the input image.

    Args:
        primary_distortion_limit: range of the primary radial distortion coefficient.
            If primary_distortion_limit is a single float value, the range will be
            (-primary_distortion_limit, primary_distortion_limit).
            Controls the distortion in the center of the image (positive values result in pincushion distortion,
            negative values result in barrel distortion).
            Default: 0.02.
        secondary_distortion_limit: range of the secondary radial distortion coefficient.
            If secondary_distortion_limit is a single float value, the range will be
            (-secondary_distortion_limit, secondary_distortion_limit).
            Controls the distortion in the corners of the image (positive values result in pincushion distortion,
            negative values result in barrel distortion).
            Default: 0.05.
        mode: type of color fringing.
            Supported modes are 'green_purple', 'red_blue' and 'random'.
            'random' will choose one of the modes 'green_purple' or 'red_blue' randomly.
            Default: 'green_purple'.
        interpolation: flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p: probability of applying the transform.
            Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BaseTransformInitSchema):
        primary_distortion_limit: SymmetricRangeType = (-0.02, 0.02)
        secondary_distortion_limit: SymmetricRangeType = (-0.05, 0.05)
        mode: ChromaticAberrationMode = Field(default="green_purple", description="Type of color fringing.")
        interpolation: InterpolationType = cv2.INTER_LINEAR

    def __init__(
        self,
        primary_distortion_limit: ScaleFloatType = (-0.02, 0.02),
        secondary_distortion_limit: ScaleFloatType = (-0.05, 0.05),
        mode: ChromaticAberrationMode = "green_purple",
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.primary_distortion_limit = cast(Tuple[float, float], primary_distortion_limit)
        self.secondary_distortion_limit = cast(Tuple[float, float], secondary_distortion_limit)
        self.mode = mode
        self.interpolation = interpolation

    def apply(
        self,
        img: np.ndarray,
        primary_distortion_red: float,
        secondary_distortion_red: float,
        primary_distortion_blue: float,
        secondary_distortion_blue: float,
        **params: Any,
    ) -> np.ndarray:
        return fmain.chromatic_aberration(
            img,
            primary_distortion_red,
            secondary_distortion_red,
            primary_distortion_blue,
            secondary_distortion_blue,
            self.interpolation,
        )

    def get_params(self) -> dict[str, float]:
        primary_distortion_red = random.uniform(*self.primary_distortion_limit)
        secondary_distortion_red = random.uniform(*self.secondary_distortion_limit)
        primary_distortion_blue = random.uniform(*self.primary_distortion_limit)
        secondary_distortion_blue = random.uniform(*self.secondary_distortion_limit)

        secondary_distortion_red = self._match_sign(primary_distortion_red, secondary_distortion_red)
        secondary_distortion_blue = self._match_sign(primary_distortion_blue, secondary_distortion_blue)

        if self.mode == "green_purple":
            # distortion coefficients of the red and blue channels have the same sign
            primary_distortion_blue = self._match_sign(primary_distortion_red, primary_distortion_blue)
            secondary_distortion_blue = self._match_sign(secondary_distortion_red, secondary_distortion_blue)
        if self.mode == "red_blue":
            # distortion coefficients of the red and blue channels have the opposite sign
            primary_distortion_blue = self._unmatch_sign(primary_distortion_red, primary_distortion_blue)
            secondary_distortion_blue = self._unmatch_sign(secondary_distortion_red, secondary_distortion_blue)

        return {
            "primary_distortion_red": primary_distortion_red,
            "secondary_distortion_red": secondary_distortion_red,
            "primary_distortion_blue": primary_distortion_blue,
            "secondary_distortion_blue": secondary_distortion_blue,
        }

    @staticmethod
    def _match_sign(a: float, b: float) -> float:
        # Match the sign of b to a
        if (a < 0 < b) or (a > 0 > b):
            return -b
        return b

    @staticmethod
    def _unmatch_sign(a: float, b: float) -> float:
        # Unmatch the sign of b to a
        if (a < 0 and b < 0) or (a > 0 and b > 0):
            return -b
        return b

    def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
        return "primary_distortion_limit", "secondary_distortion_limit", "mode", "interpolation"

`apply (self, img, primary_distortion_red, secondary_distortion_red, primary_distortion_blue, secondary_distortion_blue, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    primary_distortion_red: float,
    secondary_distortion_red: float,
    primary_distortion_blue: float,
    secondary_distortion_blue: float,
    **params: Any,
) -> np.ndarray:
    return fmain.chromatic_aberration(
        img,
        primary_distortion_red,
        secondary_distortion_red,
        primary_distortion_blue,
        secondary_distortion_blue,
        self.interpolation,
    )

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, float]:
    primary_distortion_red = random.uniform(*self.primary_distortion_limit)
    secondary_distortion_red = random.uniform(*self.secondary_distortion_limit)
    primary_distortion_blue = random.uniform(*self.primary_distortion_limit)
    secondary_distortion_blue = random.uniform(*self.secondary_distortion_limit)

    secondary_distortion_red = self._match_sign(primary_distortion_red, secondary_distortion_red)
    secondary_distortion_blue = self._match_sign(primary_distortion_blue, secondary_distortion_blue)

    if self.mode == "green_purple":
        # distortion coefficients of the red and blue channels have the same sign
        primary_distortion_blue = self._match_sign(primary_distortion_red, primary_distortion_blue)
        secondary_distortion_blue = self._match_sign(secondary_distortion_red, secondary_distortion_blue)
    if self.mode == "red_blue":
        # distortion coefficients of the red and blue channels have the opposite sign
        primary_distortion_blue = self._unmatch_sign(primary_distortion_red, primary_distortion_blue)
        secondary_distortion_blue = self._unmatch_sign(secondary_distortion_red, secondary_distortion_blue)

    return {
        "primary_distortion_red": primary_distortion_red,
        "secondary_distortion_red": secondary_distortion_red,
        "primary_distortion_blue": primary_distortion_blue,
        "secondary_distortion_blue": secondary_distortion_blue,
    }
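The two modes differ only in how the signs of the red and blue coefficients relate. The standalone sketch below copies the logic of `_match_sign` and `_unmatch_sign` from the class listing into plain functions, using made-up coefficient values, to show the effect.

```python
def match_sign(a, b):
    """Return b with its sign flipped if it is opposite to the sign of a."""
    if (a < 0 < b) or (a > 0 > b):
        return -b
    return b


def unmatch_sign(a, b):
    """Return b with its sign flipped if it is the same as the sign of a."""
    if (a < 0 and b < 0) or (a > 0 and b > 0):
        return -b
    return b


red, blue = 0.01, -0.015

# 'green_purple': blue follows the sign of red.
green_purple_blue = match_sign(red, blue)

# 'red_blue': blue takes the opposite sign of red.
red_blue_blue = unmatch_sign(red, blue)
```

When either value is exactly zero, both helpers return `b` unchanged.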

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
    return "primary_distortion_limit", "secondary_distortion_limit", "mode", "interpolation"

`class ColorJitter` `(brightness=(0.8, 1), contrast=(0.8, 1), saturation=(0.8, 1), hue=(-0.5, 0.5), always_apply=None, p=0.5)` [view source on GitHub] ¶

Randomly changes the brightness, contrast, saturation, and hue of an image. Compared with ColorJitter from torchvision, this transform gives slightly different results because Pillow (used in torchvision) and OpenCV (used in Albumentations) convert an image to HSV format with different formulas. Another difference: Pillow uses uint8 overflow, whereas Albumentations uses value saturation.

Parameters:

Name	Type	Description
`brightness`	`float or tuple of float (min, max)`	How much to jitter brightness. If float, brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness]. If tuple[float, float], the factor is sampled from that range. Both values should be non-negative.
`contrast`	`float or tuple of float (min, max)`	How much to jitter contrast. If float, contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast]. If tuple[float, float], the factor is sampled from that range. Both values should be non-negative.
`saturation`	`float or tuple of float (min, max)`	How much to jitter saturation. If float, saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation]. If tuple[float, float], the factor is sampled from that range. Both values should be non-negative.
`hue`	`float or tuple of float (min, max)`	How much to jitter hue. If float, hue_factor is chosen uniformly from [-hue, hue], with 0 <= hue <= 0.5. If tuple[float, float], the factor is sampled from that range. Both values should be in the range [-0.5, 0.5].
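The scalar-to-range rule in the table can be sketched in plain Python. This mirrors the logic of the `check_ranges` validator shown in the source listing, but `jitter_range` is a hypothetical helper name used only for illustration:

```python
def jitter_range(value, bias=1.0, clip=True):
    """Turn a jitter amount into a (low, high) sampling range.

    bias=1, clip=True matches brightness/contrast/saturation;
    bias=0, clip=False matches hue.
    """
    if isinstance(value, (int, float)):
        if value < 0:
            raise ValueError("a single number must be non-negative")
        low, high = bias - value, bias + value
        if clip:
            # Multiplicative factors cannot go below zero.
            low = max(low, 0.0)
        return (low, high)
    # A pair is used as the sampling range directly.
    low, high = value
    return (float(low), float(high))
```

For example, `jitter_range(0.2)` gives roughly `(0.8, 1.2)`, while `jitter_range(0.1, bias=0.0, clip=False)` gives `(-0.1, 0.1)` for hue.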

Source code in albumentations/augmentations/transforms.py

Python

class ColorJitter(ImageOnlyTransform):
    """Randomly changes the brightness, contrast, and saturation of an image. Compared to ColorJitter from torchvision,
    this transform gives a little bit different results because Pillow (used in torchvision) and OpenCV (used in
    Albumentations) transform an image to HSV format by different formulas. Another difference - Pillow uses uint8
    overflow, but we use value saturation.

    Args:
        brightness (float or tuple of float (min, max)): How much to jitter brightness.
            If float:
                brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness]
            If tuple[float, float], the factor is sampled from that range. Both values should be non-negative.
        contrast (float or tuple of float (min, max)): How much to jitter contrast.
            If float:
                contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast]
            If tuple[float, float], the factor is sampled from that range. Both values should be non-negative.
        saturation (float or tuple of float (min, max)): How much to jitter saturation.
            If float:
                saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation]
            If tuple[float, float], the factor is sampled from that range. Both values should be non-negative.
        hue (float or tuple of float (min, max)): How much to jitter hue.
            If float:
                hue_factor is chosen uniformly from [-hue, hue]. Should have 0 <= hue <= 0.5.
            If tuple[float, float], the factor is sampled from that range. Both values should be in [-0.5, 0.5].

    """

    class InitSchema(BaseTransformInitSchema):
        brightness: Annotated[ScaleFloatType, Field(default=0.2, description="Range for jittering brightness.")]
        contrast: Annotated[ScaleFloatType, Field(default=0.2, description="Range for jittering contrast.")]
        saturation: Annotated[ScaleFloatType, Field(default=0.2, description="Range for jittering saturation.")]
        hue: Annotated[ScaleFloatType, Field(default=0.2, description="Range for jittering hue.")]

        @field_validator("brightness", "contrast", "saturation", "hue")
        @classmethod
        def check_ranges(cls, value: ScaleFloatType, info: ValidationInfo) -> tuple[float, float]:
            if info.field_name == "hue":
                bounds = -0.5, 0.5
                bias = 0
                clip = False
            elif info.field_name in ["brightness", "contrast", "saturation"]:
                bounds = 0, float("inf")
                bias = 1
                clip = True

            if isinstance(value, numbers.Number):
                if value < 0:
                    raise ValueError(f"If {info.field_name} is a single number, it must be non negative.")
                value = [bias - value, bias + value]
                if clip:
                    value[0] = max(value[0], 0)
            elif isinstance(value, (tuple, list)) and len(value) == PAIR:
                check_range(value, *bounds, info.field_name)

            return cast(Tuple[float, float], value)

    def __init__(
        self,
        brightness: ScaleFloatType = (0.8, 1),
        contrast: ScaleFloatType = (0.8, 1),
        saturation: ScaleFloatType = (0.8, 1),
        hue: ScaleFloatType = (-0.5, 0.5),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.brightness = cast(Tuple[float, float], brightness)
        self.contrast = cast(Tuple[float, float], contrast)
        self.saturation = cast(Tuple[float, float], saturation)
        self.hue = cast(Tuple[float, float], hue)

        self.transforms = [
            fmain.adjust_brightness_torchvision,
            fmain.adjust_contrast_torchvision,
            fmain.adjust_saturation_torchvision,
            fmain.adjust_hue_torchvision,
        ]

    def get_params(self) -> dict[str, Any]:
        brightness = random.uniform(self.brightness[0], self.brightness[1])
        contrast = random.uniform(self.contrast[0], self.contrast[1])
        saturation = random.uniform(self.saturation[0], self.saturation[1])
        hue = random.uniform(self.hue[0], self.hue[1])

        order = [0, 1, 2, 3]
        order = random_utils.shuffle(order)

        return {
            "brightness": brightness,
            "contrast": contrast,
            "saturation": saturation,
            "hue": hue,
            "order": order,
        }

    def apply(
        self,
        img: np.ndarray,
        brightness: float,
        contrast: float,
        saturation: float,
        hue: float,
        order: list[int],
        **params: Any,
    ) -> np.ndarray:
        if order is None:
            order = [0, 1, 2, 3]
        if not is_rgb_image(img) and not is_grayscale_image(img):
            msg = "ColorJitter transformation expects 1-channel or 3-channel images."
            raise TypeError(msg)
        color_transforms = [brightness, contrast, saturation, hue]
        for i in order:
            img = self.transforms[i](img, color_transforms[i])
        return img

    def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
        return ("brightness", "contrast", "saturation", "hue")

`apply (self, img, brightness, contrast, saturation, hue, order, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    brightness: float,
    contrast: float,
    saturation: float,
    hue: float,
    order: list[int],
    **params: Any,
) -> np.ndarray:
    if order is None:
        order = [0, 1, 2, 3]
    if not is_rgb_image(img) and not is_grayscale_image(img):
        msg = "ColorJitter transformation expects 1-channel or 3-channel images."
        raise TypeError(msg)
    color_transforms = [brightness, contrast, saturation, hue]
    for i in order:
        img = self.transforms[i](img, color_transforms[i])
    return img
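Because `order` is shuffled on every call, the four adjustments are applied in a random sequence, and the sequence can change the result. A toy sketch of the dispatch loop, using two hypothetical scalar operations (`scale`, `offset`) in place of the `fmain.adjust_*` functions:

```python
import random


def scale(value, factor):
    """Toy multiplicative adjustment (stand-in for a contrast-like op)."""
    return value * factor


def offset(value, amount):
    """Toy additive adjustment (stand-in for a brightness-like op)."""
    return value + amount


def apply_in_order(value, transforms, amounts, order):
    """Apply transforms[i] with amounts[i], visiting indices in the given order."""
    for i in order:
        value = transforms[i](value, amounts[i])
    return value


def random_order(n, rng):
    """Return a random permutation of range(n)."""
    order = list(range(n))
    rng.shuffle(order)
    return order
```

With `transforms=[scale, offset]` and `amounts=[2, 3]`, the order `[0, 1]` maps 10 to 23, while `[1, 0]` maps it to 26.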

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, Any]:
    brightness = random.uniform(self.brightness[0], self.brightness[1])
    contrast = random.uniform(self.contrast[0], self.contrast[1])
    saturation = random.uniform(self.saturation[0], self.saturation[1])
    hue = random.uniform(self.hue[0], self.hue[1])

    order = [0, 1, 2, 3]
    order = random_utils.shuffle(order)

    return {
        "brightness": brightness,
        "contrast": contrast,
        "saturation": saturation,
        "hue": hue,
        "order": order,
    }

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
    return ("brightness", "contrast", "saturation", "hue")

`class Downscale` `(scale_min=None, scale_max=None, interpolation=None, scale_range=(0.25, 0.25), interpolation_pair={'upscale': 0, 'downscale': 0}, always_apply=None, p=0.5)` [view source on GitHub] ¶

Decreases image quality by downscaling and then upscaling it back to its original size.

Parameters:

Name	Type	Description
`scale_range`	`tuple[float, float]`	A tuple defining the minimum and maximum scale to which the image will be downscaled. The range should be between 0 and 1, inclusive at minimum and exclusive at maximum. The first value should be less than or equal to the second value.
`interpolation_pair`	`InterpolationDict`	A dictionary specifying the interpolation methods to use for downscaling and upscaling. Should include keys 'downscale' and 'upscale' with cv2 interpolation flags as values. Example: {"downscale": cv2.INTER_NEAREST, "upscale": cv2.INTER_LINEAR}.

Targets

image

Image types: uint8, float32

Examples:

Python

>>> transform = Downscale(
...     scale_range=(0.5, 0.9),
...     interpolation_pair={"downscale": cv2.INTER_AREA, "upscale": cv2.INTER_CUBIC},
... )
>>> transformed = transform(image=img)

Source code in albumentations/augmentations/transforms.py

Python

class Downscale(ImageOnlyTransform):
    """Decreases image quality by downscaling and then upscaling it back to its original size.

    Args:
        scale_range (tuple[float, float]): A tuple defining the minimum and maximum scale to which the image
            will be downscaled. The range should be between 0 and 1, inclusive at minimum and exclusive at maximum.
            The first value should be less than or equal to the second value.
        interpolation_pair (InterpolationDict): A dictionary specifying the interpolation methods to use for
            downscaling and upscaling. Should include keys 'downscale' and 'upscale' with cv2 interpolation
                flags as values.
            Example: {"downscale": cv2.INTER_NEAREST, "upscale": cv2.INTER_LINEAR}.

    Targets:
        image

    Image types:
        uint8, float32

    Example:
        >>> transform = Downscale(scale_range=(0.5, 0.9), interpolation_pair={"downscale": cv2.INTER_AREA,
                                                          "upscale": cv2.INTER_CUBIC})
        >>> transformed = transform(image=img)
    """

    class InitSchema(BaseTransformInitSchema):
        scale_min: float | None = Field(
            default=None,
            ge=0,
            le=1,
            description="Lower bound on the image scale.",
        )
        scale_max: float | None = Field(
            default=None,
            ge=0,
            lt=1,
            description="Upper bound on the image scale.",
        )

        interpolation: int | Interpolation | InterpolationDict | None = Field(
            default_factory=lambda: Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST),
        )
        interpolation_pair: InterpolationPydantic

        scale_range: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)] = (
            0.25,
            0.25,
        )

        @model_validator(mode="after")
        def validate_params(self) -> Self:
            if self.scale_min is not None and self.scale_max is not None:
                warn(
                    "scale_min and scale_max are deprecated. Use scale_range instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )

                self.scale_range = (self.scale_min, self.scale_max)
                self.scale_min = None
                self.scale_max = None

            if self.interpolation is not None:
                warn(
                    "Downscale.interpolation is deprecated. Use Downscale.interpolation_pair instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )

                if isinstance(self.interpolation, dict):
                    self.interpolation_pair = InterpolationPydantic(**self.interpolation)
                elif isinstance(self.interpolation, int):
                    self.interpolation_pair = InterpolationPydantic(
                        upscale=self.interpolation,
                        downscale=self.interpolation,
                    )
                elif isinstance(self.interpolation, Interpolation):
                    self.interpolation_pair = InterpolationPydantic(
                        upscale=self.interpolation.upscale,
                        downscale=self.interpolation.downscale,
                    )
                self.interpolation = None

            return self

    def __init__(
        self,
        scale_min: float | None = None,
        scale_max: float | None = None,
        interpolation: int | Interpolation | InterpolationDict | None = None,
        scale_range: tuple[float, float] = (0.25, 0.25),
        interpolation_pair: InterpolationDict = InterpolationDict(
            {"upscale": cv2.INTER_NEAREST, "downscale": cv2.INTER_NEAREST},
        ),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.scale_range = scale_range
        self.interpolation_pair = interpolation_pair

    def apply(self, img: np.ndarray, scale: float, **params: Any) -> np.ndarray:
        return fmain.downscale(
            img,
            scale=scale,
            down_interpolation=self.interpolation_pair["downscale"],
            up_interpolation=self.interpolation_pair["upscale"],
        )

    def get_params(self) -> dict[str, Any]:
        return {"scale": random.uniform(self.scale_range[0], self.scale_range[1])}

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("scale_range", "interpolation_pair")

`apply (self, img, scale, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, scale: float, **params: Any) -> np.ndarray:
    return fmain.downscale(
        img,
        scale=scale,
        down_interpolation=self.interpolation_pair["downscale"],
        up_interpolation=self.interpolation_pair["upscale"],
    )
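The body of `fmain.downscale` is not shown here. As a rough illustration of the downscale-then-upscale idea, assuming nearest-neighbour sampling for both steps, the sketch below works on a 2-D list of pixel values. `resize_nearest` and `downscale_roundtrip` are hypothetical helpers, and the result may not match OpenCV pixel for pixel:

```python
def resize_nearest(rows, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    old_h, old_w = len(rows), len(rows[0])
    return [
        [rows[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def downscale_roundtrip(rows, scale):
    """Shrink the image by `scale`, then resize it back to the original shape."""
    h, w = len(rows), len(rows[0])
    small_h = max(1, int(h * scale))
    small_w = max(1, int(w * scale))
    small = resize_nearest(rows, small_h, small_w)
    # Upscaling restores the shape but not the discarded detail.
    return resize_nearest(small, h, w)
```

The output has the same height and width as the input, but neighbouring pixels collapse into blocks, which is the quality loss the transform is meant to simulate.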

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, Any]:
    return {"scale": random.uniform(self.scale_range[0], self.scale_range[1])}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("scale_range", "interpolation_pair")

`class Emboss` `(alpha=(0.2, 0.5), strength=(0.2, 0.7), always_apply=None, p=0.5)` [view source on GitHub] ¶

Embosses the input image and overlays the result on the original image.

Parameters:

Name	Type	Description
`alpha`	`tuple[float, float]`	Range from which the visibility of the embossed image is sampled. At 0, only the original image is visible; at 1.0, only its embossed version is visible. Default: (0.2, 0.5).
`strength`	`tuple[float, float]`	Range from which the embossing strength is sampled. Default: (0.2, 0.7).
`p`	`float`	Probability of applying the transform. Default: 0.5.
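How `alpha` and `strength` combine can be seen from the 3x3 kernel itself. The sketch below rebuilds the kernel from the source listing using plain lists instead of NumPy; `emboss_kernel` is a hypothetical name for illustration:

```python
def emboss_kernel(alpha, strength):
    """Blend the identity kernel with an emboss kernel; alpha sets the mix."""
    identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    effect = [
        [-1 - strength, -strength, 0],
        [-strength, 1, strength],
        [0, strength, 1 + strength],
    ]
    return [
        [(1 - alpha) * identity[r][c] + alpha * effect[r][c] for c in range(3)]
        for r in range(3)
    ]
```

At `alpha=0` the kernel is the identity, so the image is unchanged; at `alpha=1` it is the pure emboss kernel. The entries of the emboss kernel sum to 1 for any `strength`, so the blended kernel also sums to 1 and overall brightness is roughly preserved.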

Targets

image

Source code in albumentations/augmentations/transforms.py

Python

class Emboss(ImageOnlyTransform):
    """Emboss the input image and overlays the result with the original image.

    Args:
        alpha: range to choose the visibility of the embossed image. At 0, only the original image is
            visible; at 1.0, only its embossed version is visible. Default: (0.2, 0.5).
        strength: strength range of the embossing. Default: (0.2, 0.7).
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    """

    class InitSchema(BaseTransformInitSchema):
        alpha: ZeroOneRangeType = (0.2, 0.5)
        strength: NonNegativeFloatRangeType = (0.2, 0.7)

    def __init__(
        self,
        alpha: tuple[float, float] = (0.2, 0.5),
        strength: tuple[float, float] = (0.2, 0.7),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.alpha = alpha
        self.strength = strength

    @staticmethod
    def __generate_emboss_matrix(alpha_sample: np.ndarray, strength_sample: np.ndarray) -> np.ndarray:
        matrix_nochange = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
        matrix_effect = np.array(
            [
                [-1 - strength_sample, 0 - strength_sample, 0],
                [0 - strength_sample, 1, 0 + strength_sample],
                [0, 0 + strength_sample, 1 + strength_sample],
            ],
            dtype=np.float32,
        )
        return (1 - alpha_sample) * matrix_nochange + alpha_sample * matrix_effect

    def get_params(self) -> dict[str, np.ndarray]:
        alpha = random.uniform(*self.alpha)
        strength = random.uniform(*self.strength)
        emboss_matrix = self.__generate_emboss_matrix(alpha_sample=alpha, strength_sample=strength)
        return {"emboss_matrix": emboss_matrix}

    def apply(self, img: np.ndarray, emboss_matrix: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.convolve(img, emboss_matrix)

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("alpha", "strength")

`apply (self, img, emboss_matrix, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, emboss_matrix: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.convolve(img, emboss_matrix)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, np.ndarray]:
    alpha = random.uniform(*self.alpha)
    strength = random.uniform(*self.strength)
    emboss_matrix = self.__generate_emboss_matrix(alpha_sample=alpha, strength_sample=strength)
    return {"emboss_matrix": emboss_matrix}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("alpha", "strength")

`class Equalize` `(mode='cv', by_channels=True, mask=None, mask_params=(), always_apply=None, p=0.5)` [view source on GitHub] ¶

Equalize the image histogram.

Parameters:

Name	Type	Description
`mode`	`str`	{'cv', 'pil'}. Use OpenCV or Pillow equalization method.
`by_channels`	`bool`	If True, use equalization by channels separately, else convert image to YCbCr representation and use equalization by `Y` channel.
`mask`	`np.ndarray, callable`	If given, only the pixels selected by the mask are included in the analysis. May be a 1-channel or 3-channel array, or a callable. The callable's signature must include an `image` argument.
`mask_params`	`list of str`	Params for mask function.
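For readers unfamiliar with histogram equalization, here is the textbook version for a flat list of 8-bit intensities. This is only a sketch: it is not necessarily what the 'cv' or 'pil' modes compute, it ignores `mask` and `by_channels`, and `equalize_histogram` is a hypothetical name:

```python
def equalize_histogram(pixels: list[int], levels: int = 256) -> list[int]:
    """Textbook histogram equalization for a flat list of integer intensities."""
    if not pixels:
        return []

    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    # Cumulative distribution: cdf[v] = number of pixels with intensity <= v.
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    denom = len(pixels) - cdf_min
    if denom == 0:
        # Constant image: there is nothing to spread out.
        return list(pixels)

    top = levels - 1
    # Integer arithmetic, rounded to the nearest level, keeps the table exact.
    lut = [max(0, ((c - cdf_min) * top + denom // 2) // denom) for c in cdf]
    return [lut[value] for value in pixels]
```

Four evenly populated intensities such as `[50, 100, 150, 200]` are stretched to cover the full range, giving `[0, 85, 170, 255]`.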

Targets

image

Image types: uint8

Source code in albumentations/augmentations/transforms.py

Python

class Equalize(ImageOnlyTransform):
    """Equalize the image histogram.

    Args:
        mode (str): {'cv', 'pil'}. Use OpenCV or Pillow equalization method.
        by_channels (bool): If True, use equalization by channels separately,
            else convert image to YCbCr representation and use equalization by `Y` channel.
        mask (np.ndarray, callable): If given, only the pixels selected by
            the mask are included in the analysis. May be a 1-channel or 3-channel array, or a callable.
            The callable's signature must include an `image` argument.
        mask_params (list of str): Params for mask function.

    Targets:
        image

    Image types:
        uint8

    """

    class InitSchema(BaseTransformInitSchema):
        mode: ImageMode = "cv"
        by_channels: Annotated[bool, Field(default=True, description="Equalize channels separately if True")]
        mask: Annotated[
            np.ndarray | Callable[..., Any] | None,
            Field(default=None, description="Mask to apply for equalization"),
        ]
        mask_params: Annotated[Sequence[str], Field(default=[], description="Parameters for mask function")]

    def __init__(
        self,
        mode: ImageMode = "cv",
        by_channels: bool = True,
        mask: np.ndarray | Callable[..., Any] | None = None,
        mask_params: Sequence[str] = (),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.mode = mode
        self.by_channels = by_channels
        self.mask = mask
        self.mask_params = mask_params

    def apply(self, img: np.ndarray, mask: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.equalize(img, mode=self.mode, by_channels=self.by_channels, mask=mask)

    def get_params_dependent_on_targets(self, params: dict[str, Any]) -> dict[str, Any]:
        if not callable(self.mask):
            return {"mask": self.mask}

        return {"mask": self.mask(**params)}

    @property
    def targets_as_params(self) -> list[str]:
        return ["image", *list(self.mask_params)]

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("mode", "by_channels", "mask", "mask_params")

`targets_as_params: list[str]` `property` `readonly` ¶

Targets used to get params dependent on targets. This is used to check input has all required targets.

`apply (self, img, mask, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, mask: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.equalize(img, mode=self.mode, by_channels=self.by_channels, mask=mask)

`get_params_dependent_on_targets (self, params)` ¶

Deprecated: use get_params_dependent_on_data instead. Returns parameters that depend on targets; the targets it depends on are defined in self.targets_as_params.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_targets(self, params: dict[str, Any]) -> dict[str, Any]:
    if not callable(self.mask):
        return {"mask": self.mask}

    return {"mask": self.mask(**params)}
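The method above either passes a fixed mask through or calls a user-supplied function with the requested targets. A small sketch of that dispatch, with `resolve_mask` and `bright_pixels_mask` as hypothetical names and nested lists standing in for image arrays:

```python
def resolve_mask(mask, **params):
    """Return a fixed mask unchanged, or call a mask function with the targets."""
    if not callable(mask):
        return mask
    return mask(**params)


def bright_pixels_mask(image, **kwargs):
    """Hypothetical mask function: select pixels brighter than 127."""
    return [[1 if px > 127 else 0 for px in row] for row in image]
```

A mask function must accept `image`; accepting `**kwargs` as well lets it tolerate any extra targets named in `mask_params`.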

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("mode", "by_channels", "mask", "mask_params")

`class FancyPCA` `(alpha=0.1, p=0.5, always_apply=None)` [view source on GitHub] ¶

Augments an RGB image using FancyPCA from Krizhevsky's paper "ImageNet Classification with Deep Convolutional Neural Networks".

Parameters:

Name	Type	Description
`alpha`	`float`	How much to perturb/scale the eigenvectors and eigenvalues. The scale is sampled from a Gaussian distribution (mu=0, sigma=alpha).

Targets

image

Image types: 3-channel uint8 images only

Credit

http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
https://deshanadesai.github.io/notes/Fancy-PCA-with-Scikit-Image
https://pixelatedbrian.github.io/2018-04-29-fancy_pca/

Source code in albumentations/augmentations/transforms.py

Python

class FancyPCA(ImageOnlyTransform):
    """Augment RGB image using FancyPCA from Krizhevsky's paper
    "ImageNet Classification with Deep Convolutional Neural Networks"

    Args:
        alpha:  how much to perturb/scale the eigenvectors and eigenvalues.
            The scale is sampled from a Gaussian distribution (mu=0, sigma=alpha).

    Targets:
        image

    Image types:
        3-channel uint8 images only

    Credit:
        http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
        https://deshanadesai.github.io/notes/Fancy-PCA-with-Scikit-Image
        https://pixelatedbrian.github.io/2018-04-29-fancy_pca/

    """

    class InitSchema(BaseTransformInitSchema):
        alpha: float = Field(default=0.1, description="Scale for perturbing the eigen vectors and values", ge=0)

    def __init__(self, alpha: float = 0.1, p: float = 0.5, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.alpha = alpha

    def apply(self, img: np.ndarray, alpha: float, **params: Any) -> np.ndarray:
        return fmain.fancy_pca(img, alpha)

    def get_params(self) -> dict[str, float]:
        return {"alpha": random.gauss(0, self.alpha)}

    def get_transform_init_args_names(self) -> tuple[str]:
        return ("alpha",)

`apply (self, img, alpha, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, alpha: float, **params: Any) -> np.ndarray:
    return fmain.fancy_pca(img, alpha)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, float]:
    return {"alpha": random.gauss(0, self.alpha)}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str]:
    return ("alpha",)

`class FromFloat` `(dtype='uint16', max_value=None, always_apply=None, p=1.0)` [view source on GitHub] ¶

Takes an input array whose values should lie in the range [0, 1.0], multiplies them by max_value, and then casts the result to the type specified by dtype. If max_value is None, the transform tries to infer the maximum value for the data type from the dtype argument.

This is the inverse transform of ToFloat.

Parameters:

Name	Type	Description
`max_value`	`float \| None`	Value by which the input is multiplied, i.e. the maximum possible output value. If None, it is inferred from dtype. Default: None.
`dtype`	`Literal['uint8', 'uint16', 'float32', 'float64']`	Data type of the output. See the 'Data types' page from the NumPy docs. Default: 'uint16'.
`p`	`float`	probability of applying the transform. Default: 1.0.
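The multiply-then-cast step can be sketched without NumPy. The body of `fmain.from_float` is not shown here, so the maximum-value table below, the restriction to integer dtypes, and the truncating cast are all assumptions of this sketch (the library may round instead):

```python
# Assumed maximum values for the integer dtypes covered by this sketch.
MAX_VALUES = {"uint8": 255, "uint16": 65535}


def from_float(values, dtype="uint16", max_value=None):
    """Scale floats in [0, 1] to an integer range and truncate to int."""
    if max_value is None:
        if dtype not in MAX_VALUES:
            raise ValueError(f"cannot infer max_value for dtype {dtype!r}")
        max_value = MAX_VALUES[dtype]
    # int() truncates toward zero, like a plain integer cast.
    return [int(v * max_value) for v in values]
```

For example, `from_float([0.0, 0.5, 1.0], "uint8")` yields `[0, 127, 255]` under these assumptions.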

Targets

image

Image types: float32

'Data types' page from the NumPy docs: https://docs.scipy.org/doc/numpy/user/basics.types.html

Source code in albumentations/augmentations/transforms.py

Python

class FromFloat(ImageOnlyTransform):
    """Take an input array where all values should lie in the range [0, 1.0], multiply them by `max_value` and then
    cast the resulted value to a type specified by `dtype`. If `max_value` is None the transform will try to infer
    the maximum value for the data type from the `dtype` argument.

    This is the inverse transform for :class:`~albumentations.augmentations.transforms.ToFloat`.

    Args:
        max_value: maximum possible input value. Default: None.
        dtype: data type of the output. See the `'Data types' page from the NumPy docs`_.
            Default: 'uint16'.
        p: probability of applying the transform. Default: 1.0.

    Targets:
        image

    Image types:
        float32

    .. _'Data types' page from the NumPy docs:
       https://docs.scipy.org/doc/numpy/user/basics.types.html

    """

    class InitSchema(BaseTransformInitSchema):
        dtype: Literal["uint8", "uint16", "float32", "float64"]
        max_value: float | None = Field(default=None, description="Maximum possible input value.")
        p: ProbabilityType = 1

    def __init__(
        self,
        dtype: Literal["uint8", "uint16", "float32", "float64"] = "uint16",
        max_value: float | None = None,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.dtype = np.dtype(dtype)
        self.max_value = max_value

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.from_float(img, self.dtype, self.max_value)

    def get_transform_init_args(self) -> dict[str, Any]:
        return {"dtype": self.dtype.name, "max_value": self.max_value}

`apply (self, img, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.from_float(img, self.dtype, self.max_value)

`class GaussNoise` `(var_limit=(10.0, 50.0), mean=0, per_channel=True, noise_scale_factor=1, always_apply=None, p=0.5)` [view source on GitHub] ¶

Apply Gaussian noise to the input image.

Parameters:

Name	Type	Description
`var_limit`	`Union[float, tuple[float, float]]`	Variance range for noise. If var_limit is a single float, the range will be (0, var_limit). Default: (10.0, 50.0).
`mean`	`float`	Mean of the noise. Default: 0
`per_channel`	`bool`	If set to True, noise will be sampled for each channel independently. Otherwise, the noise will be sampled once for all channels. Faster when `per_channel = False`. Default: True
`noise_scale_factor`	`float`	Scaling factor for noise generation. Value should be in the range (0, 1]. When set to 1, noise is sampled for each pixel independently. If less, noise is sampled for a smaller size and resized to fit the shape of the image. Smaller values make the transform faster. Default: 1.0.
`p`	`float`	Probability of applying the transform. Default: 0.5.
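The parameters above describe a variance range, while the noise itself is drawn with a standard deviation. A standard-library sketch of that sampling step, assuming a scalar `var_limit` expands to `(0, var_limit)` as the table states; `normalize_var_limit` and `sample_noise` are hypothetical helpers, and resizing for `noise_scale_factor < 1` is left out:

```python
import math
import random


def normalize_var_limit(var_limit):
    """Expand a scalar variance limit to the range (0, var_limit)."""
    if isinstance(var_limit, (int, float)):
        return (0.0, float(var_limit))
    low, high = var_limit
    return (float(low), float(high))


def sample_noise(var_limit, mean=0.0, size=4, rng=None):
    """Draw one variance from the range, then `size` Gaussian noise values."""
    rng = rng or random.Random()
    low, high = normalize_var_limit(var_limit)
    variance = rng.uniform(low, high)
    # The Gaussian sampler takes a standard deviation, not a variance.
    sigma = math.sqrt(variance)
    return sigma, [rng.gauss(mean, sigma) for _ in range(size)]
```

With the default range `(10.0, 50.0)`, sigma falls between about 3.16 and 7.07 intensity levels.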

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py

Python

class GaussNoise(ImageOnlyTransform):
    """Apply Gaussian noise to the input image.

    Args:
        var_limit (Union[float, tuple[float, float]]): Variance range for noise.
            If var_limit is a single float, the range will be (0, var_limit). Default: (10.0, 50.0).
        mean (float): Mean of the noise. Default: 0
        per_channel (bool): If set to True, noise will be sampled for each channel independently.
            Otherwise, the noise will be sampled once for all channels.
            Faster when `per_channel = False`.
            Default: True
        noise_scale_factor (float): Scaling factor for noise generation. Value should be in the range (0, 1].
            When set to 1, noise is sampled for each pixel independently. If less, noise is sampled for a smaller size
            and resized to fit the shape of the image. Smaller values make the transform faster. Default: 1.0.
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BaseTransformInitSchema):
        var_limit: NonNegativeFloatRangeType = Field(default=(10.0, 50.0), description="Variance range for noise.")
        mean: float = Field(default=0, description="Mean of the noise.")
        per_channel: bool = Field(default=True, description="Apply noise per channel.")
        noise_scale_factor: float = Field(gt=0, le=1)

    def __init__(
        self,
        var_limit: ScaleFloatType = (10.0, 50.0),
        mean: float = 0,
        per_channel: bool = True,
        noise_scale_factor: float = 1,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.var_limit = cast(Tuple[float, float], var_limit)
        self.mean = mean
        self.per_channel = per_channel
        self.noise_scale_factor = noise_scale_factor

    def apply(self, img: np.ndarray, gauss: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.add_noise(img, gauss)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, float]:
        image = data["image"] if "image" in data else data["images"][0]
        var = random.uniform(self.var_limit[0], self.var_limit[1])
        sigma = math.sqrt(var)

        if self.per_channel:
            target_shape = image.shape
            if self.noise_scale_factor == 1:
                gauss = random_utils.normal(self.mean, sigma, target_shape)
            else:
                gauss = fmain.generate_approx_gaussian_noise(target_shape, self.mean, sigma, self.noise_scale_factor)
        else:
            target_shape = image.shape[:2]
            if self.noise_scale_factor == 1:
                gauss = random_utils.normal(self.mean, sigma, target_shape)
            else:
                gauss = fmain.generate_approx_gaussian_noise(target_shape, self.mean, sigma, self.noise_scale_factor)

            if image.ndim > MONO_CHANNEL_DIMENSIONS:
                gauss = np.expand_dims(gauss, -1)

        return {"gauss": gauss}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "var_limit", "per_channel", "mean", "noise_scale_factor"

`apply (self, img, gauss, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, gauss: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.add_noise(img, gauss)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, float]:
    image = data["image"] if "image" in data else data["images"][0]
    var = random.uniform(self.var_limit[0], self.var_limit[1])
    sigma = math.sqrt(var)

    if self.per_channel:
        target_shape = image.shape
        if self.noise_scale_factor == 1:
            gauss = random_utils.normal(self.mean, sigma, target_shape)
        else:
            gauss = fmain.generate_approx_gaussian_noise(target_shape, self.mean, sigma, self.noise_scale_factor)
    else:
        target_shape = image.shape[:2]
        if self.noise_scale_factor == 1:
            gauss = random_utils.normal(self.mean, sigma, target_shape)
        else:
            gauss = fmain.generate_approx_gaussian_noise(target_shape, self.mean, sigma, self.noise_scale_factor)

        if image.ndim > MONO_CHANNEL_DIMENSIONS:
            gauss = np.expand_dims(gauss, -1)

    return {"gauss": gauss}
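
When `noise_scale_factor` is below 1, the listing above delegates to `fmain.generate_approx_gaussian_noise`, whose body is not shown in this section. The sketch below illustrates the general idea of sampling noise on a smaller grid and stretching it to the image size. The function name `approx_gaussian_noise` and the nearest-neighbour upsampling are assumptions made for illustration only.

```python
import numpy as np


def approx_gaussian_noise(shape, mean, sigma, scale_factor, rng):
    """Hypothetical sketch: sample low-resolution noise and upsample it."""
    height, width = shape[:2]
    small_h = max(1, int(height * scale_factor))
    small_w = max(1, int(width * scale_factor))
    # Sample far fewer values than the image has pixels.
    small = rng.normal(mean, sigma, (small_h, small_w) + tuple(shape[2:]))
    # Nearest-neighbour upsampling: map each output row/column to a source index.
    rows = np.arange(height) * small_h // height
    cols = np.arange(width) * small_w // width
    return small[rows][:, cols]


rng = np.random.default_rng(0)
noise = approx_gaussian_noise((64, 64, 3), mean=0.0, sigma=5.0, scale_factor=0.5, rng=rng)
```

Fewer random draws are needed, which is why smaller factors are faster; the cost is that neighbouring pixels share noise values.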

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "var_limit", "per_channel", "mean", "noise_scale_factor"

`class HueSaturationValue` `(hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, always_apply=None, p=0.5)` [view source on GitHub] ¶

Randomly change hue, saturation and value of the input image.

Parameters:

Name	Type	Description
`hue_shift_limit`	`ScaleIntType`	Range for changing hue. If hue_shift_limit is a single int, the range will be (-hue_shift_limit, hue_shift_limit). Default: (-20, 20).
`sat_shift_limit`	`ScaleIntType`	Range for changing saturation. If sat_shift_limit is a single int, the range will be (-sat_shift_limit, sat_shift_limit). Default: (-30, 30).
`val_shift_limit`	`ScaleIntType`	Range for changing value. If val_shift_limit is a single int, the range will be (-val_shift_limit, val_shift_limit). Default: (-20, 20).
`p`	`float`	Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32
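
The way a single integer limit expands into a symmetric range, and how one shift per component is then drawn, can be sketched with the standard library alone. `to_symmetric_range` and `sample_hsv_shifts` are hypothetical names used for illustration; they are not part of the library.

```python
import random


def to_symmetric_range(limit):
    """Hypothetical helper: expand a scalar to (-limit, limit); pass tuples through."""
    if isinstance(limit, (int, float)):
        return (-abs(limit), abs(limit))
    low, high = limit
    return (low, high)


def sample_hsv_shifts(hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20):
    """Draw one uniform shift per HSV component."""
    hue = to_symmetric_range(hue_shift_limit)
    sat = to_symmetric_range(sat_shift_limit)
    val = to_symmetric_range(val_shift_limit)
    return {
        "hue_shift": random.uniform(hue[0], hue[1]),
        "sat_shift": random.uniform(sat[0], sat[1]),
        "val_shift": random.uniform(val[0], val[1]),
    }


shifts = sample_hsv_shifts(hue_shift_limit=20, sat_shift_limit=(-5, 10))
```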

Source code in albumentations/augmentations/transforms.py

Python

class HueSaturationValue(ImageOnlyTransform):
    """Randomly change hue, saturation and value of the input image.

    Args:
        hue_shift_limit: range for changing hue. If hue_shift_limit is a single int, the range
            will be (-hue_shift_limit, hue_shift_limit). Default: (-20, 20).
        sat_shift_limit: range for changing saturation. If sat_shift_limit is a single int,
            the range will be (-sat_shift_limit, sat_shift_limit). Default: (-30, 30).
        val_shift_limit: range for changing value. If val_shift_limit is a single int, the range
            will be (-val_shift_limit, val_shift_limit). Default: (-20, 20).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BaseTransformInitSchema):
        hue_shift_limit: SymmetricRangeType = (-20, 20)
        sat_shift_limit: SymmetricRangeType = (-30, 30)
        val_shift_limit: SymmetricRangeType = (-20, 20)

    def __init__(
        self,
        hue_shift_limit: ScaleIntType = 20,
        sat_shift_limit: ScaleIntType = 30,
        val_shift_limit: ScaleIntType = 20,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.hue_shift_limit = cast(Tuple[float, float], hue_shift_limit)
        self.sat_shift_limit = cast(Tuple[float, float], sat_shift_limit)
        self.val_shift_limit = cast(Tuple[float, float], val_shift_limit)

    def apply(
        self,
        img: np.ndarray,
        hue_shift: int,
        sat_shift: int,
        val_shift: int,
        **params: Any,
    ) -> np.ndarray:
        if not is_rgb_image(img) and not is_grayscale_image(img):
            msg = "HueSaturationValue transformation expects 1-channel or 3-channel images."
            raise TypeError(msg)
        return fmain.shift_hsv(img, hue_shift, sat_shift, val_shift)

    def get_params(self) -> dict[str, float]:
        return {
            "hue_shift": random.uniform(self.hue_shift_limit[0], self.hue_shift_limit[1]),
            "sat_shift": random.uniform(self.sat_shift_limit[0], self.sat_shift_limit[1]),
            "val_shift": random.uniform(self.val_shift_limit[0], self.val_shift_limit[1]),
        }

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("hue_shift_limit", "sat_shift_limit", "val_shift_limit")

`apply (self, img, hue_shift, sat_shift, val_shift, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    hue_shift: int,
    sat_shift: int,
    val_shift: int,
    **params: Any,
) -> np.ndarray:
    if not is_rgb_image(img) and not is_grayscale_image(img):
        msg = "HueSaturationValue transformation expects 1-channel or 3-channel images."
        raise TypeError(msg)
    return fmain.shift_hsv(img, hue_shift, sat_shift, val_shift)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, float]:
    return {
        "hue_shift": random.uniform(self.hue_shift_limit[0], self.hue_shift_limit[1]),
        "sat_shift": random.uniform(self.sat_shift_limit[0], self.sat_shift_limit[1]),
        "val_shift": random.uniform(self.val_shift_limit[0], self.val_shift_limit[1]),
    }

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("hue_shift_limit", "sat_shift_limit", "val_shift_limit")

`class ISONoise` `(color_shift=(0.01, 0.05), intensity=(0.1, 0.5), always_apply=None, p=0.5)` [view source on GitHub] ¶

Apply camera sensor noise.

Parameters:

Name	Type	Description
`color_shift`	`tuple[float, float]`	Variance range for the color hue change, measured as a fraction of the 360-degree hue angle in the HLS color space. Default: (0.01, 0.05).
`intensity`	`tuple[float, float]`	Multiplicative factor that controls the strength of color and luminance noise. Default: (0.1, 0.5).
`p`	`float`	Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Exceptions:

Type	Description
`TypeError`	If the input image is not RGB.
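
The source listing for this class shows that each call samples `color_shift`, `intensity`, and a random seed, and that the noise itself is drawn from `np.random.RandomState(random_seed)`. The sketch below illustrates why passing a seed makes the noise reproducible. `sample_iso_params` is a hypothetical helper, and `random.randint` is an assumed stand-in for `random_utils.get_random_seed()`.

```python
import random

import numpy as np


def sample_iso_params(color_shift=(0.01, 0.05), intensity=(0.1, 0.5)):
    """Hypothetical helper mirroring the shape of the sampled parameters."""
    return {
        "color_shift": random.uniform(color_shift[0], color_shift[1]),
        "intensity": random.uniform(intensity[0], intensity[1]),
        # Assumption: any integer valid as a RandomState seed will do here.
        "random_seed": random.randint(0, 2**32 - 1),
    }


params = sample_iso_params()
# Two generators built from the same seed produce identical noise.
first = np.random.RandomState(params["random_seed"]).normal(size=(2, 2))
second = np.random.RandomState(params["random_seed"]).normal(size=(2, 2))
```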

Source code in albumentations/augmentations/transforms.py

Python

class ISONoise(ImageOnlyTransform):
    """Apply camera sensor noise.

    Args:
        color_shift (float, float): variance range for color hue change.
            Measured as a fraction of 360 degree Hue angle in HLS colorspace.
        intensity (float, float): Multiplicative factor that controls the strength
            of color and luminance noise.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Raises:
        TypeError: If the input image is not RGB.

    """

    class InitSchema(BaseTransformInitSchema):
        color_shift: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)] = Field(
            default=(0.01, 0.05),
            description=(
                "Variance range for color hue change. Measured as a fraction of 360 degree Hue angle in HLS colorspace."
            ),
        )
        intensity: Annotated[tuple[float, float], AfterValidator(check_0plus), AfterValidator(nondecreasing)] = Field(
            default=(0.1, 0.5),
            description="Multiplicative factor that control strength of color and luminance noise.",
        )

    def __init__(
        self,
        color_shift: tuple[float, float] = (0.01, 0.05),
        intensity: tuple[float, float] = (0.1, 0.5),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.intensity = intensity
        self.color_shift = color_shift

    def apply(
        self,
        img: np.ndarray,
        color_shift: float,
        intensity: float,
        random_seed: int,
        **params: Any,
    ) -> np.ndarray:
        return fmain.iso_noise(img, color_shift, intensity, np.random.RandomState(random_seed))

    def get_params(self) -> dict[str, Any]:
        return {
            "color_shift": random.uniform(self.color_shift[0], self.color_shift[1]),
            "intensity": random.uniform(self.intensity[0], self.intensity[1]),
            "random_seed": random_utils.get_random_seed(),
        }

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return "intensity", "color_shift"

`apply (self, img, color_shift, intensity, random_seed, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    color_shift: float,
    intensity: float,
    random_seed: int,
    **params: Any,
) -> np.ndarray:
    return fmain.iso_noise(img, color_shift, intensity, np.random.RandomState(random_seed))

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, Any]:
    return {
        "color_shift": random.uniform(self.color_shift[0], self.color_shift[1]),
        "intensity": random.uniform(self.intensity[0], self.intensity[1]),
        "random_seed": random_utils.get_random_seed(),
    }

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str]:
    return "intensity", "color_shift"

`class ImageCompression` `(quality_lower=None, quality_upper=None, compression_type=<ImageCompressionType.JPEG: 0>, quality_range=(99, 100), always_apply=None, p=0.5)` [view source on GitHub] ¶

Decrease image quality by applying JPEG or WebP compression to the image.

Parameters:

Name	Type	Description
`quality_range`	`tuple[int, int]`	Bounds on the image quality, given as (quality_lower, quality_upper). Both values should be in the [1, 100] range. Default: (99, 100).
`compression_type`	`ImageCompressionType`	Should be ImageCompressionType.JPEG or ImageCompressionType.WEBP. Default: ImageCompressionType.JPEG.

Targets

image

Image types: uint8, float32
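
The signature still accepts the deprecated `quality_lower` and `quality_upper` arguments. According to the validator in the source listing for this class, any value given for them overrides the matching end of `quality_range`. The standalone function below is a simplified sketch of that merge; `resolve_quality_range` and `MAX_QUALITY` are hypothetical names, not library API.

```python
import warnings

MAX_QUALITY = 100  # upper end of the documented [1, 100] range


def resolve_quality_range(quality_range=(99, 100), quality_lower=None, quality_upper=None):
    """Hypothetical sketch: fold deprecated bounds into a single range."""
    for name, value in (("quality_lower", quality_lower), ("quality_upper", quality_upper)):
        if value is not None:
            warnings.warn(
                f"`{name}` is deprecated. Use `quality_range` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
    lower = quality_lower if quality_lower is not None else quality_range[0]
    upper = quality_upper if quality_upper is not None else quality_range[1]
    if not (1 <= lower <= MAX_QUALITY and 1 <= upper <= MAX_QUALITY):
        raise ValueError(f"Quality range values should be within [1, {MAX_QUALITY}] range.")
    return (lower, upper)


with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    merged = resolve_quality_range(quality_lower=50)
```

New code should pass `quality_range=(50, 100)` directly rather than relying on the deprecated arguments.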

Source code in albumentations/augmentations/transforms.py

Python

class ImageCompression(ImageOnlyTransform):
    """Decreases image quality by Jpeg, WebP compression of an image.

    Args:
        quality_range: tuple of bounds on the image quality i.e. (quality_lower, quality_upper).
            Both values should be in [1, 100] range.
        compression_type (ImageCompressionType): should be ImageCompressionType.JPEG or ImageCompressionType.WEBP.
            Default: ImageCompressionType.JPEG

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BaseTransformInitSchema):
        quality_range: Annotated[tuple[int, int], AfterValidator(check_1plus), AfterValidator(nondecreasing)] = (
            99,
            100,
        )

        quality_lower: int | None = Field(
            default=None,
            description="Lower bound on the image quality",
            ge=1,
            le=100,
        )
        quality_upper: int | None = Field(
            default=None,
            description="Upper bound on the image quality",
            ge=1,
            le=100,
        )
        compression_type: ImageCompressionType = Field(
            default=ImageCompressionType.JPEG,
            description="Image compression format",
        )

        @model_validator(mode="after")
        def validate_ranges(self) -> Self:
            # Update the quality_range based on the non-None values of quality_lower and quality_upper
            if self.quality_lower is not None or self.quality_upper is not None:
                if self.quality_lower is not None:
                    warn(
                        "`quality_lower` is deprecated. Use `quality_range` as tuple"
                        " (quality_lower, quality_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                if self.quality_upper is not None:
                    warn(
                        "`quality_upper` is deprecated. Use `quality_range` as tuple"
                        " (quality_lower, quality_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                lower = self.quality_lower if self.quality_lower is not None else self.quality_range[0]
                upper = self.quality_upper if self.quality_upper is not None else self.quality_range[1]
                self.quality_range = (lower, upper)
                # Clear the deprecated individual quality settings
                self.quality_lower = None
                self.quality_upper = None

            # Validate the quality_range
            if not (1 <= self.quality_range[0] <= MAX_JPEG_QUALITY and 1 <= self.quality_range[1] <= MAX_JPEG_QUALITY):
                raise ValueError(f"Quality range values should be within [1, {MAX_JPEG_QUALITY}] range.")

            return self

    def __init__(
        self,
        quality_lower: int | None = None,
        quality_upper: int | None = None,
        compression_type: ImageCompressionType = ImageCompressionType.JPEG,
        quality_range: tuple[int, int] = (99, 100),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.quality_range = quality_range
        self.compression_type = compression_type

    def apply(self, img: np.ndarray, quality: int, image_type: Literal[".jpg", ".webp"], **params: Any) -> np.ndarray:
        if img.ndim != MONO_CHANNEL_DIMENSIONS and img.shape[-1] not in (1, 3, 4):
            msg = "ImageCompression transformation expects 1, 3 or 4 channel images."
            raise TypeError(msg)
        return fmain.image_compression(img, quality, image_type)

    def get_params(self) -> dict[str, int | str]:
        if self.compression_type == ImageCompressionType.JPEG:
            image_type = ".jpg"
        elif self.compression_type == ImageCompressionType.WEBP:
            image_type = ".webp"
        else:
            raise ValueError(f"Unknown image compression type: {self.compression_type}")

        return {
            "quality": random.randint(self.quality_range[0], self.quality_range[1]),
            "image_type": image_type,
        }

    def get_transform_init_args(self) -> dict[str, Any]:
        return {
            "quality_range": self.quality_range,
            "compression_type": self.compression_type.value,
        }

`apply (self, img, quality, image_type, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, quality: int, image_type: Literal[".jpg", ".webp"], **params: Any) -> np.ndarray:
    if img.ndim != MONO_CHANNEL_DIMENSIONS and img.shape[-1] not in (1, 3, 4):
        msg = "ImageCompression transformation expects 1, 3 or 4 channel images."
        raise TypeError(msg)
    return fmain.image_compression(img, quality, image_type)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, int | str]:
    if self.compression_type == ImageCompressionType.JPEG:
        image_type = ".jpg"
    elif self.compression_type == ImageCompressionType.WEBP:
        image_type = ".webp"
    else:
        raise ValueError(f"Unknown image compression type: {self.compression_type}")

    return {
        "quality": random.randint(self.quality_range[0], self.quality_range[1]),
        "image_type": image_type,
    }
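
The parameter selection above can be reproduced without the library to see what each call yields. In this sketch `CompressionKind` is a hypothetical stand-in for `ImageCompressionType`; `JPEG = 0` matches the class signature shown earlier, while `WEBP = 1` is an assumption.

```python
import random
from enum import Enum


class CompressionKind(Enum):
    """Hypothetical stand-in for ImageCompressionType."""

    JPEG = 0
    WEBP = 1  # assumed value


EXTENSIONS = {CompressionKind.JPEG: ".jpg", CompressionKind.WEBP: ".webp"}


def sample_compression_params(kind, quality_range=(99, 100)):
    """Pick a file extension for the codec and an integer quality in the range."""
    if kind not in EXTENSIONS:
        raise ValueError(f"Unknown image compression type: {kind}")
    return {
        # random.randint includes both endpoints.
        "quality": random.randint(quality_range[0], quality_range[1]),
        "image_type": EXTENSIONS[kind],
    }


webp_params = sample_compression_params(CompressionKind.WEBP, quality_range=(40, 60))
```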

`class InvertImg` [view source on GitHub] ¶

Invert the input image by subtracting pixel values from the maximum value of the image's data type, i.e., 255 for uint8 and 1.0 for float32.

Parameters:

Name	Type	Description
`p`	`float`	Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32
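
The inversion rule stated above is a single subtraction per data type. The following NumPy sketch shows it; the function name `invert` is chosen for illustration, and float32 input is assumed to lie in [0, 1].

```python
import numpy as np


def invert(img):
    """Subtract pixel values from the maximum value of the image's data type."""
    if img.dtype == np.uint8:
        return 255 - img
    # Assumption: float32 images are scaled to the [0, 1] range.
    return 1.0 - img


inverted_u8 = invert(np.array([0, 100, 255], dtype=np.uint8))
inverted_f32 = invert(np.array([0.0, 0.25, 1.0], dtype=np.float32))
```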

Source code in albumentations/augmentations/transforms.py

Python

class InvertImg(ImageOnlyTransform):
    """Invert the input image by subtracting pixel values from max values of the image types,
    i.e., 255 for uint8 and 1.0 for float32.

    Args:
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.invert(img)

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()

`apply (self, img, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.invert(img)

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[()]:
    return ()

`class Lambda` `(image=None, mask=None, keypoint=None, bbox=None, global_label=None, name=None, always_apply=None, p=1.0)` [view source on GitHub] ¶

A flexible transformation class that applies user-defined functions to individual targets. Each function's signature must include **kwargs so that it can accept optional arguments such as the interpolation method or image size.

Parameters:

Name	Type	Description
`image`	`Callable[..., Any] \| None`	Image transformation function.
`mask`	`Callable[..., Any] \| None`	Mask transformation function.
`keypoint`	`Callable[..., Any] \| None`	Keypoint transformation function.
`bbox`	`Callable[..., Any] \| None`	BBox transformation function.
`global_label`	`Callable[..., Any] \| None`	Global label transformation function.
`p`	`float`	probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints, global_label

Image types: Any
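
The source listing for this class warns when a target function is an anonymous lambda, because lambdas are incompatible with multiprocessing, and suggests regular functions or `partial()` instead. The sketch below shows a named function that accepts `**kwargs`, a `partial` built from it, and the same check the constructor uses. `scale_pixels` and `is_anonymous_lambda` are hypothetical names, and nested lists stand in for image arrays to keep the example dependency-free.

```python
from functools import partial
from types import LambdaType


def scale_pixels(img, factor=1.0, **kwargs):
    """A named callable; **kwargs absorbs any extra parameters it is handed."""
    return [[value * factor for value in row] for row in img]


def is_anonymous_lambda(fn):
    """Same test as in the constructor: a function object named '<lambda>'."""
    return isinstance(fn, LambdaType) and fn.__name__ == "<lambda>"


# partial() fixes the factor while keeping a non-lambda callable.
halve = partial(scale_pixels, factor=0.5)
# Extra keyword arguments (illustrative names) are accepted and ignored.
result = halve([[2, 4], [6, 8]], interpolation=1, shape=(2, 2))
```

A callable like `halve` would then be passed as `Lambda(image=halve, ...)`.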

Source code in albumentations/augmentations/transforms.py

Python

class Lambda(NoOp):
    """A flexible transformation class for using user-defined transformation functions per targets.
    Function signature must include **kwargs to accept optional arguments like interpolation method, image size, etc:

    Args:
        image: Image transformation function.
        mask: Mask transformation function.
        keypoint: Keypoint transformation function.
        bbox: BBox transformation function.
        global_label: Global label transformation function.
        p: probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints, global_label

    Image types:
        Any

    """

    def __init__(
        self,
        image: Callable[..., Any] | None = None,
        mask: Callable[..., Any] | None = None,
        keypoint: Callable[..., Any] | None = None,
        bbox: Callable[..., Any] | None = None,
        global_label: Callable[..., Any] | None = None,
        name: str | None = None,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p, always_apply)

        self.name = name
        self.custom_apply_fns = {
            target_name: fmain.noop for target_name in ("image", "mask", "keypoint", "bbox", "global_label")
        }
        for target_name, custom_apply_fn in {
            "image": image,
            "mask": mask,
            "keypoint": keypoint,
            "bbox": bbox,
            "global_label": global_label,
        }.items():
            if custom_apply_fn is not None:
                if isinstance(custom_apply_fn, LambdaType) and custom_apply_fn.__name__ == "<lambda>":
                    warnings.warn(
                        "Using lambda is incompatible with multiprocessing. "
                        "Consider using regular functions or partial().",
                        stacklevel=2,
                    )

                self.custom_apply_fns[target_name] = custom_apply_fn

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        fn = self.custom_apply_fns["image"]
        return fn(img, **params)

    def apply_to_mask(self, mask: np.ndarray, **params: Any) -> np.ndarray:
        fn = self.custom_apply_fns["mask"]
        return fn(mask, **params)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        fn = self.custom_apply_fns["bbox"]
        return fn(bbox, **params)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        fn = self.custom_apply_fns["keypoint"]
        return fn(keypoint, **params)

    def apply_to_global_label(self, label: np.ndarray, **params: Any) -> np.ndarray:
        fn = self.custom_apply_fns["global_label"]
        return fn(label, **params)

    @classmethod
    def is_serializable(cls) -> bool:
        return False

    def to_dict_private(self) -> dict[str, Any]:
        if self.name is None:
            msg = (
                "To make a Lambda transform serializable you should provide the `name` argument, "
                "e.g. `Lambda(name='my_transform', image=<some func>, ...)`."
            )
            raise ValueError(msg)
        return {"__class_fullname__": self.get_class_fullname(), "__name__": self.name}

    def __repr__(self) -> str:
        state = {"name": self.name}
        state.update(self.custom_apply_fns.items())  # type: ignore[arg-type]
        state.update(self.get_base_init_args())
        return f"{self.__class__.__name__}({format_args(state)})"

`__init__ (self, image=None, mask=None, keypoint=None, bbox=None, global_label=None, name=None, always_apply=None, p=1.0)` `special` ¶

Initialize self. See help(type(self)) for accurate signature.

Source code in albumentations/augmentations/transforms.py

Python

def __init__(
    self,
    image: Callable[..., Any] | None = None,
    mask: Callable[..., Any] | None = None,
    keypoint: Callable[..., Any] | None = None,
    bbox: Callable[..., Any] | None = None,
    global_label: Callable[..., Any] | None = None,
    name: str | None = None,
    always_apply: bool | None = None,
    p: float = 1.0,
):
    super().__init__(p, always_apply)

    self.name = name
    self.custom_apply_fns = {
        target_name: fmain.noop for target_name in ("image", "mask", "keypoint", "bbox", "global_label")
    }
    for target_name, custom_apply_fn in {
        "image": image,
        "mask": mask,
        "keypoint": keypoint,
        "bbox": bbox,
        "global_label": global_label,
    }.items():
        if custom_apply_fn is not None:
            if isinstance(custom_apply_fn, LambdaType) and custom_apply_fn.__name__ == "<lambda>":
                warnings.warn(
                    "Using lambda is incompatible with multiprocessing. "
                    "Consider using regular functions or partial().",
                    stacklevel=2,
                )

            self.custom_apply_fns[target_name] = custom_apply_fn

`apply (self, img, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    fn = self.custom_apply_fns["image"]
    return fn(img, **params)

`class Morphological` `(scale=(2, 3), operation='dilation', always_apply=None, p=0.5)` [view source on GitHub] ¶

Apply a morphological operation (dilation or erosion) to an image, with particular value for enhancing document scans.

Morphological operations modify the structure of the image. Dilation expands the white (foreground) regions in a binary or grayscale image, while erosion shrinks them. These operations are beneficial in document processing, for example:

- Dilation helps in closing up gaps within text or making thin lines thicker, enhancing legibility for OCR (Optical Character Recognition).
- Erosion can remove small white noise and detach connected objects, making the structure of larger objects more pronounced.

Parameters:

Name	Type	Description
`scale`	`int or tuple/list of int`	Specifies the size of the structuring element (kernel) used for the operation. If an integer is provided, a square kernel of that size will be used. If a tuple or list is provided, it should contain two integers representing the minimum and maximum kernel sizes.
`operation`	`str`	The morphological operation to apply. Options are 'dilation' or 'erosion'. Default is 'dilation'.
`p`	`float`	The probability of applying this transformation. Default is 0.5.

Targets

image, mask

Image types: uint8, float32

Reference

https://github.com/facebookresearch/nougat

Examples:

Python

>>> import albumentations as A
>>> transform = A.Compose([
>>>     A.Morphological(scale=(2, 3), operation='dilation', p=0.5)
>>> ])
>>> image = transform(image=image)["image"]
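
To make the effect of dilation and erosion concrete, here is a small NumPy-only sketch. It is not the library's implementation: the source listing for this class builds its kernel with `cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ...)`, whereas this example assumes a square kernel of odd size and edge-replicated borders. `square_morphology` is a hypothetical name.

```python
import numpy as np


def square_morphology(img, size, operation):
    """Hypothetical sketch: dilation/erosion of a 2-D image with a square kernel."""
    if operation not in ("dilation", "erosion"):
        raise ValueError(f"Unsupported operation: {operation}")
    # Dilation takes the neighbourhood maximum, erosion the minimum.
    reduce = np.maximum if operation == "dilation" else np.minimum
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    height, width = img.shape
    out = padded[0:height, 0:width].copy()
    # Combine every shifted view of the image covered by the kernel.
    for dy in range(size):
        for dx in range(size):
            out = reduce(out, padded[dy:dy + height, dx:dx + width])
    return out


dot = np.zeros((5, 5), dtype=np.uint8)
dot[2, 2] = 255
dilated = square_morphology(dot, 3, "dilation")  # the dot grows to a 3x3 block
eroded = square_morphology(dilated, 3, "erosion")  # the block shrinks back to a dot
```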

Source code in albumentations/augmentations/transforms.py

Python

class Morphological(DualTransform):
    """Apply a morphological operation (dilation or erosion) to an image,
    with particular value for enhancing document scans.

    Morphological operations modify the structure of the image.
    Dilation expands the white (foreground) regions in a binary or grayscale image, while erosion shrinks them.
    These operations are beneficial in document processing, for example:
    - Dilation helps in closing up gaps within text or making thin lines thicker,
        enhancing legibility for OCR (Optical Character Recognition).
    - Erosion can remove small white noise and detach connected objects,
        making the structure of larger objects more pronounced.

    Args:
        scale (int or tuple/list of int): Specifies the size of the structuring element (kernel) used for the operation.
            - If an integer is provided, a square kernel of that size will be used.
            - If a tuple or list is provided, it should contain two integers representing the minimum
                and maximum sizes for the dilation kernel.
        operation (str, optional): The morphological operation to apply. Options are 'dilation' or 'erosion'.
            Default is 'dilation'.
        p (float, optional): The probability of applying this transformation. Default is 0.5.

    Targets:
        image, mask

    Image types:
        uint8, float32

    Reference:
        https://github.com/facebookresearch/nougat

    Example:
        >>> import albumentations as A
        >>> transform = A.Compose([
        >>>     A.Morphological(scale=(2, 3), operation='dilation', p=0.5)
        >>> ])
        >>> image = transform(image=image)["image"]
    """

    _targets = (Targets.IMAGE, Targets.MASK)

    class InitSchema(BaseTransformInitSchema):
        scale: OnePlusIntRangeType = (2, 3)
        operation: MorphologyMode = "dilation"

    def __init__(
        self,
        scale: ScaleIntType = (2, 3),
        operation: MorphologyMode = "dilation",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.scale = cast(Tuple[int, int], scale)
        self.operation = operation

    def apply(self, img: np.ndarray, kernel: tuple[int, int], **params: Any) -> np.ndarray:
        return fmain.morphology(img, kernel, self.operation)

    def apply_to_mask(self, mask: np.ndarray, kernel: tuple[int, int], **params: Any) -> np.ndarray:
        return fmain.morphology(mask, kernel, self.operation)

    def get_params(self) -> dict[str, float]:
        return {
            "kernel": cv2.getStructuringElement(cv2.MORPH_ELLIPSE, self.scale),
        }

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("scale", "operation")

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
        }

`apply (self, img, kernel, **params)` ¶

Apply transform on image.
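
To make the effect of the two operations concrete, here is a minimal NumPy-only sketch of grayscale dilation and erosion with a binary kernel. It is an illustration under stated assumptions, not the code behind `fmain.morphology` (which is not shown on this page): the function name `morphology_sketch` is hypothetical, it handles 2-D images only, and it pads the border by repeating edge pixels.

```python
import numpy as np


def morphology_sketch(img, kernel, operation="dilation"):
    """Grayscale dilation/erosion of a 2-D image with a binary kernel (illustrative only)."""
    if operation not in ("dilation", "erosion"):
        raise ValueError(f"Unknown operation: {operation}")
    kh, kw = kernel.shape
    pad_y, pad_x = kh // 2, kw // 2
    # Assumption: borders are handled by repeating the edge pixels.
    padded = np.pad(img, ((pad_y, pad_y), (pad_x, pad_x)), mode="edge")
    reduce_fn = np.maximum if operation == "dilation" else np.minimum
    out = None
    for dy in range(kh):
        for dx in range(kw):
            if not kernel[dy, dx]:
                continue  # positions outside the structuring element are ignored
            window = padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
            out = window.copy() if out is None else reduce_fn(out, window)
    return out


# A single bright pixel grows into a cross under dilation ...
image = np.zeros((5, 5), dtype=np.uint8)
image[2, 2] = 255
cross = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=np.uint8)
dilated = morphology_sketch(image, cross, "dilation")
# ... and eroding the result with the same kernel shrinks it back.
eroded = morphology_sketch(dilated, cross, "erosion")
```

Dilation takes the maximum under the kernel, so bright regions grow; erosion takes the minimum, so they shrink.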

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, kernel: tuple[int, int], **params: Any) -> np.ndarray:
    return fmain.morphology(img, kernel, self.operation)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, float]:
    return {
        "kernel": cv2.getStructuringElement(cv2.MORPH_ELLIPSE, self.scale),
    }

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("scale", "operation")

`class MultiplicativeNoise` `(multiplier=(0.9, 1.1), per_channel=None, elementwise=False, always_apply=None, p=0.5)` [view source on GitHub] ¶

Multiply image by a random number or array of numbers.

Parameters:

Name	Type	Description
`multiplier`	`ScaleFloatType`	If a single float, the image will be multiplied by this number. If a tuple of floats, the multiplier will be a random number in the range `[multiplier[0], multiplier[1])`. Default: (0.9, 1.1).
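`per_channel`	`bool \| None`	Deprecated. Has no effect and will be removed in a future release.
`elementwise`	`bool`	If `False`, multiply all pixels of each channel by a single random value (one value is sampled per channel). If `True`, multiply every pixel value by its own randomly sampled value. Default: False.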
`p`	`float`	Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, np.float32
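
The sampling logic can be sketched in plain NumPy. This follows the `get_params_dependent_on_data` source listed further down, which draws one value per channel when `elementwise=False` and one value per pixel entry when `elementwise=True`. The function name `multiplicative_noise_sketch` is hypothetical, and clipping `uint8` results to `[0, 255]` is an assumption of this sketch, since the library's `multiply` helper is not shown here.

```python
import numpy as np


def multiplicative_noise_sketch(img, multiplier=(0.9, 1.1), elementwise=False, rng=None):
    """Illustrative stand-in for the transform; not the library's `multiply` helper."""
    rng = np.random.default_rng() if rng is None else rng
    low, high = multiplier
    num_channels = img.shape[2] if img.ndim == 3 else 1
    # One factor per channel by default, one factor per array entry when elementwise=True.
    shape = img.shape if elementwise else (num_channels,)
    factors = rng.uniform(low, high, size=shape).astype(np.float32)
    out = img.astype(np.float32) * factors
    if img.dtype == np.uint8:
        # Assumption: uint8 results are clipped back into [0, 255].
        return np.clip(out, 0, 255).astype(np.uint8)
    return out.astype(img.dtype)


image = np.full((4, 4, 3), 100, dtype=np.uint8)
noisy = multiplicative_noise_sketch(image, rng=np.random.default_rng(0))
noisy_elementwise = multiplicative_noise_sketch(image, elementwise=True, rng=np.random.default_rng(0))
```

With the default range, every output value stays within roughly 10% of the input, and with `elementwise=False` each channel of a constant image remains constant.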

Source code in albumentations/augmentations/transforms.py

Python

class MultiplicativeNoise(ImageOnlyTransform):
    """Multiply image by a random number or array of numbers.

    Args:
        multiplier: If a single float, the image will be multiplied by this number.
            If a tuple of floats, the multiplier will be a random number in the range `[multiplier[0], multiplier[1])`.
            Default: (0.9, 1.1).
        elementwise: If `False`, multiply all pixels of each channel by a single random value
            (one value is sampled per channel).
            If `True`, multiply every pixel value by its own randomly sampled value. Default: False.
        p: Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, np.float32

    """

    class InitSchema(BaseTransformInitSchema):
        multiplier: Annotated[tuple[float, float], AfterValidator(check_0plus), AfterValidator(nondecreasing)] = (
            0.9,
            1.1,
        )
        per_channel: bool | None = Field(
            default=False,
            description="Apply multiplier per channel.",
            deprecated="Does not have any effect. Will be removed in future releases.",
        )
        elementwise: bool = Field(default=False, description="Apply multiplier element-wise to pixels.")

    def __init__(
        self,
        multiplier: ScaleFloatType = (0.9, 1.1),
        per_channel: bool | None = None,
        elementwise: bool = False,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.multiplier = cast(Tuple[float, float], multiplier)
        self.elementwise = elementwise

    def apply(
        self,
        img: np.ndarray,
        multiplier: float | np.ndarray,
        **kwargs: Any,
    ) -> np.ndarray:
        return multiply(img, multiplier)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        if self.multiplier[0] == self.multiplier[1]:
            return {"multiplier": self.multiplier[0]}

        img = data["image"] if "image" in data else data["images"][0]
        shape = img.shape if self.elementwise else get_num_channels(img)

        multiplier = random_utils.uniform(self.multiplier[0], self.multiplier[1], shape).astype(np.float32)

        return {"multiplier": multiplier}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "multiplier", "elementwise"

`apply (self, img, multiplier, **kwargs)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    multiplier: float | np.ndarray,
    **kwargs: Any,
) -> np.ndarray:
    return multiply(img, multiplier)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    if self.multiplier[0] == self.multiplier[1]:
        return {"multiplier": self.multiplier[0]}

    img = data["image"] if "image" in data else data["images"][0]
    shape = img.shape if self.elementwise else get_num_channels(img)

    multiplier = random_utils.uniform(self.multiplier[0], self.multiplier[1], shape).astype(np.float32)

    return {"multiplier": multiplier}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "multiplier", "elementwise"

`class Normalize` `(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, normalization='standard', always_apply=None, p=1.0)` [view source on GitHub] ¶

Applies various normalization techniques to an image. The specific normalization technique can be selected with the normalization parameter.

Standard normalization is applied using the formula: img = (img - mean * max_pixel_value) / (std * max_pixel_value). Other normalization techniques adjust the image based on global or per-channel statistics, or scale pixel values to a specified range.

Parameters:

Name	Type	Description
`mean`	`ColorType \| None`	Mean values for standard normalization. The defaults are the ImageNet mean values: (0.485, 0.456, 0.406). For Inception-style normalization, pass (0.5, 0.5, 0.5) with "standard" normalization.
`std`	`ColorType \| None`	Standard deviation values for standard normalization. The defaults are the ImageNet standard deviation values: (0.229, 0.224, 0.225). For Inception-style normalization, pass (0.5, 0.5, 0.5) with "standard" normalization.
`max_pixel_value`	`float \| None`	Maximum possible pixel value, used for scaling in standard normalization. Defaults to 255.0.
`normalization`	`Literal["standard", "image", "image_per_channel", "min_max", "min_max_per_channel"]`	Specifies the normalization technique to apply. Defaults to "standard". - "standard": Applies the formula `(img - mean * max_pixel_value) / (std * max_pixel_value)`. The default mean and std are based on ImageNet. - "image": Normalizes the whole image based on its global mean and standard deviation. - "image_per_channel": Normalizes the image per channel based on each channel's mean and standard deviation. - "min_max": Scales the image pixel values to a [0, 1] range based on the global minimum and maximum pixel values. - "min_max_per_channel": Scales each channel of the image pixel values to a [0, 1] range based on the per-channel minimum and maximum pixel values.
`p`	`float`	Probability of applying the transform. Defaults to 1.0.

Targets

image

Image types: uint8, float32

Note

For "standard" normalization, mean, std, and max_pixel_value must be provided. For other normalization types, these parameters are ignored.
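
A small worked example of the "standard" formula and of "min_max" scaling, written directly in NumPy rather than through the library. The array contents and variable names are illustrative, and the library's own helpers may differ in numerical details.

```python
import numpy as np

# ImageNet statistics, the documented defaults for "standard" normalization.
mean = np.array((0.485, 0.456, 0.406), dtype=np.float32)
std = np.array((0.229, 0.224, 0.225), dtype=np.float32)
max_pixel_value = 255.0

# A tiny 2x2 RGB image filled with mid-gray (128) pixels.
img = np.full((2, 2, 3), 128, dtype=np.uint8)

# "standard": img = (img - mean * max_pixel_value) / (std * max_pixel_value)
standard = (img.astype(np.float32) - mean * max_pixel_value) / (std * max_pixel_value)

# "min_max": rescale to [0, 1] using the global minimum and maximum.
ramp = np.arange(12, dtype=np.float32).reshape(2, 2, 3)
min_max = (ramp - ramp.min()) / (ramp.max() - ramp.min())
```

For the red channel, the standard result is `(128 / 255 - 0.485) / 0.229`, which is about 0.074: a mid-gray pixel lands close to zero after normalization.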

Source code in albumentations/augmentations/transforms.py

Python

class Normalize(ImageOnlyTransform):
    """Applies various normalization techniques to an image. The specific normalization technique can be selected
        with the `normalization` parameter.

    Standard normalization is applied using the formula:
        `img = (img - mean * max_pixel_value) / (std * max_pixel_value)`.
        Other normalization techniques adjust the image based on global or per-channel statistics,
        or scale pixel values to a specified range.

    Args:
        mean (ColorType | None): Mean values for standard normalization.
            The defaults are the ImageNet mean values: (0.485, 0.456, 0.406).
            For Inception-style normalization, pass (0.5, 0.5, 0.5) with "standard" normalization.
        std (ColorType | None): Standard deviation values for standard normalization.
            The defaults are the ImageNet standard deviation values: (0.229, 0.224, 0.225).
            For Inception-style normalization, pass (0.5, 0.5, 0.5) with "standard" normalization.
        max_pixel_value (float | None): Maximum possible pixel value, used for scaling in standard normalization.
            Defaults to 255.0.
        normalization (Literal["standard", "image", "image_per_channel", "min_max", "min_max_per_channel"]):
            Specifies the normalization technique to apply. Defaults to "standard".
            - "standard": Applies the formula `(img - mean * max_pixel_value) / (std * max_pixel_value)`.
                The default mean and std are based on ImageNet.
            - "image": Normalizes the whole image based on its global mean and standard deviation.
            - "image_per_channel": Normalizes the image per channel based on each channel's mean and standard deviation.
            - "min_max": Scales the image pixel values to a [0, 1] range based on the global
                minimum and maximum pixel values.
            - "min_max_per_channel": Scales each channel of the image pixel values to a [0, 1]
                range based on the per-channel minimum and maximum pixel values.

        p (float): Probability of applying the transform. Defaults to 1.0.

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        For "standard" normalization, `mean`, `std`, and `max_pixel_value` must be provided.
        For other normalization types, these parameters are ignored.
    """

    class InitSchema(BaseTransformInitSchema):
        mean: ColorType | None = Field(
            default=(0.485, 0.456, 0.406),
            description="Mean values for normalization, defaulting to ImageNet mean values.",
        )
        std: ColorType | None = Field(
            default=(0.229, 0.224, 0.225),
            description="Standard deviation values for normalization, defaulting to ImageNet std values.",
        )
        max_pixel_value: float | None = Field(default=255.0, description="Maximum possible pixel value.")
        normalization: Literal[
            "standard",
            "image",
            "image_per_channel",
            "min_max",
            "min_max_per_channel",
        ] = "standard"
        p: ProbabilityType = 1

        @model_validator(mode="after")
        def validate_normalization(self) -> Self:
            # `and` binds tighter than `or`, so the "standard" check must wrap all three None tests.
            if self.normalization == "standard" and (
                self.mean is None or self.std is None or self.max_pixel_value is None
            ):
                raise ValueError("mean, std, and max_pixel_value must be provided for standard normalization.")
            return self

    def __init__(
        self,
        mean: ColorType | None = (0.485, 0.456, 0.406),
        std: ColorType | None = (0.229, 0.224, 0.225),
        max_pixel_value: float | None = 255.0,
        normalization: Literal["standard", "image", "image_per_channel", "min_max", "min_max_per_channel"] = "standard",
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.mean = mean
        self.mean_np = np.array(mean, dtype=np.float32) * max_pixel_value
        self.std = std
        self.denominator = np.reciprocal(np.array(std, dtype=np.float32) * max_pixel_value)
        self.max_pixel_value = max_pixel_value
        if normalization not in {"standard", "image", "image_per_channel", "min_max", "min_max_per_channel"}:
            raise ValueError(
                f"Error during Normalize initialization. Unknown normalization type: {normalization}",
            )
        self.normalization = normalization

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        if self.normalization == "standard":
            return normalize(
                img,
                self.mean_np,
                self.denominator,
            )
        return normalize_per_image(img, self.normalization)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "mean", "std", "max_pixel_value", "normalization"

`apply (self, img, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    if self.normalization == "standard":
        return normalize(
            img,
            self.mean_np,
            self.denominator,
        )
    return normalize_per_image(img, self.normalization)

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "mean", "std", "max_pixel_value", "normalization"

`class PixelDropout` `(dropout_prob=0.01, per_channel=False, drop_value=0, mask_drop_value=None, always_apply=None, p=0.5)` [view source on GitHub] ¶

Set pixels to `drop_value` (0 by default) with some probability.

Parameters:

Name	Type	Description
`dropout_prob`	`float`	pixel drop probability. Default: 0.01
`per_channel`	`bool`	if set to `True` drop mask will be sampled for each channel, otherwise the same mask will be sampled for all channels. Default: False
`drop_value`	`number or sequence of numbers or None`	Value that will be set in dropped place. If set to None value will be sampled randomly, default ranges will be used: - uint8 - [0, 255] - uint16 - [0, 65535] - uint32 - [0, 4294967295] - float, double - [0, 1] Default: 0
`mask_drop_value`	`number or sequence of numbers or None`	Value that will be set in dropped place in masks. If set to None masks will be unchanged. Default: None
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image, mask

Image types: any
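
The mask-then-replace idea can be sketched in NumPy as follows. This is a simplified illustration with a fixed `drop_value`; the function name `pixel_dropout_sketch` is hypothetical and it does not reproduce the library's random-value sampling or mask handling.

```python
import numpy as np


def pixel_dropout_sketch(img, dropout_prob=0.01, per_channel=False, drop_value=0, rng=None):
    """Replace randomly selected pixels with `drop_value` (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample the mask over (H, W) only, unless each channel gets its own mask.
    shape = img.shape if per_channel else img.shape[:2]
    drop_mask = rng.random(shape) < dropout_prob
    if drop_mask.ndim != img.ndim:
        drop_mask = drop_mask[..., None]  # broadcast one mask over all channels
    return np.where(drop_mask, drop_value, img).astype(img.dtype)


image = np.full((64, 64, 3), 200, dtype=np.uint8)
dropped = pixel_dropout_sketch(image, dropout_prob=0.1, rng=np.random.default_rng(0))
```

With `per_channel=False`, a dropped pixel is replaced in every channel at once; with `per_channel=True`, each channel is masked independently.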

Source code in albumentations/augmentations/transforms.py

Python

class PixelDropout(DualTransform):
    """Set pixels to 0 with some probability.

    Args:
        dropout_prob (float): pixel drop probability. Default: 0.01
        per_channel (bool): if set to `True` drop mask will be sampled for each channel,
            otherwise the same mask will be sampled for all channels. Default: False
        drop_value (number or sequence of numbers or None): Value that will be set in dropped place.
            If set to None value will be sampled randomly, default ranges will be used:
                - uint8 - [0, 255]
                - uint16 - [0, 65535]
                - uint32 - [0, 4294967295]
                - float, double - [0, 1]
            Default: 0
        mask_drop_value (number or sequence of numbers or None): Value that will be set in dropped place in masks.
            If set to None masks will be unchanged. Default: None
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask
    Image types:
        any

    """

    class InitSchema(BaseTransformInitSchema):
        dropout_prob: ProbabilityType = 0.01
        per_channel: bool = Field(default=False, description="Sample drop mask per channel.")
        drop_value: ScaleFloatType | None = Field(
            default=0,
            description="Value to set in dropped pixels. None for random sampling.",
        )
        mask_drop_value: ScaleFloatType | None = Field(
            default=None,
            description="Value to set in dropped pixels in masks. None to leave masks unchanged.",
        )

        @model_validator(mode="after")
        def validate_mask_drop_value(self) -> Self:
            if self.mask_drop_value is not None and self.per_channel:
                msg = "PixelDropout supports mask only with per_channel=False."
                raise ValueError(msg)
            return self

    _targets = (Targets.IMAGE, Targets.MASK)

    def __init__(
        self,
        dropout_prob: float = 0.01,
        per_channel: bool = False,
        drop_value: ScaleFloatType | None = 0,
        mask_drop_value: ScaleFloatType | None = None,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.dropout_prob = dropout_prob
        self.per_channel = per_channel
        self.drop_value = drop_value
        self.mask_drop_value = mask_drop_value

    def apply(
        self,
        img: np.ndarray,
        drop_mask: np.ndarray,
        drop_value: float | Sequence[float],
        **params: Any,
    ) -> np.ndarray:
        return fmain.pixel_dropout(img, drop_mask, drop_value)

    def apply_to_mask(self, mask: np.ndarray, drop_mask: np.ndarray, **params: Any) -> np.ndarray:
        if self.mask_drop_value is None:
            return mask

        if mask.ndim == MONO_CHANNEL_DIMENSIONS:
            drop_mask = np.squeeze(drop_mask)

        return fmain.pixel_dropout(mask, drop_mask, self.mask_drop_value)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return bbox

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return keypoint

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        img = data["image"] if "image" in data else data["images"][0]
        shape = img.shape if self.per_channel else img.shape[:2]

        rnd = np.random.RandomState(random.randint(0, 1 << 31))
        # Use choice to create boolean matrix, if we will use binomial after that we will need type conversion
        drop_mask = rnd.choice([True, False], shape, p=[self.dropout_prob, 1 - self.dropout_prob])

        drop_value: float | Sequence[float] | np.ndarray
        if drop_mask.ndim != img.ndim:
            drop_mask = np.expand_dims(drop_mask, -1)
        if self.drop_value is None:
            drop_shape = 1 if is_grayscale_image(img) else int(img.shape[-1])

            if img.dtype in (np.uint8, np.uint16, np.uint32):
                drop_value = rnd.randint(0, int(MAX_VALUES_BY_DTYPE[img.dtype]), drop_shape, img.dtype)
            elif img.dtype in [np.float32, np.double]:
                drop_value = rnd.uniform(0, 1, drop_shape).astype(img.dtype)
            else:
                raise ValueError(f"Unsupported dtype: {img.dtype}")
        else:
            drop_value = self.drop_value

        return {"drop_mask": drop_mask, "drop_value": drop_value}

    def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
        return ("dropout_prob", "per_channel", "drop_value", "mask_drop_value")

`apply (self, img, drop_mask, drop_value, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    drop_mask: np.ndarray,
    drop_value: float | Sequence[float],
    **params: Any,
) -> np.ndarray:
    return fmain.pixel_dropout(img, drop_mask, drop_value)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    img = data["image"] if "image" in data else data["images"][0]
    shape = img.shape if self.per_channel else img.shape[:2]

    rnd = np.random.RandomState(random.randint(0, 1 << 31))
    # Use choice to create boolean matrix, if we will use binomial after that we will need type conversion
    drop_mask = rnd.choice([True, False], shape, p=[self.dropout_prob, 1 - self.dropout_prob])

    drop_value: float | Sequence[float] | np.ndarray
    if drop_mask.ndim != img.ndim:
        drop_mask = np.expand_dims(drop_mask, -1)
    if self.drop_value is None:
        drop_shape = 1 if is_grayscale_image(img) else int(img.shape[-1])

        if img.dtype in (np.uint8, np.uint16, np.uint32):
            drop_value = rnd.randint(0, int(MAX_VALUES_BY_DTYPE[img.dtype]), drop_shape, img.dtype)
        elif img.dtype in [np.float32, np.double]:
            drop_value = rnd.uniform(0, 1, drop_shape).astype(img.dtype)
        else:
            raise ValueError(f"Unsupported dtype: {img.dtype}")
    else:
        drop_value = self.drop_value

    return {"drop_mask": drop_mask, "drop_value": drop_value}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
    return ("dropout_prob", "per_channel", "drop_value", "mask_drop_value")

`class PlanckianJitter` `(mode='blackbody', temperature_limit=None, sampling_method='uniform', always_apply=None, p=0.5)` [view source on GitHub] ¶

Randomly jitter the image illuminant along the Planckian locus.

Physics-based color augmentation creates realistic variations in chromaticity, simulating illumination changes in a scene.

Parameters:

Name	Type	Description
`mode`	`Literal["blackbody", "cied"]`	The mode of the transformation. `blackbody` simulates blackbody radiation, and `cied` uses the CIED illuminant series.
`temperature_limit`	`tuple[int, int]`	Temperature range to sample from. For `blackbody` mode, the range should be within `[3000K, 15000K]`. For `cied` mode, the range should be within `[4000K, 15000K]`. The range should include the white temperature `6000`. Higher temperatures produce cooler (bluish) images. If not defined, it defaults to: - `[3000, 15000]` for `blackbody` mode - `[4000, 15000]` for `cied` mode
`sampling_method`	`Literal["uniform", "gaussian"]`	Method to sample the temperature. "uniform" samples uniformly across the range, while "gaussian" samples from a Gaussian distribution.
`p`	`float`	Probability of applying the transform. Defaults to 0.5.

Targets

image

Image types: uint8, float32
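
The two sampling methods can be sketched with the standard library alone. The structure mirrors the `get_params` source listed further down: the range is split at the white temperature, and one side is chosen with a fixed probability. The function name `sample_temperature` is hypothetical, and the value `0.4` for the split probability is an assumption, since the `PLANKIAN_JITTER_CONST` values are not shown on this page.

```python
import random

WHITE_TEMP = 6000          # white temperature named in the parameter description
SAMPLING_TEMP_PROB = 0.4   # assumed value; the real constant is not shown here


def sample_temperature(temperature_limit=(3000, 15000), sampling_method="uniform", rng=None):
    """Draw a colour temperature in Kelvin (illustrative only)."""
    rng = rng or random.Random()
    low, high = temperature_limit
    warm_side = rng.random() < SAMPLING_TEMP_PROB  # below the white point
    if sampling_method == "uniform":
        temperature = rng.uniform(low, WHITE_TEMP) if warm_side else rng.uniform(WHITE_TEMP, high)
    elif sampling_method == "gaussian":
        # Half-Gaussian on either side of the white point; sigma is a third of that side's width.
        sigma = (WHITE_TEMP - low) / 3 if warm_side else (high - WHITE_TEMP) / 3
        offset = abs(rng.gauss(0, sigma))
        temperature = WHITE_TEMP - offset if warm_side else WHITE_TEMP + offset
    else:
        raise ValueError(f"Unknown sampling method: {sampling_method}")
    # Clip to the configured range and return an integer temperature.
    return int(min(max(temperature, low), high))


temps = [sample_temperature((3000, 15000), "gaussian", random.Random(seed)) for seed in range(5)]
```

Splitting at the white point keeps the wide cold range (above 6000 K) from dominating the samples, which is the motivation given in the source comment.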

References

https://github.com/TheZino/PlanckianJitter
https://arxiv.org/pdf/2202.07993.pdf

Source code in albumentations/augmentations/transforms.py

Python

class PlanckianJitter(ImageOnlyTransform):
    r"""Randomly jitter the image illuminant along the Planckian locus.

    Physics-based color augmentation creates realistic variations in chromaticity, simulating illumination changes
    in a scene.

    Args:
        mode (Literal["blackbody", "cied"]): The mode of the transformation. `blackbody` simulates blackbody radiation,
            and `cied` uses the CIED illuminant series.
        temperature_limit (tuple[int, int]): Temperature range to sample from. For `blackbody` mode, the range should
            be within `[3000K, 15000K]`. For "cied" mode, the range should be within `[4000K, 15000K]`. Range should
            include white temperature `6000`.
            Higher temperatures produce cooler (bluish) images. If not defined, it defaults to:
            - `[3000, 15000]` for `blackbody` mode
            - `[4000, 15000]` for `cied` mode
        sampling_method (Literal["uniform", "gaussian"]): Method to sample the temperature.
            "uniform" samples uniformly across the range, while "gaussian" samples from a Gaussian distribution.
        p (float): Probability of applying the transform. Defaults to 0.5.

    If `temperature_limit` is not defined, it defaults to:
        - `[3000, 15000]` for `blackbody` mode
        - `[4000, 15000]` for `cied` mode

    Targets:
        image

    Image types:
        uint8, float32

    References:
        - https://github.com/TheZino/PlanckianJitter
        - https://arxiv.org/pdf/2202.07993.pdf

    """

    class InitSchema(BaseTransformInitSchema):
        mode: PlanckianJitterMode = "blackbody"
        temperature_limit: Annotated[tuple[int, int], AfterValidator(nondecreasing)] | None = None
        sampling_method: Literal["uniform", "gaussian"] = "uniform"

        @model_validator(mode="after")
        def validate_temperature(self) -> Self:
            max_temp = int(PLANKIAN_JITTER_CONST["MAX_TEMP"])

            if self.temperature_limit is None:
                if self.mode == "blackbody":
                    self.temperature_limit = int(PLANKIAN_JITTER_CONST["MIN_BLACKBODY_TEMP"]), max_temp
                elif self.mode == "cied":
                    self.temperature_limit = int(PLANKIAN_JITTER_CONST["MIN_CIED_TEMP"]), max_temp
            else:
                if self.mode == "blackbody" and (
                    min(self.temperature_limit) < PLANKIAN_JITTER_CONST["MIN_BLACKBODY_TEMP"]
                    or max(self.temperature_limit) > max_temp
                ):
                    raise ValueError("Temperature limits for blackbody should be in [3000, 15000] range")
                if self.mode == "cied" and (
                    min(self.temperature_limit) < PLANKIAN_JITTER_CONST["MIN_CIED_TEMP"]
                    or max(self.temperature_limit) > max_temp
                ):
                    raise ValueError("Temperature limits for CIED should be in [4000, 15000] range")

                if not self.temperature_limit[0] <= PLANKIAN_JITTER_CONST["WHITE_TEMP"] <= self.temperature_limit[1]:
                    raise ValueError("White temperature should be within the temperature limits")

            return self

    def __init__(
        self,
        mode: PlanckianJitterMode = "blackbody",
        temperature_limit: tuple[int, int] | None = None,
        sampling_method: Literal["uniform", "gaussian"] = "uniform",
        always_apply: bool | None = None,
        p: float = 0.5,
    ) -> None:
        super().__init__(p=p, always_apply=always_apply)

        self.mode = mode
        self.temperature_limit = cast(Tuple[int, int], temperature_limit)
        self.sampling_method = sampling_method

    def apply(self, img: np.ndarray, temperature: int, **params: Any) -> np.ndarray:
        if not is_rgb_image(img):
            raise TypeError("PlanckianJitter transformation expects 3-channel images.")
        return fmain.planckian_jitter(img, temperature, mode=self.mode)

    def get_params(self) -> dict[str, Any]:
        sampling_prob_boundary = PLANKIAN_JITTER_CONST["SAMPLING_TEMP_PROB"]
        sampling_temp_boundary = PLANKIAN_JITTER_CONST["WHITE_TEMP"]

        if self.sampling_method == "uniform":
            # Split into 2 cases to avoid selecting cold temperatures (>6000) too often
            if random.random() < sampling_prob_boundary:
                # No trailing comma here: it would turn the sample into a 1-tuple.
                temperature = random.uniform(
                    self.temperature_limit[0],
                    sampling_temp_boundary,
                )
            else:
                temperature = random.uniform(
                    sampling_temp_boundary,
                    self.temperature_limit[1],
                )
        elif self.sampling_method == "gaussian":
            # Sample values from asymmetric gaussian distribution
            if random.random() < sampling_prob_boundary:
                # Left side
                shift = np.abs(
                    random.gauss(
                        0,
                        np.abs(sampling_temp_boundary - self.temperature_limit[0]) / 3,
                    ),
                )
            else:
                # Right side
                shift = -np.abs(
                    random.gauss(
                        0,
                        np.abs(self.temperature_limit[1] - sampling_temp_boundary) / 3,
                    ),
                )

            temperature = sampling_temp_boundary - shift
        else:
            raise ValueError(f"Unknown sampling method: {self.sampling_method}")

        return {"temperature": int(np.clip(temperature, self.temperature_limit[0], self.temperature_limit[1]))}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "mode", "temperature_limit", "sampling_method"

`apply (self, img, temperature, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, temperature: int, **params: Any) -> np.ndarray:
    if not is_rgb_image(img):
        raise TypeError("PlanckianJitter transformation expects 3-channel images.")
    return fmain.planckian_jitter(img, temperature, mode=self.mode)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, Any]:
    sampling_prob_boundary = PLANKIAN_JITTER_CONST["SAMPLING_TEMP_PROB"]
    sampling_temp_boundary = PLANKIAN_JITTER_CONST["WHITE_TEMP"]

    if self.sampling_method == "uniform":
        # Split into 2 cases to avoid selecting cold temperatures (>6000) too often
        if random.random() < sampling_prob_boundary:
            # No trailing comma here: it would turn the sample into a 1-tuple.
            temperature = random.uniform(
                self.temperature_limit[0],
                sampling_temp_boundary,
            )
        else:
            temperature = random.uniform(
                sampling_temp_boundary,
                self.temperature_limit[1],
            )
    elif self.sampling_method == "gaussian":
        # Sample values from asymmetric gaussian distribution
        if random.random() < sampling_prob_boundary:
            # Left side
            shift = np.abs(
                random.gauss(
                    0,
                    np.abs(sampling_temp_boundary - self.temperature_limit[0]) / 3,
                ),
            )
        else:
            # Right side
            shift = -np.abs(
                random.gauss(
                    0,
                    np.abs(self.temperature_limit[1] - sampling_temp_boundary) / 3,
                ),
            )

        temperature = sampling_temp_boundary - shift
    else:
        raise ValueError(f"Unknown sampling method: {self.sampling_method}")

    return {"temperature": int(np.clip(temperature, self.temperature_limit[0], self.temperature_limit[1]))}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "mode", "temperature_limit", "sampling_method"

`class Posterize` `(num_bits=4, always_apply=None, p=0.5)` [view source on GitHub] ¶

Reduce the number of bits for each color channel.

Parameters:

Name	Type	Description
`num_bits`	`(int, int) or int, or list of ints [r, g, b], or list of ints [[r1, r2], [g1, g2], [b1, b2]]`	Number of high bits to keep. If num_bits is a single value, the range will be [num_bits, num_bits]. Must be in range [0, 8]. Default: 4.
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets: image

Image types: uint8
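
As a rough illustration of what "number of high bits" means, the sketch below keeps the top `num_bits` bits of each 8-bit value and zeroes the rest. It assumes the conventional bit-mask form of posterization; `fmain.posterize` itself is not shown on this page, and `posterize_value` and `posterize_pixels` are hypothetical names used only here.

```python
def posterize_value(value: int, num_bits: int) -> int:
    """Keep the num_bits highest bits of an 8-bit value and zero the rest."""
    if not 0 <= num_bits <= 8:
        raise ValueError("num_bits must be in range [0, 8]")
    # For num_bits=4 the mask is 0b11110000.
    mask = (0xFF << (8 - num_bits)) & 0xFF
    return value & mask


def posterize_pixels(pixels, num_bits):
    """Apply posterize_value to every value in a flat list of uint8 intensities."""
    return [posterize_value(v, num_bits) for v in pixels]
```

With `num_bits=8` every value is unchanged, and with `num_bits=0` every value becomes 0.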

Source code in albumentations/augmentations/transforms.py

Python

class Posterize(ImageOnlyTransform):
    """Reduce the number of bits for each color channel.

    Args:
        num_bits ((int, int) or int,
                  or list of ints [r, g, b],
                  or list of ints [[r1, r2], [g1, g2], [b1, b2]]): number of high bits.
            If num_bits is a single value, the range will be [num_bits, num_bits].
            Must be in range [0, 8]. Default: 4.
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8

    """

    class InitSchema(BaseTransformInitSchema):
        num_bits: Annotated[
            int | tuple[int, int] | tuple[int, int, int],
            Field(default=4, description="Number of high bits"),
        ]

        @field_validator("num_bits")
        @classmethod
        def validate_num_bits(cls, num_bits: Any) -> tuple[int, int] | list[tuple[int, int]]:
            if isinstance(num_bits, int):
                return cast(Tuple[int, int], to_tuple(num_bits, num_bits))
            if isinstance(num_bits, Sequence) and len(num_bits) == NUM_BITS_ARRAY_LENGTH:
                return [cast(Tuple[int, int], to_tuple(i, 0)) for i in num_bits]
            return cast(Tuple[int, int], to_tuple(num_bits, 0))

    def __init__(
        self,
        num_bits: int | tuple[int, int] | tuple[int, int, int] = 4,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.num_bits = cast(Union[Tuple[int, ...], List[Tuple[int, ...]]], num_bits)

    def apply(self, img: np.ndarray, num_bits: int, **params: Any) -> np.ndarray:
        return fmain.posterize(img, num_bits)

    def get_params(self) -> dict[str, Any]:
        if len(self.num_bits) == NUM_BITS_ARRAY_LENGTH:
            return {"num_bits": [random.randint(int(i[0]), int(i[1])) for i in self.num_bits]}  # type: ignore[index]
        num_bits = self.num_bits
        return {"num_bits": random.randint(int(num_bits[0]), int(num_bits[1]))}  # type: ignore[arg-type]

    def get_transform_init_args_names(self) -> tuple[str]:
        return ("num_bits",)

`apply (self, img, num_bits, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, num_bits: int, **params: Any) -> np.ndarray:
    return fmain.posterize(img, num_bits)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, Any]:
    if len(self.num_bits) == NUM_BITS_ARRAY_LENGTH:
        return {"num_bits": [random.randint(int(i[0]), int(i[1])) for i in self.num_bits]}  # type: ignore[index]
    num_bits = self.num_bits
    return {"num_bits": random.randint(int(num_bits[0]), int(num_bits[1]))}  # type: ignore[arg-type]

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str]:
    return ("num_bits",)

`class RGBShift` `(r_shift_limit=(-20, 20), g_shift_limit=(-20, 20), b_shift_limit=(-20, 20), always_apply=None, p=0.5)` [view source on GitHub] ¶

Randomly shift values for each channel of the input RGB image.

Parameters:

Name	Type	Description
`r_shift_limit`	`ScaleIntType`	range for changing values for the red channel. If r_shift_limit is a single int, the range will be (-r_shift_limit, r_shift_limit). Default: (-20, 20).
`g_shift_limit`	`ScaleIntType`	range for changing values for the green channel. If g_shift_limit is a single int, the range will be (-g_shift_limit, g_shift_limit). Default: (-20, 20).
`b_shift_limit`	`ScaleIntType`	range for changing values for the blue channel. If b_shift_limit is a single int, the range will be (-b_shift_limit, b_shift_limit). Default: (-20, 20).
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32
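
The per-channel shift can be sketched without any image library. The example below samples one offset per channel and adds it to an RGB pixel. Clamping to `[0, max_value]` and truncating to `int` are assumptions about how a uint8 image would be handled; `albucore.add_vector` is not shown on this page, and `sample_shift` and `shift_pixel` are hypothetical helper names.

```python
import random


def sample_shift(r_limit=(-20, 20), g_limit=(-20, 20), b_limit=(-20, 20), rng=random):
    """Draw one uniform offset per channel, in (r, g, b) order."""
    return tuple(rng.uniform(lo, hi) for lo, hi in (r_limit, g_limit, b_limit))


def shift_pixel(pixel, shift, max_value=255):
    """Add the per-channel shift to one RGB pixel and clamp to [0, max_value]."""
    return tuple(int(min(max(c + s, 0), max_value)) for c, s in zip(pixel, shift))
```

For example, `shift_pixel((250, 10, 128), (20, -20, 5))` saturates the red channel at 255 and the green channel at 0.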

Source code in albumentations/augmentations/transforms.py

Python

class RGBShift(ImageOnlyTransform):
    """Randomly shift values for each channel of the input RGB image.

    Args:
        r_shift_limit: range for changing values for the red channel. If r_shift_limit is a single
            int, the range will be (-r_shift_limit, r_shift_limit). Default: (-20, 20).
        g_shift_limit: range for changing values for the green channel. If g_shift_limit is a
            single int, the range  will be (-g_shift_limit, g_shift_limit). Default: (-20, 20).
        b_shift_limit: range for changing values for the blue channel. If b_shift_limit is a single
            int, the range will be (-b_shift_limit, b_shift_limit). Default: (-20, 20).
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BaseTransformInitSchema):
        r_shift_limit: SymmetricRangeType = (-20, 20)
        g_shift_limit: SymmetricRangeType = (-20, 20)
        b_shift_limit: SymmetricRangeType = (-20, 20)

    def __init__(
        self,
        r_shift_limit: ScaleIntType = (-20, 20),
        g_shift_limit: ScaleIntType = (-20, 20),
        b_shift_limit: ScaleIntType = (-20, 20),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.r_shift_limit = cast(Tuple[float, float], r_shift_limit)
        self.g_shift_limit = cast(Tuple[float, float], g_shift_limit)
        self.b_shift_limit = cast(Tuple[float, float], b_shift_limit)

    def apply(self, img: np.ndarray, shift: np.ndarray, **params: Any) -> np.ndarray:
        if not is_rgb_image(img):
            msg = "RGBShift transformation expects 3-channel images."
            raise TypeError(msg)

        return albucore.add_vector(img, shift)

    def get_params(self) -> dict[str, Any]:
        return {
            "shift": np.array(
                [
                    random.uniform(self.r_shift_limit[0], self.r_shift_limit[1]),
                    random.uniform(self.g_shift_limit[0], self.g_shift_limit[1]),
                    random.uniform(self.b_shift_limit[0], self.b_shift_limit[1]),
                ],
            ),
        }

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "r_shift_limit", "g_shift_limit", "b_shift_limit"

`apply (self, img, shift, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, shift: np.ndarray, **params: Any) -> np.ndarray:
    if not is_rgb_image(img):
        msg = "RGBShift transformation expects 3-channel images."
        raise TypeError(msg)

    return albucore.add_vector(img, shift)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, Any]:
    return {
        "shift": np.array(
            [
                random.uniform(self.r_shift_limit[0], self.r_shift_limit[1]),
                random.uniform(self.g_shift_limit[0], self.g_shift_limit[1]),
                random.uniform(self.b_shift_limit[0], self.b_shift_limit[1]),
            ],
        ),
    }

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "r_shift_limit", "g_shift_limit", "b_shift_limit"

`class RandomBrightnessContrast` `(brightness_limit=(-0.2, 0.2), contrast_limit=(-0.2, 0.2), brightness_by_max=True, always_apply=None, p=0.5)` [view source on GitHub] ¶

Randomly change brightness and contrast of the input image.

Parameters:

Name	Type	Description
`brightness_limit`	`ScaleFloatType`	factor range for changing brightness. If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
`contrast_limit`	`ScaleFloatType`	factor range for changing contrast. If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
`brightness_by_max`	`bool`	If True, adjust brightness by the image dtype maximum; otherwise adjust brightness by the image mean. Default: True.
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32
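
The roles of the two limits can be illustrated with a small sketch. It assumes the adjustment has the form `alpha * value + beta * reference`, where the reference is either the dtype maximum or the mean of the input, followed by clamping. The exact behavior of `fmain.brightness_contrast_adjust` is not shown on this page, so treat `adjust` as an approximation; both function names are hypothetical.

```python
import random


def sample_params(brightness_limit=(-0.2, 0.2), contrast_limit=(-0.2, 0.2), rng=random):
    """Sample the contrast gain (alpha) and brightness offset factor (beta)."""
    return {
        "alpha": 1.0 + rng.uniform(*contrast_limit),
        "beta": rng.uniform(*brightness_limit),
    }


def adjust(pixels, alpha, beta, brightness_by_max=True, max_value=255.0):
    """Scale by alpha, add a beta-weighted offset, and clamp to [0, max_value]."""
    # Assumption: the offset reference is the dtype maximum or the input mean.
    reference = max_value if brightness_by_max else sum(pixels) / len(pixels)
    offset = beta * reference
    return [min(max(alpha * v + offset, 0.0), max_value) for v in pixels]
```

With the default limits, `alpha` falls in [0.8, 1.2] and `beta` in [-0.2, 0.2].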

Source code in albumentations/augmentations/transforms.py

Python

class RandomBrightnessContrast(ImageOnlyTransform):
    """Randomly change brightness and contrast of the input image.

    Args:
        brightness_limit: factor range for changing brightness.
            If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
        contrast_limit: factor range for changing contrast.
            If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
        brightness_by_max: If True adjust brightness by image dtype maximum,
            else adjust brightness by image mean.
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BaseTransformInitSchema):
        brightness_limit: SymmetricRangeType = (-0.2, 0.2)
        contrast_limit: SymmetricRangeType = (-0.2, 0.2)
        brightness_by_max: bool = Field(default=True, description="Adjust brightness by image dtype maximum if True.")

    def __init__(
        self,
        brightness_limit: ScaleFloatType = (-0.2, 0.2),
        contrast_limit: ScaleFloatType = (-0.2, 0.2),
        brightness_by_max: bool = True,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.brightness_limit = cast(Tuple[float, float], brightness_limit)
        self.contrast_limit = cast(Tuple[float, float], contrast_limit)
        self.brightness_by_max = brightness_by_max

    def apply(self, img: np.ndarray, alpha: float, beta: float, **params: Any) -> np.ndarray:
        return fmain.brightness_contrast_adjust(img, alpha, beta, self.brightness_by_max)

    def get_params(self) -> dict[str, float]:
        return {
            "alpha": 1.0 + random.uniform(self.contrast_limit[0], self.contrast_limit[1]),
            "beta": 0.0 + random.uniform(self.brightness_limit[0], self.brightness_limit[1]),
        }

    def get_transform_init_args_names(self) -> tuple[str, str, str]:
        return ("brightness_limit", "contrast_limit", "brightness_by_max")

`apply (self, img, alpha, beta, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, alpha: float, beta: float, **params: Any) -> np.ndarray:
    return fmain.brightness_contrast_adjust(img, alpha, beta, self.brightness_by_max)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, float]:
    return {
        "alpha": 1.0 + random.uniform(self.contrast_limit[0], self.contrast_limit[1]),
        "beta": 0.0 + random.uniform(self.brightness_limit[0], self.brightness_limit[1]),
    }

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str, str]:
    return ("brightness_limit", "contrast_limit", "brightness_by_max")

`class RandomFog` `(fog_coef_lower=None, fog_coef_upper=None, alpha_coef=0.08, fog_coef_range=(0.3, 1), always_apply=None, p=0.5)` [view source on GitHub] ¶

Simulates fog for the image.

Parameters:

Name	Type	Description
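`fog_coef_range`	`tuple[float, float]`	Bounds on the fog intensity coefficient, as (lower, upper). Both values must be in [0, 1] and non-decreasing. Default: (0.3, 1).
`alpha_coef`	`float`	Transparency of the fog circles. Should be in [0, 1] range. Default: 0.08.
`p`	`float`	Probability of applying the transform. Default: 0.5.
`fog_coef_lower`	`float | None`	Deprecated. Use `fog_coef_range` instead. Default: None.
`fog_coef_upper`	`float | None`	Deprecated. Use `fog_coef_range` instead. Default: None.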

Targets

image

Image types: uint8, float32

Reference

https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Source code in albumentations/augmentations/transforms.py

Python

class RandomFog(ImageOnlyTransform):
    """Simulates fog for the image.

    Args:
        fog_coef_range (tuple): tuple of bounds on the fog intensity coefficient (fog_coef_lower, fog_coef_upper).
            Default: (0.3, 1).
        alpha_coef (float): Transparency of the fog circles. Should be in [0, 1] range. Default: 0.08.
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
    """

    class InitSchema(BaseTransformInitSchema):
        fog_coef_lower: float | None = Field(
            default=None,
            description="Lower limit for fog intensity coefficient",
            ge=0,
            le=1,
        )
        fog_coef_upper: float | None = Field(
            default=None,
            description="Upper limit for fog intensity coefficient",
            ge=0,
            le=1,
        )
        fog_coef_range: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)] = (
            0.3,
            1,
        )

        alpha_coef: float = Field(default=0.08, description="Transparency of the fog circles", ge=0, le=1)

        @model_validator(mode="after")
        def validate_fog_coefficients(self) -> Self:
            if self.fog_coef_lower is not None:
                warn("`fog_coef_lower` is deprecated, use `fog_coef_range` instead.", DeprecationWarning, stacklevel=2)
            if self.fog_coef_upper is not None:
                warn("`fog_coef_upper` is deprecated, use `fog_coef_range` instead.", DeprecationWarning, stacklevel=2)

            lower = self.fog_coef_lower if self.fog_coef_lower is not None else self.fog_coef_range[0]
            upper = self.fog_coef_upper if self.fog_coef_upper is not None else self.fog_coef_range[1]
            self.fog_coef_range = (lower, upper)

            self.fog_coef_lower = None
            self.fog_coef_upper = None

            return self

    def __init__(
        self,
        fog_coef_lower: float | None = None,
        fog_coef_upper: float | None = None,
        alpha_coef: float = 0.08,
        fog_coef_range: tuple[float, float] = (0.3, 1),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.fog_coef_range = fog_coef_range
        self.alpha_coef = alpha_coef

    def apply(
        self,
        img: np.ndarray,
        fog_coef: np.ndarray,
        haze_list: list[tuple[int, int]],
        **params: Any,
    ) -> np.ndarray:
        return fmain.add_fog(img, fog_coef, self.alpha_coef, haze_list)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        fog_coef = random.uniform(*self.fog_coef_range)

        height, width = imshape = params["shape"][:2]

        hw = max(1, int(width // 3 * fog_coef))

        haze_list = []
        midx = width // 2 - 2 * hw
        midy = height // 2 - hw
        index = 1

        while midx > -hw or midy > -hw:
            for _ in range(hw // 10 * index):
                x = random.randint(midx, width - midx - hw)
                y = random.randint(midy, height - midy - hw)
                haze_list.append((x, y))

            midx -= 3 * hw * width // sum(imshape)
            midy -= 3 * hw * height // sum(imshape)
            index += 1

        return {"haze_list": haze_list, "fog_coef": fog_coef}

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return "fog_coef_range", "alpha_coef"
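
The way the deprecated bounds override `fog_coef_range` in `validate_fog_coefficients` can be reproduced in isolation. The function below is a sketch with a hypothetical name, `resolve_fog_coef_range`; it mirrors the merge logic of the validator but omits the pydantic field checks.

```python
import warnings


def resolve_fog_coef_range(fog_coef_range=(0.3, 1.0), fog_coef_lower=None, fog_coef_upper=None):
    """Merge the deprecated lower/upper arguments into a single (lower, upper) range."""
    for name, value in (("fog_coef_lower", fog_coef_lower), ("fog_coef_upper", fog_coef_upper)):
        if value is not None:
            warnings.warn(
                f"`{name}` is deprecated, use `fog_coef_range` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
    # A deprecated bound, when given, takes precedence over the matching range entry.
    lower = fog_coef_lower if fog_coef_lower is not None else fog_coef_range[0]
    upper = fog_coef_upper if fog_coef_upper is not None else fog_coef_range[1]
    return (lower, upper)
```

Calling `resolve_fog_coef_range(fog_coef_lower=0.1)` returns `(0.1, 1.0)` and emits one `DeprecationWarning`.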

`apply (self, img, fog_coef, haze_list, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    fog_coef: np.ndarray,
    haze_list: list[tuple[int, int]],
    **params: Any,
) -> np.ndarray:
    return fmain.add_fog(img, fog_coef, self.alpha_coef, haze_list)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    fog_coef = random.uniform(*self.fog_coef_range)

    height, width = imshape = params["shape"][:2]

    hw = max(1, int(width // 3 * fog_coef))

    haze_list = []
    midx = width // 2 - 2 * hw
    midy = height // 2 - hw
    index = 1

    while midx > -hw or midy > -hw:
        for _ in range(hw // 10 * index):
            x = random.randint(midx, width - midx - hw)
            y = random.randint(midy, height - midy - hw)
            haze_list.append((x, y))

        midx -= 3 * hw * width // sum(imshape)
        midy -= 3 * hw * height // sum(imshape)
        index += 1

    return {"haze_list": haze_list, "fog_coef": fog_coef}
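
To get a feel for how many haze points the loop above produces, the sketch below repeats the same arithmetic without drawing random coordinates. `haze_point_count` is a hypothetical helper, and the guard against a zero step is an addition of this sketch, not something present in the listing above.

```python
def haze_point_count(height: int, width: int, fog_coef: float) -> int:
    """Count the haze points generated for an image of the given shape."""
    hw = max(1, int(width // 3 * fog_coef))
    midx = width // 2 - 2 * hw
    midy = height // 2 - hw
    # The per-iteration steps are constant, so compute them once.
    step_x = 3 * hw * width // (height + width)
    step_y = 3 * hw * height // (height + width)
    # Guard added in this sketch: a zero step would keep the loop from ending.
    if (step_x == 0 and midx > -hw) or (step_y == 0 and midy > -hw):
        raise ValueError("degenerate shape: the placement window would never grow")
    index = 1
    count = 0
    while midx > -hw or midy > -hw:
        # Each pass adds hw // 10 * index points; later passes add more.
        count += hw // 10 * index
        midx -= step_x
        midy -= step_y
        index += 1
    return count
```

For a 200 x 300 image (height x width), `fog_coef=0.5` gives `hw=50` and two passes of 5 and 10 points.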

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str]:
    return "fog_coef_range", "alpha_coef"

`class RandomGamma` `(gamma_limit=(80, 120), always_apply=None, p=0.5)` [view source on GitHub] ¶

Applies random gamma correction to an image as a form of data augmentation.

This class adjusts the luminance of an image by applying gamma correction with a randomly selected gamma value from a specified range. Gamma correction can simulate various lighting conditions, potentially enhancing model generalization.

Attributes:

Name	Type	Description
`gamma_limit`	`Union[int, tuple[int, int]]`	The range for gamma adjustment, expressed in percent: the sampled value is divided by 100, so (80, 120) yields a gamma in [0.8, 1.2]. If `gamma_limit` is a single int, the range will be interpreted as (-gamma_limit, gamma_limit), defining how much to adjust the image's gamma. Default is (80, 120).
`always_apply`		Deprecated. Use `p=1` instead.
`p`	`float`	The probability that the transform will be applied. Default is 0.5.

Targets

image

Image types: uint8, float32

Reference

https://en.wikipedia.org/wiki/Gamma_correction
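
A short sketch of the sampling and the correction itself may help. `sample_gamma` mirrors the division by 100 performed in `get_params`. `gamma_correct` assumes the textbook power-law form on values normalized to [0, 1]; `fmain.gamma_transform` is not shown on this page, so that part is an assumption, and both names are hypothetical.

```python
import random


def sample_gamma(gamma_limit=(80, 120), rng=random):
    """Sample gamma from a percent range, e.g. (80, 120) gives [0.8, 1.2]."""
    return rng.uniform(gamma_limit[0], gamma_limit[1]) / 100.0


def gamma_correct(value, gamma, max_value=255):
    """Apply power-law gamma correction to one intensity value."""
    # Normalize to [0, 1], apply the exponent, then scale back and round.
    return round((value / max_value) ** gamma * max_value)
```

Under this form, a gamma above 1 darkens mid-tones and a gamma below 1 brightens them, while 0 and `max_value` stay fixed.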

Source code in albumentations/augmentations/transforms.py

Python

class RandomGamma(ImageOnlyTransform):
    """Applies random gamma correction to an image as a form of data augmentation.

    This class adjusts the luminance of an image by applying gamma correction with a randomly
    selected gamma value from a specified range. Gamma correction can simulate various lighting
    conditions, potentially enhancing model generalization.

    Attributes:
        gamma_limit (Union[int, tuple[int, int]]): The range for gamma adjustment. If `gamma_limit` is a single
            int, the range will be interpreted as (-gamma_limit, gamma_limit), defining how much
            to adjust the image's gamma. Default is (80, 120).
        always_apply: Deprecated. Use `p=1` instead.
        p (float): The probability that the transform will be applied. Default is 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
         https://en.wikipedia.org/wiki/Gamma_correction

    """

    class InitSchema(BaseTransformInitSchema):
        gamma_limit: OnePlusFloatRangeType = (80, 120)

    def __init__(
        self,
        gamma_limit: ScaleIntType = (80, 120),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.gamma_limit = cast(Tuple[float, float], gamma_limit)

    def apply(self, img: np.ndarray, gamma: float, **params: Any) -> np.ndarray:
        return fmain.gamma_transform(img, gamma=gamma)

    def get_params(self) -> dict[str, float]:
        return {"gamma": random.uniform(self.gamma_limit[0], self.gamma_limit[1]) / 100.0}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("gamma_limit",)

`apply (self, img, gamma, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, gamma: float, **params: Any) -> np.ndarray:
    return fmain.gamma_transform(img, gamma=gamma)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, float]:
    return {"gamma": random.uniform(self.gamma_limit[0], self.gamma_limit[1]) / 100.0}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("gamma_limit",)

`class RandomGravel` `(gravel_roi=(0.1, 0.4, 0.9, 0.9), number_of_patches=2, always_apply=None, p=0.5)` [view source on GitHub] ¶

Adds gravel patches to the image.

Parameters:

Name	Type	Description
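`gravel_roi`	`tuple[float, float, float, float]`	Region in which gravel is placed, as (top-left x, top-left y, bottom-right x, bottom-right y). Values are fractions of the image size and should be in [0, 1] range. Default: (0.1, 0.4, 0.9, 0.9).
`number_of_patches`	`int`	Number of gravel patches to generate. Must be at least 1. Default: 2.
`p`	`float`	Probability of applying the transform. Default: 0.5.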

Targets

image

Image types: uint8, float32

Reference

https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Source code in albumentations/augmentations/transforms.py

Python

class RandomGravel(ImageOnlyTransform):
    """Add gravels.

    Args:
        gravel_roi: (top-left x, top-left y,
            bottom-right x, bottom right y). Should be in [0, 1] range
        number_of_patches: no. of gravel patches required

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    """

    class InitSchema(BaseTransformInitSchema):
        gravel_roi: tuple[float, float, float, float] = Field(
            default=(0.1, 0.4, 0.9, 0.9),
            description="Region of interest for gravel placement",
        )
        number_of_patches: int = Field(default=2, description="Number of gravel patches", ge=1)

        @model_validator(mode="after")
        def validate_gravel_roi(self) -> Self:
            gravel_lower_x, gravel_lower_y, gravel_upper_x, gravel_upper_y = self.gravel_roi
            if not 0 <= gravel_lower_x < gravel_upper_x <= 1 or not 0 <= gravel_lower_y < gravel_upper_y <= 1:
                raise ValueError(f"Invalid gravel_roi. Got: {self.gravel_roi}.")
            return self

    def __init__(
        self,
        gravel_roi: tuple[float, float, float, float] = (0.1, 0.4, 0.9, 0.9),
        number_of_patches: int = 2,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.gravel_roi = gravel_roi
        self.number_of_patches = number_of_patches

    def generate_gravel_patch(self, rectangular_roi: tuple[int, int, int, int]) -> np.ndarray:
        x1, y1, x2, y2 = rectangular_roi
        area = abs((x2 - x1) * (y2 - y1))
        count = area // 10
        gravels = np.empty([count, 2], dtype=np.int64)
        gravels[:, 0] = random_utils.randint(x1, x2, count)
        gravels[:, 1] = random_utils.randint(y1, y2, count)
        return gravels

    def apply(self, img: np.ndarray, gravels_infos: list[Any], **params: Any) -> np.ndarray:
        if gravels_infos is None:
            gravels_infos = []
        return fmain.add_gravel(img, gravels_infos)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
        height, width = params["shape"][:2]

        x_min, y_min, x_max, y_max = self.gravel_roi
        x_min = int(x_min * width)
        x_max = int(x_max * width)
        y_min = int(y_min * height)
        y_max = int(y_max * height)

        max_height = 200
        max_width = 30

        rectangular_rois = np.zeros([self.number_of_patches, 4], dtype=np.int64)
        xx1 = random_utils.randint(x_min + 1, x_max, self.number_of_patches)  # xmax
        xx2 = random_utils.randint(x_min, xx1)  # xmin
        yy1 = random_utils.randint(y_min + 1, y_max, self.number_of_patches)  # ymax
        yy2 = random_utils.randint(y_min, yy1)  # ymin

        rectangular_rois[:, 0] = xx2
        rectangular_rois[:, 1] = yy2
        rectangular_rois[:, 2] = [min(tup) for tup in zip(xx1, xx2 + max_height)]
        rectangular_rois[:, 3] = [min(tup) for tup in zip(yy1, yy2 + max_width)]

        minx = []
        maxx = []
        miny = []
        maxy = []
        val = []
        for roi in rectangular_rois:
            gravels = self.generate_gravel_patch(roi)
            x = gravels[:, 0]
            y = gravels[:, 1]
            r = random_utils.randint(1, 4, len(gravels))
            sat = random_utils.randint(0, 255, len(gravels))
            miny.append(np.maximum(y - r, 0))
            maxy.append(np.minimum(y + r, y))
            minx.append(np.maximum(x - r, 0))
            maxx.append(np.minimum(x + r, x))
            val.append(sat)

        return {
            "gravels_infos": np.stack(
                [
                    np.concatenate(miny),
                    np.concatenate(maxy),
                    np.concatenate(minx),
                    np.concatenate(maxx),
                    np.concatenate(val),
                ],
                1,
            ),
        }

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return "gravel_roi", "number_of_patches"

`apply (self, img, gravels_infos, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, gravels_infos: list[Any], **params: Any) -> np.ndarray:
    if gravels_infos is None:
        gravels_infos = []
    return fmain.add_gravel(img, gravels_infos)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
    height, width = params["shape"][:2]

    x_min, y_min, x_max, y_max = self.gravel_roi
    x_min = int(x_min * width)
    x_max = int(x_max * width)
    y_min = int(y_min * height)
    y_max = int(y_max * height)

    max_height = 200
    max_width = 30

    rectangular_rois = np.zeros([self.number_of_patches, 4], dtype=np.int64)
    xx1 = random_utils.randint(x_min + 1, x_max, self.number_of_patches)  # xmax
    xx2 = random_utils.randint(x_min, xx1)  # xmin
    yy1 = random_utils.randint(y_min + 1, y_max, self.number_of_patches)  # ymax
    yy2 = random_utils.randint(y_min, yy1)  # ymin

    rectangular_rois[:, 0] = xx2
    rectangular_rois[:, 1] = yy2
    rectangular_rois[:, 2] = [min(tup) for tup in zip(xx1, xx2 + max_height)]
    rectangular_rois[:, 3] = [min(tup) for tup in zip(yy1, yy2 + max_width)]

    minx = []
    maxx = []
    miny = []
    maxy = []
    val = []
    for roi in rectangular_rois:
        gravels = self.generate_gravel_patch(roi)
        x = gravels[:, 0]
        y = gravels[:, 1]
        r = random_utils.randint(1, 4, len(gravels))
        sat = random_utils.randint(0, 255, len(gravels))
        miny.append(np.maximum(y - r, 0))
        maxy.append(np.minimum(y + r, y))
        minx.append(np.maximum(x - r, 0))
        maxx.append(np.minimum(x + r, x))
        val.append(sat)

    return {
        "gravels_infos": np.stack(
            [
                np.concatenate(miny),
                np.concatenate(maxy),
                np.concatenate(minx),
                np.concatenate(maxx),
                np.concatenate(val),
            ],
            1,
        ),
    }
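
The first step above, turning the relative `gravel_roi` into pixel coordinates, can be tried on its own. The helper below is a sketch with a hypothetical name, `roi_to_pixels`; it combines the range check from `validate_gravel_roi` with the scaling done at the start of `get_params_dependent_on_data`.

```python
def roi_to_pixels(gravel_roi, height, width):
    """Validate a relative ROI and convert it to (x_min, y_min, x_max, y_max) in pixels."""
    x_min, y_min, x_max, y_max = gravel_roi
    # Same condition as the validator: ordered corners inside the unit square.
    if not 0 <= x_min < x_max <= 1 or not 0 <= y_min < y_max <= 1:
        raise ValueError(f"Invalid gravel_roi. Got: {gravel_roi}.")
    # x values scale with the width, y values with the height.
    return (
        int(x_min * width),
        int(y_min * height),
        int(x_max * width),
        int(y_max * height),
    )
```

For a 100 x 200 image (height x width), `roi_to_pixels((0.25, 0.5, 0.75, 1.0), 100, 200)` covers the lower middle of the frame.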

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str]:
    return "gravel_roi", "number_of_patches"

`class RandomGridShuffle` `(grid=(3, 3), p=0.5, always_apply=None)` [view source on GitHub] ¶

Randomly shuffles the grid's cells on an image, mask, or keypoints, effectively rearranging patches within the image. This transformation divides the image into a grid and then permutes these grid cells based on a random mapping.

Parameters:

Name	Type	Description
`grid`	`tuple[int, int]`	Size of the grid for splitting the image into cells. Each cell is shuffled randomly. Default: (3, 3).
`p`	`float`	Probability that the transform will be applied. Default: 0.5.

Targets

image, mask, keypoints

Image types: uint8, float32
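
The idea of permuting grid cells, and moving keypoints with them, can be sketched on a plain nested list. This is an illustration under simplifying assumptions, not the library's implementation: the image dimensions are assumed to be divisible by the grid, `mapping[i]` is taken to be the destination cell of source cell `i`, and `split_grid`, `shuffle_tiles`, and `move_keypoint` are hypothetical names.

```python
import random


def split_grid(height, width, grid):
    """Return cell boxes as (start_y, start_x, end_y, end_x), in row-major order."""
    rows, cols = grid
    th, tw = height // rows, width // cols
    return [(r * th, c * tw, (r + 1) * th, (c + 1) * tw) for r in range(rows) for c in range(cols)]


def shuffle_tiles(image, tiles, mapping):
    """Copy every source cell i into destination cell mapping[i]."""
    out = [row[:] for row in image]
    for src, dst in enumerate(mapping):
        sy, sx, ey, ex = tiles[src]
        dy, dx = tiles[dst][:2]
        for y in range(sy, ey):
            for x in range(sx, ex):
                out[dy + y - sy][dx + x - sx] = image[y][x]
    return out


def move_keypoint(x, y, tiles, mapping):
    """Move a keypoint together with the cell that contains it."""
    for src, dst in enumerate(mapping):
        sy, sx, ey, ex = tiles[src]
        if sy <= y < ey and sx <= x < ex:
            dy, dx = tiles[dst][:2]
            # Keep the offset inside the cell, change only the cell origin.
            return dx + (x - sx), dy + (y - sy)
    return x, y
```

A random permutation can be drawn with `random.sample(range(len(tiles)), len(tiles))`. Using the same `tiles` and `mapping` for the image, the mask, and the keypoints is what keeps the targets consistent.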

Examples:

Python

>>> import albumentations as A
>>> transform = A.Compose([
...     A.RandomGridShuffle(grid=(3, 3), p=1.0)
... ])
>>> transformed = transform(image=my_image, mask=my_mask)
>>> image, mask = transformed['image'], transformed['mask']
>>> # This shuffles the 3x3 grid cells of `my_image` and `my_mask` randomly.
>>> # The mask and the image are shuffled in a consistent way.

Note

This transform could be useful when only micro features are important for the model, and memorizing the global structure could be harmful. For example:

- Identifying the type of cell phone used to take a picture based on micro artifacts generated by phone post-processing algorithms, rather than the semantic features of the photo. See more at https://ieeexplore.ieee.org/abstract/document/8622031
- Identifying stress, glucose, hydration levels based on skin images.

Source code in albumentations/augmentations/transforms.py

Python

class RandomGridShuffle(DualTransform):
    """Randomly shuffles the grid's cells on an image, mask, or keypoints,
    effectively rearranging patches within the image.
    This transformation divides the image into a grid and then permutes these grid cells based on a random mapping.


    Args:
        grid (tuple[int, int]): Size of the grid for splitting the image into cells. Each cell is shuffled randomly.
        p (float): Probability that the transform will be applied.

    Targets:
        image, mask, keypoints

    Image types:
        uint8, float32

    Examples:
        >>> import albumentations as A
        >>> transform = A.Compose([
            A.RandomGridShuffle(grid=(3, 3), p=1.0)
        ])
        >>> transformed = transform(image=my_image, mask=my_mask)
        >>> image, mask = transformed['image'], transformed['mask']
        # This will shuffle the 3x3 grid cells of `my_image` and `my_mask` randomly.
        # Mask and image are shuffled in a consistent way
    Note:
        This transform could be useful when only micro features are important for the model, and memorizing
        the global structure could be harmful. For example:
        - Identifying the type of cell phone used to take a picture based on micro artifacts generated by
        phone post-processing algorithms, rather than the semantic features of the photo.
        See more at https://ieeexplore.ieee.org/abstract/document/8622031
        - Identifying stress, glucose, hydration levels based on skin images.
    """

    class InitSchema(BaseTransformInitSchema):
        grid: Annotated[tuple[int, int], AfterValidator(check_1plus)] = (3, 3)

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS)

    def __init__(self, grid: tuple[int, int] = (3, 3), p: float = 0.5, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.grid = grid

    def apply(self, img: np.ndarray, tiles: np.ndarray, mapping: list[int], **params: Any) -> np.ndarray:
        return fmain.swap_tiles_on_image(img, tiles, mapping)

    def apply_to_mask(self, mask: np.ndarray, tiles: np.ndarray, mapping: list[int], **params: Any) -> np.ndarray:
        return fmain.swap_tiles_on_image(mask, tiles, mapping)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        tiles: np.ndarray,
        mapping: list[int],
        **params: Any,
    ) -> KeypointInternalType:
        x, y = keypoint[:2]

        # Find which original tile the keypoint belongs to
        for original_index, new_index in enumerate(mapping):
            start_y, start_x, end_y, end_x = tiles[original_index]
            # check if the keypoint is in this tile
            if start_y <= y < end_y and start_x <= x < end_x:
                # Get the new tile's coordinates
                new_start_y, new_start_x = tiles[new_index][:2]

                # Map the keypoint to the new tile's position
                new_x = (x - start_x) + new_start_x
                new_y = (y - start_y) + new_start_y

                return (new_x, new_y, *keypoint[2:])

        # If the keypoint wasn't in any tile (shouldn't happen), log a warning for debugging purposes
        warn(
            "Keypoint not in any tile, returning it unchanged. This is unexpected and should be investigated.",
            RuntimeWarning,
            stacklevel=2,
        )
        return keypoint

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
        height, width = params["shape"][:2]
        random_state = random_utils.get_random_state()
        original_tiles = fmain.split_uniform_grid(
            (height, width),
            self.grid,
            random_state=random_state,
        )
        shape_groups = fmain.create_shape_groups(original_tiles)
        mapping = fmain.shuffle_tiles_within_shape_groups(shape_groups, random_state=random_state)

        return {"tiles": original_tiles, "mapping": mapping}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("grid",)

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "keypoints": self.apply_to_keypoints,
        }
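
The keypoint remapping in `apply_to_keypoint` can be traced by hand on a small case. The following is a minimal sketch in plain Python that mirrors the loop in the source above; `remap_keypoint` is a hypothetical standalone helper, not part of the library, and it assumes tiles are stored as `(start_y, start_x, end_y, end_x)` and that `mapping[i]` is the destination tile for original tile `i`, as the source suggests.

```python
def remap_keypoint(keypoint, tiles, mapping):
    """Move a keypoint (x, y, ...) along with the tile that contains it."""
    x, y = keypoint[:2]
    for original_index, new_index in enumerate(mapping):
        start_y, start_x, end_y, end_x = tiles[original_index]
        # Half-open intervals: a point on the far edge belongs to the next tile.
        if start_y <= y < end_y and start_x <= x < end_x:
            new_start_y, new_start_x = tiles[new_index][:2]
            # Keep the offset inside the tile, change only the tile origin.
            return (x - start_x + new_start_x, y - start_y + new_start_y, *keypoint[2:])
    # Not inside any tile: return the keypoint unchanged.
    return keypoint


# Two side-by-side 50x50 tiles that swap places.
tiles = [(0, 0, 50, 50), (0, 50, 50, 100)]
mapping = [1, 0]

moved = remap_keypoint((10, 20, 0.0, 1.0), tiles, mapping)
print(moved)  # (60, 20, 0.0, 1.0)
```

Only the first two entries of the keypoint change; any trailing values (angle, scale, and so on) are passed through untouched.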

`apply (self, img, tiles, mapping, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, tiles: np.ndarray, mapping: list[int], **params: Any) -> np.ndarray:
    return fmain.swap_tiles_on_image(img, tiles, mapping)
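
The body of `fmain.swap_tiles_on_image` is not shown on this page. As a rough illustration of what a tile swap does, here is a sketch on a nested-list "image", assuming the same conventions as `apply_to_keypoint` (tiles as `(start_y, start_x, end_y, end_x)`, and `mapping[i]` naming the destination tile of original tile `i`). `swap_tiles` is a hypothetical stand-in, not the library function.

```python
def swap_tiles(image, tiles, mapping):
    """Copy each original tile into the position of its destination tile."""
    new_image = [row[:] for row in image]  # copy so the input stays intact
    for original_index, new_index in enumerate(mapping):
        start_y, start_x, end_y, end_x = tiles[original_index]
        new_start_y, new_start_x = tiles[new_index][:2]
        for dy in range(end_y - start_y):
            for dx in range(end_x - start_x):
                new_image[new_start_y + dy][new_start_x + dx] = image[start_y + dy][start_x + dx]
    return new_image


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
tiles = [(0, 0, 2, 2), (0, 2, 2, 4)]  # left half, right half
swapped = swap_tiles(image, tiles, mapping=[1, 0])
print(swapped)  # [[1, 1, 0, 0], [1, 1, 0, 0]]
```

Because the image and the mask receive the same `tiles` and `mapping`, applying the same swap to both keeps them aligned.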

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
    height, width = params["shape"][:2]
    random_state = random_utils.get_random_state()
    original_tiles = fmain.split_uniform_grid(
        (height, width),
        self.grid,
        random_state=random_state,
    )
    shape_groups = fmain.create_shape_groups(original_tiles)
    mapping = fmain.shuffle_tiles_within_shape_groups(shape_groups, random_state=random_state)

    return {"tiles": original_tiles, "mapping": mapping}
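
The helpers `split_uniform_grid`, `create_shape_groups`, and `shuffle_tiles_within_shape_groups` are not listed here, so the following is only a sketch of what such a pipeline could look like. It assumes that tiles are shuffled only among tiles of identical shape (as the helper names suggest), and it simplifies the split by giving any leftover pixels to the first rows and columns. All function names below are hypothetical.

```python
import random
from collections import defaultdict


def split_sizes(length, parts):
    """Split `length` into `parts` near-equal integer sizes."""
    base, extra = divmod(length, parts)
    return [base + 1 if i < extra else base for i in range(parts)]


def make_tiles(shape, grid):
    """Return tiles as (start_y, start_x, end_y, end_x), row by row."""
    height, width = shape
    rows, cols = grid
    tiles = []
    y = 0
    for tile_h in split_sizes(height, rows):
        x = 0
        for tile_w in split_sizes(width, cols):
            tiles.append((y, x, y + tile_h, x + tile_w))
            x += tile_w
        y += tile_h
    return tiles


def shuffle_within_shape_groups(tiles, rng):
    """Build a mapping that permutes tile indices only among same-shaped tiles."""
    groups = defaultdict(list)
    for index, (sy, sx, ey, ex) in enumerate(tiles):
        groups[(ey - sy, ex - sx)].append(index)
    mapping = list(range(len(tiles)))
    for indices in groups.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        for source, destination in zip(indices, shuffled):
            mapping[source] = destination
    return mapping


tiles = make_tiles((10, 7), (3, 3))
mapping = shuffle_within_shape_groups(tiles, random.Random(0))
```

Restricting the shuffle to same-shaped tiles means every tile fits exactly into its destination, even when the image size is not divisible by the grid size.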

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("grid",)

`class RandomRain` `(slant_lower=None, slant_upper=None, slant_range=(-10, 10), drop_length=20, drop_width=1, drop_color=(200, 200, 200), blur_value=7, brightness_coefficient=0.7, rain_type=None, always_apply=None, p=0.5)` [view source on GitHub] ¶

Adds rain effects to an image.

Parameters:

Name	Type	Description
`slant_range`	`tuple[int, int]`	Range (slant_lower, slant_upper) from which the rain slant angle is sampled.
`drop_length`	`int`	Length of the raindrops.
`drop_width`	`int`	Width of the raindrops.
`drop_color`	`tuple[int, int, int]`	Color of the rain drops in RGB format.
`blur_value`	`int`	Blur value for simulating rain effect. Rainy views are blurry.
`brightness_coefficient`	`float`	Coefficient to adjust the brightness of the image. Rainy days are usually shady. Should be in the range (0, 1].
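`rain_type`	`Optional[str]`	Type of rain to simulate. One of [None, "drizzle", "heavy", "torrential"]. Any value other than None selects a preset drop density and drop length, and `drop_length` is then ignored.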

Targets

image

Image types: uint8, float32

Reference

https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Source code in albumentations/augmentations/transforms.py

Python

class RandomRain(ImageOnlyTransform):
    """Adds rain effects to an image.

    Args:
        slant_range (tuple[int, int]): tuple of type (slant_lower, slant_upper) representing the range for
            rain slant angle.
        drop_length (int): Length of the raindrops.
        drop_width (int): Width of the raindrops.
        drop_color (tuple[int, int, int]): Color of the rain drops in RGB format.
        blur_value (int): Blur value for simulating rain effect. Rainy views are blurry.
        brightness_coefficient (float): Coefficient to adjust the brightness of the image.
            Rainy days are usually shady. Should be in the range (0, 1].
        rain_type (Optional[str]): Type of rain to simulate. One of [None, "drizzle", "heavy", "torrential"].


    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    """

    class InitSchema(BaseTransformInitSchema):
        slant_lower: int | None = Field(
            default=None,
            description="Lower bound for rain slant angle",
        )
        slant_upper: int | None = Field(
            default=None,
            description="Upper bound for rain slant angle",
        )
        slant_range: Annotated[tuple[float, float], AfterValidator(nondecreasing)] = Field(
            default=(-10, 10),
            description="tuple like (slant_lower, slant_upper) for rain slant angle",
        )
        drop_length: int = Field(default=20, description="Length of raindrops", ge=1)
        drop_width: int = Field(default=1, description="Width of raindrops", ge=1)
        drop_color: tuple[int, int, int] = Field(default=(200, 200, 200), description="Color of raindrops")
        blur_value: int = Field(default=7, description="Blur value for simulating rain effect", ge=1)
        brightness_coefficient: float = Field(
            default=0.7,
            description="Brightness coefficient for rainy effect",
            gt=0,
            le=1,
        )
        rain_type: RainMode | None = Field(default=None, description="Type of rain to simulate")

        @model_validator(mode="after")
        def validate_ranges(self) -> Self:
            if self.slant_lower is not None or self.slant_upper is not None:
                if self.slant_lower is not None:
                    warn(
                        "`slant_lower` deprecated. Use `slant_range` as tuple (slant_lower, slant_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                if self.slant_upper is not None:
                    warn(
                        "`slant_upper` deprecated. Use `slant_range` as tuple (slant_lower, slant_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                lower = self.slant_lower if self.slant_lower is not None else self.slant_range[0]
                upper = self.slant_upper if self.slant_upper is not None else self.slant_range[1]
                self.slant_range = (lower, upper)
                self.slant_lower = None
                self.slant_upper = None

            # Validate the slant_range
            if not (-MAX_RAIN_ANGLE <= self.slant_range[0] <= self.slant_range[1] <= MAX_RAIN_ANGLE):
                raise ValueError(
                    f"slant_range values should be increasing within [-{MAX_RAIN_ANGLE}, {MAX_RAIN_ANGLE}] range.",
                )
            return self

    def __init__(
        self,
        slant_lower: int | None = None,
        slant_upper: int | None = None,
        slant_range: tuple[int, int] = (-10, 10),
        drop_length: int = 20,
        drop_width: int = 1,
        drop_color: tuple[int, int, int] = (200, 200, 200),
        blur_value: int = 7,
        brightness_coefficient: float = 0.7,
        rain_type: RainMode | None = None,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.slant_range = slant_range
        self.drop_length = drop_length
        self.drop_width = drop_width
        self.drop_color = drop_color
        self.blur_value = blur_value
        self.brightness_coefficient = brightness_coefficient
        self.rain_type = rain_type

    def apply(
        self,
        img: np.ndarray,
        slant: int,
        drop_length: int,
        rain_drops: list[tuple[int, int]],
        **params: Any,
    ) -> np.ndarray:
        return fmain.add_rain(
            img,
            slant,
            drop_length,
            self.drop_width,
            self.drop_color,
            self.blur_value,
            self.brightness_coefficient,
            rain_drops,
        )

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        slant = int(random.uniform(*self.slant_range))

        height, width = params["shape"][:2]
        area = height * width

        if self.rain_type == "drizzle":
            num_drops = area // 770
            drop_length = 10
        elif self.rain_type == "heavy":
            num_drops = width * height // 600
            drop_length = 30
        elif self.rain_type == "torrential":
            num_drops = area // 500
            drop_length = 60
        else:
            drop_length = self.drop_length
            num_drops = area // 600

        rain_drops = []

        for _ in range(num_drops):  # If You want heavy rain, try increasing this
            x = random.randint(slant, width) if slant < 0 else random.randint(0, width - slant)

            y = random.randint(0, height - drop_length)

            rain_drops.append((x, y))

        return {"drop_length": drop_length, "slant": slant, "rain_drops": rain_drops}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "slant_range",
            "drop_length",
            "drop_width",
            "drop_color",
            "blur_value",
            "brightness_coefficient",
            "rain_type",
        )

`apply (self, img, slant, drop_length, rain_drops, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    slant: int,
    drop_length: int,
    rain_drops: list[tuple[int, int]],
    **params: Any,
) -> np.ndarray:
    return fmain.add_rain(
        img,
        slant,
        drop_length,
        self.drop_width,
        self.drop_color,
        self.blur_value,
        self.brightness_coefficient,
        rain_drops,
    )

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    slant = int(random.uniform(*self.slant_range))

    height, width = params["shape"][:2]
    area = height * width

    if self.rain_type == "drizzle":
        num_drops = area // 770
        drop_length = 10
    elif self.rain_type == "heavy":
        num_drops = width * height // 600
        drop_length = 30
    elif self.rain_type == "torrential":
        num_drops = area // 500
        drop_length = 60
    else:
        drop_length = self.drop_length
        num_drops = area // 600

    rain_drops = []

    for _ in range(num_drops):  # If You want heavy rain, try increasing this
        x = random.randint(slant, width) if slant < 0 else random.randint(0, width - slant)

        y = random.randint(0, height - drop_length)

        rain_drops.append((x, y))

    return {"drop_length": drop_length, "slant": slant, "rain_drops": rain_drops}
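
To see how `rain_type` changes the sampled parameters, the logic above can be reproduced with the standard library alone. This is a sketch, not the library code: `sample_rain_drops` and its `rng` argument are hypothetical, and the preset numbers are copied from the source shown above.

```python
import random

# (area divisor, drop length) per rain type, as in the source above.
RAIN_PRESETS = {"drizzle": (770, 10), "heavy": (600, 30), "torrential": (500, 60)}


def sample_rain_drops(height, width, slant, rain_type=None, drop_length=20, rng=None):
    """Return (drop_length, list of (x, y) drop start points)."""
    if rng is None:
        rng = random.Random()
    if rain_type in RAIN_PRESETS:
        divisor, length = RAIN_PRESETS[rain_type]
    else:
        # No preset: use the caller's drop_length and the default density.
        divisor, length = 600, drop_length
    num_drops = (height * width) // divisor

    drops = []
    for _ in range(num_drops):
        # Shift the x range so a slanted drop can still end inside the image.
        x = rng.randint(slant, width) if slant < 0 else rng.randint(0, width - slant)
        y = rng.randint(0, height - length)
        drops.append((x, y))
    return length, drops


length, drops = sample_rain_drops(600, 600, slant=5, rain_type="drizzle", rng=random.Random(0))
print(length, len(drops))  # 10 467
```

Two details follow from the source: the default mode and "heavy" use the same density (`area // 600`) and differ only in drop length, and since `random.randint` raises `ValueError` on an empty range, this sketch assumes `height >= drop_length` and `width >= slant`.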

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "slant_range",
        "drop_length",
        "drop_width",
        "drop_color",
        "blur_value",
        "brightness_coefficient",
        "rain_type",
    )

`class RandomShadow` `(shadow_roi=(0, 0.5, 1, 1), num_shadows_limit=(1, 2), num_shadows_lower=None, num_shadows_upper=None, shadow_dimension=5, always_apply=None, p=0.5)` [view source on GitHub] ¶

Simulates shadows on the image.

Parameters:

Name	Type	Description
`shadow_roi`	`tuple[float, float, float, float]`	Region of the image where shadows will appear, given as (x_min, y_min, x_max, y_max). All values should be in range [0, 1].
`num_shadows_limit`	`tuple[int, int]`	Lower and upper limits for the possible number of shadows.
`shadow_dimension`	`int`	Number of edges in the shadow polygons.

Targets

image

Image types: uint8, float32

Reference

https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Source code in albumentations/augmentations/transforms.py

Python

class RandomShadow(ImageOnlyTransform):
    """Simulates shadows for the image

    Args:
        shadow_roi: region of the image where shadows
            will appear. All values should be in range [0, 1].
        num_shadows_limit: Lower and upper limits for the possible number of shadows.
        shadow_dimension: number of edges in the shadow polygons

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
    """

    class InitSchema(BaseTransformInitSchema):
        shadow_roi: tuple[float, float, float, float] = Field(
            default=(0, 0.5, 1, 1),
            description="Region of the image where shadows will appear",
        )
        num_shadows_limit: Annotated[tuple[int, int], AfterValidator(check_1plus), AfterValidator(nondecreasing)] = (
            1,
            2,
        )
        num_shadows_lower: int | None = Field(
            default=None,
            description="Lower limit for the possible number of shadows",
        )
        num_shadows_upper: int | None = Field(
            default=None,
            description="Upper limit for the possible number of shadows",
        )
        shadow_dimension: int = Field(default=5, description="Number of edges in the shadow polygons", ge=1)

        @model_validator(mode="after")
        def validate_shadows(self) -> Self:
            if self.num_shadows_lower is not None:
                warn(
                    "`num_shadows_lower` is deprecated. Use `num_shadows_limit` instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )

            if self.num_shadows_upper is not None:
                warn(
                    "`num_shadows_upper` is deprecated. Use `num_shadows_limit` instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )

            if self.num_shadows_lower is not None or self.num_shadows_upper is not None:
                num_shadows_lower = (
                    self.num_shadows_lower if self.num_shadows_lower is not None else self.num_shadows_limit[0]
                )
                num_shadows_upper = (
                    self.num_shadows_upper if self.num_shadows_upper is not None else self.num_shadows_limit[1]
                )

                self.num_shadows_limit = (num_shadows_lower, num_shadows_upper)
                self.num_shadows_lower = None
                self.num_shadows_upper = None

            shadow_lower_x, shadow_lower_y, shadow_upper_x, shadow_upper_y = self.shadow_roi

            if not 0 <= shadow_lower_x <= shadow_upper_x <= 1 or not 0 <= shadow_lower_y <= shadow_upper_y <= 1:
                raise ValueError(f"Invalid shadow_roi. Got: {self.shadow_roi}")

            return self

    def __init__(
        self,
        shadow_roi: tuple[float, float, float, float] = (0, 0.5, 1, 1),
        num_shadows_limit: tuple[int, int] = (1, 2),
        num_shadows_lower: int | None = None,
        num_shadows_upper: int | None = None,
        shadow_dimension: int = 5,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.shadow_roi = shadow_roi
        self.shadow_dimension = shadow_dimension
        self.num_shadows_limit = num_shadows_limit

    def apply(self, img: np.ndarray, vertices_list: list[np.ndarray], **params: Any) -> np.ndarray:
        return fmain.add_shadow(img, vertices_list)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, list[np.ndarray]]:
        height, width = params["shape"][:2]

        num_shadows = random.randint(self.num_shadows_limit[0], self.num_shadows_limit[1])

        x_min, y_min, x_max, y_max = self.shadow_roi

        x_min = int(x_min * width)
        x_max = int(x_max * width)
        y_min = int(y_min * height)
        y_max = int(y_max * height)

        vertices_list = [
            np.stack(
                [
                    random_utils.randint(x_min, x_max, size=5),
                    random_utils.randint(y_min, y_max, size=5),
                ],
                axis=1,
            )
            for _ in range(num_shadows)
        ]

        return {"vertices_list": vertices_list}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "shadow_roi",
            "num_shadows_limit",
            "shadow_dimension",
        )

`apply (self, img, vertices_list, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, vertices_list: list[np.ndarray], **params: Any) -> np.ndarray:
    return fmain.add_shadow(img, vertices_list)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, list[np.ndarray]]:
    height, width = params["shape"][:2]

    num_shadows = random.randint(self.num_shadows_limit[0], self.num_shadows_limit[1])

    x_min, y_min, x_max, y_max = self.shadow_roi

    x_min = int(x_min * width)
    x_max = int(x_max * width)
    y_min = int(y_min * height)
    y_max = int(y_max * height)

    vertices_list = [
        np.stack(
            [
                random_utils.randint(x_min, x_max, size=5),
                random_utils.randint(y_min, y_max, size=5),
            ],
            axis=1,
        )
        for _ in range(num_shadows)
    ]

    return {"vertices_list": vertices_list}
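
The conversion from the relative `shadow_roi` to pixel coordinates, followed by vertex sampling, can be sketched with the standard library. `sample_shadow_vertices` is a hypothetical helper, and it assumes `random_utils.randint` follows the NumPy convention of an exclusive upper bound. The source above samples a fixed 5 vertices per polygon; the sketch exposes that count as a `num_vertices` parameter (an illustration choice, defaulting to 5).

```python
import random


def sample_shadow_vertices(shape, shadow_roi=(0, 0.5, 1, 1), num_shadows_limit=(1, 2),
                           num_vertices=5, rng=None):
    """Return one list of (x, y) pixel vertices per shadow polygon."""
    if rng is None:
        rng = random.Random()
    height, width = shape[:2]

    # Scale the relative ROI to pixel coordinates.
    x_min, y_min, x_max, y_max = shadow_roi
    x_min, x_max = int(x_min * width), int(x_max * width)
    y_min, y_max = int(y_min * height), int(y_max * height)

    # Both limits are inclusive, as with random.randint in the source.
    num_shadows = rng.randint(num_shadows_limit[0], num_shadows_limit[1])

    return [
        [(rng.randrange(x_min, x_max), rng.randrange(y_min, y_max)) for _ in range(num_vertices)]
        for _ in range(num_shadows)
    ]


polygons = sample_shadow_vertices((100, 200), rng=random.Random(1))
```

With the default ROI `(0, 0.5, 1, 1)` and a 100x200 image, every vertex lands in the lower half of the image: `0 <= x < 200` and `50 <= y < 100`.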

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "shadow_roi",
        "num_shadows_limit",
        "shadow_dimension",
    )

`class RandomSnow` `(snow_point_lower=None, snow_point_upper=None, brightness_coeff=2.5, snow_point_range=(0.1, 0.3), always_apply=None, p=0.5)` [view source on GitHub] ¶

Bleach out some pixel values imitating snow.

Parameters:

Name	Type	Description
`snow_point_range`	`tuple[float, float]`	Bounds on the amount of snow, given as (snow_point_lower, snow_point_upper). Both values should be in the (0, 1) range. Default: (0.1, 0.3).
`brightness_coeff`	`float`	Coefficient applied to increase the brightness of pixels below the snow_point threshold. Larger values lead to more pronounced snow effects. Should be > 0. Default: 2.5.
`p`	`float`	Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Reference

https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Source code in albumentations/augmentations/transforms.py

Python

class RandomSnow(ImageOnlyTransform):
    """Bleach out some pixel values imitating snow.

    Args:
        snow_point_range (tuple): tuple of bounds on the amount of snow i.e. (snow_point_lower, snow_point_upper).
            Both values should be in the (0, 1) range. Default: (0.1, 0.3).
        brightness_coeff (float): Coefficient applied to increase the brightness of pixels
            below the snow_point threshold. Larger values lead to more pronounced snow effects.
            Should be > 0. Default: 2.5.
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    """

    class InitSchema(BaseTransformInitSchema):
        snow_point_range: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)] = (
            Field(
                default=(0.1, 0.3),
                description="lower and upper bound on the amount of snow as tuple (snow_point_lower, snow_point_upper)",
            )
        )
        snow_point_lower: float | None = Field(
            default=None,
            description="Lower bound of the amount of snow",
            gt=0,
            lt=1,
        )
        snow_point_upper: float | None = Field(
            default=None,
            description="Upper bound of the amount of snow",
            gt=0,
            lt=1,
        )
        brightness_coeff: float = Field(default=2.5, description="Brightness coefficient, must be > 0", gt=0)

        @model_validator(mode="after")
        def validate_ranges(self) -> Self:
            if self.snow_point_lower is not None or self.snow_point_upper is not None:
                if self.snow_point_lower is not None:
                    warn(
                        "`snow_point_lower` deprecated. Use `snow_point_range` as tuple"
                        " (snow_point_lower, snow_point_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                if self.snow_point_upper is not None:
                    warn(
                        "`snow_point_upper` deprecated. Use `snow_point_range` as tuple"
                        "(snow_point_lower, snow_point_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                lower = self.snow_point_lower if self.snow_point_lower is not None else self.snow_point_range[0]
                upper = self.snow_point_upper if self.snow_point_upper is not None else self.snow_point_range[1]
                self.snow_point_range = (lower, upper)
                self.snow_point_lower = None
                self.snow_point_upper = None

            # Validate the snow_point_range
            if not (0 < self.snow_point_range[0] <= self.snow_point_range[1] < 1):
                raise ValueError("snow_point_range values should be increasing within (0, 1) range.")

            return self

    def __init__(
        self,
        snow_point_lower: float | None = None,
        snow_point_upper: float | None = None,
        brightness_coeff: float = 2.5,
        snow_point_range: tuple[float, float] = (0.1, 0.3),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)

        self.snow_point_range = snow_point_range
        self.brightness_coeff = brightness_coeff

    def apply(self, img: np.ndarray, snow_point: float, **params: Any) -> np.ndarray:
        return fmain.add_snow(img, snow_point, self.brightness_coeff)

    def get_params(self) -> dict[str, np.ndarray]:
        return {"snow_point": random.uniform(*self.snow_point_range)}

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return "snow_point_range", "brightness_coeff"
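
The `validate_ranges` validator above folds the deprecated `snow_point_lower` / `snow_point_upper` arguments into `snow_point_range` and then checks the result. The same pattern appears in `RandomRain` for `slant_range`. A minimal standalone sketch of that merge-then-validate step follows; `resolve_range` is a hypothetical helper and does not exist in the library.

```python
import warnings


def resolve_range(value_range, lower=None, upper=None, name="snow_point"):
    """Merge deprecated lower/upper bounds into a (low, high) range and validate it."""
    if lower is not None or upper is not None:
        warnings.warn(
            f"`{name}_lower` and `{name}_upper` are deprecated. Use `{name}_range` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # A deprecated bound, when given, replaces the matching end of the range.
        value_range = (
            lower if lower is not None else value_range[0],
            upper if upper is not None else value_range[1],
        )

    # Same check as the source: both ends strictly inside (0, 1), non-decreasing.
    if not (0 < value_range[0] <= value_range[1] < 1):
        raise ValueError(f"{name}_range values should be increasing within (0, 1) range.")
    return value_range


resolved = resolve_range((0.1, 0.3))
print(resolved)  # (0.1, 0.3)
```

Because a deprecated bound overrides only one end, passing `lower=0.4` together with the default range yields `(0.4, 0.3)`, which fails validation.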

`apply (self, img, snow_point, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, snow_point: float, **params: Any) -> np.ndarray:
    return fmain.add_snow(img, snow_point, self.brightness_coeff)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, np.ndarray]:
    return {"snow_point": random.uniform(*self.snow_point_range)}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str]:
    return "snow_point_range", "brightness_coeff"

`class RandomSunFlare` `(flare_roi=(0, 0, 1, 0.5), angle_lower=None, angle_upper=None, num_flare_circles_lower=None, num_flare_circles_upper=None, src_radius=400, src_color=(255, 255, 255), angle_range=(0, 1), num_flare_circles_range=(6, 10), always_apply=None, p=0.5)` [view source on GitHub] ¶

Simulates sun flare on the image.

Parameters:

Name	Type	Description
`flare_roi`	`tuple[float, float, float, float]`	Tuple specifying the region of the image where flare will appear (x_min, y_min, x_max, y_max). All values should be in range [0, 1].
`src_radius`	`int`	Radius of the source for the flare.
`src_color`	`tuple[int, int, int]`	Color of the flare as an (R, G, B) tuple.
`angle_range`	`tuple[float, float]`	Range of angles for the flare. Both ends of the range are in the [0, 1] interval.
`num_flare_circles_range`	`tuple[int, int]`	Range for the number of flare circles.
`p`	`float`	Probability of applying the transform.
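
The `flare_roi` constraint can be checked before building the transform. The sketch below mirrors the comparison in the class's `InitSchema` validator shown in the source listing; `check_flare_roi` is a hypothetical helper, not a library function. Note the strict `<` between the minimum and maximum, so a zero-width or zero-height region is rejected.

```python
def check_flare_roi(flare_roi):
    """Validate (x_min, y_min, x_max, y_max) given in relative [0, 1] coordinates."""
    x_min, y_min, x_max, y_max = flare_roi
    if not 0 <= x_min < x_max <= 1 or not 0 <= y_min < y_max <= 1:
        raise ValueError(f"Invalid flare_roi. Got: {flare_roi}")
    return flare_roi


# The default region: full width, upper half of the image.
roi = check_flare_roi((0, 0, 1, 0.5))
```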

Targets

image

Image types: uint8

Reference

https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

Source code in albumentations/augmentations/transforms.py

Python

class RandomSunFlare(ImageOnlyTransform):
    """Simulates Sun Flare for the image

    Args:
        flare_roi (tuple[float, float, float, float]): Tuple specifying the region of the image where flare will
            appear (x_min, y_min, x_max, y_max). All values should be in range [0, 1].
        src_radius (int): Radius of the source for the flare.
        src_color (tuple[int, int, int]): Color of the flare as an (R, G, B) tuple.
        angle_range (tuple[float, float]): tuple specifying the range of angles for the flare.
            Both ends of the range are in the [0, 1] interval.
        num_flare_circles_range (tuple[int, int]): tuple specifying the range for the number of flare circles.
        p (float): Probability of applying the transform.

    Targets:
        image

    Image types:
        uint8

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
    """

    class InitSchema(BaseTransformInitSchema):
        flare_roi: tuple[float, float, float, float] = Field(
            default=(0, 0, 1, 0.5),
            description="Region of the image where flare will appear",
        )
        angle_lower: float | None = Field(default=None, description="Lower bound for the angle", ge=0, le=1)
        angle_upper: float | None = Field(default=None, description="Upper bound for the angle", ge=0, le=1)

        num_flare_circles_lower: int | None = Field(
            default=6,
            description="Lower limit for the number of flare circles",
            ge=0,
        )
        num_flare_circles_upper: int | None = Field(
            default=10,
            description="Upper limit for the number of flare circles",
            gt=0,
        )
        src_radius: int = Field(default=400, description="Source radius for the flare")
        src_color: tuple[int, ...] = Field(default=(255, 255, 255), description="Color of the flare")

        angle_range: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)] = Field(
            default=(0, 1),
            description="Angle range",
        )

        num_flare_circles_range: Annotated[
            tuple[int, int],
            AfterValidator(check_1plus),
            AfterValidator(nondecreasing),
        ] = Field(default=(6, 10), description="Number of flare circles range")

        @model_validator(mode="after")
        def validate_parameters(self) -> Self:
            flare_center_lower_x, flare_center_lower_y, flare_center_upper_x, flare_center_upper_y = self.flare_roi
            if (
                not 0 <= flare_center_lower_x < flare_center_upper_x <= 1
                or not 0 <= flare_center_lower_y < flare_center_upper_y <= 1
            ):
                raise ValueError(f"Invalid flare_roi. Got: {self.flare_roi}")

            if self.angle_lower is not None or self.angle_upper is not None:
                if self.angle_lower is not None:
                    warn(
                        "`angle_lower` deprecated. Use `angle_range` as tuple (angle_lower, angle_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                if self.angle_upper is not None:
                    warn(
                        "`angle_upper` deprecated. Use `angle_range` as tuple(angle_lower, angle_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                lower = self.angle_lower if self.angle_lower is not None else self.angle_range[0]
                upper = self.angle_upper if self.angle_upper is not None else self.angle_range[1]
                self.angle_range = (lower, upper)

            if self.num_flare_circles_lower is not None or self.num_flare_circles_upper is not None:
                if self.num_flare_circles_lower is not None:
                    warn(
                        "`num_flare_circles_lower` deprecated. Use `num_flare_circles_range` as tuple"
                        " (num_flare_circles_lower, num_flare_circles_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                if self.num_flare_circles_upper is not None:
                    warn(
                        "`num_flare_circles_upper` deprecated. Use `num_flare_circles_range` as tuple"
                        " (num_flare_circles_lower, num_flare_circles_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                lower = (
                    self.num_flare_circles_lower
                    if self.num_flare_circles_lower is not None
                    else self.num_flare_circles_range[0]
                )
                upper = (
                    self.num_flare_circles_upper
                    if self.num_flare_circles_upper is not None
                    else self.num_flare_circles_range[1]
                )
                self.num_flare_circles_range = (lower, upper)

            return self

    def __init__(
        self,
        flare_roi: tuple[float, float, float, float] = (0, 0, 1, 0.5),
        angle_lower: float | None = None,
        angle_upper: float | None = None,
        num_flare_circles_lower: int | None = None,
        num_flare_circles_upper: int | None = None,
        src_radius: int = 400,
        src_color: tuple[int, ...] = (255, 255, 255),
        angle_range: tuple[float, float] = (0, 1),
        num_flare_circles_range: tuple[int, int] = (6, 10),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.angle_range = angle_range
        self.num_flare_circles_range = num_flare_circles_range

        self.src_radius = src_radius
        self.src_color = src_color
        self.flare_roi = flare_roi

    def apply(
        self,
        img: np.ndarray,
        flare_center: tuple[float, float],
        circles: list[Any],
        **params: Any,
    ) -> np.ndarray:
        if circles is None:
            circles = []
        return fmain.add_sun_flare(
            img,
            flare_center,
            self.src_radius,
            self.src_color,
            circles,
        )

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        angle = 2 * math.pi * random.uniform(*self.angle_range)

        (flare_center_lower_x, flare_center_lower_y, flare_center_upper_x, flare_center_upper_y) = self.flare_roi

        flare_center_x = random.uniform(flare_center_lower_x, flare_center_upper_x)
        flare_center_y = random.uniform(flare_center_lower_y, flare_center_upper_y)

        flare_center_x = int(width * flare_center_x)
        flare_center_y = int(height * flare_center_y)

        num_circles = random.randint(*self.num_flare_circles_range)

        circles = []

        x = []
        y = []

        def line(t: float) -> tuple[float, float]:
            return (flare_center_x + t * math.cos(angle), flare_center_y + t * math.sin(angle))

        for t_val in range(-flare_center_x, width - flare_center_x, 10):
            rand_x, rand_y = line(t_val)
            x.append(rand_x)
            y.append(rand_y)

        for _ in range(num_circles):
            alpha = random.uniform(0.05, 0.2)
            r = random.randint(0, len(x) - 1)
            rad = random.randint(1, max(height // 100 - 2, 2))

            r_color = random.randint(max(self.src_color[0] - 50, 0), self.src_color[0])
            g_color = random.randint(max(self.src_color[1] - 50, 0), self.src_color[1])
            b_color = random.randint(max(self.src_color[2] - 50, 0), self.src_color[2])

            circles += [
                (
                    alpha,
                    (int(x[r]), int(y[r])),
                    pow(rad, 3),
                    (r_color, g_color, b_color),
                ),
            ]

        return {
            "circles": circles,
            "flare_center": (flare_center_x, flare_center_y),
        }

    def get_transform_init_args(self) -> dict[str, Any]:
        return {
            "flare_roi": self.flare_roi,
            "angle_range": self.angle_range,
            "num_flare_circles_range": self.num_flare_circles_range,
            "src_radius": self.src_radius,
            "src_color": self.src_color,
        }
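
The validator in the listing above folds the deprecated `*_lower` / `*_upper` arguments into the newer range tuples. A minimal standalone sketch of that fallback is shown here; `resolve_range` is a hypothetical helper written for illustration and is not part of Albumentations.

```python
import warnings


def resolve_range(lower, upper, value_range, name):
    """Merge deprecated scalar bounds into a (low, high) tuple.

    Hypothetical helper: a deprecated bound, when given, overrides the
    matching end of `value_range`; otherwise the range value is kept.
    """
    for suffix, bound in (("lower", lower), ("upper", upper)):
        if bound is not None:
            warnings.warn(
                f"`{name}_{suffix}` deprecated. Use `{name}_range` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
    low = lower if lower is not None else value_range[0]
    high = upper if upper is not None else value_range[1]
    return (low, high)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    merged = resolve_range(0.25, None, (0.0, 1.0), "angle")

print(merged)       # (0.25, 1.0)
print(len(caught))  # 1
```

Only the bound that was actually passed triggers a warning, and the other end of the tuple keeps its `*_range` value.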

`apply (self, img, flare_center, circles, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    flare_center: tuple[float, float],
    circles: list[Any],
    **params: Any,
) -> np.ndarray:
    if circles is None:
        circles = []
    return fmain.add_sun_flare(
        img,
        flare_center,
        self.src_radius,
        self.src_color,
        circles,
    )

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    angle = 2 * math.pi * random.uniform(*self.angle_range)

    (flare_center_lower_x, flare_center_lower_y, flare_center_upper_x, flare_center_upper_y) = self.flare_roi

    flare_center_x = random.uniform(flare_center_lower_x, flare_center_upper_x)
    flare_center_y = random.uniform(flare_center_lower_y, flare_center_upper_y)

    flare_center_x = int(width * flare_center_x)
    flare_center_y = int(height * flare_center_y)

    num_circles = random.randint(*self.num_flare_circles_range)

    circles = []

    x = []
    y = []

    def line(t: float) -> tuple[float, float]:
        return (flare_center_x + t * math.cos(angle), flare_center_y + t * math.sin(angle))

    for t_val in range(-flare_center_x, width - flare_center_x, 10):
        rand_x, rand_y = line(t_val)
        x.append(rand_x)
        y.append(rand_y)

    for _ in range(num_circles):
        alpha = random.uniform(0.05, 0.2)
        r = random.randint(0, len(x) - 1)
        rad = random.randint(1, max(height // 100 - 2, 2))

        r_color = random.randint(max(self.src_color[0] - 50, 0), self.src_color[0])
        g_color = random.randint(max(self.src_color[1] - 50, 0), self.src_color[1])
        b_color = random.randint(max(self.src_color[2] - 50, 0), self.src_color[2])

        circles += [
            (
                alpha,
                (int(x[r]), int(y[r])),
                pow(rad, 3),
                (r_color, g_color, b_color),
            ),
        ]

    return {
        "circles": circles,
        "flare_center": (flare_center_x, flare_center_y),
    }
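
The circle sampling in the listing above can be followed more easily in isolation. The sketch below reproduces the same steps with the standard library only; `sample_flare_circles` and its `rng` argument are illustrative names, not part of the library.

```python
import math
import random


def sample_flare_circles(width, height, center, angle, num_circles, src_color, rng):
    """Pick circle centres on a line through `center` (illustrative helper)."""
    cx, cy = center
    # Candidate points: step the line parameter t in increments of 10.
    points = [
        (cx + t * math.cos(angle), cy + t * math.sin(angle))
        for t in range(-cx, width - cx, 10)
    ]
    circles = []
    for _ in range(num_circles):
        alpha = rng.uniform(0.05, 0.2)                    # circle opacity
        x, y = points[rng.randint(0, len(points) - 1)]   # random point on the line
        rad = rng.randint(1, max(height // 100 - 2, 2))  # base radius
        color = tuple(rng.randint(max(c - 50, 0), c) for c in src_color[:3])
        # The base radius is cubed before being stored, as in the listing.
        circles.append((alpha, (int(x), int(y)), rad**3, color))
    return circles


rng = random.Random(0)  # seeded only to make the sketch repeatable
circles = sample_flare_circles(
    width=640, height=480, center=(320, 120), angle=math.pi / 4,
    num_circles=8, src_color=(255, 255, 255), rng=rng,
)
```

Each entry is `(alpha, (x, y), radius, (r, g, b))`, which matches the tuples collected under `"circles"` in the returned parameters.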

`class RandomToneCurve` `(scale=0.1, per_channel=False, always_apply=None, p=0.5)` ¶

Randomly change the relationship between bright and dark areas of the image by manipulating its tone curve.

Parameters:

Name	Type	Description
`scale`	`float`	Standard deviation of the normal distribution used to sample random distances to move two control points that modify the image's curve. Values should be in range [0, 1]. Default: 0.1
`per_channel`	`bool`	If `True`, the tone curve will be applied to each channel of the input image separately, which can lead to color distortion. Default: False.
`p`	`float`	Probability of applying the transform. Default: 0.5

Targets

image

Image types: uint8, float32

Reference

"What Else Can Fool Deep Learning? Addressing Color Constancy Errors on Deep Neural Network Performance" by Mahmoud Afifi and Michael S. Brown, ICCV 2019.
GitHub repository: https://github.com/mahmoudnafifi/WB_color_augmenter

Examples:

Python

>>> import numpy as np
>>> from albumentations import RandomToneCurve
>>> img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = RandomToneCurve(scale=0.1, per_channel=True, p=1.0)
>>> transformed_img = transform(image=img)['image']

This transform applies a random tone curve to the input image by adjusting the relationship between bright and dark areas. When per_channel is set to True, each channel is adjusted separately, potentially causing color distortions. Otherwise, the same adjustment is applied to all channels, preserving the original color relationships.
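
To make the effect of the two control points concrete, here is a small standard-library sketch. It samples `low_y` and `high_y` around 0.25 and 0.75 as the transform does, then builds a lookup table for 8-bit intensities. The curve formula is an assumption (a cubic Bézier-style response running from 0 to 1), since `fmain.move_tone_curve` is not reproduced on this page; `sample_control_points` and `tone_curve_lut` are illustrative names.

```python
import random


def clip01(value):
    """Clamp a number to the closed interval [0, 1]."""
    return min(max(value, 0.0), 1.0)


def sample_control_points(scale, rng):
    """Draw low_y ~ N(0.25, scale) and high_y ~ N(0.75, scale), clipped to [0, 1]."""
    return clip01(rng.gauss(0.25, scale)), clip01(rng.gauss(0.75, scale))


def tone_curve_lut(low_y, high_y):
    """Build a 256-entry lookup table for uint8 intensities.

    Assumption: a cubic Bezier-style response whose end values are 0 and 1
    and whose two inner control values are low_y and high_y.
    """
    lut = []
    for i in range(256):
        t = i / 255
        value = 3 * (1 - t) ** 2 * t * low_y + 3 * (1 - t) * t**2 * high_y + t**3
        lut.append(round(clip01(value) * 255))
    return lut


low_y, high_y = sample_control_points(scale=0.1, rng=random.Random(0))
lut = tone_curve_lut(low_y, high_y)

# With control values 1/3 and 2/3 this particular curve reduces to the identity.
identity = tone_curve_lut(1 / 3, 2 / 3)
```

Under this assumed curve, control values below 1/3 and 2/3 darken midtones and values above them brighten midtones, while pure black and pure white stay fixed.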

Source code in albumentations/augmentations/transforms.py

Python

class RandomToneCurve(ImageOnlyTransform):
    """Randomly change the relationship between bright and dark areas of the image by manipulating its tone curve.

    Args:
        scale (float): Standard deviation of the normal distribution used to sample random distances
            to move two control points that modify the image's curve. Values should be in range [0, 1]. Default: 0.1
        per_channel (bool): If `True`, the tone curve will be applied to each channel of the input image separately,
            which can lead to color distortion. Default: False.
        p (float): Probability of applying the transform. Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        - "What Else Can Fool Deep Learning? Addressing Color Constancy Errors on Deep Neural Network Performance"
          by Mahmoud Afifi and Michael S. Brown, ICCV 2019.
        - GitHub repository: https://github.com/mahmoudnafifi/WB_color_augmenter

    Example:
        >>> import numpy as np
        >>> from albumentations import RandomToneCurve
        >>> img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = RandomToneCurve(scale=0.1, per_channel=True, p=1.0)
        >>> transformed_img = transform(image=img)['image']

    This transform applies a random tone curve to the input image by adjusting the relationship between bright and
    dark areas. When `per_channel` is set to True, each channel is adjusted separately, potentially causing color
    distortions. Otherwise, the same adjustment is applied to all channels, preserving the original color relationships.
    """

    class InitSchema(BaseTransformInitSchema):
        scale: float = Field(
            default=0.1,
            description="Standard deviation of the normal distribution used to sample random distances",
            ge=0,
            le=1,
        )
        per_channel: bool = Field(default=False, description="Apply the tone curve to each channel separately")

    def __init__(
        self,
        scale: float = 0.1,
        per_channel: bool = False,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.scale = scale
        self.per_channel = per_channel

    def apply(
        self,
        img: np.ndarray,
        low_y: float | np.ndarray,
        high_y: float | np.ndarray,
        **params: Any,
    ) -> np.ndarray:
        return fmain.move_tone_curve(img, low_y, high_y)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        image = data["image"] if "image" in data else data["images"][0]
        num_channels = get_num_channels(image)

        if self.per_channel and num_channels != 1:
            return {
                "low_y": np.clip(random_utils.normal(loc=0.25, scale=self.scale, size=[num_channels]), 0, 1),
                "high_y": np.clip(random_utils.normal(loc=0.75, scale=self.scale, size=[num_channels]), 0, 1),
            }
        # Same values for all channels
        low_y = np.clip(random_utils.normal(loc=0.25, scale=self.scale), 0, 1)
        high_y = np.clip(random_utils.normal(loc=0.75, scale=self.scale), 0, 1)

        return {"low_y": low_y, "high_y": high_y}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "scale", "per_channel"

`apply (self, img, low_y, high_y, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    low_y: float | np.ndarray,
    high_y: float | np.ndarray,
    **params: Any,
) -> np.ndarray:
    return fmain.move_tone_curve(img, low_y, high_y)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    image = data["image"] if "image" in data else data["images"][0]
    num_channels = get_num_channels(image)

    if self.per_channel and num_channels != 1:
        return {
            "low_y": np.clip(random_utils.normal(loc=0.25, scale=self.scale, size=[num_channels]), 0, 1),
            "high_y": np.clip(random_utils.normal(loc=0.75, scale=self.scale, size=[num_channels]), 0, 1),
        }
    # Same values for all channels
    low_y = np.clip(random_utils.normal(loc=0.25, scale=self.scale), 0, 1)
    high_y = np.clip(random_utils.normal(loc=0.75, scale=self.scale), 0, 1)

    return {"low_y": low_y, "high_y": high_y}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "scale", "per_channel"

`class RingingOvershoot` `(blur_limit=(7, 15), cutoff=(0.7853981633974483, 1.5707963267948966), always_apply=None, p=0.5)` ¶

Create ringing or overshoot artefacts by convolving the image with a 2D sinc filter.

Parameters:

Name	Type	Description
`blur_limit`	`ScaleIntType`	Range from which the kernel size of the sinc filter is sampled; the sampled size must be odd. Should be in range [3, inf). Default: (7, 15).
`cutoff`	`ScaleFloatType`	Range from which the cutoff frequency (in radians) is sampled. Should be in range (0, np.pi). Default: (np.pi / 4, np.pi / 2).
`p`	`float`	Probability of applying the transform. Default: 0.5.

Reference

dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter
https://arxiv.org/abs/2107.10833

Targets

image

Source code in albumentations/augmentations/transforms.py

Python

class RingingOvershoot(ImageOnlyTransform):
    """Create ringing or overshoot artefacts by convolving image with 2D sinc filter.

    Args:
        blur_limit: range from which the kernel size of the sinc filter is sampled (the sampled size must be odd).
            Should be in range [3, inf). Default: (7, 15).
        cutoff: range from which the cutoff frequency (in radians) is sampled.
            Should be in range (0, np.pi).
            Default: (np.pi / 4, np.pi / 2).
        p: probability of applying the transform. Default: 0.5.

    Reference:
        dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter
        https://arxiv.org/abs/2107.10833

    Targets:
        image

    """

    class InitSchema(BlurInitSchema):
        blur_limit: ScaleIntType = Field(default=(7, 15), description="Maximum kernel size for sinc filter.")
        cutoff: ScaleFloatType = Field(default=(np.pi / 4, np.pi / 2), description="Cutoff frequency range in radians.")

        @field_validator("cutoff")
        @classmethod
        def check_cutoff(cls, v: ScaleFloatType, info: ValidationInfo) -> tuple[float, float]:
            bounds = 0, np.pi
            result = to_tuple(v, v)
            check_range(result, *bounds, info.field_name)
            return result

    def __init__(
        self,
        blur_limit: ScaleIntType = (7, 15),
        cutoff: ScaleFloatType = (np.pi / 4, np.pi / 2),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.blur_limit = cast(Tuple[int, int], blur_limit)
        self.cutoff = cast(Tuple[float, float], cutoff)

    def get_params(self) -> dict[str, np.ndarray]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
        if ksize % 2 == 0:
            raise ValueError(f"Kernel size must be odd. Got: {ksize}")

        cutoff = random.uniform(*self.cutoff)

        # From dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter
        with np.errstate(divide="ignore", invalid="ignore"):
            kernel = np.fromfunction(
                lambda x, y: cutoff
                * special.j1(cutoff * np.sqrt((x - (ksize - 1) / 2) ** 2 + (y - (ksize - 1) / 2) ** 2))
                / (2 * np.pi * np.sqrt((x - (ksize - 1) / 2) ** 2 + (y - (ksize - 1) / 2) ** 2)),
                [ksize, ksize],
            )
        kernel[(ksize - 1) // 2, (ksize - 1) // 2] = cutoff**2 / (4 * np.pi)

        # Normalize kernel
        kernel = kernel.astype(np.float32) / np.sum(kernel)

        return {"kernel": kernel}

    def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
        return fmain.convolve(img, kernel)

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("blur_limit", "cutoff")

`apply (self, img, kernel, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
    return fmain.convolve(img, kernel)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, np.ndarray]:
    ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
    if ksize % 2 == 0:
        raise ValueError(f"Kernel size must be odd. Got: {ksize}")

    cutoff = random.uniform(*self.cutoff)

    # From dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter
    with np.errstate(divide="ignore", invalid="ignore"):
        kernel = np.fromfunction(
            lambda x, y: cutoff
            * special.j1(cutoff * np.sqrt((x - (ksize - 1) / 2) ** 2 + (y - (ksize - 1) / 2) ** 2))
            / (2 * np.pi * np.sqrt((x - (ksize - 1) / 2) ** 2 + (y - (ksize - 1) / 2) ** 2)),
            [ksize, ksize],
        )
    kernel[(ksize - 1) // 2, (ksize - 1) // 2] = cutoff**2 / (4 * np.pi)

    # Normalize kernel
    kernel = kernel.astype(np.float32) / np.sum(kernel)

    return {"kernel": kernel}
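
The kernel construction in `get_params` can be reproduced without NumPy or SciPy. The sketch below is a standard-library approximation: `bessel_j1` stands in for `scipy.special.j1` using the integral representation of J1 with a trapezoidal rule, and `sinc_kernel` is an illustrative name, not a library function.

```python
import math


def bessel_j1(x, steps=200):
    """Approximate J1(x) = (1/pi) * integral_0^pi cos(tau - x*sin(tau)) dtau.

    Trapezoidal rule; a stand-in for scipy.special.j1 in this sketch.
    """
    h = math.pi / steps
    total = 0.5 * (math.cos(0.0) + math.cos(math.pi))  # endpoint terms cancel
    for k in range(1, steps):
        tau = k * h
        total += math.cos(tau - x * math.sin(tau))
    return total * h / math.pi


def sinc_kernel(ksize, cutoff):
    """Normalized 2D circularly symmetric low-pass kernel of odd size `ksize`."""
    if ksize % 2 == 0:
        raise ValueError(f"Kernel size must be odd. Got: {ksize}")
    c = (ksize - 1) / 2
    kernel = []
    for i in range(ksize):
        row = []
        for j in range(ksize):
            r = math.hypot(i - c, j - c)  # distance from the kernel centre
            if r == 0:
                row.append(cutoff**2 / (4 * math.pi))  # limit of the formula as r -> 0
            else:
                row.append(cutoff * bessel_j1(cutoff * r) / (2 * math.pi * r))
        kernel.append(row)
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


kernel = sinc_kernel(ksize=7, cutoff=math.pi / 2)
```

The negative coefficients away from the centre are what produce the ringing and overshoot near edges once the image is convolved with this kernel.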

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("blur_limit", "cutoff")

`class Sharpen` `(alpha=(0.2, 0.5), lightness=(0.5, 1.0), always_apply=None, p=0.5)` ¶

Sharpen the input image and overlay the result on the original image.

Parameters:

Name	Type	Description
`alpha`	`tuple[float, float]`	Range from which the visibility of the sharpened image is chosen. At 0, only the original image is visible; at 1.0, only its sharpened version is visible. Default: (0.2, 0.5).
`lightness`	`tuple[float, float]`	range to choose the lightness of the sharpened image. Default: (0.5, 1.0).
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image

Source code in albumentations/augmentations/transforms.py

Python

class Sharpen(ImageOnlyTransform):
    """Sharpen the input image and overlay the result on the original image.

    Args:
        alpha: range to choose the visibility of the sharpened image. At 0, only the original image is
            visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5).
        lightness: range to choose the lightness of the sharpened image. Default: (0.5, 1.0).
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    """

    class InitSchema(BaseTransformInitSchema):
        alpha: ZeroOneRangeType = (0.2, 0.5)
        lightness: NonNegativeFloatRangeType = (0.5, 1.0)

    def __init__(
        self,
        alpha: tuple[float, float] = (0.2, 0.5),
        lightness: tuple[float, float] = (0.5, 1.0),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.alpha = alpha
        self.lightness = lightness

    @staticmethod
    def __generate_sharpening_matrix(alpha_sample: np.ndarray, lightness_sample: np.ndarray) -> np.ndarray:
        matrix_nochange = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
        matrix_effect = np.array(
            [[-1, -1, -1], [-1, 8 + lightness_sample, -1], [-1, -1, -1]],
            dtype=np.float32,
        )

        return (1 - alpha_sample) * matrix_nochange + alpha_sample * matrix_effect

    def get_params(self) -> dict[str, np.ndarray]:
        alpha = random.uniform(*self.alpha)
        lightness = random.uniform(*self.lightness)
        sharpening_matrix = self.__generate_sharpening_matrix(alpha_sample=alpha, lightness_sample=lightness)
        return {"sharpening_matrix": sharpening_matrix}

    def apply(self, img: np.ndarray, sharpening_matrix: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.convolve(img, sharpening_matrix)

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("alpha", "lightness")

`apply (self, img, sharpening_matrix, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, sharpening_matrix: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.convolve(img, sharpening_matrix)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, np.ndarray]:
    alpha = random.uniform(*self.alpha)
    lightness = random.uniform(*self.lightness)
    sharpening_matrix = self.__generate_sharpening_matrix(alpha_sample=alpha, lightness_sample=lightness)
    return {"sharpening_matrix": sharpening_matrix}
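
The sharpening matrix is a linear blend of an identity kernel and a 3x3 sharpening kernel. The plain-Python sketch below mirrors `__generate_sharpening_matrix` with nested lists instead of NumPy arrays; `sharpening_matrix` is an illustrative name.

```python
def sharpening_matrix(alpha, lightness):
    """Blend the identity kernel with a 3x3 sharpening kernel."""
    no_change = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    effect = [[-1, -1, -1], [-1, 8 + lightness, -1], [-1, -1, -1]]
    return [
        [(1 - alpha) * n + alpha * e for n, e in zip(n_row, e_row)]
        for n_row, e_row in zip(no_change, effect)
    ]


matrix = sharpening_matrix(alpha=0.5, lightness=1.0)
for row in matrix:
    print(row)
# [-0.5, -0.5, -0.5]
# [-0.5, 5.0, -0.5]
# [-0.5, -0.5, -0.5]
```

The coefficients sum to `(1 - alpha) + alpha * lightness`, so `lightness=1.0` keeps the overall brightness unchanged and smaller values darken the sharpened contribution.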

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("alpha", "lightness")

`class Solarize` `(threshold=(128, 128), p=0.5, always_apply=None)` ¶

Invert all pixel values above a threshold.

Parameters:

Name	Type	Description
`threshold`	`ScaleType`	Range for the solarizing threshold. If threshold is a single value, the range will be [1, threshold]. Default: (128, 128).
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py

Python

class Solarize(ImageOnlyTransform):
    """Invert all pixel values above a threshold.

    Args:
        threshold: range for solarizing threshold.
            If threshold is a single value, the range will be [1, threshold]. Default: (128, 128).
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BaseTransformInitSchema):
        threshold: OnePlusFloatRangeType = (128, 128)

    def __init__(self, threshold: ScaleType = (128, 128), p: float = 0.5, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.threshold = cast(Tuple[float, float], threshold)

    def apply(self, img: np.ndarray, threshold: int, **params: Any) -> np.ndarray:
        return fmain.solarize(img, threshold)

    def get_params(self) -> dict[str, float]:
        return {"threshold": random.uniform(self.threshold[0], self.threshold[1])}

    def get_transform_init_args_names(self) -> tuple[str]:
        return ("threshold",)

`apply (self, img, threshold, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, threshold: int, **params: Any) -> np.ndarray:
    return fmain.solarize(img, threshold)
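
`fmain.solarize` is not reproduced on this page, so the following is only a sketch of the idea for 8-bit data. It assumes a maximum value of 255 and an inclusive comparison against the threshold; both are assumptions, and the local `solarize` function is illustrative.

```python
def solarize(pixels, threshold, max_val=255):
    """Invert every value at or above `threshold`; leave the rest untouched.

    Assumptions: uint8-style data (max_val=255) and an inclusive comparison.
    """
    return [[max_val - v if v >= threshold else v for v in row] for row in pixels]


image = [[0, 100, 127], [128, 200, 255]]
print(solarize(image, threshold=128))
# [[0, 100, 127], [127, 55, 0]]
```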

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, float]:
    return {"threshold": random.uniform(self.threshold[0], self.threshold[1])}

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str]:
    return ("threshold",)

`class Spatter` `(mean=(0.65, 0.65), std=(0.3, 0.3), gauss_sigma=(2, 2), cutout_threshold=(0.68, 0.68), intensity=(0.6, 0.6), mode='rain', color=None, always_apply=None, p=0.5)` ¶

Apply spatter transform. It simulates corruption which can occlude a lens in the form of rain or mud.

Parameters:

Name	Type	Description
`mean`	`float, or tuple of floats`	Mean value of the normal distribution used to generate the liquid layer. If a single float, the mean is sampled from `(0, mean)`. If a tuple of floats, the mean is sampled from the range `(mean[0], mean[1])`. For a constant value, use `(mean, mean)`. Default: (0.65, 0.65).
`std`	`float, or tuple of floats`	Standard deviation of the normal distribution used to generate the liquid layer. If a single float, the value is sampled from `(0, std)`. If a tuple of floats, the value is sampled from the range `(std[0], std[1])`. For a constant value, use `(std, std)`. Default: (0.3, 0.3).
`gauss_sigma`	`float, or tuple of floats`	Sigma value for Gaussian filtering of the liquid layer. If a single float, the value is sampled from `(0, gauss_sigma)`. If a tuple of floats, the value is sampled from the range `(gauss_sigma[0], gauss_sigma[1])`. For a constant value, use `(gauss_sigma, gauss_sigma)`. Default: (2, 2).
`cutout_threshold`	`float, or tuple of floats`	Threshold for filtering the liquid layer (determines the number of drops). If a single float, the value is sampled from `(0, cutout_threshold)`. If a tuple of floats, the value is sampled from the range `(cutout_threshold[0], cutout_threshold[1])`. For a constant value, use `(cutout_threshold, cutout_threshold)`. Default: (0.68, 0.68).
`intensity`	`float, or tuple of floats`	Intensity of corruption. If a single float, the value is sampled from `(0, intensity)`. If a tuple of floats, the value is sampled from the range `(intensity[0], intensity[1])`. For a constant value, use `(intensity, intensity)`. Default: (0.6, 0.6).
`mode`	`string, or list of strings`	Type of corruption. Currently supported options are 'rain' and 'mud'. If a list is provided, the type of corruption is sampled from the list. Default: "rain".
`color`	`list of (r, g, b) or dict or None`	Color of the corruption elements. If a list, it is used as the color for the single specified mode. If a dict, it must provide a color for each specified mode. If None, the default colors are used (rain: (238, 238, 175), mud: (20, 42, 63)).
`p`	`float`	probability of applying the transform. Default: 0.5.
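
The interaction between `mode` and `color` described in the table can be sketched as a small standalone function. `resolve_spatter_colors` is a hypothetical helper modelled on the `check_color` validator shown in the source listing; unlike that validator, it returns only the entries for the requested modes when a dict is given.

```python
DEFAULT_COLORS = {"rain": [238, 238, 175], "mud": [20, 42, 63]}


def resolve_spatter_colors(mode, color=None):
    """Return a mode -> RGB mapping implied by `mode` and `color` (illustrative)."""
    modes = [mode] if isinstance(mode, str) else list(mode)
    if color is None:
        return dict(DEFAULT_COLORS)
    if isinstance(color, (list, tuple)) and len(modes) == 1:
        # A bare RGB sequence is only accepted when exactly one mode is given.
        if len(color) != 3:
            raise ValueError("Color must be a list of three integers for RGB format.")
        return {modes[0]: color}
    if isinstance(color, dict):
        for m in modes:
            if m not in color:
                raise ValueError(f"Color for mode {m} is not specified.")
            if len(color[m]) != 3:
                raise ValueError(f"Color for mode {m} must be in RGB format.")
        # Sketch-only choice: keep just the requested modes.
        return {m: color[m] for m in modes}
    raise ValueError("Color must be a list of RGB values or a dict mapping mode to RGB values.")


print(resolve_spatter_colors("mud", [90, 60, 30]))
# {'mud': [90, 60, 30]}
```

Passing a bare RGB list together with several modes falls through to the final `ValueError`, which is why a dict is needed when more than one mode is requested.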

Targets

image

Image types: uint8, float32

Reference

https://arxiv.org/abs/1903.12261
https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

Source code in albumentations/augmentations/transforms.py

Python

class Spatter(ImageOnlyTransform):
    """Apply spatter transform. It simulates corruption which can occlude a lens in the form of rain or mud.

    Args:
        mean (float, or tuple of floats): Mean value of normal distribution for generating liquid layer.
            If a single float, the mean is sampled from `(0, mean)`.
            If a tuple of floats, the mean is sampled from range `(mean[0], mean[1])`.
            For a constant value, use `(mean, mean)`.
            Default: (0.65, 0.65).
        std (float, or tuple of floats): Standard deviation value of normal distribution for generating liquid layer.
            If a single float, the value is sampled from `(0, std)`.
            If a tuple of floats, the value is sampled from range `(std[0], std[1])`.
            For a constant value, use `(std, std)`.
            Default: (0.3, 0.3).
        gauss_sigma (float, or tuple of floats): Sigma value for Gaussian filtering of liquid layer.
            If a single float, the value is sampled from `(0, gauss_sigma)`.
            If a tuple of floats, the value is sampled from range `(gauss_sigma[0], gauss_sigma[1])`.
            For a constant value, use `(gauss_sigma, gauss_sigma)`.
            Default: (2, 2).
        cutout_threshold (float, or tuple of floats): Threshold for filtering liquid layer
            (determines number of drops).
            If a single float, the value is sampled from `(0, cutout_threshold)`.
            If a tuple of floats, the value is sampled from range `(cutout_threshold[0], cutout_threshold[1])`.
            For a constant value, use `(cutout_threshold, cutout_threshold)`.
            Default: (0.68, 0.68).
        intensity (float, or tuple of floats): Intensity of corruption.
            If a single float, the value is sampled from `(0, intensity)`.
            If a tuple of floats, the value is sampled from range `(intensity[0], intensity[1])`.
            For a constant value, use `(intensity, intensity)`.
            Default: (0.6, 0.6).
        mode (string, or list of strings): Type of corruption. Currently supported options are 'rain' and 'mud'.
            If a list is provided, the type of corruption is sampled from the list. Default: "rain".
        color (list of (r, g, b) or dict or None): Corruption elements color.
            If a list, it is used as the color for the single specified mode.
            If a dict, it must provide a color for each specified mode.
            If None, the default colors are used (rain: (238, 238, 175), mud: (20, 42, 63)).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://arxiv.org/abs/1903.12261
        https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

    """

    class InitSchema(BaseTransformInitSchema):
        mean: ZeroOneRangeType = (0.65, 0.65)
        std: ZeroOneRangeType = (0.3, 0.3)
        gauss_sigma: NonNegativeFloatRangeType = (2, 2)
        cutout_threshold: ZeroOneRangeType = (0.68, 0.68)
        intensity: ZeroOneRangeType = (0.6, 0.6)
        mode: SpatterMode | Sequence[SpatterMode] = Field(
            default="rain",
            description="Type of corruption ('rain', 'mud').",
        )
        color: Sequence[int] | dict[str, Sequence[int]] | None = None

        @field_validator("mode")
        @classmethod
        def check_mode(cls, mode: SpatterMode | Sequence[SpatterMode]) -> Sequence[SpatterMode]:
            if isinstance(mode, str):
                return [mode]
            return mode

        @model_validator(mode="after")
        def check_color(self) -> Self:
            if self.color is None:
                self.color = {"rain": [238, 238, 175], "mud": [20, 42, 63]}

            elif isinstance(self.color, (list, tuple)) and len(self.mode) == 1:
                if len(self.color) != NUM_RGB_CHANNELS:
                    msg = "Color must be a list of three integers for RGB format."
                    raise ValueError(msg)
                self.color = {self.mode[0]: self.color}
            elif isinstance(self.color, dict):
                result = {}
                for mode in self.mode:
                    if mode not in self.color:
                        raise ValueError(f"Color for mode {mode} is not specified.")
                    if len(self.color[mode]) != NUM_RGB_CHANNELS:
                        raise ValueError(f"Color for mode {mode} must be in RGB format.")
                    result[mode] = self.color[mode]
            else:
                msg = "Color must be a list of RGB values or a dict mapping mode to RGB values."
                raise ValueError(msg)
            return self

    def __init__(
        self,
        mean: ScaleFloatType = (0.65, 0.65),
        std: ScaleFloatType = (0.3, 0.3),
        gauss_sigma: ScaleFloatType = (2, 2),
        cutout_threshold: ScaleFloatType = (0.68, 0.68),
        intensity: ScaleFloatType = (0.6, 0.6),
        mode: SpatterMode | Sequence[SpatterMode] = "rain",
        color: Sequence[int] | dict[str, Sequence[int]] | None = None,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.mean = cast(Tuple[float, float], mean)
        self.std = cast(Tuple[float, float], std)
        self.gauss_sigma = cast(Tuple[float, float], gauss_sigma)
        self.cutout_threshold = cast(Tuple[float, float], cutout_threshold)
        self.intensity = cast(Tuple[float, float], intensity)
        self.mode = mode
        self.color = cast(Dict[str, Sequence[int]], color)

    def apply(
        self,
        img: np.ndarray,
        non_mud: np.ndarray,
        mud: np.ndarray,
        drops: np.ndarray,
        mode: SpatterMode,
        **params: dict[str, Any],
    ) -> np.ndarray:
        return fmain.spatter(img, non_mud, mud, drops, mode)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        mean = random.uniform(self.mean[0], self.mean[1])
        std = random.uniform(self.std[0], self.std[1])
        cutout_threshold = random.uniform(self.cutout_threshold[0], self.cutout_threshold[1])
        sigma = random.uniform(self.gauss_sigma[0], self.gauss_sigma[1])
        mode = random.choice(self.mode)
        intensity = random.uniform(self.intensity[0], self.intensity[1])
        color = np.array(self.color[mode]) / 255.0

        liquid_layer = random_utils.normal(size=(height, width), loc=mean, scale=std)
        liquid_layer = gaussian_filter(liquid_layer, sigma=sigma, mode="nearest")
        liquid_layer[liquid_layer < cutout_threshold] = 0

        if mode == "rain":
            liquid_layer = clip(liquid_layer * 255, np.uint8)
            dist = 255 - cv2.Canny(liquid_layer, 50, 150)
            dist = cv2.distanceTransform(dist, cv2.DIST_L2, 5)
            _, dist = cv2.threshold(dist, 20, 20, cv2.THRESH_TRUNC)
            dist = clip(blur(dist, 3), np.uint8)
            dist = fmain.equalize(dist)

            ker = np.array([[-2, -1, 0], [-1, 1, 1], [0, 1, 2]])
            dist = fmain.convolve(dist, ker)
            dist = blur(dist, 3).astype(np.float32)

            m = liquid_layer * dist
            m *= 1 / np.max(m, axis=(0, 1))

            drops = m[:, :, None] * color * intensity
            mud = None
            non_mud = None
        else:
            m = np.where(liquid_layer > cutout_threshold, 1, 0)
            m = gaussian_filter(m.astype(np.float32), sigma=sigma, mode="nearest")
            m[m < 1.2 * cutout_threshold] = 0
            m = m[..., np.newaxis]

            mud = m * color
            non_mud = 1 - m
            drops = None

        return {
            "non_mud": non_mud,
            "mud": mud,
            "drops": drops,
            "mode": mode,
        }

    def get_transform_init_args_names(self) -> tuple[str, str, str, str, str, str, str]:
        return "mean", "std", "gauss_sigma", "intensity", "cutout_threshold", "mode", "color"
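
The validators in `InitSchema` accept several shapes for `mode` and `color`. As a rough standalone sketch of those branches (plain Python, not the library's API; `normalize_mode`, `normalize_color`, `DEFAULT_COLORS`, and the value `3` for `NUM_RGB_CHANNELS` are names and assumptions introduced here for illustration):

```python
from typing import Dict, List, Optional, Sequence, Union

NUM_RGB_CHANNELS = 3  # assumed value of the constant referenced in the listing
DEFAULT_COLORS = {"rain": [238, 238, 175], "mud": [20, 42, 63]}

ColorArg = Optional[Union[Sequence[int], Dict[str, Sequence[int]]]]


def normalize_mode(mode: Union[str, Sequence[str]]) -> List[str]:
    """Wrap a single mode name in a list, mirroring check_mode."""
    return [mode] if isinstance(mode, str) else list(mode)


def normalize_color(modes: Sequence[str], color: ColorArg) -> Dict[str, Sequence[int]]:
    """Return a mode -> RGB mapping, following the branches of check_color."""
    if color is None:
        # No color given: fall back to the default color of each mode.
        return dict(DEFAULT_COLORS)
    if isinstance(color, (list, tuple)) and len(modes) == 1:
        # A bare RGB triple is only accepted when exactly one mode is configured.
        if len(color) != NUM_RGB_CHANNELS:
            raise ValueError("Color must be a list of three integers for RGB format.")
        return {modes[0]: color}
    if isinstance(color, dict):
        # Every configured mode needs its own RGB triple.
        for mode in modes:
            if mode not in color:
                raise ValueError(f"Color for mode {mode} is not specified.")
            if len(color[mode]) != NUM_RGB_CHANNELS:
                raise ValueError(f"Color for mode {mode} must be in RGB format.")
        return color
    raise ValueError("Color must be a list of RGB values or a dict mapping mode to RGB values.")
```

Under these assumptions, a bare RGB list combined with two modes falls through to the final `ValueError`, so a dict is the only way to set colors when both 'rain' and 'mud' are enabled.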

`apply (self, img, non_mud, mud, drops, mode, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    non_mud: np.ndarray,
    mud: np.ndarray,
    drops: np.ndarray,
    mode: SpatterMode,
    **params: dict[str, Any],
) -> np.ndarray:
    return fmain.spatter(img, non_mud, mud, drops, mode)

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    mean = random.uniform(self.mean[0], self.mean[1])
    std = random.uniform(self.std[0], self.std[1])
    cutout_threshold = random.uniform(self.cutout_threshold[0], self.cutout_threshold[1])
    sigma = random.uniform(self.gauss_sigma[0], self.gauss_sigma[1])
    mode = random.choice(self.mode)
    intensity = random.uniform(self.intensity[0], self.intensity[1])
    color = np.array(self.color[mode]) / 255.0

    liquid_layer = random_utils.normal(size=(height, width), loc=mean, scale=std)
    liquid_layer = gaussian_filter(liquid_layer, sigma=sigma, mode="nearest")
    liquid_layer[liquid_layer < cutout_threshold] = 0

    if mode == "rain":
        liquid_layer = clip(liquid_layer * 255, np.uint8)
        dist = 255 - cv2.Canny(liquid_layer, 50, 150)
        dist = cv2.distanceTransform(dist, cv2.DIST_L2, 5)
        _, dist = cv2.threshold(dist, 20, 20, cv2.THRESH_TRUNC)
        dist = clip(blur(dist, 3), np.uint8)
        dist = fmain.equalize(dist)

        ker = np.array([[-2, -1, 0], [-1, 1, 1], [0, 1, 2]])
        dist = fmain.convolve(dist, ker)
        dist = blur(dist, 3).astype(np.float32)

        m = liquid_layer * dist
        m *= 1 / np.max(m, axis=(0, 1))

        drops = m[:, :, None] * color * intensity
        mud = None
        non_mud = None
    else:
        m = np.where(liquid_layer > cutout_threshold, 1, 0)
        m = gaussian_filter(m.astype(np.float32), sigma=sigma, mode="nearest")
        m[m < 1.2 * cutout_threshold] = 0
        m = m[..., np.newaxis]

        mud = m * color
        non_mud = 1 - m
        drops = None

    return {
        "non_mud": non_mud,
        "mud": mud,
        "drops": drops,
        "mode": mode,
    }

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str, str, str, str, str, str]:
    return "mean", "std", "gauss_sigma", "intensity", "cutout_threshold", "mode", "color"

`class Superpixels` `(p_replace=(0, 0.1), n_segments=(100, 100), max_size=128, interpolation=1, always_apply=None, p=0.5)` [view source on GitHub] ¶

Transform images partially/completely to their superpixel representation. This implementation uses skimage's version of the SLIC algorithm.

Parameters:

Name	Type	Description
`p_replace`	`float or tuple of float`	Defines for any segment the probability that the pixels within that segment are replaced by their average color (otherwise, the pixels are not changed). A probability of 0.0 means that no segment is replaced (the image is not changed at all); 0.5 means that around half of all segments are replaced; 1.0 means that all segments are replaced (resulting in a Voronoi image). If a `float`, then that `float` will always be used. If a `tuple` `(a, b)`, then a random probability will be sampled from the interval `[a, b]` per image.
`n_segments`	`tuple of int`	Rough target number of superpixels to generate (the algorithm may deviate from this number). Lower values lead to coarser superpixels. Higher values are computationally more intensive and will hence lead to a slowdown. If a tuple `(a, b)`, then a value from the discrete interval `[a..b]` will be sampled per image. If input is a single integer, the range will be `(1, n_segments)`. If interested in a fixed number of segments, use `(n_segments, n_segments)`.
`max_size`	`int or None`	Maximum image size at which the augmentation is performed. If the width or height of an image exceeds this value, it will be downscaled before the augmentation so that the longest side matches `max_size`. This is done to speed up the process. The final output image has the same size as the input image. Note that if `p_replace` is below 1.0, the down-/upscaling will affect the not-replaced pixels too. Use `None` to apply no down-/upscaling.
`interpolation`	`OpenCV flag`	Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
`p`	`float`	probability of applying the transform. Default: 0.5.
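
To make the `max_size` rule concrete, here is a minimal sketch of how the working resolution could be derived. `working_size` is a hypothetical helper, and truncating with `int()` is an assumption; the library may round differently.

```python
from typing import Optional, Tuple


def working_size(height: int, width: int, max_size: Optional[int]) -> Tuple[int, int]:
    """Size at which segmentation would run; the result is resized back afterwards."""
    if max_size is None:
        return height, width  # no down-/upscaling requested
    longest = max(height, width)
    if longest <= max_size:
        return height, width  # image already small enough
    scale = max_size / longest
    # Assumption: truncate to integers; the real rounding rule is not shown in this reference.
    return int(height * scale), int(width * scale)


print(working_size(512, 256, 128))
print(working_size(100, 80, 128))
```

With the default `max_size=128`, a 512x256 image would be segmented at a quarter of its resolution, which is why pixels that are not replaced can still look slightly softened after the round trip.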

Targets

image

Source code in albumentations/augmentations/transforms.py

Python

class Superpixels(ImageOnlyTransform):
    """Transform images partially/completely to their superpixel representation.
    This implementation uses skimage's version of the SLIC algorithm.

    Args:
        p_replace (float or tuple of float): Defines for any segment the probability that the pixels within that
            segment are replaced by their average color (otherwise, the pixels are not changed).

    Examples:
                * A probability of ``0.0`` would mean, that the pixels in no
                  segment are replaced by their average color (image is not
                  changed at all).
                * A probability of ``0.5`` would mean, that around half of all
                  segments are replaced by their average color.
                * A probability of ``1.0`` would mean, that all segments are
                  replaced by their average color (resulting in a Voronoi
                  image).
            Behavior based on chosen data types for this parameter:
                * If a ``float``, then that ``float`` will always be used.
                * If ``tuple`` ``(a, b)``, then a random probability will be
                  sampled from the interval ``[a, b]`` per image.
        n_segments (tuple of int): Rough target number of how many superpixels to generate (the algorithm
            may deviate from this number). Lower value will lead to coarser superpixels.
            Higher values are computationally more intensive and will hence lead to a slowdown.
            If a tuple ``(a, b)``, then a value from the discrete interval ``[a..b]`` will be sampled per image.
            If input is a single integer, the range will be ``(1, n_segments)``.
            If interested in a fixed number of segments, use ``(n_segments, n_segments)``.
        max_size (int or None): Maximum image size at which the augmentation is performed.
            If the width or height of an image exceeds this value, it will be
            downscaled before the augmentation so that the longest side matches `max_size`.
            This is done to speed up the process. The final output image has the same size as the input image.
            Note that in case `p_replace` is below ``1.0``,
            the down-/upscaling will affect the not-replaced pixels too.
            Use ``None`` to apply no down-/upscaling.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    """

    class InitSchema(BaseTransformInitSchema):
        p_replace: ZeroOneRangeType = (0, 0.1)
        n_segments: OnePlusIntRangeType = (100, 100)
        max_size: int | None = Field(default=128, ge=1, description="Maximum image size for the transformation.")
        interpolation: InterpolationType = cv2.INTER_LINEAR

    def __init__(
        self,
        p_replace: ScaleFloatType = (0, 0.1),
        n_segments: ScaleIntType = (100, 100),
        max_size: int | None = 128,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.p_replace = cast(Tuple[float, float], p_replace)
        self.n_segments = cast(Tuple[int, int], n_segments)
        self.max_size = max_size
        self.interpolation = interpolation

    def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
        return ("p_replace", "n_segments", "max_size", "interpolation")

    def get_params(self) -> dict[str, Any]:
        n_segments = random.randint(self.n_segments[0], self.n_segments[1])
        p = random.uniform(*self.p_replace)
        return {"replace_samples": random_utils.random(n_segments) < p, "n_segments": n_segments}

    def apply(
        self,
        img: np.ndarray,
        replace_samples: Sequence[bool],
        n_segments: int,
        **kwargs: Any,
    ) -> np.ndarray:
        return fmain.superpixels(img, n_segments, replace_samples, self.max_size, self.interpolation)

`apply (self, img, replace_samples, n_segments, **kwargs)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    replace_samples: Sequence[bool],
    n_segments: int,
    **kwargs: Any,
) -> np.ndarray:
    return fmain.superpixels(img, n_segments, replace_samples, self.max_size, self.interpolation)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, Any]:
    n_segments = random.randint(self.n_segments[0], self.n_segments[1])
    p = random.uniform(*self.p_replace)
    return {"replace_samples": random_utils.random(n_segments) < p, "n_segments": n_segments}
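
The sampling in `get_params` can be mimicked with the standard library alone. This is only a sketch: `sample_superpixel_params` is a hypothetical name, and a list comprehension stands in for the array returned by `random_utils.random`.

```python
import random
from typing import List, Tuple


def sample_superpixel_params(
    p_replace: Tuple[float, float],
    n_segments: Tuple[int, int],
) -> Tuple[int, List[bool]]:
    """Draw a segment count and one replace/keep flag per segment."""
    n = random.randint(n_segments[0], n_segments[1])  # inclusive on both ends
    p = random.uniform(*p_replace)  # replacement probability for this image
    # Each segment is replaced independently with probability p.
    replace_samples = [random.random() < p for _ in range(n)]
    return n, replace_samples


n, flags = sample_superpixel_params(p_replace=(0.0, 0.1), n_segments=(100, 100))
print(n, sum(flags))
```

Because the flags are drawn per segment, `p_replace` controls the expected fraction of replaced segments rather than an exact count.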

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
    return ("p_replace", "n_segments", "max_size", "interpolation")

`class TemplateTransform` `(templates, img_weight=(0.5, 0.5), template_weight=(0.5, 0.5), template_transform=None, name=None, always_apply=None, p=0.5)` [view source on GitHub] ¶

Blend the input image with one of the specified templates.

Parameters:

Name	Type	Description
`templates`	`numpy array or list of numpy arrays`	Images as template for transform.
`img_weight`	`ScaleFloatType`	If a single float, the weight is sampled from (0, img_weight). If a tuple of floats, the weight is sampled from the range `[img_weight[0], img_weight[1])`. For a fixed weight, use (img_weight, img_weight). Default: (0.5, 0.5).
`template_weight`	`ScaleFloatType`	If a single float, the weight is sampled from (0, template_weight). If a tuple of floats, the weight is sampled from the range `[template_weight[0], template_weight[1])`. For a fixed weight, use (template_weight, template_weight). Default: (0.5, 0.5).
`template_transform`	`Callable[..., Any] \| None`	Transformation object that can be applied to the template; it must produce a template of the same size as the input image.
`name`	`str \| None`	(Optional) Name of transform, used only for deserialization.
`p`	`float`	probability of applying the transform. Default: 0.5.
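
The single-float versus tuple convention for the weights can be sketched as follows. The helpers `to_weight_range` and `sample_weight` are hypothetical; the sketch assumes the library expands a scalar `w` to the range `(0, w)` before sampling, as the descriptions above state.

```python
import random
from typing import Tuple, Union

WeightArg = Union[float, Tuple[float, float]]


def to_weight_range(value: WeightArg) -> Tuple[float, float]:
    """Expand a scalar weight to (0, value); pass a (low, high) pair through."""
    if isinstance(value, (int, float)):
        return (0.0, float(value))
    low, high = value
    return (float(low), float(high))


def sample_weight(value: WeightArg) -> float:
    """Draw one weight uniformly from the normalized range."""
    low, high = to_weight_range(value)
    return random.uniform(low, high)


print(to_weight_range(0.7))
print(sample_weight((0.5, 0.5)))
```

Passing `(0.5, 0.5)` therefore pins the weight, while passing `0.5` lets it vary between 0 and 0.5 on each call.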

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py

Python

class TemplateTransform(ImageOnlyTransform):
    """Apply blending of input image with specified templates
    Args:
        templates (numpy array or list of numpy arrays): Images as template for transform.
        img_weight: If single float weight will be sampled from (0, img_weight).
            If tuple of float img_weight will be in range `[img_weight[0], img_weight[1])`.
            If you want fixed weight, use (img_weight, img_weight)
            Default: (0.5, 0.5).
        template_weight: If single float weight will be sampled from (0, template_weight).
            If tuple of float template_weight will be in range `[template_weight[0], template_weight[1])`.
            If you want fixed weight, use (template_weight, template_weight)
            Default: (0.5, 0.5).
        template_transform: transformation object which could be applied to template,
            must produce template the same size as input image.
        name: (Optional) Name of transform, used only for deserialization.
        p: probability of applying the transform. Default: 0.5.
    Targets:
        image
    Image types:
        uint8, float32
    """

    class InitSchema(BaseTransformInitSchema):
        templates: np.ndarray | Sequence[np.ndarray] = Field(..., description="Images as template for transform.")
        img_weight: ZeroOneRangeType = (0.5, 0.5)
        template_weight: ZeroOneRangeType = (0.5, 0.5)
        template_transform: Callable[..., Any] | None = Field(
            default=None,
            description="Transformation object applied to template.",
        )
        name: str | None = Field(default=None, description="Name of transform, used only for deserialization.")

        @field_validator("templates")
        @classmethod
        def validate_templates(cls, v: np.ndarray | list[np.ndarray]) -> list[np.ndarray]:
            if isinstance(v, np.ndarray):
                return [v]
            if isinstance(v, list):
                if not all(isinstance(item, np.ndarray) for item in v):
                    msg = "All templates must be numpy arrays."
                    raise ValueError(msg)
                return v
            msg = "Templates must be a numpy array or a list of numpy arrays."
            raise TypeError(msg)

    def __init__(
        self,
        templates: np.ndarray | list[np.ndarray],
        img_weight: ScaleFloatType = (0.5, 0.5),
        template_weight: ScaleFloatType = (0.5, 0.5),
        template_transform: Callable[..., Any] | None = None,
        name: str | None = None,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.templates = templates
        self.img_weight = cast(Tuple[float, float], img_weight)
        self.template_weight = cast(Tuple[float, float], template_weight)
        self.template_transform = template_transform
        self.name = name

    def apply(
        self,
        img: np.ndarray,
        template: np.ndarray,
        img_weight: float,
        template_weight: float,
        **params: Any,
    ) -> np.ndarray:
        return add_weighted(img, img_weight, template, template_weight)

    def get_params(self) -> dict[str, float]:
        return {
            "img_weight": random.uniform(self.img_weight[0], self.img_weight[1]),
            "template_weight": random.uniform(self.template_weight[0], self.template_weight[1]),
        }

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        img = data["image"] if "image" in data else data["images"][0]
        template = random.choice(self.templates)

        if self.template_transform is not None:
            template = self.template_transform(image=template)["image"]

        if get_num_channels(template) not in [1, get_num_channels(img)]:
            msg = (
                "Template must be a single channel or "
                "has the same number of channels as input "
                f"image ({get_num_channels(img)}), got {get_num_channels(template)}"
            )
            raise ValueError(msg)

        if template.dtype != img.dtype:
            msg = "Image and template must be the same image type"
            raise ValueError(msg)

        if img.shape[:2] != template.shape[:2]:
            raise ValueError(f"Image and template must be the same size, got {img.shape[:2]} and {template.shape[:2]}")

        if get_num_channels(template) == 1 and get_num_channels(img) > 1:
            template = np.stack((template,) * get_num_channels(img), axis=-1)

        # in order to support grayscale image with dummy dim
        template = template.reshape(img.shape)

        return {"template": template}

    @classmethod
    def is_serializable(cls) -> bool:
        return False

    def to_dict_private(self) -> dict[str, Any]:
        if self.name is None:
            msg = (
                "To make a TemplateTransform serializable you should provide the `name` argument, "
                "e.g. `TemplateTransform(name='my_transform', ...)`."
            )
            raise ValueError(msg)
        return {"__class_fullname__": self.get_class_fullname(), "__name__": self.name}

`apply (self, img, template, img_weight, template_weight, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(
    self,
    img: np.ndarray,
    template: np.ndarray,
    img_weight: float,
    template_weight: float,
    **params: Any,
) -> np.ndarray:
    return add_weighted(img, img_weight, template, template_weight)
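
`add_weighted` is not listed on this page. As a hedged illustration of the blend it is expected to perform, the sketch below computes `img * img_weight + template * template_weight` with NumPy. The function name `blend` and the rounding and clipping for integer images are assumptions, not the library's confirmed behavior.

```python
import numpy as np


def blend(img: np.ndarray, img_weight: float, template: np.ndarray, template_weight: float) -> np.ndarray:
    """Weighted sum of two same-shaped images, returned in the dtype of img."""
    # Work in float so that uint8 inputs cannot overflow during the sum.
    mixed = img.astype(np.float32) * img_weight + template.astype(np.float32) * template_weight
    if np.issubdtype(img.dtype, np.integer):
        # Assumption: round and clip to the valid range of the integer dtype.
        info = np.iinfo(img.dtype)
        mixed = np.clip(np.rint(mixed), info.min, info.max)
    return mixed.astype(img.dtype)


image = np.full((2, 2, 3), 200, dtype=np.uint8)
template = np.full((2, 2, 3), 100, dtype=np.uint8)
print(blend(image, 0.5, template, 0.5)[0, 0])
```

Weights that sum to more than 1 can push values past the valid range, which is why this sketch clips integer results.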

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, float]:
    return {
        "img_weight": random.uniform(self.img_weight[0], self.img_weight[1]),
        "template_weight": random.uniform(self.template_weight[0], self.template_weight[1]),
    }

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    img = data["image"] if "image" in data else data["images"][0]
    template = random.choice(self.templates)

    if self.template_transform is not None:
        template = self.template_transform(image=template)["image"]

    if get_num_channels(template) not in [1, get_num_channels(img)]:
        msg = (
            "Template must be a single channel or "
            "has the same number of channels as input "
            f"image ({get_num_channels(img)}), got {get_num_channels(template)}"
        )
        raise ValueError(msg)

    if template.dtype != img.dtype:
        msg = "Image and template must be the same image type"
        raise ValueError(msg)

    if img.shape[:2] != template.shape[:2]:
        raise ValueError(f"Image and template must be the same size, got {img.shape[:2]} and {template.shape[:2]}")

    if get_num_channels(template) == 1 and get_num_channels(img) > 1:
        template = np.stack((template,) * get_num_channels(img), axis=-1)

    # in order to support grayscale image with dummy dim
    template = template.reshape(img.shape)

    return {"template": template}
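
The channel handling at the end of this method is easy to misread, so here is a small NumPy sketch of just the stack-and-reshape step (`expand_template` is a hypothetical name, and the validation checks are omitted):

```python
import numpy as np


def expand_template(template: np.ndarray, img: np.ndarray) -> np.ndarray:
    """Give a single-channel template the same shape as img."""
    img_channels = img.shape[2] if img.ndim == 3 else 1
    template_channels = template.shape[2] if template.ndim == 3 else 1
    if template_channels == 1 and img_channels > 1:
        # (H, W) becomes (H, W, C); (H, W, 1) becomes (H, W, 1, C).
        template = np.stack((template,) * img_channels, axis=-1)
    # The reshape drops the extra axis and also handles grayscale images
    # that carry a dummy last dimension.
    return template.reshape(img.shape)


rgb = np.zeros((4, 5, 3), dtype=np.uint8)
gray_template = np.arange(20, dtype=np.uint8).reshape(4, 5)
print(expand_template(gray_template, rgb).shape)
```

A grayscale template is thus copied into every channel of the image, so blending with it shifts all channels by the same amount.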

`class ToFloat` `(max_value=None, p=1.0, always_apply=None)` [view source on GitHub] ¶

Divide pixel values by max_value to get a float32 output array where all values lie in the range [0, 1.0]. If max_value is None the transform will try to infer the maximum value by inspecting the data type of the input image.

See Also: FromFloat

Parameters:

Name	Type	Description
`max_value`	`float \| None`	maximum possible input value. Default: None.
`p`	`float`	probability of applying the transform. Default: 1.0.

Targets

image

Image types: any type

Source code in albumentations/augmentations/transforms.py

Python

class ToFloat(ImageOnlyTransform):
    """Divide pixel values by `max_value` to get a float32 output array where all values lie in the range [0, 1.0].
    If `max_value` is None the transform will try to infer the maximum value by inspecting the data type of the input
    image.

    See Also:
        :class:`~albumentations.augmentations.transforms.FromFloat`

    Args:
        max_value: maximum possible input value. Default: None.
        p: probability of applying the transform. Default: 1.0.

    Targets:
        image

    Image types:
        any type

    """

    class InitSchema(BaseTransformInitSchema):
        max_value: float | None = Field(default=None, description="Maximum possible input value.")
        p: ProbabilityType = 1

    def __init__(self, max_value: float | None = None, p: float = 1.0, always_apply: bool | None = None):
        super().__init__(p, always_apply)
        self.max_value = max_value

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.to_float(img, self.max_value)

    def get_transform_init_args_names(self) -> tuple[str]:
        return ("max_value",)

`apply (self, img, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.to_float(img, self.max_value)
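
`fmain.to_float` is not listed here. A minimal sketch of the documented behavior, assuming the maximum is inferred as the largest value of an integer dtype and as 1.0 for float inputs (the name `to_float_sketch` and the use of `np.iinfo` are illustrative assumptions):

```python
from typing import Optional

import numpy as np


def to_float_sketch(img: np.ndarray, max_value: Optional[float] = None) -> np.ndarray:
    """Scale pixel values into [0, 1] and return a float32 array."""
    if max_value is None:
        if np.issubdtype(img.dtype, np.integer):
            # Assumption: use the largest value the integer dtype can hold.
            max_value = float(np.iinfo(img.dtype).max)
        else:
            # Assumption: float inputs are already in [0, 1].
            max_value = 1.0
    return img.astype(np.float32) / np.float32(max_value)


pixels = np.array([0, 128, 255], dtype=np.uint8)
print(to_float_sketch(pixels))
```

Passing an explicit `max_value` is useful when the stored dtype is wider than the real data range, for example 12-bit sensor data kept in a uint16 array.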

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str]:
    return ("max_value",)

`class ToGray` [view source on GitHub] ¶

Convert the input RGB image to grayscale. If the input image is already grayscale, a warning is issued but the original image is returned unchanged. This transformation checks if the image is RGB; if not, it raises a TypeError.

Parameters:

Name	Type	Description
`p`	`float`	Probability of applying the transform. Default is 0.5.

Targets

image

Image types: uint8, float32

Note

This transform assumes the input image is in RGB format.

Exceptions:

Type	Description
`TypeError`	If the input image is not a 3-channel RGB image.

Source code in albumentations/augmentations/transforms.py

Python

class ToGray(ImageOnlyTransform):
    """Convert the input RGB image to grayscale. If the input image is already grayscale, a warning is issued but
    the original image is returned unchanged. This transformation checks if the image is RGB; if not, it raises
    a TypeError.

    Args:
        p (float): Probability of applying the transform. Default is 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        This transform assumes the input image is in RGB format.

    Raises:
        TypeError: If the input image is not a 3-channel RGB image.
    """

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        if is_grayscale_image(img):
            warnings.warn("The image is already gray.", stacklevel=2)
            return img
        if not is_rgb_image(img):
            msg = "ToGray transformation expects 3-channel images."
            raise TypeError(msg)

        return fmain.to_gray(img)

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()

`apply (self, img, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    if is_grayscale_image(img):
        warnings.warn("The image is already gray.", stacklevel=2)
        return img
    if not is_rgb_image(img):
        msg = "ToGray transformation expects 3-channel images."
        raise TypeError(msg)

    return fmain.to_gray(img)
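
The three outcomes of this `apply` depend only on the image shape. The sketch below restates them in plain Python; the shape predicates are assumptions inferred from the error messages on this page, not the library's `is_grayscale_image` and `is_rgb_image` helpers themselves.

```python
from typing import Tuple


def is_grayscale_shape(shape: Tuple[int, ...]) -> bool:
    # Assumed rule: 2-dim, or 3-dim with a last dimension of 1.
    return len(shape) == 2 or (len(shape) == 3 and shape[-1] == 1)


def is_rgb_shape(shape: Tuple[int, ...]) -> bool:
    # Assumed rule: 3-dim with exactly three channels.
    return len(shape) == 3 and shape[-1] == 3


def to_gray_outcome(shape: Tuple[int, ...]) -> str:
    """Describe what ToGray.apply would do for an image of this shape."""
    if is_grayscale_shape(shape):
        return "warn and return unchanged"
    if not is_rgb_shape(shape):
        raise TypeError("ToGray transformation expects 3-channel images.")
    return "convert"


for shape in [(4, 4), (4, 4, 1), (4, 4, 3)]:
    print(shape, to_gray_outcome(shape))
```

Under these assumptions a 4-channel image such as RGBA is the case that raises `TypeError`.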

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[()]:
    return ()

`class ToRGB` `(p=1.0, always_apply=None)` [view source on GitHub] ¶

Convert the input grayscale image to RGB.

Parameters:

Name	Type	Description
`p`	`float`	probability of applying the transform. Default: 1.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py

Python

class ToRGB(ImageOnlyTransform):
    """Convert the input grayscale image to RGB.

    Args:
        p: probability of applying the transform. Default: 1.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(self, p: float = 1.0, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        if is_rgb_image(img):
            warnings.warn("The image is already an RGB.", stacklevel=2)
            return img
        if not is_grayscale_image(img):
            msg = "ToRGB transformation expects 2-dim images or 3-dim with the last dimension equal to 1."
            raise TypeError(msg)

        return fmain.gray_to_rgb(img)

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()

`__init__ (self, p=1.0, always_apply=None)` `special` ¶

Initialize self. See help(type(self)) for accurate signature.

Source code in albumentations/augmentations/transforms.py

Python

def __init__(self, p: float = 1.0, always_apply: bool | None = None):
    super().__init__(p=p, always_apply=always_apply)

`apply (self, img, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    if is_rgb_image(img):
        warnings.warn("The image is already an RGB.", stacklevel=2)
        return img
    if not is_grayscale_image(img):
        msg = "ToRGB transformation expects 2-dim images or 3-dim with the last dimension equal to 1."
        raise TypeError(msg)

    return fmain.gray_to_rgb(img)
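
`fmain.gray_to_rgb` is not listed on this page. A plausible NumPy equivalent, shown only as an assumption about its effect, repeats the single gray channel three times:

```python
import numpy as np


def gray_to_rgb_sketch(img: np.ndarray) -> np.ndarray:
    """Turn an (H, W) or (H, W, 1) grayscale image into an (H, W, 3) image."""
    gray = img[..., 0] if img.ndim == 3 else img
    # Copy the same intensities into the R, G and B channels.
    return np.stack([gray, gray, gray], axis=-1)


gray = np.arange(6, dtype=np.uint8).reshape(2, 3)
print(gray_to_rgb_sketch(gray).shape)
```

The result looks identical to the input when displayed, but it can be passed to transforms that require three channels.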

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[()]:
    return ()

`class ToSepia` `(p=0.5, always_apply=None)` [view source on GitHub] ¶

Applies a sepia filter to the input RGB image.

Parameters:

Name	Type	Description
`p`	`float`	probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Source code in albumentations/augmentations/transforms.py

Python

class ToSepia(ImageOnlyTransform):
    """Applies sepia filter to the input RGB image

    Args:
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(self, p: float = 0.5, always_apply: bool | None = None):
        super().__init__(p, always_apply)
        self.sepia_transformation_matrix = np.array(
            [[0.393, 0.769, 0.189], [0.349, 0.686, 0.168], [0.272, 0.534, 0.131]],
        )

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        if not is_rgb_image(img):
            msg = "ToSepia transformation expects 3-channel images."
            raise TypeError(msg)
        return fmain.linear_transformation_rgb(img, self.sepia_transformation_matrix)

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()

`__init__ (self, p=0.5, always_apply=None)` `special` ¶

Initialize self. See help(type(self)) for accurate signature.

Source code in albumentations/augmentations/transforms.py

Python

def __init__(self, p: float = 0.5, always_apply: bool | None = None):
    super().__init__(p, always_apply)
    self.sepia_transformation_matrix = np.array(
        [[0.393, 0.769, 0.189], [0.349, 0.686, 0.168], [0.272, 0.534, 0.131]],
    )

`apply (self, img, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    if not is_rgb_image(img):
        msg = "ToSepia transformation expects 3-channel images."
        raise TypeError(msg)
    return fmain.linear_transformation_rgb(img, self.sepia_transformation_matrix)

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[()]:
    return ()

`class UnsharpMask` `(blur_limit=(3, 7), sigma_limit=0.0, alpha=(0.2, 0.5), threshold=10, always_apply=None, p=0.5)` [view source on GitHub] ¶

Sharpen the input image using Unsharp Masking processing and overlays the result with the original image.

Parameters:

Name	Type	Description
`blur_limit`	`ScaleIntType`	Maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it is computed from sigma as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`. If a single value is given, `blur_limit` is taken from the range (0, blur_limit). Default: (3, 7).
`sigma_limit`	`ScaleFloatType`	Gaussian kernel standard deviation. Must be in range [0, inf). If a single value is given, `sigma_limit` is taken from the range (0, sigma_limit). If set to 0, sigma is computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
`alpha`	`ScaleFloatType`	Range from which the visibility of the sharpened image is chosen. At 0, only the original image is visible; at 1.0, only its sharpened version is visible. Default: (0.2, 0.5).
`threshold`	`int`	Value that limits sharpening to areas with a high pixel difference between the original image and its smoothed version. A higher threshold means less sharpening on flat areas. Must be in range [0, 255]. Default: 10.
`p`	`float`	probability of applying the transform. Default: 0.5.
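
The kernel-size sampling and the sigma fallback described in the table can be checked in isolation. This is a small sketch based on the `get_params` source and the formula quoted for `sigma_limit`; `sample_ksize` and `sigma_from_ksize` are hypothetical helper names, not part of the library.

```python
import random


def sample_ksize(blur_limit: tuple[int, int] = (3, 7)) -> int:
    # A step of 2 from an odd lower bound yields only odd kernel sizes.
    return random.randrange(blur_limit[0], blur_limit[1] + 1, 2)


def sigma_from_ksize(ksize: int) -> float:
    # Formula quoted in the sigma_limit description, used when sigma is 0.
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8


sizes = {sample_ksize() for _ in range(200)}
print(sorted(sizes))        # a subset of [3, 5, 7]
print(sigma_from_ksize(3))  # 0.8
```

With the default `blur_limit=(3, 7)` the sampled kernel size can only be 3, 5, or 7.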

Reference

arxiv.org/pdf/2107.10833.pdf

Targets

image

Source code in albumentations/augmentations/transforms.py

Python

class UnsharpMask(ImageOnlyTransform):
    """Sharpen the input image using Unsharp Masking processing and overlays the result with the original image.

    Args:
        blur_limit: maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If set single value `blur_limit` will be in range (0, blur_limit).
            Default: (3, 7).
        sigma_limit: Gaussian kernel standard deviation. Must be in range [0, inf).
            If set single value `sigma_limit` will be in range (0, sigma_limit).
            If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
        alpha: range to choose the visibility of the sharpened image.
            At 0, only the original image is visible, at 1.0 only its sharpened version is visible.
            Default: (0.2, 0.5).
        threshold: Value to limit sharpening only for areas with high pixel difference between original image
            and its smoothed version. Higher threshold means less sharpening on flat areas.
            Must be in range [0, 255]. Default: 10.
        p: probability of applying the transform. Default: 0.5.

    Reference:
        arxiv.org/pdf/2107.10833.pdf

    Targets:
        image

    """

    class InitSchema(BaseTransformInitSchema):
        sigma_limit: NonNegativeFloatRangeType = 0
        alpha: ZeroOneRangeType = (0.2, 0.5)
        threshold: int = Field(default=10, ge=0, le=255, description="Threshold for limiting sharpening.")

        blur_limit: ScaleIntType = Field(
            default=(3, 7),
            description="Maximum kernel size for blurring the input image.",
        )

        @field_validator("blur_limit")
        @classmethod
        def process_blur(cls, value: ScaleIntType, info: ValidationInfo) -> tuple[int, int]:
            return process_blur_limit(value, info, min_value=3)

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_limit: ScaleFloatType = 0.0,
        alpha: ScaleFloatType = (0.2, 0.5),
        threshold: int = 10,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.blur_limit = cast(Tuple[int, int], blur_limit)
        self.sigma_limit = cast(Tuple[float, float], sigma_limit)
        self.alpha = cast(Tuple[float, float], alpha)
        self.threshold = threshold

    def get_params(self) -> dict[str, Any]:
        return {
            "ksize": random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2),
            "sigma": random.uniform(*self.sigma_limit),
            "alpha": random.uniform(*self.alpha),
        }

    def apply(self, img: np.ndarray, ksize: int, sigma: int, alpha: float, **params: Any) -> np.ndarray:
        return fmain.unsharp_mask(img, ksize, sigma=sigma, alpha=alpha, threshold=self.threshold)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "blur_limit", "sigma_limit", "alpha", "threshold"

`apply (self, img, ksize, sigma, alpha, **params)` ¶

Apply transform on image.

Source code in albumentations/augmentations/transforms.py

Python

def apply(self, img: np.ndarray, ksize: int, sigma: int, alpha: float, **params: Any) -> np.ndarray:
    return fmain.unsharp_mask(img, ksize, sigma=sigma, alpha=alpha, threshold=self.threshold)

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py

Python

def get_params(self) -> dict[str, Any]:
    return {
        "ksize": random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2),
        "sigma": random.uniform(*self.sigma_limit),
        "alpha": random.uniform(*self.alpha),
    }

`get_transform_init_args_names (self)` ¶

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "blur_limit", "sigma_limit", "alpha", "threshold"

`utils` ¶

`def check_range (value, lower_bound, upper_bound, name)` [view source on GitHub]¶

Checks if the given value is within the specified bounds

Parameters:

Name	Type	Description
`value`	`tuple[float, float]`	The value to check and convert. Can be a single float or a tuple of floats.
`lower_bound`	`float`	The lower bound for the range check.
`upper_bound`	`float`	The upper bound for the range check.
`name`	`str \| None`	The name of the parameter being checked. Used for error messages.

Exceptions:

Type	Description
`ValueError`	If the value is outside the bounds or if the tuple values are not ordered correctly.
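
The two failure modes, a value outside the bounds and a tuple in the wrong order, can be seen with a short example. The function body below is copied from the source listing in this section so that the snippet runs without the library; the parameter name `"alpha"` is only an illustration.

```python
def check_range(value, lower_bound, upper_bound, name):
    # Body copied from the source listing in this section.
    if not all(lower_bound <= x <= upper_bound for x in value):
        raise ValueError(f"All values in {name} must be within [{lower_bound}, {upper_bound}] for tuple inputs.")
    if not value[0] <= value[1]:
        raise ValueError(f"{name!s} tuple values must be ordered as (min, max). Got: {value}")


check_range((0.2, 0.5), 0, 1, "alpha")  # passes silently and returns None

messages = []
for bad in [(0.2, 1.5), (0.5, 0.2)]:
    try:
        check_range(bad, 0, 1, "alpha")
    except ValueError as err:
        messages.append(str(err))
        print(err)
```

The bounds check runs first, so a tuple that is both out of range and unordered reports the range error.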

Source code in albumentations/augmentations/utils.py

Python

def check_range(value: tuple[float, float], lower_bound: float, upper_bound: float, name: str | None) -> None:
    """Checks if the given value is within the specified bounds

    Args:
        value: The value to check and convert. Can be a single float or a tuple of floats.
        lower_bound: The lower bound for the range check.
        upper_bound: The upper bound for the range check.
        name: The name of the parameter being checked. Used for error messages.

    Raises:
        ValueError: If the value is outside the bounds or if the tuple values are not ordered correctly.
    """
    if not all(lower_bound <= x <= upper_bound for x in value):
        raise ValueError(f"All values in {name} must be within [{lower_bound}, {upper_bound}] for tuple inputs.")
    if not value[0] <= value[1]:
        raise ValueError(f"{name!s} tuple values must be ordered as (min, max). Got: {value}")

`check_version` ¶

`def parse_version (data)` [view source on GitHub]¶

Parses the version from the given JSON data.
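
The function expects a JSON string with a nested `info.version` field and falls back to an empty string in every other case. The sketch below condenses the body from the source listing in this section; the payload and its version string are made up for illustration.

```python
import json


def parse_version(data: str) -> str:
    # Condensed from the source listing in this section.
    if data:
        try:
            return json.loads(data).get("info", {}).get("version", "")
        except json.JSONDecodeError:
            # Malformed JSON yields an empty version string.
            return ""
    return ""


payload = json.dumps({"info": {"version": "1.2.3"}})  # made-up version string
print(parse_version(payload))  # 1.2.3
```

Empty input, malformed JSON, and a JSON object without an `info.version` field all return `""`.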

Source code in albumentations/check_version.py

Python

def parse_version(data: str) -> str:
    """Parses the version from the given JSON data."""
    if data:
        try:
            json_data = json.loads(data)
            # Use .get() to avoid KeyError if 'version' is not present
            return json_data.get("info", {}).get("version", "")
        except json.JSONDecodeError:
            # This will handle malformed JSON data
            return ""
    return ""

`core` `special` ¶

`bbox_utils` ¶

`class BboxParams` `(format, label_fields=None, min_area=0.0, min_visibility=0.0, min_width=0.0, min_height=0.0, check_each_transform=True, clip=False)` [view source on GitHub] ¶

Parameters of bounding boxes

Parameters:

Name	Type	Description
`format`	`str`	format of bounding boxes. Should be `coco`, `pascal_voc`, `albumentations` or `yolo`. The `coco` format `[x_min, y_min, width, height]`, e.g. [97, 12, 150, 200]. The `pascal_voc` format `[x_min, y_min, x_max, y_max]`, e.g. [97, 12, 247, 212]. The `albumentations` format is like `pascal_voc`, but normalized, in other words: `[x_min, y_min, x_max, y_max]`, e.g. [0.2, 0.3, 0.4, 0.5]. The `yolo` format `[x, y, width, height]`, e.g. [0.1, 0.2, 0.3, 0.4]; `x`, `y` - normalized bbox center; `width`, `height` - normalized bbox width and height.
`label_fields`	`list`	List of fields joined with boxes, e.g., labels.
`min_area`	`float`	Minimum area of a bounding box in pixels or normalized units. Bounding boxes with an area less than this value will be removed. Default: 0.0.
`min_visibility`	`float`	Minimum fraction of area for a bounding box to remain in the list. Bounding boxes with a visible area less than this fraction will be removed. Default: 0.0.
`min_width`	`float`	Minimum width of a bounding box in pixels or normalized units. Bounding boxes with a width less than this value will be removed. Default: 0.0.
`min_height`	`float`	Minimum height of a bounding box in pixels or normalized units. Bounding boxes with a height less than this value will be removed. Default: 0.0.
`check_each_transform`	`bool`	If True, bounding boxes will be checked after each dual transform. Default: True.
`clip`	`bool`	If True, bounding boxes will be clipped to the image borders before applying any transform. Default: False.
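
The four formats describe the same box in different coordinates. As a rough illustration, the following sketch walks the `coco` example from the table through the other three formats. The helper names and the 640x480 image size are assumptions made for this example; they are not library functions.

```python
def coco_to_pascal_voc(box):
    x_min, y_min, width, height = box
    return (x_min, y_min, x_min + width, y_min + height)


def pascal_voc_to_albumentations(box, rows, cols):
    # Normalize x by the image width (cols) and y by the image height (rows).
    x_min, y_min, x_max, y_max = box
    return (x_min / cols, y_min / rows, x_max / cols, y_max / rows)


def albumentations_to_yolo(box):
    # Normalized center point plus normalized width and height.
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min)


rows, cols = 480, 640  # assumed image height and width
coco = (97, 12, 150, 200)
voc = coco_to_pascal_voc(coco)
alb = pascal_voc_to_albumentations(voc, rows, cols)
yolo = albumentations_to_yolo(alb)
print(voc)  # (97, 12, 247, 212)
print(alb)
print(yolo)
```

The `pascal_voc` result matches the example given in the table for the same `coco` box.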

Source code in albumentations/core/bbox_utils.py

Python

class BboxParams(Params):
    """Parameters of bounding boxes

    Args:
        format (str): format of bounding boxes. Should be `coco`, `pascal_voc`, `albumentations` or `yolo`.

            The `coco` format
                `[x_min, y_min, width, height]`, e.g. [97, 12, 150, 200].
            The `pascal_voc` format
                `[x_min, y_min, x_max, y_max]`, e.g. [97, 12, 247, 212].
            The `albumentations` format
                is like `pascal_voc`, but normalized,
                in other words: `[x_min, y_min, x_max, y_max]`, e.g. [0.2, 0.3, 0.4, 0.5].
            The `yolo` format
                `[x, y, width, height]`, e.g. [0.1, 0.2, 0.3, 0.4];
                `x`, `y` - normalized bbox center; `width`, `height` - normalized bbox width and height.

        label_fields (list): List of fields joined with boxes, e.g., labels.
        min_area (float): Minimum area of a bounding box in pixels or normalized units.
            Bounding boxes with an area less than this value will be removed. Default: 0.0.
        min_visibility (float): Minimum fraction of area for a bounding box to remain in the list.
            Bounding boxes with a visible area less than this fraction will be removed. Default: 0.0.
        min_width (float): Minimum width of a bounding box in pixels or normalized units.
            Bounding boxes with a width less than this value will be removed. Default: 0.0.
        min_height (float): Minimum height of a bounding box in pixels or normalized units.
            Bounding boxes with a height less than this value will be removed. Default: 0.0.
        check_each_transform (bool): If True, bounding boxes will be checked after each dual transform. Default: True.
        clip (bool): If True, bounding boxes will be clipped to the image borders before applying any transform.
            Default: False.

    """

    def __init__(
        self,
        format: str,  # noqa: A002
        label_fields: Sequence[Any] | None = None,
        min_area: float = 0.0,
        min_visibility: float = 0.0,
        min_width: float = 0.0,
        min_height: float = 0.0,
        check_each_transform: bool = True,
        clip: bool = False,
    ):
        super().__init__(format, label_fields)
        self.min_area = min_area
        self.min_visibility = min_visibility
        self.min_width = min_width
        self.min_height = min_height
        self.check_each_transform = check_each_transform
        self.clip = clip

    def to_dict_private(self) -> dict[str, Any]:
        data = super().to_dict_private()
        data.update(
            {
                "min_area": self.min_area,
                "min_visibility": self.min_visibility,
                "min_width": self.min_width,
                "min_height": self.min_height,
                "check_each_transform": self.check_each_transform,
                "clip": self.clip,
            },
        )
        return data

    @classmethod
    def is_serializable(cls) -> bool:
        return True

    @classmethod
    def get_class_fullname(cls) -> str:
        return "BboxParams"

`def calculate_bbox_area (bbox, rows, cols)` [view source on GitHub]¶

Calculate the area of a bounding box in (fractional) pixels.

Parameters:

Name	Type	Description
`bbox`	`BoxType`	A bounding box `(x_min, y_min, x_max, y_max)`.
`rows`	`int`	Image height.
`cols`	`int`	Image width.

Returns:

Type	Description
`float`	Area in (fractional) pixels of the (denormalized) bounding box.
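
As a quick arithmetic check, the sketch below computes the pixel area of a normalized box without calling the library. `bbox_area_sketch` is a hypothetical name; it scales the normalized width and height directly, which is algebraically the same as denormalizing each corner first.

```python
def bbox_area_sketch(bbox, rows, cols):
    x_min, y_min, x_max, y_max = bbox[:4]
    # Denormalize: x by the width (cols), y by the height (rows).
    width_px = (x_max - x_min) * cols
    height_px = (y_max - y_min) * rows
    return width_px * height_px


print(bbox_area_sketch((0.25, 0.25, 0.75, 0.75), rows=100, cols=200))  # 5000.0
```

The box covers half of each dimension, so its area is a quarter of the 100x200 image.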

Source code in albumentations/core/bbox_utils.py

Python

def calculate_bbox_area(bbox: BoxType, rows: int, cols: int) -> float:
    """Calculate the area of a bounding box in (fractional) pixels.

    Args:
        bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
        rows: Image height.
        cols: Image width.

    Return:
        Area in (fractional) pixels of the (denormalized) bounding box.

    """
    bbox = denormalize_bbox(bbox, rows, cols)
    x_min, y_min, x_max, y_max = bbox[:4]
    return (x_max - x_min) * (y_max - y_min)

`def check_bbox (bbox)` [view source on GitHub]¶

Check that bbox boundaries are in the range [0, 1] and that minimums are less than maximums.
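
The range check is tolerant of floating-point noise at the borders. The sketch below is condensed from the `check_bbox` source listing; `check_bbox_sketch` is a hypothetical name and its error messages are shortened.

```python
import numpy as np


def check_bbox_sketch(bbox):
    # Condensed from the check_bbox source listing; the name is hypothetical.
    for name, value in zip(["x_min", "y_min", "x_max", "y_max"], bbox[:4]):
        if not 0 <= value <= 1 and not np.isclose(value, 0) and not np.isclose(value, 1):
            raise ValueError(f"Expected {name} to be in the range [0.0, 1.0], got {value}.")
    x_min, y_min, x_max, y_max = bbox[:4]
    if x_max <= x_min or y_max <= y_min:
        raise ValueError(f"Degenerate or inverted bbox {bbox}.")


check_bbox_sketch((0.1, 0.2, 0.3, 0.4))         # valid
check_bbox_sketch((0.0, 0.0, 1.0 + 1e-9, 1.0))  # accepted: within np.isclose tolerance of 1
```

A box with `x_max <= x_min` or `y_max <= y_min`, or with a coordinate clearly outside [0, 1], raises `ValueError`.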

Source code in albumentations/core/bbox_utils.py

Python

def check_bbox(bbox: BoxType) -> None:
    """Check that bbox boundaries are in the range [0, 1] and that minimums are less than maximums."""
    for name, value in zip(["x_min", "y_min", "x_max", "y_max"], bbox[:4]):
        if not 0 <= value <= 1 and not np.isclose(value, 0) and not np.isclose(value, 1):
            raise ValueError(f"Expected {name} for bbox {bbox} to be in the range [0.0, 1.0], got {value}.")
    x_min, y_min, x_max, y_max = bbox[:4]
    if x_max <= x_min:
        raise ValueError(f"x_max is less than or equal to x_min for bbox {bbox}.")
    if y_max <= y_min:
        raise ValueError(f"y_max is less than or equal to y_min for bbox {bbox}.")

`def check_bboxes (bboxes)` [view source on GitHub]¶

Check that the boundaries of all bboxes are in the range [0, 1] and that minimums are less than maximums.

Source code in albumentations/core/bbox_utils.py

Python

def check_bboxes(bboxes: Sequence[BoxType]) -> None:
    """Check that the boundaries of all bboxes are in the range [0, 1] and that minimums are less than maximums."""
    for bbox in bboxes:
        check_bbox(bbox)

`def clip_bbox (bbox, rows, cols)` [view source on GitHub]¶

Clips the bounding box coordinates to ensure they fit within the boundaries of an image.

The function first denormalizes the bounding box coordinates from relative to absolute (pixel) values. Each coordinate is then clipped to the respective dimension of the image to ensure that the bounding box does not exceed the image's boundaries. Finally, the bounding box is normalized back to relative values.

Parameters:

Name	Type	Description
`bbox`	`BoxInternalType`	The bounding box in normalized format (relative to image dimensions).
`rows`	`int`	The number of rows (height) in the image.
`cols`	`int`	The number of columns (width) in the image.

Returns:

Type	Description
`BoxInternalType`	The clipped bounding box, normalized to the image dimensions.
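
The denormalize, clip, normalize sequence can be sketched with NumPy alone. `clip_bbox_sketch` is a hypothetical name and the box below is made up; the clipping bounds `cols` and `rows` (rather than `cols - 1` and `rows - 1`) follow the source.

```python
import numpy as np


def clip_bbox_sketch(bbox, rows, cols):
    x_min, y_min, x_max, y_max = bbox[:4]
    # Denormalize, clip to [0, cols] and [0, rows], then normalize back.
    xs = np.clip(np.array([x_min, x_max]) * cols, 0, cols) / cols
    ys = np.clip(np.array([y_min, y_max]) * rows, 0, rows) / rows
    # Extra fields such as labels are passed through unchanged.
    return (float(xs[0]), float(ys[0]), float(xs[1]), float(ys[1]), *bbox[4:])


print(clip_bbox_sketch((0.5, -0.25, 1.5, 0.75, "cat"), rows=100, cols=100))
# (0.5, 0.0, 1.0, 0.75, 'cat')
```

A box that lies entirely outside the image collapses to zero width or height under this clipping, so a later area filter can remove it.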

Source code in albumentations/core/bbox_utils.py

Python

def clip_bbox(bbox: BoxType, rows: int, cols: int) -> BoxType:
    """Clips the bounding box coordinates to ensure they fit within the boundaries of an image.

    The function first denormalizes the bounding box coordinates from relative to absolute (pixel) values.
    Each coordinate is then clipped to the respective dimension of the image to ensure that the bounding box
    does not exceed the image's boundaries. Finally, the bounding box is normalized back to relative values.

    Parameters:
        bbox (BoxInternalType): The bounding box in normalized format (relative to image dimensions).
        rows (int): The number of rows (height) in the image.
        cols (int): The number of columns (width) in the image.

    Returns:
        BoxInternalType: The clipped bounding box, normalized to the image dimensions.
    """
    x_min, y_min, x_max, y_max = denormalize_bbox(bbox, rows, cols)[:4]

    ## Note:
    # It could be tempting to use cols - 1 and rows - 1 as the upper bounds for the clipping

    # But this would cause the bounding box to be clipped to the image dimensions - 1 which is not what we want.
    # Bounding box lives not in the middle of pixels but between them.

    # Example: for image with height 100, width 100, the pixel values are in the range [0, 99]
    # but if we want bounding box to be 1 pixel width and height and lie on the boundary of the image
    # it will be described as [99, 99, 100, 100] => clip by image_size - 1 will lead to [99, 99, 99, 99]
    # which is incorrect

    # It could be also tempting to clip `x_min` to `cols - 1` and `y_min` to `rows - 1`, but this also leads
    # to another error. If the bounding box lies fully outside of the visible area and min_area is set to 0, then
    # the bounding box will be clipped to the image size - 1 and will be 1 pixel in size and fully visible,
    # but it should be completely removed.

    x_min = np.clip(x_min, 0, cols)
    x_max = np.clip(x_max, 0, cols)
    y_min = np.clip(y_min, 0, rows)
    y_max = np.clip(y_max, 0, rows)
    return cast(BoxType, normalize_bbox((x_min, y_min, x_max, y_max), rows, cols) + tuple(bbox[4:]))

`def convert_bbox_from_albumentations (bbox, target_format, rows, cols, check_validity=False)` [view source on GitHub]¶

Convert a bounding box from the format used by albumentations to the format specified in `target_format`.

Parameters:

Name	Type	Description
`bbox`	`BoxType`	An albumentations bounding box `(x_min, y_min, x_max, y_max)`.
`target_format`	`str`	required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.
`rows`	`int`	Image height.
`cols`	`int`	Image width.
`check_validity`	`bool`	Check if all boxes are valid boxes.

Returns:

Type	Description
`tuple`	A bounding box.

Note

The coco format of a bounding box looks like [x_min, y_min, width, height], e.g. [97, 12, 150, 200]. The pascal_voc format of a bounding box looks like [x_min, y_min, x_max, y_max], e.g. [97, 12, 247, 212]. The yolo format of a bounding box looks like [x, y, width, height], e.g. [0.3, 0.1, 0.05, 0.07].

Exceptions:

Type	Description
`ValueError`	if `target_format` is not equal to `coco`, `pascal_voc` or `yolo`.
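
The conversion differs by target: `coco` and `pascal_voc` are expressed in pixels, while `yolo` stays normalized. The following is a simplified sketch of that branching, not the library function; `from_albumentations_sketch` is a hypothetical name and the validity check is omitted.

```python
def from_albumentations_sketch(bbox, target_format, rows, cols):
    (x_min, y_min, x_max, y_max), tail = bbox[:4], tuple(bbox[4:])
    if target_format == "yolo":
        # YOLO stays normalized: center point plus width and height.
        return ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min, *tail)
    # coco and pascal_voc use pixel units.
    x_min, x_max = x_min * cols, x_max * cols
    y_min, y_max = y_min * rows, y_max * rows
    if target_format == "coco":
        return (x_min, y_min, x_max - x_min, y_max - y_min, *tail)
    if target_format == "pascal_voc":
        return (x_min, y_min, x_max, y_max, *tail)
    raise ValueError(f"Unknown target_format {target_format}.")


box = (0.25, 0.25, 0.75, 0.5, "cat")
for fmt in ("pascal_voc", "coco", "yolo"):
    print(fmt, from_albumentations_sketch(box, fmt, rows=100, cols=200))
# pascal_voc (50.0, 25.0, 150.0, 50.0, 'cat')
# coco (50.0, 25.0, 100.0, 25.0, 'cat')
# yolo (0.5, 0.375, 0.5, 0.25, 'cat')
```

Any elements after the first four, such as the label here, are carried over unchanged.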

Source code in albumentations/core/bbox_utils.py

Python

def convert_bbox_from_albumentations(
    bbox: BoxType,
    target_format: str,
    rows: int,
    cols: int,
    check_validity: bool = False,
) -> BoxType:
    """Convert a bounding box from the format used by albumentations to a format, specified in `target_format`.

    Args:
        bbox: An albumentations bounding box `(x_min, y_min, x_max, y_max)`.
        target_format: required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.
        rows: Image height.
        cols: Image width.
        check_validity: Check if all boxes are valid boxes.

    Returns:
        tuple: A bounding box.

    Note:
        The `coco` format of a bounding box looks like `[x_min, y_min, width, height]`, e.g. [97, 12, 150, 200].
        The `pascal_voc` format of a bounding box looks like `[x_min, y_min, x_max, y_max]`, e.g. [97, 12, 247, 212].
        The `yolo` format of a bounding box looks like `[x, y, width, height]`, e.g. [0.3, 0.1, 0.05, 0.07].

    Raises:
        ValueError: if `target_format` is not equal to `coco`, `pascal_voc` or `yolo`.

    """
    if target_format not in {"coco", "pascal_voc", "yolo"}:
        raise ValueError(
            f"Unknown target_format {target_format}. Supported formats are: 'coco', 'pascal_voc' and 'yolo'",
        )
    if check_validity:
        check_bbox(bbox)

    if target_format != "yolo":
        bbox = denormalize_bbox(bbox, rows, cols)
    if target_format == "coco":
        (x_min, y_min, x_max, y_max), tail = bbox[:4], tuple(bbox[4:])
        width = x_max - x_min
        height = y_max - y_min
        bbox = cast(BoxType, (x_min, y_min, width, height, *tail))
    elif target_format == "yolo":
        (x_min, y_min, x_max, y_max), tail = bbox[:4], bbox[4:]
        x = (x_min + x_max) / 2.0
        y = (y_min + y_max) / 2.0
        width = x_max - x_min
        height = y_max - y_min
        bbox = cast(BoxType, (x, y, width, height, *tail))
    return bbox

`def convert_bbox_to_albumentations (bbox, source_format, rows, cols, check_validity=False)` [view source on GitHub]¶

Convert a bounding box from a format specified in source_format to the format used by albumentations: normalized coordinates of top-left and bottom-right corners of the bounding box in a form of (x_min, y_min, x_max, y_max) e.g. (0.15, 0.27, 0.67, 0.5).

Parameters:

Name	Type	Description
`bbox`	`BoxType`	A bounding box tuple.
`source_format`	`str`	format of the bounding box. Should be 'coco', 'pascal_voc', or 'yolo'.
`check_validity`	`bool`	Check if all boxes are valid boxes.
`rows`	`int`	Image height.
`cols`	`int`	Image width.

Returns:

Type	Description
`tuple`	A bounding box `(x_min, y_min, x_max, y_max)`.

Note

The `coco` format of a bounding box looks like `(x_min, y_min, width, height)`, e.g. (97, 12, 150, 200). The `pascal_voc` format of a bounding box looks like `(x_min, y_min, x_max, y_max)`, e.g. (97, 12, 247, 212). The `yolo` format of a bounding box looks like `(x, y, width, height)`, e.g. (0.3, 0.1, 0.05, 0.07), where `x`, `y` are the coordinates of the center of the box and all values are normalized to 1 by image height and width.

Exceptions:

Type	Description
`ValueError`	If `source_format` is not equal to `coco`, `pascal_voc`, or `yolo`.
`ValueError`	If, in YOLO format, any coordinate is not in the range (0, 1] (checked when `check_validity` is True).
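
The reverse direction can be sketched the same way. This is a simplified illustration of the source listing, under the assumption that normalization divides x by `cols` and y by `rows`; `to_albumentations_sketch` is a hypothetical name and the final `check_bbox` call is left out.

```python
import numpy as np


def to_albumentations_sketch(bbox, source_format, rows, cols, check_validity=False):
    tail = tuple(bbox[4:])
    if source_format == "coco":
        x_min, y_min, width, height = bbox[:4]
        x_max, y_max = x_min + width, y_min + height
    elif source_format == "yolo":
        coords = np.array(bbox[:4])
        if check_validity and np.any((coords <= 0) | (coords > 1)):
            raise ValueError("In YOLO format all coordinates must be in range (0, 1]")
        x, y, width, height = bbox[:4]
        x_min, y_min = x - width / 2, y - height / 2
        x_max, y_max = x_min + width, y_min + height
        return (x_min, y_min, x_max, y_max, *tail)  # already normalized
    elif source_format == "pascal_voc":
        x_min, y_min, x_max, y_max = bbox[:4]
    else:
        raise ValueError(f"Unknown source_format {source_format}.")
    # Pixel formats are normalized by image width and height.
    return (x_min / cols, y_min / rows, x_max / cols, y_max / rows, *tail)


print(to_albumentations_sketch((50, 25, 100, 25), "coco", rows=100, cols=200))
# (0.25, 0.25, 0.75, 0.5)
print(to_albumentations_sketch((0.5, 0.375, 0.5, 0.25), "yolo", rows=100, cols=200))
# (0.25, 0.25, 0.75, 0.5)
```

Both inputs describe the same box, so they map to the same normalized corners.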

Source code in albumentations/core/bbox_utils.py

Python

def convert_bbox_to_albumentations(
    bbox: BoxType,
    source_format: str,
    rows: int,
    cols: int,
    check_validity: bool = False,
) -> BoxType:
    """Convert a bounding box from a format specified in `source_format` to the format used by albumentations:
    normalized coordinates of top-left and bottom-right corners of the bounding box in a form of
    `(x_min, y_min, x_max, y_max)` e.g. `(0.15, 0.27, 0.67, 0.5)`.

    Args:
        bbox: A bounding box tuple.
        source_format: format of the bounding box. Should be 'coco', 'pascal_voc', or 'yolo'.
        check_validity: Check if all boxes are valid boxes.
        rows: Image height.
        cols: Image width.

    Returns:
        tuple: A bounding box `(x_min, y_min, x_max, y_max)`.

    Note:
        The `coco` format of a bounding box looks like `(x_min, y_min, width, height)`, e.g. (97, 12, 150, 200).
        The `pascal_voc` format of a bounding box looks like `(x_min, y_min, x_max, y_max)`, e.g. (97, 12, 247, 212).
        The `yolo` format of a bounding box looks like `(x, y, width, height)`, e.g. (0.3, 0.1, 0.05, 0.07);
        where `x`, `y` coordinates of the center of the box, all values normalized to 1 by image height and width.

    Raises:
        ValueError: if `source_format` is not equal to `coco`, `pascal_voc`, or `yolo`.
        ValueError: If, in YOLO format, any coordinate is not in the range (0, 1].

    """
    if source_format not in {"coco", "pascal_voc", "yolo"}:
        raise ValueError(
            f"Unknown source_format {source_format}. Supported formats are: 'coco', 'pascal_voc' and 'yolo'",
        )

    if source_format == "coco":
        (x_min, y_min, width, height), tail = bbox[:4], bbox[4:]
        x_max = x_min + width
        y_max = y_min + height
    elif source_format == "yolo":
        # https://github.com/pjreddie/darknet/blob/f6d861736038da22c9eb0739dca84003c5a5e275/scripts/voc_label.py#L12
        _bbox = np.array(bbox[:4])
        if check_validity and np.any((_bbox <= 0) | (_bbox > 1)):
            msg = "In YOLO format all coordinates must be float and in range (0, 1]"
            raise ValueError(msg)

        (x, y, width, height), tail = bbox[:4], bbox[4:]

        w_half, h_half = width / 2, height / 2
        x_min = x - w_half
        y_min = y - h_half
        x_max = x_min + width
        y_max = y_min + height
    else:
        (x_min, y_min, x_max, y_max), tail = bbox[:4], bbox[4:]

    bbox = (x_min, y_min, x_max, y_max, *tuple(tail))

    if source_format != "yolo":
        bbox = normalize_bbox(bbox, rows, cols)
    if check_validity:
        check_bbox(bbox)
    return bbox

`def convert_bboxes_from_albumentations (bboxes, target_format, rows, cols, check_validity=False)` [view source on GitHub]¶

Convert a list of bounding boxes from the format used by albumentations to a format, specified in target_format.

Parameters:

Name	Type	Description
`bboxes`	`Sequence[BoxType]`	list of albumentations bounding box `(x_min, y_min, x_max, y_max)`.
`target_format`	`str`	required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.
`rows`	`int`	Image height.
`cols`	`int`	Image width.
`check_validity`	`bool`	Check if all boxes are valid boxes.

Returns:

Type	Description
`list[BoxType]`	list of bounding boxes.

Source code in albumentations/core/bbox_utils.py

Python

def convert_bboxes_from_albumentations(
    bboxes: Sequence[BoxType],
    target_format: str,
    rows: int,
    cols: int,
    check_validity: bool = False,
) -> list[BoxType]:
    """Convert a list of bounding boxes from the format used by albumentations to a format, specified
    in `target_format`.

    Args:
        bboxes: list of albumentations bounding box `(x_min, y_min, x_max, y_max)`.
        target_format: required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.
        rows: Image height.
        cols: Image width.
        check_validity: Check if all boxes are valid boxes.

    Returns:
        list of bounding boxes.

    """
    return [convert_bbox_from_albumentations(bbox, target_format, rows, cols, check_validity) for bbox in bboxes]

`def convert_bboxes_to_albumentations (bboxes, source_format, rows, cols, check_validity=False)` [view source on GitHub]¶

Convert a list of bounding boxes from a format specified in `source_format` to the format used by albumentations.

Source code in albumentations/core/bbox_utils.py

Python

def convert_bboxes_to_albumentations(
    bboxes: Sequence[BoxType],
    source_format: str,
    rows: int,
    cols: int,
    check_validity: bool = False,
) -> list[BoxType]:
    """Convert a list of bounding boxes from a format specified in `source_format` to the format used by albumentations"""
    return [convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity) for bbox in bboxes]

`def denormalize_bbox (bbox, rows, cols)` [view source on GitHub]¶

Denormalize coordinates of a bounding box. Multiply x-coordinates by image width and y-coordinates by image height. This is the inverse operation of `normalize_bbox`.

Parameters:

Name	Type	Description
`bbox`	`BoxType`	Normalized bounding box `(x_min, y_min, x_max, y_max)`.
`rows`	`int`	Image height.
`cols`	`int`	Image width.

Returns:

Type	Description
`BoxType`	Denormalized bounding box `(x_min, y_min, x_max, y_max)`.

Exceptions:

Type	Description
`ValueError`	If `rows` or `cols` is less than or equal to zero.
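
A compact way to see the scaling and the argument check together is the sketch below. `denormalize_sketch` is a hypothetical name, and the two separate checks from the source are merged into one for brevity.

```python
def denormalize_sketch(bbox, rows, cols):
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive integers")
    x_min, y_min, x_max, y_max = bbox[:4]
    # x scales with the width (cols), y with the height (rows); extra fields pass through.
    return (x_min * cols, y_min * rows, x_max * cols, y_max * rows, *bbox[4:])


print(denormalize_sketch((0.25, 0.5, 0.75, 1.0, 7), rows=100, cols=200))
# (50.0, 50.0, 150.0, 100.0, 7)
```

Calling it with `rows=0` or a negative `cols` raises `ValueError` before any arithmetic is done.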

Source code in albumentations/core/bbox_utils.py

Python

def denormalize_bbox(bbox: BoxType, rows: int, cols: int) -> BoxType:
    """Denormalize coordinates of a bounding box. Multiply x-coordinates by image width and y-coordinates
    by image height. This is an inverse operation for :func:`~albumentations.augmentations.bbox.normalize_bbox`.

    Args:
        bbox: Normalized bounding box `(x_min, y_min, x_max, y_max)`.
        rows: Image height.
        cols: Image width.

    Returns:
        Denormalized bounding box `(x_min, y_min, x_max, y_max)`.

    Raises:
        ValueError: If rows or cols is less than or equal to zero

    """
    tail: tuple[Any, ...]
    (x_min, y_min, x_max, y_max), tail = bbox[:4], tuple(bbox[4:])

    if rows <= 0:
        msg = "Argument rows must be positive integer"
        raise ValueError(msg)
    if cols <= 0:
        msg = "Argument cols must be positive integer"
        raise ValueError(msg)

    x_min, x_max = x_min * cols, x_max * cols
    y_min, y_max = y_min * rows, y_max * rows

    return cast(BoxType, (x_min, y_min, x_max, y_max, *tail))

`def denormalize_bboxes (bboxes, rows, cols)` [view source on GitHub]¶

Denormalize a list of bounding boxes.

Parameters:

Name	Type	Description
`bboxes`	`Sequence[BoxType]`	Normalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.
`rows`	`int`	Image height.
`cols`	`int`	Image width.

Returns:

Type	Description
`list`	Denormalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.

Source code in albumentations/core/bbox_utils.py

Python

def denormalize_bboxes(bboxes: Sequence[BoxType], rows: int, cols: int) -> list[BoxType]:
    """Denormalize a list of bounding boxes.

    Args:
        bboxes: Normalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.
        rows: Image height.
        cols: Image width.

    Returns:
        list: Denormalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.

    """
    return [denormalize_bbox(bbox, rows, cols) for bbox in bboxes]

`def filter_bboxes (bboxes, rows, cols, min_area=0.0, min_visibility=0.0, min_width=0.0, min_height=0.0)` [view source on GitHub]¶

Remove bounding boxes whose visible fraction after clipping falls below `min_visibility`, or whose visible area in pixels is below `min_area`. Boxes narrower than `min_width` or shorter than `min_height` are removed as well. The surviving boxes are clipped to the final image size.

Parameters:

Name	Type	Description
`bboxes`	`Sequence[BoxType]`	List of bounding boxes in albumentations format `(x_min, y_min, x_max, y_max)`.
`rows`	`int`	Image height.
`cols`	`int`	Image width.
`min_area`	`float`	Minimum area of a bounding box in pixels. All bounding boxes whose visible area is less than this value will be removed. Default: 0.0.
`min_visibility`	`float`	Minimum fraction of a bounding box's area that must remain visible for the box to be kept. Default: 0.0.
`min_width`	`float`	Minimum width of a bounding box. All bounding boxes whose width is less than this value will be removed. Default: 0.0.
`min_height`	`float`	Minimum height of a bounding box. All bounding boxes whose height is less than this value will be removed. Default: 0.0.

Returns:

Type	Description
`list[BoxType]`	list of bounding boxes.

Source code in albumentations/core/bbox_utils.py

Python

def filter_bboxes(
    bboxes: Sequence[BoxType],
    rows: int,
    cols: int,
    min_area: float = 0.0,
    min_visibility: float = 0.0,
    min_width: float = 0.0,
    min_height: float = 0.0,
) -> list[BoxType]:
    """Remove bounding boxes that either lie outside of the visible area by more than min_visibility
    or whose area in pixels is under the threshold set by `min_area`. Also it crops boxes to final image size.

    Args:
        bboxes: list of albumentations bounding box `(x_min, y_min, x_max, y_max)`.
        rows: Image height.
        cols: Image width.
        min_area: Minimum area of a bounding box in pixels. All bounding boxes whose visible area
            is less than this value will be removed. Default: 0.0.
        min_visibility: Minimum fraction of a bounding box's area that must remain visible for the box
            to be kept. Default: 0.0.
        min_width: Minimum width of a bounding box. All bounding boxes whose width is
            less than this value will be removed. Default: 0.0.
        min_height: Minimum height of a bounding box. All bounding boxes whose height is
            less than this value will be removed. Default: 0.0.

    Returns:
        list of bounding boxes.

    """
    resulting_boxes: list[BoxType] = []
    for i in range(len(bboxes)):
        bbox = bboxes[i]
        # Calculate areas of bounding box before and after clipping.
        transformed_box_area = calculate_bbox_area(bbox, rows, cols)
        clipped_bbox = clip_bbox(bbox, rows, cols)

        bbox, tail = clipped_bbox[:4], clipped_bbox[4:]

        clipped_box_area = calculate_bbox_area(bbox, rows, cols)

        # Calculate width and height of the clipped bounding box.
        x_min, y_min, x_max, y_max = denormalize_bbox(bbox, rows, cols)[:4]
        clipped_width, clipped_height = x_max - x_min, y_max - y_min

        if (
            clipped_box_area != 0  # to ensure transformed_box_area!=0 and to handle min_area=0 or min_visibility=0
            and clipped_box_area >= min_area
            and clipped_box_area / transformed_box_area >= min_visibility
            and clipped_width >= min_width
            and clipped_height >= min_height
        ):
            resulting_boxes.append(cast(BoxType, bbox + tail))
    return resulting_boxes
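
The filter conditions in the listing above can be traced with a small standalone sketch. It assumes that `calculate_bbox_area` returns the box area in pixels and that `clip_bbox` clamps each coordinate to `[0, 1]`. Neither helper is shown in this section, so `area_px` and `clip01` below are hypothetical stand-ins, not the library functions.

```python
def area_px(bbox, rows, cols):
    # Assumed behaviour of calculate_bbox_area: area of a normalized box in pixels.
    x_min, y_min, x_max, y_max = bbox[:4]
    return (x_max - x_min) * cols * (y_max - y_min) * rows


def clip01(bbox):
    # Assumed behaviour of clip_bbox: clamp normalized coordinates to [0, 1].
    return tuple(min(max(v, 0.0), 1.0) for v in bbox[:4]) + tuple(bbox[4:])


def filter_bboxes_sketch(bboxes, rows, cols, min_area=0.0, min_visibility=0.0,
                         min_width=0.0, min_height=0.0):
    kept = []
    for bbox in bboxes:
        full_area = area_px(bbox, rows, cols)      # area before clipping
        clipped = clip01(bbox)
        clipped_area = area_px(clipped, rows, cols)  # area that is still visible
        width = (clipped[2] - clipped[0]) * cols
        height = (clipped[3] - clipped[1]) * rows
        if (
            clipped_area != 0  # also guards the division below
            and clipped_area >= min_area
            and clipped_area / full_area >= min_visibility
            and width >= min_width
            and height >= min_height
        ):
            kept.append(clipped)
    return kept


# Half of this box hangs off the left edge of a 100 x 100 image,
# so its visibility after clipping is 5000 / 10000 = 0.5.
boxes = [(-0.5, 0.0, 0.5, 1.0, "car")]
kept_loose = filter_bboxes_sketch(boxes, 100, 100, min_visibility=0.4)
kept_strict = filter_bboxes_sketch(boxes, 100, 100, min_visibility=0.6)
print(kept_loose)   # [(0.0, 0.0, 0.5, 1.0, 'car')]
print(kept_strict)  # []
```

The returned box is the clipped one, which is why a box that survives the filter never extends past the image.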

`def filter_bboxes_by_visibility (original_shape, bboxes, transformed_shape, transformed_bboxes, threshold=0.0, min_area=0.0)` [view source on GitHub]¶

Filter bounding boxes and return only those whose visibility after the transformation is at least `threshold` and whose area in pixels is at least `min_area`.

Parameters:

Name	Type	Description
`original_shape`	`Sequence[int]`	Original image shape `(height, width, ...)`.
`bboxes`	`Sequence[BoxType]`	Original bounding boxes `[(x_min, y_min, x_max, y_max)]`.
`transformed_shape`	`Sequence[int]`	Transformed image shape `(height, width)`.
`transformed_bboxes`	`Sequence[BoxType]`	Transformed bounding boxes `[(x_min, y_min, x_max, y_max)]`.
`threshold`	`float`	Visibility threshold. Should be a value in the range [0.0, 1.0]. Default: 0.0.
`min_area`	`float`	Minimum area of a transformed bounding box in pixels. Default: 0.0.

Returns:

Type	Description
`list[BoxType]`	Filtered bounding boxes `[(x_min, y_min, x_max, y_max)]`.

Source code in albumentations/core/bbox_utils.py

Python

def filter_bboxes_by_visibility(
    original_shape: Sequence[int],
    bboxes: Sequence[BoxType],
    transformed_shape: Sequence[int],
    transformed_bboxes: Sequence[BoxType],
    threshold: float = 0.0,
    min_area: float = 0.0,
) -> list[BoxType]:
    """Filter bounding boxes and return only those boxes whose visibility after transformation is above
    the threshold and minimal area of bounding box in pixels is more than min_area.

    Args:
        original_shape: Original image shape `(height, width, ...)`.
        bboxes: Original bounding boxes `[(x_min, y_min, x_max, y_max)]`.
        transformed_shape: Transformed image shape `(height, width)`.
        transformed_bboxes: Transformed bounding boxes `[(x_min, y_min, x_max, y_max)]`.
        threshold: visibility threshold. Should be a value in the range [0.0, 1.0].
        min_area: Minimal area threshold.

    Returns:
        Filtered bounding boxes `[(x_min, y_min, x_max, y_max)]`.

    """
    img_height, img_width = original_shape[:2]
    transformed_img_height, transformed_img_width = transformed_shape[:2]

    visible_bboxes = []
    for bbox, transformed_bbox in zip(bboxes, transformed_bboxes):
        if not all(0.0 <= value <= 1.0 for value in transformed_bbox[:4]):
            continue
        bbox_area = calculate_bbox_area(bbox, img_height, img_width)
        transformed_bbox_area = calculate_bbox_area(transformed_bbox, transformed_img_height, transformed_img_width)
        if transformed_bbox_area < min_area:
            continue
        visibility = transformed_bbox_area / bbox_area
        if visibility >= threshold:
            visible_bboxes.append(transformed_bbox)
    return visible_bboxes
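
As a worked illustration of the listing above, visibility is the ratio of a box's pixel area after the transform to its pixel area before it. The sketch below re-implements that logic in plain Python. `area_px` is a hypothetical stand-in for `calculate_bbox_area` (assumed to return a pixel area), and the input boxes are made-up values rather than the output of any particular transform.

```python
def area_px(bbox, rows, cols):
    # Assumed pixel-area helper standing in for calculate_bbox_area.
    x_min, y_min, x_max, y_max = bbox[:4]
    return (x_max - x_min) * cols * (y_max - y_min) * rows


def visible_after_transform(original_shape, bboxes, transformed_shape,
                            transformed_bboxes, threshold=0.0, min_area=0.0):
    rows, cols = original_shape[:2]
    t_rows, t_cols = transformed_shape[:2]
    visible = []
    for bbox, t_bbox in zip(bboxes, transformed_bboxes):
        # Boxes with any coordinate outside [0, 1] are dropped outright.
        if not all(0.0 <= v <= 1.0 for v in t_bbox[:4]):
            continue
        t_area = area_px(t_bbox, t_rows, t_cols)
        if t_area < min_area:
            continue
        if t_area / area_px(bbox, rows, cols) >= threshold:
            visible.append(t_bbox)
    return visible


original = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]
# Hypothetical transformed boxes on a 50 x 50 output image:
# the first keeps its normalized coordinates, the second leaves the frame.
transformed = [(0.0, 0.0, 0.5, 0.5), (0.75, 0.75, 1.25, 1.25)]

# First box: 625 px after vs 2500 px before, so visibility is 0.25.
result = visible_after_transform((100, 100), original, (50, 50), transformed, threshold=0.2)
print(result)  # [(0.0, 0.0, 0.5, 0.5)]
```

Because the ratio compares pixel areas on two differently sized images, a pure downscale lowers the computed visibility even though the whole object is still in view.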

`def normalize_bbox (bbox, rows, cols)` [view source on GitHub]¶

Normalize coordinates of a bounding box. Divide x-coordinates by image width and y-coordinates by image height.

Parameters:

Name	Type	Description
`bbox`	`BoxType`	Denormalized bounding box `(x_min, y_min, x_max, y_max)`.
`rows`	`int`	Image height.
`cols`	`int`	Image width.

Returns:

Type	Description
`BoxType`	Normalized bounding box `(x_min, y_min, x_max, y_max)`.

Exceptions:

Type	Description
`ValueError`	If `rows` or `cols` is less than or equal to zero.

Source code in albumentations/core/bbox_utils.py

Python

def normalize_bbox(bbox: BoxType, rows: int, cols: int) -> BoxType:
    """Normalize coordinates of a bounding box. Divide x-coordinates by image width and y-coordinates
    by image height.

    Args:
        bbox: Denormalized bounding box `(x_min, y_min, x_max, y_max)`.
        rows: Image height.
        cols: Image width.

    Returns:
        Normalized bounding box `(x_min, y_min, x_max, y_max)`.

    Raises:
        ValueError: If rows or cols is less than or equal to zero.

    """
    if rows <= 0:
        msg = "Argument rows must be positive integer"
        raise ValueError(msg)
    if cols <= 0:
        msg = "Argument cols must be positive integer"
        raise ValueError(msg)

    tail: tuple[Any, ...]
    (x_min, y_min, x_max, y_max), tail = bbox[:4], tuple(bbox[4:])
    x_min /= cols
    x_max /= cols
    y_min /= rows
    y_max /= rows

    return cast(BoxType, (x_min, y_min, x_max, y_max, *tail))
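
A short round trip shows how normalization and denormalization relate. This is a standalone sketch that mirrors the listing above. `normalize_bbox_sketch` is a hypothetical name, and the inverse step is written out inline rather than calling the library.

```python
def normalize_bbox_sketch(bbox, rows, cols):
    """Divide x by the width (cols) and y by the height (rows)."""
    if rows <= 0:
        raise ValueError("Argument rows must be positive integer")
    if cols <= 0:
        raise ValueError("Argument cols must be positive integer")
    x_min, y_min, x_max, y_max = bbox[:4]
    # Anything after the four coordinates (labels, ids) is carried along as-is.
    return (x_min / cols, y_min / rows, x_max / cols, y_max / rows, *bbox[4:])


normalized = normalize_bbox_sketch((50, 50, 150, 100, "cat"), rows=100, cols=200)
print(normalized)  # (0.25, 0.5, 0.75, 1.0, 'cat')

# Multiplying back by width and height recovers the pixel coordinates.
x_min, y_min, x_max, y_max = normalized[:4]
restored = (x_min * 200, y_min * 100, x_max * 200, y_max * 100)
print(restored)  # (50.0, 50.0, 150.0, 100.0)

# Non-positive image dimensions are rejected before any division happens.
try:
    normalize_bbox_sketch((50, 50, 150, 100), rows=0, cols=200)
except ValueError as err:
    error_message = str(err)
```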

`def normalize_bboxes (bboxes, rows, cols)` [view source on GitHub]¶

Normalize a list of bounding boxes.

Parameters:

Name	Type	Description
`bboxes`	`Sequence[BoxType]`	Denormalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.
`rows`	`int`	Image height.
`cols`	`int`	Image width.

Returns:

Type	Description
`list[BoxType]`	Normalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.

Source code in albumentations/core/bbox_utils.py

Python

def normalize_bboxes(bboxes: Sequence[BoxType], rows: int, cols: int) -> list[BoxType]:
    """Normalize a list of bounding boxes.

    Args:
        bboxes: Denormalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.
        rows: Image height.
        cols: Image width.

    Returns:
        Normalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.

    """
    return [normalize_bbox(bbox, rows, cols) for bbox in bboxes]

`def union_of_bboxes (bboxes, erosion_rate)` [view source on GitHub]¶

Calculate the union of bounding boxes. Boxes can be in albumentations or Pascal VOC format.

Parameters:

Name	Type	Description
`bboxes`	`list[tuple]`	List of bounding boxes
`erosion_rate`	`float`	How much each bounding box can be shrunk, useful for erosive cropping. Set this in the range [0, 1]. 0 applies no erosion at all; 1.0 can shrink any bbox to zero area.

Returns:

Type	Description
`Optional[tuple]`	A bounding box `(x_min, y_min, x_max, y_max)` or None if no bboxes are given or if the bounding boxes become invalid after erosion.

Source code in albumentations/core/bbox_utils.py

Python

def union_of_bboxes(bboxes: Sequence[BoxType], erosion_rate: float) -> BoxInternalType | None:
    """Calculate union of bounding boxes. Boxes could be in albumentations or Pascal Voc format.

    Args:
        bboxes (list[tuple]): List of bounding boxes
        erosion_rate (float): How much each bounding box can be shrunk, useful for erosive cropping.
            Set this in range [0, 1]. 0 will not be erosive at all, 1.0 can make any bbox lose its volume.

    Returns:
        Optional[tuple]: A bounding box `(x_min, y_min, x_max, y_max)` or None if no bboxes are given or if
                         the bounding boxes become invalid after erosion.
    """
    if not bboxes:
        return None

    if len(bboxes) == 1:
        if erosion_rate == 1:
            return None
        if erosion_rate == 0:
            return bboxes[0][:4]

    bboxes_np = np.array([bbox[:4] for bbox in bboxes])
    x_min = bboxes_np[:, 0]
    y_min = bboxes_np[:, 1]
    x_max = bboxes_np[:, 2]
    y_max = bboxes_np[:, 3]

    bbox_width = x_max - x_min
    bbox_height = y_max - y_min

    # Adjust erosion rate to shrink bounding boxes accordingly
    lim_x1 = x_min + erosion_rate * 0.5 * bbox_width
    lim_y1 = y_min + erosion_rate * 0.5 * bbox_height
    lim_x2 = x_max - erosion_rate * 0.5 * bbox_width
    lim_y2 = y_max - erosion_rate * 0.5 * bbox_height

    x1 = np.min(lim_x1)
    y1 = np.min(lim_y1)
    x2 = np.max(lim_x2)
    y2 = np.max(lim_y2)

    if x1 == x2 or y1 == y2:
        return None

    return x1, y1, x2, y2
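
The effect of `erosion_rate` is easiest to see on numbers. The following sketch reproduces the arithmetic of the listing above with plain Python instead of NumPy. `union_of_bboxes_sketch` is a hypothetical name used only here.

```python
def union_of_bboxes_sketch(bboxes, erosion_rate):
    if not bboxes:
        return None
    if len(bboxes) == 1:
        if erosion_rate == 1:
            return None
        if erosion_rate == 0:
            return tuple(bboxes[0][:4])
    # Shrink every box toward its centre by erosion_rate / 2 of its size on each side,
    # then take the outermost limits across all boxes.
    x1 = min(b[0] + erosion_rate * 0.5 * (b[2] - b[0]) for b in bboxes)
    y1 = min(b[1] + erosion_rate * 0.5 * (b[3] - b[1]) for b in bboxes)
    x2 = max(b[2] - erosion_rate * 0.5 * (b[2] - b[0]) for b in bboxes)
    y2 = max(b[3] - erosion_rate * 0.5 * (b[3] - b[1]) for b in bboxes)
    # A degenerate (zero-width or zero-height) result counts as "no box".
    if x1 == x2 or y1 == y2:
        return None
    return x1, y1, x2, y2


boxes = [(0.0, 0.0, 0.5, 0.5), (0.25, 0.25, 1.0, 1.0)]
print(union_of_bboxes_sketch(boxes, erosion_rate=0.0))  # (0.0, 0.0, 1.0, 1.0)
print(union_of_bboxes_sketch(boxes, erosion_rate=0.5))  # (0.125, 0.125, 0.8125, 0.8125)
```

With `erosion_rate=0.5` each box loses a quarter of its width and height on every side before the union is taken, so the resulting region no longer covers the outer edges of either box.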

`composition` ¶

`class Compose` `(transforms, bbox_params=None, keypoint_params=None, additional_targets=None, p=1.0, is_check_shapes=True, strict=True, return_params=False, save_key='applied_params')` [view source on GitHub] ¶

Compose transforms and handle all transformations of bounding boxes and keypoints.

Parameters:

Name	Type	Description
`transforms`	`list`	List of transformations to compose.
`bbox_params`	`BboxParams`	Parameters for bounding box transforms. A `dict` of `BboxParams` arguments is also accepted.
`keypoint_params`	`KeypointParams`	Parameters for keypoint transforms. A `dict` of `KeypointParams` arguments is also accepted.
`additional_targets`	`dict`	Dictionary mapping new target names to existing target names, e.g. `{'image2': 'image'}`.
`p`	`float`	Probability of applying the whole list of transforms. Default: 1.0.
`is_check_shapes`	`bool`	If True, the shapes of the image, mask, and masks are checked for consistency on each call. Pass False to disable this check (only if you are sure your data is consistent). Default: True.
`strict`	`bool`	If True, unknown keys will raise an error. If False, unknown keys will be ignored. Default: True.
`return_params`	`bool`	If True, the parameters of each applied transform are returned under `save_key`. Default: False.
`save_key`	`str`	Key under which the applied parameters are saved. Default: 'applied_params'.

Source code in albumentations/core/composition.py

Python

class Compose(BaseCompose, HubMixin):
    """Compose transforms and handle all transformations regarding bounding boxes

    Args:
        transforms (list): list of transformations to compose.
        bbox_params (BboxParams): Parameters for bounding boxes transforms
        keypoint_params (KeypointParams): Parameters for keypoints transforms
        additional_targets (dict): Dict with keys - new target name, values - old target name. ex: {'image2': 'image'}
        p (float): probability of applying all list of transforms. Default: 1.0.
        is_check_shapes (bool): If True shapes consistency of images/mask/masks would be checked on each call. If you
            would like to disable this check - pass False (do it only if you are sure in your data consistency).
        strict (bool): If True, unknown keys will raise an error. If False, unknown keys will be ignored. Default: True.
        return_params (bool): if True returns params of each applied transform
        save_key (str): key to save applied params, default is 'applied_params'

    """

    def __init__(
        self,
        transforms: TransformsSeqType,
        bbox_params: dict[str, Any] | BboxParams | None = None,
        keypoint_params: dict[str, Any] | KeypointParams | None = None,
        additional_targets: dict[str, str] | None = None,
        p: float = 1.0,
        is_check_shapes: bool = True,
        strict: bool = True,
        return_params: bool = False,
        save_key: str = "applied_params",
    ):
        super().__init__(transforms, p)

        if bbox_params:
            if isinstance(bbox_params, dict):
                b_params = BboxParams(**bbox_params)
            elif isinstance(bbox_params, BboxParams):
                b_params = bbox_params
            else:
                msg = "unknown format of bbox_params, please use `dict` or `BboxParams`"
                raise ValueError(msg)
            self.processors["bboxes"] = BboxProcessor(b_params)

        if keypoint_params:
            if isinstance(keypoint_params, dict):
                k_params = KeypointParams(**keypoint_params)
            elif isinstance(keypoint_params, KeypointParams):
                k_params = keypoint_params
            else:
                msg = "unknown format of keypoint_params, please use `dict` or `KeypointParams`"
                raise ValueError(msg)
            self.processors["keypoints"] = KeypointsProcessor(k_params)

        for proc in self.processors.values():
            proc.ensure_transforms_valid(self.transforms)

        self.add_targets(additional_targets)
        if not self.transforms:  # if no transforms -> do nothing, all keys will be available
            self._available_keys.update(AVAILABLE_KEYS)

        self.is_check_args = True
        self.strict = strict

        self.is_check_shapes = is_check_shapes
        self.check_each_transform = tuple(  # processors that checks after each transform
            proc for proc in self.processors.values() if getattr(proc.params, "check_each_transform", False)
        )
        self._set_check_args_for_transforms(self.transforms)

        self.return_params = return_params
        if return_params:
            self.save_key = save_key
            self._available_keys.add(save_key)
            self._transforms_dict = get_transforms_dict(self.transforms)
            self.set_deterministic(True, save_key=save_key)

    def _set_check_args_for_transforms(self, transforms: TransformsSeqType) -> None:
        for transform in transforms:
            if isinstance(transform, BaseCompose):
                self._set_check_args_for_transforms(transform.transforms)
                transform.check_each_transform = self.check_each_transform
                transform.processors = self.processors
            if isinstance(transform, Compose):
                transform.disable_check_args_private()

    def disable_check_args_private(self) -> None:
        self.is_check_args = False
        self.strict = False
        self.main_compose = False

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
        if args:
            msg = "You have to pass data to augmentations as named arguments, for example: aug(image=image)"
            raise KeyError(msg)

        if not isinstance(force_apply, (bool, int)):
            msg = "force_apply must have bool or int type"
            raise TypeError(msg)

        if self.return_params and self.main_compose:
            data[self.save_key] = OrderedDict()

        need_to_run = force_apply or random.random() < self.p
        if not need_to_run:
            return data

        self.preprocess(data)

        for t in self.transforms:
            data = t(**data)
            data = self.check_data_post_transform(data)

        return self.postprocess(data)

    def run_with_params(self, *, params: dict[int, dict[str, Any]], **data: Any) -> dict[str, Any]:
        """Run transforms with given parameters. Available only for Compose with `return_params=True`."""
        if self._transforms_dict is None:
            raise RuntimeError("`run_with_params` is not available for Compose with `return_params=False`.")

        self.preprocess(data)

        for tr_id, param in params.items():
            tr = self._transforms_dict[tr_id]
            data = tr.apply_with_params(param, **data)
            data = self.check_data_post_transform(data)

        return self.postprocess(data)

    def preprocess(self, data: Any) -> None:
        if self.strict:
            for data_name in data:
                if data_name not in self._available_keys and data_name not in MASK_KEYS and data_name not in IMAGE_KEYS:
                    msg = f"Key {data_name} is not in available keys."
                    raise ValueError(msg)
        if self.is_check_args:
            self._check_args(**data)
        if self.main_compose:
            for p in self.processors.values():
                p.ensure_data_valid(data)
            for p in self.processors.values():
                p.preprocess(data)

    def postprocess(self, data: dict[str, Any]) -> dict[str, Any]:
        if self.main_compose:
            data = Compose._make_targets_contiguous(data)  # ensure output targets are contiguous
            for p in self.processors.values():
                p.postprocess(data)
        return data

    def to_dict_private(self) -> dict[str, Any]:
        dictionary = super().to_dict_private()
        bbox_processor = self.processors.get("bboxes")
        keypoints_processor = self.processors.get("keypoints")
        dictionary.update(
            {
                "bbox_params": bbox_processor.params.to_dict_private() if bbox_processor else None,
                "keypoint_params": (keypoints_processor.params.to_dict_private() if keypoints_processor else None),
                "additional_targets": self.additional_targets,
                "is_check_shapes": self.is_check_shapes,
            },
        )
        return dictionary

    def get_dict_with_id(self) -> dict[str, Any]:
        dictionary = super().get_dict_with_id()
        bbox_processor = self.processors.get("bboxes")
        keypoints_processor = self.processors.get("keypoints")
        dictionary.update(
            {
                "bbox_params": bbox_processor.params.to_dict_private() if bbox_processor else None,
                "keypoint_params": (keypoints_processor.params.to_dict_private() if keypoints_processor else None),
                "additional_targets": self.additional_targets,
                "params": None,
                "is_check_shapes": self.is_check_shapes,
            },
        )
        return dictionary

    def _check_args(self, **kwargs: Any) -> None:
        shapes = []

        for data_name, data in kwargs.items():
            internal_data_name = self._additional_targets.get(data_name, data_name)
            if internal_data_name in CHECKED_SINGLE:
                if not isinstance(data, np.ndarray):
                    raise TypeError(f"{data_name} must be numpy array type")
                shapes.append(data.shape[:2])
            if internal_data_name in CHECKED_MULTI and data is not None and len(data):
                if not isinstance(data[0], np.ndarray):
                    raise TypeError(f"{data_name} must be list of numpy arrays")
                shapes.append(data[0].shape[:2])
            if internal_data_name in CHECK_BBOX_PARAM and self.processors.get("bboxes") is None:
                msg = "bbox_params must be specified for bbox transformations"
                raise ValueError(msg)

            if internal_data_name in CHECK_KEYPOINTS_PARAM and self.processors.get("keypoints") is None:
                msg = "keypoints_params must be specified for keypoint transformations"
                raise ValueError(msg)

        if self.is_check_shapes and shapes and shapes.count(shapes[0]) != len(shapes):
            msg = (
                "Height and Width of image, mask or masks should be equal. You can disable shapes check "
                "by setting a parameter is_check_shapes=False of Compose class (do it only if you are sure "
                "about your data consistency)."
            )
            raise ValueError(msg)

    @staticmethod
    def _make_targets_contiguous(data: Any) -> dict[str, Any]:
        result = {}
        for key, value in data.items():
            if isinstance(value, np.ndarray):
                result[key] = np.ascontiguousarray(value)
            else:
                result[key] = value

        return result
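
The call protocol in the `__call__` method above (named arguments only, a probability gate, then each transform receiving and returning the full data dictionary) can be sketched with a toy class. `MiniCompose`, `flip_rows`, and `add_one` are hypothetical names invented for this illustration. The sketch leaves out processors, shape checks, and key validation, so it is not a substitute for `Compose`.

```python
import random


class MiniCompose:
    """Toy stand-in for the call protocol of Compose; not the library class."""

    def __init__(self, transforms, p=1.0):
        self.transforms = transforms
        self.p = p

    def __call__(self, *args, force_apply=False, **data):
        if args:
            # Mirrors the source: data must be passed as named arguments.
            raise KeyError("You have to pass data to augmentations as named arguments")
        if not (force_apply or random.random() < self.p):
            return data  # pipeline skipped, inputs returned untouched
        for t in self.transforms:
            data = t(**data)  # each transform receives and returns the full dict
        return data


def flip_rows(**data):
    # Hypothetical transform: reverse the row order of a nested-list "image".
    return {**data, "image": data["image"][::-1]}


def add_one(**data):
    # Hypothetical transform: add 1 to every pixel.
    return {**data, "image": [[v + 1 for v in row] for row in data["image"]]}


pipeline = MiniCompose([flip_rows, add_one], p=1.0)
out = pipeline(image=[[1, 2], [3, 4]], label="demo")
print(out)  # {'image': [[4, 5], [2, 3]], 'label': 'demo'}
```

Keys that no transform touches (`label` here) travel through the pipeline unchanged, which is the same reason non-image targets come back from a pixel-level transform unmodified.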

`run_with_params (self, *, params, **data)` ¶

Run transforms with given parameters. Available only for Compose with return_params=True.

Source code in albumentations/core/composition.py

Python

def run_with_params(self, *, params: dict[int, dict[str, Any]], **data: Any) -> dict[str, Any]:
    """Run transforms with given parameters. Available only for Compose with `return_params=True`."""
    if self._transforms_dict is None:
        raise RuntimeError("`run_with_params` is not available for Compose with `return_params=False`.")

    self.preprocess(data)

    for tr_id, param in params.items():
        tr = self._transforms_dict[tr_id]
        data = tr.apply_with_params(param, **data)
        data = self.check_data_post_transform(data)

    return self.postprocess(data)

`class OneOf` `(transforms, p=0.5)` [view source on GitHub] ¶

Select one of the transforms to apply. The selected transform will be called with `force_apply=True`. The transforms' probabilities are normalized so that they sum to 1, so in this case they act as weights.

Parameters:

Name	Type	Description
`transforms`	`list`	list of transformations to compose.
`p`	`float`	probability of applying selected transform. Default: 0.5.

Source code in albumentations/core/composition.py

Python

class OneOf(BaseCompose):
    """Select one of transforms to apply. Selected transform will be called with `force_apply=True`.
    Transforms probabilities will be normalized to sum to 1, so in this case transforms probabilities work as weights.

    Args:
        transforms (list): list of transformations to compose.
        p (float): probability of applying selected transform. Default: 0.5.

    """

    def __init__(self, transforms: TransformsSeqType, p: float = 0.5):
        super().__init__(transforms, p)
        transforms_ps = [t.p for t in self.transforms]
        s = sum(transforms_ps)
        self.transforms_ps = [t / s for t in transforms_ps]

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
        if self.replay_mode:
            for t in self.transforms:
                data = t(**data)
            return data

        if self.transforms_ps and (force_apply or random.random() < self.p):
            idx: int = random_utils.choice(len(self.transforms), p=self.transforms_ps)
            t = self.transforms[idx]
            data = t(force_apply=True, **data)
        return data
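
To illustrate how child probabilities become weights, here is a minimal sketch of the normalization done in `__init__` above. `normalize_weights` is a hypothetical helper, and `random.choices` from the standard library stands in for the `random_utils.choice` call used by the source.

```python
import random


def normalize_weights(ps):
    # Same arithmetic as in OneOf.__init__: divide each p by the sum of all p.
    total = sum(ps)
    return [p / total for p in ps]


# Three child transforms declared with p=1.0, p=0.5 and p=0.5.
weights = normalize_weights([1.0, 0.5, 0.5])
print(weights)  # [0.5, 0.25, 0.25]

# Pick one index according to those weights (stdlib stand-in for random_utils.choice).
rng = random.Random(0)
picked = rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

In this scheme the first child is chosen half of the time even though it was declared with `p=1.0`. Whether anything is applied at all is decided by the `p` of `OneOf` itself.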

`class OneOrOther` `(first=None, second=None, transforms=None, p=0.5)` [view source on GitHub] ¶

Apply the first transform with probability `p`, otherwise apply the second. The selected transform will be called with `force_apply=True`.

Source code in albumentations/core/composition.py

Python

class OneOrOther(BaseCompose):
    """Select one or another transform to apply. Selected transform will be called with `force_apply=True`."""

    def __init__(
        self,
        first: TransformType | None = None,
        second: TransformType | None = None,
        transforms: TransformsSeqType | None = None,
        p: float = 0.5,
    ):
        if transforms is None:
            if first is None or second is None:
                msg = "You must set both first and second or set transforms argument."
                raise ValueError(msg)
            transforms = [first, second]
        super().__init__(transforms, p)
        if len(self.transforms) != NUM_ONEOF_TRANSFORMS:
            warnings.warn("Length of transforms is not equal to 2.", stacklevel=2)

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
        if self.replay_mode:
            for t in self.transforms:
                data = t(**data)
            return data

        if random.random() < self.p:
            return self.transforms[0](force_apply=True, **data)

        return self.transforms[-1](force_apply=True, **data)
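
The selection rule in `__call__` above can be checked with a quick simulation. The function `one_or_other` below is a hypothetical stand-in that takes two plain callables rather than transforms. It is a sketch of the branching only.

```python
import random


def one_or_other(first, second, p, rng):
    # With probability p run the first callable, otherwise run the second.
    return first() if rng.random() < p else second()


rng = random.Random(42)
counts = {"first": 0, "second": 0}
for _ in range(10_000):
    chosen = one_or_other(lambda: "first", lambda: "second", p=0.25, rng=rng)
    counts[chosen] += 1

# Roughly a quarter of the calls should take the first branch.
share_first = counts["first"] / 10_000
```

Unlike `OneOf`, which may apply nothing when its own probability check fails, exactly one of the two branches runs on every call here.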

`class SelectiveChannelTransform` `(transforms, channels=(0, 1, 2), p=1.0)` [view source on GitHub] ¶

A transformation class to apply specified transforms to selected channels of an image.

This class extends BaseCompose to allow selective application of transformations to specified image channels. It extracts the selected channels, applies the transformations, and then reinserts the transformed channels back into their original positions in the image.

Parameters:

Name	Type	Description
`transforms`	`TransformsSeqType`	A sequence of transformations (from Albumentations) to be applied to the specified channels.
`channels`	`Sequence[int]`	A sequence of integers specifying the indices of the channels to which the transforms should be applied.
`p`	`float`	Probability that the transform will be applied; the default is 1.0 (always apply).

Methods:

`__call__(*args, **kwargs)`: Applies the transforms to the image according to the specified channels. The input data should include the 'image' key with the image array.

Returns:

Type	Description
`dict[str, Any]`	The transformed data dictionary, which includes the transformed 'image' key.

Source code in albumentations/core/composition.py

Python

class SelectiveChannelTransform(BaseCompose):
    """A transformation class to apply specified transforms to selected channels of an image.

    This class extends BaseCompose to allow selective application of transformations to
    specified image channels. It extracts the selected channels, applies the transformations,
    and then reinserts the transformed channels back into their original positions in the image.

    Parameters:
        transforms (TransformsSeqType):
            A sequence of transformations (from Albumentations) to be applied to the specified channels.
        channels (Sequence[int]):
            A sequence of integers specifying the indices of the channels to which the transforms should be applied.
        p (float):
            Probability that the transform will be applied; the default is 1.0 (always apply).

    Methods:
        __call__(*args, **kwargs):
            Applies the transforms to the image according to the specified channels.
            The input data should include 'image' key with the image array.

    Returns:
        dict[str, Any]: The transformed data dictionary, which includes the transformed 'image' key.
    """

    def __init__(
        self,
        transforms: TransformsSeqType,
        channels: Sequence[int] = (0, 1, 2),
        p: float = 1.0,
    ) -> None:
        super().__init__(transforms, p)
        self.channels = channels

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
        if force_apply or random.random() < self.p:
            image = data["image"]

            selected_channels = image[:, :, self.channels]
            sub_image = np.ascontiguousarray(selected_channels)

            for t in self.transforms:
                sub_image = t(image=sub_image)["image"]

            transformed_channels = cv2.split(sub_image)
            output_img = image.copy()

            for idx, channel in zip(self.channels, transformed_channels):
                output_img[:, :, idx] = channel

            data["image"] = np.ascontiguousarray(output_img)

        return data
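
The extract, transform, and reinsert steps from the listing above can be sketched without NumPy or OpenCV by treating an image as nested lists of shape height x width x channels. `apply_to_channels` and `invert` are hypothetical names, and the sketch ignores the probability check.

```python
def apply_to_channels(image, channels, fn):
    """Apply fn to the selected channels of an H x W x C nested-list image."""
    # 1. Extract the selected channels into a smaller sub-image.
    sub_image = [[[pixel[c] for c in channels] for pixel in row] for row in image]
    # 2. Transform only that sub-image.
    sub_image = fn(sub_image)
    # 3. Write the transformed channels back into a copy of the original.
    output = [[list(pixel) for pixel in row] for row in image]
    for r, row in enumerate(sub_image):
        for col, pixel in enumerate(row):
            for k, c in enumerate(channels):
                output[r][col][c] = pixel[k]
    return output


def invert(sub_image):
    # Hypothetical transform: invert 8-bit values.
    return [[[255 - v for v in pixel] for pixel in row] for row in sub_image]


image = [[[10, 20, 30, 40]]]  # one pixel with four channels
result = apply_to_channels(image, channels=(0, 2), fn=invert)
print(result)  # [[[245, 20, 225, 40]]]
```

Channels 1 and 3 are untouched, which is the use case for images with more than three channels where only some of them should be augmented.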

`class Sequential` `(transforms, p=0.5)` [view source on GitHub] ¶

Sequentially applies all transforms to targets.

Note

This transform is not intended to be a replacement for Compose. Instead, it should be used inside Compose the same way OneOf or OneOrOther are used. For instance, you can combine OneOf with Sequential to create an augmentation pipeline that contains multiple sequences of augmentations and applies one randomly chosen sequence to the input data (see the Examples section for a definition of such a pipeline).

Examples:

Python

>>> import albumentations as A
>>> transform = A.Compose([
...    A.OneOf([
...        A.Sequential([
...            A.HorizontalFlip(p=0.5),
...            A.ShiftScaleRotate(p=0.5),
...        ]),
...        A.Sequential([
...            A.VerticalFlip(p=0.5),
...            A.RandomBrightnessContrast(p=0.5),
...        ]),
...    ], p=1)
... ])

Source code in albumentations/core/composition.py

Python

class Sequential(BaseCompose):
    """Sequentially applies all transforms to targets.

    Note:
        This transform is not intended to be a replacement for `Compose`. Instead, it should be used inside `Compose`
        the same way `OneOf` or `OneOrOther` are used. For instance, you can combine `OneOf` with `Sequential` to
        create an augmentation pipeline that contains multiple sequences of augmentations and applies one randomly
        chosen sequence to input data (see the `Example` section for an example definition of such pipeline).

    Example:
        >>> import albumentations as A
        >>> transform = A.Compose([
        >>>    A.OneOf([
        >>>        A.Sequential([
        >>>            A.HorizontalFlip(p=0.5),
        >>>            A.ShiftScaleRotate(p=0.5),
        >>>        ]),
        >>>        A.Sequential([
        >>>            A.VerticalFlip(p=0.5),
        >>>            A.RandomBrightnessContrast(p=0.5),
        >>>        ]),
        >>>    ], p=1)
        >>> ])

    """

    def __init__(self, transforms: TransformsSeqType, p: float = 0.5):
        super().__init__(transforms, p)

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
        if self.replay_mode or force_apply or random.random() < self.p:
            for t in self.transforms:
                data = t(**data)
                data = self.check_data_post_transform(data)
        return data

`class SomeOf` `(transforms, n, replace=True, p=1)` [view source on GitHub] ¶

Select N transforms to apply. The selected transforms will be called with `force_apply=True`. The transforms' probabilities are normalized so that they sum to 1, so in this case they act as weights.

Parameters:

Name	Type	Description
`transforms`	`list`	List of transformations to choose from.
`n`	`int`	Number of transforms to apply.
`replace`	`bool`	Whether transforms are sampled with replacement. If True, the same transform may be selected more than once. Default: True.
`p`	`float`	probability of applying selected transform. Default: 1.
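
To make the weighting concrete, here is a small standalone sketch of the normalization and sampling steps. It is an illustration under assumptions, not the library's code: `normalize_weights` and `sample_indices` are hypothetical helpers, and the sampling uses the standard library's `random` module rather than the `random_utils.choice` call that `SomeOf` itself relies on.

```python
import random


def normalize_weights(probabilities):
    """Scale per-transform probabilities so that they sum to 1."""
    total = sum(probabilities)
    return [p / total for p in probabilities]


def sample_indices(weights, n, replace, rng):
    """Pick n indices according to weights, with or without replacement."""
    if replace:
        return rng.choices(range(len(weights)), weights=weights, k=n)
    # Without replacement: draw one index at a time and remove it from the pool.
    # This requires n <= len(weights).
    remaining = list(range(len(weights)))
    remaining_weights = list(weights)
    picked = []
    for _ in range(n):
        i = rng.choices(range(len(remaining)), weights=remaining_weights, k=1)[0]
        picked.append(remaining.pop(i))
        remaining_weights.pop(i)
    return picked


# Three transforms with p = 1.0, 0.5 and 0.5 become weights 0.5, 0.25 and 0.25.
weights = normalize_weights([1.0, 0.5, 0.5])
picked = sample_indices(weights, n=2, replace=False, rng=random.Random(0))
print(weights, picked)
```

With `replace=True` the same index may be drawn more than once, which means the same transform can be applied repeatedly in a single call.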

Source code in albumentations/core/composition.py

Python

class SomeOf(BaseCompose):
    """Select N transforms to apply. Selected transforms will be called with `force_apply=True`.
    Transform probabilities are normalized to sum to 1, so in this case they act as sampling weights.

    Args:
        transforms (list): list of transformations to compose.
        n (int): number of transforms to apply.
        replace (bool): Whether the sampled transforms are with or without replacement. Default: True.
        p (float): probability of applying selected transform. Default: 1.

    """

    def __init__(self, transforms: TransformsSeqType, n: int, replace: bool = True, p: float = 1):
        super().__init__(transforms, p)
        self.n = n
        self.replace = replace
        transforms_ps = [t.p for t in self.transforms]
        s = sum(transforms_ps)
        self.transforms_ps = [t / s for t in transforms_ps]

    def __call__(self, *arg: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
        if self.replay_mode:
            for t in self.transforms:
                data = t(**data)
                data = self.check_data_post_transform(data)
            return data

        if self.transforms_ps and (force_apply or random.random() < self.p):
            idx = random_utils.choice(len(self.transforms), size=self.n, replace=self.replace, p=self.transforms_ps)
            for i in idx:
                t = self.transforms[i]
                data = t(force_apply=True, **data)
                data = self.check_data_post_transform(data)
        return data

    def to_dict_private(self) -> dict[str, Any]:
        dictionary = super().to_dict_private()
        dictionary.update({"n": self.n, "replace": self.replace})
        return dictionary

`hub_mixin` ¶

This module provides mixin functionality for the Albumentations library. It includes utility functions and classes to enhance the core capabilities.

`class HubMixin` [view source on GitHub] ¶

Source code in albumentations/core/hub_mixin.py

Python

class HubMixin:
    _CONFIG_KEYS = ("train", "eval")
    _CONFIG_FILE_NAME_TEMPLATE = "albumentations_config_{}.json"

    def _save_pretrained(self, save_directory: str | Path, filename: str) -> Path:
        """Save the transform to a specified directory.

        Args:
            save_directory (Union[str, Path]):
                Directory where the transform will be saved.
            filename (str):
                Name of the file to save the transform.

        Returns:
            Path: Path to the saved transform file.
        """
        # create save directory and path
        save_directory = Path(save_directory)
        save_directory.mkdir(parents=True, exist_ok=True)
        save_path = save_directory / filename

        # save transforms
        save_transform(self, save_path, data_format="json")  # type: ignore[arg-type]

        return save_path

    @classmethod
    def _from_pretrained(cls, save_directory: str | Path, filename: str) -> object:
        """Load a transform from a specified directory.

        Args:
            save_directory (Union[str, Path]):
                Directory from where the transform will be loaded.
            filename (str):
                Name of the file to load the transform from.

        Returns:
            A.Compose: Loaded transform.
        """
        save_path = Path(save_directory) / filename
        return load_transform(save_path, data_format="json")

    def save_pretrained(
        self,
        save_directory: str | Path,
        *,
        key: str = "eval",
        allow_custom_keys: bool = False,
        repo_id: str | None = None,
        push_to_hub: bool = False,
        **push_to_hub_kwargs: Any,
    ) -> str | None:
        """Save the transform and optionally push it to the Huggingface Hub.

        Args:
            save_directory (`str` or `Path`):
                Path to directory in which the transform configuration will be saved.
            key (`str`, *optional*):
                Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
            allow_custom_keys (`bool`, *optional*):
                Allow custom keys for the configuration. Defaults to False.
            push_to_hub (`bool`, *optional*, defaults to `False`):
                Whether or not to push your transform to the Huggingface Hub after saving it.
            repo_id (`str`, *optional*):
                ID of your repository on the Hub. Used only if `push_to_hub=True`. Will default to the folder name if
                not provided.
            push_to_hub_kwargs:
                Additional keyword arguments passed along to the [`push_to_hub`] method.

        Returns:
            `str` or `None`: url of the commit on the Hub if `push_to_hub=True`, `None` otherwise.
        """
        if not allow_custom_keys and key not in self._CONFIG_KEYS:
            raise ValueError(
                f"Invalid key: `{key}`. Please use key from {self._CONFIG_KEYS} keys for upload. "
                "If you want to use a custom key, set `allow_custom_keys=True`.",
            )

        # save model transforms
        filename = self._CONFIG_FILE_NAME_TEMPLATE.format(key)
        self._save_pretrained(save_directory, filename)

        # push to the Hub if required
        if push_to_hub:
            kwargs = push_to_hub_kwargs.copy()  # soft-copy to avoid mutating input
            if repo_id is None:
                repo_id = Path(save_directory).name  # Defaults to `save_directory` name
            return self.push_to_hub(repo_id=repo_id, key=key, **kwargs)
        return None

    @classmethod
    def from_pretrained(
        cls: Any,
        directory_or_repo_id: str | Path,
        *,
        key: str = "eval",
        force_download: bool = False,
        proxies: dict[str, str] | None = None,
        token: str | bool | None = None,
        cache_dir: str | Path | None = None,
        local_files_only: bool = False,
        revision: str | None = None,
    ) -> object:
        """Load a transform from the Huggingface Hub or a local directory.

        Args:
            directory_or_repo_id (`str`, `Path`):
                - Either the `repo_id` (string) of a repo with hosted transform on the Hub, e.g. `qubvel-hf/albu`.
                - Or a path to a `directory` containing transform config saved using
                    [`~albumentations.Compose.save_pretrained`], e.g., `../path/to/my_directory/`.
            key (`str`, *optional*):
                Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
            revision (`str`, *optional*):
                Revision of the repo on the Hub. Can be a branch name, a git tag or any commit id.
                Defaults to the latest commit on `main` branch.
            force_download (`bool`, *optional*, defaults to `False`):
                Whether to force (re-)downloading the transform configuration files from the Hub, overriding
                the existing cache.
            proxies (`dict[str, str]`, *optional*):
                A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
                'http://hostname': 'foo.bar:4012'}`. The proxies are used on every request.
            token (`str` or `bool`, *optional*):
                The token to use as HTTP bearer authorization for remote files. By default, it will use the token
                cached when running `huggingface-cli login`.
            cache_dir (`str`, `Path`, *optional*):
                Path to the folder where cached files are stored.
            local_files_only (`bool`, *optional*, defaults to `False`):
                If `True`, avoid downloading the file and return the path to the local cached file if it exists.
        """
        filename = cls._CONFIG_FILE_NAME_TEMPLATE.format(key)
        directory_or_repo_id = Path(directory_or_repo_id)
        transform = None

        # check if the file is already present locally
        if directory_or_repo_id.is_dir():
            if filename in os.listdir(directory_or_repo_id):
                transform = cls._from_pretrained(save_directory=directory_or_repo_id, filename=filename)
            elif is_huggingface_hub_available:
                logging.info(
                    f"{filename} not found in {Path(directory_or_repo_id).resolve()}, trying to load from the Hub.",
                )
            else:
                raise FileNotFoundError(
                    f"{filename} not found in {Path(directory_or_repo_id).resolve()}."
                    " Please install `huggingface_hub` to load from the Hub.",
                )
        if transform is not None:
            return transform

        # download the file from the Hub
        try:
            config_file = hf_hub_download(
                repo_id=directory_or_repo_id,
                filename=filename,
                revision=revision,
                cache_dir=cache_dir,
                force_download=force_download,
                proxies=proxies,
                token=token,
                local_files_only=local_files_only,
            )
            directory, filename = Path(config_file).parent, Path(config_file).name
            return cls._from_pretrained(save_directory=directory, filename=filename)

        except HfHubHTTPError as e:
            raise HfHubHTTPError(f"{filename} not found on the HuggingFace Hub") from e

    @require_huggingface_hub
    def push_to_hub(
        self,
        repo_id: str,
        *,
        key: str = "eval",
        allow_custom_keys: bool = False,
        commit_message: str = "Push transform using huggingface_hub.",
        private: bool = False,
        token: str | None = None,
        branch: str | None = None,
        create_pr: bool | None = None,
    ) -> str:
        """Push the transform to the Huggingface Hub.

        Use `allow_patterns` and `ignore_patterns` to precisely filter which files should be pushed to the hub. Use
        `delete_patterns` to delete existing remote files in the same commit. See [`upload_folder`] reference for more
        details.

        Args:
            repo_id (`str`):
                ID of the repository to push to (example: `"username/my-model"`).
            key (`str`, *optional*):
                Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
            allow_custom_keys (`bool`, *optional*):
                Allow custom keys for the configuration. Defaults to False.
            commit_message (`str`, *optional*):
                Message to commit while pushing.
            private (`bool`, *optional*, defaults to `False`):
                Whether the repository created should be private.
            token (`str`, *optional*):
                The token to use as HTTP bearer authorization for remote files. By default, it will use the token
                cached when running `huggingface-cli login`.
            branch (`str`, *optional*):
                The git branch on which to push the transform. This defaults to `"main"`.
            create_pr (`boolean`, *optional*):
                Whether or not to create a Pull Request from `branch` with that commit. Defaults to `False`.

        Returns:
            The url of the commit of your transform in the given repository.
        """
        if not allow_custom_keys and key not in self._CONFIG_KEYS:
            raise ValueError(
                f"Invalid key: `{key}`. Please use key from {self._CONFIG_KEYS} keys for upload. "
                "If you still want to use a custom key, set `allow_custom_keys=True`.",
            )

        api = HfApi(token=token)
        repo_id = api.create_repo(repo_id=repo_id, private=private, exist_ok=True).repo_id

        # Push the files to the repo in a single commit
        with SoftTemporaryDirectory() as tmp:
            save_directory = Path(tmp) / repo_id
            filename = self._CONFIG_FILE_NAME_TEMPLATE.format(key)
            save_path = self._save_pretrained(save_directory, filename=filename)
            return api.upload_file(
                path_or_fileobj=save_path,
                path_in_repo=filename,
                repo_id=repo_id,
                commit_message=commit_message,
                revision=branch,
                create_pr=create_pr,
            )

`from_pretrained (directory_or_repo_id, *, key='eval', force_download=False, proxies=None, token=None, cache_dir=None, local_files_only=False, revision=None)` `classmethod` ¶

Load a transform from the Huggingface Hub or a local directory.

Parameters:

Name	Type	Description
`directory_or_repo_id`	`str`, `Path`	Either the `repo_id` (string) of a repo hosting a transform on the Hub, e.g. `qubvel-hf/albu`, or a path to a `directory` containing a transform config saved using [`~albumentations.Compose.save_pretrained`], e.g. `../path/to/my_directory/`.
`key`	`str`, optional	Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
`revision`	`str`, optional	Revision of the repo on the Hub. Can be a branch name, a git tag or any commit id. Defaults to the latest commit on `main` branch.
`force_download`	`bool`, optional, defaults to `False`	Whether to force (re-)downloading the transform configuration files from the Hub, overriding the existing cache.
`proxies`	`dict[str, str]`, optional	A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}`. The proxies are used on every request.
`token`	`str` or `bool`, optional	The token to use as HTTP bearer authorization for remote files. By default, it will use the token cached when running `huggingface-cli login`.
`cache_dir`	`str`, `Path`, optional	Path to the folder where cached files are stored.
`local_files_only`	`bool`, optional, defaults to `False`	If `True`, avoid downloading the file and return the path to the local cached file if it exists.
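
The local lookup that happens before any download can be sketched in isolation. This is a simplified illustration, not the method itself: `find_local_config` is a hypothetical helper, it checks for the file with `Path.is_file()` instead of `os.listdir`, and a return value of `None` stands for the point where the real method would either try the Hub or, if `huggingface_hub` is not installed and the directory exists, raise `FileNotFoundError`.

```python
import tempfile
from pathlib import Path

# Same template string as HubMixin._CONFIG_FILE_NAME_TEMPLATE in the source listing.
CONFIG_FILE_NAME_TEMPLATE = "albumentations_config_{}.json"


def find_local_config(directory_or_repo_id, key="eval"):
    """Return the config path if it exists in a local directory, else None."""
    candidate = Path(directory_or_repo_id)
    filename = CONFIG_FILE_NAME_TEMPLATE.format(key)
    if candidate.is_dir() and (candidate / filename).is_file():
        return candidate / filename
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Only a "train" config is present in this directory.
    (Path(tmp) / "albumentations_config_train.json").write_text("{}")
    found_train = find_local_config(tmp, key="train")
    found_eval = find_local_config(tmp, key="eval")
    found_name = found_train.name if found_train else None

print(found_name, found_eval)
```

Because the file name depends on `key`, a directory that holds only a `train` configuration will not satisfy a request made with the default `key="eval"`.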

Source code in albumentations/core/hub_mixin.py

Python

@classmethod
def from_pretrained(
    cls: Any,
    directory_or_repo_id: str | Path,
    *,
    key: str = "eval",
    force_download: bool = False,
    proxies: dict[str, str] | None = None,
    token: str | bool | None = None,
    cache_dir: str | Path | None = None,
    local_files_only: bool = False,
    revision: str | None = None,
) -> object:
    """Load a transform from the Huggingface Hub or a local directory.

    Args:
        directory_or_repo_id (`str`, `Path`):
            - Either the `repo_id` (string) of a repo with hosted transform on the Hub, e.g. `qubvel-hf/albu`.
            - Or a path to a `directory` containing transform config saved using
                [`~albumentations.Compose.save_pretrained`], e.g., `../path/to/my_directory/`.
        key (`str`, *optional*):
            Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
        revision (`str`, *optional*):
            Revision of the repo on the Hub. Can be a branch name, a git tag or any commit id.
            Defaults to the latest commit on `main` branch.
        force_download (`bool`, *optional*, defaults to `False`):
            Whether to force (re-)downloading the transform configuration files from the Hub, overriding
            the existing cache.
        proxies (`dict[str, str]`, *optional*):
            A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
            'http://hostname': 'foo.bar:4012'}`. The proxies are used on every request.
        token (`str` or `bool`, *optional*):
            The token to use as HTTP bearer authorization for remote files. By default, it will use the token
            cached when running `huggingface-cli login`.
        cache_dir (`str`, `Path`, *optional*):
            Path to the folder where cached files are stored.
        local_files_only (`bool`, *optional*, defaults to `False`):
            If `True`, avoid downloading the file and return the path to the local cached file if it exists.
    """
    filename = cls._CONFIG_FILE_NAME_TEMPLATE.format(key)
    directory_or_repo_id = Path(directory_or_repo_id)
    transform = None

    # check if the file is already present locally
    if directory_or_repo_id.is_dir():
        if filename in os.listdir(directory_or_repo_id):
            transform = cls._from_pretrained(save_directory=directory_or_repo_id, filename=filename)
        elif is_huggingface_hub_available:
            logging.info(
                f"{filename} not found in {Path(directory_or_repo_id).resolve()}, trying to load from the Hub.",
            )
        else:
            raise FileNotFoundError(
                f"{filename} not found in {Path(directory_or_repo_id).resolve()}."
                " Please install `huggingface_hub` to load from the Hub.",
            )
    if transform is not None:
        return transform

    # download the file from the Hub
    try:
        config_file = hf_hub_download(
            repo_id=directory_or_repo_id,
            filename=filename,
            revision=revision,
            cache_dir=cache_dir,
            force_download=force_download,
            proxies=proxies,
            token=token,
            local_files_only=local_files_only,
        )
        directory, filename = Path(config_file).parent, Path(config_file).name
        return cls._from_pretrained(save_directory=directory, filename=filename)

    except HfHubHTTPError as e:
        raise HfHubHTTPError(f"{filename} not found on the HuggingFace Hub") from e

`push_to_hub (self, repo_id, *, key='eval', allow_custom_keys=False, commit_message='Push transform using huggingface_hub.', private=False, token=None, branch=None, create_pr=None)` ¶

Push the transform to the Huggingface Hub.

This method uploads a single configuration file, `albumentations_config_{key}.json`, to the repository in one commit using `upload_file`. Its signature does not accept `allow_patterns`, `ignore_patterns`, or `delete_patterns`; those options belong to the `upload_folder` function of `huggingface_hub` and do not apply here.

Parameters:

Name	Type	Description
`repo_id`	`str`	ID of the repository to push to (example: `"username/my-model"`).
`key`	`str`, optional	Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
`allow_custom_keys`	`bool`, optional	Allow custom keys for the configuration. Defaults to False.
`commit_message`	`str`, optional	Message to commit while pushing.
`private`	`bool`, optional, defaults to `False`	Whether the repository created should be private.
`token`	`str`, optional	The token to use as HTTP bearer authorization for remote files. By default, it will use the token cached when running `huggingface-cli login`.
`branch`	`str`, optional	The git branch on which to push the transform. This defaults to `"main"`.
`create_pr`	`boolean`, optional	Whether or not to create a Pull Request from `branch` with that commit. Defaults to `False`.

Returns:

Type	Description
`str`	The url of the commit of your transform in the given repository.

Source code in albumentations/core/hub_mixin.py

Python

@require_huggingface_hub
def push_to_hub(
    self,
    repo_id: str,
    *,
    key: str = "eval",
    allow_custom_keys: bool = False,
    commit_message: str = "Push transform using huggingface_hub.",
    private: bool = False,
    token: str | None = None,
    branch: str | None = None,
    create_pr: bool | None = None,
) -> str:
    """Push the transform to the Huggingface Hub.

    Use `allow_patterns` and `ignore_patterns` to precisely filter which files should be pushed to the hub. Use
    `delete_patterns` to delete existing remote files in the same commit. See [`upload_folder`] reference for more
    details.

    Args:
        repo_id (`str`):
            ID of the repository to push to (example: `"username/my-model"`).
        key (`str`, *optional*):
            Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
        allow_custom_keys (`bool`, *optional*):
            Allow custom keys for the configuration. Defaults to False.
        commit_message (`str`, *optional*):
            Message to commit while pushing.
        private (`bool`, *optional*, defaults to `False`):
            Whether the repository created should be private.
        token (`str`, *optional*):
            The token to use as HTTP bearer authorization for remote files. By default, it will use the token
            cached when running `huggingface-cli login`.
        branch (`str`, *optional*):
            The git branch on which to push the transform. This defaults to `"main"`.
        create_pr (`boolean`, *optional*):
            Whether or not to create a Pull Request from `branch` with that commit. Defaults to `False`.

    Returns:
        The url of the commit of your transform in the given repository.
    """
    if not allow_custom_keys and key not in self._CONFIG_KEYS:
        raise ValueError(
            f"Invalid key: `{key}`. Please use key from {self._CONFIG_KEYS} keys for upload. "
            "If you still want to use a custom key, set `allow_custom_keys=True`.",
        )

    api = HfApi(token=token)
    repo_id = api.create_repo(repo_id=repo_id, private=private, exist_ok=True).repo_id

    # Push the files to the repo in a single commit
    with SoftTemporaryDirectory() as tmp:
        save_directory = Path(tmp) / repo_id
        filename = self._CONFIG_FILE_NAME_TEMPLATE.format(key)
        save_path = self._save_pretrained(save_directory, filename=filename)
        return api.upload_file(
            path_or_fileobj=save_path,
            path_in_repo=filename,
            repo_id=repo_id,
            commit_message=commit_message,
            revision=branch,
            create_pr=create_pr,
        )

`save_pretrained (self, save_directory, *, key='eval', allow_custom_keys=False, repo_id=None, push_to_hub=False, **push_to_hub_kwargs)` ¶

Save the transform and optionally push it to the Huggingface Hub.

Parameters:

Name	Type	Description
`save_directory`	`str` or `Path`	Path to directory in which the transform configuration will be saved.
`key`	`str`, optional	Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
`allow_custom_keys`	`bool`, optional	Allow custom keys for the configuration. Defaults to False.
`push_to_hub`	`bool`, optional, defaults to `False`	Whether or not to push your transform to the Huggingface Hub after saving it.
`repo_id`	`str`, optional	ID of your repository on the Hub. Used only if `push_to_hub=True`. Will default to the folder name if not provided.
`push_to_hub_kwargs`	`Any`	Additional keyword arguments passed along to the [`push_to_hub`] method.

Returns:

Type	Description
`str` or `None`	url of the commit on the Hub if `push_to_hub=True`, `None` otherwise.
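
How `key`, `allow_custom_keys`, and `repo_id` interact can be shown with a small sketch that touches neither the disk nor the network. `plan_save` is a hypothetical helper written for this illustration; the two constants are copied from the `HubMixin` source listing, and in the real method the `repo_id` fallback only matters when `push_to_hub=True`.

```python
from pathlib import Path

CONFIG_KEYS = ("train", "eval")                              # HubMixin._CONFIG_KEYS
CONFIG_FILE_NAME_TEMPLATE = "albumentations_config_{}.json"  # HubMixin._CONFIG_FILE_NAME_TEMPLATE


def plan_save(save_directory, key="eval", allow_custom_keys=False, repo_id=None):
    """Return (config_path, repo_id) as save_pretrained would derive them."""
    if not allow_custom_keys and key not in CONFIG_KEYS:
        raise ValueError(f"Invalid key: `{key}`")
    config_path = Path(save_directory) / CONFIG_FILE_NAME_TEMPLATE.format(key)
    # The repository ID falls back to the last component of the save directory.
    if repo_id is None:
        repo_id = Path(save_directory).name
    return config_path, repo_id


config_path, repo_id = plan_save("checkpoints/my-transform", key="train")
print(config_path.name, repo_id)

# A key outside ("train", "eval") is rejected unless custom keys are allowed.
try:
    plan_save("checkpoints/my-transform", key="test")
    rejected = False
except ValueError:
    rejected = True

custom_path, _ = plan_save("checkpoints/my-transform", key="test", allow_custom_keys=True)
```

Since each key maps to its own file name, a `train` and an `eval` configuration can be stored side by side in the same directory.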

Source code in albumentations/core/hub_mixin.py

Python

def save_pretrained(
    self,
    save_directory: str | Path,
    *,
    key: str = "eval",
    allow_custom_keys: bool = False,
    repo_id: str | None = None,
    push_to_hub: bool = False,
    **push_to_hub_kwargs: Any,
) -> str | None:
    """Save the transform and optionally push it to the Huggingface Hub.

    Args:
        save_directory (`str` or `Path`):
            Path to directory in which the transform configuration will be saved.
        key (`str`, *optional*):
            Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
        allow_custom_keys (`bool`, *optional*):
            Allow custom keys for the configuration. Defaults to False.
        push_to_hub (`bool`, *optional*, defaults to `False`):
            Whether or not to push your transform to the Huggingface Hub after saving it.
        repo_id (`str`, *optional*):
            ID of your repository on the Hub. Used only if `push_to_hub=True`. Will default to the folder name if
            not provided.
        push_to_hub_kwargs:
            Additional keyword arguments passed along to the [`push_to_hub`] method.

    Returns:
        `str` or `None`: url of the commit on the Hub if `push_to_hub=True`, `None` otherwise.
    """
    if not allow_custom_keys and key not in self._CONFIG_KEYS:
        raise ValueError(
            f"Invalid key: `{key}`. Please use key from {self._CONFIG_KEYS} keys for upload. "
            "If you want to use a custom key, set `allow_custom_keys=True`.",
        )

    # save model transforms
    filename = self._CONFIG_FILE_NAME_TEMPLATE.format(key)
    self._save_pretrained(save_directory, filename)

    # push to the Hub if required
    if push_to_hub:
        kwargs = push_to_hub_kwargs.copy()  # soft-copy to avoid mutating input
        if repo_id is None:
            repo_id = Path(save_directory).name  # Defaults to `save_directory` name
        return self.push_to_hub(repo_id=repo_id, key=key, **kwargs)
    return None

`keypoints_utils` ¶

`class KeypointParams` `(format, label_fields=None, remove_invisible=True, angle_in_degrees=True, check_each_transform=True)` [view source on GitHub] ¶

Parameters of keypoints

Parameters:

Name	Type	Description
`format`	`str`	Format of keypoints. Should be 'xy', 'yx', 'xya', 'xys', 'xyas', or 'xysa', where `x` is the X coordinate, `y` is the Y coordinate, `s` is the keypoint scale, and `a` is the keypoint orientation in radians or degrees (depending on `KeypointParams.angle_in_degrees`).
`label_fields`	`list`	List of fields that are joined with keypoints, e.g. labels. Should be the same type as keypoints.
`remove_invisible`	`bool`	Whether to remove invisible points after the transform.
`angle_in_degrees`	`bool`	Whether the angle in 'xya', 'xyas', and 'xysa' keypoints is given in degrees (`True`) or radians (`False`).
`check_each_transform`	`bool`	if `True`, then keypoints will be checked after each dual transform. Default: `True`
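
The format strings can be read letter by letter. The sketch below assumes that the order of letters in `format` matches the order of values in each keypoint tuple; `unpack_keypoint` is a hypothetical helper for illustration and is not part of the library.

```python
import math

VALID_FORMATS = ("xy", "yx", "xya", "xys", "xyas", "xysa")


def unpack_keypoint(keypoint, fmt, angle_in_degrees=True):
    """Name each value of a keypoint tuple by its format letter."""
    if fmt not in VALID_FORMATS:
        raise ValueError(f"Unknown keypoint format: {fmt}")
    fields = dict(zip(fmt, keypoint))
    # Express the orientation in radians regardless of the input unit.
    if "a" in fields and angle_in_degrees:
        fields["a"] = math.radians(fields["a"])
    return fields


kp = unpack_keypoint((10, 20, 180, 2), "xyas")  # x=10, y=20, angle=180 degrees, scale=2
swapped = unpack_keypoint((20, 10), "yx")       # the first value is y, the second is x
print(kp, swapped)
```

The `yx` case is the one most likely to cause silent errors, since the tuple has the same length as an `xy` keypoint but the coordinates are swapped.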

Source code in albumentations/core/keypoints_utils.py

Python

class KeypointParams(Params):
    """Parameters of keypoints

    Args:
        format (str): format of keypoints. Should be 'xy', 'yx', 'xya', 'xys', 'xyas', 'xysa'.

            x - X coordinate,

            y - Y coordinate

            s - Keypoint scale

            a - Keypoint orientation in radians or degrees (depending on KeypointParams.angle_in_degrees)
        label_fields (list): list of fields that are joined with keypoints, e.g labels.
            Should be same type as keypoints.
        remove_invisible (bool): to remove invisible points after transform or not
        angle_in_degrees (bool): angle in degrees or radians in 'xya', 'xyas', 'xysa' keypoints
        check_each_transform (bool): if `True`, then keypoints will be checked after each dual transform.
            Default: `True`

    """

    def __init__(
        self,
        format: str,  # noqa: A002
        label_fields: Sequence[str] | None = None,
        remove_invisible: bool = True,
        angle_in_degrees: bool = True,
        check_each_transform: bool = True,
    ):
        super().__init__(format, label_fields)
        self.remove_invisible = remove_invisible
        self.angle_in_degrees = angle_in_degrees
        self.check_each_transform = check_each_transform

    def to_dict_private(self) -> dict[str, Any]:
        data = super().to_dict_private()
        data.update(
            {
                "remove_invisible": self.remove_invisible,
                "angle_in_degrees": self.angle_in_degrees,
                "check_each_transform": self.check_each_transform,
            },
        )
        return data

    @classmethod
    def is_serializable(cls) -> bool:
        return True

    @classmethod
    def get_class_fullname(cls) -> str:
        return "KeypointParams"

`class KeypointsProcessor` `(params, additional_targets=None)` [view source on GitHub] ¶

Source code in albumentations/core/keypoints_utils.py

Python

class KeypointsProcessor(DataProcessor):
    def __init__(self, params: KeypointParams, additional_targets: dict[str, str] | None = None):
        super().__init__(params, additional_targets)

    @property
    def default_data_name(self) -> str:
        return "keypoints"

    def ensure_data_valid(self, data: dict[str, Any]) -> None:
        if self.params.label_fields and not all(i in data for i in self.params.label_fields):
            msg = "Your 'label_fields' are not valid - they must have the same names as params in 'keypoint_params' dict"
            raise ValueError(msg)

    def filter(self, data: Sequence[KeypointType], rows: int, cols: int) -> Sequence[KeypointType]:
        """The function filters a sequence of data based on the number of rows and columns, and returns a
        sequence of keypoints.

        :param data: The `data` parameter is a sequence of sequences. Each inner sequence represents a
        set of keypoints
        :type data: Sequence[Sequence]
        :param rows: The `rows` parameter represents the number of rows in the data matrix. It specifies
        the number of rows that will be used for filtering the keypoints
        :type rows: int
        :param cols: The parameter "cols" represents the number of columns in the grid that the
        keypoints will be filtered on
        :type cols: int
        :return: a sequence of KeypointType objects.
        """
        self.params: KeypointParams
        return filter_keypoints(data, rows, cols, remove_invisible=self.params.remove_invisible)

    def check(self, data: Sequence[KeypointType], rows: int, cols: int) -> None:
        check_keypoints(data, rows, cols)

    def convert_from_albumentations(self, data: Sequence[KeypointType], rows: int, cols: int) -> list[KeypointType]:
        params = self.params
        return convert_keypoints_from_albumentations(
            data,
            params.format,
            rows,
            cols,
            check_validity=params.remove_invisible,
            angle_in_degrees=params.angle_in_degrees,
        )

    def convert_to_albumentations(self, data: Sequence[KeypointType], rows: int, cols: int) -> list[KeypointType]:
        params = self.params
        return convert_keypoints_to_albumentations(
            data,
            params.format,
            rows,
            cols,
            check_validity=params.remove_invisible,
            angle_in_degrees=params.angle_in_degrees,
        )

`filter (self, data, rows, cols)` ¶

Filter a sequence of keypoints against the image dimensions and return the resulting sequence of keypoints.

Parameters:

Name	Type	Description
`data`	`Sequence[Sequence]`	A sequence of keypoints; each inner sequence represents one keypoint.
`rows`	`int`	Number of rows in the image (its height), used as the bound when filtering the keypoints.
`cols`	`int`	Number of columns in the image (its width), used as the bound when filtering the keypoints.

Returns:

Type	Description
`Sequence[KeypointType]`	A sequence of `KeypointType` objects.

Source code in albumentations/core/keypoints_utils.py

Python

def filter(self, data: Sequence[KeypointType], rows: int, cols: int) -> Sequence[KeypointType]:
    """Filter keypoints against the image dimensions.

    :param data: The keypoints to filter.
    :type data: Sequence[KeypointType]
    :param rows: Image height in pixels.
    :type rows: int
    :param cols: Image width in pixels.
    :type cols: int
    :return: The filtered keypoints.
    """
    self.params: KeypointParams
    return filter_keypoints(data, rows, cols, remove_invisible=self.params.remove_invisible)
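
The body of `filter_keypoints` is not shown in this section, so the following is only a rough sketch of what such a filter might do, assuming it drops keypoints whose `x` or `y` falls outside the image when `remove_invisible` is true. `filter_visible` is a hypothetical stand-in written for illustration, not the library function.

```python
def filter_visible(keypoints, rows, cols, remove_invisible=True):
    """Hypothetical stand-in for filter_keypoints (assumed behavior, not the library code)."""
    if not remove_invisible:
        return list(keypoints)
    # Keep only keypoints with 0 <= x < cols (width) and 0 <= y < rows (height).
    return [kp for kp in keypoints if 0 <= kp[0] < cols and 0 <= kp[1] < rows]


# Each keypoint is (x, y, angle, scale); the last two values are ignored by the filter.
kps = [(10, 20, 0.0, 1.0), (-5, 20, 0.0, 1.0), (10, 120, 0.0, 1.0)]
kept = filter_visible(kps, rows=100, cols=50)
print(kept)
```

Under these assumptions only the first keypoint survives: the second has a negative `x`, and the third has a `y` beyond the image height.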

`def check_keypoint (kp, rows, cols)` [view source on GitHub]¶

Check that the keypoint coordinates lie within the image bounds and that the angle is in the range [0, 2π). Raises `ValueError` otherwise.

Source code in albumentations/core/keypoints_utils.py

Python

def check_keypoint(kp: KeypointType, rows: int, cols: int) -> None:
    """Check if keypoint coordinates are less than image shapes"""
    for name, value, size in zip(["x", "y"], kp[:2], [cols, rows]):
        if not 0 <= value < size:
            raise ValueError(f"Expected {name} for keypoint {kp} to be in the range [0.0, {size}], got {value}.")

    angle = kp[2]
    if not (0 <= angle < 2 * math.pi):
        raise ValueError(f"Keypoint angle must be in range [0, 2 * PI). Got: {angle}")
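
The bounds check is half-open: `x` must satisfy `0 <= x < cols` and `y` must satisfy `0 <= y < rows`, so a coordinate equal to the image width or height is rejected. A minimal standalone sketch of the same rule follows; `keypoint_in_bounds` is a hypothetical helper that returns a boolean instead of raising, and is not part of Albumentations.

```python
import math


def keypoint_in_bounds(kp, rows, cols):
    """Hypothetical helper mirroring the conditions checked by check_keypoint."""
    x, y, angle = kp[0], kp[1], kp[2]
    # x is compared against the width (cols), y against the height (rows).
    if not (0 <= x < cols and 0 <= y < rows):
        return False
    # The angle is expected in radians, in [0, 2*pi).
    return 0 <= angle < 2 * math.pi


inside = keypoint_in_bounds((10, 20, 0.0), rows=100, cols=50)
on_edge = keypoint_in_bounds((50, 20, 0.0), rows=100, cols=50)
print(inside, on_edge)  # -> True False
```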

`def check_keypoints (keypoints, rows, cols)` [view source on GitHub]¶

Check that every keypoint in the sequence lies within the image bounds by calling `check_keypoint` on each one.

Source code in albumentations/core/keypoints_utils.py

Python

def check_keypoints(keypoints: Sequence[KeypointType], rows: int, cols: int) -> None:
    """Check if keypoints boundaries are less than image shapes"""
    for kp in keypoints:
        check_keypoint(kp, rows, cols)

`serialization` ¶

`class Serializable` [view source on GitHub] ¶

Source code in albumentations/core/serialization.py

Python

class Serializable(metaclass=SerializableMeta):
    @classmethod
    @abstractmethod
    def is_serializable(cls) -> bool:
        raise NotImplementedError

    @classmethod
    @abstractmethod
    def get_class_fullname(cls) -> str:
        raise NotImplementedError

    @abstractmethod
    def to_dict_private(self) -> dict[str, Any]:
        raise NotImplementedError

    def to_dict(self, on_not_implemented_error: str = "raise") -> dict[str, Any]:
        """Take a transform pipeline and convert it to a serializable representation that uses only standard
        python data types: dictionaries, lists, strings, integers, and floats.

        Args:
            self: A transform that should be serialized. If the transform doesn't implement the `to_dict`
                method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised.
                If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored
                but no transform parameters will be serialized.
            on_not_implemented_error (str): `raise` or `warn`.

        """
        if on_not_implemented_error not in {"raise", "warn"}:
            msg = (
                f"Unknown on_not_implemented_error value: {on_not_implemented_error}. "
                "Supported values are: 'raise' and 'warn'"
            )
            raise ValueError(msg)
        try:
            transform_dict = self.to_dict_private()
        except NotImplementedError:
            if on_not_implemented_error == "raise":
                raise

            transform_dict = {}
            warnings.warn(
                f"Got NotImplementedError while trying to serialize {self}. Object arguments are not preserved. "
                f"Implement either '{self.__class__.__name__}.get_transform_init_args_names' "
                f"or '{self.__class__.__name__}.get_transform_init_args' "
                "method to make the transform serializable",
                stacklevel=2,
            )
        return {"__version__": __version__, "transform": transform_dict}

`to_dict (self, on_not_implemented_error='raise')` ¶

Take a transform pipeline and convert it to a serializable representation that uses only standard python data types: dictionaries, lists, strings, integers, and floats.

Parameters:

Name	Type	Description
`self`		A transform that should be serialized. If the transform doesn't implement the `to_dict` method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised. If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored but no transform parameters will be serialized.
`on_not_implemented_error`	`str`	`raise` or `warn`.

Source code in albumentations/core/serialization.py

Python

def to_dict(self, on_not_implemented_error: str = "raise") -> dict[str, Any]:
    """Take a transform pipeline and convert it to a serializable representation that uses only standard
    python data types: dictionaries, lists, strings, integers, and floats.

    Args:
        self: A transform that should be serialized. If the transform doesn't implement the `to_dict`
            method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised.
            If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored
            but no transform parameters will be serialized.
        on_not_implemented_error (str): `raise` or `warn`.

    """
    if on_not_implemented_error not in {"raise", "warn"}:
        msg = (
            f"Unknown on_not_implemented_error value: {on_not_implemented_error}. "
            "Supported values are: 'raise' and 'warn'"
        )
        raise ValueError(msg)
    try:
        transform_dict = self.to_dict_private()
    except NotImplementedError:
        if on_not_implemented_error == "raise":
            raise

        transform_dict = {}
        warnings.warn(
            f"Got NotImplementedError while trying to serialize {self}. Object arguments are not preserved. "
            f"Implement either '{self.__class__.__name__}.get_transform_init_args_names' "
            f"or '{self.__class__.__name__}.get_transform_init_args' "
            "method to make the transform serializable",
            stacklevel=2,
        )
    return {"__version__": __version__, "transform": transform_dict}
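
To make the `'warn'` behavior concrete, here is a small sketch of the same control flow using only the standard library. `ToyTransform` and `VERSION` are hypothetical names for illustration; the class deliberately leaves `to_dict_private` unimplemented so that the fallback path is exercised.

```python
import warnings

VERSION = "0.0.0"  # placeholder standing in for the library's __version__


class ToyTransform:
    """Hypothetical class that mimics the control flow of Serializable.to_dict."""

    def to_dict_private(self):
        raise NotImplementedError

    def to_dict(self, on_not_implemented_error="raise"):
        if on_not_implemented_error not in {"raise", "warn"}:
            raise ValueError(f"Unknown on_not_implemented_error value: {on_not_implemented_error}")
        try:
            transform_dict = self.to_dict_private()
        except NotImplementedError:
            if on_not_implemented_error == "raise":
                raise
            # In 'warn' mode the arguments are dropped and only the envelope is returned.
            transform_dict = {}
            warnings.warn("Object arguments are not preserved.", stacklevel=2)
        return {"__version__": VERSION, "transform": transform_dict}


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warning for this demonstration
    result = ToyTransform().to_dict(on_not_implemented_error="warn")
print(result)  # -> {'__version__': '0.0.0', 'transform': {}}
```

With the default `'raise'`, the same call would propagate `NotImplementedError` instead of returning an empty `transform` entry.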

`class SerializableMeta` [view source on GitHub] ¶

A metaclass that is used to register classes in SERIALIZABLE_REGISTRY or NON_SERIALIZABLE_REGISTRY so they can be found later while deserializing transformation pipeline using classes full names.

Source code in albumentations/core/serialization.py

Python

class SerializableMeta(ABCMeta):
    """A metaclass that is used to register classes in `SERIALIZABLE_REGISTRY` or `NON_SERIALIZABLE_REGISTRY`
    so they can be found later while deserializing transformation pipeline using classes full names.
    """

    def __new__(cls, name: str, bases: tuple[type, ...], *args: Any, **kwargs: Any) -> SerializableMeta:
        cls_obj = super().__new__(cls, name, bases, *args, **kwargs)
        if name != "Serializable" and ABC not in bases:
            if cls_obj.is_serializable():
                SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
            else:
                NON_SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
        return cls_obj

    @classmethod
    def is_serializable(cls) -> bool:
        return False

    @classmethod
    def get_class_fullname(cls) -> str:
        return get_shortest_class_fullname(cls)

    @classmethod
    def _to_dict(cls) -> dict[str, Any]:
        return {}

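`__new__ (cls, name, bases, *args, **kwargs)` `special` `staticmethod` ¶

Create the class object and register it in `SERIALIZABLE_REGISTRY` or `NON_SERIALIZABLE_REGISTRY`, depending on the result of `is_serializable()`. The class named `Serializable` and classes that list `ABC` directly among their bases are not registered.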

Source code in albumentations/core/serialization.py

Python

def __new__(cls, name: str, bases: tuple[type, ...], *args: Any, **kwargs: Any) -> SerializableMeta:
    cls_obj = super().__new__(cls, name, bases, *args, **kwargs)
    if name != "Serializable" and ABC not in bases:
        if cls_obj.is_serializable():
            SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
        else:
            NON_SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
    return cls_obj
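
The registration pattern can be tried in isolation. The sketch below is a simplified, hypothetical version: `REGISTRY`, `NON_REGISTRY`, `RegisteringMeta`, and the toy classes are illustration-only names, and classes are keyed by their bare name rather than by a shortened full name.

```python
from abc import ABCMeta

REGISTRY = {}      # stand-in for SERIALIZABLE_REGISTRY
NON_REGISTRY = {}  # stand-in for NON_SERIALIZABLE_REGISTRY


class RegisteringMeta(ABCMeta):
    """Hypothetical metaclass that files each new class into one of two registries."""

    def __new__(mcs, name, bases, namespace, **kwargs):
        cls_obj = super().__new__(mcs, name, bases, namespace, **kwargs)
        if name != "Base":  # the root class itself is not registered
            target = REGISTRY if cls_obj.is_serializable() else NON_REGISTRY
            target[name] = cls_obj
        return cls_obj


class Base(metaclass=RegisteringMeta):
    @classmethod
    def is_serializable(cls):
        return True


class Blur(Base):
    pass


class Custom(Base):
    @classmethod
    def is_serializable(cls):
        return False


print(sorted(REGISTRY), sorted(NON_REGISTRY))  # -> ['Blur'] ['Custom']
```

Registration happens as a side effect of the `class` statement, which is why a class only becomes available for deserialization once its defining module has been imported.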

`def from_dict (transform_dict, nonserializable=None)` [view source on GitHub]¶

Reconstruct a transform pipeline from its serialized dictionary representation.

Parameters:

Name	Type	Description
`transform_dict`	`dict[str, Any]`	A dictionary with a serialized transform pipeline.
`nonserializable`	`Optional[dict[str, Any]]`	A dictionary that contains non-serializable transforms. It is required when restoring a pipeline that contains non-serializable transforms. Keys in the dictionary should match the `name` arguments of the respective transforms in the serialized pipeline. Defaults to None.

Source code in albumentations/core/serialization.py

Python

def from_dict(
    transform_dict: dict[str, Any],
    nonserializable: dict[str, Any] | None = None,
) -> Serializable | None:
    """Args:
    transform_dict: A dictionary with serialized transform pipeline.
    nonserializable (dict): A dictionary that contains non-serializable transforms.
        This dictionary is required when you are restoring a pipeline that contains non-serializable transforms.
        Keys in that dictionary should be named same as `name` arguments in respective transforms from
        a serialized pipeline.

    """
    register_additional_transforms()
    transform = transform_dict["transform"]
    lmbd = instantiate_nonserializable(transform, nonserializable)
    if lmbd:
        return lmbd
    name = transform["__class_fullname__"]
    args = {k: v for k, v in transform.items() if k != "__class_fullname__"}
    cls = SERIALIZABLE_REGISTRY[shorten_class_name(name)]
    if "transforms" in args:
        args["transforms"] = [from_dict({"transform": t}, nonserializable=nonserializable) for t in args["transforms"]]
    return cls(**args)
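
The recursion over nested `transforms` is the core of deserialization. The following sketch reproduces that step with toy classes and a plain dictionary registry; `Node`, `Compose`, `Blur`, `REGISTRY`, and `rebuild` are hypothetical names, and the handling of non-serializable transforms is left out.

```python
class Node:
    """Hypothetical transform-like class that stores its constructor arguments."""

    def __init__(self, p=0.5, transforms=None):
        self.p = p
        self.transforms = transforms or []


class Compose(Node):
    pass


class Blur(Node):
    pass


REGISTRY = {"Compose": Compose, "Blur": Blur}  # stand-in for SERIALIZABLE_REGISTRY


def rebuild(transform_dict):
    transform = transform_dict["transform"]
    name = transform["__class_fullname__"]
    args = {k: v for k, v in transform.items() if k != "__class_fullname__"}
    if "transforms" in args:
        # Children are wrapped in the same {"transform": ...} envelope and rebuilt first.
        args["transforms"] = [rebuild({"transform": t}) for t in args["transforms"]]
    return REGISTRY[name](**args)


serialized = {
    "__version__": "0.0.0",
    "transform": {
        "__class_fullname__": "Compose",
        "p": 1.0,
        "transforms": [{"__class_fullname__": "Blur", "p": 0.3}],
    },
}
pipeline = rebuild(serialized)
print(type(pipeline).__name__, [type(t).__name__ for t in pipeline.transforms])
```

Every key other than `__class_fullname__` is passed to the constructor as a keyword argument, so the serialized keys have to match the constructor's parameter names.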

`def get_shortest_class_fullname (cls)` [view source on GitHub]¶

Return the shortened full name of a class. The full name is built as `{cls.__module__}.{cls.__name__}` and then passed to `shorten_class_name`.

Parameters:

Name	Type	Description
`cls`	`type[Any]`	The class whose name should be shortened.

Returns:

Type	Description
`str`	The shortened version of the full class name.

Source code in albumentations/core/serialization.py

Python

def get_shortest_class_fullname(cls: type[Any]) -> str:
    """Return the shortened full name of a class.

    :param cls: The class whose name should be shortened.
    :type cls: type[Any]
    :return: The shortened version of the full class name.
    """
    class_fullname = f"{cls.__module__}.{cls.__name__}"
    return shorten_class_name(class_fullname)

`def load (filepath_or_buffer, data_format='json', nonserializable=None)` [view source on GitHub]¶

Load a serialized pipeline from a file or file-like object and construct a transform pipeline.

Parameters:

Name	Type	Description
`filepath_or_buffer`	`Union[str, Path, TextIO]`	The file path or file-like object to read the serialized data from. If a string is provided, it is interpreted as a path to a file. If a file-like object is provided, the serialized data will be read from it directly.
`data_format`	`str`	The format of the serialized data. Valid options are 'json' and 'yaml'. Defaults to 'json'.
`nonserializable`	`Optional[dict[str, Any]]`	A dictionary that contains non-serializable transforms. This dictionary is required when restoring a pipeline that contains non-serializable transforms. Keys in the dictionary should be named the same as the `name` arguments in respective transforms from the serialized pipeline. Defaults to None.

Returns:

Type	Description
`object`	The deserialized transform pipeline.

Exceptions:

Type	Description
`ValueError`	If `data_format` is 'yaml' but PyYAML is not installed.

Source code in albumentations/core/serialization.py

Python

def load(
    filepath_or_buffer: str | Path | TextIO,
    data_format: str = "json",
    nonserializable: dict[str, Any] | None = None,
) -> object:
    """Load a serialized pipeline from a file or file-like object and construct a transform pipeline.

    Args:
        filepath_or_buffer (Union[str, Path, TextIO]): The file path or file-like object to read the serialized
            data from.
            If a string is provided, it is interpreted as a path to a file. If a file-like object is provided,
            the serialized data will be read from it directly.
        data_format (str): The format of the serialized data. Valid options are 'json' and 'yaml'.
            Defaults to 'json'.
        nonserializable (Optional[dict[str, Any]]): A dictionary that contains non-serializable transforms.
            This dictionary is required when restoring a pipeline that contains non-serializable transforms.
            Keys in the dictionary should be named the same as the `name` arguments in respective transforms
            from the serialized pipeline. Defaults to None.

    Returns:
        object: The deserialized transform pipeline.

    Raises:
        ValueError: If `data_format` is 'yaml' but PyYAML is not installed.

    """
    check_data_format(data_format)

    if isinstance(filepath_or_buffer, (str, Path)):  # Assume it's a filepath
        with open(filepath_or_buffer) as f:
            if data_format == "json":
                transform_dict = json.load(f)
            else:
                if not yaml_available:
                    msg = "You need to install PyYAML to load a pipeline in yaml format"
                    raise ValueError(msg)
                transform_dict = yaml.safe_load(f)
    elif data_format == "json":
        transform_dict = json.load(filepath_or_buffer)
    else:
        if not yaml_available:
            msg = "You need to install PyYAML to load a pipeline in yaml format"
            raise ValueError(msg)
        transform_dict = yaml.safe_load(filepath_or_buffer)

    return from_dict(transform_dict, nonserializable=nonserializable)

`def register_additional_transforms ()` [view source on GitHub]¶

Register transforms that are not imported directly into the albumentations module by checking the availability of optional dependencies.

Source code in albumentations/core/serialization.py

Python

def register_additional_transforms() -> None:
    """Register transforms that are not imported directly into the `albumentations` module by checking
    the availability of optional dependencies.
    """
    if importlib.util.find_spec("torch") is not None:
        try:
            # Import `albumentations.pytorch` only if `torch` is installed.
            import albumentations.pytorch

            # Use a dummy operation to acknowledge the use of the imported module and avoid linting errors.
            _ = albumentations.pytorch.ToTensorV2
        except ImportError:
            pass

`def save (transform, filepath_or_buffer, data_format='json', on_not_implemented_error='raise')` [view source on GitHub]¶

Serialize a transform pipeline and save it to either a file specified by a path or a file-like object in either JSON or YAML format.

Parameters:

Name	Type	Description
`transform`	`Serializable`	The transform pipeline to serialize.
`filepath_or_buffer`	`Union[str, Path, TextIO]`	The file path or file-like object to write the serialized data to. If a string is provided, it is interpreted as a path to a file. If a file-like object is provided, the serialized data will be written to it directly.
`data_format`	`str`	The format to serialize the data in. Valid options are 'json' and 'yaml'. Defaults to 'json'.
`on_not_implemented_error`	`str`	Determines the behavior if a transform does not implement the `to_dict` method. If set to 'raise', a `NotImplementedError` is raised. If set to 'warn', the exception is ignored, and no transform arguments are saved. Defaults to 'raise'.

Exceptions:

Type	Description
`ValueError`	If `data_format` is 'yaml' but PyYAML is not installed.

Source code in albumentations/core/serialization.py

Python

def save(
    transform: Serializable,
    filepath_or_buffer: str | Path | TextIO,
    data_format: str = "json",
    on_not_implemented_error: str = "raise",
) -> None:
    """Serialize a transform pipeline and save it to either a file specified by a path or a file-like object
    in either JSON or YAML format.

    Args:
        transform (Serializable): The transform pipeline to serialize.
        filepath_or_buffer (Union[str, Path, TextIO]): The file path or file-like object to write the serialized
            data to.
            If a string is provided, it is interpreted as a path to a file. If a file-like object is provided,
            the serialized data will be written to it directly.
        data_format (str): The format to serialize the data in. Valid options are 'json' and 'yaml'.
            Defaults to 'json'.
        on_not_implemented_error (str): Determines the behavior if a transform does not implement the `to_dict` method.
            If set to 'raise', a `NotImplementedError` is raised. If set to 'warn', the exception is ignored, and
            no transform arguments are saved. Defaults to 'raise'.

    Raises:
        ValueError: If `data_format` is 'yaml' but PyYAML is not installed.

    """
    check_data_format(data_format)
    transform_dict = transform.to_dict(on_not_implemented_error=on_not_implemented_error)
    transform_dict = serialize_enum(transform_dict)

    # Determine whether to write to a file or a file-like object
    if isinstance(filepath_or_buffer, (str, Path)):  # It's a filepath
        with open(filepath_or_buffer, "w") as f:
            if data_format == "yaml":
                if not yaml_available:
                    msg = "You need to install PyYAML to save a pipeline in YAML format"
                    raise ValueError(msg)
                yaml.safe_dump(transform_dict, f, default_flow_style=False)
            elif data_format == "json":
                json.dump(transform_dict, f)
    elif data_format == "yaml":
        if not yaml_available:
            msg = "You need to install PyYAML to save a pipeline in YAML format"
            raise ValueError(msg)
        yaml.safe_dump(transform_dict, filepath_or_buffer, default_flow_style=False)
    elif data_format == "json":
        json.dump(transform_dict, filepath_or_buffer, indent=2)
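
`save` and `load` share the same path-or-buffer dispatch. A minimal JSON-only sketch of a round trip through an in-memory buffer is shown below; `dump_json` and `read_json` are hypothetical helpers that mirror only that dispatch, and the payload is a hand-written dictionary rather than the output of a real transform.

```python
import io
import json
from pathlib import Path


def dump_json(obj, filepath_or_buffer):
    """Hypothetical helper: write to a path or to an already open text buffer."""
    if isinstance(filepath_or_buffer, (str, Path)):
        with open(filepath_or_buffer, "w") as f:
            json.dump(obj, f)
    else:
        json.dump(obj, filepath_or_buffer)


def read_json(filepath_or_buffer):
    """Hypothetical helper: read from a path or from an already open text buffer."""
    if isinstance(filepath_or_buffer, (str, Path)):
        with open(filepath_or_buffer) as f:
            return json.load(f)
    return json.load(filepath_or_buffer)


payload = {"__version__": "0.0.0", "transform": {"__class_fullname__": "Blur", "p": 0.5}}
buffer = io.StringIO()
dump_json(payload, buffer)
buffer.seek(0)  # rewind, otherwise the read starts at the end of the buffer
restored = read_json(buffer)
print(restored == payload)  # -> True
```

When a buffer is passed, the caller owns it: nothing is opened or closed on the caller's behalf, and the position has to be reset before reading back.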

`def serialize_enum (obj)` [view source on GitHub]¶

Recursively search for Enum objects and convert them to their value. Also handle any Mapping or Sequence types.

Source code in albumentations/core/serialization.py

Python

def serialize_enum(obj: Any) -> Any:
    """Recursively search for Enum objects and convert them to their value.
    Also handle any Mapping or Sequence types.
    """
    if isinstance(obj, Mapping):
        return {k: serialize_enum(v) for k, v in obj.items()}
    if isinstance(obj, Sequence) and not isinstance(obj, str):  # exclude strings since they're also sequences
        return [serialize_enum(v) for v in obj]
    return obj.value if isinstance(obj, Enum) else obj
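
Because `serialize_enum` depends only on the standard library, it can be tried on its own. The example below repeats the function body so that it is self-contained; the `Position` enum and the `nested` dictionary are hypothetical inputs chosen for illustration.

```python
from collections.abc import Mapping, Sequence
from enum import Enum
from typing import Any


class Position(Enum):  # hypothetical enum, for illustration only
    CENTER = "center"
    TOP_LEFT = "top_left"


def serialize_enum(obj: Any) -> Any:
    # Same logic as the function above, repeated to keep the example self-contained.
    if isinstance(obj, Mapping):
        return {k: serialize_enum(v) for k, v in obj.items()}
    if isinstance(obj, Sequence) and not isinstance(obj, str):
        return [serialize_enum(v) for v in obj]
    return obj.value if isinstance(obj, Enum) else obj


nested = {"position": Position.CENTER, "choices": (Position.TOP_LEFT, 3), "name": "pad"}
converted = serialize_enum(nested)
print(converted)  # -> {'position': 'center', 'choices': ['top_left', 3], 'name': 'pad'}
```

Note that every non-string sequence comes back as a list, so the tuple under `choices` is converted to a list, which matches what JSON would produce anyway.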

`def to_dict (transform, on_not_implemented_error='raise')` [view source on GitHub]¶

Take a transform pipeline and convert it to a serializable representation that uses only standard python data types: dictionaries, lists, strings, integers, and floats.

Parameters:

Name	Type	Description
`transform`	`Serializable`	A transform that should be serialized. If the transform doesn't implement the `to_dict` method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised. If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored but no transform parameters will be serialized.
`on_not_implemented_error`	`str`	`raise` or `warn`.

Source code in albumentations/core/serialization.py

Python

def to_dict(transform: Serializable, on_not_implemented_error: str = "raise") -> dict[str, Any]:
    """Take a transform pipeline and convert it to a serializable representation that uses only standard
    python data types: dictionaries, lists, strings, integers, and floats.

    Args:
        transform: A transform that should be serialized. If the transform doesn't implement the `to_dict`
            method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised.
            If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored
            but no transform parameters will be serialized.
        on_not_implemented_error (str): `raise` or `warn`.

    """
    return transform.to_dict(on_not_implemented_error)

`transforms_interface` ¶

`class BasicTransform` `(p=0.5, always_apply=None)` [view source on GitHub] ¶

Source code in albumentations/core/transforms_interface.py

Python

class BasicTransform(Serializable, metaclass=CombinedMeta):
    _targets: tuple[Targets, ...] | Targets  # targets that this transform can work on
    _available_keys: set[str]  # targets supported by this transform, as lower-cased strings
    _key2func: dict[
        str,
        Callable[..., Any],
    ]  # maps targets (plus additional targets) to the methods that process them
    call_backup = None
    interpolation: int
    fill_value: ColorType
    mask_fill_value: ColorType | None
    # replay mode params
    deterministic: bool = False
    save_key = "replay"
    replay_mode = False
    applied_in_replay = False

    class InitSchema(BaseTransformInitSchema):
        pass

    def __init__(self, p: float = 0.5, always_apply: bool | None = None):
        self.p = p
        if always_apply is not None:
            if always_apply:
                warn(
                    "always_apply is deprecated. Use `p=1` if you want to always apply the transform."
                    " self.p will be set to 1.",
                    DeprecationWarning,
                    stacklevel=2,
                )
                self.p = 1.0
            else:
                warn(
                    "always_apply is deprecated.",
                    DeprecationWarning,
                    stacklevel=2,
                )
        self._additional_targets: dict[str, str] = {}
        # replay mode params
        self.params: dict[Any, Any] = {}
        self._key2func = {}
        self._set_keys()

    def __call__(self, *args: Any, force_apply: bool = False, **kwargs: Any) -> Any:
        if args:
            msg = "You have to pass data to augmentations as named arguments, for example: aug(image=image)"
            raise KeyError(msg)
        if self.replay_mode:
            if self.applied_in_replay:
                return self.apply_with_params(self.params, **kwargs)

            return kwargs

        if force_apply or (random.random() < self.p):
            params = self.get_params()
            params = self.update_params_shape(params=params, data=kwargs)

            if self.targets_as_params:  # check if all required targets are in kwargs.
                missing_keys = set(self.targets_as_params).difference(kwargs.keys())
                if missing_keys and not (missing_keys == {"image"} and "images" in kwargs):
                    msg = f"{self.__class__.__name__} requires {self.targets_as_params} missing keys: {missing_keys}"
                    raise ValueError(msg)

            params_dependent_on_data = self.get_params_dependent_on_data(params=params, data=kwargs)
            params.update(params_dependent_on_data)

            if self.targets_as_params:  # this block will be removed after removing `get_params_dependent_on_targets`
                targets_as_params = {k: kwargs.get(k, None) for k in self.targets_as_params}
                if missing_keys:  # the only case expected here is missing_keys == {"image"} with "images" in kwargs
                    targets_as_params["image"] = kwargs["images"][0]
                params_dependent_on_targets = self.get_params_dependent_on_targets(targets_as_params)
                params.update(params_dependent_on_targets)
            if self.deterministic:
                kwargs[self.save_key][id(self)] = deepcopy(params)
            return self.apply_with_params(params, **kwargs)

        return kwargs

    def apply_with_params(self, params: dict[str, Any], *args: Any, **kwargs: Any) -> dict[str, Any]:
        """Apply transforms with parameters."""
        params = self.update_params(params, **kwargs)  # remove after move parameters like interpolation
        res = {}
        for key, arg in kwargs.items():
            if key in self._key2func and arg is not None:
                target_function = self._key2func[key]
                res[key] = target_function(arg, **params)
            else:
                res[key] = arg
        return res

    def set_deterministic(self, flag: bool, save_key: str = "replay") -> BasicTransform:
        """Set transform to be deterministic."""
        if save_key == "params":
            msg = "params save_key is reserved"
            raise KeyError(msg)

        self.deterministic = flag
        if self.deterministic and self.targets_as_params:
            warn(
                self.get_class_fullname() + " could work incorrectly in ReplayMode for other input data"
                " because its params depend on targets.",
                stacklevel=2,
            )
        self.save_key = save_key
        return self

    def __repr__(self) -> str:
        state = self.get_base_init_args()
        state.update(self.get_transform_init_args())
        return f"{self.__class__.__name__}({format_args(state)})"

    def apply(self, img: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
        """Apply transform on image."""
        raise NotImplementedError

    def apply_to_images(self, images: np.ndarray, **params: Any) -> list[np.ndarray]:
        """Apply transform on images."""
        return [self.apply(image, **params) for image in images]

    def get_params(self) -> dict[str, Any]:
        """Returns parameters independent of input."""
        return {}

    def update_params_shape(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        """Updates parameters with input image shape."""
        # `image` or `images` is expected in data; this is checked in Compose._check_args
        shape = data["image"].shape if "image" in data else data["images"][0].shape
        params["shape"] = shape
        params.update({"cols": shape[1], "rows": shape[0]})
        return params

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        """Returns parameters dependent on input."""
        return params

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        # mapping for targets and methods for which they depend
        # for example:
        # >>  {"image": self.apply}
        # >>  {"masks": self.apply_to_masks}
        raise NotImplementedError

    def _set_keys(self) -> None:
        """Set _available_keys."""
        if not hasattr(self, "_targets"):
            self._available_keys = set()
        else:
            self._available_keys = {
                target.value.lower()
                for target in (self._targets if isinstance(self._targets, tuple) else [self._targets])
            }
        self._available_keys.update(self.targets.keys())
        self._key2func = {key: self.targets[key] for key in self._available_keys if key in self.targets}

    @property
    def available_keys(self) -> set[str]:
        """Returns set of available keys."""
        return self._available_keys

    def update_params(self, params: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
        """Update parameters with transform specific params.
        This method is deprecated, use:
        - `get_params` for transform specific params like interpolation and
        - `update_params_shape` for data like shape.
        """
        if hasattr(self, "interpolation"):
            params["interpolation"] = self.interpolation
        if hasattr(self, "fill_value"):
            params["fill_value"] = self.fill_value
        if hasattr(self, "mask_fill_value"):
            params["mask_fill_value"] = self.mask_fill_value

        # `image` or `images` is expected in kwargs; this is checked in Compose._check_args
        shape = kwargs["image"].shape if "image" in kwargs else kwargs["images"][0].shape
        params["shape"] = shape
        params.update({"cols": shape[1], "rows": shape[0]})
        return params

    def add_targets(self, additional_targets: dict[str, str]) -> None:
        """Add targets to transform them the same way as one of existing targets.
        ex: {'target_image': 'image'}
        ex: {'obj1_mask': 'mask', 'obj2_mask': 'mask'}
        by the way you must have at least one object with key 'image'

        Args:
            additional_targets (dict): keys - new target name, values - old target name. ex: {'image2': 'image'}

        """
        for k, v in additional_targets.items():
            if k in self._additional_targets and v != self._additional_targets[k]:
                raise ValueError(
                    f"Trying to overwrite existed additional targets. "
                    f"Key={k} Exists={self._additional_targets[k]} New value: {v}",
                )
            if v in self._available_keys:
                self._additional_targets[k] = v
                self._key2func[k] = self.targets[v]
                self._available_keys.add(k)

    @property
    def targets_as_params(self) -> list[str]:
        """Targets used to get params dependent on targets.
        This is used to check input has all required targets.
        """
        return []

    def get_params_dependent_on_targets(self, params: dict[str, Any]) -> dict[str, Any]:
        """This method is deprecated.
        Use `get_params_dependent_on_data` instead.
        Returns parameters dependent on targets.
        Dependent target is defined in `self.targets_as_params`
        """
        return {}

    @classmethod
    def get_class_fullname(cls) -> str:
        return get_shortest_class_fullname(cls)

    @classmethod
    def is_serializable(cls) -> bool:
        return True

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        """Returns names of arguments that are used in __init__ method of the transform."""
        msg = (
            f"Class {self.get_class_fullname()} is not serializable because the `get_transform_init_args_names` "
            "method is not implemented"
        )
        raise NotImplementedError(msg)

    def get_base_init_args(self) -> dict[str, Any]:
        """Returns base init args - p"""
        return {"p": self.p}

    def get_transform_init_args(self) -> dict[str, Any]:
        return {k: getattr(self, k) for k in self.get_transform_init_args_names()}

    def to_dict_private(self) -> dict[str, Any]:
        state = {"__class_fullname__": self.get_class_fullname()}
        state.update(self.get_base_init_args())
        state.update(self.get_transform_init_args())

        return state

    def get_dict_with_id(self) -> dict[str, Any]:
        d = self.to_dict_private()
        d["id"] = id(self)
        return d
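
The control flow of `__call__` and `apply_with_params` can be summarized in a few lines. The sketch below is a hypothetical, heavily reduced stand-in: `ToyTransform` operates on nested lists instead of NumPy arrays, handles only the `image` key, and skips replay mode and parameter generation.

```python
import random


class ToyTransform:
    """Hypothetical stand-in that mimics the call flow of BasicTransform."""

    def __init__(self, p=0.5):
        self.p = p
        self._key2func = {"image": self.apply}  # maps target keys to handler methods

    def apply(self, img, **params):
        return [[255 - v for v in row] for row in img]  # invert pixel values

    def __call__(self, *args, force_apply=False, **kwargs):
        if args:
            raise KeyError("Pass data as named arguments, for example: aug(image=image)")
        if force_apply or random.random() < self.p:
            # Known keys go through their handler; everything else passes through untouched.
            return {
                key: self._key2func[key](arg) if key in self._key2func and arg is not None else arg
                for key, arg in kwargs.items()
            }
        return kwargs


aug = ToyTransform(p=0.0)
image = [[0, 255], [10, 20]]
skipped = aug(image=image, label="cat")
forced = aug(image=image, label="cat", force_apply=True)
print(skipped["image"], forced["image"], forced["label"])
```

With `p=0.0` the transform is never applied on its own, so `skipped` returns the input unchanged, while `force_apply=True` bypasses the probability check. The unknown key `label` is returned as-is in both cases.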

`available_keys: set[str]` `property` `readonly` ¶

Returns set of available keys.

`targets_as_params: list[str]` `property` `readonly` ¶

Targets used to get params dependent on targets. This is used to check input has all required targets.

`add_targets (self, additional_targets)` ¶

Add targets to transform them the same way as one of existing targets. ex: {'target_image': 'image'} ex: {'obj1_mask': 'mask', 'obj2_mask': 'mask'} by the way you must have at least one object with key 'image'

Parameters:

Name	Type	Description
`additional_targets`	`dict`	Keys are the new target names; values are the existing target names they should follow, e.g. `{'image2': 'image'}`.

Source code in albumentations/core/transforms_interface.py

Python

def add_targets(self, additional_targets: dict[str, str]) -> None:
    """Add targets to transform them the same way as one of existing targets.
    ex: {'target_image': 'image'}
    ex: {'obj1_mask': 'mask', 'obj2_mask': 'mask'}
    by the way you must have at least one object with key 'image'

    Args:
        additional_targets (dict): keys - new target name, values - old target name. ex: {'image2': 'image'}

    """
    for k, v in additional_targets.items():
        if k in self._additional_targets and v != self._additional_targets[k]:
            raise ValueError(
                f"Trying to overwrite existed additional targets. "
                f"Key={k} Exists={self._additional_targets[k]} New value: {v}",
            )
        if v in self._available_keys:
            self._additional_targets[k] = v
            self._key2func[k] = self.targets[v]
            self._available_keys.add(k)
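
As a rough illustration of the aliasing logic above, the sketch below uses a hypothetical stand-in class (`MiniTransform`, not part of Albumentations) that keeps the same three pieces of state: the alias map, the key-to-function map, and the set of available keys. It assumes only two base targets, and its handlers are identity functions.

```python
from typing import Any, Callable


class MiniTransform:
    """Hypothetical stand-in that mirrors only the bookkeeping of `add_targets`."""

    def __init__(self) -> None:
        # Base targets and the functions that handle them (identity here).
        self.targets: dict[str, Callable[[Any], Any]] = {
            "image": lambda x: x,
            "mask": lambda x: x,
        }
        self._additional_targets: dict[str, str] = {}
        self._key2func = dict(self.targets)
        self._available_keys = set(self.targets)

    def add_targets(self, additional_targets: dict[str, str]) -> None:
        for new_key, old_key in additional_targets.items():
            existing = self._additional_targets.get(new_key)
            # Re-registering an alias with a different base target is an error.
            if existing is not None and existing != old_key:
                raise ValueError(f"{new_key!r} is already mapped to {existing!r}")
            # Aliases of unknown base targets are skipped without an error.
            if old_key in self._available_keys:
                self._additional_targets[new_key] = old_key
                self._key2func[new_key] = self.targets[old_key]
                self._available_keys.add(new_key)


t = MiniTransform()
t.add_targets({"image2": "image", "obj1_mask": "mask"})
print(sorted(t._available_keys))  # ['image', 'image2', 'mask', 'obj1_mask']
```

After registration, the alias `image2` is dispatched to the same function as `image`, which is what makes both inputs receive identical processing.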

`apply (self, img, *args, **params)` ¶

Apply transform on image.

Source code in albumentations/core/transforms_interface.py

Python

def apply(self, img: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
    """Apply transform on image."""
    raise NotImplementedError

`apply_to_images (self, images, **params)` ¶

Apply transform on images.

Source code in albumentations/core/transforms_interface.py

Python

def apply_to_images(self, images: np.ndarray, **params: Any) -> list[np.ndarray]:
    """Apply transform on images."""
    return [self.apply(image, **params) for image in images]

`apply_with_params (self, params, *args, **kwargs)` ¶

Apply transforms with parameters.

Source code in albumentations/core/transforms_interface.py

Python

def apply_with_params(self, params: dict[str, Any], *args: Any, **kwargs: Any) -> dict[str, Any]:
    """Apply transforms with parameters."""
    params = self.update_params(params, **kwargs)  # remove after move parameters like interpolation
    res = {}
    for key, arg in kwargs.items():
        if key in self._key2func and arg is not None:
            target_function = self._key2func[key]
            res[key] = target_function(arg, **params)
        else:
            res[key] = arg
    return res
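
The dispatch rule in `apply_with_params` can be sketched on its own: each keyword input is routed to its registered function, while unknown keys and `None` values pass through unchanged. The `dispatch` helper and the `factor` parameter below are hypothetical names used only for illustration.

```python
from typing import Any, Callable


def dispatch(key2func: dict[str, Callable[..., Any]], params: dict[str, Any], **data: Any) -> dict[str, Any]:
    """Apply the registered function to each known, non-None input; pass the rest through."""
    result = {}
    for key, value in data.items():
        if key in key2func and value is not None:
            result[key] = key2func[key](value, **params)
        else:
            result[key] = value
    return result


# Hypothetical handlers: scale numbers by a shared `factor` parameter.
handlers = {
    "image": lambda v, factor: v * factor,
    "mask": lambda v, factor: v * factor,
}
out = dispatch(handlers, {"factor": 2}, image=10, mask=None, label="cat")
print(out)  # {'image': 20, 'mask': None, 'label': 'cat'}
```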

`get_base_init_args (self)` ¶

Returns the base init arguments (only `p`).

Source code in albumentations/core/transforms_interface.py

Python

def get_base_init_args(self) -> dict[str, Any]:
    """Returns base init args - p"""
    return {"p": self.p}

`get_params (self)` ¶

Returns parameters independent of input.

Source code in albumentations/core/transforms_interface.py

Python

def get_params(self) -> dict[str, Any]:
    """Returns parameters independent of input."""
    return {}

`get_params_dependent_on_data (self, params, data)` ¶

Returns parameters dependent on input.

Source code in albumentations/core/transforms_interface.py

Python

def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    """Returns parameters dependent on input."""
    return params

`get_params_dependent_on_targets (self, params)` ¶

This method is deprecated; use `get_params_dependent_on_data` instead. Returns parameters dependent on targets. The dependent targets are defined in `self.targets_as_params`.

Source code in albumentations/core/transforms_interface.py

Python

def get_params_dependent_on_targets(self, params: dict[str, Any]) -> dict[str, Any]:
    """This method is deprecated.
    Use `get_params_dependent_on_data` instead.
    Returns parameters dependent on targets.
    Dependent target is defined in `self.targets_as_params`
    """
    return {}

`get_transform_init_args_names (self)` ¶

Returns the names of the arguments used in the `__init__` method of the transform.

Source code in albumentations/core/transforms_interface.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    """Returns names of arguments that are used in __init__ method of the transform."""
    msg = (
        f"Class {self.get_class_fullname()} is not serializable because the `get_transform_init_args_names` "
        "method is not implemented"
    )
    raise NotImplementedError(msg)

`set_deterministic (self, flag, save_key='replay')` ¶

Set transform to be deterministic.

Source code in albumentations/core/transforms_interface.py

Python

def set_deterministic(self, flag: bool, save_key: str = "replay") -> BasicTransform:
    """Set transform to be deterministic."""
    if save_key == "params":
        msg = "params save_key is reserved"
        raise KeyError(msg)

    self.deterministic = flag
    if self.deterministic and self.targets_as_params:
        warn(
            self.get_class_fullname() + " could work incorrectly in ReplayMode for other input data"
            " because its' params depend on targets.",
            stacklevel=2,
        )
    self.save_key = save_key
    return self

`update_params (self, params, **kwargs)` ¶

Update parameters with transform-specific params. This method is deprecated; use:

- `get_params` for transform-specific params like interpolation, and
- `update_params_shape` for data like shape.

Source code in albumentations/core/transforms_interface.py

Python

def update_params(self, params: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
    """Update parameters with transform specific params.
    This method is deprecated, use:
    - `get_params` for transform specific params like interpolation and
    - `update_params_shape` for data like shape.
    """
    if hasattr(self, "interpolation"):
        params["interpolation"] = self.interpolation
    if hasattr(self, "fill_value"):
        params["fill_value"] = self.fill_value
    if hasattr(self, "mask_fill_value"):
        params["mask_fill_value"] = self.mask_fill_value

    # here we expect `image` or `images` in kwargs; this is checked in Compose._check_args
    shape = kwargs["image"].shape if "image" in kwargs else kwargs["images"][0].shape
    params["shape"] = shape
    params.update({"cols": shape[1], "rows": shape[0]})
    return params

`update_params_shape (self, params, data)` ¶

Updates parameters with input image shape.

Source code in albumentations/core/transforms_interface.py

Python

def update_params_shape(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    """Updates parameters with input image shape."""
    # here we expect `image` or `images` in data; this is checked in Compose._check_args
    shape = data["image"].shape if "image" in data else data["images"][0].shape
    params["shape"] = shape
    params.update({"cols": shape[1], "rows": shape[0]})
    return params
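
To make the `rows`/`cols` convention concrete: `rows` is the first axis of the image shape (height) and `cols` the second (width). The sketch below restates that logic in a hypothetical helper, `with_shape`, and uses a `SimpleNamespace` as a stand-in for an array, since only the `.shape` attribute is read.

```python
from types import SimpleNamespace
from typing import Any


def with_shape(params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    """Copy the first image's shape into params as `shape`, `cols`, and `rows`."""
    shape = data["image"].shape if "image" in data else data["images"][0].shape
    params["shape"] = shape
    params.update({"cols": shape[1], "rows": shape[0]})
    return params


# Stand-in for an HWC array: 480 pixels high, 640 wide, 3 channels.
fake_image = SimpleNamespace(shape=(480, 640, 3))
print(with_shape({}, {"image": fake_image}))
# {'shape': (480, 640, 3), 'cols': 640, 'rows': 480}
```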

`class DualTransform` [view source on GitHub] ¶

A base class for transformations that should be applied both to an image and its corresponding properties such as masks, bounding boxes, and keypoints. This class ensures that when a transform is applied to an image, all associated entities are transformed accordingly to maintain consistency between the image and its annotations.

Properties

targets (dict[str, Callable[..., Any]]): Defines the types of targets (e.g., image, mask, bboxes, keypoints) that the transform should be applied to and maps them to the corresponding methods.

Methods

`apply_to_bbox(bbox: BoxInternalType, *args: Any, **params: Any) -> BoxInternalType`: Applies the transform to a single bounding box. Should be implemented in the subclass.

`apply_to_keypoint(keypoint: KeypointInternalType, *args: Any, **params: Any) -> KeypointInternalType`: Applies the transform to a single keypoint. Should be implemented in the subclass.

`apply_to_bboxes(bboxes: Sequence[BoxType], *args: Any, **params: Any) -> Sequence[BoxType]`: Applies the transform to a list of bounding boxes. Delegates to `apply_to_bbox` for each bounding box.

`apply_to_keypoints(keypoints: Sequence[KeypointType], *args: Any, **params: Any) -> Sequence[KeypointType]`: Applies the transform to a list of keypoints. Delegates to `apply_to_keypoint` for each keypoint.

`apply_to_mask(mask: np.ndarray, *args: Any, **params: Any) -> np.ndarray`: Applies the transform specifically to a single mask.

`apply_to_masks(masks: Sequence[np.ndarray], **params: Any) -> list[np.ndarray]`: Applies the transform to a list of masks. Delegates to `apply_to_mask` for each mask.

Note

This class is intended to be subclassed and should not be used directly. Subclasses are expected to implement the specific logic for each type of target (e.g., image, mask, bboxes, keypoints) in the corresponding apply_to_* methods.

Source code in albumentations/core/transforms_interface.py

Python

class DualTransform(BasicTransform):
    """A base class for transformations that should be applied both to an image and its corresponding properties
    such as masks, bounding boxes, and keypoints. This class ensures that when a transform is applied to an image,
    all associated entities are transformed accordingly to maintain consistency between the image and its annotations.

    Properties:
        targets (dict[str, Callable[..., Any]]): Defines the types of targets (e.g., image, mask, bboxes, keypoints)
            that the transform should be applied to and maps them to the corresponding methods.

    Methods:
        apply_to_bbox(bbox: BoxInternalType, *args: Any, **params: Any) -> BoxInternalType:
            Applies the transform to a single bounding box. Should be implemented in the subclass.

        apply_to_keypoint(keypoint: KeypointInternalType, *args: Any, **params: Any) -> KeypointInternalType:
            Applies the transform to a single keypoint. Should be implemented in the subclass.

        apply_to_bboxes(bboxes: Sequence[BoxType], *args: Any, **params: Any) -> Sequence[BoxType]:
            Applies the transform to a list of bounding boxes. Delegates to `apply_to_bbox` for each bounding box.

        apply_to_keypoints(keypoints: Sequence[KeypointType], *args: Any, **params: Any) -> Sequence[KeypointType]:
            Applies the transform to a list of keypoints. Delegates to `apply_to_keypoint` for each keypoint.

        apply_to_mask(mask: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
            Applies the transform specifically to a single mask.

        apply_to_masks(masks: Sequence[np.ndarray], **params: Any) -> list[np.ndarray]:
            Applies the transform to a list of masks. Delegates to `apply_to_mask` for each mask.

    Note:
        This class is intended to be subclassed and should not be used directly. Subclasses are expected to
        implement the specific logic for each type of target (e.g., image, mask, bboxes, keypoints) in the
        corresponding `apply_to_*` methods.

    """

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "images": self.apply_to_images,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "bboxes": self.apply_to_bboxes,
            "keypoints": self.apply_to_keypoints,
        }

    def apply_to_bbox(self, bbox: BoxInternalType, *args: Any, **params: Any) -> BoxInternalType:
        msg = f"Method apply_to_bbox is not implemented in class {self.__class__.__name__}"
        raise NotImplementedError(msg)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, *args: Any, **params: Any) -> KeypointInternalType:
        msg = f"Method apply_to_keypoint is not implemented in class {self.__class__.__name__}"
        raise NotImplementedError(msg)

    def apply_to_global_label(self, label: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
        msg = f"Method apply_to_global_label is not implemented in class {self.__class__.__name__}"
        raise NotImplementedError(msg)

    def apply_to_bboxes(self, bboxes: Sequence[BoxType], *args: Any, **params: Any) -> Sequence[BoxType]:
        return [
            self.apply_to_bbox(cast(BoxInternalType, tuple(cast(BoxInternalType, bbox[:4]))), **params)
            + tuple(bbox[4:])
            for bbox in bboxes
        ]

    def apply_to_keypoints(
        self,
        keypoints: Sequence[KeypointType],
        *args: Any,
        **params: Any,
    ) -> Sequence[KeypointType]:
        return [
            self.apply_to_keypoint(cast(KeypointInternalType, tuple(keypoint[:4])), **params) + tuple(keypoint[4:])
            for keypoint in keypoints
        ]

    def apply_to_mask(self, mask: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
        return self.apply(mask, **{k: cv2.INTER_NEAREST if k == "interpolation" else v for k, v in params.items()})

    def apply_to_masks(self, masks: Sequence[np.ndarray], **params: Any) -> list[np.ndarray]:
        return [self.apply_to_mask(mask, **params) for mask in masks]

    def apply_to_global_labels(self, labels: Sequence[np.ndarray], **params: Any) -> list[np.ndarray]:
        return [self.apply_to_global_label(label, **params) for label in labels]
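
The list-level methods above delegate to the single-item methods and re-attach any extra fields (such as labels) that follow the first four coordinates. A minimal sketch of that pattern, assuming boxes are normalized `(x_min, y_min, x_max, y_max)` tuples and using a horizontal flip as the example transform (the function names are hypothetical):

```python
from typing import Sequence


def hflip_bbox(bbox: tuple[float, float, float, float]) -> tuple[float, float, float, float]:
    """Flip one normalized (x_min, y_min, x_max, y_max) box horizontally."""
    x_min, y_min, x_max, y_max = bbox
    return (1 - x_max, y_min, 1 - x_min, y_max)


def hflip_bboxes(bboxes: Sequence[tuple]) -> list[tuple]:
    """Transform the first four coordinates and re-attach any trailing fields."""
    return [hflip_bbox(tuple(bbox[:4])) + tuple(bbox[4:]) for bbox in bboxes]


boxes = [(0.0, 0.25, 0.5, 0.75, "cat"), (0.25, 0.0, 1.0, 0.5, "dog", 7)]
print(hflip_bboxes(boxes))
# [(0.5, 0.25, 1.0, 0.75, 'cat'), (0.0, 0.0, 0.75, 0.5, 'dog', 7)]
```

Keeping the trailing fields untouched is what lets labels travel with their boxes through a pipeline.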

`class ImageOnlyTransform` [view source on GitHub] ¶

Transform applied to image only.

Source code in albumentations/core/transforms_interface.py

Python

class ImageOnlyTransform(BasicTransform):
    """Transform applied to image only."""

    _targets = Targets.IMAGE

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {"image": self.apply, "images": self.apply_to_images}

`class NoOp` [view source on GitHub] ¶

Identity transform (does nothing).

Targets

image, mask, bboxes, keypoints, global_label

Source code in albumentations/core/transforms_interface.py

Python

class NoOp(DualTransform):
    """Identity transform (does nothing).

    Targets:
        image, mask, bboxes, keypoints, global_label
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS, Targets.GLOBAL_LABEL)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return keypoint

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return bbox

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return img

    def apply_to_mask(self, mask: np.ndarray, **params: Any) -> np.ndarray:
        return mask

    def apply_to_global_label(self, label: np.ndarray, **params: Any) -> np.ndarray:
        return label

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ()

`apply (self, img, **params)` ¶

Apply transform on image.

Source code in albumentations/core/transforms_interface.py

Python

def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return img

`get_transform_init_args_names (self)` ¶

Returns the names of the arguments used in the `__init__` method of the transform.

Source code in albumentations/core/transforms_interface.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ()

`types` ¶

`class ImageCompressionType` [view source on GitHub] ¶

Defines the types of image compression.

This Enum class is used to specify the image compression format.

Attributes:

Name	Type	Description
`JPEG`	`int`	Represents the JPEG image compression format.
`WEBP`	`int`	Represents the WEBP image compression format.

Source code in albumentations/core/types.py

Python

class ImageCompressionType(IntEnum):
    """Defines the types of image compression.

    This Enum class is used to specify the image compression format.

    Attributes:
        JPEG (int): Represents the JPEG image compression format.
        WEBP (int): Represents the WEBP image compression format.

    """

    JPEG = 0
    WEBP = 1
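
Because this is an `IntEnum`, members compare equal to plain integers and can be looked up by value or by name. The snippet below redefines the enum locally, copied from the listing above, so that it runs without Albumentations installed.

```python
from enum import IntEnum


class ImageCompressionType(IntEnum):
    """Local copy of the enum shown above, so the snippet runs on its own."""

    JPEG = 0
    WEBP = 1


print(ImageCompressionType.WEBP == 1)          # True
print(ImageCompressionType(0).name)            # JPEG
print(ImageCompressionType["WEBP"].value)      # 1
print([m.name for m in ImageCompressionType])  # ['JPEG', 'WEBP']
```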

`utils` ¶

`class DataProcessor` `(params, additional_targets=None)` [view source on GitHub] ¶

Source code in albumentations/core/utils.py

Python

class DataProcessor(ABC):
    def __init__(self, params: Params, additional_targets: dict[str, str] | None = None):
        self.params = params
        self.data_fields = [self.default_data_name]
        if additional_targets is not None:
            self.add_targets(additional_targets)

    @property
    @abstractmethod
    def default_data_name(self) -> str:
        raise NotImplementedError

    def add_targets(self, additional_targets: dict[str, str]) -> None:
        """Add targets to transform them the same way as one of existing targets."""
        for k, v in additional_targets.items():
            if v == self.default_data_name and k not in self.data_fields:
                self.data_fields.append(k)

    def ensure_data_valid(self, data: dict[str, Any]) -> None:
        pass

    def ensure_transforms_valid(self, transforms: Sequence[object]) -> None:
        pass

    def postprocess(self, data: dict[str, Any]) -> dict[str, Any]:
        rows, cols = get_shape(data["image"])

        for data_name in self.data_fields:
            if data_name in data:
                data[data_name] = self.filter(data[data_name], rows, cols)
                data[data_name] = self.check_and_convert(data[data_name], rows, cols, direction="from")

        return self.remove_label_fields_from_data(data)

    def preprocess(self, data: dict[str, Any]) -> None:
        data = self.add_label_fields_to_data(data)

        rows, cols = data["image"].shape[:2]
        for data_name in self.data_fields:
            if data_name in data:
                data[data_name] = self.check_and_convert(data[data_name], rows, cols, direction="to")

    def check_and_convert(
        self,
        data: list[BoxOrKeypointType],
        rows: int,
        cols: int,
        direction: Literal["to", "from"] = "to",
    ) -> list[BoxOrKeypointType]:
        if self.params.format == "albumentations":
            self.check(data, rows, cols)
            return data

        if direction == "to":
            return self.convert_to_albumentations(data, rows, cols)

        if direction == "from":
            return self.convert_from_albumentations(data, rows, cols)

        raise ValueError(f"Invalid direction. Must be `to` or `from`. Got `{direction}`")

    @abstractmethod
    def filter(self, data: Sequence[BoxOrKeypointType], rows: int, cols: int) -> Sequence[BoxOrKeypointType]:
        pass

    @abstractmethod
    def check(self, data: list[BoxOrKeypointType], rows: int, cols: int) -> None:
        pass

    @abstractmethod
    def convert_to_albumentations(self, data: list[BoxOrKeypointType], rows: int, cols: int) -> list[BoxOrKeypointType]:
        pass

    @abstractmethod
    def convert_from_albumentations(
        self,
        data: list[BoxOrKeypointType],
        rows: int,
        cols: int,
    ) -> list[BoxOrKeypointType]:
        pass

    def add_label_fields_to_data(self, data: dict[str, Any]) -> dict[str, Any]:
        if self.params.label_fields is None:
            return data
        for data_name in self.data_fields:
            if data_name in data:
                for field in self.params.label_fields:
                    if not len(data[data_name]) == len(data[field]):
                        raise ValueError(
                            f"The lengths of bboxes and labels do not match. Got {len(data[data_name])} "
                            f"and {len(data[field])} respectively.",
                        )

                    data_with_added_field = []
                    for d, field_value in zip(data[data_name], data[field]):
                        data_with_added_field.append([*list(d), field_value])
                    data[data_name] = data_with_added_field
        return data

    def remove_label_fields_from_data(self, data: dict[str, Any]) -> dict[str, Any]:
        if not self.params.label_fields:
            return data
        label_fields_len = len(self.params.label_fields)
        for data_name in self.data_fields:
            if data_name in data:
                for idx, field in enumerate(self.params.label_fields):
                    data[field] = [bbox[-label_fields_len + idx] for bbox in data[data_name]]
                data[data_name] = [d[:-label_fields_len] for d in data[data_name]]
        return data
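
The label-field handling is easier to follow as a round trip: before the transforms run, each label is appended to its box; afterwards the trailing values are split back out. The sketch below restates that idea with two hypothetical free functions and plain lists. It is a simplification of the methods above, not the library's code.

```python
from typing import Any


def add_label_fields(data: dict[str, Any], data_name: str, label_fields: list[str]) -> dict[str, Any]:
    """Append each label field's value to the end of the matching item."""
    for field in label_fields:
        if len(data[data_name]) != len(data[field]):
            raise ValueError(f"Length mismatch between {data_name!r} and {field!r}")
        data[data_name] = [[*item, label] for item, label in zip(data[data_name], data[field])]
    return data


def remove_label_fields(data: dict[str, Any], data_name: str, label_fields: list[str]) -> dict[str, Any]:
    """Split the trailing label values back out into their own fields."""
    if not label_fields:
        return data
    n = len(label_fields)
    for idx, field in enumerate(label_fields):
        data[field] = [item[-n + idx] for item in data[data_name]]
    data[data_name] = [item[:-n] for item in data[data_name]]
    return data


sample = {
    "bboxes": [[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.9, 0.9]],
    "class_labels": ["cat", "dog"],
    "difficult": [0, 1],
}
fields = ["class_labels", "difficult"]
packed = add_label_fields(sample, "bboxes", fields)
print(packed["bboxes"][0])  # [0.1, 0.2, 0.3, 0.4, 'cat', 0]

# Simulate a filter step that drops the second box; its labels go with it.
packed["bboxes"] = packed["bboxes"][:1]
restored = remove_label_fields(packed, "bboxes", fields)
print(restored["class_labels"], restored["difficult"])  # ['cat'] [0]
```

Attaching labels to the boxes themselves means that any box removed during filtering takes its labels with it, so the label lists stay aligned afterwards.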

`add_targets (self, additional_targets)` ¶

Add targets to transform them the same way as one of existing targets.

Source code in albumentations/core/utils.py

Python

def add_targets(self, additional_targets: dict[str, str]) -> None:
    """Add targets to transform them the same way as one of existing targets."""
    for k, v in additional_targets.items():
        if v == self.default_data_name and k not in self.data_fields:
            self.data_fields.append(k)

`def to_tuple (param, low=None, bias=None)` [view source on GitHub]¶

Convert input argument to a min-max tuple.

Parameters:

Name	Type	Description
`param`	`ScaleType`	Input value which could be a scalar or a sequence of exactly 2 scalars.
`low`	`ScaleType \| None`	Second element of the tuple, provided as an optional argument for when `param` is a scalar.
`bias`	`ScalarType \| None`	An offset added to both elements of the tuple.

Returns:

Type	Description
`tuple[int, int] \| tuple[float, float]`	A tuple of two scalars, optionally adjusted by `bias`. Raises ValueError for invalid combinations or types of arguments.

Source code in albumentations/core/utils.py

Python

def to_tuple(
    param: ScaleType,
    low: ScaleType | None = None,
    bias: ScalarType | None = None,
) -> tuple[int, int] | tuple[float, float]:
    """Convert input argument to a min-max tuple.

    Args:
        param: Input value which could be a scalar or a sequence of exactly 2 scalars.
        low: Second element of the tuple, provided as an optional argument for when `param` is a scalar.
        bias: An offset added to both elements of the tuple.

    Returns:
        A tuple of two scalars, optionally adjusted by `bias`.
        Raises ValueError for invalid combinations or types of arguments.

    """
    # Validate mutually exclusive arguments
    if low is not None and bias is not None:
        msg = "Arguments 'low' and 'bias' cannot be used together."
        raise ValueError(msg)

    if isinstance(param, Sequence) and len(param) == PAIR:
        min_val, max_val = min(param), max(param)

    # Handle scalar input
    elif isinstance(param, (int, float)):
        if isinstance(low, (int, float)):
            # Use low and param to create a tuple
            min_val, max_val = (low, param) if low < param else (param, low)
        else:
            # Create a symmetric tuple around 0
            min_val, max_val = -param, param
    else:
        msg = "Argument 'param' must be either a scalar or a sequence of 2 elements."
        raise ValueError(msg)

    # Apply bias if provided
    if bias is not None:
        return (bias + min_val, bias + max_val)

    return min_val, max_val
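
The branches above are easiest to check with a few concrete calls. The sketch below is a compact restatement under a hypothetical name, `to_range`, with the `PAIR` constant assumed to be 2. It exists only so the examples run without Albumentations.

```python
from collections.abc import Sequence


def to_range(param, low=None, bias=None):
    """Hypothetical restatement of the branching in `to_tuple`."""
    if low is not None and bias is not None:
        raise ValueError("Arguments 'low' and 'bias' cannot be used together.")
    if isinstance(param, Sequence) and len(param) == 2:
        # A pair is simply ordered as (min, max).
        min_val, max_val = min(param), max(param)
    elif isinstance(param, (int, float)):
        if isinstance(low, (int, float)):
            # Scalar plus `low`: order the two values.
            min_val, max_val = sorted((low, param))
        else:
            # Scalar alone: symmetric range around zero.
            min_val, max_val = -param, param
    else:
        raise ValueError("Argument 'param' must be either a scalar or a sequence of 2 elements.")
    if bias is not None:
        return (bias + min_val, bias + max_val)
    return (min_val, max_val)


print(to_range(3))              # (-3, 3)
print(to_range(3, low=1))       # (1, 3)
print(to_range((5, 2)))         # (2, 5)
print(to_range(0.5, bias=1))    # (0.5, 1.5)
```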

`validation` ¶

`class ValidatedTransformMeta` [view source on GitHub] ¶

Source code in albumentations/core/validation.py

Python

class ValidatedTransformMeta(type):
    def __new__(cls: type[Any], name: str, bases: tuple[type, ...], dct: dict[str, Any]) -> type[Any]:
        if "InitSchema" in dct and issubclass(dct["InitSchema"], BaseModel):
            original_init: Callable[..., Any] | None = dct.get("__init__")
            if original_init is None:
                msg = "__init__ not found in class definition"
                raise ValueError(msg)

            original_sig = signature(original_init)

            def custom_init(self: Any, *args: Any, **kwargs: Any) -> None:
                init_params = signature(original_init).parameters
                param_names = list(init_params.keys())[1:]  # Exclude 'self'
                full_kwargs: dict[str, Any] = dict(zip(param_names, args))
                full_kwargs.update(kwargs)

                for parameter_name, parameter in init_params.items():
                    if (
                        parameter_name != "self"
                        and parameter_name not in full_kwargs
                        and parameter.default is not Parameter.empty
                    ):
                        full_kwargs[parameter_name] = parameter.default

                # No try-except block needed as we want the exception to propagate naturally
                config = dct["InitSchema"](**full_kwargs)

                validated_kwargs = config.model_dump()
                for name_arg in kwargs:
                    if name_arg not in validated_kwargs:
                        warn(
                            f"Argument '{name_arg}' is not valid and will be ignored.",
                            stacklevel=2,
                        )

                original_init(self, **validated_kwargs)

            # Preserve the original signature and docstring
            custom_init.__signature__ = original_sig  # type: ignore[attr-defined]
            custom_init.__doc__ = original_init.__doc__

            # Rename __init__ to custom_init to avoid the N807 warning
            dct["__init__"] = custom_init

        return super().__new__(cls, name, bases, dct)
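
The pattern here is a metaclass that replaces `__init__` with a wrapper which gathers positional arguments, keyword arguments, and defaults into one dict, validates that dict through `InitSchema`, and only then calls the original `__init__`. The sketch below shows the same flow without pydantic: `ValidatingMeta`, `BlurSchema`, and `MyBlur` are hypothetical names, and `BlurSchema` is a plain class that merely imitates the `model_dump()` call used above.

```python
from inspect import Parameter, signature
from typing import Any


class ValidatingMeta(type):
    """Hypothetical stand-in for the metaclass above, without pydantic."""

    def __new__(mcs, name: str, bases: tuple, dct: dict[str, Any]):
        schema = dct.get("InitSchema")
        original_init = dct.get("__init__")
        if schema is not None and original_init is not None:
            init_params = signature(original_init).parameters

            def custom_init(self: Any, *args: Any, **kwargs: Any) -> None:
                names = list(init_params)[1:]  # skip 'self'
                full_kwargs = dict(zip(names, args))
                full_kwargs.update(kwargs)
                # Fill in defaults so the schema always sees every argument.
                for pname, param in init_params.items():
                    if pname != "self" and pname not in full_kwargs and param.default is not Parameter.empty:
                        full_kwargs[pname] = param.default
                original_init(self, **schema(**full_kwargs).model_dump())

            dct["__init__"] = custom_init
        return super().__new__(mcs, name, bases, dct)


class BlurSchema:
    """Plain-Python stand-in for a pydantic model: validate, then dump."""

    def __init__(self, blur_limit: int, p: float) -> None:
        if blur_limit < 3:
            raise ValueError("blur_limit must be at least 3")
        if not 0 <= p <= 1:
            raise ValueError("p must be in [0, 1]")
        self._values = {"blur_limit": blur_limit, "p": p}

    def model_dump(self) -> dict[str, Any]:
        return dict(self._values)


class MyBlur(metaclass=ValidatingMeta):
    InitSchema = BlurSchema

    def __init__(self, blur_limit: int = 7, p: float = 0.5) -> None:
        self.blur_limit = blur_limit
        self.p = p


ok = MyBlur(5)
print(ok.blur_limit, ok.p)  # 5 0.5
```

With this arrangement, `MyBlur(1)` raises `ValueError` from the schema before the real `__init__` ever runs, which keeps validation out of the transform body.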

`__new__ (cls, name, bases, dct)` `special` `staticmethod` ¶

Create and return a new object. See help(type) for accurate signature.

Source code in albumentations/core/validation.py

Python

def __new__(cls: type[Any], name: str, bases: tuple[type, ...], dct: dict[str, Any]) -> type[Any]:
    if "InitSchema" in dct and issubclass(dct["InitSchema"], BaseModel):
        original_init: Callable[..., Any] | None = dct.get("__init__")
        if original_init is None:
            msg = "__init__ not found in class definition"
            raise ValueError(msg)

        original_sig = signature(original_init)

        def custom_init(self: Any, *args: Any, **kwargs: Any) -> None:
            init_params = signature(original_init).parameters
            param_names = list(init_params.keys())[1:]  # Exclude 'self'
            full_kwargs: dict[str, Any] = dict(zip(param_names, args))
            full_kwargs.update(kwargs)

            for parameter_name, parameter in init_params.items():
                if (
                    parameter_name != "self"
                    and parameter_name not in full_kwargs
                    and parameter.default is not Parameter.empty
                ):
                    full_kwargs[parameter_name] = parameter.default

            # No try-except block needed as we want the exception to propagate naturally
            config = dct["InitSchema"](**full_kwargs)

            validated_kwargs = config.model_dump()
            for name_arg in kwargs:
                if name_arg not in validated_kwargs:
                    warn(
                        f"Argument '{name_arg}' is not valid and will be ignored.",
                        stacklevel=2,
                    )

            original_init(self, **validated_kwargs)

        # Preserve the original signature and docstring
        custom_init.__signature__ = original_sig  # type: ignore[attr-defined]
        custom_init.__doc__ = original_init.__doc__

        # Rename __init__ to custom_init to avoid the N807 warning
        dct["__init__"] = custom_init

    return super().__new__(cls, name, bases, dct)

`pytorch` `special` ¶

`transforms` ¶

`class ToTensorV2` `(transpose_mask=False, p=1.0, always_apply=None)` [view source on GitHub] ¶

Converts images/masks to PyTorch Tensors, inheriting from BasicTransform. Supports images in numpy HWC format and converts them to PyTorch CHW format. If the image is in HW format, it will be converted to PyTorch HW.

Attributes:

Name	Type	Description
`transpose_mask`	`bool`	If True, transposes 3D input mask dimensions from `[height, width, num_channels]` to `[num_channels, height, width]`.
`always_apply`	`bool`	Deprecated. Default: None.
`p`	`float`	Probability of applying the transform. Default: 1.0.

Source code in albumentations/pytorch/transforms.py

Python

class ToTensorV2(BasicTransform):
    """Converts images/masks to PyTorch Tensors, inheriting from BasicTransform. Supports images in numpy `HWC` format
    and converts them to PyTorch `CHW` format. If the image is in `HW` format, it will be converted to PyTorch `HW`.

    Attributes:
        transpose_mask (bool): If True, transposes 3D input mask dimensions from `[height, width, num_channels]` to
            `[num_channels, height, width]`.
        always_apply (bool): Deprecated. Default: None.
        p (float): Probability of applying the transform. Default: 1.0.

    """

    _targets = (Targets.IMAGE, Targets.MASK)

    def __init__(self, transpose_mask: bool = False, p: float = 1.0, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.transpose_mask = transpose_mask

    @property
    def targets(self) -> dict[str, Any]:
        return {"image": self.apply, "mask": self.apply_to_mask, "masks": self.apply_to_masks}

    def apply(self, img: np.ndarray, **params: Any) -> torch.Tensor:
        if len(img.shape) not in [2, 3]:
            msg = "Albumentations only supports images in HW or HWC format"
            raise ValueError(msg)

        if len(img.shape) == MONO_CHANNEL_DIMENSIONS:
            img = np.expand_dims(img, 2)

        return torch.from_numpy(img.transpose(2, 0, 1))

    def apply_to_mask(self, mask: np.ndarray, **params: Any) -> torch.Tensor:
        if self.transpose_mask and mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS:
            mask = mask.transpose(2, 0, 1)
        return torch.from_numpy(mask)

    def apply_to_masks(self, masks: list[np.ndarray], **params: Any) -> list[torch.Tensor]:
        return [self.apply_to_mask(mask, **params) for mask in masks]

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("transpose_mask",)
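
The `transpose(2, 0, 1)` call moves the channel axis to the front, turning `HWC` into `CHW`. `ToTensorV2` itself operates on NumPy arrays and returns PyTorch tensors; the sketch below uses nested lists and a hypothetical helper, `hwc_to_chw`, purely to show the index mapping without either dependency.

```python
def hwc_to_chw(img: list) -> list:
    """Reorder a nested-list image from [H][W][C] to [C][H][W]."""
    height, width, channels = len(img), len(img[0]), len(img[0][0])
    return [
        [[img[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# A 2x2 image with 3 channels: each pixel is [R, G, B].
img = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
chw = hwc_to_chw(img)
print(chw[0])  # first channel plane: [[1, 4], [7, 10]]
print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 2 2
```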

`__init__ (self, transpose_mask=False, p=1.0, always_apply=None)` `special` ¶

Initialize self. See help(type(self)) for accurate signature.

Source code in albumentations/pytorch/transforms.py

Python

def __init__(self, transpose_mask: bool = False, p: float = 1.0, always_apply: bool | None = None):
    super().__init__(p=p, always_apply=always_apply)
    self.transpose_mask = transpose_mask

`apply (self, img, **params)` ¶

Apply transform on image.

Source code in albumentations/pytorch/transforms.py

Python

def apply(self, img: np.ndarray, **params: Any) -> torch.Tensor:
    if len(img.shape) not in [2, 3]:
        msg = "Albumentations only supports images in HW or HWC format"
        raise ValueError(msg)

    if len(img.shape) == MONO_CHANNEL_DIMENSIONS:
        img = np.expand_dims(img, 2)

    return torch.from_numpy(img.transpose(2, 0, 1))

`get_transform_init_args_names (self)` ¶

Returns the names of the arguments used in the `__init__` method of the transform.

Source code in albumentations/pytorch/transforms.py

Python

def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("transpose_mask",)

`random_utils` ¶

`def shuffle (a, random_state=None)` [view source on GitHub]¶

Shuffles an array in-place, using a specified random state or creating a new one if not provided.

Parameters:

Name	Type	Description
`a`	`np.ndarray`	The array to be shuffled.
`random_state`	`Optional[np.random.RandomState]`	The random state used for shuffling. Defaults to None.

Returns:

Type	Description
`np.ndarray`	The shuffled array (note: the shuffle is in-place, so the original array is modified).

Source code in albumentations/random_utils.py

Python

def shuffle(
    a: np.ndarray,
    random_state: np.random.RandomState | None = None,
) -> np.ndarray:
    """Shuffles an array in-place, using a specified random state or creating a new one if not provided.

    Args:
        a (np.ndarray): The array to be shuffled.
        random_state (Optional[np.random.RandomState], optional): The random state used for shuffling. Defaults to None.

    Returns:
        np.ndarray: The shuffled array (note: the shuffle is in-place, so the original array is modified).
    """
    if random_state is None:
        random_state = get_random_state()
    random_state.shuffle(a)
    return a
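
The in-place behaviour described above can be checked with a short sketch. It assumes only NumPy: `shuffle_sketch` is a hypothetical stand-in that mirrors the listing, and an unseeded `np.random.RandomState()` replaces `get_random_state()`, whose implementation is not shown here.

```python
import numpy as np


def shuffle_sketch(a: np.ndarray, random_state=None) -> np.ndarray:
    """Hypothetical stand-in mirroring the shuffle() listing above."""
    if random_state is None:
        # Assumption: a fresh RandomState stands in for get_random_state().
        random_state = np.random.RandomState()
    random_state.shuffle(a)  # shuffles along the first axis, in place
    return a


a = np.arange(10)
out = shuffle_sketch(a, np.random.RandomState(0))

# The returned object is the same array that was passed in.
print(out is a)

# The same seed applied to an equal array gives the same permutation.
b = np.arange(10)
shuffle_sketch(b, np.random.RandomState(0))
print(np.array_equal(a, b))
```

Because the input is modified, callers that still need the original order should pass a copy, for example `shuffle_sketch(a.copy(), rs)`.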

`augmentations` `special` ¶

`blur` `special` ¶

`transforms` ¶

`class AdvancedBlur` `(blur_limit=(3, 7), sigma_x_limit=(0.2, 1.0), sigma_y_limit=(0.2, 1.0), sigmaX_limit=None, sigmaY_limit=None, rotate_limit=90, beta_limit=(0.5, 8.0), noise_limit=(0.9, 1.1), always_apply=None, p=0.5)` [view source on GitHub] ¶

`apply (self, img, kernel, **params)` ¶

`get_params (self)` ¶

`get_transform_init_args_names (self)` ¶

`class Blur` `(blur_limit=7, p=0.5, always_apply=None)` [view source on GitHub] ¶

`apply (self, img, kernel, **params)` ¶

`get_params (self)` ¶

`get_transform_init_args_names (self)` ¶

`class Defocus` `(radius=(3, 10), alias_blur=(0.1, 0.5), always_apply=None, p=0.5)` [view source on GitHub] ¶

`apply (self, img, radius, alias_blur, **params)` ¶

`get_params (self)` ¶

`get_transform_init_args_names (self)` ¶

`class GaussianBlur` `(blur_limit=(3, 7), sigma_limit=0, always_apply=None, p=0.5)` [view source on GitHub] ¶

`apply (self, img, ksize, sigma, **params)` ¶

`get_params (self)` ¶

`get_transform_init_args_names (self)` ¶

`class GlassBlur` `(sigma=0.7, max_delta=4, iterations=2, mode='fast', always_apply=None, p=0.5)` [view source on GitHub] ¶

`apply (self, img, *args, dxy, **params)` ¶

`get_params_dependent_on_data (self, params, data)` ¶

`get_transform_init_args_names (self)` ¶

`class MedianBlur` `(blur_limit=7, p=0.5, always_apply=None)` [view source on GitHub] ¶

`__init__ (self, blur_limit=7, p=0.5, always_apply=None)` `special` ¶

`apply (self, img, kernel, **params)` ¶

`class MotionBlur` `(blur_limit=7, allow_shifted=True, always_apply=None, p=0.5)` [view source on GitHub] ¶

`apply (self, img, kernel, **params)` ¶

`get_params (self)` ¶

`get_transform_init_args_names (self)` ¶

`class ZoomBlur` `(max_factor=(1, 1.31), step_factor=(0.01, 0.03), always_apply=None, p=0.5)` [view source on GitHub] ¶

`apply (self, img, zoom_factors, **params)` ¶

`get_params (self)` ¶

`get_transform_init_args_names (self)` ¶

`crops` `special` ¶

`functional` ¶

`def crop_keypoint_by_coords (keypoint, crop_coords)` [view source on GitHub]¶

`transforms` ¶

`class BBoxSafeRandomCrop` `(erosion_rate=0.0, p=1.0, always_apply=None)` [view source on GitHub] ¶

`targets_as_params: list[str]` `property` `readonly` ¶

`get_params_dependent_on_data (self, params, data)` ¶

`get_transform_init_args_names (self)` ¶

`class CenterCrop` `(height, width, p=1.0, always_apply=None)` [view source on GitHub] ¶

`get_params_dependent_on_data (self, params, data)` ¶

`get_transform_init_args_names (self)` ¶

`class Crop` `(x_min=0, y_min=0, x_max=1024, y_max=1024, always_apply=None, p=1.0)` [view source on GitHub] ¶

`get_params_dependent_on_data (self, params, data)` ¶

`get_transform_init_args_names (self)` ¶

`class CropAndPad` `(px=None, percent=None, pad_mode=0, pad_cval=0, pad_cval_mask=0, keep_size=True, sample_independently=True, interpolation=1, always_apply=None, p=1.0)` [view source on GitHub] ¶

`apply (self, img, crop_params, pad_params, pad_value, rows, cols, interpolation, **params)` ¶

`get_params_dependent_on_data (self, params, data)` ¶

`get_transform_init_args_names (self)` ¶

`class CropNonEmptyMaskIfExists` `(height, width, ignore_values=None, ignore_channels=None, always_apply=None, p=1.0)` [view source on GitHub] ¶

`get_transform_init_args_names (self)` ¶

`update_params (self, params, **kwargs)` ¶

`class RandomCrop` `(height, width, p=1.0, always_apply=None)` [view source on GitHub] ¶

`get_params_dependent_on_data (self, params, data)` ¶

`get_transform_init_args_names (self)` ¶

`class RandomCropFromBorders` `(crop_left=0.1, crop_right=0.1, crop_top=0.1, crop_bottom=0.1, always_apply=None, p=1.0)` [view source on GitHub] ¶

`get_params_dependent_on_data (self, params, data)` ¶

`get_transform_init_args_names (self)` ¶

`class RandomCropNearBBox` `(max_part_shift=(0, 0.3), cropping_bbox_key='cropping_bbox', cropping_box_key=None, always_apply=None, p=1.0)` [view source on GitHub] ¶

`targets_as_params: list[str]` `property` `readonly` ¶

`get_params_dependent_on_data (self, params, data)` ¶

`get_transform_init_args_names (self)` ¶

`class RandomResizedCrop` `(size=None, width=None, height=None, *, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1, always_apply=None, p=1.0)` [view source on GitHub] ¶

`get_params_dependent_on_data (self, params, data)` ¶

`get_transform_init_args_names (self)` ¶

`class RandomSizedBBoxSafeCrop` `(height, width, erosion_rate=0.0, interpolation=1, always_apply=None, p=1.0)` [view source on GitHub] ¶

`apply (self, img, crop_coords, **params)` ¶

`get_transform_init_args_names (self)` ¶

`class RandomSizedCrop` `(min_max_height, size=None, width=None, height=None, *, w2h_ratio=1.0, interpolation=1, always_apply=None, p=1.0)` [view source on GitHub] ¶

`get_params_dependent_on_data (self, params, data)` ¶

`get_transform_init_args_names (self)` ¶

`domain_adaptation` ¶

`class FDA` `(reference_images, beta_limit=(0, 0.1), read_fn=read_rgb_image, always_apply=None, p=0.5)` [view source on GitHub] ¶

`apply (self, img, target_image, beta, **params)` ¶

`get_params (self)` ¶

`get_params_dependent_on_data (self, params, data)` ¶

`get_transform_init_args_names (self)` ¶