Skip to content

Full API Reference on a single page

Pixel-level transforms

Here is a list of all available pixel-level transforms. You can apply a pixel-level transform to any target, and under the hood, the transform will change only the input image and return any other input targets such as masks, bounding boxes, or keypoints unchanged.

Spatial-level transforms

Here is a table with spatial-level transforms and targets they support. If you try to apply a spatial-level transform to an unsupported target, Albumentations will raise an error.

Transform Image Mask BBoxes Keypoints Global Label
Affine
BBoxSafeRandomCrop
CenterCrop
CoarseDropout
Crop
CropAndPad
CropNonEmptyMaskIfExists
D4
ElasticTransform
Flip
GridDistortion
GridDropout
HorizontalFlip
Lambda
LongestMaxSize
MaskDropout
MixUp
Morphological
NoOp
OpticalDistortion
PadIfNeeded
Perspective
PiecewiseAffine
PixelDropout
RandomCrop
RandomCropFromBorders
RandomGridShuffle
RandomResizedCrop
RandomRotate90
RandomScale
RandomSizedBBoxSafeCrop
RandomSizedCrop
Resize
Rotate
SafeRotate
ShiftScaleRotate
SmallestMaxSize
Transpose
VerticalFlip
XYMasking

augmentations special

blur special

transforms

class AdvancedBlur (blur_limit=(3, 7), sigma_x_limit=(0.2, 1.0), sigma_y_limit=(0.2, 1.0), sigmaX_limit=None, sigmaY_limit=None, rotate_limit=90, beta_limit=(0.5, 8.0), noise_limit=(0.9, 1.1), always_apply=False, p=0.5) [view source on GitHub]

Blurs the input image using a Generalized Normal filter with randomly selected parameters.

This transform also adds multiplicative noise to the generated kernel before convolution, affecting the image in a unique way that combines blurring and noise injection for enhanced data augmentation.

Parameters:

Name Type Description
blur_limit ScaleIntType

Maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1. If a single value is provided, blur_limit will be in the range (0, blur_limit). Defaults to (3, 7).

sigma_x_limit ScaleFloatType

Gaussian kernel standard deviation for the X dimension. Must be in range [0, inf). If a single value is provided, sigma_x_limit will be in the range (0, sigma_limit). If set to 0, sigma will be computed as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. Defaults to (0.2, 1.0).

sigma_y_limit ScaleFloatType

Gaussian kernel standard deviation for the Y dimension. Must follow the same rules as sigma_x_limit. Defaults to (0.2, 1.0).

rotate_limit ScaleIntType

Range from which a random angle used to rotate the Gaussian kernel is picked. If limit is a single int, an angle is picked from (-rotate_limit, rotate_limit). Defaults to (-90, 90).

beta_limit ScaleFloatType

Distribution shape parameter. 1 represents the normal distribution. Values below 1.0 make distribution tails heavier than normal, and values above 1.0 make it lighter than normal. Defaults to (0.5, 8.0).

noise_limit ScaleFloatType

Multiplicative factor that controls the strength of kernel noise. Must be positive and preferably centered around 1.0. If a single value is provided, noise_limit will be in the range (0, noise_limit). Defaults to (0.75, 1.25).

p float

Probability of applying the transform. Defaults to 0.5.

Reference

"Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data", available at https://arxiv.org/abs/2107.10833

Targets

image

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class AdvancedBlur(ImageOnlyTransform):
    """Blurs the input image using a Generalized Normal filter with randomly selected parameters.

    This transform also adds multiplicative noise to the generated kernel before convolution,
    affecting the image in a unique way that combines blurring and noise injection for enhanced
    data augmentation.

    Args:
        blur_limit (ScaleIntType, optional): Maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If a single value is provided, `blur_limit` will be in the range (0, blur_limit).
            Defaults to (3, 7).
        sigma_x_limit ScaleFloatType: Gaussian kernel standard deviation for the X dimension.
            Must be in range [0, inf). If a single value is provided, `sigma_x_limit` will be in the range
            (0, sigma_limit). If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`.
            Defaults to (0.2, 1.0).
        sigma_y_limit ScaleFloatType: Gaussian kernel standard deviation for the Y dimension.
            Must follow the same rules as `sigma_x_limit`.
            Defaults to (0.2, 1.0).
        rotate_limit (ScaleIntType, optional): Range from which a random angle used to rotate the Gaussian kernel
            is picked. If limit is a single int, an angle is picked from (-rotate_limit, rotate_limit).
            Defaults to (-90, 90).
        beta_limit (ScaleFloatType, optional): Distribution shape parameter. 1 represents the normal distribution.
            Values below 1.0 make distribution tails heavier than normal, and values above 1.0 make it
            lighter than normal.
            Defaults to (0.5, 8.0).
        noise_limit (ScaleFloatType, optional): Multiplicative factor that controls the strength of kernel noise.
            Must be positive and preferably centered around 1.0. If a single value is provided,
            `noise_limit` will be in the range (0, noise_limit).
            Defaults to (0.75, 1.25).
        p (float, optional): Probability of applying the transform.
            Defaults to 0.5.

    Reference:
        "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data",
        available at https://arxiv.org/abs/2107.10833

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BlurInitSchema):
        sigma_x_limit: NonNegativeFloatRangeType = (0.2, 1.0)
        sigma_y_limit: NonNegativeFloatRangeType = (0.2, 1.0)
        beta_limit: NonNegativeFloatRangeType = (0.5, 8.0)
        noise_limit: NonNegativeFloatRangeType = (0.75, 1.25)
        rotate_limit: SymmetricRangeType = (-90, 90)

        @field_validator("beta_limit")
        @classmethod
        def check_beta_limit(cls, value: ScaleFloatType) -> Tuple[float, float]:
            result = to_tuple(value, low=0)
            if not (result[0] < 1.0 < result[1]):
                msg = "beta_limit is expected to include 1.0."
                raise ValueError(msg)
            return result

        @model_validator(mode="after")
        def validate_limits(self) -> Self:
            if (
                isinstance(self.sigma_x_limit, (tuple, list))
                and self.sigma_x_limit[0] == 0
                and isinstance(self.sigma_y_limit, (tuple, list))
                and self.sigma_y_limit[0] == 0
            ):
                msg = "sigma_x_limit and sigma_y_limit minimum value cannot be both equal to 0."
                raise ValueError(msg)
            return self

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_x_limit: ScaleFloatType = (0.2, 1.0),
        sigma_y_limit: ScaleFloatType = (0.2, 1.0),
        sigmaX_limit: Optional[ScaleFloatType] = None,  # noqa: N803
        sigmaY_limit: Optional[ScaleFloatType] = None,  # noqa: N803
        rotate_limit: ScaleIntType = 90,
        beta_limit: ScaleFloatType = (0.5, 8.0),
        noise_limit: ScaleFloatType = (0.9, 1.1),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)

        if sigmaX_limit is not None:
            warnings.warn("sigmaX_limit is deprecated; use sigma_x_limit instead.", DeprecationWarning, stacklevel=2)
            sigma_x_limit = sigmaX_limit

        if sigmaY_limit is not None:
            warnings.warn("sigmaY_limit is deprecated; use sigma_y_limit instead.", DeprecationWarning, stacklevel=2)
            sigma_y_limit = sigmaY_limit

        self.blur_limit = cast(Tuple[int, int], blur_limit)
        self.sigma_x_limit = cast(Tuple[float, float], sigma_x_limit)
        self.sigma_y_limit = cast(Tuple[float, float], sigma_y_limit)
        self.rotate_limit = cast(Tuple[int, int], rotate_limit)
        self.beta_limit = cast(Tuple[float, float], beta_limit)
        self.noise_limit = cast(Tuple[float, float], noise_limit)

    def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.convolve(img, kernel=kernel)

    def get_params(self) -> Dict[str, np.ndarray]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
        sigma_x = random_utils.uniform(*self.sigma_x_limit)
        sigma_y = random_utils.uniform(*self.sigma_y_limit)
        angle = np.deg2rad(random.uniform(*self.rotate_limit))

        # Split into 2 cases to avoid selection of narrow kernels (beta > 1) too often.
        beta = (
            random.uniform(self.beta_limit[0], 1) if random.random() < HALF else random.uniform(1, self.beta_limit[1])
        )

        noise_matrix = random_utils.uniform(self.noise_limit[0], self.noise_limit[1], size=[ksize, ksize])

        # Generate mesh grid centered at zero.
        ax = np.arange(-ksize // 2 + 1.0, ksize // 2 + 1.0)
        # > Shape (ksize, ksize, 2)
        grid = np.stack(np.meshgrid(ax, ax), axis=-1)

        # Calculate rotated sigma matrix
        d_matrix = np.array([[sigma_x**2, 0], [0, sigma_y**2]])
        u_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
        sigma_matrix = np.dot(u_matrix, np.dot(d_matrix, u_matrix.T))

        inverse_sigma = np.linalg.inv(sigma_matrix)
        # Described in "Parameter Estimation For Multivariate Generalized Gaussian Distributions"
        kernel = np.exp(-0.5 * np.power(np.sum(np.dot(grid, inverse_sigma) * grid, 2), beta))
        # Add noise
        kernel *= noise_matrix

        # Normalize kernel
        kernel = kernel.astype(np.float32) / np.sum(kernel)
        return {"kernel": kernel}

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str, str]:
        return (
            "blur_limit",
            "sigma_x_limit",
            "sigma_y_limit",
            "rotate_limit",
            "beta_limit",
            "noise_limit",
        )
apply (self, img, kernel, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.convolve(img, kernel=kernel)
get_params (self)

Returns parameters independent of input

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params(self) -> Dict[str, np.ndarray]:
    ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
    sigma_x = random_utils.uniform(*self.sigma_x_limit)
    sigma_y = random_utils.uniform(*self.sigma_y_limit)
    angle = np.deg2rad(random.uniform(*self.rotate_limit))

    # Split into 2 cases to avoid selection of narrow kernels (beta > 1) too often.
    beta = (
        random.uniform(self.beta_limit[0], 1) if random.random() < HALF else random.uniform(1, self.beta_limit[1])
    )

    noise_matrix = random_utils.uniform(self.noise_limit[0], self.noise_limit[1], size=[ksize, ksize])

    # Generate mesh grid centered at zero.
    ax = np.arange(-ksize // 2 + 1.0, ksize // 2 + 1.0)
    # > Shape (ksize, ksize, 2)
    grid = np.stack(np.meshgrid(ax, ax), axis=-1)

    # Calculate rotated sigma matrix
    d_matrix = np.array([[sigma_x**2, 0], [0, sigma_y**2]])
    u_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
    sigma_matrix = np.dot(u_matrix, np.dot(d_matrix, u_matrix.T))

    inverse_sigma = np.linalg.inv(sigma_matrix)
    # Described in "Parameter Estimation For Multivariate Generalized Gaussian Distributions"
    kernel = np.exp(-0.5 * np.power(np.sum(np.dot(grid, inverse_sigma) * grid, 2), beta))
    # Add noise
    kernel *= noise_matrix

    # Normalize kernel
    kernel = kernel.astype(np.float32) / np.sum(kernel)
    return {"kernel": kernel}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str, str]:
    return (
        "blur_limit",
        "sigma_x_limit",
        "sigma_y_limit",
        "rotate_limit",
        "beta_limit",
        "noise_limit",
    )
class Blur (blur_limit=7, always_apply=False, p=0.5) [view source on GitHub]

Blur the input image using a random-sized kernel.

Parameters:

Name Type Description
blur_limit Union[int, Tuple[int, int]]

maximum kernel size for blurring the input image. Should be in range [3, inf). Default: (3, 7).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class Blur(ImageOnlyTransform):
    """Blur the input image using a random-sized kernel.

    Args:
        blur_limit: maximum kernel size for blurring the input image.
            Should be in range [3, inf). Default: (3, 7).
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BlurInitSchema):
        pass

    def __init__(self, blur_limit: ScaleIntType = 7, always_apply: bool = False, p: float = 0.5):
        super().__init__(always_apply, p)
        self.blur_limit = cast(Tuple[int, int], blur_limit)

    def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
        return fblur.blur(img, kernel)

    def get_params(self) -> Dict[str, Any]:
        return {"kernel": random_utils.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))}

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("blur_limit",)
apply (self, img, kernel, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
    return fblur.blur(img, kernel)
get_params (self)

Returns parameters independent of input

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params(self) -> Dict[str, Any]:
    return {"kernel": random_utils.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, ...]:
    return ("blur_limit",)
class Defocus (radius=(3, 10), alias_blur=(0.1, 0.5), always_apply=False, p=0.5) [view source on GitHub]

Apply defocus transform.

Parameters:

Name Type Description
radius int, int) or int

range for radius of defocusing. If limit is a single int, the range will be [1, limit]. Default: (3, 10).

alias_blur float, float) or float

range for alias_blur of defocusing (sigma of gaussian blur). If limit is a single float, the range will be (0, limit). Default: (0.1, 0.5).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: unit8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class Defocus(ImageOnlyTransform):
    """Apply defocus transform.

    Args:
        radius ((int, int) or int): range for radius of defocusing.
            If limit is a single int, the range will be [1, limit]. Default: (3, 10).
        alias_blur ((float, float) or float): range for alias_blur of defocusing (sigma of gaussian blur).
            If limit is a single float, the range will be (0, limit). Default: (0.1, 0.5).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        unit8, float32

    Reference:
        https://arxiv.org/abs/1903.12261
    """

    class InitSchema(BaseTransformInitSchema):
        radius: OnePlusIntRangeType = (3, 10)
        alias_blur: NonNegativeFloatRangeType = (0.1, 0.5)

    def __init__(
        self,
        radius: ScaleIntType = (3, 10),
        alias_blur: ScaleFloatType = (0.1, 0.5),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.radius = cast(Tuple[int, int], radius)
        self.alias_blur = cast(Tuple[float, float], alias_blur)

    def apply(self, img: np.ndarray, radius: int, alias_blur: float, **params: Any) -> np.ndarray:
        return fblur.defocus(img, radius, alias_blur)

    def get_params(self) -> Dict[str, Any]:
        return {
            "radius": random_utils.randint(self.radius[0], self.radius[1] + 1),
            "alias_blur": random_utils.uniform(self.alias_blur[0], self.alias_blur[1]),
        }

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("radius", "alias_blur")
apply (self, img, radius, alias_blur, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, radius: int, alias_blur: float, **params: Any) -> np.ndarray:
    return fblur.defocus(img, radius, alias_blur)
get_params (self)

Returns parameters independent of input

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params(self) -> Dict[str, Any]:
    return {
        "radius": random_utils.randint(self.radius[0], self.radius[1] + 1),
        "alias_blur": random_utils.uniform(self.alias_blur[0], self.alias_blur[1]),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, str]:
    return ("radius", "alias_blur")
class GaussianBlur (blur_limit=(3, 7), sigma_limit=0, always_apply=False, p=0.5) [view source on GitHub]

Blur the input image using a Gaussian filter with a random kernel size.

Parameters:

Name Type Description
blur_limit int, (int, int

maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1. If set single value blur_limit will be in range (0, blur_limit). Default: (3, 7).

sigma_limit float, (float, float

Gaussian kernel standard deviation. Must be in range [0, inf). If set single value sigma_limit will be in range (0, sigma_limit). If set to 0 sigma will be computed as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. Default: 0.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class GaussianBlur(ImageOnlyTransform):
    """Blur the input image using a Gaussian filter with a random kernel size.

    Args:
        blur_limit (int, (int, int)): maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If set single value `blur_limit` will be in range (0, blur_limit).
            Default: (3, 7).
        sigma_limit (float, (float, float)): Gaussian kernel standard deviation. Must be in range [0, inf).
            If set single value `sigma_limit` will be in range (0, sigma_limit).
            If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BlurInitSchema):
        sigma_limit: NonNegativeFloatRangeType = 0

        @field_validator("blur_limit")
        @classmethod
        def process_blur(cls, value: ScaleIntType, info: ValidationInfo) -> Tuple[int, int]:
            return process_blur_limit(value, info, min_value=0)

        @model_validator(mode="after")
        def validate_limits(self) -> Self:
            if (
                isinstance(self.blur_limit, (tuple, list))
                and self.blur_limit[0] == 0
                and isinstance(self.sigma_limit, (tuple, list))
                and self.sigma_limit[0] == 0
            ):
                self.blur_limit = 3, max(3, self.blur_limit[1])
                warnings.warn(
                    "blur_limit and sigma_limit minimum value can not be both equal to 0. "
                    "blur_limit minimum value changed to 3.",
                    stacklevel=2,
                )

            if isinstance(self.blur_limit, tuple):
                for v in self.blur_limit:
                    if v != 0 and v % 2 != 1:
                        raise ValueError(f"Blur limit must be 0 or odd. Got: {self.blur_limit}")

            return self

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_limit: ScaleFloatType = 0,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.blur_limit = cast(Tuple[int, int], blur_limit)
        self.sigma_limit = cast(Tuple[float, float], sigma_limit)

    def apply(self, img: np.ndarray, ksize: int, sigma: float, **params: Any) -> np.ndarray:
        return fblur.gaussian_blur(img, ksize, sigma=sigma)

    def get_params(self) -> Dict[str, float]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1)
        if ksize != 0 and ksize % 2 != 1:
            ksize = (ksize + 1) % (self.blur_limit[1] + 1)

        return {"ksize": ksize, "sigma": random.uniform(*self.sigma_limit)}

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("blur_limit", "sigma_limit")
apply (self, img, ksize, sigma, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, ksize: int, sigma: float, **params: Any) -> np.ndarray:
    return fblur.gaussian_blur(img, ksize, sigma=sigma)
get_params (self)

Returns parameters independent of input

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params(self) -> Dict[str, float]:
    ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1)
    if ksize != 0 and ksize % 2 != 1:
        ksize = (ksize + 1) % (self.blur_limit[1] + 1)

    return {"ksize": ksize, "sigma": random.uniform(*self.sigma_limit)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, str]:
    return ("blur_limit", "sigma_limit")
class GlassBlur (sigma=0.7, max_delta=4, iterations=2, mode='fast', always_apply=False, p=0.5) [view source on GitHub]

Apply glass noise to the input image.

Parameters:

Name Type Description
sigma float

standard deviation for Gaussian kernel.

max_delta int

max distance between pixels which are swapped.

iterations int

number of repeats. Should be in range [1, inf). Default: (2).

mode str

mode of computation: fast or exact. Default: "fast".

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class GlassBlur(ImageOnlyTransform):
    """Apply glass noise to the input image.

    Args:
        sigma (float): standard deviation for Gaussian kernel.
        max_delta (int): max distance between pixels which are swapped.
        iterations (int): number of repeats.
            Should be in range [1, inf). Default: (2).
        mode (str): mode of computation: fast or exact. Default: "fast".
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://arxiv.org/abs/1903.12261
        https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

    """

    class InitSchema(BaseTransformInitSchema):
        sigma: float = Field(default=0.7, ge=0, description="Standard deviation for the Gaussian kernel.")
        max_delta: int = Field(default=4, ge=1, description="Maximum distance between pixels that are swapped.")
        iterations: int = Field(default=2, ge=1, description="Number of times the glass noise effect is applied.")
        mode: Literal["fast", "exact"] = "fast"

    def __init__(
        self,
        sigma: float = 0.7,
        max_delta: int = 4,
        iterations: int = 2,
        mode: Literal["fast", "exact"] = "fast",
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        self.sigma = sigma
        self.max_delta = max_delta
        self.iterations = iterations
        self.mode = mode

    def apply(self, img: np.ndarray, *args: Any, dxy: np.ndarray, **params: Any) -> np.ndarray:
        if dxy is None:
            msg = "dxy is None"
            raise ValueError(msg)

        return fblur.glass_blur(img, self.sigma, self.max_delta, self.iterations, dxy, self.mode)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, np.ndarray]:
        img = params["image"]

        height, width = img.shape[:2]

        # generate array containing all necessary values for transformations
        width_pixels = height - self.max_delta * 2
        height_pixels = width - self.max_delta * 2
        total_pixels = int(width_pixels * height_pixels)
        dxy = random_utils.randint(-self.max_delta, self.max_delta, size=(total_pixels, self.iterations, 2))

        return {"dxy": dxy}

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return ("sigma", "max_delta", "iterations", "mode")

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]
targets_as_params: List[str] property readonly

Targets used to get params

apply (self, img, *args, *, dxy, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, *args: Any, dxy: np.ndarray, **params: Any) -> np.ndarray:
    if dxy is None:
        msg = "dxy is None"
        raise ValueError(msg)

    return fblur.glass_blur(img, self.sigma, self.max_delta, self.iterations, dxy, self.mode)
get_params_dependent_on_targets (self, params)

Returns parameters dependent on targets. Dependent target is defined in self.targets_as_params

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, np.ndarray]:
    img = params["image"]

    height, width = img.shape[:2]

    # generate array containing all necessary values for transformations
    width_pixels = height - self.max_delta * 2
    height_pixels = width - self.max_delta * 2
    total_pixels = int(width_pixels * height_pixels)
    dxy = random_utils.randint(-self.max_delta, self.max_delta, size=(total_pixels, self.iterations, 2))

    return {"dxy": dxy}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
    return ("sigma", "max_delta", "iterations", "mode")
class MedianBlur (blur_limit=7, always_apply=False, p=0.5) [view source on GitHub]

Blur the input image using a median filter with a random aperture linear size.

Parameters:

Name Type Description
blur_limit int

maximum aperture linear size for blurring the input image. Must be odd and in range [3, inf). Default: (3, 7).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class MedianBlur(Blur):
    """Blur the input image using a median filter with a random aperture linear size.

    Args:
        blur_limit (int): maximum aperture linear size for blurring the input image.
            Must be odd and in range [3, inf). Default: (3, 7).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def __init__(self, blur_limit: ScaleIntType = 7, always_apply: bool = False, p: float = 0.5):
        super().__init__(blur_limit, always_apply, p)

    def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
        return fblur.median_blur(img, kernel)
__init__ (self, blur_limit=7, always_apply=False, p=0.5) special

Initialize self. See help(type(self)) for accurate signature.

Source code in albumentations/augmentations/blur/transforms.py
Python
def __init__(self, blur_limit: ScaleIntType = 7, always_apply: bool = False, p: float = 0.5):
    super().__init__(blur_limit, always_apply, p)
apply (self, img, kernel, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
    return fblur.median_blur(img, kernel)
class MotionBlur (blur_limit=7, allow_shifted=True, always_apply=False, p=0.5) [view source on GitHub]

Apply motion blur to the input image using a random-sized kernel.

Parameters:

Name Type Description
blur_limit int

maximum kernel size for blurring the input image. Should be in range [3, inf). Default: (3, 7).

allow_shifted bool

if set to true creates non shifted kernels only, otherwise creates randomly shifted kernels. Default: True.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class MotionBlur(Blur):
    """Apply motion blur to the input image using a random-sized kernel.

    Args:
        blur_limit (int): maximum kernel size for blurring the input image.
            Should be in range [3, inf). Default: (3, 7).
        allow_shifted (bool): if set to true creates non shifted kernels only,
            otherwise creates randomly shifted kernels. Default: True.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BaseTransformInitSchema):
        allow_shifted: bool = Field(
            default=True,
            description="If set to true creates non-shifted kernels only, otherwise creates randomly shifted kernels.",
        )
        blur_limit: ScaleIntType = Field(
            default=(3, 7),
            description="Maximum kernel size for blurring the input image.",
        )

        @model_validator(mode="after")
        def process_blur(self) -> Self:
            self.blur_limit = cast(Tuple[int, int], to_tuple(self.blur_limit, 3))

            if self.allow_shifted and isinstance(self.blur_limit, tuple) and any(x % 2 != 1 for x in self.blur_limit):
                raise ValueError(f"Blur limit must be odd when centered=True. Got: {self.blur_limit}")

            return self

    def __init__(
        self,
        blur_limit: ScaleIntType = 7,
        allow_shifted: bool = True,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(blur_limit=blur_limit, always_apply=always_apply, p=p)
        self.allow_shifted = allow_shifted
        self.blur_limit = cast(Tuple[int, int], blur_limit)

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (*super().get_transform_init_args_names(), "allow_shifted")

    def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.convolve(img, kernel=kernel)

    def get_params(self) -> Dict[str, Any]:
        ksize = random.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))
        if ksize <= TWO:
            raise ValueError(f"ksize must be > 2. Got: {ksize}")
        kernel = np.zeros((ksize, ksize), dtype=np.uint8)
        x1, x2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)
        if x1 == x2:
            y1, y2 = random.sample(range(ksize), 2)
        else:
            y1, y2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)

        def make_odd_val(v1: int, v2: int) -> Tuple[int, int]:
            len_v = abs(v1 - v2) + 1
            if len_v % 2 != 1:
                if v2 > v1:
                    v2 -= 1
                else:
                    v1 -= 1
            return v1, v2

        if not self.allow_shifted:
            x1, x2 = make_odd_val(x1, x2)
            y1, y2 = make_odd_val(y1, y2)

            xc = (x1 + x2) / 2
            yc = (y1 + y2) / 2

            center = ksize / 2 - 0.5
            dx = xc - center
            dy = yc - center
            x1, x2 = (int(i - dx) for i in [x1, x2])
            y1, y2 = (int(i - dy) for i in [y1, y2])

        cv2.line(kernel, (x1, y1), (x2, y2), 1, thickness=1)

        # Normalize kernel
        return {"kernel": kernel.astype(np.float32) / np.sum(kernel)}
apply (self, img, kernel, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.convolve(img, kernel=kernel)
get_params (self)

Returns parameters independent of input

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params(self) -> Dict[str, Any]:
    ksize = random.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))
    if ksize <= TWO:
        raise ValueError(f"ksize must be > 2. Got: {ksize}")
    kernel = np.zeros((ksize, ksize), dtype=np.uint8)
    x1, x2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)
    if x1 == x2:
        y1, y2 = random.sample(range(ksize), 2)
    else:
        y1, y2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)

    def make_odd_val(v1: int, v2: int) -> Tuple[int, int]:
        len_v = abs(v1 - v2) + 1
        if len_v % 2 != 1:
            if v2 > v1:
                v2 -= 1
            else:
                v1 -= 1
        return v1, v2

    if not self.allow_shifted:
        x1, x2 = make_odd_val(x1, x2)
        y1, y2 = make_odd_val(y1, y2)

        xc = (x1 + x2) / 2
        yc = (y1 + y2) / 2

        center = ksize / 2 - 0.5
        dx = xc - center
        dy = yc - center
        x1, x2 = (int(i - dx) for i in [x1, x2])
        y1, y2 = (int(i - dy) for i in [y1, y2])

    cv2.line(kernel, (x1, y1), (x2, y2), 1, thickness=1)

    # Normalize kernel
    return {"kernel": kernel.astype(np.float32) / np.sum(kernel)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, ...]:
    return (*super().get_transform_init_args_names(), "allow_shifted")
class ZoomBlur (max_factor=(1, 1.31), step_factor=(0.01, 0.03), always_apply=False, p=0.5) [view source on GitHub]

Apply zoom blur transform.

Parameters:

Name Type Description
max_factor float, float) or float

range for max factor for blurring. If max_factor is a single float, the range will be (1, limit). Default: (1, 1.31). All max_factor values should be larger than 1.

step_factor float, float) or float

If single float will be used as step parameter for np.arange. If tuple of float step_factor will be in range [step_factor[0], step_factor[1]). Default: (0.01, 0.03). All step_factor values should be positive.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: unit8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class ZoomBlur(ImageOnlyTransform):
    """Apply zoom blur transform.

    Args:
        max_factor ((float, float) or float): range for max factor for blurring.
            If max_factor is a single float, the range will be (1, limit). Default: (1, 1.31).
            All max_factor values should be larger than 1.
        step_factor ((float, float) or float): If single float will be used as step parameter for np.arange.
            If tuple of float step_factor will be in range `[step_factor[0], step_factor[1])`. Default: (0.01, 0.03).
            All step_factor values should be positive.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        unit8, float32

    Reference:
        https://arxiv.org/abs/1903.12261
    """

    class InitSchema(BaseTransformInitSchema):
        max_factor: OnePlusFloatRangeType = (1, 1.31)
        step_factor: NonNegativeFloatRangeType = (0.01, 0.03)

    def __init__(
        self,
        max_factor: ScaleFloatType = (1, 1.31),
        step_factor: ScaleFloatType = (0.01, 0.03),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.max_factor = cast(Tuple[float, float], max_factor)
        self.step_factor = cast(Tuple[float, float], step_factor)

    def apply(self, img: np.ndarray, zoom_factors: np.ndarray, **params: Any) -> np.ndarray:
        return fblur.zoom_blur(img, zoom_factors)

    def get_params(self) -> Dict[str, Any]:
        max_factor = random.uniform(self.max_factor[0], self.max_factor[1])
        step_factor = random.uniform(self.step_factor[0], self.step_factor[1])
        return {"zoom_factors": np.arange(1.0, max_factor, step_factor)}

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("max_factor", "step_factor")
apply (self, img, zoom_factors, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, zoom_factors: np.ndarray, **params: Any) -> np.ndarray:
    return fblur.zoom_blur(img, zoom_factors)
get_params (self)

Returns parameters independent of input

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params(self) -> Dict[str, Any]:
    max_factor = random.uniform(self.max_factor[0], self.max_factor[1])
    step_factor = random.uniform(self.step_factor[0], self.step_factor[1])
    return {"zoom_factors": np.arange(1.0, max_factor, step_factor)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, str]:
    return ("max_factor", "step_factor")

crops special

functional

def bbox_crop (bbox, x_min, y_min, x_max, y_max, rows, cols) [view source on GitHub]

Crop a bounding box using the provided coordinates.

Parameters:

Name Type Description
bbox BoxInternalType

A bounding box in the format (x_min, y_min, x_max, y_max).

x_min int

The minimum x-coordinate of the crop area.

y_min int

The minimum y-coordinate of the crop area.

x_max int

The maximum x-coordinate of the crop area.

y_max int

The maximum y-coordinate of the crop area.

rows int

The number of rows (height) in the original image.

cols int

The number of columns (width) in the original image.

Returns:

Type Description
BoxInternalType

A cropped bounding box in the format (x_min, y_min, x_max, y_max).

Source code in albumentations/augmentations/crops/functional.py
Python
def bbox_crop(
    bbox: BoxInternalType,
    x_min: int,
    y_min: int,
    x_max: int,
    y_max: int,
    rows: int,
    cols: int,
) -> BoxInternalType:
    """Crop a bounding box using the provided coordinates.

    Args:
        bbox (BoxInternalType): A bounding box in the format `(x_min, y_min, x_max, y_max)`.
        x_min (int): The minimum x-coordinate of the crop area.
        y_min (int): The minimum y-coordinate of the crop area.
        x_max (int): The maximum x-coordinate of the crop area.
        y_max (int): The maximum y-coordinate of the crop area.
        rows (int): The number of rows (height) in the original image.
        cols (int): The number of columns (width) in the original image.

    Returns:
        BoxInternalType: A cropped bounding box in the format `(x_min, y_min, x_max, y_max)`.
    """
    crop_coords = x_min, y_min, x_max, y_max
    crop_height = y_max - y_min
    crop_width = x_max - x_min
    return crop_bbox_by_coords(bbox, crop_coords, crop_height, crop_width, rows, cols)
def crop_bbox_by_coords (bbox, crop_coords, crop_height, crop_width, rows, cols) [view source on GitHub]

Crop a bounding box using the provided coordinates of the crop area and the original image dimensions.

Parameters:

Name Type Description
bbox BoxInternalType

A bounding box in the format (x_min, y_min, x_max, y_max).

crop_coords Tuple[int, int, int, int]

The coordinates of the crop area in the format (x1, y1, x2, y2).

crop_height int

The height of the crop area in pixels.

crop_width int

The width of the crop area in pixels.

rows int

The number of rows (height) in the original image.

cols int

The number of columns (width) in the original image.

Returns:

Type Description
BoxInternalType

A cropped bounding box in the format (x_min, y_min, x_max, y_max).

Source code in albumentations/augmentations/crops/functional.py
Python
def crop_bbox_by_coords(
    bbox: BoxInternalType,
    crop_coords: Tuple[int, int, int, int],
    crop_height: int,
    crop_width: int,
    rows: int,
    cols: int,
) -> BoxInternalType:
    """Crop a bounding box using the provided coordinates of the crop area and the original image dimensions.

    Args:
        bbox (BoxInternalType): A bounding box in the format `(x_min, y_min, x_max, y_max)`.
        crop_coords (Tuple[int, int, int, int]): The coordinates of the crop area in the format `(x1, y1, x2, y2)`.
        crop_height (int): The height of the crop area in pixels.
        crop_width (int): The width of the crop area in pixels.
        rows (int): The number of rows (height) in the original image.
        cols (int): The number of columns (width) in the original image.

    Returns:
        BoxInternalType: A cropped bounding box in the format `(x_min, y_min, x_max, y_max)`.
    """
    normalized_bbox = denormalize_bbox(bbox, rows, cols)
    x_min, y_min, x_max, y_max = normalized_bbox[:4]
    x1, y1 = crop_coords[:2]
    cropped_bbox = x_min - x1, y_min - y1, x_max - x1, y_max - y1
    return cast(BoxInternalType, normalize_bbox(cropped_bbox, crop_height, crop_width))
def crop_keypoint_by_coords (keypoint, crop_coords) [view source on GitHub]

Crop a keypoint using the provided coordinates of bottom-left and top-right corners in pixels and the required height and width of the crop.

Parameters:

Name Type Description
keypoint tuple

A keypoint (x, y, angle, scale).

crop_coords tuple

Crop box coords (x1, x2, y1, y2).

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Source code in albumentations/augmentations/crops/functional.py
Python
def crop_keypoint_by_coords(
    keypoint: KeypointInternalType,
    crop_coords: Tuple[int, int, int, int],
) -> KeypointInternalType:
    """Crop a keypoint using the provided coordinates of bottom-left and top-right corners in pixels and the
    required height and width of the crop.

    Args:
        keypoint (tuple): A keypoint `(x, y, angle, scale)`.
        crop_coords (tuple): Crop box coords `(x1, x2, y1, y2)`.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    """
    x, y, angle, scale = keypoint[:4]
    x1, y1 = crop_coords[:2]
    return x - x1, y - y1, angle, scale
def keypoint_center_crop (keypoint, crop_height, crop_width, rows, cols) [view source on GitHub]

Keypoint center crop.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

crop_height int

Crop height.

crop_width int

Crop width.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Source code in albumentations/augmentations/crops/functional.py
Python
def keypoint_center_crop(
    keypoint: KeypointInternalType,
    crop_height: int,
    crop_width: int,
    rows: int,
    cols: int,
) -> KeypointInternalType:
    """Keypoint center crop.

    Args:
        keypoint: A keypoint `(x, y, angle, scale)`.
        crop_height: Crop height.
        crop_width: Crop width.
        rows: Image height.
        cols: Image width.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    """
    crop_coords = get_center_crop_coords(rows, cols, crop_height, crop_width)
    return crop_keypoint_by_coords(keypoint, crop_coords)
def keypoint_random_crop (keypoint, crop_height, crop_width, h_start, w_start, rows, cols) [view source on GitHub]

Keypoint random crop.

Parameters:

Name Type Description
keypoint Tuple[float, float, float, float]

(tuple): A keypoint (x, y, angle, scale).

crop_height int

Crop height.

crop_width int

Crop width.

h_start int

Crop height start.

w_start int

Crop width start.

rows int

Image height.

cols int

Image width.

Returns:

Type Description
Tuple[float, float, float, float]

A keypoint (x, y, angle, scale).

Source code in albumentations/augmentations/crops/functional.py
Python
def keypoint_random_crop(
    keypoint: KeypointInternalType,
    crop_height: int,
    crop_width: int,
    h_start: float,
    w_start: float,
    rows: int,
    cols: int,
) -> KeypointInternalType:
    """Keypoint random crop.

    Args:
        keypoint: (tuple): A keypoint `(x, y, angle, scale)`.
        crop_height (int): Crop height.
        crop_width (int): Crop width.
        h_start (int): Crop height start.
        w_start (int): Crop width start.
        rows (int): Image height.
        cols (int): Image width.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    """
    crop_coords = get_random_crop_coords(rows, cols, crop_height, crop_width, h_start, w_start)
    return crop_keypoint_by_coords(keypoint, crop_coords)

transforms

class BBoxSafeRandomCrop (erosion_rate=0.0, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input without loss of bboxes.

Parameters:

Name Type Description
erosion_rate float

erosion rate applied on input image height before crop.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class BBoxSafeRandomCrop(DualTransform):
    """Crop a random part of the input without loss of bboxes.

    Args:
        erosion_rate: erosion rate applied on input image height before crop.
        p: probability of applying the transform. Default: 1.
    Targets:
        image, mask, bboxes
    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)

    class InitSchema(BaseTransformInitSchema):
        erosion_rate: float = Field(
            default=0.0,
            ge=0.0,
            le=1.0,
            description="Erosion rate applied on input image height before crop.",
        )
        p: ProbabilityType = 1

    def __init__(self, erosion_rate: float = 0.0, always_apply: bool = False, p: float = 1.0):
        super().__init__(always_apply, p)
        self.erosion_rate = erosion_rate

    def apply(
        self,
        img: np.ndarray,
        crop_height: int,
        crop_width: int,
        h_start: int,
        w_start: int,
        **params: Any,
    ) -> np.ndarray:
        return fcrops.random_crop(img, crop_height, crop_width, h_start, w_start)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Union[int, float]]:
        img_h, img_w = params["image"].shape[:2]
        if len(params["bboxes"]) == 0:  # less likely, this class is for use with bboxes.
            erosive_h = int(img_h * (1.0 - self.erosion_rate))
            crop_height = img_h if erosive_h >= img_h else random.randint(erosive_h, img_h)
            return {
                "h_start": random.random(),
                "w_start": random.random(),
                "crop_height": crop_height,
                "crop_width": int(crop_height * img_w / img_h),
            }
        # get union of all bboxes
        x, y, x2, y2 = union_of_bboxes(
            width=img_w,
            height=img_h,
            bboxes=params["bboxes"],
            erosion_rate=self.erosion_rate,
        )
        # find bigger region
        bx, by = x * random.random(), y * random.random()
        bx2, by2 = x2 + (1 - x2) * random.random(), y2 + (1 - y2) * random.random()
        bw, bh = bx2 - bx, by2 - by
        crop_height = img_h if bh >= 1.0 else int(img_h * bh)
        crop_width = img_w if bw >= 1.0 else int(img_w * bw)
        h_start = np.clip(0.0 if bh >= 1.0 else by / (1.0 - bh), 0.0, 1.0)
        w_start = np.clip(0.0 if bw >= 1.0 else bx / (1.0 - bw), 0.0, 1.0)
        return {"h_start": h_start, "w_start": w_start, "crop_height": crop_height, "crop_width": crop_width}

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        crop_height: int,
        crop_width: int,
        h_start: int,
        w_start: int,
        rows: int,
        cols: int,
        **params: Any,
    ) -> BoxInternalType:
        return fcrops.bbox_random_crop(bbox, crop_height, crop_width, h_start, w_start, rows, cols)

    @property
    def targets_as_params(self) -> List[str]:
        return ["image", "bboxes"]

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("erosion_rate",)

    @property
    def targets(self) -> Dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "bboxes": self.apply_to_bboxes,
        }
targets_as_params: List[str] property readonly

Targets used to get params

apply (self, img, crop_height, crop_width, h_start, w_start, **params)

Apply transform on image.

Source code in albumentations/augmentations/crops/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    crop_height: int,
    crop_width: int,
    h_start: int,
    w_start: int,
    **params: Any,
) -> np.ndarray:
    return fcrops.random_crop(img, crop_height, crop_width, h_start, w_start)
get_params_dependent_on_targets (self, params)

Returns parameters dependent on targets. Dependent target is defined in self.targets_as_params

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Union[int, float]]:
    img_h, img_w = params["image"].shape[:2]
    if len(params["bboxes"]) == 0:  # less likely, this class is for use with bboxes.
        erosive_h = int(img_h * (1.0 - self.erosion_rate))
        crop_height = img_h if erosive_h >= img_h else random.randint(erosive_h, img_h)
        return {
            "h_start": random.random(),
            "w_start": random.random(),
            "crop_height": crop_height,
            "crop_width": int(crop_height * img_w / img_h),
        }
    # get union of all bboxes
    x, y, x2, y2 = union_of_bboxes(
        width=img_w,
        height=img_h,
        bboxes=params["bboxes"],
        erosion_rate=self.erosion_rate,
    )
    # find bigger region
    bx, by = x * random.random(), y * random.random()
    bx2, by2 = x2 + (1 - x2) * random.random(), y2 + (1 - y2) * random.random()
    bw, bh = bx2 - bx, by2 - by
    crop_height = img_h if bh >= 1.0 else int(img_h * bh)
    crop_width = img_w if bw >= 1.0 else int(img_w * bw)
    h_start = np.clip(0.0 if bh >= 1.0 else by / (1.0 - bh), 0.0, 1.0)
    w_start = np.clip(0.0 if bw >= 1.0 else bx / (1.0 - bw), 0.0, 1.0)
    return {"h_start": h_start, "w_start": w_start, "crop_height": crop_height, "crop_width": crop_width}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, ...]:
    return ("erosion_rate",)
class CenterCrop (height, width, always_apply=False, p=1.0) [view source on GitHub]

Crop the central part of the input.

Parameters:

Name Type Description
height int

height of the crop.

width int

width of the crop.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class CenterCrop(DualTransform):
    """Crop the central part of the input.

    Args:
        height: height of the crop.
        width: width of the crop.
        p: probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(CropInitSchema):
        pass

    def __init__(self, height: int, width: int, always_apply: bool = False, p: float = 1.0):
        super().__init__(always_apply, p)
        self.height = height
        self.width = width

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return fcrops.center_crop(img, self.height, self.width)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return fcrops.bbox_center_crop(bbox, self.height, self.width, **params)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return fcrops.keypoint_center_crop(keypoint, self.height, self.width, **params)

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("height", "width")
apply (self, img, **params)

Apply transform on image.

Source code in albumentations/augmentations/crops/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return fcrops.center_crop(img, self.height, self.width)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, str]:
    return ("height", "width")
class Crop (x_min=0, y_min=0, x_max=1024, y_max=1024, always_apply=False, p=1.0) [view source on GitHub]

Crop region from image.

Parameters:

Name Type Description
x_min int

Minimum upper left x coordinate.

y_min int

Minimum upper left y coordinate.

x_max int

Maximum lower right x coordinate.

y_max int

Maximum lower right y coordinate.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class Crop(DualTransform):
    """Crop region from image.

    Args:
        x_min: Minimum upper left x coordinate.
        y_min: Minimum upper left y coordinate.
        x_max: Maximum lower right x coordinate.
        y_max: Maximum lower right y coordinate.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        x_min: Annotated[int, Field(ge=0, description="Minimum upper left x coordinate")]
        y_min: Annotated[int, Field(ge=0, description="Minimum upper left y coordinate")]
        x_max: Annotated[int, Field(gt=0, description="Maximum lower right x coordinate")]
        y_max: Annotated[int, Field(gt=0, description="Maximum lower right y coordinate")]
        p: ProbabilityType = 1

        @model_validator(mode="after")
        def validate_coordinates(self) -> Self:
            if not self.x_min < self.x_max:
                msg = "x_max must be greater than x_min"
                raise ValueError(msg)
            if not self.y_min < self.y_max:
                msg = "y_max must be greater than y_min"
                raise ValueError(msg)
            return self

    def __init__(
        self,
        x_min: int = 0,
        y_min: int = 0,
        x_max: int = 1024,
        y_max: int = 1024,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(always_apply, p)
        self.x_min = x_min
        self.y_min = y_min
        self.x_max = x_max
        self.y_max = y_max

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return fcrops.crop(img, x_min=self.x_min, y_min=self.y_min, x_max=self.x_max, y_max=self.y_max)

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return fcrops.bbox_crop(bbox, x_min=self.x_min, y_min=self.y_min, x_max=self.x_max, y_max=self.y_max, **params)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return fcrops.crop_keypoint_by_coords(keypoint, crop_coords=(self.x_min, self.y_min, self.x_max, self.y_max))

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return ("x_min", "y_min", "x_max", "y_max")
apply (self, img, **params)

Apply transform on image.

Source code in albumentations/augmentations/crops/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return fcrops.crop(img, x_min=self.x_min, y_min=self.y_min, x_max=self.x_max, y_max=self.y_max)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
    return ("x_min", "y_min", "x_max", "y_max")
class CropAndPad (px=None, percent=None, pad_mode=0, pad_cval=0, pad_cval_mask=0, keep_size=True, sample_independently=True, interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Crop and pad images by pixel amounts or fractions of image sizes. Cropping removes pixels at the sides (i.e., extracts a subimage from a given full image). Padding adds pixels to the sides (e.g., black pixels). This transformation will never crop images below a height or width of 1.

Note

This transformation automatically resizes images back to their original size. To deactivate this, add the parameter keep_size=False.

Parameters:

Name Type Description
px int, Tuple[int, int], Tuple[int, int, int, int], Tuple[Union[int, Tuple[int, int], List[int]], Union[int, Tuple[int, int], List[int]], Union[int, Tuple[int, int], List[int]], Union[int, Tuple[int, int], List[int]]]

The number of pixels to crop (negative values) or pad (positive values) on each side of the image. Either this or the parameter percent may be set, not both at the same time.

* If `None`, then pixel-based cropping/padding will not be used.
* If `int`, then that exact number of pixels will always be cropped/padded.
* If a `tuple` of two `int`s with values `a` and `b`, then each side will be cropped/padded by a
    random amount sampled uniformly per image and side from the interval `[a, b]`.
    If `sample_independently` is set to `False`, only one value will be sampled per
        image and used for all sides.
* If a `tuple` of four entries, then the entries represent top, right, bottom, and left.
    Each entry may be:
    - A single `int` (always crop/pad by exactly that value).
    - A `tuple` of two `int`s `a` and `b` (crop/pad by an amount within `[a, b]`).
    - A `list` of `int`s (crop/pad by a random value that is contained in the `list`).
percent float, Tuple[float, float], Tuple[float, float, float, float], Tuple[Union[float, Tuple[float, float], List[float]], Union[float, Tuple[float, float], List[float]], Union[float, Tuple[float, float], List[float]], Union[float, Tuple[float, float], List[float]]]

The number of pixels to crop (negative values) or pad (positive values) on each side of the image given as a fraction of the image height/width. E.g. if this is set to -0.1, the transformation will always crop away 10% of the image's height at both the top and the bottom (both 10% each), as well as 10% of the width at the right and left. Expected value range is (-1.0, inf). Either this or the parameter px may be set, not both at the same time.

* If `None`, then fraction-based cropping/padding will not be used.
* If `float`, then that fraction will always be cropped/padded.
* If a `tuple` of two `float`s with values `a` and `b`, then each side will be cropped/padded by a
random fraction sampled uniformly per image and side from the interval `[a, b]`.
If `sample_independently` is set to `False`, only one value will be sampled per image and used
for all sides.
* If a `tuple` of four entries, then the entries represent top, right, bottom, and left.
    Each entry may be:
    - A single `float` (always crop/pad by exactly that percent value).
    - A `tuple` of two `float`s `a` and `b` (crop/pad by a fraction from `[a, b]`).
    - A `list` of `float`s (crop/pad by a random value that is contained in the `list`).
pad_mode int

OpenCV border mode.

pad_cval Union[int, float, Tuple[Union[int, float], Union[int, float]], List[Union[int, float]]]

The constant value to use if the pad mode is BORDER_CONSTANT. * If number, then that value will be used. * If a tuple of two numbers and at least one of them is a float, then a random number will be uniformly sampled per image from the continuous interval [a, b] and used as the value. If both numbers are ints, the interval is discrete. * If a list of numbers, then a random value will be chosen from the elements of the list and used as the value.

pad_cval_mask Union[int, float, Tuple[Union[int, float], Union[int, float]], List[Union[int, float]]]

Same as pad_cval but only for masks.

keep_size bool

After cropping and padding, the resulting image will usually have a different height/width compared to the original input image. If this parameter is set to True, then the cropped/padded image will be resized to the input image's size, i.e., the output shape is always identical to the input shape.

sample_independently bool

If False and the values for px/percent result in exactly one probability distribution for all image sides, only one single value will be sampled from that probability distribution and used for all sides. I.e., the crop/pad amount then is the same for all sides. If True, four values will be sampled independently, one per side.

interpolation int

OpenCV flag that is used to specify the interpolation algorithm for images. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

Targets

image, mask, bboxes, keypoints

Image types: unit8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class CropAndPad(DualTransform):
    """Crop and pad images by pixel amounts or fractions of image sizes.
    Cropping removes pixels at the sides (i.e., extracts a subimage from a given full image).
    Padding adds pixels to the sides (e.g., black pixels).
    This transformation will never crop images below a height or width of 1.

    Note:
        This transformation automatically resizes images back to their original size. To deactivate this, add the
        parameter `keep_size=False`.

    Args:
        px (int,
            Tuple[int, int],
            Tuple[int, int, int, int],
            Tuple[Union[int, Tuple[int, int], List[int]],
                  Union[int, Tuple[int, int], List[int]],
                  Union[int, Tuple[int, int], List[int]],
                  Union[int, Tuple[int, int], List[int]]]):
            The number of pixels to crop (negative values) or pad (positive values) on each side of the image.
                Either this or the parameter `percent` may be set, not both at the same time.

                * If `None`, then pixel-based cropping/padding will not be used.
                * If `int`, then that exact number of pixels will always be cropped/padded.
                * If a `tuple` of two `int`s with values `a` and `b`, then each side will be cropped/padded by a
                    random amount sampled uniformly per image and side from the interval `[a, b]`.
                    If `sample_independently` is set to `False`, only one value will be sampled per
                        image and used for all sides.
                * If a `tuple` of four entries, then the entries represent top, right, bottom, and left.
                    Each entry may be:
                    - A single `int` (always crop/pad by exactly that value).
                    - A `tuple` of two `int`s `a` and `b` (crop/pad by an amount within `[a, b]`).
                    - A `list` of `int`s (crop/pad by a random value that is contained in the `list`).

        percent (float,
                 Tuple[float, float],
                 Tuple[float, float, float, float],
                 Tuple[Union[float, Tuple[float, float], List[float]],
                       Union[float, Tuple[float, float], List[float]],
                       Union[float, Tuple[float, float], List[float]],
                       Union[float, Tuple[float, float], List[float]]]):
            The number of pixels to crop (negative values) or pad (positive values) on each side of the image given
                as a *fraction* of the image height/width. E.g. if this is set to `-0.1`, the transformation will
                always crop away `10%` of the image's height at both the top and the bottom (both `10%` each),
                as well as `10%` of the width at the right and left. Expected value range is `(-1.0, inf)`.
                Either this or the parameter `px` may be set, not both at the same time.

                * If `None`, then fraction-based cropping/padding will not be used.
                * If `float`, then that fraction will always be cropped/padded.
                * If a `tuple` of two `float`s with values `a` and `b`, then each side will be cropped/padded by a
                random fraction sampled uniformly per image and side from the interval `[a, b]`.
                If `sample_independently` is set to `False`, only one value will be sampled per image and used
                for all sides.
                * If a `tuple` of four entries, then the entries represent top, right, bottom, and left.
                    Each entry may be:
                    - A single `float` (always crop/pad by exactly that percent value).
                    - A `tuple` of two `float`s `a` and `b` (crop/pad by a fraction from `[a, b]`).
                    - A `list` of `float`s (crop/pad by a random value that is contained in the `list`).

        pad_mode (int): OpenCV border mode.
        pad_cval (Union[int, float, Tuple[Union[int, float], Union[int, float]], List[Union[int, float]]]):
            The constant value to use if the pad mode is `BORDER_CONSTANT`.
                * If `number`, then that value will be used.
                * If a `tuple` of two numbers and at least one of them is a `float`, then a random number
                    will be uniformly sampled per image from the continuous interval `[a, b]` and used as the value.
                    If both numbers are `int`s, the interval is discrete.
                * If a `list` of numbers, then a random value will be chosen from the elements of the `list` and
                    used as the value.

        pad_cval_mask (Union[int, float, Tuple[Union[int, float], Union[int, float]], List[Union[int, float]]]):
            Same as `pad_cval` but only for masks.

        keep_size (bool):
            After cropping and padding, the resulting image will usually have a different height/width compared to
            the original input image. If this parameter is set to `True`, then the cropped/padded image will be
            resized to the input image's size, i.e., the output shape is always identical to the input shape.

        sample_independently (bool):
            If `False` and the values for `px`/`percent` result in exactly one probability distribution for all
            image sides, only one single value will be sampled from that probability distribution and used for
            all sides. I.e., the crop/pad amount then is the same for all sides. If `True`, four values
            will be sampled independently, one per side.

        interpolation (int):
            OpenCV flag that is used to specify the interpolation algorithm for images. Should be one of:
            `cv2.INTER_NEAREST`, `cv2.INTER_LINEAR`, `cv2.INTER_CUBIC`, `cv2.INTER_AREA`, `cv2.INTER_LANCZOS4`.
            Default: `cv2.INTER_LINEAR`.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        unit8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        px: Optional[PxType] = Field(
            default=None,
            description="Number of pixels to crop (negative) or pad (positive).",
        )
        percent: Optional[PercentType] = Field(
            default=None,
            description="Fraction of image size to crop (negative) or pad (positive).",
        )
        pad_mode: BorderModeType = cv2.BORDER_CONSTANT
        pad_cval: Union[ScalarType, Tuple[ScalarType, ScalarType], List[ScalarType]] = Field(
            default=0,
            description="Padding value if pad_mode is BORDER_CONSTANT.",
        )
        pad_cval_mask: Union[ScalarType, Tuple[ScalarType, ScalarType], List[ScalarType]] = Field(
            default=0,
            description="Padding value for masks if pad_mode is BORDER_CONSTANT.",
        )
        keep_size: bool = Field(
            default=True,
            description="Whether to resize the image back to the original size after cropping and padding.",
        )
        sample_independently: bool = Field(
            default=True,
            description="Whether to sample the crop/pad size independently for each side.",
        )
        interpolation: InterpolationType = cv2.INTER_LINEAR
        p: ProbabilityType = 1

        @model_validator(mode="after")
        def check_px_percent(self) -> Self:
            if self.px is None and self.percent is None:
                msg = "Both px and percent parameters cannot be None simultaneously."
                raise ValueError(msg)
            if self.px is not None and self.percent is not None:
                msg = "Only px or percent may be set!"
                raise ValueError(msg)
            return self

    def __init__(
        self,
        px: Optional[Union[int, List[int]]] = None,
        percent: Optional[Union[float, List[float]]] = None,
        pad_mode: int = cv2.BORDER_CONSTANT,
        pad_cval: Union[ScalarType, Tuple[ScalarType, ScalarType], List[ScalarType]] = 0,
        pad_cval_mask: Union[ScalarType, Tuple[ScalarType, ScalarType], List[ScalarType]] = 0,
        keep_size: bool = True,
        sample_independently: bool = True,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(always_apply, p)

        self.px = px
        self.percent = percent

        self.pad_mode = pad_mode
        self.pad_cval = pad_cval
        self.pad_cval_mask = pad_cval_mask

        self.keep_size = keep_size
        self.sample_independently = sample_independently

        self.interpolation = interpolation

    def apply(
        self,
        img: np.ndarray,
        crop_params: Sequence[int],
        pad_params: Sequence[int],
        pad_value: float,
        rows: int,
        cols: int,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_and_pad(
            img,
            crop_params,
            pad_params,
            pad_value,
            rows,
            cols,
            interpolation,
            self.pad_mode,
            self.keep_size,
        )

    def apply_to_mask(
        self,
        mask: np.ndarray,
        crop_params: Sequence[int],
        pad_params: Sequence[int],
        pad_value_mask: float,
        rows: int,
        cols: int,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_and_pad(
            mask,
            crop_params,
            pad_params,
            pad_value_mask,
            rows,
            cols,
            interpolation,
            self.pad_mode,
            self.keep_size,
        )

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        crop_params: Sequence[int],
        pad_params: Sequence[int],
        rows: int,
        cols: int,
        result_rows: int,
        result_cols: int,
        **params: Any,
    ) -> BoxInternalType:
        return fcrops.crop_and_pad_bbox(bbox, crop_params, pad_params, rows, cols, result_rows, result_cols)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        crop_params: Sequence[int],
        pad_params: Sequence[int],
        rows: int,
        cols: int,
        result_rows: int,
        result_cols: int,
        **params: Any,
    ) -> KeypointInternalType:
        return fcrops.crop_and_pad_keypoint(
            keypoint,
            crop_params,
            pad_params,
            rows,
            cols,
            result_rows,
            result_cols,
            self.keep_size,
        )

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    @staticmethod
    def __prevent_zero(val1: int, val2: int, max_val: int) -> Tuple[int, int]:
        regain = abs(max_val) + 1
        regain1 = regain // 2
        regain2 = regain // 2
        if regain1 + regain2 < regain:
            regain1 += 1

        if regain1 > val1:
            diff = regain1 - val1
            regain1 = val1
            regain2 += diff
        elif regain2 > val2:
            diff = regain2 - val2
            regain2 = val2
            regain1 += diff

        return val1 - regain1, val2 - regain2

    @staticmethod
    def _prevent_zero(crop_params: List[int], height: int, width: int) -> List[int]:
        top, right, bottom, left = crop_params

        remaining_height = height - (top + bottom)
        remaining_width = width - (left + right)

        if remaining_height < 1:
            top, bottom = CropAndPad.__prevent_zero(top, bottom, height)
        if remaining_width < 1:
            left, right = CropAndPad.__prevent_zero(left, right, width)

        return [max(top, 0), max(right, 0), max(bottom, 0), max(left, 0)]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        height, width = params["image"].shape[:2]

        if self.px is not None:
            new_params = self._get_px_params()
        else:
            percent_params = self._get_percent_params()
            new_params = [
                int(percent_params[0] * height),
                int(percent_params[1] * width),
                int(percent_params[2] * height),
                int(percent_params[3] * width),
            ]

        pad_params = [max(i, 0) for i in new_params]

        crop_params = self._prevent_zero([-min(i, 0) for i in new_params], height, width)

        top, right, bottom, left = crop_params
        crop_params = [left, top, width - right, height - bottom]
        result_rows = crop_params[3] - crop_params[1]
        result_cols = crop_params[2] - crop_params[0]
        if result_cols == width and result_rows == height:
            crop_params = []

        top, right, bottom, left = pad_params
        pad_params = [top, bottom, left, right]
        if any(pad_params):
            result_rows += top + bottom
            result_cols += left + right
        else:
            pad_params = []

        return {
            "crop_params": crop_params or None,
            "pad_params": pad_params or None,
            "pad_value": None if pad_params is None else self._get_pad_value(self.pad_cval),
            "pad_value_mask": None if pad_params is None else self._get_pad_value(self.pad_cval_mask),
            "result_rows": result_rows,
            "result_cols": result_cols,
        }

    def _get_px_params(self) -> List[int]:
        if self.px is None:
            msg = "px is not set"
            raise ValueError(msg)

        if isinstance(self.px, int):
            params = [self.px] * 4
        elif len(self.px) == PAIR:
            if self.sample_independently:
                params = [random.randrange(*self.px) for _ in range(4)]
            else:
                px = random.randrange(*self.px)
                params = [px] * 4
        elif isinstance(self.px[0], int):
            params = self.px
        elif len(self.px[0]) == PAIR:
            params = [random.randrange(*i) for i in self.px]
        else:
            params = [random.choice(i) for i in self.px]

        return params

    def _get_percent_params(self) -> List[float]:
        if self.percent is None:
            msg = "percent is not set"
            raise ValueError(msg)

        if isinstance(self.percent, float):
            params = [self.percent] * 4
        elif len(self.percent) == PAIR:
            if self.sample_independently:
                params = [random.uniform(*self.percent) for _ in range(4)]
            else:
                px = random.uniform(*self.percent)
                params = [px] * 4
        elif isinstance(self.percent[0], (int, float)):
            params = self.percent
        elif len(self.percent[0]) == PAIR:
            params = [random.uniform(*i) for i in self.percent]
        else:
            params = [random.choice(i) for i in self.percent]

        return params  # params = [top, right, bottom, left]

    @staticmethod
    def _get_pad_value(
        pad_value: Union[ScalarType, Tuple[ScalarType, ScalarType], List[ScalarType]],
    ) -> ScalarType:
        if isinstance(pad_value, (int, float)):
            return pad_value

        if len(pad_value) == PAIR:
            a, b = pad_value
            if isinstance(a, int) and isinstance(b, int):
                return random.randint(a, b)

            return random.uniform(a, b)

        return random.choice(pad_value)

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (
            "px",
            "percent",
            "pad_mode",
            "pad_cval",
            "pad_cval_mask",
            "keep_size",
            "sample_independently",
            "interpolation",
        )
targets_as_params: List[str] property readonly

Targets used to get params

apply (self, img, crop_params, pad_params, pad_value, rows, cols, interpolation, **params)

Apply transform on image.

Source code in albumentations/augmentations/crops/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    crop_params: Sequence[int],
    pad_params: Sequence[int],
    pad_value: float,
    rows: int,
    cols: int,
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fcrops.crop_and_pad(
        img,
        crop_params,
        pad_params,
        pad_value,
        rows,
        cols,
        interpolation,
        self.pad_mode,
        self.keep_size,
    )
get_params_dependent_on_targets (self, params)

Returns parameters dependent on targets. Dependent target is defined in self.targets_as_params

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
    height, width = params["image"].shape[:2]

    if self.px is not None:
        new_params = self._get_px_params()
    else:
        percent_params = self._get_percent_params()
        new_params = [
            int(percent_params[0] * height),
            int(percent_params[1] * width),
            int(percent_params[2] * height),
            int(percent_params[3] * width),
        ]

    pad_params = [max(i, 0) for i in new_params]

    crop_params = self._prevent_zero([-min(i, 0) for i in new_params], height, width)

    top, right, bottom, left = crop_params
    crop_params = [left, top, width - right, height - bottom]
    result_rows = crop_params[3] - crop_params[1]
    result_cols = crop_params[2] - crop_params[0]
    if result_cols == width and result_rows == height:
        crop_params = []

    top, right, bottom, left = pad_params
    pad_params = [top, bottom, left, right]
    if any(pad_params):
        result_rows += top + bottom
        result_cols += left + right
    else:
        pad_params = []

    return {
        "crop_params": crop_params or None,
        "pad_params": pad_params or None,
        "pad_value": None if pad_params is None else self._get_pad_value(self.pad_cval),
        "pad_value_mask": None if pad_params is None else self._get_pad_value(self.pad_cval_mask),
        "result_rows": result_rows,
        "result_cols": result_cols,
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, ...]:
    return (
        "px",
        "percent",
        "pad_mode",
        "pad_cval",
        "pad_cval_mask",
        "keep_size",
        "sample_independently",
        "interpolation",
    )
class CropNonEmptyMaskIfExists (height, width, ignore_values=None, ignore_channels=None, always_apply=False, p=1.0) [view source on GitHub]

Crop area with mask if mask is non-empty, else make random crop.

Parameters:

Name Type Description
height int

vertical size of crop in pixels

width int

horizontal size of crop in pixels

ignore_values list of int

values to ignore in mask, 0 values are always ignored (e.g. if background value is 5 set ignore_values=[5] to ignore)

ignore_channels list of int

channels to ignore in mask (e.g. if background is a first channel set ignore_channels=[0] to ignore)

p float

probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class CropNonEmptyMaskIfExists(DualTransform):
    """Crop area with mask if mask is non-empty, else make random crop.

    Args:
        height: vertical size of crop in pixels
        width: horizontal size of crop in pixels
        ignore_values (list of int): values to ignore in mask, `0` values are always ignored
            (e.g. if background value is 5 set `ignore_values=[5]` to ignore)
        ignore_channels (list of int): channels to ignore in mask
            (e.g. if background is a first channel set `ignore_channels=[0]` to ignore)
        p: probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(CropInitSchema):
        ignore_values: Optional[List[int]] = Field(
            default=None,
            description="Values to ignore in mask, `0` values are always ignored",
        )
        ignore_channels: Optional[List[int]] = Field(default=None, description="Channels to ignore in mask")

    def __init__(
        self,
        height: int,
        width: int,
        ignore_values: Optional[List[int]] = None,
        ignore_channels: Optional[List[int]] = None,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(always_apply, p)

        self.height = height
        self.width = width
        self.ignore_values = ignore_values
        self.ignore_channels = ignore_channels

    def apply(
        self,
        img: np.ndarray,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop(img, x_min, y_min, x_max, y_max)

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> BoxInternalType:
        return fcrops.bbox_crop(
            bbox,
            x_min=x_min,
            x_max=x_max,
            y_min=y_min,
            y_max=y_max,
            rows=params["rows"],
            cols=params["cols"],
        )

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> KeypointInternalType:
        return fcrops.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))

    def _preprocess_mask(self, mask: np.ndarray) -> np.ndarray:
        mask_height, mask_width = mask.shape[:2]

        if self.ignore_values is not None:
            ignore_values_np = np.array(self.ignore_values)
            mask = np.where(np.isin(mask, ignore_values_np), 0, mask)

        if mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS and self.ignore_channels is not None:
            target_channels = np.array([ch for ch in range(mask.shape[-1]) if ch not in self.ignore_channels])
            mask = np.take(mask, target_channels, axis=-1)

        if self.height > mask_height or self.width > mask_width:
            raise ValueError(
                f"Crop size ({self.height},{self.width}) is larger than image ({mask_height},{mask_width})",
            )

        return mask

    def update_params(self, params: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
        super().update_params(params, **kwargs)
        if "mask" in kwargs:
            mask = self._preprocess_mask(kwargs["mask"])
        elif "masks" in kwargs and len(kwargs["masks"]):
            masks = kwargs["masks"]
            mask = self._preprocess_mask(np.copy(masks[0]))  # need copy as we perform in-place mod afterwards
            for m in masks[1:]:
                mask |= self._preprocess_mask(m)
        else:
            msg = "Can not find mask for CropNonEmptyMaskIfExists"
            raise RuntimeError(msg)

        mask_height, mask_width = mask.shape[:2]

        if mask.any():
            mask = mask.sum(axis=-1) if mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS else mask
            non_zero_yx = np.argwhere(mask)
            y, x = random.choice(non_zero_yx)
            x_min = x - random.randint(0, self.width - 1)
            y_min = y - random.randint(0, self.height - 1)
            x_min = np.clip(x_min, 0, mask_width - self.width)
            y_min = np.clip(y_min, 0, mask_height - self.height)
        else:
            x_min = random.randint(0, mask_width - self.width)
            y_min = random.randint(0, mask_height - self.height)

        x_max = x_min + self.width
        y_max = y_min + self.height

        params.update({"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max})
        return params

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return ("height", "width", "ignore_values", "ignore_channels")
apply (self, img, x_min, x_max, y_min, y_max, **params)

Apply transform on image.

Source code in albumentations/augmentations/crops/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    x_min: int,
    x_max: int,
    y_min: int,
    y_max: int,
    **params: Any,
) -> np.ndarray:
    return fcrops.crop(img, x_min, y_min, x_max, y_max)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
    return ("height", "width", "ignore_values", "ignore_channels")
update_params (self, params, **kwargs)

Update parameters with transform specific params

Source code in albumentations/augmentations/crops/transforms.py
Python
def update_params(self, params: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    super().update_params(params, **kwargs)
    if "mask" in kwargs:
        mask = self._preprocess_mask(kwargs["mask"])
    elif "masks" in kwargs and len(kwargs["masks"]):
        masks = kwargs["masks"]
        mask = self._preprocess_mask(np.copy(masks[0]))  # need copy as we perform in-place mod afterwards
        for m in masks[1:]:
            mask |= self._preprocess_mask(m)
    else:
        msg = "Can not find mask for CropNonEmptyMaskIfExists"
        raise RuntimeError(msg)

    mask_height, mask_width = mask.shape[:2]

    if mask.any():
        mask = mask.sum(axis=-1) if mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS else mask
        non_zero_yx = np.argwhere(mask)
        y, x = random.choice(non_zero_yx)
        x_min = x - random.randint(0, self.width - 1)
        y_min = y - random.randint(0, self.height - 1)
        x_min = np.clip(x_min, 0, mask_width - self.width)
        y_min = np.clip(y_min, 0, mask_height - self.height)
    else:
        x_min = random.randint(0, mask_width - self.width)
        y_min = random.randint(0, mask_height - self.height)

    x_max = x_min + self.width
    y_max = y_min + self.height

    params.update({"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max})
    return params
class RandomCrop (height, width, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input.

Parameters:

Name Type Description
height int

height of the crop.

width int

width of the crop.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomCrop(DualTransform):
    """Crop a random part of the input.

    Args:
        height: height of the crop.
        width: width of the crop.
        p: probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(CropInitSchema):
        pass

    def __init__(self, height: int, width: int, always_apply: bool = False, p: float = 1.0):
        super().__init__(always_apply, p)
        self.height = height
        self.width = width

    def apply(self, img: np.ndarray, h_start: int, w_start: int, **params: Any) -> np.ndarray:
        return fcrops.random_crop(img, self.height, self.width, h_start, w_start)

    def get_params(self) -> Dict[str, float]:
        return {"h_start": random.random(), "w_start": random.random()}

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return fcrops.bbox_random_crop(bbox, self.height, self.width, **params)

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
        return fcrops.keypoint_random_crop(keypoint, self.height, self.width, **params)

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("height", "width")
apply (self, img, h_start, w_start, **params)

Apply transform on image.

Source code in albumentations/augmentations/crops/transforms.py
Python
def apply(self, img: np.ndarray, h_start: int, w_start: int, **params: Any) -> np.ndarray:
    return fcrops.random_crop(img, self.height, self.width, h_start, w_start)
get_params (self)

Returns parameters independent of input

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params(self) -> Dict[str, float]:
    return {"h_start": random.random(), "w_start": random.random()}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, str]:
    return ("height", "width")
class RandomCropFromBorders (crop_left=0.1, crop_right=0.1, crop_top=0.1, crop_bottom=0.1, always_apply=False, p=1.0) [view source on GitHub]

Randomly crops parts of the image from the borders without resizing at the end. The cropped regions are defined as fractions of the original image dimensions, specified for each side of the image (left, right, top, bottom).

Parameters:

Name Type Description
crop_left float

Fraction of the width to randomly crop from the left side. Must be in the range [0.0, 1.0]. Default is 0.1.

crop_right float

Fraction of the width to randomly crop from the right side. Must be in the range [0.0, 1.0]. Default is 0.1.

crop_top float

Fraction of the height to randomly crop from the top side. Must be in the range [0.0, 1.0]. Default is 0.1.

crop_bottom float

Fraction of the height to randomly crop from the bottom side. Must be in the range [0.0, 1.0]. Default is 0.1.

p float

Probability of applying the transform. Default is 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomCropFromBorders(DualTransform):
    """Randomly crops parts of the image from the borders without resizing at the end. The cropped regions are defined
    as fractions of the original image dimensions, specified for each side of the image (left, right, top, bottom).

    Args:
        crop_left (float): Fraction of the width to randomly crop from the left side. Must be in the range [0.0, 1.0].
                            Default is 0.1.
        crop_right (float): Fraction of the width to randomly crop from the right side. Must be in the range [0.0, 1.0].
                            Default is 0.1.
        crop_top (float): Fraction of the height to randomly crop from the top side. Must be in the range [0.0, 1.0].
                          Default is 0.1.
        crop_bottom (float): Fraction of the height to randomly crop from the bottom side.
                             Must be in the range [0.0, 1.0]. Default is 0.1.
        p (float): Probability of applying the transform. Default is 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        crop_left: float = Field(
            default=0.1,
            ge=0.0,
            le=1.0,
            description="Fraction of width to randomly crop from the left side.",
        )
        crop_right: float = Field(
            default=0.1,
            ge=0.0,
            le=1.0,
            description="Fraction of width to randomly crop from the right side.",
        )
        crop_top: float = Field(
            default=0.1,
            ge=0.0,
            le=1.0,
            description="Fraction of height to randomly crop from the top side.",
        )
        crop_bottom: float = Field(
            default=0.1,
            ge=0.0,
            le=1.0,
            description="Fraction of height to randomly crop from the bottom side.",
        )
        p: ProbabilityType = 1

        @model_validator(mode="after")
        def validate_crop_values(self) -> Self:
            if self.crop_left + self.crop_right > 1.0:
                msg = "The sum of crop_left and crop_right must be <= 1."
                raise ValueError(msg)
            if self.crop_top + self.crop_bottom > 1.0:
                msg = "The sum of crop_top and crop_bottom must be <= 1."
                raise ValueError(msg)
            return self

    def __init__(
        self,
        crop_left: float = 0.1,
        crop_right: float = 0.1,
        crop_top: float = 0.1,
        crop_bottom: float = 0.1,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(always_apply, p)
        self.crop_left = crop_left
        self.crop_right = crop_right
        self.crop_top = crop_top
        self.crop_bottom = crop_bottom

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, int]:
        height, width = params["image"].shape[:2]

        x_min = random_utils.randint(0, int(self.crop_left * width) + 1)
        x_max = random_utils.randint(max(x_min + 1, int((1 - self.crop_right) * width)), width + 1)

        y_min = random_utils.randint(0, int(self.crop_top * height) + 1)
        y_max = random_utils.randint(max(y_min + 1, int((1 - self.crop_bottom) * height)), height + 1)

        return {"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max}

    def apply(
        self,
        img: np.ndarray,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> np.ndarray:
        return fcrops.clamping_crop(img, x_min, y_min, x_max, y_max)

    def apply_to_mask(
        self,
        mask: np.ndarray,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> np.ndarray:
        return fcrops.clamping_crop(mask, x_min, y_min, x_max, y_max)

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> BoxInternalType:
        rows, cols = params["rows"], params["cols"]
        return fcrops.bbox_crop(bbox, x_min, y_min, x_max, y_max, rows, cols)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> KeypointInternalType:
        return fcrops.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return "crop_left", "crop_right", "crop_top", "crop_bottom"
targets_as_params: List[str] property readonly

Targets used to get params

apply (self, img, x_min, x_max, y_min, y_max, **params)

Apply transform on image.

Source code in albumentations/augmentations/crops/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    x_min: int,
    x_max: int,
    y_min: int,
    y_max: int,
    **params: Any,
) -> np.ndarray:
    return fcrops.clamping_crop(img, x_min, y_min, x_max, y_max)
get_params_dependent_on_targets (self, params)

Returns parameters dependent on targets. Dependent target is defined in self.targets_as_params

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, int]:
    height, width = params["image"].shape[:2]

    x_min = random_utils.randint(0, int(self.crop_left * width) + 1)
    x_max = random_utils.randint(max(x_min + 1, int((1 - self.crop_right) * width)), width + 1)

    y_min = random_utils.randint(0, int(self.crop_top * height) + 1)
    y_max = random_utils.randint(max(y_min + 1, int((1 - self.crop_bottom) * height)), height + 1)

    return {"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, ...]:
    return "crop_left", "crop_right", "crop_top", "crop_bottom"
class RandomCropNearBBox (max_part_shift=(0, 0.3), cropping_bbox_key='cropping_bbox', cropping_box_key=None, always_apply=False, p=1.0) [view source on GitHub]

Crop bbox from image with random shift by x,y coordinates

Parameters:

Name Type Description
max_part_shift float, (float, float

Max shift in height and width dimensions relative to cropping_bbox dimension. If max_part_shift is a single float, the range will be (0, max_part_shift). Default (0, 0.3).

cropping_bbox_key str

Additional target key for cropping box. Default cropping_bbox.

cropping_box_key str

[Deprecated] Use cropping_bbox_key instead.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Examples:

Python
>>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_box')],
>>>              bbox_params=BboxParams("pascal_voc"))
>>> result = aug(image=image, bboxes=bboxes, test_box=[0, 5, 10, 20])

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomCropNearBBox(DualTransform):
    """Crop bbox from image with random shift by x,y coordinates

    Args:
        max_part_shift (float, (float, float)): Max shift in `height` and `width` dimensions relative
            to `cropping_bbox` dimension.
            If max_part_shift is a single float, the range will be (0, max_part_shift).
            Default (0, 0.3).
        cropping_bbox_key (str): Additional target key for cropping box. Default `cropping_bbox`.
        cropping_box_key (str): [Deprecated] Use `cropping_bbox_key` instead.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Examples:
        >>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_box')],
        >>>              bbox_params=BboxParams("pascal_voc"))
        >>> result = aug(image=image, bboxes=bboxes, test_box=[0, 5, 10, 20])

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        max_part_shift: ZeroOneRangeType = (0, 0.3)
        cropping_bbox_key: str = Field(default="cropping_bbox", description="Additional target key for cropping box.")
        p: ProbabilityType = 1

    def __init__(
        self,
        max_part_shift: ScaleFloatType = (0, 0.3),
        cropping_bbox_key: str = "cropping_bbox",
        cropping_box_key: Optional[str] = None,  # Deprecated
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(always_apply, p)
        # Check for deprecated parameter and issue warning
        if cropping_box_key is not None:
            warn(
                "The parameter 'cropping_box_key' is deprecated and will be removed in future versions. "
                "Use 'cropping_bbox_key' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Ensure the new parameter is used even if the old one is passed
            cropping_bbox_key = cropping_box_key

        self.max_part_shift = cast(Tuple[float, float], max_part_shift)
        self.cropping_bbox_key = cropping_bbox_key

    def apply(
        self,
        img: np.ndarray,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> np.ndarray:
        return fcrops.clamping_crop(img, x_min, y_min, x_max, y_max)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, int]:
        bbox = params[self.cropping_bbox_key]
        h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
        w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])

        x_min = bbox[0] - random.randint(-w_max_shift, w_max_shift)
        x_max = bbox[2] + random.randint(-w_max_shift, w_max_shift)

        y_min = bbox[1] - random.randint(-h_max_shift, h_max_shift)
        y_max = bbox[3] + random.randint(-h_max_shift, h_max_shift)

        x_min = max(0, x_min)
        y_min = max(0, y_min)

        return {"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max}

    def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
        return fcrops.bbox_crop(bbox, **params)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> KeypointInternalType:
        return fcrops.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))

    @property
    def targets_as_params(self) -> List[str]:
        return [self.cropping_bbox_key]

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("max_part_shift", "cropping_bbox_key")
targets_as_params: List[str] property readonly

Targets used to get params

apply (self, img, x_min, x_max, y_min, y_max, **params)

Apply transform on image.

Source code in albumentations/augmentations/crops/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    x_min: int,
    x_max: int,
    y_min: int,
    y_max: int,
    **params: Any,
) -> np.ndarray:
    return fcrops.clamping_crop(img, x_min, y_min, x_max, y_max)
get_params_dependent_on_targets (self, params)

Returns parameters dependent on targets. Dependent target is defined in self.targets_as_params

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, int]:
    bbox = params[self.cropping_bbox_key]
    h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
    w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])

    x_min = bbox[0] - random.randint(-w_max_shift, w_max_shift)
    x_max = bbox[2] + random.randint(-w_max_shift, w_max_shift)

    y_min = bbox[1] - random.randint(-h_max_shift, h_max_shift)
    y_max = bbox[3] + random.randint(-h_max_shift, h_max_shift)

    x_min = max(0, x_min)
    y_min = max(0, y_min)

    return {"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, str]:
    return ("max_part_shift", "cropping_bbox_key")
class RandomResizedCrop (size=None, width=None, height=None, *, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Torchvision's variant of crop a random part of the input and rescale it to some size.

Parameters:

Name Type Description
size int, int

target size for the output image, i.e. (height, width) after crop and resize

scale float, float

range of size of the origin size cropped

ratio float, float

range of aspect ratio of the origin aspect ratio cropped

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomResizedCrop(_BaseRandomSizedCrop):
    """Torchvision's variant of crop a random part of the input and rescale it to some size.

    Args:
        size (int, int): target size for the output image, i.e. (height, width) after crop and resize
        scale ((float, float)): range of size of the origin size cropped
        ratio ((float, float)): range of aspect ratio of the origin aspect ratio cropped
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        scale: Annotated[Tuple[float, float], AfterValidator(check_01)] = (0.08, 1.0)
        ratio: Annotated[Tuple[float, float], AfterValidator(check_0plus)] = (0.75, 1.3333333333333333)
        width: Optional[int] = Field(
            None,
            deprecated="Initializing with 'height' and 'width' is deprecated. Use size instead.",
        )
        height: Optional[int] = Field(
            None,
            deprecated="Initializing with 'height' and 'width' is deprecated. Use size instead.",
        )
        size: Optional[ScaleIntType] = None
        p: ProbabilityType = 1
        interpolation: InterpolationType = cv2.INTER_LINEAR

        @model_validator(mode="after")
        def process(self) -> Self:
            if isinstance(self.size, int):
                if isinstance(self.width, int):
                    self.size = (self.size, self.width)
                else:
                    msg = "If size is an integer, width as integer must be specified."
                    raise TypeError(msg)

            if self.size is None:
                if self.height is None or self.width is None:
                    message = "If 'size' is not provided, both 'height' and 'width' must be specified."
                    raise ValueError(message)
                self.size = (self.height, self.width)

            return self

    def __init__(
        self,
        # NOTE @zetyquickly: when (width, height) are deprecated, make 'size' non optional
        size: Optional[ScaleIntType] = None,
        width: Optional[int] = None,
        height: Optional[int] = None,
        *,
        scale: Tuple[float, float] = (0.08, 1.0),
        ratio: Tuple[float, float] = (0.75, 1.3333333333333333),
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(size=cast(Tuple[int, int], size), interpolation=interpolation, always_apply=always_apply, p=p)
        self.scale = scale
        self.ratio = ratio

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Union[int, float]]:
        img = params["image"]
        img_height, img_width = img.shape[:2]
        area = img_height * img_width

        for _ in range(10):
            target_area = random_utils.uniform(*self.scale) * area
            log_ratio = (math.log(self.ratio[0]), math.log(self.ratio[1]))
            aspect_ratio = math.exp(random_utils.uniform(*log_ratio))

            width = int(round(math.sqrt(target_area * aspect_ratio)))
            height = int(round(math.sqrt(target_area / aspect_ratio)))

            if 0 < width <= img_width and 0 < height <= img_height:
                i = random.randint(0, img_height - height)
                j = random.randint(0, img_width - width)
                return {
                    "crop_height": height,
                    "crop_width": width,
                    "h_start": i * 1.0 / (img_height - height + 1e-10),
                    "w_start": j * 1.0 / (img_width - width + 1e-10),
                }

        # Fallback to central crop
        in_ratio = img_width / img_height
        if in_ratio < min(self.ratio):
            width = img_width
            height = int(round(img_width / min(self.ratio)))
        elif in_ratio > max(self.ratio):
            height = img_height
            width = int(round(height * max(self.ratio)))
        else:  # whole image
            width = img_width
            height = img_height
        i = (img_height - height) // 2
        j = (img_width - width) // 2
        return {
            "crop_height": height,
            "crop_width": width,
            "h_start": i * 1.0 / (img_height - height + 1e-10),
            "w_start": j * 1.0 / (img_width - width + 1e-10),
        }

    def get_params(self) -> Dict[str, Any]:
        return {}

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return "size", "scale", "ratio", "interpolation"
targets_as_params: List[str] property readonly

Targets used to get params

get_params (self)

Returns parameters independent of input

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params(self) -> Dict[str, Any]:
    return {}
get_params_dependent_on_targets (self, params)

Returns parameters dependent on targets. Dependent target is defined in self.targets_as_params

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Union[int, float]]:
    img = params["image"]
    img_height, img_width = img.shape[:2]
    area = img_height * img_width

    for _ in range(10):
        target_area = random_utils.uniform(*self.scale) * area
        log_ratio = (math.log(self.ratio[0]), math.log(self.ratio[1]))
        aspect_ratio = math.exp(random_utils.uniform(*log_ratio))

        width = int(round(math.sqrt(target_area * aspect_ratio)))
        height = int(round(math.sqrt(target_area / aspect_ratio)))

        if 0 < width <= img_width and 0 < height <= img_height:
            i = random.randint(0, img_height - height)
            j = random.randint(0, img_width - width)
            return {
                "crop_height": height,
                "crop_width": width,
                "h_start": i * 1.0 / (img_height - height + 1e-10),
                "w_start": j * 1.0 / (img_width - width + 1e-10),
            }

    # Fallback to central crop
    in_ratio = img_width / img_height
    if in_ratio < min(self.ratio):
        width = img_width
        height = int(round(img_width / min(self.ratio)))
    elif in_ratio > max(self.ratio):
        height = img_height
        width = int(round(height * max(self.ratio)))
    else:  # whole image
        width = img_width
        height = img_height
    i = (img_height - height) // 2
    j = (img_width - width) // 2
    return {
        "crop_height": height,
        "crop_width": width,
        "h_start": i * 1.0 / (img_height - height + 1e-10),
        "w_start": j * 1.0 / (img_width - width + 1e-10),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, ...]:
    return "size", "scale", "ratio", "interpolation"
class RandomSizedBBoxSafeCrop (height, width, erosion_rate=0.0, interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Crop a random part of the input and rescale it to some size without loss of bboxes.

Parameters:

Name Type Description
height int

height after crop and resize.

width int

width after crop and resize.

erosion_rate float

erosion rate applied on input image height before crop.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomSizedBBoxSafeCrop(BBoxSafeRandomCrop):
    """Crop a random part of the input and rescale it to some size without loss of bboxes.

    Args:
        height: height after crop and resize.
        width: width after crop and resize.
        erosion_rate: erosion rate applied on input image height before crop.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.
    Targets:
        image, mask, bboxes
    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)

    class InitSchema(CropInitSchema):
        erosion_rate: float = Field(
            default=0.0,
            ge=0.0,
            le=1.0,
            description="Erosion rate applied on input image height before crop.",
        )
        interpolation: InterpolationType = cv2.INTER_LINEAR

    def __init__(
        self,
        height: int,
        width: int,
        erosion_rate: float = 0.0,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(erosion_rate, always_apply, p)
        self.height = height
        self.width = width
        self.interpolation = interpolation

    def apply(
        self,
        img: np.ndarray,
        crop_height: int,
        crop_width: int,
        h_start: int,
        w_start: int,
        **params: Any,
    ) -> np.ndarray:
        crop = fcrops.random_crop(img, crop_height, crop_width, h_start, w_start)
        return fgeometric.resize(crop, self.height, self.width, self.interpolation)

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return (*super().get_transform_init_args_names(), "height", "width", "interpolation")
apply (self, img, crop_height, crop_width, h_start, w_start, **params)

Apply transform on image.

Source code in albumentations/augmentations/crops/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    crop_height: int,
    crop_width: int,
    h_start: int,
    w_start: int,
    **params: Any,
) -> np.ndarray:
    crop = fcrops.random_crop(img, crop_height, crop_width, h_start, w_start)
    return fgeometric.resize(crop, self.height, self.width, self.interpolation)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, ...]:
    return (*super().get_transform_init_args_names(), "height", "width", "interpolation")
class RandomSizedCrop (min_max_height, size=None, width=None, height=None, *, w2h_ratio=1.0, interpolation=1, always_apply=False, p=1.0) [view source on GitHub]

Crop a random portion of the input and rescale it to a specific size.

Parameters:

Name Type Description
min_max_height int, int

crop size limits.

size int, int

target size for the output image, i.e. (height, width) after crop and resize

w2h_ratio float

aspect ratio of crop.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomSizedCrop(_BaseRandomSizedCrop):
    """Crop a random portion of the input and rescale it to a specific size.

    Args:
        min_max_height ((int, int)): crop size limits.
        size ((int, int)): target size for the output image, i.e. (height, width) after crop and resize
        w2h_ratio (float): aspect ratio of crop.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        interpolation: InterpolationType = cv2.INTER_LINEAR
        p: ProbabilityType = 1
        min_max_height: OnePlusIntRangeType
        w2h_ratio: Annotated[float, Field(gt=0, description="Aspect ratio of crop.")]
        width: Optional[int] = Field(
            None,
            deprecated=(
                "Initializing with 'size' as an integer and a separate 'width' is deprecated. "
                "Please use a tuple (height, width) for the 'size' argument."
            ),
        )
        height: Optional[int] = Field(
            None,
            deprecated=(
                "Initializing with 'height' and 'width' is deprecated. "
                "Please use a tuple (height, width) for the 'size' argument."
            ),
        )
        size: Optional[ScaleIntType] = None

        @model_validator(mode="after")
        def process(self) -> Self:
            if isinstance(self.size, int):
                if isinstance(self.width, int):
                    self.size = (self.size, self.width)
                else:
                    msg = "If size is an integer, width as integer must be specified."
                    raise TypeError(msg)

            if self.size is None:
                if self.height is None or self.width is None:
                    message = "If 'size' is not provided, both 'height' and 'width' must be specified."
                    raise ValueError(message)
                self.size = (self.height, self.width)
            return self

    def __init__(
        self,
        min_max_height: Tuple[int, int],
        # NOTE @zetyquickly: when (width, height) are deprecated, make 'size' non optional
        size: Optional[ScaleIntType] = None,
        width: Optional[int] = None,
        height: Optional[int] = None,
        *,
        w2h_ratio: float = 1.0,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(size=cast(Tuple[int, int], size), interpolation=interpolation, always_apply=always_apply, p=p)
        self.min_max_height = min_max_height
        self.w2h_ratio = w2h_ratio

    def get_params(self) -> Dict[str, Union[int, float]]:
        crop_height = random_utils.randint(self.min_max_height[0], self.min_max_height[1])
        return {
            "h_start": random.random(),
            "w_start": random.random(),
            "crop_height": crop_height,
            "crop_width": int(crop_height * self.w2h_ratio),
        }

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return "min_max_height", "size", "w2h_ratio", "interpolation"
get_params (self)

Returns parameters independent of input

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params(self) -> Dict[str, Union[int, float]]:
    crop_height = random_utils.randint(self.min_max_height[0], self.min_max_height[1])
    return {
        "h_start": random.random(),
        "w_start": random.random(),
        "crop_height": crop_height,
        "crop_width": int(crop_height * self.w2h_ratio),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> Tuple[str, ...]:
    return "min_max_height", "size", "w2h_ratio", "interpolation"

domain_adaptation

class FDA (reference_images, beta_limit=(0, 0.1), read_fn=<function read_rgb_image at 0x7f91e8c4d280>, always_apply=False, p=0.5) [view source on GitHub]

Fourier Domain Adaptation (FDA) for simple "style transfer" in the context of unsupervised domain adaptation (UDA). FDA manipulates the frequency components of images to reduce the domain gap between source and target datasets, effectively adapting images from one domain to closely resemble those from another without altering their semantic content.

This transform is particularly beneficial in scenarios where the training (source) and testing (target) images come from different distributions, such as synthetic versus real images, or day versus night scenes. Unlike traditional domain adaptation methods that may require complex adversarial training, FDA achieves domain alignment by swapping low-frequency components of the Fourier transform between the source and target images. This technique has shown to improve the performance of models on the target domain, particularly for tasks like semantic segmentation, without additional training for domain invariance.

The 'beta_limit' parameter controls the extent of frequency component swapping, with lower values preserving more of the original image's characteristics and higher values leading to more pronounced adaptation effects. It is recommended to use beta values less than 0.3 to avoid introducing artifacts.

Parameters:

Name Type Description
reference_images Sequence[Any]

Sequence of objects to be converted into images by read_fn. This typically involves paths to images that serve as target domain examples for adaptation.

beta_limit float or tuple of float

Coefficient beta from the paper, controlling the swapping extent of frequency components. Values should be less than 0.5.

read_fn Callable

User-defined function for reading images. It takes an element from reference_images and returns a numpy array of image pixels. By default, it is expected to take a path to an image and return a numpy array.

Targets

image

Image types: uint8, float32

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
>>> result = aug(image=image)

Note

FDA is a powerful tool for domain adaptation, particularly in unsupervised settings where annotated target domain samples are unavailable. It enables significant improvements in model generalization by aligning the low-level statistics of source and target images through a simple yet effective Fourier-based method.

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/domain_adaptation.py
Python
class FDA(ImageOnlyTransform):
    """Fourier Domain Adaptation (FDA) for simple "style transfer" in the context of unsupervised domain adaptation
    (UDA). FDA manipulates the frequency components of images to reduce the domain gap between source
    and target datasets, effectively adapting images from one domain to closely resemble those from another without
    altering their semantic content.

    This transform is particularly beneficial in scenarios where the training (source) and testing (target) images
    come from different distributions, such as synthetic versus real images, or day versus night scenes.
    Unlike traditional domain adaptation methods that may require complex adversarial training, FDA achieves domain
    alignment by swapping low-frequency components of the Fourier transform between the source and target images.
    This technique has shown to improve the performance of models on the target domain, particularly for tasks
    like semantic segmentation, without additional training for domain invariance.

    The 'beta_limit' parameter controls the extent of frequency component swapping, with lower values preserving more
    of the original image's characteristics and higher values leading to more pronounced adaptation effects.
    It is recommended to use beta values less than 0.3 to avoid introducing artifacts.

    Args:
        reference_images (Sequence[Any]): Sequence of objects to be converted into images by `read_fn`. This typically
            involves paths to images that serve as target domain examples for adaptation.
        beta_limit (float or tuple of float): Coefficient beta from the paper, controlling the swapping extent of
            frequency components. Values should be less than 0.5.
        read_fn (Callable): User-defined function for reading images. It takes an element from `reference_images` and
            returns a numpy array of image pixels. By default, it is expected to take a path to an image and return a
            numpy array.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        - https://github.com/YanchaoYang/FDA
        - https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_FDA_Fourier_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2020_paper.pdf

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
        >>> result = aug(image=image)

    Note:
        FDA is a powerful tool for domain adaptation, particularly in unsupervised settings where annotated target
        domain samples are unavailable. It enables significant improvements in model generalization by aligning
        the low-level statistics of source and target images through a simple yet effective Fourier-based method.
    """

    class InitSchema(BaseTransformInitSchema):
        reference_images: Sequence[Any]
        read_fn: Callable[[Any], np.ndarray]
        beta_limit: NonNegativeFloatRangeType = (0, 0.1)

        @field_validator("beta_limit")
        @classmethod
        def check_ranges(cls, value: Tuple[float, float]) -> Tuple[float, float]:
            bounds = 0, MAX_BETA_LIMIT
            if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
                raise ValueError(f"Values should be in the range {bounds} got {value} ")
            return value

    def __init__(
        self,
        reference_images: Sequence[Any],
        beta_limit: ScaleFloatType = (0, 0.1),
        read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        self.reference_images = reference_images
        self.read_fn = read_fn
        self.beta_limit = cast(Tuple[float, float], beta_limit)

    def apply(
        self,
        img: np.ndarray,
        target_image: np.ndarray,
        beta: float,
        **params: Any,
    ) -> np.ndarray:
        return fourier_domain_adaptation(img, target_image, beta)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, np.ndarray]:
        img = params["image"]
        target_img = self.read_fn(random.choice(self.reference_images))
        target_img = cv2.resize(target_img, dsize=(img.shape[1], img.shape[0]))

        return {"target_image": target_img}

    def get_params(self) -> Dict[str, float]:
        return {"beta": random.uniform(self.beta_limit[0], self.beta_limit[1])}

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return "reference_images", "beta_limit", "read_fn"

    def to_dict_private(self) -> Dict[str, Any]:
        msg = "FDA can not be serialized."
        raise NotImplementedError(msg)
targets_as_params: List[str] property readonly

Targets used to get params

apply (self, img, target_image, beta, **params)

Apply transform on image.

Source code in albumentations/augmentations/domain_adaptation.py
Python
def apply(
    self,
    img: np.ndarray,
    target_image: np.ndarray,
    beta: float,
    **params: Any,
) -> np.ndarray:
    return fourier_domain_adaptation(img, target_image, beta)
get_params (self)

Returns parameters independent of input

Source code in albumentations/augmentations/domain_adaptation.py
Python
def get_params(self) -> Dict[str, float]:
    return {"beta": random.uniform(self.beta_limit[0], self.beta_limit[1])}
get_params_dependent_on_targets (self, params)

Returns parameters dependent on targets. Dependent target is defined in self.targets_as_params

Source code in albumentations/augmentations/domain_adaptation.py
Python
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, np.ndarray]:
    img = params["image"]
    target_img = self.read_fn(random.choice(self.reference_images))
    target_img = cv2.resize(target_img, dsize=(img.shape[1], img.shape[0]))

    return {"target_image": target_img}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform

Source code in albumentations/augmentations/domain_adaptation.py
Python
def get_transform_init_args_names(self) -> Tuple[str, str, str]:
    return "reference_images", "beta_limit", "read_fn"

class HistogramMatching (reference_images, blend_ratio=(0.5, 1.0), read_fn=<function read_rgb_image at 0x7f91e8c4d280>, always_apply=False, p=0.5) [view source on GitHub]

Implements histogram matching, a technique that adjusts the pixel values of an input image to match the histogram of a reference image. This adjustment ensures that the output image has a similar tone and contrast to the reference. The process is applied independently to each channel of multi-channel images, provided both the input and reference images have the same number of channels.

Histogram matching serves as an effective normalization method in image processing tasks such as feature matching. It is particularly useful when images originate from varied sources or are captured under different lighting conditions, helping to standardize the images' appearance before further processing.

Parameters:

Name Type Description
reference_images Sequence[Any]

A sequence of objects to be converted into images by read_fn. Typically, this is a sequence of image paths.

blend_ratio Tuple[float, float]

Specifies the minimum and maximum blend ratio for blending the matched image with the original image. A random blend factor within this range is chosen for each image to increase the diversity of the output images.

read_fn Callable[[Any], np.ndarray]

A user-defined function for reading images, which accepts an element from reference_images and returns a numpy array of image pixels. By default, this is expected to take a file path and return an image as a numpy array.

p float

The probability of applying the transform to any given image. Defaults to 0.5.

Targets

image

Image types: uint8, float32

Note

This class cannot be serialized directly due to its dynamic nature and dependency on external image data. An attempt to serialize it will raise a NotImplementedError.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.HistogramMatching([target_image], p=1, read_fn=lambda x: x)])
>>> result = aug(image=image)

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/domain_adaptation.py
Python
class HistogramMatching(ImageOnlyTransform):
    """Implements histogram matching, a technique that adjusts the pixel values of an input image
    to match the histogram of a reference image. This adjustment ensures that the output image
    has a similar tone and contrast to the reference. The process is applied independently to
    each channel of multi-channel images, provided both the input and reference images have the
    same number of channels.

    Histogram matching serves as an effective normalization method in image processing tasks such
    as feature matching. It is particularly useful when images originate from varied sources or are
    captured under different lighting conditions, helping to standardize the images' appearance
    before further processing.

    Args:
        reference_images (Sequence[Any]): A sequence of objects to be converted into images by `read_fn`.
            Typically, this is a sequence of image paths.
        blend_ratio (Tuple[float, float]): Specifies the minimum and maximum blend ratio for blending the matched
            image with the original image. A random blend factor within this range is chosen for each image to
            increase the diversity of the output images.
        read_fn (Callable[[Any], np.ndarray]): A user-defined function for reading images, which accepts an
            element from `reference_images` and returns a numpy array of image pixels. By default, this is expected
            to take a file path and return an image as a numpy array.
        p (float): The probability of applying the transform to any given image. Defaults to 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        This class cannot be serialized directly due to its dynamic nature and dependency on external image data.
        An attempt to serialize it will raise a NotImplementedError.

    Reference:
        https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_histogram_matching.html

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> aug = A.Compose([A.HistogramMatching([target_image], p=1, read_fn=lambda x: x)])
        >>> result = aug(image=image)
    """

    class InitSchema(BaseTransformInitSchema):
        reference_images: Sequence[Any]
        blend_ratio: ZeroOneRangeType = (0.5, 1.0)
        read_fn: Callable[[Any], np.ndarray]

    def __init__(
        self,
        reference_images: Sequence[Any],
        blend_ratio: Tuple[float, float] = (0.5, 1.0),
        read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        self.reference_images = reference_images
        self.read_fn = read_fn
        self.blend_ratio = blend_ratio

    def apply(
        self: np.ndarray,
        img: np.ndarray,
        reference_image: np.ndarray,
        blend_ratio: float,
        **params: Any,
    ) -> np.ndarray:
        return apply_histogram(img, reference_image, blend_ratio)

    def get_params(self) -> Dict[str, np.ndarray]:
        return {
            "reference_image": self.read_fn(random.choice(self.reference_images)),
            "blend_ratio": random.uniform(self.blend_ratio[0], self.blend_ratio[1]),
        }

    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return ("reference_images", "blend_ratio",