Skip to content

Full API Reference on a single page

Pixel-level transforms

Here is a list of all available pixel-level transforms. You can apply a pixel-level transform to any target, and under the hood, the transform will change only the input image and return any other input targets such as masks, bounding boxes, or keypoints unchanged.

Spatial-level transforms

Here is a table with spatial-level transforms and targets they support. If you try to apply a spatial-level transform to an unsupported target, Albumentations will raise an error.

Transform Image Mask BBoxes Keypoints
Affine
BBoxSafeRandomCrop
CenterCrop
CoarseDropout
Crop
CropAndPad
CropNonEmptyMaskIfExists
D4
ElasticTransform
Erasing
FrequencyMasking
GridDistortion
GridDropout
GridElasticDeform
HorizontalFlip
Lambda
LongestMaxSize
MaskDropout
Morphological
NoOp
OpticalDistortion
OverlayElements
Pad
PadIfNeeded
Perspective
PiecewiseAffine
PixelDropout
RandomCrop
RandomCropFromBorders
RandomGridShuffle
RandomResizedCrop
RandomRotate90
RandomScale
RandomSizedBBoxSafeCrop
RandomSizedCrop
Resize
Rotate
RotateAndProject
SafeRotate
ShiftScaleRotate
SmallestMaxSize
ThinPlateSpline
TimeMasking
TimeReverse
Transpose
VerticalFlip
XYMasking

augmentations special

blur special

functional

def create_motion_kernel (kernel_size, angle, direction, allow_shifted, random_state) [view source on GitHub]

Create a motion blur kernel.

Parameters:

Name Type Description
kernel_size int

Size of the kernel (must be odd)

angle float

Angle in degrees (counter-clockwise)

direction float

Blur direction (-1.0 to 1.0)

allow_shifted bool

Allow kernel to be randomly shifted from center

random_state Random

Python's random.Random instance

Returns:

Type Description
np.ndarray

Motion blur kernel

Source code in albumentations/augmentations/blur/functional.py
Python
def create_motion_kernel(
    kernel_size: int,
    angle: float,
    direction: float,
    allow_shifted: bool,
    random_state: Random,
) -> np.ndarray:
    """Create a motion blur kernel.

    Args:
        kernel_size: Size of the kernel (must be odd)
        angle: Angle in degrees (counter-clockwise)
        direction: Blur direction (-1.0 to 1.0)
        allow_shifted: Allow kernel to be randomly shifted from center
        random_state: Python's random.Random instance

    Returns:
        Motion blur kernel
    """
    kernel = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    center = kernel_size // 2

    # Convert angle to radians
    angle_rad = np.deg2rad(angle)

    # Calculate direction vector
    dx = np.cos(angle_rad)
    dy = np.sin(angle_rad)

    # Create line points with direction bias
    line_length = kernel_size // 2
    t = np.linspace(-line_length, line_length, kernel_size * 2)

    # Apply direction bias
    if direction != 0:
        t = t * (1 + abs(direction))
        if direction < 0:
            t = t * -1

    # Generate line coordinates
    x = center + dx * t
    y = center + dy * t

    # Apply random shift if allowed
    if allow_shifted and random_state is not None:
        shift_x = random_state.uniform(-1, 1) * line_length / 2
        shift_y = random_state.uniform(-1, 1) * line_length / 2
        x += shift_x
        y += shift_y

    # Round coordinates and clip to kernel bounds
    x = np.clip(np.round(x), 0, kernel_size - 1).astype(int)
    y = np.clip(np.round(y), 0, kernel_size - 1).astype(int)

    # Keep only unique points to avoid multiple assignments
    points = np.unique(np.column_stack([y, x]), axis=0)
    kernel[points[:, 0], points[:, 1]] = 1

    # Ensure at least one point is set
    if not kernel.any():
        kernel[center, center] = 1

    return kernel
def process_blur_limit (value, info, min_value=0) [view source on GitHub]

Process blur limit to ensure valid kernel sizes.

Source code in albumentations/augmentations/blur/functional.py
Python
def process_blur_limit(value: ScaleIntType, info: ValidationInfo, min_value: int = 0) -> tuple[int, int]:
    """Process blur limit to ensure valid kernel sizes."""
    result = value if isinstance(value, Sequence) else (min_value, value)

    result = _ensure_min_value(result, min_value, info.field_name)
    result = _ensure_odd_values(result, info.field_name)

    if result[0] > result[1]:
        final_result = (result[1], result[1])
        warn(
            f"{info.field_name}: Invalid range {result} (min > max). "
            f"Range automatically adjusted to {final_result}.",
            UserWarning,
            stacklevel=2,
        )
        return final_result

    return result
def sample_odd_from_range (random_state, low, high) [view source on GitHub]

Sample an odd number from the range [low, high] (inclusive).

Parameters:

Name Type Description
random_state Random

instance of random.Random

low int

lower bound (will be converted to nearest valid odd number)

high int

upper bound (will be converted to nearest valid odd number)

Returns:

Type Description
int

Randomly sampled odd number from the range

Note

  • Input values will be converted to nearest valid odd numbers:
  • Values less than 3 will become 3
  • Even values will be rounded up to next odd number
  • After normalization, high must be >= low
Source code in albumentations/augmentations/blur/functional.py
Python
def sample_odd_from_range(random_state: Random, low: int, high: int) -> int:
    """Sample an odd number from the range [low, high] (inclusive).

    Args:
        random_state: instance of random.Random
        low: lower bound (will be converted to nearest valid odd number)
        high: upper bound (will be converted to nearest valid odd number)

    Returns:
        Randomly sampled odd number from the range

    Note:
        - Input values will be converted to nearest valid odd numbers:
          * Values less than 3 will become 3
          * Even values will be rounded up to next odd number
        - After normalization, high must be >= low
    """
    # Normalize low value
    low = max(3, low + (low % 2 == 0))
    # Normalize high value
    high = max(3, high + (high % 2 == 0))

    # Ensure high >= low after normalization
    high = max(high, low)

    if low == high:
        return low

    # Calculate number of possible odd values
    num_odd_values = (high - low) // 2 + 1
    # Generate random index and convert to corresponding odd number
    rand_idx = random_state.randint(0, num_odd_values - 1)
    return low + (2 * rand_idx)

transforms

class AdvancedBlur (blur_limit=(3, 7), sigma_x_limit=(0.2, 1.0), sigma_y_limit=(0.2, 1.0), sigmaX_limit=None, sigmaY_limit=None, rotate_limit=(-90, 90), beta_limit=(0.5, 8.0), noise_limit=(0.9, 1.1), always_apply=None, p=0.5) [view source on GitHub]

Applies a Generalized Gaussian blur to the input image with randomized parameters for advanced data augmentation.

This transform creates a custom blur kernel based on the Generalized Gaussian distribution, which allows for a wide range of blur effects beyond standard Gaussian blur. It then applies this kernel to the input image through convolution. The transform also incorporates noise into the kernel, resulting in a unique combination of blurring and noise injection.

Key features of this augmentation:

  1. Generalized Gaussian Kernel: Uses a generalized normal distribution to create kernels that can range from box-like blurs to very peaked blurs, controlled by the beta parameter.

  2. Anisotropic Blurring: Allows for different blur strengths in horizontal and vertical directions (controlled by sigma_x and sigma_y), and rotation of the kernel.

  3. Kernel Noise: Adds multiplicative noise to the kernel before applying it to the image, creating more diverse and realistic blur effects.

Implementation Details: The kernel is generated using a 2D Generalized Gaussian function. The process involves: 1. Creating a 2D grid based on the kernel size 2. Applying rotation to this grid 3. Calculating the kernel values using the Generalized Gaussian formula 4. Adding multiplicative noise to the kernel 5. Normalizing the kernel

The resulting kernel is then applied to the image using convolution.

Parameters:

Name Type Description
blur_limit tuple[int, int] | int

Controls the size of the blur kernel. If a single int is provided, the kernel size will be randomly chosen between 3 and that value. Must be odd and ≥ 3. Larger values create stronger blur effects. Default: (3, 7)

sigma_x_limit tuple[float, float] | float

Controls the spread of the blur in the x direction. Higher values increase blur strength. If a single float is provided, the range will be (0, limit). Default: (0.2, 1.0)

sigma_y_limit tuple[float, float] | float

Controls the spread of the blur in the y direction. Higher values increase blur strength. If a single float is provided, the range will be (0, limit). Default: (0.2, 1.0)

rotate_limit tuple[int, int] | int

Range of angles (in degrees) for rotating the kernel. This rotation allows for diagonal blur directions. If limit is a single int, an angle is picked from (-rotate_limit, rotate_limit). Default: (-90, 90)

beta_limit tuple[float, float] | float

Shape parameter of the Generalized Gaussian distribution. - beta = 1 gives a standard Gaussian distribution - beta < 1 creates heavier tails, resulting in more uniform, box-like blur - beta > 1 creates lighter tails, resulting in more peaked, focused blur Default: (0.5, 8.0)

noise_limit tuple[float, float] | float

Controls the strength of multiplicative noise applied to the kernel. Values around 1.0 keep the original kernel mostly intact, while values further from 1.0 introduce more variation. Default: (0.75, 1.25)

p float

Probability of applying the transform. Default: 0.5

Notes

  • This transform is particularly useful for simulating complex, real-world blur effects that go beyond simple Gaussian blur.
  • The combination of blur and noise can help in creating more robust models by simulating a wider range of image degradations.
  • Extreme values, especially for beta and noise, may result in unrealistic effects and should be used cautiously.

Reference

This transform is inspired by techniques described in: "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data" https://arxiv.org/abs/2107.10833

Targets

image

Image types: uint8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class AdvancedBlur(ImageOnlyTransform):
    """Applies a Generalized Gaussian blur to the input image with randomized parameters for advanced data augmentation.

    This transform creates a custom blur kernel based on the Generalized Gaussian distribution,
    which allows for a wide range of blur effects beyond standard Gaussian blur. It then applies
    this kernel to the input image through convolution. The transform also incorporates noise
    into the kernel, resulting in a unique combination of blurring and noise injection.

    Key features of this augmentation:

    1. Generalized Gaussian Kernel: Uses a generalized normal distribution to create kernels
       that can range from box-like blurs to very peaked blurs, controlled by the beta parameter.

    2. Anisotropic Blurring: Allows for different blur strengths in horizontal and vertical
       directions (controlled by sigma_x and sigma_y), and rotation of the kernel.

    3. Kernel Noise: Adds multiplicative noise to the kernel before applying it to the image,
       creating more diverse and realistic blur effects.

    Implementation Details:
        The kernel is generated using a 2D Generalized Gaussian function. The process involves:
        1. Creating a 2D grid based on the kernel size
        2. Applying rotation to this grid
        3. Calculating the kernel values using the Generalized Gaussian formula
        4. Adding multiplicative noise to the kernel
        5. Normalizing the kernel

        The resulting kernel is then applied to the image using convolution.

    Args:
        blur_limit (tuple[int, int] | int, optional): Controls the size of the blur kernel. If a single int
            is provided, the kernel size will be randomly chosen between 3 and that value.
            Must be odd and ≥ 3. Larger values create stronger blur effects.
            Default: (3, 7)

        sigma_x_limit (tuple[float, float] | float): Controls the spread of the blur in the x direction.
            Higher values increase blur strength.
            If a single float is provided, the range will be (0, limit).
            Default: (0.2, 1.0)

        sigma_y_limit (tuple[float, float] | float): Controls the spread of the blur in the y direction.
            Higher values increase blur strength.
            If a single float is provided, the range will be (0, limit).
            Default: (0.2, 1.0)

        rotate_limit (tuple[int, int] | int): Range of angles (in degrees) for rotating the kernel.
            This rotation allows for diagonal blur directions. If limit is a single int, an angle is picked
            from (-rotate_limit, rotate_limit).
            Default: (-90, 90)

        beta_limit (tuple[float, float] | float): Shape parameter of the Generalized Gaussian distribution.
            - beta = 1 gives a standard Gaussian distribution
            - beta < 1 creates heavier tails, resulting in more uniform, box-like blur
            - beta > 1 creates lighter tails, resulting in more peaked, focused blur
            Default: (0.5, 8.0)

        noise_limit (tuple[float, float] | float): Controls the strength of multiplicative noise
            applied to the kernel. Values around 1.0 keep the original kernel mostly intact,
            while values further from 1.0 introduce more variation.
            Default: (0.75, 1.25)

        p (float): Probability of applying the transform. Default: 0.5

    Notes:
        - This transform is particularly useful for simulating complex, real-world blur effects
          that go beyond simple Gaussian blur.
        - The combination of blur and noise can help in creating more robust models by simulating
          a wider range of image degradations.
        - Extreme values, especially for beta and noise, may result in unrealistic effects and
          should be used cautiously.

    Reference:
        This transform is inspired by techniques described in:
        "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data"
        https://arxiv.org/abs/2107.10833

    Targets:
        image

    Image types:
        uint8, float32
    """

    class InitSchema(BlurInitSchema):
        sigma_x_limit: NonNegativeFloatRangeType
        sigma_y_limit: NonNegativeFloatRangeType
        beta_limit: NonNegativeFloatRangeType
        noise_limit: NonNegativeFloatRangeType
        rotate_limit: SymmetricRangeType

        @field_validator("beta_limit")
        @classmethod
        def check_beta_limit(cls, value: ScaleFloatType) -> tuple[float, float]:
            result = to_tuple(value, low=0)
            if not (result[0] < 1.0 < result[1]):
                msg = "beta_limit is expected to include 1.0."
                raise ValueError(msg)
            return result

        @model_validator(mode="after")
        def validate_limits(self) -> Self:
            if (
                isinstance(self.sigma_x_limit, (tuple, list))
                and self.sigma_x_limit[0] == 0
                and isinstance(self.sigma_y_limit, (tuple, list))
                and self.sigma_y_limit[0] == 0
            ):
                msg = "sigma_x_limit and sigma_y_limit minimum value cannot be both equal to 0."
                raise ValueError(msg)
            return self

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_x_limit: ScaleFloatType = (0.2, 1.0),
        sigma_y_limit: ScaleFloatType = (0.2, 1.0),
        sigmaX_limit: ScaleFloatType | None = None,  # noqa: N803
        sigmaY_limit: ScaleFloatType | None = None,  # noqa: N803
        rotate_limit: ScaleIntType = (-90, 90),
        beta_limit: ScaleFloatType = (0.5, 8.0),
        noise_limit: ScaleFloatType = (0.9, 1.1),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        if sigmaX_limit is not None:
            warnings.warn("sigmaX_limit is deprecated; use sigma_x_limit instead.", DeprecationWarning, stacklevel=2)
            sigma_x_limit = sigmaX_limit

        if sigmaY_limit is not None:
            warnings.warn("sigmaY_limit is deprecated; use sigma_y_limit instead.", DeprecationWarning, stacklevel=2)
            sigma_y_limit = sigmaY_limit

        self.blur_limit = cast(tuple[int, int], blur_limit)
        self.sigma_x_limit = cast(tuple[float, float], sigma_x_limit)
        self.sigma_y_limit = cast(tuple[float, float], sigma_y_limit)
        self.rotate_limit = cast(tuple[int, int], rotate_limit)
        self.beta_limit = cast(tuple[float, float], beta_limit)
        self.noise_limit = cast(tuple[float, float], noise_limit)

    def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.convolve(img, kernel=kernel)

    def get_params(self) -> dict[str, np.ndarray]:
        ksize = fblur.sample_odd_from_range(self.py_random, self.blur_limit[0], self.blur_limit[1])
        sigma_x = self.py_random.uniform(*self.sigma_x_limit)
        sigma_y = self.py_random.uniform(*self.sigma_y_limit)
        angle = np.deg2rad(self.py_random.uniform(*self.rotate_limit))

        # Split into 2 cases to avoid selection of narrow kernels (beta > 1) too often.
        beta = (
            self.py_random.uniform(self.beta_limit[0], 1)
            if self.py_random.random() < HALF
            else self.py_random.uniform(1, self.beta_limit[1])
        )

        noise_matrix = self.random_generator.uniform(*self.noise_limit, size=(ksize, ksize))

        # Generate mesh grid centered at zero.
        ax = np.arange(-ksize // 2 + 1.0, ksize // 2 + 1.0)
        # > Shape (ksize, ksize, 2)
        grid = np.stack(np.meshgrid(ax, ax), axis=-1)

        # Calculate rotated sigma matrix
        d_matrix = np.array([[sigma_x**2, 0], [0, sigma_y**2]])
        u_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
        sigma_matrix = np.dot(u_matrix, np.dot(d_matrix, u_matrix.T))

        inverse_sigma = np.linalg.inv(sigma_matrix)
        # Described in "Parameter Estimation For Multivariate Generalized Gaussian Distributions"
        kernel = np.exp(-0.5 * np.power(np.sum(np.dot(grid, inverse_sigma) * grid, 2), beta))
        # Add noise
        kernel *= noise_matrix

        # Normalize kernel
        kernel = kernel.astype(np.float32) / np.sum(kernel)
        return {"kernel": kernel}

    def get_transform_init_args_names(self) -> tuple[str, str, str, str, str, str]:
        return (
            "blur_limit",
            "sigma_x_limit",
            "sigma_y_limit",
            "rotate_limit",
            "beta_limit",
            "noise_limit",
        )
class Blur (blur_limit=(3, 7), p=0.5, always_apply=None) [view source on GitHub]

Apply uniform box blur to the input image using a randomly sized square kernel.

This transform uses OpenCV's cv2.blur function, which performs a simple box filter blur. The size of the blur kernel is randomly selected for each application, allowing for varying degrees of blur intensity.

Parameters:

Name Type Description
blur_limit tuple[int, int] | int

Controls the range of the blur kernel size. - If a single int is provided, the kernel size will be randomly chosen between 3 and that value. - If a tuple of two ints is provided, it defines the inclusive range of possible kernel sizes. The kernel size must be odd and greater than or equal to 3. Larger kernel sizes produce stronger blur effects. Default: (3, 7)

p float

Probability of applying the transform. Default: 0.5

Notes

  • The blur kernel is always square (same width and height).
  • Only odd kernel sizes are used to ensure the blur has a clear center pixel.
  • Box blur is faster than Gaussian blur but may produce less natural results.
  • This blur method averages all pixels under the kernel area, which can reduce noise but also reduce image detail.

Targets

image

Image types: uint8, float32

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Blur(blur_limit=(3, 7), p=1.0)
>>> result = transform(image=image)
>>> blurred_image = result["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class Blur(ImageOnlyTransform):
    """Apply uniform box blur to the input image using a randomly sized square kernel.

    This transform uses OpenCV's cv2.blur function, which performs a simple box filter blur.
    The size of the blur kernel is randomly selected for each application, allowing for
    varying degrees of blur intensity.

    Args:
        blur_limit (tuple[int, int] | int): Controls the range of the blur kernel size.
            - If a single int is provided, the kernel size will be randomly chosen
              between 3 and that value.
            - If a tuple of two ints is provided, it defines the inclusive range
              of possible kernel sizes.
            The kernel size must be odd and greater than or equal to 3.
            Larger kernel sizes produce stronger blur effects.
            Default: (3, 7)

        p (float): Probability of applying the transform. Default: 0.5

    Notes:
        - The blur kernel is always square (same width and height).
        - Only odd kernel sizes are used to ensure the blur has a clear center pixel.
        - Box blur is faster than Gaussian blur but may produce less natural results.
        - This blur method averages all pixels under the kernel area, which can
          reduce noise but also reduce image detail.

    Targets:
        image

    Image types:
        uint8, float32

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.Blur(blur_limit=(3, 7), p=1.0)
        >>> result = transform(image=image)
        >>> blurred_image = result["image"]
    """

    class InitSchema(BlurInitSchema):
        pass

    def __init__(self, blur_limit: ScaleIntType = (3, 7), p: float = 0.5, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.blur_limit = cast(tuple[int, int], blur_limit)

    def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
        return fblur.blur(img, kernel)

    def get_params(self) -> dict[str, Any]:
        kernel = fblur.sample_odd_from_range(
            self.py_random,
            self.blur_limit[0],
            self.blur_limit[1],
        )
        return {"kernel": kernel}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("blur_limit",)
class BlurInitSchema [view source on GitHub]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class BlurInitSchema(BaseTransformInitSchema):
    blur_limit: ScaleIntType

    @field_validator("blur_limit")
    @classmethod
    def process_blur(cls, value: ScaleIntType, info: ValidationInfo) -> tuple[int, int]:
        return fblur.process_blur_limit(value, info, min_value=3)
class Defocus (radius=(3, 10), alias_blur=(0.1, 0.5), always_apply=None, p=0.5) [view source on GitHub]

Apply defocus blur to the input image.

This transform simulates the effect of an out-of-focus camera by applying a defocus blur to the image. It uses a combination of disc kernels and Gaussian blur to create a realistic defocus effect.

Parameters:

Name Type Description
radius tuple[int, int] | int

Range for the radius of the defocus blur. If a single int is provided, the range will be [1, radius]. Larger values create a stronger blur effect. Default: (3, 10)

alias_blur tuple[float, float] | float

Range for the standard deviation of the Gaussian blur applied after the main defocus blur. This helps to reduce aliasing artifacts. If a single float is provided, the range will be (0, alias_blur). Larger values create a smoother, more aliased effect. Default: (0.1, 0.5)

p float

Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5

Targets

image

Image types: uint8, float32

Note

  • The defocus effect is created using a disc kernel, which simulates the shape of a camera's aperture.
  • The additional Gaussian blur (alias_blur) helps to soften the edges of the disc kernel, creating a more natural-looking defocus effect.
  • Larger radius values will create a stronger, more noticeable defocus effect.
  • The alias_blur parameter can be used to fine-tune the appearance of the defocus, with larger values creating a smoother, potentially more realistic effect.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Defocus(radius=(4, 8), alias_blur=(0.2, 0.4), always_apply=True)
>>> result = transform(image=image)
>>> defocused_image = result['image']

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class Defocus(ImageOnlyTransform):
    """Apply defocus blur to the input image.

    This transform simulates the effect of an out-of-focus camera by applying a defocus blur
    to the image. It uses a combination of disc kernels and Gaussian blur to create a realistic
    defocus effect.

    Args:
        radius (tuple[int, int] | int): Range for the radius of the defocus blur.
            If a single int is provided, the range will be [1, radius].
            Larger values create a stronger blur effect.
            Default: (3, 10)

        alias_blur (tuple[float, float] | float): Range for the standard deviation of the Gaussian blur
            applied after the main defocus blur. This helps to reduce aliasing artifacts.
            If a single float is provided, the range will be (0, alias_blur).
            Larger values create a smoother, more aliased effect.
            Default: (0.1, 0.5)

        p (float): Probability of applying the transform. Should be in the range [0, 1].
            Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        - The defocus effect is created using a disc kernel, which simulates the shape of a camera's aperture.
        - The additional Gaussian blur (alias_blur) helps to soften the edges of the disc kernel, creating a
          more natural-looking defocus effect.
        - Larger radius values will create a stronger, more noticeable defocus effect.
        - The alias_blur parameter can be used to fine-tune the appearance of the defocus, with larger values
          creating a smoother, potentially more realistic effect.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.Defocus(radius=(4, 8), alias_blur=(0.2, 0.4), always_apply=True)
        >>> result = transform(image=image)
        >>> defocused_image = result['image']

    References:
        - https://en.wikipedia.org/wiki/Defocus_aberration
        - https://www.researchgate.net/publication/261311609_Realistic_Defocus_Blur_for_Multiplane_Computer-Generated_Holography
    """

    class InitSchema(BaseTransformInitSchema):
        radius: OnePlusIntRangeType
        alias_blur: NonNegativeFloatRangeType

    def __init__(
        self,
        radius: ScaleIntType = (3, 10),
        alias_blur: ScaleFloatType = (0.1, 0.5),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.radius = cast(tuple[int, int], radius)
        self.alias_blur = cast(tuple[float, float], alias_blur)

    def apply(self, img: np.ndarray, radius: int, alias_blur: float, **params: Any) -> np.ndarray:
        return fblur.defocus(img, radius, alias_blur)

    def get_params(self) -> dict[str, Any]:
        return {
            "radius": self.py_random.randint(*self.radius),
            "alias_blur": self.py_random.uniform(*self.alias_blur),
        }

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("radius", "alias_blur")
class GaussianBlur (blur_limit=(3, 7), sigma_limit=0, always_apply=None, p=0.5) [view source on GitHub]

Apply Gaussian blur to the input image using a randomly sized kernel.

This transform blurs the input image using a Gaussian filter with a random kernel size and sigma value. Gaussian blur is a widely used image processing technique that reduces image noise and detail, creating a smoothing effect.

Parameters:

Name Type Description
blur_limit tuple[int, int] | int

Controls the range of the Gaussian kernel size. - If a single int is provided, the kernel size will be randomly chosen between 0 and that value. - If a tuple of two ints is provided, it defines the inclusive range of possible kernel sizes. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1. Larger kernel sizes produce stronger blur effects. Default: (3, 7)

sigma_limit tuple[float, float] | float

Range for the Gaussian kernel standard deviation (sigma). Must be in range [0, inf). - If a single float is provided, sigma will be randomly chosen between 0 and that value. - If a tuple of two floats is provided, it defines the inclusive range of possible sigma values. If set to 0, sigma will be computed as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. Larger sigma values produce stronger blur effects. Default: 0

p float

Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The relationship between kernel size and sigma affects the blur strength: larger kernel sizes allow for stronger blurring effects.
  • When both blur_limit and sigma_limit are set to ranges starting from 0, the blur_limit minimum is automatically set to 3 to ensure a valid kernel size.
  • For uint8 images, the computation might be faster than for floating-point images.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.GaussianBlur(blur_limit=(3, 7), sigma_limit=(0.1, 2), p=1)
>>> result = transform(image=image)
>>> blurred_image = result["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class GaussianBlur(ImageOnlyTransform):
    """Apply Gaussian blur to the input image using a randomly sized kernel.

    This transform blurs the input image using a Gaussian filter with a random kernel size
    and sigma value. Gaussian blur is a widely used image processing technique that reduces
    image noise and detail, creating a smoothing effect.

    Args:
        blur_limit (tuple[int, int] | int): Controls the range of the Gaussian kernel size.
            - If a single int is provided, the kernel size will be randomly chosen
              between 0 and that value.
            - If a tuple of two ints is provided, it defines the inclusive range
              of possible kernel sizes.
            Must be zero or odd and in range [0, inf). If set to 0, it will be computed
            from sigma as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            Larger kernel sizes produce stronger blur effects.
            Default: (3, 7)

        sigma_limit (tuple[float, float] | float): Range for the Gaussian kernel standard
            deviation (sigma). Must be in range [0, inf).
            - If a single float is provided, sigma will be randomly chosen
              between 0 and that value.
            - If a tuple of two floats is provided, it defines the inclusive range
              of possible sigma values.
            If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`.
            Larger sigma values produce stronger blur effects.
            Default: 0

        p (float): Probability of applying the transform. Should be in the range [0, 1].
            Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The relationship between kernel size and sigma affects the blur strength:
          larger kernel sizes allow for stronger blurring effects.
        - When both blur_limit and sigma_limit are set to ranges starting from 0,
          the blur_limit minimum is automatically set to 3 to ensure a valid kernel size.
        - For uint8 images, the computation might be faster than for floating-point images.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.GaussianBlur(blur_limit=(3, 7), sigma_limit=(0.1, 2), p=1)
        >>> result = transform(image=image)
        >>> blurred_image = result["image"]
    """

    class InitSchema(BlurInitSchema):
        sigma_limit: NonNegativeFloatRangeType

        @field_validator("blur_limit")
        @classmethod
        def process_blur(cls, value: ScaleIntType, info: ValidationInfo) -> tuple[int, int]:
            return fblur.process_blur_limit(value, info, min_value=0)

        @model_validator(mode="after")
        def validate_limits(self) -> Self:
            if (
                isinstance(self.blur_limit, (tuple, list))
                and self.blur_limit[0] == 0
                and isinstance(self.sigma_limit, (tuple, list))
                and self.sigma_limit[0] == 0
            ):
                self.blur_limit = 3, max(3, self.blur_limit[1])
                warnings.warn(
                    "blur_limit and sigma_limit minimum value can not be both equal to 0. "
                    "blur_limit minimum value changed to 3.",
                    stacklevel=2,
                )

            return self

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_limit: ScaleFloatType = 0,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.blur_limit = cast(tuple[int, int], blur_limit)
        self.sigma_limit = cast(tuple[float, float], sigma_limit)

    def apply(self, img: np.ndarray, ksize: int, sigma: float, **params: Any) -> np.ndarray:
        return fblur.gaussian_blur(img, ksize, sigma=sigma)

    def get_params(self) -> dict[str, float]:
        ksize = fblur.sample_odd_from_range(
            self.py_random,
            self.blur_limit[0],
            self.blur_limit[1],
        )

        return {"ksize": ksize, "sigma": self.py_random.uniform(*self.sigma_limit)}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "blur_limit", "sigma_limit"
class GlassBlur (sigma=0.7, max_delta=4, iterations=2, mode='fast', always_apply=None, p=0.5) [view source on GitHub]

Apply a glass blur effect to the input image.

This transform simulates the effect of looking through textured glass by locally shuffling pixels in the image. It creates a distorted, frosted glass-like appearance.

Parameters:

Name Type Description
sigma float

Standard deviation for the Gaussian kernel used in the process. Higher values increase the blur effect. Must be non-negative. Default: 0.7

max_delta int

Maximum distance in pixels for shuffling. Determines how far pixels can be moved. Larger values create more distortion. Must be a positive integer. Default: 4

iterations int

Number of times to apply the glass blur effect. More iterations create a stronger effect but increase computation time. Must be a positive integer. Default: 2

mode Literal["fast", "exact"]

Mode of computation. Options are: - "fast": Uses a faster but potentially less accurate method. - "exact": Uses a slower but more precise method. Default: "fast"

p float

Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • This transform is particularly effective for creating a 'looking through glass' effect or simulating the view through a frosted window.
  • The 'fast' mode is recommended for most use cases as it provides a good balance between effect quality and computation speed.
  • Increasing 'iterations' will strengthen the effect but also increase the processing time linearly.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.GlassBlur(sigma=0.7, max_delta=4, iterations=3, mode="fast", p=1)
>>> result = transform(image=image)
>>> glass_blurred_image = result["image"]

References

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class GlassBlur(ImageOnlyTransform):
    """Apply a glass blur effect to the input image.

    This transform simulates the effect of looking through textured glass by locally
    shuffling pixels in the image. It creates a distorted, frosted glass-like appearance.

    Args:
        sigma (float): Standard deviation for the Gaussian kernel used in the process.
            Higher values increase the blur effect. Must be non-negative.
            Default: 0.7

        max_delta (int): Maximum distance in pixels for shuffling.
            Determines how far pixels can be moved. Larger values create more distortion.
            Must be a positive integer.
            Default: 4

        iterations (int): Number of times to apply the glass blur effect.
            More iterations create a stronger effect but increase computation time.
            Must be a positive integer.
            Default: 2

        mode (Literal["fast", "exact"]): Mode of computation. Options are:
            - "fast": Uses a faster but potentially less accurate method.
            - "exact": Uses a slower but more precise method.
            Default: "fast"

        p (float): Probability of applying the transform. Should be in the range [0, 1].
            Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - This transform is particularly effective for creating a 'looking through
          glass' effect or simulating the view through a frosted window.
        - The 'fast' mode is recommended for most use cases as it provides a good
          balance between effect quality and computation speed.
        - Increasing 'iterations' will strengthen the effect but also increase the
          processing time linearly.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.GlassBlur(sigma=0.7, max_delta=4, iterations=3, mode="fast", p=1)
        >>> result = transform(image=image)
        >>> glass_blurred_image = result["image"]

    References:
        - This implementation is based on the technique described in:
          "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness"
          https://arxiv.org/abs/1903.12261
        - Original implementation:
          https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py
    """

    class InitSchema(BaseTransformInitSchema):
        sigma: float = Field(ge=0)
        max_delta: int = Field(ge=1)
        iterations: int = Field(ge=1)
        mode: Literal["fast", "exact"]

    def __init__(
        self,
        sigma: float = 0.7,
        max_delta: int = 4,
        iterations: int = 2,
        mode: Literal["fast", "exact"] = "fast",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.sigma = sigma
        self.max_delta = max_delta
        self.iterations = iterations
        self.mode = mode

    def apply(self, img: np.ndarray, *args: Any, dxy: np.ndarray, **params: Any) -> np.ndarray:
        return fblur.glass_blur(img, self.sigma, self.max_delta, self.iterations, dxy, self.mode)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
        height, width = params["shape"][:2]

        # generate array containing all necessary values for transformations
        width_pixels = height - self.max_delta * 2
        height_pixels = width - self.max_delta * 2
        total_pixels = int(width_pixels * height_pixels)
        dxy = self.random_generator.integers(-self.max_delta, self.max_delta, size=(total_pixels, self.iterations, 2))

        return {"dxy": dxy}

    def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
        return "sigma", "max_delta", "iterations", "mode"
class MedianBlur (blur_limit=7, p=0.5, always_apply=None) [view source on GitHub]

Apply median blur to the input image.

This transform uses a median filter to blur the input image. Median filtering is particularly effective at removing salt-and-pepper noise while preserving edges, making it a popular choice for noise reduction in image processing.

Parameters:

Name Type Description
blur_limit int | tuple[int, int]

Maximum aperture linear size for blurring the input image. Must be odd and in the range [3, inf). - If a single int is provided, the kernel size will be randomly chosen between 3 and that value. - If a tuple of two ints is provided, it defines the inclusive range of possible kernel sizes. Default: (3, 7)

p float

Probability of applying the transform. Default: 0.5

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The kernel size (aperture linear size) must always be odd and greater than 1.
  • Unlike mean blur or Gaussian blur, median blur uses the median of all pixels under the kernel area, making it more robust to outliers.
  • This transform is particularly useful for:
  • Removing salt-and-pepper noise
  • Preserving edges while smoothing images
  • Pre-processing images for edge detection algorithms
  • For color images, the median is calculated independently for each channel.
  • Larger kernel sizes result in stronger blurring effects but may also remove fine details from the image.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.MedianBlur(blur_limit=(3, 7), p=0.5)
>>> result = transform(image=image)
>>> blurred_image = result["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class MedianBlur(Blur):
    """Apply median blur to the input image.

    This transform uses a median filter to blur the input image. Median filtering is particularly
    effective at removing salt-and-pepper noise while preserving edges, making it a popular choice
    for noise reduction in image processing.

    Args:
        blur_limit (int | tuple[int, int]): Maximum aperture linear size for blurring the input image.
            Must be odd and in the range [3, inf).
            - If a single int is provided, the kernel size will be randomly chosen
              between 3 and that value.
            - If a tuple of two ints is provided, it defines the inclusive range
              of possible kernel sizes.
            Default: (3, 7)

        p (float): Probability of applying the transform. Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The kernel size (aperture linear size) must always be odd and greater than 1.
        - Unlike mean blur or Gaussian blur, median blur uses the median of all pixels under
          the kernel area, making it more robust to outliers.
        - This transform is particularly useful for:
          * Removing salt-and-pepper noise
          * Preserving edges while smoothing images
          * Pre-processing images for edge detection algorithms
        - For color images, the median is calculated independently for each channel.
        - Larger kernel sizes result in stronger blurring effects but may also remove
          fine details from the image.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.MedianBlur(blur_limit=(3, 7), p=0.5)
        >>> result = transform(image=image)
        >>> blurred_image = result["image"]

    References:
        - Median filter: https://en.wikipedia.org/wiki/Median_filter
        - OpenCV medianBlur: https://docs.opencv.org/master/d4/d86/group__imgproc__filter.html#ga564869aa33e58769b4469101aac458f9
    """

    def __init__(self, blur_limit: ScaleIntType = 7, p: float = 0.5, always_apply: bool | None = None):
        super().__init__(blur_limit=blur_limit, p=p, always_apply=always_apply)

    def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
        return fblur.median_blur(img, kernel)
class MotionBlur (blur_limit=7, allow_shifted=True, angle_range=(0, 360), direction_range=(-0.5, 0.5), always_apply=None, p=0.5) [view source on GitHub]

Apply motion blur to the input image using a directional kernel.

This transform simulates motion blur effects that occur during image capture, such as camera shake or object movement. It creates a directional blur using a line-shaped kernel with controllable angle, direction, and position.

Parameters:

Name Type Description
blur_limit int | tuple[int, int]

Maximum kernel size for blurring. Should be in range [3, inf). - If int: kernel size will be randomly chosen from [3, blur_limit] - If tuple: kernel size will be randomly chosen from [min, max] Larger values create stronger blur effects. Default: (3, 7)

angle_range tuple[float, float]

Range of possible angles in degrees. Controls the rotation of the motion blur line: - 0°: Horizontal motion blur → - 45°: Diagonal motion blur ↗ - 90°: Vertical motion blur ↑ - 135°: Diagonal motion blur ↖ Default: (0, 360)

direction_range tuple[float, float]

Range for motion bias. Controls how the blur extends from the center: - -1.0: Blur extends only backward (←) - 0.0: Blur extends equally in both directions (←→) - 1.0: Blur extends only forward (→) For example, with angle=0: - direction=-1.0: ←• - direction=0.0: ←•→ - direction=1.0: •→ Default: (-0.5, 0.5)

allow_shifted bool

Allow random kernel position shifts. - If True: Kernel can be randomly offset from center - If False: Kernel will always be centered Default: True

p float

Probability of applying the transform. Default: 0.5

Examples of angle vs direction: 1. Horizontal motion (angle=0°): - direction=0.0: ←•→ (symmetric blur) - direction=1.0: •→ (forward blur) - direction=-1.0: ←• (backward blur)

2. Vertical motion (angle=90°):
   - direction=0.0:   ↑•↓   (symmetric blur)
   - direction=1.0:    •↑   (upward blur)
   - direction=-1.0:  ↓•    (downward blur)

3. Diagonal motion (angle=45°):
   - direction=0.0:   ↙•↗   (symmetric blur)
   - direction=1.0:    •↗   (forward diagonal blur)
   - direction=-1.0:  ↙•    (backward diagonal blur)

Note

  • angle controls the orientation of the motion line
  • direction controls the distribution of the blur along that line
  • Together they can simulate various motion effects:
  • Camera shake: Small angle range + direction near 0
  • Object motion: Specific angle + direction=1.0
  • Complex motion: Random angle + random direction

Examples:

Python
>>> import albumentations as A
>>> # Horizontal camera shake (symmetric)
>>> transform = A.MotionBlur(
...     angle_range=(-5, 5),      # Near-horizontal motion
...     direction_range=(0, 0),    # Symmetric blur
...     p=1.0
... )
>>>
>>> # Object moving right
>>> transform = A.MotionBlur(
...     angle_range=(0, 0),        # Horizontal motion
...     direction_range=(0.8, 1.0), # Strong forward bias
...     p=1.0
... )

References

See Also: - GaussianBlur: For uniform blur effects - MedianBlur: For noise reduction while preserving edges - RandomRain: Another motion-based effect - Perspective: For geometric motion-like distortions

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class MotionBlur(Blur):
    """Apply motion blur to the input image using a directional kernel.

    This transform simulates motion blur effects that occur during image capture,
    such as camera shake or object movement. It creates a directional blur using
    a line-shaped kernel with controllable angle, direction, and position.

    Args:
        blur_limit (int | tuple[int, int]): Maximum kernel size for blurring.
            Should be in range [3, inf).
            - If int: kernel size will be randomly chosen from [3, blur_limit]
            - If tuple: kernel size will be randomly chosen from [min, max]
            Larger values create stronger blur effects.
            Default: (3, 7)

        angle_range (tuple[float, float]): Range of possible angles in degrees.
            Controls the rotation of the motion blur line:
            - 0°: Horizontal motion blur →
            - 45°: Diagonal motion blur ↗
            - 90°: Vertical motion blur ↑
            - 135°: Diagonal motion blur ↖
            Default: (0, 360)

        direction_range (tuple[float, float]): Range for motion bias.
            Controls how the blur extends from the center:
            - -1.0: Blur extends only backward (←)
            -  0.0: Blur extends equally in both directions (←→)
            -  1.0: Blur extends only forward (→)
            For example, with angle=0:
            - direction=-1.0: ←•
            - direction=0.0:  ←•→
            - direction=1.0:   •→
            Default: (-0.5, 0.5)

        allow_shifted (bool): Allow random kernel position shifts.
            - If True: Kernel can be randomly offset from center
            - If False: Kernel will always be centered
            Default: True

        p (float): Probability of applying the transform. Default: 0.5

    Examples of angle vs direction:
        1. Horizontal motion (angle=0°):
           - direction=0.0:   ←•→   (symmetric blur)
           - direction=1.0:    •→   (forward blur)
           - direction=-1.0:  ←•    (backward blur)

        2. Vertical motion (angle=90°):
           - direction=0.0:   ↑•↓   (symmetric blur)
           - direction=1.0:    •↑   (upward blur)
           - direction=-1.0:  ↓•    (downward blur)

        3. Diagonal motion (angle=45°):
           - direction=0.0:   ↙•↗   (symmetric blur)
           - direction=1.0:    •↗   (forward diagonal blur)
           - direction=-1.0:  ↙•    (backward diagonal blur)

    Note:
        - angle controls the orientation of the motion line
        - direction controls the distribution of the blur along that line
        - Together they can simulate various motion effects:
          * Camera shake: Small angle range + direction near 0
          * Object motion: Specific angle + direction=1.0
          * Complex motion: Random angle + random direction

    Example:
        >>> import albumentations as A
        >>> # Horizontal camera shake (symmetric)
        >>> transform = A.MotionBlur(
        ...     angle_range=(-5, 5),      # Near-horizontal motion
        ...     direction_range=(0, 0),    # Symmetric blur
        ...     p=1.0
        ... )
        >>>
        >>> # Object moving right
        >>> transform = A.MotionBlur(
        ...     angle_range=(0, 0),        # Horizontal motion
        ...     direction_range=(0.8, 1.0), # Strong forward bias
        ...     p=1.0
        ... )

    References:
        - Motion blur fundamentals:
          https://en.wikipedia.org/wiki/Motion_blur

        - Directional blur kernels:
          https://www.sciencedirect.com/topics/computer-science/directional-blur

        - OpenCV filter2D (used for convolution):
          https://docs.opencv.org/master/d4/d86/group__imgproc__filter.html#ga27c049795ce870216ddfb366086b5a04

        - Research on motion blur simulation:
          "Understanding and Evaluating Blind Deconvolution Algorithms" (CVPR 2009)
          https://doi.org/10.1109/CVPR.2009.5206815

        - Motion blur in photography:
          "The Manual of Photography", Chapter 7: Motion in Photography
          ISBN: 978-0240520377

        - Kornia's implementation (similar approach):
          https://kornia.readthedocs.io/en/latest/augmentation.html#kornia.augmentation.RandomMotionBlur

    See Also:
        - GaussianBlur: For uniform blur effects
        - MedianBlur: For noise reduction while preserving edges
        - RandomRain: Another motion-based effect
        - Perspective: For geometric motion-like distortions

    """

    class InitSchema(BlurInitSchema):
        allow_shifted: bool
        angle_range: Annotated[tuple[float, float], AfterValidator(nondecreasing)]
        direction_range: Annotated[
            tuple[float, float],
            AfterValidator(nondecreasing),
            AfterValidator(check_range_bounds(min_val=-1.0, max_val=1.0)),
        ]

    def __init__(
        self,
        blur_limit: ScaleIntType = 7,
        allow_shifted: bool = True,
        angle_range: tuple[float, float] = (0, 360),
        direction_range: tuple[float, float] = (-0.5, 0.5),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(blur_limit=blur_limit, p=p)
        self.allow_shifted = allow_shifted
        self.blur_limit = cast(tuple[int, int], blur_limit)
        self.angle_range = angle_range
        self.direction_range = direction_range

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (*super().get_transform_init_args_names(), "allow_shifted", "angle_range", "direction_range")

    def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.convolve(img, kernel=kernel)

    def get_params(self) -> dict[str, Any]:
        ksize = fblur.sample_odd_from_range(
            self.py_random,
            self.blur_limit[0],
            self.blur_limit[1],
        )

        angle = self.py_random.uniform(*self.angle_range)
        direction = self.py_random.uniform(*self.direction_range)

        # Create motion blur kernel
        kernel = fblur.create_motion_kernel(
            ksize,
            angle,
            direction,
            allow_shifted=self.allow_shifted,
            random_state=self.py_random,
        )

        return {"kernel": kernel.astype(np.float32) / np.sum(kernel)}
class ZoomBlur (max_factor=(1, 1.31), step_factor=(0.01, 0.03), always_apply=None, p=0.5) [view source on GitHub]

Apply zoom blur transform.

Parameters:

Name Type Description
max_factor float, float) or float

range for max factor for blurring. If max_factor is a single float, the range will be (1, limit). Default: (1, 1.31). All max_factor values should be larger than 1.

step_factor float, float) or float

If single float will be used as step parameter for np.arange. If tuple of float step_factor will be in range [step_factor[0], step_factor[1]). Default: (0.01, 0.03). All step_factor values should be positive.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: unit8, float32

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class ZoomBlur(ImageOnlyTransform):
    """Apply zoom blur transform.

    Args:
        max_factor ((float, float) or float): range for max factor for blurring.
            If max_factor is a single float, the range will be (1, limit). Default: (1, 1.31).
            All max_factor values should be larger than 1.
        step_factor ((float, float) or float): If single float will be used as step parameter for np.arange.
            If tuple of float step_factor will be in range `[step_factor[0], step_factor[1])`. Default: (0.01, 0.03).
            All step_factor values should be positive.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        unit8, float32

    Reference:
        https://arxiv.org/abs/1903.12261
    """

    class InitSchema(BaseTransformInitSchema):
        max_factor: OnePlusFloatRangeType
        step_factor: NonNegativeFloatRangeType

    def __init__(
        self,
        max_factor: ScaleFloatType = (1, 1.31),
        step_factor: ScaleFloatType = (0.01, 0.03),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.max_factor = cast(tuple[float, float], max_factor)
        self.step_factor = cast(tuple[float, float], step_factor)

    def apply(self, img: np.ndarray, zoom_factors: np.ndarray, **params: Any) -> np.ndarray:
        return fblur.zoom_blur(img, zoom_factors)

    def get_params(self) -> dict[str, Any]:
        step_factor = self.py_random.uniform(*self.step_factor)
        max_factor = max(1 + step_factor, self.py_random.uniform(*self.max_factor))
        return {"zoom_factors": np.arange(1.0, max_factor, step_factor)}

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("max_factor", "step_factor")

crops special

functional

def crop_and_pad_keypoints (keypoints, crop_params=None, pad_params=None, image_shape=(0, 0), result_shape=(0, 0), keep_size=False) [view source on GitHub]

Crop and pad multiple keypoints simultaneously.

Parameters:

Name Type Description
keypoints np.ndarray

Array of keypoints with shape (N, 4+) where each row is (x, y, angle, scale, ...).

crop_params Sequence[int]

Crop parameters [crop_x1, crop_y1, ...].

pad_params Sequence[int]

Pad parameters [top, bottom, left, right].

image_shape Tuple[int, int]

Original image shape (rows, cols).

result_shape Tuple[int, int]

Result image shape (rows, cols).

keep_size bool

Whether to keep the original size.

Returns:

Type Description
np.ndarray

Array of transformed keypoints with the same shape as input.

Source code in albumentations/augmentations/crops/functional.py
Python
@handle_empty_array("keypoints")
def crop_and_pad_keypoints(
    keypoints: np.ndarray,
    crop_params: tuple[int, int, int, int] | None = None,
    pad_params: tuple[int, int, int, int] | None = None,
    image_shape: tuple[int, int] = (0, 0),
    result_shape: tuple[int, int] = (0, 0),
    keep_size: bool = False,
) -> np.ndarray:
    """Crop and pad multiple keypoints simultaneously.

    Args:
        keypoints (np.ndarray): Array of keypoints with shape (N, 4+) where each row is (x, y, angle, scale, ...).
        crop_params (Sequence[int], optional): Crop parameters [crop_x1, crop_y1, ...].
        pad_params (Sequence[int], optional): Pad parameters [top, bottom, left, right].
        image_shape (Tuple[int, int]): Original image shape (rows, cols).
        result_shape (Tuple[int, int]): Result image shape (rows, cols).
        keep_size (bool): Whether to keep the original size.

    Returns:
        np.ndarray: Array of transformed keypoints with the same shape as input.
    """
    transformed_keypoints = keypoints.copy()

    if crop_params is not None:
        crop_x1, crop_y1 = crop_params[:2]
        transformed_keypoints[:, 0] -= crop_x1
        transformed_keypoints[:, 1] -= crop_y1

    if pad_params is not None:
        top, _, left, _ = pad_params
        transformed_keypoints[:, 0] += left
        transformed_keypoints[:, 1] += top

    rows, cols = image_shape[:2]
    result_rows, result_cols = result_shape[:2]

    if keep_size and (result_cols != cols or result_rows != rows):
        scale_x = cols / result_cols
        scale_y = rows / result_rows
        return fgeometric.keypoints_scale(transformed_keypoints, scale_x, scale_y)

    return transformed_keypoints
def crop_bboxes_by_coords (bboxes, crop_coords, image_shape, normalized_input=True) [view source on GitHub]

Crop bounding boxes based on given crop coordinates.

This function adjusts bounding boxes to fit within a cropped image.

Parameters:

Name Type Description
bboxes np.ndarray

Array of bounding boxes with shape (N, 4+) where each row is [x_min, y_min, x_max, y_max, ...]. The bounding box coordinates can be either normalized (in [0, 1]) if normalized_input=True or absolute pixel values if normalized_input=False.

crop_coords tuple[int, int, int, int]

Crop coordinates (x_min, y_min, x_max, y_max) in absolute pixel values.

image_shape tuple[int, int]

Original image shape (height, width).

normalized_input bool

Whether input boxes are in normalized coordinates. If True, assumes input is normalized [0,1] and returns normalized coordinates. If False, assumes input is in absolute pixels and returns absolute coordinates. Default: True for backward compatibility.

Returns:

Type Description
np.ndarray

Array of cropped bounding boxes. Coordinates will be in the same format as input (normalized if normalized_input=True, absolute pixels if normalized_input=False).

Note

Bounding boxes that fall completely outside the crop area will be removed. Bounding boxes that partially overlap with the crop area will be adjusted to fit within it.

Source code in albumentations/augmentations/crops/functional.py
Python
def crop_bboxes_by_coords(
    bboxes: np.ndarray,
    crop_coords: tuple[int, int, int, int],
    image_shape: tuple[int, int],
    normalized_input: bool = True,
) -> np.ndarray:
    """Crop bounding boxes based on given crop coordinates.

    This function adjusts bounding boxes to fit within a cropped image.

    Args:
        bboxes (np.ndarray): Array of bounding boxes with shape (N, 4+) where each row is
                             [x_min, y_min, x_max, y_max, ...]. The bounding box coordinates
                             can be either normalized (in [0, 1]) if normalized_input=True or
                             absolute pixel values if normalized_input=False.
        crop_coords (tuple[int, int, int, int]): Crop coordinates (x_min, y_min, x_max, y_max)
                                                 in absolute pixel values.
        image_shape (tuple[int, int]): Original image shape (height, width).
        normalized_input (bool): Whether input boxes are in normalized coordinates.
                               If True, assumes input is normalized [0,1] and returns normalized coordinates.
                               If False, assumes input is in absolute pixels and returns absolute coordinates.
                               Default: True for backward compatibility.

    Returns:
        np.ndarray: Array of cropped bounding boxes. Coordinates will be in the same format as input
                   (normalized if normalized_input=True, absolute pixels if normalized_input=False).

    Note:
        Bounding boxes that fall completely outside the crop area will be removed.
        Bounding boxes that partially overlap with the crop area will be adjusted to fit within it.
    """
    if not bboxes.size:
        return bboxes

    # Convert to absolute coordinates if needed
    if normalized_input:
        cropped_bboxes = denormalize_bboxes(bboxes.copy().astype(np.float32), image_shape)
    else:
        cropped_bboxes = bboxes.copy().astype(np.float32)

    x_min, y_min = crop_coords[:2]

    # Subtract crop coordinates
    cropped_bboxes[:, [0, 2]] -= x_min
    cropped_bboxes[:, [1, 3]] -= y_min

    # Calculate crop shape
    crop_height = crop_coords[3] - crop_coords[1]
    crop_width = crop_coords[2] - crop_coords[0]
    crop_shape = (crop_height, crop_width)

    # Return in same format as input
    return normalize_bboxes(cropped_bboxes, crop_shape) if normalized_input else cropped_bboxes
def crop_keypoints_by_coords (keypoints, crop_coords) [view source on GitHub]

Crop keypoints using the provided coordinates of bottom-left and top-right corners in pixels.

Parameters:

Name Type Description
keypoints np.ndarray

An array of keypoints with shape (N, 4+) where each row is (x, y, angle, scale, ...).

crop_coords tuple

Crop box coords (x1, y1, x2, y2).

Returns:

Type Description
np.ndarray

An array of cropped keypoints with the same shape as the input.

Source code in albumentations/augmentations/crops/functional.py
Python
@handle_empty_array("keypoints")
def crop_keypoints_by_coords(
    keypoints: np.ndarray,
    crop_coords: tuple[int, int, int, int],
) -> np.ndarray:
    """Crop keypoints using the provided coordinates of bottom-left and top-right corners in pixels.

    Args:
        keypoints (np.ndarray): An array of keypoints with shape (N, 4+) where each row is (x, y, angle, scale, ...).
        crop_coords (tuple): Crop box coords (x1, y1, x2, y2).

    Returns:
        np.ndarray: An array of cropped keypoints with the same shape as the input.
    """
    x1, y1 = crop_coords[:2]

    cropped_keypoints = keypoints.copy()
    cropped_keypoints[:, 0] -= x1  # Adjust x coordinates
    cropped_keypoints[:, 1] -= y1  # Adjust y coordinates

    return cropped_keypoints

transforms

class BBoxSafeRandomCrop (erosion_rate=0.0, p=1.0, always_apply=None) [view source on GitHub]

Crop a random part of the input without loss of bounding boxes.

This transform performs a random crop of the input image while ensuring that all bounding boxes remain within the cropped area. It's particularly useful for object detection tasks where preserving all objects in the image is crucial.

Parameters:

Name Type Description
erosion_rate float

A value between 0.0 and 1.0 that determines the minimum allowable size of the crop as a fraction of the original image size. For example, an erosion_rate of 0.2 means the crop will be at least 80% of the original image height. Default: 0.0 (no minimum size).

p float

Probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

This transform ensures that all bounding boxes in the original image are fully contained within the cropped area. If it's not possible to find such a crop (e.g., when bounding boxes are too spread out), it will default to cropping the entire image.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.ones((300, 300, 3), dtype=np.uint8)
>>> bboxes = [(10, 10, 50, 50), (100, 100, 150, 150)]
>>> transform = A.Compose([
...     A.BBoxSafeRandomCrop(erosion_rate=0.2, p=1.0),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))
>>> transformed = transform(image=image, bboxes=bboxes, labels=['cat', 'dog'])
>>> transformed_image = transformed['image']
>>> transformed_bboxes = transformed['bboxes']

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class BBoxSafeRandomCrop(BaseCrop):
    """Crop a random part of the input without loss of bounding boxes.

    This transform performs a random crop of the input image while ensuring that all bounding boxes remain within
    the cropped area. It's particularly useful for object detection tasks where preserving all objects in the image
    is crucial.

    Args:
        erosion_rate (float): A value between 0.0 and 1.0 that determines the minimum allowable size of the crop
            as a fraction of the original image size. For example, an erosion_rate of 0.2 means the crop will be
            at least 80% of the original image height. Default: 0.0 (no minimum size).
        p (float): Probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        This transform ensures that all bounding boxes in the original image are fully contained within the
        cropped area. If it's not possible to find such a crop (e.g., when bounding boxes are too spread out),
        it will default to cropping the entire image.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.ones((300, 300, 3), dtype=np.uint8)
        >>> bboxes = [(10, 10, 50, 50), (100, 100, 150, 150)]
        >>> transform = A.Compose([
        ...     A.BBoxSafeRandomCrop(erosion_rate=0.2, p=1.0),
        ... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))
        >>> transformed = transform(image=image, bboxes=bboxes, labels=['cat', 'dog'])
        >>> transformed_image = transformed['image']
        >>> transformed_bboxes = transformed['bboxes']
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        erosion_rate: float = Field(
            ge=0.0,
            le=1.0,
        )

    def __init__(self, erosion_rate: float = 0.0, p: float = 1.0, always_apply: bool | None = None):
        super().__init__(p=p)
        self.erosion_rate = erosion_rate

    def _get_coords_no_bbox(self, image_shape: tuple[int, int]) -> tuple[int, int, int, int]:
        image_height, image_width = image_shape

        erosive_h = int(image_height * (1.0 - self.erosion_rate))
        crop_height = image_height if erosive_h >= image_height else self.py_random.randint(erosive_h, image_height)

        crop_width = int(crop_height * image_width / image_height)

        h_start = self.py_random.random()
        w_start = self.py_random.random()

        crop_shape = (crop_height, crop_width)

        return fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        image_shape = params["shape"][:2]

        if len(data["bboxes"]) == 0:  # less likely, this class is for use with bboxes.
            crop_coords = self._get_coords_no_bbox(image_shape)
            return {"crop_coords": crop_coords}

        bbox_union = union_of_bboxes(bboxes=data["bboxes"], erosion_rate=self.erosion_rate)

        if bbox_union is None:
            crop_coords = self._get_coords_no_bbox(image_shape)
            return {"crop_coords": crop_coords}

        x_min, y_min, x_max, y_max = bbox_union

        x_min = np.clip(x_min, 0, 1)
        y_min = np.clip(y_min, 0, 1)
        x_max = np.clip(x_max, x_min, 1)
        y_max = np.clip(y_max, y_min, 1)

        image_height, image_width = image_shape

        crop_x_min = int(x_min * self.py_random.random() * image_width)
        crop_y_min = int(y_min * self.py_random.random() * image_height)

        bbox_xmax = x_max + (1 - x_max) * self.py_random.random()
        bbox_ymax = y_max + (1 - y_max) * self.py_random.random()
        crop_x_max = int(bbox_xmax * image_width)
        crop_y_max = int(bbox_ymax * image_height)

        return {"crop_coords": (crop_x_min, crop_y_min, crop_x_max, crop_y_max)}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("erosion_rate",)
class BaseCrop [view source on GitHub]

Base class for transforms that only perform cropping.

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class BaseCrop(DualTransform):
    """Base class for transforms that only perform cropping."""

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(
        self,
        img: np.ndarray,
        crop_coords: tuple[int, int, int, int],
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop(img, x_min=crop_coords[0], y_min=crop_coords[1], x_max=crop_coords[2], y_max=crop_coords[3])

    def apply_to_bboxes(
        self,
        bboxes: np.ndarray,
        crop_coords: tuple[int, int, int, int],
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_bboxes_by_coords(bboxes, crop_coords, params["shape"][:2])

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        crop_coords: tuple[int, int, int, int],
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_keypoints_by_coords(keypoints, crop_coords)

    @staticmethod
    def _clip_bbox(bbox: tuple[int, int, int, int], image_shape: tuple[int, int]) -> tuple[int, int, int, int]:
        height, width = image_shape[:2]
        x_min, y_min, x_max, y_max = bbox
        x_min = np.clip(x_min, 0, width)
        y_min = np.clip(y_min, 0, height)

        x_max = np.clip(x_max, x_min, width)
        y_max = np.clip(y_max, y_min, height)
        return x_min, y_min, x_max, y_max
class BaseCropAndPad (pad_if_needed, border_mode, fill, fill_mask, pad_position, p, always_apply=None) [view source on GitHub]

Base class for transforms that need both cropping and padding.

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class BaseCropAndPad(BaseCrop):
    """Base class for transforms that need both cropping and padding."""

    class InitSchema(BaseTransformInitSchema):
        pad_if_needed: bool
        border_mode: BorderModeType
        fill: ColorType
        fill_mask: ColorType
        pad_position: PositionType

    def __init__(
        self,
        pad_if_needed: bool,
        border_mode: int,
        fill: ColorType,
        fill_mask: ColorType,
        pad_position: PositionType,
        p: float,
        always_apply: bool | None = None,
    ):
        super().__init__(p=p)
        self.pad_if_needed = pad_if_needed
        self.border_mode = border_mode
        self.fill = fill
        self.fill_mask = fill_mask
        self.pad_position = pad_position

    def _get_pad_params(self, image_shape: tuple[int, int], target_shape: tuple[int, int]) -> dict[str, Any] | None:
        """Calculate padding parameters if needed."""
        if not self.pad_if_needed:
            return None

        h_pad_top, h_pad_bottom, w_pad_left, w_pad_right = fgeometric.get_padding_params(
            image_shape=image_shape,
            min_height=target_shape[0],
            min_width=target_shape[1],
            pad_height_divisor=None,
            pad_width_divisor=None,
        )

        if h_pad_top == h_pad_bottom == w_pad_left == w_pad_right == 0:
            return None

        h_pad_top, h_pad_bottom, w_pad_left, w_pad_right = fgeometric.adjust_padding_by_position(
            h_top=h_pad_top,
            h_bottom=h_pad_bottom,
            w_left=w_pad_left,
            w_right=w_pad_right,
            position=self.pad_position,
            py_random=self.py_random,
        )

        return {
            "pad_top": h_pad_top,
            "pad_bottom": h_pad_bottom,
            "pad_left": w_pad_left,
            "pad_right": w_pad_right,
        }

    def apply(
        self,
        img: np.ndarray,
        crop_coords: tuple[int, int, int, int],
        **params: Any,
    ) -> np.ndarray:
        pad_params = params.get("pad_params")
        if pad_params is not None:
            img = fgeometric.pad_with_params(
                img,
                pad_params["pad_top"],
                pad_params["pad_bottom"],
                pad_params["pad_left"],
                pad_params["pad_right"],
                border_mode=self.border_mode,
                value=self.fill,
            )
        return super().apply(img, crop_coords, **params)

    def apply_to_bboxes(
        self,
        bboxes: np.ndarray,
        crop_coords: tuple[int, int, int, int],
        **params: Any,
    ) -> np.ndarray:
        pad_params = params.get("pad_params")
        image_shape = params["shape"][:2]

        if pad_params is not None:
            # First denormalize bboxes to absolute coordinates
            bboxes_np = denormalize_bboxes(bboxes, image_shape)

            # Apply padding to bboxes (already works with absolute coordinates)
            bboxes_np = fgeometric.pad_bboxes(
                bboxes_np,
                pad_params["pad_top"],
                pad_params["pad_bottom"],
                pad_params["pad_left"],
                pad_params["pad_right"],
                self.border_mode,
                image_shape=image_shape,
            )

            # Update shape to padded dimensions
            padded_height = image_shape[0] + pad_params["pad_top"] + pad_params["pad_bottom"]
            padded_width = image_shape[1] + pad_params["pad_left"] + pad_params["pad_right"]
            padded_shape = (padded_height, padded_width)

            bboxes_np = normalize_bboxes(bboxes_np, padded_shape)

            params["shape"] = padded_shape

            return super().apply_to_bboxes(bboxes_np, crop_coords, **params)

        # If no padding, use original function behavior
        return super().apply_to_bboxes(bboxes, crop_coords, **params)

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        crop_coords: tuple[int, int, int, int],
        **params: Any,
    ) -> np.ndarray:
        pad_params = params.get("pad_params")
        image_shape = params["shape"][:2]

        if pad_params is not None:
            # Calculate padded dimensions
            padded_height = image_shape[0] + pad_params["pad_top"] + pad_params["pad_bottom"]
            padded_width = image_shape[1] + pad_params["pad_left"] + pad_params["pad_right"]

            # First apply padding to keypoints using original image shape
            keypoints = fgeometric.pad_keypoints(
                keypoints,
                pad_params["pad_top"],
                pad_params["pad_bottom"],
                pad_params["pad_left"],
                pad_params["pad_right"],
                self.border_mode,
                image_shape=image_shape,
            )

            # Update image shape for subsequent crop operation
            params = {**params, "shape": (padded_height, padded_width)}

        return super().apply_to_keypoints(keypoints, crop_coords, **params)
class BaseRandomSizedCropInitSchema [view source on GitHub]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class BaseRandomSizedCropInitSchema(BaseTransformInitSchema):
    size: tuple[int, int]

    @field_validator("size")
    @classmethod
    def check_size(cls, value: tuple[int, int]) -> tuple[int, int]:
        if any(x <= 0 for x in value):
            raise ValueError("All elements of 'size' must be positive integers.")
        return value
class CenterCrop (height, width, pad_if_needed=False, pad_mode=None, pad_cval=None, pad_cval_mask=None, pad_position='center', border_mode=0, fill=0.0, fill_mask=0.0, p=1.0, always_apply=None) [view source on GitHub]

Crop the central part of the input.

This transform crops the center of the input image, mask, bounding boxes, and keypoints to the specified dimensions. It's useful when you want to focus on the central region of the input, discarding peripheral information.

Parameters:

Name Type Description
height int

The height of the crop. Must be greater than 0.

width int

The width of the crop. Must be greater than 0.

pad_if_needed bool

Whether to pad if crop size exceeds image size. Default: False.

border_mode OpenCV flag

OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.

fill ColorType

Padding value for images if border_mode is cv2.BORDER_CONSTANT. Default: 0.

fill_mask ColorType

Padding value for masks if border_mode is cv2.BORDER_CONSTANT. Default: 0.

pad_position Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random']

Position of padding. Default: 'center'.

p float

Probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

  • If pad_if_needed is False and crop size exceeds image dimensions, it will raise a CropSizeError.
  • If pad_if_needed is True and crop size exceeds image dimensions, the image will be padded.
  • For bounding boxes and keypoints, coordinates are adjusted appropriately for both padding and cropping.

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class CenterCrop(BaseCropAndPad):
    """Crop the central part of the input.

    This transform crops the center of the input image, mask, bounding boxes, and keypoints to the specified dimensions.
    It's useful when you want to focus on the central region of the input, discarding peripheral information.

    Args:
        height (int): The height of the crop. Must be greater than 0.
        width (int): The width of the crop. Must be greater than 0.
        pad_if_needed (bool): Whether to pad if crop size exceeds image size. Default: False.
        border_mode (OpenCV flag): OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.
        fill (ColorType): Padding value for images if border_mode is
            cv2.BORDER_CONSTANT. Default: 0.
        fill_mask (ColorType): Padding value for masks if border_mode is
            cv2.BORDER_CONSTANT. Default: 0.
        pad_position (Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random']):
            Position of padding. Default: 'center'.
        p (float): Probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        - If pad_if_needed is False and crop size exceeds image dimensions, it will raise a CropSizeError.
        - If pad_if_needed is True and crop size exceeds image dimensions, the image will be padded.
        - For bounding boxes and keypoints, coordinates are adjusted appropriately for both padding and cropping.
    """

    class InitSchema(BaseCropAndPad.InitSchema):
        height: Annotated[int, Field(ge=1)]
        width: Annotated[int, Field(ge=1)]
        border_mode: BorderModeType
        fill: ColorType
        fill_mask: ColorType
        pad_mode: BorderModeType | None = Field(deprecated="pad_mode is deprecated, use border_mode instead")
        pad_cval: ColorType | None = Field(deprecated="pad_cval is deprecated, use fill instead")
        pad_cval_mask: ColorType | None = Field(deprecated="pad_cval_mask is deprecated, use fill_mask instead")

        @model_validator(mode="after")
        def validate_dimensions(self) -> Self:
            if self.pad_mode is not None:
                self.border_mode = self.pad_mode
            if self.pad_cval is not None:
                self.fill = self.pad_cval
            if self.pad_cval_mask is not None:
                self.fill_mask = self.pad_cval_mask
            return self

    def __init__(
        self,
        height: int,
        width: int,
        pad_if_needed: bool = False,
        pad_mode: int | None = None,
        pad_cval: ColorType | None = None,
        pad_cval_mask: ColorType | None = None,
        pad_position: PositionType = "center",
        border_mode: int = cv2.BORDER_CONSTANT,
        fill: ColorType = 0.0,
        fill_mask: ColorType = 0.0,
        p: float = 1.0,
        always_apply: bool | None = None,
    ):
        super().__init__(
            pad_if_needed=pad_if_needed,
            border_mode=border_mode,
            fill=fill,
            fill_mask=fill_mask,
            pad_position=pad_position,
            p=p,
        )
        self.height = height
        self.width = width

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "height",
            "width",
            "pad_if_needed",
            "border_mode",
            "fill",
            "fill_mask",
            "pad_position",
        )

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, Any]:
        image_shape = params["shape"][:2]
        image_height, image_width = image_shape

        if not self.pad_if_needed and (self.height > image_height or self.width > image_width):
            raise CropSizeError(
                f"Crop size (height, width) exceeds image dimensions (height, width):"
                f" {(self.height, self.width)} vs {image_shape[:2]}",
            )

        # Get padding params first if needed
        pad_params = self._get_pad_params(image_shape, (self.height, self.width))

        # If padding is needed, adjust the image shape for crop calculation
        if pad_params is not None:
            pad_top = pad_params["pad_top"]
            pad_bottom = pad_params["pad_bottom"]
            pad_left = pad_params["pad_left"]
            pad_right = pad_params["pad_right"]

            padded_height = image_height + pad_top + pad_bottom
            padded_width = image_width + pad_left + pad_right
            padded_shape = (padded_height, padded_width)

            # Get crop coordinates based on padded dimensions
            crop_coords = fcrops.get_center_crop_coords(padded_shape, (self.height, self.width))
        else:
            # Get crop coordinates based on original dimensions
            crop_coords = fcrops.get_center_crop_coords(image_shape, (self.height, self.width))

        return {
            "crop_coords": crop_coords,
            "pad_params": pad_params,
        }
class Crop (x_min=0, y_min=0, x_max=1024, y_max=1024, pad_if_needed=False, pad_mode=None, pad_cval=None, pad_cval_mask=None, pad_position='center', border_mode=0, fill=0, fill_mask=0, p=1.0, always_apply=None) [view source on GitHub]

Crop a specific region from the input image.

This transform crops a rectangular region from the input image, mask, bounding boxes, and keypoints based on specified coordinates. It's useful when you want to extract a specific area of interest from your inputs.

Parameters:

Name Type Description
x_min int

Minimum x-coordinate of the crop region (left edge). Must be >= 0. Default: 0.

y_min int

Minimum y-coordinate of the crop region (top edge). Must be >= 0. Default: 0.

x_max int

Maximum x-coordinate of the crop region (right edge). Must be > x_min. Default: 1024.

y_max int

Maximum y-coordinate of the crop region (bottom edge). Must be > y_min. Default: 1024.

pad_if_needed bool

Whether to pad if crop coordinates exceed image dimensions. Default: False.

border_mode OpenCV flag

OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.

fill ColorType

Padding value if border_mode is cv2.BORDER_CONSTANT. Default: 0.

fill_mask ColorType

Padding value for masks. Default: 0.

pad_position Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random']

Position of padding. Default: 'center'.

p float

Probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

  • The crop coordinates are applied as follows: x_min <= x < x_max and y_min <= y < y_max.
  • If pad_if_needed is False and crop region extends beyond image boundaries, it will be clipped.
  • If pad_if_needed is True, image will be padded to accommodate the full crop region.
  • For bounding boxes and keypoints, coordinates are adjusted appropriately for both padding and cropping.

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class Crop(BaseCropAndPad):
    """Crop a specific region from the input image.

    This transform crops a rectangular region from the input image, mask, bounding boxes, and keypoints
    based on specified coordinates. It's useful when you want to extract a specific area of interest
    from your inputs.

    Args:
        x_min (int): Minimum x-coordinate of the crop region (left edge). Must be >= 0. Default: 0.
        y_min (int): Minimum y-coordinate of the crop region (top edge). Must be >= 0. Default: 0.
        x_max (int): Maximum x-coordinate of the crop region (right edge). Must be > x_min. Default: 1024.
        y_max (int): Maximum y-coordinate of the crop region (bottom edge). Must be > y_min. Default: 1024.
        pad_if_needed (bool): Whether to pad if crop coordinates exceed image dimensions. Default: False.
        border_mode (OpenCV flag): OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.
        fill (ColorType): Padding value if border_mode is cv2.BORDER_CONSTANT. Default: 0.
        fill_mask (ColorType): Padding value for masks. Default: 0.
        pad_position (Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random']):
            Position of padding. Default: 'center'.
        p (float): Probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        - The crop coordinates are applied as follows: x_min <= x < x_max and y_min <= y < y_max.
        - If pad_if_needed is False and crop region extends beyond image boundaries, it will be clipped.
        - If pad_if_needed is True, image will be padded to accommodate the full crop region.
        - For bounding boxes and keypoints, coordinates are adjusted appropriately for both padding and cropping.
    """

    class InitSchema(BaseCropAndPad.InitSchema):
        x_min: Annotated[int, Field(ge=0)]
        y_min: Annotated[int, Field(ge=0)]
        x_max: Annotated[int, Field(gt=0)]
        y_max: Annotated[int, Field(gt=0)]
        border_mode: BorderModeType
        fill: ColorType
        fill_mask: ColorType
        pad_mode: BorderModeType | None = Field(deprecated="pad_mode is deprecated, use border_mode instead")
        pad_cval: ColorType | None = Field(deprecated="pad_cval is deprecated, use fill instead")
        pad_cval_mask: ColorType | None = Field(deprecated="pad_cval_mask is deprecated, use fill_mask instead")

        @model_validator(mode="after")
        def validate_coordinates(self) -> Self:
            if not self.x_min < self.x_max:
                msg = "x_max must be greater than x_min"
                raise ValueError(msg)
            if not self.y_min < self.y_max:
                msg = "y_max must be greater than y_min"
                raise ValueError(msg)

            if self.pad_mode is not None:
                self.border_mode = self.pad_mode
            if self.pad_cval is not None:
                self.fill = self.pad_cval
            if self.pad_cval_mask is not None:
                self.fill_mask = self.pad_cval_mask

            return self

    def __init__(
        self,
        x_min: int = 0,
        y_min: int = 0,
        x_max: int = 1024,
        y_max: int = 1024,
        pad_if_needed: bool = False,
        pad_mode: int | None = None,
        pad_cval: ColorType | None = None,
        pad_cval_mask: ColorType | None = None,
        pad_position: PositionType = "center",
        border_mode: int = cv2.BORDER_CONSTANT,
        fill: ColorType = 0,
        fill_mask: ColorType = 0,
        p: float = 1.0,
        always_apply: bool | None = None,
    ):
        super().__init__(
            pad_if_needed=pad_if_needed,
            border_mode=border_mode,
            fill=fill,
            fill_mask=fill_mask,
            pad_position=pad_position,
            p=p,
        )
        self.x_min = x_min
        self.y_min = y_min
        self.x_max = x_max
        self.y_max = y_max

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, Any]:
        image_shape = params["shape"][:2]
        image_height, image_width = image_shape

        crop_height = self.y_max - self.y_min
        crop_width = self.x_max - self.x_min

        if not self.pad_if_needed:
            # If no padding, clip coordinates to image boundaries
            x_min = np.clip(self.x_min, 0, image_width)
            y_min = np.clip(self.y_min, 0, image_height)
            x_max = np.clip(self.x_max, x_min, image_width)
            y_max = np.clip(self.y_max, y_min, image_height)
            return {"crop_coords": (x_min, y_min, x_max, y_max)}

        # Calculate padding if needed
        pad_params = self._get_pad_params(
            image_shape=image_shape,
            target_shape=(max(crop_height, image_height), max(crop_width, image_width)),
        )

        if pad_params is not None:
            # Adjust crop coordinates based on padding
            x_min = self.x_min + pad_params["pad_left"]
            y_min = self.y_min + pad_params["pad_top"]
            x_max = self.x_max + pad_params["pad_left"]
            y_max = self.y_max + pad_params["pad_top"]
            crop_coords = (x_min, y_min, x_max, y_max)
        else:
            crop_coords = (self.x_min, self.y_min, self.x_max, self.y_max)

        return {
            "crop_coords": crop_coords,
            "pad_params": pad_params,
        }

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "x_min",
            "y_min",
            "x_max",
            "y_max",
            "pad_if_needed",
            "border_mode",
            "fill",
            "fill_mask",
            "pad_position",
        )
class CropAndPad (px=None, percent=None, pad_mode=None, pad_cval=None, pad_cval_mask=None, keep_size=True, sample_independently=True, interpolation=1, mask_interpolation=0, border_mode=0, fill=0, fill_mask=0, p=1.0, always_apply=None) [view source on GitHub]

Crop and pad images by pixel amounts or fractions of image sizes.

This transform allows for simultaneous cropping and padding of images. Cropping removes pixels from the sides (i.e., extracts a subimage), while padding adds pixels to the sides (e.g., black pixels). The amount of cropping/padding can be specified either in absolute pixels or as a fraction of the image size.

Parameters:

Name Type Description
px int, tuple of int, tuple of tuples of int, or None

The number of pixels to crop (negative values) or pad (positive values) on each side of the image. Either this or the parameter percent may be set, not both at the same time. - If int: crop/pad all sides by this value. - If tuple of 2 ints: crop/pad by (top/bottom, left/right). - If tuple of 4 ints: crop/pad by (top, right, bottom, left). - Each int can also be a tuple of 2 ints for a range, or a list of ints for discrete choices. Default: None.

percent float, tuple of float, tuple of tuples of float, or None

The fraction of the image size to crop (negative values) or pad (positive values) on each side. Either this or the parameter px may be set, not both at the same time. - If float: crop/pad all sides by this fraction. - If tuple of 2 floats: crop/pad by (top/bottom, left/right) fractions. - If tuple of 4 floats: crop/pad by (top, right, bottom, left) fractions. - Each float can also be a tuple of 2 floats for a range, or a list of floats for discrete choices. Default: None.

border_mode int

OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.

fill ColorType

The constant value to use for padding if border_mode is cv2.BORDER_CONSTANT. Default: 0.

fill_mask ColorType

Same as fill but used for mask padding. Default: 0.

keep_size bool

If True, the output image will be resized to the input image size after cropping/padding. Default: True.

sample_independently bool

If True and ranges are used for px/percent, sample a value for each side independently. If False, sample one value and use it for all sides. Default: True.

interpolation int

OpenCV interpolation flag used for resizing if keep_size is True. Default: cv2.INTER_LINEAR.

mask_interpolation int

OpenCV interpolation flag used for resizing if keep_size is True. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST.

p float

Probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

  • This transform will never crop images below a height or width of 1.
  • When using pixel values (px), the image will be cropped/padded by exactly that many pixels.
  • When using percentages (percent), the amount of crop/pad will be calculated based on the image size.
  • Bounding boxes that end up fully outside the image after cropping will be removed.
  • Keypoints that end up outside the image after cropping will be removed.

Examples:

Python
>>> import albumentations as A
>>> transform = A.Compose([
...     A.CropAndPad(px=(-10, 20, 30, -40), border_mode=cv2.BORDER_REFLECT, fill=128, p=1.0),
... ])
>>> transformed = transform(image=image, mask=mask, bboxes=bboxes, keypoints=keypoints)
>>> transformed_image = transformed['image']
>>> transformed_mask = transformed['mask']
>>> transformed_bboxes = transformed['bboxes']
>>> transformed_keypoints = transformed['keypoints']

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class CropAndPad(DualTransform):
    """Crop and pad images by pixel amounts or fractions of image sizes.

    This transform allows for simultaneous cropping and padding of images. Cropping removes pixels from the sides
    (i.e., extracts a subimage), while padding adds pixels to the sides (e.g., black pixels). The amount of
    cropping/padding can be specified either in absolute pixels or as a fraction of the image size.

    Args:
        px (int, tuple of int, tuple of tuples of int, or None):
            The number of pixels to crop (negative values) or pad (positive values) on each side of the image.
            Either this or the parameter `percent` may be set, not both at the same time.
            - If int: crop/pad all sides by this value.
            - If tuple of 2 ints: crop/pad by (top/bottom, left/right).
            - If tuple of 4 ints: crop/pad by (top, right, bottom, left).
            - Each int can also be a tuple of 2 ints for a range, or a list of ints for discrete choices.
            Default: None.

        percent (float, tuple of float, tuple of tuples of float, or None):
            The fraction of the image size to crop (negative values) or pad (positive values) on each side.
            Either this or the parameter `px` may be set, not both at the same time.
            - If float: crop/pad all sides by this fraction.
            - If tuple of 2 floats: crop/pad by (top/bottom, left/right) fractions.
            - If tuple of 4 floats: crop/pad by (top, right, bottom, left) fractions.
            - Each float can also be a tuple of 2 floats for a range, or a list of floats for discrete choices.
            Default: None.

        border_mode (int):
            OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.

        fill (ColorType):
            The constant value to use for padding if border_mode is cv2.BORDER_CONSTANT.
            Default: 0.

        fill_mask (ColorType):
            Same as fill but used for mask padding. Default: 0.

        keep_size (bool):
            If True, the output image will be resized to the input image size after cropping/padding.
            Default: True.

        sample_independently (bool):
            If True and ranges are used for px/percent, sample a value for each side independently.
            If False, sample one value and use it for all sides. Default: True.

        interpolation (int):
            OpenCV interpolation flag used for resizing if keep_size is True.
            Default: cv2.INTER_LINEAR.

        mask_interpolation (int):
            OpenCV interpolation flag used for resizing if keep_size is True.
            Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_NEAREST.

        p (float):
            Probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        - This transform will never crop images below a height or width of 1.
        - When using pixel values (px), the image will be cropped/padded by exactly that many pixels.
        - When using percentages (percent), the amount of crop/pad will be calculated based on the image size.
        - Bounding boxes that end up fully outside the image after cropping will be removed.
        - Keypoints that end up outside the image after cropping will be removed.

    Example:
        >>> import albumentations as A
        >>> transform = A.Compose([
        ...     A.CropAndPad(px=(-10, 20, 30, -40), border_mode=cv2.BORDER_REFLECT, fill=128, p=1.0),
        ... ])
        >>> transformed = transform(image=image, mask=mask, bboxes=bboxes, keypoints=keypoints)
        >>> transformed_image = transformed['image']
        >>> transformed_mask = transformed['mask']
        >>> transformed_bboxes = transformed['bboxes']
        >>> transformed_keypoints = transformed['keypoints']
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        px: PxType | None
        percent: PercentType | None
        pad_mode: BorderModeType | None = Field(deprecated="pad_mode is deprecated, use border_mode instead")
        pad_cval: ColorType | None = Field(deprecated="pad_cval is deprecated, use fill instead")
        pad_cval_mask: ColorType | None = Field(deprecated="pad_cval_mask is deprecated, use fill_mask instead")
        keep_size: bool
        sample_independently: bool
        interpolation: InterpolationType
        mask_interpolation: InterpolationType
        fill: ColorType
        fill_mask: ColorType
        border_mode: BorderModeType

        @model_validator(mode="after")
        def check_px_percent(self) -> Self:
            if self.px is None and self.percent is None:
                msg = "Both px and percent parameters cannot be None simultaneously."
                raise ValueError(msg)
            if self.px is not None and self.percent is not None:
                msg = "Only px or percent may be set!"
                raise ValueError(msg)
            if self.pad_mode is not None:
                self.border_mode = self.pad_mode
            if self.pad_cval is not None:
                self.fill = self.pad_cval
            if self.pad_cval_mask is not None:
                self.fill_mask = self.pad_cval_mask

            return self

    def __init__(
        self,
        px: int | list[int] | None = None,
        percent: float | list[float] | None = None,
        pad_mode: int | None = None,
        pad_cval: ColorType | None = None,
        pad_cval_mask: ColorType | None = None,
        keep_size: bool = True,
        sample_independently: bool = True,
        interpolation: int = cv2.INTER_LINEAR,
        mask_interpolation: int = cv2.INTER_NEAREST,
        border_mode: BorderModeType = cv2.BORDER_CONSTANT,
        fill: ColorType = 0,
        fill_mask: ColorType = 0,
        p: float = 1.0,
        always_apply: bool | None = None,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.px = px
        self.percent = percent

        self.border_mode = border_mode
        self.fill = fill
        self.fill_mask = fill_mask

        self.keep_size = keep_size
        self.sample_independently = sample_independently

        self.interpolation = interpolation
        self.mask_interpolation = mask_interpolation

    def apply(
        self,
        img: np.ndarray,
        crop_params: Sequence[int],
        pad_params: Sequence[int],
        fill: ColorType,
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_and_pad(
            img,
            crop_params,
            pad_params,
            fill,
            params["shape"][:2],
            self.interpolation,
            self.border_mode,
            self.keep_size,
        )

    def apply_to_mask(
        self,
        mask: np.ndarray,
        crop_params: Sequence[int],
        pad_params: Sequence[int],
        fill_mask: ColorType,
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_and_pad(
            mask,
            crop_params,
            pad_params,
            fill_mask,
            params["shape"][:2],
            self.mask_interpolation,
            self.border_mode,
            self.keep_size,
        )

    def apply_to_bboxes(
        self,
        bboxes: np.ndarray,
        crop_params: tuple[int, int, int, int],
        pad_params: tuple[int, int, int, int],
        result_shape: tuple[int, int],
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_and_pad_bboxes(bboxes, crop_params, pad_params, params["shape"][:2], result_shape)

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        crop_params: tuple[int, int, int, int],
        pad_params: tuple[int, int, int, int],
        result_shape: tuple[int, int],
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_and_pad_keypoints(
            keypoints,
            crop_params,
            pad_params,
            params["shape"][:2],
            result_shape,
            self.keep_size,
        )

    @staticmethod
    def __prevent_zero(val1: int, val2: int, max_val: int) -> tuple[int, int]:
        regain = abs(max_val) + 1
        regain1 = regain // 2
        regain2 = regain // 2
        if regain1 + regain2 < regain:
            regain1 += 1

        if regain1 > val1:
            diff = regain1 - val1
            regain1 = val1
            regain2 += diff
        elif regain2 > val2:
            diff = regain2 - val2
            regain2 = val2
            regain1 += diff

        return val1 - regain1, val2 - regain2

    @staticmethod
    def _prevent_zero(crop_params: list[int], height: int, width: int) -> list[int]:
        top, right, bottom, left = crop_params

        remaining_height = height - (top + bottom)
        remaining_width = width - (left + right)

        if remaining_height < 1:
            top, bottom = CropAndPad.__prevent_zero(top, bottom, height)
        if remaining_width < 1:
            left, right = CropAndPad.__prevent_zero(left, right, width)

        return [max(top, 0), max(right, 0), max(bottom, 0), max(left, 0)]

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        if self.px is not None:
            new_params = self._get_px_params()
        else:
            percent_params = self._get_percent_params()
            new_params = [
                int(percent_params[0] * height),
                int(percent_params[1] * width),
                int(percent_params[2] * height),
                int(percent_params[3] * width),
            ]

        pad_params = [max(i, 0) for i in new_params]

        crop_params = self._prevent_zero([-min(i, 0) for i in new_params], height, width)

        top, right, bottom, left = crop_params
        crop_params = [left, top, width - right, height - bottom]
        result_rows = crop_params[3] - crop_params[1]
        result_cols = crop_params[2] - crop_params[0]
        if result_cols == width and result_rows == height:
            crop_params = []

        top, right, bottom, left = pad_params
        pad_params = [top, bottom, left, right]
        if any(pad_params):
            result_rows += top + bottom
            result_cols += left + right
        else:
            pad_params = []

        return {
            "crop_params": crop_params or None,
            "pad_params": pad_params or None,
            "fill": None if pad_params is None else self._get_pad_value(cast(ColorType, self.fill)),
            "fill_mask": None if pad_params is None else self._get_pad_value(cast(ColorType, self.fill_mask)),
            "result_shape": (result_rows, result_cols),
        }

    def _get_px_params(self) -> list[int]:
        if self.px is None:
            msg = "px is not set"
            raise ValueError(msg)

        if isinstance(self.px, int):
            params = [self.px] * 4
        elif len(self.px) == PAIR:
            if self.sample_independently:
                params = [self.py_random.randrange(*self.px) for _ in range(4)]
            else:
                px = self.py_random.randrange(*self.px)
                params = [px] * 4
        elif isinstance(self.px[0], int):
            params = self.px
        elif len(self.px[0]) == PAIR:
            params = [self.py_random.randrange(*i) for i in self.px]
        else:
            params = [self.py_random.choice(i) for i in self.px]

        return params

    def _get_percent_params(self) -> list[float]:
        if self.percent is None:
            msg = "percent is not set"
            raise ValueError(msg)

        if isinstance(self.percent, float):
            params = [self.percent] * 4
        elif len(self.percent) == PAIR:
            if self.sample_independently:
                params = [self.py_random.uniform(*self.percent) for _ in range(4)]
            else:
                px = self.py_random.uniform(*self.percent)
                params = [px] * 4
        elif isinstance(self.percent[0], (int, float)):
            params = self.percent
        elif len(self.percent[0]) == PAIR:
            params = [self.py_random.uniform(*i) for i in self.percent]
        else:
            params = [self.py_random.choice(i) for i in self.percent]

        return params  # params = [top, right, bottom, left]

    def _get_pad_value(
        self,
        fill: ColorType,
    ) -> int | float:
        if isinstance(fill, (list, tuple)):
            if len(fill) == PAIR:
                a, b = fill
                if isinstance(a, int) and isinstance(b, int):
                    return self.py_random.randint(a, b)
                return self.py_random.uniform(a, b)
            return self.py_random.choice(fill)

        if isinstance(fill, Real):
            return fill

        msg = "fill should be a number or list, or tuple of two numbers."
        raise ValueError(msg)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "px",
            "percent",
            "border_mode",
            "fill",
            "fill_mask",
            "keep_size",
            "sample_independently",
            "interpolation",
            "mask_interpolation",
        )
class CropNonEmptyMaskIfExists (height, width, ignore_values=None, ignore_channels=None, p=1.0, always_apply=None) [view source on GitHub]

Crop area with mask if mask is non-empty, else make random crop.

This transform attempts to crop a region containing a mask (non-zero pixels). If the mask is empty or not provided, it falls back to a random crop. This is particularly useful for segmentation tasks where you want to focus on regions of interest defined by the mask.

Parameters:

Name Type Description
height int

Vertical size of crop in pixels. Must be > 0.

width int

Horizontal size of crop in pixels. Must be > 0.

ignore_values list of int

Values to ignore in mask, 0 values are always ignored. For example, if background value is 5, set ignore_values=[5] to ignore it. Default: None.

ignore_channels list of int

Channels to ignore in mask. For example, if background is the first channel, set ignore_channels=[0] to ignore it. Default: None.

p float

Probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

  • If a mask is provided, the transform will try to crop an area containing non-zero (or non-ignored) pixels.
  • If no suitable area is found in the mask or no mask is provided, it will perform a random crop.
  • The crop size (height, width) must not exceed the original image dimensions.
  • Bounding boxes and keypoints are also cropped along with the image and mask.

Exceptions:

Type Description
ValueError

If the specified crop size is larger than the input image dimensions.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.zeros((100, 100), dtype=np.uint8)
>>> mask[25:75, 25:75] = 1  # Create a non-empty region in the mask
>>> transform = A.Compose([
...     A.CropNonEmptyMaskIfExists(height=50, width=50, p=1.0),
... ])
>>> transformed = transform(image=image, mask=mask)
>>> transformed_image = transformed['image']
>>> transformed_mask = transformed['mask']
# The resulting crop will likely include part of the non-zero region in the mask

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class CropNonEmptyMaskIfExists(BaseCrop):
    """Crop area with mask if mask is non-empty, else make random crop.

    This transform attempts to crop a region containing a mask (non-zero pixels). If the mask is empty or not provided,
    it falls back to a random crop. This is particularly useful for segmentation tasks where you want to focus on
    regions of interest defined by the mask.

    Args:
        height (int): Vertical size of crop in pixels. Must be > 0.
        width (int): Horizontal size of crop in pixels. Must be > 0.
        ignore_values (list of int, optional): Values to ignore in mask, `0` values are always ignored.
            For example, if background value is 5, set `ignore_values=[5]` to ignore it. Default: None.
        ignore_channels (list of int, optional): Channels to ignore in mask.
            For example, if background is the first channel, set `ignore_channels=[0]` to ignore it. Default: None.
        p (float): Probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        - If a mask is provided, the transform will try to crop an area containing non-zero (or non-ignored) pixels.
        - If no suitable area is found in the mask or no mask is provided, it will perform a random crop.
        - The crop size (height, width) must not exceed the original image dimensions.
        - Bounding boxes and keypoints are also cropped along with the image and mask.

    Raises:
        ValueError: If the specified crop size is larger than the input image dimensions.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> mask = np.zeros((100, 100), dtype=np.uint8)
        >>> mask[25:75, 25:75] = 1  # Create a non-empty region in the mask
        >>> transform = A.Compose([
        ...     A.CropNonEmptyMaskIfExists(height=50, width=50, p=1.0),
        ... ])
        >>> transformed = transform(image=image, mask=mask)
        >>> transformed_image = transformed['image']
        >>> transformed_mask = transformed['mask']
        # The resulting crop will likely include part of the non-zero region in the mask
    """

    class InitSchema(BaseCrop.InitSchema):
        ignore_values: list[int] | None
        ignore_channels: list[int] | None
        height: Annotated[int, Field(ge=1)]
        width: Annotated[int, Field(ge=1)]

    def __init__(
        self,
        height: int,
        width: int,
        ignore_values: list[int] | None = None,
        ignore_channels: list[int] | None = None,
        p: float = 1.0,
        always_apply: bool | None = None,
    ):
        super().__init__(p=p)

        self.height = height
        self.width = width
        self.ignore_values = ignore_values
        self.ignore_channels = ignore_channels

    def _preprocess_mask(self, mask: np.ndarray) -> np.ndarray:
        mask_height, mask_width = mask.shape[:2]

        if self.ignore_values is not None:
            ignore_values_np = np.array(self.ignore_values)
            mask = np.where(np.isin(mask, ignore_values_np), 0, mask)

        if mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS and self.ignore_channels is not None:
            target_channels = np.array([ch for ch in range(mask.shape[-1]) if ch not in self.ignore_channels])
            mask = np.take(mask, target_channels, axis=-1)

        if self.height > mask_height or self.width > mask_width:
            raise ValueError(
                f"Crop size ({self.height},{self.width}) is larger than image ({mask_height},{mask_width})",
            )

        return mask

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, Any]:
        """Get crop coordinates based on mask content."""
        if "mask" in data:
            mask = self._preprocess_mask(data["mask"])
        elif "masks" in data and len(data["masks"]):
            masks = data["masks"]
            mask = self._preprocess_mask(np.copy(masks[0]))
            for m in masks[1:]:
                mask |= self._preprocess_mask(m)
        else:
            msg = "Can not find mask for CropNonEmptyMaskIfExists"
            raise RuntimeError(msg)

        mask_height, mask_width = mask.shape[:2]

        if mask.any():
            # Find non-zero regions in mask
            mask_sum = mask.sum(axis=-1) if mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS else mask
            non_zero_yx = np.argwhere(mask_sum)
            y, x = self.py_random.choice(non_zero_yx)

            # Calculate crop coordinates centered around chosen point
            x_min = x - self.py_random.randint(0, self.width - 1)
            y_min = y - self.py_random.randint(0, self.height - 1)
            x_min = np.clip(x_min, 0, mask_width - self.width)
            y_min = np.clip(y_min, 0, mask_height - self.height)
        else:
            # Random crop if no non-zero regions
            x_min = self.py_random.randint(0, mask_width - self.width)
            y_min = self.py_random.randint(0, mask_height - self.height)

        x_max = x_min + self.width
        y_max = y_min + self.height

        return {"crop_coords": (x_min, y_min, x_max, y_max)}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "height", "width", "ignore_values", "ignore_channels"
class RandomCrop (height, width, pad_if_needed=False, pad_mode=None, pad_cval=None, pad_cval_mask=None, pad_position='center', border_mode=0, fill=0.0, fill_mask=0.0, p=1.0, always_apply=None) [view source on GitHub]

Crop a random part of the input.

Parameters:

Name Type Description
height int

height of the crop.

width int

width of the crop.

pad_if_needed bool

Whether to pad if crop size exceeds image size. Default: False.

border_mode OpenCV flag

OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.

fill ColorType

Padding value for images if border_mode is cv2.BORDER_CONSTANT. Default: 0.

fill_mask ColorType

Padding value for masks if border_mode is cv2.BORDER_CONSTANT. Default: 0.

pad_position Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random']

Position of padding. Default: 'center'.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

If pad_if_needed is True and crop size exceeds image dimensions, the image will be padded before applying the random crop.

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomCrop(BaseCropAndPad):
    """Crop a random part of the input.

    Args:
        height: height of the crop.
        width: width of the crop.
        pad_if_needed (bool): Whether to pad if crop size exceeds image size. Default: False.
        border_mode (OpenCV flag): OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.
        fill (ColorType): Padding value for images if border_mode is
            cv2.BORDER_CONSTANT. Default: 0.
        fill_mask (ColorType): Padding value for masks if border_mode is
            cv2.BORDER_CONSTANT. Default: 0.
        pad_position (Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random']):
            Position of padding. Default: 'center'.
        p: probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        If pad_if_needed is True and crop size exceeds image dimensions, the image will be padded
        before applying the random crop.
    """

    class InitSchema(BaseCropAndPad.InitSchema):
        height: Annotated[int, Field(ge=1)]
        width: Annotated[int, Field(ge=1)]
        border_mode: BorderModeType
        fill: ColorType
        fill_mask: ColorType
        pad_mode: BorderModeType | None = Field(deprecated="pad_mode is deprecated, use border_mode instead ")
        pad_cval: ColorType | None = Field(deprecated="pad_cval is deprecated, use fill instead")
        pad_cval_mask: ColorType | None = Field(deprecated="pad_cval_mask is deprecated, use fill_mask instead")

        @model_validator(mode="after")
        def validate_dimensions(self) -> Self:
            if self.pad_mode is not None:
                self.border_mode = self.pad_mode
            if self.pad_cval is not None:
                self.fill = self.pad_cval
            if self.pad_cval_mask is not None:
                self.fill_mask = self.pad_cval_mask
            return self

    def __init__(
        self,
        height: int,
        width: int,
        pad_if_needed: bool = False,
        pad_mode: int | None = None,
        pad_cval: ColorType | None = None,
        pad_cval_mask: ColorType | None = None,
        pad_position: PositionType = "center",
        border_mode: int = cv2.BORDER_CONSTANT,
        fill: ColorType = 0.0,
        fill_mask: ColorType = 0.0,
        p: float = 1.0,
        always_apply: bool | None = None,
    ):
        super().__init__(
            pad_if_needed=pad_if_needed,
            border_mode=border_mode,
            fill=fill,
            fill_mask=fill_mask,
            pad_position=pad_position,
            p=p,
        )
        self.height = height
        self.width = width

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, Any]:  # Changed return type to be more flexible
        image_shape = params["shape"][:2]
        image_height, image_width = image_shape

        if not self.pad_if_needed and (self.height > image_height or self.width > image_width):
            raise CropSizeError(
                f"Crop size (height, width) exceeds image dimensions (height, width):"
                f" {(self.height, self.width)} vs {image_shape[:2]}",
            )

        # Get padding params first if needed
        pad_params = self._get_pad_params(image_shape, (self.height, self.width))

        # If padding is needed, adjust the image shape for crop calculation
        if pad_params is not None:
            pad_top = pad_params["pad_top"]
            pad_bottom = pad_params["pad_bottom"]
            pad_left = pad_params["pad_left"]
            pad_right = pad_params["pad_right"]

            padded_height = image_height + pad_top + pad_bottom
            padded_width = image_width + pad_left + pad_right
            padded_shape = (padded_height, padded_width)

            # Get random crop coordinates based on padded dimensions
            h_start = self.py_random.random()
            w_start = self.py_random.random()
            crop_coords = fcrops.get_crop_coords(padded_shape, (self.height, self.width), h_start, w_start)
        else:
            # Get random crop coordinates based on original dimensions
            h_start = self.py_random.random()
            w_start = self.py_random.random()
            crop_coords = fcrops.get_crop_coords(image_shape, (self.height, self.width), h_start, w_start)

        return {
            "crop_coords": crop_coords,
            "pad_params": pad_params,
        }

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "height",
            "width",
            "pad_if_needed",
            "border_mode",
            "fill",
            "fill_mask",
            "pad_position",
        )
class RandomCropFromBorders (crop_left=0.1, crop_right=0.1, crop_top=0.1, crop_bottom=0.1, always_apply=None, p=1.0) [view source on GitHub]

Randomly crops the input from its borders without resizing.

This transform randomly crops parts of the input (image, mask, bounding boxes, or keypoints) from each of its borders. The amount of cropping is specified as a fraction of the input's dimensions for each side independently.

Parameters:

Name Type Description
crop_left float

The maximum fraction of width to crop from the left side. Must be in the range [0.0, 1.0]. Default: 0.1

crop_right float

The maximum fraction of width to crop from the right side. Must be in the range [0.0, 1.0]. Default: 0.1

crop_top float

The maximum fraction of height to crop from the top. Must be in the range [0.0, 1.0]. Default: 0.1

crop_bottom float

The maximum fraction of height to crop from the bottom. Must be in the range [0.0, 1.0]. Default: 0.1

p float

Probability of applying the transform. Default: 1.0

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

  • The actual amount of cropping for each side is randomly chosen between 0 and the specified maximum for each application of the transform.
  • The sum of crop_left and crop_right must not exceed 1.0, and the sum of crop_top and crop_bottom must not exceed 1.0. Otherwise, a ValueError will be raised.
  • This transform does not resize the input after cropping, so the output dimensions will be smaller than the input dimensions.
  • Bounding boxes that end up fully outside the cropped area will be removed.
  • Keypoints that end up outside the cropped area will be removed.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.RandomCropFromBorders(
...     crop_left=0.1, crop_right=0.2, crop_top=0.2, crop_bottom=0.1, p=1.0
... )
>>> result = transform(image=image)
>>> transformed_image = result['image']
# The resulting image will have random crops from each border, with the maximum
# possible crops being 10% from the left, 20% from the right, 20% from the top,
# and 10% from the bottom. The image size will be reduced accordingly.

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomCropFromBorders(BaseCrop):
    """Randomly crops the input from its borders without resizing.

    This transform randomly crops parts of the input (image, mask, bounding boxes, or keypoints)
    from each of its borders. The amount of cropping is specified as a fraction of the input's
    dimensions for each side independently.

    Args:
        crop_left (float): The maximum fraction of width to crop from the left side.
            Must be in the range [0.0, 1.0]. Default: 0.1
        crop_right (float): The maximum fraction of width to crop from the right side.
            Must be in the range [0.0, 1.0]. Default: 0.1
        crop_top (float): The maximum fraction of height to crop from the top.
            Must be in the range [0.0, 1.0]. Default: 0.1
        crop_bottom (float): The maximum fraction of height to crop from the bottom.
            Must be in the range [0.0, 1.0]. Default: 0.1
        p (float): Probability of applying the transform. Default: 1.0

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        - The actual amount of cropping for each side is randomly chosen between 0 and
          the specified maximum for each application of the transform.
        - The sum of crop_left and crop_right must not exceed 1.0, and the sum of
          crop_top and crop_bottom must not exceed 1.0. Otherwise, a ValueError will be raised.
        - This transform does not resize the input after cropping, so the output dimensions
          will be smaller than the input dimensions.
        - Bounding boxes that end up fully outside the cropped area will be removed.
        - Keypoints that end up outside the cropped area will be removed.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.RandomCropFromBorders(
        ...     crop_left=0.1, crop_right=0.2, crop_top=0.2, crop_bottom=0.1, p=1.0
        ... )
        >>> result = transform(image=image)
        >>> transformed_image = result['image']
        # The resulting image will have random crops from each border, with the maximum
        # possible crops being 10% from the left, 20% from the right, 20% from the top,
        # and 10% from the bottom. The image size will be reduced accordingly.
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        crop_left: float = Field(
            ge=0.0,
            le=1.0,
        )
        crop_right: float = Field(
            ge=0.0,
            le=1.0,
        )
        crop_top: float = Field(
            ge=0.0,
            le=1.0,
        )
        crop_bottom: float = Field(
            ge=0.0,
            le=1.0,
        )

        @model_validator(mode="after")
        def validate_crop_values(self) -> Self:
            if self.crop_left + self.crop_right > 1.0:
                msg = "The sum of crop_left and crop_right must be <= 1."
                raise ValueError(msg)
            if self.crop_top + self.crop_bottom > 1.0:
                msg = "The sum of crop_top and crop_bottom must be <= 1."
                raise ValueError(msg)
            return self

    def __init__(
        self,
        crop_left: float = 0.1,
        crop_right: float = 0.1,
        crop_top: float = 0.1,
        crop_bottom: float = 0.1,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p=p)
        self.crop_left = crop_left
        self.crop_right = crop_right
        self.crop_top = crop_top
        self.crop_bottom = crop_bottom

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        height, width = params["shape"][:2]

        x_min = self.py_random.randint(0, int(self.crop_left * width))
        x_max = self.py_random.randint(max(x_min + 1, int((1 - self.crop_right) * width)), width)

        y_min = self.py_random.randint(0, int(self.crop_top * height))
        y_max = self.py_random.randint(max(y_min + 1, int((1 - self.crop_bottom) * height)), height)

        crop_coords = x_min, y_min, x_max, y_max

        return {"crop_coords": crop_coords}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "crop_left", "crop_right", "crop_top", "crop_bottom"
class RandomCropNearBBox (max_part_shift=(0, 0.3), cropping_bbox_key='cropping_bbox', cropping_box_key=None, always_apply=None, p=1.0) [view source on GitHub]

Crop bbox from image with random shift by x,y coordinates

Parameters:

Name Type Description
max_part_shift float, (float, float

Max shift in height and width dimensions relative to cropping_bbox dimension. If max_part_shift is a single float, the range will be (0, max_part_shift). Default (0, 0.3).

cropping_bbox_key str

Additional target key for cropping box. Default cropping_bbox.

cropping_box_key str

[Deprecated] Use cropping_bbox_key instead.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Examples:

Python
>>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_bbox')],
>>>              bbox_params=BboxParams("pascal_voc"))
>>> result = aug(image=image, bboxes=bboxes, test_bbox=[0, 5, 10, 20])

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomCropNearBBox(BaseCrop):
    """Crop bbox from image with random shift by x,y coordinates

    Args:
        max_part_shift (float, (float, float)): Max shift in `height` and `width` dimensions relative
            to `cropping_bbox` dimension.
            If max_part_shift is a single float, the range will be (0, max_part_shift).
            Default (0, 0.3).
        cropping_bbox_key (str): Additional target key for cropping box. Default `cropping_bbox`.
        cropping_box_key (str): [Deprecated] Use `cropping_bbox_key` instead.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Examples:
        >>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_bbox')],
        >>>              bbox_params=BboxParams("pascal_voc"))
        >>> result = aug(image=image, bboxes=bboxes, test_bbox=[0, 5, 10, 20])

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        max_part_shift: ZeroOneRangeType
        cropping_bbox_key: str

    def __init__(
        self,
        max_part_shift: ScaleFloatType = (0, 0.3),
        cropping_bbox_key: str = "cropping_bbox",
        cropping_box_key: str | None = None,  # Deprecated
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p=p)
        # Check for deprecated parameter and issue warning
        if cropping_box_key is not None:
            warn(
                "The parameter 'cropping_box_key' is deprecated and will be removed in future versions. "
                "Use 'cropping_bbox_key' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Ensure the new parameter is used even if the old one is passed
            cropping_bbox_key = cropping_box_key

        self.max_part_shift = cast(tuple[float, float], max_part_shift)
        self.cropping_bbox_key = cropping_bbox_key

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[float, ...]]:
        bbox = data[self.cropping_bbox_key]

        image_shape = params["shape"][:2]

        bbox = self._clip_bbox(bbox, image_shape)

        h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
        w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])

        x_min = bbox[0] - self.py_random.randint(-w_max_shift, w_max_shift)
        x_max = bbox[2] + self.py_random.randint(-w_max_shift, w_max_shift)

        y_min = bbox[1] - self.py_random.randint(-h_max_shift, h_max_shift)
        y_max = bbox[3] + self.py_random.randint(-h_max_shift, h_max_shift)

        crop_coords = self._clip_bbox((x_min, y_min, x_max, y_max), image_shape)

        if crop_coords[0] == crop_coords[2] or crop_coords[1] == crop_coords[3]:
            crop_shape = (bbox[3] - bbox[1], bbox[2] - bbox[0])
            crop_coords = fcrops.get_center_crop_coords(image_shape, crop_shape)

        return {"crop_coords": crop_coords}

    @property
    def targets_as_params(self) -> list[str]:
        return [self.cropping_bbox_key]

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "max_part_shift", "cropping_bbox_key"
class RandomResizedCrop (size=None, width=None, height=None, *, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1, mask_interpolation=0, p=1.0, always_apply=None) [view source on GitHub]

Crop a random part of the input and rescale it to a specified size.

This transform first crops a random portion of the input image (or mask, bounding boxes, keypoints) and then resizes the crop to a specified size. It's particularly useful for training neural networks on images of varying sizes and aspect ratios.

Parameters:

Name Type Description
size tuple[int, int]

Target size for the output image, i.e. (height, width) after crop and resize.

scale tuple[float, float]

Range of the random size of the crop relative to the input size. For example, (0.08, 1.0) means the crop size will be between 8% and 100% of the input size. Default: (0.08, 1.0)

ratio tuple[float, float]

Range of aspect ratios of the random crop. For example, (0.75, 1.3333) allows crop aspect ratios from 3:4 to 4:3. Default: (0.75, 1.3333333333333333)

interpolation OpenCV flag

Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR

mask_interpolation OpenCV flag

Flag that is used to specify the interpolation algorithm for mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST

p float

Probability of applying the transform. Default: 1.0

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

  • This transform attempts to crop a random area with an aspect ratio and relative size specified by 'ratio' and 'scale' parameters. If it fails to find a suitable crop after 10 attempts, it will return a crop from the center of the image.
  • The crop's aspect ratio is defined as width / height.
  • Bounding boxes that end up fully outside the cropped area will be removed.
  • Keypoints that end up outside the cropped area will be removed.
  • After cropping, the result is resized to the specified size.

Mathematical Details: 1. A target area A is sampled from the range [scale[0] * input_area, scale[1] * input_area]. 2. A target aspect ratio r is sampled from the range [ratio[0], ratio[1]]. 3. The crop width and height are computed as: w = sqrt(A * r) h = sqrt(A / r) 4. If w and h are within the input image dimensions, the crop is accepted. Otherwise, steps 1-3 are repeated (up to 10 times). 5. If no valid crop is found after 10 attempts, a centered crop is taken. 6. The crop is then resized to the specified size.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.RandomResizedCrop(size=80, scale=(0.5, 1.0), ratio=(0.75, 1.33), p=1.0)
>>> result = transform(image=image)
>>> transformed_image = result['image']
# transformed_image will be a 80x80 crop from a random location in the original image,
# with the crop's size between 50% and 100% of the original image size,
# and the crop's aspect ratio between 3:4 and 4:3.

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomResizedCrop(_BaseRandomSizedCrop):
    """Crop a random part of the input and rescale it to a specified size.

    This transform first crops a random portion of the input image (or mask, bounding boxes, keypoints)
    and then resizes the crop to a specified size. It's particularly useful for training neural networks
    on images of varying sizes and aspect ratios.

    Args:
        size (tuple[int, int]): Target size for the output image, i.e. (height, width) after crop and resize.
        scale (tuple[float, float]): Range of the random size of the crop relative to the input size.
            For example, (0.08, 1.0) means the crop size will be between 8% and 100% of the input size.
            Default: (0.08, 1.0)
        ratio (tuple[float, float]): Range of aspect ratios of the random crop.
            For example, (0.75, 1.3333) allows crop aspect ratios from 3:4 to 4:3.
            Default: (0.75, 1.3333333333333333)
        interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR
        mask_interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm for mask.
            Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_NEAREST
        p (float): Probability of applying the transform. Default: 1.0

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        - This transform attempts to crop a random area with an aspect ratio and relative size
          specified by 'ratio' and 'scale' parameters. If it fails to find a suitable crop after
          10 attempts, it will return a crop from the center of the image.
        - The crop's aspect ratio is defined as width / height.
        - Bounding boxes that end up fully outside the cropped area will be removed.
        - Keypoints that end up outside the cropped area will be removed.
        - After cropping, the result is resized to the specified size.

    Mathematical Details:
        1. A target area A is sampled from the range [scale[0] * input_area, scale[1] * input_area].
        2. A target aspect ratio r is sampled from the range [ratio[0], ratio[1]].
        3. The crop width and height are computed as:
           w = sqrt(A * r)
           h = sqrt(A / r)
        4. If w and h are within the input image dimensions, the crop is accepted.
           Otherwise, steps 1-3 are repeated (up to 10 times).
        5. If no valid crop is found after 10 attempts, a centered crop is taken.
        6. The crop is then resized to the specified size.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.RandomResizedCrop(size=80, scale=(0.5, 1.0), ratio=(0.75, 1.33), p=1.0)
        >>> result = transform(image=image)
        >>> transformed_image = result['image']
        # transformed_image will be a 80x80 crop from a random location in the original image,
        # with the crop's size between 50% and 100% of the original image size,
        # and the crop's aspect ratio between 3:4 and 4:3.
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        scale: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)]
        ratio: Annotated[tuple[float, float], AfterValidator(check_0plus), AfterValidator(nondecreasing)]
        width: int | None = Field(
            None,
            deprecated="Initializing with 'height' and 'width' is deprecated. Use size instead.",
        )
        height: int | None = Field(
            None,
            deprecated="Initializing with 'height' and 'width' is deprecated. Use size instead.",
        )
        size: ScaleIntType | None
        interpolation: InterpolationType
        mask_interpolation: InterpolationType

        @model_validator(mode="after")
        def process(self) -> Self:
            if isinstance(self.size, int):
                if isinstance(self.width, int):
                    self.size = (self.size, self.width)
                else:
                    msg = "If size is an integer, width as integer must be specified."
                    raise TypeError(msg)

            if self.size is None:
                if self.height is None or self.width is None:
                    message = "If 'size' is not provided, both 'height' and 'width' must be specified."
                    raise ValueError(message)
                self.size = (self.height, self.width)

            return self

    def __init__(
        self,
        # NOTE @zetyquickly: when (width, height) are deprecated, make 'size' non optional
        size: ScaleIntType | None = None,
        width: int | None = None,
        height: int | None = None,
        *,
        scale: tuple[float, float] = (0.08, 1.0),
        ratio: tuple[float, float] = (0.75, 1.3333333333333333),
        interpolation: int = cv2.INTER_LINEAR,
        mask_interpolation: int = cv2.INTER_NEAREST,
        p: float = 1.0,
        always_apply: bool | None = None,
    ):
        super().__init__(
            size=cast(tuple[int, int], size),
            interpolation=interpolation,
            mask_interpolation=mask_interpolation,
            p=p,
        )
        self.scale = scale
        self.ratio = ratio

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        image_shape = params["shape"][:2]
        image_height, image_width = image_shape

        area = image_height * image_width

        for _ in range(10):
            target_area = self.py_random.uniform(*self.scale) * area
            log_ratio = (math.log(self.ratio[0]), math.log(self.ratio[1]))
            aspect_ratio = math.exp(self.py_random.uniform(*log_ratio))

            width = int(round(math.sqrt(target_area * aspect_ratio)))
            height = int(round(math.sqrt(target_area / aspect_ratio)))

            if 0 < width <= image_width and 0 < height <= image_height:
                i = self.py_random.randint(0, image_height - height)
                j = self.py_random.randint(0, image_width - width)

                h_start = i * 1.0 / (image_height - height + 1e-10)
                w_start = j * 1.0 / (image_width - width + 1e-10)

                crop_shape = (height, width)

                crop_coords = fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)

                return {"crop_coords": crop_coords}

        # Fallback to central crop
        in_ratio = image_width / image_height
        if in_ratio < min(self.ratio):
            width = image_width
            height = int(round(image_width / min(self.ratio)))
        elif in_ratio > max(self.ratio):
            height = image_height
            width = int(round(height * max(self.ratio)))
        else:  # whole image
            width = image_width
            height = image_height

        i = (image_height - height) // 2
        j = (image_width - width) // 2

        h_start = i * 1.0 / (image_height - height + 1e-10)
        w_start = j * 1.0 / (image_width - width + 1e-10)

        crop_shape = (height, width)

        crop_coords = fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)

        return {"crop_coords": crop_coords}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "size", "scale", "ratio", "interpolation", "mask_interpolation"
class RandomSizedBBoxSafeCrop (height, width, erosion_rate=0.0, interpolation=1, mask_interpolation=0, always_apply=None, p=1.0) [view source on GitHub]

Crop a random part of the input and rescale it to a specific size without loss of bounding boxes.

This transform first attempts to crop a random portion of the input image while ensuring that all bounding boxes remain within the cropped area. It then resizes the crop to the specified size. This is particularly useful for object detection tasks where preserving all objects in the image is crucial while also standardizing the image size.

Parameters:

Name Type Description
height int

Height of the output image after resizing.

width int

Width of the output image after resizing.

erosion_rate float

A value between 0.0 and 1.0 that determines the minimum allowable size of the crop as a fraction of the original image size. For example, an erosion_rate of 0.2 means the crop will be at least 80% of the original image height and width. Default: 0.0 (no minimum size).

interpolation OpenCV flag

Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

mask_interpolation OpenCV flag

Flag that is used to specify the interpolation algorithm for mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST.

p float

Probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

  • This transform ensures that all bounding boxes in the original image are fully contained within the cropped area. If it's not possible to find such a crop (e.g., when bounding boxes are too spread out), it will default to cropping the entire image.
  • After cropping, the result is resized to the specified (height, width) size.
  • Bounding box coordinates are adjusted to match the new image size.
  • Keypoints are moved along with the crop and scaled to the new image size.
  • If there are no bounding boxes in the image, it will fall back to a random crop.

Mathematical Details: 1. A crop region is selected that includes all bounding boxes. 2. The crop size is determined by the erosion_rate: min_crop_size = (1 - erosion_rate) * original_size 3. If the selected crop is smaller than min_crop_size, it's expanded to meet this requirement. 4. The crop is then resized to the specified (height, width) size. 5. Bounding box coordinates are transformed to match the new image size: new_coord = (old_coord - crop_start) * (new_size / crop_size)

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8)
>>> bboxes = [(10, 10, 50, 50), (100, 100, 150, 150)]
>>> transform = A.Compose([
...     A.RandomSizedBBoxSafeCrop(height=224, width=224, erosion_rate=0.2, p=1.0),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))
>>> transformed = transform(image=image, bboxes=bboxes, labels=['cat', 'dog'])
>>> transformed_image = transformed['image']
>>> transformed_bboxes = transformed['bboxes']
# transformed_image will be a 224x224 image containing all original bounding boxes,
# with their coordinates adjusted to the new image size.

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomSizedBBoxSafeCrop(BBoxSafeRandomCrop):
    """Crop a random part of the input and rescale it to a specific size without loss of bounding boxes.

    This transform first attempts to crop a random portion of the input image while ensuring that all bounding boxes
    remain within the cropped area. It then resizes the crop to the specified size. This is particularly useful for
    object detection tasks where preserving all objects in the image is crucial while also standardizing the image size.

    Args:
        height (int): Height of the output image after resizing.
        width (int): Width of the output image after resizing.
        erosion_rate (float): A value between 0.0 and 1.0 that determines the minimum allowable size of the crop
            as a fraction of the original image size. For example, an erosion_rate of 0.2 means the crop will be
            at least 80% of the original image height and width. Default: 0.0 (no minimum size).
        interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        mask_interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm for mask.
            Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_NEAREST.
        p (float): Probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        - This transform ensures that all bounding boxes in the original image are fully contained within the
          cropped area. If it's not possible to find such a crop (e.g., when bounding boxes are too spread out),
          it will default to cropping the entire image.
        - After cropping, the result is resized to the specified (height, width) size.
        - Bounding box coordinates are adjusted to match the new image size.
        - Keypoints are moved along with the crop and scaled to the new image size.
        - If there are no bounding boxes in the image, it will fall back to a random crop.

    Mathematical Details:
        1. A crop region is selected that includes all bounding boxes.
        2. The crop size is determined by the erosion_rate:
           min_crop_size = (1 - erosion_rate) * original_size
        3. If the selected crop is smaller than min_crop_size, it's expanded to meet this requirement.
        4. The crop is then resized to the specified (height, width) size.
        5. Bounding box coordinates are transformed to match the new image size:
           new_coord = (old_coord - crop_start) * (new_size / crop_size)

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8)
        >>> bboxes = [(10, 10, 50, 50), (100, 100, 150, 150)]
        >>> transform = A.Compose([
        ...     A.RandomSizedBBoxSafeCrop(height=224, width=224, erosion_rate=0.2, p=1.0),
        ... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))
        >>> transformed = transform(image=image, bboxes=bboxes, labels=['cat', 'dog'])
        >>> transformed_image = transformed['image']
        >>> transformed_bboxes = transformed['bboxes']
        # transformed_image will be a 224x224 image containing all original bounding boxes,
        # with their coordinates adjusted to the new image size.
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        height: Annotated[int, Field(ge=1)]
        width: Annotated[int, Field(ge=1)]
        erosion_rate: float = Field(
            ge=0.0,
            le=1.0,
        )
        interpolation: InterpolationType
        mask_interpolation: InterpolationType

    def __init__(
        self,
        height: int,
        width: int,
        erosion_rate: float = 0.0,
        interpolation: int = cv2.INTER_LINEAR,
        mask_interpolation: int = cv2.INTER_NEAREST,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(erosion_rate=erosion_rate, p=p)
        self.height = height
        self.width = width
        self.interpolation = interpolation
        self.mask_interpolation = mask_interpolation

    def apply(
        self,
        img: np.ndarray,
        crop_coords: tuple[int, int, int, int],
        **params: Any,
    ) -> np.ndarray:
        crop = fcrops.crop(img, *crop_coords)
        return fgeometric.resize(crop, (self.height, self.width), self.interpolation)

    def apply_to_mask(
        self,
        mask: np.ndarray,
        crop_coords: tuple[int, int, int, int],
        **params: Any,
    ) -> np.ndarray:
        crop = fcrops.crop(mask, *crop_coords)
        return fgeometric.resize(crop, (self.height, self.width), self.mask_interpolation)

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        crop_coords: tuple[int, int, int, int],
        **params: Any,
    ) -> np.ndarray:
        keypoints = fcrops.crop_keypoints_by_coords(keypoints, crop_coords)

        crop_height = crop_coords[3] - crop_coords[1]
        crop_width = crop_coords[2] - crop_coords[0]

        scale_y = self.height / crop_height
        scale_x = self.width / crop_width
        return fgeometric.keypoints_scale(keypoints, scale_x=scale_x, scale_y=scale_y)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (*super().get_transform_init_args_names(), "height", "width", "interpolation", "mask_interpolation")
class RandomSizedCrop (min_max_height, size=None, width=None, height=None, *, w2h_ratio=1.0, interpolation=1, mask_interpolation=0, p=1.0, always_apply=None) [view source on GitHub]

Crop a random part of the input and rescale it to a specific size.

This transform first crops a random portion of the input and then resizes it to a specified size. The size of the random crop is controlled by the 'min_max_height' parameter.

Parameters:

Name Type Description
min_max_height tuple[int, int]

Minimum and maximum height of the crop in pixels.

size tuple[int, int]

Target size for the output image, i.e. (height, width) after crop and resize.

w2h_ratio float

Aspect ratio (width/height) of crop. Default: 1.0

interpolation OpenCV flag

Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

mask_interpolation OpenCV flag

Flag that is used to specify the interpolation algorithm for mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST.

p float

Probability of applying the transform. Default: 1.0

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

  • The crop size is randomly selected for each execution within the range specified by 'min_max_height'.
  • The aspect ratio of the crop is determined by the 'w2h_ratio' parameter.
  • After cropping, the result is resized to the specified 'size'.
  • Bounding boxes that end up fully outside the cropped area will be removed.
  • Keypoints that end up outside the cropped area will be removed.
  • This transform differs from RandomResizedCrop in that it allows more control over the crop size through the 'min_max_height' parameter, rather than using a scale parameter.

Mathematical Details: 1. A random crop height h is sampled from the range [min_max_height[0], min_max_height[1]]. 2. The crop width w is calculated as: w = h * w2h_ratio 3. A random location for the crop is selected within the input image. 4. The image is cropped to the size (h, w). 5. The crop is then resized to the specified 'size'.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.RandomSizedCrop(
...     min_max_height=(50, 80),
...     size=(64, 64),
...     w2h_ratio=1.0,
...     interpolation=cv2.INTER_LINEAR,
...     p=1.0
... )
>>> result = transform(image=image)
>>> transformed_image = result['image']
# transformed_image will be a 64x64 image, resulting from a crop with height
# between 50 and 80 pixels, and the same aspect ratio as specified by w2h_ratio,
# taken from a random location in the original image and then resized.

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomSizedCrop(_BaseRandomSizedCrop):
    """Crop a random part of the input and rescale it to a specific size.

    This transform first crops a random portion of the input and then resizes it to a specified size.
    The size of the random crop is controlled by the 'min_max_height' parameter.

    Args:
        min_max_height (tuple[int, int]): Minimum and maximum height of the crop in pixels.
        size (tuple[int, int]): Target size for the output image, i.e. (height, width) after crop and resize.
        w2h_ratio (float): Aspect ratio (width/height) of crop. Default: 1.0
        interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        mask_interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm for mask.
            Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_NEAREST.
        p (float): Probability of applying the transform. Default: 1.0

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        - The crop size is randomly selected for each execution within the range specified by 'min_max_height'.
        - The aspect ratio of the crop is determined by the 'w2h_ratio' parameter.
        - After cropping, the result is resized to the specified 'size'.
        - Bounding boxes that end up fully outside the cropped area will be removed.
        - Keypoints that end up outside the cropped area will be removed.
        - This transform differs from RandomResizedCrop in that it allows more control over the crop size
          through the 'min_max_height' parameter, rather than using a scale parameter.

    Mathematical Details:
        1. A random crop height h is sampled from the range [min_max_height[0], min_max_height[1]].
        2. The crop width w is calculated as: w = h * w2h_ratio
        3. A random location for the crop is selected within the input image.
        4. The image is cropped to the size (h, w).
        5. The crop is then resized to the specified 'size'.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.RandomSizedCrop(
        ...     min_max_height=(50, 80),
        ...     size=(64, 64),
        ...     w2h_ratio=1.0,
        ...     interpolation=cv2.INTER_LINEAR,
        ...     p=1.0
        ... )
        >>> result = transform(image=image)
        >>> transformed_image = result['image']
        # transformed_image will be a 64x64 image, resulting from a crop with height
        # between 50 and 80 pixels, and the same aspect ratio as specified by w2h_ratio,
        # taken from a random location in the original image and then resized.
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        interpolation: InterpolationType
        mask_interpolation: InterpolationType
        min_max_height: OnePlusIntRangeType
        w2h_ratio: Annotated[float, Field(gt=0)]
        width: int | None = Field(
            None,
            deprecated=(
                "Initializing with 'size' as an integer and a separate 'width' is deprecated. "
                "Please use a tuple (height, width) for the 'size' argument."
            ),
        )
        height: int | None = Field(
            None,
            deprecated=(
                "Initializing with 'height' and 'width' is deprecated. "
                "Please use a tuple (height, width) for the 'size' argument."
            ),
        )
        size: ScaleIntType | None

        @model_validator(mode="after")
        def process(self) -> Self:
            if isinstance(self.size, int):
                if isinstance(self.width, int):
                    self.size = (self.size, self.width)
                else:
                    msg = "If size is an integer, width as integer must be specified."
                    raise TypeError(msg)

            if self.size is None:
                if self.height is None or self.width is None:
                    message = "If 'size' is not provided, both 'height' and 'width' must be specified."
                    raise ValueError(message)
                self.size = (self.height, self.width)
            return self

    def __init__(
        self,
        min_max_height: tuple[int, int],
        # NOTE @zetyquickly: when (width, height) are deprecated, make 'size' non optional
        size: ScaleIntType | None = None,
        width: int | None = None,
        height: int | None = None,
        *,
        w2h_ratio: float = 1.0,
        interpolation: int = cv2.INTER_LINEAR,
        mask_interpolation: int = cv2.INTER_NEAREST,
        p: float = 1.0,
        always_apply: bool | None = None,
    ):
        super().__init__(
            size=cast(tuple[int, int], size),
            interpolation=interpolation,
            mask_interpolation=mask_interpolation,
            p=p,
        )
        self.min_max_height = min_max_height
        self.w2h_ratio = w2h_ratio

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        image_shape = params["shape"][:2]

        crop_height = self.py_random.randint(*self.min_max_height)
        crop_width = int(crop_height * self.w2h_ratio)

        crop_shape = (crop_height, crop_width)

        h_start = self.py_random.random()
        w_start = self.py_random.random()

        crop_coords = fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)

        return {"crop_coords": crop_coords}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (*super().get_transform_init_args_names(), "min_max_height", "w2h_ratio")

domain_adaptation special

functional

def apply_histogram (img, reference_image, blend_ratio) [view source on GitHub]

Apply histogram matching to an input image using a reference image and blend the result.

This function performs histogram matching between the input image and a reference image, then blends the result with the original input image based on the specified blend ratio.

Parameters:

Name Type Description
img np.ndarray

The input image to be transformed. Can be either grayscale or RGB. Supported dtypes: uint8, float32 (values should be in [0, 1] range).

reference_image np.ndarray

The reference image used for histogram matching. Should have the same number of channels as the input image. Supported dtypes: uint8, float32 (values should be in [0, 1] range).

blend_ratio float

The ratio for blending the matched image with the original image. Should be in the range [0, 1], where 0 means no change and 1 means full histogram matching.

Returns:

Type Description
np.ndarray

The transformed image after histogram matching and blending. The output will have the same shape and dtype as the input image.

Supported image types: - Grayscale images: 2D arrays - RGB images: 3D arrays with 3 channels - Multispectral images: 3D arrays with more than 3 channels

Note

  • If the input and reference images have different sizes, the reference image will be resized to match the input image's dimensions.
  • The function uses a custom implementation of histogram matching based on OpenCV and NumPy.
  • The @clipped and @preserve_channel_dim decorators ensure the output is within the valid range and maintains the original number of dimensions.
Source code in albumentations/augmentations/domain_adaptation/functional.py
Python
@clipped
@preserve_channel_dim
def apply_histogram(img: np.ndarray, reference_image: np.ndarray, blend_ratio: float) -> np.ndarray:
    """Apply histogram matching to an input image using a reference image and blend the result.

    This function performs histogram matching between the input image and a reference image,
    then blends the result with the original input image based on the specified blend ratio.

    Args:
        img (np.ndarray): The input image to be transformed. Can be either grayscale or RGB.
            Supported dtypes: uint8, float32 (values should be in [0, 1] range).
        reference_image (np.ndarray): The reference image used for histogram matching.
            Should have the same number of channels as the input image.
            Supported dtypes: uint8, float32 (values should be in [0, 1] range).
        blend_ratio (float): The ratio for blending the matched image with the original image.
            Should be in the range [0, 1], where 0 means no change and 1 means full histogram matching.

    Returns:
        np.ndarray: The transformed image after histogram matching and blending.
            The output will have the same shape and dtype as the input image.

    Supported image types:
        - Grayscale images: 2D arrays
        - RGB images: 3D arrays with 3 channels
        - Multispectral images: 3D arrays with more than 3 channels

    Note:
        - If the input and reference images have different sizes, the reference image
          will be resized to match the input image's dimensions.
        - The function uses a custom implementation of histogram matching based on OpenCV and NumPy.
        - The @clipped and @preserve_channel_dim decorators ensure the output is within
          the valid range and maintains the original number of dimensions.
    """
    # Resize reference image only if necessary
    if img.shape[:2] != reference_image.shape[:2]:
        reference_image = cv2.resize(reference_image, dsize=(img.shape[1