Full API Reference on a single page

Pixel-level transforms

Here is a list of all available pixel-level transforms. You can apply a pixel-level transform to any target; under the hood, the transform changes only the input image and returns all other targets, such as masks, bounding boxes, or keypoints, unchanged.
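
For example, a minimal sketch (assuming A.Blur as an arbitrary pixel-level transform; the array shapes and values are illustrative) showing that a mask passed alongside the image comes back unchanged:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.zeros((100, 100), dtype=np.uint8)
>>> transform = A.Blur(p=1.0)  # pixel-level transform: affects only the image
>>> result = transform(image=image, mask=mask)
>>> assert np.array_equal(result["mask"], mask)  # the mask is returned unchanged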

Spatial-level transforms

Here is a table of spatial-level transforms and the targets they support. If you try to apply a spatial-level transform to an unsupported target, Albumentations will raise an error. A usage sketch with bounding boxes follows the table.

Transform Image Mask BBoxes Keypoints Global Label
Affine
BBoxSafeRandomCrop
CenterCrop
CoarseDropout
Crop
CropAndPad
CropNonEmptyMaskIfExists
D4
ElasticTransform
GridDistortion
GridDropout
GridElasticDeform
HorizontalFlip
Lambda
LongestMaxSize
MaskDropout
MixUp
Morphological
NoOp
OpticalDistortion
OverlayElements
PadIfNeeded
Perspective
PiecewiseAffine
PixelDropout
RandomCrop
RandomCropFromBorders
RandomGridShuffle
RandomResizedCrop
RandomRotate90
RandomScale
RandomSizedBBoxSafeCrop
RandomSizedCrop
Resize
Rotate
SafeRotate
ShiftScaleRotate
SmallestMaxSize
Transpose
VerticalFlip
XYMasking
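
As an illustration of target support, here is a minimal sketch (the bounding-box format, coordinates, and labels are assumptions made for this example) of applying a spatial-level transform together with bounding boxes through Compose:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Compose(
...     [A.HorizontalFlip(p=1.0)],
...     bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
... )
>>> result = transform(image=image, bboxes=[(10, 20, 40, 60)], labels=[1])
>>> flipped_image, flipped_bboxes = result["image"], result["bboxes"]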

augmentations

blur

transforms

class AdvancedBlur (blur_limit=(3, 7), sigma_x_limit=(0.2, 1.0), sigma_y_limit=(0.2, 1.0), sigmaX_limit=None, sigmaY_limit=None, rotate_limit=(-90, 90), beta_limit=(0.5, 8.0), noise_limit=(0.9, 1.1), always_apply=None, p=0.5) [view source on GitHub]

Applies a Generalized Gaussian blur to the input image with randomized parameters for advanced data augmentation.

This transform creates a custom blur kernel based on the Generalized Gaussian distribution, which allows for a wide range of blur effects beyond standard Gaussian blur. It then applies this kernel to the input image through convolution. The transform also incorporates noise into the kernel, resulting in a unique combination of blurring and noise injection.

Key features of this augmentation:

  1. Generalized Gaussian Kernel: Uses a generalized normal distribution to create kernels that can range from box-like blurs to very peaked blurs, controlled by the beta parameter.

  2. Anisotropic Blurring: Allows for different blur strengths in horizontal and vertical directions (controlled by sigma_x and sigma_y), and rotation of the kernel.

  3. Kernel Noise: Adds multiplicative noise to the kernel before applying it to the image, creating more diverse and realistic blur effects.

Implementation Details: The kernel is generated using a 2D Generalized Gaussian function. The process involves:

  1. Creating a 2D grid based on the kernel size
  2. Applying rotation to this grid
  3. Calculating the kernel values using the Generalized Gaussian formula
  4. Adding multiplicative noise to the kernel
  5. Normalizing the kernel

The resulting kernel is then applied to the image using convolution.
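
Expressed as a formula (a sketch of the construction above, consistent with the source code further down; θ denotes the sampled rotation angle), the unnormalized kernel value at grid position x = (x, y) is

K(x) ∝ exp(-0.5 · (xᵀ Σ⁻¹ x)^β),  with  Σ = U · diag(σ_x², σ_y²) · Uᵀ  and  U = [[cos θ, -sin θ], [sin θ, cos θ]].

The kernel is then multiplied element-wise by the sampled noise matrix and normalized so that its values sum to 1.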

Parameters:

Name Type Description
blur_limit tuple[int, int] | int

Controls the size of the blur kernel. If a single int is provided, the kernel size will be randomly chosen between 3 and that value. Must be odd and ≥ 3. Larger values create stronger blur effects. Default: (3, 7)

sigma_x_limit tuple[float, float] | float

Controls the spread of the blur in the x direction. Higher values increase blur strength. If a single float is provided, the range will be (0, limit). Default: (0.2, 1.0)

sigma_y_limit tuple[float, float] | float

Controls the spread of the blur in the y direction. Higher values increase blur strength. If a single float is provided, the range will be (0, limit). Default: (0.2, 1.0)

rotate_limit tuple[int, int] | int

Range of angles (in degrees) for rotating the kernel. This rotation allows for diagonal blur directions. If limit is a single int, an angle is picked from (-rotate_limit, rotate_limit). Default: (-90, 90)

beta_limit tuple[float, float] | float

Shape parameter of the Generalized Gaussian distribution.
  • beta = 1 gives a standard Gaussian distribution
  • beta < 1 creates heavier tails, resulting in more uniform, box-like blur
  • beta > 1 creates lighter tails, resulting in more peaked, focused blur
Default: (0.5, 8.0)

noise_limit tuple[float, float] | float

Controls the strength of multiplicative noise applied to the kernel. Values around 1.0 keep the original kernel mostly intact, while values further from 1.0 introduce more variation. Default: (0.9, 1.1)

p float

Probability of applying the transform. Default: 0.5

Notes

  • This transform is particularly useful for simulating complex, real-world blur effects that go beyond simple Gaussian blur.
  • The combination of blur and noise can help in creating more robust models by simulating a wider range of image degradations.
  • Extreme values, especially for beta and noise, may result in unrealistic effects and should be used cautiously.

Reference

This transform is inspired by techniques described in: "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data" https://arxiv.org/abs/2107.10833

Targets

image

Image types: uint8, float32
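
Examples:

A minimal usage sketch (the parameter values below are illustrative, not recommendations):

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.AdvancedBlur(blur_limit=(3, 7), sigma_x_limit=(0.2, 1.0), sigma_y_limit=(0.2, 1.0), rotate_limit=(-45, 45), p=1.0)
>>> result = transform(image=image)
>>> blurred_image = result["image"]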


Source code in albumentations/augmentations/blur/transforms.py
Python
class AdvancedBlur(ImageOnlyTransform):
    """Applies a Generalized Gaussian blur to the input image with randomized parameters for advanced data augmentation.

    This transform creates a custom blur kernel based on the Generalized Gaussian distribution,
    which allows for a wide range of blur effects beyond standard Gaussian blur. It then applies
    this kernel to the input image through convolution. The transform also incorporates noise
    into the kernel, resulting in a unique combination of blurring and noise injection.

    Key features of this augmentation:

    1. Generalized Gaussian Kernel: Uses a generalized normal distribution to create kernels
       that can range from box-like blurs to very peaked blurs, controlled by the beta parameter.

    2. Anisotropic Blurring: Allows for different blur strengths in horizontal and vertical
       directions (controlled by sigma_x and sigma_y), and rotation of the kernel.

    3. Kernel Noise: Adds multiplicative noise to the kernel before applying it to the image,
       creating more diverse and realistic blur effects.

    Implementation Details:
        The kernel is generated using a 2D Generalized Gaussian function. The process involves:
        1. Creating a 2D grid based on the kernel size
        2. Applying rotation to this grid
        3. Calculating the kernel values using the Generalized Gaussian formula
        4. Adding multiplicative noise to the kernel
        5. Normalizing the kernel

        The resulting kernel is then applied to the image using convolution.

    Args:
        blur_limit (tuple[int, int] | int, optional): Controls the size of the blur kernel. If a single int
            is provided, the kernel size will be randomly chosen between 3 and that value.
            Must be odd and ≥ 3. Larger values create stronger blur effects.
            Default: (3, 7)

        sigma_x_limit (tuple[float, float] | float): Controls the spread of the blur in the x direction.
            Higher values increase blur strength.
            If a single float is provided, the range will be (0, limit).
            Default: (0.2, 1.0)

        sigma_y_limit (tuple[float, float] | float): Controls the spread of the blur in the y direction.
            Higher values increase blur strength.
            If a single float is provided, the range will be (0, limit).
            Default: (0.2, 1.0)

        rotate_limit (tuple[int, int] | int): Range of angles (in degrees) for rotating the kernel.
            This rotation allows for diagonal blur directions. If limit is a single int, an angle is picked
            from (-rotate_limit, rotate_limit).
            Default: (-90, 90)

        beta_limit (tuple[float, float] | float): Shape parameter of the Generalized Gaussian distribution.
            - beta = 1 gives a standard Gaussian distribution
            - beta < 1 creates heavier tails, resulting in more uniform, box-like blur
            - beta > 1 creates lighter tails, resulting in more peaked, focused blur
            Default: (0.5, 8.0)

        noise_limit (tuple[float, float] | float): Controls the strength of multiplicative noise
            applied to the kernel. Values around 1.0 keep the original kernel mostly intact,
            while values further from 1.0 introduce more variation.
            Default: (0.75, 1.25)

        p (float): Probability of applying the transform. Default: 0.5

    Notes:
        - This transform is particularly useful for simulating complex, real-world blur effects
          that go beyond simple Gaussian blur.
        - The combination of blur and noise can help in creating more robust models by simulating
          a wider range of image degradations.
        - Extreme values, especially for beta and noise, may result in unrealistic effects and
          should be used cautiously.

    Reference:
        This transform is inspired by techniques described in:
        "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data"
        https://arxiv.org/abs/2107.10833

    Targets:
        image

    Image types:
        uint8, float32
    """

    class InitSchema(BlurInitSchema):
        sigma_x_limit: NonNegativeFloatRangeType
        sigma_y_limit: NonNegativeFloatRangeType
        beta_limit: NonNegativeFloatRangeType
        noise_limit: NonNegativeFloatRangeType
        rotate_limit: SymmetricRangeType

        @field_validator("beta_limit")
        @classmethod
        def check_beta_limit(cls, value: ScaleFloatType) -> tuple[float, float]:
            result = to_tuple(value, low=0)
            if not (result[0] < 1.0 < result[1]):
                msg = "beta_limit is expected to include 1.0."
                raise ValueError(msg)
            return result

        @model_validator(mode="after")
        def validate_limits(self) -> Self:
            if (
                isinstance(self.sigma_x_limit, (tuple, list))
                and self.sigma_x_limit[0] == 0
                and isinstance(self.sigma_y_limit, (tuple, list))
                and self.sigma_y_limit[0] == 0
            ):
                msg = "sigma_x_limit and sigma_y_limit minimum value cannot be both equal to 0."
                raise ValueError(msg)
            return self

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_x_limit: ScaleFloatType = (0.2, 1.0),
        sigma_y_limit: ScaleFloatType = (0.2, 1.0),
        sigmaX_limit: ScaleFloatType | None = None,  # noqa: N803
        sigmaY_limit: ScaleFloatType | None = None,  # noqa: N803
        rotate_limit: ScaleIntType = (-90, 90),
        beta_limit: ScaleFloatType = (0.5, 8.0),
        noise_limit: ScaleFloatType = (0.9, 1.1),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        if sigmaX_limit is not None:
            warnings.warn("sigmaX_limit is deprecated; use sigma_x_limit instead.", DeprecationWarning, stacklevel=2)
            sigma_x_limit = sigmaX_limit

        if sigmaY_limit is not None:
            warnings.warn("sigmaY_limit is deprecated; use sigma_y_limit instead.", DeprecationWarning, stacklevel=2)
            sigma_y_limit = sigmaY_limit

        self.blur_limit = cast(Tuple[int, int], blur_limit)
        self.sigma_x_limit = cast(Tuple[float, float], sigma_x_limit)
        self.sigma_y_limit = cast(Tuple[float, float], sigma_y_limit)
        self.rotate_limit = cast(Tuple[int, int], rotate_limit)
        self.beta_limit = cast(Tuple[float, float], beta_limit)
        self.noise_limit = cast(Tuple[float, float], noise_limit)

    def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.convolve(img, kernel=kernel)

    def get_params(self) -> dict[str, np.ndarray]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
        sigma_x = random.uniform(*self.sigma_x_limit)
        sigma_y = random.uniform(*self.sigma_y_limit)
        angle = np.deg2rad(random.uniform(*self.rotate_limit))

        # Split into 2 cases to avoid selection of narrow kernels (beta > 1) too often.
        beta = (
            random.uniform(self.beta_limit[0], 1) if random.random() < HALF else random.uniform(1, self.beta_limit[1])
        )

        noise_matrix = random_utils.uniform(*self.noise_limit, size=(ksize, ksize))

        # Generate mesh grid centered at zero.
        ax = np.arange(-ksize // 2 + 1.0, ksize // 2 + 1.0)
        # > Shape (ksize, ksize, 2)
        grid = np.stack(np.meshgrid(ax, ax), axis=-1)

        # Calculate rotated sigma matrix
        d_matrix = np.array([[sigma_x**2, 0], [0, sigma_y**2]])
        u_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
        sigma_matrix = np.dot(u_matrix, np.dot(d_matrix, u_matrix.T))

        inverse_sigma = np.linalg.inv(sigma_matrix)
        # Described in "Parameter Estimation For Multivariate Generalized Gaussian Distributions"
        kernel = np.exp(-0.5 * np.power(np.sum(np.dot(grid, inverse_sigma) * grid, 2), beta))
        # Add noise
        kernel *= noise_matrix

        # Normalize kernel
        kernel = kernel.astype(np.float32) / np.sum(kernel)
        return {"kernel": kernel}

    def get_transform_init_args_names(self) -> tuple[str, str, str, str, str, str]:
        return (
            "blur_limit",
            "sigma_x_limit",
            "sigma_y_limit",
            "rotate_limit",
            "beta_limit",
            "noise_limit",
        )
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/blur/transforms.py
Python
class InitSchema(BlurInitSchema):
    sigma_x_limit: NonNegativeFloatRangeType
    sigma_y_limit: NonNegativeFloatRangeType
    beta_limit: NonNegativeFloatRangeType
    noise_limit: NonNegativeFloatRangeType
    rotate_limit: SymmetricRangeType

    @field_validator("beta_limit")
    @classmethod
    def check_beta_limit(cls, value: ScaleFloatType) -> tuple[float, float]:
        result = to_tuple(value, low=0)
        if not (result[0] < 1.0 < result[1]):
            msg = "beta_limit is expected to include 1.0."
            raise ValueError(msg)
        return result

    @model_validator(mode="after")
    def validate_limits(self) -> Self:
        if (
            isinstance(self.sigma_x_limit, (tuple, list))
            and self.sigma_x_limit[0] == 0
            and isinstance(self.sigma_y_limit, (tuple, list))
            and self.sigma_y_limit[0] == 0
        ):
            msg = "sigma_x_limit and sigma_y_limit minimum value cannot be both equal to 0."
            raise ValueError(msg)
        return self

apply (self, img, kernel, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.convolve(img, kernel=kernel)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params(self) -> dict[str, np.ndarray]:
    ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
    sigma_x = random.uniform(*self.sigma_x_limit)
    sigma_y = random.uniform(*self.sigma_y_limit)
    angle = np.deg2rad(random.uniform(*self.rotate_limit))

    # Split into 2 cases to avoid selection of narrow kernels (beta > 1) too often.
    beta = (
        random.uniform(self.beta_limit[0], 1) if random.random() < HALF else random.uniform(1, self.beta_limit[1])
    )

    noise_matrix = random_utils.uniform(*self.noise_limit, size=(ksize, ksize))

    # Generate mesh grid centered at zero.
    ax = np.arange(-ksize // 2 + 1.0, ksize // 2 + 1.0)
    # > Shape (ksize, ksize, 2)
    grid = np.stack(np.meshgrid(ax, ax), axis=-1)

    # Calculate rotated sigma matrix
    d_matrix = np.array([[sigma_x**2, 0], [0, sigma_y**2]])
    u_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
    sigma_matrix = np.dot(u_matrix, np.dot(d_matrix, u_matrix.T))

    inverse_sigma = np.linalg.inv(sigma_matrix)
    # Described in "Parameter Estimation For Multivariate Generalized Gaussian Distributions"
    kernel = np.exp(-0.5 * np.power(np.sum(np.dot(grid, inverse_sigma) * grid, 2), beta))
    # Add noise
    kernel *= noise_matrix

    # Normalize kernel
    kernel = kernel.astype(np.float32) / np.sum(kernel)
    return {"kernel": kernel}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str, str, str, str, str]:
    return (
        "blur_limit",
        "sigma_x_limit",
        "sigma_y_limit",
        "rotate_limit",
        "beta_limit",
        "noise_limit",
    )
class Blur (blur_limit=(3, 7), p=0.5, always_apply=None) [view source on GitHub]

Apply uniform box blur to the input image using a randomly sized square kernel.

This transform uses OpenCV's cv2.blur function, which performs a simple box filter blur. The size of the blur kernel is randomly selected for each application, allowing for varying degrees of blur intensity.

Parameters:

Name Type Description
blur_limit tuple[int, int] | int

Controls the range of the blur kernel size.
  • If a single int is provided, the kernel size will be randomly chosen between 3 and that value.
  • If a tuple of two ints is provided, it defines the inclusive range of possible kernel sizes.
The kernel size must be odd and greater than or equal to 3. Larger kernel sizes produce stronger blur effects. Default: (3, 7)

p float

Probability of applying the transform. Default: 0.5

Notes

  • The blur kernel is always square (same width and height).
  • Only odd kernel sizes are used to ensure the blur has a clear center pixel.
  • Box blur is faster than Gaussian blur but may produce less natural results.
  • This blur method averages all pixels under the kernel area, which can reduce noise but also reduce image detail.

Targets

image

Image types: uint8, float32

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Blur(blur_limit=(3, 7), p=1.0)
>>> result = transform(image=image)
>>> blurred_image = result["image"]


Source code in albumentations/augmentations/blur/transforms.py
Python
class Blur(ImageOnlyTransform):
    """Apply uniform box blur to the input image using a randomly sized square kernel.

    This transform uses OpenCV's cv2.blur function, which performs a simple box filter blur.
    The size of the blur kernel is randomly selected for each application, allowing for
    varying degrees of blur intensity.

    Args:
        blur_limit (tuple[int, int] | int): Controls the range of the blur kernel size.
            - If a single int is provided, the kernel size will be randomly chosen
              between 3 and that value.
            - If a tuple of two ints is provided, it defines the inclusive range
              of possible kernel sizes.
            The kernel size must be odd and greater than or equal to 3.
            Larger kernel sizes produce stronger blur effects.
            Default: (3, 7)

        p (float): Probability of applying the transform. Default: 0.5

    Notes:
        - The blur kernel is always square (same width and height).
        - Only odd kernel sizes are used to ensure the blur has a clear center pixel.
        - Box blur is faster than Gaussian blur but may produce less natural results.
        - This blur method averages all pixels under the kernel area, which can
          reduce noise but also reduce image detail.

    Targets:
        image

    Image types:
        uint8, float32

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.Blur(blur_limit=(3, 7), p=1.0)
        >>> result = transform(image=image)
        >>> blurred_image = result["image"]
    """

    class InitSchema(BlurInitSchema):
        pass

    def __init__(self, blur_limit: ScaleIntType = (3, 7), p: float = 0.5, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.blur_limit = cast(Tuple[int, int], blur_limit)

    def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
        return fblur.blur(img, kernel)

    def get_params(self) -> dict[str, Any]:
        return {"kernel": random_utils.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("blur_limit",)
class InitSchema


Source code in albumentations/augmentations/blur/transforms.py
Python
class InitSchema(BlurInitSchema):
    pass

apply (self, img, kernel, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
    return fblur.blur(img, kernel)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params(self) -> dict[str, Any]:
    return {"kernel": random_utils.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("blur_limit",)
class BlurInitSchema [view source on GitHub]


Source code in albumentations/augmentations/blur/transforms.py
Python
class BlurInitSchema(BaseTransformInitSchema):
    blur_limit: ScaleIntType

    @field_validator("blur_limit")
    @classmethod
    def process_blur(cls, value: ScaleIntType, info: ValidationInfo) -> tuple[int, int]:
        return process_blur_limit(value, info, min_value=3)

class Defocus (radius=(3, 10), alias_blur=(0.1, 0.5), always_apply=None, p=0.5) [view source on GitHub]

Apply defocus blur to the input image.

This transform simulates the effect of an out-of-focus camera by applying a defocus blur to the image. It uses a combination of disc kernels and Gaussian blur to create a realistic defocus effect.

Parameters:

Name Type Description
radius tuple[int, int] | int

Range for the radius of the defocus blur. If a single int is provided, the range will be [1, radius]. Larger values create a stronger blur effect. Default: (3, 10)

alias_blur tuple[float, float] | float

Range for the standard deviation of the Gaussian blur applied after the main defocus blur. This helps to reduce aliasing artifacts. If a single float is provided, the range will be (0, alias_blur). Larger values create a smoother, less aliased effect. Default: (0.1, 0.5)

p float

Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5

Targets

image

Image types: uint8, float32

Note

  • The defocus effect is created using a disc kernel, which simulates the shape of a camera's aperture.
  • The additional Gaussian blur (alias_blur) helps to soften the edges of the disc kernel, creating a more natural-looking defocus effect.
  • Larger radius values will create a stronger, more noticeable defocus effect.
  • The alias_blur parameter can be used to fine-tune the appearance of the defocus, with larger values creating a smoother, potentially more realistic effect.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Defocus(radius=(4, 8), alias_blur=(0.2, 0.4), always_apply=True)
>>> result = transform(image=image)
>>> defocused_image = result['image']


Source code in albumentations/augmentations/blur/transforms.py
Python
class Defocus(ImageOnlyTransform):
    """Apply defocus blur to the input image.

    This transform simulates the effect of an out-of-focus camera by applying a defocus blur
    to the image. It uses a combination of disc kernels and Gaussian blur to create a realistic
    defocus effect.

    Args:
        radius (tuple[int, int] | int): Range for the radius of the defocus blur.
            If a single int is provided, the range will be [1, radius].
            Larger values create a stronger blur effect.
            Default: (3, 10)

        alias_blur (tuple[float, float] | float): Range for the standard deviation of the Gaussian blur
            applied after the main defocus blur. This helps to reduce aliasing artifacts.
            If a single float is provided, the range will be (0, alias_blur).
            Larger values create a smoother, more aliased effect.
            Default: (0.1, 0.5)

        p (float): Probability of applying the transform. Should be in the range [0, 1].
            Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        - The defocus effect is created using a disc kernel, which simulates the shape of a camera's aperture.
        - The additional Gaussian blur (alias_blur) helps to soften the edges of the disc kernel, creating a
          more natural-looking defocus effect.
        - Larger radius values will create a stronger, more noticeable defocus effect.
        - The alias_blur parameter can be used to fine-tune the appearance of the defocus, with larger values
          creating a smoother, potentially more realistic effect.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.Defocus(radius=(4, 8), alias_blur=(0.2, 0.4), always_apply=True)
        >>> result = transform(image=image)
        >>> defocused_image = result['image']

    References:
        - https://en.wikipedia.org/wiki/Defocus_aberration
        - https://www.researchgate.net/publication/261311609_Realistic_Defocus_Blur_for_Multiplane_Computer-Generated_Holography
    """

    class InitSchema(BaseTransformInitSchema):
        radius: OnePlusIntRangeType
        alias_blur: NonNegativeFloatRangeType

    def __init__(
        self,
        radius: ScaleIntType = (3, 10),
        alias_blur: ScaleFloatType = (0.1, 0.5),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.radius = cast(Tuple[int, int], radius)
        self.alias_blur = cast(Tuple[float, float], alias_blur)

    def apply(self, img: np.ndarray, radius: int, alias_blur: float, **params: Any) -> np.ndarray:
        return fblur.defocus(img, radius, alias_blur)

    def get_params(self) -> dict[str, Any]:
        return {
            "radius": random.randint(*self.radius),
            "alias_blur": random.uniform(*self.alias_blur),
        }

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("radius", "alias_blur")
class InitSchema


Source code in albumentations/augmentations/blur/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    radius: OnePlusIntRangeType
    alias_blur: NonNegativeFloatRangeType

apply (self, img, radius, alias_blur, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, radius: int, alias_blur: float, **params: Any) -> np.ndarray:
    return fblur.defocus(img, radius, alias_blur)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params(self) -> dict[str, Any]:
    return {
        "radius": random.randint(*self.radius),
        "alias_blur": random.uniform(*self.alias_blur),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("radius", "alias_blur")
class GaussianBlur (blur_limit=(3, 7), sigma_limit=0, always_apply=None, p=0.5) [view source on GitHub]

Apply Gaussian blur to the input image using a randomly sized kernel.

This transform blurs the input image using a Gaussian filter with a random kernel size and sigma value. Gaussian blur is a widely used image processing technique that reduces image noise and detail, creating a smoothing effect.

Parameters:

Name Type Description
blur_limit tuple[int, int] | int

Controls the range of the Gaussian kernel size.
  • If a single int is provided, the kernel size will be randomly chosen between 0 and that value.
  • If a tuple of two ints is provided, it defines the inclusive range of possible kernel sizes.
Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1. Larger kernel sizes produce stronger blur effects. Default: (3, 7)

sigma_limit tuple[float, float] | float

Range for the Gaussian kernel standard deviation (sigma). Must be in range [0, inf).
  • If a single float is provided, sigma will be randomly chosen between 0 and that value.
  • If a tuple of two floats is provided, it defines the inclusive range of possible sigma values.
If set to 0, sigma will be computed as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. Larger sigma values produce stronger blur effects. Default: 0

p float

Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The relationship between kernel size and sigma affects the blur strength: larger kernel sizes allow for stronger blurring effects.
  • When both blur_limit and sigma_limit are set to ranges starting from 0, the blur_limit minimum is automatically set to 3 to ensure a valid kernel size.
  • For uint8 images, the computation might be faster than for floating-point images.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.GaussianBlur(blur_limit=(3, 7), sigma_limit=(0.1, 2), p=1)
>>> result = transform(image=image)
>>> blurred_image = result["image"]


Source code in albumentations/augmentations/blur/transforms.py
Python
class GaussianBlur(ImageOnlyTransform):
    """Apply Gaussian blur to the input image using a randomly sized kernel.

    This transform blurs the input image using a Gaussian filter with a random kernel size
    and sigma value. Gaussian blur is a widely used image processing technique that reduces
    image noise and detail, creating a smoothing effect.

    Args:
        blur_limit (tuple[int, int] | int): Controls the range of the Gaussian kernel size.
            - If a single int is provided, the kernel size will be randomly chosen
              between 0 and that value.
            - If a tuple of two ints is provided, it defines the inclusive range
              of possible kernel sizes.
            Must be zero or odd and in range [0, inf). If set to 0, it will be computed
            from sigma as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            Larger kernel sizes produce stronger blur effects.
            Default: (3, 7)

        sigma_limit (tuple[float, float] | float): Range for the Gaussian kernel standard
            deviation (sigma). Must be in range [0, inf).
            - If a single float is provided, sigma will be randomly chosen
              between 0 and that value.
            - If a tuple of two floats is provided, it defines the inclusive range
              of possible sigma values.
            If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`.
            Larger sigma values produce stronger blur effects.
            Default: 0

        p (float): Probability of applying the transform. Should be in the range [0, 1].
            Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The relationship between kernel size and sigma affects the blur strength:
          larger kernel sizes allow for stronger blurring effects.
        - When both blur_limit and sigma_limit are set to ranges starting from 0,
          the blur_limit minimum is automatically set to 3 to ensure a valid kernel size.
        - For uint8 images, the computation might be faster than for floating-point images.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.GaussianBlur(blur_limit=(3, 7), sigma_limit=(0.1, 2), p=1)
        >>> result = transform(image=image)
        >>> blurred_image = result["image"]
    """

    class InitSchema(BlurInitSchema):
        sigma_limit: NonNegativeFloatRangeType

        @field_validator("blur_limit")
        @classmethod
        def process_blur(cls, value: ScaleIntType, info: ValidationInfo) -> tuple[int, int]:
            return process_blur_limit(value, info, min_value=0)

        @model_validator(mode="after")
        def validate_limits(self) -> Self:
            if (
                isinstance(self.blur_limit, (tuple, list))
                and self.blur_limit[0] == 0
                and isinstance(self.sigma_limit, (tuple, list))
                and self.sigma_limit[0] == 0
            ):
                self.blur_limit = 3, max(3, self.blur_limit[1])
                warnings.warn(
                    "blur_limit and sigma_limit minimum value can not be both equal to 0. "
                    "blur_limit minimum value changed to 3.",
                    stacklevel=2,
                )

            if isinstance(self.blur_limit, tuple):
                for v in self.blur_limit:
                    if v != 0 and v % 2 != 1:
                        raise ValueError(f"Blur limit must be 0 or odd. Got: {self.blur_limit}")

            return self

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_limit: ScaleFloatType = 0,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.blur_limit = cast(Tuple[int, int], blur_limit)
        self.sigma_limit = cast(Tuple[float, float], sigma_limit)

    def apply(self, img: np.ndarray, ksize: int, sigma: float, **params: Any) -> np.ndarray:
        return fblur.gaussian_blur(img, ksize, sigma=sigma)

    def get_params(self) -> dict[str, float]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1)
        if ksize != 0 and ksize % 2 != 1:
            ksize = (ksize + 1) % (self.blur_limit[1] + 1)

        return {"ksize": ksize, "sigma": random.uniform(*self.sigma_limit)}

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return "blur_limit", "sigma_limit"
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/blur/transforms.py
Python
class InitSchema(BlurInitSchema):
    sigma_limit: NonNegativeFloatRangeType

    @field_validator("blur_limit")
    @classmethod
    def process_blur(cls, value: ScaleIntType, info: ValidationInfo) -> tuple[int, int]:
        return process_blur_limit(value, info, min_value=0)

    @model_validator(mode="after")
    def validate_limits(self) -> Self:
        if (
            isinstance(self.blur_limit, (tuple, list))
            and self.blur_limit[0] == 0
            and isinstance(self.sigma_limit, (tuple, list))
            and self.sigma_limit[0] == 0
        ):
            self.blur_limit = 3, max(3, self.blur_limit[1])
            warnings.warn(
                "blur_limit and sigma_limit minimum value can not be both equal to 0. "
                "blur_limit minimum value changed to 3.",
                stacklevel=2,
            )

        if isinstance(self.blur_limit, tuple):
            for v in self.blur_limit:
                if v != 0 and v % 2 != 1:
                    raise ValueError(f"Blur limit must be 0 or odd. Got: {self.blur_limit}")

        return self

apply (self, img, ksize, sigma, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, ksize: int, sigma: float, **params: Any) -> np.ndarray:
    return fblur.gaussian_blur(img, ksize, sigma=sigma)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params(self) -> dict[str, float]:
    ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1)
    if ksize != 0 and ksize % 2 != 1:
        ksize = (ksize + 1) % (self.blur_limit[1] + 1)

    return {"ksize": ksize, "sigma": random.uniform(*self.sigma_limit)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str]:
    return "blur_limit", "sigma_limit"
class GlassBlur (sigma=0.7, max_delta=4, iterations=2, mode='fast', always_apply=None, p=0.5) [view source on GitHub]

Apply a glass blur effect to the input image.

This transform simulates the effect of looking through textured glass by locally shuffling pixels in the image. It creates a distorted, frosted glass-like appearance.

Parameters:

Name Type Description
sigma float

Standard deviation for the Gaussian kernel used in the process. Higher values increase the blur effect. Must be non-negative. Default: 0.7

max_delta int

Maximum distance in pixels for shuffling. Determines how far pixels can be moved. Larger values create more distortion. Must be a positive integer. Default: 4

iterations int

Number of times to apply the glass blur effect. More iterations create a stronger effect but increase computation time. Must be a positive integer. Default: 2

mode Literal["fast", "exact"]

Mode of computation. Options are:
  • "fast": Uses a faster but potentially less accurate method.
  • "exact": Uses a slower but more precise method.
Default: "fast"

p float

Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • This transform is particularly effective for creating a 'looking through glass' effect or simulating the view through a frosted window.
  • The 'fast' mode is recommended for most use cases as it provides a good balance between effect quality and computation speed.
  • Increasing 'iterations' will strengthen the effect but also increase the processing time linearly.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.GlassBlur(sigma=0.7, max_delta=4, iterations=3, mode="fast", p=1)
>>> result = transform(image=image)
>>> glass_blurred_image = result["image"]

References

  • This implementation is based on the technique described in: "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness", https://arxiv.org/abs/1903.12261
  • Original implementation: https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py


Source code in albumentations/augmentations/blur/transforms.py
Python
class GlassBlur(ImageOnlyTransform):
    """Apply a glass blur effect to the input image.

    This transform simulates the effect of looking through textured glass by locally
    shuffling pixels in the image. It creates a distorted, frosted glass-like appearance.

    Args:
        sigma (float): Standard deviation for the Gaussian kernel used in the process.
            Higher values increase the blur effect. Must be non-negative.
            Default: 0.7

        max_delta (int): Maximum distance in pixels for shuffling.
            Determines how far pixels can be moved. Larger values create more distortion.
            Must be a positive integer.
            Default: 4

        iterations (int): Number of times to apply the glass blur effect.
            More iterations create a stronger effect but increase computation time.
            Must be a positive integer.
            Default: 2

        mode (Literal["fast", "exact"]): Mode of computation. Options are:
            - "fast": Uses a faster but potentially less accurate method.
            - "exact": Uses a slower but more precise method.
            Default: "fast"

        p (float): Probability of applying the transform. Should be in the range [0, 1].
            Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - This transform is particularly effective for creating a 'looking through
          glass' effect or simulating the view through a frosted window.
        - The 'fast' mode is recommended for most use cases as it provides a good
          balance between effect quality and computation speed.
        - Increasing 'iterations' will strengthen the effect but also increase the
          processing time linearly.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.GlassBlur(sigma=0.7, max_delta=4, iterations=3, mode="fast", p=1)
        >>> result = transform(image=image)
        >>> glass_blurred_image = result["image"]

    References:
        - This implementation is based on the technique described in:
          "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness"
          https://arxiv.org/abs/1903.12261
        - Original implementation:
          https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py
    """

    class InitSchema(BaseTransformInitSchema):
        sigma: float = Field(ge=0)
        max_delta: int = Field(ge=1)
        iterations: int = Field(ge=1)
        mode: Literal["fast", "exact"]

    def __init__(
        self,
        sigma: float = 0.7,
        max_delta: int = 4,
        iterations: int = 2,
        mode: Literal["fast", "exact"] = "fast",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.sigma = sigma
        self.max_delta = max_delta
        self.iterations = iterations
        self.mode = mode

    def apply(self, img: np.ndarray, *args: Any, dxy: np.ndarray, **params: Any) -> np.ndarray:
        return fblur.glass_blur(img, self.sigma, self.max_delta, self.iterations, dxy, self.mode)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
        height, width = params["shape"][:2]

        # generate array containing all necessary values for transformations
        width_pixels = height - self.max_delta * 2
        height_pixels = width - self.max_delta * 2
        total_pixels = int(width_pixels * height_pixels)
        dxy = random_utils.randint(-self.max_delta, self.max_delta, size=(total_pixels, self.iterations, 2))

        return {"dxy": dxy}

    def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
        return "sigma", "max_delta", "iterations", "mode"
class InitSchema


Source code in albumentations/augmentations/blur/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    sigma: float = Field(ge=0)
    max_delta: int = Field(ge=1)
    iterations: int = Field(ge=1)
    mode: Literal["fast", "exact"]

apply (self, img, *args, *, dxy, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, *args: Any, dxy: np.ndarray, **params: Any) -> np.ndarray:
    return fblur.glass_blur(img, self.sigma, self.max_delta, self.iterations, dxy, self.mode)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
    height, width = params["shape"][:2]

    # generate array containing all necessary values for transformations
    width_pixels = height - self.max_delta * 2
    height_pixels = width - self.max_delta * 2
    total_pixels = int(width_pixels * height_pixels)
    dxy = random_utils.randint(-self.max_delta, self.max_delta, size=(total_pixels, self.iterations, 2))

    return {"dxy": dxy}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
    return "sigma", "max_delta", "iterations", "mode"
class MedianBlur (blur_limit=7, p=0.5, always_apply=None) [view source on GitHub]

Apply median blur to the input image.

This transform uses a median filter to blur the input image. Median filtering is particularly effective at removing salt-and-pepper noise while preserving edges, making it a popular choice for noise reduction in image processing.

Parameters:

Name Type Description
blur_limit int | tuple[int, int]

Maximum aperture linear size for blurring the input image. Must be odd and in the range [3, inf).
  • If a single int is provided, the kernel size will be randomly chosen between 3 and that value.
  • If a tuple of two ints is provided, it defines the inclusive range of possible kernel sizes.
Default: (3, 7)

p float

Probability of applying the transform. Default: 0.5

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The kernel size (aperture linear size) must always be odd and greater than 1.
  • Unlike mean blur or Gaussian blur, median blur uses the median of all pixels under the kernel area, making it more robust to outliers.
  • This transform is particularly useful for:
      • Removing salt-and-pepper noise
      • Preserving edges while smoothing images
      • Pre-processing images for edge detection algorithms
  • For color images, the median is calculated independently for each channel.
  • Larger kernel sizes result in stronger blurring effects but may also remove fine details from the image.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.MedianBlur(blur_limit=(3, 7), p=0.5)
>>> result = transform(image=image)
>>> blurred_image = result["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class MedianBlur(Blur):
    """Apply median blur to the input image.

    This transform uses a median filter to blur the input image. Median filtering is particularly
    effective at removing salt-and-pepper noise while preserving edges, making it a popular choice
    for noise reduction in image processing.

    Args:
        blur_limit (int | tuple[int, int]): Maximum aperture linear size for blurring the input image.
            Must be odd and in the range [3, inf).
            - If a single int is provided, the kernel size will be randomly chosen
              between 3 and that value.
            - If a tuple of two ints is provided, it defines the inclusive range
              of possible kernel sizes.
            Default: (3, 7)

        p (float): Probability of applying the transform. Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The kernel size (aperture linear size) must always be odd and greater than 1.
        - Unlike mean blur or Gaussian blur, median blur uses the median of all pixels under
          the kernel area, making it more robust to outliers.
        - This transform is particularly useful for:
          * Removing salt-and-pepper noise
          * Preserving edges while smoothing images
          * Pre-processing images for edge detection algorithms
        - For color images, the median is calculated independently for each channel.
        - Larger kernel sizes result in stronger blurring effects but may also remove
          fine details from the image.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.MedianBlur(blur_limit=(3, 7), p=0.5)
        >>> result = transform(image=image)
        >>> blurred_image = result["image"]

    References:
        - Median filter: https://en.wikipedia.org/wiki/Median_filter
        - OpenCV medianBlur: https://docs.opencv.org/master/d4/d86/group__imgproc__filter.html#ga564869aa33e58769b4469101aac458f9
    """

    def __init__(self, blur_limit: ScaleIntType = 7, p: float = 0.5, always_apply: bool | None = None):
        super().__init__(blur_limit=blur_limit, p=p, always_apply=always_apply)

    def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
        return fblur.median_blur(img, kernel)
__init__ (self, blur_limit=7, p=0.5, always_apply=None) special

Initialize self. See help(type(self)) for accurate signature.

Source code in albumentations/augmentations/blur/transforms.py
Python
def __init__(self, blur_limit: ScaleIntType = 7, p: float = 0.5, always_apply: bool | None = None):
    super().__init__(blur_limit=blur_limit, p=p, always_apply=always_apply)
apply (self, img, kernel, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
    return fblur.median_blur(img, kernel)
class MotionBlur (blur_limit=7, allow_shifted=True, always_apply=None, p=0.5) [view source on GitHub]

Apply motion blur to the input image using a random-sized kernel.

This transform simulates the effect of camera or object motion during image capture, creating a directional blur. It uses a line-shaped kernel with random orientation to achieve this effect.

Parameters:

Name Type Description
blur_limit int | tuple[int, int]

Maximum kernel size for blurring the input image. Should be in range [3, inf). - If a single int is provided, the kernel size will be randomly chosen between 3 and that value. - If a tuple of two ints is provided, it defines the inclusive range of possible kernel sizes. Default: (3, 7)

allow_shifted bool

If set to True, allows the motion blur kernel to be randomly shifted from the center. If False, the kernel will always be centered. Default: True

p float

Probability of applying the transform. Default: 0.5

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The blur kernel is always a straight line, simulating linear motion.
  • The angle of the motion blur is randomly chosen for each application.
  • Larger kernel sizes result in more pronounced motion blur effects.
  • When allow_shifted is True, the blur effect can appear more natural and varied, as it simulates motion that isn't perfectly centered in the frame.
  • This transform is particularly useful for:
  • Simulating camera shake or motion blur in action scenes
  • Data augmentation for object detection or tracking tasks
  • Creating more challenging inputs for image stabilization algorithms

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.MotionBlur(blur_limit=7, allow_shifted=True, p=0.5)
>>> result = transform(image=image)
>>> motion_blurred_image = result["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class MotionBlur(Blur):
    """Apply motion blur to the input image using a random-sized kernel.

    This transform simulates the effect of camera or object motion during image capture,
    creating a directional blur. It uses a line-shaped kernel with random orientation
    to achieve this effect.

    Args:
        blur_limit (int | tuple[int, int]): Maximum kernel size for blurring the input image.
            Should be in range [3, inf).
            - If a single int is provided, the kernel size will be randomly chosen
              between 3 and that value.
            - If a tuple of two ints is provided, it defines the inclusive range
              of possible kernel sizes.
            Default: (3, 7)

        allow_shifted (bool): If set to True, allows the motion blur kernel to be
            randomly shifted from the center. If False, the kernel will always be
            centered. Default: True

        p (float): Probability of applying the transform. Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The blur kernel is always a straight line, simulating linear motion.
        - The angle of the motion blur is randomly chosen for each application.
        - Larger kernel sizes result in more pronounced motion blur effects.
        - When `allow_shifted` is True, the blur effect can appear more natural and varied,
          as it simulates motion that isn't perfectly centered in the frame.
        - This transform is particularly useful for:
          * Simulating camera shake or motion blur in action scenes
          * Data augmentation for object detection or tracking tasks
          * Creating more challenging inputs for image stabilization algorithms

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.MotionBlur(blur_limit=7, allow_shifted=True, p=0.5)
        >>> result = transform(image=image)
        >>> motion_blurred_image = result["image"]

    References:
        - Motion blur: https://en.wikipedia.org/wiki/Motion_blur
        - OpenCV filter2D (used internally):
          https://docs.opencv.org/master/d4/d86/group__imgproc__filter.html#ga27c049795ce870216ddfb366086b5a04
    """

    class InitSchema(BaseTransformInitSchema):
        allow_shifted: bool
        blur_limit: ScaleIntType

        @model_validator(mode="after")
        def process_blur(self) -> Self:
            self.blur_limit = cast(Tuple[int, int], to_tuple(self.blur_limit, 3))

            if self.allow_shifted and isinstance(self.blur_limit, tuple) and any(x % 2 != 1 for x in self.blur_limit):
                raise ValueError(f"Blur limit must be odd when centered=True. Got: {self.blur_limit}")

            return self

    def __init__(
        self,
        blur_limit: ScaleIntType = 7,
        allow_shifted: bool = True,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(blur_limit=blur_limit, p=p, always_apply=always_apply)
        self.allow_shifted = allow_shifted
        self.blur_limit = cast(Tuple[int, int], blur_limit)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (*super().get_transform_init_args_names(), "allow_shifted")

    def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.convolve(img, kernel=kernel)

    def get_params(self) -> dict[str, Any]:
        ksize = random.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))
        if ksize <= TWO:
            raise ValueError(f"ksize must be > 2. Got: {ksize}")
        kernel = np.zeros((ksize, ksize), dtype=np.uint8)
        x1, x2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)
        if x1 == x2:
            y1, y2 = random.sample(range(ksize), 2)
        else:
            y1, y2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)

        def make_odd_val(v1: int, v2: int) -> tuple[int, int]:
            len_v = abs(v1 - v2) + 1
            if len_v % 2 != 1:
                if v2 > v1:
                    v2 -= 1
                else:
                    v1 -= 1
            return v1, v2

        if not self.allow_shifted:
            x1, x2 = make_odd_val(x1, x2)
            y1, y2 = make_odd_val(y1, y2)

            xc = (x1 + x2) / 2
            yc = (y1 + y2) / 2

            center = ksize / 2 - 0.5
            dx = xc - center
            dy = yc - center
            x1, x2 = (int(i - dx) for i in [x1, x2])
            y1, y2 = (int(i - dy) for i in [y1, y2])

        cv2.line(kernel, (x1, y1), (x2, y2), 1, thickness=1)

        # Normalize kernel
        return {"kernel": kernel.astype(np.float32) / np.sum(kernel)}
class InitSchema [view source on GitHub]

Source code in albumentations/augmentations/blur/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    allow_shifted: bool
    blur_limit: ScaleIntType

    @model_validator(mode="after")
    def process_blur(self) -> Self:
        self.blur_limit = cast(Tuple[int, int], to_tuple(self.blur_limit, 3))

        if self.allow_shifted and isinstance(self.blur_limit, tuple) and any(x % 2 != 1 for x in self.blur_limit):
            raise ValueError(f"Blur limit must be odd when centered=True. Got: {self.blur_limit}")

        return self
apply (self, img, kernel, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.convolve(img, kernel=kernel)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params(self) -> dict[str, Any]:
    ksize = random.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))
    if ksize <= TWO:
        raise ValueError(f"ksize must be > 2. Got: {ksize}")
    kernel = np.zeros((ksize, ksize), dtype=np.uint8)
    x1, x2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)
    if x1 == x2:
        y1, y2 = random.sample(range(ksize), 2)
    else:
        y1, y2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)

    def make_odd_val(v1: int, v2: int) -> tuple[int, int]:
        len_v = abs(v1 - v2) + 1
        if len_v % 2 != 1:
            if v2 > v1:
                v2 -= 1
            else:
                v1 -= 1
        return v1, v2

    if not self.allow_shifted:
        x1, x2 = make_odd_val(x1, x2)
        y1, y2 = make_odd_val(y1, y2)

        xc = (x1 + x2) / 2
        yc = (y1 + y2) / 2

        center = ksize / 2 - 0.5
        dx = xc - center
        dy = yc - center
        x1, x2 = (int(i - dx) for i in [x1, x2])
        y1, y2 = (int(i - dy) for i in [y1, y2])

    cv2.line(kernel, (x1, y1), (x2, y2), 1, thickness=1)

    # Normalize kernel
    return {"kernel": kernel.astype(np.float32) / np.sum(kernel)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (*super().get_transform_init_args_names(), "allow_shifted")
class ZoomBlur (max_factor=(1, 1.31), step_factor=(0.01, 0.03), always_apply=None, p=0.5) [view source on GitHub]

Apply zoom blur transform.

Parameters:

Name Type Description
max_factor (float, float) or float

Range for the maximum zoom factor. If max_factor is a single float, the range will be (1, limit). Default: (1, 1.31). All max_factor values should be larger than 1.

step_factor (float, float) or float

If a single float is provided, it is used as the step parameter for np.arange. If a tuple of floats is provided, the step is sampled from the range [step_factor[0], step_factor[1]). Default: (0.01, 0.03). All step_factor values should be positive.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

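A minimal usage sketch in the style of the other examples on this page (the parameter values are the documented defaults and purely illustrative):

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ZoomBlur(max_factor=(1, 1.31), step_factor=(0.01, 0.03), p=1.0)
>>> result = transform(image=image)
>>> zoom_blurred_image = result["image"]
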
Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/blur/transforms.py
Python
class ZoomBlur(ImageOnlyTransform):
    """Apply zoom blur transform.

    Args:
        max_factor ((float, float) or float): range for max factor for blurring.
            If max_factor is a single float, the range will be (1, limit). Default: (1, 1.31).
            All max_factor values should be larger than 1.
        step_factor ((float, float) or float): If single float will be used as step parameter for np.arange.
            If tuple of float step_factor will be in range `[step_factor[0], step_factor[1])`. Default: (0.01, 0.03).
            All step_factor values should be positive.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://arxiv.org/abs/1903.12261
    """

    class InitSchema(BaseTransformInitSchema):
        max_factor: OnePlusFloatRangeType
        step_factor: NonNegativeFloatRangeType

    def __init__(
        self,
        max_factor: ScaleFloatType = (1, 1.31),
        step_factor: ScaleFloatType = (0.01, 0.03),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.max_factor = cast(Tuple[float, float], max_factor)
        self.step_factor = cast(Tuple[float, float], step_factor)

    def apply(self, img: np.ndarray, zoom_factors: np.ndarray, **params: Any) -> np.ndarray:
        return fblur.zoom_blur(img, zoom_factors)

    def get_params(self) -> dict[str, Any]:
        step_factor = random.uniform(*self.step_factor)
        max_factor = max(1 + step_factor, random.uniform(*self.max_factor))
        return {"zoom_factors": np.arange(1.0, max_factor, step_factor)}

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("max_factor", "step_factor")
class InitSchema

Source code in albumentations/augmentations/blur/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    max_factor: OnePlusFloatRangeType
    step_factor: NonNegativeFloatRangeType
apply (self, img, zoom_factors, **params)

Apply transform on image.

Source code in albumentations/augmentations/blur/transforms.py
Python
def apply(self, img: np.ndarray, zoom_factors: np.ndarray, **params: Any) -> np.ndarray:
    return fblur.zoom_blur(img, zoom_factors)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_params(self) -> dict[str, Any]:
    step_factor = random.uniform(*self.step_factor)
    max_factor = max(1 + step_factor, random.uniform(*self.max_factor))
    return {"zoom_factors": np.arange(1.0, max_factor, step_factor)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/blur/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("max_factor", "step_factor")

crops special

functional

def crop_and_pad_keypoints (keypoints, crop_params=None, pad_params=None, image_shape=(0, 0), result_shape=(0, 0), keep_size=False) [view source on GitHub]

Crop and pad multiple keypoints simultaneously.

Parameters:

Name Type Description
keypoints np.ndarray

Array of keypoints with shape (N, 4+) where each row is (x, y, angle, scale, ...).

crop_params Sequence[int]

Crop parameters [crop_x1, crop_y1, ...].

pad_params Sequence[int]

Pad parameters [top, bottom, left, right].

image_shape Tuple[int, int]

Original image shape (rows, cols).

result_shape Tuple[int, int]

Result image shape (rows, cols).

keep_size bool

Whether to keep the original size.

Returns:

Type Description
np.ndarray

Array of transformed keypoints with the same shape as input.

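A minimal sketch of calling this helper directly; the module alias fcrops and the keypoint/crop values are assumptions chosen for illustration:

Python
>>> import numpy as np
>>> from albumentations.augmentations.crops import functional as fcrops
>>> keypoints = np.array([[20.0, 30.0, 0.0, 1.0]])  # one keypoint: (x, y, angle, scale)
>>> # Crop an 80x80 window starting at (10, 10), then pad 5 px on top and left:
>>> shifted = fcrops.crop_and_pad_keypoints(
...     keypoints,
...     crop_params=(10, 10, 90, 90),
...     pad_params=(5, 0, 5, 0),  # (top, bottom, left, right)
...     image_shape=(100, 100),
...     result_shape=(85, 85),
...     keep_size=False,
... )
>>> # The keypoint is shifted by the crop origin and the padding: (20 - 10 + 5, 30 - 10 + 5) = (15, 25)
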
Source code in albumentations/augmentations/crops/functional.py
Python
@handle_empty_array
def crop_and_pad_keypoints(
    keypoints: np.ndarray,
    crop_params: tuple[int, int, int, int] | None = None,
    pad_params: tuple[int, int, int, int] | None = None,
    image_shape: tuple[int, int] = (0, 0),
    result_shape: tuple[int, int] = (0, 0),
    keep_size: bool = False,
) -> np.ndarray:
    """Crop and pad multiple keypoints simultaneously.

    Args:
        keypoints (np.ndarray): Array of keypoints with shape (N, 4+) where each row is (x, y, angle, scale, ...).
        crop_params (Sequence[int], optional): Crop parameters [crop_x1, crop_y1, ...].
        pad_params (Sequence[int], optional): Pad parameters [top, bottom, left, right].
        image_shape (Tuple[int, int]): Original image shape (rows, cols).
        result_shape (Tuple[int, int]): Result image shape (rows, cols).
        keep_size (bool): Whether to keep the original size.

    Returns:
        np.ndarray: Array of transformed keypoints with the same shape as input.
    """
    transformed_keypoints = keypoints.copy()

    if crop_params is not None:
        crop_x1, crop_y1 = crop_params[:2]
        transformed_keypoints[:, 0] -= crop_x1
        transformed_keypoints[:, 1] -= crop_y1

    if pad_params is not None:
        top, _, left, _ = pad_params
        transformed_keypoints[:, 0] += left
        transformed_keypoints[:, 1] += top

    rows, cols = image_shape[:2]
    result_rows, result_cols = result_shape[:2]

    if keep_size and (result_cols != cols or result_rows != rows):
        scale_x = cols / result_cols
        scale_y = rows / result_rows
        return fgeometric.keypoints_scale(transformed_keypoints, scale_x, scale_y)

    return transformed_keypoints
def crop_bboxes_by_coords (bboxes, crop_coords, image_shape) [view source on GitHub]

Crop bounding boxes based on given crop coordinates.

This function adjusts bounding boxes to fit within a cropped image.

Parameters:

Name Type Description
bboxes np.ndarray

Array of bounding boxes with shape (N, 4+) where each row is [x_min, y_min, x_max, y_max, ...]. The bounding box coordinates should be normalized (in the range [0, 1]).

crop_coords tuple[int, int, int, int]

Crop coordinates (x_min, y_min, x_max, y_max) in absolute pixel values.

image_shape tuple[int, int]

Original image shape (height, width).

Returns:

Type Description
np.ndarray

Array of cropped bounding boxes, normalized to the new crop size.

Note

Bounding boxes that fall completely outside the crop area will be removed. Bounding boxes that partially overlap with the crop area will be adjusted to fit within it.

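A minimal sketch of calling this helper directly; the fcrops alias and the box/crop values are assumptions chosen for illustration:

Python
>>> import numpy as np
>>> from albumentations.augmentations.crops import functional as fcrops
>>> bboxes = np.array([[0.25, 0.25, 0.75, 0.75]])  # normalized (x_min, y_min, x_max, y_max)
>>> cropped = fcrops.crop_bboxes_by_coords(bboxes, crop_coords=(25, 25, 75, 75), image_shape=(100, 100))
>>> # On a 100x100 image the box covers exactly the 50x50 crop, so it becomes [0., 0., 1., 1.]
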
Source code in albumentations/augmentations/crops/functional.py
Python
def crop_bboxes_by_coords(
    bboxes: np.ndarray,
    crop_coords: tuple[int, int, int, int],
    image_shape: tuple[int, int],
) -> np.ndarray:
    """Crop bounding boxes based on given crop coordinates.

    This function adjusts bounding boxes to fit within a cropped image.

    Args:
        bboxes (np.ndarray): Array of bounding boxes with shape (N, 4+) where each row is
                             [x_min, y_min, x_max, y_max, ...]. The bounding box coordinates
                             should be normalized (in the range [0, 1]).
        crop_coords (tuple[int, int, int, int]): Crop coordinates (x_min, y_min, x_max, y_max)
                                                 in absolute pixel values.
        image_shape (tuple[int, int]): Original image shape (height, width).

    Returns:
        np.ndarray: Array of cropped bounding boxes, normalized to the new crop size.

    Note:
        Bounding boxes that fall completely outside the crop area will be removed.
        Bounding boxes that partially overlap with the crop area will be adjusted to fit within it.
    """
    if not bboxes.size:
        return bboxes

    cropped_bboxes = denormalize_bboxes(bboxes.copy().astype(np.float32), image_shape)

    x_min, y_min = crop_coords[:2]

    # Subtract crop coordinates
    cropped_bboxes[:, [0, 2]] -= x_min
    cropped_bboxes[:, [1, 3]] -= y_min

    # Calculate crop shape
    crop_height = crop_coords[3] - crop_coords[1]
    crop_width = crop_coords[2] - crop_coords[0]
    crop_shape = (crop_height, crop_width)

    # Normalize the cropped bboxes
    return normalize_bboxes(cropped_bboxes, crop_shape)
def crop_keypoints_by_coords (keypoints, crop_coords) [view source on GitHub]

Crop keypoints using the provided coordinates of the top-left and bottom-right corners of the crop, in pixels.

Parameters:

Name Type Description
keypoints np.ndarray

An array of keypoints with shape (N, 4+) where each row is (x, y, angle, scale, ...).

crop_coords tuple

Crop box coords (x1, y1, x2, y2).

Returns:

Type Description
np.ndarray

An array of cropped keypoints with the same shape as the input.

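A minimal sketch of calling this helper directly; the fcrops alias and the values are assumptions chosen for illustration:

Python
>>> import numpy as np
>>> from albumentations.augmentations.crops import functional as fcrops
>>> keypoints = np.array([[50.0, 40.0, 0.0, 1.0]])
>>> cropped = fcrops.crop_keypoints_by_coords(keypoints, crop_coords=(30, 20, 90, 80))
>>> # x and y are shifted by the crop origin: (50 - 30, 40 - 20) -> (20, 20)
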
Source code in albumentations/augmentations/crops/functional.py
Python
@handle_empty_array
def crop_keypoints_by_coords(
    keypoints: np.ndarray,
    crop_coords: tuple[int, int, int, int],
) -> np.ndarray:
    """Crop keypoints using the provided coordinates of bottom-left and top-right corners in pixels.

    Args:
        keypoints (np.ndarray): An array of keypoints with shape (N, 4+) where each row is (x, y, angle, scale, ...).
        crop_coords (tuple): Crop box coords (x1, y1, x2, y2).

    Returns:
        np.ndarray: An array of cropped keypoints with the same shape as the input.
    """
    x1, y1 = crop_coords[:2]

    cropped_keypoints = keypoints.copy()
    cropped_keypoints[:, 0] -= x1  # Adjust x coordinates
    cropped_keypoints[:, 1] -= y1  # Adjust y coordinates

    return cropped_keypoints

transforms

class BBoxSafeRandomCrop (erosion_rate=0.0, p=1.0, always_apply=None) [view source on GitHub]

Crop a random part of the input without loss of bboxes.

Parameters:

Name Type Description
erosion_rate float

erosion rate applied on input image height before crop.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

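A minimal usage sketch; the bbox format, labels, and parameter values are illustrative assumptions, and the transform is placed inside A.Compose with bbox_params so that bounding boxes are processed:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Compose(
...     [A.BBoxSafeRandomCrop(erosion_rate=0.2, p=1.0)],
...     bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
... )
>>> result = transform(image=image, bboxes=[(10, 10, 60, 60)], labels=[1])
>>> cropped_image, cropped_bboxes = result["image"], result["bboxes"]
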
Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class BBoxSafeRandomCrop(_BaseCrop):
    """Crop a random part of the input without loss of bboxes.

    Args:
        erosion_rate: erosion rate applied on input image height before crop.
        p: probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        erosion_rate: float = Field(
            default=0.0,
            ge=0.0,
            le=1.0,
            description="Erosion rate applied on input image height before crop.",
        )
        p: ProbabilityType = 1

    def __init__(self, erosion_rate: float = 0.0, p: float = 1.0, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.erosion_rate = erosion_rate

    def _get_coords_no_bbox(self, image_shape: tuple[int, int]) -> tuple[int, int, int, int]:
        image_height, image_width = image_shape

        erosive_h = int(image_height * (1.0 - self.erosion_rate))
        crop_height = image_height if erosive_h >= image_height else random.randint(erosive_h, image_height)

        crop_width = int(crop_height * image_width / image_height)

        h_start = random.random()
        w_start = random.random()

        crop_shape = (crop_height, crop_width)

        return fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        image_shape = params["shape"][:2]

        if len(data["bboxes"]) == 0:  # less likely, this class is for use with bboxes.
            crop_coords = self._get_coords_no_bbox(image_shape)
            return {"crop_coords": crop_coords}

        bbox_union = union_of_bboxes(bboxes=data["bboxes"], erosion_rate=self.erosion_rate)

        if bbox_union is None:
            crop_coords = self._get_coords_no_bbox(image_shape)
            return {"crop_coords": crop_coords}

        x_min, y_min, x_max, y_max = bbox_union

        x_min = np.clip(x_min, 0, 1)
        y_min = np.clip(y_min, 0, 1)
        x_max = np.clip(x_max, x_min, 1)
        y_max = np.clip(y_max, y_min, 1)

        image_height, image_width = image_shape

        crop_x_min = int(x_min * random.random() * image_width)
        crop_y_min = int(y_min * random.random() * image_height)

        bbox_xmax = x_max + (1 - x_max) * random.random()
        bbox_ymax = y_max + (1 - y_max) * random.random()
        crop_x_max = int(bbox_xmax * image_width)
        crop_y_max = int(bbox_ymax * image_height)

        return {"crop_coords": (crop_x_min, crop_y_min, crop_x_max, crop_y_max)}

    @property
    def targets_as_params(self) -> list[str]:
        return ["bboxes"]

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("erosion_rate",)
targets_as_params: list[str] property readonly

Targets used to get params dependent on targets. This is used to check input has all required targets.

class InitSchema

Source code in albumentations/augmentations/crops/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    erosion_rate: float = Field(
        default=0.0,
        ge=0.0,
        le=1.0,
        description="Erosion rate applied on input image height before crop.",
    )
    p: ProbabilityType = 1
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    image_shape = params["shape"][:2]

    if len(data["bboxes"]) == 0:  # less likely, this class is for use with bboxes.
        crop_coords = self._get_coords_no_bbox(image_shape)
        return {"crop_coords": crop_coords}

    bbox_union = union_of_bboxes(bboxes=data["bboxes"], erosion_rate=self.erosion_rate)

    if bbox_union is None:
        crop_coords = self._get_coords_no_bbox(image_shape)
        return {"crop_coords": crop_coords}

    x_min, y_min, x_max, y_max = bbox_union

    x_min = np.clip(x_min, 0, 1)
    y_min = np.clip(y_min, 0, 1)
    x_max = np.clip(x_max, x_min, 1)
    y_max = np.clip(y_max, y_min, 1)

    image_height, image_width = image_shape

    crop_x_min = int(x_min * random.random() * image_width)
    crop_y_min = int(y_min * random.random() * image_height)

    bbox_xmax = x_max + (1 - x_max) * random.random()
    bbox_ymax = y_max + (1 - y_max) * random.random()
    crop_x_max = int(bbox_xmax * image_width)
    crop_y_max = int(bbox_ymax * image_height)

    return {"crop_coords": (crop_x_min, crop_y_min, crop_x_max, crop_y_max)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("erosion_rate",)
class BaseRandomSizedCropInitSchema [view source on GitHub]

Source code in albumentations/augmentations/crops/transforms.py
Python
class BaseRandomSizedCropInitSchema(BaseTransformInitSchema):
    size: tuple[int, int]

    @field_validator("size")
    @classmethod
    def check_size(cls, value: tuple[int, int]) -> tuple[int, int]:
        if any(x <= 0 for x in value):
            raise ValueError("All elements of 'size' must be positive integers.")
        return value
class CenterCrop (height, width, p=1.0, always_apply=None) [view source on GitHub]

Crop the central part of the input.

Parameters:

Name Type Description
height int

height of the crop.

width int

width of the crop.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

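A minimal usage sketch (the crop size and image shape are illustrative):

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.CenterCrop(height=64, width=64, p=1.0)
>>> cropped_image = transform(image=image)["image"]
>>> # cropped_image.shape == (64, 64, 3)
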
Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class CenterCrop(_BaseCrop):
    """Crop the central part of the input.

    Args:
        height: height of the crop.
        width: width of the crop.
        p: probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    class InitSchema(CropInitSchema):
        pass

    def __init__(self, height: int, width: int, p: float = 1.0, always_apply: bool | None = None):
        super().__init__(p, always_apply)
        self.height = height
        self.width = width

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "height", "width"

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        image_shape = params["shape"][:2]
        crop_coords = fcrops.get_center_crop_coords(image_shape, (self.height, self.width))

        return {"crop_coords": crop_coords}
class InitSchema

Source code in albumentations/augmentations/crops/transforms.py
Python
class InitSchema(CropInitSchema):
    pass
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    image_shape = params["shape"][:2]
    crop_coords = fcrops.get_center_crop_coords(image_shape, (self.height, self.width))

    return {"crop_coords": crop_coords}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "height", "width"
class Crop (x_min=0, y_min=0, x_max=1024, y_max=1024, always_apply=None, p=1.0) [view source on GitHub]

Crop region from image.

Parameters:

Name Type Description
x_min int

Minimum upper left x coordinate.

y_min int

Minimum upper left y coordinate.

x_max int

Maximum lower right x coordinate.

y_max int

Maximum lower right y coordinate.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

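A minimal usage sketch (the coordinates are illustrative and must lie inside the input image):

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Crop(x_min=10, y_min=20, x_max=90, y_max=80, p=1.0)
>>> cropped_image = transform(image=image)["image"]
>>> # cropped_image.shape == (60, 80, 3): rows = y_max - y_min, cols = x_max - x_min
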
Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class Crop(_BaseCrop):
    """Crop region from image.

    Args:
        x_min: Minimum upper left x coordinate.
        y_min: Minimum upper left y coordinate.
        x_max: Maximum lower right x coordinate.
        y_max: Maximum lower right y coordinate.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    class InitSchema(BaseTransformInitSchema):
        x_min: Annotated[int, Field(ge=0, description="Minimum upper left x coordinate")]
        y_min: Annotated[int, Field(ge=0, description="Minimum upper left y coordinate")]
        x_max: Annotated[int, Field(gt=0, description="Maximum lower right x coordinate")]
        y_max: Annotated[int, Field(gt=0, description="Maximum lower right y coordinate")]
        p: ProbabilityType = 1

        @model_validator(mode="after")
        def validate_coordinates(self) -> Self:
            if not self.x_min < self.x_max:
                msg = "x_max must be greater than x_min"
                raise ValueError(msg)
            if not self.y_min < self.y_max:
                msg = "y_max must be greater than y_min"
                raise ValueError(msg)
            return self

    def __init__(
        self,
        x_min: int = 0,
        y_min: int = 0,
        x_max: int = 1024,
        y_max: int = 1024,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.x_min = x_min
        self.y_min = y_min
        self.x_max = x_max
        self.y_max = y_max

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "x_min", "y_min", "x_max", "y_max"

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        return {"crop_coords": (self.x_min, self.y_min, self.x_max, self.y_max)}
class InitSchema [view source on GitHub]

Source code in albumentations/augmentations/crops/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    x_min: Annotated[int, Field(ge=0, description="Minimum upper left x coordinate")]
    y_min: Annotated[int, Field(ge=0, description="Minimum upper left y coordinate")]
    x_max: Annotated[int, Field(gt=0, description="Maximum lower right x coordinate")]
    y_max: Annotated[int, Field(gt=0, description="Maximum lower right y coordinate")]
    p: ProbabilityType = 1

    @model_validator(mode="after")
    def validate_coordinates(self) -> Self:
        if not self.x_min < self.x_max:
            msg = "x_max must be greater than x_min"
            raise ValueError(msg)
        if not self.y_min < self.y_max:
            msg = "y_max must be greater than y_min"
            raise ValueError(msg)
        return self
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    return {"crop_coords": (self.x_min, self.y_min, self.x_max, self.y_max)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "x_min", "y_min", "x_max", "y_max"
class CropAndPad (px=None, percent=None, pad_mode=0, pad_cval=0, pad_cval_mask=0, keep_size=True, sample_independently=True, interpolation=1, always_apply=None, p=1.0) [view source on GitHub]

Crop and pad images by pixel amounts or fractions of image sizes. Cropping removes pixels at the sides (i.e., extracts a subimage from a given full image). Padding adds pixels to the sides (e.g., black pixels). This transformation will never crop images below a height or width of 1.

Note

This transformation automatically resizes images back to their original size. To deactivate this, add the parameter keep_size=False.

Parameters:

Name Type Description
px int, tuple[int, int], tuple[int, int, int, int], tuple[Union[int, tuple[int, int], list[int]], Union[int, tuple[int, int], list[int]], Union[int, tuple[int, int], list[int]], Union[int, tuple[int, int], list[int]]]

The number of pixels to crop (negative values) or pad (positive values) on each side of the image. Either this or the parameter percent may be set, not both at the same time.

* If `None`, then pixel-based cropping/padding will not be used.
* If `int`, then that exact number of pixels will always be cropped/padded.
* If a `tuple` of two `int`s with values `a` and `b`, then each side will be cropped/padded by a
    random amount sampled uniformly per image and side from the interval `[a, b]`.
    If `sample_independently` is set to `False`, only one value will be sampled per
        image and used for all sides.
* If a `tuple` of four entries, then the entries represent top, right, bottom, and left.
    Each entry may be:
    - A single `int` (always crop/pad by exactly that value).
    - A `tuple` of two `int`s `a` and `b` (crop/pad by an amount within `[a, b]`).
    - A `list` of `int`s (crop/pad by a random value that is contained in the `list`).
percent float, tuple[float, float], tuple[float, float, float, float], tuple[Union[float, tuple[float, float], list[float]], Union[float, tuple[float, float], list[float]], Union[float, tuple[float, float], list[float]], Union[float, tuple[float, float], list[float]]]

The number of pixels to crop (negative values) or pad (positive values) on each side of the image given as a fraction of the image height/width. E.g. if this is set to -0.1, the transformation will always crop away 10% of the image's height at both the top and the bottom (both 10% each), as well as 10% of the width at the right and left. Expected value range is (-1.0, inf). Either this or the parameter px may be set, not both at the same time.

* If `None`, then fraction-based cropping/padding will not be used.
* If `float`, then that fraction will always be cropped/padded.
* If a `tuple` of two `float`s with values `a` and `b`, then each side will be cropped/padded by a
random fraction sampled uniformly per image and side from the interval `[a, b]`.
If `sample_independently` is set to `False`, only one value will be sampled per image and used
for all sides.
* If a `tuple` of four entries, then the entries represent top, right, bottom, and left.
    Each entry may be:
    - A single `float` (always crop/pad by exactly that percent value).
    - A `tuple` of two `float`s `a` and `b` (crop/pad by a fraction from `[a, b]`).
    - A `list` of `float`s (crop/pad by a random value that is contained in the `list`).
pad_mode int

OpenCV border mode.

pad_cval Union[int, float, tuple[Union[int, float], Union[int, float]], list[Union[int, float]]]

The constant value to use if the pad mode is `BORDER_CONSTANT`.

* If `number`, then that value will be used.
* If a `tuple` of two numbers and at least one of them is a `float`, then a random number will be uniformly sampled per image from the continuous interval `[a, b]` and used as the value. If both numbers are `int`s, the interval is discrete.
* If a `list` of numbers, then a random value will be chosen from the elements of the `list` and used as the value.

pad_cval_mask Union[int, float, tuple[Union[int, float], Union[int, float]], list[Union[int, float]]]

Same as pad_cval but only for masks.

keep_size bool

After cropping and padding, the resulting image will usually have a different height/width compared to the original input image. If this parameter is set to True, then the cropped/padded image will be resized to the input image's size, i.e., the output shape is always identical to the input shape.

sample_independently bool

If False and the values for px/percent result in exactly one probability distribution for all image sides, only one single value will be sampled from that probability distribution and used for all sides. I.e., the crop/pad amount then is the same for all sides. If True, four values will be sampled independently, one per side.

interpolation int

OpenCV flag that is used to specify the interpolation algorithm for images. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

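A minimal usage sketch (the percent value is illustrative; keep_size=True resizes the result back to the input shape):

Python
>>> import cv2
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> # Pad 10% on every side with black pixels, then resize back to 100x100
>>> transform = A.CropAndPad(percent=0.1, pad_mode=cv2.BORDER_CONSTANT, pad_cval=0, keep_size=True, p=1.0)
>>> padded_image = transform(image=image)["image"]
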
Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class CropAndPad(DualTransform):
    """Crop and pad images by pixel amounts or fractions of image sizes.
    Cropping removes pixels at the sides (i.e., extracts a subimage from a given full image).
    Padding adds pixels to the sides (e.g., black pixels).
    This transformation will never crop images below a height or width of 1.

    Note:
        This transformation automatically resizes images back to their original size. To deactivate this, add the
        parameter `keep_size=False`.

    Args:
        px (int,
            tuple[int, int],
            tuple[int, int, int, int],
            tuple[Union[int, tuple[int, int], list[int]],
                  Union[int, tuple[int, int], list[int]],
                  Union[int, tuple[int, int], list[int]],
                  Union[int, tuple[int, int], list[int]]]):
            The number of pixels to crop (negative values) or pad (positive values) on each side of the image.
                Either this or the parameter `percent` may be set, not both at the same time.

                * If `None`, then pixel-based cropping/padding will not be used.
                * If `int`, then that exact number of pixels will always be cropped/padded.
                * If a `tuple` of two `int`s with values `a` and `b`, then each side will be cropped/padded by a
                    random amount sampled uniformly per image and side from the interval `[a, b]`.
                    If `sample_independently` is set to `False`, only one value will be sampled per
                        image and used for all sides.
                * If a `tuple` of four entries, then the entries represent top, right, bottom, and left.
                    Each entry may be:
                    - A single `int` (always crop/pad by exactly that value).
                    - A `tuple` of two `int`s `a` and `b` (crop/pad by an amount within `[a, b]`).
                    - A `list` of `int`s (crop/pad by a random value that is contained in the `list`).

        percent (float,
                 tuple[float, float],
                 tuple[float, float, float, float],
                 tuple[Union[float, tuple[float, float], list[float]],
                       Union[float, tuple[float, float], list[float]],
                       Union[float, tuple[float, float], list[float]],
                       Union[float, tuple[float, float], list[float]]]):
            The number of pixels to crop (negative values) or pad (positive values) on each side of the image given
                as a *fraction* of the image height/width. E.g. if this is set to `-0.1`, the transformation will
                always crop away `10%` of the image's height at both the top and the bottom (both `10%` each),
                as well as `10%` of the width at the right and left. Expected value range is `(-1.0, inf)`.
                Either this or the parameter `px` may be set, not both at the same time.

                * If `None`, then fraction-based cropping/padding will not be used.
                * If `float`, then that fraction will always be cropped/padded.
                * If a `tuple` of two `float`s with values `a` and `b`, then each side will be cropped/padded by a
                random fraction sampled uniformly per image and side from the interval `[a, b]`.
                If `sample_independently` is set to `False`, only one value will be sampled per image and used
                for all sides.
                * If a `tuple` of four entries, then the entries represent top, right, bottom, and left.
                    Each entry may be:
                    - A single `float` (always crop/pad by exactly that percent value).
                    - A `tuple` of two `float`s `a` and `b` (crop/pad by a fraction from `[a, b]`).
                    - A `list` of `float`s (crop/pad by a random value that is contained in the `list`).

        pad_mode (int): OpenCV border mode.
        pad_cval (Union[int, float, tuple[Union[int, float], Union[int, float]], list[Union[int, float]]]):
            The constant value to use if the pad mode is `BORDER_CONSTANT`.
                * If `number`, then that value will be used.
                * If a `tuple` of two numbers and at least one of them is a `float`, then a random number
                    will be uniformly sampled per image from the continuous interval `[a, b]` and used as the value.
                    If both numbers are `int`s, the interval is discrete.
                * If a `list` of numbers, then a random value will be chosen from the elements of the `list` and
                    used as the value.

        pad_cval_mask (Union[int, float, tuple[Union[int, float], Union[int, float]], list[Union[int, float]]]):
            Same as `pad_cval` but only for masks.

        keep_size (bool):
            After cropping and padding, the resulting image will usually have a different height/width compared to
            the original input image. If this parameter is set to `True`, then the cropped/padded image will be
            resized to the input image's size, i.e., the output shape is always identical to the input shape.

        sample_independently (bool):
            If `False` and the values for `px`/`percent` result in exactly one probability distribution for all
            image sides, only one single value will be sampled from that probability distribution and used for
            all sides. I.e., the crop/pad amount then is the same for all sides. If `True`, four values
            will be sampled independently, one per side.

        interpolation (int):
            OpenCV flag that is used to specify the interpolation algorithm for images. Should be one of:
            `cv2.INTER_NEAREST`, `cv2.INTER_LINEAR`, `cv2.INTER_CUBIC`, `cv2.INTER_AREA`, `cv2.INTER_LANCZOS4`.
            Default: `cv2.INTER_LINEAR`.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        px: PxType | None = Field(
            default=None,
            description="Number of pixels to crop (negative) or pad (positive).",
        )
        percent: PercentType | None = Field(
            default=None,
            description="Fraction of image size to crop (negative) or pad (positive).",
        )
        pad_mode: BorderModeType = cv2.BORDER_CONSTANT
        pad_cval: ScalarType | tuple[ScalarType, ScalarType] | list[ScalarType] = Field(
            default=0,
            description="Padding value if pad_mode is BORDER_CONSTANT.",
        )
        pad_cval_mask: ScalarType | tuple[ScalarType, ScalarType] | list[ScalarType] = Field(
            default=0,
            description="Padding value for masks if pad_mode is BORDER_CONSTANT.",
        )
        keep_size: bool = Field(
            default=True,
            description="Whether to resize the image back to the original size after cropping and padding.",
        )
        sample_independently: bool = Field(
            default=True,
            description="Whether to sample the crop/pad size independently for each side.",
        )
        interpolation: InterpolationType = cv2.INTER_LINEAR
        p: ProbabilityType = 1

        @model_validator(mode="after")
        def check_px_percent(self) -> Self:
            if self.px is None and self.percent is None:
                msg = "Both px and percent parameters cannot be None simultaneously."
                raise ValueError(msg)
            if self.px is not None and self.percent is not None:
                msg = "Only px or percent may be set!"
                raise ValueError(msg)
            return self

    def __init__(
        self,
        px: int | list[int] | None = None,
        percent: float | list[float] | None = None,
        pad_mode: int = cv2.BORDER_CONSTANT,
        pad_cval: ScalarType | tuple[ScalarType, ScalarType] | list[ScalarType] = 0,
        pad_cval_mask: ScalarType | tuple[ScalarType, ScalarType] | list[ScalarType] = 0,
        keep_size: bool = True,
        sample_independently: bool = True,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.px = px
        self.percent = percent

        self.pad_mode = pad_mode
        self.pad_cval = pad_cval
        self.pad_cval_mask = pad_cval_mask

        self.keep_size = keep_size
        self.sample_independently = sample_independently

        self.interpolation = interpolation

    def apply(
        self,
        img: np.ndarray,
        crop_params: Sequence[int],
        pad_params: Sequence[int],
        pad_value: ColorType,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_and_pad(
            img,
            crop_params,
            pad_params,
            pad_value,
            params["shape"][:2],
            interpolation,
            self.pad_mode,
            self.keep_size,
        )

    def apply_to_mask(
        self,
        mask: np.ndarray,
        crop_params: Sequence[int],
        pad_params: Sequence[int],
        pad_value_mask: float,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_and_pad(
            mask,
            crop_params,
            pad_params,
            pad_value_mask,
            params["shape"][:2],
            interpolation,
            self.pad_mode,
            self.keep_size,
        )

    def apply_to_bboxes(
        self,
        bboxes: np.ndarray,
        crop_params: tuple[int, int, int, int],
        pad_params: tuple[int, int, int, int],
        result_shape: tuple[int, int],
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_and_pad_bboxes(bboxes, crop_params, pad_params, params["shape"][:2], result_shape)

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        crop_params: tuple[int, int, int, int],
        pad_params: tuple[int, int, int, int],
        result_shape: tuple[int, int],
        **params: Any,
    ) -> np.ndarray:
        return fcrops.crop_and_pad_keypoints(
            keypoints,
            crop_params,
            pad_params,
            params["shape"][:2],
            result_shape,
            self.keep_size,
        )

    @staticmethod
    def __prevent_zero(val1: int, val2: int, max_val: int) -> tuple[int, int]:
        regain = abs(max_val) + 1
        regain1 = regain // 2
        regain2 = regain // 2
        if regain1 + regain2 < regain:
            regain1 += 1

        if regain1 > val1:
            diff = regain1 - val1
            regain1 = val1
            regain2 += diff
        elif regain2 > val2:
            diff = regain2 - val2
            regain2 = val2
            regain1 += diff

        return val1 - regain1, val2 - regain2

    @staticmethod
    def _prevent_zero(crop_params: list[int], height: int, width: int) -> list[int]:
        top, right, bottom, left = crop_params

        remaining_height = height - (top + bottom)
        remaining_width = width - (left + right)

        if remaining_height < 1:
            top, bottom = CropAndPad.__prevent_zero(top, bottom, height)
        if remaining_width < 1:
            left, right = CropAndPad.__prevent_zero(left, right, width)

        return [max(top, 0), max(right, 0), max(bottom, 0), max(left, 0)]

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        if self.px is not None:
            new_params = self._get_px_params()
        else:
            percent_params = self._get_percent_params()
            new_params = [
                int(percent_params[0] * height),
                int(percent_params[1] * width),
                int(percent_params[2] * height),
                int(percent_params[3] * width),
            ]

        pad_params = [max(i, 0) for i in new_params]

        crop_params = self._prevent_zero([-min(i, 0) for i in new_params], height, width)

        top, right, bottom, left = crop_params
        crop_params = [left, top, width - right, height - bottom]
        result_rows = crop_params[3] - crop_params[1]
        result_cols = crop_params[2] - crop_params[0]
        if result_cols == width and result_rows == height:
            crop_params = []

        top, right, bottom, left = pad_params
        pad_params = [top, bottom, left, right]
        if any(pad_params):
            result_rows += top + bottom
            result_cols += left + right
        else:
            pad_params = []

        return {
            "crop_params": crop_params or None,
            "pad_params": pad_params or None,
            "pad_value": None if pad_params is None else self._get_pad_value(self.pad_cval),
            "pad_value_mask": None if pad_params is None else self._get_pad_value(self.pad_cval_mask),
            "result_shape": (result_rows, result_cols),
        }

    def _get_px_params(self) -> list[int]:
        if self.px is None:
            msg = "px is not set"
            raise ValueError(msg)

        if isinstance(self.px, int):
            params = [self.px] * 4
        elif len(self.px) == PAIR:
            if self.sample_independently:
                params = [random.randrange(*self.px) for _ in range(4)]
            else:
                px = random.randrange(*self.px)
                params = [px] * 4
        elif isinstance(self.px[0], int):
            params = self.px
        elif len(self.px[0]) == PAIR:
            params = [random.randrange(*i) for i in self.px]
        else:
            params = [random.choice(i) for i in self.px]

        return params

    def _get_percent_params(self) -> list[float]:
        if self.percent is None:
            msg = "percent is not set"
            raise ValueError(msg)

        if isinstance(self.percent, float):
            params = [self.percent] * 4
        elif len(self.percent) == PAIR:
            if self.sample_independently:
                params = [random.uniform(*self.percent) for _ in range(4)]
            else:
                px = random.uniform(*self.percent)
                params = [px] * 4
        elif isinstance(self.percent[0], (int, float)):
            params = self.percent
        elif len(self.percent[0]) == PAIR:
            params = [random.uniform(*i) for i in self.percent]
        else:
            params = [random.choice(i) for i in self.percent]

        return params  # params = [top, right, bottom, left]

    @staticmethod
    def _get_pad_value(
        pad_value: ScalarType | tuple[ScalarType, ScalarType] | list[ScalarType],
    ) -> ScalarType:
        if isinstance(pad_value, (int, float)):
            return pad_value

        if len(pad_value) == PAIR:
            a, b = pad_value
            if isinstance(a, int) and isinstance(b, int):
                return random.randint(a, b)

            return random.uniform(a, b)

        return random.choice(pad_value)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "px",
            "percent",
            "pad_mode",
            "pad_cval",
            "pad_cval_mask",
            "keep_size",
            "sample_independently",
            "interpolation",
        )
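
A minimal usage sketch for CropAndPad (the image size and the `px`/`percent` values are illustrative; negative values crop, positive values pad, and either `px` or `percent` may be set, but not both):

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

# Pad every side by 10 pixels; keep_size=False keeps the enlarged canvas.
pad = A.Compose([A.CropAndPad(px=10, keep_size=False, p=1.0)])
padded = pad(image=image)["image"]    # expected shape: (120, 120, 3)

# Crop 10% from every side; keep_size=True resizes back to the input size.
crop = A.Compose([A.CropAndPad(percent=-0.1, keep_size=True, p=1.0)])
cropped = crop(image=image)["image"]  # expected shape: (100, 100, 3)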
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/crops/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    px: PxType | None = Field(
        default=None,
        description="Number of pixels to crop (negative) or pad (positive).",
    )
    percent: PercentType | None = Field(
        default=None,
        description="Fraction of image size to crop (negative) or pad (positive).",
    )
    pad_mode: BorderModeType = cv2.BORDER_CONSTANT
    pad_cval: ScalarType | tuple[ScalarType, ScalarType] | list[ScalarType] = Field(
        default=0,
        description="Padding value if pad_mode is BORDER_CONSTANT.",
    )
    pad_cval_mask: ScalarType | tuple[ScalarType, ScalarType] | list[ScalarType] = Field(
        default=0,
        description="Padding value for masks if pad_mode is BORDER_CONSTANT.",
    )
    keep_size: bool = Field(
        default=True,
        description="Whether to resize the image back to the original size after cropping and padding.",
    )
    sample_independently: bool = Field(
        default=True,
        description="Whether to sample the crop/pad size independently for each side.",
    )
    interpolation: InterpolationType = cv2.INTER_LINEAR
    p: ProbabilityType = 1

    @model_validator(mode="after")
    def check_px_percent(self) -> Self:
        if self.px is None and self.percent is None:
            msg = "Both px and percent parameters cannot be None simultaneously."
            raise ValueError(msg)
        if self.px is not None and self.percent is not None:
            msg = "Only px or percent may be set!"
            raise ValueError(msg)
        return self
__class_vars__ special

The names of the class variables defined on the model.

__private_attributes__ special

Metadata about the private attributes of the model.

__pydantic_complete__ special

Whether model building is completed, or if there are still undefined fields.

__pydantic_custom_init__ special

Whether the model has a custom __init__ method.

__pydantic_decorators__ special

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__ special

Metadata for generic models; contains data used for a similar purpose to args, origin, parameters in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__ special

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__ special

The name of the post-init method for the model, if defined.

__signature__ special

The synthesized __init__ [Signature][inspect.Signature] of the model.

model_computed_fields

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

apply (self, img, crop_params, pad_params, pad_value, interpolation, **params)

Apply transform on image.

Source code in albumentations/augmentations/crops/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    crop_params: Sequence[int],
    pad_params: Sequence[int],
    pad_value: ColorType,
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fcrops.crop_and_pad(
        img,
        crop_params,
        pad_params,
        pad_value,
        params["shape"][:2],
        interpolation,
        self.pad_mode,
        self.keep_size,
    )
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    if self.px is not None:
        new_params = self._get_px_params()
    else:
        percent_params = self._get_percent_params()
        new_params = [
            int(percent_params[0] * height),
            int(percent_params[1] * width),
            int(percent_params[2] * height),
            int(percent_params[3] * width),
        ]

    pad_params = [max(i, 0) for i in new_params]

    crop_params = self._prevent_zero([-min(i, 0) for i in new_params], height, width)

    top, right, bottom, left = crop_params
    crop_params = [left, top, width - right, height - bottom]
    result_rows = crop_params[3] - crop_params[1]
    result_cols = crop_params[2] - crop_params[0]
    if result_cols == width and result_rows == height:
        crop_params = []

    top, right, bottom, left = pad_params
    pad_params = [top, bottom, left, right]
    if any(pad_params):
        result_rows += top + bottom
        result_cols += left + right
    else:
        pad_params = []

    return {
        "crop_params": crop_params or None,
        "pad_params": pad_params or None,
        "pad_value": None if pad_params is None else self._get_pad_value(self.pad_cval),
        "pad_value_mask": None if pad_params is None else self._get_pad_value(self.pad_cval_mask),
        "result_shape": (result_rows, result_cols),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "px",
        "percent",
        "pad_mode",
        "pad_cval",
        "pad_cval_mask",
        "keep_size",
        "sample_independently",
        "interpolation",
    )
class CropInitSchema


Source code in albumentations/augmentations/crops/transforms.py
Python
class CropInitSchema(BaseTransformInitSchema):
    height: int | None = Field(description="Height of the crop", ge=1)
    width: int | None = Field(description="Width of the crop", ge=1)
    p: ProbabilityType = 1

class CropNonEmptyMaskIfExists (height, width, ignore_values=None, ignore_channels=None, always_apply=None, p=1.0) [view source on GitHub]

Crop area with mask if mask is non-empty, else make random crop.

Parameters:

Name Type Description
height int

vertical size of crop in pixels

width int

horizontal size of crop in pixels

ignore_values list of int

values to ignore in mask, 0 values are always ignored (e.g. if background value is 5 set ignore_values=[5] to ignore)

ignore_channels list of int

channels to ignore in mask (e.g. if background is a first channel set ignore_channels=[0] to ignore)

p float

probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
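
A minimal usage sketch (array shapes and the crop size are illustrative; a `mask` or `masks` target must be passed so the transform can locate non-empty regions):

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
mask = np.zeros((100, 100), dtype=np.uint8)
mask[30:50, 30:50] = 1  # non-empty region the crop will try to cover

aug = A.Compose([A.CropNonEmptyMaskIfExists(height=64, width=64, p=1.0)])
result = aug(image=image, mask=mask)
cropped_image, cropped_mask = result["image"], result["mask"]  # both 64x64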


Source code in albumentations/augmentations/crops/transforms.py
Python
class CropNonEmptyMaskIfExists(_BaseCrop):
    """Crop area with mask if mask is non-empty, else make random crop.

    Args:
        height: vertical size of crop in pixels
        width: horizontal size of crop in pixels
        ignore_values (list of int): values to ignore in mask, `0` values are always ignored
            (e.g. if background value is 5 set `ignore_values=[5]` to ignore)
        ignore_channels (list of int): channels to ignore in mask
            (e.g. if background is a first channel set `ignore_channels=[0]` to ignore)
        p: probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    class InitSchema(CropInitSchema):
        ignore_values: list[int] | None = Field(
            default=None,
            description="Values to ignore in mask, `0` values are always ignored",
        )
        ignore_channels: list[int] | None = Field(default=None, description="Channels to ignore in mask")

    def __init__(
        self,
        height: int,
        width: int,
        ignore_values: list[int] | None = None,
        ignore_channels: list[int] | None = None,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p, always_apply)

        self.height = height
        self.width = width
        self.ignore_values = ignore_values
        self.ignore_channels = ignore_channels

    def _preprocess_mask(self, mask: np.ndarray) -> np.ndarray:
        mask_height, mask_width = mask.shape[:2]

        if self.ignore_values is not None:
            ignore_values_np = np.array(self.ignore_values)
            mask = np.where(np.isin(mask, ignore_values_np), 0, mask)

        if mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS and self.ignore_channels is not None:
            target_channels = np.array([ch for ch in range(mask.shape[-1]) if ch not in self.ignore_channels])
            mask = np.take(mask, target_channels, axis=-1)

        if self.height > mask_height or self.width > mask_width:
            raise ValueError(
                f"Crop size ({self.height},{self.width}) is larger than image ({mask_height},{mask_width})",
            )

        return mask

    def update_params(self, params: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
        super().update_params(params, **kwargs)
        if "mask" in kwargs:
            mask = self._preprocess_mask(kwargs["mask"])
        elif "masks" in kwargs and len(kwargs["masks"]):
            masks = kwargs["masks"]
            mask = self._preprocess_mask(np.copy(masks[0]))  # need copy as we perform in-place mod afterwards
            for m in masks[1:]:
                mask |= self._preprocess_mask(m)
        else:
            msg = "Can not find mask for CropNonEmptyMaskIfExists"
            raise RuntimeError(msg)

        mask_height, mask_width = mask.shape[:2]

        if mask.any():
            mask = mask.sum(axis=-1) if mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS else mask
            non_zero_yx = np.argwhere(mask)
            y, x = random.choice(non_zero_yx)
            x_min = x - random.randint(0, self.width - 1)
            y_min = y - random.randint(0, self.height - 1)
            x_min = np.clip(x_min, 0, mask_width - self.width)
            y_min = np.clip(y_min, 0, mask_height - self.height)
        else:
            x_min = random.randint(0, mask_width - self.width)
            y_min = random.randint(0, mask_height - self.height)

        x_max = x_min + self.width
        y_max = y_min + self.height

        crop_coords = x_min, y_min, x_max, y_max

        params["crop_coords"] = crop_coords
        return params

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "height", "width", "ignore_values", "ignore_channels"
class InitSchema


Source code in albumentations/augmentations/crops/transforms.py
Python
class InitSchema(CropInitSchema):
    ignore_values: list[int] | None = Field(
        default=None,
        description="Values to ignore in mask, `0` values are always ignored",
    )
    ignore_channels: list[int] | None = Field(default=None, description="Channels to ignore in mask")

get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "height", "width", "ignore_values", "ignore_channels"
update_params (self, params, **kwargs)

Update parameters with transform-specific params. This method is deprecated: use get_params for transform-specific params (such as interpolation) and update_params_shape for data-dependent params (such as shape).

Source code in albumentations/augmentations/crops/transforms.py
Python
def update_params(self, params: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
    super().update_params(params, **kwargs)
    if "mask" in kwargs:
        mask = self._preprocess_mask(kwargs["mask"])
    elif "masks" in kwargs and len(kwargs["masks"]):
        masks = kwargs["masks"]
        mask = self._preprocess_mask(np.copy(masks[0]))  # need copy as we perform in-place mod afterwards
        for m in masks[1:]:
            mask |= self._preprocess_mask(m)
    else:
        msg = "Can not find mask for CropNonEmptyMaskIfExists"
        raise RuntimeError(msg)

    mask_height, mask_width = mask.shape[:2]

    if mask.any():
        mask = mask.sum(axis=-1) if mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS else mask
        non_zero_yx = np.argwhere(mask)
        y, x = random.choice(non_zero_yx)
        x_min = x - random.randint(0, self.width - 1)
        y_min = y - random.randint(0, self.height - 1)
        x_min = np.clip(x_min, 0, mask_width - self.width)
        y_min = np.clip(y_min, 0, mask_height - self.height)
    else:
        x_min = random.randint(0, mask_width - self.width)
        y_min = random.randint(0, mask_height - self.height)

    x_max = x_min + self.width
    y_max = y_min + self.height

    crop_coords = x_min, y_min, x_max, y_max

    params["crop_coords"] = crop_coords
    return params
class RandomCrop (height, width, p=1.0, always_apply=None) [view source on GitHub]

Crop a random part of the input.

Parameters:

Name Type Description
height int

height of the crop.

width int

width of the crop.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
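
A minimal usage sketch (the image and crop sizes are illustrative; the crop must not exceed the image dimensions, otherwise a CropSizeError is raised):

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)

aug = A.Compose([A.RandomCrop(height=64, width=64, p=1.0)])
cropped = aug(image=image)["image"]  # expected shape: (64, 64, 3)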


Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomCrop(_BaseCrop):
    """Crop a random part of the input.

    Args:
        height: height of the crop.
        width: width of the crop.
        p: probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    class InitSchema(CropInitSchema):
        pass

    def __init__(self, height: int, width: int, p: float = 1.0, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.height = height
        self.width = width

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        image_shape = params["shape"][:2]

        image_height, image_width = image_shape

        if self.height > image_height or self.width > image_width:
            raise CropSizeError(
                f"Crop size (height, width) exceeds image dimensions (height, width):"
                f" {(self.height, self.width)} vs {image_shape[:2]}",
            )

        h_start = random.random()
        w_start = random.random()
        crop_coords = fcrops.get_crop_coords(image_shape, (self.height, self.width), h_start, w_start)
        return {"crop_coords": crop_coords}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "height", "width"
class InitSchema


Source code in albumentations/augmentations/crops/transforms.py
Python
class InitSchema(CropInitSchema):
    pass

get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    image_shape = params["shape"][:2]

    image_height, image_width = image_shape

    if self.height > image_height or self.width > image_width:
        raise CropSizeError(
            f"Crop size (height, width) exceeds image dimensions (height, width):"
            f" {(self.height, self.width)} vs {image_shape[:2]}",
        )

    h_start = random.random()
    w_start = random.random()
    crop_coords = fcrops.get_crop_coords(image_shape, (self.height, self.width), h_start, w_start)
    return {"crop_coords": crop_coords}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "height", "width"
class RandomCropFromBorders (crop_left=0.1, crop_right=0.1, crop_top=0.1, crop_bottom=0.1, always_apply=None, p=1.0) [view source on GitHub]

Randomly crops parts of the image from the borders without resizing at the end. The cropped regions are defined as fractions of the original image dimensions, specified for each side of the image (left, right, top, bottom).

Parameters:

Name Type Description
crop_left float

Fraction of the width to randomly crop from the left side. Must be in the range [0.0, 1.0]. Default is 0.1.

crop_right float

Fraction of the width to randomly crop from the right side. Must be in the range [0.0, 1.0]. Default is 0.1.

crop_top float

Fraction of the height to randomly crop from the top side. Must be in the range [0.0, 1.0]. Default is 0.1.

crop_bottom float

Fraction of the height to randomly crop from the bottom side. Must be in the range [0.0, 1.0]. Default is 0.1.

p float

Probability of applying the transform. Default is 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
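
A minimal usage sketch (the image size and border fractions are illustrative; because each border crop is sampled independently, the output size varies between calls):

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (200, 200, 3), dtype=np.uint8)

aug = A.Compose([
    A.RandomCropFromBorders(crop_left=0.1, crop_right=0.1, crop_top=0.1, crop_bottom=0.1, p=1.0),
])
cropped = aug(image=image)["image"]  # height and width each expected in [160, 200]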


Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomCropFromBorders(_BaseCrop):
    """Randomly crops parts of the image from the borders without resizing at the end. The cropped regions are defined
    as fractions of the original image dimensions, specified for each side of the image (left, right, top, bottom).

    Args:
        crop_left (float): Fraction of the width to randomly crop from the left side. Must be in the range [0.0, 1.0].
                            Default is 0.1.
        crop_right (float): Fraction of the width to randomly crop from the right side. Must be in the range [0.0, 1.0].
                            Default is 0.1.
        crop_top (float): Fraction of the height to randomly crop from the top side. Must be in the range [0.0, 1.0].
                          Default is 0.1.
        crop_bottom (float): Fraction of the height to randomly crop from the bottom side.
                             Must be in the range [0.0, 1.0]. Default is 0.1.
        p (float): Probability of applying the transform. Default is 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        crop_left: float = Field(
            default=0.1,
            ge=0.0,
            le=1.0,
            description="Fraction of width to randomly crop from the left side.",
        )
        crop_right: float = Field(
            default=0.1,
            ge=0.0,
            le=1.0,
            description="Fraction of width to randomly crop from the right side.",
        )
        crop_top: float = Field(
            default=0.1,
            ge=0.0,
            le=1.0,
            description="Fraction of height to randomly crop from the top side.",
        )
        crop_bottom: float = Field(
            default=0.1,
            ge=0.0,
            le=1.0,
            description="Fraction of height to randomly crop from the bottom side.",
        )
        p: ProbabilityType = 1

        @model_validator(mode="after")
        def validate_crop_values(self) -> Self:
            if self.crop_left + self.crop_right > 1.0:
                msg = "The sum of crop_left and crop_right must be <= 1."
                raise ValueError(msg)
            if self.crop_top + self.crop_bottom > 1.0:
                msg = "The sum of crop_top and crop_bottom must be <= 1."
                raise ValueError(msg)
            return self

    def __init__(
        self,
        crop_left: float = 0.1,
        crop_right: float = 0.1,
        crop_top: float = 0.1,
        crop_bottom: float = 0.1,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p, always_apply)
        self.crop_left = crop_left
        self.crop_right = crop_right
        self.crop_top = crop_top
        self.crop_bottom = crop_bottom

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        height, width = params["shape"][:2]

        x_min = random.randint(0, int(self.crop_left * width))
        x_max = random.randint(max(x_min + 1, int((1 - self.crop_right) * width)), width)

        y_min = random.randint(0, int(self.crop_top * height))
        y_max = random.randint(max(y_min + 1, int((1 - self.crop_bottom) * height)), height)

        crop_coords = x_min, y_min, x_max, y_max

        return {"crop_coords": crop_coords}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "crop_left", "crop_right", "crop_top", "crop_bottom"
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/crops/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    crop_left: float = Field(
        default=0.1,
        ge=0.0,
        le=1.0,
        description="Fraction of width to randomly crop from the left side.",
    )
    crop_right: float = Field(
        default=0.1,
        ge=0.0,
        le=1.0,
        description="Fraction of width to randomly crop from the right side.",
    )
    crop_top: float = Field(
        default=0.1,
        ge=0.0,
        le=1.0,
        description="Fraction of height to randomly crop from the top side.",
    )
    crop_bottom: float = Field(
        default=0.1,
        ge=0.0,
        le=1.0,
        description="Fraction of height to randomly crop from the bottom side.",
    )
    p: ProbabilityType = 1

    @model_validator(mode="after")
    def validate_crop_values(self) -> Self:
        if self.crop_left + self.crop_right > 1.0:
            msg = "The sum of crop_left and crop_right must be <= 1."
            raise ValueError(msg)
        if self.crop_top + self.crop_bottom > 1.0:
            msg = "The sum of crop_top and crop_bottom must be <= 1."
            raise ValueError(msg)
        return self

get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    height, width = params["shape"][:2]

    x_min = random.randint(0, int(self.crop_left * width))
    x_max = random.randint(max(x_min + 1, int((1 - self.crop_right) * width)), width)

    y_min = random.randint(0, int(self.crop_top * height))
    y_max = random.randint(max(y_min + 1, int((1 - self.crop_bottom) * height)), height)

    crop_coords = x_min, y_min, x_max, y_max

    return {"crop_coords": crop_coords}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "crop_left", "crop_right", "crop_top", "crop_bottom"
class RandomCropNearBBox (max_part_shift=(0, 0.3), cropping_bbox_key='cropping_bbox', cropping_box_key=None, always_apply=None, p=1.0) [view source on GitHub]

Crop bbox from image with random shift by x,y coordinates

Parameters:

Name Type Description
max_part_shift float, (float, float)

Max shift in height and width dimensions relative to cropping_bbox dimension. If max_part_shift is a single float, the range will be (0, max_part_shift). Default (0, 0.3).

cropping_bbox_key str

Additional target key for cropping box. Default cropping_bbox.

cropping_box_key str

[Deprecated] Use cropping_bbox_key instead.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Examples:

Python
>>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_bbox')],
...              bbox_params=BboxParams("pascal_voc"))
>>> result = aug(image=image, bboxes=bboxes, test_bbox=[0, 5, 10, 20])


Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomCropNearBBox(_BaseCrop):
    """Crop bbox from image with random shift by x,y coordinates

    Args:
        max_part_shift (float, (float, float)): Max shift in `height` and `width` dimensions relative
            to `cropping_bbox` dimension.
            If max_part_shift is a single float, the range will be (0, max_part_shift).
            Default (0, 0.3).
        cropping_bbox_key (str): Additional target key for cropping box. Default `cropping_bbox`.
        cropping_box_key (str): [Deprecated] Use `cropping_bbox_key` instead.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Examples:
        >>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_bbox')],
        ...              bbox_params=BboxParams("pascal_voc"))
        >>> result = aug(image=image, bboxes=bboxes, test_bbox=[0, 5, 10, 20])

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        max_part_shift: ZeroOneRangeType
        cropping_bbox_key: str = Field(default="cropping_bbox", description="Additional target key for cropping box.")
        p: ProbabilityType = 1

    def __init__(
        self,
        max_part_shift: ScaleFloatType = (0, 0.3),
        cropping_bbox_key: str = "cropping_bbox",
        cropping_box_key: str | None = None,  # Deprecated
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p=p, always_apply=always_apply)
        # Check for deprecated parameter and issue warning
        if cropping_box_key is not None:
            warn(
                "The parameter 'cropping_box_key' is deprecated and will be removed in future versions. "
                "Use 'cropping_bbox_key' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Ensure the new parameter is used even if the old one is passed
            cropping_bbox_key = cropping_box_key

        self.max_part_shift = cast(Tuple[float, float], max_part_shift)
        self.cropping_bbox_key = cropping_bbox_key

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[float, ...]]:
        bbox = data[self.cropping_bbox_key]

        image_shape = params["shape"][:2]

        bbox = self._clip_bbox(bbox, image_shape)

        h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
        w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])

        x_min = bbox[0] - random.randint(-w_max_shift, w_max_shift)
        x_max = bbox[2] + random.randint(-w_max_shift, w_max_shift)

        y_min = bbox[1] - random.randint(-h_max_shift, h_max_shift)
        y_max = bbox[3] + random.randint(-h_max_shift, h_max_shift)

        crop_coords = self._clip_bbox((x_min, y_min, x_max, y_max), image_shape)

        if crop_coords[0] == crop_coords[2] or crop_coords[1] == crop_coords[3]:
            crop_shape = (bbox[3] - bbox[1], bbox[2] - bbox[0])
            crop_coords = fcrops.get_center_crop_coords(image_shape, crop_shape)

        return {"crop_coords": crop_coords}

    @property
    def targets_as_params(self) -> list[str]:
        return [self.cropping_bbox_key]

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "max_part_shift", "cropping_bbox_key"
targets_as_params: list[str] property readonly

Targets used to get params dependent on targets. This is used to check input has all required targets.

class InitSchema


Source code in albumentations/augmentations/crops/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    max_part_shift: ZeroOneRangeType
    cropping_bbox_key: str = Field(default="cropping_bbox", description="Additional target key for cropping box.")
    p: ProbabilityType = 1

get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[float, ...]]:
    bbox = data[self.cropping_bbox_key]

    image_shape = params["shape"][:2]

    bbox = self._clip_bbox(bbox, image_shape)

    h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
    w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])

    x_min = bbox[0] - random.randint(-w_max_shift, w_max_shift)
    x_max = bbox[2] + random.randint(-w_max_shift, w_max_shift)

    y_min = bbox[1] - random.randint(-h_max_shift, h_max_shift)
    y_max = bbox[3] + random.randint(-h_max_shift, h_max_shift)

    crop_coords = self._clip_bbox((x_min, y_min, x_max, y_max), image_shape)

    if crop_coords[0] == crop_coords[2] or crop_coords[1] == crop_coords[3]:
        crop_shape = (bbox[3] - bbox[1], bbox[2] - bbox[0])
        crop_coords = fcrops.get_center_crop_coords(image_shape, crop_shape)

    return {"crop_coords": crop_coords}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "max_part_shift", "cropping_bbox_key"
class RandomResizedCrop (size=None, width=None, height=None, *, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1, always_apply=None, p=1.0) [view source on GitHub]

Torchvision's variant of crop a random part of the input and rescale it to some size.

Parameters:

Name Type Description
size int, int

expected output size of the crop, for each edge. If size is an int instead of sequence like (height, width), a square output size (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).

scale float, float

Specifies the lower and upper bounds for the random area of the crop, before resizing. The scale is defined with respect to the area of the original image.

ratio float, float

lower and upper bounds for the random aspect ratio of the crop, before resizing.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
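
A minimal usage sketch (the input shape and the size, scale, and ratio values are illustrative; the crop area and aspect ratio are sampled first, then the crop is resized to size):

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8)

aug = A.Compose([
    A.RandomResizedCrop(size=(224, 224), scale=(0.5, 1.0), ratio=(0.75, 1.3333), p=1.0),
])
resized = aug(image=image)["image"]  # expected shape: (224, 224, 3)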


Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomResizedCrop(_BaseRandomSizedCrop):
    """Torchvision's variant of crop a random part of the input and rescale it to some size.

    Args:
        size (int, int): expected output size of the crop, for each edge. If size is an int instead of sequence
            like (height, width), a square output size (size, size) is made. If provided a sequence of length 1,
            it will be interpreted as (size[0], size[0]).
        scale ((float, float)): Specifies the lower and upper bounds for the random area of the crop, before resizing.
            The scale is defined with respect to the area of the original image.
        ratio ((float, float)): lower and upper bounds for the random aspect ratio of the crop, before resizing.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        scale: Annotated[tuple[float, float], AfterValidator(check_01)] = (0.08, 1.0)
        ratio: Annotated[tuple[float, float], AfterValidator(check_0plus)] = (0.75, 1.3333333333333333)
        width: int | None = Field(
            None,
            deprecated="Initializing with 'height' and 'width' is deprecated. Use size instead.",
        )
        height: int | None = Field(
            None,
            deprecated="Initializing with 'height' and 'width' is deprecated. Use size instead.",
        )
        size: ScaleIntType | None = None
        p: ProbabilityType = 1
        interpolation: InterpolationType = cv2.INTER_LINEAR

        @model_validator(mode="after")
        def process(self) -> Self:
            if isinstance(self.size, int):
                if isinstance(self.width, int):
                    self.size = (self.size, self.width)
                else:
                    msg = "If size is an integer, width as integer must be specified."
                    raise TypeError(msg)

            if self.size is None:
                if self.height is None or self.width is None:
                    message = "If 'size' is not provided, both 'height' and 'width' must be specified."
                    raise ValueError(message)
                self.size = (self.height, self.width)

            return self

    def __init__(
        self,
        # NOTE @zetyquickly: when (width, height) are deprecated, make 'size' non optional
        size: ScaleIntType | None = None,
        width: int | None = None,
        height: int | None = None,
        *,
        scale: tuple[float, float] = (0.08, 1.0),
        ratio: tuple[float, float] = (0.75, 1.3333333333333333),
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(size=cast(Tuple[int, int], size), interpolation=interpolation, p=p, always_apply=always_apply)
        self.scale = scale
        self.ratio = ratio

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        image_shape = params["shape"][:2]
        image_height, image_width = image_shape

        area = image_height * image_width

        for _ in range(10):
            target_area = random.uniform(*self.scale) * area
            log_ratio = (math.log(self.ratio[0]), math.log(self.ratio[1]))
            aspect_ratio = math.exp(random.uniform(*log_ratio))

            width = int(round(math.sqrt(target_area * aspect_ratio)))
            height = int(round(math.sqrt(target_area / aspect_ratio)))

            if 0 < width <= image_width and 0 < height <= image_height:
                i = random.randint(0, image_height - height)
                j = random.randint(0, image_width - width)

                h_start = i * 1.0 / (image_height - height + 1e-10)
                w_start = j * 1.0 / (image_width - width + 1e-10)

                crop_shape = (height, width)

                crop_coords = fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)

                return {"crop_coords": crop_coords}

        # Fallback to central crop
        in_ratio = image_width / image_height
        if in_ratio < min(self.ratio):
            width = image_width
            height = int(round(image_width / min(self.ratio)))
        elif in_ratio > max(self.ratio):
            height = image_height
            width = int(round(height * max(self.ratio)))
        else:  # whole image
            width = image_width
            height = image_height

        i = (image_height - height) // 2
        j = (image_width - width) // 2

        h_start = i * 1.0 / (image_height - height + 1e-10)
        w_start = j * 1.0 / (image_width - width + 1e-10)

        crop_shape = (height, width)

        crop_coords = fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)

        return {"crop_coords": crop_coords}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "size", "scale", "ratio", "interpolation"
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/crops/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    scale: Annotated[tuple[float, float], AfterValidator(check_01)] = (0.08, 1.0)
    ratio: Annotated[tuple[float, float], AfterValidator(check_0plus)] = (0.75, 1.3333333333333333)
    width: int | None = Field(
        None,
        deprecated="Initializing with 'height' and 'width' is deprecated. Use size instead.",
    )
    height: int | None = Field(
        None,
        deprecated="Initializing with 'height' and 'width' is deprecated. Use size instead.",
    )
    size: ScaleIntType | None = None
    p: ProbabilityType = 1
    interpolation: InterpolationType = cv2.INTER_LINEAR

    @model_validator(mode="after")
    def process(self) -> Self:
        if isinstance(self.size, int):
            if isinstance(self.width, int):
                self.size = (self.size, self.width)
            else:
                msg = "If size is an integer, width as integer must be specified."
                raise TypeError(msg)

        if self.size is None:
            if self.height is None or self.width is None:
                message = "If 'size' is not provided, both 'height' and 'width' must be specified."
                raise ValueError(message)
            self.size = (self.height, self.width)

        return self

get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    image_shape = params["shape"][:2]
    image_height, image_width = image_shape

    area = image_height * image_width

    for _ in range(10):
        target_area = random.uniform(*self.scale) * area
        log_ratio = (math.log(self.ratio[0]), math.log(self.ratio[1]))
        aspect_ratio = math.exp(random.uniform(*log_ratio))

        width = int(round(math.sqrt(target_area * aspect_ratio)))
        height = int(round(math.sqrt(target_area / aspect_ratio)))

        if 0 < width <= image_width and 0 < height <= image_height:
            i = random.randint(0, image_height - height)
            j = random.randint(0, image_width - width)

            h_start = i * 1.0 / (image_height - height + 1e-10)
            w_start = j * 1.0 / (image_width - width + 1e-10)

            crop_shape = (height, width)

            crop_coords = fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)

            return {"crop_coords": crop_coords}

    # Fallback to central crop
    in_ratio = image_width / image_height
    if in_ratio < min(self.ratio):
        width = image_width
        height = int(round(image_width / min(self.ratio)))
    elif in_ratio > max(self.ratio):
        height = image_height
        width = int(round(height * max(self.ratio)))
    else:  # whole image
        width = image_width
        height = image_height

    i = (image_height - height) // 2
    j = (image_width - width) // 2

    h_start = i * 1.0 / (image_height - height + 1e-10)
    w_start = j * 1.0 / (image_width - width + 1e-10)

    crop_shape = (height, width)

    crop_coords = fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)

    return {"crop_coords": crop_coords}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "size", "scale", "ratio", "interpolation"
class RandomSizedBBoxSafeCrop (height, width, erosion_rate=0.0, interpolation=1, always_apply=None, p=1.0) [view source on GitHub]

Crop a random part of the input and rescale it to the given size, preserving all bounding boxes.

Parameters:

Name Type Description
height int

height after crop and resize.

width int

width after crop and resize.

erosion_rate float

erosion rate applied on input image height before crop.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
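
Examples:

A minimal usage sketch with Pascal VOC-style bounding boxes; the image, box coordinates, and labels below are illustrative placeholders.

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8)
>>> bboxes = [(50, 60, 200, 180)]  # (x_min, y_min, x_max, y_max)
>>> transform = A.Compose(
...     [A.RandomSizedBBoxSafeCrop(height=224, width=224, erosion_rate=0.2, p=1.0)],
...     bbox_params=A.BboxParams(format="pascal_voc", label_fields=["class_labels"]),
... )
>>> result = transform(image=image, bboxes=bboxes, class_labels=["dog"])
>>> result["image"].shape
(224, 224, 3)

Because the crop is bbox-safe, result["bboxes"] still contains every input box, rescaled to the new image size.
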

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomSizedBBoxSafeCrop(BBoxSafeRandomCrop):
    """Crop a random part of the input and rescale it to some size without loss of bboxes.

    Args:
        height: height after crop and resize.
        width: width after crop and resize.
        erosion_rate: erosion rate applied on input image height before crop.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(CropInitSchema):
        erosion_rate: float = Field(
            default=0.0,
            ge=0.0,
            le=1.0,
            description="Erosion rate applied on input image height before crop.",
        )
        interpolation: InterpolationType = cv2.INTER_LINEAR

    def __init__(
        self,
        height: int,
        width: int,
        erosion_rate: float = 0.0,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(erosion_rate=erosion_rate, p=p, always_apply=always_apply)
        self.height = height
        self.width = width
        self.interpolation = interpolation

    def apply(
        self,
        img: np.ndarray,
        crop_coords: tuple[int, int, int, int],
        **params: Any,
    ) -> np.ndarray:
        crop = fcrops.crop(img, *crop_coords)
        return fgeometric.resize(crop, (self.height, self.width), self.interpolation)

    def apply_to_keypoint(
        self,
        keypoints: np.ndarray,
        crop_coords: tuple[int, int, int, int],
        **params: Any,
    ) -> np.ndarray:
        keypoints = fcrops.crop_keypoints_by_coords(keypoints, crop_coords)

        crop_height = crop_coords[3] - crop_coords[1]
        crop_width = crop_coords[2] - crop_coords[0]

        scale_y = self.height / crop_height
        scale_x = self.width / crop_width
        return fgeometric.keypoints_scale(keypoints, scale_x=scale_x, scale_y=scale_y)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (*super().get_transform_init_args_names(), "height", "width", "interpolation")
class InitSchema

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class InitSchema(CropInitSchema):
    erosion_rate: float = Field(
        default=0.0,
        ge=0.0,
        le=1.0,
        description="Erosion rate applied on input image height before crop.",
    )
    interpolation: InterpolationType = cv2.INTER_LINEAR
apply (self, img, crop_coords, **params)

Apply transform on image.

Source code in albumentations/augmentations/crops/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    crop_coords: tuple[int, int, int, int],
    **params: Any,
) -> np.ndarray:
    crop = fcrops.crop(img, *crop_coords)
    return fgeometric.resize(crop, (self.height, self.width), self.interpolation)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (*super().get_transform_init_args_names(), "height", "width", "interpolation")
class RandomSizedCrop (min_max_height, size=None, width=None, height=None, *, w2h_ratio=1.0, interpolation=1, always_apply=None, p=1.0) [view source on GitHub]

Crop a random portion of the input and rescale it to a specific size.

Parameters:

Name Type Description
min_max_height int, int

crop size limits.

size int, int

target size for the output image, i.e. (height, width) after crop and resize

w2h_ratio float

aspect ratio of crop.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
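
Examples:

A minimal usage sketch; the image size and parameter values are illustrative.

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
>>> transform = A.Compose([
...     A.RandomSizedCrop(min_max_height=(200, 400), size=(256, 256), w2h_ratio=1.0, p=1.0)
... ])
>>> result = transform(image=image)
>>> result["image"].shape
(256, 256, 3)
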

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class RandomSizedCrop(_BaseRandomSizedCrop):
    """Crop a random portion of the input and rescale it to a specific size.

    Args:
        min_max_height ((int, int)): crop size limits.
        size ((int, int)): target size for the output image, i.e. (height, width) after crop and resize
        w2h_ratio (float): aspect ratio of crop.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        interpolation: InterpolationType = cv2.INTER_LINEAR
        p: ProbabilityType = 1
        min_max_height: OnePlusIntRangeType
        w2h_ratio: Annotated[float, Field(gt=0, description="Aspect ratio of crop.")]
        width: int | None = Field(
            None,
            deprecated=(
                "Initializing with 'size' as an integer and a separate 'width' is deprecated. "
                "Please use a tuple (height, width) for the 'size' argument."
            ),
        )
        height: int | None = Field(
            None,
            deprecated=(
                "Initializing with 'height' and 'width' is deprecated. "
                "Please use a tuple (height, width) for the 'size' argument."
            ),
        )
        size: ScaleIntType | None = None

        @model_validator(mode="after")
        def process(self) -> Self:
            if isinstance(self.size, int):
                if isinstance(self.width, int):
                    self.size = (self.size, self.width)
                else:
                    msg = "If size is an integer, width as integer must be specified."
                    raise TypeError(msg)

            if self.size is None:
                if self.height is None or self.width is None:
                    message = "If 'size' is not provided, both 'height' and 'width' must be specified."
                    raise ValueError(message)
                self.size = (self.height, self.width)
            return self

    def __init__(
        self,
        min_max_height: tuple[int, int],
        # NOTE @zetyquickly: when (width, height) are deprecated, make 'size' non optional
        size: ScaleIntType | None = None,
        width: int | None = None,
        height: int | None = None,
        *,
        w2h_ratio: float = 1.0,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(size=cast(Tuple[int, int], size), interpolation=interpolation, p=p, always_apply=always_apply)
        self.min_max_height = min_max_height
        self.w2h_ratio = w2h_ratio

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        image_shape = params["shape"][:2]

        crop_height = random.randint(self.min_max_height[0], self.min_max_height[1])
        crop_width = int(crop_height * self.w2h_ratio)

        crop_shape = (crop_height, crop_width)

        h_start = random.random()
        w_start = random.random()

        crop_coords = fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)

        return {"crop_coords": crop_coords}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "min_max_height", "size", "w2h_ratio", "interpolation"
class InitSchema [view source on GitHub]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/crops/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    interpolation: InterpolationType = cv2.INTER_LINEAR
    p: ProbabilityType = 1
    min_max_height: OnePlusIntRangeType
    w2h_ratio: Annotated[float, Field(gt=0, description="Aspect ratio of crop.")]
    width: int | None = Field(
        None,
        deprecated=(
            "Initializing with 'size' as an integer and a separate 'width' is deprecated. "
            "Please use a tuple (height, width) for the 'size' argument."
        ),
    )
    height: int | None = Field(
        None,
        deprecated=(
            "Initializing with 'height' and 'width' is deprecated. "
            "Please use a tuple (height, width) for the 'size' argument."
        ),
    )
    size: ScaleIntType | None = None

    @model_validator(mode="after")
    def process(self) -> Self:
        if isinstance(self.size, int):
            if isinstance(self.width, int):
                self.size = (self.size, self.width)
            else:
                msg = "If size is an integer, width as integer must be specified."
                raise TypeError(msg)

        if self.size is None:
            if self.height is None or self.width is None:
                message = "If 'size' is not provided, both 'height' and 'width' must be specified."
                raise ValueError(message)
            self.size = (self.height, self.width)
        return self
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
    image_shape = params["shape"][:2]

    crop_height = random.randint(self.min_max_height[0], self.min_max_height[1])
    crop_width = int(crop_height * self.w2h_ratio)

    crop_shape = (crop_height, crop_width)

    h_start = random.random()
    w_start = random.random()

    crop_coords = fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)

    return {"crop_coords": crop_coords}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/crops/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "min_max_height", "size", "w2h_ratio", "interpolation"

domain_adaptation

class FDA (reference_images, beta_limit=(0, 0.1), read_fn=read_rgb_image, always_apply=None, p=0.5) [view source on GitHub]

Fourier Domain Adaptation (FDA) for simple "style transfer" in the context of unsupervised domain adaptation (UDA). FDA manipulates the frequency components of images to reduce the domain gap between source and target datasets, effectively adapting images from one domain to closely resemble those from another without altering their semantic content.

This transform is particularly beneficial in scenarios where the training (source) and testing (target) images come from different distributions, such as synthetic versus real images, or day versus night scenes. Unlike traditional domain adaptation methods that may require complex adversarial training, FDA achieves domain alignment by swapping low-frequency components of the Fourier transform between the source and target images. This technique has been shown to improve the performance of models on the target domain, particularly for tasks like semantic segmentation, without additional training for domain invariance.

The 'beta_limit' parameter controls the extent of frequency component swapping, with lower values preserving more of the original image's characteristics and higher values leading to more pronounced adaptation effects. It is recommended to use beta values less than 0.3 to avoid introducing artifacts.

Parameters:

Name Type Description
reference_images Sequence[Any]

Sequence of objects to be converted into images by read_fn. This typically involves paths to images that serve as target domain examples for adaptation.

beta_limit tuple[float, float] | float

Coefficient beta from the paper, controlling the extent of frequency component swapping. If a single value is provided, beta will be sampled from the uniform distribution [0, beta_limit]. Values should be less than 0.5.

read_fn Callable

User-defined function for reading images. It takes an element from reference_images and returns a numpy array of image pixels. By default, it is expected to take a path to an image and return a numpy array.

Targets

image

Image types: uint8, float32

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
>>> result = aug(image=image)

Note

FDA is a powerful tool for domain adaptation, particularly in unsupervised settings where annotated target domain samples are unavailable. It enables significant improvements in model generalization by aligning the low-level statistics of source and target images through a simple yet effective Fourier-based method.

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/domain_adaptation.py
Python
class FDA(ImageOnlyTransform):
    """Fourier Domain Adaptation (FDA) for simple "style transfer" in the context of unsupervised domain adaptation
    (UDA). FDA manipulates the frequency components of images to reduce the domain gap between source
    and target datasets, effectively adapting images from one domain to closely resemble those from another without
    altering their semantic content.

    This transform is particularly beneficial in scenarios where the training (source) and testing (target) images
    come from different distributions, such as synthetic versus real images, or day versus night scenes.
    Unlike traditional domain adaptation methods that may require complex adversarial training, FDA achieves domain
    alignment by swapping low-frequency components of the Fourier transform between the source and target images.
    This technique has shown to improve the performance of models on the target domain, particularly for tasks
    like semantic segmentation, without additional training for domain invariance.

    The 'beta_limit' parameter controls the extent of frequency component swapping, with lower values preserving more
    of the original image's characteristics and higher values leading to more pronounced adaptation effects.
    It is recommended to use beta values less than 0.3 to avoid introducing artifacts.

    Args:
        reference_images (Sequence[Any]): Sequence of objects to be converted into images by `read_fn`. This typically
            involves paths to images that serve as target domain examples for adaptation.
        beta_limit (tuple[float, float] | float): Coefficient beta from the paper, controlling the swapping extent of
            frequency components. If one value is provided beta will be sampled from uniform
            distribution [0, beta_limit]. Values should be less than 0.5.
        read_fn (Callable): User-defined function for reading images. It takes an element from `reference_images` and
            returns a numpy array of image pixels. By default, it is expected to take a path to an image and return a
            numpy array.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        - https://github.com/YanchaoYang/FDA
        - https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_FDA_Fourier_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2020_paper.pdf

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
        >>> result = aug(image=image)

    Note:
        FDA is a powerful tool for domain adaptation, particularly in unsupervised settings where annotated target
        domain samples are unavailable. It enables significant improvements in model generalization by aligning
        the low-level statistics of source and target images through a simple yet effective Fourier-based method.
    """

    class InitSchema(BaseTransformInitSchema):
        reference_images: Sequence[Any]
        read_fn: Callable[[Any], np.ndarray]
        beta_limit: ZeroOneRangeType

        @field_validator("beta_limit")
        @classmethod
        def check_ranges(cls, value: tuple[float, float]) -> tuple[float, float]:
            bounds = 0, MAX_BETA_LIMIT
            if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
                raise ValueError(f"Values should be in the range {bounds} got {value} ")
            return value

    def __init__(
        self,
        reference_images: Sequence[Any],
        beta_limit: ScaleFloatType = (0, 0.1),
        read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.reference_images = reference_images
        self.read_fn = read_fn
        self.beta_limit = cast(Tuple[float, float], beta_limit)

    def apply(
        self,
        img: np.ndarray,
        target_image: np.ndarray,
        beta: float,
        **params: Any,
    ) -> np.ndarray:
        return fourier_domain_adaptation(img, target_image, beta)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
        target_img = self.read_fn(random.choice(self.reference_images))
        target_img = cv2.resize(target_img, dsize=(params["cols"], params["rows"]))

        return {"target_image": target_img}

    def get_params(self) -> dict[str, float]:
        return {"beta": random.uniform(*self.beta_limit)}

    def get_transform_init_args_names(self) -> tuple[str, str, str]:
        return "reference_images", "beta_limit", "read_fn"

    def to_dict_private(self) -> dict[str, Any]:
        msg = "FDA can not be serialized."
        raise NotImplementedError(msg)
class InitSchema [view source on GitHub]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/domain_adaptation.py
Python
class InitSchema(BaseTransformInitSchema):
    reference_images: Sequence[Any]
    read_fn: Callable[[Any], np.ndarray]
    beta_limit: ZeroOneRangeType

    @field_validator("beta_limit")
    @classmethod
    def check_ranges(cls, value: tuple[float, float]) -> tuple[float, float]:
        bounds = 0, MAX_BETA_LIMIT
        if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
            raise ValueError(f"Values should be in the range {bounds} got {value} ")
        return value
apply (self, img, target_image, beta, **params)

Apply transform on image.

Source code in albumentations/augmentations/domain_adaptation.py
Python
def apply(
    self,
    img: np.ndarray,
    target_image: np.ndarray,
    beta: float,
    **params: Any,
) -> np.ndarray:
    return fourier_domain_adaptation(img, target_image, beta)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/domain_adaptation.py
Python
def get_params(self) -> dict[str, float]:
    return {"beta": random.uniform(*self.beta_limit)}
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/domain_adaptation.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
    target_img = self.read_fn(random.choice(self.reference_images))
    target_img = cv2.resize(target_img, dsize=(params["cols"], params["rows"]))

    return {"target_image": target_img}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/domain_adaptation.py
Python
def get_transform_init_args_names(self) -> tuple[str, str, str]:
    return "reference_images", "beta_limit", "read_fn"

class HistogramMatching (reference_images, blend_ratio=(0.5, 1.0), read_fn=read_rgb_image, always_apply=None, p=0.5) [view source on GitHub]

Adjust the pixel values of an input image to match the histogram of a reference image.

This transform applies histogram matching, a technique that modifies the distribution of pixel intensities in the input image to closely resemble that of a reference image. This process is performed independently for each channel in multi-channel images, provided both the input and reference images have the same number of channels.

Histogram matching is particularly useful for:

  • Normalizing images from different sources or captured under varying conditions.
  • Preparing images for feature matching or other computer vision tasks where consistent tone and contrast are important.
  • Simulating different lighting or camera conditions in a controlled manner.

Parameters:

Name Type Description
reference_images Sequence[Any]

A sequence of reference image sources. These can be file paths, URLs, or any objects that can be converted to images by the read_fn.

blend_ratio tuple[float, float]

Range for the blending factor between the original and the matched image. Must be two floats between 0 and 1, where:

  • 0 means no blending (original image is returned)
  • 1 means full histogram matching

A random value within this range is chosen for each application. Default: (0.5, 1.0)

read_fn Callable[[Any], np.ndarray]

A function that takes an element from reference_images and returns a numpy array representing the image. Default: read_rgb_image (reads image file from disk)

p float

Probability of applying the transform. Default: 0.5

Targets

image

Image types: uint8, float32

Note

  • This transform cannot be directly serialized due to its dependency on external image data.
  • The effectiveness of the matching depends on the similarity between the input and reference images.
  • For best results, choose reference images that represent the desired tone and contrast.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> reference_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> transform = A.HistogramMatching(
...     reference_images=[reference_image],
...     blend_ratio=(0.5, 1.0),
...     read_fn=lambda x: x,
...     p=1
... )
>>> result = transform(image=image)
>>> matched_image = result["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/domain_adaptation.py
Python
class HistogramMatching(ImageOnlyTransform):
    """Adjust the pixel values of an input image to match the histogram of a reference image.

    This transform applies histogram matching, a technique that modifies the distribution of pixel
    intensities in the input image to closely resemble that of a reference image. This process is
    performed independently for each channel in multi-channel images, provided both the input and
    reference images have the same number of channels.

    Histogram matching is particularly useful for:
    - Normalizing images from different sources or captured under varying conditions.
    - Preparing images for feature matching or other computer vision tasks where consistent
      tone and contrast are important.
    - Simulating different lighting or camera conditions in a controlled manner.

    Args:
        reference_images (Sequence[Any]): A sequence of reference image sources. These can be
            file paths, URLs, or any objects that can be converted to images by the `read_fn`.
        blend_ratio (tuple[float, float]): Range for the blending factor between the original
            and the matched image. Must be two floats between 0 and 1, where:
            - 0 means no blending (original image is returned)
            - 1 means full histogram matching
            A random value within this range is chosen for each application.
            Default: (0.5, 1.0)
        read_fn (Callable[[Any], np.ndarray]): A function that takes an element from
            `reference_images` and returns a numpy array representing the image.
            Default: read_rgb_image (reads image file from disk)
        p (float): Probability of applying the transform. Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        - This transform cannot be directly serialized due to its dependency on external image data.
        - The effectiveness of the matching depends on the similarity between the input and reference images.
        - For best results, choose reference images that represent the desired tone and contrast.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> reference_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> transform = A.HistogramMatching(
        ...     reference_images=[reference_image],
        ...     blend_ratio=(0.5, 1.0),
        ...     read_fn=lambda x: x,
        ...     p=1
        ... )
        >>> result = transform(image=image)
        >>> matched_image = result["image"]

    References:
        - Histogram Matching in scikit-image:
          https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_histogram_matching.html
    """

    class InitSchema(BaseTransformInitSchema):
        reference_images: Sequence[Any]
        blend_ratio: Annotated[tuple[float, float], AfterValidator(nondecreasing), AfterValidator(check_01)]
        read_fn: Callable[[Any], np.ndarray]

    def __init__(
        self,
        reference_images: Sequence[Any],
        blend_ratio: tuple[float, float] = (0.5, 1.0),
        read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.reference_images = reference_images
        self.read_fn = read_fn
        self.blend_ratio = blend_ratio

    def apply(
    self,
        img: np.ndarray,
        reference_image: np.ndarray,
        blend_ratio: float,
        **params: Any,
    ) -> np.ndarray:
        return apply_histogram(img, reference_image, blend_ratio)

    def get_params(self) -> dict[str, np.ndarray]:
        return {
            "reference_image": self.read_fn(random.choice(self.reference_images)),
            "blend_ratio": random.uniform(*self.blend_ratio),
        }

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "reference_images", "blend_ratio", "read_fn"

    def to_dict_private(self) -> dict[str, Any]:
        msg = "HistogramMatching can not be serialized."
        raise NotImplementedError(msg)
class InitSchema

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/domain_adaptation.py
Python
class InitSchema(BaseTransformInitSchema):
    reference_images: Sequence[Any]
    blend_ratio: Annotated[tuple[float, float], AfterValidator(nondecreasing), AfterValidator(check_01)]
    read_fn: Callable[[Any], np.ndarray]
apply (self, img, reference_image, blend_ratio, **params)

Apply transform on image.

Source code in albumentations/augmentations/domain_adaptation.py
Python
def apply(
    self,
    img: np.ndarray,
    reference_image: np.ndarray,
    blend_ratio: float,
    **params: Any,
) -> np.ndarray:
    return apply_histogram(img, reference_image, blend_ratio)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/domain_adaptation.py
Python
def get_params(self) -> dict[str, np.ndarray]:
    return {
        "reference_image": self.read_fn(random.choice(self.reference_images)),
        "blend_ratio": random.uniform(*self.blend_ratio),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/domain_adaptation.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "reference_images", "blend_ratio", "read_fn"

class PixelDistributionAdaptation (reference_images, blend_ratio=(0.25, 1.0), read_fn=read_rgb_image, transform_type='pca', always_apply=None, p=0.5) [view source on GitHub]

Performs pixel-level domain adaptation by aligning the pixel value distribution of an input image with that of a reference image. This process involves fitting a simple statistical transformation (such as PCA, StandardScaler, or MinMaxScaler) to both the original and the reference images, transforming the original image with the transformation trained on it, and then applying the inverse transformation using the transform fitted on the reference image. The result is an adapted image that retains the original content while mimicking the pixel value distribution of the reference domain.

The process can be visualized as two main steps:

  1. Adjusting the original image to a standard distribution space using a selected transform.
  2. Moving the adjusted image into the distribution space of the reference image by applying the inverse of the transform fitted on the reference image.

This technique is especially useful in scenarios where images from different domains (e.g., synthetic vs. real images, day vs. night scenes) need to be harmonized for better consistency or performance in image processing tasks.
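
To make these two steps concrete, here is a minimal sketch of the idea using scikit-learn's PCA as the statistical transform. It is illustrative only: the helper name adapt_with_pca is hypothetical, the blend_ratio mixing performed by the transform is omitted, and the actual implementation lives in albumentations.augmentations.domain_adaptation_functional.

Python
import numpy as np
from sklearn.decomposition import PCA


def adapt_with_pca(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    # Flatten (H, W, C) images into (num_pixels, C) matrices of float32 values.
    src_pixels = source.reshape(-1, source.shape[-1]).astype(np.float32)
    ref_pixels = reference.reshape(-1, reference.shape[-1]).astype(np.float32)

    # Step 1: fit a transform on the source image and project its pixels
    # into the standardized (principal component) space.
    src_pca = PCA(n_components=src_pixels.shape[1]).fit(src_pixels)
    latent = src_pca.transform(src_pixels)

    # Step 2: map the projected pixels back with the transform fitted on the
    # reference, so the output inherits the reference pixel value distribution.
    ref_pca = PCA(n_components=ref_pixels.shape[1]).fit(ref_pixels)
    adapted = ref_pca.inverse_transform(latent)

    return adapted.reshape(source.shape)
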

Parameters:

Name Type Description
reference_images Sequence[Any]

A sequence of objects (typically image paths) that will be converted into images by read_fn. These images serve as references for the domain adaptation.

blend_ratio tuple[float, float]

Specifies the minimum and maximum blend ratio for mixing the adapted image with the original. This enhances the diversity of the output images. Values should be in the range [0, 1]. Default: (0.25, 1.0)

read_fn Callable

A user-defined function for reading and converting the objects in reference_images into numpy arrays. By default, it assumes these objects are image paths.

transform_type Literal["pca", "standard", "minmax"]

Specifies the type of statistical transformation to apply:

  • "pca": Principal Component Analysis
  • "standard": StandardScaler (zero mean and unit variance)
  • "minmax": MinMaxScaler (scales to a fixed range, usually [0, 1])

Default: "pca"

p float

The probability of applying the transform to any given image. Default: 0.5

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The effectiveness of the adaptation depends on the similarity between the input and reference domains.
  • PCA transformation may alter color relationships more significantly than other methods.
  • StandardScaler and MinMaxScaler preserve color relationships better but may provide less dramatic adaptations.
  • The blend_ratio parameter allows for a smooth transition between the original and fully adapted image.
  • This transform cannot be directly serialized due to its dependency on external image data.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> reference_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> transform = A.PixelDistributionAdaptation(
...     reference_images=[reference_image],
...     blend_ratio=(0.5, 1.0),
...     transform_type="standard",
...     read_fn=lambda x: x,
...     p=1.0
... )
>>> result = transform(image=image)
>>> adapted_image = result["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/domain_adaptation.py
Python
class PixelDistributionAdaptation(ImageOnlyTransform):
    """Performs pixel-level domain adaptation by aligning the pixel value distribution of an input image
    with that of a reference image. This process involves fitting a simple statistical transformation
    (such as PCA, StandardScaler, or MinMaxScaler) to both the original and the reference images,
    transforming the original image with the transformation trained on it, and then applying the inverse
    transformation using the transform fitted on the reference image. The result is an adapted image
    that retains the original content while mimicking the pixel value distribution of the reference domain.

    The process can be visualized as two main steps:
    1. Adjusting the original image to a standard distribution space using a selected transform.
    2. Moving the adjusted image into the distribution space of the reference image by applying the inverse
       of the transform fitted on the reference image.

    This technique is especially useful in scenarios where images from different domains (e.g., synthetic
    vs. real images, day vs. night scenes) need to be harmonized for better consistency or performance in
    image processing tasks.

    Args:
        reference_images (Sequence[Any]): A sequence of objects (typically image paths) that will be
            converted into images by `read_fn`. These images serve as references for the domain adaptation.
        blend_ratio (tuple[float, float]): Specifies the minimum and maximum blend ratio for mixing
            the adapted image with the original. This enhances the diversity of the output images.
            Values should be in the range [0, 1]. Default: (0.25, 1.0)
        read_fn (Callable): A user-defined function for reading and converting the objects in
            `reference_images` into numpy arrays. By default, it assumes these objects are image paths.
        transform_type (Literal["pca", "standard", "minmax"]): Specifies the type of statistical
            transformation to apply.
            - "pca": Principal Component Analysis
            - "standard": StandardScaler (zero mean and unit variance)
            - "minmax": MinMaxScaler (scales to a fixed range, usually [0, 1])
            Default: "pca"
        p (float): The probability of applying the transform to any given image. Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The effectiveness of the adaptation depends on the similarity between the input and reference domains.
        - PCA transformation may alter color relationships more significantly than other methods.
        - StandardScaler and MinMaxScaler preserve color relationships better but may provide less dramatic adaptations.
        - The blend_ratio parameter allows for a smooth transition between the original and fully adapted image.
        - This transform cannot be directly serialized due to its dependency on external image data.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> reference_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> transform = A.PixelDistributionAdaptation(
        ...     reference_images=[reference_image],
        ...     blend_ratio=(0.5, 1.0),
        ...     transform_type="standard",
        ...     read_fn=lambda x: x,
        ...     p=1.0
        ... )
        >>> result = transform(image=image)
        >>> adapted_image = result["image"]

    References:
        - https://github.com/arsenyinfo/qudida
        - https://arxiv.org/abs/1911.11483
    """

    class InitSchema(BaseTransformInitSchema):
        reference_images: Sequence[Any]
        blend_ratio: Annotated[tuple[float, float], AfterValidator(nondecreasing), AfterValidator(check_01)]
        read_fn: Callable[[Any], np.ndarray]
        transform_type: Literal["pca", "standard", "minmax"]

    def __init__(
        self,
        reference_images: Sequence[Any],
        blend_ratio: tuple[float, float] = (0.25, 1.0),
        read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
        transform_type: Literal["pca", "standard", "minmax"] = "pca",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.reference_images = reference_images
        self.read_fn = read_fn
        self.blend_ratio = blend_ratio
        self.transform_type = transform_type

    def apply(self, img: np.ndarray, reference_image: np.ndarray, blend_ratio: float, **params: Any) -> np.ndarray:
        return adapt_pixel_distribution(
            img,
            ref=reference_image,
            weight=blend_ratio,
            transform_type=self.transform_type,
        )

    def get_params(self) -> dict[str, Any]:
        return {
            "reference_image": self.read_fn(random.choice(self.reference_images)),
            "blend_ratio": random.uniform(*self.blend_ratio),
        }

    def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
        return "reference_images", "blend_ratio", "read_fn", "transform_type"

    def to_dict_private(self) -> dict[str, Any]:
        msg = "PixelDistributionAdaptation can not be serialized."
        raise NotImplementedError(msg)
class InitSchema

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/domain_adaptation.py
Python
class InitSchema(BaseTransformInitSchema):
    reference_images: Sequence[Any]
    blend_ratio: Annotated[tuple[float, float], AfterValidator(nondecreasing), AfterValidator(check_01)]
    read_fn: Callable[[Any], np.ndarray]
    transform_type: Literal["pca", "standard", "minmax"]
apply (self, img, reference_image, blend_ratio, **params)

Apply transform on image.

Source code in albumentations/augmentations/domain_adaptation.py
Python
def apply(self, img: np.ndarray, reference_image: np.ndarray, blend_ratio: float, **params: Any) -> np.ndarray:
    return adapt_pixel_distribution(
        img,
        ref=reference_image,
        weight=blend_ratio,
        transform_type=self.transform_type,
    )
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/domain_adaptation.py
Python
def get_params(self) -> dict[str, Any]:
    return {
        "reference_image": self.read_fn(random.choice(self.reference_images)),
        "blend_ratio": random.uniform(*self.blend_ratio),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/domain_adaptation.py
Python
def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
    return "reference_images", "blend_ratio", "read_fn", "transform_type"

domain_adaptation_functional

def apply_histogram (img, reference_image, blend_ratio) [view source on GitHub]

Apply histogram matching to an input image using a reference image and blend the result.

This function performs histogram matching between the input image and a reference image, then blends the result with the original input image based on the specified blend ratio.

Parameters:

Name Type Description
img np.ndarray

The input image to be transformed. Can be either grayscale or RGB. Supported dtypes: uint8, float32 (values should be in [0, 1] range).

reference_image np.ndarray

The reference image used for histogram matching. Should have the same number of channels as the input image. Supported dtypes: uint8, float32 (values should be in [0, 1] range).

blend_ratio float

The ratio for blending the matched image with the original image. Should be in the range [0, 1], where 0 means no change and 1 means full histogram matching.

Returns:

Type Description
np.ndarray

The transformed image after histogram matching and blending. The output will have the same shape and dtype as the input image.

Supported image types:

  • Grayscale images: 2D arrays
  • RGB images: 3D arrays with 3 channels
  • Multispectral images: 3D arrays with more than 3 channels

Note

  • If the input and reference images have different sizes, the reference image will be resized to match the input image's dimensions.
  • The function uses match_histograms from scikit-image for the core histogram matching.
  • The @clipped and @preserve_channel_dim decorators ensure the output is within the valid range and maintains the original number of dimensions.

Examples:

Python
>>> import numpy as np
>>> from albumentations.augmentations.domain_adaptation_functional import apply_histogram
>>> input_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> reference_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> result = apply_histogram(input_image, reference_image, blend_ratio=0.7)
Source code in albumentations/augmentations/domain_adaptation_functional.py
Python
@clipped
@preserve_channel_dim
def apply_histogram(img: np.ndarray, reference_image: np.ndarray, blend_ratio: float) -> np.ndarray:
    """Apply histogram matching to an input image using a reference image and blend the result.

    This function performs histogram matching between the input image and a reference image,
    then blends the result with the original input image based on the specified blend ratio.

    Args:
        img (np.ndarray): The input image to be transformed. Can be either grayscale or RGB.
            Supported dtypes: uint8, float32 (values should be in [0, 1] range).
        reference_image (np.ndarray): The reference image used for histogram matching.
            Should have the same number of channels as the input image.
            Supported dtypes: uint8, float32 (values should be in [0, 1] range).
        blend_ratio (float): The ratio for blending the matched image with the original image.
            Should be in the range [0, 1], where 0 means no change and 1 means full histogram matching.

    Returns:
        np.ndarray: The transformed image after histogram matching and blending.
            The output will have the same shape and dtype as the input image.

    Supported image types:
        - Grayscale images: 2D arrays
        - RGB images: 3D arrays with 3 channels
        - Multispectral images: 3D arrays with more than 3 channels

    Note:
        - If the input and reference images have different sizes, the reference image
          will be resized to match the input image's dimensions.
        - The function uses `match_histograms` from scikit-image for the core histogram matching.
        - The @clipped and @preserve_channel_dim decorators ensure the output is within
          the valid range and maintains the original number of dimensions.

    Example:
        >>> import numpy as np
        >>> from albumentations.augmentations.domain_adaptation_functional import apply_histogram
        >>> input_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> reference_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> result = apply_histogram(input_image, reference_image, blend_ratio=0.7)
    """
    # Resize reference image only if necessary
    if img.shape[:2] != reference_image.shape[:2]:
        reference_image = cv2.resize(reference_image, dsize=(img.shape[1], img.shape[0]))

    img = np.squeeze(img)
    reference_image = np.squeeze(reference_image)

    # Match histograms between the images
    matched = match_histograms(
        img,
        reference_image,
        channel_axis=2 if img.ndim == NUM_MULTI_CHANNEL_DIMENSIONS and img.shape[2] > 1 else None,
    )

    # Blend the original image and the matched image
    return add_weighted(matched, blend_ratio, img, 1 - blend_ratio)

def fourier_domain_adaptation (img, target_img, beta) [view source on GitHub]

Apply Fourier Domain Adaptation to the input image using a target image.

This function performs domain adaptation in the frequency domain by modifying the amplitude spectrum of the source image based on the target image's amplitude spectrum. It preserves the phase information of the source image, which helps maintain its content while adapting its style to match the target image.

Parameters:

Name Type Description
img np.ndarray

The source image to be adapted. Can be grayscale or RGB.

target_img np.ndarray

The target image used as a reference for adaptation. Should have the same dimensions as the source image.

beta float

The adaptation strength, typically in the range [0, 1]. Higher values result in stronger adaptation towards the target image's style.

Returns:

Type Description
np.ndarray

The adapted image with the same shape and type as the input image.

Exceptions:

Type Description
ValueError

If the source and target images have different shapes.

Note

  • Both input images are converted to float32 for processing.
  • The function handles both grayscale (2D) and color (3D) images.
  • For grayscale images, an extra dimension is added to facilitate uniform processing.
  • The adaptation is performed channel-wise for color images.
  • The output is clipped to the valid range and preserves the original number of channels.

The adaptation process involves the following steps for each channel:

  1. Compute the 2D Fourier Transform of both source and target images.
  2. Shift the zero frequency component to the center of the spectrum.
  3. Extract amplitude and phase information from the source image's spectrum.
  4. Mutate the source amplitude using the target amplitude and the beta parameter.
  5. Combine the mutated amplitude with the original phase.
  6. Perform the inverse Fourier Transform to obtain the adapted channel.

The low_freq_mutate function (not shown here) is responsible for the actual amplitude mutation, focusing on low-frequency components which carry style information.
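As an illustration only (this is not the library's actual implementation), a low-frequency mutation of this kind can be sketched as replacing the central window of the shifted source amplitude spectrum with the target's, the window size being controlled by beta:

Python
import numpy as np

def low_freq_mutate_sketch(amp_src: np.ndarray, amp_trg: np.ndarray, beta: float) -> np.ndarray:
    # Hypothetical sketch: swap the central (low-frequency) window of the
    # shifted source amplitude spectrum with the target's. beta is assumed
    # to be small (e.g. <= 0.1), as in typical FDA usage.
    h, w = amp_src.shape
    border = int(np.floor(min(h, w) * beta))
    border = max(1, min(border, h // 2, w // 2))  # keep the window inside the spectrum
    c_h, c_w = h // 2, w // 2
    amp_src[c_h - border:c_h + border, c_w - border:c_w + border] = amp_trg[
        c_h - border:c_h + border, c_w - border:c_w + border
    ]
    return amp_src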

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> source_img = np.random.rand(100, 100, 3).astype(np.float32)
>>> target_img = np.random.rand(100, 100, 3).astype(np.float32)
>>> adapted_img = A.fourier_domain_adaptation(source_img, target_img, beta=0.5)
>>> assert adapted_img.shape == source_img.shape

References

  • "FDA: Fourier Domain Adaptation for Semantic Segmentation" (Yang and Soatto, CVPR 2020):
    https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_FDA_Fourier_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2020_paper.pdf

Source code in albumentations/augmentations/domain_adaptation_functional.py
Python
@clipped
@preserve_channel_dim
def fourier_domain_adaptation(img: np.ndarray, target_img: np.ndarray, beta: float) -> np.ndarray:
    """Apply Fourier Domain Adaptation to the input image using a target image.

    This function performs domain adaptation in the frequency domain by modifying the amplitude
    spectrum of the source image based on the target image's amplitude spectrum. It preserves
    the phase information of the source image, which helps maintain its content while adapting
    its style to match the target image.

    Args:
        img (np.ndarray): The source image to be adapted. Can be grayscale or RGB.
        target_img (np.ndarray): The target image used as a reference for adaptation.
            Should have the same dimensions as the source image.
        beta (float): The adaptation strength, typically in the range [0, 1].
            Higher values result in stronger adaptation towards the target image's style.

    Returns:
        np.ndarray: The adapted image with the same shape and type as the input image.

    Raises:
        ValueError: If the source and target images have different shapes.

    Note:
        - Both input images are converted to float32 for processing.
        - The function handles both grayscale (2D) and color (3D) images.
        - For grayscale images, an extra dimension is added to facilitate uniform processing.
        - The adaptation is performed channel-wise for color images.
        - The output is clipped to the valid range and preserves the original number of channels.

    The adaptation process involves the following steps for each channel:
    1. Compute the 2D Fourier Transform of both source and target images.
    2. Shift the zero frequency component to the center of the spectrum.
    3. Extract amplitude and phase information from the source image's spectrum.
    4. Mutate the source amplitude using the target amplitude and the beta parameter.
    5. Combine the mutated amplitude with the original phase.
    6. Perform the inverse Fourier Transform to obtain the adapted channel.

    The `low_freq_mutate` function (not shown here) is responsible for the actual
    amplitude mutation, focusing on low-frequency components which carry style information.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> source_img = np.random.rand(100, 100, 3).astype(np.float32)
        >>> target_img = np.random.rand(100, 100, 3).astype(np.float32)
        >>> adapted_img = A.fourier_domain_adaptation(source_img, target_img, beta=0.5)
        >>> assert adapted_img.shape == source_img.shape

    References:
        - "FDA: Fourier Domain Adaptation for Semantic Segmentation"
          (Yang and Soatto, 2020, CVPR)
          https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_FDA_Fourier_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2020_paper.pdf
    """
    src_img = img.astype(np.float32)
    trg_img = target_img.astype(np.float32)

    if len(src_img.shape) == MONO_CHANNEL_DIMENSIONS:
        src_img = np.expand_dims(src_img, axis=-1)
    if len(trg_img.shape) == MONO_CHANNEL_DIMENSIONS:
        trg_img = np.expand_dims(trg_img, axis=-1)

    num_channels = src_img.shape[-1]

    # Prepare container for the output image
    src_in_trg = np.zeros_like(src_img)

    for channel_id in range(num_channels):
        # Perform FFT on each channel
        fft_src = np.fft.fft2(src_img[:, :, channel_id])
        fft_trg = np.fft.fft2(trg_img[:, :, channel_id])

        # Shift the zero frequency component to the center
        fft_src_shifted = np.fft.fftshift(fft_src)
        fft_trg_shifted = np.fft.fftshift(fft_trg)

        # Extract amplitude and phase
        amp_src, pha_src = np.abs(fft_src_shifted), np.angle(fft_src_shifted)
        amp_trg = np.abs(fft_trg_shifted)

        # Mutate the amplitude part of the source with the target
        mutated_amp = low_freq_mutate(amp_src.copy(), amp_trg, beta)

        # Combine the mutated amplitude with the original phase
        fft_src_mutated = np.fft.ifftshift(mutated_amp * np.exp(1j * pha_src))

        # Perform inverse FFT
        src_in_trg_channel = np.fft.ifft2(fft_src_mutated)

        # Store the result in the corresponding channel of the output image
        src_in_trg[:, :, channel_id] = np.real(src_in_trg_channel)

    return src_in_trg
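At the transform level, the same adaptation is exposed as A.FDA, which samples beta per call. A minimal sketch, assuming the reference_images / beta_limit / read_fn parameters of recent Albumentations releases:

Python
import numpy as np
import albumentations as A

source = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
target = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

# read_fn is the identity because the reference is already an in-memory array.
transform = A.FDA(reference_images=[target], beta_limit=0.1, read_fn=lambda x: x, p=1.0)
adapted = transform(image=source)["image"]
assert adapted.shape == source.shape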

dropout special

channel_dropout

class ChannelDropout (channel_drop_range=(1, 1), fill_value=0, always_apply=None, p=0.5) [view source on GitHub]

Randomly drop channels in the input image.

This transform randomly selects a number of channels to drop from the input image and replaces them with a specified fill value. This can improve model robustness to missing or corrupted channels.

The technique is conceptually similar to:

  • Dropout layers in neural networks, which randomly set input units to 0 during training.
  • CoarseDropout augmentation, which drops out regions in the spatial dimensions of the image.

However, ChannelDropout operates on the channel dimension, effectively "dropping out" entire color channels or feature maps.

Parameters:

Name Type Description
channel_drop_range tuple[int, int]

Range from which to choose the number of channels to drop. The actual number will be randomly selected from the inclusive range [min, max]. Default: (1, 1).

fill_value float

Pixel value used to fill the dropped channels. Default: 0.

p float

Probability of applying the transform. Must be in the range [0, 1]. Default: 0.5.

Exceptions:

Type Description
NotImplementedError

If the input image has only one channel.

ValueError

If the upper bound of channel_drop_range is greater than or equal to the number of channels in the input image.

Targets

image

Image types: uint8, float32

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ChannelDropout(channel_drop_range=(1, 2), fill_value=128, p=1.0)
>>> result = transform(image=image)
>>> dropped_image = result['image']
>>> assert dropped_image.shape == image.shape
>>> assert np.any(dropped_image != image)  # Some channels should be different

Note

  • The number of channels to drop is randomly chosen within the specified range.
  • Channels are randomly selected for dropping.
  • This transform is not applicable to single-channel (grayscale) images.
  • The transform will raise an error if it's not possible to drop the specified number of channels (e.g., trying to drop 3 channels from an RGB image).
  • This augmentation can be particularly useful for training models to be robust against missing or corrupted channel data in multi-spectral or hyperspectral imagery.
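For the multi-spectral case mentioned in the last point, a minimal sketch (a hypothetical 6-band uint8 tile; the upper bound of channel_drop_range must stay below the number of channels):

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (64, 64, 6), dtype=np.uint8)

# Drop between 1 and 3 of the 6 bands, replacing them with 0.
transform = A.ChannelDropout(channel_drop_range=(1, 3), fill_value=0, p=1.0)
dropped = transform(image=image)["image"]
assert dropped.shape == image.shape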


Source code in albumentations/augmentations/dropout/channel_dropout.py
Python
class ChannelDropout(ImageOnlyTransform):
    """Randomly drop channels in the input image.

    This transform randomly selects a number of channels to drop from the input image
    and replaces them with a specified fill value. This can improve model robustness
    to missing or corrupted channels.

    The technique is conceptually similar to:
    - Dropout layers in neural networks, which randomly set input units to 0 during training.
    - CoarseDropout augmentation, which drops out regions in the spatial dimensions of the image.

    However, ChannelDropout operates on the channel dimension, effectively "dropping out"
    entire color channels or feature maps.

    Args:
        channel_drop_range (tuple[int, int]): Range from which to choose the number
            of channels to drop. The actual number will be randomly selected from
            the inclusive range [min, max]. Default: (1, 1).
        fill_value (float): Pixel value used to fill the dropped channels.
            Default: 0.
        p (float): Probability of applying the transform. Must be in the range
            [0, 1]. Default: 0.5.

    Raises:
        NotImplementedError: If the input image has only one channel.
        ValueError: If the upper bound of channel_drop_range is greater than or
            equal to the number of channels in the input image.

    Targets:
        image

    Image types:
        uint8, float32

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.ChannelDropout(channel_drop_range=(1, 2), fill_value=128, p=1.0)
        >>> result = transform(image=image)
        >>> dropped_image = result['image']
        >>> assert dropped_image.shape == image.shape
        >>> assert np.any(dropped_image != image)  # Some channels should be different

    Note:
        - The number of channels to drop is randomly chosen within the specified range.
        - Channels are randomly selected for dropping.
        - This transform is not applicable to single-channel (grayscale) images.
        - The transform will raise an error if it's not possible to drop the specified
          number of channels (e.g., trying to drop 3 channels from an RGB image).
        - This augmentation can be particularly useful for training models to be robust
          against missing or corrupted channel data in multi-spectral or hyperspectral imagery.

    """

    class InitSchema(BaseTransformInitSchema):
        channel_drop_range: Annotated[tuple[int, int], AfterValidator(check_1plus)]
        fill_value: Annotated[float, Field(description="Pixel value for the dropped channel.")]

    def __init__(
        self,
        channel_drop_range: tuple[int, int] = (1, 1),
        fill_value: float = 0,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.channel_drop_range = channel_drop_range
        self.fill_value = fill_value

    def apply(self, img: np.ndarray, channels_to_drop: tuple[int, ...], **params: Any) -> np.ndarray:
        return channel_dropout(img, channels_to_drop, self.fill_value)

    def get_params_dependent_on_data(self, params: Mapping[str, Any], data: Mapping[str, Any]) -> dict[str, Any]:
        image = data["image"] if "image" in data else data["images"][0]
        num_channels = get_num_channels(image)

        if num_channels == 1:
            msg = "Images has one channel. ChannelDropout is not defined."
            raise NotImplementedError(msg)

        if self.channel_drop_range[1] >= num_channels:
            msg = "Can not drop all channels in ChannelDropout."
            raise ValueError(msg)

        num_drop_channels = random.randint(*self.channel_drop_range)

        channels_to_drop = random.sample(range(num_channels), k=num_drop_channels)

        return {"channels_to_drop": channels_to_drop}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "channel_drop_range", "fill_value"
class InitSchema


Source code in albumentations/augmentations/dropout/channel_dropout.py
Python
class InitSchema(BaseTransformInitSchema):
    channel_drop_range: Annotated[tuple[int, int], AfterValidator(check_1plus)]
    fill_value: Annotated[float, Field(description="Pixel value for the dropped channel.")]

apply (self, img, channels_to_drop, **params)

Apply transform on image.

Source code in albumentations/augmentations/dropout/channel_dropout.py
Python
def apply(self, img: np.ndarray, channels_to_drop: tuple[int, ...], **params: Any) -> np.ndarray:
    return channel_dropout(img, channels_to_drop, self.fill_value)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/dropout/channel_dropout.py
Python
def get_params_dependent_on_data(self, params: Mapping[str, Any], data: Mapping[str, Any]) -> dict[str, Any]:
    image = data["image"] if "image" in data else data["images"][0]
    num_channels = get_num_channels(image)

    if num_channels == 1:
        msg = "Images has one channel. ChannelDropout is not defined."
        raise NotImplementedError(msg)

    if self.channel_drop_range[1] >= num_channels:
        msg = "Can not drop all channels in ChannelDropout."
        raise ValueError(msg)

    num_drop_channels = random.randint(*self.channel_drop_range)

    channels_to_drop = random.sample(range(num_channels), k=num_drop_channels)

    return {"channels_to_drop": channels_to_drop}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/dropout/channel_dropout.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "channel_drop_range", "fill_value"

coarse_dropout

class CoarseDropout (max_holes=None, max_height=None, max_width=None, min_holes=None, min_height=None, min_width=None, fill_value=0, mask_fill_value=None, num_holes_range=(1, 1), hole_height_range=(8, 8), hole_width_range=(8, 8), always_apply=None, p=0.5) [view source on GitHub]

CoarseDropout randomly drops out rectangular regions from the image and, optionally, the corresponding regions in an associated mask, to simulate the occlusion and varied object sizes found in real-world settings. This transformation is an evolution of CutOut and RandomErasing, offering more flexibility in the size and number of dropout regions and in the fill values.

Parameters:

Name Type Description
num_holes_range tuple[int, int]

Specifies the range (minimum and maximum) of the number of rectangular regions to zero out. This allows for dynamic variation in the number of regions removed per transformation instance.

hole_height_range tuple[ScalarType, ScalarType]

Defines the minimum and maximum heights of the dropout regions, providing variability in their vertical dimensions.

hole_width_range tuple[ScalarType, ScalarType]

Defines the minimum and maximum widths of the dropout regions, providing variability in their horizontal dimensions.

fill_value ColorType, Literal["random"]

Specifies the value used to fill the dropout regions. This can be a constant value, a tuple specifying pixel intensity across channels, or 'random' which fills the region with random noise.

mask_fill_value ColorType | None

Specifies the fill value for dropout regions in the mask. If set to None, the mask regions corresponding to the image dropout regions are left unchanged.

Targets

image, mask, keypoints

Image types: uint8, float32
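A minimal usage sketch with the range-based parameters described above (per the source below, integer ranges are interpreted as pixel sizes and float ranges as fractions of the image size):

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)

# Cut out 2-4 holes of 8-16 pixels; holes are filled with 0 in both the image
# and the mask (mask_fill_value=None would leave the mask untouched).
transform = A.CoarseDropout(
    num_holes_range=(2, 4),
    hole_height_range=(8, 16),
    hole_width_range=(8, 16),
    fill_value=0,
    mask_fill_value=0,
    p=1.0,
)
result = transform(image=image, mask=mask)
assert result["image"].shape == image.shape
assert result["mask"].shape == mask.shape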


Source code in albumentations/augmentations/dropout/coarse_dropout.py
Python
class CoarseDropout(DualTransform):
    """CoarseDropout randomly drops out rectangular regions from the image and optionally,
    the corresponding regions in an associated mask, to simulate the occlusion and
    varied object sizes found in real-world settings. This transformation is an
    evolution of CutOut and RandomErasing, offering more flexibility in the size,
    number of dropout regions, and fill values.

    Args:
        num_holes_range (tuple[int, int]): Specifies the range (minimum and maximum)
            of the number of rectangular regions to zero out. This allows for dynamic
            variation in the number of regions removed per transformation instance.
        hole_height_range (tuple[ScalarType, ScalarType]): Defines the minimum and
            maximum heights of the dropout regions, providing variability in their vertical dimensions.
        hole_width_range (tuple[ScalarType, ScalarType]): Defines the minimum and
            maximum widths of the dropout regions, providing variability in their horizontal dimensions.
        fill_value (ColorType, Literal["random"]): Specifies the value used to fill the dropout regions.
            This can be a constant value, a tuple specifying pixel intensity across channels, or 'random'
            which fills the region with random noise.
        mask_fill_value (ColorType | None): Specifies the fill value for dropout regions in the mask.
            If set to `None`, the mask regions corresponding to the image dropout regions are left unchanged.


    Targets:
        image, mask, keypoints

    Image types:
        uint8, float32

    Reference:
        https://arxiv.org/abs/1708.04552
        https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py
        https://github.com/aleju/imgaug/blob/master/imgaug/augmenters/arithmetic.py

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        min_holes: int | None = Field(
            default=None,
            ge=0,
            description="Minimum number of regions to zero out.",
        )
        max_holes: int | None = Field(
            default=8,
            ge=0,
            description="Maximum number of regions to zero out.",
        )
        num_holes_range: Annotated[tuple[int, int], AfterValidator(check_1plus), AfterValidator(nondecreasing)] = (1, 1)

        min_height: ScalarType | None = Field(
            default=None,
            ge=0,
            description="Minimum height of the hole.",
        )
        max_height: ScalarType | None = Field(
            default=8,
            ge=0,
            description="Maximum height of the hole.",
        )
        hole_height_range: tuple[ScalarType, ScalarType] = (8, 8)

        min_width: ScalarType | None = Field(
            default=None,
            ge=0,
            description="Minimum width of the hole.",
        )
        max_width: ScalarType | None = Field(
            default=8,
            ge=0,
            description="Maximum width of the hole.",
        )
        hole_width_range: tuple[ScalarType, ScalarType] = (8, 8)

        fill_value: ColorType | Literal["random"] = Field(default=0, description="Value for dropped pixels.")
        mask_fill_value: ColorType | None = Field(default=None, description="Fill value for dropped pixels in mask.")

        @staticmethod
        def update_range(
            min_value: NumericType | None,
            max_value: NumericType | None,
            default_range: tuple[NumericType, NumericType],
        ) -> tuple[NumericType, NumericType]:
            if max_value is not None:
                return (min_value or max_value, max_value)

            return default_range

        @staticmethod
        # Validation for hole dimensions ranges
        def validate_range(range_value: tuple[ScalarType, ScalarType], range_name: str, minimum: float = 0) -> None:
            if not minimum <= range_value[0] <= range_value[1]:
                raise ValueError(
                    f"First value in {range_name} should be less or equal than the second value "
                    f"and at least {minimum}. Got: {range_value}",
                )
            if isinstance(range_value[0], float) and not all(0 <= x <= 1 for x in range_value):
                raise ValueError(f"All values in {range_name} should be in [0, 1] range. Got: {range_value}")

        @model_validator(mode="after")
        def check_num_holes_and_dimensions(self) -> Self:
            if self.min_holes is not None:
                warn("`min_holes` is deprecated. Use num_holes_range instead.", DeprecationWarning, stacklevel=2)

            if self.max_holes is not None:
                warn("`max_holes` is deprecated. Use num_holes_range instead.", DeprecationWarning, stacklevel=2)

            if self.min_height is not None:
                warn("`min_height` is deprecated. Use hole_height_range instead.", DeprecationWarning, stacklevel=2)

            if self.max_height is not None:
                warn("`max_height` is deprecated. Use hole_height_range instead.", DeprecationWarning, stacklevel=2)

            if self.min_width is not None:
                warn("`min_width` is deprecated. Use hole_width_range instead.", DeprecationWarning, stacklevel=2)

            if self.max_width is not None:
                warn("`max_width` is deprecated. Use hole_width_range instead.", DeprecationWarning, stacklevel=2)

            if self.max_holes is not None:
                # Update ranges for holes, heights, and widths
                self.num_holes_range = self.update_range(self.min_holes, self.max_holes, self.num_holes_range)

            self.validate_range(self.num_holes_range, "num_holes_range", minimum=1)

            if self.max_height is not None:
                self.hole_height_range = self.update_range(self.min_height, self.max_height, self.hole_height_range)
            self.validate_range(self.hole_height_range, "hole_height_range")

            if self.max_width is not None:
                self.hole_width_range = self.update_range(self.min_width, self.max_width, self.hole_width_range)
            self.validate_range(self.hole_width_range, "hole_width_range")

            return self

    def __init__(
        self,
        max_holes: int | None = None,
        max_height: ScalarType | None = None,
        max_width: ScalarType | None = None,
        min_holes: int | None = None,
        min_height: ScalarType | None = None,
        min_width: ScalarType | None = None,
        fill_value: ColorType | Literal["random"] = 0,
        mask_fill_value: ColorType | None = None,
        num_holes_range: tuple[int, int] = (1, 1),
        hole_height_range: tuple[ScalarType, ScalarType] = (8, 8),
        hole_width_range: tuple[ScalarType, ScalarType] = (8, 8),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.num_holes_range = num_holes_range
        self.hole_height_range = hole_height_range
        self.hole_width_range = hole_width_range

        self.fill_value = fill_value  # type: ignore[assignment]
        self.mask_fill_value = mask_fill_value

    def apply(
        self,
        img: np.ndarray,
        fill_value: ColorType | Literal["random"],
        holes: Iterable[tuple[int, int, int, int]],
        **params: Any,
    ) -> np.ndarray:
        return cutout(img, holes, fill_value)

    def apply_to_mask(
        self,
        mask: np.ndarray,
        mask_fill_value: ScalarType,
        holes: Iterable[tuple[int, int, int, int]],
        **params: Any,
    ) -> np.ndarray:
        if mask_fill_value is None:
            return mask
        return cutout(mask, holes, mask_fill_value)

    @staticmethod
    def calculate_hole_dimensions(
        image_shape: tuple[int, int],
        height_range: tuple[ScalarType, ScalarType],
        width_range: tuple[ScalarType, ScalarType],
        size: int,
    ) -> tuple[np.ndarray, np.ndarray]:
        """Calculate random hole dimensions based on the provided ranges."""
        height, width = image_shape[:2]

        if isinstance(height_range[0], int):
            min_height = height_range[0]
            max_height = min(height_range[1], height)

            min_width = width_range[0]
            max_width = min(width_range[1], width)

            hole_heights = randint(np.int64(min_height), np.int64(max_height + 1), size=size)
            hole_widths = randint(np.int64(min_width), np.int64(max_width + 1), size=size)

        else:  # Assume float
            hole_heights = (height * uniform(height_range[0], height_range[1], size=size)).astype(int)
            hole_widths = (width * uniform(width_range[0], width_range[1], size=size)).astype(int)

        return hole_heights, hole_widths

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        image_shape = params["shape"][:2]

        num_holes = randint(self.num_holes_range[0], self.num_holes_range[1] + 1)

        hole_heights, hole_widths = self.calculate_hole_dimensions(
            image_shape,
            self.hole_height_range,
            self.hole_width_range,
            size=num_holes,
        )

        height, width = image_shape[:2]

        y1 = randint(np.int8(0), height - hole_heights + 1, size=num_holes)
        x1 = randint(np.int8(0), width - hole_widths + 1, size=num_holes)
        y2 = y1 + hole_heights
        x2 = x1 + hole_widths

        holes = np.stack([x1, y1, x2, y2], axis=-1)

        return {"holes": holes}

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        holes: np.ndarray,
        **params: Any,
    ) -> np.ndarray:
        return filter_keypoints_in_holes(keypoints, holes)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "num_holes_range",
            "hole_height_range",
            "hole_width_range",
            "fill_value",
            "mask_fill_value",
        )

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "keypoints": self.apply_to_keypoints,
        }
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/dropout/coarse_dropout.py
Python
class InitSchema(BaseTransformInitSchema):
    min_holes: int | None = Field(
        default=None,
        ge=0,
        description="Minimum number of regions to zero out.",
    )
    max_holes: int | None = Field(
        default=8,
        ge=0,
        description="Maximum number of regions to zero out.",
    )
    num_holes_range: Annotated[tuple[int, int], AfterValidator(check_1plus), AfterValidator(nondecreasing)] = (1, 1)

    min_height: ScalarType | None = Field(
        default=None,
        ge=0,
        description="Minimum height of the hole.",
    )
    max_height: ScalarType | None = Field(
        default=8,
        ge=0,
        description="Maximum height of the hole.",
    )
    hole_height_range: tuple[ScalarType, ScalarType] = (8, 8)

    min_width: ScalarType | None = Field(
        default=None,
        ge=0,
        description="Minimum width of the hole.",
    )
    max_width: ScalarType | None = Field(
        default=8,
        ge=0,
        description="Maximum width of the hole.",
    )
    hole_width_range: tuple[ScalarType, ScalarType] = (8, 8)

    fill_value: ColorType | Literal["random"] = Field(default=0, description="Value for dropped pixels.")
    mask_fill_value: ColorType | None = Field(default=None, description="Fill value for dropped pixels in mask.")

    @staticmethod
    def update_range(
        min_value: NumericType | None,
        max_value: NumericType | None,
        default_range: tuple[NumericType, NumericType],
    ) -> tuple[NumericType, NumericType]:
        if max_value is not None:
            return (min_value or max_value, max_value)

        return default_range

    @staticmethod
    # Validation for hole dimensions ranges
    def validate_range(range_value: tuple[ScalarType, ScalarType], range_name: str, minimum: float = 0) -> None:
        if not minimum <= range_value[0] <= range_value[1]:
            raise ValueError(
                f"First value in {range_name} should be less or equal than the second value "
                f"and at least {minimum}. Got: {range_value}",
            )
        if isinstance(range_value[0], float) and not all(0 <= x <= 1 for x in range_value):
            raise ValueError(f"All values in {range_name} should be in [0, 1] range. Got: {range_value}")

    @model_validator(mode="after")
    def check_num_holes_and_dimensions(self) -> Self:
        if self.min_holes is not None:
            warn("`min_holes` is deprecated. Use num_holes_range instead.", DeprecationWarning, stacklevel=2)

        if self.max_holes is not None:
            warn("`max_holes` is deprecated. Use num_holes_range instead.", DeprecationWarning, stacklevel=2)

        if self.min_height is not None:
            warn("`min_height` is deprecated. Use hole_height_range instead.", DeprecationWarning, stacklevel=2)

        if self.max_height is not None:
            warn("`max_height` is deprecated. Use hole_height_range instead.", DeprecationWarning, stacklevel=2)

        if self.min_width is not None:
            warn("`min_width` is deprecated. Use hole_width_range instead.", DeprecationWarning, stacklevel=2)

        if self.max_width is not None:
            warn("`max_width` is deprecated. Use hole_width_range instead.", DeprecationWarning, stacklevel=2)

        if self.max_holes is not None:
            # Update ranges for holes, heights, and widths
            self.num_holes_range = self.update_range(self.min_holes, self.max_holes, self.num_holes_range)

        self.validate_range(self.num_holes_range, "num_holes_range", minimum=1)

        if self.max_height is not None:
            self.hole_height_range = self.update_range(self.min_height, self.max_height, self.hole_height_range)
        self.validate_range(self.hole_height_range, "hole_height_range")

        if self.max_width is not None:
            self.hole_width_range = self.update_range(self.min_width, self.max_width, self.hole_width_range)
        self.validate_range(self.hole_width_range, "hole_width_range")

        return self

apply (self, img, fill_value, holes, **params)

Apply transform on image.

Source code in albumentations/augmentations/dropout/coarse_dropout.py
Python
def apply(
    self,
    img: np.ndarray,
    fill_value: ColorType | Literal["random"],
    holes: Iterable[tuple[int, int, int, int]],
    **params: Any,
) -> np.ndarray:
    return cutout(img, holes, fill_value)
calculate_hole_dimensions (image_shape, height_range, width_range, size) staticmethod

Calculate random hole dimensions based on the provided ranges.

Source code in albumentations/augmentations/dropout/coarse_dropout.py
Python
@staticmethod
def calculate_hole_dimensions(
    image_shape: tuple[int, int],
    height_range: tuple[ScalarType, ScalarType],
    width_range: tuple[ScalarType, ScalarType],
    size: int,
) -> tuple[np.ndarray, np.ndarray]:
    """Calculate random hole dimensions based on the provided ranges."""
    height, width = image_shape[:2]

    if isinstance(height_range[0], int):
        min_height = height_range[0]
        max_height = min(height_range[1], height)

        min_width = width_range[0]
        max_width = min(width_range[1], width)

        hole_heights = randint(np.int64(min_height), np.int64(max_height + 1), size=size)
        hole_widths = randint(np.int64(min_width), np.int64(max_width + 1), size=size)

    else:  # Assume float
        hole_heights = (height * uniform(height_range[0], height_range[1], size=size)).astype(int)
        hole_widths = (width * uniform(width_range[0], width_range[1], size=size)).astype(int)

    return hole_heights, hole_widths
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/dropout/coarse_dropout.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    image_shape = params["shape"][:2]

    num_holes = randint(self.num_holes_range[0], self.num_holes_range[1] + 1)

    hole_heights, hole_widths = self.calculate_hole_dimensions(
        image_shape,
        self.hole_height_range,
        self.hole_width_range,
        size=num_holes,
    )

    height, width = image_shape[:2]

    y1 = randint(np.int8(0), height - hole_heights + 1, size=num_holes)
    x1 = randint(np.int8(0), width - hole_widths + 1, size=num_holes)
    y2 = y1 + hole_heights
    x2 = x1 + hole_widths

    holes = np.stack([x1, y1, x2, y2], axis=-1)

    return {"holes": holes}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/dropout/coarse_dropout.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "num_holes_range",
        "hole_height_range",
        "hole_width_range",
        "fill_value",
        "mask_fill_value",
    )

functional

def cutout (img, holes, fill_value=0, random_state=None) [view source on GitHub]

Apply cutout augmentation to the image by cutting out holes and filling them with either a given value or random noise.

Parameters:

Name Type Description
img np.ndarray

The image to augment. Can be a 2D (grayscale) or 3D (color) array.

holes np.ndarray

An array of holes with shape (num_holes, 4). Each hole is represented as [x1, y1, x2, y2].

fill_value Union[ColorType, Literal["random"]]

The fill value to use for the holes. Can be a single integer, a tuple or list of numbers for multichannel images, or the string "random" to fill with random noise. Defaults to 0.

random_state np.random.RandomState | None

The random state to use for generating random fill values. If None, a new random state will be used. Defaults to None.

Returns:

Type Description
np.ndarray

The augmented image with cutout holes applied.

Exceptions:

Type Description
ValueError

If the fill_value is not of the expected type.

Note

  • The function creates a copy of the input image before applying the cutout.
  • For multichannel images, the fill_value should match the number of channels.
  • When using "random" fill, the random values are generated to match the image's dtype and shape.

Examples:

Python
>>> import numpy as np
>>> img = np.ones((100, 100, 3), dtype=np.uint8) * 255
>>> holes = np.array([[20, 20, 40, 40], [60, 60, 80, 80]])
>>> result = cutout(img, holes, fill_value=0)
>>> print(result.shape)
(100, 100, 3)
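For the "random" fill mode mentioned in the notes, a minimal sketch (same module as the source file below; a seeded RandomState makes the fill reproducible):

Python
import numpy as np
from albumentations.augmentations.dropout.functional import cutout

img = np.zeros((100, 100, 3), dtype=np.uint8)
holes = np.array([[10, 10, 30, 30]])  # [x1, y1, x2, y2]

rng = np.random.RandomState(0)
result = cutout(img, holes, fill_value="random", random_state=rng)
assert result[10:30, 10:30].any()  # the hole region is filled with noise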
Source code in albumentations/augmentations/dropout/functional.py
Python
def cutout(
    img: np.ndarray,
    holes: np.ndarray,
    fill_value: ColorType | Literal["random"] = 0,
    random_state: np.random.RandomState | None = None,
) -> np.ndarray:
    """Apply cutout augmentation to the image by cutting out holes and filling them
    with either a given value or random noise.

    Args:
        img (np.ndarray): The image to augment. Can be a 2D (grayscale) or 3D (color) array.
        holes (np.ndarray): An array of holes with shape (num_holes, 4).
            Each hole is represented as [x1, y1, x2, y2].
        fill_value (Union[ColorType, Literal["random"]], optional): The fill value to use for the holes.
            Can be a single integer, a tuple or list of numbers for multichannel images,
            or the string "random" to fill with random noise. Defaults to 0.
        random_state (np.random.RandomState | None, optional): The random state to use for generating
            random fill values. If None, a new random state will be used. Defaults to None.

    Returns:
        np.ndarray: The augmented image with cutout holes applied.

    Raises:
        ValueError: If the fill_value is not of the expected type.

    Note:
        - The function creates a copy of the input image before applying the cutout.
        - For multichannel images, the fill_value should match the number of channels.
        - When using "random" fill, the random values are generated to match the image's dtype and shape.

    Example:
        >>> import numpy as np
        >>> img = np.ones((100, 100, 3), dtype=np.uint8) * 255
        >>> holes = np.array([[20, 20, 40, 40], [60, 60, 80, 80]])
        >>> result = cutout(img, holes, fill_value=0)
        >>> print(result.shape)
        (100, 100, 3)
    """
    img = img.copy()

    if isinstance(fill_value, (int, float, tuple, list)):
        fill_value = np.array(fill_value, dtype=img.dtype)

    for x_min, y_min, x_max, y_max in holes:
        if isinstance(fill_value, str) and fill_value == "random":
            shape = (
                (y_max - y_min, x_max - x_min)
                if img.ndim == MONO_CHANNEL_DIMENSIONS
                else (y_max - y_min, x_max - x_min, img.shape[2])
            )
            random_fill = generate_random_fill(img.dtype, shape, random_state)
            img[y_min:y_max, x_min:x_max] = random_fill
        else:
            img[y_min:y_max, x_min:x_max] = fill_value

    return img
def filter_keypoints_in_holes (keypoints, holes) [view source on GitHub]

Filter out keypoints that are inside any of the holes.

Parameters:

Name Type Description
keypoints np.ndarray

Array of keypoints with shape (num_keypoints, 2+). The first two columns are x and y coordinates.

holes np.ndarray

Array of holes with shape (num_holes, 4). Each hole is represented as [x1, y1, x2, y2].

Returns:

Type Description
np.ndarray

Array of keypoints that are not inside any hole.

Source code in albumentations/augmentations/dropout/functional.py
Python
@handle_empty_array
def filter_keypoints_in_holes(keypoints: np.ndarray, holes: np.ndarray) -> np.ndarray:
    """Filter out keypoints that are inside any of the holes.

    Args:
        keypoints (np.ndarray): Array of keypoints with shape (num_keypoints, 2+).
                                The first two columns are x and y coordinates.
        holes (np.ndarray): Array of holes with shape (num_holes, 4).
                            Each hole is represented as [x1, y1, x2, y2].

    Returns:
        np.ndarray: Array of keypoints that are not inside any hole.
    """
    # Broadcast keypoints and holes for vectorized comparison
    kp_x = keypoints[:, 0][:, np.newaxis]  # Shape: (num_keypoints, 1)
    kp_y = keypoints[:, 1][:, np.newaxis]  # Shape: (num_keypoints, 1)

    hole_x1 = holes[:, 0]  # Shape: (num_holes,)
    hole_y1 = holes[:, 1]  # Shape: (num_holes,)
    hole_x2 = holes[:, 2]  # Shape: (num_holes,)
    hole_y2 = holes[:, 3]  # Shape: (num_holes,)

    # Check if each keypoint is inside each hole
    inside_hole = (kp_x >= hole_x1) & (kp_x < hole_x2) & (kp_y >= hole_y1) & (kp_y < hole_y2)

    # A keypoint is valid if it's not inside any hole
    valid_keypoints = ~np.any(inside_hole, axis=1)

    return keypoints[valid_keypoints]
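A minimal sketch of the filtering behaviour (same module as the source shown above):

Python
import numpy as np
from albumentations.augmentations.dropout.functional import filter_keypoints_in_holes

keypoints = np.array([[5.0, 5.0], [25.0, 25.0], [50.0, 50.0]])
holes = np.array([[20, 20, 40, 40]])  # [x1, y1, x2, y2]

# The keypoint at (25, 25) falls inside the hole and is removed.
kept = filter_keypoints_in_holes(keypoints, holes)
assert len(kept) == 2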
def generate_random_fill (dtype, shape, random_state) [view source on GitHub]

Generate a random fill array based on the given dtype and target shape.

This function creates a numpy array filled with random values. The range and type of these values depend on the input dtype. For integer dtypes, it generates random integers. For floating-point dtypes, it generates random floats.

Parameters:

Name Type Description
dtype np.dtype

The data type of the array to be generated.

shape tuple[int, ...]

The shape of the array to be generated.

random_state np.random.RandomState | None

The random state to use for generating values. If None, the default numpy random state is used.

Returns:

Type Description
np.ndarray

A numpy array of the specified shape and dtype, filled with random values.

Exceptions:

Type Description
ValueError

If the input dtype is neither integer nor floating-point.

Examples:

Python
>>> import numpy as np
>>> random_state = np.random.RandomState(42)
>>> result = generate_random_fill(np.dtype('uint8'), (2, 2), random_state)
>>> print(result)
[[172 251]
 [ 80 141]]
Source code in albumentations/augmentations/dropout/functional.py
Python
def generate_random_fill(
    dtype: np.dtype,
    shape: tuple[int, ...],
    random_state: np.random.RandomState | None,
) -> np.ndarray:
    """Generate a random fill array based on the given dtype and target shape.

    This function creates a numpy array filled with random values. The range and type of these values
    depend on the input dtype. For integer dtypes, it generates random integers. For floating-point
    dtypes, it generates random floats.

    Args:
        dtype (np.dtype): The data type of the array to be generated.
        shape (tuple[int, ...]): The shape of the array to be generated.
        random_state (np.random.RandomState | None): The random state to use for generating values.
            If None, the default numpy random state is used.

    Returns:
        np.ndarray: A numpy array of the specified shape and dtype, filled with random values.

    Raises:
        ValueError: If the input dtype is neither integer nor floating-point.

    Examples:
        >>> import numpy as np
        >>> random_state = np.random.RandomState(42)
        >>> result = generate_random_fill(np.dtype('uint8'), (2, 2), random_state)
        >>> print(result)
        [[172 251]
         [ 80 141]]
    """
    max_value = MAX_VALUES_BY_DTYPE[dtype]
    if np.issubdtype(dtype, np.integer):
        return random_utils.randint(0, max_value + 1, size=shape, dtype=dtype, random_state=random_state)
    if np.issubdtype(dtype, np.floating):
        return random_utils.uniform(0, max_value, size=shape, random_state=random_state).astype(dtype)
    raise ValueError(f"Unsupported dtype: {dtype}")

grid_dropout

class GridDropout (ratio=0.5, unit_size_min=None, unit_size_max=None, holes_number_x=None, holes_number_y=None, shift_x=None, shift_y=None, random_offset=False, fill_value=0, mask_fill_value=None, unit_size_range=None, holes_number_xy=None, shift_xy=(0, 0), always_apply=None, p=0.5) [view source on GitHub]

GridDropout drops out rectangular regions of an image and the corresponding mask in a grid fashion.

Parameters:

Name Type Description
ratio float

The ratio of the mask holes to the unit_size (same for horizontal and vertical directions). Must be between 0 and 1. Default: 0.5.

random_offset bool

Whether to offset the grid randomly between 0 and grid unit size - hole size. If True, entered shift_x and shift_y are ignored and set randomly. Default: False.

fill_value Optional[ColorType]

Value for the dropped pixels. Default: 0.

mask_fill_value Optional[ColorType]

Value for the dropped pixels in mask. If None, transformation is not applied to the mask. Default: None.

unit_size_range Optional[tuple[int, int]]

Range from which to sample the grid unit size. Must be between 2 and the shorter image edge. Default: None.

holes_number_xy Optional[tuple[int, int]]

The number of grid units in the x and y directions. The first value should be between 1 and image width // 2; the second value should be between 1 and image height // 2. Default: None.

shift_xy tuple[int, int]

Offsets of the grid start in the x and y directions from the (0, 0) coordinate. Default: (0, 0).

p float

Probability of applying the transform. Default: 0.5.

Targets

image, mask

Image types: uint8, float32
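A minimal usage sketch with the grid defined by unit_size_range and a random per-call offset:

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
mask = np.ones((128, 128), dtype=np.uint8)

# Each grid unit is 16-32 pixels; half of each unit (ratio=0.5) is dropped.
# mask_fill_value=0 applies the same grid of holes to the mask.
transform = A.GridDropout(
    ratio=0.5,
    unit_size_range=(16, 32),
    random_offset=True,
    fill_value=0,
    mask_fill_value=0,
    p=1.0,
)
result = transform(image=image, mask=mask)
assert result["image"].shape == image.shape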


Source code in albumentations/augmentations/dropout/grid_dropout.py
Python
class GridDropout(DualTransform):
    """GridDropout, drops out rectangular regions of an image and the corresponding mask in a grid fashion.

    Args:
        ratio (float): The ratio of the mask holes to the unit_size (same for horizontal and vertical directions).
            Must be between 0 and 1. Default: 0.5.
        random_offset (bool): Whether to offset the grid randomly between 0 and grid unit size - hole size.
            If True, entered shift_x and shift_y are ignored and set randomly. Default: False.
        fill_value (Optional[ColorType]): Value for the dropped pixels. Default: 0.
        mask_fill_value (Optional[ColorType]): Value for the dropped pixels in mask.
            If None, transformation is not applied to the mask. Default: None.
        unit_size_range (Optional[tuple[int, int]]): Range from which to sample grid size. Default: None.
             Must be between 2 and the image shorter edge.
        holes_number_xy (Optional[tuple[int, int]]): The number of grid units in x and y directions.
            First value should be between 1 and image width//2,
            Second value should be between 1 and image height//2.
            Default: None.
        shift_xy (tuple[int, int]): Offsets of the grid start in x and y directions.
            Offsets of the grid start in x and y directions from (0,0) coordinate.
            Default: (0, 0).

        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image, mask

    Image types:
        uint8, float32

    Reference:
        https://arxiv.org/abs/2001.04086

    """

    _targets = (Targets.IMAGE, Targets.MASK)

    class InitSchema(BaseTransformInitSchema):
        ratio: float = Field(description="The ratio of the mask holes to the unit_size.", gt=0, le=1)

        unit_size_min: int | None = Field(None, description="Minimum size of the grid unit.", ge=2)
        unit_size_max: int | None = Field(None, description="Maximum size of the grid unit.", ge=2)

        holes_number_x: int | None = Field(None, description="The number of grid units in x direction.", ge=1)
        holes_number_y: int | None = Field(None, description="The number of grid units in y direction.", ge=1)

        shift_x: int | None = Field(0, description="Offsets of the grid start in x direction.", ge=0)
        shift_y: int | None = Field(0, description="Offsets of the grid start in y direction.", ge=0)

        random_offset: bool = Field(False, description="Whether to offset the grid randomly.")
        fill_value: ColorType | None = Field(0, description="Value for the dropped pixels.")
        mask_fill_value: ColorType | None = Field(None, description="Value for the dropped pixels in mask.")
        unit_size_range: (
            Annotated[tuple[int, int], AfterValidator(check_1plus), AfterValidator(nondecreasing)] | None
        ) = None
        shift_xy: Annotated[tuple[int, int], AfterValidator(check_0plus)] = Field(
            (0, 0),
            description="Offsets of the grid start in x and y directions.",
        )
        holes_number_xy: Annotated[tuple[int, int], AfterValidator(check_1plus)] | None = Field(
            None,
            description="The number of grid units in x and y directions.",
        )

        @model_validator(mode="after")
        def validate_normalization(self) -> Self:
            if self.unit_size_min is not None and self.unit_size_max is not None:
                self.unit_size_range = self.unit_size_min, self.unit_size_max
                warn(
                    "unit_size_min and unit_size_max are deprecated. Use unit_size_range instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )

            if self.shift_x is not None and self.shift_y is not None:
                self.shift_xy = self.shift_x, self.shift_y
                warn("shift_x and shift_y are deprecated. Use shift_xy instead.", DeprecationWarning, stacklevel=2)

            if self.holes_number_x is not None and self.holes_number_y is not None:
                self.holes_number_xy = self.holes_number_x, self.holes_number_y
                warn(
                    "holes_number_x and holes_number_y are deprecated. Use holes_number_xy instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )

            if self.unit_size_range and not MIN_UNIT_SIZE <= self.unit_size_range[0] <= self.unit_size_range[1]:
                raise ValueError("Max unit size should be >= min size, both at least 2 pixels.")

            return self

    def __init__(
        self,
        ratio: float = 0.5,
        unit_size_min: int | None = None,
        unit_size_max: int | None = None,
        holes_number_x: int | None = None,
        holes_number_y: int | None = None,
        shift_x: int | None = None,
        shift_y: int | None = None,
        random_offset: bool = False,
        fill_value: ColorType = 0,
        mask_fill_value: ColorType | None = None,
        unit_size_range: tuple[int, int] | None = None,
        holes_number_xy: tuple[int, int] | None = None,
        shift_xy: tuple[int, int] = (0, 0),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.ratio = ratio
        self.unit_size_range = unit_size_range
        self.holes_number_xy = holes_number_xy
        self.random_offset = random_offset
        self.fill_value = fill_value
        self.mask_fill_value = mask_fill_value
        self.shift_xy = shift_xy

    def apply(self, img: np.ndarray, holes: Iterable[tuple[int, int, int, int]], **params: Any) -> np.ndarray:
        return fdropout.cutout(img, holes, self.fill_value)

    def apply_to_mask(
        self,
        mask: np.ndarray,
        holes: Iterable[tuple[int, int, int, int]],
        **params: Any,
    ) -> np.ndarray:
        if self.mask_fill_value is None:
            return mask

        return fdropout.cutout(mask, holes, self.mask_fill_value)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        image_shape = params["shape"]
        unit_shape = self._calculate_unit_dimensions(image_shape)
        hole_dimensions = self._calculate_hole_dimensions(unit_shape)
        shift_x, shift_y = self._calculate_shifts(unit_shape, hole_dimensions)
        holes = self._generate_holes(image_shape, unit_shape, hole_dimensions, shift_x, shift_y)
        return {"holes": holes}

    def _calculate_unit_dimensions(self, shape: tuple[int, int]) -> tuple[int, int]:
        """Calculates the dimensions of the grid units."""
        if self.unit_size_range is not None:
            self._validate_unit_sizes(shape)
            unit_size = random.randint(*self.unit_size_range)
            return unit_size, unit_size

        return self._calculate_dimensions_based_on_holes(shape)

    def _validate_unit_sizes(self, shape: tuple[int, int]) -> None:
        """Validates the minimum and maximum unit sizes."""
        if self.unit_size_range is None:
            raise ValueError("unit_size_range must not be None.")
        if self.unit_size_range[1] > min(shape[:2]):
            msg = "Grid size limits must be within the shortest image edge."
            raise ValueError(msg)

    def _calculate_dimensions_based_on_holes(self, shape: tuple[int, int]) -> tuple[int, int]:
        """Calculates dimensions based on the number of holes specified."""
        height, width = shape[:2]
        holes_number_x, holes_number_y = self.holes_number_xy or (None, None)
        unit_width = self._calculate_dimension(width, holes_number_x, 10)
        unit_height = self._calculate_dimension(height, holes_number_y, unit_width)
        return unit_height, unit_width

    @staticmethod
    def _calculate_dimension(dimension: int, holes_number: int | None, fallback: int) -> int:
        """Helper function to calculate unit width or height."""
        if holes_number is None:
            return max(2, dimension // fallback)

        if not 1 <= holes_number <= dimension // 2:
            raise ValueError(f"The number of holes must be between 1 and {dimension // 2}.")
        return dimension // holes_number

    def _calculate_hole_dimensions(self, unit_shape: tuple[int, int]) -> tuple[int, int]:
        """Calculates the dimensions of the holes to be dropped out."""
        unit_height, unit_width = unit_shape
        hole_width = min(max(1, int(unit_width * self.ratio)), unit_width - 1)
        hole_height = min(max(1, int(unit_height * self.ratio)), unit_height - 1)
        return hole_height, hole_width

    def _calculate_shifts(
        self,
        unit_shape: tuple[int, int],
        hole_dimensions: tuple[int, int],
    ) -> tuple[int, int]:
        """Calculates the shifts for the grid start."""
        unit_width, unit_height = unit_shape
        hole_width, hole_height = hole_dimensions
        if self.random_offset:
            shift_x = random.randint(0, unit_width - hole_width)
            shift_y = random.randint(0, unit_height - hole_height)
            return shift_x, shift_y

        if isinstance(self.shift_xy, Sequence) and len(self.shift_xy) == PAIR:
            shift_x = min(max(0, self.shift_xy[0]), unit_width - hole_width)
            shift_y = min(max(0, self.shift_xy[1]), unit_height - hole_height)
            return shift_x, shift_y

        return 0, 0

    @staticmethod
    def _generate_holes(
        image_shape: tuple[int, int],
        unit_shape: tuple[int, int],
        hole_dimensions: tuple[int, int],
        shift_x: int,
        shift_y: int,
    ) -> np.ndarray:
        """Generates the list of holes to be dropped out."""
        height, width = image_shape[:2]
        unit_height, unit_width = unit_shape
        hole_width, hole_height = hole_dimensions
        holes = []
        for i in range(width // unit_width + 1):
            for j in range(height // unit_height + 1):
                x1 = min(shift_x + unit_width * i, width)
                y1 = min(shift_y + unit_height * j, height)
                x2 = min(x1 + hole_width, width)
                y2 = min(y1 + hole_height, height)
                holes.append((x1, y1, x2, y2))
        return np.array(holes)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "ratio",
            "unit_size_range",
            "holes_number_xy",
            "shift_xy",
            "random_offset",
            "fill_value",
            "mask_fill_value",
        )
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/dropout/grid_dropout.py
Python
class InitSchema(BaseTransformInitSchema):
    ratio: float = Field(description="The ratio of the mask holes to the unit_size.", gt=0, le=1)

    unit_size_min: int | None = Field(None, description="Minimum size of the grid unit.", ge=2)
    unit_size_max: int | None = Field(None, description="Maximum size of the grid unit.", ge=2)

    holes_number_x: int | None = Field(None, description="The number of grid units in x direction.", ge=1)
    holes_number_y: int | None = Field(None, description="The number of grid units in y direction.", ge=1)

    shift_x: int | None = Field(0, description="Offsets of the grid start in x direction.", ge=0)
    shift_y: int | None = Field(0, description="Offsets of the grid start in y direction.", ge=0)

    random_offset: bool = Field(False, description="Whether to offset the grid randomly.")
    fill_value: ColorType | None = Field(0, description="Value for the dropped pixels.")
    mask_fill_value: ColorType | None = Field(None, description="Value for the dropped pixels in mask.")
    unit_size_range: (
        Annotated[tuple[int, int], AfterValidator(check_1plus), AfterValidator(nondecreasing)] | None
    ) = None
    shift_xy: Annotated[tuple[int, int], AfterValidator(check_0plus)] = Field(
        (0, 0),
        description="Offsets of the grid start in x and y directions.",
    )
    holes_number_xy: Annotated[tuple[int, int], AfterValidator(check_1plus)] | None = Field(
        None,
        description="The number of grid units in x and y directions.",
    )

    @model_validator(mode="after")
    def validate_normalization(self) -> Self:
        if self.unit_size_min is not None and self.unit_size_max is not None:
            self.unit_size_range = self.unit_size_min, self.unit_size_max
            warn(
                "unit_size_min and unit_size_max are deprecated. Use unit_size_range instead.",
                DeprecationWarning,
                stacklevel=2,
            )

        if self.shift_x is not None and self.shift_y is not None:
            self.shift_xy = self.shift_x, self.shift_y
            warn("shift_x and shift_y are deprecated. Use shift_xy instead.", DeprecationWarning, stacklevel=2)

        if self.holes_number_x is not None and self.holes_number_y is not None:
            self.holes_number_xy = self.holes_number_x, self.holes_number_y
            warn(
                "holes_number_x and holes_number_y are deprecated. Use holes_number_xy instead.",
                DeprecationWarning,
                stacklevel=2,
            )

        if self.unit_size_range and not MIN_UNIT_SIZE <= self.unit_size_range[0] <= self.unit_size_range[1]:
            raise ValueError("Max unit size should be >= min size, both at least 2 pixels.")

        return self

apply (self, img, holes, **params)

Apply transform on image.

Source code in albumentations/augmentations/dropout/grid_dropout.py
Python
def apply(self, img: np.ndarray, holes: Iterable[tuple[int, int, int, int]], **params: Any) -> np.ndarray:
    return fdropout.cutout(img, holes, self.fill_value)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/dropout/grid_dropout.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    image_shape = params["shape"]
    unit_shape = self._calculate_unit_dimensions(image_shape)
    hole_dimensions = self._calculate_hole_dimensions(unit_shape)
    shift_x, shift_y = self._calculate_shifts(unit_shape, hole_dimensions)
    holes = self._generate_holes(image_shape, unit_shape, hole_dimensions, shift_x, shift_y)
    return {"holes": holes}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/dropout/grid_dropout.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "ratio",
        "unit_size_range",
        "holes_number_xy",
        "shift_xy",
        "random_offset",
        "fill_value",
        "mask_fill_value",
    )
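
For orientation, below is a minimal usage sketch of GridDropout with the parameters documented above. The values are illustrative only, and the sketch assumes a library version that accepts unit_size_range (older releases used unit_size_min / unit_size_max instead).

Python
import numpy as np
import albumentations as A

# Illustrative configuration: drop rectangular holes on a regular grid and
# apply the same dropout to the mask via mask_fill_value.
transform = A.Compose(
    [
        A.GridDropout(
            ratio=0.3,                 # each hole covers ~30% of its grid unit
            unit_size_range=(32, 64),  # grid unit size sampled from this range
            random_offset=True,        # jitter the grid start position
            fill_value=0,              # dropped image pixels become 0
            mask_fill_value=0,         # drop the corresponding mask pixels too
            p=1.0,
        ),
    ],
)

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (256, 256), dtype=np.uint8)
out = transform(image=image, mask=mask)
dropped_image, dropped_mask = out["image"], out["mask"]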

mask_dropout

class MaskDropout (max_objects=(1, 1), image_fill_value=0, mask_fill_value=0, always_apply=None, p=0.5) [view source on GitHub]

Image & mask augmentation that zeroes out mask and image regions corresponding to randomly chosen object instances from the mask.

The mask must be a single-channel image; zero values are treated as background. The image can have any number of channels.

Parameters:

Name Type Description
max_objects ScaleIntType

Maximum number of labels that can be zeroed out. Can be a tuple; in that case it is treated as [min, max].

image_fill_value float | Literal['inpaint']

Fill value to use when filling image. Can be 'inpaint' to apply inpainting (works only for 3-channel images)

mask_fill_value ScalarType

Fill value to use when filling mask.

Targets

image, mask

Image types: uint8, float32
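
A minimal usage sketch, assuming MaskDropout is used inside a standard Compose pipeline with an instance-style mask; all values below are illustrative.

Python
import numpy as np
import albumentations as A

transform = A.Compose(
    [
        A.MaskDropout(
            max_objects=(1, 2),    # zero out between 1 and 2 object instances
            image_fill_value=0,    # or "inpaint" for 3-channel images
            mask_fill_value=0,
            p=1.0,
        ),
    ],
)

image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)

# Single-channel mask: zero is background, non-zero regions are objects.
mask = np.zeros((128, 128), dtype=np.uint8)
mask[10:40, 10:40] = 1
mask[80:110, 60:100] = 1

out = transform(image=image, mask=mask)
dropped_image, dropped_mask = out["image"], out["mask"]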


Source code in albumentations/augmentations/dropout/mask_dropout.py
Python
class MaskDropout(DualTransform):
    """Image & mask augmentation that zero out mask and image regions corresponding
    to randomly chosen object instance from mask.

    Mask must be single-channel image, zero values treated as background.
    Image can be any number of channels.

    Args:
        max_objects: Maximum number of labels that can be zeroed out. Can be tuple, in this case it's [min, max]
        image_fill_value: Fill value to use when filling image.
            Can be 'inpaint' to apply inpainting (works only  for 3-channel images)
        mask_fill_value: Fill value to use when filling mask.

    Targets:
        image, mask

    Image types:
        uint8, float32

    Reference:
        https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/114254

    """

    _targets = (Targets.IMAGE, Targets.MASK)

    class InitSchema(BaseTransformInitSchema):
        max_objects: OnePlusIntRangeType

        image_fill_value: float | Literal["inpaint"] = Field(
            default=0,
            description=(
                "Fill value to use when filling image. "
                "Can be 'inpaint' to apply inpainting (works only for 3-channel images)."
            ),
        )
        mask_fill_value: float = Field(default=0, description="Fill value to use when filling mask.")

    def __init__(
        self,
        max_objects: ScaleIntType = (1, 1),
        image_fill_value: float | Literal["inpaint"] = 0,
        mask_fill_value: ScalarType = 0,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.max_objects = cast(Tuple[int, int], max_objects)
        self.image_fill_value = image_fill_value
        self.mask_fill_value = mask_fill_value

    @property
    def targets_as_params(self) -> list[str]:
        return ["mask"]

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        mask = data["mask"]

        label_image, num_labels = label(mask, return_num=True)

        if num_labels == 0:
            dropout_mask = None
        else:
            objects_to_drop = random.randint(self.max_objects[0], self.max_objects[1])
            objects_to_drop = min(num_labels, objects_to_drop)

            if objects_to_drop == num_labels:
                dropout_mask = mask > 0
            else:
                labels_index = random.sample(range(1, num_labels + 1), objects_to_drop)
                dropout_mask = np.zeros((mask.shape[0], mask.shape[1]), dtype=bool)
                for label_index in labels_index:
                    dropout_mask |= label_image == label_index

        params.update({"dropout_mask": dropout_mask})
        return params

    def apply(self, img: np.ndarray, dropout_mask: np.ndarray, **params: Any) -> np.ndarray:
        if dropout_mask is None:
            return img

        if self.image_fill_value == "inpaint":
            dropout_mask = dropout_mask.astype(np.uint8)
            _, _, width, height = cv2.boundingRect(dropout_mask)
            radius = min(3, max(width, height) // 2)
            return cv2.inpaint(img, dropout_mask, radius, cv2.INPAINT_NS)

        img = img.copy()
        img[dropout_mask] = self.image_fill_value

        return img

    def apply_to_mask(self, mask: np.ndarray, dropout_mask: np.ndarray, **params: Any) -> np.ndarray:
        if dropout_mask is None:
            return mask

        mask = mask.copy()
        mask[dropout_mask] = self.mask_fill_value
        return mask

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "max_objects", "image_fill_value", "mask_fill_value"

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
        }
targets_as_params: list[str] property readonly

Targets used to get parameters dependent on targets. This is used to check that the input has all required targets.

class InitSchema


Source code in albumentations/augmentations/dropout/mask_dropout.py
Python
class InitSchema(BaseTransformInitSchema):
    max_objects: OnePlusIntRangeType

    image_fill_value: float | Literal["inpaint"] = Field(
        default=0,
        description=(
            "Fill value to use when filling image. "
            "Can be 'inpaint' to apply inpainting (works only for 3-channel images)."
        ),
    )
    mask_fill_value: float = Field(default=0, description="Fill value to use when filling mask.")

apply (self, img, dropout_mask, **params)

Apply transform on image.

Source code in albumentations/augmentations/dropout/mask_dropout.py
Python
def apply(self, img: np.ndarray, dropout_mask: np.ndarray, **params: Any) -> np.ndarray:
    if dropout_mask is None:
        return img

    if self.image_fill_value == "inpaint":
        dropout_mask = dropout_mask.astype(np.uint8)
        _, _, width, height = cv2.boundingRect(dropout_mask)
        radius = min(3, max(width, height) // 2)
        return cv2.inpaint(img, dropout_mask, radius, cv2.INPAINT_NS)

    img = img.copy()
    img[dropout_mask] = self.image_fill_value

    return img
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/dropout/mask_dropout.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    mask = data["mask"]

    label_image, num_labels = label(mask, return_num=True)

    if num_labels == 0:
        dropout_mask = None
    else:
        objects_to_drop = random.randint(self.max_objects[0], self.max_objects[1])
        objects_to_drop = min(num_labels, objects_to_drop)

        if objects_to_drop == num_labels:
            dropout_mask = mask > 0
        else:
            labels_index = random.sample(range(1, num_labels + 1), objects_to_drop)
            dropout_mask = np.zeros((mask.shape[0], mask.shape[1]), dtype=bool)
            for label_index in labels_index:
                dropout_mask |= label_image == label_index

    params.update({"dropout_mask": dropout_mask})
    return params
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/dropout/mask_dropout.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "max_objects", "image_fill_value", "mask_fill_value"

xy_masking

class XYMasking (num_masks_x=0, num_masks_y=0, mask_x_length=0, mask_y_length=0, fill_value=0, mask_fill_value=0, always_apply=None, p=0.5) [view source on GitHub]

Applies masking strips to an image, either horizontally (X axis) or vertically (Y axis), simulating occlusions. This transform is useful for training models to recognize images with varied visibility conditions. It's particularly effective for spectrogram images, allowing spectral and frequency masking to improve model robustness.

At least one of mask_x_length or mask_y_length must be specified, dictating the mask's maximum size along each axis.

Parameters:

Name Type Description
num_masks_x Union[int, tuple[int, int]]

Number or range of horizontal regions to mask. Defaults to 0.

num_masks_y Union[int, tuple[int, int]]

Number or range of vertical regions to mask. Defaults to 0.

mask_x_length Union[int, tuple[int, int]]

Specifies the length of the masks along the X (horizontal) axis. If an integer is provided, it sets a fixed mask length. If a tuple of two integers (min, max) is provided, the mask length is randomly chosen within this range for each mask. This allows for variable-length masks in the horizontal direction.

mask_y_length Union[int, tuple[int, int]]

Specifies the height of the masks along the Y (vertical) axis. Similar to mask_x_length, an integer sets a fixed mask height, while a tuple (min, max) allows for variable-height masks, chosen randomly within the specified range for each mask. This flexibility facilitates creating masks of various sizes in the vertical direction.

fill_value Union[int, float, list[int], list[float]]

Value to fill image masks. Defaults to 0.

mask_fill_value Optional[Union[int, float, list[int], list[float]]]

Value to fill the holes in the mask target. If None, the mask is not affected. Default: None.

p float

Probability of applying the transform. Defaults to 0.5.

Targets

image, mask, keypoints

Image types: uint8, float32

Note: Either mask_x_length or mask_y_length, or both, must be defined.
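
A minimal usage sketch for spectrogram-style masking; the shapes and ranges below are illustrative only.

Python
import numpy as np
import albumentations as A

transform = A.Compose(
    [
        A.XYMasking(
            num_masks_x=(1, 3),      # number of masked strips along the X axis
            num_masks_y=(1, 2),      # number of masked strips along the Y axis
            mask_x_length=(10, 20),  # strip length along X, sampled per mask
            mask_y_length=(10, 20),  # strip length along Y, sampled per mask
            fill_value=0,
            mask_fill_value=0,
            p=1.0,
        ),
    ],
)

# A float32 "spectrogram" in [0, 1]; any HxWxC image works the same way.
spectrogram = np.random.rand(128, 256, 1).astype(np.float32)
masked = transform(image=spectrogram)["image"]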


Source code in albumentations/augmentations/dropout/xy_masking.py
Python
class XYMasking(DualTransform):
    """Applies masking strips to an image, either horizontally (X axis) or vertically (Y axis),
    simulating occlusions. This transform is useful for training models to recognize images
    with varied visibility conditions. It's particularly effective for spectrogram images,
    allowing spectral and frequency masking to improve model robustness.

    At least one of `max_x_length` or `max_y_length` must be specified, dictating the mask's
    maximum size along each axis.

    Args:
        num_masks_x (Union[int, tuple[int, int]]): Number or range of horizontal regions to mask. Defaults to 0.
        num_masks_y (Union[int, tuple[int, int]]): Number or range of vertical regions to mask. Defaults to 0.
        mask_x_length ([Union[int, tuple[int, int]]): Specifies the length of the masks along
            the X (horizontal) axis. If an integer is provided, it sets a fixed mask length.
            If a tuple of two integers (min, max) is provided,
            the mask length is randomly chosen within this range for each mask.
            This allows for variable-length masks in the horizontal direction.
        mask_y_length (Union[int, tuple[int, int]]): Specifies the height of the masks along
            the Y (vertical) axis. Similar to `mask_x_length`, an integer sets a fixed mask height,
            while a tuple (min, max) allows for variable-height masks, chosen randomly
            within the specified range for each mask. This flexibility facilitates creating masks of various
            sizes in the vertical direction.
        fill_value (Union[int, float, list[int], list[float]]): Value to fill image masks. Defaults to 0.
        mask_fill_value (Optional[Union[int, float, list[int], list[float]]]): Value to fill masks in the mask.
            If `None`, uses mask is not affected. Default: `None`.
        p (float): Probability of applying the transform. Defaults to 0.5.

    Targets:
        image, mask, keypoints

    Image types:
        uint8, float32

    Note: Either `max_x_length` or `max_y_length` or both must be defined.

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        num_masks_x: NonNegativeIntRangeType
        num_masks_y: NonNegativeIntRangeType
        mask_x_length: NonNegativeIntRangeType
        mask_y_length: NonNegativeIntRangeType

        fill_value: ColorType
        mask_fill_value: ColorType

        @model_validator(mode="after")
        def check_mask_length(self) -> Self:
            if (
                isinstance(self.mask_x_length, int)
                and self.mask_x_length <= 0
                and isinstance(self.mask_y_length, int)
                and self.mask_y_length <= 0
            ):
                msg = "At least one of `mask_x_length` or `mask_y_length` Should be a positive number."
                raise ValueError(msg)
            return self

    def __init__(
        self,
        num_masks_x: ScaleIntType = 0,
        num_masks_y: ScaleIntType = 0,
        mask_x_length: ScaleIntType = 0,
        mask_y_length: ScaleIntType = 0,
        fill_value: ColorType = 0,
        mask_fill_value: ColorType = 0,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.num_masks_x = cast(Tuple[int, int], num_masks_x)
        self.num_masks_y = cast(Tuple[int, int], num_masks_y)

        self.mask_x_length = cast(Tuple[int, int], mask_x_length)
        self.mask_y_length = cast(Tuple[int, int], mask_y_length)
        self.fill_value = fill_value
        self.mask_fill_value = mask_fill_value

    def apply(
        self,
        img: np.ndarray,
        masks_x: list[tuple[int, int, int, int]],
        masks_y: list[tuple[int, int, int, int]],
        **params: Any,
    ) -> np.ndarray:
        return cutout(img, masks_x + masks_y, self.fill_value)

    def apply_to_mask(
        self,
        mask: np.ndarray,
        masks_x: list[tuple[int, int, int, int]],
        masks_y: list[tuple[int, int, int, int]],
        **params: Any,
    ) -> np.ndarray:
        if self.mask_fill_value is None:
            return mask
        return cutout(mask, masks_x + masks_y, self.mask_fill_value)

    def validate_mask_length(
        self,
        mask_length: tuple[int, int] | None,
        dimension_size: int,
        dimension_name: str,
    ) -> None:
        """Validate the mask length against the corresponding image dimension size.

        Args:
            mask_length (Optional[tuple[int, int]]): The length of the mask to be validated.
            dimension_size (int): The size of the image dimension (width or height)
                against which to validate the mask length.
            dimension_name (str): The name of the dimension ('width' or 'height') for error messaging.

        """
        if mask_length is not None:
            if isinstance(mask_length, (tuple, list)):
                if mask_length[0] < 0 or mask_length[1] > dimension_size:
                    raise ValueError(
                        f"{dimension_name} range {mask_length} is out of valid range [0, {dimension_size}]",
                    )
            elif mask_length < 0 or mask_length > dimension_size:
                raise ValueError(f"{dimension_name} {mask_length} exceeds image {dimension_name} {dimension_size}")

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, list[tuple[int, int, int, int]]]:
        height, width = params["shape"][:2]

        # Use the helper method to validate mask lengths against image dimensions
        self.validate_mask_length(self.mask_x_length, width, "mask_x_length")
        self.validate_mask_length(self.mask_y_length, height, "mask_y_length")

        masks_x = self.generate_masks(self.num_masks_x, width, height, self.mask_x_length, axis="x")
        masks_y = self.generate_masks(self.num_masks_y, width, height, self.mask_y_length, axis="y")

        return {"masks_x": masks_x, "masks_y": masks_y}

    @staticmethod
    def generate_mask_size(mask_length: tuple[int, int]) -> int:
        return random.randint(mask_length[0], mask_length[1])

    def generate_masks(
        self,
        num_masks: tuple[int, int],
        width: int,
        height: int,
        max_length: tuple[int, int] | None,
        axis: str,
    ) -> list[tuple[int, int, int, int]]:
        if max_length is None or max_length == 0 or isinstance(num_masks, (int, float)) and num_masks == 0:
            return []

        masks = []

        num_masks_integer = num_masks if isinstance(num_masks, int) else random.randint(num_masks[0], num_masks[1])

        for _ in range(num_masks_integer):
            length = self.generate_mask_size(max_length)

            if axis == "x":
                x1 = random.randint(0, width - length)
                y1 = 0
                x2, y2 = x1 + length, height
            else:  # axis == 'y'
                y1 = random.randint(0, height - length)
                x1 = 0
                x2, y2 = width, y1 + length

            masks.append((x1, y1, x2, y2))
        return masks

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        masks_x: list[tuple[int, int, int, int]],
        masks_y: list[tuple[int, int, int, int]],
        **params: Any,
    ) -> np.ndarray:
        return filter_keypoints_in_holes(keypoints, np.array(masks_x + masks_y))

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "num_masks_x",
            "num_masks_y",
            "mask_x_length",
            "mask_y_length",
            "fill_value",
            "mask_fill_value",
        )

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "keypoints": self.apply_to_keypoints,
        }
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/dropout/xy_masking.py
Python
class InitSchema(BaseTransformInitSchema):
    num_masks_x: NonNegativeIntRangeType
    num_masks_y: NonNegativeIntRangeType
    mask_x_length: NonNegativeIntRangeType
    mask_y_length: NonNegativeIntRangeType

    fill_value: ColorType
    mask_fill_value: ColorType

    @model_validator(mode="after")
    def check_mask_length(self) -> Self:
        if (
            isinstance(self.mask_x_length, int)
            and self.mask_x_length <= 0
            and isinstance(self.mask_y_length, int)
            and self.mask_y_length <= 0
        ):
            msg = "At least one of `mask_x_length` or `mask_y_length` Should be a positive number."
            raise ValueError(msg)
        return self

apply (self, img, masks_x, masks_y, **params)

Apply transform on image.

Source code in albumentations/augmentations/dropout/xy_masking.py
Python
def apply(
    self,
    img: np.ndarray,
    masks_x: list[tuple[int, int, int, int]],
    masks_y: list[tuple[int, int, int, int]],
    **params: Any,
) -> np.ndarray:
    return cutout(img, masks_x + masks_y, self.fill_value)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/dropout/xy_masking.py
Python
def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, list[tuple[int, int, int, int]]]:
    height, width = params["shape"][:2]

    # Use the helper method to validate mask lengths against image dimensions
    self.validate_mask_length(self.mask_x_length, width, "mask_x_length")
    self.validate_mask_length(self.mask_y_length, height, "mask_y_length")

    masks_x = self.generate_masks(self.num_masks_x, width, height, self.mask_x_length, axis="x")
    masks_y = self.generate_masks(self.num_masks_y, width, height, self.mask_y_length, axis="y")

    return {"masks_x": masks_x, "masks_y": masks_y}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/dropout/xy_masking.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "num_masks_x",
        "num_masks_y",
        "mask_x_length",
        "mask_y_length",
        "fill_value",
        "mask_fill_value",
    )
validate_mask_length (self, mask_length, dimension_size, dimension_name)

Validate the mask length against the corresponding image dimension size.

Parameters:

Name Type Description
mask_length Optional[tuple[int, int]]

The length of the mask to be validated.

dimension_size int

The size of the image dimension (width or height) against which to validate the mask length.

dimension_name str

The name of the dimension ('width' or 'height') for error messaging.

Source code in albumentations/augmentations/dropout/xy_masking.py
Python
def validate_mask_length(
    self,
    mask_length: tuple[int, int] | None,
    dimension_size: int,
    dimension_name: str,
) -> None:
    """Validate the mask length against the corresponding image dimension size.

    Args:
        mask_length (Optional[tuple[int, int]]): The length of the mask to be validated.
        dimension_size (int): The size of the image dimension (width or height)
            against which to validate the mask length.
        dimension_name (str): The name of the dimension ('width' or 'height') for error messaging.

    """
    if mask_length is not None:
        if isinstance(mask_length, (tuple, list)):
            if mask_length[0] < 0 or mask_length[1] > dimension_size:
                raise ValueError(
                    f"{dimension_name} range {mask_length} is out of valid range [0, {dimension_size}]",
                )
        elif mask_length < 0 or mask_length > dimension_size:
            raise ValueError(f"{dimension_name} {mask_length} exceeds image {dimension_name} {dimension_size}")

functional

def add_fog (img, fog_intensity, alpha_coef, fog_particle_positions, random_state=None) [view source on GitHub]

Add fog to the input image.

Parameters:

Name Type Description
img np.ndarray

Input image.

fog_intensity float

Intensity of the fog effect, between 0 and 1.

alpha_coef float

Base alpha (transparency) value for fog particles.

fog_particle_positions list[tuple[int, int]]

List of (x, y) coordinates for fog particles.

random_state np.random.RandomState | None

If specified, this random state will be used for random number generation.

Returns:

Type Description
np.ndarray

Image with added fog effect.
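
A hedged example of calling add_fog directly, importing it from albumentations.augmentations.functional (the source path shown below); the particle positions and parameter values are purely illustrative.

Python
import numpy as np
from albumentations.augmentations import functional as F

image = np.random.randint(0, 256, (200, 200, 3), dtype=np.uint8)

# Scatter some illustrative fog particle centers (x, y) across the image.
rng = np.random.RandomState(0)
positions = [
    (int(x), int(y))
    for x, y in zip(rng.randint(0, 200, 50), rng.randint(0, 200, 50))
]

foggy = F.add_fog(
    image,
    fog_intensity=0.4,
    alpha_coef=0.1,
    fog_particle_positions=positions,
)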

Source code in albumentations/augmentations/functional.py
Python
@clipped
@preserve_channel_dim
def add_fog(
    img: np.ndarray,
    fog_intensity: float,
    alpha_coef: float,
    fog_particle_positions: list[tuple[int, int]],
    random_state: np.random.RandomState | None = None,
) -> np.ndarray:
    """Add fog to the input image.

    Args:
        img (np.ndarray): Input image.
        fog_intensity (float): Intensity of the fog effect, between 0 and 1.
        alpha_coef (float): Base alpha (transparency) value for fog particles.
        fog_particle_positions (list[tuple[int, int]]): List of (x, y) coordinates for fog particles.
        random_state (np.random.RandomState | None): If specified, this will be random state used
    Returns:
        np.ndarray: Image with added fog effect.
    """
    input_dtype = img.dtype

    if input_dtype == np.float32:
        img = from_float(img, target_dtype=np.uint8)

    height, width = img.shape[:2]
    num_channels = get_num_channels(img)

    fog_layer = np.zeros((height, width, num_channels), dtype=np.uint8)

    max_fog_radius = int(
        min(height, width) * 0.1 * fog_intensity,
    )  # Maximum radius scales with image size and intensity

    for x, y in fog_particle_positions:
        radius = random_utils.randint(max_fog_radius // 2, max_fog_radius, random_state=random_state)
        color = 255 if num_channels == 1 else (255,) * num_channels
        cv2.circle(
            fog_layer,
            center=(x, y),
            radius=radius,
            color=color,
            thickness=-1,
        )

    # Apply gaussian blur to the fog layer
    fog_layer = cv2.GaussianBlur(fog_layer, (25, 25), 0)

    # Blend the fog layer with the original image
    alpha = np.mean(fog_layer, axis=2, keepdims=True) / 255 * alpha_coef * fog_intensity
    fog_image = img * (1 - alpha) + fog_layer * alpha

    fog_image = fog_image.astype(np.uint8)

    return to_float(fog_image, max_value=255) if input_dtype == np.float32 else fog_image

def add_rain (img, slant, drop_length, drop_width, drop_color, blur_value, brightness_coefficient, rain_drops) [view source on GitHub]

Adds rain drops to the image.

Parameters:

Name Type Description
img np.ndarray

Input image.

slant int

The angle of the rain drops.

drop_length int

The length of each rain drop.

drop_width int

The width of each rain drop.

drop_color tuple[int, int, int]

The color of the rain drops in RGB format.

blur_value int

The size of the kernel used to blur the image. Rainy views are blurry.

brightness_coefficient float

Coefficient to adjust the brightness of the image. Rainy days are usually shady.

rain_drops list[tuple[int, int]]

A list of tuples where each tuple represents the (x, y) coordinates of the starting point of a rain drop.

Returns:

Type Description
np.ndarray

Image with rain effect added.
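
A hedged example of calling add_rain directly; the drop coordinates and parameter values are illustrative.

Python
import numpy as np
from albumentations.augmentations import functional as F

image = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)

# Illustrative starting points (x, y) for the rain drops.
rng = np.random.RandomState(0)
drops = [
    (int(x), int(y))
    for x, y in zip(rng.randint(0, 320, 100), rng.randint(0, 240, 100))
]

rainy = F.add_rain(
    image,
    slant=5,
    drop_length=15,
    drop_width=1,
    drop_color=(200, 200, 200),
    blur_value=3,
    brightness_coefficient=0.8,
    rain_drops=drops,
)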

Source code in albumentations/augmentations/functional.py
Python
@preserve_channel_dim
def add_rain(
    img: np.ndarray,
    slant: int,
    drop_length: int,
    drop_width: int,
    drop_color: tuple[int, int, int],
    blur_value: int,
    brightness_coefficient: float,
    rain_drops: list[tuple[int, int]],
) -> np.ndarray:
    """Adds rain drops to the image.

    Args:
        img (np.ndarray): Input image.
        slant (int): The angle of the rain drops.
        drop_length (int): The length of each rain drop.
        drop_width (int): The width of each rain drop.
        drop_color (tuple[int, int, int]): The color of the rain drops in RGB format.
        blur_value (int): The size of the kernel used to blur the image. Rainy views are blurry.
        brightness_coefficient (float): Coefficient to adjust the brightness of the image. Rainy days are usually shady.
        rain_drops (list[tuple[int, int]]): A list of tuples where each tuple represents the (x, y)
            coordinates of the starting point of a rain drop.

    Returns:
        np.ndarray: Image with rain effect added.

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
    """
    input_dtype = img.dtype

    image = from_float(img, target_dtype=np.uint8) if input_dtype == np.float32 else img.astype(np.uint8)

    for rain_drop_x0, rain_drop_y0 in rain_drops:
        rain_drop_x1 = rain_drop_x0 + slant
        rain_drop_y1 = rain_drop_y0 + drop_length

        cv2.line(
            image,
            (rain_drop_x0, rain_drop_y0),
            (rain_drop_x1, rain_drop_y1),
            drop_color,
            drop_width,
        )

    image = cv2.blur(image, (blur_value, blur_value))  # rainy view are blurry
    image_hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float32)
    image_hsv[:, :, 2] *= brightness_coefficient

    image_rgb = cv2.cvtColor(image_hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)

    return to_float(image_rgb) if input_dtype == np.float32 else image_rgb

def add_shadow (img, vertices_list, intensities) [view source on GitHub]

Add shadows to the image by reducing the intensity of the pixel values in specified regions.

Parameters:

Name Type Description
img np.ndarray

Input image. Multichannel images are supported.

vertices_list list[np.ndarray]

List of vertices for shadow polygons.

intensities np.ndarray

Array of shadow intensities. Range is [0, 1].

Returns:

Type Description
np.ndarray

Image with shadows added.
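
A hedged example of calling add_shadow directly; the polygon and intensity below are illustrative. Lower intensity values darken the shadowed region more strongly.

Python
import numpy as np
from albumentations.augmentations import functional as F

image = np.random.randint(0, 256, (200, 200, 3), dtype=np.uint8)

# One triangular shadow polygon (int32 vertices, as expected by cv2.fillPoly)
# and its intensity in [0, 1]; pixel values inside the polygon are multiplied
# by this factor.
vertices_list = [np.array([[20, 20], [180, 40], [100, 160]], dtype=np.int32)]
intensities = np.array([0.5])

shadowed = F.add_shadow(image, vertices_list=vertices_list, intensities=intensities)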

Source code in albumentations/augmentations/functional.py
Python
@preserve_channel_dim
def add_shadow(img: np.ndarray, vertices_list: list[np.ndarray], intensities: np.ndarray) -> np.ndarray:
    """Add shadows to the image by reducing the intensity of the pixel values in specified regions.

    Args:
        img (np.ndarray): Input image. Multichannel images are supported.
        vertices_list (list[np.ndarray]): List of vertices for shadow polygons.
        intensities (np.ndarray): Array of shadow intensities. Range is [0, 1].

    Returns:
        np.ndarray: Image with shadows added.

    Reference:
        https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
    """
    input_dtype = img.dtype

    num_channels = get_num_channels(img)
    max_value = MAX_VALUES_BY_DTYPE[np.uint8]

    if input_dtype == np.float32:
        img = from_float(img, target_dtype=np.uint8)

    img_shadowed = img.copy()

    # Iterate over the vertices and intensity list
    for vertices, shadow_intensity in zip(vertices_list, intensities):
        # Create mask for the current shadow polygon
        mask = np.zeros((img.shape[0], img.shape[1], 1), dtype=np.uint8)
        cv2.fillPoly(mask, [vertices], (max_value,))

        # Duplicate the mask to have the same number of channels as the image
        mask = np.repeat(mask, num_channels, axis=2)

        # Apply shadow to the channels directly
        # It could be tempting to convert to HLS and apply the shadow to the L channel, but it creates artifacts
        shadowed_indices = mask[:, :, 0] == max_value
        img_shadowed[shadowed_indices] = clip(
            img_shadowed[shadowed_indices] * shadow_intensity,
            np.uint8,
        )

    return to_float(img_shadowed) if input_dtype == np.float32 else img_shadowed

def add_snow_bleach (img, snow_point, brightness_coeff) [view source on GitHub]

Adds a simple snow effect to the image by bleaching out pixels.

This function simulates a basic snow effect by increasing the brightness of pixels whose lightness is below a certain threshold (snow_point). It operates in the HLS color space to modify the lightness channel.

Parameters:

Name Type Description
img np.ndarray

Input image. Can be either RGB uint8 or float32.

snow_point float

A float in the range [0, 1], scaled and adjusted to determine the threshold for pixel modification. Higher values result in a stronger snow effect.

brightness_coeff float

Coefficient applied to increase the brightness of pixels below the snow_point threshold. Larger values lead to more pronounced snow effects. Should be greater than 1.0 for a visible effect.

Returns:

Type Description
np.ndarray

Image with simulated snow effect. The output has the same dtype as the input.

Note

  • This function converts the image to the HLS color space to modify the lightness channel.
  • The snow effect is created by selectively increasing the brightness of pixels.
  • This method tends to create a 'bleached' look, which may not be as realistic as more advanced snow simulation techniques.
  • The function automatically handles both uint8 and float32 input images.

The snow effect is created through the following steps:
1. Convert the image from RGB to HLS color space.
2. Adjust the snow_point threshold.
3. Increase the lightness of pixels below the threshold.
4. Convert the image back to RGB.

Mathematical Formulation: Let L be the lightness channel in HLS space. For each pixel (i, j), if L[i, j] < snow_point, then L[i, j] = L[i, j] * brightness_coeff.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> snowy_image = A.functional.add_snow_bleach(image, snow_point=0.5, brightness_coeff=1.5)
Source code in albumentations/augmentations/functional.py
Python
def add_snow_bleach(img: np.ndarray, snow_point: float, brightness_coeff: float) -> np.ndarray:
    """Adds a simple snow effect to the image by bleaching out pixels.

    This function simulates a basic snow effect by increasing the brightness of pixels
    that are above a certain threshold (snow_point). It operates in the HLS color space
    to modify the lightness channel.

    Args:
        img (np.ndarray): Input image. Can be either RGB uint8 or float32.
        snow_point (float): A float in the range [0, 1], scaled and adjusted to determine
            the threshold for pixel modification. Higher values result in less snow effect.
        brightness_coeff (float): Coefficient applied to increase the brightness of pixels
            below the snow_point threshold. Larger values lead to more pronounced snow effects.
            Should be greater than 1.0 for a visible effect.

    Returns:
        np.ndarray: Image with simulated snow effect. The output has the same dtype as the input.

    Note:
        - This function converts the image to the HLS color space to modify the lightness channel.
        - The snow effect is created by selectively increasing the brightness of pixels.
        - This method tends to create a 'bleached' look, which may not be as realistic as more
          advanced snow simulation techniques.
        - The function automatically handles both uint8 and float32 input images.

    The snow effect is created through the following steps:
    1. Convert the image from RGB to HLS color space.
    2. Adjust the snow_point threshold.
    3. Increase the lightness of pixels below the threshold.
    4. Convert the image back to RGB.

    Mathematical Formulation:
        Let L be the lightness channel in HLS space.
        For each pixel (i, j):
        If L[i, j] < snow_point:
            L[i, j] = L[i, j] * brightness_coeff

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> snowy_image = A.functional.add_snow_v1(image, snow_point=0.5, brightness_coeff=1.5)

    References:
        - HLS Color Space: https://en.wikipedia.org/wiki/HSL_and_HSV
        - Original implementation: https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
    """
    input_dtype = img.dtype
    max_value = MAX_VALUES_BY_DTYPE[np.uint8]

    snow_point *= max_value / 2
    snow_point += max_value / 3

    if input_dtype == np.float32:
        img = from_float(img, target_dtype=np.uint8)

    image_hls = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)
    image_hls = np.array(image_hls, dtype=np.float32)

    image_hls[:, :, 1][image_hls[:, :, 1] < snow_point] *= brightness_coeff

    image_hls[:, :, 1] = clip(image_hls[:, :, 1], np.uint8)

    image_hls = np.array(image_hls, dtype=np.uint8)

    image_rgb = cv2.cvtColor(image_hls, cv2.COLOR_HLS2RGB)

    return to_float(image_rgb) if input_dtype == np.float32 else image_rgb

def add_snow_texture (img, snow_point, brightness_coeff) [view source on GitHub]

Add a realistic snow effect to the input image.

This function simulates snowfall by applying multiple visual effects to the image, including brightness adjustment, snow texture overlay, depth simulation, and color tinting. The result is a more natural-looking snow effect compared to simple pixel bleaching methods.

Parameters:

Name Type Description
img np.ndarray

Input image in RGB format.

snow_point float

Coefficient that controls the amount and intensity of snow. Should be in the range [0, 1], where 0 means no snow and 1 means maximum snow effect.

brightness_coeff float

Coefficient for brightness adjustment to simulate the reflective nature of snow. Should be in the range [0, 1], where higher values result in a brighter image.

Returns:

Type Description
np.ndarray

Image with added snow effect. The output has the same dtype as the input.

Note

  • The function first converts the image to HSV color space for better control over brightness and color adjustments.
  • A snow texture is generated using Gaussian noise and then filtered for a more natural appearance.
  • A depth effect is simulated, with more snow at the top of the image and less at the bottom.
  • A slight blue tint is added to simulate the cool color of snow.
  • Random sparkle effects are added to simulate light reflecting off snow crystals.

The snow effect is created through the following steps:
1. Brightness adjustment in HSV space
2. Generation of a snow texture using Gaussian noise
3. Application of a depth effect to the snow texture
4. Blending of the snow texture with the original image
5. Addition of a cool blue tint
6. Addition of sparkle effects

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> snowy_image = A.functional.add_snow_texture(image, snow_point=0.5, brightness_coeff=0.2)

Note

This function works with both uint8 and float32 image types, automatically handling the conversion between them.

Source code in albumentations/augmentations/functional.py
Python
def add_snow_texture(img: np.ndarray, snow_point: float, brightness_coeff: float) -> np.ndarray:
    """Add a realistic snow effect to the input image.

    This function simulates snowfall by applying multiple visual effects to the image,
    including brightness adjustment, snow texture overlay, depth simulation, and color tinting.
    The result is a more natural-looking snow effect compared to simple pixel bleaching methods.

    Args:
        img (np.ndarray): Input image in RGB format.
        snow_point (float): Coefficient that controls the amount and intensity of snow.
            Should be in the range [0, 1], where 0 means no snow and 1 means maximum snow effect.
        brightness_coeff (float): Coefficient for brightness adjustment to simulate the
            reflective nature of snow. Should be in the range [0, 1], where higher values
            result in a brighter image.

    Returns:
        np.ndarray: Image with added snow effect. The output has the same dtype as the input.

    Note:
        - The function first converts the image to HSV color space for better control over
          brightness and color adjustments.
        - A snow texture is generated using Gaussian noise and then filtered for a more
          natural appearance.
        - A depth effect is simulated, with more snow at the top of the image and less at the bottom.
        - A slight blue tint is added to simulate the cool color of snow.
        - Random sparkle effects are added to simulate light reflecting off snow crystals.

    The snow effect is created through the following steps:
    1. Brightness adjustment in HSV space
    2. Generation of a snow texture using Gaussian noise
    3. Application of a depth effect to the snow texture
    4. Blending of the snow texture with the original image
    5. Addition of a cool blue tint
    6. Addition of sparkle effects

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> snowy_image = A.functional.add_snow_v2(image, snow_coeff=0.5, brightness_coeff=0.2)

    Note:
        This function works with both uint8 and float32 image types, automatically
        handling the conversion between them.

    References:
        - Perlin Noise: https://en.wikipedia.org/wiki/Perlin_noise
        - HSV Color Space: https://en.wikipedia.org/wiki/HSL_and_HSV
    """
    input_dtype = img.dtype

    if input_dtype == np.float32:
        img = from_float(img, target_dtype=np.uint8)

    max_value = MAX_VALUES_BY_DTYPE[np.uint8]

    # Convert to HSV for better color control
    img_hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV).astype(np.float32)

    # Increase brightness
    img_hsv[:, :, 2] = np.clip(img_hsv[:, :, 2] * (1 + brightness_coeff * snow_point), 0, max_value)

    # Generate snow texture
    snow_texture = random_utils.normal(size=img.shape[:2], loc=0.5, scale=0.3)
    snow_texture = cv2.GaussianBlur(snow_texture, (0, 0), sigmaX=1, sigmaY=1)

    # Create depth effect for snow simulation
    # More snow accumulates at the top of the image, gradually decreasing towards the bottom
    # This simulates natural snow distribution on surfaces
    # The effect is achieved using a linear gradient from 1 (full snow) to 0.2 (less snow)
    rows = img.shape[0]
    depth_effect = np.linspace(1, 0.2, rows)[:, np.newaxis]
    snow_texture *= depth_effect

    # Apply snow texture
    snow_layer = (np.dstack([snow_texture] * 3) * max_value * snow_point).astype(np.float32)

    # Blend snow with original image
    img_with_snow = cv2.addWeighted(img_hsv, 1, snow_layer, 1, 0)

    # Add a slight blue tint to simulate cool snow color
    blue_tint = np.full_like(img_with_snow, (0.6, 0.75, 1))  # Slight blue in HSV

    img_with_snow = cv2.addWeighted(img_with_snow, 0.85, blue_tint, 0.15 * snow_point, 0)

    # Convert back to RGB
    img_with_snow = cv2.cvtColor(img_with_snow.astype(np.uint8), cv2.COLOR_HSV2RGB)

    # Add some sparkle effects for snow glitter
    sparkle = random_utils.random(img.shape[:2]) > 0.99  # noqa: PLR2004
    img_with_snow[sparkle] = [max_value, max_value, max_value]

    return to_float(img_with_snow) if input_dtype == np.float32 else img_with_snow

def add_sun_flare_overlay (img, flare_center, src_radius, src_color, circles) [view source on GitHub]

Add a sun flare effect to an image using a simple overlay technique.

This function creates a basic sun flare effect by overlaying multiple semi-transparent circles of varying sizes and intensities on the input image. The effect simulates a simple lens flare caused by bright light sources.

Parameters:

Name Type Description
img np.ndarray

The input image.

flare_center tuple[float, float]

(x, y) coordinates of the flare center in pixel coordinates.

src_radius int

The radius of the main sun circle in pixels.

src_color ColorType

The color of the sun, represented as a tuple of RGB values.

circles list[Any]

A list of tuples, each representing a circle that contributes to the flare effect. Each tuple contains:
  • alpha (float): The transparency of the circle (0.0 to 1.0).
  • center (tuple[int, int]): (x, y) coordinates of the circle center.
  • radius (int): The radius of the circle.
  • color (tuple[int, int, int]): RGB color of the circle.

Returns:

Type Description
np.ndarray

The output image with the sun flare effect added.

Note

  • This function uses a simple alpha blending technique to overlay flare elements.
  • The main sun is created as a gradient circle, fading from the center outwards.
  • Additional flare circles are added along an imaginary line from the sun's position.
  • This method is computationally efficient but may produce less realistic results compared to more advanced techniques.

The flare effect is created through the following steps:
1. Create an overlay image and output image as copies of the input.
2. Add smaller flare circles to the overlay.
3. Blend the overlay with the output image using alpha compositing.
4. Add the main sun circle with a radial gradient.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> flare_center = (50, 50)
>>> src_radius = 20
>>> src_color = (255, 255, 200)
>>> circles = [
...     (0.1, (60, 60), 5, (255, 200, 200)),
...     (0.2, (70, 70), 3, (200, 255, 200))
... ]
>>> flared_image = A.functional.add_sun_flare_overlay(
...     image, flare_center, src_radius, src_color, circles
... )
Source code in albumentations/augmentations/functional.py
Python
@preserve_channel_dim
def add_sun_flare_overlay(
    img: np.ndarray,
    flare_center: tuple[float, float],
    src_radius: int,
    src_color: ColorType,
    circles: list[Any],
) -> np.ndarray:
    """Add a sun flare effect to an image using a simple overlay technique.

    This function creates a basic sun flare effect by overlaying multiple semi-transparent
    circles of varying sizes and intensities on the input image. The effect simulates
    a simple lens flare caused by bright light sources.

    Args:
        img (np.ndarray): The input image.
        flare_center (tuple[float, float]): (x, y) coordinates of the flare center
            in pixel coordinates.
        src_radius (int): The radius of the main sun circle in pixels.
        src_color (ColorType): The color of the sun, represented as a tuple of RGB values.
        circles (list[Any]): A list of tuples, each representing a circle that contributes
            to the flare effect. Each tuple contains:
            - alpha (float): The transparency of the circle (0.0 to 1.0).
            - center (tuple[int, int]): (x, y) coordinates of the circle center.
            - radius (int): The radius of the circle.
            - color (tuple[int, int, int]): RGB color of the circle.

    Returns:
        np.ndarray: The output image with the sun flare effect added.

    Note:
        - This function uses a simple alpha blending technique to overlay flare elements.
        - The main sun is created as a gradient circle, fading from the center outwards.
        - Additional flare circles are added along an imaginary line from the sun's position.
        - This method is computationally efficient but may produce less realistic results
          compared to more advanced techniques.

    The flare effect is created through the following steps:
    1. Create an overlay image and output image as copies of the input.
    2. Add smaller flare circles to the overlay.
    3. Blend the overlay with the output image using alpha compositing.
    4. Add the main sun circle with a radial gradient.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> flare_center = (50, 50)
        >>> src_radius = 20
        >>> src_color = (255, 255, 200)
        >>> circles = [
        ...     (0.1, (60, 60), 5, (255, 200, 200)),
        ...     (0.2, (70, 70), 3, (200, 255, 200))
        ... ]
        >>> flared_image = A.functional.add_sun_flare_overlay(
        ...     image, flare_center, src_radius, src_color, circles
        ... )

    References:
        - Alpha compositing: https://en.wikipedia.org/wiki/Alpha_compositing
        - Lens flare: https://en.wikipedia.org/wiki/Lens_flare
    """
    input_dtype = img.dtype
    if input_dtype == np.float32:
        img = from_float(img, target_dtype=np.uint8)

    overlay = img.copy()
    output = img.copy()

    for alpha, (x, y), rad3, (r_color, g_color, b_color) in circles:
        cv2.circle(overlay, (x, y), rad3, (r_color, g_color, b_color), -1)
        output = add_weighted(overlay, alpha, output, 1 - alpha)

    point = [int(x) for x in flare_center]

    overlay = output.copy()
    num_times = src_radius // 10
    alpha = np.linspace(0.0, 1, num=num_times)
    rad = np.linspace(1, src_radius, num=num_times)
    for i in range(num_times):
        cv2.circle(overlay, point, int(rad[i]), src_color, -1)
        alp = alpha[num_times - i - 1] * alpha[num_times - i - 1] * alpha[num_times - i - 1]
        output = add_weighted(overlay, alp, output, 1 - alp)

    return to_float(output) if input_dtype == np.float32 else output

def add_sun_flare_physics_based (img, flare_center, src_radius, src_color, circles) [view source on GitHub]

Add a more realistic sun flare effect to the image.

This function creates a complex sun flare effect by simulating various optical phenomena that occur in real camera lenses when capturing bright light sources. The result is a more realistic and physically plausible lens flare effect.

Parameters:

Name Type Description
img np.ndarray

Input image.

flare_center tuple[int, int]

(x, y) coordinates of the sun's center in pixels.

src_radius int

Radius of the main sun circle in pixels.

src_color tuple[int, int, int]

Color of the sun in RGB format.

circles list[Any]

List of tuples, each representing a flare circle with parameters (alpha, center, size, color):

  • alpha (float): Transparency of the circle (0.0 to 1.0).
  • center (tuple[int, int]): (x, y) coordinates of the circle center.
  • size (float): Size factor for the circle radius.
  • color (tuple[int, int, int]): RGB color of the circle.

Returns:

Type Description
np.ndarray

Image with added sun flare effect.

Note

This function implements several techniques to create a more realistic flare:

  1. Separate flare layer: Allows for complex manipulations of the flare effect.
  2. Lens diffraction spikes: Simulates light diffraction in the camera aperture.
  3. Radial gradient mask: Creates natural fading of the flare from the center.
  4. Gaussian blur: Softens the flare for a more natural glow effect.
  5. Chromatic aberration: Simulates color fringing often seen in real lens flares.
  6. Screen blending: Provides a more realistic blending of the flare with the image.

The flare effect is created through the following steps:

  1. Create a separate flare layer.
  2. Add the main sun circle and diffraction spikes to the flare layer.
  3. Add additional flare circles based on the input parameters.
  4. Apply Gaussian blur to soften the flare.
  5. Create and apply a radial gradient mask for natural fading.
  6. Simulate chromatic aberration by applying different blurs to color channels.
  7. Blend the flare with the original image using screen blending mode.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [1000, 1000, 3], dtype=np.uint8)
>>> flare_center = (500, 500)
>>> src_radius = 50
>>> src_color = (255, 255, 200)
>>> circles = [
...     (0.1, (550, 550), 10, (255, 200, 200)),
...     (0.2, (600, 600), 5, (200, 255, 200))
... ]
>>> flared_image = A.functional.add_sun_flare_physics_based(
...     image, flare_center, src_radius, src_color, circles
... )
Source code in albumentations/augmentations/functional.py
Python
@clipped
def add_sun_flare_physics_based(
    img: np.ndarray,
    flare_center: tuple[int, int],
    src_radius: int,
    src_color: tuple[int, int, int],
    circles: list[Any],
) -> np.ndarray:
    """Add a more realistic sun flare effect to the image.

    This function creates a complex sun flare effect by simulating various optical phenomena
    that occur in real camera lenses when capturing bright light sources. The result is a
    more realistic and physically plausible lens flare effect.

    Args:
        img (np.ndarray): Input image.
        flare_center (tuple[int, int]): (x, y) coordinates of the sun's center in pixels.
        src_radius (int): Radius of the main sun circle in pixels.
        src_color (tuple[int, int, int]): Color of the sun in RGB format.
        circles (list[Any]): List of tuples, each representing a flare circle with parameters:
            (alpha, center, size, color)
            - alpha (float): Transparency of the circle (0.0 to 1.0).
            - center (tuple[int, int]): (x, y) coordinates of the circle center.
            - size (float): Size factor for the circle radius.
            - color (tuple[int, int, int]): RGB color of the circle.

    Returns:
        np.ndarray: Image with added sun flare effect.

    Note:
        This function implements several techniques to create a more realistic flare:
        1. Separate flare layer: Allows for complex manipulations of the flare effect.
        2. Lens diffraction spikes: Simulates light diffraction in camera aperture.
        3. Radial gradient mask: Creates natural fading of the flare from the center.
        4. Gaussian blur: Softens the flare for a more natural glow effect.
        5. Chromatic aberration: Simulates color fringing often seen in real lens flares.
        6. Screen blending: Provides a more realistic blending of the flare with the image.

    The flare effect is created through the following steps:
    1. Create a separate flare layer.
    2. Add the main sun circle and diffraction spikes to the flare layer.
    3. Add additional flare circles based on the input parameters.
    4. Apply Gaussian blur to soften the flare.
    5. Create and apply a radial gradient mask for natural fading.
    6. Simulate chromatic aberration by applying different blurs to color channels.
    7. Blend the flare with the original image using screen blending mode.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [1000, 1000, 3], dtype=np.uint8)
        >>> flare_center = (500, 500)
        >>> src_radius = 50
        >>> src_color = (255, 255, 200)
        >>> circles = [
        ...     (0.1, (550, 550), 10, (255, 200, 200)),
        ...     (0.2, (600, 600), 5, (200, 255, 200))
        ... ]
        >>> flared_image = A.functional.add_sun_flare_physics_based(
        ...     image, flare_center, src_radius, src_color, circles
        ... )

    References:
        - Lens flare: https://en.wikipedia.org/wiki/Lens_flare
        - Diffraction: https://en.wikipedia.org/wiki/Diffraction
        - Chromatic aberration: https://en.wikipedia.org/wiki/Chromatic_aberration
        - Screen blending: https://en.wikipedia.org/wiki/Blend_modes#Screen
    """
    input_dtype = img.dtype
    if input_dtype == np.float32:
        img = from_float(img, dtype=np.uint8)

    output = img.copy()
    height, width = img.shape[:2]

    # Create a separate flare layer
    flare_layer = np.zeros_like(img, dtype=np.float32)

    # Add the main sun
    cv2.circle(flare_layer, flare_center, src_radius, src_color, -1)

    # Add lens diffraction spikes
    for angle in [0, 45, 90, 135]:
        end_point = (
            int(flare_center[0] + np.cos(np.radians(angle)) * max(width, height)),
            int(flare_center[1] + np.sin(np.radians(angle)) * max(width, height)),
        )
        cv2.line(flare_layer, flare_center, end_point, src_color, 2)

    # Add flare circles
    for _, center, size, color in circles:
        cv2.circle(flare_layer, center, int(size**0.33), color, -1)

    # Apply gaussian blur to soften the flare
    flare_layer = cv2.GaussianBlur(flare_layer, (0, 0), sigmaX=15, sigmaY=15)

    # Create a radial gradient mask
    y, x = np.ogrid[:height, :width]
    mask = np.sqrt((x - flare_center[0]) ** 2 + (y - flare_center[1]) ** 2)
    mask = 1 - np.clip(mask / (max(width, height) * 0.7), 0, 1)
    mask = np.dstack([mask] * 3)

    # Apply the mask to the flare layer
    flare_layer *= mask

    # Add chromatic aberration
    channels = list(cv2.split(flare_layer))
    channels[0] = cv2.GaussianBlur(channels[0], (0, 0), sigmaX=3, sigmaY=3)  # Blue channel
    channels[2] = cv2.GaussianBlur(channels[2], (0, 0), sigmaX=5, sigmaY=5)  # Red channel
    flare_layer = cv2.merge(channels)

    # Blend the flare with the original image using screen blending
    output = 255 - ((255 - output) * (255 - flare_layer) / 255)

    return to_float(output) if input_dtype == np.float32 else output

def almost_equal_intervals (n, parts) [view source on GitHub]

Generates an array of nearly equal integer intervals that sum up to n.

This function divides the number n into the requested number of nearly equal parts. It ensures that the sum of all parts equals n and that the difference between any two parts is at most one. This is useful for distributing a total amount into nearly equal discrete parts.

Parameters:

Name Type Description
n int

The total value to be split.

parts int

The number of parts to split into.

Returns:

Type Description
np.ndarray

An array of integers where each integer represents the size of a part.

Examples:

Python
>>> almost_equal_intervals(20, 3)
array([7, 7, 6])  # Splits 20 into three parts: 7, 7, and 6
>>> almost_equal_intervals(16, 4)
array([4, 4, 4, 4])  # Splits 16 into four equal parts
Source code in albumentations/augmentations/functional.py
Python
def almost_equal_intervals(n: int, parts: int) -> np.ndarray:
    """Generates an array of nearly equal integer intervals that sum up to `n`.

    This function divides the number `n` into `parts` nearly equal parts. It ensures that
    the sum of all parts equals `n`, and the difference between any two parts is at most one.
    This is useful for distributing a total amount into nearly equal discrete parts.

    Args:
        n (int): The total value to be split.
        parts (int): The number of parts to split into.

    Returns:
        np.ndarray: An array of integers where each integer represents the size of a part.

    Example:
        >>> almost_equal_intervals(20, 3)
        array([7, 7, 6])  # Splits 20 into three parts: 7, 7, and 6
        >>> almost_equal_intervals(16, 4)
        array([4, 4, 4, 4])  # Splits 16 into four equal parts
    """
    part_size, remainder = divmod(n, parts)
    # Create an array with the base part size and adjust the first `remainder` parts by adding 1
    return np.array([part_size + 1 if i < remainder else part_size for i in range(parts)])

def clahe (img, clip_limit, tile_grid_size) [view source on GitHub]

Apply Contrast Limited Adaptive Histogram Equalization (CLAHE) to the input image.

This function enhances the contrast of the input image using CLAHE. For color images, it converts the image to the LAB color space, applies CLAHE to the L channel, and then converts the image back to RGB.

Parameters:

Name Type Description
img np.ndarray

Input image. Can be grayscale (2D array) or RGB (3D array).

clip_limit float

Threshold for contrast limiting. Higher values give more contrast.

tile_grid_size tuple[int, int]

Size of grid for histogram equalization. Width and height of the grid.

Returns:

Type Description
np.ndarray

Image with CLAHE applied. The output has the same dtype as the input.

Note

  • If the input image is float32, it's temporarily converted to uint8 for processing and then converted back to float32.
  • For color images, CLAHE is applied only to the luminance channel in the LAB color space.

Exceptions:

Type Description
ValueError

If the input image is not 2D or 3D.

Examples:

Python
>>> import numpy as np
>>> img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> result = clahe(img, clip_limit=2.0, tile_grid_size=(8, 8))
>>> assert result.shape == img.shape
>>> assert result.dtype == img.dtype
Source code in albumentations/augmentations/functional.py
Python
@preserve_channel_dim
def clahe(img: np.ndarray, clip_limit: float, tile_grid_size: tuple[int, int]) -> np.ndarray:
    """Apply Contrast Limited Adaptive Histogram Equalization (CLAHE) to the input image.

    This function enhances the contrast of the input image using CLAHE. For color images,
    it converts the image to the LAB color space, applies CLAHE to the L channel, and then
    converts the image back to RGB.

    Args:
        img (np.ndarray): Input image. Can be grayscale (2D array) or RGB (3D array).
        clip_limit (float): Threshold for contrast limiting. Higher values give more contrast.
        tile_grid_size (tuple[int, int]): Size of grid for histogram equalization.
            Width and height of the grid.

    Returns:
        np.ndarray: Image with CLAHE applied. The output has the same dtype as the input.

    Note:
        - If the input image is float32, it's temporarily converted to uint8 for processing
          and then converted back to float32.
        - For color images, CLAHE is applied only to the luminance channel in the LAB color space.

    Raises:
        ValueError: If the input image is not 2D or 3D.

    Example:
        >>> import numpy as np
        >>> img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> result = clahe(img, clip_limit=2.0, tile_grid_size=(8, 8))
        >>> assert result.shape == img.shape
        >>> assert result.dtype == img.dtype
    """
    img = img.copy()
    original_dtype = img.dtype

    if img.dtype == np.float32:
        img = from_float(img, target_dtype=np.uint8)

    clahe_mat = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid_size)

    if is_grayscale_image(img):
        result = clahe_mat.apply(img)
        return to_float(result, max_value=255) if original_dtype == np.float32 else result

    img = cv2.cvtColor(img, cv2.COLOR_RGB2LAB)

    img[:, :, 0] = clahe_mat.apply(img[:, :, 0])

    result = cv2.cvtColor(img, cv2.COLOR_LAB2RGB)

    return to_float(result, max_value=255) if original_dtype == np.float32 else result

def create_shape_groups (tiles) [view source on GitHub]

Groups tiles by their shape and stores the indices for each shape.
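
A short illustrative sketch (not from the original docstring; the import path is assumed from the source location shown below):

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> tiles = np.array([[0, 0, 10, 10], [0, 10, 10, 20], [10, 0, 30, 10]])
>>> groups = F.create_shape_groups(tiles)  # e.g. {(10, 10): [0, 1], (20, 10): [2]}, keyed by tile shape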

Source code in albumentations/augmentations/functional.py
Python
def create_shape_groups(tiles: np.ndarray) -> dict[tuple[int, int], list[int]]:
    """Groups tiles by their shape and stores the indices for each shape."""
    shape_groups = defaultdict(list)
    for index, (start_y, start_x, end_y, end_x) in enumerate(tiles):
        shape = (end_y - start_y, end_x - start_x)
        shape_groups[shape].append(index)
    return shape_groups

def equalize (img, mask=None, mode='cv', by_channels=True) [view source on GitHub]

Apply histogram equalization to the input image.

This function enhances the contrast of the input image by equalizing its histogram. It supports both grayscale and color images, and can operate on individual channels or on the luminance channel of the image.

Parameters:

Name Type Description
img np.ndarray

Input image. Can be grayscale (2D array) or RGB (3D array).

mask np.ndarray | None

Optional mask to apply the equalization selectively. If provided, must have the same shape as the input image. Default: None.

mode ImageMode

The backend to use for equalization. Can be either "cv" for OpenCV or "pil" for Pillow-style equalization. Default: "cv".

by_channels bool

If True, applies equalization to each channel independently. If False, converts the image to YCrCb color space and equalizes only the luminance channel. Only applicable to color images. Default: True.

Returns:

Type Description
np.ndarray

Equalized image. The output has the same dtype as the input.

Exceptions:

Type Description
ValueError

If the input image or mask have invalid shapes or types.

Note

  • If the input image is not uint8, it will be temporarily converted to uint8 for processing and then converted back to its original dtype.
  • For color images, when by_channels=False, the image is converted to YCrCb color space, equalized on the Y channel, and then converted back to RGB.
  • The function preserves the original number of channels in the image.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> equalized = A.equalize(image, mode="cv", by_channels=True)
>>> assert equalized.shape == image.shape
>>> assert equalized.dtype == image.dtype
Source code in albumentations/augmentations/functional.py
Python
@preserve_channel_dim
def equalize(
    img: np.ndarray,
    mask: np.ndarray | None = None,
    mode: ImageMode = "cv",
    by_channels: bool = True,
) -> np.ndarray:
    """Apply histogram equalization to the input image.

    This function enhances the contrast of the input image by equalizing its histogram.
    It supports both grayscale and color images, and can operate on individual channels
    or on the luminance channel of the image.

    Args:
        img (np.ndarray): Input image. Can be grayscale (2D array) or RGB (3D array).
        mask (np.ndarray | None): Optional mask to apply the equalization selectively.
            If provided, must have the same shape as the input image. Default: None.
        mode (ImageMode): The backend to use for equalization. Can be either "cv" for
            OpenCV or "pil" for Pillow-style equalization. Default: "cv".
        by_channels (bool): If True, applies equalization to each channel independently.
            If False, converts the image to YCrCb color space and equalizes only the
            luminance channel. Only applicable to color images. Default: True.

    Returns:
        np.ndarray: Equalized image. The output has the same dtype as the input.

    Raises:
        ValueError: If the input image or mask have invalid shapes or types.

    Note:
        - If the input image is not uint8, it will be temporarily converted to uint8
          for processing and then converted back to its original dtype.
        - For color images, when by_channels=False, the image is converted to YCrCb
          color space, equalized on the Y channel, and then converted back to RGB.
        - The function preserves the original number of channels in the image.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> equalized = A.equalize(image, mode="cv", by_channels=True)
        >>> assert equalized.shape == image.shape
        >>> assert equalized.dtype == image.dtype
    """
    original_dtype = img.dtype

    if original_dtype != np.uint8:
        img = from_float(img, target_dtype=np.uint8)

    _check_preconditions(img, mask, by_channels)

    function = _equalize_pil if mode == "pil" else _equalize_cv

    if is_grayscale_image(img):
        return function(img, _handle_mask(mask))

    if not by_channels:
        result_img = cv2.cvtColor(img, cv2.COLOR_RGB2YCrCb)
        result_img[..., 0] = function(result_img[..., 0], _handle_mask(mask))
        return cv2.cvtColor(result_img, cv2.COLOR_YCrCb2RGB)

    result_img = np.empty_like(img)
    for i in range(NUM_RGB_CHANNELS):
        _mask = _handle_mask(mask, i)
        result_img[..., i] = function(img[..., i], _mask)

    return to_float(result_img, max_value=255) if original_dtype == np.float32 else result_img

def fancy_pca (img, alpha_vector) [view source on GitHub]

Perform 'Fancy PCA' augmentation on an image with any number of channels.

Parameters:

Name Type Description
img np.ndarray

Input image

alpha_vector np.ndarray

Vector of scale factors for each principal component. Should have the same length as the number of channels in the image.

Returns:

Type Description
np.ndarray

Augmented image of the same shape, type, and range as the input.

Image types: uint8, float32

Number of channels: Any

Note

  • This function generalizes the Fancy PCA augmentation to work with any number of channels.
  • It preserves the original range of the image ([0, 255] for uint8, [0, 1] for float32).
  • For single-channel images, the augmentation is applied as a simple scaling of pixel intensity variation.
  • For multi-channel images, PCA is performed on the entire image, treating each pixel as a point in N-dimensional space (where N is the number of channels).
  • The augmentation preserves the correlation between channels while adding controlled noise.
  • Computation time may increase significantly for images with a large number of channels.

Reference

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).
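
A minimal usage sketch (illustrative; assumes the function is imported from the albumentations.augmentations.functional module shown below):

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> alpha_vector = np.random.normal(0, 0.1, size=3)  # one scale factor per channel
>>> augmented = F.fancy_pca(image, alpha_vector)
>>> assert augmented.shape == image.shape and augmented.dtype == image.dtype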

Source code in albumentations/augmentations/functional.py
Python
@clipped
@preserve_channel_dim
def fancy_pca(img: np.ndarray, alpha_vector: np.ndarray) -> np.ndarray:
    """Perform 'Fancy PCA' augmentation on an image with any number of channels.

    Args:
        img (np.ndarray): Input image
        alpha_vector (np.ndarray): Vector of scale factors for each principal component.
                                   Should have the same length as the number of channels in the image.

    Returns:
        np.ndarray: Augmented image of the same shape, type, and range as the input.

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - This function generalizes the Fancy PCA augmentation to work with any number of channels.
        - It preserves the original range of the image ([0, 255] for uint8, [0, 1] for float32).
        - For single-channel images, the augmentation is applied as a simple scaling of pixel intensity variation.
        - For multi-channel images, PCA is performed on the entire image, treating each pixel
          as a point in N-dimensional space (where N is the number of channels).
        - The augmentation preserves the correlation between channels while adding controlled noise.
        - Computation time may increase significantly for images with a large number of channels.

    Reference:
        Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012).
        ImageNet classification with deep convolutional neural networks.
        In Advances in neural information processing systems (pp. 1097-1105).
    """
    orig_shape = img.shape
    orig_dtype = img.dtype
    num_channels = get_num_channels(img)

    # Convert to float32 and scale to [0, 1] if necessary
    if orig_dtype == np.uint8:
        img = to_float(img)

    # Reshape image to 2D array of pixels
    img_reshaped = img.reshape(-1, num_channels)

    # Center the pixel values
    img_mean = np.mean(img_reshaped, axis=0)
    img_centered = img_reshaped - img_mean

    if num_channels == 1:
        # For grayscale images, apply a simple scaling
        std_dev = np.std(img_centered)
        noise = alpha_vector[0] * std_dev * img_centered
    else:
        # Compute covariance matrix
        img_cov = np.cov(img_centered, rowvar=False)

        # Compute eigenvectors & eigenvalues of the covariance matrix
        eig_vals, eig_vecs = np.linalg.eigh(img_cov)

        # Sort eigenvectors by eigenvalues in descending order
        sort_perm = eig_vals[::-1].argsort()
        eig_vals = eig_vals[sort_perm]
        eig_vecs = eig_vecs[:, sort_perm]

        # Create noise vector
        noise = np.dot(np.dot(eig_vecs, np.diag(alpha_vector * eig_vals)), img_centered.T).T

    # Add noise to the image
    img_pca = img_reshaped + noise

    # Reshape back to original shape
    img_pca = img_pca.reshape(orig_shape)

    # Clip values to [0, 1] range
    img_pca = np.clip(img_pca, 0, 1)

    return from_float(img_pca, target_dtype=orig_dtype) if orig_dtype == np.uint8 else img_pca

def generate_shuffled_splits (size, divisions, random_state=None) [view source on GitHub]

Generate shuffled splits for a given dimension size and number of divisions.

Parameters:

Name Type Description
size int

Total size of the dimension (height or width).

divisions int

Number of divisions (rows or columns).

random_state Optional[np.random.RandomState]

Seed for the random number generator for reproducibility.

Returns:

Type Description
np.ndarray

Cumulative edges of the shuffled intervals.
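
A short illustrative sketch (assuming the import path shown below; the exact split positions depend on the random state):

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> edges = F.generate_shuffled_splits(20, 3, random_state=np.random.RandomState(0))
>>> # edges is a cumulative array such as [0, 7, 13, 20]: it always starts at 0 and ends at the full size
>>> assert edges[0] == 0 and edges[-1] == 20 and len(edges) == 4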

Source code in albumentations/augmentations/functional.py
Python
def generate_shuffled_splits(
    size: int,
    divisions: int,
    random_state: np.random.RandomState | None = None,
) -> np.ndarray:
    """Generate shuffled splits for a given dimension size and number of divisions.

    Args:
        size (int): Total size of the dimension (height or width).
        divisions (int): Number of divisions (rows or columns).
        random_state (Optional[np.random.RandomState]): Seed for the random number generator for reproducibility.

    Returns:
        np.ndarray: Cumulative edges of the shuffled intervals.
    """
    intervals = almost_equal_intervals(size, divisions)
    intervals = random_utils.shuffle(intervals, random_state=random_state)
    return np.insert(np.cumsum(intervals), 0, 0)

def grayscale_to_multichannel (grayscale_image, num_output_channels=3) [view source on GitHub]

Convert a grayscale image to a multi-channel image.

This function takes a 2D grayscale image or a 3D image with a single channel and converts it to a multi-channel image by repeating the grayscale data across the specified number of channels.

Parameters:

Name Type Description
grayscale_image np.ndarray

Input grayscale image. Can be 2D (height, width) or 3D (height, width, 1).

num_output_channels int

Number of channels in the output image. Defaults to 3.

Returns:

Type Description
np.ndarray

Multi-channel image with shape (height, width, num_channels).

Note

If the input is already a multi-channel image with the desired number of channels, it will be returned unchanged.
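
A minimal usage sketch (illustrative; assumes the import path shown below):

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> gray = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
>>> rgb_like = F.grayscale_to_multichannel(gray, num_output_channels=3)
>>> assert rgb_like.shape == (100, 100, 3)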

Source code in albumentations/augmentations/functional.py
Python
def grayscale_to_multichannel(grayscale_image: np.ndarray, num_output_channels: int = 3) -> np.ndarray:
    """Convert a grayscale image to a multi-channel image.

    This function takes a 2D grayscale image or a 3D image with a single channel
    and converts it to a multi-channel image by repeating the grayscale data
    across the specified number of channels.

    Args:
        grayscale_image (np.ndarray): Input grayscale image. Can be 2D (height, width)
                                      or 3D (height, width, 1).
        num_output_channels (int, optional): Number of channels in the output image. Defaults to 3.

    Returns:
        np.ndarray: Multi-channel image with shape (height, width, num_channels).

    Note:
        If the input is already a multi-channel image with the desired number of channels,
        it will be returned unchanged.
    """
    grayscale_image = grayscale_image.copy().squeeze()
    return np.stack([grayscale_image] * num_output_channels, axis=-1)

def iso_noise (image, color_shift=0.05, intensity=0.5, random_state=None) [view source on GitHub]

Apply Poisson noise to an image to simulate camera sensor noise.

Parameters:

Name Type Description
image np.ndarray

Input image. Currently, only RGB images are supported.

color_shift float

The amount of color shift to apply. Default is 0.05.

intensity float

Multiplication factor for noise values. Values of ~0.5 produce a noticeable, yet acceptable level of noise. Default is 0.5.

random_state np.random.RandomState | None

If specified, this will be the random state used for noise generation.

Returns:

Type Description
np.ndarray

The noised image.

Image types: uint8, float32

Number of channels: 3
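
A minimal usage sketch (illustrative; assumes an RGB uint8 input and the import path shown below):

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> noisy = F.iso_noise(image, color_shift=0.05, intensity=0.5,
...                     random_state=np.random.RandomState(42))
>>> assert noisy.shape == image.shape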

Source code in albumentations/augmentations/functional.py
Python
@clipped
def iso_noise(
    image: np.ndarray,
    color_shift: float = 0.05,
    intensity: float = 0.5,
    random_state: np.random.RandomState | None = None,
) -> np.ndarray:
    """Apply poisson noise to an image to simulate camera sensor noise.

    Args:
        image (np.ndarray): Input image. Currently, only RGB images are supported.
        color_shift (float): The amount of color shift to apply. Default is 0.05.
        intensity (float): Multiplication factor for noise values. Values of ~0.5 produce a noticeable,
                           yet acceptable level of noise. Default is 0.5.
        random_state (np.random.RandomState | None): If specified, this will be random state used
            for noise generation.

    Returns:
        np.ndarray: The noised image.

    Image types:
        uint8, float32

    Number of channels:
        3
    """
    input_dtype = image.dtype
    factor = 1

    if input_dtype == np.uint8:
        image = to_float(image)
        factor = MAX_VALUES_BY_DTYPE[input_dtype]

    hls = cv2.cvtColor(image, cv2.COLOR_RGB2HLS)
    _, stddev = cv2.meanStdDev(hls)

    luminance_noise = random_utils.poisson(stddev[1] * intensity * 255, size=hls.shape[:2], random_state=random_state)
    color_noise = random_utils.normal(0, color_shift * 360 * intensity, size=hls.shape[:2], random_state=random_state)

    hue = hls[..., 0]
    hue += color_noise
    hue %= 360

    luminance = hls[..., 1]
    luminance += (luminance_noise / 255) * (1.0 - luminance)

    return cv2.cvtColor(hls, cv2.COLOR_HLS2RGB) * factor

def move_tone_curve (img, low_y, high_y) [view source on GitHub]

Rescales the relationship between bright and dark areas of the image by manipulating its tone curve.

Parameters:

Name Type Description
img np.ndarray

Input image. Can have any number of channels.

low_y float | np.ndarray

per-channel or single y-position of a Bezier control point used to adjust the tone curve, must be in range [0, 1]

high_y float | np.ndarray

per-channel or single y-position of a Bezier control point used to adjust image tone curve, must be in range [0, 1]
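
A minimal usage sketch (illustrative; assumes the import path shown below):

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> adjusted = F.move_tone_curve(image, low_y=0.3, high_y=0.9)  # low_y/high_y set the two Bezier control points
>>> assert adjusted.shape == image.shape and adjusted.dtype == image.dtype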

Source code in albumentations/augmentations/functional.py
Python
@preserve_channel_dim
def move_tone_curve(
    img: np.ndarray,
    low_y: float | np.ndarray,
    high_y: float | np.ndarray,
) -> np.ndarray:
    """Rescales the relationship between bright and dark areas of the image by manipulating its tone curve.

    Args:
        img: np.ndarray. Any number of channels
        low_y: per-channel or single y-position of a Bezier control point used
            to adjust the tone curve, must be in range [0, 1]
        high_y: per-channel or single y-position of a Bezier control point used
            to adjust image tone curve, must be in range [0, 1]

    """
    input_dtype = img.dtype

    if input_dtype == np.float32:
        img = from_float(img, target_dtype=np.uint8)

    t = np.linspace(0.0, 1.0, 256)

    def evaluate_bez(t: np.ndarray, low_y: float | np.ndarray, high_y: float | np.ndarray) -> np.ndarray:
        one_minus_t = 1 - t
        return (3 * one_minus_t**2 * t * low_y + 3 * one_minus_t * t**2 * high_y + t**3) * 255

    num_channels = get_num_channels(img)

    if np.isscalar(low_y) and np.isscalar(high_y):
        lut = clip(np.rint(evaluate_bez(t, low_y, high_y)), np.uint8)
        output = cv2.LUT(img, lut)
    elif isinstance(low_y, np.ndarray) and isinstance(high_y, np.ndarray):
        luts = clip(np.rint(evaluate_bez(t[:, np.newaxis], low_y, high_y).T), np.uint8)
        output = cv2.merge([cv2.LUT(img[:, :, i], luts[i]) for i in range(num_channels)])
    else:
        raise TypeError(
            f"low_y and high_y must both be of type float or np.ndarray. Got {type(low_y)} and {type(high_y)}",
        )

    return to_float(output, max_value=255) if input_dtype == np.float32 else output

def posterize (img, bits) [view source on GitHub]

Reduce the number of bits for each color channel.

Parameters:

Name Type Description
img np.ndarray

image to posterize.

bits int

number of high bits. Must be in range [0, 8]

Returns:

Type Description
np.ndarray

Image with reduced color channels.
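
A minimal usage sketch (illustrative; assumes the import path shown below):

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> posterized = F.posterize(image, bits=2)  # keep only the 2 most significant bits per channel
>>> assert len(np.unique(posterized)) <= 4   # remaining values are limited to 0, 64, 128, 192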

Source code in albumentations/augmentations/functional.py
Python
@clipped
@preserve_channel_dim
def posterize(img: np.ndarray, bits: int) -> np.ndarray:
    """Reduce the number of bits for each color channel.

    Args:
        img: image to posterize.
        bits: number of high bits. Must be in range [0, 8]

    Returns:
        Image with reduced color channels.

    """
    bits_array = np.uint8(bits)

    original_dtype = img.dtype

    if original_dtype != np.uint8:
        img = from_float(img, target_dtype=np.uint8)

    if np.any((bits_array < 0) | (bits_array > EIGHT)):
        msg = "bits must be in range [0, 8]"
        raise ValueError(msg)

    if not bits_array.shape or len(bits_array) == 1:
        if bits_array == 0:
            return np.zeros_like(img)
        if bits_array == EIGHT:
            return img.copy()

        lut = np.arange(0, 256, dtype=np.uint8)
        mask = ~np.uint8(2 ** (8 - bits_array) - 1)
        lut &= mask

        return cv2.LUT(img, lut)

    if not is_rgb_image(img):
        msg = "If bits is iterable image must be RGB"
        raise TypeError(msg)

    result_img = np.empty_like(img)
    for i, channel_bits in enumerate(bits_array):
        if channel_bits == 0:
            result_img[..., i] = np.zeros_like(img[..., i])
        elif channel_bits == EIGHT:
            result_img[..., i] = img[..., i].copy()
        else:
            lut = np.arange(0, 256, dtype=np.uint8)
            mask = ~np.uint8(2 ** (8 - channel_bits) - 1)
            lut &= mask

            result_img[..., i] = cv2.LUT(img[..., i], lut)

    return to_float(result_img) if original_dtype == np.float32 else result_img

def shuffle_tiles_within_shape_groups (shape_groups, random_state=None) [view source on GitHub]

Shuffles indices within each group of similar shapes and creates a list where each index points to the index of the tile it should be mapped to.

Parameters:

Name Type Description
shape_groups dict[tuple[int, int], list[int]]

Groups of tile indices categorized by shape.

random_state Optional[np.random.RandomState]

Seed for the random number generator for reproducibility.

Returns:

Type Description
list[int]

A list where each index is mapped to the new index of the tile after shuffling.
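
A short illustrative sketch (assuming the import path shown below):

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> shape_groups = {(10, 10): [0, 1, 2], (20, 10): [3]}
>>> mapping = F.shuffle_tiles_within_shape_groups(shape_groups, random_state=np.random.RandomState(0))
>>> assert sorted(mapping) == [0, 1, 2, 3] and mapping[3] == 3  # only tiles of equal shape are mixed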

Source code in albumentations/augmentations/functional.py
Python
def shuffle_tiles_within_shape_groups(
    shape_groups: dict[tuple[int, int], list[int]],
    random_state: np.random.RandomState | None = None,
) -> list[int]:
    """Shuffles indices within each group of similar shapes and creates a list where each
    index points to the index of the tile it should be mapped to.

    Args:
        shape_groups (dict[tuple[int, int], list[int]]): Groups of tile indices categorized by shape.
        random_state (Optional[np.random.RandomState]): Seed for the random number generator for reproducibility.

    Returns:
        list[int]: A list where each index is mapped to the new index of the tile after shuffling.
    """
    # Initialize the output list with the same size as the total number of tiles, filled with -1
    num_tiles = sum(len(indices) for indices in shape_groups.values())
    mapping = [-1] * num_tiles

    # Prepare the random number generator

    for indices in shape_groups.values():
        shuffled_indices = random_utils.shuffle(indices.copy(), random_state=random_state)
        for old, new in zip(indices, shuffled_indices):
            mapping[old] = new

    return mapping

def solarize (img, threshold=128) [view source on GitHub]

Invert all pixel values above a threshold.

Parameters:

Name Type Description
img np.ndarray

The image to solarize.

threshold int

All pixels at or above this grayscale threshold are inverted.

Returns:

Type Description
np.ndarray

Solarized image.
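
A minimal usage sketch (illustrative; assumes the import path shown below):

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> solarized = F.solarize(image, threshold=128)
>>> assert solarized.shape == image.shape and solarized.dtype == image.dtype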

Source code in albumentations/augmentations/functional.py
Python
def solarize(img: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Invert all pixel values above a threshold.

    Args:
        img: The image to solarize.
        threshold: All pixels above this grayscale level are inverted.

    Returns:
        Solarized image.

    """
    dtype = img.dtype
    max_val = MAX_VALUES_BY_DTYPE[dtype]

    if dtype == np.uint8:
        lut = [(i if i < threshold else max_val - i) for i in range(int(max_val) + 1)]

        prev_shape = img.shape
        img = cv2.LUT(img, np.array(lut, dtype=dtype))

        if len(prev_shape) != len(img.shape):
            img = np.expand_dims(img, -1)
        return img

    result_img = img.copy()
    cond = img >= threshold
    result_img[cond] = max_val - result_img[cond]
    return result_img

def split_uniform_grid (image_shape, grid, random_state=None) [view source on GitHub]

Splits an image shape into a uniform grid specified by the grid dimensions.

Parameters:

Name Type Description
image_shape tuple[int, int]

The shape of the image as (height, width).

grid tuple[int, int]

The grid size as (rows, columns).

random_state Optional[np.random.RandomState]

The random state to use for shuffling the splits. If None, the splits are not shuffled.

Returns:

Type Description
np.ndarray

An array containing the tiles' coordinates in the format (start_y, start_x, end_y, end_x).

Note

The function uses generate_shuffled_splits to generate the splits for the height and width of the image. The splits are then used to calculate the coordinates of the tiles.
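
A short illustrative sketch (assuming the import path shown below):

Python
>>> from albumentations.augmentations import functional as F
>>> tiles = F.split_uniform_grid((100, 200), (2, 4))
>>> assert tiles.shape == (8, 4)  # one (start_y, start_x, end_y, end_x) row per tile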

Source code in albumentations/augmentations/functional.py
Python
def split_uniform_grid(
    image_shape: tuple[int, int],
    grid: tuple[int, int],
    random_state: np.random.RandomState | None = None,
) -> np.ndarray:
    """Splits an image shape into a uniform grid specified by the grid dimensions.

    Args:
        image_shape (tuple[int, int]): The shape of the image as (height, width).
        grid (tuple[int, int]): The grid size as (rows, columns).
        random_state (Optional[np.random.RandomState]): The random state to use for shuffling the splits.
            If None, the splits are not shuffled.

    Returns:
        np.ndarray: An array containing the tiles' coordinates in the format (start_y, start_x, end_y, end_x).

    Note:
        The function uses `generate_shuffled_splits` to generate the splits for the height and width of the image.
        The splits are then used to calculate the coordinates of the tiles.
    """
    n_rows, n_cols = grid

    height_splits = generate_shuffled_splits(image_shape[0], grid[0], random_state)
    width_splits = generate_shuffled_splits(image_shape[1], grid[1], random_state)

    # Calculate tiles coordinates
    tiles = [
        (height_splits[i], width_splits[j], height_splits[i + 1], width_splits[j + 1])
        for i in range(n_rows)
        for j in range(n_cols)
    ]

    return np.array(tiles)

def swap_tiles_on_image (image, tiles, mapping=None) [view source on GitHub]

Swap tiles on the image according to the new format.

Parameters:

Name Type Description
image np.ndarray

Input image.

tiles np.ndarray

Array of tiles with each tile as [start_y, start_x, end_y, end_x].

mapping list[int] | None

list of new tile indices.

Returns:

Type Description
np.ndarray

Output image with tiles swapped according to the random shuffle.
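
A short illustrative sketch (assuming the import path shown below):

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> image = np.arange(16, dtype=np.uint8).reshape(4, 4)
>>> tiles = np.array([[0, 0, 2, 2], [0, 2, 2, 4], [2, 0, 4, 2], [2, 2, 4, 4]])
>>> swapped = F.swap_tiles_on_image(image, tiles, mapping=[1, 0, 3, 2])  # swap the two tiles in each row
>>> assert swapped.shape == image.shape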

Source code in albumentations/augmentations/functional.py
Python
def swap_tiles_on_image(image: np.ndarray, tiles: np.ndarray, mapping: list[int] | None = None) -> np.ndarray:
    """Swap tiles on the image according to the new format.

    Args:
        image: Input image.
        tiles: Array of tiles with each tile as [start_y, start_x, end_y, end_x].
        mapping: list of new tile indices.

    Returns:
        np.ndarray: Output image with tiles swapped according to the random shuffle.
    """
    # If no tiles are provided, return a copy of the original image
    if tiles.size == 0 or mapping is None:
        return image.copy()

    # Create a copy of the image to retain original for reference
    new_image = np.empty_like(image)
    for num, new_index in enumerate(mapping):
        start_y, start_x, end_y, end_x = tiles[new_index]
        start_y_orig, start_x_orig, end_y_orig, end_x_orig = tiles[num]
        # Assign the corresponding tile from the original image to the new image
        new_image[start_y:end_y, start_x:end_x] = image[start_y_orig:end_y_orig, start_x_orig:end_x_orig]

    return new_image

def swap_tiles_on_keypoints (keypoints, tiles, mapping) [view source on GitHub]

Swap the positions of keypoints based on a tile mapping.

This function takes a set of keypoints and repositions them according to a mapping of tile swaps. Keypoints are moved from their original tiles to new positions in the swapped tiles.

Parameters:

Name Type Description
keypoints np.ndarray

A 2D numpy array of shape (N, 2) where N is the number of keypoints. Each row represents a keypoint's (x, y) coordinates.

tiles np.ndarray

A 2D numpy array of shape (M, 4) where M is the number of tiles. Each row represents a tile's (start_y, start_x, end_y, end_x) coordinates.

mapping np.ndarray

A 1D numpy array of shape (M,) where M is the number of tiles. Each element i contains the index of the tile that tile i should be swapped with.

Returns:

Type Description
np.ndarray

A 2D numpy array of the same shape as the input keypoints, containing the new positions of the keypoints after the tile swap.

Exceptions:

Type Description
RuntimeWarning

If any keypoint is not found within any tile.

Notes

  • Keypoints that do not fall within any tile will remain unchanged.
  • The function assumes that the tiles do not overlap and cover the entire image space.
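
A short illustrative sketch (assuming the import path shown below):

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> keypoints = np.array([[5.0, 5.0], [15.0, 5.0]])      # (x, y) coordinates
>>> tiles = np.array([[0, 0, 10, 10], [0, 10, 10, 20]])  # two side-by-side 10x10 tiles
>>> mapping = np.array([1, 0])                           # swap the two tiles
>>> new_keypoints = F.swap_tiles_on_keypoints(keypoints, tiles, mapping)
>>> # (5, 5) moves to (15, 5) and (15, 5) moves to (5, 5)
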
Source code in albumentations/augmentations/functional.py
Python
def swap_tiles_on_keypoints(
    keypoints: np.ndarray,
    tiles: np.ndarray,
    mapping: np.ndarray,
) -> np.ndarray:
    """Swap the positions of keypoints based on a tile mapping.

    This function takes a set of keypoints and repositions them according to a mapping of tile swaps.
    Keypoints are moved from their original tiles to new positions in the swapped tiles.

    Args:
        keypoints (np.ndarray): A 2D numpy array of shape (N, 2) where N is the number of keypoints.
                                Each row represents a keypoint's (x, y) coordinates.
        tiles (np.ndarray): A 2D numpy array of shape (M, 4) where M is the number of tiles.
                            Each row represents a tile's (start_y, start_x, end_y, end_x) coordinates.
        mapping (np.ndarray): A 1D numpy array of shape (M,) where M is the number of tiles.
                              Each element i contains the index of the tile that tile i should be swapped with.

    Returns:
        np.ndarray: A 2D numpy array of the same shape as the input keypoints, containing the new positions
                    of the keypoints after the tile swap.

    Raises:
        RuntimeWarning: If any keypoint is not found within any tile.

    Notes:
        - Keypoints that do not fall within any tile will remain unchanged.
        - The function assumes that the tiles do not overlap and cover the entire image space.
    """
    if not keypoints.size:
        return keypoints

    # Broadcast keypoints and tiles for vectorized comparison
    kp_x = keypoints[:, 0][:, np.newaxis]  # Shape: (num_keypoints, 1)
    kp_y = keypoints[:, 1][:, np.newaxis]  # Shape: (num_keypoints, 1)

    start_y, start_x, end_y, end_x = tiles.T  # Each shape: (num_tiles,)

    # Check if each keypoint is inside each tile
    in_tile = (kp_y >= start_y) & (kp_y < end_y) & (kp_x >= start_x) & (kp_x < end_x)

    # Find which tile each keypoint belongs to
    tile_indices = np.argmax(in_tile, axis=1)

    # Check if any keypoint is not in any tile
    not_in_any_tile = ~np.any(in_tile, axis=1)
    if np.any(not_in_any_tile):
        warn(
            "Some keypoints are not in any tile. They will be returned unchanged. This is unexpected and should be "
            "investigated.",
            RuntimeWarning,
            stacklevel=2,
        )

    # Get the new tile indices
    new_tile_indices = np.array(mapping)[tile_indices]

    # Calculate the offsets
    old_start_x = tiles[tile_indices, 1]
    old_start_y = tiles[tile_indices, 0]
    new_start_x = tiles[new_tile_indices, 1]
    new_start_y = tiles[new_tile_indices, 0]

    # Apply the transformation
    new_keypoints = keypoints.copy()
    new_keypoints[:, 0] = (keypoints[:, 0] - old_start_x) + new_start_x
    new_keypoints[:, 1] = (keypoints[:, 1] - old_start_y) + new_start_y

    # Keep original coordinates for keypoints not in any tile
    new_keypoints[not_in_any_tile] = keypoints[not_in_any_tile]

    return new_keypoints

def to_gray_average (img) [view source on GitHub]

Convert an image to grayscale using the average method.

This function computes the arithmetic mean across all channels for each pixel, resulting in a grayscale representation of the image.

Key aspects of this method:

  1. It treats all channels equally, regardless of their perceptual importance.
  2. Works with any number of channels, making it versatile for various image types.
  3. Simple and fast to compute, but may not accurately represent perceived brightness.
  4. For RGB images, the formula is: Gray = (R + G + B) / 3

Note: This method may produce different results compared to weighted methods (like RGB weighted average) which account for human perception of color brightness. It may also produce unexpected results for images with alpha channels or non-color data in additional channels.

Parameters:

Name Type Description
img np.ndarray

Input image as a numpy array. Can be any number of channels.

Returns:

Type Description
np.ndarray

Grayscale image as a 2D numpy array. The output data type matches the input data type.

Image types: uint8, float32

Number of channels: any
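
A minimal usage sketch (illustrative; assumes the import path shown below):

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> gray = F.to_gray_average(image)
>>> assert gray.shape == (100, 100) and gray.dtype == image.dtype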

Source code in albumentations/augmentations/functional.py
Python
def to_gray_average(img: np.ndarray) -> np.ndarray:
    """Convert an image to grayscale using the average method.

    This function computes the arithmetic mean across all channels for each pixel,
    resulting in a grayscale representation of the image.

    Key aspects of this method:
    1. It treats all channels equally, regardless of their perceptual importance.
    2. Works with any number of channels, making it versatile for various image types.
    3. Simple and fast to compute, but may not accurately represent perceived brightness.
    4. For RGB images, the formula is: Gray = (R + G + B) / 3

    Note: This method may produce different results compared to weighted methods
    (like RGB weighted average) which account for human perception of color brightness.
    It may also produce unexpected results for images with alpha channels or
    non-color data in additional channels.

    Args:
        img (np.ndarray): Input image as a numpy array. Can be any number of channels.

    Returns:
        np.ndarray: Grayscale image as a 2D numpy array. The output data type
                    matches the input data type.

    Image types:
        uint8, float32

    Number of channels:
        any
    """
    return np.mean(img, axis=-1).astype(img.dtype)

def to_gray_desaturation (img) [view source on GitHub]

Convert an image to grayscale using the desaturation method.

Parameters:

Name Type Description
img np.ndarray

Input image as a numpy array.

Returns:

Type Description
np.ndarray

Grayscale image as a 2D numpy array.

Image types: uint8, float32

Number of channels: any

Source code in albumentations/augmentations/functional.py
Python
@clipped
def to_gray_desaturation(img: np.ndarray) -> np.ndarray:
    """Convert an image to grayscale using the desaturation method.

    Args:
        img (np.ndarray): Input image as a numpy array.

    Returns:
        np.ndarray: Grayscale image as a 2D numpy array.

    Image types:
        uint8, float32

    Number of channels:
        any
    """
    float_image = img.astype(np.float32)
    return (np.max(float_image, axis=-1) + np.min(float_image, axis=-1)) / 2

def to_gray_from_lab (img) [view source on GitHub]

Convert an RGB image to grayscale using the L channel from the LAB color space.

This function converts the RGB image to the LAB color space and extracts the L channel. The LAB color space is designed to approximate human vision, where L represents lightness.

Key aspects of this method:

  1. The L channel represents the lightness of each pixel, ranging from 0 (black) to 100 (white).
  2. It's more perceptually uniform than RGB, meaning equal changes in L values correspond to roughly equal changes in perceived lightness.
  3. The L channel is independent of the color information (A and B channels), making it suitable for grayscale conversion.

This method can be particularly useful when you want a grayscale image that closely matches human perception of lightness, potentially preserving more perceived contrast than simple RGB-based methods.

Parameters:

Name Type Description
img np.ndarray

Input RGB image as a numpy array.

Returns:

Type Description
np.ndarray

Grayscale image as a 2D numpy array, representing the L (lightness) channel. Values are scaled to match the input image's data type range.

Image types: uint8, float32

Number of channels: 3

Source code in albumentations/augmentations/functional.py
Python
@clipped
def to_gray_from_lab(img: np.ndarray) -> np.ndarray:
    """Convert an RGB image to grayscale using the L channel from the LAB color space.

    This function converts the RGB image to the LAB color space and extracts the L channel.
    The LAB color space is designed to approximate human vision, where L represents lightness.

    Key aspects of this method:
    1. The L channel represents the lightness of each pixel, ranging from 0 (black) to 100 (white).
    2. It's more perceptually uniform than RGB, meaning equal changes in L values correspond to
       roughly equal changes in perceived lightness.
    3. The L channel is independent of the color information (A and B channels), making it
       suitable for grayscale conversion.

    This method can be particularly useful when you want a grayscale image that closely
    matches human perception of lightness, potentially preserving more perceived contrast
    than simple RGB-based methods.

    Args:
        img (np.ndarray): Input RGB image as a numpy array.

    Returns:
        np.ndarray: Grayscale image as a 2D numpy array, representing the L (lightness) channel.
                    Values are scaled to match the input image's data type range.

    Image types:
        uint8, float32

    Number of channels:
        3
    """
    dtype = img.dtype
    img_uint8 = from_float(img, target_dtype=np.uint8) if dtype == np.float32 else img
    result = cv2.cvtColor(img_uint8, cv2.COLOR_RGB2LAB)[..., 0]

    return to_float(result) if dtype == np.float32 else result

def to_gray_max (img) [view source on GitHub]

Convert an image to grayscale using the maximum channel value method.

This function takes the maximum value across all channels for each pixel, resulting in a grayscale image that preserves the brightest parts of the original image.

Key aspects of this method:

  1. Works with any number of channels, making it versatile for various image types.
  2. For 3-channel (e.g., RGB) images, this method is equivalent to extracting the V (Value) channel from the HSV color space.
  3. Preserves the brightest parts of the image but may lose some color contrast information.
  4. Simple and fast to compute.

Note

  • This method tends to produce brighter grayscale images compared to other conversion methods, as it always selects the highest intensity value from the channels.
  • For RGB images, it may not accurately represent perceived brightness as it doesn't account for human color perception.

Parameters:

Name Type Description
img np.ndarray

Input image as a numpy array. Can be any number of channels.

Returns:

Type Description
np.ndarray

Grayscale image as a 2D numpy array. The output data type matches the input data type.

Image types: uint8, float32

Number of channels: any

Source code in albumentations/augmentations/functional.py
Python
def to_gray_max(img: np.ndarray) -> np.ndarray:
    """Convert an image to grayscale using the maximum channel value method.

    This function takes the maximum value across all channels for each pixel,
    resulting in a grayscale image that preserves the brightest parts of the original image.

    Key aspects of this method:
    1. Works with any number of channels, making it versatile for various image types.
    2. For 3-channel (e.g., RGB) images, this method is equivalent to extracting the V (Value)
       channel from the HSV color space.
    3. Preserves the brightest parts of the image but may lose some color contrast information.
    4. Simple and fast to compute.

    Note:
    - This method tends to produce brighter grayscale images compared to other conversion methods,
      as it always selects the highest intensity value from the channels.
    - For RGB images, it may not accurately represent perceived brightness as it doesn't
      account for human color perception.

    Args:
        img (np.ndarray): Input image as a numpy array. Can be any number of channels.

    Returns:
        np.ndarray: Grayscale image as a 2D numpy array. The output data type
                    matches the input data type.

    Image types:
        uint8, float32

    Number of channels:
        any
    """
    return np.max(img, axis=-1)

def to_gray_pca (img) [view source on GitHub]

Convert an image to grayscale using Principal Component Analysis (PCA).

This function applies PCA to reduce a multi-channel image to a single channel, effectively creating a grayscale representation that captures the maximum variance in the color data.

Parameters:

Name Type Description
img np.ndarray

Input image as a numpy array with shape (height, width, channels).

Returns:

Type Description
np.ndarray

Grayscale image as a 2D numpy array with shape (height, width). If input is uint8, output is uint8 in range [0, 255]. If input is float32, output is float32 in range [0, 1].

Note

This method can potentially preserve more information from the original image compared to standard weighted average methods, as it accounts for the correlations between color channels.

Image types: uint8, float32

Number of channels: any
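
A minimal usage sketch (illustrative; assumes the import path shown below):

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> gray = F.to_gray_pca(image)
>>> assert gray.shape == (100, 100)  # single channel: the projection onto the first principal component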

Source code in albumentations/augmentations/functional.py
Python
@clipped
def to_gray_pca(img: np.ndarray) -> np.ndarray:
    """Convert an image to grayscale using Principal Component Analysis (PCA).

    This function applies PCA to reduce a multi-channel image to a single channel,
    effectively creating a grayscale representation that captures the maximum variance
    in the color data.

    Args:
        img (np.ndarray): Input image as a numpy array with shape (height, width, channels).

    Returns:
        np.ndarray: Grayscale image as a 2D numpy array with shape (height, width).
                    If input is uint8, output is uint8 in range [0, 255].
                    If input is float32, output is float32 in range [0, 1].

    Note:
        This method can potentially preserve more information from the original image
        compared to standard weighted average methods, as it accounts for the
        correlations between color channels.

    Image types:
        uint8, float32

    Number of channels:
        any
    """
    dtype = img.dtype
    # Reshape the image to a 2D array of pixels
    pixels = img.reshape(-1, img.shape[2])

    # Perform PCA
    pca = PCA(n_components=1)
    pca_result = pca.fit_transform(pixels)

    # Reshape back to image dimensions and scale to 0-255
    grayscale = pca_result.reshape(img.shape[:2])
    grayscale = normalize_per_image(grayscale, "min_max")

    return from_float(grayscale, target_dtype=dtype) if dtype == np.uint8 else grayscale

def to_gray_weighted_average (img) [view source on GitHub]

Convert an RGB image to grayscale using the weighted average method.

This function uses OpenCV's cvtColor function with COLOR_RGB2GRAY conversion, which applies the following formula: Y = 0.299*R + 0.587*G + 0.114*B

Parameters:

Name Type Description
img np.ndarray

Input RGB image as a numpy array.

Returns:

Type Description
np.ndarray

Grayscale image as a 2D numpy array.

Image types: uint8, float32

Number of channels: 3
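
A minimal usage sketch (illustrative; assumes the import path shown below) that checks the result against the weighted formula:

Python
>>> import numpy as np
>>> from albumentations.augmentations import functional as F
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> gray = F.to_gray_weighted_average(image)
>>> manual = 0.299 * image[..., 0] + 0.587 * image[..., 1] + 0.114 * image[..., 2]
>>> assert np.abs(gray.astype(np.float64) - manual).max() <= 1  # matches the formula up to rounding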

Source code in albumentations/augmentations/functional.py
Python
def to_gray_weighted_average(img: np.ndarray) -> np.ndarray:
    """Convert an RGB image to grayscale using the weighted average method.

    This function uses OpenCV's cvtColor function with COLOR_RGB2GRAY conversion,
    which applies the following formula:
    Y = 0.299*R + 0.587*G + 0.114*B

    Args:
        img (np.ndarray): Input RGB image as a numpy array.

    Returns:
        np.ndarray: Grayscale image as a 2D numpy array.

    Image types:
        uint8, float32

    Number of channels:
        3
    """
    return cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)

geometric special

functional

def bboxes_affine (bboxes, matrix, rotate_method, image_shape, border_mode, output_shape) [view source on GitHub]

Apply an affine transformation to bounding boxes.

For reflection border modes (cv2.BORDER_REFLECT_101, cv2.BORDER_REFLECT), this function:

  1. Calculates necessary padding to avoid information loss
  2. Applies padding to the bounding boxes
  3. Adjusts the transformation matrix to account for padding
  4. Applies the affine transformation
  5. Validates the transformed bounding boxes

For other border modes, it directly applies the affine transformation without padding.

Parameters:

Name Type Description
bboxes np.ndarray

Input bounding boxes

matrix skimage.transform.ProjectiveTransform

Affine transformation matrix

rotate_method str

Method for rotating bounding boxes ('largest_box' or 'ellipse')

image_shape Sequence[int]

Shape of the input image

border_mode int

OpenCV border mode

output_shape Sequence[int]

Shape of the output image

Returns:

Type Description
np.ndarray

Transformed and normalized bounding boxes
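
A usage sketch, assuming the module path implied by the source reference below and Albumentations' normalized [x_min, y_min, x_max, y_max] box format (extra columns such as class labels are passed through unchanged):

Python
import cv2
import numpy as np
import skimage.transform

from albumentations.augmentations.geometric.functional import bboxes_affine

# One normalized box plus a class label, defined on a 100x100 image.
bboxes = np.array([[0.1, 0.2, 0.4, 0.5, 1.0]])
matrix = skimage.transform.AffineTransform(translation=(10, 5))

transformed = bboxes_affine(bboxes, matrix, "largest_box", (100, 100), cv2.BORDER_CONSTANT, (100, 100))
# The result is shifted by the translation and re-normalized to the output shape.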

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def bboxes_affine(
    bboxes: np.ndarray,
    matrix: skimage.transform.ProjectiveTransform,
    rotate_method: Literal["largest_box", "ellipse"],
    image_shape: tuple[int, int],
    border_mode: int,
    output_shape: tuple[int, int],
) -> np.ndarray:
    """Apply an affine transformation to bounding boxes.

    For reflection border modes (cv2.BORDER_REFLECT_101, cv2.BORDER_REFLECT), this function:
    1. Calculates necessary padding to avoid information loss
    2. Applies padding to the bounding boxes
    3. Adjusts the transformation matrix to account for padding
    4. Applies the affine transformation
    5. Validates the transformed bounding boxes

    For other border modes, it directly applies the affine transformation without padding.

    Args:
        bboxes (np.ndarray): Input bounding boxes
        matrix (skimage.transform.ProjectiveTransform): Affine transformation matrix
        rotate_method (str): Method for rotating bounding boxes ('largest_box' or 'ellipse')
        image_shape (Sequence[int]): Shape of the input image
        border_mode (int): OpenCV border mode
        output_shape (Sequence[int]): Shape of the output image

    Returns:
        np.ndarray: Transformed and normalized bounding boxes
    """
    if is_identity_matrix(matrix):
        return bboxes

    bboxes = denormalize_bboxes(bboxes, image_shape)

    if border_mode in REFLECT_BORDER_MODES:
        # Step 1: Compute affine transform padding
        pad_left, pad_right, pad_top, pad_bottom = calculate_affine_transform_padding(matrix, image_shape)
        grid_dimensions = get_pad_grid_dimensions(pad_top, pad_bottom, pad_left, pad_right, image_shape)
        bboxes = generate_reflected_bboxes(bboxes, grid_dimensions, image_shape, center_in_origin=True)

    # Apply affine transform
    if rotate_method == "largest_box":
        transformed_bboxes = bboxes_affine_largest_box(bboxes, matrix)
    elif rotate_method == "ellipse":
        transformed_bboxes = bboxes_affine_ellipse(bboxes, matrix)
    else:
        raise ValueError(f"Method {rotate_method} is not a valid rotation method.")

    # Validate and normalize bboxes
    validated_bboxes = validate_bboxes(transformed_bboxes, output_shape)

    return normalize_bboxes(validated_bboxes, output_shape)
def bboxes_affine_ellipse (bboxes, matrix) [view source on GitHub]

Apply an affine transformation to bounding boxes using an ellipse approximation method.

This function transforms bounding boxes by approximating each box with an ellipse, transforming points along the ellipse's circumference, and then computing the new bounding box that encloses the transformed ellipse.

Parameters:

Name Type Description
bboxes np.ndarray

An array of bounding boxes with shape (N, 4+) where N is the number of bounding boxes. Each row should contain [x_min, y_min, x_max, y_max] followed by any additional attributes (e.g., class labels).

matrix skimage.transform.ProjectiveTransform

The affine transformation matrix to apply.

Returns:

Type Description
np.ndarray

An array of transformed bounding boxes with the same shape as the input. Each row contains [new_x_min, new_y_min, new_x_max, new_y_max] followed by any additional attributes from the input bounding boxes.

Note

  • This function assumes that the input bounding boxes are in the format [x_min, y_min, x_max, y_max].
  • The ellipse approximation method can provide a tighter bounding box compared to the largest box method, especially for rotations.
  • 360 points are used to approximate each ellipse, which provides a good balance between accuracy and computational efficiency.
  • Any additional attributes beyond the first 4 coordinates are preserved unchanged.
  • This method may be more suitable for objects that are roughly elliptical in shape.

Examples:

Python
>>> bboxes = np.array([[10, 10, 30, 20, 1], [40, 40, 60, 60, 2]])  # Two boxes with class labels
>>> matrix = skimage.transform.AffineTransform(rotation=np.pi/4)  # 45-degree rotation
>>> transformed_bboxes = bboxes_affine_ellipse(bboxes, matrix)
>>> print(transformed_bboxes)
[[ 5.86  5.86 34.14 24.14  1.  ]
 [30.   30.   70.   70.    2.  ]]
Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def bboxes_affine_ellipse(bboxes: np.ndarray, matrix: skimage.transform.ProjectiveTransform) -> np.ndarray:
    """Apply an affine transformation to bounding boxes using an ellipse approximation method.

    This function transforms bounding boxes by approximating each box with an ellipse,
    transforming points along the ellipse's circumference, and then computing the
    new bounding box that encloses the transformed ellipse.

    Args:
        bboxes (np.ndarray): An array of bounding boxes with shape (N, 4+) where N is the number of
                             bounding boxes. Each row should contain [x_min, y_min, x_max, y_max]
                             followed by any additional attributes (e.g., class labels).
        matrix (skimage.transform.ProjectiveTransform): The affine transformation matrix to apply.

    Returns:
        np.ndarray: An array of transformed bounding boxes with the same shape as the input.
                    Each row contains [new_x_min, new_y_min, new_x_max, new_y_max] followed by
                    any additional attributes from the input bounding boxes.

    Note:
        - This function assumes that the input bounding boxes are in the format [x_min, y_min, x_max, y_max].
        - The ellipse approximation method can provide a tighter bounding box compared to the
          largest box method, especially for rotations.
        - 360 points are used to approximate each ellipse, which provides a good balance between
          accuracy and computational efficiency.
        - Any additional attributes beyond the first 4 coordinates are preserved unchanged.
        - This method may be more suitable for objects that are roughly elliptical in shape.

    Example:
        >>> bboxes = np.array([[10, 10, 30, 20, 1], [40, 40, 60, 60, 2]])  # Two boxes with class labels
        >>> matrix = skimage.transform.AffineTransform(rotation=np.pi/4)  # 45-degree rotation
        >>> transformed_bboxes = bboxes_affine_ellipse(bboxes, matrix)
        >>> print(transformed_bboxes)
        [[ 5.86  5.86 34.14 24.14  1.  ]
         [30.   30.   70.   70.    2.  ]]
    """
    x_min, y_min, x_max, y_max = bboxes[:, 0], bboxes[:, 1], bboxes[:, 2], bboxes[:, 3]
    bbox_width = (x_max - x_min) / 2
    bbox_height = (y_max - y_min) / 2
    center_x = x_min + bbox_width
    center_y = y_min + bbox_height

    angles = np.arange(0, 360, dtype=np.float32)
    cos_angles = np.cos(np.radians(angles))
    sin_angles = np.sin(np.radians(angles))

    # Generate points for all ellipses at once
    x = bbox_width[:, np.newaxis] * sin_angles + center_x[:, np.newaxis]
    y = bbox_height[:, np.newaxis] * cos_angles + center_y[:, np.newaxis]
    points = np.stack([x, y], axis=-1).reshape(-1, 2)

    # Transform all points at once
    # Replacing skimage.transform.matrix_transform with numpy ops:
    # points reshape from N, 2 to N, 3 with extra dim filled with 1
    points = np.concatenate([points, np.ones((points.shape[0], 1))], axis=1)

    # change matrix.params.T to matrix.T if matrix is np.ndarray and no longer skimage.transform.ProjectiveTransform
    transformed_points = points @ matrix.params.T

    # set zero to very small number before homogeneous divide
    transformed_points[:, -1:] = np.where(
        transformed_points[:, -1:] == 0,
        np.finfo(float).eps,
        transformed_points[:, -1:],
    )

    # homogeneous divide and then get x, y
    transformed_points = (transformed_points / transformed_points[:, -1:])[:, :2]

    transformed_points = transformed_points.reshape(len(bboxes), -1, 2)

    # Compute new bounding boxes
    new_x_min = np.min(transformed_points[:, :, 0], axis=1)
    new_x_max = np.max(transformed_points[:, :, 0], axis=1)
    new_y_min = np.min(transformed_points[:, :, 1], axis=1)
    new_y_max = np.max(transformed_points[:, :, 1], axis=1)

    return np.column_stack([new_x_min, new_y_min, new_x_max, new_y_max, bboxes[:, 4:]])
def bboxes_affine_largest_box (bboxes, matrix) [view source on GitHub]

Apply an affine transformation to bounding boxes and return the largest enclosing boxes.

This function transforms each corner of every bounding box using the given affine transformation matrix, then computes the new bounding boxes that fully enclose the transformed corners.

Parameters:

Name Type Description
bboxes np.ndarray

An array of bounding boxes with shape (N, 4+) where N is the number of bounding boxes. Each row should contain [x_min, y_min, x_max, y_max] followed by any additional attributes (e.g., class labels).

matrix skimage.transform.ProjectiveTransform

The affine transformation matrix to apply.

Returns:

Type Description
np.ndarray

An array of transformed bounding boxes with the same shape as the input. Each row contains [new_x_min, new_y_min, new_x_max, new_y_max] followed by any additional attributes from the input bounding boxes.

Note

  • This function assumes that the input bounding boxes are in the format [x_min, y_min, x_max, y_max].
  • The resulting bounding boxes are the smallest axis-aligned boxes that completely enclose the transformed original boxes. They may be larger than the minimal possible bounding box if the original box becomes rotated.
  • Any additional attributes beyond the first 4 coordinates are preserved unchanged.
  • This method is called "largest box" because it returns the largest axis-aligned box that encloses all corners of the transformed bounding box.

Examples:

Python
>>> bboxes = np.array([[10, 10, 20, 20, 1], [30, 30, 40, 40, 2]])  # Two boxes with class labels
>>> matrix = skimage.transform.AffineTransform(scale=(2, 2), translation=(5, 5))
>>> transformed_bboxes = bboxes_affine_largest_box(bboxes, matrix)
>>> print(transformed_bboxes)
[[ 25.  25.  45.  45.   1.]
 [ 65.  65.  85.  85.   2.]]
Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def bboxes_affine_largest_box(bboxes: np.ndarray, matrix: skimage.transform.ProjectiveTransform) -> np.ndarray:
    """Apply an affine transformation to bounding boxes and return the largest enclosing boxes.

    This function transforms each corner of every bounding box using the given affine transformation
    matrix, then computes the new bounding boxes that fully enclose the transformed corners.

    Args:
        bboxes (np.ndarray): An array of bounding boxes with shape (N, 4+) where N is the number of
                             bounding boxes. Each row should contain [x_min, y_min, x_max, y_max]
                             followed by any additional attributes (e.g., class labels).
        matrix (skimage.transform.ProjectiveTransform): The affine transformation matrix to apply.

    Returns:
        np.ndarray: An array of transformed bounding boxes with the same shape as the input.
                    Each row contains [new_x_min, new_y_min, new_x_max, new_y_max] followed by
                    any additional attributes from the input bounding boxes.

    Note:
        - This function assumes that the input bounding boxes are in the format [x_min, y_min, x_max, y_max].
        - The resulting bounding boxes are the smallest axis-aligned boxes that completely
          enclose the transformed original boxes. They may be larger than the minimal possible
          bounding box if the original box becomes rotated.
        - Any additional attributes beyond the first 4 coordinates are preserved unchanged.
        - This method is called "largest box" because it returns the largest axis-aligned box
          that encloses all corners of the transformed bounding box.

    Example:
        >>> bboxes = np.array([[10, 10, 20, 20, 1], [30, 30, 40, 40, 2]])  # Two boxes with class labels
        >>> matrix = skimage.transform.AffineTransform(scale=(2, 2), translation=(5, 5))
        >>> transformed_bboxes = bboxes_affine_largest_box(bboxes, matrix)
        >>> print(transformed_bboxes)
        [[ 25.  25.  45.  45.   1.]
         [ 65.  65.  85.  85.   2.]]
    """
    # Extract corners of all bboxes
    x_min, y_min, x_max, y_max = bboxes[:, 0], bboxes[:, 1], bboxes[:, 2], bboxes[:, 3]
    corners = np.array([[x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max]]).transpose(
        2,
        0,
        1,
    )  # Shape: (num_bboxes, 4, 2)

    # Transform all corners at once
    transformed_corners = skimage.transform.matrix_transform(corners.reshape(-1, 2), matrix.params)
    transformed_corners = transformed_corners.reshape(-1, 4, 2)

    # Compute new bounding boxes
    new_x_min = np.min(transformed_corners[:, :, 0], axis=1)
    new_x_max = np.max(transformed_corners[:, :, 0], axis=1)
    new_y_min = np.min(transformed_corners[:, :, 1], axis=1)
    new_y_max = np.max(transformed_corners[:, :, 1], axis=1)

    return np.column_stack([new_x_min, new_y_min, new_x_max, new_y_max, bboxes[:, 4:]])
def bboxes_d4 (bboxes, group_member) [view source on GitHub]

Applies a D_4 symmetry group transformation to bounding boxes.

The function transforms bounding boxes according to the specified group member from the D_4 group. These transformations include rotations and reflections, specified to work on an image's bounding boxes given its dimensions.

Parameters:

Name Type Description
bboxes np.ndarray

A numpy array of bounding boxes with shape (num_bboxes, 4+). Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).

group_member D4Type

A string identifier for the D_4 group transformation to apply. Valid values are 'e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't'.

Returns:

Type Description
np.ndarray

A numpy array of transformed bounding boxes with the same shape as the input.

Exceptions:

Type Description
ValueError

If an invalid group member is specified.

Examples:

  • Applying a 90-degree rotation: bboxes_d4(bboxes, 'r90') rotates the bounding boxes 90 degrees counterclockwise.
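
A runnable sketch, assuming the module path implied by the source reference below and normalized box coordinates:

Python
import numpy as np

from albumentations.augmentations.geometric.functional import bboxes_d4

bboxes = np.array([[0.1, 0.2, 0.4, 0.5]])  # normalized [x_min, y_min, x_max, y_max]
rotated = bboxes_d4(bboxes, "r90")  # 90-degree counterclockwise rotation
flipped = bboxes_d4(bboxes, "h")    # horizontal flip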
Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def bboxes_d4(
    bboxes: np.ndarray,
    group_member: D4Type,
) -> np.ndarray:
    """Applies a `D_4` symmetry group transformation to a bounding box.

    The function transforms a bounding box according to the specified group member from the `D_4` group.
    These transformations include rotations and reflections, specified to work on an image's bounding box given
    its dimensions.

    Parameters:
    -  bboxes: A numpy array of bounding boxes with shape (num_bboxes, 4+).
                Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).
    - group_member (D4Type): A string identifier for the `D_4` group transformation to apply.
        Valid values are 'e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't'.

    Returns:
    - BoxInternalType: The transformed bounding box.

    Raises:
    - ValueError: If an invalid group member is specified.

    Examples:
    - Applying a 90-degree rotation:
      `bbox_d4((10, 20, 110, 120), 'r90')`
      This would rotate the bounding box 90 degrees within a 100x100 image.
    """
    transformations = {
        "e": lambda x: x,  # Identity transformation
        "r90": lambda x: bboxes_rot90(x, 1),  # Rotate 90 degrees
        "r180": lambda x: bboxes_rot90(x, 2),  # Rotate 180 degrees
        "r270": lambda x: bboxes_rot90(x, 3),  # Rotate 270 degrees
        "v": lambda x: bboxes_vflip(x),  # Vertical flip
        "hvt": lambda x: bboxes_transpose(bboxes_rot90(x, 2)),  # Reflect over anti-diagonal
        "h": lambda x: bboxes_hflip(x),  # Horizontal flip
        "t": lambda x: bboxes_transpose(x),  # Transpose (reflect over main diagonal)
    }

    # Execute the appropriate transformation
    if group_member in transformations:
        return transformations[group_member](bboxes)

    raise ValueError(f"Invalid group member: {group_member}")
def bboxes_flip (bboxes, d) [view source on GitHub]

Flip bounding boxes either vertically, horizontally, or both, depending on the value of d.

Parameters:

Name Type Description
bboxes np.ndarray

A numpy array of bounding boxes with shape (num_bboxes, 4+). Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).

d int

Flip dimension: 0 for vertical flip, 1 for horizontal flip, -1 for both (vertical and horizontal).

Returns:

Type Description
np.ndarray

A numpy array of flipped bounding boxes with the same shape as the input.

Exceptions:

Type Description
ValueError

if value of d is not -1, 0 or 1.
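
A small sketch of the d convention, assuming the module path implied by the source reference below and normalized box coordinates:

Python
import numpy as np

from albumentations.augmentations.geometric.functional import bboxes_flip

bboxes = np.array([[0.1, 0.2, 0.4, 0.5]])
bboxes_flip(bboxes, 0)   # vertical flip:   y range becomes [0.5, 0.8]
bboxes_flip(bboxes, 1)   # horizontal flip: x range becomes [0.6, 0.9]
bboxes_flip(bboxes, -1)  # both flips, equivalent to a 180-degree rotation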

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def bboxes_flip(bboxes: np.ndarray, d: int) -> np.ndarray:
    """Flip a bounding box either vertically, horizontally or both depending on the value of `d`.

    Args:
        bboxes: A numpy array of bounding boxes with shape (num_bboxes, 4+).
                Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).
        d: dimension. 0 for vertical flip, 1 for horizontal, -1 for transpose

    Returns:
        A bounding box `(x_min, y_min, x_max, y_max)`.

    Raises:
        ValueError: if value of `d` is not -1, 0 or 1.

    """
    if d == 0:
        return bboxes_vflip(bboxes)
    if d == 1:
        return bboxes_hflip(bboxes)
    if d == -1:
        bboxes = bboxes_hflip(bboxes)
        return bboxes_vflip(bboxes)

    raise ValueError(f"Invalid d value {d}. Valid values are -1, 0 and 1")
def bboxes_hflip (bboxes) [view source on GitHub]

Flip bounding boxes horizontally around the y-axis.

Parameters:

Name Type Description
bboxes np.ndarray

A numpy array of bounding boxes with shape (num_bboxes, 4+). Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).

Returns:

Type Description
np.ndarray

A numpy array of horizontally flipped bounding boxes with the same shape as input.

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def bboxes_hflip(bboxes: np.ndarray) -> np.ndarray:
    """Flip bounding boxes horizontally around the y-axis.

    Args:
        bboxes: A numpy array of bounding boxes with shape (num_bboxes, 4+).
                Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).

    Returns:
        np.ndarray: A numpy array of horizontally flipped bounding boxes with the same shape as input.
    """
    flipped_bboxes = bboxes.copy()
    flipped_bboxes[:, 0] = 1 - bboxes[:, 2]  # new x_min = 1 - x_max
    flipped_bboxes[:, 2] = 1 - bboxes[:, 0]  # new x_max = 1 - x_min

    return flipped_bboxes
def bboxes_rot90 (bboxes, factor) [view source on GitHub]

Rotates bounding boxes by 90 degrees CCW (see np.rot90)

Parameters:

Name Type Description
bboxes np.ndarray

A numpy array of bounding boxes with shape (num_bboxes, 4+). Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).

factor int

Number of CCW rotations. Must be in set {0, 1, 2, 3}. See np.rot90.

Returns:

Type Description
np.ndarray

A numpy array of rotated bounding boxes with the same shape as input.

Exceptions:

Type Description
ValueError

If factor is not in set {0, 1, 2, 3}.
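
A small sketch, assuming the module path implied by the source reference below and normalized box coordinates:

Python
import numpy as np

from albumentations.augmentations.geometric.functional import bboxes_rot90

bboxes = np.array([[0.1, 0.2, 0.4, 0.5]])
rotated = bboxes_rot90(bboxes, 1)
# One CCW rotation maps [x_min, y_min, x_max, y_max] to [y_min, 1 - x_max, y_max, 1 - x_min],
# i.e. [[0.2, 0.6, 0.5, 0.9]] here.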

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def bboxes_rot90(bboxes: np.ndarray, factor: int) -> np.ndarray:
    """Rotates bounding boxes by 90 degrees CCW (see np.rot90)

    Args:
        bboxes: A numpy array of bounding boxes with shape (num_bboxes, 4+).
                Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).
        factor: Number of CCW rotations. Must be in set {0, 1, 2, 3} See np.rot90.

    Returns:
        np.ndarray: A numpy array of rotated bounding boxes with the same shape as input.

    Raises:
        ValueError: If factor is not in set {0, 1, 2, 3}.
    """
    if factor not in {0, 1, 2, 3}:
        raise ValueError("Parameter factor must be in set {0, 1, 2, 3}")

    if factor == 0:
        return bboxes

    rotated_bboxes = bboxes.copy()
    x_min, y_min, x_max, y_max = bboxes[:, 0], bboxes[:, 1], bboxes[:, 2], bboxes[:, 3]

    if factor == 1:
        rotated_bboxes[:, 0] = y_min
        rotated_bboxes[:, 1] = 1 - x_max
        rotated_bboxes[:, 2] = y_max
        rotated_bboxes[:, 3] = 1 - x_min
    elif factor == ROT90_180_FACTOR:
        rotated_bboxes[:, 0] = 1 - x_max
        rotated_bboxes[:, 1] = 1 - y_max
        rotated_bboxes[:, 2] = 1 - x_min
        rotated_bboxes[:, 3] = 1 - y_min
    elif factor == ROT90_270_FACTOR:
        rotated_bboxes[:, 0] = 1 - y_max
        rotated_bboxes[:, 1] = x_min
        rotated_bboxes[:, 2] = 1 - y_min
        rotated_bboxes[:, 3] = x_max

    return rotated_bboxes
def bboxes_rotate (bboxes, angle, method, image_shape) [view source on GitHub]

Rotates bounding boxes by angle degrees.

Parameters:

Name Type Description
bboxes np.ndarray

A numpy array of bounding boxes with shape (num_bboxes, 4+). Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).

angle float

Angle of rotation in degrees.

method Literal['largest_box', 'ellipse']

Rotation method used. Should be one of: "largest_box", "ellipse".

image_shape tuple[int, int]

Image shape (height, width).

Returns:

Type Description
np.ndarray

A numpy array of rotated bounding boxes with the same shape as input.
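
A usage sketch, assuming the module path implied by the source reference below and normalized box coordinates:

Python
import numpy as np

from albumentations.augmentations.geometric.functional import bboxes_rotate

bboxes = np.array([[0.2, 0.2, 0.4, 0.4]])
largest = bboxes_rotate(bboxes, 45, "largest_box", (100, 100))
ellipse = bboxes_rotate(bboxes, 45, "ellipse", (100, 100))
# The "ellipse" method usually yields a tighter box for rotated content.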

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def bboxes_rotate(
    bboxes: np.ndarray,
    angle: float,
    method: Literal["largest_box", "ellipse"],
    image_shape: tuple[int, int],
) -> np.ndarray:
    """Rotates bounding boxes by angle degrees.

    Args:
        bboxes: A numpy array of bounding boxes with shape (num_bboxes, 4+).
                Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).
        angle: Angle of rotation in degrees.
        method: Rotation method used. Should be one of: "largest_box", "ellipse".
        image_shape: Image shape (height, width).

    Returns:
        np.ndarray: A numpy array of rotated bounding boxes with the same shape as input.

    Reference:
        https://arxiv.org/abs/2109.13488
    """
    bboxes = bboxes.copy()
    rows, cols = image_shape[:2]
    x_min, y_min, x_max, y_max = bboxes[:, 0], bboxes[:, 1], bboxes[:, 2], bboxes[:, 3]
    scale = cols / float(rows)

    if method == "largest_box":
        x = np.column_stack([x_min, x_max, x_max, x_min]) - 0.5
        y = np.column_stack([y_min, y_min, y_max, y_max]) - 0.5
    elif method == "ellipse":
        w = (x_max - x_min) / 2
        h = (y_max - y_min) / 2
        data = np.arange(0, 360, dtype=np.float32)
        x = w[:, np.newaxis] * np.sin(np.radians(data)) + (w + x_min - 0.5)[:, np.newaxis]
        y = h[:, np.newaxis] * np.cos(np.radians(data)) + (h + y_min - 0.5)[:, np.newaxis]
    else:
        raise ValueError(f"Method {method} is not a valid rotation method.")

    angle_rad = np.deg2rad(angle)
    x_t = (np.cos(angle_rad) * x * scale + np.sin(angle_rad) * y) / scale
    y_t = -np.sin(angle_rad) * x * scale + np.cos(angle_rad) * y
    x_t = x_t + 0.5
    y_t = y_t + 0.5

    # Update the first 4 columns of the input array
    bboxes[:, 0] = np.min(x_t, axis=1)
    bboxes[:, 1] = np.min(y_t, axis=1)
    bboxes[:, 2] = np.max(x_t, axis=1)
    bboxes[:, 3] = np.max(y_t, axis=1)

    return bboxes
def bboxes_transpose (bboxes) [view source on GitHub]

Transpose bounding boxes by swapping x and y coordinates.

Parameters:

Name Type Description
bboxes np.ndarray

A numpy array of bounding boxes with shape (num_bboxes, 4+). Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).

Returns:

Type Description
np.ndarray

A numpy array of transposed bounding boxes with the same shape as input.

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def bboxes_transpose(bboxes: np.ndarray) -> np.ndarray:
    """Transpose bounding boxes by swapping x and y coordinates.

    Args:
        bboxes: A numpy array of bounding boxes with shape (num_bboxes, 4+).
                Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).

    Returns:
        np.ndarray: A numpy array of transposed bounding boxes with the same shape as input.
    """
    transposed_bboxes = bboxes.copy()
    transposed_bboxes[:, [0, 1, 2, 3]] = bboxes[:, [1, 0, 3, 2]]

    return transposed_bboxes
def bboxes_vflip (bboxes) [view source on GitHub]

Flip bounding boxes vertically around the x-axis.

Parameters:

Name Type Description
bboxes np.ndarray

A numpy array of bounding boxes with shape (num_bboxes, 4+). Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).

Returns:

Type Description
np.ndarray

A numpy array of vertically flipped bounding boxes with the same shape as input.

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def bboxes_vflip(bboxes: np.ndarray) -> np.ndarray:
    """Flip bounding boxes vertically around the x-axis.

    Args:
        bboxes: A numpy array of bounding boxes with shape (num_bboxes, 4+).
                Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).

    Returns:
        np.ndarray: A numpy array of vertically flipped bounding boxes with the same shape as input.
    """
    flipped_bboxes = bboxes.copy()
    flipped_bboxes[:, 1] = 1 - bboxes[:, 3]  # new y_min = 1 - y_max
    flipped_bboxes[:, 3] = 1 - bboxes[:, 1]  # new y_max = 1 - y_min

    return flipped_bboxes
def calculate_affine_transform_padding (matrix, image_shape) [view source on GitHub]

Calculate the necessary padding for an affine transformation to avoid empty spaces.
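
A small sketch, assuming the module path implied by the source reference below; a pure translation to the right needs padding only on the left:

Python
import skimage.transform

from albumentations.augmentations.geometric.functional import calculate_affine_transform_padding

matrix = skimage.transform.AffineTransform(translation=(30, 0))
pad_left, pad_right, pad_top, pad_bottom = calculate_affine_transform_padding(matrix, (100, 100))
# Expected result: (30, 0, 0, 0). Shifting content 30 px to the right leaves a
# 30 px gap on the left that padding has to fill.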

Source code in albumentations/augmentations/geometric/functional.py
Python
def calculate_affine_transform_padding(
    matrix: skimage.transform.ProjectiveTransform,
    image_shape: Sequence[int],
) -> tuple[int, int, int, int]:
    """Calculate the necessary padding for an affine transformation to avoid empty spaces."""
    height, width = image_shape[:2]

    # Check for identity transform
    if is_identity_matrix(matrix):
        return (0, 0, 0, 0)

    # Original corners
    corners = np.array([[0, 0], [width, 0], [width, height], [0, height]])

    # Transform corners
    transformed_corners = matrix(corners)

    # Find box that includes both original and transformed corners
    all_corners = np.vstack((corners, transformed_corners))
    min_x, min_y = all_corners.min(axis=0)
    max_x, max_y = all_corners.max(axis=0)
    # Compute the inverse transform
    inverse_matrix = matrix.inverse

    # Apply inverse transform to all corners of the bounding box
    bbox_corners = np.array([[min_x, min_y], [max_x, min_y], [max_x, max_y], [min_x, max_y]])

    inverse_corners = inverse_matrix(bbox_corners)

    min_x, min_y = inverse_corners.min(axis=0)
    max_x, max_y = inverse_corners.max(axis=0)

    pad_left = max(0, math.ceil(0 - min_x))
    pad_right = max(0, math.ceil(max_x - width))
    pad_top = max(0, math.ceil(0 - min_y))
    pad_bottom = max(0, math.ceil(max_y - height))

    return pad_left, pad_right, pad_top, pad_bottom
def calculate_grid_dimensions (image_shape, num_grid_xy) [view source on GitHub]

Calculate the dimensions of a grid overlay on an image using vectorized operations.

This function divides an image into a grid and calculates the dimensions (x_min, y_min, x_max, y_max) for each cell in the grid without using loops.

Parameters:

Name Type Description
image_shape tuple[int, int]

The shape of the image (height, width).

num_grid_xy tuple[int, int]

The number of grid cells in (x, y) directions.

Returns:

Type Description
np.ndarray

A 3D array of shape (grid_height, grid_width, 4) where each element is [x_min, y_min, x_max, y_max] for a grid cell.

Examples:

Python
>>> image_shape = (100, 150)
>>> num_grid_xy = (3, 2)
>>> dimensions = calculate_grid_dimensions(image_shape, num_grid_xy)
>>> print(dimensions.shape)
(2, 3, 4)
>>> print(dimensions[0, 0])  # First cell
[  0   0  50  50]
Source code in albumentations/augmentations/geometric/functional.py
Python
def calculate_grid_dimensions(
    image_shape: tuple[int, int],
    num_grid_xy: tuple[int, int],
) -> np.ndarray:
    """Calculate the dimensions of a grid overlay on an image using vectorized operations.

    This function divides an image into a grid and calculates the dimensions
    (x_min, y_min, x_max, y_max) for each cell in the grid without using loops.

    Args:
        image_shape (tuple[int, int]): The shape of the image (height, width).
        num_grid_xy (tuple[int, int]): The number of grid cells in (x, y) directions.

    Returns:
        np.ndarray: A 3D array of shape (grid_height, grid_width, 4) where each element
                    is [x_min, y_min, x_max, y_max] for a grid cell.

    Example:
        >>> image_shape = (100, 150)
        >>> num_grid_xy = (3, 2)
        >>> dimensions = calculate_grid_dimensions(image_shape, num_grid_xy)
        >>> print(dimensions.shape)
        (2, 3, 4)
        >>> print(dimensions[0, 0])  # First cell
        [  0   0  50  50]
    """
    num_grid_yx = np.array(num_grid_xy[::-1])  # Reverse to match image_shape order
    image_shape = np.array(image_shape)

    square_shape = image_shape // num_grid_yx
    last_square_shape = image_shape - (square_shape * (num_grid_yx - 1))

    grid_width, grid_height = num_grid_xy

    # Create meshgrid for row and column indices
    col_indices, row_indices = np.meshgrid(np.arange(grid_width), np.arange(grid_height))

    # Calculate x_min and y_min
    x_min = col_indices * square_shape[1]
    y_min = row_indices * square_shape[0]

    # Calculate x_max and y_max
    x_max = np.where(col_indices == grid_width - 1, x_min + last_square_shape[1], x_min + square_shape[1])
    y_max = np.where(row_indices == grid_height - 1, y_min + last_square_shape[0], y_min + square_shape[0])

    # Stack the dimensions
    return np.stack([x_min, y_min, x_max, y_max], axis=-1).astype(np.int16)
def center (image_shape) [view source on GitHub]

Calculate the center coordinates of an image. Used by images, masks and keypoints.

Parameters:

Name Type Description
image_shape tuple[int, int]

The shape of the image.

Returns:

Type Description
tuple[float, float]

The center coordinates.

Source code in albumentations/augmentations/geometric/functional.py
Python
def center(image_shape: tuple[int, int]) -> tuple[float, float]:
    """Calculate the center coordinates if image. Used by images, masks and keypoints.

    Args:
        image_shape (tuple[int, int]): The shape of the image.

    Returns:
        tuple[float, float]: The center coordinates.
    """
    height, width = image_shape[:2]
    return width / 2 - 0.5, height / 2 - 0.5
def center_bbox (image_shape) [view source on GitHub]

Calculate the center coordinates of an image for bounding boxes.

Parameters:

Name Type Description
image_shape tuple[int, int]

The shape of the image.

Returns:

Type Description
tuple[float, float]

The center coordinates.
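
The two helpers differ only by the half-pixel offset; a small sketch assuming the module path implied by the source reference below:

Python
from albumentations.augmentations.geometric.functional import center, center_bbox

image_shape = (101, 201)  # (height, width)
center(image_shape)       # (100.0, 50.0): pixel-grid center, shifted by -0.5
center_bbox(image_shape)  # (100.5, 50.5): geometric center used for bounding boxes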

Source code in albumentations/augmentations/geometric/functional.py
Python
def center_bbox(image_shape: tuple[int, int]) -> tuple[float, float]:
    """Calculate the center coordinates for of image for bounding boxes.

    Args:
        image_shape (tuple[int, int]): The shape of the image.

    Returns:
        tuple[float, float]: The center coordinates.
    """
    height, width = image_shape[:2]
    return width / 2, height / 2
def compute_transformed_image_bounds (matrix, image_shape) [view source on GitHub]

Compute the bounds of an image after applying an affine transformation.

Parameters:

Name Type Description
matrix skimage.transform.ProjectiveTransform

The affine transformation matrix.

image_shape tuple[int, int]

The shape of the image as (height, width).

Returns:

Type Description
tuple[np.ndarray, np.ndarray]

A tuple containing: - min_coords: An array with the minimum x and y coordinates. - max_coords: An array with the maximum x and y coordinates.
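
A small sketch, assuming the module path implied by the source reference below:

Python
import skimage.transform

from albumentations.augmentations.geometric.functional import compute_transformed_image_bounds

matrix = skimage.transform.AffineTransform(scale=(2, 2))
min_coords, max_coords = compute_transformed_image_bounds(matrix, (100, 100))
# For a 2x upscale of a 100x100 image: min_coords == [0, 0], max_coords == [200, 200].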

Source code in albumentations/augmentations/geometric/functional.py
Python
def compute_transformed_image_bounds(
    matrix: skimage.transform.ProjectiveTransform,
    image_shape: tuple[int, int],
) -> tuple[np.ndarray, np.ndarray]:
    """Compute the bounds of an image after applying an affine transformation.

    Args:
        matrix (skimage.transform.ProjectiveTransform): The affine transformation matrix.
        image_shape (tuple[int, int]): The shape of the image as (height, width).

    Returns:
        tuple[np.ndarray, np.ndarray]: A tuple containing:
            - min_coords: An array with the minimum x and y coordinates.
            - max_coords: An array with the maximum x and y coordinates.
    """
    height, width = image_shape[:2]

    # Define the corners of the image
    corners = np.array([[0, 0], [width, 0], [width, height], [0, height]])

    # Transform the corners
    transformed_corners = matrix(corners)

    # Calculate the bounding box of the transformed corners
    min_coords = np.floor(transformed_corners.min(axis=0)).astype(int)
    max_coords = np.ceil(transformed_corners.max(axis=0)).astype(int)

    return min_coords, max_coords
def create_affine_transformation_matrix (translate, shear, scale, rotate, shift) [view source on GitHub]

Create an affine transformation matrix combining translation, shear, scale, and rotation.

This function creates a complex affine transformation by combining multiple transformations in a specific order. The transformations are applied as follows: 1. Shift to top-left: Moves the center of transformation to (0, 0) 2. Apply main transformations: scale, rotation, shear, and translation 3. Shift back to center: Moves the center of transformation back to its original position

The order of these transformations is crucial as matrix multiplications are not commutative.

Parameters:

Name Type Description
translate TranslateDict

Translation in x and y directions. Keys: 'x', 'y'. Values: translation amounts in pixels.

shear ShearDict

Shear in x and y directions. Keys: 'x', 'y'. Values: shear angles in degrees.

scale ScaleDict

Scale factors for x and y directions. Keys: 'x', 'y'. Values: scale factors (1.0 means no scaling).

rotate float

Rotation angle in degrees. Positive values rotate counter-clockwise.

shift tuple[float, float]

Shift to apply before and after transformations. Typically the image center (width/2, height/2).

Returns:

Type Description
skimage.transform.ProjectiveTransform

The resulting affine transformation matrix.

Note

  • All angle inputs (rotate, shear) are in degrees and are converted to radians internally.
  • The order of transformations in the AffineTransform is: scale, rotation, shear, translation.
  • The resulting transformation can be applied to coordinates using the __call__ method.
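
A usage sketch, assuming the module path implied by the source reference below; using the center helper documented above as the shift is one plausible choice rather than a requirement:

Python
import numpy as np

from albumentations.augmentations.geometric.functional import center, create_affine_transformation_matrix

image_shape = (100, 200)  # (height, width)
matrix = create_affine_transformation_matrix(
    translate={"x": 10, "y": 0},
    shear={"x": 0, "y": 0},
    scale={"x": 1.0, "y": 1.0},
    rotate=90,
    shift=center(image_shape),
)

# The returned transform can be applied directly to (x, y) coordinates.
points = np.array([[0.0, 0.0], [199.0, 99.0]])
transformed_points = matrix(points)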
Source code in albumentations/augmentations/geometric/functional.py
Python
def create_affine_transformation_matrix(
    translate: TranslateDict,
    shear: ShearDict,
    scale: ScaleDict,
    rotate: float,
    shift: tuple[float, float],
) -> skimage.transform.ProjectiveTransform:
    """Create an affine transformation matrix combining translation, shear, scale, and rotation.

    This function creates a complex affine transformation by combining multiple transformations
    in a specific order. The transformations are applied as follows:
    1. Shift to top-left: Moves the center of transformation to (0, 0)
    2. Apply main transformations: scale, rotation, shear, and translation
    3. Shift back to center: Moves the center of transformation back to its original position

    The order of these transformations is crucial as matrix multiplications are not commutative.

    Args:
        translate (TranslateDict): Translation in x and y directions.
                                   Keys: 'x', 'y'. Values: translation amounts in pixels.
        shear (ShearDict): Shear in x and y directions.
                           Keys: 'x', 'y'. Values: shear angles in degrees.
        scale (ScaleDict): Scale factors for x and y directions.
                           Keys: 'x', 'y'. Values: scale factors (1.0 means no scaling).
        rotate (float): Rotation angle in degrees. Positive values rotate counter-clockwise.
        shift (tuple[float, float]): Shift to apply before and after transformations.
                                     Typically the image center (width/2, height/2).

    Returns:
        skimage.transform.ProjectiveTransform: The resulting affine transformation matrix.

    Note:
        - All angle inputs (rotate, shear) are in degrees and are converted to radians internally.
        - The order of transformations in the AffineTransform is: scale, rotation, shear, translation.
        - The resulting transformation can be applied to coordinates using the __call__ method.
    """
    # Step 1: Create matrix to shift to top-left
    # This moves the center of transformation to (0, 0)
    matrix_to_topleft = skimage.transform.SimilarityTransform(translation=[shift[0], shift[1]])

    # Step 2: Create matrix for main transformations
    # This includes scaling, translation, rotation, and x-shear
    matrix_transforms = skimage.transform.AffineTransform(
        scale=(scale["x"], scale["y"]),
        rotation=np.deg2rad(rotate),
        shear=(np.deg2rad(shear["x"]), np.deg2rad(shear["y"])),  # Both x and y shear
        translation=(translate["x"], translate["y"]),
    )

    # Step 3: Create matrix to shift back to center
    # This is the inverse of the top-left shift
    matrix_to_center = matrix_to_topleft.inverse

    # Combine all transformations
    # The order is important: transformations are applied from right to left
    return (
        matrix_to_center  # 3. Shift back to original center
        + matrix_transforms  # 2. Apply main transformations
        + matrix_to_topleft  # 1. Shift to top-left
    )
def d4 (img, group_member) [view source on GitHub]

Applies a D_4 symmetry group transformation to an image array.

This function manipulates an image using transformations such as rotations and flips, corresponding to the D_4 dihedral group symmetry operations. Each transformation is identified by a unique group member code.

Parameters:

Name Type Description
img np.ndarray

The input image array to transform.

group_member D4Type

A string identifier indicating the specific transformation to apply. Valid codes include:
  • 'e': Identity (no transformation).
  • 'r90': Rotate 90 degrees counterclockwise.
  • 'r180': Rotate 180 degrees.
  • 'r270': Rotate 270 degrees counterclockwise.
  • 'v': Vertical flip.
  • 'hvt': Transpose over the second (anti-)diagonal.
  • 'h': Horizontal flip.
  • 't': Transpose (reflect over the main diagonal).

Returns:

Type Description
np.ndarray

The transformed image array.

Exceptions:

Type Description
ValueError

If an invalid group member is specified.

Examples:

  • Rotating an image by 90 degrees: transformed_image = d4(original_image, 'r90')
  • Applying a horizontal flip to an image: transformed_image = d4(original_image, 'h')
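
A runnable sketch, assuming the module path implied by the source reference below:

Python
import numpy as np

from albumentations.augmentations.geometric.functional import d4

image = np.arange(12, dtype=np.uint8).reshape(3, 4)
assert d4(image, "e").shape == (3, 4)    # identity keeps the shape
assert d4(image, "r90").shape == (4, 3)  # 90-degree rotation swaps height and width
assert d4(image, "h").shape == (3, 4)    # horizontal flip keeps the shape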
Source code in albumentations/augmentations/geometric/functional.py
Python
def d4(img: np.ndarray, group_member: D4Type) -> np.ndarray:
    """Applies a `D_4` symmetry group transformation to an image array.

    This function manipulates an image using transformations such as rotations and flips,
    corresponding to the `D_4` dihedral group symmetry operations.
    Each transformation is identified by a unique group member code.

    Parameters:
    - img (np.ndarray): The input image array to transform.
    - group_member (D4Type): A string identifier indicating the specific transformation to apply. Valid codes include:
      - 'e': Identity (no transformation).
      - 'r90': Rotate 90 degrees counterclockwise.
      - 'r180': Rotate 180 degrees.
      - 'r270': Rotate 270 degrees counterclockwise.
      - 'v': Vertical flip.
      - 'hvt': Transpose over second diagonal
      - 'h': Horizontal flip.
      - 't': Transpose (reflect over the main diagonal).

    Returns:
    - np.ndarray: The transformed image array.

    Raises:
    - ValueError: If an invalid group member is specified.

    Examples:
    - Rotating an image by 90 degrees:
      `transformed_image = d4(original_image, 'r90')`
    - Applying a horizontal flip to an image:
      `transformed_image = d4(original_image, 'h')`
    """
    transformations = {
        "e": lambda x: x,  # Identity transformation
        "r90": lambda x: rot90(x, 1),  # Rotate 90 degrees
        "r180": lambda x: rot90(x, 2),  # Rotate 180 degrees
        "r270": lambda x: rot90(x, 3),  # Rotate 270 degrees
        "v": vflip,  # Vertical flip
        "hvt": lambda x: transpose(rot90(x, 2)),  # Reflect over anti-diagonal
        "h": hflip,  # Horizontal flip
        "t": transpose,  # Transpose (reflect over main diagonal)
    }

    # Execute the appropriate transformation
    if group_member in transformations:
        return transformations[group_member](img)

    raise ValueError(f"Invalid group member: {group_member}")
def distort_image (image, generated_mesh, interpolation) [view source on GitHub]

Apply perspective distortion to an image based on a generated mesh.

This function applies a perspective transformation to each cell of the image defined by the generated mesh. The distortion is applied using OpenCV's perspective transformation and blending techniques.

Parameters:

Name Type Description
image np.ndarray

The input image to be distorted. Can be a 2D grayscale image or a 3D color image.

generated_mesh np.ndarray

A 2D array where each row represents a quadrilateral cell as [x1, y1, x2, y2, dst_x1, dst_y1, dst_x2, dst_y2, dst_x3, dst_y3, dst_x4, dst_y4]. The first four values define the source rectangle, and the last eight values define the destination quadrilateral.

interpolation int

Interpolation method to be used in the perspective transformation. Should be one of the OpenCV interpolation flags (e.g., cv2.INTER_LINEAR).

Returns:

Type Description
np.ndarray

The distorted image with the same shape and dtype as the input image.

Note

  • The function preserves the channel dimension of the input image.
  • Each cell of the generated mesh is transformed independently and then blended into the output image.
  • The distortion is applied using perspective transformation, which allows for more complex distortions compared to affine transformations.

Examples:

Python
>>> image = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
>>> mesh = np.array([[0, 0, 50, 50, 5, 5, 45, 5, 45, 45, 5, 45]])
>>> distorted = distort_image(image, mesh, cv2.INTER_LINEAR)
>>> distorted.shape
(100, 100, 3)
Source code in albumentations/augmentations/geometric/functional.py
Python
@preserve_channel_dim
def distort_image(image: np.ndarray, generated_mesh: np.ndarray, interpolation: int) -> np.ndarray:
    """Apply perspective distortion to an image based on a generated mesh.

    This function applies a perspective transformation to each cell of the image defined by the
    generated mesh. The distortion is applied using OpenCV's perspective transformation and
    blending techniques.

    Args:
        image (np.ndarray): The input image to be distorted. Can be a 2D grayscale image or a
                            3D color image.
        generated_mesh (np.ndarray): A 2D array where each row represents a quadrilateral cell
                                    as [x1, y1, x2, y2, dst_x1, dst_y1, dst_x2, dst_y2, dst_x3, dst_y3, dst_x4, dst_y4].
                                    The first four values define the source rectangle, and the last eight values
                                    define the destination quadrilateral.
        interpolation (int): Interpolation method to be used in the perspective transformation.
                             Should be one of the OpenCV interpolation flags (e.g., cv2.INTER_LINEAR).

    Returns:
        np.ndarray: The distorted image with the same shape and dtype as the input image.

    Note:
        - The function preserves the channel dimension of the input image.
        - Each cell of the generated mesh is transformed independently and then blended into the output image.
        - The distortion is applied using perspective transformation, which allows for more complex
          distortions compared to affine transformations.

    Example:
        >>> image = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
        >>> mesh = np.array([[0, 0, 50, 50, 5, 5, 45, 5, 45, 45, 5, 45]])
        >>> distorted = distort_image(image, mesh, cv2.INTER_LINEAR)
        >>> distorted.shape
        (100, 100, 3)
    """
    distorted_image = np.zeros_like(image)

    for mesh in generated_mesh:
        # Extract source rectangle and destination quadrilateral
        x1, y1, x2, y2 = mesh[:4]  # Source rectangle
        dst_quad = mesh[4:].reshape(4, 2)  # Destination quadrilateral

        # Convert source rectangle to quadrilateral
        src_quad = np.array(
            [
                [x1, y1],  # Top-left
                [x2, y1],  # Top-right
                [x2, y2],  # Bottom-right
                [x1, y2],  # Bottom-left
            ],
            dtype=np.float32,
        )

        # Calculate Perspective transformation matrix
        perspective_mat = cv2.getPerspectiveTransform(src_quad, dst_quad)

        # Apply Perspective transformation
        warped = cv2.warpPerspective(image, perspective_mat, (image.shape[1], image.shape[0]), flags=interpolation)

        # Create mask for the transformed region
        mask = np.zeros(image.shape[:2], dtype=np.uint8)
        cv2.fillConvexPoly(mask, np.int32(dst_quad), 255)

        # Copy only the warped quadrilateral area to the output image
        distorted_image = cv2.copyTo(warped, mask, distorted_image)

    return distorted_image
def elastic_transform (img, alpha, sigma, interpolation, border_mode, value=None, random_state=None, approximate=False, same_dxdy=False) [view source on GitHub]

Apply an elastic transformation to an image.
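
A usage sketch, assuming the module path implied by the source reference below; the parameter values are illustrative only:

Python
import cv2
import numpy as np

from albumentations.augmentations.geometric.functional import elastic_transform

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
warped = elastic_transform(
    image,
    alpha=50,
    sigma=5,
    interpolation=cv2.INTER_LINEAR,
    border_mode=cv2.BORDER_CONSTANT,
    value=0,
    random_state=np.random.RandomState(42),
    approximate=False,
)
assert warped.shape == image.shape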

Source code in albumentations/augmentations/geometric/functional.py
Python
@preserve_channel_dim
def elastic_transform(
    img: np.ndarray,
    alpha: float,
    sigma: float,
    interpolation: int,
    border_mode: int,
    value: ColorType | None = None,
    random_state: np.random.RandomState | None = None,
    approximate: bool = False,
    same_dxdy: bool = False,
) -> np.ndarray:
    """Apply an elastic transformation to an image."""
    if approximate:
        return elastic_transform_approximate(
            img,
            alpha,
            sigma,
            interpolation,
            border_mode,
            value,
            random_state,
            same_dxdy,
        )
    return elastic_transform_precise(
        img,
        alpha,
        sigma,
        interpolation,
        border_mode,
        value,
        random_state,
        same_dxdy,
    )
def elastic_transform_approximate (img, alpha, sigma, interpolation, border_mode, value, random_state, same_dxdy=False) [view source on GitHub]

Apply an approximate elastic transformation to an image.

Source code in albumentations/augmentations/geometric/functional.py
Python
def elastic_transform_approximate(
    img: np.ndarray,
    alpha: float,
    sigma: float,
    interpolation: int,
    border_mode: int,
    value: ColorType | None,
    random_state: np.random.RandomState | None,
    same_dxdy: bool = False,
) -> np.ndarray:
    """Apply an approximate elastic transformation to an image."""
    return elastic_transform_helper(
        img,
        alpha,
        sigma,
        interpolation,
        border_mode,
        value,
        random_state,
        same_dxdy,
        kernel_size=(17, 17),
    )
def elastic_transform_precise (img, alpha, sigma, interpolation, border_mode, value, random_state, same_dxdy=False) [view source on GitHub]

Apply a precise elastic transformation to an image.

This function applies an elastic deformation to the input image using a precise method. The transformation involves creating random displacement fields, smoothing them using Gaussian blur with adaptive kernel size, and then remapping the image according to the smoothed displacement fields.

Parameters:

Name Type Description
img np.ndarray

Input image.

alpha float

Scaling factor for the random displacement fields.

sigma float

Standard deviation for Gaussian blur applied to the displacement fields.

interpolation int

Interpolation method to be used (e.g., cv2.INTER_LINEAR).

border_mode int

Pixel extrapolation method (e.g., cv2.BORDER_CONSTANT).

value ColorType | None

Border value if border_mode is cv2.BORDER_CONSTANT.

random_state np.random.RandomState | None

Random state for reproducibility.

same_dxdy bool

If True, use the same displacement field for both x and y directions.

Returns:

Type Description
np.ndarray

Transformed image with precise elastic deformation applied.

Source code in albumentations/augmentations/geometric/functional.py
Python
def elastic_transform_precise(
    img: np.ndarray,
    alpha: float,
    sigma: float,
    interpolation: int,
    border_mode: int,
    value: ColorType | None,
    random_state: np.random.RandomState | None,
    same_dxdy: bool = False,
) -> np.ndarray:
    """Apply a precise elastic transformation to an image.

    This function applies an elastic deformation to the input image using a precise method.
    The transformation involves creating random displacement fields, smoothing them using Gaussian
    blur with adaptive kernel size, and then remapping the image according to the smoothed displacement fields.

    Args:
        img (np.ndarray): Input image.
        alpha (float): Scaling factor for the random displacement fields.
        sigma (float): Standard deviation for Gaussian blur applied to the displacement fields.
        interpolation (int): Interpolation method to be used (e.g., cv2.INTER_LINEAR).
        border_mode (int): Pixel extrapolation method (e.g., cv2.BORDER_CONSTANT).
        value (ColorType | None): Border value if border_mode is cv2.BORDER_CONSTANT.
        random_state (np.random.RandomState | None): Random state for reproducibility.
        same_dxdy (bool, optional): If True, use the same displacement field for both x and y directions.

    Returns:
        np.ndarray: Transformed image with precise elastic deformation applied.
    """
    return elastic_transform_helper(
        img,
        alpha,
        sigma,
        interpolation,
        border_mode,
        value,
        random_state,
        same_dxdy,
        kernel_size=(0, 0),
    )
def find_keypoint (position, distance_map, threshold, inverted) [view source on GitHub]

Determine if a valid keypoint can be found at the given position.

Source code in albumentations/augmentations/geometric/functional.py
Python
def find_keypoint(
    position: tuple[int, int],
    distance_map: np.ndarray,
    threshold: float | None,
    inverted: bool,
) -> tuple[float, float] | None:
    """Determine if a valid keypoint can be found at the given position."""
    y, x = position
    value = distance_map[y, x]
    if not inverted and threshold is not None and value >= threshold:
        return None
    if inverted and threshold is not None and value <= threshold:
        return None
    return float(x), float(y)
def flip_bboxes (bboxes, flip_horizontal=False, flip_vertical=False, image_shape=(0, 0)) [view source on GitHub]

Flip bounding boxes horizontally and/or vertically.

Parameters:

Name Type Description
bboxes np.ndarray

Array of bounding boxes with shape (n, m) where each row is [x_min, y_min, x_max, y_max, ...].

flip_horizontal bool

Whether to flip horizontally.

flip_vertical bool

Whether to flip vertically.

image_shape tuple[int, int]

Shape of the image as (height, width).

Returns:

Type Description
np.ndarray

Flipped bounding boxes.
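
A small sketch, assuming the module path implied by the source reference below; note that this helper works in absolute pixel coordinates rather than the normalized coordinates used by bboxes_hflip and bboxes_vflip above:

Python
import numpy as np

from albumentations.augmentations.geometric.functional import flip_bboxes

bboxes = np.array([[10, 20, 50, 60]])  # pixel coordinates on a 100x200 (height x width) image
flipped = flip_bboxes(bboxes, flip_horizontal=True, image_shape=(100, 200))
# x range becomes [200 - 50, 200 - 10] = [150, 190]; y coordinates are unchanged.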

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def flip_bboxes(
    bboxes: np.ndarray,
    flip_horizontal: bool = False,
    flip_vertical: bool = False,
    image_shape: tuple[int, int] = (0, 0),
) -> np.ndarray:
    """Flip bounding boxes horizontally and/or vertically.

    Args:
        bboxes (np.ndarray): Array of bounding boxes with shape (n, m) where each row is
            [x_min, y_min, x_max, y_max, ...].
        flip_horizontal (bool): Whether to flip horizontally.
        flip_vertical (bool): Whether to flip vertically.
        image_shape (tuple[int, int]): Shape of the image as (height, width).

    Returns:
        np.ndarray: Flipped bounding boxes.
    """
    rows, cols = image_shape[:2]
    flipped_bboxes = bboxes.copy()
    if flip_horizontal:
        flipped_bboxes[:, [0, 2]] = cols - flipped_bboxes[:, [2, 0]]
    if flip_vertical:
        flipped_bboxes[:, [1, 3]] = rows - flipped_bboxes[:, [3, 1]]
    return flipped_bboxes
def from_distance_maps (distance_maps, inverted, if_not_found_coords=None, threshold=None) [view source on GitHub]

Convert distance maps back to keypoints coordinates.

This function is the inverse of to_distance_maps. It takes distance maps generated for a set of keypoints and reconstructs the original keypoint coordinates. The function supports both regular and inverted distance maps, and can handle cases where keypoints are not found or fall outside a specified threshold.

Parameters:

Name Type Description
distance_maps np.ndarray

A 3D numpy array of shape (height, width, nb_keypoints) containing distance maps for each keypoint. Each channel represents the distance map for one keypoint.

inverted bool

If True, treats the distance maps as inverted (where higher values indicate closer proximity to keypoints). If False, treats them as regular distance maps (where lower values indicate closer proximity).

if_not_found_coords Sequence[int] | dict[str, Any] | None

Coordinates to use for keypoints that are not found or fall outside the threshold. Can be: - None: Drop keypoints that are not found. - Sequence of two integers: Use these as (x, y) coordinates for not found keypoints. - Dict with 'x' and 'y' keys: Use these values for not found keypoints. Defaults to None.

threshold float | None

A threshold value to determine valid keypoints. For inverted maps, values >= threshold are considered valid. For regular maps, values <= threshold are considered valid. If None, all keypoints are considered valid. Defaults to None.

Returns:

Type Description
np.ndarray

A 2D numpy array of shape (nb_keypoints, 2) containing the (x, y) coordinates of the reconstructed keypoints. If drop_if_not_found is True (derived from if_not_found_coords), the output may have fewer rows than input keypoints.

Exceptions:

Type Description
ValueError

If the input distance_maps is not a 3D array.

Notes

  • The function uses vectorized operations for improved performance, especially with large numbers of keypoints.
  • When threshold is None, all keypoints are considered valid, and if_not_found_coords is not used.
  • The function assumes that the input distance maps are properly normalized and scaled according to the original image dimensions.

Examples:

Python
>>> distance_maps = np.random.rand(100, 100, 3)  # 3 keypoints
>>> inverted = True
>>> if_not_found_coords = [0, 0]
>>> threshold = 0.5
>>> keypoints = from_distance_maps(distance_maps, inverted, if_not_found_coords, threshold)
>>> print(keypoints.shape)
(3, 2)
Source code in albumentations/augmentations/geometric/functional.py
Python
def from_distance_maps(
    distance_maps: np.ndarray,
    inverted: bool,
    if_not_found_coords: Sequence[int] | dict[str, Any] | None = None,
    threshold: float | None = None,
) -> np.ndarray:
    """Convert distance maps back to keypoints coordinates.

    This function is the inverse of `to_distance_maps`. It takes distance maps generated for a set of keypoints
    and reconstructs the original keypoint coordinates. The function supports both regular and inverted distance maps,
    and can handle cases where keypoints are not found or fall outside a specified threshold.

    Args:
        distance_maps (np.ndarray): A 3D numpy array of shape (height, width, nb_keypoints) containing
            distance maps for each keypoint. Each channel represents the distance map for one keypoint.
        inverted (bool): If True, treats the distance maps as inverted (where higher values indicate
            closer proximity to keypoints). If False, treats them as regular distance maps (where lower
            values indicate closer proximity).
        if_not_found_coords (Sequence[int] | dict[str, Any] | None, optional): Coordinates to use for
            keypoints that are not found or fall outside the threshold. Can be:
            - None: Drop keypoints that are not found.
            - Sequence of two integers: Use these as (x, y) coordinates for not found keypoints.
            - Dict with 'x' and 'y' keys: Use these values for not found keypoints.
            Defaults to None.
        threshold (float | None, optional): A threshold value to determine valid keypoints. For inverted
            maps, values >= threshold are considered valid. For regular maps, values <= threshold are
            considered valid. If None, all keypoints are considered valid. Defaults to None.

    Returns:
        np.ndarray: A 2D numpy array of shape (nb_keypoints, 2) containing the (x, y) coordinates
        of the reconstructed keypoints. If `drop_if_not_found` is True (derived from if_not_found_coords),
        the output may have fewer rows than input keypoints.

    Raises:
        ValueError: If the input `distance_maps` is not a 3D array.

    Notes:
        - The function uses vectorized operations for improved performance, especially with large numbers of keypoints.
        - When `threshold` is None, all keypoints are considered valid, and `if_not_found_coords` is not used.
        - The function assumes that the input distance maps are properly normalized and scaled according to the
          original image dimensions.

    Example:
        >>> distance_maps = np.random.rand(100, 100, 3)  # 3 keypoints
        >>> inverted = True
        >>> if_not_found_coords = [0, 0]
        >>> threshold = 0.5
        >>> keypoints = from_distance_maps(distance_maps, inverted, if_not_found_coords, threshold)
        >>> print(keypoints.shape)
        (3, 2)
    """
    if distance_maps.ndim != NUM_MULTI_CHANNEL_DIMENSIONS:
        msg = f"Expected three-dimensional input, got {distance_maps.ndim} dimensions and shape {distance_maps.shape}."
        raise ValueError(msg)
    height, width, nb_keypoints = distance_maps.shape

    drop_if_not_found, if_not_found_x, if_not_found_y = validate_if_not_found_coords(if_not_found_coords)

    # Find the indices of max/min values for all keypoints at once
    if inverted:
        hitidx_flat = np.argmax(distance_maps.reshape(height * width, nb_keypoints), axis=0)
    else:
        hitidx_flat = np.argmin(distance_maps.reshape(height * width, nb_keypoints), axis=0)

    # Convert flat indices to 2D coordinates
    hitidx_y, hitidx_x = np.unravel_index(hitidx_flat, (height, width))

    # Create keypoints array
    keypoints = np.column_stack((hitidx_x, hitidx_y)).astype(float)

    if threshold is not None:
        # Check threshold condition
        if inverted:
            valid_mask = distance_maps[hitidx_y, hitidx_x, np.arange(nb_keypoints)] >= threshold
        else:
            valid_mask = distance_maps[hitidx_y, hitidx_x, np.arange(nb_keypoints)] <= threshold

        if not drop_if_not_found:
            # Replace invalid keypoints with if_not_found_coords
            keypoints[~valid_mask] = [if_not_found_x, if_not_found_y]
        else:
            # Keep only valid keypoints
            return keypoints[valid_mask]

    return keypoints
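
Beyond the synthetic example above, here is a minimal round-trip sketch that pairs this function with to_distance_maps (documented further below); the import path follows the source location shown above.

Python
import numpy as np
from albumentations.augmentations.geometric.functional import to_distance_maps, from_distance_maps

keypoints = np.array([[10.0, 20.0], [55.0, 70.0]])  # (x, y) pairs
maps = to_distance_maps(keypoints, image_shape=(100, 100), inverted=True)

# With inverted maps, the maximum of each channel marks the keypoint location,
# so the original integer coordinates are recovered exactly.
recovered = from_distance_maps(maps, inverted=True)
print(recovered)  # [[10. 20.]
                  #  [55. 70.]]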
def generate_distorted_grid_polygons (dimensions, magnitude) [view source on GitHub]

Generate distorted grid polygons based on input dimensions and magnitude.

This function creates a grid of polygons and applies random distortions to the internal vertices, while keeping the boundary vertices fixed. The distortion is applied consistently across shared vertices to avoid gaps or overlaps in the resulting grid.

Parameters:

Name Type Description
dimensions np.ndarray

A 3D array of shape (grid_height, grid_width, 4) where each element is [x_min, y_min, x_max, y_max] representing the dimensions of a grid cell.

magnitude int

Maximum pixel-wise displacement for distortion. The actual displacement will be randomly chosen in the range [-magnitude, magnitude].

Returns:

Type Description
np.ndarray

A 2D array of shape (total_cells, 8) where each row represents a distorted polygon as [x1, y1, x2, y1, x2, y2, x1, y2]. The total_cells is equal to grid_height * grid_width.

Note

  • Only internal grid points are distorted; boundary points remain fixed.
  • The function ensures consistent distortion across shared vertices of adjacent cells.
  • The distortion is applied to the following points of each internal cell:
    • Bottom-right of the cell above and to the left
    • Bottom-left of the cell above
    • Top-right of the cell to the left
    • Top-left of the current cell
  • Each square in the diagram below represents a cell, and the X marks indicate the coordinates where displacement occurs:

    +--+--+--+--+
    |  |  |  |  |
    +--X--X--X--+
    |  |  |  |  |
    +--X--X--X--+
    |  |  |  |  |
    +--X--X--X--+
    |  |  |  |  |
    +--+--+--+--+
  • For each X, the coordinates of the left, right, top, and bottom edges in the four adjacent cells are displaced.

Examples:

Python
>>> dimensions = np.array([[[0, 0, 50, 50], [50, 0, 100, 50]],
...                        [[0, 50, 50, 100], [50, 50, 100, 100]]])
>>> distorted = generate_distorted_grid_polygons(dimensions, magnitude=10)
>>> distorted.shape
(4, 8)
Source code in albumentations/augmentations/geometric/functional.py
Python
def generate_distorted_grid_polygons(
    dimensions: np.ndarray,
    magnitude: int,
) -> np.ndarray:
    """Generate distorted grid polygons based on input dimensions and magnitude.

    This function creates a grid of polygons and applies random distortions to the internal vertices,
    while keeping the boundary vertices fixed. The distortion is applied consistently across shared
    vertices to avoid gaps or overlaps in the resulting grid.

    Args:
        dimensions (np.ndarray): A 3D array of shape (grid_height, grid_width, 4) where each element
                                 is [x_min, y_min, x_max, y_max] representing the dimensions of a grid cell.
        magnitude (int): Maximum pixel-wise displacement for distortion. The actual displacement
                         will be randomly chosen in the range [-magnitude, magnitude].

    Returns:
        np.ndarray: A 2D array of shape (total_cells, 8) where each row represents a distorted polygon
                    as [x1, y1, x2, y1, x2, y2, x1, y2]. The total_cells is equal to grid_height * grid_width.

    Note:
        - Only internal grid points are distorted; boundary points remain fixed.
        - The function ensures consistent distortion across shared vertices of adjacent cells.
        - The distortion is applied to the following points of each internal cell:
            * Bottom-right of the cell above and to the left
            * Bottom-left of the cell above
            * Top-right of the cell to the left
            * Top-left of the current cell
        - Each square represents a cell, and the X marks indicate the coordinates where displacement occurs.
            +--+--+--+--+
            |  |  |  |  |
            +--X--X--X--+
            |  |  |  |  |
            +--X--X--X--+
            |  |  |  |  |
            +--X--X--X--+
            |  |  |  |  |
            +--+--+--+--+
        - For each X, the coordinates of the left, right, top, and bottom edges
          in the four adjacent cells are displaced.

    Example:
        >>> dimensions = np.array([[[0, 0, 50, 50], [50, 0, 100, 50]],
        ...                        [[0, 50, 50, 100], [50, 50, 100, 100]]])
        >>> distorted = generate_distorted_grid_polygons(dimensions, magnitude=10)
        >>> distorted.shape
        (4, 8)
    """
    grid_height, grid_width = dimensions.shape[:2]
    total_cells = grid_height * grid_width

    # Initialize polygons
    polygons = np.zeros((total_cells, 8), dtype=np.float32)
    polygons[:, 0:2] = dimensions.reshape(-1, 4)[:, [0, 1]]  # x1, y1
    polygons[:, 2:4] = dimensions.reshape(-1, 4)[:, [2, 1]]  # x2, y1
    polygons[:, 4:6] = dimensions.reshape(-1, 4)[:, [2, 3]]  # x2, y2
    polygons[:, 6:8] = dimensions.reshape(-1, 4)[:, [0, 3]]  # x1, y2

    # Generate displacements for internal grid points only
    internal_points_height, internal_points_width = grid_height - 1, grid_width - 1
    displacements = random_utils.randint(
        -magnitude,
        magnitude + 1,
        size=(internal_points_height, internal_points_width, 2),
    ).astype(np.float32)

    # Apply displacements to internal polygon vertices
    for i in range(1, grid_height):
        for j in range(1, grid_width):
            dx, dy = displacements[i - 1, j - 1]

            # Bottom-right of cell (i-1, j-1)
            polygons[(i - 1) * grid_width + (j - 1), 4:6] += [dx, dy]

            # Bottom-left of cell (i-1, j)
            polygons[(i - 1) * grid_width + j, 6:8] += [dx, dy]

            # Top-right of cell (i, j-1)
            polygons[i * grid_width + (j - 1), 2:4] += [dx, dy]

            # Top-left of cell (i, j)
            polygons[i * grid_width + j, 0:2] += [dx, dy]

    return polygons
def generate_reflected_bboxes (bboxes, grid_dims, image_shape, center_in_origin=False) [view source on GitHub]

Generate reflected bounding boxes for the entire reflection grid.

Parameters:

Name Type Description
bboxes np.ndarray

Original bounding boxes.

grid_dims dict[str, tuple[int, int]]

Grid dimensions and original position.

image_shape tuple[int, int]

Shape of the original image as (height, width).

center_in_origin bool

If True, center the grid at the origin. Default is False.

Returns:

Type Description
np.ndarray

Array of reflected and shifted bounding boxes for the entire grid.

Source code in albumentations/augmentations/geometric/functional.py
Python
def generate_reflected_bboxes(
    bboxes: np.ndarray,
    grid_dims: dict[str, tuple[int, int]],
    image_shape: tuple[int, int],
    center_in_origin: bool = False,
) -> np.ndarray:
    """Generate reflected bounding boxes for the entire reflection grid.

    Args:
        bboxes (np.ndarray): Original bounding boxes.
        grid_dims (dict[str, tuple[int, int]]): Grid dimensions and original position.
        image_shape (tuple[int, int]): Shape of the original image as (height, width).
        center_in_origin (bool): If True, center the grid at the origin. Default is False.

    Returns:
        np.ndarray: Array of reflected and shifted bounding boxes for the entire grid.
    """
    rows, cols = image_shape[:2]
    grid_rows, grid_cols = grid_dims["grid_shape"]
    original_row, original_col = grid_dims["original_position"]

    # Prepare flipped versions of bboxes
    bboxes_hflipped = flip_bboxes(bboxes, flip_horizontal=True, image_shape=image_shape)
    bboxes_vflipped = flip_bboxes(bboxes, flip_vertical=True, image_shape=image_shape)
    bboxes_hvflipped = flip_bboxes(bboxes, flip_horizontal=True, flip_vertical=True, image_shape=image_shape)

    # Shift all versions to the original position
    shift_vector = np.array([original_col * cols, original_row * rows, original_col * cols, original_row * rows])
    bboxes = shift_bboxes(bboxes, shift_vector)
    bboxes_hflipped = shift_bboxes(bboxes_hflipped, shift_vector)
    bboxes_vflipped = shift_bboxes(bboxes_vflipped, shift_vector)
    bboxes_hvflipped = shift_bboxes(bboxes_hvflipped, shift_vector)

    new_bboxes = []

    for grid_row in range(grid_rows):
        for grid_col in range(grid_cols):
            # Determine which version of bboxes to use based on grid position
            if (grid_row - original_row) % 2 == 0 and (grid_col - original_col) % 2 == 0:
                current_bboxes = bboxes
            elif (grid_row - original_row) % 2 == 0:
                current_bboxes = bboxes_hflipped
            elif (grid_col - original_col) % 2 == 0:
                current_bboxes = bboxes_vflipped
            else:
                current_bboxes = bboxes_hvflipped

            # Shift to the current grid cell
            cell_shift = np.array(
                [
                    (grid_col - original_col) * cols,
                    (grid_row - original_row) * rows,
                    (grid_col - original_col) * cols,
                    (grid_row - original_row) * rows,
                ],
            )
            shifted_bboxes = shift_bboxes(current_bboxes, cell_shift)

            new_bboxes.append(shifted_bboxes)

    result = np.vstack(new_bboxes)

    return shift_bboxes(result, -shift_vector) if center_in_origin else result
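
A hedged usage sketch, assuming the bounding boxes are given in absolute pixel coordinates and that grid_dims follows the format returned by get_pad_grid_dimensions (documented below).

Python
import numpy as np
from albumentations.augmentations.geometric.functional import generate_reflected_bboxes

bboxes = np.array([[10.0, 20.0, 30.0, 40.0]])  # one box in pixel coordinates
grid_dims = {"grid_shape": (3, 3), "original_position": (1, 1)}

reflected = generate_reflected_bboxes(bboxes, grid_dims, image_shape=(100, 100))
print(reflected.shape)  # (9, 4): one reflected/shifted copy of the box per grid cell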
def generate_reflected_keypoints (keypoints, grid_dims, image_shape, center_in_origin=False) [view source on GitHub]

Generate reflected keypoints for the entire reflection grid.

This function creates a grid of keypoints by reflecting and shifting the original keypoints. It handles both centered and non-centered grids based on the center_in_origin parameter.

Parameters:

Name Type Description
keypoints np.ndarray

Original keypoints array of shape (N, 4+), where N is the number of keypoints, and each keypoint is represented by at least 4 values (x, y, angle, scale, ...).

grid_dims dict[str, tuple[int, int]]

A dictionary containing grid dimensions and original position. It should have the following keys: - "grid_shape": tuple[int, int] representing (grid_rows, grid_cols) - "original_position": tuple[int, int] representing (original_row, original_col)

image_shape tuple[int, int]

Shape of the original image as (height, width).

center_in_origin bool

If True, center the grid at the origin. Default is False.

Returns:

Type Description
np.ndarray

Array of reflected and shifted keypoints for the entire grid. The shape is (N * grid_rows * grid_cols, 4+), where N is the number of original keypoints.

Note

  • The function handles keypoint flipping and shifting to create a grid of reflected keypoints.
  • It preserves the angle and scale information of the keypoints during transformations.
  • The resulting grid can be either centered at the origin or positioned based on the original grid.
Source code in albumentations/augmentations/geometric/functional.py
Python
def generate_reflected_keypoints(
    keypoints: np.ndarray,
    grid_dims: dict[str, tuple[int, int]],
    image_shape: tuple[int, int],
    center_in_origin: bool = False,
) -> np.ndarray:
    """Generate reflected keypoints for the entire reflection grid.

    This function creates a grid of keypoints by reflecting and shifting the original keypoints.
    It handles both centered and non-centered grids based on the `center_in_origin` parameter.

    Args:
        keypoints (np.ndarray): Original keypoints array of shape (N, 4+), where N is the number of keypoints,
                                and each keypoint is represented by at least 4 values (x, y, angle, scale, ...).
        grid_dims (dict[str, tuple[int, int]]): A dictionary containing grid dimensions and original position.
            It should have the following keys:
            - "grid_shape": tuple[int, int] representing (grid_rows, grid_cols)
            - "original_position": tuple[int, int] representing (original_row, original_col)
        image_shape (tuple[int, int]): Shape of the original image as (height, width).
        center_in_origin (bool, optional): If True, center the grid at the origin. Default is False.

    Returns:
        np.ndarray: Array of reflected and shifted keypoints for the entire grid. The shape is
                    (N * grid_rows * grid_cols, 4+), where N is the number of original keypoints.

    Note:
        - The function handles keypoint flipping and shifting to create a grid of reflected keypoints.
        - It preserves the angle and scale information of the keypoints during transformations.
        - The resulting grid can be either centered at the origin or positioned based on the original grid.
    """
    grid_rows, grid_cols = grid_dims["grid_shape"]
    original_row, original_col = grid_dims["original_position"]

    # Prepare flipped versions of keypoints
    keypoints_hflipped = flip_keypoints(keypoints, flip_horizontal=True, image_shape=image_shape)
    keypoints_vflipped = flip_keypoints(keypoints, flip_vertical=True, image_shape=image_shape)
    keypoints_hvflipped = flip_keypoints(keypoints, flip_horizontal=True, flip_vertical=True, image_shape=image_shape)

    rows, cols = image_shape[:2]

    # Shift all versions to the original position
    shift_vector = np.array([original_col * cols, original_row * rows, 0, 0])  # Only shift x and y
    keypoints = shift_keypoints(keypoints, shift_vector)
    keypoints_hflipped = shift_keypoints(keypoints_hflipped, shift_vector)
    keypoints_vflipped = shift_keypoints(keypoints_vflipped, shift_vector)
    keypoints_hvflipped = shift_keypoints(keypoints_hvflipped, shift_vector)

    new_keypoints = []

    for grid_row in range(grid_rows):
        for grid_col in range(grid_cols):
            # Determine which version of keypoints to use based on grid position
            if (grid_row - original_row) % 2 == 0 and (grid_col - original_col) % 2 == 0:
                current_keypoints = keypoints
            elif (grid_row - original_row) % 2 == 0:
                current_keypoints = keypoints_hflipped
            elif (grid_col - original_col) % 2 == 0:
                current_keypoints = keypoints_vflipped
            else:
                current_keypoints = keypoints_hvflipped

            # Shift to the current grid cell
            cell_shift = np.array([(grid_col - original_col) * cols, (grid_row - original_row) * rows, 0, 0])
            shifted_keypoints = shift_keypoints(current_keypoints, cell_shift)

            new_keypoints.append(shifted_keypoints)

    result = np.vstack(new_keypoints)

    return shift_keypoints(result, -shift_vector) if center_in_origin else result
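
The keypoint counterpart follows the same pattern; a brief sketch under the same assumptions as the bbox example above.

Python
import numpy as np
from albumentations.augmentations.geometric.functional import generate_reflected_keypoints

keypoints = np.array([[10.0, 20.0, 0.0, 1.0]])  # (x, y, angle, scale)
grid_dims = {"grid_shape": (3, 3), "original_position": (1, 1)}

reflected = generate_reflected_keypoints(keypoints, grid_dims, image_shape=(100, 100))
print(reflected.shape)  # (9, 4): one copy of the keypoint per grid cell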
def get_pad_grid_dimensions (pad_top, pad_bottom, pad_left, pad_right, image_shape) [view source on GitHub]

Calculate the dimensions of the grid needed for reflection padding and the position of the original image.

Parameters:

Name Type Description
pad_top int

Number of pixels to pad above the image.

pad_bottom int

Number of pixels to pad below the image.

pad_left int

Number of pixels to pad to the left of the image.

pad_right int

Number of pixels to pad to the right of the image.

image_shape tuple[int, int]

Shape of the original image as (height, width).

Returns:

Type Description
dict[str, tuple[int, int]]

A dictionary containing: - 'grid_shape': A tuple (grid_rows, grid_cols) where: - grid_rows (int): Number of times the image needs to be repeated vertically. - grid_cols (int): Number of times the image needs to be repeated horizontally. - 'original_position': A tuple (original_row, original_col) where: - original_row (int): Row index of the original image in the grid. - original_col (int): Column index of the original image in the grid.

Source code in albumentations/augmentations/geometric/functional.py
Python
def get_pad_grid_dimensions(
    pad_top: int,
    pad_bottom: int,
    pad_left: int,
    pad_right: int,
    image_shape: tuple[int, int],
) -> dict[str, tuple[int, int]]:
    """Calculate the dimensions of the grid needed for reflection padding and the position of the original image.

    Args:
        pad_top (int): Number of pixels to pad above the image.
        pad_bottom (int): Number of pixels to pad below the image.
        pad_left (int): Number of pixels to pad to the left of the image.
        pad_right (int): Number of pixels to pad to the right of the image.
        image_shape (tuple[int, int]): Shape of the original image as (height, width).

    Returns:
        dict[str, tuple[int, int]]: A dictionary containing:
            - 'grid_shape': A tuple (grid_rows, grid_cols) where:
                - grid_rows (int): Number of times the image needs to be repeated vertically.
                - grid_cols (int): Number of times the image needs to be repeated horizontally.
            - 'original_position': A tuple (original_row, original_col) where:
                - original_row (int): Row index of the original image in the grid.
                - original_col (int): Column index of the original image in the grid.
    """
    rows, cols = image_shape[:2]

    grid_rows = 1 + math.ceil(pad_top / rows) + math.ceil(pad_bottom / rows)
    grid_cols = 1 + math.ceil(pad_left / cols) + math.ceil(pad_right / cols)
    original_row = math.ceil(pad_top / rows)
    original_col = math.ceil(pad_left / cols)

    return {"grid_shape": (grid_rows, grid_cols), "original_position": (original_row, original_col)}
def keypoints_affine (keypoints, matrix, image_shape, scale, mode) [view source on GitHub]

Apply an affine transformation to keypoints.

This function transforms keypoints using the given affine transformation matrix. It handles reflection padding if necessary, updates coordinates, angles, and scales.

Parameters:

Name Type Description
keypoints np.ndarray

Array of keypoints with shape (N, 4+) where N is the number of keypoints. Each keypoint is represented as [x, y, angle, scale, ...].

matrix skimage.transform.ProjectiveTransform

The affine transformation matrix.

image_shape tuple[int, int]

Shape of the image (height, width).

scale dict[str, Any]

Dictionary containing scale factors for x and y directions. Expected keys are 'x' and 'y'.

mode int

Border mode for handling keypoints near image edges. Use cv2.BORDER_REFLECT_101, cv2.BORDER_REFLECT, etc.

Returns:

Type Description
np.ndarray

Transformed keypoints array with the same shape as input.

Notes

  • The function applies reflection padding if the mode is in REFLECT_BORDER_MODES.
  • Coordinates (x, y) are transformed using the affine matrix.
  • Angles are adjusted based on the rotation component of the affine transformation.
  • Scales are multiplied by the maximum of x and y scale factors.
  • The @angle_2pi_range decorator ensures angles remain in the [0, 2π] range.

Examples:

Python
>>> keypoints = np.array([[100, 100, 0, 1]])
>>> matrix = skimage.transform.ProjectiveTransform(...)
>>> scale = {'x': 1.5, 'y': 1.2}
>>> transformed_keypoints = keypoints_affine(keypoints, matrix, (480, 640), scale, cv2.BORDER_REFLECT_101)
Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
@angle_2pi_range
def keypoints_affine(
    keypoints: np.ndarray,
    matrix: skimage.transform.ProjectiveTransform,
    image_shape: tuple[int, int],
    scale: dict[str, Any],
    mode: int,
) -> np.ndarray:
    """Apply an affine transformation to keypoints.

    This function transforms keypoints using the given affine transformation matrix.
    It handles reflection padding if necessary, updates coordinates, angles, and scales.

    Args:
        keypoints (np.ndarray): Array of keypoints with shape (N, 4+) where N is the number of keypoints.
                                Each keypoint is represented as [x, y, angle, scale, ...].
        matrix (skimage.transform.ProjectiveTransform): The affine transformation matrix.
        image_shape (tuple[int, int]): Shape of the image (height, width).
        scale (dict[str, Any]): Dictionary containing scale factors for x and y directions.
                                Expected keys are 'x' and 'y'.
        mode (int): Border mode for handling keypoints near image edges.
                    Use cv2.BORDER_REFLECT_101, cv2.BORDER_REFLECT, etc.

    Returns:
        np.ndarray: Transformed keypoints array with the same shape as input.

    Notes:
        - The function applies reflection padding if the mode is in REFLECT_BORDER_MODES.
        - Coordinates (x, y) are transformed using the affine matrix.
        - Angles are adjusted based on the rotation component of the affine transformation.
        - Scales are multiplied by the maximum of x and y scale factors.
        - The @angle_2pi_range decorator ensures angles remain in the [0, 2π] range.

    Example:
        >>> keypoints = np.array([[100, 100, 0, 1]])
        >>> matrix = skimage.transform.ProjectiveTransform(...)
        >>> scale = {'x': 1.5, 'y': 1.2}
        >>> transformed_keypoints = keypoints_affine(keypoints, matrix, (480, 640), scale, cv2.BORDER_REFLECT_101)
    """
    keypoints = keypoints.copy().astype(np.float32)

    if is_identity_matrix(matrix):
        return keypoints

    if mode in REFLECT_BORDER_MODES:
        # Step 1: Compute affine transform padding
        pad_left, pad_right, pad_top, pad_bottom = calculate_affine_transform_padding(matrix, image_shape)
        grid_dimensions = get_pad_grid_dimensions(pad_top, pad_bottom, pad_left, pad_right, image_shape)
        keypoints = generate_reflected_keypoints(keypoints, grid_dimensions, image_shape, center_in_origin=True)

    # Extract x, y coordinates
    xy = keypoints[:, :2]

    # Transform x, y coordinates
    xy_transformed = cv2.transform(xy.reshape(-1, 1, 2), matrix.params[:2]).squeeze()

    # Calculate angle adjustment
    angle_adjustment = rotation2d_matrix_to_euler_angles(matrix.params[:2], y_up=False)

    # Update angles
    keypoints[:, 2] = keypoints[:, 2] + angle_adjustment

    # Update scales
    max_scale = max(scale["x"], scale["y"])

    keypoints[:, 3] *= max_scale

    # Update x, y coordinates
    keypoints[:, :2] = xy_transformed

    return keypoints
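
A more concrete variant of the example above, sketched with an explicit skimage.transform.AffineTransform and cv2.BORDER_CONSTANT (which is not a reflection border mode, so no reflection padding is triggered).

Python
import cv2
import numpy as np
import skimage.transform
from albumentations.augmentations.geometric.functional import keypoints_affine

keypoints = np.array([[100.0, 100.0, 0.0, 1.0]])  # (x, y, angle, scale)
matrix = skimage.transform.AffineTransform(scale=(1.5, 1.2), rotation=np.deg2rad(30))
scale = {"x": 1.5, "y": 1.2}

transformed = keypoints_affine(keypoints, matrix, (480, 640), scale, cv2.BORDER_CONSTANT)
print(transformed.shape)  # (1, 4); coordinates, angle, and scale are all updated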
def keypoints_d4 (keypoints, group_member, image_shape, ** params) [view source on GitHub]

Applies a D_4 symmetry group transformation to a keypoint.

This function adjusts a keypoint's coordinates according to the specified D_4 group transformation, which includes rotations and reflections suitable for image processing tasks. These transformations account for the dimensions of the image to ensure the keypoint remains within its boundaries.

Parameters:

Name Type Description
keypoints np.ndarray

An array of keypoints with shape (N, 4+) in the format (x, y, angle, scale, ...).

group_member D4Type

A string identifier for the D_4 group transformation to apply. Valid values are 'e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't'.

image_shape tuple[int, int]

The shape of the image.

**params

Not used.

Returns:

Type Description
np.ndarray

The transformed keypoints with the same shape as the input.

Exceptions:

Type Description
ValueError

If an invalid group member is specified, indicating that the specified transformation does not exist.

Examples:

  • Rotating keypoints by 90 degrees CCW in a 100x100 image: keypoints_d4(np.array([[50, 30, 0, 1]]), 'r90', (100, 100)) moves the keypoint at (50, 30) to (30, 49). See also the sketch after the source listing below.
Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def keypoints_d4(
    keypoints: np.ndarray,
    group_member: D4Type,
    image_shape: tuple[int, int],
    **params: Any,
) -> np.ndarray:
    """Applies a `D_4` symmetry group transformation to a keypoint.

    This function adjusts a keypoint's coordinates according to the specified `D_4` group transformation,
    which includes rotations and reflections suitable for image processing tasks. These transformations account
    for the dimensions of the image to ensure the keypoint remains within its boundaries.

    Parameters:
    - keypoints (np.ndarray): An array of keypoints with shape (N, 4+) in the format (x, y, angle, scale, ...).
    -group_member (D4Type): A string identifier for the `D_4` group transformation to apply.
        Valid values are 'e', 'r90', 'r180', 'r270', 'v', 'hv', 'h', 't'.
    - image_shape (tuple[int, int]): The shape of the image.
    - params (Any): Not used

    Returns:
    - KeypointInternalType: The transformed keypoint.

    Raises:
    - ValueError: If an invalid group member is specified, indicating that the specified transformation does not exist.

    Examples:
    - Rotating a keypoint by 90 degrees in a 100x100 image:
      `keypoint_d4((50, 30), 'r90', 100, 100)`
      This would move the keypoint from (50, 30) to (70, 50) assuming standard coordinate transformations.
    """
    rows, cols = image_shape[:2]
    transformations = {
        "e": lambda x: x,  # Identity transformation
        "r90": lambda x: keypoints_rot90(x, 1, image_shape),  # Rotate 90 degrees
        "r180": lambda x: keypoints_rot90(x, 2, image_shape),  # Rotate 180 degrees
        "r270": lambda x: keypoints_rot90(x, 3, image_shape),  # Rotate 270 degrees
        "v": lambda x: keypoints_vflip(x, rows),  # Vertical flip
        "hvt": lambda x: keypoints_transpose(keypoints_rot90(x, 2, image_shape)),  # Reflect over anti diagonal
        "h": lambda x: keypoints_hflip(x, cols),  # Horizontal flip
        "t": lambda x: keypoints_transpose(x),  # Transpose (reflect over main diagonal)
    }
    # Execute the appropriate transformation
    if group_member in transformations:
        return transformations[group_member](keypoints)

    raise ValueError(f"Invalid group member: {group_member}")
def keypoints_flip (keypoints, d, image_shape) [view source on GitHub]

Flip a keypoint either vertically, horizontally or both depending on the value of d.

Parameters:

Name Type Description
keypoints np.ndarray

An array of keypoints (x, y, angle, scale).

d int

Flip code. Must be -1, 0 or 1: 0 - vertical flip, 1 - horizontal flip, -1 - both vertical and horizontal flip.

image_shape tuple[int, int]

A tuple of image shape (height, width, channels).

Returns:

Type Description
np.ndarray

An array of flipped keypoints (x, y, angle, scale).

Exceptions:

Type Description
ValueError

if value of d is not -1, 0 or 1.

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
@angle_2pi_range
def keypoints_flip(keypoints: np.ndarray, d: int, image_shape: tuple[int, int]) -> np.ndarray:
    """Flip a keypoint either vertically, horizontally or both depending on the value of `d`.

    Args:
        keypoints: A keypoints `(x, y, angle, scale)`.
        d: Number of flip. Must be -1, 0 or 1:
            * 0 - vertical flip,
            * 1 - horizontal flip,
            * -1 - vertical and horizontal flip.
        image_shape: A tuple of image shape `(height, width, channels)`.

    Returns:
        A keypoint `(x, y, angle, scale)`.

    Raises:
        ValueError: if value of `d` is not -1, 0 or 1.

    """
    rows, cols = image_shape[:2]

    if d == 0:
        return keypoints_vflip(keypoints, rows)
    if d == 1:
        return keypoints_hflip(keypoints, cols)
    if d == -1:
        keypoints = keypoints_hflip(keypoints, cols)
        return keypoints_vflip(keypoints, rows)

    raise ValueError(f"Invalid d value {d}. Valid values are -1, 0 and 1")
def keypoints_hflip (keypoints, cols) [view source on GitHub]

Flip keypoints horizontally around the y-axis.

Parameters:

Name Type Description
keypoints np.ndarray

A numpy array of shape (N, 4+) where each row represents a keypoint (x, y, angle, scale, ...).

cols int

Image width.

Returns:

Type Description
np.ndarray

An array of flipped keypoints with the same shape as the input.

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
@angle_2pi_range
def keypoints_hflip(keypoints: np.ndarray, cols: int) -> np.ndarray:
    """Flip keypoints horizontally around the y-axis.

    Args:
        keypoints: A numpy array of shape (N, 4+) where each row represents a keypoint (x, y, angle, scale, ...).
        cols: Image width.

    Returns:
        np.ndarray: An array of flipped keypoints with the same shape as the input.
    """
    flipped_keypoints = keypoints.copy().astype(np.float32)

    # Flip x-coordinates
    flipped_keypoints[:, 0] = (cols - 1) - keypoints[:, 0]

    # Adjust angles
    flipped_keypoints[:, 2] = np.pi - keypoints[:, 2]

    return flipped_keypoints
def keypoints_rot90 (keypoints, factor, image_shape) [view source on GitHub]

Rotate keypoints by 90 degrees counter-clockwise (CCW) a specified number of times.

Parameters:

Name Type Description
keypoints np.ndarray

An array of keypoints with shape (N, 4+) in the format (x, y, angle, scale, ...).

factor int

The number of 90 degree CCW rotations to apply. Must be in the range [0, 3].

image_shape tuple[int, int]

The shape of the image (height, width).

Returns:

Type Description
np.ndarray

The rotated keypoints with the same shape as the input.

Exceptions:

Type Description
ValueError

If the factor is not in the set {0, 1, 2, 3}.

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
@angle_2pi_range
def keypoints_rot90(
    keypoints: np.ndarray,
    factor: int,
    image_shape: tuple[int, int],
) -> np.ndarray:
    """Rotate keypoints by 90 degrees counter-clockwise (CCW) a specified number of times.

    Args:
        keypoints (np.ndarray): An array of keypoints with shape (N, 4+) in the format (x, y, angle, scale, ...).
        factor (int): The number of 90 degree CCW rotations to apply. Must be in the range [0, 3].
        image_shape (tuple[int, int]): The shape of the image (height, width).

    Returns:
        np.ndarray: The rotated keypoints with the same shape as the input.

    Raises:
        ValueError: If the factor is not in the set {0, 1, 2, 3}.
    """
    if factor not in {0, 1, 2, 3}:
        raise ValueError("Parameter factor must be in set {0, 1, 2, 3}")

    if factor == 0:
        return keypoints

    height, width = image_shape[:2]
    rotated_keypoints = keypoints.copy().astype(np.float32)

    x, y, angle = keypoints[:, 0], keypoints[:, 1], keypoints[:, 2]

    if factor == 1:
        rotated_keypoints[:, 0] = y
        rotated_keypoints[:, 1] = width - 1 - x
        rotated_keypoints[:, 2] = angle - np.pi / 2
    elif factor == ROT90_180_FACTOR:
        rotated_keypoints[:, 0] = width - 1 - x
        rotated_keypoints[:, 1] = height - 1 - y
        rotated_keypoints[:, 2] = angle - np.pi
    elif factor == ROT90_270_FACTOR:
        rotated_keypoints[:, 0] = height - 1 - y
        rotated_keypoints[:, 1] = x
        rotated_keypoints[:, 2] = angle + np.pi / 2

    return rotated_keypoints
def keypoints_rotate (keypoints, angle, image_shape) [view source on GitHub]

Rotate keypoints by a specified angle.

Parameters:

Name Type Description
keypoints np.ndarray

An array of keypoints with shape (N, 4+) in the format (x, y, angle, scale, ...).

angle float

The angle by which to rotate the keypoints, in degrees.

image_shape tuple[int, int]

The shape of the image the keypoints belong to (height, width).


Returns:

Type Description
np.ndarray

The rotated keypoints with the same shape as the input.

Note

The rotation is performed around the center of the image.

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
@angle_2pi_range
def keypoints_rotate(
    keypoints: np.ndarray,
    angle: float,
    image_shape: tuple[int, int],
) -> np.ndarray:
    """Rotate keypoints by a specified angle.

    Args:
        keypoints (np.ndarray): An array of keypoints with shape (N, 4+) in the format (x, y, angle, scale, ...).
        angle (float): The angle by which to rotate the keypoints, in degrees.
        image_shape (tuple[int, int]): The shape of the image the keypoints belong to (height, width).
        **params: Additional parameters.

    Returns:
        np.ndarray: The rotated keypoints with the same shape as the input.

    Note:
        The rotation is performed around the center of the image.
    """
    image_center = center(image_shape)
    matrix = cv2.getRotationMatrix2D(image_center, angle, 1.0)

    # Create a copy of the input keypoints to avoid modifying the original array
    rotated_keypoints = keypoints.copy().astype(np.float32)

    # Extract x and y coordinates
    xy = rotated_keypoints[:, :2]

    # Rotate x and y coordinates
    xy_rotated = cv2.transform(xy.reshape(-1, 1, 2), matrix).squeeze()

    # Update x and y coordinates
    rotated_keypoints[:, :2] = xy_rotated

    # Update angles
    rotated_keypoints[:, 2] += np.radians(angle)

    return rotated_keypoints
def keypoints_scale (keypoints, scale_x, scale_y) [view source on GitHub]

Scales keypoints by scale_x and scale_y.

Parameters:

Name Type Description
keypoints np.ndarray

A numpy array of keypoints with shape (N, 4+) in the format (x, y, angle, scale, ...).

scale_x float

Scale coefficient x-axis.

scale_y float

Scale coefficient y-axis.

Returns:

Type Description
np.ndarray

A numpy array of scaled keypoints with the same shape as input.

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def keypoints_scale(keypoints: np.ndarray, scale_x: float, scale_y: float) -> np.ndarray:
    """Scales keypoints by scale_x and scale_y.

    Args:
        keypoints: A numpy array of keypoints with shape (N, 4+) in the format (x, y, angle, scale, ...).
        scale_x: Scale coefficient x-axis.
        scale_y: Scale coefficient y-axis.

    Returns:
        A numpy array of scaled keypoints with the same shape as input.
    """
    # Extract x, y, angle, and scale
    x, y, angle, scale = keypoints[:, 0], keypoints[:, 1], keypoints[:, 2], keypoints[:, 3]

    # Scale x and y
    x_scaled = x * scale_x
    y_scaled = y * scale_y

    # Scale the keypoint scale by the maximum of scale_x and scale_y
    scale_scaled = scale * max(scale_x, scale_y)

    # Create the output array
    scaled_keypoints = np.column_stack([x_scaled, y_scaled, angle, scale_scaled])

    # If there are additional columns, preserve them
    if keypoints.shape[1] > NUM_KEYPOINTS_COLUMNS_IN_ALBUMENTATIONS:
        scaled_keypoints = np.column_stack(
            [scaled_keypoints, keypoints[:, NUM_KEYPOINTS_COLUMNS_IN_ALBUMENTATIONS:]],
        )

    return scaled_keypoints
def keypoints_transpose (keypoints) [view source on GitHub]

Transposes keypoints along the main diagonal.

Parameters:

Name Type Description
keypoints np.ndarray

A numpy array of shape (N, 4+) where each row represents a keypoint (x, y, angle, scale, ...).

Returns:

Type Description
np.ndarray

An array of transposed keypoints with the same shape as the input.

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
@angle_2pi_range
def keypoints_transpose(keypoints: np.ndarray) -> np.ndarray:
    """Transposes keypoints along the main diagonal.

    Args:
        keypoints: A numpy array of shape (N, 4+) where each row represents a keypoint (x, y, angle, scale, ...).

    Returns:
        np.ndarray: An array of transposed keypoints with the same shape as the input.
    """
    transposed_keypoints = keypoints.copy()

    # Swap x and y coordinates
    transposed_keypoints[:, [0, 1]] = keypoints[:, [1, 0]]

    # Adjust angles to reflect the coordinate swap
    angles = keypoints[:, 2]
    transposed_keypoints[:, 2] = np.where(angles <= np.pi, np.pi / 2 - angles, 3 * np.pi / 2 - angles)

    return transposed_keypoints
def keypoints_vflip (keypoints, rows) [view source on GitHub]

Flip keypoints vertically around the x-axis.

Parameters:

Name Type Description
keypoints np.ndarray

A numpy array of shape (N, 4+) where each row represents a keypoint (x, y, angle, scale, ...).

rows int

Image height.

Returns:

Type Description
np.ndarray

An array of flipped keypoints with the same shape as the input.

Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
@angle_2pi_range
def keypoints_vflip(keypoints: np.ndarray, rows: int) -> np.ndarray:
    """Flip keypoints vertically around the x-axis.

    Args:
        keypoints: A numpy array of shape (N, 4+) where each row represents a keypoint (x, y, angle, scale, ...).
        rows: Image height.

    Returns:
        np.ndarray: An array of flipped keypoints with the same shape as the input.
    """
    flipped_keypoints = keypoints.copy().astype(np.float32)

    # Flip y-coordinates
    flipped_keypoints[:, 1] = (rows - 1) - keypoints[:, 1]

    # Negate angles
    flipped_keypoints[:, 2] = -keypoints[:, 2]

    return flipped_keypoints
def optical_distortion (img, k, dx, dy, interpolation, border_mode, value=None) [view source on GitHub]

Barrel / pincushion distortion. An unconventional augmentation.

Source code in albumentations/augmentations/geometric/functional.py
Python
@preserve_channel_dim
def optical_distortion(
    img: np.ndarray,
    k: int,
    dx: int,
    dy: int,
    interpolation: int,
    border_mode: int,
    value: ColorType | None = None,
) -> np.ndarray:
    """Barrel / pincushion distortion. Unconventional augment.

    Reference:
        |  https://stackoverflow.com/questions/6199636/formulas-for-barrel-pincushion-distortion
        |  https://stackoverflow.com/questions/10364201/image-transformation-in-opencv
        |  https://stackoverflow.com/questions/2477774/correcting-fisheye-distortion-programmatically
        |  http://www.coldvision.io/2017/03/02/advanced-lane-finding-using-opencv/
    """
    height, width = img.shape[:2]

    fx = width
    fy = height

    cx = width * 0.5 + dx
    cy = height * 0.5 + dy

    camera_matrix = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float32)

    distortion = np.array([k, k, 0, 0, 0], dtype=np.float32)
    map1, map2 = cv2.initUndistortRectifyMap(camera_matrix, distortion, None, None, (width, height), cv2.CV_32FC1)
    return cv2.remap(img, map1, map2, interpolation=interpolation, borderMode=border_mode, borderValue=value)
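
A hedged usage sketch; the sign and magnitude of k select between barrel and pincushion distortion, while dx and dy shift the distortion center.

Python
import cv2
import numpy as np
from albumentations.augmentations.geometric.functional import optical_distortion

img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
distorted = optical_distortion(
    img,
    k=0.05,
    dx=0,
    dy=0,
    interpolation=cv2.INTER_LINEAR,
    border_mode=cv2.BORDER_REFLECT_101,
)
print(distorted.shape)  # (100, 100, 3): same shape, remapped pixels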
def perspective_bboxes (bboxes, image_shape, matrix, max_width, max_height, keep_size) [view source on GitHub]

Applies perspective transformation to bounding boxes.

This function transforms bounding boxes using the given perspective transformation matrix. It handles bounding boxes with additional attributes beyond the standard coordinates.

Parameters:

Name Type Description
bboxes np.ndarray

An array of bounding boxes with shape (num_bboxes, 4+). Each row represents a bounding box (x_min, y_min, x_max, y_max, ...). Additional columns beyond the first 4 are preserved unchanged.

image_shape tuple[int, int]

The shape of the image (height, width).

matrix np.ndarray

The perspective transformation matrix.

max_width int

The maximum width of the output image.

max_height int

The maximum height of the output image.

keep_size bool

If True, maintains the original image size after transformation.

Returns:

Type Description
np.ndarray

An array of transformed bounding boxes with the same shape as input. The first 4 columns contain the transformed coordinates, and any additional columns are preserved from the input.

Note

  • This function modifies only the coordinate columns (first 4) of the input bounding boxes.
  • Any additional attributes (columns beyond the first 4) are kept unchanged.
  • The function handles denormalization and renormalization of coordinates internally.

Examples:

Python
>>> bboxes = np.array([[0.1, 0.1, 0.3, 0.3, 1], [0.5, 0.5, 0.8, 0.8, 2]])
>>> image_shape = (100, 100)
>>> matrix = np.array([[1.5, 0.2, -20], [-0.1, 1.3, -10], [0.002, 0.001, 1]])
>>> transformed_bboxes = perspective_bboxes(bboxes, image_shape, matrix, 150, 150, False)
Source code in albumentations/augmentations/geometric/functional.py
Python
@handle_empty_array
def perspective_bboxes(
    bboxes: np.ndarray,
    image_shape: tuple[int, int],
    matrix: np.ndarray,
    max_width: int,
    max_height: int,
    keep_size: bool,
) -> np.ndarray:
    """Applies perspective transformation to bounding boxes.

    This function transforms bounding boxes using the given perspective transformation matrix.
    It handles bounding boxes with additional attributes beyond the standard coordinates.

    Args:
        bboxes (np.ndarray): An array of bounding boxes with shape (num_bboxes, 4+).
                             Each row represents a bounding box (x_min, y_min, x_max, y_max, ...).
                             Additional columns beyond the first 4 are preserved unchanged.
        image_shape (tuple[int, int]): The shape of the image (height, width).
        matrix (np.ndarray): The perspective transformation matrix.
        max_width (int): The maximum width of the output image.
        max_height (int): The maximum height of the output image.
        keep_size (bool): If True, maintains the original image size after transformation.

    Returns:
        np.ndarray: An array of transformed bounding boxes with the same shape as input.
                    The first 4 columns contain the transformed coordinates, and any
                    additional columns are preserved from the input.

    Note:
        - This function modifies only the coordinate columns (first 4) of the input bounding boxes.
        - Any additional attributes (columns beyond the first 4) are kept unchanged.
        - The function handles denormalization and renormalization of coordinates internally.

    Example:
        >>> bboxes = np.array([[0.1, 0.1, 0.3, 0.3, 1], [0.5, 0.5, 0.8, 0.8, 2]])
        >>> image_shape = (100, 100)
        >>> matrix = np.array([[1.5, 0.2, -20], [-0.1, 1.3, -10], [0.002, 0.001, 1]])
        >>> transformed_bboxes = perspective_bboxes(bboxes, image_shape, matrix, 150, 150, False)
    """
    height, width = image_shape[:2]

    # Create a copy of the input bboxes to avoid modifying the original array
    transformed_bboxes = bboxes.copy()

    # Denormalize bboxes
    denormalized_coords = denormalize_bboxes(bboxes[:, :4], image_shape)

    # Create points for each bbox
    x_min, y_min, x_max, y_max = denormalized_coords.T
    points = np.array([[x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max]]).transpose(2, 0, 1)
    # Shape is: (num_bboxes, 4, 2)

    # Reshape points to (num_bboxes * 4, 2)
    points_reshaped = points.reshape(-1, 2)

    # Pad points_reshaped with two columns of zeros
    points_padded = np.pad(points_reshaped, ((0, 0), (0, 2)), mode="constant")

    # Apply perspective transformation to all points at once
    transformed_points = perspective_keypoints(points_padded, image_shape, matrix, max_width, max_height, keep_size)

    # Reshape back to (num_bboxes, 4, 2)
    transformed_points = transformed_points[:, :2].reshape(-1, 4, 2)
    # Get new bounding boxes
    new_coords = np.array(
        [[np.min(box[:, 0]), np.min(box[:, 1]), np.max(box[:, 0]), np.max(box[:, 1])] for box in transformed_points],
    )

    # Normalize the new bounding boxes
    output_shape = (height if keep_size else max_height, width if keep_size else max_width)
    normalized_coords = normalize_bboxes(new_coords, output_shape)

    # Update only the first 4 columns of the bboxes array
    transformed_bboxes[:, :4] = normalized_coords

    return transformed_bboxes
def rotation2d_matrix_to_euler_angles (matrix, y_up) [view source on GitHub]

Extract the rotation angle (in radians) from a 2D rotation matrix.

Parameters:

Name Type Description
matrix np.ndarray

Rotation matrix.

y_up bool

If True, the Y axis points up; if False, it points down (image coordinate convention).

Returns:

Type Description
float

Rotation angle in radians.

Source code in albumentations/augmentations/geometric/functional.py
Python
def rotation2d_matrix_to_euler_angles(matrix: np.ndarray, y_up: bool) -> float:
    """Args:
    matrix (np.ndarray): Rotation matrix
    y_up (bool): is Y axis looks up or down

    """
    if y_up:
        return np.arctan2(matrix[1, 0], matrix[0, 0])
    return np.arctan2(-matrix[1, 0], matrix[0, 0])
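
A small sanity check of the angle convention, assuming the import path shown above.

Python
import numpy as np
from albumentations.augmentations.geometric.functional import rotation2d_matrix_to_euler_angles

theta = np.deg2rad(30)
matrix = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])

print(np.degrees(rotation2d_matrix_to_euler_angles(matrix, y_up=True)))   # ~30.0
print(np.degrees(rotation2d_matrix_to_euler_angles(matrix, y_up=False)))  # ~-30.0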
def shift_bboxes (bboxes, shift_vector) [view source on GitHub]

Shift bounding boxes by a given vector.

Parameters:

Name Type Description
bboxes np.ndarray

Array of bounding boxes with shape (n, m) where n is the number of bboxes and m >= 4. The first 4 columns are [x_min, y_min, x_max, y_max].

shift_vector np.ndarray

Vector to shift the bounding boxes by, with shape (4,) for [shift_x, shift_y, shift_x, shift_y].

Returns:

Type Description
np.ndarray

Shifted bounding boxes with the same shape as input.

Source code in albumentations/augmentations/geometric/functional.py
Python
def shift_bboxes(bboxes: np.ndarray, shift_vector: np.ndarray) -> np.ndarray:
    """Shift bounding boxes by a given vector.

    Args:
        bboxes (np.ndarray): Array of bounding boxes with shape (n, m) where n is the number of bboxes
                             and m >= 4. The first 4 columns are [x_min, y_min, x_max, y_max].
        shift_vector (np.ndarray): Vector to shift the bounding boxes by, with shape (4,) for
                                   [shift_x, shift_y, shift_x, shift_y].

    Returns:
        np.ndarray: Shifted bounding boxes with the same shape as input.
    """
    # Create a copy of the input array to avoid modifying it in-place
    shifted_bboxes = bboxes.copy()

    # Add the shift vector to the first 4 columns
    shifted_bboxes[:, :4] += shift_vector

    return shifted_bboxes
def to_distance_maps (keypoints, image_shape, inverted=False) [view source on GitHub]

Generate a (H,W,N) array of distance maps for N keypoints.

The n-th distance map contains at every location (y, x) the euclidean distance to the n-th keypoint.

This function can be used as a helper when augmenting keypoints with a method that only supports the augmentation of images.

Parameters:

Name Type Description
keypoints np.ndarray

A numpy array of shape (N, 2+) where N is the number of keypoints. Each row represents a keypoint's (x, y) coordinates.

image_shape tuple[int, int]

tuple[int, int] shape of the image (height, width)

inverted bool

If True, inverted distance maps are returned where each distance value d is replaced by d/(d+1), i.e. the distance maps have values in the range (0.0, 1.0] with 1.0 denoting exactly the position of the respective keypoint.

Returns:

Type Description
np.ndarray

A float32 array of shape (H, W, N) containing N distance maps for N keypoints. Each location (y, x, n) in the array denotes the euclidean distance at (y, x) to the n-th keypoint. If inverted is True, the distance d is replaced by d/(d+1). The height and width of the array match the height and width in image_shape.

Source code in albumentations/augmentations/geometric/functional.py
Python
def to_distance_maps(
    keypoints: np.ndarray,
    image_shape: tuple[int, int],
    inverted: bool = False,
) -> np.ndarray:
    """Generate a ``(H,W,N)`` array of distance maps for ``N`` keypoints.

    The ``n``-th distance map contains at every location ``(y, x)`` the
    euclidean distance to the ``n``-th keypoint.

    This function can be used as a helper when augmenting keypoints with a
    method that only supports the augmentation of images.

    Args:
        keypoints: A numpy array of shape (N, 2+) where N is the number of keypoints.
                   Each row represents a keypoint's (x, y) coordinates.
        image_shape: tuple[int, int] shape of the image (height, width)
        inverted (bool): If ``True``, inverted distance maps are returned where each
            distance value d is replaced by ``d/(d+1)``, i.e. the distance
            maps have values in the range ``(0.0, 1.0]`` with ``1.0`` denoting
            exactly the position of the respective keypoint.

    Returns:
        np.ndarray: A ``float32`` array of shape (H, W, N) containing ``N`` distance maps for ``N``
            keypoints. Each location ``(y, x, n)`` in the array denotes the
            euclidean distance at ``(y, x)`` to the ``n``-th keypoint.
            If `inverted` is ``True``, the distance ``d`` is replaced
            by ``d/(d+1)``. The height and width of the array match the
            height and width in ``image_shape``.
    """
    height, width = image_shape[:2]
    if len(keypoints) == 0:
        return np.zeros((height, width, 0), dtype=np.float32)

    # Create coordinate grids
    yy, xx = np.mgrid[:height, :width]

    # Convert keypoints to numpy array
    keypoints_array = np.array(keypoints)

    # Compute distances for all keypoints at once
    distances = np.sqrt(
        (xx[..., np.newaxis] - keypoints_array[:, 0]) ** 2 + (yy[..., np.newaxis] - keypoints_array[:, 1]) ** 2,
    )

    if inverted:
        return (1 / (distances + 1)).astype(np.float32)
    return distances.astype(np.float32)
def transpose (img) [view source on GitHub]

Transposes the first two dimensions of an array of any dimensionality. Retains the order of any additional dimensions.

Parameters:

Name Type Description
img np.ndarray

Input array.

Returns:

Type Description
np.ndarray

Transposed array.

Source code in albumentations/augmentations/geometric/functional.py
Python
def transpose(img: np.ndarray) -> np.ndarray:
    """Transposes the first two dimensions of an array of any dimensionality.
    Retains the order of any additional dimensions.

    Args:
        img (np.ndarray): Input array.

    Returns:
        np.ndarray: Transposed array.
    """
    # Generate the new axes order
    new_axes = list(range(img.ndim))
    new_axes[0], new_axes[1] = 1, 0  # Swap the first two dimensions

    # Transpose the array using the new axes order
    return img.transpose(new_axes)
def validate_bboxes (bboxes, image_shape) [view source on GitHub]

Validate bounding boxes and remove invalid ones.

Parameters:

Name Type Description
bboxes np.ndarray

Array of bounding boxes with shape (n, 4) where each row is [x_min, y_min, x_max, y_max].

image_shape tuple[int, int]

Shape of the image as (height, width).

Returns:

Type Description
np.ndarray

Array of valid bounding boxes, potentially with fewer boxes than the input.

Examples:

Python
>>> bboxes = np.array([[10, 20, 30, 40], [-10, -10, 5, 5], [100, 100, 120, 120]])
>>> valid_bboxes = validate_bboxes(bboxes, (100, 100))
>>> print(valid_bboxes)
[[10 20 30 40]]
Source code in albumentations/augmentations/geometric/functional.py
Python
def validate_bboxes(bboxes: np.ndarray, image_shape: Sequence[int]) -> np.ndarray:
    """Validate bounding boxes and remove invalid ones.

    Args:
        bboxes (np.ndarray): Array of bounding boxes with shape (n, 4) where each row is [x_min, y_min, x_max, y_max].
        image_shape (tuple[int, int]): Shape of the image as (height, width).

    Returns:
        np.ndarray: Array of valid bounding boxes, potentially with fewer boxes than the input.

    Example:
        >>> bboxes = np.array([[10, 20, 30, 40], [-10, -10, 5, 5], [100, 100, 120, 120]])
        >>> valid_bboxes = validate_bboxes(bboxes, (100, 100))
        >>> print(valid_bboxes)
        [[10 20 30 40]]
    """
    rows, cols = image_shape[:2]

    x_min, y_min, x_max, y_max = bboxes[:, 0], bboxes[:, 1], bboxes[:, 2], bboxes[:, 3]

    valid_indices = (x_max > 0) & (y_max > 0) & (x_min < cols) & (y_min < rows)

    return bboxes[valid_indices]
def validate_if_not_found_coords (if_not_found_coords) [view source on GitHub]

Validate and process if_not_found_coords parameter.

Source code in albumentations/augmentations/geometric/functional.py
Python
def validate_if_not_found_coords(
    if_not_found_coords: Sequence[int] | dict[str, Any] | None,
) -> tuple[bool, float, float]:
    """Validate and process `if_not_found_coords` parameter."""
    if if_not_found_coords is None:
        return True, -1, -1
    if isinstance(if_not_found_coords, (tuple, list)):
        if len(if_not_found_coords) != PAIR:
            msg = "Expected tuple/list 'if_not_found_coords' to contain exactly two entries."
            raise ValueError(msg)
        return False, if_not_found_coords[0], if_not_found_coords[1]
    if isinstance(if_not_found_coords, dict):
        return False, if_not_found_coords["x"], if_not_found_coords["y"]

    msg = "Expected if_not_found_coords to be None, tuple, list, or dict."
    raise ValueError(msg)
def validate_keypoints (keypoints, image_shape) [view source on GitHub]

Validate keypoints and remove those that fall outside the image boundaries.

Parameters:

Name Type Description
keypoints np.ndarray

Array of keypoints with shape (N, M) where N is the number of keypoints and M >= 2. The first two columns represent x and y coordinates.

image_shape tuple[int, int]

Shape of the image as (height, width).

Returns:

Type Description
np.ndarray

Array of valid keypoints that fall within the image boundaries.

Note

This function only checks the x and y coordinates (first two columns) of the keypoints. Any additional columns (e.g., angle, scale) are preserved for valid keypoints.

Source code in albumentations/augmentations/geometric/functional.py
Python
def validate_keypoints(keypoints: np.ndarray, image_shape: tuple[int, int]) -> np.ndarray:
    """Validate keypoints and remove those that fall outside the image boundaries.

    Args:
        keypoints (np.ndarray): Array of keypoints with shape (N, M) where N is the number of keypoints
                                and M >= 2. The first two columns represent x and y coordinates.
        image_shape (tuple[int, int]): Shape of the image as (height, width).

    Returns:
        np.ndarray: Array of valid keypoints that fall within the image boundaries.

    Note:
        This function only checks the x and y coordinates (first two columns) of the keypoints.
        Any additional columns (e.g., angle, scale) are preserved for valid keypoints.
    """
    rows, cols = image_shape[:2]

    x, y = keypoints[:, 0], keypoints[:, 1]

    valid_indices = (x >= 0) & (x < cols) & (y >= 0) & (y < rows)

    return keypoints[valid_indices]
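
Example (a small illustrative call consistent with the source above):

Python
>>> keypoints = np.array([[10, 20], [-1, 5], [150, 30]])
>>> valid = validate_keypoints(keypoints, (100, 100))
>>> print(valid)
[[10 20]]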

resize

class LongestMaxSize (max_size=1024, interpolation=1, always_apply=None, p=1) [view source on GitHub]

Rescale an image so that its longest side is equal to max_size, keeping the aspect ratio of the initial image.

Parameters:

Name Type Description
max_size int, list of int

Maximum size of the longest side of the image after the transformation. When using a list, the max size will be randomly selected from the values in the list.

interpolation OpenCV flag

interpolation method. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
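
Example (a minimal usage sketch, not part of the original reference; it assumes the standard Albumentations call convention transform(image=...) and a NumPy uint8 image):

Python
>>> import albumentations as A
>>> import numpy as np
>>> image = np.random.randint(0, 256, (768, 1024, 3), dtype=np.uint8)
>>> transform = A.LongestMaxSize(max_size=512, p=1)
>>> transform(image=image)["image"].shape  # longest side scaled to 512, aspect ratio kept
(384, 512, 3)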

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/resize.py
Python
class LongestMaxSize(DualTransform):
    """Rescale an image so that maximum side is equal to max_size, keeping the aspect ratio of the initial image.

    Args:
        max_size (int, list of int): maximum size of the image after the transformation. When using a list, max size
            will be randomly selected from the values in the list.
        interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(MaxSizeInitSchema):
        pass

    def __init__(
        self,
        max_size: int | Sequence[int] = 1024,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1,
    ):
        super().__init__(p, always_apply)
        self.interpolation = interpolation
        self.max_size = max_size

    def apply(
        self,
        img: np.ndarray,
        max_size: int,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.longest_max_size(img, max_size=max_size, interpolation=interpolation)

    def apply_to_bboxes(self, bboxes: np.ndarray, **params: Any) -> np.ndarray:
        # Bounding box coordinates are scale invariant
        return bboxes

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        max_size: int,
        **params: Any,
    ) -> np.ndarray:
        image_shape = params["shape"][:2]

        scale = max_size / max(image_shape)
        return fgeometric.keypoints_scale(keypoints, scale, scale)

    def get_params(self) -> dict[str, int]:
        return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "max_size", "interpolation"
class InitSchema

Source code in albumentations/augmentations/geometric/resize.py
Python
class InitSchema(MaxSizeInitSchema):
    pass
apply (self, img, max_size, interpolation, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/resize.py
Python
def apply(
    self,
    img: np.ndarray,
    max_size: int,
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.longest_max_size(img, max_size=max_size, interpolation=interpolation)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/resize.py
Python
def get_params(self) -> dict[str, int]:
    return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/resize.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "max_size", "interpolation"
class MaxSizeInitSchema [view source on GitHub]

Source code in albumentations/augmentations/geometric/resize.py
Python
class MaxSizeInitSchema(BaseTransformInitSchema):
    max_size: int | list[int] = Field(
        default=1024,
        description="Maximum size of the smallest side of the image after the transformation.",
    )
    interpolation: InterpolationType = cv2.INTER_LINEAR
    p: ProbabilityType = 1

    @field_validator("max_size")
    @classmethod
    def check_scale_limit(cls, v: ScaleIntType, info: ValidationInfo) -> int | list[int]:
        result = v if isinstance(v, (list, tuple)) else [v]
        for value in result:
            if not value >= 1:
                raise ValueError(f"{info.field_name} must be bigger or equal to 1.")

        return cast(Union[int, List[int]], result)

class RandomScale (scale_limit=0.1, interpolation=1, always_apply=None, p=0.5) [view source on GitHub]

Randomly resize the input. Output image size is different from the input image size.

Parameters:

Name Type Description
scale_limit (float, float) or float

scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1. If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high). Default: (-0.1, 0.1).

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
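
Example (a minimal usage sketch, not part of the original reference; the output size varies because the scale factor is sampled randomly):

Python
>>> import albumentations as A
>>> import numpy as np
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> # scale_limit=0.1 samples a factor from (0.9, 1.1), so each output side lies roughly in [90, 110]
>>> transform = A.RandomScale(scale_limit=0.1, p=1)
>>> scaled = transform(image=image)["image"]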

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/resize.py
Python
class RandomScale(DualTransform):
    """Randomly resize the input. Output image size is different from the input image size.

    Args:
        scale_limit ((float, float) or float): scaling factor range. If scale_limit is a single float value, the
            range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1.
            If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high).
            Default: (-0.1, 0.1).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        scale_limit: ScaleFloatType
        interpolation: InterpolationType

        @field_validator("scale_limit")
        @classmethod
        def check_scale_limit(cls, v: ScaleFloatType) -> tuple[float, float]:
            return to_tuple(v, bias=1.0)

    def __init__(
        self,
        scale_limit: ScaleFloatType = 0.1,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.scale_limit = cast(Tuple[float, float], scale_limit)
        self.interpolation = interpolation

    def get_params(self) -> dict[str, float]:
        return {"scale": random.uniform(self.scale_limit[0], self.scale_limit[1])}

    def apply(
        self,
        img: np.ndarray,
        scale: float,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.scale(img, scale, interpolation)

    def apply_to_bboxes(self, bboxes: np.ndarray, **params: Any) -> np.ndarray:
        # Bounding box coordinates are scale invariant
        return bboxes

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        scale: float,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.keypoints_scale(keypoints, scale, scale)

    def get_transform_init_args(self) -> dict[str, Any]:
        return {"interpolation": self.interpolation, "scale_limit": to_tuple(self.scale_limit, bias=-1.0)}
class InitSchema [view source on GitHub]

Source code in albumentations/augmentations/geometric/resize.py
Python
class InitSchema(BaseTransformInitSchema):
    scale_limit: ScaleFloatType
    interpolation: InterpolationType

    @field_validator("scale_limit")
    @classmethod
    def check_scale_limit(cls, v: ScaleFloatType) -> tuple[float, float]:
        return to_tuple(v, bias=1.0)

apply (self, img, scale, interpolation, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/resize.py
Python
def apply(
    self,
    img: np.ndarray,
    scale: float,
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.scale(img, scale, interpolation)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/resize.py
Python
def get_params(self) -> dict[str, float]:
    return {"scale": random.uniform(self.scale_limit[0], self.scale_limit[1])}
class Resize (height, width, interpolation=1, always_apply=None, p=1) [view source on GitHub]

Resize the input to the given height and width.

Parameters:

Name Type Description
height int

desired height of the output.

width int

desired width of the output.

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
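
Example (a minimal usage sketch, not part of the original reference):

Python
>>> import albumentations as A
>>> import cv2
>>> import numpy as np
>>> image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
>>> transform = A.Resize(height=256, width=256, interpolation=cv2.INTER_AREA, p=1)
>>> transform(image=image)["image"].shape
(256, 256, 3)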

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/resize.py
Python
class Resize(DualTransform):
    """Resize the input to the given height and width.

    Args:
        height (int): desired height of the output.
        width (int): desired width of the output.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)

    class InitSchema(BaseTransformInitSchema):
        height: int = Field(ge=1, description="Desired height of the output.")
        width: int = Field(ge=1, description="Desired width of the output.")
        interpolation: InterpolationType = cv2.INTER_LINEAR
        p: ProbabilityType = 1

    def __init__(
        self,
        height: int,
        width: int,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1,
    ):
        super().__init__(p, always_apply)
        self.height = height
        self.width = width
        self.interpolation = interpolation

    def apply(self, img: np.ndarray, interpolation: int, **params: Any) -> np.ndarray:
        return fgeometric.resize(img, (self.height, self.width), interpolation=interpolation)

    def apply_to_bboxes(self, bboxes: np.ndarray, **params: Any) -> np.ndarray:
        # Bounding box coordinates are scale invariant
        return bboxes

    def apply_to_keypoints(self, keypoints: np.ndarray, **params: Any) -> np.ndarray:
        height, width = params["shape"][:2]
        scale_x = self.width / width
        scale_y = self.height / height
        return fgeometric.keypoints_scale(keypoints, scale_x, scale_y)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "height", "width", "interpolation"
class InitSchema

Source code in albumentations/augmentations/geometric/resize.py
Python
class InitSchema(BaseTransformInitSchema):
    height: int = Field(ge=1, description="Desired height of the output.")
    width: int = Field(ge=1, description="Desired width of the output.")
    interpolation: InterpolationType = cv2.INTER_LINEAR
    p: ProbabilityType = 1

apply (self, img, interpolation, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/resize.py
Python
def apply(self, img: np.ndarray, interpolation: int, **params: Any) -> np.ndarray:
    return fgeometric.resize(img, (self.height, self.width), interpolation=interpolation)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/resize.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "height", "width", "interpolation"
class SmallestMaxSize (max_size=1024, interpolation=1, always_apply=None, p=1) [view source on GitHub]

Rescale an image so that its smallest side is equal to max_size, keeping the aspect ratio of the initial image.

Parameters:

Name Type Description
max_size int, list of int

Maximum size of the smallest side of the image after the transformation. When using a list, the max size will be randomly selected from the values in the list.

interpolation OpenCV flag

interpolation method. Default: cv2.INTER_LINEAR.

p float

probability of applying the transform. Default: 1.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
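
Example (a minimal usage sketch, not part of the original reference):

Python
>>> import albumentations as A
>>> import numpy as np
>>> image = np.random.randint(0, 256, (512, 768, 3), dtype=np.uint8)
>>> transform = A.SmallestMaxSize(max_size=256, p=1)
>>> transform(image=image)["image"].shape  # smallest side scaled to 256, aspect ratio kept
(256, 384, 3)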

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/resize.py
Python
class SmallestMaxSize(DualTransform):
    """Rescale an image so that minimum side is equal to max_size, keeping the aspect ratio of the initial image.

    Args:
        max_size (int, list of int): maximum size of smallest side of the image after the transformation. When using a
            list, max size will be randomly selected from the values in the list.
        interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)

    class InitSchema(MaxSizeInitSchema):
        pass

    def __init__(
        self,
        max_size: int | Sequence[int] = 1024,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 1,
    ):
        super().__init__(p, always_apply)
        self.interpolation = interpolation
        self.max_size = max_size

    def apply(
        self,
        img: np.ndarray,
        max_size: int,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.smallest_max_size(img, max_size=max_size, interpolation=interpolation)

    def apply_to_bboxes(self, bboxes: np.ndarray, **params: Any) -> np.ndarray:
        # Bounding box coordinates are scale invariant
        return bboxes

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        max_size: int,
        **params: Any,
    ) -> np.ndarray:
        image_shape = params["shape"][:2]

        scale = max_size / min(image_shape)
        return fgeometric.keypoints_scale(keypoints, scale, scale)

    def get_params(self) -> dict[str, int]:
        return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "max_size", "interpolation"
class InitSchema

Source code in albumentations/augmentations/geometric/resize.py
Python
class InitSchema(MaxSizeInitSchema):
    pass

apply (self, img, max_size, interpolation, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/resize.py
Python
def apply(
    self,
    img: np.ndarray,
    max_size: int,
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.smallest_max_size(img, max_size=max_size, interpolation=interpolation)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/resize.py
Python
def get_params(self) -> dict[str, int]:
    return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/resize.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "max_size", "interpolation"

rotate

class RandomRotate90 [view source on GitHub]

Randomly rotate the input by 90 degrees zero or more times.

Parameters:

Name Type Description
p

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
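
Example (a minimal usage sketch, not part of the original reference; the shape check reflects that the rotation factor is sampled from {0, 1, 2, 3}):

Python
>>> import albumentations as A
>>> import numpy as np
>>> image = np.random.randint(0, 256, (100, 200, 3), dtype=np.uint8)
>>> transform = A.RandomRotate90(p=1)
>>> transform(image=image)["image"].shape in [(100, 200, 3), (200, 100, 3)]
True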

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/rotate.py
Python
class RandomRotate90(DualTransform):
    """Randomly rotate the input by 90 degrees zero or more times.

    Args:
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(self, img: np.ndarray, factor: int, **params: Any) -> np.ndarray:
        return fgeometric.rot90(img, factor)

    def get_params(self) -> dict[str, int]:
        # Random int in the range [0, 3]
        return {"factor": random.randint(0, 3)}

    def apply_to_bboxes(self, bboxes: np.ndarray, factor: int, **params: Any) -> np.ndarray:
        return fgeometric.bboxes_rot90(bboxes, factor)

    def apply_to_keypoints(self, keypoints: np.ndarray, factor: int, **params: Any) -> np.ndarray:
        return fgeometric.keypoints_rot90(keypoints, factor, params["shape"])

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()
apply (self, img, factor, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/rotate.py
Python
def apply(self, img: np.ndarray, factor: int, **params: Any) -> np.ndarray:
    return fgeometric.rot90(img, factor)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/rotate.py
Python
def get_params(self) -> dict[str, int]:
    # Random int in the range [0, 3]
    return {"factor": random.randint(0, 3)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/rotate.py
Python
def get_transform_init_args_names(self) -> tuple[()]:
    return ()
class Rotate (limit=(-90, 90), interpolation=1, border_mode=4, value=None, mask_value=None, rotate_method='largest_box', crop_border=False, always_apply=None, p=0.5) [view source on GitHub]

Rotate the input by an angle selected randomly from the uniform distribution.

Parameters:

Name Type Description
limit ScaleFloatType

Range from which a random angle is picked. If limit is a single int, an angle is picked from (-limit, limit). Default: (-90, 90)

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

rotate_method str

rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse". Default: "largest_box"

crop_border bool

If True, makes the largest possible crop within the rotated image, removing the border regions introduced by rotation.

p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
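
Example (a minimal usage sketch, not part of the original reference; border_mode and value are chosen here only for illustration):

Python
>>> import albumentations as A
>>> import cv2
>>> import numpy as np
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Rotate(limit=30, border_mode=cv2.BORDER_CONSTANT, value=0, p=1)
>>> transform(image=image)["image"].shape  # dimensions are preserved unless crop_border=True
(100, 100, 3)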

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/rotate.py
Python
class Rotate(DualTransform):
    """Rotate the input by an angle selected randomly from the uniform distribution.

    Args:
        limit: range from which a random angle is picked. If limit is a single int
            an angle is picked from (-limit, limit). Default: (-90, 90)
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
        rotate_method (str): rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse".
            Default: "largest_box"
        crop_border (bool): If True would make a largest possible crop within rotated image
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(RotateInitSchema):
        rotate_method: Literal["largest_box", "ellipse"]
        crop_border: bool

    def __init__(
        self,
        limit: ScaleFloatType = (-90, 90),
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ColorType | None = None,
        mask_value: ColorType | None = None,
        rotate_method: Literal["largest_box", "ellipse"] = "largest_box",
        crop_border: bool = False,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.limit = cast(Tuple[float, float], limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value
        self.rotate_method = rotate_method
        self.crop_border = crop_border

    def apply(
        self,
        img: np.ndarray,
        angle: float,
        interpolation: int,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> np.ndarray:
        img_out = fgeometric.rotate(img, angle, interpolation, self.border_mode, self.value)
        if self.crop_border:
            return fcrops.crop(img_out, x_min, y_min, x_max, y_max)
        return img_out

    def apply_to_mask(
        self,
        mask: np.ndarray,
        angle: float,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> np.ndarray:
        img_out = fgeometric.rotate(mask, angle, cv2.INTER_NEAREST, self.border_mode, self.mask_value)
        if self.crop_border:
            return fcrops.crop(img_out, x_min, y_min, x_max, y_max)
        return img_out

    def apply_to_bboxes(
        self,
        bboxes: np.ndarray,
        angle: float,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> np.ndarray:
        image_shape = params["shape"][:2]
        bboxes_out = fgeometric.bboxes_rotate(bboxes, angle, self.rotate_method, image_shape)
        if self.crop_border:
            return fcrops.crop_bboxes_by_coords(bboxes_out, (x_min, y_min, x_max, y_max), image_shape)
        return bboxes_out

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        angle: float,
        x_min: int,
        x_max: int,
        y_min: int,
        y_max: int,
        **params: Any,
    ) -> np.ndarray:
        keypoints_out = fgeometric.keypoints_rotate(keypoints, angle, params["shape"][:2])
        if self.crop_border:
            return fcrops.crop_keypoints_by_coords(keypoints_out, (x_min, y_min, x_max, y_max))
        return keypoints_out

    @staticmethod
    def _rotated_rect_with_max_area(height: int, width: int, angle: float) -> dict[str, int]:
        """Given a rectangle of size wxh that has been rotated by 'angle' (in
        degrees), computes the width and height of the largest possible
        axis-aligned rectangle (maximal area) within the rotated rectangle.

        Reference:
            https://stackoverflow.com/questions/16702966/rotate-image-and-crop-out-black-borders
        """
        angle = math.radians(angle)
        width_is_longer = width >= height
        side_long, side_short = (width, height) if width_is_longer else (height, width)

        # since the solutions for angle, -angle and 180-angle are all the same,
        # it is sufficient to look at the first quadrant and the absolute values of sin,cos:
        sin_a, cos_a = abs(math.sin(angle)), abs(math.cos(angle))
        if side_short <= 2.0 * sin_a * cos_a * side_long or abs(sin_a - cos_a) < SMALL_NUMBER:
            # half constrained case: two crop corners touch the longer side,
            # the other two corners are on the mid-line parallel to the longer line
            x = 0.5 * side_short
            wr, hr = (x / sin_a, x / cos_a) if width_is_longer else (x / cos_a, x / sin_a)
        else:
            # fully constrained case: crop touches all 4 sides
            cos_2a = cos_a * cos_a - sin_a * sin_a
            wr, hr = (width * cos_a - height * sin_a) / cos_2a, (height * cos_a - width * sin_a) / cos_2a

        return {
            "x_min": max(0, int(width / 2 - wr / 2)),
            "x_max": min(width, int(width / 2 + wr / 2)),
            "y_min": max(0, int(height / 2 - hr / 2)),
            "y_max": min(height, int(height / 2 + hr / 2)),
        }

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        out_params = {"angle": random.uniform(self.limit[0], self.limit[1])}
        if self.crop_border:
            height, width = params["shape"][:2]
            out_params.update(self._rotated_rect_with_max_area(height, width, out_params["angle"]))
        else:
            out_params.update({"x_min": -1, "x_max": -1, "y_min": -1, "y_max": -1})

        return out_params

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "limit", "interpolation", "border_mode", "value", "mask_value", "rotate_method", "crop_border"
class InitSchema

Source code in albumentations/augmentations/geometric/rotate.py
Python
class InitSchema(RotateInitSchema):
    rotate_method: Literal["largest_box", "ellipse"]
    crop_border: bool

apply (self, img, angle, interpolation, x_min, x_max, y_min, y_max, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/rotate.py
Python
def apply(
    self,
    img: np.ndarray,
    angle: float,
    interpolation: int,
    x_min: int,
    x_max: int,
    y_min: int,
    y_max: int,
    **params: Any,
) -> np.ndarray:
    img_out = fgeometric.rotate(img, angle, interpolation, self.border_mode, self.value)
    if self.crop_border:
        return fcrops.crop(img_out, x_min, y_min, x_max, y_max)
    return img_out
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/geometric/rotate.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    out_params = {"angle": random.uniform(self.limit[0], self.limit[1])}
    if self.crop_border:
        height, width = params["shape"][:2]
        out_params.update(self._rotated_rect_with_max_area(height, width, out_params["angle"]))
    else:
        out_params.update({"x_min": -1, "x_max": -1, "y_min": -1, "y_max": -1})

    return out_params
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/rotate.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "limit", "interpolation", "border_mode", "value", "mask_value", "rotate_method", "crop_border"
class RotateInitSchema

Source code in albumentations/augmentations/geometric/rotate.py
Python
class RotateInitSchema(BaseTransformInitSchema):
    limit: SymmetricRangeType

    interpolation: InterpolationType
    border_mode: BorderModeType

    value: ColorType | None
    mask_value: ColorType | None

class SafeRotate (limit=(-90, 90), interpolation=1, border_mode=4, value=None, mask_value=None, rotate_method='largest_box', always_apply=None, p=0.5) [view source on GitHub]

Rotate the input inside the input's frame by an angle selected randomly from the uniform distribution.

This transformation ensures that the entire rotated image fits within the original frame by scaling it down if necessary. The resulting image maintains its original dimensions but may contain artifacts due to the rotation and scaling process.

Parameters:

Name Type Description
limit float, tuple of float

Range from which a random angle is picked. If limit is a single float, an angle is picked from (-limit, limit). Default: (-90, 90)

interpolation OpenCV flag

Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

Flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of int, list of float

Padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of int, list of float

Padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

rotate_method str

Method to rotate bounding boxes. Should be 'largest_box' or 'ellipse'. Default: 'largest_box'

p float

Probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

  • The rotation is performed around the center of the image.
  • After rotation, the image is scaled to fit within the original frame, which may cause some distortion.
  • The output image will always have the same dimensions as the input image.
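
Example (a minimal usage sketch, not part of the original reference):

Python
>>> import albumentations as A
>>> import numpy as np
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.SafeRotate(limit=45, p=1)
>>> transform(image=image)["image"].shape  # dimensions preserved; content scaled down to fit
(100, 100, 3)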

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/rotate.py
Python
class SafeRotate(Affine):
    """Rotate the input inside the input's frame by an angle selected randomly from the uniform distribution.

    This transformation ensures that the entire rotated image fits within the original frame by scaling it
    down if necessary. The resulting image maintains its original dimensions but may contain artifacts due to the
    rotation and scaling process.

    Args:
        limit (float, tuple of float): Range from which a random angle is picked. If limit is a single float,
            an angle is picked from (-limit, limit). Default: (-90, 90)
        interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): Flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of int, list of float): Padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float, list of int, list of float): Padding value if border_mode is cv2.BORDER_CONSTANT applied
            for masks.
        rotate_method (str): Method to rotate bounding boxes. Should be 'largest_box' or 'ellipse'.
            Default: 'largest_box'
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        - The rotation is performed around the center of the image.
        - After rotation, the image is scaled to fit within the original frame, which may cause some distortion.
        - The output image will always have the same dimensions as the input image.
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(RotateInitSchema):
        rotate_method: Literal["largest_box", "ellipse"]

    def __init__(
        self,
        limit: ScaleFloatType = (-90, 90),
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ColorType | None = None,
        mask_value: ColorType | None = None,
        rotate_method: Literal["largest_box", "ellipse"] = "largest_box",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        value = 0 if value is None else value
        mask_value = 0 if mask_value is None else mask_value
        super().__init__(
            rotate=limit,
            interpolation=interpolation,
            mode=border_mode,
            cval=value,
            cval_mask=mask_value,
            rotate_method=rotate_method,
            fit_output=True,
            p=p,
            always_apply=always_apply,
        )
        self.limit = cast(Tuple[float, float], limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value
        self.rotate_method = rotate_method

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "limit", "interpolation", "border_mode", "value", "mask_value", "rotate_method"

    def _create_safe_rotate_matrix(
        self,
        angle: float,
        center: tuple[float, float],
        image_shape: tuple[int, int],
    ) -> tuple[ProjectiveTransform, dict[str, float]]:
        height, width = image_shape[:2]
        rotation_mat = cv2.getRotationMatrix2D(center, angle, 1.0)

        # Calculate new image size
        abs_cos = abs(rotation_mat[0, 0])
        abs_sin = abs(rotation_mat[0, 1])
        new_w = int(height * abs_sin + width * abs_cos)
        new_h = int(height * abs_cos + width * abs_sin)

        # Adjust the rotation matrix to take into account the new size
        rotation_mat[0, 2] += new_w / 2 - center[0]
        rotation_mat[1, 2] += new_h / 2 - center[1]

        # Calculate scaling factors
        scale_x = width / new_w
        scale_y = height / new_h

        # Create scaling matrix
        scale_mat = np.array([[scale_x, 0, 0], [0, scale_y, 0], [0, 0, 1]])

        # Combine rotation and scaling
        matrix = scale_mat @ np.vstack([rotation_mat, [0, 0, 1]])

        return ProjectiveTransform(matrix=matrix), {"x": scale_x, "y": scale_y}

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        image_shape = params["shape"][:2]
        angle = random.uniform(self.limit[0], self.limit[1])

        # Calculate centers for image and bbox
        image_center = fgeometric.center(image_shape)
        bbox_center = fgeometric.center_bbox(image_shape)

        # Create matrices for image and bbox
        matrix, scale = self._create_safe_rotate_matrix(angle, image_center, image_shape)
        bbox_matrix, _ = self._create_safe_rotate_matrix(angle, bbox_center, image_shape)

        return {
            "rotate": angle,
            "scale": scale,
            "matrix": matrix,
            "bbox_matrix": bbox_matrix,
            "output_shape": image_shape,
        }
class InitSchema

Source code in albumentations/augmentations/geometric/rotate.py
Python
class InitSchema(RotateInitSchema):
    rotate_method: Literal["largest_box", "ellipse"]

get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/geometric/rotate.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    image_shape = params["shape"][:2]
    angle = random.uniform(self.limit[0], self.limit[1])

    # Calculate centers for image and bbox
    image_center = fgeometric.center(image_shape)
    bbox_center = fgeometric.center_bbox(image_shape)

    # Create matrices for image and bbox
    matrix, scale = self._create_safe_rotate_matrix(angle, image_center, image_shape)
    bbox_matrix, _ = self._create_safe_rotate_matrix(angle, bbox_center, image_shape)

    return {
        "rotate": angle,
        "scale": scale,
        "matrix": matrix,
        "bbox_matrix": bbox_matrix,
        "output_shape": image_shape,
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/rotate.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "limit", "interpolation", "border_mode", "value", "mask_value", "rotate_method"

transforms

class Affine (scale=None, translate_percent=None, translate_px=None, rotate=None, shear=None, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode=0, fit_output=False, keep_ratio=False, rotate_method='largest_box', balanced_scale=False, always_apply=None, p=0.5) [view source on GitHub]

Augmentation to apply affine transformations to images.

Affine transformations involve:

- Translation ("move" image on the x-/y-axis)
- Rotation
- Scaling ("zoom" in/out)
- Shear (move one side of the image, turning a square into a trapezoid)

All such transformations can create "new" pixels in the image without defined content, e.g. if the image is translated to the left, pixels are created on the right. A method has to be defined to deal with these pixel values; the parameters cval and mode of this class control how they are filled.

Some transformations involve interpolations between several pixels of the input image to generate output pixel values. The parameters interpolation and mask_interpolation control the interpolation method used for this.

Parameters:

Name Type Description
scale number, tuple of number or dict

Scaling factor to use, where 1.0 denotes "no change" and 0.5 is zoomed out to 50 percent of the original size. * If a single number, then that value will be used for all images. * If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b]. The same range will be used for both the x- and y-axis; to keep the aspect ratio, set keep_ratio=True, and the same value will then be used for both axes. * If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes, and sampling will then happen independently per axis, resulting in samples that differ between the axes. Note that when keep_ratio=True, the x- and y-axis ranges should be the same.

translate_percent None, number, tuple of number or dict

Translation as a fraction of the image height/width (x-translation, y-translation), where 0 denotes "no change" and 0.5 denotes "half of the axis size". * If None then equivalent to 0.0 unless translate_px has a value other than None. * If a single number, then that value will be used for all images. * If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b]. The sampled fraction will be used identically for both the x- and y-axis. * If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes, and sampling will then happen independently per axis, resulting in samples that differ between the axes.

translate_px None, int, tuple of int or dict

Translation in pixels. * If None then equivalent to 0 unless translate_percent has a value other than None. * If a single int, then that value will be used for all images. * If a tuple (a, b), then a value will be uniformly sampled per image from the discrete interval [a..b]. The sampled number will be used identically for both the x- and y-axis. * If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes, and sampling will then happen independently per axis, resulting in samples that differ between the axes.

rotate number or tuple of number

Rotation in degrees (NOT radians), i.e. expected value range is around [-360, 360]. Rotation happens around the center of the image, not the top left corner as in some other frameworks. * If a number, then that value will be used for all images. * If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b] and used as the rotation value.

shear number, tuple of number or dict

Shear in degrees (NOT radians), i.e. expected value range is around [-360, 360], with reasonable values being in the range of [-45, 45]. * If a number, then that value will be used for all images as the shear on the x-axis (no shear on the y-axis will be done). * If a tuple (a, b), then two values will be uniformly sampled per image from the interval [a, b] and used as the x- and y-shear values. * If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes, and sampling will then happen independently per axis, resulting in samples that differ between the axes.

interpolation int

OpenCV interpolation flag.

mask_interpolation int

OpenCV interpolation flag.

cval number or sequence of number

The constant value to use when filling in newly created pixels. (E.g. translating by 1px to the right will create a new 1px-wide column of pixels on the left of the image). The value is only used when mode=constant. The expected value range is [0, 255] for uint8 images.

cval_mask number or tuple of number

Same as cval but only for masks.

mode int

OpenCV border flag.

fit_output bool

If True, the image plane size and position will be adjusted to tightly capture the whole image after affine transformation (translate_percent and translate_px are ignored). Otherwise (False), parts of the transformed image may end up outside the image plane. Fitting the output shape can be useful to avoid corners of the image being outside the image plane after applying rotations. Default: False

keep_ratio bool

When True, the original aspect ratio will be kept when the random scale is applied. Default: False.

rotate_method Literal["largest_box", "ellipse"]

Rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse" [1]. Default: "largest_box".

balanced_scale bool

When True, scaling factors are chosen to be either entirely below or above 1, ensuring balanced scaling. Default: False.

This is important because without it, scaling tends to lean towards upscaling. For example, if we want the image to zoom in and out by 2x, we may pick the interval [0.5, 2]. Since the interval [0.5, 1] is half the length of [1, 2], values above 1 are picked twice as often when sampled directly from [0.5, 2]. With balanced_scale, half the time the scaling factor is picked from below 1 (zooming out) and the other half from above 1 (zooming in), which makes zooming in and out equally likely.

p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, keypoints, bboxes

Image types: uint8, float32
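
A minimal usage sketch (parameter values here are illustrative, not part of this reference): it shows the per-axis dictionary form of scale, translate_percent, and shear, together with balanced_scale and bounding-box handling.

Python
import albumentations as A
import cv2
import numpy as np

transform = A.Compose(
    [
        A.Affine(
            scale={"x": (0.8, 1.2), "y": (0.8, 1.2)},  # ranges sampled independently per axis
            translate_percent={"x": (-0.1, 0.1), "y": (-0.05, 0.05)},
            rotate=(-15, 15),  # degrees, around the image center
            shear={"x": (-10, 10), "y": (-5, 5)},
            interpolation=cv2.INTER_LINEAR,
            mode=cv2.BORDER_CONSTANT,
            cval=0,  # fill value for newly created pixels
            balanced_scale=True,  # zoom-in and zoom-out sampled equally often
            p=1.0,
        ),
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
bboxes = [(32, 40, 120, 160)]
augmented = transform(image=image, bboxes=bboxes, labels=["cat"])

Note that keep_ratio=True requires identical x and y scale ranges, and translate_percent and translate_px cannot be passed at the same time; both constraints raise a ValueError (see the source below).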

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class Affine(DualTransform):
    """Augmentation to apply affine transformations to images.

    Affine transformations involve:

        - Translation ("move" image on the x-/y-axis)
        - Rotation
        - Scaling ("zoom" in/out)
        - Shear (move one side of the image, turning a square into a trapezoid)

    All such transformations can create "new" pixels in the image without a defined content, e.g.
    if the image is translated to the left, pixels are created on the right.
    A method has to be defined to deal with these pixel values.
    The parameters `cval` and `mode` of this class deal with this.

    Some transformations involve interpolations between several pixels
    of the input image to generate output pixel values. The parameters `interpolation` and
    `mask_interpolation` deals with the method of interpolation used for this.

    Args:
        scale (number, tuple of number or dict): Scaling factor to use, where ``1.0`` denotes "no change" and
            ``0.5`` is zoomed out to ``50`` percent of the original size.
                * If a single number, then that value will be used for all images.
                * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``.
                  That the same range will be used for both x- and y-axis. To keep the aspect ratio, set
                  ``keep_ratio=True``, then the same value will be used for both x- and y-axis.
                * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
                  Each of these keys can have the same values as described above.
                  Using a dictionary allows to set different values for the two axis and sampling will then happen
                  *independently* per axis, resulting in samples that differ between the axes. Note that when
                  the ``keep_ratio=True``, the x- and y-axis ranges should be the same.
        translate_percent (None, number, tuple of number or dict): Translation as a fraction of the image height/width
            (x-translation, y-translation), where ``0`` denotes "no change"
            and ``0.5`` denotes "half of the axis size".
                * If ``None`` then equivalent to ``0.0`` unless `translate_px` has a value other than ``None``.
                * If a single number, then that value will be used for all images.
                * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``.
                  That sampled fraction value will be used identically for both x- and y-axis.
                * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
                  Each of these keys can have the same values as described above.
                  Using a dictionary allows to set different values for the two axis and sampling will then happen
                  *independently* per axis, resulting in samples that differ between the axes.
        translate_px (None, int, tuple of int or dict): Translation in pixels.
                * If ``None`` then equivalent to ``0`` unless `translate_percent` has a value other than ``None``.
                * If a single int, then that value will be used for all images.
                * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from
                  the discrete interval ``[a..b]``. That number will be used identically for both x- and y-axis.
                * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
                  Each of these keys can have the same values as described above.
                  Using a dictionary allows to set different values for the two axis and sampling will then happen
                  *independently* per axis, resulting in samples that differ between the axes.
        rotate (number or tuple of number): Rotation in degrees (**NOT** radians), i.e. expected value range is
            around ``[-360, 360]``. Rotation happens around the *center* of the image,
            not the top left corner as in some other frameworks.
                * If a number, then that value will be used for all images.
                * If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``
                  and used as the rotation value.
        shear (number, tuple of number or dict): Shear in degrees (**NOT** radians), i.e. expected value range is
            around ``[-360, 360]``, with reasonable values being in the range of ``[-45, 45]``.
                * If a number, then that value will be used for all images as
                  the shear on the x-axis (no shear on the y-axis will be done).
                * If a tuple ``(a, b)``, then two value will be uniformly sampled per image
                  from the interval ``[a, b]`` and be used as the x- and y-shear value.
                * If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
                  Each of these keys can have the same values as described above.
                  Using a dictionary allows to set different values for the two axis and sampling will then happen
                  *independently* per axis, resulting in samples that differ between the axes.
        interpolation (int): OpenCV interpolation flag.
        mask_interpolation (int): OpenCV interpolation flag.
        cval (number or sequence of number): The constant value to use when filling in newly created pixels.
            (E.g. translating by 1px to the right will create a new 1px-wide column of pixels
            on the left of the image).
            The value is only used when `mode=constant`. The expected value range is ``[0, 255]`` for ``uint8`` images.
        cval_mask (number or tuple of number): Same as cval but only for masks.
        mode (int): OpenCV border flag.
        fit_output (bool): If True, the image plane size and position will be adjusted to tightly capture
            the whole image after affine transformation (`translate_percent` and `translate_px` are ignored).
            Otherwise (``False``),  parts of the transformed image may end up outside the image plane.
            Fitting the output shape can be useful to avoid corners of the image being outside the image plane
            after applying rotations. Default: False
        keep_ratio (bool): When True, the original aspect ratio will be kept when the random scale is applied.
            Default: False.
        rotate_method (Literal["largest_box", "ellipse"]): rotation method used for the bounding boxes.
            Should be one of "largest_box" or "ellipse"[1]. Default: "largest_box"
        balanced_scale (bool): When True, scaling factors are chosen to be either entirely below or above 1,
            ensuring balanced scaling. Default: False.

            This is important because without it, scaling tends to lean towards upscaling. For example, if we want
            the image to zoom in and out by 2x, we may pick an interval [0.5, 2]. Since the interval [0.5, 1] is
            half the length of [1, 2], values above 1 are picked twice as often if sampled directly
            from [0.5, 2]. With `balanced_scale`, the function ensures that half the time, the scaling
            factor is picked from below 1 (zooming out), and the other half from above 1 (zooming in).
            This makes the zooming in and out process more balanced.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    Reference:
        [1] https://arxiv.org/abs/2109.13488

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        scale: ScaleFloatType | dict[str, Any] | None = Field(
            default=None,
            description="Scaling factor or dictionary for independent axis scaling.",
        )
        translate_percent: ScaleFloatType | dict[str, Any] | None = Field(
            default=None,
            description="Translation as a fraction of the image dimension.",
        )
        translate_px: ScaleIntType | dict[str, Any] | None = Field(
            default=None,
            description="Translation in pixels.",
        )
        rotate: ScaleFloatType | None = Field(default=None, description="Rotation angle in degrees.")
        shear: ScaleFloatType | dict[str, Any] | None = Field(
            default=None,
            description="Shear angle in degrees.",
        )
        interpolation: InterpolationType = cv2.INTER_LINEAR
        mask_interpolation: InterpolationType = cv2.INTER_NEAREST

        cval: ColorType = Field(default=0, description="Value used for constant padding.")
        cval_mask: ColorType = Field(default=0, description="Value used for mask constant padding.")
        mode: BorderModeType = cv2.BORDER_CONSTANT
        fit_output: Annotated[bool, Field(default=False, description="Adjust output to capture whole image.")]
        keep_ratio: Annotated[bool, Field(default=False, description="Maintain aspect ratio when scaling.")]
        rotate_method: Literal["largest_box", "ellipse"] = "largest_box"
        balanced_scale: Annotated[bool, Field(default=False, description="Use balanced scaling.")]

    def __init__(
        self,
        scale: ScaleFloatType | dict[str, Any] | None = None,
        translate_percent: ScaleFloatType | dict[str, Any] | None = None,
        translate_px: ScaleIntType | dict[str, Any] | None = None,
        rotate: ScaleFloatType | None = None,
        shear: ScaleFloatType | dict[str, Any] | None = None,
        interpolation: int = cv2.INTER_LINEAR,
        mask_interpolation: int = cv2.INTER_NEAREST,
        cval: ColorType = 0,
        cval_mask: ColorType = 0,
        mode: int = cv2.BORDER_CONSTANT,
        fit_output: bool = False,
        keep_ratio: bool = False,
        rotate_method: Literal["largest_box", "ellipse"] = "largest_box",
        balanced_scale: bool = False,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        params = [scale, translate_percent, translate_px, rotate, shear]
        if all(p is None for p in params):
            scale = {"x": (0.9, 1.1), "y": (0.9, 1.1)}
            translate_percent = {"x": (-0.1, 0.1), "y": (-0.1, 0.1)}
            rotate = (-15, 15)
            shear = {"x": (-10, 10), "y": (-10, 10)}
        else:
            scale = scale if scale is not None else 1.0
            rotate = rotate if rotate is not None else 0.0
            shear = shear if shear is not None else 0.0

        self.interpolation = interpolation
        self.mask_interpolation = mask_interpolation
        self.cval = cval
        self.cval_mask = cval_mask
        self.mode = mode
        self.scale = self._handle_dict_arg(scale, "scale")
        self.translate_percent, self.translate_px = self._handle_translate_arg(translate_px, translate_percent)
        self.rotate = to_tuple(rotate, rotate)
        self.fit_output = fit_output
        self.shear = self._handle_dict_arg(shear, "shear")
        self.keep_ratio = keep_ratio
        self.rotate_method = rotate_method
        self.balanced_scale = balanced_scale

        if self.keep_ratio and self.scale["x"] != self.scale["y"]:
            raise ValueError(f"When keep_ratio is True, the x and y scale range should be identical. got {self.scale}")

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "interpolation",
            "mask_interpolation",
            "cval",
            "mode",
            "scale",
            "translate_percent",
            "translate_px",
            "rotate",
            "fit_output",
            "shear",
            "cval_mask",
            "keep_ratio",
            "rotate_method",
            "balanced_scale",
        )

    @staticmethod
    def _handle_dict_arg(
        val: float | tuple[float, float] | dict[str, Any],
        name: str,
        default: float = 1.0,
    ) -> dict[str, Any]:
        if isinstance(val, dict):
            if "x" not in val and "y" not in val:
                raise ValueError(
                    f'Expected {name} dictionary to contain at least key "x" or key "y". Found neither of them.',
                )
            x = val.get("x", default)
            y = val.get("y", default)
            return {"x": to_tuple(x, x), "y": to_tuple(y, y)}
        return {"x": to_tuple(val, val), "y": to_tuple(val, val)}

    @classmethod
    def _handle_translate_arg(
        cls,
        translate_px: ScaleFloatType | dict[str, Any] | None,
        translate_percent: ScaleFloatType | dict[str, Any] | None,
    ) -> Any:
        if translate_percent is None and translate_px is None:
            translate_px = 0

        if translate_percent is not None and translate_px is not None:
            msg = "Expected either translate_percent or translate_px to be provided, but both were provided."
            raise ValueError(msg)

        if translate_percent is not None:
            # translate by percent
            return cls._handle_dict_arg(translate_percent, "translate_percent", default=0.0), translate_px

        if translate_px is None:
            msg = "translate_px is None."
            raise ValueError(msg)
        # translate by pixels
        return translate_percent, cls._handle_dict_arg(translate_px, "translate_px")

    def apply(
        self,
        img: np.ndarray,
        matrix: skimage.transform.ProjectiveTransform,
        output_shape: tuple[int, int],
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.warp_affine(
            img,
            matrix,
            interpolation=self.interpolation,
            cval=self.cval,
            mode=self.mode,
            output_shape=output_shape,
        )

    def apply_to_mask(
        self,
        mask: np.ndarray,
        matrix: skimage.transform.ProjectiveTransform,
        output_shape: tuple[int, int],
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.warp_affine(
            mask,
            matrix,
            interpolation=self.mask_interpolation,
            cval=self.cval_mask,
            mode=self.mode,
            output_shape=output_shape,
        )

    def apply_to_bboxes(
        self,
        bboxes: np.ndarray,
        bbox_matrix: skimage.transform.AffineTransform,
        output_shape: tuple[int, int],
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.bboxes_affine(
            bboxes,
            bbox_matrix,
            self.rotate_method,
            params["shape"][:2],
            self.mode,
            output_shape,
        )

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        matrix: skimage.transform.AffineTransform,
        scale: dict[str, Any],
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.keypoints_affine(keypoints, matrix, params["shape"], scale, self.mode)

    @staticmethod
    def get_scale(
        scale: dict[str, tuple[float, float]],
        keep_ratio: bool,
        balanced_scale: bool,
    ) -> fgeometric.ScaleDict:
        result_scale = {}
        if balanced_scale:
            for key, value in scale.items():
                lower_interval = (value[0], 1.0) if value[0] < 1 else None
                upper_interval = (1.0, value[1]) if value[1] > 1 else None

                if lower_interval is not None and upper_interval is not None:
                    selected_interval = random.choice([lower_interval, upper_interval])
                elif lower_interval is not None:
                    selected_interval = lower_interval
                elif upper_interval is not None:
                    selected_interval = upper_interval
                else:
                    raise ValueError(f"Both lower_interval and upper_interval are None for key: {key}")

                result_scale[key] = random.uniform(*selected_interval)
        else:
            result_scale = {key: random.uniform(*value) for key, value in scale.items()}

        if keep_ratio:
            result_scale["y"] = result_scale["x"]

        return cast(fgeometric.ScaleDict, result_scale)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        image_shape = params["shape"][:2]

        translate = self._get_translate_params(image_shape)
        shear = self._get_shear_params()
        scale = self.get_scale(self.scale, self.keep_ratio, self.balanced_scale)
        rotate = -random.uniform(*self.rotate)

        image_shift = fgeometric.center(image_shape)
        bbox_shift = fgeometric.center_bbox(image_shape)

        matrix = fgeometric.create_affine_transformation_matrix(translate, shear, scale, rotate, image_shift)
        bbox_matrix = fgeometric.create_affine_transformation_matrix(translate, shear, scale, rotate, bbox_shift)

        if self.fit_output:
            matrix, output_shape = fgeometric.compute_affine_warp_output_shape(matrix, image_shape)
            bbox_matrix, _ = fgeometric.compute_affine_warp_output_shape(bbox_matrix, image_shape)
        else:
            output_shape = image_shape

        return {
            "rotate": rotate,
            "scale": scale,
            "matrix": matrix,
            "bbox_matrix": bbox_matrix,
            "output_shape": output_shape,
        }

    def _get_translate_params(self, image_shape: tuple[int, int]) -> fgeometric.TranslateDict:
        height, width = image_shape[:2]
        if self.translate_px is not None:
            return cast(
                fgeometric.TranslateDict,
                {key: random.randint(*value) for key, value in self.translate_px.items()},
            )
        if self.translate_percent is not None:
            translate = {key: random.uniform(*value) for key, value in self.translate_percent.items()}
            return cast(fgeometric.TranslateDict, {"x": translate["x"] * width, "y": translate["y"] * height})
        return cast(fgeometric.TranslateDict, {"x": 0, "y": 0})

    def _get_shear_params(self) -> fgeometric.ShearDict:
        return cast(fgeometric.ShearDict, {key: -random.uniform(*value) for key, value in self.shear.items()})
class InitSchema


Source code in albumentations/augmentations/geometric/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    scale: ScaleFloatType | dict[str, Any] | None = Field(
        default=None,
        description="Scaling factor or dictionary for independent axis scaling.",
    )
    translate_percent: ScaleFloatType | dict[str, Any] | None = Field(
        default=None,
        description="Translation as a fraction of the image dimension.",
    )
    translate_px: ScaleIntType | dict[str, Any] | None = Field(
        default=None,
        description="Translation in pixels.",
    )
    rotate: ScaleFloatType | None = Field(default=None, description="Rotation angle in degrees.")
    shear: ScaleFloatType | dict[str, Any] | None = Field(
        default=None,
        description="Shear angle in degrees.",
    )
    interpolation: InterpolationType = cv2.INTER_LINEAR
    mask_interpolation: InterpolationType = cv2.INTER_NEAREST

    cval: ColorType = Field(default=0, description="Value used for constant padding.")
    cval_mask: ColorType = Field(default=0, description="Value used for mask constant padding.")
    mode: BorderModeType = cv2.BORDER_CONSTANT
    fit_output: Annotated[bool, Field(default=False, description="Adjust output to capture whole image.")]
    keep_ratio: Annotated[bool, Field(default=False, description="Maintain aspect ratio when scaling.")]
    rotate_method: Literal["largest_box", "ellipse"] = "largest_box"
    balanced_scale: Annotated[bool, Field(default=False, description="Use balanced scaling.")]

apply (self, img, matrix, output_shape, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    matrix: skimage.transform.ProjectiveTransform,
    output_shape: tuple[int, int],
    **params: Any,
) -> np.ndarray:
    return fgeometric.warp_affine(
        img,
        matrix,
        interpolation=self.interpolation,
        cval=self.cval,
        mode=self.mode,
        output_shape=output_shape,
    )
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    image_shape = params["shape"][:2]

    translate = self._get_translate_params(image_shape)
    shear = self._get_shear_params()
    scale = self.get_scale(self.scale, self.keep_ratio, self.balanced_scale)
    rotate = -random.uniform(*self.rotate)

    image_shift = fgeometric.center(image_shape)
    bbox_shift = fgeometric.center_bbox(image_shape)

    matrix = fgeometric.create_affine_transformation_matrix(translate, shear, scale, rotate, image_shift)
    bbox_matrix = fgeometric.create_affine_transformation_matrix(translate, shear, scale, rotate, bbox_shift)

    if self.fit_output:
        matrix, output_shape = fgeometric.compute_affine_warp_output_shape(matrix, image_shape)
        bbox_matrix, _ = fgeometric.compute_affine_warp_output_shape(bbox_matrix, image_shape)
    else:
        output_shape = image_shape

    return {
        "rotate": rotate,
        "scale": scale,
        "matrix": matrix,
        "bbox_matrix": bbox_matrix,
        "output_shape": output_shape,
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "interpolation",
        "mask_interpolation",
        "cval",
        "mode",
        "scale",
        "translate_percent",
        "translate_px",
        "rotate",
        "fit_output",
        "shear",
        "cval_mask",
        "keep_ratio",
        "rotate_method",
        "balanced_scale",
    )
class D4 (always_apply=None, p=1) [view source on GitHub]

Applies one of the eight possible D4 dihedral group transformations to a square-shaped input, maintaining the square shape. These transformations correspond to the symmetries of a square, including rotations and reflections.

The D4 group transformations include:
- 'e' (identity): No transformation is applied.
- 'r90' (rotation by 90 degrees counterclockwise)
- 'r180' (rotation by 180 degrees)
- 'r270' (rotation by 270 degrees counterclockwise)
- 'v' (reflection across the vertical midline)
- 'hvt' (reflection across the anti-diagonal)
- 'h' (reflection across the horizontal midline)
- 't' (reflection across the main diagonal)

Even if the probability (p) of applying the transform is set to 1, the identity transformation 'e' may still occur, which means the input will remain unchanged in one out of eight cases.

Parameters:

Name Type Description
p float

Probability of applying the transform. Default is 1, meaning the transform is applied every time it is called.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32

Note

This transform is particularly useful when augmenting data that does not have a clear orientation:
- Top view satellite or drone imagery
- Medical images
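
A minimal usage sketch (shapes and probability are illustrative): applying D4 to a square image and its mask, the typical setup for orientation-free data such as the aerial or medical imagery mentioned above.

Python
import albumentations as A
import numpy as np

transform = A.Compose([A.D4(p=1.0)])

image = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)  # square input keeps its shape
mask = np.random.randint(0, 2, (512, 512), dtype=np.uint8)

augmented = transform(image=image, mask=mask)
# One of the eight group elements ('e', 'r90', 'r180', 'r270', 'v', 'hvt', 'h', 't')
# is sampled uniformly, so about one call in eight leaves the input unchanged.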

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class D4(DualTransform):
    """Applies one of the eight possible D4 dihedral group transformations to a square-shaped input,
        maintaining the square shape. These transformations correspond to the symmetries of a square,
        including rotations and reflections.

    The D4 group transformations include:
    - 'e' (identity): No transformation is applied.
    - 'r90' (rotation by 90 degrees counterclockwise)
    - 'r180' (rotation by 180 degrees)
    - 'r270' (rotation by 270 degrees counterclockwise)
    - 'v' (reflection across the vertical midline)
    - 'hvt' (reflection across the anti-diagonal)
    - 'h' (reflection across the horizontal midline)
    - 't' (reflection across the main diagonal)

    Even if the probability (`p`) of applying the transform is set to 1, the identity transformation
    'e' may still occur, which means the input will remain unchanged in one out of eight cases.

    Args:
        p (float): Probability of applying the transform. Default is 1, meaning the
                   transform is applied every time it is called.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        This transform is particularly useful when augmenting data that does not have a clear orientation:
        - Top view satellite or drone imagery
        - Medical images

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        p: ProbabilityType = 1

    def __init__(
        self,
        always_apply: bool | None = None,
        p: float = 1,
    ):
        super().__init__(p=p, always_apply=always_apply)

    def apply(self, img: np.ndarray, group_element: D4Type, **params: Any) -> np.ndarray:
        return fgeometric.d4(img, group_element)

    def apply_to_bboxes(self, bboxes: np.ndarray, group_element: D4Type, **params: Any) -> np.ndarray:
        return fgeometric.bboxes_d4(bboxes, group_element)

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        group_element: D4Type,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.keypoints_d4(keypoints, group_element, params["shape"])

    def get_params(self) -> dict[str, D4Type]:
        return {
            "group_element": random_utils.choice(d4_group_elements),
        }

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()
class InitSchema


Source code in albumentations/augmentations/geometric/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    p: ProbabilityType = 1

apply (self, img, group_element, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(self, img: np.ndarray, group_element: D4Type, **params: Any) -> np.ndarray:
    return fgeometric.d4(img, group_element)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_params(self) -> dict[str, D4Type]:
    return {
        "group_element": random_utils.choice(d4_group_elements),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[()]:
    return ()
class ElasticTransform (alpha=3, sigma=50, alpha_affine=None, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=None, approximate=False, same_dxdy=False, p=0.5) [view source on GitHub]

Apply elastic deformation to images, masks, and bounding boxes as described in [Simard2003].

This transformation introduces random elastic distortions to images, which can be useful for data augmentation in training convolutional neural networks. The transformation can be applied in an approximate or precise manner, with an option to use the same displacement field for both x and y directions to speed up the process.

Parameters:

Name Type Description
alpha float

Scaling factor for the random displacement fields.

sigma float

Standard deviation for Gaussian filter applied to the displacement fields.

interpolation int

Interpolation method to be used. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default is cv2.INTER_LINEAR.

border_mode int

Pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default is cv2.BORDER_REFLECT_101.

value int, float, list of int, list of float

Padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of int, list of float

Padding value if border_mode is cv2.BORDER_CONSTANT, applied to masks.

approximate bool

Whether to smooth displacement map with a fixed kernel size. Enabling this option gives ~2X speedup on large images. Default is False.

same_dxdy bool

Whether to use the same random displacement for x and y directions. Enabling this option gives ~2X speedup. Default is False.

Targets

image, mask, bboxes

Image types: uint8, float32

Reference

Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003. https://gist.github.com/ernestum/601cdf56d2b424757de5
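
A minimal usage sketch (parameter values are illustrative): approximate=True and same_dxdy=True trade some fidelity of the displacement field for the roughly 2x speedups described above.

Python
import albumentations as A
import cv2
import numpy as np

transform = A.Compose(
    [
        A.ElasticTransform(
            alpha=3,
            sigma=50,
            interpolation=cv2.INTER_LINEAR,
            border_mode=cv2.BORDER_REFLECT_101,
            approximate=True,  # smooth the displacement map with a fixed kernel size
            same_dxdy=True,    # reuse the same displacement field for x and y
            p=1.0,
        ),
    ]
)

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (256, 256), dtype=np.uint8)
augmented = transform(image=image, mask=mask)  # the mask is warped with nearest-neighbor interpolation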

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class ElasticTransform(DualTransform):
    """Apply elastic deformation to images, masks, and bounding boxes as described in [Simard2003]_.

    This transformation introduces random elastic distortions to images, which can be useful for data augmentation
    in training convolutional neural networks. The transformation can be applied in an approximate or precise manner,
    with an option to use the same displacement field for both x and y directions to speed up the process.

    Args:
        alpha (float): Scaling factor for the random displacement fields.
        sigma (float): Standard deviation for Gaussian filter applied to the displacement fields.
        interpolation (int): Interpolation method to be used. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default is cv2.INTER_LINEAR.
        border_mode (int): Pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default is cv2.BORDER_REFLECT_101.
        value (int, float, list of int, list of float, optional): Padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float, list of int, list of float, optional): Padding value if border_mode is
            cv2.BORDER_CONSTANT, applied to masks.
        approximate (bool, optional): Whether to smooth displacement map with a fixed kernel size.
            Enabling this option gives ~2X speedup on large images. Default is False.
        same_dxdy (bool, optional): Whether to use the same random displacement for x and y directions.
            Enabling this option gives ~2X speedup. Default is False.

    Targets:
        image, mask, bboxes

    Image types:
        uint8, float32

    Reference:
        Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to
        Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.
        https://gist.github.com/ernestum/601cdf56d2b424757de5
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)

    class InitSchema(BaseTransformInitSchema):
        alpha: Annotated[float, Field(description="Alpha parameter.", ge=0)]
        sigma: Annotated[float, Field(default=50, description="Sigma parameter for Gaussian filter.", ge=1)]
        alpha_affine: None = Field(
            description="Alpha affine parameter.",
            deprecated="Use Affine transform to get affine effects",
        )
        interpolation: InterpolationType = cv2.INTER_LINEAR
        border_mode: BorderModeType = cv2.BORDER_REFLECT_101
        value: int | float | list[int] | list[float] | None = Field(
            default=None,
            description="Padding value if border_mode is cv2.BORDER_CONSTANT.",
        )
        mask_value: float | list[int] | list[float] | None = Field(
            default=None,
            description="Padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.",
        )
        approximate: Annotated[bool, Field(default=False, description="Approximate displacement map smoothing.")]
        same_dxdy: Annotated[bool, Field(default=False, description="Use same shift for x and y.")]

    def __init__(
        self,
        alpha: float = 3,
        sigma: float = 50,
        alpha_affine: None = None,
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ScalarType | list[ScalarType] | None = None,
        mask_value: ScalarType | list[ScalarType] | None = None,
        always_apply: bool | None = None,
        approximate: bool = False,
        same_dxdy: bool = False,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.alpha = alpha
        self.sigma = sigma
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value
        self.approximate = approximate
        self.same_dxdy = same_dxdy

    def apply(
        self,
        img: np.ndarray,
        random_seed: int,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.elastic_transform(
            img,
            self.alpha,
            self.sigma,
            interpolation,
            self.border_mode,
            self.value,
            np.random.RandomState(random_seed),
            self.approximate,
            self.same_dxdy,
        )

    def apply_to_mask(self, mask: np.ndarray, random_seed: int, **params: Any) -> np.ndarray:
        return fgeometric.elastic_transform(
            mask,
            self.alpha,
            self.sigma,
            cv2.INTER_NEAREST,
            self.border_mode,
            self.mask_value,
            np.random.RandomState(random_seed),
            self.approximate,
            self.same_dxdy,
        )

    def apply_to_bboxes(self, bboxes: np.ndarray, random_seed: int, **params: Any) -> np.ndarray:
        return fgeometric.bbox_elastic_transform(
            bboxes,
            self.alpha,
            self.sigma,
            self.interpolation,
            self.border_mode,
            self.approximate,
            self.same_dxdy,
            random_seed,
            params["shape"],
        )

    def get_params(self) -> dict[str, int]:
        return {"random_seed": random_utils.get_random_seed()}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "alpha",
            "sigma",
            "interpolation",
            "border_mode",
            "value",
            "mask_value",
            "approximate",
            "same_dxdy",
        )

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "bboxes": self.apply_to_bboxes,
        }
class InitSchema


Source code in albumentations/augmentations/geometric/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    alpha: Annotated[float, Field(description="Alpha parameter.", ge=0)]
    sigma: Annotated[float, Field(default=50, description="Sigma parameter for Gaussian filter.", ge=1)]
    alpha_affine: None = Field(
        description="Alpha affine parameter.",
        deprecated="Use Affine transform to get affine effects",
    )
    interpolation: InterpolationType = cv2.INTER_LINEAR
    border_mode: BorderModeType = cv2.BORDER_REFLECT_101
    value: int | float | list[int] | list[float] | None = Field(
        default=None,
        description="Padding value if border_mode is cv2.BORDER_CONSTANT.",
    )
    mask_value: float | list[int] | list[float] | None = Field(
        default=None,
        description="Padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.",
    )
    approximate: Annotated[bool, Field(default=False, description="Approximate displacement map smoothing.")]
    same_dxdy: Annotated[bool, Field(default=False, description="Use same shift for x and y.")]

apply (self, img, random_seed, interpolation, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    random_seed: int,
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.elastic_transform(
        img,
        self.alpha,
        self.sigma,
        interpolation,
        self.border_mode,
        self.value,
        np.random.RandomState(random_seed),
        self.approximate,
        self.same_dxdy,
    )
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_params(self) -> dict[str, int]:
    return {"random_seed": random_utils.get_random_seed()}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "alpha",
        "sigma",
        "interpolation",
        "border_mode",
        "value",
        "mask_value",
        "approximate",
        "same_dxdy",
    )
class Flip (always_apply=None, p=0.5) [view source on GitHub]

Deprecated. Consider using HorizontalFlip, VerticalFlip, RandomRotate90 or D4.
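
A hedged migration sketch (probabilities are illustrative): since Flip is deprecated, request the symmetries you actually want with the explicit transforms listed above.

Python
import albumentations as A

# Before (deprecated):
# transform = A.Compose([A.Flip(p=0.5)])

# After: apply the flips explicitly.
transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.5),
    ]
)

# For square inputs where all eight symmetries are acceptable augmentations,
# A.D4(p=1.0) covers the full set of flips and 90-degree rotations.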

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class Flip(DualTransform):
    """Deprecated. Consider using HorizontalFlip, VerticalFlip, RandomRotate90 or D4."""

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def __init__(self, always_apply: bool | None = None, p: float = 0.5):
        super().__init__(p=p, always_apply=always_apply)
        warn(
            "Flip is deprecated. Consider using HorizontalFlip, VerticalFlip, RandomRotate90 or D4.",
            DeprecationWarning,
            stacklevel=2,
        )

    def apply(self, img: np.ndarray, d: int, **params: Any) -> np.ndarray:
        """Args:
        d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping,
                -1 for both vertical and horizontal flipping (which is also could be seen as rotating the input by
                180 degrees).
        """
        return fgeometric.random_flip(img, d)

    def get_params(self) -> dict[str, int]:
        # Random int in the range [-1, 1]
        return {"d": random.randint(-1, 1)}

    def apply_to_bboxes(self, bboxes: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.bboxes_flip(bboxes, params["d"])

    def apply_to_keypoints(self, keypoints: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.keypoints_flip(keypoints, params["d"], params["shape"])

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()
__init__ (self, always_apply=None, p=0.5) special

Initialize self. See help(type(self)) for accurate signature.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def __init__(self, always_apply: bool | None = None, p: float = 0.5):
    super().__init__(p=p, always_apply=always_apply)
    warn(
        "Flip is deprecated. Consider using HorizontalFlip, VerticalFlip, RandomRotate90 or D4.",
        DeprecationWarning,
        stacklevel=2,
    )
apply (self, img, d, **params)

d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping, -1 for both vertical and horizontal flipping (which can also be seen as rotating the input by 180 degrees).

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(self, img: np.ndarray, d: int, **params: Any) -> np.ndarray:
    """Args:
    d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping,
            -1 for both vertical and horizontal flipping (which is also could be seen as rotating the input by
            180 degrees).
    """
    return fgeometric.random_flip(img, d)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_params(self) -> dict[str, int]:
    # Random int in the range [-1, 1]
    return {"d": random.randint(-1, 1)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[()]:
    return ()
class GridDistortion (num_steps=5, distort_limit=(-0.3, 0.3), interpolation=1, border_mode=4, value=None, mask_value=None, normalized=False, always_apply=None, p=0.5) [view source on GitHub]

Applies grid distortion augmentation to images, masks, and bounding boxes. This technique involves dividing the image into a grid of cells and randomly displacing the intersection points of the grid, resulting in localized distortions.

Parameters:

Name Type Description
num_steps int

Number of grid cells on each side (minimum 1).

distort_limit float, (float, float)

Range of distortion limits. If a single float is provided, the range will be from (-distort_limit, distort_limit). Default: (-0.3, 0.3).

interpolation OpenCV flag

Interpolation algorithm used for image transformation. Options are: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

Pixel extrapolation method used when pixels outside the image are required. Options are: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101.

value int, float, list of ints, list of floats

Value used for padding when border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of floats

Padding value for masks when border_mode is cv2.BORDER_CONSTANT.

normalized bool

If True, ensures that distortion does not exceed image boundaries. Default: False. Reference: https://github.com/albumentations-team/albumentations/pull/722

Targets

image, mask, bboxes

Image types: uint8, float32

Note

This transform is helpful in medical imagery, Optical Character Recognition, and other tasks where local distance may not be preserved.
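
A minimal usage sketch (parameter values are illustrative): normalized=True keeps the distortion inside the image boundaries, as described above.

Python
import albumentations as A
import cv2
import numpy as np

transform = A.Compose(
    [
        A.GridDistortion(
            num_steps=5,
            distort_limit=(-0.3, 0.3),
            interpolation=cv2.INTER_LINEAR,
            border_mode=cv2.BORDER_REFLECT_101,
            normalized=True,
            p=1.0,
        ),
    ]
)

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (256, 256), dtype=np.uint8)
augmented = transform(image=image, mask=mask)  # the mask is warped with nearest-neighbor interpolation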

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class GridDistortion(DualTransform):
    """Applies grid distortion augmentation to images, masks, and bounding boxes. This technique involves dividing
    the image into a grid of cells and randomly displacing the intersection points of the grid,
    resulting in localized distortions.

    Args:
        num_steps (int): Number of grid cells on each side (minimum 1).
        distort_limit (float, (float, float)): Range of distortion limits. If a single float is provided,
            the range will be from (-distort_limit, distort_limit). Default: (-0.3, 0.3).
        interpolation (OpenCV flag): Interpolation algorithm used for image transformation. Options are:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): Pixel extrapolation method used when pixels outside the image are required.
            Options are: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP,
            cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101.
        value (int, float, list of ints, list of floats, optional): Value used for padding when
            border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float, list of ints, list of floats, optional): Padding value for masks when
            border_mode is cv2.BORDER_CONSTANT.
        normalized (bool): If True, ensures that distortion does not exceed image boundaries. Default: False.
            Reference: https://github.com/albumentations-team/albumentations/pull/722

    Targets:
        image, mask, bboxes

    Image types:
        uint8, float32

    Note:
        This transform is helpful in medical imagery, Optical Character Recognition, and other tasks where local
        distance may not be preserved.
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)

    class InitSchema(BaseTransformInitSchema):
        num_steps: Annotated[int, Field(ge=1, description="Count of grid cells on each side.")]
        distort_limit: SymmetricRangeType = (-0.3, 0.3)
        interpolation: InterpolationType = cv2.INTER_LINEAR
        border_mode: BorderModeType = cv2.BORDER_REFLECT_101
        value: ColorType | None = Field(
            default=None,
            description="Padding value if border_mode is cv2.BORDER_CONSTANT.",
        )
        mask_value: ColorType | None = Field(
            default=None,
            description="Padding value for mask if border_mode is cv2.BORDER_CONSTANT.",
        )
        normalized: bool = Field(
            default=False,
            description="If true, distortion will be normalized to not go outside the image.",
        )

        @field_validator("distort_limit")
        @classmethod
        def check_limits(cls, v: tuple[float, float], info: ValidationInfo) -> tuple[float, float]:
            bounds = -1, 1
            result = to_tuple(v)
            check_range(result, *bounds, info.field_name)
            return result

    def __init__(
        self,
        num_steps: int = 5,
        distort_limit: ScaleFloatType = (-0.3, 0.3),
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ColorType | None = None,
        mask_value: ColorType | None = None,
        normalized: bool = False,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)

        self.num_steps = num_steps
        self.distort_limit = cast(Tuple[float, float], distort_limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value
        self.normalized = normalized

    def apply(
        self,
        img: np.ndarray,
        stepsx: tuple[()],
        stepsy: tuple[()],
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.grid_distortion(
            img,
            self.num_steps,
            stepsx,
            stepsy,
            interpolation,
            self.border_mode,
            self.value,
        )

    def apply_to_mask(
        self,
        mask: np.ndarray,
        stepsx: tuple[()],
        stepsy: tuple[()],
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.grid_distortion(
            mask,
            self.num_steps,
            stepsx,
            stepsy,
            cv2.INTER_NEAREST,
            self.border_mode,
            self.mask_value,
        )

    def apply_to_bboxes(
        self,
        bboxes: np.ndarray,
        stepsx: tuple[()],
        stepsy: tuple[()],
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.bboxes_grid_distortion(
            bboxes,
            stepsx,
            stepsy,
            self.num_steps,
            self.border_mode,
            params["shape"],
        )

    def _normalize(self, h: int, w: int, xsteps: list[float], ysteps: list[float]) -> dict[str, Any]:
        # compensate for smaller last steps in source image.
        x_step = w // self.num_steps
        last_x_step = min(w, ((self.num_steps + 1) * x_step)) - (self.num_steps * x_step)
        xsteps[-1] *= last_x_step / x_step

        y_step = h // self.num_steps
        last_y_step = min(h, ((self.num_steps + 1) * y_step)) - (self.num_steps * y_step)
        ysteps[-1] *= last_y_step / y_step

        # now normalize such that distortion never leaves image bounds.
        tx = w / math.floor(w / self.num_steps)
        ty = h / math.floor(h / self.num_steps)
        xsteps = np.array(xsteps) * (tx / np.sum(xsteps))
        ysteps = np.array(ysteps) * (ty / np.sum(ysteps))

        return {"stepsx": xsteps, "stepsy": ysteps}

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        stepsx = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]
        stepsy = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]

        if self.normalized:
            return self._normalize(height, width, stepsx, stepsy)

        return {"stepsx": stepsx, "stepsy": stepsy}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "num_steps", "distort_limit", "interpolation", "border_mode", "value", "mask_value", "normalized"

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "bboxes": self.apply_to_bboxes,
        }
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/geometric/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    num_steps: Annotated[int, Field(ge=1, description="Count of grid cells on each side.")]
    distort_limit: SymmetricRangeType = (-0.3, 0.3)
    interpolation: InterpolationType = cv2.INTER_LINEAR
    border_mode: BorderModeType = cv2.BORDER_REFLECT_101
    value: ColorType | None = Field(
        default=None,
        description="Padding value if border_mode is cv2.BORDER_CONSTANT.",
    )
    mask_value: ColorType | None = Field(
        default=None,
        description="Padding value for mask if border_mode is cv2.BORDER_CONSTANT.",
    )
    normalized: bool = Field(
        default=False,
        description="If true, distortion will be normalized to not go outside the image.",
    )

    @field_validator("distort_limit")
    @classmethod
    def check_limits(cls, v: tuple[float, float], info: ValidationInfo) -> tuple[float, float]:
        bounds = -1, 1
        result = to_tuple(v)
        check_range(result, *bounds, info.field_name)
        return result
apply (self, img, stepsx, stepsy, interpolation, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    stepsx: tuple[()],
    stepsy: tuple[()],
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.grid_distortion(
        img,
        self.num_steps,
        stepsx,
        stepsy,
        interpolation,
        self.border_mode,
        self.value,
    )
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    stepsx = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]
    stepsy = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]

    if self.normalized:
        return self._normalize(height, width, stepsx, stepsy)

    return {"stepsx": stepsx, "stepsy": stepsy}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "num_steps", "distort_limit", "interpolation", "border_mode", "value", "mask_value", "normalized"
class GridElasticDeform (num_grid_xy, magnitude, interpolation=1, mask_interpolation=0, p=1.0, always_apply=None) [view source on GitHub]

Grid-based elastic deformation (Albumentations implementation).

This class applies elastic transformations using a grid-based approach. The granularity and intensity of the distortions are controlled by the dimensions of the overlaid distortion grid and the magnitude parameter. Larger grid sizes result in finer, less severe distortions.

Parameters:

Name Type Description
num_grid_xy tuple[int, int]

Number of grid cells along the width and height. Specified as (grid_width, grid_height). Each value must be greater than 1.

magnitude int

Maximum pixel-wise displacement for distortion. Must be greater than 0.

interpolation int

Interpolation method to be used for the image transformation. Default: cv2.INTER_LINEAR

mask_interpolation int

Interpolation method to be used for mask transformation. Default: cv2.INTER_NEAREST

p float

Probability of applying the transform. Default: 1.0.

Targets

image, mask

Image types: uint8, float32

Examples:

Python
>>> transform = GridElasticDeform(num_grid_xy=(4, 4), magnitude=10, p=1.0)
>>> result = transform(image=image, mask=mask)
>>> transformed_image, transformed_mask = result['image'], result['mask']

Note

This transformation is particularly useful for data augmentation in medical imaging and other domains where elastic deformations can simulate realistic variations.
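
As a hedged sketch of typical use, the transform can be combined with other augmentations in a Compose pipeline; the companion transforms and probabilities below are illustrative assumptions, not recommendations.

Python
>>> import albumentations as A
>>> pipeline = A.Compose([
...     A.HorizontalFlip(p=0.5),
...     A.GridElasticDeform(num_grid_xy=(4, 4), magnitude=10, p=0.5),
... ])
>>> augmented = pipeline(image=image, mask=mask)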

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class GridElasticDeform(DualTransform):
    """Grid-based Elastic deformation Albumentations implementation

    This class applies elastic transformations using a grid-based approach.
    The granularity and intensity of the distortions can be controlled using
    the dimensions of the overlaying distortion grid and the magnitude parameter.
    Larger grid sizes result in finer, less severe distortions.

    Args:
        num_grid_xy (tuple[int, int]): Number of grid cells along the width and height.
            Specified as (grid_width, grid_height). Each value must be greater than 1.
        magnitude (int): Maximum pixel-wise displacement for distortion. Must be greater than 0.
        interpolation (int): Interpolation method to be used for the image transformation.
            Default: cv2.INTER_LINEAR
        mask_interpolation (int): Interpolation method to be used for mask transformation.
            Default: cv2.INTER_NEAREST
        p (float): Probability of applying the transform. Default: 1.0.

    Targets:
        image, mask

    Image types:
        uint8, float32

    Example:
        >>> transform = GridElasticDeform(num_grid_xy=(4, 4), magnitude=10, p=1.0)
        >>> result = transform(image=image, mask=mask)
        >>> transformed_image, transformed_mask = result['image'], result['mask']

    Note:
        This transformation is particularly useful for data augmentation in medical imaging
        and other domains where elastic deformations can simulate realistic variations.
    """

    _targets = (Targets.IMAGE, Targets.MASK)

    class InitSchema(BaseTransformInitSchema):
        num_grid_xy: Annotated[tuple[int, int], AfterValidator(check_1plus)]
        p: ProbabilityType = 1.0
        magnitude: int = Field(gt=0)
        interpolation: InterpolationType = cv2.INTER_LINEAR
        mask_interpolation: InterpolationType = cv2.INTER_NEAREST

    def __init__(
        self,
        num_grid_xy: tuple[int, int],
        magnitude: int,
        interpolation: int = cv2.INTER_LINEAR,
        mask_interpolation: int = cv2.INTER_NEAREST,
        p: float = 1.0,
        always_apply: bool | None = None,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.num_grid_xy = num_grid_xy
        self.magnitude = magnitude
        self.interpolation = interpolation
        self.mask_interpolation = mask_interpolation

    @staticmethod
    def generate_mesh(polygons: np.ndarray, dimensions: np.ndarray) -> np.ndarray:
        return np.hstack((dimensions.reshape(-1, 4), polygons))

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, Any]:
        image_shape = np.array(params["shape"][:2])

        dimensions = fgeometric.calculate_grid_dimensions(image_shape, self.num_grid_xy)

        polygons = fgeometric.generate_distorted_grid_polygons(dimensions, self.magnitude)

        generated_mesh = self.generate_mesh(polygons, dimensions)

        return {"generated_mesh": generated_mesh}

    def apply(self, img: np.ndarray, generated_mesh: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.distort_image(img, generated_mesh, self.interpolation)

    def apply_to_mask(self, mask: np.ndarray, generated_mesh: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.distort_image(mask, generated_mesh, self.mask_interpolation)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "num_grid_xy", "magnitude", "interpolation", "mask_interpolation"
class InitSchema

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    num_grid_xy: Annotated[tuple[int, int], AfterValidator(check_1plus)]
    p: ProbabilityType = 1.0
    magnitude: int = Field(gt=0)
    interpolation: InterpolationType = cv2.INTER_LINEAR
    mask_interpolation: InterpolationType = cv2.INTER_NEAREST
apply (self, img, generated_mesh, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(self, img: np.ndarray, generated_mesh: np.ndarray, **params: Any) -> np.ndarray:
    return fgeometric.distort_image(img, generated_mesh, self.interpolation)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_params_dependent_on_data(
    self,
    params: dict[str, Any],
    data: dict[str, Any],
) -> dict[str, Any]:
    image_shape = np.array(params["shape"][:2])

    dimensions = fgeometric.calculate_grid_dimensions(image_shape, self.num_grid_xy)

    polygons = fgeometric.generate_distorted_grid_polygons(dimensions, self.magnitude)

    generated_mesh = self.generate_mesh(polygons, dimensions)

    return {"generated_mesh": generated_mesh}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "num_grid_xy", "magnitude", "interpolation", "mask_interpolation"
class HorizontalFlip [view source on GitHub]

Flip the input horizontally around the y-axis.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
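
Since HorizontalFlip also supports bboxes and keypoints, a hedged usage sketch with bounding boxes is shown below; the bbox format, sample coordinates, and label values are illustrative assumptions.

Python
>>> import albumentations as A
>>> transform = A.Compose(
...     [A.HorizontalFlip(p=1.0)],
...     bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
... )
>>> result = transform(image=image, bboxes=[(10, 20, 50, 80)], labels=[1])
>>> flipped_image, flipped_bboxes = result["image"], result["bboxes"]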

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class HorizontalFlip(DualTransform):
    """Flip the input horizontally around the y-axis.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return hflip(img)

    def apply_to_bboxes(self, bboxes: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.bboxes_hflip(bboxes)

    def apply_to_keypoints(self, keypoints: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.keypoints_hflip(keypoints, params["cols"])

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()
apply (self, img, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return hflip(img)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[()]:
    return ()
class OpticalDistortion (distort_limit=(-0.05, 0.05), shift_limit=(-0.05, 0.05), interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=None, p=0.5) [view source on GitHub]

Parameters:

Name Type Description
distort_limit float, (float, float)

If distort_limit is a single float, the range will be (-distort_limit, distort_limit). Default: (-0.05, 0.05).

shift_limit float, (float, float)

If shift_limit is a single float, the range will be (-shift_limit, shift_limit). Default: (-0.05, 0.05).

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of ints, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

Targets

image, mask, bboxes

Image types: uint8, float32
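
A minimal, hedged usage sketch; the limits below simply restate the defaults, and `image` and `mask` are assumed to be uint8 NumPy arrays.

Python
>>> import albumentations as A
>>> transform = A.OpticalDistortion(distort_limit=0.05, shift_limit=0.05, p=1.0)
>>> result = transform(image=image, mask=mask)
>>> distorted_image, distorted_mask = result["image"], result["mask"]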

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class OpticalDistortion(DualTransform):
    """Args:
        distort_limit (float, (float, float)): If distort_limit is a single float, the range
            will be (-distort_limit, distort_limit). Default: (-0.05, 0.05).
        shift_limit (float, (float, float))): If shift_limit is a single float, the range
            will be (-shift_limit, shift_limit). Default: (-0.05, 0.05).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

    Targets:
        image, mask, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)

    class InitSchema(BaseTransformInitSchema):
        distort_limit: SymmetricRangeType = (-0.05, 0.05)
        shift_limit: SymmetricRangeType = (-0.05, 0.05)
        interpolation: InterpolationType = cv2.INTER_LINEAR
        border_mode: BorderModeType = cv2.BORDER_REFLECT_101
        value: ColorType | None = Field(
            default=None,
            description="Padding value if border_mode is cv2.BORDER_CONSTANT.",
        )
        mask_value: ColorType | None = Field(
            default=None,
            description="Padding value for mask if border_mode is cv2.BORDER_CONSTANT.",
        )

    def __init__(
        self,
        distort_limit: ScaleFloatType = (-0.05, 0.05),
        shift_limit: ScaleFloatType = (-0.05, 0.05),
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ColorType | None = None,
        mask_value: ColorType | None = None,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.shift_limit = cast(Tuple[float, float], shift_limit)
        self.distort_limit = cast(Tuple[float, float], distort_limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def apply(
        self,
        img: np.ndarray,
        k: int,
        dx: int,
        dy: int,
        interpolation: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.optical_distortion(img, k, dx, dy, interpolation, self.border_mode, self.value)

    def apply_to_mask(self, mask: np.ndarray, k: int, dx: int, dy: int, **params: Any) -> np.ndarray:
        return fgeometric.optical_distortion(mask, k, dx, dy, cv2.INTER_NEAREST, self.border_mode, self.mask_value)

    def apply_to_bboxes(self, bboxes: np.ndarray, k: float, dx: int, dy: int, **params: Any) -> np.ndarray:
        return fgeometric.bboxes_optical_distortion(bboxes, k, dx, dy, self.border_mode, params["shape"])

    def get_params(self) -> dict[str, Any]:
        return {
            "k": random.uniform(*self.distort_limit),
            "dx": round(random.uniform(*self.shift_limit)),
            "dy": round(random.uniform(*self.shift_limit)),
        }

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "distort_limit",
            "shift_limit",
            "interpolation",
            "border_mode",
            "value",
            "mask_value",
        )

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "bboxes": self.apply_to_bboxes,
        }
class InitSchema

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    distort_limit: SymmetricRangeType = (-0.05, 0.05)
    shift_limit: SymmetricRangeType = (-0.05, 0.05)
    interpolation: InterpolationType = cv2.INTER_LINEAR
    border_mode: BorderModeType = cv2.BORDER_REFLECT_101
    value: ColorType | None = Field(
        default=None,
        description="Padding value if border_mode is cv2.BORDER_CONSTANT.",
    )
    mask_value: ColorType | None = Field(
        default=None,
        description="Padding value for mask if border_mode is cv2.BORDER_CONSTANT.",
    )
apply (self, img, k, dx, dy, interpolation, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    k: int,
    dx: int,
    dy: int,
    interpolation: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.optical_distortion(img, k, dx, dy, interpolation, self.border_mode, self.value)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_params(self) -> dict[str, Any]:
    return {
        "k": random.uniform(*self.distort_limit),
        "dx": round(random.uniform(*self.shift_limit)),
        "dy": round(random.uniform(*self.shift_limit)),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "distort_limit",
        "shift_limit",
        "interpolation",
        "border_mode",
        "value",
        "mask_value",
    )
class PadIfNeeded (min_height=1024, min_width=1024, pad_height_divisor=None, pad_width_divisor=None, position=<PositionType.CENTER: 'center'>, border_mode=4, value=None, mask_value=None, always_apply=None, p=1.0) [view source on GitHub]

Pads the sides of an image if the image dimensions are less than the specified minimum dimensions. If the pad_height_divisor or pad_width_divisor is specified, the function additionally ensures that the image dimensions are divisible by these values.

Parameters:

Name Type Description
min_height int

Minimum desired height of the image. Ensures image height is at least this value.

min_width int

Minimum desired width of the image. Ensures image width is at least this value.

pad_height_divisor int

If set, pads the image height to make it divisible by this value.

pad_width_divisor int

If set, pads the image width to make it divisible by this value.

position Union[str, PositionType]

Position where the image is to be placed after padding. Can be one of 'center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', or 'random'. Default is 'center'.

border_mode int

Specifies the border mode to use if padding is required. The default is cv2.BORDER_REFLECT_101.

value Union[int, float, list[int], list[float]]

Value to fill the border pixels if the border mode is cv2.BORDER_CONSTANT. Default is None.

mask_value Union[int, float, list[int], list[float]]

Similar to value but used for padding masks. Default is None.

p float

Probability of applying the transform. Default is 1.0.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
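
Two hedged usage sketches: the first pads to a fixed minimum size with constant-value borders, the second pads so that each side becomes divisible by 32 (in divisor mode, min_height and min_width must be set to None, as enforced by the validator). All concrete values are illustrative assumptions.

Python
>>> import cv2
>>> import albumentations as A
>>> # Pad to at least 1024 x 1024, keeping the original content centered.
>>> pad_to_size = A.PadIfNeeded(
...     min_height=1024, min_width=1024,
...     border_mode=cv2.BORDER_CONSTANT, value=0, p=1.0,
... )
>>> # Pad so that height and width become divisible by 32.
>>> pad_to_divisor = A.PadIfNeeded(
...     min_height=None, min_width=None,
...     pad_height_divisor=32, pad_width_divisor=32, p=1.0,
... )
>>> padded = pad_to_size(image=image, mask=mask)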

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class PadIfNeeded(DualTransform):
    """Pads the sides of an image if the image dimensions are less than the specified minimum dimensions.
    If the `pad_height_divisor` or `pad_width_divisor` is specified, the function additionally ensures
    that the image dimensions are divisible by these values.

    Args:
        min_height (int): Minimum desired height of the image. Ensures image height is at least this value.
        min_width (int): Minimum desired width of the image. Ensures image width is at least this value.
        pad_height_divisor (int, optional): If set, pads the image height to make it divisible by this value.
        pad_width_divisor (int, optional): If set, pads the image width to make it divisible by this value.
        position (Union[str, PositionType]): Position where the image is to be placed after padding.
            Can be one of 'center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', or 'random'.
            Default is 'center'.
        border_mode (int): Specifies the border mode to use if padding is required.
            The default is `cv2.BORDER_REFLECT_101`.
        value (Union[int, float, list[int], list[float]], optional): Value to fill the border pixels if
            the border mode is `cv2.BORDER_CONSTANT`. Default is None.
        mask_value (Union[int, float, list[int], list[float]], optional): Similar to `value` but used for padding masks.
            Default is None.
        p (float): Probability of applying the transform. Default is 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    class PositionType(Enum):
        """Enumerates the types of positions for placing an object within a container.

        This Enum class is utilized to define specific anchor positions that an object can
        assume relative to a container. It's particularly useful in image processing, UI layout,
        and graphic design to specify the alignment and positioning of elements.

        Attributes:
            CENTER (str): Specifies that the object should be placed at the center.
            TOP_LEFT (str): Specifies that the object should be placed at the top-left corner.
            TOP_RIGHT (str): Specifies that the object should be placed at the top-right corner.
            BOTTOM_LEFT (str): Specifies that the object should be placed at the bottom-left corner.
            BOTTOM_RIGHT (str): Specifies that the object should be placed at the bottom-right corner.
            RANDOM (str): Indicates that the object's position should be determined randomly.

        """

        CENTER = "center"
        TOP_LEFT = "top_left"
        TOP_RIGHT = "top_right"
        BOTTOM_LEFT = "bottom_left"
        BOTTOM_RIGHT = "bottom_right"
        RANDOM = "random"

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        min_height: int | None = Field(default=None, ge=1, description="Minimal result image height.")
        min_width: int | None = Field(default=None, ge=1, description="Minimal result image width.")
        pad_height_divisor: int | None = Field(
            default=None,
            ge=1,
            description="Ensures image height is divisible by this value.",
        )
        pad_width_divisor: int | None = Field(
            default=None,
            ge=1,
            description="Ensures image width is divisible by this value.",
        )
        position: str = Field(default="center", description="Position of the padded image.")
        border_mode: BorderModeType = cv2.BORDER_REFLECT_101
        value: ColorType | None = Field(default=None, description="Value for border if BORDER_CONSTANT is used.")
        mask_value: ColorType | None = Field(
            default=None,
            description="Value for mask border if BORDER_CONSTANT is used.",
        )
        p: ProbabilityType = 1.0

        @model_validator(mode="after")
        def validate_divisibility(self) -> Self:
            if (self.min_height is None) == (self.pad_height_divisor is None):
                msg = "Only one of 'min_height' and 'pad_height_divisor' parameters must be set"
                raise ValueError(msg)
            if (self.min_width is None) == (self.pad_width_divisor is None):
                msg = "Only one of 'min_width' and 'pad_width_divisor' parameters must be set"
                raise ValueError(msg)

            if self.border_mode == cv2.BORDER_CONSTANT and self.value is None:
                msg = "If 'border_mode' is set to 'BORDER_CONSTANT', 'value' must be provided."
                raise ValueError(msg)

            return self

    def __init__(
        self,
        min_height: int | None = 1024,
        min_width: int | None = 1024,
        pad_height_divisor: int | None = None,
        pad_width_divisor: int | None = None,
        position: PositionType | str = PositionType.CENTER,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ColorType | None = None,
        mask_value: ColorType | None = None,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p, always_apply)
        self.min_height = min_height
        self.min_width = min_width
        self.pad_width_divisor = pad_width_divisor
        self.pad_height_divisor = pad_height_divisor
        self.position = PadIfNeeded.PositionType(position)
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def update_params(self, params: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
        params = super().update_params(params, **kwargs)
        rows, cols = params["shape"][:2]

        if self.min_height is not None:
            if rows < self.min_height:
                h_pad_top = int((self.min_height - rows) / 2.0)
                h_pad_bottom = self.min_height - rows - h_pad_top
            else:
                h_pad_top = 0
                h_pad_bottom = 0
        else:
            pad_remained = rows % self.pad_height_divisor
            pad_rows = self.pad_height_divisor - pad_remained if pad_remained > 0 else 0

            h_pad_top = pad_rows // 2
            h_pad_bottom = pad_rows - h_pad_top

        if self.min_width is not None:
            if cols < self.min_width:
                w_pad_left = int((self.min_width - cols) / 2.0)
                w_pad_right = self.min_width - cols - w_pad_left
            else:
                w_pad_left = 0
                w_pad_right = 0
        else:
            pad_remainder = cols % self.pad_width_divisor
            pad_cols = self.pad_width_divisor - pad_remainder if pad_remainder > 0 else 0

            w_pad_left = pad_cols // 2
            w_pad_right = pad_cols - w_pad_left

        h_pad_top, h_pad_bottom, w_pad_left, w_pad_right = self.__update_position_params(
            h_top=h_pad_top,
            h_bottom=h_pad_bottom,
            w_left=w_pad_left,
            w_right=w_pad_right,
        )

        params.update(
            {
                "pad_top": h_pad_top,
                "pad_bottom": h_pad_bottom,
                "pad_left": w_pad_left,
                "pad_right": w_pad_right,
            },
        )
        return params

    def apply(
        self,
        img: np.ndarray,
        pad_top: int,
        pad_bottom: int,
        pad_left: int,
        pad_right: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.pad_with_params(
            img,
            pad_top,
            pad_bottom,
            pad_left,
            pad_right,
            border_mode=self.border_mode,
            value=self.value,
        )

    def apply_to_mask(
        self,
        mask: np.ndarray,
        pad_top: int,
        pad_bottom: int,
        pad_left: int,
        pad_right: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.pad_with_params(
            mask,
            pad_top,
            pad_bottom,
            pad_left,
            pad_right,
            border_mode=self.border_mode,
            value=self.mask_value,
        )

    def apply_to_bboxes(
        self,
        bboxes: np.ndarray,
        pad_top: int,
        pad_bottom: int,
        pad_left: int,
        pad_right: int,
        **params: Any,
    ) -> np.ndarray:
        image_shape = params["shape"][:2]
        bboxes_np = denormalize_bboxes(bboxes, params["shape"])

        result = fgeometric.pad_bboxes(
            bboxes_np,
            pad_top,
            pad_bottom,
            pad_left,
            pad_right,
            self.border_mode,
            image_shape=image_shape,
        )

        rows, cols = params["shape"][:2]

        return normalize_bboxes(result, (rows + pad_top + pad_bottom, cols + pad_left + pad_right))

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        pad_top: int,
        pad_bottom: int,
        pad_left: int,
        pad_right: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.pad_keypoints(
            keypoints,
            pad_top,
            pad_bottom,
            pad_left,
            pad_right,
            self.border_mode,
            image_shape=params["shape"][:2],
        )

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "min_height",
            "min_width",
            "pad_height_divisor",
            "pad_width_divisor",
            "position",
            "border_mode",
            "value",
            "mask_value",
        )

    def __update_position_params(
        self,
        h_top: int,
        h_bottom: int,
        w_left: int,
        w_right: int,
    ) -> tuple[int, int, int, int]:
        if self.position == PadIfNeeded.PositionType.TOP_LEFT:
            h_bottom += h_top
            w_right += w_left
            h_top = 0
            w_left = 0

        elif self.position == PadIfNeeded.PositionType.TOP_RIGHT:
            h_bottom += h_top
            w_left += w_right
            h_top = 0
            w_right = 0

        elif self.position == PadIfNeeded.PositionType.BOTTOM_LEFT:
            h_top += h_bottom
            w_right += w_left
            h_bottom = 0
            w_left = 0

        elif self.position == PadIfNeeded.PositionType.BOTTOM_RIGHT:
            h_top += h_bottom
            w_left += w_right
            h_bottom = 0
            w_right = 0

        elif self.position == PadIfNeeded.PositionType.RANDOM:
            h_pad = h_top + h_bottom
            w_pad = w_left + w_right
            h_top = random.randint(0, h_pad)
            h_bottom = h_pad - h_top
            w_left = random.randint(0, w_pad)
            w_right = w_pad - w_left

        return h_top, h_bottom, w_left, w_right
class InitSchema [view source on GitHub]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    min_height: int | None = Field(default=None, ge=1, description="Minimal result image height.")
    min_width: int | None = Field(default=None, ge=1, description="Minimal result image width.")
    pad_height_divisor: int | None = Field(
        default=None,
        ge=1,
        description="Ensures image height is divisible by this value.",
    )
    pad_width_divisor: int | None = Field(
        default=None,
        ge=1,
        description="Ensures image width is divisible by this value.",
    )
    position: str = Field(default="center", description="Position of the padded image.")
    border_mode: BorderModeType = cv2.BORDER_REFLECT_101
    value: ColorType | None = Field(default=None, description="Value for border if BORDER_CONSTANT is used.")
    mask_value: ColorType | None = Field(
        default=None,
        description="Value for mask border if BORDER_CONSTANT is used.",
    )
    p: ProbabilityType = 1.0

    @model_validator(mode="after")
    def validate_divisibility(self) -> Self:
        if (self.min_height is None) == (self.pad_height_divisor is None):
            msg = "Only one of 'min_height' and 'pad_height_divisor' parameters must be set"
            raise ValueError(msg)
        if (self.min_width is None) == (self.pad_width_divisor is None):
            msg = "Only one of 'min_width' and 'pad_width_divisor' parameters must be set"
            raise ValueError(msg)

        if self.border_mode == cv2.BORDER_CONSTANT and self.value is None:
            msg = "If 'border_mode' is set to 'BORDER_CONSTANT', 'value' must be provided."
            raise ValueError(msg)

        return self
class PositionType

Enumerates the types of positions for placing an object within a container.

This Enum class is utilized to define specific anchor positions that an object can assume relative to a container. It's particularly useful in image processing, UI layout, and graphic design to specify the alignment and positioning of elements.

Attributes:

Name Type Description
CENTER str

Specifies that the object should be placed at the center.

TOP_LEFT str

Specifies that the object should be placed at the top-left corner.

TOP_RIGHT str

Specifies that the object should be placed at the top-right corner.

BOTTOM_LEFT str

Specifies that the object should be placed at the bottom-left corner.

BOTTOM_RIGHT str

Specifies that the object should be placed at the bottom-right corner.

RANDOM str

Indicates that the object's position should be determined randomly.

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class PositionType(Enum):
    """Enumerates the types of positions for placing an object within a container.

    This Enum class is utilized to define specific anchor positions that an object can
    assume relative to a container. It's particularly useful in image processing, UI layout,
    and graphic design to specify the alignment and positioning of elements.

    Attributes:
        CENTER (str): Specifies that the object should be placed at the center.
        TOP_LEFT (str): Specifies that the object should be placed at the top-left corner.
        TOP_RIGHT (str): Specifies that the object should be placed at the top-right corner.
        BOTTOM_LEFT (str): Specifies that the object should be placed at the bottom-left corner.
        BOTTOM_RIGHT (str): Specifies that the object should be placed at the bottom-right corner.
        RANDOM (str): Indicates that the object's position should be determined randomly.

    """

    CENTER = "center"
    TOP_LEFT = "top_left"
    TOP_RIGHT = "top_right"
    BOTTOM_LEFT = "bottom_left"
    BOTTOM_RIGHT = "bottom_right"
    RANDOM = "random"
apply (self, img, pad_top, pad_bottom, pad_left, pad_right, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    pad_top: int,
    pad_bottom: int,
    pad_left: int,
    pad_right: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.pad_with_params(
        img,
        pad_top,
        pad_bottom,
        pad_left,
        pad_right,
        border_mode=self.border_mode,
        value=self.value,
    )
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "min_height",
        "min_width",
        "pad_height_divisor",
        "pad_width_divisor",
        "position",
        "border_mode",
        "value",
        "mask_value",
    )
update_params (self, params, **kwargs)

Update parameters with transform-specific params. This method is deprecated; use get_params for transform-specific params like interpolation, and update_params_shape for data such as shape.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def update_params(self, params: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
    params = super().update_params(params, **kwargs)
    rows, cols = params["shape"][:2]

    if self.min_height is not None:
        if rows < self.min_height:
            h_pad_top = int((self.min_height - rows) / 2.0)
            h_pad_bottom = self.min_height - rows - h_pad_top
        else:
            h_pad_top = 0
            h_pad_bottom = 0
    else:
        pad_remained = rows % self.pad_height_divisor
        pad_rows = self.pad_height_divisor - pad_remained if pad_remained > 0 else 0

        h_pad_top = pad_rows // 2
        h_pad_bottom = pad_rows - h_pad_top

    if self.min_width is not None:
        if cols < self.min_width:
            w_pad_left = int((self.min_width - cols) / 2.0)
            w_pad_right = self.min_width - cols - w_pad_left
        else:
            w_pad_left = 0
            w_pad_right = 0
    else:
        pad_remainder = cols % self.pad_width_divisor
        pad_cols = self.pad_width_divisor - pad_remainder if pad_remainder > 0 else 0

        w_pad_left = pad_cols // 2
        w_pad_right = pad_cols - w_pad_left

    h_pad_top, h_pad_bottom, w_pad_left, w_pad_right = self.__update_position_params(
        h_top=h_pad_top,
        h_bottom=h_pad_bottom,
        w_left=w_pad_left,
        w_right=w_pad_right,
    )

    params.update(
        {
            "pad_top": h_pad_top,
            "pad_bottom": h_pad_bottom,
            "pad_left": w_pad_left,
            "pad_right": w_pad_right,
        },
    )
    return params
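
To make the padding arithmetic above concrete, a worked example under assumed inputs: for a 500 x 700 image with the default min_height = min_width = 1024 and position 'center', h_pad_top = int((1024 - 500) / 2) = 262 and h_pad_bottom = 1024 - 500 - 262 = 262, while w_pad_left = int((1024 - 700) / 2) = 162 and w_pad_right = 162, giving a padded size of 1024 x 1024.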
class Perspective (scale=(0.05, 0.1), keep_size=True, pad_mode=0, pad_val=0, mask_pad_val=0, fit_output=False, interpolation=1, always_apply=None, p=0.5) [view source on GitHub]

Perform a random four point perspective transform of the input.

Parameters:

Name Type Description
scale ScaleFloatType

standard deviation of the normal distributions. These are used to sample the random distances of the subimage's corners from the full image's corners. If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1).

keep_size bool

Whether to resize images back to their original size after applying the perspective transform. If set to False, the resulting images may end up having different shapes and will always be a list, never an array. Default: True

pad_mode OpenCV flag

OpenCV border mode.

pad_val int, float, list of int, list of float

padding value if border_mode is cv2.BORDER_CONSTANT. Default: 0

mask_pad_val int, float, list of int, list of float

padding value for mask if border_mode is cv2.BORDER_CONSTANT. Default: 0

fit_output bool

If True, the image plane size and position will be adjusted to still capture the whole image after perspective transformation. (Followed by image resizing if keep_size is set to True.) Otherwise, parts of the transformed image may be outside of the image plane. This setting should not be set to True when using large scale values as it could lead to very large images. Default: False

p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, keypoints, bboxes

Image types: uint8, float32
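
A hedged usage sketch with keypoints; the keypoint format and sample coordinates are illustrative assumptions, and scale simply restates the default range.

Python
>>> import albumentations as A
>>> transform = A.Compose(
...     [A.Perspective(scale=(0.05, 0.1), keep_size=True, fit_output=False, p=1.0)],
...     keypoint_params=A.KeypointParams(format="xy"),
... )
>>> result = transform(image=image, keypoints=[(100, 50), (200, 120)])
>>> warped_image, warped_keypoints = result["image"], result["keypoints"]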

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class Perspective(DualTransform):
    """Perform a random four point perspective transform of the input.

    Args:
        scale: standard deviation of the normal distributions. These are used to sample
            the random distances of the subimage's corners from the full image's corners.
            If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1).
        keep_size: Whether to resize image back to their original size after applying the perspective
            transform. If set to False, the resulting images may end up having different shapes
            and will always be a list, never an array. Default: True
        pad_mode (OpenCV flag): OpenCV border mode.
        pad_val (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
            Default: 0
        mask_pad_val (int, float, list of int, list of float): padding value for mask
            if border_mode is cv2.BORDER_CONSTANT. Default: 0
        fit_output (bool): If True, the image plane size and position will be adjusted to still capture
            the whole image after perspective transformation. (Followed by image resizing if keep_size is set to True.)
            Otherwise, parts of the transformed image may be outside of the image plane.
            This setting should not be set to True when using large scale values as it could lead to very large images.
            Default: False
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)

    class InitSchema(BaseTransformInitSchema):
        scale: NonNegativeFloatRangeType
        keep_size: Annotated[bool, Field(default=True, description="Keep size after transform.")]
        pad_mode: BorderModeType
        pad_val: ColorType | None
        mask_pad_val: ColorType | None
        fit_output: Annotated[bool, Field(default=False, description="Adjust image plane to capture whole image.")]
        interpolation: InterpolationType

    def __init__(
        self,
        scale: ScaleFloatType = (0.05, 0.1),
        keep_size: bool = True,
        pad_mode: int = cv2.BORDER_CONSTANT,
        pad_val: ColorType = 0,
        mask_pad_val: ColorType = 0,
        fit_output: bool = False,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.scale = cast(Tuple[float, float], scale)
        self.keep_size = keep_size
        self.pad_mode = pad_mode
        self.pad_val = pad_val
        self.mask_pad_val = mask_pad_val
        self.fit_output = fit_output
        self.interpolation = interpolation

    def apply(
        self,
        img: np.ndarray,
        matrix: np.ndarray,
        max_height: int,
        max_width: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.perspective(
            img,
            matrix,
            max_width,
            max_height,
            self.pad_val,
            self.pad_mode,
            self.keep_size,
            params["interpolation"],
        )

    def apply_to_bboxes(
        self,
        bboxes: np.ndarray,
        matrix: np.ndarray,
        max_height: int,
        max_width: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.perspective_bboxes(
            bboxes,
            params["shape"],
            matrix,
            max_width,
            max_height,
            self.keep_size,
        )

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        matrix: np.ndarray,
        max_height: int,
        max_width: int,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.perspective_keypoints(
            keypoints,
            params["shape"],
            matrix,
            max_width,
            max_height,
            self.keep_size,
        )

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        scale = random.uniform(*self.scale)
        points = random_utils.normal(0, scale, (4, 2))
        points = np.mod(np.abs(points), 0.32)

        # top left -- no changes needed, just use jitter
        # top right
        points[1, 0] = 1.0 - points[1, 0]  # w = 1.0 - jitter
        # bottom right
        points[2] = 1.0 - points[2]  # w = 1.0 - jitter
        # bottom left
        points[3, 1] = 1.0 - points[3, 1]  # h = 1.0 - jitter

        points[:, 0] *= width
        points[:, 1] *= height

        # Obtain a consistent order of the points and unpack them individually.
        # Warning: don't just do (tl, tr, br, bl) = _order_points(...)
        # here, because the reordered points is used further below.
        points = self._order_points(points)
        tl, tr, br, bl = points

        # compute the width of the new image, which will be the
        # maximum distance between bottom-right and bottom-left
        # x-coordinates or the top-right and top-left x-coordinates
        min_width = None
        max_width = None
        while min_width is None or min_width < TWO:
            width_top = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
            width_bottom = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
            max_width = int(max(width_top, width_bottom))
            min_width = int(min(width_top, width_bottom))
            if min_width < TWO:
                step_size = (2 - min_width) / 2
                tl[0] -= step_size
                tr[0] += step_size
                bl[0] -= step_size
                br[0] += step_size

        # compute the height of the new image, which will be the maximum distance between the top-right
        # and bottom-right y-coordinates or the top-left and bottom-left y-coordinates
        min_height = None
        max_height = None
        while min_height is None or min_height < TWO:
            height_right = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
            height_left = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
            max_height = int(max(height_right, height_left))
            min_height = int(min(height_right, height_left))
            if min_height < TWO:
                step_size = (2 - min_height) / 2
                tl[1] -= step_size
                tr[1] -= step_size
                bl[1] += step_size
                br[1] += step_size

        # now that we have the dimensions of the new image, construct
        # the set of destination points to obtain a "birds eye view",
        # (i.e. top-down view) of the image, again specifying points
        # in the top-left, top-right, bottom-right, and bottom-left order
        # do not use width-1 or height-1 here, as for e.g. width=3, height=2
        # the bottom right coordinate is at (3.0, 2.0) and not (2.0, 1.0)
        dst = np.array([[0, 0], [max_width, 0], [max_width, max_height], [0, max_height]], dtype=np.float32)

        # compute the perspective transform matrix and then apply it
        m = cv2.getPerspectiveTransform(points, dst)

        if self.fit_output:
            m, max_width, max_height = self._expand_transform(m, (height, width))

        return {"matrix": m, "max_height": max_height, "max_width": max_width, "interpolation": self.interpolation}

    @classmethod
    def _expand_transform(cls, matrix: np.ndarray, shape: tuple[int, int]) -> tuple[np.ndarray, int, int]:
        height, width = shape[:2]
        # do not use width-1 or height-1 here, as for e.g. width=3, height=2, max_height
        # the bottom right coordinate is at (3.0, 2.0) and not (2.0, 1.0)
        rect = np.array([[0, 0], [width, 0], [width, height], [0, height]], dtype=np.float32)
        dst = cv2.perspectiveTransform(np.array([rect]), matrix)[0]

        # get min x, y over transformed 4 points
        # then modify target points by subtracting these minima  => shift to (0, 0)
        dst -= dst.min(axis=0, keepdims=True)
        dst = np.around(dst, decimals=0)

        matrix_expanded = cv2.getPerspectiveTransform(rect, dst)
        max_width, max_height = dst.max(axis=0)
        return matrix_expanded, int(max_width), int(max_height)

    @staticmethod
    def _order_points(pts: np.ndarray) -> np.ndarray:
        pts = np.array(sorted(pts, key=lambda x: x[0]))
        left = pts[:2]  # points with smallest x coordinate - left points
        right = pts[2:]  # points with greatest x coordinate - right points

        if left[0][1] < left[1][1]:
            tl, bl = left
        else:
            bl, tl = left

        if right[0][1] < right[1][1]:
            tr, br = right
        else:
            br, tr = right

        return np.array([tl, tr, br, bl], dtype=np.float32)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "scale", "keep_size", "pad_mode", "pad_val", "mask_pad_val", "fit_output", "interpolation"
class InitSchema

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/geometric/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    scale: NonNegativeFloatRangeType
    keep_size: Annotated[bool, Field(default=True, description="Keep size after transform.")]
    pad_mode: BorderModeType
    pad_val: ColorType | None
    mask_pad_val: ColorType | None
    fit_output: Annotated[bool, Field(default=False, description="Adjust image plane to capture whole image.")]
    interpolation: InterpolationType
apply (self, img, matrix, max_height, max_width, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    matrix: np.ndarray,
    max_height: int,
    max_width: int,
    **params: Any,
) -> np.ndarray:
    return fgeometric.perspective(
        img,
        matrix,
        max_width,
        max_height,
        self.pad_val,
        self.pad_mode,
        self.keep_size,
        params["interpolation"],
    )
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    scale = random.uniform(*self.scale)
    points = random_utils.normal(0, scale, (4, 2))
    points = np.mod(np.abs(points), 0.32)

    # top left -- no changes needed, just use jitter
    # top right
    points[1, 0] = 1.0 - points[1, 0]  # w = 1.0 - jitter
    # bottom right
    points[2] = 1.0 - points[2]  # w = 1.0 - jitter
    # bottom left
    points[3, 1] = 1.0 - points[3, 1]  # h = 1.0 - jitter

    points[:, 0] *= width
    points[:, 1] *= height

    # Obtain a consistent order of the points and unpack them individually.
    # Warning: don't just do (tl, tr, br, bl) = _order_points(...)
    # here, because the reordered points is used further below.
    points = self._order_points(points)
    tl, tr, br, bl = points

    # compute the width of the new image, which will be the
    # maximum distance between bottom-right and bottom-left
    # x-coordinates or the top-right and top-left x-coordinates
    min_width = None
    max_width = None
    while min_width is None or min_width < TWO:
        width_top = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
        width_bottom = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
        max_width = int(max(width_top, width_bottom))
        min_width = int(min(width_top, width_bottom))
        if min_width < TWO:
            step_size = (2 - min_width) / 2
            tl[0] -= step_size
            tr[0] += step_size
            bl[0] -= step_size
            br[0] += step_size

    # compute the height of the new image, which will be the maximum distance between the top-right
    # and bottom-right y-coordinates or the top-left and bottom-left y-coordinates
    min_height = None
    max_height = None
    while min_height is None or min_height < TWO:
        height_right = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
        height_left = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
        max_height = int(max(height_right, height_left))
        min_height = int(min(height_right, height_left))
        if min_height < TWO:
            step_size = (2 - min_height) / 2
            tl[1] -= step_size
            tr[1] -= step_size
            bl[1] += step_size
            br[1] += step_size

    # now that we have the dimensions of the new image, construct
    # the set of destination points to obtain a "birds eye view",
    # (i.e. top-down view) of the image, again specifying points
    # in the top-left, top-right, bottom-right, and bottom-left order
    # do not use width-1 or height-1 here, as for e.g. width=3, height=2
    # the bottom right coordinate is at (3.0, 2.0) and not (2.0, 1.0)
    dst = np.array([[0, 0], [max_width, 0], [max_width, max_height], [0, max_height]], dtype=np.float32)

    # compute the perspective transform matrix and then apply it
    m = cv2.getPerspectiveTransform(points, dst)

    if self.fit_output:
        m, max_width, max_height = self._expand_transform(m, (height, width))

    return {"matrix": m, "max_height": max_height, "max_width": max_width, "interpolation": self.interpolation}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "scale", "keep_size", "pad_mode", "pad_val", "mask_pad_val", "fit_output", "interpolation"
class PiecewiseAffine (scale=(0.03, 0.05), nb_rows=4, nb_cols=4, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode='constant', absolute_scale=False, always_apply=None, keypoints_threshold=0.01, p=0.5) [view source on GitHub]

Apply affine transformations that differ between local neighborhoods. This augmentation places a regular grid of points on an image and randomly moves the neighborhood of these points around via affine transformations. This leads to local distortions.

This is mostly a wrapper around scikit-image's PiecewiseAffine. See also Affine for a similar technique.

Note

This augmenter is very slow. Try to use ElasticTransformation instead, which is at least 10x faster.

Note

For coordinate-based inputs (keypoints, bounding boxes, polygons, ...), this augmenter still has to perform an image-based augmentation, which makes it significantly slower and less accurate for such inputs than other transforms.

Parameters:

Name Type Description
scale float, tuple of float

Each point on the regular grid is moved around via a normal distribution. This scale factor is equivalent to the normal distribution's sigma. Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of the image if absolute_scale=False (default), so this scale can be the same for different sized images. Recommended values are in the range 0.01 to 0.05 (weak to strong augmentations). * If a single float, then that value will always be used as the scale. * If a tuple (a, b) of float s, then a random value will be uniformly sampled per image from the interval [a, b].

nb_rows int, tuple of int

Number of rows of points that the regular grid should have. Must be at least 2. For large images, you might want to pick a higher value than 4. You might have to then adjust scale to lower values. * If a single int, then that value will always be used as the number of rows. * If a tuple (a, b), then a value from the discrete interval [a..b] will be uniformly sampled per image.

nb_cols int, tuple of int

Number of columns. Analogous to nb_rows.

interpolation int

The order of interpolation. The order has to be in the range 0-5: - 0: Nearest-neighbor - 1: Bi-linear (default) - 2: Bi-quadratic - 3: Bi-cubic - 4: Bi-quartic - 5: Bi-quintic

mask_interpolation int

same as interpolation but for mask.

cval number

The constant value to use when filling in newly created pixels.

cval_mask number

Same as cval but only for masks.

mode str

One of {'constant', 'edge', 'symmetric', 'reflect', 'wrap'}. Points outside the boundaries of the input are filled according to the given mode. Modes match the behaviour of numpy.pad.

absolute_scale bool

Take scale as an absolute value rather than a relative value.

keypoints_threshold float

Used as threshold in the conversion from distance maps to keypoints. The search for keypoints works by searching for the argmin (non-inverted) or argmax (inverted) in each channel. This parameter contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit as a keypoint. Use None to apply no min/max threshold. Default: 0.01

Targets

image, mask, keypoints, bboxes

Image types: uint8, float32
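
A minimal usage sketch using the recommended weak-to-strong scale range (all values are illustrative):

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (256, 256), dtype=np.uint8)

# A 4x4 control grid with scale in the 0.01-0.05 range gives mild local distortions;
# the mask is warped with the separate mask_interpolation setting.
transform = A.Compose([A.PiecewiseAffine(scale=(0.03, 0.05), nb_rows=4, nb_cols=4, p=1.0)])
out = transform(image=image, mask=mask)
distorted_image, distorted_mask = out["image"], out["mask"]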

Source code in albumentations/augmentations/geometric/transforms.py
Python
class PiecewiseAffine(DualTransform):
    """Apply affine transformations that differ between local neighborhoods.
    This augmentation places a regular grid of points on an image and randomly moves the neighborhood of these point
    around via affine transformations. This leads to local distortions.

    This is mostly a wrapper around scikit-image's ``PiecewiseAffine``.
    See also ``Affine`` for a similar technique.

    Note:
        This augmenter is very slow. Try to use ``ElasticTransformation`` instead, which is at least 10x faster.

    Note:
        For coordinate-based inputs (keypoints, bounding boxes, polygons, ...),
        this augmenter still has to perform an image-based augmentation,
        which will make it significantly slower and not fully correct for such inputs than other transforms.

    Args:
        scale (float, tuple of float): Each point on the regular grid is moved around via a normal distribution.
            This scale factor is equivalent to the normal distribution's sigma.
            Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of
            the image if ``absolute_scale=False`` (default), so this scale can be the same for different sized images.
            Recommended values are in the range ``0.01`` to ``0.05`` (weak to strong augmentations).
                * If a single ``float``, then that value will always be used as the scale.
                * If a tuple ``(a, b)`` of ``float`` s, then a random value will
                  be uniformly sampled per image from the interval ``[a, b]``.
        nb_rows (int, tuple of int): Number of rows of points that the regular grid should have.
            Must be at least ``2``. For large images, you might want to pick a higher value than ``4``.
            You might have to then adjust scale to lower values.
                * If a single ``int``, then that value will always be used as the number of rows.
                * If a tuple ``(a, b)``, then a value from the discrete interval
                  ``[a..b]`` will be uniformly sampled per image.
        nb_cols (int, tuple of int): Number of columns. Analogous to `nb_rows`.
        interpolation (int): The order of interpolation. The order has to be in the range 0-5:
             - 0: Nearest-neighbor
             - 1: Bi-linear (default)
             - 2: Bi-quadratic
             - 3: Bi-cubic
             - 4: Bi-quartic
             - 5: Bi-quintic
        mask_interpolation (int): same as interpolation but for mask.
        cval (number): The constant value to use when filling in newly created pixels.
        cval_mask (number): Same as cval but only for masks.
        mode (str): {'constant', 'edge', 'symmetric', 'reflect', 'wrap'}, optional
            Points outside the boundaries of the input are filled according
            to the given mode.  Modes match the behaviour of `numpy.pad`.
        absolute_scale (bool): Take `scale` as an absolute value rather than a relative value.
        keypoints_threshold (float): Used as threshold in conversion from distance maps to keypoints.
            The search for keypoints works by searching for the
            argmin (non-inverted) or argmax (inverted) in each channel. This
            parameters contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit
            as a keypoint. Use ``None`` to use no min/max. Default: 0.01

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    class InitSchema(BaseTransformInitSchema):
        scale: NonNegativeFloatRangeType
        nb_rows: ScaleIntType
        nb_cols: ScaleIntType
        interpolation: InterpolationType
        mask_interpolation: InterpolationType
        cval: int
        cval_mask: int
        mode: Literal["constant", "edge", "symmetric", "reflect", "wrap"] = "constant"
        absolute_scale: bool
        keypoints_threshold: float

        @field_validator("nb_rows", "nb_cols")
        @classmethod
        def process_range(cls, value: ScaleFloatType, info: ValidationInfo) -> tuple[float, float]:
            bounds = 2, BIG_INTEGER
            result = to_tuple(value, value)
            check_range(result, *bounds, info.field_name)
            return result

    def __init__(
        self,
        scale: ScaleFloatType = (0.03, 0.05),
        nb_rows: ScaleIntType = 4,
        nb_cols: ScaleIntType = 4,
        interpolation: int = cv2.INTER_LINEAR,
        mask_interpolation: int = cv2.INTER_NEAREST,
        cval: int = 0,
        cval_mask: int = 0,
        mode: Literal["constant", "edge", "symmetric", "reflect", "wrap"] = "constant",
        absolute_scale: bool = False,
        always_apply: bool | None = None,
        keypoints_threshold: float = 0.01,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)

        warn(
            "This augmenter is very slow. Try to use ``ElasticTransformation`` instead, which is at least 10x faster.",
            stacklevel=2,
        )

        self.scale = cast(Tuple[float, float], scale)
        self.nb_rows = cast(Tuple[int, int], nb_rows)
        self.nb_cols = cast(Tuple[int, int], nb_cols)
        self.interpolation = interpolation
        self.mask_interpolation = mask_interpolation
        self.cval = cval
        self.cval_mask = cval_mask
        self.mode = mode
        self.absolute_scale = absolute_scale
        self.keypoints_threshold = keypoints_threshold

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "scale",
            "nb_rows",
            "nb_cols",
            "interpolation",
            "mask_interpolation",
            "cval",
            "cval_mask",
            "mode",
            "absolute_scale",
            "keypoints_threshold",
        )

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        nb_rows = np.clip(random.randint(*self.nb_rows), 2, None)
        nb_cols = np.clip(random.randint(*self.nb_cols), 2, None)
        nb_cells = nb_cols * nb_rows
        scale = random.uniform(*self.scale)

        jitter: np.ndarray = random_utils.normal(0, scale, (nb_cells, 2))
        if not np.any(jitter > 0):
            for _ in range(10):  # See: https://github.com/albumentations-team/albumentations/issues/1442
                jitter = random_utils.normal(0, scale, (nb_cells, 2))
                if np.any(jitter > 0):
                    break
            if not np.any(jitter > 0):
                return {"matrix": None}

        y = np.linspace(0, height, nb_rows)
        x = np.linspace(0, width, nb_cols)

        # (H, W) and (H, W) for H=rows, W=cols
        xx_src, yy_src = np.meshgrid(x, y)

        # (1, HW, 2) => (HW, 2) for H=rows, W=cols
        points_src = np.dstack([yy_src.flat, xx_src.flat])[0]

        if self.absolute_scale:
            jitter[:, 0] = jitter[:, 0] / height if height > 0 else 0.0
            jitter[:, 1] = jitter[:, 1] / width if width > 0 else 0.0

        jitter[:, 0] = jitter[:, 0] * height
        jitter[:, 1] = jitter[:, 1] * width

        points_dest = np.copy(points_src)
        points_dest[:, 0] = points_dest[:, 0] + jitter[:, 0]
        points_dest[:, 1] = points_dest[:, 1] + jitter[:, 1]

        # Restrict all destination points to be inside the image plane.
        # This is necessary, as otherwise keypoints could be augmented
        # outside of the image plane and these would be replaced by
        # (-1, -1), which would not conform with the behaviour of the other augmenters.
        points_dest[:, 0] = np.clip(points_dest[:, 0], 0, height - 1)
        points_dest[:, 1] = np.clip(points_dest[:, 1], 0, width - 1)

        matrix = skimage.transform.PiecewiseAffineTransform()
        matrix.estimate(points_src[:, ::-1], points_dest[:, ::-1])

        return {
            "matrix": matrix,
        }

    def apply(
        self,
        img: np.ndarray,
        matrix: skimage.transform.PiecewiseAffineTransform,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.piecewise_affine(img, matrix, self.interpolation, self.mode, self.cval)

    def apply_to_mask(
        self,
        mask: np.ndarray,
        matrix: skimage.transform.PiecewiseAffineTransform,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.piecewise_affine(mask, matrix, self.mask_interpolation, self.mode, self.cval_mask)

    def apply_to_bboxes(
        self,
        bboxes: np.ndarray,
        matrix: skimage.transform.PiecewiseAffineTransform,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.bboxes_piecewise_affine(bboxes, matrix, params["shape"], self.keypoints_threshold)

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        matrix: skimage.transform.PiecewiseAffineTransform,
        **params: Any,
    ) -> np.ndarray:
        return fgeometric.keypoints_piecewise_affine(keypoints, matrix, params["shape"], self.keypoints_threshold)
class InitSchema [view source on GitHub]

Source code in albumentations/augmentations/geometric/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    scale: NonNegativeFloatRangeType
    nb_rows: ScaleIntType
    nb_cols: ScaleIntType
    interpolation: InterpolationType
    mask_interpolation: InterpolationType
    cval: int
    cval_mask: int
    mode: Literal["constant", "edge", "symmetric", "reflect", "wrap"] = "constant"
    absolute_scale: bool
    keypoints_threshold: float

    @field_validator("nb_rows", "nb_cols")
    @classmethod
    def process_range(cls, value: ScaleFloatType, info: ValidationInfo) -> tuple[float, float]:
        bounds = 2, BIG_INTEGER
        result = to_tuple(value, value)
        check_range(result, *bounds, info.field_name)
        return result

apply (self, img, matrix, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    matrix: skimage.transform.PiecewiseAffineTransform,
    **params: Any,
) -> np.ndarray:
    return fgeometric.piecewise_affine(img, matrix, self.interpolation, self.mode, self.cval)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    nb_rows = np.clip(random.randint(*self.nb_rows), 2, None)
    nb_cols = np.clip(random.randint(*self.nb_cols), 2, None)
    nb_cells = nb_cols * nb_rows
    scale = random.uniform(*self.scale)

    jitter: np.ndarray = random_utils.normal(0, scale, (nb_cells, 2))
    if not np.any(jitter > 0):
        for _ in range(10):  # See: https://github.com/albumentations-team/albumentations/issues/1442
            jitter = random_utils.normal(0, scale, (nb_cells, 2))
            if np.any(jitter > 0):
                break
        if not np.any(jitter > 0):
            return {"matrix": None}

    y = np.linspace(0, height, nb_rows)
    x = np.linspace(0, width, nb_cols)

    # (H, W) and (H, W) for H=rows, W=cols
    xx_src, yy_src = np.meshgrid(x, y)

    # (1, HW, 2) => (HW, 2) for H=rows, W=cols
    points_src = np.dstack([yy_src.flat, xx_src.flat])[0]

    if self.absolute_scale:
        jitter[:, 0] = jitter[:, 0] / height if height > 0 else 0.0
        jitter[:, 1] = jitter[:, 1] / width if width > 0 else 0.0

    jitter[:, 0] = jitter[:, 0] * height
    jitter[:, 1] = jitter[:, 1] * width

    points_dest = np.copy(points_src)
    points_dest[:, 0] = points_dest[:, 0] + jitter[:, 0]
    points_dest[:, 1] = points_dest[:, 1] + jitter[:, 1]

    # Restrict all destination points to be inside the image plane.
    # This is necessary, as otherwise keypoints could be augmented
    # outside of the image plane and these would be replaced by
    # (-1, -1), which would not conform with the behaviour of the other augmenters.
    points_dest[:, 0] = np.clip(points_dest[:, 0], 0, height - 1)
    points_dest[:, 1] = np.clip(points_dest[:, 1], 0, width - 1)

    matrix = skimage.transform.PiecewiseAffineTransform()
    matrix.estimate(points_src[:, ::-1], points_dest[:, ::-1])

    return {
        "matrix": matrix,
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "scale",
        "nb_rows",
        "nb_cols",
        "interpolation",
        "mask_interpolation",
        "cval",
        "cval_mask",
        "mode",
        "absolute_scale",
        "keypoints_threshold",
    )
class ShiftScaleRotate (shift_limit=(-0.0625, 0.0625), scale_limit=(-0.1, 0.1), rotate_limit=(-45, 45), interpolation=1, border_mode=4, value=0, mask_value=0, shift_limit_x=None, shift_limit_y=None, rotate_method='largest_box', always_apply=None, p=0.5) [view source on GitHub]

Randomly apply affine transforms: translate, scale and rotate the input.

Parameters:

Name Type Description
shift_limit (float, float) or float

shift factor range for both height and width. If shift_limit is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and upper bounds should lie in range [-1, 1]. Default: (-0.0625, 0.0625).

scale_limit (float, float) or float

scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1. If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high). Default: (-0.1, 0.1).

rotate_limit (int, int) or int

rotation range. If rotate_limit is a single int value, the range will be (-rotate_limit, rotate_limit). Default: (-45, 45).

interpolation OpenCV flag

flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

border_mode OpenCV flag

flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101

value int, float, list of int, list of float

padding value if border_mode is cv2.BORDER_CONSTANT.

mask_value int, float, list of int, list of float

padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

shift_limit_x (float, float) or float

shift factor range for width. If it is set then this value instead of shift_limit will be used for shifting width. If shift_limit_x is a single float value, the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in the range [-1, 1]. Default: None.

shift_limit_y (float, float) or float

shift factor range for height. If it is set then this value instead of shift_limit will be used for shifting height. If shift_limit_y is a single float value, the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie in the range [-1, 1]. Default: None.

rotate_method str

rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse". Default: "largest_box"

p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, keypoints, bboxes

Image types: uint8, float32
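
As the source below shows, ShiftScaleRotate is a thin wrapper over Affine and emits a DeprecationWarning. A hedged sketch of the legacy call and a roughly equivalent Affine configuration (values are illustrative):

Python
import albumentations as A
import cv2
import numpy as np

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

# Legacy call (still functional, but deprecated in favor of Affine).
legacy = A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1, rotate_limit=45,
                            border_mode=cv2.BORDER_REFLECT_101, p=1.0)

# Roughly equivalent Affine configuration: scale_limit (-0.1, 0.1) maps to a
# multiplicative scale range of (0.9, 1.1), as noted in the parameter description.
affine = A.Affine(translate_percent={"x": (-0.0625, 0.0625), "y": (-0.0625, 0.0625)},
                  scale=(0.9, 1.1), rotate=(-45, 45), p=1.0)

augmented = A.Compose([affine])(image=image)["image"]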

Source code in albumentations/augmentations/geometric/transforms.py
Python
class ShiftScaleRotate(Affine):
    """Randomly apply affine transforms: translate, scale and rotate the input.

    Args:
        shift_limit ((float, float) or float): shift factor range for both height and width. If shift_limit
            is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and
            upper bounds should lie in range [-1, 1]. Default: (-0.0625, 0.0625).
        scale_limit ((float, float) or float): scaling factor range. If scale_limit is a single float value, the
            range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1.
            If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high).
            Default: (-0.1, 0.1).
        rotate_limit ((int, int) or int): rotation range. If rotate_limit is a single int value, the
            range will be (-rotate_limit, rotate_limit). Default: (-45, 45).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of int,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
        shift_limit_x ((float, float) or float): shift factor range for width. If it is set then this value
            instead of shift_limit will be used for shifting width.  If shift_limit_x is a single float value,
            the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in
            the range [-1, 1]. Default: None.
        shift_limit_y ((float, float) or float): shift factor range for height. If it is set then this value
            instead of shift_limit will be used for shifting height.  If shift_limit_y is a single float value,
            the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie
            in the range [-1, 1]. Default: None.
        rotate_method (str): rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse".
            Default: "largest_box"
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)

    class InitSchema(BaseTransformInitSchema):
        shift_limit: SymmetricRangeType = (-0.0625, 0.0625)
        scale_limit: SymmetricRangeType = (-0.1, 0.1)
        rotate_limit: SymmetricRangeType = (-45, 45)
        interpolation: InterpolationType = cv2.INTER_LINEAR
        border_mode: BorderModeType = cv2.BORDER_REFLECT_101
        value: ColorType = 0
        mask_value: ColorType = 0
        shift_limit_x: ScaleFloatType | None = Field(default=None)
        shift_limit_y: ScaleFloatType | None = Field(default=None)
        rotate_method: Literal["largest_box", "ellipse"] = "largest_box"

        @model_validator(mode="after")
        def check_shift_limit(self) -> Self:
            bounds = -1, 1
            self.shift_limit_x = to_tuple(self.shift_limit_x if self.shift_limit_x is not None else self.shift_limit)
            check_range(self.shift_limit_x, *bounds, "shift_limit_x")
            self.shift_limit_y = to_tuple(self.shift_limit_y if self.shift_limit_y is not None else self.shift_limit)
            check_range(self.shift_limit_y, *bounds, "shift_limit_y")
            return self

        @field_validator("scale_limit")
        @classmethod
        def check_scale_limit(cls, value: ScaleFloatType, info: ValidationInfo) -> ScaleFloatType:
            bounds = 0, float("inf")
            result = to_tuple(value, bias=1.0)
            check_range(result, *bounds, str(info.field_name))
            return result

    def __init__(
        self,
        shift_limit: ScaleFloatType = (-0.0625, 0.0625),
        scale_limit: ScaleFloatType = (-0.1, 0.1),
        rotate_limit: ScaleFloatType = (-45, 45),
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: ColorType = 0,
        mask_value: ColorType = 0,
        shift_limit_x: ScaleFloatType | None = None,
        shift_limit_y: ScaleFloatType | None = None,
        rotate_method: Literal["largest_box", "ellipse"] = "largest_box",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(
            scale=scale_limit,
            translate_percent={"x": shift_limit_x, "y": shift_limit_y},
            rotate=rotate_limit,
            shear=(0, 0),
            interpolation=interpolation,
            mask_interpolation=cv2.INTER_NEAREST,
            cval=value,
            cval_mask=mask_value,
            mode=border_mode,
            fit_output=False,
            keep_ratio=False,
            rotate_method=rotate_method,
            always_apply=always_apply,
            p=p,
        )
        warn(
            "ShiftScaleRotate is deprecated. Please use Affine transform instead .",
            DeprecationWarning,
            stacklevel=2,
        )
        self.shift_limit_x = cast(Tuple[float, float], shift_limit_x)
        self.shift_limit_y = cast(Tuple[float, float], shift_limit_y)
        self.scale_limit = cast(Tuple[float, float], scale_limit)
        self.rotate_limit = cast(Tuple[int, int], rotate_limit)
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def get_transform_init_args(self) -> dict[str, Any]:
        return {
            "shift_limit_x": self.shift_limit_x,
            "shift_limit_y": self.shift_limit_y,
            "scale_limit": to_tuple(self.scale_limit, bias=-1.0),
            "rotate_limit": self.rotate_limit,
            "interpolation": self.interpolation,
            "border_mode": self.border_mode,
            "value": self.value,
            "mask_value": self.mask_value,
            "rotate_method": self.rotate_method,
        }
class InitSchema [view source on GitHub]

Source code in albumentations/augmentations/geometric/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    shift_limit: SymmetricRangeType = (-0.0625, 0.0625)
    scale_limit: SymmetricRangeType = (-0.1, 0.1)
    rotate_limit: SymmetricRangeType = (-45, 45)
    interpolation: InterpolationType = cv2.INTER_LINEAR
    border_mode: BorderModeType = cv2.BORDER_REFLECT_101
    value: ColorType = 0
    mask_value: ColorType = 0
    shift_limit_x: ScaleFloatType | None = Field(default=None)
    shift_limit_y: ScaleFloatType | None = Field(default=None)
    rotate_method: Literal["largest_box", "ellipse"] = "largest_box"

    @model_validator(mode="after")
    def check_shift_limit(self) -> Self:
        bounds = -1, 1
        self.shift_limit_x = to_tuple(self.shift_limit_x if self.shift_limit_x is not None else self.shift_limit)
        check_range(self.shift_limit_x, *bounds, "shift_limit_x")
        self.shift_limit_y = to_tuple(self.shift_limit_y if self.shift_limit_y is not None else self.shift_limit)
        check_range(self.shift_limit_y, *bounds, "shift_limit_y")
        return self

    @field_validator("scale_limit")
    @classmethod
    def check_scale_limit(cls, value: ScaleFloatType, info: ValidationInfo) -> ScaleFloatType:
        bounds = 0, float("inf")
        result = to_tuple(value, bias=1.0)
        check_range(result, *bounds, str(info.field_name))
        return result

class Transpose [view source on GitHub]

Transpose the input by swapping rows and columns.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
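
A short sketch of the effect on the image shape; bounding boxes and keypoints have their coordinates swapped accordingly:

Python
import albumentations as A
import numpy as np

image = np.zeros((100, 200, 3), dtype=np.uint8)  # height=100, width=200
transposed = A.Compose([A.Transpose(p=1.0)])(image=image)["image"]
print(transposed.shape)  # (200, 100, 3): rows and columns are swapped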

Source code in albumentations/augmentations/geometric/transforms.py
Python
class Transpose(DualTransform):
    """Transpose the input by swapping rows and columns.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.transpose(img)

    def apply_to_bboxes(self, bboxes: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.bboxes_transpose(bboxes)

    def apply_to_keypoints(self, keypoints: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.keypoints_transpose(keypoints)

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()
apply (self, img, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return fgeometric.transpose(img)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[()]:
    return ()
class VerticalFlip [view source on GitHub]

Flip the input vertically around the x-axis.

Parameters:

Name Type Description
p float

probability of applying the transform. Default: 0.5.

Targets

image, mask, bboxes, keypoints

Image types: uint8, float32
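
A minimal sketch of the flip around the x-axis (the top row becomes the bottom row):

Python
import albumentations as A
import numpy as np

image = np.arange(12, dtype=np.uint8).reshape(3, 4)  # single-channel 3x4 image
flipped = A.Compose([A.VerticalFlip(p=1.0)])(image=image)["image"]
assert (flipped == image[::-1]).all()  # equivalent to reversing the row order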

Source code in albumentations/augmentations/geometric/transforms.py
Python
class VerticalFlip(DualTransform):
    """Flip the input vertically around the x-axis.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return vflip(img)

    def apply_to_bboxes(self, bboxes: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.bboxes_vflip(bboxes)

    def apply_to_keypoints(self, keypoints: np.ndarray, **params: Any) -> np.ndarray:
        return fgeometric.keypoints_vflip(keypoints, params["rows"])

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()
apply (self, img, **params)

Apply transform on image.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return vflip(img)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/geometric/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[()]:
    return ()

mixing special

transforms

class MixUp (reference_data=None, read_fn=<lambda>, alpha=0.4, mix_coef_return_name='mix_coef', always_apply=None, p=0.5) [view source on GitHub]

Performs MixUp data augmentation, blending images, masks, and class labels with reference data.

MixUp augmentation linearly combines an input (image, mask, and class label) with another set from a predefined reference dataset. The mixing degree is controlled by a parameter λ (lambda), sampled from a Beta distribution. This method is known for improving model generalization by promoting linear behavior between classes and smoothing decision boundaries.
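
The blend itself can be sketched in a few lines of NumPy; this is only an illustration of the λ ~ Beta(alpha, alpha) mixing described above, not the transform's internal code:

Python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.4
lam = rng.beta(alpha, alpha)  # mixing coefficient λ

image = rng.random((100, 100, 3), dtype=np.float32)      # input image, float32 in [0, 1]
reference = rng.random((100, 100, 3), dtype=np.float32)  # reference sample from reference_data

mixed = lam * image + (1 - lam) * reference  # the same blend is applied to masks and global labels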

Reference

  • Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2018). mixup: Beyond Empirical Risk Minimization. In International Conference on Learning Representations. https://arxiv.org/abs/1710.09412

Parameters:

Name Type Description
reference_data Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]]

A sequence or generator of dictionaries containing the reference data for mixing. If None or an empty sequence is provided, no operation is performed and a warning is issued.

read_fn Callable[[ReferenceImage], dict[str, Any]]

A function to process items from reference_data. It should accept items from reference_data and return a dictionary containing the processed data:

  • The returned dictionary must include an 'image' key with a numpy array value.
  • It may also include 'mask' and 'global_label', each associated with numpy array values.

Defaults to a function that assumes the input dictionary already contains numpy arrays and returns it unchanged.

mix_coef_return_name str

Name used for the applied alpha coefficient in the returned dictionary. Defaults to "mix_coef".

alpha float

The alpha parameter for the Beta distribution, influencing the mix's balance. Must be ≥ 0. Higher values lead to more uniform mixing. Defaults to 0.4.

p float

The probability of applying the transformation. Defaults to 0.5.

Targets

image, mask, global_label

Image types: uint8, float32

Exceptions:

Type Description
- ValueError

If the alpha parameter is negative.

- NotImplementedError

If the transform is applied to bounding boxes or keypoints.

Notes

  • If no reference data is provided, a warning is issued, and the transform acts as a no-op.
  • If images are in float32 format, they should be within the [0, 1] range.

Example Usage:

import albumentations as A
import numpy as np
from albumentations.core.types import ReferenceImage

# Prepare reference data
# Note: This code generates random reference data for demonstration purposes only.
# In real-world applications, it's crucial to use meaningful and representative data.
# The quality and relevance of your input data significantly impact the effectiveness
# of the augmentation process. Ensure your data closely aligns with your specific
# use case and application requirements.
reference_data = [ReferenceImage(image=np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8),
                                 mask=np.random.randint(0, 4, (100, 100, 1), dtype=np.uint8),
                                 global_label=np.random.choice([0, 1], size=3)) for i in range(10)]

# In this example, the lambda function simply returns its input, which works well for
# data already in the expected format. For more complex scenarios, where the data might not be in
# the required format or additional processing is needed, a more sophisticated function can be implemented.
# Below is a hypothetical example where the input data is a file path, and the function reads the image
# file, converts it to a specific format, and possibly performs other preprocessing steps.

# Example of a more complex read_fn that reads an image from a file path, converts it to RGB, and resizes it.
# def custom_read_fn(file_path):
#     from PIL import Image
#     image = Image.open(file_path).convert('RGB')
#     image = image.resize((100, 100))  # Example resize, adjust as needed.
#     return np.array(image)

aug = A.Compose([A.RandomRotate90(), A.MixUp(p=1, reference_data=reference_data, read_fn=lambda x: x)])

# For simplicity, the original lambda function is used in this example.
# Replace `lambda x: x` with `custom_read_fn` if you need to process the data more extensively.

# Apply augmentations
image = np.empty([100, 100, 3], dtype=np.uint8)
mask = np.empty([100, 100], dtype=np.uint8)
global_label = np.array([0, 1, 0])
data = aug(image=image, global_label=global_label, mask=mask)
transformed_image = data["image"]
transformed_mask = data["mask"]
transformed_global_label = data["global_label"]

# Print applied mix coefficient
print(data["mix_coef"])  # Output: e.g., 0.9991580344142427

Source code in albumentations/augmentations/mixing/transforms.py
Python
class MixUp(ReferenceBasedTransform):
    """Performs MixUp data augmentation, blending images, masks, and class labels with reference data.

    MixUp augmentation linearly combines an input (image, mask, and class label) with another set from a predefined
    reference dataset. The mixing degree is controlled by a parameter λ (lambda), sampled from a Beta distribution.
    This method is known for improving model generalization by promoting linear behavior between classes and
    smoothing decision boundaries.

    Reference:
        - Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2018). mixup: Beyond Empirical Risk Minimization.
        In International Conference on Learning Representations. https://arxiv.org/abs/1710.09412

    Args:
        reference_data (Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]]):
            A sequence or generator of dictionaries containing the reference data for mixing
            If None or an empty sequence is provided, no operation is performed and a warning is issued.
        read_fn (Callable[[ReferenceImage], dict[str, Any]]):
            A function to process items from reference_data. It should accept items from reference_data
            and return a dictionary containing processed data:
                - The returned dictionary must include an 'image' key with a numpy array value.
                - It may also include 'mask', 'global_label' each associated with numpy array values.
            Defaults to a function that assumes input dictionary contains numpy arrays and directly returns it.
        mix_coef_return_name (str): Name used for the applied alpha coefficient in the returned dictionary.
            Defaults to "mix_coef".
        alpha (float):
            The alpha parameter for the Beta distribution, influencing the mix's balance. Must be ≥ 0.
            Higher values lead to more uniform mixing. Defaults to 0.4.
        p (float):
            The probability of applying the transformation. Defaults to 0.5.

    Targets:
        image, mask, global_label

    Image types:
        - uint8, float32

    Raises:
        - ValueError: If the alpha parameter is negative.
        - NotImplementedError: If the transform is applied to bounding boxes or keypoints.

    Notes:
        - If no reference data is provided, a warning is issued, and the transform acts as a no-op.
        - Notes if images are in float32 format, they should be within [0, 1] range.

    Example Usage:
        import albumentations as A
        import numpy as np
        from albumentations.core.types import ReferenceImage

        # Prepare reference data
        # Note: This code generates random reference data for demonstration purposes only.
        # In real-world applications, it's crucial to use meaningful and representative data.
        # The quality and relevance of your input data significantly impact the effectiveness
        # of the augmentation process. Ensure your data closely aligns with your specific
        # use case and application requirements.
        reference_data = [ReferenceImage(image=np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8),
                                         mask=np.random.randint(0, 4, (100, 100, 1), dtype=np.uint8),
                                         global_label=np.random.choice([0, 1], size=3)) for i in range(10)]

        # In this example, the lambda function simply returns its input, which works well for
        # data already in the expected format. For more complex scenarios, where the data might not be in
        # the required format or additional processing is needed, a more sophisticated function can be implemented.
        # Below is a hypothetical example where the input data is a file path, # and the function reads the image
        # file, converts it to a specific format, and possibly performs other preprocessing steps.

        # Example of a more complex read_fn that reads an image from a file path, converts it to RGB, and resizes it.
        # def custom_read_fn(file_path):
        #     from PIL import Image
        #     image = Image.open(file_path).convert('RGB')
        #     image = image.resize((100, 100))  # Example resize, adjust as needed.
        #     return np.array(image)

        # aug = A.Compose([A.RandomRotate90(), A.MixUp(p=1, reference_data=reference_data, read_fn=lambda x: x)])

        # For simplicity, the original lambda function is used in this example.
        # Replace `lambda x: x` with `custom_read_fn`if you need to process the data more extensively.

        # Apply augmentations
        image = np.empty([100, 100, 3], dtype=np.uint8)
        mask = np.empty([100, 100], dtype=np.uint8)
        global_label = np.array([0, 1, 0])
        data = aug(image=image, global_label=global_label, mask=mask)
        transformed_image = data["image"]
        transformed_mask = data["mask"]
        transformed_global_label = data["global_label"]

        # Print applied mix coefficient
        print(data["mix_coef"])  # Output: e.g., 0.9991580344142427
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.GLOBAL_LABEL)

    class InitSchema(BaseTransformInitSchema):
        reference_data: Generator[Any, None, None] | Sequence[Any] | None = None
        read_fn: Callable[[ReferenceImage], Any]
        alpha: Annotated[float, Field(default=0.4, ge=0, le=1)]
        mix_coef_return_name: str = "mix_coef"

    def __init__(
        self,
        reference_data: Generator[Any, None, None] | Sequence[Any] | None = None,
        read_fn: Callable[[ReferenceImage], Any] = lambda x: {"image": x, "mask": None, "class_label": None},
        alpha: float = 0.4,
        mix_coef_return_name: str = "mix_coef",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.mix_coef_return_name = mix_coef_return_name

        self.read_fn = read_fn
        self.alpha = alpha

        if reference_data is None:
            warn("No reference data provided for MixUp. This transform will act as a no-op.", stacklevel=2)
            # Create an empty generator
            self.reference_data: list[Any] = []
        elif (
            isinstance(reference_data, types.GeneratorType)
            or isinstance(reference_data, Iterable)
            and not isinstance(reference_data, str)
        ):
            self.reference_data = reference_data  # type: ignore[assignment]
        else:
            msg = "reference_data must be a list, tuple, generator, or None."
            raise TypeError(msg)

    def apply(self, img: np.ndarray, mix_data: ReferenceImage, mix_coef: float, **params: Any) -> np.ndarray:
        if not mix_data:
            return img

        mix_img = mix_data["image"]

        if img.shape != mix_img.shape and not is_grayscale_image(img):
            msg = "The shape of the reference image should be the same as the input image."
            raise ValueError(msg)

        return add_weighted(img, mix_coef, mix_img.reshape(img.shape), 1 - mix_coef) if mix_img is not None else img

    def apply_to_mask(self, mask: np.ndarray, mix_data: ReferenceImage, mix_coef: float, **params: Any) -> np.ndarray:
        mix_mask = mix_data.get("mask")
        return (
            add_weighted(mask, mix_coef, mix_mask.reshape(mask.shape), 1 - mix_coef) if mix_mask is not None else mask
        )

    def apply_to_global_label(
        self,
        label: np.ndarray,
        mix_data: ReferenceImage,
        mix_coef: float,
        **params: Any,
    ) -> np.ndarray:
        mix_label = mix_data.get("global_label")
        if mix_label is not None and label is not None:
            return mix_coef * label + (1 - mix_coef) * mix_label
        return label

    def apply_to_bboxes(self, bboxes: np.ndarray, mix_data: ReferenceImage, **params: Any) -> np.ndarray:
        msg = "MixUp does not support bounding boxes yet, feel free to submit pull request to https://github.com/albumentations-team/albumentations/."
        raise NotImplementedError(msg)

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        *args: Any,
        **params: Any,
    ) -> np.ndarray:
        msg = "MixUp does not support keypoints yet, feel free to submit pull request to https://github.com/albumentations-team/albumentations/."
        raise NotImplementedError(msg)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "reference_data", "alpha"

    def get_params(self) -> dict[str, None | float | dict[str, Any]]:
        mix_data = None
        # Check if reference_data is not empty and is a sequence (list, tuple, np.array)
        if isinstance(self.reference_data, Sequence) and not isinstance(self.reference_data, (str, bytes)):
            if len(self.reference_data) > 0:  # Additional check to ensure it's not empty
                mix_idx = random.randint(0, len(self.reference_data) - 1)
                mix_data = self.reference_data[mix_idx]
        # Check if reference_data is an iterator or generator
        elif isinstance(self.reference_data, Iterator):
            try:
                mix_data = next(self.reference_data)  # Attempt to get the next item
            except StopIteration:
                warn(
                    "Reference data iterator/generator has been exhausted. "
                    "Further mixing augmentations will not be applied.",
                    RuntimeWarning,
                    stacklevel=2,
                )
                return {"mix_data": {}, "mix_coef": 1}

        # If mix_data is None or empty after the above checks, return default values
        if mix_data is None:
            return {"mix_data": {}, "mix_coef": 1}

        # If mix_data is not None, calculate mix_coef and apply read_fn
        mix_coef = beta(self.alpha, self.alpha)  # Assuming beta is defined elsewhere
        return {"mix_data": self.read_fn(mix_data), "mix_coef": mix_coef}

    def apply_with_params(self, params: dict[str, Any], *args: Any, **kwargs: Any) -> dict[str, Any]:
        res = super().apply_with_params(params, *args, **kwargs)
        if self.mix_coef_return_name:
            res[self.mix_coef_return_name] = params["mix_coef"]
            res["mix_data"] = params["mix_data"]
        return res
class InitSchema

Source code in albumentations/augmentations/mixing/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    reference_data: Generator[Any, None, None] | Sequence[Any] | None = None
    read_fn: Callable[[ReferenceImage], Any]
    alpha: Annotated[float, Field(default=0.4, ge=0, le=1)]
    mix_coef_return_name: str = "mix_coef"

apply (self, img, mix_data, mix_coef, **params)

Apply transform on image.

Source code in albumentations/augmentations/mixing/transforms.py
Python
def apply(self, img: np.ndarray, mix_data: ReferenceImage, mix_coef: float, **params: Any) -> np.ndarray:
    if not mix_data:
        return img

    mix_img = mix_data["image"]

    if img.shape != mix_img.shape and not is_grayscale_image(img):
        msg = "The shape of the reference image should be the same as the input image."
        raise ValueError(msg)

    return add_weighted(img, mix_coef, mix_img.reshape(img.shape), 1 - mix_coef) if mix_img is not None else img
apply_with_params (self, params, *args, **kwargs)

Apply transforms with parameters.

Source code in albumentations/augmentations/mixing/transforms.py
Python
def apply_with_params(self, params: dict[str, Any], *args: Any, **kwargs: Any) -> dict[str, Any]:
    res = super().apply_with_params(params, *args, **kwargs)
    if self.mix_coef_return_name:
        res[self.mix_coef_return_name] = params["mix_coef"]
        res["mix_data"] = params["mix_data"]
    return res
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/mixing/transforms.py
Python
def get_params(self) -> dict[str, None | float | dict[str, Any]]:
    mix_data = None
    # Check if reference_data is not empty and is a sequence (list, tuple, np.array)
    if isinstance(self.reference_data, Sequence) and not isinstance(self.reference_data, (str, bytes)):
        if len(self.reference_data) > 0:  # Additional check to ensure it's not empty
            mix_idx = random.randint(0, len(self.reference_data) - 1)
            mix_data = self.reference_data[mix_idx]
    # Check if reference_data is an iterator or generator
    elif isinstance(self.reference_data, Iterator):
        try:
            mix_data = next(self.reference_data)  # Attempt to get the next item
        except StopIteration:
            warn(
                "Reference data iterator/generator has been exhausted. "
                "Further mixing augmentations will not be applied.",
                RuntimeWarning,
                stacklevel=2,
            )
            return {"mix_data": {}, "mix_coef": 1}

    # If mix_data is None or empty after the above checks, return default values
    if mix_data is None:
        return {"mix_data": {}, "mix_coef": 1}

    # If mix_data is not None, calculate mix_coef and apply read_fn
    mix_coef = beta(self.alpha, self.alpha)  # Assuming beta is defined elsewhere
    return {"mix_data": self.read_fn(mix_data), "mix_coef": mix_coef}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/mixing/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "reference_data", "alpha"
class OverlayElements (metadata_key='overlay_metadata', p=0.5, always_apply=None) [view source on GitHub]

Apply overlay elements such as images and masks onto an input image. This transformation can be used to add various objects (e.g., stickers, logos) to images with optional masks and bounding boxes for better placement control.

Parameters:

Name Type Description
metadata_key str

Additional target key for metadata. Default: overlay_metadata.

p float

Probability of applying the transformation. Default: 0.5.

Possible Metadata Fields:

  • image (np.ndarray): The overlay image to be applied. This is a required field.
  • bbox (list[float]): The bounding box specifying the region where the overlay should be applied. It should contain four floats in Albumentations format, that is normalized Pascal VOC order [x_min / width, y_min / height, x_max / width, y_max / height]. If label_id is provided, it should be appended as the fifth element of the bbox.
  • mask (np.ndarray): An optional mask that defines the non-rectangular region of the overlay image. If not provided, the entire overlay image is used.
  • mask_id (int): An optional identifier for the mask. If provided, the regions specified by the mask will be labeled with this identifier in the output mask.

Targets

image, mask

Image types: uint8, float32
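Below is a minimal usage sketch that is not part of the original reference: it assumes a small RGB sticker passed through the default overlay_metadata key, with the bbox given in normalized Albumentations format and mask_id used to label the overlaid region in the output mask.

Python
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)
sticker = np.random.randint(0, 256, (40, 40, 3), dtype=np.uint8)  # hypothetical overlay

metadata = {
    "image": sticker,              # required overlay image
    "bbox": [0.1, 0.1, 0.3, 0.3],  # normalized [x_min, y_min, x_max, y_max]
    "mask_id": 1,                  # optional: label overlaid pixels in the output mask
}

transform = A.Compose([A.OverlayElements(p=1.0)])
result = transform(image=image, mask=mask, overlay_metadata=metadata)
overlaid_image, overlaid_mask = result["image"], result["mask"]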

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/mixing/transforms.py
Python
class OverlayElements(ReferenceBasedTransform):
    """Apply overlay elements such as images and masks onto an input image. This transformation can be used to add
    various objects (e.g., stickers, logos) to images with optional masks and bounding boxes for better placement
    control.

    Args:
        metadata_key (str): Additional target key for metadata. Default `overlay_metadata`.
        p (float): Probability of applying the transformation. Default: 0.5.

    Possible Metadata Fields:
        - image (np.ndarray): The overlay image to be applied. This is a required field.
        - bbox (list[int]): The bounding box specifying the region where the overlay should be applied. It should
                            contain four floats: [y_min, x_min, y_max, x_max]. If `label_id` is provided, it should
                            be appended as the fifth element in the bbox. BBox should be in Albumentations format,
                            that is the same as normalized Pascal VOC format
                            [x_min / width, y_min / height, x_max / width, y_max / height]
        - mask (np.ndarray): An optional mask that defines the non-rectangular region of the overlay image. If not
                             provided, the entire overlay image is used.
        - mask_id (int): An optional identifier for the mask. If provided, the regions specified by the mask will
                         be labeled with this identifier in the output mask.

    Targets:
        image, mask

    Image types:
        uint8, float32

    Reference:
        https://github.com/danaaubakirova/doc-augmentation

    """

    _targets = (Targets.IMAGE, Targets.MASK)

    class InitSchema(BaseTransformInitSchema):
        metadata_key: str

    def __init__(
        self,
        metadata_key: str = "overlay_metadata",
        p: float = 0.5,
        always_apply: bool | None = None,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.metadata_key = metadata_key

    @property
    def targets_as_params(self) -> list[str]:
        return [self.metadata_key]

    @staticmethod
    def preprocess_metadata(metadata: dict[str, Any], img_shape: tuple[int, int]) -> dict[str, Any]:
        overlay_image = metadata["image"]
        overlay_height, overlay_width = overlay_image.shape[:2]
        image_height, image_width = img_shape[:2]

        if "bbox" in metadata:
            bbox = metadata["bbox"]
            bbox_np = np.array([bbox])
            check_bboxes(bbox_np)
            denormalized_bbox = denormalize_bboxes(bbox_np, img_shape[:2])[0]

            x_min, y_min, x_max, y_max = (int(x) for x in denormalized_bbox[:4])

            if "mask" in metadata:
                mask = metadata["mask"]
                mask = cv2.resize(mask, (x_max - x_min, y_max - y_min), interpolation=cv2.INTER_NEAREST)
            else:
                mask = np.ones((y_max - y_min, x_max - x_min), dtype=np.uint8)

            overlay_image = cv2.resize(overlay_image, (x_max - x_min, y_max - y_min), interpolation=cv2.INTER_AREA)
            offset = (y_min, x_min)

            if len(bbox) == LENGTH_RAW_BBOX and "bbox_id" in metadata:
                bbox = [x_min, y_min, x_max, y_max, metadata["bbox_id"]]
            else:
                bbox = (x_min, y_min, x_max, y_max, *bbox[4:])
        else:
            if image_height < overlay_height or image_width < overlay_width:
                overlay_image = cv2.resize(overlay_image, (image_width, image_height), interpolation=cv2.INTER_AREA)
                overlay_height, overlay_width = overlay_image.shape[:2]

            mask = metadata["mask"] if "mask" in metadata else np.ones_like(overlay_image, dtype=np.uint8)

            max_x_offset = image_width - overlay_width
            max_y_offset = image_height - overlay_height

            offset_x = random.randint(0, max_x_offset)
            offset_y = random.randint(0, max_y_offset)

            offset = (offset_y, offset_x)

            bbox = [
                offset_x,
                offset_y,
                offset_x + overlay_width,
                offset_y + overlay_height,
            ]

            if "bbox_id" in metadata:
                bbox = [*bbox, metadata["bbox_id"]]

        result = {
            "overlay_image": overlay_image,
            "overlay_mask": mask,
            "offset": offset,
            "bbox": bbox,
        }

        if "mask_id" in metadata:
            result["mask_id"] = metadata["mask_id"]

        return result

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        metadata = data[self.metadata_key]
        img_shape = params["shape"]

        if isinstance(metadata, list):
            overlay_data = [self.preprocess_metadata(md, img_shape) for md in metadata]
        else:
            overlay_data = [self.preprocess_metadata(metadata, img_shape)]

        return {
            "overlay_data": overlay_data,
        }

    def apply(
        self,
        img: np.ndarray,
        overlay_data: list[dict[str, Any]],
        **params: Any,
    ) -> np.ndarray:
        for data in overlay_data:
            overlay_image = data["overlay_image"]
            overlay_mask = data["overlay_mask"]
            offset = data["offset"]
            img = fmixing.copy_and_paste_blend(img, overlay_image, overlay_mask, offset=offset)
        return img

    def apply_to_mask(
        self,
        mask: np.ndarray,
        overlay_data: list[dict[str, Any]],
        **params: Any,
    ) -> np.ndarray:
        for data in overlay_data:
            if "mask_id" in data and data["mask_id"] is not None:
                overlay_mask = data["overlay_mask"]
                offset = data["offset"]
                mask_id = data["mask_id"]

                y_min, x_min = offset
                y_max = y_min + overlay_mask.shape[0]
                x_max = x_min + overlay_mask.shape[1]

                mask_section = mask[y_min:y_max, x_min:x_max]
                mask_section[overlay_mask > 0] = mask_id

        return mask

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("metadata_key",)
targets_as_params: list[str] property readonly

Targets whose data is used to compute transform parameters. This list is also used to check that the input contains all required targets.

class InitSchema

Source code in albumentations/augmentations/mixing/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    metadata_key: str
apply (self, img, overlay_data, **params)

Apply transform on image.

Source code in albumentations/augmentations/mixing/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    overlay_data: list[dict[str, Any]],
    **params: Any,
) -> np.ndarray:
    for data in overlay_data:
        overlay_image = data["overlay_image"]
        overlay_mask = data["overlay_mask"]
        offset = data["offset"]
        img = fmixing.copy_and_paste_blend(img, overlay_image, overlay_mask, offset=offset)
    return img
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/mixing/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    metadata = data[self.metadata_key]
    img_shape = params["shape"]

    if isinstance(metadata, list):
        overlay_data = [self.preprocess_metadata(md, img_shape) for md in metadata]
    else:
        overlay_data = [self.preprocess_metadata(metadata, img_shape)]

    return {
        "overlay_data": overlay_data,
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/mixing/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("metadata_key",)

text special

functional

def convert_image_to_pil (image) [view source on GitHub]

Convert a NumPy array image to a PIL image.

Source code in albumentations/augmentations/text/functional.py
Python
def convert_image_to_pil(image: np.ndarray) -> Image:
    """Convert a NumPy array image to a PIL image."""
    try:
        from PIL import Image
    except ImportError:
        raise ImportError("Pillow is not installed") from ImportError

    if len(image.shape) == MONO_CHANNEL_DIMENSIONS:  # (height, width)
        return Image.fromarray(image)
    if len(image.shape) == NUM_MULTI_CHANNEL_DIMENSIONS and image.shape[2] == 1:  # (height, width, 1)
        return Image.fromarray(image[:, :, 0], mode="L")
    if len(image.shape) == NUM_MULTI_CHANNEL_DIMENSIONS and image.shape[2] == NUM_RGB_CHANNELS:  # (height, width, 3)
        return Image.fromarray(image)

    raise TypeError(f"Unsupported image shape: {image.shape}")
def draw_text_on_multi_channel_image (image, metadata_list) [view source on GitHub]

Draw text on a multi-channel image with more than three channels.

Source code in albumentations/augmentations/text/functional.py
Python
def draw_text_on_multi_channel_image(image: np.ndarray, metadata_list: list[dict[str, Any]]) -> np.ndarray:
    """Draw text on a multi-channel image with more than three channels."""
    try:
        from PIL import Image, ImageDraw
    except ImportError:
        raise ImportError("Pillow is not installed") from ImportError

    channels = [Image.fromarray(image[:, :, i]) for i in range(image.shape[2])]
    pil_images = [ImageDraw.Draw(channel) for channel in channels]

    for metadata in metadata_list:
        bbox_coords = metadata["bbox_coords"]
        text = metadata["text"]
        font = metadata["font"]
        font_color = metadata["font_color"]

        # Handle different font_color types
        if isinstance(font_color, str):
            # If it's a string, use it as is for all channels
            font_color = [font_color] * image.shape[2]
        elif isinstance(font_color, (int, float)):
            # If it's a single number, convert to int and use for all channels
            font_color = [int(font_color)] * image.shape[2]
        elif isinstance(font_color, Sequence):
            # If it's a sequence, ensure it has the right length and convert to int
            if len(font_color) != image.shape[2]:
                raise ValueError(
                    f"font_color sequence length ({len(font_color)}) "
                    f"must match the number of image channels ({image.shape[2]})",
                )
            font_color = [int(c) for c in font_color]
        else:
            raise TypeError(f"Unsupported font_color type: {type(font_color)}")

        position = bbox_coords[:2]

        for channel_id, pil_image in enumerate(pil_images):
            pil_image.text(position, text, font=font, fill=font_color[channel_id])

    return np.stack([np.array(channel) for channel in channels], axis=2)
def draw_text_on_pil_image (pil_image, metadata_list) [view source on GitHub]

Draw text on a PIL image using metadata information.

Source code in albumentations/augmentations/text/functional.py
Python
def draw_text_on_pil_image(pil_image: Image, metadata_list: list[dict[str, Any]]) -> Image:
    """Draw text on a PIL image using metadata information."""
    try:
        from PIL import ImageDraw
    except ImportError:
        raise ImportError("Pillow is not installed") from ImportError

    draw = ImageDraw.Draw(pil_image)
    for metadata in metadata_list:
        bbox_coords = metadata["bbox_coords"]
        text = metadata["text"]
        font = metadata["font"]
        font_color = metadata["font_color"]
        if isinstance(font_color, (list, tuple)):
            font_color = tuple(int(c) for c in font_color)
        elif isinstance(font_color, float):
            font_color = int(font_color)
        position = bbox_coords[:2]
        draw.text(position, text, font=font, fill=font_color)
    return pil_image
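A short usage sketch for these helpers (not from the original docs); it assumes Pillow is installed, a valid TrueType font path, and the metadata layout the functions read: bbox_coords, text, font, and font_color.

Python
import numpy as np
from PIL import ImageFont
from albumentations.augmentations.text import functional as ftext

image = np.full((100, 300), 255, dtype=np.uint8)     # grayscale "page"
font = ImageFont.truetype("/path/to/font.ttf", 24)   # hypothetical font path

metadata = [{
    "bbox_coords": (10, 10, 290, 60),  # (x_min, y_min, x_max, y_max) in pixels
    "text": "hello world",
    "font": font,
    "font_color": 0,                   # black on a grayscale image
}]

pil_image = ftext.convert_image_to_pil(image)
pil_image = ftext.draw_text_on_pil_image(pil_image, metadata)
result = np.array(pil_image)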

transforms

class TextImage (font_path, stopwords=('the', 'is', 'in', 'at', 'of'), augmentations=(None,), fraction_range=(1.0, 1.0), font_size_fraction_range=(0.8, 0.9), font_color='black', clear_bg=False, metadata_key='textimage_metadata', always_apply=None, p=0.5) [view source on GitHub]

Apply text rendering transformations on images.

This class supports rendering text directly onto images using a variety of configurations, such as custom fonts, font sizes, colors, and augmentation methods. The text can be placed inside specified bounding boxes.

Parameters:

Name Type Description
font_path str | Path

Path to the font file to use for rendering text.

stopwords list[str] | None

List of stopwords for text augmentation.

augmentations tuple[str | None, ...] | list[str | None]

List of text augmentations to apply:

  • None: the text is printed as is.
  • "insertion": insert random stop words into the text.
  • "swap": swap random words in the text.
  • "deletion": delete random words from the text.

fraction_range tuple[float, float]

Range for selecting a fraction of bounding boxes to modify.

font_size_fraction_range tuple[float, float]

Range for selecting the font size as a fraction of bounding box height.

font_color list[str] | str

List of possible font colors or a single font color.

clear_bg bool

Whether to clear the background before rendering text.

metadata_key str

Key to access metadata in the parameters.

p float

Probability of applying the transform.

Targets

image

Image types: uint8, float32

Examples:

Python
>>> import albumentations as A
>>> transform = A.Compose([
    A.TextImage(
        font_path=Path("/path/to/font.ttf"),
        stopwords=["the", "is", "in"],
        augmentations=("insertion", "deletion"),
        fraction_range=(0.5, 1.0),
        font_size_fraction_range=(0.5, 0.9),
        font_color=["red", "green", "blue"],
        metadata_key="text_metadata",
        p=0.5
    )
])
>>> transformed = transform(image=my_image, text_metadata=my_metadata)
>>> image = transformed['image']
# This will render text on `my_image` based on the metadata provided in `my_metadata`.
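For reference, a hypothetical shape for my_metadata in the example above (not taken from the original docs): get_params_dependent_on_data reads, for each entry, a bbox in normalized Albumentations format and the text to render inside it.

Python
my_metadata = [
    {"bbox": [0.1, 0.10, 0.9, 0.20], "text": "Lorem ipsum dolor sit amet"},
    {"bbox": [0.1, 0.30, 0.9, 0.40], "text": "consectetur adipiscing elit"},
]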

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/text/transforms.py
Python
class TextImage(ImageOnlyTransform):
    """Apply text rendering transformations on images.

    This class supports rendering text directly onto images using a variety of configurations,
    such as custom fonts, font sizes, colors, and augmentation methods. The text can be placed
    inside specified bounding boxes.

    Args:
        font_path (str | Path): Path to the font file to use for rendering text.
        stopwords (list[str] | None): List of stopwords for text augmentation.
        augmentations (tuple[str | None, ...] | list[str | None]): List of text augmentations to apply.
            None: text is printed as is
            "insertion": insert random stop words into the text.
            "swap": swap random words in the text.
            "deletion": delete random words from the text.
        fraction_range (tuple[float, float]): Range for selecting a fraction of bounding boxes to modify.
        font_size_fraction_range (tuple[float, float]): Range for selecting the font size as a fraction of
            bounding box height.
        font_color (list[str] | str): List of possible font colors or a single font color.
        clear_bg (bool): Whether to clear the background before rendering text.
        metadata_key (str): Key to access metadata in the parameters.
        p (float): Probability of applying the transform.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://github.com/danaaubakirova/doc-augmentation

    Examples:
        >>> import albumentations as A
        >>> transform = A.Compose([
            A.TextImage(
                font_path=Path("/path/to/font.ttf"),
                stopwords=["the", "is", "in"],
                augmentations=("insertion", "deletion"),
                fraction_range=(0.5, 1.0),
                font_size_fraction_range=(0.5, 0.9),
                font_color=["red", "green", "blue"],
                metadata_key="text_metadata",
                p=0.5
            )
        ])
        >>> transformed = transform(image=my_image, text_metadata=my_metadata)
        >>> image = transformed['image']
        # This will render text on `my_image` based on the metadata provided in `my_metadata`.
    """

    class InitSchema(BaseTransformInitSchema):
        font_path: str | Path
        stopwords: tuple[str, ...]
        augmentations: tuple[str | None, ...] | list[str | None]
        fraction_range: Annotated[tuple[float, float], AfterValidator(nondecreasing), AfterValidator(check_01)]
        font_size_fraction_range: Annotated[
            tuple[float, float],
            AfterValidator(nondecreasing),
            AfterValidator(check_01),
        ]
        font_color: list[ColorType | str] | ColorType | str
        clear_bg: bool
        metadata_key: str

    def __init__(
        self,
        font_path: str | Path,
        stopwords: tuple[str, ...] = ("the", "is", "in", "at", "of"),
        augmentations: tuple[Literal["insertion", "swap", "deletion"] | None] = (None,),
        fraction_range: tuple[float, float] = (1.0, 1.0),
        font_size_fraction_range: tuple[float, float] = (0.8, 0.9),
        font_color: list[ColorType | str] | ColorType | str = "black",
        clear_bg: bool = False,
        metadata_key: str = "textimage_metadata",
        always_apply: bool | None = None,
        p: float = 0.5,
    ) -> None:
        super().__init__(p=p, always_apply=always_apply)
        self.metadata_key = metadata_key
        self.font_path = font_path
        self.fraction_range = fraction_range
        self.stopwords = stopwords
        self.augmentations = list(augmentations)
        self.font_size_fraction_range = font_size_fraction_range
        self.font_color = font_color
        self.clear_bg = clear_bg

    @property
    def targets_as_params(self) -> list[str]:
        return [self.metadata_key]

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "font_path",
            "stopwords",
            "augmentations",
            "fraction_range",
            "font_size_fraction_range",
            "font_color",
            "metadata_key",
            "clear_bg",
        )

    def random_aug(
        self,
        text: str,
        fraction: float,
        choice: Literal["insertion", "swap", "deletion"],
    ) -> str:
        words = [word for word in text.strip().split() if word]
        num_words = len(words)
        num_words_to_modify = max(1, int(fraction * num_words))

        if choice == "insertion":
            result_sentence = ftext.insert_random_stopwords(words, num_words_to_modify, self.stopwords)
        elif choice == "swap":
            result_sentence = ftext.swap_random_words(words, num_words_to_modify)
        elif choice == "deletion":
            result_sentence = ftext.delete_random_words(words, num_words_to_modify)
        else:
            raise ValueError("Invalid choice. Choose from 'insertion', 'swap', or 'deletion'.")

        result_sentence = re.sub(" +", " ", result_sentence).strip()
        return result_sentence if result_sentence != text else ""

    def preprocess_metadata(
        self,
        image: np.ndarray,
        bbox: tuple[float, float, float, float],
        text: str,
        bbox_index: int,
    ) -> dict[str, Any]:
        check_bboxes(np.array([bbox]))
        denormalized_bbox = denormalize_bboxes(np.array([bbox]), image.shape[:2])[0]

        x_min, y_min, x_max, y_max = (int(x) for x in denormalized_bbox[:4])
        bbox_height = y_max - y_min

        font_size_fraction = random.uniform(*self.font_size_fraction_range)

        font = ImageFont.truetype(str(self.font_path), int(font_size_fraction * bbox_height))

        if not self.augmentations or self.augmentations is None:
            augmented_text = text
        else:
            augmentation = random.choice(self.augmentations)

            augmented_text = text if augmentation is None else self.random_aug(text, 0.5, choice=augmentation)

        font_color = random.choice(self.font_color) if isinstance(self.font_color, list) else self.font_color

        return {
            "bbox_coords": (x_min, y_min, x_max, y_max),
            "bbox_index": bbox_index,
            "original_text": text,
            "text": augmented_text,
            "font": font,
            "font_color": font_color,
        }

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        image = data["image"]

        metadata = data[self.metadata_key]

        if metadata == []:
            return {
                "overlay_data": [],
            }

        if isinstance(metadata, dict):
            metadata = [metadata]

        fraction = random.uniform(*self.fraction_range)

        num_bboxes_to_modify = int(len(metadata) * fraction)

        bbox_indices_to_update = random.sample(range(len(metadata)), num_bboxes_to_modify)

        overlay_data = [
            self.preprocess_metadata(image, metadata[bbox_index]["bbox"], metadata[bbox_index]["text"], bbox_index)
            for bbox_index in bbox_indices_to_update
        ]

        return {
            "overlay_data": overlay_data,
        }

    def apply(
        self,
        img: np.ndarray,
        overlay_data: list[dict[str, Any]],
        **params: Any,
    ) -> np.ndarray:
        return ftext.render_text(img, overlay_data, clear_bg=self.clear_bg)

    def apply_with_params(self, params: dict[str, Any], *args: Any, **kwargs: Any) -> dict[str, Any]:
        res = super().apply_with_params(params, *args, **kwargs)
        res["overlay_data"] = [
            {
                "bbox_coords": overlay["bbox_coords"],
                "text": overlay["text"],
                "original_text": overlay["original_text"],
                "bbox_index": overlay["bbox_index"],
                "font_color": overlay["font_color"],
            }
            for overlay in params["overlay_data"]
        ]

        return res
targets_as_params: list[str] property readonly

Targets whose data is used to compute transform parameters. This list is also used to check that the input contains all required targets.

class InitSchema

Source code in albumentations/augmentations/text/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    font_path: str | Path
    stopwords: tuple[str, ...]
    augmentations: tuple[str | None, ...] | list[str | None]
    fraction_range: Annotated[tuple[float, float], AfterValidator(nondecreasing), AfterValidator(check_01)]
    font_size_fraction_range: Annotated[
        tuple[float, float],
        AfterValidator(nondecreasing),
        AfterValidator(check_01),
    ]
    font_color: list[ColorType | str] | ColorType | str
    clear_bg: bool
    metadata_key: str
apply (self, img, overlay_data, **params)

Apply transform on image.

Source code in albumentations/augmentations/text/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    overlay_data: list[dict[str, Any]],
    **params: Any,
) -> np.ndarray:
    return ftext.render_text(img, overlay_data, clear_bg=self.clear_bg)
apply_with_params (self, params, *args, **kwargs)

Apply transforms with parameters.

Source code in albumentations/augmentations/text/transforms.py
Python
def apply_with_params(self, params: dict[str, Any], *args: Any, **kwargs: Any) -> dict[str, Any]:
    res = super().apply_with_params(params, *args, **kwargs)
    res["overlay_data"] = [
        {
            "bbox_coords": overlay["bbox_coords"],
            "text": overlay["text"],
            "original_text": overlay["original_text"],
            "bbox_index": overlay["bbox_index"],
            "font_color": overlay["font_color"],
        }
        for overlay in params["overlay_data"]
    ]

    return res
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/text/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    image = data["image"]

    metadata = data[self.metadata_key]

    if metadata == []:
        return {
            "overlay_data": [],
        }

    if isinstance(metadata, dict):
        metadata = [metadata]

    fraction = random.uniform(*self.fraction_range)

    num_bboxes_to_modify = int(len(metadata) * fraction)

    bbox_indices_to_update = random.sample(range(len(metadata)), num_bboxes_to_modify)

    overlay_data = [
        self.preprocess_metadata(image, metadata[bbox_index]["bbox"], metadata[bbox_index]["text"], bbox_index)
        for bbox_index in bbox_indices_to_update
    ]

    return {
        "overlay_data": overlay_data,
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/text/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "font_path",
        "stopwords",
        "augmentations",
        "fraction_range",
        "font_size_fraction_range",
        "font_color",
        "metadata_key",
        "clear_bg",
    )

transforms

class CLAHE (clip_limit=4.0, tile_grid_size=(8, 8), always_apply=None, p=0.5) [view source on GitHub]

Apply Contrast Limited Adaptive Histogram Equalization (CLAHE) to the input image.

CLAHE is an advanced method of improving the contrast in an image. Unlike regular histogram equalization, which operates on the entire image, CLAHE operates on small regions (tiles) in the image. This results in a more balanced equalization, preventing over-amplification of contrast in areas with initially low contrast.

Parameters:

Name Type Description
clip_limit tuple[float, float] | float

Controls the contrast enhancement limit.

  • If a single float is provided, the range will be (1, clip_limit).
  • If a tuple of two floats is provided, it defines the range for random selection.

Higher values allow for more contrast enhancement, but may also increase noise. Default: (1, 4)

tile_grid_size tuple[int, int]

Defines the number of tiles in the row and column directions. Format is (rows, columns). Smaller tile sizes can lead to more localized enhancements, while larger sizes give results closer to global histogram equalization. Default: (8, 8)

p float

Probability of applying the transform. Default: 0.5

Notes

  • Supports only RGB or grayscale images.
  • For color images, CLAHE is applied to the L channel in the LAB color space.
  • The clip limit determines the maximum slope of the cumulative histogram. A lower clip limit will result in more contrast limiting.
  • Tile grid size affects the adaptiveness of the method. More tiles increase local adaptiveness but can lead to an unnatural look if set too high.

Targets

image

Image types: uint8, float32

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.CLAHE(clip_limit=(1, 4), tile_grid_size=(8, 8), p=1.0)
>>> result = transform(image=image)
>>> clahe_image = result["image"]

References

  • https://docs.opencv.org/master/d5/daf/tutorial_py_histogram_equalization.html
  • Zuiderveld, Karel. "Contrast Limited Adaptive Histogram Equalization." Graphic Gems IV. Academic Press Professional, Inc., 1994.

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class CLAHE(ImageOnlyTransform):
    """Apply Contrast Limited Adaptive Histogram Equalization (CLAHE) to the input image.

    CLAHE is an advanced method of improving the contrast in an image. Unlike regular histogram
    equalization, which operates on the entire image, CLAHE operates on small regions (tiles)
    in the image. This results in a more balanced equalization, preventing over-amplification
    of contrast in areas with initially low contrast.

    Args:
        clip_limit (tuple[float, float] | float): Controls the contrast enhancement limit.
            - If a single float is provided, the range will be (1, clip_limit).
            - If a tuple of two floats is provided, it defines the range for random selection.
            Higher values allow for more contrast enhancement, but may also increase noise.
            Default: (1, 4)

        tile_grid_size (tuple[int, int]): Defines the number of tiles in the row and column directions.
            Format is (rows, columns). Smaller tile sizes can lead to more localized enhancements,
            while larger sizes give results closer to global histogram equalization.
            Default: (8, 8)

        p (float): Probability of applying the transform. Default: 0.5

    Notes:
        - Supports only RGB or grayscale images.
        - For color images, CLAHE is applied to the L channel in the LAB color space.
        - The clip limit determines the maximum slope of the cumulative histogram. A lower
          clip limit will result in more contrast limiting.
        - Tile grid size affects the adaptiveness of the method. More tiles increase local
          adaptiveness but can lead to an unnatural look if set too high.

    Targets:
        image

    Image types:
        uint8, float32

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.CLAHE(clip_limit=(1, 4), tile_grid_size=(8, 8), p=1.0)
        >>> result = transform(image=image)
        >>> clahe_image = result["image"]

    References:
        - https://docs.opencv.org/master/d5/daf/tutorial_py_histogram_equalization.html
        - Zuiderveld, Karel. "Contrast Limited Adaptive Histogram Equalization."
          Graphic Gems IV. Academic Press Professional, Inc., 1994.
    """

    class InitSchema(BaseTransformInitSchema):
        clip_limit: OnePlusFloatRangeType
        tile_grid_size: Annotated[tuple[int, int], AfterValidator(check_1plus)]

    def __init__(
        self,
        clip_limit: ScaleFloatType = 4.0,
        tile_grid_size: tuple[int, int] = (8, 8),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.clip_limit = cast(Tuple[float, float], clip_limit)
        self.tile_grid_size = tile_grid_size

    def apply(self, img: np.ndarray, clip_limit: float, **params: Any) -> np.ndarray:
        if not is_rgb_image(img) and not is_grayscale_image(img):
            msg = "CLAHE transformation expects 1-channel or 3-channel images."
            raise TypeError(msg)

        return fmain.clahe(img, clip_limit, self.tile_grid_size)

    def get_params(self) -> dict[str, float]:
        return {"clip_limit": random.uniform(*self.clip_limit)}

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("clip_limit", "tile_grid_size")
class InitSchema

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    clip_limit: OnePlusFloatRangeType
    tile_grid_size: Annotated[tuple[int, int], AfterValidator(check_1plus)]
apply (self, img, clip_limit, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, clip_limit: float, **params: Any) -> np.ndarray:
    if not is_rgb_image(img) and not is_grayscale_image(img):
        msg = "CLAHE transformation expects 1-channel or 3-channel images."
        raise TypeError(msg)

    return fmain.clahe(img, clip_limit, self.tile_grid_size)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, float]:
    return {"clip_limit": random.uniform(*self.clip_limit)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("clip_limit", "tile_grid_size")

class ChannelShuffle [view source on GitHub]

Randomly rearrange channels of the image.

Parameters:

Name Type Description
p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32
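A minimal usage sketch (added here for convenience, not part of the original reference):

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ChannelShuffle(p=1.0)
>>> result = transform(image=image)
>>> shuffled_image = result["image"]  # same pixel values, channels in a random order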

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class ChannelShuffle(ImageOnlyTransform):
    """Randomly rearrange channels of the image.

    Args:
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    def apply(self, img: np.ndarray, channels_shuffled: tuple[int, ...], **params: Any) -> np.ndarray:
        return fmain.channel_shuffle(img, channels_shuffled)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        ch_arr = list(range(params["shape"][2]))
        ch_arr = random_utils.shuffle(ch_arr)
        return {"channels_shuffled": ch_arr}

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()
apply (self, img, channels_shuffled, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, channels_shuffled: tuple[int, ...], **params: Any) -> np.ndarray:
    return fmain.channel_shuffle(img, channels_shuffled)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    ch_arr = list(range(params["shape"][2]))
    ch_arr = random_utils.shuffle(ch_arr)
    return {"channels_shuffled": ch_arr}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[()]:
    return ()

class ChromaticAberration (primary_distortion_limit=(-0.02, 0.02), secondary_distortion_limit=(-0.05, 0.05), mode='green_purple', interpolation=1, always_apply=None, p=0.5) [view source on GitHub]

Add lateral chromatic aberration by distorting the red and blue channels of the input image.

Chromatic aberration is an optical effect that occurs when a lens fails to focus all colors to the same point. This transform simulates this effect by applying different radial distortions to the red and blue channels of the image, while leaving the green channel unchanged.

Parameters:

Name Type Description
primary_distortion_limit tuple[float, float] | float

Range of the primary radial distortion coefficient. If a single float value is provided, the range will be (-primary_distortion_limit, primary_distortion_limit). This parameter controls the distortion in the center of the image:

  • Positive values result in pincushion distortion (edges bend inward)
  • Negative values result in barrel distortion (edges bend outward)

Default: (-0.02, 0.02).

secondary_distortion_limit tuple[float, float] | float

Range of the secondary radial distortion coefficient. If a single float value is provided, the range will be (-secondary_distortion_limit, secondary_distortion_limit). This parameter controls the distortion in the corners of the image:

  • Positive values enhance pincushion distortion
  • Negative values enhance barrel distortion

Default: (-0.05, 0.05).

mode Literal["green_purple", "red_blue", "random"]

Type of color fringing to apply. Options are:

  • 'green_purple': Distorts red and blue channels in opposite directions, creating green-purple fringing.
  • 'red_blue': Distorts red and blue channels in the same direction, creating red-blue fringing.
  • 'random': Randomly chooses between 'green_purple' and 'red_blue' modes for each application.

Default: 'green_purple'.

interpolation InterpolationType

Flag specifying the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5.

Targets

image

Image types: uint8, float32

Note

  • This transform only affects RGB images. Grayscale images will raise an error.
  • The strength of the effect depends on both primary and secondary distortion limits.
  • Higher absolute values for distortion limits will result in more pronounced chromatic aberration.
  • The 'green_purple' mode tends to produce more noticeable effects than 'red_blue'.

Examples:

Python
>>> import albumentations as A
>>> import cv2
>>> transform = A.ChromaticAberration(
...     primary_distortion_limit=0.05,
...     secondary_distortion_limit=0.1,
...     mode='green_purple',
...     interpolation=cv2.INTER_LINEAR,
...     p=1.0
... )
>>> transformed = transform(image=image)
>>> aberrated_image = transformed['image']

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class ChromaticAberration(ImageOnlyTransform):
    """Add lateral chromatic aberration by distorting the red and blue channels of the input image.

    Chromatic aberration is an optical effect that occurs when a lens fails to focus all colors to the same point.
    This transform simulates this effect by applying different radial distortions to the red and blue channels
    of the image, while leaving the green channel unchanged.

    Args:
        primary_distortion_limit (tuple[float, float] | float): Range of the primary radial distortion coefficient.
            If a single float value is provided, the range
            will be (-primary_distortion_limit, primary_distortion_limit).
            This parameter controls the distortion in the center of the image:
            - Positive values result in pincushion distortion (edges bend inward)
            - Negative values result in barrel distortion (edges bend outward)
            Default: (-0.02, 0.02).

        secondary_distortion_limit (tuple[float, float] | float): Range of the secondary radial distortion coefficient.
            If a single float value is provided, the range
            will be (-secondary_distortion_limit, secondary_distortion_limit).
            This parameter controls the distortion in the corners of the image:
            - Positive values enhance pincushion distortion
            - Negative values enhance barrel distortion
            Default: (-0.05, 0.05).

        mode (Literal["green_purple", "red_blue", "random"]): Type of color fringing to apply. Options are:
            - 'green_purple': Distorts red and blue channels in opposite directions, creating green-purple fringing.
            - 'red_blue': Distorts red and blue channels in the same direction, creating red-blue fringing.
            - 'random': Randomly chooses between 'green_purple' and 'red_blue' modes for each application.
            Default: 'green_purple'.

        interpolation (InterpolationType): Flag specifying the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.

        p (float): Probability of applying the transform. Should be in the range [0, 1].
            Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        - This transform only affects RGB images. Grayscale images will raise an error.
        - The strength of the effect depends on both primary and secondary distortion limits.
        - Higher absolute values for distortion limits will result in more pronounced chromatic aberration.
        - The 'green_purple' mode tends to produce more noticeable effects than 'red_blue'.

    Example:
        >>> import albumentations as A
        >>> import cv2
        >>> transform = A.ChromaticAberration(
        ...     primary_distortion_limit=0.05,
        ...     secondary_distortion_limit=0.1,
        ...     mode='green_purple',
        ...     interpolation=cv2.INTER_LINEAR,
        ...     p=1.0
        ... )
        >>> transformed = transform(image=image)
        >>> aberrated_image = transformed['image']

    References:
        - https://en.wikipedia.org/wiki/Chromatic_aberration
        - https://www.researchgate.net/publication/320691320_Chromatic_Aberration_in_Digital_Images
    """

    class InitSchema(BaseTransformInitSchema):
        primary_distortion_limit: SymmetricRangeType
        secondary_distortion_limit: SymmetricRangeType
        mode: ChromaticAberrationMode
        interpolation: InterpolationType

    def __init__(
        self,
        primary_distortion_limit: ScaleFloatType = (-0.02, 0.02),
        secondary_distortion_limit: ScaleFloatType = (-0.05, 0.05),
        mode: ChromaticAberrationMode = "green_purple",
        interpolation: InterpolationType = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.primary_distortion_limit = cast(Tuple[float, float], primary_distortion_limit)
        self.secondary_distortion_limit = cast(Tuple[float, float], secondary_distortion_limit)
        self.mode = mode
        self.interpolation = interpolation

    def apply(
        self,
        img: np.ndarray,
        primary_distortion_red: float,
        secondary_distortion_red: float,
        primary_distortion_blue: float,
        secondary_distortion_blue: float,
        **params: Any,
    ) -> np.ndarray:
        non_rgb_error(img)
        return fmain.chromatic_aberration(
            img,
            primary_distortion_red,
            secondary_distortion_red,
            primary_distortion_blue,
            secondary_distortion_blue,
            self.interpolation,
        )

    def get_params(self) -> dict[str, float]:
        primary_distortion_red = random.uniform(*self.primary_distortion_limit)
        secondary_distortion_red = random.uniform(*self.secondary_distortion_limit)
        primary_distortion_blue = random.uniform(*self.primary_distortion_limit)
        secondary_distortion_blue = random.uniform(*self.secondary_distortion_limit)

        secondary_distortion_red = self._match_sign(primary_distortion_red, secondary_distortion_red)
        secondary_distortion_blue = self._match_sign(primary_distortion_blue, secondary_distortion_blue)

        if self.mode == "green_purple":
            # distortion coefficients of the red and blue channels have the same sign
            primary_distortion_blue = self._match_sign(primary_distortion_red, primary_distortion_blue)
            secondary_distortion_blue = self._match_sign(secondary_distortion_red, secondary_distortion_blue)
        if self.mode == "red_blue":
            # distortion coefficients of the red and blue channels have the opposite sign
            primary_distortion_blue = self._unmatch_sign(primary_distortion_red, primary_distortion_blue)
            secondary_distortion_blue = self._unmatch_sign(secondary_distortion_red, secondary_distortion_blue)

        return {
            "primary_distortion_red": primary_distortion_red,
            "secondary_distortion_red": secondary_distortion_red,
            "primary_distortion_blue": primary_distortion_blue,
            "secondary_distortion_blue": secondary_distortion_blue,
        }

    @staticmethod
    def _match_sign(a: float, b: float) -> float:
        # Match the sign of b to a
        if (a < 0 < b) or (a > 0 > b):
            return -b
        return b

    @staticmethod
    def _unmatch_sign(a: float, b: float) -> float:
        # Unmatch the sign of b to a
        if (a < 0 and b < 0) or (a > 0 and b > 0):
            return -b
        return b

    def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
        return "primary_distortion_limit", "secondary_distortion_limit", "mode", "interpolation"
class InitSchema

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    primary_distortion_limit: SymmetricRangeType
    secondary_distortion_limit: SymmetricRangeType
    mode: ChromaticAberrationMode
    interpolation: InterpolationType
apply (self, img, primary_distortion_red, secondary_distortion_red, primary_distortion_blue, secondary_distortion_blue, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    primary_distortion_red: float,
    secondary_distortion_red: float,
    primary_distortion_blue: float,
    secondary_distortion_blue: float,
    **params: Any,
) -> np.ndarray:
    non_rgb_error(img)
    return fmain.chromatic_aberration(
        img,
        primary_distortion_red,
        secondary_distortion_red,
        primary_distortion_blue,
        secondary_distortion_blue,
        self.interpolation,
    )
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, float]:
    primary_distortion_red = random.uniform(*self.primary_distortion_limit)
    secondary_distortion_red = random.uniform(*self.secondary_distortion_limit)
    primary_distortion_blue = random.uniform(*self.primary_distortion_limit)
    secondary_distortion_blue = random.uniform(*self.secondary_distortion_limit)

    secondary_distortion_red = self._match_sign(primary_distortion_red, secondary_distortion_red)
    secondary_distortion_blue = self._match_sign(primary_distortion_blue, secondary_distortion_blue)

    if self.mode == "green_purple":
        # distortion coefficients of the red and blue channels have the same sign
        primary_distortion_blue = self._match_sign(primary_distortion_red, primary_distortion_blue)
        secondary_distortion_blue = self._match_sign(secondary_distortion_red, secondary_distortion_blue)
    if self.mode == "red_blue":
        # distortion coefficients of the red and blue channels have the opposite sign
        primary_distortion_blue = self._unmatch_sign(primary_distortion_red, primary_distortion_blue)
        secondary_distortion_blue = self._unmatch_sign(secondary_distortion_red, secondary_distortion_blue)

    return {
        "primary_distortion_red": primary_distortion_red,
        "secondary_distortion_red": secondary_distortion_red,
        "primary_distortion_blue": primary_distortion_blue,
        "secondary_distortion_blue": secondary_distortion_blue,
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
    return "primary_distortion_limit", "secondary_distortion_limit", "mode", "interpolation"

class ColorJitter (brightness=(0.8, 1.2), contrast=(0.8, 1.2), saturation=(0.8, 1.2), hue=(-0.5, 0.5), always_apply=None, p=0.5) [view source on GitHub]

Randomly changes the brightness, contrast, saturation, and hue of an image.

This transform is similar to torchvision's ColorJitter but with some differences due to the use of OpenCV instead of Pillow. The main differences are: 1. OpenCV and Pillow use different formulas to convert images to HSV format. 2. This implementation uses value saturation instead of uint8 overflow as in Pillow.

These differences may result in slightly different output compared to torchvision's ColorJitter.

Parameters:

Name Type Description
brightness tuple[float, float] | float

How much to jitter brightness. If float: The brightness factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness]. If tuple: The brightness factor is sampled from the range specified. Should be non-negative numbers. Default: (0.8, 1.2)

contrast tuple[float, float] | float

How much to jitter contrast. If float: The contrast factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast]. If tuple: The contrast factor is sampled from the range specified. Should be non-negative numbers. Default: (0.8, 1.2)

saturation tuple[float, float] | float

How much to jitter saturation. If float: The saturation factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation]. If tuple: The saturation factor is sampled from the range specified. Should be non-negative numbers. Default: (0.8, 1.2)

hue float or tuple of float (min, max)

How much to jitter hue. If float: The hue factor is chosen uniformly from [-hue, hue]. Should have 0 <= hue <= 0.5. If tuple: The hue factor is sampled from the range specified. Values should be in range [-0.5, 0.5]. Default: (-0.5, 0.5)

p float

Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5

Targets

image

Image types: uint8, float32

Note

  • The order of application for these color transformations is random for each image.
  • The ranges for brightness, contrast, and saturation are applied as multiplicative factors.
  • The range for hue is applied as an additive factor.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1, p=1.0)
>>> result = transform(image=image)
>>> jittered_image = result['image']
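
A typical way to use the transform inside a larger pipeline (a minimal sketch; the accompanying transforms are arbitrary choices, not requirements):

Python
>>> import numpy as np
>>> import albumentations as A
>>> pipeline = A.Compose([
...     A.HorizontalFlip(p=0.5),
...     A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1, p=0.8),
... ])
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> augmented = pipeline(image=image)['image']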

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class ColorJitter(ImageOnlyTransform):
    """Randomly changes the brightness, contrast, saturation, and hue of an image.

    This transform is similar to torchvision's ColorJitter but with some differences due to the use of OpenCV
    instead of Pillow. The main differences are:
    1. OpenCV and Pillow use different formulas to convert images to HSV format.
    2. This implementation uses value saturation instead of uint8 overflow as in Pillow.

    These differences may result in slightly different output compared to torchvision's ColorJitter.

    Args:
        brightness (tuple[float, float] | float): How much to jitter brightness.
            If float:
                The brightness factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].
            If tuple:
                The brightness factor is sampled from the range specified.
            Should be non-negative numbers.
            Default: (0.8, 1.2)

        contrast (tuple[float, float] | float): How much to jitter contrast.
            If float:
                The contrast factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].
            If tuple:
                The contrast factor is sampled from the range specified.
            Should be non-negative numbers.
            Default: (0.8, 1.2)

        saturation (tuple[float, float] | float): How much to jitter saturation.
            If float:
                The saturation factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].
            If tuple:
                The saturation factor is sampled from the range specified.
            Should be non-negative numbers.
            Default: (0.8, 1.2)

        hue (float or tuple of float (min, max)): How much to jitter hue.
            If float:
                The hue factor is chosen uniformly from [-hue, hue]. Should have 0 <= hue <= 0.5.
            If tuple:
                The hue factor is sampled from the range specified. Values should be in range [-0.5, 0.5].
            Default: (-0.5, 0.5)

         p (float): Probability of applying the transform. Should be in the range [0, 1].
            Default: 0.5


    Targets:
        image

    Image types:
        uint8, float32

    Note:
        - The order of application for these color transformations is random for each image.
        - The ranges for brightness, contrast, and saturation are applied as multiplicative factors.
        - The range for hue is applied as an additive factor.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1, p=1.0)
        >>> result = transform(image=image)
        >>> jittered_image = result['image']

    References:
        - https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.ColorJitter
        - https://docs.opencv.org/3.4/de/d25/imgproc_color_conversions.html
    """

    class InitSchema(BaseTransformInitSchema):
        brightness: ScaleFloatType
        contrast: ScaleFloatType
        saturation: ScaleFloatType
        hue: ScaleFloatType

        @field_validator("brightness", "contrast", "saturation", "hue")
        @classmethod
        def check_ranges(cls, value: ScaleFloatType, info: ValidationInfo) -> tuple[float, float]:
            if info.field_name == "hue":
                bounds = -0.5, 0.5
                bias = 0
                clip = False
            elif info.field_name in ["brightness", "contrast", "saturation"]:
                bounds = 0, float("inf")
                bias = 1
                clip = True

            if isinstance(value, numbers.Number):
                if value < 0:
                    raise ValueError(f"If {info.field_name} is a single number, it must be non negative.")
                left = bias - value
                if clip:
                    left = max(left, 0)
                value = (left, bias + value)
            elif isinstance(value, tuple) and len(value) == PAIR:
                check_range(value, *bounds, info.field_name)

            return cast(Tuple[float, float], value)

    def __init__(
        self,
        brightness: ScaleFloatType = (0.8, 1.2),
        contrast: ScaleFloatType = (0.8, 1.2),
        saturation: ScaleFloatType = (0.8, 1.2),
        hue: ScaleFloatType = (-0.5, 0.5),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.brightness = cast(Tuple[float, float], brightness)
        self.contrast = cast(Tuple[float, float], contrast)
        self.saturation = cast(Tuple[float, float], saturation)
        self.hue = cast(Tuple[float, float], hue)

        self.transforms = [
            fmain.adjust_brightness_torchvision,
            fmain.adjust_contrast_torchvision,
            fmain.adjust_saturation_torchvision,
            fmain.adjust_hue_torchvision,
        ]

    def get_params(self) -> dict[str, Any]:
        brightness = random.uniform(*self.brightness)
        contrast = random.uniform(*self.contrast)
        saturation = random.uniform(*self.saturation)
        hue = random.uniform(*self.hue)

        order = [0, 1, 2, 3]
        order = random_utils.shuffle(order)

        return {
            "brightness": brightness,
            "contrast": contrast,
            "saturation": saturation,
            "hue": hue,
            "order": order,
        }

    def apply(
        self,
        img: np.ndarray,
        brightness: float,
        contrast: float,
        saturation: float,
        hue: float,
        order: list[int],
        **params: Any,
    ) -> np.ndarray:
        if order is None:
            order = [0, 1, 2, 3]
        if not is_rgb_image(img) and not is_grayscale_image(img):
            msg = "ColorJitter transformation expects 1-channel or 3-channel images."
            raise TypeError(msg)
        color_transforms = [brightness, contrast, saturation, hue]
        for i in order:
            img = self.transforms[i](img, color_transforms[i])
        return img

    def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
        return "brightness", "contrast", "saturation", "hue"
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    brightness: ScaleFloatType
    contrast: ScaleFloatType
    saturation: ScaleFloatType
    hue: ScaleFloatType

    @field_validator("brightness", "contrast", "saturation", "hue")
    @classmethod
    def check_ranges(cls, value: ScaleFloatType, info: ValidationInfo) -> tuple[float, float]:
        if info.field_name == "hue":
            bounds = -0.5, 0.5
            bias = 0
            clip = False
        elif info.field_name in ["brightness", "contrast", "saturation"]:
            bounds = 0, float("inf")
            bias = 1
            clip = True

        if isinstance(value, numbers.Number):
            if value < 0:
                raise ValueError(f"If {info.field_name} is a single number, it must be non negative.")
            left = bias - value
            if clip:
                left = max(left, 0)
            value = (left, bias + value)
        elif isinstance(value, tuple) and len(value) == PAIR:
            check_range(value, *bounds, info.field_name)

        return cast(Tuple[float, float], value)
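
In effect, a scalar argument is expanded into a range around the neutral value: 1.0 for brightness, contrast, and saturation (with the lower bound clipped at 0) and 0.0 for hue. A minimal sketch of that expansion, assuming the same bias and clipping rules as the validator above:

Python
>>> def expand(value, bias, clip):
...     left = bias - value
...     if clip:
...         left = max(left, 0)
...     return (left, bias + value)
>>> expand(0.2, bias=1, clip=True)   # brightness / contrast / saturation
(0.8, 1.2)
>>> expand(0.1, bias=0, clip=False)  # hue
(-0.1, 0.1)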
__class_vars__ special

The names of the class variables defined on the model.

__private_attributes__ special

Metadata about the private attributes of the model.

__pydantic_complete__ special

Whether model building is completed, or if there are still undefined fields.

__pydantic_custom_init__ special

Whether the model has a custom __init__ method.

__pydantic_decorators__ special

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__ special

Metadata for generic models; contains data used for a similar purpose to args, origin, parameters in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__ special

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__ special

The name of the post-init method for the model, if defined.

__signature__ special

The synthesized __init__ [Signature][inspect.Signature] of the model.

model_computed_fields

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

apply (self, img, brightness, contrast, saturation, hue, order, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    brightness: float,
    contrast: float,
    saturation: float,
    hue: float,
    order: list[int],
    **params: Any,
) -> np.ndarray:
    if order is None:
        order = [0, 1, 2, 3]
    if not is_rgb_image(img) and not is_grayscale_image(img):
        msg = "ColorJitter transformation expects 1-channel or 3-channel images."
        raise TypeError(msg)
    color_transforms = [brightness, contrast, saturation, hue]
    for i in order:
        img = self.transforms[i](img, color_transforms[i])
    return img
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, Any]:
    brightness = random.uniform(*self.brightness)
    contrast = random.uniform(*self.contrast)
    saturation = random.uniform(*self.saturation)
    hue = random.uniform(*self.hue)

    order = [0, 1, 2, 3]
    order = random_utils.shuffle(order)

    return {
        "brightness": brightness,
        "contrast": contrast,
        "saturation": saturation,
        "hue": hue,
        "order": order,
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
    return "brightness", "contrast", "saturation", "hue"

class Downscale (scale_min=None, scale_max=None, interpolation=None, scale_range=(0.25, 0.25), interpolation_pair={'upscale': 0, 'downscale': 0}, always_apply=None, p=0.5) [view source on GitHub]

Decrease image quality by downscaling and upscaling back.

This transform simulates the effect of a low-resolution image by first downscaling the image to a lower resolution and then upscaling it back to its original size. This process introduces loss of detail and can be used to simulate low-quality images or to test the robustness of models to different image resolutions.

Parameters:

Name Type Description
scale_range tuple[float, float]

Range for the downscaling factor. Should be two float values between 0 and 1, where the first value is less than or equal to the second. The actual downscaling factor will be randomly chosen from this range for each image. Lower values result in more aggressive downscaling. Default: (0.25, 0.25)

interpolation_pair InterpolationDict

A dictionary specifying the interpolation methods to use for downscaling and upscaling. Should contain two keys: 'downscale' (interpolation method for downscaling) and 'upscale' (interpolation method for upscaling). Values should be OpenCV interpolation flags (e.g., cv2.INTER_NEAREST, cv2.INTER_LINEAR). Default: {'downscale': cv2.INTER_NEAREST, 'upscale': cv2.INTER_NEAREST}

p float

Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5

Targets

image

Image types: uint8, float32

Note

  • The actual downscaling factor is randomly chosen for each image from the range specified in scale_range.
  • Using different interpolation methods for downscaling and upscaling can produce various effects. For example, using INTER_NEAREST for both can create a pixelated look, while using INTER_LINEAR or INTER_CUBIC can produce smoother results.
  • This transform can be useful for data augmentation, especially when training models that need to be robust to variations in image quality or resolution.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Downscale(
...     scale_range=(0.5, 0.75),
...     interpolation_pair={'downscale': cv2.INTER_NEAREST, 'upscale': cv2.INTER_LINEAR},
...     p=0.5
... )
>>> transformed = transform(image=image)
>>> downscaled_image = transformed['image']

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class Downscale(ImageOnlyTransform):
    """Decrease image quality by downscaling and upscaling back.

    This transform simulates the effect of a low-resolution image by first downscaling
    the image to a lower resolution and then upscaling it back to its original size.
    This process introduces loss of detail and can be used to simulate low-quality
    images or to test the robustness of models to different image resolutions.

    Args:
        scale_range (tuple[float, float]): Range for the downscaling factor.
            Should be two float values between 0 and 1, where the first value is less than or equal to the second.
            The actual downscaling factor will be randomly chosen from this range for each image.
            Lower values result in more aggressive downscaling.
            Default: (0.25, 0.25)

        interpolation_pair (InterpolationDict): A dictionary specifying the interpolation methods to use for
            downscaling and upscaling. Should contain two keys:
            - 'downscale': Interpolation method for downscaling
            - 'upscale': Interpolation method for upscaling
            Values should be OpenCV interpolation flags (e.g., cv2.INTER_NEAREST, cv2.INTER_LINEAR, etc.)
            Default: {'downscale': cv2.INTER_NEAREST, 'upscale': cv2.INTER_NEAREST}

        p (float): Probability of applying the transform. Should be in the range [0, 1].
            Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        - The actual downscaling factor is randomly chosen for each image from the range
          specified in scale_range.
        - Using different interpolation methods for downscaling and upscaling can produce
          various effects. For example, using INTER_NEAREST for both can create a pixelated look,
          while using INTER_LINEAR or INTER_CUBIC can produce smoother results.
        - This transform can be useful for data augmentation, especially when training models
          that need to be robust to variations in image quality or resolution.

    Example:
        >>> import albumentations as A
        >>> import cv2
        >>> transform = A.Downscale(
        ...     scale_range=(0.5, 0.75),
        ...     interpolation_pair={'downscale': cv2.INTER_NEAREST, 'upscale': cv2.INTER_LINEAR},
        ...     p=0.5
        ... )
        >>> transformed = transform(image=image)
        >>> downscaled_image = transformed['image']
    """

    class InitSchema(BaseTransformInitSchema):
        scale_min: float | None
        scale_max: float | None

        interpolation: int | Interpolation | InterpolationDict | None = Field(
            default_factory=lambda: Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST),
        )
        interpolation_pair: InterpolationPydantic

        scale_range: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)]

        @model_validator(mode="after")
        def validate_params(self) -> Self:
            if self.scale_min is not None and self.scale_max is not None:
                warn(
                    "scale_min and scale_max are deprecated. Use scale_range instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )

                self.scale_range = (self.scale_min, self.scale_max)
                self.scale_min = None
                self.scale_max = None

            if self.interpolation is not None:
                warn(
                    "Downscale.interpolation is deprecated. Use Downscale.interpolation_pair instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )

                if isinstance(self.interpolation, dict):
                    self.interpolation_pair = InterpolationPydantic(**self.interpolation)
                elif isinstance(self.interpolation, int):
                    self.interpolation_pair = InterpolationPydantic(
                        upscale=self.interpolation,
                        downscale=self.interpolation,
                    )
                elif isinstance(self.interpolation, Interpolation):
                    self.interpolation_pair = InterpolationPydantic(
                        upscale=self.interpolation.upscale,
                        downscale=self.interpolation.downscale,
                    )
                self.interpolation = None

            return self

    def __init__(
        self,
        scale_min: float | None = None,
        scale_max: float | None = None,
        interpolation: int | Interpolation | InterpolationDict | None = None,
        scale_range: tuple[float, float] = (0.25, 0.25),
        interpolation_pair: InterpolationDict = InterpolationDict(
            {"upscale": cv2.INTER_NEAREST, "downscale": cv2.INTER_NEAREST},
        ),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.scale_range = scale_range
        self.interpolation_pair = interpolation_pair

    def apply(self, img: np.ndarray, scale: float, **params: Any) -> np.ndarray:
        return fmain.downscale(
            img,
            scale=scale,
            down_interpolation=self.interpolation_pair["downscale"],
            up_interpolation=self.interpolation_pair["upscale"],
        )

    def get_params(self) -> dict[str, Any]:
        return {"scale": random.uniform(*self.scale_range)}

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return "scale_range", "interpolation_pair"
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    scale_min: float | None
    scale_max: float | None

    interpolation: int | Interpolation | InterpolationDict | None = Field(
        default_factory=lambda: Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST),
    )
    interpolation_pair: InterpolationPydantic

    scale_range: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)]

    @model_validator(mode="after")
    def validate_params(self) -> Self:
        if self.scale_min is not None and self.scale_max is not None:
            warn(
                "scale_min and scale_max are deprecated. Use scale_range instead.",
                DeprecationWarning,
                stacklevel=2,
            )

            self.scale_range = (self.scale_min, self.scale_max)
            self.scale_min = None
            self.scale_max = None

        if self.interpolation is not None:
            warn(
                "Downscale.interpolation is deprecated. Use Downscale.interpolation_pair instead.",
                DeprecationWarning,
                stacklevel=2,
            )

            if isinstance(self.interpolation, dict):
                self.interpolation_pair = InterpolationPydantic(**self.interpolation)
            elif isinstance(self.interpolation, int):
                self.interpolation_pair = InterpolationPydantic(
                    upscale=self.interpolation,
                    downscale=self.interpolation,
                )
            elif isinstance(self.interpolation, Interpolation):
                self.interpolation_pair = InterpolationPydantic(
                    upscale=self.interpolation.upscale,
                    downscale=self.interpolation.downscale,
                )
            self.interpolation = None

        return self
__class_vars__ special

The names of the class variables defined on the model.

__private_attributes__ special

Metadata about the private attributes of the model.

__pydantic_complete__ special

Whether model building is completed, or if there are still undefined fields.

__pydantic_custom_init__ special

Whether the model has a custom __init__ method.

__pydantic_decorators__ special

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__ special

Metadata for generic models; contains data used for a similar purpose to args, origin, parameters in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__ special

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__ special

The name of the post-init method for the model, if defined.

__signature__ special

The synthesized __init__ [Signature][inspect.Signature] of the model.

model_computed_fields

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

apply (self, img, scale, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, scale: float, **params: Any) -> np.ndarray:
    return fmain.downscale(
        img,
        scale=scale,
        down_interpolation=self.interpolation_pair["downscale"],
        up_interpolation=self.interpolation_pair["upscale"],
    )
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, Any]:
    return {"scale": random.uniform(*self.scale_range)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str]:
    return "scale_range", "interpolation_pair"

class Emboss (alpha=(0.2, 0.5), strength=(0.2, 0.7), always_apply=None, p=0.5) [view source on GitHub]

Apply embossing effect to the input image.

This transform creates an emboss effect by highlighting edges and creating a 3D-like texture in the image. It works by applying a specific convolution kernel to the image that emphasizes differences in adjacent pixel values.

Parameters:

Name Type Description
alpha tuple[float, float]

Range to choose the visibility of the embossed image. At 0, only the original image is visible, at 1.0 only its embossed version is visible. Values should be in the range [0, 1]. Alpha will be randomly selected from this range for each image. Default: (0.2, 0.5)

strength tuple[float, float]

Range to choose the strength of the embossing effect. Higher values create a more pronounced 3D effect. Values should be non-negative. Strength will be randomly selected from this range for each image. Default: (0.2, 0.7)

p float

Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5

Targets

image

Image types: uint8, float32

Note

  • The emboss effect is created using a 3x3 convolution kernel.
  • The 'alpha' parameter controls the blend between the original image and the embossed version. A higher alpha value will result in a more pronounced emboss effect.
  • The 'strength' parameter affects the intensity of the embossing. Higher strength values will create more contrast in the embossed areas, resulting in a stronger 3D-like effect.
  • This transform can be useful for creating artistic effects or for data augmentation in tasks where edge information is important.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Emboss(alpha=(0.2, 0.5), strength=(0.2, 0.7), p=0.5)
>>> result = transform(image=image)
>>> embossed_image = result['image']

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class Emboss(ImageOnlyTransform):
    """Apply embossing effect to the input image.

    This transform creates an emboss effect by highlighting edges and creating a 3D-like texture
    in the image. It works by applying a specific convolution kernel to the image that emphasizes
    differences in adjacent pixel values.

    Args:
        alpha (tuple[float, float]): Range to choose the visibility of the embossed image.
            At 0, only the original image is visible, at 1.0 only its embossed version is visible.
            Values should be in the range [0, 1].
            Alpha will be randomly selected from this range for each image.
            Default: (0.2, 0.5)

        strength (tuple[float, float]): Range to choose the strength of the embossing effect.
            Higher values create a more pronounced 3D effect.
            Values should be non-negative.
            Strength will be randomly selected from this range for each image.
            Default: (0.2, 0.7)

        p (float): Probability of applying the transform. Should be in the range [0, 1].
            Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        - The emboss effect is created using a 3x3 convolution kernel.
        - The 'alpha' parameter controls the blend between the original image and the embossed version.
          A higher alpha value will result in a more pronounced emboss effect.
        - The 'strength' parameter affects the intensity of the embossing. Higher strength values
          will create more contrast in the embossed areas, resulting in a stronger 3D-like effect.
        - This transform can be useful for creating artistic effects or for data augmentation
          in tasks where edge information is important.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.Emboss(alpha=(0.2, 0.5), strength=(0.2, 0.7), p=0.5)
        >>> result = transform(image=image)
        >>> embossed_image = result['image']

    References:
        - https://en.wikipedia.org/wiki/Image_embossing
        - https://www.researchgate.net/publication/303412455_Application_of_Emboss_Filtering_in_Image_Processing
    """

    class InitSchema(BaseTransformInitSchema):
        alpha: Annotated[tuple[float, float], AfterValidator(check_01)]
        strength: Annotated[tuple[float, float], AfterValidator(check_0plus)]

    def __init__(
        self,
        alpha: tuple[float, float] = (0.2, 0.5),
        strength: tuple[float, float] = (0.2, 0.7),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.alpha = alpha
        self.strength = strength

    @staticmethod
    def __generate_emboss_matrix(alpha_sample: np.ndarray, strength_sample: np.ndarray) -> np.ndarray:
        matrix_nochange = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
        matrix_effect = np.array(
            [
                [-1 - strength_sample, 0 - strength_sample, 0],
                [0 - strength_sample, 1, 0 + strength_sample],
                [0, 0 + strength_sample, 1 + strength_sample],
            ],
            dtype=np.float32,
        )
        return (1 - alpha_sample) * matrix_nochange + alpha_sample * matrix_effect

    def get_params(self) -> dict[str, np.ndarray]:
        alpha = random.uniform(*self.alpha)
        strength = random.uniform(*self.strength)
        emboss_matrix = self.__generate_emboss_matrix(alpha_sample=alpha, strength_sample=strength)
        return {"emboss_matrix": emboss_matrix}

    def apply(self, img: np.ndarray, emboss_matrix: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.convolve(img, emboss_matrix)

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("alpha", "strength")
class InitSchema


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    alpha: Annotated[tuple[float, float], AfterValidator(check_01)]
    strength: Annotated[tuple[float, float], AfterValidator(check_0plus)]
__class_vars__ special

The names of the class variables defined on the model.

__private_attributes__ special

Metadata about the private attributes of the model.

__pydantic_complete__ special

Whether model building is completed, or if there are still undefined fields.

__pydantic_custom_init__ special

Whether the model has a custom __init__ method.

__pydantic_decorators__ special

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__ special

Metadata for generic models; contains data used for a similar purpose to args, origin, parameters in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__ special

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__ special

The name of the post-init method for the model, if defined.

__signature__ special

The synthesized __init__ [Signature][inspect.Signature] of the model.

model_computed_fields

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

apply (self, img, emboss_matrix, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, emboss_matrix: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.convolve(img, emboss_matrix)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, np.ndarray]:
    alpha = random.uniform(*self.alpha)
    strength = random.uniform(*self.strength)
    emboss_matrix = self.__generate_emboss_matrix(alpha_sample=alpha, strength_sample=strength)
    return {"emboss_matrix": emboss_matrix}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("alpha", "strength")

class Equalize (mode='cv', by_channels=True, mask=None, mask_params=(), always_apply=None, p=0.5) [view source on GitHub]

Equalize the image histogram.

This transform applies histogram equalization to the input image. Histogram equalization is a method in image processing of contrast adjustment using the image's histogram.

Parameters:

Name Type Description
mode Literal['cv', 'pil']

Use OpenCV or Pillow equalization method. Default: 'cv'

by_channels bool

If True, use equalization by channels separately, else convert image to YCbCr representation and use equalization by Y channel. Default: True

mask np.ndarray, callable

If given, only the pixels selected by the mask are included in the analysis. Can be either a 1-channel or 3-channel numpy array of the same size as the input image, or a callable that generates a mask. The callable should accept 'image' as its first argument and can accept additional arguments specified in mask_params. Default: None

mask_params list[str]

Additional parameters to pass to the mask function. These parameters will be taken from the data dict passed to __call__. Default: ()

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Note

  • When mode='cv', OpenCV's equalizeHist() function is used.
  • When mode='pil', Pillow's equalize() function is used.
  • The 'by_channels' parameter determines whether equalization is applied to each color channel independently (True) or to the luminance channel only (False).
  • If a mask is provided as a numpy array, it should have the same height and width as the input image.
  • If a mask is provided as a function, it allows for dynamic mask generation based on the input image and additional parameters. This is useful for scenarios where the mask depends on the image content or external data (e.g., bounding boxes, segmentation masks).

Mask Function: When mask is a callable, it should have the following signature: mask_func(image, *args) -> np.ndarray

- image: The input image (numpy array)
- *args: Additional arguments as specified in mask_params

The function should return a numpy array of the same height and width as the input image,
where non-zero pixels indicate areas to be equalized.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
>>> # Using a static mask
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> transform = A.Equalize(mask=mask, p=1.0)
>>> result = transform(image=image)
>>>
>>> # Using a dynamic mask function
>>> def mask_func(image, bboxes):
...     mask = np.ones_like(image[:, :, 0], dtype=np.uint8)
...     for bbox in bboxes:
...         x1, y1, x2, y2 = map(int, bbox)
...         mask[y1:y2, x1:x2] = 0  # Exclude areas inside bounding boxes
...     return mask
>>>
>>> transform = A.Equalize(mask=mask_func, mask_params=['bboxes'], p=1.0)
>>> bboxes = [(10, 10, 50, 50), (60, 60, 90, 90)]  # Example bounding boxes
>>> result = transform(image=image, bboxes=bboxes)

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class Equalize(ImageOnlyTransform):
    """Equalize the image histogram.

    This transform applies histogram equalization to the input image. Histogram equalization
    is a method in image processing of contrast adjustment using the image's histogram.

    Args:
        mode (Literal['cv', 'pil']): Use OpenCV or Pillow equalization method.
            Default: 'cv'
        by_channels (bool): If True, use equalization by channels separately,
            else convert image to YCbCr representation and use equalization by `Y` channel.
            Default: True
        mask (np.ndarray, callable): If given, only the pixels selected by
            the mask are included in the analysis. Can be:
            - A 1-channel or 3-channel numpy array of the same size as the input image.
            - A callable (function) that generates a mask. The function should accept 'image'
              as its first argument, and can accept additional arguments specified in mask_params.
            Default: None
        mask_params (list[str]): Additional parameters to pass to the mask function.
            These parameters will be taken from the data dict passed to __call__.
            Default: ()
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        - When mode='cv', OpenCV's equalizeHist() function is used.
        - When mode='pil', Pillow's equalize() function is used.
        - The 'by_channels' parameter determines whether equalization is applied to each color channel
          independently (True) or to the luminance channel only (False).
        - If a mask is provided as a numpy array, it should have the same height and width as the input image.
        - If a mask is provided as a function, it allows for dynamic mask generation based on the input image
          and additional parameters. This is useful for scenarios where the mask depends on the image content
          or external data (e.g., bounding boxes, segmentation masks).

    Mask Function:
        When mask is a callable, it should have the following signature:
        mask_func(image, *args) -> np.ndarray

        - image: The input image (numpy array)
        - *args: Additional arguments as specified in mask_params

        The function should return a numpy array of the same height and width as the input image,
        where non-zero pixels indicate areas to be equalized.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>>
        >>> # Using a static mask
        >>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
        >>> transform = A.Equalize(mask=mask, p=1.0)
        >>> result = transform(image=image)
        >>>
        >>> # Using a dynamic mask function
        >>> def mask_func(image, bboxes):
        ...     mask = np.ones_like(image[:, :, 0], dtype=np.uint8)
        ...     for bbox in bboxes:
        ...         x1, y1, x2, y2 = map(int, bbox)
        ...         mask[y1:y2, x1:x2] = 0  # Exclude areas inside bounding boxes
        ...     return mask
        >>>
        >>> transform = A.Equalize(mask=mask_func, mask_params=['bboxes'], p=1.0)
        >>> bboxes = [(10, 10, 50, 50), (60, 60, 90, 90)]  # Example bounding boxes
        >>> result = transform(image=image, bboxes=bboxes)

    References:
        - OpenCV equalizeHist: https://docs.opencv.org/3.4/d6/dc7/group__imgproc__hist.html#ga7e54091f0c937d49bf84152a16f76d6e
        - Pillow ImageOps.equalize: https://pillow.readthedocs.io/en/stable/reference/ImageOps.html#PIL.ImageOps.equalize
        - Histogram Equalization: https://en.wikipedia.org/wiki/Histogram_equalization
    """

    class InitSchema(BaseTransformInitSchema):
        mode: ImageMode
        by_channels: bool
        mask: np.ndarray | Callable[..., Any] | None
        mask_params: Sequence[str]

    def __init__(
        self,
        mode: ImageMode = "cv",
        by_channels: bool = True,
        mask: np.ndarray | Callable[..., Any] | None = None,
        mask_params: Sequence[str] = (),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.mode = mode
        self.by_channels = by_channels
        self.mask = mask
        self.mask_params = mask_params

    def apply(self, img: np.ndarray, mask: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.equalize(img, mode=self.mode, by_channels=self.by_channels, mask=mask)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        if not callable(self.mask):
            return {"mask": self.mask}

        mask_params = {"image": data["image"]}
        for key in self.mask_params:
            if key not in data:
                raise KeyError(f"Required parameter '{key}' for mask function is missing in data.")
            mask_params[key] = data[key]

        return {"mask": self.mask(**mask_params)}

    @property
    def targets_as_params(self) -> list[str]:
        return ["image", *list(self.mask_params)]

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "mode", "by_channels", "mask", "mask_params"
targets_as_params: list[str] property readonly

Targets used to get params dependent on targets. This is used to check that the input has all required targets.

class InitSchema


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    mode: ImageMode
    by_channels: bool
    mask: np.ndarray | Callable[..., Any] | None
    mask_params: Sequence[str]
__class_vars__ special

The names of the class variables defined on the model.

__private_attributes__ special

Metadata about the private attributes of the model.

__pydantic_complete__ special

Whether model building is completed, or if there are still undefined fields.

__pydantic_custom_init__ special

Whether the model has a custom __init__ method.

__pydantic_decorators__ special

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__ special

Metadata for generic models; contains data used for a similar purpose to args, origin, parameters in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__ special

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__ special

The name of the post-init method for the model, if defined.

__signature__ special

The synthesized __init__ [Signature][inspect.Signature] of the model.

model_computed_fields

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

apply (self, img, mask, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, mask: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.equalize(img, mode=self.mode, by_channels=self.by_channels, mask=mask)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    if not callable(self.mask):
        return {"mask": self.mask}

    mask_params = {"image": data["image"]}
    for key in self.mask_params:
        if key not in data:
            raise KeyError(f"Required parameter '{key}' for mask function is missing in data.")
        mask_params[key] = data[key]

    return {"mask": self.mask(**mask_params)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "mode", "by_channels", "mask", "mask_params"

class FancyPCA (alpha=0.1, p=0.5, always_apply=None) [view source on GitHub]

Apply Fancy PCA augmentation to the input image.

This augmentation technique applies PCA (Principal Component Analysis) to the image's color channels, then adds multiples of the principal components to the image, with magnitudes proportional to the corresponding eigenvalues times a random variable drawn from a Gaussian with mean 0 and standard deviation 'alpha'.

Parameters:

Name Type Description
alpha float or tuple of float

Standard deviation of the Gaussian distribution used to generate random noise for each principal component. If a single float is provided, it will be used for all channels. If a tuple of two floats (min, max) is provided, the standard deviation will be uniformly sampled from this range for each run. Default: 0.1.

always_apply bool

If True, the transform will always be applied. Default: False.

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: any

Note

  • This augmentation is particularly effective for RGB images but can work with any number of channels.
  • For grayscale images, it applies a simplified version of the augmentation.
  • The transform preserves the mean of the image while adjusting the color/intensity variation.
  • This implementation is based on the paper by Krizhevsky et al. and is similar to the one used in the original AlexNet paper.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.FancyPCA(alpha=0.1, p=1.0)
>>> result = transform(image=image)
>>> augmented_image = result["image"]

References

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
  • https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class FancyPCA(ImageOnlyTransform):
    """Apply Fancy PCA augmentation to the input image.

    This augmentation technique applies PCA (Principal Component Analysis) to the image's color channels,
    then adds multiples of the principal components to the image, with magnitudes proportional to the
    corresponding eigenvalues times a random variable drawn from a Gaussian with mean 0 and standard
    deviation 'alpha'.

    Args:
        alpha (float or tuple of float): Standard deviation of the Gaussian distribution used to generate
            random noise for each principal component. If a single float is provided, it will be used for
            all channels. If a tuple of two floats (min, max) is provided, the standard deviation will be
            uniformly sampled from this range for each run. Default: 0.1.
        always_apply (bool): If True, the transform will always be applied. Default: False.
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        any

    Note:
        - This augmentation is particularly effective for RGB images but can work with any number of channels.
        - For grayscale images, it applies a simplified version of the augmentation.
        - The transform preserves the mean of the image while adjusting the color/intensity variation.
        - This implementation is based on the paper by Krizhevsky et al. and is similar to the one used
          in the original AlexNet paper.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.FancyPCA(alpha=0.1, p=1.0)
        >>> result = transform(image=image)
        >>> augmented_image = result["image"]

    References:
        - Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep
          convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).
        - https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

    """

    class InitSchema(BaseTransformInitSchema):
        alpha: float = Field(ge=0)

    def __init__(self, alpha: float = 0.1, p: float = 0.5, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.alpha = alpha

    def apply(self, img: np.ndarray, alpha_vector: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.fancy_pca(img, alpha_vector)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        shape = params["shape"]
        num_channels = shape[-1] if len(shape) == NUM_MULTI_CHANNEL_DIMENSIONS else 1
        alpha_vector = random_utils.normal(0, self.alpha, num_channels).astype(np.float32)
        return {"alpha_vector": alpha_vector}

    def get_transform_init_args_names(self) -> tuple[str]:
        return ("alpha",)
class InitSchema


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    alpha: float = Field(ge=0)
__class_vars__ special

The names of the class variables defined on the model.

__private_attributes__ special

Metadata about the private attributes of the model.

__pydantic_complete__ special

Whether model building is completed, or if there are still undefined fields.

__pydantic_custom_init__ special

Whether the model has a custom __init__ method.

__pydantic_decorators__ special

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__ special

Metadata for generic models; contains data used for a similar purpose to args, origin, parameters in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__ special

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__ special

The name of the post-init method for the model, if defined.

__signature__ special

The synthesized __init__ [Signature][inspect.Signature] of the model.

model_computed_fields

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

apply (self, img, alpha_vector, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, alpha_vector: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.fancy_pca(img, alpha_vector)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    shape = params["shape"]
    num_channels = shape[-1] if len(shape) == NUM_MULTI_CHANNEL_DIMENSIONS else 1
    alpha_vector = random_utils.normal(0, self.alpha, num_channels).astype(np.float32)
    return {"alpha_vector": alpha_vector}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str]:
    return ("alpha",)

class FromFloat (dtype='uint8', max_value=None, always_apply=None, p=1.0) [view source on GitHub]

Convert an image from floating point representation to the specified data type.

This transform is designed to convert images from a normalized floating-point representation (typically with values in the range [0, 1]) to other data types, scaling the values appropriately.

Parameters:

Name Type Description
dtype str

The desired output data type. Supported types include 'uint8', 'uint16', 'uint32'. Default: 'uint8'.

max_value float | None

The maximum value for the output dtype. If None, the transform will attempt to infer the maximum value based on the dtype. Default: None.

p float

Probability of applying the transform. Default: 1.0.

Targets

image

Image types: float32, float64

Note

  • This is the inverse transform for ToFloat.
  • Input images are expected to be in floating point format with values in the range [0, 1].
  • For integer output types (uint8, uint16, uint32), the function will scale the values to the appropriate range (e.g., 0-255 for uint8).
  • For float output types (float32, float64), the values will remain in the [0, 1] range.
  • The transform uses the from_float function internally, which ensures output values are within the valid range for the specified dtype.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> transform = A.FromFloat(dtype='uint8', max_value=None, p=1.0)
>>> image = np.random.rand(100, 100, 3).astype(np.float32)  # Float image in [0, 1] range
>>> result = transform(image=image)
>>> uint8_image = result['image']
>>> assert uint8_image.dtype == np.uint8
>>> assert uint8_image.min() >= 0 and uint8_image.max() <= 255

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class FromFloat(ImageOnlyTransform):
    """Convert an image from floating point representation to the specified data type.

    This transform is designed to convert images from a normalized floating-point representation
    (typically with values in the range [0, 1]) to other data types, scaling the values appropriately.

    Args:
        dtype (str): The desired output data type. Supported types include 'uint8', 'uint16',
                     'uint32'. Default: 'uint8'.
        max_value (float | None): The maximum value for the output dtype. If None, the transform
                                  will attempt to infer the maximum value based on the dtype.
                                  Default: None.
        p (float): Probability of applying the transform. Default: 1.0.

    Targets:
        image

    Image types:
        float32, float64

    Note:
        - This is the inverse transform for ToFloat.
        - Input images are expected to be in floating point format with values in the range [0, 1].
        - For integer output types (uint8, uint16, uint32), the function will scale the values
          to the appropriate range (e.g., 0-255 for uint8).
        - For float output types (float32, float64), the values will remain in the [0, 1] range.
        - The transform uses the `from_float` function internally, which ensures output values
          are within the valid range for the specified dtype.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> transform = A.FromFloat(dtype='uint8', max_value=None, p=1.0)
        >>> image = np.random.rand(100, 100, 3).astype(np.float32)  # Float image in [0, 1] range
        >>> result = transform(image=image)
        >>> uint8_image = result['image']
        >>> assert uint8_image.dtype == np.uint8
        >>> assert uint8_image.min() >= 0 and uint8_image.max() <= 255

    """

    class InitSchema(BaseTransformInitSchema):
        dtype: Literal["uint8", "uint16", "float32", "float64"]
        max_value: float | None
        p: ProbabilityType = 1

    def __init__(
        self,
        dtype: Literal["uint8", "uint16", "float32", "float64"] = "uint8",
        max_value: float | None = None,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.dtype = np.dtype(dtype)
        self.max_value = max_value

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return from_float(img, self.dtype, self.max_value)

    def get_transform_init_args(self) -> dict[str, Any]:
        return {"dtype": self.dtype.name, "max_value": self.max_value}
class InitSchema


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    dtype: Literal["uint8", "uint16", "float32", "float64"]
    max_value: float | None
    p: ProbabilityType = 1

apply (self, img, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return from_float(img, self.dtype, self.max_value)

class GaussNoise (var_limit=(10.0, 50.0), mean=0, per_channel=True, noise_scale_factor=1, always_apply=None, p=0.5) [view source on GitHub]

Apply Gaussian noise to the input image.

Parameters:

Name Type Description
var_limit tuple[float, float] | float

Variance range for noise. If var_limit is a single float value, the range will be (0, var_limit). Default: (10.0, 50.0).

mean float

Mean of the noise. Default: 0.

per_channel bool

If True, noise will be sampled for each channel independently. Otherwise, the noise will be sampled once for all channels. Default: True.

noise_scale_factor float

Scaling factor for noise generation. Value should be in the range (0, 1]. When set to 1, noise is sampled for each pixel independently. If less, noise is sampled for a smaller size and resized to fit the shape of the image. Smaller values make the transform faster. Default: 1.0.

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: Any

Returns:

Type Description
numpy.ndarray

Image with applied Gaussian noise.

Note

  • The noise is generated in the same range as the input image.
  • For uint8 input images, the noise is generated in the range [0, 255].
  • For float32 input images, the noise is generated in the range [0, 1].
  • The resulting image is clipped to keep its values in the input range.
  • Setting per_channel=False is faster but applies the same noise to all channels.
  • The noise_scale_factor parameter allows for a trade-off between transform speed and noise granularity.
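
As a rough illustration of the trade-off behind noise_scale_factor < 1, the sketch below samples noise on a smaller grid and resizes it to the image shape. It mirrors the idea of the internal approximation but is not the library's implementation:

Python
>>> import numpy as np
>>> import cv2
>>> def approx_noise(shape, mean, sigma, scale=0.5):
...     # Sample noise on a downscaled grid, then resize to the full image shape.
...     small = (max(1, int(shape[0] * scale)), max(1, int(shape[1] * scale)))
...     noise = np.random.normal(mean, sigma, small).astype(np.float32)
...     return cv2.resize(noise, (shape[1], shape[0]), interpolation=cv2.INTER_LINEAR)
>>> noise = approx_noise((224, 224), mean=0, sigma=5.0, scale=0.5)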

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
>>>
>>> # Apply Gaussian noise with default parameters
>>> transform = A.GaussNoise(p=1.0)
>>> noisy_image = transform(image=image)['image']
>>>
>>> # Apply Gaussian noise with custom variance range and mean
>>> transform = A.GaussNoise(var_limit=(50.0, 100.0), mean=10, p=1.0)
>>> noisy_image = transform(image=image)['image']
>>>
>>> # Apply the same noise to all channels
>>> transform = A.GaussNoise(per_channel=False, p=1.0)
>>> noisy_image = transform(image=image)['image']
>>>
>>> # Apply noise with reduced granularity for faster processing
>>> transform = A.GaussNoise(noise_scale_factor=0.5, p=1.0)
>>> noisy_image = transform(image=image)['image']


Source code in albumentations/augmentations/transforms.py
Python
class GaussNoise(ImageOnlyTransform):
    """Apply Gaussian noise to the input image.

    Args:
        var_limit (tuple[float, float] | float): Variance range for noise. If var_limit is a single float value,
            the range will be (0, var_limit). Default: (10.0, 50.0).
        mean (float): Mean of the noise. Default: 0.
        per_channel (bool): If True, noise will be sampled for each channel independently.
            Otherwise, the noise will be sampled once for all channels. Default: True.
        noise_scale_factor (float): Scaling factor for noise generation. Value should be in the range (0, 1].
            When set to 1, noise is sampled for each pixel independently. If less, noise is sampled for a smaller size
            and resized to fit the shape of the image. Smaller values make the transform faster. Default: 1.0.
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Returns:
        numpy.ndarray: Image with applied Gaussian noise.

    Note:
        - The noise is generated in the same range as the input image.
        - For uint8 input images, the noise is generated in the range [0, 255].
        - For float32 input images, the noise is generated in the range [0, 1].
        - The resulting image is clipped to keep its values in the input range.
        - Setting per_channel=False is faster but applies the same noise to all channels.
        - The noise_scale_factor parameter allows for a trade-off between transform speed and noise granularity.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
        >>>
        >>> # Apply Gaussian noise with default parameters
        >>> transform = A.GaussNoise(p=1.0)
        >>> noisy_image = transform(image=image)['image']
        >>>
        >>> # Apply Gaussian noise with custom variance range and mean
        >>> transform = A.GaussNoise(var_limit=(50.0, 100.0), mean=10, p=1.0)
        >>> noisy_image = transform(image=image)['image']
        >>>
        >>> # Apply the same noise to all channels
        >>> transform = A.GaussNoise(per_channel=False, p=1.0)
        >>> noisy_image = transform(image=image)['image']
        >>>
        >>> # Apply noise with reduced granularity for faster processing
        >>> transform = A.GaussNoise(noise_scale_factor=0.5, p=1.0)
        >>> noisy_image = transform(image=image)['image']

    """

    class InitSchema(BaseTransformInitSchema):
        var_limit: NonNegativeFloatRangeType
        mean: float
        per_channel: bool
        noise_scale_factor: float = Field(gt=0, le=1)

    def __init__(
        self,
        var_limit: ScaleFloatType = (10.0, 50.0),
        mean: float = 0,
        per_channel: bool = True,
        noise_scale_factor: float = 1,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.var_limit = cast(Tuple[float, float], var_limit)
        self.mean = mean
        self.per_channel = per_channel
        self.noise_scale_factor = noise_scale_factor

    def apply(self, img: np.ndarray, gauss: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.add_noise(img, gauss)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, float]:
        image = data["image"] if "image" in data else data["images"][0]
        var = random.uniform(*self.var_limit)
        sigma = math.sqrt(var)

        if self.per_channel:
            target_shape = image.shape
            if self.noise_scale_factor == 1:
                gauss = random_utils.normal(self.mean, sigma, target_shape)
            else:
                gauss = fmain.generate_approx_gaussian_noise(target_shape, self.mean, sigma, self.noise_scale_factor)
        else:
            target_shape = image.shape[:2]
            if self.noise_scale_factor == 1:
                gauss = random_utils.normal(self.mean, sigma, target_shape)
            else:
                gauss = fmain.generate_approx_gaussian_noise(target_shape, self.mean, sigma, self.noise_scale_factor)

            if image.ndim > MONO_CHANNEL_DIMENSIONS:
                gauss = np.expand_dims(gauss, -1)

        return {"gauss": gauss}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "var_limit", "per_channel", "mean", "noise_scale_factor"
class InitSchema


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    var_limit: NonNegativeFloatRangeType
    mean: float
    per_channel: bool
    noise_scale_factor: float = Field(gt=0, le=1)

apply (self, img, gauss, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, gauss: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.add_noise(img, gauss)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, float]:
    image = data["image"] if "image" in data else data["images"][0]
    var = random.uniform(*self.var_limit)
    sigma = math.sqrt(var)

    if self.per_channel:
        target_shape = image.shape
        if self.noise_scale_factor == 1:
            gauss = random_utils.normal(self.mean, sigma, target_shape)
        else:
            gauss = fmain.generate_approx_gaussian_noise(target_shape, self.mean, sigma, self.noise_scale_factor)
    else:
        target_shape = image.shape[:2]
        if self.noise_scale_factor == 1:
            gauss = random_utils.normal(self.mean, sigma, target_shape)
        else:
            gauss = fmain.generate_approx_gaussian_noise(target_shape, self.mean, sigma, self.noise_scale_factor)

        if image.ndim > MONO_CHANNEL_DIMENSIONS:
            gauss = np.expand_dims(gauss, -1)

    return {"gauss": gauss}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "var_limit", "per_channel", "mean", "noise_scale_factor"

class HueSaturationValue (hue_shift_limit=(-20, 20), sat_shift_limit=(-30, 30), val_shift_limit=(-20, 20), always_apply=None, p=0.5) [view source on GitHub]

Randomly change hue, saturation and value of the input image.

This transform adjusts the HSV (Hue, Saturation, Value) channels of an input RGB image. It allows for independent control over each channel, providing a wide range of color and brightness modifications.

Parameters:

Name Type Description
hue_shift_limit float | tuple[float, float]

Range for changing hue. If a single float value is provided, the range will be (-hue_shift_limit, hue_shift_limit). Values should be in the range [-180, 180]. Default: (-20, 20).

sat_shift_limit float | tuple[float, float]

Range for changing saturation. If a single float value is provided, the range will be (-sat_shift_limit, sat_shift_limit). Values should be in the range [-255, 255]. Default: (-30, 30).

val_shift_limit float | tuple[float, float]

Range for changing value (brightness). If a single float value is provided, the range will be (-val_shift_limit, val_shift_limit). Values should be in the range [-255, 255]. Default: (-20, 20).

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: 3

Note

  • The transform first converts the input RGB image to the HSV color space.
  • Each channel (Hue, Saturation, Value) is adjusted independently.
  • Hue is circular, so it wraps around at 180 degrees.
  • For float32 images, the shift values are applied as percentages of the full range.
  • This transform is particularly useful for color augmentation and simulating different lighting conditions.
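
To make the per-channel behaviour concrete, here is an illustrative OpenCV sketch of a fixed hue/saturation/value shift on a uint8 RGB image. The transform performs the equivalent work internally, with random shifts and dtype handling:

Python
>>> import cv2
>>> import numpy as np
>>> img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
>>> h, s, v = cv2.split(hsv)
>>> h = ((h.astype(np.int16) + 10) % 180).astype(np.uint8)          # hue wraps around at 180
>>> s = np.clip(s.astype(np.int16) + 20, 0, 255).astype(np.uint8)   # saturation is clipped
>>> v = np.clip(v.astype(np.int16) + 15, 0, 255).astype(np.uint8)   # value is clipped
>>> shifted = cv2.cvtColor(cv2.merge([h, s, v]), cv2.COLOR_HSV2RGB)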

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.HueSaturationValue(
...     hue_shift_limit=20,
...     sat_shift_limit=30,
...     val_shift_limit=20,
...     p=0.7
... )
>>> result = transform(image=image)
>>> augmented_image = result["image"]

References

  • HSV color space: https://en.wikipedia.org/wiki/HSL_and_HSV


Source code in albumentations/augmentations/transforms.py
Python
class HueSaturationValue(ImageOnlyTransform):
    """Randomly change hue, saturation and value of the input image.

    This transform adjusts the HSV (Hue, Saturation, Value) channels of an input RGB image.
    It allows for independent control over each channel, providing a wide range of color
    and brightness modifications.

    Args:
        hue_shift_limit (float | tuple[float, float]): Range for changing hue.
            If a single float value is provided, the range will be (-hue_shift_limit, hue_shift_limit).
            Values should be in the range [-180, 180]. Default: (-20, 20).

        sat_shift_limit (float | tuple[float, float]): Range for changing saturation.
            If a single float value is provided, the range will be (-sat_shift_limit, sat_shift_limit).
            Values should be in the range [-255, 255]. Default: (-30, 30).

        val_shift_limit (float | tuple[float, float]): Range for changing value (brightness).
            If a single float value is provided, the range will be (-val_shift_limit, val_shift_limit).
            Values should be in the range [-255, 255]. Default: (-20, 20).

        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        3

    Note:
        - The transform first converts the input RGB image to the HSV color space.
        - Each channel (Hue, Saturation, Value) is adjusted independently.
        - Hue is circular, so it wraps around at 180 degrees.
        - For float32 images, the shift values are applied as percentages of the full range.
        - This transform is particularly useful for color augmentation and simulating
          different lighting conditions.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.HueSaturationValue(
        ...     hue_shift_limit=20,
        ...     sat_shift_limit=30,
        ...     val_shift_limit=20,
        ...     p=0.7
        ... )
        >>> result = transform(image=image)
        >>> augmented_image = result["image"]

    References:
        - HSV color space: https://en.wikipedia.org/wiki/HSL_and_HSV
    """

    class InitSchema(BaseTransformInitSchema):
        hue_shift_limit: SymmetricRangeType
        sat_shift_limit: SymmetricRangeType
        val_shift_limit: SymmetricRangeType

    def __init__(
        self,
        hue_shift_limit: ScaleFloatType = (-20, 20),
        sat_shift_limit: ScaleFloatType = (-30, 30),
        val_shift_limit: ScaleFloatType = (-20, 20),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.hue_shift_limit = cast(Tuple[float, float], hue_shift_limit)
        self.sat_shift_limit = cast(Tuple[float, float], sat_shift_limit)
        self.val_shift_limit = cast(Tuple[float, float], val_shift_limit)

    def apply(
        self,
        img: np.ndarray,
        hue_shift: int,
        sat_shift: int,
        val_shift: int,
        **params: Any,
    ) -> np.ndarray:
        if not is_rgb_image(img) and not is_grayscale_image(img):
            msg = "HueSaturationValue transformation expects 1-channel or 3-channel images."
            raise TypeError(msg)
        return fmain.shift_hsv(img, hue_shift, sat_shift, val_shift)

    def get_params(self) -> dict[str, float]:
        return {
            "hue_shift": random.uniform(*self.hue_shift_limit),
            "sat_shift": random.uniform(*self.sat_shift_limit),
            "val_shift": random.uniform(*self.val_shift_limit),
        }

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "hue_shift_limit", "sat_shift_limit", "val_shift_limit"
class InitSchema


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    hue_shift_limit: SymmetricRangeType
    sat_shift_limit: SymmetricRangeType
    val_shift_limit: SymmetricRangeType

apply (self, img, hue_shift, sat_shift, val_shift, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    hue_shift: int,
    sat_shift: int,
    val_shift: int,
    **params: Any,
) -> np.ndarray:
    if not is_rgb_image(img) and not is_grayscale_image(img):
        msg = "HueSaturationValue transformation expects 1-channel or 3-channel images."
        raise TypeError(msg)
    return fmain.shift_hsv(img, hue_shift, sat_shift, val_shift)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, float]:
    return {
        "hue_shift": random.uniform(*self.hue_shift_limit),
        "sat_shift": random.uniform(*self.sat_shift_limit),
        "val_shift": random.uniform(*self.val_shift_limit),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "hue_shift_limit", "sat_shift_limit", "val_shift_limit"

class ISONoise (color_shift=(0.01, 0.05), intensity=(0.1, 0.5), always_apply=None, p=0.5) [view source on GitHub]

Applies camera sensor noise to the input image, simulating high ISO settings.

This transform adds random noise to an image, mimicking the effect of using high ISO settings in digital photography. It simulates two main components of ISO noise:

  1. Color noise: random shifts in color hue
  2. Luminance noise: random variations in pixel intensity

Parameters:

Name Type Description
color_shift tuple[float, float]

Range for changing color hue. Values should be in the range [0, 1], where 1 represents a full 360° hue rotation. Default: (0.01, 0.05)

intensity tuple[float, float]

Range for the noise intensity. Higher values increase the strength of both color and luminance noise. Default: (0.1, 0.5)

p float

Probability of applying the transform. Default: 0.5

Targets

image

Image types: uint8, float32

Number of channels: 3

Note

  • This transform only works with RGB images. It will raise a TypeError if applied to non-RGB images.
  • The color shift is applied in the HSV color space, affecting the hue channel.
  • Luminance noise is added to all channels independently.
  • This transform can be useful for data augmentation in low-light scenarios or when training models to be robust against noisy inputs.
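
For low-light augmentation it is common to pair sensor-style noise with a brightness drop; a possible pipeline sketch (parameter values are illustrative, not tuned) is:

Python
>>> import albumentations as A
>>> low_light = A.Compose([
...     A.RandomBrightnessContrast(brightness_limit=(-0.3, -0.1), contrast_limit=0.1, p=1.0),
...     A.ISONoise(color_shift=(0.01, 0.05), intensity=(0.2, 0.5), p=1.0),
... ])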

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ISONoise(color_shift=(0.01, 0.05), intensity=(0.1, 0.5), p=0.5)
>>> result = transform(image=image)
>>> noisy_image = result["image"]

References

  • ISO noise in digital photography: https://en.wikipedia.org/wiki/Image_noise#In_digital_cameras


Source code in albumentations/augmentations/transforms.py
Python
class ISONoise(ImageOnlyTransform):
    """Applies camera sensor noise to the input image, simulating high ISO settings.

    This transform adds random noise to an image, mimicking the effect of using high ISO settings
    in digital photography. It simulates two main components of ISO noise:
    1. Color noise: random shifts in color hue
    2. Luminance noise: random variations in pixel intensity

    Args:
        color_shift (tuple[float, float]): Range for changing color hue.
            Values should be in the range [0, 1], where 1 represents a full 360° hue rotation.
            Default: (0.01, 0.05)

        intensity (tuple[float, float]): Range for the noise intensity.
            Higher values increase the strength of both color and luminance noise.
            Default: (0.1, 0.5)

        p (float): Probability of applying the transform. Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        3

    Note:
        - This transform only works with RGB images. It will raise a TypeError if applied to
          non-RGB images.
        - The color shift is applied in the HSV color space, affecting the hue channel.
        - Luminance noise is added to all channels independently.
        - This transform can be useful for data augmentation in low-light scenarios or when
          training models to be robust against noisy inputs.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.ISONoise(color_shift=(0.01, 0.05), intensity=(0.1, 0.5), p=0.5)
        >>> result = transform(image=image)
        >>> noisy_image = result["image"]

    References:
        - ISO noise in digital photography:
          https://en.wikipedia.org/wiki/Image_noise#In_digital_cameras
    """

    class InitSchema(BaseTransformInitSchema):
        color_shift: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)]
        intensity: Annotated[tuple[float, float], AfterValidator(check_0plus), AfterValidator(nondecreasing)]

    def __init__(
        self,
        color_shift: tuple[float, float] = (0.01, 0.05),
        intensity: tuple[float, float] = (0.1, 0.5),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.intensity = intensity
        self.color_shift = color_shift

    def apply(
        self,
        img: np.ndarray,
        color_shift: float,
        intensity: float,
        random_seed: int,
        **params: Any,
    ) -> np.ndarray:
        non_rgb_error(img)
        return fmain.iso_noise(img, color_shift, intensity, np.random.RandomState(random_seed))

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        return {
            "color_shift": random.uniform(*self.color_shift),
            "intensity": random.uniform(*self.intensity),
            "random_seed": random_utils.get_random_seed(),
        }

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return "intensity", "color_shift"
class InitSchema


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    color_shift: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)]
    intensity: Annotated[tuple[float, float], AfterValidator(check_0plus), AfterValidator(nondecreasing)]

apply (self, img, color_shift, intensity, random_seed, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    color_shift: float,
    intensity: float,
    random_seed: int,
    **params: Any,
) -> np.ndarray:
    non_rgb_error(img)
    return fmain.iso_noise(img, color_shift, intensity, np.random.RandomState(random_seed))
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    return {
        "color_shift": random.uniform(*self.color_shift),
        "intensity": random.uniform(*self.intensity),
        "random_seed": random_utils.get_random_seed(),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str]:
    return "intensity", "color_shift"

class ImageCompression (quality_lower=None, quality_upper=None, compression_type='jpeg', quality_range=(99, 100), always_apply=None, p=0.5) [view source on GitHub]

Decrease image quality by applying JPEG or WebP compression.

This transform simulates the effect of saving an image with lower quality settings, which can introduce compression artifacts. It's useful for data augmentation and for testing model robustness against varying image qualities.

Parameters:

Name Type Description
quality_range tuple[int, int]

Range for the compression quality. Values should be in the [1, 100] range, where 1 is the lowest quality (maximum compression) and 100 is the highest quality (minimum compression). Default: (99, 100)

compression_type Literal["jpeg", "webp"]

Type of compression to apply: "jpeg" for JPEG compression or "webp" for WebP compression. Default: "jpeg"

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • This transform expects images with 1, 3, or 4 channels.
  • For JPEG compression, alpha channels (4th channel) will be ignored.
  • WebP compression supports transparency (4 channels).
  • The actual file is not saved to disk; the compression is simulated in memory.
  • Lower quality values result in smaller file sizes but may introduce visible artifacts.
  • This transform can be useful for:
  • Data augmentation to improve model robustness
  • Testing how models perform on images of varying quality
  • Simulating images transmitted over low-bandwidth connections

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ImageCompression(quality_range=(50, 90), compression_type="jpeg", p=1.0)
>>> result = transform(image=image)
>>> compressed_image = result["image"]
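
A WebP variant follows the same pattern; the sketch below uses a lower quality_range so the compression artifacts are easier to see:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ImageCompression(quality_range=(30, 60), compression_type="webp", p=1.0)
>>> webp_image = transform(image=image)["image"]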

References

  • JPEG compression: https://en.wikipedia.org/wiki/JPEG
  • WebP compression: https://developers.google.com/speed/webp


Source code in albumentations/augmentations/transforms.py
Python
class ImageCompression(ImageOnlyTransform):
    """Decrease image quality by applying JPEG or WebP compression.

    This transform simulates the effect of saving an image with lower quality settings,
    which can introduce compression artifacts. It's useful for data augmentation and
    for testing model robustness against varying image qualities.

    Args:
        quality_range (tuple[int, int]): Range for the compression quality.
            The values should be in [1, 100] range, where:
            - 1 is the lowest quality (maximum compression)
            - 100 is the highest quality (minimum compression)
            Default: (99, 100)

        compression_type (Literal["jpeg", "webp"]): Type of compression to apply.
            - "jpeg": JPEG compression
            - "webp": WebP compression
            Default: "jpeg"

        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - This transform expects images with 1, 3, or 4 channels.
        - For JPEG compression, alpha channels (4th channel) will be ignored.
        - WebP compression supports transparency (4 channels).
        - The actual file is not saved to disk; the compression is simulated in memory.
        - Lower quality values result in smaller file sizes but may introduce visible artifacts.
        - This transform can be useful for:
          * Data augmentation to improve model robustness
          * Testing how models perform on images of varying quality
          * Simulating images transmitted over low-bandwidth connections

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.ImageCompression(quality_range=(50, 90), compression_type="jpeg", p=1.0)
        >>> result = transform(image=image)
        >>> compressed_image = result["image"]

    References:
        - JPEG compression: https://en.wikipedia.org/wiki/JPEG
        - WebP compression: https://developers.google.com/speed/webp
    """

    class InitSchema(BaseTransformInitSchema):
        quality_range: Annotated[tuple[int, int], AfterValidator(check_1plus), AfterValidator(nondecreasing)]

        quality_lower: int | None = Field(
            ge=1,
            le=100,
        )
        quality_upper: int | None = Field(
            ge=1,
            le=100,
        )
        compression_type: Literal["jpeg", "webp"]

        @model_validator(mode="after")
        def validate_ranges(self) -> Self:
            # Update the quality_range based on the non-None values of quality_lower and quality_upper
            if self.quality_lower is not None or self.quality_upper is not None:
                if self.quality_lower is not None:
                    warn(
                        "`quality_lower` is deprecated. Use `quality_range` as tuple"
                        " (quality_lower, quality_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                if self.quality_upper is not None:
                    warn(
                        "`quality_upper` is deprecated. Use `quality_range` as tuple"
                        " (quality_lower, quality_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                lower = self.quality_lower if self.quality_lower is not None else self.quality_range[0]
                upper = self.quality_upper if self.quality_upper is not None else self.quality_range[1]
                self.quality_range = (lower, upper)
                # Clear the deprecated individual quality settings
                self.quality_lower = None
                self.quality_upper = None

            # Validate the quality_range
            if not (1 <= self.quality_range[0] <= MAX_JPEG_QUALITY and 1 <= self.quality_range[1] <= MAX_JPEG_QUALITY):
                raise ValueError(f"Quality range values should be within [1, {MAX_JPEG_QUALITY}] range.")

            return self

    def __init__(
        self,
        quality_lower: int | None = None,
        quality_upper: int | None = None,
        compression_type: Literal["jpeg", "webp"] = "jpeg",
        quality_range: tuple[int, int] = (99, 100),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.quality_range = quality_range
        self.compression_type = compression_type

    def apply(self, img: np.ndarray, quality: int, image_type: Literal[".jpg", ".webp"], **params: Any) -> np.ndarray:
        return fmain.image_compression(img, quality, image_type)

    def get_params(self) -> dict[str, int | str]:
        if self.compression_type == "jpeg":
            image_type = ".jpg"
        elif self.compression_type == "webp":
            image_type = ".webp"
        else:
            raise ValueError(f"Unknown image compression type: {self.compression_type}")

        return {
            "quality": random.randint(*self.quality_range),
            "image_type": image_type,
        }

    def get_transform_init_args(self) -> dict[str, Any]:
        return {
            "quality_range": self.quality_range,
            "compression_type": self.compression_type,
        }
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    quality_range: Annotated[tuple[int, int], AfterValidator(check_1plus), AfterValidator(nondecreasing)]

    quality_lower: int | None = Field(
        ge=1,
        le=100,
    )
    quality_upper: int | None = Field(
        ge=1,
        le=100,
    )
    compression_type: Literal["jpeg", "webp"]

    @model_validator(mode="after")
    def validate_ranges(self) -> Self:
        # Update the quality_range based on the non-None values of quality_lower and quality_upper
        if self.quality_lower is not None or self.quality_upper is not None:
            if self.quality_lower is not None:
                warn(
                    "`quality_lower` is deprecated. Use `quality_range` as tuple"
                    " (quality_lower, quality_upper) instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
            if self.quality_upper is not None:
                warn(
                    "`quality_upper` is deprecated. Use `quality_range` as tuple"
                    " (quality_lower, quality_upper) instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
            lower = self.quality_lower if self.quality_lower is not None else self.quality_range[0]
            upper = self.quality_upper if self.quality_upper is not None else self.quality_range[1]
            self.quality_range = (lower, upper)
            # Clear the deprecated individual quality settings
            self.quality_lower = None
            self.quality_upper = None

        # Validate the quality_range
        if not (1 <= self.quality_range[0] <= MAX_JPEG_QUALITY and 1 <= self.quality_range[1] <= MAX_JPEG_QUALITY):
            raise ValueError(f"Quality range values should be within [1, {MAX_JPEG_QUALITY}] range.")

        return self

apply (self, img, quality, image_type, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, quality: int, image_type: Literal[".jpg", ".webp"], **params: Any) -> np.ndarray:
    return fmain.image_compression(img, quality, image_type)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, int | str]:
    if self.compression_type == "jpeg":
        image_type = ".jpg"
    elif self.compression_type == "webp":
        image_type = ".webp"
    else:
        raise ValueError(f"Unknown image compression type: {self.compression_type}")

    return {
        "quality": random.randint(*self.quality_range),
        "image_type": image_type,
    }

class InterpolationPydantic


Source code in albumentations/augmentations/transforms.py
Python
class InterpolationPydantic(BaseModel):
    upscale: InterpolationType
    downscale: InterpolationType

class InvertImg [view source on GitHub]

Invert the input image by subtracting pixel values from the maximum value of the image dtype, i.e., 255 for uint8 and 1.0 for float32.

Parameters:

Name Type Description
p

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: Any
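
A minimal usage sketch, assuming the standard 255 - x behaviour for uint8 inputs:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (50, 50, 3), dtype=np.uint8)
>>> transform = A.InvertImg(p=1.0)
>>> inverted = transform(image=image)["image"]
>>> assert np.array_equal(inverted, 255 - image)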


Source code in albumentations/augmentations/transforms.py
Python
class InvertImg(ImageOnlyTransform):
    """Invert the input image by subtracting pixel values from max values of the image types,
    i.e., 255 for uint8 and 1.0 for float32.

    Args:
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    """

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.invert(img)

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()
apply (self, img, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.invert(img)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[()]:
    return ()

class Lambda (image=None, mask=None, keypoints=None, bboxes=None, global_label=None, name=None, always_apply=None, p=1.0) [view source on GitHub]

A flexible transformation class for applying user-defined transformation functions per target. Each function signature must include **kwargs to accept optional arguments such as the interpolation method or image size.

Parameters:

Name Type Description
image Callable[..., Any] | None

Image transformation function.

mask Callable[..., Any] | None

Mask transformation function.

keypoints Callable[..., Any] | None

Keypoint transformation function.

bboxes Callable[..., Any] | None

BBox transformation function.

global_label Callable[..., Any] | None

Global label transformation function.

p float

probability of applying the transform. Default: 1.0.

Targets

image, mask, bboxes, keypoints, global_label

Image types: Any
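
Since the pipeline passes extra keyword arguments to every callback, each function must accept **kwargs. The example below is a hypothetical sketch: brighten_image and keep_mask are user-defined names, not part of the library.

Python
>>> import numpy as np
>>> import albumentations as A
>>> def brighten_image(image, **kwargs):
...     # Illustrative image callback: add a constant offset and clip to uint8 range.
...     return np.clip(image.astype(np.int16) + 16, 0, 255).astype(np.uint8)
>>> def keep_mask(mask, **kwargs):
...     # Illustrative mask callback: leave the mask unchanged.
...     return mask
>>> transform = A.Lambda(name="brighten", image=brighten_image, mask=keep_mask, p=1.0)
>>> image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
>>> mask = np.zeros((64, 64), dtype=np.uint8)
>>> result = transform(image=image, mask=mask)
>>> brightened, same_mask = result["image"], result["mask"]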


Source code in albumentations/augmentations/transforms.py
Python
class Lambda(NoOp):
    """A flexible transformation class for using user-defined transformation functions per targets.
    Function signature must include **kwargs to accept optional arguments like interpolation method, image size, etc:

    Args:
        image: Image transformation function.
        mask: Mask transformation function.
        keypoint: Keypoint transformation function.
        bbox: BBox transformation function.
        global_label: Global label transformation function.
        p: probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints, global_label

    Image types:
        Any

    """

    def __init__(
        self,
        image: Callable[..., Any] | None = None,
        mask: Callable[..., Any] | None = None,
        keypoints: Callable[..., Any] | None = None,
        bboxes: Callable[..., Any] | None = None,
        global_label: Callable[..., Any] | None = None,
        name: str | None = None,
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p, always_apply)

        self.name = name
        self.custom_apply_fns = {
            target_name: fmain.noop for target_name in ("image", "mask", "keypoints", "bboxes", "global_label")
        }
        for target_name, custom_apply_fn in {
            "image": image,
            "mask": mask,
            "keypoints": keypoints,
            "bboxes": bboxes,
            "global_label": global_label,
        }.items():
            if custom_apply_fn is not None:
                if isinstance(custom_apply_fn, LambdaType) and custom_apply_fn.__name__ == "<lambda>":
                    warnings.warn(
                        "Using lambda is incompatible with multiprocessing. "
                        "Consider using regular functions or partial().",
                        stacklevel=2,
                    )

                self.custom_apply_fns[target_name] = custom_apply_fn

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        fn = self.custom_apply_fns["image"]
        return fn(img, **params)

    def apply_to_mask(self, mask: np.ndarray, **params: Any) -> np.ndarray:
        fn = self.custom_apply_fns["mask"]
        return fn(mask, **params)

    def apply_to_bboxes(self, bboxes: np.ndarray, **params: Any) -> np.ndarray:
        is_ndarray = True

        if not isinstance(bboxes, np.ndarray):
            is_ndarray = False
            bboxes = np.array(bboxes, dtype=np.float32)

        fn = self.custom_apply_fns["bboxes"]
        result = fn(bboxes, **params)

        if not is_ndarray:
            return result.tolist()

        return result

    def apply_to_keypoints(self, keypoints: np.ndarray, **params: Any) -> np.ndarray:
        is_ndarray = True
        if not isinstance(keypoints, np.ndarray):
            is_ndarray = False
            keypoints = np.array(keypoints, dtype=np.float32)

        fn = self.custom_apply_fns["keypoints"]
        result = fn(keypoints, **params)

        if not is_ndarray:
            return result.tolist()

        return result

    def apply_to_global_label(self, label: np.ndarray, **params: Any) -> np.ndarray:
        fn = self.custom_apply_fns["global_label"]
        return fn(label, **params)

    @classmethod
    def is_serializable(cls) -> bool:
        return False

    def to_dict_private(self) -> dict[str, Any]:
        if self.name is None:
            msg = (
                "To make a Lambda transform serializable you should provide the `name` argument, "
                "e.g. `Lambda(name='my_transform', image=<some func>, ...)`."
            )
            raise ValueError(msg)
        return {"__class_fullname__": self.get_class_fullname(), "__name__": self.name}

    def __repr__(self) -> str:
        state = {"name": self.name}
        state.update(self.custom_apply_fns.items())  # type: ignore[arg-type]
        state.update(self.get_base_init_args())
        return f"{self.__class__.__name__}({format_args(state)})"
__init__ (self, image=None, mask=None, keypoints=None, bboxes=None, global_label=None, name=None, always_apply=None, p=1.0) special

Initialize self. See help(type(self)) for accurate signature.

Source code in albumentations/augmentations/transforms.py
Python
def __init__(
    self,
    image: Callable[..., Any] | None = None,
    mask: Callable[..., Any] | None = None,
    keypoints: Callable[..., Any] | None = None,
    bboxes: Callable[..., Any] | None = None,
    global_label: Callable[..., Any] | None = None,
    name: str | None = None,
    always_apply: bool | None = None,
    p: float = 1.0,
):
    super().__init__(p, always_apply)

    self.name = name
    self.custom_apply_fns = {
        target_name: fmain.noop for target_name in ("image", "mask", "keypoints", "bboxes", "global_label")
    }
    for target_name, custom_apply_fn in {
        "image": image,
        "mask": mask,
        "keypoints": keypoints,
        "bboxes": bboxes,
        "global_label": global_label,
    }.items():
        if custom_apply_fn is not None:
            if isinstance(custom_apply_fn, LambdaType) and custom_apply_fn.__name__ == "<lambda>":
                warnings.warn(
                    "Using lambda is incompatible with multiprocessing. "
                    "Consider using regular functions or partial().",
                    stacklevel=2,
                )

            self.custom_apply_fns[target_name] = custom_apply_fn
apply (self, img, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    fn = self.custom_apply_fns["image"]
    return fn(img, **params)

class Morphological (scale=(2, 3), operation='dilation', always_apply=None, p=0.5) [view source on GitHub]

Apply a morphological operation (dilation or erosion) to an image; particularly valuable for enhancing document scans.

Morphological operations modify the structure of the image. Dilation expands the white (foreground) regions in a binary or grayscale image, while erosion shrinks them. These operations are beneficial in document processing, for example:

  • Dilation helps in closing up gaps within text or making thin lines thicker, enhancing legibility for OCR (Optical Character Recognition).
  • Erosion can remove small white noise and detach connected objects, making the structure of larger objects more pronounced.

Parameters:

Name Type Description
scale int or tuple/list of int

Specifies the size of the structuring element (kernel) used for the operation. If an integer is provided, a square kernel of that size will be used. If a tuple or list is provided, it should contain two integers representing the minimum and maximum kernel sizes.

operation str

The morphological operation to apply. Options are 'dilation' or 'erosion'. Default is 'dilation'.

p float

The probability of applying this transformation. Default is 0.5.

Targets

image, mask

Image types: uint8, float32

Examples:

Python
>>> import albumentations as A
>>> transform = A.Compose([
>>>     A.Morphological(scale=(2, 3), operation='dilation', p=0.5)
>>> ])
>>> image = transform(image=image)["image"]
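
An erosion variant applied directly to an image and its mask (values here are illustrative):

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
>>> mask = (image > 128).astype(np.uint8)
>>> transform = A.Morphological(scale=(2, 4), operation='erosion', p=1.0)
>>> out = transform(image=image, mask=mask)
>>> eroded_image, eroded_mask = out["image"], out["mask"]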


Source code in albumentations/augmentations/transforms.py
Python
class Morphological(DualTransform):
    """Apply a morphological operation (dilation or erosion) to an image,
    with particular value for enhancing document scans.

    Morphological operations modify the structure of the image.
    Dilation expands the white (foreground) regions in a binary or grayscale image, while erosion shrinks them.
    These operations are beneficial in document processing, for example:
    - Dilation helps in closing up gaps within text or making thin lines thicker,
        enhancing legibility for OCR (Optical Character Recognition).
    - Erosion can remove small white noise and detach connected objects,
        making the structure of larger objects more pronounced.

    Args:
        scale (int or tuple/list of int): Specifies the size of the structuring element (kernel) used for the operation.
            - If an integer is provided, a square kernel of that size will be used.
            - If a tuple or list is provided, it should contain two integers representing the minimum
                and maximum sizes for the dilation kernel.
        operation (str, optional): The morphological operation to apply. Options are 'dilation' or 'erosion'.
            Default is 'dilation'.
        p (float, optional): The probability of applying this transformation. Default is 0.5.

    Targets:
        image, mask

    Image types:
        uint8, float32

    Reference:
        https://github.com/facebookresearch/nougat

    Example:
        >>> import albumentations as A
        >>> transform = A.Compose([
        >>>     A.Morphological(scale=(2, 3), operation='dilation', p=0.5)
        >>> ])
        >>> image = transform(image=image)["image"]
    """

    _targets = (Targets.IMAGE, Targets.MASK)

    class InitSchema(BaseTransformInitSchema):
        scale: OnePlusIntRangeType = (2, 3)
        operation: MorphologyMode = "dilation"

    def __init__(
        self,
        scale: ScaleIntType = (2, 3),
        operation: MorphologyMode = "dilation",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.scale = cast(Tuple[int, int], scale)
        self.operation = operation

    def apply(self, img: np.ndarray, kernel: tuple[int, int], **params: Any) -> np.ndarray:
        return fmain.morphology(img, kernel, self.operation)

    def apply_to_mask(self, mask: np.ndarray, kernel: tuple[int, int], **params: Any) -> np.ndarray:
        return fmain.morphology(mask, kernel, self.operation)

    def get_params(self) -> dict[str, float]:
        return {
            "kernel": cv2.getStructuringElement(cv2.MORPH_ELLIPSE, self.scale),
        }

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("scale", "operation")

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
        }
class InitSchema


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    scale: OnePlusIntRangeType = (2, 3)
    operation: MorphologyMode = "dilation"

apply (self, img, kernel, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, kernel: tuple[int, int], **params: Any) -> np.ndarray:
    return fmain.morphology(img, kernel, self.operation)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, float]:
    return {
        "kernel": cv2.getStructuringElement(cv2.MORPH_ELLIPSE, self.scale),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("scale", "operation")

class MultiplicativeNoise (multiplier=(0.9, 1.1), per_channel=False, elementwise=False, always_apply=None, p=0.5) [view source on GitHub]

Apply multiplicative noise to the input image.

This transform multiplies each pixel in the image by a random value or array of values, effectively creating a noise pattern that scales with the image intensity.

Parameters:

Name Type Description
multiplier tuple[float, float]

The range for the random multiplier. Defines the range from which the multiplier is sampled. Default: (0.9, 1.1)

per_channel bool

If True, use a different random multiplier for each channel. If False, use the same multiplier for all channels. Setting this to False is slightly faster. Default: False

elementwise bool

If True, generates a unique multiplier for each pixel. If False, generates a single multiplier (or one per channel if per_channel=True). Default: False

p float

Probability of applying the transform. Default: 0.5

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • When elementwise=False and per_channel=False, a single multiplier is applied to the entire image.
  • When elementwise=False and per_channel=True, each channel gets a different multiplier.
  • When elementwise=True and per_channel=False, each pixel gets the same multiplier across all channels.
  • When elementwise=True and per_channel=True, each pixel in each channel gets a unique multiplier.
  • Setting per_channel=False is slightly faster, especially for larger images.
  • This transform can be used to simulate various lighting conditions or to create noise that scales with image intensity.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.MultiplicativeNoise(multiplier=(0.9, 1.1), per_channel=True, p=1.0)
>>> result = transform(image=image)
>>> noisy_image = result["image"]
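
To obtain a per-pixel noise pattern rather than a single global multiplier, enable elementwise (and optionally per_channel). A minimal sketch:

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

# Every pixel in every channel receives its own multiplier sampled from (0.8, 1.2)
transform = A.MultiplicativeNoise(
    multiplier=(0.8, 1.2),
    per_channel=True,
    elementwise=True,
    p=1.0,
)
noisy_image = transform(image=image)["image"]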


Source code in albumentations/augmentations/transforms.py
Python
class MultiplicativeNoise(ImageOnlyTransform):
    """Apply multiplicative noise to the input image.

    This transform multiplies each pixel in the image by a random value or array of values,
    effectively creating a noise pattern that scales with the image intensity.

    Args:
        multiplier (tuple[float, float]): The range for the random multiplier.
            Defines the range from which the multiplier is sampled.
            Default: (0.9, 1.1)

        per_channel (bool): If True, use a different random multiplier for each channel.
            If False, use the same multiplier for all channels.
            Setting this to False is slightly faster.
            Default: False

        elementwise (bool): If True, generates a unique multiplier for each pixel.
            If False, generates a single multiplier (or one per channel if per_channel=True).
            Default: False

        p (float): Probability of applying the transform. Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - When elementwise=False and per_channel=False, a single multiplier is applied to the entire image.
        - When elementwise=False and per_channel=True, each channel gets a different multiplier.
        - When elementwise=True and per_channel=False, each pixel gets the same multiplier across all channels.
        - When elementwise=True and per_channel=True, each pixel in each channel gets a unique multiplier.
        - Setting per_channel=False is slightly faster, especially for larger images.
        - This transform can be used to simulate various lighting conditions or to create noise that
          scales with image intensity.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.MultiplicativeNoise(multiplier=(0.9, 1.1), per_channel=True, p=1.0)
        >>> result = transform(image=image)
        >>> noisy_image = result["image"]

    References:
        - Multiplicative noise: https://en.wikipedia.org/wiki/Multiplicative_noise
    """

    class InitSchema(BaseTransformInitSchema):
        multiplier: Annotated[tuple[float, float], AfterValidator(check_0plus), AfterValidator(nondecreasing)]
        per_channel: bool
        elementwise: bool

    def __init__(
        self,
        multiplier: ScaleFloatType = (0.9, 1.1),
        per_channel: bool = False,
        elementwise: bool = False,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.multiplier = cast(Tuple[float, float], multiplier)
        self.elementwise = elementwise
        self.per_channel = per_channel

    def apply(
        self,
        img: np.ndarray,
        multiplier: float | np.ndarray,
        **kwargs: Any,
    ) -> np.ndarray:
        return multiply(img, multiplier)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        img = data["image"] if "image" in data else data["images"][0]
        num_channels = get_num_channels(img)

        if self.elementwise:
            shape = img.shape if self.per_channel else (*img.shape[:2], 1)
        else:
            shape = (num_channels,) if self.per_channel else (1,)

        multiplier = random_utils.uniform(self.multiplier[0], self.multiplier[1], shape).astype(np.float32)

        if not self.per_channel and num_channels > 1:
            # Replicate the multiplier for all channels if not per_channel
            multiplier = np.repeat(multiplier, num_channels, axis=-1)

        if not self.elementwise and self.per_channel:
            # Reshape to broadcast correctly when not elementwise but per_channel
            multiplier = multiplier.reshape(1, 1, -1)

        if multiplier.shape != img.shape:
            multiplier = multiplier.squeeze()

        return {"multiplier": multiplier}

    def get_transform_init_args_names(self) -> tuple[str, str, str]:
        return "multiplier", "elementwise", "per_channel"
class InitSchema


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    multiplier: Annotated[tuple[float, float], AfterValidator(check_0plus), AfterValidator(nondecreasing)]
    per_channel: bool
    elementwise: bool

apply (self, img, multiplier, **kwargs)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    multiplier: float | np.ndarray,
    **kwargs: Any,
) -> np.ndarray:
    return multiply(img, multiplier)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    img = data["image"] if "image" in data else data["images"][0]
    num_channels = get_num_channels(img)

    if self.elementwise:
        shape = img.shape if self.per_channel else (*img.shape[:2], 1)
    else:
        shape = (num_channels,) if self.per_channel else (1,)

    multiplier = random_utils.uniform(self.multiplier[0], self.multiplier[1], shape).astype(np.float32)

    if not self.per_channel and num_channels > 1:
        # Replicate the multiplier for all channels if not per_channel
        multiplier = np.repeat(multiplier, num_channels, axis=-1)

    if not self.elementwise and self.per_channel:
        # Reshape to broadcast correctly when not elementwise but per_channel
        multiplier = multiplier.reshape(1, 1, -1)

    if multiplier.shape != img.shape:
        multiplier = multiplier.squeeze()

    return {"multiplier": multiplier}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str, str]:
    return "multiplier", "elementwise", "per_channel"

class Normalize (mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, normalization='standard', always_apply=None, p=1.0) [view source on GitHub]

Applies various normalization techniques to an image. The specific normalization technique can be selected with the normalization parameter.

Standard normalization is applied using the formula: img = (img - mean * max_pixel_value) / (std * max_pixel_value). Other normalization techniques adjust the image based on global or per-channel statistics, or scale pixel values to a specified range.

Parameters:

Name Type Description
mean ColorType | None

Mean values for standard normalization. For "standard" normalization, the default values are ImageNet mean values: (0.485, 0.456, 0.406). For "inception" normalization, use mean values of (0.5, 0.5, 0.5).

std ColorType | None

Standard deviation values for standard normalization. For "standard" normalization, the default values are the ImageNet standard deviation: (0.229, 0.224, 0.225). For "inception" normalization, use standard deviation values of (0.5, 0.5, 0.5).

max_pixel_value float | None

Maximum possible pixel value, used for scaling in standard normalization. Defaults to 255.0.

normalization Literal["standard", "image", "image_per_channel", "min_max", "min_max_per_channel"]

Specifies the normalization technique to apply. Defaults to "standard".

  • "standard": Applies the formula (img - mean * max_pixel_value) / (std * max_pixel_value). The default mean and std are based on ImageNet.
  • "image": Normalizes the whole image based on its global mean and standard deviation.
  • "image_per_channel": Normalizes the image per channel based on each channel's mean and standard deviation.
  • "min_max": Scales the image pixel values to a [0, 1] range based on the global minimum and maximum pixel values.
  • "min_max_per_channel": Scales each channel of the image pixel values to a [0, 1] range based on the per-channel minimum and maximum pixel values.

p float

Probability of applying the transform. Defaults to 1.0.

Targets

image

Image types: uint8, float32

Note

  • For "standard" normalization, mean, std, and max_pixel_value must be provided.
  • For other normalization types, these parameters are ignored.
  • For inception normalization, use mean values of (0.5, 0.5, 0.5).
  • For YOLO normalization, use mean values of (0.5, 0.5, 0.5) and std values of (0, 0, 0).
  • This transform is often used as a final step in image preprocessing pipelines to prepare images for neural network input.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> # Standard ImageNet normalization
>>> transform = A.Normalize(
...     mean=(0.485, 0.456, 0.406),
...     std=(0.229, 0.224, 0.225),
...     max_pixel_value=255.0,
...     p=1.0
... )
>>> normalized_image = transform(image=image)["image"]
>>>
>>> # Min-max normalization
>>> transform_minmax = A.Normalize(normalization="min_max", p=1.0)
>>> normalized_image_minmax = transform_minmax(image=image)["image"]
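
For the "standard" mode, the transform is equivalent to applying the formula above directly with NumPy. A minimal sketch of that equivalence (up to floating-point rounding), assuming the default ImageNet statistics:

Python
import numpy as np
import albumentations as A

mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
max_pixel_value = 255.0

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

# img = (img - mean * max_pixel_value) / (std * max_pixel_value)
manual = (image.astype(np.float32) - mean * max_pixel_value) / (std * max_pixel_value)

normalized = A.Normalize(p=1.0)(image=image)["image"]
# np.allclose(manual, normalized, atol=1e-5) is expected to hold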


Source code in albumentations/augmentations/transforms.py
Python
class Normalize(ImageOnlyTransform):
    """Applies various normalization techniques to an image. The specific normalization technique can be selected
        with the `normalization` parameter.

    Standard normalization is applied using the formula:
        `img = (img - mean * max_pixel_value) / (std * max_pixel_value)`.
        Other normalization techniques adjust the image based on global or per-channel statistics,
        or scale pixel values to a specified range.

    Args:
        mean (ColorType | None): Mean values for standard normalization.
            For "standard" normalization, the default values are ImageNet mean values: (0.485, 0.456, 0.406).
            For "inception" normalization, use mean values of (0.5, 0.5, 0.5).
        std (ColorType | None): Standard deviation values for standard normalization.
            For "standard" normalization, the default values are ImageNet standard deviation :(0.229, 0.224, 0.225).
            For "inception" normalization, use standard deviation values of (0.5, 0.5, 0.5).
        max_pixel_value (float | None): Maximum possible pixel value, used for scaling in standard normalization.
            Defaults to 255.0.
        normalization (Literal["standard", "image", "image_per_channel", "min_max", "min_max_per_channel", "inception"])
            Specifies the normalization technique to apply. Defaults to "standard".
            - "standard": Applies the formula `(img - mean * max_pixel_value) / (std * max_pixel_value)`.
                The default mean and std are based on ImageNet.
            - "image": Normalizes the whole image based on its global mean and standard deviation.
            - "image_per_channel": Normalizes the image per channel based on each channel's mean and standard deviation.
            - "min_max": Scales the image pixel values to a [0, 1] range based on the global
                minimum and maximum pixel values.
            - "min_max_per_channel": Scales each channel of the image pixel values to a [0, 1]
                range based on the per-channel minimum and maximum pixel values.

        p (float): Probability of applying the transform. Defaults to 1.0.

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        - For "standard" normalization, `mean`, `std`, and `max_pixel_value` must be provided.
        - For other normalization types, these parameters are ignored.
        - For inception normalization, use mean values of (0.5, 0.5, 0.5).
        - For YOLO normalization, use mean values of (0.5, 0.5, 0.5) and std values of (0, 0, 0).
        - This transform is often used as a final step in image preprocessing pipelines to
          prepare images for neural network input.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> # Standard ImageNet normalization
        >>> transform = A.Normalize(
        ...     mean=(0.485, 0.456, 0.406),
        ...     std=(0.229, 0.224, 0.225),
        ...     max_pixel_value=255.0,
        ...     p=1.0
        ... )
        >>> normalized_image = transform(image=image)["image"]
        >>>
        >>> # Min-max normalization
        >>> transform_minmax = A.Normalize(normalization="min_max", p=1.0)
        >>> normalized_image_minmax = transform_minmax(image=image)["image"]

    References:
        - ImageNet mean and std: https://pytorch.org/vision/stable/models.html
        - Inception preprocessing: https://keras.io/api/applications/inceptionv3/
    """

    class InitSchema(BaseTransformInitSchema):
        mean: ColorType | None
        std: ColorType | None
        max_pixel_value: float | None
        normalization: Literal[
            "standard",
            "image",
            "image_per_channel",
            "min_max",
            "min_max_per_channel",
        ]

        @model_validator(mode="after")
        def validate_normalization(self) -> Self:
            if self.normalization == "standard" and (
                self.mean is None
                or self.std is None
                or self.max_pixel_value is None
            ):
                raise ValueError("mean, std, and max_pixel_value must be provided for standard normalization.")
            return self

    def __init__(
        self,
        mean: ColorType | None = (0.485, 0.456, 0.406),
        std: ColorType | None = (0.229, 0.224, 0.225),
        max_pixel_value: float | None = 255.0,
        normalization: Literal["standard", "image", "image_per_channel", "min_max", "min_max_per_channel"] = "standard",
        always_apply: bool | None = None,
        p: float = 1.0,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.mean = mean
        self.mean_np = np.array(mean, dtype=np.float32) * max_pixel_value
        self.std = std
        self.denominator = np.reciprocal(np.array(std, dtype=np.float32) * max_pixel_value)
        self.max_pixel_value = max_pixel_value
        self.normalization = normalization

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        if self.normalization == "standard":
            return normalize(
                img,
                self.mean_np,
                self.denominator,
            )
        return normalize_per_image(img, self.normalization)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "mean", "std", "max_pixel_value", "normalization"
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    mean: ColorType | None
    std: ColorType | None
    max_pixel_value: float | None
    normalization: Literal[
        "standard",
        "image",
        "image_per_channel",
        "min_max",
        "min_max_per_channel",
    ]

    @model_validator(mode="after")
    def validate_normalization(self) -> Self:
        if self.normalization == "standard" and (
            self.mean is None
            or self.std is None
            or self.max_pixel_value is None
        ):
            raise ValueError("mean, std, and max_pixel_value must be provided for standard normalization.")
        return self

apply (self, img, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    if self.normalization == "standard":
        return normalize(
            img,
            self.mean_np,
            self.denominator,
        )
    return normalize_per_image(img, self.normalization)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "mean", "std", "max_pixel_value", "normalization"

class PixelDropout (dropout_prob=0.01, per_channel=False, drop_value=0, mask_drop_value=None, always_apply=None, p=0.5) [view source on GitHub]

Set pixels to 0 with some probability.

Parameters:

Name Type Description
dropout_prob float

pixel drop probability. Default: 0.01

per_channel bool

If set to True, the drop mask will be sampled independently for each channel; otherwise the same mask will be used for all channels. Default: False

drop_value number or sequence of numbers or None

Value that will be set in the dropped positions. If set to None, the value will be sampled randomly from the default range for the image dtype:

  • uint8 - [0, 255]
  • uint16 - [0, 65535]
  • uint32 - [0, 4294967295]
  • float, double - [0, 1]

Default: 0

mask_drop_value number or sequence of numbers or None

Value that will be set in the dropped positions in masks. If set to None, masks will be unchanged. Default: None

p float

probability of applying the transform. Default: 0.5.

Targets

image, mask

Image types: any
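
A minimal usage sketch, dropping random image pixels to 0 and zeroing the corresponding mask pixels via mask_drop_value:

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)

# mask_drop_value requires per_channel=False (the default)
transform = A.PixelDropout(dropout_prob=0.05, drop_value=0, mask_drop_value=0, p=1.0)
result = transform(image=image, mask=mask)
dropped_image, dropped_mask = result["image"], result["mask"]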


Source code in albumentations/augmentations/transforms.py
Python
class PixelDropout(DualTransform):
    """Set pixels to 0 with some probability.

    Args:
        dropout_prob (float): pixel drop probability. Default: 0.01
        per_channel (bool): if set to `True` drop mask will be sampled for each channel,
            otherwise the same mask will be sampled for all channels. Default: False
        drop_value (number or sequence of numbers or None): Value that will be set in dropped place.
            If set to None value will be sampled randomly, default ranges will be used:
                - uint8 - [0, 255]
                - uint16 - [0, 65535]
                - uint32 - [0, 4294967295]
                - float, double - [0, 1]
            Default: 0
        mask_drop_value (number or sequence of numbers or None): Value that will be set in dropped place in masks.
            If set to None masks will be unchanged. Default: 0
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask
    Image types:
        any

    """

    class InitSchema(BaseTransformInitSchema):
        dropout_prob: ProbabilityType = 0.01
        per_channel: bool = Field(default=False, description="Sample drop mask per channel.")
        drop_value: ScaleFloatType | None = Field(
            default=0,
            description="Value to set in dropped pixels. None for random sampling.",
        )
        mask_drop_value: ScaleFloatType | None = Field(
            default=None,
            description="Value to set in dropped pixels in masks. None to leave masks unchanged.",
        )

        @model_validator(mode="after")
        def validate_mask_drop_value(self) -> Self:
            if self.mask_drop_value is not None and self.per_channel:
                msg = "PixelDropout supports mask only with per_channel=False."
                raise ValueError(msg)
            return self

    _targets = (Targets.IMAGE, Targets.MASK)

    def __init__(
        self,
        dropout_prob: float = 0.01,
        per_channel: bool = False,
        drop_value: ScaleFloatType | None = 0,
        mask_drop_value: ScaleFloatType | None = None,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.dropout_prob = dropout_prob
        self.per_channel = per_channel
        self.drop_value = drop_value
        self.mask_drop_value = mask_drop_value

    def apply(
        self,
        img: np.ndarray,
        drop_mask: np.ndarray,
        drop_value: float | Sequence[float],
        **params: Any,
    ) -> np.ndarray:
        return fmain.pixel_dropout(img, drop_mask, drop_value)

    def apply_to_mask(self, mask: np.ndarray, drop_mask: np.ndarray, **params: Any) -> np.ndarray:
        if self.mask_drop_value is None:
            return mask

        if mask.ndim == MONO_CHANNEL_DIMENSIONS:
            drop_mask = np.squeeze(drop_mask)

        return fmain.pixel_dropout(mask, drop_mask, self.mask_drop_value)

    def apply_to_bboxes(self, bboxes: np.ndarray, **params: Any) -> np.ndarray:
        return bboxes

    def apply_to_keypoints(self, keypoints: np.ndarray, **params: Any) -> np.ndarray:
        return keypoints

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        img = data["image"] if "image" in data else data["images"][0]
        shape = img.shape if self.per_channel else img.shape[:2]

        rnd = np.random.RandomState(random.randint(0, 1 << 31))
        # Use choice to create boolean matrix, if we will use binomial after that we will need type conversion
        drop_mask = rnd.choice([True, False], shape, p=[self.dropout_prob, 1 - self.dropout_prob])

        drop_value: float | Sequence[float] | np.ndarray
        if drop_mask.ndim != img.ndim:
            drop_mask = np.expand_dims(drop_mask, -1)
        if self.drop_value is None:
            drop_shape = 1 if is_grayscale_image(img) else int(img.shape[-1])

            if img.dtype in (np.uint8, np.uint16, np.uint32):
                drop_value = rnd.randint(0, int(MAX_VALUES_BY_DTYPE[img.dtype]), drop_shape, img.dtype)
            elif img.dtype in [np.float32, np.double]:
                drop_value = rnd.uniform(0, 1, drop_shape).astype(img.dtype)
            else:
                raise ValueError(f"Unsupported dtype: {img.dtype}")
        else:
            drop_value = self.drop_value

        return {"drop_mask": drop_mask, "drop_value": drop_value}

    def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
        return ("dropout_prob", "per_channel", "drop_value", "mask_drop_value")
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    dropout_prob: ProbabilityType = 0.01
    per_channel: bool = Field(default=False, description="Sample drop mask per channel.")
    drop_value: ScaleFloatType | None = Field(
        default=0,
        description="Value to set in dropped pixels. None for random sampling.",
    )
    mask_drop_value: ScaleFloatType | None = Field(
        default=None,
        description="Value to set in dropped pixels in masks. None to leave masks unchanged.",
    )

    @model_validator(mode="after")
    def validate_mask_drop_value(self) -> Self:
        if self.mask_drop_value is not None and self.per_channel:
            msg = "PixelDropout supports mask only with per_channel=False."
            raise ValueError(msg)
        return self

apply (self, img, drop_mask, drop_value, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    drop_mask: np.ndarray,
    drop_value: float | Sequence[float],
    **params: Any,
) -> np.ndarray:
    return fmain.pixel_dropout(img, drop_mask, drop_value)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    img = data["image"] if "image" in data else data["images"][0]
    shape = img.shape if self.per_channel else img.shape[:2]

    rnd = np.random.RandomState(random.randint(0, 1 << 31))
    # Use choice to create boolean matrix, if we will use binomial after that we will need type conversion
    drop_mask = rnd.choice([True, False], shape, p=[self.dropout_prob, 1 - self.dropout_prob])

    drop_value: float | Sequence[float] | np.ndarray
    if drop_mask.ndim != img.ndim:
        drop_mask = np.expand_dims(drop_mask, -1)
    if self.drop_value is None:
        drop_shape = 1 if is_grayscale_image(img) else int(img.shape[-1])

        if img.dtype in (np.uint8, np.uint16, np.uint32):
            drop_value = rnd.randint(0, int(MAX_VALUES_BY_DTYPE[img.dtype]), drop_shape, img.dtype)
        elif img.dtype in [np.float32, np.double]:
            drop_value = rnd.uniform(0, 1, drop_shape).astype(img.dtype)
        else:
            raise ValueError(f"Unsupported dtype: {img.dtype}")
    else:
        drop_value = self.drop_value

    return {"drop_mask": drop_mask, "drop_value": drop_value}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
    return ("dropout_prob", "per_channel", "drop_value", "mask_drop_value")

class PlanckianJitter (mode='blackbody', temperature_limit=None, sampling_method='uniform', always_apply=None, p=0.5) [view source on GitHub]

Applies Planckian Jitter to the input image, simulating color temperature variations in illumination.

This transform adjusts the color of an image to mimic the effect of different color temperatures of light sources, based on Planck's law of black body radiation. It can simulate the appearance of an image under various lighting conditions, from warm (reddish) to cool (bluish) color casts.

PlanckianJitter vs. ColorJitter: PlanckianJitter is fundamentally different from ColorJitter in its approach and use cases:

  1. Physics-based: PlanckianJitter is grounded in the physics of light, simulating real-world color temperature changes. ColorJitter applies arbitrary color adjustments.
  2. Natural effects: This transform produces color shifts that correspond to natural lighting variations, making it ideal for outdoor scene simulation or color constancy problems.
  3. Single parameter: Color changes are controlled by a single, physically meaningful parameter (color temperature), unlike ColorJitter's multiple abstract parameters.
  4. Correlated changes: Color shifts are correlated across channels in a way that mimics natural light, whereas ColorJitter can make independent channel adjustments.

When to use PlanckianJitter:

  • Simulating different times of day or lighting conditions in outdoor scenes
  • Augmenting data for computer vision tasks that need to be robust to natural lighting changes
  • Preparing synthetic data to better match real-world lighting variations
  • Color constancy research or applications
  • When you need physically plausible color variations rather than arbitrary color changes

The logic behind PlanckianJitter: as the color temperature varies:

  1. Lower temperatures (around 3000K) produce warm, reddish tones, simulating sunset or incandescent lighting.
  2. Mid-range temperatures (around 5500K) correspond to daylight.
  3. Higher temperatures (above 7000K) result in cool, bluish tones, similar to overcast sky or shade.

This progression mimics the natural variation of sunlight throughout the day and in different weather conditions.

Parameters:

Name Type Description
mode Literal["blackbody", "cied"]

The mode of the transformation.

  • "blackbody": Simulates blackbody radiation color changes.
  • "cied": Uses the CIE D illuminant series for color temperature simulation.

Default: "blackbody"

temperature_limit tuple[int, int] | None

The range of color temperatures (in Kelvin) to sample from.

  • For "blackbody" mode: Should be within [3000K, 15000K]. Default: (3000, 15000)
  • For "cied" mode: Should be within [4000K, 15000K]. Default: (4000, 15000)

If None, the default range will be used based on the selected mode. Higher temperatures produce cooler (bluish) images, lower temperatures produce warmer (reddish) images.

sampling_method Literal["uniform", "gaussian"]

Method to sample the temperature.

  • "uniform": Samples uniformly across the specified range.
  • "gaussian": Samples from a Gaussian distribution centered at 6500K (approximate daylight).

Default: "uniform"

p float

Probability of applying the transform. Default: 0.5

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The transform preserves the overall brightness of the image while shifting its color.
  • The "blackbody" mode provides a wider range of color shifts, especially in the lower (warmer) temperatures.
  • The "cied" mode is based on standard illuminants and may provide more realistic daylight variations.
  • The Gaussian sampling method tends to produce more subtle variations, as it's centered around daylight.
  • Unlike ColorJitter, this transform ensures that color changes are physically plausible and correlated across channels, maintaining the natural appearance of the scene under different lighting conditions.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> transform = A.PlanckianJitter(mode="blackbody",
...                               temperature_limit=(3000, 9000),
...                               sampling_method="uniform",
...                               p=1.0)
>>> result = transform(image=image)
>>> jittered_image = result["image"]
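
A second sketch using the CIE D illuminant mode with Gaussian sampling, which concentrates temperatures around daylight; the default temperature range for "cied" mode is used:

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
transform = A.PlanckianJitter(mode="cied", sampling_method="gaussian", p=1.0)
jittered_image = transform(image=image)["image"]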


Source code in albumentations/augmentations/transforms.py
Python
class PlanckianJitter(ImageOnlyTransform):
    """Applies Planckian Jitter to the input image, simulating color temperature variations in illumination.

    This transform adjusts the color of an image to mimic the effect of different color temperatures
    of light sources, based on Planck's law of black body radiation. It can simulate the appearance
    of an image under various lighting conditions, from warm (reddish) to cool (bluish) color casts.

    PlanckianJitter vs. ColorJitter:
    PlanckianJitter is fundamentally different from ColorJitter in its approach and use cases:
    1. Physics-based: PlanckianJitter is grounded in the physics of light, simulating real-world
       color temperature changes. ColorJitter applies arbitrary color adjustments.
    2. Natural effects: This transform produces color shifts that correspond to natural lighting
       variations, making it ideal for outdoor scene simulation or color constancy problems.
    3. Single parameter: Color changes are controlled by a single, physically meaningful parameter
       (color temperature), unlike ColorJitter's multiple abstract parameters.
    4. Correlated changes: Color shifts are correlated across channels in a way that mimics natural
       light, whereas ColorJitter can make independent channel adjustments.

    When to use PlanckianJitter:
    - Simulating different times of day or lighting conditions in outdoor scenes
    - Augmenting data for computer vision tasks that need to be robust to natural lighting changes
    - Preparing synthetic data to better match real-world lighting variations
    - Color constancy research or applications
    - When you need physically plausible color variations rather than arbitrary color changes

    The logic behind PlanckianJitter:
    As the color temperature increases:
    1. Lower temperatures (around 3000K) produce warm, reddish tones, simulating sunset or incandescent lighting.
    2. Mid-range temperatures (around 5500K) correspond to daylight.
    3. Higher temperatures (above 7000K) result in cool, bluish tones, similar to overcast sky or shade.
    This progression mimics the natural variation of sunlight throughout the day and in different weather conditions.

    Args:
        mode (Literal["blackbody", "cied"]): The mode of the transformation.
            - "blackbody": Simulates blackbody radiation color changes.
            - "cied": Uses the CIE D illuminant series for color temperature simulation.
            Default: "blackbody"

        temperature_range (tuple[int, int] | None): The range of color temperatures (in Kelvin) to sample from.
            - For "blackbody" mode: Should be within [3000K, 15000K]. Default: (3000, 15000)
            - For "cied" mode: Should be within [4000K, 15000K]. Default: (4000, 15000)
            If None, the default ranges will be used based on the selected mode.
            Higher temperatures produce cooler (bluish) images, lower temperatures produce warmer (reddish) images.

        sampling_method (Literal["uniform", "gaussian"]): Method to sample the temperature.
            - "uniform": Samples uniformly across the specified range.
            - "gaussian": Samples from a Gaussian distribution centered at 6500K (approximate daylight).
            Default: "uniform"

        p (float): Probability of applying the transform. Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The transform preserves the overall brightness of the image while shifting its color.
        - The "blackbody" mode provides a wider range of color shifts, especially in the lower (warmer) temperatures.
        - The "cied" mode is based on standard illuminants and may provide more realistic daylight variations.
        - The Gaussian sampling method tends to produce more subtle variations, as it's centered around daylight.
        - Unlike ColorJitter, this transform ensures that color changes are physically plausible and correlated
          across channels, maintaining the natural appearance of the scene under different lighting conditions.

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> transform = A.PlanckianJitter(mode="blackbody",
        ...                               temperature_range=(3000, 9000),
        ...                               sampling_method="uniform",
        ...                               p=1.0)
        >>> result = transform(image=image)
        >>> jittered_image = result["image"]

    References:
        - Planck's law: https://en.wikipedia.org/wiki/Planck%27s_law
        - CIE Standard Illuminants: https://en.wikipedia.org/wiki/Standard_illuminant
        - Color temperature: https://en.wikipedia.org/wiki/Color_temperature
        - Implementation inspired by: https://github.com/TheZino/PlanckianJitter
    """

    class InitSchema(BaseTransformInitSchema):
        mode: PlanckianJitterMode
        temperature_limit: Annotated[tuple[int, int], AfterValidator(nondecreasing)] | None
        sampling_method: Literal["uniform", "gaussian"]

        @model_validator(mode="after")
        def validate_temperature(self) -> Self:
            max_temp = int(PLANKIAN_JITTER_CONST["MAX_TEMP"])

            if self.temperature_limit is None:
                if self.mode == "blackbody":
                    self.temperature_limit = int(PLANKIAN_JITTER_CONST["MIN_BLACKBODY_TEMP"]), max_temp
                elif self.mode == "cied":
                    self.temperature_limit = int(PLANKIAN_JITTER_CONST["MIN_CIED_TEMP"]), max_temp
            else:
                if self.mode == "blackbody" and (
                    min(self.temperature_limit) < PLANKIAN_JITTER_CONST["MIN_BLACKBODY_TEMP"]
                    or max(self.temperature_limit) > max_temp
                ):
                    raise ValueError("Temperature limits for blackbody should be in [3000, 15000] range")
                if self.mode == "cied" and (
                    min(self.temperature_limit) < PLANKIAN_JITTER_CONST["MIN_CIED_TEMP"]
                    or max(self.temperature_limit) > max_temp
                ):
                    raise ValueError("Temperature limits for CIED should be in [4000, 15000] range")

                if not self.temperature_limit[0] <= PLANKIAN_JITTER_CONST["WHITE_TEMP"] <= self.temperature_limit[1]:
                    raise ValueError("White temperature should be within the temperature limits")

            return self

    def __init__(
        self,
        mode: PlanckianJitterMode = "blackbody",
        temperature_limit: tuple[int, int] | None = None,
        sampling_method: Literal["uniform", "gaussian"] = "uniform",
        always_apply: bool | None = None,
        p: float = 0.5,
    ) -> None:
        super().__init__(p=p, always_apply=always_apply)

        self.mode = mode
        self.temperature_limit = cast(Tuple[int, int], temperature_limit)
        self.sampling_method = sampling_method

    def apply(self, img: np.ndarray, temperature: int, **params: Any) -> np.ndarray:
        if not is_rgb_image(img):
            raise TypeError("PlanckianJitter transformation expects 3-channel images.")
        return fmain.planckian_jitter(img, temperature, mode=self.mode)

    def get_params(self) -> dict[str, Any]:
        sampling_prob_boundary = PLANKIAN_JITTER_CONST["SAMPLING_TEMP_PROB"]
        sampling_temp_boundary = PLANKIAN_JITTER_CONST["WHITE_TEMP"]

        if self.sampling_method == "uniform":
            # Split into 2 cases to avoid selecting cold temperatures (>6000) too often
            if random.random() < sampling_prob_boundary:
                temperature = (
                    random.uniform(
                        self.temperature_limit[0],
                        sampling_temp_boundary,
                    ),
                )
            else:
                temperature = (
                    random.uniform(
                        sampling_temp_boundary,
                        self.temperature_limit[1],
                    ),
                )
        elif self.sampling_method == "gaussian":
            # Sample values from asymmetric gaussian distribution
            if random.random() < sampling_prob_boundary:
                # Left side
                shift = np.abs(
                    random.gauss(
                        0,
                        np.abs(sampling_temp_boundary - self.temperature_limit[0]) / 3,
                    ),
                )
            else:
                # Right side
                shift = -np.abs(
                    random.gauss(
                        0,
                        np.abs(self.temperature_limit[1] - sampling_temp_boundary) / 3,
                    ),
                )

            temperature = sampling_temp_boundary - shift
        else:
            raise ValueError(f"Unknown sampling method: {self.sampling_method}")

        return {"temperature": int(np.clip(temperature, self.temperature_limit[0], self.temperature_limit[1]))}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "mode", "temperature_limit", "sampling_method"
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    mode: PlanckianJitterMode
    temperature_limit: Annotated[tuple[int, int], AfterValidator(nondecreasing)] | None
    sampling_method: Literal["uniform", "gaussian"]

    @model_validator(mode="after")
    def validate_temperature(self) -> Self:
        max_temp = int(PLANKIAN_JITTER_CONST["MAX_TEMP"])

        if self.temperature_limit is None:
            if self.mode == "blackbody":
                self.temperature_limit = int(PLANKIAN_JITTER_CONST["MIN_BLACKBODY_TEMP"]), max_temp
            elif self.mode == "cied":
                self.temperature_limit = int(PLANKIAN_JITTER_CONST["MIN_CIED_TEMP"]), max_temp
        else:
            if self.mode == "blackbody" and (
                min(self.temperature_limit) < PLANKIAN_JITTER_CONST["MIN_BLACKBODY_TEMP"]
                or max(self.temperature_limit) > max_temp
            ):
                raise ValueError("Temperature limits for blackbody should be in [3000, 15000] range")
            if self.mode == "cied" and (
                min(self.temperature_limit) < PLANKIAN_JITTER_CONST["MIN_CIED_TEMP"]
                or max(self.temperature_limit) > max_temp
            ):
                raise ValueError("Temperature limits for CIED should be in [4000, 15000] range")

            if not self.temperature_limit[0] <= PLANKIAN_JITTER_CONST["WHITE_TEMP"] <= self.temperature_limit[1]:
                raise ValueError("White temperature should be within the temperature limits")

        return self

apply (self, img, temperature, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, temperature: int, **params: Any) -> np.ndarray:
    if not is_rgb_image(img):
        raise TypeError("PlanckianJitter transformation expects 3-channel images.")
    return fmain.planckian_jitter(img, temperature, mode=self.mode)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, Any]:
    sampling_prob_boundary = PLANKIAN_JITTER_CONST["SAMPLING_TEMP_PROB"]
    sampling_temp_boundary = PLANKIAN_JITTER_CONST["WHITE_TEMP"]

    if self.sampling_method == "uniform":
        # Split into 2 cases to avoid selecting cold temperatures (>6000) too often
        if random.random() < sampling_prob_boundary:
            temperature = (
                random.uniform(
                    self.temperature_limit[0],
                    sampling_temp_boundary,
                ),
            )
        else:
            temperature = (
                random.uniform(
                    sampling_temp_boundary,
                    self.temperature_limit[1],
                ),
            )
    elif self.sampling_method == "gaussian":
        # Sample values from asymmetric gaussian distribution
        if random.random() < sampling_prob_boundary:
            # Left side
            shift = np.abs(
                random.gauss(
                    0,
                    np.abs(sampling_temp_boundary - self.temperature_limit[0]) / 3,
                ),
            )
        else:
            # Right side
            shift = -np.abs(
                random.gauss(
                    0,
                    np.abs(self.temperature_limit[1] - sampling_temp_boundary) / 3,
                ),
            )

        temperature = sampling_temp_boundary - shift
    else:
        raise ValueError(f"Unknown sampling method: {self.sampling_method}")

    return {"temperature": int(np.clip(temperature, self.temperature_limit[0], self.temperature_limit[1]))}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "mode", "temperature_limit", "sampling_method"

class Posterize (num_bits=4, always_apply=None, p=0.5) [view source on GitHub]

Reduces the number of bits for each color channel in the image.

This transform applies color posterization, a technique that reduces the number of distinct colors used in an image. It works by lowering the number of bits used to represent each color channel, effectively creating a "poster-like" effect with fewer color gradations.

Parameters:

Name Type Description
num_bits int | tuple[int, int] | list[int] | list[tuple[int, int]]

Defines the number of bits to keep for each color channel. Can be specified in several ways:

  • Single int: Same number of bits for all channels. Range: [0, 8].
  • Tuple of two ints: (min_bits, max_bits) to randomly choose from. Range for each: [0, 8].
  • List of three ints: Specific number of bits for each channel [r_bits, g_bits, b_bits].
  • List of three tuples: Ranges for each channel [(r_min, r_max), (g_min, g_max), (b_min, b_max)].

Default: 4

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The effect becomes more pronounced as the number of bits is reduced.
  • Using 0 bits for a channel will reduce it to a single color (usually black).
  • Using 8 bits leaves the channel unchanged.
  • This transform can create interesting artistic effects or be used for image compression simulation.
  • Posterization is particularly useful for:
      • Creating stylized or retro-looking images
      • Reducing the color palette for specific artistic effects
      • Simulating the look of older or lower-quality digital images
      • Data augmentation in scenarios where color depth might vary

Mathematical Background: For an 8-bit color channel, posterization to n bits can be expressed as:
new_value = (old_value >> (8 - n)) << (8 - n)
This operation keeps the n most significant bits and sets the rest to zero.
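
For intuition, here is a minimal NumPy sketch of that bit-masking operation (the helper name posterize_uint8 is illustrative and not part of Albumentations; the transform itself delegates to fmain.posterize):

Python
import numpy as np

def posterize_uint8(img: np.ndarray, n_bits: int) -> np.ndarray:
    """Keep only the n_bits most significant bits of each uint8 value."""
    if n_bits == 8:
        return img.copy()          # nothing to discard
    if n_bits == 0:
        return np.zeros_like(img)  # every value collapses to zero
    shift = 8 - n_bits
    return (img >> shift) << shift

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
posterized = posterize_uint8(image, n_bits=3)  # only 2**3 = 8 levels per channel remain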

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
Posterize all channels to 3 bits
Python
>>> transform = A.Posterize(num_bits=3, p=1.0)
>>> posterized_image = transform(image=image)["image"]
Randomly posterize between 2 and 5 bits
Python
>>> transform = A.Posterize(num_bits=(2, 5), p=1.0)
>>> posterized_image = transform(image=image)["image"]
Different bits for each channel
Python
>>> transform = A.Posterize(num_bits=[3, 5, 2], p=1.0)
>>> posterized_image = transform(image=image)["image"]
Range of bits for each channel
Python
>>> transform = A.Posterize(num_bits=[(1, 3), (3, 5), (2, 4)], p=1.0)
>>> posterized_image = transform(image=image)["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class Posterize(ImageOnlyTransform):
    """Reduces the number of bits for each color channel in the image.

    This transform applies color posterization, a technique that reduces the number of distinct
    colors used in an image. It works by lowering the number of bits used to represent each
    color channel, effectively creating a "poster-like" effect with fewer color gradations.

    Args:
        num_bits (int | tuple[int, int] | list[int] | list[tuple[int, int]]):
            Defines the number of bits to keep for each color channel. Can be specified in several ways:
            - Single int: Same number of bits for all channels. Range: [0, 8].
            - Tuple of two ints: (min_bits, max_bits) to randomly choose from. Range for each: [0, 8].
            - List of three ints: Specific number of bits for each channel [r_bits, g_bits, b_bits].
            - List of three tuples: Ranges for each channel [(r_min, r_max), (g_min, g_max), (b_min, b_max)].
            Default: 4

        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The effect becomes more pronounced as the number of bits is reduced.
        - Using 0 bits for a channel will reduce it to a single color (usually black).
        - Using 8 bits leaves the channel unchanged.
        - This transform can create interesting artistic effects or be used for image compression simulation.
        - Posterization is particularly useful for:
          * Creating stylized or retro-looking images
          * Reducing the color palette for specific artistic effects
          * Simulating the look of older or lower-quality digital images
          * Data augmentation in scenarios where color depth might vary

    Mathematical Background:
        For an 8-bit color channel, posterization to n bits can be expressed as:
        new_value = (old_value >> (8 - n)) << (8 - n)
        This operation keeps the n most significant bits and sets the rest to zero.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

        # Posterize all channels to 3 bits
        >>> transform = A.Posterize(num_bits=3, p=1.0)
        >>> posterized_image = transform(image=image)["image"]

        # Randomly posterize between 2 and 5 bits
        >>> transform = A.Posterize(num_bits=(2, 5), p=1.0)
        >>> posterized_image = transform(image=image)["image"]

        # Different bits for each channel
        >>> transform = A.Posterize(num_bits=[3, 5, 2], p=1.0)
        >>> posterized_image = transform(image=image)["image"]

        # Range of bits for each channel
        >>> transform = A.Posterize(num_bits=[(1, 3), (3, 5), (2, 4)], p=1.0)
        >>> posterized_image = transform(image=image)["image"]

    References:
        - Color Quantization: https://en.wikipedia.org/wiki/Color_quantization
        - Posterization: https://en.wikipedia.org/wiki/Posterization
    """

    class InitSchema(BaseTransformInitSchema):
        num_bits: Annotated[
            int | tuple[int, int] | list[tuple[int, int]],
            Field(default=4, description="Number of high bits"),
        ]

        @field_validator("num_bits")
        @classmethod
        def validate_num_bits(cls, num_bits: Any) -> tuple[int, int] | list[tuple[int, int]]:
            if isinstance(num_bits, int):
                return cast(Tuple[int, int], to_tuple(num_bits, num_bits))
            if isinstance(num_bits, Sequence) and len(num_bits) == NUM_BITS_ARRAY_LENGTH:
                return [cast(Tuple[int, int], to_tuple(i, 0)) for i in num_bits]
            return cast(Tuple[int, int], to_tuple(num_bits, 0))

    def __init__(
        self,
        num_bits: int | tuple[int, int] | list[tuple[int, int]] = 4,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.num_bits = cast(Union[Tuple[int, ...], List[Tuple[int, ...]]], num_bits)

    def apply(self, img: np.ndarray, num_bits: int, **params: Any) -> np.ndarray:
        return fmain.posterize(img, num_bits)

    def get_params(self) -> dict[str, Any]:
        if len(self.num_bits) == NUM_BITS_ARRAY_LENGTH:
            return {"num_bits": [random.randint(int(i[0]), int(i[1])) for i in self.num_bits]}  # type: ignore[index]
        num_bits = self.num_bits
        return {"num_bits": random.randint(int(num_bits[0]), int(num_bits[1]))}  # type: ignore[arg-type]

    def get_transform_init_args_names(self) -> tuple[str]:
        return ("num_bits",)
class InitSchema [view source on GitHub]

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    num_bits: Annotated[
        int | tuple[int, int] | list[tuple[int, int]],
        Field(default=4, description="Number of high bits"),
    ]

    @field_validator("num_bits")
    @classmethod
    def validate_num_bits(cls, num_bits: Any) -> tuple[int, int] | list[tuple[int, int]]:
        if isinstance(num_bits, int):
            return cast(Tuple[int, int], to_tuple(num_bits, num_bits))
        if isinstance(num_bits, Sequence) and len(num_bits) == NUM_BITS_ARRAY_LENGTH:
            return [cast(Tuple[int, int], to_tuple(i, 0)) for i in num_bits]
        return cast(Tuple[int, int], to_tuple(num_bits, 0))

apply (self, img, num_bits, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, num_bits: int, **params: Any) -> np.ndarray:
    return fmain.posterize(img, num_bits)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, Any]:
    if len(self.num_bits) == NUM_BITS_ARRAY_LENGTH:
        return {"num_bits": [random.randint(int(i[0]), int(i[1])) for i in self.num_bits]}  # type: ignore[index]
    num_bits = self.num_bits
    return {"num_bits": random.randint(int(num_bits[0]), int(num_bits[1]))}  # type: ignore[arg-type]
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str]:
    return ("num_bits",)

class RGBShift (r_shift_limit=(-20, 20), g_shift_limit=(-20, 20), b_shift_limit=(-20, 20), always_apply=None, p=0.5) [view source on GitHub]

Randomly shifts the values of each RGB channel independently.

This transform adjusts the intensity of the red, green, and blue channels of an image by adding a random value within a specified range to each channel. This can be used to simulate color variations caused by different lighting conditions or camera sensors.

Parameters:

Name Type Description
r_shift_limit float | tuple[float, float]

Range for changing values for the red channel. If r_shift_limit is a single int or float, the range will be (-r_shift_limit, r_shift_limit). Default: (-20, 20).

g_shift_limit float | tuple[float, float]

Range for changing values for the green channel. If g_shift_limit is a single int or float, the range will be (-g_shift_limit, g_shift_limit). Default: (-20, 20).

b_shift_limit float | tuple[float, float]

Range for changing values for the blue channel. If b_shift_limit is a single int or float, the range will be (-b_shift_limit, b_shift_limit). Default: (-20, 20).

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The shift values are sampled independently for each channel.
  • Positive shifts increase the intensity of a color channel, while negative shifts decrease it.
  • For uint8 images, the resulting pixel values are clipped to the [0, 255] range.
  • For float32 images, the values are typically in the [0, 1] range but may exceed it after shifting.
  • This transform can be used to:
      • Simulate variations in color balance
      • Create subtle color casts
      • Augment data for improving model robustness to color variations

Mathematical formula: For each channel c in [r, g, b]:
output_c = input_c + shift_c
where shift_c is randomly sampled from the corresponding shift_limit range.
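
As a rough sketch, the per-channel shift with clipping can be written directly in NumPy (rgb_shift_uint8 is a hypothetical helper for illustration; the transform itself uses albucore.add_vector):

Python
import numpy as np

def rgb_shift_uint8(img: np.ndarray, shifts: tuple[float, float, float]) -> np.ndarray:
    """Add one shift per channel, then clip back to the valid uint8 range."""
    shifted = img.astype(np.float32) + np.asarray(shifts, dtype=np.float32)
    return np.clip(shifted, 0, 255).astype(np.uint8)

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
shifted = rgb_shift_uint8(image, shifts=(15.0, -5.0, 10.0))  # shift_c sampled from each *_shift_limit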

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
Default usage
Python
>>> transform = A.RGBShift(p=1.0)
>>> augmented_image = transform(image=image)["image"]
Custom shift ranges for each channel
Python
>>> transform = A.RGBShift(r_shift_limit=30, g_shift_limit=(-20, 20), b_shift_limit=(-10, 10), p=1.0)
>>> augmented_image = transform(image=image)["image"]
Using float values for more precise control
Python
>>> transform = A.RGBShift(r_shift_limit=(-0.1, 0.1), g_shift_limit=0.2, b_shift_limit=(-0.3, 0.3), p=1.0)
>>> augmented_image = transform(image=image)["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class RGBShift(ImageOnlyTransform):
    """Randomly shifts the values of each RGB channel independently.

    This transform adjusts the intensity of the red, green, and blue channels of an image
    by adding a random value within a specified range to each channel. This can be used to
    simulate color variations caused by different lighting conditions or camera sensors.

    Args:
        r_shift_limit (float | tuple[float, float]): Range for changing values for the red channel.
            If r_shift_limit is a single int or float, the range will be (-r_shift_limit, r_shift_limit).
            Default: (-20, 20).
        g_shift_limit (float | tuple[float, float]): Range for changing values for the green channel.
            If g_shift_limit is a single int or float, the range will be (-g_shift_limit, g_shift_limit).
            Default: (-20, 20).
        b_shift_limit (float | tuple[float, float]): Range for changing values for the blue channel.
            If b_shift_limit is a single int or float, the range will be (-b_shift_limit, b_shift_limit).
            Default: (-20, 20).
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The shift values are sampled independently for each channel.
        - Positive shifts increase the intensity of a color channel, while negative shifts decrease it.
        - For uint8 images, the resulting pixel values are clipped to the [0, 255] range.
        - For float32 images, the values are typically in the [0, 1] range but may exceed it after shifting.
        - This transform can be used to:
          * Simulate variations in color balance
          * Create subtle color casts
          * Augment data for improving model robustness to color variations

    Mathematical formula:
        For each channel c in [r, g, b]:
        output_c = input_c + shift_c
        where shift_c is randomly sampled from the corresponding shift_limit range.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

        # Default usage
        >>> transform = A.RGBShift(p=1.0)
        >>> augmented_image = transform(image=image)["image"]

        # Custom shift ranges for each channel
        >>> transform = A.RGBShift(r_shift_limit=30, g_shift_limit=(-20, 20), b_shift_limit=(-10, 10), p=1.0)
        >>> augmented_image = transform(image=image)["image"]

        # Using float values for more precise control
        >>> transform = A.RGBShift(r_shift_limit=(-0.1, 0.1), g_shift_limit=0.2, b_shift_limit=(-0.3, 0.3), p=1.0)
        >>> augmented_image = transform(image=image)["image"]

    References:
        - Color balance: https://en.wikipedia.org/wiki/Color_balance
        - Color cast: https://en.wikipedia.org/wiki/Color_cast
    """

    class InitSchema(BaseTransformInitSchema):
        r_shift_limit: SymmetricRangeType
        g_shift_limit: SymmetricRangeType
        b_shift_limit: SymmetricRangeType

    def __init__(
        self,
        r_shift_limit: ScaleFloatType = (-20, 20),
        g_shift_limit: ScaleFloatType = (-20, 20),
        b_shift_limit: ScaleFloatType = (-20, 20),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.r_shift_limit = cast(Tuple[float, float], r_shift_limit)
        self.g_shift_limit = cast(Tuple[float, float], g_shift_limit)
        self.b_shift_limit = cast(Tuple[float, float], b_shift_limit)

    def apply(self, img: np.ndarray, shift: np.ndarray, **params: Any) -> np.ndarray:
        if not is_rgb_image(img):
            msg = "RGBShift transformation expects 3-channel images."
            raise TypeError(msg)

        return albucore.add_vector(img, shift)

    def get_params(self) -> dict[str, Any]:
        return {
            "shift": np.array(
                [
                    random.uniform(*self.r_shift_limit),
                    random.uniform(*self.g_shift_limit),
                    random.uniform(*self.b_shift_limit),
                ],
            ),
        }

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "r_shift_limit", "g_shift_limit", "b_shift_limit"
class InitSchema

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    r_shift_limit: SymmetricRangeType
    g_shift_limit: SymmetricRangeType
    b_shift_limit: SymmetricRangeType

apply (self, img, shift, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, shift: np.ndarray, **params: Any) -> np.ndarray:
    if not is_rgb_image(img):
        msg = "RGBShift transformation expects 3-channel images."
        raise TypeError(msg)

    return albucore.add_vector(img, shift)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, Any]:
    return {
        "shift": np.array(
            [
                random.uniform(*self.r_shift_limit),
                random.uniform(*self.g_shift_limit),
                random.uniform(*self.b_shift_limit),
            ],
        ),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "r_shift_limit", "g_shift_limit", "b_shift_limit"

class RandomBrightnessContrast (brightness_limit=(-0.2, 0.2), contrast_limit=(-0.2, 0.2), brightness_by_max=True, always_apply=None, p=0.5) [view source on GitHub]

Randomly changes the brightness and contrast of the input image.

This transform adjusts the brightness and contrast of an image simultaneously, allowing for a wide range of lighting and contrast variations. It's particularly useful for data augmentation in computer vision tasks, helping models become more robust to different lighting conditions.

Parameters:

Name Type Description
brightness_limit float | tuple[float, float]

Factor range for changing brightness. If a single float value is provided, the range will be (-brightness_limit, brightness_limit). Values should typically be in the range [-1.0, 1.0], where 0 means no change, 1.0 means maximum brightness, and -1.0 means minimum brightness. Default: (-0.2, 0.2).

contrast_limit float | tuple[float, float]

Factor range for changing contrast. If a single float value is provided, the range will be (-contrast_limit, contrast_limit). Values should typically be in the range [-1.0, 1.0], where 0 means no change, 1.0 means maximum increase in contrast, and -1.0 means maximum decrease in contrast. Default: (-0.2, 0.2).

brightness_by_max bool

If True, adjusts brightness by scaling pixel values up to the maximum value of the image's dtype. If False, uses the mean pixel value for adjustment. Default: True.

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The order of operation is: contrast adjustment, then brightness adjustment.
  • For uint8 images, the output is clipped to [0, 255] range.
  • For float32 images, the output may exceed the [0, 1] range.
  • The brightness_by_max parameter affects how brightness is adjusted:
      • If True, brightness adjustment is more pronounced and can lead to more saturated results.
      • If False, brightness adjustment is more subtle and preserves the overall lighting better.
  • This transform is useful for:
      • Simulating different lighting conditions
      • Enhancing low-light or overexposed images
      • Data augmentation to improve model robustness

Mathematical Formulation: Let a be the contrast adjustment factor and β be the brightness adjustment factor. For each pixel value x:
1. Contrast adjustment: x' = clip((x - mean) * (1 + a) + mean)
2. Brightness adjustment:
   If brightness_by_max is True: x'' = clip(x' * (1 + β))
   If brightness_by_max is False: x'' = clip(x' + β * max_value)
Where clip() ensures values stay within the valid range for the image dtype.
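
The formulation above can be transcribed directly into NumPy for intuition (brightness_contrast_uint8 is an illustrative helper that handles uint8 images only; the transform itself calls fmain.brightness_contrast_adjust):

Python
import numpy as np

def brightness_contrast_uint8(img: np.ndarray, a: float, beta: float, brightness_by_max: bool = True) -> np.ndarray:
    """Literal transcription of the contrast-then-brightness steps above."""
    x = img.astype(np.float32)
    mean = x.mean()
    x = (x - mean) * (1 + a) + mean                # 1. contrast adjustment
    if brightness_by_max:                          # 2. brightness adjustment
        x = x * (1 + beta)
    else:
        x = x + beta * 255.0                       # max_value for uint8
    return np.clip(x, 0, 255).astype(np.uint8)     # clip to the valid dtype range

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
adjusted = brightness_contrast_uint8(image, a=0.2, beta=0.1)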

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
Default usage
Python
>>> transform = A.RandomBrightnessContrast(p=1.0)
>>> augmented_image = transform(image=image)["image"]
Custom brightness and contrast limits
Python
>>> transform = A.RandomBrightnessContrast(
...     brightness_limit=0.3,
...     contrast_limit=0.3,
...     p=1.0
... )
>>> augmented_image = transform(image=image)["image"]
Adjust brightness based on mean value
Python
>>> transform = A.RandomBrightnessContrast(
...     brightness_limit=0.2,
...     contrast_limit=0.2,
...     brightness_by_max=False,
...     p=1.0
... )
>>> augmented_image = transform(image=image)["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class RandomBrightnessContrast(ImageOnlyTransform):
    """Randomly changes the brightness and contrast of the input image.

    This transform adjusts the brightness and contrast of an image simultaneously, allowing for
    a wide range of lighting and contrast variations. It's particularly useful for data augmentation
    in computer vision tasks, helping models become more robust to different lighting conditions.

    Args:
        brightness_limit (float | tuple[float, float]): Factor range for changing brightness.
            If a single float value is provided, the range will be (-brightness_limit, brightness_limit).
            Values should typically be in the range [-1.0, 1.0], where 0 means no change,
            1.0 means maximum brightness, and -1.0 means minimum brightness.
            Default: (-0.2, 0.2).

        contrast_limit (float | tuple[float, float]): Factor range for changing contrast.
            If a single float value is provided, the range will be (-contrast_limit, contrast_limit).
            Values should typically be in the range [-1.0, 1.0], where 0 means no change,
            1.0 means maximum increase in contrast, and -1.0 means maximum decrease in contrast.
            Default: (-0.2, 0.2).

        brightness_by_max (bool): If True, adjusts brightness by scaling pixel values up to the
            maximum value of the image's dtype. If False, uses the mean pixel value for adjustment.
            Default: True.

        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The order of operation is: contrast adjustment, then brightness adjustment.
        - For uint8 images, the output is clipped to [0, 255] range.
        - For float32 images, the output may exceed the [0, 1] range.
        - The `brightness_by_max` parameter affects how brightness is adjusted:
          * If True, brightness adjustment is more pronounced and can lead to more saturated results.
          * If False, brightness adjustment is more subtle and preserves the overall lighting better.
        - This transform is useful for:
          * Simulating different lighting conditions
          * Enhancing low-light or overexposed images
          * Data augmentation to improve model robustness

    Mathematical Formulation:
        Let a be the contrast adjustment factor and β be the brightness adjustment factor.
        For each pixel value x:
        1. Contrast adjustment: x' = clip((x - mean) * (1 + a) + mean)
        2. Brightness adjustment:
           If brightness_by_max is True:  x'' = clip(x' * (1 + β))
           If brightness_by_max is False: x'' = clip(x' + β * max_value)
        Where clip() ensures values stay within the valid range for the image dtype.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

        # Default usage
        >>> transform = A.RandomBrightnessContrast(p=1.0)
        >>> augmented_image = transform(image=image)["image"]

        # Custom brightness and contrast limits
        >>> transform = A.RandomBrightnessContrast(
        ...     brightness_limit=0.3,
        ...     contrast_limit=0.3,
        ...     p=1.0
        ... )
        >>> augmented_image = transform(image=image)["image"]

        # Adjust brightness based on mean value
        >>> transform = A.RandomBrightnessContrast(
        ...     brightness_limit=0.2,
        ...     contrast_limit=0.2,
        ...     brightness_by_max=False,
        ...     p=1.0
        ... )
        >>> augmented_image = transform(image=image)["image"]

    References:
        - Brightness: https://en.wikipedia.org/wiki/Brightness
        - Contrast: https://en.wikipedia.org/wiki/Contrast_(vision)
    """

    class InitSchema(BaseTransformInitSchema):
        brightness_limit: SymmetricRangeType
        contrast_limit: SymmetricRangeType
        brightness_by_max: bool

    def __init__(
        self,
        brightness_limit: ScaleFloatType = (-0.2, 0.2),
        contrast_limit: ScaleFloatType = (-0.2, 0.2),
        brightness_by_max: bool = True,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.brightness_limit = cast(Tuple[float, float], brightness_limit)
        self.contrast_limit = cast(Tuple[float, float], contrast_limit)
        self.brightness_by_max = brightness_by_max

    def apply(self, img: np.ndarray, alpha: float, beta: float, **params: Any) -> np.ndarray:
        return fmain.brightness_contrast_adjust(img, alpha, beta, self.brightness_by_max)

    def get_params(self) -> dict[str, float]:
        return {
            "alpha": 1.0 + random.uniform(*self.contrast_limit),
            "beta": 0.0 + random.uniform(*self.brightness_limit),
        }

    def get_transform_init_args_names(self) -> tuple[str, str, str]:
        return "brightness_limit", "contrast_limit", "brightness_by_max"
class InitSchema

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    brightness_limit: SymmetricRangeType
    contrast_limit: SymmetricRangeType
    brightness_by_max: bool

apply (self, img, alpha, beta, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, alpha: float, beta: float, **params: Any) -> np.ndarray:
    return fmain.brightness_contrast_adjust(img, alpha, beta, self.brightness_by_max)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, float]:
    return {
        "alpha": 1.0 + random.uniform(*self.contrast_limit),
        "beta": 0.0 + random.uniform(*self.brightness_limit),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str, str]:
    return "brightness_limit", "contrast_limit", "brightness_by_max"

class RandomFog (fog_coef_lower=None, fog_coef_upper=None, alpha_coef=0.08, fog_coef_range=(0.3, 1), always_apply=None, p=0.5) [view source on GitHub]

Simulates fog for the image by adding random fog-like artifacts.

This transform creates a fog effect by generating semi-transparent overlays that mimic the visual characteristics of fog. The fog intensity and distribution can be controlled to create various fog-like conditions.

Parameters:

Name Type Description
fog_coef_range tuple[float, float]

Range for fog intensity coefficient. Should be in [0, 1] range. Default: (0.3, 1). The deprecated fog_coef_lower and fog_coef_upper arguments, if given, override the corresponding end of this range and emit a DeprecationWarning.

alpha_coef float

Transparency of the fog circles. Should be in [0, 1] range. Default: 0.08.

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The fog effect is created by overlaying semi-transparent circles on the image.
  • Higher fog coefficient values result in denser fog effects.
  • The fog is typically denser in the center of the image and gradually decreases towards the edges.
  • This transform is useful for:
      • Simulating various weather conditions in outdoor scenes
      • Data augmentation for improving model robustness to foggy conditions
      • Creating atmospheric effects in image editing

Mathematical Formulation: For each fog particle:
1. A position (x, y) is randomly generated within the image.
2. A circle with random radius is drawn at this position.
3. The circle's alpha (transparency) is determined by the alpha_coef.
4. These circles are overlaid on the original image to create the fog effect.

The final pixel value is calculated as:
output = (1 - alpha) * original_pixel + alpha * fog_color

where alpha is influenced by the fog_coef and alpha_coef parameters.
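
A simplified whole-image sketch of that blending formula is shown below (blend_fog is an illustrative helper; the actual transform builds the effect from many semi-transparent circles at the sampled particle positions via fmain.add_fog):

Python
import numpy as np

def blend_fog(img: np.ndarray, alpha: float, fog_color: float = 255.0) -> np.ndarray:
    """Alpha-blend a flat fog colour over the whole image."""
    fogged = (1.0 - alpha) * img.astype(np.float32) + alpha * fog_color
    return np.clip(fogged, 0, 255).astype(np.uint8)

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
foggy = blend_fog(image, alpha=0.3)  # larger alpha -> denser fog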

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
Default usage
Python
>>> transform = A.RandomFog(p=1.0)
>>> foggy_image = transform(image=image)["image"]
Custom fog intensity range
Python
>>> transform = A.RandomFog(fog_coef_range=(0.3, 0.8), p=1.0)
>>> foggy_image = transform(image=image)["image"]
Adjust fog transparency
Python
>>> transform = A.RandomFog(fog_coef_range=(0.2, 0.5), alpha_coef=0.1, p=1.0)
>>> foggy_image = transform(image=image)["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class RandomFog(ImageOnlyTransform):
    """Simulates fog for the image by adding random fog-like artifacts.

    This transform creates a fog effect by generating semi-transparent overlays
    that mimic the visual characteristics of fog. The fog intensity and distribution
    can be controlled to create various fog-like conditions.

    Args:
        fog_coef_range (tuple[float, float]): Range for fog intensity coefficient. Should be in [0, 1] range.
        alpha_coef (float): Transparency of the fog circles. Should be in [0, 1] range. Default: 0.08.
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The fog effect is created by overlaying semi-transparent circles on the image.
        - Higher fog coefficient values result in denser fog effects.
        - The fog is typically denser in the center of the image and gradually decreases towards the edges.
        - This transform is useful for:
          * Simulating various weather conditions in outdoor scenes
          * Data augmentation for improving model robustness to foggy conditions
          * Creating atmospheric effects in image editing

    Mathematical Formulation:
        For each fog particle:
        1. A position (x, y) is randomly generated within the image.
        2. A circle with random radius is drawn at this position.
        3. The circle's alpha (transparency) is determined by the alpha_coef.
        4. These circles are overlaid on the original image to create the fog effect.

        The final pixel value is calculated as:
        output = (1 - alpha) * original_pixel + alpha * fog_color

        where alpha is influenced by the fog_coef and alpha_coef parameters.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

        # Default usage
        >>> transform = A.RandomFog(p=1.0)
        >>> foggy_image = transform(image=image)["image"]

        # Custom fog intensity range
        >>> transform = A.RandomFog(fog_coef_lower=0.3, fog_coef_upper=0.8, p=1.0)
        >>> foggy_image = transform(image=image)["image"]

        # Adjust fog transparency
        >>> transform = A.RandomFog(fog_coef_lower=0.2, fog_coef_upper=0.5, alpha_coef=0.1, p=1.0)
        >>> foggy_image = transform(image=image)["image"]

    References:
        - Fog: https://en.wikipedia.org/wiki/Fog
        - Atmospheric perspective: https://en.wikipedia.org/wiki/Aerial_perspective
    """

    class InitSchema(BaseTransformInitSchema):
        fog_coef_lower: float | None = Field(
            ge=0,
            le=1,
        )
        fog_coef_upper: float | None = Field(
            ge=0,
            le=1,
        )
        fog_coef_range: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)]

        alpha_coef: float = Field(ge=0, le=1)

        @model_validator(mode="after")
        def validate_fog_coefficients(self) -> Self:
            if self.fog_coef_lower is not None:
                warn("`fog_coef_lower` is deprecated, use `fog_coef_range` instead.", DeprecationWarning, stacklevel=2)
            if self.fog_coef_upper is not None:
                warn("`fog_coef_upper` is deprecated, use `fog_coef_range` instead.", DeprecationWarning, stacklevel=2)

            lower = self.fog_coef_lower if self.fog_coef_lower is not None else self.fog_coef_range[0]
            upper = self.fog_coef_upper if self.fog_coef_upper is not None else self.fog_coef_range[1]
            self.fog_coef_range = (lower, upper)

            self.fog_coef_lower = None
            self.fog_coef_upper = None

            return self

    def __init__(
        self,
        fog_coef_lower: float | None = None,
        fog_coef_upper: float | None = None,
        alpha_coef: float = 0.08,
        fog_coef_range: tuple[float, float] = (0.3, 1),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.fog_coef_range = fog_coef_range
        self.alpha_coef = alpha_coef

    def apply(
        self,
        img: np.ndarray,
        particle_positions: np.ndarray,
        intensity: float,
        random_seed: int,
        **params: Any,
    ) -> np.ndarray:
        return fmain.add_fog(img, intensity, self.alpha_coef, particle_positions, np.random.RandomState(random_seed))

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        # Select a random fog intensity within the specified range
        intensity = random.uniform(*self.fog_coef_range)

        image_shape = params["shape"][:2]

        image_height, image_width = image_shape

        # Calculate the size of the fog effect region based on image width and fog intensity
        fog_region_size = max(1, int(image_width // 3 * intensity))

        particle_positions = []

        # Initialize the central region where fog will be most dense
        center_x, center_y = (int(x) for x in fgeometric.center(image_shape))

        # Define the initial size of the foggy area
        current_width = image_width
        current_height = image_height

        # Define shrink factor for reducing the foggy area each iteration
        shrink_factor = 0.1

        max_iterations = 10  # Prevent infinite loop
        iteration = 0

        while current_width > fog_region_size and current_height > fog_region_size and iteration < max_iterations:
            # Calculate the number of particles for this region
            area = current_width * current_height
            particles_in_region = int(area / (fog_region_size * fog_region_size) * intensity * 10)

            for _ in range(particles_in_region):
                # Generate random positions within the current region
                x = random.randint(center_x - current_width // 2, center_x + current_width // 2)
                y = random.randint(center_y - current_height // 2, center_y + current_height // 2)
                particle_positions.append((x, y))

            # Shrink the region for the next iteration
            current_width = int(current_width * (1 - shrink_factor))
            current_height = int(current_height * (1 - shrink_factor))

            iteration += 1

        return {
            "particle_positions": particle_positions,
            "intensity": intensity,
            "random_seed": random_utils.get_random_seed(),
        }

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return "fog_coef_range", "alpha_coef"
class InitSchema [view source on GitHub]

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    fog_coef_lower: float | None = Field(
        ge=0,
        le=1,
    )
    fog_coef_upper: float | None = Field(
        ge=0,
        le=1,
    )
    fog_coef_range: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)]

    alpha_coef: float = Field(ge=0, le=1)

    @model_validator(mode="after")
    def validate_fog_coefficients(self) -> Self:
        if self.fog_coef_lower is not None:
            warn("`fog_coef_lower` is deprecated, use `fog_coef_range` instead.", DeprecationWarning, stacklevel=2)
        if self.fog_coef_upper is not None:
            warn("`fog_coef_upper` is deprecated, use `fog_coef_range` instead.", DeprecationWarning, stacklevel=2)

        lower = self.fog_coef_lower if self.fog_coef_lower is not None else self.fog_coef_range[0]
        upper = self.fog_coef_upper if self.fog_coef_upper is not None else self.fog_coef_range[1]
        self.fog_coef_range = (lower, upper)

        self.fog_coef_lower = None
        self.fog_coef_upper = None

        return self

apply (self, img, particle_positions, intensity, random_seed, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    particle_positions: np.ndarray,
    intensity: float,
    random_seed: int,
    **params: Any,
) -> np.ndarray:
    return fmain.add_fog(img, intensity, self.alpha_coef, particle_positions, np.random.RandomState(random_seed))
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    # Select a random fog intensity within the specified range
    intensity = random.uniform(*self.fog_coef_range)

    image_shape = params["shape"][:2]

    image_height, image_width = image_shape

    # Calculate the size of the fog effect region based on image width and fog intensity
    fog_region_size = max(1, int(image_width // 3 * intensity))

    particle_positions = []

    # Initialize the central region where fog will be most dense
    center_x, center_y = (int(x) for x in fgeometric.center(image_shape))

    # Define the initial size of the foggy area
    current_width = image_width
    current_height = image_height

    # Define shrink factor for reducing the foggy area each iteration
    shrink_factor = 0.1

    max_iterations = 10  # Prevent infinite loop
    iteration = 0

    while current_width > fog_region_size and current_height > fog_region_size and iteration < max_iterations:
        # Calculate the number of particles for this region
        area = current_width * current_height
        particles_in_region = int(area / (fog_region_size * fog_region_size) * intensity * 10)

        for _ in range(particles_in_region):
            # Generate random positions within the current region
            x = random.randint(center_x - current_width // 2, center_x + current_width // 2)
            y = random.randint(center_y - current_height // 2, center_y + current_height // 2)
            particle_positions.append((x, y))

        # Shrink the region for the next iteration
        current_width = int(current_width * (1 - shrink_factor))
        current_height = int(current_height * (1 - shrink_factor))

        iteration += 1

    return {
        "particle_positions": particle_positions,
        "intensity": intensity,
        "random_seed": random_utils.get_random_seed(),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str]:
    return "fog_coef_range", "alpha_coef"

class RandomGamma (gamma_limit=(80, 120), always_apply=None, p=0.5) [view source on GitHub]

Applies random gamma correction to the input image.

Gamma correction, or simply gamma, is a nonlinear operation used to encode and decode luminance or tristimulus values in imaging systems. This transform can adjust the brightness of an image while preserving the relative differences between darker and lighter areas, making it useful for simulating different lighting conditions or correcting for display characteristics.

Parameters:

Name Type Description
gamma_limit float | tuple[float, float]

If gamma_limit is a single float value, the range will be (1, gamma_limit). If it's a tuple of two floats, they will serve as the lower and upper bounds for gamma adjustment. Values are in terms of percentage change, e.g., (80, 120) means the gamma will be between 80% and 120% of the original. Default: (80, 120).

eps

A small value added to the gamma to avoid division by zero or log of zero errors. Default: 1e-7.

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The gamma correction is applied using the formula: output = input^gamma
  • Gamma values > 1 will make the image darker, while values < 1 will make it brighter
  • This transform is particularly useful for:
      • Simulating different lighting conditions
      • Correcting for non-linear display characteristics
      • Enhancing contrast in certain regions of the image
      • Data augmentation in computer vision tasks

Mathematical Formulation: Let I be the input image and G (gamma) be the correction factor. The gamma correction is applied as follows:
1. Normalize the image to [0, 1] range: I_norm = I / 255 (for uint8 images)
2. Apply gamma correction: I_corrected = I_norm ^ G
3. Scale back to original range: output = I_corrected * 255 (for uint8 images)

The actual gamma value used is calculated as:
G = random_value / 100, where random_value is sampled from the gamma_limit range (as computed in get_params).
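
For uint8 images this amounts to a per-value lookup table; here is a minimal sketch (gamma_correct_uint8 is an illustrative helper, the transform itself calls fmain.gamma_transform):

Python
import numpy as np

def gamma_correct_uint8(img: np.ndarray, gamma: float) -> np.ndarray:
    """Apply output = input ** gamma on a [0, 1]-normalized copy of a uint8 image."""
    lut = ((np.arange(256) / 255.0) ** gamma * 255.0).astype(np.uint8)  # lookup table
    return lut[img]

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
# gamma is sampled as uniform(80, 120) / 100 in get_params()
corrected = gamma_correct_uint8(image, gamma=1.1)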

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
Default usage
Python
>>> transform = A.RandomGamma(p=1.0)
>>> augmented_image = transform(image=image)["image"]
Custom gamma range
Python
>>> transform = A.RandomGamma(gamma_limit=(50, 150), p=1.0)
>>> augmented_image = transform(image=image)["image"]
Applying with other transforms
Python
>>> transform = A.Compose([
...     A.RandomGamma(gamma_limit=(80, 120), p=0.5),
...     A.RandomBrightnessContrast(p=0.5),
... ])
>>> augmented_image = transform(image=image)["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class RandomGamma(ImageOnlyTransform):
    """Applies random gamma correction to the input image.

    Gamma correction, or simply gamma, is a nonlinear operation used to encode and decode luminance
    or tristimulus values in imaging systems. This transform can adjust the brightness of an image
    while preserving the relative differences between darker and lighter areas, making it useful
    for simulating different lighting conditions or correcting for display characteristics.

    Args:
        gamma_limit (float | tuple[float, float]): If gamma_limit is a single float value, the range
            will be (1, gamma_limit). If it's a tuple of two floats, they will serve as
            the lower and upper bounds for gamma adjustment. Values are in terms of percentage change,
            e.g., (80, 120) means the gamma will be between 80% and 120% of the original.
            Default: (80, 120).
        eps: A small value added to the gamma to avoid division by zero or log of zero errors.
            Default: 1e-7.
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The gamma correction is applied using the formula: output = input^gamma
        - Gamma values > 1 will make the image darker, while values < 1 will make it brighter
        - This transform is particularly useful for:
          * Simulating different lighting conditions
          * Correcting for non-linear display characteristics
          * Enhancing contrast in certain regions of the image
          * Data augmentation in computer vision tasks

    Mathematical Formulation:
        Let I be the input image and G (gamma) be the correction factor.
        The gamma correction is applied as follows:
        1. Normalize the image to [0, 1] range: I_norm = I / 255 (for uint8 images)
        2. Apply gamma correction: I_corrected = I_norm ^ (1 / G)
        3. Scale back to original range: output = I_corrected * 255 (for uint8 images)

        The actual gamma value used is calculated as:
        G = 1 + (random_value / 100), where random_value is sampled from gamma_limit range.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

        # Default usage
        >>> transform = A.RandomGamma(p=1.0)
        >>> augmented_image = transform(image=image)["image"]

        # Custom gamma range
        >>> transform = A.RandomGamma(gamma_limit=(50, 150), p=1.0)
        >>> augmented_image = transform(image=image)["image"]

        # Applying with other transforms
        >>> transform = A.Compose([
        ...     A.RandomGamma(gamma_limit=(80, 120), p=0.5),
        ...     A.RandomBrightnessContrast(p=0.5),
        ... ])
        >>> augmented_image = transform(image=image)["image"]

    References:
        - Gamma correction: https://en.wikipedia.org/wiki/Gamma_correction
        - Power law (Gamma) encoding: https://www.cambridgeincolour.com/tutorials/gamma-correction.htm
    """

    class InitSchema(BaseTransformInitSchema):
        gamma_limit: OnePlusFloatRangeType

    def __init__(
        self,
        gamma_limit: ScaleFloatType = (80, 120),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.gamma_limit = cast(Tuple[float, float], gamma_limit)

    def apply(self, img: np.ndarray, gamma: float, **params: Any) -> np.ndarray:
        return fmain.gamma_transform(img, gamma=gamma)

    def get_params(self) -> dict[str, float]:
        return {"gamma": random.uniform(self.gamma_limit[0], self.gamma_limit[1]) / 100.0}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("gamma_limit",)
class InitSchema

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    gamma_limit: OnePlusFloatRangeType

apply (self, img, gamma, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, gamma: float, **params: Any) -> np.ndarray:
    return fmain.gamma_transform(img, gamma=gamma)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, float]:
    return {"gamma": random.uniform(self.gamma_limit[0], self.gamma_limit[1]) / 100.0}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("gamma_limit",)

class RandomGravel (gravel_roi=(0.1, 0.4, 0.9, 0.9), number_of_patches=2, always_apply=None, p=0.5) [view source on GitHub]

Adds gravel-like artifacts to the input image.

This transform simulates the appearance of gravel or small stones scattered across specific regions of an image. It's particularly useful for augmenting datasets of road or terrain images, adding realistic texture variations.

Parameters:

Name Type Description
gravel_roi tuple[float, float, float, float]

Region of interest where gravel will be added, specified as (x_min, y_min, x_max, y_max) in relative coordinates [0, 1]. Default: (0.1, 0.4, 0.9, 0.9).

number_of_patches int

Number of gravel patch regions to generate within the ROI. Each patch will contain multiple gravel particles. Default: 2.

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: 3

Note

  • The gravel effect is created by modifying the saturation channel in the HLS color space.
  • Gravel particles are distributed within randomly generated patches inside the specified ROI.
  • This transform is particularly useful for:
      • Augmenting datasets for road condition analysis
      • Simulating variations in terrain for computer vision tasks
      • Adding realistic texture to synthetic images of outdoor scenes

Mathematical Formulation: For each gravel patch:
1. A rectangular region is randomly generated within the specified ROI.
2. Within this region, multiple gravel particles are placed.
3. For each particle:
   - Random (x, y) coordinates are generated within the patch.
   - A random radius (r) between 1 and 3 pixels is assigned.
   - A random saturation value (sat) between 0 and 255 is assigned.
4. The saturation channel of the image is modified for each particle:
   image_hls[y-r:y+r, x-r:x+r, 1] = sat
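
A rough illustration of those steps follows (add_gravel_patch is a hypothetical helper that uses one fixed patch; the library implementation samples several patches inside gravel_roi):

Python
import numpy as np
import cv2

def add_gravel_patch(img_rgb: np.ndarray, n_particles: int = 200, seed: int = 0) -> np.ndarray:
    """Scatter small random-intensity dots in the HLS representation of a fixed patch."""
    rng = np.random.default_rng(seed)
    hls = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2HLS)
    height, width = hls.shape[:2]
    for _ in range(n_particles):
        x = int(rng.integers(width // 4, 3 * width // 4))
        y = int(rng.integers(height // 2, height))
        r = int(rng.integers(1, 4))       # radius between 1 and 3 pixels
        value = int(rng.integers(0, 256))
        hls[max(y - r, 0):y + r, max(x - r, 0):x + r, 1] = value  # channel 1, as in the formula above
    return cv2.cvtColor(hls, cv2.COLOR_HLS2RGB)

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
gravelled = add_gravel_patch(image)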

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
Default usage
Python
>>> transform = A.RandomGravel(p=1.0)
>>> augmented_image = transform(image=image)["image"]
Custom ROI and number of patches
Python
>>> transform = A.RandomGravel(
...     gravel_roi=(0.2, 0.2, 0.8, 0.8),
...     number_of_patches=5,
...     p=1.0
... )
>>> augmented_image = transform(image=image)["image"]
Combining with other transforms
Python
>>> transform = A.Compose([
...     A.RandomGravel(p=0.7),
...     A.RandomBrightnessContrast(p=0.5),
... ])
>>> augmented_image = transform(image=image)["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class RandomGravel(ImageOnlyTransform):
    """Adds gravel-like artifacts to the input image.

    This transform simulates the appearance of gravel or small stones scattered across
    specific regions of an image. It's particularly useful for augmenting datasets of
    road or terrain images, adding realistic texture variations.

    Args:
        gravel_roi (tuple[float, float, float, float]): Region of interest where gravel
            will be added, specified as (x_min, y_min, x_max, y_max) in relative coordinates
            [0, 1]. Default: (0.1, 0.4, 0.9, 0.9).
        number_of_patches (int): Number of gravel patch regions to generate within the ROI.
            Each patch will contain multiple gravel particles. Default: 2.
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        3

    Note:
        - The gravel effect is created by modifying the saturation channel in the HLS color space.
        - Gravel particles are distributed within randomly generated patches inside the specified ROI.
        - This transform is particularly useful for:
          * Augmenting datasets for road condition analysis
          * Simulating variations in terrain for computer vision tasks
          * Adding realistic texture to synthetic images of outdoor scenes

    Mathematical Formulation:
        For each gravel patch:
        1. A rectangular region is randomly generated within the specified ROI.
        2. Within this region, multiple gravel particles are placed.
        3. For each particle:
           - Random (x, y) coordinates are generated within the patch.
           - A random radius (r) between 1 and 3 pixels is assigned.
           - A random saturation value (sat) between 0 and 255 is assigned.
        4. The saturation channel of the image is modified for each particle:
           image_hls[y-r:y+r, x-r:x+r, 1] = sat

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

        # Default usage
        >>> transform = A.RandomGravel(p=1.0)
        >>> augmented_image = transform(image=image)["image"]

        # Custom ROI and number of patches
        >>> transform = A.RandomGravel(
        ...     gravel_roi=(0.2, 0.2, 0.8, 0.8),
        ...     number_of_patches=5,
        ...     p=1.0
        ... )
        >>> augmented_image = transform(image=image)["image"]

        # Combining with other transforms
        >>> transform = A.Compose([
        ...     A.RandomGravel(p=0.7),
        ...     A.RandomBrightnessContrast(p=0.5),
        ... ])
        >>> augmented_image = transform(image=image)["image"]

    References:
        - Road surface textures: https://en.wikipedia.org/wiki/Road_surface
        - HLS color space: https://en.wikipedia.org/wiki/HSL_and_HSV
    """

    class InitSchema(BaseTransformInitSchema):
        gravel_roi: tuple[float, float, float, float]
        number_of_patches: int = Field(ge=1)

        @model_validator(mode="after")
        def validate_gravel_roi(self) -> Self:
            gravel_lower_x, gravel_lower_y, gravel_upper_x, gravel_upper_y = self.gravel_roi
            if not 0 <= gravel_lower_x < gravel_upper_x <= 1 or not 0 <= gravel_lower_y < gravel_upper_y <= 1:
                raise ValueError(f"Invalid gravel_roi. Got: {self.gravel_roi}.")
            return self

    def __init__(
        self,
        gravel_roi: tuple[float, float, float, float] = (0.1, 0.4, 0.9, 0.9),
        number_of_patches: int = 2,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p, always_apply)
        self.gravel_roi = gravel_roi
        self.number_of_patches = number_of_patches

    def generate_gravel_patch(self, rectangular_roi: tuple[int, int, int, int]) -> np.ndarray:
        x_min, y_min, x_max, y_max = rectangular_roi
        area = abs((x_max - x_min) * (y_max - y_min))
        count = area // 10
        gravels = np.empty([count, 2], dtype=np.int64)
        gravels[:, 0] = random_utils.randint(x_min, x_max, count)
        gravels[:, 1] = random_utils.randint(y_min, y_max, count)
        return gravels

    def apply(self, img: np.ndarray, gravels_infos: list[Any], **params: Any) -> np.ndarray:
        return fmain.add_gravel(img, gravels_infos)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
        height, width = params["shape"][:2]

        # Calculate ROI in pixels
        x_min, y_min, x_max, y_max = (
            int(coord * dim) for coord, dim in zip(self.gravel_roi, [width, height, width, height])
        )

        roi_width = x_max - x_min
        roi_height = y_max - y_min

        gravels_info = []

        for _ in range(self.number_of_patches):
            # Generate a random rectangular region within the ROI
            patch_width = random.randint(roi_width // 10, roi_width // 5)
            patch_height = random.randint(roi_height // 10, roi_height // 5)

            patch_x = random.randint(x_min, x_max - patch_width)
            patch_y = random.randint(y_min, y_max - patch_height)

            # Generate gravel particles within this patch
            num_particles = (patch_width * patch_height) // 100  # Adjust this divisor to control density

            for _ in range(num_particles):
                x = random.randint(patch_x, patch_x + patch_width)
                y = random.randint(patch_y, patch_y + patch_height)
                r = random.randint(1, 3)
                sat = random.randint(0, 255)

                gravels_info.append(
                    [
                        max(y - r, 0),  # min_y
                        min(y + r, height - 1),  # max_y
                        max(x - r, 0),  # min_x
                        min(x + r, width - 1),  # max_x
                        sat,  # saturation
                    ],
                )

        return {"gravels_infos": np.array(gravels_info, dtype=np.int64)}

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return "gravel_roi", "number_of_patches"
class InitSchema [view source on GitHub]

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    gravel_roi: tuple[float, float, float, float]
    number_of_patches: int = Field(ge=1)

    @model_validator(mode="after")
    def validate_gravel_roi(self) -> Self:
        gravel_lower_x, gravel_lower_y, gravel_upper_x, gravel_upper_y = self.gravel_roi
        if not 0 <= gravel_lower_x < gravel_upper_x <= 1 or not 0 <= gravel_lower_y < gravel_upper_y <= 1:
            raise ValueError(f"Invalid gravel_roi. Got: {self.gravel_roi}.")
        return self

apply (self, img, gravels_infos, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, gravels_infos: list[Any], **params: Any) -> np.ndarray:
    return fmain.add_gravel(img, gravels_infos)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
    height, width = params["shape"][:2]

    # Calculate ROI in pixels
    x_min, y_min, x_max, y_max = (
        int(coord * dim) for coord, dim in zip(self.gravel_roi, [width, height, width, height])
    )

    roi_width = x_max - x_min
    roi_height = y_max - y_min

    gravels_info = []

    for _ in range(self.number_of_patches):
        # Generate a random rectangular region within the ROI
        patch_width = random.randint(roi_width // 10, roi_width // 5)
        patch_height = random.randint(roi_height // 10, roi_height // 5)

        patch_x = random.randint(x_min, x_max - patch_width)
        patch_y = random.randint(y_min, y_max - patch_height)

        # Generate gravel particles within this patch
        num_particles = (patch_width * patch_height) // 100  # Adjust this divisor to control density

        for _ in range(num_particles):
            x = random.randint(patch_x, patch_x + patch_width)
            y = random.randint(patch_y, patch_y + patch_height)
            r = random.randint(1, 3)
            sat = random.randint(0, 255)

            gravels_info.append(
                [
                    max(y - r, 0),  # min_y
                    min(y + r, height - 1),  # max_y
                    max(x - r, 0),  # min_x
                    min(x + r, width - 1),  # max_x
                    sat,  # saturation
                ],
            )

    return {"gravels_infos": np.array(gravels_info, dtype=np.int64)}
get_transform_init_args_names (self)

Returns the names of arguments that are used in the __init__ method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str]:
    return "gravel_roi", "number_of_patches"

class RandomGridShuffle (grid=(3, 3), p=0.5, always_apply=None) [view source on GitHub]

Randomly shuffles the grid's cells on an image, mask, or keypoints, effectively rearranging patches within the image. This transformation divides the image into a grid and then permutes these grid cells based on a random mapping.

Parameters:

Name Type Description
grid tuple[int, int]

Size of the grid for splitting the image into cells. Each cell is shuffled randomly.

p float

Probability that the transform will be applied.

Targets

image, mask, keypoints

Image types: uint8, float32

Examples:

Python
>>> import albumentations as A
>>> transform = A.Compose([
...     A.RandomGridShuffle(grid=(3, 3), p=1.0)
... ])
>>> transformed = transform(image=my_image, mask=my_mask)
>>> image, mask = transformed['image'], transformed['mask']
# This will shuffle the 3x3 grid cells of `my_image` and `my_mask` randomly.
# Mask and image are shuffled in a consistent way

Note

This transform could be useful when only micro features are important for the model, and memorizing the global structure could be harmful. For example:

  • Identifying the type of cell phone used to take a picture based on micro artifacts generated by phone post-processing algorithms, rather than the semantic features of the photo. See more at https://ieeexplore.ieee.org/abstract/document/8622031
  • Identifying stress, glucose, or hydration levels based on skin images.
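
As a rough illustration of the idea, a grid shuffle on an image whose sides are divisible by the grid can be sketched with plain NumPy. This is a simplified, hypothetical example; the library's split_uniform_grid / swap_tiles_on_image helpers also handle sizes that do not divide evenly and keep masks and keypoints consistent with the image.

Python
import numpy as np

rng = np.random.default_rng(0)
image = np.arange(6 * 6).reshape(6, 6)  # toy "image" evenly divisible by the grid
rows, cols = 3, 3
tile_h, tile_w = image.shape[0] // rows, image.shape[1] // cols

# Split into tiles, shuffle the tile order, and stitch the result back together.
tiles = [image[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
         for r in range(rows) for c in range(cols)]
order = rng.permutation(len(tiles))
shuffled = np.block([[tiles[order[r * cols + c]] for c in range(cols)]
                     for r in range(rows)])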

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class RandomGridShuffle(DualTransform):
    """Randomly shuffles the grid's cells on an image, mask, or keypoints,
    effectively rearranging patches within the image.
    This transformation divides the image into a grid and then permutes these grid cells based on a random mapping.


    Args:
        grid (tuple[int, int]): Size of the grid for splitting the image into cells. Each cell is shuffled randomly.
        p (float): Probability that the transform will be applied.

    Targets:
        image, mask, keypoints

    Image types:
        uint8, float32

    Examples:
        >>> import albumentations as A
        >>> transform = A.Compose([
            A.RandomGridShuffle(grid=(3, 3), p=1.0)
        ])
        >>> transformed = transform(image=my_image, mask=my_mask)
        >>> image, mask = transformed['image'], transformed['mask']
        # This will shuffle the 3x3 grid cells of `my_image` and `my_mask` randomly.
        # Mask and image are shuffled in a consistent way
    Note:
        This transform could be useful when only micro features are important for the model, and memorizing
        the global structure could be harmful. For example:
        - Identifying the type of cell phone used to take a picture based on micro artifacts generated by
        phone post-processing algorithms, rather than the semantic features of the photo.
        See more at https://ieeexplore.ieee.org/abstract/document/8622031
        - Identifying stress, glucose, hydration levels based on skin images.
    """

    class InitSchema(BaseTransformInitSchema):
        grid: Annotated[tuple[int, int], AfterValidator(check_1plus)] = (3, 3)

    _targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS)

    def __init__(self, grid: tuple[int, int] = (3, 3), p: float = 0.5, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.grid = grid

    def apply(self, img: np.ndarray, tiles: np.ndarray, mapping: list[int], **params: Any) -> np.ndarray:
        return fmain.swap_tiles_on_image(img, tiles, mapping)

    def apply_to_mask(self, mask: np.ndarray, tiles: np.ndarray, mapping: list[int], **params: Any) -> np.ndarray:
        return fmain.swap_tiles_on_image(mask, tiles, mapping)

    def apply_to_keypoints(
        self,
        keypoints: np.ndarray,
        tiles: np.ndarray,
        mapping: np.ndarray,
        **params: Any,
    ) -> np.ndarray:
        return fmain.swap_tiles_on_keypoints(keypoints, tiles, mapping)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
        height, width = params["shape"][:2]
        random_state = random_utils.get_random_state()
        original_tiles = fmain.split_uniform_grid(
            (height, width),
            self.grid,
            random_state=random_state,
        )
        shape_groups = fmain.create_shape_groups(original_tiles)
        mapping = fmain.shuffle_tiles_within_shape_groups(shape_groups, random_state=random_state)

        return {"tiles": original_tiles, "mapping": mapping}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("grid",)

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "keypoints": self.apply_to_keypoints,
        }
class InitSchema

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    grid: Annotated[tuple[int, int], AfterValidator(check_1plus)] = (3, 3)

apply (self, img, tiles, mapping, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, tiles: np.ndarray, mapping: list[int], **params: Any) -> np.ndarray:
    return fmain.swap_tiles_on_image(img, tiles, mapping)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
    height, width = params["shape"][:2]
    random_state = random_utils.get_random_state()
    original_tiles = fmain.split_uniform_grid(
        (height, width),
        self.grid,
        random_state=random_state,
    )
    shape_groups = fmain.create_shape_groups(original_tiles)
    mapping = fmain.shuffle_tiles_within_shape_groups(shape_groups, random_state=random_state)

    return {"tiles": original_tiles, "mapping": mapping}
get_transform_init_args_names (self)

Returns the names of arguments that are used in the __init__ method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("grid",)

class RandomRain (slant_lower=None, slant_upper=None, slant_range=(-10, 10), drop_length=20, drop_width=1, drop_color=(200, 200, 200), blur_value=7, brightness_coefficient=0.7, rain_type='default', always_apply=None, p=0.5) [view source on GitHub]

Adds rain effects to an image.

This transform simulates rainfall by overlaying semi-transparent streaks onto the image, creating a realistic rain effect. It can be used to augment datasets for computer vision tasks that need to perform well in rainy conditions.

Parameters:

Name Type Description
slant_range tuple[int, int]

Range for the rain slant angle in degrees. Negative values slant to the left, positive to the right. Default: (-10, 10).

drop_length int

Length of the rain drops in pixels. Default: 20.

drop_width int

Width of the rain drops in pixels. Default: 1.

drop_color tuple[int, int, int]

Color of the rain drops in RGB format. Default: (200, 200, 200).

blur_value int

Blur value for simulating rain effect. Rainy views are typically blurry. Default: 7.

brightness_coefficient float

Coefficient to adjust the brightness of the image. Rainy scenes are usually darker. Should be in the range (0, 1]. Default: 0.7.

rain_type Literal["drizzle", "heavy", "torrential", "default"]

Type of rain to simulate.

p float

Probability of applying the transform. Default: 0.5.
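
The constructor also still accepts the deprecated slant_lower and slant_upper arguments shown in the signature above; when passed, they are folded into slant_range and a DeprecationWarning is emitted (see the InitSchema validator in the source below). A minimal migration example:

Python
import albumentations as A

# Deprecated style: emits a DeprecationWarning and is converted to slant_range internally.
old_style = A.RandomRain(slant_lower=-15, slant_upper=15, p=1.0)

# Preferred equivalent.
new_style = A.RandomRain(slant_range=(-15, 15), p=1.0)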

Targets

image

Image types: uint8, float32

Number of channels: 3

Note

  • The rain effect is created by drawing semi-transparent lines on the image.
  • The slant of the rain can be controlled to simulate wind effects.
  • Different rain types (drizzle, heavy, torrential) adjust the density and appearance of the rain.
  • The transform also adjusts image brightness and applies a blur to simulate the visual effects of rain.
  • This transform is particularly useful for:
      • Augmenting datasets for autonomous driving in rainy conditions
      • Testing the robustness of computer vision models to weather effects
      • Creating realistic rainy scenes for image editing or film production

Mathematical Formulation: For each raindrop:

  1. Start position (x1, y1) is randomly generated within the image.
  2. End position (x2, y2) is calculated based on drop_length and slant:
     x2 = x1 + drop_length * sin(slant)
     y2 = y1 + drop_length * cos(slant)
  3. A line is drawn from (x1, y1) to (x2, y2) with the specified drop_color and drop_width.
  4. The image is then blurred and its brightness is adjusted.
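
A minimal sketch of steps 1-4, assuming OpenCV for line drawing. The drop count, slant value, and final blur below are illustrative simplifications (the brightness adjustment is omitted); this is not the fmain.add_rain implementation.

Python
import math
import cv2
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
drop_length, drop_width, drop_color = 20, 1, (200, 200, 200)
slant = math.radians(10)  # slant angle from the formulation, converted to radians

rng = np.random.default_rng(0)
overlay = image.copy()
for _ in range(50):  # hypothetical number of drops
    x1, y1 = int(rng.integers(0, 100)), int(rng.integers(0, 80))
    x2 = int(x1 + drop_length * math.sin(slant))
    y2 = int(y1 + drop_length * math.cos(slant))
    cv2.line(overlay, (x1, y1), (x2, y2), drop_color, drop_width)

rainy = cv2.blur(overlay, (7, 7))  # stand-in for the blur step (step 4)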

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
Default usage
Python
>>> transform = A.RandomRain(p=1.0)
>>> rainy_image = transform(image=image)["image"]
Custom rain parameters
Python
>>> transform = A.RandomRain(
...     slant_range=(-15, 15),
...     drop_length=30,
...     drop_width=2,
...     drop_color=(180, 180, 180),
...     blur_value=5,
...     brightness_coefficient=0.8,
...     p=1.0
... )
>>> rainy_image = transform(image=image)["image"]
Simulating heavy rain
Python
>>> transform = A.RandomRain(rain_type="heavy", p=1.0)
>>> heavy_rain_image = transform(image=image)["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class RandomRain(ImageOnlyTransform):
    """Adds rain effects to an image.

    This transform simulates rainfall by overlaying semi-transparent streaks onto the image,
    creating a realistic rain effect. It can be used to augment datasets for computer vision
    tasks that need to perform well in rainy conditions.

    Args:
        slant_range (tuple[int, int]): Range for the rain slant angle in degrees.
            Negative values slant to the left, positive to the right. Default: (-10, 10).
        drop_length (int): Length of the rain drops in pixels. Default: 20.
        drop_width (int): Width of the rain drops in pixels. Default: 1.
        drop_color (tuple[int, int, int]): Color of the rain drops in RGB format. Default: (200, 200, 200).
        blur_value (int): Blur value for simulating rain effect. Rainy views are typically blurry. Default: 7.
        brightness_coefficient (float): Coefficient to adjust the brightness of the image.
            Rainy scenes are usually darker. Should be in the range (0, 1]. Default: 0.7.
        rain_type (Literal["drizzle", "heavy", "torrential", "default"]): Type of rain to simulate.
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        3

    Note:
        - The rain effect is created by drawing semi-transparent lines on the image.
        - The slant of the rain can be controlled to simulate wind effects.
        - Different rain types (drizzle, heavy, torrential) adjust the density and appearance of the rain.
        - The transform also adjusts image brightness and applies a blur to simulate the visual effects of rain.
        - This transform is particularly useful for:
          * Augmenting datasets for autonomous driving in rainy conditions
          * Testing the robustness of computer vision models to weather effects
          * Creating realistic rainy scenes for image editing or film production

    Mathematical Formulation:
        For each raindrop:
        1. Start position (x1, y1) is randomly generated within the image.
        2. End position (x2, y2) is calculated based on drop_length and slant:
           x2 = x1 + drop_length * sin(slant)
           y2 = y1 + drop_length * cos(slant)
        3. A line is drawn from (x1, y1) to (x2, y2) with the specified drop_color and drop_width.
        4. The image is then blurred and its brightness is adjusted.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

        # Default usage
        >>> transform = A.RandomRain(p=1.0)
        >>> rainy_image = transform(image=image)["image"]

        # Custom rain parameters
        >>> transform = A.RandomRain(
        ...     slant_range=(-15, 15),
        ...     drop_length=30,
        ...     drop_width=2,
        ...     drop_color=(180, 180, 180),
        ...     blur_value=5,
        ...     brightness_coefficient=0.8,
        ...     p=1.0
        ... )
        >>> rainy_image = transform(image=image)["image"]

        # Simulating heavy rain
        >>> transform = A.RandomRain(rain_type="heavy", p=1.0)
        >>> heavy_rain_image = transform(image=image)["image"]

    References:
        - Rain visualization techniques: https://developer.nvidia.com/gpugems/gpugems3/part-iv-image-effects/chapter-27-real-time-rain-rendering
        - Weather effects in computer vision: https://www.sciencedirect.com/science/article/pii/S1077314220300692
    """

    class InitSchema(BaseTransformInitSchema):
        slant_lower: int | None = Field(default=None)
        slant_upper: int | None = Field(default=None)
        slant_range: Annotated[tuple[float, float], AfterValidator(nondecreasing)]
        drop_length: int = Field(ge=1)
        drop_width: int = Field(ge=1)
        drop_color: tuple[int, int, int]
        blur_value: int = Field(ge=1)
        brightness_coefficient: float = Field(gt=0, le=1)
        rain_type: RainMode

        @model_validator(mode="after")
        def validate_ranges(self) -> Self:
            if self.slant_lower is not None or self.slant_upper is not None:
                if self.slant_lower is not None:
                    warn(
                        "`slant_lower` deprecated. Use `slant_range` as tuple (slant_lower, slant_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                if self.slant_upper is not None:
                    warn(
                        "`slant_upper` deprecated. Use `slant_range` as tuple (slant_lower, slant_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                lower = self.slant_lower if self.slant_lower is not None else self.slant_range[0]
                upper = self.slant_upper if self.slant_upper is not None else self.slant_range[1]
                self.slant_range = (lower, upper)
                self.slant_lower = None
                self.slant_upper = None

            # Validate the slant_range
            if not (-MAX_RAIN_ANGLE <= self.slant_range[0] <= self.slant_range[1] <= MAX_RAIN_ANGLE):
                raise ValueError(
                    f"slant_range values should be increasing within [-{MAX_RAIN_ANGLE}, {MAX_RAIN_ANGLE}] range.",
                )
            return self

    def __init__(
        self,
        slant_lower: int | None = None,
        slant_upper: int | None = None,
        slant_range: tuple[int, int] = (-10, 10),
        drop_length: int = 20,
        drop_width: int = 1,
        drop_color: tuple[int, int, int] = (200, 200, 200),
        blur_value: int = 7,
        brightness_coefficient: float = 0.7,
        rain_type: RainMode = "default",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.slant_range = slant_range
        self.drop_length = drop_length
        self.drop_width = drop_width
        self.drop_color = drop_color
        self.blur_value = blur_value
        self.brightness_coefficient = brightness_coefficient
        self.rain_type = rain_type

    def apply(
        self,
        img: np.ndarray,
        slant: int,
        drop_length: int,
        rain_drops: list[tuple[int, int]],
        **params: Any,
    ) -> np.ndarray:
        non_rgb_error(img)

        return fmain.add_rain(
            img,
            slant,
            drop_length,
            self.drop_width,
            self.drop_color,
            self.blur_value,
            self.brightness_coefficient,
            rain_drops,
        )

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        slant = int(random.uniform(*self.slant_range))

        height, width = params["shape"][:2]
        area = height * width

        if self.rain_type == "drizzle":
            num_drops = area // 770
            drop_length = 10
        elif self.rain_type == "heavy":
            num_drops = width * height // 600
            drop_length = 30
        elif self.rain_type == "torrential":
            num_drops = area // 500
            drop_length = 60
        else:
            drop_length = self.drop_length
            num_drops = area // 600

        rain_drops = []

        for _ in range(num_drops):  # If You want heavy rain, try increasing this
            x = random.randint(slant, width) if slant < 0 else random.randint(0, max(width - slant, 0))
            y = random.randint(0, max(height - drop_length, 0))

            rain_drops.append((x, y))

        return {"drop_length": drop_length, "slant": slant, "rain_drops": rain_drops}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "slant_range",
            "drop_length",
            "drop_width",
            "drop_color",
            "blur_value",
            "brightness_coefficient",
            "rain_type",
        )
class InitSchema [view source on GitHub]

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    slant_lower: int | None = Field(default=None)
    slant_upper: int | None = Field(default=None)
    slant_range: Annotated[tuple[float, float], AfterValidator(nondecreasing)]
    drop_length: int = Field(ge=1)
    drop_width: int = Field(ge=1)
    drop_color: tuple[int, int, int]
    blur_value: int = Field(ge=1)
    brightness_coefficient: float = Field(gt=0, le=1)
    rain_type: RainMode

    @model_validator(mode="after")
    def validate_ranges(self) -> Self:
        if self.slant_lower is not None or self.slant_upper is not None:
            if self.slant_lower is not None:
                warn(
                    "`slant_lower` deprecated. Use `slant_range` as tuple (slant_lower, slant_upper) instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
            if self.slant_upper is not None:
                warn(
                    "`slant_upper` deprecated. Use `slant_range` as tuple (slant_lower, slant_upper) instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
            lower = self.slant_lower if self.slant_lower is not None else self.slant_range[0]
            upper = self.slant_upper if self.slant_upper is not None else self.slant_range[1]
            self.slant_range = (lower, upper)
            self.slant_lower = None
            self.slant_upper = None

        # Validate the slant_range
        if not (-MAX_RAIN_ANGLE <= self.slant_range[0] <= self.slant_range[1] <= MAX_RAIN_ANGLE):
            raise ValueError(
                f"slant_range values should be increasing within [-{MAX_RAIN_ANGLE}, {MAX_RAIN_ANGLE}] range.",
            )
        return self

apply (self, img, slant, drop_length, rain_drops, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    slant: int,
    drop_length: int,
    rain_drops: list[tuple[int, int]],
    **params: Any,
) -> np.ndarray:
    non_rgb_error(img)

    return fmain.add_rain(
        img,
        slant,
        drop_length,
        self.drop_width,
        self.drop_color,
        self.blur_value,
        self.brightness_coefficient,
        rain_drops,
    )
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    slant = int(random.uniform(*self.slant_range))

    height, width = params["shape"][:2]
    area = height * width

    if self.rain_type == "drizzle":
        num_drops = area // 770
        drop_length = 10
    elif self.rain_type == "heavy":
        num_drops = width * height // 600
        drop_length = 30
    elif self.rain_type == "torrential":
        num_drops = area // 500
        drop_length = 60
    else:
        drop_length = self.drop_length
        num_drops = area // 600

    rain_drops = []

    for _ in range(num_drops):  # If You want heavy rain, try increasing this
        x = random.randint(slant, width) if slant < 0 else random.randint(0, max(width - slant, 0))
        y = random.randint(0, max(height - drop_length, 0))

        rain_drops.append((x, y))

    return {"drop_length": drop_length, "slant": slant, "rain_drops": rain_drops}
get_transform_init_args_names (self)

Returns the names of arguments that are used in the __init__ method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "slant_range",
        "drop_length",
        "drop_width",
        "drop_color",
        "blur_value",
        "brightness_coefficient",
        "rain_type",
    )

class RandomShadow (shadow_roi=(0, 0.5, 1, 1), num_shadows_limit=(1, 2), num_shadows_lower=None, num_shadows_upper=None, shadow_dimension=5, shadow_intensity_range=(0.5, 0.5), always_apply=None, p=0.5) [view source on GitHub]

Simulates shadows for the image by reducing the brightness of the image in shadow regions.

This transform adds realistic shadow effects to images, which can be useful for augmenting datasets for outdoor scene analysis, autonomous driving, or any computer vision task where shadows may be present.

Parameters:

Name Type Description
shadow_roi tuple[float, float, float, float]

Region of the image where shadows will appear (x_min, y_min, x_max, y_max). All values should be in range [0, 1]. Default: (0, 0.5, 1, 1).

num_shadows_limit tuple[int, int]

Lower and upper limits for the possible number of shadows. Default: (1, 2).

shadow_dimension int

Number of edges in the shadow polygons. Default: 5.

shadow_intensity_range tuple[float, float]

Range for the shadow intensity. Should be two float values between 0 and 1. Default: (0.5, 0.5).

p float

Probability of applying the transform. Default: 0.5.
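
The constructor also still accepts the deprecated num_shadows_lower and num_shadows_upper arguments from the signature above; they are merged into num_shadows_limit with a DeprecationWarning (see the InitSchema validator in the source below). A minimal migration example:

Python
import albumentations as A

# Deprecated style: emits a DeprecationWarning and is converted to num_shadows_limit internally.
old_style = A.RandomShadow(num_shadows_lower=2, num_shadows_upper=4, p=1.0)

# Preferred equivalent.
new_style = A.RandomShadow(num_shadows_limit=(2, 4), p=1.0)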

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • Shadows are created by generating random polygons within the specified ROI and reducing the brightness of the image in these areas.
  • The number of shadows, their shapes, and intensities can be randomized for variety.
  • This transform is particularly useful for:
      • Augmenting datasets for outdoor scene understanding
      • Improving robustness of object detection models to shadowed conditions
      • Simulating different lighting conditions in synthetic datasets

Mathematical Formulation: For each shadow:

  1. A polygon with shadow_dimension vertices is generated within the shadow ROI.
  2. The shadow intensity a is randomly chosen from shadow_intensity_range.
  3. For each pixel (x, y) within the polygon:
     new_pixel_value = original_pixel_value * (1 - a)
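
A minimal sketch of the darkening step, assuming OpenCV is used to rasterize the polygon. The vertices and intensity are made up for illustration; this is not the fmain.add_shadow implementation.

Python
import cv2
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
vertices = np.array([[20, 60], [80, 55], [90, 90], [40, 95], [10, 80]], dtype=np.int32)
intensity = 0.5  # "a" in the formulation above

mask = np.zeros(image.shape[:2], dtype=np.uint8)
cv2.fillPoly(mask, [vertices], 1)  # 1 inside the polygon, 0 outside

shadowed = image.astype(np.float32)
shadowed[mask == 1] *= (1 - intensity)  # new_pixel_value = original_pixel_value * (1 - a)
shadowed = shadowed.astype(np.uint8)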

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
Default usage
Python
>>> transform = A.RandomShadow(p=1.0)
>>> shadowed_image = transform(image=image)["image"]
Custom shadow parameters
Python
>>> transform = A.RandomShadow(
...     shadow_roi=(0.2, 0.2, 0.8, 0.8),
...     num_shadows_limit=(2, 4),
...     shadow_dimension=8,
...     shadow_intensity_range=(0.3, 0.7),
...     p=1.0
... )
>>> shadowed_image = transform(image=image)["image"]
Combining with other transforms
Python
>>> transform = A.Compose([
...     A.RandomShadow(p=0.5),
...     A.RandomBrightnessContrast(p=0.5),
... ])
>>> augmented_image = transform(image=image)["image"]

Interactive Tool Available!

Explore this transform visually and adjust parameters interactively using this tool:

Open Tool

Source code in albumentations/augmentations/transforms.py
Python
class RandomShadow(ImageOnlyTransform):
    """Simulates shadows for the image by reducing the brightness of the image in shadow regions.

    This transform adds realistic shadow effects to images, which can be useful for augmenting
    datasets for outdoor scene analysis, autonomous driving, or any computer vision task where
    shadows may be present.

    Args:
        shadow_roi (tuple[float, float, float, float]): Region of the image where shadows
            will appear (x_min, y_min, x_max, y_max). All values should be in range [0, 1].
            Default: (0, 0.5, 1, 1).
        num_shadows_limit (tuple[int, int]): Lower and upper limits for the possible number of shadows.
            Default: (1, 2).
        shadow_dimension (int): Number of edges in the shadow polygons. Default: 5.
        shadow_intensity_range (tuple[float, float]): Range for the shadow intensity.
            Should be two float values between 0 and 1. Default: (0.5, 0.5).
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - Shadows are created by generating random polygons within the specified ROI and
          reducing the brightness of the image in these areas.
        - The number of shadows, their shapes, and intensities can be randomized for variety.
        - This transform is particularly useful for:
          * Augmenting datasets for outdoor scene understanding
          * Improving robustness of object detection models to shadowed conditions
          * Simulating different lighting conditions in synthetic datasets

    Mathematical Formulation:
        For each shadow:
        1. A polygon with `shadow_dimension` vertices is generated within the shadow ROI.
        2. The shadow intensity a is randomly chosen from `shadow_intensity_range`.
        3. For each pixel (x, y) within the polygon:
           new_pixel_value = original_pixel_value * (1 - a)

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

        # Default usage
        >>> transform = A.RandomShadow(p=1.0)
        >>> shadowed_image = transform(image=image)["image"]

        # Custom shadow parameters
        >>> transform = A.RandomShadow(
        ...     shadow_roi=(0.2, 0.2, 0.8, 0.8),
        ...     num_shadows_limit=(2, 4),
        ...     shadow_dimension=8,
        ...     shadow_intensity_range=(0.3, 0.7),
        ...     p=1.0
        ... )
        >>> shadowed_image = transform(image=image)["image"]

        # Combining with other transforms
        >>> transform = A.Compose([
        ...     A.RandomShadow(p=0.5),
        ...     A.RandomBrightnessContrast(p=0.5),
        ... ])
        >>> augmented_image = transform(image=image)["image"]

    References:
        - Shadow detection and removal: https://www.sciencedirect.com/science/article/pii/S1047320315002035
        - Shadows in computer vision: https://en.wikipedia.org/wiki/Shadow_detection
    """

    class InitSchema(BaseTransformInitSchema):
        shadow_roi: tuple[float, float, float, float]
        num_shadows_limit: Annotated[tuple[int, int], AfterValidator(check_1plus), AfterValidator(nondecreasing)]
        num_shadows_lower: int | None
        num_shadows_upper: int | None
        shadow_dimension: int = Field(ge=1)

        shadow_intensity_range: Annotated[
            tuple[float, float],
            AfterValidator(check_01),
            AfterValidator(nondecreasing),
        ]

        @model_validator(mode="after")
        def validate_shadows(self) -> Self:
            if self.num_shadows_lower is not None:
                warn(
                    "`num_shadows_lower` is deprecated. Use `num_shadows_limit` instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )

            if self.num_shadows_upper is not None:
                warn(
                    "`num_shadows_upper` is deprecated. Use `num_shadows_limit` instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )

            if self.num_shadows_lower is not None or self.num_shadows_upper is not None:
                num_shadows_lower = (
                    self.num_shadows_lower if self.num_shadows_lower is not None else self.num_shadows_limit[0]
                )
                num_shadows_upper = (
                    self.num_shadows_upper if self.num_shadows_upper is not None else self.num_shadows_limit[1]
                )

                self.num_shadows_limit = (num_shadows_lower, num_shadows_upper)
                self.num_shadows_lower = None
                self.num_shadows_upper = None

            shadow_lower_x, shadow_lower_y, shadow_upper_x, shadow_upper_y = self.shadow_roi

            if not 0 <= shadow_lower_x <= shadow_upper_x <= 1 or not 0 <= shadow_lower_y <= shadow_upper_y <= 1:
                raise ValueError(f"Invalid shadow_roi. Got: {self.shadow_roi}")

            if isinstance(self.shadow_intensity_range, float):
                if not (0 <= self.shadow_intensity_range <= 1):
                    raise ValueError(
                        f"shadow_intensity_range value should be within [0, 1] range. "
                        f"Got: {self.shadow_intensity_range}",
                    )
            elif isinstance(self.shadow_intensity_range, tuple):
                if not (0 <= self.shadow_intensity_range[0] <= self.shadow_intensity_range[1] <= 1):
                    raise ValueError(
                        f"shadow_intensity_range values should be within [0, 1] range and increasing. "
                        f"Got: {self.shadow_intensity_range}",
                    )
            else:
                raise TypeError("shadow_intensity_range should be an float or a tuple of floats.")

            return self

    def __init__(
        self,
        shadow_roi: tuple[float, float, float, float] = (0, 0.5, 1, 1),
        num_shadows_limit: tuple[int, int] = (1, 2),
        num_shadows_lower: int | None = None,
        num_shadows_upper: int | None = None,
        shadow_dimension: int = 5,
        shadow_intensity_range: tuple[float, float] = (0.5, 0.5),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.shadow_roi = shadow_roi
        self.shadow_dimension = shadow_dimension
        self.num_shadows_limit = num_shadows_limit
        self.shadow_intensity_range = shadow_intensity_range

    def apply(
        self,
        img: np.ndarray,
        vertices_list: list[np.ndarray],
        intensities: np.ndarray,
        **params: Any,
    ) -> np.ndarray:
        return fmain.add_shadow(img, vertices_list, intensities)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, list[np.ndarray]]:
        height, width = params["shape"][:2]

        num_shadows = random.randint(self.num_shadows_limit[0], self.num_shadows_limit[1])

        x_min, y_min, x_max, y_max = self.shadow_roi

        x_min = int(x_min * width)
        x_max = int(x_max * width)
        y_min = int(y_min * height)
        y_max = int(y_max * height)

        vertices_list = [
            np.stack(
                [
                    random_utils.randint(x_min, x_max, size=5),
                    random_utils.randint(y_min, y_max, size=5),
                ],
                axis=1,
            )
            for _ in range(num_shadows)
        ]

        # Sample shadow intensity for each shadow
        intensities = random_utils.uniform(
            self.shadow_intensity_range[0],
            self.shadow_intensity_range[1],
            size=num_shadows,
        )

        return {"vertices_list": vertices_list, "intensities": intensities}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return (
            "shadow_roi",
            "num_shadows_limit",
            "shadow_dimension",
        )
class InitSchema [view source on GitHub]

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    shadow_roi: tuple[float, float, float, float]
    num_shadows_limit: Annotated[tuple[int, int], AfterValidator(check_1plus), AfterValidator(nondecreasing)]
    num_shadows_lower: int | None
    num_shadows_upper: int | None
    shadow_dimension: int = Field(ge=1)

    shadow_intensity_range: Annotated[
        tuple[float, float],
        AfterValidator(check_01),
        AfterValidator(nondecreasing),
    ]

    @model_validator(mode="after")
    def validate_shadows(self) -> Self:
        if self.num_shadows_lower is not None:
            warn(
                "`num_shadows_lower` is deprecated. Use `num_shadows_limit` instead.",
                DeprecationWarning,
                stacklevel=2,
            )

        if self.num_shadows_upper is not None:
            warn(
                "`num_shadows_upper` is deprecated. Use `num_shadows_limit` instead.",
                DeprecationWarning,
                stacklevel=2,
            )

        if self.num_shadows_lower is not None or self.num_shadows_upper is not None:
            num_shadows_lower = (
                self.num_shadows_lower if self.num_shadows_lower is not None else self.num_shadows_limit[0]
            )
            num_shadows_upper = (
                self.num_shadows_upper if self.num_shadows_upper is not None else self.num_shadows_limit[1]
            )

            self.num_shadows_limit = (num_shadows_lower, num_shadows_upper)
            self.num_shadows_lower = None
            self.num_shadows_upper = None

        shadow_lower_x, shadow_lower_y, shadow_upper_x, shadow_upper_y = self.shadow_roi

        if not 0 <= shadow_lower_x <= shadow_upper_x <= 1 or not 0 <= shadow_lower_y <= shadow_upper_y <= 1:
            raise ValueError(f"Invalid shadow_roi. Got: {self.shadow_roi}")

        if isinstance(self.shadow_intensity_range, float):
            if not (0 <= self.shadow_intensity_range <= 1):
                raise ValueError(
                    f"shadow_intensity_range value should be within [0, 1] range. "
                    f"Got: {self.shadow_intensity_range}",
                )
        elif isinstance(self.shadow_intensity_range, tuple):
            if not (0 <= self.shadow_intensity_range[0] <= self.shadow_intensity_range[1] <= 1):
                raise ValueError(
                    f"shadow_intensity_range values should be within [0, 1] range and increasing. "
                    f"Got: {self.shadow_intensity_range}",
                )
        else:
            raise TypeError("shadow_intensity_range should be an float or a tuple of floats.")

        return self

apply (self, img, vertices_list, intensities, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    vertices_list: list[np.ndarray],
    intensities: np.ndarray,
    **params: Any,
) -> np.ndarray:
    return fmain.add_shadow(img, vertices_list, intensities)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, list[np.ndarray]]:
    height, width = params["shape"][:2]

    num_shadows = random.randint(self.num_shadows_limit[0], self.num_shadows_limit[1])

    x_min, y_min, x_max, y_max = self.shadow_roi

    x_min = int(x_min * width)
    x_max = int(x_max * width)
    y_min = int(y_min * height)
    y_max = int(y_max * height)

    vertices_list = [
        np.stack(
            [
                random_utils.randint(x_min, x_max, size=5),
                random_utils.randint(y_min, y_max, size=5),
            ],
            axis=1,
        )
        for _ in range(num_shadows)
    ]

    # Sample shadow intensity for each shadow
    intensities = random_utils.uniform(
        self.shadow_intensity_range[0],
        self.shadow_intensity_range[1],
        size=num_shadows,
    )

    return {"vertices_list": vertices_list, "intensities": intensities}
get_transform_init_args_names (self)

Returns the names of arguments that are used in the __init__ method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return (
        "shadow_roi",
        "num_shadows_limit",
        "shadow_dimension",
    )

class RandomSnow (snow_point_lower=None, snow_point_upper=None, brightness_coeff=2.5, snow_point_range=(0.1, 0.3), method='bleach', always_apply=None, p=0.5) [view source on GitHub]

Applies a random snow effect to the input image.

This transform simulates snowfall by either bleaching out some pixel values or adding a snow texture to the image, depending on the chosen method.

Parameters:

Name Type Description
snow_point_range tuple[float, float]

Range for the snow point threshold. Both values should be in the (0, 1) range. Default: (0.1, 0.3).

brightness_coeff float

Coefficient applied to increase the brightness of pixels below the snow_point threshold. Larger values lead to more pronounced snow effects. Should be > 0. Default: 2.5.

method Literal["bleach", "texture"]

The snow simulation method to use. Options are:
  • "bleach": Uses a simple pixel value thresholding technique.
  • "texture": Applies a more realistic snow texture overlay.
Default: "bleach" (as in the signature above).

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Note

  • The "bleach" method increases the brightness of pixels above a certain threshold, creating a simple snow effect. This method is faster but may look less realistic.
  • The "texture" method creates a more realistic snow effect through the following steps:
  • Converts the image to HSV color space for better control over brightness.
  • Increases overall image brightness to simulate the reflective nature of snow.
  • Generates a snow texture using Gaussian noise, which is then smoothed with a Gaussian filter.
  • Applies a depth effect to the snow texture, making it more prominent at the top of the image.
  • Blends the snow texture with the original image using alpha compositing.
  • Adds a slight blue tint to simulate the cool color of snow.
  • Adds random sparkle effects to simulate light reflecting off snow crystals. This method produces a more realistic result but is computationally more expensive.

Mathematical Formulation:
For the "bleach" method:
Let L be the lightness channel in HLS color space. For each pixel (i, j):
    If L[i, j] > snow_point:
        L[i, j] = L[i, j] * brightness_coeff

For the "texture" method:
1. Brightness adjustment: V_new = V * (1 + brightness_coeff * snow_point)
2. Snow texture generation: T = GaussianFilter(GaussianNoise(μ=0.5, sigma=0.3))
3. Depth effect: D = LinearGradient(1.0 to 0.2)
4. Final pixel value: P = (1 - alpha) * original_pixel + alpha * (T * D * 255)
   where alpha is the snow intensity factor derived from snow_point.
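
A minimal sketch of the "bleach" branch, assuming OpenCV for the HLS conversion and scaling snow_point to the 0-255 lightness range of a uint8 image (both assumptions). The comparison direction follows the formulation above; note that the parameter table describes the threshold as applying to pixels below snow_point. This is not the library's implementation.

Python
import cv2
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
snow_point, brightness_coeff = 0.3, 2.5

hls = cv2.cvtColor(image, cv2.COLOR_RGB2HLS).astype(np.float32)
lightness = hls[..., 1]              # channel 1 is lightness in OpenCV HLS
mask = lightness > snow_point * 255  # threshold direction as written in the formulation
hls[..., 1][mask] = np.clip(lightness[mask] * brightness_coeff, 0, 255)
snowy = cv2.cvtColor(hls.astype(np.uint8), cv2.COLOR_HLS2RGB)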

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
Default usage (bleach method)
Python
>>> transform = A.RandomSnow(p=1.0)
>>> snowy_image = transform(image=image)["image"]
Using texture method with custom parameters
Python
>>> transform = A.RandomSnow(
...     snow_point_range=(0.2, 0.4),
...     brightness_coeff=2.0,
...     method="texture",
...     p=1.0
... )
>>> snowy_image = transform(image=image)["image"]

References

  • Bleach method: https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
  • Texture method: Inspired by computer graphics techniques for snow rendering and atmospheric scattering simulations.

Source code in albumentations/augmentations/transforms.py
Python
class RandomSnow(ImageOnlyTransform):
    """Applies a random snow effect to the input image.

    This transform simulates snowfall by either bleaching out some pixel values or
    adding a snow texture to the image, depending on the chosen method.

    Args:
        snow_point_range (tuple[float, float]): Range for the snow point threshold.
            Both values should be in the (0, 1) range. Default: (0.1, 0.3).
        brightness_coeff (float): Coefficient applied to increase the brightness of pixels
            below the snow_point threshold. Larger values lead to more pronounced snow effects.
            Should be > 0. Default: 2.5.
        method (Literal["bleach", "texture"]): The snow simulation method to use. Options are:
            - "bleach": Uses a simple pixel value thresholding technique.
            - "texture": Applies a more realistic snow texture overlay.
            Default: "texture".
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        - The "bleach" method increases the brightness of pixels above a certain threshold,
          creating a simple snow effect. This method is faster but may look less realistic.
        - The "texture" method creates a more realistic snow effect through the following steps:
          1. Converts the image to HSV color space for better control over brightness.
          2. Increases overall image brightness to simulate the reflective nature of snow.
          3. Generates a snow texture using Gaussian noise, which is then smoothed with a Gaussian filter.
          4. Applies a depth effect to the snow texture, making it more prominent at the top of the image.
          5. Blends the snow texture with the original image using alpha compositing.
          6. Adds a slight blue tint to simulate the cool color of snow.
          7. Adds random sparkle effects to simulate light reflecting off snow crystals.
          This method produces a more realistic result but is computationally more expensive.

    Mathematical Formulation:
        For the "bleach" method:
        Let L be the lightness channel in HLS color space.
        For each pixel (i, j):
        If L[i, j] > snow_point:
            L[i, j] = L[i, j] * brightness_coeff

        For the "texture" method:
        1. Brightness adjustment: V_new = V * (1 + brightness_coeff * snow_point)
        2. Snow texture generation: T = GaussianFilter(GaussianNoise(μ=0.5, sigma=0.3))
        3. Depth effect: D = LinearGradient(1.0 to 0.2)
        4. Final pixel value: P = (1 - alpha) * original_pixel + alpha * (T * D * 255)
           where alpha is the snow intensity factor derived from snow_point.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

        # Default usage (bleach method)
        >>> transform = A.RandomSnow(p=1.0)
        >>> snowy_image = transform(image=image)["image"]

        # Using texture method with custom parameters
        >>> transform = A.RandomSnow(
        ...     snow_point_range=(0.2, 0.4),
        ...     brightness_coeff=2.0,
        ...     method="texture",
        ...     p=1.0
        ... )
        >>> snowy_image = transform(image=image)["image"]

    References:
        - Bleach method: https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
        - Texture method: Inspired by computer graphics techniques for snow rendering
          and atmospheric scattering simulations.
    """

    class InitSchema(BaseTransformInitSchema):
        snow_point_range: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)]

        snow_point_lower: float | None = Field(
            gt=0,
            lt=1,
        )
        snow_point_upper: float | None = Field(
            gt=0,
            lt=1,
        )
        brightness_coeff: float = Field(gt=0)
        method: Literal["bleach", "texture"]

        @model_validator(mode="after")
        def validate_ranges(self) -> Self:
            if self.snow_point_lower is not None or self.snow_point_upper is not None:
                if self.snow_point_lower is not None:
                    warn(
                        "`snow_point_lower` deprecated. Use `snow_point_range` as tuple"
                        " (snow_point_lower, snow_point_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                if self.snow_point_upper is not None:
                    warn(
                        "`snow_point_upper` deprecated. Use `snow_point_range` as tuple"
                        "(snow_point_lower, snow_point_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                lower = self.snow_point_lower if self.snow_point_lower is not None else self.snow_point_range[0]
                upper = self.snow_point_upper if self.snow_point_upper is not None else self.snow_point_range[1]
                self.snow_point_range = (lower, upper)
                self.snow_point_lower = None
                self.snow_point_upper = None

            # Validate the snow_point_range
            if not (0 < self.snow_point_range[0] <= self.snow_point_range[1] < 1):
                raise ValueError("snow_point_range values should be increasing within (0, 1) range.")

            return self

    def __init__(
        self,
        snow_point_lower: float | None = None,
        snow_point_upper: float | None = None,
        brightness_coeff: float = 2.5,
        snow_point_range: tuple[float, float] = (0.1, 0.3),
        method: Literal["bleach", "texture"] = "bleach",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.snow_point_range = snow_point_range
        self.brightness_coeff = brightness_coeff
        self.method = method

    def apply(self, img: np.ndarray, snow_point: float, **params: Any) -> np.ndarray:
        non_rgb_error(img)

        if self.method == "bleach":
            return fmain.add_snow_bleach(img, snow_point, self.brightness_coeff)
        if self.method == "texture":
            return fmain.add_snow_texture(img, snow_point, self.brightness_coeff)

        raise ValueError(f"Unknown snow method: {self.method}")

    def get_params(self) -> dict[str, np.ndarray]:
        return {"snow_point": random.uniform(*self.snow_point_range)}

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return "snow_point_range", "brightness_coeff"
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    snow_point_range: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)]

    snow_point_lower: float | None = Field(
        gt=0,
        lt=1,
    )
    snow_point_upper: float | None = Field(
        gt=0,
        lt=1,
    )
    brightness_coeff: float = Field(gt=0)
    method: Literal["bleach", "texture"]

    @model_validator(mode="after")
    def validate_ranges(self) -> Self:
        if self.snow_point_lower is not None or self.snow_point_upper is not None:
            if self.snow_point_lower is not None:
                warn(
                    "`snow_point_lower` deprecated. Use `snow_point_range` as tuple"
                    " (snow_point_lower, snow_point_upper) instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
            if self.snow_point_upper is not None:
                warn(
                    "`snow_point_upper` deprecated. Use `snow_point_range` as tuple"
                    "(snow_point_lower, snow_point_upper) instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
            lower = self.snow_point_lower if self.snow_point_lower is not None else self.snow_point_range[0]
            upper = self.snow_point_upper if self.snow_point_upper is not None else self.snow_point_range[1]
            self.snow_point_range = (lower, upper)
            self.snow_point_lower = None
            self.snow_point_upper = None

        # Validate the snow_point_range
        if not (0 < self.snow_point_range[0] <= self.snow_point_range[1] < 1):
            raise ValueError("snow_point_range values should be increasing within (0, 1) range.")

        return self

apply (self, img, snow_point, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, snow_point: float, **params: Any) -> np.ndarray:
    non_rgb_error(img)

    if self.method == "bleach":
        return fmain.add_snow_bleach(img, snow_point, self.brightness_coeff)
    if self.method == "texture":
        return fmain.add_snow_texture(img, snow_point, self.brightness_coeff)

    raise ValueError(f"Unknown snow method: {self.method}")
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, np.ndarray]:
    return {"snow_point": random.uniform(*self.snow_point_range)}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str]:
    return "snow_point_range", "brightness_coeff"

class RandomSunFlare (flare_roi=(0, 0, 1, 0.5), angle_lower=None, angle_upper=None, num_flare_circles_lower=None, num_flare_circles_upper=None, src_radius=400, src_color=(255, 255, 255), angle_range=(0, 1), num_flare_circles_range=(6, 10), method='overlay', always_apply=None, p=0.5) [view source on GitHub]

Simulates a sun flare effect on the image by adding circles of light.

This transform creates a sun flare effect by overlaying multiple semi-transparent circles of varying sizes and intensities along a line originating from a "sun" point. It offers two methods: a simple overlay technique and a more complex physics-based approach.

Parameters:

Name Type Description
flare_roi tuple[float, float, float, float]

Region of interest where the sun flare can appear. Values are in the range [0, 1] and represent (x_min, y_min, x_max, y_max) in relative coordinates. Default: (0, 0, 1, 0.5).

angle_range tuple[float, float]

Range of angles (in radians) for the flare direction. Values should be in the range [0, 1], where 0 represents 0 radians and 1 represents 2π radians. Default: (0, 1).

num_flare_circles_range tuple[int, int]

Range for the number of flare circles to generate. Default: (6, 10).

src_radius int

Radius of the sun circle in pixels. Default: 400.

src_color tuple[int, int, int]

Color of the sun in RGB format. Default: (255, 255, 255).

method Literal["overlay", "physics_based"]

Method to use for generating the sun flare. "overlay" uses a simple alpha blending technique, while "physics_based" simulates more realistic optical phenomena. Default: "overlay".

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels:
  • overlay: Any
  • physics_based: RGB

Note

The transform offers two methods for generating sun flares:

  1. Overlay Method ("overlay"):
     • Creates a simple sun flare effect using basic alpha blending.
     • Steps:
       a. Generate the main sun circle with a radial gradient.
       b. Create smaller flare circles along the flare line.
       c. Blend these elements with the original image using alpha compositing.
     • Characteristics:
       • Faster computation
       • Less realistic appearance
       • Suitable for basic augmentation or when performance is a priority

  2. Physics-based Method ("physics_based"):
     • Simulates more realistic optical phenomena observed in actual lens flares.
     • Steps:
       a. Create a separate flare layer for complex manipulations.
       b. Add the main sun circle and diffraction spikes to simulate light diffraction.
       c. Generate and add multiple flare circles with varying properties.
       d. Apply Gaussian blur to create a soft, glowing effect.
       e. Create and apply a radial gradient mask for natural fading from the center.
       f. Simulate chromatic aberration by applying different blurs to color channels.
       g. Blend the flare with the original image using screen blending mode.
     • Characteristics:
       • More computationally intensive
       • Produces more realistic and visually appealing results
       • Includes effects like diffraction spikes and chromatic aberration
       • Suitable for high-quality augmentation or realistic image synthesis

Mathematical Formulation:

For both methods:
1. Sun position (x_s, y_s) is randomly chosen within the specified ROI.
2. Flare angle θ is randomly chosen from the angle_range.
3. For each flare circle i:
   - Position (x_i, y_i) = (x_s + t_i * cos(θ), y_s + t_i * sin(θ)), where t_i is a random distance along the flare line.
   - Radius r_i is randomly chosen, with larger circles closer to the sun.
   - Alpha (transparency) alpha_i is randomly chosen in the range [0.05, 0.2].
   - Color (R_i, G_i, B_i) is randomly chosen close to src_color.

Overlay method blending:
new_pixel = (1 - alpha_i) * original_pixel + alpha_i * flare_color_i

Physics-based method blending:
new_pixel = 255 - ((255 - original_pixel) * (255 - flare_pixel) / 255)

4. Each flare circle is blended with the image using alpha compositing:
   new_pixel = (1 - alpha_i) * original_pixel + alpha_i * flare_color_i
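
To make the two blending equations concrete, here is a minimal NumPy sketch (illustrative only, assuming uint8 arrays; these helpers are not the library's internal functions):

Python
import numpy as np

def overlay_blend(original: np.ndarray, flare_color: np.ndarray, alpha: float) -> np.ndarray:
    # Alpha compositing, as used by the "overlay" method for each flare circle.
    blended = (1 - alpha) * original.astype(np.float32) + alpha * flare_color.astype(np.float32)
    return np.clip(blended, 0, 255).astype(np.uint8)

def screen_blend(original: np.ndarray, flare_layer: np.ndarray) -> np.ndarray:
    # Screen blending, as used by the "physics_based" method to merge the flare layer.
    orig = original.astype(np.float32)
    flare = flare_layer.astype(np.float32)
    return np.clip(255 - (255 - orig) * (255 - flare) / 255, 0, 255).astype(np.uint8)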

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [1000, 1000, 3], dtype=np.uint8)
Default sun flare (overlay method)
Python
>>> transform = A.RandomSunFlare(p=1.0)
>>> flared_image = transform(image=image)["image"]
Physics-based sun flare with custom parameters
Python
>>> transform = A.RandomSunFlare(
...     flare_roi=(0.1, 0, 0.9, 0.3),
...     angle_range=(0.25, 0.75),
...     num_flare_circles_range=(5, 15),
...     src_radius=200,
...     src_color=(255, 200, 100),
...     method="physics_based",
...     p=1.0
... )
>>> flared_image = transform(image=image)["image"]


Source code in albumentations/augmentations/transforms.py
Python
class RandomSunFlare(ImageOnlyTransform):
    """Simulates a sun flare effect on the image by adding circles of light.

    This transform creates a sun flare effect by overlaying multiple semi-transparent
    circles of varying sizes and intensities along a line originating from a "sun" point.
    It offers two methods: a simple overlay technique and a more complex physics-based approach.

    Args:
        flare_roi (tuple[float, float, float, float]): Region of interest where the sun flare
            can appear. Values are in the range [0, 1] and represent (x_min, y_min, x_max, y_max)
            in relative coordinates. Default: (0, 0, 1, 0.5).
        angle_range (tuple[float, float]): Range of angles (in radians) for the flare direction.
            Values should be in the range [0, 1], where 0 represents 0 radians and 1 represents 2π radians.
            Default: (0, 1).
        num_flare_circles_range (tuple[int, int]): Range for the number of flare circles to generate.
            Default: (6, 10).
        src_radius (int): Radius of the sun circle in pixels. Default: 400.
        src_color (tuple[int, int, int]): Color of the sun in RGB format. Default: (255, 255, 255).
        method (Literal["overlay", "physics_based"]): Method to use for generating the sun flare.
            "overlay" uses a simple alpha blending technique, while "physics_based" simulates
            more realistic optical phenomena. Default: "physics_based".

        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        - overlay: Any
        - physics_based: RGB

    Note:
        The transform offers two methods for generating sun flares:

        1. Overlay Method ("overlay"):
           - Creates a simple sun flare effect using basic alpha blending.
           - Steps:
             a. Generate the main sun circle with a radial gradient.
             b. Create smaller flare circles along the flare line.
             c. Blend these elements with the original image using alpha compositing.
           - Characteristics:
             * Faster computation
             * Less realistic appearance
             * Suitable for basic augmentation or when performance is a priority

        2. Physics-based Method ("physics_based"):
           - Simulates more realistic optical phenomena observed in actual lens flares.
           - Steps:
             a. Create a separate flare layer for complex manipulations.
             b. Add the main sun circle and diffraction spikes to simulate light diffraction.
             c. Generate and add multiple flare circles with varying properties.
             d. Apply Gaussian blur to create a soft, glowing effect.
             e. Create and apply a radial gradient mask for natural fading from the center.
             f. Simulate chromatic aberration by applying different blurs to color channels.
             g. Blend the flare with the original image using screen blending mode.
           - Characteristics:
             * More computationally intensive
             * Produces more realistic and visually appealing results
             * Includes effects like diffraction spikes and chromatic aberration
             * Suitable for high-quality augmentation or realistic image synthesis

    Mathematical Formulation:
        For both methods:
        1. Sun position (x_s, y_s) is randomly chosen within the specified ROI.
        2. Flare angle θ is randomly chosen from the angle_range.
        3. For each flare circle i:
           - Position (x_i, y_i) = (x_s + t_i * cos(θ), y_s + t_i * sin(θ))
             where t_i is a random distance along the flare line.
           - Radius r_i is randomly chosen, with larger circles closer to the sun.
           - Alpha (transparency) alpha_i is randomly chosen in the range [0.05, 0.2].
           - Color (R_i, G_i, B_i) is randomly chosen close to src_color.

        Overlay method blending:
        new_pixel = (1 - alpha_i) * original_pixel + alpha_i * flare_color_i

        Physics-based method blending:
        new_pixel = 255 - ((255 - original_pixel) * (255 - flare_pixel) / 255)

        4. Each flare circle is blended with the image using alpha compositing:
           new_pixel = (1 - alpha_i) * original_pixel + alpha_i * flare_color_i

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [1000, 1000, 3], dtype=np.uint8)

        # Default sun flare (overlay method)
        >>> transform = A.RandomSunFlare(p=1.0)
        >>> flared_image = transform(image=image)["image"]

        # Physics-based sun flare with custom parameters

        # Default sun flare
        >>> transform = A.RandomSunFlare(p=1.0)
        >>> flared_image = transform(image=image)["image"]

        # Custom sun flare parameters

        >>> transform = A.RandomSunFlare(
        ...     flare_roi=(0.1, 0, 0.9, 0.3),
        ...     angle_range=(0.25, 0.75),
        ...     num_flare_circles_range=(5, 15),
        ...     src_radius=200,
        ...     src_color=(255, 200, 100),
        ...     method="physics_based",
        ...     p=1.0
        ... )
        >>> flared_image = transform(image=image)["image"]

    References:
        - Lens flare: https://en.wikipedia.org/wiki/Lens_flare
        - Alpha compositing: https://en.wikipedia.org/wiki/Alpha_compositing
        - Diffraction: https://en.wikipedia.org/wiki/Diffraction
        - Chromatic aberration: https://en.wikipedia.org/wiki/Chromatic_aberration
        - Screen blending: https://en.wikipedia.org/wiki/Blend_modes#Screen
    """

    class InitSchema(BaseTransformInitSchema):
        flare_roi: tuple[float, float, float, float]
        angle_lower: float | None = Field(ge=0, le=1)
        angle_upper: float | None = Field(ge=0, le=1)

        num_flare_circles_lower: int | None = Field(
            ge=0,
        )
        num_flare_circles_upper: int | None = Field(
            gt=0,
        )
        src_radius: int = Field(gt=1)
        src_color: tuple[int, ...]

        angle_range: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)]

        num_flare_circles_range: Annotated[
            tuple[int, int],
            AfterValidator(check_1plus),
            AfterValidator(nondecreasing),
        ]
        method: Literal["overlay", "physics_based"]

        @model_validator(mode="after")
        def validate_parameters(self) -> Self:
            flare_center_lower_x, flare_center_lower_y, flare_center_upper_x, flare_center_upper_y = self.flare_roi
            if (
                not 0 <= flare_center_lower_x < flare_center_upper_x <= 1
                or not 0 <= flare_center_lower_y < flare_center_upper_y <= 1
            ):
                raise ValueError(f"Invalid flare_roi. Got: {self.flare_roi}")

            if self.angle_lower is not None or self.angle_upper is not None:
                if self.angle_lower is not None:
                    warn(
                        "`angle_lower` deprecated. Use `angle_range` as tuple (angle_lower, angle_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                if self.angle_upper is not None:
                    warn(
                        "`angle_upper` deprecated. Use `angle_range` as tuple(angle_lower, angle_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                lower = self.angle_lower if self.angle_lower is not None else self.angle_range[0]
                upper = self.angle_upper if self.angle_upper is not None else self.angle_range[1]
                self.angle_range = (lower, upper)

            if self.num_flare_circles_lower is not None or self.num_flare_circles_upper is not None:
                if self.num_flare_circles_lower is not None:
                    warn(
                        "`num_flare_circles_lower` deprecated. Use `num_flare_circles_range` as tuple"
                        " (num_flare_circles_lower, num_flare_circles_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                if self.num_flare_circles_upper is not None:
                    warn(
                        "`num_flare_circles_upper` deprecated. Use `num_flare_circles_range` as tuple"
                        " (num_flare_circles_lower, num_flare_circles_upper) instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                lower = (
                    self.num_flare_circles_lower
                    if self.num_flare_circles_lower is not None
                    else self.num_flare_circles_range[0]
                )
                upper = (
                    self.num_flare_circles_upper
                    if self.num_flare_circles_upper is not None
                    else self.num_flare_circles_range[1]
                )
                self.num_flare_circles_range = (lower, upper)

            return self

    def __init__(
        self,
        flare_roi: tuple[float, float, float, float] = (0, 0, 1, 0.5),
        angle_lower: float | None = None,
        angle_upper: float | None = None,
        num_flare_circles_lower: int | None = None,
        num_flare_circles_upper: int | None = None,
        src_radius: int = 400,
        src_color: tuple[int, ...] = (255, 255, 255),
        angle_range: tuple[float, float] = (0, 1),
        num_flare_circles_range: tuple[int, int] = (6, 10),
        method: Literal["overlay", "physics_based"] = "overlay",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)

        self.angle_range = angle_range
        self.num_flare_circles_range = num_flare_circles_range

        self.src_radius = src_radius
        self.src_color = src_color
        self.flare_roi = flare_roi
        self.method = method

    def apply(
        self,
        img: np.ndarray,
        flare_center: tuple[float, float],
        circles: list[Any],
        **params: Any,
    ) -> np.ndarray:
        if self.method == "overlay":
            return fmain.add_sun_flare_overlay(
                img,
                flare_center,
                self.src_radius,
                self.src_color,
                circles,
            )
        if self.method == "physics_based":
            non_rgb_error(img)
            return fmain.add_sun_flare_physics_based(
                img,
                flare_center,
                self.src_radius,
                self.src_color,
                circles,
            )

        raise ValueError(f"Invalid method: {self.method}")

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]
        diagonal = math.sqrt(height**2 + width**2)

        angle = 2 * math.pi * random.uniform(*self.angle_range)

        # Calculate flare center in pixel coordinates
        x_min, y_min, x_max, y_max = self.flare_roi
        flare_center_x = int(width * random.uniform(x_min, x_max))
        flare_center_y = int(height * random.uniform(y_min, y_max))

        num_circles = random.randint(*self.num_flare_circles_range)

        # Calculate parameters relative to image size
        step_size = max(1, int(diagonal * 0.01))  # 1% of diagonal, minimum 1 pixel
        max_radius = max(2, int(height * 0.01))  # 1% of height, minimum 2 pixels
        color_range = int(max(self.src_color) * 0.2)  # 20% of max color value

        def line(t: float) -> tuple[float, float]:
            return (flare_center_x + t * math.cos(angle), flare_center_y + t * math.sin(angle))

        # Generate points along the flare line
        t_range = range(-flare_center_x, width - flare_center_x, step_size)
        points = [line(t) for t in t_range]

        circles = []
        for _ in range(num_circles):
            alpha = random.uniform(0.05, 0.2)
            point = random.choice(points)
            rad = random.randint(1, max_radius)

            # Generate colors relative to src_color
            colors = [random.randint(max(c - color_range, 0), c) for c in self.src_color]

            circles.append(
                (
                    alpha,
                    (int(point[0]), int(point[1])),
                    pow(rad, 3),
                    tuple(colors),
                ),
            )

        return {
            "circles": circles,
            "flare_center": (flare_center_x, flare_center_y),
        }

    def get_transform_init_args(self) -> dict[str, Any]:
        return {
            "flare_roi": self.flare_roi,
            "angle_range": self.angle_range,
            "num_flare_circles_range": self.num_flare_circles_range,
            "src_radius": self.src_radius,
            "src_color": self.src_color,
        }
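
As with RandomSnow, the deprecated scalar arguments are folded into the corresponding range tuples by validate_parameters; a brief usage sketch:

Python
>>> import albumentations as A
>>> # Deprecated style: emits DeprecationWarning
>>> old_style = A.RandomSunFlare(angle_lower=0.25, angle_upper=0.75,
...                              num_flare_circles_lower=5, num_flare_circles_upper=15, p=1.0)
>>> # Equivalent current style
>>> new_style = A.RandomSunFlare(angle_range=(0.25, 0.75), num_flare_circles_range=(5, 15), p=1.0)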
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    flare_roi: tuple[float, float, float, float]
    angle_lower: float | None = Field(ge=0, le=1)
    angle_upper: float | None = Field(ge=0, le=1)

    num_flare_circles_lower: int | None = Field(
        ge=0,
    )
    num_flare_circles_upper: int | None = Field(
        gt=0,
    )
    src_radius: int = Field(gt=1)
    src_color: tuple[int, ...]

    angle_range: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)]

    num_flare_circles_range: Annotated[
        tuple[int, int],
        AfterValidator(check_1plus),
        AfterValidator(nondecreasing),
    ]
    method: Literal["overlay", "physics_based"]

    @model_validator(mode="after")
    def validate_parameters(self) -> Self:
        flare_center_lower_x, flare_center_lower_y, flare_center_upper_x, flare_center_upper_y = self.flare_roi
        if (
            not 0 <= flare_center_lower_x < flare_center_upper_x <= 1
            or not 0 <= flare_center_lower_y < flare_center_upper_y <= 1
        ):
            raise ValueError(f"Invalid flare_roi. Got: {self.flare_roi}")

        if self.angle_lower is not None or self.angle_upper is not None:
            if self.angle_lower is not None:
                warn(
                    "`angle_lower` deprecated. Use `angle_range` as tuple (angle_lower, angle_upper) instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
            if self.angle_upper is not None:
                warn(
                    "`angle_upper` deprecated. Use `angle_range` as tuple(angle_lower, angle_upper) instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
            lower = self.angle_lower if self.angle_lower is not None else self.angle_range[0]
            upper = self.angle_upper if self.angle_upper is not None else self.angle_range[1]
            self.angle_range = (lower, upper)

        if self.num_flare_circles_lower is not None or self.num_flare_circles_upper is not None:
            if self.num_flare_circles_lower is not None:
                warn(
                    "`num_flare_circles_lower` deprecated. Use `num_flare_circles_range` as tuple"
                    " (num_flare_circles_lower, num_flare_circles_upper) instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
            if self.num_flare_circles_upper is not None:
                warn(
                    "`num_flare_circles_upper` deprecated. Use `num_flare_circles_range` as tuple"
                    " (num_flare_circles_lower, num_flare_circles_upper) instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
            lower = (
                self.num_flare_circles_lower
                if self.num_flare_circles_lower is not None
                else self.num_flare_circles_range[0]
            )
            upper = (
                self.num_flare_circles_upper
                if self.num_flare_circles_upper is not None
                else self.num_flare_circles_range[1]
            )
            self.num_flare_circles_range = (lower, upper)

        return self

apply (self, img, flare_center, circles, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    flare_center: tuple[float, float],
    circles: list[Any],
    **params: Any,
) -> np.ndarray:
    if self.method == "overlay":
        return fmain.add_sun_flare_overlay(
            img,
            flare_center,
            self.src_radius,
            self.src_color,
            circles,
        )
    if self.method == "physics_based":
        non_rgb_error(img)
        return fmain.add_sun_flare_physics_based(
            img,
            flare_center,
            self.src_radius,
            self.src_color,
            circles,
        )

    raise ValueError(f"Invalid method: {self.method}")
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]
    diagonal = math.sqrt(height**2 + width**2)

    angle = 2 * math.pi * random.uniform(*self.angle_range)

    # Calculate flare center in pixel coordinates
    x_min, y_min, x_max, y_max = self.flare_roi
    flare_center_x = int(width * random.uniform(x_min, x_max))
    flare_center_y = int(height * random.uniform(y_min, y_max))

    num_circles = random.randint(*self.num_flare_circles_range)

    # Calculate parameters relative to image size
    step_size = max(1, int(diagonal * 0.01))  # 1% of diagonal, minimum 1 pixel
    max_radius = max(2, int(height * 0.01))  # 1% of height, minimum 2 pixels
    color_range = int(max(self.src_color) * 0.2)  # 20% of max color value

    def line(t: float) -> tuple[float, float]:
        return (flare_center_x + t * math.cos(angle), flare_center_y + t * math.sin(angle))

    # Generate points along the flare line
    t_range = range(-flare_center_x, width - flare_center_x, step_size)
    points = [line(t) for t in t_range]

    circles = []
    for _ in range(num_circles):
        alpha = random.uniform(0.05, 0.2)
        point = random.choice(points)
        rad = random.randint(1, max_radius)

        # Generate colors relative to src_color
        colors = [random.randint(max(c - color_range, 0), c) for c in self.src_color]

        circles.append(
            (
                alpha,
                (int(point[0]), int(point[1])),
                pow(rad, 3),
                tuple(colors),
            ),
        )

    return {
        "circles": circles,
        "flare_center": (flare_center_x, flare_center_y),
    }

class RandomToneCurve (scale=0.1, per_channel=False, always_apply=None, p=0.5) [view source on GitHub]

Randomly change the relationship between bright and dark areas of the image by manipulating its tone curve.

This transform applies a random S-curve to the image's tone curve, adjusting the brightness and contrast in a non-linear manner. It can be applied to the entire image or to each channel separately.

Parameters:

Name Type Description
scale float

Standard deviation of the normal distribution used to sample random distances to move two control points that modify the image's curve. Values should be in range [0, 1]. Higher values will result in more dramatic changes to the image. Default: 0.1

per_channel bool

If True, the tone curve will be applied to each channel of the input image separately, which can lead to color distortion. If False, the same curve is applied to all channels, preserving the original color relationships. Default: False

p float

Probability of applying the transform. Default: 0.5

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • This transform modifies the image's histogram by applying a smooth, S-shaped curve to it.
  • The S-curve is defined by moving two control points of a quadratic Bézier curve.
  • When per_channel is False, the same curve is applied to all channels, maintaining color balance.
  • When per_channel is True, different curves are applied to each channel, which can create color shifts.
  • This transform can be used to adjust image contrast and brightness in a more natural way than linear transforms.
  • The effect can range from subtle contrast adjustments to more dramatic "vintage" or "faded" looks.

Mathematical Formulation:
1. Two control points are randomly moved from their default positions (0.25, 0.25) and (0.75, 0.75).
2. The new positions are sampled from a normal distribution N(μ, σ²), where μ is the original position and σ is the scale parameter.
3. These points, along with fixed points at (0, 0) and (1, 1), define a quadratic Bézier curve.
4. The curve is applied as a lookup table to the image intensities: new_intensity = curve(original_intensity)
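
A minimal sketch of step 4, assuming the curve is evaluated as a Bézier-style blend of the moved control points and baked into a 256-entry lookup table for uint8 images (illustrative only, not the library's exact implementation):

Python
import numpy as np

def tone_curve_lut(low_y: float, high_y: float) -> np.ndarray:
    # Sample a smooth curve through (0, 0), the moved control points, and (1, 1)
    # over the 256 possible uint8 intensities.
    t = np.linspace(0.0, 1.0, 256)
    curve = 3 * (1 - t) ** 2 * t * low_y + 3 * (1 - t) * t ** 2 * high_y + t ** 3
    return np.clip(curve * 255, 0, 255).astype(np.uint8)

# Usage on a uint8 image: toned = tone_curve_lut(low_y=0.3, high_y=0.8)[image]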

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
Apply a random tone curve to all channels together
Python
>>> transform = A.RandomToneCurve(scale=0.1, per_channel=False, p=1.0)
>>> augmented_image = transform(image=image)['image']
Apply random tone curves to each channel separately
Python
>>> transform = A.RandomToneCurve(scale=0.2, per_channel=True, p=1.0)
>>> augmented_image = transform(image=image)['image']

References

  • "What Else Can Fool Deep Learning? Addressing Color Constancy Errors on Deep Neural Network Performance" by Mahmoud Afifi and Michael S. Brown, ICCV 2019.
  • Bézier curve: https://en.wikipedia.org/wiki/B%C3%A9zier_curve#Quadratic_B%C3%A9zier_curves
  • Tone mapping: https://en.wikipedia.org/wiki/Tone_mapping

Source code in albumentations/augmentations/transforms.py
Python
class RandomToneCurve(ImageOnlyTransform):
    """Randomly change the relationship between bright and dark areas of the image by manipulating its tone curve.

    This transform applies a random S-curve to the image's tone curve, adjusting the brightness and contrast
    in a non-linear manner. It can be applied to the entire image or to each channel separately.

    Args:
        scale (float): Standard deviation of the normal distribution used to sample random distances
            to move two control points that modify the image's curve. Values should be in range [0, 1].
            Higher values will result in more dramatic changes to the image. Default: 0.1
        per_channel (bool): If True, the tone curve will be applied to each channel of the input image separately,
            which can lead to color distortion. If False, the same curve is applied to all channels,
            preserving the original color relationships. Default: False
        p (float): Probability of applying the transform. Default: 0.5

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - This transform modifies the image's histogram by applying a smooth, S-shaped curve to it.
        - The S-curve is defined by moving two control points of a quadratic Bézier curve.
        - When per_channel is False, the same curve is applied to all channels, maintaining color balance.
        - When per_channel is True, different curves are applied to each channel, which can create color shifts.
        - This transform can be used to adjust image contrast and brightness in a more natural way than linear
            transforms.
        - The effect can range from subtle contrast adjustments to more dramatic "vintage" or "faded" looks.

    Mathematical Formulation:
        1. Two control points are randomly moved from their default positions (0.25, 0.25) and (0.75, 0.75).
        2. The new positions are sampled from a normal distribution: N(μ, σ²), where μ is the original position
        and alpha is the scale parameter.
        3. These points, along with fixed points at (0, 0) and (1, 1), define a quadratic Bézier curve.
        4. The curve is applied as a lookup table to the image intensities:
           new_intensity = curve(original_intensity)

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

        # Apply a random tone curve to all channels together
        >>> transform = A.RandomToneCurve(scale=0.1, per_channel=False, p=1.0)
        >>> augmented_image = transform(image=image)['image']

        # Apply random tone curves to each channel separately
        >>> transform = A.RandomToneCurve(scale=0.2, per_channel=True, p=1.0)
        >>> augmented_image = transform(image=image)['image']

    References:
        - "What Else Can Fool Deep Learning? Addressing Color Constancy Errors on Deep Neural Network Performance"
          by Mahmoud Afifi and Michael S. Brown, ICCV 2019.
        - Bézier curve: https://en.wikipedia.org/wiki/B%C3%A9zier_curve#Quadratic_B%C3%A9zier_curves
        - Tone mapping: https://en.wikipedia.org/wiki/Tone_mapping
    """

    class InitSchema(BaseTransformInitSchema):
        scale: float = Field(
            ge=0,
            le=1,
        )
        per_channel: bool

    def __init__(
        self,
        scale: float = 0.1,
        per_channel: bool = False,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.scale = scale
        self.per_channel = per_channel

    def apply(
        self,
        img: np.ndarray,
        low_y: float | np.ndarray,
        high_y: float | np.ndarray,
        **params: Any,
    ) -> np.ndarray:
        return fmain.move_tone_curve(img, low_y, high_y)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        image = data["image"] if "image" in data else data["images"][0]
        num_channels = get_num_channels(image)

        if self.per_channel and num_channels != 1:
            return {
                "low_y": np.clip(random_utils.normal(loc=0.25, scale=self.scale, size=(num_channels,)), 0, 1),
                "high_y": np.clip(random_utils.normal(loc=0.75, scale=self.scale, size=(num_channels,)), 0, 1),
            }
        # Same values for all channels
        low_y = np.clip(random_utils.normal(loc=0.25, scale=self.scale), 0, 1)
        high_y = np.clip(random_utils.normal(loc=0.75, scale=self.scale), 0, 1)

        return {"low_y": low_y, "high_y": high_y}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "scale", "per_channel"
class InitSchema


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    scale: float = Field(
        ge=0,
        le=1,
    )
    per_channel: bool

apply (self, img, low_y, high_y, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    low_y: float | np.ndarray,
    high_y: float | np.ndarray,
    **params: Any,
) -> np.ndarray:
    return fmain.move_tone_curve(img, low_y, high_y)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    image = data["image"] if "image" in data else data["images"][0]
    num_channels = get_num_channels(image)

    if self.per_channel and num_channels != 1:
        return {
            "low_y": np.clip(random_utils.normal(loc=0.25, scale=self.scale, size=(num_channels,)), 0, 1),
            "high_y": np.clip(random_utils.normal(loc=0.75, scale=self.scale, size=(num_channels,)), 0, 1),
        }
    # Same values for all channels
    low_y = np.clip(random_utils.normal(loc=0.25, scale=self.scale), 0, 1)
    high_y = np.clip(random_utils.normal(loc=0.75, scale=self.scale), 0, 1)

    return {"low_y": low_y, "high_y": high_y}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "scale", "per_channel"

class RingingOvershoot (blur_limit=(7, 15), cutoff=(0.7853981633974483, 1.5707963267948966), always_apply=None, p=0.5) [view source on GitHub]

Create ringing or overshoot artifacts by convolving the image with a 2D sinc filter.

This transform simulates the ringing artifacts that can occur in digital image processing, particularly after sharpening or edge enhancement operations. It creates oscillations or overshoots near sharp transitions in the image.

Parameters:

Name Type Description
blur_limit tuple[int, int] | int

Maximum kernel size for the sinc filter. Must be an odd number in the range [3, inf). If a single int is provided, the kernel size will be randomly chosen from the range (3, blur_limit). If a tuple (min, max) is provided, the kernel size will be randomly chosen from the range (min, max). Default: (7, 15).

cutoff tuple[float, float]

Range to choose the cutoff frequency in radians. Values should be in the range (0, π). A lower cutoff frequency will result in more pronounced ringing effects. Default: (π/4, π/2).

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • Ringing artifacts are oscillations of the image intensity function in the neighborhood of sharp transitions, such as edges or object boundaries.
  • This transform uses a 2D sinc filter (also known as a 2D cardinal sine function) to introduce these artifacts.
  • The severity of the ringing effect is controlled by both the kernel size (blur_limit) and the cutoff frequency.
  • Larger kernel sizes and lower cutoff frequencies will generally produce more noticeable ringing effects.
  • This transform can be useful for:
  • Simulating imperfections in image processing or transmission systems
  • Testing the robustness of computer vision models to ringing artifacts
  • Creating artistic effects that emphasize edges and transitions in images

Mathematical Formulation: The 2D sinc filter kernel is defined as:

K(x, y) = cutoff * J₁(cutoff * √(x² + y²)) / (2π * √(x² + y²))

where:
- J₁ is the Bessel function of the first kind of order 1
- cutoff is the chosen cutoff frequency
- x and y are the distances from the kernel center

The filtered image I' is obtained by convolving the input image I with the kernel K:

I'(x, y) = ∑∑ I(x-u, y-v) * K(u, v)

The convolution operation introduces the ringing artifacts near sharp transitions.
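
For illustration, once a normalized sinc kernel K has been built (as in get_params below), the convolution step can be performed with OpenCV; a minimal sketch (border handling may differ from the library's own convolve helper):

Python
import cv2
import numpy as np

def apply_ringing_kernel(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # Convolve the image with the 2D sinc kernel; ddepth=-1 keeps the input dtype.
    return cv2.filter2D(image, ddepth=-1, kernel=kernel)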

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
Apply ringing effect with default parameters
Python
>>> transform = A.RingingOvershoot(p=1.0)
>>> ringing_image = transform(image=image)['image']
Apply ringing effect with custom parameters
Python
>>> transform = A.RingingOvershoot(
...     blur_limit=(9, 17),
...     cutoff=(np.pi/6, np.pi/3),
...     p=1.0
... )
>>> ringing_image = transform(image=image)['image']

References

  • Ringing artifacts: https://en.wikipedia.org/wiki/Ringing_artifacts
  • Sinc filter: https://en.wikipedia.org/wiki/Sinc_filter
  • "The Importance of Ringing Artifacts in Image Processing" by Jae S. Lim, 1981
  • "Digital Image Processing" by Rafael C. Gonzalez and Richard E. Woods, 4th Edition

Source code in albumentations/augmentations/transforms.py
Python
class RingingOvershoot(ImageOnlyTransform):
    """Create ringing or overshoot artifacts by convolving the image with a 2D sinc filter.

    This transform simulates the ringing artifacts that can occur in digital image processing,
    particularly after sharpening or edge enhancement operations. It creates oscillations
    or overshoots near sharp transitions in the image.

    Args:
        blur_limit (tuple[int, int] | int): Maximum kernel size for the sinc filter.
            Must be an odd number in the range [3, inf).
            If a single int is provided, the kernel size will be randomly chosen
            from the range (3, blur_limit). If a tuple (min, max) is provided,
            the kernel size will be randomly chosen from the range (min, max).
            Default: (7, 15).
        cutoff (tuple[float, float]): Range to choose the cutoff frequency in radians.
            Values should be in the range (0, π). A lower cutoff frequency will
            result in more pronounced ringing effects.
            Default: (π/4, π/2).
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - Ringing artifacts are oscillations of the image intensity function in the neighborhood
          of sharp transitions, such as edges or object boundaries.
        - This transform uses a 2D sinc filter (also known as a 2D cardinal sine function)
          to introduce these artifacts.
        - The severity of the ringing effect is controlled by both the kernel size (blur_limit)
          and the cutoff frequency.
        - Larger kernel sizes and lower cutoff frequencies will generally produce more
          noticeable ringing effects.
        - This transform can be useful for:
          * Simulating imperfections in image processing or transmission systems
          * Testing the robustness of computer vision models to ringing artifacts
          * Creating artistic effects that emphasize edges and transitions in images

    Mathematical Formulation:
        The 2D sinc filter kernel is defined as:

        K(x, y) = cutoff * J₁(cutoff * √(x² + y²)) / (2π * √(x² + y²))

        where:
        - J₁ is the Bessel function of the first kind of order 1
        - cutoff is the chosen cutoff frequency
        - x and y are the distances from the kernel center

        The filtered image I' is obtained by convolving the input image I with the kernel K:

        I'(x, y) = ∑∑ I(x-u, y-v) * K(u, v)

        The convolution operation introduces the ringing artifacts near sharp transitions.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

        # Apply ringing effect with default parameters
        >>> transform = A.RingingOvershoot(p=1.0)
        >>> ringing_image = transform(image=image)['image']

        # Apply ringing effect with custom parameters
        >>> transform = A.RingingOvershoot(
        ...     blur_limit=(9, 17),
        ...     cutoff=(np.pi/6, np.pi/3),
        ...     p=1.0
        ... )
        >>> ringing_image = transform(image=image)['image']

    References:
        - Ringing artifacts: https://en.wikipedia.org/wiki/Ringing_artifacts
        - Sinc filter: https://en.wikipedia.org/wiki/Sinc_filter
        - "The Importance of Ringing Artifacts in Image Processing" by Jae S. Lim, 1981
        - "Digital Image Processing" by Rafael C. Gonzalez and Richard E. Woods, 4th Edition
    """

    class InitSchema(BlurInitSchema):
        blur_limit: ScaleIntType
        cutoff: Annotated[tuple[float, float], nondecreasing]

        @field_validator("cutoff")
        @classmethod
        def check_cutoff(cls, v: tuple[float, float], info: ValidationInfo) -> tuple[float, float]:
            bounds = 0, np.pi
            check_range(v, *bounds, info.field_name)
            return v

    def __init__(
        self,
        blur_limit: ScaleIntType = (7, 15),
        cutoff: tuple[float, float] = (np.pi / 4, np.pi / 2),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.blur_limit = cast(Tuple[int, int], blur_limit)
        self.cutoff = cutoff

    def get_params(self) -> dict[str, np.ndarray]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
        if ksize % 2 == 0:
            raise ValueError(f"Kernel size must be odd. Got: {ksize}")

        cutoff = random.uniform(*self.cutoff)

        # From dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter
        with np.errstate(divide="ignore", invalid="ignore"):
            kernel = np.fromfunction(
                lambda x, y: cutoff
                * special.j1(cutoff * np.sqrt((x - (ksize - 1) / 2) ** 2 + (y - (ksize - 1) / 2) ** 2))
                / (2 * np.pi * np.sqrt((x - (ksize - 1) / 2) ** 2 + (y - (ksize - 1) / 2) ** 2)),
                [ksize, ksize],
            )
        kernel[(ksize - 1) // 2, (ksize - 1) // 2] = cutoff**2 / (4 * np.pi)

        # Normalize kernel
        kernel = kernel.astype(np.float32) / np.sum(kernel)

        return {"kernel": kernel}

    def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.convolve(img, kernel)

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("blur_limit", "cutoff")
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BlurInitSchema):
    blur_limit: ScaleIntType
    cutoff: Annotated[tuple[float, float], nondecreasing]

    @field_validator("cutoff")
    @classmethod
    def check_cutoff(cls, v: tuple[float, float], info: ValidationInfo) -> tuple[float, float]:
        bounds = 0, np.pi
        check_range(v, *bounds, info.field_name)
        return v

apply (self, img, kernel, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.convolve(img, kernel)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, np.ndarray]:
    ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
    if ksize % 2 == 0:
        raise ValueError(f"Kernel size must be odd. Got: {ksize}")

    cutoff = random.uniform(*self.cutoff)

    # From dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter
    with np.errstate(divide="ignore", invalid="ignore"):
        kernel = np.fromfunction(
            lambda x, y: cutoff
            * special.j1(cutoff * np.sqrt((x - (ksize - 1) / 2) ** 2 + (y - (ksize - 1) / 2) ** 2))
            / (2 * np.pi * np.sqrt((x - (ksize - 1) / 2) ** 2 + (y - (ksize - 1) / 2) ** 2)),
            [ksize, ksize],
        )
    kernel[(ksize - 1) // 2, (ksize - 1) // 2] = cutoff**2 / (4 * np.pi)

    # Normalize kernel
    kernel = kernel.astype(np.float32) / np.sum(kernel)

    return {"kernel": kernel}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("blur_limit", "cutoff")

class Sharpen (alpha=(0.2, 0.5), lightness=(0.5, 1.0), always_apply=None, p=0.5) [view source on GitHub]

Sharpen the input image and overlay the result with the original image.

This transform applies a sharpening filter to the input image and then blends the sharpened image with the original using a specified alpha value.

Parameters:

Name Type Description
alpha tuple[float, float]

Range to choose the visibility of the sharpened image. At 0, only the original image is visible; at 1.0, only its sharpened version is visible. Values should be in the range [0, 1]. Default: (0.2, 0.5).

lightness tuple of float

Range to choose the lightness of the sharpened image. Larger values will create images with higher contrast. Values should be greater than 0. Default: (0.5, 1.0).

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The sharpening effect is achieved using a 3x3 sharpening kernel.
  • The kernel is dynamically generated based on the 'alpha' and 'lightness' parameters.
  • Higher 'alpha' values will result in a more pronounced sharpening effect.
  • Higher 'lightness' values will increase the contrast of the sharpened areas.
  • This transform can be useful for:
  • Enhancing edge details in images
  • Improving the perceived quality of slightly blurred images
  • Creating a more crisp appearance in photographs

Mathematical Formulation: The sharpening kernel K is defined as:

K = (1 - alpha) * I + alpha * L

where:
- alpha is the alpha value (from the 'alpha' parameter)
- I is the identity kernel [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
- L is the Laplacian kernel [[-1, -1, -1], [-1, 8+l, -1], [-1, -1, -1]]
  (l is the lightness value from the 'lightness' parameter)

The sharpened image S is obtained by convolving the input image I with the kernel K:

S = I * K

The final output O is a blend of the original and sharpened images:

O = (1 - alpha) * I + alpha * S
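
A minimal numpy sketch of the kernel construction described above, with illustrative alpha and lightness samples (convolving the image with this kernel performs the sharpening and the blend with the original in a single step):

Python
import numpy as np

alpha, lightness = 0.4, 1.0  # illustrative samples from the `alpha` and `lightness` ranges

identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
laplacian = np.array([[-1, -1, -1], [-1, 8 + lightness, -1], [-1, -1, -1]], dtype=np.float32)

# K = (1 - alpha) * I + alpha * L, as in the formulation above
kernel = (1 - alpha) * identity + alpha * laplacian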

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
Apply sharpening with default parameters
Python
>>> transform = A.Sharpen(p=1.0)
>>> sharpened_image = transform(image=image)['image']
Apply sharpening with custom parameters
Python
>>> transform = A.Sharpen(alpha=(0.4, 0.7), lightness=(0.8, 1.2), p=1.0)
>>> sharpened_image = transform(image=image)['image']

References

  • Image sharpening: https://en.wikipedia.org/wiki/Unsharp_masking
  • Laplacian operator: https://en.wikipedia.org/wiki/Laplace_operator
  • "Digital Image Processing" by Rafael C. Gonzalez and Richard E. Woods, 4th Edition

Source code in albumentations/augmentations/transforms.py
Python
class Sharpen(ImageOnlyTransform):
    """Sharpen the input image and overlays the result with the original image.

    This transform applies a sharpening filter to the input image and then blends
    the sharpened image with the original using a specified alpha value.

    Args:
        alpha (tuple[float, float]): Range to choose the visibility of the sharpened image.
            At 0, only the original image is visible, at 1.0 only its sharpened version is visible.
            Values should be in the range [0, 1].
            Default: (0.2, 0.5).

        lightness (tuple of float): Range to choose the lightness of the sharpened image.
            Larger values will create images with higher contrast.
            Values should be greater than 0.
            Default: (0.5, 1.0).

        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The sharpening effect is achieved using a 3x3 sharpening kernel.
        - The kernel is dynamically generated based on the 'alpha' and 'lightness' parameters.
        - Higher 'alpha' values will result in a more pronounced sharpening effect.
        - Higher 'lightness' values will increase the contrast of the sharpened areas.
        - This transform can be useful for:
          * Enhancing edge details in images
          * Improving the perceived quality of slightly blurred images
          * Creating a more crisp appearance in photographs

    Mathematical Formulation:
        The sharpening kernel K is defined as:

        K = (1 - alpha) * I + alpha * L

        where:
        - alpha is the alpha value (from the 'alpha' parameter)
        - I is the identity kernel [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
        - L is the Laplacian kernel [[-1, -1, -1], [-1, 8+l, -1], [-1, -1, -1]]
          (l is the lightness value from the 'lightness' parameter)

        The sharpened image S is obtained by convolving the input image I with the kernel K:

        S = I * K

        The final output O is a blend of the original and sharpened images:

        O = (1 - alpha) * I + alpha * S

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

        # Apply sharpening with default parameters
        >>> transform = A.Sharpen(p=1.0)
        >>> sharpened_image = transform(image=image)['image']

        # Apply sharpening with custom parameters
        >>> transform = A.Sharpen(alpha=(0.4, 0.7), lightness=(0.8, 1.2), p=1.0)
        >>> sharpened_image = transform(image=image)['image']

    References:
        - Image sharpening: https://en.wikipedia.org/wiki/Unsharp_masking
        - Laplacian operator: https://en.wikipedia.org/wiki/Laplace_operator
        - "Digital Image Processing" by Rafael C. Gonzalez and Richard E. Woods, 4th Edition
    """

    class InitSchema(BaseTransformInitSchema):
        alpha: Annotated[tuple[float, float], AfterValidator(check_01)]
        lightness: Annotated[tuple[float, float], AfterValidator(check_0plus)]

    def __init__(
        self,
        alpha: tuple[float, float] = (0.2, 0.5),
        lightness: tuple[float, float] = (0.5, 1.0),
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.alpha = alpha
        self.lightness = lightness

    @staticmethod
    def __generate_sharpening_matrix(alpha_sample: np.ndarray, lightness_sample: np.ndarray) -> np.ndarray:
        matrix_nochange = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
        matrix_effect = np.array(
            [[-1, -1, -1], [-1, 8 + lightness_sample, -1], [-1, -1, -1]],
            dtype=np.float32,
        )

        return (1 - alpha_sample) * matrix_nochange + alpha_sample * matrix_effect

    def get_params(self) -> dict[str, np.ndarray]:
        alpha = random.uniform(*self.alpha)
        lightness = random.uniform(*self.lightness)
        sharpening_matrix = self.__generate_sharpening_matrix(alpha_sample=alpha, lightness_sample=lightness)
        return {"sharpening_matrix": sharpening_matrix}

    def apply(self, img: np.ndarray, sharpening_matrix: np.ndarray, **params: Any) -> np.ndarray:
        return fmain.convolve(img, sharpening_matrix)

    def get_transform_init_args_names(self) -> tuple[str, str]:
        return ("alpha", "lightness")
class InitSchema


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    alpha: Annotated[tuple[float, float], AfterValidator(check_01)]
    lightness: Annotated[tuple[float, float], AfterValidator(check_0plus)]

apply (self, img, sharpening_matrix, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, sharpening_matrix: np.ndarray, **params: Any) -> np.ndarray:
    return fmain.convolve(img, sharpening_matrix)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, np.ndarray]:
    alpha = random.uniform(*self.alpha)
    lightness = random.uniform(*self.lightness)
    sharpening_matrix = self.__generate_sharpening_matrix(alpha_sample=alpha, lightness_sample=lightness)
    return {"sharpening_matrix": sharpening_matrix}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str]:
    return ("alpha", "lightness")

class Solarize (threshold=(128, 128), p=0.5, always_apply=None) [view source on GitHub]

Invert all pixel values above a threshold.

Parameters:

Name Type Description
threshold ScaleIntType

Range for the solarizing threshold. If threshold is a single value, the range will be [1, threshold]. Default: (128, 128).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32
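
The underlying operation is a simple conditional inversion. A minimal sketch, assuming a uint8 image and a fixed threshold (an illustration, not the library's internal implementation):

Python
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
threshold = 128  # illustrative fixed threshold

# Pixels at or above the threshold are inverted; the rest are left unchanged (assumed convention)
solarized = np.where(image >= threshold, 255 - image, image)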


Source code in albumentations/augmentations/transforms.py
Python
class Solarize(ImageOnlyTransform):
    """Invert all pixel values above a threshold.

    Args:
        threshold: range for solarizing threshold.
            If threshold is a single value, the range will be [1, threshold]. Default: 128.
        p: probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    """

    class InitSchema(BaseTransformInitSchema):
        threshold: OnePlusFloatRangeType = (128, 128)

    def __init__(self, threshold: ScaleIntType = (128, 128), p: float = 0.5, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.threshold = cast(Tuple[float, float], threshold)

    def apply(self, img: np.ndarray, threshold: int, **params: Any) -> np.ndarray:
        return fmain.solarize(img, threshold)

    def get_params(self) -> dict[str, float]:
        return {"threshold": random.uniform(self.threshold[0], self.threshold[1])}

    def get_transform_init_args_names(self) -> tuple[str]:
        return ("threshold",)
class InitSchema


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    threshold: OnePlusFloatRangeType = (128, 128)

apply (self, img, threshold, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, threshold: int, **params: Any) -> np.ndarray:
    return fmain.solarize(img, threshold)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, float]:
    return {"threshold": random.uniform(self.threshold[0], self.threshold[1])}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str]:
    return ("threshold",)

class Spatter (mean=(0.65, 0.65), std=(0.3, 0.3), gauss_sigma=(2, 2), cutout_threshold=(0.68, 0.68), intensity=(0.6, 0.6), mode='rain', color=None, always_apply=None, p=0.5) [view source on GitHub]

Apply spatter transform. It simulates corruption which can occlude a lens in the form of rain or mud.

Parameters:

Name Type Description
mean float, or tuple of floats

Mean value of the normal distribution used to generate the liquid layer. If a single float, the mean will be sampled from (0, mean). If a tuple of floats, it will be sampled from the range (mean[0], mean[1]). For a constant value use (mean, mean). Default: (0.65, 0.65).

std float, or tuple of floats

Standard deviation of the normal distribution used to generate the liquid layer. If a single float, the value will be sampled from (0, std). If a tuple of floats, it will be sampled from the range (std[0], std[1]). For a constant value use (std, std). Default: (0.3, 0.3).

gauss_sigma float, or tuple of floats

Sigma value for Gaussian filtering of the liquid layer. If a single float, the value will be sampled from (0, gauss_sigma). If a tuple of floats, it will be sampled from the range (gauss_sigma[0], gauss_sigma[1]). For a constant value use (gauss_sigma, gauss_sigma). Default: (2, 2).

cutout_threshold float, or tuple of floats

Threshold for filtering the liquid layer (determines the number of drops). If a single float, the value will be sampled from (0, cutout_threshold). If a tuple of floats, it will be sampled from the range (cutout_threshold[0], cutout_threshold[1]). For a constant value use (cutout_threshold, cutout_threshold). Default: (0.68, 0.68).

intensity float, or tuple of floats

Intensity of the corruption. If a single float, the value will be sampled from (0, intensity). If a tuple of floats, it will be sampled from the range (intensity[0], intensity[1]). For a constant value use (intensity, intensity). Default: (0.6, 0.6).

mode string, or list of strings

Type of corruption. Currently supported options are 'rain' and 'mud'. If a list is provided, the corruption type will be sampled from that list. Default: "rain".

color list of (r, g, b) or dict or None

Color of the corruption elements. If a list is provided, it is used as the color for the single specified mode. If a dict is provided, it must map each specified mode to its color. If None, default colors are used (rain: (238, 238, 175), mud: (20, 42, 63)).

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32
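
A minimal usage sketch, relying only on the parameters documented above (parameter values are illustrative):

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> # Simulate mud occlusion with a fixed corruption intensity
>>> transform = A.Spatter(mode="mud", intensity=(0.6, 0.6), p=1.0)
>>> spattered_image = transform(image=image)['image']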


Source code in albumentations/augmentations/transforms.py
Python
class Spatter(ImageOnlyTransform):
    """Apply spatter transform. It simulates corruption which can occlude a lens in the form of rain or mud.

    Args:
        mean (float, or tuple of floats): Mean value of normal distribution for generating liquid layer.
            If single float mean will be sampled from `(0, mean)`
            If tuple of float mean will be sampled from range `(mean[0], mean[1])`.
            If you want constant value use (mean, mean).
            Default (0.65, 0.65)
        std (float, or tuple of floats): Standard deviation value of normal distribution for generating liquid layer.
            If single float the number will be sampled from `(0, std)`.
            If tuple of float std will be sampled from range `(std[0], std[1])`.
            If you want constant value use (std, std).
            Default: (0.3, 0.3).
        gauss_sigma (float, or tuple of floats): Sigma value for gaussian filtering of liquid layer.
            If single float the number will be sampled from `(0, gauss_sigma)`.
            If tuple of float gauss_sigma will be sampled from range `(gauss_sigma[0], gauss_sigma[1])`.
            If you want constant value use (gauss_sigma, gauss_sigma).
            Default: (2, 2).
        cutout_threshold (float, or tuple of floats): Threshold for filtering the liquid layer
            (determines number of drops).
            If single float the number will be sampled from `(0, cutout_threshold)`.
            If tuple of float cutout_threshold will be sampled from range `(cutout_threshold[0], cutout_threshold[1])`.
            If you want constant value use `(cutout_threshold, cutout_threshold)`.
            Default: (0.68, 0.68).
        intensity (float, or tuple of floats): Intensity of corruption.
            If single float the number will be sampled from `(0, intensity)`.
            If tuple of float intensity will be sampled from range `(intensity[0], intensity[1])`.
            If you want constant value use `(intensity, intensity)`.
            Default: (0.6, 0.6).
        mode (string, or list of strings): Type of corruption. Currently, supported options are 'rain' and 'mud'.
             If list is provided type of corruption will be sampled list. Default: ("rain").
        color (list of (r, g, b) or dict or None): Corruption elements color.
            If list uses provided list as color for specified mode.
            If dict uses provided color for specified mode. Color for each specified mode should be provided in dict.
            If None uses default colors (rain: (238, 238, 175), mud: (20, 42, 63)).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://arxiv.org/abs/1903.12261
        https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py

    """

    class InitSchema(BaseTransformInitSchema):
        mean: ZeroOneRangeType = (0.65, 0.65)
        std: ZeroOneRangeType = (0.3, 0.3)
        gauss_sigma: NonNegativeFloatRangeType = (2, 2)
        cutout_threshold: ZeroOneRangeType = (0.68, 0.68)
        intensity: ZeroOneRangeType = (0.6, 0.6)
        mode: SpatterMode | Sequence[SpatterMode] = Field(
            default="rain",
            description="Type of corruption ('rain', 'mud').",
        )
        color: Sequence[int] | dict[str, Sequence[int]] | None = None

        @field_validator("mode")
        @classmethod
        def check_mode(cls, mode: SpatterMode | Sequence[SpatterMode]) -> Sequence[SpatterMode]:
            if isinstance(mode, str):
                return [mode]
            return mode

        @model_validator(mode="after")
        def check_color(self) -> Self:
            if self.color is None:
                self.color = {"rain": [238, 238, 175], "mud": [20, 42, 63]}

            elif isinstance(self.color, (list, tuple)) and len(self.mode) == 1:
                if len(self.color) != NUM_RGB_CHANNELS:
                    msg = "Color must be a list of three integers for RGB format."
                    raise ValueError(msg)
                self.color = {self.mode[0]: self.color}
            elif isinstance(self.color, dict):
                result = {}
                for mode in self.mode:
                    if mode not in self.color:
                        raise ValueError(f"Color for mode {mode} is not specified.")
                    if len(self.color[mode]) != NUM_RGB_CHANNELS:
                        raise ValueError(f"Color for mode {mode} must be in RGB format.")
                    result[mode] = self.color[mode]
            else:
                msg = "Color must be a list of RGB values or a dict mapping mode to RGB values."
                raise ValueError(msg)
            return self

    def __init__(
        self,
        mean: ScaleFloatType = (0.65, 0.65),
        std: ScaleFloatType = (0.3, 0.3),
        gauss_sigma: ScaleFloatType = (2, 2),
        cutout_threshold: ScaleFloatType = (0.68, 0.68),
        intensity: ScaleFloatType = (0.6, 0.6),
        mode: SpatterMode | Sequence[SpatterMode] = "rain",
        color: Sequence[int] | dict[str, Sequence[int]] | None = None,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.mean = cast(Tuple[float, float], mean)
        self.std = cast(Tuple[float, float], std)
        self.gauss_sigma = cast(Tuple[float, float], gauss_sigma)
        self.cutout_threshold = cast(Tuple[float, float], cutout_threshold)
        self.intensity = cast(Tuple[float, float], intensity)
        self.mode = mode
        self.color = cast(Dict[str, Sequence[int]], color)

    def apply(
        self,
        img: np.ndarray,
        non_mud: np.ndarray,
        mud: np.ndarray,
        drops: np.ndarray,
        mode: SpatterMode,
        **params: dict[str, Any],
    ) -> np.ndarray:
        return fmain.spatter(img, non_mud, mud, drops, mode)

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        height, width = params["shape"][:2]

        mean = random.uniform(*self.mean)
        std = random.uniform(*self.std)
        cutout_threshold = random.uniform(*self.cutout_threshold)
        sigma = random.uniform(*self.gauss_sigma)
        mode = random.choice(self.mode)
        intensity = random.uniform(*self.intensity)
        color = np.array(self.color[mode]) / 255.0

        liquid_layer = random_utils.normal(size=(height, width), loc=mean, scale=std)
        liquid_layer = gaussian_filter(liquid_layer, sigma=sigma, mode="nearest")
        liquid_layer[liquid_layer < cutout_threshold] = 0

        if mode == "rain":
            liquid_layer = clip(liquid_layer * 255, np.uint8)
            dist = 255 - cv2.Canny(liquid_layer, 50, 150)
            dist = cv2.distanceTransform(dist, cv2.DIST_L2, 5)
            _, dist = cv2.threshold(dist, 20, 20, cv2.THRESH_TRUNC)
            dist = clip(blur(dist, 3), np.uint8)
            dist = fmain.equalize(dist)

            ker = np.array([[-2, -1, 0], [-1, 1, 1], [0, 1, 2]])
            dist = fmain.convolve(dist, ker)
            dist = blur(dist, 3).astype(np.float32)

            m = liquid_layer * dist
            m *= 1 / np.max(m, axis=(0, 1))

            drops = m[:, :, None] * color * intensity
            mud = None
            non_mud = None
        else:
            m = np.where(liquid_layer > cutout_threshold, 1, 0)
            m = gaussian_filter(m.astype(np.float32), sigma=sigma, mode="nearest")
            m[m < 1.2 * cutout_threshold] = 0
            m = m[..., np.newaxis]

            mud = m * color
            non_mud = 1 - m
            drops = None

        return {
            "non_mud": non_mud,
            "mud": mud,
            "drops": drops,
            "mode": mode,
        }

    def get_transform_init_args_names(self) -> tuple[str, str, str, str, str, str, str]:
        return "mean", "std", "gauss_sigma", "intensity", "cutout_threshold", "mode", "color"
class InitSchema [view source on GitHub]


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    mean: ZeroOneRangeType = (0.65, 0.65)
    std: ZeroOneRangeType = (0.3, 0.3)
    gauss_sigma: NonNegativeFloatRangeType = (2, 2)
    cutout_threshold: ZeroOneRangeType = (0.68, 0.68)
    intensity: ZeroOneRangeType = (0.6, 0.6)
    mode: SpatterMode | Sequence[SpatterMode] = Field(
        default="rain",
        description="Type of corruption ('rain', 'mud').",
    )
    color: Sequence[int] | dict[str, Sequence[int]] | None = None

    @field_validator("mode")
    @classmethod
    def check_mode(cls, mode: SpatterMode | Sequence[SpatterMode]) -> Sequence[SpatterMode]:
        if isinstance(mode, str):
            return [mode]
        return mode

    @model_validator(mode="after")
    def check_color(self) -> Self:
        if self.color is None:
            self.color = {"rain": [238, 238, 175], "mud": [20, 42, 63]}

        elif isinstance(self.color, (list, tuple)) and len(self.mode) == 1:
            if len(self.color) != NUM_RGB_CHANNELS:
                msg = "Color must be a list of three integers for RGB format."
                raise ValueError(msg)
            self.color = {self.mode[0]: self.color}
        elif isinstance(self.color, dict):
            result = {}
            for mode in self.mode:
                if mode not in self.color:
                    raise ValueError(f"Color for mode {mode} is not specified.")
                if len(self.color[mode]) != NUM_RGB_CHANNELS:
                    raise ValueError(f"Color for mode {mode} must be in RGB format.")
                result[mode] = self.color[mode]
        else:
            msg = "Color must be a list of RGB values or a dict mapping mode to RGB values."
            raise ValueError(msg)
        return self

apply (self, img, non_mud, mud, drops, mode, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    non_mud: np.ndarray,
    mud: np.ndarray,
    drops: np.ndarray,
    mode: SpatterMode,
    **params: dict[str, Any],
) -> np.ndarray:
    return fmain.spatter(img, non_mud, mud, drops, mode)
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    height, width = params["shape"][:2]

    mean = random.uniform(*self.mean)
    std = random.uniform(*self.std)
    cutout_threshold = random.uniform(*self.cutout_threshold)
    sigma = random.uniform(*self.gauss_sigma)
    mode = random.choice(self.mode)
    intensity = random.uniform(*self.intensity)
    color = np.array(self.color[mode]) / 255.0

    liquid_layer = random_utils.normal(size=(height, width), loc=mean, scale=std)
    liquid_layer = gaussian_filter(liquid_layer, sigma=sigma, mode="nearest")
    liquid_layer[liquid_layer < cutout_threshold] = 0

    if mode == "rain":
        liquid_layer = clip(liquid_layer * 255, np.uint8)
        dist = 255 - cv2.Canny(liquid_layer, 50, 150)
        dist = cv2.distanceTransform(dist, cv2.DIST_L2, 5)
        _, dist = cv2.threshold(dist, 20, 20, cv2.THRESH_TRUNC)
        dist = clip(blur(dist, 3), np.uint8)
        dist = fmain.equalize(dist)

        ker = np.array([[-2, -1, 0], [-1, 1, 1], [0, 1, 2]])
        dist = fmain.convolve(dist, ker)
        dist = blur(dist, 3).astype(np.float32)

        m = liquid_layer * dist
        m *= 1 / np.max(m, axis=(0, 1))

        drops = m[:, :, None] * color * intensity
        mud = None
        non_mud = None
    else:
        m = np.where(liquid_layer > cutout_threshold, 1, 0)
        m = gaussian_filter(m.astype(np.float32), sigma=sigma, mode="nearest")
        m[m < 1.2 * cutout_threshold] = 0
        m = m[..., np.newaxis]

        mud = m * color
        non_mud = 1 - m
        drops = None

    return {
        "non_mud": non_mud,
        "mud": mud,
        "drops": drops,
        "mode": mode,
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, str, str, str, str, str, str]:
    return "mean", "std", "gauss_sigma", "intensity", "cutout_threshold", "mode", "color"

class Superpixels (p_replace=(0, 0.1), n_segments=(100, 100), max_size=128, interpolation=1, always_apply=None, p=0.5) [view source on GitHub]

Transform images partially/completely to their superpixel representation.

This implementation uses skimage's version of the SLIC (Simple Linear Iterative Clustering) algorithm.

Parameters:

Name Type Description
p_replace tuple[float, float] | float

Defines for any segment the probability that the pixels within that segment are replaced by their average color (otherwise, the pixels are not changed).

  • A probability of 0.0 means that no segment is replaced by its average color (the image is not changed at all).
  • A probability of 0.5 means that around half of all segments are replaced by their average color.
  • A probability of 1.0 means that all segments are replaced by their average color (resulting in a Voronoi-like image).

Behavior based on the chosen data type for this parameter:
  • If a float, that float will always be used.
  • If a tuple (a, b), a random probability will be sampled from the interval [a, b] per image.
Default: (0, 0.1)

n_segments tuple[int, int] | int

Rough target number of superpixels to generate. The algorithm may deviate from this number. Lower values lead to coarser superpixels; higher values are computationally more intensive and will hence lead to a slowdown. If a tuple (a, b), a value from the discrete interval [a..b] will be sampled per image. Default: (100, 100)

max_size int | None

Maximum image size at which the augmentation is performed. If the width or height of an image exceeds this value, it will be downscaled before the augmentation so that the longest side matches max_size. This is done to speed up the process. The final output image has the same size as the input image. Note that in case p_replace is below 1.0, the down-/upscaling will affect the not-replaced pixels too. Use None to apply no down-/upscaling. Default: 128

interpolation OpenCV flag

Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • This transform can significantly change the visual appearance of the image.
  • The transform makes use of a superpixel algorithm, which tends to be slow. If performance is a concern, consider using max_size to limit the image size.
  • The effect of this transform can vary greatly depending on the p_replace and n_segments parameters.
  • When p_replace is high, the image can become highly abstracted, resembling a voronoi diagram.
  • The transform preserves the original image type (uint8 or float32).

Mathematical Formulation:
  1. The image is segmented into approximately n_segments superpixels using the SLIC algorithm.
  2. For each superpixel:
     - With probability p_replace, all pixels in the superpixel are replaced with their mean color.
     - With probability 1 - p_replace, the superpixel is left unchanged.
  3. If the image was resized due to max_size, it is resized back to its original dimensions.
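
A minimal sketch of steps 1-2, using skimage.segmentation.slic directly rather than the library's internal implementation (parameter values are illustrative):

Python
import numpy as np
from skimage.segmentation import slic

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)

p_replace, n_segments = 0.5, 64          # illustrative samples from p_replace / n_segments
segments = slic(image, n_segments=n_segments)  # step 1: SLIC superpixel labels

output = image.copy()
for segment_id in np.unique(segments):
    if rng.random() < p_replace:          # step 2: replace this segment with probability p_replace
        mask = segments == segment_id
        output[mask] = image[mask].mean(axis=0)  # mean color of the superpixel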

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
Apply superpixels with default parameters
Python
>>> transform = A.Superpixels(p=1.0)
>>> augmented_image = transform(image=image)['image']
Apply superpixels with custom parameters
Python
>>> transform = A.Superpixels(
...     p_replace=(0.5, 0.7),
...     n_segments=(50, 100),
...     max_size=None,
...     interpolation=cv2.INTER_NEAREST,
...     p=1.0
... )
>>> augmented_image = transform(image=image)['image']

References

  • SLIC Superpixels: https://scikit-image.org/docs/dev/api/skimage.segmentation.html#skimage.segmentation.slic
  • "SLIC Superpixels Compared to State-of-the-art Superpixel Methods" by Radhakrishna Achanta, et al.

Source code in albumentations/augmentations/transforms.py
Python
class Superpixels(ImageOnlyTransform):
    """Transform images partially/completely to their superpixel representation.

    This implementation uses skimage's version of the SLIC (Simple Linear Iterative Clustering) algorithm.

    Args:
        p_replace (tuple[float, float] | float): Defines for any segment the probability that the pixels within that
            segment are replaced by their average color (otherwise, the pixels are not changed).


            * A probability of ``0.0`` would mean, that the pixels in no
                segment are replaced by their average color (image is not
                changed at all).
            * A probability of ``0.5`` would mean, that around half of all
                segments are replaced by their average color.
            * A probability of ``1.0`` would mean, that all segments are
                replaced by their average color (resulting in a voronoi
                image).

            Behavior based on chosen data types for this parameter:
            * If a ``float``, then that ``float`` will always be used.
            * If ``tuple`` ``(a, b)``, then a random probability will be
            sampled from the interval ``[a, b]`` per image.
            Default: (0.1, 0.3)

        n_segments (tuple[int, int] | int): Rough target number of how many superpixels to generate.
            The algorithm may deviate from this number.
            Lower value will lead to coarser superpixels.
            Higher values are computationally more intensive and will hence lead to a slowdown.
            If tuple ``(a, b)``, then a value from the discrete interval ``[a..b]`` will be sampled per image.
            Default: (15, 120)

        max_size (int | None): Maximum image size at which the augmentation is performed.
            If the width or height of an image exceeds this value, it will be
            downscaled before the augmentation so that the longest side matches `max_size`.
            This is done to speed up the process. The final output image has the same size as the input image.
            Note that in case `p_replace` is below ``1.0``,
            the down-/upscaling will affect the not-replaced pixels too.
            Use ``None`` to apply no down-/upscaling.
            Default: 128

        interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.

        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - This transform can significantly change the visual appearance of the image.
        - The transform makes use of a superpixel algorithm, which tends to be slow.
        If performance is a concern, consider using `max_size` to limit the image size.
        - The effect of this transform can vary greatly depending on the `p_replace` and `n_segments` parameters.
        - When `p_replace` is high, the image can become highly abstracted, resembling a voronoi diagram.
        - The transform preserves the original image type (uint8 or float32).

    Mathematical Formulation:
        1. The image is segmented into approximately `n_segments` superpixels using the SLIC algorithm.
        2. For each superpixel:
        - With probability `p_replace`, all pixels in the superpixel are replaced with their mean color.
        - With probability `1 - p_replace`, the superpixel is left unchanged.
        3. If the image was resized due to `max_size`, it is resized back to its original dimensions.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

        # Apply superpixels with default parameters
        >>> transform = A.Superpixels(p=1.0)
        >>> augmented_image = transform(image=image)['image']

        # Apply superpixels with custom parameters
        >>> transform = A.Superpixels(
        ...     p_replace=(0.5, 0.7),
        ...     n_segments=(50, 100),
        ...     max_size=None,
        ...     interpolation=cv2.INTER_NEAREST,
        ...     p=1.0
        ... )
        >>> augmented_image = transform(image=image)['image']

    References:
        - SLIC Superpixels: https://scikit-image.org/docs/dev/api/skimage.segmentation.html#skimage.segmentation.slic
        - "SLIC Superpixels Compared to State-of-the-art Superpixel Methods" by Radhakrishna Achanta, et al.
    """

    class InitSchema(BaseTransformInitSchema):
        p_replace: ZeroOneRangeType
        n_segments: OnePlusIntRangeType
        max_size: int | None = Field(ge=1)
        interpolation: InterpolationType

    def __init__(
        self,
        p_replace: ScaleFloatType = (0, 0.1),
        n_segments: ScaleIntType = (100, 100),
        max_size: int | None = 128,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.p_replace = cast(Tuple[float, float], p_replace)
        self.n_segments = cast(Tuple[int, int], n_segments)
        self.max_size = max_size
        self.interpolation = interpolation

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "p_replace", "n_segments", "max_size", "interpolation"

    def get_params(self) -> dict[str, Any]:
        n_segments = random.randint(*self.n_segments)
        p = random.uniform(*self.p_replace)
        return {"replace_samples": random_utils.random(n_segments) < p, "n_segments": n_segments}

    def apply(
        self,
        img: np.ndarray,
        replace_samples: Sequence[bool],
        n_segments: int,
        **kwargs: Any,
    ) -> np.ndarray:
        return fmain.superpixels(img, n_segments, replace_samples, self.max_size, self.interpolation)
class InitSchema


Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    p_replace: ZeroOneRangeType
    n_segments: OnePlusIntRangeType
    max_size: int | None = Field(ge=1)
    interpolation: InterpolationType

apply (self, img, replace_samples, n_segments, **kwargs)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    replace_samples: Sequence[bool],
    n_segments: int,
    **kwargs: Any,
) -> np.ndarray:
    return fmain.superpixels(img, n_segments, replace_samples, self.max_size, self.interpolation)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, Any]:
    n_segments = random.randint(*self.n_segments)
    p = random.uniform(*self.p_replace)
    return {"replace_samples": random_utils.random(n_segments) < p, "n_segments": n_segments}
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "p_replace", "n_segments", "max_size", "interpolation"

class TemplateTransform (templates, img_weight=(0.5, 0.5), template_weight=(0.5, 0.5), template_transform=None, name=None, always_apply=None, p=0.5) [view source on GitHub]

Apply blending of input image with specified templates.

This transform overlays one or more template images onto the input image using alpha blending. It allows for creating complex composite images or simulating various visual effects.

Parameters:

Name Type Description
templates numpy array | list[np.ndarray]

Images to use as templates for the transform. If a single numpy array is provided, it will be used as the only template. If a list of numpy arrays is provided, one will be randomly chosen for each application.

img_weight tuple[float, float] | float

Weight of the original image in the blend. If a single float, that value will always be used. If a tuple (min, max), the weight will be randomly sampled from the range [min, max) for each application. To use a fixed weight, use (weight, weight). Default: (0.5, 0.5).

template_weight tuple[float, float] | float

Weight of the template image in the blend. If a single float, that value will always be used. If a tuple (min, max), the weight will be randomly sampled from the range [min, max) for each application. To use a fixed weight, use (weight, weight). Default: (0.5, 0.5).

template_transform A.Compose | None

A composition of Albumentations transforms to apply to the template before blending. This should be an instance of A.Compose containing one or more Albumentations transforms. Default: None.

name str | None

Name of the transform instance. Used for serialization purposes. Default: None.

p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: Any

Note

  • The template(s) must have the same number of channels as the input image or be single-channel.
  • If a single-channel template is used with a multi-channel image, the template will be replicated across all channels.
  • The template(s) must have the same size as the input image.
  • The weights determine the contribution of each image (original and template) to the final blend. Higher weights result in a stronger presence of that image in the output.
  • The weights are automatically normalized before blending. This ensures that the sum of the normalized weights always equals 1, maintaining the overall brightness of the blended image.
  • The relative proportion of the weights determines the blend ratio. For example, img_weight=2 and template_weight=1 will result in the same blend as img_weight=0.66 and template_weight=0.33.
  • To make this transform serializable, provide a name when initializing it.

Mathematical Formulation: Given:
  - I: Input image
  - T: Template image
  - w_i: Weight of input image (sampled from img_weight)
  - w_t: Weight of template image (sampled from template_weight)

The normalized weights are computed as:
w_i_norm = w_i / (w_i + w_t)
w_t_norm = w_t / (w_i + w_t)

The blended image B is then computed as:

B = w_i_norm * I + w_t_norm * T

This ensures that w_i_norm + w_t_norm = 1, maintaining the overall image intensity.
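
A minimal numpy sketch of the normalized blending above, with illustrative weight samples from img_weight and template_weight:

Python
import numpy as np

w_i, w_t = 0.3, 0.7                      # illustrative samples from img_weight / template_weight
w_i_norm = w_i / (w_i + w_t)             # normalized weights sum to 1
w_t_norm = w_t / (w_i + w_t)

image = np.full((2, 2, 3), 200, dtype=np.float32)
template = np.full((2, 2, 3), 100, dtype=np.float32)

blended = w_i_norm * image + w_t_norm * template  # B = w_i_norm * I + w_t_norm * T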

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> template = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
Apply template transform with a single template
Python
>>> transform = A.TemplateTransform(templates=template, name="my_template_transform", p=1.0)
>>> blended_image = transform(image=image)['image']
Apply template transform with multiple templates and custom weights
Python
>>> templates = [np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8) for _ in range(3)]
>>> transform = A.TemplateTransform(
...     templates=templates,
...     img_weight=(0.3, 0.7),
...     template_weight=(0.5, 0.8),
...     name="multi_template_transform",
...     p=1.0
... )
>>> blended_image = transform(image=image)['image']


Source code in albumentations/augmentations/transforms.py
Python
class TemplateTransform(ImageOnlyTransform):
    """Apply blending of input image with specified templates.

    This transform overlays one or more template images onto the input image using alpha blending.
    It allows for creating complex composite images or simulating various visual effects.

    Args:
        templates (numpy array | list[np.ndarray]): Images to use as templates for the transform.
            If a single numpy array is provided, it will be used as the only template.
            If a list of numpy arrays is provided, one will be randomly chosen for each application.

        img_weight (tuple[float, float]  | float): Weight of the original image in the blend.
            If a single float, that value will always be used.
            If a tuple (min, max), the weight will be randomly sampled from the range [min, max) for each application.
            To use a fixed weight, use (weight, weight).
            Default: (0.5, 0.5).

        template_weight (tuple[float, float] | float): Weight of the template image in the blend.
            If a single float, that value will always be used.
            If a tuple (min, max), the weight will be randomly sampled from the range [min, max) for each application.
            To use a fixed weight, use (weight, weight).
            Default: (0.5, 0.5).

        template_transform (A.Compose | None): A composition of Albumentations transforms to apply to the template
            before blending.
            This should be an instance of A.Compose containing one or more Albumentations transforms.
            Default: None.

        name (str | None): Name of the transform instance. Used for serialization purposes.
            Default: None.

        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        Any

    Note:
        - The template(s) must have the same number of channels as the input image or be single-channel.
        - If a single-channel template is used with a multi-channel image, the template will be replicated across
            all channels.
        - The template(s) must have the same size as the input image.
        - The weights determine the contribution of each image (original and template) to the final blend.
          Higher weights result in a stronger presence of that image in the output.
        - The weights are automatically normalized before blending. This ensures that
          the sum of the normalized weights always equals 1, maintaining the overall
          brightness of the blended image.
        - The relative proportion of the weights determines the blend ratio. For example,
          img_weight=2 and template_weight=1 will result in the same blend as
          img_weight=0.66 and template_weight=0.33.
        - To make this transform serializable, provide a name when initializing it.

    Mathematical Formulation:
        Given:
        - I: Input image
        - T: Template image
        - w_i: Weight of input image (sampled from img_weight)
        - w_t: Weight of template image (sampled from template_weight)

        The normalized weights are computed as:
        w_i_norm = w_i / (w_i + w_t)
        w_t_norm = w_t / (w_i + w_t)

        The blended image B is then computed as:

        B = w_i_norm * I + w_t_norm * T

        This ensures that w_i_norm + w_t_norm = 1, maintaining the overall image intensity.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> template = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

        # Apply template transform with a single template
        >>> transform = A.TemplateTransform(templates=template, name="my_template_transform", p=1.0)
        >>> blended_image = transform(image=image)['image']

        # Apply template transform with multiple templates and custom weights
        >>> templates = [np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8) for _ in range(3)]
        >>> transform = A.TemplateTransform(
        ...     templates=templates,
        ...     img_weight=(0.3, 0.7),
        ...     template_weight=(0.5, 0.8),
        ...     name="multi_template_transform",
        ...     p=1.0
        ... )
        >>> blended_image = transform(image=image)['image']

    References:
        - Alpha compositing: https://en.wikipedia.org/wiki/Alpha_compositing
        - Image blending: https://en.wikipedia.org/wiki/Image_blending
    """

    class InitSchema(BaseTransformInitSchema):
        templates: np.ndarray | Sequence[np.ndarray]
        img_weight: ZeroOneRangeType
        template_weight: ZeroOneRangeType
        template_transform: Compose | BasicTransform | None = None
        name: str | None

        @field_validator("templates")
        @classmethod
        def validate_templates(cls, v: np.ndarray | list[np.ndarray]) -> list[np.ndarray]:
            if isinstance(v, np.ndarray):
                return [v]
            if isinstance(v, list):
                if not all(isinstance(item, np.ndarray) for item in v):
                    msg = "All templates must be numpy arrays."
                    raise ValueError(msg)
                return v
            msg = "Templates must be a numpy array or a list of numpy arrays."
            raise TypeError(msg)

    def __init__(
        self,
        templates: np.ndarray | list[np.ndarray],
        img_weight: ScaleFloatType = (0.5, 0.5),
        template_weight: ScaleFloatType = (0.5, 0.5),
        template_transform: Compose | BasicTransform | None = None,
        name: str | None = None,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.templates = templates
        self.img_weight = cast(Tuple[float, float], img_weight)
        self.template_weight = cast(Tuple[float, float], template_weight)
        self.template_transform = template_transform
        self.name = name

    def apply(
        self,
        img: np.ndarray,
        template: np.ndarray,
        img_weight: float,
        template_weight: float,
        **params: Any,
    ) -> np.ndarray:
        if img_weight == 0:
            return template
        if template_weight == 0:
            return img

        total_weight = img_weight + template_weight
        img_weight_norm = img_weight / total_weight
        template_weight_norm = template_weight / total_weight

        return add_weighted(img, img_weight_norm, template, template_weight_norm)

    def get_params(self) -> dict[str, float]:
        return {
            "img_weight": random.uniform(*self.img_weight),
            "template_weight": random.uniform(*self.template_weight),
        }

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        img = data["image"] if "image" in data else data["images"][0]

        template = random.choice(self.templates)

        if self.template_transform is not None:
            template = self.template_transform(image=template)["image"]

        if get_num_channels(template) not in [1, get_num_channels(img)]:
            msg = (
                "Template must be a single channel or "
                "has the same number of channels as input "
                f"image ({get_num_channels(img)}), got {get_num_channels(template)}"
            )
            raise ValueError(msg)

        if template.dtype != img.dtype:
            msg = "Image and template must be the same image type"
            raise ValueError(msg)

        if img.shape[:2] != template.shape[:2]:
            template = fgeometric.resize(template, img.shape[:2], interpolation=cv2.INTER_AREA)

        if get_num_channels(template) == 1 and get_num_channels(img) > 1:
            template = np.stack((template,) * get_num_channels(img), axis=-1)

        # in order to support grayscale image with dummy dim
        template = template.reshape(img.shape)

        return {"template": template}

    @classmethod
    def is_serializable(cls) -> bool:
        return False

    def to_dict_private(self) -> dict[str, Any]:
        if self.name is None:
            msg = (
                "To make a TemplateTransform serializable you should provide the `name` argument, "
                "e.g. `TemplateTransform(name='my_transform', ...)`."
            )
            raise ValueError(msg)
        return {"__class_fullname__": self.get_class_fullname(), "__name__": self.name}
class InitSchema [view source on GitHub]

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    templates: np.ndarray | Sequence[np.ndarray]
    img_weight: ZeroOneRangeType
    template_weight: ZeroOneRangeType
    template_transform: Compose | BasicTransform | None = None
    name: str | None

    @field_validator("templates")
    @classmethod
    def validate_templates(cls, v: np.ndarray | list[np.ndarray]) -> list[np.ndarray]:
        if isinstance(v, np.ndarray):
            return [v]
        if isinstance(v, list):
            if not all(isinstance(item, np.ndarray) for item in v):
                msg = "All templates must be numpy arrays."
                raise ValueError(msg)
            return v
        msg = "Templates must be a numpy array or a list of numpy arrays."
        raise TypeError(msg)

apply (self, img, template, img_weight, template_weight, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(
    self,
    img: np.ndarray,
    template: np.ndarray,
    img_weight: float,
    template_weight: float,
    **params: Any,
) -> np.ndarray:
    if img_weight == 0:
        return template
    if template_weight == 0:
        return img

    total_weight = img_weight + template_weight
    img_weight_norm = img_weight / total_weight
    template_weight_norm = template_weight / total_weight

    return add_weighted(img, img_weight_norm, template, template_weight_norm)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, float]:
    return {
        "img_weight": random.uniform(*self.img_weight),
        "template_weight": random.uniform(*self.template_weight),
    }
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    img = data["image"] if "image" in data else data["images"][0]

    template = random.choice(self.templates)

    if self.template_transform is not None:
        template = self.template_transform(image=template)["image"]

    if get_num_channels(template) not in [1, get_num_channels(img)]:
        msg = (
            "Template must be a single channel or "
            "has the same number of channels as input "
            f"image ({get_num_channels(img)}), got {get_num_channels(template)}"
        )
        raise ValueError(msg)

    if template.dtype != img.dtype:
        msg = "Image and template must be the same image type"
        raise ValueError(msg)

    if img.shape[:2] != template.shape[:2]:
        template = fgeometric.resize(template, img.shape[:2], interpolation=cv2.INTER_AREA)

    if get_num_channels(template) == 1 and get_num_channels(img) > 1:
        template = np.stack((template,) * get_num_channels(img), axis=-1)

    # in order to support grayscale image with dummy dim
    template = template.reshape(img.shape)

    return {"template": template}

class ToFloat (max_value=None, p=1.0, always_apply=None) [view source on GitHub]

Convert the input image to a floating-point representation.

This transform divides pixel values by max_value to get a float32 output array where all values lie in the range [0, 1.0]. It's useful for normalizing image data before feeding it into neural networks or other algorithms that expect float input.

Parameters:

Name Type Description
max_value float | None

The maximum possible input value. If None, the transform will try to infer the maximum value by inspecting the data type of the input image:
  • uint8: 255
  • uint16: 65535
  • uint32: 4294967295
  • float32: 1.0
Default: None.

p float

Probability of applying the transform. Default: 1.0.

Targets

image

Image types: uint8, uint16, uint32, float32

Returns:

Type Description
np.ndarray

Image in floating point representation, with values in range [0, 1.0].

Note

  • If the input image is already float32 with values in [0, 1], it will be returned unchanged.
  • For integer types (uint8, uint16, uint32), the function will scale the values to [0, 1] range.
  • The output will always be float32, regardless of the input type.
  • This transform is often used as a preprocessing step before applying other transformations or feeding the image into a neural network.

Exceptions:

Type Description
TypeError

If the input image data type is not supported.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>>
# Convert uint8 image to float
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ToFloat(max_value=None)
>>> float_image = transform(image=image)['image']
>>> assert float_image.dtype == np.float32
>>> assert 0 <= float_image.min() <= float_image.max() <= 1.0
>>>
# Convert uint16 image to float with custom max_value
>>> image = np.random.randint(0, 4096, (100, 100, 3), dtype=np.uint16)
>>> transform = A.ToFloat(max_value=4095)
>>> float_image = transform(image=image)['image']
>>> assert float_image.dtype == np.float32
>>> assert 0 <= float_image.min() <= float_image.max() <= 1.0

See Also: FromFloat: The inverse operation, converting from float back to the original data type.
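
The See Also note can be made concrete with a short round-trip sketch. The FromFloat arguments below (dtype and max_value) are assumptions chosen to mirror ToFloat, not a prescribed pairing.

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)

# Scale uint8 values into [0, 1] as float32.
float_image = A.ToFloat(max_value=255, p=1.0)(image=image)["image"]

# FromFloat is the documented inverse; here it is configured to undo the scaling above.
restored = A.FromFloat(dtype="uint8", max_value=255, p=1.0)(image=float_image)["image"]

assert restored.dtype == np.uint8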

Source code in albumentations/augmentations/transforms.py
Python
class ToFloat(ImageOnlyTransform):
    """Convert the input image to a floating-point representation.

    This transform divides pixel values by `max_value` to get a float32 output array
    where all values lie in the range [0, 1.0]. It's useful for normalizing image data
    before feeding it into neural networks or other algorithms that expect float input.

    Args:
        max_value (float | None): The maximum possible input value. If None, the transform
            will try to infer the maximum value by inspecting the data type of the input image:
            - uint8: 255
            - uint16: 65535
            - uint32: 4294967295
            - float32: 1.0
            Default: None.
        p (float): Probability of applying the transform. Default: 1.0.

    Targets:
        image

    Image types:
        uint8, uint16, uint32, float32

    Returns:
        np.ndarray: Image in floating point representation, with values in range [0, 1.0].

    Note:
        - If the input image is already float32 with values in [0, 1], it will be returned unchanged.
        - For integer types (uint8, uint16, uint32), the function will scale the values to [0, 1] range.
        - The output will always be float32, regardless of the input type.
        - This transform is often used as a preprocessing step before applying other transformations
          or feeding the image into a neural network.

    Raises:
        TypeError: If the input image data type is not supported.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>>
        # Convert uint8 image to float
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.ToFloat(max_value=None)
        >>> float_image = transform(image=image)['image']
        >>> assert float_image.dtype == np.float32
        >>> assert 0 <= float_image.min() <= float_image.max() <= 1.0
        >>>
        # Convert uint16 image to float with custom max_value
        >>> image = np.random.randint(0, 4096, (100, 100, 3), dtype=np.uint16)
        >>> transform = A.ToFloat(max_value=4095)
        >>> float_image = transform(image=image)['image']
        >>> assert float_image.dtype == np.float32
        >>> assert 0 <= float_image.min() <= float_image.max() <= 1.0

    See Also:
        FromFloat: The inverse operation, converting from float back to the original data type.
    """

    class InitSchema(BaseTransformInitSchema):
        max_value: float | None
        p: ProbabilityType = 1

    def __init__(self, max_value: float | None = None, p: float = 1.0, always_apply: bool | None = None):
        super().__init__(p, always_apply)
        self.max_value = max_value

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return to_float(img, self.max_value)

    def get_transform_init_args_names(self) -> tuple[str]:
        return ("max_value",)
class InitSchema

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    max_value: float | None
    p: ProbabilityType = 1

apply (self, img, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return to_float(img, self.max_value)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str]:
    return ("max_value",)

class ToGray (num_output_channels=3, method='weighted_average', always_apply=None, p=0.5) [view source on GitHub]

Convert an image to grayscale and optionally replicate the grayscale channel.

This transform first converts a color image to a single-channel grayscale image using various methods, then replicates the grayscale channel if num_output_channels is greater than 1.

Parameters:

Name Type Description
num_output_channels int

The number of channels in the output image. If greater than 1, the grayscale channel will be replicated. Default: 3.

method Literal["weighted_average", "from_lab", "desaturation", "average", "max", "pca"]

The method used for grayscale conversion:
  • "weighted_average": Uses a weighted sum of RGB channels (0.299R + 0.587G + 0.114B). Works only with 3-channel images. Provides realistic results based on human perception.
  • "from_lab": Extracts the L channel from the LAB color space. Works only with 3-channel images. Gives perceptually uniform results.
  • "desaturation": Averages the maximum and minimum values across channels. Works with any number of channels. Fast but may not preserve perceived brightness well.
  • "average": Simple average of all channels. Works with any number of channels. Fast but may not give realistic results.
  • "max": Takes the maximum value across all channels. Works with any number of channels. Tends to produce brighter results.
  • "pca": Applies Principal Component Analysis to reduce channels. Works with any number of channels. Can preserve more information but is computationally intensive.

p float

Probability of applying the transform. Default: 0.5.

Exceptions:

Type Description
TypeError

If the input image doesn't have 3 channels for methods that require it.

Note

  • The transform first converts the input image to single-channel grayscale, then replicates this channel if num_output_channels > 1.
  • "weighted_average" and "from_lab" are typically used in image processing and computer vision applications where accurate representation of human perception is important.
  • "desaturation" and "average" are often used in simple image manipulation tools or when computational speed is a priority.
  • "max" method can be useful in scenarios where preserving bright features is important, such as in some medical imaging applications.
  • "pca" might be used in advanced image analysis tasks or when dealing with hyperspectral images.

Image types: uint8, float32

Returns:

Type Description
np.ndarray

Grayscale image with the specified number of channels.
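
This class has no Examples block of its own, so here is a minimal usage sketch; the manual line at the end only approximates what the default "weighted_average" method computes and is not the library's implementation.

Python
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

# Convert to grayscale but keep a single output channel.
gray = A.ToGray(num_output_channels=1, method="desaturation", p=1.0)(image=image)["image"]

# Rough manual counterpart of the default "weighted_average" method.
manual = (0.299 * image[..., 0] + 0.587 * image[..., 1] + 0.114 * image[..., 2]).astype(np.uint8)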

Source code in albumentations/augmentations/transforms.py
Python
class ToGray(ImageOnlyTransform):
    """Convert an image to grayscale and optionally replicate the grayscale channel.

    This transform first converts a color image to a single-channel grayscale image using various methods,
    then replicates the grayscale channel if num_output_channels is greater than 1.

    Args:
        num_output_channels (int): The number of channels in the output image. If greater than 1,
            the grayscale channel will be replicated. Default: 3.
        method (Literal["weighted_average", "from_lab", "desaturation", "average", "max", "pca"]):
            The method used for grayscale conversion:
            - "weighted_average": Uses a weighted sum of RGB channels (0.299R + 0.587G + 0.114B).
              Works only with 3-channel images. Provides realistic results based on human perception.
            - "from_lab": Extracts the L channel from the LAB color space.
              Works only with 3-channel images. Gives perceptually uniform results.
            - "desaturation": Averages the maximum and minimum values across channels.
              Works with any number of channels. Fast but may not preserve perceived brightness well.
            - "average": Simple average of all channels.
              Works with any number of channels. Fast but may not give realistic results.
            - "max": Takes the maximum value across all channels.
              Works with any number of channels. Tends to produce brighter results.
            - "pca": Applies Principal Component Analysis to reduce channels.
              Works with any number of channels. Can preserve more information but is computationally intensive.
        p (float): Probability of applying the transform. Default: 0.5.

    Raises:
        TypeError: If the input image doesn't have 3 channels for methods that require it.

    Note:
        - The transform first converts the input image to single-channel grayscale, then replicates
          this channel if num_output_channels > 1.
        - "weighted_average" and "from_lab" are typically used in image processing and computer vision
          applications where accurate representation of human perception is important.
        - "desaturation" and "average" are often used in simple image manipulation tools or when
          computational speed is a priority.
        - "max" method can be useful in scenarios where preserving bright features is important,
          such as in some medical imaging applications.
        - "pca" might be used in advanced image analysis tasks or when dealing with hyperspectral images.

    Image types:
        uint8, float32

    Returns:
        np.ndarray: Grayscale image with the specified number of channels.
    """

    class InitSchema(BaseTransformInitSchema):
        num_output_channels: int = Field(default=3, description="The number of output channels.", ge=1)
        method: Literal["weighted_average", "from_lab", "desaturation", "average", "max", "pca"]

    def __init__(
        self,
        num_output_channels: int = 3,
        method: Literal["weighted_average", "from_lab", "desaturation", "average", "max", "pca"] = "weighted_average",
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.num_output_channels = num_output_channels
        self.method = method

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        if is_grayscale_image(img):
            warnings.warn("The image is already gray.", stacklevel=2)
            return img

        num_channels = get_num_channels(img)

        if num_channels != NUM_RGB_CHANNELS and self.method not in {"desaturation", "average", "max", "pca"}:
            msg = "ToGray transformation expects 3-channel images."
            raise TypeError(msg)

        return fmain.to_gray(img, self.num_output_channels, self.method)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "num_output_channels", "method"
class InitSchema

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    num_output_channels: int = Field(default=3, description="The number of output channels.", ge=1)
    method: Literal["weighted_average", "from_lab", "desaturation", "average", "max", "pca"]

apply (self, img, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    if is_grayscale_image(img):
        warnings.warn("The image is already gray.", stacklevel=2)
        return img

    num_channels = get_num_channels(img)

    if num_channels != NUM_RGB_CHANNELS and self.method not in {"desaturation", "average", "max", "pca"}:
        msg = "ToGray transformation expects 3-channel images."
        raise TypeError(msg)

    return fmain.to_gray(img, self.num_output_channels, self.method)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "num_output_channels", "method"

class ToRGB (num_output_channels=3, p=1.0, always_apply=None) [view source on GitHub]

Convert an input image from grayscale to RGB format.

Parameters:

Name Type Description
num_output_channels int

The number of channels in the output image. Default: 3.

p float

Probability of applying the transform. Default: 1.0.

Targets

image

Image types: uint8, float32

Number of channels: 1

Note

  • For single-channel (grayscale) images, the channel is replicated to create an RGB image.
  • If the input is already a 3-channel RGB image, it is returned unchanged.
  • This transform does not change the data type of the image (e.g., uint8 remains uint8).

Exceptions:

Type Description
TypeError

If the input image has more than 1 channel.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>>
>>> # Convert a grayscale image to RGB
>>> transform = A.Compose([A.ToRGB(p=1.0)])
>>> grayscale_image = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
>>> rgb_image = transform(image=grayscale_image)['image']
>>> assert rgb_image.shape == (100, 100, 3)

Source code in albumentations/augmentations/transforms.py
Python
class ToRGB(ImageOnlyTransform):
    """Convert an input image from grayscale to RGB format.

    Args:
        num_output_channels (int): The number of channels in the output image. Default: 3.
        p (float): Probability of applying the transform. Default: 1.0.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        1

    Note:
        - For single-channel (grayscale) images, the channel is replicated to create an RGB image.
        - If the input is already a 3-channel RGB image, it is returned unchanged.
        - This transform does not change the data type of the image (e.g., uint8 remains uint8).

    Raises:
        TypeError: If the input image has more than 1 channel.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>>
        >>> # Convert a grayscale image to RGB
        >>> transform = A.Compose([A.ToRGB(p=1.0)])
        >>> grayscale_image = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
        >>> rgb_image = transform(image=grayscale_image)['image']
        >>> assert rgb_image.shape == (100, 100, 3)
    """

    class InitSchema(BaseTransformInitSchema):
        num_output_channels: int = Field(ge=1)

    def __init__(self, num_output_channels: int = 3, p: float = 1.0, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)

        self.num_output_channels = num_output_channels

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        if is_rgb_image(img):
            warnings.warn("The image is already an RGB.", stacklevel=2)
            return np.ascontiguousarray(img)
        if not is_grayscale_image(img):
            msg = "ToRGB transformation expects 2-dim images or 3-dim with the last dimension equal to 1."
            raise TypeError(msg)

        return fmain.grayscale_to_multichannel(img, num_output_channels=self.num_output_channels)

    def get_transform_init_args_names(self) -> tuple[str]:
        return ("num_output_channels",)
class InitSchema

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    num_output_channels: int = Field(ge=1)

apply (self, img, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    if is_rgb_image(img):
        warnings.warn("The image is already an RGB.", stacklevel=2)
        return np.ascontiguousarray(img)
    if not is_grayscale_image(img):
        msg = "ToRGB transformation expects 2-dim images or 3-dim with the last dimension equal to 1."
        raise TypeError(msg)

    return fmain.grayscale_to_multichannel(img, num_output_channels=self.num_output_channels)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str]:
    return ("num_output_channels",)

class ToSepia (p=0.5, always_apply=None) [view source on GitHub]

Apply a sepia filter to the input image.

This transform converts a color image to a sepia tone, giving it a warm, brownish tint that is reminiscent of old photographs. The sepia effect is achieved by applying a specific color transformation matrix to the RGB channels of the input image.

Parameters:

Name Type Description
p float

Probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Number of channels: 3

Note

  • This transform only works with RGB images (3 channels).
  • The sepia effect is created using a fixed color transformation matrix: [[0.393, 0.769, 0.189], [0.349, 0.686, 0.168], [0.272, 0.534, 0.131]]
  • The output image will have the same data type as the input image.
  • For float32 images, ensure the input values are in the range [0, 1].

Exceptions:

Type Description
TypeError

If the input image is not a 3-channel RGB image.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>>
# Apply sepia effect to a uint8 image
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ToSepia(p=1.0)
>>> sepia_image = transform(image=image)['image']
>>> assert sepia_image.shape == image.shape
>>> assert sepia_image.dtype == np.uint8
>>>
# Apply sepia effect to a float32 image
>>> image = np.random.rand(100, 100, 3).astype(np.float32)
>>> transform = A.ToSepia(p=1.0)
>>> sepia_image = transform(image=image)['image']
>>> assert sepia_image.shape == image.shape
>>> assert sepia_image.dtype == np.float32
>>> assert 0 <= sepia_image.min() <= sepia_image.max() <= 1.0

Mathematical Formulation: Given an input pixel [R, G, B], the sepia tone is calculated as:
    R_sepia = 0.393*R + 0.769*G + 0.189*B
    G_sepia = 0.349*R + 0.686*G + 0.168*B
    B_sepia = 0.272*R + 0.534*G + 0.131*B

The output values are then clipped to the valid range for the image's data type.
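
A small NumPy sketch of this matrix form, included only to make the formulation concrete; it is not the code path the transform uses internally.

Python
import numpy as np

sepia_matrix = np.array([
    [0.393, 0.769, 0.189],
    [0.349, 0.686, 0.168],
    [0.272, 0.534, 0.131],
])

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

# Per-pixel linear transform: [R, G, B] -> sepia_matrix @ [R, G, B].
sepia = image.astype(np.float32) @ sepia_matrix.T
# Clip to the valid uint8 range and cast back, as described above.
sepia = np.clip(sepia, 0, 255).astype(np.uint8)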

See Also: ToGray: For converting images to grayscale instead of sepia.

Source code in albumentations/augmentations/transforms.py
Python
class ToSepia(ImageOnlyTransform):
    """Apply a sepia filter to the input image.

    This transform converts a color image to a sepia tone, giving it a warm, brownish tint
    that is reminiscent of old photographs. The sepia effect is achieved by applying a
    specific color transformation matrix to the RGB channels of the input image.

    Args:
        p (float): Probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Number of channels:
        3

    Note:
        - This transform only works with RGB images (3 channels).
        - The sepia effect is created using a fixed color transformation matrix:
          [[0.393, 0.769, 0.189],
           [0.349, 0.686, 0.168],
           [0.272, 0.534, 0.131]]
        - The output image will have the same data type as the input image.
        - For float32 images, ensure the input values are in the range [0, 1].

    Raises:
        TypeError: If the input image is not a 3-channel RGB image.

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>>
        # Apply sepia effect to a uint8 image
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> transform = A.ToSepia(p=1.0)
        >>> sepia_image = transform(image=image)['image']
        >>> assert sepia_image.shape == image.shape
        >>> assert sepia_image.dtype == np.uint8
        >>>
        # Apply sepia effect to a float32 image
        >>> image = np.random.rand(100, 100, 3).astype(np.float32)
        >>> transform = A.ToSepia(p=1.0)
        >>> sepia_image = transform(image=image)['image']
        >>> assert sepia_image.shape == image.shape
        >>> assert sepia_image.dtype == np.float32
        >>> assert 0 <= sepia_image.min() <= sepia_image.max() <= 1.0

    Mathematical Formulation:
        Given an input pixel [R, G, B], the sepia tone is calculated as:
        R_sepia = 0.393*R + 0.769*G + 0.189*B
        G_sepia = 0.349*R + 0.686*G + 0.168*B
        B_sepia = 0.272*R + 0.534*G + 0.131*B

        The output values are then clipped to the valid range for the image's data type.

    See Also:
        ToGray: For converting images to grayscale instead of sepia.
    """

    def __init__(self, p: float = 0.5, always_apply: bool | None = None):
        super().__init__(p, always_apply)
        self.sepia_transformation_matrix = np.array(
            [[0.393, 0.769, 0.189], [0.349, 0.686, 0.168], [0.272, 0.534, 0.131]],
        )

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        non_rgb_error(img)
        return fmain.linear_transformation_rgb(img, self.sepia_transformation_matrix)

    def get_transform_init_args_names(self) -> tuple[()]:
        return ()
__init__ (self, p=0.5, always_apply=None) special

Initialize self. See help(type(self)) for accurate signature.

Source code in albumentations/augmentations/transforms.py
Python
def __init__(self, p: float = 0.5, always_apply: bool | None = None):
    super().__init__(p, always_apply)
    self.sepia_transformation_matrix = np.array(
        [[0.393, 0.769, 0.189], [0.349, 0.686, 0.168], [0.272, 0.534, 0.131]],
    )
apply (self, img, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    non_rgb_error(img)
    return fmain.linear_transformation_rgb(img, self.sepia_transformation_matrix)
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[()]:
    return ()

class UnsharpMask (blur_limit=(3, 7), sigma_limit=0.0, alpha=(0.2, 0.5), threshold=10, always_apply=None, p=0.5) [view source on GitHub]

Sharpen the input image using unsharp masking and overlay the result with the original image.

Unsharp masking is a technique that enhances edge contrast in an image, creating the illusion of increased sharpness. This transform applies Gaussian blur to create a blurred version of the image, then uses this to create a mask which is combined with the original image to enhance edges and fine details.

Parameters:

Name Type Description
blur_limit tuple[int, int] | int

maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1. If a single value is provided, blur_limit will be in the range (0, blur_limit). Default: (3, 7).

sigma_limit tuple[float, float] | float

Gaussian kernel standard deviation. Must be in range [0, inf). If a single value is provided, sigma_limit will be in the range (0, sigma_limit). If set to 0, sigma will be computed as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. Default: 0.

alpha tuple[float, float]

range to choose the visibility of the sharpened image. At 0, only the original image is visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5).

threshold int

Value to limit sharpening only to areas with a high pixel difference between the original image and its smoothed version. A higher threshold means less sharpening on flat areas. Must be in range [0, 255]. Default: 10.

p float

probability of applying the transform. Default: 0.5.

Targets

image

Image types: uint8, float32

Note

  • The algorithm creates a mask M = (I - G) * alpha, where I is the original image and G is the Gaussian blurred version.
  • The final image is computed as: output = I + M if |I - G| > threshold, else I.
  • Higher alpha values increase the strength of the sharpening effect.
  • Higher threshold values limit the sharpening effect to areas with more significant edges or details.
  • The blur_limit and sigma_limit parameters control the Gaussian blur used to create the mask.

Examples:

Python
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
# Apply UnsharpMask with default parameters
>>> transform = A.UnsharpMask(p=1.0)
>>> sharpened_image = transform(image=image)['image']
>>>
# Apply UnsharpMask with custom parameters
>>> transform = A.UnsharpMask(
...     blur_limit=(3, 7),
...     sigma_limit=(0.1, 0.5),
...     alpha=(0.2, 0.7),
...     threshold=15,
...     p=1.0
... )
>>> sharpened_image = transform(image=image)['image']
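
To complement the usage examples, the algorithm sketched in the Note above can be approximated directly with OpenCV and NumPy. This is a simplified illustration with assumed parameter values, not the exact albumentations implementation.

Python
import cv2
import numpy as np

def unsharp_sketch(image: np.ndarray, ksize: int = 5, sigma: float = 0, alpha: float = 0.5, threshold: int = 10) -> np.ndarray:
    # G: Gaussian-blurred version of the image.
    blurred = cv2.GaussianBlur(image, (ksize, ksize), sigma)
    diff = image.astype(np.float32) - blurred.astype(np.float32)
    # M = (I - G) * alpha, applied only where |I - G| exceeds the threshold.
    mask = np.where(np.abs(diff) > threshold, diff * alpha, 0.0)
    return np.clip(image.astype(np.float32) + mask, 0, 255).astype(np.uint8)

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
sharpened = unsharp_sketch(image)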

Source code in albumentations/augmentations/transforms.py
Python
class UnsharpMask(ImageOnlyTransform):
    """Sharpen the input image using Unsharp Masking processing and overlays the result with the original image.

    Unsharp masking is a technique that enhances edge contrast in an image, creating the illusion of increased
        sharpness.
    This transform applies Gaussian blur to create a blurred version of the image, then uses this to create a mask
    which is combined with the original image to enhance edges and fine details.

    Args:
        blur_limit (tuple[int, int] | int): maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If set single value `blur_limit` will be in range (0, blur_limit).
            Default: (3, 7).
        sigma_limit (tuple[float, float] | float): Gaussian kernel standard deviation. Must be in range [0, inf).
            If set single value `sigma_limit` will be in range (0, sigma_limit).
            If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
        alpha (tuple[float, float]): range to choose the visibility of the sharpened image.
            At 0, only the original image is visible, at 1.0 only its sharpened version is visible.
            Default: (0.2, 0.5).
        threshold (int): Value to limit sharpening only for areas with high pixel difference between original image
            and it's smoothed version. Higher threshold means less sharpening on flat areas.
            Must be in range [0, 255]. Default: 10.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Note:
        - The algorithm creates a mask M = (I - G) * alpha, where I is the original image and G is the Gaussian
            blurred version.
        - The final image is computed as: output = I + M if |I - G| > threshold, else I.
        - Higher alpha values increase the strength of the sharpening effect.
        - Higher threshold values limit the sharpening effect to areas with more significant edges or details.
        - The blur_limit and sigma_limit parameters control the Gaussian blur used to create the mask.

    References:
        - https://en.wikipedia.org/wiki/Unsharp_masking
        - https://arxiv.org/pdf/2107.10833.pdf

    Examples:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>>
        # Apply UnsharpMask with default parameters
        >>> transform = A.UnsharpMask(p=1.0)
        >>> sharpened_image = transform(image=image)['image']
        >>>
        # Apply UnsharpMask with custom parameters
        >>> transform = A.UnsharpMask(
        ...     blur_limit=(3, 7),
        ...     sigma_limit=(0.1, 0.5),
        ...     alpha=(0.2, 0.7),
        ...     threshold=15,
        ...     p=1.0
        ... )
        >>> sharpened_image = transform(image=image)['image']
    """

    class InitSchema(BaseTransformInitSchema):
        sigma_limit: NonNegativeFloatRangeType
        alpha: ZeroOneRangeType
        threshold: int = Field(ge=0, le=255)
        blur_limit: ScaleIntType

        @field_validator("blur_limit")
        @classmethod
        def process_blur(cls, value: ScaleIntType, info: ValidationInfo) -> tuple[int, int]:
            return process_blur_limit(value, info, min_value=3)

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_limit: ScaleFloatType = 0.0,
        alpha: ScaleFloatType = (0.2, 0.5),
        threshold: int = 10,
        always_apply: bool | None = None,
        p: float = 0.5,
    ):
        super().__init__(p=p, always_apply=always_apply)
        self.blur_limit = cast(Tuple[int, int], blur_limit)
        self.sigma_limit = cast(Tuple[float, float], sigma_limit)
        self.alpha = cast(Tuple[float, float], alpha)
        self.threshold = threshold

    def get_params(self) -> dict[str, Any]:
        return {
            "ksize": random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2),
            "sigma": random.uniform(*self.sigma_limit),
            "alpha": random.uniform(*self.alpha),
        }

    def apply(self, img: np.ndarray, ksize: int, sigma: int, alpha: float, **params: Any) -> np.ndarray:
        return fmain.unsharp_mask(img, ksize, sigma=sigma, alpha=alpha, threshold=self.threshold)

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return "blur_limit", "sigma_limit", "alpha", "threshold"
class InitSchema [view source on GitHub]

Source code in albumentations/augmentations/transforms.py
Python
class InitSchema(BaseTransformInitSchema):
    sigma_limit: NonNegativeFloatRangeType
    alpha: ZeroOneRangeType
    threshold: int = Field(ge=0, le=255)
    blur_limit: ScaleIntType

    @field_validator("blur_limit")
    @classmethod
    def process_blur(cls, value: ScaleIntType, info: ValidationInfo) -> tuple[int, int]:
        return process_blur_limit(value, info, min_value=3)

apply (self, img, ksize, sigma, alpha, **params)

Apply transform on image.

Source code in albumentations/augmentations/transforms.py
Python
def apply(self, img: np.ndarray, ksize: int, sigma: int, alpha: float, **params: Any) -> np.ndarray:
    return fmain.unsharp_mask(img, ksize, sigma=sigma, alpha=alpha, threshold=self.threshold)
get_params (self)

Returns parameters independent of input.

Source code in albumentations/augmentations/transforms.py
Python
def get_params(self) -> dict[str, Any]:
    return {
        "ksize": random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2),
        "sigma": random.uniform(*self.sigma_limit),
        "alpha": random.uniform(*self.alpha),
    }
get_transform_init_args_names (self)

Returns names of arguments that are used in init method of the transform.

Source code in albumentations/augmentations/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return "blur_limit", "sigma_limit", "alpha", "threshold"

utils

def check_range (value, lower_bound, upper_bound, name) [view source on GitHub]

Checks if the given value is within the specified bounds

Parameters:

Name Type Description
value tuple[float, float]

The value to check and convert. Can be a single float or a tuple of floats.

lower_bound float

The lower bound for the range check.

upper_bound float

The upper bound for the range check.

name str | None

The name of the parameter being checked. Used for error messages.

Exceptions:

Type Description
ValueError

If the value is outside the bounds or if the tuple values are not ordered correctly.
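
A short usage sketch of this helper; the import path follows the source location shown below, and the parameter name "alpha" is only an example.

Python
from albumentations.augmentations.utils import check_range

# Passes silently: both values lie in [0, 1] and are ordered as (min, max).
check_range((0.2, 0.7), 0.0, 1.0, "alpha")

# Would raise ValueError: 1.5 is outside the [0, 1] bounds.
# check_range((0.2, 1.5), 0.0, 1.0, "alpha")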

Source code in albumentations/augmentations/utils.py
Python
def check_range(value: tuple[float, float], lower_bound: float, upper_bound: float, name: str | None) -> None:
    """Checks if the given value is within the specified bounds

    Args:
        value: The value to check and convert. Can be a single float or a tuple of floats.
        lower_bound: The lower bound for the range check.
        upper_bound: The upper bound for the range check.
        name: The name of the parameter being checked. Used for error messages.

    Raises:
        ValueError: If the value is outside the bounds or if the tuple values are not ordered correctly.
    """
    if not all(lower_bound <= x <= upper_bound for x in value):
        raise ValueError(f"All values in {name} must be within [{lower_bound}, {upper_bound}] for tuple inputs.")
    if not value[0] <= value[1]:
        raise ValueError(f"{name!s} tuple values must be ordered as (min, max). Got: {value}")

def non_rgb_error (image) [view source on GitHub]

Check if the input image is RGB and raise a ValueError if it's not.

This function is used to ensure that certain transformations are only applied to RGB images. It provides helpful error messages for grayscale and multi-spectral images.

Parameters:

Name Type Description
image np.ndarray

The input image to check. Expected to be a numpy array representing an image.

Exceptions:

Type Description
ValueError

If the input image is not an RGB image (i.e., does not have exactly 3 channels). The error message includes specific instructions for grayscale images and a note about incompatibility with multi-spectral images.

Note

  • RGB images are expected to have exactly 3 channels.
  • Grayscale images (1 channel) will trigger an error with conversion instructions.
  • Multi-spectral images (more than 3 channels) will trigger an error stating incompatibility.

Examples:

Python
>>> import numpy as np
>>> rgb_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> non_rgb_error(rgb_image)  # No error raised
>>>
>>> grayscale_image = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
>>> non_rgb_error(grayscale_image)  # Raises ValueError with conversion instructions
>>>
>>> multispectral_image = np.random.randint(0, 256, (100, 100, 5), dtype=np.uint8)
>>> non_rgb_error(multispectral_image)  # Raises ValueError stating incompatibility
Source code in albumentations/augmentations/utils.py
Python
def non_rgb_error(image: np.ndarray) -> None:
    """Check if the input image is RGB and raise a ValueError if it's not.

    This function is used to ensure that certain transformations are only applied to
    RGB images. It provides helpful error messages for grayscale and multi-spectral images.

    Args:
        image (np.ndarray): The input image to check. Expected to be a numpy array
                            representing an image.

    Raises:
        ValueError: If the input image is not an RGB image (i.e., does not have exactly 3 channels).
                    The error message includes specific instructions for grayscale images
                    and a note about incompatibility with multi-spectral images.

    Note:
        - RGB images are expected to have exactly 3 channels.
        - Grayscale images (1 channel) will trigger an error with conversion instructions.
        - Multi-spectral images (more than 3 channels) will trigger an error stating incompatibility.

    Example:
        >>> import numpy as np
        >>> rgb_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        >>> non_rgb_error(rgb_image)  # No error raised
        >>>
        >>> grayscale_image = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
        >>> non_rgb_error(grayscale_image)  # Raises ValueError with conversion instructions
        >>>
        >>> multispectral_image = np.random.randint(0, 256, (100, 100, 5), dtype=np.uint8)
        >>> non_rgb_error(multispectral_image)  # Raises ValueError stating incompatibility
    """
    if not is_rgb_image(image):
        message = "This transformation expects 3-channel images"
        if is_grayscale_image(image):
            message += "\nYou can convert your grayscale image to RGB using cv2.cvtColor(image, cv2.COLOR_GRAY2RGB))"
        if is_multispectral_image(image):  # Any image with a number of channels other than 1 and 3
            message += "\nThis transformation cannot be applied to multi-spectral images"

        raise ValueError(message)

check_version

def parse_version (data) [view source on GitHub]

Parses the version from the given JSON data.
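
A small illustrative call, assuming the input string is the JSON payload of a PyPI metadata response; the import path follows the source location shown below.

Python
from albumentations.check_version import parse_version

print(parse_version('{"info": {"version": "1.4.0"}}'))  # -> "1.4.0"
print(parse_version(""))                                 # -> "" for empty or malformed data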

Source code in albumentations/check_version.py
Python
def parse_version(data: str) -> str:
    """Parses the version from the given JSON data."""
    if data:
        try:
            json_data = json.loads(data)
            # Use .get() to avoid KeyError if 'version' is not present
            return json_data.get("info", {}).get("version", "")
        except json.JSONDecodeError:
            # This will handle malformed JSON data
            return ""
    return ""

core special

bbox_utils

class BboxParams (format, label_fields=None, min_area=0.0, min_visibility=0.0, min_width=0.0, min_height=0.0, check_each_transform=True, clip=False) [view source on GitHub]

Parameters of bounding boxes

Parameters:

Name Type Description
format Literal["coco", "pascal_voc", "albumentations", "yolo"]

format of bounding boxes.

  • The coco format: [x_min, y_min, width, height], e.g. [97, 12, 150, 200].
  • The pascal_voc format: [x_min, y_min, x_max, y_max], e.g. [97, 12, 247, 212].
  • The albumentations format: like pascal_voc, but normalized, in other words [x_min, y_min, x_max, y_max], e.g. [0.2, 0.3, 0.4, 0.5].
  • The yolo format: [x, y, width, height], e.g. [0.1, 0.2, 0.3, 0.4]; x, y - normalized bbox center; width, height - normalized bbox width and height.

label_fields list

List of fields joined with boxes, e.g., labels.

min_area float

Minimum area of a bounding box in pixels or normalized units. Bounding boxes with an area less than this value will be removed. Default: 0.0.

min_visibility float

Minimum fraction of area for a bounding box to remain in the list. Bounding boxes with a visible area less than this fraction will be removed. Default: 0.0.

min_width float

Minimum width of a bounding box in pixels or normalized units. Bounding boxes with a width less than this value will be removed. Default: 0.0.

min_height float

Minimum height of a bounding box in pixels or normalized units. Bounding boxes with a height less than this value will be removed. Default: 0.0.

check_each_transform bool

If True, bounding boxes will be checked after each dual transform. Default: True.

clip bool

If True, bounding boxes will be clipped to the image borders before applying any transform. Default: False.
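
A typical usage sketch: BboxParams is passed to A.Compose rather than applied on its own. The transform list, format, and label field name below are placeholders.

Python
import albumentations as A

transform = A.Compose(
    [A.HorizontalFlip(p=0.5), A.RandomCrop(height=256, width=256, p=1.0)],
    bbox_params=A.BboxParams(
        format="coco",            # boxes as [x_min, y_min, width, height]
        label_fields=["labels"],  # class labels passed alongside the boxes
        min_visibility=0.3,       # drop boxes that keep less than 30% of their area
    ),
)

# transformed = transform(image=image, bboxes=bboxes, labels=labels)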

Source code in albumentations/core/bbox_utils.py
Python
class BboxParams(Params):
    """Parameters of bounding boxes

    Args:
        format Literal["coco", "pascal_voc", "albumentations", "yolo"]: format of bounding boxes.

            The `coco` format
                `[x_min, y_min, width, height]`, e.g. [97, 12, 150, 200].
            The `pascal_voc` format
                `[x_min, y_min, x_max, y_max]`, e.g. [97, 12, 247, 212].
            The `albumentations` format
                is like `pascal_voc`, but normalized,
                in other words: `[x_min, y_min, x_max, y_max]`, e.g. [0.2, 0.3, 0.4, 0.5].
            The `yolo` format
                `[x, y, width, height]`, e.g. [0.1, 0.2, 0.3, 0.4];
                `x`, `y` - normalized bbox center; `width`, `height` - normalized bbox width and height.

        label_fields (list): List of fields joined with boxes, e.g., labels.
        min_area (float): Minimum area of a bounding box in pixels or normalized units.
            Bounding boxes with an area less than this value will be removed. Default: 0.0.
        min_visibility (float): Minimum fraction of area for a bounding box to remain in the list.
            Bounding boxes with a visible area less than this fraction will be removed. Default: 0.0.
        min_width (float): Minimum width of a bounding box in pixels or normalized units.
            Bounding boxes with a width less than this value will be removed. Default: 0.0.
        min_height (float): Minimum height of a bounding box in pixels or normalized units.
            Bounding boxes with a height less than this value will be removed. Default: 0.0.
        check_each_transform (bool): If True, bounding boxes will be checked after each dual transform. Default: True.
        clip (bool): If True, bounding boxes will be clipped to the image borders before applying any transform.
            Default: False.

    """

    def __init__(
        self,
        format: Literal["coco", "pascal_voc", "albumentations", "yolo"],  # noqa: A002
        label_fields: Sequence[Any] | None = None,
        min_area: float = 0.0,
        min_visibility: float = 0.0,
        min_width: float = 0.0,
        min_height: float = 0.0,
        check_each_transform: bool = True,
        clip: bool = False,
    ):
        super().__init__(format, label_fields)
        self.min_area = min_area
        self.min_visibility = min_visibility
        self.min_width = min_width
        self.min_height = min_height
        self.check_each_transform = check_each_transform
        self.clip = clip

    def to_dict_private(self) -> dict[str, Any]:
        data = super().to_dict_private()
        data.update(
            {
                "min_area": self.min_area,
                "min_visibility": self.min_visibility,
                "min_width": self.min_width,
                "min_height": self.min_height,
                "check_each_transform": self.check_each_transform,
                "clip": self.clip,
            },
        )
        return data

    @classmethod
    def is_serializable(cls) -> bool:
        return True

    @classmethod
    def get_class_fullname(cls) -> str:
        return "BboxParams"

def bbox_from_mask (mask) [view source on GitHub]

Create bounding box from binary mask (fast version)

Parameters:

Name Type Description
mask numpy.ndarray

binary mask.

Returns:

Type Description
tuple

A bounding box tuple (x_min, y_min, x_max, y_max).

Source code in albumentations/core/bbox_utils.py
Python
def bbox_from_mask(mask: np.ndarray) -> tuple[int, int, int, int]:
    """Create bounding box from binary mask (fast version)

    Args:
        mask (numpy.ndarray): binary mask.

    Returns:
        tuple: A bounding box tuple `(x_min, y_min, x_max, y_max)`.

    """
    rows = np.any(mask, axis=1)
    if not rows.any():
        return -1, -1, -1, -1
    cols = np.any(mask, axis=0)
    y_min, y_max = np.where(rows)[0][[0, -1]]
    x_min, x_max = np.where(cols)[0][[0, -1]]
    return x_min, y_min, x_max + 1, y_max + 1
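
A minimal sketch of bbox_from_mask on a synthetic mask, assuming the function is imported from albumentations.core.bbox_utils:

Python
import numpy as np
from albumentations.core.bbox_utils import bbox_from_mask

mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:40, 10:30] = 1  # a filled rectangle

x_min, y_min, x_max, y_max = bbox_from_mask(mask)
# (x_min, y_min, x_max, y_max) == (10, 20, 30, 40)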

def calculate_bbox_areas_in_pixels (bboxes, image_shape) [view source on GitHub]

Calculate areas for multiple bounding boxes.

This function computes the areas of bounding boxes given their normalized coordinates and the dimensions of the image they belong to. The bounding boxes are expected to be in the format [x_min, y_min, x_max, y_max] with normalized coordinates (0 to 1).

Parameters:

Name Type Description
bboxes np.ndarray

A numpy array of shape (N, 4+) where N is the number of bounding boxes. Each row contains [x_min, y_min, x_max, y_max] in normalized coordinates. Additional columns beyond the first 4 are ignored.

image_shape tuple[int, int]

A tuple containing the height and width of the image (height, width).

Returns:

Type Description
np.ndarray

A 1D numpy array of shape (N,) containing the areas of the bounding boxes in pixels. Returns an empty array if the input bboxes is empty.

Note

  • The function assumes that the input bounding boxes are valid (i.e., x_max > x_min and y_max > y_min). Invalid bounding boxes may result in negative areas.
  • The function preserves the input array and creates a copy for internal calculations.
  • The returned areas are in pixel units, not normalized.

Examples:

Python
>>> bboxes = np.array([[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.8, 0.8]])
>>> image_shape = (100, 100)
>>> areas = calculate_bbox_areas_in_pixels(bboxes, image_shape)
>>> print(areas)
[1600. 3600.]
Source code in albumentations/core/bbox_utils.py
Python
def calculate_bbox_areas_in_pixels(bboxes: np.ndarray, image_shape: tuple[int, int]) -> np.ndarray:
    """Calculate areas for multiple bounding boxes.

    This function computes the areas of bounding boxes given their normalized coordinates
    and the dimensions of the image they belong to. The bounding boxes are expected to be
    in the format [x_min, y_min, x_max, y_max] with normalized coordinates (0 to 1).

    Args:
        bboxes (np.ndarray): A numpy array of shape (N, 4+) where N is the number of bounding boxes.
                             Each row contains [x_min, y_min, x_max, y_max] in normalized coordinates.
                             Additional columns beyond the first 4 are ignored.
        image_shape (tuple[int, int]): A tuple containing the height and width of the image (height, width).

    Returns:
        np.ndarray: A 1D numpy array of shape (N,) containing the areas of the bounding boxes in pixels.
                    Returns an empty array if the input `bboxes` is empty.

    Note:
        - The function assumes that the input bounding boxes are valid (i.e., x_max > x_min and y_max > y_min).
          Invalid bounding boxes may result in negative areas.
        - The function preserves the input array and creates a copy for internal calculations.
        - The returned areas are in pixel units, not normalized.

    Example:
        >>> bboxes = np.array([[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.8, 0.8]])
        >>> image_shape = (100, 100)
        >>> areas = calculate_bbox_areas_in_pixels(bboxes, image_shape)
        >>> print(areas)
        [1600. 3600.]
    """
    if len(bboxes) == 0:
        return np.array([], dtype=np.float32)

    height, width = image_shape
    bboxes_denorm = bboxes.copy()
    bboxes_denorm[:, [0, 2]] *= width
    bboxes_denorm[:, [1, 3]] *= height
    return (bboxes_denorm[:, 2] - bboxes_denorm[:, 0]) * (bboxes_denorm[:, 3] - bboxes_denorm[:, 1])

def check_bboxes (bboxes) [view source on GitHub]

Check that bbox boundaries are within the range [0, 1] and that minimum coordinates are less than maximum coordinates.

Parameters:

Name Type Description
bboxes np.ndarray

numpy array of shape (num_bboxes, 4+) where first 4 coordinates are x_min, y_min, x_max, y_max.

Exceptions:

Type Description
ValueError

If any bbox is invalid.

Source code in albumentations/core/bbox_utils.py
Python
@handle_empty_array
def check_bboxes(bboxes: np.ndarray) -> None:
    """Check if bboxes boundaries are in range 0, 1 and minimums are lesser than maximums.

    Args:
        bboxes: numpy array of shape (num_bboxes, 4+) where first 4 coordinates are x_min, y_min, x_max, y_max.

    Raises:
        ValueError: If any bbox is invalid.
    """
    # Check if all values are in range [0, 1]
    in_range = (bboxes[:, :4] >= 0) & (bboxes[:, :4] <= 1)
    close_to_zero = np.isclose(bboxes[:, :4], 0)
    close_to_one = np.isclose(bboxes[:, :4], 1)
    valid_range = in_range | close_to_zero | close_to_one

    if not np.all(valid_range):
        invalid_idx = np.where(~np.all(valid_range, axis=1))[0][0]
        invalid_bbox = bboxes[invalid_idx]
        invalid_coord = ["x_min", "y_min", "x_max", "y_max"][np.where(~valid_range[invalid_idx])[0][0]]
        invalid_value = invalid_bbox[np.where(~valid_range[invalid_idx])[0][0]]
        raise ValueError(
            f"Expected {invalid_coord} for bbox {invalid_bbox} to be in the range [0.0, 1.0], got {invalid_value}.",
        )

    # Check if x_max > x_min and y_max > y_min
    valid_order = (bboxes[:, 2] > bboxes[:, 0]) & (bboxes[:, 3] > bboxes[:, 1])

    if not np.all(valid_order):
        invalid_idx = np.where(~valid_order)[0][0]
        invalid_bbox = bboxes[invalid_idx]
        if invalid_bbox[2] <= invalid_bbox[0]:
            raise ValueError(f"x_max is less than or equal to x_min for bbox {invalid_bbox}.")

        raise ValueError(f"y_max is less than or equal to y_min for bbox {invalid_bbox}.")

def clip_bboxes (bboxes, image_shape) [view source on GitHub]

Clips the bounding box coordinates to ensure they fit within the boundaries of an image.

Parameters:

Name Type Description
bboxes np.ndarray

Array of bounding boxes with shape (num_boxes, 4+) in normalized format. The first 4 columns are [x_min, y_min, x_max, y_max].

image_shape Tuple[int, int]

Image shape (height, width).

Returns:

Type Description
np.ndarray

The clipped bounding boxes, normalized to the image dimensions.

Source code in albumentations/core/bbox_utils.py
Python
@handle_empty_array
def clip_bboxes(bboxes: np.ndarray, image_shape: tuple[int, int]) -> np.ndarray:
    """Clips the bounding box coordinates to ensure they fit within the boundaries of an image.

    Parameters:
        bboxes (np.ndarray): Array of bounding boxes with shape (num_boxes, 4+) in normalized format.
                             The first 4 columns are [x_min, y_min, x_max, y_max].
        image_shape (Tuple[int, int]): Image shape (height, width).

    Returns:
        np.ndarray: The clipped bounding boxes, normalized to the image dimensions.

    """
    height, width = image_shape[:2]

    # Denormalize bboxes
    denorm_bboxes = denormalize_bboxes(bboxes, image_shape)

    ## Note:
    # It could be tempting to use cols - 1 and rows - 1 as the upper bounds for the clipping

    # But this would cause the bounding box to be clipped to the image dimensions - 1 which is not what we want.
    # Bounding box lives not in the middle of pixels but between them.

    # Example: for image with height 100, width 100, the pixel values are in the range [0, 99]
    # but if we want bounding box to be 1 pixel width and height and lie on the boundary of the image
    # it will be described as [99, 99, 100, 100] => clip by image_size - 1 will lead to [99, 99, 99, 99]
    # which is incorrect

    # It could be also tempting to clip `x_min`` to `cols - 1`` and `y_min` to `rows - 1`, but this also leads
    # to another error. If image fully lies outside of the visible area and min_area is set to 0, then
    # the bounding box will be clipped to the image size - 1 and will be 1 pixel in size and fully visible,
    # but it should be completely removed.

    # Clip coordinates
    denorm_bboxes[:, [0, 2]] = np.clip(denorm_bboxes[:, [0, 2]], 0, width)
    denorm_bboxes[:, [1, 3]] = np.clip(denorm_bboxes[:, [1, 3]], 0, height)

    # Normalize clipped bboxes
    return normalize_bboxes(denorm_bboxes, image_shape)
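
A minimal sketch of clipping a box that extends past the image border (normalized coordinates, values illustrative):

Python
import numpy as np
from albumentations.core.bbox_utils import clip_bboxes

bboxes = np.array([[0.5, 0.5, 1.2, 1.3]])  # spills over the right and bottom edges
clipped = clip_bboxes(bboxes, (100, 100))
# clipped -> [[0.5, 0.5, 1.0, 1.0]]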

def convert_bboxes_from_albumentations (bboxes, target_format, image_shape, check_validity=False) [view source on GitHub]

Convert bounding boxes from the format used by albumentations to a specified format.

Parameters:

Name Type Description
bboxes np.ndarray

A numpy array of albumentations bounding boxes with shape (num_bboxes, 4+). The first 4 columns are [x_min, y_min, x_max, y_max].

target_format Literal['coco', 'pascal_voc', 'yolo']

Required format of the output bounding boxes. Should be 'coco', 'pascal_voc' or 'yolo'.

image_shape tuple[int, int]

Image shape (height, width).

check_validity bool

Check if all boxes are valid boxes.

Returns:

Type Description
np.ndarray

An array of bounding boxes in the target format with shape (num_bboxes, 4+).

Exceptions:

Type Description
ValueError

If target_format is not 'coco', 'pascal_voc' or 'yolo'.

Source code in albumentations/core/bbox_utils.py
Python
@handle_empty_array
def convert_bboxes_from_albumentations(
    bboxes: np.ndarray,
    target_format: Literal["coco", "pascal_voc", "yolo"],
    image_shape: tuple[int, int],
    check_validity: bool = False,
) -> np.ndarray:
    """Convert bounding boxes from the format used by albumentations to a specified format.

    Args:
        bboxes: A numpy array of albumentations bounding boxes with shape (num_bboxes, 4+).
                The first 4 columns are [x_min, y_min, x_max, y_max].
        target_format: Required format of the output bounding boxes. Should be 'coco', 'pascal_voc' or 'yolo'.
        image_shape: Image shape (height, width).
        check_validity: Check if all boxes are valid boxes.

    Returns:
        np.ndarray: An array of bounding boxes in the target format with shape (num_bboxes, 4+).

    Raises:
        ValueError: If `target_format` is not 'coco', 'pascal_voc' or 'yolo'.
    """
    if target_format not in {"coco", "pascal_voc", "yolo"}:
        raise ValueError(
            f"Unknown target_format {target_format}. Supported formats are: 'coco', 'pascal_voc' and 'yolo'",
        )

    if check_validity:
        check_bboxes(bboxes)

    converted_bboxes = np.zeros_like(bboxes)
    converted_bboxes[:, 4:] = bboxes[:, 4:]  # Preserve additional columns

    denormalized_bboxes = denormalize_bboxes(bboxes[:, :4], image_shape) if target_format != "yolo" else bboxes[:, :4]

    if target_format == "coco":
        converted_bboxes[:, 0] = denormalized_bboxes[:, 0]  # x_min
        converted_bboxes[:, 1] = denormalized_bboxes[:, 1]  # y_min
        converted_bboxes[:, 2] = denormalized_bboxes[:, 2] - denormalized_bboxes[:, 0]  # width
        converted_bboxes[:, 3] = denormalized_bboxes[:, 3] - denormalized_bboxes[:, 1]  # height
    elif target_format == "yolo":
        converted_bboxes[:, 0] = (denormalized_bboxes[:, 0] + denormalized_bboxes[:, 2]) / 2  # x_center
        converted_bboxes[:, 1] = (denormalized_bboxes[:, 1] + denormalized_bboxes[:, 3]) / 2  # y_center
        converted_bboxes[:, 2] = denormalized_bboxes[:, 2] - denormalized_bboxes[:, 0]  # width
        converted_bboxes[:, 3] = denormalized_bboxes[:, 3] - denormalized_bboxes[:, 1]  # height
    else:  # pascal_voc
        converted_bboxes[:, :4] = denormalized_bboxes

    return converted_bboxes

def convert_bboxes_to_albumentations (bboxes, source_format, image_shape, check_validity=False) [view source on GitHub]

Convert bounding boxes from a specified format to the format used by albumentations: normalized coordinates of top-left and bottom-right corners of the bounding box in the form of (x_min, y_min, x_max, y_max) e.g. (0.15, 0.27, 0.67, 0.5).

Parameters:

Name Type Description
bboxes np.ndarray

A numpy array of bounding boxes with shape (num_bboxes, 4+).

source_format Literal['coco', 'pascal_voc', 'yolo']

Format of the input bounding boxes. Should be 'coco', 'pascal_voc', or 'yolo'.

image_shape tuple[int, int]

Image shape (height, width).

check_validity bool

Check if all boxes are valid boxes.

Returns:

Type Description
np.ndarray

An array of bounding boxes in albumentations format with shape (num_bboxes, 4+).

Exceptions:

Type Description
ValueError

If source_format is not 'coco', 'pascal_voc', or 'yolo'.

ValueError

If the bboxes are in YOLO format and any coordinates fall outside the range (0, 1].

Source code in albumentations/core/bbox_utils.py
Python
@handle_empty_array
def convert_bboxes_to_albumentations(
    bboxes: np.ndarray,
    source_format: Literal["coco", "pascal_voc", "yolo"],
    image_shape: tuple[int, int],
    check_validity: bool = False,
) -> np.ndarray:
    """Convert bounding boxes from a specified format to the format used by albumentations:
    normalized coordinates of top-left and bottom-right corners of the bounding box in the form of
    `(x_min, y_min, x_max, y_max)` e.g. `(0.15, 0.27, 0.67, 0.5)`.

    Args:
        bboxes: A numpy array of bounding boxes with shape (num_bboxes, 4+).
        source_format: Format of the input bounding boxes. Should be 'coco', 'pascal_voc', or 'yolo'.
        image_shape: Image shape (height, width).
        check_validity: Check if all boxes are valid boxes.

    Returns:
        np.ndarray: An array of bounding boxes in albumentations format with shape (num_bboxes, 4+).

    Raises:
        ValueError: If `source_format` is not 'coco', 'pascal_voc', or 'yolo'.
        ValueError: If in YOLO format, any coordinates are not in the range (0, 1].
    """
    if source_format not in {"coco", "pascal_voc", "yolo"}:
        raise ValueError(
            f"Unknown source_format {source_format}. Supported formats are: 'coco', 'pascal_voc' and 'yolo'",
        )

    bboxes = bboxes.copy().astype(np.float32)
    converted_bboxes = np.zeros_like(bboxes)
    converted_bboxes[:, 4:] = bboxes[:, 4:]  # Preserve additional columns

    if source_format == "coco":
        converted_bboxes[:, 0] = bboxes[:, 0]  # x_min
        converted_bboxes[:, 1] = bboxes[:, 1]  # y_min
        converted_bboxes[:, 2] = bboxes[:, 0] + bboxes[:, 2]  # x_max
        converted_bboxes[:, 3] = bboxes[:, 1] + bboxes[:, 3]  # y_max
    elif source_format == "yolo":
        if check_validity and np.any((bboxes[:, :4] <= 0) | (bboxes[:, :4] > 1)):
            raise ValueError(f"In YOLO format all coordinates must be float and in range (0, 1], got {bboxes}")

        w_half, h_half = bboxes[:, 2] / 2, bboxes[:, 3] / 2
        converted_bboxes[:, 0] = bboxes[:, 0] - w_half  # x_min
        converted_bboxes[:, 1] = bboxes[:, 1] - h_half  # y_min
        converted_bboxes[:, 2] = bboxes[:, 0] + w_half  # x_max
        converted_bboxes[:, 3] = bboxes[:, 1] + h_half  # y_max
    else:  # pascal_voc
        converted_bboxes[:, :4] = bboxes[:, :4]

    if source_format != "yolo":
        converted_bboxes[:, :4] = normalize_bboxes(converted_bboxes[:, :4], image_shape)

    if check_validity:
        check_bboxes(converted_bboxes)

    return converted_bboxes
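
A minimal round-trip sketch between the coco and albumentations formats (values illustrative; functions imported from albumentations.core.bbox_utils):

Python
import numpy as np
from albumentations.core.bbox_utils import (
    convert_bboxes_to_albumentations,
    convert_bboxes_from_albumentations,
)

image_shape = (200, 400)                      # (height, width)
coco = np.array([[40.0, 20.0, 100.0, 50.0]])  # [x_min, y_min, width, height]

albu = convert_bboxes_to_albumentations(coco, "coco", image_shape)
# albu -> [[0.1, 0.1, 0.35, 0.35]]  (normalized [x_min, y_min, x_max, y_max])

back = convert_bboxes_from_albumentations(albu, "coco", image_shape)
# back -> [[40., 20., 100., 50.]]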

def denormalize_bboxes (bboxes, image_shape) [view source on GitHub]

Denormalize array of bounding boxes.

Parameters:

Name Type Description
bboxes np.ndarray

Normalized bounding boxes [(x_min, y_min, x_max, y_max, ...)].

image_shape tuple[int, int]

Image shape (height, width).

Returns:

Type Description
np.ndarray

Denormalized bounding boxes [(x_min, y_min, x_max, y_max, ...)].

Source code in albumentations/core/bbox_utils.py
Python
@handle_empty_array
def denormalize_bboxes(
    bboxes: np.ndarray,
    image_shape: tuple[int, int],
) -> np.ndarray:
    """Denormalize  array of bounding boxes.

    Args:
        bboxes: Normalized bounding boxes `[(x_min, y_min, x_max, y_max, ...)]`.
        image_shape: Image shape `(height, width)`.

    Returns:
        Denormalized bounding boxes `[(x_min, y_min, x_max, y_max, ...)]`.

    """
    rows, cols = image_shape[:2]

    denormalized = bboxes.copy().astype(float)
    denormalized[:, [0, 2]] *= cols
    denormalized[:, [1, 3]] *= rows
    return denormalized

def filter_bboxes (bboxes, image_shape, min_area=0.0, min_visibility=0.0, min_width=1.0, min_height=1.0) [view source on GitHub]

Remove bounding boxes whose visible fraction of area after clipping to the image is below min_visibility, or whose area in pixels is under the threshold set by min_area. Also clips the remaining boxes to the final image size.

Parameters:

Name Type Description
bboxes np.ndarray

numpy array of bounding boxes with shape (num_bboxes, 4+). The first 4 columns are [x_min, y_min, x_max, y_max].

image_shape tuple[int, int]

Image shape (height, width).

min_area float

Minimum area of a bounding box in pixels. Default: 0.0.

min_visibility float

Minimum fraction of area for a bounding box to remain. Default: 0.0.

min_width float

Minimum width of a bounding box in pixels. Default: 1.0.

min_height float

Minimum height of a bounding box in pixels. Default: 1.0.

Returns:

Type Description
np.ndarray

numpy array of filtered bounding boxes.

Source code in albumentations/core/bbox_utils.py
Python
def filter_bboxes(
    bboxes: np.ndarray,
    image_shape: tuple[int, int],
    min_area: float = 0.0,
    min_visibility: float = 0.0,
    min_width: float = 1.0,
    min_height: float = 1.0,
) -> np.ndarray:
    """Remove bounding boxes that either lie outside of the visible area by more than min_visibility
    or whose area in pixels is under the threshold set by `min_area`. Also crops boxes to final image size.

    Args:
        bboxes: numpy array of bounding boxes with shape (num_bboxes, 4+).
                The first 4 columns are [x_min, y_min, x_max, y_max].
        image_shape: Image shape (height, width).
        min_area: Minimum area of a bounding box in pixels. Default: 0.0.
        min_visibility: Minimum fraction of area for a bounding box to remain. Default: 0.0.
        min_width: Minimum width of a bounding box in pixels. Default: 0.0.
        min_height: Minimum height of a bounding box in pixels. Default: 0.0.

    Returns:
        numpy array of filtered bounding boxes.
    """
    if len(bboxes) == 0:
        return np.array([], dtype=np.float32)

    # Calculate areas of bounding boxes before clipping in pixels
    denormalized_box_areas = calculate_bbox_areas_in_pixels(bboxes, image_shape)

    # Clip bounding boxes in ratio
    clipped_bboxes = clip_bboxes(bboxes, image_shape)

    # Calculate areas of clipped bounding boxes in pixels
    clipped_box_areas = calculate_bbox_areas_in_pixels(clipped_bboxes, image_shape)

    # Calculate width and height of the clipped bounding boxes
    denormalized_bboxes = denormalize_bboxes(clipped_bboxes[:, :4], image_shape)

    clipped_widths = denormalized_bboxes[:, 2] - denormalized_bboxes[:, 0]
    clipped_heights = denormalized_bboxes[:, 3] - denormalized_bboxes[:, 1]

    # Create a mask for bboxes that meet all criteria
    mask = (
        (denormalized_box_areas >= EPSILON)
        & (clipped_box_areas >= min_area - EPSILON)
        & (clipped_box_areas / denormalized_box_areas >= min_visibility - EPSILON)
        & (clipped_widths >= min_width - EPSILON)
        & (clipped_heights >= min_height - EPSILON)
    )

    # Apply the mask to get the filtered bboxes
    filtered_bboxes = clipped_bboxes[mask]

    # If no bboxes pass the filter, return an empty array with the same number of columns as input
    if len(filtered_bboxes) == 0:
        return np.array([], dtype=np.float32)

    return filtered_bboxes
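
A minimal sketch: the second box below is mostly outside the image, so after clipping its visible fraction falls under min_visibility and it is removed (values illustrative):

Python
import numpy as np
from albumentations.core.bbox_utils import filter_bboxes

image_shape = (100, 100)
bboxes = np.array([
    [0.1, 0.1, 0.5, 0.5],  # fully visible -> kept
    [0.9, 0.9, 1.4, 1.4],  # only ~4% of its area is inside the image -> dropped
])
kept = filter_bboxes(bboxes, image_shape, min_area=16, min_visibility=0.5)
# kept -> [[0.1, 0.1, 0.5, 0.5]]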

def mask_from_bbox (img, bbox) [view source on GitHub]

Create binary mask from bounding box

Parameters:

Name Type Description
img np.ndarray

input image

bbox tuple[int, int, int, int]

A bounding box tuple (x_min, y_min, x_max, y_max)

Returns:

Type Description
mask

binary mask

Source code in albumentations/core/bbox_utils.py
Python
def mask_from_bbox(img: np.ndarray, bbox: tuple[int, int, int, int]) -> np.ndarray:
    """Create binary mask from bounding box

    Args:
        img: input image
        bbox: A bounding box tuple `(x_min, y_min, x_max, y_max)`

    Returns:
        mask: binary mask

    """
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    x_min, y_min, x_max, y_max = bbox
    mask[y_min:y_max, x_min:x_max] = 1
    return mask
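
A minimal sketch, the inverse of bbox_from_mask above (values illustrative):

Python
import numpy as np
from albumentations.core.bbox_utils import mask_from_bbox

img = np.zeros((100, 100, 3), dtype=np.uint8)
mask = mask_from_bbox(img, bbox=(10, 20, 30, 40))
# mask.shape == (100, 100); mask[20:40, 10:30] is 1, everything else is 0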

def normalize_bboxes (bboxes, image_shape) [view source on GitHub]

Normalize array of bounding boxes.

Parameters:

Name Type Description
bboxes np.ndarray

Denormalized bounding boxes [(x_min, y_min, x_max, y_max, ...)].

image_shape tuple[int, int]

Image shape (height, width).

Returns:

Type Description
np.ndarray

Normalized bounding boxes [(x_min, y_min, x_max, y_max, ...)].

Source code in albumentations/core/bbox_utils.py
Python
@handle_empty_array
def normalize_bboxes(bboxes: np.ndarray, image_shape: tuple[int, int]) -> np.ndarray:
    """Normalize array of bounding boxes.

    Args:
        bboxes: Denormalized bounding boxes `[(x_min, y_min, x_max, y_max, ...)]`.
        image_shape: Image shape `(height, width)`.

    Returns:
        Normalized bounding boxes `[(x_min, y_min, x_max, y_max, ...)]`.

    """
    rows, cols = image_shape[:2]
    normalized = bboxes.copy().astype(float)
    normalized[:, [0, 2]] /= cols
    normalized[:, [1, 3]] /= rows
    return normalized
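
A minimal round-trip sketch between pixel and normalized coordinates (values illustrative):

Python
import numpy as np
from albumentations.core.bbox_utils import normalize_bboxes, denormalize_bboxes

image_shape = (100, 200)  # (height, width)
pixel_boxes = np.array([[20.0, 10.0, 60.0, 50.0]])

normalized = normalize_bboxes(pixel_boxes, image_shape)
# normalized -> [[0.1, 0.1, 0.3, 0.5]]

restored = denormalize_bboxes(normalized, image_shape)
# restored -> [[20., 10., 60., 50.]]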

def union_of_bboxes (bboxes, erosion_rate) [view source on GitHub]

Calculate the union of bounding boxes. Boxes can be in albumentations or Pascal VOC format.

Parameters:

Name Type Description
bboxes np.ndarray

Array of bounding boxes with shape (num_bboxes, 4+); the first 4 columns are [x_min, y_min, x_max, y_max].

erosion_rate float

How much each bounding box can be shrunk, useful for erosive cropping. Set this in the range [0, 1]. 0 applies no erosion at all; 1.0 can shrink any bbox to zero area.

Returns:

Type Description
np.ndarray | None

A bounding box (x_min, y_min, x_max, y_max) or None if no bboxes are given or if the bounding boxes become invalid after erosion.

Source code in albumentations/core/bbox_utils.py
Python
def union_of_bboxes(bboxes: np.ndarray, erosion_rate: float) -> np.ndarray | None:
    """Calculate union of bounding boxes. Boxes could be in albumentations or Pascal Voc format.

    Args:
        bboxes (list[tuple]): List of bounding boxes
        erosion_rate (float): How much each bounding box can be shrunk, useful for erosive cropping.
            Set this in range [0, 1]. 0 will not be erosive at all, 1.0 can make any bbox lose its volume.

    Returns:
        np.ndarray | None: A bounding box `(x_min, y_min, x_max, y_max)` or None if no bboxes are given or if
                    the bounding boxes become invalid after erosion.
    """
    if not bboxes.size:
        return None

    if erosion_rate == 1:
        return None

    if bboxes.shape[0] == 1:
        return bboxes[0][:4]

    x_min, y_min = np.min(bboxes[:, :2], axis=0)
    x_max, y_max = np.max(bboxes[:, 2:4], axis=0)

    width = x_max - x_min
    height = y_max - y_min

    erosion_x = width * erosion_rate * 0.5
    erosion_y = height * erosion_rate * 0.5

    x_min += erosion_x
    y_min += erosion_y
    x_max -= erosion_x
    y_max -= erosion_y

    if abs(x_max - x_min) < EPSILON or abs(y_max - y_min) < EPSILON:
        return None

    return np.array([x_min, y_min, x_max, y_max], dtype=np.float32)
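
A minimal sketch with two overlapping boxes in albumentations format and no erosion (values illustrative):

Python
import numpy as np
from albumentations.core.bbox_utils import union_of_bboxes

bboxes = np.array([
    [0.1, 0.1, 0.4, 0.4],
    [0.3, 0.2, 0.8, 0.6],
])
union = union_of_bboxes(bboxes, erosion_rate=0.0)
# union -> [0.1, 0.1, 0.8, 0.6]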

composition

class BaseCompose (transforms, p) [view source on GitHub]

Base class for composing multiple transforms together.

This class serves as a foundation for creating compositions of transforms in the Albumentations library. It provides basic functionality for managing a sequence of transforms and applying them to data.

Attributes:

Name Type Description
transforms List[TransformType]

A list of transforms to be applied.

p float

Probability of applying the compose. Should be in the range [0, 1].

replay_mode bool

If True, the compose is in replay mode.

applied_in_replay bool

Indicates if the compose was applied during replay.

_additional_targets Dict[str, str]

Additional targets for transforms.

_available_keys Set[str]

Set of available keys for data.

processors Dict[str, Union[BboxProcessor, KeypointsProcessor]]

Processors for specific data types.

Parameters:

Name Type Description
transforms TransformsSeqType

A sequence of transforms to compose.

p float

Probability of applying the compose.

Exceptions:

Type Description
ValueError

If an invalid additional target is specified.

Note

  • Subclasses should implement the __call__ method to define how the composition is applied to data.
  • The class supports serialization and deserialization of transforms.
  • It provides methods for adding targets, setting deterministic behavior, and checking data validity post-transform.


Source code in albumentations/core/composition.py
Python
class BaseCompose(Serializable):
    """Base class for composing multiple transforms together.

    This class serves as a foundation for creating compositions of transforms
    in the Albumentations library. It provides basic functionality for
    managing a sequence of transforms and applying them to data.

    Attributes:
        transforms (List[TransformType]): A list of transforms to be applied.
        p (float): Probability of applying the compose. Should be in the range [0, 1].
        replay_mode (bool): If True, the compose is in replay mode.
        applied_in_replay (bool): Indicates if the compose was applied during replay.
        _additional_targets (Dict[str, str]): Additional targets for transforms.
        _available_keys (Set[str]): Set of available keys for data.
        processors (Dict[str, Union[BboxProcessor, KeypointsProcessor]]): Processors for specific data types.

    Args:
        transforms (TransformsSeqType): A sequence of transforms to compose.
        p (float): Probability of applying the compose.

    Raises:
        ValueError: If an invalid additional target is specified.

    Note:
        - Subclasses should implement the __call__ method to define how
          the composition is applied to data.
        - The class supports serialization and deserialization of transforms.
        - It provides methods for adding targets, setting deterministic behavior,
          and checking data validity post-transform.
    """

    _transforms_dict: dict[int, BasicTransform] | None = None
    check_each_transform: tuple[DataProcessor, ...] | None = None
    main_compose: bool = True

    def __init__(self, transforms: TransformsSeqType, p: float):
        if isinstance(transforms, (BaseCompose, BasicTransform)):
            warnings.warn(
                "transforms is single transform, but a sequence is expected! Transform will be wrapped into list.",
                stacklevel=2,
            )
            transforms = [transforms]

        self.transforms = transforms
        self.p = p

        self.replay_mode = False
        self.applied_in_replay = False
        self._additional_targets: dict[str, str] = {}
        self._available_keys: set[str] = set()
        self.processors: dict[str, BboxProcessor | KeypointsProcessor] = {}
        self._set_keys()

    def __iter__(self) -> Iterator[TransformType]:
        return iter(self.transforms)

    def __len__(self) -> int:
        return len(self.transforms)

    def __call__(self, *args: Any, **data: Any) -> dict[str, Any]:
        raise NotImplementedError

    def __getitem__(self, item: int) -> TransformType:
        return self.transforms[item]

    def __repr__(self) -> str:
        return self.indented_repr()

    @property
    def additional_targets(self) -> dict[str, str]:
        return self._additional_targets

    @property
    def available_keys(self) -> set[str]:
        return self._available_keys

    def indented_repr(self, indent: int = REPR_INDENT_STEP) -> str:
        args = {k: v for k, v in self.to_dict_private().items() if not (k.startswith("__") or k == "transforms")}
        repr_string = self.__class__.__name__ + "(["
        for t in self.transforms:
            repr_string += "\n"
            t_repr = t.indented_repr(indent + REPR_INDENT_STEP) if hasattr(t, "indented_repr") else repr(t)
            repr_string += " " * indent + t_repr + ","
        repr_string += "\n" + " " * (indent - REPR_INDENT_STEP) + f"], {format_args(args)})"
        return repr_string

    @classmethod
    def get_class_fullname(cls) -> str:
        return get_shortest_class_fullname(cls)

    @classmethod
    def is_serializable(cls) -> bool:
        return True

    def to_dict_private(self) -> dict[str, Any]:
        return {
            "__class_fullname__": self.get_class_fullname(),
            "p": self.p,
            "transforms": [t.to_dict_private() for t in self.transforms],
        }

    def get_dict_with_id(self) -> dict[str, Any]:
        return {
            "__class_fullname__": self.get_class_fullname(),
            "id": id(self),
            "params": None,
            "transforms": [t.get_dict_with_id() for t in self.transforms],
        }

    def add_targets(self, additional_targets: dict[str, str] | None) -> None:
        if additional_targets:
            for k, v in additional_targets.items():
                if k in self._additional_targets and v != self._additional_targets[k]:
                    raise ValueError(
                        f"Trying to overwrite existed additional targets. "
                        f"Key={k} Exists={self._additional_targets[k]} New value: {v}",
                    )
            self._additional_targets.update(additional_targets)
            for t in self.transforms:
                t.add_targets(additional_targets)
            for proc in self.processors.values():
                proc.add_targets(additional_targets)
        self._set_keys()

    def _set_keys(self) -> None:
        """Set _available_keys"""
        self._available_keys.update(self._additional_targets.keys())
        for t in self.transforms:
            self._available_keys.update(t.available_keys)
            if hasattr(t, "targets_as_params"):
                self._available_keys.update(t.targets_as_params)
        if self.processors:
            self._available_keys.update(["labels"])
            for proc in self.processors.values():
                if proc.default_data_name not in self._available_keys:  # if no transform to process this data
                    warnings.warn(
                        f"Got processor for {proc.default_data_name}, but no transform to process it.",
                        stacklevel=2,
                    )
                self._available_keys.update(proc.data_fields)
                if proc.params.label_fields:
                    self._available_keys.update(proc.params.label_fields)

    def set_deterministic(self, flag: bool, save_key: str = "replay") -> None:
        for t in self.transforms:
            t.set_deterministic(flag, save_key)

    def check_data_post_transform(self, data: Any) -> dict[str, Any]:
        if self.check_each_transform:
            image_shape = get_shape(data["image"])

            for proc in self.check_each_transform:
                for data_name in data:
                    if data_name in proc.data_fields or (
                        data_name in self._additional_targets
                        and self._additional_targets[data_name] in proc.data_fields
                    ):
                        data[data_name] = proc.filter(data[data_name], image_shape)
        return data

class Compose (transforms, bbox_params=None, keypoint_params=None, additional_targets=None, p=1.0, is_check_shapes=True, strict=True, return_params=False, save_key='applied_params') [view source on GitHub]

Compose transforms and handle all transformations regarding bounding boxes

Parameters:

Name Type Description
transforms list

list of transformations to compose.

bbox_params BboxParams

Parameters for bounding boxes transforms

keypoint_params KeypointParams

Parameters for keypoints transforms

additional_targets dict

Dict whose keys are new target names and values are existing target names, e.g. {'image2': 'image'}.

p float

Probability of applying the whole list of transforms. Default: 1.0.

is_check_shapes bool

If True, the shapes of image, mask, and masks inputs are checked for consistency on each call. Pass False to disable this check (do so only if you are sure your data is consistent).

strict bool

If True, unknown keys will raise an error. If False, unknown keys will be ignored. Default: True.

return_params bool

If True, the parameters of each applied transform are returned.

save_key str

Key under which the applied parameters are saved. Default: 'applied_params'.
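
A minimal sketch of additional_targets: a second image is processed with exactly the same parameters as the primary image (array shapes and names are illustrative):

Python
import numpy as np
import albumentations as A

transform = A.Compose(
    [A.HorizontalFlip(p=0.5)],
    additional_targets={"image2": "image"},
)

image = np.zeros((100, 100, 3), dtype=np.uint8)
image2 = np.zeros((100, 100, 3), dtype=np.uint8)

out = transform(image=image, image2=image2)
# out["image"] and out["image2"] are flipped (or left unchanged) together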


Source code in albumentations/core/composition.py
Python
class Compose(BaseCompose, HubMixin):
    """Compose transforms and handle all transformations regarding bounding boxes

    Args:
        transforms (list): list of transformations to compose.
        bbox_params (BboxParams): Parameters for bounding boxes transforms
        keypoint_params (KeypointParams): Parameters for keypoints transforms
        additional_targets (dict): Dict with keys - new target name, values - old target name. ex: {'image2': 'image'}
        p (float): probability of applying all list of transforms. Default: 1.0.
        is_check_shapes (bool): If True shapes consistency of images/mask/masks would be checked on each call. If you
            would like to disable this check - pass False (do it only if you are sure in your data consistency).
        strict (bool): If True, unknown keys will raise an error. If False, unknown keys will be ignored. Default: True.
        return_params (bool): if True returns params of each applied transform
        save_key (str): key to save applied params, default is 'applied_params'

    """

    def __init__(
        self,
        transforms: TransformsSeqType,
        bbox_params: dict[str, Any] | BboxParams | None = None,
        keypoint_params: dict[str, Any] | KeypointParams | None = None,
        additional_targets: dict[str, str] | None = None,
        p: float = 1.0,
        is_check_shapes: bool = True,
        strict: bool = True,
        return_params: bool = False,
        save_key: str = "applied_params",
    ):
        super().__init__(transforms, p)

        if bbox_params:
            if isinstance(bbox_params, dict):
                b_params = BboxParams(**bbox_params)
            elif isinstance(bbox_params, BboxParams):
                b_params = bbox_params
            else:
                msg = "unknown format of bbox_params, please use `dict` or `BboxParams`"
                raise ValueError(msg)
            self.processors["bboxes"] = BboxProcessor(b_params)

        if keypoint_params:
            if isinstance(keypoint_params, dict):
                k_params = KeypointParams(**keypoint_params)
            elif isinstance(keypoint_params, KeypointParams):
                k_params = keypoint_params
            else:
                msg = "unknown format of keypoint_params, please use `dict` or `KeypointParams`"
                raise ValueError(msg)
            self.processors["keypoints"] = KeypointsProcessor(k_params)

        for proc in self.processors.values():
            proc.ensure_transforms_valid(self.transforms)

        self.add_targets(additional_targets)
        if not self.transforms:  # if no transforms -> do nothing, all keys will be available
            self._available_keys.update(AVAILABLE_KEYS)

        self.is_check_args = True
        self.strict = strict

        self.is_check_shapes = is_check_shapes
        self.check_each_transform = tuple(  # processors that checks after each transform
            proc for proc in self.processors.values() if getattr(proc.params, "check_each_transform", False)
        )
        self._set_check_args_for_transforms(self.transforms)

        self.return_params = return_params
        if return_params:
            self.save_key = save_key
            self._available_keys.add(save_key)
            self._transforms_dict = get_transforms_dict(self.transforms)
            self.set_deterministic(True, save_key=save_key)

    def _set_check_args_for_transforms(self, transforms: TransformsSeqType) -> None:
        for transform in transforms:
            if isinstance(transform, BaseCompose):
                self._set_check_args_for_transforms(transform.transforms)
                transform.check_each_transform = self.check_each_transform
                transform.processors = self.processors
            if isinstance(transform, Compose):
                transform.disable_check_args_private()

    def disable_check_args_private(self) -> None:
        self.is_check_args = False
        self.strict = False
        self.main_compose = False

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
        if args:
            msg = "You have to pass data to augmentations as named arguments, for example: aug(image=image)"
            raise KeyError(msg)

        if not isinstance(force_apply, (bool, int)):
            msg = "force_apply must have bool or int type"
            raise TypeError(msg)

        if self.return_params and self.main_compose:
            data[self.save_key] = OrderedDict()

        need_to_run = force_apply or random.random() < self.p
        if not need_to_run:
            return data

        self.preprocess(data)

        for t in self.transforms:
            data = t(**data)
            data = self.check_data_post_transform(data)

        return self.postprocess(data)

    def run_with_params(self, *, params: dict[int, dict[str, Any]], **data: Any) -> dict[str, Any]:
        """Run transforms with given parameters. Available only for Compose with `return_params=True`."""
        if self._transforms_dict is None:
            raise RuntimeError("`run_with_params` is not available for Compose with `return_params=False`.")

        self.preprocess(data)

        for tr_id, param in params.items():
            tr = self._transforms_dict[tr_id]
            data = tr.apply_with_params(param, **data)
            data = self.check_data_post_transform(data)

        return self.postprocess(data)

    def preprocess(self, data: Any) -> None:
        if self.strict:
            for data_name in data:
                if data_name not in self._available_keys and data_name not in MASK_KEYS and data_name not in IMAGE_KEYS:
                    msg = f"Key {data_name} is not in available keys."
                    raise ValueError(msg)
        if self.is_check_args:
            self._check_args(**data)
        if self.main_compose:
            for p in self.processors.values():
                p.ensure_data_valid(data)
            for p in self.processors.values():
                p.preprocess(data)

    def postprocess(self, data: dict[str, Any]) -> dict[str, Any]:
        if self.main_compose:
            for p in self.processors.values():
                p.postprocess(data)
        return data

    def to_dict_private(self) -> dict[str, Any]:
        dictionary = super().to_dict_private()
        bbox_processor = self.processors.get("bboxes")
        keypoints_processor = self.processors.get("keypoints")
        dictionary.update(
            {
                "bbox_params": bbox_processor.params.to_dict_private() if bbox_processor else None,
                "keypoint_params": (keypoints_processor.params.to_dict_private() if keypoints_processor else None),
                "additional_targets": self.additional_targets,
                "is_check_shapes": self.is_check_shapes,
            },
        )
        return dictionary

    def get_dict_with_id(self) -> dict[str, Any]:
        dictionary = super().get_dict_with_id()
        bbox_processor = self.processors.get("bboxes")
        keypoints_processor = self.processors.get("keypoints")
        dictionary.update(
            {
                "bbox_params": bbox_processor.params.to_dict_private() if bbox_processor else None,
                "keypoint_params": (keypoints_processor.params.to_dict_private() if keypoints_processor else None),
                "additional_targets": self.additional_targets,
                "params": None,
                "is_check_shapes": self.is_check_shapes,
            },
        )
        return dictionary

    def _check_args(self, **kwargs: Any) -> None:
        shapes = []

        for data_name, data in kwargs.items():
            internal_data_name = self._additional_targets.get(data_name, data_name)
            if internal_data_name in CHECKED_SINGLE:
                if not isinstance(data, np.ndarray):
                    raise TypeError(f"{data_name} must be numpy array type")
                shapes.append(data.shape[:2])
            if internal_data_name in CHECKED_MULTI and data is not None and len(data):
                if not isinstance(data, Sequence) or not isinstance(data[0], np.ndarray):
                    raise TypeError(f"{data_name} must be list of numpy arrays")
                shapes.append(data[0].shape[:2])
            if internal_data_name in CHECK_BBOX_PARAM and self.processors.get("bboxes") is None:
                msg = "bbox_params must be specified for bbox transformations"
                raise ValueError(msg)

            if internal_data_name in CHECK_KEYPOINTS_PARAM and self.processors.get("keypoints") is None:
                msg = "keypoints_params must be specified for keypoint transformations"
                raise ValueError(msg)

        if self.is_check_shapes and shapes and shapes.count(shapes[0]) != len(shapes):
            msg = (
                "Height and Width of image, mask or masks should be equal. You can disable shapes check "
                "by setting a parameter is_check_shapes=False of Compose class (do it only if you are sure "
                "about your data consistency)."
            )
            raise ValueError(msg)
run_with_params (self, *, params, **data)

Run transforms with given parameters. Available only for Compose with return_params=True.

Source code in albumentations/core/composition.py
Python
def run_with_params(self, *, params: dict[int, dict[str, Any]], **data: Any) -> dict[str, Any]:
    """Run transforms with given parameters. Available only for Compose with `return_params=True`."""
    if self._transforms_dict is None:
        raise RuntimeError("`run_with_params` is not available for Compose with `return_params=False`.")

    self.preprocess(data)

    for tr_id, param in params.items():
        tr = self._transforms_dict[tr_id]
        data = tr.apply_with_params(param, **data)
        data = self.check_data_post_transform(data)

    return self.postprocess(data)

class OneOf (transforms, p=0.5) [view source on GitHub]

Select one of the transforms to apply. The selected transform will be called with force_apply=True. Transform probabilities will be normalized to sum to 1, so in this case they work as weights.

Parameters:

Name Type Description
transforms list

list of transformations to compose.

p float

probability of applying selected transform. Default: 0.5.
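
A minimal sketch: inside OneOf the inner p values act as weights, so Blur below is picked twice as often as GaussNoise; the OneOf block itself fires with probability 0.9 (transform choices are illustrative):

Python
import albumentations as A

transform = A.Compose([
    A.OneOf([
        A.Blur(p=1.0),        # weight 1.0 -> chosen ~2/3 of the time
        A.GaussNoise(p=0.5),  # weight 0.5 -> chosen ~1/3 of the time
    ], p=0.9),
])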


Source code in albumentations/core/composition.py
Python
class OneOf(BaseCompose):
    """Select one of transforms to apply. Selected transform will be called with `force_apply=True`.
    Transforms probabilities will be normalized to one 1, so in this case transforms probabilities works as weights.

    Args:
        transforms (list): list of transformations to compose.
        p (float): probability of applying selected transform. Default: 0.5.

    """

    def __init__(self, transforms: TransformsSeqType, p: float = 0.5):
        super().__init__(transforms, p)
        transforms_ps = [t.p for t in self.transforms]
        s = sum(transforms_ps)
        self.transforms_ps = [t / s for t in transforms_ps]

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
        if self.replay_mode:
            for t in self.transforms:
                data = t(**data)
            return data

        if self.transforms_ps and (force_apply or random.random() < self.p):
            idx: int = random_utils.choice(len(self.transforms), p=self.transforms_ps)
            t = self.transforms[idx]
            data = t(force_apply=True, **data)
        return data

class OneOrOther (first=None, second=None, transforms=None, p=0.5) [view source on GitHub]

Select one of two transforms to apply. The selected transform will be called with force_apply=True.


Source code in albumentations/core/composition.py
Python
class OneOrOther(BaseCompose):
    """Select one or another transform to apply. Selected transform will be called with `force_apply=True`."""

    def __init__(
        self,
        first: TransformType | None = None,
        second: TransformType | None = None,
        transforms: TransformsSeqType | None = None,
        p: float = 0.5,
    ):
        if transforms is None:
            if first is None or second is None:
                msg = "You must set both first and second or set transforms argument."
                raise ValueError(msg)
            transforms = [first, second]
        super().__init__(transforms, p)
        if len(self.transforms) != NUM_ONEOF_TRANSFORMS:
            warnings.warn("Length of transforms is not equal to 2.", stacklevel=2)

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
        if self.replay_mode:
            for t in self.transforms:
                data = t(**data)
            return data

        if random.random() < self.p:
            return self.transforms[0](force_apply=True, **data)

        return self.transforms[-1](force_apply=True, **data)

class SelectiveChannelTransform (transforms, channels=(0, 1, 2), p=1.0) [view source on GitHub]

A transformation class to apply specified transforms to selected channels of an image.

This class extends BaseCompose to allow selective application of transformations to specified image channels. It extracts the selected channels, applies the transformations, and then reinserts the transformed channels back into their original positions in the image.

Parameters:

Name Type Description
transforms TransformsSeqType

A sequence of transformations (from Albumentations) to be applied to the specified channels.

channels Sequence[int]

A sequence of integers specifying the indices of the channels to which the transforms should be applied.

p float

Probability that the transform will be applied; the default is 1.0 (always apply).

Methods

__call__(*args, **kwargs): Applies the transforms to the image according to the specified channels. The input data should include an 'image' key with the image array.

Returns:

Type Description
dict[str, Any]

The transformed data dictionary, which includes the transformed 'image' key.
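
A minimal sketch, assuming SelectiveChannelTransform is exported at the package top level like the other composition helpers; the transform and channel choice are illustrative:

Python
import albumentations as A

transform = A.Compose([
    A.SelectiveChannelTransform(
        [A.RandomBrightnessContrast(p=1.0)],
        channels=[0, 2],  # touch only the first and third channels of an RGB image
        p=1.0,
    ),
])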


Source code in albumentations/core/composition.py
Python
class SelectiveChannelTransform(BaseCompose):
    """A transformation class to apply specified transforms to selected channels of an image.

    This class extends BaseCompose to allow selective application of transformations to
    specified image channels. It extracts the selected channels, applies the transformations,
    and then reinserts the transformed channels back into their original positions in the image.

    Parameters:
        transforms (TransformsSeqType):
            A sequence of transformations (from Albumentations) to be applied to the specified channels.
        channels (Sequence[int]):
            A sequence of integers specifying the indices of the channels to which the transforms should be applied.
        p (float):
            Probability that the transform will be applied; the default is 1.0 (always apply).

    Methods:
        __call__(*args, **kwargs):
            Applies the transforms to the image according to the specified channels.
            The input data should include 'image' key with the image array.

    Returns:
        dict[str, Any]: The transformed data dictionary, which includes the transformed 'image' key.
    """

    def __init__(
        self,
        transforms: TransformsSeqType,
        channels: Sequence[int] = (0, 1, 2),
        p: float = 1.0,
    ) -> None:
        super().__init__(transforms, p)
        self.channels = channels

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
        if force_apply or random.random() < self.p:
            image = data["image"]

            selected_channels = image[:, :, self.channels]
            sub_image = np.ascontiguousarray(selected_channels)

            for t in self.transforms:
                sub_image = t(image=sub_image)["image"]

            transformed_channels = cv2.split(sub_image)
            output_img = image.copy()

            for idx, channel in zip(self.channels, transformed_channels):
                output_img[:, :, idx] = channel

            data["image"] = np.ascontiguousarray(output_img)

        return data

class Sequential (transforms, p=0.5) [view source on GitHub]

Sequentially applies all transforms to targets.

Note

This transform is not intended to be a replacement for Compose. Instead, it should be used inside Compose the same way OneOf or OneOrOther are used. For instance, you can combine OneOf with Sequential to create an augmentation pipeline that contains multiple sequences of augmentations and applies one randomly chosen sequence to input data (see the Example section for an example definition of such a pipeline).

Examples:

Python
>>> import albumentations as A
>>> transform = A.Compose([
>>>    A.OneOf([
>>>        A.Sequential([
>>>            A.HorizontalFlip(p=0.5),
>>>            A.ShiftScaleRotate(p=0.5),
>>>        ]),
>>>        A.Sequential([
>>>            A.VerticalFlip(p=0.5),
>>>            A.RandomBrightnessContrast(p=0.5),
>>>        ]),
>>>    ], p=1)
>>> ])


Source code in albumentations/core/composition.py
Python
class Sequential(BaseCompose):
    """Sequentially applies all transforms to targets.

    Note:
        This transform is not intended to be a replacement for `Compose`. Instead, it should be used inside `Compose`
        the same way `OneOf` or `OneOrOther` are used. For instance, you can combine `OneOf` with `Sequential` to
        create an augmentation pipeline that contains multiple sequences of augmentations and applies one randomly
        chose sequence to input data (see the `Example` section for an example definition of such pipeline).

    Example:
        >>> import albumentations as A
        >>> transform = A.Compose([
        >>>    A.OneOf([
        >>>        A.Sequential([
        >>>            A.HorizontalFlip(p=0.5),
        >>>            A.ShiftScaleRotate(p=0.5),
        >>>        ]),
        >>>        A.Sequential([
        >>>            A.VerticalFlip(p=0.5),
        >>>            A.RandomBrightnessContrast(p=0.5),
        >>>        ]),
        >>>    ], p=1)
        >>> ])

    """

    def __init__(self, transforms: TransformsSeqType, p: float = 0.5):
        super().__init__(transforms, p)

    def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
        if self.replay_mode or force_apply or random.random() < self.p:
            for t in self.transforms:
                data = t(**data)
                data = self.check_data_post_transform(data)
        return data

class SomeOf (transforms, n, replace=True, p=1) [view source on GitHub]

Select N transforms to apply. The selected transforms will be called with force_apply=True. Transform probabilities will be normalized to sum to 1, so in this case they work as weights.

Parameters:

Name Type Description
transforms list

list of transformations to compose.

n int

number of transforms to apply.

replace bool

Whether transforms are sampled with or without replacement. Default: True.

p float

Probability of applying the selected transforms. Default: 1.
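
A minimal sketch: pick two of the three transforms on every call; replace=False avoids sampling the same transform twice, and the inner p values act as sampling weights (transform choices are illustrative):

Python
import albumentations as A

transform = A.Compose([
    A.SomeOf([
        A.HorizontalFlip(p=1.0),
        A.RandomBrightnessContrast(p=1.0),
        A.GaussNoise(p=0.5),
    ], n=2, replace=False, p=1.0),
])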


Source code in albumentations/core/composition.py
Python
class SomeOf(BaseCompose):
    """Select N transforms to apply. Selected transforms will be called with `force_apply=True`.
    Transforms probabilities will be normalized to one 1, so in this case transforms probabilities works as weights.

    Args:
        transforms (list): list of transformations to compose.
        n (int): number of transforms to apply.
        replace (bool): Whether the sampled transforms are with or without replacement. Default: True.
        p (float): probability of applying selected transform. Default: 1.

    """

    def __init__(self, transforms: TransformsSeqType, n: int, replace: bool = True, p: float = 1):
        super().__init__(transforms, p)
        self.n = n
        self.replace = replace
        transforms_ps = [t.p for t in self.transforms]
        s = sum(transforms_ps)
        self.transforms_ps = [t / s for t in transforms_ps]

    def __call__(self, *arg: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
        if self.replay_mode:
            for t in self.transforms:
                data = t(**data)
                data = self.check_data_post_transform(data)
            return data

        if self.transforms_ps and (force_apply or random.random() < self.p):
            idx = random_utils.choice(len(self.transforms), size=self.n, replace=self.replace, p=self.transforms_ps)
            for i in idx:
                t = self.transforms[i]
                data = t(force_apply=True, **data)
                data = self.check_data_post_transform(data)
        return data

    def to_dict_private(self) -> dict[str, Any]:
        dictionary = super().to_dict_private()
        dictionary.update({"n": self.n, "replace": self.replace})
        return dictionary
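
A minimal usage sketch (the transforms inside SomeOf and their probabilities are illustrative): exactly two of the three listed transforms are applied on each call, and the per-transform p values act as sampling weights after normalization.

Python
import albumentations as A
import numpy as np

# Pick exactly two of the three transforms on each call; per-transform `p`
# values are normalized and used as sampling weights.
transform = A.Compose([
    A.SomeOf([
        A.HorizontalFlip(p=1.0),
        A.RandomBrightnessContrast(p=0.5),
        A.GaussianBlur(p=0.5),
    ], n=2, replace=False, p=1.0),
])

image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]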

hub_mixin

This module provides mixin functionality for the Albumentations library. It includes utility functions and classes to enhance the core capabilities.

class HubMixin [view source on GitHub]

Source code in albumentations/core/hub_mixin.py
Python
class HubMixin:
    _CONFIG_KEYS = ("train", "eval")
    _CONFIG_FILE_NAME_TEMPLATE = "albumentations_config_{}.json"

    def _save_pretrained(self, save_directory: str | Path, filename: str) -> Path:
        """Save the transform to a specified directory.

        Args:
            save_directory (Union[str, Path]):
                Directory where the transform will be saved.
            filename (str):
                Name of the file to save the transform.

        Returns:
            Path: Path to the saved transform file.
        """
        # create save directory and path
        save_directory = Path(save_directory)
        save_directory.mkdir(parents=True, exist_ok=True)
        save_path = save_directory / filename

        # save transforms
        save_transform(self, save_path, data_format="json")  # type: ignore[arg-type]

        return save_path

    @classmethod
    def _from_pretrained(cls, save_directory: str | Path, filename: str) -> object:
        """Load a transform from a specified directory.

        Args:
            save_directory (Union[str, Path]):
                Directory from where the transform will be loaded.
            filename (str):
                Name of the file to load the transform from.

        Returns:
            A.Compose: Loaded transform.
        """
        save_path = Path(save_directory) / filename
        return load_transform(save_path, data_format="json")

    def save_pretrained(
        self,
        save_directory: str | Path,
        *,
        key: str = "eval",
        allow_custom_keys: bool = False,
        repo_id: str | None = None,
        push_to_hub: bool = False,
        **push_to_hub_kwargs: Any,
    ) -> str | None:
        """Save the transform and optionally push it to the Huggingface Hub.

        Args:
            save_directory (`str` or `Path`):
                Path to directory in which the transform configuration will be saved.
            key (`str`, *optional*):
                Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
            allow_custom_keys (`bool`, *optional*):
                Allow custom keys for the configuration. Defaults to False.
            push_to_hub (`bool`, *optional*, defaults to `False`):
                Whether or not to push your transform to the Huggingface Hub after saving it.
            repo_id (`str`, *optional*):
                ID of your repository on the Hub. Used only if `push_to_hub=True`. Will default to the folder name if
                not provided.
            push_to_hub_kwargs:
                Additional key word arguments passed along to the [`push_to_hub`] method.

        Returns:
            `str` or `None`: url of the commit on the Hub if `push_to_hub=True`, `None` otherwise.
        """
        if not allow_custom_keys and key not in self._CONFIG_KEYS:
            raise ValueError(
                f"Invalid key: `{key}`. Please use key from {self._CONFIG_KEYS} keys for upload. "
                "If you want to use a custom key, set `allow_custom_keys=True`.",
            )

        # save model transforms
        filename = self._CONFIG_FILE_NAME_TEMPLATE.format(key)
        self._save_pretrained(save_directory, filename)

        # push to the Hub if required
        if push_to_hub:
            kwargs = push_to_hub_kwargs.copy()  # soft-copy to avoid mutating input
            if repo_id is None:
                repo_id = Path(save_directory).name  # Defaults to `save_directory` name
            return self.push_to_hub(repo_id=repo_id, key=key, **kwargs)
        return None

    @classmethod
    def from_pretrained(
        cls: Any,
        directory_or_repo_id: str | Path,
        *,
        key: str = "eval",
        force_download: bool = False,
        proxies: dict[str, str] | None = None,
        token: str | bool | None = None,
        cache_dir: str | Path | None = None,
        local_files_only: bool = False,
        revision: str | None = None,
    ) -> object:
        """Load a transform from the Huggingface Hub or a local directory.

        Args:
            directory_or_repo_id (`str`, `Path`):
                - Either the `repo_id` (string) of a repo with hosted transform on the Hub, e.g. `qubvel-hf/albu`.
                - Or a path to a `directory` containing transform config saved using
                    [`~albumentations.Compose.save_pretrained`], e.g., `../path/to/my_directory/`.
            key (`str`, *optional*):
                Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
            revision (`str`, *optional*):
                Revision of the repo on the Hub. Can be a branch name, a git tag or any commit id.
                Defaults to the latest commit on `main` branch.
            force_download (`bool`, *optional*, defaults to `False`):
                Whether to force (re-)downloading the transform configuration files from the Hub, overriding
                the existing cache.
            proxies (`dict[str, str]`, *optional*):
                A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
                'http://hostname': 'foo.bar:4012'}`. The proxies are used on every request.
            token (`str` or `bool`, *optional*):
                The token to use as HTTP bearer authorization for remote files. By default, it will use the token
                cached when running `huggingface-cli login`.
            cache_dir (`str`, `Path`, *optional*):
                Path to the folder where cached files are stored.
            local_files_only (`bool`, *optional*, defaults to `False`):
                If `True`, avoid downloading the file and return the path to the local cached file if it exists.
        """
        filename = cls._CONFIG_FILE_NAME_TEMPLATE.format(key)
        directory_or_repo_id = Path(directory_or_repo_id)
        transform = None

        # check if the file is already present locally
        if directory_or_repo_id.is_dir():
            if filename in os.listdir(directory_or_repo_id):
                transform = cls._from_pretrained(save_directory=directory_or_repo_id, filename=filename)
            elif is_huggingface_hub_available:
                logging.info(
                    f"{filename} not found in {Path(directory_or_repo_id).resolve()}, trying to load from the Hub.",
                )
            else:
                raise FileNotFoundError(
                    f"{filename} not found in {Path(directory_or_repo_id).resolve()}."
                    " Please install `huggingface_hub` to load from the Hub.",
                )
        if transform is not None:
            return transform

        # download the file from the Hub
        try:
            config_file = hf_hub_download(
                repo_id=directory_or_repo_id,
                filename=filename,
                revision=revision,
                cache_dir=cache_dir,
                force_download=force_download,
                proxies=proxies,
                token=token,
                local_files_only=local_files_only,
            )
            directory, filename = Path(config_file).parent, Path(config_file).name
            return cls._from_pretrained(save_directory=directory, filename=filename)

        except HfHubHTTPError as e:
            raise HfHubHTTPError(f"{filename} not found on the HuggingFace Hub") from e

    @require_huggingface_hub
    def push_to_hub(
        self,
        repo_id: str,
        *,
        key: str = "eval",
        allow_custom_keys: bool = False,
        commit_message: str = "Push transform using huggingface_hub.",
        private: bool = False,
        token: str | None = None,
        branch: str | None = None,
        create_pr: bool | None = None,
    ) -> str:
        """Push the transform to the Huggingface Hub.

        Use `allow_patterns` and `ignore_patterns` to precisely filter which files should be pushed to the hub. Use
        `delete_patterns` to delete existing remote files in the same commit. See [`upload_folder`] reference for more
        details.

        Args:
            repo_id (`str`):
                ID of the repository to push to (example: `"username/my-model"`).
            key (`str`, *optional*):
                Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
            allow_custom_keys (`bool`, *optional*):
                Allow custom keys for the configuration. Defaults to False.
            commit_message (`str`, *optional*):
                Message to commit while pushing.
            private (`bool`, *optional*, defaults to `False`):
                Whether the repository created should be private.
            token (`str`, *optional*):
                The token to use as HTTP bearer authorization for remote files. By default, it will use the token
                cached when running `huggingface-cli login`.
            branch (`str`, *optional*):
                The git branch on which to push the transform. This defaults to `"main"`.
            create_pr (`boolean`, *optional*):
                Whether or not to create a Pull Request from `branch` with that commit. Defaults to `False`.

        Returns:
            The url of the commit of your transform in the given repository.
        """
        if not allow_custom_keys and key not in self._CONFIG_KEYS:
            raise ValueError(
                f"Invalid key: `{key}`. Please use key from {self._CONFIG_KEYS} keys for upload. "
                "If you still want to use a custom key, set `allow_custom_keys=True`.",
            )

        api = HfApi(token=token)
        repo_id = api.create_repo(repo_id=repo_id, private=private, exist_ok=True).repo_id

        # Push the files to the repo in a single commit
        with SoftTemporaryDirectory() as tmp:
            save_directory = Path(tmp) / repo_id
            filename = self._CONFIG_FILE_NAME_TEMPLATE.format(key)
            save_path = self._save_pretrained(save_directory, filename=filename)
            return api.upload_file(
                path_or_fileobj=save_path,
                path_in_repo=filename,
                repo_id=repo_id,
                commit_message=commit_message,
                revision=branch,
                create_pr=create_pr,
            )
from_pretrained (directory_or_repo_id, *, key='eval', force_download=False, proxies=None, token=None, cache_dir=None, local_files_only=False, revision=None) classmethod

Load a transform from the Huggingface Hub or a local directory.

Parameters:

Name Type Description
directory_or_repo_id `str`, `Path`
  • Either the repo_id (string) of a repo with hosted transform on the Hub, e.g. qubvel-hf/albu.
  • Or a path to a directory containing transform config saved using [~albumentations.Compose.save_pretrained], e.g., ../path/to/my_directory/.
key `str`, *optional*

Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".

revision `str`, *optional*

Revision of the repo on the Hub. Can be a branch name, a git tag or any commit id. Defaults to the latest commit on main branch.

force_download `bool`, *optional*, defaults to `False`

Whether to force (re-)downloading the transform configuration files from the Hub, overriding the existing cache.

proxies `dict[str, str]`, *optional*

A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on every request.

token `str` or `bool`, *optional*

The token to use as HTTP bearer authorization for remote files. By default, it will use the token cached when running huggingface-cli login.

cache_dir `str`, `Path`, *optional*

Path to the folder where cached files are stored.

local_files_only `bool`, *optional*, defaults to `False`

If True, avoid downloading the file and return the path to the local cached file if it exists.

Source code in albumentations/core/hub_mixin.py
Python
@classmethod
def from_pretrained(
    cls: Any,
    directory_or_repo_id: str | Path,
    *,
    key: str = "eval",
    force_download: bool = False,
    proxies: dict[str, str] | None = None,
    token: str | bool | None = None,
    cache_dir: str | Path | None = None,
    local_files_only: bool = False,
    revision: str | None = None,
) -> object:
    """Load a transform from the Huggingface Hub or a local directory.

    Args:
        directory_or_repo_id (`str`, `Path`):
            - Either the `repo_id` (string) of a repo with hosted transform on the Hub, e.g. `qubvel-hf/albu`.
            - Or a path to a `directory` containing transform config saved using
                [`~albumentations.Compose.save_pretrained`], e.g., `../path/to/my_directory/`.
        key (`str`, *optional*):
            Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
        revision (`str`, *optional*):
            Revision of the repo on the Hub. Can be a branch name, a git tag or any commit id.
            Defaults to the latest commit on `main` branch.
        force_download (`bool`, *optional*, defaults to `False`):
            Whether to force (re-)downloading the transform configuration files from the Hub, overriding
            the existing cache.
        proxies (`dict[str, str]`, *optional*):
            A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
            'http://hostname': 'foo.bar:4012'}`. The proxies are used on every request.
        token (`str` or `bool`, *optional*):
            The token to use as HTTP bearer authorization for remote files. By default, it will use the token
            cached when running `huggingface-cli login`.
        cache_dir (`str`, `Path`, *optional*):
            Path to the folder where cached files are stored.
        local_files_only (`bool`, *optional*, defaults to `False`):
            If `True`, avoid downloading the file and return the path to the local cached file if it exists.
    """
    filename = cls._CONFIG_FILE_NAME_TEMPLATE.format(key)
    directory_or_repo_id = Path(directory_or_repo_id)
    transform = None

    # check if the file is already present locally
    if directory_or_repo_id.is_dir():
        if filename in os.listdir(directory_or_repo_id):
            transform = cls._from_pretrained(save_directory=directory_or_repo_id, filename=filename)
        elif is_huggingface_hub_available:
            logging.info(
                f"{filename} not found in {Path(directory_or_repo_id).resolve()}, trying to load from the Hub.",
            )
        else:
            raise FileNotFoundError(
                f"{filename} not found in {Path(directory_or_repo_id).resolve()}."
                " Please install `huggingface_hub` to load from the Hub.",
            )
    if transform is not None:
        return transform

    # download the file from the Hub
    try:
        config_file = hf_hub_download(
            repo_id=directory_or_repo_id,
            filename=filename,
            revision=revision,
            cache_dir=cache_dir,
            force_download=force_download,
            proxies=proxies,
            token=token,
            local_files_only=local_files_only,
        )
        directory, filename = Path(config_file).parent, Path(config_file).name
        return cls._from_pretrained(save_directory=directory, filename=filename)

    except HfHubHTTPError as e:
        raise HfHubHTTPError(f"{filename} not found on the HuggingFace Hub") from e
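
A brief loading sketch, assuming a configuration was previously saved with save_pretrained under the default "eval" key (the local path is illustrative; the Hub repo id is the one used in the docstring example, and loading from the Hub requires huggingface_hub).

Python
import albumentations as A

# Load from a local directory that contains albumentations_config_eval.json.
transform = A.Compose.from_pretrained("path/to/my_directory", key="eval")

# Or load the same kind of configuration from a Huggingface Hub repository.
transform = A.Compose.from_pretrained("qubvel-hf/albu", key="eval")
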
push_to_hub (self, repo_id, *, key='eval', allow_custom_keys=False, commit_message='Push transform using huggingface_hub.', private=False, token=None, branch=None, create_pr=None)

Push the transform to the Huggingface Hub.

Use allow_patterns and ignore_patterns to precisely filter which files should be pushed to the hub. Use delete_patterns to delete existing remote files in the same commit. See [upload_folder] reference for more details.

Parameters:

Name Type Description
repo_id `str`

ID of the repository to push to (example: "username/my-model").

key `str`, *optional*

Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".

allow_custom_keys `bool`, *optional*

Allow custom keys for the configuration. Defaults to False.

commit_message `str`, *optional*

Message to commit while pushing.

private `bool`, *optional*, defaults to `False`

Whether the repository created should be private.

token `str`, *optional*

The token to use as HTTP bearer authorization for remote files. By default, it will use the token cached when running huggingface-cli login.

branch `str`, *optional*

The git branch on which to push the transform. This defaults to "main".

create_pr `boolean`, *optional*

Whether or not to create a Pull Request from branch with that commit. Defaults to False.

Returns:

Type Description
str

The url of the commit of your transform in the given repository.

Source code in albumentations/core/hub_mixin.py
Python
@require_huggingface_hub
def push_to_hub(
    self,
    repo_id: str,
    *,
    key: str = "eval",
    allow_custom_keys: bool = False,
    commit_message: str = "Push transform using huggingface_hub.",
    private: bool = False,
    token: str | None = None,
    branch: str | None = None,
    create_pr: bool | None = None,
) -> str:
    """Push the transform to the Huggingface Hub.

    Use `allow_patterns` and `ignore_patterns` to precisely filter which files should be pushed to the hub. Use
    `delete_patterns` to delete existing remote files in the same commit. See [`upload_folder`] reference for more
    details.

    Args:
        repo_id (`str`):
            ID of the repository to push to (example: `"username/my-model"`).
        key (`str`, *optional*):
            Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
        allow_custom_keys (`bool`, *optional*):
            Allow custom keys for the configuration. Defaults to False.
        commit_message (`str`, *optional*):
            Message to commit while pushing.
        private (`bool`, *optional*, defaults to `False`):
            Whether the repository created should be private.
        token (`str`, *optional*):
            The token to use as HTTP bearer authorization for remote files. By default, it will use the token
            cached when running `huggingface-cli login`.
        branch (`str`, *optional*):
            The git branch on which to push the transform. This defaults to `"main"`.
        create_pr (`boolean`, *optional*):
            Whether or not to create a Pull Request from `branch` with that commit. Defaults to `False`.

    Returns:
        The url of the commit of your transform in the given repository.
    """
    if not allow_custom_keys and key not in self._CONFIG_KEYS:
        raise ValueError(
            f"Invalid key: `{key}`. Please use key from {self._CONFIG_KEYS} keys for upload. "
            "If you still want to use a custom key, set `allow_custom_keys=True`.",
        )

    api = HfApi(token=token)
    repo_id = api.create_repo(repo_id=repo_id, private=private, exist_ok=True).repo_id

    # Push the files to the repo in a single commit
    with SoftTemporaryDirectory() as tmp:
        save_directory = Path(tmp) / repo_id
        filename = self._CONFIG_FILE_NAME_TEMPLATE.format(key)
        save_path = self._save_pretrained(save_directory, filename=filename)
        return api.upload_file(
            path_or_fileobj=save_path,
            path_in_repo=filename,
            repo_id=repo_id,
            commit_message=commit_message,
            revision=branch,
            create_pr=create_pr,
        )
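
A short sketch of pushing a pipeline to the Hub, assuming huggingface_hub is installed and a token is cached via huggingface-cli login (the repo id and transforms are illustrative).

Python
import albumentations as A

transform = A.Compose([
    A.RandomCrop(height=256, width=256),
    A.HorizontalFlip(p=0.5),
])

# Uploads albumentations_config_train.json to the repository and returns the commit URL.
commit_url = transform.push_to_hub("username/my-augmentation-pipeline", key="train")
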
save_pretrained (self, save_directory, *, key='eval', allow_custom_keys=False, repo_id=None, push_to_hub=False, **push_to_hub_kwargs)

Save the transform and optionally push it to the Huggingface Hub.

Parameters:

Name Type Description
save_directory `str` or `Path`

Path to directory in which the transform configuration will be saved.

key `str`, *optional*

Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".

allow_custom_keys `bool`, *optional*

Allow custom keys for the configuration. Defaults to False.

push_to_hub `bool`, *optional*, defaults to `False`

Whether or not to push your transform to the Huggingface Hub after saving it.

repo_id `str`, *optional*

ID of your repository on the Hub. Used only if push_to_hub=True. Will default to the folder name if not provided.

push_to_hub_kwargs Any

Additional key word arguments passed along to the [push_to_hub] method.

Returns:

Type Description
`str` or `None`

url of the commit on the Hub if push_to_hub=True, None otherwise.

Source code in albumentations/core/hub_mixin.py
Python
def save_pretrained(
    self,
    save_directory: str | Path,
    *,
    key: str = "eval",
    allow_custom_keys: bool = False,
    repo_id: str | None = None,
    push_to_hub: bool = False,
    **push_to_hub_kwargs: Any,
) -> str | None:
    """Save the transform and optionally push it to the Huggingface Hub.

    Args:
        save_directory (`str` or `Path`):
            Path to directory in which the transform configuration will be saved.
        key (`str`, *optional*):
            Key to identify the configuration type, one of ["train", "eval"]. Defaults to "eval".
        allow_custom_keys (`bool`, *optional*):
            Allow custom keys for the configuration. Defaults to False.
        push_to_hub (`bool`, *optional*, defaults to `False`):
            Whether or not to push your transform to the Huggingface Hub after saving it.
        repo_id (`str`, *optional*):
            ID of your repository on the Hub. Used only if `push_to_hub=True`. Will default to the folder name if
            not provided.
        push_to_hub_kwargs:
            Additional key word arguments passed along to the [`push_to_hub`] method.

    Returns:
        `str` or `None`: url of the commit on the Hub if `push_to_hub=True`, `None` otherwise.
    """
    if not allow_custom_keys and key not in self._CONFIG_KEYS:
        raise ValueError(
            f"Invalid key: `{key}`. Please use key from {self._CONFIG_KEYS} keys for upload. "
            "If you want to use a custom key, set `allow_custom_keys=True`.",
        )

    # save model transforms
    filename = self._CONFIG_FILE_NAME_TEMPLATE.format(key)
    self._save_pretrained(save_directory, filename)

    # push to the Hub if required
    if push_to_hub:
        kwargs = push_to_hub_kwargs.copy()  # soft-copy to avoid mutating input
        if repo_id is None:
            repo_id = Path(save_directory).name  # Defaults to `save_directory` name
        return self.push_to_hub(repo_id=repo_id, key=key, **kwargs)
    return None
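
A minimal sketch of saving a pipeline configuration locally and, optionally, pushing it to the Hub in the same call (the directory name, repo id, and transforms are illustrative).

Python
import albumentations as A

transform = A.Compose([
    A.Resize(height=256, width=256),
    A.Normalize(),
])

# Writes albumentations_config_eval.json into the given directory.
transform.save_pretrained("my_pipeline", key="eval")

# Save and push in one step (requires huggingface_hub).
transform.save_pretrained("my_pipeline", key="eval", push_to_hub=True,
                          repo_id="username/my_pipeline")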

keypoints_utils

class KeypointParams (format, label_fields=None, remove_invisible=True, angle_in_degrees=True, check_each_transform=True) [view source on GitHub]

Parameters of keypoints

Parameters:

Name Type Description
format str

format of keypoints. Should be 'xy', 'yx', 'xya', 'xys', 'xyas', 'xysa'.

x - X coordinate,

y - Y coordinate

s - Keypoint scale

a - Keypoint orientation in radians or degrees (depending on KeypointParams.angle_in_degrees)

label_fields list

list of fields that are joined with keypoints, e.g labels. Should be same type as keypoints.

remove_invisible bool

to remove invisible points after transform or not

angle_in_degrees bool

angle in degrees or radians in 'xya', 'xyas', 'xysa' keypoints

check_each_transform bool

if True, then keypoints will be checked after each dual transform. Default: True

Source code in albumentations/core/keypoints_utils.py
Python
class KeypointParams(Params):
    """Parameters of keypoints

    Args:
        format (str): format of keypoints. Should be 'xy', 'yx', 'xya', 'xys', 'xyas', 'xysa'.

            x - X coordinate,

            y - Y coordinate

            s - Keypoint scale

            a - Keypoint orientation in radians or degrees (depending on KeypointParams.angle_in_degrees)
        label_fields (list): list of fields that are joined with keypoints, e.g labels.
            Should be same type as keypoints.
        remove_invisible (bool): to remove invisible points after transform or not
        angle_in_degrees (bool): angle in degrees or radians in 'xya', 'xyas', 'xysa' keypoints
        check_each_transform (bool): if `True`, then keypoints will be checked after each dual transform.
            Default: `True`

    """

    def __init__(
        self,
        format: str,  # noqa: A002
        label_fields: Sequence[str] | None = None,
        remove_invisible: bool = True,
        angle_in_degrees: bool = True,
        check_each_transform: bool = True,
    ):
        super().__init__(format, label_fields)
        self.remove_invisible = remove_invisible
        self.angle_in_degrees = angle_in_degrees
        self.check_each_transform = check_each_transform

    def to_dict_private(self) -> dict[str, Any]:
        data = super().to_dict_private()
        data.update(
            {
                "remove_invisible": self.remove_invisible,
                "angle_in_degrees": self.angle_in_degrees,
                "check_each_transform": self.check_each_transform,
            },
        )
        return data

    @classmethod
    def is_serializable(cls) -> bool:
        return True

    @classmethod
    def get_class_fullname(cls) -> str:
        return "KeypointParams"

def check_keypoints (keypoints, image_shape) [view source on GitHub]

Check if keypoint coordinates are within valid ranges for the given image shape.

This function validates that:

  1. All x-coordinates are within [0, width)
  2. All y-coordinates are within [0, height)
  3. If angles are present (i.e., keypoints have more than 2 columns), they are within the range [0, 2π)

Parameters:

Name Type Description
keypoints np.ndarray

Array of keypoints with shape (N, 2+), where N is the number of keypoints. Each row represents a keypoint with at least (x, y) coordinates. If present, the third column is assumed to be the angle.

image_shape Tuple[int, int]

The shape of the image (height, width).

Exceptions:

Type Description
ValueError

If any keypoint coordinate is outside the valid range, or if any angle is invalid. The error message will detail which keypoints are invalid and why.

Note

  • The function assumes that keypoint coordinates are in absolute pixel values, not normalized.
  • Angles, if present, are assumed to be in radians.
  • The constant PAIR should be defined elsewhere in the module, typically as 2.
Source code in albumentations/core/keypoints_utils.py
Python
def check_keypoints(keypoints: np.ndarray, image_shape: tuple[int, int]) -> None:
    """Check if keypoint coordinates are within valid ranges for the given image shape.

    This function validates that:
    1. All x-coordinates are within [0, width)
    2. All y-coordinates are within [0, height)
    3. If angles are present (i.e., keypoints have more than 2 columns),
       they are within the range [0, 2π)

    Args:
        keypoints (np.ndarray): Array of keypoints with shape (N, 2+), where N is the number of keypoints.
                                Each row represents a keypoint with at least (x, y) coordinates.
                                If present, the third column is assumed to be the angle.
        image_shape (Tuple[int, int]): The shape of the image (height, width).

    Raises:
        ValueError: If any keypoint coordinate is outside the valid range, or if any angle is invalid.
                    The error message will detail which keypoints are invalid and why.

    Note:
        - The function assumes that keypoint coordinates are in absolute pixel values, not normalized.
        - Angles, if present, are assumed to be in radians.
        - The constant PAIR should be defined elsewhere in the module, typically as 2.
    """
    height, width = image_shape[:2]

    # Check x and y coordinates
    x, y = keypoints[:, 0], keypoints[:, 1]
    if np.any((x < 0) | (x >= width)) or np.any((y < 0) | (y >= height)):
        invalid_x = np.where((x < 0) | (x >= width))[0]
        invalid_y = np.where((y < 0) | (y >= height))[0]

        error_messages = []

        error_messages = [
            f"Expected {'x' if idx in invalid_x else 'y'} for keypoint {keypoints[idx]} to be "
            f"in the range [0.0, {width if idx in invalid_x else height}], "
            f"got {x[idx] if idx in invalid_x else y[idx]}."
            for idx in sorted(set(invalid_x) | set(invalid_y))
        ]

        raise ValueError("\n".join(error_messages))

    # Check angles
    if keypoints.shape[1] > PAIR:
        angles = keypoints[:, 2]
        invalid_angles = np.where((angles < 0) | (angles >= 2 * math.pi))[0]
        if len(invalid_angles) > 0:
            error_messages = [
                f"Keypoint angle must be in range [0, 2 * PI). Got: {angles[idx]} for keypoint {keypoints[idx]}"
                for idx in invalid_angles
            ]
            raise ValueError("\n".join(error_messages))
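
A small sketch of the validation behaviour, assuming absolute pixel coordinates (the keypoint values are illustrative).

Python
import numpy as np
from albumentations.core.keypoints_utils import check_keypoints

# The second keypoint has x = 120, outside a 100x100 image.
keypoints = np.array([[10.0, 20.0], [120.0, 30.0]])
try:
    check_keypoints(keypoints, image_shape=(100, 100))
except ValueError as e:
    print(e)  # explains which keypoint is out of range and why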

def convert_keypoints_from_albumentations (keypoints, target_format, image_shape, check_validity=False, angle_in_degrees=True) [view source on GitHub]

Convert keypoints from Albumentations format to various other formats.

This function takes keypoints in the standard Albumentations format [x, y, angle, scale] and converts them to the specified target format.

Parameters:

Name Type Description
keypoints np.ndarray

Array of keypoints in Albumentations format with shape (N, 4+), where N is the number of keypoints. Each row represents a keypoint [x, y, angle, scale, ...].

target_format Literal["xy", "yx", "xya", "xys", "xyas", "xysa"]

The desired output format.
  • "xy": [x, y]
  • "yx": [y, x]
  • "xya": [x, y, angle]
  • "xys": [x, y, scale]
  • "xyas": [x, y, angle, scale]
  • "xysa": [x, y, scale, angle]

image_shape tuple[int, int]

The shape of the image (height, width).

check_validity bool

If True, check if the keypoints are within the image boundaries. Defaults to False.

angle_in_degrees bool

If True, convert output angles to degrees. If False, angles remain in radians. Defaults to True.

Returns:

Type Description
np.ndarray

Array of keypoints in the specified target format with shape (N, 2+). Any additional columns from the input keypoints beyond the first 4 are preserved and appended after the converted columns.

Exceptions:

Type Description
ValueError

If the target_format is not one of the supported formats.

Note

  • Input angles are assumed to be in the range [0, 2π) radians.
  • If the input keypoints have additional columns beyond the first 4, these columns are preserved in the output.
  • The constant NUM_KEYPOINTS_COLUMNS_IN_ALBUMENTATIONS should be defined elsewhere in the module, typically as 4.
Source code in albumentations/core/keypoints_utils.py
Python
def convert_keypoints_from_albumentations(
    keypoints: np.ndarray,
    target_format: Literal["xy", "yx", "xya", "xys", "xyas", "xysa"],
    image_shape: tuple[int, int],
    check_validity: bool = False,
    angle_in_degrees: bool = True,
) -> np.ndarray:
    """Convert keypoints from Albumentations format to various other formats.

    This function takes keypoints in the standard Albumentations format [x, y, angle, scale]
    and converts them to the specified target format.

    Args:
        keypoints (np.ndarray): Array of keypoints in Albumentations format with shape (N, 4+),
                                where N is the number of keypoints. Each row represents a keypoint
                                [x, y, angle, scale, ...].
        target_format (Literal["xy", "yx", "xya", "xys", "xyas", "xysa"]): The desired output format.
            - "xy": [x, y]
            - "yx": [y, x]
            - "xya": [x, y, angle]
            - "xys": [x, y, scale]
            - "xyas": [x, y, angle, scale]
            - "xysa": [x, y, scale, angle]
        image_shape (tuple[int, int]): The shape of the image (height, width).
        check_validity (bool, optional): If True, check if the keypoints are within the image boundaries.
                                         Defaults to False.
        angle_in_degrees (bool, optional): If True, convert output angles to degrees.
                                           If False, angles remain in radians.
                                           Defaults to True.

    Returns:
        np.ndarray: Array of keypoints in the specified target format with shape (N, 2+).
                    Any additional columns from the input keypoints beyond the first 4
                    are preserved and appended after the converted columns.

    Raises:
        ValueError: If the target_format is not one of the supported formats.

    Note:
        - Input angles are assumed to be in the range [0, 2π) radians.
        - If the input keypoints have additional columns beyond the first 4,
          these columns are preserved in the output.
        - The constant NUM_KEYPOINTS_COLUMNS_IN_ALBUMENTATIONS should be defined
          elsewhere in the module, typically as 4.
    """
    if target_format not in keypoint_formats:
        raise ValueError(f"Unknown target_format {target_format}. Supported formats are: {keypoint_formats}")

    x, y, angle, scale = keypoints[:, 0], keypoints[:, 1], keypoints[:, 2], keypoints[:, 3]
    angle = angle_to_2pi_range(angle)

    if check_validity:
        check_keypoints(np.column_stack((x, y, angle, scale)), image_shape)

    if angle_in_degrees:
        angle = np.degrees(angle)

    format_to_columns = {
        "xy": [x, y],
        "yx": [y, x],
        "xya": [x, y, angle],
        "xys": [x, y, scale],
        "xyas": [x, y, angle, scale],
        "xysa": [x, y, scale, angle],
    }

    result = np.column_stack(format_to_columns[target_format])

    # Add any additional columns from the original keypoints
    if keypoints.shape[1] > NUM_KEYPOINTS_COLUMNS_IN_ALBUMENTATIONS:
        return np.column_stack((result, keypoints[:, NUM_KEYPOINTS_COLUMNS_IN_ALBUMENTATIONS:]))

    return result
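
A short sketch converting from the internal [x, y, angle, scale] layout to the "xyas" format with angles reported in degrees (the values are illustrative).

Python
import numpy as np
from albumentations.core.keypoints_utils import convert_keypoints_from_albumentations

# One keypoint in Albumentations format: x=10, y=20, angle=pi/2 rad, scale=1.5.
keypoints = np.array([[10.0, 20.0, np.pi / 2, 1.5]])
converted = convert_keypoints_from_albumentations(
    keypoints,
    target_format="xyas",
    image_shape=(100, 100),
    angle_in_degrees=True,
)
print(converted)  # [[10. 20. 90. 1.5]]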

def convert_keypoints_to_albumentations (keypoints, source_format, image_shape, check_validity=False, angle_in_degrees=True) [view source on GitHub]

Convert keypoints from various formats to the Albumentations format.

This function takes keypoints in different formats and converts them to the standard Albumentations format: [x, y, angle, scale]. If the input format doesn't include angle or scale, these values are set to 0.

Parameters:

Name Type Description
keypoints np.ndarray

Array of keypoints with shape (N, 2+), where N is the number of keypoints. The number of columns depends on the source_format.

source_format Literal["xy", "yx", "xya", "xys", "xyas", "xysa"]

The format of the input keypoints.
  • "xy": [x, y]
  • "yx": [y, x]
  • "xya": [x, y, angle]
  • "xys": [x, y, scale]
  • "xyas": [x, y, angle, scale]
  • "xysa": [x, y, scale, angle]

image_shape tuple[int, int]

The shape of the image (height, width).

check_validity bool

If True, check if the converted keypoints are within the image boundaries. Defaults to False.

angle_in_degrees bool

If True, convert input angles from degrees to radians. Defaults to True.

Returns:

Type Description
np.ndarray

Array of keypoints in Albumentations format [x, y, angle, scale] with shape (N, 4+). Any additional columns from the input keypoints are preserved and appended after the first 4 columns.

Exceptions:

Type Description
ValueError

If the source_format is not one of the supported formats.

Note

  • Angles are converted to the range [0, 2π) radians.
  • If the input keypoints have additional columns beyond what's specified in the source_format, these columns are preserved in the output.
Source code in albumentations/core/keypoints_utils.py
Python
def convert_keypoints_to_albumentations(
    keypoints: np.ndarray,
    source_format: Literal["xy", "yx", "xya", "xys", "xyas", "xysa"],
    image_shape: tuple[int, int],
    check_validity: bool = False,
    angle_in_degrees: bool = True,
) -> np.ndarray:
    """Convert keypoints from various formats to the Albumentations format.

    This function takes keypoints in different formats and converts them to the standard
    Albumentations format: [x, y, angle, scale]. If the input format doesn't include
    angle or scale, these values are set to 0.

    Args:
        keypoints (np.ndarray): Array of keypoints with shape (N, 2+), where N is the number of keypoints.
                                The number of columns depends on the source_format.
        source_format (Literal["xy", "yx", "xya", "xys", "xyas", "xysa"]): The format of the input keypoints.
            - "xy": [x, y]
            - "yx": [y, x]
            - "xya": [x, y, angle]
            - "xys": [x, y, scale]
            - "xyas": [x, y, angle, scale]
            - "xysa": [x, y, scale, angle]
        image_shape (tuple[int, int]): The shape of the image (height, width).
        check_validity (bool, optional): If True, check if the converted keypoints are within the image boundaries.
                                         Defaults to False.
        angle_in_degrees (bool, optional): If True, convert input angles from degrees to radians.
                                           Defaults to True.

    Returns:
        np.ndarray: Array of keypoints in Albumentations format [x, y, angle, scale] with shape (N, 4+).
                    Any additional columns from the input keypoints are preserved and appended after the
                    first 4 columns.

    Raises:
        ValueError: If the source_format is not one of the supported formats.

    Note:
        - Angles are converted to the range [0, 2π) radians.
        - If the input keypoints have additional columns beyond what's specified in the source_format,
          these columns are preserved in the output.
    """
    if source_format not in keypoint_formats:
        raise ValueError(f"Unknown source_format {source_format}. Supported formats are: {keypoint_formats}")

    format_to_indices: dict[str, list[int | None]] = {
        "xy": [0, 1, None, None],
        "yx": [1, 0, None, None],
        "xya": [0, 1, 2, None],
        "xys": [0, 1, None, 2],
        "xyas": [0, 1, 2, 3],
        "xysa": [0, 1, 3, 2],
    }

    indices: list[int | None] = format_to_indices[source_format]

    processed_keypoints = np.zeros((keypoints.shape[0], NUM_KEYPOINTS_COLUMNS_IN_ALBUMENTATIONS), dtype=np.float32)

    for i, idx in enumerate(indices):
        if idx is not None:
            processed_keypoints[:, i] = keypoints[:, idx]

    if angle_in_degrees and indices[2] is not None:
        processed_keypoints[:, 2] = np.radians(processed_keypoints[:, 2])

    processed_keypoints[:, 2] = angle_to_2pi_range(processed_keypoints[:, 2])

    if keypoints.shape[1] > len(source_format):
        processed_keypoints = np.column_stack((processed_keypoints, keypoints[:, len(source_format) :]))

    if check_validity:
        check_keypoints(processed_keypoints, image_shape)

    return processed_keypoints
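
A short sketch of the reverse conversion from "xy" input, where the missing angle and scale columns are filled with zeros (the values are illustrative).

Python
import numpy as np
from albumentations.core.keypoints_utils import convert_keypoints_to_albumentations

keypoints = np.array([[10.0, 20.0]])  # plain "xy" keypoints
converted = convert_keypoints_to_albumentations(
    keypoints,
    source_format="xy",
    image_shape=(100, 100),
)
print(converted)  # [[10. 20. 0. 0.]] -- angle and scale default to 0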

def filter_keypoints (keypoints, image_shape, remove_invisible) [view source on GitHub]

Filter keypoints to remove those outside the image boundaries.

Parameters:

Name Type Description
keypoints np.ndarray

A numpy array of shape (N, 2+) where N is the number of keypoints. Each row represents a keypoint (x, y, ...).

image_shape tuple[int, int]

A tuple (height, width) representing the image dimensions.

remove_invisible bool

If True, remove keypoints outside the image boundaries.

Returns:

Type Description
np.ndarray

A numpy array of filtered keypoints.

Source code in albumentations/core/keypoints_utils.py
Python
def filter_keypoints(
    keypoints: np.ndarray,
    image_shape: tuple[int, int],
    remove_invisible: bool,
) -> np.ndarray:
    """Filter keypoints to remove those outside the image boundaries.

    Args:
        keypoints: A numpy array of shape (N, 2+) where N is the number of keypoints.
                   Each row represents a keypoint (x, y, ...).
        image_shape: A tuple (height, width) representing the image dimensions.
        remove_invisible: If True, remove keypoints outside the image boundaries.

    Returns:
        A numpy array of filtered keypoints.
    """
    if not remove_invisible:
        return keypoints

    if not keypoints.size:
        return keypoints

    height, width = image_shape[:2]

    # Create boolean mask for visible keypoints
    x, y = keypoints[:, 0], keypoints[:, 1]
    visible = (x >= 0) & (x < width) & (y >= 0) & (y < height)

    # Apply the mask to filter keypoints
    return keypoints[visible]
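
A small sketch showing how out-of-bounds keypoints are dropped when remove_invisible=True (the values are illustrative).

Python
import numpy as np
from albumentations.core.keypoints_utils import filter_keypoints

# The second keypoint lies outside a 100x100 image and is removed.
keypoints = np.array([[10.0, 20.0], [150.0, 20.0]])
visible = filter_keypoints(keypoints, image_shape=(100, 100), remove_invisible=True)
print(visible)  # [[10. 20.]]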

serialization

class Serializable [view source on GitHub]

Source code in albumentations/core/serialization.py
Python
class Serializable(metaclass=SerializableMeta):
    @classmethod
    @abstractmethod
    def is_serializable(cls) -> bool:
        raise NotImplementedError

    @classmethod
    @abstractmethod
    def get_class_fullname(cls) -> str:
        raise NotImplementedError

    @abstractmethod
    def to_dict_private(self) -> dict[str, Any]:
        raise NotImplementedError

    def to_dict(self, on_not_implemented_error: str = "raise") -> dict[str, Any]:
        """Take a transform pipeline and convert it to a serializable representation that uses only standard
        python data types: dictionaries, lists, strings, integers, and floats.

        Args:
            self: A transform that should be serialized. If the transform doesn't implement the `to_dict`
                method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised.
                If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored
                but no transform parameters will be serialized.
            on_not_implemented_error (str): `raise` or `warn`.

        """
        if on_not_implemented_error not in {"raise", "warn"}:
            msg = f"Unknown on_not_implemented_error value: {on_not_implemented_error}. Supported values are: 'raise' "
            "and 'warn'"
            raise ValueError(msg)
        try:
            transform_dict = self.to_dict_private()
        except NotImplementedError:
            if on_not_implemented_error == "raise":
                raise

            transform_dict = {}
            warnings.warn(
                f"Got NotImplementedError while trying to serialize {self}. Object arguments are not preserved. "
                f"Implement either '{self.__class__.__name__}.get_transform_init_args_names' "
                f"or '{self.__class__.__name__}.get_transform_init_args' "
                "method to make the transform serializable",
                stacklevel=2,
            )
        return {"__version__": __version__, "transform": transform_dict}
to_dict (self, on_not_implemented_error='raise')

Take a transform pipeline and convert it to a serializable representation that uses only standard python data types: dictionaries, lists, strings, integers, and floats.

Parameters:

Name Type Description
self

A transform that should be serialized. If the transform doesn't implement the to_dict method and on_not_implemented_error equals 'raise', then NotImplementedError is raised. If on_not_implemented_error equals 'warn', then NotImplementedError will be ignored, but no transform parameters will be serialized.

on_not_implemented_error str

raise or warn.

Source code in albumentations/core/serialization.py
Python
def to_dict(self, on_not_implemented_error: str = "raise") -> dict[str, Any]:
    """Take a transform pipeline and convert it to a serializable representation that uses only standard
    python data types: dictionaries, lists, strings, integers, and floats.

    Args:
        self: A transform that should be serialized. If the transform doesn't implement the `to_dict`
            method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised.
            If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored
            but no transform parameters will be serialized.
        on_not_implemented_error (str): `raise` or `warn`.

    """
    if on_not_implemented_error not in {"raise", "warn"}:
        msg = f"Unknown on_not_implemented_error value: {on_not_implemented_error}. Supported values are: 'raise' "
        "and 'warn'"
        raise ValueError(msg)
    try:
        transform_dict = self.to_dict_private()
    except NotImplementedError:
        if on_not_implemented_error == "raise":
            raise

        transform_dict = {}
        warnings.warn(
            f"Got NotImplementedError while trying to serialize {self}. Object arguments are not preserved. "
            f"Implement either '{self.__class__.__name__}.get_transform_init_args_names' "
            f"or '{self.__class__.__name__}.get_transform_init_args' "
            "method to make the transform serializable",
            stacklevel=2,
        )
    return {"__version__": __version__, "transform": transform_dict}

class SerializableMeta [view source on GitHub]

A metaclass that is used to register classes in SERIALIZABLE_REGISTRY or NON_SERIALIZABLE_REGISTRY so they can be found later while deserializing transformation pipeline using classes full names.

Source code in albumentations/core/serialization.py
Python
class SerializableMeta(ABCMeta):
    """A metaclass that is used to register classes in `SERIALIZABLE_REGISTRY` or `NON_SERIALIZABLE_REGISTRY`
    so they can be found later while deserializing transformation pipeline using classes full names.
    """

    def __new__(cls, name: str, bases: tuple[type, ...], *args: Any, **kwargs: Any) -> SerializableMeta:
        cls_obj = super().__new__(cls, name, bases, *args, **kwargs)
        if name != "Serializable" and ABC not in bases:
            if cls_obj.is_serializable():
                SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
            else:
                NON_SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
        return cls_obj

    @classmethod
    def is_serializable(cls) -> bool:
        return False

    @classmethod
    def get_class_fullname(cls) -> str:
        return get_shortest_class_fullname(cls)

    @classmethod
    def _to_dict(cls) -> dict[str, Any]:
        return {}
__new__ (cls, name, bases, *args, **kwargs) special staticmethod

Create and return a new object. See help(type) for accurate signature.

Source code in albumentations/core/serialization.py
Python
def __new__(cls, name: str, bases: tuple[type, ...], *args: Any, **kwargs: Any) -> SerializableMeta:
    cls_obj = super().__new__(cls, name, bases, *args, **kwargs)
    if name != "Serializable" and ABC not in bases:
        if cls_obj.is_serializable():
            SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
        else:
            NON_SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
    return cls_obj

def from_dict (transform_dict, nonserializable=None) [view source on GitHub]

Parameters:

Name Type Description
transform_dict dict[str, Any]

A dictionary with a serialized transform pipeline.

nonserializable dict

A dictionary that contains non-serializable transforms. This dictionary is required when you are restoring a pipeline that contains non-serializable transforms. Keys in that dictionary should be named the same as the name arguments in respective transforms from a serialized pipeline.

Source code in albumentations/core/serialization.py
Python
def from_dict(
    transform_dict: dict[str, Any],
    nonserializable: dict[str, Any] | None = None,
) -> Serializable | None:
    """Args:
    transform_dict: A dictionary with serialized transform pipeline.
    nonserializable (dict): A dictionary that contains non-serializable transforms.
        This dictionary is required when you are restoring a pipeline that contains non-serializable transforms.
        Keys in that dictionary should be named same as `name` arguments in respective transforms from
        a serialized pipeline.

    """
    register_additional_transforms()
    transform = transform_dict["transform"]
    lmbd = instantiate_nonserializable(transform, nonserializable)
    if lmbd:
        return lmbd
    name = transform["__class_fullname__"]
    args = {k: v for k, v in transform.items() if k != "__class_fullname__"}
    cls = SERIALIZABLE_REGISTRY[shorten_class_name(name)]
    if "transforms" in args:
        args["transforms"] = [from_dict({"transform": t}, nonserializable=nonserializable) for t in args["transforms"]]
    return cls(**args)
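
A brief round-trip sketch using the top-level to_dict and from_dict helpers (the pipeline itself is illustrative).

Python
import albumentations as A

transform = A.Compose([
    A.RandomCrop(height=224, width=224),
    A.HorizontalFlip(p=0.5),
])

# Serialize to plain Python types, then rebuild an equivalent pipeline.
transform_dict = A.to_dict(transform)
restored = A.from_dict(transform_dict)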

def get_shortest_class_fullname (cls) [view source on GitHub]

The function get_shortest_class_fullname takes a class object as input and returns its shortened full name.

Parameters:

Name Type Description
cls Type[BasicCompose]

A class object, expected to be a subclass of BasicCompose.

Returns:

Type Description
str

The shortened version of the full class name.

Source code in albumentations/core/serialization.py
Python
def get_shortest_class_fullname(cls: type[Any]) -> str:
    """The function `get_shortest_class_fullname` takes a class object as input and returns its shortened
    full name.

    :param cls: The parameter `cls` is of type `Type[BasicCompose]`, which means it expects a class that
    is a subclass of `BasicCompose`
    :type cls: Type[BasicCompose]
    :return: a string, which is the shortened version of the full class name.
    """
    class_fullname = f"{cls.__module__}.{cls.__name__}"
    return shorten_class_name(class_fullname)

def load (filepath_or_buffer, data_format='json', nonserializable=None) [view source on GitHub]

Load a serialized pipeline from a file or file-like object and construct a transform pipeline.

Parameters:

Name Type Description
filepath_or_buffer Union[str, Path, TextIO]

The file path or file-like object to read the serialized data from. If a string is provided, it is interpreted as a path to a file. If a file-like object is provided, the serialized data will be read from it directly.

data_format str

The format of the serialized data. Valid options are 'json' and 'yaml'. Defaults to 'json'.

nonserializable Optional[dict[str, Any]]

A dictionary that contains non-serializable transforms. This dictionary is required when restoring a pipeline that contains non-serializable transforms. Keys in the dictionary should be named the same as the name arguments in respective transforms from the serialized pipeline. Defaults to None.

Returns:

Type Description
object

The deserialized transform pipeline.

Exceptions:

Type Description
ValueError

If data_format is 'yaml' but PyYAML is not installed.

Source code in albumentations/core/serialization.py
Python
def load(
    filepath_or_buffer: str | Path | TextIO,
    data_format: str = "json",
    nonserializable: dict[str, Any] | None = None,
) -> object:
    """Load a serialized pipeline from a file or file-like object and construct a transform pipeline.

    Args:
        filepath_or_buffer (Union[str, Path, TextIO]): The file path or file-like object to read the serialized
            data from.
            If a string is provided, it is interpreted as a path to a file. If a file-like object is provided,
            the serialized data will be read from it directly.
        data_format (str): The format of the serialized data. Valid options are 'json' and 'yaml'.
            Defaults to 'json'.
        nonserializable (Optional[dict[str, Any]]): A dictionary that contains non-serializable transforms.
            This dictionary is required when restoring a pipeline that contains non-serializable transforms.
            Keys in the dictionary should be named the same as the `name` arguments in respective transforms
            from the serialized pipeline. Defaults to None.

    Returns:
        object: The deserialized transform pipeline.

    Raises:
        ValueError: If `data_format` is 'yaml' but PyYAML is not installed.

    """
    check_data_format(data_format)

    if isinstance(filepath_or_buffer, (str, Path)):  # Assume it's a filepath
        with open(filepath_or_buffer) as f:
            if data_format == "json":
                transform_dict = json.load(f)
            else:
                if not yaml_available:
                    msg = "You need to install PyYAML to load a pipeline in yaml format"
                    raise ValueError(msg)
                transform_dict = yaml.safe_load(f)
    elif data_format == "json":
        transform_dict = json.load(filepath_or_buffer)
    else:
        if not yaml_available:
            msg = "You need to install PyYAML to load a pipeline in yaml format"
            raise ValueError(msg)
        transform_dict = yaml.safe_load(filepath_or_buffer)

    return from_dict(transform_dict, nonserializable=nonserializable)

def register_additional_transforms () [view source on GitHub]

Register transforms that are not imported directly into the albumentations module by checking the availability of optional dependencies.

Source code in albumentations/core/serialization.py
Python
def register_additional_transforms() -> None:
    """Register transforms that are not imported directly into the `albumentations` module by checking
    the availability of optional dependencies.
    """
    if importlib.util.find_spec("torch") is not None:
        try:
            # Import `albumentations.pytorch` only if `torch` is installed.
            import albumentations.pytorch

            # Use a dummy operation to acknowledge the use of the imported module and avoid linting errors.
            _ = albumentations.pytorch.ToTensorV2
        except ImportError:
            pass

def save (transform, filepath_or_buffer, data_format='json', on_not_implemented_error='raise') [view source on GitHub]

Serialize a transform pipeline and save it to either a file specified by a path or a file-like object in either JSON or YAML format.

Parameters:

Name Type Description
transform Serializable

The transform pipeline to serialize.

filepath_or_buffer Union[str, Path, TextIO]

The file path or file-like object to write the serialized data to. If a string is provided, it is interpreted as a path to a file. If a file-like object is provided, the serialized data will be written to it directly.

data_format str

The format to serialize the data in. Valid options are 'json' and 'yaml'. Defaults to 'json'.

on_not_implemented_error str

Determines the behavior if a transform does not implement the to_dict method. If set to 'raise', a NotImplementedError is raised. If set to 'warn', the exception is ignored, and no transform arguments are saved. Defaults to 'raise'.

Exceptions:

Type Description
ValueError

If data_format is 'yaml' but PyYAML is not installed.

Source code in albumentations/core/serialization.py
Python
def save(
    transform: Serializable,
    filepath_or_buffer: str | Path | TextIO,
    data_format: str = "json",
    on_not_implemented_error: str = "raise",
) -> None:
    """Serialize a transform pipeline and save it to either a file specified by a path or a file-like object
    in either JSON or YAML format.

    Args:
        transform (Serializable): The transform pipeline to serialize.
        filepath_or_buffer (Union[str, Path, TextIO]): The file path or file-like object to write the serialized
            data to.
            If a string is provided, it is interpreted as a path to a file. If a file-like object is provided,
            the serialized data will be written to it directly.
        data_format (str): The format to serialize the data in. Valid options are 'json' and 'yaml'.
            Defaults to 'json'.
        on_not_implemented_error (str): Determines the behavior if a transform does not implement the `to_dict` method.
            If set to 'raise', a `NotImplementedError` is raised. If set to 'warn', the exception is ignored, and
            no transform arguments are saved. Defaults to 'raise'.

    Raises:
        ValueError: If `data_format` is 'yaml' but PyYAML is not installed.

    """
    check_data_format(data_format)
    transform_dict = transform.to_dict(on_not_implemented_error=on_not_implemented_error)
    transform_dict = serialize_enum(transform_dict)

    # Determine whether to write to a file or a file-like object
    if isinstance(filepath_or_buffer, (str, Path)):  # It's a filepath
        with open(filepath_or_buffer, "w") as f:
            if data_format == "yaml":
                if not yaml_available:
                    msg = "You need to install PyYAML to save a pipeline in YAML format"
                    raise ValueError(msg)
                yaml.safe_dump(transform_dict, f, default_flow_style=False)
            elif data_format == "json":
                json.dump(transform_dict, f)
    elif data_format == "yaml":
        if not yaml_available:
            msg = "You need to install PyYAML to save a pipeline in YAML format"
            raise ValueError(msg)
        yaml.safe_dump(transform_dict, filepath_or_buffer, default_flow_style=False)
    elif data_format == "json":
        json.dump(transform_dict, filepath_or_buffer, indent=2)
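
A minimal usage sketch of saving and reloading a pipeline; the file path below is purely illustrative.

Python
import albumentations as A

pipeline = A.Compose([A.HorizontalFlip(p=0.5), A.RandomBrightnessContrast(p=0.2)])

# Save as JSON (the default format); pass data_format="yaml" if PyYAML is installed.
A.save(pipeline, "/tmp/pipeline.json")

# Reload it later with the matching `load` function.
restored = A.load("/tmp/pipeline.json")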

def serialize_enum (obj) [view source on GitHub]

Recursively search for Enum objects and convert them to their value. Also handle any Mapping or Sequence types.

Source code in albumentations/core/serialization.py
Python
def serialize_enum(obj: Any) -> Any:
    """Recursively search for Enum objects and convert them to their value.
    Also handle any Mapping or Sequence types.
    """
    if isinstance(obj, Mapping):
        return {k: serialize_enum(v) for k, v in obj.items()}
    if isinstance(obj, Sequence) and not isinstance(obj, str):  # exclude strings since they're also sequences
        return [serialize_enum(v) for v in obj]
    return obj.value if isinstance(obj, Enum) else obj
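
A small illustrative sketch with a hypothetical Enum; nested mappings and sequences are converted recursively.

Python
from enum import Enum
from albumentations.core.serialization import serialize_enum

class Interp(Enum):  # hypothetical enum, used only for illustration
    NEAREST = 0
    LINEAR = 1

serialize_enum({"interpolation": Interp.LINEAR, "modes": [Interp.NEAREST]})
# -> {"interpolation": 1, "modes": [0]}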

def to_dict (transform, on_not_implemented_error='raise') [view source on GitHub]

Take a transform pipeline and convert it to a serializable representation that uses only standard Python data types: dictionaries, lists, strings, integers, and floats.

Parameters:

Name Type Description
transform Serializable

A transform that should be serialized. If the transform doesn't implement the to_dict method and on_not_implemented_error is set to 'raise', a NotImplementedError is raised. If on_not_implemented_error is set to 'warn', the NotImplementedError is ignored, but no transform parameters are serialized.

on_not_implemented_error str

Either 'raise' or 'warn'. Defaults to 'raise'.

Source code in albumentations/core/serialization.py
Python
def to_dict(transform: Serializable, on_not_implemented_error: str = "raise") -> dict[str, Any]:
    """Take a transform pipeline and convert it to a serializable representation that uses only standard
    python data types: dictionaries, lists, strings, integers, and floats.

    Args:
        transform: A transform that should be serialized. If the transform doesn't implement the `to_dict`
            method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised.
            If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored
            but no transform parameters will be serialized.
        on_not_implemented_error (str): `raise` or `warn`.

    """
    return transform.to_dict(on_not_implemented_error)
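
A minimal round-trip sketch using the top-level helpers (assuming the standard top-level exports to_dict and from_dict):

Python
import albumentations as A

pipeline = A.Compose([A.HorizontalFlip(p=0.5), A.RandomBrightnessContrast(p=0.2)])

pipeline_dict = A.to_dict(pipeline)    # only dicts, lists, strings, ints and floats
restored = A.from_dict(pipeline_dict)  # rebuild an equivalent pipeline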

transforms_interface

class BaseTransformInitSchema


Source code in albumentations/core/transforms_interface.py
Python
class BaseTransformInitSchema(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    always_apply: bool | None = Field(
        default=None,
        deprecated="Deprecated. Use `p=1` instead to always apply the transform",
    )
    p: ProbabilityType = 0.5
__class_vars__ special

The names of the class variables defined on the model.

__private_attributes__ special

Metadata about the private attributes of the model.

__pydantic_complete__ special

Whether model building is completed, or if there are still undefined fields.

__pydantic_custom_init__ special

Whether the model has a custom __init__ method.

__pydantic_decorators__ special

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__ special

Metadata for generic models; contains data used for a similar purpose to args, origin, parameters in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__ special

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__ special

The name of the post-init method for the model, if defined.

__signature__ special

The synthesized __init__ signature (inspect.Signature) of the model.

model_computed_fields

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

class BasicTransform (p=0.5, always_apply=None) [view source on GitHub]


Source code in albumentations/core/transforms_interface.py
Python
class BasicTransform(Serializable, metaclass=CombinedMeta):
    _targets: tuple[Targets, ...] | Targets  # targets that this transform can work on
    _available_keys: set[str]  # targets that this transform, as string, lower-cased
    _key2func: dict[
        str,
        Callable[..., Any],
    ]  # mapping for targets (plus additional targets) and methods for which they depend
    call_backup = None
    interpolation: int
    fill_value: ColorType
    mask_fill_value: ColorType | None
    # replay mode params
    deterministic: bool = False
    save_key = "replay"
    replay_mode = False
    applied_in_replay = False

    class InitSchema(BaseTransformInitSchema):
        pass

    def __init__(self, p: float = 0.5, always_apply: bool | None = None):
        self.p = p
        if always_apply is not None:
            if always_apply:
                warn(
                    "always_apply is deprecated. Use `p=1` if you want to always apply the transform."
                    " self.p will be set to 1.",
                    DeprecationWarning,
                    stacklevel=2,
                )
                self.p = 1.0
            else:
                warn(
                    "always_apply is deprecated.",
                    DeprecationWarning,
                    stacklevel=2,
                )
        self._additional_targets: dict[str, str] = {}
        # replay mode params
        self.params: dict[Any, Any] = {}
        self._key2func = {}
        self._set_keys()

    def __call__(self, *args: Any, force_apply: bool = False, **kwargs: Any) -> Any:
        if args:
            msg = "You have to pass data to augmentations as named arguments, for example: aug(image=image)"
            raise KeyError(msg)
        if self.replay_mode:
            if self.applied_in_replay:
                return self.apply_with_params(self.params, **kwargs)

            return kwargs

        if self.should_apply(force_apply=force_apply):
            params = self.get_params()
            params = self.update_params_shape(params=params, data=kwargs)

            if self.targets_as_params:  # check if all required targets are in kwargs.
                missing_keys = set(self.targets_as_params).difference(kwargs.keys())
                if missing_keys and not (missing_keys == {"image"} and "images" in kwargs):
                    msg = f"{self.__class__.__name__} requires {self.targets_as_params} missing keys: {missing_keys}"
                    raise ValueError(msg)

            params_dependent_on_data = self.get_params_dependent_on_data(params=params, data=kwargs)
            params.update(params_dependent_on_data)

            if self.targets_as_params:  # this block will be removed after removing `get_params_dependent_on_targets`
                targets_as_params = {k: kwargs.get(k, None) for k in self.targets_as_params}
                if missing_keys:  # here we expecting case when missing_keys == {"image"} and "images" in kwargs
                    targets_as_params["image"] = kwargs["images"][0]
                params_dependent_on_targets = self.get_params_dependent_on_targets(targets_as_params)
                params.update(params_dependent_on_targets)
            if self.deterministic:
                kwargs[self.save_key][id(self)] = deepcopy(params)
            return self.apply_with_params(params, **kwargs)

        return kwargs

    def should_apply(self, force_apply: bool = False) -> bool:
        if self.p <= 0.0:
            return False
        if self.p >= 1.0 or force_apply:
            return True
        return random.random() < self.p

    def apply_with_params(self, params: dict[str, Any], *args: Any, **kwargs: Any) -> dict[str, Any]:
        """Apply transforms with parameters."""
        params = self.update_params(params, **kwargs)  # remove after move parameters like interpolation
        res = {}
        for key, arg in kwargs.items():
            if key in self._key2func and arg is not None:
                target_function = self._key2func[key]
                if isinstance(arg, np.ndarray):
                    result = target_function(np.require(arg, requirements=["C_CONTIGUOUS"]), **params)
                    if isinstance(result, np.ndarray):
                        res[key] = np.require(result, requirements=["C_CONTIGUOUS"])
                    else:
                        res[key] = result
                else:
                    res[key] = target_function(arg, **params)
            else:
                res[key] = arg
        return res

    def set_deterministic(self, flag: bool, save_key: str = "replay") -> BasicTransform:
        """Set transform to be deterministic."""
        if save_key == "params":
            msg = "params save_key is reserved"
            raise KeyError(msg)

        self.deterministic = flag
        if self.deterministic and self.targets_as_params:
            warn(
                self.get_class_fullname() + " could work incorrectly in ReplayMode for other input data"
                " because its' params depend on targets.",
                stacklevel=2,
            )
        self.save_key = save_key
        return self

    def __repr__(self) -> str:
        state = self.get_base_init_args()
        state.update(self.get_transform_init_args())
        return f"{self.__class__.__name__}({format_args(state)})"

    def apply(self, img: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
        """Apply transform on image."""
        raise NotImplementedError

    def apply_to_images(self, images: np.ndarray, **params: Any) -> list[np.ndarray]:
        """Apply transform on images."""
        return [self.apply(image, **params) for image in images]

    def get_params(self) -> dict[str, Any]:
        """Returns parameters independent of input."""
        return {}

    def update_params_shape(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        """Updates parameters with input image shape."""
        # here we expects `image` or `images` in kwargs. it's checked at Compose._check_args
        shape = data["image"].shape if "image" in data else data["images"][0].shape
        params["shape"] = shape
        params.update({"cols": shape[1], "rows": shape[0]})
        return params

    def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
        """Returns parameters dependent on input."""
        return params

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        # mapping for targets and methods for which they depend
        # for example:
        # >>  {"image": self.apply}
        # >>  {"masks": self.apply_to_masks}
        raise NotImplementedError

    def _set_keys(self) -> None:
        """Set _available_keys."""
        if not hasattr(self, "_targets"):
            self._available_keys = set()
        else:
            self._available_keys = {
                target.value.lower()
                for target in (self._targets if isinstance(self._targets, tuple) else [self._targets])
            }
        self._available_keys.update(self.targets.keys())
        self._key2func = {key: self.targets[key] for key in self._available_keys if key in self.targets}

    @property
    def available_keys(self) -> set[str]:
        """Returns set of available keys."""
        return self._available_keys

    def update_params(self, params: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
        """Update parameters with transform specific params.
        This method is deprecated, use:
        - `get_params` for transform specific params like interpolation and
        - `update_params_shape` for data like shape.
        """
        if hasattr(self, "interpolation"):
            params["interpolation"] = self.interpolation
        if hasattr(self, "fill_value"):
            params["fill_value"] = self.fill_value
        if hasattr(self, "mask_fill_value"):
            params["mask_fill_value"] = self.mask_fill_value

        # here we expects `image` or `images` in kwargs. it's checked at Compose._check_args
        shape = kwargs["image"].shape if "image" in kwargs else kwargs["images"][0].shape
        params["shape"] = shape
        params.update({"cols": shape[1], "rows": shape[0]})
        return params

    def add_targets(self, additional_targets: dict[str, str]) -> None:
        """Add targets to transform them the same way as one of existing targets.
        ex: {'target_image': 'image'}
        ex: {'obj1_mask': 'mask', 'obj2_mask': 'mask'}
        by the way you must have at least one object with key 'image'

        Args:
            additional_targets (dict): keys - new target name, values - old target name. ex: {'image2': 'image'}

        """
        for k, v in additional_targets.items():
            if k in self._additional_targets and v != self._additional_targets[k]:
                raise ValueError(
                    f"Trying to overwrite existed additional targets. "
                    f"Key={k} Exists={self._additional_targets[k]} New value: {v}",
                )
            if v in self._available_keys:
                self._additional_targets[k] = v
                self._key2func[k] = self.targets[v]
                self._available_keys.add(k)

    @property
    def targets_as_params(self) -> list[str]:
        """Targets used to get params dependent on targets.
        This is used to check input has all required targets.
        """
        return []

    def get_params_dependent_on_targets(self, params: dict[str, Any]) -> dict[str, Any]:
        """This method is deprecated.
        Use `get_params_dependent_on_data` instead.
        Returns parameters dependent on targets.
        Dependent target is defined in `self.targets_as_params`
        """
        return {}

    @classmethod
    def get_class_fullname(cls) -> str:
        return get_shortest_class_fullname(cls)

    @classmethod
    def is_serializable(cls) -> bool:
        return True

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        """Returns names of arguments that are used in __init__ method of the transform."""
        msg = (
            f"Class {self.get_class_fullname()} is not serializable because the `get_transform_init_args_names` "
            "method is not implemented"
        )
        raise NotImplementedError(msg)

    def get_base_init_args(self) -> dict[str, Any]:
        """Returns base init args - p"""
        return {"p": self.p}

    def get_transform_init_args(self) -> dict[str, Any]:
        return {k: getattr(self, k) for k in self.get_transform_init_args_names()}

    def to_dict_private(self) -> dict[str, Any]:
        state = {"__class_fullname__": self.get_class_fullname()}
        state.update(self.get_base_init_args())
        state.update(self.get_transform_init_args())

        return state

    def get_dict_with_id(self) -> dict[str, Any]:
        d = self.to_dict_private()
        d["id"] = id(self)
        return d
available_keys: set[str] property readonly

Returns set of available keys.

targets_as_params: list[str] property readonly

Targets used to get params dependent on targets. This is used to check that the input contains all required targets.

class InitSchema


Source code in albumentations/core/transforms_interface.py
Python
class InitSchema(BaseTransformInitSchema):
    pass
__class_vars__ special

The names of the class variables defined on the model.

__private_attributes__ special

Metadata about the private attributes of the model.

__pydantic_complete__ special

Whether model building is completed, or if there are still undefined fields.

__pydantic_custom_init__ special

Whether the model has a custom __init__ method.

__pydantic_decorators__ special

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__ special

Metadata for generic models; contains data used for a similar purpose to args, origin, parameters in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__ special

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__ special

The name of the post-init method for the model, if defined.

__signature__ special

The synthesized __init__ signature (inspect.Signature) of the model.

model_computed_fields

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

add_targets (self, additional_targets)

Add targets so that they are transformed in the same way as one of the existing targets, e.g. {'target_image': 'image'} or {'obj1_mask': 'mask', 'obj2_mask': 'mask'}. Note that at least one target with the key 'image' is still required. A usage sketch follows the source code below.

Parameters:

Name Type Description
additional_targets dict

keys - new target name, values - old target name. ex: {'image2': 'image'}

Source code in albumentations/core/transforms_interface.py
Python
def add_targets(self, additional_targets: dict[str, str]) -> None:
    """Add targets to transform them the same way as one of existing targets.
    ex: {'target_image': 'image'}
    ex: {'obj1_mask': 'mask', 'obj2_mask': 'mask'}
    by the way you must have at least one object with key 'image'

    Args:
        additional_targets (dict): keys - new target name, values - old target name. ex: {'image2': 'image'}

    """
    for k, v in additional_targets.items():
        if k in self._additional_targets and v != self._additional_targets[k]:
            raise ValueError(
                f"Trying to overwrite existed additional targets. "
                f"Key={k} Exists={self._additional_targets[k]} New value: {v}",
            )
        if v in self._available_keys:
            self._additional_targets[k] = v
            self._key2func[k] = self.targets[v]
            self._available_keys.add(k)
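
A minimal sketch of applying one transform consistently to two images. In practice this is usually configured through A.Compose(..., additional_targets=...) rather than by calling the method directly.

Python
import numpy as np
import albumentations as A

flip = A.HorizontalFlip(p=1.0)
flip.add_targets({"image2": "image"})  # treat `image2` exactly like `image`

img = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
out = flip(image=img, image2=img.copy())
assert np.array_equal(out["image"], out["image2"])  # both flipped identically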
apply (self, img, *args, **params)

Apply transform on image.

Source code in albumentations/core/transforms_interface.py
Python
def apply(self, img: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
    """Apply transform on image."""
    raise NotImplementedError
apply_to_images (self, images, **params)

Apply transform on images.

Source code in albumentations/core/transforms_interface.py
Python
def apply_to_images(self, images: np.ndarray, **params: Any) -> list[np.ndarray]:
    """Apply transform on images."""
    return [self.apply(image, **params) for image in images]
apply_with_params (self, params, *args, **kwargs)

Apply transforms with parameters.

Source code in albumentations/core/transforms_interface.py
Python
def apply_with_params(self, params: dict[str, Any], *args: Any, **kwargs: Any) -> dict[str, Any]:
    """Apply transforms with parameters."""
    params = self.update_params(params, **kwargs)  # remove after move parameters like interpolation
    res = {}
    for key, arg in kwargs.items():
        if key in self._key2func and arg is not None:
            target_function = self._key2func[key]
            if isinstance(arg, np.ndarray):
                result = target_function(np.require(arg, requirements=["C_CONTIGUOUS"]), **params)
                if isinstance(result, np.ndarray):
                    res[key] = np.require(result, requirements=["C_CONTIGUOUS"])
                else:
                    res[key] = result
            else:
                res[key] = target_function(arg, **params)
        else:
            res[key] = arg
    return res
get_base_init_args (self)

Returns base init args - p

Source code in albumentations/core/transforms_interface.py
Python
def get_base_init_args(self) -> dict[str, Any]:
    """Returns base init args - p"""
    return {"p": self.p}
get_params (self)

Returns parameters independent of input.

Source code in albumentations/core/transforms_interface.py
Python
def get_params(self) -> dict[str, Any]:
    """Returns parameters independent of input."""
    return {}
get_params_dependent_on_data (self, params, data)

Returns parameters dependent on input.

Source code in albumentations/core/transforms_interface.py
Python
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    """Returns parameters dependent on input."""
    return params
get_params_dependent_on_targets (self, params)

This method is deprecated; use get_params_dependent_on_data instead. Returns parameters dependent on targets. Dependent targets are defined in self.targets_as_params.

Source code in albumentations/core/transforms_interface.py
Python
def get_params_dependent_on_targets(self, params: dict[str, Any]) -> dict[str, Any]:
    """This method is deprecated.
    Use `get_params_dependent_on_data` instead.
    Returns parameters dependent on targets.
    Dependent target is defined in `self.targets_as_params`
    """
    return {}
get_transform_init_args_names (self)

Returns names of arguments that are used in the __init__ method of the transform.

Source code in albumentations/core/transforms_interface.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    """Returns names of arguments that are used in __init__ method of the transform."""
    msg = (
        f"Class {self.get_class_fullname()} is not serializable because the `get_transform_init_args_names` "
        "method is not implemented"
    )
    raise NotImplementedError(msg)
set_deterministic (self, flag, save_key='replay')

Set transform to be deterministic.

Source code in albumentations/core/transforms_interface.py
Python
def set_deterministic(self, flag: bool, save_key: str = "replay") -> BasicTransform:
    """Set transform to be deterministic."""
    if save_key == "params":
        msg = "params save_key is reserved"
        raise KeyError(msg)

    self.deterministic = flag
    if self.deterministic and self.targets_as_params:
        warn(
            self.get_class_fullname() + " could work incorrectly in ReplayMode for other input data"
            " because its' params depend on targets.",
            stacklevel=2,
        )
    self.save_key = save_key
    return self
update_params (self, params, **kwargs)

Update parameters with transform-specific params. This method is deprecated; use get_params for transform-specific params like interpolation, and update_params_shape for data-dependent params like shape.

Source code in albumentations/core/transforms_interface.py
Python
def update_params(self, params: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
    """Update parameters with transform specific params.
    This method is deprecated, use:
    - `get_params` for transform specific params like interpolation and
    - `update_params_shape` for data like shape.
    """
    if hasattr(self, "interpolation"):
        params["interpolation"] = self.interpolation
    if hasattr(self, "fill_value"):
        params["fill_value"] = self.fill_value
    if hasattr(self, "mask_fill_value"):
        params["mask_fill_value"] = self.mask_fill_value

    # here we expects `image` or `images` in kwargs. it's checked at Compose._check_args
    shape = kwargs["image"].shape if "image" in kwargs else kwargs["images"][0].shape
    params["shape"] = shape
    params.update({"cols": shape[1], "rows": shape[0]})
    return params
update_params_shape (self, params, data)

Updates parameters with input image shape.

Source code in albumentations/core/transforms_interface.py
Python
def update_params_shape(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
    """Updates parameters with input image shape."""
    # here we expects `image` or `images` in kwargs. it's checked at Compose._check_args
    shape = data["image"].shape if "image" in data else data["images"][0].shape
    params["shape"] = shape
    params.update({"cols": shape[1], "rows": shape[0]})
    return params

class DualTransform [view source on GitHub]

A base class for transformations that should be applied both to an image and its corresponding properties such as masks, bounding boxes, and keypoints. This class ensures that when a transform is applied to an image, all associated entities are transformed accordingly to maintain consistency between the image and its annotations.

Properties

targets (dict[str, Callable[..., Any]]): Defines the types of targets (e.g., image, mask, bboxes, keypoints) that the transform should be applied to and maps them to the corresponding methods.

Methods

apply_to_keypoint(keypoint: KeypointInternalType, *args: Any, **params: Any) -> KeypointInternalType: Applies the transform to a single keypoint. Should be implemented in the subclass.

apply_to_bboxes(bboxes: np.ndarray, *args: Any, **params: Any) -> np.ndarray: Applies the transform to a numpy array of bounding boxes.

apply_to_keypoints(keypoints: np.ndarray, *args: Any, **params: Any) -> np.ndarray: Applies the transform to a numpy array of keypoints.

apply_to_mask(mask: np.ndarray, *args: Any, **params: Any) -> np.ndarray: Applies the transform specifically to a single mask.

apply_to_masks(masks: Sequence[np.ndarray], **params: Any) -> list[np.ndarray]: Applies the transform to a list of masks. Delegates to apply_to_mask for each mask.

Note

This class is intended to be subclassed and should not be used directly. Subclasses are expected to implement the specific logic for each type of target (e.g., image, mask, bboxes, keypoints) in the corresponding apply_to_* methods.


Source code in albumentations/core/transforms_interface.py
Python
class DualTransform(BasicTransform):
    """A base class for transformations that should be applied both to an image and its corresponding properties
    such as masks, bounding boxes, and keypoints. This class ensures that when a transform is applied to an image,
    all associated entities are transformed accordingly to maintain consistency between the image and its annotations.

    Properties:
        targets (dict[str, Callable[..., Any]]): Defines the types of targets (e.g., image, mask, bboxes, keypoints)
            that the transform should be applied to and maps them to the corresponding methods.

    Methods:
        apply_to_keypoint(keypoint: KeypointInternalType, *args: Any, **params: Any) -> KeypointInternalType:
            Applies the transform to a single keypoint. Should be implemented in the subclass.

        apply_to_bboxes(bboxes: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
            Applies the transform to a numpy array of bounding boxes.

        apply_to_keypoints(keypoints: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
            Applies the transform to a numpy array of keypoints.

        apply_to_mask(mask: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
            Applies the transform specifically to a single mask.

        apply_to_masks(masks: Sequence[np.ndarray], **params: Any) -> list[np.ndarray]:
            Applies the transform to a list of masks. Delegates to `apply_to_mask` for each mask.

    Note:
        This class is intended to be subclassed and should not be used directly. Subclasses are expected to
        implement the specific logic for each type of target (e.g., image, mask, bboxes, keypoints) in the
        corresponding `apply_to_*` methods.

    """

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {
            "image": self.apply,
            "images": self.apply_to_images,
            "mask": self.apply_to_mask,
            "masks": self.apply_to_masks,
            "bboxes": self.apply_to_bboxes,
            "keypoints": self.apply_to_keypoints,
        }

    def apply_to_keypoints(self, keypoints: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
        msg = f"Method apply_to_keypoints is not implemented in class {self.__class__.__name__}"
        raise NotImplementedError(msg)

    def apply_to_global_label(self, label: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
        msg = f"Method apply_to_global_label is not implemented in class {self.__class__.__name__}"
        raise NotImplementedError(msg)

    def apply_to_bboxes(self, bboxes: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
        raise NotImplementedError(f"BBoxes not implemented for {self.__class__.__name__}")

    def apply_to_mask(self, mask: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
        return self.apply(mask, **{k: cv2.INTER_NEAREST if k == "interpolation" else v for k, v in params.items()})

    def apply_to_masks(self, masks: Sequence[np.ndarray], **params: Any) -> list[np.ndarray]:
        return [self.apply_to_mask(mask, **params) for mask in masks]

    def apply_to_global_labels(self, labels: Sequence[np.ndarray], **params: Any) -> list[np.ndarray]:
        return [self.apply_to_global_label(label, **params) for label in labels]
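
A minimal, hypothetical subclass sketch: only the target-specific apply_to_* methods that the transform actually needs have to be implemented (by default apply_to_mask falls back to apply with nearest-neighbor interpolation).

Python
import numpy as np
import albumentations as A

class IdentityDual(A.DualTransform):
    """Hypothetical example transform that passes every target through unchanged."""

    def apply(self, img: np.ndarray, **params) -> np.ndarray:
        return img

    def apply_to_bboxes(self, bboxes: np.ndarray, **params) -> np.ndarray:
        return bboxes

    def apply_to_keypoints(self, keypoints: np.ndarray, **params) -> np.ndarray:
        return keypoints

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ()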

class ImageOnlyTransform [view source on GitHub]

Transform applied to image only.


Source code in albumentations/core/transforms_interface.py
Python
class ImageOnlyTransform(BasicTransform):
    """Transform applied to image only."""

    _targets = Targets.IMAGE

    @property
    def targets(self) -> dict[str, Callable[..., Any]]:
        return {"image": self.apply, "images": self.apply_to_images}

class NoOp [view source on GitHub]

Identity transform (does nothing).

Targets

image, mask, bboxes, keypoints, global_label


Source code in albumentations/core/transforms_interface.py
Python
class NoOp(DualTransform):
    """Identity transform (does nothing).

    Targets:
        image, mask, bboxes, keypoints, global_label
    """

    _targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS, Targets.GLOBAL_LABEL)

    def apply_to_keypoints(self, keypoints: np.ndarray, **params: Any) -> np.ndarray:
        return keypoints

    def apply_to_bboxes(self, bboxes: np.ndarray, **params: Any) -> np.ndarray:
        return bboxes

    def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
        return img

    def apply_to_mask(self, mask: np.ndarray, **params: Any) -> np.ndarray:
        return mask

    def apply_to_global_label(self, label: np.ndarray, **params: Any) -> np.ndarray:
        return label

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ()
apply (self, img, **params)

Apply transform on image.

Source code in albumentations/core/transforms_interface.py
Python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    return img
get_transform_init_args_names (self)

Returns names of arguments that are used in the __init__ method of the transform.

Source code in albumentations/core/transforms_interface.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ()
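
A short usage sketch: NoOp is convenient as an explicit "do nothing" branch, for example inside OneOf.

Python
import albumentations as A

# With equal weights, roughly half of the calls flip the image
# and the other half leave it untouched.
transform = A.OneOf([A.HorizontalFlip(p=1.0), A.NoOp(p=1.0)], p=1.0)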

utils

class DataProcessor (params, additional_targets=None) [view source on GitHub]


Source code in albumentations/core/utils.py
Python
class DataProcessor(ABC):
    def __init__(self, params: Params, additional_targets: dict[str, str] | None = None):
        self.params = params
        self.data_fields = [self.default_data_name]
        self.label_encoders: dict[str, dict[str, LabelEncoder]] = defaultdict(dict)
        self.is_sequence_input: dict[str, bool] = {}

        if additional_targets is not None:
            self.add_targets(additional_targets)

    @property
    @abstractmethod
    def default_data_name(self) -> str:
        raise NotImplementedError

    def add_targets(self, additional_targets: dict[str, str]) -> None:
        """Add targets to transform them the same way as one of existing targets."""
        for k, v in additional_targets.items():
            if v == self.default_data_name and k not in self.data_fields:
                self.data_fields.append(k)

    def ensure_data_valid(self, data: dict[str, Any]) -> None:
        pass

    def ensure_transforms_valid(self, transforms: Sequence[object]) -> None:
        pass

    def postprocess(self, data: dict[str, Any]) -> dict[str, Any]:
        image_shape = get_shape(data["image"])
        data = self.remove_label_fields_from_data(data)

        for data_name in set(self.data_fields) & set(data.keys()):
            data[data_name] = self.filter(data[data_name], image_shape)
            data[data_name] = self.check_and_convert(data[data_name], image_shape, direction="from")
            # Convert back to list of lists if original input was a list
            if self.is_sequence_input.get(data_name, False):
                data[data_name] = data[data_name].tolist()
        return data

    def preprocess(self, data: dict[str, Any]) -> None:
        image_shape = get_shape(data["image"])

        for data_name in set(self.data_fields) & set(data.keys()):  # Convert list of lists to numpy array if necessary
            if isinstance(data[data_name], Sequence):
                self.is_sequence_input[data_name] = True
                data[data_name] = np.array(data[data_name], dtype=np.float32)
            else:
                self.is_sequence_input[data_name] = False

        data = self.add_label_fields_to_data(data)

        for data_name in set(self.data_fields) & set(data.keys()):
            data[data_name] = self.check_and_convert(data[data_name], image_shape, direction="to")

    def check_and_convert(
        self,
        data: np.ndarray,
        image_shape: tuple[int, int],
        direction: Literal["to", "from"] = "to",
    ) -> np.ndarray:
        if self.params.format == "albumentations":
            self.check(data, image_shape)
            return data

        process_func = self.convert_to_albumentations if direction == "to" else self.convert_from_albumentations

        return process_func(data, image_shape)

    @abstractmethod
    def filter(self, data: np.ndarray, image_shape: tuple[int, int]) -> np.ndarray:
        pass

    @abstractmethod
    def check(self, data: np.ndarray, image_shape: tuple[int, int]) -> None:
        pass

    @abstractmethod
    def convert_to_albumentations(
        self,
        data: np.ndarray,
        image_shape: tuple[int, int],
    ) -> np.ndarray:
        pass

    @abstractmethod
    def convert_from_albumentations(
        self,
        data: np.ndarray,
        image_shape: tuple[int, int],
    ) -> np.ndarray:
        pass

    def add_label_fields_to_data(self, data: dict[str, Any]) -> dict[str, Any]:
        if not self.params.label_fields:
            return data

        for data_name in set(self.data_fields) & set(data.keys()):
            data_array = data[data_name]
            if not data_array.size:
                continue
            for label_field in self.params.label_fields:
                if len(data[data_name]) != len(data[label_field]):
                    raise ValueError(
                        f"The lengths of {data_name} and {label_field} do not match. Got {len(data[data_name])} "
                        f"and {len(data[label_field])} respectively.",
                    )

                # Encode labels
                encoder = LabelEncoder()
                encoded_labels = encoder.fit_transform(data[label_field])
                self.label_encoders[data_name][label_field] = encoder

                # Attach encoded labels as extra columns
                encoded_labels = encoded_labels.reshape(-1, 1)

                data_array = np.hstack((data_array, encoded_labels))

            data[data_name] = data_array
        return data

    def remove_label_fields_from_data(self, data: dict[str, Any]) -> dict[str, Any]:
        if not self.params.label_fields:
            return data

        for data_name in set(self.data_fields) & set(data.keys()):
            data_array = data[data_name]
            if not data_array.size:
                continue

            num_label_fields = len(self.params.label_fields)
            non_label_columns = data_array.shape[1] - num_label_fields

            for idx, label_field in enumerate(self.params.label_fields):
                encoded_labels = data_array[:, non_label_columns + idx]
                encoder = self.label_encoders.get(data_name, {}).get(label_field)
                if encoder:
                    decoded_labels = encoder.inverse_transform(encoded_labels.astype(int))
                    data[label_field] = decoded_labels.tolist()
                else:
                    raise ValueError(f"Label encoder for {label_field} not found")

            # Remove label columns from data
            data[data_name] = data_array[:, :non_label_columns]
        return data
add_targets (self, additional_targets)

Add targets to transform them the same way as one of existing targets.

Source code in albumentations/core/utils.py
Python
def add_targets(self, additional_targets: dict[str, str]) -> None:
    """Add targets to transform them the same way as one of existing targets."""
    for k, v in additional_targets.items():
        if v == self.default_data_name and k not in self.data_fields:
            self.data_fields.append(k)

def to_tuple (param, low=None, bias=None) [view source on GitHub]

Convert input argument to a min-max tuple.

Parameters:

Name Type Description
param ScaleType

Input value which could be a scalar or a sequence of exactly 2 scalars.

low ScaleType | None

Second element of the tuple, provided as an optional argument for when param is a scalar.

bias ScalarType | None

An offset added to both elements of the tuple.

Returns:

Type Description
tuple[int, int] | tuple[float, float]

A tuple of two scalars, optionally adjusted by bias. Raises ValueError for invalid combinations or types of arguments.

Source code in albumentations/core/utils.py
Python
def to_tuple(
    param: ScaleType,
    low: ScaleType | None = None,
    bias: ScalarType | None = None,
) -> tuple[int, int] | tuple[float, float]:
    """Convert input argument to a min-max tuple.

    Args:
        param: Input value which could be a scalar or a sequence of exactly 2 scalars.
        low: Second element of the tuple, provided as an optional argument for when `param` is a scalar.
        bias: An offset added to both elements of the tuple.

    Returns:
        A tuple of two scalars, optionally adjusted by `bias`.
        Raises ValueError for invalid combinations or types of arguments.

    """
    # Validate mutually exclusive arguments
    if low is not None and bias is not None:
        msg = "Arguments 'low' and 'bias' cannot be used together."
        raise ValueError(msg)

    if isinstance(param, Sequence) and len(param) == PAIR:
        min_val, max_val = min(param), max(param)

    # Handle scalar input
    elif isinstance(param, (int, float)):
        if isinstance(low, (int, float)):
            # Use low and param to create a tuple
            min_val, max_val = (low, param) if low < param else (param, low)
        else:
            # Create a symmetric tuple around 0
            min_val, max_val = -param, param
    else:
        msg = "Argument 'param' must be either a scalar or a sequence of 2 elements."
        raise ValueError(msg)

    # Apply bias if provided
    if bias is not None:
        return (bias + min_val, bias + max_val)

    return min_val, max_val
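
A few illustrative calls, matching the branches above:

Python
from albumentations.core.utils import to_tuple

to_tuple(10)              # (-10, 10): a scalar becomes a symmetric range around 0
to_tuple((7, 3))          # (3, 7): a 2-element sequence is ordered into (min, max)
to_tuple(0.5, low=0.1)    # (0.1, 0.5): `low` supplies the second endpoint
to_tuple((1, 4), bias=1)  # (2, 5): `bias` shifts both endpoints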

validation

class ValidatedTransformMeta [view source on GitHub]


Source code in albumentations/core/validation.py
Python
class ValidatedTransformMeta(type):
    def __new__(cls: type[Any], name: str, bases: tuple[type, ...], dct: dict[str, Any]) -> type[Any]:
        if "InitSchema" in dct and issubclass(dct["InitSchema"], BaseModel):
            original_init: Callable[..., Any] | None = dct.get("__init__")
            if original_init is None:
                msg = "__init__ not found in class definition"
                raise ValueError(msg)

            original_sig = signature(original_init)

            def custom_init(self: Any, *args: Any, **kwargs: Any) -> None:
                init_params = signature(original_init).parameters
                param_names = list(init_params.keys())[1:]  # Exclude 'self'
                full_kwargs: dict[str, Any] = dict(zip(param_names, args))
                full_kwargs.update(kwargs)

                for parameter_name, parameter in init_params.items():
                    if (
                        parameter_name != "self"
                        and parameter_name not in full_kwargs
                        and parameter.default is not Parameter.empty
                    ):
                        full_kwargs[parameter_name] = parameter.default

                # No try-except block needed as we want the exception to propagate naturally
                config = dct["InitSchema"](**full_kwargs)

                validated_kwargs = config.model_dump()
                for name_arg in kwargs:
                    if name_arg not in validated_kwargs:
                        warn(
                            f"Argument '{name_arg}' is not valid and will be ignored.",
                            stacklevel=2,
                        )

                original_init(self, **validated_kwargs)

            # Preserve the original signature and docstring
            custom_init.__signature__ = original_sig  # type: ignore[attr-defined]
            custom_init.__doc__ = original_init.__doc__

            # Rename __init__ to custom_init to avoid the N807 warning
            dct["__init__"] = custom_init

        return super().__new__(cls, name, bases, dct)
__new__ (cls, name, bases, dct) special staticmethod

Create and return a new object. See help(type) for accurate signature.

Source code in albumentations/core/validation.py
Python
def __new__(cls: type[Any], name: str, bases: tuple[type, ...], dct: dict[str, Any]) -> type[Any]:
    if "InitSchema" in dct and issubclass(dct["InitSchema"], BaseModel):
        original_init: Callable[..., Any] | None = dct.get("__init__")
        if original_init is None:
            msg = "__init__ not found in class definition"
            raise ValueError(msg)

        original_sig = signature(original_init)

        def custom_init(self: Any, *args: Any, **kwargs: Any) -> None:
            init_params = signature(original_init).parameters
            param_names = list(init_params.keys())[1:]  # Exclude 'self'
            full_kwargs: dict[str, Any] = dict(zip(param_names, args))
            full_kwargs.update(kwargs)

            for parameter_name, parameter in init_params.items():
                if (
                    parameter_name != "self"
                    and parameter_name not in full_kwargs
                    and parameter.default is not Parameter.empty
                ):
                    full_kwargs[parameter_name] = parameter.default

            # No try-except block needed as we want the exception to propagate naturally
            config = dct["InitSchema"](**full_kwargs)

            validated_kwargs = config.model_dump()
            for name_arg in kwargs:
                if name_arg not in validated_kwargs:
                    warn(
                        f"Argument '{name_arg}' is not valid and will be ignored.",
                        stacklevel=2,
                    )

            original_init(self, **validated_kwargs)

        # Preserve the original signature and docstring
        custom_init.__signature__ = original_sig  # type: ignore[attr-defined]
        custom_init.__doc__ = original_init.__doc__

        # Rename __init__ to custom_init to avoid the N807 warning
        dct["__init__"] = custom_init

    return super().__new__(cls, name, bases, dct)
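
A hedged sketch of the effect of the metaclass for a transform that declares both an InitSchema and an __init__ (both are required for validation to activate): unknown keyword arguments trigger a warning and are dropped instead of raising a TypeError. The Identity transform below is hypothetical and exists only to illustrate the mechanism.

Python
import warnings
import numpy as np
from albumentations.core.transforms_interface import BaseTransformInitSchema, ImageOnlyTransform

class Identity(ImageOnlyTransform):
    """Hypothetical transform used only to illustrate InitSchema-based validation."""

    class InitSchema(BaseTransformInitSchema):
        pass

    def __init__(self, p: float = 0.5, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)

    def apply(self, img: np.ndarray, **params) -> np.ndarray:
        return img

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    aug = Identity(p=0.5, not_a_real_arg=42)  # "not_a_real_arg" is warned about and ignored

assert aug.p == 0.5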

pytorch special

transforms

class ToTensorV2 (transpose_mask=False, p=1.0, always_apply=None) [view source on GitHub]

Converts images/masks to PyTorch Tensors, inheriting from BasicTransform. Supports images in numpy HWC format and converts them to PyTorch CHW format. If the image is in HW format, it will be converted to PyTorch HW.

Attributes:

Name Type Description
transpose_mask bool

If True, transposes 3D input mask dimensions from [height, width, num_channels] to [num_channels, height, width].

always_apply bool

Deprecated. Default: None.

p float

Probability of applying the transform. Default: 1.0.


Source code in albumentations/pytorch/transforms.py
Python
class ToTensorV2(BasicTransform):
    """Converts images/masks to PyTorch Tensors, inheriting from BasicTransform. Supports images in numpy `HWC` format
    and converts them to PyTorch `CHW` format. If the image is in `HW` format, it will be converted to PyTorch `HW`.

    Attributes:
        transpose_mask (bool): If True, transposes 3D input mask dimensions from `[height, width, num_channels]` to
            `[num_channels, height, width]`.
        always_apply (bool): Deprecated. Default: None.
        p (float): Probability of applying the transform. Default: 1.0.

    """

    _targets = (Targets.IMAGE, Targets.MASK)

    def __init__(self, transpose_mask: bool = False, p: float = 1.0, always_apply: bool | None = None):
        super().__init__(p=p, always_apply=always_apply)
        self.transpose_mask = transpose_mask

    @property
    def targets(self) -> dict[str, Any]:
        return {"image": self.apply, "mask": self.apply_to_mask, "masks": self.apply_to_masks}

    def apply(self, img: np.ndarray, **params: Any) -> torch.Tensor:
        if len(img.shape) not in [2, 3]:
            msg = "Albumentations only supports images in HW or HWC format"
            raise ValueError(msg)

        if len(img.shape) == MONO_CHANNEL_DIMENSIONS:
            img = np.expand_dims(img, 2)

        return torch.from_numpy(img.transpose(2, 0, 1))

    def apply_to_mask(self, mask: np.ndarray, **params: Any) -> torch.Tensor:
        if self.transpose_mask and mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS:
            mask = mask.transpose(2, 0, 1)
        return torch.from_numpy(mask)

    def apply_to_masks(self, masks: list[np.ndarray], **params: Any) -> list[torch.Tensor]:
        return [self.apply_to_mask(mask, **params) for mask in masks]

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("transpose_mask",)
__init__ (self, transpose_mask=False, p=1.0, always_apply=None) special

Initialize self. See help(type(self)) for accurate signature.

Source code in albumentations/pytorch/transforms.py
Python
def __init__(self, transpose_mask: bool = False, p: float = 1.0, always_apply: bool | None = None):
    super().__init__(p=p, always_apply=always_apply)
    self.transpose_mask = transpose_mask
apply (self, img, **params)

Apply transform on image.

Source code in albumentations/pytorch/transforms.py
Python
def apply(self, img: np.ndarray, **params: Any) -> torch.Tensor:
    if len(img.shape) not in [2, 3]:
        msg = "Albumentations only supports images in HW or HWC format"
        raise ValueError(msg)

    if len(img.shape) == MONO_CHANNEL_DIMENSIONS:
        img = np.expand_dims(img, 2)

    return torch.from_numpy(img.transpose(2, 0, 1))
get_transform_init_args_names (self)

Returns names of arguments that are used in the __init__ method of the transform.

Source code in albumentations/pytorch/transforms.py
Python
def get_transform_init_args_names(self) -> tuple[str, ...]:
    return ("transpose_mask",)

random_utils

def shuffle (a, random_state=None) [view source on GitHub]

Shuffles an array in-place, using a specified random state or creating a new one if not provided.

Parameters:

Name Type Description
a np.ndarray

The array to be shuffled.

random_state Optional[np.random.RandomState]

The random state used for shuffling. Defaults to None.

Returns:

Type Description
np.ndarray

The shuffled array (note: the shuffle is in-place, so the original array is modified).

Source code in albumentations/random_utils.py
Python
def shuffle(
    a: np.ndarray,
    random_state: np.random.RandomState | None = None,
) -> np.ndarray:
    """Shuffles an array in-place, using a specified random state or creating a new one if not provided.

    Args:
        a (np.ndarray): The array to be shuffled.
        random_state (Optional[np.random.RandomState], optional): The random state used for shuffling. Defaults to None.

    Returns:
        np.ndarray: The shuffled array (note: the shuffle is in-place, so the original array is modified).
    """
    if random_state is None:
        random_state = get_random_state()
    random_state.shuffle(a)
    return a
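
A short usage sketch; note the in-place behaviour.

Python
import numpy as np
import albumentations.random_utils as random_utils

arr = np.arange(5)
shuffled = random_utils.shuffle(arr, random_state=np.random.RandomState(42))
assert shuffled is arr  # the same array object, shuffled in place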