Full API Reference on a single page¶
Pixel-level transforms¶
Here is a list of all available pixel-level transforms. You can apply a pixel-level transform to any target; under the hood, the transform changes only the input image and returns all other targets, such as masks, bounding boxes, or keypoints, unchanged.
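For example, here is a minimal sketch (using A.Blur as a representative pixel-level transform; the values are illustrative) showing that non-image targets pass through unchanged:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> transform = A.Compose([A.Blur(p=1.0)])
>>> result = transform(image=image, mask=mask)
>>> blurred_image = result["image"]  # only the image is modified
>>> assert np.array_equal(result["mask"], mask)  # the mask is returned unchanged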
- AdditiveNoise
- AdvancedBlur
- AutoContrast
- Blur
- CLAHE
- ChannelDropout
- ChannelShuffle
- ChromaticAberration
- ColorJitter
- Defocus
- Downscale
- Emboss
- Equalize
- FDA
- FancyPCA
- FromFloat
- GaussianBlur
- GlassBlur
- HistogramMatching
- HueSaturationValue
- ISONoise
- Illumination
- ImageCompression
- InvertImg
- MedianBlur
- MotionBlur
- MultiplicativeNoise
- Normalize
- PixelDistributionAdaptation
- PlanckianJitter
- PlasmaBrightnessContrast
- PlasmaShadow
- Posterize
- RGBShift
- RandomBrightnessContrast
- RandomFog
- RandomGamma
- RandomGravel
- RandomRain
- RandomShadow
- RandomSnow
- RandomSunFlare
- RandomToneCurve
- RingingOvershoot
- SaltAndPepper
- Sharpen
- ShotNoise
- Solarize
- Spatter
- Superpixels
- TemplateTransform
- TextImage
- ToFloat
- ToGray
- ToRGB
- ToSepia
- UnsharpMask
- ZoomBlur
Spatial-level transforms¶
Here is a table with spatial-level transforms and the targets they support. If you try to apply a spatial-level transform to an unsupported target, Albumentations will raise an error.
augmentations.blur.functional¶
def create_motion_kernel (kernel_size, angle, direction, allow_shifted, random_state)
[view source on GitHub]¶
Create a motion blur kernel.
Parameters:
Name | Type | Description |
---|---|---|
kernel_size | int | Size of the kernel (must be odd) |
angle | float | Angle in degrees (counter-clockwise) |
direction | float | Blur direction (-1.0 to 1.0) |
allow_shifted | bool | Allow kernel to be randomly shifted from center |
random_state | Random | Python's random.Random instance |
Returns:
Type | Description |
---|---|
np.ndarray | Motion blur kernel |
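A small usage sketch (illustrative values; the import path follows the source file below):
>>> import random
>>> import numpy as np
>>> from albumentations.augmentations.blur.functional import create_motion_kernel
>>> kernel = create_motion_kernel(7, angle=0.0, direction=0.0, allow_shifted=False, random_state=random.Random(0))
>>> kernel = kernel.astype(np.float32) / np.sum(kernel)  # normalize before convolving, as MotionBlur does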
Source code in albumentations/augmentations/blur/functional.py
def create_motion_kernel(
kernel_size: int,
angle: float,
direction: float,
allow_shifted: bool,
random_state: Random,
) -> np.ndarray:
"""Create a motion blur kernel.
Args:
kernel_size: Size of the kernel (must be odd)
angle: Angle in degrees (counter-clockwise)
direction: Blur direction (-1.0 to 1.0)
allow_shifted: Allow kernel to be randomly shifted from center
random_state: Python's random.Random instance
Returns:
Motion blur kernel
"""
kernel = np.zeros((kernel_size, kernel_size), dtype=np.float32)
center = kernel_size // 2
# Convert angle to radians
angle_rad = np.deg2rad(angle)
# Calculate direction vector
dx = np.cos(angle_rad)
dy = np.sin(angle_rad)
# Create line points with direction bias
line_length = kernel_size // 2
t = np.linspace(-line_length, line_length, kernel_size * 2)
# Apply direction bias
if direction != 0:
t = t * (1 + abs(direction))
if direction < 0:
t = t * -1
# Generate line coordinates
x = center + dx * t
y = center + dy * t
# Apply random shift if allowed
if allow_shifted and random_state is not None:
shift_x = random_state.uniform(-1, 1) * line_length / 2
shift_y = random_state.uniform(-1, 1) * line_length / 2
x += shift_x
y += shift_y
# Round coordinates and clip to kernel bounds
x = np.clip(np.round(x), 0, kernel_size - 1).astype(int)
y = np.clip(np.round(y), 0, kernel_size - 1).astype(int)
# Keep only unique points to avoid multiple assignments
points = np.unique(np.column_stack([y, x]), axis=0)
kernel[points[:, 0], points[:, 1]] = 1
# Ensure at least one point is set
if not kernel.any():
kernel[center, center] = 1
return kernel
def process_blur_limit (value, info, min_value=0)
[view source on GitHub]¶
Process blur limit to ensure valid kernel sizes.
Source code in albumentations/augmentations/blur/functional.py
def process_blur_limit(value: ScaleIntType, info: ValidationInfo, min_value: int = 0) -> tuple[int, int]:
"""Process blur limit to ensure valid kernel sizes."""
result = value if isinstance(value, Sequence) else (min_value, value)
result = _ensure_min_value(result, min_value, info.field_name)
result = _ensure_odd_values(result, info.field_name)
if result[0] > result[1]:
final_result = (result[1], result[1])
warn(
f"{info.field_name}: Invalid range {result} (min > max). "
f"Range automatically adjusted to {final_result}.",
UserWarning,
stacklevel=2,
)
return final_result
return result
def sample_odd_from_range (random_state, low, high)
[view source on GitHub]¶
Sample an odd number from the range [low, high] (inclusive).
Parameters:
Name | Type | Description |
---|---|---|
random_state | Random | instance of random.Random |
low | int | lower bound (will be converted to nearest valid odd number) |
high | int | upper bound (will be converted to nearest valid odd number) |
Returns:
Type | Description |
---|---|
int | Randomly sampled odd number from the range |
Note
- Input values will be converted to the nearest valid odd numbers:
  - Values less than 3 become 3
  - Even values are rounded up to the next odd number
- After normalization, high must be >= low
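A brief sketch of the sampling behavior (illustrative values):
>>> import random
>>> from albumentations.augmentations.blur.functional import sample_odd_from_range
>>> rng = random.Random(0)
>>> ksize = sample_odd_from_range(rng, 3, 9)  # one of 3, 5, 7, 9
>>> ksize = sample_odd_from_range(rng, 2, 6)  # bounds normalized to odd values first: one of 3, 5, 7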
Source code in albumentations/augmentations/blur/functional.py
def sample_odd_from_range(random_state: Random, low: int, high: int) -> int:
"""Sample an odd number from the range [low, high] (inclusive).
Args:
random_state: instance of random.Random
low: lower bound (will be converted to nearest valid odd number)
high: upper bound (will be converted to nearest valid odd number)
Returns:
Randomly sampled odd number from the range
Note:
- Input values will be converted to nearest valid odd numbers:
* Values less than 3 will become 3
* Even values will be rounded up to next odd number
- After normalization, high must be >= low
"""
# Normalize low value
low = max(3, low + (low % 2 == 0))
# Normalize high value
high = max(3, high + (high % 2 == 0))
# Ensure high >= low after normalization
high = max(high, low)
if low == high:
return low
# Calculate number of possible odd values
num_odd_values = (high - low) // 2 + 1
# Generate random index and convert to corresponding odd number
rand_idx = random_state.randint(0, num_odd_values - 1)
return low + (2 * rand_idx)
augmentations.blur.transforms¶
class AdvancedBlur
(blur_limit=(3, 7), sigma_x_limit=(0.2, 1.0), sigma_y_limit=(0.2, 1.0), sigmaX_limit=None, sigmaY_limit=None, rotate_limit=(-90, 90), beta_limit=(0.5, 8.0), noise_limit=(0.9, 1.1), always_apply=None, p=0.5)
[view source on GitHub] ¶
Applies a Generalized Gaussian blur to the input image with randomized parameters for advanced data augmentation.
This transform creates a custom blur kernel based on the Generalized Gaussian distribution, which allows for a wide range of blur effects beyond standard Gaussian blur. It then applies this kernel to the input image through convolution. The transform also incorporates noise into the kernel, resulting in a unique combination of blurring and noise injection.
Key features of this augmentation:
- Generalized Gaussian Kernel: Uses a generalized normal distribution to create kernels that can range from box-like blurs to very peaked blurs, controlled by the beta parameter.
- Anisotropic Blurring: Allows for different blur strengths in the horizontal and vertical directions (controlled by sigma_x and sigma_y), and rotation of the kernel.
- Kernel Noise: Adds multiplicative noise to the kernel before applying it to the image, creating more diverse and realistic blur effects.
Implementation Details: The kernel is generated using a 2D Generalized Gaussian function. The process involves:
1. Creating a 2D grid based on the kernel size
2. Applying rotation to this grid
3. Calculating the kernel values using the Generalized Gaussian formula
4. Adding multiplicative noise to the kernel
5. Normalizing the kernel
The resulting kernel is then applied to the image using convolution.
Parameters:
Name | Type | Description |
---|---|---|
blur_limit | tuple[int, int] | int | Controls the size of the blur kernel. If a single int is provided, the kernel size will be randomly chosen between 3 and that value. Must be odd and ≥ 3. Larger values create stronger blur effects. Default: (3, 7) |
sigma_x_limit | tuple[float, float] | float | Controls the spread of the blur in the x direction. Higher values increase blur strength. If a single float is provided, the range will be (0, limit). Default: (0.2, 1.0) |
sigma_y_limit | tuple[float, float] | float | Controls the spread of the blur in the y direction. Higher values increase blur strength. If a single float is provided, the range will be (0, limit). Default: (0.2, 1.0) |
rotate_limit | tuple[int, int] | int | Range of angles (in degrees) for rotating the kernel. This rotation allows for diagonal blur directions. If limit is a single int, an angle is picked from (-rotate_limit, rotate_limit). Default: (-90, 90) |
beta_limit | tuple[float, float] | float | Shape parameter of the Generalized Gaussian distribution. - beta = 1 gives a standard Gaussian distribution - beta < 1 creates heavier tails, resulting in more uniform, box-like blur - beta > 1 creates lighter tails, resulting in more peaked, focused blur Default: (0.5, 8.0) |
noise_limit | tuple[float, float] | float | Controls the strength of multiplicative noise applied to the kernel. Values around 1.0 keep the original kernel mostly intact, while values further from 1.0 introduce more variation. Default: (0.9, 1.1) |
p | float | Probability of applying the transform. Default: 0.5 |
Notes
- This transform is particularly useful for simulating complex, real-world blur effects that go beyond simple Gaussian blur.
- The combination of blur and noise can help in creating more robust models by simulating a wider range of image degradations.
- Extreme values, especially for beta and noise, may result in unrealistic effects and should be used cautiously.
Reference
This transform is inspired by techniques described in: "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data" https://arxiv.org/abs/2107.10833
Targets
image
Image types: uint8, float32
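Example (a minimal sketch; parameter values are illustrative):
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.AdvancedBlur(blur_limit=(3, 7), beta_limit=(0.5, 8.0), p=1.0)
>>> blurred_image = transform(image=image)["image"]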
Source code in albumentations/augmentations/blur/transforms.py
class AdvancedBlur(ImageOnlyTransform):
"""Applies a Generalized Gaussian blur to the input image with randomized parameters for advanced data augmentation.
This transform creates a custom blur kernel based on the Generalized Gaussian distribution,
which allows for a wide range of blur effects beyond standard Gaussian blur. It then applies
this kernel to the input image through convolution. The transform also incorporates noise
into the kernel, resulting in a unique combination of blurring and noise injection.
Key features of this augmentation:
1. Generalized Gaussian Kernel: Uses a generalized normal distribution to create kernels
that can range from box-like blurs to very peaked blurs, controlled by the beta parameter.
2. Anisotropic Blurring: Allows for different blur strengths in horizontal and vertical
directions (controlled by sigma_x and sigma_y), and rotation of the kernel.
3. Kernel Noise: Adds multiplicative noise to the kernel before applying it to the image,
creating more diverse and realistic blur effects.
Implementation Details:
The kernel is generated using a 2D Generalized Gaussian function. The process involves:
1. Creating a 2D grid based on the kernel size
2. Applying rotation to this grid
3. Calculating the kernel values using the Generalized Gaussian formula
4. Adding multiplicative noise to the kernel
5. Normalizing the kernel
The resulting kernel is then applied to the image using convolution.
Args:
blur_limit (tuple[int, int] | int, optional): Controls the size of the blur kernel. If a single int
is provided, the kernel size will be randomly chosen between 3 and that value.
Must be odd and ≥ 3. Larger values create stronger blur effects.
Default: (3, 7)
sigma_x_limit (tuple[float, float] | float): Controls the spread of the blur in the x direction.
Higher values increase blur strength.
If a single float is provided, the range will be (0, limit).
Default: (0.2, 1.0)
sigma_y_limit (tuple[float, float] | float): Controls the spread of the blur in the y direction.
Higher values increase blur strength.
If a single float is provided, the range will be (0, limit).
Default: (0.2, 1.0)
rotate_limit (tuple[int, int] | int): Range of angles (in degrees) for rotating the kernel.
This rotation allows for diagonal blur directions. If limit is a single int, an angle is picked
from (-rotate_limit, rotate_limit).
Default: (-90, 90)
beta_limit (tuple[float, float] | float): Shape parameter of the Generalized Gaussian distribution.
- beta = 1 gives a standard Gaussian distribution
- beta < 1 creates heavier tails, resulting in more uniform, box-like blur
- beta > 1 creates lighter tails, resulting in more peaked, focused blur
Default: (0.5, 8.0)
noise_limit (tuple[float, float] | float): Controls the strength of multiplicative noise
applied to the kernel. Values around 1.0 keep the original kernel mostly intact,
while values further from 1.0 introduce more variation.
Default: (0.9, 1.1)
p (float): Probability of applying the transform. Default: 0.5
Notes:
- This transform is particularly useful for simulating complex, real-world blur effects
that go beyond simple Gaussian blur.
- The combination of blur and noise can help in creating more robust models by simulating
a wider range of image degradations.
- Extreme values, especially for beta and noise, may result in unrealistic effects and
should be used cautiously.
Reference:
This transform is inspired by techniques described in:
"Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data"
https://arxiv.org/abs/2107.10833
Targets:
image
Image types:
uint8, float32
"""
class InitSchema(BlurInitSchema):
sigma_x_limit: NonNegativeFloatRangeType
sigma_y_limit: NonNegativeFloatRangeType
beta_limit: NonNegativeFloatRangeType
noise_limit: NonNegativeFloatRangeType
rotate_limit: SymmetricRangeType
@field_validator("beta_limit")
@classmethod
def check_beta_limit(cls, value: ScaleFloatType) -> tuple[float, float]:
result = to_tuple(value, low=0)
if not (result[0] < 1.0 < result[1]):
msg = "beta_limit is expected to include 1.0."
raise ValueError(msg)
return result
@model_validator(mode="after")
def validate_limits(self) -> Self:
if (
isinstance(self.sigma_x_limit, (tuple, list))
and self.sigma_x_limit[0] == 0
and isinstance(self.sigma_y_limit, (tuple, list))
and self.sigma_y_limit[0] == 0
):
msg = "sigma_x_limit and sigma_y_limit minimum value cannot be both equal to 0."
raise ValueError(msg)
return self
def __init__(
self,
blur_limit: ScaleIntType = (3, 7),
sigma_x_limit: ScaleFloatType = (0.2, 1.0),
sigma_y_limit: ScaleFloatType = (0.2, 1.0),
sigmaX_limit: ScaleFloatType | None = None, # noqa: N803
sigmaY_limit: ScaleFloatType | None = None, # noqa: N803
rotate_limit: ScaleIntType = (-90, 90),
beta_limit: ScaleFloatType = (0.5, 8.0),
noise_limit: ScaleFloatType = (0.9, 1.1),
always_apply: bool | None = None,
p: float = 0.5,
):
super().__init__(p=p, always_apply=always_apply)
if sigmaX_limit is not None:
warnings.warn("sigmaX_limit is deprecated; use sigma_x_limit instead.", DeprecationWarning, stacklevel=2)
sigma_x_limit = sigmaX_limit
if sigmaY_limit is not None:
warnings.warn("sigmaY_limit is deprecated; use sigma_y_limit instead.", DeprecationWarning, stacklevel=2)
sigma_y_limit = sigmaY_limit
self.blur_limit = cast(tuple[int, int], blur_limit)
self.sigma_x_limit = cast(tuple[float, float], sigma_x_limit)
self.sigma_y_limit = cast(tuple[float, float], sigma_y_limit)
self.rotate_limit = cast(tuple[int, int], rotate_limit)
self.beta_limit = cast(tuple[float, float], beta_limit)
self.noise_limit = cast(tuple[float, float], noise_limit)
def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
return fmain.convolve(img, kernel=kernel)
def get_params(self) -> dict[str, np.ndarray]:
ksize = fblur.sample_odd_from_range(self.py_random, self.blur_limit[0], self.blur_limit[1])
sigma_x = self.py_random.uniform(*self.sigma_x_limit)
sigma_y = self.py_random.uniform(*self.sigma_y_limit)
angle = np.deg2rad(self.py_random.uniform(*self.rotate_limit))
# Split into 2 cases to avoid selection of narrow kernels (beta > 1) too often.
beta = (
self.py_random.uniform(self.beta_limit[0], 1)
if self.py_random.random() < HALF
else self.py_random.uniform(1, self.beta_limit[1])
)
noise_matrix = self.random_generator.uniform(*self.noise_limit, size=(ksize, ksize))
# Generate mesh grid centered at zero.
ax = np.arange(-ksize // 2 + 1.0, ksize // 2 + 1.0)
# > Shape (ksize, ksize, 2)
grid = np.stack(np.meshgrid(ax, ax), axis=-1)
# Calculate rotated sigma matrix
d_matrix = np.array([[sigma_x**2, 0], [0, sigma_y**2]])
u_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
sigma_matrix = np.dot(u_matrix, np.dot(d_matrix, u_matrix.T))
inverse_sigma = np.linalg.inv(sigma_matrix)
# Described in "Parameter Estimation For Multivariate Generalized Gaussian Distributions"
kernel = np.exp(-0.5 * np.power(np.sum(np.dot(grid, inverse_sigma) * grid, 2), beta))
# Add noise
kernel *= noise_matrix
# Normalize kernel
kernel = kernel.astype(np.float32) / np.sum(kernel)
return {"kernel": kernel}
def get_transform_init_args_names(self) -> tuple[str, str, str, str, str, str]:
return (
"blur_limit",
"sigma_x_limit",
"sigma_y_limit",
"rotate_limit",
"beta_limit",
"noise_limit",
)
class Blur
(blur_limit=(3, 7), p=0.5, always_apply=None)
[view source on GitHub] ¶
Apply uniform box blur to the input image using a randomly sized square kernel.
This transform uses OpenCV's cv2.blur function, which performs a simple box filter blur. The size of the blur kernel is randomly selected for each application, allowing for varying degrees of blur intensity.
Parameters:
Name | Type | Description |
---|---|---|
blur_limit | tuple[int, int] | int | Controls the range of the blur kernel size. - If a single int is provided, the kernel size will be randomly chosen between 3 and that value. - If a tuple of two ints is provided, it defines the inclusive range of possible kernel sizes. The kernel size must be odd and greater than or equal to 3. Larger kernel sizes produce stronger blur effects. Default: (3, 7) |
p | float | Probability of applying the transform. Default: 0.5 |
Notes
- The blur kernel is always square (same width and height).
- Only odd kernel sizes are used to ensure the blur has a clear center pixel.
- Box blur is faster than Gaussian blur but may produce less natural results.
- This blur method averages all pixels under the kernel area, which can reduce noise but also reduce image detail.
Targets
image
Image types: uint8, float32
Examples:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Blur(blur_limit=(3, 7), p=1.0)
>>> result = transform(image=image)
>>> blurred_image = result["image"]
Source code in albumentations/augmentations/blur/transforms.py
class Blur(ImageOnlyTransform):
"""Apply uniform box blur to the input image using a randomly sized square kernel.
This transform uses OpenCV's cv2.blur function, which performs a simple box filter blur.
The size of the blur kernel is randomly selected for each application, allowing for
varying degrees of blur intensity.
Args:
blur_limit (tuple[int, int] | int): Controls the range of the blur kernel size.
- If a single int is provided, the kernel size will be randomly chosen
between 3 and that value.
- If a tuple of two ints is provided, it defines the inclusive range
of possible kernel sizes.
The kernel size must be odd and greater than or equal to 3.
Larger kernel sizes produce stronger blur effects.
Default: (3, 7)
p (float): Probability of applying the transform. Default: 0.5
Notes:
- The blur kernel is always square (same width and height).
- Only odd kernel sizes are used to ensure the blur has a clear center pixel.
- Box blur is faster than Gaussian blur but may produce less natural results.
- This blur method averages all pixels under the kernel area, which can
reduce noise but also reduce image detail.
Targets:
image
Image types:
uint8, float32
Example:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Blur(blur_limit=(3, 7), p=1.0)
>>> result = transform(image=image)
>>> blurred_image = result["image"]
"""
class InitSchema(BlurInitSchema):
pass
def __init__(self, blur_limit: ScaleIntType = (3, 7), p: float = 0.5, always_apply: bool | None = None):
super().__init__(p=p, always_apply=always_apply)
self.blur_limit = cast(tuple[int, int], blur_limit)
def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
return fblur.blur(img, kernel)
def get_params(self) -> dict[str, Any]:
kernel = fblur.sample_odd_from_range(
self.py_random,
self.blur_limit[0],
self.blur_limit[1],
)
return {"kernel": kernel}
def get_transform_init_args_names(self) -> tuple[str, ...]:
return ("blur_limit",)
class BlurInitSchema
[view source on GitHub] ¶
class Defocus
(radius=(3, 10), alias_blur=(0.1, 0.5), always_apply=None, p=0.5)
[view source on GitHub] ¶
Apply defocus blur to the input image.
This transform simulates the effect of an out-of-focus camera by applying a defocus blur to the image. It uses a combination of disc kernels and Gaussian blur to create a realistic defocus effect.
Parameters:
Name | Type | Description |
---|---|---|
radius | tuple[int, int] | int | Range for the radius of the defocus blur. If a single int is provided, the range will be [1, radius]. Larger values create a stronger blur effect. Default: (3, 10) |
alias_blur | tuple[float, float] | float | Range for the standard deviation of the Gaussian blur applied after the main defocus blur. This helps to reduce aliasing artifacts. If a single float is provided, the range will be (0, alias_blur). Larger values create a smoother, more aliased effect. Default: (0.1, 0.5) |
p | float | Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5 |
Targets
image
Image types: uint8, float32
Note
- The defocus effect is created using a disc kernel, which simulates the shape of a camera's aperture.
- The additional Gaussian blur (alias_blur) helps to soften the edges of the disc kernel, creating a more natural-looking defocus effect.
- Larger radius values will create a stronger, more noticeable defocus effect.
- The alias_blur parameter can be used to fine-tune the appearance of the defocus, with larger values creating a smoother, potentially more realistic effect.
Examples:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Defocus(radius=(4, 8), alias_blur=(0.2, 0.4), always_apply=True)
>>> result = transform(image=image)
>>> defocused_image = result['image']
References
- Defocus aberration: https://en.wikipedia.org/wiki/Defocus_aberration
- Realistic defocus blur: https://www.researchgate.net/publication/261311609_Realistic_Defocus_Blur_for_Multiplane_Computer-Generated_Holography
Source code in albumentations/augmentations/blur/transforms.py
class Defocus(ImageOnlyTransform):
"""Apply defocus blur to the input image.
This transform simulates the effect of an out-of-focus camera by applying a defocus blur
to the image. It uses a combination of disc kernels and Gaussian blur to create a realistic
defocus effect.
Args:
radius (tuple[int, int] | int): Range for the radius of the defocus blur.
If a single int is provided, the range will be [1, radius].
Larger values create a stronger blur effect.
Default: (3, 10)
alias_blur (tuple[float, float] | float): Range for the standard deviation of the Gaussian blur
applied after the main defocus blur. This helps to reduce aliasing artifacts.
If a single float is provided, the range will be (0, alias_blur).
Larger values create a smoother, more aliased effect.
Default: (0.1, 0.5)
p (float): Probability of applying the transform. Should be in the range [0, 1].
Default: 0.5
Targets:
image
Image types:
uint8, float32
Note:
- The defocus effect is created using a disc kernel, which simulates the shape of a camera's aperture.
- The additional Gaussian blur (alias_blur) helps to soften the edges of the disc kernel, creating a
more natural-looking defocus effect.
- Larger radius values will create a stronger, more noticeable defocus effect.
- The alias_blur parameter can be used to fine-tune the appearance of the defocus, with larger values
creating a smoother, potentially more realistic effect.
Example:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Defocus(radius=(4, 8), alias_blur=(0.2, 0.4), always_apply=True)
>>> result = transform(image=image)
>>> defocused_image = result['image']
References:
- https://en.wikipedia.org/wiki/Defocus_aberration
- https://www.researchgate.net/publication/261311609_Realistic_Defocus_Blur_for_Multiplane_Computer-Generated_Holography
"""
class InitSchema(BaseTransformInitSchema):
radius: OnePlusIntRangeType
alias_blur: NonNegativeFloatRangeType
def __init__(
self,
radius: ScaleIntType = (3, 10),
alias_blur: ScaleFloatType = (0.1, 0.5),
always_apply: bool | None = None,
p: float = 0.5,
):
super().__init__(p=p, always_apply=always_apply)
self.radius = cast(tuple[int, int], radius)
self.alias_blur = cast(tuple[float, float], alias_blur)
def apply(self, img: np.ndarray, radius: int, alias_blur: float, **params: Any) -> np.ndarray:
return fblur.defocus(img, radius, alias_blur)
def get_params(self) -> dict[str, Any]:
return {
"radius": self.py_random.randint(*self.radius),
"alias_blur": self.py_random.uniform(*self.alias_blur),
}
def get_transform_init_args_names(self) -> tuple[str, str]:
return ("radius", "alias_blur")
class GaussianBlur
(blur_limit=(3, 7), sigma_limit=0, always_apply=None, p=0.5)
[view source on GitHub] ¶
Apply Gaussian blur to the input image using a randomly sized kernel.
This transform blurs the input image using a Gaussian filter with a random kernel size and sigma value. Gaussian blur is a widely used image processing technique that reduces image noise and detail, creating a smoothing effect.
Parameters:
Name | Type | Description |
---|---|---|
blur_limit | tuple[int, int] | int | Controls the range of the Gaussian kernel size. - If a single int is provided, the kernel size will be randomly chosen between 0 and that value. - If a tuple of two ints is provided, it defines the inclusive range of possible kernel sizes. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`. Larger kernel sizes produce stronger blur effects. Default: (3, 7) |
sigma_limit | tuple[float, float] | float | Range for the Gaussian kernel standard deviation (sigma). Must be in range [0, inf). - If a single float is provided, sigma will be randomly chosen between 0 and that value. - If a tuple of two floats is provided, it defines the inclusive range of possible sigma values. If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Larger sigma values produce stronger blur effects. Default: 0 |
p | float | Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5 |
Targets
image
Image types: uint8, float32
Number of channels: Any
Note
- The relationship between kernel size and sigma affects the blur strength: larger kernel sizes allow for stronger blurring effects.
- When both blur_limit and sigma_limit are set to ranges starting from 0, the blur_limit minimum is automatically set to 3 to ensure a valid kernel size.
- For uint8 images, the computation might be faster than for floating-point images.
Examples:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.GaussianBlur(blur_limit=(3, 7), sigma_limit=(0.1, 2), p=1)
>>> result = transform(image=image)
>>> blurred_image = result["image"]
Source code in albumentations/augmentations/blur/transforms.py
class GaussianBlur(ImageOnlyTransform):
"""Apply Gaussian blur to the input image using a randomly sized kernel.
This transform blurs the input image using a Gaussian filter with a random kernel size
and sigma value. Gaussian blur is a widely used image processing technique that reduces
image noise and detail, creating a smoothing effect.
Args:
blur_limit (tuple[int, int] | int): Controls the range of the Gaussian kernel size.
- If a single int is provided, the kernel size will be randomly chosen
between 0 and that value.
- If a tuple of two ints is provided, it defines the inclusive range
of possible kernel sizes.
Must be zero or odd and in range [0, inf). If set to 0, it will be computed
from sigma as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
Larger kernel sizes produce stronger blur effects.
Default: (3, 7)
sigma_limit (tuple[float, float] | float): Range for the Gaussian kernel standard
deviation (sigma). Must be in range [0, inf).
- If a single float is provided, sigma will be randomly chosen
between 0 and that value.
- If a tuple of two floats is provided, it defines the inclusive range
of possible sigma values.
If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`.
Larger sigma values produce stronger blur effects.
Default: 0
p (float): Probability of applying the transform. Should be in the range [0, 1].
Default: 0.5
Targets:
image
Image types:
uint8, float32
Number of channels:
Any
Note:
- The relationship between kernel size and sigma affects the blur strength:
larger kernel sizes allow for stronger blurring effects.
- When both blur_limit and sigma_limit are set to ranges starting from 0,
the blur_limit minimum is automatically set to 3 to ensure a valid kernel size.
- For uint8 images, the computation might be faster than for floating-point images.
Example:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.GaussianBlur(blur_limit=(3, 7), sigma_limit=(0.1, 2), p=1)
>>> result = transform(image=image)
>>> blurred_image = result["image"]
"""
class InitSchema(BlurInitSchema):
sigma_limit: NonNegativeFloatRangeType
@field_validator("blur_limit")
@classmethod
def process_blur(cls, value: ScaleIntType, info: ValidationInfo) -> tuple[int, int]:
return fblur.process_blur_limit(value, info, min_value=0)
@model_validator(mode="after")
def validate_limits(self) -> Self:
if (
isinstance(self.blur_limit, (tuple, list))
and self.blur_limit[0] == 0
and isinstance(self.sigma_limit, (tuple, list))
and self.sigma_limit[0] == 0
):
self.blur_limit = 3, max(3, self.blur_limit[1])
warnings.warn(
"blur_limit and sigma_limit minimum value can not be both equal to 0. "
"blur_limit minimum value changed to 3.",
stacklevel=2,
)
return self
def __init__(
self,
blur_limit: ScaleIntType = (3, 7),
sigma_limit: ScaleFloatType = 0,
always_apply: bool | None = None,
p: float = 0.5,
):
super().__init__(p, always_apply)
self.blur_limit = cast(tuple[int, int], blur_limit)
self.sigma_limit = cast(tuple[float, float], sigma_limit)
def apply(self, img: np.ndarray, ksize: int, sigma: float, **params: Any) -> np.ndarray:
return fblur.gaussian_blur(img, ksize, sigma=sigma)
def get_params(self) -> dict[str, float]:
ksize = fblur.sample_odd_from_range(
self.py_random,
self.blur_limit[0],
self.blur_limit[1],
)
return {"ksize": ksize, "sigma": self.py_random.uniform(*self.sigma_limit)}
def get_transform_init_args_names(self) -> tuple[str, ...]:
return "blur_limit", "sigma_limit"
class GlassBlur
(sigma=0.7, max_delta=4, iterations=2, mode='fast', always_apply=None, p=0.5)
[view source on GitHub] ¶
Apply a glass blur effect to the input image.
This transform simulates the effect of looking through textured glass by locally shuffling pixels in the image. It creates a distorted, frosted glass-like appearance.
Parameters:
Name | Type | Description |
---|---|---|
sigma | float | Standard deviation for the Gaussian kernel used in the process. Higher values increase the blur effect. Must be non-negative. Default: 0.7 |
max_delta | int | Maximum distance in pixels for shuffling. Determines how far pixels can be moved. Larger values create more distortion. Must be a positive integer. Default: 4 |
iterations | int | Number of times to apply the glass blur effect. More iterations create a stronger effect but increase computation time. Must be a positive integer. Default: 2 |
mode | Literal["fast", "exact"] | Mode of computation. Options are: - "fast": Uses a faster but potentially less accurate method. - "exact": Uses a slower but more precise method. Default: "fast" |
p | float | Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5 |
Targets
image
Image types: uint8, float32
Number of channels: Any
Note
- This transform is particularly effective for creating a 'looking through glass' effect or simulating the view through a frosted window.
- The 'fast' mode is recommended for most use cases as it provides a good balance between effect quality and computation speed.
- Increasing 'iterations' will strengthen the effect but also increase the processing time linearly.
Examples:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.GlassBlur(sigma=0.7, max_delta=4, iterations=3, mode="fast", p=1)
>>> result = transform(image=image)
>>> glass_blurred_image = result["image"]
References
- This implementation is based on the technique described in: "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness" https://arxiv.org/abs/1903.12261
- Original implementation: https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py
Source code in albumentations/augmentations/blur/transforms.py
class GlassBlur(ImageOnlyTransform):
"""Apply a glass blur effect to the input image.
This transform simulates the effect of looking through textured glass by locally
shuffling pixels in the image. It creates a distorted, frosted glass-like appearance.
Args:
sigma (float): Standard deviation for the Gaussian kernel used in the process.
Higher values increase the blur effect. Must be non-negative.
Default: 0.7
max_delta (int): Maximum distance in pixels for shuffling.
Determines how far pixels can be moved. Larger values create more distortion.
Must be a positive integer.
Default: 4
iterations (int): Number of times to apply the glass blur effect.
More iterations create a stronger effect but increase computation time.
Must be a positive integer.
Default: 2
mode (Literal["fast", "exact"]): Mode of computation. Options are:
- "fast": Uses a faster but potentially less accurate method.
- "exact": Uses a slower but more precise method.
Default: "fast"
p (float): Probability of applying the transform. Should be in the range [0, 1].
Default: 0.5
Targets:
image
Image types:
uint8, float32
Number of channels:
Any
Note:
- This transform is particularly effective for creating a 'looking through
glass' effect or simulating the view through a frosted window.
- The 'fast' mode is recommended for most use cases as it provides a good
balance between effect quality and computation speed.
- Increasing 'iterations' will strengthen the effect but also increase the
processing time linearly.
Example:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.GlassBlur(sigma=0.7, max_delta=4, iterations=3, mode="fast", p=1)
>>> result = transform(image=image)
>>> glass_blurred_image = result["image"]
References:
- This implementation is based on the technique described in:
"ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness"
https://arxiv.org/abs/1903.12261
- Original implementation:
https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py
"""
class InitSchema(BaseTransformInitSchema):
sigma: float = Field(ge=0)
max_delta: int = Field(ge=1)
iterations: int = Field(ge=1)
mode: Literal["fast", "exact"]
def __init__(
self,
sigma: float = 0.7,
max_delta: int = 4,
iterations: int = 2,
mode: Literal["fast", "exact"] = "fast",
always_apply: bool | None = None,
p: float = 0.5,
):
super().__init__(p=p, always_apply=always_apply)
self.sigma = sigma
self.max_delta = max_delta
self.iterations = iterations
self.mode = mode
def apply(self, img: np.ndarray, *args: Any, dxy: np.ndarray, **params: Any) -> np.ndarray:
return fblur.glass_blur(img, self.sigma, self.max_delta, self.iterations, dxy, self.mode)
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, np.ndarray]:
height, width = params["shape"][:2]
# generate array containing all necessary values for transformations
width_pixels = height - self.max_delta * 2
height_pixels = width - self.max_delta * 2
total_pixels = int(width_pixels * height_pixels)
dxy = self.random_generator.integers(-self.max_delta, self.max_delta, size=(total_pixels, self.iterations, 2))
return {"dxy": dxy}
def get_transform_init_args_names(self) -> tuple[str, str, str, str]:
return "sigma", "max_delta", "iterations", "mode"
class MedianBlur
(blur_limit=7, p=0.5, always_apply=None)
[view source on GitHub] ¶
Apply median blur to the input image.
This transform uses a median filter to blur the input image. Median filtering is particularly effective at removing salt-and-pepper noise while preserving edges, making it a popular choice for noise reduction in image processing.
Parameters:
Name | Type | Description |
---|---|---|
blur_limit | int | tuple[int, int] | Maximum aperture linear size for blurring the input image. Must be odd and in the range [3, inf). - If a single int is provided, the kernel size will be randomly chosen between 3 and that value. - If a tuple of two ints is provided, it defines the inclusive range of possible kernel sizes. Default: (3, 7) |
p | float | Probability of applying the transform. Default: 0.5 |
Targets
image
Image types: uint8, float32
Number of channels: Any
Note
- The kernel size (aperture linear size) must always be odd and greater than 1.
- Unlike mean blur or Gaussian blur, median blur uses the median of all pixels under the kernel area, making it more robust to outliers.
- This transform is particularly useful for:
- Removing salt-and-pepper noise
- Preserving edges while smoothing images
- Pre-processing images for edge detection algorithms
- For color images, the median is calculated independently for each channel.
- Larger kernel sizes result in stronger blurring effects but may also remove fine details from the image.
Examples:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.MedianBlur(blur_limit=(3, 7), p=0.5)
>>> result = transform(image=image)
>>> blurred_image = result["image"]
References
- Median filter: https://en.wikipedia.org/wiki/Median_filter
- OpenCV medianBlur: https://docs.opencv.org/master/d4/d86/group__imgproc__filter.html#ga564869aa33e58769b4469101aac458f9
Source code in albumentations/augmentations/blur/transforms.py
class MedianBlur(Blur):
"""Apply median blur to the input image.
This transform uses a median filter to blur the input image. Median filtering is particularly
effective at removing salt-and-pepper noise while preserving edges, making it a popular choice
for noise reduction in image processing.
Args:
blur_limit (int | tuple[int, int]): Maximum aperture linear size for blurring the input image.
Must be odd and in the range [3, inf).
- If a single int is provided, the kernel size will be randomly chosen
between 3 and that value.
- If a tuple of two ints is provided, it defines the inclusive range
of possible kernel sizes.
Default: (3, 7)
p (float): Probability of applying the transform. Default: 0.5
Targets:
image
Image types:
uint8, float32
Number of channels:
Any
Note:
- The kernel size (aperture linear size) must always be odd and greater than 1.
- Unlike mean blur or Gaussian blur, median blur uses the median of all pixels under
the kernel area, making it more robust to outliers.
- This transform is particularly useful for:
* Removing salt-and-pepper noise
* Preserving edges while smoothing images
* Pre-processing images for edge detection algorithms
- For color images, the median is calculated independently for each channel.
- Larger kernel sizes result in stronger blurring effects but may also remove
fine details from the image.
Example:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.MedianBlur(blur_limit=(3, 7), p=0.5)
>>> result = transform(image=image)
>>> blurred_image = result["image"]
References:
- Median filter: https://en.wikipedia.org/wiki/Median_filter
- OpenCV medianBlur: https://docs.opencv.org/master/d4/d86/group__imgproc__filter.html#ga564869aa33e58769b4469101aac458f9
"""
def __init__(self, blur_limit: ScaleIntType = 7, p: float = 0.5, always_apply: bool | None = None):
super().__init__(blur_limit=blur_limit, p=p, always_apply=always_apply)
def apply(self, img: np.ndarray, kernel: int, **params: Any) -> np.ndarray:
return fblur.median_blur(img, kernel)
class MotionBlur
(blur_limit=7, allow_shifted=True, angle_range=(0, 360), direction_range=(-0.5, 0.5), always_apply=None, p=0.5)
[view source on GitHub] ¶
Apply motion blur to the input image using a directional kernel.
This transform simulates motion blur effects that occur during image capture, such as camera shake or object movement. It creates a directional blur using a line-shaped kernel with controllable angle, direction, and position.
Parameters:
Name | Type | Description |
---|---|---|
blur_limit | int | tuple[int, int] | Maximum kernel size for blurring. Should be in range [3, inf). - If int: kernel size will be randomly chosen from [3, blur_limit] - If tuple: kernel size will be randomly chosen from [min, max] Larger values create stronger blur effects. Default: (3, 7) |
angle_range | tuple[float, float] | Range of possible angles in degrees. Controls the rotation of the motion blur line: - 0°: Horizontal motion blur → - 45°: Diagonal motion blur ↗ - 90°: Vertical motion blur ↑ - 135°: Diagonal motion blur ↖ Default: (0, 360) |
direction_range | tuple[float, float] | Range for motion bias. Controls how the blur extends from the center: - -1.0: Blur extends only backward (←) - 0.0: Blur extends equally in both directions (←→) - 1.0: Blur extends only forward (→) For example, with angle=0: - direction=-1.0: ←• - direction=0.0: ←•→ - direction=1.0: •→ Default: (-0.5, 0.5) |
allow_shifted | bool | Allow random kernel position shifts. - If True: Kernel can be randomly offset from center - If False: Kernel will always be centered Default: True |
p | float | Probability of applying the transform. Default: 0.5 |
Examples of angle vs direction:
1. Horizontal motion (angle=0°):
   - direction=0.0: ←•→ (symmetric blur)
   - direction=1.0: •→ (forward blur)
   - direction=-1.0: ←• (backward blur)
2. Vertical motion (angle=90°):
   - direction=0.0: ↑•↓ (symmetric blur)
   - direction=1.0: •↑ (upward blur)
   - direction=-1.0: ↓• (downward blur)
3. Diagonal motion (angle=45°):
   - direction=0.0: ↙•↗ (symmetric blur)
   - direction=1.0: •↗ (forward diagonal blur)
   - direction=-1.0: ↙• (backward diagonal blur)
Note
- angle controls the orientation of the motion line
- direction controls the distribution of the blur along that line
- Together they can simulate various motion effects:
- Camera shake: Small angle range + direction near 0
- Object motion: Specific angle + direction=1.0
- Complex motion: Random angle + random direction
Examples:
>>> import albumentations as A
>>> # Horizontal camera shake (symmetric)
>>> transform = A.MotionBlur(
... angle_range=(-5, 5), # Near-horizontal motion
... direction_range=(0, 0), # Symmetric blur
... p=1.0
... )
>>>
>>> # Object moving right
>>> transform = A.MotionBlur(
... angle_range=(0, 0), # Horizontal motion
... direction_range=(0.8, 1.0), # Strong forward bias
... p=1.0
... )
References
- Motion blur fundamentals: https://en.wikipedia.org/wiki/Motion_blur
- Directional blur kernels: https://www.sciencedirect.com/topics/computer-science/directional-blur
- OpenCV filter2D (used for convolution): https://docs.opencv.org/master/d4/d86/group__imgproc__filter.html#ga27c049795ce870216ddfb366086b5a04
- Research on motion blur simulation: "Understanding and Evaluating Blind Deconvolution Algorithms" (CVPR 2009), https://doi.org/10.1109/CVPR.2009.5206815
- Motion blur in photography: "The Manual of Photography", Chapter 7: Motion in Photography, ISBN: 978-0240520377
- Kornia's implementation (similar approach): https://kornia.readthedocs.io/en/latest/augmentation.html#kornia.augmentation.RandomMotionBlur
See Also:
- GaussianBlur: For uniform blur effects
- MedianBlur: For noise reduction while preserving edges
- RandomRain: Another motion-based effect
- Perspective: For geometric motion-like distortions
Source code in albumentations/augmentations/blur/transforms.py
class MotionBlur(Blur):
"""Apply motion blur to the input image using a directional kernel.
This transform simulates motion blur effects that occur during image capture,
such as camera shake or object movement. It creates a directional blur using
a line-shaped kernel with controllable angle, direction, and position.
Args:
blur_limit (int | tuple[int, int]): Maximum kernel size for blurring.
Should be in range [3, inf).
- If int: kernel size will be randomly chosen from [3, blur_limit]
- If tuple: kernel size will be randomly chosen from [min, max]
Larger values create stronger blur effects.
Default: (3, 7)
angle_range (tuple[float, float]): Range of possible angles in degrees.
Controls the rotation of the motion blur line:
- 0°: Horizontal motion blur →
- 45°: Diagonal motion blur ↗
- 90°: Vertical motion blur ↑
- 135°: Diagonal motion blur ↖
Default: (0, 360)
direction_range (tuple[float, float]): Range for motion bias.
Controls how the blur extends from the center:
- -1.0: Blur extends only backward (←)
- 0.0: Blur extends equally in both directions (←→)
- 1.0: Blur extends only forward (→)
For example, with angle=0:
- direction=-1.0: ←•
- direction=0.0: ←•→
- direction=1.0: •→
Default: (-0.5, 0.5)
allow_shifted (bool): Allow random kernel position shifts.
- If True: Kernel can be randomly offset from center
- If False: Kernel will always be centered
Default: True
p (float): Probability of applying the transform. Default: 0.5
Examples of angle vs direction:
1. Horizontal motion (angle=0°):
- direction=0.0: ←•→ (symmetric blur)
- direction=1.0: •→ (forward blur)
- direction=-1.0: ←• (backward blur)
2. Vertical motion (angle=90°):
- direction=0.0: ↑•↓ (symmetric blur)
- direction=1.0: •↑ (upward blur)
- direction=-1.0: ↓• (downward blur)
3. Diagonal motion (angle=45°):
- direction=0.0: ↙•↗ (symmetric blur)
- direction=1.0: •↗ (forward diagonal blur)
- direction=-1.0: ↙• (backward diagonal blur)
Note:
- angle controls the orientation of the motion line
- direction controls the distribution of the blur along that line
- Together they can simulate various motion effects:
* Camera shake: Small angle range + direction near 0
* Object motion: Specific angle + direction=1.0
* Complex motion: Random angle + random direction
Example:
>>> import albumentations as A
>>> # Horizontal camera shake (symmetric)
>>> transform = A.MotionBlur(
... angle_range=(-5, 5), # Near-horizontal motion
... direction_range=(0, 0), # Symmetric blur
... p=1.0
... )
>>>
>>> # Object moving right
>>> transform = A.MotionBlur(
... angle_range=(0, 0), # Horizontal motion
... direction_range=(0.8, 1.0), # Strong forward bias
... p=1.0
... )
References:
- Motion blur fundamentals:
https://en.wikipedia.org/wiki/Motion_blur
- Directional blur kernels:
https://www.sciencedirect.com/topics/computer-science/directional-blur
- OpenCV filter2D (used for convolution):
https://docs.opencv.org/master/d4/d86/group__imgproc__filter.html#ga27c049795ce870216ddfb366086b5a04
- Research on motion blur simulation:
"Understanding and Evaluating Blind Deconvolution Algorithms" (CVPR 2009)
https://doi.org/10.1109/CVPR.2009.5206815
- Motion blur in photography:
"The Manual of Photography", Chapter 7: Motion in Photography
ISBN: 978-0240520377
- Kornia's implementation (similar approach):
https://kornia.readthedocs.io/en/latest/augmentation.html#kornia.augmentation.RandomMotionBlur
See Also:
- GaussianBlur: For uniform blur effects
- MedianBlur: For noise reduction while preserving edges
- RandomRain: Another motion-based effect
- Perspective: For geometric motion-like distortions
"""
class InitSchema(BlurInitSchema):
allow_shifted: bool
angle_range: Annotated[tuple[float, float], AfterValidator(nondecreasing)]
direction_range: Annotated[
tuple[float, float],
AfterValidator(nondecreasing),
AfterValidator(check_range_bounds(min_val=-1.0, max_val=1.0)),
]
def __init__(
self,
blur_limit: ScaleIntType = 7,
allow_shifted: bool = True,
angle_range: tuple[float, float] = (0, 360),
direction_range: tuple[float, float] = (-0.5, 0.5),
always_apply: bool | None = None,
p: float = 0.5,
):
super().__init__(blur_limit=blur_limit, p=p)
self.allow_shifted = allow_shifted
self.blur_limit = cast(tuple[int, int], blur_limit)
self.angle_range = angle_range
self.direction_range = direction_range
def get_transform_init_args_names(self) -> tuple[str, ...]:
return (*super().get_transform_init_args_names(), "allow_shifted", "angle_range", "direction_range")
def apply(self, img: np.ndarray, kernel: np.ndarray, **params: Any) -> np.ndarray:
return fmain.convolve(img, kernel=kernel)
def get_params(self) -> dict[str, Any]:
ksize = fblur.sample_odd_from_range(
self.py_random,
self.blur_limit[0],
self.blur_limit[1],
)
angle = self.py_random.uniform(*self.angle_range)
direction = self.py_random.uniform(*self.direction_range)
# Create motion blur kernel
kernel = fblur.create_motion_kernel(
ksize,
angle,
direction,
allow_shifted=self.allow_shifted,
random_state=self.py_random,
)
return {"kernel": kernel.astype(np.float32) / np.sum(kernel)}
class ZoomBlur
(max_factor=(1, 1.31), step_factor=(0.01, 0.03), always_apply=None, p=0.5)
[view source on GitHub] ¶
Apply zoom blur transform.
Parameters:
Name | Type | Description |
---|---|---|
max_factor | tuple[float, float] | float | Range for the maximum blur factor. If max_factor is a single float, the range will be (1, max_factor). All max_factor values should be larger than 1. Default: (1, 1.31) |
step_factor | tuple[float, float] | float | If a single float is provided, it is used as the step parameter for np.arange. If a tuple of two floats is provided, the step is sampled from `[step_factor[0], step_factor[1])`. All step_factor values should be positive. Default: (0.01, 0.03) |
p | float | Probability of applying the transform. Default: 0.5 |
Targets
image
Image types: uint8, float32
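Example (a minimal sketch; parameter values are illustrative):
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ZoomBlur(max_factor=(1.05, 1.2), step_factor=(0.01, 0.03), p=1.0)
>>> zoom_blurred_image = transform(image=image)["image"]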
Reference
https://arxiv.org/abs/1903.12261
Source code in albumentations/augmentations/blur/transforms.py
class ZoomBlur(ImageOnlyTransform):
"""Apply zoom blur transform.
Args:
max_factor ((float, float) or float): range for max factor for blurring.
If max_factor is a single float, the range will be (1, limit). Default: (1, 1.31).
All max_factor values should be larger than 1.
step_factor ((float, float) or float): If single float will be used as step parameter for np.arange.
If tuple of float step_factor will be in range `[step_factor[0], step_factor[1])`. Default: (0.01, 0.03).
All step_factor values should be positive.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
Reference:
https://arxiv.org/abs/1903.12261
"""
class InitSchema(BaseTransformInitSchema):
max_factor: OnePlusFloatRangeType
step_factor: NonNegativeFloatRangeType
def __init__(
self,
max_factor: ScaleFloatType = (1, 1.31),
step_factor: ScaleFloatType = (0.01, 0.03),
always_apply: bool | None = None,
p: float = 0.5,
):
super().__init__(p=p, always_apply=always_apply)
self.max_factor = cast(tuple[float, float], max_factor)
self.step_factor = cast(tuple[float, float], step_factor)
def apply(self, img: np.ndarray, zoom_factors: np.ndarray, **params: Any) -> np.ndarray:
return fblur.zoom_blur(img, zoom_factors)
def get_params(self) -> dict[str, Any]:
step_factor = self.py_random.uniform(*self.step_factor)
max_factor = max(1 + step_factor, self.py_random.uniform(*self.max_factor))
return {"zoom_factors": np.arange(1.0, max_factor, step_factor)}
def get_transform_init_args_names(self) -> tuple[str, str]:
return ("max_factor", "step_factor")
augmentations.crops.functional¶
def crop_and_pad_keypoints (keypoints, crop_params=None, pad_params=None, image_shape=(0, 0), result_shape=(0, 0), keep_size=False)
[view source on GitHub]¶
Crop and pad multiple keypoints simultaneously.
Parameters:
Name | Type | Description |
---|---|---|
keypoints | np.ndarray | Array of keypoints with shape (N, 4+) where each row is (x, y, angle, scale, ...). |
crop_params | Sequence[int] | Crop parameters [crop_x1, crop_y1, ...]. |
pad_params | Sequence[int] | Pad parameters [top, bottom, left, right]. |
image_shape | Tuple[int, int] | Original image shape (rows, cols). |
result_shape | Tuple[int, int] | Result image shape (rows, cols). |
keep_size | bool | Whether to keep the original size. |
Returns:
Type | Description |
---|---|
np.ndarray | Array of transformed keypoints with the same shape as input. |
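A small usage sketch (illustrative values; the import path follows the source file below):
>>> import numpy as np
>>> from albumentations.augmentations.crops.functional import crop_and_pad_keypoints
>>> keypoints = np.array([[50.0, 40.0, 0.0, 1.0]])  # (x, y, angle, scale)
>>> shifted = crop_and_pad_keypoints(
...     keypoints,
...     crop_params=(10, 20, 90, 80),  # (crop_x1, crop_y1, crop_x2, crop_y2)
...     image_shape=(100, 100),
...     result_shape=(60, 80),
... )
>>> # x and y shift by the crop origin: (50, 40) -> (40, 20)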
Source code in albumentations/augmentations/crops/functional.py
@handle_empty_array("keypoints")
def crop_and_pad_keypoints(
keypoints: np.ndarray,
crop_params: tuple[int, int, int, int] | None = None,
pad_params: tuple[int, int, int, int] | None = None,
image_shape: tuple[int, int] = (0, 0),
result_shape: tuple[int, int] = (0, 0),
keep_size: bool = False,
) -> np.ndarray:
"""Crop and pad multiple keypoints simultaneously.
Args:
keypoints (np.ndarray): Array of keypoints with shape (N, 4+) where each row is (x, y, angle, scale, ...).
crop_params (Sequence[int], optional): Crop parameters [crop_x1, crop_y1, ...].
pad_params (Sequence[int], optional): Pad parameters [top, bottom, left, right].
image_shape (Tuple[int, int]): Original image shape (rows, cols).
result_shape (Tuple[int, int]): Result image shape (rows, cols).
keep_size (bool): Whether to keep the original size.
Returns:
np.ndarray: Array of transformed keypoints with the same shape as input.
"""
transformed_keypoints = keypoints.copy()
if crop_params is not None:
crop_x1, crop_y1 = crop_params[:2]
transformed_keypoints[:, 0] -= crop_x1
transformed_keypoints[:, 1] -= crop_y1
if pad_params is not None:
top, _, left, _ = pad_params
transformed_keypoints[:, 0] += left
transformed_keypoints[:, 1] += top
rows, cols = image_shape[:2]
result_rows, result_cols = result_shape[:2]
if keep_size and (result_cols != cols or result_rows != rows):
scale_x = cols / result_cols
scale_y = rows / result_rows
return fgeometric.keypoints_scale(transformed_keypoints, scale_x, scale_y)
return transformed_keypoints
def crop_bboxes_by_coords (bboxes, crop_coords, image_shape, normalized_input=True)
[view source on GitHub]¶
Crop bounding boxes based on given crop coordinates.
This function adjusts bounding boxes to fit within a cropped image.
Parameters:
Name | Type | Description |
---|---|---|
bboxes | np.ndarray | Array of bounding boxes with shape (N, 4+) where each row is [x_min, y_min, x_max, y_max, ...]. The bounding box coordinates can be either normalized (in [0, 1]) if normalized_input=True or absolute pixel values if normalized_input=False. |
crop_coords | tuple[int, int, int, int] | Crop coordinates (x_min, y_min, x_max, y_max) in absolute pixel values. |
image_shape | tuple[int, int] | Original image shape (height, width). |
normalized_input | bool | Whether input boxes are in normalized coordinates. If True, assumes input is normalized [0,1] and returns normalized coordinates. If False, assumes input is in absolute pixels and returns absolute coordinates. Default: True for backward compatibility. |
Returns:
Type | Description |
---|---|
np.ndarray | Array of cropped bounding boxes. Coordinates will be in the same format as input (normalized if normalized_input=True, absolute pixels if normalized_input=False). |
Note
Bounding boxes that fall completely outside the crop area will be removed. Bounding boxes that partially overlap with the crop area will be adjusted to fit within it.
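A small usage sketch (illustrative values; the import path follows the source file below):
>>> import numpy as np
>>> from albumentations.augmentations.crops.functional import crop_bboxes_by_coords
>>> bboxes = np.array([[30.0, 30.0, 70.0, 70.0]])  # absolute pixel coordinates
>>> cropped = crop_bboxes_by_coords(
...     bboxes,
...     crop_coords=(20, 20, 80, 80),
...     image_shape=(100, 100),
...     normalized_input=False,
... )
>>> # coordinates shift by the crop origin: [[10., 10., 50., 50.]]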
Source code in albumentations/augmentations/crops/functional.py
def crop_bboxes_by_coords(
bboxes: np.ndarray,
crop_coords: tuple[int, int, int, int],
image_shape: tuple[int, int],
normalized_input: bool = True,
) -> np.ndarray:
"""Crop bounding boxes based on given crop coordinates.
This function adjusts bounding boxes to fit within a cropped image.
Args:
bboxes (np.ndarray): Array of bounding boxes with shape (N, 4+) where each row is
[x_min, y_min, x_max, y_max, ...]. The bounding box coordinates
can be either normalized (in [0, 1]) if normalized_input=True or
absolute pixel values if normalized_input=False.
crop_coords (tuple[int, int, int, int]): Crop coordinates (x_min, y_min, x_max, y_max)
in absolute pixel values.
image_shape (tuple[int, int]): Original image shape (height, width).
normalized_input (bool): Whether input boxes are in normalized coordinates.
If True, assumes input is normalized [0,1] and returns normalized coordinates.
If False, assumes input is in absolute pixels and returns absolute coordinates.
Default: True for backward compatibility.
Returns:
np.ndarray: Array of cropped bounding boxes. Coordinates will be in the same format as input
(normalized if normalized_input=True, absolute pixels if normalized_input=False).
Note:
Bounding boxes that fall completely outside the crop area will be removed.
Bounding boxes that partially overlap with the crop area will be adjusted to fit within it.
"""
if not bboxes.size:
return bboxes
# Convert to absolute coordinates if needed
if normalized_input:
cropped_bboxes = denormalize_bboxes(bboxes.copy().astype(np.float32), image_shape)
else:
cropped_bboxes = bboxes.copy().astype(np.float32)
x_min, y_min = crop_coords[:2]
# Subtract crop coordinates
cropped_bboxes[:, [0, 2]] -= x_min
cropped_bboxes[:, [1, 3]] -= y_min
# Calculate crop shape
crop_height = crop_coords[3] - crop_coords[1]
crop_width = crop_coords[2] - crop_coords[0]
crop_shape = (crop_height, crop_width)
# Return in same format as input
return normalize_bboxes(cropped_bboxes, crop_shape) if normalized_input else cropped_bboxes
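A short sketch of the two coordinate modes (assumed values; the import path matches the source note above):

import numpy as np
from albumentations.augmentations.crops import functional as fcrops

# One box in absolute pixels on a 100x100 image, cropped to the region
# (20, 20)-(80, 80). With normalized_input=False both input and output stay
# in pixels; with normalized_input=True both are normalized to [0, 1].
bboxes = np.array([[30.0, 30.0, 60.0, 60.0]])
cropped = fcrops.crop_bboxes_by_coords(
    bboxes,
    crop_coords=(20, 20, 80, 80),
    image_shape=(100, 100),
    normalized_input=False,
)
# Box shifted by the crop origin: [[10. 10. 40. 40.]]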
def crop_keypoints_by_coords (keypoints, crop_coords)
[view source on GitHub]¶
Crop keypoints using the provided coordinates of the top-left and bottom-right corners in pixels.
Parameters:
Name | Type | Description |
---|---|---|
keypoints | np.ndarray | An array of keypoints with shape (N, 4+) where each row is (x, y, angle, scale, ...). |
crop_coords | tuple | Crop box coords (x1, y1, x2, y2). |
Returns:
Type | Description |
---|---|
np.ndarray | An array of cropped keypoints with the same shape as the input. |
Source code in albumentations/augmentations/crops/functional.py
@handle_empty_array("keypoints")
def crop_keypoints_by_coords(
keypoints: np.ndarray,
crop_coords: tuple[int, int, int, int],
) -> np.ndarray:
"""Crop keypoints using the provided coordinates of bottom-left and top-right corners in pixels.
Args:
keypoints (np.ndarray): An array of keypoints with shape (N, 4+) where each row is (x, y, angle, scale, ...).
crop_coords (tuple): Crop box coords (x1, y1, x2, y2).
Returns:
np.ndarray: An array of cropped keypoints with the same shape as the input.
"""
x1, y1 = crop_coords[:2]
cropped_keypoints = keypoints.copy()
cropped_keypoints[:, 0] -= x1 # Adjust x coordinates
cropped_keypoints[:, 1] -= y1 # Adjust y coordinates
return cropped_keypoints
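A one-line usage sketch (assumed values):

import numpy as np
from albumentations.augmentations.crops import functional as fcrops

keypoints = np.array([[50.0, 40.0, 0.0, 1.0]])  # (x, y, angle, scale)
shifted = fcrops.crop_keypoints_by_coords(keypoints, crop_coords=(10, 20, 90, 90))
# Coordinates move into the crop's frame: [[40. 20. 0. 1.]]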
transforms
¶
class BBoxSafeRandomCrop
(erosion_rate=0.0, p=1.0, always_apply=None)
[view source on GitHub] ¶
Crop a random part of the input without loss of bounding boxes.
This transform performs a random crop of the input image while ensuring that all bounding boxes remain within the cropped area. It's particularly useful for object detection tasks where preserving all objects in the image is crucial.
Parameters:
Name | Type | Description |
---|---|---|
erosion_rate | float | A value between 0.0 and 1.0 that determines the minimum allowable size of the crop as a fraction of the original image size. For example, an erosion_rate of 0.2 means the crop will be at least 80% of the original image height. Default: 0.0 (no minimum size). |
p | float | Probability of applying the transform. Default: 1.0. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Note
This transform ensures that all bounding boxes in the original image are fully contained within the cropped area. If it's not possible to find such a crop (e.g., when bounding boxes are too spread out), it will default to cropping the entire image.
Examples:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.ones((300, 300, 3), dtype=np.uint8)
>>> bboxes = [(10, 10, 50, 50), (100, 100, 150, 150)]
>>> transform = A.Compose([
... A.BBoxSafeRandomCrop(erosion_rate=0.2, p=1.0),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))
>>> transformed = transform(image=image, bboxes=bboxes, labels=['cat', 'dog'])
>>> transformed_image = transformed['image']
>>> transformed_bboxes = transformed['bboxes']
Source code in albumentations/augmentations/crops/transforms.py
class BBoxSafeRandomCrop(BaseCrop):
"""Crop a random part of the input without loss of bounding boxes.
This transform performs a random crop of the input image while ensuring that all bounding boxes remain within
the cropped area. It's particularly useful for object detection tasks where preserving all objects in the image
is crucial.
Args:
erosion_rate (float): A value between 0.0 and 1.0 that determines the minimum allowable size of the crop
as a fraction of the original image size. For example, an erosion_rate of 0.2 means the crop will be
at least 80% of the original image height. Default: 0.0 (no minimum size).
p (float): Probability of applying the transform. Default: 1.0.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Note:
This transform ensures that all bounding boxes in the original image are fully contained within the
cropped area. If it's not possible to find such a crop (e.g., when bounding boxes are too spread out),
it will default to cropping the entire image.
Example:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.ones((300, 300, 3), dtype=np.uint8)
>>> bboxes = [(10, 10, 50, 50), (100, 100, 150, 150)]
>>> transform = A.Compose([
... A.BBoxSafeRandomCrop(erosion_rate=0.2, p=1.0),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))
>>> transformed = transform(image=image, bboxes=bboxes, labels=['cat', 'dog'])
>>> transformed_image = transformed['image']
>>> transformed_bboxes = transformed['bboxes']
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
class InitSchema(BaseTransformInitSchema):
erosion_rate: float = Field(
ge=0.0,
le=1.0,
)
def __init__(self, erosion_rate: float = 0.0, p: float = 1.0, always_apply: bool | None = None):
super().__init__(p=p)
self.erosion_rate = erosion_rate
def _get_coords_no_bbox(self, image_shape: tuple[int, int]) -> tuple[int, int, int, int]:
image_height, image_width = image_shape
erosive_h = int(image_height * (1.0 - self.erosion_rate))
crop_height = image_height if erosive_h >= image_height else self.py_random.randint(erosive_h, image_height)
crop_width = int(crop_height * image_width / image_height)
h_start = self.py_random.random()
w_start = self.py_random.random()
crop_shape = (crop_height, crop_width)
return fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)
def get_params_dependent_on_data(
self,
params: dict[str, Any],
data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
image_shape = params["shape"][:2]
if len(data["bboxes"]) == 0: # less likely, this class is for use with bboxes.
crop_coords = self._get_coords_no_bbox(image_shape)
return {"crop_coords": crop_coords}
bbox_union = union_of_bboxes(bboxes=data["bboxes"], erosion_rate=self.erosion_rate)
if bbox_union is None:
crop_coords = self._get_coords_no_bbox(image_shape)
return {"crop_coords": crop_coords}
x_min, y_min, x_max, y_max = bbox_union
x_min = np.clip(x_min, 0, 1)
y_min = np.clip(y_min, 0, 1)
x_max = np.clip(x_max, x_min, 1)
y_max = np.clip(y_max, y_min, 1)
image_height, image_width = image_shape
crop_x_min = int(x_min * self.py_random.random() * image_width)
crop_y_min = int(y_min * self.py_random.random() * image_height)
bbox_xmax = x_max + (1 - x_max) * self.py_random.random()
bbox_ymax = y_max + (1 - y_max) * self.py_random.random()
crop_x_max = int(bbox_xmax * image_width)
crop_y_max = int(bbox_ymax * image_height)
return {"crop_coords": (crop_x_min, crop_y_min, crop_x_max, crop_y_max)}
def get_transform_init_args_names(self) -> tuple[str, ...]:
return ("erosion_rate",)
class BaseCrop
[view source on GitHub] ¶
Base class for transforms that only perform cropping.
Source code in albumentations/augmentations/crops/transforms.py
class BaseCrop(DualTransform):
"""Base class for transforms that only perform cropping."""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def apply(
self,
img: np.ndarray,
crop_coords: tuple[int, int, int, int],
**params: Any,
) -> np.ndarray:
return fcrops.crop(img, x_min=crop_coords[0], y_min=crop_coords[1], x_max=crop_coords[2], y_max=crop_coords[3])
def apply_to_bboxes(
self,
bboxes: np.ndarray,
crop_coords: tuple[int, int, int, int],
**params: Any,
) -> np.ndarray:
return fcrops.crop_bboxes_by_coords(bboxes, crop_coords, params["shape"][:2])
def apply_to_keypoints(
self,
keypoints: np.ndarray,
crop_coords: tuple[int, int, int, int],
**params: Any,
) -> np.ndarray:
return fcrops.crop_keypoints_by_coords(keypoints, crop_coords)
@staticmethod
def _clip_bbox(bbox: tuple[int, int, int, int], image_shape: tuple[int, int]) -> tuple[int, int, int, int]:
height, width = image_shape[:2]
x_min, y_min, x_max, y_max = bbox
x_min = np.clip(x_min, 0, width)
y_min = np.clip(y_min, 0, height)
x_max = np.clip(x_max, x_min, width)
y_max = np.clip(y_max, y_min, height)
return x_min, y_min, x_max, y_max
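To show the contract subclasses fulfil, here is a hypothetical minimal subclass (not part of the library): it only has to produce crop_coords, and BaseCrop then applies those coordinates to images, masks, bounding boxes, and keypoints.

from typing import Any

class TopLeftCrop(BaseCrop):
    """Hypothetical example: always crop the top-left corner."""

    def __init__(self, height: int, width: int, p: float = 1.0):
        super().__init__(p=p)
        self.height = height
        self.width = width

    def get_params_dependent_on_data(
        self,
        params: dict[str, Any],
        data: dict[str, Any],
    ) -> dict[str, tuple[int, int, int, int]]:
        # BaseCrop's apply/apply_to_bboxes/apply_to_keypoints consume these.
        return {"crop_coords": (0, 0, self.width, self.height)}

    def get_transform_init_args_names(self) -> tuple[str, ...]:
        return ("height", "width")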
class BaseCropAndPad
(pad_if_needed, border_mode, fill, fill_mask, pad_position, p, always_apply=None)
[view source on GitHub] ¶
Base class for transforms that need both cropping and padding.
Source code in albumentations/augmentations/crops/transforms.py
class BaseCropAndPad(BaseCrop):
"""Base class for transforms that need both cropping and padding."""
class InitSchema(BaseTransformInitSchema):
pad_if_needed: bool
border_mode: BorderModeType
fill: ColorType
fill_mask: ColorType
pad_position: PositionType
def __init__(
self,
pad_if_needed: bool,
border_mode: int,
fill: ColorType,
fill_mask: ColorType,
pad_position: PositionType,
p: float,
always_apply: bool | None = None,
):
super().__init__(p=p)
self.pad_if_needed = pad_if_needed
self.border_mode = border_mode
self.fill = fill
self.fill_mask = fill_mask
self.pad_position = pad_position
def _get_pad_params(self, image_shape: tuple[int, int], target_shape: tuple[int, int]) -> dict[str, Any] | None:
"""Calculate padding parameters if needed."""
if not self.pad_if_needed:
return None
h_pad_top, h_pad_bottom, w_pad_left, w_pad_right = fgeometric.get_padding_params(
image_shape=image_shape,
min_height=target_shape[0],
min_width=target_shape[1],
pad_height_divisor=None,
pad_width_divisor=None,
)
if h_pad_top == h_pad_bottom == w_pad_left == w_pad_right == 0:
return None
h_pad_top, h_pad_bottom, w_pad_left, w_pad_right = fgeometric.adjust_padding_by_position(
h_top=h_pad_top,
h_bottom=h_pad_bottom,
w_left=w_pad_left,
w_right=w_pad_right,
position=self.pad_position,
py_random=self.py_random,
)
return {
"pad_top": h_pad_top,
"pad_bottom": h_pad_bottom,
"pad_left": w_pad_left,
"pad_right": w_pad_right,
}
def apply(
self,
img: np.ndarray,
crop_coords: tuple[int, int, int, int],
**params: Any,
) -> np.ndarray:
pad_params = params.get("pad_params")
if pad_params is not None:
img = fgeometric.pad_with_params(
img,
pad_params["pad_top"],
pad_params["pad_bottom"],
pad_params["pad_left"],
pad_params["pad_right"],
border_mode=self.border_mode,
value=self.fill,
)
return super().apply(img, crop_coords, **params)
def apply_to_bboxes(
self,
bboxes: np.ndarray,
crop_coords: tuple[int, int, int, int],
**params: Any,
) -> np.ndarray:
pad_params = params.get("pad_params")
image_shape = params["shape"][:2]
if pad_params is not None:
# First denormalize bboxes to absolute coordinates
bboxes_np = denormalize_bboxes(bboxes, image_shape)
# Apply padding to bboxes (already works with absolute coordinates)
bboxes_np = fgeometric.pad_bboxes(
bboxes_np,
pad_params["pad_top"],
pad_params["pad_bottom"],
pad_params["pad_left"],
pad_params["pad_right"],
self.border_mode,
image_shape=image_shape,
)
# Update shape to padded dimensions
padded_height = image_shape[0] + pad_params["pad_top"] + pad_params["pad_bottom"]
padded_width = image_shape[1] + pad_params["pad_left"] + pad_params["pad_right"]
padded_shape = (padded_height, padded_width)
bboxes_np = normalize_bboxes(bboxes_np, padded_shape)
params["shape"] = padded_shape
return super().apply_to_bboxes(bboxes_np, crop_coords, **params)
# If no padding, use original function behavior
return super().apply_to_bboxes(bboxes, crop_coords, **params)
def apply_to_keypoints(
self,
keypoints: np.ndarray,
crop_coords: tuple[int, int, int, int],
**params: Any,
) -> np.ndarray:
pad_params = params.get("pad_params")
image_shape = params["shape"][:2]
if pad_params is not None:
# Calculate padded dimensions
padded_height = image_shape[0] + pad_params["pad_top"] + pad_params["pad_bottom"]
padded_width = image_shape[1] + pad_params["pad_left"] + pad_params["pad_right"]
# First apply padding to keypoints using original image shape
keypoints = fgeometric.pad_keypoints(
keypoints,
pad_params["pad_top"],
pad_params["pad_bottom"],
pad_params["pad_left"],
pad_params["pad_right"],
self.border_mode,
image_shape=image_shape,
)
# Update image shape for subsequent crop operation
params = {**params, "shape": (padded_height, padded_width)}
return super().apply_to_keypoints(keypoints, crop_coords, **params)
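In practice, this pad-then-crop ordering means subclasses with pad_if_needed=True always return the requested output size, even for inputs smaller than the crop. A quick check through the public RandomCrop subclass (a sketch with assumed sizes):

import numpy as np
import albumentations as A

# 80x80 input, 100x100 crop: padding is applied first, then the crop
# is taken from the padded result.
small = np.zeros((80, 80, 3), dtype=np.uint8)
transform = A.RandomCrop(height=100, width=100, pad_if_needed=True, p=1.0)
assert transform(image=small)["image"].shape[:2] == (100, 100)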
class BaseRandomSizedCropInitSchema
[view source on GitHub] ¶
Source code in albumentations/augmentations/crops/transforms.py
class BaseRandomSizedCropInitSchema(BaseTransformInitSchema):
size: tuple[int, int]
@field_validator("size")
@classmethod
def check_size(cls, value: tuple[int, int]) -> tuple[int, int]:
if any(x <= 0 for x in value):
raise ValueError("All elements of 'size' must be positive integers.")
return value
class CenterCrop
(height, width, pad_if_needed=False, pad_mode=None, pad_cval=None, pad_cval_mask=None, pad_position='center', border_mode=0, fill=0.0, fill_mask=0.0, p=1.0, always_apply=None)
[view source on GitHub] ¶
Crop the central part of the input.
This transform crops the center of the input image, mask, bounding boxes, and keypoints to the specified dimensions. It's useful when you want to focus on the central region of the input, discarding peripheral information.
Parameters:
Name | Type | Description |
---|---|---|
height | int | The height of the crop. Must be greater than 0. |
width | int | The width of the crop. Must be greater than 0. |
pad_if_needed | bool | Whether to pad if crop size exceeds image size. Default: False. |
border_mode | OpenCV flag | OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT. |
fill | ColorType | Padding value for images if border_mode is cv2.BORDER_CONSTANT. Default: 0. |
fill_mask | ColorType | Padding value for masks if border_mode is cv2.BORDER_CONSTANT. Default: 0. |
pad_position | Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random'] | Position of padding. Default: 'center'. |
p | float | Probability of applying the transform. Default: 1.0. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Note
- If pad_if_needed is False and crop size exceeds image dimensions, it will raise a CropSizeError.
- If pad_if_needed is True and crop size exceeds image dimensions, the image will be padded.
- For bounding boxes and keypoints, coordinates are adjusted appropriately for both padding and cropping.
Source code in albumentations/augmentations/crops/transforms.py
class CenterCrop(BaseCropAndPad):
"""Crop the central part of the input.
This transform crops the center of the input image, mask, bounding boxes, and keypoints to the specified dimensions.
It's useful when you want to focus on the central region of the input, discarding peripheral information.
Args:
height (int): The height of the crop. Must be greater than 0.
width (int): The width of the crop. Must be greater than 0.
pad_if_needed (bool): Whether to pad if crop size exceeds image size. Default: False.
border_mode (OpenCV flag): OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.
fill (ColorType): Padding value for images if border_mode is
cv2.BORDER_CONSTANT. Default: 0.
fill_mask (ColorType): Padding value for masks if border_mode is
cv2.BORDER_CONSTANT. Default: 0.
pad_position (Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random']):
Position of padding. Default: 'center'.
p (float): Probability of applying the transform. Default: 1.0.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Note:
- If pad_if_needed is False and crop size exceeds image dimensions, it will raise a CropSizeError.
- If pad_if_needed is True and crop size exceeds image dimensions, the image will be padded.
- For bounding boxes and keypoints, coordinates are adjusted appropriately for both padding and cropping.
"""
class InitSchema(BaseCropAndPad.InitSchema):
height: Annotated[int, Field(ge=1)]
width: Annotated[int, Field(ge=1)]
border_mode: BorderModeType
fill: ColorType
fill_mask: ColorType
pad_mode: BorderModeType | None = Field(deprecated="pad_mode is deprecated, use border_mode instead")
pad_cval: ColorType | None = Field(deprecated="pad_cval is deprecated, use fill instead")
pad_cval_mask: ColorType | None = Field(deprecated="pad_cval_mask is deprecated, use fill_mask instead")
@model_validator(mode="after")
def validate_dimensions(self) -> Self:
if self.pad_mode is not None:
self.border_mode = self.pad_mode
if self.pad_cval is not None:
self.fill = self.pad_cval
if self.pad_cval_mask is not None:
self.fill_mask = self.pad_cval_mask
return self
def __init__(
self,
height: int,
width: int,
pad_if_needed: bool = False,
pad_mode: int | None = None,
pad_cval: ColorType | None = None,
pad_cval_mask: ColorType | None = None,
pad_position: PositionType = "center",
border_mode: int = cv2.BORDER_CONSTANT,
fill: ColorType = 0.0,
fill_mask: ColorType = 0.0,
p: float = 1.0,
always_apply: bool | None = None,
):
super().__init__(
pad_if_needed=pad_if_needed,
border_mode=border_mode,
fill=fill,
fill_mask=fill_mask,
pad_position=pad_position,
p=p,
)
self.height = height
self.width = width
def get_transform_init_args_names(self) -> tuple[str, ...]:
return (
"height",
"width",
"pad_if_needed",
"border_mode",
"fill",
"fill_mask",
"pad_position",
)
def get_params_dependent_on_data(
self,
params: dict[str, Any],
data: dict[str, Any],
) -> dict[str, Any]:
image_shape = params["shape"][:2]
image_height, image_width = image_shape
if not self.pad_if_needed and (self.height > image_height or self.width > image_width):
raise CropSizeError(
f"Crop size (height, width) exceeds image dimensions (height, width):"
f" {(self.height, self.width)} vs {image_shape[:2]}",
)
# Get padding params first if needed
pad_params = self._get_pad_params(image_shape, (self.height, self.width))
# If padding is needed, adjust the image shape for crop calculation
if pad_params is not None:
pad_top = pad_params["pad_top"]
pad_bottom = pad_params["pad_bottom"]
pad_left = pad_params["pad_left"]
pad_right = pad_params["pad_right"]
padded_height = image_height + pad_top + pad_bottom
padded_width = image_width + pad_left + pad_right
padded_shape = (padded_height, padded_width)
# Get crop coordinates based on padded dimensions
crop_coords = fcrops.get_center_crop_coords(padded_shape, (self.height, self.width))
else:
# Get crop coordinates based on original dimensions
crop_coords = fcrops.get_center_crop_coords(image_shape, (self.height, self.width))
return {
"crop_coords": crop_coords,
"pad_params": pad_params,
}
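A minimal usage sketch (assumed sizes):

import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
transform = A.CenterCrop(height=50, width=50, p=1.0)
cropped = transform(image=image)["image"]
assert cropped.shape[:2] == (50, 50)  # central 50x50 region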
class Crop
(x_min=0, y_min=0, x_max=1024, y_max=1024, pad_if_needed=False, pad_mode=None, pad_cval=None, pad_cval_mask=None, pad_position='center', border_mode=0, fill=0, fill_mask=0, p=1.0, always_apply=None)
[view source on GitHub] ¶
Crop a specific region from the input image.
This transform crops a rectangular region from the input image, mask, bounding boxes, and keypoints based on specified coordinates. It's useful when you want to extract a specific area of interest from your inputs.
Parameters:
Name | Type | Description |
---|---|---|
x_min | int | Minimum x-coordinate of the crop region (left edge). Must be >= 0. Default: 0. |
y_min | int | Minimum y-coordinate of the crop region (top edge). Must be >= 0. Default: 0. |
x_max | int | Maximum x-coordinate of the crop region (right edge). Must be > x_min. Default: 1024. |
y_max | int | Maximum y-coordinate of the crop region (bottom edge). Must be > y_min. Default: 1024. |
pad_if_needed | bool | Whether to pad if crop coordinates exceed image dimensions. Default: False. |
border_mode | OpenCV flag | OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT. |
fill | ColorType | Padding value if border_mode is cv2.BORDER_CONSTANT. Default: 0. |
fill_mask | ColorType | Padding value for masks. Default: 0. |
pad_position | Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random'] | Position of padding. Default: 'center'. |
p | float | Probability of applying the transform. Default: 1.0. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Note
- The crop coordinates are applied as follows: x_min <= x < x_max and y_min <= y < y_max.
- If pad_if_needed is False and crop region extends beyond image boundaries, it will be clipped.
- If pad_if_needed is True, image will be padded to accommodate the full crop region.
- For bounding boxes and keypoints, coordinates are adjusted appropriately for both padding and cropping.
Source code in albumentations/augmentations/crops/transforms.py
class Crop(BaseCropAndPad):
"""Crop a specific region from the input image.
This transform crops a rectangular region from the input image, mask, bounding boxes, and keypoints
based on specified coordinates. It's useful when you want to extract a specific area of interest
from your inputs.
Args:
x_min (int): Minimum x-coordinate of the crop region (left edge). Must be >= 0. Default: 0.
y_min (int): Minimum y-coordinate of the crop region (top edge). Must be >= 0. Default: 0.
x_max (int): Maximum x-coordinate of the crop region (right edge). Must be > x_min. Default: 1024.
y_max (int): Maximum y-coordinate of the crop region (bottom edge). Must be > y_min. Default: 1024.
pad_if_needed (bool): Whether to pad if crop coordinates exceed image dimensions. Default: False.
border_mode (OpenCV flag): OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.
fill (ColorType): Padding value if border_mode is cv2.BORDER_CONSTANT. Default: 0.
fill_mask (ColorType): Padding value for masks. Default: 0.
pad_position (Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random']):
Position of padding. Default: 'center'.
p (float): Probability of applying the transform. Default: 1.0.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Note:
- The crop coordinates are applied as follows: x_min <= x < x_max and y_min <= y < y_max.
- If pad_if_needed is False and crop region extends beyond image boundaries, it will be clipped.
- If pad_if_needed is True, image will be padded to accommodate the full crop region.
- For bounding boxes and keypoints, coordinates are adjusted appropriately for both padding and cropping.
"""
class InitSchema(BaseCropAndPad.InitSchema):
x_min: Annotated[int, Field(ge=0)]
y_min: Annotated[int, Field(ge=0)]
x_max: Annotated[int, Field(gt=0)]
y_max: Annotated[int, Field(gt=0)]
border_mode: BorderModeType
fill: ColorType
fill_mask: ColorType
pad_mode: BorderModeType | None = Field(deprecated="pad_mode is deprecated, use border_mode instead")
pad_cval: ColorType | None = Field(deprecated="pad_cval is deprecated, use fill instead")
pad_cval_mask: ColorType | None = Field(deprecated="pad_cval_mask is deprecated, use fill_mask instead")
@model_validator(mode="after")
def validate_coordinates(self) -> Self:
if not self.x_min < self.x_max:
msg = "x_max must be greater than x_min"
raise ValueError(msg)
if not self.y_min < self.y_max:
msg = "y_max must be greater than y_min"
raise ValueError(msg)
if self.pad_mode is not None:
self.border_mode = self.pad_mode
if self.pad_cval is not None:
self.fill = self.pad_cval
if self.pad_cval_mask is not None:
self.fill_mask = self.pad_cval_mask
return self
def __init__(
self,
x_min: int = 0,
y_min: int = 0,
x_max: int = 1024,
y_max: int = 1024,
pad_if_needed: bool = False,
pad_mode: int | None = None,
pad_cval: ColorType | None = None,
pad_cval_mask: ColorType | None = None,
pad_position: PositionType = "center",
border_mode: int = cv2.BORDER_CONSTANT,
fill: ColorType = 0,
fill_mask: ColorType = 0,
p: float = 1.0,
always_apply: bool | None = None,
):
super().__init__(
pad_if_needed=pad_if_needed,
border_mode=border_mode,
fill=fill,
fill_mask=fill_mask,
pad_position=pad_position,
p=p,
)
self.x_min = x_min
self.y_min = y_min
self.x_max = x_max
self.y_max = y_max
def get_params_dependent_on_data(
self,
params: dict[str, Any],
data: dict[str, Any],
) -> dict[str, Any]:
image_shape = params["shape"][:2]
image_height, image_width = image_shape
crop_height = self.y_max - self.y_min
crop_width = self.x_max - self.x_min
if not self.pad_if_needed:
# If no padding, clip coordinates to image boundaries
x_min = np.clip(self.x_min, 0, image_width)
y_min = np.clip(self.y_min, 0, image_height)
x_max = np.clip(self.x_max, x_min, image_width)
y_max = np.clip(self.y_max, y_min, image_height)
return {"crop_coords": (x_min, y_min, x_max, y_max)}
# Calculate padding if needed
pad_params = self._get_pad_params(
image_shape=image_shape,
target_shape=(max(crop_height, image_height), max(crop_width, image_width)),
)
if pad_params is not None:
# Adjust crop coordinates based on padding
x_min = self.x_min + pad_params["pad_left"]
y_min = self.y_min + pad_params["pad_top"]
x_max = self.x_max + pad_params["pad_left"]
y_max = self.y_max + pad_params["pad_top"]
crop_coords = (x_min, y_min, x_max, y_max)
else:
crop_coords = (self.x_min, self.y_min, self.x_max, self.y_max)
return {
"crop_coords": crop_coords,
"pad_params": pad_params,
}
def get_transform_init_args_names(self) -> tuple[str, ...]:
return (
"x_min",
"y_min",
"x_max",
"y_max",
"pad_if_needed",
"border_mode",
"fill",
"fill_mask",
"pad_position",
)
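A minimal usage sketch (assumed coordinates):

import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (200, 200, 3), dtype=np.uint8)
# Extract the fixed region x in [20, 120), y in [40, 140).
transform = A.Crop(x_min=20, y_min=40, x_max=120, y_max=140, p=1.0)
region = transform(image=image)["image"]
assert region.shape[:2] == (100, 100)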
class CropAndPad
(px=None, percent=None, pad_mode=None, pad_cval=None, pad_cval_mask=None, keep_size=True, sample_independently=True, interpolation=1, mask_interpolation=0, border_mode=0, fill=0, fill_mask=0, p=1.0, always_apply=None)
[view source on GitHub] ¶
Crop and pad images by pixel amounts or fractions of image sizes.
This transform allows for simultaneous cropping and padding of images. Cropping removes pixels from the sides (i.e., extracts a subimage), while padding adds pixels to the sides (e.g., black pixels). The amount of cropping/padding can be specified either in absolute pixels or as a fraction of the image size.
Parameters:
Name | Type | Description |
---|---|---|
px | int, tuple of int, tuple of tuples of int, or None | The number of pixels to crop (negative values) or pad (positive values) on each side of the image. Either this or the parameter `percent` may be set, not both at the same time. Default: None. |
percent | float, tuple of float, tuple of tuples of float, or None | The fraction of the image size to crop (negative values) or pad (positive values) on each side. Either this or the parameter `px` may be set, not both at the same time. Default: None. |
border_mode | int | OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT. |
fill | ColorType | The constant value to use for padding if border_mode is cv2.BORDER_CONSTANT. Default: 0. |
fill_mask | ColorType | Same as fill but used for mask padding. Default: 0. |
keep_size | bool | If True, the output image will be resized to the input image size after cropping/padding. Default: True. |
sample_independently | bool | If True and ranges are used for px/percent, sample a value for each side independently. If False, sample one value and use it for all sides. Default: True. |
interpolation | int | OpenCV interpolation flag used for resizing if keep_size is True. Default: cv2.INTER_LINEAR. |
mask_interpolation | int | OpenCV interpolation flag used for resizing if keep_size is True. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST. |
p | float | Probability of applying the transform. Default: 1.0. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Note
- This transform will never crop images below a height or width of 1.
- When using pixel values (px), the image will be cropped/padded by exactly that many pixels.
- When using percentages (percent), the amount of crop/pad will be calculated based on the image size.
- Bounding boxes that end up fully outside the image after cropping will be removed.
- Keypoints that end up outside the image after cropping will be removed.
Examples:
>>> import cv2
>>> import albumentations as A
>>> transform = A.Compose([
... A.CropAndPad(px=(-10, 20, 30, -40), border_mode=cv2.BORDER_REFLECT, fill=128, p=1.0),
... ])
>>> transformed = transform(image=image, mask=mask, bboxes=bboxes, keypoints=keypoints)
>>> transformed_image = transformed['image']
>>> transformed_mask = transformed['mask']
>>> transformed_bboxes = transformed['bboxes']
>>> transformed_keypoints = transformed['keypoints']
Source code in albumentations/augmentations/crops/transforms.py
class CropAndPad(DualTransform):
"""Crop and pad images by pixel amounts or fractions of image sizes.
This transform allows for simultaneous cropping and padding of images. Cropping removes pixels from the sides
(i.e., extracts a subimage), while padding adds pixels to the sides (e.g., black pixels). The amount of
cropping/padding can be specified either in absolute pixels or as a fraction of the image size.
Args:
px (int, tuple of int, tuple of tuples of int, or None):
The number of pixels to crop (negative values) or pad (positive values) on each side of the image.
Either this or the parameter `percent` may be set, not both at the same time.
- If int: crop/pad all sides by this value.
- If tuple of 2 ints: crop/pad by (top/bottom, left/right).
- If tuple of 4 ints: crop/pad by (top, right, bottom, left).
- Each int can also be a tuple of 2 ints for a range, or a list of ints for discrete choices.
Default: None.
percent (float, tuple of float, tuple of tuples of float, or None):
The fraction of the image size to crop (negative values) or pad (positive values) on each side.
Either this or the parameter `px` may be set, not both at the same time.
- If float: crop/pad all sides by this fraction.
- If tuple of 2 floats: crop/pad by (top/bottom, left/right) fractions.
- If tuple of 4 floats: crop/pad by (top, right, bottom, left) fractions.
- Each float can also be a tuple of 2 floats for a range, or a list of floats for discrete choices.
Default: None.
border_mode (int):
OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.
fill (ColorType):
The constant value to use for padding if border_mode is cv2.BORDER_CONSTANT.
Default: 0.
fill_mask (ColorType):
Same as fill but used for mask padding. Default: 0.
keep_size (bool):
If True, the output image will be resized to the input image size after cropping/padding.
Default: True.
sample_independently (bool):
If True and ranges are used for px/percent, sample a value for each side independently.
If False, sample one value and use it for all sides. Default: True.
interpolation (int):
OpenCV interpolation flag used for resizing if keep_size is True.
Default: cv2.INTER_LINEAR.
mask_interpolation (int):
OpenCV interpolation flag used for resizing if keep_size is True.
Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_NEAREST.
p (float):
Probability of applying the transform. Default: 1.0.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Note:
- This transform will never crop images below a height or width of 1.
- When using pixel values (px), the image will be cropped/padded by exactly that many pixels.
- When using percentages (percent), the amount of crop/pad will be calculated based on the image size.
- Bounding boxes that end up fully outside the image after cropping will be removed.
- Keypoints that end up outside the image after cropping will be removed.
Example:
>>> import cv2
>>> import albumentations as A
>>> transform = A.Compose([
... A.CropAndPad(px=(-10, 20, 30, -40), border_mode=cv2.BORDER_REFLECT, fill=128, p=1.0),
... ])
>>> transformed = transform(image=image, mask=mask, bboxes=bboxes, keypoints=keypoints)
>>> transformed_image = transformed['image']
>>> transformed_mask = transformed['mask']
>>> transformed_bboxes = transformed['bboxes']
>>> transformed_keypoints = transformed['keypoints']
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
class InitSchema(BaseTransformInitSchema):
px: PxType | None
percent: PercentType | None
pad_mode: BorderModeType | None = Field(deprecated="pad_mode is deprecated, use border_mode instead")
pad_cval: ColorType | None = Field(deprecated="pad_cval is deprecated, use fill instead")
pad_cval_mask: ColorType | None = Field(deprecated="pad_cval_mask is deprecated, use fill_mask instead")
keep_size: bool
sample_independently: bool
interpolation: InterpolationType
mask_interpolation: InterpolationType
fill: ColorType
fill_mask: ColorType
border_mode: BorderModeType
@model_validator(mode="after")
def check_px_percent(self) -> Self:
if self.px is None and self.percent is None:
msg = "Both px and percent parameters cannot be None simultaneously."
raise ValueError(msg)
if self.px is not None and self.percent is not None:
msg = "Only px or percent may be set!"
raise ValueError(msg)
if self.pad_mode is not None:
self.border_mode = self.pad_mode
if self.pad_cval is not None:
self.fill = self.pad_cval
if self.pad_cval_mask is not None:
self.fill_mask = self.pad_cval_mask
return self
def __init__(
self,
px: int | list[int] | None = None,
percent: float | list[float] | None = None,
pad_mode: int | None = None,
pad_cval: ColorType | None = None,
pad_cval_mask: ColorType | None = None,
keep_size: bool = True,
sample_independently: bool = True,
interpolation: int = cv2.INTER_LINEAR,
mask_interpolation: int = cv2.INTER_NEAREST,
border_mode: BorderModeType = cv2.BORDER_CONSTANT,
fill: ColorType = 0,
fill_mask: ColorType = 0,
p: float = 1.0,
always_apply: bool | None = None,
):
super().__init__(p=p, always_apply=always_apply)
self.px = px
self.percent = percent
self.border_mode = border_mode
self.fill = fill
self.fill_mask = fill_mask
self.keep_size = keep_size
self.sample_independently = sample_independently
self.interpolation = interpolation
self.mask_interpolation = mask_interpolation
def apply(
self,
img: np.ndarray,
crop_params: Sequence[int],
pad_params: Sequence[int],
fill: ColorType,
**params: Any,
) -> np.ndarray:
return fcrops.crop_and_pad(
img,
crop_params,
pad_params,
fill,
params["shape"][:2],
self.interpolation,
self.border_mode,
self.keep_size,
)
def apply_to_mask(
self,
mask: np.ndarray,
crop_params: Sequence[int],
pad_params: Sequence[int],
fill_mask: ColorType,
**params: Any,
) -> np.ndarray:
return fcrops.crop_and_pad(
mask,
crop_params,
pad_params,
fill_mask,
params["shape"][:2],
self.mask_interpolation,
self.border_mode,
self.keep_size,
)
def apply_to_bboxes(
self,
bboxes: np.ndarray,
crop_params: tuple[int, int, int, int],
pad_params: tuple[int, int, int, int],
result_shape: tuple[int, int],
**params: Any,
) -> np.ndarray:
return fcrops.crop_and_pad_bboxes(bboxes, crop_params, pad_params, params["shape"][:2], result_shape)
def apply_to_keypoints(
self,
keypoints: np.ndarray,
crop_params: tuple[int, int, int, int],
pad_params: tuple[int, int, int, int],
result_shape: tuple[int, int],
**params: Any,
) -> np.ndarray:
return fcrops.crop_and_pad_keypoints(
keypoints,
crop_params,
pad_params,
params["shape"][:2],
result_shape,
self.keep_size,
)
@staticmethod
def __prevent_zero(val1: int, val2: int, max_val: int) -> tuple[int, int]:
regain = abs(max_val) + 1
regain1 = regain // 2
regain2 = regain // 2
if regain1 + regain2 < regain:
regain1 += 1
if regain1 > val1:
diff = regain1 - val1
regain1 = val1
regain2 += diff
elif regain2 > val2:
diff = regain2 - val2
regain2 = val2
regain1 += diff
return val1 - regain1, val2 - regain2
@staticmethod
def _prevent_zero(crop_params: list[int], height: int, width: int) -> list[int]:
top, right, bottom, left = crop_params
remaining_height = height - (top + bottom)
remaining_width = width - (left + right)
if remaining_height < 1:
top, bottom = CropAndPad.__prevent_zero(top, bottom, height)
if remaining_width < 1:
left, right = CropAndPad.__prevent_zero(left, right, width)
return [max(top, 0), max(right, 0), max(bottom, 0), max(left, 0)]
def get_params_dependent_on_data(self, params: dict[str, Any], data: dict[str, Any]) -> dict[str, Any]:
height, width = params["shape"][:2]
if self.px is not None:
new_params = self._get_px_params()
else:
percent_params = self._get_percent_params()
new_params = [
int(percent_params[0] * height),
int(percent_params[1] * width),
int(percent_params[2] * height),
int(percent_params[3] * width),
]
pad_params = [max(i, 0) for i in new_params]
crop_params = self._prevent_zero([-min(i, 0) for i in new_params], height, width)
top, right, bottom, left = crop_params
crop_params = [left, top, width - right, height - bottom]
result_rows = crop_params[3] - crop_params[1]
result_cols = crop_params[2] - crop_params[0]
if result_cols == width and result_rows == height:
crop_params = []
top, right, bottom, left = pad_params
pad_params = [top, bottom, left, right]
if any(pad_params):
result_rows += top + bottom
result_cols += left + right
else:
pad_params = []
return {
"crop_params": crop_params or None,
"pad_params": pad_params or None,
"fill": None if pad_params is None else self._get_pad_value(cast(ColorType, self.fill)),
"fill_mask": None if pad_params is None else self._get_pad_value(cast(ColorType, self.fill_mask)),
"result_shape": (result_rows, result_cols),
}
def _get_px_params(self) -> list[int]:
if self.px is None:
msg = "px is not set"
raise ValueError(msg)
if isinstance(self.px, int):
params = [self.px] * 4
elif len(self.px) == PAIR:
if self.sample_independently:
params = [self.py_random.randrange(*self.px) for _ in range(4)]
else:
px = self.py_random.randrange(*self.px)
params = [px] * 4
elif isinstance(self.px[0], int):
params = self.px
elif len(self.px[0]) == PAIR:
params = [self.py_random.randrange(*i) for i in self.px]
else:
params = [self.py_random.choice(i) for i in self.px]
return params
def _get_percent_params(self) -> list[float]:
if self.percent is None:
msg = "percent is not set"
raise ValueError(msg)
if isinstance(self.percent, float):
params = [self.percent] * 4
elif len(self.percent) == PAIR:
if self.sample_independently:
params = [self.py_random.uniform(*self.percent) for _ in range(4)]
else:
px = self.py_random.uniform(*self.percent)
params = [px] * 4
elif isinstance(self.percent[0], (int, float)):
params = self.percent
elif len(self.percent[0]) == PAIR:
params = [self.py_random.uniform(*i) for i in self.percent]
else:
params = [self.py_random.choice(i) for i in self.percent]
return params # params = [top, right, bottom, left]
def _get_pad_value(
self,
fill: ColorType,
) -> int | float:
if isinstance(fill, (list, tuple)):
if len(fill) == PAIR:
a, b = fill
if isinstance(a, int) and isinstance(b, int):
return self.py_random.randint(a, b)
return self.py_random.uniform(a, b)
return self.py_random.choice(fill)
if isinstance(fill, Real):
return fill
msg = "fill should be a number or list, or tuple of two numbers."
raise ValueError(msg)
def get_transform_init_args_names(self) -> tuple[str, ...]:
return (
"px",
"percent",
"border_mode",
"fill",
"fill_mask",
"keep_size",
"sample_independently",
"interpolation",
"mask_interpolation",
)
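To complement the pixel-based example above, a sketch of the percent mode (assumed sizes):

import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
# Negative percent crops 10% from every side; keep_size=False returns the
# smaller 80x80 result instead of resizing it back to 100x100.
transform = A.CropAndPad(percent=-0.1, keep_size=False, p=1.0)
out = transform(image=image)["image"]
assert out.shape[:2] == (80, 80)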
class CropNonEmptyMaskIfExists
(height, width, ignore_values=None, ignore_channels=None, p=1.0, always_apply=None)
[view source on GitHub] ¶
Crop area with mask if mask is non-empty, else make random crop.
This transform attempts to crop a region containing a mask (non-zero pixels). If the mask is empty or not provided, it falls back to a random crop. This is particularly useful for segmentation tasks where you want to focus on regions of interest defined by the mask.
Parameters:
Name | Type | Description |
---|---|---|
height | int | Vertical size of crop in pixels. Must be > 0. |
width | int | Horizontal size of crop in pixels. Must be > 0. |
ignore_values | list of int | Values to ignore in mask, `0` values are always ignored. For example, if background value is 5, set `ignore_values=[5]` to ignore it. Default: None. |
ignore_channels | list of int | Channels to ignore in mask. For example, if background is the first channel, set `ignore_channels=[0]` to ignore it. Default: None. |
p | float | Probability of applying the transform. Default: 1.0. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Note
- If a mask is provided, the transform will try to crop an area containing non-zero (or non-ignored) pixels.
- If no suitable area is found in the mask or no mask is provided, it will perform a random crop.
- The crop size (height, width) must not exceed the original image dimensions.
- Bounding boxes and keypoints are also cropped along with the image and mask.
Exceptions:
Type | Description |
---|---|
ValueError | If the specified crop size is larger than the input image dimensions. |
Examples:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.zeros((100, 100), dtype=np.uint8)
>>> mask[25:75, 25:75] = 1 # Create a non-empty region in the mask
>>> transform = A.Compose([
... A.CropNonEmptyMaskIfExists(height=50, width=50, p=1.0),
... ])
>>> transformed = transform(image=image, mask=mask)
>>> transformed_image = transformed['image']
>>> transformed_mask = transformed['mask']
# The resulting crop will likely include part of the non-zero region in the mask
Source code in albumentations/augmentations/crops/transforms.py
class CropNonEmptyMaskIfExists(BaseCrop):
"""Crop area with mask if mask is non-empty, else make random crop.
This transform attempts to crop a region containing a mask (non-zero pixels). If the mask is empty or not provided,
it falls back to a random crop. This is particularly useful for segmentation tasks where you want to focus on
regions of interest defined by the mask.
Args:
height (int): Vertical size of crop in pixels. Must be > 0.
width (int): Horizontal size of crop in pixels. Must be > 0.
ignore_values (list of int, optional): Values to ignore in mask, `0` values are always ignored.
For example, if background value is 5, set `ignore_values=[5]` to ignore it. Default: None.
ignore_channels (list of int, optional): Channels to ignore in mask.
For example, if background is the first channel, set `ignore_channels=[0]` to ignore it. Default: None.
p (float): Probability of applying the transform. Default: 1.0.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Note:
- If a mask is provided, the transform will try to crop an area containing non-zero (or non-ignored) pixels.
- If no suitable area is found in the mask or no mask is provided, it will perform a random crop.
- The crop size (height, width) must not exceed the original image dimensions.
- Bounding boxes and keypoints are also cropped along with the image and mask.
Raises:
ValueError: If the specified crop size is larger than the input image dimensions.
Example:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.zeros((100, 100), dtype=np.uint8)
>>> mask[25:75, 25:75] = 1 # Create a non-empty region in the mask
>>> transform = A.Compose([
... A.CropNonEmptyMaskIfExists(height=50, width=50, p=1.0),
... ])
>>> transformed = transform(image=image, mask=mask)
>>> transformed_image = transformed['image']
>>> transformed_mask = transformed['mask']
# The resulting crop will likely include part of the non-zero region in the mask
"""
class InitSchema(BaseCrop.InitSchema):
ignore_values: list[int] | None
ignore_channels: list[int] | None
height: Annotated[int, Field(ge=1)]
width: Annotated[int, Field(ge=1)]
def __init__(
self,
height: int,
width: int,
ignore_values: list[int] | None = None,
ignore_channels: list[int] | None = None,
p: float = 1.0,
always_apply: bool | None = None,
):
super().__init__(p=p)
self.height = height
self.width = width
self.ignore_values = ignore_values
self.ignore_channels = ignore_channels
def _preprocess_mask(self, mask: np.ndarray) -> np.ndarray:
mask_height, mask_width = mask.shape[:2]
if self.ignore_values is not None:
ignore_values_np = np.array(self.ignore_values)
mask = np.where(np.isin(mask, ignore_values_np), 0, mask)
if mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS and self.ignore_channels is not None:
target_channels = np.array([ch for ch in range(mask.shape[-1]) if ch not in self.ignore_channels])
mask = np.take(mask, target_channels, axis=-1)
if self.height > mask_height or self.width > mask_width:
raise ValueError(
f"Crop size ({self.height},{self.width}) is larger than image ({mask_height},{mask_width})",
)
return mask
def get_params_dependent_on_data(
self,
params: dict[str, Any],
data: dict[str, Any],
) -> dict[str, Any]:
"""Get crop coordinates based on mask content."""
if "mask" in data:
mask = self._preprocess_mask(data["mask"])
elif "masks" in data and len(data["masks"]):
masks = data["masks"]
mask = self._preprocess_mask(np.copy(masks[0]))
for m in masks[1:]:
mask |= self._preprocess_mask(m)
else:
msg = "Can not find mask for CropNonEmptyMaskIfExists"
raise RuntimeError(msg)
mask_height, mask_width = mask.shape[:2]
if mask.any():
# Find non-zero regions in mask
mask_sum = mask.sum(axis=-1) if mask.ndim == NUM_MULTI_CHANNEL_DIMENSIONS else mask
non_zero_yx = np.argwhere(mask_sum)
y, x = self.py_random.choice(non_zero_yx)
# Calculate crop coordinates centered around chosen point
x_min = x - self.py_random.randint(0, self.width - 1)
y_min = y - self.py_random.randint(0, self.height - 1)
x_min = np.clip(x_min, 0, mask_width - self.width)
y_min = np.clip(y_min, 0, mask_height - self.height)
else:
# Random crop if no non-zero regions
x_min = self.py_random.randint(0, mask_width - self.width)
y_min = self.py_random.randint(0, mask_height - self.height)
x_max = x_min + self.width
y_max = y_min + self.height
return {"crop_coords": (x_min, y_min, x_max, y_max)}
def get_transform_init_args_names(self) -> tuple[str, ...]:
return "height", "width", "ignore_values", "ignore_channels"
class RandomCrop
(height, width, pad_if_needed=False, pad_mode=None, pad_cval=None, pad_cval_mask=None, pad_position='center', border_mode=0, fill=0.0, fill_mask=0.0, p=1.0, always_apply=None)
[view source on GitHub] ¶
Crop a random part of the input.
Parameters:
Name | Type | Description |
---|---|---|
height | int | height of the crop. |
width | int | width of the crop. |
pad_if_needed | bool | Whether to pad if crop size exceeds image size. Default: False. |
border_mode | OpenCV flag | OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT. |
fill | ColorType | Padding value for images if border_mode is cv2.BORDER_CONSTANT. Default: 0. |
fill_mask | ColorType | Padding value for masks if border_mode is cv2.BORDER_CONSTANT. Default: 0. |
pad_position | Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random'] | Position of padding. Default: 'center'. |
p | float | probability of applying the transform. Default: 1. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Note
If pad_if_needed is True and crop size exceeds image dimensions, the image will be padded before applying the random crop.
Source code in albumentations/augmentations/crops/transforms.py
class RandomCrop(BaseCropAndPad):
"""Crop a random part of the input.
Args:
height: height of the crop.
width: width of the crop.
pad_if_needed (bool): Whether to pad if crop size exceeds image size. Default: False.
border_mode (OpenCV flag): OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT.
fill (ColorType): Padding value for images if border_mode is
cv2.BORDER_CONSTANT. Default: 0.
fill_mask (ColorType): Padding value for masks if border_mode is
cv2.BORDER_CONSTANT. Default: 0.
pad_position (Literal['center', 'top_left', 'top_right', 'bottom_left', 'bottom_right', 'random']):
Position of padding. Default: 'center'.
p: probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Note:
If pad_if_needed is True and crop size exceeds image dimensions, the image will be padded
before applying the random crop.
"""
class InitSchema(BaseCropAndPad.InitSchema):
height: Annotated[int, Field(ge=1)]
width: Annotated[int, Field(ge=1)]
border_mode: BorderModeType
fill: ColorType
fill_mask: ColorType
pad_mode: BorderModeType | None = Field(deprecated="pad_mode is deprecated, use border_mode instead ")
pad_cval: ColorType | None = Field(deprecated="pad_cval is deprecated, use fill instead")
pad_cval_mask: ColorType | None = Field(deprecated="pad_cval_mask is deprecated, use fill_mask instead")
@model_validator(mode="after")
def validate_dimensions(self) -> Self:
if self.pad_mode is not None:
self.border_mode = self.pad_mode
if self.pad_cval is not None:
self.fill = self.pad_cval
if self.pad_cval_mask is not None:
self.fill_mask = self.pad_cval_mask
return self
def __init__(
self,
height: int,
width: int,
pad_if_needed: bool = False,
pad_mode: int | None = None,
pad_cval: ColorType | None = None,
pad_cval_mask: ColorType | None = None,
pad_position: PositionType = "center",
border_mode: int = cv2.BORDER_CONSTANT,
fill: ColorType = 0.0,
fill_mask: ColorType = 0.0,
p: float = 1.0,
always_apply: bool | None = None,
):
super().__init__(
pad_if_needed=pad_if_needed,
border_mode=border_mode,
fill=fill,
fill_mask=fill_mask,
pad_position=pad_position,
p=p,
)
self.height = height
self.width = width
def get_params_dependent_on_data(
self,
params: dict[str, Any],
data: dict[str, Any],
) -> dict[str, Any]: # Changed return type to be more flexible
image_shape = params["shape"][:2]
image_height, image_width = image_shape
if not self.pad_if_needed and (self.height > image_height or self.width > image_width):
raise CropSizeError(
f"Crop size (height, width) exceeds image dimensions (height, width):"
f" {(self.height, self.width)} vs {image_shape[:2]}",
)
# Get padding params first if needed
pad_params = self._get_pad_params(image_shape, (self.height, self.width))
# If padding is needed, adjust the image shape for crop calculation
if pad_params is not None:
pad_top = pad_params["pad_top"]
pad_bottom = pad_params["pad_bottom"]
pad_left = pad_params["pad_left"]
pad_right = pad_params["pad_right"]
padded_height = image_height + pad_top + pad_bottom
padded_width = image_width + pad_left + pad_right
padded_shape = (padded_height, padded_width)
# Get random crop coordinates based on padded dimensions
h_start = self.py_random.random()
w_start = self.py_random.random()
crop_coords = fcrops.get_crop_coords(padded_shape, (self.height, self.width), h_start, w_start)
else:
# Get random crop coordinates based on original dimensions
h_start = self.py_random.random()
w_start = self.py_random.random()
crop_coords = fcrops.get_crop_coords(image_shape, (self.height, self.width), h_start, w_start)
return {
"crop_coords": crop_coords,
"pad_params": pad_params,
}
def get_transform_init_args_names(self) -> tuple[str, ...]:
return (
"height",
"width",
"pad_if_needed",
"border_mode",
"fill",
"fill_mask",
"pad_position",
)
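A minimal usage sketch (assumed sizes):

import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
transform = A.Compose([A.RandomCrop(height=64, width=64, p=1.0)])
out = transform(image=image)
assert out["image"].shape[:2] == (64, 64)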
class RandomCropFromBorders
(crop_left=0.1, crop_right=0.1, crop_top=0.1, crop_bottom=0.1, always_apply=None, p=1.0)
[view source on GitHub] ¶
Randomly crops the input from its borders without resizing.
This transform randomly crops parts of the input (image, mask, bounding boxes, or keypoints) from each of its borders. The amount of cropping is specified as a fraction of the input's dimensions for each side independently.
Parameters:
Name | Type | Description |
---|---|---|
crop_left | float | The maximum fraction of width to crop from the left side. Must be in the range [0.0, 1.0]. Default: 0.1 |
crop_right | float | The maximum fraction of width to crop from the right side. Must be in the range [0.0, 1.0]. Default: 0.1 |
crop_top | float | The maximum fraction of height to crop from the top. Must be in the range [0.0, 1.0]. Default: 0.1 |
crop_bottom | float | The maximum fraction of height to crop from the bottom. Must be in the range [0.0, 1.0]. Default: 0.1 |
p | float | Probability of applying the transform. Default: 1.0 |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Note
- The actual amount of cropping for each side is randomly chosen between 0 and the specified maximum for each application of the transform.
- The sum of crop_left and crop_right must not exceed 1.0, and the sum of crop_top and crop_bottom must not exceed 1.0. Otherwise, a ValueError will be raised.
- This transform does not resize the input after cropping, so the output dimensions will be smaller than the input dimensions.
- Bounding boxes that end up fully outside the cropped area will be removed.
- Keypoints that end up outside the cropped area will be removed.
Examples:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.RandomCropFromBorders(
... crop_left=0.1, crop_right=0.2, crop_top=0.2, crop_bottom=0.1, p=1.0
... )
>>> result = transform(image=image)
>>> transformed_image = result['image']
# The resulting image will have random crops from each border, with the maximum
# possible crops being 10% from the left, 20% from the right, 20% from the top,
# and 10% from the bottom. The image size will be reduced accordingly.
Source code in albumentations/augmentations/crops/transforms.py
class RandomCropFromBorders(BaseCrop):
"""Randomly crops the input from its borders without resizing.
This transform randomly crops parts of the input (image, mask, bounding boxes, or keypoints)
from each of its borders. The amount of cropping is specified as a fraction of the input's
dimensions for each side independently.
Args:
crop_left (float): The maximum fraction of width to crop from the left side.
Must be in the range [0.0, 1.0]. Default: 0.1
crop_right (float): The maximum fraction of width to crop from the right side.
Must be in the range [0.0, 1.0]. Default: 0.1
crop_top (float): The maximum fraction of height to crop from the top.
Must be in the range [0.0, 1.0]. Default: 0.1
crop_bottom (float): The maximum fraction of height to crop from the bottom.
Must be in the range [0.0, 1.0]. Default: 0.1
p (float): Probability of applying the transform. Default: 1.0
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Note:
- The actual amount of cropping for each side is randomly chosen between 0 and
the specified maximum for each application of the transform.
- The sum of crop_left and crop_right must not exceed 1.0, and the sum of
crop_top and crop_bottom must not exceed 1.0. Otherwise, a ValueError will be raised.
- This transform does not resize the input after cropping, so the output dimensions
will be smaller than the input dimensions.
- Bounding boxes that end up fully outside the cropped area will be removed.
- Keypoints that end up outside the cropped area will be removed.
Example:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.RandomCropFromBorders(
... crop_left=0.1, crop_right=0.2, crop_top=0.2, crop_bottom=0.1, p=1.0
... )
>>> result = transform(image=image)
>>> transformed_image = result['image']
# The resulting image will have random crops from each border, with the maximum
# possible crops being 10% from the left, 20% from the right, 20% from the top,
# and 10% from the bottom. The image size will be reduced accordingly.
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
class InitSchema(BaseTransformInitSchema):
crop_left: float = Field(
ge=0.0,
le=1.0,
)
crop_right: float = Field(
ge=0.0,
le=1.0,
)
crop_top: float = Field(
ge=0.0,
le=1.0,
)
crop_bottom: float = Field(
ge=0.0,
le=1.0,
)
@model_validator(mode="after")
def validate_crop_values(self) -> Self:
if self.crop_left + self.crop_right > 1.0:
msg = "The sum of crop_left and crop_right must be <= 1."
raise ValueError(msg)
if self.crop_top + self.crop_bottom > 1.0:
msg = "The sum of crop_top and crop_bottom must be <= 1."
raise ValueError(msg)
return self
def __init__(
self,
crop_left: float = 0.1,
crop_right: float = 0.1,
crop_top: float = 0.1,
crop_bottom: float = 0.1,
always_apply: bool | None = None,
p: float = 1.0,
):
super().__init__(p=p)
self.crop_left = crop_left
self.crop_right = crop_right
self.crop_top = crop_top
self.crop_bottom = crop_bottom
def get_params_dependent_on_data(
self,
params: dict[str, Any],
data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
height, width = params["shape"][:2]
x_min = self.py_random.randint(0, int(self.crop_left * width))
x_max = self.py_random.randint(max(x_min + 1, int((1 - self.crop_right) * width)), width)
y_min = self.py_random.randint(0, int(self.crop_top * height))
y_max = self.py_random.randint(max(y_min + 1, int((1 - self.crop_bottom) * height)), height)
crop_coords = x_min, y_min, x_max, y_max
return {"crop_coords": crop_coords}
def get_transform_init_args_names(self) -> tuple[str, ...]:
return "crop_left", "crop_right", "crop_top", "crop_bottom"
class RandomCropNearBBox
(max_part_shift=(0, 0.3), cropping_bbox_key='cropping_bbox', cropping_box_key=None, always_apply=None, p=1.0)
[view source on GitHub] ¶
Crop a region of the image around a given bounding box, with a random shift along the x and y coordinates.
Parameters:
Name | Type | Description |
---|---|---|
max_part_shift | float, (float, float) | Max shift in height and width dimensions relative to the cropping_bbox dimension. If max_part_shift is a single float, the range will be (0, max_part_shift). Default: (0, 0.3) |
cropping_bbox_key | str | Additional target key for cropping box. Default: cropping_bbox |
cropping_box_key | str | [Deprecated] Use cropping_bbox_key instead. |
p | float | Probability of applying the transform. Default: 1.0 |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Examples:
>>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_bbox')],
...               bbox_params=BboxParams("pascal_voc"))
>>> result = aug(image=image, bboxes=bboxes, test_bbox=[0, 5, 10, 20])
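The docstring example above is schematic (it omits imports and sample data); below is a self-contained sketch under the same assumptions, where the test_bbox key simply matches the cropping_bbox_key argument and the labels field is an illustrative detection setup.
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, (200, 200, 3), dtype=np.uint8)
bboxes = [(30, 30, 90, 90)]  # boxes to transform along with the image

aug = A.Compose(
    [A.RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key="test_bbox", p=1.0)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)
# The crop is taken around `test_bbox`, jittered by up to max_part_shift
# of the box's height and width on each side.
result = aug(image=image, bboxes=bboxes, labels=["object"], test_bbox=[40, 40, 100, 100])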
Source code in albumentations/augmentations/crops/transforms.py
class RandomCropNearBBox(BaseCrop):
"""Crop bbox from image with random shift by x,y coordinates
Args:
max_part_shift (float, (float, float)): Max shift in `height` and `width` dimensions relative
to `cropping_bbox` dimension.
If max_part_shift is a single float, the range will be (0, max_part_shift).
Default (0, 0.3).
cropping_bbox_key (str): Additional target key for cropping box. Default `cropping_bbox`.
cropping_box_key (str): [Deprecated] Use `cropping_bbox_key` instead.
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Examples:
>>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_bbox')],
...               bbox_params=BboxParams("pascal_voc"))
>>> result = aug(image=image, bboxes=bboxes, test_bbox=[0, 5, 10, 20])
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
class InitSchema(BaseTransformInitSchema):
max_part_shift: ZeroOneRangeType
cropping_bbox_key: str
def __init__(
self,
max_part_shift: ScaleFloatType = (0, 0.3),
cropping_bbox_key: str = "cropping_bbox",
cropping_box_key: str | None = None, # Deprecated
always_apply: bool | None = None,
p: float = 1.0,
):
super().__init__(p=p)
# Check for deprecated parameter and issue warning
if cropping_box_key is not None:
warn(
"The parameter 'cropping_box_key' is deprecated and will be removed in future versions. "
"Use 'cropping_bbox_key' instead.",
DeprecationWarning,
stacklevel=2,
)
# Ensure the new parameter is used even if the old one is passed
cropping_bbox_key = cropping_box_key
self.max_part_shift = cast(tuple[float, float], max_part_shift)
self.cropping_bbox_key = cropping_bbox_key
def get_params_dependent_on_data(
self,
params: dict[str, Any],
data: dict[str, Any],
) -> dict[str, tuple[float, ...]]:
bbox = data[self.cropping_bbox_key]
image_shape = params["shape"][:2]
bbox = self._clip_bbox(bbox, image_shape)
h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])
x_min = bbox[0] - self.py_random.randint(-w_max_shift, w_max_shift)
x_max = bbox[2] + self.py_random.randint(-w_max_shift, w_max_shift)
y_min = bbox[1] - self.py_random.randint(-h_max_shift, h_max_shift)
y_max = bbox[3] + self.py_random.randint(-h_max_shift, h_max_shift)
crop_coords = self._clip_bbox((x_min, y_min, x_max, y_max), image_shape)
if crop_coords[0] == crop_coords[2] or crop_coords[1] == crop_coords[3]:
crop_shape = (bbox[3] - bbox[1], bbox[2] - bbox[0])
crop_coords = fcrops.get_center_crop_coords(image_shape, crop_shape)
return {"crop_coords": crop_coords}
@property
def targets_as_params(self) -> list[str]:
return [self.cropping_bbox_key]
def get_transform_init_args_names(self) -> tuple[str, ...]:
return "max_part_shift", "cropping_bbox_key"
class RandomResizedCrop
(size=None, width=None, height=None, *, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1, mask_interpolation=0, p=1.0, always_apply=None)
[view source on GitHub] ¶
Crop a random part of the input and rescale it to a specified size.
This transform first crops a random portion of the input image (or mask, bounding boxes, keypoints) and then resizes the crop to a specified size. It's particularly useful for training neural networks on images of varying sizes and aspect ratios.
Parameters:
Name | Type | Description |
---|---|---|
size | tuple[int, int] | Target size for the output image, i.e. (height, width) after crop and resize. |
scale | tuple[float, float] | Range of the random size of the crop relative to the input size. For example, (0.08, 1.0) means the crop size will be between 8% and 100% of the input size. Default: (0.08, 1.0) |
ratio | tuple[float, float] | Range of aspect ratios of the random crop. For example, (0.75, 1.3333) allows crop aspect ratios from 3:4 to 4:3. Default: (0.75, 1.3333333333333333) |
interpolation | OpenCV flag | Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR |
mask_interpolation | OpenCV flag | Flag that is used to specify the interpolation algorithm for mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST |
p | float | Probability of applying the transform. Default: 1.0 |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Note
- This transform attempts to crop a random area with an aspect ratio and relative size specified by 'ratio' and 'scale' parameters. If it fails to find a suitable crop after 10 attempts, it will return a crop from the center of the image.
- The crop's aspect ratio is defined as width / height.
- Bounding boxes that end up fully outside the cropped area will be removed.
- Keypoints that end up outside the cropped area will be removed.
- After cropping, the result is resized to the specified size.
Mathematical Details:
1. A target area A is sampled from the range [scale[0] * input_area, scale[1] * input_area].
2. A target aspect ratio r is sampled from the range [ratio[0], ratio[1]].
3. The crop width and height are computed as: w = sqrt(A * r), h = sqrt(A / r).
4. If w and h are within the input image dimensions, the crop is accepted. Otherwise, steps 1-3 are repeated (up to 10 times).
5. If no valid crop is found after 10 attempts, a centered crop is taken.
6. The crop is then resized to the specified size.
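A minimal standalone sketch of steps 1-5, mirroring the log-uniform aspect-ratio sampling visible in the source below; this is illustrative math, not the library's internal API.
import math
import random

def sample_crop_size(image_height, image_width, scale=(0.08, 1.0), ratio=(0.75, 4 / 3), attempts=10):
    """Return a (height, width) crop per steps 1-4, or None if all attempts fail (step 5)."""
    area = image_height * image_width
    for _ in range(attempts):
        target_area = random.uniform(*scale) * area           # step 1
        log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
        aspect_ratio = math.exp(random.uniform(*log_ratio))   # step 2
        w = round(math.sqrt(target_area * aspect_ratio))      # step 3
        h = round(math.sqrt(target_area / aspect_ratio))
        if 0 < w <= image_width and 0 < h <= image_height:    # step 4
            return h, w
    return None  # the caller falls back to a centered crop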
Examples:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.RandomResizedCrop(size=(80, 80), scale=(0.5, 1.0), ratio=(0.75, 1.33), p=1.0)
>>> result = transform(image=image)
>>> transformed_image = result['image']
# transformed_image will be an 80x80 crop from a random location in the original image,
# with the crop's size between 50% and 100% of the original image size,
# and the crop's aspect ratio between 3:4 and 4:3.
Source code in albumentations/augmentations/crops/transforms.py
class RandomResizedCrop(_BaseRandomSizedCrop):
"""Crop a random part of the input and rescale it to a specified size.
This transform first crops a random portion of the input image (or mask, bounding boxes, keypoints)
and then resizes the crop to a specified size. It's particularly useful for training neural networks
on images of varying sizes and aspect ratios.
Args:
size (tuple[int, int]): Target size for the output image, i.e. (height, width) after crop and resize.
scale (tuple[float, float]): Range of the random size of the crop relative to the input size.
For example, (0.08, 1.0) means the crop size will be between 8% and 100% of the input size.
Default: (0.08, 1.0)
ratio (tuple[float, float]): Range of aspect ratios of the random crop.
For example, (0.75, 1.3333) allows crop aspect ratios from 3:4 to 4:3.
Default: (0.75, 1.3333333333333333)
interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR
mask_interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm for mask.
Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_NEAREST
p (float): Probability of applying the transform. Default: 1.0
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Note:
- This transform attempts to crop a random area with an aspect ratio and relative size
specified by 'ratio' and 'scale' parameters. If it fails to find a suitable crop after
10 attempts, it will return a crop from the center of the image.
- The crop's aspect ratio is defined as width / height.
- Bounding boxes that end up fully outside the cropped area will be removed.
- Keypoints that end up outside the cropped area will be removed.
- After cropping, the result is resized to the specified size.
Mathematical Details:
1. A target area A is sampled from the range [scale[0] * input_area, scale[1] * input_area].
2. A target aspect ratio r is sampled from the range [ratio[0], ratio[1]].
3. The crop width and height are computed as:
w = sqrt(A * r)
h = sqrt(A / r)
4. If w and h are within the input image dimensions, the crop is accepted.
Otherwise, steps 1-3 are repeated (up to 10 times).
5. If no valid crop is found after 10 attempts, a centered crop is taken.
6. The crop is then resized to the specified size.
Example:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.RandomResizedCrop(size=(80, 80), scale=(0.5, 1.0), ratio=(0.75, 1.33), p=1.0)
>>> result = transform(image=image)
>>> transformed_image = result['image']
# transformed_image will be an 80x80 crop from a random location in the original image,
# with the crop's size between 50% and 100% of the original image size,
# and the crop's aspect ratio between 3:4 and 4:3.
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
class InitSchema(BaseTransformInitSchema):
scale: Annotated[tuple[float, float], AfterValidator(check_01), AfterValidator(nondecreasing)]
ratio: Annotated[tuple[float, float], AfterValidator(check_0plus), AfterValidator(nondecreasing)]
width: int | None = Field(
None,
deprecated="Initializing with 'height' and 'width' is deprecated. Use size instead.",
)
height: int | None = Field(
None,
deprecated="Initializing with 'height' and 'width' is deprecated. Use size instead.",
)
size: ScaleIntType | None
interpolation: InterpolationType
mask_interpolation: InterpolationType
@model_validator(mode="after")
def process(self) -> Self:
if isinstance(self.size, int):
if isinstance(self.width, int):
self.size = (self.size, self.width)
else:
msg = "If size is an integer, width as integer must be specified."
raise TypeError(msg)
if self.size is None:
if self.height is None or self.width is None:
message = "If 'size' is not provided, both 'height' and 'width' must be specified."
raise ValueError(message)
self.size = (self.height, self.width)
return self
def __init__(
self,
# NOTE @zetyquickly: when (width, height) are deprecated, make 'size' non optional
size: ScaleIntType | None = None,
width: int | None = None,
height: int | None = None,
*,
scale: tuple[float, float] = (0.08, 1.0),
ratio: tuple[float, float] = (0.75, 1.3333333333333333),
interpolation: int = cv2.INTER_LINEAR,
mask_interpolation: int = cv2.INTER_NEAREST,
p: float = 1.0,
always_apply: bool | None = None,
):
super().__init__(
size=cast(tuple[int, int], size),
interpolation=interpolation,
mask_interpolation=mask_interpolation,
p=p,
)
self.scale = scale
self.ratio = ratio
def get_params_dependent_on_data(
self,
params: dict[str, Any],
data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
image_shape = params["shape"][:2]
image_height, image_width = image_shape
area = image_height * image_width
for _ in range(10):
target_area = self.py_random.uniform(*self.scale) * area
log_ratio = (math.log(self.ratio[0]), math.log(self.ratio[1]))
aspect_ratio = math.exp(self.py_random.uniform(*log_ratio))
width = int(round(math.sqrt(target_area * aspect_ratio)))
height = int(round(math.sqrt(target_area / aspect_ratio)))
if 0 < width <= image_width and 0 < height <= image_height:
i = self.py_random.randint(0, image_height - height)
j = self.py_random.randint(0, image_width - width)
h_start = i * 1.0 / (image_height - height + 1e-10)
w_start = j * 1.0 / (image_width - width + 1e-10)
crop_shape = (height, width)
crop_coords = fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)
return {"crop_coords": crop_coords}
# Fallback to central crop
in_ratio = image_width / image_height
if in_ratio < min(self.ratio):
width = image_width
height = int(round(image_width / min(self.ratio)))
elif in_ratio > max(self.ratio):
height = image_height
width = int(round(height * max(self.ratio)))
else: # whole image
width = image_width
height = image_height
i = (image_height - height) // 2
j = (image_width - width) // 2
h_start = i * 1.0 / (image_height - height + 1e-10)
w_start = j * 1.0 / (image_width - width + 1e-10)
crop_shape = (height, width)
crop_coords = fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)
return {"crop_coords": crop_coords}
def get_transform_init_args_names(self) -> tuple[str, ...]:
return "size", "scale", "ratio", "interpolation", "mask_interpolation"
class RandomSizedBBoxSafeCrop
(height, width, erosion_rate=0.0, interpolation=1, mask_interpolation=0, always_apply=None, p=1.0)
[view source on GitHub] ¶
Crop a random part of the input and rescale it to a specific size without loss of bounding boxes.
This transform first attempts to crop a random portion of the input image while ensuring that all bounding boxes remain within the cropped area. It then resizes the crop to the specified size. This is particularly useful for object detection tasks where preserving all objects in the image is crucial while also standardizing the image size.
Parameters:
Name | Type | Description |
---|---|---|
height | int | Height of the output image after resizing. |
width | int | Width of the output image after resizing. |
erosion_rate | float | A value between 0.0 and 1.0 that determines the minimum allowable size of the crop as a fraction of the original image size. For example, an erosion_rate of 0.2 means the crop will be at least 80% of the original image height and width. Default: 0.0 (no minimum size). |
interpolation | OpenCV flag | Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
mask_interpolation | OpenCV flag | Flag that is used to specify the interpolation algorithm for mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST. |
p | float | Probability of applying the transform. Default: 1.0. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Note
- This transform ensures that all bounding boxes in the original image are fully contained within the cropped area. If it's not possible to find such a crop (e.g., when bounding boxes are too spread out), it will default to cropping the entire image.
- After cropping, the result is resized to the specified (height, width) size.
- Bounding box coordinates are adjusted to match the new image size.
- Keypoints are moved along with the crop and scaled to the new image size.
- If there are no bounding boxes in the image, it will fall back to a random crop.
Mathematical Details:
1. A crop region is selected that includes all bounding boxes.
2. The crop size is determined by the erosion_rate: min_crop_size = (1 - erosion_rate) * original_size
3. If the selected crop is smaller than min_crop_size, it's expanded to meet this requirement.
4. The crop is then resized to the specified (height, width) size.
5. Bounding box coordinates are transformed to match the new image size: new_coord = (old_coord - crop_start) * (new_size / crop_size)
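The coordinate transform in step 5 is simple enough to show directly; remap_coord is a hypothetical helper for illustration only.
def remap_coord(old_coord, crop_start, crop_size, new_size):
    """new_coord = (old_coord - crop_start) * (new_size / crop_size)"""
    return (old_coord - crop_start) * (new_size / crop_size)

# e.g. an x coordinate of 120 inside a crop that starts at x=100 and is
# 200 px wide, resized to width 224: (120 - 100) * (224 / 200) = 22.4
print(remap_coord(120, 100, 200, 224))  # 22.4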
Examples:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8)
>>> bboxes = [(10, 10, 50, 50), (100, 100, 150, 150)]
>>> transform = A.Compose([
... A.RandomSizedBBoxSafeCrop(height=224, width=224, erosion_rate=0.2, p=1.0),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))
>>> transformed = transform(image=image, bboxes=bboxes, labels=['cat', 'dog'])
>>> transformed_image = transformed['image']
>>> transformed_bboxes = transformed['bboxes']
# transformed_image will be a 224x224 image containing all original bounding boxes,
# with their coordinates adjusted to the new image size.
Source code in albumentations/augmentations/crops/transforms.py
class RandomSizedBBoxSafeCrop(BBoxSafeRandomCrop):
"""Crop a random part of the input and rescale it to a specific size without loss of bounding boxes.
This transform first attempts to crop a random portion of the input image while ensuring that all bounding boxes
remain within the cropped area. It then resizes the crop to the specified size. This is particularly useful for
object detection tasks where preserving all objects in the image is crucial while also standardizing the image size.
Args:
height (int): Height of the output image after resizing.
width (int): Width of the output image after resizing.
erosion_rate (float): A value between 0.0 and 1.0 that determines the minimum allowable size of the crop
as a fraction of the original image size. For example, an erosion_rate of 0.2 means the crop will be
at least 80% of the original image height and width. Default: 0.0 (no minimum size).
interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
mask_interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm for mask.
Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_NEAREST.
p (float): Probability of applying the transform. Default: 1.0.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Note:
- This transform ensures that all bounding boxes in the original image are fully contained within the
cropped area. If it's not possible to find such a crop (e.g., when bounding boxes are too spread out),
it will default to cropping the entire image.
- After cropping, the result is resized to the specified (height, width) size.
- Bounding box coordinates are adjusted to match the new image size.
- Keypoints are moved along with the crop and scaled to the new image size.
- If there are no bounding boxes in the image, it will fall back to a random crop.
Mathematical Details:
1. A crop region is selected that includes all bounding boxes.
2. The crop size is determined by the erosion_rate:
min_crop_size = (1 - erosion_rate) * original_size
3. If the selected crop is smaller than min_crop_size, it's expanded to meet this requirement.
4. The crop is then resized to the specified (height, width) size.
5. Bounding box coordinates are transformed to match the new image size:
new_coord = (old_coord - crop_start) * (new_size / crop_size)
Example:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8)
>>> bboxes = [(10, 10, 50, 50), (100, 100, 150, 150)]
>>> transform = A.Compose([
... A.RandomSizedBBoxSafeCrop(height=224, width=224, erosion_rate=0.2, p=1.0),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))
>>> transformed = transform(image=image, bboxes=bboxes, labels=['cat', 'dog'])
>>> transformed_image = transformed['image']
>>> transformed_bboxes = transformed['bboxes']
# transformed_image will be a 224x224 image containing all original bounding boxes,
# with their coordinates adjusted to the new image size.
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
class InitSchema(BaseTransformInitSchema):
height: Annotated[int, Field(ge=1)]
width: Annotated[int, Field(ge=1)]
erosion_rate: float = Field(
ge=0.0,
le=1.0,
)
interpolation: InterpolationType
mask_interpolation: InterpolationType
def __init__(
self,
height: int,
width: int,
erosion_rate: float = 0.0,
interpolation: int = cv2.INTER_LINEAR,
mask_interpolation: int = cv2.INTER_NEAREST,
always_apply: bool | None = None,
p: float = 1.0,
):
super().__init__(erosion_rate=erosion_rate, p=p)
self.height = height
self.width = width
self.interpolation = interpolation
self.mask_interpolation = mask_interpolation
def apply(
self,
img: np.ndarray,
crop_coords: tuple[int, int, int, int],
**params: Any,
) -> np.ndarray:
crop = fcrops.crop(img, *crop_coords)
return fgeometric.resize(crop, (self.height, self.width), self.interpolation)
def apply_to_mask(
self,
mask: np.ndarray,
crop_coords: tuple[int, int, int, int],
**params: Any,
) -> np.ndarray:
crop = fcrops.crop(mask, *crop_coords)
return fgeometric.resize(crop, (self.height, self.width), self.mask_interpolation)
def apply_to_keypoints(
self,
keypoints: np.ndarray,
crop_coords: tuple[int, int, int, int],
**params: Any,
) -> np.ndarray:
keypoints = fcrops.crop_keypoints_by_coords(keypoints, crop_coords)
crop_height = crop_coords[3] - crop_coords[1]
crop_width = crop_coords[2] - crop_coords[0]
scale_y = self.height / crop_height
scale_x = self.width / crop_width
return fgeometric.keypoints_scale(keypoints, scale_x=scale_x, scale_y=scale_y)
def get_transform_init_args_names(self) -> tuple[str, ...]:
return (*super().get_transform_init_args_names(), "height", "width", "interpolation", "mask_interpolation")
class RandomSizedCrop
(min_max_height, size=None, width=None, height=None, *, w2h_ratio=1.0, interpolation=1, mask_interpolation=0, p=1.0, always_apply=None)
[view source on GitHub] ¶
Crop a random part of the input and rescale it to a specific size.
This transform first crops a random portion of the input and then resizes it to a specified size. The size of the random crop is controlled by the 'min_max_height' parameter.
Parameters:
Name | Type | Description |
---|---|---|
min_max_height | tuple[int, int] | Minimum and maximum height of the crop in pixels. |
size | tuple[int, int] | Target size for the output image, i.e. (height, width) after crop and resize. |
w2h_ratio | float | Aspect ratio (width/height) of crop. Default: 1.0 |
interpolation | OpenCV flag | Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
mask_interpolation | OpenCV flag | Flag that is used to specify the interpolation algorithm for mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST. |
p | float | Probability of applying the transform. Default: 1.0 |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Note
- The crop size is randomly selected for each execution within the range specified by 'min_max_height'.
- The aspect ratio of the crop is determined by the 'w2h_ratio' parameter.
- After cropping, the result is resized to the specified 'size'.
- Bounding boxes that end up fully outside the cropped area will be removed.
- Keypoints that end up outside the cropped area will be removed.
- This transform differs from RandomResizedCrop in that it allows more control over the crop size through the 'min_max_height' parameter, rather than using a scale parameter.
Mathematical Details:
1. A random crop height h is sampled from the range [min_max_height[0], min_max_height[1]].
2. The crop width w is calculated as: w = h * w2h_ratio
3. A random location for the crop is selected within the input image.
4. The image is cropped to the size (h, w).
5. The crop is then resized to the specified 'size'.
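Steps 1-2 amount to two lines of arithmetic; here is a sketch mirroring the sampling in the source below (sample_crop_shape is an illustrative name, not a library function).
import random

def sample_crop_shape(min_max_height=(50, 80), w2h_ratio=1.0):
    crop_height = random.randint(*min_max_height)  # step 1: inclusive range
    crop_width = int(crop_height * w2h_ratio)      # step 2
    return crop_height, crop_width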
Examples:
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.RandomSizedCrop(
... min_max_height=(50, 80),
... size=(64, 64),
... w2h_ratio=1.0,
... interpolation=cv2.INTER_LINEAR,
... p=1.0
... )
>>> result = transform(image=image)
>>> transformed_image = result['image']
# transformed_image will be a 64x64 image, resulting from a crop with height
# between 50 and 80 pixels, and the same aspect ratio as specified by w2h_ratio,
# taken from a random location in the original image and then resized.
Source code in albumentations/augmentations/crops/transforms.py
class RandomSizedCrop(_BaseRandomSizedCrop):
"""Crop a random part of the input and rescale it to a specific size.
This transform first crops a random portion of the input and then resizes it to a specified size.
The size of the random crop is controlled by the 'min_max_height' parameter.
Args:
min_max_height (tuple[int, int]): Minimum and maximum height of the crop in pixels.
size (tuple[int, int]): Target size for the output image, i.e. (height, width) after crop and resize.
w2h_ratio (float): Aspect ratio (width/height) of crop. Default: 1.0
interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
mask_interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm for mask.
Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_NEAREST.
p (float): Probability of applying the transform. Default: 1.0
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Note:
- The crop size is randomly selected for each execution within the range specified by 'min_max_height'.
- The aspect ratio of the crop is determined by the 'w2h_ratio' parameter.
- After cropping, the result is resized to the specified 'size'.
- Bounding boxes that end up fully outside the cropped area will be removed.
- Keypoints that end up outside the cropped area will be removed.
- This transform differs from RandomResizedCrop in that it allows more control over the crop size
through the 'min_max_height' parameter, rather than using a scale parameter.
Mathematical Details:
1. A random crop height h is sampled from the range [min_max_height[0], min_max_height[1]].
2. The crop width w is calculated as: w = h * w2h_ratio
3. A random location for the crop is selected within the input image.
4. The image is cropped to the size (h, w).
5. The crop is then resized to the specified 'size'.
Example:
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.RandomSizedCrop(
... min_max_height=(50, 80),
... size=(64, 64),
... w2h_ratio=1.0,
... interpolation=cv2.INTER_LINEAR,
... p=1.0
... )
>>> result = transform(image=image)
>>> transformed_image = result['image']
# transformed_image will be a 64x64 image, resulting from a crop with height
# between 50 and 80 pixels, and the same aspect ratio as specified by w2h_ratio,
# taken from a random location in the original image and then resized.
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
class InitSchema(BaseTransformInitSchema):
interpolation: InterpolationType
mask_interpolation: InterpolationType
min_max_height: OnePlusIntRangeType
w2h_ratio: Annotated[float, Field(gt=0)]
width: int | None = Field(
None,
deprecated=(
"Initializing with 'size' as an integer and a separate 'width' is deprecated. "
"Please use a tuple (height, width) for the 'size' argument."
),
)
height: int | None = Field(
None,
deprecated=(
"Initializing with 'height' and 'width' is deprecated. "
"Please use a tuple (height, width) for the 'size' argument."
),
)
size: ScaleIntType | None
@model_validator(mode="after")
def process(self) -> Self:
if isinstance(self.size, int):
if isinstance(self.width, int):
self.size = (self.size, self.width)
else:
msg = "If size is an integer, width as integer must be specified."
raise TypeError(msg)
if self.size is None:
if self.height is None or self.width is None:
message = "If 'size' is not provided, both 'height' and 'width' must be specified."
raise ValueError(message)
self.size = (self.height, self.width)
return self
def __init__(
self,
min_max_height: tuple[int, int],
# NOTE @zetyquickly: when (width, height) are deprecated, make 'size' non optional
size: ScaleIntType | None = None,
width: int | None = None,
height: int | None = None,
*,
w2h_ratio: float = 1.0,
interpolation: int = cv2.INTER_LINEAR,
mask_interpolation: int = cv2.INTER_NEAREST,
p: float = 1.0,
always_apply: bool | None = None,
):
super().__init__(
size=cast(tuple[int, int], size),
interpolation=interpolation,
mask_interpolation=mask_interpolation,
p=p,
)
self.min_max_height = min_max_height
self.w2h_ratio = w2h_ratio
def get_params_dependent_on_data(
self,
params: dict[str, Any],
data: dict[str, Any],
) -> dict[str, tuple[int, int, int, int]]:
image_shape = params["shape"][:2]
crop_height = self.py_random.randint(*self.min_max_height)
crop_width = int(crop_height * self.w2h_ratio)
crop_shape = (crop_height, crop_width)
h_start = self.py_random.random()
w_start = self.py_random.random()
crop_coords = fcrops.get_crop_coords(image_shape, crop_shape, h_start, w_start)
return {"crop_coords": crop_coords}
def get_transform_init_args_names(self) -> tuple[str, ...]:
return (*super().get_transform_init_args_names(), "min_max_height", "w2h_ratio")
domain_adaptation
special
¶
functional
¶
def apply_histogram (img, reference_image, blend_ratio)
[view source on GitHub]¶
Apply histogram matching to an input image using a reference image and blend the result.
This function performs histogram matching between the input image and a reference image, then blends the result with the original input image based on the specified blend ratio.
Parameters:
Name | Type | Description |
---|---|---|
img | np.ndarray | The input image to be transformed. Can be either grayscale or RGB. Supported dtypes: uint8, float32 (values should be in [0, 1] range). |
reference_image | np.ndarray | The reference image used for histogram matching. Should have the same number of channels as the input image. Supported dtypes: uint8, float32 (values should be in [0, 1] range). |
blend_ratio | float | The ratio for blending the matched image with the original image. Should be in the range [0, 1], where 0 means no change and 1 means full histogram matching. |
Returns:
Type | Description |
---|---|
np.ndarray | The transformed image after histogram matching and blending. The output will have the same shape and dtype as the input image. |
Supported image types:
- Grayscale images: 2D arrays
- RGB images: 3D arrays with 3 channels
- Multispectral images: 3D arrays with more than 3 channels
Note
- If the input and reference images have different sizes, the reference image will be resized to match the input image's dimensions.
- The function uses a custom implementation of histogram matching based on OpenCV and NumPy.
- The @clipped and @preserve_channel_dim decorators ensure the output is within the valid range and maintains the original number of dimensions.
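A hedged usage sketch of this functional helper, assuming the module path given in the "Source code" reference below; in augmentation pipelines the same logic would typically be reached through the public HistogramMatching transform rather than called directly.
import numpy as np
from albumentations.augmentations.domain_adaptation.functional import apply_histogram

# Synthetic input and reference; their sizes may differ, since the
# reference is resized internally to match the input.
img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
reference = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)

# blend_ratio=0.5 blends halfway between the original image and the
# fully histogram-matched result.
out = apply_histogram(img, reference, blend_ratio=0.5)
assert out.shape == img.shape and out.dtype == img.dtype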
Source code in albumentations/augmentations/domain_adaptation/functional.py
@clipped
@preserve_channel_dim
def apply_histogram(img: np.ndarray, reference_image: np.ndarray, blend_ratio: float) -> np.ndarray:
"""Apply histogram matching to an input image using a reference image and blend the result.
This function performs histogram matching between the input image and a reference image,
then blends the result with the original input image based on the specified blend ratio.
Args:
img (np.ndarray): The input image to be transformed. Can be either grayscale or RGB.
Supported dtypes: uint8, float32 (values should be in [0, 1] range).
reference_image (np.ndarray): The reference image used for histogram matching.
Should have the same number of channels as the input image.
Supported dtypes: uint8, float32 (values should be in [0, 1] range).
blend_ratio (float): The ratio for blending the matched image with the original image.
Should be in the range [0, 1], where 0 means no change and 1 means full histogram matching.
Returns:
np.ndarray: The transformed image after histogram matching and blending.
The output will have the same shape and dtype as the input image.
Supported image types:
- Grayscale images: 2D arrays
- RGB images: 3D arrays with 3 channels
- Multispectral images: 3D arrays with more than 3 channels
Note:
- If the input and reference images have different sizes, the reference image
will be resized to match the input image's dimensions.
- The function uses a custom implementation of histogram matching based on OpenCV and NumPy.
- The @clipped and @preserve_channel_dim decorators ensure the output is within
the valid range and maintains the original number of dimensions.
"""
# Resize reference image only if necessary
if img.shape[:2] != reference_image.shape[:2]:
reference_image = cv2.resize(reference_image, dsize=(img.shape[1], img.shape[0]))