Full API Reference on a single page¶
Pixel-level transforms¶
Here is a list of all available pixel-level transforms. You can apply a pixel-level transform to any target; under the hood, the transform changes only the input image and returns all other targets, such as masks, bounding boxes, or keypoints, unchanged. A usage sketch follows the list.
- AdvancedBlur
- Blur
- CLAHE
- ChannelDropout
- ChannelShuffle
- ChromaticAberration
- ColorJitter
- Defocus
- Downscale
- Emboss
- Equalize
- FDA
- FancyPCA
- FromFloat
- GaussNoise
- GaussianBlur
- GlassBlur
- HistogramMatching
- HueSaturationValue
- ISONoise
- ImageCompression
- InvertImg
- MedianBlur
- MotionBlur
- MultiplicativeNoise
- Normalize
- PixelDistributionAdaptation
- Posterize
- RGBShift
- RandomBrightnessContrast
- RandomFog
- RandomGamma
- RandomGravel
- RandomRain
- RandomShadow
- RandomSnow
- RandomSunFlare
- RandomToneCurve
- RingingOvershoot
- Sharpen
- Solarize
- Spatter
- Superpixels
- TemplateTransform
- ToFloat
- ToGray
- ToRGB
- ToSepia
- UnsharpMask
- ZoomBlur
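A minimal usage sketch (illustrative; RandomBrightnessContrast stands in for any transform from the list above): a pixel-level transform modifies the image while other targets pass through unchanged.
import albumentations as A
import numpy as np

transform = A.Compose([A.RandomBrightnessContrast(p=1.0)])
image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
mask = np.zeros((100, 100), dtype=np.uint8)
out = transform(image=image, mask=mask)
# The image is augmented; the mask is returned as-is.
assert np.array_equal(out["mask"], mask)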
Spatial-level transforms¶
Here is a table of spatial-level transforms and the targets they support. If you try to apply a spatial-level transform to an unsupported target, Albumentations will raise an error. A usage sketch follows the table.
Transform | Image | Mask | BBoxes | Keypoints | Global Label |
---|---|---|---|---|---|
Affine | ✓ | ✓ | ✓ | ✓ | |
BBoxSafeRandomCrop | ✓ | ✓ | ✓ | ||
CenterCrop | ✓ | ✓ | ✓ | ✓ | |
CoarseDropout | ✓ | ✓ | ✓ | ||
Crop | ✓ | ✓ | ✓ | ✓ | |
CropAndPad | ✓ | ✓ | ✓ | ✓ | |
CropNonEmptyMaskIfExists | ✓ | ✓ | ✓ | ✓ | |
ElasticTransform | ✓ | ✓ | ✓ | ||
Flip | ✓ | ✓ | ✓ | ✓ | |
GridDistortion | ✓ | ✓ | ✓ | ||
GridDropout | ✓ | ✓ | |||
HorizontalFlip | ✓ | ✓ | ✓ | ✓ | |
Lambda | ✓ | ✓ | ✓ | ✓ | ✓ |
LongestMaxSize | ✓ | ✓ | ✓ | ✓ | |
MaskDropout | ✓ | ✓ | |||
MixUp | ✓ | ✓ | ✓ | ||
NoOp | ✓ | ✓ | ✓ | ✓ | ✓ |
OpticalDistortion | ✓ | ✓ | ✓ | ||
PadIfNeeded | ✓ | ✓ | ✓ | ✓ | |
Perspective | ✓ | ✓ | ✓ | ✓ | |
PiecewiseAffine | ✓ | ✓ | ✓ | ✓ | |
PixelDropout | ✓ | ✓ | |||
RandomCrop | ✓ | ✓ | ✓ | ✓ | |
RandomCropFromBorders | ✓ | ✓ | ✓ | ✓ | |
RandomGridShuffle | ✓ | ✓ | ✓ | ||
RandomResizedCrop | ✓ | ✓ | ✓ | ✓ | |
RandomRotate90 | ✓ | ✓ | ✓ | ✓ | |
RandomScale | ✓ | ✓ | ✓ | ✓ | |
RandomSizedBBoxSafeCrop | ✓ | ✓ | ✓ | ||
RandomSizedCrop | ✓ | ✓ | ✓ | ✓ | |
Resize | ✓ | ✓ | ✓ | ✓ | |
Rotate | ✓ | ✓ | ✓ | ✓ | |
SafeRotate | ✓ | ✓ | ✓ | ✓ | |
ShiftScaleRotate | ✓ | ✓ | ✓ | ✓ | |
SmallestMaxSize | ✓ | ✓ | ✓ | ✓ | |
Transpose | ✓ | ✓ | ✓ | ✓ | |
VerticalFlip | ✓ | ✓ | ✓ | ✓ | |
XYMasking | ✓ | ✓ | ✓ |
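A minimal usage sketch (illustrative; the pascal_voc bbox format and xy keypoint format are arbitrary choices for the example): a spatial-level transform updates every supported target consistently.
import albumentations as A
import numpy as np

transform = A.Compose(
    [A.HorizontalFlip(p=1.0)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
    keypoint_params=A.KeypointParams(format="xy"),
)
image = np.random.randint(0, 256, (100, 200, 3), dtype=np.uint8)
out = transform(
    image=image,
    mask=np.zeros((100, 200), dtype=np.uint8),
    bboxes=[(10, 20, 60, 80)],
    labels=["cat"],
    keypoints=[(30, 40)],
)
# The bounding box and the keypoint are flipped together with the image and mask.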
augmentations special¶
blur special¶
transforms¶
class AdvancedBlur
(blur_limit=(3, 7), sigma_x_limit=(0.2, 1.0), sigma_y_limit=(0.2, 1.0), sigmaX_limit=None, sigmaY_limit=None, rotate_limit=90, beta_limit=(0.5, 8.0), noise_limit=(0.9, 1.1), always_apply=False, p=0.5)
[view source on GitHub] ¶
Blurs the input image using a Generalized Normal filter with randomly selected parameters.
This transform also adds multiplicative noise to the generated kernel before convolution, affecting the image in a unique way that combines blurring and noise injection for enhanced data augmentation.
Parameters:
Name | Type | Description |
---|---|---|
blur_limit | ScaleIntType | Maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`. If a single value is provided, `blur_limit` will be in the range (0, blur_limit). Defaults to (3, 7). |
sigma_x_limit | ScaleFloatType | Gaussian kernel standard deviation for the X dimension. Must be in range [0, inf). If a single value is provided, `sigma_x_limit` will be in the range (0, sigma_limit). If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Defaults to (0.2, 1.0). |
sigma_y_limit | ScaleFloatType | Gaussian kernel standard deviation for the Y dimension. Must follow the same rules as `sigma_x_limit`. Defaults to (0.2, 1.0). |
rotate_limit | ScaleIntType | Range from which a random angle used to rotate the Gaussian kernel is picked. If limit is a single int, an angle is picked from (-rotate_limit, rotate_limit). Defaults to (-90, 90). |
beta_limit | ScaleFloatType | Distribution shape parameter. 1 represents the normal distribution. Values below 1.0 make distribution tails heavier than normal, and values above 1.0 make it lighter than normal. Defaults to (0.5, 8.0). |
noise_limit | ScaleFloatType | Multiplicative factor that controls the strength of kernel noise. Must be positive and preferably centered around 1.0. If a single value is provided, `noise_limit` will be in the range (0, noise_limit). Defaults to (0.9, 1.1). |
p | float | Probability of applying the transform. Defaults to 0.5. |
Reference
"Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data", available at https://arxiv.org/abs/2107.10833
Targets
image
Image types: uint8, float32
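Example (a minimal sketch using the defaults shown above):
import albumentations as A
import numpy as np

# Apply AdvancedBlur with its default parameter ranges to a single image.
transform = A.AdvancedBlur(p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
blurred = transform(image=image)["image"]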
Source code in albumentations/augmentations/blur/transforms.py
class AdvancedBlur(ImageOnlyTransform):
"""Blurs the input image using a Generalized Normal filter with randomly selected parameters.
This transform also adds multiplicative noise to the generated kernel before convolution,
affecting the image in a unique way that combines blurring and noise injection for enhanced
data augmentation.
Args:
blur_limit (ScaleIntType, optional): Maximum Gaussian kernel size for blurring the input image.
Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma
as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
If a single value is provided, `blur_limit` will be in the range (0, blur_limit).
Defaults to (3, 7).
sigma_x_limit (ScaleFloatType, optional): Gaussian kernel standard deviation for the X dimension.
Must be in range [0, inf). If a single value is provided, `sigma_x_limit` will be in the range
(0, sigma_limit). If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`.
Defaults to (0.2, 1.0).
sigma_y_limit (ScaleFloatType, optional): Gaussian kernel standard deviation for the Y dimension.
Must follow the same rules as `sigma_x_limit`.
Defaults to (0.2, 1.0).
rotate_limit (ScaleIntType, optional): Range from which a random angle used to rotate the Gaussian kernel
is picked. If limit is a single int, an angle is picked from (-rotate_limit, rotate_limit).
Defaults to (-90, 90).
beta_limit (ScaleFloatType, optional): Distribution shape parameter. 1 represents the normal distribution.
Values below 1.0 make distribution tails heavier than normal, and values above 1.0 make it
lighter than normal.
Defaults to (0.5, 8.0).
noise_limit (ScaleFloatType, optional): Multiplicative factor that controls the strength of kernel noise.
Must be positive and preferably centered around 1.0. If a single value is provided,
`noise_limit` will be in the range (0, noise_limit).
Defaults to (0.9, 1.1).
p (float, optional): Probability of applying the transform.
Defaults to 0.5.
Reference:
"Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data",
available at https://arxiv.org/abs/2107.10833
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
blur_limit: ScaleIntType = (3, 7),
sigma_x_limit: ScaleFloatType = (0.2, 1.0),
sigma_y_limit: ScaleFloatType = (0.2, 1.0),
sigmaX_limit: Optional[ScaleFloatType] = None, # noqa: N803
sigmaY_limit: Optional[ScaleFloatType] = None, # noqa: N803
rotate_limit: ScaleIntType = 90,
beta_limit: ScaleFloatType = (0.5, 8.0),
noise_limit: ScaleFloatType = (0.9, 1.1),
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.blur_limit = cast(Tuple[int, int], to_tuple(blur_limit, 3))
# Handle deprecation of sigmaX_limit and sigmaY_limit
if sigmaX_limit is not None:
warnings.warn("sigmaX_limit is deprecated; use sigma_x_limit instead.", DeprecationWarning)
sigma_x_limit = sigmaX_limit
if sigmaY_limit is not None:
warnings.warn("sigmaY_limit is deprecated; use sigma_y_limit instead.", DeprecationWarning)
sigma_y_limit = sigmaY_limit
self.sigma_x_limit = self.__check_values(to_tuple(sigma_x_limit, 0.0), name="sigma_x_limit")
self.sigma_y_limit = self.__check_values(to_tuple(sigma_y_limit, 0.0), name="sigma_y_limit")
self.rotate_limit = to_tuple(rotate_limit)
self.beta_limit = to_tuple(beta_limit, low=0.0)
self.noise_limit = self.__check_values(to_tuple(noise_limit, 0.0), name="noise_limit")
if (self.blur_limit[0] != 0 and self.blur_limit[0] % 2 != 1) or (
self.blur_limit[1] != 0 and self.blur_limit[1] % 2 != 1
):
msg = "AdvancedBlur supports only odd blur limits."
raise ValueError(msg)
if self.sigma_x_limit[0] == 0 and self.sigma_y_limit[0] == 0:
msg = "sigma_x_limit and sigma_y_limit minimum value cannot be both equal to 0."
raise ValueError(msg)
if not (self.beta_limit[0] < 1.0 < self.beta_limit[1]):
msg = "Beta limit is expected to include 1.0."
raise ValueError(msg)
@staticmethod
def __check_values(
value: Sequence[float], name: str, bounds: Tuple[float, float] = (0, float("inf"))
) -> Sequence[float]:
if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
raise ValueError(f"{name} values should be between {bounds}")
return value
def apply(self, img: np.ndarray, kernel: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
return FMain.convolve(img, kernel=kernel)
def get_params(self) -> Dict[str, np.ndarray]:
ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
sigma_x = random.uniform(*self.sigma_x_limit)
sigma_y = random.uniform(*self.sigma_y_limit)
angle = np.deg2rad(random.uniform(*self.rotate_limit))
# Split into 2 cases to avoid selection of narrow kernels (beta > 1) too often.
beta = (
random.uniform(self.beta_limit[0], 1) if random.random() < HALF else random.uniform(1, self.beta_limit[1])
)
noise_matrix = random_utils.uniform(self.noise_limit[0], self.noise_limit[1], size=[ksize, ksize])
# Generate mesh grid centered at zero.
ax = np.arange(-ksize // 2 + 1.0, ksize // 2 + 1.0)
# > Shape (ksize, ksize, 2)
grid = np.stack(np.meshgrid(ax, ax), axis=-1)
# Calculate rotated sigma matrix
d_matrix = np.array([[sigma_x**2, 0], [0, sigma_y**2]])
u_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
sigma_matrix = np.dot(u_matrix, np.dot(d_matrix, u_matrix.T))
inverse_sigma = np.linalg.inv(sigma_matrix)
# Described in "Parameter Estimation For Multivariate Generalized Gaussian Distributions"
kernel = np.exp(-0.5 * np.power(np.sum(np.dot(grid, inverse_sigma) * grid, 2), beta))
# Add noise
kernel *= noise_matrix
# Normalize kernel
kernel = kernel.astype(np.float32) / np.sum(kernel)
return {"kernel": kernel}
def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str, str]:
return (
"blur_limit",
"sigma_x_limit",
"sigma_y_limit",
"rotate_limit",
"beta_limit",
"noise_limit",
)
class Blur
(blur_limit=7, always_apply=False, p=0.5)
[view source on GitHub] ¶
Blur the input image using a random-sized kernel.
Parameters:
Name | Type | Description |
---|---|---|
blur_limit | Union[int, Tuple[int, int]] | maximum kernel size for blurring the input image. Should be in range [3, inf). Default: (3, 7). |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
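Example (a minimal sketch; the kernel size is sampled from the odd values in [3, 7]):
import albumentations as A
import numpy as np

transform = A.Blur(blur_limit=7, p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
blurred = transform(image=image)["image"]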
Source code in albumentations/augmentations/blur/transforms.py
class Blur(ImageOnlyTransform):
"""Blur the input image using a random-sized kernel.
Args:
blur_limit: maximum kernel size for blurring the input image.
Should be in range [3, inf). Default: (3, 7).
p: probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(self, blur_limit: ScaleIntType = 7, always_apply: bool = False, p: float = 0.5):
super().__init__(always_apply, p)
self.blur_limit = cast(Tuple[int, int], to_tuple(blur_limit, 3))
def apply(self, img: np.ndarray, kernel: int = 3, **params: Any) -> np.ndarray:
return F.blur(img, kernel)
def get_params(self) -> Dict[str, Any]:
return {"ksize": int(random.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2))))}
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return ("blur_limit",)
class Defocus
(radius=(3, 10), alias_blur=(0.1, 0.5), always_apply=False, p=0.5)
[view source on GitHub] ¶
Apply defocus transform. See https://arxiv.org/abs/1903.12261.
Parameters:
Name | Type | Description |
---|---|---|
radius | (int, int) or int | range for radius of defocusing. If limit is a single int, the range will be [1, limit]. Default: (3, 10). |
alias_blur | (float, float) or float | range for alias_blur of defocusing (sigma of gaussian blur). If limit is a single float, the range will be (0, limit). Default: (0.1, 0.5). |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: Any
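Example (a minimal sketch; radius and alias_blur are sampled from the given ranges on each call):
import albumentations as A
import numpy as np

transform = A.Defocus(radius=(3, 10), alias_blur=(0.1, 0.5), p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
defocused = transform(image=image)["image"]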
Source code in albumentations/augmentations/blur/transforms.py
class Defocus(ImageOnlyTransform):
"""Apply defocus transform. See https://arxiv.org/abs/1903.12261.
Args:
radius ((int, int) or int): range for radius of defocusing.
If limit is a single int, the range will be [1, limit]. Default: (3, 10).
alias_blur ((float, float) or float): range for alias_blur of defocusing (sigma of gaussian blur).
If limit is a single float, the range will be (0, limit). Default: (0.1, 0.5).
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
Any
"""
def __init__(
self,
radius: ScaleIntType = (3, 10),
alias_blur: ScaleFloatType = (0.1, 0.5),
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.radius = to_tuple(radius, low=1)
self.alias_blur = to_tuple(alias_blur, low=0)
if self.radius[0] <= 0:
msg = "Parameter radius must be positive"
raise ValueError(msg)
if self.alias_blur[0] < 0:
msg = "Parameter alias_blur must be non-negative"
raise ValueError(msg)
def apply(self, img: np.ndarray, radius: int = 3, alias_blur: float = 0.5, **params: Any) -> np.ndarray:
return F.defocus(img, radius, alias_blur)
def get_params(self) -> Dict[str, Any]:
return {
"radius": random_utils.randint(self.radius[0], self.radius[1] + 1),
"alias_blur": random_utils.uniform(self.alias_blur[0], self.alias_blur[1]),
}
def get_transform_init_args_names(self) -> Tuple[str, str]:
return ("radius", "alias_blur")
class GaussianBlur
(blur_limit=(3, 7), sigma_limit=0, always_apply=False, p=0.5)
[view source on GitHub] ¶
Blur the input image using a Gaussian filter with a random kernel size.
Parameters:
Name | Type | Description |
---|---|---|
blur_limit | int, (int, int) | maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`. If a single value is provided, `blur_limit` will be in the range (0, blur_limit). Default: (3, 7). |
sigma_limit | float, (float, float) | Gaussian kernel standard deviation. Must be in range [0, inf). If a single value is provided, `sigma_limit` will be in the range (0, sigma_limit). If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0. |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
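Example (a minimal sketch; with sigma_limit=0, sigma is derived from the sampled kernel size):
import albumentations as A
import numpy as np

transform = A.GaussianBlur(blur_limit=(3, 7), sigma_limit=0, p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
blurred = transform(image=image)["image"]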
Source code in albumentations/augmentations/blur/transforms.py
class GaussianBlur(ImageOnlyTransform):
"""Blur the input image using a Gaussian filter with a random kernel size.
Args:
blur_limit (int, (int, int)): maximum Gaussian kernel size for blurring the input image.
Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
If a single value is provided, `blur_limit` will be in the range (0, blur_limit).
Default: (3, 7).
sigma_limit (float, (float, float)): Gaussian kernel standard deviation. Must be in range [0, inf).
If a single value is provided, `sigma_limit` will be in the range (0, sigma_limit).
If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
blur_limit: ScaleIntType = (3, 7),
sigma_limit: ScaleFloatType = 0,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.blur_limit = cast(Tuple[int, int], to_tuple(blur_limit, 0))
self.sigma_limit = to_tuple(sigma_limit if sigma_limit is not None else 0, 0)
if self.blur_limit[0] == 0 and self.sigma_limit[0] == 0:
self.blur_limit = 3, max(3, self.blur_limit[1])
warnings.warn(
"blur_limit and sigma_limit minimum value can not be both equal to 0. "
"blur_limit minimum value changed to 3."
)
if (self.blur_limit[0] != 0 and self.blur_limit[0] % 2 != 1) or (
self.blur_limit[1] != 0 and self.blur_limit[1] % 2 != 1
):
msg = "GaussianBlur supports only odd blur limits."
raise ValueError(msg)
def apply(self, img: np.ndarray, ksize: int = 3, sigma: float = 0, **params: Any) -> np.ndarray:
return F.gaussian_blur(img, ksize, sigma=sigma)
def get_params(self) -> Dict[str, float]:
ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1)
if ksize != 0 and ksize % 2 != 1:
ksize = (ksize + 1) % (self.blur_limit[1] + 1)
return {"ksize": ksize, "sigma": random.uniform(*self.sigma_limit)}
def get_transform_init_args_names(self) -> Tuple[str, str]:
return ("blur_limit", "sigma_limit")
class GlassBlur
(sigma=0.7, max_delta=4, iterations=2, always_apply=False, mode='fast', p=0.5)
[view source on GitHub] ¶
Apply glass noise to the input image.
Parameters:
Name | Type | Description |
---|---|---|
sigma | float | standard deviation for Gaussian kernel. |
max_delta | int | max distance between pixels which are swapped. |
iterations | int | number of repeats. Should be in range [1, inf). Default: 2. |
mode | str | mode of computation: fast or exact. Default: "fast". |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
Reference: https://arxiv.org/abs/1903.12261, https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py
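Example (a minimal sketch using the defaults; "fast" mode trades exactness for speed):
import albumentations as A
import numpy as np

transform = A.GlassBlur(sigma=0.7, max_delta=4, iterations=2, mode="fast", p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
out = transform(image=image)["image"]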
Source code in albumentations/augmentations/blur/transforms.py
class GlassBlur(Blur):
"""Apply glass noise to the input image.
Args:
sigma (float): standard deviation for Gaussian kernel.
max_delta (int): max distance between pixels which are swapped.
iterations (int): number of repeats.
Should be in range [1, inf). Default: 2.
mode (str): mode of computation: fast or exact. Default: "fast".
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
Reference:
| https://arxiv.org/abs/1903.12261
| https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py
"""
def __init__(
self,
sigma: float = 0.7,
max_delta: int = 4,
iterations: int = 2,
always_apply: bool = False,
mode: str = "fast",
p: float = 0.5,
):
super().__init__(always_apply=always_apply, p=p)
if iterations < 1:
raise ValueError(f"Iterations should be more or equal to 1, but we got {iterations}")
if mode not in ["fast", "exact"]:
raise ValueError(f"Mode should be 'fast' or 'exact', but we got {mode}")
self.sigma = sigma
self.max_delta = max_delta
self.iterations = iterations
self.mode = mode
def apply(self, img: np.ndarray, *args: Any, dxy: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
if dxy is None:
msg = "dxy is None"
raise ValueError(msg)
return F.glass_blur(img, self.sigma, self.max_delta, self.iterations, dxy, self.mode)
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, np.ndarray]:
img = params["image"]
# generate array containing all necessary values for transformations
width_pixels = img.shape[0] - self.max_delta * 2
height_pixels = img.shape[1] - self.max_delta * 2
total_pixels = int(width_pixels * height_pixels)
dxy = random_utils.randint(-self.max_delta, self.max_delta, size=(total_pixels, self.iterations, 2))
return {"dxy": dxy}
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
return ("sigma", "max_delta", "iterations", "mode")
@property
def targets_as_params(self) -> List[str]:
return ["image"]
class MedianBlur
(blur_limit=7, always_apply=False, p=0.5)
[view source on GitHub] ¶
Blur the input image using a median filter with a random aperture linear size.
Parameters:
Name | Type | Description |
---|---|---|
blur_limit | int | maximum aperture linear size for blurring the input image. Must be odd and in range [3, inf). Default: (3, 7). |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
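Example (a minimal sketch; the aperture size is sampled from the odd values in [3, 7]):
import albumentations as A
import numpy as np

transform = A.MedianBlur(blur_limit=7, p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
out = transform(image=image)["image"]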
Source code in albumentations/augmentations/blur/transforms.py
class MedianBlur(Blur):
"""Blur the input image using a median filter with a random aperture linear size.
Args:
blur_limit (int): maximum aperture linear size for blurring the input image.
Must be odd and in range [3, inf). Default: (3, 7).
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(self, blur_limit: ScaleIntType = 7, always_apply: bool = False, p: float = 0.5):
super().__init__(blur_limit, always_apply, p)
if self.blur_limit[0] % 2 != 1 or self.blur_limit[1] % 2 != 1:
msg = "MedianBlur supports only odd blur limits."
raise ValueError(msg)
def apply(self, img: np.ndarray, kernel: int = 3, **params: Any) -> np.ndarray:
return F.median_blur(img, kernel)
class MotionBlur
(blur_limit=7, allow_shifted=True, always_apply=False, p=0.5)
[view source on GitHub] ¶
Apply motion blur to the input image using a random-sized kernel.
Parameters:
Name | Type | Description |
---|---|---|
blur_limit | int | maximum kernel size for blurring the input image. Should be in range [3, inf). Default: (3, 7). |
allow_shifted | bool | if set to False, creates only non-shifted (centered) kernels; otherwise kernels may be randomly shifted. Default: True. |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
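Example (a minimal sketch; a random line-shaped kernel simulates camera motion):
import albumentations as A
import numpy as np

transform = A.MotionBlur(blur_limit=7, allow_shifted=True, p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
out = transform(image=image)["image"]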
Source code in albumentations/augmentations/blur/transforms.py
class MotionBlur(Blur):
"""Apply motion blur to the input image using a random-sized kernel.
Args:
blur_limit (int): maximum kernel size for blurring the input image.
Should be in range [3, inf). Default: (3, 7).
allow_shifted (bool): if set to False, creates only non-shifted (centered) kernels;
otherwise kernels may be randomly shifted. Default: True.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
blur_limit: ScaleIntType = 7,
allow_shifted: bool = True,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(blur_limit=blur_limit, always_apply=always_apply, p=p)
self.allow_shifted = allow_shifted
if not allow_shifted and (self.blur_limit[0] % 2 != 1 or self.blur_limit[1] % 2 != 1):
raise ValueError(f"Blur limit must be odd when allow_shifted=False. Got: {self.blur_limit}")
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return (*super().get_transform_init_args_names(), "allow_shifted")
def apply(self, img: np.ndarray, kernel: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
return FMain.convolve(img, kernel=kernel)
def get_params(self) -> Dict[str, Any]:
ksize = random.choice(list(range(self.blur_limit[0], self.blur_limit[1] + 1, 2)))
if ksize <= TWO:
raise ValueError(f"ksize must be > 2. Got: {ksize}")
kernel = np.zeros((ksize, ksize), dtype=np.uint8)
x1, x2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)
if x1 == x2:
y1, y2 = random.sample(range(ksize), 2)
else:
y1, y2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)
def make_odd_val(v1: int, v2: int) -> Tuple[int, int]:
len_v = abs(v1 - v2) + 1
if len_v % 2 != 1:
if v2 > v1:
v2 -= 1
else:
v1 -= 1
return v1, v2
if not self.allow_shifted:
x1, x2 = make_odd_val(x1, x2)
y1, y2 = make_odd_val(y1, y2)
xc = (x1 + x2) / 2
yc = (y1 + y2) / 2
center = ksize / 2 - 0.5
dx = xc - center
dy = yc - center
x1, x2 = (int(i - dx) for i in [x1, x2])
y1, y2 = (int(i - dy) for i in [y1, y2])
cv2.line(kernel, (x1, y1), (x2, y2), 1, thickness=1)
# Normalize kernel
return {"kernel": kernel.astype(np.float32) / np.sum(kernel)}
class ZoomBlur
(max_factor=1.31, step_factor=(0.01, 0.03), always_apply=False, p=0.5)
[view source on GitHub] ¶
Apply zoom blur transform. See https://arxiv.org/abs/1903.12261.
Parameters:
Name | Type | Description |
---|---|---|
max_factor | (float, float) or float | range for max factor for blurring. If max_factor is a single float, the range will be (1, limit). Default: (1, 1.31). All max_factor values should be larger than 1. |
step_factor | (float, float) or float | If a single float is provided, it will be used as the step parameter for np.arange. If a tuple of floats, step_factor will be sampled from `[step_factor[0], step_factor[1])`. Default: (0.01, 0.03). All step_factor values should be positive. |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: Any
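Example (a minimal sketch; the transform averages progressively zoomed copies of the image up to a sampled max factor):
import albumentations as A
import numpy as np

transform = A.ZoomBlur(max_factor=1.31, step_factor=(0.01, 0.03), p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
out = transform(image=image)["image"]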
Source code in albumentations/augmentations/blur/transforms.py
class ZoomBlur(ImageOnlyTransform):
"""Apply zoom blur transform. See https://arxiv.org/abs/1903.12261.
Args:
max_factor ((float, float) or float): range for max factor for blurring.
If max_factor is a single float, the range will be (1, limit). Default: (1, 1.31).
All max_factor values should be larger than 1.
step_factor ((float, float) or float): If single float will be used as step parameter for np.arange.
If tuple of float step_factor will be in range `[step_factor[0], step_factor[1])`. Default: (0.01, 0.03).
All step_factor values should be positive.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
Any
"""
def __init__(
self,
max_factor: ScaleFloatType = 1.31,
step_factor: ScaleFloatType = (0.01, 0.03),
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.max_factor = to_tuple(max_factor, low=1.0)
self.step_factor = to_tuple(step_factor, step_factor)
if self.max_factor[0] < 1:
msg = "Max factor must be larger or equal 1"
raise ValueError(msg)
if self.step_factor[0] <= 0:
msg = "Step factor must be positive"
raise ValueError(msg)
def apply(self, img: np.ndarray, zoom_factors: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
if zoom_factors is None:
msg = "zoom_factors is None"
raise ValueError(msg)
return F.zoom_blur(img, zoom_factors)
def get_params(self) -> Dict[str, Any]:
max_factor = random.uniform(self.max_factor[0], self.max_factor[1])
step_factor = random.uniform(self.step_factor[0], self.step_factor[1])
return {"zoom_factors": np.arange(1.0, max_factor, step_factor)}
def get_transform_init_args_names(self) -> Tuple[str, str]:
return ("max_factor", "step_factor")
crops special¶
functional¶
def bbox_crop (bbox, x_min, y_min, x_max, y_max, rows, cols)
[view source on GitHub]¶
Crop a bounding box.
Parameters:
Name | Type | Description |
---|---|---|
bbox | Tuple[float, float, float, float] | A bounding box `(x_min, y_min, x_max, y_max)`. |
x_min | int | Left edge of the crop (pixels). |
y_min | int | Top edge of the crop (pixels). |
x_max | int | Right edge of the crop (pixels). |
y_max | int | Bottom edge of the crop (pixels). |
rows | int | Image rows. |
cols | int | Image cols. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A cropped bounding box `(x_min, y_min, x_max, y_max)`. |
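A worked sketch (values are hypothetical; the import path follows the source location below): bounding boxes are stored in normalized coordinates, so the result is renormalized to the crop size.
from albumentations.augmentations.crops.functional import bbox_crop

# A normalized bbox on a 100x100 image, cropped to the region (10, 10)-(60, 60).
bbox = (0.2, 0.2, 0.5, 0.5)  # (x_min, y_min, x_max, y_max), normalized
cropped = bbox_crop(bbox, x_min=10, y_min=10, x_max=60, y_max=60, rows=100, cols=100)
# Denormalized: (20, 20, 50, 50) -> shifted by the crop origin: (10, 10, 40, 40)
# -> renormalized by the 50x50 crop: cropped == (0.2, 0.2, 0.8, 0.8)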
Source code in albumentations/augmentations/crops/functional.py
def bbox_crop(
bbox: BoxInternalType, x_min: int, y_min: int, x_max: int, y_max: int, rows: int, cols: int
) -> BoxInternalType:
"""Crop a bounding box.
Args:
bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
x_min:
y_min:
x_max:
y_max:
rows: Image rows.
cols: Image cols.
Returns:
A cropped bounding box `(x_min, y_min, x_max, y_max)`.
"""
crop_coords = x_min, y_min, x_max, y_max
crop_height = y_max - y_min
crop_width = x_max - x_min
return crop_bbox_by_coords(bbox, crop_coords, crop_height, crop_width, rows, cols)
def crop_bbox_by_coords (bbox, crop_coords, crop_height, crop_width, rows, cols)
[view source on GitHub]¶
Crop a bounding box using the provided crop coordinates (top-left and bottom-right corners, in pixels) and the required height and width of the crop.
Parameters:
Name | Type | Description |
---|---|---|
bbox | Tuple[float, float, float, float] | A bounding box `(x_min, y_min, x_max, y_max)`. |
crop_coords | Tuple[int, int, int, int] | Crop coordinates `(x1, y1, x2, y2)`. |
crop_height | int | Crop height. |
crop_width | int | Crop width. |
rows | int | Image rows. |
cols | int | Image cols. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A cropped bounding box `(x_min, y_min, x_max, y_max)`. |
Source code in albumentations/augmentations/crops/functional.py
def crop_bbox_by_coords(
bbox: BoxInternalType,
crop_coords: Tuple[int, int, int, int],
crop_height: int,
crop_width: int,
rows: int,
cols: int,
) -> BoxInternalType:
"""Crop a bounding box using the provided coordinates of bottom-left and top-right corners in pixels and the
required height and width of the crop.
Args:
bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
crop_coords: Crop coordinates `(x1, y1, x2, y2)`.
crop_height:
crop_width:
rows: Image rows.
cols: Image cols.
Returns:
A cropped bounding box `(x_min, y_min, x_max, y_max)`.
"""
normalized_bbox = denormalize_bbox(bbox, rows, cols)
x_min, y_min, x_max, y_max = normalized_bbox[:4]
x1, y1 = crop_coords[:2]
cropped_bbox = x_min - x1, y_min - y1, x_max - x1, y_max - y1
return cast(BoxInternalType, normalize_bbox(cropped_bbox, crop_height, crop_width))
def crop_keypoint_by_coords (keypoint, crop_coords)
[view source on GitHub]¶
Crop a keypoint using the provided crop coordinates (top-left and bottom-right corners, in pixels).
Parameters:
Name | Type | Description |
---|---|---|
keypoint | tuple | A keypoint `(x, y, angle, scale)`. |
crop_coords | tuple | Crop box coords `(x1, y1, x2, y2)`. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A keypoint `(x, y, angle, scale)`. |
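A worked sketch (values are hypothetical): keypoints are stored in pixels, so only the crop origin is subtracted.
from albumentations.augmentations.crops.functional import crop_keypoint_by_coords

keypoint = (30.0, 40.0, 0.0, 1.0)  # (x, y, angle, scale)
shifted = crop_keypoint_by_coords(keypoint, crop_coords=(10, 10, 60, 60))
# shifted == (20.0, 30.0, 0.0, 1.0)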
Source code in albumentations/augmentations/crops/functional.py
def crop_keypoint_by_coords(
keypoint: KeypointInternalType, crop_coords: Tuple[int, int, int, int]
) -> KeypointInternalType:
"""Crop a keypoint using the provided coordinates of bottom-left and top-right corners in pixels and the
required height and width of the crop.
Args:
keypoint (tuple): A keypoint `(x, y, angle, scale)`.
crop_coords (tuple): Crop box coords `(x1, y1, x2, y2)`.
Returns:
A keypoint `(x, y, angle, scale)`.
"""
x, y, angle, scale = keypoint[:4]
x1, y1 = crop_coords[:2]
return x - x1, y - y1, angle, scale
def keypoint_center_crop (keypoint, crop_height, crop_width, rows, cols)
[view source on GitHub]¶
Keypoint center crop.
Parameters:
Name | Type | Description |
---|---|---|
keypoint | Tuple[float, float, float, float] | A keypoint `(x, y, angle, scale)`. |
crop_height | int | Crop height. |
crop_width | int | Crop width. |
rows | int | Image height. |
cols | int | Image width. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A keypoint `(x, y, angle, scale)`. |
Source code in albumentations/augmentations/crops/functional.py
def keypoint_center_crop(
keypoint: KeypointInternalType, crop_height: int, crop_width: int, rows: int, cols: int
) -> KeypointInternalType:
"""Keypoint center crop.
Args:
keypoint: A keypoint `(x, y, angle, scale)`.
crop_height: Crop height.
crop_width: Crop width.
rows: Image height.
cols: Image width.
Returns:
A keypoint `(x, y, angle, scale)`.
"""
crop_coords = get_center_crop_coords(rows, cols, crop_height, crop_width)
return crop_keypoint_by_coords(keypoint, crop_coords)
def keypoint_random_crop (keypoint, crop_height, crop_width, h_start, w_start, rows, cols)
[view source on GitHub]¶
Keypoint random crop.
Parameters:
Name | Type | Description |
---|---|---|
keypoint | Tuple[float, float, float, float] | A keypoint `(x, y, angle, scale)`. |
crop_height | int | Crop height. |
crop_width | int | Crop width. |
h_start | float | Crop height start. |
w_start | float | Crop width start. |
rows | int | Image height. |
cols | int | Image width. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A keypoint `(x, y, angle, scale)`. |
Source code in albumentations/augmentations/crops/functional.py
def keypoint_random_crop(
keypoint: KeypointInternalType,
crop_height: int,
crop_width: int,
h_start: float,
w_start: float,
rows: int,
cols: int,
) -> KeypointInternalType:
"""Keypoint random crop.
Args:
keypoint (tuple): A keypoint `(x, y, angle, scale)`.
crop_height (int): Crop height.
crop_width (int): Crop width.
h_start (float): Crop height start.
w_start (float): Crop width start.
rows (int): Image height.
cols (int): Image width.
Returns:
A keypoint `(x, y, angle, scale)`.
"""
crop_coords = get_random_crop_coords(rows, cols, crop_height, crop_width, h_start, w_start)
return crop_keypoint_by_coords(keypoint, crop_coords)
transforms¶
class BBoxSafeRandomCrop
(erosion_rate=0.0, always_apply=False, p=1.0)
[view source on GitHub] ¶
Crop a random part of the input without loss of bboxes.
Parameters:
Name | Type | Description |
---|---|---|
erosion_rate | float | erosion rate applied on input image height before crop. |
p | float | probability of applying the transform. Default: 1. |
Targets
image, mask, bboxes
Image types: uint8, float32
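Example (a minimal sketch; the pascal_voc format and the label value are illustrative): the sampled crop always contains the union of the input bboxes.
import albumentations as A
import numpy as np

transform = A.Compose(
    [A.BBoxSafeRandomCrop(erosion_rate=0.2, p=1.0)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)
image = np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8)
out = transform(image=image, bboxes=[(50, 60, 200, 220)], labels=["car"])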
Source code in albumentations/augmentations/crops/transforms.py
class BBoxSafeRandomCrop(DualTransform):
"""Crop a random part of the input without loss of bboxes.
Args:
erosion_rate: erosion rate applied on input image height before crop.
p: probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)
def __init__(self, erosion_rate: float = 0.0, always_apply: bool = False, p: float = 1.0):
super().__init__(always_apply, p)
self.erosion_rate = erosion_rate
def apply(
self,
img: np.ndarray,
crop_height: int = 0,
crop_width: int = 0,
h_start: int = 0,
w_start: int = 0,
**params: Any,
) -> np.ndarray:
return F.random_crop(img, crop_height, crop_width, h_start, w_start)
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Union[int, float]]:
img_h, img_w = params["image"].shape[:2]
if len(params["bboxes"]) == 0: # less likely, this class is for use with bboxes.
erosive_h = int(img_h * (1.0 - self.erosion_rate))
crop_height = img_h if erosive_h >= img_h else random.randint(erosive_h, img_h)
return {
"h_start": random.random(),
"w_start": random.random(),
"crop_height": crop_height,
"crop_width": int(crop_height * img_w / img_h),
}
# get union of all bboxes
x, y, x2, y2 = union_of_bboxes(
width=img_w, height=img_h, bboxes=params["bboxes"], erosion_rate=self.erosion_rate
)
# find bigger region
bx, by = x * random.random(), y * random.random()
bx2, by2 = x2 + (1 - x2) * random.random(), y2 + (1 - y2) * random.random()
bw, bh = bx2 - bx, by2 - by
crop_height = img_h if bh >= 1.0 else int(img_h * bh)
crop_width = img_w if bw >= 1.0 else int(img_w * bw)
h_start = np.clip(0.0 if bh >= 1.0 else by / (1.0 - bh), 0.0, 1.0)
w_start = np.clip(0.0 if bw >= 1.0 else bx / (1.0 - bw), 0.0, 1.0)
return {"h_start": h_start, "w_start": w_start, "crop_height": crop_height, "crop_width": crop_width}
def apply_to_bbox(
self,
bbox: BoxInternalType,
crop_height: int = 0,
crop_width: int = 0,
h_start: int = 0,
w_start: int = 0,
rows: int = 0,
cols: int = 0,
**params: Any,
) -> BoxInternalType:
return F.bbox_random_crop(bbox, crop_height, crop_width, h_start, w_start, rows, cols)
@property
def targets_as_params(self) -> List[str]:
return ["image", "bboxes"]
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return ("erosion_rate",)
class CenterCrop
(height, width, always_apply=False, p=1.0)
[view source on GitHub] ¶
Crop the central part of the input.
Parameters:
Name | Type | Description |
---|---|---|
height | int | height of the crop. |
width | int | width of the crop. |
p | float | probability of applying the transform. Default: 1. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Note
It is recommended to use uint8 images as input. Otherwise the operation will require internal conversion float32 -> uint8 -> float32 that causes worse performance.
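Example (a minimal sketch):
import albumentations as A
import numpy as np

# Extract the central 100x100 region of the input.
transform = A.CenterCrop(height=100, width=100, p=1.0)
image = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
cropped = transform(image=image)["image"]
assert cropped.shape == (100, 100, 3)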
Source code in albumentations/augmentations/crops/transforms.py
class CenterCrop(DualTransform):
"""Crop the central part of the input.
Args:
height: height of the crop.
width: width of the crop.
p: probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Note:
It is recommended to use uint8 images as input.
Otherwise the operation will require internal conversion
float32 -> uint8 -> float32 that causes worse performance.
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(self, height: int, width: int, always_apply: bool = False, p: float = 1.0):
super().__init__(always_apply, p)
self.height = height
self.width = width
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
return F.center_crop(img, self.height, self.width)
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
return F.bbox_center_crop(bbox, self.height, self.width, **params)
def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
return F.keypoint_center_crop(keypoint, self.height, self.width, **params)
def get_transform_init_args_names(self) -> Tuple[str, str]:
return ("height", "width")
class Crop
(x_min=0, y_min=0, x_max=1024, y_max=1024, always_apply=False, p=1.0)
[view source on GitHub] ¶
Crop region from image.
Parameters:
Name | Type | Description |
---|---|---|
x_min | int | Minimum upper left x coordinate. |
y_min | int | Minimum upper left y coordinate. |
x_max | int | Maximum lower right x coordinate. |
y_max | int | Maximum lower right y coordinate. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
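Example (a minimal sketch; the crop region is fixed, not random):
import albumentations as A
import numpy as np

# Cut out the region (x_min, y_min)-(x_max, y_max): 100 px wide, 200 px tall.
transform = A.Crop(x_min=10, y_min=20, x_max=110, y_max=220, p=1.0)
image = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
cropped = transform(image=image)["image"]
assert cropped.shape == (200, 100, 3)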
Source code in albumentations/augmentations/crops/transforms.py
class Crop(DualTransform):
"""Crop region from image.
Args:
x_min: Minimum upper left x coordinate.
y_min: Minimum upper left y coordinate.
x_max: Maximum lower right x coordinate.
y_max: Maximum lower right y coordinate.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
x_min: int = 0,
y_min: int = 0,
x_max: int = 1024,
y_max: int = 1024,
always_apply: bool = False,
p: float = 1.0,
):
super().__init__(always_apply, p)
self.x_min = x_min
self.y_min = y_min
self.x_max = x_max
self.y_max = y_max
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
return F.crop(img, x_min=self.x_min, y_min=self.y_min, x_max=self.x_max, y_max=self.y_max)
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
return F.bbox_crop(bbox, x_min=self.x_min, y_min=self.y_min, x_max=self.x_max, y_max=self.y_max, **params)
def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
return F.crop_keypoint_by_coords(keypoint, crop_coords=(self.x_min, self.y_min, self.x_max, self.y_max))
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
return ("x_min", "y_min", "x_max", "y_max")
class CropAndPad
(px=None, percent=None, pad_mode=0, pad_cval=0, pad_cval_mask=0, keep_size=True, sample_independently=True, interpolation=1, always_apply=False, p=1.0)
[view source on GitHub] ¶
Crop and pad images by pixel amounts or fractions of image sizes. Cropping removes pixels at the sides (i.e. extracts a subimage from a given full image). Padding adds pixels to the sides (e.g. black pixels). This transformation will never crop images below a height or width of 1.
Note
This transformation automatically resizes images back to their original size. To deactivate this, add the parameter keep_size=False.
Parameters:
Name | Type | Description |
---|---|---|
px | int or tuple | The number of pixels to crop (negative values) or pad (positive values) on each side of the image. Either this or the parameter `percent` may be set, not both at the same time. See the docstring below for the accepted tuple formats. |
percent | float or tuple | The number of pixels to crop (negative values) or pad (positive values) on each side of the image, given as a fraction of the image height/width. E.g. if this is set to `-0.1`, the transformation will always crop away 10% of the image's height at both the top and the bottom, as well as 10% of the width at the left and right. Expected value range is (-1.0, inf). Either this or the parameter `px` may be set, not both at the same time. |
pad_mode | int | OpenCV border mode. |
pad_cval | number, Sequence[number] | The constant value to use if the pad mode is `BORDER_CONSTANT`. |
pad_cval_mask | number, Sequence[number] | Same as pad_cval but only for masks. |
keep_size | bool | After cropping and padding, the result image will usually have a different height/width compared to the original input image. If this parameter is set to `True`, the cropped/padded image will be resized to the input image's size, i.e. the output shape is always identical to the input shape. |
sample_independently | bool | If `False` and the values for `px`/`percent` result in exactly one probability distribution for all image sides, only one value will be sampled from that distribution and used for all sides; if `True`, four values are sampled independently, one per side. |
interpolation | OpenCV flag | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
Targets
image, mask, bboxes, keypoints
Image types: any
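Example (a minimal sketch; a negative percent crops, a positive one pads):
import albumentations as A
import numpy as np

# Crop 10% from every side; keep_size=True resizes the result back,
# so the output shape matches the input shape.
transform = A.CropAndPad(percent=-0.1, keep_size=True, p=1.0)
image = np.random.randint(0, 256, (200, 200, 3), dtype=np.uint8)
out = transform(image=image)["image"]
assert out.shape == image.shape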
Source code in albumentations/augmentations/crops/transforms.py
class CropAndPad(DualTransform):
"""Crop and pad images by pixel amounts or fractions of image sizes.
Cropping removes pixels at the sides (i.e. extracts a subimage from a given full image).
Padding adds pixels to the sides (e.g. black pixels).
This transformation will never crop images below a height or width of ``1``.
Note:
This transformation automatically resizes images back to their original size. To deactivate this, add the
parameter ``keep_size=False``.
Args:
px (int or tuple):
The number of pixels to crop (negative values) or pad (positive values)
on each side of the image. Either this or the parameter `percent` may
be set, not both at the same time.
* If ``None``, then pixel-based cropping/padding will not be used.
* If ``int``, then that exact number of pixels will always be cropped/padded.
* If a ``tuple`` of two ``int`` s with values ``a`` and ``b``,
then each side will be cropped/padded by a random amount sampled
uniformly per image and side from the interval ``[a, b]``. If
however `sample_independently` is set to ``False``, only one
value will be sampled per image and used for all sides.
* If a ``tuple`` of four entries, then the entries represent top,
right, bottom, left. Each entry may be a single ``int`` (always
crop/pad by exactly that value), a ``tuple`` of two ``int`` s
``a`` and ``b`` (crop/pad by an amount within ``[a, b]``), a
``list`` of ``int`` s (crop/pad by a random value that is
contained in the ``list``).
percent (float or tuple):
The number of pixels to crop (negative values) or pad (positive values)
on each side of the image given as a *fraction* of the image
height/width. E.g. if this is set to ``-0.1``, the transformation will
always crop away ``10%`` of the image's height at both the top and the
bottom (both ``10%`` each), as well as ``10%`` of the width at the
right and left.
Expected value range is ``(-1.0, inf)``.
Either this or the parameter `px` may be set, not both
at the same time.
* If ``None``, then fraction-based cropping/padding will not be
used.
* If ``float``, then that fraction will always be cropped/padded.
* If a ``tuple`` of two ``float`` s with values ``a`` and ``b``,
then each side will be cropped/padded by a random fraction
sampled uniformly per image and side from the interval
``[a, b]``. If however `sample_independently` is set to
``False``, only one value will be sampled per image and used for
all sides.
* If a ``tuple`` of four entries, then the entries represent top,
right, bottom, left. Each entry may be a single ``float``
(always crop/pad by exactly that percent value), a ``tuple`` of
two ``float`` s ``a`` and ``b`` (crop/pad by a fraction from
``[a, b]``), a ``list`` of ``float`` s (crop/pad by a random
value that is contained in the list).
pad_mode (int): OpenCV border mode.
pad_cval (number, Sequence[number]):
The constant value to use if the pad mode is ``BORDER_CONSTANT``.
* If ``number``, then that value will be used.
* If a ``tuple`` of two ``number`` s and at least one of them is
a ``float``, then a random number will be uniformly sampled per
image from the continuous interval ``[a, b]`` and used as the
value. If both ``number`` s are ``int`` s, the interval is
discrete.
* If a ``list`` of ``number``, then a random value will be chosen
from the elements of the ``list`` and used as the value.
pad_cval_mask (number, Sequence[number]): Same as pad_cval but only for masks.
keep_size (bool):
After cropping and padding, the result image will usually have a
different height/width compared to the original input image. If this
parameter is set to ``True``, then the cropped/padded image will be
resized to the input image's size, i.e. the output shape is always identical to the input shape.
sample_independently (bool):
If ``False`` *and* the values for `px`/`percent` result in exactly
*one* probability distribution for all image sides, only one single
value will be sampled from that probability distribution and used for
all sides. I.e. the crop/pad amount then is the same for all sides.
If ``True``, four values will be sampled independently, one per side.
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
Targets:
image, mask, bboxes, keypoints
Image types:
any
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
px: Optional[Union[int, List[int]]] = None,
percent: Optional[Union[float, List[float]]] = None,
pad_mode: int = cv2.BORDER_CONSTANT,
pad_cval: Union[float, Sequence[float]] = 0,
pad_cval_mask: Union[float, Sequence[float]] = 0,
keep_size: bool = True,
sample_independently: bool = True,
interpolation: int = cv2.INTER_LINEAR,
always_apply: bool = False,
p: float = 1.0,
):
super().__init__(always_apply, p)
if px is None and percent is None:
msg = "px and percent are empty!"
raise ValueError(msg)
if px is not None and percent is not None:
msg = "Only px or percent may be set!"
raise ValueError(msg)
self.px = px
self.percent = percent
self.pad_mode = pad_mode
self.pad_cval = pad_cval
self.pad_cval_mask = pad_cval_mask
self.keep_size = keep_size
self.sample_independently = sample_independently
self.interpolation = interpolation
def apply(
self,
img: np.ndarray,
crop_params: Sequence[int] = (),
pad_params: Sequence[int] = (),
pad_value: float = 0,
rows: int = 0,
cols: int = 0,
interpolation: int = cv2.INTER_LINEAR,
**params: Any,
) -> np.ndarray:
return F.crop_and_pad(
img, crop_params, pad_params, pad_value, rows, cols, interpolation, self.pad_mode, self.keep_size
)
def apply_to_mask(
self,
mask: np.ndarray,
crop_params: Optional[Sequence[int]] = None,
pad_params: Optional[Sequence[int]] = None,
pad_value_mask: Optional[float] = None,
rows: int = 0,
cols: int = 0,
interpolation: int = cv2.INTER_NEAREST,
**params: Any,
) -> np.ndarray:
return F.crop_and_pad(
mask, crop_params, pad_params, pad_value_mask, rows, cols, interpolation, self.pad_mode, self.keep_size
)
def apply_to_bbox(
self,
bbox: BoxInternalType,
crop_params: Optional[Sequence[int]] = None,
pad_params: Optional[Sequence[int]] = None,
rows: int = 0,
cols: int = 0,
result_rows: int = 0,
result_cols: int = 0,
**params: Any,
) -> BoxInternalType:
return F.crop_and_pad_bbox(bbox, crop_params, pad_params, rows, cols, result_rows, result_cols)
def apply_to_keypoint(
self,
keypoint: KeypointInternalType,
crop_params: Optional[Sequence[int]] = None,
pad_params: Optional[Sequence[int]] = None,
rows: int = 0,
cols: int = 0,
result_rows: int = 0,
result_cols: int = 0,
**params: Any,
) -> KeypointInternalType:
return F.crop_and_pad_keypoint(
keypoint, crop_params, pad_params, rows, cols, result_rows, result_cols, self.keep_size
)
@property
def targets_as_params(self) -> List[str]:
return ["image"]
@staticmethod
def __prevent_zero(val1: int, val2: int, max_val: int) -> Tuple[int, int]:
regain = abs(max_val) + 1
regain1 = regain // 2
regain2 = regain // 2
if regain1 + regain2 < regain:
regain1 += 1
if regain1 > val1:
diff = regain1 - val1
regain1 = val1
regain2 += diff
elif regain2 > val2:
diff = regain2 - val2
regain2 = val2
regain1 += diff
val1 = val1 - regain1
val2 = val2 - regain2
return val1, val2
@staticmethod
def _prevent_zero(crop_params: List[int], height: int, width: int) -> List[int]:
top, right, bottom, left = crop_params
remaining_height = height - (top + bottom)
remaining_width = width - (left + right)
if remaining_height < 1:
top, bottom = CropAndPad.__prevent_zero(top, bottom, height)
if remaining_width < 1:
left, right = CropAndPad.__prevent_zero(left, right, width)
return [max(top, 0), max(right, 0), max(bottom, 0), max(left, 0)]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
height, width = params["image"].shape[:2]
if self.px is not None:
new_params = self._get_px_params()
else:
percent_params = self._get_percent_params()
new_params = [
int(percent_params[0] * height),
int(percent_params[1] * width),
int(percent_params[2] * height),
int(percent_params[3] * width),
]
pad_params = [max(i, 0) for i in new_params]
crop_params = self._prevent_zero([-min(i, 0) for i in new_params], height, width)
top, right, bottom, left = crop_params
crop_params = [left, top, width - right, height - bottom]
result_rows = crop_params[3] - crop_params[1]
result_cols = crop_params[2] - crop_params[0]
if result_cols == width and result_rows == height:
crop_params = []
top, right, bottom, left = pad_params
pad_params = [top, bottom, left, right]
if any(pad_params):
result_rows += top + bottom
result_cols += left + right
else:
pad_params = []
return {
"crop_params": crop_params or None,
"pad_params": pad_params or None,
"pad_value": None if pad_params is None else self._get_pad_value(self.pad_cval),
"pad_value_mask": None if pad_params is None else self._get_pad_value(self.pad_cval_mask),
"result_rows": result_rows,
"result_cols": result_cols,
}
def _get_px_params(self) -> List[int]:
if self.px is None:
msg = "px is not set"
raise ValueError(msg)
if isinstance(self.px, int):
params = [self.px] * 4
elif len(self.px) == TWO:
if self.sample_independently:
params = [random.randrange(*self.px) for _ in range(4)]
else:
px = random.randrange(*self.px)
params = [px] * 4
elif isinstance(self.px[0], int):
params = self.px
else:
params = [random.randrange(*i) for i in self.px]
return params
def _get_percent_params(self) -> List[float]:
if self.percent is None:
msg = "percent is not set"
raise ValueError(msg)
if isinstance(self.percent, float):
params = [self.percent] * 4
elif len(self.percent) == TWO:
if self.sample_independently:
params = [random.uniform(*self.percent) for _ in range(4)]
else:
px = random.uniform(*self.percent)
params = [px] * 4
elif isinstance(self.percent[0], (int, float)):
params = self.percent
else:
params = [random.uniform(*i) for i in self.percent]
return params # params = [top, right, bottom, left]
@staticmethod
def _get_pad_value(pad_value: Union[float, Sequence[float]]) -> Union[int, float]:
if isinstance(pad_value, (int, float)):
return pad_value
if len(pad_value) == TWO:
a, b = pad_value
if isinstance(a, int) and isinstance(b, int):
return random.randint(a, b)
return random.uniform(a, b)
return random.choice(pad_value)
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return (
"px",
"percent",
"pad_mode",
"pad_cval",
"pad_cval_mask",
"keep_size",
"sample_independently",
"interpolation",
)
class CropNonEmptyMaskIfExists
(height, width, ignore_values=None, ignore_channels=None, always_apply=False, p=1.0)
[view source on GitHub] ¶
Crop area with mask if mask is non-empty, else make random crop.
Parameters:
Name | Type | Description |
---|---|---|
height | int | vertical size of crop in pixels |
width | int | horizontal size of crop in pixels |
ignore_values | list of int | values to ignore in mask; `0` values are always ignored (e.g. if background value is 5, set `ignore_values=[5]` to ignore it). |
ignore_channels | list of int | channels to ignore in mask (e.g. if background is the first channel, set `ignore_channels=[0]` to ignore it). |
p | float | probability of applying the transform. Default: 1.0. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
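Example (a minimal sketch): the crop is placed so that it overlaps non-zero mask pixels whenever any exist.
import albumentations as A
import numpy as np

transform = A.CropNonEmptyMaskIfExists(height=50, width=50, p=1.0)
image = np.random.randint(0, 256, (200, 200, 3), dtype=np.uint8)
mask = np.zeros((200, 200), dtype=np.uint8)
mask[120:140, 60:90] = 1  # a small foreground region
out = transform(image=image, mask=mask)
assert out["mask"].sum() > 0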
Source code in albumentations/augmentations/crops/transforms.py
class CropNonEmptyMaskIfExists(DualTransform):
"""Crop area with mask if mask is non-empty, else make random crop.
Args:
height: vertical size of crop in pixels
width: horizontal size of crop in pixels
ignore_values (list of int): values to ignore in mask, `0` values are always ignored
(e.g. if background value is 5 set `ignore_values=[5]` to ignore)
ignore_channels (list of int): channels to ignore in mask
(e.g. if background is a first channel set `ignore_channels=[0]` to ignore)
p: probability of applying the transform. Default: 1.0.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
height: int,
width: int,
ignore_values: Optional[List[int]] = None,
ignore_channels: Optional[List[int]] = None,
always_apply: bool = False,
p: float = 1.0,
):
super().__init__(always_apply, p)
if ignore_values is not None and not isinstance(ignore_values, list):
raise ValueError(f"Expected `ignore_values` of type `list`, got `{type(ignore_values)}`")
if ignore_channels is not None and not isinstance(ignore_channels, list):
raise ValueError(f"Expected `ignore_channels` of type `list`, got `{type(ignore_channels)}`")
self.height = height
self.width = width
self.ignore_values = ignore_values
self.ignore_channels = ignore_channels
def apply(
self, img: np.ndarray, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params: Any
) -> np.ndarray:
return F.crop(img, x_min, y_min, x_max, y_max)
def apply_to_bbox(
self, bbox: BoxInternalType, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params: Any
) -> BoxInternalType:
return F.bbox_crop(
bbox, x_min=x_min, x_max=x_max, y_min=y_min, y_max=y_max, rows=params["rows"], cols=params["cols"]
)
def apply_to_keypoint(
self,
keypoint: KeypointInternalType,
x_min: int = 0,
x_max: int = 0,
y_min: int = 0,
y_max: int = 0,
**params: Any,
) -> KeypointInternalType:
return F.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))
def _preprocess_mask(self, mask: np.ndarray) -> np.ndarray:
mask_height, mask_width = mask.shape[:2]
if self.ignore_values is not None:
ignore_values_np = np.array(self.ignore_values)
mask = np.where(np.isin(mask, ignore_values_np), 0, mask)
if mask.ndim == THREE and self.ignore_channels is not None:
target_channels = np.array([ch for ch in range(mask.shape[-1]) if ch not in self.ignore_channels])
mask = np.take(mask, target_channels, axis=-1)
if self.height > mask_height or self.width > mask_width:
raise ValueError(
f"Crop size ({self.height},{self.width}) is larger than image ({mask_height},{mask_width})"
)
return mask
def update_params(self, params: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
super().update_params(params, **kwargs)
if "mask" in kwargs:
mask = self._preprocess_mask(kwargs["mask"])
elif "masks" in kwargs and len(kwargs["masks"]):
masks = kwargs["masks"]
mask = self._preprocess_mask(np.copy(masks[0])) # need copy as we perform in-place mod afterwards
for m in masks[1:]:
mask |= self._preprocess_mask(m)
else:
msg = "Can not find mask for CropNonEmptyMaskIfExists"
raise RuntimeError(msg)
mask_height, mask_width = mask.shape[:2]
if mask.any():
mask = mask.sum(axis=-1) if mask.ndim == THREE else mask
non_zero_yx = np.argwhere(mask)
y, x = random.choice(non_zero_yx)
x_min = x - random.randint(0, self.width - 1)
y_min = y - random.randint(0, self.height - 1)
x_min = np.clip(x_min, 0, mask_width - self.width)
y_min = np.clip(y_min, 0, mask_height - self.height)
else:
x_min = random.randint(0, mask_width - self.width)
y_min = random.randint(0, mask_height - self.height)
x_max = x_min + self.width
y_max = y_min + self.height
params.update({"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max})
return params
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
return ("height", "width", "ignore_values", "ignore_channels")
class RandomCrop
(height, width, always_apply=False, p=1.0)
[view source on GitHub] ¶
Crop a random part of the input.
Parameters:
Name | Type | Description |
---|---|---|
height | int | height of the crop. |
width | int | width of the crop. |
p | float | probability of applying the transform. Default: 1. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
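Example (a minimal sketch):
import albumentations as A
import numpy as np

# A 64x64 window is sampled uniformly over the image.
transform = A.RandomCrop(height=64, width=64, p=1.0)
image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
cropped = transform(image=image)["image"]
assert cropped.shape == (64, 64, 3)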
Source code in albumentations/augmentations/crops/transforms.py
class RandomCrop(DualTransform):
"""Crop a random part of the input.
Args:
height: height of the crop.
width: width of the crop.
p: probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(self, height: int, width: int, always_apply: bool = False, p: float = 1.0):
super().__init__(always_apply, p)
self.height = height
self.width = width
def apply(self, img: np.ndarray, h_start: int = 0, w_start: int = 0, **params: Any) -> np.ndarray:
return F.random_crop(img, self.height, self.width, h_start, w_start)
def get_params(self) -> Dict[str, float]:
return {"h_start": random.random(), "w_start": random.random()}
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
return F.bbox_random_crop(bbox, self.height, self.width, **params)
def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
return F.keypoint_random_crop(keypoint, self.height, self.width, **params)
def get_transform_init_args_names(self) -> Tuple[str, str]:
return ("height", "width")
class RandomCropFromBorders
(crop_left=0.1, crop_right=0.1, crop_top=0.1, crop_bottom=0.1, always_apply=False, p=1.0)
[view source on GitHub] ¶
Randomly cut parts of the image from its borders, without resizing at the end.
Parameters:
Name | Type | Description |
---|---|---|
crop_left | float | single float value in (0.0, 1.0) range. Default 0.1. The image will be randomly cut from the left side in the range [0, crop_left * width). |
crop_right | float | single float value in (0.0, 1.0) range. Default 0.1. The image will be randomly cut from the right side in the range [(1 - crop_right) * width, width). |
crop_top | float | single float value in (0.0, 1.0) range. Default 0.1. The image will be randomly cut from the top side in the range [0, crop_top * height). |
crop_bottom | float | single float value in (0.0, 1.0) range. Default 0.1. The image will be randomly cut from the bottom side in the range [(1 - crop_bottom) * height, height). |
p | float | probability of applying the transform. Default: 1. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/crops/transforms.py
class RandomCropFromBorders(DualTransform):
"""Crop bbox from image randomly cut parts from borders without resize at the end
Args:
crop_left (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
from left side in range [0, crop_left * width)
crop_right (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
from right side in range [(1 - crop_right) * width, width)
crop_top (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
from top side in range [0, crop_top * height)
crop_bottom (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
from bottom side in range [(1 - crop_bottom) * height, height)
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
crop_left: float = 0.1,
crop_right: float = 0.1,
crop_top: float = 0.1,
crop_bottom: float = 0.1,
always_apply: bool = False,
p: float = 1.0,
):
super().__init__(always_apply, p)
self.crop_left = crop_left
self.crop_right = crop_right
self.crop_top = crop_top
self.crop_bottom = crop_bottom
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, int]:
img = params["image"]
x_min = random.randint(0, int(self.crop_left * img.shape[1]))
x_max = random.randint(max(x_min + 1, int((1 - self.crop_right) * img.shape[1])), img.shape[1])
y_min = random.randint(0, int(self.crop_top * img.shape[0]))
y_max = random.randint(max(y_min + 1, int((1 - self.crop_bottom) * img.shape[0])), img.shape[0])
return {"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max}
def apply(
self, img: np.ndarray, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params: Any
) -> np.ndarray:
return F.clamping_crop(img, x_min, y_min, x_max, y_max)
def apply_to_mask(
self, mask: np.ndarray, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params: Any
) -> np.ndarray:
return F.clamping_crop(mask, x_min, y_min, x_max, y_max)
def apply_to_bbox(
self, bbox: BoxInternalType, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params: Any
) -> BoxInternalType:
rows, cols = params["rows"], params["cols"]
return F.bbox_crop(bbox, x_min, y_min, x_max, y_max, rows, cols)
def apply_to_keypoint(
self,
keypoint: KeypointInternalType,
x_min: int = 0,
x_max: int = 0,
y_min: int = 0,
y_max: int = 0,
**params: Any,
) -> KeypointInternalType:
return F.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return "crop_left", "crop_right", "crop_top", "crop_bottom"
class RandomCropNearBBox
(max_part_shift=(0.3, 0.3), cropping_bbox_key='cropping_bbox', cropping_box_key=None, always_apply=False, p=1.0)
[view source on GitHub] ¶
Crop bbox from image with random shift by x,y coordinates
Parameters:
Name | Type | Description |
---|---|---|
max_part_shift | float, (float, float) | Max shift in `height` and `width` dimensions relative to `cropping_bbox` dimension. If max_part_shift is a single float, the range will be (max_part_shift, max_part_shift). Default: (0.3, 0.3). |
cropping_bbox_key | str | Additional target key for cropping box. Default `cropping_bbox`. |
cropping_box_key | str | [Deprecated] Use `cropping_bbox_key` instead. |
p | float | probability of applying the transform. Default: 1. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Examples:
>>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_box')],
>>> bbox_params=BboxParams("pascal_voc"))
>>> result = aug(image=image, bboxes=bboxes, test_box=[0, 5, 10, 20])
Source code in albumentations/augmentations/crops/transforms.py
class RandomCropNearBBox(DualTransform):
"""Crop bbox from image with random shift by x,y coordinates
Args:
max_part_shift (float, (float, float)): Max shift in `height` and `width` dimensions relative
to `cropping_bbox` dimension.
If max_part_shift is a single float, the range will be (max_part_shift, max_part_shift).
Default (0.3, 0.3).
cropping_bbox_key (str): Additional target key for cropping box. Default `cropping_bbox`.
cropping_box_key (str): [Deprecated] Use `cropping_bbox_key` instead.
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Examples:
>>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_bbox_key='test_box')],
>>> bbox_params=BboxParams("pascal_voc"))
>>> result = aug(image=image, bboxes=bboxes, test_box=[0, 5, 10, 20])
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
max_part_shift: ScaleFloatType = (0.3, 0.3),
cropping_bbox_key: str = "cropping_bbox",
cropping_box_key: Optional[str] = None, # Deprecated
always_apply: bool = False,
p: float = 1.0,
):
super().__init__(always_apply, p)
self.max_part_shift = to_tuple(max_part_shift, low=max_part_shift)
# Check for deprecated parameter and issue warning
if cropping_box_key is not None:
warn(
"The parameter 'cropping_box_key' is deprecated and will be removed in future versions. "
"Use 'cropping_bbox_key' instead.",
DeprecationWarning,
stacklevel=2,
)
# Ensure the new parameter is used even if the old one is passed
cropping_bbox_key = cropping_box_key
self.cropping_bbox_key = cropping_bbox_key
if min(self.max_part_shift) < 0 or max(self.max_part_shift) > 1:
raise ValueError(f"Invalid max_part_shift. Got: {max_part_shift}")
def apply(
self, img: np.ndarray, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params: Any
) -> np.ndarray:
return F.clamping_crop(img, x_min, y_min, x_max, y_max)
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, int]:
bbox = params[self.cropping_bbox_key]
h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])
x_min = bbox[0] - random.randint(-w_max_shift, w_max_shift)
x_max = bbox[2] + random.randint(-w_max_shift, w_max_shift)
y_min = bbox[1] - random.randint(-h_max_shift, h_max_shift)
y_max = bbox[3] + random.randint(-h_max_shift, h_max_shift)
x_min = max(0, x_min)
y_min = max(0, y_min)
return {"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max}
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
return F.bbox_crop(bbox, **params)
def apply_to_keypoint(
self,
keypoint: KeypointInternalType,
x_min: int = 0,
x_max: int = 0,
y_min: int = 0,
y_max: int = 0,
**params: Any,
) -> KeypointInternalType:
return F.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))
@property
def targets_as_params(self) -> List[str]:
return [self.cropping_bbox_key]
def get_transform_init_args_names(self) -> Tuple[str, str]:
return ("max_part_shift", "cropping_bbox_key")
class RandomResizedCrop
(height, width, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=1, always_apply=False, p=1.0)
[view source on GitHub] ¶
Torchvision's variant: crop a random part of the input and rescale it to a given size.
Parameters:
Name | Type | Description |
---|---|---|
height | int | height after crop and resize. |
width | int | width after crop and resize. |
scale | float, float | range of the crop area, as a fraction of the original image area. |
ratio | float, float | range of aspect ratios of the crop. |
interpolation | OpenCV flag | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
p | float | probability of applying the transform. Default: 1. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/crops/transforms.py
class RandomResizedCrop(_BaseRandomSizedCrop):
"""Torchvision's variant of crop a random part of the input and rescale it to some size.
Args:
height (int): height after crop and resize.
width (int): width after crop and resize.
scale ((float, float)): range of the crop area, as a fraction of the original image area.
ratio ((float, float)): range of aspect ratios of the crop.
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
height: int,
width: int,
scale: Tuple[float, float] = (0.08, 1.0),
ratio: Tuple[float, float] = (0.75, 1.3333333333333333),
interpolation: int = cv2.INTER_LINEAR,
always_apply: bool = False,
p: float = 1.0,
):
super().__init__(height=height, width=width, interpolation=interpolation, always_apply=always_apply, p=p)
self.scale = scale
self.ratio = ratio
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Union[int, float]]:
img = params["image"]
area = img.shape[0] * img.shape[1]
for _ in range(10):
target_area = random.uniform(*self.scale) * area
log_ratio = (math.log(self.ratio[0]), math.log(self.ratio[1]))
aspect_ratio = math.exp(random.uniform(*log_ratio))
width = int(round(math.sqrt(target_area * aspect_ratio)))
height = int(round(math.sqrt(target_area / aspect_ratio)))
if 0 < width <= img.shape[1] and 0 < height <= img.shape[0]:
i = random.randint(0, img.shape[0] - height)
j = random.randint(0, img.shape[1] - width)
return {
"crop_height": height,
"crop_width": width,
"h_start": i * 1.0 / (img.shape[0] - height + 1e-10),
"w_start": j * 1.0 / (img.shape[1] - width + 1e-10),
}
# Fallback to central crop
in_ratio = img.shape[1] / img.shape[0]
if in_ratio < min(self.ratio):
width = img.shape[1]
height = int(round(width / min(self.ratio)))
elif in_ratio > max(self.ratio):
height = img.shape[0]
width = int(round(height * max(self.ratio)))
else: # whole image
width = img.shape[1]
height = img.shape[0]
i = (img.shape[0] - height) // 2
j = (img.shape[1] - width) // 2
return {
"crop_height": height,
"crop_width": width,
"h_start": i * 1.0 / (img.shape[0] - height + 1e-10),
"w_start": j * 1.0 / (img.shape[1] - width + 1e-10),
}
def get_params(self) -> Dict[str, Any]:
return {}
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str]:
return "height", "width", "scale", "ratio", "interpolation"
class RandomSizedBBoxSafeCrop
(height, width, erosion_rate=0.0, interpolation=1, always_apply=False, p=1.0)
[view source on GitHub] ¶
Crop a random part of the input and rescale it to some size without loss of bboxes.
Parameters:
Name | Type | Description |
---|---|---|
height | int | height after crop and resize. |
width | int | width after crop and resize. |
erosion_rate | float | erosion rate applied on input image height before crop. |
interpolation | OpenCV flag | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
p | float | probability of applying the transform. Default: 1. |
Targets
image, mask, bboxes
Image types: uint8, float32
Source code in albumentations/augmentations/crops/transforms.py
class RandomSizedBBoxSafeCrop(BBoxSafeRandomCrop):
"""Crop a random part of the input and rescale it to some size without loss of bboxes.
Args:
height: height after crop and resize.
width: width after crop and resize.
erosion_rate: erosion rate applied on input image height before crop.
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)
def __init__(
self,
height: int,
width: int,
erosion_rate: float = 0.0,
interpolation: int = cv2.INTER_LINEAR,
always_apply: bool = False,
p: float = 1.0,
):
super().__init__(erosion_rate, always_apply, p)
self.height = height
self.width = width
self.interpolation = interpolation
def apply(
self,
img: np.ndarray,
crop_height: int = 0,
crop_width: int = 0,
h_start: int = 0,
w_start: int = 0,
interpolation: int = cv2.INTER_LINEAR,
**params: Any,
) -> np.ndarray:
crop = F.random_crop(img, crop_height, crop_width, h_start, w_start)
return FGeometric.resize(crop, self.height, self.width, interpolation)
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return (*super().get_transform_init_args_names(), "height", "width", "interpolation")
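An illustrative sketch (parameter values and boxes are arbitrary); `bbox_params` is required so the boxes are cropped and rescaled together with the image:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.RandomSizedBBoxSafeCrop(height=80, width=80, erosion_rate=0.2, p=1.0)],
>>>                 bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]))
>>> result = aug(image=image, bboxes=[[10, 10, 50, 50]], labels=[1])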
class RandomSizedCrop
(min_max_height, height, width, w2h_ratio=1.0, interpolation=1, always_apply=False, p=1.0)
[view source on GitHub] ¶
Crop a random part of the input and rescale it to some size.
Parameters:
Name | Type | Description |
---|---|---|
min_max_height | int, int | crop size limits. |
height | int | height after crop and resize. |
width | int | width after crop and resize. |
w2h_ratio | float | aspect ratio of crop. |
interpolation | OpenCV flag | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
p | float | probability of applying the transform. Default: 1. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/crops/transforms.py
class RandomSizedCrop(_BaseRandomSizedCrop):
"""Crop a random part of the input and rescale it to some size.
Args:
min_max_height ((int, int)): crop size limits.
height (int): height after crop and resize.
width (int): width after crop and resize.
w2h_ratio (float): aspect ratio of crop.
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
min_max_height: Tuple[int, int],
height: int,
width: int,
w2h_ratio: float = 1.0,
interpolation: int = cv2.INTER_LINEAR,
always_apply: bool = False,
p: float = 1.0,
):
super().__init__(height=height, width=width, interpolation=interpolation, always_apply=always_apply, p=p)
self.min_max_height = min_max_height
self.w2h_ratio = w2h_ratio
def get_params(self) -> Dict[str, Union[int, float]]:
crop_height = random.randint(self.min_max_height[0], self.min_max_height[1])
return {
"h_start": random.random(),
"w_start": random.random(),
"crop_height": crop_height,
"crop_width": int(crop_height * self.w2h_ratio),
}
def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str]:
return "min_max_height", "height", "width", "w2h_ratio", "interpolation"
domain_adaptation
¶
class FDA
(reference_images, beta_limit=0.1, read_fn=<function read_rgb_image at 0x7f0ff19f4550>, always_apply=False, p=0.5)
[view source on GitHub] ¶
Fourier Domain Adaptation (FDA) for simple "style transfer" in the context of unsupervised domain adaptation (UDA). FDA manipulates the frequency components of images to reduce the domain gap between source and target datasets, effectively adapting images from one domain to closely resemble those from another without altering their semantic content.
This transform is particularly beneficial in scenarios where the training (source) and testing (target) images come from different distributions, such as synthetic versus real images, or day versus night scenes. Unlike traditional domain adaptation methods that may require complex adversarial training, FDA achieves domain alignment by swapping low-frequency components of the Fourier transform between the source and target images. This technique has been shown to improve the performance of models on the target domain, particularly for tasks like semantic segmentation, without additional training for domain invariance.
The 'beta_limit' parameter controls the extent of frequency component swapping, with lower values preserving more of the original image's characteristics and higher values leading to more pronounced adaptation effects. It is recommended to use beta values less than 0.3 to avoid introducing artifacts.
Parameters:
Name | Type | Description |
---|---|---|
reference_images | Sequence[Any] | Sequence of objects to be converted into images by `read_fn`. This typically involves paths to images that serve as target domain examples for adaptation. |
beta_limit | float or tuple of float | Coefficient beta from the paper, controlling the swapping extent of frequency components. Values should be less than 0.5. |
read_fn | Callable | User-defined function for reading images. It takes an element from `reference_images` and returns a numpy array of image pixels. By default, it is expected to take a path to an image and return a numpy array. |
Targets
image
Image types: uint8, float32
Reference
https://github.com/YanchaoYang/FDA
https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_FDA_Fourier_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2020_paper.pdf
Examples:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
>>> result = aug(image=image)
Note
FDA is a powerful tool for domain adaptation, particularly in unsupervised settings where annotated target domain samples are unavailable. It enables significant improvements in model generalization by aligning the low-level statistics of source and target images through a simple yet effective Fourier-based method.
Source code in albumentations/augmentations/domain_adaptation.py
class FDA(ImageOnlyTransform):
"""Fourier Domain Adaptation (FDA) for simple "style transfer" in the context of unsupervised domain adaptation
(UDA). FDA manipulates the frequency components of images to reduce the domain gap between source
and target datasets, effectively adapting images from one domain to closely resemble those from another without
altering their semantic content.
This transform is particularly beneficial in scenarios where the training (source) and testing (target) images
come from different distributions, such as synthetic versus real images, or day versus night scenes.
Unlike traditional domain adaptation methods that may require complex adversarial training, FDA achieves domain
alignment by swapping low-frequency components of the Fourier transform between the source and target images.
This technique has been shown to improve the performance of models on the target domain, particularly for tasks
like semantic segmentation, without additional training for domain invariance.
The 'beta_limit' parameter controls the extent of frequency component swapping, with lower values preserving more
of the original image's characteristics and higher values leading to more pronounced adaptation effects.
It is recommended to use beta values less than 0.3 to avoid introducing artifacts.
Args:
reference_images (Sequence[Any]): Sequence of objects to be converted into images by `read_fn`. This typically
involves paths to images that serve as target domain examples for adaptation.
beta_limit (float or tuple of float): Coefficient beta from the paper, controlling the swapping extent of
frequency components. Values should be less than 0.5.
read_fn (Callable): User-defined function for reading images. It takes an element from `reference_images` and
returns a numpy array of image pixels. By default, it is expected to take a path to an image and return a
numpy array.
Targets:
image
Image types:
uint8, float32
Reference:
- https://github.com/YanchaoYang/FDA
- https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_FDA_Fourier_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2020_paper.pdf
Example:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
>>> result = aug(image=image)
Note:
FDA is a powerful tool for domain adaptation, particularly in unsupervised settings where annotated target
domain samples are unavailable. It enables significant improvements in model generalization by aligning
the low-level statistics of source and target images through a simple yet effective Fourier-based method.
"""
def __init__(
self,
reference_images: Sequence[np.ndarray],
beta_limit: ScaleFloatType = 0.1,
read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply=always_apply, p=p)
self.reference_images = reference_images
self.read_fn = read_fn
if isinstance(beta_limit, float) and not 0 <= beta_limit <= MAX_BETA_LIMIT:
msg = "The beta_limit should be within [0, 0.5]."
raise ValueError(msg)
self.beta_limit = to_tuple(beta_limit, low=0)
def apply(
self, img: np.ndarray, target_image: Optional[np.ndarray] = None, beta: float = 0.1, **params: Any
) -> np.ndarray:
return fourier_domain_adaptation(img, target_image, beta)
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, np.ndarray]:
img = params["image"]
target_img = self.read_fn(random.choice(self.reference_images))
target_img = cv2.resize(target_img, dsize=(img.shape[1], img.shape[0]))
return {"target_image": target_img}
def get_params(self) -> Dict[str, float]:
return {"beta": random.uniform(self.beta_limit[0], self.beta_limit[1])}
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_transform_init_args_names(self) -> Tuple[str, str, str]:
return "reference_images", "beta_limit", "read_fn"
def to_dict_private(self) -> Dict[str, Any]:
msg = "FDA can not be serialized."
raise NotImplementedError(msg)
class HistogramMatching
(reference_images, blend_ratio=(0.5, 1.0), read_fn=read_rgb_image, always_apply=False, p=0.5)
[view source on GitHub] ¶
Implements histogram matching, a technique that adjusts the pixel values of an input image to match the histogram of a reference image. This adjustment ensures that the output image has a similar tone and contrast to the reference. The process is applied independently to each channel of multi-channel images, provided both the input and reference images have the same number of channels.
Histogram matching serves as an effective normalization method in image processing tasks such as feature matching. It is particularly useful when images originate from varied sources or are captured under different lighting conditions, helping to standardize the images' appearance before further processing.
Parameters:
Name | Type | Description |
---|---|---|
reference_images | Sequence[Any] | A sequence of objects to be converted into images by `read_fn`. Typically, this is a sequence of image paths. |
blend_ratio | Tuple[float, float] | Specifies the minimum and maximum blend ratio for blending the matched image with the original image. A random blend factor within this range is chosen for each image to increase the diversity of the output images. |
read_fn | Callable[[Any], np.ndarray] | A user-defined function for reading images, which accepts an element from `reference_images` and returns a numpy array of image pixels. By default, this is expected to take a file path and return an image as a numpy array. |
p | float | The probability of applying the transform to any given image. Defaults to 0.5. |
Targets
image
Image types: uint8, float32
Note
This class cannot be serialized directly due to its dynamic nature and dependency on external image data. An attempt to serialize it will raise a NotImplementedError.
Reference
https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_histogram_matching.html
Examples:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.HistogramMatching([target_image], p=1, read_fn=lambda x: x)])
>>> result = aug(image=image)
Source code in albumentations/augmentations/domain_adaptation.py
class HistogramMatching(ImageOnlyTransform):
"""Implements histogram matching, a technique that adjusts the pixel values of an input image
to match the histogram of a reference image. This adjustment ensures that the output image
has a similar tone and contrast to the reference. The process is applied independently to
each channel of multi-channel images, provided both the input and reference images have the
same number of channels.
Histogram matching serves as an effective normalization method in image processing tasks such
as feature matching. It is particularly useful when images originate from varied sources or are
captured under different lighting conditions, helping to standardize the images' appearance
before further processing.
Args:
reference_images (Sequence[Any]): A sequence of objects to be converted into images by `read_fn`.
Typically, this is a sequence of image paths.
blend_ratio (Tuple[float, float]): Specifies the minimum and maximum blend ratio for blending the matched
image with the original image. A random blend factor within this range is chosen for each image to
increase the diversity of the output images.
read_fn (Callable[[Any], np.ndarray]): A user-defined function for reading images, which accepts an
element from `reference_images` and returns a numpy array of image pixels. By default, this is expected
to take a file path and return an image as a numpy array.
p (float): The probability of applying the transform to any given image. Defaults to 0.5.
Targets:
image
Image types:
uint8, float32
Note:
This class cannot be serialized directly due to its dynamic nature and dependency on external image data.
An attempt to serialize it will raise a NotImplementedError.
Reference:
https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_histogram_matching.html
Example:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.HistogramMatching([target_image], p=1, read_fn=lambda x: x)])
>>> result = aug(image=image)
"""
def __init__(
self,
reference_images: Sequence[Any],
blend_ratio: Tuple[float, float] = (0.5, 1.0),
read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply=always_apply, p=p)
self.reference_images = reference_images
self.read_fn = read_fn
self.blend_ratio = blend_ratio
def apply(
self,
img: np.ndarray,
reference_image: Optional[np.ndarray] = None,
blend_ratio: float = 0.5,
**params: Any,
) -> np.ndarray:
return apply_histogram(img, reference_image, blend_ratio)
def get_params(self) -> Dict[str, np.ndarray]:
return {
"reference_image": self.read_fn(random.choice(self.reference_images)),
"blend_ratio": random.uniform(self.blend_ratio[0], self.blend_ratio[1]),
}
def get_transform_init_args_names(self) -> Tuple[str, str, str]:
return ("reference_images", "blend_ratio", "read_fn")
def to_dict_private(self) -> Dict[str, Any]:
msg = "HistogramMatching can not be serialized."
raise NotImplementedError(msg)
class PixelDistributionAdaptation
(reference_images, blend_ratio=(0.25, 1.0), read_fn=read_rgb_image, transform_type='pca', always_apply=False, p=0.5)
[view source on GitHub] ¶
Performs pixel-level domain adaptation by aligning the pixel value distribution of an input image with that of a reference image. This process involves fitting a simple statistical transformation (such as PCA, StandardScaler, or MinMaxScaler) to both the original and the reference images, transforming the original image with the transformation trained on it, and then applying the inverse transformation using the transform fitted on the reference image. The result is an adapted image that retains the original content while mimicking the pixel value distribution of the reference domain.
The process can be visualized as two main steps:
1. Adjusting the original image to a standard distribution space using a selected transform.
2. Moving the adjusted image into the distribution space of the reference image by applying the inverse of the transform fitted on the reference image.
This technique is especially useful in scenarios where images from different domains (e.g., synthetic vs. real images, day vs. night scenes) need to be harmonized for better consistency or performance in image processing tasks.
Parameters:
Name | Type | Description |
---|---|---|
reference_images | Sequence[Any] | A sequence of objects (typically image paths) that will be converted into images by `read_fn`. These images serve as references for the domain adaptation. |
blend_ratio | Tuple[float, float] | Specifies the minimum and maximum blend ratio for mixing the adapted image with the original, enhancing the diversity of the output images. |
read_fn | Callable | A user-defined function for reading and converting the objects in `reference_images` into numpy arrays. By default, it assumes these objects are image paths. |
transform_type | str | Specifies the type of statistical transformation to apply. Supported values are "pca" for Principal Component Analysis, "standard" for StandardScaler, and "minmax" for MinMaxScaler. |
p | float | The probability of applying the transform to any given image. Default: 0.5. |
Targets
image
Image types: uint8, float32
Reference
For more information on the underlying approach, see: https://github.com/arsenyinfo/qudida
Note
The PixelDistributionAdaptation transform is a novel way to perform domain adaptation at the pixel level, suitable for adjusting images across different conditions without complex modeling. It is effective for preparing images before more advanced processing or analysis.
Source code in albumentations/augmentations/domain_adaptation.py
class PixelDistributionAdaptation(ImageOnlyTransform):
"""Performs pixel-level domain adaptation by aligning the pixel value distribution of an input image
with that of a reference image. This process involves fitting a simple statistical transformation
(such as PCA, StandardScaler, or MinMaxScaler) to both the original and the reference images,
transforming the original image with the transformation trained on it, and then applying the inverse
transformation using the transform fitted on the reference image. The result is an adapted image
that retains the original content while mimicking the pixel value distribution of the reference domain.
The process can be visualized as two main steps:
1. Adjusting the original image to a standard distribution space using a selected transform.
2. Moving the adjusted image into the distribution space of the reference image by applying the inverse
of the transform fitted on the reference image.
This technique is especially useful in scenarios where images from different domains (e.g., synthetic
vs. real images, day vs. night scenes) need to be harmonized for better consistency or performance in
image processing tasks.
Args:
reference_images (Sequence[Any]): A sequence of objects (typically image paths) that will be
converted into images by `read_fn`. These images serve as references for the domain adaptation.
blend_ratio (Tuple[float, float]): Specifies the minimum and maximum blend ratio for mixing
the adapted image with the original, enhancing the diversity of the output images.
read_fn (Callable): A user-defined function for reading and converting the objects in
`reference_images` into numpy arrays. By default, it assumes these objects are image paths.
transform_type (str): Specifies the type of statistical transformation to apply. Supported values
are "pca" for Principal Component Analysis, "standard" for StandardScaler, and "minmax" for
MinMaxScaler.
p (float): The probability of applying the transform to any given image. Default is 0.5.
Targets:
image
Image types:
uint8, float32
Reference:
For more information on the underlying approach, see: https://github.com/arsenyinfo/qudida
Note:
The PixelDistributionAdaptation transform is a novel way to perform domain adaptation at the pixel level,
suitable for adjusting images across different conditions without complex modeling. It is effective
for preparing images before more advanced processing or analysis.
"""
def __init__(
self,
reference_images: Sequence[Any],
blend_ratio: Tuple[float, float] = (0.25, 1.0),
read_fn: Callable[[Any], np.ndarray] = read_rgb_image,
transform_type: Literal["pca", "standard", "minmax"] = "pca",
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply=always_apply, p=p)
self.reference_images = reference_images
self.read_fn = read_fn
self.blend_ratio = blend_ratio
expected_transformers = ("pca", "standard", "minmax")
if transform_type not in expected_transformers:
raise ValueError(f"Got unexpected transform_type {transform_type}. Expected one of {expected_transformers}")
self.transform_type = transform_type
@staticmethod
def _validate_shape(img: np.ndarray) -> None:
if is_grayscale_image(img) or is_multispectral_image(img):
raise ValueError(
f"Unexpected image shape: expected 3 dimensions, got {len(img.shape)}."
f"Is it a grayscale or multispectral image? It's not supported for now."
)
def ensure_uint8(self, img: np.ndarray) -> Tuple[np.ndarray, bool]:
if img.dtype == np.float32:
if img.min() < 0 or img.max() > 1:
message = (
"PixelDistributionAdaptation uses uint8 under the hood, so float32 should be converted,"
"Can not do it automatically when the image is out of [0..1] range."
)
raise TypeError(message)
return (img * 255).astype("uint8"), True
return img, False
def apply(self, img: np.ndarray, reference_image: np.ndarray, blend_ratio: float, **params: Any) -> np.ndarray:
self._validate_shape(img)
reference_image, _ = self.ensure_uint8(reference_image)
img, needs_reconvert = self.ensure_uint8(img)
adapted = adapt_pixel_distribution(
img,
ref=reference_image,
weight=blend_ratio,
transform_type=self.transform_type,
)
if needs_reconvert:
adapted = adapted.astype("float32") * (1 / 255)
return adapted
def get_params(self) -> Dict[str, Any]:
return {
"reference_image": self.read_fn(random.choice(self.reference_images)),
"blend_ratio": random.uniform(self.blend_ratio[0], self.blend_ratio[1]),
}
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
return "reference_images", "blend_ratio", "read_fn", "transform_type"
def to_dict_private(self) -> Dict[str, Any]:
msg = "PixelDistributionAdaptation can not be serialized."
raise NotImplementedError(msg)
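A minimal usage sketch in the same spirit as the FDA and HistogramMatching examples above (the random arrays stand in for real source and reference images):
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> reference = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.PixelDistributionAdaptation([reference], read_fn=lambda x: x, transform_type="pca", p=1)])
>>> result = aug(image=image)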
domain_adaptation_functional
¶
class DomainAdapter
(transformer, ref_img, color_conversions=(None, None))
[view source on GitHub] ¶
Source: https://github.com/arsenyinfo/qudida by Arseny Kravchenko
Source code in albumentations/augmentations/domain_adaptation_functional.py
class DomainAdapter:
"""Source: https://github.com/arsenyinfo/qudida by Arseny Kravchenko"""
def __init__(
self,
transformer: TransformerInterface,
ref_img: np.ndarray,
color_conversions: Tuple[None, None] = (None, None),
):
self.color_in, self.color_out = color_conversions
self.source_transformer = deepcopy(transformer)
self.target_transformer = transformer
self.target_transformer.fit(self.flatten(ref_img))
def to_colorspace(self, img: np.ndarray) -> np.ndarray:
return img if self.color_in is None else cv2.cvtColor(img, self.color_in)
def from_colorspace(self, img: np.ndarray) -> np.ndarray:
if self.color_out is None:
return img
return cv2.cvtColor(img.astype("uint8"), self.color_out)
def flatten(self, img: np.ndarray) -> np.ndarray:
img = self.to_colorspace(img)
img = img.astype("float32") / 255.0
return img.reshape(-1, 3)
def reconstruct(self, pixels: np.ndarray, height: int, width: int) -> np.ndarray:
pixels = (np.clip(pixels, 0, 1) * 255).astype("uint8")
return self.from_colorspace(pixels.reshape(height, width, 3))
@staticmethod
def _pca_sign(x: np.ndarray) -> np.ndarray:
return np.sign(np.trace(x.components_))
def __call__(self, image: np.ndarray) -> np.ndarray:
height, width = image.shape[:2]
pixels = self.flatten(image)
self.source_transformer.fit(pixels)
# dirty hack to make sure colors are not inverted
if (
hasattr(self.target_transformer, "components_")
and hasattr(self.source_transformer, "components_")
and self._pca_sign(self.target_transformer) != self._pca_sign(self.source_transformer)
):
self.target_transformer.components_ *= -1
representation = self.source_transformer.transform(pixels)
result = self.target_transformer.inverse_transform(representation)
return self.reconstruct(result, height, width)
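An illustrative sketch of direct use (normally this class is driven by PixelDistributionAdaptation); using scikit-learn's PCA as the transformer is an assumption here, following the qudida reference above:
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> ref = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> src = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> adapter = DomainAdapter(transformer=PCA(n_components=3), ref_img=ref)  # PCA is an illustrative choice
>>> adapted = adapter(src)  # uint8 image whose pixel statistics are moved toward `ref`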
dropout
special
¶
channel_dropout
¶
class ChannelDropout
(channel_drop_range=(1, 1), fill_value=0, always_apply=False, p=0.5)
[view source on GitHub] ¶
Randomly Drop Channels in the input Image.
Parameters:
Name | Type | Description |
---|---|---|
channel_drop_range | int, int | range from which we choose the number of channels to drop. |
fill_value | int, float | pixel value for the dropped channel. |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, uint16, uint32, float32
Source code in albumentations/augmentations/dropout/channel_dropout.py
class ChannelDropout(ImageOnlyTransform):
"""Randomly Drop Channels in the input Image.
Args:
channel_drop_range (int, int): range from which we choose the number of channels to drop.
fill_value (int, float): pixel value for the dropped channel.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, uint16, uint32, float32
"""
def __init__(
self,
channel_drop_range: Tuple[int, int] = (1, 1),
fill_value: float = 0,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.channel_drop_range = channel_drop_range
self.min_channels = channel_drop_range[0]
self.max_channels = channel_drop_range[1]
if not 1 <= self.min_channels <= self.max_channels:
raise ValueError(f"Invalid channel_drop_range. Got: {channel_drop_range}")
self.fill_value = fill_value
def apply(self, img: np.ndarray, channels_to_drop: Tuple[int, ...] = (0,), **params: Any) -> np.ndarray:
return channel_dropout(img, channels_to_drop, self.fill_value)
def get_params_dependent_on_targets(self, params: Mapping[str, Any]) -> Dict[str, Any]:
img = params["image"]
num_channels = img.shape[-1]
if len(img.shape) == TWO or num_channels == 1:
msg = "Images has one channel. ChannelDropout is not defined."
raise NotImplementedError(msg)
if self.max_channels >= num_channels:
msg = "Can not drop all channels in ChannelDropout."
raise ValueError(msg)
num_drop_channels = random.randint(self.min_channels, self.max_channels)
channels_to_drop = random.sample(range(num_channels), k=num_drop_channels)
return {"channels_to_drop": channels_to_drop}
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return "channel_drop_range", "fill_value"
@property
def targets_as_params(self) -> List[str]:
return ["image"]
coarse_dropout
¶
class CoarseDropout
(max_holes=8, max_height=8, max_width=8, min_holes=None, min_height=None, min_width=None, fill_value=0, mask_fill_value=None, always_apply=False, p=0.5)
[view source on GitHub] ¶
CoarseDropout of the rectangular regions in the image.
Parameters:
Name | Type | Description |
---|---|---|
max_holes | int | Maximum number of regions to zero out. |
max_height | int, float | Maximum height of the hole. If float, it is calculated as a fraction of the image height. |
max_width | int, float | Maximum width of the hole. If float, it is calculated as a fraction of the image width. |
min_holes | int | Minimum number of regions to zero out. If `None`, `min_holes` is set to `max_holes`. Default: `None`. |
min_height | int, float | Minimum height of the hole. If `None`, `min_height` is set to `max_height`. Default: `None`. If float, it is calculated as a fraction of the image height. |
min_width | int, float | Minimum width of the hole. If `None`, `min_width` is set to `max_width`. Default: `None`. If float, it is calculated as a fraction of the image width. |
fill_value | int, float, list of int, list of float | value for dropped pixels. |
mask_fill_value | int, float, list of int, list of float | fill value for dropped pixels in mask. If `None`, the mask is not affected. Default: `None`. |
Targets
image, mask, keypoints
Image types: uint8, float32
Reference
https://arxiv.org/abs/1708.04552
https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py
https://github.com/aleju/imgaug/blob/master/imgaug/augmenters/arithmetic.py
Source code in albumentations/augmentations/dropout/coarse_dropout.py
class CoarseDropout(DualTransform):
"""CoarseDropout of the rectangular regions in the image.
Args:
max_holes (int): Maximum number of regions to zero out.
max_height (int, float): Maximum height of the hole.
If float, it is calculated as a fraction of the image height.
max_width (int, float): Maximum width of the hole.
If float, it is calculated as a fraction of the image width.
min_holes (int): Minimum number of regions to zero out. If `None`,
`min_holes` is set to `max_holes`. Default: `None`.
min_height (int, float): Minimum height of the hole. If `None`,
`min_height` is set to `max_height`. Default: `None`.
If float, it is calculated as a fraction of the image height.
min_width (int, float): Minimum width of the hole. If `None`, `min_width` is
set to `max_width`. Default: `None`.
If float, it is calculated as a fraction of the image width.
fill_value (int, float, list of int, list of float): value for dropped pixels.
mask_fill_value (int, float, list of int, list of float): fill value for dropped pixels
in mask. If `None` - mask is not affected. Default: `None`.
Targets:
image, mask, keypoints
Image types:
uint8, float32
Reference:
| https://arxiv.org/abs/1708.04552
| https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py
| https://github.com/aleju/imgaug/blob/master/imgaug/augmenters/arithmetic.py
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS)
def __init__(
self,
max_holes: int = 8,
max_height: int = 8,
max_width: int = 8,
min_holes: Optional[int] = None,
min_height: Optional[int] = None,
min_width: Optional[int] = None,
fill_value: int = 0,
mask_fill_value: Optional[int] = None,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.max_holes = max_holes
self.max_height = max_height
self.max_width = max_width
self.min_holes = min_holes if min_holes is not None else max_holes
self.min_height = min_height if min_height is not None else max_height
self.min_width = min_width if min_width is not None else max_width
self.fill_value = fill_value
self.mask_fill_value = mask_fill_value
if not 0 < self.min_holes <= self.max_holes:
raise ValueError(f"Invalid combination of min_holes and max_holes. Got: {[min_holes, max_holes]}")
self.check_range(self.max_height)
self.check_range(self.min_height)
self.check_range(self.max_width)
self.check_range(self.min_width)
if not 0 < self.min_height <= self.max_height:
raise ValueError(f"Invalid combination of min_height and max_height. Got: {[min_height, max_height]}")
if not 0 < self.min_width <= self.max_width:
raise ValueError(f"Invalid combination of min_width and max_width. Got: {[min_width, max_width]}")
@staticmethod
def check_range(dimension: ScalarType) -> None:
if isinstance(dimension, float) and not 0 <= dimension < 1.0:
raise ValueError(f"Invalid value {dimension}. If using floats, the value should be in the range [0.0, 1.0)")
def apply(
self,
img: np.ndarray,
fill_value: ScalarType = 0,
holes: Iterable[Tuple[int, int, int, int]] = (),
**params: Any,
) -> np.ndarray:
return cutout(img, holes, fill_value)
def apply_to_mask(
self,
mask: np.ndarray,
mask_fill_value: ScalarType = 0,
holes: Iterable[Tuple[int, int, int, int]] = (),
**params: Any,
) -> np.ndarray:
if mask_fill_value is None:
return mask
return cutout(mask, holes, mask_fill_value)
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
img = params["image"]
height, width = img.shape[:2]
holes = []
for _ in range(random.randint(self.min_holes, self.max_holes)):
if all(
[
isinstance(self.min_height, int),
isinstance(self.min_width, int),
isinstance(self.max_height, int),
isinstance(self.max_width, int),
]
):
hole_height = random.randint(self.min_height, self.max_height)
hole_width = random.randint(self.min_width, self.max_width)
elif all(
[
isinstance(self.min_height, float),
isinstance(self.min_width, float),
isinstance(self.max_height, float),
isinstance(self.max_width, float),
]
):
hole_height = int(height * random.uniform(self.min_height, self.max_height))
hole_width = int(width * random.uniform(self.min_width, self.max_width))
else:
msg = "Min width, max width, \
min height and max height \
should all either be ints or floats. \
Got: {} respectively".format(
[
type(self.min_width),
type(self.max_width),
type(self.min_height),
type(self.max_height),
]
)
raise ValueError(msg)
y1 = random.randint(0, height - hole_height)
x1 = random.randint(0, width - hole_width)
y2 = y1 + hole_height
x2 = x1 + hole_width
holes.append((x1, y1, x2, y2))
return {"holes": holes}
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def apply_to_keypoints(
self, keypoints: Sequence[KeypointType], holes: Iterable[Tuple[int, int, int, int]] = (), **params: Any
) -> List[KeypointType]:
return [keypoint for keypoint in keypoints if not any(keypoint_in_hole(keypoint, hole) for hole in holes)]
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return (
"max_holes",
"max_height",
"max_width",
"min_holes",
"min_height",
"min_width",
"fill_value",
"mask_fill_value",
)
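A minimal sketch (illustrative values); with `mask_fill_value` set, the same holes are cut out of the mask:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> mask = np.ones([100, 100], dtype=np.uint8)
>>> aug = A.Compose([A.CoarseDropout(max_holes=8, max_height=16, max_width=16, fill_value=0, mask_fill_value=0, p=1)])
>>> result = aug(image=image, mask=mask)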
grid_dropout
¶
class GridDropout
(ratio=0.5, unit_size_min=None, unit_size_max=None, holes_number_x=None, holes_number_y=None, shift_x=0, shift_y=0, random_offset=False, fill_value=0, mask_fill_value=None, always_apply=False, p=0.5)
[view source on GitHub] ¶
GridDropout, drops out rectangular regions of an image and the corresponding mask in a grid fashion.
Parameters:
Name | Type | Description |
---|---|---|
ratio | float | the ratio of the mask holes to the unit_size (same for horizontal and vertical directions). Must be between 0 and 1. Default: 0.5. |
unit_size_min | int | minimum size of the grid unit. Must be between 2 and the image shorter edge. If `None`, holes_number_x and holes_number_y are used to setup the grid. Default: `None`. |
unit_size_max | int | maximum size of the grid unit. Must be between 2 and the image shorter edge. If `None`, holes_number_x and holes_number_y are used to setup the grid. Default: `None`. |
holes_number_x | int | the number of grid units in x direction. Must be between 1 and image width//2. If `None`, grid unit width is set as image_width//10. Default: `None`. |
holes_number_y | int | the number of grid units in y direction. Must be between 1 and image height//2. If `None`, grid unit height is set equal to the grid unit width or image height, whichever is smaller. Default: `None`. |
shift_x | int | offset of the grid start in x direction from the (0,0) coordinate. Clipped between 0 and grid unit_width - hole_width. Default: 0. |
shift_y | int | offset of the grid start in y direction from the (0,0) coordinate. Clipped between 0 and grid unit height - hole_height. Default: 0. |
random_offset | boolean | whether to offset the grid randomly between 0 and grid unit size - hole size. If 'True', entered shift_x, shift_y are ignored and set randomly. Default: `False`. |
fill_value | int | value for the dropped pixels. Default: 0. |
mask_fill_value | int | value for the dropped pixels in mask. If `None`, transformation is not applied to the mask. Default: `None`. |
Targets
image, mask
Image types: uint8, float32
References
https://arxiv.org/abs/2001.04086
Source code in albumentations/augmentations/dropout/grid_dropout.py
class GridDropout(DualTransform):
"""GridDropout, drops out rectangular regions of an image and the corresponding mask in a grid fashion.
Args:
ratio: the ratio of the mask holes to the unit_size (same for horizontal and vertical directions).
Must be between 0 and 1. Default: 0.5.
unit_size_min (int): minimum size of the grid unit. Must be between 2 and the image shorter edge.
If 'None', holes_number_x and holes_number_y are used to setup the grid. Default: `None`.
unit_size_max (int): maximum size of the grid unit. Must be between 2 and the image shorter edge.
If 'None', holes_number_x and holes_number_y are used to setup the grid. Default: `None`.
holes_number_x (int): the number of grid units in x direction. Must be between 1 and image width//2.
If 'None', grid unit width is set as image_width//10. Default: `None`.
holes_number_y (int): the number of grid units in y direction. Must be between 1 and image height//2.
If `None`, grid unit height is set equal to the grid unit width or image height, whichever is smaller.
shift_x (int): offsets of the grid start in x direction from (0,0) coordinate.
Clipped between 0 and grid unit_width - hole_width. Default: 0.
shift_y (int): offsets of the grid start in y direction from (0,0) coordinate.
Clipped between 0 and grid unit height - hole_height. Default: 0.
random_offset (boolean): whether to offset the grid randomly between 0 and grid unit size - hole size.
If 'True', entered shift_x, shift_y are ignored and set randomly. Default: `False`.
fill_value (int): value for the dropped pixels. Default = 0
mask_fill_value (int): value for the dropped pixels in mask.
If `None`, transformation is not applied to the mask. Default: `None`.
Targets:
image, mask
Image types:
uint8, float32
References:
https://arxiv.org/abs/2001.04086
"""
_targets = (Targets.IMAGE, Targets.MASK)
def __init__(
self,
ratio: float = 0.5,
unit_size_min: Optional[int] = None,
unit_size_max: Optional[int] = None,
holes_number_x: Optional[int] = None,
holes_number_y: Optional[int] = None,
shift_x: int = 0,
shift_y: int = 0,
random_offset: bool = False,
fill_value: int = 0,
mask_fill_value: Optional[ScalarType] = None,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.ratio = ratio
self.unit_size_min = unit_size_min
self.unit_size_max = unit_size_max
self.holes_number_x = holes_number_x
self.holes_number_y = holes_number_y
self.shift_x = shift_x
self.shift_y = shift_y
self.random_offset = random_offset
self.fill_value = fill_value
self.mask_fill_value = mask_fill_value
if not 0 < self.ratio <= 1:
msg = "ratio must be between 0 and 1."
raise ValueError(msg)
def apply(self, img: np.ndarray, holes: Iterable[Tuple[int, int, int, int]] = (), **params: Any) -> np.ndarray:
return F.cutout(img, holes, self.fill_value)
def apply_to_mask(
self, mask: np.ndarray, holes: Iterable[Tuple[int, int, int, int]] = (), **params: Any
) -> np.ndarray:
if self.mask_fill_value is None:
return mask
return F.cutout(mask, holes, self.mask_fill_value)
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
img = params["image"]
height, width = img.shape[:2]
unit_width, unit_height = self._calculate_unit_dimensions(width, height)
hole_width, hole_height = self._calculate_hole_dimensions(unit_width, unit_height)
shift_x, shift_y = self._calculate_shifts(unit_width, unit_height, hole_width, hole_height)
holes = self._generate_holes(width, height, unit_width, unit_height, hole_width, hole_height, shift_x, shift_y)
return {"holes": holes}
def _calculate_unit_dimensions(self, width: int, height: int) -> Tuple[int, int]:
"""Calculates the dimensions of the grid units."""
if self.unit_size_min is not None and self.unit_size_max is not None:
self._validate_unit_sizes(height, width)
unit_size = random.randint(self.unit_size_min, self.unit_size_max)
return unit_size, unit_size
return self._calculate_dimensions_based_on_holes(width, height)
def _validate_unit_sizes(self, height: int, width: int) -> None:
"""Validates the minimum and maximum unit sizes."""
if self.unit_size_min is not None and self.unit_size_max is not None:
if not TWO <= self.unit_size_min <= self.unit_size_max:
msg = "Max unit size should be >= min size, both at least 2 pixels."
raise ValueError(msg)
if self.unit_size_max > min(height, width):
msg = "Grid size limits must be within the shortest image edge."
raise ValueError(msg)
else:
msg = "unit_size_min and unit_size_max must not be None."
raise ValueError(msg)
def _calculate_dimensions_based_on_holes(self, width: int, height: int) -> Tuple[int, int]:
"""Calculates dimensions based on the number of holes specified."""
unit_width = self._calculate_dimension(width, self.holes_number_x, 10)
unit_height = self._calculate_dimension(height, self.holes_number_y, unit_width)
return unit_width, unit_height
def _calculate_dimension(self, dimension: int, holes_number: Optional[int], fallback: int) -> int:
"""Helper function to calculate unit width or height."""
if holes_number is None:
return max(2, dimension // fallback)
if not 1 <= holes_number <= dimension // 2:
raise ValueError(f"The number of holes must be between 1 and {dimension // 2}.")
return dimension // holes_number
def _calculate_hole_dimensions(self, unit_width: int, unit_height: int) -> Tuple[int, int]:
"""Calculates the dimensions of the holes to be dropped out."""
hole_width = int(unit_width * self.ratio)
hole_height = int(unit_height * self.ratio)
hole_width = min(max(hole_width, 1), unit_width - 1)
hole_height = min(max(hole_height, 1), unit_height - 1)
return hole_width, hole_height
def _calculate_shifts(
self, unit_width: int, unit_height: int, hole_width: int, hole_height: int
) -> Tuple[int, int]:
"""Calculates the shifts for the grid start."""
if self.random_offset:
shift_x = random.randint(0, unit_width - hole_width)
shift_y = random.randint(0, unit_height - hole_height)
else:
shift_x = 0 if self.shift_x is None else min(max(0, self.shift_x), unit_width - hole_width)
shift_y = 0 if self.shift_y is None else min(max(0, self.shift_y), unit_height - hole_height)
return shift_x, shift_y
def _generate_holes(
self,
width: int,
height: int,
unit_width: int,
unit_height: int,
hole_width: int,
hole_height: int,
shift_x: int,
shift_y: int,
) -> List[Tuple[int, int, int, int]]:
"""Generates the list of holes to be dropped out."""
holes = []
for i in range(width // unit_width + 1):
for j in range(height // unit_height + 1):
x1 = min(shift_x + unit_width * i, width)
y1 = min(shift_y + unit_height * j, height)
x2 = min(x1 + hole_width, width)
y2 = min(y1 + hole_height, height)
holes.append((x1, y1, x2, y2))
return holes
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return (
"ratio",
"unit_size_min",
"unit_size_max",
"holes_number_x",
"holes_number_y",
"shift_x",
"shift_y",
"random_offset",
"fill_value",
"mask_fill_value",
)
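An illustrative sketch (arbitrary values); `random_offset=True` shifts the grid randomly so hole positions vary between calls:
>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> aug = A.Compose([A.GridDropout(ratio=0.4, unit_size_min=10, unit_size_max=20, random_offset=True, p=1)])
>>> result = aug(image=image)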
mask_dropout
¶
class MaskDropout
(max_objects=1, image_fill_value=0, mask_fill_value=0, always_apply=False, p=0.5)
[view source on GitHub] ¶
Image & mask augmentation that zero out mask and image regions corresponding to randomly chosen object instance from mask.
Mask must be single-channel image, zero values treated as background. Image can be any number of channels.
Inspired by https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/114254
Parameters:
Name | Type | Description |
---|---|---|
max_objects | int | Maximum number of labels that can be zeroed out. Can be a tuple; in that case it is treated as [min, max]. |
image_fill_value | Union[float, str] | Fill value to use when filling image. Can be 'inpaint' to apply inpainting (works only for 3-channel images) |
mask_fill_value | Union[int, float] | Fill value to use when filling mask. |
Targets
image, mask
Image types: uint8, float32
Source code in albumentations/augmentations/dropout/mask_dropout.py
class MaskDropout(DualTransform):
"""Image & mask augmentation that zero out mask and image regions corresponding
to randomly chosen object instance from mask.
Mask must be single-channel image, zero values treated as background.
Image can be any number of channels.
Inspired by https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/114254
Args:
max_objects: Maximum number of labels that can be zeroed out. Can be a tuple; in that case it is treated as [min, max]
image_fill_value: Fill value to use when filling image.
Can be 'inpaint' to apply inpainting (works only for 3-channel images)
mask_fill_value: Fill value to use when filling mask.
Targets:
image, mask
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK)
def __init__(
self,
max_objects: int = 1,
image_fill_value: Union[float, str] = 0,
mask_fill_value: ScalarType = 0,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.max_objects = to_tuple(max_objects, 1)
self.image_fill_value = image_fill_value
self.mask_fill_value = mask_fill_value
@property
def targets_as_params(self) -> List[str]:
return ["mask"]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
mask = params["mask"]
label_image, num_labels = label(mask, return_num=True)
if num_labels == 0:
dropout_mask = None
else:
objects_to_drop = random.randint(int(self.max_objects[0]), int(self.max_objects[1]))
objects_to_drop = min(num_labels, objects_to_drop)
if objects_to_drop == num_labels:
dropout_mask = mask > 0
else:
labels_index = random.sample(range(1, num_labels + 1), objects_to_drop)
dropout_mask = np.zeros((mask.shape[0], mask.shape[1]), dtype=bool)
for label_index in labels_index:
dropout_mask |= label_image == label_index
params.update({"dropout_mask": dropout_mask})
del params["mask"]
return params
def apply(self, img: np.ndarray, dropout_mask: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
if dropout_mask is None:
return img
if self.image_fill_value == "inpaint":
dropout_mask = dropout_mask.astype(np.uint8)
_, _, width, height = cv2.boundingRect(dropout_mask)
radius = min(3, max(width, height) // 2)
return cv2.inpaint(img, dropout_mask, radius, cv2.INPAINT_NS)
img = img.copy()
img[dropout_mask] = self.image_fill_value
return img
def apply_to_mask(self, mask: np.ndarray, dropout_mask: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
if dropout_mask is None:
return mask
mask = mask.copy()
mask[dropout_mask] = self.mask_fill_value
return mask
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return "max_objects", "image_fill_value", "mask_fill_value"
xy_masking
¶
class XYMasking
(num_masks_x=0, num_masks_y=0, mask_x_length=0, mask_y_length=0, fill_value=0, mask_fill_value=0, always_apply=False, p=0.5)
[view source on GitHub] ¶
Applies masking strips to an image, either horizontally (X axis) or vertically (Y axis), simulating occlusions. This transform is useful for training models to recognize images with varied visibility conditions. It's particularly effective for spectrogram images, allowing spectral and frequency masking to improve model robustness.
At least one of mask_x_length or mask_y_length must be specified, dictating the mask's maximum size along each axis.
Parameters:
Name | Type | Description |
---|---|---|
num_masks_x | Union[int, Tuple[int, int]] | Number or range of horizontal regions to mask. Defaults to 0. |
num_masks_y | Union[int, Tuple[int, int]] | Number or range of vertical regions to mask. Defaults to 0. |
mask_x_length | Union[int, Tuple[int, int]] | Specifies the length of the masks along the X (horizontal) axis. If an integer is provided, it sets a fixed mask length. If a tuple of two integers (min, max) is provided, the mask length is randomly chosen within this range for each mask. This allows for variable-length masks in the horizontal direction. |
mask_y_length | Union[int, Tuple[int, int]] | Specifies the height of the masks along the Y (vertical) axis. Similar to mask_x_length, an integer sets a fixed mask height, while a tuple (min, max) allows for variable-height masks chosen randomly within the specified range for each mask. |
fill_value | Union[int, float, List[int], List[float]] | Value to fill image masks. Defaults to 0. |
mask_fill_value | Optional[Union[int, float, List[int], List[float]]] | Value to fill masked regions in the mask target. If None, the mask is not affected. Default: None. |
p | float | Probability of applying the transform. Defaults to 0.5. |
Targets
image, mask, keypoints
Image types: uint8, float32
Note: Either mask_x_length or mask_y_length or both must be defined.
Source code in albumentations/augmentations/dropout/xy_masking.py
class XYMasking(DualTransform):
"""Applies masking strips to an image, either horizontally (X axis) or vertically (Y axis),
simulating occlusions. This transform is useful for training models to recognize images
with varied visibility conditions. It's particularly effective for spectrogram images,
allowing spectral and frequency masking to improve model robustness.
At least one of `mask_x_length` or `mask_y_length` must be specified, dictating the mask's
maximum size along each axis.
Args:
num_masks_x (Union[int, Tuple[int, int]]): Number or range of horizontal regions to mask. Defaults to 0.
num_masks_y (Union[int, Tuple[int, int]]): Number or range of vertical regions to mask. Defaults to 0.
mask_x_length (Union[int, Tuple[int, int]]): Specifies the length of the masks along
the X (horizontal) axis. If an integer is provided, it sets a fixed mask length.
If a tuple of two integers (min, max) is provided,
the mask length is randomly chosen within this range for each mask.
This allows for variable-length masks in the horizontal direction.
mask_y_length (Union[int, Tuple[int, int]]): Specifies the height of the masks along
the Y (vertical) axis. Similar to `mask_x_length`, an integer sets a fixed mask height,
while a tuple (min, max) allows for variable-height masks, chosen randomly
within the specified range for each mask. This flexibility facilitates creating masks of various
sizes in the vertical direction.
fill_value (Union[int, float, List[int], List[float]]): Value to fill image masks. Defaults to 0.
mask_fill_value (Optional[Union[int, float, List[int], List[float]]]): Value to fill masks in the mask.
If `None`, the mask is not affected. Default: `None`.
p (float): Probability of applying the transform. Defaults to 0.5.
Targets:
image, mask, keypoints
Image types:
uint8, float32
Note: Either `mask_x_length` or `mask_y_length` or both must be defined.
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS)
def __init__(
self,
num_masks_x: ScaleIntType = 0,
num_masks_y: ScaleIntType = 0,
mask_x_length: ScaleIntType = 0,
mask_y_length: ScaleIntType = 0,
fill_value: ColorType = 0,
mask_fill_value: ColorType = 0,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
if (
isinstance(mask_x_length, (int, float))
and mask_x_length <= 0
and isinstance(mask_y_length, (int, float))
and mask_y_length <= 0
):
msg = "At least one of `mask_x_length` or `mask_y_length` Should be a positive number."
raise ValueError(msg)
if isinstance(num_masks_x, int) and num_masks_x <= 0 and isinstance(num_masks_y, int) and num_masks_y <= 0:
msg = (
"At least one of `num_masks_x` or `num_masks_y` "
"should be a positive number or tuple of two positive numbers."
)
raise ValueError(msg)
if isinstance(num_masks_x, (tuple, list)) and min(num_masks_x) <= 0:
msg = "All values in `num_masks_x` should be non negative integers."
raise ValueError(msg)
if isinstance(num_masks_y, (tuple, list)) and min(num_masks_y) <= 0:
msg = "All values in `num_masks_y` should be non negative integers."
raise ValueError(msg)
self.num_masks_x = num_masks_x
self.num_masks_y = num_masks_y
self.mask_x_length = mask_x_length
self.mask_y_length = mask_y_length
self.fill_value = fill_value
self.mask_fill_value = mask_fill_value
def apply(
self,
img: np.ndarray,
masks_x: List[Tuple[int, int, int, int]],
masks_y: List[Tuple[int, int, int, int]],
**params: Any,
) -> np.ndarray:
return cutout(img, masks_x + masks_y, self.fill_value)
def apply_to_mask(
self,
mask: np.ndarray,
masks_x: List[Tuple[int, int, int, int]],
masks_y: List[Tuple[int, int, int, int]],
**params: Any,
) -> np.ndarray:
if self.mask_fill_value is None:
return mask
return cutout(mask, masks_x + masks_y, self.mask_fill_value)
def validate_mask_length(
self, mask_length: Optional[ScaleIntType], dimension_size: int, dimension_name: str
) -> None:
"""Validate the mask length against the corresponding image dimension size.
Args:
mask_length (Optional[Union[int, Tuple[int, int]]]): The length of the mask to be validated.
dimension_size (int): The size of the image dimension (width or height)
against which to validate the mask length.
dimension_name (str): The name of the dimension ('width' or 'height') for error messaging.
"""
if mask_length is not None:
if isinstance(mask_length, (tuple, list)):
if mask_length[0] < 0 or mask_length[1] > dimension_size:
raise ValueError(
f"{dimension_name} range {mask_length} is out of valid range [0, {dimension_size}]"
)
elif mask_length < 0 or mask_length > dimension_size:
raise ValueError(f"{dimension_name} {mask_length} exceeds image {dimension_name} {dimension_size}")
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, List[Tuple[int, int, int, int]]]:
img = params["image"]
height, width = img.shape[:2]
# Use the helper method to validate mask lengths against image dimensions
self.validate_mask_length(self.mask_x_length, width, "mask_x_length")
self.validate_mask_length(self.mask_y_length, height, "mask_y_length")
masks_x = self.generate_masks(self.num_masks_x, width, height, self.mask_x_length, axis="x")
masks_y = self.generate_masks(self.num_masks_y, width, height, self.mask_y_length, axis="y")
return {"masks_x": masks_x, "masks_y": masks_y}
@staticmethod
def generate_mask_size(mask_length: Union[ScaleIntType]) -> int:
if isinstance(mask_length, int):
return mask_length # Use fixed size or adjust to dimension size
return random.randint(min(mask_length), max(mask_length))
def generate_masks(
self,
num_masks: ScaleIntType,
width: int,
height: int,
max_length: Optional[ScaleIntType],
axis: str,
) -> List[Tuple[int, int, int, int]]:
if max_length is None or max_length == 0 or isinstance(num_masks, (int, float)) and num_masks == 0:
return []
masks = []
num_masks_integer = num_masks if isinstance(num_masks, int) else random.randint(num_masks[0], num_masks[1])
for _ in range(num_masks_integer):
length = self.generate_mask_size(max_length)
if axis == "x":
x1 = random.randint(0, width - length)
y1 = 0
x2, y2 = x1 + length, height
else: # axis == 'y'
y1 = random.randint(0, height - length)
x1 = 0
x2, y2 = width, y1 + length
masks.append((x1, y1, x2, y2))
return masks
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def apply_to_keypoints(
self,
keypoints: Sequence[KeypointType],
masks_x: List[Tuple[int, int, int, int]],
masks_y: List[Tuple[int, int, int, int]],
**params: Any,
) -> List[KeypointType]:
return [
keypoint
for keypoint in keypoints
if not any(keypoint_in_hole(keypoint, hole) for hole in masks_x + masks_y)
]
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return (
"num_masks_x",
"num_masks_y",
"mask_x_length",
"mask_y_length",
"fill_value",
"mask_fill_value",
)
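A brief usage sketch (shapes and parameter values are illustrative): masking a spectrogram-like float image with strips along both axes.
import numpy as np
import albumentations as A

spectrogram = np.random.rand(128, 256).astype(np.float32)

transform = A.XYMasking(
    num_masks_x=(1, 3),      # 1 to 3 full-height vertical strips
    num_masks_y=1,           # one full-width horizontal strip
    mask_x_length=(10, 20),  # strip width drawn uniformly from [10, 20]
    mask_y_length=8,         # fixed strip height
    fill_value=0,
    p=1.0,
)
masked = transform(image=spectrogram)["image"]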
validate_mask_length (self, mask_length, dimension_size, dimension_name)
¶
Validate the mask length against the corresponding image dimension size.
Parameters:
Name | Type | Description |
---|---|---|
mask_length | Optional[Union[int, Tuple[int, int]]] | The length of the mask to be validated. |
dimension_size | int | The size of the image dimension (width or height) against which to validate the mask length. |
dimension_name | str | The name of the dimension ('width' or 'height') for error messaging. |
Source code in albumentations/augmentations/dropout/xy_masking.py
def validate_mask_length(
self, mask_length: Optional[ScaleIntType], dimension_size: int, dimension_name: str
) -> None:
"""Validate the mask length against the corresponding image dimension size.
Args:
mask_length (Optional[Union[int, Tuple[int, int]]]): The length of the mask to be validated.
dimension_size (int): The size of the image dimension (width or height)
against which to validate the mask length.
dimension_name (str): The name of the dimension ('width' or 'height') for error messaging.
"""
if mask_length is not None:
if isinstance(mask_length, (tuple, list)):
if mask_length[0] < 0 or mask_length[1] > dimension_size:
raise ValueError(
f"{dimension_name} range {mask_length} is out of valid range [0, {dimension_size}]"
)
elif mask_length < 0 or mask_length > dimension_size:
raise ValueError(f"{dimension_name} {mask_length} exceeds image {dimension_name} {dimension_size}")
functional
¶
def add_fog (img, fog_coef, alpha_coef, haze_list)
[view source on GitHub]¶
Add fog to the image.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Parameters:
Name | Type | Description |
---|---|---|
img | ndarray | Image. |
fog_coef | float | Fog coefficient. |
alpha_coef | float | Alpha coefficient. |
haze_list | List[Tuple[int, int]] | (x, y) anchor points around which fog circles are drawn. |
Returns:
Type | Description |
---|---|
ndarray | Image. |
Source code in albumentations/augmentations/functional.py
@preserve_shape
def add_fog(img: np.ndarray, fog_coef: float, alpha_coef: float, haze_list: List[Tuple[int, int]]) -> np.ndarray:
"""Add fog to the image.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
img: Image.
fog_coef: Fog coefficient.
alpha_coef: Alpha coefficient.
haze_list:
Returns:
Image.
"""
non_rgb_warning(img)
input_dtype = img.dtype
needs_float = False
if input_dtype == np.float32:
img = from_float(img, dtype=np.dtype("uint8"))
needs_float = True
elif input_dtype not in (np.uint8, np.float32):
raise ValueError(f"Unexpected dtype {input_dtype} for RandomFog augmentation")
width = img.shape[1]
hw = max(int(width // 3 * fog_coef), 10)
for haze_points in haze_list:
x, y = haze_points
overlay = img.copy()
output = img.copy()
alpha = alpha_coef * fog_coef
rad = hw // 2
point = (x + hw // 2, y + hw // 2)
cv2.circle(overlay, point, int(rad), (255, 255, 255), -1)
cv2.addWeighted(overlay, alpha, output, 1 - alpha, 0, output)
img = output.copy()
image_rgb = cv2.blur(img, (hw // 10, hw // 10))
if needs_float:
image_rgb = to_float(image_rgb, max_value=255)
return image_rgb
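An illustrative direct call, assuming the function is importable from albumentations.augmentations.functional as the source path above suggests (the haze points and coefficients are made up; RandomFog normally samples them):
import numpy as np
from albumentations.augmentations.functional import add_fog

img = np.random.randint(0, 256, (200, 300, 3), dtype=np.uint8)
haze_list = [(50, 50), (150, 100), (220, 30)]  # (x, y) anchors for fog circles

foggy = add_fog(img, fog_coef=0.3, alpha_coef=0.08, haze_list=haze_list)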
def add_gravel (img, gravels)
[view source on GitHub]¶
Add gravel to the image.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Parameters:
Name | Type | Description |
---|---|---|
img | numpy.ndarray | image to add gravel to |
gravels | list | list of gravel parameters. (float, float, float, float): (top-left x, top-left y, bottom-right x, bottom-right y) |
Returns:
Type | Description |
---|---|
numpy.ndarray |
Source code in albumentations/augmentations/functional.py
@ensure_contiguous
@preserve_shape
def add_gravel(img: np.ndarray, gravels: List[Any]) -> np.ndarray:
"""Add gravel to the image.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
img (numpy.ndarray): image to add gravel to
gravels (list): list of gravel parameters. (float, float, float, float):
(top-left x, top-left y, bottom-right x, bottom right y)
Returns:
numpy.ndarray:
"""
non_rgb_warning(img)
input_dtype = img.dtype
needs_float = False
if input_dtype == np.float32:
img = from_float(img, dtype=np.dtype("uint8"))
needs_float = True
elif input_dtype not in (np.uint8, np.float32):
raise ValueError(f"Unexpected dtype {input_dtype} for AddGravel augmentation")
image_hls = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)
for gravel in gravels:
y1, y2, x1, x2, sat = gravel
image_hls[x1:x2, y1:y2, 1] = sat
image_rgb = cv2.cvtColor(image_hls, cv2.COLOR_HLS2RGB)
if needs_float:
image_rgb = to_float(image_rgb, max_value=255)
return image_rgb
def add_rain (img, slant, drop_length, drop_width, drop_color, blur_value, brightness_coefficient, rain_drops)
[view source on GitHub]¶
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Parameters:
Name | Type | Description |
---|---|---|
img | ndarray | Image. |
slant | int | Rain slant in pixels along the x-axis. |
drop_length | int | Length of a rain drop in pixels. |
drop_width | int | Width (line thickness) of a rain drop. |
drop_color | Tuple[int, int, int] | Color of the rain drops. |
blur_value | int | Blur kernel size; rainy views are blurry. |
brightness_coefficient | float | Brightness multiplier; rainy days are usually shady. |
rain_drops | List[Tuple[int, int]] | (x, y) start coordinates of the rain drops. |
Returns:
Type | Description |
---|---|
ndarray | Image |
Source code in albumentations/augmentations/functional.py
@preserve_shape
def add_rain(
img: np.ndarray,
slant: int,
drop_length: int,
drop_width: int,
drop_color: Tuple[int, int, int],
blur_value: int,
brightness_coefficient: float,
rain_drops: List[Tuple[int, int]],
) -> np.ndarray:
"""From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
img: Image.
slant:
drop_length:
drop_width:
drop_color:
blur_value: Rainy views are blurry.
brightness_coefficient: Rainy days are usually shady.
rain_drops:
Returns:
Image
"""
non_rgb_warning(img)
input_dtype = img.dtype
needs_float = False
if input_dtype == np.float32:
img = from_float(img, dtype=np.dtype("uint8"))
needs_float = True
elif input_dtype not in (np.uint8, np.float32):
raise ValueError(f"Unexpected dtype {input_dtype} for RandomRain augmentation")
image = img.copy()
for rain_drop_x0, rain_drop_y0 in rain_drops:
rain_drop_x1 = rain_drop_x0 + slant
rain_drop_y1 = rain_drop_y0 + drop_length
cv2.line(
image,
(rain_drop_x0, rain_drop_y0),
(rain_drop_x1, rain_drop_y1),
drop_color,
drop_width,
)
image = cv2.blur(image, (blur_value, blur_value))  # rainy views are blurry
image_hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float32)
image_hsv[:, :, 2] *= brightness_coefficient
image_rgb = cv2.cvtColor(image_hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)
if needs_float:
return to_float(image_rgb, max_value=255)
return image_rgb
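An illustrative direct call (drop positions and parameter values are made up; RandomRain normally samples them):
import numpy as np
from albumentations.augmentations.functional import add_rain

img = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
rain_drops = [(20, 10), (100, 50), (200, 30), (280, 120)]  # drop start points (x, y)

rainy = add_rain(
    img,
    slant=5,
    drop_length=20,
    drop_width=1,
    drop_color=(200, 200, 200),
    blur_value=3,
    brightness_coefficient=0.7,
    rain_drops=rain_drops,
)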
def add_shadow (img, vertices_list)
[view source on GitHub]¶
Add shadows to the image.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Parameters:
Name | Type | Description |
---|---|---|
img | numpy.ndarray | Input RGB image. |
vertices_list | list | List of shadow polygons; each entry is an int32 vertex array accepted by cv2.fillPoly. |
Returns:
Type | Description |
---|---|
numpy.ndarray |
Source code in albumentations/augmentations/functional.py
@ensure_contiguous
@preserve_shape
def add_shadow(img: np.ndarray, vertices_list: List[List[Tuple[int, int]]]) -> np.ndarray:
"""Add shadows to the image.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
img (numpy.ndarray):
vertices_list (list):
Returns:
numpy.ndarray:
"""
non_rgb_warning(img)
input_dtype = img.dtype
needs_float = False
if input_dtype == np.float32:
img = from_float(img, dtype=np.dtype("uint8"))
needs_float = True
elif input_dtype not in (np.uint8, np.float32):
raise ValueError(f"Unexpected dtype {input_dtype} for RandomShadow augmentation")
image_hls = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)
mask = np.zeros_like(img)
# adding all shadow polygons on empty mask, single 255 denotes only red channel
for vertices in vertices_list:
cv2.fillPoly(mask, vertices, 255)
# if red channel is hot, image's "Lightness" channel's brightness is lowered
red_max_value_ind = mask[:, :, 0] == MAX_VALUES_BY_DTYPE[np.dtype("uint8")]
image_hls[:, :, 1][red_max_value_ind] = image_hls[:, :, 1][red_max_value_ind] * 0.5
image_rgb = cv2.cvtColor(image_hls, cv2.COLOR_HLS2RGB)
if needs_float:
return to_float(image_rgb, max_value=255)
return image_rgb
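A sketch of a direct call with one hand-crafted shadow polygon (RandomShadow normally samples the vertices):
import numpy as np
from albumentations.augmentations.functional import add_shadow

img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
# One quadrilateral; cv2.fillPoly expects an int32 array of shape (1, n_points, 2).
polygon = np.array([[(10, 90), (40, 40), (70, 45), (90, 95)]], dtype=np.int32)

shadowed = add_shadow(img, vertices_list=[polygon])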
def add_snow (img, snow_point, brightness_coeff)
[view source on GitHub]¶
Bleaches out pixels, imitating snow.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Parameters:
Name | Type | Description |
---|---|---|
img | ndarray | Image. |
snow_point | float | Snow coefficient; internally rescaled to an HLS lightness threshold. |
brightness_coeff | float | Brightness coefficient. |
Returns:
Type | Description |
---|---|
ndarray | Image. |
Source code in albumentations/augmentations/functional.py
@preserve_shape
def add_snow(img: np.ndarray, snow_point: float, brightness_coeff: float) -> np.ndarray:
"""Bleaches out pixels, imitation snow.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
img: Image.
snow_point: Snow coefficient.
brightness_coeff: Brightness coefficient.
Returns:
Image.
"""
non_rgb_warning(img)
input_dtype = img.dtype
needs_float = False
snow_point *= 127.5 # = 255 / 2
snow_point += 85 # = 255 / 3
if input_dtype == np.float32:
img = from_float(img, dtype=np.dtype("uint8"))
needs_float = True
elif input_dtype not in (np.uint8, np.float32):
raise ValueError(f"Unexpected dtype {input_dtype} for RandomSnow augmentation")
image_hls = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)
image_hls = np.array(image_hls, dtype=np.float32)
image_hls[:, :, 1][image_hls[:, :, 1] < snow_point] *= brightness_coeff
image_hls[:, :, 1] = clip(image_hls[:, :, 1], np.uint8, 255)
image_hls = np.array(image_hls, dtype=np.uint8)
image_rgb = cv2.cvtColor(image_hls, cv2.COLOR_HLS2RGB)
if needs_float:
image_rgb = to_float(image_rgb, max_value=255)
return image_rgb
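A minimal direct call (values illustrative); larger snow_point values bleach more of the image:
import numpy as np
from albumentations.augmentations.functional import add_snow

img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
snowy = add_snow(img, snow_point=0.3, brightness_coeff=2.0)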
def add_sun_flare (img, flare_center_x, flare_center_y, src_radius, src_color, circles)
[view source on GitHub]¶
Add sun flare.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Parameters:
Name | Type | Description |
---|---|---|
img | numpy.ndarray | Input RGB image. |
flare_center_x | float | X coordinate of the flare source. |
flare_center_y | float | Y coordinate of the flare source. |
src_radius | int | Radius of the flare source glow. |
src_color | int, int, int | Color of the flare source. |
circles | list | Flare circles as (alpha, (x, y), radius, (r, g, b)) tuples. |
Returns:
Type | Description |
---|---|
numpy.ndarray |
Source code in albumentations/augmentations/functional.py
@preserve_shape
def add_sun_flare(
img: np.ndarray,
flare_center_x: float,
flare_center_y: float,
src_radius: int,
src_color: ColorType,
circles: List[Any],
) -> np.ndarray:
"""Add sun flare.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
img (numpy.ndarray):
flare_center_x (float):
flare_center_y (float):
src_radius:
src_color (int, int, int):
circles (list):
Returns:
numpy.ndarray:
"""
non_rgb_warning(img)
input_dtype = img.dtype
needs_float = False
if input_dtype == np.float32:
img = from_float(img, dtype=np.dtype("uint8"))
needs_float = True
elif input_dtype not in (np.uint8, np.float32):
raise ValueError(f"Unexpected dtype {input_dtype} for RandomSunFlareaugmentation")
overlay = img.copy()
output = img.copy()
for alpha, (x, y), rad3, (r_color, g_color, b_color) in circles:
cv2.circle(overlay, (x, y), rad3, (r_color, g_color, b_color), -1)
cv2.addWeighted(overlay, alpha, output, 1 - alpha, 0, output)
point = (int(flare_center_x), int(flare_center_y))
overlay = output.copy()
num_times = src_radius // 10
alpha = np.linspace(0.0, 1, num=num_times)
rad = np.linspace(1, src_radius, num=num_times)
for i in range(num_times):
cv2.circle(overlay, point, int(rad[i]), src_color, -1)
alp = alpha[num_times - i - 1] * alpha[num_times - i - 1] * alpha[num_times - i - 1]
cv2.addWeighted(overlay, alp, output, 1 - alp, 0, output)
image_rgb = output
if needs_float:
image_rgb = to_float(image_rgb, max_value=255)
return image_rgb
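A direct-call sketch with two hand-picked flare circles (RandomSunFlare normally samples these):
import numpy as np
from albumentations.augmentations.functional import add_sun_flare

img = np.random.randint(0, 256, (200, 200, 3), dtype=np.uint8)
circles = [
    (0.10, (120, 80), 8, (255, 255, 224)),  # (alpha, (x, y), radius, (r, g, b))
    (0.08, (140, 90), 5, (255, 255, 224)),
]

flared = add_sun_flare(
    img,
    flare_center_x=100.0,
    flare_center_y=60.0,
    src_radius=40,
    src_color=(255, 255, 255),
    circles=circles,
)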
def bbox_from_mask (mask)
[view source on GitHub]¶
Create bounding box from binary mask (fast version)
Parameters:
Name | Type | Description |
---|---|---|
mask | numpy.ndarray | binary mask. |
Returns:
Type | Description |
---|---|
tuple | A bounding box tuple (x_min, y_min, x_max, y_max). |
Source code in albumentations/augmentations/functional.py
def bbox_from_mask(mask: np.ndarray) -> Tuple[int, int, int, int]:
"""Create bounding box from binary mask (fast version)
Args:
mask (numpy.ndarray): binary mask.
Returns:
tuple: A bounding box tuple `(x_min, y_min, x_max, y_max)`.
"""
rows = np.any(mask, axis=1)
if not rows.any():
return -1, -1, -1, -1
cols = np.any(mask, axis=0)
y_min, y_max = np.where(rows)[0][[0, -1]]
x_min, x_max = np.where(cols)[0][[0, -1]]
return x_min, y_min, x_max + 1, y_max + 1
def fancy_pca (img, alpha=0.1)
[view source on GitHub]¶
Perform 'Fancy PCA' augmentation from: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Parameters:
Name | Type | Description |
---|---|---|
img | ndarray | numpy array with (h, w, rgb) shape, as ints between 0 and 255 |
alpha | float | how much to perturb/scale the eigenvectors and eigenvalues; the paper used std=0.1 |
Returns:
Type | Description |
---|---|
ndarray | numpy image-like array as uint8 range(0, 255) |
Source code in albumentations/augmentations/functional.py
def fancy_pca(img: np.ndarray, alpha: float = 0.1) -> np.ndarray:
"""Perform 'Fancy PCA' augmentation from:
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Args:
img: numpy array with (h, w, rgb) shape, as ints between 0-255
alpha: how much to perturb/scale the eigen vecs and vals
the paper used std=0.1
Returns:
numpy image-like array as uint8 range(0, 255)
"""
if not is_rgb_image(img) or img.dtype != np.uint8:
msg = "Image must be RGB image in uint8 format."
raise TypeError(msg)
orig_img = img.astype(float).copy()
img = img / 255.0 # rescale to 0 to 1 range
# flatten image to columns of RGB
img_rs = img.reshape(-1, 3)
# img_rs shape (640000, 3)
# center mean
img_centered = img_rs - np.mean(img_rs, axis=0)
# paper says 3x3 covariance matrix
img_cov = np.cov(img_centered, rowvar=False)
# eigen values and eigen vectors
eig_vals, eig_vecs = np.linalg.eigh(img_cov)
# sort values and vector
sort_perm = eig_vals[::-1].argsort()
eig_vals[::-1].sort()
eig_vecs = eig_vecs[:, sort_perm]
# > get [p1, p2, p3]
m1 = np.column_stack(eig_vecs)
# get 3x1 matrix of eigen values multiplied by random variable draw from normal
# distribution with mean of 0 and standard deviation of 0.1
m2 = np.zeros((3, 1))
# according to the paper alpha should only be draw once per augmentation (not once per channel)
# > alpha = np.random.normal(0, alpha_std)
# broad cast to speed things up
m2[:, 0] = alpha * eig_vals[:]
# this is the vector that we're going to add to each pixel in a moment
add_vect = np.array(m1) @ np.array(m2)
for idx in range(3): # RGB
orig_img[..., idx] += add_vect[idx] * 255
# for image processing it was found that working with float 0.0 to 1.0
# was easier than integers between 0-255
# > orig_img /= 255.0
orig_img = np.clip(orig_img, 0.0, 255.0)
# > orig_img *= 255
return orig_img.astype(np.uint8)
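A one-line usage sketch on a random RGB image:
import numpy as np
from albumentations.augmentations.functional import fancy_pca

rgb = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
jittered = fancy_pca(rgb, alpha=0.1)  # same shape and dtype, colors shifted along PCA axes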
def iso_noise (image, color_shift=0.05, intensity=0.5, random_state=None, ** kwargs)
[view source on GitHub]¶
Apply Poisson noise to the image to simulate camera sensor noise.
Parameters:
Name | Type | Description |
---|---|---|
image | numpy.ndarray | Input image, currently, only RGB, uint8 images are supported. |
color_shift | float | Amount of hue shift applied as color noise. |
intensity | float | Multiplication factor for noise values. Values of ~0.5 produce a noticeable, yet acceptable level of noise. |
random_state | Optional[int] | |
**kwargs | Any |
Returns:
Type | Description |
---|---|
numpy.ndarray | Noised image |
Source code in albumentations/augmentations/functional.py
@clipped
def iso_noise(
image: np.ndarray,
color_shift: float = 0.05,
intensity: float = 0.5,
random_state: Optional[int] = None,
**kwargs: Any,
) -> np.ndarray:
"""Apply poisson noise to image to simulate camera sensor noise.
Args:
image (numpy.ndarray): Input image, currently, only RGB, uint8 images are supported.
color_shift (float):
intensity (float): Multiplication factor for noise values. Values of ~0.5 produce a noticeable,
yet acceptable level of noise.
random_state:
**kwargs:
Returns:
numpy.ndarray: Noised image
"""
if image.dtype != np.uint8:
msg = "Image must have uint8 channel type"
raise TypeError(msg)
if not is_rgb_image(image):
msg = "Image must be RGB"
raise TypeError(msg)
one_over_255 = float(1.0 / 255.0)
image = np.multiply(image, one_over_255, dtype=np.float32)
hls = cv2.cvtColor(image, cv2.COLOR_RGB2HLS)
_, stddev = cv2.meanStdDev(hls)
luminance_noise = random_utils.poisson(stddev[1] * intensity * 255, size=hls.shape[:2], random_state=random_state)
color_noise = random_utils.normal(0, color_shift * 360 * intensity, size=hls.shape[:2], random_state=random_state)
hue = hls[..., 0]
hue += color_noise
hue %= 360
luminance = hls[..., 1]
luminance += (luminance_noise / 255) * (1.0 - luminance)
image = cv2.cvtColor(hls, cv2.COLOR_HLS2RGB) * 255
return image.astype(np.uint8)
def mask_from_bbox (img, bbox)
[view source on GitHub]¶
Create binary mask from bounding box
Parameters:
Name | Type | Description |
---|---|---|
img | ndarray | input image |
bbox | Tuple[int, int, int, int] | A bounding box tuple |
Returns:
Type | Description |
---|---|
mask | binary mask |
Source code in albumentations/augmentations/functional.py
def mask_from_bbox(img: np.ndarray, bbox: Tuple[int, int, int, int]) -> np.ndarray:
"""Create binary mask from bounding box
Args:
img: input image
bbox: A bounding box tuple `(x_min, y_min, x_max, y_max)`
Returns:
mask: binary mask
"""
mask = np.zeros(img.shape[:2], dtype=np.uint8)
x_min, y_min, x_max, y_max = bbox
mask[y_min:y_max, x_min:x_max] = 1
return mask
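A worked roundtrip between bbox_from_mask and mask_from_bbox on toy 8x8 arrays:
import numpy as np
from albumentations.augmentations.functional import bbox_from_mask, mask_from_bbox

img = np.zeros((8, 8, 3), dtype=np.uint8)
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:5, 3:7] = 1

bbox = bbox_from_mask(mask)           # (3, 2, 7, 5): x_min, y_min, x_max, y_max
restored = mask_from_bbox(img, bbox)  # rectangular mask; identical to the original here
assert np.array_equal(mask, restored)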
def move_tone_curve (img, low_y, high_y)
[view source on GitHub]¶
Rescales the relationship between bright and dark areas of the image by manipulating its tone curve.
Parameters:
Name | Type | Description |
---|---|---|
img | ndarray | RGB or grayscale image. |
low_y | float | y-position of a Bezier control point used to adjust the tone curve, must be in range [0, 1] |
high_y | float | y-position of a Bezier control point used to adjust image tone curve, must be in range [0, 1] |
Source code in albumentations/augmentations/functional.py
@preserve_shape
def move_tone_curve(img: np.ndarray, low_y: float, high_y: float) -> np.ndarray:
"""Rescales the relationship between bright and dark areas of the image by manipulating its tone curve.
Args:
img: RGB or grayscale image.
low_y: y-position of a Bezier control point used
to adjust the tone curve, must be in range [0, 1]
high_y: y-position of a Bezier control point used
to adjust image tone curve, must be in range [0, 1]
"""
input_dtype = img.dtype
if not 0 <= low_y <= 1:
msg = "low_shift must be in range [0, 1]"
raise ValueError(msg)
if not 0 <= high_y <= 1:
msg = "high_shift must be in range [0, 1]"
raise ValueError(msg)
if input_dtype != np.uint8:
raise ValueError(f"Unsupported image type {input_dtype}")
t = np.linspace(0.0, 1.0, 256)
# Defines response of a four-point Bezier curve
def evaluate_bez(t: np.ndarray) -> np.ndarray:
return 3 * (1 - t) ** 2 * t * low_y + 3 * (1 - t) * t**2 * high_y + t**3
evaluate_bez = np.vectorize(evaluate_bez)
remapping = np.rint(evaluate_bez(t) * 255).astype(np.uint8)
lut_fn = _maybe_process_in_chunks(cv2.LUT, lut=remapping)
return lut_fn(img)
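An illustrative call; pushing low_y down and high_y up yields an S-shaped curve that boosts contrast:
import numpy as np
from albumentations.augmentations.functional import move_tone_curve

img = np.random.randint(0, 256, (50, 50, 3), dtype=np.uint8)
contrasty = move_tone_curve(img, low_y=0.15, high_y=0.85)  # uint8 input required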
def multiply (img, multiplier)
[view source on GitHub]¶
Parameters:
Name | Type | Description |
---|---|---|
img | ndarray | Image. |
multiplier | ndarray | Multiplier coefficient. |
Returns:
Type | Description |
---|---|
ndarray | Image multiplied by multiplier coefficient. |
Source code in albumentations/augmentations/functional.py
def multiply(img: np.ndarray, multiplier: np.ndarray) -> np.ndarray:
"""Args:
img: Image.
multiplier: Multiplier coefficient.
Returns:
Image multiplied by `multiplier` coefficient.
"""
if img.dtype == np.uint8:
if len(multiplier.shape) == 1:
return _multiply_uint8_optimized(img, multiplier)
return _multiply_uint8(img, multiplier)
return _multiply_non_uint8(img, multiplier)
def posterize (img, bits)
[view source on GitHub]¶
Reduce the number of bits for each color channel.
Parameters:
Name | Type | Description |
---|---|---|
img | ndarray | image to posterize. |
bits | int | number of high bits. Must be in range [0, 8] |
Returns:
Type | Description |
---|---|
ndarray | Image with reduced color channels. |
Source code in albumentations/augmentations/functional.py
@preserve_shape
def posterize(img: np.ndarray, bits: int) -> np.ndarray:
"""Reduce the number of bits for each color channel.
Args:
img: image to posterize.
bits: number of high bits. Must be in range [0, 8]
Returns:
Image with reduced color channels.
"""
bits_array = np.uint8(bits)
if img.dtype != np.uint8:
msg = "Image must have uint8 channel type"
raise TypeError(msg)
if np.any((bits_array < 0) | (bits_array > EIGHT)):
msg = "bits must be in range [0, 8]"
raise ValueError(msg)
if not bits_array.shape or len(bits_array) == 1:
if bits_array == 0:
return np.zeros_like(img)
if bits_array == EIGHT:
return img.copy()
lut = np.arange(0, 256, dtype=np.uint8)
mask = ~np.uint8(2 ** (8 - bits_array) - 1)
lut &= mask
return cv2.LUT(img, lut)
if not is_rgb_image(img):
msg = "If bits is iterable image must be RGB"
raise TypeError(msg)
result_img = np.empty_like(img)
for i, channel_bits in enumerate(bits_array):
if channel_bits == 0:
result_img[..., i] = np.zeros_like(img[..., i])
elif channel_bits == EIGHT:
result_img[..., i] = img[..., i].copy()
else:
lut = np.arange(0, 256, dtype=np.uint8)
mask = ~np.uint8(2 ** (8 - channel_bits) - 1)
lut &= mask
result_img[..., i] = cv2.LUT(img[..., i], lut)
return result_img
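A short sketch with scalar and per-channel bit depths (per-channel bits require an RGB image):
import numpy as np
from albumentations.augmentations.functional import posterize

img = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
coarse = posterize(img, bits=3)               # keep the 3 high bits in every channel
per_channel = posterize(img, bits=[8, 3, 1])  # a different depth per RGB channel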
def solarize (img, threshold=128)
[view source on GitHub]¶
Invert all pixel values above a threshold.
Parameters:
Name | Type | Description |
---|---|---|
img | ndarray | The image to solarize. |
threshold | int | All pixels above this grayscale level are inverted. |
Returns:
Type | Description |
---|---|
ndarray | Solarized image. |
Source code in albumentations/augmentations/functional.py
def solarize(img: np.ndarray, threshold: int = 128) -> np.ndarray:
"""Invert all pixel values above a threshold.
Args:
img: The image to solarize.
threshold: All pixels above this grayscale level are inverted.
Returns:
Solarized image.
"""
dtype = img.dtype
max_val = MAX_VALUES_BY_DTYPE[dtype]
if dtype == np.dtype("uint8"):
lut = [(i if i < threshold else max_val - i) for i in range(int(max_val) + 1)]
prev_shape = img.shape
img = cv2.LUT(img, np.array(lut, dtype=dtype))
if len(prev_shape) != len(img.shape):
img = np.expand_dims(img, -1)
return img
result_img = img.copy()
cond = img >= threshold
result_img[cond] = max_val - result_img[cond]
return result_img
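A worked example on a tiny uint8 array; pixels at or above the threshold are inverted to 255 - value:
import numpy as np
from albumentations.augmentations.functional import solarize

img = np.array([[0, 100, 128, 200, 255]], dtype=np.uint8)
solarize(img, threshold=128)  # -> [[0, 100, 127, 55, 0]]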
def split_uniform_grid (image_shape, grid)
[view source on GitHub]¶
Splits an image shape into a uniform grid specified by the grid dimensions.
Parameters:
Name | Type | Description |
---|---|---|
image_shape | Tuple[int, int] | The shape of the image as (height, width). |
grid | Tuple[int, int] | The grid size as (rows, columns). |
Returns:
Type | Description |
---|---|
np.ndarray | An array containing the tiles' coordinates in the format (start_y, start_x, end_y, end_x). |
Source code in albumentations/augmentations/functional.py
def split_uniform_grid(image_shape: Tuple[int, int], grid: Tuple[int, int]) -> np.ndarray:
"""Splits an image shape into a uniform grid specified by the grid dimensions.
Args:
image_shape (Tuple[int, int]): The shape of the image as (height, width).
grid (Tuple[int, int]): The grid size as (rows, columns).
Returns:
np.ndarray: An array containing the tiles' coordinates in the format (start_y, start_x, end_y, end_x).
"""
height, width = image_shape
n_rows, n_cols = grid
# Compute split points for the grid
height_splits = np.linspace(0, height, n_rows + 1, dtype=int)
width_splits = np.linspace(0, width, n_cols + 1, dtype=int)
# Calculate tiles coordinates
tiles = [
(height_splits[i], width_splits[j], height_splits[i + 1], width_splits[j + 1])
for i in range(n_rows)
for j in range(n_cols)
]
return np.array(tiles)
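A worked example splitting a (4, 6) shape into a 2x3 grid:
from albumentations.augmentations.functional import split_uniform_grid

tiles = split_uniform_grid(image_shape=(4, 6), grid=(2, 3))
# Each row is (start_y, start_x, end_y, end_x):
# [[0 0 2 2], [0 2 2 4], [0 4 2 6], [2 0 4 2], [2 2 4 4], [2 4 4 6]]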
def swap_tiles_on_image (image, tiles)
[view source on GitHub]¶
Swap tiles on the image according to the new format.
Parameters:
Name | Type | Description |
---|---|---|
image | ndarray | Input image. |
tiles | ndarray | Array of tiles with each tile as [start_y, start_x, end_y, end_x]. |
Returns:
Type | Description |
---|---|
np.ndarray | Output image with tiles swapped according to the random shuffle. |
Source code in albumentations/augmentations/functional.py
def swap_tiles_on_image(image: np.ndarray, tiles: np.ndarray) -> np.ndarray:
"""Swap tiles on the image according to the new format.
Args:
image: Input image.
tiles: Array of tiles with each tile as [start_y, start_x, end_y, end_x].
Returns:
np.ndarray: Output image with tiles swapped according to the random shuffle.
"""
# If no tiles are provided, return a copy of the original image
if tiles.size == 0:
return image.copy()
# Create a copy of the image to retain original for reference
new_image = np.empty_like(image)
for start_y, start_x, end_y, end_x in tiles:
# Assign the corresponding tile from the original image to the new image
new_image[start_y:end_y, start_x:end_x] = image[start_y:end_y, start_x:end_x]
return new_image
geometric
special
¶
functional
¶
def bbox_flip (bbox, d, rows, cols)
[view source on GitHub]¶
Flip a bounding box either vertically, horizontally or both depending on the value of d.
Parameters:
Name | Type | Description |
---|---|---|
bbox | Tuple[float, float, float, float] | A bounding box (x_min, y_min, x_max, y_max). |
d | int | Flip direction. 0 for vertical flip, 1 for horizontal, -1 for both. |
rows | int | Image rows. |
cols | int | Image cols. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A bounding box (x_min, y_min, x_max, y_max). |
Exceptions:
Type | Description |
---|---|
ValueError | if value of d is not -1, 0 or 1. |
Source code in albumentations/augmentations/geometric/functional.py
def bbox_flip(bbox: BoxInternalType, d: int, rows: int, cols: int) -> BoxInternalType:
"""Flip a bounding box either vertically, horizontally or both depending on the value of `d`.
Args:
bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
d: dimension. 0 for vertical flip, 1 for horizontal, -1 for both flips
rows: Image rows.
cols: Image cols.
Returns:
A bounding box `(x_min, y_min, x_max, y_max)`.
Raises:
ValueError: if value of `d` is not -1, 0 or 1.
"""
if d == 0:
bbox = bbox_vflip(bbox, rows, cols)
elif d == 1:
bbox = bbox_hflip(bbox, rows, cols)
elif d == -1:
bbox = bbox_hflip(bbox, rows, cols)
bbox = bbox_vflip(bbox, rows, cols)
else:
raise ValueError(f"Invalid d value {d}. Valid values are -1, 0 and 1")
return bbox
def bbox_hflip (bbox, rows, cols)
[view source on GitHub]¶
Flip a bounding box horizontally around the y-axis.
Parameters:
Name | Type | Description |
---|---|---|
bbox | Tuple[float, float, float, float] | A bounding box (x_min, y_min, x_max, y_max). |
rows | int | Image rows. |
cols | int | Image cols. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A bounding box (x_min, y_min, x_max, y_max). |
Source code in albumentations/augmentations/geometric/functional.py
def bbox_hflip(bbox: BoxInternalType, rows: int, cols: int) -> BoxInternalType:
"""Flip a bounding box horizontally around the y-axis.
Args:
bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
rows: Image rows.
cols: Image cols.
Returns:
A bounding box `(x_min, y_min, x_max, y_max)`.
"""
x_min, y_min, x_max, y_max = bbox[:4]
return 1 - x_max, y_min, 1 - x_min, y_max
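A worked example in normalized coordinates (rows and cols are part of the signature but unused by the math):
from albumentations.augmentations.geometric.functional import bbox_hflip

bbox_hflip((0.1, 0.2, 0.4, 0.5), rows=100, cols=200)
# -> (0.6, 0.2, 0.9, 0.5): x values are mirrored as 1 - x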
def bbox_rot90 (bbox, factor, rows, cols)
[view source on GitHub]¶
Rotates a bounding box by 90 degrees CCW (see np.rot90)
Parameters:
Name | Type | Description |
---|---|---|
bbox | Tuple[float, float, float, float] | A bounding box tuple (x_min, y_min, x_max, y_max). |
factor | int | Number of CCW rotations. Must be in set {0, 1, 2, 3} See np.rot90. |
rows | int | Image rows. |
cols | int | Image cols. |
Returns:
Type | Description |
---|---|
tuple | A bounding box tuple (x_min, y_min, x_max, y_max). |
Source code in albumentations/augmentations/geometric/functional.py
def bbox_rot90(bbox: BoxInternalType, factor: int, rows: int, cols: int) -> BoxInternalType:
"""Rotates a bounding box by 90 degrees CCW (see np.rot90)
Args:
bbox: A bounding box tuple (x_min, y_min, x_max, y_max).
factor: Number of CCW rotations. Must be in set {0, 1, 2, 3} See np.rot90.
rows: Image rows.
cols: Image cols.
Returns:
tuple: A bounding box tuple (x_min, y_min, x_max, y_max).
"""
if factor not in {0, 1, 2, 3}:
msg = "Parameter n must be in set {0, 1, 2, 3}"
raise ValueError(msg)
x_min, y_min, x_max, y_max = bbox[:4]
if factor == 1:
bbox = y_min, 1 - x_max, y_max, 1 - x_min
elif factor == TWO:
bbox = 1 - x_max, 1 - y_max, 1 - x_min, 1 - y_min
elif factor == THREE:
bbox = 1 - y_max, x_min, 1 - y_min, x_max
return bbox
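A worked example for a single 90-degree CCW rotation:
from albumentations.augmentations.geometric.functional import bbox_rot90

bbox_rot90((0.1, 0.2, 0.4, 0.5), factor=1, rows=100, cols=100)
# -> (0.2, 0.6, 0.5, 0.9), i.e. (y_min, 1 - x_max, y_max, 1 - x_min)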
def bbox_rotate (bbox, angle, method, rows, cols)
[view source on GitHub]¶
Rotates a bounding box by angle degrees.
Parameters:
Name | Type | Description |
---|---|---|
bbox | Tuple[float, float, float, float] | A bounding box (x_min, y_min, x_max, y_max). |
angle | float | Angle of rotation in degrees. |
method | str | Rotation method used. Should be one of: "largest_box", "ellipse". Default: "largest_box". |
rows | int | Image rows. |
cols | int | Image cols. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A bounding box (x_min, y_min, x_max, y_max). |
References: https://arxiv.org/abs/2109.13488
Source code in albumentations/augmentations/geometric/functional.py
def bbox_rotate(bbox: BoxInternalType, angle: float, method: str, rows: int, cols: int) -> BoxInternalType:
"""Rotates a bounding box by angle degrees.
Args:
bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
angle: Angle of rotation in degrees.
method: Rotation method used. Should be one of: "largest_box", "ellipse". Default: "largest_box".
rows: Image rows.
cols: Image cols.
Returns:
A bounding box `(x_min, y_min, x_max, y_max)`.
References:
https://arxiv.org/abs/2109.13488
"""
x_min, y_min, x_max, y_max = bbox[:4]
scale = cols / float(rows)
if method == "largest_box":
x = np.array([x_min, x_max, x_max, x_min]) - 0.5
y = np.array([y_min, y_min, y_max, y_max]) - 0.5
elif method == "ellipse":
w = (x_max - x_min) / 2
h = (y_max - y_min) / 2
data = np.arange(0, 360, dtype=np.float32)
x = w * np.sin(np.radians(data)) + (w + x_min - 0.5)
y = h * np.cos(np.radians(data)) + (h + y_min - 0.5)
else:
raise ValueError(f"Method {method} is not a valid rotation method.")
angle = np.deg2rad(angle)
x_t = (np.cos(angle) * x * scale + np.sin(angle) * y) / scale
y_t = -np.sin(angle) * x * scale + np.cos(angle) * y
x_t = x_t + 0.5
y_t = y_t + 0.5
x_min, x_max = min(x_t), max(x_t)
y_min, y_max = min(y_t), max(y_t)
return x_min, y_min, x_max, y_max
def bbox_shift_scale_rotate (bbox, angle, scale, dx, dy, rotate_method, rows, cols, ** kwargs)
[view source on GitHub]¶
Rotates, shifts and scales a bounding box. Rotation is made by angle degrees, scaling is made by scale factor and shifting is made by dx and dy.
Parameters:
Name | Type | Description |
---|---|---|
bbox | tuple | A bounding box (x_min, y_min, x_max, y_max). |
angle | int | Angle of rotation in degrees. |
scale | int | Scale factor. |
dx | int | Shift along x-axis in pixel units. |
dy | int | Shift along y-axis in pixel units. |
rotate_method | str | Rotation method used. Should be one of: "largest_box", "ellipse". Default: "largest_box". |
rows | int | Image rows. |
cols | int | Image cols. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A bounding box (x_min, y_min, x_max, y_max). |
Source code in albumentations/augmentations/geometric/functional.py
def bbox_shift_scale_rotate(
bbox: BoxInternalType,
angle: float,
scale: float,
dx: int,
dy: int,
rotate_method: str,
rows: int,
cols: int,
**kwargs: Any,
) -> BoxInternalType:
"""Rotates, shifts and scales a bounding box. Rotation is made by angle degrees,
scaling is made by scale factor and shifting is made by dx and dy.
Args:
bbox (tuple): A bounding box `(x_min, y_min, x_max, y_max)`.
angle (int): Angle of rotation in degrees.
scale (int): Scale factor.
dx (int): Shift along x-axis in pixel units.
dy (int): Shift along y-axis in pixel units.
rotate_method(str): Rotation method used. Should be one of: "largest_box", "ellipse".
Default: "largest_box".
rows (int): Image rows.
cols (int): Image cols.
Returns:
A bounding box `(x_min, y_min, x_max, y_max)`.
"""
height, width = rows, cols
center = (width / 2, height / 2)
if rotate_method == "ellipse":
x_min, y_min, x_max, y_max = bbox_rotate(bbox, angle, rotate_method, rows, cols)
matrix = cv2.getRotationMatrix2D(center, 0, scale)
else:
x_min, y_min, x_max, y_max = bbox[:4]
matrix = cv2.getRotationMatrix2D(center, angle, scale)
matrix[0, 2] += dx * width
matrix[1, 2] += dy * height
x = np.array([x_min, x_max, x_max, x_min])
y = np.array([y_min, y_min, y_max, y_max])
ones = np.ones(shape=(len(x)))
points_ones = np.vstack([x, y, ones]).transpose()
points_ones[:, 0] *= width
points_ones[:, 1] *= height
tr_points = matrix.dot(points_ones.T).T
tr_points[:, 0] /= width
tr_points[:, 1] /= height
x_min, x_max = min(tr_points[:, 0]), max(tr_points[:, 0])
y_min, y_max = min(tr_points[:, 1]), max(tr_points[:, 1])
return x_min, y_min, x_max, y_max
def bbox_transpose (bbox, axis, rows, cols)
[view source on GitHub]¶
Transposes a bounding box along given axis.
Parameters:
Name | Type | Description |
---|---|---|
bbox | Tuple[float, float, float, float] | A bounding box (x_min, y_min, x_max, y_max). |
axis | int | 0 - main axis, 1 - secondary axis. |
rows | int | Image rows. |
cols | int | Image cols. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A bounding box tuple (x_min, y_min, x_max, y_max). |
Exceptions:
Type | Description |
---|---|
ValueError | If axis not equal to 0 or 1. |
Source code in albumentations/augmentations/geometric/functional.py
def bbox_transpose(bbox: KeypointInternalType, axis: int, rows: int, cols: int) -> KeypointInternalType:
"""Transposes a bounding box along given axis.
Args:
bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
axis: 0 - main axis, 1 - secondary axis.
rows: Image rows.
cols: Image cols.
Returns:
A bounding box tuple `(x_min, y_min, x_max, y_max)`.
Raises:
ValueError: If axis not equal to 0 or 1.
"""
x_min, y_min, x_max, y_max = bbox[:4]
if axis not in {0, 1}:
msg = "Axis must be either 0 or 1."
raise ValueError(msg)
if axis == 0:
bbox = (y_min, x_min, y_max, x_max)
if axis == 1:
bbox = (1 - y_max, 1 - x_max, 1 - y_min, 1 - x_min)
return bbox
def bbox_vflip (bbox, rows, cols)
[view source on GitHub]¶
Flip a bounding box vertically around the x-axis.
Parameters:
Name | Type | Description |
---|---|---|
bbox | Tuple[float, float, float, float] | A bounding box (x_min, y_min, x_max, y_max). |
rows | int | Image rows. |
cols | int | Image cols. |
Returns:
Type | Description |
---|---|
tuple | A bounding box (x_min, y_min, x_max, y_max). |
Source code in albumentations/augmentations/geometric/functional.py
def bbox_vflip(bbox: BoxInternalType, rows: int, cols: int) -> BoxInternalType:
"""Flip a bounding box vertically around the x-axis.
Args:
bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
rows: Image rows.
cols: Image cols.
Returns:
tuple: A bounding box `(x_min, y_min, x_max, y_max)`.
"""
x_min, y_min, x_max, y_max = bbox[:4]
return x_min, 1 - y_max, x_max, 1 - y_min
def elastic_transform (img, alpha, sigma, alpha_affine, interpolation=1, border_mode=4, value=None, random_state=None, approximate=False, same_dxdy=False)
[view source on GitHub]¶
Elastic deformation of images as described in [Simard2003]_ (with modifications). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5
.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.
Source code in albumentations/augmentations/geometric/functional.py
@preserve_shape
def elastic_transform(
img: np.ndarray,
alpha: float,
sigma: float,
alpha_affine: float,
interpolation: int = cv2.INTER_LINEAR,
border_mode: int = cv2.BORDER_REFLECT_101,
value: Optional[ImageColorType] = None,
random_state: Optional[np.random.RandomState] = None,
approximate: bool = False,
same_dxdy: bool = False,
) -> np.ndarray:
"""Elastic deformation of images as described in [Simard2003]_ (with modifications).
Based on https://gist.github.com/ernestum/601cdf56d2b424757de5
.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for
Convolutional Neural Networks applied to Visual Document Analysis", in
Proc. of the International Conference on Document Analysis and
Recognition, 2003.
"""
height, width = img.shape[:2]
# Random affine
center_square = np.array((height, width), dtype=np.float32) // 2
square_size = min((height, width)) // 3
alpha = float(alpha)
sigma = float(sigma)
alpha_affine = float(alpha_affine)
pts1 = np.array(
[
center_square + square_size,
[center_square[0] + square_size, center_square[1] - square_size],
center_square - square_size,
],
dtype=np.float32,
)
pts2 = pts1 + random_utils.uniform(-alpha_affine, alpha_affine, size=pts1.shape, random_state=random_state).astype(
np.float32
)
matrix = cv2.getAffineTransform(pts1, pts2)
warp_fn = _maybe_process_in_chunks(
cv2.warpAffine, M=matrix, dsize=(width, height), flags=interpolation, borderMode=border_mode, borderValue=value
)
img = warp_fn(img)
if approximate:
# Approximate computation smooth displacement map with a large enough kernel.
# On large images (512+) this is approximately 2X times faster
dx = random_utils.rand(height, width, random_state=random_state).astype(np.float32) * 2 - 1
cv2.GaussianBlur(dx, (17, 17), sigma, dst=dx)
dx *= alpha
if same_dxdy:
# Speed up even more
dy = dx
else:
dy = random_utils.rand(height, width, random_state=random_state).astype(np.float32) * 2 - 1
cv2.GaussianBlur(dy, (17, 17), sigma, dst=dy)
dy *= alpha
else:
dx = np.float32(
gaussian_filter((random_utils.rand(height, width, random_state=random_state) * 2 - 1), sigma) * alpha
)
if same_dxdy:
# Speed up
dy = dx
else:
dy = np.float32(
gaussian_filter((random_utils.rand(height, width, random_state=random_state) * 2 - 1), sigma) * alpha
)
x, y = np.meshgrid(np.arange(width), np.arange(height))
map_x = np.float32(x + dx)
map_y = np.float32(y + dy)
remap_fn = _maybe_process_in_chunks(
cv2.remap, map1=map_x, map2=map_y, interpolation=interpolation, borderMode=border_mode, borderValue=value
)
return remap_fn(img)
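A direct-call sketch with a fixed random state for reproducibility (parameter values illustrative):
import cv2
import numpy as np
from albumentations.augmentations.geometric.functional import elastic_transform

img = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
rng = np.random.RandomState(0)

warped = elastic_transform(
    img,
    alpha=50.0,         # displacement magnitude
    sigma=5.0,          # smoothness of the displacement field
    alpha_affine=10.0,  # strength of the random affine pre-warp
    interpolation=cv2.INTER_LINEAR,
    border_mode=cv2.BORDER_REFLECT_101,
    random_state=rng,
)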
def elastic_transform_approx (img, alpha, sigma, alpha_affine, interpolation=1, border_mode=4, value=None, random_state=None)
[view source on GitHub]¶
Elastic deformation of images as described in [Simard2003]_ (with modifications for speed). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5
.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.
Source code in albumentations/augmentations/geometric/functional.py
@preserve_shape
def elastic_transform_approx(
img: np.ndarray,
alpha: float,
sigma: float,
alpha_affine: float,
interpolation: int = cv2.INTER_LINEAR,
border_mode: int = cv2.BORDER_REFLECT_101,
value: Optional[ImageColorType] = None,
random_state: Optional[np.random.RandomState] = None,
) -> np.ndarray:
"""Elastic deformation of images as described in [Simard2003]_ (with modifications for speed).
Based on https://gist.github.com/ernestum/601cdf56d2b424757de5
.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for
Convolutional Neural Networks applied to Visual Document Analysis", in
Proc. of the International Conference on Document Analysis and
Recognition, 2003.
"""
height, width = img.shape[:2]
# Random affine
center_square = np.array((height, width), dtype=np.float32) // 2
square_size = min((height, width)) // 3
alpha = float(alpha)
sigma = float(sigma)
alpha_affine = float(alpha_affine)
pts1 = np.array(
[
center_square + square_size,
[center_square[0] + square_size, center_square[1] - square_size],
center_square - square_size,
],
dtype=np.float32,
)
pts2 = pts1 + random_utils.uniform(-alpha_affine, alpha_affine, size=pts1.shape, random_state=random_state).astype(
np.float32
)
matrix = cv2.getAffineTransform(pts1, pts2)
warp_fn = _maybe_process_in_chunks(
cv2.warpAffine,
M=matrix,
dsize=(width, height),
flags=interpolation,
borderMode=border_mode,
borderValue=value,
)
img = warp_fn(img)
dx = random_utils.rand(height, width, random_state=random_state).astype(np.float32) * 2 - 1
cv2.GaussianBlur(dx, (17, 17), sigma, dst=dx)
dx *= alpha
dy = random_utils.rand(height, width, random_state=random_state).astype(np.float32) * 2 - 1
cv2.GaussianBlur(dy, (17, 17), sigma, dst=dy)
dy *= alpha
x, y = np.meshgrid(np.arange(width), np.arange(height))
map_x = np.float32(x + dx)
map_y = np.float32(y + dy)
remap_fn = _maybe_process_in_chunks(
cv2.remap,
map1=map_x,
map2=map_y,
interpolation=interpolation,
borderMode=border_mode,
borderValue=value,
)
return remap_fn(img)
def find_keypoint (position, distance_map, threshold, inverted)
[view source on GitHub]¶
Determine if a valid keypoint can be found at the given position.
Source code in albumentations/augmentations/geometric/functional.py
def find_keypoint(
position: Tuple[int, int], distance_map: np.ndarray, threshold: Optional[float], inverted: bool
) -> Optional[Tuple[float, float]]:
"""Determine if a valid keypoint can be found at the given position."""
y, x = position
value = distance_map[y, x]
if not inverted and threshold is not None and value >= threshold:
return None
if inverted and threshold is not None and value < threshold:
return None
return float(x), float(y)
def from_distance_maps (distance_maps, inverted, if_not_found_coords, threshold=None)
[view source on GitHub]¶
Convert outputs of to_distance_maps to KeypointsOnImage. This is the inverse of to_distance_maps.
Source code in albumentations/augmentations/geometric/functional.py
def from_distance_maps(
distance_maps: np.ndarray,
inverted: bool,
if_not_found_coords: Optional[Union[Sequence[int], Dict[str, Any]]],
threshold: Optional[float] = None,
) -> List[Tuple[float, float]]:
"""Convert outputs of `to_distance_maps` to `KeypointsOnImage`.
This is the inverse of `to_distance_maps`.
"""
if distance_maps.ndim != THREE:
msg = f"Expected three-dimensional input, got {distance_maps.ndim} dimensions and shape {distance_maps.shape}."
raise ValueError(msg)
height, width, nb_keypoints = distance_maps.shape
drop_if_not_found, if_not_found_x, if_not_found_y = validate_if_not_found_coords(if_not_found_coords)
keypoints = []
for i in range(nb_keypoints):
hitidx_flat = np.argmax(distance_maps[..., i]) if inverted else np.argmin(distance_maps[..., i])
hitidx_ndim = np.unravel_index(hitidx_flat, (height, width))
keypoint = find_keypoint(hitidx_ndim, distance_maps[:, :, i], threshold, inverted)
if keypoint:
keypoints.append(keypoint)
elif not drop_if_not_found:
keypoints.append((if_not_found_x, if_not_found_y))
return keypoints
def grid_distortion (img, num_steps=10, xsteps=(), ysteps=(), interpolation=1, border_mode=4, value=None)
[view source on GitHub]¶
Perform a grid distortion of an input image.
Source code in albumentations/augmentations/geometric/functional.py
@preserve_shape
def grid_distortion(
img: np.ndarray,
num_steps: int = 10,
xsteps: Tuple[()] = (),
ysteps: Tuple[()] = (),
interpolation: int = cv2.INTER_LINEAR,
border_mode: int = cv2.BORDER_REFLECT_101,
value: Optional[ImageColorType] = None,
) -> np.ndarray:
"""Perform a grid distortion of an input image.
Reference:
http://pythology.blogspot.sg/2014/03/interpolation-on-regular-distorted-grid.html
"""
height, width = img.shape[:2]
x_step = width // num_steps
xx = np.zeros(width, np.float32)
prev = 0
for idx in range(num_steps + 1):
x = idx * x_step
start = int(x)
end = int(x) + x_step
if end > width:
end = width
cur = width
else:
cur = prev + x_step * xsteps[idx]
xx[start:end] = np.linspace(prev, cur, end - start)
prev = cur
y_step = height // num_steps
yy = np.zeros(height, np.float32)
prev = 0
for idx in range(num_steps + 1):
y = idx * y_step
start = int(y)
end = int(y) + y_step
if end > height:
end = height
cur = height
else:
cur = prev + y_step * ysteps[idx]
yy[start:end] = np.linspace(prev, cur, end - start)
prev = cur
map_x, map_y = np.meshgrid(xx, yy)
map_x = map_x.astype(np.float32)
map_y = map_y.astype(np.float32)
remap_fn = _maybe_process_in_chunks(
cv2.remap,
map1=map_x,
map2=map_y,
interpolation=interpolation,
borderMode=border_mode,
borderValue=value,
)
return remap_fn(img)
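A direct-call sketch; each factor stretches (>1) or squeezes (<1) one grid cell, and num_steps + 1 factors per axis are needed (values illustrative):
import numpy as np
from albumentations.augmentations.geometric.functional import grid_distortion

img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
xsteps = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0]  # horizontal per-cell factors
ysteps = [1.0, 0.9, 1.1, 1.0, 1.2, 0.8]  # vertical per-cell factors

distorted = grid_distortion(img, num_steps=5, xsteps=xsteps, ysteps=ysteps)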
def keypoint_flip (keypoint, d, rows, cols)
[view source on GitHub]¶
Flip a keypoint either vertically, horizontally or both depending on the value of d.
Parameters:
Name | Type | Description |
---|---|---|
keypoint | Tuple[float, float, float, float] | A keypoint (x, y, angle, scale). |
d | int | Flip direction. Must be -1, 0 or 1: 0 - vertical flip, 1 - horizontal flip, -1 - vertical and horizontal flip. |
rows | int | Image height. |
cols | int | Image width. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A keypoint (x, y, angle, scale). |
Exceptions:
Type | Description |
---|---|
ValueError | if value of d is not -1, 0 or 1. |
Source code in albumentations/augmentations/geometric/functional.py
def keypoint_flip(keypoint: KeypointInternalType, d: int, rows: int, cols: int) -> KeypointInternalType:
"""Flip a keypoint either vertically, horizontally or both depending on the value of `d`.
Args:
keypoint: A keypoint `(x, y, angle, scale)`.
d: Number of flip. Must be -1, 0 or 1:
* 0 - vertical flip,
* 1 - horizontal flip,
* -1 - vertical and horizontal flip.
rows: Image height.
cols: Image width.
Returns:
A keypoint `(x, y, angle, scale)`.
Raises:
ValueError: if value of `d` is not -1, 0 or 1.
"""
if d == 0:
keypoint = keypoint_vflip(keypoint, rows, cols)
elif d == 1:
keypoint = keypoint_hflip(keypoint, rows, cols)
elif d == -1:
keypoint = keypoint_hflip(keypoint, rows, cols)
keypoint = keypoint_vflip(keypoint, rows, cols)
else:
raise ValueError(f"Invalid d value {d}. Valid values are -1, 0 and 1")
return keypoint
def keypoint_hflip (keypoint, rows, cols)
[view source on GitHub]¶
Flip a keypoint horizontally around the y-axis.
Parameters:
Name | Type | Description |
---|---|---|
keypoint | Tuple[float, float, float, float] | A keypoint (x, y, angle, scale). |
rows | int | Image height. |
cols | int | Image width. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A keypoint (x, y, angle, scale). |
Source code in albumentations/augmentations/geometric/functional.py
@angle_2pi_range
def keypoint_hflip(keypoint: KeypointInternalType, rows: int, cols: int) -> KeypointInternalType:
"""Flip a keypoint horizontally around the y-axis.
Args:
keypoint: A keypoint `(x, y, angle, scale)`.
rows: Image height.
cols: Image width.
Returns:
A keypoint `(x, y, angle, scale)`.
"""
x, y, angle, scale = keypoint[:4]
angle = math.pi - angle
return (cols - 1) - x, y, angle, scale
def keypoint_rot90 (keypoint, factor, rows, cols, ** params)
[view source on GitHub]¶
Rotates a keypoint by 90 degrees CCW (see np.rot90)
Parameters:
Name | Type | Description |
---|---|---|
keypoint | Tuple[float, float, float, float] | A keypoint (x, y, angle, scale). |
factor | int | Number of CCW rotations. Must be in range [0;3] See np.rot90. |
rows | int | Image height. |
cols | int | Image width. |
Returns:
Type | Description |
---|---|
tuple | A keypoint (x, y, angle, scale). |
Exceptions:
Type | Description |
---|---|
ValueError | if factor not in set {0, 1, 2, 3} |
Source code in albumentations/augmentations/geometric/functional.py
@angle_2pi_range
def keypoint_rot90(
keypoint: KeypointInternalType, factor: int, rows: int, cols: int, **params: Any
) -> KeypointInternalType:
"""Rotates a keypoint by 90 degrees CCW (see np.rot90)
Args:
keypoint: A keypoint `(x, y, angle, scale)`.
factor: Number of CCW rotations. Must be in range [0;3] See np.rot90.
rows: Image height.
cols: Image width.
Returns:
tuple: A keypoint `(x, y, angle, scale)`.
Raises:
ValueError: if factor not in set {0, 1, 2, 3}
"""
x, y, angle, scale = keypoint[:4]
if factor not in {0, 1, 2, 3}:
msg = "Parameter n must be in set {0, 1, 2, 3}"
raise ValueError(msg)
if factor == 1:
x, y, angle = y, (cols - 1) - x, angle - math.pi / 2
elif factor == TWO:
x, y, angle = (cols - 1) - x, (rows - 1) - y, angle - math.pi
elif factor == THREE:
x, y, angle = (rows - 1) - y, x, angle + math.pi / 2
return x, y, angle, scale
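For example, a single 90-degree CCW rotation maps (x, y) to (y, (cols - 1) - x); a small sketch with illustrative values:
from albumentations.augmentations.geometric.functional import keypoint_rot90
kp = (10.0, 20.0, 0.0, 1.0)
rotated = keypoint_rot90(kp, factor=1, rows=100, cols=200)
# (x, y) -> (y, (cols - 1) - x) = (20.0, 189.0); the angle is shifted by -pi/2.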
def keypoint_rotate (keypoint, angle, rows, cols, ** params)
[view source on GitHub]¶
Rotate a keypoint by angle.
Parameters:
Name | Type | Description |
---|---|---|
keypoint | Tuple[float, float, float, float] | A keypoint (x, y, angle, scale). |
angle | float | Rotation angle. |
rows | int | Image height. |
cols | int | Image width. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A keypoint (x, y, angle, scale). |
Source code in albumentations/augmentations/geometric/functional.py
@angle_2pi_range
def keypoint_rotate(
keypoint: KeypointInternalType, angle: float, rows: int, cols: int, **params: Any
) -> KeypointInternalType:
"""Rotate a keypoint by angle.
Args:
keypoint: A keypoint `(x, y, angle, scale)`.
angle: Rotation angle.
rows: Image height.
cols: Image width.
Returns:
A keypoint `(x, y, angle, scale)`.
"""
center = (cols - 1) * 0.5, (rows - 1) * 0.5
matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
x, y, a, s = keypoint[:4]
x, y = cv2.transform(np.array([[[x, y]]]), matrix).squeeze()
return x, y, a + math.radians(angle), s
def keypoint_scale (keypoint, scale_x, scale_y)
[view source on GitHub]¶
Scales a keypoint by scale_x and scale_y.
Parameters:
Name | Type | Description |
---|---|---|
keypoint | Tuple[float, float, float, float] | A keypoint (x, y, angle, scale). |
scale_x | float | Scale coefficient x-axis. |
scale_y | float | Scale coefficient y-axis. |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A keypoint (x, y, angle, scale). |
Source code in albumentations/augmentations/geometric/functional.py
def keypoint_scale(keypoint: KeypointInternalType, scale_x: float, scale_y: float) -> KeypointInternalType:
"""Scales a keypoint by scale_x and scale_y.
Args:
keypoint: A keypoint `(x, y, angle, scale)`.
scale_x: Scale coefficient x-axis.
scale_y: Scale coefficient y-axis.
Returns:
A keypoint `(x, y, angle, scale)`.
"""
x, y, angle, scale = keypoint[:4]
return x * scale_x, y * scale_y, angle, scale * max(scale_x, scale_y)
def keypoint_transpose (keypoint)
[view source on GitHub]¶
Transpose a keypoint: swap its x and y coordinates (reflection along the main diagonal).
Parameters:
Name | Type | Description |
---|---|---|
keypoint | Tuple[float, float, float, float] | A keypoint (x, y, angle, scale). |
Returns:
Type | Description |
---|---|
Tuple[float, float, float, float] | A keypoint (x, y, angle, scale). |
Source code in albumentations/augmentations/geometric/functional.py
def keypoint_transpose(keypoint: KeypointInternalType) -> KeypointInternalType:
"""Rotate a keypoint by angle.
Args:
keypoint: A keypoint `(x, y, angle, scale)`.
Returns:
A keypoint `(x, y, angle, scale)`.
"""
x, y, angle, scale = keypoint[:4]
angle = np.pi - angle if angle <= np.pi else 3 * np.pi - angle
return y, x, angle, scale
def keypoint_vflip (keypoint, rows, cols)
[view source on GitHub]¶
Flip a keypoint vertically around the x-axis.
Parameters:
Name | Type | Description |
---|---|---|
keypoint | Tuple[float, float, float, float] | A keypoint (x, y, angle, scale). |
rows | int | Image height. |
cols | int | Image width. |
Returns:
Type | Description |
---|---|
tuple | A keypoint (x, y, angle, scale). |
Source code in albumentations/augmentations/geometric/functional.py
@angle_2pi_range
def keypoint_vflip(keypoint: KeypointInternalType, rows: int, cols: int) -> KeypointInternalType:
"""Flip a keypoint vertically around the x-axis.
Args:
keypoint: A keypoint `(x, y, angle, scale)`.
rows: Image height.
cols: Image width.
Returns:
tuple: A keypoint `(x, y, angle, scale)`.
"""
x, y, angle, scale = keypoint[:4]
angle = -angle
return x, (rows - 1) - y, angle, scale
def optical_distortion (img, k=0, dx=0, dy=0, interpolation=1, border_mode=4, value=None)
[view source on GitHub]¶
Barrel / pincushion distortion. Unconventional augment.
Reference
- https://stackoverflow.com/questions/6199636/formulas-for-barrel-pincushion-distortion
- https://stackoverflow.com/questions/10364201/image-transformation-in-opencv
- https://stackoverflow.com/questions/2477774/correcting-fisheye-distortion-programmatically
- http://www.coldvision.io/2017/03/02/advanced-lane-finding-using-opencv/
Source code in albumentations/augmentations/geometric/functional.py
@preserve_shape
def optical_distortion(
img: np.ndarray,
k: int = 0,
dx: int = 0,
dy: int = 0,
interpolation: int = cv2.INTER_LINEAR,
border_mode: int = cv2.BORDER_REFLECT_101,
value: Optional[ImageColorType] = None,
) -> np.ndarray:
"""Barrel / pincushion distortion. Unconventional augment.
Reference:
| https://stackoverflow.com/questions/6199636/formulas-for-barrel-pincushion-distortion
| https://stackoverflow.com/questions/10364201/image-transformation-in-opencv
| https://stackoverflow.com/questions/2477774/correcting-fisheye-distortion-programmatically
| http://www.coldvision.io/2017/03/02/advanced-lane-finding-using-opencv/
"""
height, width = img.shape[:2]
fx = width
fy = height
cx = width * 0.5 + dx
cy = height * 0.5 + dy
camera_matrix = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float32)
distortion = np.array([k, k, 0, 0, 0], dtype=np.float32)
map1, map2 = cv2.initUndistortRectifyMap(camera_matrix, distortion, None, None, (width, height), cv2.CV_32FC1)
return cv2.remap(img, map1, map2, interpolation=interpolation, borderMode=border_mode, borderValue=value)
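A short sketch of applying this function directly (the distortion coefficient k=0.05 is an illustrative value, not a library default):
import numpy as np
from albumentations.augmentations.geometric.functional import optical_distortion
image = np.random.randint(0, 256, (100, 200, 3), dtype=np.uint8)  # dummy image
distorted = optical_distortion(image, k=0.05, dx=0, dy=0)
# The sign of k selects barrel vs. pincushion distortion; dx/dy shift the principal point.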
def rotation2d_matrix_to_euler_angles (matrix, y_up=False)
[view source on GitHub]¶
Parameters:
Name | Type | Description |
---|---|---|
matrix | np.ndarray | Rotation matrix. |
y_up | bool | Whether the Y axis points up (True) or down (False). |
Source code in albumentations/augmentations/geometric/functional.py
def to_distance_maps (keypoints, height, width, inverted=False)
[view source on GitHub]¶
Generate a (H, W, N) array of distance maps for N keypoints. The n-th distance map contains at every location (y, x) the euclidean distance to the n-th keypoint.
This function can be used as a helper when augmenting keypoints with a method that only supports the augmentation of images.
Parameters:
Name | Type | Description |
---|---|---|
keypoints | Sequence[Tuple[float, float]] | keypoint coordinates |
height | int | image height |
width | int | image width |
inverted | bool | If True, inverted distance maps are returned where each distance value d is replaced by d/(d+1), i.e. values lie in (0.0, 1.0] with 1.0 denoting exactly the position of the respective keypoint. |
Returns:
Type | Description |
---|---|
ndarray | A (H, W, N) float32 array containing N distance maps for N keypoints. Each location (y, x, n) denotes the euclidean distance at (y, x) to the n-th keypoint. |
Source code in albumentations/augmentations/geometric/functional.py
def to_distance_maps(
keypoints: Sequence[Tuple[float, float]], height: int, width: int, inverted: bool = False
) -> np.ndarray:
"""Generate a ``(H,W,N)`` array of distance maps for ``N`` keypoints.
The ``n``-th distance map contains at every location ``(y, x)`` the
euclidean distance to the ``n``-th keypoint.
This function can be used as a helper when augmenting keypoints with a
method that only supports the augmentation of images.
Args:
        keypoints: keypoint coordinates
height: image height
width: image width
inverted (bool): If ``True``, inverted distance maps are returned where each
distance value d is replaced by ``d/(d+1)``, i.e. the distance
maps have values in the range ``(0.0, 1.0]`` with ``1.0`` denoting
exactly the position of the respective keypoint.
Returns:
(H, W, N) ndarray
A ``float32`` array containing ``N`` distance maps for ``N``
keypoints. Each location ``(y, x, n)`` in the array denotes the
euclidean distance at ``(y, x)`` to the ``n``-th keypoint.
If `inverted` is ``True``, the distance ``d`` is replaced
by ``d/(d+1)``. The height and width of the array match the
height and width in ``KeypointsOnImage.shape``.
"""
distance_maps = np.zeros((height, width, len(keypoints)), dtype=np.float32)
yy = np.arange(0, height)
xx = np.arange(0, width)
grid_xx, grid_yy = np.meshgrid(xx, yy)
for i, (x, y) in enumerate(keypoints):
distance_maps[:, :, i] = (grid_xx - x) ** 2 + (grid_yy - y) ** 2
distance_maps = np.sqrt(distance_maps)
if inverted:
return 1 / (distance_maps + 1)
return distance_maps
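A quick sketch with two illustrative keypoints:
import numpy as np
from albumentations.augmentations.geometric.functional import to_distance_maps
keypoints = [(10.0, 15.0), (40.0, 5.0)]  # (x, y) pairs
maps = to_distance_maps(keypoints, height=50, width=60, inverted=True)
# maps.shape == (50, 60, 2); maps[15, 10, 0] == 1.0 exactly at the first keypoint.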
def validate_if_not_found_coords (if_not_found_coords)
[view source on GitHub]¶
Validate and process the if_not_found_coords parameter.
Source code in albumentations/augmentations/geometric/functional.py
def validate_if_not_found_coords(
if_not_found_coords: Optional[Union[Sequence[int], Dict[str, Any]]],
) -> Tuple[bool, int, int]:
"""Validate and process `if_not_found_coords` parameter."""
if if_not_found_coords is None:
return True, -1, -1
if isinstance(if_not_found_coords, (tuple, list)):
if len(if_not_found_coords) != TWO:
msg = "Expected tuple/list 'if_not_found_coords' to contain exactly two entries."
raise ValueError(msg)
return False, if_not_found_coords[0], if_not_found_coords[1]
if isinstance(if_not_found_coords, dict):
return False, if_not_found_coords["x"], if_not_found_coords["y"]
msg = "Expected if_not_found_coords to be None, tuple, list, or dict."
raise ValueError(msg)
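The three accepted input forms, sketched with illustrative values:
from albumentations.augmentations.geometric.functional import validate_if_not_found_coords
validate_if_not_found_coords(None)              # (True, -1, -1)
validate_if_not_found_coords((5, 7))            # (False, 5, 7)
validate_if_not_found_coords({"x": 1, "y": 2})  # (False, 1, 2)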
resize
¶
class LongestMaxSize
(max_size=1024, interpolation=1, always_apply=False, p=1)
[view source on GitHub] ¶
Rescale an image so that maximum side is equal to max_size, keeping the aspect ratio of the initial image.
Parameters:
Name | Type | Description |
---|---|---|
max_size | int, list of int | maximum size of the image after the transformation. When using a list, max size will be randomly selected from the values in the list. |
interpolation | OpenCV flag | interpolation method. Default: cv2.INTER_LINEAR. |
p | float | probability of applying the transform. Default: 1. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/resize.py
class LongestMaxSize(DualTransform):
"""Rescale an image so that maximum side is equal to max_size, keeping the aspect ratio of the initial image.
Args:
max_size (int, list of int): maximum size of the image after the transformation. When using a list, max size
will be randomly selected from the values in the list.
interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
max_size: Union[int, Sequence[int]] = 1024,
interpolation: int = cv2.INTER_LINEAR,
always_apply: bool = False,
p: float = 1,
):
super().__init__(always_apply, p)
self.interpolation = interpolation
self.max_size = max_size
def apply(
self, img: np.ndarray, max_size: int = 1024, interpolation: int = cv2.INTER_LINEAR, **params: Any
) -> np.ndarray:
return F.longest_max_size(img, max_size=max_size, interpolation=interpolation)
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
# Bounding box coordinates are scale invariant
return bbox
def apply_to_keypoint(
self, keypoint: KeypointInternalType, max_size: int = 1024, **params: Any
) -> KeypointInternalType:
height = params["rows"]
width = params["cols"]
scale = max_size / max([height, width])
return F.keypoint_scale(keypoint, scale, scale)
def get_params(self) -> Dict[str, int]:
return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return ("max_size", "interpolation")
class RandomScale
(scale_limit=0.1, interpolation=1, always_apply=False, p=0.5)
[view source on GitHub] ¶
Randomly resize the input. Output image size is different from the input image size.
Parameters:
Name | Type | Description |
---|---|---|
scale_limit | (float, float) or float | scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1. If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high). Default: (-0.1, 0.1). |
interpolation | OpenCV flag | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/resize.py
class RandomScale(DualTransform):
"""Randomly resize the input. Output image size is different from the input image size.
Args:
scale_limit ((float, float) or float): scaling factor range. If scale_limit is a single float value, the
range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1.
If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high).
Default: (-0.1, 0.1).
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
scale_limit: ScaleFloatType = 0.1,
interpolation: int = cv2.INTER_LINEAR,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.scale_limit = to_tuple(scale_limit, bias=1.0)
self.interpolation = interpolation
def get_params(self) -> Dict[str, float]:
return {"scale": random.uniform(self.scale_limit[0], self.scale_limit[1])}
def apply(
self, img: np.ndarray, scale: float = 0, interpolation: int = cv2.INTER_LINEAR, **params: Any
) -> np.ndarray:
return F.scale(img, scale, interpolation)
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
# Bounding box coordinates are scale invariant
return bbox
def apply_to_keypoint(
self, keypoint: KeypointInternalType, scale: float = 0, **params: Any
) -> KeypointInternalType:
return F.keypoint_scale(keypoint, scale, scale)
def get_transform_init_args(self) -> Dict[str, Any]:
return {"interpolation": self.interpolation, "scale_limit": to_tuple(self.scale_limit, bias=-1.0)}
class Resize
(height, width, interpolation=1, always_apply=False, p=1)
[view source on GitHub] ¶
Resize the input to the given height and width.
Parameters:
Name | Type | Description |
---|---|---|
height | int | desired height of the output. |
width | int | desired width of the output. |
interpolation | OpenCV flag | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
p | float | probability of applying the transform. Default: 1. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/resize.py
class Resize(DualTransform):
"""Resize the input to the given height and width.
Args:
height (int): desired height of the output.
width (int): desired width of the output.
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)
def __init__(
self, height: int, width: int, interpolation: int = cv2.INTER_LINEAR, always_apply: bool = False, p: float = 1
):
super().__init__(always_apply, p)
self.height = height
self.width = width
self.interpolation = interpolation
def apply(self, img: np.ndarray, interpolation: int = cv2.INTER_LINEAR, **params: Any) -> np.ndarray:
return F.resize(img, height=self.height, width=self.width, interpolation=interpolation)
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
# Bounding box coordinates are scale invariant
return bbox
def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
height = params["rows"]
width = params["cols"]
scale_x = self.width / width
scale_y = self.height / height
return F.keypoint_scale(keypoint, scale_x, scale_y)
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return ("height", "width", "interpolation")
class SmallestMaxSize
(max_size=1024, interpolation=1, always_apply=False, p=1)
[view source on GitHub] ¶
Rescale an image so that minimum side is equal to max_size, keeping the aspect ratio of the initial image.
Parameters:
Name | Type | Description |
---|---|---|
max_size | int, list of int | maximum size of smallest side of the image after the transformation. When using a list, max size will be randomly selected from the values in the list. |
interpolation | OpenCV flag | interpolation method. Default: cv2.INTER_LINEAR. |
p | float | probability of applying the transform. Default: 1. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/resize.py
class SmallestMaxSize(DualTransform):
"""Rescale an image so that minimum side is equal to max_size, keeping the aspect ratio of the initial image.
Args:
max_size (int, list of int): maximum size of smallest side of the image after the transformation. When using a
list, max size will be randomly selected from the values in the list.
interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)
def __init__(
self,
max_size: Union[int, Sequence[int]] = 1024,
interpolation: int = cv2.INTER_LINEAR,
always_apply: bool = False,
p: float = 1,
):
super().__init__(always_apply, p)
self.interpolation = interpolation
self.max_size = max_size
def apply(
self, img: np.ndarray, max_size: int = 1024, interpolation: int = cv2.INTER_LINEAR, **params: Any
) -> np.ndarray:
return F.smallest_max_size(img, max_size=max_size, interpolation=interpolation)
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
return bbox
def apply_to_keypoint(
self, keypoint: KeypointInternalType, max_size: int = 1024, **params: Any
) -> KeypointInternalType:
height = params["rows"]
width = params["cols"]
scale = max_size / min([height, width])
return F.keypoint_scale(keypoint, scale, scale)
def get_params(self) -> Dict[str, int]:
return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return ("max_size", "interpolation")
rotate
¶
class RandomRotate90
[view source on GitHub] ¶
Randomly rotate the input by 90 degrees zero or more times.
Parameters:
Name | Type | Description |
---|---|---|
p | float | probability of applying the transform. Default: 0.5. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/rotate.py
class RandomRotate90(DualTransform):
"""Randomly rotate the input by 90 degrees zero or more times.
Args:
p: probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def apply(self, img: np.ndarray, factor: float = 0, **params: Any) -> np.ndarray:
"""Args:
factor (int): number of times the input will be rotated by 90 degrees.
"""
return np.ascontiguousarray(np.rot90(img, factor))
def get_params(self) -> Dict[str, int]:
# Random int in the range [0, 3]
return {"factor": random.randint(0, 3)}
def apply_to_bbox(self, bbox: BoxInternalType, factor: int = 0, **params: Any) -> BoxInternalType:
return F.bbox_rot90(bbox, factor, **params)
    def apply_to_keypoint(self, keypoint: KeypointInternalType, factor: int = 0, **params: Any) -> KeypointInternalType:
return F.keypoint_rot90(keypoint, factor, **params)
def get_transform_init_args_names(self) -> Tuple[()]:
return ()
apply (self, img, factor=0, **params)
¶
factor (int): number of times the input will be rotated by 90 degrees.
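Since this is a dual transform, masks, bounding boxes and keypoints are rotated consistently with the image; a minimal sketch:
import albumentations as A
import numpy as np
transform = A.Compose([A.RandomRotate90(p=1)])
image = np.zeros((100, 200, 3), dtype=np.uint8)
mask = np.zeros((100, 200), dtype=np.uint8)
out = transform(image=image, mask=mask)
# out["image"] and out["mask"] are rotated by the same random multiple of 90 degrees.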
class Rotate
(limit=90, interpolation=1, border_mode=4, value=None, mask_value=None, rotate_method='largest_box', crop_border=False, always_apply=False, p=0.5)
[view source on GitHub] ¶
Rotate the input by an angle selected randomly from the uniform distribution.
Parameters:
Name | Type | Description |
---|---|---|
limit | Union[int, Tuple[int, int]] | range from which a random angle is picked. If limit is a single int an angle is picked from (-limit, limit). Default: (-90, 90) |
interpolation | OpenCV flag | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
border_mode | OpenCV flag | flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101 |
value | int, float, list of ints, list of float | padding value if border_mode is cv2.BORDER_CONSTANT. |
mask_value | int, float, list of ints, list of float | padding value if border_mode is cv2.BORDER_CONSTANT applied for masks. |
rotate_method | str | rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse". Default: "largest_box" |
crop_border | bool | If True, makes the largest possible crop within the rotated image. |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/rotate.py
class Rotate(DualTransform):
"""Rotate the input by an angle selected randomly from the uniform distribution.
Args:
limit: range from which a random angle is picked. If limit is a single int
an angle is picked from (-limit, limit). Default: (-90, 90)
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
Default: cv2.BORDER_REFLECT_101
value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
list of ints,
list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
rotate_method (str): rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse".
Default: "largest_box"
crop_border (bool): If True would make a largest possible crop within rotated image
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
limit: ScaleIntType = 90,
interpolation: int = cv2.INTER_LINEAR,
border_mode: int = cv2.BORDER_REFLECT_101,
value: Optional[Union[int, float, Tuple[int, int], Tuple[float, float]]] = None,
mask_value: Optional[Union[int, float, Tuple[int, int], Tuple[float, float]]] = None,
rotate_method: str = "largest_box",
crop_border: bool = False,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.limit = to_tuple(limit)
self.interpolation = interpolation
self.border_mode = border_mode
self.value = value
self.mask_value = mask_value
self.rotate_method = rotate_method
self.crop_border = crop_border
if rotate_method not in ["largest_box", "ellipse"]:
raise ValueError(f"Rotation method {self.rotate_method} is not valid.")
def apply(
self,
img: np.ndarray,
angle: float = 0,
interpolation: int = cv2.INTER_LINEAR,
x_min: Optional[int] = None,
x_max: Optional[int] = None,
y_min: Optional[int] = None,
y_max: Optional[int] = None,
**params: Any,
) -> np.ndarray:
img_out = F.rotate(img, angle, interpolation, self.border_mode, self.value)
if self.crop_border and x_min is not None and x_max is not None and y_min is not None and y_max is not None:
return FCrops.crop(img_out, x_min, y_min, x_max, y_max)
return img_out
def apply_to_mask(
self,
mask: np.ndarray,
angle: float,
x_min: Optional[int] = None,
x_max: Optional[int] = None,
y_min: Optional[int] = None,
y_max: Optional[int] = None,
**params: Any,
) -> np.ndarray:
img_out = F.rotate(mask, angle, cv2.INTER_NEAREST, self.border_mode, self.mask_value)
if self.crop_border and x_min is not None and x_max is not None and y_min is not None and y_max is not None:
return FCrops.crop(img_out, x_min, y_min, x_max, y_max)
return img_out
def apply_to_bbox(
self,
bbox: BoxInternalType,
angle: float = 0,
x_min: Optional[int] = None,
x_max: Optional[int] = None,
y_min: Optional[int] = None,
y_max: Optional[int] = None,
cols: int = 0,
rows: int = 0,
**params: Any,
) -> np.ndarray:
bbox_out = F.bbox_rotate(bbox, angle, self.rotate_method, rows, cols)
if self.crop_border and x_min is not None and x_max is not None and y_min is not None and y_max is not None:
return FCrops.bbox_crop(bbox_out, x_min, y_min, x_max, y_max, rows, cols)
return bbox_out
def apply_to_keypoint(
self,
keypoint: KeypointInternalType,
angle: float = 0,
x_min: Optional[int] = None,
x_max: Optional[int] = None,
y_min: Optional[int] = None,
y_max: Optional[int] = None,
cols: int = 0,
rows: int = 0,
**params: Any,
) -> KeypointInternalType:
keypoint_out = F.keypoint_rotate(keypoint, angle, rows, cols, **params)
if self.crop_border and x_min is not None and x_max is not None and y_min is not None and y_max is not None:
return FCrops.crop_keypoint_by_coords(keypoint_out, (x_min, y_min, x_max, y_max))
return keypoint_out
@staticmethod
def _rotated_rect_with_max_area(h: int, w: int, angle: float) -> Dict[str, int]:
"""Given a rectangle of size wxh that has been rotated by 'angle' (in
degrees), computes the width and height of the largest possible
axis-aligned rectangle (maximal area) within the rotated rectangle.
Code from: https://stackoverflow.com/questions/16702966/rotate-image-and-crop-out-black-borders
"""
angle = math.radians(angle)
width_is_longer = w >= h
side_long, side_short = (w, h) if width_is_longer else (h, w)
# since the solutions for angle, -angle and 180-angle are all the same,
# it is sufficient to look at the first quadrant and the absolute values of sin,cos:
sin_a, cos_a = abs(math.sin(angle)), abs(math.cos(angle))
if side_short <= 2.0 * sin_a * cos_a * side_long or abs(sin_a - cos_a) < SMALL_NUMBER:
# half constrained case: two crop corners touch the longer side,
# the other two corners are on the mid-line parallel to the longer line
x = 0.5 * side_short
wr, hr = (x / sin_a, x / cos_a) if width_is_longer else (x / cos_a, x / sin_a)
else:
# fully constrained case: crop touches all 4 sides
cos_2a = cos_a * cos_a - sin_a * sin_a
wr, hr = (w * cos_a - h * sin_a) / cos_2a, (h * cos_a - w * sin_a) / cos_2a
return {
"x_min": max(0, int(w / 2 - wr / 2)),
"x_max": min(w, int(w / 2 + wr / 2)),
"y_min": max(0, int(h / 2 - hr / 2)),
"y_max": min(h, int(h / 2 + hr / 2)),
}
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
out_params = {"angle": random.uniform(self.limit[0], self.limit[1])}
if self.crop_border:
h, w = params["image"].shape[:2]
out_params.update(self._rotated_rect_with_max_area(h, w, out_params["angle"]))
return out_params
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return ("limit", "interpolation", "border_mode", "value", "mask_value", "rotate_method", "crop_border")
class SafeRotate
(limit=90, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, p=0.5)
[view source on GitHub] ¶
Rotate the input inside the input's frame by an angle selected randomly from the uniform distribution.
The resulting image may contain artifacts. After rotation, the image may have a different aspect ratio; after resizing, it returns to its original shape with the original aspect ratio. For this reason, some artifacts may appear.
Parameters:
Name | Type | Description |
---|---|---|
limit | (int, int) or int | range from which a random angle is picked. If limit is a single int an angle is picked from (-limit, limit). Default: (-90, 90) |
interpolation | OpenCV flag | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
border_mode | OpenCV flag | flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101 |
value | int, float, list of ints, list of float | padding value if border_mode is cv2.BORDER_CONSTANT. |
mask_value | int, float, list of ints, list of float | padding value if border_mode is cv2.BORDER_CONSTANT applied for masks. |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/rotate.py
class SafeRotate(DualTransform):
"""Rotate the input inside the input's frame by an angle selected randomly from the uniform distribution.
The resulting image may have artifacts in it. After rotation, the image may have a different aspect ratio, and
    after resizing, it returns to its original shape with the original aspect ratio of the image. For this reason we
    may see some artifacts.
Args:
limit ((int, int) or int): range from which a random angle is picked. If limit is a single int
an angle is picked from (-limit, limit). Default: (-90, 90)
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
Default: cv2.BORDER_REFLECT_101
value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
list of ints,
list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
limit: Union[float, Tuple[float, float]] = 90,
interpolation: int = cv2.INTER_LINEAR,
border_mode: int = cv2.BORDER_REFLECT_101,
value: Optional[ColorType] = None,
mask_value: Optional[ColorType] = None,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.limit = to_tuple(limit)
self.interpolation = interpolation
self.border_mode = border_mode
self.value = value
self.mask_value = mask_value
def apply(self, img: np.ndarray, matrix: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
return F.safe_rotate(img, matrix, cast(int, self.interpolation), self.value, self.border_mode)
def apply_to_mask(self, mask: np.ndarray, matrix: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
return F.safe_rotate(mask, matrix, cv2.INTER_NEAREST, self.mask_value, self.border_mode)
def apply_to_bbox(self, bbox: BoxInternalType, cols: int = 0, rows: int = 0, **params: Any) -> BoxInternalType:
return F.bbox_safe_rotate(bbox, params["matrix"], cols, rows)
def apply_to_keypoint(
self,
keypoint: KeypointInternalType,
angle: float = 0,
scale_x: float = 0,
scale_y: float = 0,
cols: int = 0,
rows: int = 0,
**params: Any,
) -> KeypointInternalType:
return F.keypoint_safe_rotate(keypoint, params["matrix"], angle, scale_x, scale_y, cols, rows)
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
angle = random.uniform(self.limit[0], self.limit[1])
image = params["image"]
height, width = image.shape[:2]
# https://stackoverflow.com/questions/43892506/opencv-python-rotate-image-without-cropping-sides
image_center = (width / 2, height / 2)
# Rotation Matrix
rotation_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)
# rotation calculates the cos and sin, taking absolutes of those.
abs_cos = abs(rotation_mat[0, 0])
abs_sin = abs(rotation_mat[0, 1])
# find the new width and height bounds
new_w = math.ceil(height * abs_sin + width * abs_cos)
new_h = math.ceil(height * abs_cos + width * abs_sin)
scale_x = width / new_w
scale_y = height / new_h
# Shift the image to create padding
rotation_mat[0, 2] += new_w / 2 - image_center[0]
rotation_mat[1, 2] += new_h / 2 - image_center[1]
# Rescale to original size
scale_mat = np.diag(np.ones(3))
scale_mat[0, 0] *= scale_x
scale_mat[1, 1] *= scale_y
_tmp = np.diag(np.ones(3))
_tmp[:2] = rotation_mat
_tmp = scale_mat @ _tmp
rotation_mat = _tmp[:2]
return {"matrix": rotation_mat, "angle": angle, "scale_x": scale_x, "scale_y": scale_y}
def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str]:
return ("limit", "interpolation", "border_mode", "value", "mask_value")
transforms
¶
class Affine
(scale=None, translate_percent=None, translate_px=None, rotate=None, shear=None, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode=0, fit_output=False, keep_ratio=False, rotate_method='largest_box', always_apply=False, p=0.5)
[view source on GitHub] ¶
Augmentation to apply affine transformations to images. This is mostly a wrapper around the corresponding classes and functions in OpenCV.
Affine transformations involve:
- Translation ("move" image on the x-/y-axis)
- Rotation
- Scaling ("zoom" in/out)
- Shear (move one side of the image, turning a square into a trapezoid)
All such transformations can create "new" pixels in the image without a defined content, e.g. if the image is translated to the left, pixels are created on the right. A method has to be defined to deal with these pixel values; the parameters cval and mode of this class deal with this.
Some transformations involve interpolations between several pixels of the input image to generate output pixel values. The parameters interpolation and mask_interpolation deal with the method of interpolation used for this.
Parameters:
Name | Type | Description |
---|---|---|
scale | number, tuple of number or dict | Scaling factor to use, where 1.0 denotes "no change" and 0.5 is zoomed out to 50 percent of the original size. A tuple (a, b) samples a value uniformly from [a, b]; a dict with keys "x" and/or "y" samples independently per axis. |
translate_percent | None, number, tuple of number or dict | Translation as a fraction of the image height/width (x-translation, y-translation), where 0 denotes "no change" and 0.5 denotes "half of the axis size". If None, it is equivalent to 0.0 unless translate_px is set. |
translate_px | None, int, tuple of int or dict | Translation in pixels. If None, it is equivalent to 0 unless translate_percent is set. A tuple (a, b) samples from the discrete interval [a..b]; a dict with keys "x" and/or "y" samples independently per axis. |
rotate | number or tuple of number | Rotation in degrees (NOT radians), i.e. expected value range is around [-360, 360]. Rotation happens around the center of the image, not the top left corner. |
shear | number, tuple of number or dict | Shear in degrees (NOT radians), i.e. expected value range is around [-360, 360], with reasonable values being in the range of [-45, 45]. |
interpolation | int | OpenCV interpolation flag. |
mask_interpolation | int | OpenCV interpolation flag. |
cval | number or sequence of number | The constant value to use when filling in newly created pixels. (E.g. translating by 1px to the right will create a new 1px-wide column of pixels on the left of the image). The value is only used when mode=constant. The expected value range is [0, 255] for uint8 images. |
cval_mask | number or tuple of number | Same as cval but only for masks. |
mode | int | OpenCV border flag. |
fit_output | bool | If True, the image plane size and position will be adjusted to tightly capture the whole image after affine transformation (translate_percent and translate_px are ignored). Otherwise, parts of the transformed image may end up outside the image plane. Default: False. |
keep_ratio | bool | When True, the original aspect ratio will be kept when the random scale is applied. Default: False. |
rotate_method | str | rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse"[1]. Default: "largest_box" |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image, mask, keypoints, bboxes
Image types: uint8, float32
Reference
[1] https://arxiv.org/abs/2109.13488
Source code in albumentations/augmentations/geometric/transforms.py
class Affine(DualTransform):
"""Augmentation to apply affine transformations to images.
This is mostly a wrapper around the corresponding classes and functions in OpenCV.
Affine transformations involve:
- Translation ("move" image on the x-/y-axis)
- Rotation
- Scaling ("zoom" in/out)
- Shear (move one side of the image, turning a square into a trapezoid)
All such transformations can create "new" pixels in the image without a defined content, e.g.
if the image is translated to the left, pixels are created on the right.
A method has to be defined to deal with these pixel values.
The parameters `cval` and `mode` of this class deal with this.
Some transformations involve interpolations between several pixels
of the input image to generate output pixel values. The parameters `interpolation` and
`mask_interpolation` deals with the method of interpolation used for this.
Args:
scale (number, tuple of number or dict): Scaling factor to use, where ``1.0`` denotes "no change" and
``0.5`` is zoomed out to ``50`` percent of the original size.
* If a single number, then that value will be used for all images.
* If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``.
That the same range will be used for both x- and y-axis. To keep the aspect ratio, set
``keep_ratio=True``, then the same value will be used for both x- and y-axis.
* If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
Each of these keys can have the same values as described above.
Using a dictionary allows to set different values for the two axis and sampling will then happen
*independently* per axis, resulting in samples that differ between the axes. Note that when
the ``keep_ratio=True``, the x- and y-axis ranges should be the same.
translate_percent (None, number, tuple of number or dict): Translation as a fraction of the image height/width
(x-translation, y-translation), where ``0`` denotes "no change"
and ``0.5`` denotes "half of the axis size".
* If ``None`` then equivalent to ``0.0`` unless `translate_px` has a value other than ``None``.
* If a single number, then that value will be used for all images.
* If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``.
That sampled fraction value will be used identically for both x- and y-axis.
* If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
Each of these keys can have the same values as described above.
Using a dictionary allows to set different values for the two axis and sampling will then happen
*independently* per axis, resulting in samples that differ between the axes.
translate_px (None, int, tuple of int or dict): Translation in pixels.
* If ``None`` then equivalent to ``0`` unless `translate_percent` has a value other than ``None``.
* If a single int, then that value will be used for all images.
* If a tuple ``(a, b)``, then a value will be uniformly sampled per image from
the discrete interval ``[a..b]``. That number will be used identically for both x- and y-axis.
* If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
Each of these keys can have the same values as described above.
Using a dictionary allows to set different values for the two axis and sampling will then happen
*independently* per axis, resulting in samples that differ between the axes.
rotate (number or tuple of number): Rotation in degrees (**NOT** radians), i.e. expected value range is
around ``[-360, 360]``. Rotation happens around the *center* of the image,
not the top left corner as in some other frameworks.
* If a number, then that value will be used for all images.
* If a tuple ``(a, b)``, then a value will be uniformly sampled per image from the interval ``[a, b]``
and used as the rotation value.
shear (number, tuple of number or dict): Shear in degrees (**NOT** radians), i.e. expected value range is
around ``[-360, 360]``, with reasonable values being in the range of ``[-45, 45]``.
* If a number, then that value will be used for all images as
the shear on the x-axis (no shear on the y-axis will be done).
* If a tuple ``(a, b)``, then two value will be uniformly sampled per image
from the interval ``[a, b]`` and be used as the x- and y-shear value.
* If a dictionary, then it is expected to have the keys ``x`` and/or ``y``.
Each of these keys can have the same values as described above.
Using a dictionary allows to set different values for the two axis and sampling will then happen
*independently* per axis, resulting in samples that differ between the axes.
interpolation (int): OpenCV interpolation flag.
mask_interpolation (int): OpenCV interpolation flag.
cval (number or sequence of number): The constant value to use when filling in newly created pixels.
(E.g. translating by 1px to the right will create a new 1px-wide column of pixels
on the left of the image).
The value is only used when `mode=constant`. The expected value range is ``[0, 255]`` for ``uint8`` images.
cval_mask (number or tuple of number): Same as cval but only for masks.
mode (int): OpenCV border flag.
fit_output (bool): If True, the image plane size and position will be adjusted to tightly capture
the whole image after affine transformation (`translate_percent` and `translate_px` are ignored).
Otherwise (``False``), parts of the transformed image may end up outside the image plane.
Fitting the output shape can be useful to avoid corners of the image being outside the image plane
after applying rotations. Default: False
keep_ratio (bool): When True, the original aspect ratio will be kept when the random scale is applied.
Default: False.
rotate_method (str): rotation method used for the bounding boxes. Should be one of "largest_box" or
"ellipse"[1].
Default: "largest_box"
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask, keypoints, bboxes
Image types:
uint8, float32
Reference:
[1] https://arxiv.org/abs/2109.13488
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
scale: Optional[Union[ScaleFloatType, Dict[str, Any]]] = None,
translate_percent: Optional[Union[float, Tuple[float, float], Dict[str, Any]]] = None,
translate_px: Optional[Union[int, Tuple[int, int], Dict[str, Any]]] = None,
rotate: Optional[ScaleFloatType] = None,
shear: Optional[Union[ScaleFloatType, Dict[str, Any]]] = None,
interpolation: int = cv2.INTER_LINEAR,
mask_interpolation: int = cv2.INTER_NEAREST,
cval: Union[float, Tuple[float, float]] = 0,
cval_mask: Union[float, Tuple[float, float]] = 0,
mode: int = cv2.BORDER_CONSTANT,
fit_output: bool = False,
keep_ratio: bool = False,
rotate_method: str = "largest_box",
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply=always_apply, p=p)
params = [scale, translate_percent, translate_px, rotate, shear]
if all(p is None for p in params):
scale = {"x": (0.9, 1.1), "y": (0.9, 1.1)}
translate_percent = {"x": (-0.1, 0.1), "y": (-0.1, 0.1)}
rotate = (-15, 15)
shear = {"x": (-10, 10), "y": (-10, 10)}
else:
scale = scale if scale is not None else 1.0
rotate = rotate if rotate is not None else 0.0
shear = shear if shear is not None else 0.0
self.interpolation = interpolation
self.mask_interpolation = mask_interpolation
self.cval = cval
self.cval_mask = cval_mask
self.mode = mode
self.scale = self._handle_dict_arg(scale, "scale")
self.translate_percent, self.translate_px = self._handle_translate_arg(translate_px, translate_percent)
self.rotate = to_tuple(rotate, rotate)
self.fit_output = fit_output
self.shear = self._handle_dict_arg(shear, "shear")
self.keep_ratio = keep_ratio
self.rotate_method = rotate_method
if self.keep_ratio and self.scale["x"] != self.scale["y"]:
raise ValueError(f"When keep_ratio is True, the x and y scale range should be identical. got {self.scale}")
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return (
"interpolation",
"mask_interpolation",
"cval",
"mode",
"scale",
"translate_percent",
"translate_px",
"rotate",
"fit_output",
"shear",
"cval_mask",
"keep_ratio",
"rotate_method",
)
@staticmethod
def _handle_dict_arg(
val: Union[float, Tuple[float, float], Dict[str, Any]], name: str, default: float = 1.0
) -> Dict[str, Any]:
if isinstance(val, dict):
if "x" not in val and "y" not in val:
raise ValueError(
f'Expected {name} dictionary to contain at least key "x" or ' 'key "y". Found neither of them.'
)
x = val.get("x", default)
y = val.get("y", default)
return {"x": to_tuple(x, x), "y": to_tuple(y, y)}
return {"x": to_tuple(val, val), "y": to_tuple(val, val)}
@classmethod
def _handle_translate_arg(
cls,
translate_px: Optional[Union[float, Tuple[float, float], Dict[str, Any]]],
translate_percent: Optional[Union[float, Tuple[float, float], Dict[str, Any]]],
) -> Any:
if translate_percent is None and translate_px is None:
translate_px = 0
if translate_percent is not None and translate_px is not None:
msg = "Expected either translate_percent or translate_px to be " "provided, " "but neither of them was."
raise ValueError(msg)
if translate_percent is not None:
# translate by percent
return cls._handle_dict_arg(translate_percent, "translate_percent", default=0.0), translate_px
if translate_px is None:
msg = "translate_px is None."
raise ValueError(msg)
# translate by pixels
return translate_percent, cls._handle_dict_arg(translate_px, "translate_px")
def apply(
self,
img: np.ndarray,
matrix: skimage.transform.ProjectiveTransform = None,
output_shape: Sequence[int] = (),
**params: Any,
) -> np.ndarray:
return F.warp_affine(
img,
matrix,
interpolation=cast(int, self.interpolation),
cval=self.cval,
mode=self.mode,
output_shape=output_shape,
)
def apply_to_mask(
self,
mask: np.ndarray,
matrix: skimage.transform.ProjectiveTransform = None,
output_shape: Sequence[int] = (),
**params: Any,
) -> np.ndarray:
return F.warp_affine(
mask,
matrix,
interpolation=self.mask_interpolation,
cval=self.cval_mask,
mode=self.mode,
output_shape=output_shape,
)
def apply_to_bbox(
self,
bbox: BoxInternalType,
matrix: skimage.transform.ProjectiveTransform = None,
rows: int = 0,
cols: int = 0,
output_shape: Sequence[int] = (),
**params: Any,
) -> BoxInternalType:
return F.bbox_affine(bbox, matrix, self.rotate_method, rows, cols, output_shape)
def apply_to_keypoint(
self,
keypoint: KeypointInternalType,
matrix: Optional[skimage.transform.ProjectiveTransform] = None,
scale: Optional[Dict[str, Any]] = None,
**params: Any,
) -> KeypointInternalType:
if scale is None:
msg = "Expected scale to be provided, but got None."
raise ValueError(msg)
if matrix is None:
msg = "Expected matrix to be provided, but got None."
raise ValueError(msg)
return F.keypoint_affine(keypoint, matrix=matrix, scale=scale)
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
height, width = params["image"].shape[:2]
translate: Dict[str, Union[int, float]]
if self.translate_px is not None:
translate = {key: random.randint(*value) for key, value in self.translate_px.items()}
elif self.translate_percent is not None:
translate = {key: random.uniform(*value) for key, value in self.translate_percent.items()}
translate["x"] = translate["x"] * width
translate["y"] = translate["y"] * height
else:
translate = {"x": 0, "y": 0}
# Look to issue https://github.com/albumentations-team/albumentations/issues/1079
shear = {key: -random.uniform(*value) for key, value in self.shear.items()}
scale = {key: random.uniform(*value) for key, value in self.scale.items()}
if self.keep_ratio:
scale["y"] = scale["x"]
# Look to issue https://github.com/albumentations-team/albumentations/issues/1079
rotate = -random.uniform(*self.rotate)
# for images we use additional shifts of (0.5, 0.5) as otherwise
# we get an ugly black border for 90deg rotations
shift_x = width / 2 - 0.5
shift_y = height / 2 - 0.5
matrix_to_topleft = skimage.transform.SimilarityTransform(translation=[-shift_x, -shift_y])
matrix_shear_y_rot = skimage.transform.AffineTransform(rotation=-np.pi / 2)
matrix_shear_y = skimage.transform.AffineTransform(shear=np.deg2rad(shear["y"]))
matrix_shear_y_rot_inv = skimage.transform.AffineTransform(rotation=np.pi / 2)
matrix_transforms = skimage.transform.AffineTransform(
scale=(scale["x"], scale["y"]),
translation=(translate["x"], translate["y"]),
rotation=np.deg2rad(rotate),
shear=np.deg2rad(shear["x"]),
)
matrix_to_center = skimage.transform.SimilarityTransform(translation=[shift_x, shift_y])
matrix = (
matrix_to_topleft
+ matrix_shear_y_rot
+ matrix_shear_y
+ matrix_shear_y_rot_inv
+ matrix_transforms
+ matrix_to_center
)
if self.fit_output:
matrix, output_shape = self._compute_affine_warp_output_shape(matrix, params["image"].shape)
else:
output_shape = params["image"].shape
return {
"rotate": rotate,
"scale": scale,
"matrix": matrix,
"output_shape": output_shape,
}
@staticmethod
def _compute_affine_warp_output_shape(
matrix: skimage.transform.ProjectiveTransform, input_shape: Sequence[int]
) -> Tuple[skimage.transform.ProjectiveTransform, Sequence[int]]:
height, width = input_shape[:2]
if height == 0 or width == 0:
return matrix, input_shape
# determine shape of output image
corners = np.array([[0, 0], [0, height - 1], [width - 1, height - 1], [width - 1, 0]])
corners = matrix(corners)
minc = corners[:, 0].min()
minr = corners[:, 1].min()
maxc = corners[:, 0].max()
maxr = corners[:, 1].max()
out_height = maxr - minr + 1
out_width = maxc - minc + 1
if len(input_shape) == THREE:
output_shape = np.ceil((out_height, out_width, input_shape[2]))
else:
output_shape = np.ceil((out_height, out_width))
output_shape_tuple = tuple([int(v) for v in output_shape.tolist()])
# fit output image in new shape
translation = (-minc, -minr)
matrix_to_fit = skimage.transform.SimilarityTransform(translation=translation)
matrix = matrix + matrix_to_fit
return matrix, output_shape_tuple
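When all five geometry arguments are left as None, the class falls back to the defaults shown in __init__ above; a sketch that spells those defaults out explicitly:
import albumentations as A
transform = A.Compose([
    A.Affine(
        scale=(0.9, 1.1),
        translate_percent={"x": (-0.1, 0.1), "y": (-0.1, 0.1)},
        rotate=(-15, 15),
        shear={"x": (-10, 10), "y": (-10, 10)},
        p=1,
    ),
])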
class ElasticTransform
(alpha=1, sigma=50, alpha_affine=50, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, approximate=False, same_dxdy=False, p=0.5)
[view source on GitHub] ¶
Elastic deformation of images as described in [Simard2003]_ (with modifications). Based on https://gist.github.com/ernestum/601cdf56d2b424757de5
.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for Convolutional Neural Networks applied to Visual Document Analysis", in Proc. of the International Conference on Document Analysis and Recognition, 2003.
Parameters:
Name | Type | Description |
---|---|---|
alpha | float | |
sigma | float | Gaussian filter parameter. |
alpha_affine | float | The range will be (-alpha_affine, alpha_affine) |
interpolation | OpenCV flag | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
border_mode | OpenCV flag | flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101 |
value | int, float, list of ints, list of float | padding value if border_mode is cv2.BORDER_CONSTANT. |
mask_value | int, float, list of ints, list of float | padding value if border_mode is cv2.BORDER_CONSTANT applied for masks. |
approximate | boolean | Whether to smooth the displacement map with a fixed kernel size. Enabling this option gives ~2X speedup on large images. |
same_dxdy | boolean | Whether to use the same randomly generated shift for x and y. Enabling this option gives ~2X speedup. |
Targets
image, mask, bboxes
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/transforms.py
class ElasticTransform(DualTransform):
"""Elastic deformation of images as described in [Simard2003]_ (with modifications).
Based on https://gist.github.com/ernestum/601cdf56d2b424757de5
.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for
Convolutional Neural Networks applied to Visual Document Analysis", in
Proc. of the International Conference on Document Analysis and
Recognition, 2003.
Args:
alpha (float):
sigma (float): Gaussian filter parameter.
alpha_affine (float): The range will be (-alpha_affine, alpha_affine)
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
Default: cv2.BORDER_REFLECT_101
value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
list of ints,
list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
approximate (boolean): Whether to smooth displacement map with fixed kernel size.
Enabling this option gives ~2X speedup on large images.
same_dxdy (boolean): Whether to use same random generated shift for x and y.
Enabling this option gives ~2X speedup.
Targets:
image, mask, bboxes
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)
def __init__(
self,
alpha: float = 1,
sigma: float = 50,
alpha_affine: float = 50,
interpolation: int = cv2.INTER_LINEAR,
border_mode: int = cv2.BORDER_REFLECT_101,
value: Optional[Union[int, float, List[int], List[float]]] = None,
mask_value: Optional[Union[int, float, List[int], List[float]]] = None,
always_apply: bool = False,
approximate: bool = False,
same_dxdy: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.alpha = alpha
self.alpha_affine = alpha_affine
self.sigma = sigma
self.interpolation = interpolation
self.border_mode = border_mode
self.value = value
self.mask_value = mask_value
self.approximate = approximate
self.same_dxdy = same_dxdy
def apply(
self, img: np.ndarray, random_state: Optional[int] = None, interpolation: int = cv2.INTER_LINEAR, **params: Any
) -> np.ndarray:
return F.elastic_transform(
img,
self.alpha,
self.sigma,
self.alpha_affine,
interpolation,
self.border_mode,
self.value,
np.random.RandomState(random_state),
self.approximate,
self.same_dxdy,
)
def apply_to_mask(self, mask: np.ndarray, random_state: Optional[int] = None, **params: Any) -> np.ndarray:
return F.elastic_transform(
mask,
self.alpha,
self.sigma,
self.alpha_affine,
cv2.INTER_NEAREST,
self.border_mode,
self.mask_value,
np.random.RandomState(random_state),
self.approximate,
self.same_dxdy,
)
def apply_to_bbox(
self, bbox: BoxInternalType, random_state: Optional[int] = None, **params: Any
) -> BoxInternalType:
rows, cols = params["rows"], params["cols"]
mask = np.zeros((rows, cols), dtype=np.uint8)
bbox_denorm = F.denormalize_bbox(bbox, rows, cols)
x_min, y_min, x_max, y_max = bbox_denorm[:4]
x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
mask[y_min:y_max, x_min:x_max] = 1
mask = F.elastic_transform(
mask,
self.alpha,
self.sigma,
self.alpha_affine,
cv2.INTER_NEAREST,
self.border_mode,
self.mask_value,
np.random.RandomState(random_state),
self.approximate,
)
bbox_returned = bbox_from_mask(mask)
return cast(BoxInternalType, F.normalize_bbox(bbox_returned, rows, cols))
def get_params(self) -> Dict[str, int]:
return {"random_state": random.randint(0, 10000)}
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return (
"alpha",
"sigma",
"alpha_affine",
"interpolation",
"border_mode",
"value",
"mask_value",
"approximate",
"same_dxdy",
)
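A minimal sketch; per the parameter docs above, approximate=True trades a little fidelity for roughly 2x speed on large images:
import albumentations as A
transform = A.Compose([
    A.ElasticTransform(alpha=1, sigma=50, alpha_affine=50, approximate=True, p=1),
])
# Applies to image, mask and bboxes targets; keypoints are not supported.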
class Flip
[view source on GitHub] ¶
Flip the input either horizontally, vertically or both horizontally and vertically.
Parameters:
Name | Type | Description |
---|---|---|
p | float | probability of applying the transform. Default: 0.5. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/transforms.py
class Flip(DualTransform):
"""Flip the input either horizontally, vertically or both horizontally and vertically.
Args:
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def apply(self, img: np.ndarray, d: int = 0, **params: Any) -> np.ndarray:
"""Args:
d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping,
-1 for both vertical and horizontal flipping (which can also be seen as rotating the input by
180 degrees).
"""
return F.random_flip(img, d)
def get_params(self) -> Dict[str, int]:
# Random int in the range [-1, 1]
return {"d": random.randint(-1, 1)}
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
return F.bbox_flip(bbox, **params)
def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
return F.keypoint_flip(keypoint, **params)
def get_transform_init_args_names(self) -> Tuple[()]:
return ()
apply (self, img, d=0, **params)
¶
d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping, -1 for both vertical and horizontal flipping (which can also be seen as rotating the input by 180 degrees).
Source code in albumentations/augmentations/geometric/transforms.py
def apply(self, img: np.ndarray, d: int = 0, **params: Any) -> np.ndarray:
"""Args:
d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping,
-1 for both vertical and horizontal flipping (which can also be seen as rotating the input by
180 degrees).
"""
return F.random_flip(img, d)
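A short sketch of how the d code behaves. Calling apply() directly is shown only to illustrate the flip codes; in normal use the pipeline samples d internally:
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

# Normal use: d is sampled from {-1, 0, 1} on each call.
flipped = A.Flip(p=1.0)(image=image)["image"]

# For illustration only: force a specific flip code via apply().
both_axes = A.Flip(p=1.0).apply(image, d=-1)  # equivalent to a 180-degree rotation
assert np.array_equal(both_axes, image[::-1, ::-1])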
class GridDistortion
(num_steps=5, distort_limit=0.3, interpolation=1, border_mode=4, value=None, mask_value=None, normalized=False, always_apply=False, p=0.5)
[view source on GitHub] ¶
Apply grid distortion to the input: a regular grid is placed on the image and its cells are randomly stretched or compressed, producing local geometric distortions.
Parameters:
Name | Type | Description |
---|---|---|
num_steps | int | count of grid cells on each side. |
distort_limit | float, (float, float) | If distort_limit is a single float, the range will be (-distort_limit, distort_limit). Default: (-0.3, 0.3). |
interpolation | OpenCV flag | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
border_mode | OpenCV flag | flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101 |
value | int, float, list of ints, list of float | padding value if border_mode is cv2.BORDER_CONSTANT. |
mask_value | int, float, list of ints, list of float | padding value if border_mode is cv2.BORDER_CONSTANT applied for masks. |
normalized | bool | if true, distortion will be normalized so that it does not go outside the image. Default: False. See for more information: https://github.com/albumentations-team/albumentations/pull/722 |
Targets
image, mask, bboxes
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/transforms.py
class GridDistortion(DualTransform):
"""Args:
num_steps (int): count of grid cells on each side.
distort_limit (float, (float, float)): If distort_limit is a single float, the range
will be (-distort_limit, distort_limit). Default: (-0.3, 0.3).
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
Default: cv2.BORDER_REFLECT_101
value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
list of ints,
list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
normalized (bool): if true, distortion will be normalized so that it does not go outside the image. Default: False
See for more information: https://github.com/albumentations-team/albumentations/pull/722
Targets:
image, mask, bboxes
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)
def __init__(
self,
num_steps: int = 5,
distort_limit: ScaleFloatType = 0.3,
interpolation: int = cv2.INTER_LINEAR,
border_mode: int = cv2.BORDER_REFLECT_101,
value: Optional[ImageColorType] = None,
mask_value: Optional[ImageColorType] = None,
normalized: bool = False,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.num_steps = num_steps
self.distort_limit = to_tuple(distort_limit)
self.interpolation = interpolation
self.border_mode = border_mode
self.value = value
self.mask_value = mask_value
self.normalized = normalized
def apply(
self,
img: np.ndarray,
stepsx: Tuple[()] = (),
stepsy: Tuple[()] = (),
interpolation: int = cv2.INTER_LINEAR,
**params: Any,
) -> np.ndarray:
return F.grid_distortion(img, self.num_steps, stepsx, stepsy, interpolation, self.border_mode, self.value)
def apply_to_mask(
self, mask: np.ndarray, stepsx: Tuple[()] = (), stepsy: Tuple[()] = (), **params: Any
) -> np.ndarray:
return F.grid_distortion(
mask, self.num_steps, stepsx, stepsy, cv2.INTER_NEAREST, self.border_mode, self.mask_value
)
def apply_to_bbox(
self, bbox: BoxInternalType, stepsx: Tuple[()] = (), stepsy: Tuple[()] = (), **params: Any
) -> BoxInternalType:
rows, cols = params["rows"], params["cols"]
mask = np.zeros((rows, cols), dtype=np.uint8)
bbox_denorm = F.denormalize_bbox(bbox, rows, cols)
x_min, y_min, x_max, y_max = bbox_denorm[:4]
x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
mask[y_min:y_max, x_min:x_max] = 1
mask = F.grid_distortion(
mask, self.num_steps, stepsx, stepsy, cv2.INTER_NEAREST, self.border_mode, self.mask_value
)
bbox_returned = bbox_from_mask(mask)
return cast(BoxInternalType, F.normalize_bbox(bbox_returned, rows, cols))
def _normalize(self, h: int, w: int, xsteps: List[float], ysteps: List[float]) -> Dict[str, Any]:
# compensate for smaller last steps in source image.
x_step = w // self.num_steps
last_x_step = min(w, ((self.num_steps + 1) * x_step)) - (self.num_steps * x_step)
xsteps[-1] *= last_x_step / x_step
y_step = h // self.num_steps
last_y_step = min(h, ((self.num_steps + 1) * y_step)) - (self.num_steps * y_step)
ysteps[-1] *= last_y_step / y_step
# now normalize such that distortion never leaves image bounds.
tx = w / math.floor(w / self.num_steps)
ty = h / math.floor(h / self.num_steps)
xsteps = np.array(xsteps) * (tx / np.sum(xsteps))
ysteps = np.array(ysteps) * (ty / np.sum(ysteps))
return {"stepsx": xsteps, "stepsy": ysteps}
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
height, width = params["image"].shape[:2]
stepsx = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]
stepsy = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]
if self.normalized:
return self._normalize(height, width, stepsx, stepsy)
return {"stepsx": stepsx, "stepsy": stepsy}
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return "num_steps", "distort_limit", "interpolation", "border_mode", "value", "mask_value", "normalized"
class HorizontalFlip
[view source on GitHub] ¶
Flip the input horizontally around the y-axis.
Parameters:
Name | Type | Description |
---|---|---|
p | float | probability of applying the transform. Default: 0.5. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/transforms.py
class HorizontalFlip(DualTransform):
"""Flip the input horizontally around the y-axis.
Args:
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
if img.ndim == THREE and img.shape[2] > 1 and img.dtype == np.uint8:
# Opencv is faster than numpy only in case of
# non-gray scale 8bits images
return F.hflip_cv2(img)
return F.hflip(img)
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
return F.bbox_hflip(bbox, **params)
def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
return F.keypoint_hflip(keypoint, **params)
def get_transform_init_args_names(self) -> Tuple[()]:
return ()
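Because bboxes are a supported target, the flip can be applied to boxes together with the image. A small sketch (the pascal_voc format and the sample box are illustrative):
import albumentations as A
import numpy as np

transform = A.Compose(
    [A.HorizontalFlip(p=1.0)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)
image = np.random.randint(0, 256, (100, 200, 3), dtype=np.uint8)  # height=100, width=200
out = transform(image=image, bboxes=[(10, 20, 60, 80)], labels=[1])
# x-coordinates are mirrored around the vertical axis:
# (10, 20, 60, 80) -> (140, 20, 190, 80) for a width of 200.
print(out["bboxes"])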
class OpticalDistortion
(distort_limit=0.05, shift_limit=0.05, interpolation=1, border_mode=4, value=None, mask_value=None, always_apply=False, p=0.5)
[view source on GitHub] ¶
Apply camera-lens-style (barrel/pincushion) distortion to the input.
Parameters:
Name | Type | Description |
---|---|---|
distort_limit | float, (float, float) | If distort_limit is a single float, the range will be (-distort_limit, distort_limit). Default: (-0.05, 0.05). |
shift_limit | float, (float, float) | If shift_limit is a single float, the range will be (-shift_limit, shift_limit). Default: (-0.05, 0.05). |
interpolation | OpenCV flag | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
border_mode | OpenCV flag | flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101 |
value | int, float, list of ints, list of float | padding value if border_mode is cv2.BORDER_CONSTANT. |
mask_value | int, float, list of ints, list of float | padding value if border_mode is cv2.BORDER_CONSTANT applied for masks. |
Targets
image, mask, bboxes
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/transforms.py
class OpticalDistortion(DualTransform):
"""Args:
distort_limit (float, (float, float)): If distort_limit is a single float, the range
will be (-distort_limit, distort_limit). Default: (-0.05, 0.05).
shift_limit (float, (float, float)): If shift_limit is a single float, the range
will be (-shift_limit, shift_limit). Default: (-0.05, 0.05).
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
Default: cv2.BORDER_REFLECT_101
value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
list of ints,
list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
Targets:
image, mask, bboxes
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES)
def __init__(
self,
distort_limit: ScaleFloatType = 0.05,
shift_limit: ScaleFloatType = 0.05,
interpolation: int = cv2.INTER_LINEAR,
border_mode: int = cv2.BORDER_REFLECT_101,
value: Optional[ImageColorType] = None,
mask_value: Optional[ImageColorType] = None,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.shift_limit = to_tuple(shift_limit)
self.distort_limit = to_tuple(distort_limit)
self.interpolation = interpolation
self.border_mode = border_mode
self.value = value
self.mask_value = mask_value
def apply(
self,
img: np.ndarray,
k: int = 0,
dx: int = 0,
dy: int = 0,
interpolation: int = cv2.INTER_LINEAR,
**params: Any,
) -> np.ndarray:
return F.optical_distortion(img, k, dx, dy, interpolation, self.border_mode, self.value)
def apply_to_mask(self, mask: np.ndarray, k: int = 0, dx: int = 0, dy: int = 0, **params: Any) -> np.ndarray:
return F.optical_distortion(mask, k, dx, dy, cv2.INTER_NEAREST, self.border_mode, self.mask_value)
def apply_to_bbox(
self, bbox: BoxInternalType, k: int = 0, dx: int = 0, dy: int = 0, **params: Any
) -> BoxInternalType:
rows, cols = params["rows"], params["cols"]
mask = np.zeros((rows, cols), dtype=np.uint8)
bbox_denorm = F.denormalize_bbox(bbox, rows, cols)
x_min, y_min, x_max, y_max = bbox_denorm[:4]
x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
mask[y_min:y_max, x_min:x_max] = 1
mask = F.optical_distortion(mask, k, dx, dy, cv2.INTER_NEAREST, self.border_mode, self.mask_value)
bbox_returned = bbox_from_mask(mask)
return cast(BoxInternalType, F.normalize_bbox(bbox_returned, rows, cols))
def get_params(self) -> Dict[str, Any]:
return {
"k": random.uniform(self.distort_limit[0], self.distort_limit[1]),
"dx": round(random.uniform(self.shift_limit[0], self.shift_limit[1])),
"dy": round(random.uniform(self.shift_limit[0], self.shift_limit[1])),
}
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return (
"distort_limit",
"shift_limit",
"interpolation",
"border_mode",
"value",
"mask_value",
)
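A minimal sketch (synthetic data; parameter values are illustrative):
import albumentations as A
import numpy as np

# distort_limit controls the lens-distortion coefficient k;
# shift_limit shifts the distortion center.
distort = A.OpticalDistortion(distort_limit=0.3, shift_limit=0.05, p=1.0)
image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
out = distort(image=image)["image"]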
class PadIfNeeded
(min_height=1024, min_width=1024, pad_height_divisor=None, pad_width_divisor=None, position=<PositionType.CENTER: 'center'>, border_mode=4, value=None, mask_value=None, always_apply=False, p=1.0)
[view source on GitHub] ¶
Pad the sides of the image if its height or width is less than the desired size, or make the sides divisible by given values.
Parameters:
Name | Type | Description |
---|---|---|
min_height | int | minimal result image height. |
min_width | int | minimal result image width. |
pad_height_divisor | int | if not None, ensures image height is divisible by this value. |
pad_width_divisor | int | if not None, ensures image width is divisible by this value. |
position | Union[str, PositionType] | Position of the image. Should be one of PositionType.CENTER, PositionType.TOP_LEFT, PositionType.TOP_RIGHT, PositionType.BOTTOM_LEFT, PositionType.BOTTOM_RIGHT, or PositionType.RANDOM. Default: PositionType.CENTER. |
border_mode | OpenCV flag | OpenCV border mode. |
value | int, float, list of int, list of float | padding value if border_mode is cv2.BORDER_CONSTANT. |
mask_value | int, float, list of int, list of float | padding value for mask if border_mode is cv2.BORDER_CONSTANT. |
p | float | probability of applying the transform. Default: 1.0. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/transforms.py
class PadIfNeeded(DualTransform):
"""Pad side of the image / max if side is less than desired number.
Args:
min_height (int): minimal result image height.
min_width (int): minimal result image width.
pad_height_divisor (int): if not None, ensures image height is divisible by this value.
pad_width_divisor (int): if not None, ensures image width is divisible by this value.
position (Union[str, PositionType]): Position of the image. Should be one of PositionType.CENTER,
PositionType.TOP_LEFT, PositionType.TOP_RIGHT, PositionType.BOTTOM_LEFT, PositionType.BOTTOM_RIGHT,
or PositionType.RANDOM. Default: PositionType.CENTER.
border_mode (OpenCV flag): OpenCV border mode.
value (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
list of int,
list of float): padding value for mask if border_mode is cv2.BORDER_CONSTANT.
p (float): probability of applying the transform. Default: 1.0.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
class PositionType(Enum):
"""Enumerates the types of positions for placing an object within a container.
This Enum class is utilized to define specific anchor positions that an object can
assume relative to a container. It's particularly useful in image processing, UI layout,
and graphic design to specify the alignment and positioning of elements.
Attributes:
CENTER (str): Specifies that the object should be placed at the center.
TOP_LEFT (str): Specifies that the object should be placed at the top-left corner.
TOP_RIGHT (str): Specifies that the object should be placed at the top-right corner.
BOTTOM_LEFT (str): Specifies that the object should be placed at the bottom-left corner.
BOTTOM_RIGHT (str): Specifies that the object should be placed at the bottom-right corner.
RANDOM (str): Indicates that the object's position should be determined randomly.
"""
CENTER = "center"
TOP_LEFT = "top_left"
TOP_RIGHT = "top_right"
BOTTOM_LEFT = "bottom_left"
BOTTOM_RIGHT = "bottom_right"
RANDOM = "random"
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
min_height: Optional[int] = 1024,
min_width: Optional[int] = 1024,
pad_height_divisor: Optional[int] = None,
pad_width_divisor: Optional[int] = None,
position: Union[PositionType, str] = PositionType.CENTER,
border_mode: int = cv2.BORDER_REFLECT_101,
value: Optional[ImageColorType] = None,
mask_value: Optional[ImageColorType] = None,
always_apply: bool = False,
p: float = 1.0,
):
if (min_height is None) == (pad_height_divisor is None):
msg = "Only one of 'min_height' and 'pad_height_divisor' parameters must be set"
raise ValueError(msg)
if (min_width is None) == (pad_width_divisor is None):
msg = "Only one of 'min_width' and 'pad_width_divisor' parameters must be set"
raise ValueError(msg)
super().__init__(always_apply, p)
self.min_height = min_height
self.min_width = min_width
self.pad_width_divisor = pad_width_divisor
self.pad_height_divisor = pad_height_divisor
self.position = PadIfNeeded.PositionType(position)
self.border_mode = border_mode
self.value = value
self.mask_value = mask_value
def update_params(self, params: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
params = super().update_params(params, **kwargs)
rows = params["rows"]
cols = params["cols"]
if self.min_height is not None:
if rows < self.min_height:
h_pad_top = int((self.min_height - rows) / 2.0)
h_pad_bottom = self.min_height - rows - h_pad_top
else:
h_pad_top = 0
h_pad_bottom = 0
else:
pad_remained = rows % self.pad_height_divisor
pad_rows = self.pad_height_divisor - pad_remained if pad_remained > 0 else 0
h_pad_top = pad_rows // 2
h_pad_bottom = pad_rows - h_pad_top
if self.min_width is not None:
if cols < self.min_width:
w_pad_left = int((self.min_width - cols) / 2.0)
w_pad_right = self.min_width - cols - w_pad_left
else:
w_pad_left = 0
w_pad_right = 0
else:
pad_remainder = cols % self.pad_width_divisor
pad_cols = self.pad_width_divisor - pad_remainder if pad_remainder > 0 else 0
w_pad_left = pad_cols // 2
w_pad_right = pad_cols - w_pad_left
h_pad_top, h_pad_bottom, w_pad_left, w_pad_right = self.__update_position_params(
h_top=h_pad_top, h_bottom=h_pad_bottom, w_left=w_pad_left, w_right=w_pad_right
)
params.update(
{
"pad_top": h_pad_top,
"pad_bottom": h_pad_bottom,
"pad_left": w_pad_left,
"pad_right": w_pad_right,
}
)
return params
def apply(
self,
img: np.ndarray,
pad_top: int = 0,
pad_bottom: int = 0,
pad_left: int = 0,
pad_right: int = 0,
**params: Any,
) -> np.ndarray:
return F.pad_with_params(
img,
pad_top,
pad_bottom,
pad_left,
pad_right,
border_mode=self.border_mode,
value=self.value,
)
def apply_to_mask(
self,
mask: np.ndarray,
pad_top: int = 0,
pad_bottom: int = 0,
pad_left: int = 0,
pad_right: int = 0,
**params: Any,
) -> np.ndarray:
return F.pad_with_params(
mask,
pad_top,
pad_bottom,
pad_left,
pad_right,
border_mode=self.border_mode,
value=self.mask_value,
)
def apply_to_bbox(
self,
bbox: BoxInternalType,
pad_top: int = 0,
pad_bottom: int = 0,
pad_left: int = 0,
pad_right: int = 0,
rows: int = 0,
cols: int = 0,
**params: Any,
) -> BoxInternalType:
x_min, y_min, x_max, y_max = denormalize_bbox(bbox, rows, cols)[:4]
bbox = x_min + pad_left, y_min + pad_top, x_max + pad_left, y_max + pad_top
return cast(BoxInternalType, normalize_bbox(bbox, rows + pad_top + pad_bottom, cols + pad_left + pad_right))
def apply_to_keypoint(
self,
keypoint: KeypointInternalType,
pad_top: int = 0,
pad_bottom: int = 0,
pad_left: int = 0,
pad_right: int = 0,
**params: Any,
) -> KeypointInternalType:
x, y, angle, scale = keypoint[:4]
return x + pad_left, y + pad_top, angle, scale
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return (
"min_height",
"min_width",
"pad_height_divisor",
"pad_width_divisor",
"position",
"border_mode",
"value",
"mask_value",
)
def __update_position_params(
self, h_top: int, h_bottom: int, w_left: int, w_right: int
) -> Tuple[int, int, int, int]:
if self.position == PadIfNeeded.PositionType.TOP_LEFT:
h_bottom += h_top
w_right += w_left
h_top = 0
w_left = 0
elif self.position == PadIfNeeded.PositionType.TOP_RIGHT:
h_bottom += h_top
w_left += w_right
h_top = 0
w_right = 0
elif self.position == PadIfNeeded.PositionType.BOTTOM_LEFT:
h_top += h_bottom
w_right += w_left
h_bottom = 0
w_left = 0
elif self.position == PadIfNeeded.PositionType.BOTTOM_RIGHT:
h_top += h_bottom
w_left += w_right
h_bottom = 0
w_right = 0
elif self.position == PadIfNeeded.PositionType.RANDOM:
h_pad = h_top + h_bottom
w_pad = w_left + w_right
h_top = random.randint(0, h_pad)
h_bottom = h_pad - h_top
w_left = random.randint(0, w_pad)
w_right = w_pad - w_left
return h_top, h_bottom, w_left, w_right
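A sketch of the two mutually exclusive modes (pad to a fixed minimum size, or pad to a divisor); the shapes in the comments follow the padding arithmetic above:
import albumentations as A
import cv2
import numpy as np

image = np.random.randint(0, 256, (700, 900, 3), dtype=np.uint8)

# Mode 1: pad up to a minimum size (extra pixels split between the two sides).
pad_to_size = A.PadIfNeeded(min_height=1024, min_width=1024,
                            border_mode=cv2.BORDER_CONSTANT, value=0, p=1.0)
print(pad_to_size(image=image)["image"].shape)  # (1024, 1024, 3)

# Mode 2: pad so each side becomes divisible by a value.
# min_height/min_width must be set to None when the divisors are used.
pad_to_multiple = A.PadIfNeeded(min_height=None, min_width=None,
                                pad_height_divisor=32, pad_width_divisor=32, p=1.0)
print(pad_to_multiple(image=image)["image"].shape)  # (704, 928, 3)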
class PositionType
¶
Enumerates the types of positions for placing an object within a container.
This Enum class is utilized to define specific anchor positions that an object can assume relative to a container. It's particularly useful in image processing, UI layout, and graphic design to specify the alignment and positioning of elements.
Attributes:
Name | Type | Description |
---|---|---|
CENTER | str | Specifies that the object should be placed at the center. |
TOP_LEFT | str | Specifies that the object should be placed at the top-left corner. |
TOP_RIGHT | str | Specifies that the object should be placed at the top-right corner. |
BOTTOM_LEFT | str | Specifies that the object should be placed at the bottom-left corner. |
BOTTOM_RIGHT | str | Specifies that the object should be placed at the bottom-right corner. |
RANDOM | str | Indicates that the object's position should be determined randomly. |
Source code in albumentations/augmentations/geometric/transforms.py
class PositionType(Enum):
"""Enumerates the types of positions for placing an object within a container.
This Enum class is utilized to define specific anchor positions that an object can
assume relative to a container. It's particularly useful in image processing, UI layout,
and graphic design to specify the alignment and positioning of elements.
Attributes:
CENTER (str): Specifies that the object should be placed at the center.
TOP_LEFT (str): Specifies that the object should be placed at the top-left corner.
TOP_RIGHT (str): Specifies that the object should be placed at the top-right corner.
BOTTOM_LEFT (str): Specifies that the object should be placed at the bottom-left corner.
BOTTOM_RIGHT (str): Specifies that the object should be placed at the bottom-right corner.
RANDOM (str): Indicates that the object's position should be determined randomly.
"""
CENTER = "center"
TOP_LEFT = "top_left"
TOP_RIGHT = "top_right"
BOTTOM_LEFT = "bottom_left"
BOTTOM_RIGHT = "bottom_right"
RANDOM = "random"
class Perspective
(scale=(0.05, 0.1), keep_size=True, pad_mode=0, pad_val=0, mask_pad_val=0, fit_output=False, interpolation=1, always_apply=False, p=0.5)
[view source on GitHub] ¶
Perform a random four point perspective transform of the input.
Parameters:
Name | Type | Description |
---|---|---|
scale | Union[float, Tuple[float, float]] | standard deviation of the normal distributions. These are used to sample the random distances of the subimage's corners from the full image's corners. If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1). |
keep_size | bool | Whether to resize images back to their original size after applying the perspective transform. If set to False, the resulting images may end up having different shapes and will always be a list, never an array. Default: True |
pad_mode | OpenCV flag | OpenCV border mode. |
pad_val | int, float, list of int, list of float | padding value if border_mode is cv2.BORDER_CONSTANT. Default: 0 |
mask_pad_val | int, float, list of int, list of float | padding value for mask if border_mode is cv2.BORDER_CONSTANT. Default: 0 |
fit_output | bool | If True, the image plane size and position will be adjusted to still capture the whole image after perspective transformation. (Followed by image resizing if keep_size is set to True.) Otherwise, parts of the transformed image may be outside of the image plane. This setting should not be set to True when using large scale values as it could lead to very large images. Default: False |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image, mask, keypoints, bboxes
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/transforms.py
class Perspective(DualTransform):
"""Perform a random four point perspective transform of the input.
Args:
scale: standard deviation of the normal distributions. These are used to sample
the random distances of the subimage's corners from the full image's corners.
If scale is a single float value, the range will be (0, scale). Default: (0.05, 0.1).
keep_size: Whether to resize images back to their original size after applying the perspective
transform. If set to False, the resulting images may end up having different shapes
and will always be a list, never an array. Default: True
pad_mode (OpenCV flag): OpenCV border mode.
pad_val (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
Default: 0
mask_pad_val (int, float, list of int, list of float): padding value for mask
if border_mode is cv2.BORDER_CONSTANT. Default: 0
fit_output (bool): If True, the image plane size and position will be adjusted to still capture
the whole image after perspective transformation. (Followed by image resizing if keep_size is set to True.)
Otherwise, parts of the transformed image may be outside of the image plane.
This setting should not be set to True when using large scale values as it could lead to very large images.
Default: False
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask, keypoints, bboxes
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)
def __init__(
self,
scale: ScaleFloatType = (0.05, 0.1),
keep_size: bool = True,
pad_mode: int = cv2.BORDER_CONSTANT,
pad_val: Union[float, List[float]] = 0,
mask_pad_val: Union[float, List[float]] = 0,
fit_output: bool = False,
interpolation: int = cv2.INTER_LINEAR,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.scale = to_tuple(scale, 0)
self.keep_size = keep_size
self.pad_mode = pad_mode
self.pad_val = pad_val
self.mask_pad_val = mask_pad_val
self.fit_output = fit_output
self.interpolation = interpolation
def apply(
self,
img: np.ndarray,
matrix: np.ndarray,
max_height: int,
max_width: int,
**params: Any,
) -> np.ndarray:
return F.perspective(
img, matrix, max_width, max_height, self.pad_val, self.pad_mode, self.keep_size, params["interpolation"]
)
def apply_to_bbox(
self,
bbox: BoxInternalType,
matrix: np.ndarray,
max_height: int,
max_width: int,
**params: Any,
) -> BoxInternalType:
return F.perspective_bbox(bbox, params["rows"], params["cols"], matrix, max_width, max_height, self.keep_size)
def apply_to_keypoint(
self,
keypoint: KeypointInternalType,
matrix: np.ndarray,
max_height: int,
max_width: int,
**params: Any,
) -> np.ndarray:
return F.perspective_keypoint(
keypoint, params["rows"], params["cols"], matrix, max_width, max_height, self.keep_size
)
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
height, width = params["image"].shape[:2]
scale = random_utils.uniform(*self.scale)
points = random_utils.normal(0, scale, [4, 2])
points = np.mod(np.abs(points), 0.32)
# top left -- no changes needed, just use jitter
# top right
points[1, 0] = 1.0 - points[1, 0] # w = 1.0 - jitter
# bottom right
points[2] = 1.0 - points[2]  # w = 1.0 - jitter
# bottom left
points[3, 1] = 1.0 - points[3, 1] # h = 1.0 - jitter
points[:, 0] *= width
points[:, 1] *= height
# Obtain a consistent order of the points and unpack them individually.
# Warning: don't just do (tl, tr, br, bl) = _order_points(...)
# here, because the reordered points is used further below.
points = self._order_points(points)
tl, tr, br, bl = points
# compute the width of the new image, which will be the
# maximum distance between bottom-right and bottom-left
# x-coordinates or the top-right and top-left x-coordinates
min_width = None
max_width = None
while min_width is None or min_width < TWO:
width_top = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
width_bottom = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
max_width = int(max(width_top, width_bottom))
min_width = int(min(width_top, width_bottom))
if min_width < TWO:
step_size = (2 - min_width) / 2
tl[0] -= step_size
tr[0] += step_size
bl[0] -= step_size
br[0] += step_size
# compute the height of the new image, which will be the maximum distance between the top-right
# and bottom-right y-coordinates or the top-left and bottom-left y-coordinates
min_height = None
max_height = None
while min_height is None or min_height < TWO:
height_right = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
height_left = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
max_height = int(max(height_right, height_left))
min_height = int(min(height_right, height_left))
if min_height < TWO:
step_size = (2 - min_height) / 2
tl[1] -= step_size
tr[1] -= step_size
bl[1] += step_size
br[1] += step_size
# now that we have the dimensions of the new image, construct
# the set of destination points to obtain a "birds eye view",
# (i.e. top-down view) of the image, again specifying points
# in the top-left, top-right, bottom-right, and bottom-left order
# do not use width-1 or height-1 here, as for e.g. width=3, height=2
# the bottom right coordinate is at (3.0, 2.0) and not (2.0, 1.0)
dst = np.array([[0, 0], [max_width, 0], [max_width, max_height], [0, max_height]], dtype=np.float32)
# compute the perspective transform matrix and then apply it
m = cv2.getPerspectiveTransform(points, dst)
if self.fit_output:
m, max_width, max_height = self._expand_transform(m, (height, width))
return {"matrix": m, "max_height": max_height, "max_width": max_width, "interpolation": self.interpolation}
@classmethod
def _expand_transform(cls, matrix: np.ndarray, shape: SizeType) -> Tuple[np.ndarray, int, int]:
height, width = shape[:2]
# do not use width-1 or height-1 here, as for e.g. width=3, height=2, max_height
# the bottom right coordinate is at (3.0, 2.0) and not (2.0, 1.0)
rect = np.array([[0, 0], [width, 0], [width, height], [0, height]], dtype=np.float32)
dst = cv2.perspectiveTransform(np.array([rect]), matrix)[0]
# get min x, y over transformed 4 points
# then modify target points by subtracting these minima => shift to (0, 0)
dst -= dst.min(axis=0, keepdims=True)
dst = np.around(dst, decimals=0)
matrix_expanded = cv2.getPerspectiveTransform(rect, dst)
max_width, max_height = dst.max(axis=0)
return matrix_expanded, int(max_width), int(max_height)
@staticmethod
def _order_points(pts: np.ndarray) -> np.ndarray:
pts = np.array(sorted(pts, key=lambda x: x[0]))
left = pts[:2] # points with smallest x coordinate - left points
right = pts[2:] # points with greatest x coordinate - right points
if left[0][1] < left[1][1]:
tl, bl = left
else:
bl, tl = left
if right[0][1] < right[1][1]:
tr, br = right
else:
br, tr = right
return np.array([tl, tr, br, bl], dtype=np.float32)
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return "scale", "keep_size", "pad_mode", "pad_val", "mask_pad_val", "fit_output", "interpolation"
class PiecewiseAffine
(scale=(0.03, 0.05), nb_rows=4, nb_cols=4, interpolation=1, mask_interpolation=0, cval=0, cval_mask=0, mode='constant', absolute_scale=False, always_apply=False, keypoints_threshold=0.01, p=0.5)
[view source on GitHub] ¶
Apply affine transformations that differ between local neighbourhoods. This augmentation places a regular grid of points on an image and randomly moves the neighbourhood of these points around via affine transformations. This leads to local distortions.
This is mostly a wrapper around scikit-image's PiecewiseAffine. See also Affine for a similar technique.
Note
This augmenter is very slow. Try to use ElasticTransformation instead, which is at least 10x faster.
Note
For coordinate-based inputs (keypoints, bounding boxes, polygons, ...), this augmenter still has to perform an image-based augmentation, which makes it significantly slower, and not fully correct for such inputs, compared to other transforms.
Parameters:
Name | Type | Description |
---|---|---|
scale | float, tuple of float | Each point on the regular grid is moved around via a normal distribution. This scale factor is equivalent to the normal distribution's sigma. Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of the image if absolute_scale=False (default), so this scale can be the same for different sized images. Recommended values are in the range 0.01 to 0.05 (weak to strong augmentations). If a single float, that value will always be used as the scale. If a tuple (a, b) of floats, a random value will be uniformly sampled per image from the interval [a, b]. |
nb_rows | int, tuple of int | Number of rows of points that the regular grid should have. Must be at least 2. For large images, you might want to pick a higher value than 4. You might have to then adjust scale to lower values. If a single int, that value will always be used as the number of rows. If a tuple (a, b), a value from the discrete interval [a..b] will be uniformly sampled per image. |
nb_cols | int, tuple of int | Number of columns. Analogous to nb_rows. |
interpolation | int | The order of interpolation. The order has to be in the range 0-5: - 0: Nearest-neighbor - 1: Bi-linear (default) - 2: Bi-quadratic - 3: Bi-cubic - 4: Bi-quartic - 5: Bi-quintic |
mask_interpolation | int | same as interpolation but for mask. |
cval | number | The constant value to use when filling in newly created pixels. |
cval_mask | number | Same as cval but only for masks. |
mode | str | {'constant', 'edge', 'symmetric', 'reflect', 'wrap'}, optional. Points outside the boundaries of the input are filled according to the given mode. Modes match the behaviour of numpy.pad. |
absolute_scale | bool | Take scale as an absolute value rather than a relative value. |
keypoints_threshold | float | Used as threshold in conversion from distance maps to keypoints. The search for keypoints works by searching for the argmin (non-inverted) or argmax (inverted) in each channel. This parameter contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit as a keypoint. Use None to use no min/max. Default: 0.01 |
Targets
image, mask, keypoints, bboxes
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/transforms.py
class PiecewiseAffine(DualTransform):
"""Apply affine transformations that differ between local neighbourhoods.
This augmentation places a regular grid of points on an image and randomly moves the neighbourhood of these point
around via affine transformations. This leads to local distortions.
This is mostly a wrapper around scikit-image's ``PiecewiseAffine``.
See also ``Affine`` for a similar technique.
Note:
This augmenter is very slow. Try to use ``ElasticTransformation`` instead, which is at least 10x faster.
Note:
For coordinate-based inputs (keypoints, bounding boxes, polygons, ...),
this augmenter still has to perform an image-based augmentation,
which will make it significantly slower, and not fully correct for such inputs, compared to other transforms.
Args:
scale (float, tuple of float): Each point on the regular grid is moved around via a normal distribution.
This scale factor is equivalent to the normal distribution's sigma.
Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of
the image if ``absolute_scale=False`` (default), so this scale can be the same for different sized images.
Recommended values are in the range ``0.01`` to ``0.05`` (weak to strong augmentations).
* If a single ``float``, then that value will always be used as the scale.
* If a tuple ``(a, b)`` of ``float`` s, then a random value will
be uniformly sampled per image from the interval ``[a, b]``.
nb_rows (int, tuple of int): Number of rows of points that the regular grid should have.
Must be at least ``2``. For large images, you might want to pick a higher value than ``4``.
You might have to then adjust scale to lower values.
* If a single ``int``, then that value will always be used as the number of rows.
* If a tuple ``(a, b)``, then a value from the discrete interval
``[a..b]`` will be uniformly sampled per image.
nb_cols (int, tuple of int): Number of columns. Analogous to `nb_rows`.
interpolation (int): The order of interpolation. The order has to be in the range 0-5:
- 0: Nearest-neighbor
- 1: Bi-linear (default)
- 2: Bi-quadratic
- 3: Bi-cubic
- 4: Bi-quartic
- 5: Bi-quintic
mask_interpolation (int): same as interpolation but for mask.
cval (number): The constant value to use when filling in newly created pixels.
cval_mask (number): Same as cval but only for masks.
mode (str): {'constant', 'edge', 'symmetric', 'reflect', 'wrap'}, optional
Points outside the boundaries of the input are filled according
to the given mode. Modes match the behaviour of `numpy.pad`.
absolute_scale (bool): Take `scale` as an absolute value rather than a relative value.
keypoints_threshold (float): Used as threshold in conversion from distance maps to keypoints.
The search for keypoints works by searching for the
argmin (non-inverted) or argmax (inverted) in each channel. This
parameter contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit
as a keypoint. Use ``None`` to use no min/max. Default: 0.01
Targets:
image, mask, keypoints, bboxes
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(
self,
scale: ScaleFloatType = (0.03, 0.05),
nb_rows: ScaleIntType = 4,
nb_cols: ScaleIntType = 4,
interpolation: int = 1,
mask_interpolation: int = 0,
cval: int = 0,
cval_mask: int = 0,
mode: str = "constant",
absolute_scale: bool = False,
always_apply: bool = False,
keypoints_threshold: float = 0.01,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.scale = to_tuple(scale, scale)
self.nb_rows = to_tuple(nb_rows, nb_rows)
self.nb_cols = to_tuple(nb_cols, nb_cols)
self.interpolation = interpolation
self.mask_interpolation = mask_interpolation
self.cval = cval
self.cval_mask = cval_mask
self.mode = mode
self.absolute_scale = absolute_scale
self.keypoints_threshold = keypoints_threshold
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return (
"scale",
"nb_rows",
"nb_cols",
"interpolation",
"mask_interpolation",
"cval",
"cval_mask",
"mode",
"absolute_scale",
"keypoints_threshold",
)
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
height, width = params["image"].shape[:2]
nb_rows = np.clip(random.randint(*self.nb_rows), 2, None)
nb_cols = np.clip(random.randint(*self.nb_cols), 2, None)
nb_cells = nb_cols * nb_rows
scale = random.uniform(*self.scale)
jitter: np.ndarray = random_utils.normal(0, scale, (nb_cells, 2))
if not np.any(jitter > 0):
for _ in range(10): # See: https://github.com/albumentations-team/albumentations/issues/1442
jitter = random_utils.normal(0, scale, (nb_cells, 2))
if np.any(jitter > 0):
break
if not np.any(jitter > 0):
return {"matrix": None}
y = np.linspace(0, height, nb_rows)
x = np.linspace(0, width, nb_cols)
# (H, W) and (H, W) for H=rows, W=cols
xx_src, yy_src = np.meshgrid(x, y)
# (1, HW, 2) => (HW, 2) for H=rows, W=cols
points_src = np.dstack([yy_src.flat, xx_src.flat])[0]
if self.absolute_scale:
jitter[:, 0] = jitter[:, 0] / height if height > 0 else 0.0
jitter[:, 1] = jitter[:, 1] / width if width > 0 else 0.0
jitter[:, 0] = jitter[:, 0] * height
jitter[:, 1] = jitter[:, 1] * width
points_dest = np.copy(points_src)
points_dest[:, 0] = points_dest[:, 0] + jitter[:, 0]
points_dest[:, 1] = points_dest[:, 1] + jitter[:, 1]
# Restrict all destination points to be inside the image plane.
# This is necessary, as otherwise keypoints could be augmented
# outside of the image plane and these would be replaced by
# (-1, -1), which would not conform with the behaviour of the other augmenters.
points_dest[:, 0] = np.clip(points_dest[:, 0], 0, height - 1)
points_dest[:, 1] = np.clip(points_dest[:, 1], 0, width - 1)
matrix = skimage.transform.PiecewiseAffineTransform()
matrix.estimate(points_src[:, ::-1], points_dest[:, ::-1])
return {
"matrix": matrix,
}
def apply(
self, img: np.ndarray, matrix: Optional[skimage.transform.PiecewiseAffineTransform] = None, **params: Any
) -> np.ndarray:
return F.piecewise_affine(img, matrix, cast(int, self.interpolation), self.mode, self.cval)
def apply_to_mask(
self, mask: np.ndarray, matrix: Optional[skimage.transform.PiecewiseAffineTransform] = None, **params: Any
) -> np.ndarray:
return F.piecewise_affine(mask, matrix, self.mask_interpolation, self.mode, self.cval_mask)
def apply_to_bbox(
self,
bbox: BoxInternalType,
rows: int = 0,
cols: int = 0,
matrix: Optional[skimage.transform.PiecewiseAffineTransform] = None,
**params: Any,
) -> BoxInternalType:
return F.bbox_piecewise_affine(bbox, matrix, rows, cols, self.keypoints_threshold)
def apply_to_keypoint(
self,
keypoint: KeypointInternalType,
rows: int = 0,
cols: int = 0,
matrix: Optional[skimage.transform.PiecewiseAffineTransform] = None,
**params: Any,
) -> KeypointInternalType:
return F.keypoint_piecewise_affine(keypoint, matrix, rows, cols, self.keypoints_threshold)
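A minimal sketch (requires scikit-image, which backs this transform; the values are illustrative). Expect it to be slow, since a full piecewise-affine estimate runs on every call:
import albumentations as A
import numpy as np

pwa = A.PiecewiseAffine(scale=(0.03, 0.05), nb_rows=4, nb_cols=4, p=1.0)
image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
out = pwa(image=image)["image"]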
class ShiftScaleRotate
(shift_limit=0.0625, scale_limit=0.1, rotate_limit=45, interpolation=1, border_mode=4, value=None, mask_value=None, shift_limit_x=None, shift_limit_y=None, rotate_method='largest_box', always_apply=False, p=0.5)
[view source on GitHub] ¶
Randomly apply affine transforms: translate, scale and rotate the input.
Parameters:
Name | Type | Description |
---|---|---|
shift_limit | (float, float) or float | shift factor range for both height and width. If shift_limit is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and upper bounds should lie in range [0, 1]. Default: (-0.0625, 0.0625). |
scale_limit | (float, float) or float | scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1. If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high). Default: (-0.1, 0.1). |
rotate_limit | (int, int) or int | rotation range. If rotate_limit is a single int value, the range will be (-rotate_limit, rotate_limit). Default: (-45, 45). |
interpolation | OpenCV flag | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
border_mode | OpenCV flag | flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101 |
value | int, float, list of int, list of float | padding value if border_mode is cv2.BORDER_CONSTANT. |
mask_value | int, float, list of int, list of float | padding value if border_mode is cv2.BORDER_CONSTANT applied for masks. |
shift_limit_x | (float, float) or float | shift factor range for width. If it is set then this value instead of shift_limit will be used for shifting width. If shift_limit_x is a single float value, the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in the range [0, 1]. Default: None. |
shift_limit_y | (float, float) or float | shift factor range for height. If it is set then this value instead of shift_limit will be used for shifting height. If shift_limit_y is a single float value, the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie in the range [0, 1]. Default: None. |
rotate_method | str | rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse". Default: "largest_box" |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image, mask, keypoints, bboxes
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/transforms.py
class ShiftScaleRotate(DualTransform):
"""Randomly apply affine transforms: translate, scale and rotate the input.
Args:
shift_limit ((float, float) or float): shift factor range for both height and width. If shift_limit
is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and
upper bounds should lie in range [0, 1]. Default: (-0.0625, 0.0625).
scale_limit ((float, float) or float): scaling factor range. If scale_limit is a single float value, the
range will be (-scale_limit, scale_limit). Note that the scale_limit will be biased by 1.
If scale_limit is a tuple, like (low, high), sampling will be done from the range (1 + low, 1 + high).
Default: (-0.1, 0.1).
rotate_limit ((int, int) or int): rotation range. If rotate_limit is a single int value, the
range will be (-rotate_limit, rotate_limit). Default: (-45, 45).
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
Default: cv2.BORDER_REFLECT_101
value (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
list of int,
list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
shift_limit_x ((float, float) or float): shift factor range for width. If it is set then this value
instead of shift_limit will be used for shifting width. If shift_limit_x is a single float value,
the range will be (-shift_limit_x, shift_limit_x). Absolute values for lower and upper bounds should lie in
the range [0, 1]. Default: None.
shift_limit_y ((float, float) or float): shift factor range for height. If it is set then this value
instead of shift_limit will be used for shifting height. If shift_limit_y is a single float value,
the range will be (-shift_limit_y, shift_limit_y). Absolute values for lower and upper bounds should lie
in the range [0, 1]. Default: None.
rotate_method (str): rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse".
Default: "largest_box"
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask, keypoints, bboxes
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS, Targets.BBOXES)
def __init__(
self,
shift_limit: ScaleFloatType = 0.0625,
scale_limit: ScaleFloatType = 0.1,
rotate_limit: int = 45,
interpolation: int = cv2.INTER_LINEAR,
border_mode: int = cv2.BORDER_REFLECT_101,
value: Optional[Tuple[int, ...]] = None,
mask_value: Optional[Tuple[int, ...]] = None,
shift_limit_x: Optional[ScaleFloatType] = None,
shift_limit_y: Optional[ScaleFloatType] = None,
rotate_method: str = "largest_box",
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.shift_limit_x = to_tuple(shift_limit_x if shift_limit_x is not None else shift_limit)
self.shift_limit_y = to_tuple(shift_limit_y if shift_limit_y is not None else shift_limit)
self.scale_limit = to_tuple(scale_limit, bias=1.0)
self.rotate_limit = to_tuple(rotate_limit)
self.interpolation = interpolation
self.border_mode = border_mode
self.value = value
self.mask_value = mask_value
self.rotate_method = rotate_method
if self.rotate_method not in ["largest_box", "ellipse"]:
raise ValueError(f"Rotation method {self.rotate_method} is not valid.")
def apply(
self,
img: np.ndarray,
angle: float = 0,
scale: float = 0,
dx: int = 0,
dy: int = 0,
interpolation: int = cv2.INTER_LINEAR,
**params: Any,
) -> np.ndarray:
return F.shift_scale_rotate(img, angle, scale, dx, dy, interpolation, self.border_mode, self.value)
def apply_to_mask(
self, mask: np.ndarray, angle: float = 0, scale: float = 0, dx: int = 0, dy: int = 0, **params: Any
) -> np.ndarray:
return F.shift_scale_rotate(mask, angle, scale, dx, dy, cv2.INTER_NEAREST, self.border_mode, self.mask_value)
def apply_to_keypoint(
self,
keypoint: KeypointInternalType,
angle: float = 0,
scale: float = 0,
dx: int = 0,
dy: int = 0,
rows: int = 0,
cols: int = 0,
**params: Any,
) -> KeypointInternalType:
return F.keypoint_shift_scale_rotate(keypoint, angle, scale, dx, dy, rows, cols)
def get_params(self) -> Dict[str, Any]:
return {
"angle": random.uniform(self.rotate_limit[0], self.rotate_limit[1]),
"scale": random.uniform(self.scale_limit[0], self.scale_limit[1]),
"dx": random.uniform(self.shift_limit_x[0], self.shift_limit_x[1]),
"dy": random.uniform(self.shift_limit_y[0], self.shift_limit_y[1]),
}
def apply_to_bbox(
self, bbox: BoxInternalType, angle: float, scale: float, dx: int, dy: int, **params: Any
) -> BoxInternalType:
return F.bbox_shift_scale_rotate(bbox, angle, scale, dx, dy, self.rotate_method, **params)
def get_transform_init_args(self) -> Dict[str, Any]:
return {
"shift_limit_x": self.shift_limit_x,
"shift_limit_y": self.shift_limit_y,
"scale_limit": to_tuple(self.scale_limit, bias=-1.0),
"rotate_limit": self.rotate_limit,
"interpolation": self.interpolation,
"border_mode": self.border_mode,
"value": self.value,
"mask_value": self.mask_value,
"rotate_method": self.rotate_method,
}
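A minimal sketch (synthetic data; parameter values are illustrative):
import albumentations as A
import cv2
import numpy as np

# scale_limit=0.1 is biased by 1, so the sampled scale lies in (0.9, 1.1).
ssr = A.ShiftScaleRotate(
    shift_limit=0.0625,
    scale_limit=0.1,
    rotate_limit=45,
    border_mode=cv2.BORDER_CONSTANT,
    value=0,
    p=1.0,
)
image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
out = ssr(image=image)["image"]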
class Transpose
(always_apply=False, p=0.5)
[view source on GitHub] ¶
Transpose the input by swapping rows and columns.
Parameters:
Name | Type | Description |
---|---|---|
p | float | probability of applying the transform. Default: 0.5. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/transforms.py
class Transpose(DualTransform):
"""Transpose the input by swapping rows and columns.
Args:
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def __init__(self, always_apply: bool = False, p: float = 0.5):
super().__init__(always_apply, p)
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
return F.transpose(img)
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
return F.bbox_transpose(bbox, 0, **params)
def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
return F.keypoint_transpose(keypoint)
def get_transform_init_args_names(self) -> Tuple[()]:
return ()
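A minimal sketch showing the shape change (synthetic data):
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (100, 200, 3), dtype=np.uint8)
out = A.Transpose(p=1.0)(image=image)["image"]
print(out.shape)  # (200, 100, 3): rows and columns are swapped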
class VerticalFlip
[view source on GitHub] ¶
Flip the input vertically around the x-axis.
Parameters:
Name | Type | Description |
---|---|---|
p | float | probability of applying the transform. Default: 0.5. |
Targets
image, mask, bboxes, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/geometric/transforms.py
class VerticalFlip(DualTransform):
"""Flip the input vertically around the x-axis.
Args:
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS)
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
return F.vflip(img)
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
return F.bbox_vflip(bbox, **params)
def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
return F.keypoint_vflip(keypoint, **params)
def get_transform_init_args_names(self) -> Tuple[()]:
return ()
mixing
special
¶
transforms
¶
class MixUp
(reference_data=None, read_fn=<lambda>, alpha=0.4, mix_coef_return_name='mix_coef', always_apply=False, p=0.5)
[view source on GitHub] ¶
Performs MixUp data augmentation, blending images, masks, and class labels with reference data.
MixUp augmentation linearly combines an input (image, mask, and class label) with another set from a predefined reference dataset. The mixing degree is controlled by a parameter λ (lambda), sampled from a Beta distribution. This method is known for improving model generalization by promoting linear behavior between classes and smoothing decision boundaries.
Reference
Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2018). mixup: Beyond Empirical Risk Minimization. In International Conference on Learning Representations. https://arxiv.org/abs/1710.09412
Parameters:
Name | Type | Description |
---|---|---|
reference_data | Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]] | A sequence or generator of dictionaries containing the reference data for mixing. If None or an empty sequence is provided, no operation is performed and a warning is issued. |
read_fn | Callable[[ReferenceImage], Dict[str, Any]] | A function to process items from reference_data. It should accept items from reference_data and return a dictionary containing processed data: - The returned dictionary must include an 'image' key with a numpy array value. - It may also include 'mask', 'global_label' each associated with numpy array values. Defaults to a function that assumes input dictionary contains numpy arrays and directly returns it. |
mix_coef_return_name | str | Name used for the applied alpha coefficient in the returned dictionary. Defaults to "mix_coef". |
alpha | float | The alpha parameter for the Beta distribution, influencing the mix's balance. Must be ≥ 0. Higher values lead to more uniform mixing. Defaults to 0.4. |
p | float | The probability of applying the transformation. Defaults to 0.5. |
Targets
image, mask, global_label
Image types: uint8, float32
Exceptions:
Type | Description |
---|---|
ValueError | If the alpha parameter is negative. |
NotImplementedError | If the transform is applied to bounding boxes or keypoints. |
Notes
- If no reference data is provided, a warning is issued, and the transform acts as a no-op.
- If images are in float32 format, they should be within the [0, 1] range.
Example Usage:
import albumentations as A
import numpy as np
from albumentations.core.types import ReferenceImage
# Prepare reference data
# Note: This code generates random reference data for demonstration purposes only.
# In real-world applications, it's crucial to use meaningful and representative data.
# The quality and relevance of your input data significantly impact the effectiveness
# of the augmentation process. Ensure your data closely aligns with your specific
# use case and application requirements.
reference_data = [ReferenceImage(image=np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8),
mask=np.random.randint(0, 4, (100, 100, 1), dtype=np.uint8),
global_label=np.random.choice([0, 1], size=3)) for i in range(10)]
# In this example, the lambda function simply returns its input, which works well for
# data already in the expected format. For more complex scenarios, where the data might not be in
# the required format or additional processing is needed, a more sophisticated function can be implemented.
# Below is a hypothetical example where the input data is a file path, and the function reads the image
# file, converts it to a specific format, and possibly performs other preprocessing steps.
# Example of a more complex read_fn that reads an image from a file path, converts it to RGB, and resizes it.
# def custom_read_fn(file_path):
# from PIL import Image
# image = Image.open(file_path).convert('RGB')
# image = image.resize((100, 100)) # Example resize, adjust as needed.
# return np.array(image)
aug = A.Compose([A.RandomRotate90(), A.MixUp(p=1, reference_data=reference_data, read_fn=lambda x: x)])
# For simplicity, the original lambda function is used in this example.
# Replace `lambda x: x` with `custom_read_fn` if you need to process the data more extensively.
# Apply augmentations
image = np.empty([100, 100, 3], dtype=np.uint8)
mask = np.empty([100, 100], dtype=np.uint8)
global_label = np.array([0, 1, 0])
data = aug(image=image, global_label=global_label, mask=mask)
transformed_image = data["image"]
transformed_mask = data["mask"]
transformed_global_label = data["global_label"]
# Print applied mix coefficient
print(data["mix_coef"]) # Output: e.g., 0.9991580344142427
Source code in albumentations/augmentations/mixing/transforms.py
class MixUp(ReferenceBasedTransform):
"""Performs MixUp data augmentation, blending images, masks, and class labels with reference data.
MixUp augmentation linearly combines an input (image, mask, and class label) with another set from a predefined
reference dataset. The mixing degree is controlled by a parameter λ (lambda), sampled from a Beta distribution.
This method is known for improving model generalization by promoting linear behavior between classes and
smoothing decision boundaries.
Reference:
Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2018). mixup: Beyond Empirical Risk Minimization.
In International Conference on Learning Representations. https://arxiv.org/abs/1710.09412
Args:
reference_data (Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]]):
A sequence or generator of dictionaries containing the reference data for mixing
If None or an empty sequence is provided, no operation is performed and a warning is issued.
read_fn (Callable[[ReferenceImage], Dict[str, Any]]):
A function to process items from reference_data. It should accept items from reference_data
and return a dictionary containing processed data:
- The returned dictionary must include an 'image' key with a numpy array value.
- It may also include 'mask', 'global_label' each associated with numpy array values.
Defaults to a function that assumes input dictionary contains numpy arrays and directly returns it.
mix_coef_return_name (str): Name used for the applied alpha coefficient in the returned dictionary.
Defaults to "mix_coef".
alpha (float):
The alpha parameter for the Beta distribution, influencing the mix's balance. Must be ≥ 0.
Higher values lead to more uniform mixing. Defaults to 0.4.
p (float):
The probability of applying the transformation. Defaults to 0.5.
Targets:
image, mask, global_label
Image types:
- uint8, float32
Raises:
- ValueError: If the alpha parameter is negative.
- NotImplementedError: If the transform is applied to bounding boxes or keypoints.
Notes:
- If no reference data is provided, a warning is issued, and the transform acts as a no-op.
- If images are in float32 format, they should be within the [0, 1] range.
Example Usage:
import albumentations as A
import numpy as np
from albumentations.core.types import ReferenceImage
# Prepare reference data
# Note: This code generates random reference data for demonstration purposes only.
# In real-world applications, it's crucial to use meaningful and representative data.
# The quality and relevance of your input data significantly impact the effectiveness
# of the augmentation process. Ensure your data closely aligns with your specific
# use case and application requirements.
reference_data = [ReferenceImage(image=np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8),
mask=np.random.randint(0, 4, (100, 100, 1), dtype=np.uint8),
global_label=np.random.choice([0, 1], size=3)) for i in range(10)]
# In this example, the lambda function simply returns its input, which works well for
# data already in the expected format. For more complex scenarios, where the data might not be in
# the required format or additional processing is needed, a more sophisticated function can be implemented.
# Below is a hypothetical example where the input data is a file path, and the function reads the image
# file, converts it to a specific format, and possibly performs other preprocessing steps.
# Example of a more complex read_fn that reads an image from a file path, converts it to RGB, and resizes it.
# def custom_read_fn(file_path):
# from PIL import Image
# image = Image.open(file_path).convert('RGB')
# image = image.resize((100, 100)) # Example resize, adjust as needed.
# return {'image': np.array(image)}  # read_fn must return a dict with an 'image' key
aug = A.Compose([A.RandomRotate90(), A.MixUp(p=1, reference_data=reference_data, read_fn=lambda x: x)])
# For simplicity, the original lambda function is used in this example.
# Replace `lambda x: x` with `custom_read_fn` if you need to process the data more extensively.
# Apply augmentations
image = np.empty([100, 100, 3], dtype=np.uint8)
mask = np.empty([100, 100], dtype=np.uint8)
global_label = np.array([0, 1, 0])
data = aug(image=image, global_label=global_label, mask=mask)
transformed_image = data["image"]
transformed_mask = data["mask"]
transformed_global_label = data["global_label"]
# Print applied mix coefficient
print(data["mix_coef"]) # Output: e.g., 0.9991580344142427
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.GLOBAL_LABEL)
def __init__(
self,
reference_data: Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]] = None,
read_fn: Callable[[ReferenceImage], Any] = lambda x: {"image": x, "mask": None, "class_label": None},
alpha: float = 0.4,
mix_coef_return_name: str = "mix_coef",
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.mix_coef_return_name = mix_coef_return_name
if alpha < 0:
msg = "Alpha must be >= 0."
raise ValueError(msg)
self.read_fn = read_fn
self.alpha = alpha
if reference_data is None:
warn("No reference data provided for MixUp. This transform will act as a no-op.")
# Create an empty generator
self.reference_data: List[Any] = []
elif (
isinstance(reference_data, types.GeneratorType)
or isinstance(reference_data, Iterable)
and not isinstance(reference_data, str)
):
self.reference_data = reference_data # type: ignore[assignment]
else:
msg = "reference_data must be a list, tuple, generator, or None."
raise TypeError(msg)
def apply(self, img: np.ndarray, mix_data: ReferenceImage, mix_coef: float, **params: Any) -> np.ndarray:
mix_img = mix_data.get("image")
if not is_grayscale_image(img) and mix_img.shape != img.shape:
msg = "The shape of the reference image should be the same as the input image."
raise ValueError(msg)
return mix_arrays(img, mix_img, mix_coef) if mix_img is not None else img
def apply_to_mask(self, mask: np.ndarray, mix_data: ReferenceImage, mix_coef: float, **params: Any) -> np.ndarray:
mix_mask = mix_data.get("mask")
return mix_arrays(mask, mix_mask, mix_coef) if mix_mask is not None else mask
def apply_to_global_label(
self, label: np.ndarray, mix_data: ReferenceImage, mix_coef: float, **params: Any
) -> np.ndarray:
mix_label = mix_data.get("global_label")
if mix_label is not None and label is not None:
return mix_coef * label + (1 - mix_coef) * mix_label
return label
def apply_to_bboxes(self, bboxes: Sequence[BoxType], mix_data: ReferenceImage, **params: Any) -> Sequence[BoxType]:
msg = "MixUp does not support bounding boxes yet, feel free to submit pull request to https://github.com/albumentations-team/albumentations/."
raise NotImplementedError(msg)
def apply_to_keypoints(
self, keypoints: Sequence[KeypointType], *args: Any, **params: Any
) -> Sequence[KeypointType]:
msg = "MixUp does not support keypoints yet, feel free to submit pull request to https://github.com/albumentations-team/albumentations/."
raise NotImplementedError(msg)
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return "reference_data", "alpha"
def get_params(self) -> Dict[str, Union[None, float, Dict[str, Any]]]:
mix_data = None
# Check if reference_data is not empty and is a sequence (list, tuple, np.array)
if isinstance(self.reference_data, Sequence) and not isinstance(self.reference_data, (str, bytes)):
if len(self.reference_data) > 0: # Additional check to ensure it's not empty
mix_idx = random.randint(0, len(self.reference_data) - 1)
mix_data = self.reference_data[mix_idx]
# Check if reference_data is an iterator or generator
elif isinstance(self.reference_data, Iterator):
try:
mix_data = next(self.reference_data) # Attempt to get the next item
except StopIteration:
warn(
"Reference data iterator/generator has been exhausted. "
"Further mixing augmentations will not be applied.",
RuntimeWarning,
)
return {"mix_data": {}, "mix_coef": 1}
# If mix_data is None or empty after the above checks, return default values
if mix_data is None:
return {"mix_data": {}, "mix_coef": 1}
# If mix_data is not None, calculate mix_coef and apply read_fn
mix_coef = beta(self.alpha, self.alpha) # Assuming beta is defined elsewhere
return {"mix_data": self.read_fn(mix_data), "mix_coef": mix_coef}
def apply_with_params(self, params: Dict[str, Any], *args: Any, **kwargs: Any) -> Dict[str, Any]:
res = super().apply_with_params(params, *args, **kwargs)
if self.mix_coef_return_name:
res[self.mix_coef_return_name] = params["mix_coef"]
return res
transforms
¶
class CLAHE
(clip_limit=4.0, tile_grid_size=(8, 8), always_apply=False, p=0.5)
[view source on GitHub] ¶
Apply Contrast Limited Adaptive Histogram Equalization to the input image.
Parameters:
Name | Type | Description |
---|---|---|
clip_limit | Union[float, Tuple[float, float]] | upper threshold value for contrast limiting. If clip_limit is a single float value, the range will be (1, clip_limit). Default: (1, 4). |
tile_grid_size | Tuple[int, int] | size of grid for histogram equalization. Default: (8, 8). |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8
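A minimal usage sketch (parameter values are illustrative, not recommendations); note that CLAHE expects uint8 input:
import albumentations as A
import numpy as np
# clip_limit=4.0 expands to the sampling range (1, 4); a value is drawn on every call
transform = A.CLAHE(clip_limit=4.0, tile_grid_size=(8, 8), p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]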
Source code in albumentations/augmentations/transforms.py
class CLAHE(ImageOnlyTransform):
"""Apply Contrast Limited Adaptive Histogram Equalization to the input image.
Args:
clip_limit: upper threshold value for contrast limiting.
If clip_limit is a single float value, the range will be (1, clip_limit). Default: (1, 4).
tile_grid_size: size of grid for histogram equalization. Default: (8, 8).
p: probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8
"""
def __init__(
self,
clip_limit: ScaleFloatType = 4.0,
tile_grid_size: Tuple[int, int] = (8, 8),
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.clip_limit = to_tuple(clip_limit, 1)
self.tile_grid_size = cast(Tuple[int, int], tuple(tile_grid_size))
def apply(self, img: np.ndarray, clip_limit: float = 2, **params: Any) -> np.ndarray:
if not is_rgb_image(img) and not is_grayscale_image(img):
msg = "CLAHE transformation expects 1-channel or 3-channel images."
raise TypeError(msg)
return F.clahe(img, clip_limit, self.tile_grid_size)
def get_params(self) -> Dict[str, float]:
return {"clip_limit": random.uniform(self.clip_limit[0], self.clip_limit[1])}
def get_transform_init_args_names(self) -> Tuple[str, str]:
return ("clip_limit", "tile_grid_size")
class ChannelShuffle
[view source on GitHub] ¶
Randomly rearrange channels of the input RGB image.
Parameters:
Name | Type | Description |
---|---|---|
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class ChannelShuffle(ImageOnlyTransform):
"""Randomly rearrange channels of the input RGB image.
Args:
p: probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def apply(self, img: np.ndarray, channels_shuffled: Tuple[int, int, int] = (0, 1, 2), **params: Any) -> np.ndarray:
return F.channel_shuffle(img, channels_shuffled)
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
img = params["image"]
ch_arr = list(range(img.shape[2]))
random.shuffle(ch_arr)
return {"channels_shuffled": ch_arr}
def get_transform_init_args_names(self) -> Tuple[()]:
return ()
class ChromaticAberration
(primary_distortion_limit=0.02, secondary_distortion_limit=0.05, mode='green_purple', interpolation=1, always_apply=False, p=0.5)
[view source on GitHub] ¶
Add lateral chromatic aberration by distorting the red and blue channels of the input image.
Parameters:
Name | Type | Description |
---|---|---|
primary_distortion_limit | Union[float, Tuple[float, float]] | range of the primary radial distortion coefficient. If primary_distortion_limit is a single float value, the range will be (-primary_distortion_limit, primary_distortion_limit). Controls the distortion in the center of the image (positive values result in pincushion distortion, negative values result in barrel distortion). Default: 0.02. |
secondary_distortion_limit | Union[float, Tuple[float, float]] | range of the secondary radial distortion coefficient. If secondary_distortion_limit is a single float value, the range will be (-secondary_distortion_limit, secondary_distortion_limit). Controls the distortion in the corners of the image (positive values result in pincushion distortion, negative values result in barrel distortion). Default: 0.05. |
mode | Literal['green_purple', 'red_blue', 'random'] | type of color fringing. Supported modes are 'green_purple', 'red_blue' and 'random'. 'random' will choose one of the modes 'green_purple' or 'red_blue' randomly. Default: 'green_purple'. |
interpolation | int | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
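A short sketch under illustrative parameter values; mode='random' picks one of the two fringing modes on each call:
import albumentations as A
import numpy as np
transform = A.ChromaticAberration(
    primary_distortion_limit=0.05,   # sampled from (-0.05, 0.05)
    secondary_distortion_limit=0.1,  # sampled from (-0.1, 0.1)
    mode="random",
    p=1.0,
)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]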
Source code in albumentations/augmentations/transforms.py
class ChromaticAberration(ImageOnlyTransform):
"""Add lateral chromatic aberration by distorting the red and blue channels of the input image.
Args:
primary_distortion_limit: range of the primary radial distortion coefficient.
If primary_distortion_limit is a single float value, the range will be
(-primary_distortion_limit, primary_distortion_limit).
Controls the distortion in the center of the image (positive values result in pincushion distortion,
negative values result in barrel distortion).
Default: 0.02.
secondary_distortion_limit: range of the secondary radial distortion coefficient.
If secondary_distortion_limit is a single float value, the range will be
(-secondary_distortion_limit, secondary_distortion_limit).
Controls the distortion in the corners of the image (positive values result in pincushion distortion,
negative values result in barrel distortion).
Default: 0.05.
mode: type of color fringing.
Supported modes are 'green_purple', 'red_blue' and 'random'.
'random' will choose one of the modes 'green_purple' or 'red_blue' randomly.
Default: 'green_purple'.
interpolation: flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
p: probability of applying the transform.
Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
primary_distortion_limit: ScaleFloatType = 0.02,
secondary_distortion_limit: ScaleFloatType = 0.05,
mode: ChromaticAberrationMode = "green_purple",
interpolation: int = cv2.INTER_LINEAR,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.primary_distortion_limit = to_tuple(primary_distortion_limit)
self.secondary_distortion_limit = to_tuple(secondary_distortion_limit)
self.mode = self._validate_mode(mode)
self.interpolation = interpolation
@staticmethod
def _validate_mode(
mode: ChromaticAberrationMode,
) -> ChromaticAberrationMode:
valid_modes = ["green_purple", "red_blue", "random"]
if mode not in valid_modes:
msg = f"Unsupported mode: {mode}. Supported modes are 'green_purple', 'red_blue', 'random'."
raise ValueError(msg)
return mode
def apply(
self,
img: np.ndarray,
primary_distortion_red: float = -0.02,
secondary_distortion_red: float = -0.05,
primary_distortion_blue: float = -0.02,
secondary_distortion_blue: float = -0.05,
**params: Any,
) -> np.ndarray:
return F.chromatic_aberration(
img,
primary_distortion_red,
secondary_distortion_red,
primary_distortion_blue,
secondary_distortion_blue,
cast(int, self.interpolation),
)
def get_params(self) -> Dict[str, float]:
primary_distortion_red = random_utils.uniform(*self.primary_distortion_limit)
secondary_distortion_red = random_utils.uniform(*self.secondary_distortion_limit)
primary_distortion_blue = random_utils.uniform(*self.primary_distortion_limit)
secondary_distortion_blue = random_utils.uniform(*self.secondary_distortion_limit)
secondary_distortion_red = self._match_sign(primary_distortion_red, secondary_distortion_red)
secondary_distortion_blue = self._match_sign(primary_distortion_blue, secondary_distortion_blue)
if self.mode == "green_purple":
# distortion coefficients of the red and blue channels have the same sign
primary_distortion_blue = self._match_sign(primary_distortion_red, primary_distortion_blue)
secondary_distortion_blue = self._match_sign(secondary_distortion_red, secondary_distortion_blue)
if self.mode == "red_blue":
# distortion coefficients of the red and blue channels have the opposite sign
primary_distortion_blue = self._unmatch_sign(primary_distortion_red, primary_distortion_blue)
secondary_distortion_blue = self._unmatch_sign(secondary_distortion_red, secondary_distortion_blue)
return {
"primary_distortion_red": primary_distortion_red,
"secondary_distortion_red": secondary_distortion_red,
"primary_distortion_blue": primary_distortion_blue,
"secondary_distortion_blue": secondary_distortion_blue,
}
@staticmethod
def _match_sign(a: float, b: float) -> float:
# Match the sign of b to a
if (a < 0 < b) or (a > 0 > b):
b = -b
return b
@staticmethod
def _unmatch_sign(a: float, b: float) -> float:
# Unmatch the sign of b to a
if (a < 0 and b < 0) or (a > 0 and b > 0):
b = -b
return b
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
return "primary_distortion_limit", "secondary_distortion_limit", "mode", "interpolation"
class ColorJitter
(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2, always_apply=False, p=0.5)
[view source on GitHub] ¶
Randomly changes the brightness, contrast, saturation, and hue of an image. Compared to ColorJitter from torchvision, this transform gives slightly different results because Pillow (used in torchvision) and OpenCV (used in Albumentations) convert an image to HSV format with different formulas. Another difference: Pillow uses uint8 overflow, while we use value saturation.
Parameters:
Name | Type | Description |
---|---|---|
brightness | float or tuple of float (min, max) | How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non-negative numbers. |
contrast | float or tuple of float (min, max) | How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non-negative numbers. |
saturation | float or tuple of float (min, max) | How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non-negative numbers. |
hue | float or tuple of float (min, max) | How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5. |
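A brief sketch (values are illustrative); passing a tuple fixes the sampling range directly instead of deriving it from a single float:
import albumentations as A
import numpy as np
# brightness=(0.8, 1.2) samples brightness_factor from that range;
# hue=0.1 samples hue_factor from [-0.1, 0.1]
transform = A.ColorJitter(brightness=(0.8, 1.2), contrast=0.2, saturation=0.2, hue=0.1, p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]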
Source code in albumentations/augmentations/transforms.py
class ColorJitter(ImageOnlyTransform):
"""Randomly changes the brightness, contrast, and saturation of an image. Compared to ColorJitter from torchvision,
this transform gives a little bit different results because Pillow (used in torchvision) and OpenCV (used in
Albumentations) transform an image to HSV format by different formulas. Another difference - Pillow uses uint8
overflow, but we use value saturation.
Args:
brightness (float or tuple of float (min, max)): How much to jitter brightness.
brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness]
or the given [min, max]. Should be non negative numbers.
contrast (float or tuple of float (min, max)): How much to jitter contrast.
contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast]
or the given [min, max]. Should be non negative numbers.
saturation (float or tuple of float (min, max)): How much to jitter saturation.
saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation]
or the given [min, max]. Should be non negative numbers.
hue (float or tuple of float (min, max)): How much to jitter hue.
hue_factor is chosen uniformly from [-hue, hue] or the given [min, max].
Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
"""
def __init__(
self,
brightness: ScaleFloatType = 0.2,
contrast: ScaleFloatType = 0.2,
saturation: ScaleFloatType = 0.2,
hue: ScaleFloatType = 0.2,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply=always_apply, p=p)
self.brightness = self.__check_values(brightness, "brightness")
self.contrast = self.__check_values(contrast, "contrast")
self.saturation = self.__check_values(saturation, "saturation")
self.hue = self.__check_values(hue, "hue", offset=0, bounds=(-0.5, 0.5), clip=False)
self.transforms = [
F.adjust_brightness_torchvision,
F.adjust_contrast_torchvision,
F.adjust_saturation_torchvision,
F.adjust_hue_torchvision,
]
@staticmethod
def __check_values(
value: ScaleFloatType,
name: str,
offset: float = 1,
bounds: Tuple[float, float] = (0, float("inf")),
clip: bool = True,
) -> Tuple[float, float]:
if isinstance(value, numbers.Number):
if value < 0:
raise ValueError(f"If {name} is a single number, it must be non negative.")
value = [offset - value, offset + value]
if clip:
value[0] = max(value[0], 0)
elif isinstance(value, (tuple, list)) and len(value) == TWO:
if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
raise ValueError(f"{name} values should be between {bounds}")
else:
raise TypeError(f"{name} should be a single number or a list/tuple with length 2.")
return value
def get_params(self) -> Dict[str, Any]:
brightness = random.uniform(self.brightness[0], self.brightness[1])
contrast = random.uniform(self.contrast[0], self.contrast[1])
saturation = random.uniform(self.saturation[0], self.saturation[1])
hue = random.uniform(self.hue[0], self.hue[1])
order = [0, 1, 2, 3]
random.shuffle(order)
return {
"brightness": brightness,
"contrast": contrast,
"saturation": saturation,
"hue": hue,
"order": order,
}
def apply(
self,
img: np.ndarray,
brightness: float = 1.0,
contrast: float = 1.0,
saturation: float = 1.0,
hue: float = 0,
order: Optional[List[int]] = None,
**params: Any,
) -> np.ndarray:
if order is None:
order = [0, 1, 2, 3]
if not is_rgb_image(img) and not is_grayscale_image(img):
msg = "ColorJitter transformation expects 1-channel or 3-channel images."
raise TypeError(msg)
color_transforms = [brightness, contrast, saturation, hue]
for i in order:
img = self.transforms[i](img, color_transforms[i]) # type: ignore[operator]
return img
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
return ("brightness", "contrast", "saturation", "hue")
class Downscale
(scale_min=0.25, scale_max=0.25, interpolation=None, always_apply=False, p=0.5)
[view source on GitHub] ¶
Decreases image quality by downscaling and upscaling back.
Parameters:
Name | Type | Description |
---|---|---|
scale_min | float | lower bound on the image scale. Should be <= scale_max. |
scale_max | float | upper bound on the image scale. Should be < 1. |
interpolation | Union[int, albumentations.core.transforms_interface.Interpolation, Dict[str, int]] | cv2 interpolation method. Could be: - a single cv2 interpolation flag (the selected method will be used for both downscale and upscale); - dict(downscale=flag, upscale=flag); - Downscale.Interpolation(downscale=flag, upscale=flag). Default: Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST). |
Targets
image
Image types: uint8, float32
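A sketch showing the dict form of interpolation (values are illustrative); specifying it explicitly avoids the sub-optimal INTER_NEAREST default:
import albumentations as A
import cv2
import numpy as np
# downscale with INTER_AREA, then upscale back with INTER_LINEAR
transform = A.Downscale(
    scale_min=0.25,
    scale_max=0.5,
    interpolation={"downscale": cv2.INTER_AREA, "upscale": cv2.INTER_LINEAR},
    p=1.0,
)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]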
Source code in albumentations/augmentations/transforms.py
class Downscale(ImageOnlyTransform):
"""Decreases image quality by downscaling and upscaling back.
Args:
scale_min: lower bound on the image scale. Should be <= scale_max.
scale_max: upper bound on the image scale. Should be < 1.
interpolation: cv2 interpolation method. Could be:
- single cv2 interpolation flag - selected method will be used for downscale and upscale.
- dict(downscale=flag, upscale=flag)
- Downscale.Interpolation(downscale=flag, upscale=flag)
Default: Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST)
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
scale_min: float = 0.25,
scale_max: float = 0.25,
interpolation: Optional[Union[int, Interpolation, Dict[str, int]]] = None,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
if interpolation is None:
self.interpolation = Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST)
warnings.warn(
"Using default interpolation INTER_NEAREST, which is sub-optimal."
"Please specify interpolation mode for downscale and upscale explicitly."
"For additional information see this PR https://github.com/albumentations-team/albumentations/pull/584"
)
elif isinstance(interpolation, int):
self.interpolation = Interpolation(downscale=interpolation, upscale=interpolation)
elif isinstance(interpolation, Interpolation):
self.interpolation = interpolation
elif isinstance(interpolation, dict):
self.interpolation = Interpolation(**interpolation)
else:
raise ValueError(
"Wrong interpolation data type. Supported types: `Optional[Union[int, Interpolation, Dict[str, int]]]`."
f" Got: {type(interpolation)}"
)
if scale_min > scale_max:
raise ValueError(f"Expected scale_min be less or equal scale_max, got {scale_min} {scale_max}")
if scale_max >= 1:
raise ValueError(f"Expected scale_max to be less than 1, got {scale_max}")
self.scale_min = scale_min
self.scale_max = scale_max
def apply(self, img: np.ndarray, scale: float, **params: Any) -> np.ndarray:
if isinstance(self.interpolation, int):
msg = "Should not be here, added for typing purposes. Please report this issue."
raise TypeError(msg)
return F.downscale(
img,
scale=scale,
down_interpolation=self.interpolation.downscale,
up_interpolation=self.interpolation.upscale,
)
def get_params(self) -> Dict[str, Any]:
return {"scale": random.uniform(self.scale_min, self.scale_max)}
def get_transform_init_args_names(self) -> Tuple[str, str]:
return "scale_min", "scale_max"
def to_dict_private(self) -> Dict[str, Any]:
if isinstance(self.interpolation, int):
msg = "Should not be here, added for typing purposes. Please report this issue."
raise TypeError(msg)
result = super().to_dict_private()
result["interpolation"] = {"upscale": self.interpolation.upscale, "downscale": self.interpolation.downscale}
return result
class Emboss
(alpha=(0.2, 0.5), strength=(0.2, 0.7), always_apply=False, p=0.5)
[view source on GitHub] ¶
Emboss the input image and overlay the result with the original image.
Parameters:
Name | Type | Description |
---|---|---|
alpha | Tuple[float, float] | range to choose the visibility of the embossed image. At 0, only the original image is visible; at 1.0, only its embossed version is visible. Default: (0.2, 0.5). |
strength | Tuple[float, float] | strength range of the embossing. Default: (0.2, 0.7). |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
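A minimal sketch with the default ranges; alpha controls how strongly the embossed version is blended over the original:
import albumentations as A
import numpy as np
transform = A.Emboss(alpha=(0.2, 0.5), strength=(0.2, 0.7), p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]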
Source code in albumentations/augmentations/transforms.py
class Emboss(ImageOnlyTransform):
"""Emboss the input image and overlays the result with the original image.
Args:
alpha: range to choose the visibility of the embossed image. At 0, only the original image is
visible; at 1.0, only its embossed version is visible. Default: (0.2, 0.5).
strength: strength range of the embossing. Default: (0.2, 0.7).
p: probability of applying the transform. Default: 0.5.
Targets:
image
"""
def __init__(
self,
alpha: Tuple[float, float] = (0.2, 0.5),
strength: Tuple[float, float] = (0.2, 0.7),
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.alpha = self.__check_values(to_tuple(alpha, 0.0), name="alpha", bounds=(0.0, 1.0))
self.strength = self.__check_values(to_tuple(strength, 0.0), name="strength")
@staticmethod
def __check_values(
value: Tuple[float, float], name: str, bounds: Tuple[float, float] = (0, float("inf"))
) -> Tuple[float, float]:
if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
raise ValueError(f"{name} values should be between {bounds}")
return value
@staticmethod
def __generate_emboss_matrix(alpha_sample: np.ndarray, strength_sample: np.ndarray) -> np.ndarray:
matrix_nochange = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
matrix_effect = np.array(
[
[-1 - strength_sample, 0 - strength_sample, 0],
[0 - strength_sample, 1, 0 + strength_sample],
[0, 0 + strength_sample, 1 + strength_sample],
],
dtype=np.float32,
)
return (1 - alpha_sample) * matrix_nochange + alpha_sample * matrix_effect
def get_params(self) -> Dict[str, np.ndarray]:
alpha = random.uniform(*self.alpha)
strength = random.uniform(*self.strength)
emboss_matrix = self.__generate_emboss_matrix(alpha_sample=alpha, strength_sample=strength)
return {"emboss_matrix": emboss_matrix}
def apply(self, img: np.ndarray, emboss_matrix: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
return F.convolve(img, emboss_matrix)
def get_transform_init_args_names(self) -> Tuple[str, str]:
return ("alpha", "strength")
class Equalize
(mode='cv', by_channels=True, mask=None, mask_params=(), always_apply=False, p=0.5)
[view source on GitHub] ¶
Equalize the image histogram.
Parameters:
Name | Type | Description |
---|---|---|
mode | str | {'cv', 'pil'}. Use OpenCV or Pillow equalization method. |
by_channels | bool | If True, use equalization by channels separately, else convert image to YCbCr representation and use equalization by `Y` channel. |
mask | np.ndarray, callable | If given, only the pixels selected by the mask are included in the analysis. May be a 1-channel or 3-channel array or a callable. Function signature must include `image` argument. |
mask_params | list of str | Params for mask function. |
Targets
image
Image types: uint8
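Beyond whole-image equalization, the mask argument restricts which pixels enter the histogram analysis. A sketch with a callable mask (the region coordinates are illustrative):
import albumentations as A
import numpy as np
def center_mask(image, **kwargs):
    # select a central region; the callable receives the image as a keyword argument
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    mask[50:150, 50:150] = 1
    return mask
transform = A.Equalize(mode="cv", by_channels=True, mask=center_mask, p=1.0)
image = np.random.randint(0, 256, (200, 200, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]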
Source code in albumentations/augmentations/transforms.py
class Equalize(ImageOnlyTransform):
"""Equalize the image histogram.
Args:
mode (str): {'cv', 'pil'}. Use OpenCV or Pillow equalization method.
by_channels (bool): If True, use equalization by channels separately,
else convert image to YCbCr representation and use equalization by `Y` channel.
mask (np.ndarray, callable): If given, only the pixels selected by
the mask are included in the analysis. May be a 1-channel or 3-channel array or callable.
Function signature must include `image` argument.
mask_params (list of str): Params for mask function.
Targets:
image
Image types:
uint8
"""
def __init__(
self,
mode: ImageMode = "cv",
by_channels: bool = True,
mask: Optional[np.ndarray] = None,
mask_params: Tuple[()] = (),
always_apply: bool = False,
p: float = 0.5,
):
if mode not in image_modes:
raise ValueError(f"Unsupported equalization mode. Supports: {image_modes}. " f"Got: {mode}")
super().__init__(always_apply, p)
self.mode = mode
self.by_channels = by_channels
self.mask = mask
self.mask_params = mask_params
def apply(self, img: np.ndarray, mask: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
return F.equalize(img, mode=self.mode, by_channels=self.by_channels, mask=mask)
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
if not callable(self.mask):
return {"mask": self.mask}
return {"mask": self.mask(**params)}
@property
def targets_as_params(self) -> List[str]:
return ["image", *list(self.mask_params)]
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return ("mode", "by_channels", "mask", "mask_params")
class FancyPCA
(alpha=0.1, always_apply=False, p=0.5)
[view source on GitHub] ¶
Augment RGB image using FancyPCA from Krizhevsky's paper "ImageNet Classification with Deep Convolutional Neural Networks"
Parameters:
Name | Type | Description |
---|---|---|
alpha | float | how much to perturb/scale the eigenvectors and eigenvalues. The scale is sampled from a Gaussian distribution (mu=0, sigma=alpha). |
Targets
image
Image types: 3-channel uint8 images only
Credit
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
https://deshanadesai.github.io/notes/Fancy-PCA-with-Scikit-Image
https://pixelatedbrian.github.io/2018-04-29-fancy_pca/
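A minimal sketch; alpha is the sigma of the Gaussian from which the perturbation scale is drawn:
import albumentations as A
import numpy as np
transform = A.FancyPCA(alpha=0.1, p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # 3-channel uint8 only
augmented = transform(image=image)["image"]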
Source code in albumentations/augmentations/transforms.py
class FancyPCA(ImageOnlyTransform):
"""Augment RGB image using FancyPCA from Krizhevsky's paper
"ImageNet Classification with Deep Convolutional Neural Networks"
Args:
alpha: how much to perturb/scale the eigenvectors and eigenvalues.
The scale is sampled from a Gaussian distribution (mu=0, sigma=alpha).
Targets:
image
Image types:
3-channel uint8 images only
Credit:
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
https://deshanadesai.github.io/notes/Fancy-PCA-with-Scikit-Image
https://pixelatedbrian.github.io/2018-04-29-fancy_pca/
"""
def __init__(self, alpha: float = 0.1, always_apply: bool = False, p: float = 0.5):
super().__init__(always_apply=always_apply, p=p)
self.alpha = alpha
def apply(self, img: np.ndarray, alpha: float = 0.1, **params: Any) -> np.ndarray:
return F.fancy_pca(img, alpha)
def get_params(self) -> Dict[str, float]:
return {"alpha": random.gauss(0, self.alpha)}
def get_transform_init_args_names(self) -> Tuple[str]:
return ("alpha",)
class FromFloat
(dtype='uint16', max_value=None, always_apply=False, p=1.0)
[view source on GitHub] ¶
Take an input array where all values should lie in the range [0, 1.0], multiply them by `max_value`, and then cast the resulting values to the type specified by `dtype`. If `max_value` is None, the transform will try to infer the maximum value for the data type from the `dtype` argument.
This is the inverse transform for :class:`~albumentations.augmentations.transforms.ToFloat`.
Parameters:
Name | Type | Description |
---|---|---|
max_value | Optional[float] | maximum possible input value. Default: None. |
dtype | str | data type of the output. See the 'Data types' page from the NumPy docs. Default: 'uint16'. |
p | float | probability of applying the transform. Default: 1.0. |
Targets
image
Image types: float32
The 'Data types' page from the NumPy docs: https://docs.scipy.org/doc/numpy/user/basics.types.html
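A round-trip sketch pairing ToFloat with FromFloat (dtype and max_value chosen here for uint8 data):
import albumentations as A
import numpy as np
image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
to_float = A.ToFloat(max_value=255.0, p=1.0)
from_float = A.FromFloat(dtype="uint8", max_value=255.0, p=1.0)
float_image = to_float(image=image)["image"]       # float32 in [0, 1]
restored = from_float(image=float_image)["image"]  # back to uint8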
Source code in albumentations/augmentations/transforms.py
class FromFloat(ImageOnlyTransform):
"""Take an input array where all values should lie in the range [0, 1.0], multiply them by `max_value` and then
cast the resulted value to a type specified by `dtype`. If `max_value` is None the transform will try to infer
the maximum value for the data type from the `dtype` argument.
This is the inverse transform for :class:`~albumentations.augmentations.transforms.ToFloat`.
Args:
max_value: maximum possible input value. Default: None.
dtype: data type of the output. See the `'Data types' page from the NumPy docs`_.
Default: 'uint16'.
p: probability of applying the transform. Default: 1.0.
Targets:
image
Image types:
float32
.. _'Data types' page from the NumPy docs:
https://docs.scipy.org/doc/numpy/user/basics.types.html
"""
def __init__(
self, dtype: str = "uint16", max_value: Optional[float] = None, always_apply: bool = False, p: float = 1.0
):
super().__init__(always_apply, p)
self.dtype = np.dtype(dtype)
self.max_value = max_value
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
return F.from_float(img, self.dtype, self.max_value)
def get_transform_init_args(self) -> Dict[str, Any]:
return {"dtype": self.dtype.name, "max_value": self.max_value}
class GaussNoise
(var_limit=(10.0, 50.0), mean=0, per_channel=True, always_apply=False, p=0.5)
[view source on GitHub] ¶
Apply gaussian noise to the input image.
Parameters:
Name | Type | Description |
---|---|---|
var_limit | Union[float, Tuple[float, float]] | variance range for noise. If var_limit is a single float, the range will be (0, var_limit). Default: (10.0, 50.0). |
mean | float | mean of the noise. Default: 0 |
per_channel | bool | if set to True, noise will be sampled for each channel independently. Otherwise, the noise will be sampled once for all channels. Default: True |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
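A minimal sketch (illustrative values); the variance is sampled from var_limit and its square root is used as the Gaussian sigma:
import albumentations as A
import numpy as np
transform = A.GaussNoise(var_limit=(10.0, 50.0), mean=0, per_channel=True, p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]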
Source code in albumentations/augmentations/transforms.py
class GaussNoise(ImageOnlyTransform):
"""Apply gaussian noise to the input image.
Args:
var_limit: variance range for noise. If var_limit is a single float, the range
will be (0, var_limit). Default: (10.0, 50.0).
mean: mean of the noise. Default: 0
per_channel: if set to True, noise will be sampled for each channel independently.
Otherwise, the noise will be sampled once for all channels. Default: True
p: probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
var_limit: ScaleFloatType = (10.0, 50.0),
mean: float = 0,
per_channel: bool = True,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
if isinstance(var_limit, (tuple, list)):
if var_limit[0] < 0:
msg = "Lower var_limit should be non negative."
raise ValueError(msg)
if var_limit[1] < 0:
msg = "Upper var_limit should be non negative."
raise ValueError(msg)
self.var_limit = var_limit
elif isinstance(var_limit, (int, float)):
if var_limit < 0:
msg = "var_limit should be non negative."
raise ValueError(msg)
self.var_limit = (0, var_limit)
else:
raise TypeError(f"Expected var_limit type to be one of (int, float, tuple, list), got {type(var_limit)}")
self.mean = mean
self.per_channel = per_channel
def apply(self, img: np.ndarray, gauss: Optional[float] = None, **params: Any) -> np.ndarray:
return F.gauss_noise(img, gauss=gauss)
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, float]:
image = params["image"]
var = random.uniform(self.var_limit[0], self.var_limit[1])
sigma = var**0.5
if self.per_channel:
gauss = random_utils.normal(self.mean, sigma, image.shape)
else:
gauss = random_utils.normal(self.mean, sigma, image.shape[:2])
if len(image.shape) == THREE:
gauss = np.expand_dims(gauss, -1)
return {"gauss": gauss}
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_transform_init_args_names(self) -> Tuple[str, str, str]:
return ("var_limit", "per_channel", "mean")
class HueSaturationValue
(hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, always_apply=False, p=0.5)
[view source on GitHub] ¶
Randomly change hue, saturation and value of the input image.
Parameters:
Name | Type | Description |
---|---|---|
hue_shift_limit | Union[int, Tuple[int, int]] | range for changing hue. If hue_shift_limit is a single int, the range will be (-hue_shift_limit, hue_shift_limit). Default: (-20, 20). |
sat_shift_limit | Union[int, Tuple[int, int]] | range for changing saturation. If sat_shift_limit is a single int, the range will be (-sat_shift_limit, sat_shift_limit). Default: (-30, 30). |
val_shift_limit | Union[int, Tuple[int, int]] | range for changing value. If val_shift_limit is a single int, the range will be (-val_shift_limit, val_shift_limit). Default: (-20, 20). |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
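A minimal sketch; single ints expand to symmetric ranges as described above:
import albumentations as A
import numpy as np
# hue_shift_limit=20 expands to (-20, 20), and so on
transform = A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]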
Source code in albumentations/augmentations/transforms.py
class HueSaturationValue(ImageOnlyTransform):
"""Randomly change hue, saturation and value of the input image.
Args:
hue_shift_limit: range for changing hue. If hue_shift_limit is a single int, the range
will be (-hue_shift_limit, hue_shift_limit). Default: (-20, 20).
sat_shift_limit: range for changing saturation. If sat_shift_limit is a single int,
the range will be (-sat_shift_limit, sat_shift_limit). Default: (-30, 30).
val_shift_limit: range for changing value. If val_shift_limit is a single int, the range
will be (-val_shift_limit, val_shift_limit). Default: (-20, 20).
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
hue_shift_limit: ScaleIntType = 20,
sat_shift_limit: ScaleIntType = 30,
val_shift_limit: ScaleIntType = 20,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.hue_shift_limit = to_tuple(hue_shift_limit)
self.sat_shift_limit = to_tuple(sat_shift_limit)
self.val_shift_limit = to_tuple(val_shift_limit)
def apply(
self, img: np.ndarray, hue_shift: int = 0, sat_shift: int = 0, val_shift: int = 0, **params: Any
) -> np.ndarray:
if not is_rgb_image(img) and not is_grayscale_image(img):
msg = "HueSaturationValue transformation expects 1-channel or 3-channel images."
raise TypeError(msg)
return F.shift_hsv(img, hue_shift, sat_shift, val_shift)
def get_params(self) -> Dict[str, float]:
return {
"hue_shift": random.uniform(self.hue_shift_limit[0], self.hue_shift_limit[1]),
"sat_shift": random.uniform(self.sat_shift_limit[0], self.sat_shift_limit[1]),
"val_shift": random.uniform(self.val_shift_limit[0], self.val_shift_limit[1]),
}
def get_transform_init_args_names(self) -> Tuple[str, str, str]:
return ("hue_shift_limit", "sat_shift_limit", "val_shift_limit")
class ISONoise
(color_shift=(0.01, 0.05), intensity=(0.1, 0.5), always_apply=False, p=0.5)
[view source on GitHub] ¶
Apply camera sensor noise.
Parameters:
Name | Type | Description |
---|---|---|
color_shift | Tuple[float, float] | variance range for color hue change. Measured as a fraction of the 360-degree hue angle in the HLS colorspace. |
intensity | Tuple[float, float] | Multiplicative factor that controls the strength of color and luminance noise. |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8
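A minimal sketch with the default ranges (uint8 input only):
import albumentations as A
import numpy as np
transform = A.ISONoise(color_shift=(0.01, 0.05), intensity=(0.1, 0.5), p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]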
Source code in albumentations/augmentations/transforms.py
class ISONoise(ImageOnlyTransform):
"""Apply camera sensor noise.
Args:
color_shift (float, float): variance range for color hue change.
Measured as a fraction of the 360-degree hue angle in the HLS colorspace.
intensity (float, float): Multiplicative factor that controls the strength
of color and luminance noise.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8
"""
def __init__(
self,
color_shift: Tuple[float, float] = (0.01, 0.05),
intensity: Tuple[float, float] = (0.1, 0.5),
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.intensity = intensity
self.color_shift = color_shift
def apply(
self,
img: np.ndarray,
color_shift: float = 0.05,
intensity: float = 1.0,
random_state: Optional[int] = None,
**params: Any,
) -> np.ndarray:
return F.iso_noise(img, color_shift, intensity, np.random.RandomState(random_state))
def get_params(self) -> Dict[str, Any]:
return {
"color_shift": random.uniform(self.color_shift[0], self.color_shift[1]),
"intensity": random.uniform(self.intensity[0], self.intensity[1]),
"random_state": random.randint(0, 65536),
}
def get_transform_init_args_names(self) -> Tuple[str, str]:
return ("intensity", "color_shift")
class ImageCompression
(quality_lower=99, quality_upper=100, compression_type=<ImageCompressionType.JPEG: 0>, always_apply=False, p=0.5)
[view source on GitHub] ¶
Decreases image quality by applying JPEG or WebP compression to an image.
Parameters:
Name | Type | Description |
---|---|---|
quality_lower | int | lower bound on the image quality. Should be in [0, 100] range for jpeg and [1, 100] for webp. |
quality_upper | int | upper bound on the image quality. Should be in [0, 100] range for jpeg and [1, 100] for webp. |
compression_type | ImageCompressionType | should be ImageCompressionType.JPEG or ImageCompressionType.WEBP. Default: ImageCompressionType.JPEG |
Targets
image
Image types: uint8, float32
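A sketch selecting the WebP variant (quality bounds are illustrative):
import albumentations as A
import numpy as np
# quality is sampled from [quality_lower, quality_upper] on each call
transform = A.ImageCompression(
    quality_lower=50,
    quality_upper=90,
    compression_type=A.ImageCompression.ImageCompressionType.WEBP,
    p=1.0,
)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]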
Source code in albumentations/augmentations/transforms.py
class ImageCompression(ImageOnlyTransform):
"""Decreases image quality by Jpeg, WebP compression of an image.
Args:
quality_lower: lower bound on the image quality. Should be in [0, 100] range for jpeg and [1, 100] for webp.
quality_upper: upper bound on the image quality. Should be in [0, 100] range for jpeg and [1, 100] for webp.
compression_type (ImageCompressionType): should be ImageCompressionType.JPEG or ImageCompressionType.WEBP.
Default: ImageCompressionType.JPEG
Targets:
image
Image types:
uint8, float32
"""
class ImageCompressionType(IntEnum):
"""Defines the types of image compression.
This Enum class is used to specify the image compression format.
Attributes:
JPEG (int): Represents the JPEG image compression format.
WEBP (int): Represents the WEBP image compression format.
"""
JPEG = 0
WEBP = 1
def __init__(
self,
quality_lower: int = 99,
quality_upper: int = 100,
compression_type: ImageCompressionType = ImageCompressionType.JPEG,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.compression_type = ImageCompression.ImageCompressionType(compression_type)
low_thresh_quality_assert = 0
if self.compression_type == ImageCompression.ImageCompressionType.WEBP:
low_thresh_quality_assert = 1
if not low_thresh_quality_assert <= quality_lower <= MAX_JPEG_QUALITY:
raise ValueError(f"Invalid quality_lower. Got: {quality_lower}")
if not low_thresh_quality_assert <= quality_upper <= MAX_JPEG_QUALITY:
raise ValueError(f"Invalid quality_upper. Got: {quality_upper}")
self.quality_lower = quality_lower
self.quality_upper = quality_upper
def apply(self, img: np.ndarray, quality: int = 100, image_type: str = ".jpg", **params: Any) -> np.ndarray:
if img.ndim != TWO and img.shape[-1] not in (1, 3, 4):
msg = "ImageCompression transformation expects 1, 3 or 4 channel images."
raise TypeError(msg)
return F.image_compression(img, quality, image_type)
def get_params(self) -> Dict[str, Any]:
image_type = ".jpg"
if self.compression_type == ImageCompression.ImageCompressionType.WEBP:
image_type = ".webp"
return {
"quality": random.randint(self.quality_lower, self.quality_upper),
"image_type": image_type,
}
def get_transform_init_args(self) -> Dict[str, Any]:
return {
"quality_lower": self.quality_lower,
"quality_upper": self.quality_upper,
"compression_type": self.compression_type.value,
}
class ImageCompressionType
¶
Defines the types of image compression.
This Enum class is used to specify the image compression format.
Attributes:
Name | Type | Description |
---|---|---|
JPEG | int | Represents the JPEG image compression format. |
WEBP | int | Represents the WEBP image compression format. |
Source code in albumentations/augmentations/transforms.py
class ImageCompressionType(IntEnum):
"""Defines the types of image compression.
This Enum class is used to specify the image compression format.
Attributes:
JPEG (int): Represents the JPEG image compression format.
WEBP (int): Represents the WEBP image compression format.
"""
JPEG = 0
WEBP = 1
class InvertImg
[view source on GitHub] ¶
Invert the input image by subtracting pixel values from the maximum value of the image type, i.e., 255 for uint8 and 1.0 for float32.
Parameters:
Name | Type | Description |
---|---|---|
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class InvertImg(ImageOnlyTransform):
"""Invert the input image by subtracting pixel values from max values of the image types,
i.e., 255 for uint8 and 1.0 for float32.
Args:
p: probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
return F.invert(img)
def get_transform_init_args_names(self) -> Tuple[()]:
return ()
class Lambda
(image=None, mask=None, keypoint=None, bbox=None, global_label=None, name=None, always_apply=False, p=1.0)
[view source on GitHub] ¶
A flexible transformation class for using user-defined transformation functions per target. Function signatures must include **kwargs to accept optional arguments like interpolation method, image size, etc.
Parameters:
Name | Type | Description |
---|---|---|
image | Optional[Callable[..., Any]] | Image transformation function. |
mask | Optional[Callable[..., Any]] | Mask transformation function. |
keypoint | Optional[Callable[..., Any]] | Keypoint transformation function. |
bbox | Optional[Callable[..., Any]] | BBox transformation function. |
global_label | Optional[Callable[..., Any]] | Global label transformation function. |
always_apply | bool | Indicates whether this transformation should be always applied. |
p | float | probability of applying the transform. Default: 1.0. |
Targets
image, mask, bboxes, keypoints, global_label
Image types: Any
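A sketch with named (non-lambda) functions, which keeps the transform compatible with multiprocessing; the function names are hypothetical:
import albumentations as A
import numpy as np
def invert_image(image, **kwargs):
    # **kwargs absorbs extra call-time parameters such as image size
    return 255 - image
def keep_mask(mask, **kwargs):
    return mask
transform = A.Lambda(name="invert_image_only", image=invert_image, mask=keep_mask, p=1.0)
image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
mask = np.zeros((100, 100), dtype=np.uint8)
result = transform(image=image, mask=mask)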
Source code in albumentations/augmentations/transforms.py
class Lambda(NoOp):
"""A flexible transformation class for using user-defined transformation functions per targets.
Function signature must include **kwargs to accept optional arguments like interpolation method, image size, etc:
Args:
image: Image transformation function.
mask: Mask transformation function.
keypoint: Keypoint transformation function.
bbox: BBox transformation function.
global_label: Global label transformation function.
always_apply: Indicates whether this transformation should be always applied.
p: probability of applying the transform. Default: 1.0.
Targets:
image, mask, bboxes, keypoints, global_label
Image types:
Any
"""
def __init__(
self,
image: Optional[Callable[..., Any]] = None,
mask: Optional[Callable[..., Any]] = None,
keypoint: Optional[Callable[..., Any]] = None,
bbox: Optional[Callable[..., Any]] = None,
global_label: Optional[Callable[..., Any]] = None,
name: Optional[str] = None,
always_apply: bool = False,
p: float = 1.0,
):
super().__init__(always_apply, p)
self.name = name
self.custom_apply_fns = {
target_name: F.noop for target_name in ("image", "mask", "keypoint", "bbox", "global_label")
}
for target_name, custom_apply_fn in {
"image": image,
"mask": mask,
"keypoint": keypoint,
"bbox": bbox,
"global_label": global_label,
}.items():
if custom_apply_fn is not None:
if isinstance(custom_apply_fn, LambdaType) and custom_apply_fn.__name__ == "<lambda>":
warnings.warn(
"Using lambda is incompatible with multiprocessing. "
"Consider using regular functions or partial()."
)
self.custom_apply_fns[target_name] = custom_apply_fn
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
fn = self.custom_apply_fns["image"]
return fn(img, **params)
def apply_to_mask(self, mask: np.ndarray, **params: Any) -> np.ndarray:
fn = self.custom_apply_fns["mask"]
return fn(mask, **params)
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
fn = self.custom_apply_fns["bbox"]
return fn(bbox, **params)
def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
fn = self.custom_apply_fns["keypoint"]
return fn(keypoint, **params)
def apply_to_global_label(self, label: np.ndarray, **params: Any) -> np.ndarray:
fn = self.custom_apply_fns["global_label"]
return fn(label, **params)
@classmethod
def is_serializable(cls) -> bool:
return False
def to_dict_private(self) -> Dict[str, Any]:
if self.name is None:
msg = (
"To make a Lambda transform serializable you should provide the `name` argument, "
"e.g. `Lambda(name='my_transform', image=<some func>, ...)`."
)
raise ValueError(msg)
return {"__class_fullname__": self.get_class_fullname(), "__name__": self.name}
def __repr__(self) -> str:
state = {"name": self.name}
state.update(self.custom_apply_fns.items()) # type: ignore[arg-type]
state.update(self.get_base_init_args())
return f"{self.__class__.__name__}({format_args(state)})"
class MultiplicativeNoise
(multiplier=(0.9, 1.1), per_channel=False, elementwise=False, always_apply=False, p=0.5)
[view source on GitHub] ¶
Multiply the image by a random number or array of numbers.
Parameters:
Name | Type | Description |
---|---|---|
multiplier | Union[float, Tuple[float, float]] | If a single float, the image will be multiplied by this number. If a tuple of floats, the multiplier will be sampled from the range [multiplier[0], multiplier[1]). Default: (0.9, 1.1). |
per_channel | bool | If False, the same values will be used for all channels. If True, sample values for each channel. Default: False. |
elementwise | bool | If False, multiply all pixels in the image by a single random value sampled once. If True, multiply image pixels by values that are sampled independently for each pixel. Default: False. |
Targets
image
Image types: Any
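A sketch combining per_channel and elementwise (illustrative settings), which samples an independent multiplier for every pixel and channel:
import albumentations as A
import numpy as np
transform = A.MultiplicativeNoise(multiplier=(0.9, 1.1), per_channel=True, elementwise=True, p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]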
Source code in albumentations/augmentations/transforms.py
class MultiplicativeNoise(ImageOnlyTransform):
"""Multiply image to random number or array of numbers.
Args:
multiplier: If single float image will be multiplied to this number.
If tuple of float multiplier will be in range `[multiplier[0], multiplier[1])`. Default: (0.9, 1.1).
per_channel: If `False`, same values for all channels will be used.
If `True` use sample values for each channels. Default False.
elementwise: If `False` multiply multiply all pixels in an image with a random value sampled once.
If `True` Multiply image pixels with values that are pixelwise randomly sampled. Default: False.
Targets:
image
Image types:
Any
"""
def __init__(
self,
multiplier: ScaleFloatType = (0.9, 1.1),
per_channel: bool = False,
elementwise: bool = False,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.multiplier = to_tuple(multiplier, multiplier)
self.per_channel = per_channel
self.elementwise = elementwise
def apply(self, img: np.ndarray, multiplier: float = np.array([1]), **kwargs: Any) -> np.ndarray:
return F.multiply(img, multiplier)
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
if self.multiplier[0] == self.multiplier[1]:
return {"multiplier": np.array([self.multiplier[0]])}
img = params["image"]
height, width = img.shape[:2]
num_channels = (1 if is_grayscale_image(img) else img.shape[-1]) if self.per_channel else 1
shape = [height, width, num_channels] if self.elementwise else [num_channels]
multiplier = random_utils.uniform(self.multiplier[0], self.multiplier[1], tuple(shape))
if is_grayscale_image(img) and img.ndim == TWO:
multiplier = np.squeeze(multiplier)
return {"multiplier": multiplier}
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_transform_init_args_names(self) -> Tuple[str, str, str]:
return "multiplier", "per_channel", "elementwise"
class Normalize
(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, always_apply=False, p=1.0)
[view source on GitHub] ¶
Normalization is applied by the formula: img = (img - mean * max_pixel_value) / (std * max_pixel_value)
Parameters:
Name | Type | Description |
---|---|---|
mean | Union[float, Sequence[float]] | mean values |
std | Union[float, Sequence[float]] | std values |
max_pixel_value | float | maximum possible pixel value |
Targets
image
Image types: uint8, float32
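A sketch with the common ImageNet statistics; with max_pixel_value=255.0 this applies the formula above to uint8 input and returns float32:
import albumentations as A
import numpy as np
transform = A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, p=1.0)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
normalized = transform(image=image)["image"]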
Source code in albumentations/augmentations/transforms.py
class Normalize(ImageOnlyTransform):
"""Normalization is applied by the formula: `img = (img - mean * max_pixel_value) / (std * max_pixel_value)`
Args:
mean: mean values
std: std values
max_pixel_value: maximum possible pixel value
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
mean: Union[float, Sequence[float]] = (0.485, 0.456, 0.406),
std: Union[float, Sequence[float]] = (0.229, 0.224, 0.225),
max_pixel_value: float = 255.0,
always_apply: bool = False,
p: float = 1.0,
):
super().__init__(always_apply, p)
self.mean = mean
self.std = std
self.max_pixel_value = max_pixel_value
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
return F.normalize(img, self.mean, self.std, self.max_pixel_value)
def get_transform_init_args_names(self) -> Tuple[str, str, str]:
return ("mean", "std", "max_pixel_value")
class PixelDropout
(dropout_prob=0.01, per_channel=False, drop_value=0, mask_drop_value=None, always_apply=False, p=0.5)
[view source on GitHub] ¶
Set pixels to 0 with some probability.
Parameters:
Name | Type | Description |
---|---|---|
dropout_prob | float | pixel drop probability. Default: 0.01 |
per_channel | bool | if set to True, drop mask will be sampled for each channel, otherwise the same mask will be sampled for all channels. Default: False. |
drop_value | number or sequence of numbers or None | Value that will be set in dropped place. If set to None value will be sampled randomly, default ranges will be used: - uint8 - [0, 255] - uint16 - [0, 65535] - uint32 - [0, 4294967295] - float, double - [0, 1] Default: 0 |
mask_drop_value | number or sequence of numbers or None | Value that will be set in dropped place in masks. If set to None masks will be unchanged. Default: 0 |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image, mask
Image types: any
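A sketch that also drops the corresponding mask pixels (illustrative values); recall that mask support requires per_channel=False:
import albumentations as A
import numpy as np
transform = A.PixelDropout(dropout_prob=0.05, per_channel=False, drop_value=0, mask_drop_value=0, p=1.0)
image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
mask = np.ones((100, 100), dtype=np.uint8)
result = transform(image=image, mask=mask)  # result["image"], result["mask"]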
Source code in albumentations/augmentations/transforms.py
class PixelDropout(DualTransform):
"""Set pixels to 0 with some probability.
Args:
dropout_prob (float): pixel drop probability. Default: 0.01
per_channel (bool): if set to `True` drop mask will be sampled for each channel,
otherwise the same mask will be sampled for all channels. Default: False
drop_value (number or sequence of numbers or None): Value that will be set in dropped place.
If set to None value will be sampled randomly, default ranges will be used:
- uint8 - [0, 255]
- uint16 - [0, 65535]
- uint32 - [0, 4294967295]
- float, double - [0, 1]
Default: 0
mask_drop_value (number or sequence of numbers or None): Value that will be set in dropped place in masks.
If set to None masks will be unchanged. Default: 0
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask
Image types:
any
"""
_targets = (Targets.IMAGE, Targets.MASK)
def __init__(
self,
dropout_prob: float = 0.01,
per_channel: bool = False,
drop_value: Optional[ScaleFloatType] = 0,
mask_drop_value: Optional[ScaleFloatType] = None,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.dropout_prob = dropout_prob
self.per_channel = per_channel
self.drop_value = drop_value
self.mask_drop_value = mask_drop_value
if self.mask_drop_value is not None and self.per_channel:
msg = "PixelDropout supports mask only with per_channel=False"
raise ValueError(msg)
def apply(
self,
img: np.ndarray,
drop_mask: Optional[np.ndarray] = None,
drop_value: Union[float, Sequence[float]] = (),
**params: Any,
) -> np.ndarray:
return F.pixel_dropout(img, drop_mask, drop_value)
def apply_to_mask(self, mask: np.ndarray, drop_mask: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
if self.mask_drop_value is None:
return mask
if mask.ndim == TWO:
drop_mask = np.squeeze(drop_mask)
return F.pixel_dropout(mask, drop_mask, self.mask_drop_value)
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
return bbox
def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
return keypoint
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
img = params["image"]
shape = img.shape if self.per_channel else img.shape[:2]
rnd = np.random.RandomState(random.randint(0, 1 << 31))
# Use choice to create boolean matrix, if we will use binomial after that we will need type conversion
drop_mask = rnd.choice([True, False], shape, p=[self.dropout_prob, 1 - self.dropout_prob])
drop_value: Union[float, Sequence[float], np.ndarray]
if drop_mask.ndim != img.ndim:
drop_mask = np.expand_dims(drop_mask, -1)
if self.drop_value is None:
drop_shape = 1 if is_grayscale_image(img) else int(img.shape[-1])
if img.dtype in (np.uint8, np.uint16, np.uint32):
drop_value = rnd.randint(0, int(F.MAX_VALUES_BY_DTYPE[img.dtype]), drop_shape, img.dtype)
elif img.dtype in [np.float32, np.double]:
drop_value = rnd.uniform(0, 1, drop_shape).astype(img.dtype)
else:
raise ValueError(f"Unsupported dtype: {img.dtype}")
else:
drop_value = self.drop_value
return {"drop_mask": drop_mask, "drop_value": drop_value}
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
return ("dropout_prob", "per_channel", "drop_value", "mask_drop_value")
class Posterize
(num_bits=4, always_apply=False, p=0.5)
[view source on GitHub] ¶
Reduce the number of bits for each color channel.
Parameters:
Name | Type | Description |
---|---|---|
num_bits | (int, int) or int, or list of ints [r, g, b], or list of ints [[r1, r2], [g1, g2], [b1, b2]] | number of high bits. If num_bits is a single value, the range will be [num_bits, num_bits]. Must be in range [0, 8]. Default: 4. |
p | float | probability of applying the transform. Default: 0.5. |
Targets: image
Image types: uint8
Source code in albumentations/augmentations/transforms.py
class Posterize(ImageOnlyTransform):
"""Reduce the number of bits for each color channel.
Args:
num_bits ((int, int) or int,
or list of ints [r, g, b],
or list of ints [[r1, r2], [g1, g2], [b1, b2]]): number of high bits.
If num_bits is a single value, the range will be [num_bits, num_bits].
Must be in range [0, 8]. Default: 4.
p: probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8
"""
def __init__(
self,
num_bits: Union[int, Tuple[int, int], Tuple[int, int, int]] = 4,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
if isinstance(num_bits, int):
self.num_bits = to_tuple(num_bits, num_bits)
elif isinstance(num_bits, Sequence) and len(num_bits) == THREE:
self.num_bits = [to_tuple(i, 0) for i in num_bits] # type: ignore[assignment]
else:
self.num_bits = to_tuple(num_bits, 0) # type: ignore[arg-type]
def apply(self, img: np.ndarray, num_bits: int = 1, **params: Any) -> np.ndarray:
return F.posterize(img, num_bits)
def get_params(self) -> Dict[str, Any]:
if len(self.num_bits) == THREE:
return {"num_bits": [random.randint(int(i[0]), int(i[1])) for i in self.num_bits]} # type: ignore[index]
num_bits = self.num_bits
return {"num_bits": random.randint(int(num_bits[0]), int(num_bits[1]))}
def get_transform_init_args_names(self) -> Tuple[str]:
return ("num_bits",)
class RGBShift
(r_shift_limit=20, g_shift_limit=20, b_shift_limit=20, always_apply=False, p=0.5)
[view source on GitHub] ¶
Randomly shift values for each channel of the input RGB image.
Parameters:
Name | Type | Description |
---|---|---|
r_shift_limit | Union[int, Tuple[int, int]] | range for changing values for the red channel. If r_shift_limit is a single int, the range will be (-r_shift_limit, r_shift_limit). Default: (-20, 20). |
g_shift_limit | Union[int, Tuple[int, int]] | range for changing values for the green channel. If g_shift_limit is a single int, the range will be (-g_shift_limit, g_shift_limit). Default: (-20, 20). |
b_shift_limit | Union[int, Tuple[int, int]] | range for changing values for the blue channel. If b_shift_limit is a single int, the range will be (-b_shift_limit, b_shift_limit). Default: (-20, 20). |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class RGBShift(ImageOnlyTransform):
"""Randomly shift values for each channel of the input RGB image.
Args:
r_shift_limit: range for changing values for the red channel. If r_shift_limit is a single
int, the range will be (-r_shift_limit, r_shift_limit). Default: (-20, 20).
g_shift_limit: range for changing values for the green channel. If g_shift_limit is a
single int, the range will be (-g_shift_limit, g_shift_limit). Default: (-20, 20).
b_shift_limit: range for changing values for the blue channel. If b_shift_limit is a single
int, the range will be (-b_shift_limit, b_shift_limit). Default: (-20, 20).
p: probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
r_shift_limit: ScaleIntType = 20,
g_shift_limit: ScaleIntType = 20,
b_shift_limit: ScaleIntType = 20,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.r_shift_limit = to_tuple(r_shift_limit)
self.g_shift_limit = to_tuple(g_shift_limit)
self.b_shift_limit = to_tuple(b_shift_limit)
def apply(self, img: np.ndarray, r_shift: int = 0, g_shift: int = 0, b_shift: int = 0, **params: Any) -> np.ndarray:
if not is_rgb_image(img):
msg = "RGBShift transformation expects 3-channel images."
raise TypeError(msg)
return F.shift_rgb(img, r_shift, g_shift, b_shift)
def get_params(self) -> Dict[str, Any]:
return {
"r_shift": random.uniform(self.r_shift_limit[0], self.r_shift_limit[1]),
"g_shift": random.uniform(self.g_shift_limit[0], self.g_shift_limit[1]),
"b_shift": random.uniform(self.b_shift_limit[0], self.b_shift_limit[1]),
}
def get_transform_init_args_names(self) -> Tuple[str, str, str]:
return ("r_shift_limit", "g_shift_limit", "b_shift_limit")
class RandomBrightnessContrast
(brightness_limit=0.2, contrast_limit=0.2, brightness_by_max=True, always_apply=False, p=0.5)
[view source on GitHub] ¶
Randomly change brightness and contrast of the input image.
Parameters:
Name | Type | Description |
---|---|---|
brightness_limit | Union[float, Tuple[float, float]] | factor range for changing brightness. If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2). |
contrast_limit | Union[float, Tuple[float, float]] | factor range for changing contrast. If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2). |
brightness_by_max | bool | If True adjust contrast by image dtype maximum, else adjust contrast by image mean. |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class RandomBrightnessContrast(ImageOnlyTransform):
"""Randomly change brightness and contrast of the input image.
Args:
brightness_limit: factor range for changing brightness.
If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
contrast_limit: factor range for changing contrast.
If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
brightness_by_max: If True adjust contrast by image dtype maximum,
else adjust contrast by image mean.
p: probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
brightness_limit: ScaleFloatType = 0.2,
contrast_limit: ScaleFloatType = 0.2,
brightness_by_max: bool = True,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.brightness_limit = to_tuple(brightness_limit)
self.contrast_limit = to_tuple(contrast_limit)
self.brightness_by_max = brightness_by_max
def apply(self, img: np.ndarray, alpha: float = 1.0, beta: float = 0.0, **params: Any) -> np.ndarray:
return F.brightness_contrast_adjust(img, alpha, beta, self.brightness_by_max)
def get_params(self) -> Dict[str, float]:
return {
"alpha": 1.0 + random.uniform(self.contrast_limit[0], self.contrast_limit[1]),
"beta": 0.0 + random.uniform(self.brightness_limit[0], self.brightness_limit[1]),
}
def get_transform_init_args_names(self) -> Tuple[str, str, str]:
return ("brightness_limit", "contrast_limit", "brightness_by_max")
class RandomFog
(fog_coef_lower=0.3, fog_coef_upper=1, alpha_coef=0.08, always_apply=False, p=0.5)
[view source on GitHub] ¶
Simulates fog for the image
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Parameters:
Name | Type | Description |
---|---|---|
fog_coef_lower | float | lower limit for fog intensity coefficient. Should be in [0, 1] range. |
fog_coef_upper | float | upper limit for fog intensity coefficient. Should be in [0, 1] range. |
alpha_coef | float | transparency of the fog circles. Should be in [0, 1] range. |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class RandomFog(ImageOnlyTransform):
"""Simulates fog for the image
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
fog_coef_lower: lower limit for fog intensity coefficient. Should be in [0, 1] range.
fog_coef_upper: upper limit for fog intensity coefficient. Should be in [0, 1] range.
alpha_coef: transparency of the fog circles. Should be in [0, 1] range.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
fog_coef_lower: float = 0.3,
fog_coef_upper: float = 1,
alpha_coef: float = 0.08,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
if not 0 <= fog_coef_lower <= fog_coef_upper <= 1:
raise ValueError(
f"Invalid combination if fog_coef_lower and fog_coef_upper. Got: {(fog_coef_lower, fog_coef_upper)}"
)
if not 0 <= alpha_coef <= 1:
raise ValueError(f"alpha_coef must be in range [0, 1]. Got: {alpha_coef}")
self.fog_coef_lower = fog_coef_lower
self.fog_coef_upper = fog_coef_upper
self.alpha_coef = alpha_coef
def apply(
self,
img: np.ndarray,
fog_coef: float = 0.1,
haze_list: Optional[List[Tuple[int, int]]] = None,
**params: Any,
) -> np.ndarray:
if haze_list is None:
haze_list = []
return F.add_fog(img, fog_coef, self.alpha_coef, haze_list)
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
img = params["image"]
fog_coef = random.uniform(self.fog_coef_lower, self.fog_coef_upper)
height, width = imshape = img.shape[:2]
hw = max(1, int(width // 3 * fog_coef))
haze_list = []
midx = width // 2 - 2 * hw
midy = height // 2 - hw
index = 1
while midx > -hw or midy > -hw:
for _ in range(hw // 10 * index):
x = random.randint(midx, width - midx - hw)
y = random.randint(midy, height - midy - hw)
haze_list.append((x, y))
midx -= 3 * hw * width // sum(imshape)
midy -= 3 * hw * height // sum(imshape)
index += 1
return {"haze_list": haze_list, "fog_coef": fog_coef}
def get_transform_init_args_names(self) -> Tuple[str, str, str]:
return ("fog_coef_lower", "fog_coef_upper", "alpha_coef")
class RandomGamma
(gamma_limit=(80, 120), always_apply=False, p=0.5)
[view source on GitHub] ¶
Applies random gamma correction to an image as a form of data augmentation.
This class adjusts the luminance of an image by applying gamma correction with a randomly selected gamma value from a specified range. Gamma correction can simulate various lighting conditions, potentially enhancing model generalization. For more details on gamma correction, see: https://en.wikipedia.org/wiki/Gamma_correction
Attributes:
Name | Type | Description |
---|---|---|
gamma_limit | Union[int, Tuple[int, int]] | The range for gamma adjustment. If gamma_limit is a single int, the range will be interpreted as (-gamma_limit, gamma_limit), defining how much to adjust the image's gamma. Default: (80, 120). |
always_apply | bool | If True, the transform will always be applied, regardless of p. Default: False. |
p | float | The probability that the transform will be applied. Default is 0.5. |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class RandomGamma(ImageOnlyTransform):
"""Applies random gamma correction to an image as a form of data augmentation.
This class adjusts the luminance of an image by applying gamma correction with a randomly
selected gamma value from a specified range. Gamma correction can simulate various lighting
conditions, potentially enhancing model generalization. For more details on gamma correction,
see: https://en.wikipedia.org/wiki/Gamma_correction
Attributes:
gamma_limit (Union[int, Tuple[int, int]]): The range for gamma adjustment. If `gamma_limit` is a single
int, the range will be interpreted as (-gamma_limit, gamma_limit), defining how much
to adjust the image's gamma. Default is (80, 120).
always_apply (bool): If `True`, the transform will always be applied, regardless of `p`.
Default is `False`.
p (float): The probability that the transform will be applied. Default is 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
gamma_limit: ScaleIntType = (80, 120),
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.gamma_limit = to_tuple(gamma_limit)
def apply(self, img: np.ndarray, gamma: float = 1, **params: Any) -> np.ndarray:
return F.gamma_transform(img, gamma=gamma)
def get_params(self) -> Dict[str, float]:
return {"gamma": random.uniform(self.gamma_limit[0], self.gamma_limit[1]) / 100.0}
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return ("gamma_limit",)
class RandomGravel
(gravel_roi=(0.1, 0.4, 0.9, 0.9), number_of_patches=2, always_apply=False, p=0.5)
[view source on GitHub] ¶
Add gravel to the image.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Parameters:
Name | Type | Description |
---|---|---|
gravel_roi | Tuple[float, float, float, float] | (top-left x, top-left y, bottom-right x, bottom right y). Should be in [0, 1] range |
number_of_patches | int | no. of gravel patches required |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class RandomGravel(ImageOnlyTransform):
"""Add gravels.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
gravel_roi: (top-left x, top-left y,
bottom-right x, bottom right y). Should be in [0, 1] range
number_of_patches: no. of gravel patches required
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
gravel_roi: Tuple[float, float, float, float] = (0.1, 0.4, 0.9, 0.9),
number_of_patches: int = 2,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
(gravel_lower_x, gravel_lower_y, gravel_upper_x, gravel_upper_y) = gravel_roi
if not 0 <= gravel_lower_x < gravel_upper_x <= 1 or not 0 <= gravel_lower_y < gravel_upper_y <= 1:
raise ValueError(f"Invalid gravel_roi. Got: {gravel_roi}.")
if number_of_patches < 1:
raise ValueError(f"Invalid gravel number_of_patches. Got: {number_of_patches}.")
self.gravel_roi = gravel_roi
self.number_of_patches = number_of_patches
def generate_gravel_patch(self, rectangular_roi: Tuple[int, int, int, int]) -> np.ndarray:
x1, y1, x2, y2 = rectangular_roi
area = abs((x2 - x1) * (y2 - y1))
count = area // 10
gravels = np.empty([count, 2], dtype=np.int64)
gravels[:, 0] = random_utils.randint(x1, x2, count)
gravels[:, 1] = random_utils.randint(y1, y2, count)
return gravels
def apply(self, img: np.ndarray, gravels_infos: Optional[List[Any]] = None, **params: Any) -> np.ndarray:
if gravels_infos is None:
gravels_infos = []
return F.add_gravel(img, gravels_infos)
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, np.ndarray]:
img = params["image"]
height, width = img.shape[:2]
x_min, y_min, x_max, y_max = self.gravel_roi
x_min = int(x_min * width)
x_max = int(x_max * width)
y_min = int(y_min * height)
y_max = int(y_max * height)
max_height = 200
max_width = 30
rectangular_rois = np.zeros([self.number_of_patches, 4], dtype=np.int64)
xx1 = random_utils.randint(x_min + 1, x_max, self.number_of_patches) # xmax
xx2 = random_utils.randint(x_min, xx1) # xmin
yy1 = random_utils.randint(y_min + 1, y_max, self.number_of_patches) # ymax
yy2 = random_utils.randint(y_min, yy1) # ymin
rectangular_rois[:, 0] = xx2
rectangular_rois[:, 1] = yy2
rectangular_rois[:, 2] = [min(tup) for tup in zip(xx1, xx2 + max_height)]
rectangular_rois[:, 3] = [min(tup) for tup in zip(yy1, yy2 + max_width)]
minx = []
maxx = []
miny = []
maxy = []
val = []
for roi in rectangular_rois:
gravels = self.generate_gravel_patch(roi)
x = gravels[:, 0]
y = gravels[:, 1]
r = random_utils.randint(1, 4, len(gravels))
sat = random_utils.randint(0, 255, len(gravels))
miny.append(np.maximum(y - r, 0))
maxy.append(np.minimum(y + r, y))
minx.append(np.maximum(x - r, 0))
maxx.append(np.minimum(x + r, x))
val.append(sat)
return {
"gravels_infos": np.stack(
[
np.concatenate(miny),
np.concatenate(maxy),
np.concatenate(minx),
np.concatenate(maxx),
np.concatenate(val),
],
1,
)
}
def get_transform_init_args_names(self) -> Tuple[str, str]:
return "gravel_roi", "number_of_patches"
class RandomGridShuffle
(grid=(3, 3), always_apply=False, p=0.5)
[view source on GitHub] ¶
Randomly shuffle the grid's cells on the image.
Parameters:
Name | Type | Description |
---|---|---|
grid | (int, int) | size of grid for splitting image. |
Targets
image, mask, keypoints
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class RandomGridShuffle(DualTransform):
"""Random shuffle grid's cells on image.
Args:
grid ((int, int)): size of grid for splitting image.
Targets:
image, mask, keypoints
Image types:
uint8, float32
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.KEYPOINTS)
def __init__(self, grid: Tuple[int, int] = (3, 3), always_apply: bool = False, p: float = 0.5):
super().__init__(always_apply, p)
n, m = grid
if not all(isinstance(dim, int) and dim > 0 for dim in [n, m]):
raise ValueError(f"Grid dimensions must be positive integers. Current grid dimensions: [{n}, {m}]")
self.grid = grid
def apply(self, img: np.ndarray, tiles: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
return F.swap_tiles_on_image(img, tiles)
def apply_to_mask(self, mask: np.ndarray, tiles: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
return F.swap_tiles_on_image(mask, tiles)
def apply_to_keypoint(
self,
keypoint: KeypointInternalType,
tiles: np.ndarray,
mapping: Dict[int, int],
**params: Any,
) -> KeypointInternalType:
x, y = keypoint[:2]
# Find which original tile the keypoint belongs to
for original_index, (start_y, start_x, end_y, end_x) in enumerate(tiles):
if start_y <= y < end_y and start_x <= x < end_x:
# Find this tile's new index after shuffling
new_index = mapping[original_index]
# Get the new tile's coordinates
new_start_y, new_start_x = tiles[new_index][:2]
# Map the keypoint to the new tile's position
new_x = (x - start_x) + new_start_x
new_y = (y - start_y) + new_start_y
return (new_x, new_y, *keypoint[2:])
# If the keypoint wasn't in any tile (shouldn't happen), log a warning for debugging purposes
warn(
"Keypoint not in any tile, returning it unchanged. This is unexpected and should be investigated.",
RuntimeWarning,
)
return keypoint
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
# Generate the original grid
original_tiles = split_uniform_grid(params["image"].shape[:2], self.grid)
# Copy the original grid to keep track of the initial positions
indexed_tiles = np.array(list(enumerate(original_tiles)), dtype=object)
# Shuffle the tiles while keeping track of original indices
random_utils.shuffle(indexed_tiles)
# Create a mapping from original positions to new positions
mapping = {original_index: i for i, (original_index, tile) in enumerate(indexed_tiles)}
# Extract the shuffled tiles without indices
shuffled_tiles = np.array([tile for _, tile in indexed_tiles])
return {"tiles": shuffled_tiles, "mapping": mapping}
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return ("grid",)
class RandomRain
(slant_lower=-10, slant_upper=10, drop_length=20, drop_width=1, drop_color=(200, 200, 200), blur_value=7, brightness_coefficient=0.7, rain_type=None, always_apply=False, p=0.5)
[view source on GitHub] ¶
Adds rain effects.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Parameters:
Name | Type | Description |
---|---|---|
slant_lower | int | should be in range [-20, 20]. |
slant_upper | int | should be in range [-20, 20]. |
drop_length | int | should be in range [0, 100]. |
drop_width | int | should be in range [1, 5]. |
drop_color | list of (r, g, b) | rain lines color. |
blur_value | int | rainy views are blurry |
brightness_coefficient | float | rainy days are usually shady. Should be in range [0, 1]. |
rain_type | Optional[str] | One of [None, "drizzle", "heavy", "torrential"] |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class RandomRain(ImageOnlyTransform):
"""Adds rain effects.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
slant_lower: should be in range [-20, 20].
slant_upper: should be in range [-20, 20].
drop_length: should be in range [0, 100].
drop_width: should be in range [1, 5].
drop_color (list of (r, g, b)): rain lines color.
blur_value (int): rainy views are blurry
brightness_coefficient (float): rainy days are usually shady. Should be in range [0, 1].
rain_type: One of [None, "drizzle", "heavy", "torrential"]
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
slant_lower: int = -10,
slant_upper: int = 10,
drop_length: int = 20,
drop_width: int = 1,
drop_color: Tuple[int, int, int] = (200, 200, 200),
blur_value: int = 7,
brightness_coefficient: float = 0.7,
rain_type: Optional[str] = None,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
if rain_type not in ["drizzle", "heavy", "torrential", None]:
msg = "raint_type must be one of ({}). Got: {}".format(["drizzle", "heavy", "torrential", None], rain_type)
raise ValueError(msg)
if not -TWENTY <= slant_lower <= slant_upper <= TWENTY:
raise ValueError(f"Invalid combination of slant_lower and slant_upper. Got: {(slant_lower, slant_upper)}")
if not 1 <= drop_width <= FIVE:
raise ValueError(f"drop_width must be in range [1, 5]. Got: {drop_width}")
if not 0 <= drop_length <= MAX_JPEG_QUALITY:
raise ValueError(f"drop_length must be in range [0, 100]. Got: {drop_length}")
if not 0 <= brightness_coefficient <= 1:
raise ValueError(f"brightness_coefficient must be in range [0, 1]. Got: {brightness_coefficient}")
self.slant_lower = slant_lower
self.slant_upper = slant_upper
self.drop_length = drop_length
self.drop_width = drop_width
self.drop_color = drop_color
self.blur_value = blur_value
self.brightness_coefficient = brightness_coefficient
self.rain_type = rain_type
def apply(
self,
img: np.ndarray,
slant: int = 10,
drop_length: int = 20,
rain_drops: Optional[List[Tuple[int, int]]] = None,
**params: Any,
) -> np.ndarray:
if rain_drops is None:
rain_drops = []
return F.add_rain(
img,
slant,
drop_length,
self.drop_width,
self.drop_color,
self.blur_value,
self.brightness_coefficient,
rain_drops,
)
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
img = params["image"]
slant = int(random.uniform(self.slant_lower, self.slant_upper))
height, width = img.shape[:2]
area = height * width
if self.rain_type == "drizzle":
num_drops = area // 770
drop_length = 10
elif self.rain_type == "heavy":
num_drops = width * height // 600
drop_length = 30
elif self.rain_type == "torrential":
num_drops = area // 500
drop_length = 60
else:
drop_length = self.drop_length
num_drops = area // 600
rain_drops = []
for _ in range(num_drops): # If You want heavy rain, try increasing this
x = random.randint(slant, width) if slant < 0 else random.randint(0, width - slant)
y = random.randint(0, height - drop_length)
rain_drops.append((x, y))
return {"drop_length": drop_length, "slant": slant, "rain_drops": rain_drops}
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return (
"slant_lower",
"slant_upper",
"drop_length",
"drop_width",
"drop_color",
"blur_value",
"brightness_coefficient",
"rain_type",
)
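As get_params_dependent_on_targets above shows, a non-None rain_type overrides drop_length and sets the drop density ("drizzle": area // 770 drops of length 10; "heavy": area // 600 of length 30; "torrential": area // 500 of length 60). A minimal sketch:

import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
# The preset overrides drop_length; leave rain_type=None to use your own value.
aug = A.RandomRain(rain_type="torrential", brightness_coefficient=0.8, blur_value=5, p=1.0)
rainy = aug(image=image)["image"]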
class RandomShadow
(shadow_roi=(0, 0.5, 1, 1), num_shadows_lower=1, num_shadows_upper=2, shadow_dimension=5, always_apply=False, p=0.5)
[view source on GitHub] ¶
Simulates shadows for the image
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Parameters:
Name | Type | Description |
---|---|---|
shadow_roi | Tuple[float, float, float, float] | region of the image where shadows will appear. All values should be in range [0, 1]. |
num_shadows_lower | int | Lower limit for the possible number of shadows. Should be in range [0, num_shadows_upper]. |
num_shadows_upper | int | Upper limit for the possible number of shadows. Should be in range [num_shadows_lower, inf]. |
shadow_dimension | int | number of edges in the shadow polygons |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class RandomShadow(ImageOnlyTransform):
"""Simulates shadows for the image
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
shadow_roi: region of the image where shadows
will appear. All values should be in range [0, 1].
num_shadows_lower: Lower limit for the possible number of shadows.
Should be in range [0, `num_shadows_upper`].
num_shadows_upper: Upper limit for the possible number of shadows.
Should be in range [`num_shadows_lower`, inf].
shadow_dimension: number of edges in the shadow polygons
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
shadow_roi: Tuple[float, float, float, float] = (0, 0.5, 1, 1),
num_shadows_lower: int = 1,
num_shadows_upper: int = 2,
shadow_dimension: int = 5,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
(shadow_lower_x, shadow_lower_y, shadow_upper_x, shadow_upper_y) = shadow_roi
if not 0 <= shadow_lower_x <= shadow_upper_x <= 1 or not 0 <= shadow_lower_y <= shadow_upper_y <= 1:
raise ValueError(f"Invalid shadow_roi. Got: {shadow_roi}")
if not 0 <= num_shadows_lower <= num_shadows_upper:
msg = "Invalid combination of num_shadows_lower nad num_shadows_upper. "
f"Got: {(num_shadows_lower, num_shadows_upper)}"
raise ValueError(msg)
self.shadow_roi = shadow_roi
self.num_shadows_lower = num_shadows_lower
self.num_shadows_upper = num_shadows_upper
self.shadow_dimension = shadow_dimension
def apply(
self, img: np.ndarray, vertices_list: Optional[List[List[Tuple[int, int]]]] = None, **params: Any
) -> np.ndarray:
if vertices_list is None:
vertices_list = []
return F.add_shadow(img, vertices_list)
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, List[np.ndarray]]:
img = params["image"]
height, width = img.shape[:2]
num_shadows = random.randint(self.num_shadows_lower, self.num_shadows_upper)
x_min, y_min, x_max, y_max = self.shadow_roi
x_min = int(x_min * width)
x_max = int(x_max * width)
y_min = int(y_min * height)
y_max = int(y_max * height)
vertices_list = []
for _ in range(num_shadows):
vertex = [
(random.randint(x_min, x_max), random.randint(y_min, y_max)) for _ in range(self.shadow_dimension)
]
vertices = np.array([vertex], dtype=np.int32)
vertices_list.append(vertices)
return {"vertices_list": vertices_list}
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
return (
"shadow_roi",
"num_shadows_lower",
"num_shadows_upper",
"shadow_dimension",
)
class RandomSnow
(snow_point_lower=0.1, snow_point_upper=0.3, brightness_coeff=2.5, always_apply=False, p=0.5)
[view source on GitHub] ¶
Bleach out some pixel values simulating snow.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Parameters:
Name | Type | Description |
---|---|---|
snow_point_lower | float | lower bound of the amount of snow. Should be in [0, 1] range |
snow_point_upper | float | upper bound of the amount of snow. Should be in [0, 1] range |
brightness_coeff | float | a larger number will lead to more snow on the image. Should be >= 0 |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class RandomSnow(ImageOnlyTransform):
"""Bleach out some pixel values simulating snow.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
snow_point_lower: lower bound of the amount of snow. Should be in [0, 1] range
snow_point_upper: upper bound of the amount of snow. Should be in [0, 1] range
brightness_coeff: a larger number will lead to more snow on the image. Should be >= 0
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
snow_point_lower: float = 0.1,
snow_point_upper: float = 0.3,
brightness_coeff: float = 2.5,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
if not 0 <= snow_point_lower <= snow_point_upper <= 1:
msg = (
"Invalid combination of snow_point_lower and snow_point_upper. "
f"Got: {(snow_point_lower, snow_point_upper)}"
)
raise ValueError(msg)
if brightness_coeff < 0:
raise ValueError(f"brightness_coeff must be greater than 0. Got: {brightness_coeff}")
self.snow_point_lower = snow_point_lower
self.snow_point_upper = snow_point_upper
self.brightness_coeff = brightness_coeff
def apply(self, img: np.ndarray, snow_point: float = 0.1, **params: Any) -> np.ndarray:
return F.add_snow(img, snow_point, self.brightness_coeff)
def get_params(self) -> Dict[str, float]:
return {"snow_point": random.uniform(self.snow_point_lower, self.snow_point_upper)}
def get_transform_init_args_names(self) -> Tuple[str, str, str]:
return ("snow_point_lower", "snow_point_upper", "brightness_coeff")
class RandomSunFlare
(flare_roi=(0, 0, 1, 0.5), angle_lower=0, angle_upper=1, num_flare_circles_lower=6, num_flare_circles_upper=10, src_radius=400, src_color=(255, 255, 255), always_apply=False, p=0.5)
[view source on GitHub] ¶
Simulates Sun Flare for the image
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Parameters:
Name | Type | Description |
---|---|---|
flare_roi | Tuple[float, float, float, float] | region of the image where flare will appear (x_min, y_min, x_max, y_max). All values should be in range [0, 1]. |
angle_lower | float | should be in range [0, angle_upper]. |
angle_upper | float | should be in range [angle_lower, 1]. |
num_flare_circles_lower | int | lower limit for the number of flare circles. Should be in range [0, num_flare_circles_upper]. |
num_flare_circles_upper | int | upper limit for the number of flare circles. Should be in range [num_flare_circles_lower, inf]. |
src_radius | int | radius of the flare's source circle, in pixels. Default: 400. |
src_color | Tuple[int, int, int] | color of the flare |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class RandomSunFlare(ImageOnlyTransform):
"""Simulates Sun Flare for the image
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
flare_roi: region of the image where flare will appear (x_min, y_min, x_max, y_max).
All values should be in range [0, 1].
angle_lower: should be in range [0, `angle_upper`].
angle_upper: should be in range [`angle_lower`, 1].
num_flare_circles_lower: lower limit for the number of flare circles.
Should be in range [0, `num_flare_circles_upper`].
num_flare_circles_upper: upper limit for the number of flare circles.
Should be in range [`num_flare_circles_lower`, inf].
src_radius: radius of the flare's source circle, in pixels. Default: 400.
src_color: color of the flare
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
flare_roi: Tuple[float, float, float, float] = (0, 0, 1, 0.5),
angle_lower: float = 0,
angle_upper: float = 1,
num_flare_circles_lower: int = 6,
num_flare_circles_upper: int = 10,
src_radius: int = 400,
src_color: Tuple[int, int, int] = (255, 255, 255),
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
(
flare_center_lower_x,
flare_center_lower_y,
flare_center_upper_x,
flare_center_upper_y,
) = flare_roi
if (
not 0 <= flare_center_lower_x < flare_center_upper_x <= 1
or not 0 <= flare_center_lower_y < flare_center_upper_y <= 1
):
raise ValueError(f"Invalid flare_roi. Got: {flare_roi}")
if not 0 <= angle_lower < angle_upper <= 1:
raise ValueError(f"Invalid combination of angle_lower nad angle_upper. Got: {(angle_lower, angle_upper)}")
if not 0 <= num_flare_circles_lower < num_flare_circles_upper:
msg = (
"Invalid combination of num_flare_circles_lower and num_flare_circles_upper. "
f"Got: {(num_flare_circles_lower, num_flare_circles_upper)}"
)
raise ValueError(msg)
self.flare_center_lower_x = flare_center_lower_x
self.flare_center_upper_x = flare_center_upper_x
self.flare_center_lower_y = flare_center_lower_y
self.flare_center_upper_y = flare_center_upper_y
self.angle_lower = angle_lower
self.angle_upper = angle_upper
self.num_flare_circles_lower = num_flare_circles_lower
self.num_flare_circles_upper = num_flare_circles_upper
self.src_radius = src_radius
self.src_color = src_color
def apply(
self,
img: np.ndarray,
flare_center_x: float = 0.5,
flare_center_y: float = 0.5,
circles: Optional[List[Any]] = None,
**params: Any,
) -> np.ndarray:
if circles is None:
circles = []
return F.add_sun_flare(
img,
flare_center_x,
flare_center_y,
self.src_radius,
self.src_color,
circles,
)
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
img = params["image"]
height, width = img.shape[:2]
angle = 2 * math.pi * random.uniform(self.angle_lower, self.angle_upper)
flare_center_x = random.uniform(self.flare_center_lower_x, self.flare_center_upper_x)
flare_center_y = random.uniform(self.flare_center_lower_y, self.flare_center_upper_y)
flare_center_x = int(width * flare_center_x)
flare_center_y = int(height * flare_center_y)
num_circles = random.randint(self.num_flare_circles_lower, self.num_flare_circles_upper)
circles = []
x = []
y = []
def line(t: float) -> Tuple[float, float]:
return (flare_center_x + t * math.cos(angle), flare_center_y + t * math.sin(angle))
for t_val in range(-flare_center_x, width - flare_center_x, 10):
rand_x, rand_y = line(t_val)
x.append(rand_x)
y.append(rand_y)
for _ in range(num_circles):
alpha = random.uniform(0.05, 0.2)
r = random.randint(0, len(x) - 1)
rad = random.randint(1, max(height // 100 - 2, 2))
r_color = random.randint(max(self.src_color[0] - 50, 0), self.src_color[0])
g_color = random.randint(max(self.src_color[1] - 50, 0), self.src_color[1])
b_color = random.randint(max(self.src_color[2] - 50, 0), self.src_color[2])
circles += [
(
alpha,
(int(x[r]), int(y[r])),
pow(rad, 3),
(r_color, g_color, b_color),
)
]
return {
"circles": circles,
"flare_center_x": flare_center_x,
"flare_center_y": flare_center_y,
}
def get_transform_init_args(self) -> Dict[str, Any]:
return {
"flare_roi": (
self.flare_center_lower_x,
self.flare_center_lower_y,
self.flare_center_upper_x,
self.flare_center_upper_y,
),
"angle_lower": self.angle_lower,
"angle_upper": self.angle_upper,
"num_flare_circles_lower": self.num_flare_circles_lower,
"num_flare_circles_upper": self.num_flare_circles_upper,
"src_radius": self.src_radius,
"src_color": self.src_color,
}
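A minimal sketch confining the flare center to the top third of the frame via flare_roi:

import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
aug = A.RandomSunFlare(flare_roi=(0.0, 0.0, 1.0, 0.3), src_radius=200, src_color=(255, 255, 255), p=1.0)
flared = aug(image=image)["image"]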
class RandomToneCurve
(scale=0.1, always_apply=False, p=0.5)
[view source on GitHub] ¶
Randomly change the relationship between bright and dark areas of the image by manipulating its tone curve.
Parameters:
Name | Type | Description |
---|---|---|
scale | float | standard deviation of the normal distribution. Used to sample random distances to move two control points that modify the image's curve. Values should be in range [0, 1]. Default: 0.1 |
Targets
image
Image types: uint8
Source code in albumentations/augmentations/transforms.py
class RandomToneCurve(ImageOnlyTransform):
"""Randomly change the relationship between bright and dark areas of the image by manipulating its tone curve.
Args:
scale: standard deviation of the normal distribution.
Used to sample random distances to move two control points that modify the image's curve.
Values should be in range [0, 1]. Default: 0.1
Targets:
image
Image types:
uint8
"""
def __init__(
self,
scale: float = 0.1,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.scale = scale
def apply(self, img: np.ndarray, low_y: float, high_y: float, **params: Any) -> np.ndarray:
return F.move_tone_curve(img, low_y, high_y)
def get_params(self) -> Dict[str, float]:
return {
"low_y": np.clip(random_utils.normal(loc=0.25, scale=self.scale), 0, 1),
"high_y": np.clip(random_utils.normal(loc=0.75, scale=self.scale), 0, 1),
}
def get_transform_init_args_names(self) -> Tuple[str]:
return ("scale",)
class RingingOvershoot
(blur_limit=(7, 15), cutoff=(0.7853981633974483, 1.5707963267948966), always_apply=False, p=0.5)
[view source on GitHub] ¶
Create ringing or overshoot artefacts by convolving the image with a 2D sinc filter.
Parameters:
Name | Type | Description |
---|---|---|
blur_limit | Union[int, Tuple[int, int]] | maximum kernel size for sinc filter. Should be in range [3, inf). Default: (7, 15). |
cutoff | Union[float, Tuple[float, float]] | range to choose the cutoff frequency in radians. Should be in range (0, np.pi). Default: (np.pi / 4, np.pi / 2). |
p | float | probability of applying the transform. Default: 0.5. |
Reference
dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter https://arxiv.org/abs/2107.10833
Targets
image
Source code in albumentations/augmentations/transforms.py
class RingingOvershoot(ImageOnlyTransform):
"""Create ringing or overshoot artefacts by conlvolving image with 2D sinc filter.
Args:
blur_limit: maximum kernel size for sinc filter.
Should be in range [3, inf). Default: (7, 15).
cutoff: range to choose the cutoff frequency in radians.
Should be in range (0, np.pi)
Default: (np.pi / 4, np.pi / 2).
p: probability of applying the transform. Default: 0.5.
Reference:
dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter
https://arxiv.org/abs/2107.10833
Targets:
image
"""
def __init__(
self,
blur_limit: ScaleIntType = (7, 15),
cutoff: ScaleFloatType = (np.pi / 4, np.pi / 2),
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.blur_limit = cast(Tuple[int, int], to_tuple(blur_limit, 3))
self.cutoff = self.__check_values(to_tuple(cutoff, np.pi / 2), name="cutoff", bounds=(0, np.pi))
@staticmethod
def __check_values(
value: Tuple[float, float], name: str, bounds: Tuple[float, float] = (0, float("inf"))
) -> Tuple[float, float]:
if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
raise ValueError(f"{name} values should be between {bounds}")
return value
def get_params(self) -> Dict[str, np.ndarray]:
ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
if ksize % 2 == 0:
raise ValueError(f"Kernel size must be odd. Got: {ksize}")
cutoff = random.uniform(*self.cutoff)
# From dsp.stackexchange.com/questions/58301/2-d-circularly-symmetric-low-pass-filter
with np.errstate(divide="ignore", invalid="ignore"):
kernel = np.fromfunction(
lambda x, y: cutoff
* special.j1(cutoff * np.sqrt((x - (ksize - 1) / 2) ** 2 + (y - (ksize - 1) / 2) ** 2))
/ (2 * np.pi * np.sqrt((x - (ksize - 1) / 2) ** 2 + (y - (ksize - 1) / 2) ** 2)),
[ksize, ksize],
)
kernel[(ksize - 1) // 2, (ksize - 1) // 2] = cutoff**2 / (4 * np.pi)
# Normalize kernel
kernel = kernel.astype(np.float32) / np.sum(kernel)
return {"kernel": kernel}
def apply(self, img: np.ndarray, kernel: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
return F.convolve(img, kernel)
def get_transform_init_args_names(self) -> Tuple[str, str]:
return ("blur_limit", "cutoff")
class Sharpen
(alpha=(0.2, 0.5), lightness=(0.5, 1.0), always_apply=False, p=0.5)
[view source on GitHub] ¶
Sharpen the input image and overlay the result with the original image.
Parameters:
Name | Type | Description |
---|---|---|
alpha | Tuple[float, float] | range to choose the visibility of the sharpened image. At 0, only the original image is visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5). |
lightness | Tuple[float, float] | range to choose the lightness of the sharpened image. Default: (0.5, 1.0). |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Source code in albumentations/augmentations/transforms.py
class Sharpen(ImageOnlyTransform):
"""Sharpen the input image and overlays the result with the original image.
Args:
alpha: range to choose the visibility of the sharpened image. At 0, only the original image is
visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5).
lightness: range to choose the lightness of the sharpened image. Default: (0.5, 1.0).
p: probability of applying the transform. Default: 0.5.
Targets:
image
"""
def __init__(
self,
alpha: Tuple[float, float] = (0.2, 0.5),
lightness: Tuple[float, float] = (0.5, 1.0),
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.alpha = self.__check_values(to_tuple(alpha, 0.0), name="alpha", bounds=(0.0, 1.0))
self.lightness = self.__check_values(to_tuple(lightness, 0.0), name="lightness")
@staticmethod
def __check_values(
value: Tuple[float, float], name: str, bounds: Tuple[float, float] = (0, float("inf"))
) -> Tuple[float, float]:
if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
raise ValueError(f"{name} values should be between {bounds}")
return value
@staticmethod
def __generate_sharpening_matrix(alpha_sample: np.ndarray, lightness_sample: np.ndarray) -> np.ndarray:
matrix_nochange = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
matrix_effect = np.array(
[[-1, -1, -1], [-1, 8 + lightness_sample, -1], [-1, -1, -1]],
dtype=np.float32,
)
return (1 - alpha_sample) * matrix_nochange + alpha_sample * matrix_effect
def get_params(self) -> Dict[str, np.ndarray]:
alpha = random.uniform(*self.alpha)
lightness = random.uniform(*self.lightness)
sharpening_matrix = self.__generate_sharpening_matrix(alpha_sample=alpha, lightness_sample=lightness)
return {"sharpening_matrix": sharpening_matrix}
def apply(self, img: np.ndarray, sharpening_matrix: Optional[np.ndarray] = None, **params: Any) -> np.ndarray:
return F.convolve(img, sharpening_matrix)
def get_transform_init_args_names(self) -> Tuple[str, str]:
return ("alpha", "lightness")
class Solarize
(threshold=128, always_apply=False, p=0.5)
[view source on GitHub] ¶
Invert all pixel values above a threshold.
Parameters:
Name | Type | Description |
---|---|---|
threshold | Union[float, Tuple[float, float], int, Tuple[int, int]] | range for solarizing threshold. If threshold is a single value, the range will be [threshold, threshold]. Default: 128. |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: any
Source code in albumentations/augmentations/transforms.py
class Solarize(ImageOnlyTransform):
"""Invert all pixel values above a threshold.
Args:
threshold: range for solarizing threshold.
If threshold is a single value, the range will be [threshold, threshold]. Default: 128.
p: probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
any
"""
def __init__(self, threshold: ScaleType = 128, always_apply: bool = False, p: float = 0.5):
super().__init__(always_apply, p)
if isinstance(threshold, (int, float)):
self.threshold = to_tuple(threshold, low=threshold)
else:
self.threshold = to_tuple(threshold, low=0)
def apply(self, img: np.ndarray, threshold: int = 0, **params: Any) -> np.ndarray:
return F.solarize(img, threshold)
def get_params(self) -> Dict[str, float]:
return {"threshold": random.uniform(self.threshold[0], self.threshold[1])}
def get_transform_init_args_names(self) -> Tuple[str]:
return ("threshold",)
class Spatter
(mean=0.65, std=0.3, gauss_sigma=2, cutout_threshold=0.68, intensity=0.6, mode='rain', color=None, always_apply=False, p=0.5)
[view source on GitHub] ¶
Apply spatter transform. It simulates corruption which can occlude a lens in the form of rain or mud.
Parameters:
Name | Type | Description |
---|---|---|
mean | float, or tuple of floats | Mean value of normal distribution for generating liquid layer. If single float it will be used as mean. If tuple of float mean will be sampled from range [mean[0], mean[1]). Default: 0.65. |
std | float, or tuple of floats | Standard deviation value of normal distribution for generating liquid layer. If single float it will be used as std. If tuple of float std will be sampled from range [std[0], std[1]). Default: 0.3. |
gauss_sigma | float, or tuple of floats | Sigma value for gaussian filtering of liquid layer. If single float it will be used as gauss_sigma. If tuple of float gauss_sigma will be sampled from range [sigma[0], sigma[1]). Default: 2. |
cutout_threshold | float, or tuple of floats | Threshold for filtering liquid layer (determines number of drops). If single float it will be used as cutout_threshold. If tuple of float cutout_threshold will be sampled from range [cutout_threshold[0], cutout_threshold[1]). Default: 0.68. |
intensity | float, or tuple of floats | Intensity of corruption. If single float it will be used as intensity. If tuple of float intensity will be sampled from range [intensity[0], intensity[1]). Default: 0.6. |
mode | string, or list of strings | Type of corruption. Currently, supported options are 'rain' and 'mud'. If a list is provided, the type of corruption will be sampled from the list. Default: ("rain"). |
color | list of (r, g, b) or dict or None | Corruption elements color. If list uses provided list as color for specified mode. If dict uses provided color for specified mode. Color for each specified mode should be provided in dict. If None uses default colors (rain: (238, 238, 175), mud: (20, 42, 63)). |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
Reference: https://arxiv.org/pdf/1903.12261.pdf, https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py
Source code in albumentations/augmentations/transforms.py
class Spatter(ImageOnlyTransform):
"""Apply spatter transform. It simulates corruption which can occlude a lens in the form of rain or mud.
Args:
mean (float, or tuple of floats): Mean value of normal distribution for generating liquid layer.
If single float it will be used as mean.
If tuple of float mean will be sampled from range `[mean[0], mean[1])`. Default: (0.65).
std (float, or tuple of floats): Standard deviation value of normal distribution for generating liquid layer.
If single float it will be used as std.
If tuple of float std will be sampled from range `[std[0], std[1])`. Default: (0.3).
gauss_sigma (float, or tuple of floats): Sigma value for gaussian filtering of liquid layer.
If single float it will be used as gauss_sigma.
If tuple of float gauss_sigma will be sampled from range `[sigma[0], sigma[1])`. Default: (2).
cutout_threshold (float, or tuple of floats): Threshold for filtering liquid layer
(determines number of drops). If single float it will be used as cutout_threshold.
If tuple of float cutout_threshold will be sampled from range `[cutout_threshold[0], cutout_threshold[1])`.
Default: (0.68).
intensity (float, or tuple of floats): Intensity of corruption.
If single float it will be used as intensity.
If tuple of float intensity will be sampled from range `[intensity[0], intensity[1])`. Default: (0.6).
mode (string, or list of strings): Type of corruption. Currently, supported options are 'rain' and 'mud'.
If a list is provided, the type of corruption will be sampled from the list. Default: ("rain").
color (list of (r, g, b) or dict or None): Corruption elements color.
If list uses provided list as color for specified mode.
If dict uses provided color for specified mode. Color for each specified mode should be provided in dict.
If None uses default colors (rain: (238, 238, 175), mud: (20, 42, 63)).
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
Reference:
| https://arxiv.org/pdf/1903.12261.pdf
| https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py
"""
def __init__(
self,
mean: ScaleFloatType = 0.65,
std: ScaleFloatType = 0.3,
gauss_sigma: ScaleFloatType = 2,
cutout_threshold: ScaleFloatType = 0.68,
intensity: ScaleFloatType = 0.6,
mode: Union[SpatterMode, Sequence[SpatterMode]] = "rain",
color: Optional[Union[Sequence[int], Dict[str, Sequence[int]]]] = None,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply=always_apply, p=p)
self.mean = to_tuple(mean, mean)
self.std = to_tuple(std, std)
self.gauss_sigma = to_tuple(gauss_sigma, gauss_sigma)
self.intensity = to_tuple(intensity, intensity)
self.cutout_threshold = to_tuple(cutout_threshold, cutout_threshold)
self.color = (
color
if color is not None
else {
"rain": [238, 238, 175],
"mud": [20, 42, 63],
}
)
self.mode = mode if isinstance(mode, (list, tuple)) else [mode]
if len(set(self.mode)) > 1 and not isinstance(self.color, dict):
raise ValueError(f"Unsupported color: {self.color}. Please specify color for each mode (use dict for it).")
for i in self.mode:
if i not in ["rain", "mud"]:
raise ValueError(f"Unsupported color mode: {mode}. Transform supports only `rain` and `mud` mods.")
if isinstance(self.color, dict):
if i not in self.color:
raise ValueError(f"Wrong color definition: {self.color}. Color for mode: {i} not specified.")
if len(self.color[i]) != THREE:
raise ValueError(
f"Unsupported color: {self.color[i]} for mode {i}. Color should be presented in RGB format."
)
if isinstance(self.color, (list, tuple)):
if len(self.color) != THREE:
raise ValueError(f"Unsupported color: {self.color}. Color should be presented in RGB format.")
self.color = {self.mode[0]: self.color}
def apply(
self,
img: np.ndarray,
non_mud: Optional[np.ndarray] = None,
mud: Optional[np.ndarray] = None,
drops: Optional[np.ndarray] = None,
mode: SpatterMode = "mud",
**params: Dict[str, Any],
) -> np.ndarray:
return F.spatter(img, non_mud, mud, drops, mode)
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
height, width = params["image"].shape[:2]
mean = random.uniform(self.mean[0], self.mean[1])
std = random.uniform(self.std[0], self.std[1])
cutout_threshold = random.uniform(self.cutout_threshold[0], self.cutout_threshold[1])
sigma = random.uniform(self.gauss_sigma[0], self.gauss_sigma[1])
mode = random.choice(self.mode)
intensity = random.uniform(self.intensity[0], self.intensity[1])
color = np.array(self.color[mode]) / 255.0
liquid_layer = random_utils.normal(size=(height, width), loc=mean, scale=std)
liquid_layer = gaussian_filter(liquid_layer, sigma=sigma, mode="nearest")
liquid_layer[liquid_layer < cutout_threshold] = 0
if mode == "rain":
liquid_layer = (liquid_layer * 255).astype(np.uint8)
dist = 255 - cv2.Canny(liquid_layer, 50, 150)
dist = cv2.distanceTransform(dist, cv2.DIST_L2, 5)
_, dist = cv2.threshold(dist, 20, 20, cv2.THRESH_TRUNC)
dist = blur(dist, 3).astype(np.uint8)
dist = F.equalize(dist)
ker = np.array([[-2, -1, 0], [-1, 1, 1], [0, 1, 2]])
dist = F.convolve(dist, ker)
dist = blur(dist, 3).astype(np.float32)
m = liquid_layer * dist
m *= 1 / np.max(m, axis=(0, 1))
drops = m[:, :, None] * color * intensity
mud = None
non_mud = None
else:
m = np.where(liquid_layer > cutout_threshold, 1, 0)
m = gaussian_filter(m.astype(np.float32), sigma=sigma, mode="nearest")
m[m < 1.2 * cutout_threshold] = 0
m = m[..., np.newaxis]
mud = m * color
non_mud = 1 - m
drops = None
return {
"non_mud": non_mud,
"mud": mud,
"drops": drops,
"mode": mode,
}
def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str, str, str]:
return "mean", "std", "gauss_sigma", "intensity", "cutout_threshold", "mode", "color"
class Superpixels
(p_replace=0.1, n_segments=100, max_size=128, interpolation=1, always_apply=False, p=0.5)
[view source on GitHub] ¶
Transform images partially/completely to their superpixel representation. This implementation uses skimage's version of the SLIC algorithm.
Parameters:
Name | Type | Description |
---|---|---|
p_replace | float or tuple of float | Defines for any segment the probability that the pixels within that segment are replaced by their average color (otherwise, the pixels are not changed). A probability of 0.0 means that the pixels in no segment are replaced (the image is not changed at all); 0.5 means that around half of all segments are replaced by their average color; 1.0 means that all segments are replaced by their average color (resulting in a voronoi image). If a float, then that float will always be used; if a tuple (a, b), then a random probability will be sampled from the interval [a, b] per image. |
n_segments | int, or tuple of int | Rough target number of how many superpixels to generate (the algorithm may deviate from this number). Lower values will lead to coarser superpixels; higher values are computationally more intensive and will hence lead to a slowdown. If a single int, then that value will always be used as the number of segments; if a tuple (a, b), then a value from the discrete interval [a..b] will be sampled per image. |
max_size | int or None | Maximum image size at which the augmentation is performed. If the width or height of an image exceeds this value, it will be downscaled before the augmentation so that the longest side matches max_size. This is done to speed up the process. The final output image has the same size as the input image. Note that in case p_replace is below 1.0, the down-/upscaling will affect the not-replaced pixels too. Use None to apply no down-/upscaling. |
interpolation | OpenCV flag | flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Source code in albumentations/augmentations/transforms.py
class Superpixels(ImageOnlyTransform):
"""Transform images partially/completely to their superpixel representation.
This implementation uses skimage's version of the SLIC algorithm.
Args:
p_replace (float or tuple of float): Defines for any segment the probability that the pixels within that
segment are replaced by their average color (otherwise, the pixels are not changed).
Examples:
* A probability of ``0.0`` would mean, that the pixels in no
segment are replaced by their average color (image is not
changed at all).
* A probability of ``0.5`` would mean, that around half of all
segments are replaced by their average color.
* A probability of ``1.0`` would mean, that all segments are
replaced by their average color (resulting in a voronoi
image).
Behaviour based on chosen data types for this parameter:
* If a ``float``, then that ``float`` will always be used.
* If ``tuple`` ``(a, b)``, then a random probability will be
sampled from the interval ``[a, b]`` per image.
n_segments (int, or tuple of int): Rough target number of how many superpixels to generate (the algorithm
may deviate from this number). Lower value will lead to coarser superpixels.
Higher values are computationally more intensive and will hence lead to a slowdown
* If a single ``int``, then that value will always be used as the
number of segments.
* If a ``tuple`` ``(a, b)``, then a value from the discrete
interval ``[a..b]`` will be sampled per image.
max_size (int or None): Maximum image size at which the augmentation is performed.
If the width or height of an image exceeds this value, it will be
downscaled before the augmentation so that the longest side matches `max_size`.
This is done to speed up the process. The final output image has the same size as the input image.
Note that in case `p_replace` is below ``1.0``,
the down-/upscaling will affect the not-replaced pixels too.
Use ``None`` to apply no down-/upscaling.
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
"""
def __init__(
self,
p_replace: ScaleFloatType = 0.1,
n_segments: ScaleIntType = 100,
max_size: Optional[int] = 128,
interpolation: int = cv2.INTER_LINEAR,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply=always_apply, p=p)
self.p_replace = to_tuple(p_replace, p_replace)
self.n_segments = to_tuple(n_segments, n_segments)
self.max_size = max_size
self.interpolation = interpolation
if min(self.n_segments) < 1:
raise ValueError(f"n_segments must be >= 1. Got: {n_segments}")
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
return ("p_replace", "n_segments", "max_size", "interpolation")
def get_params(self) -> Dict[str, Any]:
n_segments = random.randint(*self.n_segments)
p = random.uniform(*self.p_replace)
return {"replace_samples": random_utils.random(n_segments) < p, "n_segments": n_segments}
def apply(
self, img: np.ndarray, replace_samples: Sequence[bool] = (False,), n_segments: int = 1, **kwargs: Any
) -> np.ndarray:
return F.superpixels(img, n_segments, replace_samples, self.max_size, cast(int, self.interpolation))
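A minimal usage sketch (assuming a uint8 NumPy image; everything beyond the documented parameters is illustrative):
import albumentations as A
import numpy as np

# Replace roughly half of all segments with their average color on every call
transform = A.Superpixels(p_replace=0.5, n_segments=100, max_size=128, p=1.0)
image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
result = transform(image=image)["image"]  # same shape and dtype as the input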
class TemplateTransform
(templates, img_weight=0.5, template_weight=0.5, template_transform=None, name=None, always_apply=False, p=0.5)
[view source on GitHub] ¶
Apply blending of the input image with specified templates.
Parameters:
Name | Type | Description |
---|---|---|
templates | numpy array or list of numpy arrays | Images as template for transform. |
img_weight | Union[float, Tuple[float, float]] | If a single float, it will be used as the weight for the input image. If a tuple of float, img_weight will be sampled from the range [img_weight[0], img_weight[1]). Default: 0.5. |
template_weight | Union[float, Tuple[float, float]] | If a single float, it will be used as the weight for the template. If a tuple of float, template_weight will be sampled from the range [template_weight[0], template_weight[1]). Default: 0.5. |
template_transform | Optional[Callable[..., Any]] | transformation object that is applied to the template; must produce a template of the same size as the input image. |
name | Optional[str] | (Optional) Name of transform, used only for deserialization. |
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class TemplateTransform(ImageOnlyTransform):
"""Apply blending of input image with specified templates
Args:
templates (numpy array or list of numpy arrays): Images as template for transform.
img_weight: If single float will be used as weight for input image.
If tuple of float img_weight will be in range `[img_weight[0], img_weight[1])`. Default: 0.5.
template_weight: If single float will be used as weight for template.
If tuple of float template_weight will be in range `[template_weight[0], template_weight[1])`.
Default: 0.5.
template_transform: transformation object which could be applied to template,
must produce template the same size as input image.
name: (Optional) Name of transform, used only for deserialization.
p: probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
templates: Union[np.ndarray, List[np.ndarray]],
img_weight: ScaleFloatType = 0.5,
template_weight: ScaleFloatType = 0.5,
template_transform: Optional[Callable[..., Any]] = None,
name: Optional[str] = None,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.templates = templates if isinstance(templates, (list, tuple)) else [templates]
self.img_weight = to_tuple(img_weight, img_weight)
self.template_weight = to_tuple(template_weight, template_weight)
self.template_transform = template_transform
self.name = name
def apply(
self,
img: np.ndarray,
template: Optional[np.ndarray] = None,
img_weight: float = 0.5,
template_weight: float = 0.5,
**params: Any,
) -> np.ndarray:
return F.add_weighted(img, img_weight, template, template_weight)
def get_params(self) -> Dict[str, float]:
return {
"img_weight": random.uniform(self.img_weight[0], self.img_weight[1]),
"template_weight": random.uniform(self.template_weight[0], self.template_weight[1]),
}
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
img = params["image"]
template = random.choice(self.templates)
if self.template_transform is not None:
template = self.template_transform(image=template)["image"]
if get_num_channels(template) not in [1, get_num_channels(img)]:
msg = (
"Template must be a single channel or "
"has the same number of channels as input "
f"image ({get_num_channels(img)}), got {get_num_channels(template)}"
)
raise ValueError(msg)
if template.dtype != img.dtype:
msg = "Image and template must be the same image type"
raise ValueError(msg)
if img.shape[:2] != template.shape[:2]:
raise ValueError(f"Image and template must be the same size, got {img.shape[:2]} and {template.shape[:2]}")
if get_num_channels(template) == 1 and get_num_channels(img) > 1:
template = np.stack((template,) * get_num_channels(img), axis=-1)
# in order to support grayscale image with dummy dim
template = template.reshape(img.shape)
return {"template": template}
@classmethod
def is_serializable(cls) -> bool:
return False
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def to_dict_private(self) -> Dict[str, Any]:
if self.name is None:
msg = (
"To make a TemplateTransform serializable you should provide the `name` argument, "
"e.g. `TemplateTransform(name='my_transform', ...)`."
)
raise ValueError(msg)
return {"__class_fullname__": self.get_class_fullname(), "__name__": self.name}
class ToFloat
(max_value=None, always_apply=False, p=1.0)
[view source on GitHub] ¶
Divide pixel values by max_value to get a float32 output array where all values lie in the range [0, 1.0]. If max_value is None, the transform will try to infer the maximum value by inspecting the data type of the input image.
See Also: FromFloat (albumentations.augmentations.transforms.FromFloat)
Parameters:
Name | Type | Description |
---|---|---|
max_value | Optional[float] | maximum possible input value. Default: None. |
p | float | probability of applying the transform. Default: 1.0. |
Targets
image
Image types: any type
Source code in albumentations/augmentations/transforms.py
class ToFloat(ImageOnlyTransform):
"""Divide pixel values by `max_value` to get a float32 output array where all values lie in the range [0, 1.0].
If `max_value` is None the transform will try to infer the maximum value by inspecting the data type of the input
image.
See Also:
:class:`~albumentations.augmentations.transforms.FromFloat`
Args:
max_value: maximum possible input value. Default: None.
p: probability of applying the transform. Default: 1.0.
Targets:
image
Image types:
any type
"""
def __init__(self, max_value: Optional[float] = None, always_apply: bool = False, p: float = 1.0):
super().__init__(always_apply, p)
self.max_value = max_value
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
return F.to_float(img, self.max_value)
def get_transform_init_args_names(self) -> Tuple[str]:
return ("max_value",)
class ToGray
[view source on GitHub] ¶
Convert the input RGB image to grayscale. If the mean pixel value for the resulting image is greater than 127, invert the resulting grayscale image.
Parameters:
Name | Type | Description |
---|---|---|
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class ToGray(ImageOnlyTransform):
"""Convert the input RGB image to grayscale. If the mean pixel value for the resulting image is greater
than 127, invert the resulting grayscale image.
Args:
p: probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
if is_grayscale_image(img):
warnings.warn("The image is already gray.")
return img
if not is_rgb_image(img):
msg = "ToGray transformation expects 3-channel images."
raise TypeError(msg)
return F.to_gray(img)
def get_transform_init_args_names(self) -> Tuple[()]:
return ()
class ToRGB
(always_apply=True, p=1.0)
[view source on GitHub] ¶
Convert the input grayscale image to RGB.
Parameters:
Name | Type | Description |
---|---|---|
p | float | probability of applying the transform. Default: 1. |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class ToRGB(ImageOnlyTransform):
"""Convert the input grayscale image to RGB.
Args:
p: probability of applying the transform. Default: 1.
Targets:
image
Image types:
uint8, float32
"""
def __init__(self, always_apply: bool = True, p: float = 1.0):
super().__init__(always_apply=always_apply, p=p)
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
if is_rgb_image(img):
warnings.warn("The image is already an RGB.")
return img
if not is_grayscale_image(img):
msg = "ToRGB transformation expects 2-dim images or 3-dim with the last dimension equal to 1."
raise TypeError(msg)
return F.gray_to_rgb(img)
def get_transform_init_args_names(self) -> Tuple[()]:
return ()
class ToSepia
(always_apply=False, p=0.5)
[view source on GitHub] ¶
Applies a sepia filter to the input RGB image.
Parameters:
Name | Type | Description |
---|---|---|
p | float | probability of applying the transform. Default: 0.5. |
Targets
image
Image types: uint8, float32
Source code in albumentations/augmentations/transforms.py
class ToSepia(ImageOnlyTransform):
"""Applies sepia filter to the input RGB image
Args:
p: probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(self, always_apply: bool = False, p: float = 0.5):
super().__init__(always_apply, p)
self.sepia_transformation_matrix = np.array(
[[0.393, 0.769, 0.189], [0.349, 0.686, 0.168], [0.272, 0.534, 0.131]]
)
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
if not is_rgb_image(img):
msg = "ToSepia transformation expects 3-channel images."
raise TypeError(msg)
return F.linear_transformation_rgb(img, self.sepia_transformation_matrix)
def get_transform_init_args_names(self) -> Tuple[()]:
return ()
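For intuition, the sepia effect is a per-pixel linear map with the 3x3 matrix shown above. A hedged NumPy equivalent (a sketch, not the library's internal implementation):
import numpy as np

M = np.array([[0.393, 0.769, 0.189],
              [0.349, 0.686, 0.168],
              [0.272, 0.534, 0.131]])

image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
# Each output channel is a weighted sum of the input RGB channels
sepia = np.clip(image.astype(np.float32) @ M.T, 0, 255).astype(np.uint8)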
class UnsharpMask
(blur_limit=(3, 7), sigma_limit=0.0, alpha=(0.2, 0.5), threshold=10, always_apply=False, p=0.5)
[view source on GitHub] ¶
Sharpen the input image using unsharp masking and overlay the result with the original image.
Parameters:
Name | Type | Description |
---|---|---|
blur_limit | Union[int, Tuple[int, int]] | maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1. If a single value is provided, blur_limit will be in the range (0, blur_limit). Default: (3, 7). |
sigma_limit | Union[float, Tuple[float, float]] | Gaussian kernel standard deviation. Must be in range [0, inf). If a single value is provided, sigma_limit will be in the range (0, sigma_limit). If set to 0, sigma will be computed as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. Default: 0. |
alpha | Union[float, Tuple[float, float]] | range to choose the visibility of the sharpened image. At 0, only the original image is visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5). |
threshold | int | Value to limit sharpening only to areas with a high pixel difference between the original image and its smoothed version. Higher threshold means less sharpening on flat areas. Must be in range [0, 255]. Default: 10. |
p | float | probability of applying the transform. Default: 0.5. |
Reference
arxiv.org/pdf/2107.10833.pdf
Targets
image
Source code in albumentations/augmentations/transforms.py
class UnsharpMask(ImageOnlyTransform):
"""Sharpen the input image using Unsharp Masking processing and overlays the result with the original image.
Args:
blur_limit: maximum Gaussian kernel size for blurring the input image.
Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
If set single value `blur_limit` will be in range (0, blur_limit).
Default: (3, 7).
sigma_limit: Gaussian kernel standard deviation. Must be in range [0, inf).
If set single value `sigma_limit` will be in range (0, sigma_limit).
If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
alpha: range to choose the visibility of the sharpened image.
At 0, only the original image is visible, at 1.0 only its sharpened version is visible.
Default: (0.2, 0.5).
threshold: Value to limit sharpening only for areas with high pixel difference between original image
and its smoothed version. Higher threshold means less sharpening on flat areas.
Must be in range [0, 255]. Default: 10.
p: probability of applying the transform. Default: 0.5.
Reference:
arxiv.org/pdf/2107.10833.pdf
Targets:
image
"""
def __init__(
self,
blur_limit: ScaleIntType = (3, 7),
sigma_limit: ScaleFloatType = 0.0,
alpha: ScaleFloatType = (0.2, 0.5),
threshold: int = 10,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.blur_limit = cast(Tuple[int, int], to_tuple(blur_limit, 3))
self.sigma_limit = self.__check_values(to_tuple(sigma_limit, 0.0), name="sigma_limit")
self.alpha = self.__check_values(to_tuple(alpha, 0.0), name="alpha", bounds=(0.0, 1.0))
self.threshold = threshold
if self.blur_limit[0] == 0 and self.sigma_limit[0] == 0:
self.blur_limit = 3, max(3, self.blur_limit[1])
msg = "blur_limit and sigma_limit minimum value can not be both equal to 0."
raise ValueError(msg)
if (self.blur_limit[0] != 0 and self.blur_limit[0] % 2 != 1) or (
self.blur_limit[1] != 0 and self.blur_limit[1] % 2 != 1
):
msg = "UnsharpMask supports only odd blur limits."
raise ValueError(msg)
@staticmethod
def __check_values(
value: Union[Tuple[int, int], Tuple[float, float]], name: str, bounds: Tuple[float, float] = (0, float("inf"))
) -> Tuple[float, float]:
if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
raise ValueError(f"{name} values should be between {bounds}")
return value
def get_params(self) -> Dict[str, Any]:
return {
"ksize": random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2),
"sigma": random.uniform(*self.sigma_limit),
"alpha": random.uniform(*self.alpha),
}
def apply(self, img: np.ndarray, ksize: int = 3, sigma: int = 0, alpha: float = 0.2, **params: Any) -> np.ndarray:
return F.unsharp_mask(img, ksize, sigma=sigma, alpha=alpha, threshold=self.threshold)
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
return "blur_limit", "sigma_limit", "alpha", "threshold"
utils
¶
def ensure_contiguous (func)
[view source on GitHub]¶
Ensure that input img is contiguous.
Source code in albumentations/augmentations/utils.py
def ensure_contiguous(
func: Callable[Concatenate[np.ndarray, P], np.ndarray],
) -> Callable[Concatenate[np.ndarray, P], np.ndarray]:
"""Ensure that input img is contiguous."""
@wraps(func)
def wrapped_function(img: np.ndarray, *args: P.args, **kwargs: P.kwargs) -> np.ndarray:
img = np.require(img, requirements=["C_CONTIGUOUS"])
return func(img, *args, **kwargs)
return wrapped_function
def get_opencv_dtype_from_numpy (value)
[view source on GitHub]¶
Return a corresponding OpenCV dtype for a numpy's dtype :param value: Input dtype of numpy array :return: Corresponding dtype for OpenCV
Source code in albumentations/augmentations/utils.py
def get_opencv_dtype_from_numpy(value: Union[np.ndarray, int, np.dtype, object]) -> int:
"""Return a corresponding OpenCV dtype for a numpy's dtype
:param value: Input dtype of numpy array
:return: Corresponding dtype for OpenCV
"""
if isinstance(value, np.ndarray):
value = value.dtype
return NPDTYPE_TO_OPENCV_DTYPE[value]
def preserve_channel_dim (func)
[view source on GitHub]¶
Preserve dummy channel dim.
Source code in albumentations/augmentations/utils.py
def preserve_channel_dim(
func: Callable[Concatenate[np.ndarray, P], np.ndarray],
) -> Callable[Concatenate[np.ndarray, P], np.ndarray]:
"""Preserve dummy channel dim."""
@wraps(func)
def wrapped_function(img: np.ndarray, *args: P.args, **kwargs: P.kwargs) -> np.ndarray:
shape = img.shape
result = func(img, *args, **kwargs)
if len(shape) == THREE and shape[-1] == 1 and len(result.shape) == TWO:
result = np.expand_dims(result, axis=-1)
return result
return wrapped_function
def preserve_shape (func)
[view source on GitHub]¶
Preserve shape of the image
Source code in albumentations/augmentations/utils.py
def preserve_shape(
func: Callable[Concatenate[np.ndarray, P], np.ndarray],
) -> Callable[Concatenate[np.ndarray, P], np.ndarray]:
"""Preserve shape of the image"""
@wraps(func)
def wrapped_function(img: np.ndarray, *args: P.args, **kwargs: P.kwargs) -> np.ndarray:
shape = img.shape
result = func(img, *args, **kwargs)
return result.reshape(shape)
return wrapped_function
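A hedged sketch of how these decorators are typically used: OpenCV drops the trailing axis for single-channel (H, W, 1) inputs, and preserve_channel_dim restores it (the wrapped function below is hypothetical):
import cv2
import numpy as np
from albumentations.augmentations.utils import preserve_channel_dim

@preserve_channel_dim
def box_blur(img: np.ndarray) -> np.ndarray:
    # cv2.blur returns an (H, W) array for an (H, W, 1) input
    return cv2.blur(img, (3, 3))

img = np.zeros((64, 64, 1), dtype=np.uint8)
assert box_blur(img).shape == (64, 64, 1)  # dummy channel dim restored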
core
special
¶
bbox_utils
¶
class BboxParams
(format, label_fields=None, min_area=0.0, min_visibility=0.0, min_width=0.0, min_height=0.0, check_each_transform=True)
[view source on GitHub] ¶
Parameters of bounding boxes
Parameters:
Name | Type | Description |
---|---|---|
format | str | format of bounding boxes. Should be 'coco', 'pascal_voc', 'albumentations' or 'yolo'. The coco format is [x_min, y_min, width, height], e.g. [97, 12, 150, 200]. The pascal_voc format is [x_min, y_min, x_max, y_max], e.g. [97, 12, 247, 212]. The albumentations format is like pascal_voc, but normalized: [x_min, y_min, x_max, y_max], e.g. [0.2, 0.3, 0.4, 0.5]. The yolo format is [x, y, width, height], e.g. [0.1, 0.2, 0.3, 0.4]; x, y - normalized bbox center; width, height - normalized bbox width and height. |
label_fields | list | list of fields that are joined with boxes, e.g. labels. Should be same type as boxes. |
min_area | float | minimum area of a bounding box. All bounding boxes whose visible area in pixels is less than this value will be removed. Default: 0.0. |
min_visibility | float | minimum fraction of area required for a bounding box to remain in the list. Default: 0.0. |
min_width | float | Minimum width of a bounding box. All bounding boxes whose width is less than this value will be removed. Default: 0.0. |
min_height | float | Minimum height of a bounding box. All bounding boxes whose height is less than this value will be removed. Default: 0.0. |
check_each_transform | bool | if True, then bboxes will be checked after each dual transform. Default: True. |
Source code in albumentations/core/bbox_utils.py
class BboxParams(Params):
"""Parameters of bounding boxes
Args:
format (str): format of bounding boxes. Should be 'coco', 'pascal_voc', 'albumentations' or 'yolo'.
The `coco` format
`[x_min, y_min, width, height]`, e.g. [97, 12, 150, 200].
The `pascal_voc` format
`[x_min, y_min, x_max, y_max]`, e.g. [97, 12, 247, 212].
The `albumentations` format
is like `pascal_voc`, but normalized,
in other words: `[x_min, y_min, x_max, y_max]`, e.g. [0.2, 0.3, 0.4, 0.5].
The `yolo` format
`[x, y, width, height]`, e.g. [0.1, 0.2, 0.3, 0.4];
`x`, `y` - normalized bbox center; `width`, `height` - normalized bbox width and height.
label_fields (list): list of fields that are joined with boxes, e.g labels.
Should be same type as boxes.
min_area (float): minimum area of a bounding box. All bounding boxes whose
visible area in pixels is less than this value will be removed. Default: 0.0.
min_visibility (float): minimum fraction of area required for a bounding box
to remain in the list. Default: 0.0.
min_width (float): Minimum width of a bounding box. All bounding boxes whose width is
less than this value will be removed. Default: 0.0.
min_height (float): Minimum height of a bounding box. All bounding boxes whose height is
less than this value will be removed. Default: 0.0.
check_each_transform (bool): if `True`, then bboxes will be checked after each dual transform.
Default: `True`
"""
def __init__(
self,
format: str,
label_fields: Optional[Sequence[str]] = None,
min_area: float = 0.0,
min_visibility: float = 0.0,
min_width: float = 0.0,
min_height: float = 0.0,
check_each_transform: bool = True,
):
super().__init__(format, label_fields)
self.min_area = min_area
self.min_visibility = min_visibility
self.min_width = min_width
self.min_height = min_height
self.check_each_transform = check_each_transform
def to_dict_private(self) -> Dict[str, Any]:
data = super().to_dict_private()
data.update(
{
"min_area": self.min_area,
"min_visibility": self.min_visibility,
"min_width": self.min_width,
"min_height": self.min_height,
"check_each_transform": self.check_each_transform,
}
)
return data
@classmethod
def is_serializable(cls) -> bool:
return True
@classmethod
def get_class_fullname(cls) -> str:
return "BboxParams"
def calculate_bbox_area (bbox, rows, cols)
[view source on GitHub]¶
Calculate the area of a bounding box in (fractional) pixels.
Parameters:
Name | Type | Description |
---|---|---|
bbox | Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]] | A bounding box (x_min, y_min, x_max, y_max). |
rows | int | Image height. |
cols | int | Image width. |
Returns:
Type | Description |
---|---|
float | Area in (fractional) pixels of the (denormalized) bounding box. |
Source code in albumentations/core/bbox_utils.py
def calculate_bbox_area(bbox: BoxType, rows: int, cols: int) -> float:
"""Calculate the area of a bounding box in (fractional) pixels.
Args:
bbox: A bounding box `(x_min, y_min, x_max, y_max)`.
rows: Image height.
cols: Image width.
Return:
Area in (fractional) pixels of the (denormalized) bounding box.
"""
bbox = denormalize_bbox(bbox, rows, cols)
x_min, y_min, x_max, y_max = bbox[:4]
return (x_max - x_min) * (y_max - y_min)
def check_bbox (bbox)
[view source on GitHub]¶
Check if bbox boundaries are in the range [0, 1] and minimums are less than maximums.
Source code in albumentations/core/bbox_utils.py
def check_bbox(bbox: BoxType) -> None:
"""Check if bbox boundaries are in range 0, 1 and minimums are lesser then maximums"""
for name, value in zip(["x_min", "y_min", "x_max", "y_max"], bbox[:4]):
if not 0 <= value <= 1 and not np.isclose(value, 0) and not np.isclose(value, 1):
raise ValueError(f"Expected {name} for bbox {bbox} to be in the range [0.0, 1.0], got {value}.")
x_min, y_min, x_max, y_max = bbox[:4]
if x_max <= x_min:
raise ValueError(f"x_max is less than or equal to x_min for bbox {bbox}.")
if y_max <= y_min:
raise ValueError(f"y_max is less than or equal to y_min for bbox {bbox}.")
def check_bboxes (bboxes)
[view source on GitHub]¶
Check if the boundaries of each bbox are in the range [0, 1] and minimums are less than maximums.
def convert_bbox_from_albumentations (bbox, target_format, rows, cols, check_validity=False)
[view source on GitHub]¶
Convert a bounding box from the format used by albumentations to a format specified in target_format.
Parameters:
Name | Type | Description |
---|---|---|
bbox | Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]] | An albumentations bounding box (x_min, y_min, x_max, y_max). |
target_format | str | required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'. |
rows | int | Image height. |
cols | int | Image width. |
check_validity | bool | Check if all boxes are valid boxes. |
Returns:
Type | Description |
---|---|
tuple | A bounding box. |
Note
The coco format of a bounding box looks like [x_min, y_min, width, height], e.g. [97, 12, 150, 200]. The pascal_voc format of a bounding box looks like [x_min, y_min, x_max, y_max], e.g. [97, 12, 247, 212]. The yolo format of a bounding box looks like [x, y, width, height], e.g. [0.3, 0.1, 0.05, 0.07].
Exceptions:
Type | Description |
---|---|
ValueError | if target_format is not equal to 'coco', 'pascal_voc' or 'yolo'. |
Source code in albumentations/core/bbox_utils.py
def convert_bbox_from_albumentations(
bbox: BoxType, target_format: str, rows: int, cols: int, check_validity: bool = False
) -> BoxType:
"""Convert a bounding box from the format used by albumentations to a format, specified in `target_format`.
Args:
bbox: An albumentations bounding box `(x_min, y_min, x_max, y_max)`.
target_format: required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.
rows: Image height.
cols: Image width.
check_validity: Check if all boxes are valid boxes.
Returns:
tuple: A bounding box.
Note:
The `coco` format of a bounding box looks like `[x_min, y_min, width, height]`, e.g. [97, 12, 150, 200].
The `pascal_voc` format of a bounding box looks like `[x_min, y_min, x_max, y_max]`, e.g. [97, 12, 247, 212].
The `yolo` format of a bounding box looks like `[x, y, width, height]`, e.g. [0.3, 0.1, 0.05, 0.07].
Raises:
ValueError: if `target_format` is not equal to `coco`, `pascal_voc` or `yolo`.
"""
if target_format not in {"coco", "pascal_voc", "yolo"}:
raise ValueError(
f"Unknown target_format {target_format}. Supported formats are: 'coco', 'pascal_voc' and 'yolo'"
)
if check_validity:
check_bbox(bbox)
if target_format != "yolo":
bbox = denormalize_bbox(bbox, rows, cols)
if target_format == "coco":
(x_min, y_min, x_max, y_max), tail = bbox[:4], tuple(bbox[4:])
width = x_max - x_min
height = y_max - y_min
bbox = cast(BoxType, (x_min, y_min, width, height, *tail))
elif target_format == "yolo":
(x_min, y_min, x_max, y_max), tail = bbox[:4], bbox[4:]
x = (x_min + x_max) / 2.0
y = (y_min + y_max) / 2.0
w = x_max - x_min
h = y_max - y_min
bbox = cast(BoxType, (x, y, w, h, *tail))
return bbox
def convert_bbox_to_albumentations (bbox, source_format, rows, cols, check_validity=False)
[view source on GitHub]¶
Convert a bounding box from a format specified in source_format to the format used by albumentations: normalized coordinates of the top-left and bottom-right corners of the bounding box in the form (x_min, y_min, x_max, y_max), e.g. (0.15, 0.27, 0.67, 0.5).
Parameters:
Name | Type | Description |
---|---|---|
bbox | Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]] | A bounding box tuple. |
source_format | str | format of the bounding box. Should be 'coco', 'pascal_voc', or 'yolo'. |
check_validity | bool | Check if all boxes are valid boxes. |
rows | int | Image height. |
cols | int | Image width. |
Returns:
Type | Description |
---|---|
tuple | A bounding box (x_min, y_min, x_max, y_max). |
Note
The coco format of a bounding box looks like (x_min, y_min, width, height), e.g. (97, 12, 150, 200). The pascal_voc format of a bounding box looks like (x_min, y_min, x_max, y_max), e.g. (97, 12, 247, 212). The yolo format of a bounding box looks like (x, y, width, height), e.g. (0.3, 0.1, 0.05, 0.07), where x, y are the coordinates of the center of the box, with all values normalized to 1 by image height and width.
Exceptions:
Type | Description |
---|---|
ValueError | if source_format is not equal to 'coco', 'pascal_voc' or 'yolo'. |
ValueError | if, in the YOLO format, any coordinate is not in the range (0, 1]. |
Source code in albumentations/core/bbox_utils.py
def convert_bbox_to_albumentations(
bbox: BoxType, source_format: str, rows: int, cols: int, check_validity: bool = False
) -> BoxType:
"""Convert a bounding box from a format specified in `source_format` to the format used by albumentations:
normalized coordinates of top-left and bottom-right corners of the bounding box in a form of
`(x_min, y_min, x_max, y_max)` e.g. `(0.15, 0.27, 0.67, 0.5)`.
Args:
bbox: A bounding box tuple.
source_format: format of the bounding box. Should be 'coco', 'pascal_voc', or 'yolo'.
check_validity: Check if all boxes are valid boxes.
rows: Image height.
cols: Image width.
Returns:
tuple: A bounding box `(x_min, y_min, x_max, y_max)`.
Note:
The `coco` format of a bounding box looks like `(x_min, y_min, width, height)`, e.g. (97, 12, 150, 200).
The `pascal_voc` format of a bounding box looks like `(x_min, y_min, x_max, y_max)`, e.g. (97, 12, 247, 212).
The `yolo` format of a bounding box looks like `(x, y, width, height)`, e.g. (0.3, 0.1, 0.05, 0.07);
where `x`, `y` coordinates of the center of the box, all values normalized to 1 by image height and width.
Raises:
ValueError: if `target_format` is not equal to `coco` or `pascal_voc`, or `yolo`.
ValueError: If in YOLO format all labels not in range (0, 1).
"""
if source_format not in {"coco", "pascal_voc", "yolo"}:
raise ValueError(
f"Unknown source_format {source_format}. Supported formats are: 'coco', 'pascal_voc' and 'yolo'"
)
if source_format == "coco":
(x_min, y_min, width, height), tail = bbox[:4], bbox[4:]
x_max = x_min + width
y_max = y_min + height
elif source_format == "yolo":
# https://github.com/pjreddie/darknet/blob/f6d861736038da22c9eb0739dca84003c5a5e275/scripts/voc_label.py#L12
_bbox = np.array(bbox[:4])
if check_validity and np.any((_bbox <= 0) | (_bbox > 1)):
msg = "In YOLO format all coordinates must be float and in range (0, 1]"
raise ValueError(msg)
(x, y, w, h), tail = bbox[:4], bbox[4:]
w_half, h_half = w / 2, h / 2
x_min = x - w_half
y_min = y - h_half
x_max = x_min + w
y_max = y_min + h
else:
(x_min, y_min, x_max, y_max), tail = bbox[:4], bbox[4:]
bbox = (x_min, y_min, x_max, y_max, *tuple(tail))
if source_format != "yolo":
bbox = normalize_bbox(bbox, rows, cols)
if check_validity:
check_bbox(bbox)
return bbox
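A hedged round-trip sketch using the two conversion functions documented on this page:
from albumentations.core.bbox_utils import (
    convert_bbox_from_albumentations,
    convert_bbox_to_albumentations,
)

bbox_coco = (97, 12, 150, 200)  # (x_min, y_min, width, height) in pixels
internal = convert_bbox_to_albumentations(bbox_coco, "coco", rows=480, cols=640)
# internal is a normalized (x_min, y_min, x_max, y_max) tuple
restored = convert_bbox_from_albumentations(internal, "coco", rows=480, cols=640)
# restored matches bbox_coco up to floating-point error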
def convert_bboxes_from_albumentations (bboxes, target_format, rows, cols, check_validity=False)
[view source on GitHub]¶
Convert a list of bounding boxes from the format used by albumentations to a format specified in target_format.
Parameters:
Name | Type | Description |
---|---|---|
bboxes | Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]] | List of albumentations bounding boxes (x_min, y_min, x_max, y_max). |
target_format | str | required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'. |
rows | int | Image height. |
cols | int | Image width. |
check_validity | bool | Check if all boxes are valid boxes. |
Returns:
Type | Description |
---|---|
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]] | List of bounding boxes. |
Source code in albumentations/core/bbox_utils.py
def convert_bboxes_from_albumentations(
bboxes: Sequence[BoxType], target_format: str, rows: int, cols: int, check_validity: bool = False
) -> List[BoxType]:
"""Convert a list of bounding boxes from the format used by albumentations to a format, specified
in `target_format`.
Args:
bboxes: List of albumentations bounding box `(x_min, y_min, x_max, y_max)`.
target_format: required format of the output bounding box. Should be 'coco', 'pascal_voc' or 'yolo'.
rows: Image height.
cols: Image width.
check_validity: Check if all boxes are valid boxes.
Returns:
List of bounding boxes.
"""
return [convert_bbox_from_albumentations(bbox, target_format, rows, cols, check_validity) for bbox in bboxes]
def convert_bboxes_to_albumentations (bboxes, source_format, rows, cols, check_validity=False)
[view source on GitHub]¶
Convert a list of bounding boxes from a format specified in source_format to the format used by albumentations.
Source code in albumentations/core/bbox_utils.py
def convert_bboxes_to_albumentations(
bboxes: Sequence[BoxType], source_format: str, rows: int, cols: int, check_validity: bool = False
) -> List[BoxType]:
"""Convert a list bounding boxes from a format specified in `source_format` to the format used by albumentations"""
return [convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity) for bbox in bboxes]
def denormalize_bbox (bbox, rows, cols)
[view source on GitHub]¶
Denormalize coordinates of a bounding box. Multiply x-coordinates by image width and y-coordinates by image height. This is an inverse operation for normalize_bbox.
Parameters:
Name | Type | Description |
---|---|---|
bbox | Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]] | Normalized bounding box (x_min, y_min, x_max, y_max). |
rows | int | Image height. |
cols | int | Image width. |
Returns:
Type | Description |
---|---|
Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]] | Denormalized bounding box (x_min, y_min, x_max, y_max). |
Exceptions:
Type | Description |
---|---|
ValueError | If rows or cols is less than or equal to zero. |
Source code in albumentations/core/bbox_utils.py
def denormalize_bbox(bbox: BoxType, rows: int, cols: int) -> BoxType:
"""Denormalize coordinates of a bounding box. Multiply x-coordinates by image width and y-coordinates
by image height. This is an inverse operation for :func:`~albumentations.augmentations.bbox.normalize_bbox`.
Args:
bbox: Normalized bounding box `(x_min, y_min, x_max, y_max)`.
rows: Image height.
cols: Image width.
Returns:
Denormalized bounding box `(x_min, y_min, x_max, y_max)`.
Raises:
ValueError: If rows or cols is less or equal zero
"""
tail: Tuple[Any, ...]
(x_min, y_min, x_max, y_max), tail = bbox[:4], tuple(bbox[4:])
if rows <= 0:
msg = "Argument rows must be positive integer"
raise ValueError(msg)
if cols <= 0:
msg = "Argument cols must be positive integer"
raise ValueError(msg)
x_min, x_max = x_min * cols, x_max * cols
y_min, y_max = y_min * rows, y_max * rows
return cast(BoxType, (x_min, y_min, x_max, y_max, *tail))
def denormalize_bboxes (bboxes, rows, cols)
[view source on GitHub]¶
Denormalize a list of bounding boxes.
Parameters:
Name | Type | Description |
---|---|---|
bboxes | Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]] | Normalized bounding boxes [(x_min, y_min, x_max, y_max)]. |
rows | int | Image height. |
cols | int | Image width. |
Returns:
Type | Description |
---|---|
List | Denormalized bounding boxes [(x_min, y_min, x_max, y_max)]. |
Source code in albumentations/core/bbox_utils.py
def denormalize_bboxes(bboxes: Sequence[BoxType], rows: int, cols: int) -> List[BoxType]:
"""Denormalize a list of bounding boxes.
Args:
bboxes: Normalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.
rows: Image height.
cols: Image width.
Returns:
List: Denormalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.
"""
return [denormalize_bbox(bbox, rows, cols) for bbox in bboxes]
def filter_bboxes (bboxes, rows, cols, min_area=0.0, min_visibility=0.0, min_width=0.0, min_height=0.0)
[view source on GitHub]¶
Remove bounding boxes that either lie outside of the visible area by more than min_visibility or whose area in pixels is under the threshold set by min_area. It also crops boxes to the final image size.
Parameters:
Name | Type | Description |
---|---|---|
bboxes | Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]] | List of albumentations bounding boxes (x_min, y_min, x_max, y_max). |
rows | int | Image height. |
cols | int | Image width. |
min_area | float | Minimum area of a bounding box. All bounding boxes whose visible area in pixels is less than this value will be removed. Default: 0.0. |
min_visibility | float | Minimum fraction of area required for a bounding box to remain in the list. Default: 0.0. |
min_width | float | Minimum width of a bounding box. All bounding boxes whose width is less than this value will be removed. Default: 0.0. |
min_height | float | Minimum height of a bounding box. All bounding boxes whose height is less than this value will be removed. Default: 0.0. |
Returns:
Type | Description |
---|---|
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]] | List of bounding boxes. |
Source code in albumentations/core/bbox_utils.py
def filter_bboxes(
bboxes: Sequence[BoxType],
rows: int,
cols: int,
min_area: float = 0.0,
min_visibility: float = 0.0,
min_width: float = 0.0,
min_height: float = 0.0,
) -> List[BoxType]:
"""Remove bounding boxes that either lie outside of the visible area by more then min_visibility
or whose area in pixels is under the threshold set by `min_area`. Also it crops boxes to final image size.
Args:
bboxes: List of albumentations bounding box `(x_min, y_min, x_max, y_max)`.
rows: Image height.
cols: Image width.
min_area: Minimum area of a bounding box. All bounding boxes whose visible area in pixels
is less than this value will be removed. Default: 0.0.
min_visibility: Minimum fraction of area required for a bounding box to remain in the list. Default: 0.0.
min_width: Minimum width of a bounding box. All bounding boxes whose width is
less than this value will be removed. Default: 0.0.
min_height: Minimum height of a bounding box. All bounding boxes whose height is
less than this value will be removed. Default: 0.0.
Returns:
List of bounding boxes.
"""
resulting_boxes: List[BoxType] = []
for i in range(len(bboxes)):
bbox = bboxes[i]
# Calculate areas of bounding box before and after clipping.
transformed_box_area = calculate_bbox_area(bbox, rows, cols)
bbox, tail = cast(BoxType, tuple(np.clip(bbox[:4], 0, 1.0))), tuple(bbox[4:])
clipped_box_area = calculate_bbox_area(bbox, rows, cols)
# Calculate width and height of the clipped bounding box.
x_min, y_min, x_max, y_max = denormalize_bbox(bbox, rows, cols)[:4]
clipped_width, clipped_height = x_max - x_min, y_max - y_min
if (
clipped_box_area != 0 # to ensure transformed_box_area!=0 and to handle min_area=0 or min_visibility=0
and clipped_box_area >= min_area
and clipped_box_area / transformed_box_area >= min_visibility
and clipped_width >= min_width
and clipped_height >= min_height
):
resulting_boxes.append(cast(BoxType, bbox + tail))
return resulting_boxes
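A small sketch of the filtering behavior (values are illustrative):
from albumentations.core.bbox_utils import filter_bboxes

# Normalized albumentations boxes; the second box lies mostly outside the frame
bboxes = [(0.1, 0.1, 0.4, 0.4), (0.9, 0.9, 1.5, 1.5)]
kept = filter_bboxes(bboxes, rows=480, cols=640, min_visibility=0.3)
# Only the first box survives: after clipping, the second keeps ~3% of its area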
def filter_bboxes_by_visibility (original_shape, bboxes, transformed_shape, transformed_bboxes, threshold=0.0, min_area=0.0)
[view source on GitHub]¶
Filter bounding boxes and return only those boxes whose visibility after transformation is above the threshold and whose minimal area in pixels is more than min_area.
Parameters:
Name | Type | Description |
---|---|---|
original_shape | Sequence[int] | Original image shape (height, width, ...). |
bboxes | Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]] | Original bounding boxes [(x_min, y_min, x_max, y_max)]. |
transformed_shape | Sequence[int] | Transformed image shape (height, width). |
transformed_bboxes | Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]] | Transformed bounding boxes [(x_min, y_min, x_max, y_max)]. |
threshold | float | visibility threshold. Should be a value in the range [0.0, 1.0]. |
min_area | float | Minimal area threshold. |
Returns:
Type | Description |
---|---|
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]] | Filtered bounding boxes [(x_min, y_min, x_max, y_max)]. |
Source code in albumentations/core/bbox_utils.py
def filter_bboxes_by_visibility(
original_shape: Sequence[int],
bboxes: Sequence[BoxType],
transformed_shape: Sequence[int],
transformed_bboxes: Sequence[BoxType],
threshold: float = 0.0,
min_area: float = 0.0,
) -> List[BoxType]:
"""Filter bounding boxes and return only those boxes whose visibility after transformation is above
the threshold and minimal area of bounding box in pixels is more than min_area.
Args:
original_shape: Original image shape `(height, width, ...)`.
bboxes: Original bounding boxes `[(x_min, y_min, x_max, y_max)]`.
transformed_shape: Transformed image shape `(height, width)`.
transformed_bboxes: Transformed bounding boxes `[(x_min, y_min, x_max, y_max)]`.
threshold: visibility threshold. Should be a value in the range [0.0, 1.0].
min_area: Minimal area threshold.
Returns:
Filtered bounding boxes `[(x_min, y_min, x_max, y_max)]`.
"""
img_height, img_width = original_shape[:2]
transformed_img_height, transformed_img_width = transformed_shape[:2]
visible_bboxes = []
for bbox, transformed_bbox in zip(bboxes, transformed_bboxes):
if not all(0.0 <= value <= 1.0 for value in transformed_bbox[:4]):
continue
bbox_area = calculate_bbox_area(bbox, img_height, img_width)
transformed_bbox_area = calculate_bbox_area(transformed_bbox, transformed_img_height, transformed_img_width)
if transformed_bbox_area < min_area:
continue
visibility = transformed_bbox_area / bbox_area
if visibility >= threshold:
visible_bboxes.append(transformed_bbox)
return visible_bboxes
def normalize_bbox (bbox, rows, cols)
[view source on GitHub]¶
Normalize coordinates of a bounding box. Divide x-coordinates by image width and y-coordinates by image height.
Parameters:
Name | Type | Description |
---|---|---|
bbox | Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]] | Denormalized bounding box (x_min, y_min, x_max, y_max). |
rows | int | Image height. |
cols | int | Image width. |
Returns:
Type | Description |
---|---|
Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]] | Normalized bounding box (x_min, y_min, x_max, y_max). |
Exceptions:
Type | Description |
---|---|
ValueError | If rows or cols is less than or equal to zero. |
Source code in albumentations/core/bbox_utils.py
def normalize_bbox(bbox: BoxType, rows: int, cols: int) -> BoxType:
"""Normalize coordinates of a bounding box. Divide x-coordinates by image width and y-coordinates
by image height.
Args:
bbox: Denormalized bounding box `(x_min, y_min, x_max, y_max)`.
rows: Image height.
cols: Image width.
Returns:
Normalized bounding box `(x_min, y_min, x_max, y_max)`.
Raises:
ValueError: If rows or cols is less or equal zero
"""
if rows <= 0:
msg = "Argument rows must be positive integer"
raise ValueError(msg)
if cols <= 0:
msg = "Argument cols must be positive integer"
raise ValueError(msg)
tail: Tuple[Any, ...]
(x_min, y_min, x_max, y_max), tail = bbox[:4], tuple(bbox[4:])
x_min /= cols
x_max /= cols
y_min /= rows
y_max /= rows
return cast(BoxType, (x_min, y_min, x_max, y_max, *tail))
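A minimal round-trip sketch for normalize_bbox and its inverse:
from albumentations.core.bbox_utils import denormalize_bbox, normalize_bbox

bbox = (97, 12, 247, 212)  # pixel coordinates, pascal_voc style
norm = normalize_bbox(bbox, rows=480, cols=640)      # fractions of image size
pixels = denormalize_bbox(norm, rows=480, cols=640)  # back to the original, up to float error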
def normalize_bboxes (bboxes, rows, cols)
[view source on GitHub]¶
Normalize a list of bounding boxes.
Parameters:
Name | Type | Description |
---|---|---|
bboxes | Sequence[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]] | Denormalized bounding boxes [(x_min, y_min, x_max, y_max)]. |
rows | int | Image height. |
cols | int | Image width. |
Returns:
Type | Description |
---|---|
List[Union[Tuple[float, float, float, float], Tuple[float, float, float, float, Any]]] | Normalized bounding boxes [(x_min, y_min, x_max, y_max)]. |
Source code in albumentations/core/bbox_utils.py
def normalize_bboxes(bboxes: Sequence[BoxType], rows: int, cols: int) -> List[BoxType]:
"""Normalize a list of bounding boxes.
Args:
bboxes: Denormalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.
rows: Image height.
cols: Image width.
Returns:
Normalized bounding boxes `[(x_min, y_min, x_max, y_max)]`.
"""
return [normalize_bbox(bbox, rows, cols) for bbox in bboxes]
def union_of_bboxes (height, width, bboxes, erosion_rate=0.0)
[view source on GitHub]¶
Calculate union of bounding boxes.
Parameters:
Name | Type | Description |
---|---|---|
height | float | Height of image or space. |
width | float | Width of image or space. |
bboxes | List[tuple] | List-like bounding boxes. Format is [(x_min, y_min, x_max, y_max)]. |
erosion_rate | float | How much each bounding box can be shrunk, useful for erosive cropping. Set this in the range [0, 1]. 0 will not be erosive at all, 1.0 can make any bbox lose its volume entirely. |
Returns:
Type | Description |
---|---|
tuple | A bounding box (x_min, y_min, x_max, y_max). |
Source code in albumentations/core/bbox_utils.py
def union_of_bboxes(height: int, width: int, bboxes: Sequence[BoxType], erosion_rate: float = 0.0) -> BoxInternalType:
"""Calculate union of bounding boxes.
Args:
height (float): Height of image or space.
width (float): Width of image or space.
bboxes (List[tuple]): List like bounding boxes. Format is `[(x_min, y_min, x_max, y_max)]`.
erosion_rate (float): How much each bounding box can be shrunk, useful for erosive cropping.
Set this in range [0, 1]. 0 will not be erosive at all, 1.0 can make any bbox lose its volume.
Returns:
tuple: A bounding box `(x_min, y_min, x_max, y_max)`.
"""
x1, y1 = width, height
x2, y2 = 0, 0
for bbox in bboxes:
x_min, y_min, x_max, y_max = bbox[:4]
w, h = x_max - x_min, y_max - y_min
lim_x1, lim_y1 = x_min + erosion_rate * w, y_min + erosion_rate * h
lim_x2, lim_y2 = x_max - erosion_rate * w, y_max - erosion_rate * h
x1, y1 = np.min([x1, lim_x1]), np.min([y1, lim_y1])
x2, y2 = np.max([x2, lim_x2]), np.max([y2, lim_y2])
return x1, y1, x2, y2
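A small worked example (normalized coordinates, so height and width are 1.0):
from albumentations.core.bbox_utils import union_of_bboxes

bboxes = [(0.1, 0.2, 0.4, 0.5), (0.3, 0.1, 0.7, 0.6)]
union = union_of_bboxes(height=1.0, width=1.0, bboxes=bboxes)
# union == (0.1, 0.1, 0.7, 0.6): the smallest box covering both inputs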
composition
¶
class Compose
(transforms, bbox_params=None, keypoint_params=None, additional_targets=None, p=1.0, is_check_shapes=True)
[view source on GitHub] ¶
Compose transforms and handle all transformations regarding bounding boxes
Parameters:
Name | Type | Description |
---|---|---|
transforms | list | list of transformations to compose. |
bbox_params | BboxParams | Parameters for bounding boxes transforms |
keypoint_params | KeypointParams | Parameters for keypoints transforms |
additional_targets | dict | Dict with keys - new target name, values - old target name. ex: {'image2': 'image'} |
p | float | probability of applying the whole list of transforms. Default: 1.0. |
is_check_shapes | bool | If True, the shape consistency of images/mask/masks is checked on each call. To disable this check, pass False (do it only if you are sure your data is consistent). |
Source code in albumentations/core/composition.py
class Compose(BaseCompose):
"""Compose transforms and handle all transformations regarding bounding boxes
Args:
transforms (list): list of transformations to compose.
bbox_params (BboxParams): Parameters for bounding boxes transforms
keypoint_params (KeypointParams): Parameters for keypoints transforms
additional_targets (dict): Dict with keys - new target name, values - old target name. ex: {'image2': 'image'}
p (float): probability of applying all list of transforms. Default: 1.0.
is_check_shapes (bool): If True shapes consistency of images/mask/masks would be checked on each call. If you
would like to disable this check - pass False (do it only if you are sure in your data consistency).
"""
def __init__(
self,
transforms: TransformsSeqType,
bbox_params: Optional[Union[Dict[str, Any], "BboxParams"]] = None,
keypoint_params: Optional[Union[Dict[str, Any], "KeypointParams"]] = None,
additional_targets: Optional[Dict[str, str]] = None,
p: float = 1.0,
is_check_shapes: bool = True,
):
super().__init__(transforms, p)
self.processors: Dict[str, Union[BboxProcessor, KeypointsProcessor]] = {}
if bbox_params:
if isinstance(bbox_params, dict):
b_params = BboxParams(**bbox_params)
elif isinstance(bbox_params, BboxParams):
b_params = bbox_params
else:
msg = "unknown format of bbox_params, please use `dict` or `BboxParams`"
raise ValueError(msg)
self.processors["bboxes"] = BboxProcessor(b_params, additional_targets)
if keypoint_params:
if isinstance(keypoint_params, dict):
k_params = KeypointParams(**keypoint_params)
elif isinstance(keypoint_params, KeypointParams):
k_params = keypoint_params
else:
msg = "unknown format of keypoint_params, please use `dict` or `KeypointParams`"
raise ValueError(msg)
self.processors["keypoints"] = KeypointsProcessor(k_params, additional_targets)
if additional_targets is None:
additional_targets = {}
self.additional_targets = additional_targets
for proc in self.processors.values():
proc.ensure_transforms_valid(self.transforms)
self.add_targets(additional_targets)
self.is_check_args = True
self._disable_check_args_for_transforms(self.transforms)
self.is_check_shapes = is_check_shapes
@staticmethod
def _disable_check_args_for_transforms(transforms: TransformsSeqType) -> None:
for transform in transforms:
if isinstance(transform, BaseCompose):
Compose._disable_check_args_for_transforms(transform.transforms)
if isinstance(transform, Compose):
transform.disable_check_args_private()
def disable_check_args_private(self) -> None:
self.is_check_args = False
def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> Dict[str, Any]:
if args:
msg = "You have to pass data to augmentations as named arguments, for example: aug(image=image)"
raise KeyError(msg)
if self.is_check_args:
self._check_args(**data)
if not isinstance(force_apply, (bool, int)):
msg = "force_apply must have bool or int type"
raise TypeError(msg)
need_to_run = force_apply or random.random() < self.p
for p in self.processors.values():
p.ensure_data_valid(data)
transforms = self.transforms if need_to_run else get_always_apply(self.transforms)
check_each_transform = any(
getattr(item.params, "check_each_transform", False) for item in self.processors.values()
)
for p in self.processors.values():
p.preprocess(data)
for t in transforms:
data = t(**data)
if check_each_transform:
data = self._check_data_post_transform(data)
data = Compose._make_targets_contiguous(data) # ensure output targets are contiguous
for p in self.processors.values():
p.postprocess(data)
return data
def _check_data_post_transform(self, data: Any) -> Dict[str, Any]:
rows, cols = get_shape(data["image"])
for p in self.processors.values():
if not getattr(p.params, "check_each_transform", False):
continue
for data_name in p.data_fields:
data[data_name] = p.filter(data[data_name], rows, cols)
return data
def to_dict_private(self) -> Dict[str, Any]:
dictionary = super().to_dict_private()
bbox_processor = self.processors.get("bboxes")
keypoints_processor = self.processors.get("keypoints")
dictionary.update(
{
"bbox_params": bbox_processor.params.to_dict_private() if bbox_processor else None,
"keypoint_params": (keypoints_processor.params.to_dict_private() if keypoints_processor else None),
"additional_targets": self.additional_targets,
"is_check_shapes": self.is_check_shapes,
}
)
return dictionary
def get_dict_with_id(self) -> Dict[str, Any]:
dictionary = super().get_dict_with_id()
bbox_processor = self.processors.get("bboxes")
keypoints_processor = self.processors.get("keypoints")
dictionary.update(
{
"bbox_params": bbox_processor.params.to_dict_private() if bbox_processor else None,
"keypoint_params": (keypoints_processor.params.to_dict_private() if keypoints_processor else None),
"additional_targets": self.additional_targets,
"params": None,
"is_check_shapes": self.is_check_shapes,
}
)
return dictionary
def _check_args(self, **kwargs: Any) -> None:
checked_single = ["image", "mask"]
checked_multi = ["masks"]
check_bbox_param = ["bboxes"]
shapes = []
for data_name, data in kwargs.items():
internal_data_name = self.additional_targets.get(data_name, data_name)
if internal_data_name in checked_single:
if not isinstance(data, np.ndarray):
raise TypeError(f"{data_name} must be numpy array type")
shapes.append(data.shape[:2])
if internal_data_name in checked_multi and data is not None and len(data):
if not isinstance(data[0], np.ndarray):
raise TypeError(f"{data_name} must be list of numpy arrays")
shapes.append(data[0].shape[:2])
if internal_data_name in check_bbox_param and self.processors.get("bboxes") is None:
msg = "bbox_params must be specified for bbox transformations"
raise ValueError(msg)
if self.is_check_shapes and shapes and shapes.count(shapes[0]) != len(shapes):
msg = (
"Height and Width of image, mask or masks should be equal. You can disable shapes check "
"by setting a parameter is_check_shapes=False of Compose class (do it only if you are sure "
"about your data consistency)."
)
raise ValueError(msg)
@staticmethod
def _make_targets_contiguous(data: Any) -> Dict[str, Any]:
result = {}
for key, value in data.items():
if isinstance(value, np.ndarray):
result[key] = np.ascontiguousarray(value)
else:
result[key] = value
return result
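A minimal usage sketch; note that targets must be passed as named arguments:
import albumentations as A
import numpy as np

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
])

image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]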
class OneOf
(transforms, p=0.5)
[view source on GitHub] ¶
Select one of the transforms to apply. The selected transform will be called with force_apply=True. Transform probabilities will be normalized to sum to 1, so in this case they work as weights.
Parameters:
Name | Type | Description |
---|---|---|
transforms | list | list of transformations to compose. |
p | float | probability of applying selected transform. Default: 0.5. |
Source code in albumentations/core/composition.py
class OneOf(BaseCompose):
"""Select one of transforms to apply. Selected transform will be called with `force_apply=True`.
Transform probabilities will be normalized to sum to 1, so in this case they work as weights.
Args:
transforms (list): list of transformations to compose.
p (float): probability of applying selected transform. Default: 0.5.
"""
def __init__(self, transforms: TransformsSeqType, p: float = 0.5):
super().__init__(transforms, p)
transforms_ps = [t.p for t in self.transforms]
s = sum(transforms_ps)
self.transforms_ps = [t / s for t in transforms_ps]
def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> Dict[str, Any]:
if self.replay_mode:
for t in self.transforms:
data = t(**data)
return data
if self.transforms_ps and (force_apply or random.random() < self.p):
idx: int = random_utils.choice(len(self.transforms), p=self.transforms_ps)
t = self.transforms[idx]
data = t(force_apply=True, **data)
return data
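A minimal sketch of the weighting behavior (the transform choices are illustrative):
import albumentations as A

transform = A.Compose([
    A.OneOf(
        [
            A.MotionBlur(p=1.0),    # inner p values act as relative weights: 1.0 / 1.5
            A.GaussianBlur(p=0.5),  # 0.5 / 1.5
        ],
        p=0.9,  # probability that OneOf applies anything at all
    ),
])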
class OneOrOther
(first=None, second=None, transforms=None, p=0.5)
[view source on GitHub] ¶
Select one or another transform to apply. The selected transform will be called with force_apply=True.
Source code in albumentations/core/composition.py
class OneOrOther(BaseCompose):
"""Select one or another transform to apply. Selected transform will be called with `force_apply=True`."""
def __init__(
self,
first: Optional[TransformType] = None,
second: Optional[TransformType] = None,
transforms: Optional[TransformsSeqType] = None,
p: float = 0.5,
):
if transforms is None:
if first is None or second is None:
msg = "You must set both first and second or set transforms argument."
raise ValueError(msg)
transforms = [first, second]
super().__init__(transforms, p)
if len(self.transforms) != TWO:
warnings.warn("Length of transforms is not equal to 2.")
def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> Dict[str, Any]:
if self.replay_mode:
for t in self.transforms:
data = t(**data)
return data
if random.random() < self.p:
return self.transforms[0](force_apply=True, **data)
return self.transforms[-1](force_apply=True, **data)
class PerChannel
(transforms, channels=None, p=0.5)
[view source on GitHub] ¶
Apply transformations per-channel
Parameters:
Name | Type | Description |
---|---|---|
transforms | list | list of transformations to compose. |
channels | sequence | channels to apply the transform to. Pass None to apply to all. Default: None (apply to all). |
p | float | probability of applying the transform. Default: 0.5. |
Source code in albumentations/core/composition.py
class PerChannel(BaseCompose):
"""Apply transformations per-channel
Args:
transforms (list): list of transformations to compose.
channels (sequence): channels to apply the transform to. Pass None to apply to all.
Default: None (apply to all)
p (float): probability of applying the transform. Default: 0.5.
"""
def __init__(self, transforms: TransformsSeqType, channels: Optional[Sequence[int]] = None, p: float = 0.5):
super().__init__(transforms, p)
self.channels = channels
def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> Dict[str, Any]:
if force_apply or random.random() < self.p:
image = data["image"]
# Expand mono images to have a single channel
if len(image.shape) == TWO:
image = np.expand_dims(image, -1)
if self.channels is None:
self.channels = range(image.shape[2])
for c in self.channels:
for t in self.transforms:
image[:, :, c] = t(image=image[:, :, c])["image"]
data["image"] = image
return data
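A minimal usage sketch (the inner transform and channel indices are illustrative):
import albumentations as A
import numpy as np

# Blur only channels 0 and 2, leaving channel 1 untouched
per_channel = A.PerChannel([A.Blur(blur_limit=3, p=1.0)], channels=[0, 2], p=1.0)
image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
result = per_channel(image=image)["image"]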
class Sequential
(transforms, p=0.5)
[view source on GitHub] ¶
Sequentially applies all transforms to targets.
Note
This transform is not intended to be a replacement for Compose
. Instead, it should be used inside Compose
the same way OneOf
or OneOrOther
are used. For instance, you can combine OneOf
with Sequential
to create an augmentation pipeline that contains multiple sequences of augmentations and applies one randomly chose sequence to input data (see the Example
section for an example definition of such pipeline).
Examples:
>>> import albumentations as A
>>> transform = A.Compose([
>>> A.OneOf([
>>> A.Sequential([
>>> A.HorizontalFlip(p=0.5),
>>> A.ShiftScaleRotate(p=0.5),
>>> ]),
>>> A.Sequential([
>>> A.VerticalFlip(p=0.5),
>>> A.RandomBrightnessContrast(p=0.5),
>>> ]),
>>> ], p=1)
>>> ])
Source code in albumentations/core/composition.py
class Sequential(BaseCompose):
"""Sequentially applies all transforms to targets.
Note:
This transform is not intended to be a replacement for `Compose`. Instead, it should be used inside `Compose`
the same way `OneOf` or `OneOrOther` are used. For instance, you can combine `OneOf` with `Sequential` to
create an augmentation pipeline that contains multiple sequences of augmentations and applies one randomly
chosen sequence to input data (see the `Example` section for an example definition of such pipeline).
Example:
>>> import albumentations as A
>>> transform = A.Compose([
>>> A.OneOf([
>>> A.Sequential([
>>> A.HorizontalFlip(p=0.5),
>>> A.ShiftScaleRotate(p=0.5),
>>> ]),
>>> A.Sequential([
>>> A.VerticalFlip(p=0.5),
>>> A.RandomBrightnessContrast(p=0.5),
>>> ]),
>>> ], p=1)
>>> ])
"""
def __init__(self, transforms: TransformsSeqType, p: float = 0.5):
super().__init__(transforms, p)
def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> Dict[str, Any]:
for t in self.transforms:
data = t(**data)
return data
class SomeOf
(transforms, n, replace=True, p=1)
[view source on GitHub] ¶
Select N transforms to apply. The selected transforms will be called with force_apply=True. Transform probabilities will be normalized to sum to 1, so in this case they work as weights.
Parameters:
Name | Type | Description |
---|---|---|
transforms | list | list of transformations to compose. |
n | int | number of transforms to apply. |
replace | bool | Whether the sampled transforms are with or without replacement. Default: True. |
p | float | probability of applying selected transform. Default: 1. |
Source code in albumentations/core/composition.py
class SomeOf(BaseCompose):
"""Select N transforms to apply. Selected transforms will be called with `force_apply=True`.
Transform probabilities will be normalized to sum to 1, so in this case they act as sampling weights.
Args:
transforms (list): list of transformations to compose.
n (int): number of transforms to apply.
replace (bool): Whether the sampled transforms are with or without replacement. Default: True.
p (float): probability of applying selected transform. Default: 1.
"""
def __init__(self, transforms: TransformsSeqType, n: int, replace: bool = True, p: float = 1):
super().__init__(transforms, p)
self.n = n
self.replace = replace
transforms_ps = [t.p for t in self.transforms]
s = sum(transforms_ps)
self.transforms_ps = [t / s for t in transforms_ps]
def __call__(self, *arg: Any, force_apply: bool = False, **data: Any) -> Dict[str, Any]:
if self.replay_mode:
for t in self.transforms:
data = t(**data)
return data
if self.transforms_ps and (force_apply or random.random() < self.p):
idx = random_utils.choice(len(self.transforms), size=self.n, replace=self.replace, p=self.transforms_ps)
for i in idx:
t = self.transforms[i]
data = t(force_apply=True, **data)
return data
def to_dict_private(self) -> Dict[str, Any]:
dictionary = super().to_dict_private()
dictionary.update({"n": self.n, "replace": self.replace})
return dictionary
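For example, a sketch of SomeOf sampling two of three transforms per call (the transform choices are illustrative); the per-transform p values act as sampling weights after normalization.
import numpy as np
import albumentations as A

# Sketch: pick 2 of 3 transforms without replacement; p values become weights.
transform = A.SomeOf(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.3),
        A.Blur(p=0.2),
    ],
    n=2,
    replace=False,
    p=1.0,
)

image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]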
keypoints_utils
¶
class KeypointParams
(format, label_fields=None, remove_invisible=True, angle_in_degrees=True, check_each_transform=True)
[view source on GitHub] ¶
Parameters of keypoints
Parameters:
Name | Type | Description |
---|---|---|
format | str | format of keypoints. Should be 'xy', 'yx', 'xya', 'xys', 'xyas', or 'xysa', where x is the X coordinate, y is the Y coordinate, s is the keypoint scale, and a is the keypoint orientation in radians or degrees (depending on KeypointParams.angle_in_degrees). |
label_fields | list | list of fields that are joined with keypoints, e.g., labels. Should be the same type as keypoints. |
remove_invisible | bool | whether to remove keypoints that fall outside the image after the transform. Default: True. |
angle_in_degrees | bool | whether the angle is in degrees (True) or radians (False) for 'xya', 'xyas', and 'xysa' keypoints. Default: True. |
check_each_transform | bool | if True, keypoints will be checked after each dual transform. Default: True. |
Source code in albumentations/core/keypoints_utils.py
class KeypointParams(Params):
"""Parameters of keypoints
Args:
format (str): format of keypoints. Should be 'xy', 'yx', 'xya', 'xys', 'xyas', 'xysa'.
x - X coordinate,
y - Y coordinate
s - Keypoint scale
a - Keypoint orientation in radians or degrees (depending on KeypointParams.angle_in_degrees)
label_fields (list): list of fields that are joined with keypoints, e.g labels.
Should be same type as keypoints.
remove_invisible (bool): to remove invisible points after transform or not
angle_in_degrees (bool): angle in degrees or radians in 'xya', 'xyas', 'xysa' keypoints
check_each_transform (bool): if `True`, then keypoints will be checked after each dual transform.
Default: `True`
"""
def __init__(
self,
format: str,
label_fields: Optional[Sequence[str]] = None,
remove_invisible: bool = True,
angle_in_degrees: bool = True,
check_each_transform: bool = True,
):
super().__init__(format, label_fields)
self.remove_invisible = remove_invisible
self.angle_in_degrees = angle_in_degrees
self.check_each_transform = check_each_transform
def to_dict_private(self) -> Dict[str, Any]:
data = super().to_dict_private()
data.update(
{
"remove_invisible": self.remove_invisible,
"angle_in_degrees": self.angle_in_degrees,
"check_each_transform": self.check_each_transform,
}
)
return data
@classmethod
def is_serializable(cls) -> bool:
return True
@classmethod
def get_class_fullname(cls) -> str:
return "KeypointParams"
class KeypointsProcessor
(params, additional_targets=None)
[view source on GitHub] ¶
Source code in albumentations/core/keypoints_utils.py
class KeypointsProcessor(DataProcessor):
def __init__(self, params: KeypointParams, additional_targets: Optional[Dict[str, str]] = None):
super().__init__(params, additional_targets)
@property
def default_data_name(self) -> str:
return "keypoints"
def ensure_data_valid(self, data: Dict[str, Any]) -> None:
if self.params.label_fields and not all(i in data for i in self.params.label_fields):
msg = "Your 'label_fields' are not valid - them must have same names as params in " "'keypoint_params' dict"
raise ValueError(msg)
def filter(self, data: Sequence[KeypointType], rows: int, cols: int) -> Sequence[KeypointType]:
"""The function filters a sequence of data based on the number of rows and columns, and returns a
sequence of keypoints.
:param data: The `data` parameter is a sequence of sequences. Each inner sequence represents a
set of keypoints
:type data: Sequence[Sequence]
:param rows: The `rows` parameter represents the number of rows in the data matrix. It specifies
the number of rows that will be used for filtering the keypoints
:type rows: int
:param cols: The parameter "cols" represents the number of columns in the grid that the
keypoints will be filtered on
:type cols: int
:return: a sequence of KeypointType objects.
"""
self.params: KeypointParams
return filter_keypoints(data, rows, cols, remove_invisible=self.params.remove_invisible)
def check(self, data: Sequence[KeypointType], rows: int, cols: int) -> None:
check_keypoints(data, rows, cols)
def convert_from_albumentations(self, data: Sequence[KeypointType], rows: int, cols: int) -> List[KeypointType]:
params = self.params
return convert_keypoints_from_albumentations(
data,
params.format,
rows,
cols,
check_validity=params.remove_invisible,
angle_in_degrees=params.angle_in_degrees,
)
def convert_to_albumentations(self, data: Sequence[KeypointType], rows: int, cols: int) -> List[KeypointType]:
params = self.params
return convert_keypoints_to_albumentations(
data,
params.format,
rows,
cols,
check_validity=params.remove_invisible,
angle_in_degrees=params.angle_in_degrees,
)
filter (self, data, rows, cols)
¶
Filters a sequence of keypoints based on the image size (rows and columns) and returns the keypoints that remain valid.
Parameters:
Name | Type | Description |
---|---|---|
data | Sequence[KeypointType] | the keypoints to filter; each inner sequence represents one keypoint. |
rows | int | number of rows (image height) used for filtering the keypoints. |
cols | int | number of columns (image width) used for filtering the keypoints. |
Returns:
Type | Description |
---|---|
Sequence[KeypointType] | a sequence of KeypointType objects. |
Source code in albumentations/core/keypoints_utils.py
def filter(self, data: Sequence[KeypointType], rows: int, cols: int) -> Sequence[KeypointType]:
"""The function filters a sequence of data based on the number of rows and columns, and returns a
sequence of keypoints.
:param data: The `data` parameter is a sequence of sequences. Each inner sequence represents a
set of keypoints
:type data: Sequence[Sequence]
:param rows: The `rows` parameter represents the number of rows in the data matrix. It specifies
the number of rows that will be used for filtering the keypoints
:type rows: int
:param cols: The parameter "cols" represents the number of columns in the grid that the
keypoints will be filtered on
:type cols: int
:return: a sequence of KeypointType objects.
"""
self.params: KeypointParams
return filter_keypoints(data, rows, cols, remove_invisible=self.params.remove_invisible)
def check_keypoint (kp, rows, cols)
[view source on GitHub]¶
Check that keypoint coordinates lie within the image bounds
Source code in albumentations/core/keypoints_utils.py
def check_keypoint(kp: KeypointType, rows: int, cols: int) -> None:
"""Check if keypoint coordinates are less than image shapes"""
for name, value, size in zip(["x", "y"], kp[:2], [cols, rows]):
if not 0 <= value < size:
raise ValueError(f"Expected {name} for keypoint {kp} " f"to be in the range [0.0, {size}], got {value}.")
angle = kp[2]
if not (0 <= angle < 2 * math.pi):
raise ValueError(f"Keypoint angle must be in range [0, 2 * PI). Got: {angle}")
def check_keypoints (keypoints, rows, cols)
[view source on GitHub]¶
Check that all keypoints lie within the image bounds
serialization
¶
class Serializable
[view source on GitHub] ¶
Source code in albumentations/core/serialization.py
class Serializable(metaclass=SerializableMeta):
@classmethod
@abstractmethod
def is_serializable(cls) -> bool:
raise NotImplementedError
@classmethod
@abstractmethod
def get_class_fullname(cls) -> str:
raise NotImplementedError
@abstractmethod
def to_dict_private(self) -> Dict[str, Any]:
raise NotImplementedError
def to_dict(self, on_not_implemented_error: str = "raise") -> Dict[str, Any]:
"""Take a transform pipeline and convert it to a serializable representation that uses only standard
python data types: dictionaries, lists, strings, integers, and floats.
Args:
self: A transform that should be serialized. If the transform doesn't implement the `to_dict`
method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised.
If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored
but no transform parameters will be serialized.
on_not_implemented_error (str): `raise` or `warn`.
"""
if on_not_implemented_error not in {"raise", "warn"}:
msg = f"Unknown on_not_implemented_error value: {on_not_implemented_error}. Supported values are: 'raise' "
"and 'warn'"
raise ValueError(msg)
try:
transform_dict = self.to_dict_private()
except NotImplementedError:
if on_not_implemented_error == "raise":
raise
transform_dict = {}
warnings.warn(
f"Got NotImplementedError while trying to serialize {self}. Object arguments are not preserved. "
f"Implement either '{self.__class__.__name__}.get_transform_init_args_names' "
f"or '{self.__class__.__name__}.get_transform_init_args' "
"method to make the transform serializable"
)
return {"__version__": __version__, "transform": transform_dict}
to_dict (self, on_not_implemented_error='raise')
¶
Take a transform pipeline and convert it to a serializable representation that uses only standard python data types: dictionaries, lists, strings, integers, and floats.
Parameters:
Name | Type | Description |
---|---|---|
self | | A transform that should be serialized. If the transform doesn't implement the to_dict method and on_not_implemented_error equals 'raise', then NotImplementedError is raised. If on_not_implemented_error equals 'warn', the NotImplementedError is ignored but no transform parameters will be serialized. |
on_not_implemented_error | str | 'raise' or 'warn'. Default: 'raise'. |
Source code in albumentations/core/serialization.py
def to_dict(self, on_not_implemented_error: str = "raise") -> Dict[str, Any]:
"""Take a transform pipeline and convert it to a serializable representation that uses only standard
python data types: dictionaries, lists, strings, integers, and floats.
Args:
self: A transform that should be serialized. If the transform doesn't implement the `to_dict`
method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised.
If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored
but no transform parameters will be serialized.
on_not_implemented_error (str): `raise` or `warn`.
"""
if on_not_implemented_error not in {"raise", "warn"}:
msg = f"Unknown on_not_implemented_error value: {on_not_implemented_error}. Supported values are: 'raise' "
"and 'warn'"
raise ValueError(msg)
try:
transform_dict = self.to_dict_private()
except NotImplementedError:
if on_not_implemented_error == "raise":
raise
transform_dict = {}
warnings.warn(
f"Got NotImplementedError while trying to serialize {self}. Object arguments are not preserved. "
f"Implement either '{self.__class__.__name__}.get_transform_init_args_names' "
f"or '{self.__class__.__name__}.get_transform_init_args' "
"method to make the transform serializable"
)
return {"__version__": __version__, "transform": transform_dict}
class SerializableMeta
[view source on GitHub] ¶
A metaclass that is used to register classes in SERIALIZABLE_REGISTRY
or NON_SERIALIZABLE_REGISTRY
so they can be found later when deserializing a transformation pipeline using the classes' full names.
Source code in albumentations/core/serialization.py
class SerializableMeta(ABCMeta):
"""A metaclass that is used to register classes in `SERIALIZABLE_REGISTRY` or `NON_SERIALIZABLE_REGISTRY`
so they can be found later when deserializing a transformation pipeline using the classes' full names.
"""
def __new__(cls, name: str, bases: Tuple[type, ...], *args: Any, **kwargs: Any) -> "SerializableMeta":
cls_obj = super().__new__(cls, name, bases, *args, **kwargs)
if name != "Serializable" and ABC not in bases:
if cls_obj.is_serializable():
SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
else:
NON_SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
return cls_obj
@classmethod
def is_serializable(cls) -> bool:
return False
@classmethod
def get_class_fullname(cls) -> str:
return get_shortest_class_fullname(cls)
@classmethod
def _to_dict(cls) -> Dict[str, Any]:
return {}
__new__ (cls, name, bases, *args, **kwargs)
special
staticmethod
¶
Create and return a new object. See help(type) for accurate signature.
Source code in albumentations/core/serialization.py
def __new__(cls, name: str, bases: Tuple[type, ...], *args: Any, **kwargs: Any) -> "SerializableMeta":
cls_obj = super().__new__(cls, name, bases, *args, **kwargs)
if name != "Serializable" and ABC not in bases:
if cls_obj.is_serializable():
SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
else:
NON_SERIALIZABLE_REGISTRY[cls_obj.get_class_fullname()] = cls_obj
return cls_obj
def from_dict (transform_dict, nonserializable=None)
[view source on GitHub]¶
Parameters:
Name | Type | Description |
---|---|---|
transform_dict | dict | A dictionary with a serialized transform pipeline. |
nonserializable | dict | A dictionary that contains non-serializable transforms. This dictionary is required when you are restoring a pipeline that contains non-serializable transforms. Keys in that dictionary should be named the same as the name arguments in the respective transforms from the serialized pipeline. |
Source code in albumentations/core/serialization.py
def from_dict(
transform_dict: Dict[str, Any], nonserializable: Optional[Dict[str, Any]] = None
) -> Optional[Serializable]:
"""Args:
transform_dict: A dictionary with serialized transform pipeline.
nonserializable (dict): A dictionary that contains non-serializable transforms.
This dictionary is required when you are restoring a pipeline that contains non-serializable transforms.
Keys in that dictionary should be named same as `name` arguments in respective transforms from
a serialized pipeline.
"""
register_additional_transforms()
transform = transform_dict["transform"]
lmbd = instantiate_nonserializable(transform, nonserializable)
if lmbd:
return lmbd
name = transform["__class_fullname__"]
args = {k: v for k, v in transform.items() if k != "__class_fullname__"}
cls = SERIALIZABLE_REGISTRY[shorten_class_name(name)]
if "transforms" in args:
args["transforms"] = [from_dict({"transform": t}, nonserializable=nonserializable) for t in args["transforms"]]
return cls(**args)
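A round-trip sketch (the pipeline contents are illustrative): to_dict produces plain Python types, and from_dict rebuilds an equivalent pipeline from them.
import albumentations as A

pipeline = A.Compose([A.HorizontalFlip(p=0.5), A.RandomBrightnessContrast(p=0.3)])
pipeline_dict = pipeline.to_dict()     # {'__version__': ..., 'transform': {...}}
restored = A.from_dict(pipeline_dict)  # an equivalent Compose instance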
def get_shortest_class_fullname (cls)
[view source on GitHub]¶
The function get_shortest_class_fullname
takes a class object as input and returns its shortened full name.
Parameters:
Name | Type | Description |
---|---|---|
cls | Type[BasicCompose] | the class whose shortened full name should be returned. |
Returns:
Type | Description |
---|---|
str | the shortened version of the full class name. |
Source code in albumentations/core/serialization.py
def get_shortest_class_fullname(cls: Type[Any]) -> str:
"""The function `get_shortest_class_fullname` takes a class object as input and returns its shortened
full name.
:param cls: The parameter `cls` is of type `Type[BasicCompose]`, which means it expects a class that
is a subclass of `BasicCompose`
:type cls: Type[BasicCompose]
:return: a string, which is the shortened version of the full class name.
"""
class_fullname = f"{cls.__module__}.{cls.__name__}"
return shorten_class_name(class_fullname)
def load (filepath_or_buffer, data_format='json', nonserializable=None)
[view source on GitHub]¶
Load a serialized pipeline from a file or file-like object and construct a transform pipeline.
Parameters:
Name | Type | Description |
---|---|---|
filepath_or_buffer | Union[str, Path, TextIO] | The file path or file-like object to read the serialized data from. If a string is provided, it is interpreted as a path to a file. If a file-like object is provided, the serialized data will be read from it directly. |
data_format | str | The format of the serialized data. Valid options are 'json' and 'yaml'. Defaults to 'json'. |
nonserializable | Optional[Dict[str, Any]] | A dictionary that contains non-serializable transforms. This dictionary is required when restoring a pipeline that contains non-serializable transforms. Keys in the dictionary should be named the same as the name arguments in the respective transforms from the serialized pipeline. Defaults to None. |
Returns:
Type | Description |
---|---|
object | The deserialized transform pipeline. |
Exceptions:
Type | Description |
---|---|
ValueError | If data_format is 'yaml' but PyYAML is not installed. |
Source code in albumentations/core/serialization.py
def load(
filepath_or_buffer: Union[str, Path, TextIO],
data_format: str = "json",
nonserializable: Optional[Dict[str, Any]] = None,
) -> object:
"""Load a serialized pipeline from a file or file-like object and construct a transform pipeline.
Args:
filepath_or_buffer (Union[str, Path, TextIO]): The file path or file-like object to read the serialized
data from.
If a string is provided, it is interpreted as a path to a file. If a file-like object is provided,
the serialized data will be read from it directly.
data_format (str): The format of the serialized data. Valid options are 'json' and 'yaml'.
Defaults to 'json'.
nonserializable (Optional[Dict[str, Any]]): A dictionary that contains non-serializable transforms.
This dictionary is required when restoring a pipeline that contains non-serializable transforms.
Keys in the dictionary should be named the same as the `name` arguments in respective transforms
from the serialized pipeline. Defaults to None.
Returns:
object: The deserialized transform pipeline.
Raises:
ValueError: If `data_format` is 'yaml' but PyYAML is not installed.
"""
check_data_format(data_format)
if isinstance(filepath_or_buffer, (str, Path)): # Assume it's a filepath
with open(filepath_or_buffer) as f:
if data_format == "json":
transform_dict = json.load(f)
else:
if not yaml_available:
msg = "You need to install PyYAML to load a pipeline in yaml format"
raise ValueError(msg)
transform_dict = yaml.safe_load(f)
elif data_format == "json":
transform_dict = json.load(filepath_or_buffer)
else:
if not yaml_available:
msg = "You need to install PyYAML to load a pipeline in yaml format"
raise ValueError(msg)
transform_dict = yaml.safe_load(filepath_or_buffer)
return from_dict(transform_dict, nonserializable=nonserializable)
def register_additional_transforms ()
[view source on GitHub]¶
Register transforms that are not imported directly into the albumentations
module by checking the availability of optional dependencies.
Source code in albumentations/core/serialization.py
def register_additional_transforms() -> None:
"""Register transforms that are not imported directly into the `albumentations` module by checking
the availability of optional dependencies.
"""
if importlib.util.find_spec("torch") is not None:
try:
# Import `albumentations.pytorch` only if `torch` is installed.
import albumentations.pytorch
# Use a dummy operation to acknowledge the use of the imported module and avoid linting errors.
_ = albumentations.pytorch.ToTensorV2
except ImportError:
pass
def save (transform, filepath_or_buffer, data_format='json', on_not_implemented_error='raise')
[view source on GitHub]¶
Serialize a transform pipeline and save it to either a file specified by a path or a file-like object in either JSON or YAML format.
Parameters:
Name | Type | Description |
---|---|---|
transform | Serializable | The transform pipeline to serialize. |
filepath_or_buffer | Union[str, Path, TextIO] | The file path or file-like object to write the serialized data to. If a string is provided, it is interpreted as a path to a file. If a file-like object is provided, the serialized data will be written to it directly. |
data_format | str | The format to serialize the data in. Valid options are 'json' and 'yaml'. Defaults to 'json'. |
on_not_implemented_error | str | Determines the behavior if a transform does not implement the to_dict method. If set to 'raise', a NotImplementedError is raised. If set to 'warn', the exception is ignored and no transform arguments are saved. Defaults to 'raise'. |
Exceptions:
Type | Description |
---|---|
ValueError | If data_format is 'yaml' but PyYAML is not installed. |
Source code in albumentations/core/serialization.py
def save(
transform: "Serializable",
filepath_or_buffer: Union[str, Path, TextIO],
data_format: str = "json",
on_not_implemented_error: str = "raise",
) -> None:
"""Serialize a transform pipeline and save it to either a file specified by a path or a file-like object
in either JSON or YAML format.
Args:
transform (Serializable): The transform pipeline to serialize.
filepath_or_buffer (Union[str, Path, TextIO]): The file path or file-like object to write the serialized
data to.
If a string is provided, it is interpreted as a path to a file. If a file-like object is provided,
the serialized data will be written to it directly.
data_format (str): The format to serialize the data in. Valid options are 'json' and 'yaml'.
Defaults to 'json'.
on_not_implemented_error (str): Determines the behavior if a transform does not implement the `to_dict` method.
If set to 'raise', a `NotImplementedError` is raised. If set to 'warn', the exception is ignored, and
no transform arguments are saved. Defaults to 'raise'.
Raises:
ValueError: If `data_format` is 'yaml' but PyYAML is not installed.
"""
check_data_format(data_format)
transform_dict = transform.to_dict(on_not_implemented_error=on_not_implemented_error)
transform_dict = serialize_enum(transform_dict)
# Determine whether to write to a file or a file-like object
if isinstance(filepath_or_buffer, (str, Path)): # It's a filepath
with open(filepath_or_buffer, "w") as f:
if data_format == "yaml":
if not yaml_available:
msg = "You need to install PyYAML to save a pipeline in YAML format"
raise ValueError(msg)
yaml.safe_dump(transform_dict, f, default_flow_style=False)
elif data_format == "json":
json.dump(transform_dict, f)
elif data_format == "yaml":
if not yaml_available:
msg = "You need to install PyYAML to save a pipeline in YAML format"
raise ValueError(msg)
yaml.safe_dump(transform_dict, filepath_or_buffer, default_flow_style=False)
elif data_format == "json":
json.dump(transform_dict, filepath_or_buffer)
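A save/load sketch (the file path and pipeline contents are illustrative):
import albumentations as A

pipeline = A.Compose([A.Resize(256, 256), A.HorizontalFlip(p=0.5)])

# JSON is the default; pass data_format="yaml" for YAML (requires PyYAML).
A.save(pipeline, "/tmp/pipeline.json")
restored = A.load("/tmp/pipeline.json")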
def serialize_enum (obj)
[view source on GitHub]¶
Recursively search for Enum objects and convert them to their value. Also handle any Mapping or Sequence types.
Source code in albumentations/core/serialization.py
def serialize_enum(obj: Any) -> Any:
"""Recursively search for Enum objects and convert them to their value.
Also handle any Mapping or Sequence types.
"""
if isinstance(obj, Mapping):
return {k: serialize_enum(v) for k, v in obj.items()}
if isinstance(obj, Sequence) and not isinstance(obj, str): # exclude strings since they're also sequences
return [serialize_enum(v) for v in obj]
return obj.value if isinstance(obj, Enum) else obj
def to_dict (transform, on_not_implemented_error='raise')
[view source on GitHub]¶
Take a transform pipeline and convert it to a serializable representation that uses only standard python data types: dictionaries, lists, strings, integers, and floats.
Parameters:
Name | Type | Description |
---|---|---|
transform | Serializable | A transform that should be serialized. If the transform doesn't implement the to_dict method and on_not_implemented_error equals 'raise', then NotImplementedError is raised. If on_not_implemented_error equals 'warn', the NotImplementedError is ignored but no transform parameters will be serialized. |
on_not_implemented_error | str | 'raise' or 'warn'. Default: 'raise'. |
Source code in albumentations/core/serialization.py
def to_dict(transform: Serializable, on_not_implemented_error: str = "raise") -> Dict[str, Any]:
"""Take a transform pipeline and convert it to a serializable representation that uses only standard
python data types: dictionaries, lists, strings, integers, and floats.
Args:
transform: A transform that should be serialized. If the transform doesn't implement the `to_dict`
method and `on_not_implemented_error` equals to 'raise' then `NotImplementedError` is raised.
If `on_not_implemented_error` equals to 'warn' then `NotImplementedError` will be ignored
but no transform parameters will be serialized.
on_not_implemented_error (str): `raise` or `warn`.
"""
return transform.to_dict(on_not_implemented_error)
transforms_interface
¶
class BasicTransform
(always_apply=False, p=0.5)
[view source on GitHub] ¶
Source code in albumentations/core/transforms_interface.py
class BasicTransform(Serializable):
call_backup = None
interpolation: Union[int, Interpolation]
fill_value: ColorType
mask_fill_value: Optional[ColorType]
def __init__(self, always_apply: bool = False, p: float = 0.5):
self.p = p
self.always_apply = always_apply
self._additional_targets: Dict[str, str] = {}
# replay mode params
self.deterministic = False
self.save_key = "replay"
self.params: Dict[Any, Any] = {}
self.replay_mode = False
self.applied_in_replay = False
def __call__(self, *args: Any, force_apply: bool = False, **kwargs: Any) -> Any:
if args:
msg = "You have to pass data to augmentations as named arguments, for example: aug(image=image)"
raise KeyError(msg)
if self.replay_mode:
if self.applied_in_replay:
return self.apply_with_params(self.params, **kwargs)
return kwargs
if force_apply or self.always_apply or (random.random() < self.p):
params = self.get_params()
if self.targets_as_params:
if not all(key in kwargs for key in self.targets_as_params):
msg = f"{self.__class__.__name__} requires {self.targets_as_params}"
raise ValueError(msg)
targets_as_params = {k: kwargs[k] for k in self.targets_as_params}
params_dependent_on_targets = self.get_params_dependent_on_targets(targets_as_params)
params.update(params_dependent_on_targets)
if self.deterministic:
if self.targets_as_params:
warn(
self.get_class_fullname() + " could work incorrectly in ReplayMode for other input data"
" because its' params depend on targets."
)
kwargs[self.save_key][id(self)] = deepcopy(params)
return self.apply_with_params(params, **kwargs)
return kwargs
def apply_with_params(self, params: Dict[str, Any], *args: Any, **kwargs: Any) -> Dict[str, Any]:
if params is None:
return kwargs
params = self.update_params(params, **kwargs)
res = {}
for key, arg in kwargs.items():
if arg is not None:
target_function = self._get_target_function(key)
target_dependencies = {k: kwargs[k] for k in self.target_dependence.get(key, [])}
res[key] = target_function(arg, **dict(params, **target_dependencies))
else:
res[key] = None
return res
def set_deterministic(self, flag: bool, save_key: str = "replay") -> "BasicTransform":
if save_key == "params":
msg = "params save_key is reserved"
raise KeyError(msg)
self.deterministic = flag
self.save_key = save_key
return self
def __repr__(self) -> str:
state = self.get_base_init_args()
state.update(self.get_transform_init_args())
return f"{self.__class__.__name__}({format_args(state)})"
def _get_target_function(self, key: str) -> Callable[..., Any]:
transform_key = key
if key in self._additional_targets:
transform_key = self._additional_targets.get(key, key)
return self.targets.get(transform_key, lambda x, **p: x)
def apply(self, img: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
raise NotImplementedError
def get_params(self) -> Dict[str, Any]:
return {}
@property
def targets(self) -> Dict[str, Callable[..., Any]]:
# you must specify targets in subclass
# for example:
# >> ('image', 'mask')
# >> ('image', 'boxes')
raise NotImplementedError
def update_params(self, params: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
if hasattr(self, "interpolation"):
params["interpolation"] = self.interpolation
if hasattr(self, "fill_value"):
params["fill_value"] = self.fill_value
if hasattr(self, "mask_fill_value"):
params["mask_fill_value"] = self.mask_fill_value
params.update({"cols": kwargs["image"].shape[1], "rows": kwargs["image"].shape[0]})
return params
@property
def target_dependence(self) -> Dict[str, Any]:
return {}
def add_targets(self, additional_targets: Dict[str, str]) -> None:
"""Add targets to transform them the same way as one of existing targets
ex: {'target_image': 'image'}
ex: {'obj1_mask': 'mask', 'obj2_mask': 'mask'}
by the way you must have at least one object with key 'image'
Args:
additional_targets (dict): keys - new target name, values - old target name. ex: {'image2': 'image'}
"""
self._additional_targets = additional_targets
@property
def targets_as_params(self) -> List[str]:
return []
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
raise NotImplementedError(
"Method get_params_dependent_on_targets is not implemented in class " + self.__class__.__name__
)
@classmethod
def get_class_fullname(cls) -> str:
return get_shortest_class_fullname(cls)
@classmethod
def is_serializable(cls) -> bool:
return True
def get_transform_init_args_names(self) -> Tuple[str, ...]:
msg = f"Class {self.get_class_fullname()} is not serializable because the `get_transform_init_args_names` "
"method is not implemented"
raise NotImplementedError(msg)
def get_base_init_args(self) -> Dict[str, Any]:
return {"always_apply": self.always_apply, "p": self.p}
def get_transform_init_args(self) -> Dict[str, Any]:
return {k: getattr(self, k) for k in self.get_transform_init_args_names()}
def to_dict_private(self) -> Dict[str, Any]:
state = {"__class_fullname__": self.get_class_fullname()}
state.update(self.get_base_init_args())
state.update(self.get_transform_init_args())
return state
def get_dict_with_id(self) -> Dict[str, Any]:
d = self.to_dict_private()
d["id"] = id(self)
return d
add_targets (self, additional_targets)
¶
Add targets to transform them the same way as one of the existing targets, e.g. {'target_image': 'image'} or {'obj1_mask': 'mask', 'obj2_mask': 'mask'}. Note that you must have at least one target with the key 'image'.
Parameters:
Name | Type | Description |
---|---|---|
additional_targets | dict | keys - new target name, values - old target name. ex: {'image2': 'image'} |
Source code in albumentations/core/transforms_interface.py
def add_targets(self, additional_targets: Dict[str, str]) -> None:
"""Add targets to transform them the same way as one of existing targets
ex: {'target_image': 'image'}
ex: {'obj1_mask': 'mask', 'obj2_mask': 'mask'}
by the way you must have at least one object with key 'image'
Args:
additional_targets (dict): keys - new target name, values - old target name. ex: {'image2': 'image'}
"""
self._additional_targets = additional_targets
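In practice, additional targets are usually declared through Compose, which forwards them to add_targets; a sketch (the target name image2 is illustrative):
import numpy as np
import albumentations as A

# Sketch: transform a second image exactly like the primary 'image' target.
transform = A.Compose(
    [A.HorizontalFlip(p=1.0)],
    additional_targets={"image2": "image"},
)

image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
image2 = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
out = transform(image=image, image2=image2)  # both are flipped identically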
class DualTransform
[view source on GitHub] ¶
A base class for transformations that should be applied both to an image and its corresponding properties such as masks, bounding boxes, and keypoints. This class ensures that when a transform is applied to an image, all associated entities are transformed accordingly to maintain consistency between the image and its annotations.
Properties
targets (Dict[str, Callable[..., Any]]): Defines the types of targets (e.g., image, mask, bboxes, keypoints) that the transform should be applied to and maps them to the corresponding methods.
Methods
apply_to_bbox(bbox: BoxInternalType, *args: Any, **params: Any) -> BoxInternalType: Applies the transform to a single bounding box. Should be implemented in the subclass.
apply_to_keypoint(keypoint: KeypointInternalType, *args: Any, **params: Any) -> KeypointInternalType: Applies the transform to a single keypoint. Should be implemented in the subclass.
apply_to_bboxes(bboxes: Sequence[BoxType], *args: Any, **params: Any) -> Sequence[BoxType]: Applies the transform to a list of bounding boxes. Delegates to apply_to_bbox
for each bounding box.
apply_to_keypoints(keypoints: Sequence[KeypointType], *args: Any, **params: Any) -> Sequence[KeypointType]: Applies the transform to a list of keypoints. Delegates to apply_to_keypoint
for each keypoint.
apply_to_mask(mask: np.ndarray, *args: Any, **params: Any) -> np.ndarray: Applies the transform specifically to a single mask.
apply_to_masks(masks: Sequence[np.ndarray], **params: Any) -> List[np.ndarray]: Applies the transform to a list of masks. Delegates to apply_to_mask
for each mask.
Note
This class is intended to be subclassed and should not be used directly. Subclasses are expected to implement the specific logic for each type of target (e.g., image, mask, bboxes, keypoints) in the corresponding apply_to_*
methods.
Source code in albumentations/core/transforms_interface.py
class DualTransform(BasicTransform):
"""A base class for transformations that should be applied both to an image and its corresponding properties
such as masks, bounding boxes, and keypoints. This class ensures that when a transform is applied to an image,
all associated entities are transformed accordingly to maintain consistency between the image and its annotations.
Properties:
targets (Dict[str, Callable[..., Any]]): Defines the types of targets (e.g., image, mask, bboxes, keypoints)
that the transform should be applied to and maps them to the corresponding methods.
Methods:
apply_to_bbox(bbox: BoxInternalType, *args: Any, **params: Any) -> BoxInternalType:
Applies the transform to a single bounding box. Should be implemented in the subclass.
apply_to_keypoint(keypoint: KeypointInternalType, *args: Any, **params: Any) -> KeypointInternalType:
Applies the transform to a single keypoint. Should be implemented in the subclass.
apply_to_bboxes(bboxes: Sequence[BoxType], *args: Any, **params: Any) -> Sequence[BoxType]:
Applies the transform to a list of bounding boxes. Delegates to `apply_to_bbox` for each bounding box.
apply_to_keypoints(keypoints: Sequence[KeypointType], *args: Any, **params: Any) -> Sequence[KeypointType]:
Applies the transform to a list of keypoints. Delegates to `apply_to_keypoint` for each keypoint.
apply_to_mask(mask: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
Applies the transform specifically to a single mask.
apply_to_masks(masks: Sequence[np.ndarray], **params: Any) -> List[np.ndarray]:
Applies the transform to a list of masks. Delegates to `apply_to_mask` for each mask.
Note:
This class is intended to be subclassed and should not be used directly. Subclasses are expected to
implement the specific logic for each type of target (e.g., image, mask, bboxes, keypoints) in the
corresponding `apply_to_*` methods.
"""
@property
def targets(self) -> Dict[str, Callable[..., Any]]:
return {
"image": self.apply,
"mask": self.apply_to_mask,
"masks": self.apply_to_masks,
"bboxes": self.apply_to_bboxes,
"keypoints": self.apply_to_keypoints,
}
def apply_to_bbox(self, bbox: BoxInternalType, *args: Any, **params: Any) -> BoxInternalType:
msg = f"Method apply_to_bbox is not implemented in class {self.__class__.__name__}"
raise NotImplementedError(msg)
def apply_to_keypoint(self, keypoint: KeypointInternalType, *args: Any, **params: Any) -> KeypointInternalType:
msg = f"Method apply_to_keypoint is not implemented in class {self.__class__.__name__}"
raise NotImplementedError(msg)
def apply_to_global_label(self, label: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
msg = f"Method apply_to_global_label is not implemented in class {self.__class__.__name__}"
raise NotImplementedError(msg)
def apply_to_bboxes(self, bboxes: Sequence[BoxType], *args: Any, **params: Any) -> Sequence[BoxType]:
return [
self.apply_to_bbox(cast(BoxInternalType, tuple(cast(BoxInternalType, bbox[:4]))), **params)
+ tuple(bbox[4:])
for bbox in bboxes
]
def apply_to_keypoints(
self, keypoints: Sequence[KeypointType], *args: Any, **params: Any
) -> Sequence[KeypointType]:
return [
self.apply_to_keypoint(cast(KeypointInternalType, tuple(keypoint[:4])), **params) + tuple(keypoint[4:])
for keypoint in keypoints
]
def apply_to_mask(self, mask: np.ndarray, *args: Any, **params: Any) -> np.ndarray:
return self.apply(mask, **{k: cv2.INTER_NEAREST if k == "interpolation" else v for k, v in params.items()})
def apply_to_masks(self, masks: Sequence[np.ndarray], **params: Any) -> List[np.ndarray]:
return [self.apply_to_mask(mask, **params) for mask in masks]
def apply_to_global_labels(self, labels: Sequence[np.ndarray], **params: Any) -> List[np.ndarray]:
return [self.apply_to_global_label(label, **params) for label in labels]
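As a rough illustration of subclassing, here is a hypothetical minimal transform (not part of the library) that flips images, masks, and normalized bounding boxes:
import numpy as np
from albumentations.core.transforms_interface import DualTransform

class SimpleHFlip(DualTransform):  # hypothetical example class
    def apply(self, img: np.ndarray, **params) -> np.ndarray:
        # Images and masks share this implementation; apply_to_mask delegates to apply.
        return np.ascontiguousarray(img[:, ::-1, ...])

    def apply_to_bbox(self, bbox, **params):
        # The internal bbox format is normalized (x_min, y_min, x_max, y_max).
        x_min, y_min, x_max, y_max = bbox[:4]
        return 1 - x_max, y_min, 1 - x_min, y_max

    def get_transform_init_args_names(self):
        return ()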
class ImageOnlyTransform
[view source on GitHub] ¶
Transform applied to image only.
class NoOp
[view source on GitHub] ¶
Does nothing
Targets
image, mask, bboxes, keypoints, global_label
Source code in albumentations/core/transforms_interface.py
class NoOp(DualTransform):
"""Does nothing
Targets:
image, mask, bboxes, keypoints, global_label
"""
_targets = (Targets.IMAGE, Targets.MASK, Targets.BBOXES, Targets.KEYPOINTS, Targets.GLOBAL_LABEL)
def apply_to_keypoint(self, keypoint: KeypointInternalType, **params: Any) -> KeypointInternalType:
return keypoint
def apply_to_bbox(self, bbox: BoxInternalType, **params: Any) -> BoxInternalType:
return bbox
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
return img
def apply_to_mask(self, mask: np.ndarray, **params: Any) -> np.ndarray:
return mask
def apply_to_global_label(self, label: np.ndarray, **params: Any) -> np.ndarray:
return label
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return ()
def to_tuple (param, low=None, bias=None)
[view source on GitHub]¶
Convert input argument to a min-max tuple.
Parameters:
Name | Type | Description |
---|---|---|
param | Union[float, Tuple[float, float], int, Tuple[int, int]] | Input value which could be a scalar or a sequence of exactly 2 scalars. |
low | Union[float, Tuple[float, float], int, Tuple[int, int]] | Second element of the tuple, provided as an optional argument for when param is a scalar. |
bias | Union[int, float] | An offset added to both elements of the tuple. |
Returns:
Type | Description |
---|---|
Union[Tuple[int, int], Tuple[float, float]] | A tuple of two scalars, optionally adjusted by bias. |
Source code in albumentations/core/transforms_interface.py
def to_tuple(
param: ScaleType,
low: Optional[ScaleType] = None,
bias: Optional[ScalarType] = None,
) -> Union[Tuple[int, int], Tuple[float, float]]:
"""Convert input argument to a min-max tuple.
Args:
param: Input value which could be a scalar or a sequence of exactly 2 scalars.
low: Second element of the tuple, provided as an optional argument for when `param` is a scalar.
bias: An offset added to both elements of the tuple.
Returns:
A tuple of two scalars, optionally adjusted by `bias`.
Raises ValueError for invalid combinations or types of arguments.
"""
# Validate mutually exclusive arguments
if low is not None and bias is not None:
msg = "Arguments 'low' and 'bias' cannot be used together."
raise ValueError(msg)
if isinstance(param, Sequence) and len(param) == PAIR:
min_val, max_val = min(param), max(param)
# Handle scalar input
elif isinstance(param, (int, float)):
if isinstance(low, (int, float)):
# Use low and param to create a tuple
min_val, max_val = (low, param) if low < param else (param, low)
else:
# Create a symmetric tuple around 0
min_val, max_val = -param, param
else:
msg = "Argument 'param' must be either a scalar or a sequence of 2 elements."
raise ValueError(msg)
# Apply bias if provided
if bias is not None:
return (bias + min_val, bias + max_val)
return min_val, max_val
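A few illustrative calls, following the logic of the source above:
from albumentations.core.transforms_interface import to_tuple

to_tuple(10)            # (-10, 10): a scalar becomes a symmetric range around 0
to_tuple((7, 3))        # (3, 7): a pair is reordered to (min, max)
to_tuple(0.3, low=0.1)  # (0.1, 0.3): low supplies the other endpoint
to_tuple(10, bias=1)    # (-9, 11): bias shifts both endpoints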
pytorch
special
¶
transforms
¶
class ToTensorV2
(transpose_mask=False, always_apply=True, p=1.0)
[view source on GitHub] ¶
Converts images/masks to PyTorch Tensors, inheriting from BasicTransform. Supports images in numpy HWC
format and converts them to PyTorch CHW
format. If the image is in HW format, a channel dimension is added, producing a single-channel tensor of shape 1HW.
Attributes:
Name | Type | Description |
---|---|---|
transpose_mask | bool | If True, transposes 3D input mask dimensions from [height, width, num_channels] to [num_channels, height, width]. |
always_apply | bool | Indicates if this transformation should be always applied. Default: True. |
p | float | Probability of applying the transform. Default: 1.0. |
Source code in albumentations/pytorch/transforms.py
class ToTensorV2(BasicTransform):
"""Converts images/masks to PyTorch Tensors, inheriting from BasicTransform. Supports images in numpy `HWC` format
and converts them to PyTorch `CHW` format. If the image is in `HW` format, it will be converted to PyTorch `HW`.
Attributes:
transpose_mask (bool): If True, transposes 3D input mask dimensions from `[height, width, num_channels]` to
`[num_channels, height, width]`.
always_apply (bool): Indicates if this transformation should be always applied. Default: True.
p (float): Probability of applying the transform. Default: 1.0.
"""
def __init__(self, transpose_mask: bool = False, always_apply: bool = True, p: float = 1.0):
super().__init__(always_apply=always_apply, p=p)
self.transpose_mask = transpose_mask
@property
def targets(self) -> Dict[str, Any]:
return {"image": self.apply, "mask": self.apply_to_mask, "masks": self.apply_to_masks}
def apply(self, img: np.ndarray, **params: Any) -> torch.Tensor:
if len(img.shape) not in [2, 3]:
msg = "Albumentations only supports images in HW or HWC format"
raise ValueError(msg)
if len(img.shape) == TWO:
img = np.expand_dims(img, 2)
return torch.from_numpy(img.transpose(2, 0, 1))
def apply_to_mask(self, mask: np.ndarray, **params: Any) -> torch.Tensor:
if self.transpose_mask and mask.ndim == THREE:
mask = mask.transpose(2, 0, 1)
return torch.from_numpy(mask)
def apply_to_masks(self, masks: List[np.ndarray], **params: Any) -> List[torch.Tensor]:
return [self.apply_to_mask(mask, **params) for mask in masks]
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return ("transpose_mask",)
def get_params_dependent_on_targets(self, params: Any) -> Dict[str, Any]:
return {}
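A typical end-of-pipeline sketch (requires torch; sizes and transforms are illustrative):
import numpy as np
import albumentations as A
from albumentations.pytorch import ToTensorV2

transform = A.Compose([
    A.Resize(224, 224),
    A.Normalize(),
    ToTensorV2(transpose_mask=True),  # transposes 3D (HWC) masks; 2D masks pass through
])

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)
out = transform(image=image, mask=mask)
# out["image"].shape -> torch.Size([3, 224, 224]); out["mask"].shape -> torch.Size([224, 224])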
random_utils
¶
def shuffle (a, random_state=None)
[view source on GitHub]¶
Shuffles an array in-place, using a specified random state or creating a new one if not provided.
Parameters:
Name | Type | Description |
---|---|---|
a | np.ndarray | The array to be shuffled. |
random_state | Optional[np.random.RandomState] | The random state used for shuffling. Defaults to None. |
Returns:
Type | Description |
---|---|
np.ndarray | The shuffled array (note: the shuffle is in-place, so the original array is modified). |
Source code in albumentations/random_utils.py
def shuffle(
a: np.ndarray,
random_state: Optional[np.random.RandomState] = None,
) -> np.ndarray:
"""Shuffles an array in-place, using a specified random state or creating a new one if not provided.
Args:
a (np.ndarray): The array to be shuffled.
random_state (Optional[np.random.RandomState], optional): The random state used for shuffling. Defaults to None.
Returns:
np.ndarray: The shuffled array (note: the shuffle is in-place, so the original array is modified).
"""
if random_state is None:
random_state = get_random_state()
random_state.shuffle(a)
return a
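A reproducibility sketch: passing an explicit RandomState makes the in-place shuffle deterministic.
import numpy as np
from albumentations import random_utils

a = np.arange(10)
state = np.random.RandomState(42)
random_utils.shuffle(a, random_state=state)  # shuffles `a` in place and returns it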