albumentations.augmentations.transforms


Image transformation classes for data augmentation. The transforms in this module modify properties such as color, brightness, contrast, and noise level. Each class inherits from a base transform interface and implements its own augmentation logic.

AdditiveNoise class

AdditiveNoise(
    noise_type: Literal['uniform', 'gaussian', 'laplace', 'beta'] = 'uniform',
    spatial_mode: Literal['constant', 'per_pixel', 'shared'] = 'constant',
    noise_params: dict[str, Any] | None = None,
    approximation: float = 1.0,
    p: float = 0.5
)

Apply random noise to image channels using various noise distributions. This transform generates noise using different probability distributions and applies it to image channels. The noise can be generated in three spatial modes and supports multiple noise distributions, each with configurable parameters.

Parameters

Name | Type | Default | Description
noise_type | One of: 'uniform', 'gaussian', 'laplace', 'beta' | 'uniform' | Type of noise distribution to use. Options: "uniform": uniform distribution, good for simple random perturbations; "gaussian": normal distribution, models natural random processes; "laplace": similar to Gaussian but with heavier tails, good for outliers; "beta": flexible bounded distribution, can be symmetric or skewed.
spatial_mode | One of: 'constant', 'per_pixel', 'shared' | 'constant' | How to generate and apply the noise. Options: "constant": one noise value per channel, fastest; "per_pixel": independent noise value for each pixel and channel, slowest; "shared": one noise map shared across all channels, medium speed.
noise_params | One of: dict[str, Any], None | None | Parameters for the chosen noise distribution. Must match the noise_type; the accepted keys are listed under "noise_params keys" below.
approximation | float | 1.0 | Float in [0, 1]. Controls the noise generation speed vs. quality tradeoff. 1.0: generate full-resolution noise (slowest, highest quality); 0.5: generate noise at half resolution and upsample; 0.25: generate noise at quarter resolution and upsample. Only affects the 'per_pixel' and 'shared' spatial modes.
p | float | 0.5 | -

noise_params keys

  • uniform:
      ranges: list[tuple[float, float]]. List of (min, max) ranges for each channel. Each range must be in [-1, 1]. If only one range is provided, it is used for all channels.
        [(-0.2, 0.2)]  # Same range for all channels
        [(-0.2, 0.2), (-0.1, 0.1), (-0.1, 0.1)]  # Different ranges for RGB
  • gaussian:
      mean_range: tuple[float, float], default (0.0, 0.0). Range for sampling the mean value, in [-1, 1].
      std_range: tuple[float, float], default (0.1, 0.1). Range for sampling the standard deviation, in [0, 1].
  • laplace:
      mean_range: tuple[float, float], default (0.0, 0.0). Range for sampling the location parameter, in [-1, 1].
      scale_range: tuple[float, float], default (0.1, 0.1). Range for sampling the scale parameter, in [0, 1].
  • beta:
      alpha_range: tuple[float, float], default (0.5, 1.5). Range for sampling the first shape parameter, in (0, inf). Value < 1 = U-shaped, value > 1 = bell-shaped.
      beta_range: tuple[float, float], default (0.5, 1.5). Range for sampling the second shape parameter, in (0, inf). Value < 1 = U-shaped, value > 1 = bell-shaped.
      scale_range: tuple[float, float], default (0.1, 0.3). Range for sampling the output scale, in [0, 1]. Use a smaller scale for subtler noise.
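
The three spatial modes differ only in the shape of the noise array that is sampled before it is broadcast over the image. The following is a minimal NumPy sketch of that idea, assuming a uint8 image and a single uniform range for all channels; `noise_shape` and `add_uniform_noise` are hypothetical helper names, not part of the library, and the real transform may differ in detail.

```python
import numpy as np


def noise_shape(spatial_mode: str, image_shape: tuple) -> tuple:
    """Return the shape of the noise array for a given spatial mode."""
    height, width, channels = image_shape
    if spatial_mode == "constant":
        return (1, 1, channels)  # one value per channel
    if spatial_mode == "per_pixel":
        return (height, width, channels)  # independent value per pixel and channel
    if spatial_mode == "shared":
        return (height, width, 1)  # one map reused by every channel
    raise ValueError(f"Unknown spatial_mode: {spatial_mode!r}")


def add_uniform_noise(image, spatial_mode, low, high, rng):
    """Add uniform noise (expressed as a fraction of 255) to a uint8 image."""
    noise = rng.uniform(low, high, size=noise_shape(spatial_mode, image.shape))
    noisy = image.astype(np.float32) / 255.0 + noise  # broadcasting does the rest
    return np.clip(np.rint(noisy * 255.0), 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
noisy = add_uniform_noise(image, "shared", -0.2, 0.2, rng)
```

Because "constant" samples only one number per channel, it is the cheapest mode, which matches the speed ordering given in the table.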

AutoContrast class

AutoContrast(
    cutoff: float = 0,
    ignore: int | None = None,
    method: Literal['cdf', 'pil'] = 'cdf',
    p: float = 0.5
)

Automatically adjust image contrast by stretching the intensity range. This transform provides two methods for contrast enhancement:
  1. CDF method (default): uses the cumulative distribution function for more gradual adjustment
  2. PIL method: uses linear scaling like PIL.ImageOps.autocontrast
The transform can optionally exclude extreme values from both ends of the intensity range and preserve specific intensity values (e.g., alpha channel).

Parameters

Name | Type | Default | Description
cutoff | float | 0 | Percentage of pixels to exclude from both ends of the histogram. Range: [0, 100]. Default: 0 (use full intensity range). 0 means use the minimum and maximum intensity values found; 20 means exclude the darkest and brightest 20% of pixels.
ignore | One of: int, None | None | Intensity value to preserve (e.g., alpha channel). Range: [0, 255]. Default: None. If specified, this intensity value will not be modified. Useful for images with an alpha channel or special marker values.
method | One of: 'cdf', 'pil' | 'cdf' | Algorithm to use for contrast enhancement. Default: "cdf". "cdf": uses the cumulative distribution for smoother adjustment; "pil": uses linear scaling like PIL.ImageOps.autocontrast.
p | float | 0.5 | Probability of applying the transform. Default: 0.5

Notes

  • The transform processes each color channel independently.
  • For grayscale images, only one channel is processed.
  • The output maintains the same dtype as the input.
  • Empty or single-color channels remain unchanged.
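
To make the linear-scaling idea concrete, here is a simplified NumPy sketch for a single uint8 channel. It assumes the cutoff is applied as a percentile on each end and is not claimed to reproduce PIL.ImageOps.autocontrast or the library's "pil" method exactly; `autocontrast_linear` is a hypothetical helper name.

```python
import numpy as np


def autocontrast_linear(channel: np.ndarray, cutoff: float = 0.0) -> np.ndarray:
    """Linearly stretch one uint8 channel after trimming `cutoff` percent per end."""
    lo, hi = np.percentile(channel, [cutoff, 100.0 - cutoff])
    if hi <= lo:
        # Empty or single-color channel: nothing to stretch, return it unchanged.
        return channel.copy()
    scaled = (channel.astype(np.float32) - lo) * (255.0 / (hi - lo))
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 151, size=(32, 32), dtype=np.uint8)
stretched = autocontrast_linear(low_contrast)

flat = np.full((32, 32), 7, dtype=np.uint8)
unchanged = autocontrast_linear(flat)
```

For a color image, the same function would be applied to each channel separately, in line with the first note above.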

BetaParams class

BetaParams(
    noise_type: Literal = 'beta',
    alpha_range: Annotated,
    beta_range: Annotated,
    scale_range: Annotated
)

Parameters

Name | Type | Default | Description
noise_type | Literal | 'beta' | -
alpha_range | Annotated | - | -
beta_range | Annotated | - | -
scale_range | Annotated | - | -

CLAHE class

CLAHE(
    clip_limit: tuple[float, float] | float = 4.0,
    tile_grid_size: tuple[int, int] = (8, 8),
    p: float = 0.5
)

Apply Contrast Limited Adaptive Histogram Equalization (CLAHE) to the input image. CLAHE is an advanced method of improving the contrast in an image. Unlike regular histogram equalization, which operates on the entire image, CLAHE operates on small regions (tiles) in the image. This results in a more balanced equalization, preventing over-amplification of contrast in areas with initially low contrast.

Parameters

Name | Type | Default | Description
clip_limit | One of: tuple[float, float], float | 4.0 | Controls the contrast enhancement limit. If a single float is provided, the range will be (1, clip_limit). If a tuple of two floats is provided, it defines the range for random selection. Higher values allow for more contrast enhancement, but may also increase noise. Default: (1, 4)
tile_grid_size | tuple[int, int] | (8, 8) | Defines the number of tiles in the row and column directions. Format is (rows, columns). Smaller tile sizes can lead to more localized enhancements, while larger sizes give results closer to global histogram equalization. Default: (8, 8)
p | float | 0.5 | Probability of applying the transform. Default: 0.5

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.CLAHE(clip_limit=(1, 4), tile_grid_size=(8, 8), p=1.0)
>>> result = transform(image=image)
>>> clahe_image = result["image"]

Notes

  • Supports only RGB or grayscale images.
  • For color images, CLAHE is applied to the L channel in the LAB color space.
  • The clip limit determines the maximum slope of the cumulative histogram. A lower clip limit will result in more contrast limiting.
  • Tile grid size affects the adaptiveness of the method. More tiles increase local adaptiveness but can lead to an unnatural look if set too high.

ChannelShuffle class

ChannelShuffle(
    p: float = 0.5
)

Randomly rearrange channels of the image.

Parameters

Name | Type | Default | Description
p | float | 0.5 | Probability of applying the transform. Default: 0.5.
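
Rearranging channels amounts to indexing the last axis with a random permutation. A minimal NumPy sketch of that operation follows; `shuffle_channels` is a hypothetical helper, and the probability `p` is left out for brevity.

```python
import numpy as np


def shuffle_channels(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of `image` with its channels in a random order."""
    order = rng.permutation(image.shape[-1])  # e.g. [2, 0, 1] for an RGB image
    return image[..., order]


rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
shuffled = shuffle_channels(image, rng)
```

Pixel values are never changed, only reassigned to a different channel, so per-channel statistics are preserved up to reordering.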

ChromaticAberration class

ChromaticAberration(
    primary_distortion_limit: tuple[float, float] | float = (-0.02, 0.02),
    secondary_distortion_limit: tuple[float, float] | float = (-0.05, 0.05),
    mode: Literal['green_purple', 'red_blue', 'random'] = 'green_purple',
    interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_NEAREST_EXACT, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_LINEAR_EXACT] = 1,
    p: float = 0.5
)

Add lateral chromatic aberration by distorting the red and blue channels of the input image. Chromatic aberration is an optical effect that occurs when a lens fails to focus all colors to the same point. This transform simulates this effect by applying different radial distortions to the red and blue channels of the image, while leaving the green channel unchanged.

Parameters

Name | Type | Default | Description
primary_distortion_limit | One of: tuple[float, float], float | (-0.02, 0.02) | Range of the primary radial distortion coefficient. If a single float value is provided, the range will be (-primary_distortion_limit, primary_distortion_limit). This parameter controls the distortion in the center of the image: positive values result in pincushion distortion (edges bend inward); negative values result in barrel distortion (edges bend outward). Default: (-0.02, 0.02).
secondary_distortion_limit | One of: tuple[float, float], float | (-0.05, 0.05) | Range of the secondary radial distortion coefficient. If a single float value is provided, the range will be (-secondary_distortion_limit, secondary_distortion_limit). This parameter controls the distortion in the corners of the image: positive values enhance pincushion distortion; negative values enhance barrel distortion. Default: (-0.05, 0.05).
mode | One of: 'green_purple', 'red_blue', 'random' | 'green_purple' | Type of color fringing to apply. Options: 'green_purple': distorts red and blue channels in opposite directions, creating green-purple fringing; 'red_blue': distorts red and blue channels in the same direction, creating red-blue fringing; 'random': randomly chooses between 'green_purple' and 'red_blue' modes for each application. Default: 'green_purple'.
interpolation | One of: cv2.INTER_NEAREST, cv2.INTER_NEAREST_EXACT, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_LINEAR_EXACT | 1 (cv2.INTER_LINEAR) | Flag specifying the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
p | float | 0.5 | Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5.

Example

>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ChromaticAberration(
...     primary_distortion_limit=0.05,
...     secondary_distortion_limit=0.1,
...     mode='green_purple',
...     interpolation=cv2.INTER_LINEAR,
...     p=1.0
... )
>>> transformed = transform(image=image)
>>> aberrated_image = transformed['image']

Notes

  • This transform only affects RGB images. Grayscale images will raise an error.
  • The strength of the effect depends on both primary and secondary distortion limits.
  • Higher absolute values for distortion limits will result in more pronounced chromatic aberration.
  • The 'green_purple' mode tends to produce more noticeable effects than 'red_blue'.

ColorJitter class

ColorJitter(
    brightness: tuple[float, float] | float = (0.8, 1.2),
    contrast: tuple[float, float] | float = (0.8, 1.2),
    saturation: tuple[float, float] | float = (0.8, 1.2),
    hue: tuple[float, float] | float = (-0.5, 0.5),
    p: float = 0.5
)

Randomly changes the brightness, contrast, saturation, and hue of an image. This transform is similar to torchvision's ColorJitter but with some differences due to the use of OpenCV instead of Pillow. The main differences are:
  1. OpenCV and Pillow use different formulas to convert images to HSV format.
  2. This implementation uses value saturation instead of uint8 overflow as in Pillow.
These differences may result in slightly different output compared to torchvision's ColorJitter.

Parameters

Name | Type | Default | Description
brightness | One of: tuple[float, float], float | (0.8, 1.2) | How much to jitter brightness. If float: the brightness factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness]. If tuple: the brightness factor is sampled from the range specified. Should be non-negative numbers. Default: (0.8, 1.2)
contrast | One of: tuple[float, float], float | (0.8, 1.2) | How much to jitter contrast. If float: the contrast factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast]. If tuple: the contrast factor is sampled from the range specified. Should be non-negative numbers. Default: (0.8, 1.2)
saturation | One of: tuple[float, float], float | (0.8, 1.2) | How much to jitter saturation. If float: the saturation factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation]. If tuple: the saturation factor is sampled from the range specified. Should be non-negative numbers. Default: (0.8, 1.2)
hue | One of: tuple[float, float], float | (-0.5, 0.5) | How much to jitter hue. If float: the hue factor is chosen uniformly from [-hue, hue]. Should have 0 <= hue <= 0.5. If tuple: the hue factor is sampled from the range specified. Values should be in range [-0.5, 0.5]. Default: (-0.5, 0.5)
p | float | 0.5 | Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1, p=1.0)
>>> result = transform(image=image)
>>> jittered_image = result['image']

Notes

  • The order of application for these color transformations is random for each image.
  • The ranges for brightness, contrast, and saturation are applied as multiplicative factors.
  • The range for hue is applied as an additive factor.
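
The float-to-range rules in the parameter table can be written out directly. The sketch below is plain Python that only restates those formulas; `factor_range` and `hue_range` are hypothetical helper names, not library functions.

```python
def factor_range(value: float) -> tuple[float, float]:
    """Range for brightness, contrast, or saturation given a single float."""
    if value < 0:
        raise ValueError("value should be non-negative")
    # Multiplicative factor centered on 1 and floored at 0.
    return (max(0.0, 1.0 - value), 1.0 + value)


def hue_range(value: float) -> tuple[float, float]:
    """Range for the additive hue shift given a single float."""
    if not 0 <= value <= 0.5:
        raise ValueError("hue should satisfy 0 <= hue <= 0.5")
    return (-value, value)


brightness_range = factor_range(0.2)  # roughly (0.8, 1.2)
wide_range = factor_range(1.5)        # lower bound is floored at 0
hue_shift_range = hue_range(0.1)
```

Under these rules, passing brightness=0.2 is equivalent to passing the tuple (0.8, 1.2), which is the documented default.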

Downscale class

Downscale(
    scale_range: tuple[float, float] = (0.25, 0.25),
    interpolation_pair: dict[Literal['downscale', 'upscale'], Literal[cv2.INTER_NEAREST, cv2.INTER_NEAREST_EXACT, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_LINEAR_EXACT]] = {'upscale': 0, 'downscale': 0},
    p: float = 0.5
)

Decrease image quality by downscaling and upscaling back. This transform simulates the effect of a low-resolution image by first downscaling the image to a lower resolution and then upscaling it back to its original size. This process introduces loss of detail and can be used to simulate low-quality images or to test the robustness of models to different image resolutions.

Parameters

Name | Type | Default | Description
scale_range | tuple[float, float] | (0.25, 0.25) | Range for the downscaling factor. Should be two float values between 0 and 1, where the first value is less than or equal to the second. The actual downscaling factor will be randomly chosen from this range for each image. Lower values result in more aggressive downscaling. Default: (0.25, 0.25)
interpolation_pair | dict[Literal['downscale', 'upscale'], Literal[cv2.INTER_NEAREST, cv2.INTER_NEAREST_EXACT, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_LINEAR_EXACT]] | {'upscale': 0, 'downscale': 0} | A dictionary specifying the interpolation methods to use for downscaling and upscaling. Should contain two keys: 'downscale' (interpolation method for downscaling) and 'upscale' (interpolation method for upscaling). Values should be OpenCV interpolation flags (e.g., cv2.INTER_NEAREST, cv2.INTER_LINEAR, etc.). Default: {'downscale': cv2.INTER_NEAREST, 'upscale': cv2.INTER_NEAREST}
p | float | 0.5 | Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5

Example

>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Downscale(
...     scale_range=(0.5, 0.75),
...     interpolation_pair={'downscale': cv2.INTER_NEAREST, 'upscale': cv2.INTER_LINEAR},
...     p=0.5
... )
>>> transformed = transform(image=image)
>>> downscaled_image = transformed['image']

Notes

  • The actual downscaling factor is randomly chosen for each image from the range specified in scale_range.
  • Using different interpolation methods for downscaling and upscaling can produce various effects. For example, using INTER_NEAREST for both can create a pixelated look, while using INTER_LINEAR or INTER_CUBIC can produce smoother results.
  • This transform can be useful for data augmentation, especially when training models that need to be robust to variations in image quality or resolution.
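
The pixelated look produced by nearest-neighbor resampling in both directions can be reproduced with plain index arithmetic. This is an illustrative NumPy sketch under that assumption; it does not call OpenCV, its sampling grid may differ slightly from cv2.INTER_NEAREST, and `downscale_nearest` is a hypothetical helper name.

```python
import numpy as np


def downscale_nearest(image: np.ndarray, scale: float) -> np.ndarray:
    """Shrink by `scale`, then enlarge back, using nearest-neighbor sampling."""
    height, width = image.shape[:2]
    small_h = max(1, round(height * scale))
    small_w = max(1, round(width * scale))
    # Source rows/columns that survive in the low-resolution image.
    rows = np.arange(small_h) * height // small_h
    cols = np.arange(small_w) * width // small_w
    small = image[rows][:, cols]
    # Map every output pixel back to a low-resolution pixel.
    up_rows = np.arange(height) * small_h // height
    up_cols = np.arange(width) * small_w // width
    return small[up_rows][:, up_cols]


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
pixelated = downscale_nearest(image, scale=0.25)
```

With scale=0.25 on a 64x64 image, the result consists of constant 4x4 blocks, which is the loss of detail the transform is meant to simulate.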

Emboss class

Emboss(
    alpha: tuple[float, float] = (0.2, 0.5),
    strength: tuple[float, float] = (0.2, 0.7),
    p: float = 0.5
)

Apply embossing effect to the input image. This transform creates an emboss effect by highlighting edges and creating a 3D-like texture in the image. It works by applying a specific convolution kernel to the image that emphasizes differences in adjacent pixel values.

Parameters

Name | Type | Default | Description
alpha | tuple[float, float] | (0.2, 0.5) | Range to choose the visibility of the embossed image. At 0, only the original image is visible; at 1.0, only its embossed version is visible. Values should be in the range [0, 1]. Alpha will be randomly selected from this range for each image. Default: (0.2, 0.5)
strength | tuple[float, float] | (0.2, 0.7) | Range to choose the strength of the embossing effect. Higher values create a more pronounced 3D effect. Values should be non-negative. Strength will be randomly selected from this range for each image. Default: (0.2, 0.7)
p | float | 0.5 | Probability of applying the transform. Should be in the range [0, 1]. Default: 0.5

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Emboss(alpha=(0.2, 0.5), strength=(0.2, 0.7), p=0.5)
>>> result = transform(image=image)
>>> embossed_image = result['image']

Notes

  • The emboss effect is created using a 3x3 convolution kernel.
  • The 'alpha' parameter controls the blend between the original image and the embossed version. A higher alpha value will result in a more pronounced emboss effect.
  • The 'strength' parameter affects the intensity of the embossing. Higher strength values will create more contrast in the embossed areas, resulting in a stronger 3D-like effect.
  • This transform can be useful for creating artistic effects or for data augmentation in tasks where edge information is important.
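
One way to combine alpha and strength into a single 3x3 kernel is to blend an identity kernel with a directional emboss kernel. The layout below is an assumption borrowed from the commonly used imgaug-style emboss filter; this page only states that a 3x3 kernel is used, so treat `emboss_kernel` as a hypothetical illustration rather than the library's exact kernel.

```python
import numpy as np


def emboss_kernel(alpha: float, strength: float) -> np.ndarray:
    """Blend an identity kernel with a diagonal emboss kernel."""
    identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
    effect = np.array(
        [
            [-1 - strength, -strength, 0],
            [-strength, 1, strength],
            [0, strength, 1 + strength],
        ],
        dtype=np.float32,
    )
    # alpha = 0 keeps the original image, alpha = 1 uses only the emboss kernel.
    return (1 - alpha) * identity + alpha * effect


kernel = emboss_kernel(alpha=0.3, strength=0.5)
no_effect = emboss_kernel(alpha=0.0, strength=0.5)
```

The kernel entries always sum to 1, so flat regions keep their brightness while intensity differences along the diagonal are amplified.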

Equalize class

Equalize(
    mode: Literal['cv', 'pil'] = 'cv',
    by_channels: bool = True,
    mask: np.ndarray | Callable[..., Any] | None = None,
    mask_params: Sequence[str] = (),
    p: float = 0.5
)

Equalize the image histogram. This transform applies histogram equalization to the input image. Histogram equalization is a method in image processing of contrast adjustment using the image's histogram.

Parameters

Name | Type | Default | Description
mode | One of: 'cv', 'pil' | 'cv' | Use OpenCV or Pillow equalization method. Default: 'cv'
by_channels | bool | True | If True, use equalization by channels separately, else convert image to YCbCr representation and use equalization by `Y` channel. Default: True
mask | One of: np.ndarray, Callable[..., Any], None | None | If given, only the pixels selected by the mask are included in the analysis. Can be: a 1-channel or 3-channel numpy array of the same size as the input image; or a callable (function) that generates a mask. The function should accept 'image' as its first argument, and can accept additional arguments specified in mask_params. Default: None
mask_params | Sequence[str] | () | Additional parameters to pass to the mask function. These parameters will be taken from the data dict passed to __call__. Default: ()
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
>>> # Using a static mask
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> transform = A.Equalize(mask=mask, p=1.0)
>>> result = transform(image=image)
>>>
>>> # Using a dynamic mask function
>>> def mask_func(image, bboxes):
...     mask = np.ones_like(image[:, :, 0], dtype=np.uint8)
...     for bbox in bboxes:
...         x1, y1, x2, y2 = map(int, bbox)
...         mask[y1:y2, x1:x2] = 0  # Exclude areas inside bounding boxes
...     return mask
>>>
>>> transform = A.Equalize(mask=mask_func, mask_params=['bboxes'], p=1.0)
>>> bboxes = [(10, 10, 50, 50), (60, 60, 90, 90)]  # Example bounding boxes
>>> result = transform(image=image, bboxes=bboxes)

Notes

  • When mode='cv', OpenCV's equalizeHist() function is used.
  • When mode='pil', Pillow's equalize() function is used.
  • The 'by_channels' parameter determines whether equalization is applied to each color channel independently (True) or to the luminance channel only (False).
  • If a mask is provided as a numpy array, it should have the same height and width as the input image.
  • If a mask is provided as a function, it allows for dynamic mask generation based on the input image and additional parameters. This is useful for scenarios where the mask depends on the image content or external data (e.g., bounding boxes, segmentation masks).

FancyPCA class

FancyPCA(
    alpha: float = 0.1,
    p: float = 0.5
)

Apply Fancy PCA augmentation to the input image. This augmentation technique applies PCA (Principal Component Analysis) to the image's color channels, then adds multiples of the principal components to the image, with magnitudes proportional to the corresponding eigenvalues times a random variable drawn from a Gaussian with mean 0 and standard deviation 'alpha'.

Parameters

Name | Type | Default | Description
alpha | float | 0.1 | Standard deviation of the Gaussian distribution used to generate random noise for each principal component. Default: 0.1.
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.FancyPCA(alpha=0.1, p=1.0)
>>> result = transform(image=image)
>>> augmented_image = result["image"]

Notes

  • This augmentation is particularly effective for RGB images but can work with any number of channels.
  • For grayscale images, it applies a simplified version of the augmentation.
  • The transform preserves the mean of the image while adjusting the color/intensity variation.
  • This implementation is based on the paper by Krizhevsky et al. and is similar to the one used in the original AlexNet paper.
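
The procedure described above (PCA over the color channels, then a shift along each principal component scaled by its eigenvalue and a Gaussian draw) can be sketched in NumPy as follows. This is a simplified illustration for RGB uint8 input, not the library's implementation; `fancy_pca_sketch` is a hypothetical helper name.

```python
import numpy as np


def fancy_pca_sketch(image: np.ndarray, alpha: float, rng: np.random.Generator) -> np.ndarray:
    """Shift every pixel of an RGB uint8 image along its principal color axes."""
    pixels = image.reshape(-1, 3).astype(np.float64) / 255.0
    centered = pixels - pixels.mean(axis=0)
    # 3x3 covariance of the color channels and its eigen-decomposition.
    covariance = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    # One Gaussian draw per principal component, scaled by its eigenvalue.
    weights = rng.normal(0.0, alpha, size=3) * eigenvalues
    shift = eigenvectors @ weights  # the same RGB offset is added to every pixel
    shifted = np.clip(pixels + shift, 0.0, 1.0)
    return np.rint(shifted * 255.0).astype(np.uint8).reshape(image.shape)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = fancy_pca_sketch(image, alpha=0.1, rng=rng)
```

With alpha=0 every Gaussian draw is zero, so the sketch returns the input unchanged; larger alpha values produce stronger global color shifts.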

References

  • ImageNet Classification with Deep Convolutional Neural Networks: In Advances in Neural Information

FromFloat class

FromFloat(
    dtype: Literal['uint8', 'uint16', 'uint32'] = 'uint8',
    max_value: float | None = None,
    p: float = 1.0
)

Convert an image from floating point representation to the specified data type. This transform is designed to convert images from a normalized floating-point representation (typically with values in the range [0, 1]) to other data types, scaling the values appropriately.

Parameters

Name | Type | Default | Description
dtype | One of: 'uint8', 'uint16', 'uint32' | 'uint8' | The desired output data type. Supported types include 'uint8', 'uint16', 'uint32'. Default: 'uint8'.
max_value | One of: float, None | None | The maximum value for the output dtype. If None, the transform will attempt to infer the maximum value based on the dtype. Default: None.
p | float | 1.0 | Probability of applying the transform. Default: 1.0.

Example

>>> import numpy as np
>>> import albumentations as A
>>> transform = A.FromFloat(dtype='uint8', max_value=None, p=1.0)
>>> image = np.random.rand(100, 100, 3).astype(np.float32)  # Float image in [0, 1] range
>>> result = transform(image=image)
>>> uint8_image = result['image']
>>> assert uint8_image.dtype == np.uint8
>>> assert uint8_image.min() >= 0 and uint8_image.max() <= 255

Notes

  • This is the inverse transform for ToFloat.
  • Input images are expected to be in floating point format with values in the range [0, 1].
  • For integer output types (uint8, uint16, uint32), the function will scale the values to the appropriate range (e.g., 0-255 for uint8).
  • For float output types (float32, float64), the values will remain in the [0, 1] range.
  • The transform uses the `from_float` function internally, which ensures output values are within the valid range for the specified dtype.
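
The scaling step for integer outputs can be illustrated with a few lines of NumPy. This sketch assumes that a missing max_value is inferred from np.iinfo and that values are rounded and clipped; the internal `from_float` function may behave differently, and `from_float_sketch` is a hypothetical name.

```python
import numpy as np


def from_float_sketch(image: np.ndarray, dtype: str = "uint8", max_value=None) -> np.ndarray:
    """Scale a float image in [0, 1] to an unsigned integer dtype."""
    target = np.dtype(dtype)
    if max_value is None:
        # Infer the largest representable value, e.g. 255 for uint8.
        max_value = np.iinfo(target).max
    scaled = np.rint(image.astype(np.float64) * max_value)
    return np.clip(scaled, 0, max_value).astype(target)


float_image = np.array([[0.0, 0.25], [0.5, 1.0]], dtype=np.float32)
as_uint8 = from_float_sketch(float_image, "uint8")
as_uint16 = from_float_sketch(float_image, "uint16")
```

The clip guards against inputs slightly outside [0, 1], which would otherwise wrap around when cast to an unsigned type.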

GaussNoise class

GaussNoise(
    std_range: tuple[float, float] = (0.2, 0.44),
    mean_range: tuple[float, float] = (0.0, 0.0),
    per_channel: bool = True,
    noise_scale_factor: float = 1,
    p: float = 0.5
)

Apply Gaussian noise to the input image.

Parameters

Name | Type | Default | Description
std_range | tuple[float, float] | (0.2, 0.44) | Range for noise standard deviation as a fraction of the maximum value (255 for uint8 images or 1.0 for float images). Values should be in range [0, 1]. Default: (0.2, 0.44).
mean_range | tuple[float, float] | (0.0, 0.0) | Range for noise mean as a fraction of the maximum value (255 for uint8 images or 1.0 for float images). Values should be in range [-1, 1]. Default: (0.0, 0.0).
per_channel | bool | True | If True, noise will be sampled for each channel independently. Otherwise, the noise will be sampled once for all channels. Default: True.
noise_scale_factor | float | 1 | Scaling factor for noise generation. Value should be in the range (0, 1]. When set to 1, noise is sampled for each pixel independently. If less, noise is sampled for a smaller size and resized to fit the shape of the image. Smaller values make the transform faster. Default: 1.0.
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Notes

  • The noise parameters (std_range and mean_range) are normalized to the [0, 1] range:
      • For uint8 images, they are multiplied by 255
      • For float32 images, they are used directly
  • Setting per_channel=False is faster but applies the same noise to all channels.
  • The noise_scale_factor parameter allows for a trade-off between transform speed and noise granularity.
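
The normalization rule and the per_channel switch can be seen in a short NumPy sketch. It assumes uint8 input and leaves out noise_scale_factor; `gauss_noise_sketch` is a hypothetical helper, not the library function.

```python
import numpy as np


def gauss_noise_sketch(image, std_range, mean_range, per_channel, rng):
    """Add Gaussian noise to a uint8 image using fractional std/mean ranges."""
    max_value = 255.0  # uint8 input assumed; a float image would use 1.0
    sigma = rng.uniform(*std_range) * max_value
    mean = rng.uniform(*mean_range) * max_value
    if per_channel:
        noise = rng.normal(mean, sigma, size=image.shape)
    else:
        # Sample one 2D map and broadcast it across all channels.
        noise = rng.normal(mean, sigma, size=image.shape[:2])[..., None]
    noisy = image.astype(np.float32) + noise
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
noisy = gauss_noise_sketch(image, (0.2, 0.44), (0.0, 0.0), per_channel=True, rng=rng)
```

With the default std_range of (0.2, 0.44), the sampled standard deviation on a uint8 image lies between 51 and about 112 intensity levels, which is fairly strong noise.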

GaussianParams class

GaussianParams(
    noise_type: Literal = 'gaussian',
    mean_range: Annotated,
    std_range: Annotated
)

Parameters

Name | Type | Default | Description
noise_type | Literal | 'gaussian' | -
mean_range | Annotated | - | -
std_range | Annotated | - | -

HEStain class

HEStain(
    method: Literal['preset', 'random_preset', 'vahadane', 'macenko'] = 'random_preset',
    preset: Literal['ruifrok', 'macenko', 'standard', 'high_contrast', 'h_heavy', 'e_heavy', 'dark', 'light'] | None = None,
    intensity_scale_range: tuple[float, float] = (0.7, 1.3),
    intensity_shift_range: tuple[float, float] = (-0.2, 0.2),
    augment_background: bool = False,
    p: float = 0.5
)

Applies H&E (Hematoxylin and Eosin) stain augmentation to histopathology images. This transform simulates different H&E staining conditions using either:
  1. Predefined stain matrices (8 standard references)
  2. Vahadane method for stain extraction
  3. Macenko method for stain extraction
  4. Custom stain matrices

Parameters

Name | Type | Default | Description
method | One of: 'preset', 'random_preset', 'vahadane', 'macenko' | 'random_preset' | Method to use for stain augmentation: "preset": use predefined stain matrices; "random_preset": randomly select a preset matrix each time; "vahadane": extract using the Vahadane method; "macenko": extract using the Macenko method. Default: "random_preset"
preset | One of: 'ruifrok', 'macenko', 'standard', 'high_contrast', 'h_heavy', 'e_heavy', 'dark', 'light', None | None | Preset stain matrix to use when method="preset": "ruifrok": standard reference from Ruifrok & Johnston; "macenko": reference from Macenko's method; "standard": typical bright-field microscopy; "high_contrast": enhanced contrast; "h_heavy": hematoxylin dominant; "e_heavy": eosin dominant; "dark": darker staining; "light": lighter staining. Default: "standard"
intensity_scale_range | tuple[float, float] | (0.7, 1.3) | Range for multiplicative stain intensity variation. Values are multipliers between 0.5 and 1.5. For example: (0.7, 1.3) means stain intensities will vary from 70% to 130%; (0.9, 1.1) gives subtle variations; (0.5, 1.5) gives dramatic variations. Default: (0.7, 1.3)
intensity_shift_range | tuple[float, float] | (-0.2, 0.2) | Range for additive stain intensity variation. Values between -0.3 and 0.3. For example: (-0.2, 0.2) means intensities will be shifted by -20% to +20%; (-0.1, 0.1) gives subtle shifts; (-0.3, 0.3) gives dramatic shifts. Default: (-0.2, 0.2)
augment_background | bool | False | Whether to apply augmentation to background regions. Default: False
p | float | 0.5 | -

References

  • A. C. Ruifrok and D. A. Johnston, "Quantification of histochemical": Analytical and quantitative cytology and histology, 2001.
  • M. Macenko et al., "A method for normalizing histology slides for quantitative analysis," 2009 IEEE International Symposium on Biomedical Imaging, 2009.

HueSaturationValue class

HueSaturationValue(
    hue_shift_limit: tuple[float, float] | float = (-20, 20),
    sat_shift_limit: tuple[float, float] | float = (-30, 30),
    val_shift_limit: tuple[float, float] | float = (-20, 20),
    p: float = 0.5
)

Randomly change hue, saturation and value of the input image. This transform adjusts the HSV (Hue, Saturation, Value) channels of an input RGB image. It allows for independent control over each channel, providing a wide range of color and brightness modifications.

Parameters

Name | Type | Default | Description
hue_shift_limit | One of: tuple[float, float], float | (-20, 20) | Range for changing hue. If a single float value is provided, the range will be (-hue_shift_limit, hue_shift_limit). Values should be in the range [-180, 180]. Default: (-20, 20).
sat_shift_limit | One of: tuple[float, float], float | (-30, 30) | Range for changing saturation. If a single float value is provided, the range will be (-sat_shift_limit, sat_shift_limit). Values should be in the range [-255, 255]. Default: (-30, 30).
val_shift_limit | One of: tuple[float, float], float | (-20, 20) | Range for changing value (brightness). If a single float value is provided, the range will be (-val_shift_limit, val_shift_limit). Values should be in the range [-255, 255]. Default: (-20, 20).
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.HueSaturationValue(
...     hue_shift_limit=20,
...     sat_shift_limit=30,
...     val_shift_limit=20,
...     p=0.7
... )
>>> result = transform(image=image)
>>> augmented_image = result["image"]

Notes

  • The transform first converts the input RGB image to the HSV color space.
  • Each channel (Hue, Saturation, Value) is adjusted independently.
  • Hue is circular, so it wraps around at 180 degrees.
  • For float32 images, the shift values are applied as percentages of the full range.
  • This transform is particularly useful for color augmentation and simulating different lighting conditions.

ISONoise class

ISONoise(
    color_shift: tuple[float, float] = (0.01, 0.05),
    intensity: tuple[float, float] = (0.1, 0.5),
    p: float = 0.5
)

Applies camera sensor noise to the input image, simulating high ISO settings. This transform adds random noise to an image, mimicking the effect of using high ISO settings in digital photography. It simulates two main components of ISO noise:
  1. Color noise: random shifts in color hue
  2. Luminance noise: random variations in pixel intensity

Parameters

Name | Type | Default | Description
color_shift | tuple[float, float] | (0.01, 0.05) | Range for changing color hue. Values should be in the range [0, 1], where 1 represents a full 360° hue rotation. Default: (0.01, 0.05)
intensity | tuple[float, float] | (0.1, 0.5) | Range for the noise intensity. Higher values increase the strength of both color and luminance noise. Default: (0.1, 0.5)
p | float | 0.5 | Probability of applying the transform. Default: 0.5

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ISONoise(color_shift=(0.01, 0.05), intensity=(0.1, 0.5), p=0.5)
>>> result = transform(image=image)
>>> noisy_image = result["image"]

Notes

  • This transform only works with RGB images. It will raise a TypeError if applied to non-RGB images.
  • The color shift is applied in the HSV color space, affecting the hue channel.
  • Luminance noise is added to all channels independently.
  • This transform can be useful for data augmentation in low-light scenarios or when training models to be robust against noisy inputs.

Illuminationclass

Illumination(
    mode: Literal['linear', 'corner', 'gaussian'] = linear,
    intensity_range: tuple[float, float] = (0.01, 0.2),
    effect_type: Literal['brighten', 'darken', 'both'] = both,
    angle_range: tuple[float, float] = (0, 360),
    center_range: tuple[float, float] = (0.1, 0.9),
    sigma_range: tuple[float, float] = (0.2, 1.0),
    p: float = 0.5
)

Apply various illumination effects to the image. This transform simulates different lighting conditions by applying controlled illumination patterns. It can create effects like:
- Directional lighting (linear mode)
- Corner shadows/highlights (corner mode)
- Spotlights or local lighting (gaussian mode)

These effects can be used to:
- Simulate natural lighting variations
- Add dramatic lighting effects
- Create synthetic shadows or highlights
- Augment training data with different lighting conditions

Parameters

NameTypeDefaultDescription
mode
One of:
  • 'linear'
  • 'corner'
  • 'gaussian'
linearType of illumination pattern: - 'linear': Creates a smooth gradient across the image, simulating directional lighting like sunlight through a window - 'corner': Applies gradient from any corner, simulating light source from a corner - 'gaussian': Creates a circular spotlight effect, simulating local light sources Default: 'linear'
intensity_rangetuple[float, float](0.01, 0.2)Range for effect strength. Values between 0.01 and 0.2: - 0.01-0.05: Subtle lighting changes - 0.05-0.1: Moderate lighting effects - 0.1-0.2: Strong lighting effects Default: (0.01, 0.2)
effect_type
One of:
  • 'brighten'
  • 'darken'
  • 'both'
bothType of lighting change: - 'brighten': Only adds light (like a spotlight) - 'darken': Only removes light (like a shadow) - 'both': Randomly chooses between brightening and darkening Default: 'both'
angle_rangetuple[float, float](0, 360)Range for gradient angle in degrees. Controls direction of linear gradient: - 0°: Left to right - 90°: Top to bottom - 180°: Right to left - 270°: Bottom to top Only used for 'linear' mode. Default: (0, 360)
center_rangetuple[float, float](0.1, 0.9)Range for spotlight position. Values between 0 and 1 representing relative position: - (0, 0): Top-left corner - (1, 1): Bottom-right corner - (0.5, 0.5): Center of image Only used for 'gaussian' mode. Default: (0.1, 0.9)
sigma_rangetuple[float, float](0.2, 1.0)Range for spotlight size. Values between 0.2 and 1.0: - 0.2: Small, focused spotlight - 0.5: Medium-sized light area - 1.0: Broad, soft lighting Only used for 'gaussian' mode. Default: (0.2, 1.0)
pfloat0.5Probability of applying the transform. Default: 0.5

Notes

- The transform preserves image range and dtype
- Effects are applied multiplicatively to preserve texture
- Can be combined with other transforms for complex lighting scenarios
- Useful for training models to be robust to lighting variations
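
The multiplicative behaviour described in the notes can be illustrated with a minimal NumPy sketch. This is not the library's implementation: `linear_illumination` is a hypothetical helper, and the mask formula `1 + intensity * gradient` is an assumption chosen to match the 'linear' mode description (0° runs left to right, 90° top to bottom).

```python
import numpy as np

def linear_illumination(image: np.ndarray, intensity: float, angle_deg: float) -> np.ndarray:
    """Hypothetical helper: multiply a uint8 image by a linear gradient mask."""
    height, width = image.shape[:2]
    # Pixel coordinates normalized to [0, 1].
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)
    xs /= max(width - 1, 1)
    ys /= max(height - 1, 1)
    angle = np.deg2rad(angle_deg)
    # Project each pixel onto the gradient direction.
    projection = xs * np.cos(angle) + ys * np.sin(angle)
    span = max(float(projection.max() - projection.min()), 1e-6)
    projection = (projection - projection.min()) / span
    # Assumed mask: positive intensity brightens, negative intensity darkens.
    mask = 1.0 + intensity * projection
    if image.ndim == 3:
        mask = mask[..., None]  # broadcast the same mask over all channels
    out = image.astype(np.float32) * mask
    # Keep the original range and dtype.
    return np.clip(np.rint(out), 0, 255).astype(image.dtype)

flat = np.full((4, 5, 3), 100, dtype=np.uint8)
brightened = linear_illumination(flat, intensity=0.2, angle_deg=0.0)
darkened = linear_illumination(flat, intensity=-0.2, angle_deg=0.0)
```

Because the mask multiplies pixel values rather than adding to them, local texture contrast is scaled along with the brightness instead of being flattened.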

ImageCompressionclass

ImageCompression(
    compression_type: Literal['jpeg', 'webp'] = jpeg,
    quality_range: tuple[int, int] = (99, 100),
    p: float = 0.5
)

Decrease image quality by applying JPEG or WebP compression. This transform simulates the effect of saving an image with lower quality settings, which can introduce compression artifacts. It's useful for data augmentation and for testing model robustness against varying image qualities.

Parameters

NameTypeDefaultDescription
compression_type
One of:
  • 'jpeg'
  • 'webp'
jpegType of compression to apply. - "jpeg": JPEG compression - "webp": WebP compression Default: "jpeg"
quality_rangetuple[int, int](99, 100)Range for the compression quality. The values should be in [1, 100] range, where: - 1 is the lowest quality (maximum compression) - 100 is the highest quality (minimum compression) Default: (99, 100)
pfloat0.5Probability of applying the transform. Default: 0.5.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ImageCompression(quality_range=(50, 90), compression_type="jpeg", p=1.0)
>>> result = transform(image=image)
>>> compressed_image = result["image"]

Notes

- This transform expects images with 1, 3, or 4 channels.
- For JPEG compression, alpha channels (4th channel) will be ignored.
- WebP compression supports transparency (4 channels).
- The actual file is not saved to disk; the compression is simulated in memory.
- Lower quality values result in smaller file sizes but may introduce visible artifacts.
- This transform can be useful for:
  * Data augmentation to improve model robustness
  * Testing how models perform on images of varying quality
  * Simulating images transmitted over low-bandwidth connections

InterpolationPydanticclass

InterpolationPydantic(
    upscale: Literal,
    downscale: Literal
)

Parameters

NameTypeDefaultDescription
upscaleLiteral--
downscaleLiteral--

InvertImgclass

InvertImg(
    p: float = 0.5
)

Invert the input image by subtracting pixel values from the maximum value of the image dtype, i.e., 255 for uint8 and 1.0 for float32.

Parameters

NameTypeDefaultDescription
pfloat0.5Probability of applying the transform. Default: 0.5.
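
The inversion rule in the description can be written as a short NumPy sketch. `invert` is a hypothetical helper for illustration, not the library's function, and it only covers the two dtypes named in the description.

```python
import numpy as np

def invert(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: subtract pixel values from the dtype maximum."""
    # 255 for uint8, 1.0 for float32, as stated in the description.
    max_value = 255 if image.dtype == np.uint8 else 1.0
    return max_value - image

inverted_uint8 = invert(np.array([[0, 100, 255]], dtype=np.uint8))
inverted_float = invert(np.array([0.25, 1.0], dtype=np.float32))
```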

Lambdaclass

Lambda(
    image: Callable[..., Any] | None = None,
    mask: Callable[..., Any] | None = None,
    keypoints: Callable[..., Any] | None = None,
    bboxes: Callable[..., Any] | None = None,
    name: str | None = None,
    p: float = 1.0
)

A flexible transformation class for applying user-defined functions to individual targets. Each function's signature must include **kwargs so that it can accept optional arguments such as the interpolation method or image size.

Parameters

NameTypeDefaultDescription
image
One of:
  • Callable[..., Any]
  • None
NoneImage transformation function.
mask
One of:
  • Callable[..., Any]
  • None
NoneMask transformation function.
keypoints
One of:
  • Callable[..., Any]
  • None
NoneKeypoints transformation function.
bboxes
One of:
  • Callable[..., Any]
  • None
NoneBBoxes transformation function.
name
One of:
  • str
  • None
None-
pfloat1.0Probability of applying the transform. Default: 1.0.
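
As a minimal sketch of the **kwargs requirement, the functions below accept and ignore any extra keyword arguments. The names `flip_image` and `flip_mask` are hypothetical, and the `shape` keyword in the call is only a stand-in for whatever optional arguments a pipeline might pass.

```python
import numpy as np

def flip_image(image: np.ndarray, **kwargs) -> np.ndarray:
    # **kwargs absorbs optional arguments passed alongside the image.
    return np.ascontiguousarray(image[:, ::-1])

def flip_mask(mask: np.ndarray, **kwargs) -> np.ndarray:
    # The mask gets the same horizontal flip so it stays aligned with the image.
    return np.ascontiguousarray(mask[:, ::-1])

image = np.arange(6, dtype=np.uint8).reshape(2, 3)
flipped = flip_image(image, shape=image.shape)  # the extra keyword is ignored
```

Given the constructor signature above, such functions would be passed as `A.Lambda(image=flip_image, mask=flip_mask, p=1.0)`.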

LaplaceParamsclass

LaplaceParams(
    noise_type: Literal = laplace,
    mean_range: Annotated,
    scale_range: Annotated
)

Parameters

NameTypeDefaultDescription
noise_typeLiterallaplace-
mean_rangeAnnotated--
scale_rangeAnnotated--

Morphologicalclass

Morphological(
    scale: tuple[int, int] | int = (2, 3),
    operation: Literal['erosion', 'dilation'] = dilation,
    p: float = 0.5
)

Apply a morphological operation (dilation or erosion) to an image, with particular value for enhancing document scans. Morphological operations modify the structure of the image. Dilation expands the white (foreground) regions in a binary or grayscale image, while erosion shrinks them. These operations are beneficial in document processing, for example:
- Dilation helps in closing up gaps within text or making thin lines thicker, enhancing legibility for OCR (Optical Character Recognition).
- Erosion can remove small white noise and detach connected objects, making the structure of larger objects more pronounced.

Parameters

NameTypeDefaultDescription
scale
One of:
  • tuple[int, int]
  • int
(2, 3)Specifies the size of the structuring element (kernel) used for the operation. - If an integer is provided, a square kernel of that size will be used. - If a tuple or list is provided, it should contain two integers representing the minimum and maximum sizes for the kernel.
operation
One of:
  • 'erosion'
  • 'dilation'
dilationThe morphological operation to apply. Default is 'dilation'.
pfloat0.5The probability of applying this transformation. Default is 0.5.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Compose([
...     A.Morphological(scale=(2, 3), operation='dilation', p=0.5)
... ])
>>> image = transform(image=image)["image"]
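
To make the effect of dilation and erosion concrete, here is a small NumPy-only sketch for single-channel images with a square kernel. `morph` is a hypothetical helper; the edge padding and the kernel anchoring for even sizes are assumptions and may differ from what the library does internally.

```python
import numpy as np

def morph(image: np.ndarray, kernel_size: int, operation: str = "dilation") -> np.ndarray:
    """Hypothetical helper: square-kernel dilation or erosion of a 2D image."""
    # Dilation takes the neighbourhood maximum, erosion the minimum.
    reduce_fn = np.maximum if operation == "dilation" else np.minimum
    before = kernel_size // 2
    after = kernel_size - 1 - before
    # Assumed border handling: replicate edge pixels.
    padded = np.pad(image, ((before, after), (before, after)), mode="edge")
    height, width = image.shape
    out = image.copy()
    # Combine every shifted copy of the image covered by the kernel.
    for dy in range(kernel_size):
        for dx in range(kernel_size):
            out = reduce_fn(out, padded[dy:dy + height, dx:dx + width])
    return out

dot = np.zeros((5, 5), dtype=np.uint8)
dot[2, 2] = 255
dilated = morph(dot, 3, "dilation")    # the single pixel grows into a 3x3 block
restored = morph(dilated, 3, "erosion")  # erosion shrinks the block back
```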

MultiplicativeNoiseclass

MultiplicativeNoise(
    multiplier: tuple[float, float] | float = (0.9, 1.1),
    per_channel: bool = False,
    elementwise: bool = False,
    p: float = 0.5
)

Apply multiplicative noise to the input image. This transform multiplies each pixel in the image by a random value or array of values, effectively creating a noise pattern that scales with the image intensity.

Parameters

NameTypeDefaultDescription
multiplier
One of:
  • tuple[float, float]
  • float
(0.9, 1.1)The range for the random multiplier. Defines the range from which the multiplier is sampled. Default: (0.9, 1.1)
per_channelboolFalseIf True, use a different random multiplier for each channel. If False, use the same multiplier for all channels. Setting this to False is slightly faster. Default: False
elementwiseboolFalseIf True, generates a unique multiplier for each pixel. If False, generates a single multiplier (or one per channel if per_channel=True). Default: False
pfloat0.5Probability of applying the transform. Default: 0.5

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.MultiplicativeNoise(multiplier=(0.9, 1.1), per_channel=True, p=1.0)
>>> result = transform(image=image)
>>> noisy_image = result["image"]

Notes

- When elementwise=False and per_channel=False, a single multiplier is applied to the entire image.
- When elementwise=False and per_channel=True, each channel gets a different multiplier.
- When elementwise=True and per_channel=False, each pixel gets the same multiplier across all channels.
- When elementwise=True and per_channel=True, each pixel in each channel gets a unique multiplier.
- Setting per_channel=False is slightly faster, especially for larger images.
- This transform can be used to simulate various lighting conditions or to create noise that scales with image intensity.
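
The four combinations in the notes map naturally onto NumPy broadcasting shapes. The sketch below is an illustration under that assumption; `multiplier_shape` is a hypothetical helper and not part of the library.

```python
import numpy as np

def multiplier_shape(image_shape, per_channel: bool, elementwise: bool):
    """Hypothetical helper: shape of the multiplier array for an HxWxC image."""
    height, width, channels = image_shape
    last = channels if per_channel else 1
    if elementwise:
        return (height, width, last)  # one multiplier per pixel
    return (1, 1, last)               # one multiplier per image or per channel

rng = np.random.default_rng(0)
image = rng.integers(0, 256, (4, 5, 3), dtype=np.uint8)

shape = multiplier_shape(image.shape, per_channel=True, elementwise=False)
multiplier = rng.uniform(0.9, 1.1, size=shape)
# Broadcasting stretches the multiplier over the full image.
noisy = np.clip(image * multiplier, 0, 255).astype(np.uint8)
```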

NoiseParamsBaseclass

NoiseParamsBase(
    noise_type: str
)

Base class for all noise parameter models.

Parameters

NameTypeDefaultDescription
noise_typestr--

Normalizeclass

Normalize(
    mean: tuple[float, ...] | float | None = (0.485, 0.456, 0.406),
    std: tuple[float, ...] | float | None = (0.229, 0.224, 0.225),
    max_pixel_value: float | None = 255.0,
    normalization: Literal['standard', 'image', 'image_per_channel', 'min_max', 'min_max_per_channel'] = standard,
    p: float = 1.0
)

Applies various normalization techniques to an image. The specific normalization technique can be selected with the `normalization` parameter. Standard normalization is applied using the formula: `img = (img - mean * max_pixel_value) / (std * max_pixel_value)`. Other normalization techniques adjust the image based on global or per-channel statistics, or scale pixel values to a specified range.

Parameters

NameTypeDefaultDescription
mean
One of:
  • tuple[float, ...]
  • float
  • None
(0.485, 0.456, 0.406)Mean values for standard normalization. For "standard" normalization, the default values are ImageNet mean values: (0.485, 0.456, 0.406).
std
One of:
  • tuple[float, ...]
  • float
  • None
(0.229, 0.224, 0.225)Standard deviation values for standard normalization. For "standard" normalization, the default values are the ImageNet standard deviation values: (0.229, 0.224, 0.225).
max_pixel_value
One of:
  • float
  • None
255.0Maximum possible pixel value, used for scaling in standard normalization. Defaults to 255.0.
normalization
One of:
  • 'standard'
  • 'image'
  • 'image_per_channel'
  • 'min_max'
  • 'min_max_per_channel'
standardSpecifies the normalization technique to apply. Defaults to "standard". - "standard": Applies the formula `(img - mean * max_pixel_value) / (std * max_pixel_value)`. The default mean and std are based on ImageNet. You can use mean and std values of (0.5, 0.5, 0.5) for inception normalization. And mean values of (0, 0, 0) and std values of (1, 1, 1) for YOLO. - "image": Normalizes the whole image based on its global mean and standard deviation. - "image_per_channel": Normalizes the image per channel based on each channel's mean and standard deviation. - "min_max": Scales the image pixel values to a [0, 1] range based on the global minimum and maximum pixel values. - "min_max_per_channel": Scales each channel of the image pixel values to a [0, 1] range based on the per-channel minimum and maximum pixel values.
pfloat1.0Probability of applying the transform. Defaults to 1.0.

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> # Standard ImageNet normalization
>>> transform = A.Normalize(
...     mean=(0.485, 0.456, 0.406),
...     std=(0.229, 0.224, 0.225),
...     max_pixel_value=255.0,
...     p=1.0
... )
>>> normalized_image = transform(image=image)["image"]
>>>
>>> # Min-max normalization
>>> transform_minmax = A.Normalize(normalization="min_max", p=1.0)
>>> normalized_image_minmax = transform_minmax(image=image)["image"]

Notes

- For "standard" normalization, `mean`, `std`, and `max_pixel_value` must be provided.
- For other normalization types, these parameters are ignored.
- For inception normalization, use mean and std values of (0.5, 0.5, 0.5).
- For YOLO normalization, use mean values of (0, 0, 0) and std values of (1, 1, 1).
- This transform is often used as a final step in image preprocessing pipelines to prepare images for neural network input.
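
The "standard" formula and the "min_max" mode can be checked by hand with a short NumPy sketch. `normalize_standard` and `normalize_min_max` are hypothetical helpers written from the formulas in this section; the small epsilon guard against a zero range is an assumption.

```python
import numpy as np

def normalize_standard(img, mean, std, max_pixel_value=255.0):
    """Hypothetical helper: (img - mean * max_pixel_value) / (std * max_pixel_value)."""
    mean = np.asarray(mean, dtype=np.float32) * max_pixel_value
    std = np.asarray(std, dtype=np.float32) * max_pixel_value
    return (img.astype(np.float32) - mean) / std

def normalize_min_max(img):
    """Hypothetical helper: scale to [0, 1] using the global minimum and maximum."""
    img = img.astype(np.float32)
    low, high = img.min(), img.max()
    # Assumed guard so a constant image does not divide by zero.
    return (img - low) / max(float(high - low), 1e-6)

white = np.full((2, 2, 3), 255, dtype=np.uint8)
# Inception-style values: white maps to 1.0 and black to -1.0.
inception_white = normalize_standard(white, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
ramp = np.array([[[0], [128], [255]]], dtype=np.uint8)
ramp_scaled = normalize_min_max(ramp)
```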

PlanckianJitterclass

PlanckianJitter(
    mode: Literal['blackbody', 'cied'] = blackbody,
    temperature_limit: tuple[int, int] | None = None,
    sampling_method: Literal['uniform', 'gaussian'] = uniform,
    p: float = 0.5
)

Applies Planckian Jitter to the input image, simulating color temperature variations in illumination. This transform adjusts the color of an image to mimic the effect of different color temperatures of light sources, based on Planck's law of black body radiation. It can simulate the appearance of an image under various lighting conditions, from warm (reddish) to cool (bluish) color casts.

PlanckianJitter vs. ColorJitter: PlanckianJitter is fundamentally different from ColorJitter in its approach and use cases:
1. Physics-based: PlanckianJitter is grounded in the physics of light, simulating real-world color temperature changes. ColorJitter applies arbitrary color adjustments.
2. Natural effects: This transform produces color shifts that correspond to natural lighting variations, making it ideal for outdoor scene simulation or color constancy problems.
3. Single parameter: Color changes are controlled by a single, physically meaningful parameter (color temperature), unlike ColorJitter's multiple abstract parameters.
4. Correlated changes: Color shifts are correlated across channels in a way that mimics natural light, whereas ColorJitter can make independent channel adjustments.

When to use PlanckianJitter:
- Simulating different times of day or lighting conditions in outdoor scenes
- Augmenting data for computer vision tasks that need to be robust to natural lighting changes
- Preparing synthetic data to better match real-world lighting variations
- Color constancy research or applications
- When you need physically plausible color variations rather than arbitrary color changes

The logic behind PlanckianJitter: As the color temperature increases:
1. Lower temperatures (around 3000K) produce warm, reddish tones, simulating sunset or incandescent lighting.
2. Mid-range temperatures (around 5500K) correspond to daylight.
3. Higher temperatures (above 7000K) result in cool, bluish tones, similar to overcast sky or shade.

This progression mimics the natural variation of sunlight throughout the day and in different weather conditions.

Parameters

NameTypeDefaultDescription
mode
One of:
  • 'blackbody'
  • 'cied'
blackbodyThe mode of the transformation. - "blackbody": Simulates blackbody radiation color changes. - "cied": Uses the CIE D illuminant series for color temperature simulation. Default: "blackbody"
temperature_limit
One of:
  • tuple[int, int]
  • None
NoneThe range of color temperatures (in Kelvin) to sample from. - For "blackbody" mode: Should be within [3000K, 15000K]. Default: (3000, 15000) - For "cied" mode: Should be within [4000K, 15000K]. Default: (4000, 15000) If None, the default ranges will be used based on the selected mode. Higher temperatures produce cooler (bluish) images, lower temperatures produce warmer (reddish) images.
sampling_method
One of:
  • 'uniform'
  • 'gaussian'
uniformMethod to sample the temperature. - "uniform": Samples uniformly across the specified range. - "gaussian": Samples from a Gaussian distribution centered at 6500K (approximate daylight). Default: "uniform"
pfloat0.5Probability of applying the transform. Default: 0.5

Example

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> transform = A.PlanckianJitter(mode="blackbody",
...                               temperature_limit=(3000, 9000),
...                               sampling_method="uniform",
...                               p=1.0)
>>> result = transform(image=image)
>>> jittered_image = result["image"]

Notes

- The transform preserves the overall brightness of the image while shifting its color.
- The "blackbody" mode provides a wider range of color shifts, especially in the lower (warmer) temperatures.
- The "cied" mode is based on standard illuminants and may provide more realistic daylight variations.
- The Gaussian sampling method tends to produce more subtle variations, as it's centered around daylight.
- Unlike ColorJitter, this transform ensures that color changes are physically plausible and correlated across channels, maintaining the natural appearance of the scene under different lighting conditions.

PlasmaBrightnessContrastclass

PlasmaBrightnessContrast(
    brightness_range: tuple[float, float] = (-0.3, 0.3),
    contrast_range: tuple[float, float] = (-0.3, 0.3),
    plasma_size: int = 256,
    roughness: float = 3.0,
    p: float = 0.5
)

Apply plasma fractal pattern to modify image brightness and contrast. Uses Diamond-Square algorithm to generate organic-looking fractal patterns that create spatially-varying brightness and contrast adjustments.

Parameters

NameTypeDefaultDescription
brightness_rangetuple[float, float](-0.3, 0.3)Range for brightness adjustment strength. Values between -1 and 1: - Positive values increase brightness - Negative values decrease brightness - 0 means no brightness change Default: (-0.3, 0.3)
contrast_rangetuple[float, float](-0.3, 0.3)Range for contrast adjustment strength. Values between -1 and 1: - Positive values increase contrast - Negative values decrease contrast - 0 means no contrast change Default: (-0.3, 0.3)
plasma_sizeint256Size of the initial plasma pattern grid. Larger values create more detailed patterns but are slower to compute. The pattern will be resized to match the input image dimensions. Default: 256
roughnessfloat3.0Controls how quickly the noise amplitude increases at each iteration. Must be greater than 0: - Low values (< 1.0): Smoother, more gradual pattern - Medium values (~2.0): Natural-looking pattern - High values (> 3.0): Very rough, noisy pattern Default: 3.0
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- Works with any number of channels (grayscale, RGB, multispectral)
- The same plasma pattern is applied to all channels
- Operations are performed in float32 precision
- Final values are clipped to valid range [0, max_value]

PlasmaShadowclass

PlasmaShadow(
    shadow_intensity_range: tuple[float, float] = (0.3, 0.7),
    plasma_size: int = 256,
    roughness: float = 3.0,
    p: float = 0.5
)

Apply plasma-based shadow effect to the image using Diamond-Square algorithm. Creates organic-looking shadows using plasma fractal noise pattern. The shadow intensity varies smoothly across the image, creating natural-looking darkening effects that can simulate shadows, shading, or lighting variations.

Parameters

NameTypeDefaultDescription
shadow_intensity_rangetuple[float, float](0.3, 0.7)Range for shadow intensity. Values between 0 and 1: - 0 means no shadow (original image) - 1 means maximum darkening (black) - Values between create partial shadows Default: (0.3, 0.7)
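plasma_sizeint256Size of the initial plasma pattern grid. The pattern will be resized to match the input image dimensions. Default: 256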
roughnessfloat3.0Controls how quickly the noise amplitude increases at each iteration. Must be greater than 0: - Low values (< 1.0): Smoother, more gradual shadows - Medium values (~2.0): Natural-looking shadows - High values (> 3.0): Very rough, noisy shadows Default: 3.0
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- The transform darkens the image using a plasma pattern
- Works with any number of channels (grayscale, RGB, multispectral)
- Shadow pattern is generated using Diamond-Square algorithm with specific kernels
- The same shadow pattern is applied to all channels
- Final values are clipped to valid range [0, max_value]

Posterizeclass

Posterize(
    num_bits: int | tuple[int, int] | list[tuple[int, int]] = 4,
    p: float = 0.5
)

Reduces the number of bits for each color channel in the image. This transform applies color posterization, a technique that reduces the number of distinct colors used in an image. It works by lowering the number of bits used to represent each color channel, effectively creating a "poster-like" effect with fewer color gradations.

Parameters

NameTypeDefaultDescription
num_bits
One of:
  • int
  • tuple[int, int]
  • list[tuple[int, int]]
4Defines the number of bits to keep for each color channel. Can be specified in several ways: - Single int: Same number of bits for all channels. Range: [1, 7]. - tuple of two ints: (min_bits, max_bits) to randomly choose from. Range for each: [1, 7]. - list of three ints: Specific number of bits for each channel [r_bits, g_bits, b_bits]. - list of three tuples: Ranges for each channel [(r_min, r_max), (g_min, g_max), (b_min, b_max)]. Default: 4
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- The effect becomes more pronounced as the number of bits is reduced.
- This transform can create interesting artistic effects or be used for image compression simulation.
- Posterization is particularly useful for:
  * Creating stylized or retro-looking images
  * Reducing the color palette for specific artistic effects
  * Simulating the look of older or lower-quality digital images
  * Data augmentation in scenarios where color depth might vary
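
Bit reduction for uint8 images is commonly done by masking off the low-order bits, and the sketch below assumes that approach. `posterize` is a hypothetical helper, not the library's function.

```python
import numpy as np

def posterize(image: np.ndarray, num_bits: int) -> np.ndarray:
    """Hypothetical helper: keep the num_bits most significant bits of each value."""
    # For num_bits=4 the mask is 0b11110000, leaving 16 levels per channel.
    mask = np.uint8((0xFF << (8 - num_bits)) & 0xFF)
    return image & mask

values = np.array([0, 15, 16, 200, 255], dtype=np.uint8)
four_bits = posterize(values, 4)
one_bit = posterize(values, 1)
```

With `num_bits=1` every value collapses to either 0 or 128, which is why the effect grows stronger as the bit count drops.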

RGBShiftclass

RGBShift(
    r_shift_limit: tuple[float, float] | float = (-20, 20),
    g_shift_limit: tuple[float, float] | float = (-20, 20),
    b_shift_limit: tuple[float, float] | float = (-20, 20),
    p: float = 0.5
)

Randomly shift values for each channel of the input RGB image. A specialized version of AdditiveNoise that applies constant uniform shifts to RGB channels. Each channel (R,G,B) can have its own shift range specified.

Parameters

NameTypeDefaultDescription
r_shift_limit
One of:
  • tuple[float, float]
  • float
(-20, 20)Range for shifting the red channel. Options: - If tuple (min, max): Sample shift value from this range - If a single number: Sample shift value from (-r_shift_limit, r_shift_limit) - For uint8 images: Values represent absolute shifts in [0, 255] - For float images: Values represent relative shifts in [0, 1] Default: (-20, 20)
g_shift_limit
One of:
  • tuple[float, float]
  • float
(-20, 20)Range for shifting the green channel. Options: - If tuple (min, max): Sample shift value from this range - If a single number: Sample shift value from (-g_shift_limit, g_shift_limit) - For uint8 images: Values represent absolute shifts in [0, 255] - For float images: Values represent relative shifts in [0, 1] Default: (-20, 20)
b_shift_limit
One of:
  • tuple[float, float]
  • float
(-20, 20)Range for shifting the blue channel. Options: - If tuple (min, max): Sample shift value from this range - If a single number: Sample shift value from (-b_shift_limit, b_shift_limit) - For uint8 images: Values represent absolute shifts in [0, 255] - For float images: Values represent relative shifts in [0, 1] Default: (-20, 20)
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- Values are shifted independently for each channel
- For uint8 images:
  * Input ranges like (-20, 20) represent pixel value shifts
  * A shift of 20 means adding 20 to that channel
  * Final values are clipped to [0, 255]
- For float32 images:
  * Input ranges like (-0.1, 0.1) represent relative shifts
  * A shift of 0.1 means adding 0.1 to that channel
  * Final values are clipped to [0, 1]
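
The shift-and-clip behaviour in the notes can be sketched in a few lines of NumPy. `shift_rgb` is a hypothetical helper that takes already-sampled shift values; sampling from the limit ranges is left out.

```python
import numpy as np

def shift_rgb(image: np.ndarray, r_shift: float, g_shift: float, b_shift: float) -> np.ndarray:
    """Hypothetical helper: add one constant per channel, then clip."""
    shifts = np.array([r_shift, g_shift, b_shift], dtype=np.float32)
    max_value = 255 if image.dtype == np.uint8 else 1.0
    # Work in float32 so the addition cannot wrap around in uint8.
    shifted = image.astype(np.float32) + shifts
    return np.clip(shifted, 0, max_value).astype(image.dtype)

pixel_uint8 = np.array([[[250, 10, 100]]], dtype=np.uint8)
shifted_uint8 = shift_rgb(pixel_uint8, 20, -20, 5)

pixel_float = np.array([[[0.95, 0.05, 0.5]]], dtype=np.float32)
shifted_float = shift_rgb(pixel_float, 0.1, -0.1, 0.0)
```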

RandomBrightnessContrastclass

RandomBrightnessContrast(
    brightness_limit: tuple[float, float] | float = (-0.2, 0.2),
    contrast_limit: tuple[float, float] | float = (-0.2, 0.2),
    brightness_by_max: bool = True,
    ensure_safe_range: bool = False,
    p: float = 0.5
)

Randomly changes the brightness and contrast of the input image. This transform adjusts the brightness and contrast of an image simultaneously, allowing for a wide range of lighting and contrast variations. It's particularly useful for data augmentation in computer vision tasks, helping models become more robust to different lighting conditions.

Parameters

NameTypeDefaultDescription
brightness_limit
One of:
  • tuple[float, float]
  • float
(-0.2, 0.2)Factor range for changing brightness. If a single float value is provided, the range will be (-brightness_limit, brightness_limit). Values should typically be in the range [-1.0, 1.0], where 0 means no change, 1.0 means maximum brightness, and -1.0 means minimum brightness. Default: (-0.2, 0.2).
contrast_limit
One of:
  • tuple[float, float]
  • float
(-0.2, 0.2)Factor range for changing contrast. If a single float value is provided, the range will be (-contrast_limit, contrast_limit). Values should typically be in the range [-1.0, 1.0], where 0 means no change, 1.0 means maximum increase in contrast, and -1.0 means maximum decrease in contrast. Default: (-0.2, 0.2).
brightness_by_maxboolTrueIf True, adjusts brightness by scaling pixel values up to the maximum value of the image's dtype. If False, uses the mean pixel value for adjustment. Default: True.
ensure_safe_rangeboolFalseIf True, adjusts alpha and beta to prevent overflow/underflow. This ensures output values stay within the valid range for the image dtype without clipping. Default: False.
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- The order of operation is: contrast adjustment, then brightness adjustment.
- For uint8 images, the output is clipped to [0, 255] range.
- For float32 images, the output is clipped to [0, 1] range.
- The `brightness_by_max` parameter affects how brightness is adjusted:
  * If True, brightness adjustment is more pronounced and can lead to more saturated results.
  * If False, brightness adjustment is more subtle and preserves the overall lighting better.
- This transform is useful for:
  * Simulating different lighting conditions
  * Enhancing low-light or overexposed images
  * Data augmentation to improve model robustness
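
A rough sketch of how a contrast factor (alpha) and a brightness factor (beta) could combine is shown below. The formula `img * alpha + beta * reference`, with `alpha = 1 + contrast`, is an assumption consistent with the parameter descriptions here, and `brightness_contrast` is a hypothetical helper rather than the library's code.

```python
import numpy as np

def brightness_contrast(image, contrast, brightness, brightness_by_max=True):
    """Hypothetical helper: contrast scaling followed by a brightness offset."""
    max_value = 255.0 if image.dtype == np.uint8 else 1.0
    alpha = 1.0 + contrast  # assumed multiplicative contrast factor
    beta = brightness       # assumed additive brightness factor
    img = image.astype(np.float32)
    # The offset scales with the dtype maximum or with the image mean.
    reference = max_value if brightness_by_max else float(img.mean())
    out = np.clip(img * alpha + beta * reference, 0, max_value)
    if image.dtype == np.uint8:
        out = np.rint(out)
    return out.astype(image.dtype)

gray = np.full((2, 2, 3), 100, dtype=np.uint8)
by_max = brightness_contrast(gray, contrast=0.0, brightness=0.2, brightness_by_max=True)
by_mean = brightness_contrast(gray, contrast=0.0, brightness=0.2, brightness_by_max=False)
```

For this mid-gray image the max-based offset adds 51 while the mean-based offset adds 20, which matches the note that `brightness_by_max=True` gives the more pronounced change.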

RandomFogclass

RandomFog(
    alpha_coef: float = 0.08,
    fog_coef_range: tuple[float, float] = (0.3, 1),
    p: float = 0.5
)

Simulates fog for the image by adding random fog-like artifacts. This transform creates a fog effect by generating semi-transparent overlays that mimic the visual characteristics of fog. The fog intensity and distribution can be controlled to create various fog-like conditions.

Parameters

NameTypeDefaultDescription
alpha_coeffloat0.08Transparency of the fog circles. Should be in [0, 1] range. Default: 0.08.
fog_coef_rangetuple[float, float](0.3, 1)Range for fog intensity coefficient. Should be in [0, 1] range.
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- The fog effect is created by overlaying semi-transparent circles on the image.
- Higher fog coefficient values result in denser fog effects.
- The fog is typically denser in the center of the image and gradually decreases towards the edges.
- This transform is useful for:
  * Simulating various weather conditions in outdoor scenes
  * Data augmentation for improving model robustness to foggy conditions
  * Creating atmospheric effects in image editing

RandomGammaclass

RandomGamma(
    gamma_limit: tuple[float, float] | float = (80, 120),
    p: float = 0.5
)

Applies random gamma correction to the input image. Gamma correction, or simply gamma, is a nonlinear operation used to encode and decode luminance or tristimulus values in imaging systems. This transform can adjust the brightness of an image while preserving the relative differences between darker and lighter areas, making it useful for simulating different lighting conditions or correcting for display characteristics.

Parameters

NameTypeDefaultDescription
gamma_limit
One of:
  • tuple[float, float]
  • float
(80, 120)If gamma_limit is a single float value, the range will be (1, gamma_limit). If it's a tuple of two floats, they will serve as the lower and upper bounds for gamma adjustment. Values are in terms of percentage change, e.g., (80, 120) means the gamma will be between 80% and 120% of the original. Default: (80, 120).
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- The gamma correction is applied using the formula: output = input^gamma
- Gamma values > 1 will make the image darker, while values < 1 will make it brighter
- This transform is particularly useful for:
  * Simulating different lighting conditions
  * Correcting for non-linear display characteristics
  * Enhancing contrast in certain regions of the image
  * Data augmentation in computer vision tasks
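
The formula output = input^gamma can be tried directly in NumPy. In this sketch `apply_gamma` is a hypothetical helper; it assumes the percentage-style value from `gamma_limit` is divided by 100 and that uint8 images are scaled to [0, 1] before the power is applied.

```python
import numpy as np

def apply_gamma(image: np.ndarray, gamma_percent: float) -> np.ndarray:
    """Hypothetical helper: output = input ** gamma on a [0, 1] scale."""
    gamma = gamma_percent / 100.0  # e.g. 80 -> 0.8, 120 -> 1.2
    if image.dtype == np.uint8:
        normalized = image.astype(np.float32) / 255.0
        return np.rint(np.power(normalized, gamma) * 255.0).astype(np.uint8)
    return np.power(image, gamma).astype(image.dtype)

levels = np.array([0, 128, 255], dtype=np.uint8)
darker = apply_gamma(levels, 120)    # gamma > 1 darkens mid-tones
brighter = apply_gamma(levels, 80)   # gamma < 1 brightens mid-tones
unchanged = apply_gamma(levels, 100)
```

Black and white stay fixed for any gamma; only the values in between move.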

RandomGravelclass

RandomGravel(
    gravel_roi: tuple[float, float, float, float] = (0.1, 0.4, 0.9, 0.9),
    number_of_patches: int = 2,
    p: float = 0.5
)

Adds gravel-like artifacts to the input image. This transform simulates the appearance of gravel or small stones scattered across specific regions of an image. It's particularly useful for augmenting datasets of road or terrain images, adding realistic texture variations.

Parameters

NameTypeDefaultDescription
gravel_roituple[float, float, float, float](0.1, 0.4, 0.9, 0.9)Region of interest where gravel will be added, specified as (x_min, y_min, x_max, y_max) in relative coordinates [0, 1]. Default: (0.1, 0.4, 0.9, 0.9).
number_of_patchesint2Number of gravel patch regions to generate within the ROI. Each patch will contain multiple gravel particles. Default: 2.
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- The gravel effect is created by modifying the saturation channel in the HLS color space.
- Gravel particles are distributed within randomly generated patches inside the specified ROI.
- This transform is particularly useful for:
  * Augmenting datasets for road condition analysis
  * Simulating variations in terrain for computer vision tasks
  * Adding realistic texture to synthetic images of outdoor scenes

RandomRainclass

RandomRain(
    slant_range: tuple[float, float] = (-10, 10),
    drop_length: int | None = None,
    drop_width: int = 1,
    drop_color: tuple[int, int, int] = (200, 200, 200),
    blur_value: int = 7,
    brightness_coefficient: float = 0.7,
    rain_type: Literal['drizzle', 'heavy', 'torrential', 'default'] = default,
    p: float = 0.5
)

Adds rain effects to an image. This transform simulates rainfall by overlaying semi-transparent streaks onto the image, creating a realistic rain effect. It can be used to augment datasets for computer vision tasks that need to perform well in rainy conditions.

Parameters

NameTypeDefaultDescription
slant_rangetuple[float, float](-10, 10)Range for the rain slant angle in degrees. Negative values slant to the left, positive to the right. Default: (-10, 10).
drop_length
One of:
  • int
  • None
NoneLength of the rain drops in pixels. If None, drop length will be automatically calculated as height // 8. This allows the rain effect to scale with the image size. Default: None
drop_widthint1Width of the rain drops in pixels. Default: 1.
drop_colortuple[int, int, int](200, 200, 200)Color of the rain drops in RGB format. Default: (200, 200, 200).
blur_valueint7Blur value for simulating rain effect. Rainy views are typically blurry. Default: 7.
brightness_coefficientfloat0.7Coefficient to adjust the brightness of the image. Rainy scenes are usually darker. Should be in the range (0, 1]. Default: 0.7.
rain_type
One of:
  • 'drizzle'
  • 'heavy'
  • 'torrential'
  • 'default'
defaultType of rain to simulate.
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- The rain effect is created by drawing semi-transparent lines on the image.
- The slant of the rain can be controlled to simulate wind effects.
- Different rain types (drizzle, heavy, torrential) adjust the density and appearance of the rain.
- The transform also adjusts image brightness and applies a blur to simulate the visual effects of rain.
- This transform is particularly useful for:
  * Augmenting datasets for autonomous driving in rainy conditions
  * Testing the robustness of computer vision models to weather effects
  * Creating realistic rainy scenes for image editing or film production

RandomShadow class

RandomShadow(
    shadow_roi: tuple[float, float, float, float] = (0, 0.5, 1, 1),
    num_shadows_limit: tuple[int, int] = (1, 2),
    shadow_dimension: int = 5,
    shadow_intensity_range: tuple[float, float] = (0.5, 0.5),
    p: float = 0.5
)

Simulates shadows for the image by reducing the brightness of the image in shadow regions. This transform adds realistic shadow effects to images, which can be useful for augmenting datasets for outdoor scene analysis, autonomous driving, or any computer vision task where shadows may be present.

Parameters

NameTypeDefaultDescription
shadow_roituple[float, float, float, float](0, 0.5, 1, 1)Region of the image where shadows will appear (x_min, y_min, x_max, y_max). All values should be in range [0, 1]. Default: (0, 0.5, 1, 1).
num_shadows_limittuple[int, int](1, 2)Lower and upper limits for the possible number of shadows. Default: (1, 2).
shadow_dimensionint5Number of edges in the shadow polygons. Default: 5.
shadow_intensity_rangetuple[float, float](0.5, 0.5)Range for the shadow intensity. Larger value means darker shadow. Should be two float values between 0 and 1. Default: (0.5, 0.5).
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- Shadows are created by generating random polygons within the specified ROI and reducing the brightness of the image in these areas.
- The number of shadows, their shapes, and intensities can be randomized for variety.
- This transform is particularly useful for:
  * Augmenting datasets for outdoor scene understanding
  * Improving robustness of object detection models to shadowed conditions
  * Simulating different lighting conditions in synthetic datasets

RandomSnow class

RandomSnow(
    brightness_coeff: float = 2.5,
    snow_point_range: tuple[float, float] = (0.1, 0.3),
    method: Literal['bleach', 'texture'] = 'bleach',
    p: float = 0.5
)

Applies a random snow effect to the input image. This transform simulates snowfall by either bleaching out some pixel values or adding a snow texture to the image, depending on the chosen method.

Parameters

NameTypeDefaultDescription
brightness_coefffloat2.5Coefficient applied to increase the brightness of pixels below the snow_point threshold. Larger values lead to more pronounced snow effects. Should be > 0. Default: 2.5.
snow_point_rangetuple[float, float](0.1, 0.3)Range for the snow point threshold. Both values should be in the (0, 1) range. Default: (0.1, 0.3).
method
One of:
  • 'bleach'
  • 'texture'
bleachThe snow simulation method to use. Options are: - "bleach": Uses a simple pixel value thresholding technique. - "texture": Applies a more realistic snow texture overlay. Default: "bleach".
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- The "bleach" method increases the brightness of pixels below the sampled snow_point threshold, creating a simple snow effect. This method is faster but may look less realistic.
- The "texture" method creates a more realistic snow effect through the following steps:
  1. Converts the image to HSV color space for better control over brightness.
  2. Increases overall image brightness to simulate the reflective nature of snow.
  3. Generates a snow texture using Gaussian noise, which is then smoothed with a Gaussian filter.
  4. Applies a depth effect to the snow texture, making it more prominent at the top of the image.
  5. Blends the snow texture with the original image using alpha compositing.
  6. Adds a slight blue tint to simulate the cool color of snow.
  7. Adds random sparkle effects to simulate light reflecting off snow crystals.
  This method produces a more realistic result but is computationally more expensive.
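As a rough illustration of the "bleach" thresholding idea, the sketch below follows the brightness_coeff description (values below the snow_point threshold are brightened). It operates directly on a lightness array in [0, 1] instead of converting an image to a color space first, and the helper name `bleach_lightness` is hypothetical; it is not the library's implementation.

```python
import numpy as np


def bleach_lightness(lightness, snow_point, brightness_coeff=2.5):
    """Brighten lightness values below snow_point, then clip to [0, 1].

    Hypothetical helper for illustration only.
    """
    out = lightness.astype(np.float32).copy()
    below = out < snow_point          # pixels darker than the threshold
    out[below] *= brightness_coeff    # push them toward white
    return np.clip(out, 0.0, 1.0)


lightness = np.array([[0.05, 0.2], [0.5, 0.9]], dtype=np.float32)
result = bleach_lightness(lightness, snow_point=0.3)
```

Only the two values below 0.3 are scaled; the brighter values pass through unchanged.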

RandomSunFlare class

RandomSunFlare(
    flare_roi: tuple[float, float, float, float] = (0, 0, 1, 0.5),
    src_radius: int = 400,
    src_color: tuple[int, ...] = (255, 255, 255),
    angle_range: tuple[float, float] = (0, 1),
    num_flare_circles_range: tuple[int, int] = (6, 10),
    method: Literal['overlay', 'physics_based'] = 'overlay',
    p: float = 0.5
)

Simulates a sun flare effect on the image by adding circles of light. This transform creates a sun flare effect by overlaying multiple semi-transparent circles of varying sizes and intensities along a line originating from a "sun" point. It offers two methods: a simple overlay technique and a more complex physics-based approach.

Parameters

NameTypeDefaultDescription
flare_roituple[float, float, float, float](0, 0, 1, 0.5)Region of interest where the sun flare can appear. Values are in the range [0, 1] and represent (x_min, y_min, x_max, y_max) in relative coordinates. Default: (0, 0, 1, 0.5).
src_radiusint400Radius of the sun circle in pixels. Default: 400.
src_colortuple[int, ...](255, 255, 255)Color of the sun in RGB format. Default: (255, 255, 255).
angle_rangetuple[float, float](0, 1)Range for the flare direction, expressed as a fraction of a full turn. Values should be in the range [0, 1], where 0 corresponds to 0 radians and 1 corresponds to 2π radians. Default: (0, 1).
num_flare_circles_rangetuple[int, int](6, 10)Range for the number of flare circles to generate. Default: (6, 10).
method
One of:
  • 'overlay'
  • 'physics_based'
overlayMethod to use for generating the sun flare. "overlay" uses a simple alpha blending technique, while "physics_based" simulates more realistic optical phenomena. Default: "overlay".
pfloat0.5Probability of applying the transform. Default: 0.5.
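The angle_range convention (0 maps to 0 radians, 1 maps to 2π radians) can be sketched as a simple scaling. The function name `sample_flare_angle` and the use of uniform sampling are assumptions made for this illustration.

```python
import math
import random


def sample_flare_angle(angle_range=(0.0, 1.0), rng=None):
    """Sample a fraction of a full turn and convert it to radians.

    Hypothetical helper; uniform sampling is an assumption.
    """
    rng = rng or random.Random()
    fraction = rng.uniform(*angle_range)
    return 2.0 * math.pi * fraction


# A degenerate range pins the direction: a quarter turn is pi / 2 radians.
quarter_turn = sample_flare_angle((0.25, 0.25))
```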

RandomToneCurve class

RandomToneCurve(
    scale: float = 0.1,
    per_channel: bool = False,
    p: float = 0.5
)

Randomly change the relationship between bright and dark areas of the image by manipulating its tone curve. This transform applies a random S-curve to the image's tone curve, adjusting the brightness and contrast in a non-linear manner. It can be applied to the entire image or to each channel separately.

Parameters

NameTypeDefaultDescription
scalefloat0.1Standard deviation of the normal distribution used to sample random distances to move two control points that modify the image's curve. Values should be in range [0, 1]. Higher values will result in more dramatic changes to the image. Default: 0.1
per_channelboolFalseIf True, the tone curve will be applied to each channel of the input image separately, which can lead to color distortion. If False, the same curve is applied to all channels, preserving the original color relationships. Default: False
pfloat0.5Probability of applying the transform. Default: 0.5

Notes

- This transform modifies the image's histogram by applying a smooth, S-shaped curve to it.
- The S-curve is defined by moving the two inner control points of a cubic Bézier curve whose endpoints are fixed.
- When per_channel is False, the same curve is applied to all channels, maintaining color balance.
- When per_channel is True, different curves are applied to each channel, which can create color shifts.
- This transform can be used to adjust image contrast and brightness in a more natural way than linear transforms.
- The effect can range from subtle contrast adjustments to more dramatic "vintage" or "faded" looks.
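A minimal sketch of such a tone curve, assuming a Bézier polynomial with fixed endpoints at 0 and 1 and two movable inner control values (four control values in total, which makes the polynomial cubic). The curve is evaluated on the normalized input intensity and stored as a 256-entry lookup table. The sampling centers 0.25 and 0.75 and the name `tone_curve_lut` are assumptions for this illustration, not confirmed library details.

```python
import numpy as np


def tone_curve_lut(low_y, high_y):
    """Build a uint8 lookup table from a Bezier curve with endpoints 0 and 1."""
    t = np.linspace(0.0, 1.0, 256)
    curve = (
        3 * (1 - t) ** 2 * t * low_y      # pull of the lower control value
        + 3 * (1 - t) * t ** 2 * high_y   # pull of the upper control value
        + t ** 3                          # fixed endpoint at 1
    )
    return np.clip(np.rint(curve * 255), 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
scale = 0.1
# Assumed sampling scheme: normal noise around 0.25 and 0.75, clipped to [0, 1].
low_y = float(np.clip(rng.normal(0.25, scale), 0, 1))
high_y = float(np.clip(rng.normal(0.75, scale), 0, 1))

image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
toned = tone_curve_lut(low_y, high_y)[image]  # same curve for all channels
```

With control values 1/3 and 2/3 the polynomial reduces to the identity, so the lookup table leaves the image unchanged; moving the values away from those positions bends the curve.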

RingingOvershoot class

RingingOvershoot(
    blur_limit: tuple[int, int] | int = (7, 15),
    cutoff: tuple[float, float] = (0.7853981633974483, 1.5707963267948966),
    p: float = 0.5
)

Create ringing or overshoot artifacts by convolving the image with a 2D sinc filter. This transform simulates the ringing artifacts that can occur in digital image processing, particularly after sharpening or edge enhancement operations. It creates oscillations or overshoots near sharp transitions in the image.

Parameters

NameTypeDefaultDescription
blur_limit
One of:
  • tuple[int, int]
  • int
(7, 15)Maximum kernel size for the sinc filter. Must be an odd number in the range [3, inf). If a single int is provided, the kernel size will be randomly chosen from the range (3, blur_limit). If a tuple (min, max) is provided, the kernel size will be randomly chosen from the range (min, max). Default: (7, 15).
cutofftuple[float, float](0.7853981633974483, 1.5707963267948966)Range to choose the cutoff frequency in radians. Values should be in the range (0, π). A lower cutoff frequency will result in more pronounced ringing effects. Default: (π/4, π/2).
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- Ringing artifacts are oscillations of the image intensity function in the neighborhood of sharp transitions, such as edges or object boundaries.
- This transform uses a 2D sinc filter (also known as a 2D cardinal sine function) to introduce these artifacts.
- The severity of the ringing effect is controlled by both the kernel size (blur_limit) and the cutoff frequency.
- Larger kernel sizes and lower cutoff frequencies will generally produce more noticeable ringing effects.
- This transform can be useful for:
  * Simulating imperfections in image processing or transmission systems
  * Testing the robustness of computer vision models to ringing artifacts
  * Creating artistic effects that emphasize edges and transitions in images
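To show why a sinc filter produces ringing, the sketch below builds a small low-pass kernel as the outer product of two 1D sinc responses. The separable construction and the name `separable_sinc_kernel` are assumptions chosen to keep the example NumPy-only; the library may build a radially symmetric kernel instead.

```python
import numpy as np


def separable_sinc_kernel(ksize, cutoff):
    """2D low-pass kernel from a truncated 1D sinc (cutoff in radians)."""
    if ksize % 2 == 0:
        raise ValueError("ksize must be odd")
    n = np.arange(ksize) - ksize // 2
    # Ideal low-pass impulse response: sin(cutoff * n) / (pi * n).
    h = (cutoff / np.pi) * np.sinc(cutoff * n / np.pi)
    kernel = np.outer(h, h)
    return kernel / kernel.sum()  # normalize so flat regions keep their value


kernel = separable_sinc_kernel(15, np.pi / 4)
```

The negative side lobes of the kernel are what create overshoot and oscillation next to sharp edges once the image is convolved with it.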

SaltAndPepper class

SaltAndPepper(
    amount: tuple[float, float] = (0.01, 0.06),
    salt_vs_pepper: tuple[float, float] = (0.4, 0.6),
    p: float = 0.5
)

Apply salt and pepper noise to the input image. Salt and pepper noise is a form of impulse noise that randomly sets pixels to either maximum value (salt) or minimum value (pepper). The amount and proportion of salt vs pepper noise can be controlled. The same noise mask is applied to all channels of the image to preserve color consistency.

Parameters

NameTypeDefaultDescription
amounttuple[float, float](0.01, 0.06)Range for total amount of noise (both salt and pepper). Values between 0 and 1. For example: - 0.05 means 5% of all pixels will be replaced with noise - (0.01, 0.06) will sample amount uniformly from 1% to 6% Default: (0.01, 0.06)
salt_vs_peppertuple[float, float](0.4, 0.6)Range for ratio of salt (white) vs pepper (black) noise. Values between 0 and 1. For example: - 0.5 means equal amounts of salt and pepper - 0.7 means 70% of noisy pixels will be salt, 30% pepper - (0.4, 0.6) will sample ratio uniformly from 40% to 60% Default: (0.4, 0.6)
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- Salt noise sets pixels to maximum value (255 for uint8, 1.0 for float32)
- Pepper noise sets pixels to 0
- The noise mask is generated once and applied to all channels to maintain color consistency (i.e., if a pixel is set to salt, all its color channels will be set to maximum value)
- The exact number of affected pixels matches the specified amount as masks are generated without overlap
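The properties listed above (one mask shared by all channels, no overlap between salt and pepper) can be sketched in NumPy as follows. This is an illustrative approximation, not the library's code; the function name `salt_and_pepper` and the rounding of pixel counts are assumptions.

```python
import numpy as np


def salt_and_pepper(image, amount, salt_vs_pepper, rng):
    """Set a fraction of pixels to max (salt) or 0 (pepper) across all channels."""
    h, w = image.shape[:2]
    n_noisy = int(round(amount * h * w))
    n_salt = int(round(n_noisy * salt_vs_pepper))
    # Distinct pixel indices, so the salt and pepper sets cannot overlap.
    idx = rng.permutation(h * w)[:n_noisy]
    salt_idx, pepper_idx = idx[:n_salt], idx[n_salt:]

    max_value = 255 if image.dtype == np.uint8 else 1.0
    out = image.copy()
    flat = out.reshape(h * w, -1)  # view into out: one row per pixel
    flat[salt_idx] = max_value     # all channels of the pixel become salt
    flat[pepper_idx] = 0           # all channels of the pixel become pepper
    return out


rng = np.random.default_rng(0)
gray = np.full((10, 10, 3), 128, dtype=np.uint8)
noisy = salt_and_pepper(gray, amount=0.1, salt_vs_pepper=0.5, rng=rng)
```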

References

  • Digital Image Processing: Rafael C. Gonzalez and Richard E. Woods, 4th Edition, Chapter 5: Image Restoration and Reconstruction.
  • Fundamentals of Digital Image Processing: A. K. Jain, Chapter 7: Image Degradation and Restoration.
  • Salt and pepper noise: https://en.wikipedia.org/wiki/Salt-and-pepper_noise

Sharpen class

Sharpen(
    alpha: tuple[float, float] = (0.2, 0.5),
    lightness: tuple[float, float] = (0.5, 1.0),
    method: Literal['kernel', 'gaussian'] = 'kernel',
    kernel_size: int = 5,
    sigma: float = 1.0,
    p: float = 0.5
)

Sharpen the input image using either a kernel-based or a Gaussian interpolation method. Implements two different approaches to image sharpening:
  1. Traditional kernel-based method using the Laplacian operator
  2. Gaussian interpolation method (similar to Kornia's approach)

Parameters

NameTypeDefaultDescription
alphatuple[float, float](0.2, 0.5)Range for the visibility of sharpening effect. At 0, only the original image is visible, at 1.0 only its processed version is visible. Values should be in the range [0, 1]. Used in both methods. Default: (0.2, 0.5).
lightnesstuple[float, float](0.5, 1.0)Range for the lightness of the sharpened image. Only used in 'kernel' method. Larger values create higher contrast. Values should be greater than 0. Default: (0.5, 1.0).
method
One of:
  • 'kernel'
  • 'gaussian'
kernelSharpening algorithm to use: - 'kernel': Traditional kernel-based sharpening using Laplacian operator - 'gaussian': Interpolation between Gaussian blurred and original image Default: 'kernel'
kernel_sizeint5Size of the Gaussian blur kernel for 'gaussian' method. Must be odd. Default: 5
sigmafloat1.0Standard deviation for Gaussian kernel in 'gaussian' method. Default: 1.0
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- Kernel sizes must be odd to maintain spatial alignment
- Methods produce different visual results:
  * Kernel method: More pronounced edges, possible artifacts
  * Gaussian method: More natural look, limited to original sharpness
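One common way to build a Laplacian-style sharpening kernel that blends with the original image is sketched below. The exact 3x3 matrices and the name `sharpening_kernel` are assumptions for illustration; the source only states that the 'kernel' method uses a Laplacian operator with alpha and lightness controls.

```python
import numpy as np


def sharpening_kernel(alpha, lightness):
    """Blend an identity kernel with a Laplacian-style sharpening kernel."""
    identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
    effect = np.array(
        [[-1, -1, -1], [-1, 8 + lightness, -1], [-1, -1, -1]],
        dtype=np.float32,
    )
    # alpha = 0 keeps the original image, alpha = 1 uses only the effect.
    return (1 - alpha) * identity + alpha * effect


kernel = sharpening_kernel(alpha=0.5, lightness=1.0)
```

With lightness equal to 1 the kernel sums to 1, so flat regions keep their brightness; larger lightness values raise the center weight and brighten the result.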

ShotNoise class

ShotNoise(
    scale_range: tuple[float, float] = (0.1, 0.3),
    p: float = 0.5
)

Apply shot noise to the image by modeling photon counting as a Poisson process. Shot noise (also known as Poisson noise) occurs in imaging due to the quantum nature of light. When photons hit an imaging sensor, they arrive at random times following Poisson statistics. This transform simulates this physical process in linear light space by:
  1. Converting to linear space (removing gamma)
  2. Treating each pixel value as an expected photon count
  3. Sampling actual photon counts from a Poisson distribution
  4. Converting back to display space (reapplying gamma)
The noise characteristics follow real camera behavior:
- Noise variance equals signal mean in linear space (Poisson statistics)
- Brighter regions have more absolute noise but less relative noise
- Darker regions have less absolute noise but more relative noise
- Noise is generated independently for each pixel and color channel

Parameters

NameTypeDefaultDescription
scale_rangetuple[float, float](0.1, 0.3)Range for sampling the noise scale factor. Represents the reciprocal of the expected photon count per unit intensity. Higher values mean more noise: - scale = 0.1: ~10 photons per unit intensity (low noise) - scale = 1.0: ~1 photon per unit intensity (moderate noise) - scale = 10.0: ~0.1 photons per unit intensity (high noise) Default: (0.1, 0.3)
pfloat0.5Probability of applying the transform. Default: 0.5

Example

>>> import numpy as np
>>> import albumentations as A
>>> # Generate synthetic image
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> # Apply moderate shot noise
>>> transform = A.ShotNoise(scale_range=(0.1, 1.0), p=1.0)
>>> noisy_image = transform(image=image)["image"]

Notes

- Performs calculations in linear light space (gamma = 2.2)
- Preserves the image's mean intensity
- Memory efficient with in-place operations
- Thread-safe with independent random seeds
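The four-step process described for this transform can be approximated with NumPy's Poisson sampler. The sketch assumes uint8 input and a fixed scale, and the name `shot_noise` is hypothetical; it is meant to illustrate the statistics, not to reproduce the library's implementation.

```python
import numpy as np


def shot_noise(image, scale, rng, gamma=2.2):
    """Poisson noise applied in linear light space (uint8 input assumed)."""
    img = image.astype(np.float32) / 255.0
    linear = img ** gamma                      # 1. remove gamma
    expected_photons = linear / scale          # 2. intensity -> photon count
    photons = rng.poisson(expected_photons)    # 3. sample actual counts
    noisy_linear = np.clip(photons * scale, 0.0, 1.0)
    noisy = noisy_linear ** (1.0 / gamma)      # 4. reapply gamma
    return np.rint(noisy * 255).astype(np.uint8)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
noisy = shot_noise(image, scale=0.1, rng=rng)
```

Because a Poisson variable with rate 0 is always 0, fully black pixels stay black, while brighter pixels receive noise whose variance grows with their linear intensity.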

Solarize class

Solarize(
    threshold_range: tuple[float, float] = (0.5, 0.5),
    p: float = 0.5
)

Invert all pixel values above a threshold. This transform applies a solarization effect to the input image. Solarization is a phenomenon in photography in which the image recorded on a negative or on a photographic print is wholly or partially reversed in tone. Dark areas appear light or light areas appear dark. In this implementation, all pixel values above a threshold are inverted.

Parameters

NameTypeDefaultDescription
threshold_rangetuple[float, float](0.5, 0.5)Range for solarizing threshold as a fraction of maximum value. The threshold_range should be in the range [0, 1] and will be multiplied by the maximum value of the image type (255 for uint8 images or 1.0 for float images). Default: (0.5, 0.5) (corresponds to 127.5 for uint8 and 0.5 for float32).
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- For uint8 images, pixel values above the threshold are inverted as: 255 - pixel_value
- For float32 images, pixel values above the threshold are inverted as: 1.0 - pixel_value
- The threshold is applied to each channel independently
- The threshold is calculated in two steps:
  1. Sample a value from threshold_range
  2. Multiply by the image's maximum value:
     * For uint8: threshold = sampled_value * 255
     * For float32: threshold = sampled_value * 1.0
- This transform can create interesting artistic effects or be used for data augmentation
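The thresholding rule above can be written in a few lines of NumPy. This is a sketch of the documented behavior with a fixed threshold fraction; the function name `solarize` is chosen for the example and the code is not taken from the library.

```python
import numpy as np


def solarize(image, threshold_fraction):
    """Invert values strictly above threshold_fraction * max_value."""
    max_value = 255 if image.dtype == np.uint8 else 1.0
    threshold = threshold_fraction * max_value
    inverted = max_value - image
    return np.where(image > threshold, inverted, image).astype(image.dtype)


pixels = np.array([0, 100, 127, 128, 200, 255], dtype=np.uint8)
result = solarize(pixels, 0.5)  # threshold is 127.5 for uint8
# result is [0, 100, 127, 127, 55, 0]
```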

Spatter class

Spatter(
    mean: tuple[float, float] | float = (0.65, 0.65),
    std: tuple[float, float] | float = (0.3, 0.3),
    gauss_sigma: tuple[float, float] | float = (2, 2),
    cutout_threshold: tuple[float, float] | float = (0.68, 0.68),
    intensity: tuple[float, float] | float = (0.6, 0.6),
    mode: Literal['rain', 'mud'] = 'rain',
    color: tuple[int, ...] | None = None,
    p: float = 0.5
)

Apply spatter transform. It simulates corruption which can occlude a lens in the form of rain or mud.

Parameters

NameTypeDefaultDescription
mean
One of:
  • tuple[float, float]
  • float
(0.65, 0.65)Mean value of the normal distribution used to generate the liquid layer. If a single float, the mean is sampled from `(0, mean)`. If a tuple of floats, the mean is sampled from the range `(mean[0], mean[1])`. For a constant value, use `(mean, mean)`. Default: (0.65, 0.65).
std
One of:
  • tuple[float, float]
  • float
(0.3, 0.3)Standard deviation of the normal distribution used to generate the liquid layer. If a single float, the value is sampled from `(0, std)`. If a tuple of floats, the value is sampled from the range `(std[0], std[1])`. For a constant value, use `(std, std)`. Default: (0.3, 0.3).
gauss_sigma
One of:
  • tuple[float, float]
  • float
(2, 2)Sigma value for Gaussian filtering of the liquid layer. If a single float, the value is sampled from `(0, gauss_sigma)`. If a tuple of floats, the value is sampled from the range `(gauss_sigma[0], gauss_sigma[1])`. For a constant value, use `(gauss_sigma, gauss_sigma)`. Default: (2, 2).
cutout_threshold
One of:
  • tuple[float, float]
  • float
(0.68, 0.68)Threshold for filtering the liquid layer (determines the number of drops). If a single float, the value is sampled from `(0, cutout_threshold)`. If a tuple of floats, the value is sampled from the range `(cutout_threshold[0], cutout_threshold[1])`. For a constant value, use `(cutout_threshold, cutout_threshold)`. Default: (0.68, 0.68).
intensity
One of:
  • tuple[float, float]
  • float
(0.6, 0.6)Intensity of the corruption. If a single float, the value is sampled from `(0, intensity)`. If a tuple of floats, the value is sampled from the range `(intensity[0], intensity[1])`. For a constant value, use `(intensity, intensity)`. Default: (0.6, 0.6).
mode
One of:
  • 'rain'
  • 'mud'
rainType of corruption. Default: "rain".
color
One of:
  • tuple[int, ...]
  • None
NoneColor of the corruption elements. If a tuple is provided, it is used as the color of the effect. If None, a default color based on mode is used (rain: (238, 238, 175), mud: (20, 42, 63)).
pfloat0.5Probability of applying the transform. Default: 0.5.

Superpixels class

Superpixels(
    p_replace: tuple[float, float] | float = (0, 0.1),
    n_segments: tuple[int, int] | int = (100, 100),
    max_size: int | None = 128,
    interpolation: Literal[cv2.INTER_NEAREST, cv2.INTER_NEAREST_EXACT, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4, cv2.INTER_LINEAR_EXACT] = 1,
    p: float = 0.5
)

Transform images partially/completely to their superpixel representation.

Parameters

NameTypeDefaultDescription
p_replace
One of:
  • tuple[float, float]
  • float
(0, 0.1)Defines for any segment the probability that the pixels within that segment are replaced by their average color (otherwise, the pixels are not changed). * A probability of ``0.0`` would mean that the pixels in no segment are replaced by their average color (image is not changed at all). * A probability of ``0.5`` would mean that around half of all segments are replaced by their average color. * A probability of ``1.0`` would mean that all segments are replaced by their average color (resulting in a voronoi image). Behavior based on chosen data types for this parameter: * If a ``float``, then that ``float`` will always be used. * If ``tuple`` ``(a, b)``, then a random probability will be sampled from the interval ``[a, b]`` per image. Default: (0, 0.1)
n_segments
One of:
  • tuple[int, int]
  • int
(100, 100)Rough target number of how many superpixels to generate. The algorithm may deviate from this number. Lower values will lead to coarser superpixels. Higher values are computationally more intensive and will hence lead to a slowdown. If tuple ``(a, b)``, then a value from the discrete interval ``[a..b]`` will be sampled per image. Default: (100, 100)
max_size
One of:
  • int
  • None
128Maximum image size at which the augmentation is performed. If the width or height of an image exceeds this value, it will be downscaled before the augmentation so that the longest side matches `max_size`. This is done to speed up the process. The final output image has the same size as the input image. Note that in case `p_replace` is below ``1.0``, the down-/upscaling will affect the not-replaced pixels too. Use ``None`` to apply no down-/upscaling. Default: 128
interpolation
One of:
  • cv2.INTER_NEAREST
  • cv2.INTER_NEAREST_EXACT
  • cv2.INTER_LINEAR
  • cv2.INTER_CUBIC
  • cv2.INTER_AREA
  • cv2.INTER_LANCZOS4
  • cv2.INTER_LINEAR_EXACT
1Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- This transform can significantly change the visual appearance of the image.
- The transform makes use of a superpixel algorithm, which tends to be slow. If performance is a concern, consider using `max_size` to limit the image size.
- The effect of this transform can vary greatly depending on the `p_replace` and `n_segments` parameters.
- When `p_replace` is high, the image can become highly abstracted, resembling a voronoi diagram.
- The transform preserves the original image type (uint8 or float32).

ToFloat class

ToFloat(
    max_value: float | None = None,
    p: float = 1.0
)

Convert the input image to a floating-point representation. This transform divides pixel values by `max_value` to get a float32 output array where all values lie in the range [0, 1.0]. It's useful for normalizing image data before feeding it into neural networks or other algorithms that expect float input.

Parameters

NameTypeDefaultDescription
max_value
One of:
  • float
  • None
NoneThe maximum possible input value. If None, the transform will try to infer the maximum value by inspecting the data type of the input image: - uint8: 255 - uint16: 65535 - uint32: 4294967295 - float32: 1.0 Default: None.
pfloat1.0Probability of applying the transform. Default: 1.0.

Returns

  • np.ndarray: Image in floating point representation, with values in range [0, 1.0].

Notes

- If the input image is already float32 with values in [0, 1], it will be returned unchanged.
- For integer types (uint8, uint16, uint32), the function will scale the values to [0, 1] range.
- The output will always be float32, regardless of the input type.
- This transform is often used as a preprocessing step before applying other transformations or feeding the image into a neural network.
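The dtype-based inference of max_value can be sketched with a small lookup table built from the values listed in the parameter description. The names `MAX_VALUES` and `to_float` are hypothetical and the code is only an illustration of the documented behavior.

```python
import numpy as np

# Maximum values per dtype, as listed in the max_value description.
MAX_VALUES = {
    np.dtype("uint8"): 255.0,
    np.dtype("uint16"): 65535.0,
    np.dtype("uint32"): 4294967295.0,
    np.dtype("float32"): 1.0,
}


def to_float(image, max_value=None):
    """Divide by max_value (inferred from dtype when None); return float32."""
    if max_value is None:
        max_value = MAX_VALUES[image.dtype]
    return (image / max_value).astype(np.float32)


as_float = to_float(np.array([0, 51, 255], dtype=np.uint8))
# as_float is approximately [0.0, 0.2, 1.0] with dtype float32
```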

ToGray class

ToGray(
    num_output_channels: int = 3,
    method: Literal['weighted_average', 'from_lab', 'desaturation', 'average', 'max', 'pca'] = 'weighted_average',
    p: float = 0.5
)

Convert an image to grayscale and optionally replicate the grayscale channel. This transform first converts a color image to a single-channel grayscale image using various methods, then replicates the grayscale channel if num_output_channels is greater than 1.

Parameters

NameTypeDefaultDescription
num_output_channelsint3The number of channels in the output image. If greater than 1, the grayscale channel will be replicated. Default: 3.
method
One of:
  • 'weighted_average'
  • 'from_lab'
  • 'desaturation'
  • 'average'
  • 'max'
  • 'pca'
weighted_averageThe method used for grayscale conversion: - "weighted_average": Uses a weighted sum of RGB channels (0.299R + 0.587G + 0.114B). Works only with 3-channel images. Provides realistic results based on human perception. - "from_lab": Extracts the L channel from the LAB color space. Works only with 3-channel images. Gives perceptually uniform results. - "desaturation": Averages the maximum and minimum values across channels. Works with any number of channels. Fast but may not preserve perceived brightness well. - "average": Simple average of all channels. Works with any number of channels. Fast but may not give realistic results. - "max": Takes the maximum value across all channels. Works with any number of channels. Tends to produce brighter results. - "pca": Applies Principal Component Analysis to reduce channels. Works with any number of channels. Can preserve more information but is computationally intensive.
pfloat0.5Probability of applying the transform. Default: 0.5.

Returns

  • np.ndarray: Grayscale image with the specified number of channels.

Notes

- The transform first converts the input image to single-channel grayscale, then replicates this channel if num_output_channels > 1.
- "weighted_average" and "from_lab" are typically used in image processing and computer vision applications where accurate representation of human perception is important.
- "desaturation" and "average" are often used in simple image manipulation tools or when computational speed is a priority.
- "max" method can be useful in scenarios where preserving bright features is important, such as in some medical imaging applications.
- "pca" might be used in advanced image analysis tasks or when dealing with hyperspectral images.
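Four of the conversion methods reduce to one-line NumPy expressions, which makes their differences easy to compare. The sketch below assumes uint8 RGB input and omits "from_lab" and "pca"; the function name `to_gray` and the rounding choice are assumptions, not the library's implementation.

```python
import numpy as np


def to_gray(image, method="weighted_average", num_output_channels=3):
    """Grayscale conversion for uint8 images, replicated to N channels."""
    img = image.astype(np.float32)
    if method == "weighted_average":
        gray = img @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    elif method == "desaturation":
        gray = (img.max(axis=-1) + img.min(axis=-1)) / 2
    elif method == "average":
        gray = img.mean(axis=-1)
    elif method == "max":
        gray = img.max(axis=-1)
    else:
        raise ValueError(f"unsupported method in this sketch: {method}")
    gray = np.clip(np.rint(gray), 0, 255).astype(np.uint8)
    # Replicate the single grayscale channel along a new last axis.
    return np.repeat(gray[..., None], num_output_channels, axis=-1)


pixel = np.array([[[200, 100, 50]]], dtype=np.uint8)
values = {m: int(to_gray(pixel, m)[0, 0, 0])
          for m in ("weighted_average", "desaturation", "average", "max")}
# values: weighted_average 124, desaturation 125, average 117, max 200
```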

ToRGB class

ToRGB(
    num_output_channels: int = 3,
    p: float = 1.0
)

Convert an input image from grayscale to RGB format.

Parameters

NameTypeDefaultDescription
num_output_channelsint3The number of channels in the output image. Default: 3.
pfloat1.0Probability of applying the transform. Default: 1.0.

Notes

- For single-channel (grayscale) images, the channel is replicated to create an RGB image.
- If the input is already a 3-channel RGB image, it is returned unchanged.
- This transform does not change the data type of the image (e.g., uint8 remains uint8).

ToSepia class

ToSepia(
    p: float = 0.5
)

Apply a sepia filter to the input image. This transform converts a color image to a sepia tone, giving it a warm, brownish tint that is reminiscent of old photographs. The sepia effect is achieved by applying a specific color transformation matrix to the RGB channels of the input image. For grayscale images, the transform is a no-op and returns the original image.

Parameters

NameTypeDefaultDescription
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- The sepia effect only works with RGB images (3 channels). For grayscale images, the original image is returned unchanged since the sepia transformation would have no visible effect when R=G=B.
- The sepia effect is created using a fixed color transformation matrix:
  [[0.393, 0.769, 0.189],
   [0.349, 0.686, 0.168],
   [0.272, 0.534, 0.131]]
- The output image will have the same data type as the input image.
- For float32 images, ensure the input values are in the range [0, 1].
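Applying the matrix above amounts to one matrix product per pixel, where each row of the matrix produces one output channel. The sketch assumes uint8 RGB input and clips the result to the valid range; the names `SEPIA_MATRIX` and `to_sepia` are chosen for this example.

```python
import numpy as np

SEPIA_MATRIX = np.array(
    [[0.393, 0.769, 0.189],
     [0.349, 0.686, 0.168],
     [0.272, 0.534, 0.131]],
    dtype=np.float32,
)


def to_sepia(image):
    """Apply the sepia matrix to a uint8 RGB image."""
    # Row i of the matrix gives output channel i, hence the transpose.
    out = image.astype(np.float32) @ SEPIA_MATRIX.T
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


gray_pixel = np.full((1, 1, 3), 100, dtype=np.uint8)
sepia_pixel = to_sepia(gray_pixel)[0, 0]
# sepia_pixel is [135, 120, 94]: red is boosted most, blue is reduced
```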

UniformParams class

UniformParams(
    noise_type: Literal = uniform,
    ranges: Annotated
)

Parameters

NameTypeDefaultDescription
noise_typeLiteraluniform-
rangesAnnotated--

UnsharpMask class

UnsharpMask(
    blur_limit: tuple[int, int] | int = (3, 7),
    sigma_limit: tuple[float, float] | float = 0.0,
    alpha: tuple[float, float] | float = (0.2, 0.5),
    threshold: int = 10,
    p: float = 0.5
)

Sharpen the input image using unsharp masking and overlay the result on the original image. Unsharp masking is a technique that enhances edge contrast in an image, creating the illusion of increased sharpness. This transform applies Gaussian blur to create a blurred version of the image, then uses this to create a mask which is combined with the original image to enhance edges and fine details.

Parameters

NameTypeDefaultDescription
blur_limit
One of:
  • tuple[int, int]
  • int
(3, 7)Maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it will be computed from sigma as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`. If a single value is given, the kernel size will be sampled from the range (0, blur_limit). Default: (3, 7).
sigma_limit
One of:
  • tuple[float, float]
  • float
0.0Gaussian kernel standard deviation. Must be greater than or equal to 0. If a single value is given, sigma will be sampled from the range (0, sigma_limit). If set to 0, sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
alpha
One of:
  • tuple[float, float]
  • float
(0.2, 0.5)Range to choose the visibility of the sharpened image. At 0, only the original image is visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5).
thresholdint10Value to limit sharpening only to areas with a high pixel difference between the original image and its smoothed version. Higher threshold means less sharpening on flat areas. Must be in range [0, 255]. Default: 10.
pfloat0.5Probability of applying the transform. Default: 0.5.

Notes

- The algorithm creates a mask M = (I - G) * alpha, where I is the original image and G is the Gaussian blurred version.
- The final image is computed as: output = I + M if |I - G| > threshold, else I.
- Higher alpha values increase the strength of the sharpening effect.
- Higher threshold values limit the sharpening effect to areas with more significant edges or details.
- The blur_limit and sigma_limit parameters control the Gaussian blur used to create the mask.
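The mask formula above translates directly into NumPy. The sketch below works on a single-channel uint8 image and uses a small hand-written separable Gaussian blur with edge padding, which is an assumption made to keep the example dependency-free; the helper names are hypothetical.

```python
import numpy as np


def gaussian_blur(image, ksize, sigma):
    """Separable Gaussian blur for a 2D float image with edge padding."""
    x = np.arange(ksize) - ksize // 2
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(image, ksize // 2, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def unsharp_mask(image, ksize=5, sigma=1.0, alpha=0.5, threshold=10):
    """output = I + (I - G) * alpha where |I - G| > threshold, else I."""
    img = image.astype(np.float32)
    residual = img - gaussian_blur(img, ksize, sigma)   # I - G
    sharpened = img + residual * alpha                  # I + M
    out = np.where(np.abs(residual) > threshold, sharpened, img)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# A vertical step edge: left half 50, right half 200.
edge = np.full((8, 8), 50, dtype=np.uint8)
edge[:, 4:] = 200
sharpened_edge = unsharp_mask(edge)
```

On the step edge, pixels next to the transition overshoot on the bright side and undershoot on the dark side, while pixels far from the edge fall below the threshold and are left untouched.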