albumentations.augmentations.pixel.transforms


AdditiveNoise class

AdditiveNoise(
    noise_type: 'uniform' | 'gaussian' | 'laplace' | 'beta' = 'uniform',
    spatial_mode: 'constant' | 'per_pixel' | 'shared' = 'constant',
    noise_params: dict[str, Any] | None = None,
    approximation: float = 1.0,
    p: float = 0.5
)

Add random noise to image channels, sampled from a uniform, Gaussian, Laplace, or beta distribution. The noise can be generated in one of three spatial modes (constant, per_pixel, or shared), and the accepted noise_params depend on the chosen noise_type.

Parameters

noise_type ('uniform' | 'gaussian' | 'laplace' | 'beta', default 'uniform'): Type of noise distribution to use.
  • "uniform": Uniform distribution, good for simple random perturbations.
  • "gaussian": Normal distribution, models natural random processes.
  • "laplace": Similar to Gaussian but with heavier tails, good for outliers.
  • "beta": Flexible bounded distribution, can be symmetric or skewed.

spatial_mode ('constant' | 'per_pixel' | 'shared', default 'constant'): How to generate and apply the noise.
  • "constant": One noise value per channel (fastest).
  • "per_pixel": Independent noise value for each pixel and channel (slowest).
  • "shared": One noise map shared across all channels (medium speed).

noise_params (dict[str, Any] | None): Parameters for the chosen noise distribution. The keys must match noise_type.
  uniform:
    • ranges: list[tuple[float, float]]. List of (min, max) ranges, one per channel. Each range must be in [-1, 1]. If only one range is provided, it is used for all channels. For example, [(-0.2, 0.2)] applies the same range to all channels, and [(-0.2, 0.2), (-0.1, 0.1), (-0.1, 0.1)] applies different ranges to R, G, and B.
  gaussian:
    • mean_range: tuple[float, float], default (0.0, 0.0). Range for sampling the mean, in [-1, 1].
    • std_range: tuple[float, float], default (0.1, 0.1). Range for sampling the standard deviation, in [0, 1].
  laplace:
    • mean_range: tuple[float, float], default (0.0, 0.0). Range for sampling the location parameter, in [-1, 1].
    • scale_range: tuple[float, float], default (0.1, 0.1). Range for sampling the scale parameter, in [0, 1].
  beta:
    • alpha_range: tuple[float, float], default (0.5, 1.5). Range for sampling the first shape parameter, in (0, inf). Values < 1 give a U-shaped distribution, values > 1 a bell-shaped one.
    • beta_range: tuple[float, float], default (0.5, 1.5). Range for sampling the second shape parameter, in (0, inf). Values < 1 give a U-shaped distribution, values > 1 a bell-shaped one.
    • scale_range: tuple[float, float], default (0.1, 0.3). Range for sampling the output scale, in [0, 1]. Use a smaller scale for subtler noise.

approximation (float, default 1.0): Value in [0, 1] that controls the tradeoff between noise generation speed and quality. Only affects the 'per_pixel' and 'shared' spatial modes.
  • 1.0: Generate full-resolution noise (slowest, highest quality).
  • 0.5: Generate noise at half resolution and upsample.
  • 0.25: Generate noise at quarter resolution and upsample.

p (float, default 0.5): Probability of applying the transform.

Examples

>>> import albumentations as A
>>> # Constant RGB shift with different ranges per channel:
>>> transform = A.AdditiveNoise(
...     noise_type="uniform",
...     spatial_mode="constant",
...     noise_params={"ranges": [(-0.2, 0.2), (-0.1, 0.1), (-0.1, 0.1)]}
... )
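
The three spatial modes differ mainly in the shape of the sampled noise array. The NumPy sketch below illustrates that idea for uniform noise on a float image in [0, 1]; the helper name `sample_uniform_noise` and its arguments are hypothetical, and this is not the library's implementation.

```python
import numpy as np

def sample_uniform_noise(shape, spatial_mode, low=-0.2, high=0.2, seed=0):
    """Return a noise array that broadcasts against an (H, W, C) image."""
    rng = np.random.default_rng(seed)
    height, width, channels = shape
    if spatial_mode == "constant":
        # One value per channel, broadcast over all pixels.
        return rng.uniform(low, high, size=(1, 1, channels))
    if spatial_mode == "per_pixel":
        # Independent value for every pixel and channel.
        return rng.uniform(low, high, size=(height, width, channels))
    if spatial_mode == "shared":
        # One (H, W) map reused by every channel.
        return rng.uniform(low, high, size=(height, width, 1))
    raise ValueError(f"unknown spatial_mode: {spatial_mode}")

image = np.full((4, 5, 3), 0.5, dtype=np.float32)
noisy = {
    mode: np.clip(image + sample_uniform_noise(image.shape, mode), 0.0, 1.0)
    for mode in ("constant", "per_pixel", "shared")
}
```

With "constant", every pixel in a channel receives the same offset; with "shared", all channels receive the same per-pixel offset.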

AtmosphericFog class

AtmosphericFog(
    density_range: tuple[float, float] = (1.0, 3.0),
    fog_color: tuple[int, ...] = (200, 200, 200),
    depth_mode: 'linear' | 'diagonal' | 'radial' = 'linear',
    p: float = 0.5
)

Add depth-dependent fog via the atmospheric scattering equation and a synthetic depth map. Use for outdoor and driving robustness to haze. Unlike RandomFog (which overlays circular fog patches), this transform uses a physically-based scattering model: farther pixels (by synthetic depth) get more fog, producing realistic distance-dependent haze. Depth is derived from image position (linear, diagonal, or radial), not from a real depth map. Formula: `result = image * exp(-density * depth) + fog_color * (1 - exp(-density * depth))`
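
The scattering formula above can be sketched in NumPy for the "linear" depth mode. The helper `apply_linear_fog` is hypothetical, and the depth scaling (1.0 at the top row, 0.0 at the bottom row) is an assumption made for illustration; the library's exact depth map may differ.

```python
import numpy as np

def apply_linear_fog(image, density, fog_color):
    """Blend a uint8 (H, W, C) image with fog using a top-to-bottom depth ramp."""
    height = image.shape[0]
    # Assumed synthetic depth: 1.0 (far) at the top row, 0.0 (near) at the bottom.
    depth = np.linspace(1.0, 0.0, height, dtype=np.float32).reshape(height, 1, 1)
    transmission = np.exp(-density * depth)
    fog = np.asarray(fog_color, dtype=np.float32)
    # result = image * exp(-density * depth) + fog_color * (1 - exp(-density * depth))
    result = image.astype(np.float32) * transmission + fog * (1.0 - transmission)
    return np.clip(np.rint(result), 0, 255).astype(np.uint8)

image = np.zeros((5, 4, 3), dtype=np.uint8)
foggy = apply_linear_fog(image, density=2.0, fog_color=(200, 200, 200))
```

On this black test image the bottom row stays black (depth 0) while rows closer to the top move toward the fog color.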

Parameters

density_range (tuple[float, float], default (1.0, 3.0)): Range for fog density. Higher values give thicker fog.

fog_color (tuple[int, ...], default (200, 200, 200)): Fog color per channel, e.g. (R, G, B) for 3 channels. Length must match the number of image channels.

depth_mode ('linear' | 'diagonal' | 'radial', default 'linear'): How synthetic depth is generated.
  • "linear": top of image = far, bottom = near (sky vs ground).
  • "diagonal": top-left = far.
  • "radial": center = near, edges = far.

p (float, default 0.5): Probability of applying the transform.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.AtmosphericFog(density_range=(1.0, 2.5), depth_mode="linear", p=1.0)
>>> result = transform(image=image)["image"]
>>> # Radial fog (center clear, edges foggy)
>>> transform_radial = A.AtmosphericFog(density_range=(1.5, 3.0), depth_mode="radial", p=1.0)
>>> result_radial = transform_radial(image=image)["image"]

Notes

  • Depth is synthetic (from pixel position), not from scene geometry.
  • For typical outdoor frames, "linear" matches sky far / ground near.

AutoContrast class

AutoContrast(
    cutoff: float = 0,
    ignore: int | None = None,
    method: 'cdf' | 'pil' = 'cdf',
    p: float = 0.5
)

Stretch intensity to the full range (autocontrast). Use it to normalize brightness and contrast across images. The transform provides two methods for contrast enhancement:
  1. CDF method (default): Uses the cumulative distribution function for more gradual adjustment.
  2. PIL method: Uses linear scaling like PIL.ImageOps.autocontrast.
The transform can optionally exclude extreme values from both ends of the intensity range (cutoff) and preserve a specific intensity value (ignore), e.g. a value reserved for an alpha channel.

Parameters

cutoff (float, default 0): Percentage of pixels to exclude from both ends of the histogram. Range: [0, 100].
  • 0 means use the minimum and maximum intensity values found (full intensity range).
  • 20 means exclude the darkest and brightest 20% of pixels.

ignore (int | None, default None): Intensity value to preserve (e.g., alpha channel). Range: [0, 255].
  • If specified, this intensity value will not be modified.
  • Useful for images with an alpha channel or special marker values.

method ('cdf' | 'pil', default 'cdf'): Algorithm to use for contrast enhancement.
  • "cdf": Uses the cumulative distribution for smoother adjustment.
  • "pil": Uses linear scaling like PIL.ImageOps.autocontrast.

p (float, default 0.5): Probability of applying the transform.

Examples

>>> import albumentations as A
>>> # Basic usage
>>> transform = A.AutoContrast(p=1.0)
>>>
>>> # Exclude extreme values
>>> transform = A.AutoContrast(cutoff=20, p=1.0)
>>>
>>> # Preserve alpha channel
>>> transform = A.AutoContrast(ignore=255, p=1.0)
>>>
>>> # Use PIL-like contrast enhancement
>>> transform = A.AutoContrast(method="pil", p=1.0)

Notes

  • The transform processes each color channel independently.
  • For grayscale images, only one channel is processed.
  • The output maintains the same dtype as the input.
  • Empty or single-color channels remain unchanged.
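
As a rough illustration of the linear ("pil"-style) method, the sketch below stretches one uint8 channel so that its low and high cut points map to 0 and 255. The helper `stretch_channel` is hypothetical, and using percentiles for the cutoff is a simplifying assumption; PIL.ImageOps.autocontrast trims histogram counts instead.

```python
import numpy as np

def stretch_channel(channel, cutoff=0.0):
    """Linearly rescale a uint8 channel so [lo, hi] maps to [0, 255]."""
    lo, hi = np.percentile(channel, [cutoff, 100.0 - cutoff])
    if hi <= lo:
        # Single-color channel: nothing to stretch, return it unchanged.
        return channel.copy()
    scaled = (channel.astype(np.float32) - lo) * (255.0 / (hi - lo))
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)

channel = np.array([[50, 100], [150, 200]], dtype=np.uint8)
stretched = stretch_channel(channel)
flat = stretch_channel(np.full((2, 2), 7, dtype=np.uint8))
```

For a multi-channel image, the same function would be applied to each channel separately, in line with the notes above.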

CLAHE class

CLAHE(
    clip_limit: tuple[float, float] | float = 4.0,
    tile_grid_size: tuple[int, int] = (8, 8),
    p: float = 0.5
)

Contrast Limited Adaptive Histogram Equalization: local contrast with clip_limit and tile_grid_size. Good for non-uniform lighting; preserves detail. CLAHE is an advanced method of improving the contrast in an image. Unlike regular histogram equalization, which operates on the entire image, CLAHE operates on small regions (tiles) in the image. This results in a more balanced equalization, preventing over-amplification of contrast in areas with initially low contrast.

Parameters

clip_limit (tuple[float, float] | float, default 4.0): Controls the contrast enhancement limit. Higher values allow for more contrast enhancement, but may also increase noise.
  • If a single float is provided, the range will be (1, clip_limit), so the default 4.0 gives (1, 4).
  • If a tuple of two floats is provided, it defines the range for random selection.

tile_grid_size (tuple[int, int], default (8, 8)): Number of tiles in the row and column directions, in the format (rows, columns). A larger grid (more, smaller tiles) gives more localized enhancement, while a smaller grid (fewer, larger tiles) gives results closer to global histogram equalization.

p (float, default 0.5): Probability of applying the transform.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.CLAHE(clip_limit=(1, 4), tile_grid_size=(8, 8), p=1.0)
>>> result = transform(image=image)
>>> clahe_image = result["image"]

Notes

  • Supports only RGB or grayscale images.
  • For color images, CLAHE is applied to the L channel in the LAB color space.
  • The clip limit determines the maximum slope of the cumulative histogram. A lower clip limit will result in more contrast limiting.
  • Tile grid size affects the adaptiveness of the method. More tiles increase local adaptiveness but can lead to an unnatural look if set too high.
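
The "contrast limited" part of CLAHE comes from clipping each tile's histogram before it is turned into a mapping. The sketch below shows a single-pass clip-and-redistribute step on a toy histogram; `clip_histogram` is a hypothetical helper, and treating the clip limit as an absolute bin count is an assumption (OpenCV scales its clipLimit by the tile size).

```python
import numpy as np

def clip_histogram(hist, clip_limit):
    """Clip bin counts at clip_limit and spread the excess evenly over all bins."""
    hist = hist.astype(np.float64)
    excess = np.maximum(hist - clip_limit, 0.0).sum()
    clipped = np.minimum(hist, clip_limit)
    # The total count is preserved, so the resulting CDF still ends at the same value.
    return clipped + excess / hist.size

hist = np.array([0, 2, 20, 2, 0, 0, 0, 0])
limited = clip_histogram(hist, clip_limit=4)
```

A lower limit flattens the dominant bin more strongly, which reduces the slope of the cumulative histogram and therefore the amount of local contrast amplification.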

References

  • Tutorial: https://docs.opencv.org/master/d5/daf/tutorial_py_histogram_equalization.html
  • "Contrast Limited Adaptive Histogram Equalization.": https://ieeexplore.ieee.org/document/109340

ChannelShuffle class

ChannelShuffle(
    p: float = 0.5
)

Randomly permute channel order (e.g. RGB→BGR) each call. Makes model invariant to channel order; useful for multi-channel or color-agnostic training. Unlike ChannelSwap, the permutation is random every time (uniform over all orders). All pixel data is preserved; only channel indices change.

Parameters

p (float, default 0.5): Probability of applying the transform.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
>>> transform = A.ChannelShuffle(p=1.0)
>>> result = transform(image=image)["image"]

Notes

  • Permutation is chosen uniformly over all channel orderings.
  • Same image can get different orderings on different calls.
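
A random channel permutation can be sketched directly in NumPy. This only illustrates the idea of indexing the channel axis with a random ordering; it is not the library's code, and the fixed seed is there only to make the sketch reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2x2 image whose channels hold the constants 10, 20, 30.
image = np.stack([np.full((2, 2), v, dtype=np.uint8) for v in (10, 20, 30)], axis=-1)

# Draw a uniformly random ordering of the channel indices and apply it.
order = rng.permutation(image.shape[-1])
shuffled = image[..., order]
```

Every pixel value survives the operation; only its channel index changes.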

ChannelSwap class

ChannelSwap(
    channel_order: tuple[int, ...] = (2, 1, 0),
    p: float = 0.5
)

Fixed channel reordering (e.g. RGB→BGR). Unlike ChannelShuffle, the permutation is deterministic: the user-specified channel order is applied every time, with no randomness. Use it for BGR↔RGB conversion, fixed channel layouts, or training models to be robust to one specific alternative channel ordering.

Parameters

channel_order (tuple[int, ...], default (2, 1, 0)): Permutation of channel indices. Length must match the number of image channels. For 3-channel images, (2, 1, 0) swaps R and B (RGB→BGR).

p (float, default 0.5): Probability of applying the transform.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
>>> # Swap R and B (RGB → BGR)
>>> transform = A.ChannelSwap(channel_order=(2, 1, 0), p=1.0)
>>> result = transform(image=image)["image"]
>>> np.testing.assert_array_equal(result[:, :, 0], image[:, :, 2])

Notes

  • channel_order must be a permutation of 0..C-1 for C channels.
  • (2, 1, 0) gives RGB→BGR; (0, 2, 1) swaps G and B.
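
The permutation requirement in the notes can be checked before indexing the channel axis. The helper `swap_channels` below is a hypothetical NumPy sketch of that check and of the reordering itself, not the library's implementation.

```python
import numpy as np

def swap_channels(image, channel_order):
    """Reorder the last axis of an (H, W, C) image by a fixed permutation."""
    channels = image.shape[-1]
    if sorted(channel_order) != list(range(channels)):
        raise ValueError(f"channel_order must be a permutation of 0..{channels - 1}")
    return image[..., list(channel_order)]

rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[..., 0], rgb[..., 1], rgb[..., 2] = 10, 20, 30  # R, G, B

bgr = swap_channels(rgb, (2, 1, 0))  # swaps R and B
rbg = swap_channels(rgb, (0, 2, 1))  # swaps G and B
```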

ChromaticAberration class

ChromaticAberration(
    primary_distortion_limit: tuple[float, float] | float = (-0.02, 0.02),
    secondary_distortion_limit: tuple[float, float] | float = (-0.05, 0.05),
    mode: 'green_purple' | 'red_blue' | 'random' = 'green_purple',
    interpolation: 0 | 6 | 1 | 2 | 3 | 4 | 5 = 1,
    p: float = 0.5
)

Add lateral chromatic aberration: shift red and blue relative to green to simulate lens color fringing. primary_distortion_limit and secondary_distortion_limit control the strength. Chromatic aberration is an optical effect that occurs when a lens fails to focus all colors to the same point. This transform simulates the effect by applying different radial distortions to the red and blue channels of the image, while leaving the green channel unchanged.

Parameters

primary_distortion_limit (tuple[float, float] | float, default (-0.02, 0.02)): Range of the primary radial distortion coefficient. If a single float is provided, the range will be (-primary_distortion_limit, primary_distortion_limit). This parameter controls the distortion in the center of the image:
  • Positive values result in pincushion distortion (edges bend inward).
  • Negative values result in barrel distortion (edges bend outward).

secondary_distortion_limit (tuple[float, float] | float, default (-0.05, 0.05)): Range of the secondary radial distortion coefficient. If a single float is provided, the range will be (-secondary_distortion_limit, secondary_distortion_limit). This parameter controls the distortion in the corners of the image:
  • Positive values enhance pincushion distortion.
  • Negative values enhance barrel distortion.

mode ('green_purple' | 'red_blue' | 'random', default 'green_purple'): Type of color fringing to apply.
  • 'green_purple': Distorts red and blue channels in opposite directions, creating green-purple fringing.
  • 'red_blue': Distorts red and blue channels in the same direction, creating red-blue fringing.
  • 'random': Randomly chooses between 'green_purple' and 'red_blue' for each application.

interpolation (0 | 6 | 1 | 2 | 3 | 4 | 5, default 1): OpenCV flag specifying the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. The default value 1 is cv2.INTER_LINEAR.

p (float, default 0.5): Probability of applying the transform. Should be in the range [0, 1].

Examples

>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ChromaticAberration(
...     primary_distortion_limit=0.05,
...     secondary_distortion_limit=0.1,
...     mode='green_purple',
...     interpolation=cv2.INTER_LINEAR,
...     p=1.0
... )
>>> transformed = transform(image=image)
>>> aberrated_image = transformed['image']

Notes

  • This transform only affects RGB images. Grayscale images will raise an error.
  • The strength of the effect depends on both primary and secondary distortion limits.
  • Higher absolute values for distortion limits will result in more pronounced chromatic aberration.
  • The 'green_purple' mode tends to produce more noticeable effects than 'red_blue'.

References

  • Chromatic Aberration: https://en.wikipedia.org/wiki/Chromatic_aberration

ColorJitter class

ColorJitter(
    brightness: tuple[float, float] | float = (0.8, 1.2),
    contrast: tuple[float, float] | float = (0.8, 1.2),
    saturation: tuple[float, float] | float = (0.8, 1.2),
    hue: tuple[float, float] | float = (-0.5, 0.5),
    p: float = 0.5
)

Randomly change brightness, contrast, saturation, and hue, applied in random order, with a separate range for each effect. A strong color augmentation for classification and detection. This transform is similar to torchvision's ColorJitter, but its output can differ slightly because it uses OpenCV instead of Pillow:
  1. OpenCV and Pillow use different formulas to convert images to HSV format.
  2. This implementation uses value saturation instead of uint8 overflow as in Pillow.

Parameters

brightness (tuple[float, float] | float, default (0.8, 1.2)): How much to jitter brightness. Values should be non-negative.
  • If float: the brightness factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].
  • If tuple: the brightness factor is sampled from the specified range.

contrast (tuple[float, float] | float, default (0.8, 1.2)): How much to jitter contrast. Values should be non-negative.
  • If float: the contrast factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].
  • If tuple: the contrast factor is sampled from the specified range.

saturation (tuple[float, float] | float, default (0.8, 1.2)): How much to jitter saturation. Values should be non-negative.
  • If float: the saturation factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].
  • If tuple: the saturation factor is sampled from the specified range.

hue (tuple[float, float] | float, default (-0.5, 0.5)): How much to jitter hue.
  • If float: the hue factor is chosen uniformly from [-hue, hue]. Should have 0 <= hue <= 0.5.
  • If tuple: the hue factor is sampled from the specified range. Values should be in the range [-0.5, 0.5].

p (float, default 0.5): Probability of applying the transform. Should be in the range [0, 1].

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1, p=1.0)
>>> result = transform(image=image)
>>> jittered_image = result['image']

Notes

  • The order of application for these color transformations is random for each image.
  • The ranges for brightness, contrast, and saturation are applied as multiplicative factors.
  • The range for hue is applied as an additive factor.
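
The way a single float expands into a sampling range, as described in the parameter list, can be written out in a few lines of plain Python. The helper `jitter_range` is hypothetical and exists only to make the two rules concrete: multiplicative factors are centered on 1 and floored at 0, while the additive hue range is centered on 0.

```python
def jitter_range(value, center=1.0, clip_at_zero=True):
    """Expand a single float into the (low, high) range it stands for."""
    low, high = center - value, center + value
    if clip_at_zero:
        # Multiplicative factors must stay non-negative.
        low = max(0.0, low)
    return (low, high)

brightness_range = jitter_range(0.2)  # roughly (0.8, 1.2)
wide_range = jitter_range(1.5)        # lower bound floored at 0
hue_range = jitter_range(0.1, center=0.0, clip_at_zero=False)  # (-0.1, 0.1)
```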

References

  • ColorJitter: https://pytorch.org/vision/stable/generated/torchvision.transforms.ColorJitter.html
  • Color Conversions: https://docs.opencv.org/3.4/de/d25/imgproc_color_conversions.html

Dithering class

Dithering(
    method: 'random' | 'ordered' | 'error_diffusion' = 'error_diffusion',
    n_colors: int = 2,
    color_mode: 'grayscale' | 'per_channel' = 'grayscale',
    error_diffusion_algorithm: 'floyd_steinberg' | 'jarvis' | 'stucki' | 'atkinson' | 'burkes' | 'sierra' | 'sierra_2row' | 'sierra_lite' = 'floyd_steinberg',
    bayer_matrix_size: 2 | 4 | 8 | 16 = 4,
    serpentine: bool = False,
    noise_range: tuple[float, float] = (-0.5, 0.5),
    p: float = 0.5
)

Reduce colors via dithering: ordered (Bayer), error diffusion, or random. n_colors sets the number of levels and method selects the algorithm. Good for a retro look or limited-color output.

Dithering is like creating a newspaper photo: it uses patterns of dots to create the illusion of more colors than are actually present. When you have a limited color palette (like only black and white), dithering arranges these limited colors in patterns that trick your eye into seeing intermediate shades. Think of it like pointillist paintings: up close you see individual dots, but from a distance they blend together to create smooth gradients and subtle color variations.

This transform works with any number of channels. In "per_channel" mode it processes each channel independently, whether you have a standard RGB image (3 channels), RGBA with transparency (4 channels), multispectral satellite imagery (dozens of channels), or a single-channel grayscale image.

Parameters

method ('random' | 'ordered' | 'error_diffusion', default 'error_diffusion'): Which dithering algorithm to use. Each has different characteristics:
  • "random": Adds random noise before quantization. Creates a grainy, film-like texture. Good for artistic effects or simulating old photographs.
  • "ordered": Uses a repeating pattern (Bayer matrix) to decide which pixels to darken. Creates distinctive crosshatch patterns. Fast and predictable. Common in old computer graphics and newspaper printing.
  • "error_diffusion": Most sophisticated method. When a pixel is made darker or lighter than it should be, the "error" is spread to neighboring pixels. Creates the most natural-looking results.

n_colors (int, default 2): How many different color levels to keep per channel. Must be between 2 and 256. Lower values create more dramatic effects.
  • 2 = only black and white (or min/max values for each channel).
  • 4 = 4 levels of gray (or 4 levels per color channel).
  • 16 = 16 shades, creating a retro computer graphics look.
  • 256 = full range, no reduction (but patterns are still visible from the dithering process).

color_mode ('grayscale' | 'per_channel', default 'grayscale'): How to handle color channels:
  • "per_channel": Each color channel (R, G, B, etc.) is dithered separately. Maintains color relationships, but each channel gets its own pattern. Works with any number of channels.
  • "grayscale": First converts the image to grayscale (using standard luminance weights), then applies dithering, then expands back to the original number of channels. All color information is lost, but the dithering pattern is consistent across channels.

error_diffusion_algorithm ('floyd_steinberg' | 'jarvis' | 'stucki' | 'atkinson' | 'burkes' | 'sierra' | 'sierra_2row' | 'sierra_lite', default 'floyd_steinberg'): Used only with the "error_diffusion" method. Which specific algorithm to use:
  • "floyd_steinberg": The classic, invented in 1976. Spreads error to 4 neighbors. Good balance of quality and speed. Industry standard.
  • "jarvis": Jarvis-Judice-Ninke algorithm. Spreads error to 12 neighbors. Higher quality but 3x slower than Floyd-Steinberg.
  • "stucki": Similar to Jarvis but with different weights. Also 12 neighbors.
  • "atkinson": Created by Bill Atkinson for the original Macintosh. Only spreads 75% of the error, creating lighter images with more contrast.
  • "burkes": Spreads to 7 neighbors. Faster than Jarvis, better than Floyd-Steinberg.
  • "sierra": Spreads to 10 neighbors. Good quality, moderate speed.
  • "sierra_2row": Simplified Sierra using only 2 rows. Faster.
  • "sierra_lite": Minimal Sierra using only 3 neighbors. Very fast.

bayer_matrix_size (2 | 4 | 8 | 16, default 4): Used only with the "ordered" method. The size of the repeating pattern.
  • 2x2: Very visible checkerboard pattern.
  • 4x4: Standard, good balance.
  • 8x8: Finer pattern, less visible.
  • 16x16: Very fine pattern, almost noise-like.

serpentine (bool, default False): Used only with the "error_diffusion" method. Whether to process rows in alternating directions (left-to-right, then right-to-left). This can reduce visible "worm" artifacts that sometimes appear as diagonal lines. Slightly slower.

noise_range (tuple[float, float], default (-0.5, 0.5)): Used only with the "random" method. How much random noise to add before quantization. A larger range gives more variation in the dithering pattern. Range: (-1.0, 1.0).

p (float, default 0.5): Probability of applying this transform.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> # Prepare sample data
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
>>> # Black and white dithering with Floyd-Steinberg
>>> transform = A.Compose([
...     A.Dithering(
...         method="error_diffusion",
...         n_colors=2,
...         error_diffusion_algorithm="floyd_steinberg",
...         color_mode="grayscale",
...         p=1.0
...     )
... ])
>>> transformed = transform(image=image)
>>> dithered_image = transformed['image']  # Black and white dithered image
>>>
>>> # Ordered dithering with 16 colors per channel
>>> transform = A.Compose([
...     A.Dithering(
...         method="ordered",
...         n_colors=16,
...         bayer_matrix_size=8,
...         color_mode="per_channel",
...         p=1.0
...     )
... ])
>>> transformed = transform(image=image)
>>> dithered_image = transformed['image']  # Reduced color depth with Bayer pattern
>>>
>>> # Random dithering
>>> transform = A.Compose([
...     A.Dithering(
...         method="random",
...         n_colors=4,
...         noise_range=(-0.3, 0.3),
...         p=1.0
...     )
... ])
>>> transformed = transform(image=image)
>>> dithered_image = transformed['image']  # Noisy dithered appearance
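
To make the error-diffusion idea concrete, here is a minimal, unoptimized Floyd-Steinberg sketch in NumPy for a single-channel float image in [0, 1]. The function name `floyd_steinberg` and the even spacing of the quantization levels are assumptions made for illustration; the library's implementation may differ in details such as value range and serpentine scanning.

```python
import numpy as np

def floyd_steinberg(gray, n_colors=2):
    """Dither a float image in [0, 1] down to n_colors evenly spaced levels."""
    work = gray.astype(np.float64).copy()
    height, width = work.shape
    levels = n_colors - 1
    for y in range(height):
        for x in range(width):
            old = work[y, x]
            # Quantize to the nearest allowed level.
            new = np.clip(np.rint(old * levels) / levels, 0.0, 1.0)
            work[y, x] = new
            error = old - new
            # Spread the quantization error to the 4 unprocessed neighbors.
            if x + 1 < width:
                work[y, x + 1] += error * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1, x - 1] += error * 3 / 16
                work[y + 1, x] += error * 5 / 16
                if x + 1 < width:
                    work[y + 1, x + 1] += error * 1 / 16
    return work

gradient = np.tile(np.linspace(0.0, 1.0, 16), (16, 1))
dithered = floyd_steinberg(gradient, n_colors=2)
```

The output contains only the values 0 and 1, yet its average brightness stays close to that of the input gradient, which is the "illusion of intermediate shades" described above.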

References

  • Wikipedia: https://en.wikipedia.org/wiki/Dither
  • Floyd-Steinberg dithering: https://en.wikipedia.org/wiki/Floyd%E2%80%93Steinberg_dithering
  • Ordered dithering: https://en.wikipedia.org/wiki/Ordered_dithering
  • Error diffusion dithering: https://en.wikipedia.org/wiki/Error_diffusion

Downscale class

Downscale(
    scale_range: tuple[float, float] = (0.25, 0.25),
    interpolation_pair: dict['downscale' | 'upscale', 0 | 6 | 1 | 2 | 3 | 4 | 5] = {'upscale': 0, 'downscale': 0},
    p: float = 0.5
)

Reduce image quality by downscaling and then upscaling back to the original size. scale_range controls the downscaling factor. The round trip discards detail, so the transform can simulate low-resolution or low-quality images and test how robust a model is to different image resolutions.

Parameters

scale_range (tuple[float, float], default (0.25, 0.25)): Range for the downscaling factor. Should be two float values between 0 and 1, where the first value is less than or equal to the second. The actual downscaling factor is randomly chosen from this range for each image. Lower values result in more aggressive downscaling.

interpolation_pair (dict, default {'upscale': 0, 'downscale': 0}): A dictionary specifying the interpolation methods to use for downscaling and upscaling. Values should be OpenCV interpolation flags (e.g., cv2.INTER_NEAREST, cv2.INTER_LINEAR). The default is {'downscale': cv2.INTER_NEAREST, 'upscale': cv2.INTER_NEAREST}. Should contain two keys:
  • 'downscale': Interpolation method for downscaling.
  • 'upscale': Interpolation method for upscaling.

p (float, default 0.5): Probability of applying the transform. Should be in the range [0, 1].

Examples

>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Downscale(
...     scale_range=(0.5, 0.75),
...     interpolation_pair={'downscale': cv2.INTER_NEAREST, 'upscale': cv2.INTER_LINEAR},
...     p=0.5
... )
>>> transformed = transform(image=image)
>>> downscaled_image = transformed['image']

Notes

  • The actual downscaling factor is randomly chosen for each image from the range specified in scale_range.
  • Using different interpolation methods for downscaling and upscaling can produce various effects. For example, using INTER_NEAREST for both can create a pixelated look, while using INTER_LINEAR or INTER_CUBIC can produce smoother results.
  • This transform can be useful for data augmentation, especially when training models that need to be robust to variations in image quality or resolution.
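
The pixelated look from nearest-neighbor interpolation in both directions can be reproduced with plain NumPy indexing. The helper `downscale_upscale_nearest` is a hypothetical sketch; its index arithmetic is a simple stand-in for cv2.INTER_NEAREST and may pick slightly different source pixels than OpenCV does.

```python
import numpy as np

def downscale_upscale_nearest(image, scale):
    """Shrink an image by `scale`, then blow it back up to its original size."""
    height, width = image.shape[:2]
    small_h = max(1, int(round(height * scale)))
    small_w = max(1, int(round(width * scale)))
    # Downscale: keep one source pixel per low-resolution cell.
    rows = np.arange(small_h) * height // small_h
    cols = np.arange(small_w) * width // small_w
    small = image[rows][:, cols]
    # Upscale: repeat each low-resolution pixel back to the original size.
    up_rows = np.arange(height) * small_h // height
    up_cols = np.arange(width) * small_w // width
    return small[up_rows][:, up_cols]

image = np.arange(64, dtype=np.uint8).reshape(8, 8)
pixelated = downscale_upscale_nearest(image, scale=0.25)
```

With scale=0.25 the 8x8 input collapses to 2x2, so the result consists of four constant 4x4 blocks.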

Emboss class

Emboss(
    alpha: tuple[float, float] = (0.2, 0.5),
    strength: tuple[float, float] = (0.2, 0.7),
    p: float = 0.5
)

Apply an emboss effect (directional highlight and shadow) that gives a pseudo-3D look; useful for texture or style augmentation. strength controls the intensity of the effect and alpha controls how strongly it is blended with the original. The transform works by applying a convolution kernel that emphasizes differences between adjacent pixel values, which highlights edges and creates a 3D-like texture.

Parameters

alpha (tuple[float, float], default (0.2, 0.5)): Range to choose the visibility of the embossed image. At 0, only the original image is visible; at 1.0, only its embossed version is visible. Values should be in the range [0, 1]. Alpha is randomly selected from this range for each image.

strength (tuple[float, float], default (0.2, 0.7)): Range to choose the strength of the embossing effect. Higher values create a more pronounced 3D effect. Values should be non-negative. Strength is randomly selected from this range for each image.

p (float, default 0.5): Probability of applying the transform. Should be in the range [0, 1].

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Emboss(alpha=(0.2, 0.5), strength=(0.2, 0.7), p=0.5)
>>> result = transform(image=image)
>>> embossed_image = result['image']

Notes

  • The emboss effect is created using a 3x3 convolution kernel.
  • The 'alpha' parameter controls the blend between the original image and the embossed version. A higher alpha value will result in a more pronounced emboss effect.
  • The 'strength' parameter affects the intensity of the embossing. Higher strength values will create more contrast in the embossed areas, resulting in a stronger 3D-like effect.
  • This transform can be useful for creating artistic effects or for data augmentation in tasks where edge information is important.
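
One way to combine alpha and strength into a single 3x3 kernel is sketched below. The coefficients follow a commonly used emboss kernel and are an assumption, not something confirmed by this page; `emboss_kernel` and `apply_kernel` are hypothetical helper names.

```python
import numpy as np

def emboss_kernel(alpha, strength):
    """Blend an identity kernel with an emboss kernel of the given strength."""
    identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
    effect = np.array(
        [[-1 - strength, -strength, 0],
         [-strength, 1, strength],
         [0, strength, 1 + strength]],
        dtype=np.float32,
    )
    # alpha = 0 keeps the original image, alpha = 1 uses only the emboss kernel.
    return (1.0 - alpha) * identity + alpha * effect

def apply_kernel(gray, kernel):
    """Correlate a 2D image with a 3x3 kernel, replicating edge pixels."""
    padded = np.pad(gray.astype(np.float32), 1, mode="edge")
    out = np.zeros(gray.shape, dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + gray.shape[0], dx:dx + gray.shape[1]]
    return np.clip(out, 0, 255)

kernel = emboss_kernel(alpha=0.5, strength=0.7)
flat = np.full((6, 6), 100, dtype=np.uint8)
embossed_flat = apply_kernel(flat, kernel)
```

Because the kernel weights sum to 1, flat regions keep their brightness and only intensity changes between neighboring pixels are emphasized.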

References

  • Image Embossing: https://en.wikipedia.org/wiki/Image_embossing
  • Application of Emboss Filtering in Image Processing: https://www.researchgate.net/publication/303412455_Application_of_Emboss_Filtering_in_Image_Processing

Equalize class

Equalize(
    mode: 'cv' | 'pil' = 'cv',
    by_channels: bool = True,
    mask: ndarray | Callable[..., Any] | None = None,
    mask_params: Sequence = (),
    p: float = 0.5
)

Equalize the image histogram to spread intensities over the full range, which improves contrast normalization across datasets. mode selects the OpenCV or Pillow implementation, and an optional mask restricts which pixels are included in the analysis. Histogram equalization is a contrast adjustment method that works on the image's histogram.

Parameters

mode ('cv' | 'pil', default 'cv'): Use the OpenCV or Pillow equalization method.

by_channels (bool, default True): If True, equalize each channel separately; otherwise convert the image to the YCbCr representation and equalize only the `Y` channel.

mask (ndarray | Callable[..., Any] | None, default None): If given, only the pixels selected by the mask are included in the analysis. Can be:
  • A 1-channel or 3-channel numpy array of the same size as the input image.
  • A callable (function) that generates a mask. The function should accept 'image' as its first argument, and can accept additional arguments specified in mask_params.

mask_params (Sequence, default ()): Additional parameters to pass to the mask function. These parameters are taken from the data dict passed to __call__.

p (float, default 0.5): Probability of applying the transform.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
>>> # Using a static mask
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> transform = A.Equalize(mask=mask, p=1.0)
>>> result = transform(image=image)
>>>
>>> # Using a dynamic mask function
>>> def mask_func(image, bboxes):
...     mask = np.ones_like(image[:, :, 0], dtype=np.uint8)
...     for bbox in bboxes:
...         x1, y1, x2, y2 = map(int, bbox)
...         mask[y1:y2, x1:x2] = 0  # Exclude areas inside bounding boxes
...     return mask
>>>
>>> transform = A.Equalize(mask=mask_func, mask_params=['bboxes'], p=1.0)
>>> bboxes = [(10, 10, 50, 50), (60, 60, 90, 90)]  # Example bounding boxes
>>> result = transform(image=image, bboxes=bboxes)

Notes

  • When mode='cv', OpenCV's equalizeHist() function is used.
  • When mode='pil', Pillow's equalize() function is used.
  • The 'by_channels' parameter determines whether equalization is applied to each color channel independently (True) or to the luminance channel only (False).
  • If a mask is provided as a numpy array, it should have the same height and width as the input image.
  • If a mask is provided as a function, it allows for dynamic mask generation based on the input image and additional parameters. This is useful for scenarios where the mask depends on the image content or external data (e.g., bounding boxes, segmentation masks).
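
For intuition, global histogram equalization of one uint8 channel can be written as a lookup table built from the cumulative histogram. The sketch below also shows how a mask can restrict which pixels contribute to the histogram; `equalize_channel` is a hypothetical helper, and its rounding may differ slightly from both OpenCV and Pillow.

```python
import numpy as np

def equalize_channel(channel, mask=None):
    """Equalize a uint8 channel; only masked pixels feed the histogram."""
    pixels = channel[mask.astype(bool)] if mask is not None else channel.ravel()
    hist = np.bincount(pixels.ravel(), minlength=256)
    cdf = hist.cumsum()
    nonzero = cdf[cdf > 0]
    if nonzero.size == 0 or nonzero[0] == cdf[-1]:
        # Empty selection or a single intensity: nothing to equalize.
        return channel.copy()
    cdf_min = nonzero[0]
    # Map the cumulative counts linearly onto [0, 255].
    lut = np.rint((cdf - cdf_min) * 255.0 / (cdf[-1] - cdf_min))
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[channel]

channel = np.array([[10, 10], [20, 30]], dtype=np.uint8)
equalized = equalize_channel(channel)
unchanged = equalize_channel(channel, mask=np.zeros((2, 2), dtype=np.uint8))
```

The lookup table is applied to every pixel, including those outside the mask; the mask only changes which pixels are analyzed.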

References

  • OpenCV equalizeHist: https://docs.opencv.org/3.4/d6/dc7/group__imgproc__hist.html#ga7e54091f0c937d49bf84152a16f76d6e
  • Pillow ImageOps.equalize: https://pillow.readthedocs.io/en/stable/reference/ImageOps.html#PIL.ImageOps.equalize
  • Histogram Equalization: https://en.wikipedia.org/wiki/Histogram_equalization

FancyPCA class

FancyPCA(
    alpha: float = 0.1,
    p: float = 0.5
)

Add color variation via PCA on RGB: perturb the principal components with Gaussian noise of standard deviation alpha. Simulates natural lighting variation (ImageNet-style). Good for object recognition. This augmentation technique applies PCA (Principal Component Analysis) to the image's color channels, then adds multiples of the principal components to the image, with magnitudes proportional to the corresponding eigenvalues times a random variable drawn from a Gaussian with mean 0 and standard deviation 'alpha'.

Parameters

Name | Type | Default | Description
alpha | float | 0.1 | Standard deviation of the Gaussian distribution used to generate random noise for each principal component. Default: 0.1.
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.FancyPCA(alpha=0.1, p=1.0)
>>> result = transform(image=image)
>>> augmented_image = result["image"]

Notes

- This augmentation is particularly effective for RGB images but can work with any number of channels.
- For grayscale images, it applies a simplified version of the augmentation.
- The transform preserves the mean of the image while adjusting the color/intensity variation.
- This implementation follows the PCA color augmentation described by Krizhevsky et al. in the AlexNet paper.
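
The procedure described above can be sketched in NumPy as follows. This is an illustrative sketch for 3-channel uint8 images only; `fancy_pca_sketch` is a hypothetical name, and details such as scaling to [0, 1] before the decomposition are assumptions rather than the library's exact code.

```python
import numpy as np

def fancy_pca_sketch(image, alpha=0.1, rng=None):
    """Shift every pixel along the image's principal color axes."""
    rng = np.random.default_rng() if rng is None else rng
    pixels = image.reshape(-1, 3).astype(np.float64) / 255.0
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)      # 3x3 color covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # columns are the components
    weights = rng.normal(0.0, alpha, size=3)  # one random weight per component
    shift = eigvecs @ (weights * eigvals)     # a single RGB offset for all pixels
    out = np.clip(pixels + shift, 0.0, 1.0) * 255.0
    return out.round().astype(np.uint8).reshape(image.shape)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
augmented = fancy_pca_sketch(image, alpha=0.1, rng=rng)
```

With alpha=0 the random weights are all zero, so the sketch returns the input unchanged.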

References

  • ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information

FilmGrain class

FilmGrain(
    intensity_range: tuple[float, float] = (0.1, 0.3),
    grain_size_range: tuple[int, int] = (1, 3),
    p: float = 0.5
)

Analog film grain: luminance-dependent, spatially correlated noise. Distinct from i.i.d. GaussNoise or ShotNoise. Use for vintage or film-like augmentation. Unlike GaussNoise or ShotNoise, film grain is:
- Luminance-dependent: darker areas show more visible grain
- Spatially correlated: grain is clumped, not i.i.d. per-pixel
- Optionally chromatic: separate grain patterns per channel

Parameters

Name | Type | Default | Description
intensity_range | tuple[float, float] | (0.1, 0.3) | Range for grain intensity. Higher values give more prominent grain. Default: (0.1, 0.3).
grain_size_range | tuple[int, int] | (1, 3) | Grain resolution as divisor of image size. 1 = full resolution (fine); larger = coarser, more clumped. Default: (1, 3).
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
>>> transform = A.FilmGrain(intensity_range=(0.1, 0.3), grain_size_range=(1, 3), p=1.0)
>>> result = transform(image=image)["image"]

Notes

- Grain is generated at lower resolution and upscaled, which produces spatial correlation (clumping) like real film.
- Visibility is modulated by inverse luminance; darker regions show more grain (silver halide-like behavior).
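
A rough NumPy sketch of these two ideas, low-resolution noise that is upscaled and then weighted by inverse luminance, is shown here. The function name, the nearest-neighbor upscaling, and the channel-mean luminance are assumptions made for illustration, not the library's implementation.

```python
import numpy as np

def film_grain_sketch(image, intensity=0.2, grain_size=2, rng=None):
    """Add clumped, luminance-dependent grain to a uint8 RGB image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    img = image.astype(np.float32) / 255.0
    # Generate noise on a coarser grid (ceil division keeps full coverage).
    small_h, small_w = -(-h // grain_size), -(-w // grain_size)
    noise = rng.normal(0.0, 1.0, size=(small_h, small_w)).astype(np.float32)
    # Nearest-neighbor upscale, then crop back to the image size.
    noise = np.repeat(np.repeat(noise, grain_size, axis=0), grain_size, axis=1)
    noise = noise[:h, :w]
    # Darker pixels get a larger weight, so grain is more visible there.
    weight = 1.0 - img.mean(axis=2)
    out = img + intensity * (noise * weight)[..., None]
    return (np.clip(out, 0.0, 1.0) * 255.0).round().astype(np.uint8)

white = np.full((10, 7, 3), 255, dtype=np.uint8)
dark = np.full((10, 7, 3), 20, dtype=np.uint8)
grainy_dark = film_grain_sketch(dark, grain_size=3, rng=np.random.default_rng(0))
```

In this sketch a pure white image receives zero weight and is returned unchanged, while dark regions receive nearly the full grain amplitude.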

GaussNoise class

GaussNoise(
    std_range: tuple[float, float] = (0.2, 0.44),
    mean_range: tuple[float, float] = (0.0, 0.0),
    per_channel: bool = False,
    noise_scale_factor: float = 1,
    p: float = 0.5
)

Add Gaussian (normal) noise to the image. i.i.d. per pixel (or per block if scaled). Use for robustness to sensor or transmission noise. Noise standard deviation and mean are sampled from configurable ranges and scaled to image dtype (255 for uint8, 1.0 for float32). Optional per-channel sampling and lower-resolution noise for speed.

Parameters

Name | Type | Default | Description
std_range | tuple[float, float] | (0.2, 0.44) | Range for noise standard deviation as a fraction of the max value (255 for uint8, 1.0 for float32). In [0, 1]. Default: (0.2, 0.44).
mean_range | tuple[float, float] | (0.0, 0.0) | Range for noise mean as a fraction of max. In [-1, 1]. Default: (0.0, 0.0).
per_channel | bool | False | If True, sample noise per channel; else same noise for all. Default: False.
noise_scale_factor | float | 1 | If < 1, noise is generated at lower resolution and resized (faster, coarser). 1 = per-pixel. In (0, 1]. Default: 1.0.
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
>>> transform = A.GaussNoise(std_range=(0.1, 0.2), p=1.0)
>>> noisy_image = transform(image=image)["image"]

Notes

- std_range values are in [0, 1] and mean_range values are in [-1, 1]; both are scaled by 255 for uint8 images and used directly for float32 images.
- per_channel=False: faster, same noise on all channels (grayscale-like on RGB).
- per_channel=True: different noise per channel (colored noise).
- noise_scale_factor < 1 trades noise granularity for speed.
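
The scaling of std_range and mean_range by the dtype maximum, and the effect of per_channel, can be sketched in NumPy. `gauss_noise_sketch` is a hypothetical helper; it leaves out noise_scale_factor and is not the library's implementation.

```python
import numpy as np

def gauss_noise_sketch(image, std_range=(0.1, 0.2), mean_range=(0.0, 0.0),
                       per_channel=False, rng=None):
    """Add Gaussian noise whose parameters are fractions of the dtype maximum."""
    rng = np.random.default_rng() if rng is None else rng
    max_value = 255.0 if image.dtype == np.uint8 else 1.0
    std = rng.uniform(*std_range) * max_value    # fraction -> pixel units
    mean = rng.uniform(*mean_range) * max_value
    h, w, c = image.shape
    # per_channel=False: one noise map broadcast over all channels.
    shape = (h, w, c) if per_channel else (h, w, 1)
    noise = rng.normal(mean, std, size=shape)
    out = np.clip(image.astype(np.float64) + noise, 0.0, max_value)
    if image.dtype == np.uint8:
        return out.round().astype(np.uint8)
    return out.astype(image.dtype)

flat = np.full((8, 8, 3), 128, dtype=np.uint8)
shared = gauss_noise_sketch(flat, per_channel=False, rng=np.random.default_rng(1))
```

With per_channel=False the three channels of a gray input stay identical, which is the "grayscale-like on RGB" behavior mentioned in the notes.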

HEStain class

HEStain(
    method: 'preset' | 'random_preset' | 'vahadane' | 'macenko' = random_preset,
    preset: 'ruifrok' | 'macenko' | 'standard' | 'high_contrast' | 'h_heavy' | 'e_heavy' | 'dark' | 'light' | None,
    intensity_scale_range: tuple[float, float] = (0.7, 1.3),
    intensity_shift_range: tuple[float, float] = (-0.2, 0.2),
    augment_background: bool = False,
    p: float = 0.5
)

H&E stain augmentation for histopathology. method: preset, random_preset, vahadane, macenko. Simulates staining variation for robust pathology models. This transform simulates different H&E staining conditions using either:
1. Predefined stain matrices (8 standard references)
2. Vahadane method for stain extraction
3. Macenko method for stain extraction
4. Custom stain matrices

Parameters

Name | Type | Default | Description
method | One of: 'preset', 'random_preset', 'vahadane', 'macenko' | random_preset | Method to use for stain augmentation: - "preset": Use predefined stain matrices - "random_preset": Randomly select a preset matrix each time - "vahadane": Extract using Vahadane method - "macenko": Extract using Macenko method Default: "random_preset"
preset | One of: 'ruifrok', 'macenko', 'standard', 'high_contrast', 'h_heavy', 'e_heavy', 'dark', 'light', None | - | Preset stain matrix to use when method="preset": - "ruifrok": Standard reference from Ruifrok & Johnston - "macenko": Reference from Macenko's method - "standard": Typical bright-field microscopy - "high_contrast": Enhanced contrast - "h_heavy": Hematoxylin dominant - "e_heavy": Eosin dominant - "dark": Darker staining - "light": Lighter staining Default: "standard"
intensity_scale_range | tuple[float, float] | (0.7, 1.3) | Range for multiplicative stain intensity variation. Values are multipliers between 0.5 and 1.5. For example: - (0.7, 1.3) means stain intensities will vary from 70% to 130% - (0.9, 1.1) gives subtle variations - (0.5, 1.5) gives dramatic variations Default: (0.7, 1.3)
intensity_shift_range | tuple[float, float] | (-0.2, 0.2) | Range for additive stain intensity variation. Values between -0.3 and 0.3. For example: - (-0.2, 0.2) means intensities will be shifted by -20% to +20% - (-0.1, 0.1) gives subtle shifts - (-0.3, 0.3) gives dramatic shifts Default: (-0.2, 0.2)
augment_background | bool | False | Whether to apply augmentation to background regions. Default: False
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Create a sample H&E stained histopathology image
>>> # For real use cases, load an actual H&E stained image
>>> image = np.zeros((300, 300, 3), dtype=np.uint8)
>>> # Simulate tissue regions with different staining patterns
>>> image[50:150, 50:150] = np.array([120, 140, 180], dtype=np.uint8)  # Hematoxylin-rich region
>>> image[150:250, 150:250] = np.array([140, 160, 120], dtype=np.uint8)  # Eosin-rich region
>>>
>>> # Example 1: Using a specific preset stain matrix
>>> transform = A.HEStain(
...     method="preset",
...     preset="standard",
...     intensity_scale_range=(0.8, 1.2),
...     intensity_shift_range=(-0.1, 0.1),
...     augment_background=False,
...     p=1.0
... )
>>> result = transform(image=image)
>>> transformed_image = result['image']
>>>
>>> # Example 2: Using random preset selection
>>> transform = A.HEStain(
...     method="random_preset",
...     intensity_scale_range=(0.7, 1.3),
...     intensity_shift_range=(-0.15, 0.15),
...     p=1.0
... )
>>> result = transform(image=image)
>>> transformed_image = result['image']
>>>
>>> # Example 3: Using Vahadane method (requires H&E stained input)
>>> transform = A.HEStain(
...     method="vahadane",
...     intensity_scale_range=(0.7, 1.3),
...     p=1.0
... )
>>> result = transform(image=image)
>>> transformed_image = result['image']
>>>
>>> # Example 4: Using Macenko method (requires H&E stained input)
>>> transform = A.HEStain(
...     method="macenko",
...     intensity_scale_range=(0.7, 1.3),
...     intensity_shift_range=(-0.2, 0.2),
...     p=1.0
... )
>>> result = transform(image=image)
>>> transformed_image = result['image']
>>>
>>> # Example 5: Combining with other transforms in a pipeline
>>> transform = A.Compose([
...     A.HEStain(method="preset", preset="high_contrast", p=1.0),
...     A.RandomBrightnessContrast(p=0.5),
... ])
>>> result = transform(image=image)
>>> transformed_image = result['image']

References

  • A. C. Ruifrok and D. A. Johnston, "Quantification of histochemical", Analytical and quantitative cytology and histology, 2001.
  • M. Macenko et al., "A method for normalizing histology slides for quantitative analysis," 2009 IEEE International Symposium on Biomedical Imaging, 2009.

Halftone class

Halftone(
    dot_size_range: tuple[int, int] = (4, 10),
    blend_range: tuple[float, float] = (0.0, 0.5),
    p: float = 0.5
)

Halftone dot pattern (printing-style). Continuous tones become dots of varying size. Use for vintage or print-aesthetic augmentation. Simulates halftone printing: a grid of cells, each drawn as a filled circle whose size is proportional to mean luminance in that cell. Larger dots = brighter, smaller = darker. Optional blend with the original image controls strength.

Parameters

Name | Type | Default | Description
dot_size_range | tuple[int, int] | (4, 10) | Range for grid cell size in pixels. Larger = coarser pattern. Default: (4, 10).
blend_range | tuple[float, float] | (0.0, 0.5) | Blend with original: 0 = pure halftone, 1 = original. Default: (0.0, 0.5).
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
>>> transform = A.Halftone(dot_size_range=(4, 8), blend_range=(0.0, 0.3), p=1.0)
>>> result = transform(image=image)["image"]

Notes

- Mean luminance per grid cell drives dot radius; cell color from original image.
- Dot size is proportional to luminance (bright → large dot, dark → small dot).
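
The relationship between cell luminance and dot radius can be sketched for a grayscale image. The linear radius mapping and the name `halftone_gray_sketch` are assumptions for illustration; the library's exact mapping, coloring, and blending are not reproduced here.

```python
import numpy as np

def halftone_gray_sketch(gray, dot_size=8):
    """Draw one white dot per grid cell; brighter cells get larger dots."""
    h, w = gray.shape
    out = np.zeros_like(gray)
    yy, xx = np.mgrid[0:dot_size, 0:dot_size]
    center = (dot_size - 1) / 2.0
    dist = np.sqrt((yy - center) ** 2 + (xx - center) ** 2)
    for y0 in range(0, h, dot_size):
        for x0 in range(0, w, dot_size):
            cell = gray[y0:y0 + dot_size, x0:x0 + dot_size]
            # Assumed mapping: radius grows linearly with mean luminance.
            radius = (dot_size / 2.0) * float(cell.mean()) / 255.0
            dot = dist[:cell.shape[0], :cell.shape[1]] < radius
            out[y0:y0 + dot_size, x0:x0 + dot_size][dot] = 255
    return out

black = np.zeros((16, 16), dtype=np.uint8)
mid = np.full((16, 16), 128, dtype=np.uint8)
white = np.full((16, 16), 255, dtype=np.uint8)
```

In this sketch a black image yields no dots at all, and the dot area grows as the input gets brighter.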

HueSaturationValue class

HueSaturationValue(
    hue_shift_limit: tuple[float, float] | float = (-20, 20),
    sat_shift_limit: tuple[float, float] | float = (-30, 30),
    val_shift_limit: tuple[float, float] | float = (-20, 20),
    p: float = 0.5
)

Randomly shift hue, saturation, and value (HSV). Separate ranges per channel. Common for color augmentation in classification. This transform adjusts the HSV (Hue, Saturation, Value) channels of an input RGB image. It allows for independent control over each channel, providing a wide range of color and brightness modifications.

Parameters

Name | Type | Default | Description
hue_shift_limit | tuple[float, float] or float | (-20, 20) | Range for changing hue. If a single float value is provided, the range will be (-hue_shift_limit, hue_shift_limit). Values should be in the range [-180, 180]. Default: (-20, 20).
sat_shift_limit | tuple[float, float] or float | (-30, 30) | Range for changing saturation. If a single float value is provided, the range will be (-sat_shift_limit, sat_shift_limit). Values should be in the range [-255, 255]. Default: (-30, 30).
val_shift_limit | tuple[float, float] or float | (-20, 20) | Range for changing value (brightness). If a single float value is provided, the range will be (-val_shift_limit, val_shift_limit). Values should be in the range [-255, 255]. Default: (-20, 20).
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.HueSaturationValue(
...     hue_shift_limit=20,
...     sat_shift_limit=30,
...     val_shift_limit=20,
...     p=0.7
... )
>>> result = transform(image=image)
>>> augmented_image = result["image"]

Notes

- The transform first converts the input RGB image to the HSV color space.
- Each channel (Hue, Saturation, Value) is adjusted independently.
- Hue is circular, so it wraps around at 180 degrees.
- For float32 images, the shift values are applied as percentages of the full range.
- This transform is particularly useful for color augmentation and simulating different lighting conditions.
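
The hue wrap-around can be shown with a small sketch. It assumes the hue channel is stored the way OpenCV stores it for 8-bit images, as values in [0, 180); `shift_hue_uint8` is a hypothetical helper, not a library function.

```python
import numpy as np

def shift_hue_uint8(hue, shift):
    """Shift a uint8 hue channel (assumed range [0, 180)) with wrap-around."""
    # Work in a wider integer type so the addition cannot overflow uint8.
    return ((hue.astype(np.int32) + int(shift)) % 180).astype(np.uint8)

hue = np.array([0, 90, 170, 179], dtype=np.uint8)
shifted_up = shift_hue_uint8(hue, 20)     # 170 wraps to 10, 179 wraps to 19
shifted_down = shift_hue_uint8(hue, -20)  # 0 wraps to 160
```

Saturation and value are not circular, so a comparable sketch for those channels would clip to the valid range instead of wrapping.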

References

  • HSV color space: https://en.wikipedia.org/wiki/HSL_and_HSV

ISONoise class

ISONoise(
    color_shift: tuple[float, float] = (0.01, 0.05),
    intensity: tuple[float, float] = (0.1, 0.5),
    p: float = 0.5
)

Add camera-sensor-like noise scaling with intensity (high ISO). color_shift and intensity range control strength. Good for low-light or camera noise simulation. This transform adds random noise to an image, mimicking the effect of using high ISO settings in digital photography. It simulates two main components of ISO noise:
1. Color noise: random shifts in color hue
2. Luminance noise: random variations in pixel intensity

Parameters

Name | Type | Default | Description
color_shift | tuple[float, float] | (0.01, 0.05) | Range for changing color hue. Values should be in the range [0, 1], where 1 represents a full 360° hue rotation. Default: (0.01, 0.05)
intensity | tuple[float, float] | (0.1, 0.5) | Range for the noise intensity. Higher values increase the strength of both color and luminance noise. Default: (0.1, 0.5)
p | float | 0.5 | Probability of applying the transform. Default: 0.5

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ISONoise(color_shift=(0.01, 0.05), intensity=(0.1, 0.5), p=0.5)
>>> result = transform(image=image)
>>> noisy_image = result["image"]

Notes

- This transform only works with RGB images. It will raise a TypeError if applied to non-RGB images.
- The color shift is applied in the HSV color space, affecting the hue channel.
- Luminance noise is added to all channels independently.
- This transform can be useful for data augmentation in low-light scenarios or when training models to be robust against noisy inputs.

References

  • ISO noise in digital photography: https://en.wikipedia.org/wiki/Image_noise#In_digital_cameras

Illumination class

Illumination(
    mode: 'linear' | 'corner' | 'gaussian' = linear,
    intensity_range: tuple[float, float] = (0.01, 0.2),
    effect_type: 'brighten' | 'darken' | 'both' = both,
    angle_range: tuple[float, float] = (0, 360),
    center_range: tuple[float, float] = (0.1, 0.9),
    sigma_range: tuple[float, float] = (0.2, 1.0),
    p: float = 0.5
)

Illumination patterns: directional (linear), corner shadows/highlights, or gaussian. mode and params control shape and strength. Simulates lighting variation. This transform simulates different lighting conditions by applying controlled illumination patterns. It can create effects like:
- Directional lighting (linear mode)
- Corner shadows/highlights (corner mode)
- Spotlights or local lighting (gaussian mode)
These effects can be used to:
- Simulate natural lighting variations
- Add dramatic lighting effects
- Create synthetic shadows or highlights
- Augment training data with different lighting conditions

Parameters

Name | Type | Default | Description
mode | One of: 'linear', 'corner', 'gaussian' | linear | Type of illumination pattern: - 'linear': Creates a smooth gradient across the image, simulating directional lighting like sunlight through a window - 'corner': Applies gradient from any corner, simulating light source from a corner - 'gaussian': Creates a circular spotlight effect, simulating local light sources Default: 'linear'
intensity_range | tuple[float, float] | (0.01, 0.2) | Range for effect strength. Values between 0.01 and 0.2: - 0.01-0.05: Subtle lighting changes - 0.05-0.1: Moderate lighting effects - 0.1-0.2: Strong lighting effects Default: (0.01, 0.2)
effect_type | One of: 'brighten', 'darken', 'both' | both | Type of lighting change: - 'brighten': Only adds light (like a spotlight) - 'darken': Only removes light (like a shadow) - 'both': Randomly chooses between brightening and darkening Default: 'both'
angle_range | tuple[float, float] | (0, 360) | Range for gradient angle in degrees. Controls direction of linear gradient: - 0°: Left to right - 90°: Top to bottom - 180°: Right to left - 270°: Bottom to top Only used for 'linear' mode. Default: (0, 360)
center_range | tuple[float, float] | (0.1, 0.9) | Range for spotlight position. Values between 0 and 1 representing relative position: - (0, 0): Top-left corner - (1, 1): Bottom-right corner - (0.5, 0.5): Center of image Only used for 'gaussian' mode. Default: (0.1, 0.9)
sigma_range | tuple[float, float] | (0.2, 1.0) | Range for spotlight size. Values between 0.2 and 1.0: - 0.2: Small, focused spotlight - 0.5: Medium-sized light area - 1.0: Broad, soft lighting Only used for 'gaussian' mode. Default: (0.2, 1.0)
p | float | 0.5 | Probability of applying the transform. Default: 0.5

Examples

>>> import albumentations as A
>>> # Simulate sunlight through window
>>> transform = A.Illumination(
...     mode='linear',
...     intensity_range=(0.05, 0.1),
...     effect_type='brighten',
...     angle_range=(30, 60)
... )
>>>
>>> # Create dramatic corner shadow
>>> transform = A.Illumination(
...     mode='corner',
...     intensity_range=(0.1, 0.2),
...     effect_type='darken'
... )
>>>
>>> # Add multiple spotlights
>>> transform1 = A.Illumination(
...     mode='gaussian',
...     intensity_range=(0.05, 0.15),
...     effect_type='brighten',
...     center_range=(0.2, 0.4),
...     sigma_range=(0.2, 0.3)
... )
>>> transform2 = A.Illumination(
...     mode='gaussian',
...     intensity_range=(0.05, 0.15),
...     effect_type='darken',
...     center_range=(0.6, 0.8),
...     sigma_range=(0.3, 0.5)
... )
>>> transforms = A.Compose([transform1, transform2])

Notes

- The transform preserves image range and dtype.
- Effects are applied multiplicatively to preserve texture.
- Can be combined with other transforms for complex lighting scenarios.
- Useful for training models to be robust to lighting variations.
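
A multiplicative linear gradient of the kind described for 'linear' mode can be sketched in NumPy. The gain formula `1 ± intensity * ramp`, the choice of which side ends up brighter, and the name `linear_illumination_sketch` are assumptions for illustration, not the library's implementation.

```python
import math
import numpy as np

def linear_illumination_sketch(image, intensity=0.1, angle_deg=0.0, brighten=True):
    """Multiply a uint8 image by a linear gain ramp oriented at angle_deg."""
    h, w = image.shape[:2]
    y, x = np.mgrid[0:h, 0:w].astype(np.float32)
    x /= max(w - 1, 1)
    y /= max(h - 1, 1)
    theta = math.radians(angle_deg)
    # Project pixel coordinates onto the gradient direction.
    ramp = x * math.cos(theta) + y * math.sin(theta)
    ramp = (ramp - ramp.min()) / max(float(ramp.max() - ramp.min()), 1e-6)
    sign = 1.0 if brighten else -1.0
    gain = 1.0 + sign * intensity * ramp  # 1.0 at the start of the ramp
    out = image.astype(np.float32) * gain[..., None]
    return np.clip(out, 0, 255).round().astype(np.uint8)

flat = np.full((4, 11, 3), 100, dtype=np.uint8)
lit = linear_illumination_sketch(flat, intensity=0.2, angle_deg=0.0, brighten=True)
shaded = linear_illumination_sketch(flat, intensity=0.2, angle_deg=0.0, brighten=False)
```

Because the effect is a per-pixel gain rather than an added offset, local contrast within the image is scaled, not flattened.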

References

  • Lighting in Computer Vision: https://en.wikipedia.org/wiki/Lighting_in_computer_vision
  • Image-based lighting: https://en.wikipedia.org/wiki/Image-based_lighting
  • Similar implementation in Kornia: https://kornia.readthedocs.io/en/latest/augmentation.html#randomlinearillumination
  • Research on lighting augmentation: "Learning Deep Representations of Fine-grained Visual Descriptions", https://arxiv.org/abs/1605.05395
  • Photography lighting patterns: https://en.wikipedia.org/wiki/Lighting_pattern

ImageCompression class

ImageCompression(
    compression_type: 'jpeg' | 'webp' = jpeg,
    quality_range: tuple[int, int] = (99, 100),
    p: float = 0.5
)

Reduce image quality via JPEG or WebP compression. quality_range and compression_type control strength and format. Simulates real-world compression artifacts. This transform simulates the effect of saving an image with lower quality settings, which can introduce compression artifacts. It's useful for data augmentation and for testing model robustness against varying image qualities.

Parameters

Name | Type | Default | Description
compression_type | One of: 'jpeg', 'webp' | jpeg | Type of compression to apply. - "jpeg": JPEG compression - "webp": WebP compression Default: "jpeg"
quality_range | tuple[int, int] | (99, 100) | Range for the compression quality. The values should be in [1, 100] range, where: - 1 is the lowest quality (maximum compression) - 100 is the highest quality (minimum compression) Default: (99, 100)
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ImageCompression(quality_range=(50, 90), compression_type="jpeg", p=1.0)
>>> result = transform(image=image)
>>> compressed_image = result["image"]

Notes

- This transform expects images with 1, 3, or 4 channels.
- For JPEG compression, alpha channels (4th channel) will be ignored.
- WebP compression supports transparency (4 channels).
- The actual file is not saved to disk; the compression is simulated in memory.
- Lower quality values result in smaller file sizes but may introduce visible artifacts.
- This transform can be useful for:
  * Data augmentation to improve model robustness
  * Testing how models perform on images of varying quality
  * Simulating images transmitted over low-bandwidth connections

References

  • JPEG compression: https://en.wikipedia.org/wiki/JPEG
  • WebP compression: https://developers.google.com/speed/webp

InvertImg class

InvertImg(
    p: float = 0.5
)

Invert the input image by subtracting each pixel value from the maximum value of the image dtype: 255 for uint8 and 1.0 for float32.

Parameters

Name | Type | Default | Description
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Create a sample image with different elements
>>> image = np.zeros((100, 100, 3), dtype=np.uint8)
>>> cv2.circle(image, (30, 30), 20, (255, 255, 255), -1)  # White circle
>>> cv2.rectangle(image, (60, 60), (90, 90), (128, 128, 128), -1)  # Gray rectangle
>>>
>>> # Apply InvertImg transform
>>> transform = A.InvertImg(p=1.0)
>>> result = transform(image=image)
>>> inverted_image = result['image']
>>>
>>> # Result:
>>> # - Black background becomes white (0 → 255)
>>> # - White circle becomes black (255 → 0)
>>> # - Gray rectangle is inverted (128 → 127)
>>> # The same approach works for float32 images (0-1 range) and grayscale images
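
The inversion rule itself is a single subtraction. The sketch below uses a hypothetical helper, `invert_sketch`, to reproduce the arithmetic from the comments above for both supported dtypes; it is an illustration, not the library's code.

```python
import numpy as np

def invert_sketch(image):
    """Subtract each pixel from the maximum value of its dtype."""
    max_value = 255 if image.dtype == np.uint8 else 1.0
    return (max_value - image).astype(image.dtype)

pixels_u8 = np.array([0, 128, 255], dtype=np.uint8)
pixels_f32 = np.array([0.0, 0.25, 1.0], dtype=np.float32)
inverted_u8 = invert_sketch(pixels_u8)    # 0 -> 255, 128 -> 127, 255 -> 0
inverted_f32 = invert_sketch(pixels_f32)  # 0.0 -> 1.0, 0.25 -> 0.75, 1.0 -> 0.0
```

Applying the operation twice returns the original image, since max - (max - x) = x.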

LensFlare class

LensFlare(
    flare_roi: tuple[float, float, float, float] = (0, 0, 1, 0.5),
    num_ghosts_range: tuple[int, int] = (3, 7),
    intensity_range: tuple[float, float] = (0.3, 0.7),
    num_rays_range: tuple[int, int] = (4, 8),
    bloom_range: tuple[float, float] = (0.01, 0.05),
    p: float = 0.5
)

Add lens flare: starburst rays and ghost reflections from a bright source. Use for outdoor or backlit robustness and optical-artifact simulation. A flare center is chosen in a configurable region; starburst rays and mirrored ghost circles are drawn toward the image center. Strength and blur are configurable.

Parameters

Name | Type | Default | Description
flare_roi | tuple[float, float, float, float] | (0, 0, 1, 0.5) | Region of interest for flare source placement as (x_min, y_min, x_max, y_max) in normalized [0, 1] coords. Default: (0, 0, 1, 0.5).
num_ghosts_range | tuple[int, int] | (3, 7) | Range for number of ghost reflections. Default: (3, 7).
intensity_range | tuple[float, float] | (0.3, 0.7) | Range for overall flare brightness. Default: (0.3, 0.7).
num_rays_range | tuple[int, int] | (4, 8) | Range for number of starburst rays. Default: (4, 8).
bloom_range | tuple[float, float] | (0.01, 0.05) | Range for bloom blur radius as fraction of image diagonal. Default: (0.01, 0.05).
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.LensFlare(intensity_range=(0.3, 0.6), p=1.0)
>>> result = transform(image=image)["image"]

Notes

- Ghost reflections lie along the line from flare source to image center.
- Size decreases and color shifts with distance from the source.

MultiplicativeNoise class

MultiplicativeNoise(
    multiplier: tuple[float, float] = (0.9, 1.1),
    per_channel: bool = False,
    elementwise: bool = False,
    p: float = 0.5
)

Multiply the image by a random per-pixel or per-channel factor. multiplier controls strength. Simulates illumination or gain variation; preserves zeros. This transform multiplies each pixel in the image by a random value or array of values, effectively creating a noise pattern that scales with the image intensity.

Parameters

Name | Type | Default | Description
multiplier | tuple[float, float] | (0.9, 1.1) | The range for the random multiplier. Defines the range from which the multiplier is sampled. Default: (0.9, 1.1)
per_channel | bool | False | If True, use a different random multiplier for each channel. If False, use the same multiplier for all channels. Setting this to False is slightly faster. Default: False
elementwise | bool | False | If True, generates a unique multiplier for each pixel. If False, generates a single multiplier (or one per channel if per_channel=True). Default: False
p | float | 0.5 | Probability of applying the transform. Default: 0.5

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.MultiplicativeNoise(multiplier=(0.9, 1.1), per_channel=True, p=1.0)
>>> result = transform(image=image)
>>> noisy_image = result["image"]

Notes

- When elementwise=False and per_channel=False, a single multiplier is applied to the entire image.
- When elementwise=False and per_channel=True, each channel gets a different multiplier.
- When elementwise=True and per_channel=False, each pixel gets the same multiplier across all channels.
- When elementwise=True and per_channel=True, each pixel in each channel gets a unique multiplier.
- Setting per_channel=False is slightly faster, especially for larger images.
- This transform can be used to simulate various lighting conditions or to create noise that scales with image intensity.
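
The four combinations of elementwise and per_channel correspond to four shapes of the multiplier array, which NumPy broadcasting then applies to the image. Both helpers in this sketch are hypothetical and written for illustration; they are not the library's implementation.

```python
import numpy as np

def multiplier_shape(image_shape, per_channel, elementwise):
    """Shape of the random factor array for an (H, W, C) image."""
    h, w, c = image_shape
    if elementwise:
        return (h, w, c) if per_channel else (h, w, 1)
    return (1, 1, c) if per_channel else (1, 1, 1)

def multiplicative_noise_sketch(image, multiplier=(0.9, 1.1),
                                per_channel=False, elementwise=False, rng=None):
    """Multiply a uint8 image by uniformly sampled factors."""
    rng = np.random.default_rng() if rng is None else rng
    shape = multiplier_shape(image.shape, per_channel, elementwise)
    factors = rng.uniform(multiplier[0], multiplier[1], size=shape)
    out = image.astype(np.float64) * factors  # broadcasts over missing axes
    return np.clip(out, 0, 255).round().astype(np.uint8)

image = np.full((4, 5, 3), 100, dtype=np.uint8)
noisy = multiplicative_noise_sketch(image, per_channel=True, elementwise=True,
                                    rng=np.random.default_rng(0))
```

Since every output pixel is the input times a factor, zero-valued pixels stay at zero regardless of the sampled multipliers.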

References

  • Multiplicative noise: https://en.wikipedia.org/wiki/Multiplicative_noise

Normalize class

Normalize(
    mean: tuple[float, ...] | float | None = (0.485, 0.456, 0.406),
    std: tuple[float, ...] | float | None = (0.229, 0.224, 0.225),
    max_pixel_value: float | None = 255.0,
    normalization: 'standard' | 'image' | 'image_per_channel' | 'min_max' | 'min_max_per_channel' = standard,
    p: float = 1.0
)

Applies various normalization techniques to an image. The specific normalization technique can be selected with the `normalization` parameter. Standard normalization is applied using the formula: `img = (img - mean * max_pixel_value) / (std * max_pixel_value)`. Other normalization techniques adjust the image based on global or per-channel statistics, or scale pixel values to a specified range.

Parameters

Name | Type | Default | Description
mean | tuple[float, ...], float, or None | (0.485, 0.456, 0.406) | Mean values for standard normalization. For "standard" normalization, the default values are ImageNet mean values: (0.485, 0.456, 0.406).
std | tuple[float, ...], float, or None | (0.229, 0.224, 0.225) | Standard deviation values for standard normalization. For "standard" normalization, the default values are ImageNet standard deviation values: (0.229, 0.224, 0.225).
max_pixel_value | float or None | 255.0 | Maximum possible pixel value, used for scaling in standard normalization. Defaults to 255.0.
normalization | One of: 'standard', 'image', 'image_per_channel', 'min_max', 'min_max_per_channel' | standard | Specifies the normalization technique to apply. Defaults to "standard". - "standard": Applies the formula `(img - mean * max_pixel_value) / (std * max_pixel_value)`. The default mean and std are based on ImageNet. Use mean and std values of (0.5, 0.5, 0.5) for inception normalization, and mean values of (0, 0, 0) with std values of (1, 1, 1) for YOLO. - "image": Normalizes the whole image based on its global mean and standard deviation. - "image_per_channel": Normalizes the image per channel based on each channel's mean and standard deviation. - "min_max": Scales the image pixel values to a [0, 1] range based on the global minimum and maximum pixel values. - "min_max_per_channel": Scales each channel of the image pixel values to a [0, 1] range based on the per-channel minimum and maximum pixel values.
p | float | 1.0 | Probability of applying the transform. Defaults to 1.0.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> # Standard ImageNet normalization
>>> transform = A.Normalize(
...     mean=(0.485, 0.456, 0.406),
...     std=(0.229, 0.224, 0.225),
...     max_pixel_value=255.0,
...     p=1.0
... )
>>> normalized_image = transform(image=image)["image"]
>>>
>>> # Min-max normalization
>>> transform_minmax = A.Normalize(normalization="min_max", p=1.0)
>>> normalized_image_minmax = transform_minmax(image=image)["image"]

Notes

- For "standard" normalization, `mean`, `std`, and `max_pixel_value` must be provided.
- For other normalization types, these parameters are ignored.
- For inception normalization, use mean and std values of (0.5, 0.5, 0.5).
- For YOLO normalization, use mean values of (0, 0, 0) and std values of (1, 1, 1).
- This transform is often used as a final step in image preprocessing pipelines to prepare images for neural network input.
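
The "standard" formula and the global "min_max" variant can be written out directly in NumPy. The helper names are hypothetical, and the small epsilon guard in the min-max version is an assumption added to avoid division by zero on constant images.

```python
import numpy as np

def normalize_standard(image, mean, std, max_pixel_value=255.0):
    """(img - mean * max_pixel_value) / (std * max_pixel_value)"""
    mean = np.asarray(mean, dtype=np.float32) * max_pixel_value
    std = np.asarray(std, dtype=np.float32) * max_pixel_value
    return (image.astype(np.float32) - mean) / std

def normalize_min_max(image, eps=1e-6):
    """Scale to [0, 1] using the global minimum and maximum."""
    img = image.astype(np.float32)
    lo, hi = float(img.min()), float(img.max())
    return (img - lo) / max(hi - lo, eps)

pixel = np.array([[[0, 255, 0]]], dtype=np.uint8)
inception = normalize_standard(pixel, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
yolo = normalize_standard(pixel, mean=(0, 0, 0), std=(1, 1, 1))
scaled = normalize_min_max(np.array([[10, 20], [30, 50]], dtype=np.uint8))
```

With the inception values a uint8 image maps to [-1, 1], and with the YOLO values it maps to [0, 1], because the formula then reduces to dividing by max_pixel_value.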

References

  • ImageNet mean and std: https://pytorch.org/vision/stable/models.html
  • Inception preprocessing: https://keras.io/api/applications/inceptionv3/

PhotoMetricDistort class

PhotoMetricDistort(
    brightness_range: tuple[float, float] = (0.875, 1.125),
    contrast_range: tuple[float, float] = (0.5, 1.5),
    saturation_range: tuple[float, float] = (0.5, 1.5),
    hue_range: tuple[float, float] = (-0.05, 0.05),
    distort_p: float = 0.5,
    p: float = 0.5
)

SSD-style photometric distortion: brightness, contrast, saturation, hue, channel shuffle; each with probability distort_p. For detection training. Applies brightness, contrast, saturation, and hue adjustments independently with probability `distort_p` each. Contrast is applied either before or after the HSV-space adjustments (randomly chosen). Optionally permutes channels with probability `distort_p`. This mirrors the `RandomPhotometricDistort` transform from torchvision but uses our existing `adjust_*_torchvision` functional primitives.

Parameters

Name | Type | Default | Description
brightness_range | tuple[float, float] | (0.875, 1.125) | Multiplicative factor range for brightness. Factor is drawn uniformly from this range. Must be non-negative. Default: `(0.875, 1.125)`.
contrast_range | tuple[float, float] | (0.5, 1.5) | Multiplicative factor range for contrast. Factor is drawn uniformly from this range. Must be non-negative. Default: `(0.5, 1.5)`.
saturation_range | tuple[float, float] | (0.5, 1.5) | Multiplicative factor range for saturation. Factor is drawn uniformly from this range. Must be non-negative. Default: `(0.5, 1.5)`.
hue_range | tuple[float, float] | (-0.05, 0.05) | Additive factor range for hue. Factor is drawn uniformly from this range. Must be in `[-0.5, 0.5]`. Default: `(-0.05, 0.05)`.
distort_p | float | 0.5 | Probability of applying each individual distortion (brightness, contrast, saturation, hue, channel permutation). Default: `0.5`.
p | float | 0.5 | Probability of applying the overall transform. Default: `0.5`.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> bboxes = np.array([[10, 10, 50, 50]], dtype=np.float32)
>>> bbox_labels = [1]
>>> keypoints = np.array([[20, 30]], dtype=np.float32)
>>> keypoint_labels = [0]
>>>
>>> transform = A.Compose([
...     A.PhotoMetricDistort(
...         brightness_range=(0.875, 1.125),
...         contrast_range=(0.5, 1.5),
...         saturation_range=(0.5, 1.5),
...         hue_range=(-0.05, 0.05),
...         distort_p=0.5,
...         p=1.0,
...     )
... ], bbox_params=A.BboxParams(coord_format='pascal_voc', label_fields=['bbox_labels']),
...    keypoint_params=A.KeypointParams(coord_format='xy', label_fields=['keypoint_labels']))
>>>
>>> result = transform(
...     image=image,
...     mask=mask,
...     bboxes=bboxes,
...     bbox_labels=bbox_labels,
...     keypoints=keypoints,
...     keypoint_labels=keypoint_labels,
... )
>>> transformed_image = result['image']

Notes

- Each of the five distortions (brightness, contrast, saturation, hue, channel shuffle) is applied independently with probability `distort_p`.
- Contrast is randomly applied either before or after saturation/hue adjustment.
- For single-channel images, saturation and hue adjustments have no effect.
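
The independent coin flips described in these notes can be sketched in plain Python. This is only an illustration of the control flow, not the library's code: the function name `plan_photometric_ops` is hypothetical, and the exact ordering (brightness first, channel shuffle last) is an assumption modeled on the torchvision transform that the description says is mirrored.

```python
import random


def plan_photometric_ops(distort_p, rng):
    """Return the ordered names of the distortions one call would apply.

    Hypothetical helper: it only models the random decisions, not the
    pixel arithmetic.
    """
    # One coin flip decides whether contrast runs before or after the
    # saturation/hue (HSV-space) adjustments.
    contrast_before = rng.random() < 0.5

    ops = []
    if rng.random() < distort_p:
        ops.append("brightness")
    if contrast_before and rng.random() < distort_p:
        ops.append("contrast")
    if rng.random() < distort_p:
        ops.append("saturation")
    if rng.random() < distort_p:
        ops.append("hue")
    if not contrast_before and rng.random() < distort_p:
        ops.append("contrast")
    if rng.random() < distort_p:
        ops.append("channel_shuffle")
    return ops


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(plan_photometric_ops(0.5, rng))
```

With `distort_p=0.5` each distortion appears in roughly half of the calls, so most images receive only a subset of the five adjustments.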

References

  • SSD: https://arxiv.org/abs/1512.02325
  • torchvision RandomPhotometricDistort: https://pytorch.org/vision/stable/generated/torchvision.transforms.v2.RandomPhotometricDistort.html

PlanckianJitter class

PlanckianJitter(
    mode: 'blackbody' | 'cied' = blackbody,
    temperature_limit: tuple[int, int] | None,
    sampling_method: 'uniform' | 'gaussian' = uniform,
    p: float = 0.5
)

Simulate color temperature variation via Planckian locus jitter. mode and temperature_limit control the shift. Good for robustness to different light sources.

This transform adjusts the color of an image to mimic the effect of different color temperatures of light sources, based on Planck's law of black body radiation. It can simulate the appearance of an image under various lighting conditions, from warm (reddish) to cool (bluish) color casts.

PlanckianJitter vs. ColorJitter: PlanckianJitter is fundamentally different from ColorJitter in its approach and use cases:
1. Physics-based: PlanckianJitter is grounded in the physics of light, simulating real-world color temperature changes. ColorJitter applies arbitrary color adjustments.
2. Natural effects: This transform produces color shifts that correspond to natural lighting variations, making it ideal for outdoor scene simulation or color constancy problems.
3. Single parameter: Color changes are controlled by a single, physically meaningful parameter (color temperature), unlike ColorJitter's multiple abstract parameters.
4. Correlated changes: Color shifts are correlated across channels in a way that mimics natural light, whereas ColorJitter can make independent channel adjustments.

When to use PlanckianJitter:
- Simulating different times of day or lighting conditions in outdoor scenes
- Augmenting data for computer vision tasks that need to be robust to natural lighting changes
- Preparing synthetic data to better match real-world lighting variations
- Color constancy research or applications
- When you need physically plausible color variations rather than arbitrary color changes

The logic behind PlanckianJitter: As the color temperature increases:
1. Lower temperatures (around 3000K) produce warm, reddish tones, simulating sunset or incandescent lighting.
2. Mid-range temperatures (around 5500K) correspond to daylight.
3. Higher temperatures (above 7000K) result in cool, bluish tones, similar to overcast sky or shade.

This progression mimics the natural variation of sunlight throughout the day and in different weather conditions.

Parameters

Name | Type | Default | Description
mode | 'blackbody' or 'cied' | blackbody | The mode of the transformation. - "blackbody": Simulates blackbody radiation color changes. - "cied": Uses the CIE D illuminant series for color temperature simulation. Default: "blackbody"
temperature_limit | tuple[int, int] or None | - | The range of color temperatures (in Kelvin) to sample from. - For "blackbody" mode: Should be within [3000K, 15000K]. Default: (3000, 15000) - For "cied" mode: Should be within [4000K, 15000K]. Default: (4000, 15000) If None, the default ranges will be used based on the selected mode. Higher temperatures produce cooler (bluish) images, lower temperatures produce warmer (reddish) images.
sampling_method | 'uniform' or 'gaussian' | uniform | Method to sample the temperature. - "uniform": Samples uniformly across the specified range. - "gaussian": Samples from a Gaussian distribution centered at 6500K (approximate daylight). Default: "uniform"
p | float | 0.5 | Probability of applying the transform. Default: 0.5

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
>>> transform = A.PlanckianJitter(mode="blackbody",
...                               temperature_limit=(3000, 9000),
...                               sampling_method="uniform",
...                               p=1.0)
>>> result = transform(image=image)
>>> jittered_image = result["image"]

Notes

- The transform preserves the overall brightness of the image while shifting its color.
- The "blackbody" mode provides a wider range of color shifts, especially in the lower (warmer) temperatures.
- The "cied" mode is based on standard illuminants and may provide more realistic daylight variations.
- The Gaussian sampling method tends to produce more subtle variations, as it's centered around daylight.
- Unlike ColorJitter, this transform ensures that color changes are physically plausible and correlated across channels, maintaining the natural appearance of the scene under different lighting conditions.
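
The two sampling methods can be illustrated with a small standalone sketch. The default ranges and the 6500K center come from the parameter descriptions above; everything else is an assumption made for illustration, in particular the helper name `sample_temperature`, the `sigma_fraction` spread, and the clipping of Gaussian draws to the allowed range. The library may compute these differently.

```python
import random

# Default ranges per mode, as listed in the parameter table.
DEFAULT_LIMITS = {"blackbody": (3000, 15000), "cied": (4000, 15000)}


def sample_temperature(mode="blackbody", temperature_limit=None,
                       sampling_method="uniform", rng=None,
                       sigma_fraction=0.25):
    """Draw one color temperature in Kelvin (hypothetical helper)."""
    rng = rng or random.Random()
    low, high = temperature_limit or DEFAULT_LIMITS[mode]

    if sampling_method == "uniform":
        return rng.uniform(low, high)
    if sampling_method == "gaussian":
        # Assumption: the spread is a fixed fraction of the range width,
        # and out-of-range draws are clipped back into [low, high].
        sigma = sigma_fraction * (high - low)
        return min(max(rng.gauss(6500, sigma), low), high)
    raise ValueError(f"Unknown sampling_method: {sampling_method!r}")


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_temperature("blackbody", (3000, 9000), "uniform", rng))
    print(sample_temperature("cied", None, "gaussian", rng))
```

Because the Gaussian draws cluster around daylight, they tend to give milder color casts than uniform draws over the same range, which matches the note above.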

References

  • Planck's law: https://en.wikipedia.org/wiki/Planck%27s_law
  • CIE Standard Illuminants: https://en.wikipedia.org/wiki/Standard_illuminant
  • Color temperature: https://en.wikipedia.org/wiki/Color_temperature
  • Implementation inspired by: https://github.com/TheZino/PlanckianJitter

PlasmaBrightnessContrast class

PlasmaBrightnessContrast(
    brightness_range: tuple[float, float] = (-0.3, 0.3),
    contrast_range: tuple[float, float] = (-0.3, 0.3),
    plasma_size: int = 256,
    roughness: float = 3.0,
    p: float = 0.5
)

Plasma fractal (Diamond-Square) pattern varies brightness and contrast spatially. brightness_range, contrast_range. Organic, non-uniform look. Uses Diamond-Square algorithm to generate organic-looking fractal patterns that create spatially-varying brightness and contrast adjustments.

Parameters

Name | Type | Default | Description
brightness_range | tuple[float, float] | (-0.3, 0.3) | Range for brightness adjustment strength. Values between -1 and 1: - Positive values increase brightness - Negative values decrease brightness - 0 means no brightness change Default: (-0.3, 0.3)
contrast_range | tuple[float, float] | (-0.3, 0.3) | Range for contrast adjustment strength. Values between -1 and 1: - Positive values increase contrast - Negative values decrease contrast - 0 means no contrast change Default: (-0.3, 0.3)
plasma_size | int | 256 | Size of the initial plasma pattern grid. Larger values create more detailed patterns but are slower to compute. The pattern will be resized to match the input image dimensions. Default: 256
roughness | float | 3.0 | Controls how quickly the noise amplitude increases at each iteration. Must be greater than 0: - Low values (< 1.0): Smoother, more gradual pattern - Medium values (~2.0): Natural-looking pattern - High values (> 3.0): Very rough, noisy pattern Default: 3.0
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import albumentations as A
>>> import numpy as np

# Default parameters
>>> transform = A.PlasmaBrightnessContrast(p=1.0)

# Custom adjustments
>>> transform = A.PlasmaBrightnessContrast(
...     brightness_range=(-0.5, 0.5),
...     contrast_range=(-0.3, 0.3),
...     plasma_size=512,    # More detailed pattern
...     roughness=0.7,      # Smoother transitions
...     p=1.0
... )

Notes

- Works with any number of channels (grayscale, RGB, multispectral)
- The same plasma pattern is applied to all channels
- Operations are performed in float32 precision
- Final values are clipped to valid range [0, max_value]
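
For readers unfamiliar with the Diamond-Square algorithm named above, here is a minimal pure-Python sketch of the classic textbook formulation. It is not the library's implementation: the function name `diamond_square` and the `decay` parameter are hypothetical, and `decay` (amplitude multiplier per level, smaller means smoother) is deliberately not the same thing as the transform's `roughness` parameter.

```python
import random


def diamond_square(levels, decay, rng):
    """Return a (2**levels + 1) square grid of values scaled to [0, 1]."""
    size = 2 ** levels + 1
    grid = [[0.0] * size for _ in range(size)]

    # Seed the four corners with random values.
    for y in (0, size - 1):
        for x in (0, size - 1):
            grid[y][x] = rng.random()

    step, amplitude = size - 1, 1.0
    while step > 1:
        half = step // 2

        # Diamond step: center of each square = mean of its 4 corners + noise.
        for y in range(half, size, step):
            for x in range(half, size, step):
                corners = (grid[y - half][x - half] + grid[y - half][x + half]
                           + grid[y + half][x - half] + grid[y + half][x + half])
                grid[y][x] = corners / 4 + rng.uniform(-amplitude, amplitude)

        # Square step: each edge midpoint = mean of in-bounds neighbors + noise.
        for y in range(0, size, half):
            x_start = half if (y // half) % 2 == 0 else 0
            for x in range(x_start, size, step):
                candidates = ((y - half, x), (y + half, x),
                              (y, x - half), (y, x + half))
                values = [grid[ny][nx] for ny, nx in candidates
                          if 0 <= ny < size and 0 <= nx < size]
                grid[y][x] = (sum(values) / len(values)
                              + rng.uniform(-amplitude, amplitude))

        step = half
        amplitude *= decay  # finer levels contribute less noise when decay < 1

    # Rescale so the pattern spans exactly [0, 1].
    flat = [v for row in grid for v in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0
    return [[(v - low) / span for v in row] for row in grid]


if __name__ == "__main__":
    pattern = diamond_square(levels=3, decay=0.5, rng=random.Random(0))
    print(len(pattern), "x", len(pattern[0]))
```

A pattern like this would then be resized to the image dimensions and used as a per-pixel weight for the brightness and contrast adjustments.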

References

  • Fournier, Fussell, and Carpenter, "Computer rendering of stochastic models," Communications of the ACM, 1982. Paper introducing the Diamond-Square algorithm.
  • Diamond-Square algorithm: https://en.wikipedia.org/wiki/Diamond-square_algorithm

PlasmaShadow class

PlasmaShadow(
    shadow_intensity_range: tuple[float, float] = (0.3, 0.7),
    plasma_size: int = 256,
    roughness: float = 3.0,
    p: float = 0.5
)

Plasma fractal (Diamond-Square) shadow: organic darkening. shadow_intensity_range, roughness. Good for natural shading and lighting variation. Creates organic-looking shadows using plasma fractal noise pattern. The shadow intensity varies smoothly across the image, creating natural-looking darkening effects that can simulate shadows, shading, or lighting variations.

Parameters

Name | Type | Default | Description
shadow_intensity_range | tuple[float, float] | (0.3, 0.7) | Range for shadow intensity. Values between 0 and 1: - 0 means no shadow (original image) - 1 means maximum darkening (black) - Values between create partial shadows Default: (0.3, 0.7)
plasma_size | int | 256 | Size of the initial plasma pattern grid, as in PlasmaBrightnessContrast. Larger values create more detailed patterns but are slower to compute. Default: 256
roughness | float | 3.0 | Controls how quickly the noise amplitude increases at each iteration. Must be greater than 0: - Low values (< 1.0): Smoother, more gradual shadows - Medium values (~2.0): Natural-looking shadows - High values (> 3.0): Very rough, noisy shadows Default: 3.0
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import albumentations as A
>>> import numpy as np

# Default parameters for natural shadows
>>> transform = A.PlasmaShadow(p=1.0)

# Subtle, smooth shadows
>>> transform = A.PlasmaShadow(
...     shadow_intensity_range=(0.1, 0.3),
...     roughness=0.7,
...     p=1.0
... )

# Dramatic, dark shadows (roughness < 1.0 keeps the transitions smooth)
>>> transform = A.PlasmaShadow(
...     shadow_intensity_range=(0.5, 0.9),
...     roughness=0.3,
...     p=1.0
... )

Notes

- The transform darkens the image using a plasma pattern
- Works with any number of channels (grayscale, RGB, multispectral)
- Shadow pattern is generated using Diamond-Square algorithm with specific kernels
- The same shadow pattern is applied to all channels
- Final values are clipped to valid range [0, max_value]
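
One plausible way to read "0 means no shadow, 1 means maximum darkening" is a per-pixel multiplicative factor of `1 - intensity * pattern`. The sketch below assumes exactly that formula, which is not confirmed by this page; `apply_plasma_shadow` is a hypothetical helper operating on nested lists rather than arrays.

```python
def apply_plasma_shadow(image, pattern, intensity, max_value=255):
    """Darken an H x W x C nested-list image with an H x W pattern in [0, 1].

    Assumed formula: out = in * (1 - intensity * pattern), with the same
    pattern weight applied to every channel of a pixel.
    """
    shaded = []
    for image_row, pattern_row in zip(image, pattern):
        row = []
        for pixel, weight in zip(image_row, pattern_row):
            factor = 1.0 - intensity * weight
            # Clip to the valid range after scaling.
            row.append([min(max(round(c * factor), 0), max_value)
                        for c in pixel])
        shaded.append(row)
    return shaded


if __name__ == "__main__":
    image = [[[200, 100, 50], [200, 100, 50]]]   # 1 x 2 RGB image
    pattern = [[0.0, 1.0]]                       # no shadow, full shadow
    print(apply_plasma_shadow(image, pattern, intensity=0.5))
```

Under this assumption, pixels where the pattern is 0 are untouched, and pixels where it is 1 are scaled by `1 - intensity`.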

References

  • Fournier, Fussell, and Carpenter, "Computer rendering of stochastic models," Communications of the ACM, 1982. Paper introducing the Diamond-Square algorithm.
  • Diamond-Square algorithm: https://en.wikipedia.org/wiki/Diamond-square_algorithm

Posterize class

Posterize(
    num_bits: int | tuple[int, int] | list[tuple[int, int]] = 4,
    p: float = 0.5
)

Reduce bits per color channel (e.g. 8→4). num_bits controls strength; lower gives stronger posterization. Simulates low-bit-depth or compression. This transform applies color posterization, a technique that reduces the number of distinct colors used in an image. It works by lowering the number of bits used to represent each color channel, effectively creating a "poster-like" effect with fewer color gradations.

Parameters

Name | Type | Default | Description
num_bits | int, tuple[int, int], or list[tuple[int, int]] | 4 | Defines the number of bits to keep for each color channel. Can be specified in several ways: - Single int: Same number of bits for all channels. Range: [1, 7]. - tuple of two ints: (min_bits, max_bits) to randomly choose from. Range for each: [1, 7]. - list of three ints: Specific number of bits for each channel [r_bits, g_bits, b_bits]. - list of three tuples: Ranges for each channel [(r_min, r_max), (g_min, g_max), (b_min, b_max)]. Default: 4
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

# Posterize all channels to 3 bits
>>> transform = A.Posterize(num_bits=3, p=1.0)
>>> posterized_image = transform(image=image)["image"]

# Randomly posterize between 2 and 5 bits
>>> transform = A.Posterize(num_bits=(2, 5), p=1.0)
>>> posterized_image = transform(image=image)["image"]

# Different bits for each channel
>>> transform = A.Posterize(num_bits=[3, 5, 2], p=1.0)
>>> posterized_image = transform(image=image)["image"]

# Range of bits for each channel
>>> transform = A.Posterize(num_bits=[(1, 3), (3, 5), (2, 4)], p=1.0)
>>> posterized_image = transform(image=image)["image"]

Notes

- The effect becomes more pronounced as the number of bits is reduced.
- This transform can create interesting artistic effects or be used for image compression simulation.
- Posterization is particularly useful for:
  * Creating stylized or retro-looking images
  * Reducing the color palette for specific artistic effects
  * Simulating the look of older or lower-quality digital images
  * Data augmentation in scenarios where color depth might vary
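
"Keeping N bits" of an 8-bit value is commonly done by masking off the low-order bits. The following sketch shows that idea on single integers; it assumes uint8 input and is an illustration of the technique, with `posterize_value` being a hypothetical helper rather than a library function.

```python
def posterize_value(value, num_bits):
    """Keep only the `num_bits` most significant bits of an 8-bit value."""
    if not 1 <= num_bits <= 7:
        raise ValueError("num_bits must be in [1, 7]")
    # Example: num_bits=4 gives the mask 0b11110000.
    mask = (0xFF << (8 - num_bits)) & 0xFF
    return value & mask


if __name__ == "__main__":
    print(posterize_value(255, 4))  # 255 -> 240
    print(posterize_value(200, 3))  # 200 -> 192
    # Number of distinct output levels when keeping 3 bits: 2**3
    print(len({posterize_value(v, 3) for v in range(256)}))
```

Keeping `n` bits leaves `2**n` distinct levels per channel, which is why lower values give a stronger effect.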

References

  • Color Quantization: https://en.wikipedia.org/wiki/Color_quantization
  • Posterization: https://en.wikipedia.org/wiki/Posterization

RGBShift class

RGBShift(
    r_shift_limit: tuple[float, float] | float = (-20, 20),
    g_shift_limit: tuple[float, float] | float = (-20, 20),
    b_shift_limit: tuple[float, float] | float = (-20, 20),
    p: float = 0.5
)

Shift R, G, B with separate ranges. Specialized AdditiveNoise with constant uniform shifts. Params: r_shift_limit, g_shift_limit, b_shift_limit. A specialized version of AdditiveNoise that applies constant uniform shifts to RGB channels. Each channel (R,G,B) can have its own shift range specified.

Parameters

Name | Type | Default | Description
r_shift_limit | tuple[float, float] or float | (-20, 20) | Range for shifting the red channel. Options: - If tuple (min, max): Sample shift value from this range - If a single number: Sample shift value from (-r_shift_limit, r_shift_limit) - For uint8 images: Values represent absolute shifts in [0, 255] - For float images: Values represent relative shifts in [0, 1] Default: (-20, 20)
g_shift_limit | tuple[float, float] or float | (-20, 20) | Range for shifting the green channel. Options: - If tuple (min, max): Sample shift value from this range - If a single number: Sample shift value from (-g_shift_limit, g_shift_limit) - For uint8 images: Values represent absolute shifts in [0, 255] - For float images: Values represent relative shifts in [0, 1] Default: (-20, 20)
b_shift_limit | tuple[float, float] or float | (-20, 20) | Range for shifting the blue channel. Options: - If tuple (min, max): Sample shift value from this range - If a single number: Sample shift value from (-b_shift_limit, b_shift_limit) - For uint8 images: Values represent absolute shifts in [0, 255] - For float images: Values represent relative shifts in [0, 1] Default: (-20, 20)
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A

# Shift RGB channels of uint8 image
>>> transform = A.RGBShift(
...     r_shift_limit=30,  # Will sample red shift from [-30, 30]
...     g_shift_limit=(-20, 20),  # Will sample green shift from [-20, 20]
...     b_shift_limit=(-10, 10),  # Will sample blue shift from [-10, 10]
...     p=1.0
... )
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> shifted = transform(image=image)["image"]

# Same effect using AdditiveNoise
>>> transform = A.AdditiveNoise(
...     noise_type="uniform",
...     spatial_mode="constant",  # One value per channel
...     noise_params={
...         "ranges": [(-30/255, 30/255), (-20/255, 20/255), (-10/255, 10/255)]
...     },
...     p=1.0
... )

Notes

- Values are shifted independently for each channel
- For uint8 images:
  * Input ranges like (-20, 20) represent pixel value shifts
  * A shift of 20 means adding 20 to that channel
  * Final values are clipped to [0, 255]
- For float32 images:
  * Input ranges like (-0.1, 0.1) represent relative shifts
  * A shift of 0.1 means adding 0.1 to that channel
  * Final values are clipped to [0, 1]
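
The shift-and-clip behavior in these notes, and the division by 255 used in the AdditiveNoise example above, can be sketched without any dependencies. The helper names (`to_range`, `shift_pixel`, `to_additive_noise_ranges`) are hypothetical and exist only for this illustration.

```python
def to_range(limit):
    """Expand a single number into a symmetric (-limit, limit) range."""
    if isinstance(limit, (int, float)):
        return (-abs(limit), abs(limit))
    return tuple(limit)


def shift_pixel(pixel, shifts, max_value=255):
    """Add one shift per channel and clip to [0, max_value]."""
    return tuple(min(max(c + s, 0), max_value)
                 for c, s in zip(pixel, shifts))


def to_additive_noise_ranges(limits, max_value=255):
    """Rescale uint8 shift ranges to the relative units used by AdditiveNoise."""
    return [(low / max_value, high / max_value)
            for low, high in map(to_range, limits)]


if __name__ == "__main__":
    print(shift_pixel((250, 10, 100), (20, -20, 0)))
    print(to_additive_noise_ranges([30, (-20, 20), (-10, 10)]))
```

The first call shows clipping at both ends of the uint8 range; the second reproduces the `ranges` list passed to `AdditiveNoise` in the example.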

RandomBrightnessContrast class

RandomBrightnessContrast(
    brightness_limit: tuple[float, float] | float = (-0.2, 0.2),
    contrast_limit: tuple[float, float] | float = (-0.2, 0.2),
    brightness_by_max: bool = True,
    ensure_safe_range: bool = False,
    p: float = 0.5
)

Randomly adjust brightness and contrast with separate ranges. Simple and fast; good baseline color augmentation for classification and detection. This transform adjusts the brightness and contrast of an image simultaneously, allowing for a wide range of lighting and contrast variations. It's particularly useful for data augmentation in computer vision tasks, helping models become more robust to different lighting conditions.

Parameters

Name | Type | Default | Description
brightness_limit | tuple[float, float] or float | (-0.2, 0.2) | Factor range for changing brightness. If a single float value is provided, the range will be (-brightness_limit, brightness_limit). Values should typically be in the range [-1.0, 1.0], where 0 means no change, 1.0 means maximum brightness, and -1.0 means minimum brightness. Default: (-0.2, 0.2).
contrast_limit | tuple[float, float] or float | (-0.2, 0.2) | Factor range for changing contrast. If a single float value is provided, the range will be (-contrast_limit, contrast_limit). Values should typically be in the range [-1.0, 1.0], where 0 means no change, 1.0 means maximum increase in contrast, and -1.0 means maximum decrease in contrast. Default: (-0.2, 0.2).
brightness_by_max | bool | True | If True, adjusts brightness by scaling pixel values up to the maximum value of the image's dtype. If False, uses the mean pixel value for adjustment. Default: True.
ensure_safe_range | bool | False | If True, adjusts alpha and beta to prevent overflow/underflow. This ensures output values stay within the valid range for the image dtype without clipping. Default: False.
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

# Default usage
>>> transform = A.RandomBrightnessContrast(p=1.0)
>>> augmented_image = transform(image=image)["image"]

# Custom brightness and contrast limits
>>> transform = A.RandomBrightnessContrast(
...     brightness_limit=0.3,
...     contrast_limit=0.3,
...     p=1.0
... )
>>> augmented_image = transform(image=image)["image"]

# Adjust brightness based on mean value
>>> transform = A.RandomBrightnessContrast(
...     brightness_limit=0.2,
...     contrast_limit=0.2,
...     brightness_by_max=False,
...     p=1.0
... )
>>> augmented_image = transform(image=image)["image"]

Notes

- The order of operation is: contrast adjustment, then brightness adjustment.
- For uint8 images, the output is clipped to [0, 255] range.
- For float32 images, the output is clipped to [0, 1] range.
- The `brightness_by_max` parameter affects how brightness is adjusted:
  * If True, brightness adjustment is more pronounced and can lead to more saturated results.
  * If False, brightness adjustment is more subtle and preserves the overall lighting better.
- This transform is useful for:
  * Simulating different lighting conditions
  * Enhancing low-light or overexposed images
  * Data augmentation to improve model robustness
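
The interaction between the two factors and `brightness_by_max` can be sketched as a linear map. The formula here is an assumption for illustration: a gain `alpha = 1 + contrast` and an offset `beta` equal to the brightness factor times either the dtype maximum or the image mean. The helper name `brightness_contrast` is hypothetical, and the library's exact arithmetic may differ.

```python
def brightness_contrast(pixels, brightness, contrast,
                        brightness_by_max=True, max_value=255):
    """Apply out = alpha * in + beta to a flat list of pixel values."""
    alpha = 1.0 + contrast  # contrast gain (assumed form)
    # The offset is relative to the dtype maximum or to the image mean.
    reference = max_value if brightness_by_max else sum(pixels) / len(pixels)
    beta = brightness * reference
    # Clip to the valid range after the linear map.
    return [min(max(round(alpha * p + beta), 0), max_value) for p in pixels]


if __name__ == "__main__":
    pixels = [0, 100, 200]
    print(brightness_contrast(pixels, 0.2, 0.2, brightness_by_max=True))
    print(brightness_contrast(pixels, 0.2, 0.2, brightness_by_max=False))
```

For this dark-ish sample the mean (100) is well below 255, so the mean-based offset is smaller, consistent with the note that `brightness_by_max=False` gives a subtler adjustment.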

References

  • Brightness: https://en.wikipedia.org/wiki/Brightness
  • Contrast: https://en.wikipedia.org/wiki/Contrast_(vision)

RandomFog class

RandomFog(
    alpha_coef: float = 0.08,
    fog_coef_range: tuple[float, float] = (0.3, 1),
    p: float = 0.5
)

Simulate fog by overlaying semi-transparent circles and blending with a fog color. Good for driving or outdoor robustness to weather. Fog is built from random circles with controllable intensity; an image-size-dependent Gaussian blur is applied to the result. Patch-based (no depth); for distance-dependent fog use AtmosphericFog.

Parameters

Name | Type | Default | Description
alpha_coef | float | 0.08 | Transparency of the fog circles in [0, 1]. Default: 0.08.
fog_coef_range | tuple[float, float] | (0.3, 1) | Range for fog intensity coefficient in [0, 1]. Default: (0.3, 1).
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

# Default usage
>>> transform = A.RandomFog(p=1.0)
>>> foggy_image = transform(image=image)["image"]

# Custom fog intensity range
>>> transform = A.RandomFog(fog_coef_range=(0.3, 0.8), p=1.0)
>>> foggy_image = transform(image=image)["image"]

# Adjust fog transparency
>>> transform = A.RandomFog(fog_coef_range=(0.2, 0.5), alpha_coef=0.1, p=1.0)
>>> foggy_image = transform(image=image)["image"]

Notes

- Fog is created by overlaying semi-transparent circles at random positions and with random radius; alpha is controlled by alpha_coef.
- Higher fog_coef values give denser fog; effect is typically stronger toward center and gradually decreases toward the edges.
- A Gaussian blur (dependent on the shorter image dimension) is applied after blending to reduce sharpness.

References

  • Fog: https://en.wikipedia.org/wiki/Fog
  • Atmospheric perspective: https://en.wikipedia.org/wiki/Aerial_perspective

RandomGamma class

RandomGamma(
    gamma_limit: tuple[float, float] | float = (80, 120),
    p: float = 0.5
)

Apply random gamma correction (power-law on intensity). gamma_limit controls range. Common for exposure and display variation. Gamma correction, or simply gamma, is a nonlinear operation used to encode and decode luminance or tristimulus values in imaging systems. This transform can adjust the brightness of an image while preserving the relative differences between darker and lighter areas, making it useful for simulating different lighting conditions or correcting for display characteristics.

Parameters

Name | Type | Default | Description
gamma_limit | tuple[float, float] or float | (80, 120) | If gamma_limit is a single float value, the range will be (1, gamma_limit). If it's a tuple of two floats, they will serve as the lower and upper bounds for gamma adjustment. Values are in terms of percentage change, e.g., (80, 120) means the gamma will be between 80% and 120% of the original. Default: (80, 120).
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

# Default usage
>>> transform = A.RandomGamma(p=1.0)
>>> augmented_image = transform(image=image)["image"]

# Custom gamma range
>>> transform = A.RandomGamma(gamma_limit=(50, 150), p=1.0)
>>> augmented_image = transform(image=image)["image"]

# Applying with other transforms
>>> transform = A.Compose([
...     A.RandomGamma(gamma_limit=(80, 120), p=0.5),
...     A.RandomBrightnessContrast(p=0.5),
... ])
>>> augmented_image = transform(image=image)["image"]

Notes

- The gamma correction is applied using the formula: output = input^gamma
- Gamma values > 1 will make the image darker, while values < 1 will make it brighter
- This transform is particularly useful for:
  * Simulating different lighting conditions
  * Correcting for non-linear display characteristics
  * Enhancing contrast in certain regions of the image
  * Data augmentation in computer vision tasks
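
The formula `output = input^gamma` only behaves as described when pixel values are first normalized to [0, 1]. The sketch below makes two assumptions that this page implies but does not spell out: the sampled percentage is divided by 100 to get the exponent, and uint8 values are normalized by 255 before the power is applied. `apply_gamma` is a hypothetical helper.

```python
def apply_gamma(pixels, gamma_percent, max_value=255):
    """Apply a power-law curve to a flat list of pixel values."""
    gamma = gamma_percent / 100.0  # e.g. 120 -> exponent 1.2 (assumed mapping)
    # Normalize to [0, 1], apply the power law, then scale back.
    return [round(((p / max_value) ** gamma) * max_value) for p in pixels]


if __name__ == "__main__":
    pixels = [0, 64, 128, 255]
    print(apply_gamma(pixels, 200))  # exponent 2.0: mid-tones get darker
    print(apply_gamma(pixels, 50))   # exponent 0.5: mid-tones get brighter
```

Black and white stay fixed under any exponent, so the transform mainly reshapes the mid-tones.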

References

  • Gamma correction: https://en.wikipedia.org/wiki/Gamma_correction
  • Power law (Gamma) encoding: https://www.cambridgeincolour.com/tutorials/gamma-correction.htm

RandomGravel class

RandomGravel(
    gravel_roi: tuple[float, float, float, float] = (0.1, 0.4, 0.9, 0.9),
    number_of_patches: int = 2,
    p: float = 0.5
)

Add gravel-like particle artifacts on the image. Number and size of particles and ROI are configurable. Simulates dirt or debris on a lens or surface. This transform simulates the appearance of gravel or small stones scattered across specific regions of an image. It's particularly useful for augmenting datasets of road or terrain images, adding realistic texture variations.

Parameters

Name | Type | Default | Description
gravel_roi | tuple[float, float, float, float] | (0.1, 0.4, 0.9, 0.9) | Region of interest where gravel will be added, specified as (x_min, y_min, x_max, y_max) in relative coordinates [0, 1]. Default: (0.1, 0.4, 0.9, 0.9).
number_of_patches | int | 2 | Number of gravel patch regions to generate within the ROI. Each patch will contain multiple gravel particles. Default: 2.
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

# Default usage
>>> transform = A.RandomGravel(p=1.0)
>>> augmented_image = transform(image=image)["image"]

# Custom ROI and number of patches
>>> transform = A.RandomGravel(
...     gravel_roi=(0.2, 0.2, 0.8, 0.8),
...     number_of_patches=5,
...     p=1.0
... )
>>> augmented_image = transform(image=image)["image"]

# Combining with other transforms
>>> transform = A.Compose([
...     A.RandomGravel(p=0.7),
...     A.RandomBrightnessContrast(p=0.5),
... ])
>>> augmented_image = transform(image=image)["image"]

Notes

- The gravel effect is created by modifying the saturation channel in the HLS color space.
- Gravel particles are distributed within randomly generated patches inside the specified ROI.
- This transform is particularly useful for:
  * Augmenting datasets for road condition analysis
  * Simulating variations in terrain for computer vision tasks
  * Adding realistic texture to synthetic images of outdoor scenes

References

  • Road surface textures: https://en.wikipedia.org/wiki/Road_surface
  • HLS color space: https://en.wikipedia.org/wiki/HSL_and_HSV

RandomRain class

RandomRain(
    slant_range: tuple[float, float] = (-10, 10),
    drop_length: int | None,
    drop_width: int = 1,
    drop_color: tuple[int, int, int] = (200, 200, 200),
    blur_value: int = 7,
    brightness_coefficient: float = 0.7,
    rain_type: 'drizzle' | 'heavy' | 'torrential' | 'default' = default,
    p: float = 0.5
)

Add rain streaks (semi-transparent lines), optional blur and brightness reduction. Good for outdoor or driving robustness to rainy conditions. Streaks are drawn with configurable slant, length, and width; blur and darkening simulate wet, low-contrast views. Density and style are configurable (e.g. drizzle, heavy, torrential).

Parameters

Name | Type | Default | Description
slant_range | tuple[float, float] | (-10, 10) | Range for the rain slant angle in degrees. Negative values slant to the left, positive to the right. Default: (-10, 10).
drop_length | int or None | - | Length of the rain drops in pixels. If None, drop length will be automatically calculated as height // 8. This allows the rain effect to scale with the image size. Default: None
drop_width | int | 1 | Width of the rain drops in pixels. Default: 1.
drop_color | tuple[int, int, int] | (200, 200, 200) | Color of the rain drops in RGB format. Default: (200, 200, 200).
blur_value | int | 7 | Blur value for simulating rain effect. Rainy views are typically blurry. Default: 7.
brightness_coefficient | float | 0.7 | Coefficient to adjust the brightness of the image. Rainy scenes are usually darker. Should be in the range (0, 1]. Default: 0.7.
rain_type | 'drizzle', 'heavy', 'torrential', or 'default' | default | Type of rain to simulate.
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
>>> # Default usage
>>> transform = A.RandomRain(p=1.0)
>>> rainy_image = transform(image=image)["image"]
>>>
>>> # Custom rain parameters
>>> transform = A.RandomRain(
...     slant_range=(-15, 15),
...     drop_length=30,
...     drop_width=2,
...     drop_color=(180, 180, 180),
...     blur_value=5,
...     brightness_coefficient=0.8,
...     p=1.0
... )
>>> rainy_image = transform(image=image)["image"]
>>>
>>> # Heavy rain
>>> transform = A.RandomRain(rain_type="heavy", p=1.0)
>>> heavy_rain_image = transform(image=image)["image"]

Notes

- Rain is drawn as semi-transparent lines; slant simulates wind.
- rain_type (drizzle, heavy, torrential, default) controls drop count and style.
- Blur and brightness reduction mimic wet, darker scenes.

References

  • Rain visualization techniques: https://developer.nvidia.com/gpugems/gpugems3/part-iv-image-effects/chapter-27-real-time-rain-rendering
  • Weather effects in computer vision: https://www.sciencedirect.com/science/article/pii/S1077314220300692

RandomShadow class

RandomShadow(
    shadow_roi: tuple[float, float, float, float] = (0, 0.5, 1, 1),
    num_shadows_limit: tuple[int, int] = (1, 2),
    shadow_dimension: int = 5,
    shadow_intensity_range: tuple[float, float] = (0.5, 0.5),
    p: float = 0.5
)

Simulate cast shadows by darkening random regions. shadow_roi, num_shadows_limit, and shadow_dimension control placement, count, and polygon shape. Improves lighting robustness. This transform adds realistic shadow effects to images, which can be useful for augmenting datasets for outdoor scene analysis, autonomous driving, or any computer vision task where shadows may be present.

Parameters

Name | Type | Default | Description
shadow_roi | tuple[float, float, float, float] | (0, 0.5, 1, 1) | Region of the image where shadows will appear (x_min, y_min, x_max, y_max). All values should be in range [0, 1]. Default: (0, 0.5, 1, 1).
num_shadows_limit | tuple[int, int] | (1, 2) | Lower and upper limits for the possible number of shadows. Default: (1, 2).
shadow_dimension | int | 5 | Number of edges in the shadow polygons. Default: 5.
shadow_intensity_range | tuple[float, float] | (0.5, 0.5) | Range for the shadow intensity. Larger value means darker shadow. Should be two float values between 0 and 1. Default: (0.5, 0.5).
p | float | 0.5 | Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

# Default usage
>>> transform = A.RandomShadow(p=1.0)
>>> shadowed_image = transform(image=image)["image"]

# Custom shadow parameters
>>> transform = A.RandomShadow(
...     shadow_roi=(0.2, 0.2, 0.8, 0.8),
...     num_shadows_limit=(2, 4),
...     shadow_dimension=8,
...     shadow_intensity_range=(0.3, 0.7),
...     p=1.0
... )
>>> shadowed_image = transform(image=image)["image"]

# Combining with other transforms
>>> transform = A.Compose([
...     A.RandomShadow(p=0.5),
...     A.RandomBrightnessContrast(p=0.5),
... ])
>>> augmented_image = transform(image=image)["image"]

Notes

- Shadows are created by generating random polygons within the specified ROI and reducing the brightness of the image in these areas.
- The number of shadows, their shapes, and intensities can be randomized for variety.
- This transform is particularly useful for:
  * Augmenting datasets for outdoor scene understanding
  * Improving robustness of object detection models to shadowed conditions
  * Simulating different lighting conditions in synthetic datasets
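
To make the relative ROI concrete, the sketch below converts `shadow_roi` to pixel coordinates and draws random polygon vertices inside it. It is only an illustration under stated assumptions: the helpers `roi_to_pixels` and `random_shadow_polygons` are hypothetical, truncation with `int()` is an assumed rounding rule, and the real transform may choose vertices differently.

```python
import random


def roi_to_pixels(roi, height, width):
    """Convert a relative (x_min, y_min, x_max, y_max) ROI to pixel units."""
    x_min, y_min, x_max, y_max = roi
    return (int(x_min * width), int(y_min * height),
            int(x_max * width), int(y_max * height))


def random_shadow_polygons(roi, height, width, num_shadows_limit,
                           shadow_dimension, rng):
    """Sample a list of polygons, each a list of (x, y) vertices in the ROI."""
    x0, y0, x1, y1 = roi_to_pixels(roi, height, width)
    # Both ends of num_shadows_limit are treated as inclusive here.
    num_shadows = rng.randint(*num_shadows_limit)
    return [
        [(rng.randint(x0, x1), rng.randint(y0, y1))
         for _ in range(shadow_dimension)]
        for _ in range(num_shadows)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    print(roi_to_pixels((0, 0.5, 1, 1), height=100, width=200))
    print(random_shadow_polygons((0, 0.5, 1, 1), 100, 200, (1, 2), 5, rng))
```

With the default ROI `(0, 0.5, 1, 1)` every vertex falls in the lower half of the image, which is where shadows on a road or ground plane usually appear.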

References

  • Shadow detection and removal: https://www.sciencedirect.com/science/article/pii/S1047320315002035
  • Shadows in computer vision: https://en.wikipedia.org/wiki/Shadow_detection

RandomSnowclass

RandomSnow(
    brightness_coeff: float = 2.5,
    snow_point_range: tuple[float, float] = (0.1, 0.3),
    method: 'bleach' | 'texture' = bleach,
    p: float = 0.5
)

Add snow overlay via bleach (brightness threshold) or texture (noise-based overlay). Good for winter or snowy-scene robustness in outdoor imagery. Two methods: "bleach" brightens pixels above a threshold (faster, simpler); "texture" adds a depth-weighted snow layer with sparkle (more realistic, heavier).

Parameters

NameTypeDefaultDescription
brightness_coefffloat2.5Brightness multiplier for snow; must be > 0. Default: 2.5.
snow_point_rangetuple[float, float](0.1, 0.3)Range for snow intensity threshold in (0, 1). Default: (0.1, 0.3).
method
One of:
  • 'bleach'
  • 'texture'
bleach"bleach" = threshold + brighten; "texture" = noise-based overlay with depth and sparkle. Default: "bleach".
pfloat0.5Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

# Default usage (bleach method)
>>> transform = A.RandomSnow(p=1.0)
>>> snowy_image = transform(image=image)["image"]

# Using texture method with custom parameters
>>> transform = A.RandomSnow(
...     snow_point_range=(0.2, 0.4),
...     brightness_coeff=2.0,
...     method="texture",
...     p=1.0
... )
>>> snowy_image = transform(image=image)["image"]

Notes

- "bleach": brightness threshold in HLS; pixels above snow_point are scaled by brightness_coeff. Fast, less realistic.
- "texture": HSV brightness boost, Gaussian noise texture, depth gradient (stronger at top), alpha blend, blue tint, sparkle. More realistic, heavier.

References

  • Bleach method: https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
  • Texture method: Inspired by computer graphics techniques for snow rendering and atmospheric scattering simulations.

RandomSunFlareclass

RandomSunFlare(
    flare_roi: tuple[float, float, float, float] = (0, 0, 1, 0.5),
    src_radius: int = 400,
    src_color: tuple[int, ...] = (255, 255, 255),
    angle_range: tuple[float, float] = (0, 1),
    num_flare_circles_range: tuple[int, int] = (6, 10),
    method: 'overlay' | 'physics_based' = overlay,
    p: float = 0.5
)

Simulate lens flare: circles of light and rays. src_radius, num_flare_circles_range, and angle_range control the effect. Good for outdoor robustness. This transform creates a sun flare effect by overlaying multiple semi-transparent circles of varying sizes and intensities along a line originating from a "sun" point. It offers two methods: a simple overlay technique and a more complex physics-based approach.

Parameters

NameTypeDefaultDescription
flare_roituple[float, float, float, float](0, 0, 1, 0.5)Region of interest where the sun flare can appear. Values are in the range [0, 1] and represent (x_min, y_min, x_max, y_max) in relative coordinates. Default: (0, 0, 1, 0.5).
src_radiusint400Radius of the sun circle in pixels. Default: 400.
src_colortuple[int, ...](255, 255, 255)Color of the sun in RGB format. Default: (255, 255, 255).
angle_rangetuple[float, float](0, 1)Range for the flare direction, expressed as a fraction of a full turn. Values should be in the range [0, 1], where 0 represents 0 radians and 1 represents 2π radians. Default: (0, 1).
num_flare_circles_rangetuple[int, int](6, 10)Range for the number of flare circles to generate. Default: (6, 10).
method
One of:
  • 'overlay'
  • 'physics_based'
overlayMethod to use for generating the sun flare. "overlay" uses a simple alpha blending technique, while "physics_based" simulates more realistic optical phenomena. Default: "overlay".
pfloat0.5Probability of applying the transform. Default: 0.5.
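The mapping from angle_range values to radians, and the idea of placing circles along the flare line, can be sketched as follows. The helpers `flare_angle` and `circle_centers` are hypothetical, and the equal spacing of circles is an assumption made for illustration; the library may position circles differently.

```python
import math


def flare_angle(fraction):
    """Map a value in [0, 1] to radians, where 1 is a full turn (2 * pi)."""
    return 2 * math.pi * fraction


def circle_centers(sun_xy, angle, num_circles, spacing):
    """Place circle centers at equal steps along the flare line (assumed layout)."""
    sx, sy = sun_xy
    return [
        (sx + k * spacing * math.cos(angle), sy + k * spacing * math.sin(angle))
        for k in range(1, num_circles + 1)
    ]


angle = flare_angle(0.25)    # a quarter turn, i.e. pi / 2 radians
centers = circle_centers((500, 100), angle, num_circles=3, spacing=50)
```

Here `angle_range=(0.25, 0.75)` would correspond to directions between π/2 and 3π/2 radians.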

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [1000, 1000, 3], dtype=np.uint8)

# Default sun flare (overlay method)
>>> transform = A.RandomSunFlare(p=1.0)
>>> flared_image = transform(image=image)["image"]

# Physics-based sun flare with custom parameters
>>> transform = A.RandomSunFlare(
...     flare_roi=(0.1, 0, 0.9, 0.3),
...     angle_range=(0.25, 0.75),
...     num_flare_circles_range=(5, 15),
...     src_radius=200,
...     src_color=(255, 200, 100),
...     method="physics_based",
...     p=1.0
... )
>>> flared_image = transform(image=image)["image"]

References

  • Lens flare: https://en.wikipedia.org/wiki/Lens_flare
  • Alpha compositing: https://en.wikipedia.org/wiki/Alpha_compositing
  • Diffraction: https://en.wikipedia.org/wiki/Diffraction
  • Chromatic aberration: https://en.wikipedia.org/wiki/Chromatic_aberration
  • Screen blending: https://en.wikipedia.org/wiki/Blend_modes#Screen

RandomToneCurveclass

RandomToneCurve(
    scale: float = 0.1,
    per_channel: bool = False,
    p: float = 0.5
)

Randomly warp the tone curve to change contrast and tonal distribution. scale controls strength; per_channel selects a shared or per-channel curve. Good for exposure variation. This transform applies a random S-curve to the image's tone curve, adjusting the brightness and contrast in a non-linear manner. It can be applied to the entire image or to each channel separately.

Parameters

NameTypeDefaultDescription
scalefloat0.1Standard deviation of the normal distribution used to sample random distances to move two control points that modify the image's curve. Values should be in range [0, 1]. Higher values will result in more dramatic changes to the image. Default: 0.1
per_channelboolFalseIf True, the tone curve will be applied to each channel of the input image separately, which can lead to color distortion. If False, the same curve is applied to all channels, preserving the original color relationships. Default: False
pfloat0.5Probability of applying the transform. Default: 0.5

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

# Apply a random tone curve to all channels together
>>> transform = A.RandomToneCurve(scale=0.1, per_channel=False, p=1.0)
>>> augmented_image = transform(image=image)['image']

# Apply random tone curves to each channel separately
>>> transform = A.RandomToneCurve(scale=0.2, per_channel=True, p=1.0)
>>> augmented_image = transform(image=image)['image']

Notes

- This transform modifies the image's histogram by applying a smooth, S-shaped curve to it.
- The S-curve is defined by moving two control points of a quadratic Bézier curve.
- When per_channel is False, the same curve is applied to all channels, maintaining color balance.
- When per_channel is True, different curves are applied to each channel, which can create color shifts.
- This transform can be used to adjust image contrast and brightness in a more natural way than linear transforms.
- The effect can range from subtle contrast adjustments to more dramatic "vintage" or "faded" looks.
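A tone curve shaped by two movable control values can be sketched in plain Python. Several things here are assumptions rather than the library's code: the cubic Bézier form with fixed endpoints 0 and 1 (chosen because it has exactly two interior control values), the centers 0.25 and 0.75 for the jitter, and the helper names `tone_curve` and `sample_control_points`.

```python
import random


def tone_curve(t, low_y, high_y):
    """Evaluate a Bezier-style curve with fixed endpoints 0 and 1.

    low_y and high_y are the two movable control values (assumed form).
    """
    return (3 * (1 - t) ** 2 * t * low_y
            + 3 * (1 - t) * t ** 2 * high_y
            + t ** 3)


def sample_control_points(scale, rng):
    """Jitter the control values around 0.25 and 0.75 (assumed), clipped to [0, 1]."""
    low_y = min(max(rng.gauss(0.25, scale), 0.0), 1.0)
    high_y = min(max(rng.gauss(0.75, scale), 0.0), 1.0)
    return low_y, high_y


rng = random.Random(0)
low_y, high_y = sample_control_points(scale=0.1, rng=rng)
# Lookup table mapping each uint8 level to its remapped level.
lut = [round(255 * tone_curve(v / 255, low_y, high_y)) for v in range(256)]
```

Applying the same `lut` to every channel corresponds to per_channel=False; sampling separate control values per channel corresponds to per_channel=True.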

References

  • "What Else Can Fool Deep Learning? Addressing Color Constancy Errors on Deep Neural Network Performance": https://arxiv.org/abs/1912.06960
  • Bézier curve: https://en.wikipedia.org/wiki/B%C3%A9zier_curve#Quadratic_B%C3%A9zier_curves
  • Tone mapping: https://en.wikipedia.org/wiki/Tone_mapping

RingingOvershootclass

RingingOvershoot(
    blur_limit: tuple[int, int] | int = (7, 15),
    cutoff: tuple[float, float] = (0.7853981633974483, 1.5707963267948966),
    p: float = 0.5
)

Create ringing or overshoot artifacts via 2D sinc convolution. blur_limit and cutoff control strength. Simulates sharpening or compression artifacts. This transform simulates the ringing artifacts that can occur in digital image processing, particularly after sharpening or edge enhancement operations. It creates oscillations or overshoots near sharp transitions in the image.

Parameters

NameTypeDefaultDescription
blur_limit
One of:
  • tuple[int, int]
  • int
(7, 15)Maximum kernel size for the sinc filter. Must be an odd number in the range [3, inf). If a single int is provided, the kernel size will be randomly chosen from the range (3, blur_limit). If a tuple (min, max) is provided, the kernel size will be randomly chosen from the range (min, max). Default: (7, 15).
cutofftuple[float, float](0.7853981633974483, 1.5707963267948966)Range to choose the cutoff frequency in radians. Values should be in the range (0, π). A lower cutoff frequency will result in more pronounced ringing effects. Default: (π/4, π/2).
pfloat0.5Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)

# Apply ringing effect with default parameters
>>> transform = A.RingingOvershoot(p=1.0)
>>> ringing_image = transform(image=image)['image']

# Apply ringing effect with custom parameters
>>> transform = A.RingingOvershoot(
...     blur_limit=(9, 17),
...     cutoff=(np.pi/6, np.pi/3),
...     p=1.0
... )
>>> ringing_image = transform(image=image)['image']

Notes

- Ringing artifacts are oscillations of the image intensity function in the neighborhood of sharp transitions, such as edges or object boundaries.
- This transform uses a 2D sinc filter (also known as a 2D cardinal sine function) to introduce these artifacts.
- The severity of the ringing effect is controlled by both the kernel size (blur_limit) and the cutoff frequency.
- Larger kernel sizes and lower cutoff frequencies will generally produce more noticeable ringing effects.
- This transform can be useful for:
  * Simulating imperfections in image processing or transmission systems
  * Testing the robustness of computer vision models to ringing artifacts
  * Creating artistic effects that emphasize edges and transitions in images
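To see why a sinc filter causes ringing, the sketch below builds a small low-pass kernel in plain Python. It is a simplification: a separable kernel (outer product of 1D sinc taps) is used here, whereas the library's kernel may be a circular variant. The function names `sinc_lowpass_1d` and `sinc_kernel_2d` are hypothetical.

```python
import math


def sinc_lowpass_1d(ksize, cutoff):
    """Ideal low-pass taps sin(cutoff * n) / (pi * n), centered at n = 0."""
    half = ksize // 2
    taps = []
    for n in range(-half, half + 1):
        taps.append(cutoff / math.pi if n == 0 else math.sin(cutoff * n) / (math.pi * n))
    return taps


def sinc_kernel_2d(ksize, cutoff):
    """Separable 2D kernel (outer product of the 1D taps), normalized to sum to 1."""
    taps = sinc_lowpass_1d(ksize, cutoff)
    kernel = [[a * b for b in taps] for a in taps]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


kernel = sinc_kernel_2d(ksize=15, cutoff=math.pi / 4)
# Negative side lobes are what produce overshoot next to sharp edges.
has_negative_lobes = any(v < 0 for row in kernel for v in row)
```

A larger `ksize` keeps more side lobes of the sinc, and a lower `cutoff` widens them, which is consistent with the notes on stronger ringing.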

References

  • Ringing artifacts: https://en.wikipedia.org/wiki/Ringing_artifacts
  • Sinc filter: https://en.wikipedia.org/wiki/Sinc_filter
  • Digital Image Processing: Rafael C. Gonzalez and Richard E. Woods, 4th Edition

SaltAndPepperclass

SaltAndPepper(
    amount: tuple[float, float] = (0.01, 0.06),
    salt_vs_pepper: tuple[float, float] = (0.4, 0.6),
    p: float = 0.5
)

Apply salt-and-pepper (impulse) noise: randomly set pixels to min or max. amount and salt_vs_pepper control density and ratio. Same mask for all channels. Salt and pepper noise is a form of impulse noise that randomly sets pixels to either maximum value (salt) or minimum value (pepper). The amount and proportion of salt vs pepper can be controlled. The same noise mask is applied to all channels of the image to preserve color consistency.

Parameters

NameTypeDefaultDescription
amounttuple[float, float](0.01, 0.06)Range for total amount of noise (both salt and pepper). Values between 0 and 1. For example: - 0.05 means 5% of all pixels will be replaced with noise - (0.01, 0.06) will sample amount uniformly from 1% to 6% Default: (0.01, 0.06)
salt_vs_peppertuple[float, float](0.4, 0.6)Range for ratio of salt (white) vs pepper (black) noise. Values between 0 and 1. For example: - 0.5 means equal amounts of salt and pepper - 0.7 means 70% of noisy pixels will be salt, 30% pepper - (0.4, 0.6) will sample ratio uniformly from 40% to 60% Default: (0.4, 0.6)
pfloat0.5Probability of applying the transform. Default: 0.5.

Examples

>>> import albumentations as A
>>> import numpy as np
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

# Apply salt and pepper noise with default parameters
>>> transform = A.SaltAndPepper(p=1.0)
>>> noisy_image = transform(image=image)["image"]

# Heavy noise with more salt than pepper
>>> transform = A.SaltAndPepper(
...     amount=(0.1, 0.2),       # 10-20% of pixels will be noisy
...     salt_vs_pepper=(0.7, 0.9),  # 70-90% of noise will be salt
...     p=1.0
... )
>>> noisy_image = transform(image=image)["image"]

Notes

- Salt noise sets pixels to maximum value (255 for uint8, 1.0 for float32)
- Pepper noise sets pixels to 0
- The noise mask is generated once and applied to all channels to maintain color consistency (i.e., if a pixel is set to salt, all its color channels will be set to maximum value)
- The exact number of affected pixels matches the specified amount as masks are generated without overlap
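The non-overlapping mask behavior described above can be sketched in plain Python. This is an illustration only: `salt_and_pepper_masks` and `apply_noise` are hypothetical helpers, and sampling pixel positions without replacement is one assumed way to guarantee disjoint masks.

```python
import random


def salt_and_pepper_masks(height, width, amount, salt_vs_pepper, rng):
    """Return disjoint sets of (row, col) positions for salt and pepper pixels."""
    num_pixels = height * width
    num_noisy = round(num_pixels * amount)
    num_salt = round(num_noisy * salt_vs_pepper)
    # Sampling without replacement guarantees the two masks never overlap.
    chosen = rng.sample(range(num_pixels), num_noisy)
    salt = {divmod(i, width) for i in chosen[:num_salt]}
    pepper = {divmod(i, width) for i in chosen[num_salt:]}
    return salt, pepper


def apply_noise(image, salt, pepper, max_value=255):
    """Set every channel of a salt pixel to max_value and of a pepper pixel to 0."""
    out = [[list(px) for px in row] for row in image]
    for r, c in salt:
        out[r][c] = [max_value] * len(out[r][c])
    for r, c in pepper:
        out[r][c] = [0] * len(out[r][c])
    return out


rng = random.Random(42)
image = [[(120, 130, 140)] * 20 for _ in range(20)]    # 20x20 RGB image
salt, pepper = salt_and_pepper_masks(20, 20, amount=0.05, salt_vs_pepper=0.7, rng=rng)
noisy = apply_noise(image, salt, pepper)
```

For a 20x20 image with amount=0.05 and salt_vs_pepper=0.7, exactly 20 pixels are affected: 14 salt and 6 pepper.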

References

  • Digital Image Processing: Rafael C. Gonzalez and Richard E. Woods, 4th Edition, Chapter 5: Image Restoration and Reconstruction.
  • Fundamentals of Digital Image Processing: A. K. Jain, Chapter 7: Image Degradation and Restoration.
  • Salt and pepper noise: https://en.wikipedia.org/wiki/Salt-and-pepper_noise

Sharpenclass

Sharpen(
    alpha: tuple[float, float] = (0.2, 0.5),
    lightness: tuple[float, float] = (0.5, 1.0),
    method: 'kernel' | 'gaussian' = kernel,
    kernel_size: int = 5,
    sigma: float = 1.0,
    p: float = 0.5
)

Sharpen the image via kernel or Gaussian unsharp method. alpha and lightness control strength. Enhances edges; useful for document or detail-sensitive tasks. Implements two different approaches to image sharpening:
1. Traditional kernel-based method using Laplacian operator
2. Gaussian interpolation method (similar to Kornia's approach)

Parameters

NameTypeDefaultDescription
alphatuple[float, float](0.2, 0.5)Range for the visibility of sharpening effect. At 0, only the original image is visible, at 1.0 only its processed version is visible. Values should be in the range [0, 1]. Used in both methods. Default: (0.2, 0.5).
lightnesstuple[float, float](0.5, 1.0)Range for the lightness of the sharpened image. Only used in 'kernel' method. Larger values create higher contrast. Values should be greater than 0. Default: (0.5, 1.0).
method
One of:
  • 'kernel'
  • 'gaussian'
kernelSharpening algorithm to use: - 'kernel': Traditional kernel-based sharpening using Laplacian operator - 'gaussian': Interpolation between Gaussian blurred and original image Default: 'kernel'
kernel_sizeint5Size of the Gaussian blur kernel for 'gaussian' method. Must be odd. Default: 5
sigmafloat1.0Standard deviation for Gaussian kernel in 'gaussian' method. Default: 1.0
pfloat0.5Probability of applying the transform. Default: 0.5.

Examples

>>> import albumentations as A
>>> import numpy as np

# Traditional kernel sharpening
>>> transform = A.Sharpen(
...     alpha=(0.2, 0.5),
...     lightness=(0.5, 1.0),
...     method='kernel',
...     p=1.0
... )

# Gaussian interpolation sharpening
>>> transform = A.Sharpen(
...     alpha=(0.5, 1.0),
...     method='gaussian',
...     kernel_size=5,
...     sigma=1.0,
...     p=1.0
... )

Notes

- Kernel sizes must be odd to maintain spatial alignment
- Methods produce different visual results:
  * Kernel method: More pronounced edges, possible artifacts
  * Gaussian method: More natural look, limited to original sharpness
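How alpha and lightness might combine in the 'kernel' method can be sketched as a blend between an identity kernel and a Laplacian-style kernel. The exact 3x3 matrices below are an assumption for illustration, and `sharpening_kernel` is a hypothetical helper, not the library's function.

```python
def sharpening_kernel(alpha, lightness):
    """Blend an identity kernel with a Laplacian-style sharpening kernel."""
    identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    effect = [[-1, -1, -1], [-1, 8 + lightness, -1], [-1, -1, -1]]
    return [
        [(1 - alpha) * i + alpha * e for i, e in zip(id_row, eff_row)]
        for id_row, eff_row in zip(identity, effect)
    ]


no_effect = sharpening_kernel(alpha=0.0, lightness=1.0)    # identity: image unchanged
kernel = sharpening_kernel(alpha=0.5, lightness=1.0)
kernel_sum = sum(sum(row) for row in kernel)
```

Under this assumed form the kernel sums to `(1 - alpha) + alpha * lightness`, so lightness=1.0 leaves flat regions at their original brightness and larger values brighten them.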

References

  • R. C. Gonzalez and R. E. Woods, "Digital Image Processing (4th Edition)," Chapter 3: Intensity Transformations and Spatial Filtering.
  • J. C. Russ, "The Image Processing Handbook (7th Edition)," Chapter 4: Image Enhancement.
  • T. Acharya and A. K. Ray, "Image Processing: Principles and Applications," Chapter 5: Image Enhancement.
  • Unsharp masking: https://en.wikipedia.org/wiki/Unsharp_masking
  • Laplacian operator: https://en.wikipedia.org/wiki/Laplace_operator
  • Gaussian blur: https://en.wikipedia.org/wiki/Gaussian_blur

ShotNoiseclass

ShotNoise(
    scale_range: tuple[float, float] = (0.1, 0.3),
    p: float = 0.5
)

Shot noise (Poisson) in linear light space. Sensor-realistic; use for low-light or photon-limited imaging and camera simulation. Simulates photon-counting: convert to linear space (gamma removed), treat pixel values as expected photon counts, sample from Poisson, convert back. Variance equals mean in linear space; brighter regions have more absolute noise, less relative.

Parameters

NameTypeDefaultDescription
scale_rangetuple[float, float](0.1, 0.3)Reciprocal of photons per unit intensity. Higher = more noise. e.g. 0.1 ≈ low, 1.0 ≈ moderate, 10.0 ≈ high. Default: (0.1, 0.3).
pfloat0.5Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
>>> transform = A.ShotNoise(scale_range=(0.1, 1.0), p=1.0)
>>> noisy_image = transform(image=image)["image"]

Notes

- Pipeline: linear space (gamma = 2.2), Poisson sample, back to display space.
- Preserves mean intensity. Per-pixel, per-channel independent.
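The pipeline in the notes can be sketched for a single pixel value in plain Python. This is a sketch under assumptions: the helper names are hypothetical, Knuth's algorithm stands in for whatever Poisson sampler the library uses, and the expected photon count is taken to be `linear / scale`, following the description of scale_range as the reciprocal of photons per unit intensity.

```python
import math
import random


def poisson_sample(lam, rng):
    """Knuth's algorithm; adequate for the small rates used in this sketch."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def shot_noise_linear(linear, scale, rng):
    """Noisy linear intensity: Poisson count with mean linear / scale, rescaled."""
    return poisson_sample(linear / scale, rng) * scale


def shot_noise_pixel(value, scale, rng, gamma=2.2):
    """Apply shot noise to one uint8 value and return a uint8 value."""
    linear = (value / 255.0) ** gamma                    # display -> linear light
    noisy = min(shot_noise_linear(linear, scale, rng), 1.0)
    return round(255.0 * noisy ** (1.0 / gamma))         # linear -> display


rng = random.Random(0)
noisy_pixels = [shot_noise_pixel(128, scale=0.1, rng=rng) for _ in range(5)]
```

Because a Poisson variable has variance equal to its mean, the noise in linear space has variance `scale * linear`: a larger scale means fewer photons and a noisier result, while the average stays at the original linear intensity.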

References

  • Shot noise: https://en.wikipedia.org/wiki/Shot_noise
  • Original paper: https://doi.org/10.1002/andp.19183622304 (Schottky, 1918)
  • Poisson process: https://en.wikipedia.org/wiki/Poisson_point_process
  • Gamma correction: https://en.wikipedia.org/wiki/Gamma_correction

Solarizeclass

Solarize(
    threshold_range: tuple[float, float] = (0.5, 0.5),
    p: float = 0.5
)

Invert pixel values above a threshold. threshold_range controls cutoff. Strong highlight inversion; useful for data augmentation. This transform applies a solarization effect to the input image. Solarization is a phenomenon in photography in which the image recorded on a negative or on a photographic print is wholly or partially reversed in tone. Dark areas appear light or light areas appear dark. In this implementation, all pixel values above a threshold are inverted.

Parameters

NameTypeDefaultDescription
threshold_rangetuple[float, float](0.5, 0.5)Range for solarizing threshold as a fraction of maximum value. The threshold_range should be in the range [0, 1] and will be multiplied by the maximum value of the image type (255 for uint8 images or 1.0 for float images). Default: (0.5, 0.5) (corresponds to 127.5 for uint8 and 0.5 for float32).
pfloat0.5Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>>
# Solarize uint8 image with fixed threshold at 50% of max value (127.5)
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.Solarize(threshold_range=(0.5, 0.5), p=1.0)
>>> solarized_image = transform(image=image)['image']
>>>
# Solarize uint8 image with random threshold between 40-60% of max value (102-153)
>>> transform = A.Solarize(threshold_range=(0.4, 0.6), p=1.0)
>>> solarized_image = transform(image=image)['image']
>>>
# Solarize float32 image at 50% of max value (0.5)
>>> image = np.random.rand(100, 100, 3).astype(np.float32)
>>> transform = A.Solarize(threshold_range=(0.5, 0.5), p=1.0)
>>> solarized_image = transform(image=image)['image']

Notes

- For uint8 images, pixel values above the threshold are inverted as: 255 - pixel_value
- For float32 images, pixel values above the threshold are inverted as: 1.0 - pixel_value
- The threshold is applied to each channel independently
- The threshold is calculated in two steps:
  1. Sample a value from threshold_range
  2. Multiply by the image's maximum value:
     * For uint8: threshold = sampled_value * 255
     * For float32: threshold = sampled_value * 1.0
- This transform can create interesting artistic effects or be used for data augmentation
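The two-step threshold and the inversion rule can be written out on a flat list of values. This is a minimal sketch: `solarize` is a hypothetical helper, and treating "above" as a strict comparison is an assumption.

```python
def solarize(values, threshold_fraction, max_value):
    """Invert every value strictly above threshold_fraction * max_value."""
    threshold = threshold_fraction * max_value
    return [max_value - v if v > threshold else v for v in values]


# uint8: threshold = 0.5 * 255 = 127.5, so 128 and 255 are inverted.
uint8_result = solarize([0, 100, 128, 255], threshold_fraction=0.5, max_value=255)
# float32-style values: threshold = 0.5 * 1.0 = 0.5, so only 0.75 is inverted.
float_result = solarize([0.25, 0.5, 0.75], threshold_fraction=0.5, max_value=1.0)
```

Values at or below the threshold pass through unchanged, which is why dark regions keep their tone while highlights flip.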

Spatterclass

Spatter(
    mean: tuple[float, float] | float = (0.65, 0.65),
    std: tuple[float, float] | float = (0.3, 0.3),
    gauss_sigma: tuple[float, float] | float = (2, 2),
    cutout_threshold: tuple[float, float] | float = (0.68, 0.68),
    intensity: tuple[float, float] | float = (0.6, 0.6),
    mode: 'rain' | 'mud' = rain,
    color: Sequence | None,
    p: float = 0.5
)

Simulate lens occlusion from rain or mud: splatter patterns and optional blur. mean, std, gauss_sigma, cutout_threshold, and intensity control appearance; mode and color select the type and tint. Good for dirty or wet lens robustness.

Parameters

NameTypeDefaultDescription
mean
One of:
  • tuple[float, float]
  • float
(0.65, 0.65)Mean value of normal distribution for generating liquid layer. If a single float, mean will be sampled from `(0, mean)`. If a tuple of floats, mean will be sampled from the range `(mean[0], mean[1])`. If you want a constant value, use (mean, mean). Default: (0.65, 0.65).
std
One of:
  • tuple[float, float]
  • float
(0.3, 0.3)Standard deviation value of normal distribution for generating liquid layer. If single float the number will be sampled from `(0, std)`. If tuple of float std will be sampled from range `(std[0], std[1])`. If you want constant value use (std, std). Default: (0.3, 0.3).
gauss_sigma
One of:
  • tuple[float, float]
  • float
(2, 2)Sigma value for Gaussian filtering of liquid layer. If a single float, the value will be sampled from `(0, gauss_sigma)`. If a tuple of floats, gauss_sigma will be sampled from the range `(gauss_sigma[0], gauss_sigma[1])`. If you want a constant value, use (gauss_sigma, gauss_sigma). Default: (2, 2).
cutout_threshold
One of:
  • tuple[float, float]
  • float
(0.68, 0.68)Threshold for filtering liquid layer (determines number of drops). If a single float, the value will be sampled from `(0, cutout_threshold)`. If a tuple of floats, cutout_threshold will be sampled from the range `(cutout_threshold[0], cutout_threshold[1])`. If you want a constant value, use `(cutout_threshold, cutout_threshold)`. Default: (0.68, 0.68).
intensity
One of:
  • tuple[float, float]
  • float
(0.6, 0.6)Intensity of corruption. If single float the number will be sampled from `(0, intensity)`. If tuple of float intensity will be sampled from range `(intensity[0], intensity[1])`. If you want constant value use `(intensity, intensity)`. Default: (0.6, 0.6).
mode
One of:
  • 'rain'
  • 'mud'
rainType of corruption. Default: "rain".
color
One of:
  • Sequence
  • None
-Corruption elements color. If list uses provided list as color for the effect. If None uses default colors based on mode (rain: (238, 238, 175), mud: (20, 42, 63)).
pfloat0.5Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Create a sample image
>>> image = np.ones((300, 300, 3), dtype=np.uint8) * 200  # Light gray background
>>> # Add some gradient to make effects more visible
>>> for i in range(300):
...     image[i, :, :] = np.clip(image[i, :, :] - i // 3, 0, 255)
>>>
>>> # Example 1: Rain effect with default parameters
>>> rain_transform = A.Spatter(
...     mode="rain",
...     p=1.0
... )
>>> rain_result = rain_transform(image=image)
>>> rain_image = rain_result['image']  # Image with rain drops
>>>
>>> # Example 2: Heavy rain with custom parameters
>>> heavy_rain = A.Spatter(
...     mode="rain",
...     mean=(0.7, 0.7),             # Higher mean = more coverage
...     std=(0.2, 0.2),              # Lower std = more uniform effect
...     cutout_threshold=(0.65, 0.65),  # Lower threshold = more drops
...     intensity=(0.8, 0.8),        # Higher intensity = more visible effect
...     color=(200, 200, 255),       # Blueish rain drops
...     p=1.0
... )
>>> heavy_rain_result = heavy_rain(image=image)
>>> heavy_rain_image = heavy_rain_result['image']
>>>
>>> # Example 3: Mud effect
>>> mud_transform = A.Spatter(
...     mode="mud",
...     mean=(0.6, 0.6),
...     std=(0.3, 0.3),
...     cutout_threshold=(0.62, 0.62),
...     intensity=(0.7, 0.7),
...     p=1.0
... )
>>> mud_result = mud_transform(image=image)
>>> mud_image = mud_result['image']  # Image with mud splatters
>>>
>>> # Example 4: Custom colored mud
>>> red_mud = A.Spatter(
...     mode="mud",
...     mean=(0.55, 0.55),
...     std=(0.25, 0.25),
...     cutout_threshold=(0.7, 0.7),
...     intensity=(0.6, 0.6),
...     color=(120, 40, 40),  # Reddish-brown mud
...     p=1.0
... )
>>> red_mud_result = red_mud(image=image)
>>> red_mud_image = red_mud_result['image']
>>>
>>> # Example 5: Random effect (50% chance of applying; the mode is chosen once, when the transform is constructed)
>>> random_spatter = A.Compose([
...     A.Spatter(
...         mode="rain" if np.random.random() < 0.5 else "mud",
...         p=0.5
...     )
... ])
>>> random_result = random_spatter(image=image)
>>> result_image = random_result['image']  # May or may not have spatter effect

References

  • Benchmarking Neural Network Robustness to Common Corruptions and Perturbations: https://arxiv.org/abs/1903.12261

Superpixelsclass

Superpixels(
    p_replace: tuple[float, float] | float = (0, 0.1),
    n_segments: tuple[int, int] | int = (100, 100),
    max_size: int | None = 128,
    interpolation: 0 | 6 | 1 | 2 | 3 | 4 | 5 = 1,
    p: float = 0.5
)

Replace image with superpixel segmentation (SLIC). p_replace, n_segments, max_size control fraction and segment count. Reduces fine texture.

Parameters

NameTypeDefaultDescription
p_replace
One of:
  • tuple[float, float]
  • float
(0, 0.1)Probability that the pixels within a given segment are replaced by the segment's average color (otherwise the pixels are left unchanged). A probability of `0.0` means that no segment is replaced (the image is not changed at all); `0.5` means that around half of all segments are replaced; `1.0` means that all segments are replaced (resulting in a Voronoi-like image). If a `float`, that value is always used. If a `tuple` `(a, b)`, a random probability is sampled from the interval `[a, b]` per image. Default: (0, 0.1)
n_segments
One of:
  • tuple[int, int]
  • int
(100, 100)Rough target number of superpixels to generate. The algorithm may deviate from this number. Lower values lead to coarser superpixels. Higher values are computationally more intensive and therefore slower. If a tuple `(a, b)`, a value from the discrete interval `[a..b]` will be sampled per image. Default: (100, 100)
max_size
One of:
  • int
  • None
128Maximum image size at which the augmentation is performed. If the width or height of an image exceeds this value, it will be downscaled before the augmentation so that the longest side matches `max_size`. This is done to speed up the process. The final output image has the same size as the input image. Note that in case `p_replace` is below `1.0`, the down-/upscaling will affect the not-replaced pixels too. Use `None` to apply no down-/upscaling. Default: 128
interpolation
One of:
  • 0
  • 6
  • 1
  • 2
  • 3
  • 4
  • 5
1Flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
pfloat0.5Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)

# Apply superpixels with default parameters
>>> transform = A.Superpixels(p=1.0)
>>> augmented_image = transform(image=image)['image']

# Apply superpixels with custom parameters
>>> transform = A.Superpixels(
...     p_replace=(0.5, 0.7),
...     n_segments=(50, 100),
...     max_size=None,
...     interpolation=cv2.INTER_NEAREST,
...     p=1.0
... )
>>> augmented_image = transform(image=image)['image']

Notes

- This transform can significantly change the visual appearance of the image.
- The transform makes use of a superpixel algorithm, which tends to be slow. If performance is a concern, consider using `max_size` to limit the image size.
- The effect of this transform can vary greatly depending on the `p_replace` and `n_segments` parameters.
- When `p_replace` is high, the image can become highly abstracted, resembling a Voronoi diagram.
- The transform preserves the original image type (uint8 or float32).
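The role of `p_replace` can be illustrated on a tiny, already segmented image. This sketch assumes the segment labels are given (in the real transform they would come from the superpixel algorithm); `replace_segments` is a hypothetical helper operating on flattened grayscale values.

```python
import random


def replace_segments(values, labels, p_replace, rng):
    """Replace each segment's pixels with the segment mean with probability p_replace."""
    segments = {}
    for idx, label in enumerate(labels):
        segments.setdefault(label, []).append(idx)

    out = list(values)
    for label in sorted(segments):
        indices = segments[label]
        # One random draw per segment decides whether it is averaged.
        if rng.random() < p_replace:
            mean = sum(values[i] for i in indices) / len(indices)
            for i in indices:
                out[i] = mean
    return out


values = [10, 20, 30, 100, 110, 120]    # flattened grayscale pixels
labels = [0, 0, 0, 1, 1, 1]             # segment id per pixel
all_replaced = replace_segments(values, labels, p_replace=1.0, rng=random.Random(0))
unchanged = replace_segments(values, labels, p_replace=0.0, rng=random.Random(0))
```

With `p_replace=1.0` every segment collapses to its mean (20 and 110 here); with `p_replace=0.0` the pixels are returned as they were.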

ToGrayclass

ToGray(
    num_output_channels: int = 3,
    method: 'weighted_average' | 'from_lab' | 'desaturation' | 'average' | 'max' | 'pca' = weighted_average,
    p: float = 0.5
)

Convert to grayscale (weighted by channel weights). Optionally replicate to keep shape. Useful for grayscale training or channel reduction. This transform first converts a color image to a single-channel grayscale image using various methods, then replicates the grayscale channel if num_output_channels is greater than 1.

Parameters

NameTypeDefaultDescription
num_output_channelsint3The number of channels in the output image. If greater than 1, the grayscale channel will be replicated. Default: 3.
method
One of:
  • 'weighted_average'
  • 'from_lab'
  • 'desaturation'
  • 'average'
  • 'max'
  • 'pca'
weighted_averageThe method used for grayscale conversion: - "weighted_average": Uses a weighted sum of RGB channels (0.299R + 0.587G + 0.114B). Works only with 3-channel images. Provides realistic results based on human perception. - "from_lab": Extracts the L channel from the LAB color space. Works only with 3-channel images. Gives perceptually uniform results. - "desaturation": Averages the maximum and minimum values across channels. Works with any number of channels. Fast but may not preserve perceived brightness well. - "average": Simple average of all channels. Works with any number of channels. Fast but may not give realistic results. - "max": Takes the maximum value across all channels. Works with any number of channels. Tends to produce brighter results. - "pca": Applies Principal Component Analysis to reduce channels. Works with any number of channels. Can preserve more information but is computationally intensive.
pfloat0.5Probability of applying the transform. Default: 0.5.

Returns

  • np.ndarray: Grayscale image with the specified number of channels.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Create a sample color image with distinct RGB values
>>> image = np.zeros((100, 100, 3), dtype=np.uint8)
>>> # Red square in top-left
>>> image[10:40, 10:40, 0] = 200
>>> # Green square in top-right
>>> image[10:40, 60:90, 1] = 200
>>> # Blue square in bottom-left
>>> image[60:90, 10:40, 2] = 200
>>> # Yellow square in bottom-right (Red + Green)
>>> image[60:90, 60:90, 0] = 200
>>> image[60:90, 60:90, 1] = 200
>>>
>>> # Example 1: Default conversion (weighted average, 3 channels)
>>> transform = A.ToGray(p=1.0)
>>> result = transform(image=image)
>>> gray_image = result['image']
>>> # Output has 3 duplicate channels with values based on RGB perception weights
>>> # R=0.299, G=0.587, B=0.114
>>> assert gray_image.shape == (100, 100, 3)
>>> assert np.allclose(gray_image[:, :, 0], gray_image[:, :, 1])
>>> assert np.allclose(gray_image[:, :, 1], gray_image[:, :, 2])
>>>
>>> # Example 2: Single-channel output
>>> transform = A.ToGray(num_output_channels=1, p=1.0)
>>> result = transform(image=image)
>>> gray_image = result['image']
>>> assert gray_image.shape == (100, 100, 1)
>>>
>>> # Example 3: Using different conversion methods
>>> # "desaturation" method (min+max)/2
>>> transform_desaturate = A.ToGray(
...     method="desaturation",
...     p=1.0
... )
>>> result = transform_desaturate(image=image)
>>> gray_desaturate = result['image']
>>>
>>> # "from_lab" method (using L channel from LAB colorspace)
>>> transform_lab = A.ToGray(
...     method="from_lab",
...     p=1.0
... )
>>> result = transform_lab(image=image)
>>> gray_lab = result['image']
>>>
>>> # "average" method (simple average of channels)
>>> transform_avg = A.ToGray(
...     method="average",
...     p=1.0
... )
>>> result = transform_avg(image=image)
>>> gray_avg = result['image']
>>>
>>> # "max" method (takes max value across channels)
>>> transform_max = A.ToGray(
...     method="max",
...     p=1.0
... )
>>> result = transform_max(image=image)
>>> gray_max = result['image']
>>>
>>> # Example 4: Using grayscale in an augmentation pipeline
>>> pipeline = A.Compose([
...     A.ToGray(p=0.5),           # 50% chance of grayscale conversion
...     A.RandomBrightnessContrast(p=1.0)  # Always apply brightness/contrast
... ])
>>> result = pipeline(image=image)
>>> augmented_image = result['image']  # May be grayscale or color
>>>
>>> # Example 5: Converting float32 image
>>> float_image = image.astype(np.float32) / 255.0  # Range [0, 1]
>>> transform = A.ToGray(p=1.0)
>>> result = transform(image=float_image)
>>> gray_float_image = result['image']
>>> assert gray_float_image.dtype == np.float32
>>> assert gray_float_image.max() <= 1.0

Notes

  • The transform first converts the input image to single-channel grayscale, then replicates this channel if num_output_channels > 1.
  • "weighted_average" and "from_lab" are typically used in image processing and computer vision applications where accurate representation of human perception is important.
  • "desaturation" and "average" are often used in simple image manipulation tools or when computational speed is a priority.
  • The "max" method can be useful in scenarios where preserving bright features is important, such as in some medical imaging applications.
  • "pca" might be used in advanced image analysis tasks or when dealing with hyperspectral images.
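
The simpler conversion methods can be illustrated on a single pixel. The following is a minimal pure-Python sketch, not the library's implementation: `to_gray_pixel` is a hypothetical helper, it covers only the four methods whose formulas are stated in the parameter description, and it omits "from_lab" and "pca" as well as any rounding back to uint8.

```python
def to_gray_pixel(rgb, method="weighted_average", num_output_channels=3):
    """Convert one (R, G, B) pixel to gray and replicate it across channels.

    Hypothetical helper for illustration; not part of the albumentations API.
    """
    r, g, b = rgb
    if method == "weighted_average":
        # Perceptual weights quoted in the parameter description.
        gray = 0.299 * r + 0.587 * g + 0.114 * b
    elif method == "desaturation":
        # Average of the largest and smallest channel value.
        gray = (max(rgb) + min(rgb)) / 2
    elif method == "average":
        gray = sum(rgb) / len(rgb)
    elif method == "max":
        gray = max(rgb)
    else:
        raise ValueError(f"method not covered by this sketch: {method!r}")
    # Replicate the single gray value, mirroring num_output_channels.
    return (float(gray),) * num_output_channels


red = (200, 0, 0)
for name in ("weighted_average", "desaturation", "average", "max"):
    print(name, to_gray_pixel(red, method=name, num_output_channels=1))
```

For a pure red pixel the four methods disagree widely (roughly 59.8, 100, 66.7, and 200), which is why the choice of method matters for strongly saturated images.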

ToRGBclass

ToRGB(
    num_output_channels: int = 3,
    p: float = 1.0
)

Convert grayscale image to RGB by replicating the single channel to three. No color information added; use when a model expects 3-channel input.

Parameters

NameTypeDefaultDescription
num_output_channelsint3The number of channels in the output image. Default: 3.
pfloat1.0Probability of applying the transform. Default: 1.0.

Examples

>>> import numpy as np
>>> import albumentations as A
>>>
>>> # Convert a grayscale image to RGB
>>> transform = A.Compose([A.ToRGB(p=1.0)])
>>> grayscale_image = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
>>> rgb_image = transform(image=grayscale_image)['image']
>>> assert rgb_image.shape == (100, 100, 3)

Notes

  • For single-channel (grayscale) images, the channel is replicated to create an RGB image.
  • If the input is already a 3-channel RGB image, it is returned unchanged.
  • This transform does not change the data type of the image (e.g., uint8 remains uint8).

ToSepiaclass

ToSepia(
    p: float = 0.5
)

Apply a sepia (brownish vintage) filter via a fixed color matrix. Good for style or temporal variation in datasets. This transform converts a color image to a sepia tone, giving it a warm, brownish tint reminiscent of old photographs. The effect is achieved by applying a fixed color transformation matrix to the RGB channels of the input image. For grayscale images, the transform is a no-op and returns the original image.

Parameters

NameTypeDefaultDescription
pfloat0.5Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>>
>>> # Apply sepia effect to a uint8 RGB image
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> transform = A.ToSepia(p=1.0)
>>> sepia_image = transform(image=image)['image']
>>> assert sepia_image.shape == image.shape
>>> assert sepia_image.dtype == np.uint8
>>>
>>> # Apply sepia effect to a float32 RGB image
>>> image = np.random.rand(100, 100, 3).astype(np.float32)
>>> transform = A.ToSepia(p=1.0)
>>> sepia_image = transform(image=image)['image']
>>> assert sepia_image.shape == image.shape
>>> assert sepia_image.dtype == np.float32
>>> assert 0 <= sepia_image.min() <= sepia_image.max() <= 1.0
>>>
>>> # No effect on grayscale images
>>> gray_image = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
>>> transform = A.ToSepia(p=1.0)
>>> result = transform(image=gray_image)['image']
>>> assert np.array_equal(result, gray_image)

Notes

  • The sepia effect is applied only to RGB images (3 channels). Grayscale images are returned unchanged.
  • The sepia effect is created using a fixed color transformation matrix: [[0.393, 0.769, 0.189], [0.349, 0.686, 0.168], [0.272, 0.534, 0.131]]
  • The output image has the same data type as the input image.
  • For float32 images, ensure the input values are in the range [0, 1].
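
To make the matrix concrete, here is a minimal pure-Python sketch that applies it to one pixel. It rests on two assumptions that the notes do not state: that each matrix row produces one output channel in R, G, B order, and that results are clipped to the valid range. `sepia_pixel` is a hypothetical helper, not a library function.

```python
# Matrix quoted in the notes for this transform.
SEPIA_MATRIX = [
    [0.393, 0.769, 0.189],
    [0.349, 0.686, 0.168],
    [0.272, 0.534, 0.131],
]


def sepia_pixel(rgb, max_value=255.0):
    """Apply the sepia matrix to one (R, G, B) pixel.

    Assumption: row i of the matrix yields output channel i, and the
    result is clipped to [0, max_value]. Hypothetical helper.
    """
    out = []
    for row in SEPIA_MATRIX:
        value = sum(weight * channel for weight, channel in zip(row, rgb))
        out.append(min(max(value, 0.0), max_value))
    return tuple(out)


# An orange-ish uint8 pixel: red stays the strongest channel, blue the weakest.
print(sepia_pixel((200, 100, 50)))

# A float pixel in [0, 1]: pass max_value=1.0 so clipping matches the range.
print(sepia_pixel((1.0, 1.0, 1.0), max_value=1.0))
```

Because the first two rows sum to more than 1, bright pixels saturate in the red and green channels, which is one reason clipping (or an equivalent range check) is needed for float inputs in [0, 1].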

UnsharpMaskclass

UnsharpMask(
    blur_limit: tuple[int, int] | int = (3, 7),
    sigma_limit: tuple[float, float] | float = 0.0,
    alpha: tuple[float, float] | float = (0.2, 0.5),
    threshold: int = 10,
    p: float = 0.5
)

Sharpen via unsharp masking: blur, subtract, add back. blur_limit, sigma_limit, alpha control strength. Luminance unchanged; edges enhanced. Unsharp masking is a technique that enhances edge contrast in an image, creating the illusion of increased sharpness. This transform applies Gaussian blur to create a blurred version of the image, then uses this to create a mask which is combined with the original image to enhance edges and fine details.

Parameters

NameTypeDefaultDescription
blur_limit
One of:
  • tuple[int, int]
  • int
(3, 7)Maximum Gaussian kernel size for blurring the input image. Must be zero or odd and in range [0, inf). If set to 0, it is computed from sigma as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`. If a single value is given, `blur_limit` is sampled from the range (0, blur_limit). Default: (3, 7).
sigma_limit
One of:
  • tuple[float, float]
  • float
0.0Gaussian kernel standard deviation. Must be greater than or equal to 0. If a single value is given, `sigma_limit` is sampled from the range (0, sigma_limit). If set to 0, sigma is computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
alpha
One of:
  • tuple[float, float]
  • float
(0.2, 0.5)Range from which to choose the visibility of the sharpened image. At 0, only the original image is visible; at 1.0, only its sharpened version is visible. Default: (0.2, 0.5).
thresholdint10Value that limits sharpening to areas with a high pixel difference between the original image and its smoothed version. Higher threshold means less sharpening on flat areas. Must be in range [0, 255]. Default: 10.
pfloat0.5Probability of applying the transform. Default: 0.5.
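
The sigma fallback quoted in the `sigma_limit` description can be evaluated directly. The sketch below transcribes that formula into a hypothetical helper, `sigma_from_ksize`, and prints the result for the odd kernel sizes in the default `blur_limit` range; it assumes the sampled kernel size is one of 3, 5, or 7.

```python
def sigma_from_ksize(ksize):
    """Sigma used when sigma_limit is 0, per the formula in the table.

    Hypothetical helper; the formula is copied from the parameter text.
    """
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8


# Odd kernel sizes within the default blur_limit of (3, 7).
for ksize in (3, 5, 7):
    print(ksize, round(sigma_from_ksize(ksize), 3))
```

Under these assumptions the defaults correspond to a sigma between about 0.8 and 1.4, so the blur used to build the mask is fairly mild.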

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
>>> # Apply UnsharpMask with default parameters
>>> transform = A.UnsharpMask(p=1.0)
>>> sharpened_image = transform(image=image)['image']
>>>
>>> # Apply UnsharpMask with custom parameters
>>> transform = A.UnsharpMask(
...     blur_limit=(3, 7),
...     sigma_limit=(0.1, 0.5),
...     alpha=(0.2, 0.7),
...     threshold=15,
...     p=1.0
... )
>>> sharpened_image = transform(image=image)['image']

Notes

  • The algorithm creates a mask M = (I - G) * alpha, where I is the original image and G is the Gaussian blurred version.
  • The final image is computed as: output = I + M if |I - G| > threshold, else I.
  • Higher alpha values increase the strength of the sharpening effect.
  • Higher threshold values limit the sharpening effect to areas with more significant edges or details.
  • The blur_limit and sigma_limit parameters control the Gaussian blur used to create the mask.
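
The mask-and-threshold rule in these notes can be traced on a one-dimensional signal. This is a minimal pure-Python sketch rather than the library's code: `blur_1d` and `unsharp_mask_1d` are hypothetical helpers, a fixed 3-tap kernel with edge replication stands in for the Gaussian blur, and clipping to [0, 255] is an assumption.

```python
def blur_1d(signal, kernel=(0.25, 0.5, 0.25)):
    """3-tap smoothing with edge replication (stand-in for Gaussian blur)."""
    n = len(signal)
    half = len(kernel) // 2
    blurred = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # replicate border samples
            acc += weight * signal[j]
        blurred.append(acc)
    return blurred


def unsharp_mask_1d(signal, alpha=0.5, threshold=10, max_value=255.0):
    """output = I + (I - G) * alpha where |I - G| > threshold, else I."""
    blurred = blur_1d(signal)
    out = []
    for original, smooth in zip(signal, blurred):
        diff = original - smooth
        if abs(diff) > threshold:
            value = original + alpha * diff  # add the mask M
        else:
            value = float(original)  # flat area: leave untouched
        out.append(min(max(value, 0.0), max_value))  # assumed clipping
    return out


edge = [10, 10, 10, 200, 200, 200]
print(unsharp_mask_1d(edge))
# [10.0, 10.0, 0.0, 223.75, 200.0, 200.0]
```

Only the two samples next to the step change: the dark side is pushed darker and the bright side brighter, which is the overshoot that reads as extra sharpness. Raising `threshold` above the local difference (47.5 here) leaves the signal unchanged.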

References

  • Unsharp Masking: https://en.wikipedia.org/wiki/Unsharp_masking

Vignettingclass

Vignetting(
    intensity_range: tuple[float, float] = (0.2, 0.5),
    center_range: tuple[float, float] = (0.3, 0.7),
    p: float = 0.5
)

Darken corners with a radial (elliptical) gradient. Simulates lens vignetting or natural light falloff. Use for lens realism or stylistic darkening. Center of the image stays bright; corners and edges are darkened. Center position can be jittered for variety.

Parameters

NameTypeDefaultDescription
intensity_rangetuple[float, float](0.2, 0.5)Darkening at corners: 0 = no effect, 1 = black. Default: (0.2, 0.5).
center_rangetuple[float, float](0.3, 0.7)Range for vignette center as fraction of width/height. (0.5, 0.5) = image center. Default: (0.3, 0.7).
pfloat0.5Probability of applying the transform. Default: 0.5.

Examples

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>>
>>> transform = A.Vignetting(intensity_range=(0.2, 0.5), p=1.0)
>>> result = transform(image=image)["image"]

Notes

  • Elliptical gradient centered at a random point (within center_range).
  • Quadratic falloff from center to edges.
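
One way to picture an elliptical gradient with quadratic falloff is a per-pixel brightness factor. The sketch below is an assumed model, not the library's formula: `vignette_factor` is a hypothetical helper that normalizes coordinates per axis (so contours are ellipses on non-square images) and scales the squared distance so the farthest corner is darkened by exactly `intensity`.

```python
def vignette_factor(x, y, width, height, center=(0.5, 0.5), intensity=0.5):
    """Brightness multiplier in [1 - intensity, 1] for pixel (x, y).

    Assumed model for illustration; requires width > 1 and height > 1.
    """
    cx, cy = center
    # Normalize each axis to [0, 1] so the contours are elliptical in pixels.
    u = x / (width - 1)
    v = y / (height - 1)
    r2 = (u - cx) ** 2 + (v - cy) ** 2
    # Largest squared distance from the center to any image corner.
    max_r2 = max(
        (corner_u - cx) ** 2 + (corner_v - cy) ** 2
        for corner_u in (0.0, 1.0)
        for corner_v in (0.0, 1.0)
    )
    # Quadratic falloff: darkening grows with the squared distance.
    return 1.0 - intensity * (r2 / max_r2)


print(vignette_factor(50, 50, 101, 101))  # centre pixel, factor 1.0
print(vignette_factor(0, 0, 101, 101))    # corner pixel, factor 0.5
```

Multiplying each pixel by this factor leaves the chosen center untouched and darkens the farthest corner to `1 - intensity` of its original value; moving `center` within `center_range` shifts where the bright spot sits.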