AlbumentationsX 2.0.18 Release Notes

New Transform: PhotoMetricDistort


PhotoMetricDistort implements the photometric distortion pipeline from the SSD paper, matching the API and default parameters of torchvision's RandomPhotometricDistort.

Each of the five distortions — brightness, contrast, saturation, hue, and channel shuffle — is applied independently with probability distort_p. Contrast placement is randomized: it can appear either before or after the HSV-space adjustments (saturation + hue), mirroring the SSD paper's stochastic ordering.

import albumentations as A

transform = A.PhotoMetricDistort(
    brightness_range=(0.875, 1.125),  # multiplicative factor
    contrast_range=(0.5, 1.5),        # multiplicative factor
    saturation_range=(0.5, 1.5),      # multiplicative factor
    hue_range=(-0.05, 0.05),          # additive factor, range [-0.5, 0.5]
    distort_p=0.5,                    # probability per individual distortion
    p=0.5,                            # probability the whole transform runs
)

Key differences from torchvision:

  • torchvision uses a single p that applies to each distortion; AlbumentationsX separates distort_p (per-distortion) from p (the overall transform gate), giving independent control.
  • All default parameter values are identical to torchvision's RandomPhotometricDistort.

Supported targets: image, volume
Supported dtypes: uint8, float32
Channels: 1 (grayscale) and 3 (RGB) only — same constraint as torchvision


ColorJitter: Fused Brightness + Contrast

As part of the same PR, ColorJitter received a performance improvement. The four color operations are applied in a random order on each call. When brightness and contrast happen to be adjacent in the shuffle (exactly 50% of calls: 12 of the 24 possible orderings place them next to each other), they are now fused into a single operation:

  • uint8: the two transforms are composed analytically into a single 256-entry LUT applied in one cv2.LUT call instead of two sequential passes
  • float32: a single pre-allocated output buffer with in-place numpy ops avoids 4+ intermediate array allocations
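The uint8 fusion works because both adjustments are per-pixel affine maps, so they compose into one affine map that can be baked into a single lookup table. A minimal sketch of the idea, assuming brightness is applied before contrast (the helper name and the exact contrast formula are illustrative, not the library's internals):

```python
import numpy as np

def fused_brightness_contrast_lut(brightness: float, contrast: float,
                                  mean: float) -> np.ndarray:
    """Compose brightness (x * b) and contrast (mean + (x - mean) * c)
    analytically into a single 256-entry uint8 lookup table."""
    x = np.arange(256, dtype=np.float32)
    y = x * brightness                 # brightness: multiplicative factor
    y = mean + (y - mean) * contrast   # contrast: scale around the image mean
    return np.clip(y, 0, 255).astype(np.uint8)

img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
lut = fused_brightness_contrast_lut(1.1, 1.2, float(img.mean()))
out = lut[img]  # in the library this is one cv2.LUT call instead of two passes
```

Because the two maps are composed in float before quantization, the fused LUT also avoids the intermediate uint8 rounding of two sequential passes.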
dtype     v2.0.17 (100 calls)   v2.0.18 (100 calls)   speedup
uint8     0.173s                0.162s                1.07x
float32   0.749s                0.689s                1.09x

The speedup is modest in the aggregate because the fusion only applies to the ~50% of calls where brightness and contrast are adjacent; the other ~50% are unchanged.


Performance: Multi-Channel Speedups (5+ Channels)

AlbumentationsX now requires OpenCV ≥ 4.13 and albucore 0.0.39. OpenCV 4.13 extended native multi-channel support for several operations. The previous code always split images with more than 4 channels into ≤4-channel chunks and processed them sequentially; the new code calls OpenCV directly when supported, falling back to chunking only when genuinely required.
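The chunked fallback being removed for supported cases can be sketched as follows. This is illustrative only: `apply_chunked` stands in for the old behavior, and `process` for any OpenCV call limited to ≤4 channels.

```python
import numpy as np

def apply_chunked(img: np.ndarray, process) -> np.ndarray:
    """Old-style fallback: split a >4-channel image into <=4-channel
    chunks, process each sequentially, and reassemble the result."""
    if img.ndim == 2 or img.shape[-1] <= 4:
        return process(img)
    chunks = [process(img[..., i:i + 4]) for i in range(0, img.shape[-1], 4)]
    return np.concatenate(chunks, axis=-1)

img = np.zeros((8, 8, 10), dtype=np.uint8)
out = apply_chunked(img, lambda chunk: chunk + 1)  # 3 sequential calls: 4+4+2
```

Each chunk means a separate OpenCV invocation and extra copies, which is why the direct native path below is several times faster at high channel counts.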

All benchmarks: 512×512 images, 100 iterations, Apple M-series, macOS.


Blur (cv2.blur)

cv2.blur has always supported arbitrary channel counts natively — the chunking was unnecessary. The new code calls it directly with dst=img (in-place).

channels  v2.0.17   v2.0.18   speedup
1         0.013s    0.013s    1.0x
3         0.026s    0.027s    1.0x
5         0.230s    0.058s    4.0x
8         0.411s    0.108s    3.8x
16        0.834s    0.217s    3.8x
32        1.819s    0.447s    4.1x
64        3.246s    0.785s    4.1x
128       10.145s   3.730s    2.7x

MedianBlur

OpenCV 4.13 added native multi-channel support for cv2.medianBlur when ksize is 3 or 5. For ksize ≥ 7, OpenCV's internal SIMD path still asserts channels ≤ 4, so chunking remains necessary there.
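The dispatch rule can be captured by a small predicate (the function name is hypothetical, not albucore's API):

```python
def median_blur_native(channels: int, ksize: int) -> bool:
    # <=4 channels: OpenCV has always handled these directly.
    # >4 channels: OpenCV 4.13 handles them natively only for ksize 3 or 5;
    # for ksize >= 7 OpenCV still asserts channels <= 4, so chunking remains.
    return channels <= 4 or ksize in (3, 5)
```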

ksize = 5 — native multi-channel path (fast):

channels  v2.0.17   v2.0.18   speedup
1         0.026s    0.023s    1.1x
3         0.046s    0.045s    1.0x
5         0.241s    0.152s    1.6x
8         0.388s    0.257s    1.5x
16        0.780s    0.534s    1.5x
32        1.673s    1.112s    1.5x
64        3.043s    1.888s    1.6x
128       9.478s    3.901s    2.4x

ksize = 7 — still chunked for >4 channels (no change expected):

channels  v2.0.17   v2.0.18   speedup
5         0.264s    0.251s    1.0x
8         0.435s    0.454s    1.0x
16        0.871s    0.930s    1.0x
32        1.900s    1.705s    1.0x
64        3.306s    3.461s    1.0x
128       10.544s   11.500s   1.0x

Affine, Rotate, ShiftScaleRotate, SafeRotate (warp_affine)

These transforms now route through albucore.warp_affine, which calls cv2.warpAffine directly for multi-channel images when the interpolation mode is INTER_NEAREST, INTER_LINEAR, or INTER_AREA. For INTER_CUBIC, INTER_LANCZOS4, and INTER_LINEAR_EXACT, chunking is still required.

The default interpolation is INTER_LINEAR, so most users get the speedup automatically.
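The mode gating can be sketched with a predicate like the one below. The integer constants mirror OpenCV's `cv2.INTER_*` enum values; the function name is illustrative, not albucore's actual API.

```python
# OpenCV interpolation enum values (match cv2.INTER_*)
INTER_NEAREST, INTER_LINEAR, INTER_CUBIC = 0, 1, 2
INTER_AREA, INTER_LANCZOS4, INTER_LINEAR_EXACT = 3, 4, 5

# Modes where cv2.warpAffine accepts any channel count in OpenCV 4.13
NATIVE_WARP_MODES = {INTER_NEAREST, INTER_LINEAR, INTER_AREA}

def warp_affine_native(channels: int, interpolation: int) -> bool:
    # Direct cv2.warpAffine call; chunk only for the unsupported modes.
    return channels <= 4 or interpolation in NATIVE_WARP_MODES
```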

Affine (scale + rotate, INTER_LINEAR — default):

channels  v2.0.17   v2.0.18   speedup
1         0.099s    0.024s    4.1x
3         0.053s    0.025s    2.1x
5         0.254s    0.055s    4.6x
8         0.389s    0.046s    8.5x
16        0.745s    0.059s    12.6x
32        1.622s    0.100s    16.2x
64        3.191s    0.125s    25.5x
128       10.273s   0.272s    37.8x

Affine (INTER_CUBIC — still chunked for >4 channels):

channels  v2.0.17   v2.0.18   speedup
5         0.339s    0.325s    1.0x
8         0.647s    0.533s    1.2x
16        1.133s    1.071s    1.1x
32        2.330s    2.266s    1.0x
64        4.329s    4.306s    1.0x
128       12.197s   13.501s   1.0x

Rotate (INTER_LINEAR):

channels  v2.0.17   v2.0.18   speedup
1         0.021s    0.035s    0.6x
3         0.027s    0.037s    0.7x
5         0.240s    0.142s    1.7x
8         0.496s    0.063s    7.9x
16        0.834s    0.061s    13.7x
32        1.554s    0.103s    15.1x
64        2.957s    0.166s    17.8x
128       9.115s    0.294s    31.0x

ShiftScaleRotate (INTER_LINEAR):

channels  v2.0.17   v2.0.18   speedup
5         0.239s    0.150s    1.6x
8         0.407s    0.092s    4.4x
16        0.764s    0.069s    11.1x
32        1.592s    0.124s    12.8x
64        2.993s    0.201s    14.9x
128       9.504s    0.333s    28.5x

SafeRotate (INTER_LINEAR):

channels  v2.0.17   v2.0.18   speedup
5         0.239s    0.058s    4.1x
8         0.392s    0.049s    8.0x
16        0.764s    0.066s    11.6x
32        1.660s    0.109s    15.2x
64        3.040s    0.144s    21.1x
128       9.559s    0.330s    29.0x

Perspective (warp_perspective)

Same interpolation-mode rules as warp_affine.

Perspective (INTER_LINEAR — default):

channels  v2.0.17   v2.0.18   speedup
1         0.045s    0.049s    1.0x
3         0.030s    0.033s    0.9x
5         0.248s    0.064s    3.9x
8         0.399s    0.057s    7.0x
16        0.779s    0.068s    11.5x
32        1.823s    0.108s    16.9x
64        3.142s    0.205s    15.3x
128       9.644s    0.460s    21.0x

Perspective (INTER_CUBIC — still chunked for >4 channels):

channels  v2.0.17   v2.0.18   speedup
5         0.423s    0.444s    1.0x
8         0.741s    0.652s    1.1x
16        1.099s    1.128s    1.0x
32        2.412s    2.647s    0.9x
64        4.407s    4.909s    0.9x
128       12.777s   14.327s   0.9x

ElasticTransform, GridDistortion, OpticalDistortion, ThinPlateSpline, PiecewiseAffine (remap)

All transforms that inherit from BaseDistortion route through albucore.remap. The native multi-channel path is available for INTER_NEAREST, INTER_LINEAR, INTER_AREA, and INTER_LINEAR_EXACT. For INTER_CUBIC and INTER_LANCZOS4, chunking remains necessary.
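For reference, cv2.remap samples the source image at per-pixel (map_x, map_y) coordinates. A nearest-neighbor numpy equivalent of that gather, ignoring border handling and sub-pixel interpolation (illustrative only):

```python
import numpy as np

def remap_nearest(img: np.ndarray, map_x: np.ndarray,
                  map_y: np.ndarray) -> np.ndarray:
    # Nearest-neighbour gather: out[i, j] = img[map_y[i, j], map_x[i, j]].
    ys = np.clip(np.rint(map_y).astype(int), 0, img.shape[0] - 1)
    xs = np.clip(np.rint(map_x).astype(int), 0, img.shape[1] - 1)
    return img[ys, xs]

h, w = 4, 4
img = np.arange(h * w, dtype=np.uint8).reshape(h, w)
xs, ys = np.meshgrid(np.arange(w), np.arange(h))
out = remap_nearest(img, xs.astype(np.float32), ys.astype(np.float32))
```

With identity maps the output equals the input; distortion transforms perturb the maps (e.g. with a random displacement field) before the single remap call.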

Note: ElasticTransform spends much of its time generating random displacement fields, a cost that is independent of channel count, so the faster remap only shows through at high channel counts. GridDistortion and OpticalDistortion show much larger speedups because their maps are cheaper to build and cv2.remap dominates their runtime.

ElasticTransform (INTER_LINEAR):

channels  v2.0.17   v2.0.18   speedup
1         1.488s    1.477s    1.0x
3         1.498s    1.452s    1.0x
5         1.599s    1.557s    1.0x
8         1.688s    1.471s    1.1x
16        1.891s    1.618s    1.2x
32        2.280s    1.484s    1.5x
64        2.993s    1.669s    1.8x
128       6.025s    1.535s    3.9x

ElasticTransform (INTER_CUBIC — still chunked for >4 channels):

channels  v2.0.17   v2.0.18   speedup
5         1.642s    1.618s    1.0x
8         1.748s    1.838s    1.0x
16        2.033s    2.148s    1.0x
32        2.574s    2.635s    1.0x
64        3.606s    3.824s    0.9x
128       7.409s    8.672s    0.9x

GridDistortion (INTER_LINEAR):

channels  v2.0.17   v2.0.18   speedup
1         0.121s    0.057s    2.1x
3         0.131s    0.071s    1.8x
5         0.353s    0.104s    3.4x
8         0.518s    0.106s    4.9x
16        0.937s    0.119s    7.9x
32        1.725s    0.149s    11.6x
64        3.186s    0.152s    20.9x
128       9.466s    0.238s    39.8x

OpticalDistortion (INTER_LINEAR):

channels  v2.0.17   v2.0.18   speedup
1         0.129s    0.079s    1.6x
3         0.135s    0.084s    1.6x
5         0.352s    0.123s    2.9x
8         0.520s    0.111s    4.7x
16        0.930s    0.131s    7.1x
32        1.725s    0.169s    10.2x
64        3.205s    0.177s    18.1x
128       10.661s   0.268s    39.8x

Pad, PadIfNeeded, CenterCrop, RandomCrop, Crop, CropAndPad (copy_make_border)

All padding operations now route through albucore.copy_make_border. For constant-value padding with a scalar fill (the most common case), cv2.copyMakeBorder supports arbitrary channel counts natively — no chunking needed. Per-channel fill with more than 4 distinct values still uses chunking.
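The fill-based dispatch can be sketched as a predicate (illustrative names, not albucore's actual API):

```python
import numpy as np

def pad_uses_native(channels: int, fill) -> bool:
    # Scalar fill, or a per-channel fill with <=4 values, maps onto one
    # native cv2.copyMakeBorder call for any channel count.
    # A per-channel fill with >4 distinct values still requires chunking.
    fill = np.atleast_1d(fill)
    return fill.size <= 4 or channels <= 4
```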

PadIfNeeded (scalar fill, fill=0):

channels  v2.0.17   v2.0.18   speedup
1         0.002s    0.002s    1.0x
3         0.004s    0.003s    1.3x
5         0.253s    0.015s    16.9x
8         0.448s    0.027s    16.9x
16        0.885s    0.050s    17.7x
32        1.623s    0.017s    98.1x
64        3.260s    0.041s    79.3x
128       11.453s   0.085s    134.5x

CenterCrop (pad when crop > image size):

channels  v2.0.17   v2.0.18   speedup
5         0.257s    0.017s    15.3x
8         0.447s    0.026s    16.9x
16        0.873s    0.052s    16.7x
32        1.717s    0.017s    99.8x
64        3.263s    0.045s    73.1x
128       11.246s   0.069s    162.4x

CropAndPad:

channels  v2.0.17   v2.0.18   speedup
1         0.013s    0.014s    1.0x
3         0.036s    0.018s    2.0x
5         0.531s    0.286s    1.9x
8         0.928s    0.580s    1.6x
16        1.936s    1.007s    1.9x
32        3.732s    1.819s    2.1x
64        7.015s    3.610s    1.9x
128       24.289s   14.005s   1.7x

Note: CropAndPad includes a resize step that still uses separate cv2 calls; the speedup is lower than pure pad transforms.


Summary of Affected Transforms

  • warp_affine (Affine, Rotate, ShiftScaleRotate, SafeRotate): sped up for INTER_LINEAR/NEAREST/AREA (default); unchanged for INTER_CUBIC/LANCZOS4/LINEAR_EXACT
  • warp_perspective (Perspective): same conditions as warp_affine
  • remap (ElasticTransform, GridDistortion, OpticalDistortion, ThinPlateSpline, PiecewiseAffine): sped up for INTER_LINEAR/NEAREST/AREA/LINEAR_EXACT (default); unchanged for INTER_CUBIC/LANCZOS4
  • copy_make_border (Pad, PadIfNeeded, CenterCrop, RandomCrop, Crop, CropAndPad): sped up for scalar or ≤4-element fill (default); unchanged for per-channel fill with >4 values
  • box_blur (Blur): always sped up
  • median_blur (MedianBlur): sped up for ksize 3 or 5; unchanged for ksize ≥ 7 with >4 channels

Requirements

  • OpenCV ≥ 4.13.0.92 (previously ≥ 4.9.0.80)
  • albucore == 0.0.39 (previously == 0.0.36)

Big thanks to @stark0908 for pointing out the recent changes in OpenCV that made these multi-channel speedups possible.