2026-04-28
Summary
Patch release focused on functional-layer performance for common geometric and pixel/noise paths, plus a small geometric-doc cleanup. No API changes.
- `transpose`/`rot90` (`albumentations.augmentations.geometric.functional`): HW↔HW transforms use OpenCV (`cv2.transpose`/`cv2.rotate`) for channel-last `(H, W, C)` images when `C ∈ {1, 3, 4}`; other layouts fall back to NumPy (same as before for volumes / high channel counts).
- `PixelDropout`: faster shared-mask generation (`get_drop_mask` with `per_channel=False`) via vectorized RNG compares instead of `choice`; mask broadcasting is unchanged for the downstream apply.
- `SaltAndPepper`: sparse multi-channel noise uses copy + boolean indexing when the total noisy fraction is low; dense noise stays on `np.where`.
- `Defocus`: aliased-disk kernel construction is cached (`lru_cache`); `create_defocus_kernel()` still returns an independent copy per call, and `defocus()` passes a copy into convolution so cached tensors are never mutated by `filter2D`.
- Docs: removed misleading "slow" framing from the `PiecewiseAffine` docstring (#244).
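As a rough sketch of the OpenCV dispatch described in the first bullet (the helper name `transpose_hw` and the exact fallback wiring are illustrative, not the actual albumentations code):

```python
import numpy as np

try:
    import cv2  # OpenCV fast path, used when available
except ImportError:
    cv2 = None

# Hypothetical helper: the notes only say HW<->HW transforms go through
# cv2.transpose for channel-last images with C in {1, 3, 4}.
_CV2_CHANNELS = {1, 3, 4}

def transpose_hw(img: np.ndarray) -> np.ndarray:
    """Swap H and W, always returning a contiguous array."""
    if cv2 is not None and img.ndim == 3 and img.shape[2] in _CV2_CHANNELS:
        return cv2.transpose(img)  # OpenCV output is already contiguous
    axes = (1, 0) if img.ndim == 2 else (1, 0, 2)
    return np.ascontiguousarray(img.transpose(axes))  # NumPy fallback

img = np.arange(24, dtype=np.uint8).reshape(2, 4, 3)
out = transpose_hw(img)
```

Both branches return a materialized contiguous array, which is why the benchmarks below compare against a `np.ascontiguousarray`-materialized NumPy baseline.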
Breaking changes
None.
New features
None.
Bug fixes
None.
Performance
Benchmarks were run locally against development builds of this repo (Python driver importing `albumentations.augmentations.*.functional`). Raw timings (seconds per call, min-of-repeats) and ratios are archived at `_internal/release_notes/BENCHMARK_RESULTS_2.2.3.json`.
Methodology
- Geometric (`transpose`, `rot90`): baseline is NumPy transpose / `np.rot90` followed by `np.ascontiguousarray`, so the comparison measures materialized contiguous output, matching workloads that immediately consume arrays with a contiguous layout (similar to enforcing contiguous tensors downstream). Pure NumPy `transpose`/`rot90` without materialization returns views and is not comparable to OpenCV's contiguous output on equal footing.
- Reported speedups below are for `C ∈ {1, 3}` (OpenCV fast path). For `C = 5`, production code uses the NumPy fallback; ratios versus a forced-copy baseline are not meaningful when the fallback returns a non-materialized array, so inflated headline numbers for `C = 5` are omitted.
- `get_drop_mask` (`per_channel=False`): baseline uses `numpy.random.Generator.choice` + `np.repeat` on `(H, W)` (pre-change behavior).
- `apply_salt_and_pepper`: baseline is stacked `np.where` only; masks are independent sparse salt/pepper masks (~2% density each, `numpy.random`), shared across channels, typical for default-ish `SaltAndPepper` amounts.
- `defocus`: compares uncached kernel construction + convolve against the cached kernel API (`create_defocus_kernel`); full-image timing is dominated by `cv2.filter2D`.
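A minimal min-of-repeats timing harness in this spirit might look like the following (the actual driver is not published here; `min_of_repeats` and its parameters are illustrative):

```python
import time
import numpy as np

def min_of_repeats(fn, *args, repeats: int = 5, inner: int = 10) -> float:
    """Best-case seconds per call: min over `repeats` of the mean of `inner` calls."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        for _ in range(inner):
            fn(*args)
        best = min(best, (time.perf_counter() - t0) / inner)
    return best

img = np.random.default_rng(0).integers(0, 256, size=(512, 512, 3), dtype=np.uint8)

# Materialized NumPy baseline for rot90, as described in the methodology above.
baseline_s = min_of_repeats(lambda a: np.ascontiguousarray(np.rot90(a)), img)
```

Taking the minimum over repeats discards scheduler and cache-warmup noise, which is the usual convention for micro-benchmarks of this kind.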
Representative speedups (materialized geometric baseline)
| Shape | Transpose (baseline → current) | rot90 r=90° (baseline → current) |
|---|---|---|
| 512×512×3 | ~5.1× | ~4.9× |
| 1024×1024×3 | ~4.3× | ~4.7× |
| 512×512×1 | ~1.6× | ~2.1× |
| 1024×1024×1 | ~2.1× | ~2.1× |
`get_drop_mask` (`per_channel=False`, `dropout_prob=0.2`): ~2.4×–2.8× faster than `choice` + `repeat` across the 256→1024 × {1, 3, 5} matrix (mean ~2.47×).
`apply_salt_and_pepper` (sparse masks):
| Shape | Speedup vs stacked np.where only |
|---|---|
| 1024×1024×3 | ~1.57× |
| 1024×1024×5 | ~2.55× |
Single-channel grayscale stays roughly neutral (still pure `np.where`).
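A sketch of the sparse path, assuming shared 2-D boolean masks as described in the methodology (the function name is illustrative): copy the image once, then write only the few noisy pixels via boolean indexing instead of evaluating `np.where` over every pixel.

```python
import numpy as np

def apply_salt_and_pepper_sparse(img, salt_mask, pepper_mask):
    """Copy once, then write only the (few) noisy pixels.
    A 2-D boolean mask on an (H, W, C) array hits all channels at once,
    i.e. the mask is shared across channels."""
    out = img.copy()
    out[salt_mask] = 255   # salt: saturate to white
    out[pepper_mask] = 0   # pepper: zero out to black
    return out

rng = np.random.default_rng(0)
img = rng.integers(1, 255, size=(64, 64, 3), dtype=np.uint8)
salt = rng.random((64, 64)) < 0.02
pepper = (~salt) & (rng.random((64, 64)) < 0.02)
noisy = apply_salt_and_pepper_sparse(img, salt, pepper)
```

At ~2% density per mask, the boolean writes touch only a few percent of pixels, which is where the copy-plus-indexing approach wins over a dense `np.where`.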
Defocus kernel:
- Uncached kernel build vs repeated `create_defocus_kernel` (cached + `.copy()`): ~9.3× on repeated `(radius=5, alias_blur=0.3)` calls.
- Full `512×512×3` defocus: ~1.01×; `filter2D` dominates, and kernel caching mainly removes redundant Gaussian-blur-of-disk work when the same `(radius, alias_blur)` repeats.
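The cache-plus-copy pattern can be sketched as below. The production kernel is a Gaussian-blurred (aliased) disk; this NumPy-only stand-in uses a plain disk, and the names `_disk_kernel_cached`/`create_disk_kernel` are illustrative.

```python
import numpy as np
from functools import lru_cache

@lru_cache(maxsize=None)
def _disk_kernel_cached(radius: int) -> np.ndarray:
    # Normalized binary disk; the real kernel additionally blurs this
    # disk (alias_blur) before normalizing.
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    kernel = (x * x + y * y <= radius * radius).astype(np.float32)
    return kernel / kernel.sum()

def create_disk_kernel(radius: int) -> np.ndarray:
    # Return a copy so callers (or cv2.filter2D) can never mutate the cache.
    return _disk_kernel_cached(radius).copy()

k1 = create_disk_kernel(5)
k1[0, 0] = 123.0             # mutate the caller's copy...
k2 = create_disk_kernel(5)   # ...the cached kernel is unaffected
```

Returning a `.copy()` is what keeps the cache safe: without it, an in-place write by any consumer would silently corrupt every later call with the same arguments.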
Misc
- Compose-level docs/skills already state channel-last tensors with an explicit channel dimension; release packaging is unchanged besides the version bump (`pyproject.toml` + `uv.lock`).
Commits
| Commit | PR | Description |
|---|---|---|
| fba6836 | #246 | chore: bump version to 2.2.3 |
| 4c6bb38 | #245 | perf: CV transpose/rotate, dropout masks, salt/pepper, cached defocus kernels |
| 4afd980 | #244 | docs(geometric): remove PiecewiseAffine slow warning |