Summary

Patch release focused on functional-layer performance for common geometric and pixel/noise paths, plus a small geometric-doc cleanup. No API changes.

  • transpose / rot90 (albumentations.augmentations.geometric.functional): H/W-swapping transforms now use OpenCV (cv2.transpose / cv2.rotate) for channel-last (H, W, C) images when C ∈ {1, 3, 4}; other layouts fall back to NumPy, unchanged for volumes and high channel counts.
  • PixelDropout: faster shared-mask generation (get_drop_mask with per_channel=False) via vectorized RNG compares instead of choice; mask broadcasting unchanged for downstream apply.
  • SaltAndPepper: sparse multi-channel noise uses copy + boolean indexing when total noisy fraction is low; dense noise stays on np.where.
  • Defocus: aliased-disk kernel construction cached (lru_cache); create_defocus_kernel() still returns an independent copy per call; defocus() passes a copy into convolution so cached tensors are never mutated by filter2D.
  • Docs: removed misleading “slow” framing from PiecewiseAffine docstring (#244).

Breaking changes

None.


New features

None.


Bug fixes

None.


Performance

Benchmarks were run locally against development builds on this repo (python driver importing albumentations.augmentations.*.functional). Raw timings (seconds per call, min-of-repeats) and ratios are archived at _internal/release_notes/BENCHMARK_RESULTS_2.2.3.json.

Methodology

  • Geometric (transpose, rot90): baseline is NumPy transpose / np.rot90 followed by np.ascontiguousarray so the comparison measures materialized contiguous output, matching workloads that immediately consume arrays with contiguous layout (similar to enforcing contiguous tensors downstream). Pure NumPy transpose/rot90 without materialization returns views and is not comparable to OpenCV’s contiguous output on equal footing.
    • Reported speedups below are for C ∈ {1, 3} (the OpenCV fast path). For C = 5, production code takes the NumPy fallback; ratios against a forced-copy baseline are not meaningful when the fallback returns a non-materialized array, so no headline numbers are reported for C = 5.
  • get_drop_mask (per_channel=False): baseline uses numpy.random.Generator.choice + np.repeat on (H, W) (pre-change behavior).
  • apply_salt_and_pepper: baseline is stacked np.where only; salt and pepper masks are drawn independently (numpy.random) at ~2% density each and shared across channels, typical for default-ish SaltAndPepper amounts.
  • defocus: compares uncached kernel construction + convolve vs cached kernel API (create_defocus_kernel); full-image timing dominated by cv2.filter2D.
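
The min-of-repeats timing and the materialized-contiguous geometric baseline can be reproduced with a small timeit driver; the bench helper below is a minimal sketch under that methodology, not the archived benchmark driver itself.

```python
import timeit

import numpy as np


def bench(fn, *args, repeats: int = 5, number: int = 20) -> float:
    """Seconds per call: min over `repeats` batches of `number` calls."""
    timer = timeit.Timer(lambda: fn(*args))
    return min(timer.repeat(repeat=repeats, number=number)) / number


# Materialized-contiguous NumPy baseline for the geometric comparison:
# np.ascontiguousarray forces a copy, so the baseline does the same
# materialization work that OpenCV's contiguous output already includes.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
baseline_s = bench(lambda a: np.ascontiguousarray(a.transpose(1, 0, 2)), img)
```

Without the np.ascontiguousarray step the NumPy side returns a view in near-zero time, which is why raw view-returning transpose/rot90 is not a comparable baseline.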

Representative speedups (materialized geometric baseline)

Shape         Transpose (baseline → current)   rot90, 90° (baseline → current)
512×512×3     ~5.1×                            ~4.9×
1024×1024×3   ~4.3×                            ~4.7×
512×512×1     ~1.6×                            ~2.1×
1024×1024×1   ~2.1×                            ~2.1×

get_drop_mask (per_channel=False, dropout_prob=0.2): ~2.4×–2.8× faster than choice + repeat across sizes 256–1024 with C ∈ {1, 3, 5} (mean ~2.47×).
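
The vectorized-compare approach behind that speedup can be sketched as below. This is an illustrative sketch of the per_channel=False path, assuming the broadcast-over-channels return shape; the real get_drop_mask signature and broadcasting details may differ.

```python
import numpy as np


def get_drop_mask_shared(
    shape: tuple, dropout_prob: float, rng: np.random.Generator
) -> np.ndarray:
    """One (H, W) Bernoulli draw via a vectorized comparison.

    A single comparison against uniform samples replaces the slower
    rng.choice(...) + np.repeat pattern; the mask is then broadcast
    over the channel axis without copying any data.
    """
    mask_2d = rng.random(shape[:2]) < dropout_prob
    if len(shape) == 2:
        return mask_2d
    # Read-only broadcast view: every channel shares the same mask.
    return np.broadcast_to(mask_2d[..., None], shape)
```

Because np.broadcast_to returns a view, downstream apply code sees a full (H, W, C) boolean mask while only (H, W) random numbers were ever generated.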

apply_salt_and_pepper (sparse masks):

Shape         Speedup vs stacked np.where
1024×1024×3   ~1.57×
1024×1024×5   ~2.55×

Single-channel grayscale stays roughly neutral (still pure np.where).
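
The sparse-versus-dense dispatch can be sketched as follows. This is a minimal sketch, not the library's implementation: the 0.1 density cutoff is an illustrative placeholder for whatever heuristic the real code uses, and the (H, W) shared-mask shape matches the benchmark setup above.

```python
import numpy as np


def apply_salt_and_pepper(
    img: np.ndarray, salt_mask: np.ndarray, pepper_mask: np.ndarray
) -> np.ndarray:
    """Sketch: masks are (H, W) booleans shared across channels."""
    if salt_mask.mean() + pepper_mask.mean() < 0.1:  # illustrative cutoff
        # Sparse path: copy once, then touch only the noisy pixels.
        out = img.copy()
        out[salt_mask] = 255
        out[pepper_mask] = 0
        return out
    # Dense path: branchless np.where over the whole array.
    if img.ndim == 3:
        salt_mask = salt_mask[..., None]
        pepper_mask = pepper_mask[..., None]
    return np.where(salt_mask, 255, np.where(pepper_mask, 0, img)).astype(
        img.dtype, copy=False
    )
```

Boolean indexing on a (H, W, C) array with an (H, W) mask assigns all channels of each selected pixel, which is why the sparse path wins for multi-channel images when only a few percent of pixels are noisy.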

Defocus kernel:

  • Uncached kernel build vs repeated create_defocus_kernel (cached + .copy()): ~9.3× on repeated (radius=5, alias_blur=0.3) calls.
  • Full 512×512×3 defocus: ~1.01×; filter2D dominates, so kernel caching mainly removes redundant Gaussian-blur-of-disk work when the same (radius, alias_blur) pair repeats.

Misc

  • Compose-level documentation already states that tensors are channel-last with an explicit channel dimension; release packaging is unchanged apart from the version bump (pyproject.toml + uv.lock).

Commits

Commit    PR     Description
fba6836   #246   chore: bump version to 2.2.3
4c6bb38   #245   perf: CV transpose/rotate, dropout masks, salt/pepper, cached defocus kernels
4afd980   #244   docs(geometric): remove PiecewiseAffine slow warning