Video Benchmark Results 🔗

Video Benchmark Summary

Video Augmentation Benchmarks 🔗

This directory contains benchmark results for video augmentation libraries.

Overview 🔗

The video benchmarks measure the throughput of several augmentation libraries on video transformations. They compare CPU-based processing (Albumentations) with GPU-accelerated processing (Kornia and torchvision).

Dataset 🔗

The benchmarks use the UCF101 dataset, which contains 13,320 videos from 101 action categories. The videos are realistic, collected from YouTube, and include a wide variety of camera motion, object appearance, pose, scale, viewpoint, and background. This makes it an excellent dataset for benchmarking video augmentation performance across diverse real-world scenarios.

You can download the dataset from: https://www.crcv.ucf.edu/data/UCF101/UCF101.rar

Methodology 🔗

  1. Video Loading: Videos are loaded using library-specific loaders:

    • OpenCV for Albumentations
    • PyTorch tensors for Kornia
  2. Warmup Phase:

    • Performs adaptive warmup until performance variance stabilizes
    • Uses configurable parameters for stability detection
    • Implements early stopping for slow transforms
  3. Measurement Phase:

    • Multiple runs of each transform
    • Measures throughput (videos/second)
    • Calculates statistical metrics (median, standard deviation)
  4. Environment Control:

    • CPU benchmarks are run single-threaded
    • GPU benchmarks utilize the specified GPU device
    • Thread settings are controlled for consistent results
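
The warmup and measurement phases can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual implementation. The parameter names mirror the benchmark_params recorded in the metadata (max_warmup_iterations, warmup_window, warmup_threshold, min_warmup_windows, num_runs), but the function names and the stability criterion (relative spread of window means) are assumptions, and early stopping for slow transforms is left out.

```python
import statistics
import time
from typing import Callable, Sequence, Tuple


def _throughput(transform: Callable, videos: Sequence) -> float:
    """Apply `transform` to every video once and return videos/second."""
    start = time.perf_counter()
    for video in videos:
        transform(video)
    elapsed = time.perf_counter() - start
    return len(videos) / max(elapsed, 1e-12)


def adaptive_warmup(
    transform: Callable,
    videos: Sequence,
    max_warmup_iterations: int = 100,
    warmup_window: int = 5,
    warmup_threshold: float = 0.05,
    min_warmup_windows: int = 3,
) -> int:
    """Run warmup passes until throughput looks stable; return passes used.

    Assumed criterion: the means of the last `min_warmup_windows` windows
    (each `warmup_window` passes long) differ by less than
    `warmup_threshold`, relative to their overall mean.
    """
    history = []
    needed = warmup_window * min_warmup_windows
    for iteration in range(1, max_warmup_iterations + 1):
        history.append(_throughput(transform, videos))
        if len(history) < needed:
            continue  # not enough passes yet to form the required windows
        recent = history[-needed:]
        window_means = [
            statistics.fmean(recent[i:i + warmup_window])
            for i in range(0, needed, warmup_window)
        ]
        spread = max(window_means) - min(window_means)
        if spread / statistics.fmean(window_means) < warmup_threshold:
            return iteration
    return max_warmup_iterations


def measure(
    transform: Callable, videos: Sequence, num_runs: int = 5
) -> Tuple[float, float]:
    """Return (median, standard deviation) of throughput over `num_runs`."""
    runs = [_throughput(transform, videos) for _ in range(num_runs)]
    std = statistics.stdev(runs) if len(runs) > 1 else 0.0
    return statistics.median(runs), std
```

A caller would run adaptive_warmup once per transform and then call measure. On a GPU the timing helper would also need to synchronize the device before reading the clock, which this sketch does not do.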

Hardware Comparison 🔗

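The benchmarks compare:

  • Albumentations: CPU-based processing (single thread)
  • Kornia: GPU-accelerated processing (NVIDIA GPUs)
  • torchvision: GPU-accelerated processing (NVIDIA GPUs)

This provides insight into the trade-offs between CPU and GPU processing for video augmentation. According to the metadata below, the CPU results were collected on a macOS arm64 machine and the GPU results on a Linux x86_64 host with an NVIDIA GeForce RTX 4090, so the comparison spans different machines as well as different devices.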
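
For the single-thread CPU runs, the recorded metadata lists five environment variables set to '1'. A minimal sketch of pinning them from Python is shown here. The helper name pin_threads is hypothetical, and the sketch assumes the variables are set before NumPy, OpenCV or PyTorch are imported, since numerical backends usually read them when they load.

```python
import os

# Variable names as recorded under thread_settings.environment in the metadata.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def pin_threads(num_threads: int = 1) -> dict:
    """Set each threading variable to `num_threads` and return what was set."""
    value = str(num_threads)
    for name in THREAD_ENV_VARS:
        os.environ[name] = value
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


settings = pin_threads(1)
# Library-level limits would follow after import, for example
# cv2.setNumThreads(1) and torch.set_num_threads(1), if those packages are used.
```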

Running the Benchmarks 🔗

To run the video benchmarks:

./run_video_single.sh -l albumentations -d /path/to/videos -o /path/to/output

To run all libraries and generate a comparison:

./run_video_all.sh -d /path/to/videos -o /path/to/output

Benchmark Results 🔗

Video Benchmark Results 🔗

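Each value is a throughput in videos per second; higher is better. The Speedup column is the Albumentations throughput divided by that of the fastest other library for the same transform, so values above 1x mean Albumentations is faster and values below 1x mean it is slower. N/A marks a transform with no result for that library.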

| Transform | albumentations (videos per second), arm (1 core) | kornia (videos per second), NVIDIA GeForce RTX 4090 | torchvision (videos per second), NVIDIA GeForce RTX 4090 | Speedup (Alb/fastest other) |
|---|---|---|---|---|
| Affine | 4.51 ± 0.03 | 21.39 ± 0.05 | 452.58 ± 0.14 | 0.01x |
| AutoContrast | 20.56 ± 0.19 | 21.41 ± 0.02 | 577.72 ± 16.86 | 0.04x |
| Blur | 52.40 ± 1.82 | 20.61 ± 0.06 | N/A | 2.54x |
| Brightness | 187.30 ± 5.12 | 21.85 ± 0.02 | 755.52 ± 435.17 | 0.25x |
| CLAHE | 9.29 ± 0.05 | N/A | N/A | N/A |
| CenterCrop128 | 823.73 ± 20.90 | 70.12 ± 1.29 | 1133.39 ± 234.60 | 0.73x |
| ChannelDropout | 62.00 ± 2.35 | 21.81 ± 0.03 | N/A | 2.84x |
| ChannelShuffle | 23.33 ± 0.10 | 19.99 ± 0.03 | 958.35 ± 0.20 | 0.02x |
| CoarseDropout | 68.63 ± 4.09 | N/A | N/A | N/A |
| ColorJitter | 10.89 ± 0.56 | 18.79 ± 0.03 | 68.75 ± 0.13 | 0.16x |
| Contrast | 189.97 ± 0.73 | 21.69 ± 0.04 | 546.55 ± 13.23 | 0.35x |
| CornerIllumination | 5.92 ± 0.06 | 2.60 ± 0.07 | N/A | 2.27x |
| Elastic | 4.19 ± 0.12 | N/A | 126.83 ± 1.28 | 0.03x |
| Equalize | 13.85 ± 0.09 | 4.21 ± 0.00 | 191.55 ± 1.25 | 0.07x |
| Erasing | 70.64 ± 0.63 | N/A | 254.59 ± 6.57 | 0.28x |
| GaussianBlur | 28.56 ± 0.25 | 21.61 ± 0.05 | 543.44 ± 11.50 | 0.05x |
| GaussianIllumination | 7.98 ± 0.11 | 20.33 ± 0.08 | N/A | 0.39x |
| GaussianNoise | 9.82 ± 0.31 | 22.38 ± 0.08 | N/A | 0.44x |
| Grayscale | 157.71 ± 3.00 | 22.24 ± 0.04 | 838.40 ± 466.76 | 0.19x |
| HSV | 7.58 ± 0.06 | N/A | N/A | N/A |
| HorizontalFlip | 27.59 ± 0.05 | 21.86 ± 0.07 | 977.87 ± 49.03 | 0.03x |
| Hue | 15.57 ± 0.48 | 19.53 ± 0.02 | N/A | 0.80x |
| Invert | 362.78 ± 10.24 | 21.91 ± 0.23 | 843.27 ± 176.00 | 0.43x |
| JpegCompression | 20.67 ± 0.34 | N/A | N/A | N/A |
| LinearIllumination | 5.40 ± 0.03 | 4.29 ± 0.19 | N/A | 1.26x |
| MedianBlur | 13.97 ± 0.12 | 8.39 ± 0.09 | N/A | 1.66x |
| MotionBlur | 36.79 ± 0.76 | N/A | N/A | N/A |
| Normalize | 21.48 ± 0.09 | 21.82 ± 0.02 | 460.80 ± 0.18 | 0.05x |
| OpticalDistortion | 4.71 ± 0.02 | N/A | N/A | N/A |
| Pad | 70.98 ± 0.86 | N/A | 759.68 ± 337.78 | 0.09x |
| Perspective | 4.40 ± 0.06 | N/A | 434.75 ± 0.14 | 0.01x |
| PlankianJitter | 22.56 ± 2.75 | 10.85 ± 0.01 | N/A | 2.08x |
| PlasmaBrightness | 3.42 ± 0.01 | 16.94 ± 0.36 | N/A | 0.20x |
| PlasmaContrast | 2.69 ± 0.01 | 16.97 ± 0.03 | N/A | 0.16x |
| PlasmaShadow | 6.14 ± 0.02 | 19.03 ± 0.50 | N/A | 0.32x |
| Posterize | 65.43 ± 0.61 | N/A | 631.46 ± 14.74 | 0.10x |
| RGBShift | 34.41 ± 0.18 | 22.27 ± 0.04 | N/A | 1.55x |
| Rain | 24.70 ± 0.49 | 3.77 ± 0.00 | N/A | 6.55x |
| RandomCrop128 | 766.40 ± 17.69 | 65.33 ± 0.35 | 1132.79 ± 15.23 | 0.68x |
| RandomGamma | 189.61 ± 3.37 | 21.63 ± 0.02 | N/A | 8.77x |
| RandomResizedCrop | 16.63 ± 1.06 | 6.29 ± 0.03 | 182.09 ± 15.75 | 0.09x |
| Resize | 15.60 ± 0.22 | 5.87 ± 0.03 | 139.96 ± 35.04 | 0.11x |
| Rotate | 28.29 ± 0.32 | 21.53 ± 0.05 | 534.18 ± 0.16 | 0.05x |
| SaltAndPepper | 10.22 ± 0.07 | 8.82 ± 0.12 | N/A | 1.16x |
| Saturation | 9.10 ± 0.10 | 36.56 ± 0.12 | N/A | 0.25x |
| Sharpen | 26.69 ± 0.58 | 17.86 ± 0.03 | 420.09 ± 8.99 | 0.06x |
| Shear | 4.72 ± 0.02 | N/A | N/A | N/A |
| Snow | 13.26 ± 0.10 | N/A | N/A | N/A |
| Solarize | 58.35 ± 0.60 | 20.73 ± 0.02 | 628.42 ± 5.91 | 0.09x |
| ThinPlateSpline | 4.60 ± 0.03 | 44.90 ± 0.67 | N/A | 0.10x |
| VerticalFlip | 391.52 ± 12.26 | 21.96 ± 0.24 | 977.92 ± 5.22 | 0.40x |
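
As a worked example of the Speedup column, the sketch below divides the Albumentations throughput by the highest throughput among the other libraries and treats N/A as missing. The function name and the use of None for N/A are illustrative choices, not part of the benchmark code. The input numbers are copied from the Blur, Affine and CLAHE rows.

```python
from typing import List, Optional


def speedup(albumentations: float, others: List[Optional[float]]) -> Optional[float]:
    """Albumentations throughput divided by the fastest other library.

    Returns None when no other library has a result (shown as N/A).
    """
    available = [value for value in others if value is not None]
    if not available:
        return None
    return albumentations / max(available)


# Throughputs (videos/second) from the table, ordered as [kornia, torchvision].
blur = speedup(52.40, [20.61, None])
affine = speedup(4.51, [21.39, 452.58])
clahe = speedup(9.29, [None, None])

for name, value in [("Blur", blur), ("Affine", affine), ("CLAHE", clahe)]:
    print(name, "N/A" if value is None else f"{value:.2f}x")
# Prints:
# Blur 2.54x
# Affine 0.01x
# CLAHE N/A
```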

Torchvision Metadata 🔗

system_info:
  python_version: 3.12.9 | packaged by Anaconda, Inc. | (main, Feb  6 2025, 18:56:27)
    [GCC 11.2.0]
  platform: Linux-5.15.0-131-generic-x86_64-with-glibc2.31
  processor: x86_64
  cpu_count: '64'
  timestamp: '2025-03-11T11:14:57.765540+00:00'
library_versions:
  torchvision: 0.21.0
  numpy: 2.2.3
  pillow: 11.1.0
  opencv-python-headless: not installed
  torch: 2.6.0
  opencv-python: not installed
thread_settings:
  environment:
    OMP_NUM_THREADS: '1'
    OPENBLAS_NUM_THREADS: '1'
    MKL_NUM_THREADS: '1'
    VECLIB_MAXIMUM_THREADS: '1'
    NUMEXPR_NUM_THREADS: '1'
  opencv: not installed
  pytorch:
    threads: 32
    gpu_available: true
    gpu_device: 0
    gpu_name: NVIDIA GeForce RTX 4090
    gpu_memory_total: 23.55084228515625
    gpu_memory_allocated: 15.05643081665039
  pillow:
    threads: unknown
    simd: false
benchmark_params:
  num_videos: 200
  num_runs: 10
  max_warmup_iterations: 100
  warmup_window: 5
  warmup_threshold: 0.05
  min_warmup_windows: 3
precision: torch.float16
 

Kornia Metadata 🔗

system_info:
  python_version: 3.12.9 | packaged by Anaconda, Inc. | (main, Feb  6 2025, 18:56:27)
    [GCC 11.2.0]
  platform: Linux-5.15.0-131-generic-x86_64-with-glibc2.31
  processor: x86_64
  cpu_count: '64'
  timestamp: '2025-03-11T00:46:14.791885+00:00'
library_versions:
  kornia: 0.8.0
  numpy: 2.2.3
  pillow: 11.1.0
  opencv-python-headless: not installed
  torch: 2.6.0
  opencv-python: not installed
thread_settings:
  environment:
    OMP_NUM_THREADS: '1'
    OPENBLAS_NUM_THREADS: '1'
    MKL_NUM_THREADS: '1'
    VECLIB_MAXIMUM_THREADS: '1'
    NUMEXPR_NUM_THREADS: '1'
  opencv: not installed
  pytorch:
    threads: 32
    gpu_available: true
    gpu_device: 0
    gpu_name: NVIDIA GeForce RTX 4090
    gpu_memory_total: 23.55084228515625
    gpu_memory_allocated: 15.05643081665039
  pillow:
    threads: unknown
    simd: false
benchmark_params:
  num_videos: 200
  num_runs: 5
  max_warmup_iterations: 100
  warmup_window: 5
  warmup_threshold: 0.05
  min_warmup_windows: 3
precision: torch.float16
 

Albumentations Metadata 🔗

system_info:
  python_version: 3.12.8 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 10:37:40)
    [Clang 14.0.6 ]
  platform: macOS-15.1-arm64-arm-64bit
  processor: arm
  cpu_count: '16'
  timestamp: '2025-04-17T02:12:28.247902+00:00'
library_versions:
  albumentations: 2.0.5
  numpy: 2.2.4
  pillow: 11.2.1
  opencv-python-headless: 4.11.0.86
  torch: 2.6.0
  opencv-python: not installed
thread_settings:
  environment:
    OMP_NUM_THREADS: '1'
    OPENBLAS_NUM_THREADS: '1'
    MKL_NUM_THREADS: '1'
    VECLIB_MAXIMUM_THREADS: '1'
    NUMEXPR_NUM_THREADS: '1'
  opencv:
    threads: 1
    opencl: false
  pytorch:
    threads: 1
    gpu_available: false
    gpu_device: null
  pillow:
    threads: unknown
    simd: false
benchmark_params:
  num_videos: 200
  num_runs: 5
  max_warmup_iterations: 100
  warmup_window: 5
  warmup_threshold: 0.05
  min_warmup_windows: 3
 

Analysis 🔗

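The benchmark results show trade-offs between CPU and GPU processing. In the results table, torchvision on the RTX 4090 is the fastest library for every transform it reports, while single-core Albumentations outperforms Kornia on many transforms (for example RandomGamma at 8.77x and Rain at 6.55x). In general terms: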

  • CPU Advantages:
    • Better for simple transformations with low computational complexity
    • No data transfer overhead between CPU and GPU
    • More consistent performance across different transform types
  • GPU Advantages:
    • Significantly faster for complex transformations
    • Better scaling with video resolution
    • More efficient for batch processing

Recommendations 🔗

Based on the benchmark results, we recommend:

  1. For simple transformations on a small number of videos, CPU processing may be sufficient
  2. For complex transformations or batch processing, GPU acceleration provides significant benefits
  3. Consider the specific transformations you need and their relative performance on CPU vs GPU