XYMasking aug in Albumentations πŸ”—

This transform is a generalization of the TimeMasking and FrequencyMasking from torchaudio.

import albumentations as A
from matplotlib import pyplot as plt

/opt/homebrew/Caskroom/miniconda/base/envs/albumentations_examples/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from .autonotebook import tqdm as notebook_tqdm

import numpy as np
import torch
import torchaudio

ModuleNotFoundError Traceback (most recent call last)

Cell In[2], line 1 -- */} 1 import torchaudio 2 import torch 4 import numpy as np

ModuleNotFoundError: No module named 'torchaudio'

def visualize(image):
    plt.figure(figsize=(10, 5))
    plt.axis("off")
    plt.imshow(image)
img = np.load("../images/spectrogram.npy")
img.shape

(128, 256, 4)

visualize(img[:, :, 0])

png

One vertical stripe (time masking) with fixed width, filled with 0 πŸ”—

params1 = {
    "num_masks_x": 1,
    "mask_x_length": 20,
    "fill_value": 0,
}
transform1 = A.Compose([A.XYMasking(**params1, p=1)], strict=True, seed=137)
visualize(transform1(image=img[:, :, 0])["image"])

png

One vertical stripe (time masking) with randomly sampled width, filled with 0 πŸ”—

params2 = {
    "num_masks_x": 1,
    "mask_x_length": (0, 20),  # This line changed from fixed  to a range
    "fill_value": 0,
}
transform2 = A.Compose([A.XYMasking(**params2, p=1)], strict=True, seed=137)
visualize(transform2(image=img[:, :, 0])["image"])

png

Analogous transform in torchaudioΒΆ πŸ”—

spectrogram = torchaudio.transforms.Spectrogram()
masking = torchaudio.transforms.TimeMasking(time_mask_param=20)
masked = masking(torch.from_numpy(img[:, :, 0]))
visualize(masked.numpy())

png

One horizontal stripe (frequency masking) with randomly sampled width, filled with 0 πŸ”—

params3 = {
    "num_masks_y": 1,
    "mask_y_length": (0, 20),
    "fill_value": 0,
}
transform3 = A.Compose([A.XYMasking(**params3, p=1)], strict=True, seed=137)
visualize(transform3(image=img[:, :, 0])["image"])

png

Analogous transform in torchaudio πŸ”—

spectrogram = torchaudio.transforms.Spectrogram()
masking = torchaudio.transforms.FrequencyMasking(freq_mask_param=20)
masked = masking(torch.from_numpy(img[:, :, 0]))
visualize(masked)

png

Several vertical and horizontal stripes πŸ”—

params4 = {
    "num_masks_x": (2, 4),
    "num_masks_y": 5,
    "mask_y_length": 8,
    "mask_x_length": (10, 20),
    "fill_value": 0,
}
transform4 = A.Compose([A.XYMasking(**params4, p=1)], strict=True, seed=137)
visualize(transform4(image=img[:, :, 0])["image"])

png

Application to the image with the number of channels larger than 3, and different fill values for different channels πŸ”—

Transform can work with any number of channels supporing image shapes of

  • Grayscale: (height, width)
  • RGB: (height, width, 3)
  • Multichannel: (heigh, width, num_channels)

For value that is used to fill masking regions you can use:

  • scalar that will be applied to every channel
  • list of numbers equal to the number of channels, so that every channel will have it's own filling value
params5 = {
    "num_masks_x": (2, 4),
    "num_masks_y": 5,
    "mask_y_length": 8,
    "mask_x_length": (20, 30),
    "fill_value": (0, 1, 2, 3),
}
transform5 = A.Compose([A.XYMasking(**params5, p=1)], strict=True, seed=137)
transformed = transform5(image=img)["image"]
fig, axs = plt.subplots(2, 2)
vmin = 0
vmax = 3
 
axs[0, 0].imshow(transformed[:, :, 0], vmin=vmin, vmax=vmax)
axs[0, 0].set_title("Channel 0")
axs[0, 0].axis("off")  # Hide axes for cleaner visualization
 
axs[0, 1].imshow(transformed[:, :, 1], vmin=vmin, vmax=vmax)
axs[0, 1].set_title("Channel 1")
axs[0, 1].axis("off")
 
axs[1, 0].imshow(transformed[:, :, 2], vmin=vmin, vmax=vmax)
axs[1, 0].set_title("Channel 2")
axs[1, 0].axis("off")
 
axs[1, 1].imshow(transformed[:, :, 3], vmin=vmin, vmax=vmax)
axs[1, 1].set_title("Channel 3")
axs[1, 1].axis("off")
 
plt.tight_layout()
 
plt.show()

png