albumentations.augmentations.spectrogram.transform
Members
- class FrequencyMasking
- class TimeMasking
- class TimeReverse
FrequencyMasking class
FrequencyMasking(
    freq_mask_param: int = 30,
    p: float = 0.5
)

Masks a spectrogram in the frequency domain, SpecAugment-style. freq_mask_param sets the maximum mask length. Applies a single mask along the frequency axis; use XYMasking for more flexibility.

This transform masks a random segment along the frequency axis of a spectrogram, implementing the frequency masking technique proposed in the SpecAugment paper. Frequency masking helps train models that are robust to frequency variations and missing spectral information in audio signals.

This is a specialized version of XYMasking configured for frequency masking only. For more advanced use cases (e.g., multiple masks, time masking, or custom fill values), consider using XYMasking directly.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| freq_mask_param | int | 30 | Maximum possible length of the mask in the frequency domain. Must be a positive integer. Length of the mask is uniformly sampled from (0, freq_mask_param). |
| p | float | 0.5 | Probability of applying the transform. |
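The sampling described in the table can be illustrated with a small NumPy sketch. This is not the library's implementation: the helper name `frequency_mask_sketch`, the `(n_freq_bins, n_frames)` layout, the inclusive integer bounds, and the fill value of 0 are all assumptions made for illustration.

```python
import numpy as np


def frequency_mask_sketch(spec, freq_mask_param=30, rng=None):
    """Zero out one random band of frequency bins (rows) in a 2-D spectrogram.

    Hypothetical helper; assumes spec has shape (n_freq_bins, n_frames).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_freq = spec.shape[0]
    # Mask length: uniform integer in [0, freq_mask_param], capped at the bin count.
    length = int(rng.integers(0, min(freq_mask_param, n_freq) + 1))
    # Start row chosen so that the whole band fits inside the spectrogram.
    start = int(rng.integers(0, n_freq - length + 1))
    out = spec.copy()
    out[start:start + length, :] = 0.0  # fill value 0 is an assumption
    return out, (start, length)


spec = np.ones((128, 200), dtype=np.float32)
masked, (start, length) = frequency_mask_sketch(
    spec, freq_mask_param=30, rng=np.random.default_rng(0)
)
```

The masked band spans every time frame, so the model loses a contiguous range of frequencies for the whole clip. A sampled length of 0 leaves the input unchanged.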
References
- SpecAugment paper: https://arxiv.org/abs/1904.08779
- Original implementation: https://pytorch.org/audio/stable/transforms.html#freqmask
TimeMasking class
TimeMasking(
    time_mask_param: int = 40,
    p: float = 0.5
)

Masks a spectrogram in the time domain, SpecAugment-style. time_mask_param sets the maximum mask length. Applies a single mask along the time axis; use XYMasking for more flexibility.

This transform masks a random segment along the time axis of a spectrogram, implementing the time masking technique proposed in the SpecAugment paper. Time masking helps train models that are robust to temporal variations and missing information in audio signals.

This is a specialized version of XYMasking configured for time masking only. For more advanced use cases (e.g., multiple masks, frequency masking, or custom fill values), consider using XYMasking directly.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| time_mask_param | int | 40 | Maximum possible length of the mask in the time domain. Must be a positive integer. Length of the mask is uniformly sampled from (0, time_mask_param). |
| p | float | 0.5 | Probability of applying the transform. |
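As a rough illustration of time masking, the NumPy sketch below blanks one contiguous run of time frames (columns). It is a sketch under stated assumptions rather than the library's code: `time_mask_sketch` is a hypothetical helper, the spectrogram is assumed to be laid out as `(n_freq_bins, n_frames)`, and the masked region is filled with 0.

```python
import numpy as np


def time_mask_sketch(spec, time_mask_param=40, rng=None):
    """Zero out one random run of time frames (columns) in a 2-D spectrogram.

    Hypothetical helper; assumes spec has shape (n_freq_bins, n_frames).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_frames = spec.shape[1]
    # Mask length: uniform integer in [0, time_mask_param], capped at the frame count.
    length = int(rng.integers(0, min(time_mask_param, n_frames) + 1))
    # Start frame chosen so that the whole run fits inside the spectrogram.
    start = int(rng.integers(0, n_frames - length + 1))
    out = spec.copy()
    out[:, start:start + length] = 0.0  # fill value 0 is an assumption
    return out, (start, length)


rng = np.random.default_rng(42)
# Strictly positive values, so any all-zero column must come from the mask.
spec = rng.random((80, 300)).astype(np.float32) + 1.0
masked, (start, length) = time_mask_sketch(spec, time_mask_param=40, rng=rng)
masked_frames = np.flatnonzero((masked == 0).all(axis=0))
```

Here `masked_frames` lists the indices of the blanked frames, which form one contiguous block of at most `time_mask_param` columns.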
References
- SpecAugment paper: https://arxiv.org/abs/1904.08779
- Original implementation: https://pytorch.org/audio/stable/transforms.html#timemask
TimeReverse class
TimeReverse(
    p: float = 0.5
)

Reverses the time axis of a spectrogram (time inversion). An audio-oriented alias for HorizontalFlip, used in AudioCLIP; p controls the probability of applying it.

Time inversion of a spectrogram is analogous to the random flip of an image, an augmentation technique widely used in the visual domain. This can be relevant for audio classification tasks that work with spectrograms. The technique was successfully applied in the AudioCLIP paper, which extended CLIP to handle image, text, and audio inputs.

This transform is implemented as a subclass of HorizontalFlip, since reversing time in a spectrogram (with time on the horizontal axis) is equivalent to flipping the image horizontally.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| p | float | 0.5 | Probability of applying the transform. |
Notes
This transform is functionally identical to HorizontalFlip but provides a more semantically meaningful name when working with spectrograms and other time-series visualizations.
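The equivalence between time reversal and a horizontal flip can be checked with plain NumPy. The sketch below is illustrative only: `time_reverse_sketch` is a hypothetical helper, not the library's class, and it assumes the time axis is the last (horizontal) axis of a 2-D array.

```python
import numpy as np


def time_reverse_sketch(spec, p=0.5, rng=None):
    """Reverse the time axis (columns) of a spectrogram with probability p.

    Hypothetical helper; assumes time runs along axis 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        # Reversing the column order is the same operation as a horizontal flip.
        return np.ascontiguousarray(spec[:, ::-1])
    return spec


spec = np.arange(12, dtype=np.float32).reshape(3, 4)  # 3 frequency bins, 4 frames
always = time_reverse_sketch(spec, p=1.0)
never = time_reverse_sketch(spec, p=0.0)
same_as_flip = np.array_equal(always, np.fliplr(spec))
```

Applying the reversal twice restores the original array, and the frequency content of each frame is untouched; only the frame order changes.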
References
- AudioCLIP paper: https://arxiv.org/abs/2106.13043
- Audiomentations: https://iver56.github.io/audiomentations/waveform_transforms/reverse/