albumentations.augmentations.geometric.transforms
Apply affine transformations: translation, rotation, scale, shear. Params: scale, translate, rotate, shear, interpolation, fill.
Members
- class Affine
- class GridElasticDeform
- class Morphological
- class Perspective
- class RandomGridShuffle
- class ShiftScaleRotate
Affine class
Affine(
scale: tuple[float, float] | dict[str, tuple[float, float]] = (1.0, 1.0),
translate_percent: tuple[float, float] | dict[str, tuple[float, float]] | None = None,
translate_px: tuple[int, int] | dict[str, tuple[int, int]] | None = None,
rotate: tuple[float, float] = (0.0, 0.0),
shear: tuple[float, float] | dict[str, tuple[float, float]] = (0.0, 0.0),
interpolation: 0 | 1 | 2 | 3 | 4 = 1,
mask_interpolation: 0 | 1 | 2 | 3 | 4 = 0,
fit_output: bool = False,
keep_ratio: bool = True,
rotate_method: 'largest_box' | 'ellipse' = 'largest_box',
balanced_scale: bool = False,
border_mode: 0 | 1 | 2 | 3 | 4 = 0,
fill: tuple[float, ...] | float = 0,
fill_mask: tuple[float, ...] | float | None = None,
p: float = 0.5
)
Apply affine transformations: translation, rotation, scale, shear. Params: scale, translate, rotate, shear, interpolation, fill.

Affine transformations involve:
- Translation ("move" image on the x-/y-axis)
- Rotation
- Scaling ("zoom" in/out)
- Shear (move one side of the image, turning a square into a trapezoid)

All such transformations can create "new" pixels in the image without defined content, e.g. if the image is translated to the left, pixels are created on the right. A method has to be defined to deal with these pixel values; the parameters `fill` and `fill_mask` of this class handle this. Some transformations involve interpolation between several pixels of the input image to generate output pixel values; the parameters `interpolation` and `mask_interpolation` deal with the interpolation method used for this.
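A minimal usage sketch (illustrative values) of how a pixel translation creates undefined pixels that `fill` supplies:

>>> import numpy as np
>>> import albumentations as A
>>> img = np.full((4, 4, 3), 255, dtype=np.uint8)  # all-white image
>>> # shift content 2 px to the right; the 2 left-most columns become "new" pixels
>>> t = A.Affine(translate_px={"x": (2, 2), "y": (0, 0)}, fill=0, p=1.0)
>>> out = t(image=img)["image"]
>>> # with the default constant border mode, the newly created left columns are filled with 0 (black)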
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| scale | tuple[float, float] or dict[str, tuple[float, float]] | (1.0, 1.0) | Scaling factor, where `1.0` denotes "no change". If a tuple `(a, b)`, a value is uniformly sampled per image from `[a, b]` and used identically for the x- and y-axis. If a dict with keys `"x"` and/or `"y"`, each entry must itself be an `(a, b)` tuple. When `keep_ratio=True`, the x and y ranges must be identical. |
| translate_percent | tuple[float, float] or dict[str, tuple[float, float]] or None | None | Translation as a fraction of image size, where `0` denotes "no change" and `0.5` denotes "half the axis size". If `None`, equivalent to `0.0` unless `translate_px` is set. If a tuple `(a, b)`, the sampled value applies to both x- and y-axis. If a dict with keys `"x"` and/or `"y"`, each entry must be an `(a, b)` tuple; sampling happens independently per axis. |
| translate_px | tuple[int, int] or dict[str, tuple[int, int]] or None | None | Translation in pixels. Same shape rules as `translate_percent`. If `None`, equivalent to `0` unless `translate_percent` is set. |
| rotate | tuple[float, float] | (0.0, 0.0) | Rotation in degrees (**NOT** radians) around the image center. An `(a, b)` tuple from which the rotation angle is uniformly sampled. |
| shear | tuple[float, float] or dict[str, tuple[float, float]] | (0.0, 0.0) | Shear in degrees (**NOT** radians). If a tuple `(a, b)`, used as `[a, b]` for both x- and y-shear. If a dict with keys `"x"` and/or `"y"`, each entry must be an `(a, b)` tuple; sampling happens independently per axis. |
| interpolation | int | 1 | OpenCV interpolation flag. Default: cv2.INTER_LINEAR. |
| mask_interpolation | int | 0 | OpenCV interpolation flag used for masks. Default: cv2.INTER_NEAREST. |
| fit_output | bool | False | If True, the image plane size and position are adjusted to tightly capture the whole image after the affine transformation (`translate_percent` and `translate_px` are ignored). Otherwise (False), parts of the transformed image may end up outside the image plane. Fitting the output shape is useful to avoid image corners falling outside the image plane after rotations. Default: False. |
| keep_ratio | bool | True | When True, the original aspect ratio is kept when the random scale is applied. Default: True. |
| rotate_method | 'largest_box' or 'ellipse' | 'largest_box' | Rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse" [1]. Default: "largest_box". |
| balanced_scale | bool | False | When True, scaling factors are chosen to be either entirely below or entirely above 1, ensuring balanced scaling. Default: False. Without it, scaling tends to lean towards upscaling: for example, if we want the image to zoom in and out by 2x, we may pick the interval [0.5, 2]. Since the interval [0.5, 1] is half the length of [1, 2], values above 1 are picked twice as often when sampling directly from [0.5, 2]. With `balanced_scale`, half the time the scaling factor is picked from below 1 (zooming out) and the other half from above 1 (zooming in), making zooming in and out more balanced (see the sketch following this table). |
| border_mode | int | 0 | OpenCV border flag. Default: cv2.BORDER_CONSTANT. |
| fill | tuple[float, ...] or float | 0 | The constant value used to fill newly created pixels (e.g. translating by 1 px to the right creates a new 1 px-wide column of pixels on the left of the image). The value is only used when `border_mode` is cv2.BORDER_CONSTANT. The expected value range is `[0, 255]` for `uint8` images. |
| fill_mask | tuple[float, ...] or float or None | None | Same as `fill`, but only for masks. |
| p | float | 0.5 | Probability of applying the transform. Default: 0.5. |
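As a quick, illustrative check of the sampling bias that `balanced_scale` addresses (a sketch, not part of the class API):

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> samples = rng.uniform(0.5, 2.0, 10_000)
>>> frac_upscale = (samples > 1.0).mean()
>>> # frac_upscale is about 2/3: sampling [0.5, 2] directly favors upscaling,
>>> # which is what balanced_scale=True counteracts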
Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare sample data
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> bboxes = np.array([[10, 10, 50, 50], [40, 40, 80, 80]], dtype=np.float32)
>>> bbox_labels = [1, 2]
>>> keypoints = np.array([[20, 30], [60, 70]], dtype=np.float32)
>>> keypoint_labels = [0, 1]
>>>
>>> # Define transform with different parameter types
>>> transform = A.Compose([
... A.Affine(
... # Tuple for scale (will be used for both x and y)
... scale=(0.8, 1.2),
... # Dictionary with tuples for different x/y translations
... translate_percent={"x": (-0.2, 0.2), "y": (-0.1, 0.1)},
... # Tuple for rotation range
... rotate=(-30, 30),
... # Dictionary with tuples for different x/y shearing
... shear={"x": (-10, 10), "y": (-5, 5)},
... # Interpolation methods
... interpolation=cv2.INTER_LINEAR,
... mask_interpolation=cv2.INTER_NEAREST,
... # Other parameters
... fit_output=False,
... keep_ratio=True,
... rotate_method="largest_box",
... balanced_scale=True,
... border_mode=cv2.BORDER_CONSTANT,
... fill=0,
... fill_mask=0,
... p=1.0
... ),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_labels']),
... keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_labels']))
>>>
>>> # Apply the transform
>>> transformed = transform(
... image=image,
... mask=mask,
... bboxes=bboxes,
... bbox_labels=bbox_labels,
... keypoints=keypoints,
... keypoint_labels=keypoint_labels
... )
>>>
>>> # Get the transformed data
>>> transformed_image = transformed['image'] # Image with affine transforms applied
>>> transformed_mask = transformed['mask'] # Mask with affine transforms applied
>>> transformed_bboxes = transformed['bboxes'] # Bounding boxes with affine transforms applied
>>> transformed_bbox_labels = transformed['bbox_labels'] # Labels for transformed bboxes
>>> transformed_keypoints = transformed['keypoints'] # Keypoints with affine transforms applied
>>> transformed_keypoint_labels = transformed['keypoint_labels'] # Labels for transformed keypoints
>>>
>>> # Simpler example with only essential parameters
>>> simple_transform = A.Compose([
... A.Affine(
... scale=(1.1, 1.1),
... rotate=(15, 15),
... translate_px=(30, 30),
... p=1.0
... ),
... ])
>>> simple_result = simple_transform(image=image)
>>> simple_transformed = simple_result['image']
References
- [1] Towards Rotation Invariance in Object Detection: https://arxiv.org/abs/2109.13488
GridElasticDeform class
GridElasticDeform(
num_grid_xy: tuple[int, int],
magnitude: int,
interpolation: 0 | 1 | 2 | 3 | 4 = 1,
mask_interpolation: 0 | 1 | 2 | 3 | 4 = 0,
p: float = 1.0
)
Elastic deformations via a grid: displace control points and interpolate. `num_grid_xy` and `magnitude` control density and strength. Good for local stretching.

This transformation overlays a grid on the input and applies random displacements to the grid points, resulting in local elastic distortions. The granularity and intensity of the distortions are controlled by the dimensions of the overlaid distortion grid and the `magnitude` parameter.
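A minimal sketch contrasting a coarse, strong deformation with a fine, subtle one (parameter values chosen purely for illustration):

>>> import albumentations as A
>>> coarse = A.GridElasticDeform(num_grid_xy=(2, 2), magnitude=20, p=1.0)  # few cells, large displacement
>>> fine = A.GridElasticDeform(num_grid_xy=(8, 8), magnitude=5, p=1.0)     # many cells, small displacement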
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| num_grid_xy | tuple[int, int] | - | Number of grid cells along the width and height. Specified as (grid_width, grid_height). Each value must be greater than 1. |
| magnitude | int | - | Maximum pixel-wise displacement for distortion. Must be greater than 0. |
| interpolation | int | 1 | Interpolation method to be used for the image transformation. Default: cv2.INTER_LINEAR. |
| mask_interpolation | int | 0 | Interpolation method to be used for the mask transformation. Default: cv2.INTER_NEAREST. |
| p | float | 1.0 | Probability of applying the transform. Default: 1.0. |
Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare sample data
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> bboxes = np.array([[10, 10, 50, 50], [40, 40, 80, 80]], dtype=np.float32)
>>> bbox_labels = [1, 2]
>>> keypoints = np.array([[20, 30], [60, 70]], dtype=np.float32)
>>> keypoint_labels = [0, 1]
>>>
>>> # Define transform with parameters as tuples when possible
>>> transform = A.Compose([
... A.GridElasticDeform(
... num_grid_xy=(4, 4),
... magnitude=10,
... interpolation=cv2.INTER_LINEAR,
... mask_interpolation=cv2.INTER_NEAREST,
... p=1.0
... ),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_labels']),
... keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_labels']))
>>>
>>> # Apply the transform
>>> transformed = transform(
... image=image,
... mask=mask,
... bboxes=bboxes,
... bbox_labels=bbox_labels,
... keypoints=keypoints,
... keypoint_labels=keypoint_labels
... )
>>>
>>> # Get the transformed data
>>> transformed_image = transformed['image'] # Elastically deformed image
>>> transformed_mask = transformed['mask'] # Elastically deformed mask
>>> transformed_bboxes = transformed['bboxes'] # Elastically deformed bounding boxes
>>> transformed_bbox_labels = transformed['bbox_labels'] # Labels for transformed bboxes
>>> transformed_keypoints = transformed['keypoints'] # Elastically deformed keypoints
>>> transformed_keypoint_labels = transformed['keypoint_labels'] # Labels for transformed keypoints
Notes
This transformation is particularly useful for data augmentation in medical imaging and other domains where elastic deformations can simulate realistic variations.
Morphological class
Morphological(
scale: tuple[int, int] = (2, 3),
operation: 'erosion' | 'dilation' = 'dilation',
p: float = 0.5
)
Dilation or erosion with a structuring element (`scale`). For document scans: dilation fills gaps in text; erosion removes noise. The operation and kernel size are set via `operation` and `scale`.

Morphological operations modify the structure of the image. Dilation expands the white (foreground) regions in a binary or grayscale image, while erosion shrinks them. These operations are useful in document processing, for example:
- Dilation helps close gaps within text or make thin lines thicker, enhancing legibility for OCR (Optical Character Recognition).
- Erosion can remove small white noise and detach connected objects, making the structure of larger objects more pronounced.
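For intuition, dilation and erosion can be reproduced with plain OpenCV; a minimal standalone sketch (this is plain OpenCV, not this class's API):

>>> import cv2
>>> import numpy as np
>>> img = np.random.randint(0, 256, (50, 50), dtype=np.uint8)
>>> kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
>>> dilated = cv2.dilate(img, kernel)  # expands bright (foreground) regions
>>> eroded = cv2.erode(img, kernel)    # shrinks bright regions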
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| scale | tuple[int, int] | (2, 3) | Specifies the size of the structuring element (kernel) used for the operation. - If an integer is provided, a square kernel of that size will be used. - If a tuple or list is provided, it should contain two integers representing the minimum and maximum sizes for the dilation kernel. |
| operation | 'erosion' or 'dilation' | 'dilation' | The morphological operation to apply. Default: 'dilation'. |
| p | float | 0.5 | The probability of applying this transformation. Default is 0.5. |
Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Create a document-like binary image with text
>>> image = np.ones((200, 500), dtype=np.uint8) * 255 # White background
>>> # Add some "text" (black pixels)
>>> cv2.putText(image, "Document Text", (50, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, 0, 2)
>>> # Add some "noise" (small black dots)
>>> for _ in range(50):
... x, y = np.random.randint(0, image.shape[1]), np.random.randint(0, image.shape[0])
... cv2.circle(image, (x, y), 1, 0, -1)
>>>
>>> # Create a mask representing text regions
>>> mask = np.zeros_like(image)
>>> mask[image < 128] = 1 # Binary mask where text exists
>>>
>>> # Example 1: Apply dilation to thicken text and fill gaps
>>> dilation_transform = A.Morphological(
... scale=(3, 3), # Width and height of the structuring element
... operation="dilation", # Expand white regions (or black if inverted)
... p=1.0 # Always apply
... )
>>> result = dilation_transform(image=image, mask=mask)
>>> dilated_image = result['image'] # Text is thicker, gaps are filled
>>> dilated_mask = result['mask'] # Mask is expanded around text regions
>>>
>>> # Example 2: Apply erosion to thin text or remove noise
>>> erosion_transform = A.Morphological(
... scale=(2, 3), # Width and height of the structuring element
... operation="erosion", # Shrink white regions (or expand black if inverted)
... p=1.0 # Always apply
... )
>>> result = erosion_transform(image=image, mask=mask)
>>> eroded_image = result['image'] # Text is thinner, small noise may be removed
>>> eroded_mask = result['mask'] # Mask is contracted around text regions
>>>
>>> # Note: For document processing, dilation often helps enhance readability for OCR
>>> # while erosion can help remove noise or separate connected components
References
- Nougat: https://github.com/facebookresearch/nougat
Perspective class
Perspective(
scale: tuple[float, float] = (0.05, 0.1),
keep_size: bool = True,
fit_output: bool = False,
interpolation: 0 | 1 | 2 | 3 | 4 = 1,
mask_interpolation: 0 | 1 | 2 | 3 | 4 = 0,
border_mode: 0 | 1 | 2 | 3 | 4 = 0,
fill: tuple[float, ...] | float = 0,
fill_mask: tuple[float, ...] | float = 0,
p: float = 0.5
)
Apply a random four-point perspective transformation. Params: scale, keep_size, border_mode, fill, interpolation. Supports image, mask, bboxes, keypoints.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| scale | tuple[float, float] | (0.05, 0.1) | Standard deviation range `(low, high)` for the normal distribution used to sample random corner displacements (as a fraction of image size). Must be a non-decreasing 2-tuple of non-negative floats. Default: (0.05, 0.1). |
| keep_size | bool | True | Whether to resize image back to its original size after applying the perspective transform. If set to False, the resulting images may end up having different shapes. Default: True. |
| fit_output | bool | False | If True, the image plane size and position will be adjusted to still capture the whole image after perspective transformation. This is followed by image resizing if keep_size is set to True. If False, parts of the transformed image may be outside of the image plane. This setting should not be set to True when using large scale values as it could lead to very large images. Default: False. |
| interpolation | int | 1 | Interpolation method used for the image transformation. Should be one of the OpenCV interpolation types. Default: cv2.INTER_LINEAR. |
| mask_interpolation | int | 0 | Interpolation algorithm used for the mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST. |
| border_mode | int | 0 | OpenCV border mode used for padding. Default: cv2.BORDER_CONSTANT. |
| fill | tuple[float, ...] or float | 0 | Padding value if border_mode is cv2.BORDER_CONSTANT. Default: 0. |
| fill_mask | tuple[float, ...] or float | 0 | Padding value for the mask if border_mode is cv2.BORDER_CONSTANT. Default: 0. |
| p | float | 0.5 | Probability of applying the transform. Default: 0.5. |
Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare sample data
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> bboxes = np.array([[10, 10, 50, 50], [40, 40, 80, 80]], dtype=np.float32)
>>> bbox_labels = [1, 2]
>>> keypoints = np.array([[20, 30], [60, 70]], dtype=np.float32)
>>> keypoint_labels = [0, 1]
>>>
>>> # Define transform with parameters as tuples when possible
>>> transform = A.Compose([
... A.Perspective(
... scale=(0.05, 0.1),
... keep_size=True,
... fit_output=False,
... border_mode=cv2.BORDER_CONSTANT,
... p=1.0
... ),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_labels']),
... keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_labels']))
>>>
>>> # Apply the transform
>>> transformed = transform(
... image=image,
... mask=mask,
... bboxes=bboxes,
... bbox_labels=bbox_labels,
... keypoints=keypoints,
... keypoint_labels=keypoint_labels
... )
>>>
>>> # Get the transformed data
>>> transformed_image = transformed['image'] # Perspective-transformed image
>>> transformed_mask = transformed['mask'] # Perspective-transformed mask
>>> transformed_bboxes = transformed['bboxes'] # Perspective-transformed bounding boxes
>>> transformed_bbox_labels = transformed['bbox_labels'] # Labels for transformed bboxes
>>> transformed_keypoints = transformed['keypoints'] # Perspective-transformed keypoints
>>> transformed_keypoint_labels = transformed['keypoint_labels'] # Labels for transformed keypoints
Notes
This transformation creates a perspective effect by randomly moving the four corners of the image. The amount of movement is controlled by the 'scale' parameter. When 'keep_size' is True, the output image will have the same size as the input image, which may cause some parts of the transformed image to be cut off or padded. When 'fit_output' is True, the transformation ensures that the entire transformed image is visible, which may result in a larger output image if keep_size is False.
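A minimal sketch (illustrative values) of how `keep_size` interacts with output shape:

>>> import numpy as np
>>> import albumentations as A
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> t = A.Perspective(scale=(0.1, 0.1), keep_size=False, fit_output=True, p=1.0)
>>> out = t(image=image)["image"]
>>> # with keep_size=False the result is not resized back, so out.shape may differ from image.shape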
RandomGridShuffle class
RandomGridShuffle(
grid: tuple[int, int] = (3, 3),
p: float = 0.5
)
Split the image into a grid and randomly permute the cells; the same shuffle is applied to all targets. Grid size comes from `grid` (e.g. (3, 3)). Breaks global layout while keeping local content.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| grid | tuple[int, int] | (3, 3) | Size of the grid for splitting the image into cells. Each cell is shuffled randomly. For example, (3, 3) will divide the image into a 3x3 grid, resulting in 9 cells to be shuffled. Default: (3, 3) |
| p | float | 0.5 | Probability that the transform will be applied. Should be in the range [0, 1]. Default: 0.5 |
Examples
>>> import numpy as np
>>> import albumentations as A
>>> # Prepare sample data
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> bboxes = np.array([[10, 10, 50, 50], [40, 40, 80, 80]], dtype=np.float32)
>>> bbox_labels = [1, 2]
>>> keypoints = np.array([[20, 30], [60, 70]], dtype=np.float32)
>>> keypoint_labels = [0, 1]
>>>
>>> # Define transform with grid as a tuple
>>> transform = A.Compose([
... A.RandomGridShuffle(grid=(3, 3), p=1.0),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_labels']),
... keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_labels']))
>>>
>>> # Apply the transform
>>> transformed = transform(
... image=image,
... mask=mask,
... bboxes=bboxes,
... bbox_labels=bbox_labels,
... keypoints=keypoints,
... keypoint_labels=keypoint_labels
... )
>>>
>>> # Get the transformed data
>>> transformed_image = transformed['image'] # Grid-shuffled image
>>> transformed_mask = transformed['mask'] # Grid-shuffled mask
>>> transformed_bboxes = transformed['bboxes'] # Grid-shuffled bounding boxes
>>> transformed_keypoints = transformed['keypoints'] # Grid-shuffled keypoints
>>>
>>> # Visualization example with a simpler grid
>>> simple_image = np.array([
... [1, 1, 1, 2, 2, 2],
... [1, 1, 1, 2, 2, 2],
... [1, 1, 1, 2, 2, 2],
... [3, 3, 3, 4, 4, 4],
... [3, 3, 3, 4, 4, 4],
... [3, 3, 3, 4, 4, 4]
... ])
>>> simple_transform = A.RandomGridShuffle(grid=(2, 2), p=1.0)
>>> simple_result = simple_transform(image=simple_image)
>>> simple_transformed = simple_result['image']
>>> # The result could look like:
>>> # array([[4, 4, 4, 2, 2, 2],
>>> # [4, 4, 4, 2, 2, 2],
>>> # [4, 4, 4, 2, 2, 2],
>>> # [3, 3, 3, 1, 1, 1],
>>> # [3, 3, 3, 1, 1, 1],
>>> # [3, 3, 3, 1, 1, 1]])
Notes
- This transform maintains consistency across all targets. If applied to an image and its corresponding mask or keypoints, the same shuffling will be applied to all.
- The number of cells in the grid should be at least 2 (i.e., grid should be at least (1, 2), (2, 1), or (2, 2)) for the transform to have any effect.
- Keypoints are moved along with their corresponding grid cell.
- This transform can be useful when only micro features are important for the model, and memorizing the global structure could be harmful. For example:
  - Identifying the type of cell phone used to take a picture based on micro artifacts generated by phone post-processing algorithms, rather than the semantic features of the photo. See more at https://ieeexplore.ieee.org/abstract/document/8622031
  - Identifying stress, glucose, or hydration levels based on skin images.
ShiftScaleRotate class
ShiftScaleRotate(
shift_range: tuple[float, float] = (-0.0625, 0.0625),
scale_range: tuple[float, float] = (-0.1, 0.1),
rotate_range: tuple[float, float] = (-45, 45),
interpolation: 0 | 1 | 2 | 3 | 4 = 1,
border_mode: 0 | 1 | 2 | 3 | 4 = 0,
shift_range_x: tuple[float, float] | None = None,
shift_range_y: tuple[float, float] | None = None,
rotate_method: 'largest_box' | 'ellipse' = 'largest_box',
mask_interpolation: 0 | 1 | 2 | 3 | 4 = 0,
fill: tuple[float, ...] | float = 0,
fill_mask: tuple[float, ...] | float = 0,
p: float = 0.5
)
One-step affine: random shift, scale, and rotation. Limits are sampled per call; good for pose or scale augmentation without chaining separate transforms.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| shift_range | tuple[float, float] | (-0.0625, 0.0625) | shift factor range `(low, high)` for both height and width. Absolute values must lie in [-1, 1]. Default: (-0.0625, 0.0625). |
| scale_range | tuple[float, float] | (-0.1, 0.1) | scaling factor range `(low, high)`. Note that this range is biased by 1, so sampling happens from `(1 + low, 1 + high)` (see the sketch following this table). Default: (-0.1, 0.1). |
| rotate_range | tuple[float, float] | (-45, 45) | rotation range `(low, high)`. Default: (-45, 45). |
| interpolation | int | 1 | Flag specifying the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR. |
| border_mode | int | 0 | Flag specifying the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_CONSTANT. |
| shift_range_x | tuple[float, float] or None | None | shift factor range `(low, high)` for width. If set, overrides `shift_range` along the x-axis. Absolute values must lie in [-1, 1]. |
| shift_range_y | tuple[float, float] or None | None | shift factor range `(low, high)` for height. If set, overrides `shift_range` along the y-axis. Absolute values must lie in [-1, 1]. |
| rotate_method | 'largest_box' or 'ellipse' | 'largest_box' | rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse". Default: "largest_box". |
| mask_interpolation | int | 0 | Flag specifying the interpolation algorithm for the mask. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_NEAREST. |
| fill | tuple[float, ...] or float | 0 | padding value if border_mode is cv2.BORDER_CONSTANT. |
| fill_mask | tuple[float, ...] or float | 0 | padding value if border_mode is cv2.BORDER_CONSTANT, applied to masks. |
| p | float | 0.5 | probability of applying the transform. Default: 0.5. |
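A quick sketch of how `scale_range` is interpreted (offsets around 1.0, as noted above; illustrative values):

>>> import albumentations as A
>>> # scale_range=(-0.1, 0.1) means the scale factor is sampled from (0.9, 1.1)
>>> t = A.ShiftScaleRotate(shift_range=(0.0, 0.0), scale_range=(-0.1, 0.1), rotate_range=(0.0, 0.0), p=1.0)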
Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare sample data
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> bboxes = np.array([[10, 10, 50, 50], [40, 40, 80, 80]], dtype=np.float32)
>>> bbox_labels = [1, 2]
>>> keypoints = np.array([[20, 30], [60, 70]], dtype=np.float32)
>>> keypoint_labels = [0, 1]
>>>
>>> # Define transform with parameters as tuples when possible
>>> transform = A.Compose([
... A.ShiftScaleRotate(
... shift_range=(-0.0625, 0.0625),
... scale_range=(-0.1, 0.1),
... rotate_range=(-45, 45),
... interpolation=cv2.INTER_LINEAR,
... border_mode=cv2.BORDER_CONSTANT,
... rotate_method="largest_box",
... p=1.0
... ),
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_labels']),
... keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_labels']))
>>>
>>> # Apply the transform
>>> transformed = transform(
... image=image,
... mask=mask,
... bboxes=bboxes,
... bbox_labels=bbox_labels,
... keypoints=keypoints,
... keypoint_labels=keypoint_labels
... )
>>>
>>> # Get the transformed data
>>> transformed_image = transformed['image'] # Shifted, scaled and rotated image
>>> transformed_mask = transformed['mask'] # Shifted, scaled and rotated mask
>>> transformed_bboxes = transformed['bboxes'] # Shifted, scaled and rotated bounding boxes
>>> transformed_bbox_labels = transformed['bbox_labels'] # Labels for transformed bboxes
>>> transformed_keypoints = transformed['keypoints'] # Shifted, scaled and rotated keypoints
>>> transformed_keypoint_labels = transformed['keypoint_labels'] # Labels for transformed keypoints