Replay and Applied-Parameter Debugging
On this page
- Three Levels of Reproducibility
- Inspect What Happened to One Sample
- Replay One Suspicious Sample
- Log Failed Samples During Training
- Turn Applied Parameters Into a Deterministic Probe
- Limitations
- Related Guides
Augmentation pipelines are stochastic by design. That is useful for training, but it can make debugging hard when one sample produces a suspicious prediction, a broken target, or an unusually high loss. Albumentations gives you three complementary tools for this: fixed seeds, replay, and applied-parameter logging.
Use this guide when you need to answer: "What exactly happened to this sample, and can I reproduce it?"
Three Levels of Reproducibility
| Tool | What it gives you | Use it when |
|---|---|---|
| seed=137 in Compose | A reproducible random sequence for a specific pipeline instance. | You want repeatable experiments, comparable debugging runs, or stable validation preprocessing. |
| ReplayCompose | A replay dictionary that can reproduce the exact sampled augmentation for one input. | You need to rerun the same transformation on a suspicious sample or matching targets. |
| save_applied_params=True in Compose | A compact result["applied_transforms"] list with the transforms that ran and the sampled values they exposed. | You want lightweight per-sample audit logs during training or validation. |
A fixed seed controls the sequence produced by a pipeline. Replay captures one concrete draw from that sequence. Applied-parameter logging records what happened so you can inspect or aggregate it later.
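The distinction can be sketched with plain NumPy generators. This is an analogy for the mechanism, not Albumentations internals: a seed fixes the entire sequence of draws, while a replay corresponds to storing one concrete draw and reusing it.

```python
import numpy as np

# A seed fixes the whole sequence of draws a pipeline instance will make.
rng = np.random.default_rng(137)
first_call = rng.uniform(size=2)   # draws consumed by call 1
second_call = rng.uniform(size=2)  # call 2 gets different draws

# Replay corresponds to storing one concrete draw and reusing it later,
# instead of sampling again.
saved_draw = first_call.copy()

# A fresh generator with the same seed reproduces the full sequence.
rng_again = np.random.default_rng(137)
assert np.array_equal(rng_again.uniform(size=2), saved_draw)
assert not np.array_equal(first_call, second_call)
```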
Inspect What Happened to One Sample
Set save_applied_params=True on Compose. The result dictionary then includes applied_transforms.
```python
import albumentations as A

transform = A.Compose(
    [
        A.RandomCrop(height=256, width=256),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
    ],
    save_applied_params=True,
    seed=137,
)

result = transform(image=image)  # image: uint8 array of shape (H, W, C)
print(result["applied_transforms"])
# [
#     ("RandomCrop", {"shape": (512, 512, 3), "crop_coords": (34, 89, 290, 345), ...}),
#     ("HorizontalFlip", {"shape": (256, 256, 3)}),
# ]
```
The exact fields depend on the transforms in the pipeline. Treat the log as diagnostic data: it tells you which transforms ran and which concrete values were sampled for that call.
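Because each entry is a (name, params) pair, the log is easy to post-process. A minimal sketch of aggregating logs from several samples (the entries below are illustrative, not real output):

```python
from collections import Counter

# Illustrative logs in the (name, params) shape of applied_transforms.
logs = [
    [("RandomCrop", {"crop_coords": (34, 89, 290, 345)}), ("HorizontalFlip", {})],
    [("RandomCrop", {"crop_coords": (0, 0, 256, 256)})],
]

# How often did each transform actually fire across samples?
fired = Counter(name for entries in logs for name, _ in entries)
assert fired["RandomCrop"] == 2
assert fired["HorizontalFlip"] == 1
```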
Replay One Suspicious Sample
Use ReplayCompose when you need exact replay of a sampled transformation. It stores a richer result["replay"] dictionary that can be passed back to A.ReplayCompose.replay(...).
```python
import albumentations as A

transform = A.ReplayCompose(
    [
        A.RandomCrop(height=256, width=256),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
    ],
    seed=137,
)

result = transform(image=image, mask=mask)
replay = result["replay"]

# Later, reproduce the same sampled augmentation on the same sample.
reproduced = A.ReplayCompose.replay(replay, image=image, mask=mask)
```
This is the right tool for investigating one failed sample. Store the replay dictionary next to the sample identifier and model output, then use it to regenerate the exact augmented image and targets for inspection.
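Replay dictionaries may contain tuples and NumPy scalars, so pickle round-trips them more reliably than JSON. A small persistence helper, sketched here with an illustrative `failed_samples/` naming convention (the function names and directory are assumptions, not library API):

```python
import pickle
from pathlib import Path


def save_replay(sample_id, replay, out_dir="failed_samples"):
    # Pickle keeps tuples and NumPy scalars intact; JSON may not round-trip them.
    path = Path(out_dir) / f"{sample_id}.replay.pkl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(replay, f)
    return path


def load_replay(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```

A stored file can later be fed back with A.ReplayCompose.replay(load_replay(path), image=image, mask=mask).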
Log Failed Samples During Training
For ongoing diagnostics, keep logging lightweight. In a PyTorch-style dataset, return the sample ID and applied_transforms alongside the augmented data. If your training loop expects tensors, include that conversion in the transform pipeline (for example, ToTensorV2 from albumentations.pytorch) and return result["image"].
```python
class TrainingDataset:
    def __init__(self, records, transform):
        self.records = records
        self.transform = transform

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        record = self.records[index]
        result = self.transform(image=record["image"])
        return {
            "sample_id": record["id"],
            "image": result["image"],
            "target": record["target"],
            "applied_transforms": result["applied_transforms"],
        }
```
Use a collate function that keeps metadata as ordinary Python lists, then log suspicious samples after computing the loss:
```python
import torch

loss_fn = torch.nn.CrossEntropyLoss(reduction="none")


def collate_debug_batch(samples):
    return {
        "sample_id": [sample["sample_id"] for sample in samples],
        "image": torch.stack([sample["image"] for sample in samples]),
        "target": torch.tensor([sample["target"] for sample in samples]),
        "applied_transforms": [sample["applied_transforms"] for sample in samples],
    }


debug_rows = []
for batch in train_loader:
    outputs = model(batch["image"])
    per_sample_loss = loss_fn(outputs, batch["target"])
    for sample_id, applied_transforms, loss in zip(
        batch["sample_id"],
        batch["applied_transforms"],
        per_sample_loss.detach().cpu().tolist(),
    ):
        # high_loss_threshold: chosen per task, e.g. a high percentile
        # of recent per-sample losses.
        if loss > high_loss_threshold:
            debug_rows.append(
                {
                    "sample_id": sample_id,
                    "applied_transforms": applied_transforms,
                    "loss": loss,
                }
            )
```
Those (sample_id, applied_transforms, loss) triples let you sort by loss, inspect the augmentation history of failed samples, and look for patterns such as too much blur, unrealistic crops, or label-target mismatch.
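Once debug_rows accumulates, a short post-processing pass surfaces those patterns. A sketch with illustrative rows in the shape logged above:

```python
from collections import Counter

# Illustrative rows in the shape logged above.
debug_rows = [
    {"sample_id": "a", "applied_transforms": [("GaussianBlur", {})], "loss": 9.1},
    {"sample_id": "b", "applied_transforms": [("RandomCrop", {})], "loss": 4.2},
    {"sample_id": "c", "applied_transforms": [("GaussianBlur", {})], "loss": 7.8},
]

# Worst samples first.
worst_first = sorted(debug_rows, key=lambda row: row["loss"], reverse=True)
assert worst_first[0]["sample_id"] == "a"

# Which transforms appear most often among high-loss samples?
suspects = Counter(
    name for row in worst_first for name, _ in row["applied_transforms"]
)
assert suspects.most_common(1)[0][0] == "GaussianBlur"
```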
Turn Applied Parameters Into a Deterministic Probe
You can reconstruct a deterministic Compose pipeline from applied_transforms:
```python
result = transform(image=image)
probe = A.Compose.from_applied_transforms(result["applied_transforms"])
probe_result = probe(image=image)
```
This is useful for quick local probes because the reconstructed transforms run with p=1.0 and use the sampled constructor-level values from the original call.
For exact reproduction of all runtime parameters, prefer ReplayCompose. from_applied_transforms(...) is not a full substitute for replay when a transform has internal runtime randomness such as crop positions, dropout masks, or other per-call sampled geometry.
Limitations
Applied-parameter logs help you inspect behavior; they do not choose a policy automatically. If high-loss samples often contain extreme crops or unrealistic color shifts, the log tells you where to look. You still need to decide whether the policy is appropriate for the task, run ablations, and validate the change on held-out data.
Replay and applied-parameter logging also do not replace versioned pipeline configuration. For experiment auditability, keep the augmentation policy, Albumentations version, seed, worker settings, and model artifact together.
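One lightweight convention is to write a single run record that bundles these together; Albumentations offers A.to_dict for serializing a pipeline, which can fill the policy field. A minimal sketch (field names and values are illustrative):

```python
import json


def build_run_record(policy_dict, *, seed, num_workers, library_version, model_path):
    # policy_dict: e.g. the output of A.to_dict(transform).
    return {
        "albumentations_version": library_version,
        "seed": seed,
        "num_workers": num_workers,
        "policy": policy_dict,
        "model_artifact": model_path,
    }


record = build_run_record(
    {"transforms": [{"name": "HorizontalFlip", "p": 0.5}]},  # illustrative policy
    seed=137,
    num_workers=4,
    library_version="2.0.0",
    model_path="checkpoints/epoch_12.pt",
)
serialized = json.dumps(record)
```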