Replay and Applied-Parameter Debugging
On this page
- Three Levels of Reproducibility
- Inspect What Happened to One Sample
- Replay One Suspicious Sample
- Log Failed Samples During Training
- Turn Applied Parameters Into a Deterministic Probe
- Limitations
- Related Guides
Augmentation pipelines are stochastic by design. That is useful for training, but it can make debugging hard when one sample produces a suspicious prediction, a broken target, or an unusually high loss. Albumentations gives you three complementary tools for this: fixed seeds, replay, and applied-parameter logging.
Use this guide when you need to answer: "What exactly happened to this sample, and can I reproduce it?"
Three Levels of Reproducibility
| Tool | What it gives you | Use it when |
|---|---|---|
| seed=137 in Compose | A reproducible random sequence for a specific pipeline instance. | You want repeatable experiments, comparable debugging runs, or stable validation preprocessing. |
| ReplayCompose | A replay dictionary that can reproduce the exact sampled augmentation for one input. | You need to rerun the same transformation on a suspicious sample or matching targets. |
| save_applied_params=True in Compose | A compact result["applied_transforms"] list with the transforms that ran and the sampled values they exposed. | You want lightweight per-sample audit logs during training or validation. |
A fixed seed controls the sequence produced by a pipeline. Replay captures one concrete draw from that sequence. Applied-parameter logging records what happened so you can inspect or aggregate it later.
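The distinction can be sketched with plain NumPy generators. This is an analogy for the mechanism, not Albumentations internals: a seed fixes the entire sequence of draws, while a replay corresponds to storing one concrete draw and reusing it.

```python
import numpy as np

# A seed fixes the whole sequence of draws a pipeline instance will make.
rng = np.random.default_rng(137)
first_call = rng.uniform(size=2)   # draws consumed by call 1
second_call = rng.uniform(size=2)  # call 2 gets different draws

# Replay corresponds to storing one concrete draw and reusing it later,
# instead of sampling again.
saved_draw = first_call.copy()

# A fresh generator with the same seed reproduces the full sequence.
rng_again = np.random.default_rng(137)
assert np.array_equal(rng_again.uniform(size=2), saved_draw)
assert not np.array_equal(first_call, second_call)
```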
Inspect What Happened to One Sample
Set save_applied_params=True on Compose. The result dictionary then includes applied_transforms.
```python
import albumentations as A

transform = A.Compose(
    [
        A.RandomCrop(height=256, width=256),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
    ],
    save_applied_params=True,
    seed=137,
)

result = transform(image=image)  # image: uint8 array of shape (H, W, C)
print(result["applied_transforms"])
# [
#     ("RandomCrop", {"shape": (512, 512, 3), "crop_coords": (34, 89, 290, 345), ...}),
#     ("HorizontalFlip", {"shape": (256, 256, 3)}),
# ]
```
The exact fields depend on the transforms in the pipeline. Treat the log as diagnostic data: it tells you which transforms ran and which concrete values were sampled for that call.
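Because each entry is a (name, params) pair, the log is easy to post-process. A minimal sketch of aggregating logs from several samples (the entries below are illustrative, not real output):

```python
from collections import Counter

# Illustrative logs in the (name, params) shape of applied_transforms.
logs = [
    [("RandomCrop", {"crop_coords": (34, 89, 290, 345)}), ("HorizontalFlip", {})],
    [("RandomCrop", {"crop_coords": (0, 0, 256, 256)})],
]

# How often did each transform actually fire across samples?
fired = Counter(name for entries in logs for name, _ in entries)
assert fired["RandomCrop"] == 2
assert fired["HorizontalFlip"] == 1
```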
Replay One Suspicious Sample
Use ReplayCompose when you need exact replay of a sampled transformation. It stores a richer result["replay"] dictionary that can be passed back to A.ReplayCompose.replay(...).
```python
import albumentations as A

transform = A.ReplayCompose(
    [
        A.RandomCrop(height=256, width=256),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
    ],
    seed=137,
)

result = transform(image=image, mask=mask)
replay = result["replay"]

# Later, reproduce the same sampled augmentation on the same sample.
reproduced = A.ReplayCompose.replay(replay, image=image, mask=mask)
```
This is the right tool for investigating one failed sample. Store the replay dictionary next to the sample identifier and model output, then use it to regenerate the exact augmented image and targets for inspection.
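Replay dictionaries may contain tuples and NumPy scalars, so pickle round-trips them more reliably than JSON. A small persistence helper, sketched here with an illustrative `failed_samples/` naming convention (the function names and directory are assumptions, not library API):

```python
import pickle
from pathlib import Path


def save_replay(sample_id, replay, out_dir="failed_samples"):
    # Pickle keeps tuples and NumPy scalars intact; JSON may not round-trip them.
    path = Path(out_dir) / f"{sample_id}.replay.pkl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(replay, f)
    return path


def load_replay(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```

A stored file can later be fed back with A.ReplayCompose.replay(load_replay(path), image=image, mask=mask).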
Log Failed Samples During Training
For ongoing diagnostics, keep logging lightweight. In a PyTorch-style dataset, return the sample ID and applied_transforms alongside the augmented data. If your training loop expects tensors, include that conversion in the transform pipeline (for example, ToTensorV2 from albumentations.pytorch) and return result["image"].
```python
class TrainingDataset:
    def __init__(self, records, transform):
        self.records = records
        self.transform = transform

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        record = self.records[index]
        result = self.transform(image=record["image"])
        return {
            "sample_id": record["id"],
            "image": result["image"],
            "target": record["target"],
            "applied_transforms": result["applied_transforms"],
        }
```
Use a collate function that keeps metadata as ordinary Python lists, then log suspicious samples after computing the loss:
```python
import torch

loss_fn = torch.nn.CrossEntropyLoss(reduction="none")


def collate_debug_batch(samples):
    return {
        "sample_id": [sample["sample_id"] for sample in samples],
        "image": torch.stack([sample["image"] for sample in samples]),
        "target": torch.tensor([sample["target"] for sample in samples]),
        "applied_transforms": [sample["applied_transforms"] for sample in samples],
    }


debug_rows = []
for batch in train_loader:
    outputs = model(batch["image"])
    per_sample_loss = loss_fn(outputs, batch["target"])
    for sample_id, applied_transforms, loss in zip(
        batch["sample_id"],
        batch["applied_transforms"],
        per_sample_loss.detach().cpu().tolist(),
    ):
        # high_loss_threshold: chosen per task, e.g. a high percentile
        # of recent per-sample losses.
        if loss > high_loss_threshold:
            debug_rows.append(
                {
                    "sample_id": sample_id,
                    "applied_transforms": applied_transforms,
                    "loss": loss,
                }
            )
```
Those (sample_id, applied_transforms, loss) triples let you sort by loss, inspect the augmentation history of failed samples, and look for patterns such as too much blur, unrealistic crops, or label-target mismatch.
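Once debug_rows accumulates, a short post-processing pass surfaces those patterns. A sketch with illustrative rows in the shape logged above:

```python
from collections import Counter

# Illustrative rows in the shape logged above.
debug_rows = [
    {"sample_id": "a", "applied_transforms": [("GaussianBlur", {})], "loss": 9.1},
    {"sample_id": "b", "applied_transforms": [("RandomCrop", {})], "loss": 4.2},
    {"sample_id": "c", "applied_transforms": [("GaussianBlur", {})], "loss": 7.8},
]

# Worst samples first.
worst_first = sorted(debug_rows, key=lambda row: row["loss"], reverse=True)
assert worst_first[0]["sample_id"] == "a"

# Which transforms appear most often among high-loss samples?
suspects = Counter(
    name for row in worst_first for name, _ in row["applied_transforms"]
)
assert suspects.most_common(1)[0][0] == "GaussianBlur"
```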
Turn Applied Parameters Into a Deterministic Probe
You can reconstruct a deterministic Compose pipeline from applied_transforms:
```python
result = transform(image=image)
probe = A.Compose.from_applied_transforms(result["applied_transforms"])
probe_result = probe(image=image)
```
This is useful for quick local probes because the reconstructed transforms run with p=1.0 and use the sampled constructor-level values from the original call.
For exact reproduction of all runtime parameters, prefer ReplayCompose. from_applied_transforms(...) is not a full substitute for replay when a transform has internal runtime randomness such as crop positions, dropout masks, or other per-call sampled geometry.
Limitations
Applied-parameter logs help you inspect behavior; they do not choose a policy automatically. If high-loss samples often contain extreme crops or unrealistic color shifts, the log tells you where to look. You still need to decide whether the policy is appropriate for the task, run ablations, and validate the change on held-out data.
Replay and applied-parameter logging also do not replace versioned pipeline configuration. For experiment auditability, keep the augmentation policy, Albumentations version, seed, worker settings, and model artifact together.
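One lightweight convention is to write a single run record that bundles these together; Albumentations offers A.to_dict for serializing a pipeline, which can fill the policy field. A minimal sketch (field names and values are illustrative):

```python
import json


def build_run_record(policy_dict, *, seed, num_workers, library_version, model_path):
    # policy_dict: e.g. the output of A.to_dict(transform).
    return {
        "albumentations_version": library_version,
        "seed": seed,
        "num_workers": num_workers,
        "policy": policy_dict,
        "model_artifact": model_path,
    }


record = build_run_record(
    {"transforms": [{"name": "HorizontalFlip", "p": 0.5}]},  # illustrative policy
    seed=137,
    num_workers=4,
    library_version="2.0.0",
    model_path="checkpoints/epoch_12.pt",
)
serialized = json.dumps(record)
```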