Rebuilding Docker Images? The Bad News: Only ~1 in 40 Are Truly Reproducible

Container images are now a core unit of software delivery—and a prime target for supply-chain attacks. In theory, reproducible container builds offer a clean integrity check: rebuild an image from its Dockerfile and compare hashes. If the hash matches, you have strong evidence that the published image corresponds to the source.

In “It’s Not Just Timestamps: A Study on Docker Reproducibility” (Oreofe Solarin, 2026), the author puts that promise to the test at scale, with a measurement pipeline applied to 2,000 GitHub repositories containing Dockerfiles. The headline result is blunt: reproducible Docker images are extremely rare in the wild, and the reasons go far beyond timestamps.

What the Study Measured

The pipeline:

  • clones each repo,

  • finds Dockerfiles (preferring a root Dockerfile when possible),

  • attempts clean builds,

  • then checks reproducibility under three increasingly strict lenses:

    1. Bitwise reproducibility: do two clean builds produce identical image digests?

    2. Infra-reproducibility: after “hardening” build infrastructure (e.g., normalized timestamps via SOURCE_DATE_EPOCH), do the digests match?

    3. Content-level checks: for non-matching digests, tools like diffoci and diffoscope identify what actually differs inside the images.

Key Findings: Buildability Is a Bigger Problem Than You Think

Before reproducibility, there’s a more basic hurdle: many Dockerfiles don’t even build reliably in an automated pipeline.

  • Out of 2,000 sampled repos, only 56.1% produced any buildable image.

  • The rest failed due to build errors, repository access issues, or timeouts.

So even basic “can we rebuild this?” often fails in practice.

Reproducibility Results: “As-Is” Bitwise Matches Are Nearly Nonexistent

Among the 1,123 buildable Dockerfiles:

  • Only 2.7% were bitwise reproducible as-is (same Docker toolchain, two clean builds).

  • After infrastructure hardening, reproducibility improved by 18.6 percentage points (to 21.3% overall).

  • Yet 78.7% of buildable Dockerfiles remained non-reproducible, even after removing major infrastructure-level nondeterminism.

Even more striking: the study observed zero cases where images were “semantically identical but hash-different.” In other words, every hash mismatch corresponded to real content differences, not just harmless metadata noise.

What Breaks Reproducibility (Spoiler: It’s Often the Dockerfile’s Fault)

The paper’s central claim is in the title: timestamps aren’t the main villain—or at least, not the only one.

After infrastructure fixes, diff analysis showed recurring culprits largely under developer control:

  • File ordering / formatting differences (78.1%)

  • System logs baked into images (43.3%) — e.g., /var/log/apt/*, dpkg.log

  • Caches and on-disk databases (36.8%) — package caches, font caches, linker caches, etc.

  • Compiled artifacts (20.0%) — ELF binaries, bytecode outputs

  • App-specific generated files (13.0%) — reports, downloaded models, generated assets

  • Random/non-deterministic data (9.4%)

  • Package-manager state (5.6%)

The story here is practical: many Dockerfiles accidentally preserve “runtime junk” (logs, caches, machine IDs, bytecode, nondeterministic build outputs) inside the final image. Those files change between builds, so your digest changes too.
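Several of the culprits above (apt lists, package caches, logs) can be avoided with a single-layer cleanup pattern. The snippet below is a minimal sketch, not an example from the paper; the package name is illustrative, and the log paths are the standard Debian/Ubuntu locations mentioned above.

```dockerfile
# Install and clean up in the SAME RUN instruction: files deleted in a
# later layer still persist (and still differ) in the earlier committed
# layer, so a separate cleanup step does not help the digest.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives \
    && rm -rf /var/log/apt /var/log/dpkg.log
```

Combining install and cleanup in one instruction matters because each Dockerfile instruction commits its own layer, and layers are content-addressed individually.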

A Simple Example: Four Fixes Can Flip the Outcome

The paper shows a small but powerful illustration: a typical Dockerfile using a floating base tag and unpinned dependencies produces different digests across rebuilds. But with a handful of changes—like pinning the base image by digest, pinning package versions, and normalizing build timestamps—the same Dockerfile becomes bit-identical across builds.
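The paper’s exact example isn’t reproduced here, but a sketch of that before/after transformation might look like the following. The digest is an obviously fake placeholder (substitute a real one, e.g. from `docker buildx imagetools inspect`), and the pinned package version is illustrative:

```dockerfile
# Before: floating tag, unpinned install -> digest drifts across rebuilds.
#   FROM python:3.11
#   RUN pip install requests

# After: pin the base image by digest (placeholder shown) and the package
# by version, and suppress bytecode drift.
FROM python:3.11@sha256:0000000000000000000000000000000000000000000000000000000000000000
ENV PYTHONDONTWRITEBYTECODE=1
RUN pip install --no-cache-dir requests==2.32.3
# Build with a fixed timestamp so layer metadata is stable, e.g.:
#   SOURCE_DATE_EPOCH=0 docker buildx build --no-cache .
```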

Practical Dockerfile Guidelines (Actionable Takeaways)

Based on the most common root causes, the study suggests developer-facing best practices that are directly CI-lintable:

  • Pin everything

    • Base images by digest (FROM python:3.11@sha256:…)

    • Packages by version (avoid floating latest-style installs)

  • Clean up aggressively

    • Remove apt lists and caches (/var/lib/apt/lists/*, archives)

    • Remove logs (/var/log/*) when they don’t belong in the final image

    • Avoid leaving build caches (pip/npm caches) behind

  • Reduce non-deterministic outputs

    • Prevent Python bytecode drift (PYTHONDONTWRITEBYTECODE=1 or controlled compilation)

    • Keep generated artifacts out of final layers when possible

  • Use reproducibility-friendly build settings

    • Normalize timestamps with SOURCE_DATE_EPOCH

    • Consider disabling provenance/SBOM if your goal is strict bitwise matching (trade-off: you lose metadata that can be useful for security auditing)
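Assuming a reasonably recent docker buildx (0.10+, which honors `SOURCE_DATE_EPOCH` and supports the provenance/SBOM flags), the last two points plus a bitwise self-check might look like this sketch (image names are arbitrary):

```shell
# Normalize embedded timestamps for BuildKit.
export SOURCE_DATE_EPOCH=0

# Disable provenance/SBOM attestations, which vary per build, then build
# the same Dockerfile twice from scratch.
docker buildx build --no-cache --provenance=false --sbom=false --load -t app:a .
docker buildx build --no-cache --provenance=false --sbom=false --load -t app:b .

# Identical image IDs => bitwise reproducible under this toolchain.
[ "$(docker image inspect -f '{{.Id}}' app:a)" = \
  "$(docker image inspect -f '{{.Id}}' app:b)" ] && echo reproducible
```

This is essentially the study’s first lens (two clean builds, compare digests) applied to your own image; tools like diffoscope or diffoci can then explain any remaining mismatch.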

Why This Matters for Supply-Chain Security

Reproducible builds are often treated as a “simple integrity check.” This paper shows why that check rarely works for containers today: even with hardened infrastructure, most Dockerfiles embed build-to-build drift inside the image.

The implication: if you want hash-based verification to be meaningful in container supply chains, reproducibility has to become a developer habit—and ideally, a CI-enforced standard via linters and automated checks.

In short: it’s not just timestamps. It’s the everyday Dockerfile choices we’ve normalized for years.

source: https://arxiv.org/pdf/2602.17678
