Container images are now a core unit of software delivery—and a prime target for supply-chain attacks. In theory, reproducible container builds offer a clean integrity check: rebuild an image from its Dockerfile and compare hashes. If the hash matches, you have strong evidence that the published image corresponds to the source.
In “It’s Not Just Timestamps: A Study on Docker Reproducibility” (Oreofe Solarin, 2026), the author puts that promise to the test at scale, with a measurement pipeline applied to 2,000 GitHub repositories containing Dockerfiles. The headline result is blunt: reproducible Docker images are extremely rare in the wild, and the reasons go far beyond timestamps.
What the Study Measured
The pipeline:

- clones each repo,
- finds Dockerfiles (preferring a root Dockerfile when possible),
- attempts clean builds,
- then checks reproducibility under three increasingly strict lenses:
  - Bitwise reproducibility: do two clean builds produce identical image digests?
  - Infra-reproducibility: after “hardening” build infrastructure (e.g., normalized timestamps via SOURCE_DATE_EPOCH), do the digests match?
  - Content-level checks: for non-matching digests, tools like diffoci and diffoscope identify what actually differs inside the images.
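The first lens can be sketched in a few lines of shell (an illustration of the idea, assuming Docker is installed; this is not the paper's actual harness):

```shell
# Bitwise check: build the same context twice with caching disabled and
# compare the resulting image IDs (content-addressed sha256 digests).
build_digest() {
  docker build --no-cache -q "$1"   # -q prints only the image ID
}

same_digest() {
  if [ "$1" = "$2" ]; then
    echo "bitwise reproducible"
  else
    echo "non-reproducible"
  fi
}

# Usage (requires a running Docker daemon):
#   d1=$(build_digest .)
#   d2=$(build_digest .)
#   same_digest "$d1" "$d2"
```

When the digests differ, the study's pipeline moves to the third lens and hands both images to diffoci/diffoscope to see what actually changed.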
Key Findings: Buildability Is a Bigger Problem Than You Think
Before reproducibility, there’s a more basic hurdle: many Dockerfiles don’t even build reliably in an automated pipeline.
- Out of 2,000 sampled repos, only 56.1% produced any buildable image.
- The rest failed due to build errors, repository access issues, or timeouts.
So even basic “can we rebuild this?” often fails in practice.
Reproducibility Results: “As-Is” Bitwise Matches Are Nearly Nonexistent
Among the 1,123 buildable Dockerfiles:
- Only 2.7% were bitwise reproducible as-is (same Docker toolchain, two clean builds).
- After infrastructure hardening, reproducibility improved by 18.6 percentage points.
- Yet 78.7% of buildable Dockerfiles remained non-reproducible, even after removing major infrastructure-level nondeterminism.
Even more striking: the study observed zero cases where images were “semantically identical but hash-different.” In other words, most hash mismatches corresponded to real content differences, not just harmless metadata noise.
What Breaks Reproducibility (Spoiler: It’s Often the Dockerfile’s Fault)
The paper’s central claim is in the title: timestamps aren’t the main villain—or at least, not the only one.
After infrastructure fixes, diff analysis showed recurring culprits largely under developer control:
- File ordering / formatting differences (78.1%)
- System logs baked into images (43.3%) — e.g., /var/log/apt/*, dpkg.log
- Caches and on-disk databases (36.8%) — package caches, font caches, linker caches, etc.
- Compiled artifacts (20.0%) — ELF binaries, bytecode outputs
- App-specific generated files (13.0%) — reports, downloaded models, generated assets
- Random/non-deterministic data (9.4%)
- Package-manager state (5.6%)
The story here is practical: many Dockerfiles accidentally preserve “runtime junk” (logs, caches, machine IDs, bytecode, nondeterministic build outputs) inside the final image. Those files change between builds, so your digest changes too.
A Simple Example: Four Fixes Can Flip the Outcome
The paper shows a small but powerful illustration: a typical Dockerfile using a floating base tag and unpinned dependencies produces different digests across rebuilds. But with a handful of changes—like pinning the base image by digest, pinning package versions, and normalizing build timestamps—the same Dockerfile becomes bit-identical across builds.
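A hedged sketch of what such fixes look like in practice (the digest and version numbers below are placeholders of my own, not values from the paper):

```dockerfile
# Before: floating tag and unpinned install; the digest drifts across rebuilds.
#   FROM python:3.11
#   RUN pip install requests

# After: pin the base image by digest and the package by version,
# and keep pip's cache out of the layer.
FROM python:3.11-slim@sha256:<base-image-digest>
ENV PYTHONDONTWRITEBYTECODE=1
RUN pip install --no-cache-dir requests==2.31.0
```

Pairing this with a normalized build timestamp (setting SOURCE_DATE_EPOCH in a BuildKit-based build) removes the remaining time-dependent drift.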
Practical Dockerfile Guidelines (Actionable Takeaways)
Based on the most common root causes, the study suggests developer-facing best practices that are directly CI-lintable:
- Pin everything
  - Base images by digest (FROM python:3.11@sha256:…)
  - Packages by version (avoid floating latest-style installs)
- Clean up aggressively
  - Remove apt lists and caches (/var/lib/apt/lists/*, archives)
  - Remove logs (/var/log/*) when they don’t belong in the final image
  - Avoid leaving build caches (pip/npm caches) behind
- Reduce non-deterministic outputs
  - Prevent Python bytecode drift (PYTHONDONTWRITEBYTECODE=1 or controlled compilation)
  - Keep generated artifacts out of final layers when possible
- Use reproducibility-friendly build settings
  - Normalize timestamps with SOURCE_DATE_EPOCH
  - Consider disabling provenance/SBOM if your goal is strict bitwise matching (trade-off: you lose metadata that can be useful for security auditing)
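Putting the cleanup guidelines together, a reproducibility-conscious Dockerfile might look like this sketch (the digest and version strings are placeholders, and the exact paths depend on your base image):

```dockerfile
FROM debian:bookworm-slim@sha256:<base-image-digest>

# Install pinned packages, then strip apt lists, caches, and logs in the
# SAME RUN instruction so none of that state is committed to a layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl=<pinned-version> \
 && rm -rf /var/lib/apt/lists/* \
           /var/cache/apt/archives/* \
           /var/log/*
```

For the build-settings bullets: recent BuildKit releases accept SOURCE_DATE_EPOCH as a build argument to clamp layer timestamps, and `docker buildx build --provenance=false --sbom=false` disables the attestations that otherwise vary between builds; check your Docker/BuildKit version before relying on either.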
Why This Matters for Supply-Chain Security
Reproducible builds are often treated as a “simple integrity check.” This paper shows why that check rarely works for containers today: even with hardened infrastructure, most Dockerfiles embed build-to-build drift inside the image.
The implication: if you want hash-based verification to be meaningful in container supply chains, reproducibility has to become a developer habit—and ideally, a CI-enforced standard via linters and automated checks.
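As a taste of what “CI-enforced” can mean, here is a minimal check (my own illustration, not the paper's tooling) that flags FROM lines lacking a digest pin:

```shell
# Read a Dockerfile on stdin and print any FROM line without an @sha256: pin.
unpinned_from() {
  grep -iE '^from ' | grep -v '@sha256:' || true
}

# Example:
#   unpinned_from < Dockerfile
```

A real linter would also need to skip multi-stage references like `FROM builder` and handle ARG-based image names; established Dockerfile linters such as hadolint cover far more ground, and the other guidelines (cleanup, cache removal) are similarly greppable in CI.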
In short: it’s not just timestamps. It’s the everyday Dockerfile choices we’ve normalized for years.
source: https://arxiv.org/pdf/2602.17678