git.stella-ops.org/docs/product-advisories/archived/16-Nov-2026 - layer-sbom cache hash reuse.md

Here's a fast, practical idea to speed up container scans: add a hash-based SBOM layer cache keyed by (Docker layer digest + dependency-manifest checksum) so identical inputs skip recomputation and only verify attestations.


What this is (in plain words)

  • Layers are immutable. Each image layer already has a content digest (e.g., sha256:...).
  • Dependency state is declarative. Lockfiles/manifest files (NuGet packages.lock.json, package-lock.json, poetry.lock, go.sum, etc.) summarize deps.
  • If both the layer bytes and the manifest content are identical to something we've scanned before, recomputing the SBOM/VEX is wasted work. We can reuse the previous result (plus a quick signature/attestation check).

Cache key

CacheKey = SHA256(
  concat(
    LayerDigestCanonical,          // e.g., "sha256:abcd..."
    '\n',
    ManifestAlgo,                  // e.g., "sha256"
    ':',
    ManifestChecksum               // hash of lockfile(s) inside the layer FS view
  )
)
  • Optionally include toolchain IDs to prevent cross-version skew:

    • SbomerVersion, ScannerRulesetVersion, FeedsSnapshotId (OSV/NVD feed epoch), PolicyBundleHash.

When it hits

  • Exact same layer + same manifests → return cached SBOM component graph + vuln findings + VEX and re-verify the DSSE/in-toto attestation and timestamps (freshness SLA).
  • Same layer, manifests absent → fall back to byte-level heuristics (package index cache); lower confidence.

Minimal .NET 10 sketch (StellaOps)

public sealed record LayerInput(
    string LayerDigest,            // "sha256:..."
    string? ManifestAlgo,          // "sha256"
    string? ManifestChecksum,      // hex
    string SbomerVersion,
    string RulesetVersion,
    string FeedsSnapshotId,
    string PolicyBundleHash);

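public static class SbomCacheKeys   // class name is illustrative
{
    // Joins all key inputs with '\n' in a fixed order, then hashes the UTF-8 bytes.
    // Absent manifest fields become "" so the remaining fields keep their positions.
    public static string ComputeCacheKey(LayerInput x)
    {
        var s = string.Join("\n", new[]{
            x.LayerDigest,
            x.ManifestAlgo ?? "",
            x.ManifestChecksum ?? "",
            x.SbomerVersion,
            x.RulesetVersion,
            x.FeedsSnapshotId,
            x.PolicyBundleHash
        });
        var hash = System.Security.Cryptography.SHA256.HashData(
            System.Text.Encoding.UTF8.GetBytes(s));
        return Convert.ToHexString(hash);   // uppercase hex
    }
}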

public sealed class SbomCacheEntry
{
    public required string CacheKey { get; init; }
    public required byte[] CycloneDxJson { get; init; }        // gz if large
    public required byte[] VexJson { get; init; }
    public required byte[] AttestationDsse { get; init; }      // for re-verify
    public required DateTimeOffset ProducedAt { get; init; }
    public required string FeedsSnapshotId { get; init; }      // provenance
}

Cache flow (Scanner)

  1. Before scan

    • Extract manifest files from the union FS of the current layer.
    • Hash them (stable newline normalization).
    • Build LayerInput; compute CacheKey.
    • Lookup in ISbomCache.Get(CacheKey).
  2. Hit

    • Verify attestation (keys/policy), check feed epoch still within tolerance, re-sign freshness if policy allows.
    • Emit cached SBOM/VEX downstream; mark provenance as “replayed”.
  3. Miss

    • Run normal analyzers → SBOM → vuln match → VEX lattice.

    • Create in-toto/DSSE attestation.

    • Store SbomCacheEntry and index by:

      • CacheKey (primary),
      • LayerDigest (secondary),
      • (ecosystem, manifestChecksum) for diagnostics.
  4. Invalidation

    • Roll cache on FeedsSnapshotId bumps or RulesetVersion change.
    • TTL optional for emergency revocations; keep attestation+provenance for audit.
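
The hit/miss flow above can be sketched as a small get-or-scan helper. This is a minimal illustration, not the StellaOps implementation: `CachedScan`, `InMemorySbomCache`, and `ScanFlow` are hypothetical names, the entry is a trimmed-down stand-in for `SbomCacheEntry`, and the `verify` and `scan` callbacks stand for the attestation check and the normal analyzer pipeline.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Trimmed-down, hypothetical stand-in for SbomCacheEntry.
public sealed record CachedScan(
    string CacheKey, string FeedsSnapshotId, DateTimeOffset ProducedAt, byte[] Sbom);

public interface ISbomCache
{
    CachedScan? Get(string cacheKey);   // null on miss
    void Put(CachedScan entry);
}

public sealed class InMemorySbomCache : ISbomCache
{
    private readonly ConcurrentDictionary<string, CachedScan> _entries = new();

    public CachedScan? Get(string cacheKey) =>
        _entries.TryGetValue(cacheKey, out var entry) ? entry : null;

    public void Put(CachedScan entry) => _entries[entry.CacheKey] = entry;
}

public static class ScanFlow
{
    // Returns the cached entry when it exists and passes verification;
    // otherwise runs the scan, stores the result, and returns it.
    public static (CachedScan Result, bool Replayed) GetOrScan(
        ISbomCache cache,
        string cacheKey,
        Func<CachedScan, bool> verify,      // attestation + freshness check (assumed)
        Func<string, CachedScan> scan)      // full analyzer run (assumed)
    {
        var hit = cache.Get(cacheKey);
        if (hit is not null && verify(hit))
            return (hit, true);             // provenance: "replayed"

        var fresh = scan(cacheKey);
        cache.Put(fresh);                   // overwrites an entry that failed verification
        return (fresh, false);
    }
}
```

A hit that fails verification is treated like a miss here, so a stale or tampered entry is rescanned and replaced rather than served.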

Storage options

  • Local: content-addressed dir (/var/lib/stellaops/sbom-cache/aa/bb/<cacheKey>.cjson.gz).
  • Remote: Redis or Mongo (GridFS) keyed by cacheKey; attach indexes on LayerDigest, FeedsSnapshotId.
  • OCI artifact: push SBOM/VEX as OCI refs tied to layer digest (helps multi-node CI).
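
For the local option, the aa/bb fan-out can be derived from the first four hex characters of the cache key. The sketch below assumes that reading of the path pattern and assumes keys are lowercased before use (the key function above emits uppercase hex); `SbomCachePaths` is a hypothetical helper name.

```csharp
using System;
using System.IO;

public static class SbomCachePaths
{
    // Maps a hex cache key to <root>/<key[0..2]>/<key[2..4]>/<key>.cjson.gz.
    // Rejecting non-hex input also keeps path separators out of the file name.
    public static string ForKey(string root, string cacheKey)
    {
        var key = cacheKey.ToLowerInvariant();
        if (key.Length < 4 || !IsHex(key))
            throw new ArgumentException("Cache key must be at least 4 hex characters.", nameof(cacheKey));

        return Path.Combine(root, key[..2], key[2..4], key + ".cjson.gz");
    }

    private static bool IsHex(string s)
    {
        foreach (var c in s)
            if (!Uri.IsHexDigit(c)) return false;
        return true;
    }
}
```

Two directory levels keep any single directory from accumulating every entry, which matters on filesystems that slow down with very large directories.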

Attestation verification (quick)

  • On hit: Verify(AttestationDsse, Policy); ensure subject.digest == LayerDigest and that the metadata (FeedsSnapshotId, tool versions) satisfies the required policy.
  • Optional freshness stamp: a tiny, fast “verification attestation” you produce at replay time.
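
The subject.digest == LayerDigest part of that check can be sketched as follows. It assumes the attestation is a DSSE envelope whose base64 `payload` is an in-toto statement with a `subject[].digest` map; `AttestationChecks` is a hypothetical name. Signature verification is deliberately out of scope here and must happen before this result is trusted.

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class AttestationChecks
{
    // Returns true when any statement subject carries the layer's digest.
    // NOTE: checks the subject binding only; it does NOT verify DSSE signatures.
    // Assumes a well-formed envelope with a standard (not URL-safe) base64 payload.
    public static bool SubjectMatchesLayer(string dsseEnvelopeJson, string layerDigest)
    {
        var parts = layerDigest.Split(':', 2);          // "sha256:abcd..." -> algo, hex
        if (parts.Length != 2) return false;
        var algo = parts[0];
        var hex = parts[1];

        using var envelope = JsonDocument.Parse(dsseEnvelopeJson);
        if (!envelope.RootElement.TryGetProperty("payload", out var payloadProp)) return false;
        var payload = payloadProp.GetString();
        if (payload is null) return false;

        var statementJson = Encoding.UTF8.GetString(Convert.FromBase64String(payload));
        using var statement = JsonDocument.Parse(statementJson);
        if (!statement.RootElement.TryGetProperty("subject", out var subjects)
            || subjects.ValueKind != JsonValueKind.Array)
            return false;

        foreach (var subject in subjects.EnumerateArray())
        {
            if (subject.ValueKind == JsonValueKind.Object
                && subject.TryGetProperty("digest", out var digests)
                && digests.ValueKind == JsonValueKind.Object
                && digests.TryGetProperty(algo, out var value)
                && value.ValueKind == JsonValueKind.String
                && string.Equals(value.GetString(), hex, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```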

Edge cases

  • Multi-manifest layers (polyglot): combine checksums in a stable order (e.g., SHA256(man1 + '\n' + man2 + ...)).
  • Runtime-only diffs (no manifest change): include package index snapshot hash if you maintain one.
  • Reproducibility drift: include analyzer version & configuration knobs in the key so the cache never masks rule changes.
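
One way to get a stable combined checksum for polyglot layers is to normalize line endings, hash each manifest, and then hash the per-file results sorted by path. The sketch below is an assumption-laden variant of the SHA256(man1 + '\n' + man2 + ...) idea: including the path in each line and using ordinal path ordering are choices made here, not something the advisory prescribes, and `ManifestChecksums` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class ManifestChecksums
{
    // Hashes one manifest after converting CRLF and lone CR to LF, so the
    // same lockfile yields the same checksum regardless of checkout platform.
    public static string HashManifest(string content)
    {
        var normalized = content.Replace("\r\n", "\n").Replace('\r', '\n');
        return Hex(SHA256.HashData(Encoding.UTF8.GetBytes(normalized)));
    }

    // Combines several manifests: sort by path (ordinal, culture-independent),
    // emit one "path=checksum" line per file, and hash the joined lines.
    public static string Combine(IReadOnlyDictionary<string, string> manifestsByPath)
    {
        var lines = manifestsByPath
            .OrderBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => $"{kv.Key}={HashManifest(kv.Value)}");
        return Hex(SHA256.HashData(Encoding.UTF8.GetBytes(string.Join("\n", lines))));
    }

    private static string Hex(byte[] bytes) =>
        Convert.ToHexString(bytes).ToLowerInvariant();
}
```

Including the path means that moving a lockfile changes the combined checksum; drop it from the line format if only content should matter.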

Why this helps

  • Cold scans compute once; subsequent builds (same base image + same lockfiles) skip minutes of work.
  • Reproducibility becomes measurable: cache hit ratio per repo, per base image, per feed epoch.

Quick tasks to add to StellaOps

  • Implement LayerInput + keying in Scanner.WebService.
  • Add Manifest Harvester step per ecosystem (NuGet, npm, pip/poetry, go, Cargo).
  • Add ISbomCache (local + Mongo/OCI backends) with metrics.
  • Wire attestation re-verify path on hits.
  • Ship a cache report: hit/miss, time saved, reasons for miss (ruleset/feeds changed, manifest changed, new analyzer).
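
The "reasons for miss" column of such a report could be derived by diffing the current key inputs against the last inputs recorded for the same layer digest (the secondary index mentioned earlier). A rough sketch, with `KeyInputs` and `MissReasons` as hypothetical names and the reason strings chosen only for illustration:

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical subset of LayerInput that is relevant for miss diagnostics.
public sealed record KeyInputs(
    string LayerDigest,
    string? ManifestChecksum,
    string SbomerVersion,
    string RulesetVersion,
    string FeedsSnapshotId);

public static class MissReasons
{
    // previous: the last inputs stored for this layer digest, or null if none.
    public static IReadOnlyList<string> Explain(KeyInputs? previous, KeyInputs current)
    {
        if (previous is null)
            return new[] { "layer not seen before" };

        var reasons = new List<string>();
        if (previous.ManifestChecksum != current.ManifestChecksum) reasons.Add("manifest changed");
        if (previous.RulesetVersion != current.RulesetVersion) reasons.Add("ruleset changed");
        if (previous.FeedsSnapshotId != current.FeedsSnapshotId) reasons.Add("feeds changed");
        if (previous.SbomerVersion != current.SbomerVersion) reasons.Add("new analyzer");
        return reasons;   // empty when no tracked input differs
    }
}
```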

If you want, I can draft the actual C# interfaces (cache backend + verifier) and a tiny integration for your existing Sbomer/Vexer services next.