Here’s a compact, practical way to think about **embedding in-toto provenance attestations directly inside your event payloads** (instead of sidecar files), so your vuln/build graph stays temporally consistent.

---

### Why embed?

* **Atomicity:** build → publish → scan → VEX decisions share one event ID and clock; no dangling sidecars.
* **Replayability:** the event stream alone reproduces state (great for offline kits/audits).
* **Causal joins:** vulnerability findings can cite the exact provenance that led to an image/digest.

---

### Event shape (single, self-contained envelope)

```json
{
  "eventId": "01JDN2Q0YB8M…",
  "eventType": "build.provenance.v1",
  "occurredAt": "2025-11-13T10:22:31Z",
  "subject": {
    "artifactPurl": "pkg:docker/acme/api@sha256:…",
    "digest": {"sha256": "…"}
  },
  "provenance": {
    "kind": "in-toto-provenance",
    "dsse": {
      "payloadType": "application/vnd.in-toto+json",
      "payload": "<base64-encoded in-toto Statement>",
      "signatures": [{"keyid": "…", "sig": "…"}]
    },
    "transparency": {
      "rekor": {"logIndex": 123456, "logID": "…", "entryUUID": "…"}
    }
  },
  "sig": {
    "envelope": "dsse",
    "alg": "Ed25519",
    "bundle": {
      "certChain": ["…"],
      "timestamp": "…"
    }
  },
  "meta": {
    "builderId": "https://builder.stella-ops.local/gha",
    "buildInvocationId": "gha-run-457812",
    "slsa": {"level": 3}
  }
}
```

**Notes**

* `provenance.dsse.payload` holds the in-toto Statement (subject, `predicateType`, and predicate), base64-encoded as the DSSE JSON envelope requires.
* Keep both the **artifact digest** (`subject.digest`) and the **Statement subject** (inside the payload), and verify that they match on ingest.

---

### DB model (Mongo-esque)

* `events` collection: one doc per event (above schema).
* **Compound index:** `{ "subject.digest.sha256": 1, "occurredAt": 1 }`
* **Causal index:** `{ "meta.buildInvocationId": 1 }`
* **Uniq guard:** `{ "eventId": 1 }` with `unique: true`

---

### Ingest pipeline (deterministic)

1. **Verify DSSE:** check the signature and the certificate chain against the trusted roots (or an offline trust bundle).
2. **Validate Statement:** subject digests, builder ID, `predicateType`.
3. **Upsert artifact node:** keyed by digest; attach `lastProvenanceEventId`.
4. **Append event:** write once; never mutate (event-sourced).
5. **Emit derived edges:** `(builderId) --built--> (artifact@digest)` with `occurredAt`.

---

### Joining scans to provenance (temporal consistency)

* When a scan event arrives, resolve the **latest provenance event with `occurredAt ≤ scan.occurredAt`** for the same digest.
* Store an edge `(artifact@digest) --scannedWith--> (scanner@version)` with a **pointer to the provenance `eventId`** that policy evaluation used.

---

### Minimal .NET 10 contracts

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Supporting records mirror the JSON envelope above.
public sealed record Digest(string Sha256);
public sealed record DsseSignature(string Keyid, string Sig);
public sealed record RekorEntry(long LogIndex, string LogId, string EntryUuid);
public sealed record Transparency(RekorEntry? Rekor);
public sealed record SigBundle(IReadOnlyList<string> CertChain, string Timestamp);
public sealed record SigMeta(string Envelope, string Alg, SigBundle Bundle);
public sealed record Slsa(int Level);
public sealed record Meta(string BuilderId, string BuildInvocationId, Slsa Slsa);
public sealed record DsseEnvelope(string PayloadType, string Payload, IReadOnlyList<DsseSignature> Signatures);
public sealed record Provenance(string Kind, DsseEnvelope Dsse, Transparency? Transparency);
public sealed record EventSubject(string ArtifactPurl, Digest Digest);
public sealed record EventEnvelope(
  string EventId, string EventType, DateTimeOffset OccurredAt, // DateTimeOffset keeps the UTC offset explicit
  EventSubject Subject, Provenance Provenance, SigMeta Sig, Meta Meta);
public interface IEventVerifier {
  ValueTask VerifyAsync(EventEnvelope ev, CancellationToken ct);
}
public interface IEventIngestor {
  ValueTask IngestAsync(EventEnvelope ev, CancellationToken ct); // verify -> validate -> append -> derive
}
```

---

### Policy hooks (VEX/Trust Algebra)

* **Rule:** “Only trust findings if the scan’s referenced provenance has `builderId ∈ AllowedBuilders` and `SLSA ≥ 3` and `time(scan) − time(prov) ≤ 24h`.”
* **Effect:** drops stale/forged results and aligns all scoring to one timeline.

---

### Migration from sidecars

1. **Dual-write** for one sprint: keep emitting sidecars, but also embed DSSE in events.
2. Add a **backfill job** that wraps historical sidecars into `build.provenance.v1` events (preserve original timestamps).
3. Flip **consumers** (scoring/VEX) to **require `provenance` in the event**; keep the sidecar reader only for legacy imports.

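
The temporal join and the policy rule described earlier can be sketched in a few lines of C#. This is a minimal in-memory illustration, not a definitive implementation: `ProvenanceRef`, `ScanRef`, `TemporalJoin`, and the parameter names are hypothetical, trimmed-down stand-ins for the full envelope, and a real consumer would presumably run the lookup as a sorted, limited query against the `{ "subject.digest.sha256": 1, "occurredAt": 1 }` index rather than scanning a list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical projections carrying only the fields the join and the rule need.
public sealed record ProvenanceRef(
    string EventId, string Sha256, DateTimeOffset OccurredAt, string BuilderId, int SlsaLevel);
public sealed record ScanRef(string EventId, string Sha256, DateTimeOffset OccurredAt);

public static class TemporalJoin
{
    // Latest provenance event for the same digest with OccurredAt <= scan.OccurredAt,
    // or null when no provenance precedes the scan.
    public static ProvenanceRef? Resolve(IEnumerable<ProvenanceRef> provenance, ScanRef scan) =>
        provenance
            .Where(p => p.Sha256 == scan.Sha256 && p.OccurredAt <= scan.OccurredAt)
            .OrderByDescending(p => p.OccurredAt)
            .FirstOrDefault();

    // The policy rule: allowed builder, SLSA >= minSlsa, and scan no more than maxAge
    // (24h by default) after the provenance event. Missing provenance is never trusted.
    public static bool IsTrusted(
        ProvenanceRef? prov, ScanRef scan, ISet<string> allowedBuilders,
        int minSlsa = 3, TimeSpan? maxAge = null)
    {
        if (prov is null) return false;
        var limit = maxAge ?? TimeSpan.FromHours(24);
        return allowedBuilders.Contains(prov.BuilderId)
            && prov.SlsaLevel >= minSlsa
            && scan.OccurredAt - prov.OccurredAt <= limit;
    }
}
```

A consumer would call `Resolve` when a scan event arrives, record the returned `EventId` on the `scannedWith` edge, and pass the same result to `IsTrusted`, so the stored pointer and the policy decision always refer to the same provenance event.
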

---

### Failure & edge cases

* **Oversized payloads:** gzip the DSSE payload and cap the event body (e.g., 512 KB). Store any overflow as a content-addressed blob referenced from `provenance.ref`, and **hash-link** that blob in the event.
* **Multiple subjects:** keep the Statement intact. Key the event by the **primary digest** you care about, but still validate all subjects.

---

### Quick checklist to ship

* [ ] Event schema & JSON schema with strict types (`additionalProperties: false`).
* [ ] DSSE + in-toto validators (offline trust bundles supported).
* [ ] Mongo indexes + append-only writer.
* [ ] Temporal join in scanner consumer (O(log n) lookup via the compound index).
* [ ] VEX rules referencing `event.meta` & `provenance.dsse`.
* [ ] Backfill task for legacy sidecars.
* [ ] Replay test: rebuild graph from events only → identical results.

If you want, I can turn this into ready-to-drop **.proto + C# models**, plus a Mongo migration script and a tiny verifier service.
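
For the oversized-payload case under "Failure & edge cases", the hash-linking step might look like the sketch below. It rests on several assumptions: `PayloadSlot`, `OverflowPolicy`, the `putBlob` callback, and the `sha256:<hex>` address format are all hypothetical, the cap is applied to the payload bytes rather than the whole event body for simplicity, and gzip is left out to keep the example short.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical result: exactly one of InlinePayload (base64) or Ref (content address) is set.
public sealed record PayloadSlot(string? InlinePayload, string? Ref);

public static class OverflowPolicy
{
    // 512 KB, the example cap mentioned in the text.
    public const int MaxInlineBytes = 512 * 1024;

    // Assumed address format: "sha256:" + lowercase hex SHA-256 of the blob bytes.
    public static string ContentAddress(byte[] blob) =>
        "sha256:" + Convert.ToHexString(SHA256.HashData(blob)).ToLowerInvariant();

    // Inline small payloads; push large ones to a blob store and keep only the hash link.
    public static PayloadSlot Place(
        byte[] dssePayload, Action<string, byte[]> putBlob, int maxInlineBytes = MaxInlineBytes)
    {
        if (dssePayload.Length <= maxInlineBytes)
            return new PayloadSlot(Convert.ToBase64String(dssePayload), null);

        var address = ContentAddress(dssePayload);
        putBlob(address, dssePayload); // caller-supplied store, keyed by content address
        return new PayloadSlot(null, address);
    }

    // On read, recompute the hash so a swapped or corrupted blob is rejected.
    public static bool Matches(string reference, byte[] blob) =>
        string.Equals(reference, ContentAddress(blob), StringComparison.Ordinal);
}
```

Because the reference is derived from the bytes themselves, replaying the event stream can detect a missing or altered blob without trusting the blob store.
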