Here’s a small but high-impact product tweak: **add an immutable `graph_revision_id` to every call-graph page and API link**, so any result is citeable and reproducible across time.

---

### Why it matters (quick)

* **Auditability:** you can prove *which* graph produced a finding.
* **Reproducibility:** reruns that change paths won’t “move the goalposts.”
* **Support & docs:** screenshots/links in tickets point to an exact graph state.

### What to add

* **Stable anchor in all URLs:**
  * `https://…/graphs/{graph_id}?rev={graph_revision_id}`
  * `https://…/api/graphs/{graph_id}/nodes?rev={graph_revision_id}`
* **Opaque, content-addressed ID:** e.g., `graph_revision_id = blake3( sorted_edges + cfg + tool_versions + dataset_hashes )`.
* **First-class fields:** store `graph_id` (logical lineage), `graph_revision_id` (immutable), `parent_revision_id` (if derived), `created_at`, `provenance` (feed hashes, toolchain).
* **UI surfacing:** show a copy button “Rev: 8f2d…c9” on graph pages and in the “Share” dialog.
* **Diff affordance:** when `?rev=A` and `?rev=B` are both present, offer “Compare paths (A↔B).”

### Minimal API contract (suggested)

* `GET /api/graphs/{graph_id}` → latest + `latest_revision_id`
* `GET /api/graphs/{graph_id}/revisions/{graph_revision_id}` → immutable snapshot
* `GET /api/graphs/{graph_id}/nodes?rev=…` and `/edges?rev=…`
* `POST /api/graphs/{graph_id}/pin` with `{ graph_revision_id }` to mark “official”
* HTTP `Link` header on all responses: `Link: <…/graphs/{graph_id}/revisions/{graph_revision_id}>; rel="version"`

### How to compute the revision id (deterministic)

* Inputs (all normalized): sorted node/edge sets; build config; tool+model versions; input artifacts (SBOM/VEX/feed) **by hash**; environment knobs (feature flags).
* Serialization: canonical JSON (UTF-8, ordered keys).
* Hash: BLAKE3 or SHA-256 → base58 or hex (shortened in UI, full in API).
* Store alongside a manifest (so you can replay the graph later).
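Of the inputs above, the “sorted node/edge sets” are where nondeterminism most easily slips in (parallel discovery, hash-map iteration order, duplicate edges). Here is a minimal TypeScript sketch of one possible normalization step, assuming hypothetical `CallNode`/`CallEdge` shapes keyed by a string `id`; the names are illustrative and not an existing Stella Ops API.

```typescript
// Hypothetical record shapes for illustration; real nodes/edges carry more fields.
type CallNode = { id: string; label: string };
type CallEdge = { from: string; to: string };

// Compare by UTF-16 code units so ordering never depends on the runtime locale.
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Deduplicate and sort nodes and edges so that two runs discovering the same
// graph in a different order produce identical lists.
function normalizeGraph(
  nodes: CallNode[],
  edges: CallEdge[],
): { nodes: CallNode[]; edges: CallEdge[] } {
  const nodeById = new Map<string, CallNode>();
  for (const n of nodes) {
    const seen = nodeById.get(n.id);
    // Conflicting records for one id would make the result order-dependent, so reject them.
    if (seen !== undefined && seen.label !== n.label) {
      throw new Error(`conflicting records for node ${n.id}`);
    }
    nodeById.set(n.id, { id: n.id, label: n.label });
  }
  const sortedNodes = Array.from(nodeById.values()).sort((a, b) => compareStrings(a.id, b.id));

  // Exact duplicate edges collapse onto the same key.
  const edgeByKey = new Map<string, CallEdge>();
  for (const e of edges) {
    edgeByKey.set(JSON.stringify([e.from, e.to]), { from: e.from, to: e.to });
  }
  const sortedEdges = Array.from(edgeByKey.values()).sort(
    (a, b) => compareStrings(a.from, b.from) || compareStrings(a.to, b.to),
  );
  return { nodes: sortedNodes, edges: sortedEdges };
}

// Usage: the same graph discovered in two different orders normalizes identically.
const runA = normalizeGraph(
  [{ id: "b", label: "B" }, { id: "a", label: "A" }],
  [{ from: "b", to: "a" }, { from: "a", to: "b" }, { from: "a", to: "b" }],
);
const runB = normalizeGraph(
  [{ id: "a", label: "A" }, { id: "b", label: "B" }],
  [{ from: "a", to: "b" }, { from: "b", to: "a" }],
);
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // true
```

Rejecting conflicting duplicates, rather than letting the last record win, keeps the output independent of input order, which is the property the revision ID relies on.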
### Guardrails

* **Never reuse an ID** if any input bit differs.
* **Do not** make it guessable from business data (avoid leaking repo names, paths).
* **Break glass:** if a bad graph must be purged, keep the ID tombstoned (410 Gone) so references don’t silently change.

### Stella Ops touches (concrete)

* **Authority**: add `GraphRevisionManifest` (feeds, lattice/policy versions, scanners, in-toto/DSSE attestations).
* **Scanner/Vexer**: emit deterministic manifests and hand them to Authority for id derivation.
* **Ledger**: record `(graph_id, graph_revision_id, manifest_hash, signatures)`; expose audit query by `graph_revision_id`.
* **Docs & Support**: “Attach your `graph_revision_id`” line in issue templates.

### Tiny UX copy

* On graph page header: `Rev 8f2d…c9` • **Copy** • **Compare** • **Pin**
* Share dialog: “This link freezes today’s state. New runs get a different rev.”

If you want, I can draft the DB table, the manifest JSON schema, and the exact URL/router changes for your .NET 10 services next.

Cool, let’s turn this into something your engineers can actually pick up and implement.

Below is a concrete implementation plan broken down by phases, services, and tickets, with suggested data models, APIs, and tests.

---

## 0. Definitions (shared across teams)

* **Graph ID (`graph_id`)** – Logical identifier for a call graph lineage (e.g., “the call graph for build X of repo Y”).
* **Graph Revision ID (`graph_revision_id`)** – Immutable identifier for a specific snapshot of that graph, derived from a manifest (content-addressed hash).
* **Parent Revision ID (`parent_revision_id`)** – Previous revision in the lineage (if any).
* **Manifest** – Canonical JSON blob that describes *everything* that could affect graph structure or results:
  * Nodes & edges
  * Input feeds and their hashes (SBOM, VEX, scanner output, etc.)
  * Config/policies/feature flags
  * Tool + version (scanner, vexer, authority)

---

## 1. High-Level Architecture Changes

1. **Introduce `graph_revision_id` as a first-class concept** in:
   * Graph storage / Authority
   * Ledger / audit
   * Backend APIs serving call graphs
2. **Derive `graph_revision_id` deterministically** from a manifest via a cryptographic hash.
3. **Expose revision in all graph-related URLs & APIs**:
   * UI: `…/graphs/{graph_id}?rev={graph_revision_id}`
   * API: `…/api/graphs/{graph_id}/revisions/{graph_revision_id}`
4. **Ensure immutability**: once a revision exists, it can never be updated in place, only superseded by new revisions.

---

## 2. Backend: Data Model & Storage

### 2.1. Authority (graph source of truth)

**Goal:** Model graphs and revisions explicitly.

**New / updated tables (example in SQL-ish form):**

1. **Graphs (logical entity)**

   ```sql
   CREATE TABLE graphs (
       id                 UUID PRIMARY KEY,
       created_at         TIMESTAMPTZ NOT NULL DEFAULT NOW(),
       latest_revision_id VARCHAR(128) NULL, -- logical FK to graph_revisions.id (add the constraint once that table exists)
       label              TEXT NULL,         -- optional human label
       metadata           JSONB NULL
   );
   ```

2. **Graph Revisions (immutable snapshots)**

   ```sql
   CREATE TABLE graph_revisions (
       id                 VARCHAR(128) PRIMARY KEY, -- graph_revision_id (hash)
       graph_id           UUID NOT NULL REFERENCES graphs(id),
       parent_revision_id VARCHAR(128) NULL REFERENCES graph_revisions(id),
       created_at         TIMESTAMPTZ NOT NULL DEFAULT NOW(),
       manifest           JSONB NOT NULL,   -- canonical manifest
       provenance         JSONB NOT NULL,   -- tool versions, etc.
       is_pinned          BOOLEAN NOT NULL DEFAULT FALSE,
       pinned_by          UUID NULL,        -- user id
       pinned_at          TIMESTAMPTZ NULL
   );

   CREATE INDEX idx_graph_revisions_graph_id ON graph_revisions(graph_id);
   ```

3. **Call Graph Data (if separate)**

   If you store nodes/edges in separate tables, add a `graph_revision_id` column that references `graph_revisions(id)`:

   ```sql
   ALTER TABLE call_graph_nodes
       ADD COLUMN graph_revision_id VARCHAR(128) NULL REFERENCES graph_revisions(id);
   ALTER TABLE call_graph_edges
       ADD COLUMN graph_revision_id VARCHAR(128) NULL REFERENCES graph_revisions(id);
   ```

   > **Rule:** Nodes/edges for a revision are **never mutated**; a new revision means new rows.

---

### 2.2. Ledger (audit trail)

**Goal:** Every revision gets a ledger record for auditability.

**Table change or new table:**

```sql
CREATE TABLE graph_revision_ledger (
    id                   BIGSERIAL PRIMARY KEY,
    graph_revision_id    VARCHAR(128) NOT NULL,
    graph_id             UUID NOT NULL,
    manifest_hash        VARCHAR(128) NOT NULL,
    manifest_digest_algo TEXT NOT NULL,  -- e.g., "BLAKE3"
    authority_signature  BYTEA NULL,     -- optional
    created_at           TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_grl_revision ON graph_revision_ledger(graph_revision_id);
```

Ledger ingestion happens **after** a revision is stored in Authority, but **before** it is exposed as “current” in the UI.

---

## 3. Backend: Revision Hashing & Manifest

### 3.1. Define the manifest schema

Create a spec (e.g., JSON Schema) used by Scanner/Vexer/Authority.

**Example structure:**

```json
{
  "graph": {
    "graph_id": "uuid",
    "generator": {
      "tool_name": "scanner",
      "tool_version": "1.4.2"
    }
  },
  "inputs": {
    "sbom_hash": "sha256:…",
    "vex_hash": "sha256:…",
    "repos": [
      { "name": "repo-a", "commit": "abc123", "tree_hash": "sha1:…" }
    ]
  },
  "config": {
    "policy_version": "2024-10-01",
    "feature_flags": { "new_vex_engine": true }
  },
  "graph_content": {
    "nodes": [],
    "edges": []
  }
}
```

`nodes` and `edges` are left empty here for brevity; in a real manifest they hold the full lists in canonical sorted order. A per-run value such as a run ID should stay out of the hashed manifest (for example, in `provenance`); otherwise identical inputs never produce the same `graph_revision_id` and the idempotency rules in §5.2 cannot hold.

**Key requirements:**

* All lists that affect the graph (`nodes`, `edges`, `repos`, etc.) must be **sorted deterministically**.
* Keys must be **stable** (no environment-dependent keys, no random IDs).
* All hashes of input artifacts must be included (not raw content).

### 3.2. Hash computation

Language-agnostic algorithm:

1. Normalize the manifest to **canonical JSON** (e.g., following RFC 8785, the JSON Canonicalization Scheme):
   * UTF-8
   * Sorted keys
   * No extra whitespace
2. Hash the bytes using a cryptographic hash (BLAKE3 or SHA-256).
3. Encode as a hex or base58 string.
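To make the three steps above concrete, here is a minimal TypeScript sketch for Node.js. It swaps in SHA-256 from the built-in `node:crypto` module because Node's standard library does not ship BLAKE3, and it emits hex. The `canonicalJson`, `computeGraphRevisionId`, and `validateFormat` names are illustrative. Key ordering (UTF-16 code-unit sort) and number formatting (`JSON.stringify`) follow the same rules as RFC 8785, but this is not a complete JCS implementation: it does not reject `undefined`, `NaN`, or `Infinity`, for example.

```typescript
import { createHash } from "node:crypto";

// Serialize a JSON value with object keys sorted recursively and no whitespace.
// Array order is preserved, so lists (nodes, edges, repos) must already be sorted.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalJson).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const members = Object.keys(obj)
      .sort() // default sort compares UTF-16 code units, independent of locale
      .map((key) => JSON.stringify(key) + ":" + canonicalJson(obj[key]));
    return "{" + members.join(",") + "}";
  }
  // Strings, numbers, booleans and null use the standard JSON encoding.
  return JSON.stringify(value);
}

// Hash the UTF-8 bytes of the canonical form and keep the full hex digest;
// shortening (e.g., "8f2d…c9") should happen only in the UI.
function computeGraphRevisionId(manifest: unknown): string {
  const digestHex = createHash("sha256").update(canonicalJson(manifest), "utf8").digest("hex");
  return "grv_" + digestHex;
}

// Hypothetical format check matching the "grv_" + 64 hex chars produced above.
function validateFormat(graphRevisionId: string): boolean {
  return /^grv_[0-9a-f]{64}$/.test(graphRevisionId);
}

// Usage: the same manifest written with different key order hashes identically.
const m1 = { config: { policy_version: "2024-10-01" }, graph: { graph_id: "g1" } };
const m2 = { graph: { graph_id: "g1" }, config: { policy_version: "2024-10-01" } };
console.log(computeGraphRevisionId(m1) === computeGraphRevisionId(m2)); // true
```

Keeping the full digest as the stored ID (68 characters with the prefix) fits the `VARCHAR(128)` columns and leaves truncation as a purely presentational concern.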
**Pseudocode:**

```pseudo
function compute_graph_revision_id(manifest):
    canonical_json = canonical_json_encode(manifest)   // UTF-8, sorted keys, no whitespace
    digest_bytes   = BLAKE3(canonical_json)
    digest_hex     = hex_encode(digest_bytes)
    return "grv_" + digest_hex                         // full digest; truncate only for display
```

**Ticket:** Implement `GraphRevisionIdGenerator` library (shared):

* `Compute(manifest) -> graph_revision_id`
* `ValidateFormat(graph_revision_id) -> bool`

Make this a **shared library** across Scanner, Vexer, Authority to avoid divergence.

---

## 4. Backend: APIs

### 4.1. Graphs & revisions REST API

**New endpoints (example):**

1. **Get latest graph revision**

   ```http
   GET /api/graphs/{graph_id}

   Response:
   {
     "graph_id": "…",
     "latest_revision_id": "grv_8f2d…c9",
     "created_at": "…",
     "metadata": { … }
   }
   ```

2. **List revisions for a graph**

   ```http
   GET /api/graphs/{graph_id}/revisions
   Query: ?page=1&pageSize=20

   Response:
   {
     "graph_id": "…",
     "items": [
       {
         "graph_revision_id": "grv_8f2d…c9",
         "created_at": "…",
         "parent_revision_id": null,
         "is_pinned": true
       },
       {
         "graph_revision_id": "grv_3a1b…e4",
         "created_at": "…",
         "parent_revision_id": "grv_8f2d…c9",
         "is_pinned": false
       }
     ]
   }
   ```

3. **Get a specific revision (snapshot)**

   ```http
   GET /api/graphs/{graph_id}/revisions/{graph_revision_id}

   Response:
   {
     "graph_id": "…",
     "graph_revision_id": "…",
     "created_at": "…",
     "parent_revision_id": null,
     "manifest": { … },   // optional; may be abbreviated for large graphs
     "provenance": { … }
   }
   ```

4. **Get nodes/edges for a revision**

   ```http
   GET /api/graphs/{graph_id}/nodes?rev={graph_revision_id}
   GET /api/graphs/{graph_id}/edges?rev={graph_revision_id}
   ```

   Behavior:

   * If `rev` is **omitted**, return the nodes/edges of the **latest revision** (`latest_revision_id`) for that `graph_id`.
   * If `rev` is **invalid or unknown**, return `404` (do not fall back to latest).

5. **Pin/unpin a revision (optional for v1)**

   ```http
   POST /api/graphs/{graph_id}/pin
   Body: { "graph_revision_id": "…" }

   DELETE /api/graphs/{graph_id}/pin
   Body: { "graph_revision_id": "…" }
   ```

### 4.2. Backward compatibility

* Existing endpoints like `GET /api/graphs/{graph_id}/nodes` should:
  * Continue working with no `rev` param.
  * Internally resolve to `latest_revision_id`.
* For old records with no revision:
  * Create a synthetic manifest from current stored data.
  * Compute a `graph_revision_id`.
  * Store it and set `latest_revision_id` on the `graphs` row.

---

## 5. Scanner / Vexer / Upstream Pipelines

**Goal:** At the end of a graph build, they produce a manifest and a `graph_revision_id`.

### 5.1. Responsibilities

1. **Scanner/Vexer**:
   * Gather:
     * Tool name/version
     * Input artifact hashes
     * Feature flags / config
     * Graph nodes/edges
   * Construct manifest (according to schema).
   * Compute `graph_revision_id` using the shared library.
   * Send manifest + revision ID to Authority via the internal API in §5.2 (`POST /internal/graphs/{graph_id}/revisions`).
2. **Authority**:
   * Idempotently upsert:
     * `graphs` (if new `graph_id`)
     * `graph_revisions` row (if `graph_revision_id` not yet present)
     * nodes/edges rows keyed by `graph_revision_id`.
   * Update `graphs.latest_revision_id` to the new revision.

### 5.2. Internal API (Authority)

```http
POST /internal/graphs/{graph_id}/revisions
Body:
{
  "graph_revision_id": "…",
  "parent_revision_id": "…",   // optional
  "manifest": { … },
  "provenance": { … },
  "nodes": [ … ],
  "edges": [ … ]
}

Response: 201 Created (or 200 if idempotent)
```

**Rules:**

* If `graph_revision_id` already exists for that `graph_id` with identical `manifest_hash`, treat as **idempotent**.
* If `graph_revision_id` exists but the manifest hash differs → log and reject (bug in hashing).

---

## 6. Frontend / UX Changes

Assuming a SPA (React/Vue/etc.), we’ll treat these as tasks.

### 6.1. URL & routing

* **New canonical URL format** for graph UI:
  * Latest: `/graphs/{graph_id}`
  * Specific revision: `/graphs/{graph_id}?rev={graph_revision_id}`
* Router:
  * Parse `rev` query param.
  * If present, call `GET /api/graphs/{graph_id}/nodes?rev=…`.
  * If not present, call the same endpoint without `rev`; the backend returns the latest revision.

### 6.2. Displaying revision info

* In graph page header:
  * Show truncated revision:
    * `Rev: 8f2d…c9`
  * Buttons:
    * **Copy** → Copies full `graph_revision_id`.
    * **Share** → Copies full URL with `?rev=…`.
  * Optional chip if pinned: `Pinned`.

**Example data model (TS):**

```ts
type GraphRevisionSummary = {
  graphId: string;
  graphRevisionId: string;
  createdAt: string;
  parentRevisionId?: string | null;
  isPinned: boolean;
};
```

### 6.3. Revision list panel (optional but useful)

* Add a side panel or tab: “Revisions”.
* Fetch from `GET /api/graphs/{graph_id}/revisions`.
* Clicking a revision:
  * Navigates to the same page with `?rev={graph_revision_id}`.
  * Preserves other UI state where reasonable.

### 6.4. Diff view (nice-to-have, can be v2)

* UX: “Compare with…” button in header.
* Opens dialog to pick a second revision.
* Backend: add a diff endpoint later, or compute the diff client-side from node/edge lists if feasible.

---

## 7. Migration Plan

### 7.1. Phase 1 – Schema & read-path ready

1. **Add DB columns/tables**:
   * `graphs`, `graph_revisions`, `graph_revision_ledger`.
   * `graph_revision_id` column to `call_graph_nodes` / `call_graph_edges`.
2. **Deploy with no behavior changes**:
   * Default `graph_revision_id` columns NULL.
   * Existing APIs continue to work.

### 7.2. Phase 2 – Backfill existing graphs

1. Write a **backfill job**:
   * For each distinct existing graph:
     * Build a manifest from existing stored data.
     * Compute `graph_revision_id`.
     * Insert into `graphs` & `graph_revisions`.
     * Update nodes/edges for that graph to set `graph_revision_id`.
     * Set `graphs.latest_revision_id`.
2. Log any graphs that can’t be backfilled (corrupt data, etc.) for manual review.
3. After backfill:
   * Add a **NOT NULL** constraint on `graph_revision_id` for nodes/edges (if practical).
   * Ensure all public APIs can fetch revisions without changes from clients.

### 7.3. Phase 3 – Wire up new pipelines

1. Update Scanner/Vexer to construct manifests and compute revision IDs.
2. Update Authority to accept `/internal/graphs/{graph_id}/revisions`.
3. Gradually roll out:
   * Feature flag: `graphRevisionIdFromPipeline`.
   * For flagged runs, use the new pipeline; for others, fall back to old + synthetic revision.

### 7.4. Phase 4 – Frontend rollout

1. Update UI to:
   * Read `rev` from URL (but not required).
   * Show `Rev` in header.
   * Use revision-aware endpoints.
2. Once stable:
   * Update “Share” actions to always include `?rev=…`.

---

## 8. Testing Strategy

### 8.1. Unit tests

* **Hashing library**:
  * Same manifest → same `graph_revision_id`.
  * Different node ordering → same `graph_revision_id`.
  * Tiny manifest change → different `graph_revision_id`.
* **Authority service**:
  * Creating a revision stores `graph_revisions` + nodes/edges with matching `graph_revision_id`.
  * Duplicate revision (same id + manifest) is idempotent.
  * Conflicting manifest with same `graph_revision_id` is rejected.

### 8.2. Integration tests

* Scenario: “Create graph → view in UI”
  * Pipeline produces manifest & revision.
  * Authority persists revision.
  * Ledger logs event.
  * UI shows matching `graph_revision_id`.
* Scenario: “Stable permalinks”
  * Capture a link with `?rev=…`.
  * Rerun pipeline (new revision).
  * Old link still shows original nodes/edges.

### 8.3. Migration tests

* On a sanitized snapshot:
  * Run migration & backfill.
  * Spot-check:
    * Each `graph_id` has exactly one `latest_revision_id`.
    * Node/edge counts before and after match.
    * Manually recompute the hash for a few graphs and compare to the stored `graph_revision_id`.

---

## 9. Security & Compliance Considerations

* **Immutability guarantee**:
  * Don’t allow updates to `graph_revisions.manifest`.
  * Any change must happen by creating a new revision.
* **Tombstoning** (for rare delete cases):
  * If you must “remove” a bad graph, mark the revision as `tombstoned` in an additional column and return `410 Gone` for that `graph_revision_id`.
  * Never reuse that ID.
* **Access control**:
  * Ensure revision APIs use the same ACLs as existing graph APIs.
  * Don’t leak manifests to users not allowed to see underlying artifacts.

---

## 10. Concrete Ticket Breakdown (example)

You can copy/paste this into your tracker and tweak.

1. **BE-01** – Add `graphs` and `graph_revisions` tables
    * AC:
      * Tables exist with fields above.
      * Migrations run cleanly in staging.
2. **BE-02** – Add `graph_revision_id` to nodes/edges tables
    * AC:
      * Column added, nullable.
      * No runtime errors in staging.
3. **BE-03** – Implement `GraphRevisionIdGenerator` library
    * AC:
      * Given a manifest, returns deterministic ID.
      * Unit tests cover ordering, minimal changes.
4. **BE-04** – Implement `/internal/graphs/{graph_id}/revisions` in Authority
    * AC:
      * Stores new revision + nodes/edges.
      * Idempotent on duplicate revisions.
5. **BE-05** – Implement public revision APIs
    * AC:
      * Endpoints in §4.1 available with Swagger.
      * `rev` query param supported.
      * Default behavior returns latest revision.
6. **BE-06** – Backfill existing graphs into `graph_revisions`
    * AC:
      * All existing graphs have `latest_revision_id`.
      * Nodes/edges linked to a `graph_revision_id`.
      * Metrics & logs generated for failures.
7. **BE-07** – Ledger integration for revisions
    * AC:
      * Each new revision creates a ledger entry.
      * Query by `graph_revision_id` works.
8. **PIPE-01** – Scanner/Vexer manifest construction
    * AC:
      * Manifest includes all required fields.
      * Values verified against Authority for a sample run.
9. **PIPE-02** – Scanner/Vexer computes `graph_revision_id` and calls Authority
    * AC:
      * End-to-end pipeline run produces a new `graph_revision_id`.
      * Authority stores it and sets it as latest.
10. **FE-01** – UI supports `?rev=` param and displays revision
    * AC:
      * When URL has `rev`, UI loads that revision.
      * When no `rev`, loads latest.
      * Rev appears in header with copy/share.
11. **FE-02** – Revision list UI (optional)
    * AC:
      * Revision panel lists revisions.
      * Click navigates to appropriate `?rev=`.

---

If you’d like, I can next help you turn this into a very explicit design doc (with diagrams and exact JSON examples) or into ready-to-paste migration scripts / TypeScript interfaces tailored to your actual stack.