up

2025-11-27 15:05:48 +02:00
parent 4831c7fcb0
commit e950474a77
278 changed files with 81498 additions and 672 deletions
--- a/docs/product-advisories/26-Nov-2025
+++ b/docs/product-advisories/26-Nov-2025
@@ -0,0 +1,654 @@
+Here’s a small but high‑impact product tweak: **add an immutable `graph_revision_id` to every call‑graph page and API link**, so any result is citeable and reproducible across time.
+
+---
+
+### Why it matters (quick)
+
+* **Auditability:** you can prove *which* graph produced a finding.
+* **Reproducibility:** a rerun that changes paths gets a new revision, so existing links keep showing the graph they were created from instead of “moving the goalposts.”
+* **Support & docs:** screenshots/links in tickets point to an exact graph state.
+
+### What to add
+
+* **Stable anchor in all URLs:**
+  `https://…/graphs/{graph_id}?rev={graph_revision_id}`
+  `https://…/api/graphs/{graph_id}/nodes?rev={graph_revision_id}`
+* **Opaque, content‑addressed ID:** e.g., `graph_revision_id = blake3( sorted_edges + cfg + tool_versions + dataset_hashes )`.
+* **First‑class fields:** store `graph_id` (logical lineage), `graph_revision_id` (immutable), `parent_revision_id` (if derived), `created_at`, `provenance` (feed hashes, toolchain).
+* **UI surfacing:** show “Rev: 8f2d…c9” with a copy button on graph pages and in the “Share” dialog.
+* **Diff affordance:** when `?rev=A` and `?rev=B` are both present, offer “Compare paths (A↔B).”
+
+### Minimal API contract (suggested)
+
+* `GET /api/graphs/{graph_id}` → latest graph metadata, including `latest_revision_id`
+* `GET /api/graphs/{graph_id}/revisions/{graph_revision_id}` → immutable snapshot
+* `GET /api/graphs/{graph_id}/nodes?rev=…` and `/edges?rev=…`
+* `POST /api/graphs/{graph_id}/pin` with `{ graph_revision_id }` to mark “official”
+* HTTP `Link` header on all responses:
+  `Link: <…/graphs/{graph_id}/revisions/{graph_revision_id}>; rel="version"`
+
+### How to compute the revision id (deterministic)
+
+* Inputs (all normalized): sorted node/edge sets; build config; tool+model versions; input artifacts (SBOM/VEX/feed) **by hash**; environment knobs (feature flags).
+* Serialization: canonical JSON (UTF‑8, ordered keys).
+* Hash: BLAKE3/sha256 → base58/hex (shortened in UI, full in API).
+* Store alongside a manifest (so you can replay the graph later).
+
+### Guardrails
+
+* **Never reuse an ID** if any input bit differs.
+* **Do not** make it guessable from business data (avoid leaking repo names, paths).
+* **Break glass:** if a bad graph must be purged, keep the ID tombstoned (410 Gone) so references don’t silently change.
+
+### Stella Ops touches (concrete)
+
+* **Authority**: add `GraphRevisionManifest` (feeds, lattice/policy versions, scanners, in‑toto/DSSE attestations).
+* **Scanner/Vexer**: emit deterministic manifests and hand them to Authority for id derivation.
+* **Ledger**: record `(graph_id, graph_revision_id, manifest_hash, signatures)`; expose audit query by `graph_revision_id`.
+* **Docs & Support**: “Attach your `graph_revision_id`” line in issue templates.
+
+### Tiny UX copy
+
+* On graph page header: `Rev 8f2d…c9` • **Copy** • **Compare** • **Pin**
+* Share dialog: “This link freezes today’s state. New runs get a different rev.”
+
+
+Below is a concrete implementation plan broken down by phases, services, and tickets, with suggested data models, APIs, and tests.
+
+---
+
+## 0. Definitions (shared across teams)
+
+* **Graph ID (`graph_id`)** – Logical identifier for a call graph lineage (e.g., “the call graph for build X of repo Y”).
+* **Graph Revision ID (`graph_revision_id`)** – Immutable identifier for a specific snapshot of that graph, derived from a manifest (content-addressed hash).
+* **Parent Revision ID (`parent_revision_id`)** – Previous revision in the lineage (if any).
+* **Manifest** – Canonical JSON blob that describes *everything* that could affect graph structure or results:
+
+  * Nodes & edges
+  * Input feeds and their hashes (SBOM, VEX, scanner output, etc.)
+  * Config, policies, and feature flags
+  * Tool names and versions (Scanner, Vexer, Authority)
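+
+Because every revision records its `parent_revision_id`, a revision's history is a walk up that chain (useful for a revision list or for picking a diff base). A minimal sketch, assuming the parent links are loaded into a map; the `lineage` helper is hypothetical. The cycle check is defensive: a lineage should never loop, but a data bug should fail loudly rather than hang.
+
+```ts
+// parentOf maps graph_revision_id -> parent_revision_id (null for the first revision).
+function lineage(revisionId: string, parentOf: ReadonlyMap<string, string | null>): string[] {
+  const chain: string[] = [];
+  const seen = new Set<string>();
+  let current: string | null | undefined = revisionId;
+  while (current != null) {
+    if (seen.has(current)) throw new Error(`cycle in revision lineage at ${current}`);
+    seen.add(current);
+    chain.push(current);
+    current = parentOf.get(current); // null (root) or undefined (unknown id) ends the walk
+  }
+  return chain; // newest first, root last
+}
+```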
+
+---
+
+## 1. High-Level Architecture Changes
+
+1. **Introduce `graph_revision_id` as a first-class concept** in:
+
+   * Graph storage / Authority
+   * Ledger / audit
+   * Backend APIs serving call graphs
+2. **Derive `graph_revision_id` deterministically** from a manifest via a cryptographic hash.
+3. **Expose revision in all graph-related URLs & APIs**:
+
+   * UI: `…/graphs/{graph_id}?rev={graph_revision_id}`
+   * API: `…/api/graphs/{graph_id}/revisions/{graph_revision_id}`
+4. **Ensure immutability**: once a revision exists, it can never be updated in-place—only superseded by new revisions.
+
+---
+
+## 2. Backend: Data Model & Storage
+
+### 2.1. Authority (graph source of truth)
+
+**Goal:** Model graphs and revisions explicitly.
+
+**New / updated tables (example in PostgreSQL syntax):**
+
+1. **Graphs (logical entity)**
+
+```sql
+CREATE TABLE graphs (
+  id                 UUID PRIMARY KEY,
+  created_at         TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  latest_revision_id VARCHAR(128) NULL,  -- FK into graph_revisions.id
+  label              TEXT NULL,          -- optional human label
+  metadata           JSONB NULL
+);
+```
+
+2. **Graph Revisions (immutable snapshots)**
+
+```sql
+CREATE TABLE graph_revisions (
+  id                   VARCHAR(128) PRIMARY KEY, -- graph_revision_id (hash)
+  graph_id             UUID NOT NULL REFERENCES graphs(id),
+  parent_revision_id   VARCHAR(128) NULL REFERENCES graph_revisions(id),
+  created_at           TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  manifest             JSONB NOT NULL,           -- canonical manifest
+  provenance           JSONB NOT NULL,           -- tool versions, etc.
+  is_pinned            BOOLEAN NOT NULL DEFAULT FALSE,
+  pinned_by            UUID NULL,                -- user id
+  pinned_at            TIMESTAMPTZ NULL
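+);
+CREATE INDEX idx_graph_revisions_graph_id ON graph_revisions(graph_id);
+
+-- The graphs.latest_revision_id FK can only be declared once graph_revisions exists.
+ALTER TABLE graphs
+  ADD CONSTRAINT fk_graphs_latest_revision
+  FOREIGN KEY (latest_revision_id) REFERENCES graph_revisions(id);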
+```
+
+3. **Call Graph Data (if separate)**
+   If you store nodes/edges in separate tables, add a foreign key to `graph_revision_id`:
+
+```sql
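+ALTER TABLE call_graph_nodes
+  ADD COLUMN graph_revision_id VARCHAR(128) NULL REFERENCES graph_revisions(id);
+
+ALTER TABLE call_graph_edges
+  ADD COLUMN graph_revision_id VARCHAR(128) NULL REFERENCES graph_revisions(id);
+
+CREATE INDEX idx_call_graph_nodes_revision ON call_graph_nodes(graph_revision_id);
+CREATE INDEX idx_call_graph_edges_revision ON call_graph_edges(graph_revision_id);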
+```
+
+> **Rule:** Nodes/edges for a revision are **never mutated**; a new revision means new rows.
+
+---
+
+### 2.2. Ledger (audit trail)
+
+**Goal:** Every revision gets a ledger record for auditability.
+
+**Table change or new table:**
+
+```sql
+CREATE TABLE graph_revision_ledger (
+  id                   BIGSERIAL PRIMARY KEY,
+  graph_revision_id    VARCHAR(128) NOT NULL,
+  graph_id             UUID NOT NULL,
+  manifest_hash        VARCHAR(128) NOT NULL,
+  manifest_digest_algo TEXT NOT NULL,        -- e.g., "BLAKE3"
+  authority_signature  BYTEA NULL,           -- optional
+  created_at           TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+CREATE INDEX idx_grl_revision ON graph_revision_ledger(graph_revision_id);
+```
+
+Ledger ingestion happens **after** a revision is stored in Authority, but **before** it is exposed as “current” in the UI.
+
+---
+
+## 3. Backend: Revision Hashing & Manifest
+
+### 3.1. Define the manifest schema
+
+Create a spec (e.g., JSON Schema) used by Scanner/Vexer/Authority.
+
+**Example structure:**
+
+```jsonc
+{
+  "graph": {
+    "graph_id": "uuid",
+    "generator": {
+      "tool_name": "scanner",
+      "tool_version": "1.4.2",
+      "run_id": "some-run-id"
+    }
+  },
+  "inputs": {
+    "sbom_hash": "sha256:…",
+    "vex_hash": "sha256:…",
+    "repos": [
+      {
+        "name": "repo-a",
+        "commit": "abc123",
+        "tree_hash": "sha1:…"
+      }
+    ]
+  },
+  "config": {
+    "policy_version": "2024-10-01",
+    "feature_flags": {
+      "new_vex_engine": true
+    }
+  },
+  "graph_content": {
+    "nodes": [
+      // nodes in canonical sorted order
+    ],
+    "edges": [
+      // edges in canonical sorted order
+    ]
+  }
+}
+```
+
+**Key requirements:**
+
+* All lists that affect the graph (`nodes`, `edges`, `repos`, etc.) must be **sorted deterministically**.
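+* Keys must be **stable** (no environment-dependent keys, no random IDs). Per-run values such as `generator.run_id` must be excluded before hashing (for example, kept in `provenance`); otherwise two identical builds can never produce the same revision ID.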
+* All hashes of input artifacts must be included (not raw content).
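+
+One way to enforce these rules is a single normalization step that every producer runs before hashing. A minimal sketch, assuming node and edge shapes of `{ id }` and `{ from, to }` with unique node IDs (the example above elides them); the interface and function names are hypothetical. It also drops `generator.run_id`, on the assumption that a per-run identifier counts as one of the "random IDs" excluded above.
+
+```ts
+// Hypothetical TypeScript view of the manifest example above.
+interface GraphNode { id: string; }
+interface GraphEdge { from: string; to: string; }
+interface RepoInput { name: string; commit: string; tree_hash: string; }
+
+interface GraphManifest {
+  graph: { graph_id: string; generator: { tool_name: string; tool_version: string; run_id?: string } };
+  inputs: { sbom_hash: string; vex_hash: string; repos: RepoInput[] };
+  config: { policy_version: string; feature_flags: Record<string, boolean> };
+  graph_content: { nodes: GraphNode[]; edges: GraphEdge[] };
+}
+
+// Ordinal comparison by UTF-16 code unit; unlike localeCompare, it orders the same way on every host.
+const byOrdinal = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);
+
+// Returns a copy with order-sensitive lists sorted and the per-run id dropped; the input is not mutated.
+function normalizeManifest(m: GraphManifest): GraphManifest {
+  const { run_id: _runId, ...generator } = m.graph.generator; // run_id differs on every run, so it is not hashed
+  return {
+    graph: { graph_id: m.graph.graph_id, generator },
+    inputs: {
+      ...m.inputs,
+      repos: [...m.inputs.repos].sort((a, b) => byOrdinal(a.name, b.name) || byOrdinal(a.commit, b.commit)),
+    },
+    config: m.config, // object key order is left to the canonical JSON encoder
+    graph_content: {
+      nodes: [...m.graph_content.nodes].sort((a, b) => byOrdinal(a.id, b.id)),
+      edges: [...m.graph_content.edges].sort((a, b) => byOrdinal(a.from, b.from) || byOrdinal(a.to, b.to)),
+    },
+  };
+}
+```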
+
+### 3.2. Hash computation
+
+Language-agnostic algorithm:
+
+1. Normalize manifest to **canonical JSON**:
+
+   * UTF-8
+   * Sorted keys
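+   * No insignificant whitespace
+   * One canonical form for numbers and string escapes (RFC 8785, the JSON Canonicalization Scheme, defines such rules)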
+2. Hash the bytes using a cryptographic hash (BLAKE3 or SHA-256).
+3. Encode as hex or base58 string.
+
+**Pseudocode:**
+
+```pseudo
+function compute_graph_revision_id(manifest):
+    canonical_json = canonical_json_encode(manifest) // sorted keys
+    digest_bytes = BLAKE3(canonical_json)
+    digest_hex = hex_encode(digest_bytes)
+    return "grv_" + digest_hex     // store and return the full digest; truncate only for display in the UI
+```
+
+**Ticket:** Implement `GraphRevisionIdGenerator` library (shared):
+
+* `Compute(manifest) -> graph_revision_id`
+* `ValidateFormat(graph_revision_id) -> bool`
+
+Make this a **shared library** across Scanner, Vexer, Authority to avoid divergence.
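+
+A minimal TypeScript sketch of the two generator operations, assuming Node.js's built-in `crypto` module. Node does not ship BLAKE3, so this uses SHA-256 (the other option listed in the algorithm). The encoder sorts object keys and emits no whitespace, which is simpler than full RFC 8785 canonicalization (number formatting in particular is left to `JSON.stringify`). The `grv_` prefix follows the pseudocode; the function names mirror the ticket but are otherwise hypothetical.
+
+```ts
+import { createHash } from "crypto";
+
+// JSON values accepted in a manifest (no undefined, functions, or Dates).
+type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
+
+// Canonical JSON: object keys sorted by UTF-16 code unit, no whitespace, array order preserved.
+function canonicalJson(value: Json): string {
+  if (Array.isArray(value)) return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
+  if (value !== null && typeof value === "object") {
+    const obj: { [key: string]: Json } = value;
+    const keys = Object.keys(obj).sort(); // default sort compares UTF-16 code units
+    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
+  }
+  if (typeof value === "number" && !Number.isFinite(value)) {
+    throw new Error("NaN/Infinity cannot be represented in JSON");
+  }
+  return JSON.stringify(value); // string, finite number, boolean, or null
+}
+
+const REVISION_PREFIX = "grv_";
+const REVISION_ID_PATTERN = /^grv_[0-9a-f]{64}$/; // prefix + full 256-bit digest as lowercase hex
+
+// Compute(manifest): expects a manifest whose lists are already sorted.
+function computeGraphRevisionId(manifest: Json): string {
+  const digestHex = createHash("sha256").update(canonicalJson(manifest), "utf8").digest("hex");
+  return REVISION_PREFIX + digestHex;
+}
+
+// ValidateFormat(id): a syntactic check only; it does not prove the revision exists.
+function validateGraphRevisionIdFormat(id: string): boolean {
+  return REVISION_ID_PATTERN.test(id);
+}
+```
+
+Storing the full digest (68 characters with the prefix) fits the `VARCHAR(128)` columns, and the UI can shorten it freely because the short form is never used as a key.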
+
+---
+
+## 4. Backend: APIs
+
+### 4.1. Graphs & revisions REST API
+
+**New endpoints (example):**
+
+1. **Get latest graph revision**
+
+```http
+GET /api/graphs/{graph_id}
+Response:
+{
+  "graph_id": "…",
+  "latest_revision_id": "grv_8f2d…c9",
+  "created_at": "…",
+  "metadata": { … }
+}
+```
+
+2. **List revisions for a graph**
+
+```http
+GET /api/graphs/{graph_id}/revisions
+Query: ?page=1&pageSize=20
+Response:
+{
+  "graph_id": "…",
+  "items": [
+    {
+      "graph_revision_id": "grv_8f2d…c9",
+      "created_at": "…",
+      "parent_revision_id": null,
+      "is_pinned": true
+    },
+    {
+      "graph_revision_id": "grv_3a1b…e4",
+      "created_at": "…",
+      "parent_revision_id": "grv_8f2d…c9",
+      "is_pinned": false
+    }
+  ]
+}
+```
+
+3. **Get a specific revision (snapshot)**
+
+```http
+GET /api/graphs/{graph_id}/revisions/{graph_revision_id}
+Response:
+{
+  "graph_id": "…",
+  "graph_revision_id": "…",
+  "created_at": "…",
+  "parent_revision_id": null,
+  "manifest": { … },        // optional: maybe not full content if large
+  "provenance": { … }
+}
+```
+
+4. **Get nodes/edges for a revision**
+
+```http
+GET /api/graphs/{graph_id}/nodes?rev={graph_revision_id}
+GET /api/graphs/{graph_id}/edges?rev={graph_revision_id}
+```
+
+Behavior:
+
+* If `rev` is **omitted**, serve the nodes/edges of the revision referenced by `latest_revision_id` for that `graph_id`.
+* If `rev` is **invalid or unknown**, return `404`; never fall back to the latest revision.
+
+5. **Pin/unpin a revision (optional for v1)**
+
+```http
+POST /api/graphs/{graph_id}/pin
+Body: { "graph_revision_id": "…" }
+
+DELETE /api/graphs/{graph_id}/pin/{graph_revision_id}
+```
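+
+The `rev` rules for the nodes/edges endpoints (omitted → latest, unknown → 404) and the tombstone rule from the guardrails (tombstoned → `410 Gone`) can live in one resolver, so no endpoint silently substitutes another revision. A minimal sketch over a hypothetical in-memory store; the `RevisionStore` shape and the `tombstoned` flag (standing in for the extra column described in §9) are assumptions.
+
+```ts
+interface RevisionRecord {
+  id: string;          // graph_revision_id
+  graphId: string;     // owning graph_id
+  tombstoned: boolean; // hypothetical tombstone column
+}
+
+interface RevisionStore {
+  latestRevisionId: Map<string, string>;  // graph_id -> graphs.latest_revision_id
+  revisions: Map<string, RevisionRecord>; // graph_revision_id -> record
+}
+
+type Resolution =
+  | { status: 200; revisionId: string }
+  | { status: 404 | 410; error: string };
+
+// Decide which revision a nodes/edges request reads; never silently fall back to another one.
+function resolveRevision(store: RevisionStore, graphId: string, rev?: string | null): Resolution {
+  const requested = rev ?? store.latestRevisionId.get(graphId); // omitted rev -> latest
+  if (requested === undefined) return { status: 404, error: "graph not found or has no revisions" };
+  const record = store.revisions.get(requested);
+  // An id that exists under a different graph_id is treated as unknown for this graph.
+  if (!record || record.graphId !== graphId) return { status: 404, error: "unknown revision" };
+  if (record.tombstoned) return { status: 410, error: "revision was tombstoned" };
+  return { status: 200, revisionId: record.id };
+}
+```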
+
+### 4.2. Backward compatibility
+
+* Existing endpoints like `GET /api/graphs/{graph_id}/nodes` should:
+
+  * Continue working with no `rev` param.
+  * Internally resolve to `latest_revision_id`.
+* For old records with no revision:
+
+  * Create a synthetic manifest from current stored data.
+  * Compute a `graph_revision_id`.
+  * Store it and set `latest_revision_id` on the `graphs` row.
+
+---
+
+## 5. Scanner / Vexer / Upstream Pipelines
+
+**Goal:** At the end of a graph build, they produce a manifest and a `graph_revision_id`.
+
+### 5.1. Responsibilities
+
+1. **Scanner/Vexer**:
+
+   * Gather:
+
+     * Tool name/version
+     * Input artifact hashes
+     * Feature flags / config
+     * Graph nodes/edges
+   * Construct manifest (according to schema).
+   * Compute `graph_revision_id` using shared library.
+   * Send manifest + revision ID to Authority via its internal API (`POST /internal/graphs/{graph_id}/revisions`, §5.2).
+
+2. **Authority**:
+
+   * Idempotently upsert:
+
+     * `graphs` (if new `graph_id`)
+     * `graph_revisions` row (if `graph_revision_id` not yet present)
+     * nodes/edges rows keyed by `graph_revision_id`.
+   * Record the ledger entry (§2.2), then update `graphs.latest_revision_id` to the new revision.
+
+### 5.2. Internal API (Authority)
+
+```http
+POST /internal/graphs/{graph_id}/revisions
+Body:
+{
+  "graph_revision_id": "…",
+  "parent_revision_id": "…",         // optional
+  "manifest": { … },
+  "provenance": { … },
+  "nodes": [ … ],
+  "edges": [ … ]
+}
+Response: 201 Created (or 200 if idempotent)
+```
+
+**Rules:**
+
+* If `graph_revision_id` already exists for that `graph_id` with identical `manifest_hash`, treat as **idempotent**.
+* If `graph_revision_id` exists but manifest hash differs → log and reject (bug in hashing).
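+
+The idempotency and conflict rules above, plus the ordering from §2.2 (ledger entry before the revision is exposed as current), can be sketched in memory. Assumptions: `manifestHash` is the digest of the canonical manifest, the store and ledger shapes are hypothetical, `409 Conflict` is one reasonable status for "reject", and a real Authority would perform the three writes in one database transaction.
+
+```ts
+interface IngestRequest {
+  graphId: string;
+  graphRevisionId: string;
+  manifestHash: string; // digest of the canonical manifest
+}
+
+type IngestResult = { status: 201 | 200 } | { status: 409; error: string };
+
+class AuthorityStore {
+  readonly revisions = new Map<string, { graphId: string; manifestHash: string }>();
+  readonly latest = new Map<string, string>(); // graph_id -> latest_revision_id
+  readonly ledger: { graphRevisionId: string; manifestHash: string }[] = [];
+
+  ingest(req: IngestRequest): IngestResult {
+    const existing = this.revisions.get(req.graphRevisionId);
+    if (existing) {
+      // Same id + same manifest: idempotent replay.
+      if (existing.graphId === req.graphId && existing.manifestHash === req.manifestHash) {
+        return { status: 200 };
+      }
+      // Same id + different manifest: a hashing bug; log it and reject.
+      return { status: 409, error: "revision id already bound to a different manifest" };
+    }
+    // 1. Store the immutable revision (nodes/edges rows would be written here too).
+    this.revisions.set(req.graphRevisionId, { graphId: req.graphId, manifestHash: req.manifestHash });
+    // 2. Record the ledger entry before the revision becomes visible as current.
+    this.ledger.push({ graphRevisionId: req.graphRevisionId, manifestHash: req.manifestHash });
+    // 3. Only now advance graphs.latest_revision_id.
+    this.latest.set(req.graphId, req.graphRevisionId);
+    return { status: 201 };
+  }
+}
+```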
+
+---
+
+## 6. Frontend / UX Changes
+
+Assuming a single-page app (React, Vue, etc.), the frontend work breaks down into the following tasks.
+
+### 6.1. URL & routing
+
+* **New canonical URL format** for graph UI:
+
+  * Latest: `/graphs/{graph_id}`
+  * Specific revision: `/graphs/{graph_id}?rev={graph_revision_id}`
+
+* Router:
+
+  * Parse `rev` query param.
+  * If present, call `GET /api/graphs/{graph_id}/nodes?rev=…`.
+  * If not present, call same endpoint but without `rev` → backend returns latest.
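+
+A minimal sketch of that routing decision, assuming pages live at `/graphs/{graph_id}` and the API shares the page's origin; `nodesRequestForPage` is a hypothetical helper name, while `URL` and `URLSearchParams` are the standard browser/Node.js APIs.
+
+```ts
+// Given the current page URL, decide which nodes endpoint to call.
+function nodesRequestForPage(pageHref: string): { graphId: string; rev: string | null; apiPath: string } {
+  const page = new URL(pageHref);
+  const match = /^\/graphs\/([^/]+)\/?$/.exec(page.pathname);
+  if (!match) throw new Error(`not a graph page: ${page.pathname}`);
+  const graphId = decodeURIComponent(match[1]);
+  const rev = page.searchParams.get("rev"); // null when the link means "latest"
+  const api = new URL(`/api/graphs/${encodeURIComponent(graphId)}/nodes`, page.origin);
+  if (rev) api.searchParams.set("rev", rev); // without rev, the backend resolves latest
+  return { graphId, rev, apiPath: api.pathname + api.search };
+}
+```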
+
+### 6.2. Displaying revision info
+
+* In graph page header:
+
+  * Show truncated revision:
+
+    * `Rev: 8f2d…c9`
+  * Buttons:
+
+    * **Copy** → Copies full `graph_revision_id`.
+    * **Share** → Copies full URL with `?rev=…`.
+  * Optional chip if pinned: `Pinned`.
+
+**Example data model (TS):**
+
+```ts
+type GraphRevisionSummary = {
+  graphId: string;
+  graphRevisionId: string;
+  createdAt: string;
+  parentRevisionId?: string | null;
+  isPinned: boolean;
+};
+```
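+
+The header label and the Copy/Share actions can all be derived from the full ID. A minimal sketch; the 4 + 2 character truncation mirrors the `8f2d…c9` example, stripping the `grv_` prefix is an assumption based on the API examples, and both helper names are hypothetical. Copy would put the untouched `graphRevisionId` on the clipboard.
+
+```ts
+// Display-only truncation: stored and API ids always stay full length.
+function revisionLabel(graphRevisionId: string, head = 4, tail = 2): string {
+  const hex = graphRevisionId.startsWith("grv_") ? graphRevisionId.slice(4) : graphRevisionId;
+  if (hex.length <= head + tail + 1) return `Rev: ${hex}`; // too short to be worth eliding
+  return `Rev: ${hex.slice(0, head)}…${hex.slice(-tail)}`;
+}
+
+// Share link: the current page URL with rev pinned; other query params (UI state) are preserved.
+function shareLink(currentHref: string, graphRevisionId: string): string {
+  const url = new URL(currentHref);
+  url.searchParams.set("rev", graphRevisionId);
+  return url.toString();
+}
+```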
+
+### 6.3. Revision list panel (optional but useful)
+
+* Add a side panel or tab: “Revisions”.
+* Fetch from `GET /api/graphs/{graph_id}/revisions`.
+* Clicking a revision:
+
+  * Navigates to same page with `?rev={graph_revision_id}`.
+  * Preserves other UI state where reasonable.
+
+### 6.4. Diff view (nice-to-have, can be v2)
+
+* UX: “Compare with…” button in header.
+
+  * Opens dialog to pick a second revision.
+* Backend: add a diff endpoint later, or compute diff client-side from node/edge lists if feasible.
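+
+For the client-side option, comparing two revisions reduces to set differences over edge keys. A minimal sketch, assuming edges arrive as `{ from, to }` pairs (a hypothetical shape) and that node IDs are stable across revisions; node-level and attribute-level changes are left out.
+
+```ts
+interface Edge { from: string; to: string; }
+
+interface EdgeDiff { added: Edge[]; removed: Edge[]; }
+
+// Key an edge by its endpoints; JSON.stringify avoids ambiguity if ids contain separator characters.
+const edgeKey = (e: Edge): string => JSON.stringify([e.from, e.to]);
+
+function diffEdges(revA: Edge[], revB: Edge[]): EdgeDiff {
+  const keysA = new Set(revA.map(edgeKey));
+  const keysB = new Set(revB.map(edgeKey));
+  return {
+    added: revB.filter((e) => !keysA.has(edgeKey(e))),   // present only in revision B
+    removed: revA.filter((e) => !keysB.has(edgeKey(e))), // present only in revision A
+  };
+}
+```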
+
+---
+
+## 7. Migration Plan
+
+### 7.1. Phase 1 – Schema & read-path ready
+
+1. **Add DB columns/tables**:
+
+   * `graphs`, `graph_revisions`, `graph_revision_ledger`.
+   * `graph_revision_id` column to `call_graph_nodes` / `call_graph_edges`.
+2. **Deploy with no behavior changes**:
+
+   * Default `graph_revision_id` columns NULL.
+   * Existing APIs continue to work.
+
+### 7.2. Phase 2 – Backfill existing graphs
+
+1. Write a **backfill job**:
+
+   * For each distinct existing graph:
+
+     * Build a manifest from existing stored data.
+     * Compute `graph_revision_id`.
+     * Insert into `graphs` & `graph_revisions`.
+     * Update nodes/edges for that graph to set `graph_revision_id`.
+     * Set `graphs.latest_revision_id`.
+
+2. Log any graphs that can’t be backfilled (corrupt data, etc.) for manual review.
+
+3. After backfill:
+
+   * Add **NOT NULL** constraint on `graph_revision_id` for nodes/edges (if practical).
+   * Ensure all public APIs can serve revisions without requiring client changes.
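+
+The backfill loop can be sketched with the per-graph steps injected, so one corrupt graph is recorded for manual review instead of aborting the job. `LegacyGraph`, `buildManifest`, `computeId`, and `persist` are hypothetical stand-ins for the real data access and the shared generator.
+
+```ts
+interface LegacyGraph { graphId: string; }
+
+interface BackfillDeps<M> {
+  buildManifest(graph: LegacyGraph): M;                                // synthetic manifest from stored data
+  computeId(manifest: M): string;                                     // the shared revision id generator
+  persist(graph: LegacyGraph, revisionId: string, manifest: M): void; // revision rows, node/edge ids, latest id
+}
+
+interface BackfillReport {
+  backfilled: string[];
+  failed: { graphId: string; reason: string }[];
+}
+
+// One graph at a time; a failure is recorded for manual review instead of stopping the job.
+function backfill<M>(graphs: readonly LegacyGraph[], deps: BackfillDeps<M>): BackfillReport {
+  const report: BackfillReport = { backfilled: [], failed: [] };
+  for (const graph of graphs) {
+    try {
+      const manifest = deps.buildManifest(graph);
+      const revisionId = deps.computeId(manifest);
+      deps.persist(graph, revisionId, manifest); // ideally one transaction per graph
+      report.backfilled.push(graph.graphId);
+    } catch (err) {
+      report.failed.push({ graphId: graph.graphId, reason: err instanceof Error ? err.message : String(err) });
+    }
+  }
+  return report;
+}
+```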
+
+### 7.3. Phase 3 – Wire up new pipelines
+
+1. Update Scanner/Vexer to construct manifests and compute revision IDs.
+2. Update Authority to accept `/internal/graphs/{graph_id}/revisions`.
+3. Gradually roll out:
+
+   * Feature flag: `graphRevisionIdFromPipeline`.
+   * For flagged runs, use the new pipeline; for others, fall back to old + synthetic revision.
+
+### 7.4. Phase 4 – Frontend rollout
+
+1. Update UI to:
+
+   * Read the optional `rev` parameter from the URL.
+   * Show `Rev` in header.
+   * Use revision-aware endpoints.
+2. Once stable:
+
+   * Update “Share” actions to always include `?rev=…`.
+
+---
+
+## 8. Testing Strategy
+
+### 8.1. Unit tests
+
+* **Hashing library**:
+
+  * Same manifest → same `graph_revision_id`.
+  * Different node ordering → same `graph_revision_id`.
+  * Tiny manifest change → different `graph_revision_id`.
+* **Authority service**:
+
+  * Creating a revision stores `graph_revisions` + nodes/edges with matching `graph_revision_id`.
+  * Duplicate revision (same id + manifest) is idempotent.
+  * Conflicting manifest with same `graph_revision_id` is rejected.
+
+### 8.2. Integration tests
+
+* Scenario: “Create graph → view in UI”
+
+  * Pipeline produces manifest & revision.
+  * Authority persists revision.
+  * Ledger logs event.
+  * UI shows matching `graph_revision_id`.
+* Scenario: “Stable permalinks”
+
+  * Capture a link with `?rev=…`.
+  * Rerun pipeline (new revision).
+  * Old link still shows original nodes/edges.
+
+### 8.3. Migration tests
+
+* On a sanitized snapshot:
+
+  * Run migration & backfill.
+  * Spot-check:
+
+    * Each `graph_id` has exactly one `latest_revision_id`.
+    * Node/edge counts before and after match.
+    * Manually recompute hash for a few graphs and compare to stored `graph_revision_id`.
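+
+These checks can be scripted per sampled graph. A minimal sketch; the count shapes and the injected `recompute` function (rebuild the manifest from stored data and hash it with the shared generator) are hypothetical.
+
+```ts
+interface Counts { nodes: number; edges: number; }
+
+interface BackfilledGraph { graphId: string; latestRevisionId: string | null; counts: Counts; }
+
+// Returns human-readable problems; an empty array means the graph passed.
+function spotCheck(before: Counts, after: BackfilledGraph, recompute: (graphId: string) => string): string[] {
+  const problems: string[] = [];
+  if (after.latestRevisionId === null) problems.push("latest_revision_id is not set");
+  if (before.nodes !== after.counts.nodes) problems.push(`node count changed: ${before.nodes} -> ${after.counts.nodes}`);
+  if (before.edges !== after.counts.edges) problems.push(`edge count changed: ${before.edges} -> ${after.counts.edges}`);
+  if (after.latestRevisionId !== null) {
+    const recomputed = recompute(after.graphId);
+    if (recomputed !== after.latestRevisionId) {
+      problems.push(`hash mismatch: stored ${after.latestRevisionId}, recomputed ${recomputed}`);
+    }
+  }
+  return problems;
+}
+```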
+
+---
+
+## 9. Security & Compliance Considerations
+
+* **Immutability guarantee**:
+
+  * Don’t allow updates to `graph_revisions.manifest`.
+  * Any change must happen by creating a new revision.
+* **Tombstoning** (for rare delete cases):
+
+  * If you must “remove” a bad graph, mark revision as `tombstoned` in an additional column and return `410 Gone` for that `graph_revision_id`.
+  * Never reuse that ID.
+* **Access control**:
+
+  * Ensure revision APIs use the same ACLs as existing graph APIs.
+  * Don’t leak manifests to users not allowed to see underlying artifacts.
+
+---
+
+## 10. Concrete Ticket Breakdown (example)
+
+You can copy/paste this into your tracker and tweak.
+
+1. **BE-01** – Add `graphs` and `graph_revisions` tables
+
+   * AC:
+
+     * Tables exist with fields above.
+     * Migrations run cleanly in staging.
+
+2. **BE-02** – Add `graph_revision_id` to nodes/edges tables
+
+   * AC:
+
+     * Column added, nullable.
+     * No runtime errors in staging.
+
+3. **BE-03** – Implement `GraphRevisionIdGenerator` library
+
+   * AC:
+
+     * Given a manifest, returns deterministic ID.
+     * Unit tests cover ordering, minimal changes.
+
+4. **BE-04** – Implement `/internal/graphs/{graph_id}/revisions` in Authority
+
+   * AC:
+
+     * Stores new revision + nodes/edges.
+     * Idempotent on duplicate revisions.
+
+5. **BE-05** – Implement public revision APIs
+
+   * AC:
+
+     * Endpoints in §4.1 available and documented in the OpenAPI (Swagger) spec.
+     * `rev` query param supported.
+     * Default behavior returns latest revision.
+
+6. **BE-06** – Backfill existing graphs into `graph_revisions`
+
+   * AC:
+
+     * All existing graphs have `latest_revision_id`.
+     * Nodes/edges linked to a `graph_revision_id`.
+     * Metrics & logs generated for failures.
+
+7. **BE-07** – Ledger integration for revisions
+
+   * AC:
+
+     * Each new revision creates a ledger entry.
+     * Query by `graph_revision_id` works.
+
+8. **PIPE-01** – Scanner/Vexer manifest construction
+
+   * AC:
+
+     * Manifest includes all required fields.
+     * Values verified against Authority for a sample run.
+
+9. **PIPE-02** – Scanner/Vexer computes `graph_revision_id` and calls Authority
+
+   * AC:
+
+     * End-to-end pipeline run produces a new `graph_revision_id`.
+     * Authority stores it and sets as latest.
+
+10. **FE-01** – UI supports `?rev=` param and displays revision
+
+    * AC:
+
+      * When URL has `rev`, UI loads that revision.
+      * When no `rev`, loads latest.
+      * Rev appears in header with copy/share.
+
+11. **FE-02** – Revision list UI (optional)
+
+    * AC:
+
+      * Revision panel lists revisions.
+      * Click navigates to appropriate `?rev=`.
+