git.stella-ops.org/docs/product-advisories/26-Nov-2025 - Use Graph Revision IDs as Public Trust Anchors.md
2025-11-27 15:16:31 +02:00


Here's a small but high-impact product tweak: add an immutable graph_revision_id to every call-graph page and API link, so any result is citable and reproducible over time.


Why it matters (quick)

  • Auditability: you can prove which graph produced a finding.
  • Reproducibility: reruns that change paths won't “move the goalposts.”
  • Support & docs: screenshots/links in tickets point to an exact graph state.

What to add

  • Stable anchor in all URLs:
    • https://…/graphs/{graph_id}?rev={graph_revision_id}
    • https://…/api/graphs/{graph_id}/nodes?rev={graph_revision_id}
  • Opaque, content-addressed ID: e.g., graph_revision_id = blake3( sorted_edges + cfg + tool_versions + dataset_hashes ).
  • First-class fields: store graph_id (logical lineage), graph_revision_id (immutable), parent_revision_id (if derived), created_at, provenance (feed hashes, toolchain).
  • UI surfacing: show a copy button “Rev: 8f2d…c9” on graph pages and in the “Share” dialog.
  • Diff affordance: when ?rev=A and ?rev=B are both present, offer “Compare paths (A↔B).”

Minimal API contract (suggested)

  • GET /api/graphs/{graph_id} → latest + latest_revision_id
  • GET /api/graphs/{graph_id}/revisions/{graph_revision_id} → immutable snapshot
  • GET /api/graphs/{graph_id}/nodes?rev=… and /edges?rev=…
  • POST /api/graphs/{graph_id}/pin with { graph_revision_id } to mark “official”
  • HTTP Link header on all responses: Link: <…/graphs/{graph_id}/revisions/{graph_revision_id}>; rel="version"
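
As a rough illustration of the Link header in the last bullet, the helper below assembles the header value from a base URL, a graph_id, and a graph_revision_id. The function name is hypothetical, and `baseUrl` stands in for the elided "…" prefix; treat it as a sketch rather than the actual service code.

```typescript
// Hypothetical helper: builds the Link header value for one graph revision.
// `baseUrl` is an assumption standing in for the elided URL prefix.
function buildVersionLinkHeader(
  baseUrl: string,
  graphId: string,
  graphRevisionId: string,
): string {
  const root = baseUrl.replace(/\/+$/, ""); // strip trailing slashes
  const target =
    `${root}/graphs/${encodeURIComponent(graphId)}` +
    `/revisions/${encodeURIComponent(graphRevisionId)}`;
  return `<${target}>; rel="version"`;
}
```

Every response for a graph would carry this value, so a client can always recover the immutable snapshot URL even when it requested "latest".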

How to compute the revision id (deterministic)

  • Inputs (all normalized): sorted node/edge sets; build config; tool+model versions; input artifacts (SBOM/VEX/feed) by hash; environment knobs (feature flags).
  • Serialization: canonical JSON (UTF-8, ordered keys).
  • Hash: BLAKE3 or SHA-256 → base58 or hex (shortened in UI, full in API).
  • Store alongside a manifest (so you can replay the graph later).

Guardrails

  • Never reuse an ID if any input bit differs.
  • Do not make it guessable from business data (avoid leaking repo names, paths).
  • Break glass: if a bad graph must be purged, keep the ID tombstoned (410 Gone) so references don't silently change.

StellaOps touches (concrete)

  • Authority: add GraphRevisionManifest (feeds, lattice/policy versions, scanners, in-toto/DSSE attestations).
  • Scanner/Vexer: emit deterministic manifests and hand them to Authority for id derivation.
  • Ledger: record (graph_id, graph_revision_id, manifest_hash, signatures); expose audit query by graph_revision_id.
  • Docs & Support: “Attach your graph_revision_id” line in issue templates.

Tiny UX copy

  • On graph page header: Rev 8f2d…c9 · Copy · Compare · Pin
  • Share dialog: “This link freezes today's state. New runs get a different rev.”

If you want, I can draft the DB table, the manifest JSON schema, and the exact URL/router changes for your .NET 10 services next.

Cool, let's turn this into something your engineers can actually pick up and implement.

Below is a concrete implementation plan broken down by phases, services, and tickets, with suggested data models, APIs, and tests.


0. Definitions (shared across teams)

  • Graph ID (graph_id): Logical identifier for a call graph lineage (e.g., “the call graph for build X of repo Y”).

  • Graph Revision ID (graph_revision_id): Immutable identifier for a specific snapshot of that graph, derived from a manifest (content-addressed hash).

  • Parent Revision ID (parent_revision_id): Previous revision in the lineage (if any).

  • Manifest: Canonical JSON blob that describes everything that could affect graph structure or results:

    • Nodes & edges
    • Input feeds and their hashes (SBOM, VEX, scanner output, etc.)
    • config/policies/feature flags
    • tool + version (scanner, vexer, authority)

1. High-Level Architecture Changes

  1. Introduce graph_revision_id as a first-class concept in:

    • Graph storage / Authority
    • Ledger / audit
    • Backend APIs serving call graphs
  2. Derive graph_revision_id deterministically from a manifest via a cryptographic hash.

  3. Expose revision in all graph-related URLs & APIs:

    • UI: …/graphs/{graph_id}?rev={graph_revision_id}
    • API: …/api/graphs/{graph_id}/revisions/{graph_revision_id}
  4. Ensure immutability: once a revision exists, it can never be updated in-place—only superseded by new revisions.


2. Backend: Data Model & Storage

2.1. Authority (graph source of truth)

Goal: Model graphs and revisions explicitly.

New / updated tables (example in SQL-ish form):

  1. Graphs (logical entity)
CREATE TABLE graphs (
  id                UUID PRIMARY KEY,
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  latest_revision_id VARCHAR(128) NULL, -- FK into graph_revisions.id
  label             TEXT NULL,          -- optional human label
  metadata          JSONB NULL
);
  2. Graph Revisions (immutable snapshots)
CREATE TABLE graph_revisions (
  id                   VARCHAR(128) PRIMARY KEY, -- graph_revision_id (hash)
  graph_id             UUID NOT NULL REFERENCES graphs(id),
  parent_revision_id   VARCHAR(128) NULL REFERENCES graph_revisions(id),
  created_at           TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  manifest             JSONB NOT NULL,           -- canonical manifest
  provenance           JSONB NOT NULL,           -- tool versions, etc.
  is_pinned            BOOLEAN NOT NULL DEFAULT FALSE,
  pinned_by            UUID NULL,                -- user id
  pinned_at            TIMESTAMPTZ NULL
);
CREATE INDEX idx_graph_revisions_graph_id ON graph_revisions(graph_id);
  3. Call Graph Data (if separate): if you store nodes/edges in separate tables, add a graph_revision_id column with a foreign key to graph_revisions(id):
ALTER TABLE call_graph_nodes
  ADD COLUMN graph_revision_id VARCHAR(128) NULL REFERENCES graph_revisions(id);

ALTER TABLE call_graph_edges
  ADD COLUMN graph_revision_id VARCHAR(128) NULL REFERENCES graph_revisions(id);

Rule: Nodes/edges for a revision are never mutated; a new revision means new rows.


2.2. Ledger (audit trail)

Goal: Every revision gets a ledger record for auditability.

Table change or new table:

CREATE TABLE graph_revision_ledger (
  id                   BIGSERIAL PRIMARY KEY,
  graph_revision_id    VARCHAR(128) NOT NULL,
  graph_id             UUID NOT NULL,
  manifest_hash        VARCHAR(128) NOT NULL,
  manifest_digest_algo TEXT NOT NULL,        -- e.g., "BLAKE3"
  authority_signature  BYTEA NULL,           -- optional
  created_at           TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_grl_revision ON graph_revision_ledger(graph_revision_id);

Ledger ingestion happens after a revision is stored in Authority, but before it is exposed as “current” in the UI.
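
The ordering constraint above (store, then ledger, then expose) can be sketched with in-memory stand-ins. The type and variable names here are hypothetical and only illustrate the sequence; a real implementation would use the Authority and Ledger stores.

```typescript
// Hypothetical in-memory stand-ins for Authority storage and the Ledger.
type Revision = { graphId: string; graphRevisionId: string; manifestHash: string };

const revisions = new Map<string, Revision>();    // graph_revisions
const latestByGraph = new Map<string, string>();  // graphs.latest_revision_id
const ledger: Revision[] = [];                    // graph_revision_ledger

function publishRevision(rev: Revision): void {
  // 1. Persist the immutable snapshot in Authority (never overwrite).
  if (!revisions.has(rev.graphRevisionId)) {
    revisions.set(rev.graphRevisionId, rev);
  }
  // 2. Record the ledger entry, once per revision.
  if (!ledger.some((entry) => entry.graphRevisionId === rev.graphRevisionId)) {
    ledger.push(rev);
  }
  // 3. Only now expose the revision as "current".
  latestByGraph.set(rev.graphId, rev.graphRevisionId);
}
```

If step 2 fails, step 3 never runs, so the UI cannot show a "current" revision that has no audit record.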


3. Backend: Revision Hashing & Manifest

3.1. Define the manifest schema

Create a spec (e.g., JSON Schema) used by Scanner/Vexer/Authority.

Example structure:

{
  "graph": {
    "graph_id": "uuid",
    "generator": {
      "tool_name": "scanner",
      "tool_version": "1.4.2",
      "run_id": "some-run-id"
    }
  },
  "inputs": {
    "sbom_hash": "sha256:…",
    "vex_hash": "sha256:…",
    "repos": [
      {
        "name": "repo-a",
        "commit": "abc123",
        "tree_hash": "sha1:…"
      }
    ]
  },
  "config": {
    "policy_version": "2024-10-01",
    "feature_flags": {
      "new_vex_engine": true
    }
  },
  "graph_content": {
    "nodes": [
      // nodes in canonical sorted order
    ],
    "edges": [
      // edges in canonical sorted order
    ]
  }
}

Key requirements:

  • All lists that affect the graph (nodes, edges, repos, etc.) must be sorted deterministically.
  • Keys must be stable (no environment-dependent keys, no random IDs).
  • Input artifacts must be referenced by their hashes, not embedded as raw content.
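
To make the "sorted deterministically" requirement concrete, here is a minimal sketch that orders nodes and edges before they go into the manifest. The `GraphNode` and `GraphEdge` shapes are assumptions for illustration; the real schema will likely carry more fields.

```typescript
// Hypothetical node/edge shapes; the real manifest schema may differ.
type GraphNode = { id: string };
type GraphEdge = { from: string; to: string };

// Compare by UTF-16 code units so the order does not depend on locale.
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Returns sorted copies; the inputs are left untouched.
function sortGraphContent(
  nodes: GraphNode[],
  edges: GraphEdge[],
): { nodes: GraphNode[]; edges: GraphEdge[] } {
  return {
    nodes: [...nodes].sort((a, b) => compareStrings(a.id, b.id)),
    edges: [...edges].sort(
      (a, b) => compareStrings(a.from, b.from) || compareStrings(a.to, b.to),
    ),
  };
}
```

Avoiding `localeCompare` here is deliberate: locale-aware collation can differ between machines, which would break determinism.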

3.2. Hash computation

Language-agnostic algorithm:

  1. Normalize manifest to canonical JSON:

    • UTF-8
    • Sorted keys
    • No extra whitespace
  2. Hash the bytes using a cryptographic hash (BLAKE3 or SHA-256).

  3. Encode as hex or base58 string.

Pseudocode:

function compute_graph_revision_id(manifest):
    canonical_json = canonical_json_encode(manifest) // sorted keys
    digest_bytes = BLAKE3(canonical_json)
    digest_hex = hex_encode(digest_bytes)
    return "grv_" + digest_hex[0:40]   // prefix + truncate to 160 bits; the UI shortens further for display

Ticket: Implement GraphRevisionIdGenerator library (shared):

  • Compute(manifest) -> graph_revision_id
  • ValidateFormat(graph_revision_id) -> bool

Make this a shared library across Scanner, Vexer, Authority to avoid divergence.
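
A minimal TypeScript sketch of such a generator follows, assuming a Node.js runtime. It swaps in SHA-256 for BLAKE3, because Node's built-in crypto module does not ship BLAKE3; the function names mirror the ticket but are otherwise hypothetical, and the "grv_" prefix and 40-character truncation are taken from the pseudocode above.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with sorted object keys and no extra whitespace.
// Array order is preserved, so lists must already be sorted by the caller.
function canonicalJsonEncode(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalJsonEncode(item)).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  const members = keys.map(
    (key) => JSON.stringify(key) + ":" + canonicalJsonEncode(value[key]),
  );
  return "{" + members.join(",") + "}";
}

// Assumption: SHA-256 stands in for BLAKE3 in this sketch.
function computeGraphRevisionId(manifest: Json): string {
  const digestHex = createHash("sha256")
    .update(canonicalJsonEncode(manifest), "utf8")
    .digest("hex");
  return "grv_" + digestHex.slice(0, 40);
}

// Accepts only the "grv_" prefix followed by 40 lowercase hex characters.
function validateFormat(graphRevisionId: string): boolean {
  return /^grv_[0-9a-f]{40}$/.test(graphRevisionId);
}
```

Two manifests that differ only in key order produce the same ID, while any change to a value produces a different one. Services written in other languages would need a byte-identical canonical encoding for the IDs to agree.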


4. Backend: APIs

4.1. Graphs & revisions REST API

New endpoints (example):

  1. Get latest graph revision
GET /api/graphs/{graph_id}
Response:
{
  "graph_id": "…",
  "latest_revision_id": "grv_8f2d…c9",
  "created_at": "…",
  "metadata": { … }
}
  2. List revisions for a graph
GET /api/graphs/{graph_id}/revisions
Query: ?page=1&pageSize=20
Response:
{
  "graph_id": "…",
  "items": [
    {
      "graph_revision_id": "grv_8f2d…c9",
      "created_at": "…",
      "parent_revision_id": null,
      "is_pinned": true
    },
    {
      "graph_revision_id": "grv_3a1b…e4",
      "created_at": "…",
      "parent_revision_id": "grv_8f2d…c9",
      "is_pinned": false
    }
  ]
}
  3. Get a specific revision (snapshot)
GET /api/graphs/{graph_id}/revisions/{graph_revision_id}
Response:
{
  "graph_id": "…",
  "graph_revision_id": "…",
  "created_at": "…",
  "parent_revision_id": null,
  "manifest": { … },        // optional: maybe not full content if large
  "provenance": { … }
}
  4. Get nodes/edges for a revision
GET /api/graphs/{graph_id}/nodes?rev={graph_revision_id}
GET /api/graphs/{graph_id}/edges?rev={graph_revision_id}

Behavior:

  • If rev is omitted, serve the nodes/edges of the graph's latest_revision_id.
  • If rev is invalid or unknown, return 404 rather than falling back to latest.
  5. Pin/unpin a revision (optional for v1)
POST /api/graphs/{graph_id}/pin
Body: { "graph_revision_id": "…" }

DELETE /api/graphs/{graph_id}/pin
Body: { "graph_revision_id": "…" }
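
The rev-resolution behavior for the nodes/edges endpoints can be sketched as a small pure function. The lookup maps are hypothetical stand-ins for the graphs and graph_revisions tables.

```typescript
type ResolveResult =
  | { status: 200; graphRevisionId: string }
  | { status: 404 };

// Hypothetical lookups standing in for graphs / graph_revisions rows.
function resolveRevision(
  latestByGraph: Map<string, string>,
  revisionsByGraph: Map<string, Set<string>>,
  graphId: string,
  rev?: string,
): ResolveResult {
  const known = revisionsByGraph.get(graphId);
  if (known === undefined) return { status: 404 };
  if (rev === undefined) {
    // No rev: serve the latest revision.
    const latest = latestByGraph.get(graphId);
    return latest === undefined
      ? { status: 404 }
      : { status: 200, graphRevisionId: latest };
  }
  // Explicit rev: never fall back to latest.
  return known.has(rev) ? { status: 200, graphRevisionId: rev } : { status: 404 };
}
```

Refusing to fall back is what keeps permalinks honest: a mistyped or purged rev fails loudly instead of quietly showing a different graph.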

4.2. Backward compatibility

  • Existing endpoints like GET /api/graphs/{graph_id}/nodes should:

    • Continue working with no rev param.
    • Internally resolve to latest_revision_id.
  • For old records with no revision:

    • Create a synthetic manifest from current stored data.
    • Compute a graph_revision_id.
    • Store it and set latest_revision_id on the graphs row.

5. Scanner / Vexer / Upstream Pipelines

Goal: At the end of a graph build, the pipeline produces a manifest and a graph_revision_id.

5.1. Responsibilities

  1. Scanner/Vexer:

    • Gather:

      • Tool name/version
      • Input artifact hashes
      • Feature flags / config
      • Graph nodes/edges
    • Construct manifest (according to schema).

    • Compute graph_revision_id using shared library.

    • Send manifest + revision ID to Authority via the internal API (POST /internal/graphs/{graph_id}/revisions, see 5.2).

  2. Authority:

    • Idempotently upsert:

      • graphs (if new graph_id)
      • graph_revisions row (if graph_revision_id not yet present)
      • nodes/edges rows keyed by graph_revision_id.
    • Update graphs.latest_revision_id to the new revision.

5.2. Internal API (Authority)

POST /internal/graphs/{graph_id}/revisions
Body:
{
  "graph_revision_id": "…",
  "parent_revision_id": "…",         // optional
  "manifest": { … },
  "provenance": { … },
  "nodes": [ … ],
  "edges": [ … ]
}
Response: 201 Created (or 200 if idempotent)

Rules:

  • If graph_revision_id already exists for that graph_id with identical manifest_hash, treat as idempotent.
  • If graph_revision_id exists but manifest hash differs → log and reject (bug in hashing).
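
The two rules above boil down to a three-way decision. The sketch below uses a hypothetical in-memory map keyed by graph_revision_id; for brevity it ignores graph_id, which a real handler would also check.

```typescript
type IngestOutcome = "created" | "idempotent" | "conflict";

// `stored` maps graph_revision_id -> manifest hash (hypothetical in-memory store).
function ingestRevision(
  stored: Map<string, string>,
  graphRevisionId: string,
  manifestHash: string,
): IngestOutcome {
  const existing = stored.get(graphRevisionId);
  if (existing === undefined) {
    stored.set(graphRevisionId, manifestHash);
    return "created"; // maps to 201 Created
  }
  if (existing === manifestHash) {
    return "idempotent"; // maps to 200, nothing is rewritten
  }
  return "conflict"; // same ID, different manifest: log and reject
}
```

The conflict branch should be unreachable if hashing is correct, which is why it is worth logging loudly when it fires.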

6. Frontend / UX Changes

Assuming a SPA (React/Vue/etc.), we'll treat these as tasks.

6.1. URL & routing

  • New canonical URL format for graph UI:

    • Latest: /graphs/{graph_id}
    • Specific revision: /graphs/{graph_id}?rev={graph_revision_id}
  • Router:

    • Parse rev query param.
    • If present, call GET /api/graphs/{graph_id}/nodes?rev=….
    • If not present, call same endpoint but without rev → backend returns latest.
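
A framework-agnostic sketch of the router logic: read rev from the page's query string and pick the matching API URL. The helper name is hypothetical; `URLSearchParams` is the standard Web API available in browsers and Node.js.

```typescript
// Hypothetical helper: maps the page's query string to the nodes endpoint.
function nodesEndpointFor(graphId: string, pageSearch: string): string {
  const rev = new URLSearchParams(pageSearch).get("rev");
  const base = `/api/graphs/${encodeURIComponent(graphId)}/nodes`;
  // Without rev, the backend resolves to the latest revision.
  return rev ? `${base}?rev=${encodeURIComponent(rev)}` : base;
}
```

For example, a page opened at /graphs/g1?rev=grv_abc would call /api/graphs/g1/nodes?rev=grv_abc, while /graphs/g1 would call /api/graphs/g1/nodes.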

6.2. Displaying revision info

  • In graph page header:

    • Show truncated revision:

      • Rev: 8f2d…c9
    • Buttons:

      • Copy → Copies full graph_revision_id.
      • Share → Copies full URL with ?rev=….
    • Optional chip if pinned: Pinned.

Example data model (TS):

type GraphRevisionSummary = {
  graphId: string;
  graphRevisionId: string;
  createdAt: string;
  parentRevisionId?: string | null;
  isPinned: boolean;
};
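
For the truncated "Rev: 8f2d…c9" display, a small formatting helper might look like this. The name and the 4 + 2 character split are assumptions read off the example in the header copy.

```typescript
// Hypothetical display helper: keeps the first 4 and last 2 characters
// of the ID body, after dropping the "grv_" prefix if present.
function shortRevision(graphRevisionId: string): string {
  const head = 4;
  const tail = 2;
  const body = graphRevisionId.startsWith("grv_")
    ? graphRevisionId.slice(4)
    : graphRevisionId;
  if (body.length <= head + tail) return body; // too short to truncate
  return `${body.slice(0, head)}…${body.slice(-tail)}`;
}
```

The Copy button should still copy the full graph_revision_id; the shortened form is for display only.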

6.3. Revision list panel (optional but useful)

  • Add a side panel or tab: “Revisions”.

  • Fetch from GET /api/graphs/{graph_id}/revisions.

  • Clicking a revision:

    • Navigates to same page with ?rev={graph_revision_id}.
    • Preserves other UI state where reasonable.

6.4. Diff view (nice-to-have, can be v2)

  • UX: “Compare with…” button in header.

    • Opens dialog to pick a second revision.
  • Backend: add a diff endpoint later, or compute diff client-side from node/edge lists if feasible.
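
If the diff is computed client-side, a set-based comparison of the two edge lists is probably enough for a first version. The `Edge` shape below is a hypothetical simplification.

```typescript
// Hypothetical edge shape for illustration.
type Edge = { from: string; to: string };

// JSON-encode the pair so IDs containing separators cannot collide.
const edgeKey = (e: Edge): string => JSON.stringify([e.from, e.to]);

// Returns edges present only in B (added) and only in A (removed).
function diffEdges(a: Edge[], b: Edge[]): { added: Edge[]; removed: Edge[] } {
  const inA = new Set(a.map(edgeKey));
  const inB = new Set(b.map(edgeKey));
  return {
    added: b.filter((e) => !inA.has(edgeKey(e))),
    removed: a.filter((e) => !inB.has(edgeKey(e))),
  };
}
```

This runs in time linear in the number of edges, so it should hold up for moderately sized graphs before a backend diff endpoint is needed.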


7. Migration Plan

7.1. Phase 1: Schema & read-path ready

  1. Add DB columns/tables:

    • graphs, graph_revisions, graph_revision_ledger.
    • graph_revision_id column to call_graph_nodes / call_graph_edges.
  2. Deploy with no behavior changes:

    • Default graph_revision_id columns NULL.
    • Existing APIs continue to work.

7.2. Phase 2: Backfill existing graphs

  1. Write a backfill job:

    • For each distinct existing graph:

      • Build a manifest from existing stored data.
      • Compute graph_revision_id.
      • Insert into graphs & graph_revisions.
      • Update nodes/edges for that graph to set graph_revision_id.
      • Set graphs.latest_revision_id.
  2. Log any graphs that can't be backfilled (corrupt data, etc.) for manual review.

  3. After backfill:

    • Add NOT NULL constraint on graph_revision_id for nodes/edges (if practical).
    • Ensure all public APIs can fetch revisions without changes from clients.
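
The backfill loop from steps 1 and 2 can be sketched as follows. `LegacyGraph` and the injected `computeId` callback are hypothetical; in practice `computeId` would build the synthetic manifest and call the shared GraphRevisionIdGenerator.

```typescript
// Hypothetical shape of a pre-revision graph record.
type LegacyGraph = { graphId: string; nodes: string[]; edges: [string, string][] };
type BackfillResult = { latestByGraph: Map<string, string>; failed: string[] };

// Graphs whose ID computation throws are collected for manual review
// instead of aborting the whole job.
function backfill(
  graphs: LegacyGraph[],
  computeId: (graph: LegacyGraph) => string,
): BackfillResult {
  const latestByGraph = new Map<string, string>();
  const failed: string[] = [];
  for (const graph of graphs) {
    try {
      latestByGraph.set(graph.graphId, computeId(graph));
    } catch {
      failed.push(graph.graphId);
    }
  }
  return { latestByGraph, failed };
}
```

Because the revision ID is derived from content, rerunning the job over the same data yields the same IDs, which makes the backfill safe to retry.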

7.3. Phase 3: Wire up new pipelines

  1. Update Scanner/Vexer to construct manifests and compute revision IDs.

  2. Update Authority to accept /internal/graphs/{graph_id}/revisions.

  3. Gradually roll out:

    • Feature flag: graphRevisionIdFromPipeline.
    • For flagged runs, use the new pipeline; for others, fall back to old + synthetic revision.

7.4. Phase 4: Frontend rollout

  1. Update UI to:

    • Read rev from the URL when present (it stays optional).
    • Show Rev in header.
    • Use revision-aware endpoints.
  2. Once stable:

    • Update “Share” actions to always include ?rev=….

8. Testing Strategy

8.1. Unit tests

  • Hashing library:

    • Same manifest → same graph_revision_id.
    • Different node ordering → same graph_revision_id.
    • Tiny manifest change → different graph_revision_id.
  • Authority service:

    • Creating a revision stores graph_revisions + nodes/edges with matching graph_revision_id.
    • Duplicate revision (same id + manifest) is idempotent.
    • Conflicting manifest with same graph_revision_id is rejected.

8.2. Integration tests

  • Scenario: “Create graph → view in UI”

    • Pipeline produces manifest & revision.
    • Authority persists revision.
    • Ledger logs event.
    • UI shows matching graph_revision_id.
  • Scenario: “Stable permalinks”

    • Capture a link with ?rev=….
    • Rerun pipeline (new revision).
    • Old link still shows original nodes/edges.

8.3. Migration tests

  • On a sanitized snapshot:

    • Run migration & backfill.

    • Spot-check:

      • Each graph_id has exactly one latest_revision_id.
      • Node/edge counts before and after match.
      • Manually recompute hash for a few graphs and compare to stored graph_revision_id.

9. Security & Compliance Considerations

  • Immutability guarantee:

    • Don't allow updates to graph_revisions.manifest.
    • Any change must happen by creating a new revision.
  • Tombstoning (for rare delete cases):

    • If you must “remove” a bad graph, mark revision as tombstoned in an additional column and return 410 Gone for that graph_revision_id.
    • Never reuse that ID.
  • Access control:

    • Ensure revision APIs use the same ACLs as existing graph APIs.
    • Don't leak manifests to users not allowed to see underlying artifacts.
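
The tombstoning rule maps naturally onto HTTP status codes. A minimal sketch, assuming a hypothetical `tombstoned` flag on the revision row:

```typescript
// Hypothetical row shape; `tombstoned` is the additional column mentioned above.
type RevisionRow = { graphRevisionId: string; tombstoned: boolean };

// Returns the HTTP status for a revision lookup.
function statusForRevision(row: RevisionRow | undefined): 200 | 404 | 410 {
  if (row === undefined) return 404; // never existed
  if (row.tombstoned) return 410;    // purged, but the ID stays reserved
  return 200;
}
```

Keeping the tombstoned row around is what guarantees the ID is never reused: a later insert with the same graph_revision_id collides with the existing primary key.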

10. Concrete Ticket Breakdown (example)

You can copy/paste this into your tracker and tweak.

  1. BE-01: Add graphs and graph_revisions tables

    • AC:

      • Tables exist with fields above.
      • Migrations run cleanly in staging.
  2. BE-02: Add graph_revision_id to nodes/edges tables

    • AC:

      • Column added, nullable.
      • No runtime errors in staging.
  3. BE-03: Implement GraphRevisionIdGenerator library

    • AC:

      • Given a manifest, returns deterministic ID.
      • Unit tests cover ordering, minimal changes.
  4. BE-04: Implement /internal/graphs/{graph_id}/revisions in Authority

    • AC:

      • Stores new revision + nodes/edges.
      • Idempotent on duplicate revisions.
  5. BE-05: Implement public revision APIs

    • AC:

      • Endpoints in §4.1 available with Swagger.
      • rev query param supported.
      • Default behavior returns latest revision.
  6. BE-06: Backfill existing graphs into graph_revisions

    • AC:

      • All existing graphs have latest_revision_id.
      • Nodes/edges linked to a graph_revision_id.
      • Metrics & logs generated for failures.
  7. BE-07: Ledger integration for revisions

    • AC:

      • Each new revision creates a ledger entry.
      • Query by graph_revision_id works.
  8. PIPE-01: Scanner/Vexer manifest construction

    • AC:

      • Manifest includes all required fields.
      • Values verified against Authority for a sample run.
  9. PIPE-02: Scanner/Vexer computes graph_revision_id and calls Authority

    • AC:

      • End-to-end pipeline run produces a new graph_revision_id.
      • Authority stores it and sets as latest.
  10. FE-01: UI supports ?rev= param and displays revision

    • AC:

      • When URL has rev, UI loads that revision.
      • When no rev, loads latest.
      • Rev appears in header with copy/share.
  11. FE-02: Revision list UI (optional)

    • AC:

      • Revision panel lists revisions.
      • Click navigates to appropriate ?rev=.

If you'd like, I can next help you turn this into a very explicit design doc (with diagrams and exact JSON examples) or into ready-to-paste migration scripts / TypeScript interfaces tailored to your actual stack.