Here’s a small but high‑impact product tweak: add an immutable graph_revision_id to every call‑graph page and API link, so any result is citeable and reproducible across time.
Why it matters (quick)
- Auditability: you can prove which graph produced a finding.
- Reproducibility: reruns that change paths won’t “move the goalposts.”
- Support & docs: screenshots/links in tickets point to an exact graph state.
What to add
- Stable anchor in all URLs:
  https://…/graphs/{graph_id}?rev={graph_revision_id}
  https://…/api/graphs/{graph_id}/nodes?rev={graph_revision_id}
- Opaque, content‑addressed ID: e.g., graph_revision_id = blake3( sorted_edges + cfg + tool_versions + dataset_hashes ).
- First‑class fields: store graph_id (logical lineage), graph_revision_id (immutable), parent_revision_id (if derived), created_at, provenance (feed hashes, toolchain).
- UI surfacing: show a copy‑button “Rev: 8f2d…c9” on graph pages and in the “Share” dialog.
- Diff affordance: when ?rev=A and ?rev=B are both present, offer “Compare paths (A↔B).”
Minimal API contract (suggested)
- GET /api/graphs/{graph_id} → latest + latest_revision_id
- GET /api/graphs/{graph_id}/revisions/{graph_revision_id} → immutable snapshot
- GET /api/graphs/{graph_id}/nodes?rev=… and /edges?rev=…
- POST /api/graphs/{graph_id}/pin with { graph_revision_id } to mark “official”
- HTTP Link header on all responses: Link: <…/graphs/{graph_id}/revisions/{graph_revision_id}>; rel="version"
How to compute the revision id (deterministic)
- Inputs (all normalized): sorted node/edge sets; build config; tool+model versions; input artifacts (SBOM/VEX/feed) by hash; environment knobs (feature flags).
- Serialization: canonical JSON (UTF‑8, ordered keys).
- Hash: BLAKE3 or SHA‑256, encoded as base58 or hex (shortened in UI, full in API).
- Store alongside a manifest (so you can replay the graph later).
Guardrails
- Never reuse an ID if any input bit differs.
- Do not make it guessable from business data (avoid leaking repo names, paths).
- Break glass: if a bad graph must be purged, keep the ID tombstoned (410 Gone) so references don’t silently change.
Stella Ops touches (concrete)
- Authority: add GraphRevisionManifest (feeds, lattice/policy versions, scanners, in‑toto/DSSE attestations).
- Scanner/Vexer: emit deterministic manifests and hand them to Authority for id derivation.
- Ledger: record (graph_id, graph_revision_id, manifest_hash, signatures); expose audit query by graph_revision_id.
- Docs & Support: “Attach your graph_revision_id” line in issue templates.
Tiny UX copy
- On graph page header: Rev 8f2d…c9 • Copy • Compare • Pin
- Share dialog: “This link freezes today’s state. New runs get a different rev.”
If you want, I can draft the DB table, the manifest JSON schema, and the exact URL/router changes for your .NET 10 services next.
Cool, let’s turn this into something your engineers can actually pick up and implement.
Below is a concrete implementation plan broken down by phases, services, and tickets, with suggested data models, APIs, and tests.
0. Definitions (shared across teams)
- Graph ID (graph_id) – Logical identifier for a call graph lineage (e.g., “the call graph for build X of repo Y”).
- Graph Revision ID (graph_revision_id) – Immutable identifier for a specific snapshot of that graph, derived from a manifest (content-addressed hash).
- Parent Revision ID (parent_revision_id) – Previous revision in the lineage (if any).
- Manifest – Canonical JSON blob that describes everything that could affect graph structure or results:
  - Nodes & edges
  - Input feeds and their hashes (SBOM, VEX, scanner output, etc.)
  - Config/policies/feature flags
  - Tool + version (scanner, vexer, authority)
1. High-Level Architecture Changes
- Introduce graph_revision_id as a first-class concept in:
  - Graph storage / Authority
  - Ledger / audit
  - Backend APIs serving call graphs
- Derive graph_revision_id deterministically from a manifest via a cryptographic hash.
- Expose revision in all graph-related URLs & APIs:
  - UI: …/graphs/{graph_id}?rev={graph_revision_id}
  - API: …/api/graphs/{graph_id}/revisions/{graph_revision_id}
- Ensure immutability: once a revision exists, it can never be updated in place, only superseded by new revisions.
2. Backend: Data Model & Storage
2.1. Authority (graph source of truth)
Goal: Model graphs and revisions explicitly.
New / updated tables (example in SQL-ish form):
- Graphs (logical entity)
CREATE TABLE graphs (
    id UUID PRIMARY KEY,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    latest_revision_id VARCHAR(128) NULL, -- FK into graph_revisions.id
    label TEXT NULL, -- optional human label
    metadata JSONB NULL
);
- Graph Revisions (immutable snapshots)
CREATE TABLE graph_revisions (
    id VARCHAR(128) PRIMARY KEY, -- graph_revision_id (hash)
    graph_id UUID NOT NULL REFERENCES graphs(id),
    parent_revision_id VARCHAR(128) NULL REFERENCES graph_revisions(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    manifest JSONB NOT NULL, -- canonical manifest
    provenance JSONB NOT NULL, -- tool versions, etc.
    is_pinned BOOLEAN NOT NULL DEFAULT FALSE,
    pinned_by UUID NULL, -- user id
    pinned_at TIMESTAMPTZ NULL
);
CREATE INDEX idx_graph_revisions_graph_id ON graph_revisions(graph_id);
- Call Graph Data (if separate)
If you store nodes/edges in separate tables, add a graph_revision_id foreign key to each:
ALTER TABLE call_graph_nodes
    ADD COLUMN graph_revision_id VARCHAR(128) NULL REFERENCES graph_revisions(id);
ALTER TABLE call_graph_edges
    ADD COLUMN graph_revision_id VARCHAR(128) NULL REFERENCES graph_revisions(id);
Rule: Nodes/edges for a revision are never mutated; a new revision means new rows.
2.2. Ledger (audit trail)
Goal: Every revision gets a ledger record for auditability.
Table change or new table:
CREATE TABLE graph_revision_ledger (
    id BIGSERIAL PRIMARY KEY,
    graph_revision_id VARCHAR(128) NOT NULL,
    graph_id UUID NOT NULL,
    manifest_hash VARCHAR(128) NOT NULL,
    manifest_digest_algo TEXT NOT NULL, -- e.g., "BLAKE3"
    authority_signature BYTEA NULL, -- optional
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_grl_revision ON graph_revision_ledger(graph_revision_id);
Ledger ingestion happens after a revision is stored in Authority, but before it is exposed as “current” in the UI.
3. Backend: Revision Hashing & Manifest
3.1. Define the manifest schema
Create a spec (e.g., JSON Schema) used by Scanner/Vexer/Authority.
Example structure:
{
  "graph": {
    "graph_id": "uuid",
    "generator": {
      "tool_name": "scanner",
      "tool_version": "1.4.2",
      "run_id": "some-run-id"
    }
  },
  "inputs": {
    "sbom_hash": "sha256:…",
    "vex_hash": "sha256:…",
    "repos": [
      {
        "name": "repo-a",
        "commit": "abc123",
        "tree_hash": "sha1:…"
      }
    ]
  },
  "config": {
    "policy_version": "2024-10-01",
    "feature_flags": {
      "new_vex_engine": true
    }
  },
  "graph_content": {
    "nodes": [
      // nodes in canonical sorted order
    ],
    "edges": [
      // edges in canonical sorted order
    ]
  }
}
Key requirements:
- All lists that affect the graph (nodes, edges, repos, etc.) must be sorted deterministically.
- Keys must be stable (no environment-dependent keys, no random IDs).
- All hashes of input artifacts must be included (not raw content).
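The sorting requirement above can be sketched as follows. This is a minimal illustration, not the plan's actual schema: the node and edge shapes (`id`, `from`, `to`, `kind`) and the function names are assumptions. The comparison uses plain code-unit ordering so the result does not depend on the host locale.

```typescript
// Hypothetical shapes; the real node/edge schema is not specified in this plan.
type GraphNode = { id: string };
type GraphEdge = { from: string; to: string; kind: string };

// Compare by UTF-16 code unit order, which is locale-independent
// (unlike String.prototype.localeCompare).
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Return a sorted copy; the input array is left untouched.
function sortNodes(nodes: GraphNode[]): GraphNode[] {
  return [...nodes].sort((a, b) => compareStrings(a.id, b.id));
}

// Order edges by (from, to, kind) so ties are always broken the same way.
function sortEdges(edges: GraphEdge[]): GraphEdge[] {
  return [...edges].sort(
    (a, b) =>
      compareStrings(a.from, b.from) ||
      compareStrings(a.to, b.to) ||
      compareStrings(a.kind, b.kind),
  );
}
```

Sorting on a full tuple rather than a single field matters here: if two edges compared equal, their relative order could depend on insertion order and the manifest would stop being deterministic.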
3.2. Hash computation
Language-agnostic algorithm:
- Normalize manifest to canonical JSON:
  - UTF-8
  - Sorted keys
  - No extra whitespace
- Hash the bytes using a cryptographic hash (BLAKE3 or SHA-256).
- Encode as hex or base58 string.
Pseudocode:
function compute_graph_revision_id(manifest):
    canonical_json = canonical_json_encode(manifest) // sorted keys
    digest_bytes = BLAKE3(canonical_json)
    digest_hex = hex_encode(digest_bytes)
    return "grv_" + digest_hex[0:40] // prefix + digest truncated to 40 hex chars (160 bits)
Ticket: Implement GraphRevisionIdGenerator library (shared):
- Compute(manifest) -> graph_revision_id
- ValidateFormat(graph_revision_id) -> bool
Make this a shared library across Scanner, Vexer, Authority to avoid divergence.
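One way the generator could look in TypeScript is sketched below. It makes several assumptions: it uses SHA-256 from Node's built-in `crypto` module (BLAKE3 would need a third-party package), it keeps the full 64-character hex digest instead of truncating, and the `canonicalJson` helper is a simplified illustration rather than a complete RFC 8785 implementation. The function names mirror the ticket but are otherwise hypothetical.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted and no extra whitespace.
// Array order is preserved, so callers must pre-sort nodes, edges, and repos.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalJson(item)).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  const members = keys.map((k) => JSON.stringify(k) + ":" + canonicalJson(value[k]));
  return "{" + members.join(",") + "}";
}

// Assumption: SHA-256 over the UTF-8 bytes, "grv_" prefix as in the pseudocode.
function computeGraphRevisionId(manifest: Json): string {
  const digestHex = createHash("sha256")
    .update(canonicalJson(manifest), "utf8")
    .digest("hex");
  return "grv_" + digestHex;
}

// Accepts only the format produced above: prefix plus 64 lowercase hex chars.
function validateFormat(graphRevisionId: string): boolean {
  return /^grv_[0-9a-f]{64}$/.test(graphRevisionId);
}
```

With this sketch, `computeGraphRevisionId({ b: 1, a: [1, 2] })` and `computeGraphRevisionId({ a: [1, 2], b: 1 })` return the same ID because key order is normalized, while reordering the array changes the ID. That is why list sorting has to happen before hashing.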
4. Backend: APIs
4.1. Graphs & revisions REST API
New endpoints (example):
- Get latest graph revision
GET /api/graphs/{graph_id}
Response:
{
  "graph_id": "…",
  "latest_revision_id": "grv_8f2d…c9",
  "created_at": "…",
  "metadata": { … }
}
- List revisions for a graph
GET /api/graphs/{graph_id}/revisions
Query: ?page=1&pageSize=20
Response:
{
  "graph_id": "…",
  "items": [
    {
      "graph_revision_id": "grv_8f2d…c9",
      "created_at": "…",
      "parent_revision_id": null,
      "is_pinned": true
    },
    {
      "graph_revision_id": "grv_3a1b…e4",
      "created_at": "…",
      "parent_revision_id": "grv_8f2d…c9",
      "is_pinned": false
    }
  ]
}
- Get a specific revision (snapshot)
GET /api/graphs/{graph_id}/revisions/{graph_revision_id}
Response:
{
  "graph_id": "…",
  "graph_revision_id": "…",
  "created_at": "…",
  "parent_revision_id": null,
  "manifest": { … }, // optional: maybe not full content if large
  "provenance": { … }
}
- Get nodes/edges for a revision
GET /api/graphs/{graph_id}/nodes?rev={graph_revision_id}
GET /api/graphs/{graph_id}/edges?rev={graph_revision_id}
Behavior:
- If rev is omitted, serve the revision named by latest_revision_id for that graph_id.
- If rev is invalid or unknown, return 404 (no fallback to latest).
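The resolution behavior above can be sketched as a small pure function. The `GraphRecord` shape and the function name are hypothetical, standing in for whatever the Authority lookup actually returns.

```typescript
// Hypothetical in-memory view of a graphs row and its known revisions.
type GraphRecord = { latestRevisionId: string | null; revisionIds: Set<string> };

type Resolution = { status: 200; revisionId: string } | { status: 404 };

function resolveRevision(graph: GraphRecord | undefined, rev: string | null): Resolution {
  if (!graph) return { status: 404 };
  if (rev === null) {
    // No rev given: serve whatever the graph currently marks as latest.
    return graph.latestRevisionId
      ? { status: 200, revisionId: graph.latestRevisionId }
      : { status: 404 };
  }
  // An explicit rev must match exactly; never fall back to latest.
  return graph.revisionIds.has(rev) ? { status: 200, revisionId: rev } : { status: 404 };
}
```

Refusing to fall back is the point of the rule: a permalink that silently resolved to a different revision would defeat the reproducibility the revision ID is meant to provide.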
- Pin/unpin a revision (optional for v1)
POST /api/graphs/{graph_id}/pin
Body: { "graph_revision_id": "…" }
DELETE /api/graphs/{graph_id}/pin
Body: { "graph_revision_id": "…" }
4.2. Backward compatibility
- Existing endpoints like GET /api/graphs/{graph_id}/nodes should:
  - Continue working with no rev param.
  - Internally resolve to latest_revision_id.
- For old records with no revision:
  - Create a synthetic manifest from current stored data.
  - Compute a graph_revision_id.
  - Store it and set latest_revision_id on the graphs row.
5. Scanner / Vexer / Upstream Pipelines
Goal: At the end of a graph build, Scanner/Vexer produce a manifest and a graph_revision_id.
5.1. Responsibilities
- Scanner/Vexer:
  - Gather:
    - Tool name/version
    - Input artifact hashes
    - Feature flags / config
    - Graph nodes/edges
  - Construct manifest (according to schema).
  - Compute graph_revision_id using shared library.
  - Send manifest + revision ID to Authority via an internal API (e.g., POST /internal/graphs/{graph_id}/revisions, described in 5.2).
- Authority:
  - Idempotently upsert:
    - graphs (if new graph_id)
    - graph_revisions row (if graph_revision_id not yet present)
    - nodes/edges rows keyed by graph_revision_id.
  - Update graphs.latest_revision_id to the new revision.
5.2. Internal API (Authority)
POST /internal/graphs/{graph_id}/revisions
Body:
{
  "graph_revision_id": "…",
  "parent_revision_id": "…", // optional
  "manifest": { … },
  "provenance": { … },
  "nodes": [ … ],
  "edges": [ … ]
}
Response: 201 Created (or 200 if idempotent)
Rules:
- If graph_revision_id already exists for that graph_id with identical manifest_hash, treat as idempotent.
- If graph_revision_id exists but manifest hash differs → log and reject (bug in hashing).
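These two rules can be sketched with an in-memory store. The class, its field names, and the choice of 409 for the rejection are assumptions for illustration; the plan only says "log and reject" and does not fix a status code.

```typescript
type StoredRevision = { graphId: string; manifestHash: string };

type UpsertResult =
  | { status: 201 } // newly created
  | { status: 200 } // idempotent replay
  | { status: 409; reason: string }; // assumption: conflict status for a rejected write

// Hypothetical stand-in for the graph_revisions table, keyed by graph_revision_id.
class RevisionStore {
  private revisions = new Map<string, StoredRevision>();

  upsert(graphId: string, revisionId: string, manifestHash: string): UpsertResult {
    const existing = this.revisions.get(revisionId);
    if (!existing) {
      this.revisions.set(revisionId, { graphId, manifestHash });
      return { status: 201 };
    }
    if (existing.graphId === graphId && existing.manifestHash === manifestHash) {
      return { status: 200 };
    }
    // Same ID with different content means the ID derivation is broken somewhere.
    return { status: 409, reason: "graph_revision_id exists with a different manifest" };
  }
}
```

In a real database the same effect would typically come from the primary key on graph_revisions.id plus a comparison of the stored manifest hash inside one transaction.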
6. Frontend / UX Changes
Assuming a SPA (React/Vue/etc.), we’ll treat these as tasks.
6.1. URL & routing
- New canonical URL format for graph UI:
  - Latest: /graphs/{graph_id}
  - Specific revision: /graphs/{graph_id}?rev={graph_revision_id}
- Router:
  - Parse rev query param.
  - If present, call GET /api/graphs/{graph_id}/nodes?rev=….
  - If not present, call same endpoint but without rev → backend returns latest.
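The router logic above might look like the sketch below. It is framework-agnostic and relies only on the standard `URLSearchParams` API; the function name `nodesApiUrl` is hypothetical.

```typescript
// Build the nodes API URL for a graph page, forwarding ?rev= only when present.
// pageSearch is the page's query string, e.g. window.location.search in a browser.
function nodesApiUrl(graphId: string, pageSearch: string): string {
  const rev = new URLSearchParams(pageSearch).get("rev");
  const base = `/api/graphs/${encodeURIComponent(graphId)}/nodes`;
  // A missing or empty rev means "latest": call the endpoint without the param.
  return rev ? `${base}?rev=${encodeURIComponent(rev)}` : base;
}
```

For example, `nodesApiUrl("g1", "?rev=grv_8f2d")` yields `/api/graphs/g1/nodes?rev=grv_8f2d`, while `nodesApiUrl("g1", "")` yields `/api/graphs/g1/nodes`.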
6.2. Displaying revision info
- In graph page header:
  - Show truncated revision: Rev: 8f2d…c9
  - Buttons:
    - Copy → Copies full graph_revision_id.
    - Share → Copies full URL with ?rev=….
  - Optional chip if pinned: Pinned.
Example data model (TS):
type GraphRevisionSummary = {
  graphId: string;
  graphRevisionId: string;
  createdAt: string;
  parentRevisionId?: string | null;
  isPinned: boolean;
};
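A small helper for the truncated header display could look like this. It is a sketch that assumes the `grv_` prefix from the hashing pseudocode and a 4 + 2 character split matching the "8f2d…c9" example; the function name and parameters are hypothetical.

```typescript
// Shorten a revision ID for display; the full ID is still what Copy/Share use.
function shortRev(graphRevisionId: string, head = 4, tail = 2): string {
  const digest = graphRevisionId.replace(/^grv_/, "");
  // Nothing to elide if the digest is already short enough.
  if (digest.length <= head + tail) return digest;
  return `${digest.slice(0, head)}…${digest.slice(-tail)}`;
}
```

The shortened form is for display only. It should never be accepted as input to an API, since different revisions can share the same first and last characters.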
6.3. Revision list panel (optional but useful)
- Add a side panel or tab: “Revisions”.
- Fetch from GET /api/graphs/{graph_id}/revisions.
- Clicking a revision:
  - Navigates to same page with ?rev={graph_revision_id}.
  - Preserves other UI state where reasonable.
6.4. Diff view (nice-to-have, can be v2)
- UX: “Compare with…” button in header.
  - Opens dialog to pick a second revision.
- Backend: add a diff endpoint later, or compute diff client-side from node/edge lists if feasible.
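If the diff is computed client-side, one possible shape is a set difference over edge keys, sketched below. The `Edge` shape and function names are assumptions; a real diff would probably also cover nodes and edge attributes.

```typescript
// Hypothetical minimal edge shape returned by /edges?rev=…
type Edge = { from: string; to: string };
type EdgeDiff = { added: Edge[]; removed: Edge[] };

// JSON-encode the pair so that separators inside IDs cannot cause key collisions.
const edgeKey = (e: Edge): string => JSON.stringify([e.from, e.to]);

// Edges present only in revision B are "added"; only in revision A are "removed".
function diffEdges(revA: Edge[], revB: Edge[]): EdgeDiff {
  const keysA = new Set(revA.map(edgeKey));
  const keysB = new Set(revB.map(edgeKey));
  return {
    added: revB.filter((e) => !keysA.has(edgeKey(e))),
    removed: revA.filter((e) => !keysB.has(edgeKey(e))),
  };
}
```

This runs in time linear in the number of edges, which is usually fine for a single graph page; very large graphs are the case where a backend diff endpoint becomes worth adding.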
7. Migration Plan
7.1. Phase 1 – Schema & read-path ready
- Add DB columns/tables:
  - graphs, graph_revisions, graph_revision_ledger.
  - graph_revision_id column to call_graph_nodes / call_graph_edges.
- Deploy with no behavior changes:
  - Default graph_revision_id columns NULL.
  - Existing APIs continue to work.
7.2. Phase 2 – Backfill existing graphs
- Write a backfill job:
  - For each distinct existing graph:
    - Build a manifest from existing stored data.
    - Compute graph_revision_id.
    - Insert into graphs & graph_revisions.
    - Update nodes/edges for that graph to set graph_revision_id.
    - Set graphs.latest_revision_id.
- Log any graphs that can’t be backfilled (corrupt data, etc.) for manual review.
- After backfill:
  - Add NOT NULL constraint on graph_revision_id for nodes/edges (if practical).
  - Ensure all public APIs can fetch revisions without changes from clients.
7.3. Phase 3 – Wire up new pipelines
- Update Scanner/Vexer to construct manifests and compute revision IDs.
- Update Authority to accept /internal/graphs/{graph_id}/revisions.
- Gradually roll out:
  - Feature flag: graphRevisionIdFromPipeline.
  - For flagged runs, use the new pipeline; for others, fall back to old + synthetic revision.
7.4. Phase 4 – Frontend rollout
- Update UI to:
  - Read rev from URL (but not required).
  - Show Rev in header.
  - Use revision-aware endpoints.
- Once stable:
  - Update “Share” actions to always include ?rev=….
8. Testing Strategy
8.1. Unit tests
- Hashing library:
  - Same manifest → same graph_revision_id.
  - Different node ordering → same graph_revision_id.
  - Tiny manifest change → different graph_revision_id.
- Authority service:
  - Creating a revision stores graph_revisions + nodes/edges with matching graph_revision_id.
  - Duplicate revision (same id + manifest) is idempotent.
  - Conflicting manifest with same graph_revision_id is rejected.
8.2. Integration tests
- Scenario: “Create graph → view in UI”
  - Pipeline produces manifest & revision.
  - Authority persists revision.
  - Ledger logs event.
  - UI shows matching graph_revision_id.
- Scenario: “Stable permalinks”
  - Capture a link with ?rev=….
  - Rerun pipeline (new revision).
  - Old link still shows original nodes/edges.
8.3. Migration tests
- On a sanitized snapshot:
  - Run migration & backfill.
  - Spot-check:
    - Each graph_id has exactly one latest_revision_id.
    - Node/edge counts before and after match.
    - Manually recompute hash for a few graphs and compare to stored graph_revision_id.
9. Security & Compliance Considerations
- Immutability guarantee:
  - Don’t allow updates to graph_revisions.manifest.
  - Any change must happen by creating a new revision.
- Tombstoning (for rare delete cases):
  - If you must “remove” a bad graph, mark revision as tombstoned in an additional column and return 410 Gone for that graph_revision_id.
  - Never reuse that ID.
- Access control:
  - Ensure revision APIs use the same ACLs as existing graph APIs.
  - Don’t leak manifests to users not allowed to see underlying artifacts.
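The tombstoning rule in this section can be sketched as a status lookup plus an insert guard. The `RevisionRow` shape and both function names are hypothetical, and the sketch deliberately leaves out ACL checks.

```typescript
// Hypothetical row shape: graph_revisions plus the additional tombstoned column.
type RevisionRow = { id: string; tombstoned: boolean };

// Map a lookup result to the HTTP status the revision endpoint should return.
function statusForRevision(row: RevisionRow | undefined): 200 | 404 | 410 {
  if (!row) return 404; // never existed
  return row.tombstoned ? 410 : 200; // existed but purged vs. available
}

// A tombstoned row still occupies its ID, so a new revision may only be
// inserted when no row at all exists for that ID.
function canInsertRevision(existing: RevisionRow | undefined): boolean {
  return existing === undefined;
}
```

Keeping the row and flipping a flag, rather than deleting it, is what lets old links fail loudly with 410 instead of looking like a typo (404) or resolving to different content.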
10. Concrete Ticket Breakdown (example)
You can copy/paste this into your tracker and tweak.
- BE-01 – Add graphs and graph_revisions tables
  - AC:
    - Tables exist with fields above.
    - Migrations run cleanly in staging.
- BE-02 – Add graph_revision_id to nodes/edges tables
  - AC:
    - Column added, nullable.
    - No runtime errors in staging.
- BE-03 – Implement GraphRevisionIdGenerator library
  - AC:
    - Given a manifest, returns deterministic ID.
    - Unit tests cover ordering, minimal changes.
- BE-04 – Implement /internal/graphs/{graph_id}/revisions in Authority
  - AC:
    - Stores new revision + nodes/edges.
    - Idempotent on duplicate revisions.
- BE-05 – Implement public revision APIs
  - AC:
    - Endpoints in §4.1 available with Swagger.
    - rev query param supported.
    - Default behavior returns latest revision.
- BE-06 – Backfill existing graphs into graph_revisions
  - AC:
    - All existing graphs have latest_revision_id.
    - Nodes/edges linked to a graph_revision_id.
    - Metrics & logs generated for failures.
- BE-07 – Ledger integration for revisions
  - AC:
    - Each new revision creates a ledger entry.
    - Query by graph_revision_id works.
- PIPE-01 – Scanner/Vexer manifest construction
  - AC:
    - Manifest includes all required fields.
    - Values verified against Authority for a sample run.
- PIPE-02 – Scanner/Vexer computes graph_revision_id and calls Authority
  - AC:
    - End-to-end pipeline run produces a new graph_revision_id.
    - Authority stores it and sets as latest.
- FE-01 – UI supports ?rev= param and displays revision
  - AC:
    - When URL has rev, UI loads that revision.
    - When no rev, loads latest.
    - Rev appears in header with copy/share.
- FE-02 – Revision list UI (optional)
  - AC:
    - Revision panel lists revisions.
    - Click navigates to appropriate ?rev=.
If you’d like, I can next help you turn this into a very explicit design doc (with diagrams and exact JSON examples) or into ready-to-paste migration scripts / TypeScript interfaces tailored to your actual stack.