Some checks failed
api-governance / spectral-lint (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
oas-ci / oas-validate (push) Has been cancelled
SDK Publish & Sign / sdk-publish (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
Policy Simulation / policy-simulate (push) Has been cancelled
devportal-offline / build-offline (push) Has been cancelled
66 lines
5.9 KiB
Markdown
66 lines
5.9 KiB
Markdown
# Graph architecture
|
||
|
||
> Derived from Epic 5 – SBOM Graph Explorer; this section captures the core model, pipeline, and API expectations. Extend with diagrams as implementation matures.
|
||
|
||
## 1) Core model
|
||
|
||
- **Nodes:**
|
||
- `Artifact` (application/image digest) with metadata (tenant, environment, labels).
|
||
- `Component` (package/version, purl, ecosystem).
|
||
- `File`/`Path` (source files, binary paths) with hash/time metadata.
|
||
- `License` nodes linked to components and SBOM attestations.
|
||
- `Advisory` and `VEXStatement` nodes linking to Concelier/Excititor records via digests.
|
||
- `PolicyVersion` nodes representing signed policy packs.
|
||
- **Edges:** directed, timestamped relationships such as `DEPENDS_ON`, `BUILT_FROM`, `DECLARED_IN`, `AFFECTED_BY`, `VEX_EXEMPTS`, `GOVERNS_WITH`, `OBSERVED_RUNTIME`. Each edge carries provenance (SRM hash, SBOM digest, policy run ID).
|
||
- **Overlays:** computed index tables providing fast access to reachability, blast radius, and differential views (e.g., `graph_overlay/vuln/{tenant}/{advisoryKey}`). Runtime endpoints emit overlays inline (`policy.overlay.v1`, `openvex.v1`) with deterministic overlay IDs (`sha256(tenant|nodeId|overlayKind)`) and sampled explain traces on policy overlays.
|
||
|
||
## 2) Pipelines
|
||
|
||
1. **Ingestion:** Cartographer/SBOM Service emit SBOM snapshots (`sbom_snapshot` events) captured by the Graph Indexer. Advisories/VEX from Concelier/Excititor generate edge updates, policy runs attach overlay metadata.
|
||
2. **ETL:** Normalises nodes/edges into canonical IDs, deduplicates, enforces tenant partitions, and writes to the graph store (planned: Neo4j-compatible or document + adjacency lists in Mongo).
|
||
3. **Overlay computation:** Batch workers build materialised views for frequently used queries (impact lists, saved queries, policy overlays) and store as immutable blobs for Offline Kit exports.
|
||
4. **Diffing:** `graph_diff` jobs compare two snapshots (e.g., pre/post deploy) and generate signed diff manifests for UI/CLI consumption.
|
||
5. **Analytics (Runtime & Signals 140.A):** background workers run Louvain-style clustering + degree/betweenness approximations on ingested graphs, emitting overlays per tenant/snapshot and writing cluster ids back to nodes when enabled.
|
||
|
||
## 3) APIs
|
||
|
||
- `POST /graph/search` — NDJSON node tiles with cursor paging, tenant + scope guards.
|
||
- `POST /graph/query` — NDJSON nodes/edges/stats/cursor with budgets (tiles/nodes/edges) and optional inline overlays (`includeOverlays=true`) emitting `policy.overlay.v1` and `openvex.v1` payloads; overlay IDs are `sha256(tenant|nodeId|overlayKind)`; policy overlay may include a sampled `explainTrace`.
|
||
- `POST /graph/paths` — bounded BFS (depth ≤6) returning path nodes/edges/stats; honours budgets and overlays.
|
||
- `POST /graph/diff` — compares `snapshotA` vs `snapshotB`, streaming node/edge added/removed/changed tiles plus stats; budget enforcement mirrors `/graph/query`.
|
||
- `POST /graph/export` — async job producing deterministic manifests (`sha256`, size, format) for `ndjson/csv/graphml/png/svg`; download via `/graph/export/{jobId}`.
|
||
- Legacy: `GET /graph/nodes/{id}`, `POST /graph/query/saved`, `GET /graph/impact/{advisoryKey}`, `POST /graph/overlay/policy` remain in spec but should align to the NDJSON surfaces above as they are brought forward.
|
||
|
||
## 4) Storage considerations
|
||
|
||
- Backed by either:
|
||
- **Document + adjacency** (Mongo collections `graph_nodes`, `graph_edges`, `graph_overlays`) with deterministic ordering and streaming exports.
|
||
- Or **Graph DB** (e.g., Neo4j/Cosmos Gremlin) behind an abstraction layer; choice depends on deployment footprint.
|
||
- All storages require tenant partitioning, append-only change logs, and export manifests for Offline Kits.
|
||
|
||
## 5) Offline & export
|
||
|
||
- Each snapshot packages `nodes.jsonl`, `edges.jsonl`, `overlays/` plus manifest with hash, counts, and provenance. Export Center consumes these artefacts for graph-specific bundles.
|
||
- Saved queries and overlays include deterministic IDs so Offline Kit consumers can import and replay results.
|
||
- Runtime hosts register the SBOM ingest pipeline via `services.AddSbomIngestPipeline(...)`. Snapshot exports default to `./artifacts/graph-snapshots` but can be redirected with `STELLAOPS_GRAPH_SNAPSHOT_DIR` or the `SbomIngestOptions.SnapshotRootDirectory` callback.
|
||
- Analytics overlays are exported as NDJSON (`overlays/clusters.ndjson`, `overlays/centrality.ndjson`) ordered by node id; `overlays/manifest.json` mirrors snapshot id and counts for offline parity.
|
||
|
||
## 6) Observability
|
||
|
||
- Metrics: ingestion lag (`graph_ingest_lag_seconds`), node/edge counts, query latency per saved query, overlay generation duration.
|
||
- New analytics metrics: `graph_analytics_runs_total`, `graph_analytics_failures_total`, `graph_analytics_clusters_total`, `graph_analytics_centrality_total`, plus change-stream/backfill counters (`graph_changes_total`, `graph_backfill_total`, `graph_change_failures_total`, `graph_change_lag_seconds`).
|
||
- Logs: structured events for ETL stages and query execution (with trace IDs).
|
||
- Traces: ETL pipeline spans, query engine spans.
|
||
|
||
## 7) Rollout notes
|
||
|
||
- Phase 1: ingest SBOM + advisories, deliver impact queries.
|
||
- Phase 2: add VEX overlays, policy overlays, diff tooling.
|
||
- Phase 3: expose runtime/Zastava edges and AI-assisted recommendations (future).
|
||
|
||
### Local testing note
|
||
|
||
Set `STELLAOPS_TEST_MONGO_URI` to a reachable MongoDB instance before running `tests/Graph/StellaOps.Graph.Indexer.Tests`. The test harness falls back to `mongodb://127.0.0.1:27017`, then Mongo2Go, but the CI workflow requires the environment variable to be present to ensure upsert coverage runs against a managed database. Use `STELLAOPS_GRAPH_SNAPSHOT_DIR` (or the `AddSbomIngestPipeline` options callback) to control where graph snapshot artefacts land during local runs.
|
||
|
||
Refer to the module README and implementation plan for immediate context, and update this document once component boundaries and data flows are finalised.
|