git.stella-ops.org/docs/modules/graph/architecture.md
2026-03-11 10:07:30 +02:00

# Graph architecture
> Derived from Epic 5 – SBOM Graph Explorer; this section captures the core model, pipeline, and API expectations. Extend with diagrams as implementation matures.
## 1) Core model
- **Nodes:**
  - `Artifact` (application/image digest) with metadata (tenant, environment, labels).
  - `SBOM` (sbom digest, format, version/sequence, chain id).
  - `Component` (package/version, purl, ecosystem).
  - `File`/`Path` (source files, binary paths) with hash/time metadata.
  - `License` nodes linked to components and SBOM attestations.
  - `Advisory` and `VEXStatement` nodes linking to Concelier/Excititor records via digests.
  - `PolicyVersion` nodes representing signed policy packs.
- **Edges:** directed, timestamped relationships such as `DEPENDS_ON`, `BUILT_FROM`, `DECLARED_IN`, `AFFECTED_BY`, `VEX_EXEMPTS`, `GOVERNS_WITH`, `OBSERVED_RUNTIME`, `SBOM_VERSION_OF`, and `SBOM_LINEAGE_*`. Each edge carries provenance (SRM hash, SBOM digest, policy run ID).
- **Overlays:** computed index tables providing fast access to reachability, blast radius, and differential views (e.g., `graph_overlay/vuln/{tenant}/{advisoryKey}`). Runtime endpoints emit overlays inline (`policy.overlay.v1`, `openvex.v1`) with deterministic overlay IDs (`sha256(tenant|nodeId|overlayKind)`) and sampled explain traces on policy overlays.
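
The deterministic overlay ID described above can be sketched as follows. This is a minimal illustration, assuming the three fields are joined with a literal `|`, encoded as UTF-8, and rendered as lowercase hex; the function name `overlay_id` is hypothetical, and the document does not specify encoding or case normalisation.

```python
import hashlib


def overlay_id(tenant: str, node_id: str, overlay_kind: str) -> str:
    """Return a hex SHA-256 digest over 'tenant|nodeId|overlayKind'."""
    # Assumption: fields are joined with a literal '|' and UTF-8 encoded.
    material = "|".join((tenant, node_id, overlay_kind))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Because the ID depends only on its three inputs, recomputing an overlay for the same tenant, node, and kind yields the same ID, which is what lets offline consumers match overlays across exports.
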
## 2) Pipelines
1. **Ingestion:** Cartographer/SBOM Service emit SBOM snapshots (`sbom_snapshot` events) captured by the Graph Indexer. Ledger lineage references become `SBOM_VERSION_OF` + `SBOM_LINEAGE_*` edges. Advisories/VEX from Concelier/Excititor generate edge updates, and policy runs attach overlay metadata.
2. **ETL:** Normalises nodes/edges into canonical IDs, deduplicates, enforces tenant partitions, and writes to the graph store (planned: Neo4j-compatible or PostgreSQL adjacency lists).
3. **Overlay computation:** Batch workers build materialised views for frequently used queries (impact lists, saved queries, policy overlays) and store as immutable blobs for Offline Kit exports.
4. **Diffing:** `graph_diff` jobs compare two snapshots (e.g., pre/post deploy) and generate signed diff manifests for UI/CLI consumption.
5. **Analytics (Runtime & Signals 140.A):** background workers run Louvain-style clustering + degree/betweenness approximations on ingested graphs, emitting overlays per tenant/snapshot and writing cluster ids back to nodes when enabled.
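
As a rough illustration of the diffing step, the sketch below compares two snapshots held as dictionaries keyed by canonical ID and reports added, removed, and changed entries in sorted order. The function name `diff_snapshots` and the in-memory shape are assumptions made for illustration; manifest signing is deliberately left out.

```python
from typing import Any, Dict, List


def diff_snapshots(
    snapshot_a: Dict[str, Any], snapshot_b: Dict[str, Any]
) -> Dict[str, List[str]]:
    """Compare two {canonical_id: payload} maps (hypothetical shape)."""
    # Sorting keeps the output deterministic regardless of dict ordering.
    added = sorted(key for key in snapshot_b if key not in snapshot_a)
    removed = sorted(key for key in snapshot_a if key not in snapshot_b)
    changed = sorted(
        key
        for key in snapshot_a
        if key in snapshot_b and snapshot_a[key] != snapshot_b[key]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

The same function would apply to nodes and edges separately, for example once for a pre-deploy snapshot and once for a post-deploy snapshot.
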
## 3) APIs
- `POST /graph/search` — NDJSON node tiles with cursor paging, tenant + scope guards.
- `POST /graph/query` — NDJSON nodes/edges/stats/cursor with budgets (tiles/nodes/edges) and optional inline overlays (`includeOverlays=true`) emitting `policy.overlay.v1` and `openvex.v1` payloads; overlay IDs are `sha256(tenant|nodeId|overlayKind)`; policy overlay may include a sampled `explainTrace`.
- `POST /graph/paths` — bounded BFS (depth ≤6) returning path nodes/edges/stats; honours budgets and overlays.
- `POST /graph/diff` — compares `snapshotA` vs `snapshotB`, streaming node/edge added/removed/changed tiles plus stats; budget enforcement mirrors `/graph/query`.
- `POST /graph/export` — async job producing deterministic manifests (`sha256`, size, format) for `ndjson/csv/graphml/png/svg`; download via `/graph/export/{jobId}`.
- `POST /graph/lineage` - returns SBOM lineage nodes/edges anchored by `artifactDigest` or `sbomDigest`, with optional relationship filters and depth limits.
- **Edge Metadata API** (added 2025-01):
  - `POST /graph/edges/metadata` — batch query for edge explanations; request contains `EdgeIds[]`, response includes `EdgeTileWithMetadata[]` with full provenance.
  - `GET /graph/edges/{edgeId}/metadata` — single edge metadata with explanation, via, provenance, and evidence references.
  - `GET /graph/edges/path/{sourceNodeId}/{targetNodeId}` — returns all edges on the shortest path between two nodes, each with metadata.
  - `GET /graph/edges/by-reason/{reason}` — query edges by `EdgeReason` enum (e.g., `SbomDependency`, `AdvisoryAffects`, `VexStatement`, `RuntimeTrace`).
  - `GET /graph/edges/by-evidence?evidenceType=&evidenceRef=` — query edges by evidence reference.
- Legacy: `GET /graph/nodes/{id}`, `POST /graph/query/saved`, `GET /graph/impact/{advisoryKey}`, `POST /graph/overlay/policy` remain in spec but should align to the NDJSON surfaces above as they are brought forward.
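
The bounded BFS behind `/graph/paths` can be sketched as below. This is only an illustration under stated assumptions: the graph is a plain adjacency dictionary, depth is counted in edges, and `node_budget` is a hypothetical stand-in for the budgets the endpoint honours. It is not the service's actual traversal code.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional

MAX_DEPTH = 6  # matches the documented depth bound


def bounded_path(
    adjacency: Dict[str, Iterable[str]],
    source: str,
    target: str,
    max_depth: int = MAX_DEPTH,
    node_budget: int = 1000,  # hypothetical budget on visited nodes
) -> Optional[List[str]]:
    """Return a shortest path of at most max_depth edges, or None."""
    if source == target:
        return [source]
    parents: Dict[str, Optional[str]] = {source: None}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth bound
        for neighbour in adjacency.get(node, ()):
            if neighbour in parents:
                continue
            if len(parents) >= node_budget:
                return None  # budget exhausted before reaching the target
            parents[neighbour] = node
            if neighbour == target:
                # Walk parent links back to the source, then reverse.
                path = [neighbour]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append((neighbour, depth + 1))
    return None
```

Returning `None` for both "no path within the bound" and "budget exhausted" keeps the sketch short; a real endpoint would presumably distinguish the two in its stats.
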
### 3.1) Tenant and auth resolution contract (Sprint 20260222.058)
- Graph uses a single tenant resolver path (`GraphRequestContextResolver`) across search/query/paths/diff/lineage/export and edge-metadata endpoints.
- Tenant source precedence and compatibility:
  - claim: `stellaops:tenant` (with bounded aliases `tid`, `tenant_id`)
  - headers: `X-StellaOps-Tenant` (canonical), then migration headers `X-Stella-Tenant` and `X-Tenant-Id`
- Deterministic failures:
  - missing tenant: `400 GRAPH_VALIDATION_FAILED`
  - conflicting tenant claim/header values: `400 GRAPH_VALIDATION_FAILED`
  - missing auth: `401 GRAPH_UNAUTHORIZED`
  - missing scope: `403 GRAPH_FORBIDDEN`
- Scope checks are policy-driven (`Graph.ReadOrQuery`, `Graph.Query`, `Graph.Export`) and no endpoint directly trusts raw scope headers.
- Rate limiting and audit logging use the resolved tenant context; authenticated flows no longer collapse to ambiguous `"unknown"` tenant keys.
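
The tenant precedence and failure rules above can be sketched roughly as follows. This is an illustrative model, not `GraphRequestContextResolver` itself: the function and exception names are hypothetical, headers are looked up in a plain dictionary (real HTTP header names are case-insensitive), and tenant values are compared as exact strings.

```python
from typing import Mapping, Optional, Sequence

CLAIM_KEYS = ("stellaops:tenant", "tid", "tenant_id")
HEADER_KEYS = ("X-StellaOps-Tenant", "X-Stella-Tenant", "X-Tenant-Id")


class GraphValidationError(Exception):
    """Hypothetical error type mapped to 400 GRAPH_VALIDATION_FAILED."""

    status = 400
    code = "GRAPH_VALIDATION_FAILED"


def _first_value(source: Mapping[str, str], keys: Sequence[str]) -> Optional[str]:
    # Return the first non-empty value, honouring the documented key order.
    for key in keys:
        value = (source.get(key) or "").strip()
        if value:
            return value
    return None


def resolve_tenant(claims: Mapping[str, str], headers: Mapping[str, str]) -> str:
    claim_tenant = _first_value(claims, CLAIM_KEYS)
    header_tenant = _first_value(headers, HEADER_KEYS)
    if claim_tenant and header_tenant and claim_tenant != header_tenant:
        raise GraphValidationError("conflicting tenant claim/header values")
    tenant = claim_tenant or header_tenant  # the claim takes precedence
    if not tenant:
        raise GraphValidationError("missing tenant")
    return tenant
```

Resolving the tenant once and passing the result to rate limiting and audit logging avoids each component re-reading raw headers with its own rules.
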
### 3.2) Edge Metadata Contracts
The edge metadata system provides explainability for graph relationships:
- **EdgeReason** enum: `Unknown`, `SbomDependency`, `StaticSymbol`, `RuntimeTrace`, `PackageManifest`, `Lockfile`, `BuildArtifact`, `ImageLayer`, `AdvisoryAffects`, `VexStatement`, `PolicyOverlay`, `AttestationRef`, `OperatorAnnotation`, `TransitiveInference`, `Provenance`.
- **EdgeVia** record: Describes how the edge was discovered (method, version, timestamp, confidence in basis points, evidence reference).
- **EdgeExplanationPayload** record: Full explanation including reason, via, human-readable summary, evidence list, provenance reference, and tags.
- **EdgeProvenanceRef** record: Source system, collection timestamp, SBOM digest, scan digest, attestation ID, event offset.
- **EdgeTileWithMetadata** record: Extends `EdgeTile` with `Explanation` property containing the full metadata.
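
To make two of these contracts concrete, the sketch below models the `EdgeReason` values listed above and a basis-point confidence conversion. The helper names `parse_reason` and `confidence_fraction` are hypothetical, and the 0 to 10000 range check is an assumption (10000 basis points equal 100%); the real contracts may validate differently.

```python
from enum import Enum

# Member names copied from the EdgeReason list in this section.
EdgeReason = Enum(
    "EdgeReason",
    [
        "Unknown", "SbomDependency", "StaticSymbol", "RuntimeTrace",
        "PackageManifest", "Lockfile", "BuildArtifact", "ImageLayer",
        "AdvisoryAffects", "VexStatement", "PolicyOverlay", "AttestationRef",
        "OperatorAnnotation", "TransitiveInference", "Provenance",
    ],
)


def parse_reason(name: str) -> EdgeReason:
    """Map a path segment such as 'SbomDependency' to an EdgeReason."""
    try:
        return EdgeReason[name]
    except KeyError:
        raise ValueError(f"invalid edge reason: {name!r}") from None


def confidence_fraction(basis_points: int) -> float:
    """Convert a confidence in basis points to a fraction in [0, 1]."""
    # Assumption: valid confidences lie between 0 and 10000 basis points.
    if not 0 <= basis_points <= 10_000:
        raise ValueError("confidence must be between 0 and 10000 basis points")
    return basis_points / 10_000
```

Storing confidence as integer basis points avoids floating-point drift when records are hashed or compared, with the fraction computed only for display.
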
### 3.3) Localization runtime contract (Sprint 20260224_002)
- Graph API now initializes localization via `AddStellaOpsLocalization(...)`, `AddTranslationBundle(...)`, `AddRemoteTranslationBundles()`, `UseStellaOpsLocalization()`, and `LoadTranslationsAsync()`.
- Locale resolution order for API messages is deterministic: `X-Locale` header -> `Accept-Language` header -> default locale (`en-US`).
- Translation layering is deterministic: shared embedded `common` bundle -> Graph embedded bundle (`Translations/*.graph.json`) -> Platform runtime override bundle.
- Remote Platform override fetches are bounded and loaded concurrently per provider locale, so a from-scratch bootstrap cannot keep the Graph API offline while optional translation overrides load.
- This rollout localizes selected error paths (for example, edge/export not found, invalid reason, and tenant/auth validation text) for `en-US` and `de-DE`.
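
The locale resolution order and bundle layering can be sketched as below. This is a simplified model, not the StellaOps localization library: the function names are hypothetical, the supported set is limited to the two locales named above, matching is exact and case-insensitive, and an unsupported `X-Locale` value is assumed to fall through to `Accept-Language`.

```python
from typing import Dict, List, Mapping

DEFAULT_LOCALE = "en-US"
SUPPORTED = {"en-us": "en-US", "de-de": "de-DE"}


def _ranked_tags(accept_language: str) -> List[str]:
    """Order Accept-Language tags by q-value, then by position."""
    ranked = []
    for index, part in enumerate(accept_language.split(",")):
        tag, _, params = part.strip().partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if tag.strip() and quality > 0:
            ranked.append((-quality, index, tag.strip()))
    return [tag for _, _, tag in sorted(ranked)]


def resolve_locale(headers: Mapping[str, str]) -> str:
    # Order: X-Locale header, then Accept-Language, then the default.
    candidates = [headers.get("X-Locale", "")]
    candidates += _ranked_tags(headers.get("Accept-Language", ""))
    for candidate in candidates:
        match = SUPPORTED.get(candidate.strip().lower())
        if match:
            return match
    return DEFAULT_LOCALE


def layer_bundles(
    common: Dict[str, str], graph: Dict[str, str], platform_override: Dict[str, str]
) -> Dict[str, str]:
    """Later layers win: common, then Graph, then Platform override."""
    merged = dict(common)
    merged.update(graph)
    merged.update(platform_override)
    return merged
```

Because both functions are pure, the same request headers and bundles always produce the same message text, which is the determinism the contract asks for.
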
## 4) Storage considerations
- Backed by either:
  - **Relational + adjacency** (PostgreSQL tables `graph_nodes`, `graph_edges`, `graph_overlays`) with deterministic ordering and streaming exports.
  - Or **Graph DB** (e.g., Neo4j/Cosmos Gremlin) behind an abstraction layer; choice depends on deployment footprint.
- All storages require tenant partitioning, append-only change logs, and export manifests for Offline Kits.
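
For the relational option, a tenant-partitioned adjacency layout with deterministic ordering might look like the sketch below. It uses an in-memory SQLite database purely so the example is self-contained; the document's target is PostgreSQL, and every column name here is an assumption rather than the actual schema.

```python
import sqlite3
from typing import List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the graph tables (columns hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE graph_nodes (
            tenant  TEXT NOT NULL,
            node_id TEXT NOT NULL,
            kind    TEXT NOT NULL,
            PRIMARY KEY (tenant, node_id)
        );
        CREATE TABLE graph_edges (
            tenant    TEXT NOT NULL,
            source_id TEXT NOT NULL,
            kind      TEXT NOT NULL,
            target_id TEXT NOT NULL,
            PRIMARY KEY (tenant, source_id, kind, target_id)
        );
        """
    )
    return conn


def neighbours(conn: sqlite3.Connection, tenant: str, node_id: str) -> List[Tuple[str, str]]:
    """Return (kind, target_id) pairs for one tenant in a stable order."""
    rows = conn.execute(
        "SELECT kind, target_id FROM graph_edges "
        "WHERE tenant = ? AND source_id = ? ORDER BY kind, target_id",
        (tenant, node_id),
    )
    return [(kind, target_id) for kind, target_id in rows]
```

Leading every key with the tenant column keeps each tenant's rows together and makes it hard to write a query that crosses tenants by accident, while the explicit `ORDER BY` gives exports a stable order.
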
## 5) Offline & export
- Each snapshot packages `nodes.jsonl`, `edges.jsonl`, `overlays/` plus manifest with hash, counts, and provenance. Export Center consumes these artefacts for graph-specific bundles.
- Saved queries and overlays include deterministic IDs so Offline Kit consumers can import and replay results.
- Runtime hosts register the SBOM ingest pipeline via `services.AddSbomIngestPipeline(...)`. Snapshot exports default to `./artifacts/graph-snapshots` but can be redirected with `STELLAOPS_GRAPH_SNAPSHOT_DIR` or the `SbomIngestOptions.SnapshotRootDirectory` callback.
- Analytics overlays are exported as NDJSON (`overlays/clusters.ndjson`, `overlays/centrality.ndjson`) ordered by node id; `overlays/manifest.json` mirrors snapshot id and counts for offline parity.
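
A snapshot package with a hash-and-count manifest could be produced along the lines below. The environment variable and default directory come from this section; the function names, the `id` sort key, and the manifest layout are assumptions made for illustration, and the `overlays/` directory is omitted for brevity.

```python
import hashlib
import json
import os
from pathlib import Path
from typing import Dict, Iterable, List


def snapshot_root() -> Path:
    # STELLAOPS_GRAPH_SNAPSHOT_DIR overrides the documented default directory.
    return Path(os.environ.get("STELLAOPS_GRAPH_SNAPSHOT_DIR", "./artifacts/graph-snapshots"))


def write_jsonl(path: Path, records: Iterable[dict]) -> Dict[str, object]:
    """Write records sorted by 'id' and return a manifest entry for the file."""
    ordered: List[dict] = sorted(records, key=lambda record: record["id"])
    # Sorted keys and compact separators keep the bytes stable across runs.
    data = "".join(
        json.dumps(record, sort_keys=True, separators=(",", ":")) + "\n"
        for record in ordered
    ).encode("utf-8")
    path.write_bytes(data)
    return {
        "file": path.name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "count": len(ordered),
    }


def write_snapshot(directory: Path, snapshot_id: str, nodes: Iterable[dict], edges: Iterable[dict]) -> dict:
    directory.mkdir(parents=True, exist_ok=True)
    manifest = {
        "snapshotId": snapshot_id,
        "files": [
            write_jsonl(directory / "nodes.jsonl", nodes),
            write_jsonl(directory / "edges.jsonl", edges),
        ],
    }
    (directory / "manifest.json").write_text(json.dumps(manifest, sort_keys=True, indent=2))
    return manifest
```

A call such as `write_snapshot(snapshot_root() / "snap-1", "snap-1", nodes, edges)` would then yield the same hashes for the same input regardless of record order, which is what allows an offline consumer to verify an imported snapshot.
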
## 6) Observability
- Metrics: ingestion lag (`graph_ingest_lag_seconds`), node/edge counts, query latency per saved query, overlay generation duration.
- New analytics metrics: `graph_analytics_runs_total`, `graph_analytics_failures_total`, `graph_analytics_clusters_total`, `graph_analytics_centrality_total`, plus change-stream/backfill counters (`graph_changes_total`, `graph_backfill_total`, `graph_change_failures_total`, `graph_change_lag_seconds`).
- Logs: structured events for ETL stages and query execution (with trace IDs).
- Traces: ETL pipeline spans, query engine spans.
## 7) Rollout notes
- Phase 1: ingest SBOM + advisories, deliver impact queries.
- Phase 2: add VEX overlays, policy overlays, diff tooling.
- Phase 3: expose runtime/Zastava edges and AI-assisted recommendations (future).
### Local testing note
Set `STELLAOPS_TEST_POSTGRES_CONNECTION` to a reachable PostgreSQL instance before running `tests/Graph/StellaOps.Graph.Indexer.Tests`. The test harness falls back to `Host=127.0.0.1;Port=5432;Database=stellaops_test`, then Testcontainers for PostgreSQL, but the CI workflow requires the environment variable to be present to ensure upsert coverage runs against a managed database. Use `STELLAOPS_GRAPH_SNAPSHOT_DIR` (or the `AddSbomIngestPipeline` options callback) to control where graph snapshot artefacts land during local runs.
Refer to the module README and implementation plan for immediate context, and update this document once component boundaries and data flows are finalised.