Graph architecture
Derived from Epic 5 – SBOM Graph Explorer; this section captures the core model, pipeline, and API expectations. Extend with diagrams as implementation matures.
1) Core model
- Nodes:
  - `Artifact` (application/image digest) with metadata (tenant, environment, labels).
  - `SBOM` (sbom digest, format, version/sequence, chain id).
  - `Component` (package/version, purl, ecosystem).
  - `File/Path` (source files, binary paths) with hash/time metadata.
  - `License` nodes linked to components and SBOM attestations.
  - `Advisory` and `VEXStatement` nodes linking to Concelier/Excititor records via digests.
  - `PolicyVersion` nodes representing signed policy packs.
- Edges: directed, timestamped relationships such as `DEPENDS_ON`, `BUILT_FROM`, `DECLARED_IN`, `AFFECTED_BY`, `VEX_EXEMPTS`, `GOVERNS_WITH`, `OBSERVED_RUNTIME`, `SBOM_VERSION_OF`, and `SBOM_LINEAGE_*`. Each edge carries provenance (SRM hash, SBOM digest, policy run ID).
- Overlays: computed index tables providing fast access to reachability, blast radius, and differential views (e.g., `graph_overlay/vuln/{tenant}/{advisoryKey}`). Runtime endpoints emit overlays inline (`policy.overlay.v1`, `openvex.v1`) with deterministic overlay IDs (`sha256(tenant|nodeId|overlayKind)`) and sampled explain traces on policy overlays.
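The deterministic overlay ID can be illustrated with a minimal Python sketch. The formula `sha256(tenant|nodeId|overlayKind)` comes from this document; the literal `|` separator, UTF-8 encoding, lowercase hex rendering, and the sample node ID are assumptions made for illustration and may differ from the actual service.

```python
import hashlib


def overlay_id(tenant: str, node_id: str, overlay_kind: str) -> str:
    """Return a deterministic overlay ID for (tenant, node, overlay kind).

    Assumptions (not confirmed by the spec): the three fields are joined
    with a literal "|", encoded as UTF-8, and the digest is lowercase hex.
    """
    material = "|".join((tenant, node_id, overlay_kind))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # "gn:acme:component:abc" is a hypothetical node ID.
    print(overlay_id("acme", "gn:acme:component:abc", "policy.overlay.v1"))
```

Because the ID depends only on its three inputs, the same overlay recomputed on another host or inside an offline export resolves to the same key.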
2) Pipelines
- Ingestion: Cartographer/SBOM Service emit SBOM snapshots (`sbom_snapshot` events) captured by the Graph Indexer. Ledger lineage references become `SBOM_VERSION_OF` + `SBOM_LINEAGE_*` edges. Advisories/VEX from Concelier/Excititor generate edge updates; policy runs attach overlay metadata.
- ETL: Normalises nodes/edges into canonical IDs, deduplicates, enforces tenant partitions, and writes to the graph store (planned: Neo4j-compatible or PostgreSQL adjacency lists).
- Overlay computation: Batch workers build materialised views for frequently used queries (impact lists, saved queries, policy overlays) and store as immutable blobs for Offline Kit exports.
- Diffing: `graph_diff` jobs compare two snapshots (e.g., pre/post deploy) and generate signed diff manifests for UI/CLI consumption.
- Analytics (Runtime & Signals 140.A): background workers run Louvain-style clustering + degree/betweenness approximations on ingested graphs, emitting overlays per tenant/snapshot and writing cluster ids back to nodes when enabled.
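The diffing step can be sketched as a comparison of two snapshots keyed by canonical ID. This is only an illustration of the added/removed/changed classification: the dictionary-based snapshot shape and the function name are hypothetical, and manifest signing is left out.

```python
from typing import Any, Dict, List

Snapshot = Dict[str, Dict[str, Any]]  # canonical ID -> attributes (hypothetical shape)


def diff_snapshots(before: Snapshot, after: Snapshot) -> Dict[str, List[str]]:
    """Classify item IDs as added, removed, or changed between two snapshots."""
    before_ids, after_ids = set(before), set(after)
    return {
        "added": sorted(after_ids - before_ids),
        "removed": sorted(before_ids - after_ids),
        # An item counts as changed when it exists in both snapshots
        # but its attributes differ.
        "changed": sorted(
            item for item in before_ids & after_ids if before[item] != after[item]
        ),
    }


if __name__ == "__main__":
    pre = {"libz": {"version": "1.2.11"}, "openssl": {"version": "3.0.1"}}
    post = {"libz": {"version": "1.2.13"}, "curl": {"version": "8.5.0"}}
    print(diff_snapshots(pre, post))
```

Sorting each list keeps the output stable across runs, which matters if the result is later hashed or signed.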
3) APIs
- `POST /graph/search`: NDJSON node tiles with cursor paging, tenant + scope guards.
- `POST /graph/query`: NDJSON nodes/edges/stats/cursor with budgets (tiles/nodes/edges) and optional inline overlays (`includeOverlays=true`) emitting `policy.overlay.v1` and `openvex.v1` payloads; overlay IDs are `sha256(tenant|nodeId|overlayKind)`; policy overlay may include a sampled `explainTrace`.
- `POST /graph/paths`: bounded BFS (depth ≤ 6) returning path nodes/edges/stats; honours budgets and overlays.
- `POST /graph/diff`: compares `snapshotA` vs `snapshotB`, streaming node/edge added/removed/changed tiles plus stats; budget enforcement mirrors `/graph/query`.
- `POST /graph/export`: async job producing deterministic manifests (sha256, size, format) for `ndjson`/`csv`/`graphml`/`png`/`svg`; download via `/graph/export/{jobId}`.
- `POST /graph/lineage`: returns SBOM lineage nodes/edges anchored by `artifactDigest` or `sbomDigest`, with optional relationship filters and depth limits.
- Edge Metadata API (added 2025-01):
  - `POST /graph/edges/metadata`: batch query for edge explanations; request contains `EdgeIds[]`, response includes `EdgeTileWithMetadata[]` with full provenance.
  - `GET /graph/edges/{edgeId}/metadata`: single edge metadata with explanation, via, provenance, and evidence references.
  - `GET /graph/edges/path/{sourceNodeId}/{targetNodeId}`: returns all edges on the shortest path between two nodes, each with metadata.
  - `GET /graph/edges/by-reason/{reason}`: query edges by `EdgeReason` enum (e.g., `SbomDependency`, `AdvisoryAffects`, `VexStatement`, `RuntimeTrace`).
  - `GET /graph/edges/by-evidence?evidenceType=&evidenceRef=`: query edges by evidence reference.
- Legacy: `GET /graph/nodes/{id}`, `POST /graph/query/saved`, `GET /graph/impact/{advisoryKey}`, `POST /graph/overlay/policy` remain in spec but should align to the NDJSON surfaces above as they are brought forward.
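The bounded BFS behind `POST /graph/paths` can be sketched as follows. The depth limit of 6 comes from this document; the adjacency-map input, the `node_budget` parameter, and returning `None` when the budget runs out are assumptions chosen to keep the example small.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence

MAX_DEPTH = 6  # depth bound stated for POST /graph/paths


def bounded_bfs_path(
    adjacency: Dict[str, Sequence[str]],
    source: str,
    target: str,
    max_depth: int = MAX_DEPTH,
    node_budget: int = 1000,  # hypothetical budget on visited nodes
) -> Optional[List[str]]:
    """Return a shortest path of at most max_depth edges, or None.

    None means either no path exists within the depth bound or the
    node budget was exhausted before one was found.
    """
    if source == target:
        return [source]
    parents: Dict[str, Optional[str]] = {source: None}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth bound
        for neighbour in adjacency.get(node, ()):
            if neighbour in parents:
                continue  # already visited
            if len(parents) >= node_budget:
                return None
            parents[neighbour] = node
            if neighbour == target:
                # Walk parent links back to the source, then reverse.
                path = [neighbour]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append((neighbour, depth + 1))
    return None
```

Breadth-first order means the first time the target is reached, the path found has the fewest edges, so the depth bound translates directly into a bound on path length.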
3.1) Tenant and auth resolution contract (Sprint 20260222.058)
- Graph uses a single tenant resolver path (`GraphRequestContextResolver`) across search/query/paths/diff/lineage/export and edge-metadata endpoints.
- Tenant source precedence and compatibility:
  - claim: `stellaops:tenant` (with bounded aliases `tid`, `tenant_id`)
  - headers: `X-StellaOps-Tenant` (canonical), then migration headers `X-Stella-Tenant` and `X-Tenant-Id`
- Deterministic failures:
  - missing tenant: `400 GRAPH_VALIDATION_FAILED`
  - conflicting tenant claim/header values: `400 GRAPH_VALIDATION_FAILED`
  - missing auth: `401 GRAPH_UNAUTHORIZED`
  - missing scope: `403 GRAPH_FORBIDDEN`
- Scope checks are policy-driven (`Graph.ReadOrQuery`, `Graph.Query`, `Graph.Export`) and no endpoint directly trusts raw scope headers.
- Rate limiting and audit logging use the resolved tenant context; authenticated flows no longer collapse to ambiguous `"unknown"` tenant keys.
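The precedence and failure rules above can be sketched in Python. The claim names, header names, and error codes are taken from this section; everything else is an assumption for illustration: the function and exception names are hypothetical, `claims is None` stands in for an unauthenticated request, only a claim/header mismatch is treated as a conflict, and scope policy evaluation (the `403` case) is omitted. This is not the `GraphRequestContextResolver` implementation.

```python
from typing import Mapping, Optional, Sequence

CLAIM_KEYS = ("stellaops:tenant", "tid", "tenant_id")
HEADER_KEYS = ("x-stellaops-tenant", "x-stella-tenant", "x-tenant-id")


class GraphRequestError(Exception):
    """Hypothetical error type carrying an HTTP status and error code."""

    def __init__(self, status: int, code: str, detail: str) -> None:
        super().__init__(f"{status} {code}: {detail}")
        self.status = status
        self.code = code


def _first_present(source: Mapping[str, str], keys: Sequence[str]) -> Optional[str]:
    """Return the first non-empty value found, checking keys in order."""
    for key in keys:
        value = (source.get(key) or "").strip()
        if value:
            return value
    return None


def resolve_tenant(
    claims: Optional[Mapping[str, str]], headers: Mapping[str, str]
) -> str:
    if claims is None:  # assumption: None models a request without auth
        raise GraphRequestError(401, "GRAPH_UNAUTHORIZED", "missing auth")
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    from_claim = _first_present(claims, CLAIM_KEYS)
    from_header = _first_present(lowered, HEADER_KEYS)
    if from_claim and from_header and from_claim != from_header:
        raise GraphRequestError(
            400, "GRAPH_VALIDATION_FAILED", "conflicting tenant claim/header"
        )
    tenant = from_claim or from_header
    if tenant is None:
        raise GraphRequestError(400, "GRAPH_VALIDATION_FAILED", "missing tenant")
    return tenant
```

Routing every endpoint through one function like this is what makes the failure codes deterministic: the same inputs always yield the same tenant or the same error.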
3.2) Edge Metadata Contracts
The edge metadata system provides explainability for graph relationships:
- EdgeReason enum: `Unknown`, `SbomDependency`, `StaticSymbol`, `RuntimeTrace`, `PackageManifest`, `Lockfile`, `BuildArtifact`, `ImageLayer`, `AdvisoryAffects`, `VexStatement`, `PolicyOverlay`, `AttestationRef`, `OperatorAnnotation`, `TransitiveInference`, `Provenance`.
- EdgeVia record: Describes how the edge was discovered (method, version, timestamp, confidence in basis points, evidence reference).
- EdgeExplanationPayload record: Full explanation including reason, via, human-readable summary, evidence list, provenance reference, and tags.
- EdgeProvenanceRef record: Source system, collection timestamp, SBOM digest, scan digest, attestation ID, event offset.
- EdgeTileWithMetadata record: Extends `EdgeTile` with `Explanation` property containing the full metadata.
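To make the record shapes concrete, here is a rough Python rendering of three of the contracts. The enum members and the list of fields follow this section; the Python field names, types, and the 0 to 10000 validation range are assumptions, and the real contracts are defined by the service, not by this sketch. Confidence in basis points means 10000 corresponds to 100%.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class EdgeReason(Enum):
    UNKNOWN = "Unknown"
    SBOM_DEPENDENCY = "SbomDependency"
    STATIC_SYMBOL = "StaticSymbol"
    RUNTIME_TRACE = "RuntimeTrace"
    PACKAGE_MANIFEST = "PackageManifest"
    LOCKFILE = "Lockfile"
    BUILD_ARTIFACT = "BuildArtifact"
    IMAGE_LAYER = "ImageLayer"
    ADVISORY_AFFECTS = "AdvisoryAffects"
    VEX_STATEMENT = "VexStatement"
    POLICY_OVERLAY = "PolicyOverlay"
    ATTESTATION_REF = "AttestationRef"
    OPERATOR_ANNOTATION = "OperatorAnnotation"
    TRANSITIVE_INFERENCE = "TransitiveInference"
    PROVENANCE = "Provenance"


@dataclass(frozen=True)
class EdgeVia:
    """How an edge was discovered (field names are illustrative)."""

    method: str
    version: str
    timestamp: str  # assumed ISO-8601 text
    confidence_bps: int  # basis points: 10000 == 100%
    evidence_ref: Optional[str] = None

    def __post_init__(self) -> None:
        if not 0 <= self.confidence_bps <= 10_000:
            raise ValueError("confidence_bps must be between 0 and 10000")

    @property
    def confidence(self) -> float:
        """Confidence as a fraction between 0.0 and 1.0."""
        return self.confidence_bps / 10_000


@dataclass(frozen=True)
class EdgeExplanationPayload:
    reason: EdgeReason
    via: EdgeVia
    summary: str
    evidence: List[str] = field(default_factory=list)
    provenance_ref: Optional[str] = None
    tags: List[str] = field(default_factory=list)
```

Storing confidence as an integer in basis points avoids floating-point drift when values are serialised, hashed, or compared across systems.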
3.3) Localization runtime contract (Sprint 20260224_002)
- Graph API now initializes localization via `AddStellaOpsLocalization(...)`, `AddTranslationBundle(...)`, `AddRemoteTranslationBundles()`, `UseStellaOpsLocalization()`, and `LoadTranslationsAsync()`.
- Locale resolution order for API messages is deterministic: `X-Locale` header -> `Accept-Language` header -> default locale (`en-US`).
- Translation layering is deterministic: shared embedded `common` bundle -> Graph embedded bundle (`Translations/*.graph.json`) -> Platform runtime override bundle.
- Remote Platform override fetches are bounded and loaded concurrently per provider locale so scratch bootstrap cannot hold the Graph API offline while optional translation overrides load.
- This rollout localizes selected error paths (for example, edge/export not found, invalid reason, and tenant/auth validation text) for `en-US` and `de-DE`.
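The locale resolution order can be sketched as below. The header order, the default locale, and the two supported locales come from this section. The matching rules are assumptions: an unsupported value falls through to the next source, a bare language such as `de` matches `de-DE`, and `Accept-Language` entries are ranked by their `q` value. The actual localization library may behave differently.

```python
from typing import List, Mapping, Optional, Sequence

DEFAULT_LOCALE = "en-US"
SUPPORTED_LOCALES = ("en-US", "de-DE")


def _match(tag: str, supported: Sequence[str]) -> Optional[str]:
    """Match a language tag exactly, then by primary language subtag."""
    wanted = tag.strip().lower()
    if not wanted:
        return None
    for locale in supported:
        if locale.lower() == wanted:
            return locale
    primary = wanted.split("-")[0]
    for locale in supported:
        if locale.lower().split("-")[0] == primary:
            return locale
    return None


def _parse_accept_language(value: str) -> List[str]:
    """Return language tags ordered by descending q value, then position."""
    ranked = []
    for position, part in enumerate(value.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip()
        if not tag or tag == "*":
            continue
        quality = 1.0
        for param in pieces[1:]:
            name, _, raw = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(raw)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            ranked.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


def resolve_locale(headers: Mapping[str, str]) -> str:
    lowered = {name.lower(): value for name, value in headers.items()}
    explicit = _match(lowered.get("x-locale", ""), SUPPORTED_LOCALES)
    if explicit:
        return explicit
    for tag in _parse_accept_language(lowered.get("accept-language", "")):
        matched = _match(tag, SUPPORTED_LOCALES)
        if matched:
            return matched
    return DEFAULT_LOCALE
```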
4) Storage considerations
- Backed by either:
  - Relational + adjacency (PostgreSQL tables `graph_nodes`, `graph_edges`, `graph_overlays`) with deterministic ordering and streaming exports.
  - Or Graph DB (e.g., Neo4j/Cosmos Gremlin) behind an abstraction layer; choice depends on deployment footprint.
- All storages require tenant partitioning, append-only change logs, and export manifests for Offline Kits.
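The relational option can be illustrated with a small adjacency-table sketch. It uses Python's built-in `sqlite3` purely as a stand-in for PostgreSQL, and the column names and keys are hypothetical; only the table names `graph_nodes` and `graph_edges` come from this document. It shows three of the stated properties: tenant scoping on every query, deduplicated writes, and deterministic ordering.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical columns; the tenant leads every primary key so rows
# from different tenants never collide.
SCHEMA = """
CREATE TABLE graph_nodes (
    tenant  TEXT NOT NULL,
    node_id TEXT NOT NULL,
    kind    TEXT NOT NULL,
    PRIMARY KEY (tenant, node_id)
);
CREATE TABLE graph_edges (
    tenant    TEXT NOT NULL,
    source_id TEXT NOT NULL,
    target_id TEXT NOT NULL,
    kind      TEXT NOT NULL,
    PRIMARY KEY (tenant, source_id, kind, target_id)
);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_edge(
    conn: sqlite3.Connection, tenant: str, source_id: str, kind: str, target_id: str
) -> None:
    # INSERT OR IGNORE is SQLite syntax; PostgreSQL would use
    # INSERT ... ON CONFLICT DO NOTHING for the same deduplication.
    conn.execute(
        "INSERT OR IGNORE INTO graph_edges (tenant, source_id, target_id, kind) "
        "VALUES (?, ?, ?, ?)",
        (tenant, source_id, target_id, kind),
    )


def neighbours(
    conn: sqlite3.Connection, tenant: str, source_id: str
) -> List[Tuple[str, str]]:
    """Return (kind, target_id) pairs for one tenant in a stable order."""
    rows = conn.execute(
        "SELECT kind, target_id FROM graph_edges "
        "WHERE tenant = ? AND source_id = ? ORDER BY kind, target_id",
        (tenant, source_id),
    )
    return rows.fetchall()
```

The explicit `ORDER BY` is what makes streamed results reproducible; without it, row order is left to the database.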
5) Offline & export
- Each snapshot packages `nodes.jsonl`, `edges.jsonl`, `overlays/` plus manifest with hash, counts, and provenance. Export Center consumes these artefacts for graph-specific bundles.
- Saved queries and overlays include deterministic IDs so Offline Kit consumers can import and replay results.
- Runtime hosts register the SBOM ingest pipeline via `services.AddSbomIngestPipeline(...)`. Snapshot exports default to `./artifacts/graph-snapshots` but can be redirected with `STELLAOPS_GRAPH_SNAPSHOT_DIR` or the `SbomIngestOptions.SnapshotRootDirectory` callback.
- Analytics overlays are exported as NDJSON (`overlays/clusters.ndjson`, `overlays/centrality.ndjson`) ordered by node id; `overlays/manifest.json` mirrors snapshot id and counts for offline parity.
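A snapshot manifest with hashes and counts can be sketched as follows. The file names `nodes.jsonl` and `edges.jsonl` come from this section; the manifest field names, the `id` key on each record, and the canonical JSON settings are assumptions made so the example produces byte-identical output for the same input.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List


def to_jsonl(records: Iterable[Dict[str, Any]]) -> bytes:
    """Serialise records as JSON Lines, sorted by "id", with stable key order."""
    ordered: List[Dict[str, Any]] = sorted(records, key=lambda record: record["id"])
    lines = [
        json.dumps(record, sort_keys=True, separators=(",", ":")) for record in ordered
    ]
    return "".join(line + "\n" for line in lines).encode("utf-8")


def build_manifest(
    snapshot_id: str,
    nodes: List[Dict[str, Any]],
    edges: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Build a manifest (hypothetical shape) with per-file hash, size, and count."""
    payloads = {"nodes.jsonl": (to_jsonl(nodes), len(nodes)),
                "edges.jsonl": (to_jsonl(edges), len(edges))}
    return {
        "snapshotId": snapshot_id,
        "files": {
            name: {
                "sha256": hashlib.sha256(data).hexdigest(),
                "size": len(data),
                "count": count,
            }
            for name, (data, count) in payloads.items()
        },
    }


if __name__ == "__main__":
    nodes = [{"id": "n2", "kind": "Component"}, {"id": "n1", "kind": "Artifact"}]
    edges = [{"id": "e1", "source": "n1", "target": "n2", "kind": "DEPENDS_ON"}]
    print(json.dumps(build_manifest("snap-001", nodes, edges), indent=2))
```

Sorting records and keys before hashing means two exports of the same snapshot produce the same digests regardless of the order in which records were read.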
6) Observability
- Metrics: ingestion lag (`graph_ingest_lag_seconds`), node/edge counts, query latency per saved query, overlay generation duration.
- New analytics metrics: `graph_analytics_runs_total`, `graph_analytics_failures_total`, `graph_analytics_clusters_total`, `graph_analytics_centrality_total`, plus change-stream/backfill counters (`graph_changes_total`, `graph_backfill_total`, `graph_change_failures_total`, `graph_change_lag_seconds`).
- Logs: structured events for ETL stages and query execution (with trace IDs).
- Traces: ETL pipeline spans, query engine spans.
7) Rollout notes
- Phase 1: ingest SBOM + advisories, deliver impact queries.
- Phase 2: add VEX overlays, policy overlays, diff tooling.
- Phase 3: expose runtime/Zastava edges and AI-assisted recommendations (future).
Local testing note
Set `STELLAOPS_TEST_POSTGRES_CONNECTION` to a reachable PostgreSQL instance before running `tests/Graph/StellaOps.Graph.Indexer.Tests`. Without it, the test harness falls back to `Host=127.0.0.1;Port=5432;Database=stellaops_test` and then to Testcontainers for PostgreSQL. The CI workflow, however, requires the environment variable to be present so that upsert coverage runs against a managed database. Use `STELLAOPS_GRAPH_SNAPSHOT_DIR` (or the `AddSbomIngestPipeline` options callback) to control where graph snapshot artefacts land during local runs.
Refer to the module README and implementation plan for immediate context, and update this document once component boundaries and data flows are finalised.