
Graph architecture

Derived from Epic5 SBOM Graph Explorer; this section captures the core model, pipeline, and API expectations. Extend with diagrams as implementation matures.

1) Core model

  • Nodes:
    • Artifact (application/image digest) with metadata (tenant, environment, labels).
    • Component (package/version, purl, ecosystem).
    • File/Path (source files, binary paths) with hash/time metadata.
    • License nodes linked to components and SBOM attestations.
    • Advisory and VEXStatement nodes linking to Concelier/Excititor records via digests.
    • PolicyVersion nodes representing signed policy packs.
  • Edges: directed, timestamped relationships such as DEPENDS_ON, BUILT_FROM, DECLARED_IN, AFFECTED_BY, VEX_EXEMPTS, GOVERNS_WITH, OBSERVED_RUNTIME. Each edge carries provenance (SRM hash, SBOM digest, policy run ID).
  • Overlays: computed index tables providing fast access to reachability, blast radius, and differential views (e.g., graph_overlay/vuln/{tenant}/{advisoryKey}).
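
As a rough illustration of the model above, the following Python sketch represents nodes, edges with provenance, and the overlay key pattern. The class names, field names, and the overlay_key helper are hypothetical. Only the node kinds, the edge types, the provenance items, and the graph_overlay/vuln/{tenant}/{advisoryKey} pattern come from this document.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Edge types named in this document; the constant itself is illustrative.
EDGE_TYPES = {
    "DEPENDS_ON", "BUILT_FROM", "DECLARED_IN", "AFFECTED_BY",
    "VEX_EXEMPTS", "GOVERNS_WITH", "OBSERVED_RUNTIME",
}


@dataclass(frozen=True)
class Provenance:
    # Hypothetical field names for the provenance items listed above.
    srm_hash: str
    sbom_digest: str
    policy_run_id: str


@dataclass(frozen=True)
class Node:
    id: str
    kind: str              # e.g. "Artifact", "Component", "Advisory"
    tenant: str
    metadata: tuple = ()   # sorted (key, value) pairs keep the node hashable


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    type: str
    observed_at: datetime
    provenance: Provenance

    def __post_init__(self):
        # Reject edge types that the model does not define.
        if self.type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {self.type}")


def overlay_key(tenant: str, advisory_key: str) -> str:
    """Build an overlay key following the pattern quoted in the text."""
    return f"graph_overlay/vuln/{tenant}/{advisory_key}"


# Example usage with made-up identifiers.
example_edge = Edge(
    source="artifact:1",
    target="component:2",
    type="DEPENDS_ON",
    observed_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
    provenance=Provenance("srm:abc", "sha256:123", "run-1"),
)
```

Frozen dataclasses are used here so that nodes and edges can be placed in sets and compared by value, which makes deduplication straightforward. A real implementation may choose a different representation.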

2) Pipelines

  1. Ingestion: Cartographer/SBOM Service emit SBOM snapshots (sbom_snapshot events), which the Graph Indexer captures. Advisories/VEX from Concelier/Excititor generate edge updates, and policy runs attach overlay metadata.
  2. ETL: Normalises nodes/edges into canonical IDs, deduplicates, enforces tenant partitions, and writes to the graph store (planned: Neo4j-compatible or document + adjacency lists in Mongo).
  3. Overlay computation: Batch workers build materialised views for frequently used queries (impact lists, saved queries, policy overlays) and store as immutable blobs for Offline Kit exports.
  4. Diffing: graph_diff jobs compare two snapshots (e.g., pre/post deploy) and generate signed diff manifests for UI/CLI consumption.
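
The ETL and diffing steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the ID scheme (SHA-256 over tenant, kind, and a whitespace-trimmed natural key) and the function names are hypothetical, edges are modelled as plain (source, type, target) tuples, and the diff result is unsigned, whereas the document calls for signed manifests.

```python
import hashlib


def canonical_id(tenant: str, kind: str, natural_key: str) -> str:
    """Derive a stable ID; hashing tenant + kind + key is an assumed scheme."""
    # The unit separator avoids ambiguity between adjacent fields.
    raw = "\x1f".join((tenant, kind, natural_key.strip()))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def dedupe_edges(edges):
    """Drop repeated (source, type, target) tuples and sort for determinism."""
    return sorted(set(edges))


def diff_snapshots(before, after):
    """Return the edges added and removed between two snapshots, sorted."""
    before_set, after_set = set(before), set(after)
    return {
        "added": sorted(after_set - before_set),
        "removed": sorted(before_set - after_set),
    }
```

Including the tenant in the hashed input means the same package in two tenants yields two different IDs, which is one simple way to keep tenant partitions separate.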

3) APIs

  • GET /graph/nodes/{id}: fetch a node with its metadata and attached provenance.
  • POST /graph/query/saved: execute a saved query (Cypher-like DSL) with tenant filtering; supports paging, citation metadata, and explain traces.
  • GET /graph/impact/{advisoryKey}: return impacted artifacts with path context and policy/VEX overlays.
  • GET /graph/diff/{snapshotA}/{snapshotB}: stream a diff manifest that includes new/removed edges, a risk summary, and export references.
  • POST /graph/overlay/policy: create or retrieve an overlay for a policy version + advisory set, referencing effective_finding results.
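
To make "impacted artifacts with path context" concrete, here is a small in-memory sketch of how an impact lookup might walk the graph. It is not the service's query engine. The function name, the tuple layout, and the assumed edge directions (artifact or component DEPENDS_ON component, component AFFECTED_BY advisory) are illustrative assumptions.

```python
from collections import deque


def impacted_artifacts(edges, advisory_key, artifact_ids):
    """Walk edges backwards from an advisory and return {artifact: path}."""
    # Reverse adjacency: target -> sources, limited to the two edge types used here.
    reverse = {}
    for source, edge_type, target in edges:
        if edge_type in ("DEPENDS_ON", "AFFECTED_BY"):
            reverse.setdefault(target, []).append(source)

    paths = {}
    seen = {advisory_key}
    queue = deque([(advisory_key, [advisory_key])])
    while queue:
        current, path = queue.popleft()
        # Sorting the parents keeps the traversal order deterministic.
        for parent in sorted(reverse.get(current, [])):
            if parent in seen:
                continue
            seen.add(parent)
            parent_path = [parent] + path
            if parent in artifact_ids:
                paths[parent] = parent_path
            queue.append((parent, parent_path))
    return paths
```

Because the traversal is breadth-first, the path recorded for each artifact is one of the shortest routes from that artifact to the advisory. Policy and VEX overlays are left out of this sketch.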

4) Storage considerations

  • Backed by one of:
    • Document + adjacency (Mongo collections graph_nodes, graph_edges, graph_overlays) with deterministic ordering and streaming exports.
    • Graph DB (e.g., Neo4j/Cosmos Gremlin) behind an abstraction layer; the choice depends on deployment footprint.
  • Every storage backend requires tenant partitioning, append-only change logs, and export manifests for Offline Kits.
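
The requirements above (tenant partitioning, an append-only change log, deterministic ordering) can be illustrated with a toy in-memory store. This is only a sketch of the shape an abstraction layer might take. The class and method names are hypothetical, and it does not reflect any real Mongo or graph database API.

```python
class InMemoryGraphStore:
    """Toy stand-in for the storage abstraction; not a real backend."""

    def __init__(self):
        self._edges = {}       # tenant -> {source -> set of (type, target)}
        self._change_log = []  # append-only list of (sequence, tenant, op, edge)

    def add_edge(self, tenant, source, edge_type, target):
        """Add an edge to a tenant partition; return False if it already exists."""
        adjacency = self._edges.setdefault(tenant, {})
        targets = adjacency.setdefault(source, set())
        if (edge_type, target) in targets:
            return False  # duplicate; nothing is logged
        targets.add((edge_type, target))
        sequence = len(self._change_log) + 1
        self._change_log.append((sequence, tenant, "add", (source, edge_type, target)))
        return True

    def neighbours(self, tenant, source):
        """Return outgoing (type, target) pairs for one tenant, sorted."""
        return sorted(self._edges.get(tenant, {}).get(source, set()))

    def changes_since(self, sequence):
        """Return change-log entries with a sequence number above the given one."""
        return [entry for entry in self._change_log if entry[0] > sequence]
```

Every read takes a tenant argument, so a caller cannot accidentally read across partitions, and the change log is only ever appended to.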

5) Offline & export

  • Each snapshot packages nodes.jsonl, edges.jsonl, and overlays/, plus a manifest with hash, counts, and provenance. Export Center consumes these artefacts for graph-specific bundles.
  • Saved queries and overlays include deterministic IDs so Offline Kit consumers can import and replay results.
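
A deterministic export along these lines could be sketched as follows. The manifest layout, the function names, and the choice of SHA-256 are assumptions made for illustration. Only the file names nodes.jsonl and edges.jsonl and the idea of a manifest with hash and counts come from this document; overlays and provenance are omitted for brevity.

```python
import hashlib
import json


def to_jsonl(records):
    """Serialise records one per line with sorted keys and a stable line order."""
    lines = [json.dumps(r, sort_keys=True, separators=(",", ":")) for r in records]
    return "".join(line + "\n" for line in sorted(lines))


def build_manifest(nodes, edges):
    """Return (files, manifest); the manifest layout is an assumption."""
    files = {"nodes.jsonl": to_jsonl(nodes), "edges.jsonl": to_jsonl(edges)}
    manifest = {
        "files": {
            name: {
                "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
                # json.dumps escapes embedded newlines, so each "\n" ends one record.
                "count": body.count("\n"),
            }
            for name, body in sorted(files.items())
        }
    }
    return files, manifest
```

Sorting both the keys within each record and the serialised lines means the same logical snapshot always produces the same bytes, and therefore the same hashes, regardless of input order.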

6) Observability

  • Metrics: ingestion lag (graph_ingest_lag_seconds), node/edge counts, query latency per saved query, overlay generation duration.
  • Logs: structured events for ETL stages and query execution (with trace IDs).
  • Traces: ETL pipeline spans, query engine spans.

7) Rollout notes

  • Phase 1: ingest SBOM + advisories, deliver impact queries.
  • Phase 2: add VEX overlays, policy overlays, diff tooling.
  • Phase 3: expose runtime/Zastava edges and AI-assisted recommendations (future).

Refer to the module README and implementation plan for immediate context, and update this document once component boundaries and data flows are finalised.