- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
		
			
				
	
	
		
57 lines, 3.8 KiB, Markdown

# Graph architecture

> Derived from Epic 5 – SBOM Graph Explorer; this section captures the core model, pipeline, and API expectations. Extend with diagrams as implementation matures.

## 1) Core model

- **Nodes:**
  - `Artifact` (application/image digest) with metadata (tenant, environment, labels).
  - `Component` (package/version, purl, ecosystem).
  - `File`/`Path` (source files, binary paths) with hash/time metadata.
  - `License` nodes linked to components and SBOM attestations.
  - `Advisory` and `VEXStatement` nodes linking to Concelier/Excititor records via digests.
  - `PolicyVersion` nodes representing signed policy packs.
- **Edges:** directed, timestamped relationships such as `DEPENDS_ON`, `BUILT_FROM`, `DECLARED_IN`, `AFFECTED_BY`, `VEX_EXEMPTS`, `GOVERNS_WITH`, `OBSERVED_RUNTIME`. Each edge carries provenance (SRM hash, SBOM digest, policy run ID).
- **Overlays:** computed index tables providing fast access to reachability, blast radius, and differential views (e.g., `graph_overlay/vuln/{tenant}/{advisoryKey}`).

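The core model can be pictured as a handful of plain data types. This is a minimal sketch, not the project's schema: the field names (`node_id`, `srm_hash`, `observed_at`, and so on) and the `overlay_key` helper are assumptions made for illustration. Only the node kinds, edge kinds, provenance items, and the overlay key pattern come from the list above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class NodeKind(str, Enum):
    ARTIFACT = "Artifact"
    COMPONENT = "Component"
    FILE = "File"
    PATH = "Path"
    LICENSE = "License"
    ADVISORY = "Advisory"
    VEX_STATEMENT = "VEXStatement"
    POLICY_VERSION = "PolicyVersion"


class EdgeKind(str, Enum):
    DEPENDS_ON = "DEPENDS_ON"
    BUILT_FROM = "BUILT_FROM"
    DECLARED_IN = "DECLARED_IN"
    AFFECTED_BY = "AFFECTED_BY"
    VEX_EXEMPTS = "VEX_EXEMPTS"
    GOVERNS_WITH = "GOVERNS_WITH"
    OBSERVED_RUNTIME = "OBSERVED_RUNTIME"


@dataclass(frozen=True)
class Provenance:
    # Hypothetical field names for the provenance items carried by each edge.
    srm_hash: Optional[str] = None
    sbom_digest: Optional[str] = None
    policy_run_id: Optional[str] = None


@dataclass(frozen=True)
class Node:
    node_id: str          # assumed canonical ID assigned during ETL
    kind: NodeKind
    tenant: str
    metadata: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Edge:
    kind: EdgeKind
    source_id: str
    target_id: str
    observed_at: datetime  # edges are timestamped
    provenance: Provenance


def overlay_key(tenant: str, advisory_key: str) -> str:
    """Build a key following the graph_overlay/vuln/{tenant}/{advisoryKey} pattern."""
    return f"graph_overlay/vuln/{tenant}/{advisory_key}"


# Example: one dependency edge with SBOM provenance (all values are made up).
edge = Edge(
    kind=EdgeKind.DEPENDS_ON,
    source_id="artifact:sha256:abc",
    target_id="pkg:npm/left-pad@1.3.0",
    observed_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    provenance=Provenance(sbom_digest="sha256:def"),
)
```

Frozen dataclasses are used here only to mirror the immutability implied by provenance-carrying records; a real implementation may choose a different representation.
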
## 2) Pipelines

1. **Ingestion:** Cartographer/SBOM Service emit SBOM snapshots (`sbom_snapshot` events), which the Graph Indexer captures. Advisories/VEX from Concelier/Excititor generate edge updates; policy runs attach overlay metadata.
2. **ETL:** Normalises nodes/edges into canonical IDs, deduplicates, enforces tenant partitions, and writes to the graph store (planned: Neo4j-compatible or document + adjacency lists in Mongo).
3. **Overlay computation:** Batch workers build materialised views for frequently used queries (impact lists, saved queries, policy overlays) and store them as immutable blobs for Offline Kit exports.
4. **Diffing:** `graph_diff` jobs compare two snapshots (e.g., pre/post deploy) and generate signed diff manifests for UI/CLI consumption.

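The diffing step (item 4) can be illustrated as a set comparison over edges. This is a minimal sketch, assuming each snapshot is available as an iterable of `(source_id, edge_kind, target_id)` tuples. The `diff_snapshots` function and the manifest layout are hypothetical, and signing is left out: only a SHA-256 digest of the canonical manifest body is computed.

```python
import hashlib
import json
from typing import Iterable, Tuple

EdgeKey = Tuple[str, str, str]  # (source_id, edge_kind, target_id)


def diff_snapshots(before: Iterable[EdgeKey], after: Iterable[EdgeKey]) -> dict:
    """Compare two snapshots and return a deterministic diff manifest."""
    before_set, after_set = set(before), set(after)
    manifest = {
        # Sorting keeps the output stable regardless of input order.
        "added_edges": sorted(after_set - before_set),
        "removed_edges": sorted(before_set - after_set),
    }
    # Canonical JSON (sorted keys, fixed separators) so the digest is reproducible.
    body = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    manifest["digest"] = "sha256:" + hashlib.sha256(body.encode("utf-8")).hexdigest()
    return manifest


pre_deploy = [("app", "DEPENDS_ON", "libA")]
post_deploy = [("app", "DEPENDS_ON", "libA"), ("app", "DEPENDS_ON", "libB")]
result = diff_snapshots(pre_deploy, post_deploy)
```

Because the manifest is built from sorted edge lists and canonical JSON, two runs over the same pair of snapshots yield the same digest, which is the property a signature would later be attached to.
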
## 3) APIs

- `GET /graph/nodes/{id}`: fetch a node with its metadata and attached provenance.
- `POST /graph/query/saved`: execute a saved query (Cypher-like DSL) with tenant filtering; supports paging, citation metadata, and `explain` traces.
- `GET /graph/impact/{advisoryKey}`: return impacted artifacts with path context and policy/VEX overlays.
- `GET /graph/diff/{snapshotA}/{snapshotB}`: streaming API returning a diff manifest that includes new/removed edges, a risk summary, and export references.
- `POST /graph/overlay/policy`: create or retrieve an overlay for a policy version + advisory set, referencing `effective_finding` results.

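To illustrate what the impact endpoint might compute, the sketch below walks `DEPENDS_ON` edges in reverse, from the components an advisory affects up to the artifacts that contain them, and keeps the path as context. Everything here is an assumption for illustration: the in-memory edge layout, the `impacted_artifacts` function, the edge directions (component `AFFECTED_BY` advisory, dependent `DEPENDS_ON` dependency), and the convention that artifact IDs start with `artifact:`. Overlays are not modelled.

```python
from collections import deque
from typing import Dict, List, Tuple

GraphEdge = Tuple[str, str, str]  # (source_id, edge_kind, target_id)


def impacted_artifacts(edges: List[GraphEdge], advisory_key: str) -> Dict[str, List[str]]:
    """Return {artifact_id: path from the artifact down to the affected component}."""
    dependents: Dict[str, List[str]] = {}  # dependency -> things that depend on it
    affected: List[str] = []
    for source, kind, target in edges:
        if kind == "DEPENDS_ON":
            dependents.setdefault(target, []).append(source)
        elif kind == "AFFECTED_BY" and target == advisory_key:
            affected.append(source)

    result: Dict[str, List[str]] = {}
    seen = set(affected)
    queue = deque((component, [component]) for component in sorted(affected))
    while queue:  # breadth-first, so the first path found to an artifact is a shortest one
        node, path = queue.popleft()
        if node.startswith("artifact:"):
            result.setdefault(node, path)
            continue
        for parent in sorted(dependents.get(node, [])):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, [parent] + path))
    return result


sample_edges = [
    ("artifact:web", "DEPENDS_ON", "pkg:a"),
    ("pkg:a", "DEPENDS_ON", "pkg:b"),
    ("pkg:b", "AFFECTED_BY", "ADV-1"),
    ("artifact:api", "DEPENDS_ON", "pkg:c"),
]
impact = impacted_artifacts(sample_edges, "ADV-1")
```

In this sample only `artifact:web` is reported, with the path `artifact:web`, `pkg:a`, `pkg:b`; `artifact:api` has no route to the affected component.
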
## 4) Storage considerations

- Backed by either:
  - **Document + adjacency** (Mongo collections `graph_nodes`, `graph_edges`, `graph_overlays`) with deterministic ordering and streaming exports.
  - Or **Graph DB** (e.g., Neo4j/Cosmos Gremlin) behind an abstraction layer; choice depends on deployment footprint.
- Every storage backend requires tenant partitioning, append-only change logs, and export manifests for Offline Kits.

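One way to picture the abstraction layer is a small interface that either backend could implement. The sketch below is hypothetical: `GraphStore`, `InMemoryGraphStore`, and their method names do not come from the source. It uses an in-memory stand-in for the document + adjacency option and only illustrates tenant partitioning, deterministic read ordering, and an append-only change log.

```python
from collections import defaultdict
from typing import Dict, List, Protocol, Tuple


class GraphStore(Protocol):
    """Hypothetical interface that a Mongo or graph-DB backend could satisfy."""

    def add_edge(self, tenant: str, source: str, kind: str, target: str) -> None: ...

    def neighbours(self, tenant: str, source: str) -> List[Tuple[str, str]]: ...


class InMemoryGraphStore:
    def __init__(self) -> None:
        # tenant -> source -> list of (edge_kind, target); tenants never share a bucket.
        self._adjacency: Dict[str, Dict[str, List[Tuple[str, str]]]] = defaultdict(
            lambda: defaultdict(list)
        )
        # Entries are only ever appended, never rewritten or removed.
        self.change_log: List[Tuple[str, str, str, str, str]] = []

    def add_edge(self, tenant: str, source: str, kind: str, target: str) -> None:
        entry = (kind, target)
        if entry in self._adjacency[tenant][source]:
            return  # duplicate edge: nothing to store, nothing to log
        self._adjacency[tenant][source].append(entry)
        self.change_log.append(("add_edge", tenant, source, kind, target))

    def neighbours(self, tenant: str, source: str) -> List[Tuple[str, str]]:
        # Sorted copy gives deterministic ordering without exposing internal state.
        return sorted(self._adjacency.get(tenant, {}).get(source, []))


store: GraphStore = InMemoryGraphStore()
store.add_edge("tenant-a", "app", "DEPENDS_ON", "libB")
store.add_edge("tenant-a", "app", "DEPENDS_ON", "libA")
store.add_edge("tenant-a", "app", "DEPENDS_ON", "libA")  # ignored duplicate
store.add_edge("tenant-b", "app", "DEPENDS_ON", "libC")
```

Keeping callers on the `GraphStore` interface is what would let the backend choice follow the deployment footprint without changing query code.
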
## 5) Offline & export

- Each snapshot packages `nodes.jsonl`, `edges.jsonl`, and `overlays/`, plus a manifest with hash, counts, and provenance. Export Center consumes these artefacts for graph-specific bundles.
- Saved queries and overlays include deterministic IDs so Offline Kit consumers can import and replay results.

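A snapshot manifest of the kind described above could be produced along these lines. This is a sketch under stated assumptions: the manifest field names (`files`, `sha256`, `count`, `provenance`), the `id` field on each record, and the `build_manifest` helper are hypothetical. Only the file names `nodes.jsonl` and `edges.jsonl` come from the text, and `overlays/` is omitted for brevity.

```python
import hashlib
import json
from typing import Dict, Iterable, List


def to_jsonl(records: Iterable[dict], sort_field: str = "id") -> bytes:
    """Serialise records as JSON Lines in a deterministic order."""
    ordered = sorted(records, key=lambda record: record[sort_field])
    lines = [json.dumps(r, sort_keys=True, separators=(",", ":")) for r in ordered]
    return ("\n".join(lines) + "\n").encode("utf-8") if lines else b""


def build_manifest(nodes: List[dict], edges: List[dict], provenance: Dict[str, str]) -> dict:
    """Hash and count each export file; the layout is illustrative only."""
    payloads = {
        "nodes.jsonl": (to_jsonl(nodes), len(nodes)),
        "edges.jsonl": (to_jsonl(edges), len(edges)),
    }
    return {
        "files": {
            name: {"sha256": hashlib.sha256(data).hexdigest(), "count": count}
            for name, (data, count) in payloads.items()
        },
        "provenance": provenance,
    }


nodes = [{"id": "n2", "kind": "Component"}, {"id": "n1", "kind": "Artifact"}]
edges = [{"id": "e1", "source": "n1", "kind": "DEPENDS_ON", "target": "n2"}]
manifest = build_manifest(nodes, edges, {"sbom_digest": "sha256:example"})
```

Sorting records and JSON keys before hashing means the same graph content yields the same manifest regardless of the order in which records were read, which is what makes replay on the consumer side checkable.
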
## 6) Observability

- Metrics: ingestion lag (`graph_ingest_lag_seconds`), node/edge counts, query latency per saved query, overlay generation duration.
- Logs: structured events for ETL stages and query execution (with trace IDs).
- Traces: ETL pipeline spans, query engine spans.

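As an illustration of the ingestion-lag metric, lag can be taken as the time between a snapshot event being emitted and the indexer finishing with it. This is a sketch only: the `ingest_lag_seconds` helper and its parameters are assumptions, and publishing the value through a real metrics client under the name `graph_ingest_lag_seconds` is left out.

```python
from datetime import datetime, timezone
from typing import Optional


def ingest_lag_seconds(emitted_at: datetime, indexed_at: Optional[datetime] = None) -> float:
    """Seconds between event emission and indexing; both timestamps must be timezone-aware."""
    indexed_at = indexed_at or datetime.now(timezone.utc)
    # Clamp at zero so clock skew between services never reports negative lag.
    return max(0.0, (indexed_at - emitted_at).total_seconds())


emitted = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
indexed = datetime(2024, 1, 1, 12, 0, 42, tzinfo=timezone.utc)
lag = ingest_lag_seconds(emitted, indexed)
```
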
## 7) Rollout notes

- Phase 1: ingest SBOM + advisories, deliver impact queries.
- Phase 2: add VEX overlays, policy overlays, diff tooling.
- Phase 3: expose runtime/Zastava edges and AI-assisted recommendations (future).

Refer to the module README and implementation plan for immediate context, and update this document once component boundaries and data flows are finalised.