Files
git.stella-ops.org/docs/modules/graph/architecture.md
master 7b5bdcf4d3 feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules
- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
2025-10-30 00:09:39 +02:00

57 lines
3.8 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Graph architecture
> Derived from Epic5 SBOM Graph Explorer; this section captures the core model, pipeline, and API expectations. Extend with diagrams as implementation matures.
## 1) Core model
- **Nodes:**
- `Artifact` (application/image digest) with metadata (tenant, environment, labels).
- `Component` (package/version, purl, ecosystem).
- `File`/`Path` (source files, binary paths) with hash/time metadata.
- `License` nodes linked to components and SBOM attestations.
- `Advisory` and `VEXStatement` nodes linking to Concelier/Excititor records via digests.
- `PolicyVersion` nodes representing signed policy packs.
- **Edges:** directed, timestamped relationships such as `DEPENDS_ON`, `BUILT_FROM`, `DECLARED_IN`, `AFFECTED_BY`, `VEX_EXEMPTS`, `GOVERNS_WITH`, `OBSERVED_RUNTIME`. Each edge carries provenance (SRM hash, SBOM digest, policy run ID).
- **Overlays:** computed index tables providing fast access to reachability, blast radius, and differential views (e.g., `graph_overlay/vuln/{tenant}/{advisoryKey}`).
## 2) Pipelines
1. **Ingestion:** Cartographer/SBOM Service emit SBOM snapshots (`sbom_snapshot` events) captured by the Graph Indexer. Advisories/VEX from Concelier/Excititor generate edge updates, policy runs attach overlay metadata.
2. **ETL:** Normalises nodes/edges into canonical IDs, deduplicates, enforces tenant partitions, and writes to the graph store (planned: Neo4j-compatible or document + adjacency lists in Mongo).
3. **Overlay computation:** Batch workers build materialised views for frequently used queries (impact lists, saved queries, policy overlays) and store as immutable blobs for Offline Kit exports.
4. **Diffing:** `graph_diff` jobs compare two snapshots (e.g., pre/post deploy) and generate signed diff manifests for UI/CLI consumption.
## 3) APIs
- `GET /graph/nodes/{id}` — fetch node with metadata and attached provenance.
- `POST /graph/query/saved` — execute saved query (Cypher-like DSL) with tenant filtering; supports paging, citation metadata, and `explain` traces.
- `GET /graph/impact/{advisoryKey}` — returns impacted artifacts with path context and policy/vex overlays.
- `GET /graph/diff/{snapshotA}/{snapshotB}` — streaming API returning diff manifest including new/removed edges, risk summary, and export references.
- `POST /graph/overlay/policy` — create or retrieve overlay for policy version + advisory set, referencing `effective_finding` results.
## 4) Storage considerations
- Backed by either:
- **Document + adjacency** (Mongo collections `graph_nodes`, `graph_edges`, `graph_overlays`) with deterministic ordering and streaming exports.
- Or **Graph DB** (e.g., Neo4j/Cosmos Gremlin) behind an abstraction layer; choice depends on deployment footprint.
- All storages require tenant partitioning, append-only change logs, and export manifests for Offline Kits.
## 5) Offline & export
- Each snapshot packages `nodes.jsonl`, `edges.jsonl`, `overlays/` plus manifest with hash, counts, and provenance. Export Center consumes these artefacts for graph-specific bundles.
- Saved queries and overlays include deterministic IDs so Offline Kit consumers can import and replay results.
## 6) Observability
- Metrics: ingestion lag (`graph_ingest_lag_seconds`), node/edge counts, query latency per saved query, overlay generation duration.
- Logs: structured events for ETL stages and query execution (with trace IDs).
- Traces: ETL pipeline spans, query engine spans.
## 7) Rollout notes
- Phase 1: ingest SBOM + advisories, deliver impact queries.
- Phase 2: add VEX overlays, policy overlays, diff tooling.
- Phase 3: expose runtime/Zastava edges and AI-assisted recommendations (future).
Refer to the module README and implementation plan for immediate context, and update this document once component boundaries and data flows are finalised.