feat: Add new projects to solution and implement contract testing documentation

- Added "StellaOps.Policy.Engine", "StellaOps.Cartographer", and "StellaOps.SbomService" projects to the StellaOps solution.
- Created AGENTS.md to outline the Contract Testing Guild Charter, detailing mission, scope, and definition of done.
- Established TASKS.md for the Contract Testing Task Board, outlining tasks for Sprint 62 and Sprint 63 related to mock servers and replay testing.
2025-10-27 07:57:55 +02:00
parent 1e41ba7ffa
commit 651b8e0fa3
355 changed files with 17276 additions and 1160 deletions

EPIC_5.md Normal file

@@ -0,0 +1,431 @@
Here's Epic 5 in the same paste-into-repo, implementation-ready format as the prior epics. It's exhaustive, formal, and designed to slot into AOC, Policy Engine, Conseiller/Excitator, and the Console.
---
# Epic 5: SBOM Graph Explorer
> Short name: **Graph Explorer**
> Services touched: **SBOM Service**, **Graph Indexer** (new), **Graph API** (new), **Policy Engine**, **Conseiller (Feedser)**, **Excitator (Vexer)**, **Web API Gateway**, **Authority** (authN/Z), **Workers/Scheduler**, **Telemetry**
> Surfaces: **Console (Web UI)** graph module, **CLI**, **Exports**
> Deliverables: Interactive graph UI with semantic zoom, saved queries, policy/VEX/advisory overlays, diff views, impact analysis, exports
---
## 1) What it is
**SBOM Graph Explorer** is the interactive, tenant-scoped view of all supply-chain relationships the platform knows about, rendered as a navigable graph. It connects:
* **Artifacts** (applications, images, libs), **Packages/Versions**, **Files/Paths**, **Licenses**, **Advisories** (from Conseiller), **VEX statements** (from Excitator), **Provenance** (builds, sources), and **Policies** (overlays of determinations)
* **Edges** like `depends_on`, `contains`, `built_from`, `declared_in`, `affected_by`, `vex_exempts`, `governs_with`
* **Time/version** dimension: multiple SBOM snapshots with diffs
It's built for investigation and review: find where a vulnerable package enters; see which apps are impacted; understand why a finding exists; simulate a policy version and see the delta. The explorer observes **AOC enforcement**: it never mutates facts; it aggregates and visualizes them. Only the Policy Engine may classify, and classification is displayed as overlays.
---
## 2) Why
* SBOMs are graphs. Tables flatten what matters and hide transitive risk.
* Engineers, security, and auditors need impact answers quickly: “What pulls in `log4j:2.17` and where is it at runtime?”
* Policy/VEX/advisory interactions are nuanced. A visual overlay makes precedence and outcomes obvious.
* Review is collaborative; you need saved queries, deep links, exports, and consistent evidence.
---
## 3) How it should work (maximum detail)
### 3.1 Domain model
**Nodes** (typed, versioned, tenant-scoped):
* `Artifact`: application, service, container image, library, module
* `Package`: name + ecosystem (purl), `PackageVersion` with resolved version
* `File`: path within artifact or image layer
* `License`: SPDX id
* `Advisory`: normalized advisory id (GHSA, CVE, vendor), source = Conseiller
* `VEX`: statement with product context, status, justification, source = Excitator
* `SBOM`: ingestion unit; includes metadata (tool, sha, build info)
* `PolicyDetermination`: materialized view of Policy Engine results (read-only overlay)
* `Build`: provenance, commit, workflow run
* `Source`: repo, tag, commit
**Edges** (directed):
* `declared_in` (PackageVersion → SBOM)
* `contains` (Artifact → PackageVersion | File)
* `depends_on` (PackageVersion → PackageVersion) with scope attr (prod|dev|test|optional)
* `built_from` (Artifact → Build), `provenance_of` (Build → Source)
* `affected_by` (PackageVersion → Advisory) with range semantics
* `vex_exempts` (Advisory ↔ VEX) scoped by product/component
* `licensed_under` (Artifact|PackageVersion → License)
* `governs_with` (Artifact|PackageVersion → PolicyDetermination)
* `derived_from` (SBOM → SBOM) for superseding snapshots
**Identity & versioning**
* Every node has a stable key: `{tenant}:{type}:{natural_id}` (e.g., purl for packages, digest for images).
* SBOM snapshots are immutable; edges carry `valid_from`/`valid_to` for time travel and diffing.
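The identity and time-travel rules above can be sketched as follows. This is a minimal illustration, not the Graph Indexer's implementation: the `Edge` class, the `node_key` and `edges_at` helpers, the half-open `[valid_from, valid_to)` interval, and the use of `None` for "still valid" are all assumptions. It also assumes that tenant and type never contain `:`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def node_key(tenant: str, node_type: str, natural_id: str) -> str:
    """Build the stable key {tenant}:{type}:{natural_id}."""
    if not tenant or not node_type or not natural_id:
        raise ValueError("tenant, node_type and natural_id are all required")
    # The natural id (for example a purl) may itself contain ':'. It always
    # comes last, so the key can still be split with key.split(":", 2).
    return f"{tenant}:{node_type}:{natural_id}"


@dataclass(frozen=True)
class Edge:
    from_id: str
    to_id: str
    type: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still valid" (assumption)

    def is_valid_at(self, at: datetime) -> bool:
        """Half-open interval [valid_from, valid_to); an assumed convention."""
        return self.valid_from <= at and (self.valid_to is None or at < self.valid_to)


def edges_at(edges: Iterable[Edge], at: datetime) -> List[Edge]:
    """Time travel: keep only the edges that were valid at the given instant."""
    return [e for e in edges if e.is_valid_at(at)]
```

Diffing two points in time then reduces to comparing `edges_at(edges, t_a)` with `edges_at(edges, t_b)`.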
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
### 3.2 User capabilities (end-to-end)
* **Search & Navigate**: global search (purls, CVEs, repos, licenses), keyboard nav, breadcrumbs, semantic zoom.
* **Lenses**: toggle views (Security, License, Provenance, Runtime vs Dev, Policy effect).
* **Overlays**:
* **Advisory overlay**: show affected nodes/edges with source, severity, ranges.
* **VEX overlay**: show suppressions/justifications; collapse exempted paths.
* **Policy overlay**: choose a policy version; nodes/edges reflect determinations (severity, status) with explain sampling.
* **Impact analysis**: pick a vulnerable node; highlight upstream/downstream dependents, scope filters, shortest/all paths with constraints.
* **Diff view**: compare SBOM A vs B; show added/removed nodes/edges, changed versions, changed determinations.
* **Saved queries**: visual builder + JSON query; shareable permalinks scoped by tenant and environment.
* **Exports**: GraphML, CSV edge list, NDJSON of findings, PNG/SVG snapshot.
* **Evidence details**: side panel with raw facts, advisory links, VEX statements, policy explain trace, provenance.
* **Accessibility**: tab-navigable, high-contrast, screen-reader labels for nodes and sidebars.
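The impact-analysis capability in this list (all paths under a depth constraint) could look roughly like the sketch below. It assumes the graph is available as an in-memory adjacency mapping and enumerates simple paths with a depth-first walk; `bounded_paths` and the sample node ids are hypothetical, and a real service would add scope filters and cost budgets.

```python
from typing import Dict, List, Sequence


def bounded_paths(
    adjacency: Dict[str, Sequence[str]],
    source: str,
    target: str,
    max_depth: int = 6,
) -> List[List[str]]:
    """Return every simple path from source to target with at most max_depth edges."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == target:
            paths.append(path)
            return
        if len(path) - 1 >= max_depth:  # edge count already at the bound
            return
        for nxt in adjacency.get(node, ()):
            if nxt not in path:  # keep paths simple, which also guards against cycles
                walk(nxt, path + [nxt])

    walk(source, [source])
    return paths
```

The shortest path is then `min(paths, key=len)` when any path exists.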
### 3.3 Query model
* **Visual builder** for common queries:
* “Show all paths from Artifact X to Advisory Y up to depth 6.”
* “All runtime dependencies with license = GPL-3.0.”
* “All artifacts affected by GHSA… with no applicable VEX.”
* “Which SBOMs introduced/removed `openssl` between build 120 and 130?”
* **JSON query** (internal, POST body) with:
* `start`: list of node selectors (type + id or attributes)
* `expand`: edge types and depth, direction, scope filters
* `where`: predicates on node/edge attributes
* `overlay`: policy version id, advisory sources, VEX filters
* `limit`: nodes, edges, timebox, cost budget
* **Cost control**: server estimates cost, denies or pages results; UI streams partial graph tiles.
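To make the JSON query shape concrete, here is a hedged example body and a small validator. Only the five top-level keys come from the list above; every nested field name (`edges`, `depth`, `direction`, `scope`, `policy_version`, `nodes`) is an illustrative assumption, and the depth bound of 6 is borrowed from the path-query limit stated elsewhere in this epic.

```python
from typing import List

ALLOWED_KEYS = {"start", "expand", "where", "overlay", "limit"}
MAX_DEPTH = 6  # assumption: reuse the path-depth bound for expansion

# Hypothetical query: nested field names are illustrative only.
example_query = {
    "start": [{"type": "artifact", "id": "service-a"}],
    "expand": {
        "edges": ["contains", "depends_on"],
        "depth": 4,
        "direction": "out",
        "scope": ["prod"],
    },
    "where": {"node.type": "package_version"},
    "overlay": {"policy_version": "1.3.0"},
    "limit": {"nodes": 5000, "edges": 20000},
}


def validate_query(query: dict) -> List[str]:
    """Return a list of validation errors; an empty list means the query is acceptable."""
    errors: List[str] = []
    unknown = set(query) - ALLOWED_KEYS
    if unknown:
        errors.append(f"unknown keys: {sorted(unknown)}")
    if not query.get("start"):
        errors.append("start must list at least one node selector")
    depth = query.get("expand", {}).get("depth", 1)
    if not isinstance(depth, int) or not 1 <= depth <= MAX_DEPTH:
        errors.append(f"expand.depth must be an integer in 1..{MAX_DEPTH}")
    return errors
```

Returning all errors at once, rather than raising on the first, fits the stated goal of structured diagnostics.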
### 3.4 UI architecture (Console)
* **Canvas**: WebGL renderer with level-of-detail, edge bundling, and label culling; deterministic layout when possible (seeded).
* **Semantic zoom**:
* Far: clusters by artifact/repo/ecosystem, color by lens
* Mid: package groups, advisory badges, license swatches
* Near: concrete versions, direct edges, inline badges for policy determinations
* **Panels**:
* Left: search, filters, lens selector, saved queries
* Right: details, explain trace, evidence tabs (Advisory/VEX/Policy/Provenance)
* Bottom: query expression, diagnostics, performance/stream status
* **Diff mode**: split or overlay, color legend (add/remove/changed), filter by node type.
* **Deep links**: URL encodes query + viewport; shareable respecting RBAC.
* **Keyboard**: space drag, +/- zoom, F to focus, G to expand neighbors, P to show paths.
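One way to implement deep links that encode query plus viewport is to serialize both into a URL-safe token. The sketch below assumes canonical JSON wrapped in unpadded base64url; the `q`/`v` envelope and the function names are hypothetical. The token carries no authorization, so the server would still apply RBAC when the link is opened.

```python
import base64
import json


def encode_state(query: dict, viewport: dict) -> str:
    """Serialize query + viewport into a URL-safe token (no padding)."""
    payload = json.dumps({"q": query, "v": viewport}, sort_keys=True, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii").rstrip("=")


def decode_state(token: str) -> dict:
    """Inverse of encode_state; restores the padding stripped during encoding."""
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Sorting keys makes the token stable for equal inputs, which helps links reproduce the same view.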
### 3.5 Backend architecture
**Graph Indexer (new)**
* Consumes SBOM ingests, Conseiller advisories, Excitator VEX statements, Policy Engine determinations (read-only).
* Projects facts into a **property graph** persisted in:
* Primary: document store + adjacency sets (e.g., Mongo collections + compressed adjacency lists)
* Optional driver for graph DB backends if needed (pluggable)
* Maintains materialized aggregates: degree, critical paths cache, affected artifact counts, license distribution.
* Emits **graph snapshots** per SBOM with lineage to original ingestion.
**Graph API (new)**
* Endpoints for search, neighbor expansion, path queries, diffs, overlays, exports.
* Streaming responses for large graphs (chunked NDJSON tiles).
* Cost accounting + quotas per tenant.
**Workers**
* **Centrality & clustering** precompute on idle: betweenness approximations, connected components, Louvain clusters.
* **Diff compute** on new SBOM ingestion pairs (previous vs current).
* **Overlay materialization** cache for popular policy versions.
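As a rough sketch of what the diff compute worker produces for a previous/current SBOM pair, the function below compares two snapshots reduced to a package-name to version mapping. That reduction and the output shape are assumptions made for illustration; the real diff also covers edges and determinations.

```python
from typing import Dict


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> dict:
    """Compare two {package: version} snapshots and report adds, removes, and version changes."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = {
        name: (previous[name], current[name])
        for name in sorted(set(previous) & set(current))
        if previous[name] != current[name]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

Sorted output keeps the result deterministic, so it can be cached and compared across runs.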
**Policy Engine integration**
* Graph API requests can specify a policy version.
* For sampled nodes, the API fetches explain traces; for counts, uses precomputed overlay materializations where available.
**AOC enforcement**
* Graph Indexer never merges or edits advisories/VEX; it links them and exposes overlays that the Policy Engine evaluates.
* Conseiller and Excitator remain authoritative sources; severities come from Policy-governed normalization.
### 3.6 APIs (representative)
* `GET /graph/search?q=...&type=package|artifact|advisory|license`
* `POST /graph/query` ⇒ stream tiles `{nodes[], edges[], stats, cursor}`
* `POST /graph/paths` body: `{from, to, depth<=6, constraints{scope, runtime_only}}`
* `POST /graph/diff` body: `{sbom_a, sbom_b, filters}`
* `GET /graph/snapshots/{sbom_id}` ⇒ graph metadata, counts, top advisories
* `POST /graph/export` body: `{format: graphml|csv|ndjson|png|svg, query|snapshot}`
* `GET /graph/saved` / `POST /graph/saved` ⇒ list and save tenant queries
* `GET /graph/overlays/policy/{version_id}` ⇒ summary stats for caching
All endpoints are tenant-scoped and RBAC-checked. The server enforces timeouts and pagination. Errors return structured diagnostics.
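A client consuming the streamed tiles from `POST /graph/query` might accumulate them as in the sketch below. It assumes one JSON tile per line (NDJSON), that nodes carry an `id` field, and that the last tile's `cursor` is the resume point; none of these details are confirmed by the endpoint list.

```python
import json
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_tiles(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one tile per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def merge_tiles(tiles: Iterable[dict]) -> Tuple[Dict[str, dict], List[dict], Optional[str]]:
    """Fold tiles into a node map (deduplicated by id), an edge list, and the last cursor."""
    nodes: Dict[str, dict] = {}
    edges: List[dict] = []
    cursor: Optional[str] = None
    for tile in tiles:
        for node in tile.get("nodes", []):
            nodes[node["id"]] = node  # assumed: later tiles may repeat a node
        edges.extend(tile.get("edges", []))
        cursor = tile.get("cursor")
    return nodes, edges, cursor
```

Because `read_tiles` is a generator, a UI can render each tile as it arrives instead of waiting for the whole response.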
### 3.7 CLI
```
stella sbom graph search "purl:pkg:maven/org.apache.logging.log4j/log4j-core"
stella sbom graph query --file ./query.json --export graphml > graph.graphml
stella sbom graph impacted --advisory GHSA-xxxx --runtime-only --limit 100
stella sbom graph paths --from artifact:service-a --to advisory:GHSA-xxxx --depth 5 --policy 1.3.0
stella sbom graph diff --sbom-a 2025-03-15T10:00Z --sbom-b 2025-03-22T10:00Z --export csv > diff.csv
stella sbom graph save --name "openssl-runtime" --file ./query.json
```
Exit codes: 0 ok, 2 query validation error, 3 over-budget, 4 not found, 5 RBAC denied.
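For CI pipelines that wrap the CLI, the exit codes above can be mapped to actions. The enum values mirror the listed codes; the enum names, the `classify` helper, and the choice to treat over-budget as the only retryable case are assumptions.

```python
from enum import IntEnum


class GraphExit(IntEnum):
    OK = 0
    QUERY_VALIDATION_ERROR = 2
    OVER_BUDGET = 3
    NOT_FOUND = 4
    RBAC_DENIED = 5


# Assumption: an over-budget query may succeed with tighter limits; the rest are hard failures.
RETRYABLE = {GraphExit.OVER_BUDGET}


def classify(returncode: int) -> str:
    """Translate a process return code into a coarse pipeline action."""
    try:
        code = GraphExit(returncode)
    except ValueError:
        return "unknown"  # code not in the documented list (for example 1)
    if code is GraphExit.OK:
        return "ok"
    return "retry-with-smaller-query" if code in RETRYABLE else "fail"
```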
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
### 3.8 Performance & scale
* **Progressive loading**: server pages tiles by BFS frontier; client renders incrementally.
* **Viewport culling**: only visible nodes/edges in canvas; offscreen demoted to aggregates.
* **Level-of-detail**: simplified glyphs and collapsed clusters at distance.
* **Query budgets**: per-tenant rate + node/edge caps; interactive paths limited to depth ≤ 6.
* **Caching**: hot queries memoized per tenant + overlay version; diffs precomputed for consecutive SBOMs.
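Progressive loading by BFS frontier, combined with a node budget, can be sketched as a generator that emits fixed-size tiles. The in-memory adjacency mapping, the parameter names, and the default sizes are assumptions for illustration only.

```python
from collections import deque
from typing import Dict, Iterator, List, Sequence


def bfs_tiles(
    adjacency: Dict[str, Sequence[str]],
    start: str,
    max_depth: int = 6,
    tile_size: int = 2,
    node_budget: int = 1000,
) -> Iterator[List[str]]:
    """Walk the graph breadth-first and yield nodes in tiles until the budget is spent."""
    seen = {start}
    frontier = deque([(start, 0)])
    tile: List[str] = []
    emitted = 0
    while frontier and emitted < node_budget:
        node, depth = frontier.popleft()
        tile.append(node)
        emitted += 1
        if len(tile) == tile_size:
            yield tile
            tile = []
        if depth < max_depth:
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    if tile:
        yield tile  # final partial tile
```

Nodes nearest the start arrive first, so the client can render the most relevant neighborhood while deeper tiles are still in flight.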
### 3.9 Security
* Multi-tenant isolation at storage and API layers.
* RBAC roles:
* **Viewer**: browse graphs, saved queries
* **Investigator**: run queries, export data
* **Operator**: configure budgets, purge caches
* **Auditor**: download evidence bundles
* Input validation for query JSON; deny disallowed edge traversals; strict CSP in web app.
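The role list above can be expressed as a permission matrix. In this sketch the permission strings are hypothetical names, and roles are treated as non-cumulative (an Investigator does not implicitly get Viewer rights); the epic does not say whether roles inherit, so this is an assumption to confirm.

```python
from typing import Dict, Iterable, Set

# Hypothetical permission names derived from the role descriptions.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"graph:browse", "saved_query:read"},
    "investigator": {"graph:query", "graph:export"},
    "operator": {"budget:configure", "cache:purge"},
    "auditor": {"evidence:download"},
}


def is_allowed(roles: Iterable[str], permission: str) -> bool:
    """True if any of the caller's roles grants the permission; unknown roles grant nothing."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

A user who should both browse and export would hold both `viewer` and `investigator` under this model.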
### 3.10 Observability
* Metrics: tile latency, nodes/edges per tile, cache hit rate, query denials, memory pressure.
* Logs: structured, include query hash, cost, truncation flags.
* Traces: server spans per stage (parse, plan, fetch, overlay, stream).
### 3.11 Accessibility & UX guarantees
* Keyboard complete, ARIA roles for graph and panels, high-contrast theme.
* Deterministic layout on reload for shareable investigations.
### 3.12 Data retention
* Graph nodes derived from SBOMs share retention with SBOM artifacts; overlays are ephemeral caches.
* Saved queries retained until deleted; references to missing objects show warnings.
---
## 4) Implementation plan
### 4.1 Services
* **Graph Indexer (new microservice)**
* Subscribes to SBOM ingest events, Conseiller advisory updates, Excitator VEX updates, Policy overlay materializations.
* Builds adjacency lists and node documents; computes aggregates and clusters.
* **Graph API (new microservice)**
* Validates and executes queries; streams tiles; composes overlays; serves diffs and exports.
* Integrates with Policy Engine for explain sample retrieval.
* **SBOM Service (existing)**
* Emits ingestion events with SBOM ids and lineage; exposes SBOM metadata to Graph API.
* **Web API Gateway**
* Routes `/graph/*`, injects tenant context, enforces RBAC.
### 4.2 Console (Web UI) feature module
* `packages/features/graph-explorer`
* Canvas renderer (WebGL), panels, query builder, diff mode, overlays, exports.
* Deep-link router and viewport state serializer.
### 4.3 Workers
* Centrality/clustering worker, diff worker, overlay materialization worker.
* Schedules on low-traffic windows; backpressure-aware.
### 4.4 Data model (storage)
* Collections:
* `graph_nodes`: `{_id, tenant, type, natural_id, attrs, degree, created_at, updated_at}`
* `graph_edges`: `{_id, tenant, from_id, to_id, type, attrs, valid_from, valid_to}`
* `graph_snapshots`: per-SBOM node/edge references
* `graph_saved_queries`: `{_id, tenant, name, query_json, created_by}`
* `graph_overlays_cache`: keyed by `{tenant, policy_version, hash(query)}`
* Indexes: compound on `{tenant, type, natural_id}`, `{tenant, from_id}`, `{tenant, to_id}`, time bounds.
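The `graph_overlays_cache` key `{tenant, policy_version, hash(query)}` needs a query hash that ignores key order. A minimal sketch, assuming canonical JSON hashed with SHA-256 and a colon-joined key string (both are illustrative choices, not the documented scheme):

```python
import hashlib
import json


def query_hash(query: dict) -> str:
    """Hash a query via canonical JSON so that key order does not affect the result."""
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def overlay_cache_key(tenant: str, policy_version: str, query: dict) -> str:
    """Compose the cache key from tenant, policy version, and the query hash."""
    return f"{tenant}:{policy_version}:{query_hash(query)}"
```

Including the tenant in the key keeps cache entries isolated per tenant even when two tenants issue identical queries.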
---
## 5) Documentation changes (create/update)
1. **`/docs/sbom/graph-explorer-overview.md`**
* Concepts, node/edge taxonomy, lenses, overlays, roles, limitations.
2. **`/docs/sbom/graph-using-the-console.md`**
* Walkthroughs: search, navigate, impact, diff, export; screenshots and keyboard cheatsheet.
3. **`/docs/sbom/graph-query-language.md`**
* JSON schema, examples, constraints, cost/budget rules.
4. **`/docs/sbom/graph-api.md`**
* REST endpoints, request/response examples, streaming and pagination.
5. **`/docs/sbom/graph-cli.md`**
* CLI command reference and example pipelines.
6. **`/docs/policy/graph-overlays.md`**
* How policy versions render in Graph; explain sampling; AOC guardrails.
7. **`/docs/vex/graph-integration.md`**
* How VEX suppressions appear and how to validate product scoping.
8. **`/docs/advisories/graph-integration.md`**
* Advisory linkage and severity normalization by policy.
9. **`/docs/architecture/graph-services.md`**
* Graph Indexer, Graph API, storage choices, failure modes.
10. **`/docs/observability/graph-telemetry.md`**
* Metrics, logs, tracing, dashboards.
11. **`/docs/runbooks/graph-incidents.md`**
* Handling runaway queries, cache poisoning, degraded render.
12. **`/docs/security/graph-rbac.md`**
* Permissions matrix, multi-tenant boundaries.
Every doc should end with a “Compliance checklist.”
**Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
---
## 6) Tasks
### 6.1 Backend: Graph Indexer
* [ ] Define node/edge schemas and attribute dictionaries for each type.
* [ ] Implement event consumers for SBOM ingests, Conseiller updates, Excitator updates.
* [ ] Build ingestion pipeline that populates nodes/edges and maintains `valid_from/valid_to`.
* [ ] Implement aggregate counters and degree metrics.
* [ ] Implement clustering job and persist cluster ids per node.
* [ ] Implement snapshot materialization per SBOM and lineage tracking.
* [ ] Unit tests for each node/edge builder; property-based tests for identity stability.
### 6.2 Backend: Graph API
* [ ] Implement `/graph/search` with prefix and exact match across node types.
* [ ] Implement `/graph/query` with validation, planning, cost estimation, and streaming tile results.
* [ ] Implement `/graph/paths` with constraints and depth limits; shortest path heuristic.
* [ ] Implement `/graph/diff` computing adds/removes/changed versions; stream results.
* [ ] Implement overlays: advisory join, VEX join, policy materialization and explain sampling.
* [ ] Implement exports: GraphML, CSV edge list, NDJSON findings, PNG/SVG snapshots.
* [ ] RBAC middleware integration; multi-tenant scoping.
* [ ] Load tests with synthetic large SBOMs; define default budgets.
### 6.3 Policy Engine integration
* [ ] Add endpoint to fetch explain traces for specific node ids in batch.
* [ ] Add materialization export that Graph API can cache per policy version.
### 6.4 Console (Web UI)
* [ ] Create `graph-explorer` module with routes `/graph`, `/graph/diff`, `/graph/q/:id`.
* [ ] Implement WebGL canvas with LOD, culling, edge bundling, deterministic layout seed.
* [ ] Build search, filter, lens, and overlay toolbars.
* [ ] Side panels: details, evidence tabs, explain viewer.
* [ ] Diff mode: split/overlay toggles and color legend.
* [ ] Saved queries: create, update, run; deep links.
* [ ] Export UI: formats, server round-trip, progress indicators.
* [ ] a11y audit and keyboard-only flow.
### 6.5 CLI
* [ ] Implement `stella sbom graph *` subcommands with JSON IO and piping support.
* [ ] Document examples and stable output schemas for CI consumption.
### 6.6 Observability & Ops
* [ ] Dashboards for tile latency, query denials, cache hit rate, memory.
* [ ] Alerting on query error spikes, OOM risk, cache churn.
* [ ] Runbooks in `/docs/runbooks/graph-incidents.md`.
### 6.7 Docs
* [ ] Author all docs in section 5, link from Console contextual help.
* [ ] Add end-to-end tutorial: “Investigate GHSA-XXXX across prod artifacts.”
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
---
## 7) Acceptance criteria
* Console renders large SBOM graphs with semantic zoom, overlays, and responsive interactions.
* Users can run impact and path queries with bounded depth and get results within budget.
* VEX suppressions and advisory severities appear correctly and are consistent with policy.
* Diff view clearly shows added/removed/changed nodes/edges between two SBOMs.
* Saved queries and deep links reproduce the same view deterministically (given same data).
* Exports produce valid GraphML/CSV/NDJSON and image snapshots.
* CLI supports search, query, paths, impacted, diff, and export with stable schemas.
* AOC guardrails: explorer never mutates facts; overlays reflect Policy Engine decisions.
* RBAC enforced; all actions logged and observable.
---
## 8) Risks & mitigations
* **Graph explosion on large monorepos** → tiling, clustering, budgets, and strict depth limits.
* **Inconsistent identities across tools** → canonicalize purls/digests; property-based tests for identity stability.
* **Policy overlay latency** → precompute materializations for hot policy versions; sample explains only on focus.
* **User confusion** → strong lens defaults, deterministic layouts, legends, in-context help.
---
## 9) Test plan
* **Unit**: node/edge builders, identity normalization, cost estimator.
* **Integration**: ingest SBOM + advisories + VEX, verify overlays and counts.
* **E2E**: Playwright flows for search→impact→diff→export; deep link determinism.
* **Performance**: simulate 500k nodes/2M edges; measure tile latency and memory.
* **Security**: RBAC matrix; tenant isolation tests; query validation fuzzing.
* **Determinism**: snapshot round-trip: same query and seed produce identical layout and stats.
---
## 10) Feature flags
* `graph.explorer` (UI feature module)
* `graph.paths` (advanced path queries)
* `graph.diff` (SBOM diff mode)
* `graph.overlays.policy` (policy overlay + explain sampling)
* `graph.export` (exports enabled)
Documented in `/docs/observability/graph-telemetry.md`.
---
## 11) Non-goals (this epic)
* Real-time process/runtime call graphs.
* Full substitution for text reports; Explorer complements Reports.
* Cross-tenant graphs; all queries are tenant-scoped.
---
## 12) Philosophy
* **See the system**: security and license risk are structural. If you cannot see structure, you will miss risk.
* **Evidence over assertion**: every colored node corresponds to raw facts and explainable determinations.
* **Bounded interactivity**: fast, partial answers beat slow “complete” ones.
* **Immutability**: graphs mirror SBOM snapshots and are never rewritten; we add context, not edits.
> Final reminder: **Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.**