git.stella-ops.org/EPIC_5.md
master 651b8e0fa3 feat: Add new projects to solution and implement contract testing documentation
- Added "StellaOps.Policy.Engine", "StellaOps.Cartographer", and "StellaOps.SbomService" projects to the StellaOps solution.
- Created AGENTS.md to outline the Contract Testing Guild Charter, detailing mission, scope, and definition of done.
- Established TASKS.md for the Contract Testing Task Board, outlining tasks for Sprint 62 and Sprint 63 related to mock servers and replay testing.
2025-10-27 07:57:55 +02:00


Here's Epic 5 in the same paste-into-repo, implementation-ready format as the prior epics. It's exhaustive, formal, and designed to slot into AOC, Policy Engine, Conseiller/Excitator, and the Console.


Epic 5: SBOM Graph Explorer

Short name: Graph Explorer
Services touched: SBOM Service, Graph Indexer (new), Graph API (new), Policy Engine, Conseiller (Feedser), Excitator (Vexer), Web API Gateway, Authority (authN/Z), Workers/Scheduler, Telemetry
Surfaces: Console (Web UI) graph module, CLI, Exports
Deliverables: Interactive graph UI with semantic zoom, saved queries, policy/VEX/advisory overlays, diff views, impact analysis, exports


1) What it is

SBOM Graph Explorer is the interactive, tenant-scoped view of all supply-chain relationships the platform knows about, rendered as a navigable graph. It connects:

  • Artifacts (applications, images, libs), Packages/Versions, Files/Paths, Licenses, Advisories (from Conseiller), VEX statements (from Excitator), Provenance (builds, sources), and Policies (overlays of determinations)
  • Edges like depends_on, contains, built_from, declared_in, affected_by, vex_exempts, governs_with
  • Time/version dimension: multiple SBOM snapshots with diffs

It's built for investigation and review: find where a vulnerable package enters; see which apps are impacted; understand why a finding exists; simulate a policy version and see the delta. The explorer observes AOC enforcement: it never mutates facts; it aggregates and visualizes them. Only the Policy Engine may classify, and classification is displayed as overlays.


2) Why

  • SBOMs are graphs. Tables flatten what matters and hide transitive risk.
  • Engineers, security, and auditors need impact answers quickly: “What pulls in log4j:2.17 and where is it at runtime?”
  • Policy/VEX/advisory interactions are nuanced. A visual overlay makes precedence and outcomes obvious.
  • Review is collaborative; you need saved queries, deep links, exports, and consistent evidence.

3) How it should work (maximum detail)

3.1 Domain model

Nodes (typed, versioned, tenant-scoped):

  • Artifact: application, service, container image, library, module
  • Package: name + ecosystem (purl), PackageVersion with resolved version
  • File: path within artifact or image layer
  • License: SPDX id
  • Advisory: normalized advisory id (GHSA, CVE, vendor), source = Conseiller
  • VEX: statement with product context, status, justification, source = Excitator
  • SBOM: ingestion unit; includes metadata (tool, sha, build info)
  • PolicyDetermination: materialized view of Policy Engine results (read-only overlay)
  • Build: provenance, commit, workflow run
  • Source: repo, tag, commit

Edges (directed):

  • declared_in (PackageVersion → SBOM)
  • contains (Artifact → PackageVersion | File)
  • depends_on (PackageVersion → PackageVersion) with scope attr (prod|dev|test|optional)
  • built_from (Artifact → Build), provenance_of (Build → Source)
  • affected_by (PackageVersion → Advisory) with range semantics
  • vex_exempts (Advisory ↔ VEX) scoped by product/component
  • licensed_under (Artifact|PackageVersion → License)
  • governs_with (Artifact|PackageVersion → PolicyDetermination)
  • derived_from (SBOM → SBOM) for superseding snapshots
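The edge list above implies a type signature for each edge kind. The following minimal Python sketch shows how the Graph Indexer might reject malformed edges before persisting them. `EDGE_SIGNATURES` and `validate_edge` are hypothetical names. Accepting both orientations of `vex_exempts` is an assumption drawn from the ↔ notation.

```python
from __future__ import annotations

# Hypothetical endpoint signatures transcribed from the edge list:
# edge type -> (allowed source node types, allowed target node types).
EDGE_SIGNATURES: dict[str, tuple[frozenset[str], frozenset[str]]] = {
    "declared_in": (frozenset({"PackageVersion"}), frozenset({"SBOM"})),
    "contains": (frozenset({"Artifact"}), frozenset({"PackageVersion", "File"})),
    "depends_on": (frozenset({"PackageVersion"}), frozenset({"PackageVersion"})),
    "built_from": (frozenset({"Artifact"}), frozenset({"Build"})),
    "provenance_of": (frozenset({"Build"}), frozenset({"Source"})),
    "affected_by": (frozenset({"PackageVersion"}), frozenset({"Advisory"})),
    # Listed as Advisory <-> VEX, so either orientation is accepted here.
    "vex_exempts": (frozenset({"Advisory", "VEX"}), frozenset({"Advisory", "VEX"})),
    "licensed_under": (frozenset({"Artifact", "PackageVersion"}), frozenset({"License"})),
    "governs_with": (frozenset({"Artifact", "PackageVersion"}), frozenset({"PolicyDetermination"})),
    "derived_from": (frozenset({"SBOM"}), frozenset({"SBOM"})),
}

DEPENDENCY_SCOPES = frozenset({"prod", "dev", "test", "optional"})


def validate_edge(edge_type: str, from_type: str, to_type: str, attrs: dict | None = None) -> list[str]:
    """Return a list of problems; an empty list means the edge is well-typed."""
    attrs = attrs or {}
    signature = EDGE_SIGNATURES.get(edge_type)
    if signature is None:
        return [f"unknown edge type: {edge_type}"]
    sources, targets = signature
    problems = []
    if from_type not in sources:
        problems.append(f"{edge_type}: invalid source type {from_type}")
    if to_type not in targets:
        problems.append(f"{edge_type}: invalid target type {to_type}")
    # vex_exempts must join exactly one Advisory with one VEX node.
    if edge_type == "vex_exempts" and {from_type, to_type} != {"Advisory", "VEX"}:
        problems.append("vex_exempts must link one Advisory and one VEX node")
    # depends_on carries a mandatory scope attribute.
    if edge_type == "depends_on" and attrs.get("scope") not in DEPENDENCY_SCOPES:
        problems.append("depends_on requires scope in prod|dev|test|optional")
    return problems
```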

Identity & versioning

  • Every node has a stable key: {tenant}:{type}:{natural_id} (e.g., purl for packages, digest for images).
  • SBOM snapshots are immutable; edges carry valid_from/valid_to for time travel and diffing.
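Below is a minimal sketch of the identity and time-travel rules. It assumes edges use a half-open validity interval `[valid_from, valid_to)`, with `valid_to = None` for edges that are still current. The interval convention is an assumption, and `node_key`, `Edge`, and `edges_as_of` are hypothetical helpers.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


def node_key(tenant: str, node_type: str, natural_id: str) -> str:
    """Build the stable {tenant}:{type}:{natural_id} key."""
    if not tenant or not node_type or ":" in tenant or ":" in node_type:
        raise ValueError("tenant and type must be non-empty and contain no ':'")
    # natural_id (purl, image digest, ...) may itself contain ':', so it goes last
    # and the key can still be split safely with key.split(":", 2).
    return f"{tenant}:{node_type}:{natural_id}"


@dataclass(frozen=True)
class Edge:
    from_key: str
    to_key: str
    edge_type: str
    valid_from: datetime
    valid_to: datetime | None = None  # None: the edge is still current


def edges_as_of(edges: list[Edge], at: datetime) -> list[Edge]:
    """Time travel: keep edges whose interval [valid_from, valid_to) contains `at`."""
    return [e for e in edges if e.valid_from <= at and (e.valid_to is None or at < e.valid_to)]
```

Because `natural_id` comes last, a key splits cleanly even though purls and digests contain colons. A half-open interval means that at a snapshot boundary exactly one of the superseded and superseding edges is visible.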

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

3.2 User capabilities (end-to-end)

  • Search & Navigate: global search (purls, CVEs, repos, licenses), keyboard nav, breadcrumbs, semantic zoom.

  • Lenses: toggle views (Security, License, Provenance, Runtime vs Dev, Policy effect).

  • Overlays:

    • Advisory overlay: show affected nodes/edges with source, severity, ranges.
    • VEX overlay: show suppressions/justifications; collapse exempted paths.
    • Policy overlay: choose a policy version; nodes/edges reflect determinations (severity, status) with explain sampling.
  • Impact analysis: pick a vulnerable node; highlight upstream/downstream dependents, scope filters, shortest/all paths with constraints.

  • Diff view: compare SBOM A vs B; show added/removed nodes/edges, changed versions, changed determinations.

  • Saved queries: visual builder + JSON query; shareable permalinks scoped by tenant and environment.

  • Exports: GraphML, CSV edge list, NDJSON of findings, PNG/SVG snapshot.

  • Evidence details: side panel with raw facts, advisory links, VEX statements, policy explain trace, provenance.

  • Accessibility: tab-navigable, high-contrast, screen-reader labels for nodes and sidebars.
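At its core, the diff view is set arithmetic over two snapshots. In this minimal sketch, packages are matched by identity without version, so a version bump appears as "changed" instead of a removal plus an addition. That matching rule is an assumption. `diff_snapshots` and `SnapshotDiff` are hypothetical, and changed policy determinations are not covered.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SnapshotDiff:
    added_nodes: set[str] = field(default_factory=set)
    removed_nodes: set[str] = field(default_factory=set)
    changed_versions: dict[str, tuple[str, str]] = field(default_factory=dict)
    added_edges: set[tuple[str, str, str]] = field(default_factory=set)
    removed_edges: set[tuple[str, str, str]] = field(default_factory=set)


def diff_snapshots(
    versions_a: dict[str, str],  # package identity (purl without version) -> version
    versions_b: dict[str, str],
    edges_a: set[tuple[str, str, str]],  # (from, edge_type, to) on versionless ids
    edges_b: set[tuple[str, str, str]],
) -> SnapshotDiff:
    """Compare SBOM snapshot A against snapshot B."""
    diff = SnapshotDiff()
    diff.added_nodes = set(versions_b) - set(versions_a)
    diff.removed_nodes = set(versions_a) - set(versions_b)
    # Packages present in both snapshots but at different versions.
    for pkg in set(versions_a) & set(versions_b):
        if versions_a[pkg] != versions_b[pkg]:
            diff.changed_versions[pkg] = (versions_a[pkg], versions_b[pkg])
    diff.added_edges = edges_b - edges_a
    diff.removed_edges = edges_a - edges_b
    return diff
```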

3.3 Query model

  • Visual builder for common queries:

    • “Show all paths from Artifact X to Advisory Y up to depth 6.”
    • “All runtime dependencies with license = GPL-3.0.”
    • “All artifacts affected by GHSA… with no applicable VEX.”
    • “Which SBOMs introduced/removed openssl between build 120 and 130?”
  • JSON query (internal, POST body) with:

    • start: list of node selectors (type + id or attributes)
    • expand: edge types and depth, direction, scope filters
    • where: predicates on node/edge attributes
    • overlay: policy version id, advisory sources, VEX filters
    • limit: nodes, edges, timebox, cost budget
  • Cost control: server estimates cost, denies or pages results; UI streams partial graph tiles.
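Here is a sketch of a JSON query document, together with the sort of validation and cost estimate a server could run before executing it. The field names inside `expand` and `limit` (`edges`, `direction`, `timebox_ms`) are assumptions, as is the fan-out cost model. None of this is a published schema.

```python
from __future__ import annotations

import json

MAX_DEPTH = 6  # interactive depth limit from the performance section
ALLOWED_EDGES = {
    "declared_in", "contains", "depends_on", "built_from", "provenance_of",
    "affected_by", "vex_exempts", "licensed_under", "governs_with", "derived_from",
}


def validate_query(query: dict) -> list[str]:
    """Structural checks on a query document; returns a list of errors."""
    errors = []
    if not query.get("start"):
        errors.append("start: at least one node selector is required")
    expand = query.get("expand", {})
    depth = expand.get("depth", 1)
    if not isinstance(depth, int) or not 1 <= depth <= MAX_DEPTH:
        errors.append(f"expand.depth must be an integer in 1..{MAX_DEPTH}")
    unknown = set(expand.get("edges", [])) - ALLOWED_EDGES
    if unknown:
        errors.append(f"expand.edges: disallowed edge types {sorted(unknown)}")
    if expand.get("direction", "out") not in {"out", "in", "both"}:
        errors.append("expand.direction must be out|in|both")
    return errors


def estimate_cost(query: dict, avg_fanout: float) -> float:
    """Crude upper bound on visited nodes: starts * sum of fanout^d for d = 0..depth."""
    depth = query.get("expand", {}).get("depth", 1)
    starts = len(query.get("start", []))
    return starts * sum(avg_fanout ** d for d in range(depth + 1))


query = json.loads("""
{
  "start": [{"type": "artifact", "id": "service-a"}],
  "expand": {"edges": ["contains", "depends_on"], "depth": 3, "direction": "out"},
  "where": {"scope": "prod"},
  "overlay": {"policy_version": "1.3.0"},
  "limit": {"nodes": 5000, "edges": 20000, "timebox_ms": 2000}
}
""")
```

A server following this design would check `estimate_cost` against the tenant budget and then either deny the request or page the results, as the cost-control bullet describes.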

3.4 UI architecture (Console)

  • Canvas: WebGL renderer with level-of-detail, edge bundling, and label culling; deterministic layout when possible (seeded).

  • Semantic zoom:

    • Far: clusters by artifact/repo/ecosystem, color by lens
    • Mid: package groups, advisory badges, license swatches
    • Near: concrete versions, direct edges, inline badges for policy determinations
  • Panels:

    • Left: search, filters, lens selector, saved queries
    • Right: details, explain trace, evidence tabs (Advisory/VEX/Policy/Provenance)
    • Bottom: query expression, diagnostics, performance/stream status
  • Diff mode: split or overlay, color legend (add/remove/changed), filter by node type.

  • Deep links: URL encodes query + viewport; shareable while respecting RBAC.

  • Keyboard: Space+drag to pan, +/- to zoom, F to focus, G to expand neighbors, P to show paths.
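A deep link has to round-trip state, not data. This minimal sketch assumes the query and viewport are serialized as canonical JSON and carried in a single base64url parameter named `s`. The parameter name, the `/graph` base URL, and the viewport fields are all hypothetical.

```python
from __future__ import annotations

import base64
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def encode_deep_link(base_url: str, query: dict, viewport: dict) -> str:
    """Pack query + viewport into one URL-safe token (canonical JSON -> base64url)."""
    state = {"q": query, "v": viewport}
    raw = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    token = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return f"{base_url}?{urlencode({'s': token})}"


def decode_deep_link(url: str) -> dict:
    """Recover the {"q": ..., "v": ...} state from a deep link."""
    token = parse_qs(urlsplit(url).query)["s"][0]
    padded = token + "=" * (-len(token) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))
```

Since the link carries only the query and viewport, RBAC gets re-evaluated when the recipient opens it. Sorted keys make identical views produce identical links.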

3.5 Backend architecture

Graph Indexer (new)

  • Consumes SBOM ingests, Conseiller advisories, Excitator VEX statements, Policy Engine determinations (read-only).

  • Projects facts into a property graph persisted in:

    • Primary: document store + adjacency sets (e.g., Mongo collections + compressed adjacency lists)
    • Optional driver for graph DB backends if needed (pluggable)
  • Maintains materialized aggregates: degree, critical paths cache, affected artifact counts, license distribution.

  • Emits graph snapshots per SBOM with lineage to original ingestion.
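The storage bullet mentions compressed adjacency lists. A common technique is to sort neighbor ids, keep only the gaps between consecutive ids, and encode each gap as a variable-length integer. This sketch assumes nodes are mapped internally to non-negative integer ids. That mapping is an assumption and is not part of the spec.

```python
from __future__ import annotations


def encode_adjacency(neighbor_ids: list[int]) -> bytes:
    """Deduplicate and sort neighbor ids, store the gaps, varint-encode each gap."""
    out = bytearray()
    previous = 0
    for node_id in sorted(set(neighbor_ids)):
        gap = node_id - previous
        previous = node_id
        while True:  # LEB128-style varint: 7 payload bits per byte
            byte = gap & 0x7F
            gap >>= 7
            if gap:
                out.append(byte | 0x80)  # high bit set: more bytes follow
            else:
                out.append(byte)
                break
    return bytes(out)


def decode_adjacency(blob: bytes) -> list[int]:
    """Inverse of encode_adjacency; returns ids in ascending order."""
    ids, current, gap, shift = [], 0, 0, 0
    for byte in blob:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            ids.append(current)
            gap, shift = 0, 0
    return ids
```

Take the neighbor ids 5, 1000000, 1000003, and 1000010. Their gaps are 5, 999995, 3, and 7, which encode to 6 bytes, compared with 32 bytes as fixed 64-bit integers. Clustered ids compress best.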

Graph API (new)

  • Endpoints for search, neighbor expansion, path queries, diffs, overlays, exports.
  • Streaming responses for large graphs (chunked NDJSON tiles).
  • Cost accounting + quotas per tenant.
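Streaming responses can be emitted while the BFS frontier advances. This sketch of NDJSON tiling assumes an in-memory adjacency map and a simple sequential cursor. `stream_tiles` and the `tile-N` cursor format are hypothetical. A real cursor would have to encode enough state to resume the frontier.

```python
from __future__ import annotations

import json
from collections import deque
from typing import Iterator


def stream_tiles(
    adjacency: dict[str, list[tuple[str, str]]],  # node -> [(edge_type, neighbor)]
    start: list[str],
    max_depth: int,
    tile_size: int,
) -> Iterator[str]:
    """Breadth-first expansion emitted as NDJSON tiles of at most `tile_size` nodes."""
    seen = set(start)
    frontier = deque((node, 0) for node in start)
    nodes, edges, tile_no = [], [], 0

    def flush(final: bool) -> str:
        nonlocal nodes, edges, tile_no
        tile = {
            "nodes": nodes,
            "edges": edges,
            "stats": {"nodes": len(nodes), "edges": len(edges)},
            "cursor": None if final else f"tile-{tile_no + 1}",
        }
        nodes, edges, tile_no = [], [], tile_no + 1
        return json.dumps(tile, sort_keys=True) + "\n"

    while frontier:
        node, depth = frontier.popleft()
        nodes.append(node)
        if depth < max_depth:  # nodes at max_depth are emitted but not expanded
            for edge_type, neighbor in adjacency.get(node, []):
                edges.append([node, edge_type, neighbor])
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, depth + 1))
        if len(nodes) == tile_size and frontier:
            yield flush(final=False)
    yield flush(final=True)  # last tile has cursor = None
```

An edge can point to a node that only arrives in a later tile. The client therefore renders incrementally and reconciles as tiles stream in.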

Workers

  • Centrality & clustering precompute on idle: betweenness approximations, connected components, Louvain clusters.
  • Diff compute on new SBOM ingestion pairs (previous vs current).
  • Overlay materialization cache for popular policy versions.
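Connected components are the cheapest of the listed precomputes. This sketch uses union-find with path halving. It ignores edge direction for clustering and assumes every edge endpoint appears in `nodes`. Betweenness and Louvain are not covered.

```python
from __future__ import annotations


def connected_components(nodes: list[str], edges: list[tuple[str, str]]) -> dict[str, str]:
    """Label each node with a component id using union-find (edge direction ignored)."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller so labels are deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a
    return {n: find(n) for n in nodes}
```

The larger root is always attached under the smaller one. As a result, each label is the smallest node key in its component, and recomputing gives the same labels whatever the edge order.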

Policy Engine integration

  • Graph API requests can specify a policy version.
  • For sampled nodes, the API fetches explain traces; for counts, uses precomputed overlay materializations where available.

AOC enforcement

  • Graph Indexer never merges or edits advisories/VEX; it links them and exposes overlays that the Policy Engine evaluates.
  • Conseiller and Excitator remain authoritative sources; severities come from Policy-governed normalization.
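Under the AOC rule, overlays are computed over linked facts without rewriting them. To illustrate the "affected with no applicable VEX" query, the sketch below runs a read-only join over frozen records. It assumes that only a `not_affected` statement matching the same advisory, product, and component suppresses a link. The authoritative outcome still comes from the Policy Engine, and both record types are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AffectedBy:  # an affected_by link, reached from an artifact via contains
    artifact: str
    package: str
    advisory: str


@dataclass(frozen=True)
class VexStatement:  # linked from Excitator, scoped to a product/component
    advisory: str
    product: str
    component: str
    status: str  # e.g. "not_affected", "affected", "fixed", "under_investigation"


def unsuppressed(links: tuple[AffectedBy, ...], vex: tuple[VexStatement, ...]) -> list[AffectedBy]:
    """Return links with no applicable not_affected VEX; inputs are frozen and never modified."""
    exempt = {(v.advisory, v.product, v.component) for v in vex if v.status == "not_affected"}
    return [link for link in links if (link.advisory, link.artifact, link.package) not in exempt]
```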

3.6 APIs (representative)

  • GET /graph/search?q=...&type=package|artifact|advisory|license
  • POST /graph/query ⇒ stream tiles {nodes[], edges[], stats, cursor}
  • POST /graph/paths body: {from, to, depth<=6, constraints{scope, runtime_only}}
  • POST /graph/diff body: {sbom_a, sbom_b, filters}
  • GET /graph/snapshots/{sbom_id} ⇒ graph metadata, counts, top advisories
  • POST /graph/export body: {format: graphml|csv|ndjson|png|svg, query|snapshot}
  • GET /graph/saved and POST /graph/saved ⇒ list and save tenant queries
  • GET /graph/overlays/policy/{version_id} ⇒ summary stats for caching

All endpoints are tenant-scoped and RBAC-checked. The server enforces timeouts and pagination. Errors return structured diagnostics.
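With its depth cap, /graph/paths can be served by breadth-first search, which yields a fewest-hop path the first time the target is dequeued. This sketch interprets `runtime_only` as following only `depends_on` edges with `scope = prod`. That interpretation and the in-memory adjacency shape are both assumptions.

```python
from __future__ import annotations

from collections import deque
from typing import Optional

MAX_DEPTH = 6
# node -> list of (edge_type, neighbor, scope); scope is only set on depends_on edges
Adjacency = dict[str, list[tuple[str, str, Optional[str]]]]


def shortest_path(graph: Adjacency, source: str, target: str, depth: int = MAX_DEPTH,
                  runtime_only: bool = False) -> Optional[list[str]]:
    """BFS: the first time `target` is dequeued, the parent chain is a fewest-hop path."""
    if depth > MAX_DEPTH:
        raise ValueError(f"depth must be <= {MAX_DEPTH}")
    previous: dict[str, Optional[str]] = {source: None}
    frontier = deque([(source, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parents back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        if hops == depth:
            continue  # depth budget exhausted on this branch
        for edge_type, neighbor, scope in graph.get(node, []):
            # Assumption: runtime_only skips dev/test/optional dependency edges.
            if runtime_only and edge_type == "depends_on" and scope != "prod":
                continue
            if neighbor not in previous:
                previous[neighbor] = node
                frontier.append((neighbor, hops + 1))
    return None
```

The endpoint also mentions all-paths queries. Those need bounded enumeration, which a single BFS parent chain cannot provide.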

3.7 CLI

stella sbom graph search "purl:pkg:maven/org.apache.logging.log4j/log4j-core"
stella sbom graph query --file ./query.json --export graphml > graph.graphml
stella sbom graph impacted --advisory GHSA-xxxx --runtime-only --limit 100
stella sbom graph paths --from artifact:service-a --to advisory:GHSA-xxxx --depth 5 --policy 1.3.0
stella sbom graph diff --sbom-a 2025-03-15T10:00Z --sbom-b 2025-03-22T10:00Z --export csv > diff.csv
stella sbom graph save --name "openssl-runtime" --file ./query.json

Exit codes: 0 ok, 2 query validation error, 3 over-budget, 4 not found, 5 RBAC denied.
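Here is one way a CLI could map Graph API responses onto the documented exit codes. The HTTP status assignments (for example 429 for over-budget) are assumptions, as is the fallback code 1 for unmapped failures. Only codes 0, 2, 3, 4, and 5 come from the spec above.

```python
from __future__ import annotations

from enum import IntEnum


class ExitCode(IntEnum):
    OK = 0
    QUERY_VALIDATION_ERROR = 2
    OVER_BUDGET = 3
    NOT_FOUND = 4
    RBAC_DENIED = 5


# Hypothetical mapping from Graph API HTTP statuses to the documented exit codes.
HTTP_TO_EXIT = {
    200: ExitCode.OK,
    400: ExitCode.QUERY_VALIDATION_ERROR,
    403: ExitCode.RBAC_DENIED,
    404: ExitCode.NOT_FOUND,
    429: ExitCode.OVER_BUDGET,
}


def exit_code_for(status: int) -> int:
    """Map a response status to a CLI exit code; unmapped failures fall back to 1."""
    return int(HTTP_TO_EXIT.get(status, 1))
```

A caller would end with `sys.exit(exit_code_for(status))`, so CI scripts can branch on `$?`.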

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

3.8 Performance & scale

  • Progressive loading: server pages tiles by BFS frontier; client renders incrementally.
  • Viewport culling: only visible nodes/edges in canvas; off-screen demoted to aggregates.
  • Levelofdetail: simplified glyphs and collapsed clusters at distance.
  • Query budgets: per-tenant rate + node/edge caps; interactive paths limited to depth ≤ 6.
  • Caching: hot queries memoized per tenant + overlay version; diffs precomputed for consecutive SBOMs.
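Memoizing per tenant and overlay version needs a stable query hash. The same hash can appear in the logs and form part of the `graph_overlays_cache` key. This sketch uses canonical JSON and SHA-256. `query_hash`, `overlay_cache_key`, and the in-process dict cache are illustrative assumptions.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any, Callable


def query_hash(query: dict) -> str:
    """Stable hash of a query: canonical JSON (sorted keys, no whitespace) -> SHA-256."""
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def overlay_cache_key(tenant: str, policy_version: str, query: dict) -> tuple[str, str, str]:
    """Key shaped like graph_overlays_cache: {tenant, policy_version, hash(query)}."""
    return (tenant, policy_version, query_hash(query))


_cache: dict[tuple[str, str, str], Any] = {}


def cached(tenant: str, policy_version: str, query: dict, compute: Callable[[], Any]) -> Any:
    """Return a memoized result, calling `compute` only on the first request for a key."""
    key = overlay_cache_key(tenant, policy_version, query)
    if key not in _cache:
        _cache[key] = compute()
    return _cache[key]
```

With this scheme the order of object keys no longer changes the hash, though array order still does. A server might sort set-like arrays, such as edge type lists, before hashing. Putting the tenant in the key prevents cross-tenant cache hits.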

3.9 Security

  • Multi-tenant isolation at storage and API layers.

  • RBAC roles:

    • Viewer: browse graphs, saved queries
    • Investigator: run queries, export data
    • Operator: configure budgets, purge caches
    • Auditor: download evidence bundles
  • Input validation for query JSON; deny disallowed edge traversals; strict CSP in web app.
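Below is a sketch of the RBAC check, with tenant isolation applied before any role lookup. The action names are hypothetical. The spec does not say whether roles are cumulative, so this sketch assumes an Investigator also holds the Viewer permissions.

```python
from __future__ import annotations

# Hypothetical permission matrix; action names are illustrative only.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "viewer": frozenset({"graph.browse", "graph.saved.read"}),
    "investigator": frozenset({"graph.browse", "graph.saved.read", "graph.query", "graph.export"}),
    "operator": frozenset({"graph.budgets.configure", "graph.cache.purge"}),
    "auditor": frozenset({"graph.evidence.download"}),
}


def is_allowed(roles: set[str], action: str, user_tenant: str, resource_tenant: str) -> bool:
    """Deny across tenants first, then grant if any held role carries the action."""
    if user_tenant != resource_tenant:
        return False
    # Unknown roles grant nothing.
    return any(action in ROLE_PERMISSIONS.get(role, frozenset()) for role in roles)
```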

3.10 Observability

  • Metrics: tile latency, nodes/edges per tile, cache hit rate, query denials, memory pressure.
  • Logs: structured, include query hash, cost, truncation flags.
  • Traces: server spans per stage (parse, plan, fetch, overlay, stream).

3.11 Accessibility & UX guarantees

  • Keyboard complete, ARIA roles for graph and panels, high-contrast theme.
  • Deterministic layout on reload for shareable investigations.
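A deterministic layout on reload requires initial positions that depend only on the seed and the node key. This sketch derives a per-node RNG seed from SHA-256 and avoids Python's built-in `hash()`, which is randomized per process for strings. `seeded_positions` is hypothetical, and a force-directed pass would refine the positions afterwards.

```python
from __future__ import annotations

import hashlib
import random


def seeded_positions(node_keys: list[str], seed: int, width: float = 1000.0,
                     height: float = 1000.0) -> dict[str, tuple[float, float]]:
    """Deterministic initial positions: one RNG per node, seeded from (seed, node key)."""
    positions = {}
    for key in sorted(node_keys):  # stable iteration order for later layout passes
        digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
        positions[key] = (rng.uniform(0, width), rng.uniform(0, height))
    return positions
```

With per-node seeding, existing nodes stay put when new ones are added, which keeps shared investigations recognizable.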

3.12 Data retention

  • Graph nodes derived from SBOMs share retention with SBOM artifacts; overlays are ephemeral caches.
  • Saved queries retained until deleted; references to missing objects show warnings.

4) Implementation plan

4.1 Services

  • Graph Indexer (new microservice)

    • Subscribes to SBOM ingest events, Conseiller advisory updates, Excitator VEX updates, Policy overlay materializations.
    • Builds adjacency lists and node documents; computes aggregates and clusters.
  • Graph API (new microservice)

    • Validates and executes queries; streams tiles; composes overlays; serves diffs and exports.
    • Integrates with Policy Engine for explain sample retrieval.
  • SBOM Service (existing)

    • Emits ingestion events with SBOM ids and lineage; exposes SBOM metadata to Graph API.
  • Web API Gateway

    • Routes /graph/*, injects tenant context, enforces RBAC.

4.2 Console (Web UI) feature module

  • packages/features/graph-explorer

    • Canvas renderer (WebGL), panels, query builder, diff mode, overlays, exports.
    • Deep-link router and viewport state serializer.

4.3 Workers

  • Centrality/clustering worker, diff worker, overlay materialization worker.
  • Schedules on low-traffic windows; backpressure-aware.

4.4 Data model (storage)

  • Collections:

    • graph_nodes: {_id, tenant, type, natural_id, attrs, degree, created_at, updated_at}
    • graph_edges: {_id, tenant, from_id, to_id, type, attrs, valid_from, valid_to}
    • graph_snapshots: per-SBOM node/edge references
    • graph_saved_queries: {_id, tenant, name, query_json, created_by}
    • graph_overlays_cache: keyed by {tenant, policy_version, hash(query)}
  • Indexes: compound on {tenant, type, natural_id}, {tenant, from_id}, {tenant, to_id}, time bounds.
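Here is a sketch of the MongoDB filter a time-travel neighbor lookup could send to `graph_edges`. It assumes `valid_to` is null for current edges, and the helper name is hypothetical. Equality on `tenant` and `from_id` lines up with the compound index, and the range predicates use the time-bound index.

```python
from __future__ import annotations

from datetime import datetime


def outgoing_edges_filter(tenant: str, from_id: str, at: datetime,
                          edge_types: list[str] | None = None) -> dict:
    """MongoDB filter for graph_edges leaving `from_id` that are valid at `at`."""
    flt: dict = {
        "tenant": tenant,
        "from_id": from_id,
        "valid_from": {"$lte": at},
        # {"valid_to": None} matches documents where valid_to is null or missing,
        # i.e. edges that are still current.
        "$or": [{"valid_to": None}, {"valid_to": {"$gt": at}}],
    }
    if edge_types:
        flt["type"] = {"$in": edge_types}
    return flt
```

With PyMongo, the supporting index could be created with `collection.create_index([("tenant", 1), ("from_id", 1), ("valid_from", 1)])`, and the filter passed to `collection.find(...)`.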


5) Documentation changes (create/update)

  1. /docs/sbom/graph-explorer-overview.md

    • Concepts, node/edge taxonomy, lenses, overlays, roles, limitations.
  2. /docs/sbom/graph-using-the-console.md

    • Walkthroughs: search, navigate, impact, diff, export; screenshots and keyboard cheatsheet.
  3. /docs/sbom/graph-query-language.md

    • JSON schema, examples, constraints, cost/budget rules.
  4. /docs/sbom/graph-api.md

    • REST endpoints, request/response examples, streaming and pagination.
  5. /docs/sbom/graph-cli.md

    • CLI command reference and example pipelines.
  6. /docs/policy/graph-overlays.md

    • How policy versions render in Graph; explain sampling; AOC guardrails.
  7. /docs/vex/graph-integration.md

    • How VEX suppressions appear and how to validate product scoping.
  8. /docs/advisories/graph-integration.md

    • Advisory linkage and severity normalization by policy.
  9. /docs/architecture/graph-services.md

    • Graph Indexer, Graph API, storage choices, failure modes.
  10. /docs/observability/graph-telemetry.md

    • Metrics, logs, tracing, dashboards.
  11. /docs/runbooks/graph-incidents.md

    • Handling runaway queries, cache poisoning, degraded render.
  12. /docs/security/graph-rbac.md

    • Permissions matrix, multi-tenant boundaries.

Every doc should end with a “Compliance checklist.” Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.


6) Tasks

6.1 Backend: Graph Indexer

  • Define node/edge schemas and attribute dictionaries for each type.
  • Implement event consumers for SBOM ingests, Conseiller updates, Excitator updates.
  • Build ingestion pipeline that populates nodes/edges and maintains valid_from/valid_to.
  • Implement aggregate counters and degree metrics.
  • Implement clustering job and persist cluster ids per node.
  • Implement snapshot materialization per SBOM and lineage tracking.
  • Unit tests for each node/edge builder; property-based tests for identity stability.

6.2 Backend: Graph API

  • Implement /graph/search with prefix and exact match across node types.
  • Implement /graph/query with validation, planning, cost estimation, and streaming tile results.
  • Implement /graph/paths with constraints and depth limits; shortest path heuristic.
  • Implement /graph/diff computing adds/removes/changed versions; stream results.
  • Implement overlays: advisory join, VEX join, policy materialization and explain sampling.
  • Implement exports: GraphML, CSV edge list, NDJSON findings, PNG/SVG snapshots.
  • RBAC middleware integration; multi-tenant scoping.
  • Load tests with synthetic large SBOMs; define default budgets.

6.3 Policy Engine integration

  • Add endpoint to fetch explain traces for specific node ids in batch.
  • Add materialization export that Graph API can cache per policy version.

6.4 Console (Web UI)

  • Create graph-explorer module with routes /graph, /graph/diff, /graph/q/:id.
  • Implement WebGL canvas with LOD, culling, edge bundling, deterministic layout seed.
  • Build search, filter, lens, and overlay toolbars.
  • Side panels: details, evidence tabs, explain viewer.
  • Diff mode: split/overlay toggles and color legend.
  • Saved queries: create, update, run; deep links.
  • Export UI: formats, server round-trip, progress indicators.
  • a11y audit and keyboard-only flow.

6.5 CLI

  • Implement stella sbom graph * subcommands with JSON IO and piping support.
  • Document examples and stable output schemas for CI consumption.

6.6 Observability & Ops

  • Dashboards for tile latency, query denials, cache hit rate, memory.
  • Alerting on query error spikes, OOM risk, cache churn.
  • Runbooks in /docs/runbooks/graph-incidents.md.

6.7 Docs

  • Author all docs in section 5, link from Console contextual help.
  • Add end-to-end tutorial: “Investigate GHSA-XXXX across prod artifacts.”

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.


7) Acceptance criteria

  • Console renders large SBOM graphs with semantic zoom, overlays, and responsive interactions.
  • Users can run impact and path queries with bounded depth and get results within budget.
  • VEX suppressions and advisory severities appear correctly and are consistent with policy.
  • Diff view clearly shows added/removed/changed nodes/edges between two SBOMs.
  • Saved queries and deep links reproduce the same view deterministically (given same data).
  • Exports produce valid GraphML/CSV/NDJSON and image snapshots.
  • CLI supports search, query, paths, impacted, diff, and export with stable schemas.
  • AOC guardrails: explorer never mutates facts; overlays reflect Policy Engine decisions.
  • RBAC enforced; all actions logged and observable.

8) Risks & mitigations

  • Graph explosion on large monorepos → tiling, clustering, budgets, and strict depth limits.
  • Inconsistent identities across tools → canonicalize purls/digests; property-based tests for identity stability.
  • Policy overlay latency → precompute materializations for hot policy versions; sample explains only on focus.
  • User confusion → strong lens defaults, deterministic layouts, legends, in-context help.

9) Test plan

  • Unit: node/edge builders, identity normalization, cost estimator.
  • Integration: ingest SBOM + advisories + VEX, verify overlays and counts.
  • E2E: Playwright flows for search→impact→diff→export; deep link determinism.
  • Performance: simulate 500k nodes/2M edges; measure tile latency and memory.
  • Security: RBAC matrix; tenant isolation tests; query validation fuzzing.
  • Determinism: snapshot round-trip: same query and seed produce identical layout and stats.

10) Feature flags

  • graph.explorer (UI feature module)
  • graph.paths (advanced path queries)
  • graph.diff (SBOM diff mode)
  • graph.overlays.policy (policy overlay + explain sampling)
  • graph.export (exports enabled)

Documented in /docs/observability/graph-telemetry.md.


11) Non-goals (this epic)

  • Real-time process/runtime call graphs.
  • Full substitution for text reports; Explorer complements Reports.
  • Cross-tenant graphs; all queries are tenant-scoped.

12) Philosophy

  • See the system: security and license risk are structural. If you cannot see structure, you will miss risk.
  • Evidence over assertion: every colored node corresponds to raw facts and explainable determinations.
  • Bounded interactivity: fast, partial answers beat slow “complete” ones.
  • Immutability: graphs mirror SBOM snapshots and are never rewritten; we add context, not edits.

Final reminder: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.