Some checks failed
api-governance / spectral-lint (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
oas-ci / oas-validate (push) Has been cancelled
SDK Publish & Sign / sdk-publish (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
Policy Simulation / policy-simulate (push) Has been cancelled
devportal-offline / build-offline (push) Has been cancelled
63 lines
6.5 KiB
Markdown
63 lines
6.5 KiB
Markdown
# StellaOps Graph
|
||
|
||
Graph Indexer + Graph API build the tenant-scoped knowledge graph that powers blast-radius analysis, provenance timelines, and saved-query automation across StellaOps. Cartographer has been retired as of 2025-10-30 (see `docs/updates/2025-10-30-devops-governance.md`); this module now owns ingestion, storage, overlays, and query surfaces for graph data.
|
||
|
||
## Scope & responsibilities
|
||
- Ingest SBOM snapshots, advisory/VEX events, policy overlays, and runtime signals to maintain a first-party graph representation with deterministic node/edge identities.
|
||
- Serve APIs and saved-query tooling for impact analysis, dependency traversal, diffing, and policy/VEX overlays with explainable provenance.
|
||
- Supply Graph Explorer UI/CLI experiences, plus Offline Kit exports (`nodes.jsonl`, `edges.jsonl`, `overlays/`) with DSSE manifests for air-gapped replay. Analytics overlays are emitted as NDJSON (`overlays/clusters.ndjson`, `overlays/centrality.ndjson`) with deterministic ordering; Mongo-backed providers support production wiring.
|
||
- Maintain the [Graph Index Canonical Schema](schema.md) and coordinate query/overlay lifecycle with Scheduler, Policy Engine, Vulnerability Explorer, and Export Center.
|
||
|
||
## Architecture snapshot (Sprint 30 groundwork)
|
||
- **Graph Indexer service** — consumes SBOM (`sbom_snapshot`), advisory, and VEX events; normalises identifiers; persists into `graph_nodes`, `graph_edges`, `graph_snapshots`, and overlay caches with tenant partitions.
|
||
- **Graph API service** — exposes `GET /graph/nodes`, `/graph/impact/{advisory}`, `/graph/query/saved`, `/graph/diff`, and overlay endpoints with RBAC scopes defined in Authority (`docs/updates/2025-10-26-authority-graph-scopes.md`).
|
||
- **Overlay & diff workers** — materialise impact lists, saved-query caches, and signed diff manifests; feed Scheduler `GraphBuildJob`/`GraphOverlayJob` contracts (`docs/updates/2025-10-26-scheduler-graph-jobs.md`).
|
||
- **Console & CLI integrations** — planned modules deliver WebGL explorer, timeline viz, and CLI `stella sbom graph ...` commands aligned with implementation plan phases.
|
||
- **Storage abstraction** — supports document + adjacency (Mongo) or pluggable graph engine; both paths enforce deterministic ordering and export manifests.
|
||
|
||
## Current workstreams (Q4 2025)
|
||
- `GRAPH-SVC-30-00x` (see `src/Graph/StellaOps.Graph.Indexer/TASKS.md`) — stand up Graph Indexer pipeline, identity registry, snapshot exports.
|
||
- Active sprint: `docs/implplan/SPRINT_0141_0001_0001_graph_indexer.md` (Runtime & Signals 140.A) — clustering/centrality jobs, incremental/backfill pipeline, determinism tests, packaging.
|
||
- `GRAPH-API-30-00x` — draft API planner/cost guard, streaming responses, and Authority scope integration.
|
||
- `DOCS-GRAPH-24-003` & related backlog — author overview/API/query language docs; update this README again once those deliverables land.
|
||
- Deployment/DevOps follow-ups (`DEVOPS-VEX-30-001`, `DEPLOY-VEX-30-001`) coordinate dashboards, load tests, and Helm/Compose overlays for the graph stack.
|
||
|
||
## Integrations & dependencies
|
||
- **SBOM Service** (Scanner WebService + Worker) produce `sbom_snapshot` events consumed by Graph Indexer.
|
||
- **Concelier/Excititor** contribute advisory + VEX edges; VEX Lens consensus overlays attach to graph nodes as attributes.
|
||
- **Policy Engine & Scheduler** trigger recompute jobs and consume overlays for risk/impact automation.
|
||
- **Vulnerability Explorer & Console** surface graph queries, saved views, and diff visualisations.
|
||
- **Authority** defines scopes (`graph.viewer`, `graph.operator`) and client registrations; secrets managed via existing platform patterns.
|
||
|
||
## Data, observability & offline
|
||
- Collections/tables: `graph_nodes`, `graph_edges`, `graph_snapshots`, `graph_saved_queries`, `graph_overlays_cache`, append-only change logs for replay.
|
||
- Metrics: `graph_ingest_lag_seconds`, `graph_nodes_total`, `graph_query_latency_seconds{queryId}`, overlay/diff duration counters.
|
||
- Logs/traces: structured ETL logs, query planner traces, WebGL interaction telemetry (once UI lands).
|
||
- Offline bundles: deterministic `nodes.jsonl`, `edges.jsonl`, overlay manifests + DSSE signatures, consumable by Export Center and CLI mirroring.
|
||
|
||
## Operations & runbook (Sprint 030)
|
||
- Dashboards: import `Observability/graph-api-grafana.json` (panels for latency, budget denials, overlay cache ratio, export latency). Apply tenant filter in every panel.
|
||
- Health checks: `/healthz` should be 200; search/query/paths/diff/export endpoints require `X-Stella-Tenant`, `Authorization`, and scopes (`graph:read/query/export`).
|
||
- Key metrics (new):
|
||
- `graph_tile_latency_seconds` histogram (label `route`); alert when p95 > 1.5s for 5m.
|
||
- `graph_query_budget_denied_total` counter (label `reason`); investigate spikes (>50 in 5m).
|
||
- `graph_overlay_cache_hits_total` / `graph_overlay_cache_misses_total`; watch miss ratio > 0.4 for 10m.
|
||
- `graph_export_latency_seconds` histogram (label `format`); alert when p95 > 2s for ndjson/graphml.
|
||
- Triage playbook:
|
||
- Budget denials: lower default edges/nodes budget or guide callers to request smaller scopes; verify overlay includes are truly required.
|
||
- Overlay cache misses: ensure cache TTL is ≥5m; check overlay service connectivity to Policy Engine; warm cache by replaying recent hot nodes.
|
||
- Export slowness: reduce export `Limit`, offload PNG/SVG to worker, and confirm disk I/O headroom.
|
||
- If alerts fire, capture tenant, route, cursor/budget values, and recent deploy SHA in incident note.
|
||
|
||
## Key docs & updates
|
||
- [`architecture.md`](architecture.md) — inputs, pipelines, APIs, storage choices, observability, offline handling.
|
||
- [`implementation_plan.md`](implementation_plan.md) — phased delivery roadmap, work breakdown, risks, test strategy.
|
||
- [`schema.md`](schema.md) — canonical node/edge schema and attribute dictionary (keep in sync with indexer code).
|
||
- API surface: `docs/api/graph-gateway-spec-draft.yaml` (NDJSON tiles for `/graph/search|query|paths|diff|export`, budgets, overlays).
|
||
- Updates: `docs/updates/2025-10-26-scheduler-graph-jobs.md`, `docs/updates/2025-10-26-authority-graph-scopes.md`, `docs/updates/2025-10-30-devops-governance.md` for the latest decisions/dependencies.
|
||
- Index: see `architecture-index.md` for data model, ingestion pipeline, overlays/caches, events, and API/observability pointers.
|
||
|
||
## Epic alignment
|
||
- **Epic 5 – SBOM Graph Explorer:** Graph Indexer, Graph API, saved queries, overlays, Console/CLI experiences, Offline Kit parity.
|
||
- Cross-epic ties: Policy reasoning (explain overlays), Scheduler recompute, Notify/Task Runner integration for graph incidents.
|