
Below is the expanded, “maximum documentation” package for Epic 6. It is paste-ready for your repo and deliberately formal so engineering, docs, and audit folks can work from the same source of truth.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.


Epic 6: Vulnerability Explorer (policy-aware)

Short name: Vuln Explorer
Services touched: Conseiller (Feedser), Excitator (Vexer), SBOM Service, Policy Engine, Findings Ledger (new), Web API Gateway, Authority (authN/Z), Workers/Scheduler, Telemetry/Analytics, Console (Web UI), CLI
AOC ground rule: Conseiller/Excitator aggregate but never merge or rewrite source documents. The Explorer only renders effective results as decided by Policy Engine and records human workflow as immutable ledger events.


1) What it is

The Vulnerability Explorer is the API and UI for operational triage, investigation, and reporting of vulnerabilities across all artifacts tracked by StellaOps. It correlates SBOM inventory with advisory and VEX evidence, then displays the effective status and obligations per the currently selected Policy version. It never edits source evidence. It provides:

  • Policy-aware lists, pivots, and detail views
  • Triage workflow with immutable audit ledger
  • Risk acceptance with expiry and evidence
  • SLA tracking by severity and business tier
  • Simulation against other policy versions
  • Exports and cryptographically signed “evidence bundles”

Identity principle: one Finding per tuple (artifact_id, purl, version, advisory_key), with links to every contributing AdvisoryEvidence and VexEvidence.


2) Why (concise)

  • Prioritization must reflect policy and VEX, not raw feeds.
  • Audit requires complete, reproducible lineage: what changed, why, and who decided.
  • Operators need consistent APIs, CLI, and a UI that explain determinations, not just list CVEs.

3) How it should work (maximum detail)

3.1 Domain model and contracts

Evidence (immutable)

  • AdvisoryEvidence (from Conseiller)

    {
      "id": "adv_evd:...uuid...",
      "tenant": "acme",
      "source": "ghsa|nvd|vendor|ossindex|... ",
      "source_id": "GHSA-xxxx",
      "schema": "GHSA|CVE|CSAF-ADV",
      "advisory_key": "CVE-2024-12345",
      "affected": [{"ecosystem":"npm","purl":"pkg:npm/lodash","ranges":[{"type":"semver","events":[{"introduced":"0.0.0"},{"fixed":"4.17.21"}]}]}],
      "cvss": {"version":"3.1","baseScore":7.5,"vectorString":"AV:N/AC:L/..."},
      "severity": "HIGH",
      "urls": ["https://..."],
      "published": "2024-06-10T12:00:00Z",
      "withdrawn": null,
      "ingested_at": "2024-06-11T08:43:21Z"
    }
    
  • VexEvidence (from Excitator)

    {
      "id": "vex_evd:...uuid...",
      "tenant": "acme",
      "source": "vendor-csaf",
      "schema": "CSAF-VEX",
      "advisory_key": "CVE-2024-12345",
      "product_scope": [{"purl":"pkg:npm/lodash@4.17.20"}],
      "status": "not_affected|affected|fixed|under_investigation",
      "justification": "component_not_present|vulnerable_code_not_in_execute_path|... ",
      "impact_statement": "Not reachable in Acme Payment Service.",
      "timestamp": "2024-06-12T10:00:00Z",
      "ingested_at": "2024-06-12T10:10:02Z"
    }
    
  • InventoryEvidence (from SBOM Service)

    {
      "id": "inv_evd:...uuid...",
      "tenant": "acme",
      "artifact_id": "svc:payments@1.14.0",
      "sbom_id": "sbom:sha256:...",
      "purl": "pkg:npm/lodash@4.17.20",
      "scope": "runtime|dev|test",
      "runtime_flag": true,
      "paths": [["root", "pkg:npm/a", "pkg:npm/lodash"]],
      "discovered_at": "2024-06-12T11:00:00Z"
    }
    

PolicyDetermination (read-only from Policy Engine)

{
  "key": {
    "artifact_id": "svc:payments@1.14.0",
    "purl": "pkg:npm/lodash",
    "version": "4.17.20",
    "advisory_key": "CVE-2024-12345",
    "policy_version": "1.3.0"
  },
  "applicable": true,
  "effective_severity": "HIGH",
  "exploitability": "ACTIVE|LIKELY|UNKNOWN|UNLIKELY",
  "signals": {"epss": 0.86, "kev": true, "maturity": "weaponized"},
  "suppression_state": "none|policy|vex",
  "obligations": [{"type":"fix_by","due":"2024-07-15"},{"type":"document_risk"}],
  "sla": {"due":"2024-07-15","tier":"gold"},
  "rationale": [
    {"rule":"sev.base=nvd","detail":"NVD base:7.5"},
    {"rule":"exploit.upsell.kev","detail":"KEV flag → HIGH"},
    {"rule":"env.weight.prod","detail":"Env=prod no downgrade"}
  ]
}

Finding identity

finding_id = hash(tenant, artifact_id, purl, version, advisory_key)
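The identity rule above can be sketched as follows. This is a minimal illustration, not the project's implementation: the choice of SHA-256, the NUL field separator, and the `f:` prefix on the hex digest are assumptions made for the example (the source only states that the identifier is a hash of the tuple).

```python
import hashlib


def finding_id(tenant: str, artifact_id: str, purl: str, version: str, advisory_key: str) -> str:
    """Derive a stable finding identifier from the identity tuple."""
    # Assumption: fields are joined with a NUL byte so that ("ab", "c")
    # and ("a", "bc") can never produce the same hash input.
    # The advisory key is trimmed and uppercased, matching the
    # canonicalization rule described for advisory keys.
    parts = [tenant, artifact_id, purl, version, advisory_key.strip().upper()]
    digest = hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()
    # Assumption: "f:" prefix mirrors the "f:...hash..." examples in this document.
    return f"f:{digest}"
```

Because the tenant is part of the hash input, the same package and advisory in two tenants always yield different finding IDs.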

Ledger event (append-only, tamper-evident)

{
  "event_id": "led:...uuid...",
  "finding_id": "f:...hash...",
  "tenant": "acme",
  "type": "assign|comment|attach|change_state|accept_risk|set_target_fix|verify_fix|reopen",
  "payload": {"to":"team:platform","reason":"oncall triage"},
  "actor": {"user_id":"u:42","display":"Dana S."},
  "ts": "2024-06-12T14:01:02Z",
  "prev_event_hash": "sha256:...",
  "event_hash": "sha256:sha256(canonical_json(event) + prev_event_hash)"
}
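A minimal sketch of the hash chaining implied by `prev_event_hash` and `event_hash`. The canonicalization used here (sorted keys, compact separators) is an assumption standing in for whatever canonical JSON scheme the ledger adopts, and the helper names `compute_event_hash`, `append_event`, and `verify_chain` are hypothetical.

```python
import hashlib
import json
from typing import List, Optional


def canonical_json(obj: dict) -> bytes:
    # Assumption: sorted keys and compact separators approximate a
    # canonical form; a production ledger would pin an exact scheme.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def compute_event_hash(event: dict, prev_hash: Optional[str]) -> str:
    # Hash the event body (without the two hash fields) followed by the
    # previous event's hash, as in sha256(canonical_json(event) + prev_event_hash).
    body = {k: v for k, v in event.items() if k not in ("event_hash", "prev_event_hash")}
    material = canonical_json(body) + (prev_hash or "").encode("utf-8")
    return "sha256:" + hashlib.sha256(material).hexdigest()


def append_event(chain: List[dict], event: dict) -> dict:
    """Seal an event against the current chain head and append it."""
    prev = chain[-1]["event_hash"] if chain else None
    sealed = dict(event, prev_event_hash=prev)
    sealed["event_hash"] = compute_event_hash(sealed, prev)
    chain.append(sealed)
    return sealed


def verify_chain(chain: List[dict]) -> bool:
    """Recompute every hash; any edited or reordered event breaks the chain."""
    prev = None
    for event in chain:
        if event["prev_event_hash"] != prev:
            return False
        if event["event_hash"] != compute_event_hash(event, prev):
            return False
        prev = event["event_hash"]
    return True
```

Editing the payload of any stored event changes its recomputed hash, so `verify_chain` fails from that event onward.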

Projection (materialized current state for fast lists)

{
  "finding_id":"f:...hash...",
  "tenant":"acme",
  "artifact_id":"svc:payments@1.14.0",
  "purl":"pkg:npm/lodash",
  "version":"4.17.20",
  "advisory_key":"CVE-2024-12345",
  "effective_severity":"HIGH",
  "exploitability":"ACTIVE",
  "suppression_state":"none",
  "status":"UNDER_INVESTIGATION|REMEDIATING|ACCEPTED|RESOLVED|NEW|SUPPRESSED",
  "sla_due":"2024-07-15",
  "owner":"team:platform",
  "kev":true,
  "epss":0.86,
  "new_since":"2024-06-12",
  "has_fix":true,
  "envs":["prod"],
  "runtime_flag":true,
  "updated_at":"2024-06-12T14:01:02Z"
}

Advisory key normalization

  • Input identifiers: CVE-*, GHSA-*, vendor IDs.
  • Preference order: CVE, then GHSA, else vendor id prefixed with namespace.
  • Canonicalization: uppercase, trim, map withdrawn to same key but mark withdrawn=true in evidence.
  • Conseiller must publish links: [all source ids] for provenance.
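The preference order above could be expressed roughly as below. This is a sketch under stated assumptions: the `namespace:ID` format for vendor identifiers, the default namespace value, and the tie-break (lexicographically first) when several CVE or GHSA identifiers are present are all illustrative choices not specified in the source.

```python
from typing import Dict, List


def canonical_advisory_key(source_ids: List[str], vendor_namespace: str = "vendor") -> Dict[str, object]:
    """Pick the canonical advisory key and keep every source id as a link."""
    normalized: List[str] = []
    for raw in source_ids:
        ident = raw.strip().upper()  # canonicalization: trim, uppercase
        if ident and ident not in normalized:
            normalized.append(ident)
    if not normalized:
        raise ValueError("at least one source identifier is required")

    cves = sorted(i for i in normalized if i.startswith("CVE-"))
    ghsas = sorted(i for i in normalized if i.startswith("GHSA-"))
    if cves:
        key = cves[0]  # assumption: first CVE in sorted order wins
    elif ghsas:
        key = ghsas[0]
    else:
        # Assumption: vendor ids are rendered as "<namespace>:<id>".
        key = f"{vendor_namespace}:{normalized[0]}"
    return {"advisory_key": key, "links": normalized}
```

The `links` list preserves all source identifiers in input order, so provenance survives even when the canonical key comes from a different source.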

3.2 Resolver algorithm (candidate findings)

Goal: produce tuples (artifact_id, purl, version, advisory_key) where inventory intersects affected ranges and policy deems the path relevant.

Pseudocode

for each artifact sbom S:
  inv = inventory(S)
  for each advisory evidence A:
    for each affected package spec in A.affected:
      for each inv_item in inv where inv_item.purl package == spec.purl package:
        if version_in_ranges(inv_item.version, spec.ranges, ecosystem):
          if policy.path_scope_allows(inv_item.scope, inv_item.runtime_flag, inv_item.paths):
            yield candidate (artifact_id, inv_item.purl, inv_item.version, A.advisory_key)
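The pseudocode can be made concrete for a single ecosystem as a rough sketch. It assumes plain numeric `MAJOR.MINOR.PATCH` versions (no pre-release or build metadata), at most one `introduced`/`fixed` pair per range, and purls without qualifiers or subpaths. The function names and the default scope rule (runtime only) are hypothetical stand-ins for the ecosystem comparators and for `policy.path_scope_allows`.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def parse_semver(version: str) -> Tuple[int, ...]:
    # Simplified: numeric dotted versions only.
    return tuple(int(part) for part in version.split("."))


def version_in_ranges(version: str, ranges: List[dict]) -> bool:
    v = parse_semver(version)
    for rng in ranges:
        introduced = fixed = None
        for event in rng["events"]:
            if "introduced" in event:
                introduced = parse_semver(event["introduced"])
            if "fixed" in event:
                fixed = parse_semver(event["fixed"])
        # Affected when introduced <= v < fixed (open-ended if no fix).
        if introduced is not None and v >= introduced and (fixed is None or v < fixed):
            return True
    return False


def split_purl(purl: str) -> Tuple[str, str]:
    # Assumption: "name@version" with no qualifiers or subpath.
    name, _, version = purl.partition("@")
    return name, version


def resolve(
    artifact_id: str,
    inventory: Iterable[dict],
    advisories: Iterable[dict],
    scope_allows: Callable[[dict], bool] = lambda item: bool(item.get("runtime_flag")),
) -> Iterator[Tuple[str, str, str, str]]:
    """Yield candidate tuples where inventory intersects affected ranges."""
    items = list(inventory)
    for advisory in advisories:
        for spec in advisory["affected"]:
            for item in items:
                name, version = split_purl(item["purl"])
                if name != spec["purl"]:
                    continue
                if version_in_ranges(version, spec["ranges"]) and scope_allows(item):
                    yield (artifact_id, name, version, advisory["advisory_key"])
```

With the AdvisoryEvidence and InventoryEvidence examples from §3.1, `lodash@4.17.20` falls inside the `[0.0.0, 4.17.21)` range and produces one candidate, while `4.17.21` produces none.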

Version semantics per ecosystem

  • npm: semver, pre-release excluded unless explicitly in range.
  • Maven: Maven version rules, handle -SNAPSHOT, use maven-resolver semantics.
  • PyPI: PEP 440 versioning.
  • Go: semver with +incompatible handling.
  • OS packages: RPM/DEB epoch:version-release ordering.

Edge cases

  • Multiple paths: store shortest path and count.
  • Dev/test scope: policy may exclude or downgrade.
  • Withdrawn advisories: keep as evidence; Policy can set severity to NONE.

3.3 VEX precedence and scoping

  • If any matching VEX says not_affected scoped to the artifact product/component per CSAF product tree, set suppression_state="vex" and applicable=false.
  • If VEX says fixed and inventory version is >= fixed version, mark as Resolved (verified) after SBOM recrawl confirms.
  • If VEX under_investigation, no suppression; may add a policy grace period obligation.
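The `not_affected` rule can be sketched as a small function. Scoping is reduced here to an exact purl match against `product_scope`, which is a simplification of the CSAF product-tree matching the text calls for; the function name `apply_vex` is hypothetical, and the `fixed` and `under_investigation` branches are intentionally left out because they depend on SBOM recrawl and policy.

```python
from typing import List


def apply_vex(determination: dict, vex_statements: List[dict], inventory_purl: str, advisory_key: str) -> dict:
    """Return a copy of the determination with VEX suppression applied."""
    result = dict(determination)
    for vex in vex_statements:
        if vex["advisory_key"] != advisory_key:
            continue
        # Assumption: exact purl equality stands in for product-tree scoping.
        in_scope = any(p["purl"] == inventory_purl for p in vex["product_scope"])
        if in_scope and vex["status"] == "not_affected":
            result["suppression_state"] = "vex"
            result["applicable"] = False
            break
    return result
```

The input determination is never mutated, which keeps the function safe to call from simulation paths that must have no side effects.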

3.4 Policy evaluation

  • Inputs: candidate tuple + context (artifact env, business tier, ownership, signals, fix availability).
  • Determinations: applicability, effective severity, exploitability, obligations, SLA, suppression.
  • Suppression by policy examples: test-scope only; path through optional deps; package vendored but not linked at runtime.
  • Simulation: identical input, alternate policy_version; returns determinations without side effects.

3.5 API surface (authoritative)

List

GET /vuln/findings?policy=1.3.0&sev=high,critical&group_by=artifact&exploit=kev&env=prod&page=1&page_size=100

Response

{
  "page": 1,
  "page_size": 100,
  "total": 740,
  "group_by": "artifact",
  "results": [
    {"group":"svc:payments","counts":{"CRITICAL":3,"HIGH":12,"MEDIUM":8},"sla_breaches":2},
    ...
  ]
}

Query (complex filters)

POST /vuln/findings/query
{
  "policy": "1.3.0",
  "filter": {
    "severity": [">=MEDIUM"],
    "exploit": ["kev", "epss>=0.8"],
    "artifact": ["svc:payments", "svc:checkouts"],
    "status": ["NEW","UNDER_INVESTIGATION"],
    "env": ["prod"]
  },
  "sort": [{"field":"effective_severity","dir":"desc"},{"field":"epss","dir":"desc"}],
  "page": {"number":1,"size":200}
}
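To illustrate how such a filter might be evaluated against a projection row, here is a hedged sketch. The semantics are assumptions, not the documented query language: values within one field are OR-ed, fields are AND-ed, severities are ordered NONE < LOW < MEDIUM < HIGH < CRITICAL, and an `artifact` filter matches the part of `artifact_id` before `@`.

```python
from typing import List

SEVERITY_ORDER = ["NONE", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def severity_matches(severity: str, expressions: List[str]) -> bool:
    rank = SEVERITY_ORDER.index(severity)
    for expr in expressions:
        if expr.startswith(">="):
            if rank >= SEVERITY_ORDER.index(expr[2:]):
                return True
        elif severity == expr:
            return True
    return False


def exploit_matches(finding: dict, expressions: List[str]) -> bool:
    for expr in expressions:
        if expr == "kev" and finding.get("kev"):
            return True
        if expr.startswith("epss>="):
            threshold = float(expr[len("epss>="):])
            if (finding.get("epss") or 0.0) >= threshold:
                return True
    return False


def matches(finding: dict, flt: dict) -> bool:
    """AND across fields, OR within a field (assumed semantics)."""
    if "severity" in flt and not severity_matches(finding["effective_severity"], flt["severity"]):
        return False
    if "exploit" in flt and not exploit_matches(finding, flt["exploit"]):
        return False
    if "artifact" in flt and finding["artifact_id"].split("@")[0] not in flt["artifact"]:
        return False
    if "status" in flt and finding["status"] not in flt["status"]:
        return False
    if "env" in flt and not set(finding.get("envs", [])) & set(flt["env"]):
        return False
    return True
```

A real service would translate the filter into an indexed SQL query rather than filter in memory; the sketch only pins down the matching semantics.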

Detail

GET /vuln/findings/{finding_id}?policy=1.3.0

Returns projection, evidence links, policy rationale, paths, history summary.

Workflow

POST /vuln/findings/{id}/assign   { "to": "team:platform" }
POST /vuln/findings/{id}/comment  { "text": "triage notes..." }
POST /vuln/findings/{id}/accept-risk { "until":"2025-06-30","reason":"vendor patch pending","evidence":["url|upload_id"] }
POST /vuln/findings/{id}/verify-fix { "sbom_id": "sbom:sha256:..." }
POST /vuln/findings/{id}/target-fix { "version": "4.17.21" }

Simulation

POST /vuln/simulate
{
  "policy_from": "1.3.0",
  "policy_to": "1.4.0",
  "query": { "severity":[">=MEDIUM"], "env":["prod"] }
}

Response includes per-finding delta {before, after, diff}.
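One way to picture the delta is the sketch below, which compares two sets of determinations keyed by `finding_id`. The exact shape of `diff` (a map of changed fields to `from`/`to` values) and the function name are assumptions; the source only names the three parts `before`, `after`, and `diff`.

```python
from typing import Dict, List, Optional


def simulate_delta(before: Dict[str, dict], after: Dict[str, dict]) -> List[dict]:
    """Compare determinations from two policy versions without side effects."""
    deltas = []
    for fid in sorted(set(before) | set(after)):
        b: Optional[dict] = before.get(fid)
        a: Optional[dict] = after.get(fid)
        if b == a:
            continue  # unchanged findings are omitted from the delta
        b_fields, a_fields = b or {}, a or {}
        diff = {
            key: {"from": b_fields.get(key), "to": a_fields.get(key)}
            for key in sorted(set(b_fields) | set(a_fields))
            if b_fields.get(key) != a_fields.get(key)
        }
        deltas.append({"finding_id": fid, "before": b, "after": a, "diff": diff})
    return deltas
```

A finding that exists under only one policy version appears with `before` or `after` set to `None`, which is how newly applicable or newly suppressed findings would surface in the simulation bar.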

Export

POST /vuln/export { "format":"ndjson","scope":{"query":{...}} }

Returns a signed bundle (see §3.10).

Errors

  • 400 validation, 403 RBAC, 404 not found, 409 state conflict (idempotency), 429 rate limited, 5xx server.

3.6 Console (Web UI)

Routes

  • /vuln list with saved views
  • /vuln/:id detail drawer state
  • /vuln/simulate/:policyVersion diff mode

State shape (client)

interface VulnListState {
  policyVersion: string;
  filters: {...};
  sort: [...];
  columns: string[];
  viewId?: string;
  page: {number: number; size: number};
}

UX

  • Virtualized grid with server paging; column chooser; density toggle.
  • Quick filters: severity, exploit signals, status, env, owner, fix availability.
  • Detail tabs: Summary, Evidence (raw docs with provenance), Policy (rationale chain), Paths (deep link to Graph Explorer), Fixes, History.
  • Simulation bar shows delta chips: +21 HIGH, -9 Suppressed by VEX etc.
  • Evidence bundle dialog previews scope and size.
  • a11y: ARIA roles on grid, keyboard shortcuts: A assign, C comment, R accept risk, V verify fix.

3.7 CLI

Commands

stella vuln list --policy 1.3.0 --sev high,critical --group-by artifact --env prod --json
stella vuln show --id <finding-id> --policy 1.3.0
stella vuln simulate --from 1.3.0 --to 1.4.0 --sev '>=medium' --delta --json
stella vuln assign --filter 'advisory:CVE-2024-12345 artifact:payments' --to team:platform
stella vuln accept-risk --id <finding-id> --until 2025-06-30 --reason "vendor patch pending" --evidence url:https://ticket/123
stella vuln verify-fix --id <finding-id> --sbom <sbom-id>

Return codes: 0 ok, 2 invalid args, 3 budget exceeded, 4 not found, 5 denied.

3.8 Storage schema (illustrative)

Tables

-- Evidence
CREATE TABLE evidence_advisory (...);
CREATE INDEX ea_tenant_key ON evidence_advisory(tenant, advisory_key);
CREATE TABLE evidence_vex (...);
CREATE INDEX ev_tenant_key ON evidence_vex(tenant, advisory_key);
CREATE TABLE evidence_inventory (...);
CREATE INDEX ei_artifact_purl ON evidence_inventory(tenant, artifact_id, purl);

-- Ledger
CREATE TABLE findings_ledger_events (
  event_id uuid PRIMARY KEY,
  finding_id bytea NOT NULL,
  tenant text NOT NULL,
  type text NOT NULL,
  payload jsonb NOT NULL,
  actor jsonb NOT NULL,
  ts timestamptz NOT NULL,
  prev_event_hash bytea,
  event_hash bytea NOT NULL
);
CREATE INDEX fle_find_ts ON findings_ledger_events(tenant, finding_id, ts);

-- Projection
CREATE TABLE findings_projection (
  finding_id bytea PRIMARY KEY,
  tenant text NOT NULL,
  artifact_id text NOT NULL,
  purl text NOT NULL,
  version text NOT NULL,
  advisory_key text NOT NULL,
  policy_version text NOT NULL,
  effective_severity text NOT NULL,
  exploitability text,
  suppression_state text,
  status text NOT NULL,
  sla_due date,
  owner text,
  kev boolean,
  epss double precision,
  envs text[],
  runtime_flag boolean,
  updated_at timestamptz NOT NULL
);
CREATE INDEX fp_query ON findings_projection(tenant, policy_version, effective_severity, status);

Tamper-evidence

  • Ledger events use chained SHA-256 hashes over canonical JSON + previous hash.
  • Daily Merkle root of all event hashes is anchored to the audit store (and optionally external timestamping service).
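A daily Merkle root over event hashes could be computed along these lines. The tree construction is an assumption: leaves are the raw event hashes in order, an odd node is paired with itself, and an empty day hashes the empty byte string. The source does not specify any of these details.

```python
import hashlib
from typing import List


def merkle_root(leaf_hashes: List[str]) -> str:
    """Compute a Merkle root from hex-encoded SHA-256 event hashes."""
    if not leaf_hashes:
        # Assumption: an empty day anchors the hash of the empty string.
        return hashlib.sha256(b"").hexdigest()
    level = [bytes.fromhex(h) for h in leaf_hashes]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: duplicate the last node to complete the pair.
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

Anchoring only the root keeps the audit store small while still letting an auditor detect any inserted, removed, or reordered event for that day.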

3.9 Performance and scaling

  • P95 list endpoint under 600 ms for 100-row pages at 5M findings/tenant.
  • Projections denormalize heavy joins; background projector uses idempotent jobs keyed by (tenant,finding_id,policy_version).
  • Rate limits per tenant and per API key; backpressure on export jobs; exponential retry for projector.

3.10 Evidence bundle format

  • Container: ZIP with manifest.json, findings.ndjson, advisory_evidence.ndjson, vex_evidence.ndjson, inventory_evidence.ndjson, policy_version.json, ledger_events.ndjson, CHECKSUMS.

  • Signing: Detached signature bundle.sig using tenant's org key (Ed25519).

  • Manifest

    {"generated_at":"2024-06-12T15:00:00Z","tenant":"acme","policy_version":"1.3.0","scope":{"query":{...}},"counts":{"findings":421}}
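Bundle assembly might look like the sketch below, which builds the ZIP in memory with a `CHECKSUMS` entry. The `CHECKSUMS` line format (`<sha256>  <name>`, as produced by `sha256sum`) and the function name are assumptions. Signing is deliberately left out: Ed25519 is not in the Python standard library, so producing `bundle.sig` would need an external library or service.

```python
import hashlib
import io
import json
import zipfile
from typing import Dict, List


def build_bundle(manifest: dict, ndjson_files: Dict[str, List[dict]]) -> bytes:
    """Assemble manifest, NDJSON files, and CHECKSUMS into a ZIP archive."""
    entries: Dict[str, bytes] = {
        "manifest.json": json.dumps(manifest, sort_keys=True).encode("utf-8"),
    }
    for name, records in ndjson_files.items():
        # NDJSON: one JSON document per line.
        lines = "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
        entries[name] = lines.encode("utf-8")

    # Assumption: sha256sum-style lines, sorted by file name for stable output.
    checksums = "".join(
        f"{hashlib.sha256(data).hexdigest()}  {name}\n"
        for name, data in sorted(entries.items())
    )

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in sorted(entries.items()):
            archive.writestr(name, data)
        archive.writestr("CHECKSUMS", checksums)
    return buffer.getvalue()
```

A detached signature would then be computed over the returned bytes, so verifying `bundle.sig` and then `CHECKSUMS` covers both the archive as a whole and each file inside it.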
    

3.11 Observability

  • Metrics (OpenTelemetry):

    • vuln_findings_list_latency_ms (histogram)
    • vuln_projection_lag_seconds (gauge)
    • vuln_new_findings_total (counter)
    • vuln_sla_breaches_total (counter by sev, owner)
    • vuln_simulation_latency_ms (histogram)
  • Logs: structured JSON with tenant, policy_version, query_hash, result_count.

  • Traces: spans for resolver, policy calls, projection builds, export assembly.

  • PII: redact comments in logs; store attachments encrypted at rest (KMS).

3.12 Security & RBAC

Roles

  • Viewer: GET list/detail/export read scope.
  • Investigator: Viewer + workflow actions except risk acceptance.
  • Operator: Investigator + risk acceptance, verify fix, bulk actions.
  • Auditor: Viewer + evidence bundles and ledger integrity checks.

ABAC

  • Attribute constraints: by artifact.owner, env, and business_tier.
  • CSRF protection for Console; all POST require anti-forgery tokens.
  • Attachments stored with envelope encryption; signed URLs for limited time access.
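The role hierarchy and attribute constraints can be sketched as a simple check. The action names (`assign`, `accept_risk`, `ledger_verify`, and so on) and the mapping of specific actions to roles are illustrative assumptions derived from the role descriptions above, not a defined permission catalogue.

```python
from typing import Dict, Set

# Assumption: hypothetical action names grouped per the role descriptions.
VIEWER = frozenset({"list", "detail", "export_read"})
INVESTIGATOR = VIEWER | {"assign", "comment", "target_fix"}
OPERATOR = INVESTIGATOR | {"accept_risk", "verify_fix", "bulk_action"}
AUDITOR = VIEWER | {"evidence_bundle", "ledger_verify"}

ROLE_ACTIONS = {
    "Viewer": VIEWER,
    "Investigator": INVESTIGATOR,
    "Operator": OPERATOR,
    "Auditor": AUDITOR,
}


def is_allowed(role: str, action: str, constraints: Dict[str, Set[str]], resource_attrs: Dict[str, Set[str]]) -> bool:
    """RBAC first, then ABAC: every constraint must overlap the resource."""
    if action not in ROLE_ACTIONS.get(role, frozenset()):
        return False
    for attribute, allowed_values in constraints.items():
        # e.g. attribute = "owner", "env", or "business_tier"
        values = resource_attrs.get(attribute, set())
        if not set(values) & set(allowed_values):
            return False
    return True
```

An empty `constraints` map means the caller is unrestricted by attributes, so only the role check applies.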

3.13 Rollout and migrations

  • Feature flags: vuln.explorer.ui, vuln.explorer.simulation, vuln.explorer.bulk_actions, vuln.explorer.evidence_bundle.
  • Phase 1: dark launch API and projections.
  • Phase 2: UI read-only list and detail.
  • Phase 3: workflow actions and exports.
  • Data backfill: replay advisory/VEX/SBOM events to seed projections.
  • Compatibility: maintain projection v1 schema for two releases; migration scripts in /migrations/vuln/.

4) Implementation plan

4.1 Services

  • Findings Ledger (new)

    • Append-only event store with projector to findings_projection.
    • Event validation and canonicalization; hashing and Merkle root anchoring.
  • Vuln Explorer API (new)

    • Query/filter engine with policy parameterization and grouping.
    • Simulation endpoint.
    • Export job orchestrator.
  • Conseiller / Excitator (updates)

    • Guarantee canonical advisory_key and publish links[].
    • No merges; maintain raw payload snapshots.
  • Policy Engine (updates)

    • Batch evaluation endpoint POST /policy/eval/batch with simulate support.
    • Return rationale chain with rule IDs.
  • SBOM Service (updates)

    • Publish inventory deltas; include scope, runtime_flag, paths.
    • Nearest safe version hints.
  • Workers/Scheduler

    • Resolver job keyed by (tenant, artifact_id, sbom_id); emits candidate tuples.
    • Recompute on policy activation and evidence changes.

4.2 Code structure

/src/StellaOps.Findings.Ledger
  /api
  /projector
  /storage
/src/StellaOps.VulnExplorer.Api
  /routes
  /query
  /simulation
  /export
/packages/console/features/vuln-explorer
  /components
  /pages
  /state
/src/StellaOps.Cli

4.3 Performance tasks

  • Projection indexes and covering queries; explain plans in /docs/vuln/perf-notes.md.
  • Cache hot groupings per tenant with TTL and invalidation on ledger projector tick.

5) Documentation changes (create/update)

  1. /docs/vuln/explorer-overview.md: Conceptual model, identities, evidence vs determinations, AOC guarantees.
  2. /docs/vuln/explorer-using-console.md: Workflows with screenshots, keyboard shortcuts, saved views, deep links.
  3. /docs/vuln/explorer-api.md: Endpoint specs, query language, grouping, pagination, errors, rate limits.
  4. /docs/vuln/explorer-cli.md: Commands, flags, examples, exit codes.
  5. /docs/vuln/findings-ledger.md: Event schema, state machine, hashing, Merkle roots, integrity checks.
  6. /docs/policy/vuln-determinations.md: Inputs, outputs, precedence rules, simulation semantics.
  7. /docs/vex/explorer-integration.md: CSAF mapping, scoping to product tree, precedence.
  8. /docs/advisories/explorer-integration.md: Advisory key normalization, provenance, withdrawn handling.
  9. /docs/sbom/vuln-resolution.md: Ecosystem version semantics, path sensitivity, scope rules.
  10. /docs/observability/vuln-telemetry.md: Metrics, logs, traces, dashboards, SLOs.
  11. /docs/security/vuln-rbac.md: Role mapping, ABAC, attachment encryption, CSRF.
  12. /docs/runbooks/vuln-ops.md: Recompute storms, projector lag, policy activation drains, export failures.
  13. /docs/install/containers.md: Add findings-ledger, vuln-explorer-api images, compose/k8s manifests, resource sizing, health checks.

Each doc ends with: Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.


6) Engineering tasks

Backend: Findings & API

  • Define evidence and ledger schemas; migrations.
  • Implement resolver for npm, Maven, PyPI, Go, OS packages with property-based tests for version comparisons.
  • Implement ledger API and projector with idempotency and hashing.
  • Implement list/detail/grouping endpoints with server-side paging.
  • Implement simulation and export (bundle assembly, signing).
  • Integrate Policy Engine batch eval with rationale traces.
  • RBAC via Authority with ABAC filters.
  • Load tests at 5M findings/tenant; tune indexes.

Conseiller/Excitator

  • Normalize advisory_key and persist links[].
  • Ensure raw payload snapshots are retrievable by Explorer for Evidence tab.

SBOM Service

  • Emit scope, runtime_flag, paths; safe version hints.
  • Inventory delta events to trigger resolver.

Console

  • Build grid with virtualization, saved views, deep link serializer.
  • Implement detail tabs and path deep links to Graph Explorer.
  • Add simulation bar and delta chips.
  • Evidence bundle dialog.
  • a11y keyboard flow and ARIA labeling; unit and E2E tests.

CLI

  • stella vuln list|show|simulate|assign|accept-risk|verify-fix with --json and CSV export.
  • Stable output schemas; pipe-friendly defaults.

Observability/Ops

  • Dashboards for list latency, projection lag, new/reopened, SLA breaches.
  • Alerts on projector backlog, API 5xx spikes, export failures.
  • Runbooks in /docs/runbooks/vuln-ops.md.

Docs

  • Author files listed in §5 with cross-links to Policy Studio and SBOM Graph Explorer.
  • Update /docs/install/containers.md with new images and compose/k8s snippets.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.


7) Acceptance criteria

  • List and detail views reflect effective policy outcomes and update instantly when switching policy versions.
  • Evidence tab shows all raw advisory/VEX documents with provenance; no source merging.
  • Resolver respects ecosystem semantics, scope, and paths; path tab round-trips to Graph Explorer.
  • Ledger events are immutable and reconstruct historical list states accurately.
  • Simulation returns diffs without side effects and matches Policy Engine outputs.
  • CLI/API support paging, grouping, export, simulation; contracts stable and documented.
  • RBAC and tenant isolation validated by tests; attachments encrypted.
  • P95 performance budgets met; dashboards green for SLOs.

8) Risks and mitigations

  • Advisory identity collisions → strict canonicalization; preserve links[]; never merge raw docs.
  • Projection lag → backpressure, worker autoscaling, health checks; alerting on lag.
  • Resolver false positives → path evidence required; dev/test scope rules explicit; ecosystem-specific tests.
  • User confusion over suppression → explicit badges; Policy tab with rationale and “why changed.”
  • Export size → NDJSON streaming, size estimator in UI, scope previews.

9) Test plan

  • Unit: version comparators, resolver per ecosystem, policy mapping, ledger state machine.
  • Integration: SBOM + advisories + VEX ingestion, candidate generation, policy application, suppression precedence.
  • E2E Console: triage, bulk assign, simulation, evidence bundle download; keyboard-only flow.
  • Performance: list/grouping at target scale; projector rebuild; export assembly.
  • Security: RBAC matrix, ABAC filters, CSRF, signed URL lifetimes, tamper-evidence verification.
  • Determinism: time-travel snapshots reproduce prior states byte-for-byte.

10) Philosophy

  • Facts first, decisions second. Evidence is immutable; decisions and workflow sit on top in a ledger.
  • Policy is the lens. The same facts can imply different obligations; the system must make that explicit and reproducible.
  • Audit > convenience. Every state change is justified, signed, and verifiable.
  • No hidden magic. If anything is suppressed, the UI shows the rule or VEX that did it, with documents attached.

Final reminder: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.