feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules
- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
		
							
								
								
									
docs/modules/excititor/AGENTS.md (22 lines, Normal file)
							| @@ -0,0 +1,22 @@ | ||||
| # Excititor agent guide | ||||
|  | ||||
| ## Mission | ||||
| Excititor converts heterogeneous VEX feeds into raw observations and linksets that honour the Aggregation-Only Contract. | ||||
|  | ||||
| ## Key docs | ||||
| - [Module README](./README.md) | ||||
| - [Architecture](./architecture.md) | ||||
| - [Implementation plan](./implementation_plan.md) | ||||
| - [Task board](./TASKS.md) | ||||
|  | ||||
| ## How to get started | ||||
| 1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module. | ||||
| 2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED). | ||||
| 3. Read the architecture and README for domain context before editing code or docs. | ||||
| 4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan. | ||||
|  | ||||
| ## Guardrails | ||||
| - Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md). | ||||
| - Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts. | ||||
| - Keep Offline Kit parity in mind—document air-gapped workflows for any new feature. | ||||
| - Update runbooks/observability assets when operational characteristics change. | ||||
							
								
								
									
docs/modules/excititor/README.md (33 lines, Normal file)
							| @@ -0,0 +1,33 @@ | ||||
| # StellaOps Excititor | ||||
|  | ||||
| Excititor converts heterogeneous VEX feeds into raw observations and linksets that honour the Aggregation-Only Contract. | ||||
|  | ||||
| ## Responsibilities | ||||
| - Fetch OpenVEX/CSAF/CycloneDX statements via restart-only connectors. | ||||
| - Store immutable VEX observations with full provenance. | ||||
| - Publish linksets and events that drive policy suppression decisions. | ||||
| - Provide deterministic exports for Offline Kit and downstream tooling. | ||||
|  | ||||
| ## Key components | ||||
| - `StellaOps.Excititor.WebService` scheduler/API host. | ||||
| - Connector libraries under `StellaOps.Excititor.Connector.*`. | ||||
| - Normalization helpers and exporters in `StellaOps.Excititor.*`. | ||||
|  | ||||
| ## Integrations & dependencies | ||||
| - Policy Engine for evidence queries. | ||||
| - UI/CLI for conflict visibility and explanation. | ||||
| - Notify for VEX-driven alerts. | ||||
|  | ||||
| ## Operational notes | ||||
| - MongoDB for observation storage and job metadata. | ||||
| - Offline kit packaging aligned with Concelier merges. | ||||
| - Connector-specific runbooks (see `docs/modules/concelier/operations/connectors`). | ||||
|  | ||||
| ## Backlog references | ||||
| - DOCS-LNM-22-006 / DOCS-LNM-22-007 (shared with Concelier). | ||||
| - CLI-EXC-25-001..002 follow-up for CLI parity. | ||||
|  | ||||
| ## Epic alignment | ||||
| - **Epic 1 – AOC enforcement:** maintain immutable VEX observations, provenance, and AOC verifier coverage. | ||||
| - **Epic 7 – VEX Consensus Lens:** supply trustworthy raw inputs, trust metadata, and consensus hooks for the lens computations. | ||||
| - **Epic 8 – Advisory AI:** expose citation-ready VEX payloads for the advisory assistant pipeline. | ||||
							
								
								
									
docs/modules/excititor/TASKS.md (9 lines, Normal file)
							| @@ -0,0 +1,9 @@ | ||||
| # Task board — Excititor | ||||
|  | ||||
| > Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable. | ||||
|  | ||||
| | ID | Status | Owner(s) | Description | Notes | | ||||
| |----|--------|----------|-------------|-------| | ||||
| | EXCITITOR-DOCS-0001 | TODO | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md | | ||||
| | EXCITITOR-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md | | ||||
| | EXCITITOR-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow | | ||||
							
								
								
									
docs/modules/excititor/architecture.md (749 lines, Normal file)
							| @@ -0,0 +1,749 @@ | ||||
| # component_architecture_excititor.md — **Stella Ops Excititor** (Sprint 22) | ||||
|  | ||||
| > Consolidates the VEX ingestion guardrails from Epic 1 with consensus and AI-facing requirements from Epics 7 and 8. This is the authoritative architecture record for Excititor. | ||||
|  | ||||
| > **Scope.** This document specifies the **Excititor** service: its purpose, trust model, data structures, observation/linkset pipelines, APIs, plug-in contracts, storage schema, performance budgets, testing matrix, and how it integrates with Concelier, Policy Engine, and evidence surfaces. It is implementation-ready. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 0) Mission & role in the platform | ||||
|  | ||||
| **Mission.** Convert heterogeneous **VEX** statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into immutable **VEX observations**, correlate them into **linksets** that retain provenance/conflicts without precedence, and publish deterministic evidence exports and events that Policy Engine, Console, and CLI use to suppress or explain findings. | ||||
|  | ||||
| **Boundaries.** | ||||
|  | ||||
| * Excititor **does not** decide PASS/FAIL. It supplies **evidence** (statuses + justifications + provenance weights). | ||||
| * Excititor preserves **conflicting observations** unchanged; consensus (when enabled) merely annotates how policy might choose, but raw evidence remains exportable. | ||||
| * VEX consumption is **backend-only**: Scanner never applies VEX. The backend’s **Policy Engine** asks Excititor for status evidence and then decides what to show. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) Aggregation guardrails (AOC baseline) | ||||
|  | ||||
| Excititor enforces the same ingestion covenant as Concelier, tailored to VEX payloads: | ||||
|  | ||||
| 1. **Immutable `vex_raw` documents.** Upstream OpenVEX/CSAF/CycloneDX files are stored verbatim (`content.raw`) with provenance (`issuer`, `statement_id`, timestamps, signatures). Revisions append new versions linked by `supersedes`. | ||||
| 2. **No derived consensus at ingest time.** Fields such as `effective_status`, `merged_state`, `severity`, or reachability are forbidden. Roslyn analyzers and runtime guards block violations before writes. | ||||
| 3. **Linkset-only joins.** Product aliases, CVE keys, SBOM hints, and references live under `linkset`; ingestion must never mutate the underlying statement. | ||||
| 4. **Deterministic canonicalisation.** Writers sort JSON keys/arrays, normalize timestamps (UTC ISO‑8601), and hash content for reproducible exports. | ||||
| 5. **AOC verifier.** `StellaOps.AOC.Verifier` runs in CI and production, checking schema compliance, provenance completeness, sorted collections, and signature metadata. | ||||
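
The canonicalisation guardrail (item 4 above) can be illustrated with a minimal Python sketch. It is not Excititor's writer: the function names are hypothetical, it sorts object keys only (array ordering is left to the caller because it depends on field semantics), and it assumes naive timestamps are already UTC.

```python
import hashlib
import json
from datetime import datetime, timezone


def normalize_timestamp(value: str) -> str:
    """Convert an ISO-8601 timestamp to UTC with a trailing 'Z'."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def canonical_bytes(document: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    text = json.dumps(
        document, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def content_hash(document: dict) -> str:
    """Hash the canonical form so equal documents always hash equally."""
    return "sha256:" + hashlib.sha256(canonical_bytes(document)).hexdigest()
```

Because keys are sorted before hashing, two writers that build the same document in a different key order produce the same `content_hash`, which is the property reproducible exports rely on.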
|  | ||||
| ### 1.1 VEX raw document shape | ||||
|  | ||||
| ```jsonc | ||||
| { | ||||
|   "_id": "vex_raw:openvex:VEX-2025-00001:v2", | ||||
|   "source": { | ||||
|     "issuer": "vendor:redhat", | ||||
|     "stream": "openvex", | ||||
|     "api": "https://vendor/api/vex/VEX-2025-00001.json", | ||||
|     "collector_version": "excititor/0.9.4" | ||||
|   }, | ||||
|   "upstream": { | ||||
|     "statement_id": "VEX-2025-00001", | ||||
|     "document_version": "2025-08-30T12:00:00Z", | ||||
|     "fetched_at": "2025-08-30T12:05:00Z", | ||||
|     "received_at": "2025-08-30T12:05:01Z", | ||||
|     "content_hash": "sha256:...", | ||||
|     "signature": { | ||||
|       "present": true, | ||||
|       "format": "dsse", | ||||
|       "key_id": "rekor:uuid", | ||||
|       "sig": "base64..." | ||||
|     } | ||||
|   }, | ||||
|   "content": { | ||||
|     "format": "openvex", | ||||
|     "spec_version": "1.0", | ||||
|     "raw": { /* upstream statement */ } | ||||
|   }, | ||||
|   "identifiers": { | ||||
|     "cve": ["CVE-2025-13579"], | ||||
|     "products": [ | ||||
|       {"purl": "pkg:rpm/redhat/openssl@3.0.9", "component": "openssl"} | ||||
|     ] | ||||
|   }, | ||||
|   "linkset": { | ||||
|     "aliases": ["REDHAT:RHSA-2025:1234"], | ||||
|     "sbom_products": ["pkg:rpm/redhat/openssl@3.0.9"], | ||||
|     "justifications": ["reasonable_worst_case_assumption"], | ||||
|     "references": [ | ||||
|       {"type": "advisory", "url": "https://..."} | ||||
|     ] | ||||
|   }, | ||||
|   "supersedes": "vex_raw:openvex:VEX-2025-00001:v1", | ||||
|   "tenant": "default" | ||||
| } | ||||
| ``` | ||||
|  | ||||
| ### 1.2 Issuer trust registry | ||||
|  | ||||
| To enable Epic 7’s consensus lens, Excititor maintains `vex_issuer_registry` documents containing: | ||||
|  | ||||
| - `issuer_id`, canonical name, and allowed domains. | ||||
| - `trust.tier` (`critical`, `high`, `medium`, `low`), `trust.confidence` (0–1). | ||||
| - `products` PURL patterns the issuer is authoritative for. | ||||
| - `signing_keys` with key IDs and expiry. | ||||
| - `last_validated_at`, `revocation_status`. | ||||
|  | ||||
| The registry is distributed as a signed bundle and cached locally; ingestion rejects statements from issuers without registry entries or valid signatures. | ||||
|  | ||||
| ### 1.3 Normalised tuple store | ||||
|  | ||||
| Excititor derives `vex_normalized` tuples (without making decisions) for downstream consumers: | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "advisory_key": "CVE-2025-13579", | ||||
|   "artifact": "pkg:rpm/redhat/openssl@3.0.9", | ||||
|   "issuer": "vendor:redhat", | ||||
|   "status": "not_affected", | ||||
|   "justification": "component_not_present", | ||||
|   "scope": "runtime_path", | ||||
|   "timestamp": "2025-08-30T12:00:00Z", | ||||
|   "trust": {"tier": "high", "confidence": 0.95}, | ||||
|   "statement_id": "VEX-2025-00001:v2", | ||||
|   "content_hash": "sha256:..." | ||||
| } | ||||
| ``` | ||||
|  | ||||
| These tuples allow VEX Lens to compute deterministic consensus without re-parsing heavy upstream documents. | ||||
|  | ||||
| ### 1.4 AI-ready citations | ||||
|  | ||||
| `GET /v1/vex/statements/{advisory_key}` produces sorted JSON responses containing raw statement metadata (`issuer`, `content_hash`, `signature`), normalised tuples, and provenance pointers. Advisory AI consumes this endpoint to build retrieval contexts with explicit citations. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2) Inputs, outputs & canonical domain | ||||
|  | ||||
| ### 2.1 Accepted input formats (ingest) | ||||
|  | ||||
| * **OpenVEX** JSON documents (attested or raw). | ||||
| * **CSAF VEX** 2.x (vendor PSIRTs and distros commonly publish CSAF). | ||||
| * **CycloneDX VEX** 1.4+ (standalone VEX or embedded VEX blocks). | ||||
| * **OCI‑attached attestations** (VEX statements shipped as OCI referrers) — optional connectors. | ||||
|  | ||||
| All connectors register **source metadata**: provider identity, trust tier, signature expectations (PGP/cosign/PKI), fetch windows, rate limits, and time anchors. | ||||
|  | ||||
| ### 2.2 Canonical model (observations & linksets) | ||||
|  | ||||
| #### VexObservation | ||||
|  | ||||
| ```jsonc | ||||
| observationId       // {tenant}:{providerId}:{upstreamId}:{revision} | ||||
| tenant | ||||
| providerId          // e.g., redhat, suse, ubuntu, osv | ||||
| streamId            // connector stream (csaf, openvex, cyclonedx, attestation) | ||||
| upstream{ | ||||
|     upstreamId, | ||||
|     documentVersion?, | ||||
|     fetchedAt, | ||||
|     receivedAt, | ||||
|     contentHash, | ||||
|     signature{present, format?, keyId?, signature?} | ||||
| } | ||||
| statements[ | ||||
|   { | ||||
|     vulnerabilityId, | ||||
|     productKey, | ||||
|     status,                    // affected | not_affected | fixed | under_investigation | ||||
|     justification?, | ||||
|     introducedVersion?, | ||||
|     fixedVersion?, | ||||
|     lastObserved, | ||||
|     locator?,                  // JSON Pointer/line for provenance | ||||
|     evidence?[] | ||||
|   } | ||||
| ] | ||||
| content{ | ||||
|     format, | ||||
|     specVersion?, | ||||
|     raw | ||||
| } | ||||
| linkset{ | ||||
|     aliases[],                 // CVE/GHSA/vendor IDs | ||||
|     purls[], | ||||
|     cpes[], | ||||
|     references[{type,url}], | ||||
|     reconciledFrom[] | ||||
| } | ||||
| supersedes? | ||||
| createdAt | ||||
| attributes? | ||||
| ``` | ||||
|  | ||||
| #### VexLinkset | ||||
|  | ||||
| ```jsonc | ||||
| linksetId           // sha256 over sorted (tenant, vulnId, productKey, observationIds) | ||||
| tenant | ||||
| key{ | ||||
|     vulnerabilityId, | ||||
|     productKey, | ||||
|     confidence          // low|medium|high | ||||
| } | ||||
| observations[] = [ | ||||
|   { | ||||
|     observationId, | ||||
|     providerId, | ||||
|     status, | ||||
|     justification?, | ||||
|     introducedVersion?, | ||||
|     fixedVersion?, | ||||
|     evidence?, | ||||
|     collectedAt | ||||
|   } | ||||
| ] | ||||
| aliases{ | ||||
|     primary, | ||||
|     others[] | ||||
| } | ||||
| purls[] | ||||
| cpes[] | ||||
| conflicts[]?        // see VexLinksetConflict | ||||
| createdAt | ||||
| updatedAt | ||||
| ``` | ||||
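
As a rough illustration of the `linksetId` comment above, the sketch below hashes the tenant, vulnerability ID, product key, and sorted observation IDs. The exact byte encoding is not specified in this document; the compact JSON array used here, the `sha256:` prefix on the result, and the function name are all assumptions.

```python
import hashlib
import json
from typing import Iterable


def linkset_id(
    tenant: str,
    vulnerability_id: str,
    product_key: str,
    observation_ids: Iterable[str],
) -> str:
    """Derive a stable identifier that ignores observation arrival order."""
    # Assumption: inputs are encoded as a compact JSON array before hashing.
    payload = json.dumps(
        [tenant, vulnerability_id, product_key, sorted(observation_ids)],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting the observation IDs first means the same set of observations always maps to the same identifier, regardless of the order in which connectors delivered them.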
|  | ||||
| #### VexLinksetConflict | ||||
|  | ||||
| ```jsonc | ||||
| conflictId | ||||
| type                // status-mismatch | justification-divergence | version-range-clash | non-joinable-overlap | metadata-gap | ||||
| field?              // optional pointer for UI rendering | ||||
| statements[]        // per-observation values with providerId + status/justification/version data | ||||
| confidence | ||||
| detectedAt | ||||
| ``` | ||||
|  | ||||
| #### VexConsensus (optional) | ||||
|  | ||||
| ```jsonc | ||||
| consensusId         // sha256(vulnerabilityId, productKey, policyRevisionId) | ||||
| vulnerabilityId | ||||
| productKey | ||||
| rollupStatus        // derived by Excititor policy adapter (linkset aware) | ||||
| sources[]           // observation references with weight, accepted flag, reason | ||||
| policyRevisionId | ||||
| evaluatedAt | ||||
| consensusDigest | ||||
| ``` | ||||
|  | ||||
| Consensus persists only when Excititor policy adapters require pre-computed rollups (e.g., Offline Kit). Policy Engine can also compute consensus on demand from linksets. | ||||
|  | ||||
| ### 2.3 Exports & evidence bundles | ||||
|  | ||||
| * **Raw observations** — JSON tree per observation for auditing/offline. | ||||
| * **Linksets** — grouped evidence for policy/Console/CLI consumption. | ||||
| * **Consensus (optional)** — if enabled, mirrors existing API contracts. | ||||
| * **Provider snapshots** — last N days of observations per provider to support diagnostics. | ||||
| * **Index** — `(productKey, vulnerabilityId) → {status candidates, confidence, observationIds}` for high-speed joins. | ||||
|  | ||||
| All exports remain deterministic and, when configured, attested via DSSE + Rekor v2. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3) Identity model — products & joins | ||||
|  | ||||
| ### 3.1 Vuln identity | ||||
|  | ||||
| * Accepts **CVE**, **GHSA**, vendor IDs (MSRC, RHSA…), distro IDs (DSA/USN/RHSA…) — normalized to `vulnId` with alias sets. | ||||
| * **Alias graph** maintained (from Concelier) to map vendor/distro IDs → CVE (primary) and to **GHSA** where applicable. | ||||
|  | ||||
| ### 3.2 Product identity (`productKey`) | ||||
|  | ||||
| * **Primary:** `purl` (Package URL). | ||||
| * **Secondary links:** `cpe`, **OS package NVRA/EVR**, NuGet/Maven/Golang identity, and **OS package name** when purl unavailable. | ||||
| * **Fallback:** `oci:<registry>/<repo>@<digest>` for image‑level VEX. | ||||
| * **Special cases:** kernel modules, firmware, platforms → provider‑specific mapping helpers (connector captures provider’s product taxonomy → canonical `productKey`). | ||||
|  | ||||
| > Excititor does not invent identities. If a provider cannot be mapped to purl/CPE/NVRA deterministically, we keep the native **product string** and mark the claim as **non‑joinable**; the backend will ignore it unless a policy explicitly whitelists that provider mapping. | ||||
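
The precedence described above (purl first, then CPE, then an `oci:` reference, otherwise the native product string marked non-joinable) could be sketched as follows. The function and type names are hypothetical, and the OS package NVRA/EVR mapping helpers are deliberately left out.

```python
from typing import NamedTuple, Optional


class ProductIdentity(NamedTuple):
    product_key: str
    joinable: bool


def resolve_product_key(
    purl: Optional[str] = None,
    cpe: Optional[str] = None,
    oci_ref: Optional[str] = None,
    native: Optional[str] = None,
) -> ProductIdentity:
    """Pick the strongest available identity without inventing one."""
    if purl:
        return ProductIdentity(purl, True)
    if cpe:
        return ProductIdentity(cpe, True)
    if oci_ref:
        # oci_ref is expected in <registry>/<repo>@<digest> form.
        return ProductIdentity("oci:" + oci_ref, True)
    if native:
        # No deterministic mapping: keep the provider's string, flag it.
        return ProductIdentity(native, False)
    raise ValueError("statement carries no product identity at all")
```

Returning the `joinable` flag alongside the key lets downstream code skip non-joinable claims unless a policy explicitly opts in to that provider mapping.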
|  | ||||
| --- | ||||
|  | ||||
| ## 4) Storage schema (MongoDB) | ||||
|  | ||||
| Database: `excititor` | ||||
|  | ||||
| ### 4.1 Collections | ||||
|  | ||||
| **`vex.providers`** | ||||
|  | ||||
| ``` | ||||
| _id: providerId | ||||
| name, homepage, contact | ||||
| trustTier: enum {vendor, distro, platform, hub, attestation} | ||||
| signaturePolicy: { type: pgp|cosign|x509|none, keys[], certs[], cosignKeylessRoots[] } | ||||
| fetch: { baseUrl, kind: http|oci|file, rateLimit, etagSupport, windowDays } | ||||
| enabled: bool | ||||
| createdAt, modifiedAt | ||||
| ``` | ||||
|  | ||||
| **`vex.raw`** (immutable raw documents) | ||||
|  | ||||
| ``` | ||||
| _id: sha256(doc bytes) | ||||
| providerId | ||||
| uri | ||||
| ingestedAt | ||||
| contentType | ||||
| sig: { verified: bool, method: pgp|cosign|x509|none, keyId|certSubject, bundle? } | ||||
| payload: GridFS pointer (if large) | ||||
| disposition: kept|replaced|superseded | ||||
| correlation: { replaces?: sha256, replacedBy?: sha256 } | ||||
| ``` | ||||
|  | ||||
| **`vex.observations`** | ||||
|  | ||||
| ``` | ||||
| { | ||||
|   _id: "tenant:providerId:upstreamId:revision", | ||||
|   tenant, | ||||
|   providerId, | ||||
|   streamId, | ||||
|   upstream: { upstreamId, documentVersion?, fetchedAt, receivedAt, contentHash, signature }, | ||||
|   statements: [ | ||||
|     { | ||||
|       vulnerabilityId, | ||||
|       productKey, | ||||
|       status, | ||||
|       justification?, | ||||
|       introducedVersion?, | ||||
|       fixedVersion?, | ||||
|       lastObserved, | ||||
|       locator?, | ||||
|       evidence? | ||||
|     } | ||||
|   ], | ||||
|   content: { format, specVersion?, raw }, | ||||
|   linkset: { aliases[], purls[], cpes[], references[], reconciledFrom[] }, | ||||
|   supersedes?, | ||||
|   createdAt, | ||||
|   attributes? | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Indexes: `{tenant:1, providerId:1, upstream.upstreamId:1}`, `{tenant:1, statements.vulnerabilityId:1}`, `{tenant:1, linkset.purls:1}`, `{tenant:1, createdAt:-1}`. | ||||
|  | ||||
| **`vex.linksets`** | ||||
|  | ||||
| ``` | ||||
| { | ||||
|   _id: "sha256:...", | ||||
|   tenant, | ||||
|   key: { vulnerabilityId, productKey, confidence }, | ||||
|   observations: [ | ||||
|     { observationId, providerId, status, justification?, introducedVersion?, fixedVersion?, evidence?, collectedAt } | ||||
|   ], | ||||
|   aliases: { primary, others: [] }, | ||||
|   purls: [], | ||||
|   cpes: [], | ||||
|   conflicts: [], | ||||
|   createdAt, | ||||
|   updatedAt | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Indexes: `{tenant:1, key.vulnerabilityId:1, key.productKey:1}`, `{tenant:1, purls:1}`, `{tenant:1, updatedAt:-1}`. | ||||
|  | ||||
| **`vex.events`** (observation/linkset events, optional long retention) | ||||
|  | ||||
| ``` | ||||
| { | ||||
|   _id: ObjectId, | ||||
|   tenant, | ||||
|   type: "vex.observation.updated" | "vex.linkset.updated", | ||||
|   key, | ||||
|   delta, | ||||
|   hash, | ||||
|   occurredAt | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Indexes: `{type:1, occurredAt:-1}`, TTL on `occurredAt` for configurable retention. | ||||
|  | ||||
| **`vex.consensus`** (optional rollups) | ||||
|  | ||||
| ``` | ||||
| _id: sha256(canonical(vulnerabilityId, productKey, policyRevisionId)) | ||||
| vulnerabilityId | ||||
| productKey | ||||
| rollupStatus | ||||
| sources[]      // observation references with weights/reasons | ||||
| policyRevisionId | ||||
| evaluatedAt | ||||
| signals?       // optional severity/kev/epss hints | ||||
| consensusDigest | ||||
| ``` | ||||
|  | ||||
|   * Indexes: `{vulnerabilityId:1, productKey:1}`, `{policyRevisionId:1, evaluatedAt:-1}`. | ||||
|  | ||||
| **`vex.exports`** (manifest of emitted artifacts) | ||||
|  | ||||
| ``` | ||||
| _id | ||||
| querySignature | ||||
| format: raw|consensus|index | ||||
| artifactSha256 | ||||
| rekor { uuid, index, url }? | ||||
| createdAt | ||||
| policyRevisionId | ||||
| cacheable: bool | ||||
| ``` | ||||
|  | ||||
| **`vex.cache`** — observation/linkset export cache: `{querySignature, exportId, ttl, hits}`. | ||||
|  | ||||
| **`vex.migrations`** — ordered migrations ensuring new indexes (`20251027-linksets-introduced`, etc.). | ||||
|  | ||||
| ### 4.2 Indexing strategy | ||||
|  | ||||
| * Hot path queries rely on `{tenant, key.vulnerabilityId, key.productKey}` covering linkset lookup. | ||||
| * Observability queries use `{tenant, updatedAt}` to monitor staleness. | ||||
| * Consensus (if enabled) keyed by `{vulnerabilityId, productKey, policyRevisionId}` for deterministic reuse. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Ingestion pipeline | ||||
|  | ||||
| ### 5.1 Connector contract | ||||
|  | ||||
| ```csharp | ||||
| public interface IVexConnector | ||||
| { | ||||
|     string ProviderId { get; } | ||||
|     Task FetchAsync(VexConnectorContext ctx, CancellationToken ct);   // raw docs | ||||
|     Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct); // raw -> ObservationStatements[] | ||||
| } | ||||
| ``` | ||||
|  | ||||
| * **Fetch** must implement: window scheduling, conditional GET (ETag/If‑Modified‑Since), rate limiting, retry/backoff. | ||||
| * **Normalize** parses the format, validates schema, maps product identities deterministically, emits observation statements with **provenance** metadata (locator, justification, version ranges). | ||||
|  | ||||
| ### 5.2 Signature verification (per provider) | ||||
|  | ||||
| * **cosign (keyless or keyful)** for OCI referrers or HTTP‑served JSON with Sigstore bundles. | ||||
| * **PGP** (provider keyrings) for distro/vendor feeds that sign docs. | ||||
| * **x509** (mutual TLS / provider‑pinned certs) where applicable. | ||||
| * Signature state is stored on **vex.raw.sig** and copied into `statements[].signatureState` so downstream policy can gate by verification result. | ||||
|  | ||||
| > Observation statements from sources that fail signature policy are marked `signatureState.verified = false`, so policy can down-weight or ignore them. | ||||
|  | ||||
| ### 5.3 Time discipline | ||||
|  | ||||
| * For each doc, prefer **provider’s document timestamp**; if absent, use fetch time. | ||||
| * Statements carry `lastObserved` which drives **tie-breaking** within equal weight tiers. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Normalization: product & status semantics | ||||
|  | ||||
| ### 6.1 Product mapping | ||||
|  | ||||
| * **purl** first; **cpe** second; OS package NVRA/EVR mapping helpers (distro connectors) produce purls via canonical tables (e.g., rpm→purl:rpm, deb→purl:deb). | ||||
| * Where a provider publishes **platform‑level** VEX (e.g., “RHEL 9 not affected”), connectors expand to known product inventory rules (e.g., map to sets of packages/components shipped in the platform). Expansion tables are versioned and kept per provider; every expansion emits **evidence** indicating the rule applied. | ||||
| * If expansion would be speculative, the statement remains **platform-scoped** with `productKey="platform:redhat:rhel:9"` and is flagged **non-joinable**; backend can decide to use platform VEX only when Scanner proves the platform runtime. | ||||
|  | ||||
| ### 6.2 Status + justification mapping | ||||
|  | ||||
| * Canonical **status**: `affected | not_affected | fixed | under_investigation`. | ||||
| * **Justifications** normalized to a controlled vocabulary (CISA‑aligned), e.g.: | ||||
|  | ||||
|   * `component_not_present` | ||||
|   * `vulnerable_code_not_in_execute_path` | ||||
|   * `vulnerable_configuration_unused` | ||||
|   * `inline_mitigation_applied` | ||||
|   * `fix_available` (with `fixedVersion`) | ||||
|   * `under_investigation` | ||||
| * Providers with free‑text justifications are mapped by deterministic tables; raw text preserved as `evidence`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 7) Consensus algorithm | ||||
|  | ||||
| **Goal:** produce a **stable**, explainable `rollupStatus` per `(vulnId, productKey)` when consumers opt into Excititor-managed consensus derived from linksets. | ||||
|  | ||||
| ### 7.1 Inputs | ||||
|  | ||||
| * Set **S** of observation statements drawn from the current `VexLinkset` for `(tenant, vulnId, productKey)`. | ||||
| * **Excititor policy snapshot**: | ||||
|  | ||||
|   * **weights** per provider tier and per provider overrides. | ||||
|   * **justification gates** (e.g., require justification for `not_affected` to be acceptable). | ||||
|   * **minEvidence** rules (e.g., `not_affected` must come from ≥1 vendor or 2 distros). | ||||
|   * **signature requirements** (e.g., require verified signature for ‘fixed’ to be considered). | ||||
|  | ||||
| ### 7.2 Steps | ||||
|  | ||||
| 1. **Filter invalid** statements by signature policy & justification gates → set `S'`. | ||||
| 2. **Score** each statement: | ||||
|    `score = weight(provider) * freshnessFactor(lastObserved)` where freshnessFactor ∈ [0.8, 1.0] for staleness decay (configurable; small effect). Observations lacking verified signatures receive policy-configured penalties. | ||||
| 3. **Aggregate** scores per status: `W(status) = Σ score(statements with that status)`. | ||||
| 4. **Pick** `rollupStatus = argmax_status W(status)`. | ||||
| 5. **Tie‑breakers** (in order): | ||||
|  | ||||
|    * Higher **max single** provider score wins (vendor > distro > platform > hub). | ||||
|    * More **recent** lastObserved wins. | ||||
|    * Fixed, deterministic status precedence (`fixed` > `not_affected` > `under_investigation` > `affected`) as the final tiebreaker. Note that this is an explicit ranking, not alphabetical order. | ||||
| 6. **Explain**: mark accepted observations (`accepted=true; reason="weight"`/`"freshness"`/`"confidence"`) and rejected ones with explicit `reason` (`"insufficient_justification"`, `"signature_unverified"`, `"lower_weight"`, `"low_confidence_linkset"`). | ||||
|  | ||||
| > The algorithm is **pure** given `S` and policy snapshot; result is reproducible and hashed into `consensusDigest`. | ||||
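
Steps 2 to 5 above can be sketched in Python as a pure function. This is an illustration under assumptions, not the shipped algorithm: the input is taken to be the already-filtered set `S'`, the linear decay in `freshness_factor` and its `max_age_days` parameter are hypothetical (the text only fixes the range [0.8, 1.0]), signature penalties are omitted, and the first tie-breaker compares the highest single score per status rather than provider tiers.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable

# Final tie-breaker precedence from the text; earlier entries win.
STATUS_PRECEDENCE = ["fixed", "not_affected", "under_investigation", "affected"]


@dataclass(frozen=True)
class Statement:
    provider_id: str
    status: str
    weight: float            # weight(provider) from the policy snapshot
    last_observed: datetime


def freshness_factor(last_observed: datetime, now: datetime,
                     max_age_days: float = 365.0) -> float:
    """Hypothetical linear decay from 1.0 down to 0.8 over max_age_days."""
    age_days = max((now - last_observed).total_seconds() / 86400.0, 0.0)
    return 1.0 - 0.2 * min(age_days / max_age_days, 1.0)


def rollup_status(statements: Iterable[Statement], now: datetime) -> str:
    """Aggregate scores per status and pick the winner deterministically."""
    totals = defaultdict(float)        # W(status)
    best_single = defaultdict(float)   # highest single score per status
    most_recent = {}                   # latest lastObserved per status
    for s in statements:
        score = s.weight * freshness_factor(s.last_observed, now)
        totals[s.status] += score
        best_single[s.status] = max(best_single[s.status], score)
        if s.status not in most_recent or s.last_observed > most_recent[s.status]:
            most_recent[s.status] = s.last_observed
    if not totals:
        raise ValueError("no statements to evaluate")

    def rank(status: str):
        # Tuple comparison applies the tie-breakers in the documented order.
        return (totals[status], best_single[status], most_recent[status],
                -STATUS_PRECEDENCE.index(status))

    return max(totals, key=rank)
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the result reproducible for a given input set and policy snapshot.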
|  | ||||
| --- | ||||
|  | ||||
| ## 8) Query & export APIs | ||||
|  | ||||
| All endpoints are versioned under `/api/v1/vex`. | ||||
|  | ||||
| ### 8.1 Query (online) | ||||
|  | ||||
| ``` | ||||
| POST /observations/search | ||||
|   body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string } | ||||
|   → { observations[], nextPageToken? } | ||||
|  | ||||
| POST /linksets/search | ||||
|   body: { vulnIds?: string[], productKeys?: string[], confidence?: string[], since?: timestamp, limit?: int, pageToken?: string } | ||||
|   → { linksets[], nextPageToken? } | ||||
|  | ||||
| POST /consensus/search | ||||
|   body: { vulnIds?: string[], productKeys?: string[], policyRevisionId?: string, since?: timestamp, limit?: int, pageToken?: string } | ||||
|   → { entries[], nextPageToken? } | ||||
|  | ||||
| POST /excititor/resolve (scope: vex.read) | ||||
|   body: { productKeys?: string[], purls?: string[], vulnerabilityIds: string[], policyRevisionId?: string } | ||||
|   → { policy, resolvedAt, results: [ { vulnerabilityId, productKey, status, observations[], conflicts[], linksetConfidence, consensus?, signals?, envelope? } ] } | ||||
| ``` | ||||
|  | ||||
| ### 8.2 Exports (cacheable snapshots) | ||||
|  | ||||
| ``` | ||||
| POST /exports | ||||
|   body: { signature: { vulnFilter?, productFilter?, providers?, since? }, format: raw|consensus|index, policyRevisionId?: string, force?: bool } | ||||
|   → { exportId, artifactSha256, rekor? } | ||||
|  | ||||
| GET  /exports/{exportId}        → bytes (application/json or binary index) | ||||
| GET  /exports/{exportId}/meta   → { signature, policyRevisionId, createdAt, artifactSha256, rekor? } | ||||
| ``` | ||||
|  | ||||
| ### 8.3 Provider operations | ||||
|  | ||||
| ``` | ||||
| GET  /providers                  → provider list & signature policy | ||||
| POST /providers/{id}/refresh     → trigger fetch/normalize window | ||||
| GET  /providers/{id}/status      → last fetch, doc counts, signature stats | ||||
| ``` | ||||
|  | ||||
| **Auth:** service‑to‑service via Authority tokens; operator operations via UI/CLI with RBAC. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 9) Attestation integration | ||||
|  | ||||
| * Exports can be **DSSE‑signed** via **Signer** and logged to **Rekor v2** via **Attestor** (optional but recommended for regulated pipelines). | ||||
| * `vex.exports.rekor` stores `{uuid, index, url}` when present. | ||||
| * **Predicate type**: `https://stella-ops.org/attestations/vex-export/1` with fields: | ||||
|  | ||||
|   * `querySignature`, `policyRevisionId`, `artifactSha256`, `createdAt`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 10) Configuration (YAML) | ||||
|  | ||||
| ```yaml | ||||
| excititor: | ||||
|   mongo: { uri: "mongodb://mongo/excititor" } | ||||
|   s3: | ||||
|     endpoint: http://minio:9000 | ||||
|     bucket: stellaops | ||||
|   policy: | ||||
|     weights: | ||||
|       vendor: 1.0 | ||||
|       distro: 0.9 | ||||
|       platform: 0.7 | ||||
|       hub: 0.5 | ||||
|       attestation: 0.6 | ||||
|       ceiling: 1.25 | ||||
|     scoring: | ||||
|       alpha: 0.25 | ||||
|       beta: 0.5 | ||||
|     providerOverrides: | ||||
|       redhat: 1.0 | ||||
|       suse: 0.95 | ||||
|     requireJustificationForNotAffected: true | ||||
|     signatureRequiredForFixed: true | ||||
|     minEvidence: | ||||
|       not_affected: | ||||
|         vendorOrTwoDistros: true | ||||
|   connectors: | ||||
|     - providerId: redhat | ||||
|       kind: csaf | ||||
|       baseUrl: https://access.redhat.com/security/data/csaf/v2/ | ||||
|       signaturePolicy: { type: pgp, keys: [ "…redhat-pgp-key…" ] } | ||||
|       windowDays: 7 | ||||
|     - providerId: suse | ||||
|       kind: csaf | ||||
|       baseUrl: https://ftp.suse.com/pub/projects/security/csaf/ | ||||
|       signaturePolicy: { type: pgp, keys: [ "…suse-pgp-key…" ] } | ||||
|     - providerId: ubuntu | ||||
|       kind: openvex | ||||
|       baseUrl: https://…/vex/ | ||||
|       signaturePolicy: { type: none } | ||||
|     - providerId: vendorX | ||||
|       kind: cyclonedx-vex | ||||
|       ociRef: ghcr.io/vendorx/vex@sha256:… | ||||
|       signaturePolicy: { type: cosign, cosignKeylessRoots: [ "sigstore-root" ] } | ||||
| ``` | ||||
|  | ||||
| ### 10.1 WebService endpoints | ||||
|  | ||||
| With storage configured, the WebService exposes the following ingress and diagnostic APIs: | ||||
|  | ||||
| * `GET /excititor/status` – returns the active storage configuration and registered artifact stores. | ||||
| * `GET /excititor/health` – simple liveness probe. | ||||
| * `POST /excititor/statements` – accepts normalized VEX statements and persists them via `IVexClaimStore`; use this for migrations/backfills. | ||||
| * `GET /excititor/statements/{vulnId}/{productKey}?since=` – returns the immutable statement log for a vulnerability/product pair. | ||||
| * `POST /excititor/resolve` – requires `vex.read` scope; accepts up to 256 `(vulnId, productKey)` pairs via `productKeys` or `purls` and returns deterministic consensus results, decision telemetry, and a signed envelope (`artifact` digest, optional signer signature, optional attestation metadata + DSSE envelope). Returns **409 Conflict** when the requested `policyRevisionId` mismatches the active snapshot. | ||||
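Because `POST /excititor/resolve` accepts at most 256 pairs per call, a caller with a larger work list has to split it. The sketch below shows only that client-side batching; the function name is hypothetical and the request body shape is deliberately left out because this document does not define it.

```python
from itertools import islice

MAX_PAIRS_PER_REQUEST = 256  # documented upper bound for POST /excititor/resolve


def batch_pairs(pairs, size=MAX_PAIRS_PER_REQUEST):
    """Yield lists of at most `size` (vulnId, productKey) pairs, preserving order."""
    iterator = iter(pairs)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Each yielded batch would map to one resolve call; keeping the input order stable helps when comparing results across runs.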
|  | ||||
| Run the ingestion endpoint (`POST /excititor/statements`) once after applying migration `20251019-consensus-signals-statements` to repopulate historical statements with the new severity/KEV/EPSS signal fields. | ||||
|  | ||||
| * `weights.ceiling` raises the deterministic clamp applied to provider tiers/overrides (range 1.0‒5.0). Values outside the range are clamped with warnings so operators can spot typos. | ||||
| * `scoring.alpha` / `scoring.beta` configure KEV/EPSS boosts for the Phase 1 → Phase 2 scoring pipeline. Defaults (0.25, 0.5) preserve prior behaviour; negative or excessively large values fall back to the defaults and emit diagnostics. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 11) Security model | ||||
|  | ||||
| * **Input signature verification** enforced per provider policy (PGP, cosign, x509). | ||||
| * **Connector allowlists**: outbound fetch constrained to configured domains. | ||||
| * **Tenant isolation**: per‑tenant DB prefixes or separate DBs; per‑tenant S3 prefixes; per‑tenant policies. | ||||
| * **AuthN/Z**: Authority‑issued OpToks; RBAC roles (`vex.read`, `vex.admin`, `vex.export`). | ||||
| * **No secrets in logs**; deterministic logging contexts include providerId, docDigest, observationId, and linksetId. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 12) Performance & scale | ||||
|  | ||||
| * **Targets:** | ||||
|  | ||||
|   * Normalize 10k observation statements/minute/core. | ||||
|   * Linkset rebuild ≤ 20 ms P95 for 1k unique `(vuln, product)` pairs in hot cache. | ||||
|   * Consensus (when enabled) compute ≤ 50 ms for 1k unique `(vuln, product)` pairs. | ||||
|   * Export (observations + linksets) 1M rows in ≤ 60 s on 8 cores with streaming writer. | ||||
|  | ||||
| * **Scaling:** | ||||
|  | ||||
|   * WebService handles control APIs; **Worker** background services (same image) execute fetch/normalize in parallel under rate limits; Mongo writes are batched and upserted by natural keys. | ||||
|   * Exports stream straight to S3 (MinIO) with rolling buffers. | ||||
|  | ||||
| * **Caching:** | ||||
|  | ||||
|   * `vex.cache` maps query signatures → export; TTL to avoid stampedes; optimistic reuse unless `force`. | ||||
|  | ||||
| ### 12.1 Worker TTL refresh controls | ||||
|  | ||||
| Excititor.Worker ships with a background refresh service that re-evaluates stale consensus rows and applies stability dampers before publishing status flips. Operators can tune its behaviour through the following configuration (shown in `appsettings.json` syntax): | ||||
|  | ||||
| ```jsonc | ||||
| { | ||||
|   "Excititor": { | ||||
|     "Worker": { | ||||
|       "Refresh": { | ||||
|         "Enabled": true, | ||||
|         "ConsensusTtl": "02:00:00",       // refresh consensus older than 2 hours | ||||
|         "ScanInterval": "00:10:00",       // sweep cadence | ||||
|         "ScanBatchSize": 250,              // max documents examined per sweep | ||||
|         "Damper": { | ||||
|           "Minimum": "1.00:00:00",       // lower bound before status flip publishes | ||||
|           "Maximum": "2.00:00:00",       // upper bound guardrail | ||||
|           "DefaultDuration": "1.12:00:00", | ||||
|           "Rules": [ | ||||
|             { "MinWeight": 0.90, "Duration": "1.00:00:00" }, | ||||
|             { "MinWeight": 0.75, "Duration": "1.06:00:00" }, | ||||
|             { "MinWeight": 0.50, "Duration": "1.12:00:00" } | ||||
|           ] | ||||
|         } | ||||
|       } | ||||
|     } | ||||
|   } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| * `ConsensusTtl` governs when the worker issues a fresh resolve for cached consensus data. | ||||
| * `Damper` lengths are clamped between `Minimum`/`Maximum`; duration is bypassed when component fingerprints (`VexProduct.ComponentIdentifiers`) change. | ||||
| * The same keys are available through environment variables (e.g., `Excititor__Worker__Refresh__ConsensusTtl=02:00:00`). | ||||
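The damper settings can be read as a small lookup followed by a clamp. The sketch below is one plausible interpretation, not the worker's actual code: it assumes rules are evaluated from the highest `MinWeight` down, that the first matching rule wins, that `DefaultDuration` applies when no rule matches, and that a fingerprint change reduces the wait to zero. The constants mirror the sample configuration above.

```python
from datetime import timedelta

# Values copied from the sample configuration ("1.06:00:00" = 1 day 6 hours).
MINIMUM = timedelta(days=1)
MAXIMUM = timedelta(days=2)
DEFAULT_DURATION = timedelta(days=1, hours=12)
RULES = [  # (MinWeight, Duration)
    (0.90, timedelta(days=1)),
    (0.75, timedelta(days=1, hours=6)),
    (0.50, timedelta(days=1, hours=12)),
]


def damper_duration(weight, fingerprint_changed=False):
    """Return how long a status flip is held back before publishing."""
    if fingerprint_changed:
        # Assumption: "bypassed" means the flip publishes without waiting.
        return timedelta(0)
    chosen = DEFAULT_DURATION
    # Assumption: highest MinWeight first, first match wins.
    for min_weight, duration in sorted(RULES, reverse=True):
        if weight >= min_weight:
            chosen = duration
            break
    # Clamp between Minimum and Maximum, as described above.
    return min(max(chosen, MINIMUM), MAXIMUM)
```

Under these assumptions, higher-weight sources wait for a shorter period before a flip is published, and no rule can push the wait outside the `Minimum`/`Maximum` guardrails.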
|  | ||||
| --- | ||||
|  | ||||
| ## 13) Observability | ||||
|  | ||||
| * **Metrics:** | ||||
|  | ||||
|   * `vex.fetch.requests_total{provider}` / `vex.fetch.bytes_total{provider}` | ||||
|   * `vex.fetch.failures_total{provider,reason}` / `vex.signature.failures_total{provider,method}` | ||||
|   * `vex.normalize.statements_total{provider}` | ||||
|   * `vex.observations.write_total{result}` | ||||
|   * `vex.linksets.updated_total{result}` / `vex.linksets.conflicts_total{type}` | ||||
|   * `vex.consensus.rollup_total{status}` (when enabled) | ||||
|   * `vex.exports.bytes_total{format}` / `vex.exports.latency_seconds{format}` | ||||
| * **Tracing:** spans for fetch, verify, parse, map, observe, linkset, consensus, export. | ||||
| * **Dashboards:** provider staleness, linkset conflict hot spots, signature posture, export cache hit-rate. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 14) Testing matrix | ||||
|  | ||||
| * **Connectors:** golden raw docs → deterministic observation statements (fixtures per provider/format). | ||||
| * **Signature policies:** valid/invalid PGP/cosign/x509 samples; ensure rejected documents are recorded but never accepted. | ||||
| * **Normalization edge cases:** platform-scoped statements, free-text justifications, non-purl products. | ||||
| * **Linksets:** conflict scenarios across tiers; verify confidence scoring + conflict payload stability. | ||||
| * **Consensus (optional):** ensure tie-breakers honour policy weights/justification gates. | ||||
| * **Performance:** 1M-row observation/linkset export timing; memory ceilings; stream correctness. | ||||
| * **Determinism:** same inputs + policy → identical linkset hashes, conflict payloads, optional `consensusDigest`, and export bytes. | ||||
| * **API contract tests:** pagination, filters, RBAC, rate limits. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 15) Integration points | ||||
|  | ||||
| * **Backend Policy Engine** (in Scanner.WebService): calls `POST /excititor/resolve` (scope `vex.read`) with batched `(purl, vulnId)` pairs to fetch `rollupStatus + sources`. | ||||
| * **Concelier**: provides alias graph (CVE↔vendor IDs) and may supply VEX‑adjacent metadata (e.g., KEV flag) for policy escalation. | ||||
| * **UI**: VEX explorer screens use `/observations/search`, `/linksets/search`, and `/consensus/search`; show conflicts & provenance. | ||||
| * **CLI**: `stella vex linksets export --since 7d --out vex-linksets.json` (optionally `--include-consensus`) for audits and Offline Kit parity. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 16) Failure modes & fallback | ||||
|  | ||||
| * **Provider unreachable:** stale thresholds trigger warnings; policy can down‑weight stale providers automatically (freshness factor). | ||||
| * **Signature outage:** continue to ingest but mark `signatureState.verified=false`; consensus will likely exclude or down‑weight per policy. | ||||
| * **Schema drift:** unknown fields are preserved as `evidence`; normalization rejects only on **invalid identity** or **status**. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 17) Rollout plan (incremental) | ||||
|  | ||||
| 1. **MVP**: OpenVEX + CSAF connectors for 3 major providers (e.g., Red Hat/SUSE/Ubuntu), normalization + consensus + `/excititor/resolve`. | ||||
| 2. **Signature policies**: PGP for distros; cosign for OCI. | ||||
| 3. **Exports + optional attestation**. | ||||
| 4. **CycloneDX VEX** connectors; platform claim expansion tables; UI explorer. | ||||
| 5. **Scale hardening**: export indexes; conflict analytics. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 18) Operational runbooks | ||||
|  | ||||
| * **Statement backfill** — see `docs/dev/EXCITITOR_STATEMENT_BACKFILL.md` for the CLI workflow, required permissions, observability guidance, and rollback steps. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 19) Appendix — canonical JSON (stable ordering) | ||||
|  | ||||
| All exports and consensus entries are serialized via `VexCanonicalJsonSerializer`: | ||||
|  | ||||
| * UTF‑8 without BOM; | ||||
| * keys sorted (ASCII); | ||||
| * arrays sorted by `(providerId, vulnId, productKey, lastObserved)` unless semantic order mandated; | ||||
| * timestamps in `YYYY‑MM‑DDThh:mm:ssZ`; | ||||
| * no insignificant whitespace. | ||||
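The rules above can be approximated in a few lines of Python. This is an illustrative sketch only: `VexCanonicalJsonSerializer` is the real implementation, and the function names here are hypothetical. Note that `sort_keys=True` orders keys by Unicode code point, which coincides with ASCII ordering only when all keys are ASCII.

```python
import json


def canonical_json(value):
    """Serialize with sorted keys and no insignificant whitespace, as UTF-8 without BOM."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")  # the "utf-8" codec never emits a BOM


def sort_entries(entries):
    """Order array entries by (providerId, vulnId, productKey, lastObserved)."""
    return sorted(
        entries,
        key=lambda e: (e["providerId"], e["vulnId"], e["productKey"], e["lastObserved"]),
    )
```

Sorting arrays before serialization, rather than inside the serializer, keeps the exception for semantically ordered arrays easy to apply: those arrays are simply passed through unsorted.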
|  | ||||
							
								
								
									
										21
									
								
								docs/modules/excititor/implementation_plan.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,21 @@ | ||||
| # Implementation plan — Excititor | ||||
|  | ||||
| ## Current objectives | ||||
| - Maintain deterministic behaviour and offline parity across releases. | ||||
| - Keep documentation, telemetry, and runbooks aligned with the latest sprint outcomes. | ||||
|  | ||||
| ## Workstreams | ||||
| - Backlog grooming: reconcile open stories in ../../TASKS.md with this module's roadmap. | ||||
| - Implementation: collaborate with service owners to land feature work defined in SPRINTS/EPIC docs. | ||||
| - Validation: extend tests/fixtures to preserve determinism and provenance requirements. | ||||
|  | ||||
| ## Epic milestones | ||||
| - **Epic 1 – AOC enforcement:** enforce immutable VEX observation schema, provenance capture, and guardrails. | ||||
| - **Epic 7 – VEX Consensus Lens:** provide lens-ready metadata (issuer trust, temporal scoping) and consensus APIs. | ||||
| - **Epic 8 – Advisory AI:** guarantee citation-ready payloads and normalized context for AI summaries/explainers. | ||||
| - Track DOCS-LNM-22-006/007 and CLI-EXC-25-001..002 in ../../TASKS.md. | ||||
|  | ||||
| ## Coordination | ||||
| - Review ./AGENTS.md before picking up new work. | ||||
| - Sync with cross-cutting teams noted in ../../implplan/SPRINTS.md. | ||||
| - Update this plan whenever scope, dependencies, or guardrails change. | ||||
							
								
								
									
										164
									
								
								docs/modules/excititor/mirrors.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,164 @@ | ||||
| # architecture_excititor_mirrors.md — Excititor Mirror Distribution | ||||
|  | ||||
| > **Status:** Draft (Sprint 7). Complements `docs/modules/excititor/architecture.md` by describing the mirror export surface exposed by `Excititor.WebService` and the configuration hooks used by operators and downstream mirrors. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 0) Purpose | ||||
|  | ||||
| Excititor publishes canonical VEX consensus data. Operators (or StellaOps-managed mirrors) need a deterministic way to sync those exports into downstream environments. Mirror distribution provides: | ||||
|  | ||||
| * A declarative map of export bundles (`json`, `jsonl`, `openvex`, `csaf`) reachable via signed HTTP endpoints under `/excititor/mirror`. | ||||
| * Thin quota/authentication controls on top of the existing export cache so mirrors cannot starve the web service. | ||||
| * Stable payload shapes that downstream automation can monitor (index → fetch updates → download artifact → verify signature). | ||||
|  | ||||
| Mirror endpoints are intentionally **read-only**. Write paths (export generation, attestation, cache) remain the responsibility of the export pipeline. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) Configuration model | ||||
|  | ||||
| The web service reads mirror configuration from `Excititor:Mirror` (YAML/JSON/appsettings). Each domain groups a set of exports that share rate limits and authentication rules. | ||||
|  | ||||
| ```yaml | ||||
| Excititor: | ||||
|   Mirror: | ||||
|     Domains: | ||||
|       - id: primary | ||||
|         displayName: Primary Mirror | ||||
|         requireAuthentication: false | ||||
|         maxIndexRequestsPerHour: 600 | ||||
|         maxDownloadRequestsPerHour: 1200 | ||||
|         exports: | ||||
|           - key: consensus | ||||
|             format: json | ||||
|             filters: | ||||
|               vulnId: CVE-2025-0001 | ||||
|               productKey: pkg:test/demo | ||||
|             sort: | ||||
|               createdAt: false     # descending | ||||
|             limit: 1000 | ||||
|           - key: consensus-openvex | ||||
|             format: openvex | ||||
|             filters: | ||||
|               vulnId: CVE-2025-0001 | ||||
| ``` | ||||
|  | ||||
| ### Root settings | ||||
|  | ||||
| | Field | Required | Description | | ||||
| | --- | --- | --- | | ||||
| | `outputRoot` | – | Filesystem root where mirror artefacts are written. Defaults to the Excititor file-system artifact store root when omitted. | | ||||
| | `directoryName` | – | Optional subdirectory created under `outputRoot`; defaults to `mirror`. | | ||||
| | `targetRepository` | – | Hint propagated to manifests/index files indicating the operator-visible location (for example `s3://mirror/excititor`). | | ||||
| | `signing` | – | Bundle signing configuration. When enabled, the exporter emits a detached JWS (`bundle.json.jws`) alongside each domain bundle. | | ||||
|  | ||||
| `signing` supports the following fields: | ||||
|  | ||||
| | Field | Required | Description | | ||||
| | --- | --- | --- | | ||||
| | `enabled` | – | Toggles detached signing for domain bundles. | | ||||
| | `algorithm` | – | Signing algorithm identifier (default `ES256`). | | ||||
| | `keyId` | ✅ (when `enabled`) | Signing key identifier resolved via the configured crypto provider registry. | | ||||
| | `provider` | – | Optional provider hint when multiple registries are available. | | ||||
| | `keyPath` | – | Optional PEM path used to seed the provider when the key is not already loaded. | | ||||
|  | ||||
| ### Domain field reference | ||||
|  | ||||
| | Field | Required | Description | | ||||
| | --- | --- | --- | | ||||
| | `id` | ✅ | Stable identifier. Appears in URLs (`/excititor/mirror/domains/{id}`) and download filenames. | | ||||
| | `displayName` | – | Human-friendly label surfaced in the `/domains` listing. Falls back to `id`. | | ||||
| | `requireAuthentication` | – | When `true` the service enforces that the caller is authenticated (Authority token). | | ||||
| | `maxIndexRequestsPerHour` | – | Per-domain quota for index endpoints. `0`/negative disables the guard. | | ||||
| | `maxDownloadRequestsPerHour` | – | Per-domain quota for artifact downloads. | | ||||
| | `exports` | ✅ | Collection of export projections. | | ||||
|  | ||||
| Export-level fields: | ||||
|  | ||||
| | Field | Required | Description | | ||||
| | --- | --- | --- | | ||||
| | `key` | ✅ | Unique key within the domain. Used in URLs (`/exports/{key}`) and filenames/bundle entries. | | ||||
| | `format` | ✅ | One of `json`, `jsonl`, `openvex`, `csaf`. Maps to `VexExportFormat`. | | ||||
| | `filters` | – | Key/value pairs executed via `VexQueryFilter`. Keys must match export data source columns (e.g., `vulnId`, `productKey`). | | ||||
| | `sort` | – | Key/boolean map (false = descending). | | ||||
| | `limit`, `offset`, `view` | – | Optional query bounds passed through to the export query. | | ||||
|  | ||||
| ⚠️ **Misconfiguration:** invalid formats or missing keys cause exports to be flagged with `status` in the index response; they are not exposed downstream. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2) HTTP surface | ||||
|  | ||||
| Routes are grouped under `/excititor/mirror`. | ||||
|  | ||||
| | Method | Path | Description | | ||||
| | --- | --- | --- | | ||||
| | `GET` | `/domains` | Returns configured domains with quota metadata. | | ||||
| | `GET` | `/domains/{domainId}` | Domain detail (auth/quota + export keys). `404` for unknown domains. | | ||||
| | `GET` | `/domains/{domainId}/index` | Lists exports with exportId, query signature, format, artifact digest, attestation metadata, and size. Applies index quota. | | ||||
| | `GET` | `/domains/{domainId}/exports/{exportKey}` | Returns manifest metadata (single export). `404` if unknown/missing. | | ||||
| | `GET` | `/domains/{domainId}/exports/{exportKey}/download` | Streams export content from the artifact store. Applies download quota. | | ||||
|  | ||||
| Responses are serialized via `VexCanonicalJsonSerializer` ensuring stable ordering. Download responses include a content-disposition header naming the file `<domain>-<export>.<ext>`. | ||||
|  | ||||
| ### Error handling | ||||
|  | ||||
| * `401` – authentication required (`requireAuthentication=true`). | ||||
| * `404` – domain/export not found or manifest not persisted. | ||||
| * `429` – per-domain quota exceeded (`Retry-After` header set in seconds). | ||||
| * `503` – export misconfiguration (invalid format/query). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3) Rate limiting | ||||
|  | ||||
| `MirrorRateLimiter` implements a simple rolling 1-hour window using `IMemoryCache`. Each domain has two quotas: | ||||
|  | ||||
| * `index` scope → `maxIndexRequestsPerHour` | ||||
| * `download` scope → `maxDownloadRequestsPerHour` | ||||
|  | ||||
| `0` or negative limits disable enforcement. Quotas are best-effort (per-instance). For HA deployments, configure sticky routing at the ingress or replace the limiter with a distributed implementation. | ||||
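A rolling one-hour quota per `(domain, scope)` can be sketched as follows. This is not `MirrorRateLimiter` itself, which is built on `IMemoryCache`; it is a sliding-log approximation in Python with a hypothetical class name, showing the per-instance nature of the quota and how a `Retry-After` value could be derived.

```python
import math
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # rolling 1-hour window


class RollingWindowLimiter:
    """In-memory, per-instance limiter keyed by (domain_id, scope)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._hits = defaultdict(deque)

    def try_acquire(self, domain_id, scope, limit):
        """Return (allowed, retry_after_seconds)."""
        if limit <= 0:
            return True, 0  # zero or negative limits disable enforcement
        now = self._clock()
        hits = self._hits[(domain_id, scope)]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        if len(hits) < limit:
            hits.append(now)
            return True, 0
        # The oldest hit leaving the window frees the next slot.
        return False, math.ceil(WINDOW_SECONDS - (now - hits[0]))
```

Because the state lives in process memory, two instances behind a load balancer each grant the full quota, which is why sticky routing or a distributed limiter is suggested for HA deployments.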
|  | ||||
| --- | ||||
|  | ||||
| ## 4) Interaction with export pipeline | ||||
|  | ||||
| Mirror endpoints consume manifests produced by the export engine (`MongoVexExportStore`). They do **not** trigger new exports. Operators must configure connectors/exporters to keep targeted exports fresh (see `EXCITITOR-EXPORT-01-005/006/007`). | ||||
|  | ||||
| Recommended workflow: | ||||
|  | ||||
| 1. Define export plans at the export layer (JSON/OpenVEX/CSAF). | ||||
| 2. Configure mirror domains mapping to those plans. | ||||
| 3. Downstream mirror automation: | ||||
|    * `GET /domains/{id}/index` | ||||
|    * Compare `exportId` / `consensusRevision` | ||||
|    * `GET /download` when new | ||||
|    * Verify digest + attestation | ||||
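The compare-and-verify steps of that workflow can be sketched without any network code. The helper names below are hypothetical; `exportId` is the index field named in this document, while the way the expected digest is passed in is an assumption because the index payload shape is not spelled out here. Attestation verification is out of scope for this sketch.

```python
import hashlib


def needs_download(index_entry, last_seen_export_id):
    """True when the index advertises an export the mirror has not synced yet."""
    return index_entry["exportId"] != last_seen_export_id


def verify_sha256(payload, expected_hex):
    """Check downloaded bytes against the advertised SHA-256 digest."""
    expected = expected_hex.strip().lower()
    if expected.startswith("sha256:"):  # tolerate a prefixed digest form
        expected = expected[len("sha256:"):]
    return hashlib.sha256(payload).hexdigest() == expected
```

A mirror would call `needs_download` for each index entry, fetch the artifact only when it returns true, and discard the download if `verify_sha256` fails.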
|  | ||||
| When the export engine runs, it materializes the following artefacts under `outputRoot/<directoryName>`: | ||||
|  | ||||
| - `index.json` – canonical index listing each configured domain, manifest/bundle descriptors (with SHA-256 digests), and available export keys. | ||||
| - `<domain>/manifest.json` – per-domain summary with export metadata (query signature, consensus/score digests, source providers) and a descriptor pointing at the bundle. | ||||
| - `<domain>/bundle.json` – canonical payload containing serialized consensus, score envelopes, and normalized VEX claims for the matching export definitions. | ||||
| - `<domain>/bundle.json.jws` – optional detached JWS when signing is enabled. | ||||
|  | ||||
| Downstream automation reads `manifest.json`/`bundle.json` directly, while `/excititor/mirror` endpoints stream the same artefacts through authenticated HTTP. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Operational guidance | ||||
|  | ||||
| * Track quota utilisation via HTTP 429 metrics (configure structured logging or OTEL counters when rate limiting triggers). | ||||
| * Mirror domains can be deployed per tenant (e.g., `tenant-a`, `tenant-b`) with different auth requirements. | ||||
| * Ensure the underlying artifact stores (`FileSystem`, `S3`, offline bundle) retain artefacts long enough for mirrors to sync. | ||||
| * For air-gapped mirrors, combine mirror endpoints with the Offline Kit (see `docs/24_OFFLINE_KIT.md`). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Future alignment | ||||
|  | ||||
| * Replace manual export definitions with generated mirror bundle manifests once `EXCITITOR-EXPORT-01-007` ships. | ||||
| * Extend `/index` payload with quiet-provenance when `EXCITITOR-EXPORT-01-006` adds that metadata. | ||||
| * Integrate domain manifests with DevOps mirror profiles (`DEVOPS-MIRROR-08-001`) so helm/compose overlays can enable or disable domains declaratively. | ||||
|  | ||||
							
								
								
									
										104
									
								
								docs/modules/excititor/scoring.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,104 @@ | ||||
| ## Status | ||||
|  | ||||
| This document tracks the forward-looking risk scoring model for Excititor. The calculation below is not active yet; Sprint 7 work will add the required schema fields, policy controls, and services. Until that ships, Excititor emits consensus statuses without numeric scores. | ||||
|  | ||||
| ## Scoring model (target state) | ||||
|  | ||||
| **S = Gate(VEX_status) × W_trust(source) × [Severity_base × (1 + α·KEV + β·EPSS)]** | ||||
|  | ||||
| * **Gate(VEX_status)**: `affected`/`under_investigation` → 1, `not_affected`/`fixed` → 0. A trusted “not affected” or “fixed” still zeroes the score. | ||||
| * **W_trust(source)**: normalized policy weight (baseline 0‒1). Policies may opt into >1 boosts for signed vendor feeds once Phase 1 closes. | ||||
| * **Severity_base**: canonical numeric severity from Concelier (CVSS or org-defined scale). | ||||
| * **KEV flag**: 0/1 boost when CISA Known Exploited Vulnerabilities applies. | ||||
| * **EPSS**: probability [0,1]; bounded multiplier. | ||||
| * **α, β**: configurable coefficients (default α=0.25, β=0.5) stored in policy. | ||||
|  | ||||
| Safeguards: freeze boosts when product identity is unknown, clamp outputs ≥0, and log every factor in the audit trail. | ||||
|  | ||||
| ## Implementation roadmap | ||||
|  | ||||
| | Phase | Scope | Artifacts | | ||||
| | --- | --- | --- | | ||||
| | **Phase 1 – Schema foundations** | Extend Excititor consensus/claims and Concelier canonical advisories with severity, KEV, EPSS, and expose α/β + weight ceilings in policy. | Sprint 7 tasks `EXCITITOR-CORE-02-001`, `EXCITITOR-POLICY-02-001`, `EXCITITOR-STORAGE-02-001`, `FEEDCORE-ENGINE-07-001`. | | ||||
| | **Phase 2 – Deterministic score engine** | Implement a scoring component that executes alongside consensus and persists score envelopes with hashes. | Planned task `EXCITITOR-CORE-02-002` (backlog). | | ||||
| | **Phase 3 – Surfacing & enforcement** | Expose scores via WebService/CLI, integrate with Concelier noise priors, and enforce policy-based suppressions. | To be scheduled after Phase 2. | | ||||
|  | ||||
| ## Policy controls (Phase 1) | ||||
|  | ||||
| Operators tune scoring inputs through the Excititor policy document: | ||||
|  | ||||
| ```yaml | ||||
| excititor: | ||||
|   policy: | ||||
|     weights: | ||||
|       vendor: 1.10      # per-tier weight | ||||
|       ceiling: 1.40     # max clamp applied to tiers and overrides (1.0‒5.0) | ||||
|     providerOverrides: | ||||
|       trusted.vendor: 1.35 | ||||
|     scoring: | ||||
|       alpha: 0.30       # KEV boost coefficient (defaults to 0.25) | ||||
|       beta: 0.60        # EPSS boost coefficient (defaults to 0.50) | ||||
| ``` | ||||
|  | ||||
| * All weights (tiers + overrides) are clamped to `[0, weights.ceiling]` with structured warnings when a value is out of range or not a finite number. | ||||
| * `weights.ceiling` itself is constrained to `[1.0, 5.0]`, preserving prior behaviour when omitted. | ||||
| * `scoring.alpha` / `scoring.beta` accept non-negative values up to 5.0; values outside the range fall back to defaults and surface diagnostics to operators. | ||||
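The validation rules above can be sketched as two small helpers. The function names and warning strings are hypothetical. The sketch also raises on non-finite weights because this document says a warning is emitted but not which value is substituted.

```python
import math


def clamp_weight(value, ceiling, warnings, name="weight"):
    """Clamp a tier or override weight to [0, ceiling], with ceiling held to [1.0, 5.0]."""
    if not math.isfinite(value):
        # Assumption: the service warns here; the substituted value is not documented.
        raise ValueError(f"{name} is not a finite number")
    ceiling = min(max(ceiling, 1.0), 5.0)
    clamped = min(max(value, 0.0), ceiling)
    if clamped != value:
        warnings.append(f"{name}={value} clamped to {clamped}")
    return clamped


def coefficient_or_default(value, default, warnings, name="coefficient"):
    """Accept alpha/beta in [0, 5.0]; anything else falls back to the default."""
    if not math.isfinite(value) or value < 0.0 or value > 5.0:
        warnings.append(f"{name}={value} out of range, using default {default}")
        return default
    return value
```

With the sample policy, `trusted.vendor: 1.35` passes through unchanged under a ceiling of 1.40, while an override of 1.60 would be clamped to 1.40 and reported.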
|  | ||||
| ## Data model (after Phase 1) | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "vulnerabilityId": "CVE-2025-12345", | ||||
|   "product": "pkg:name@version", | ||||
|   "consensus": { | ||||
|     "status": "affected", | ||||
|     "policyRevisionId": "rev-12", | ||||
|     "policyDigest": "0D9AEC…" | ||||
|   }, | ||||
|   "signals": { | ||||
|     "severity": {"scheme": "CVSS:3.1", "score": 7.5}, | ||||
|     "kev": true, | ||||
|     "epss": 0.40 | ||||
|   }, | ||||
|   "policy": { | ||||
|     "weight": 1.15, | ||||
|     "alpha": 0.25, | ||||
|     "beta": 0.5 | ||||
|   }, | ||||
|   "score": { | ||||
|     "value": 12.51, | ||||
|     "generatedAt": "2025-11-05T14:12:30Z", | ||||
|     "audit": [ | ||||
|       "gate:affected", | ||||
|       "weight:1.15", | ||||
|       "severity:7.5", | ||||
|       "kev:1", | ||||
|       "epss:0.40" | ||||
|     ] | ||||
|   } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| ## Operational guidance | ||||
|  | ||||
| * **Inputs**: Concelier delivers severity/KEV/EPSS via the advisory event log; Excititor connectors load VEX statements. Policy owns trust tiers and coefficients. | ||||
| * **Processing**: the scoring engine (Phase 2) runs next to consensus, storing results with deterministic hashes so exports and attestations can reference them. | ||||
| * **Consumption**: WebService/CLI will return consensus plus score; scanners may suppress findings only when policy-authorized VEX gating and signed score envelopes agree. | ||||
|  | ||||
| ## Pseudocode (Phase 2 preview) | ||||
|  | ||||
| ```python | ||||
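| def risk_score(gate, weight, severity, kev, epss, alpha, beta, freeze_boosts=False): | ||||
|     """Return weight * severity * (1 + alpha*kev + beta*epss), or 0 when the gate is closed.""" | ||||
|     if gate == 0:  # not_affected / fixed zero the score | ||||
|         return 0 | ||||
|     if freeze_boosts:  # product identity unknown: ignore KEV/EPSS boosts | ||||
|         kev, epss = 0, 0 | ||||
|     boost = 1 + alpha * kev + beta * epss | ||||
|     return max(0, weight * severity * boost)  # clamp output to >= 0 | ||||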
| ``` | ||||
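As a worked example, the function can be evaluated with the signal and policy values used in the data-model sample (weight 1.15, severity 7.5, KEV set, EPSS 0.40, α = 0.25, β = 0.5). The block repeats the function so it runs on its own; the variable names are illustrative. The boost is 1 + 0.25·1 + 0.5·0.40 = 1.45, so the score is 1.15 × 7.5 × 1.45 = 12.50625.

```python
def risk_score(gate, weight, severity, kev, epss, alpha, beta, freeze_boosts=False):
    if gate == 0:
        return 0
    if freeze_boosts:
        kev, epss = 0, 0
    boost = 1 + alpha * kev + beta * epss
    return max(0, weight * severity * boost)


inputs = dict(weight=1.15, severity=7.5, kev=1, epss=0.40, alpha=0.25, beta=0.5)

affected = risk_score(gate=1, **inputs)                     # about 12.506
frozen = risk_score(gate=1, freeze_boosts=True, **inputs)   # about 8.625, boosts ignored
gated = risk_score(gate=0, **inputs)                        # 0, not_affected/fixed

print(round(affected, 3), round(frozen, 3), gated)
```

The frozen case shows the safeguard for unknown product identity: the KEV and EPSS boosts drop out and only weight × severity remains.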
|  | ||||
| ## FAQ | ||||
|  | ||||
| * **Can operators opt out?** Set α=β=0 or keep weights ≤1.0 via policy. | ||||
| * **What about missing signals?** Treat them as zero and log the omission. | ||||
| * **When will this ship?** Phase 1 is planned for Sprint 7; later phases depend on connector coverage and attestation delivery. | ||||