# component_architecture_excititor.md — **StellaOps Excititor** (2025Q4)
> **Scope.** This document specifies the **Excititor** service: its purpose, trust model, data structures, APIs, plugin contracts, storage schema, normalization/consensus algorithms, performance budgets, testing matrix, and how it integrates with Scanner, Policy, Concelier, and the attestation chain. It is implementation-ready.
---
## 0) Mission & role in the platform
**Mission.** Convert heterogeneous **VEX** statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into **canonical, queryable claims**; compute **deterministic consensus** per *(vuln, product)*; preserve **conflicts with provenance**; publish **stable, attestable exports** that the backend uses to suppress non-exploitable findings, prioritize remaining risk, and explain decisions.
**Boundaries.**
* Excititor **does not** decide PASS/FAIL. It supplies **evidence** (statuses + justifications + provenance weights).
* Excititor preserves **conflicting claims** unchanged; consensus encodes how we would pick, but the raw set is always exportable.
* VEX consumption is **backend-only**: Scanner never applies VEX. The backend's **Policy Engine** asks Excititor for status evidence and then decides what to show.
---
## 1) Inputs, outputs & canonical domain
### 1.1 Accepted input formats (ingest)
* **OpenVEX** JSON documents (attested or raw).
* **CSAF VEX** 2.x (vendor PSIRTs and distros commonly publish CSAF).
* **CycloneDX VEX** 1.4+ (standalone VEX or embedded VEX blocks).
* **OCI-attached attestations** (VEX statements shipped as OCI referrers), via optional connectors.
All connectors register **source metadata**: provider identity, trust tier, signature expectations (PGP/cosign/PKI), fetch windows, rate limits, and time anchors.
### 1.2 Canonical model (normalized)
Every incoming statement becomes a set of **VexClaim** records:
```
VexClaim
- providerId          // 'redhat', 'suse', 'ubuntu', 'github', 'vendorX'
- vulnId              // 'CVE-2025-12345', 'GHSA-xxxx', canonicalized
- productKey          // canonical product identity (see §2.2)
- status              // affected | not_affected | fixed | under_investigation
- justification?      // for 'not_affected'/'affected' where provided
- introducedVersion?  // semantics per provider (range or exact)
- fixedVersion?       // where provided (range or exact)
- lastObserved        // timestamp from source or fetch time
- provenance          // doc digest, signature status, fetch URI, line/offset anchors
- evidence[]          // raw source snippets for explainability
- supersedes?         // optional cross-doc chain (docDigest → docDigest)
```
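As an illustration only, the record above can be mirrored as an immutable value type. This is a minimal Python sketch, not the service's own definition: the snake_case field names, the `Provenance` helper, and the sample URI are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional, Tuple

# Canonical statuses as listed in the VexClaim description.
STATUSES = ("affected", "not_affected", "fixed", "under_investigation")


@dataclass(frozen=True)
class Provenance:
    """Hypothetical, reduced provenance: digest, fetch URI, signature state."""
    doc_digest: str
    fetch_uri: str
    signature_verified: bool


@dataclass(frozen=True)
class VexClaim:
    """Illustrative mirror of the VexClaim shape; optional fields default to empty."""
    provider_id: str
    vuln_id: str
    product_key: str
    status: str
    last_observed: datetime
    provenance: Provenance
    justification: Optional[str] = None
    introduced_version: Optional[str] = None
    fixed_version: Optional[str] = None
    evidence: Tuple[str, ...] = ()
    supersedes: Optional[str] = None

    def __post_init__(self):
        # Reject anything outside the canonical status vocabulary.
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")


claim = VexClaim(
    provider_id="redhat",
    vuln_id="CVE-2025-12345",
    product_key="pkg:rpm/redhat/openssl",
    status="not_affected",
    last_observed=datetime(2025, 10, 1, tzinfo=timezone.utc),
    provenance=Provenance("sha256:abc", "https://example.invalid/doc.json", True),
    justification="component_not_present",
)
# Claims are immutable; a changed statement is a new record.
updated = replace(claim, status="fixed", justification=None, fixed_version="1.2.3")
```

Freezing the type matches the append-only intent of the statement log: a new observation produces a new record instead of mutating an old one.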
### 1.3 Exports (consumption)
* **VexConsensus** per `(vulnId, productKey)` with:
  * `rollupStatus` (after policy weights/justification gates),
  * `sources[]` (winning + losing claims with weights & reasons),
  * `policyRevisionId` (identifier of the Excititor policy used),
  * `consensusDigest` (stable SHA-256 over canonical JSON).
* **Raw claims** export for auditing (unchanged, with provenance).
* **Provider snapshots** (per source, last N days) for operator debugging.
* **Index** optimized for backend joins: `(productKey, vulnId) → (status, confidence, sourceSet)`.
All exports are **deterministic**, and (optionally) **attested** via DSSE and logged to Rekor v2.
---
## 2) Identity model — products & joins
### 2.1 Vuln identity
* Accepts **CVE**, **GHSA**, vendor IDs (MSRC, RHSA…), distro IDs (DSA/USN/RHSA…) — normalized to `vulnId` with alias sets.
* **Alias graph** maintained (from Concelier) to map vendor/distro IDs → CVE (primary) and to **GHSA** where applicable.
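A minimal sketch of alias resolution, assuming the alias graph arrives as plain sets of equivalent identifiers. The alias data, the "lowest CVE wins" choice of primary, and the function names are hypothetical; real vendor advisories can map to several CVEs, which this sketch does not model.

```python
# Hypothetical alias sets; in the service the alias graph comes from Concelier.
ALIAS_SETS = [
    {"CVE-2025-12345", "GHSA-xxxx-yyyy-zzzz", "RHSA-2025:0001"},
]


def build_alias_index(alias_sets):
    """Map every alias (upper-cased) to the primary ID of its set.

    The primary is the lexicographically smallest CVE in the set, or the
    smallest alias when the set has no CVE. Sorting keeps the choice deterministic.
    """
    index = {}
    for aliases in alias_sets:
        cves = sorted(a for a in aliases if a.upper().startswith("CVE-"))
        primary = cves[0] if cves else sorted(aliases)[0]
        for alias in aliases:
            index[alias.upper()] = primary
    return index


def canonical_vuln_id(raw_id, index):
    """Return the primary ID for a known alias, else the trimmed, upper-cased input."""
    key = raw_id.strip().upper()
    return index.get(key, key)


alias_index = build_alias_index(ALIAS_SETS)
primary = canonical_vuln_id("ghsa-xxxx-yyyy-zzzz", alias_index)
```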
### 2.2 Product identity (`productKey`)
* **Primary:** `purl` (Package URL).
* **Secondary links:** `cpe`, **OS package NVRA/EVR**, NuGet/Maven/Golang identity, and **OS package name** when purl unavailable.
* **Fallback:** `oci:<registry>/<repo>@<digest>` for image-level VEX.
* **Special cases:** kernel modules, firmware, platforms → provider-specific mapping helpers (connector captures provider's product taxonomy → canonical `productKey`).
> Excititor does not invent identities. If a provider cannot be mapped to purl/CPE/NVRA deterministically, we keep the native **product string** and mark the claim as **non-joinable**; the backend will ignore it unless a policy explicitly whitelists that provider mapping.
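The precedence described above can be sketched as a small resolver. This is an assumption-laden illustration: the function and type names are hypothetical, and the OS-package (NVRA/EVR) mapping helpers are left out because they depend on provider-specific tables.

```python
from typing import NamedTuple, Optional


class ProductIdentity(NamedTuple):
    product_key: str
    joinable: bool


def resolve_product_key(
    purl: Optional[str] = None,
    cpe: Optional[str] = None,
    image_ref: Optional[str] = None,  # "<registry>/<repo>@<digest>"
    native: str = "",
) -> ProductIdentity:
    """Pick the strongest available identity: purl, then CPE, then OCI digest.

    When nothing maps deterministically, keep the provider's native product
    string and flag the claim as non-joinable instead of guessing.
    """
    if purl:
        return ProductIdentity(purl, True)
    if cpe:
        return ProductIdentity(cpe, True)
    if image_ref:
        return ProductIdentity("oci:" + image_ref, True)
    return ProductIdentity(native, False)


by_purl = resolve_product_key(purl="pkg:rpm/redhat/openssl", cpe="cpe:2.3:a:openssl:openssl")
unmapped = resolve_product_key(native="Vendor Appliance 9000")
```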
---
## 3) Storage schema (MongoDB)
Database: `excititor`
### 3.1 Collections
**`vex.providers`**
```
_id: providerId
name, homepage, contact
trustTier: enum {vendor, distro, platform, hub, attestation}
signaturePolicy: { type: pgp|cosign|x509|none, keys[], certs[], cosignKeylessRoots[] }
fetch: { baseUrl, kind: http|oci|file, rateLimit, etagSupport, windowDays }
enabled: bool
createdAt, modifiedAt
```
**`vex.raw`** (immutable raw documents)
```
_id: sha256(doc bytes)
providerId
uri
ingestedAt
contentType
sig: { verified: bool, method: pgp|cosign|x509|none, keyId|certSubject, bundle? }
payload: GridFS pointer (if large)
disposition: kept|replaced|superseded
correlation: { replaces?: sha256, replacedBy?: sha256 }
```
**`vex.statements`** (immutable normalized rows; append-only event log)
```
_id: ObjectId
providerId
vulnId
productKey
status
justification?
introducedVersion?
fixedVersion?
lastObserved
docDigest
provenance { uri, line?, pointer?, signatureState }
evidence[] { key, value, locator }
signals? {
  severity? { scheme, score?, label?, vector? }
  kev?: bool
  epss?: double
}
insertedAt
indices:
  - {vulnId:1, productKey:1}
  - {providerId:1, insertedAt:-1}
  - {docDigest:1}
  - {status:1}
  - text index (optional) on evidence.value for debugging
```
**`vex.consensus`** (rollups)
```
_id: sha256(canonical(vulnId, productKey, policyRevision))
vulnId
productKey
rollupStatus
sources[]: [
  { providerId, status, justification?, weight, lastObserved, accepted:bool, reason }
]
policyRevisionId
evaluatedAt
signals? {
  severity? { scheme, score?, label?, vector? }
  kev?: bool
  epss?: double
}
consensusDigest // same as _id
indices:
  - {vulnId:1, productKey:1}
  - {policyRevisionId:1, evaluatedAt:-1}
```
**`vex.exports`** (manifest of emitted artifacts)
```
_id
querySignature
format: raw|consensus|index
artifactSha256
rekor { uuid, index, url }?
createdAt
policyRevisionId
cacheable: bool
```
**`vex.cache`**
```
querySignature -> exportId (for fast reuse)
ttl, hits
```
**`vex.migrations`**
* ordered migrations applied at bootstrap to ensure indexes.
* `20251019-consensus-signals-statements` introduces the statements log indexes and the `policyRevisionId + evaluatedAt` lookup for consensus — rerun consensus writers once to hydrate newly persisted signals.
### 3.2 Indexing strategy
* Hot path queries use exact `(vulnId, productKey)` and time-bounded windows; compound indexes cover both.
* The providers list view sorts by `lastObserved` so operators can monitor staleness.
* `vex.consensus` keyed by `(vulnId, productKey, policyRevision)` for deterministic reuse.
---
## 4) Ingestion pipeline
### 4.1 Connector contract
```csharp
public interface IVexConnector
{
    string ProviderId { get; }
    Task FetchAsync(VexConnectorContext ctx, CancellationToken ct);     // raw docs
    Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct); // raw -> VexClaim[]
}
```
* **Fetch** must implement: window scheduling, conditional GET (ETag/If-Modified-Since), rate limiting, retry/backoff.
* **Normalize** parses the format, validates schema, maps product identities deterministically, emits `VexClaim` records with **provenance**.
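Two of the Fetch obligations lend themselves to a short sketch: building conditional GET headers from cached validators, and computing retry delays. The helper names are hypothetical, and the "full jitter" exponential backoff is one common choice, not something this document prescribes. `If-None-Match` and `If-Modified-Since` are the standard HTTP conditional request headers.

```python
import random


def conditional_headers(etag=None, last_modified=None):
    """Build conditional GET headers from validators cached on the previous fetch.

    A 304 Not Modified response then lets the connector skip re-ingesting the document.
    """
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def backoff_delays(attempts, base=1.0, cap=60.0, rng=None):
    """Exponential backoff with full jitter: delay n is uniform in [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** n)) for n in range(attempts)]


headers = conditional_headers(etag='"abc123"', last_modified="Tue, 21 Oct 2025 06:37:07 GMT")
delays = backoff_delays(attempts=8, rng=random.Random(7))  # seeded for reproducible tests
```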
### 4.2 Signature verification (per provider)
* **cosign (keyless or keyful)** for OCI referrers or HTTP-served JSON with Sigstore bundles.
* **PGP** (provider keyrings) for distro/vendor feeds that sign docs.
* **x509** (mutual TLS / provider-pinned certs) where applicable.
* Signature state is stored on **vex.raw.sig** and copied into **provenance.signatureState** on claims.
> Claims from sources failing signature policy are marked `"signatureState.verified=false"` and **policy** can downweight or ignore them.
### 4.3 Time discipline
* For each doc, prefer the **provider's document timestamp**; if absent, use fetch time.
* Claims carry `lastObserved`, which drives **tie-breaking** within equal weight tiers.
---
## 5) Normalization: product & status semantics
### 5.1 Product mapping
* **purl** first; **cpe** second; OS package NVRA/EVR mapping helpers (distro connectors) produce purls via canonical tables (e.g., rpm→purl:rpm, deb→purl:deb).
* Where a provider publishes **platform-level** VEX (e.g., “RHEL 9 not affected”), connectors expand to known product inventory rules (e.g., map to sets of packages/components shipped in the platform). Expansion tables are versioned and kept per provider; every expansion emits **evidence** indicating the rule applied.
* If expansion would be speculative, the claim remains **platform-scoped** with `productKey="platform:redhat:rhel:9"` and is flagged **non-joinable**; backend can decide to use platform VEX only when Scanner proves the platform runtime.
### 5.2 Status + justification mapping
* Canonical **status**: `affected | not_affected | fixed | under_investigation`.
* **Justifications** normalized to a controlled vocabulary (CISA-aligned), e.g.:
  * `component_not_present`
  * `vulnerable_code_not_in_execute_path`
  * `vulnerable_configuration_unused`
  * `inline_mitigation_applied`
  * `fix_available` (with `fixedVersion`)
  * `under_investigation`
* Providers with free-text justifications are mapped by deterministic tables; raw text preserved as `evidence`.
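A deterministic mapping table for free-text justifications might look like the sketch below. The patterns are invented for illustration; real tables would be curated per provider and versioned. The first matching rule wins, and the raw text is always returned so it can be stored as evidence.

```python
import re

# Hypothetical rules, evaluated in order; the first match wins.
JUSTIFICATION_TABLE = [
    (re.compile(r"not (present|shipped|included)", re.I), "component_not_present"),
    (re.compile(r"(not|never) (reachable|executed)", re.I), "vulnerable_code_not_in_execute_path"),
    (re.compile(r"mitigat", re.I), "inline_mitigation_applied"),
]


def normalize_justification(free_text):
    """Return (canonical_justification_or_None, raw_text)."""
    for pattern, canonical in JUSTIFICATION_TABLE:
        if pattern.search(free_text):
            return canonical, free_text
    return None, free_text


mapped = normalize_justification("The vulnerable package is not shipped in this product.")
unknown = normalize_justification("See advisory for details.")
```

Returning `None` for unmatched text, instead of a best guess, keeps the mapping reproducible and leaves the decision to the justification gate in policy.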
---
## 6) Consensus algorithm
**Goal:** produce a **stable**, explainable `rollupStatus` per `(vulnId, productKey)` given possibly conflicting claims.
### 6.1 Inputs
* Set **S** of `VexClaim` for the key.
* **Excititor policy snapshot**:
  * **weights** per provider tier and per provider overrides.
  * **justification gates** (e.g., require justification for `not_affected` to be acceptable).
  * **minEvidence** rules (e.g., `not_affected` must come from ≥1 vendor or 2 distros).
  * **signature requirements** (e.g., require verified signature for `fixed` to be considered).
### 6.2 Steps
1. **Filter invalid** claims by signature policy & justification gates → set `S'`.
2. **Score** each claim:
   `score = weight(provider) * freshnessFactor(lastObserved)`, where `freshnessFactor ∈ [0.8, 1.0]` models staleness decay (configurable; small effect).
3. **Aggregate** scores per status: `W(status) = Σ score(claims with that status)`.
4. **Pick** `rollupStatus = argmax_status W(status)`.
5. **Tie-breakers** (in order):
   * Higher **max single** provider score wins (vendor > distro > platform > hub).
   * More **recent** `lastObserved` wins.
   * A fixed, deterministic precedence order of status (`fixed` > `not_affected` > `under_investigation` > `affected`) is the final tie-breaker.
6. **Explain**: mark accepted sources (`accepted=true; reason="weight"`/`"freshness"`), mark rejected sources with explicit `reason` (`"insufficient_justification"`, `"signature_unverified"`, `"lower_weight"`).
> The algorithm is **pure** given S and policy snapshot; result is reproducible and hashed into `consensusDigest`.
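The steps above can be sketched as a pure function. This is a simplified illustration under stated assumptions, not the service's implementation: claims are plain dicts with hypothetical keys (`tier`, `verified`), the freshness factor decays linearly over an assumed 30-day horizon, and provider overrides and `minEvidence` rules are omitted.

```python
from collections import defaultdict
from datetime import datetime, timezone

TIER_WEIGHTS = {"vendor": 1.0, "distro": 0.9, "platform": 0.7, "hub": 0.5}
# Final tie-breaker: earlier in this list wins.
STATUS_PRECEDENCE = ["fixed", "not_affected", "under_investigation", "affected"]


def freshness(last_observed, now, max_age_days=30.0):
    """Linear decay from 1.0 (fresh) down to 0.8 (at or beyond max_age_days)."""
    age_days = max(0.0, (now - last_observed).total_seconds() / 86400.0)
    return 1.0 - 0.2 * min(age_days / max_age_days, 1.0)


def rejection_reason(claim):
    """Step 1: justification and signature gates."""
    if claim["status"] == "not_affected" and not claim.get("justification"):
        return "insufficient_justification"
    if claim["status"] == "fixed" and not claim.get("verified", False):
        return "signature_unverified"
    return None


def consensus(claims, now):
    totals, best, newest, sources = defaultdict(float), defaultdict(float), {}, []
    for c in claims:
        entry = {"providerId": c["providerId"], "status": c["status"]}
        reason = rejection_reason(c)
        if reason:
            entry.update(accepted=False, reason=reason)
        else:
            score = TIER_WEIGHTS[c["tier"]] * freshness(c["lastObserved"], now)  # step 2
            st = c["status"]
            totals[st] += score                                                  # step 3
            best[st] = max(best[st], score)
            newest[st] = max(newest.get(st, c["lastObserved"]), c["lastObserved"])
            entry["weight"] = score
        sources.append(entry)
    if not totals:
        return None, sources
    # Steps 4-5: argmax with tie-breakers encoded as a sort key.
    winner = max(totals, key=lambda s: (totals[s], best[s], newest[s],
                                        -STATUS_PRECEDENCE.index(s)))
    for entry in sources:                                                        # step 6
        if "accepted" not in entry:
            entry["accepted"] = entry["status"] == winner
            entry["reason"] = "weight" if entry["accepted"] else "lower_weight"
    return winner, sources


now = datetime(2025, 10, 21, tzinfo=timezone.utc)
claims = [
    {"providerId": "redhat", "tier": "vendor", "status": "not_affected",
     "justification": "component_not_present", "verified": True, "lastObserved": now},
    {"providerId": "ubuntu", "tier": "distro", "status": "affected",
     "verified": False, "lastObserved": now},
    {"providerId": "hubX", "tier": "hub", "status": "fixed",
     "verified": False, "lastObserved": now},
]
status, sources = consensus(claims, now)
```

Because the function reads only its arguments, the same claim set and policy always yield the same result, which is what makes hashing the output into a digest meaningful.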
---
## 7) Query & export APIs
All endpoints are versioned under `/api/v1/vex`.
### 7.1 Query (online)
```
POST /claims/search
  body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string }
  → { claims[], nextPageToken? }

POST /consensus/search
  body: { vulnIds?: string[], productKeys?: string[], policyRevisionId?: string, since?: timestamp, limit?: int, pageToken?: string }
  → { entries[], nextPageToken? }

POST /excititor/resolve (scope: vex.read)
  body: { productKeys?: string[], purls?: string[], vulnerabilityIds: string[], policyRevisionId?: string }
  → { policy, resolvedAt, results: [ { vulnerabilityId, productKey, status, sources[], conflicts[], decisions[], signals?, summary?, envelope: { artifact, contentSignature?, attestation?, attestationEnvelope?, attestationSignature? } } ] }
```
### 7.2 Exports (cacheable snapshots)
```
POST /exports
  body: { signature: { vulnFilter?, productFilter?, providers?, since? }, format: raw|consensus|index, policyRevisionId?: string, force?: bool }
  → { exportId, artifactSha256, rekor? }

GET /exports/{exportId}      → bytes (application/json or binary index)
GET /exports/{exportId}/meta → { signature, policyRevisionId, createdAt, artifactSha256, rekor? }
```
### 7.3 Provider operations
```
GET /providers → provider list & signature policy
POST /providers/{id}/refresh → trigger fetch/normalize window
GET /providers/{id}/status → last fetch, doc counts, signature stats
```
**Auth:** service-to-service via Authority tokens; operator operations via UI/CLI with RBAC.
---
## 8) Attestation integration
* Exports can be **DSSE-signed** via **Signer** and logged to **Rekor v2** via **Attestor** (optional but recommended for regulated pipelines).
* `vex.exports.rekor` stores `{uuid, index, url}` when present.
* **Predicate type**: `https://stella-ops.org/attestations/vex-export/1` with fields:
  * `querySignature`, `policyRevisionId`, `artifactSha256`, `createdAt`.
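For orientation, here is a sketch of what a statement carrying this predicate could look like before signing. It assumes the DSSE payload is an in-toto Statement v1, which this document does not state; the subject name and the sample values are placeholders.

```python
import json

PREDICATE_TYPE = "https://stella-ops.org/attestations/vex-export/1"


def build_statement(artifact_sha256, query_signature, policy_revision_id, created_at,
                    artifact_name="vex-export"):
    """Assemble an in-toto Statement v1 (assumed wrapper) around the vex-export predicate."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": artifact_name, "digest": {"sha256": artifact_sha256}}],
        "predicateType": PREDICATE_TYPE,
        "predicate": {
            "querySignature": query_signature,
            "policyRevisionId": policy_revision_id,
            "artifactSha256": artifact_sha256,
            "createdAt": created_at,
        },
    }


statement = build_statement(
    artifact_sha256="0" * 64,              # placeholder digest
    query_signature="example-signature",   # placeholder
    policy_revision_id="rev-1",            # placeholder
    created_at="2025-10-21T06:37:07Z",
)
payload = json.dumps(statement, sort_keys=True, separators=(",", ":"))
```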
---
## 9) Configuration (YAML)
```yaml
excititor:
  mongo: { uri: "mongodb://mongo/excititor" }
  s3:
    endpoint: http://minio:9000
    bucket: stellaops
  policy:
    weights:
      vendor: 1.0
      distro: 0.9
      platform: 0.7
      hub: 0.5
      attestation: 0.6
      ceiling: 1.25
    scoring:
      alpha: 0.25
      beta: 0.5
    providerOverrides:
      redhat: 1.0
      suse: 0.95
    requireJustificationForNotAffected: true
    signatureRequiredForFixed: true
    minEvidence:
      not_affected:
        vendorOrTwoDistros: true
  connectors:
    - providerId: redhat
      kind: csaf
      baseUrl: https://access.redhat.com/security/data/csaf/v2/
      signaturePolicy: { type: pgp, keys: [ "…redhat-pgp-key…" ] }
      windowDays: 7
    - providerId: suse
      kind: csaf
      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
      signaturePolicy: { type: pgp, keys: [ "…suse-pgp-key…" ] }
    - providerId: ubuntu
      kind: openvex
      baseUrl: https://…/vex/
      signaturePolicy: { type: none }
    - providerId: vendorX
      kind: cyclonedx-vex
      ociRef: ghcr.io/vendorx/vex@sha256:…
      signaturePolicy: { type: cosign, cosignKeylessRoots: [ "sigstore-root" ] }
```
### 9.1 WebService endpoints
With storage configured, the WebService exposes the following ingress and diagnostic APIs:
* `GET /excititor/status` returns the active storage configuration and registered artifact stores.
* `GET /excititor/health` simple liveness probe.
* `POST /excititor/statements` accepts normalized VEX statements and persists them via `IVexClaimStore`; use this for migrations/backfills.
* `GET /excititor/statements/{vulnId}/{productKey}?since=` returns the immutable statement log for a vulnerability/product pair.
* `POST /excititor/resolve` requires `vex.read` scope; accepts up to 256 `(vulnId, productKey)` pairs via `productKeys` or `purls` and returns deterministic consensus results, decision telemetry, and a signed envelope (`artifact` digest, optional signer signature, optional attestation metadata + DSSE envelope). Returns **409 Conflict** when the requested `policyRevisionId` mismatches the active snapshot.
Run the ingestion endpoint once after applying migration `20251019-consensus-signals-statements` to repopulate historical statements with the new severity/KEV/EPSS signal fields.
* `weights.ceiling` raises the deterministic clamp applied to provider tiers/overrides (range 1.0 to 5.0). Values outside the range are clamped with warnings so operators can spot typos.
* `scoring.alpha` / `scoring.beta` configure KEV/EPSS boosts for the Phase 1 → Phase 2 scoring pipeline. Defaults (0.25, 0.5) preserve prior behaviour; negative or excessively large values fall back with diagnostics.
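The clamping behaviour for `weights.ceiling` can be sketched as follows. The function names and logger name are hypothetical, and the lower bound of zero for individual weights is an assumption; only the 1.0 to 5.0 range and the warn-on-clamp behaviour come from the text above.

```python
import logging

log = logging.getLogger("excititor.policy")  # hypothetical logger name

CEILING_MIN, CEILING_MAX = 1.0, 5.0


def clamp_ceiling(value):
    """Clamp the configured ceiling into [1.0, 5.0], warning when the input was out of range."""
    clamped = min(max(value, CEILING_MIN), CEILING_MAX)
    if clamped != value:
        log.warning("weights.ceiling %.2f out of range; using %.2f", value, clamped)
    return clamped


def effective_weight(weight, ceiling):
    """Apply the (clamped) ceiling to a tier weight or provider override."""
    return min(max(weight, 0.0), clamp_ceiling(ceiling))


capped = effective_weight(1.4, ceiling=1.25)      # override above the ceiling
typo_guard = clamp_ceiling(12.5)                  # likely typo for 1.25
```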
---
## 10) Security model
* **Input signature verification** enforced per provider policy (PGP, cosign, x509).
* **Connector allowlists**: outbound fetch constrained to configured domains.
* **Tenant isolation**: per-tenant DB prefixes or separate DBs; per-tenant S3 prefixes; per-tenant policies.
* **AuthN/Z**: Authority-issued OpToks; RBAC roles (`vex.read`, `vex.admin`, `vex.export`).
* **No secrets in logs**; deterministic logging contexts include providerId, docDigest, claim keys.
---
## 11) Performance & scale
* **Targets:**
  * Normalize 10k VEX claims/minute/core.
  * Consensus compute ≤ 50 ms for 1k unique `(vuln, product)` pairs in hot cache.
  * Export (consensus) 1M rows in ≤ 60 s on 8 cores with streaming writer.
* **Scaling:**
  * WebService handles control APIs; **Worker** background services (same image) execute fetch/normalize in parallel with rate limits; Mongo writes batched; upserts by natural keys.
  * Exports stream straight to S3 (MinIO) with rolling buffers.
* **Caching:**
  * `vex.cache` maps query signatures → export; TTL to avoid stampedes; optimistic reuse unless `force`.
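One way to derive a stable cache key is to normalize the export request and hash it. The sketch below is an assumption about how such a signature could be computed, not the service's actual scheme; the parameter names follow the `POST /exports` body, and TTL handling is left out for brevity.

```python
import hashlib
import json


def query_signature(vuln_filter=None, product_filter=None, providers=None,
                    since=None, fmt="consensus", policy_revision_id=None):
    """Hash a normalized request so equivalent queries share one cache entry."""
    normalized = {
        "format": fmt,
        "policyRevisionId": policy_revision_id,
        "productFilter": sorted(set(product_filter or [])),
        "providers": sorted(set(providers or [])),
        "since": since,
        "vulnFilter": sorted(set(vuln_filter or [])),
    }
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ExportCache:
    """In-memory stand-in for vex.cache: signature -> exportId."""

    def __init__(self):
        self._entries = {}

    def get_or_create(self, signature, build, force=False):
        # Optimistic reuse unless the caller forces a rebuild.
        if not force and signature in self._entries:
            return self._entries[signature]
        export_id = build()
        self._entries[signature] = export_id
        return export_id


sig_a = query_signature(providers=["suse", "redhat"], since="2025-10-14T00:00:00Z")
sig_b = query_signature(providers=["redhat", "suse", "redhat"], since="2025-10-14T00:00:00Z")
cache = ExportCache()
first = cache.get_or_create(sig_a, lambda: "export-1")
reused = cache.get_or_create(sig_b, lambda: "export-2")
forced = cache.get_or_create(sig_a, lambda: "export-3", force=True)
```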
### 11.1 Worker TTL refresh controls
Excititor.Worker ships with a background refresh service that re-evaluates stale consensus rows and applies stability dampers before publishing status flips. Operators can tune its behaviour through the following configuration (shown in `appsettings.json` syntax):
```jsonc
{
  "Excititor": {
    "Worker": {
      "Refresh": {
        "Enabled": true,
        "ConsensusTtl": "02:00:00",   // refresh consensus older than 2 hours
        "ScanInterval": "00:10:00",   // sweep cadence
        "ScanBatchSize": 250,         // max documents examined per sweep
        "Damper": {
          "Minimum": "1.00:00:00",    // lower bound before status flip publishes
          "Maximum": "2.00:00:00",    // upper bound guardrail
          "DefaultDuration": "1.12:00:00",
          "Rules": [
            { "MinWeight": 0.90, "Duration": "1.00:00:00" },
            { "MinWeight": 0.75, "Duration": "1.06:00:00" },
            { "MinWeight": 0.50, "Duration": "1.12:00:00" }
          ]
        }
      }
    }
  }
}
```
* `ConsensusTtl` governs when the worker issues a fresh resolve for cached consensus data.
* `Damper` lengths are clamped between `Minimum`/`Maximum`; duration is bypassed when component fingerprints (`VexProduct.ComponentIdentifiers`) change.
* The same keys are available through environment variables (e.g., `Excititor__Worker__Refresh__ConsensusTtl=02:00:00`).
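A sketch of how a damper duration could be selected from the configuration shown above. The matching rule (highest `MinWeight` that the consensus weight satisfies, otherwise `DefaultDuration`) is an assumption; the clamp to `Minimum`/`Maximum` and the bypass on fingerprint change follow the notes above. Durations use the .NET `d.hh:mm:ss` convention, so `1.12:00:00` is one day and twelve hours.

```python
from datetime import timedelta

MINIMUM = timedelta(days=1)
MAXIMUM = timedelta(days=2)
DEFAULT_DURATION = timedelta(days=1, hours=12)
# (MinWeight, Duration) pairs mirroring the sample configuration.
RULES = [
    (0.90, timedelta(days=1)),
    (0.75, timedelta(days=1, hours=6)),
    (0.50, timedelta(days=1, hours=12)),
]


def damper_duration(weight, fingerprint_changed=False):
    """How long a status flip is held back before it is published."""
    if fingerprint_changed:
        return timedelta(0)  # component fingerprints changed: bypass the damper
    chosen = DEFAULT_DURATION
    for min_weight, duration in sorted(RULES, reverse=True):
        if weight >= min_weight:
            chosen = duration
            break
    return min(max(chosen, MINIMUM), MAXIMUM)


strong = damper_duration(0.95)
medium = damper_duration(0.80)
weak = damper_duration(0.20)
bypass = damper_duration(0.20, fingerprint_changed=True)
```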
---
## 12) Observability
* **Metrics:**
  * `vex.ingest.docs_total{provider}`
  * `vex.normalize.claims_total{provider}`
  * `vex.signature.failures_total{provider,method}`
  * `vex.consensus.conflicts_total{vulnId}`
  * `vex.exports.bytes{format}` / `vex.exports.latency_seconds`
* **Tracing:** spans for fetch, verify, parse, map, consensus, export.
* **Dashboards:** provider staleness, top conflicting vulns/components, signature posture, export cache hit rate.
---
## 13) Testing matrix
* **Connectors:** golden raw docs → deterministic claims (fixtures per provider/format).
* **Signature policies:** valid/invalid PGP/cosign/x509 samples; ensure rejects are recorded but not accepted.
* **Normalization edge cases:** platform-only claims, free-text justifications, non-purl products.
* **Consensus:** conflict scenarios across tiers; check tie-breakers; justification gates.
* **Performance:** 1M-row export timing; memory ceilings; stream correctness.
* **Determinism:** same inputs + policy → identical `consensusDigest` and export bytes.
* **API contract tests:** pagination, filters, RBAC, rate limits.
---
## 14) Integration points
* **Backend Policy Engine** (in Scanner.WebService): calls `POST /excititor/resolve` (scope `vex.read`) with batched `(purl, vulnId)` pairs to fetch `rollupStatus + sources`.
* **Concelier**: provides alias graph (CVE↔vendor IDs) and may supply VEX-adjacent metadata (e.g., KEV flag) for policy escalation.
* **UI**: VEX explorer screens use `/claims/search` and `/consensus/search`; show conflicts & provenance.
* **CLI**: `stellaops vex export --consensus --since 7d --out vex.json` for audits.
---
## 15) Failure modes & fallback
* **Provider unreachable:** stale thresholds trigger warnings; policy can downweight stale providers automatically (freshness factor).
* **Signature outage:** continue to ingest but mark `signatureState.verified=false`; consensus will likely exclude or downweight per policy.
* **Schema drift:** unknown fields are preserved as `evidence`; normalization rejects only on **invalid identity** or **status**.
---
## 16) Rollout plan (incremental)
1. **MVP**: OpenVEX + CSAF connectors for 3 major providers (e.g., Red Hat/SUSE/Ubuntu), normalization + consensus + `/excititor/resolve`.
2. **Signature policies**: PGP for distros; cosign for OCI.
3. **Exports + optional attestation**.
4. **CycloneDX VEX** connectors; platform claim expansion tables; UI explorer.
5. **Scale hardening**: export indexes; conflict analytics.
---
## 17) Operational runbooks
* **Statement backfill** — see `docs/dev/EXCITITOR_STATEMENT_BACKFILL.md` for the CLI workflow, required permissions, observability guidance, and rollback steps.
---
## 18) Appendix — canonical JSON (stable ordering)
All exports and consensus entries are serialized via `VexCanonicalJsonSerializer`:
* UTF-8 without BOM;
* keys sorted (ASCII);
* arrays sorted by `(providerId, vulnId, productKey, lastObserved)` unless semantic order mandated;
* timestamps in `YYYY-MM-DDThh:mm:ssZ`;
* no insignificant whitespace.
* no insignificant whitespace.
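The rules above can be approximated in a few lines. This Python sketch is not `VexCanonicalJsonSerializer`; it only illustrates why sorted keys, compact separators, and a fixed timestamp format make the digest stable. The helper names and sample entry are hypothetical, and it assumes ASCII property names, for which code-point ordering and ASCII ordering coincide.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_timestamp(value):
    """Render an aware datetime as UTC in YYYY-MM-DDThh:mm:ssZ."""
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def sort_sources(sources):
    """Order array entries by (providerId, vulnId, productKey, lastObserved)."""
    return sorted(sources, key=lambda s: (s["providerId"], s["vulnId"],
                                          s["productKey"], s["lastObserved"]))


def canonical_json(value):
    """Sorted keys, no insignificant whitespace, UTF-8 bytes without BOM."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(value):
    return hashlib.sha256(canonical_json(value)).hexdigest()


observed = canonical_timestamp(datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
entry_a = {"vulnId": "CVE-2025-12345", "rollupStatus": "fixed", "evaluatedAt": observed}
entry_b = {"evaluatedAt": observed, "rollupStatus": "fixed", "vulnId": "CVE-2025-12345"}
same_digest = digest(entry_a) == digest(entry_b)  # key order does not affect the hash
```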