feat: Implement vulnerability token signing and verification utilities
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
- Added VulnTokenSigner for signing JWT tokens with specified algorithms and keys.
- Introduced VulnTokenUtilities for resolving tenant and subject claims, and sanitizing context dictionaries.
- Created VulnTokenVerificationUtilities for parsing tokens, verifying signatures, and deserializing payloads.
- Developed VulnWorkflowAntiForgeryTokenIssuer for issuing anti-forgery tokens with configurable options.
- Implemented VulnWorkflowAntiForgeryTokenVerifier for verifying anti-forgery tokens and validating payloads.
- Added AuthorityVulnerabilityExplorerOptions to manage configuration for vulnerability explorer features.
- Included tests for FilesystemPackRunDispatcher to ensure proper job handling under egress policy restrictions.
This commit is contained in:
@@ -146,6 +146,48 @@ plan? = <plan name> // optional hint for UIs; not used for e
|
||||
|
||||
---
|
||||
|
||||
### 3.5 Vuln Explorer workflow safeguards
|
||||
|
||||
* **Anti-forgery flow** — Vuln Explorer’s mutation verbs call
|
||||
* `POST /vuln/workflow/anti-forgery/issue`
|
||||
* `POST /vuln/workflow/anti-forgery/verify`
|
||||
|
||||
Callers must hold the `vuln:operate` scope. Issued tokens embed the actor, tenant, whitelisted actions, ABAC selectors (environment/owner/business tier), and optional context key/value pairs. Tokens are signed (EdDSA/ES256) with the primary Authority signing key and default to a 10‑minute TTL (cap: 30 minutes). Before the request is forwarded to Vuln Explorer, verification rejects reused nonces and checks that the tenant matches and that the requested action is among the token's whitelisted actions.
|
||||
|
||||
* **Attachment access** — Evidence bundles and attachments reference a ledger hash. Vuln Explorer obtains a scoped download token through:
|
||||
* `POST /vuln/attachments/tokens/issue`
|
||||
* `POST /vuln/attachments/tokens/verify`
|
||||
|
||||
These tokens bind the ledger event hash, attachment identifier, optional finding/content metadata, and the actor. They default to a 30‑minute TTL (cap: 4 hours) and require `vuln:investigate`.
|
||||
|
||||
* **Audit trail** — Both flows emit `vuln.workflow.csrf.*` and `vuln.attachment.token.*` audit records with tenant, actor, ledger hash, nonce, and filtered context metadata so Offline Kit operators can reconcile actions against ledger entries.
|
||||
|
||||
* **Configuration**
|
||||
|
||||
```yaml
authority:
  vulnerabilityExplorer:
    workflow:
      antiForgery:
        enabled: true
        audience: "stellaops:vuln-workflow"
        defaultLifetime: "00:10:00"
        maxLifetime: "00:30:00"
        maxContextEntries: 16
        maxContextValueLength: 256
    attachments:
      enabled: true
      defaultLifetime: "00:30:00"
      maxLifetime: "04:00:00"
      payloadType: "application/vnd.stellaops.vuln-attachment-token+json"
      maxMetadataEntries: 16
      maxMetadataValueLength: 512
```
|
||||
|
||||
Air-gapped bundles include the signing key material and policy snapshots required to validate these tokens offline.
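The lifetime limits and verification checks described in this section can be sketched as follows. This is a minimal illustration, not the Authority implementation: the names `clamp_lifetime`, `NonceStore`, and `verify_claims`, the claim field names, the order of the checks, and the choice to clamp (rather than reject) an over-long lifetime are all assumptions made for the example.

```python
from datetime import timedelta

DEFAULT_LIFETIME = timedelta(minutes=10)  # mirrors antiForgery.defaultLifetime
MAX_LIFETIME = timedelta(minutes=30)      # mirrors antiForgery.maxLifetime


def clamp_lifetime(requested=None, default=DEFAULT_LIFETIME, maximum=MAX_LIFETIME):
    """Return the effective TTL: the default when none is requested, never above the cap."""
    if requested is None:
        return default
    if requested <= timedelta(0):
        raise ValueError("lifetime must be positive")
    return min(requested, maximum)


class NonceStore:
    """Hypothetical in-memory replay guard; a real service would need shared, expiring storage."""

    def __init__(self):
        self._seen = set()

    def consume(self, nonce):
        """Return True the first time a nonce is seen and False on reuse."""
        if nonce in self._seen:
            return False
        self._seen.add(nonce)
        return True


def verify_claims(claims, tenant, action, nonces):
    """Apply tenant, action, and nonce checks to a payload whose signature was already verified."""
    if claims["tenant"] != tenant:
        return "tenant_mismatch"
    if action not in claims["actions"]:
        return "action_not_allowed"
    if not nonces.consume(claims["nonce"]):
        return "nonce_reused"
    return "ok"
```

The nonce is consumed last in this sketch so that a request failing the tenant or action check does not burn a valid nonce; whether the real verifier orders its checks this way is not stated in the source.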
|
||||
|
||||
---
|
||||
|
||||
## 4) Audiences, scopes & RBAC
|
||||
|
||||
### 4.1 Audiences
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
# Vexer agent guide
|
||||
# Excitor agent guide
|
||||
|
||||
## Mission
|
||||
Vexer computes deterministic consensus across VEX claims, preserving conflicts and producing attestable evidence for policy suppression.
|
||||
Excitor computes deterministic consensus across VEX claims, preserving conflicts and producing attestable evidence for policy suppression.
|
||||
|
||||
## Key docs
|
||||
- [Module README](./README.md)
|
||||
@@ -21,9 +21,9 @@ Vexer computes deterministic consensus across VEX claims, preserving conflicts a
|
||||
- Keep Offline Kit parity in mind—document air-gapped workflows for any new feature.
|
||||
- Update runbooks/observability assets when operational characteristics change.
|
||||
## Required Reading
|
||||
- `docs/modules/vexer/README.md`
|
||||
- `docs/modules/vexer/architecture.md`
|
||||
- `docs/modules/vexer/implementation_plan.md`
|
||||
- `docs/modules/excitor/README.md`
|
||||
- `docs/modules/excitor/architecture.md`
|
||||
- `docs/modules/excitor/implementation_plan.md`
|
||||
- `docs/modules/platform/architecture-overview.md`
|
||||
|
||||
## Working Agreement
|
||||
@@ -1,34 +1,34 @@
|
||||
# StellaOps Vexer
|
||||
|
||||
Vexer computes deterministic consensus across VEX claims, preserving conflicts and producing attestable evidence for policy suppression.
|
||||
|
||||
## Responsibilities
|
||||
- Ingest Excititor observations and compute per-product consensus snapshots.
|
||||
- Provide APIs for querying canonical VEX positions and conflict sets.
|
||||
- Publish exports and DSSE-ready digests for downstream consumption.
|
||||
- Keep provenance weights and disagreement metadata.
|
||||
|
||||
## Key components
|
||||
- Consensus engine and API host in `StellaOps.Vexer.*` (to-be-implemented).
|
||||
- Storage schema for consensus graphs.
|
||||
- Integration hooks for Policy Engine suppression logic.
|
||||
|
||||
## Integrations & dependencies
|
||||
- Excititor for raw observations.
|
||||
- Policy Engine and UI for suppression stories.
|
||||
- CLI for evidence inspection.
|
||||
|
||||
## Operational notes
|
||||
- Deterministic consensus algorithms (see architecture).
|
||||
- Planned telemetry for disagreement counts and freshness.
|
||||
- Offline exports aligning with Concelier/Excititor timelines.
|
||||
|
||||
## Related resources
|
||||
- ./scoring.md
|
||||
|
||||
## Backlog references
|
||||
- DOCS-VEXER backlog referenced in architecture doc.
|
||||
- CLI parity tracked in ../../TASKS.md (CLI-GRAPH/VEX stories).
|
||||
|
||||
## Epic alignment
|
||||
- **Epic 7 – VEX Consensus Lens:** deliver trust-weighted consensus snapshots, disagreement metadata, and explain APIs.
|
||||
# StellaOps Excitor
|
||||
|
||||
Excitor computes deterministic consensus across VEX claims, preserving conflicts and producing attestable evidence for policy suppression.
|
||||
|
||||
## Responsibilities
|
||||
- Ingest Excititor observations and compute per-product consensus snapshots.
|
||||
- Provide APIs for querying canonical VEX positions and conflict sets.
|
||||
- Publish exports and DSSE-ready digests for downstream consumption.
|
||||
- Keep provenance weights and disagreement metadata.
|
||||
|
||||
## Key components
|
||||
- Consensus engine and API host in `StellaOps.Excitor.*` (to-be-implemented).
|
||||
- Storage schema for consensus graphs.
|
||||
- Integration hooks for Policy Engine suppression logic.
|
||||
|
||||
## Integrations & dependencies
|
||||
- Excititor for raw observations.
|
||||
- Policy Engine and UI for suppression stories.
|
||||
- CLI for evidence inspection.
|
||||
|
||||
## Operational notes
|
||||
- Deterministic consensus algorithms (see architecture).
|
||||
- Planned telemetry for disagreement counts and freshness.
|
||||
- Offline exports aligning with Concelier/Excititor timelines.
|
||||
|
||||
## Related resources
|
||||
- ./scoring.md
|
||||
|
||||
## Backlog references
|
||||
- DOCS-EXCITOR backlog referenced in architecture doc.
|
||||
- CLI parity tracked in ../../TASKS.md (CLI-GRAPH/VEX stories).
|
||||
|
||||
## Epic alignment
|
||||
- **Epic 7 – VEX Consensus Lens:** deliver trust-weighted consensus snapshots, disagreement metadata, and explain APIs.
|
||||
9
docs/modules/excitor/TASKS.md
Normal file
@@ -0,0 +1,9 @@
|
||||
# Task board — Excitor
|
||||
|
||||
> Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable.
|
||||
|
||||
| ID | Status | Owner(s) | Description | Notes |
|
||||
|----|--------|----------|-------------|-------|
|
||||
| EXCITOR-DOCS-0001 | DOING (2025-10-29) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md |
|
||||
| EXCITOR-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md |
|
||||
| EXCITOR-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow |
|
||||
@@ -1,465 +1,465 @@
|
||||
# component_architecture_vexer.md — **Stella Ops Vexer** (2025Q4)
|
||||
# component_architecture_excitor.md — **Stella Ops Excitor** (2025Q4)
|
||||
|
||||
> Built to satisfy Epic 7 – VEX Consensus Lens requirements.
|
||||
|
||||
> **Scope.** This document specifies the **Vexer** service: its purpose, trust model, data structures, APIs, plug‑in contracts, storage schema, normalization/consensus algorithms, performance budgets, testing matrix, and how it integrates with Scanner, Policy, Feedser, and the attestation chain. It is implementation‑ready.
|
||||
|
||||
---
|
||||
|
||||
## 0) Mission & role in the platform
|
||||
|
||||
**Mission.** Convert heterogeneous **VEX** statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into **canonical, queryable claims**; compute **deterministic consensus** per *(vuln, product)*; preserve **conflicts with provenance**; publish **stable, attestable exports** that the backend uses to suppress non‑exploitable findings, prioritize remaining risk, and explain decisions.
|
||||
|
||||
**Boundaries.**
|
||||
|
||||
* Vexer **does not** decide PASS/FAIL. It supplies **evidence** (statuses + justifications + provenance weights).
|
||||
* Vexer preserves **conflicting claims** unchanged; consensus encodes how we would pick, but the raw set is always exportable.
|
||||
* VEX consumption is **backend‑only**: Scanner never applies VEX. The backend’s **Policy Engine** asks Vexer for status evidence and then decides what to show.
|
||||
|
||||
---
|
||||
|
||||
## 1) Inputs, outputs & canonical domain
|
||||
|
||||
### 1.1 Accepted input formats (ingest)
|
||||
|
||||
* **OpenVEX** JSON documents (attested or raw).
|
||||
* **CSAF VEX** 2.x (vendor PSIRTs and distros commonly publish CSAF).
|
||||
* **CycloneDX VEX** 1.4+ (standalone VEX or embedded VEX blocks).
|
||||
* **OCI‑attached attestations** (VEX statements shipped as OCI referrers) — optional connectors.
|
||||
|
||||
All connectors register **source metadata**: provider identity, trust tier, signature expectations (PGP/cosign/PKI), fetch windows, rate limits, and time anchors.
|
||||
|
||||
### 1.2 Canonical model (normalized)
|
||||
|
||||
Every incoming statement becomes a set of **VexClaim** records:
|
||||
|
||||
```
VexClaim
- providerId           // 'redhat', 'suse', 'ubuntu', 'github', 'vendorX'
- vulnId               // 'CVE-2025-12345', 'GHSA-xxxx', canonicalized
- productKey           // canonical product identity (see §2.2)
- status               // affected | not_affected | fixed | under_investigation
- justification?       // for 'not_affected'/'affected' where provided
- introducedVersion?   // semantics per provider (range or exact)
- fixedVersion?        // where provided (range or exact)
- lastObserved         // timestamp from source or fetch time
- provenance           // doc digest, signature status, fetch URI, line/offset anchors
- evidence[]           // raw source snippets for explainability
- supersedes?          // optional cross-doc chain (docDigest → docDigest)
```
|
||||
|
||||
### 1.3 Exports (consumption)
|
||||
|
||||
* **VexConsensus** per `(vulnId, productKey)` with:
|
||||
|
||||
* `rollupStatus` (after policy weights/justification gates),
|
||||
* `sources[]` (winning + losing claims with weights & reasons),
|
||||
* `policyRevisionId` (identifier of the Vexer policy used),
|
||||
* `consensusDigest` (stable SHA‑256 over canonical JSON).
|
||||
* **Raw claims** export for auditing (unchanged, with provenance).
|
||||
* **Provider snapshots** (per source, last N days) for operator debugging.
|
||||
* **Index** optimized for backend joins: `(productKey, vulnId) → (status, confidence, sourceSet)`.
|
||||
|
||||
All exports are **deterministic**, and (optionally) **attested** via DSSE and logged to Rekor v2.
|
||||
|
||||
---
|
||||
|
||||
## 2) Identity model — products & joins
|
||||
|
||||
### 2.1 Vuln identity
|
||||
|
||||
* Accepts **CVE**, **GHSA**, vendor IDs (MSRC, RHSA…), distro IDs (DSA/USN/RHSA…) — normalized to `vulnId` with alias sets.
|
||||
* **Alias graph** maintained (from Feedser) to map vendor/distro IDs → CVE (primary) and to **GHSA** where applicable.
|
||||
|
||||
### 2.2 Product identity (`productKey`)
|
||||
|
||||
* **Primary:** `purl` (Package URL).
|
||||
* **Secondary links:** `cpe`, **OS package NVRA/EVR**, NuGet/Maven/Golang identity, and **OS package name** when purl unavailable.
|
||||
* **Fallback:** `oci:<registry>/<repo>@<digest>` for image‑level VEX.
|
||||
* **Special cases:** kernel modules, firmware, platforms → provider‑specific mapping helpers (connector captures provider’s product taxonomy → canonical `productKey`).
|
||||
|
||||
> Vexer does not invent identities. If a provider cannot be mapped to purl/CPE/NVRA deterministically, we keep the native **product string** and mark the claim as **non‑joinable**; the backend will ignore it unless a policy explicitly whitelists that provider mapping.
|
||||
|
||||
---
|
||||
|
||||
## 3) Storage schema (MongoDB)
|
||||
|
||||
Database: `vexer`
|
||||
|
||||
### 3.1 Collections
|
||||
|
||||
**`vex.providers`**
|
||||
|
||||
```
|
||||
_id: providerId
|
||||
name, homepage, contact
|
||||
trustTier: enum {vendor, distro, platform, hub, attestation}
|
||||
signaturePolicy: { type: pgp|cosign|x509|none, keys[], certs[], cosignKeylessRoots[] }
|
||||
fetch: { baseUrl, kind: http|oci|file, rateLimit, etagSupport, windowDays }
|
||||
enabled: bool
|
||||
createdAt, modifiedAt
|
||||
```
|
||||
|
||||
**`vex.raw`** (immutable raw documents)
|
||||
|
||||
```
|
||||
_id: sha256(doc bytes)
|
||||
providerId
|
||||
uri
|
||||
ingestedAt
|
||||
contentType
|
||||
sig: { verified: bool, method: pgp|cosign|x509|none, keyId|certSubject, bundle? }
|
||||
payload: GridFS pointer (if large)
|
||||
disposition: kept|replaced|superseded
|
||||
correlation: { replaces?: sha256, replacedBy?: sha256 }
|
||||
```
|
||||
|
||||
**`vex.claims`** (normalized rows; dedupe on providerId+vulnId+productKey+docDigest)
|
||||
|
||||
```
|
||||
_id
|
||||
providerId
|
||||
vulnId
|
||||
productKey
|
||||
status
|
||||
justification?
|
||||
introducedVersion?
|
||||
fixedVersion?
|
||||
lastObserved
|
||||
docDigest
|
||||
provenance { uri, line?, pointer?, signatureState }
|
||||
evidence[] { key, value, locator }
|
||||
indices:
|
||||
- {vulnId:1, productKey:1}
|
||||
- {providerId:1, lastObserved:-1}
|
||||
- {status:1}
|
||||
- text index (optional) on evidence.value for debugging
|
||||
```
|
||||
|
||||
**`vex.consensus`** (rollups)
|
||||
|
||||
```
|
||||
_id: sha256(canonical(vulnId, productKey, policyRevision))
|
||||
vulnId
|
||||
productKey
|
||||
rollupStatus
|
||||
sources[]: [
|
||||
{ providerId, status, justification?, weight, lastObserved, accepted:bool, reason }
|
||||
]
|
||||
policyRevisionId
|
||||
evaluatedAt
|
||||
consensusDigest // same as _id
|
||||
indices:
|
||||
- {vulnId:1, productKey:1}
|
||||
- {policyRevisionId:1, evaluatedAt:-1}
|
||||
```
|
||||
|
||||
**`vex.exports`** (manifest of emitted artifacts)
|
||||
|
||||
```
|
||||
_id
|
||||
querySignature
|
||||
format: raw|consensus|index
|
||||
artifactSha256
|
||||
rekor { uuid, index, url }?
|
||||
createdAt
|
||||
policyRevisionId
|
||||
cacheable: bool
|
||||
```
|
||||
|
||||
**`vex.cache`**
|
||||
|
||||
```
|
||||
querySignature -> exportId (for fast reuse)
|
||||
ttl, hits
|
||||
```
|
||||
|
||||
**`vex.migrations`**
|
||||
|
||||
* ordered migrations applied at bootstrap to ensure indexes.
|
||||
|
||||
### 3.2 Indexing strategy
|
||||
|
||||
* Hot path queries use exact `(vulnId, productKey)` and time‑bounded windows; compound indexes cover both.
|
||||
* Providers list view by `lastObserved` for monitoring staleness.
|
||||
* `vex.consensus` keyed by `(vulnId, productKey, policyRevision)` for deterministic reuse.
|
||||
|
||||
---
|
||||
|
||||
## 4) Ingestion pipeline
|
||||
|
||||
### 4.1 Connector contract
|
||||
|
||||
```csharp
using System.Threading;
using System.Threading.Tasks;

// Contract implemented by every provider connector.
public interface IVexConnector
{
    // Stable provider identifier, e.g. "redhat".
    string ProviderId { get; }

    Task FetchAsync(VexConnectorContext ctx, CancellationToken ct);     // raw docs
    Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct); // raw -> VexClaim[]
}
```
|
||||
|
||||
* **Fetch** must implement: window scheduling, conditional GET (ETag/If‑Modified‑Since), rate limiting, retry/backoff.
|
||||
* **Normalize** parses the format, validates schema, maps product identities deterministically, emits `VexClaim` records with **provenance**.
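Two of the fetch requirements above, conditional GET and retry/backoff, can be sketched as small helpers. This is an illustration only: the function names, the exponential schedule, and the default base and cap values are assumptions and are not taken from the connector code.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build the request headers for a conditional GET from previously stored validators."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def backoff_delays(attempts, base=1.0, cap=60.0):
    """Return the wait in seconds before each retry: exponential growth, bounded by cap."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]
```

A server that still has the same representation answers such a request with `304 Not Modified`, so the connector can skip re-ingesting an unchanged document.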
|
||||
|
||||
### 4.2 Signature verification (per provider)
|
||||
|
||||
* **cosign (keyless or keyful)** for OCI referrers or HTTP‑served JSON with Sigstore bundles.
|
||||
* **PGP** (provider keyrings) for distro/vendor feeds that sign docs.
|
||||
* **x509** (mutual TLS / provider‑pinned certs) where applicable.
|
||||
* Signature state is stored on **vex.raw.sig** and copied into **provenance.signatureState** on claims.
|
||||
|
||||
> Claims from sources failing signature policy are marked `"signatureState.verified=false"` and **policy** can down‑weight or ignore them.
|
||||
|
||||
### 4.3 Time discipline
|
||||
|
||||
* For each doc, prefer **provider’s document timestamp**; if absent, use fetch time.
|
||||
* Claims carry `lastObserved` which drives **tie‑breaking** within equal weight tiers.
|
||||
|
||||
---
|
||||
|
||||
## 5) Normalization: product & status semantics
|
||||
|
||||
### 5.1 Product mapping
|
||||
|
||||
* **purl** first; **cpe** second; OS package NVRA/EVR mapping helpers (distro connectors) produce purls via canonical tables (e.g., rpm→purl:rpm, deb→purl:deb).
|
||||
* Where a provider publishes **platform‑level** VEX (e.g., “RHEL 9 not affected”), connectors expand to known product inventory rules (e.g., map to sets of packages/components shipped in the platform). Expansion tables are versioned and kept per provider; every expansion emits **evidence** indicating the rule applied.
|
||||
* If expansion would be speculative, the claim remains **platform‑scoped** with `productKey="platform:redhat:rhel:9"` and is flagged **non‑joinable**; backend can decide to use platform VEX only when Scanner proves the platform runtime.
|
||||
|
||||
### 5.2 Status + justification mapping
|
||||
|
||||
* Canonical **status**: `affected | not_affected | fixed | under_investigation`.
|
||||
* **Justifications** normalized to a controlled vocabulary (CISA‑aligned), e.g.:
|
||||
|
||||
* `component_not_present`
|
||||
* `vulnerable_code_not_in_execute_path`
|
||||
* `vulnerable_configuration_unused`
|
||||
* `inline_mitigation_applied`
|
||||
* `fix_available` (with `fixedVersion`)
|
||||
* `under_investigation`
|
||||
* Providers with free‑text justifications are mapped by deterministic tables; raw text preserved as `evidence`.
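A deterministic mapping table for free-text justifications might look like the sketch below. The table entries, the whitespace/punctuation folding rule, and the `justification.raw` evidence key are hypothetical; real tables would presumably be maintained per provider.

```python
import re

# Hypothetical lookup table: folded free text -> controlled vocabulary term.
JUSTIFICATION_TABLE = {
    "component not present": "component_not_present",
    "vulnerable code not in execute path": "vulnerable_code_not_in_execute_path",
    "vulnerable configuration unused": "vulnerable_configuration_unused",
    "inline mitigation applied": "inline_mitigation_applied",
}


def normalize_justification(raw_text):
    """Return (canonical term or None, evidence record); the raw text is always preserved."""
    folded = re.sub(r"[\s_\-]+", " ", raw_text).strip().lower()
    canonical = JUSTIFICATION_TABLE.get(folded)
    evidence = {"key": "justification.raw", "value": raw_text}
    return canonical, evidence
```

Because the lookup is a plain dictionary with no fuzzy matching, the same input always yields the same canonical term, and unmapped text yields `None` rather than a guess.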
|
||||
|
||||
---
|
||||
|
||||
## 6) Consensus algorithm
|
||||
|
||||
**Goal:** produce a **stable**, explainable `rollupStatus` per `(vulnId, productKey)` given possibly conflicting claims.
|
||||
|
||||
### 6.1 Inputs
|
||||
|
||||
* Set **S** of `VexClaim` for the key.
|
||||
* **Vexer policy snapshot**:
|
||||
|
||||
* **weights** per provider tier and per provider overrides.
|
||||
* **justification gates** (e.g., require justification for `not_affected` to be acceptable).
|
||||
* **minEvidence** rules (e.g., `not_affected` must come from ≥1 vendor or 2 distros).
|
||||
* **signature requirements** (e.g., require verified signature for ‘fixed’ to be considered).
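The gates listed above can be sketched as two small predicates. The claim field names (`signatureVerified`, `providerId`) and the `tiers` lookup are assumptions for the example; the policy keys and rejection reasons mirror names used elsewhere in this document.

```python
def passes_gates(claim, policy):
    """Return (accepted, reason) for one claim against the justification and signature gates."""
    if claim["status"] == "not_affected" and policy.get("requireJustificationForNotAffected"):
        if not claim.get("justification"):
            return False, "insufficient_justification"
    if claim["status"] == "fixed" and policy.get("signatureRequiredForFixed"):
        if not claim.get("signatureVerified"):
            return False, "signature_unverified"
    return True, "ok"


def not_affected_has_min_evidence(claims, tiers):
    """vendorOrTwoDistros: at least one vendor provider or two distinct distro providers."""
    providers = {c["providerId"] for c in claims if c["status"] == "not_affected"}
    vendors = [p for p in providers if tiers.get(p) == "vendor"]
    distros = [p for p in providers if tiers.get(p) == "distro"]
    return len(vendors) >= 1 or len(distros) >= 2
```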
|
||||
|
||||
### 6.2 Steps
|
||||
|
||||
1. **Filter invalid** claims by signature policy & justification gates → set `S'`.
2. **Score** each claim:
   `score = weight(provider) * freshnessFactor(lastObserved)` where freshnessFactor ∈ [0.8, 1.0] for staleness decay (configurable; small effect).
3. **Aggregate** scores per status: `W(status) = Σ score(claims with that status)`.
4. **Pick** `rollupStatus = argmax_status W(status)`.
5. **Tie‑breakers** (in order):

   * Higher **max single** provider score wins (vendor > distro > platform > hub).
   * More **recent** lastObserved wins.
   * Fixed, deterministic status priority (`fixed` > `not_affected` > `under_investigation` > `affected`) as final tiebreaker.
6. **Explain**: mark accepted sources (`accepted=true; reason="weight"`/`"freshness"`), mark rejected sources with explicit `reason` (`"insufficient_justification"`, `"signature_unverified"`, `"lower_weight"`).
|
||||
|
||||
> The algorithm is **pure** given S and policy snapshot; result is reproducible and hashed into `consensusDigest`.
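Steps 2 to 5 can be sketched as a pure function over claims that already passed the filter step. The linear decay shape, the 90-day horizon, and the claim field names (`tier`, `status`, `lastObserved`) are assumptions for this example; the tier weights are the defaults from the configuration section.

```python
from collections import defaultdict
from datetime import datetime, timezone

TIER_WEIGHTS = {"vendor": 1.0, "distro": 0.9, "platform": 0.7, "hub": 0.5, "attestation": 0.6}
STATUS_PRIORITY = ["fixed", "not_affected", "under_investigation", "affected"]


def freshness_factor(last_observed, now, max_age_days=90.0):
    """Decay linearly from 1.0 (fresh) to 0.8 (max_age_days old or older); the shape is assumed."""
    age_days = max(0.0, (now - last_observed).total_seconds() / 86400.0)
    return 1.0 - 0.2 * min(1.0, age_days / max_age_days)


def rollup(claims, now, weights=TIER_WEIGHTS):
    """Return (rollupStatus, per-status totals); (None, {}) when there are no claims."""
    if not claims:
        return None, {}
    totals = defaultdict(float)       # W(status)
    best_single = defaultdict(float)  # highest single claim score per status
    newest = {}                       # most recent lastObserved per status
    for claim in claims:
        score = weights[claim["tier"]] * freshness_factor(claim["lastObserved"], now)
        status = claim["status"]
        totals[status] += score
        best_single[status] = max(best_single[status], score)
        newest[status] = max(newest.get(status, claim["lastObserved"]), claim["lastObserved"])

    def rank(status):
        # Total weight first, then the tie-breakers in their documented order.
        return (totals[status], best_single[status], newest[status],
                -STATUS_PRIORITY.index(status))

    return max(totals, key=rank), dict(totals)
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the computation reproducible for a given claim set and policy snapshot.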
|
||||
|
||||
---
|
||||
|
||||
## 7) Query & export APIs
|
||||
|
||||
All endpoints are versioned under `/api/v1/vex`.
|
||||
|
||||
### 7.1 Query (online)
|
||||
|
||||
```
POST /claims/search
  body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string }
  → { claims[], nextPageToken? }

POST /consensus/search
  body: { vulnIds?: string[], productKeys?: string[], policyRevisionId?: string, since?: timestamp, limit?: int, pageToken?: string }
  → { entries[], nextPageToken? }

POST /excititor/resolve (scope: vex.read)
  body: { productKeys?: string[], purls?: string[], vulnerabilityIds: string[], policyRevisionId?: string }
  → { policy, resolvedAt, results: [ { vulnerabilityId, productKey, status, sources[], conflicts[], decisions[], signals?, summary?, envelope: { artifact, contentSignature?, attestation?, attestationEnvelope?, attestationSignature? } } ] }
```
|
||||
|
||||
### 7.2 Exports (cacheable snapshots)
|
||||
|
||||
```
POST /exports
  body: { signature: { vulnFilter?, productFilter?, providers?, since? }, format: raw|consensus|index, policyRevisionId?: string, force?: bool }
  → { exportId, artifactSha256, rekor? }

GET  /exports/{exportId}        → bytes (application/json or binary index)
GET  /exports/{exportId}/meta   → { signature, policyRevisionId, createdAt, artifactSha256, rekor? }
```
|
||||
|
||||
### 7.3 Provider operations
|
||||
|
||||
```
|
||||
GET /providers → provider list & signature policy
|
||||
POST /providers/{id}/refresh → trigger fetch/normalize window
|
||||
GET /providers/{id}/status → last fetch, doc counts, signature stats
|
||||
```
|
||||
|
||||
**Auth:** service‑to‑service via Authority tokens; operator operations via UI/CLI with RBAC.
|
||||
|
||||
---
|
||||
|
||||
## 8) Attestation integration
|
||||
|
||||
* Exports can be **DSSE‑signed** via **Signer** and logged to **Rekor v2** via **Attestor** (optional but recommended for regulated pipelines).
|
||||
* `vex.exports.rekor` stores `{uuid, index, url}` when present.
|
||||
* **Predicate type**: `https://stella-ops.org/attestations/vex-export/1` with fields:
|
||||
|
||||
* `querySignature`, `policyRevisionId`, `artifactSha256`, `createdAt`.
|
||||
|
||||
---
|
||||
|
||||
## 9) Configuration (YAML)
|
||||
|
||||
```yaml
vexer:
  mongo: { uri: "mongodb://mongo/vexer" }
  s3:
    endpoint: http://minio:9000
    bucket: stellaops
  policy:
    weights:
      vendor: 1.0
      distro: 0.9
      platform: 0.7
      hub: 0.5
      attestation: 0.6
    providerOverrides:
      redhat: 1.0
      suse: 0.95
    requireJustificationForNotAffected: true
    signatureRequiredForFixed: true
    minEvidence:
      not_affected:
        vendorOrTwoDistros: true
  connectors:
    - providerId: redhat
      kind: csaf
      baseUrl: https://access.redhat.com/security/data/csaf/v2/
      signaturePolicy: { type: pgp, keys: [ "…redhat-pgp-key…" ] }
      windowDays: 7
    - providerId: suse
      kind: csaf
      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
      signaturePolicy: { type: pgp, keys: [ "…suse-pgp-key…" ] }
    - providerId: ubuntu
      kind: openvex
      baseUrl: https://…/vex/
      signaturePolicy: { type: none }
    - providerId: vendorX
      kind: cyclonedx-vex
      ociRef: ghcr.io/vendorx/vex@sha256:…
      signaturePolicy: { type: cosign, cosignKeylessRoots: [ "sigstore-root" ] }
```
|
||||
|
||||
---
|
||||
|
||||
## 10) Security model
|
||||
|
||||
* **Input signature verification** enforced per provider policy (PGP, cosign, x509).
|
||||
* **Connector allowlists**: outbound fetch constrained to configured domains.
|
||||
* **Tenant isolation**: per‑tenant DB prefixes or separate DBs; per‑tenant S3 prefixes; per‑tenant policies.
|
||||
* **AuthN/Z**: Authority‑issued OpToks; RBAC roles (`vex.read`, `vex.admin`, `vex.export`).
|
||||
* **No secrets in logs**; deterministic logging contexts include providerId, docDigest, claim keys.
|
||||
|
||||
---
|
||||
|
||||
## 11) Performance & scale
|
||||
|
||||
* **Targets:**
|
||||
|
||||
* Normalize 10k VEX claims/minute/core.
|
||||
* Consensus compute ≤ 50 ms for 1k unique `(vuln, product)` pairs in hot cache.
|
||||
* Export (consensus) 1M rows in ≤ 60 s on 8 cores with streaming writer.
|
||||
|
||||
* **Scaling:**
|
||||
|
||||
* WebService handles control APIs; **Worker** background services (same image) execute fetch/normalize in parallel with rate‑limits; Mongo writes batched; upserts by natural keys.
|
||||
* Exports stream straight to S3 (MinIO) with rolling buffers.
|
||||
|
||||
* **Caching:**
|
||||
|
||||
* `vex.cache` maps query signatures → export; TTL to avoid stampedes; optimistic reuse unless `force`.
|
||||
|
||||
---
|
||||
|
||||
## 12) Observability
|
||||
|
||||
* **Metrics:**
|
||||
|
||||
* `vex.ingest.docs_total{provider}`
|
||||
* `vex.normalize.claims_total{provider}`
|
||||
* `vex.signature.failures_total{provider,method}`
|
||||
* `vex.consensus.conflicts_total{vulnId}`
|
||||
* `vex.exports.bytes{format}` / `vex.exports.latency_seconds`
|
||||
* **Tracing:** spans for fetch, verify, parse, map, consensus, export.
|
||||
* **Dashboards:** provider staleness, top conflicting vulns/components, signature posture, export cache hit‑rate.
|
||||
|
||||
---
|
||||
|
||||
## 13) Testing matrix
|
||||
|
||||
* **Connectors:** golden raw docs → deterministic claims (fixtures per provider/format).
|
||||
* **Signature policies:** valid/invalid PGP/cosign/x509 samples; ensure rejects are recorded but not accepted.
|
||||
* **Normalization edge cases:** platform‑only claims, free‑text justifications, non‑purl products.
|
||||
* **Consensus:** conflict scenarios across tiers; check tie‑breakers; justification gates.
|
||||
* **Performance:** 1M‑row export timing; memory ceilings; stream correctness.
|
||||
* **Determinism:** same inputs + policy → identical `consensusDigest` and export bytes.
|
||||
* **API contract tests:** pagination, filters, RBAC, rate limits.
|
||||
|
||||
---
|
||||
|
||||
## 14) Integration points
|
||||
|
||||
* **Backend Policy Engine** (in Scanner.WebService): calls `POST /excititor/resolve` (scope `vex.read`) with batched `(purl, vulnId)` pairs to fetch `rollupStatus + sources`.
|
||||
* **Feedser**: provides alias graph (CVE↔vendor IDs) and may supply VEX‑adjacent metadata (e.g., KEV flag) for policy escalation.
|
||||
* **UI**: VEX explorer screens use `/claims/search` and `/consensus/search`; show conflicts & provenance.
|
||||
* **CLI**: `stellaops vex export --consensus --since 7d --out vex.json` for audits.
|
||||
|
||||
---
|
||||
|
||||
## 15) Failure modes & fallback
|
||||
|
||||
* **Provider unreachable:** stale thresholds trigger warnings; policy can down‑weight stale providers automatically (freshness factor).
|
||||
* **Signature outage:** continue to ingest but mark `signatureState.verified=false`; consensus will likely exclude or down‑weight per policy.
|
||||
* **Schema drift:** unknown fields are preserved as `evidence`; normalization rejects only on **invalid identity** or **status**.
|
||||
|
||||
---
|
||||
|
||||
## 16) Rollout plan (incremental)
|
||||
|
||||
1. **MVP**: OpenVEX + CSAF connectors for 3 major providers (e.g., Red Hat/SUSE/Ubuntu), normalization + consensus + `/excititor/resolve`.
|
||||
2. **Signature policies**: PGP for distros; cosign for OCI.
|
||||
3. **Exports + optional attestation**.
|
||||
4. **CycloneDX VEX** connectors; platform claim expansion tables; UI explorer.
|
||||
5. **Scale hardening**: export indexes; conflict analytics.
|
||||
|
||||
---
|
||||
|
||||
## 17) Appendix — canonical JSON (stable ordering)
|
||||
|
||||
All exports and consensus entries are serialized via `VexCanonicalJsonSerializer`:
|
||||
|
||||
* UTF‑8 without BOM;
|
||||
* keys sorted (ASCII);
|
||||
* arrays sorted by `(providerId, vulnId, productKey, lastObserved)` unless semantic order mandated;
|
||||
* timestamps in `YYYY‑MM‑DDThh:mm:ssZ`;
|
||||
* no insignificant whitespace.
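The serialization rules above can be approximated with the standard library as shown below. This is a sketch of the listed rules, not the actual `VexCanonicalJsonSerializer`; the helper names and the idea of hashing the canonical bytes directly to obtain a digest are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_timestamp(dt):
    """Format a timezone-aware datetime as UTC with second precision and a trailing Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def canonical_json(value):
    """Serialize with sorted keys and no insignificant whitespace, as UTF-8 without BOM."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sort_claims(claims):
    """Order an array of claims by (providerId, vulnId, productKey, lastObserved)."""
    return sorted(
        claims,
        key=lambda c: (c["providerId"], c["vulnId"], c["productKey"], c["lastObserved"]),
    )


def sha256_digest(value):
    """Hex SHA-256 over the canonical bytes; stable regardless of input key order."""
    return hashlib.sha256(canonical_json(value)).hexdigest()
```

Two dictionaries that differ only in key insertion order produce the same bytes and therefore the same digest, which is the property the determinism tests rely on.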
|
||||
|
||||
|
||||
> **Scope.** This document specifies the **Excitor** service: its purpose, trust model, data structures, APIs, plug‑in contracts, storage schema, normalization/consensus algorithms, performance budgets, testing matrix, and how it integrates with Scanner, Policy, Concelier, and the attestation chain. It is implementation‑ready.
|
||||
|
||||
---
|
||||
|
||||
## 0) Mission & role in the platform
|
||||
|
||||
**Mission.** Convert heterogeneous **VEX** statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into **canonical, queryable claims**; compute **deterministic consensus** per *(vuln, product)*; preserve **conflicts with provenance**; publish **stable, attestable exports** that the backend uses to suppress non‑exploitable findings, prioritize remaining risk, and explain decisions.
|
||||
|
||||
**Boundaries.**
|
||||
|
||||
* Excitor **does not** decide PASS/FAIL. It supplies **evidence** (statuses + justifications + provenance weights).
|
||||
* Excitor preserves **conflicting claims** unchanged; consensus encodes how we would pick, but the raw set is always exportable.
|
||||
* VEX consumption is **backend‑only**: Scanner never applies VEX. The backend’s **Policy Engine** asks Excitor for status evidence and then decides what to show.
|
||||
|
||||
---
|
||||
|
||||
## 1) Inputs, outputs & canonical domain
|
||||
|
||||
### 1.1 Accepted input formats (ingest)
|
||||
|
||||
* **OpenVEX** JSON documents (attested or raw).
|
||||
* **CSAF VEX** 2.x (vendor PSIRTs and distros commonly publish CSAF).
|
||||
* **CycloneDX VEX** 1.4+ (standalone VEX or embedded VEX blocks).
|
||||
* **OCI‑attached attestations** (VEX statements shipped as OCI referrers) — optional connectors.
|
||||
|
||||
All connectors register **source metadata**: provider identity, trust tier, signature expectations (PGP/cosign/PKI), fetch windows, rate limits, and time anchors.
|
||||
|
||||
### 1.2 Canonical model (normalized)
|
||||
|
||||
Every incoming statement becomes a set of **VexClaim** records:
|
||||
|
||||
```
VexClaim
- providerId           // 'redhat', 'suse', 'ubuntu', 'github', 'vendorX'
- vulnId               // 'CVE-2025-12345', 'GHSA-xxxx', canonicalized
- productKey           // canonical product identity (see §2.2)
- status               // affected | not_affected | fixed | under_investigation
- justification?       // for 'not_affected'/'affected' where provided
- introducedVersion?   // semantics per provider (range or exact)
- fixedVersion?        // where provided (range or exact)
- lastObserved         // timestamp from source or fetch time
- provenance           // doc digest, signature status, fetch URI, line/offset anchors
- evidence[]           // raw source snippets for explainability
- supersedes?          // optional cross-doc chain (docDigest → docDigest)
```
|
||||
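As a rough illustration, the record above could be modelled as an immutable Python dataclass. The field names follow the listing; the Python types, the `VexStatus` enum, the flattened `doc_digest` field, and the sample values are assumptions made for this sketch, not the service's actual types.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional, Tuple


class VexStatus(str, Enum):
    AFFECTED = "affected"
    NOT_AFFECTED = "not_affected"
    FIXED = "fixed"
    UNDER_INVESTIGATION = "under_investigation"


@dataclass(frozen=True)
class VexClaim:
    provider_id: str
    vuln_id: str
    product_key: str
    status: VexStatus
    last_observed: datetime
    doc_digest: str                       # digest of the source document (provenance)
    justification: Optional[str] = None
    introduced_version: Optional[str] = None
    fixed_version: Optional[str] = None
    evidence: Tuple[str, ...] = ()        # raw source snippets
    supersedes: Optional[str] = None      # digest of the document this one replaces

    def __post_init__(self) -> None:
        # Timestamps without an offset cannot be ordered reliably across providers.
        if self.last_observed.tzinfo is None:
            raise ValueError("last_observed must be timezone-aware")


# Illustrative values only.
claim = VexClaim(
    provider_id="redhat",
    vuln_id="CVE-2025-12345",
    product_key="pkg:rpm/redhat/openssl@3.0.7",
    status=VexStatus.NOT_AFFECTED,
    last_observed=datetime(2025, 1, 1, tzinfo=timezone.utc),
    doc_digest="sha256:example",
    justification="component_not_present",
)
```

Freezing the dataclass keeps claims hashable and guards against accidental mutation after normalization, which fits the requirement that raw claims stay unchanged.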
|
||||
### 1.3 Exports (consumption)
|
||||
|
||||
* **VexConsensus** per `(vulnId, productKey)` with:
|
||||
|
||||
* `rollupStatus` (after policy weights/justification gates),
|
||||
* `sources[]` (winning + losing claims with weights & reasons),
|
||||
* `policyRevisionId` (identifier of the Excitor policy used),
|
||||
* `consensusDigest` (stable SHA‑256 over canonical JSON).
|
||||
* **Raw claims** export for auditing (unchanged, with provenance).
|
||||
* **Provider snapshots** (per source, last N days) for operator debugging.
|
||||
* **Index** optimized for backend joins: `(productKey, vulnId) → (status, confidence, sourceSet)`.
|
||||
|
||||
All exports are **deterministic**, and (optionally) **attested** via DSSE and logged to Rekor v2.
|
||||
|
||||
---
|
||||
|
||||
## 2) Identity model — products & joins
|
||||
|
||||
### 2.1 Vuln identity
|
||||
|
||||
* Accepts **CVE**, **GHSA**, vendor IDs (MSRC, RHSA…), distro IDs (DSA/USN/RHSA…) — normalized to `vulnId` with alias sets.
|
||||
* **Alias graph** maintained (from Conselier) to map vendor/distro IDs → CVE (primary) and to **GHSA** where applicable.
|
||||
|
||||
### 2.2 Product identity (`productKey`)
|
||||
|
||||
* **Primary:** `purl` (Package URL).
|
||||
* **Secondary links:** `cpe`, **OS package NVRA/EVR**, NuGet/Maven/Golang identity, and **OS package name** when purl unavailable.
|
||||
* **Fallback:** `oci:<registry>/<repo>@<digest>` for image‑level VEX.
|
||||
* **Special cases:** kernel modules, firmware, platforms → provider‑specific mapping helpers (connector captures provider’s product taxonomy → canonical `productKey`).
|
||||
|
||||
> Excitor does not invent identities. If a provider cannot be mapped to purl/CPE/NVRA deterministically, we keep the native **product string** and mark the claim as **non‑joinable**; the backend will ignore it unless a policy explicitly whitelists that provider mapping.
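The precedence in this section can be sketched as a small resolver. This is a minimal sketch under stated assumptions: the function name, its parameters, and the choice to use a secondary identifier verbatim as the key are hypothetical; only the ordering (purl, then CPE/NVRA, then the `oci:` fallback, then a non-joinable native string) comes from the text.

```python
from typing import Optional, Tuple


def resolve_product_key(
    purl: Optional[str] = None,
    cpe: Optional[str] = None,
    nvra: Optional[str] = None,
    image: Optional[Tuple[str, str, str]] = None,  # (registry, repo, digest)
    native: str = "",
) -> Tuple[str, bool]:
    """Return (productKey, joinable) using the precedence described above."""
    if purl:
        return purl, True
    if cpe:
        return cpe, True
    if nvra:
        return nvra, True
    if image:
        registry, repo, digest = image
        return f"oci:{registry}/{repo}@{digest}", True
    # Nothing mapped deterministically: keep the provider's string and flag it.
    return native, False
```

Returning the `joinable` flag alongside the key lets the caller store non-joinable claims without guessing an identity for them.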
|
||||
|
||||
---
|
||||
|
||||
## 3) Storage schema (MongoDB)
|
||||
|
||||
Database: `excitor`
|
||||
|
||||
### 3.1 Collections
|
||||
|
||||
**`vex.providers`**
|
||||
|
||||
```
_id: providerId
name, homepage, contact
trustTier: enum {vendor, distro, platform, hub, attestation}
signaturePolicy: { type: pgp|cosign|x509|none, keys[], certs[], cosignKeylessRoots[] }
fetch: { baseUrl, kind: http|oci|file, rateLimit, etagSupport, windowDays }
enabled: bool
createdAt, modifiedAt
```
|
||||
|
||||
**`vex.raw`** (immutable raw documents)
|
||||
|
||||
```
_id: sha256(doc bytes)
providerId
uri
ingestedAt
contentType
sig: { verified: bool, method: pgp|cosign|x509|none, keyId|certSubject, bundle? }
payload: GridFS pointer (if large)
disposition: kept|replaced|superseded
correlation: { replaces?: sha256, replacedBy?: sha256 }
```
|
||||
|
||||
**`vex.claims`** (normalized rows; dedupe on providerId+vulnId+productKey+docDigest)
|
||||
|
||||
```
_id
providerId
vulnId
productKey
status
justification?
introducedVersion?
fixedVersion?
lastObserved
docDigest
provenance { uri, line?, pointer?, signatureState }
evidence[] { key, value, locator }
indices:
  - {vulnId:1, productKey:1}
  - {providerId:1, lastObserved:-1}
  - {status:1}
  - text index (optional) on evidence.value for debugging
```
|
||||
|
||||
**`vex.consensus`** (rollups)
|
||||
|
||||
```
_id: sha256(canonical(vulnId, productKey, policyRevision))
vulnId
productKey
rollupStatus
sources[]: [
  { providerId, status, justification?, weight, lastObserved, accepted:bool, reason }
]
policyRevisionId
evaluatedAt
consensusDigest  // same as _id
indices:
  - {vulnId:1, productKey:1}
  - {policyRevisionId:1, evaluatedAt:-1}
```
|
||||
|
||||
**`vex.exports`** (manifest of emitted artifacts)
|
||||
|
||||
```
_id
querySignature
format: raw|consensus|index
artifactSha256
rekor { uuid, index, url }?
createdAt
policyRevisionId
cacheable: bool
```
|
||||
|
||||
**`vex.cache`**
|
||||
|
||||
```
querySignature -> exportId (for fast reuse)
ttl, hits
```
|
||||
|
||||
**`vex.migrations`**
|
||||
|
||||
* ordered migrations applied at bootstrap to ensure indexes.
|
||||
|
||||
### 3.2 Indexing strategy
|
||||
|
||||
* Hot path queries use exact `(vulnId, productKey)` and time‑bounded windows; compound indexes cover both.
|
||||
* The provider list view sorts by `lastObserved` so that stale providers are easy to spot.
|
||||
* `vex.consensus` keyed by `(vulnId, productKey, policyRevision)` for deterministic reuse.
|
||||
|
||||
---
|
||||
|
||||
## 4) Ingestion pipeline
|
||||
|
||||
### 4.1 Connector contract
|
||||
|
||||
```csharp
public interface IVexConnector
{
    string ProviderId { get; }
    Task FetchAsync(VexConnectorContext ctx, CancellationToken ct);      // raw docs
    Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct);  // raw -> VexClaim[]
}
```
|
||||
|
||||
* **Fetch** must implement: window scheduling, conditional GET (ETag/If‑Modified‑Since), rate limiting, retry/backoff.
|
||||
* **Normalize** parses the format, validates schema, maps product identities deterministically, emits `VexClaim` records with **provenance**.
|
||||
|
||||
### 4.2 Signature verification (per provider)
|
||||
|
||||
* **cosign (keyless or keyful)** for OCI referrers or HTTP‑served JSON with Sigstore bundles.
|
||||
* **PGP** (provider keyrings) for distro/vendor feeds that sign docs.
|
||||
* **x509** (mutual TLS / provider‑pinned certs) where applicable.
|
||||
* Signature state is stored on **vex.raw.sig** and copied into **provenance.signatureState** on claims.
|
||||
|
||||
> Claims from sources failing signature policy are marked `"signatureState.verified=false"` and **policy** can down‑weight or ignore them.
|
||||
|
||||
### 4.3 Time discipline
|
||||
|
||||
* For each doc, prefer **provider’s document timestamp**; if absent, use fetch time.
|
||||
* Claims carry `lastObserved` which drives **tie‑breaking** within equal weight tiers.
|
||||
|
||||
---
|
||||
|
||||
## 5) Normalization: product & status semantics
|
||||
|
||||
### 5.1 Product mapping
|
||||
|
||||
* **purl** first; **cpe** second; OS package NVRA/EVR mapping helpers (distro connectors) produce purls via canonical tables (e.g., rpm→purl:rpm, deb→purl:deb).
|
||||
* Where a provider publishes **platform‑level** VEX (e.g., “RHEL 9 not affected”), connectors expand to known product inventory rules (e.g., map to sets of packages/components shipped in the platform). Expansion tables are versioned and kept per provider; every expansion emits **evidence** indicating the rule applied.
|
||||
* If expansion would be speculative, the claim remains **platform‑scoped** with `productKey="platform:redhat:rhel:9"` and is flagged **non‑joinable**; backend can decide to use platform VEX only when Scanner proves the platform runtime.
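The platform-expansion rule can be illustrated with a table lookup. This is a sketch under assumptions: the `EXPANSIONS` table, its version label, and the package purls are invented for illustration; only the behaviour (expand when a rule exists and emit evidence naming the rule, otherwise keep a platform-scoped, non-joinable claim) follows the bullets above.

```python
from typing import Dict, List, Tuple

# Hypothetical versioned expansion table: platform key -> (table version, package purls).
EXPANSIONS: Dict[str, Tuple[str, List[str]]] = {
    "platform:redhat:rhel:9": (
        "rhel9-table-v1",
        ["pkg:rpm/redhat/glibc", "pkg:rpm/redhat/openssl"],
    ),
}


def expand_platform_claim(platform_key: str) -> List[dict]:
    """Expand a platform-level claim into product-level claims when a rule exists."""
    rule = EXPANSIONS.get(platform_key)
    if rule is None:
        # No deterministic rule: keep the claim platform-scoped and non-joinable.
        return [{"productKey": platform_key, "joinable": False, "evidence": []}]
    table_version, purls = rule
    return [
        {
            "productKey": purl,
            "joinable": True,
            # Every expansion records which rule produced it.
            "evidence": [{"key": "expansionRule", "value": table_version}],
        }
        for purl in purls
    ]
```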
|
||||
|
||||
### 5.2 Status + justification mapping
|
||||
|
||||
* Canonical **status**: `affected | not_affected | fixed | under_investigation`.
|
||||
* **Justifications** normalized to a controlled vocabulary (CISA‑aligned), e.g.:
|
||||
|
||||
* `component_not_present`
|
||||
* `vulnerable_code_not_in_execute_path`
|
||||
* `vulnerable_configuration_unused`
|
||||
* `inline_mitigation_applied`
|
||||
* `fix_available` (with `fixedVersion`)
|
||||
* `under_investigation`
|
||||
* Providers with free‑text justifications are mapped by deterministic tables; raw text preserved as `evidence`.
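A deterministic mapping table for free-text justifications might look like the following. The regular expressions and the `justification.raw` evidence key are illustrative assumptions, not a real provider table; the canonical values are taken from the vocabulary listed above.

```python
import re
from typing import List, Optional, Tuple

# Ordered (pattern, canonical) pairs. The first match wins, so the same input
# always maps to the same value. Patterns are illustrative only.
FREE_TEXT_TABLE: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"not (present|shipped|included)", re.I), "component_not_present"),
    (re.compile(r"(never|not) (executed|reachable|called)", re.I),
     "vulnerable_code_not_in_execute_path"),
    (re.compile(r"(configuration|feature|option).*(disabled|unused|not enabled)", re.I),
     "vulnerable_configuration_unused"),
    (re.compile(r"mitigat", re.I), "inline_mitigation_applied"),
]


def normalize_justification(raw_text: str) -> Tuple[Optional[str], dict]:
    """Map free text to the controlled vocabulary; always keep the raw text as evidence."""
    evidence = {"key": "justification.raw", "value": raw_text}
    for pattern, canonical in FREE_TEXT_TABLE:
        if pattern.search(raw_text):
            return canonical, evidence
    # Unmapped text yields no justification; the raw text is still preserved.
    return None, evidence
```

Keeping the table as an ordered list rather than a dictionary makes the match order explicit, which matters when a sentence could satisfy more than one pattern.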
|
||||
|
||||
---
|
||||
|
||||
## 6) Consensus algorithm
|
||||
|
||||
**Goal:** produce a **stable**, explainable `rollupStatus` per `(vulnId, productKey)` given possibly conflicting claims.
|
||||
|
||||
### 6.1 Inputs
|
||||
|
||||
* Set **S** of `VexClaim` for the key.
|
||||
* **Excitor policy snapshot**:
|
||||
|
||||
* **weights** per provider tier and per provider overrides.
|
||||
* **justification gates** (e.g., require justification for `not_affected` to be acceptable).
|
||||
* **minEvidence** rules (e.g., `not_affected` must come from ≥1 vendor or 2 distros).
|
||||
* **signature requirements** (e.g., require verified signature for ‘fixed’ to be considered).
|
||||
|
||||
### 6.2 Steps
|
||||
|
||||
1. **Filter invalid** claims by signature policy & justification gates → set `S'`.
|
||||
2. **Score** each claim:
|
||||
`score = weight(provider) * freshnessFactor(lastObserved)` where freshnessFactor ∈ [0.8, 1.0] for staleness decay (configurable; small effect).
|
||||
3. **Aggregate** scores per status: `W(status) = Σ score(claims with that status)`.
|
||||
4. **Pick** `rollupStatus = argmax_status W(status)`.
|
||||
5. **Tie‑breakers** (in order):
|
||||
|
||||
* The status backed by the higher **maximum single‑claim** score wins (with default weights: vendor > distro > platform > hub).
|
||||
* More **recent** lastObserved wins.
|
||||
* Fixed status precedence (`fixed` > `not_affected` > `under_investigation` > `affected`) as the final, deterministic tiebreaker.
|
||||
6. **Explain**: mark accepted sources (`accepted=true; reason="weight"`/`"freshness"`), mark rejected sources with explicit `reason` (`"insufficient_justification"`, `"signature_unverified"`, `"lower_weight"`).
|
||||
|
||||
> The algorithm is **pure** given S and policy snapshot; result is reproducible and hashed into `consensusDigest`.
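The steps in §6.2 can be sketched as a pure function. This is a minimal sketch, not the shipped implementation: the `Claim` shape, the linear 30-day freshness decay, and the policy dictionary keys (borrowed from the configuration keys in §9) are assumptions, and the `minEvidence` rule is left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Final tie-breaker from step 5: earlier entries win.
STATUS_PRECEDENCE = ["fixed", "not_affected", "under_investigation", "affected"]


@dataclass(frozen=True)
class Claim:
    provider_id: str
    tier: str                      # vendor | distro | platform | hub | attestation
    status: str
    last_observed: datetime
    justification: Optional[str] = None
    signature_verified: bool = False


def freshness(last_observed, now, max_age_days=30.0):
    """Linear decay from 1.0 to a 0.8 floor (the 30-day horizon is an assumption)."""
    age_days = max((now - last_observed).total_seconds() / 86400.0, 0.0)
    return 1.0 - 0.2 * min(age_days / max_age_days, 1.0)


def consensus(claims, policy, now):
    accepted, sources = [], []
    # Step 1: filter by justification gate and signature requirement.
    for c in claims:
        reason = None
        if (c.status == "not_affected" and not c.justification
                and policy["requireJustificationForNotAffected"]):
            reason = "insufficient_justification"
        elif (c.status == "fixed" and not c.signature_verified
                and policy["signatureRequiredForFixed"]):
            reason = "signature_unverified"
        if reason:
            sources.append({"providerId": c.provider_id, "accepted": False, "reason": reason})
        else:
            accepted.append(c)
    if not accepted:
        return None, sources

    # Step 2: score each surviving claim.
    def weight(c):
        return policy["providerOverrides"].get(c.provider_id, policy["weights"][c.tier])

    scored = [(c, weight(c) * freshness(c.last_observed, now)) for c in accepted]

    # Step 3: aggregate per status.
    totals = {}
    for c, score in scored:
        totals[c.status] = totals.get(c.status, 0.0) + score

    # Steps 4-5: argmax, with the tie-breakers folded into one sort key.
    def rank(status):
        members = [(c, s) for c, s in scored if c.status == status]
        return (
            totals[status],
            max(s for _, s in members),
            max(c.last_observed for c, _ in members),
            -STATUS_PRECEDENCE.index(status),
        )

    winner = max(totals, key=rank)

    # Step 6: explain every decision.
    for c, score in scored:
        won = c.status == winner
        sources.append({"providerId": c.provider_id, "accepted": won, "weight": score,
                        "reason": "weight" if won else "lower_weight"})
    return winner, sources
```

Expressing the tie-breakers as a tuple key keeps the ordering in one place; because the function reads only its arguments, the same claims and policy snapshot always yield the same winner.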
|
||||
|
||||
---
|
||||
|
||||
## 7) Query & export APIs
|
||||
|
||||
All endpoints are versioned under `/api/v1/vex`.
|
||||
|
||||
### 7.1 Query (online)
|
||||
|
||||
```
POST /claims/search
  body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string }
  → { claims[], nextPageToken? }

POST /consensus/search
  body: { vulnIds?: string[], productKeys?: string[], policyRevisionId?: string, since?: timestamp, limit?: int, pageToken?: string }
  → { entries[], nextPageToken? }

POST /excititor/resolve (scope: vex.read)
  body: { productKeys?: string[], purls?: string[], vulnerabilityIds: string[], policyRevisionId?: string }
  → { policy, resolvedAt, results: [ { vulnerabilityId, productKey, status, sources[], conflicts[], decisions[], signals?, summary?, envelope: { artifact, contentSignature?, attestation?, attestationEnvelope?, attestationSignature? } } ] }
```
|
||||
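As an illustration of the `resolve` request shape, a caller could build the body from a batch of `(purl, vulnerabilityId)` pairs as follows. The helper name is hypothetical, and how the service pairs the two lists is not specified in this document; the sketch only deduplicates and sorts them so that identical batches produce identical bodies.

```python
import json
from typing import Iterable, Optional, Tuple


def build_resolve_body(
    pairs: Iterable[Tuple[str, str]],
    policy_revision_id: Optional[str] = None,
) -> dict:
    """Build a request body for POST /excititor/resolve from (purl, vulnId) pairs."""
    pairs = list(pairs)
    vulnerability_ids = sorted({vuln for _, vuln in pairs})
    if not vulnerability_ids:
        # vulnerabilityIds is the only non-optional field in the documented body.
        raise ValueError("at least one vulnerability id is required")
    body = {
        "purls": sorted({purl for purl, _ in pairs}),
        "vulnerabilityIds": vulnerability_ids,
    }
    if policy_revision_id:
        body["policyRevisionId"] = policy_revision_id
    return body


payload = json.dumps(
    build_resolve_body(
        [("pkg:rpm/redhat/openssl@3.0.7", "CVE-2025-12345")],
        policy_revision_id="rev-12",
    )
)
```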
|
||||
### 7.2 Exports (cacheable snapshots)
|
||||
|
||||
```
POST /exports
  body: { signature: { vulnFilter?, productFilter?, providers?, since? }, format: raw|consensus|index, policyRevisionId?: string, force?: bool }
  → { exportId, artifactSha256, rekor? }

GET  /exports/{exportId}        → bytes (application/json or binary index)
GET  /exports/{exportId}/meta   → { signature, policyRevisionId, createdAt, artifactSha256, rekor? }
```
|
||||
|
||||
### 7.3 Provider operations
|
||||
|
||||
```
GET  /providers                  → provider list & signature policy
POST /providers/{id}/refresh     → trigger fetch/normalize window
GET  /providers/{id}/status      → last fetch, doc counts, signature stats
```
|
||||
|
||||
**Auth:** service‑to‑service via Authority tokens; operator operations via UI/CLI with RBAC.
|
||||
|
||||
---
|
||||
|
||||
## 8) Attestation integration
|
||||
|
||||
* Exports can be **DSSE‑signed** via **Signer** and logged to **Rekor v2** via **Attestor** (optional but recommended for regulated pipelines).
|
||||
* `vex.exports.rekor` stores `{uuid, index, url}` when present.
|
||||
* **Predicate type**: `https://stella-ops.org/attestations/vex-export/1` with fields:
|
||||
|
||||
* `querySignature`, `policyRevisionId`, `artifactSha256`, `createdAt`.
|
||||
|
||||
---
|
||||
|
||||
## 9) Configuration (YAML)
|
||||
|
||||
```yaml
excitor:
  mongo: { uri: "mongodb://mongo/excitor" }
  s3:
    endpoint: http://minio:9000
    bucket: stellaops
  policy:
    weights:
      vendor: 1.0
      distro: 0.9
      platform: 0.7
      hub: 0.5
      attestation: 0.6
    providerOverrides:
      redhat: 1.0
      suse: 0.95
    requireJustificationForNotAffected: true
    signatureRequiredForFixed: true
    minEvidence:
      not_affected:
        vendorOrTwoDistros: true
  connectors:
    - providerId: redhat
      kind: csaf
      baseUrl: https://access.redhat.com/security/data/csaf/v2/
      signaturePolicy: { type: pgp, keys: [ "…redhat-pgp-key…" ] }
      windowDays: 7
    - providerId: suse
      kind: csaf
      baseUrl: https://ftp.suse.com/pub/projects/security/csaf/
      signaturePolicy: { type: pgp, keys: [ "…suse-pgp-key…" ] }
    - providerId: ubuntu
      kind: openvex
      baseUrl: https://…/vex/
      signaturePolicy: { type: none }
    - providerId: vendorX
      kind: cyclonedx-vex
      ociRef: ghcr.io/vendorx/vex@sha256:…
      signaturePolicy: { type: cosign, cosignKeylessRoots: [ "sigstore-root" ] }
```
|
||||
|
||||
---
|
||||
|
||||
## 10) Security model
|
||||
|
||||
* **Input signature verification** enforced per provider policy (PGP, cosign, x509).
|
||||
* **Connector allowlists**: outbound fetch constrained to configured domains.
|
||||
* **Tenant isolation**: per‑tenant DB prefixes or separate DBs; per‑tenant S3 prefixes; per‑tenant policies.
|
||||
* **AuthN/Z**: Authority‑issued OpToks; RBAC roles (`vex.read`, `vex.admin`, `vex.export`).
|
||||
* **No secrets in logs**; deterministic logging contexts include providerId, docDigest, claim keys.
|
||||
|
||||
---
|
||||
|
||||
## 11) Performance & scale
|
||||
|
||||
* **Targets:**
|
||||
|
||||
* Normalize 10k VEX claims/minute/core.
|
||||
* Consensus compute ≤ 50 ms for 1k unique `(vuln, product)` pairs in hot cache.
|
||||
* Export (consensus) 1M rows in ≤ 60 s on 8 cores with streaming writer.
|
||||
|
||||
* **Scaling:**
|
||||
|
||||
* WebService handles control APIs; **Worker** background services (same image) execute fetch/normalize in parallel with rate‑limits; Mongo writes batched; upserts by natural keys.
|
||||
* Exports stream straight to S3 (MinIO) with rolling buffers.
|
||||
|
||||
* **Caching:**
|
||||
|
||||
* `vex.cache` maps query signatures → export; TTL to avoid stampedes; optimistic reuse unless `force`.
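The cache behaviour described above (reuse by query signature, a TTL, and a `force` bypass) can be sketched in memory. How `querySignature` is actually derived is not stated here; hashing the canonical JSON of the request is an assumption, as are the class and method names.

```python
import hashlib
import json
import time


def query_signature(query: dict) -> str:
    """Derive a stable key from an export request (assumed: SHA-256 of sorted JSON)."""
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ExportCache:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # signature -> (export_id, stored_at)

    def get_or_build(self, query: dict, build, force: bool = False):
        """Return (export_id, reused). `build` is called only on a miss, expiry, or force."""
        key = query_signature(query)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and not force and now - entry[1] < self._ttl:
            return entry[0], True
        export_id = build()
        self._entries[key] = (export_id, now)
        return export_id, False
```

Injecting the clock keeps the TTL logic testable without sleeping. A production version would also need per-key locking so that concurrent misses do not trigger duplicate builds.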
|
||||
|
||||
---
|
||||
|
||||
## 12) Observability
|
||||
|
||||
* **Metrics:**
|
||||
|
||||
* `vex.ingest.docs_total{provider}`
|
||||
* `vex.normalize.claims_total{provider}`
|
||||
* `vex.signature.failures_total{provider,method}`
|
||||
* `vex.consensus.conflicts_total{vulnId}`
|
||||
* `vex.exports.bytes{format}` / `vex.exports.latency_seconds`
|
||||
* **Tracing:** spans for fetch, verify, parse, map, consensus, export.
|
||||
* **Dashboards:** provider staleness, top conflicting vulns/components, signature posture, export cache hit‑rate.
|
||||
|
||||
---
|
||||
|
||||
## 13) Testing matrix
|
||||
|
||||
* **Connectors:** golden raw docs → deterministic claims (fixtures per provider/format).
|
||||
* **Signature policies:** valid/invalid PGP/cosign/x509 samples; ensure rejects are recorded but not accepted.
|
||||
* **Normalization edge cases:** platform‑only claims, free‑text justifications, non‑purl products.
|
||||
* **Consensus:** conflict scenarios across tiers; check tie‑breakers; justification gates.
|
||||
* **Performance:** 1M‑row export timing; memory ceilings; stream correctness.
|
||||
* **Determinism:** same inputs + policy → identical `consensusDigest` and export bytes.
|
||||
* **API contract tests:** pagination, filters, RBAC, rate limits.
|
||||
|
||||
---
|
||||
|
||||
## 14) Integration points
|
||||
|
||||
* **Backend Policy Engine** (in Scanner.WebService): calls `POST /excititor/resolve` (scope `vex.read`) with batched `(purl, vulnId)` pairs to fetch `rollupStatus + sources`.
|
||||
* **Conselier**: provides alias graph (CVE↔vendor IDs) and may supply VEX‑adjacent metadata (e.g., KEV flag) for policy escalation.
|
||||
* **UI**: VEX explorer screens use `/claims/search` and `/consensus/search`; show conflicts & provenance.
|
||||
* **CLI**: `stellaops vex export --consensus --since 7d --out vex.json` for audits.
|
||||
|
||||
---
|
||||
|
||||
## 15) Failure modes & fallback
|
||||
|
||||
* **Provider unreachable:** stale thresholds trigger warnings; policy can down‑weight stale providers automatically (freshness factor).
|
||||
* **Signature outage:** continue to ingest but mark `signatureState.verified=false`; consensus will likely exclude or down‑weight per policy.
|
||||
* **Schema drift:** unknown fields are preserved as `evidence`; normalization rejects a claim only when its **identity** or **status** is invalid.
|
||||
|
||||
---
|
||||
|
||||
## 16) Rollout plan (incremental)
|
||||
|
||||
1. **MVP**: OpenVEX + CSAF connectors for 3 major providers (e.g., Red Hat/SUSE/Ubuntu), normalization + consensus + `/excititor/resolve`.
|
||||
2. **Signature policies**: PGP for distros; cosign for OCI.
|
||||
3. **Exports + optional attestation**.
|
||||
4. **CycloneDX VEX** connectors; platform claim expansion tables; UI explorer.
|
||||
5. **Scale hardening**: export indexes; conflict analytics.
|
||||
|
||||
---
|
||||
|
||||
## 17) Appendix — canonical JSON (stable ordering)
|
||||
|
||||
All exports and consensus entries are serialized via `VexCanonicalJsonSerializer`:
|
||||
|
||||
* UTF‑8 without BOM;
|
||||
* keys sorted (ASCII);
|
||||
* arrays sorted by `(providerId, vulnId, productKey, lastObserved)` unless semantic order mandated;
|
||||
* timestamps in `YYYY‑MM‑DDThh:mm:ssZ`;
|
||||
* no insignificant whitespace.
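The rules above can be approximated with the Python standard library. This is an illustration of the rules, not the `VexCanonicalJsonSerializer` itself; the helper names are hypothetical, and edge cases such as number formatting are not covered.

```python
import hashlib
import json
from datetime import datetime, timezone


def format_timestamp(value: datetime) -> str:
    """Render a timezone-aware timestamp as YYYY-MM-DDThh:mm:ssZ in UTC."""
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def sort_claims(claims: list) -> list:
    """Order array entries by (providerId, vulnId, productKey, lastObserved)."""
    return sorted(
        claims,
        key=lambda c: (c["providerId"], c["vulnId"], c["productKey"], c["lastObserved"]),
    )


def canonical_json(value) -> bytes:
    # sort_keys orders object keys; compact separators drop insignificant whitespace.
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")  # the "utf-8" codec never writes a BOM


def digest(value) -> str:
    """SHA-256 over the canonical bytes, hex-encoded."""
    return hashlib.sha256(canonical_json(value)).hexdigest()
```

Because key order and whitespace are fixed, two logically equal documents hash to the same digest regardless of how they were built in memory.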
|
||||
|
||||
@@ -1,65 +1,65 @@
|
||||
# Implementation plan — Vexer
|
||||
|
||||
## Delivery phases
|
||||
- **Phase 1 – Connectors & normalization**
|
||||
Build connectors for OpenVEX, CSAF VEX, CycloneDX VEX, OCI attestations; capture provenance, signatures, and source metadata; normalise into `VexClaim`.
|
||||
- **Phase 2 – Mapping & trust registry**
|
||||
Implement product mapping (CPE → purl/version), issuer registry (trust tiers, signatures), scope scoring, and justification taxonomy.
|
||||
- **Phase 3 – Consensus & projections**
|
||||
Deliver consensus computation, conflict preservation, projections (`vex_consensus`, history, provider snapshots), and DSSE events.
|
||||
- **Phase 4 – APIs & integrations**
|
||||
Expose REST/CLI endpoints for claims, consensus, conflicts, exports; integrate Policy Engine, Vuln Explorer, Advisory AI, Export Center.
|
||||
- **Phase 5 – Observability & offline**
|
||||
Ship metrics, logs, traces, dashboards, incident runbooks, Offline Kit bundles, and performance tuning (10M claims/tenant).
|
||||
|
||||
## Work breakdown
|
||||
- **Connectors**
|
||||
- Fetchers for vendor feeds, CSAF repositories, OpenVEX docs, OCI referrers.
|
||||
- Signature verification (PGP, cosign, PKI) per source; schema validation; rate limiting.
|
||||
- Source configuration (trust tier, fetch cadence, blackout windows) stored in metadata registry.
|
||||
- **Normalization**
|
||||
- Canonical `VexClaim` schema with deterministic IDs, provenance, supersedes chains.
|
||||
- Product tree parsing, mapping to canonical product keys and environments.
|
||||
- Justification and scope scoring derived from source semantics.
|
||||
- **Consensus & projections**
|
||||
- Lattice join with precedence rules, conflict tracking, confidence scores, recency decay.
|
||||
- Append-only history, conflict queue, DSSE events (`vex.consensus.updated`).
|
||||
- Export-ready JSONL & DSSE bundles for Offline Kit and Export Center.
|
||||
- **APIs & UX**
|
||||
- REST endpoints (`/claims`, `/consensus`, `/conflicts`, `/providers`) with tenant RBAC.
|
||||
- CLI commands `stella vex claims|consensus|conflicts|export`.
|
||||
- Console modules (list/detail, conflict diagnostics, provider health, simulation hooks).
|
||||
- **Integrations**
|
||||
- Policy Engine trust knobs, Vuln Explorer consensus badges, Advisory AI narrative generation, Notify alerts for conflicts.
|
||||
- Orchestrator jobs for recompute/backfill triggered by Excitator deltas.
|
||||
- **Observability & Ops**
|
||||
- Metrics (ingest latency, signature failure rate, conflict rate, consensus latency).
|
||||
- Logs/traces with tenant/issuer/provenance context.
|
||||
- Runbooks for mapping failures, signature errors, recompute storms, quota exhaustion.
|
||||
|
||||
## Acceptance criteria
|
||||
- Connectors ingest validated VEX statements with signed provenance, deterministic mapping, and tenant isolation.
|
||||
- Consensus outputs are reproducible, include conflicts, and integrate with Policy Engine/Vuln Explorer/Export Center.
|
||||
- CLI/Console provide evidence inspection, conflict analysis, and exports; Offline Kit bundles replay verification offline.
|
||||
- Observability dashboards/alerts capture ingest health, trust anomalies, conflict spikes, and performance budgets.
|
||||
- Recompute pipeline handles policy changes and new evidence without dropping deterministic outcomes.
|
||||
|
||||
## Risks & mitigations
|
||||
- **Mapping ambiguity:** maintain scope scores, manual overrides, highlight warnings.
|
||||
- **Signature trust gaps:** issuer registry with auditing, fallback trust policies, tenant overrides.
|
||||
- **Evidence surges:** orchestrator backpressure, prioritised queues, shardable workers.
|
||||
- **Performance regressions:** indexing, caching, load tests, budget enforcement.
|
||||
- **Tenant leakage:** strict RBAC/filters, fuzz tests, compliance reviews.
|
||||
|
||||
## Test strategy
|
||||
- **Unit:** connector parsers, normalization, mapping conversions, lattice operations.
|
||||
- **Property:** randomised evidence ensuring commutative consensus and deterministic digests.
|
||||
- **Integration:** end-to-end pipeline from Excitator to consensus export, policy simulation, conflict handling.
|
||||
- **Performance:** large feed ingestion, recompute stress, CLI export throughput.
|
||||
- **Security:** signature tampering, issuer revocation, RBAC.
|
||||
- **Offline:** export/import verification, DSSE bundle validation.
|
||||
|
||||
## Definition of done
|
||||
- Connectors, normalization, consensus, APIs, and integrations deployed with telemetry, runbooks, and Offline Kit parity.
|
||||
- Documentation (overview, architecture, algorithm, issuer registry, API/CLI, runbooks) updated with imposed rule compliance.
|
||||
- ./TASKS.md and ../../TASKS.md reflect active status and dependencies.
|
||||
# Implementation plan — Excitor
|
||||
|
||||
## Delivery phases
|
||||
- **Phase 1 – Connectors & normalization**
|
||||
Build connectors for OpenVEX, CSAF VEX, CycloneDX VEX, OCI attestations; capture provenance, signatures, and source metadata; normalise into `VexClaim`.
|
||||
- **Phase 2 – Mapping & trust registry**
|
||||
Implement product mapping (CPE → purl/version), issuer registry (trust tiers, signatures), scope scoring, and justification taxonomy.
|
||||
- **Phase 3 – Consensus & projections**
|
||||
Deliver consensus computation, conflict preservation, projections (`vex_consensus`, history, provider snapshots), and DSSE events.
|
||||
- **Phase 4 – APIs & integrations**
|
||||
Expose REST/CLI endpoints for claims, consensus, conflicts, exports; integrate Policy Engine, Vuln Explorer, Advisory AI, Export Center.
|
||||
- **Phase 5 – Observability & offline**
|
||||
Ship metrics, logs, traces, dashboards, incident runbooks, Offline Kit bundles, and performance tuning (10M claims/tenant).
|
||||
|
||||
## Work breakdown
|
||||
- **Connectors**
|
||||
- Fetchers for vendor feeds, CSAF repositories, OpenVEX docs, OCI referrers.
|
||||
- Signature verification (PGP, cosign, PKI) per source; schema validation; rate limiting.
|
||||
- Source configuration (trust tier, fetch cadence, blackout windows) stored in metadata registry.
|
||||
- **Normalization**
|
||||
- Canonical `VexClaim` schema with deterministic IDs, provenance, supersedes chains.
|
||||
- Product tree parsing, mapping to canonical product keys and environments.
|
||||
- Justification and scope scoring derived from source semantics.
|
||||
- **Consensus & projections**
|
||||
- Lattice join with precedence rules, conflict tracking, confidence scores, recency decay.
|
||||
- Append-only history, conflict queue, DSSE events (`vex.consensus.updated`).
|
||||
- Export-ready JSONL & DSSE bundles for Offline Kit and Export Center.
|
||||
- **APIs & UX**
|
||||
- REST endpoints (`/claims`, `/consensus`, `/conflicts`, `/providers`) with tenant RBAC.
|
||||
- CLI commands `stella vex claims|consensus|conflicts|export`.
|
||||
- Console modules (list/detail, conflict diagnostics, provider health, simulation hooks).
|
||||
- **Integrations**
|
||||
- Policy Engine trust knobs, Vuln Explorer consensus badges, Advisory AI narrative generation, Notify alerts for conflicts.
|
||||
- Orchestrator jobs for recompute/backfill triggered by Excitor deltas.
|
||||
- **Observability & Ops**
|
||||
- Metrics (ingest latency, signature failure rate, conflict rate, consensus latency).
|
||||
- Logs/traces with tenant/issuer/provenance context.
|
||||
- Runbooks for mapping failures, signature errors, recompute storms, quota exhaustion.
|
||||
|
||||
## Acceptance criteria
|
||||
- Connectors ingest validated VEX statements with signed provenance, deterministic mapping, and tenant isolation.
|
||||
- Consensus outputs are reproducible, include conflicts, and integrate with Policy Engine/Vuln Explorer/Export Center.
|
||||
- CLI/Console provide evidence inspection, conflict analysis, and exports; Offline Kit bundles replay verification offline.
|
||||
- Observability dashboards/alerts capture ingest health, trust anomalies, conflict spikes, and performance budgets.
|
||||
- Recompute pipeline handles policy changes and new evidence without dropping deterministic outcomes.
|
||||
|
||||
## Risks & mitigations
|
||||
- **Mapping ambiguity:** maintain scope scores, manual overrides, highlight warnings.
|
||||
- **Signature trust gaps:** issuer registry with auditing, fallback trust policies, tenant overrides.
|
||||
- **Evidence surges:** orchestrator backpressure, prioritised queues, shardable workers.
|
||||
- **Performance regressions:** indexing, caching, load tests, budget enforcement.
|
||||
- **Tenant leakage:** strict RBAC/filters, fuzz tests, compliance reviews.
|
||||
|
||||
## Test strategy
|
||||
- **Unit:** connector parsers, normalization, mapping conversions, lattice operations.
|
||||
- **Property:** randomised evidence ensuring commutative consensus and deterministic digests.
|
||||
- **Integration:** end-to-end pipeline from Excitor to consensus export, policy simulation, conflict handling.
|
||||
- **Performance:** large feed ingestion, recompute stress, CLI export throughput.
|
||||
- **Security:** signature tampering, issuer revocation, RBAC.
|
||||
- **Offline:** export/import verification, DSSE bundle validation.
|
||||
|
||||
## Definition of done
|
||||
- Connectors, normalization, consensus, APIs, and integrations deployed with telemetry, runbooks, and Offline Kit parity.
|
||||
- Documentation (overview, architecture, algorithm, issuer registry, API/CLI, runbooks) updated with imposed rule compliance.
|
||||
- ./TASKS.md and ../../TASKS.md reflect active status and dependencies.
|
||||
@@ -1,83 +1,83 @@
|
||||
## Status
|
||||
|
||||
This document tracks the future-looking risk scoring model for Vexer. The calculation below is not active yet; Sprint 7 work will add the required schema fields, policy controls, and services. Until that ships, Vexer emits consensus statuses without numeric scores.
|
||||
|
||||
## Scoring model (target state)
|
||||
|
||||
**S = Gate(VEX_status) × W_trust(source) × [Severity_base × (1 + α·KEV + β·EPSS)]**
|
||||
|
||||
* **Gate(VEX_status)**: `affected`/`under_investigation` → 1, `not_affected`/`fixed` → 0. A trusted “not affected” or “fixed” still zeroes the score.
|
||||
* **W_trust(source)**: normalized policy weight (baseline 0‒1). Policies may opt into >1 boosts for signed vendor feeds once Phase 1 closes.
|
||||
* **Severity_base**: canonical numeric severity from Feedser (CVSS or org-defined scale).
|
||||
* **KEV flag**: 0/1 boost when CISA Known Exploited Vulnerabilities applies.
|
||||
* **EPSS**: probability [0,1]; bounded multiplier.
|
||||
* **α, β**: configurable coefficients (default α=0.25, β=0.5) stored in policy.
|
||||
|
||||
Safeguards: freeze boosts when product identity is unknown, clamp outputs ≥0, and log every factor in the audit trail.
|
||||
|
||||
## Implementation roadmap
|
||||
|
||||
| Phase | Scope | Artifacts |
|
||||
| --- | --- | --- |
|
||||
| **Phase 1 – Schema foundations** | Extend Vexer consensus/claims and Feedser canonical advisories with severity, KEV, EPSS, and expose α/β + weight ceilings in policy. | Sprint 7 tasks `VEXER-CORE-02-001`, `VEXER-POLICY-02-001`, `VEXER-STORAGE-02-001`, `FEEDCORE-ENGINE-07-001`. |
|
||||
| **Phase 2 – Deterministic score engine** | Implement a scoring component that executes alongside consensus and persists score envelopes with hashes. | Planned task `VEXER-CORE-02-002` (backlog). |
|
||||
| **Phase 3 – Surfacing & enforcement** | Expose scores via WebService/CLI, integrate with Feedser noise priors, and enforce policy-based suppressions. | To be scheduled after Phase 2. |
|
||||
|
||||
## Data model (after Phase 1)
|
||||
|
||||
```json
{
  "vulnerabilityId": "CVE-2025-12345",
  "product": "pkg:name@version",
  "consensus": {
    "status": "affected",
    "policyRevisionId": "rev-12",
    "policyDigest": "0D9AEC…"
  },
  "signals": {
    "severity": {"scheme": "CVSS:3.1", "score": 7.5},
    "kev": true,
    "epss": 0.40
  },
  "policy": {
    "weight": 1.15,
    "alpha": 0.25,
    "beta": 0.5
  },
  "score": {
    "value": 10.8,
    "generatedAt": "2025-11-05T14:12:30Z",
    "audit": [
      "gate:affected",
      "weight:1.15",
      "severity:7.5",
      "kev:1",
      "epss:0.40"
    ]
  }
}
```
|
||||
|
||||
## Operational guidance
|
||||
|
||||
* **Inputs**: Feedser delivers severity/KEV/EPSS via the advisory event log; Vexer connectors load VEX statements. Policy owns trust tiers and coefficients.
|
||||
* **Processing**: the scoring engine (Phase 2) runs next to consensus, storing results with deterministic hashes so exports and attestations can reference them.
|
||||
* **Consumption**: WebService/CLI will return consensus plus score; scanners may suppress findings only when policy-authorized VEX gating and signed score envelopes agree.
|
||||
|
||||
## Pseudocode (Phase 2 preview)
|
||||
|
||||
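```python
def risk_score(gate, weight, severity, kev, epss, alpha, beta, freeze_boosts=False):
    # A gate of 0 (not_affected / fixed) zeroes the score regardless of other factors.
    if gate == 0:
        return 0
    # Unknown product identity: drop the KEV/EPSS boosts but keep the base severity.
    if freeze_boosts:
        kev, epss = 0, 0
    boost = 1 + alpha * kev + beta * epss
    # Clamp so the score is never negative.
    return max(0, weight * severity * boost)
```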
|
||||
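To connect the pseudocode to the data model, the following sketch derives the gate from the consensus status, treats missing signals as zero, and records each factor in an audit list. The function name, the `:missing` and `boosts:frozen` audit entries, and the sample record (which assumes a baseline weight of 1.0) are assumptions for illustration.

```python
GATE = {"affected": 1, "under_investigation": 1, "not_affected": 0, "fixed": 0}


def score_record(record: dict, freeze_boosts: bool = False):
    """Return (score, audit) for a record shaped like the data model."""
    status = record["consensus"]["status"]
    audit = ["gate:%s" % status]
    if GATE[status] == 0:
        return 0.0, audit

    signals = record.get("signals", {})
    policy = record["policy"]

    def signal(name, value):
        # Missing signals count as zero, and the omission is logged.
        if value is None:
            audit.append("%s:missing" % name)
            return 0.0
        audit.append("%s:%s" % (name, value))
        return float(value)

    severity = signal("severity", (signals.get("severity") or {}).get("score"))
    kev = signal("kev", None if signals.get("kev") is None else int(signals["kev"]))
    epss = signal("epss", signals.get("epss"))
    if freeze_boosts:
        kev, epss = 0.0, 0.0
        audit.append("boosts:frozen")

    audit.append("weight:%s" % policy["weight"])
    boost = 1 + policy["alpha"] * kev + policy["beta"] * epss
    return max(0.0, policy["weight"] * severity * boost), audit


# Illustrative record: severity 7.5, KEV set, EPSS 0.40, default coefficients.
example = {
    "consensus": {"status": "affected"},
    "signals": {"severity": {"scheme": "CVSS:3.1", "score": 7.5}, "kev": True, "epss": 0.40},
    "policy": {"weight": 1.0, "alpha": 0.25, "beta": 0.5},
}
score, audit = score_record(example)  # 7.5 × (1 + 0.25 + 0.20) = 10.875
```

Returning the audit list together with the score keeps the "log every factor" safeguard close to the arithmetic that uses those factors.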
|
||||
## FAQ
|
||||
|
||||
* **Can operators opt out?** Set α=β=0 or keep weights ≤1.0 via policy.
|
||||
* **What about missing signals?** Treat them as zero and log the omission.
|
||||
* **When will this ship?** Phase 1 is planned for Sprint 7; later phases depend on connector coverage and attestation delivery.
|
||||
## Status
|
||||
|
||||
This document tracks the future-looking risk scoring model for Excitor. The calculation below is not active yet; Sprint 7 work will add the required schema fields, policy controls, and services. Until that ships, Excitor emits consensus statuses without numeric scores.
|
||||
|
||||
## Scoring model (target state)
|
||||
|
||||
**S = Gate(VEX_status) × W_trust(source) × [Severity_base × (1 + α·KEV + β·EPSS)]**
|
||||
|
||||
* **Gate(VEX_status)**: `affected`/`under_investigation` → 1, `not_affected`/`fixed` → 0. A trusted “not affected” or “fixed” still zeroes the score.
|
||||
* **W_trust(source)**: normalized policy weight (baseline 0‒1). Policies may opt into >1 boosts for signed vendor feeds once Phase 1 closes.
|
||||
* **Severity_base**: canonical numeric severity from Conselier (CVSS or org-defined scale).
|
||||
* **KEV flag**: 0/1 boost when CISA Known Exploited Vulnerabilities applies.
|
||||
* **EPSS**: exploit probability in [0,1]; it enters the boost as β·EPSS, so its contribution is bounded by β.
|
||||
* **α, β**: configurable coefficients (default α=0.25, β=0.5) stored in policy.
|
||||
|
||||
Safeguards: freeze boosts when product identity is unknown, clamp outputs ≥0, and log every factor in the audit trail.
|
||||
|
||||
## Implementation roadmap
|
||||
|
||||
| Phase | Scope | Artifacts |
| --- | --- | --- |
| **Phase 1 – Schema foundations** | Extend Excitor consensus/claims and Conselier canonical advisories with severity, KEV, EPSS, and expose α/β + weight ceilings in policy. | Sprint 7 tasks `EXCITOR-CORE-02-001`, `EXCITOR-POLICY-02-001`, `EXCITOR-STORAGE-02-001`, `FEEDCORE-ENGINE-07-001`. |
| **Phase 2 – Deterministic score engine** | Implement a scoring component that executes alongside consensus and persists score envelopes with hashes. | Planned task `EXCITOR-CORE-02-002` (backlog). |
| **Phase 3 – Surfacing & enforcement** | Expose scores via WebService/CLI, integrate with Conselier noise priors, and enforce policy-based suppressions. | To be scheduled after Phase 2. |
|
||||
|
||||
## Data model (after Phase 1)
|
||||
|
||||
```json
{
  "vulnerabilityId": "CVE-2025-12345",
  "product": "pkg:name@version",
  "consensus": {
    "status": "affected",
    "policyRevisionId": "rev-12",
    "policyDigest": "0D9AEC…"
  },
  "signals": {
    "severity": {"scheme": "CVSS:3.1", "score": 7.5},
    "kev": true,
    "epss": 0.40
  },
  "policy": {
    "weight": 1.15,
    "alpha": 0.25,
    "beta": 0.5
  },
  "score": {
    "value": 10.8,
    "generatedAt": "2025-11-05T14:12:30Z",
    "audit": [
      "gate:affected",
      "weight:1.15",
      "severity:7.5",
      "kev:1",
      "epss:0.40"
    ]
  }
}
```
|
||||
|
||||
## Operational guidance
|
||||
|
||||
* **Inputs**: Conselier delivers severity/KEV/EPSS via the advisory event log; Excitor connectors load VEX statements. Policy owns trust tiers and coefficients.
|
||||
* **Processing**: the scoring engine (Phase 2) runs next to consensus, storing results with deterministic hashes so exports and attestations can reference them.
|
||||
* **Consumption**: WebService/CLI will return consensus plus score; scanners may suppress findings only when policy-authorized VEX gating and signed score envelopes agree.
|
||||
|
||||
## Pseudocode (Phase 2 preview)
|
||||
|
||||
```python
def risk_score(gate, weight, severity, kev, epss, alpha, beta, freeze_boosts=False):
    # A closed gate (not_affected / fixed) zeroes the score outright.
    if gate == 0:
        return 0
    # Product identity unknown: ignore the KEV/EPSS boosts.
    if freeze_boosts:
        kev, epss = 0, 0
    boost = 1 + alpha * kev + beta * epss
    # Clamp so the score is never negative.
    return max(0, weight * severity * boost)
```
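
As a quick sanity check, the preview function can be run against the signal and policy values from the data-model sample. This is a minimal sketch: the `GATE` mapping and the `sample` dictionary are illustrative names rather than part of any shipped API, and the function is repeated only so the snippet runs on its own.

```python
def risk_score(gate, weight, severity, kev, epss, alpha, beta, freeze_boosts=False):
    """Same formula as the Phase 2 preview, repeated for a standalone run."""
    if gate == 0:
        return 0
    if freeze_boosts:
        kev, epss = 0, 0
    boost = 1 + alpha * kev + beta * epss
    return max(0, weight * severity * boost)

# Hypothetical mapping from VEX status to the gate value described in the scoring model.
GATE = {"affected": 1, "under_investigation": 1, "not_affected": 0, "fixed": 0}

# Signal and policy values copied from the data-model sample.
sample = dict(weight=1.15, severity=7.5, kev=1, epss=0.40, alpha=0.25, beta=0.5)

affected = risk_score(GATE["affected"], **sample)    # boost = 1 + 0.25*1 + 0.5*0.40 = 1.45
frozen = risk_score(GATE["affected"], freeze_boosts=True, **sample)  # boost forced to 1
suppressed = risk_score(GATE["not_affected"], **sample)              # gate closes the score

print(round(affected, 3), round(frozen, 3), suppressed)
```

With these inputs the formula evaluates to about 12.51, so the `10.8` shown as `score.value` in the data-model sample is probably best read as a placeholder rather than a value computed from the same inputs.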
|
||||
|
||||
## FAQ
|
||||
|
||||
* **Can operators opt out?** Set α=β=0 to disable the KEV/EPSS boosts, or cap weights at ≤1.0 via policy.
|
||||
* **What about missing signals?** Treat them as zero and log the omission.
|
||||
* **When will this ship?** Phase 1 is planned for Sprint 7; later phases depend on connector coverage and attestation delivery.
|
||||
@@ -1,150 +1,150 @@
|
||||
# Export Center Provenance & Signing
|
||||
|
||||
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
|
||||
|
||||
Export Center runs emit deterministic manifests, provenance records, and signatures so operators can prove bundle integrity end-to-end—whether the artefact is downloaded over HTTPS, pulled as an OCI object, or staged through the Offline Kit. This guide captures the canonical artefacts, signing pipeline, verification workflows, and failure handling expectations that backlogs `EXPORT-SVC-35-005` and `EXPORT-SVC-37-002` implement.
|
||||
|
||||
---
|
||||
|
||||
## 1. Goals & scope
|
||||
|
||||
- **Authenticity.** Every export manifest and provenance document is signed using Authority-managed KMS keys (cosign-compatible) with optional SLSA Level 3 attestation.
|
||||
- **Traceability.** Provenance links each bundle to the inputs that produced it: tenant, findings ledger queries, policy snapshots, SBOM identifiers, adapter versions, and encryption recipients.
|
||||
- **Determinism.** Canonical JSON (sorted keys, RFC 3339 UTC timestamps, normalized numbers) guarantees byte-for-byte stability across reruns with identical input.
|
||||
- **Portability.** Signatures and attestations travel with filesystem bundles, OCI artefacts, and Offline Kit staging trees. Verification does not require online Authority access when the bundle includes the cosign public key.
|
||||
|
||||
---
|
||||
|
||||
## 2. Artefact inventory
|
||||
|
||||
| File | Location | Description | Notes |
|
||||
|------|----------|-------------|-------|
|
||||
| `export.json` | `manifests/export.json` or HTTP `GET /api/export/runs/{id}/manifest` | Canonical manifest describing profile, selectors, counts, SHA-256 digests, compression hints, distribution targets. | Hash of this file is included in provenance `subjects[]`. |
|
||||
| `provenance.json` | `manifests/provenance.json` or `GET /api/export/runs/{id}/provenance` | In-toto provenance record listing subjects, materials, toolchain metadata, encryption recipients, and KMS key identifiers. | Mirrors SLSA Level 2 schema; optionally upgraded to Level 3 with builder attestations. |
|
||||
| `export.json.sig` / `export.json.dsse` | `signatures/export.json.sig` | Cosign signature (and optional DSSE envelope) for manifest. | File naming matches cosign defaults; offline verification scripts expect `.sig`. |
|
||||
| `provenance.json.sig` / `provenance.json.dsse` | `signatures/provenance.json.sig` | Cosign signature (and optional DSSE envelope) for provenance document. | `dsse` present when SLSA Level 3 is enabled. |
|
||||
| `bundle.attestation` | `signatures/bundle.attestation` (optional) | SLSA Level 2/3 attestation binding bundle tarball/OCI digest to the run. | Only produced when `export.attestation.enabled=true`. |
|
||||
| `manifest.yaml` | bundle root | Human-readable summary including digests, sizes, encryption metadata, and verification hints. | Unsigned but redundant; signatures cover the JSON manifests. |
|
||||
|
||||
All digests use lowercase hex SHA-256 (`sha256:<digest>`). When bundle encryption is enabled, `provenance.json` records wrapped data keys and recipient fingerprints under `encryption.recipients[]`.
|
||||
|
||||
---
|
||||
|
||||
## 3. Signing pipeline
|
||||
|
||||
1. **Canonicalisation.** Export worker serialises `export.json` and `provenance.json` using `NotifyCanonicalJsonSerializer` (identical canonical JSON helpers shared across services). Keys are sorted lexicographically, arrays ordered deterministically, timestamps normalised to UTC.
|
||||
2. **Digest creation.** SHA-256 digests are computed and recorded:
|
||||
- `manifest_hash` and `provenance_hash` stored in the run metadata (Mongo) and exported via `/api/export/runs/{id}`.
|
||||
- Provenance `subjects[]` contains both manifest hash and bundle/archive hash.
|
||||
3. **Key retrieval.** Worker obtains a short-lived signing token from Authority’s KMS client using tenant-scoped credentials (`export.sign` scope). Keys live in Authority or tenant-specific HSMs depending on deployment.
|
||||
4. **Signature emission.** Cosign generates detached signatures (`*.sig`). If DSSE is enabled, cosign wraps payload bytes in a DSSE envelope (`*.dsse`). Attestations follow the SLSA Level 2 provenance template; Level 3 requires builder metadata (`EXPORT-SVC-37-002` optional feature flag).
|
||||
5. **Storage & distribution.** Signatures and attestations are written alongside manifests in object storage, included in filesystem bundles, and attached as OCI artefact layers/annotations.
|
||||
6. **Audit trail.** Run metadata captures signer identity (`signing_key_id`), cosign certificate serial, signature timestamps, and verification hints. Console/CLI surface these details for downstream automation.
|
||||
|
||||
> **Key management.** Secrets and key references are configured per tenant via `export.signing`, pointing to Authority clients or external HSM aliases. Offline deployments pre-load cosign public keys into the bundle (`signatures/pubkeys/{tenant}.pem`).
|
||||
|
||||
---
|
||||
|
||||
## 4. Provenance schema highlights
|
||||
|
||||
`provenance.json` follows the SLSA provenance (`https://slsa.dev/provenance/v1`) structure with StellaOps-specific extensions. Key fields:
|
||||
|
||||
| Path | Description |
|
||||
|------|-------------|
|
||||
| `subject[]` | Array of `{name,digest}` pairs. Includes bundle tarball/OCI digest and `export.json` digest. |
|
||||
| `predicateType` | SLSA v1 (default). |
|
||||
| `predicate.builder` | `{id:"stellaops/export-center@<region>"}` identifies the worker instance/cluster. |
|
||||
| `predicate.buildType` | Profile identifier (`mirror:full`, `mirror:delta`, etc.). |
|
||||
| `predicate.invocation.parameters` | Profile selectors, retention flags, encryption mode, base export references. |
|
||||
| `predicate.materials[]` | Source artefacts with digests: findings ledger query snapshots, policy snapshot IDs + hashes, SBOM identifiers, adapter release digests. |
|
||||
| `predicate.metadata.buildFinishedOn` | RFC 3339 timestamp when signing completed. |
|
||||
| `predicate.metadata.reproducible` | Always `true`—workers guarantee determinism. |
|
||||
| `predicate.environment.encryption` | Records encryption recipients, wrapped keys, algorithm (`age` or `aes-gcm`). |
|
||||
| `predicate.environment.kms` | Signing key identifier (`authority://tenant/export-signing-key`) and certificate chain fingerprints. |
|
||||
|
||||
Sample (abridged):
|
||||
|
||||
```json
|
||||
{
|
||||
"subject": [
|
||||
{ "name": "bundle.tar.zst", "digest": { "sha256": "c1fe..." } },
|
||||
{ "name": "manifests/export.json", "digest": { "sha256": "ad42..." } }
|
||||
],
|
||||
"predicate": {
|
||||
"buildType": "mirror:delta",
|
||||
"invocation": {
|
||||
"parameters": {
|
||||
"tenant": "tenant-01",
|
||||
"baseExportId": "run-20251020-01",
|
||||
"selectors": { "sources": ["concelier","vexer"], "profiles": ["mirror"] }
|
||||
}
|
||||
},
|
||||
"materials": [
|
||||
{ "uri": "ledger://tenant-01/findings?cursor=rev-42", "digest": { "sha256": "0f9a..." } },
|
||||
{ "uri": "policy://tenant-01/snapshots/rev-17", "digest": { "sha256": "8c3d..." } }
|
||||
],
|
||||
"environment": {
|
||||
"encryption": {
|
||||
"mode": "age",
|
||||
"recipients": [
|
||||
{ "recipient": "age1qxyz...", "wrappedKey": "BASE64...", "keyId": "tenant-01/notify-age" }
|
||||
]
|
||||
},
|
||||
"kms": {
|
||||
"signingKeyId": "authority://tenant-01/export-signing",
|
||||
"certificateChainSha256": "1f5e..."
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Verification workflows
|
||||
|
||||
| Scenario | Steps |
|
||||
|----------|-------|
|
||||
| **CLI verification** | 1. `stella export manifest <runId> --output manifests/export.json --signature manifests/export.json.sig`<br>2. `stella export provenance <runId> --output manifests/provenance.json --signature manifests/provenance.json.sig`<br>3. `cosign verify-blob --key pubkeys/tenant.pem --signature manifests/export.json.sig manifests/export.json`<br>4. `cosign verify-blob --key pubkeys/tenant.pem --signature manifests/provenance.json.sig manifests/provenance.json` |
|
||||
| **Bundle verification (offline)** | 1. Extract bundle (or mount OCI artefact).<br>2. Validate manifest/provenance signatures using bundled public key.<br>3. Recompute SHA-256 for `data/` files and compare with entries in `export.json`.<br>4. If encrypted, decrypt with Age/AES-GCM recipient key, then re-run digest comparisons on decrypted content. |
|
||||
| **CI pipeline** | Use `stella export verify --manifest manifests/export.json --provenance manifests/provenance.json --signature manifests/export.json.sig --signature manifests/provenance.json.sig` (task `CLI-EXPORT-37-001`). Failure exits non-zero with reason codes (`ERR_EXPORT_SIG_INVALID`, `ERR_EXPORT_DIGEST_MISMATCH`). |
|
||||
| **Console download** | Console automatically verifies signatures before exposing the bundle; failure surfaces an actionable error referencing the export run ID and required remediation. |
|
||||
|
||||
Verification guidance (docs/modules/cli/guides/cli-reference.md §export) cross-links here; keep both docs in sync when CLI behaviour changes.
|
||||
|
||||
---
|
||||
|
||||
## 6. Distribution considerations
|
||||
|
||||
- **HTTP headers.** `X-Export-Digest` includes bundle digest; `X-Export-Provenance` references `provenance.json` URL; `X-Export-Signature` references `.sig`. Clients use these hints to short-circuit re-downloads.
|
||||
- **OCI annotations.** `org.opencontainers.image.ref.name`, `io.stellaops.export.manifest-digest`, and `io.stellaops.export.provenance-ref` allow registry tooling to locate manifests/signatures quickly.
|
||||
- **Offline Kit staging.** Offline kit assembler copies `manifests/`, `signatures/`, and `pubkeys/` verbatim. Verification scripts (`offline-kits/bin/verify-export.sh`) wrap the cosign commands described above.
|
||||
|
||||
---
|
||||
|
||||
## 7. Failure handling & observability
|
||||
|
||||
- Runs surface signature status via `/api/export/runs/{id}` (`signing.status`, `signing.lastError`). Common errors include `ERR_EXPORT_KMS_UNAVAILABLE`, `ERR_EXPORT_ATTESTATION_FAILED`, `ERR_EXPORT_CANONICALIZE`.
|
||||
- Metrics: `exporter_sign_duration_seconds`, `exporter_sign_failures_total{error_code}`, `exporter_provenance_verify_failures_total`.
|
||||
- Logs: `phase=sign`, `error_code`, `signing_key_id`, `cosign_certificate_sn`.
|
||||
- Alerts: DevOps dashboards (task `DEVOPS-EXPORT-37-001`) trigger on consecutive signing failures or verification failures >0.
|
||||
|
||||
When verification fails downstream, operators should:
|
||||
1. Confirm signatures using the known-good key.
|
||||
2. Inspect `provenance.json` materials; rerun the source queries to ensure matching digests.
|
||||
3. Review run audit logs and retry export with `--resume` to regenerate manifests.
|
||||
|
||||
---
|
||||
|
||||
## 8. Compliance checklist
|
||||
|
||||
- [ ] Manifests and provenance documents generated with canonical JSON, deterministic digests, and signatures.
|
||||
- [ ] Cosign public keys published per tenant, rotated through Authority, and distributed to Offline Kit consumers.
|
||||
- [ ] SLSA attestations enabled where supply-chain requirements demand Level 3 evidence.
|
||||
- [ ] CLI/Console verification paths documented and tested (CI pipelines exercise `stella export verify`).
|
||||
- [ ] Encryption metadata (recipients, wrapped keys) recorded in provenance and validated during verification.
|
||||
- [ ] Run audit logs capture signature timestamps, signer identity, and failure reasons.
|
||||
|
||||
---
|
||||
|
||||
> **Imposed rule reminder:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
|
||||
# Export Center Provenance & Signing
|
||||
|
||||
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
|
||||
|
||||
Export Center runs emit deterministic manifests, provenance records, and signatures so operators can prove bundle integrity end-to-end—whether the artefact is downloaded over HTTPS, pulled as an OCI object, or staged through the Offline Kit. This guide captures the canonical artefacts, signing pipeline, verification workflows, and failure handling expectations that backlogs `EXPORT-SVC-35-005` and `EXPORT-SVC-37-002` implement.
|
||||
|
||||
---
|
||||
|
||||
## 1. Goals & scope
|
||||
|
||||
- **Authenticity.** Every export manifest and provenance document is signed using Authority-managed KMS keys (cosign-compatible) with optional SLSA Level 3 attestation.
|
||||
- **Traceability.** Provenance links each bundle to the inputs that produced it: tenant, findings ledger queries, policy snapshots, SBOM identifiers, adapter versions, and encryption recipients.
|
||||
- **Determinism.** Canonical JSON (sorted keys, RFC 3339 UTC timestamps, normalized numbers) guarantees byte-for-byte stability across reruns with identical input.
|
||||
- **Portability.** Signatures and attestations travel with filesystem bundles, OCI artefacts, and Offline Kit staging trees. Verification does not require online Authority access when the bundle includes the cosign public key.
|
||||
|
||||
---
|
||||
|
||||
## 2. Artefact inventory
|
||||
|
||||
| File | Location | Description | Notes |
|------|----------|-------------|-------|
| `export.json` | `manifests/export.json` or HTTP `GET /api/export/runs/{id}/manifest` | Canonical manifest describing profile, selectors, counts, SHA-256 digests, compression hints, distribution targets. | Hash of this file is included in provenance `subject[]`. |
| `provenance.json` | `manifests/provenance.json` or `GET /api/export/runs/{id}/provenance` | In-toto provenance record listing subjects, materials, toolchain metadata, encryption recipients, and KMS key identifiers. | Mirrors SLSA Level 2 schema; optionally upgraded to Level 3 with builder attestations. |
| `export.json.sig` / `export.json.dsse` | `signatures/export.json.sig` | Cosign signature (and optional DSSE envelope) for manifest. | File naming matches cosign defaults; offline verification scripts expect `.sig`. |
| `provenance.json.sig` / `provenance.json.dsse` | `signatures/provenance.json.sig` | Cosign signature (and optional DSSE envelope) for provenance document. | `dsse` present when SLSA Level 3 is enabled. |
| `bundle.attestation` | `signatures/bundle.attestation` (optional) | SLSA Level 2/3 attestation binding bundle tarball/OCI digest to the run. | Only produced when `export.attestation.enabled=true`. |
| `manifest.yaml` | bundle root | Human-readable summary including digests, sizes, encryption metadata, and verification hints. | Unsigned but redundant; signatures cover the JSON manifests. |
|
||||
|
||||
All digests use lowercase hex SHA-256 (`sha256:<digest>`). When bundle encryption is enabled, `provenance.json` records wrapped data keys and recipient fingerprints under `encryption.recipients[]`.
|
||||
|
||||
---
|
||||
|
||||
## 3. Signing pipeline
|
||||
|
||||
1. **Canonicalisation.** Export worker serialises `export.json` and `provenance.json` using `NotifyCanonicalJsonSerializer` (identical canonical JSON helpers shared across services). Keys are sorted lexicographically, arrays ordered deterministically, timestamps normalised to UTC.
|
||||
2. **Digest creation.** SHA-256 digests are computed and recorded:
|
||||
   - `manifest_hash` and `provenance_hash` are stored in the run metadata (Mongo) and exported via `/api/export/runs/{id}`.
   - Provenance `subject[]` contains both the manifest hash and the bundle/archive hash.
|
||||
3. **Key retrieval.** Worker obtains a short-lived signing token from Authority’s KMS client using tenant-scoped credentials (`export.sign` scope). Keys live in Authority or tenant-specific HSMs depending on deployment.
|
||||
4. **Signature emission.** Cosign generates detached signatures (`*.sig`). If DSSE is enabled, cosign wraps payload bytes in a DSSE envelope (`*.dsse`). Attestations follow the SLSA Level 2 provenance template; Level 3 requires builder metadata (`EXPORT-SVC-37-002` optional feature flag).
|
||||
5. **Storage & distribution.** Signatures and attestations are written alongside manifests in object storage, included in filesystem bundles, and attached as OCI artefact layers/annotations.
|
||||
6. **Audit trail.** Run metadata captures signer identity (`signing_key_id`), cosign certificate serial, signature timestamps, and verification hints. Console/CLI surface these details for downstream automation.
|
||||
|
||||
> **Key management.** Secrets and key references are configured per tenant via `export.signing`, pointing to Authority clients or external HSM aliases. Offline deployments pre-load cosign public keys into the bundle (`signatures/pubkeys/{tenant}.pem`).
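
The canonicalisation and digest steps above can be sketched with the Python standard library. This is not the `NotifyCanonicalJsonSerializer` implementation, only an illustration of the idea under simplifying assumptions: `canonical_bytes` and `sha256_digest` are hypothetical helper names, the manifest fields are invented for the demo, and number or timestamp normalisation is not handled.

```python
import hashlib
import json


def canonical_bytes(document: dict) -> bytes:
    """Serialise with sorted keys and compact separators for a stable byte layout."""
    text = json.dumps(document, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sha256_digest(payload: bytes) -> str:
    """Return the digest in the lowercase-hex `sha256:<digest>` form used by the manifests."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


# Two logically identical manifests whose keys were inserted in different orders.
first = {"profile": "mirror:delta", "counts": {"findings": 3, "advisories": 2}}
second = {"counts": {"advisories": 2, "findings": 3}, "profile": "mirror:delta"}

digest_first = sha256_digest(canonical_bytes(first))
digest_second = sha256_digest(canonical_bytes(second))

print(digest_first == digest_second)
```

Because both inputs canonicalise to the same bytes, they hash to the same digest, which is the property that lets a rerun with identical input reproduce the signed manifest byte for byte.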
|
||||
|
||||
---
|
||||
|
||||
## 4. Provenance schema highlights
|
||||
|
||||
`provenance.json` follows the SLSA provenance (`https://slsa.dev/provenance/v1`) structure with StellaOps-specific extensions. Key fields:
|
||||
|
||||
| Path | Description |
|------|-------------|
| `subject[]` | Array of `{name,digest}` pairs. Includes bundle tarball/OCI digest and `export.json` digest. |
| `predicateType` | SLSA v1 (default). |
| `predicate.builder` | `{id:"stellaops/export-center@<region>"}` identifies the worker instance/cluster. |
| `predicate.buildType` | Profile identifier (`mirror:full`, `mirror:delta`, etc.). |
| `predicate.invocation.parameters` | Profile selectors, retention flags, encryption mode, base export references. |
| `predicate.materials[]` | Source artefacts with digests: findings ledger query snapshots, policy snapshot IDs + hashes, SBOM identifiers, adapter release digests. |
| `predicate.metadata.buildFinishedOn` | RFC 3339 timestamp when signing completed. |
| `predicate.metadata.reproducible` | Always `true`; workers guarantee determinism. |
| `predicate.environment.encryption` | Records encryption recipients, wrapped keys, algorithm (`age` or `aes-gcm`). |
| `predicate.environment.kms` | Signing key identifier (`authority://tenant/export-signing-key`) and certificate chain fingerprints. |
|
||||
|
||||
Sample (abridged):
|
||||
|
||||
```json
{
  "subject": [
    { "name": "bundle.tar.zst", "digest": { "sha256": "c1fe..." } },
    { "name": "manifests/export.json", "digest": { "sha256": "ad42..." } }
  ],
  "predicate": {
    "buildType": "mirror:delta",
    "invocation": {
      "parameters": {
        "tenant": "tenant-01",
        "baseExportId": "run-20251020-01",
        "selectors": { "sources": ["concelier","excitor"], "profiles": ["mirror"] }
      }
    },
    "materials": [
      { "uri": "ledger://tenant-01/findings?cursor=rev-42", "digest": { "sha256": "0f9a..." } },
      { "uri": "policy://tenant-01/snapshots/rev-17", "digest": { "sha256": "8c3d..." } }
    ],
    "environment": {
      "encryption": {
        "mode": "age",
        "recipients": [
          { "recipient": "age1qxyz...", "wrappedKey": "BASE64...", "keyId": "tenant-01/notify-age" }
        ]
      },
      "kms": {
        "signingKeyId": "authority://tenant-01/export-signing",
        "certificateChainSha256": "1f5e..."
      }
    }
  }
}
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Verification workflows
|
||||
|
||||
| Scenario | Steps |
|----------|-------|
| **CLI verification** | 1. `stella export manifest <runId> --output manifests/export.json --signature manifests/export.json.sig`<br>2. `stella export provenance <runId> --output manifests/provenance.json --signature manifests/provenance.json.sig`<br>3. `cosign verify-blob --key pubkeys/tenant.pem --signature manifests/export.json.sig manifests/export.json`<br>4. `cosign verify-blob --key pubkeys/tenant.pem --signature manifests/provenance.json.sig manifests/provenance.json` |
| **Bundle verification (offline)** | 1. Extract bundle (or mount OCI artefact).<br>2. Validate manifest/provenance signatures using bundled public key.<br>3. Recompute SHA-256 for `data/` files and compare with entries in `export.json`.<br>4. If encrypted, decrypt with Age/AES-GCM recipient key, then re-run digest comparisons on decrypted content. |
| **CI pipeline** | Use `stella export verify --manifest manifests/export.json --provenance manifests/provenance.json --signature manifests/export.json.sig --signature manifests/provenance.json.sig` (task `CLI-EXPORT-37-001`). Failure exits non-zero with reason codes (`ERR_EXPORT_SIG_INVALID`, `ERR_EXPORT_DIGEST_MISMATCH`). |
| **Console download** | Console automatically verifies signatures before exposing the bundle; failure surfaces an actionable error referencing the export run ID and required remediation. |
|
||||
|
||||
Verification guidance (docs/modules/cli/guides/cli-reference.md §export) cross-links here; keep both docs in sync when CLI behaviour changes.
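
The digest-recomputation step of offline bundle verification might look roughly like the sketch below. The layout of `export.json` is not specified in this guide, so the `files` list with `path` and `sha256` fields is purely an assumption for illustration, as are the helper names `file_sha256` and `find_mismatches`. Signature checks would still go through cosign as described in the table.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Stream the file in chunks so large bundle members do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(bundle_root: Path, manifest: dict) -> list:
    """Return manifest paths that are missing or whose recomputed digest differs."""
    mismatches = []
    for entry in manifest["files"]:  # hypothetical manifest layout
        target = bundle_root / entry["path"]
        if not target.is_file() or file_sha256(target) != entry["sha256"]:
            mismatches.append(entry["path"])
    return mismatches


# Demo against a throwaway directory standing in for an extracted bundle.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    payload = b'{"id":1}\n'
    (root / "data" / "findings.jsonl").write_bytes(payload)
    manifest = {
        "files": [
            {"path": "data/findings.jsonl", "sha256": hashlib.sha256(payload).hexdigest()}
        ]
    }
    clean = find_mismatches(root, manifest)
    (root / "data" / "findings.jsonl").write_bytes(b"tampered")
    tampered = find_mismatches(root, manifest)

print(clean, tampered)
```

A non-empty result would correspond to the `ERR_EXPORT_DIGEST_MISMATCH` condition; treating missing files as mismatches is a design choice of this sketch, not documented behaviour.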
|
||||
|
||||
---
|
||||
|
||||
## 6. Distribution considerations
|
||||
|
||||
- **HTTP headers.** `X-Export-Digest` includes bundle digest; `X-Export-Provenance` references `provenance.json` URL; `X-Export-Signature` references `.sig`. Clients use these hints to short-circuit re-downloads.
|
||||
- **OCI annotations.** `org.opencontainers.image.ref.name`, `io.stellaops.export.manifest-digest`, and `io.stellaops.export.provenance-ref` allow registry tooling to locate manifests/signatures quickly.
|
||||
- **Offline Kit staging.** Offline kit assembler copies `manifests/`, `signatures/`, and `pubkeys/` verbatim. Verification scripts (`offline-kits/bin/verify-export.sh`) wrap the cosign commands described above.
|
||||
|
||||
---
|
||||
|
||||
## 7. Failure handling & observability
|
||||
|
||||
- Runs surface signature status via `/api/export/runs/{id}` (`signing.status`, `signing.lastError`). Common errors include `ERR_EXPORT_KMS_UNAVAILABLE`, `ERR_EXPORT_ATTESTATION_FAILED`, `ERR_EXPORT_CANONICALIZE`.
|
||||
- Metrics: `exporter_sign_duration_seconds`, `exporter_sign_failures_total{error_code}`, `exporter_provenance_verify_failures_total`.
|
||||
- Logs: `phase=sign`, `error_code`, `signing_key_id`, `cosign_certificate_sn`.
|
||||
- Alerts: DevOps dashboards (task `DEVOPS-EXPORT-37-001`) trigger on consecutive signing failures or verification failures >0.
|
||||
|
||||
When verification fails downstream, operators should:
|
||||
1. Confirm signatures using the known-good key.
|
||||
2. Inspect `provenance.json` materials; rerun the source queries to ensure matching digests.
|
||||
3. Review run audit logs and retry export with `--resume` to regenerate manifests.
|
||||
|
||||
---
|
||||
|
||||
## 8. Compliance checklist
|
||||
|
||||
- [ ] Manifests and provenance documents generated with canonical JSON, deterministic digests, and signatures.
|
||||
- [ ] Cosign public keys published per tenant, rotated through Authority, and distributed to Offline Kit consumers.
|
||||
- [ ] SLSA attestations enabled where supply-chain requirements demand Level 3 evidence.
|
||||
- [ ] CLI/Console verification paths documented and tested (CI pipelines exercise `stella export verify`).
|
||||
- [ ] Encryption metadata (recipients, wrapped keys) recorded in provenance and validated during verification.
|
||||
- [ ] Run audit logs capture signature timestamps, signer identity, and failure reasons.
|
||||
|
||||
---
|
||||
|
||||
> **Imposed rule reminder:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
|
||||
|
||||
@@ -88,6 +88,7 @@
|
||||
- `curl -fsSL https://localhost:8447/health/live`
|
||||
- Issue an access token and list issuers to confirm results.
|
||||
- Check Mongo counts match expectations (`db.issuers.countDocuments()`, etc.).
|
||||
- Confirm Prometheus scrapes `issuer_directory_changes_total` and `issuer_directory_key_operations_total` for the tenants you restored.
|
||||
|
||||
## Disaster recovery notes
|
||||
- **Retention:** Maintain 30 daily + 12 monthly archives. Store copies in geographically separate, access-controlled vaults.
|
||||
@@ -98,6 +99,6 @@
|
||||
## Verification checklist
|
||||
- [ ] `/health/live` returns `200 OK`.
|
||||
- [ ] Mongo collections (`issuers`, `issuer_keys`, `issuer_trust_overrides`) have expected counts.
|
||||
- [ ] `issuer_directory_changes_total` and `issuer_directory_key_operations_total` metrics resume within 1 minute.
|
||||
- [ ] `issuer_directory_changes_total`, `issuer_directory_key_operations_total`, and `issuer_directory_key_validation_failures_total` metrics resume within 1 minute.
|
||||
- [ ] Audit entries exist for post-restore CRUD activity.
|
||||
- [ ] Client integrations (VEX Lens, Excititor) resolve issuers successfully.
|
||||
|
||||
@@ -39,6 +39,13 @@
|
||||
```
|
||||
Compose automatically mounts `../../etc/issuer-directory.yaml` into the container at `/etc/issuer-directory.yaml`, seeds CSAF publishers, and exposes the API on `https://localhost:8447`.
|
||||
|
||||
### Compose environment variables
|
||||
| Variable | Purpose | Default |
|
||||
| --- | --- | --- |
|
||||
| `ISSUER_DIRECTORY_PORT` | Host port that maps to container port `8080`. | `8447` |
|
||||
| `ISSUER_DIRECTORY_MONGO_CONNECTION_STRING` | Injected into `ISSUERDIRECTORY__MONGO__CONNECTIONSTRING`; should contain credentials. | `mongodb://${MONGO_INITDB_ROOT_USERNAME}:${MONGO_INITDB_ROOT_PASSWORD}@mongo:27017` |
|
||||
| `ISSUER_DIRECTORY_SEED_CSAF` | Toggles CSAF bootstrap on startup. Set to `false` after the first production import if you manage issuers manually. | `true` |
|
||||
|
||||
4. **Smoke test**
|
||||
```bash
|
||||
curl -k https://localhost:8447/health/live
|
||||
|
||||
@@ -12,6 +12,7 @@ Include the following artefacts in your Offline Update Kit staging tree:
|
||||
| `config/issuer-directory/issuer-directory.yaml` | `etc/issuer-directory.yaml` (customised) | Replace Authority issuer, tenant header, and log level as required. |
|
||||
| `config/issuer-directory/csaf-publishers.json` | `src/IssuerDirectory/StellaOps.IssuerDirectory/data/csaf-publishers.json` or regional override | Operators can edit before import to add private publishers. |
|
||||
| `secrets/issuer-directory/connection.env` | Secure secret store export (`ISSUER_DIRECTORY_MONGO_CONNECTION_STRING=`) | Encrypt at rest; Offline Kit importer places it in the Compose/Helm secret. |
|
||||
| `env/issuer-directory.env` (optional) | Curated `.env` snippet (for example `ISSUER_DIRECTORY_SEED_CSAF=false`) | Helps operators disable reseeding after their first import without editing the main profile. |
|
||||
| `docs/issuer-directory/deployment.md` | `docs/modules/issuer-directory/operations/deployment.md` | Ship alongside kit documentation for operators. |
|
||||
|
||||
> **Image digests:** Update `deploy/releases/2025.10-edge.yaml` (or the relevant manifest) with the exact digest before building the kit so `offline-manifest.json` can assert integrity.
|
||||
@@ -69,3 +70,4 @@ Include the following artefacts in your Offline Update Kit staging tree:
|
||||
- [ ] `/issuer-directory/issuers` returns global seed issuers (requires token with `issuer-directory:read` scope).
|
||||
- [ ] Audit collection receives entries when you create/update issuers offline.
|
||||
- [ ] Offline kit manifest (`offline-manifest.json`) lists `images/issuer-directory-web.tar` and `config/issuer-directory/issuer-directory.yaml` with SHA-256 values you recorded during packaging.
|
||||
- [ ] Prometheus in the offline environment reports `issuer_directory_changes_total` for the tenants imported from the kit.
|
||||
|
||||
File diff suppressed because it is too large
@@ -1,56 +1,56 @@
|
||||
{
|
||||
"$id": "https://stella-ops.org/schemas/notify/notify-event@1.json",
|
||||
"$schema": "http://json-schema.org/draft-07/schema#",
|
||||
"title": "Notify Event Envelope",
|
||||
"type": "object",
|
||||
"required": ["eventId", "kind", "tenant", "ts", "payload"],
|
||||
"properties": {
|
||||
"eventId": {"type": "string", "format": "uuid"},
|
||||
"kind": {
|
||||
"type": "string",
|
||||
"description": "Event kind identifier (e.g. scanner.report.ready).",
|
||||
"enum": [
|
||||
"scanner.report.ready",
|
||||
"scanner.scan.completed",
|
||||
"scheduler.rescan.delta",
|
||||
"attestor.logged",
|
||||
"zastava.admission",
|
||||
"feedser.export.completed",
|
||||
"vexer.export.completed"
|
||||
]
|
||||
},
|
||||
"version": {"type": "string"},
|
||||
"tenant": {"type": "string"},
|
||||
"ts": {"type": "string", "format": "date-time"},
|
||||
"actor": {"type": "string"},
|
||||
"scope": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"namespace": {"type": "string"},
|
||||
"repo": {"type": "string"},
|
||||
"digest": {"type": "string"},
|
||||
"component": {"type": "string"},
|
||||
"image": {"type": "string"},
|
||||
"labels": {"$ref": "#/$defs/stringMap"},
|
||||
"attributes": {"$ref": "#/$defs/stringMap"}
|
||||
},
|
||||
"additionalProperties": false
|
||||
},
|
||||
"payload": {
|
||||
"type": "object",
|
||||
"description": "Event specific body; see individual schemas for shapes.",
|
||||
"additionalProperties": true
|
||||
},
|
||||
"attributes": {"$ref": "#/$defs/stringMap"}
|
||||
},
|
||||
"additionalProperties": false,
|
||||
"$defs": {
|
||||
"stringMap": {
|
||||
"type": "object",
|
||||
"patternProperties": {
|
||||
".*": {"type": "string"}
|
||||
},
|
||||
"additionalProperties": false
|
||||
}
|
||||
}
|
||||
}
|
||||
{
|
||||
"$id": "https://stella-ops.org/schemas/notify/notify-event@1.json",
|
||||
"$schema": "http://json-schema.org/draft-07/schema#",
|
||||
"title": "Notify Event Envelope",
|
||||
"type": "object",
|
||||
"required": ["eventId", "kind", "tenant", "ts", "payload"],
|
||||
"properties": {
|
||||
"eventId": {"type": "string", "format": "uuid"},
|
||||
"kind": {
|
||||
"type": "string",
|
||||
"description": "Event kind identifier (e.g. scanner.report.ready).",
|
||||
"enum": [
|
||||
"scanner.report.ready",
|
||||
"scanner.scan.completed",
|
||||
"scheduler.rescan.delta",
|
||||
"attestor.logged",
|
||||
"zastava.admission",
|
||||
"conselier.export.completed",
|
||||
"excitor.export.completed"
|
||||
]
|
||||
},
|
||||
"version": {"type": "string"},
|
||||
"tenant": {"type": "string"},
|
||||
"ts": {"type": "string", "format": "date-time"},
|
||||
"actor": {"type": "string"},
|
||||
"scope": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"namespace": {"type": "string"},
|
||||
"repo": {"type": "string"},
|
||||
"digest": {"type": "string"},
|
||||
"component": {"type": "string"},
|
||||
"image": {"type": "string"},
|
||||
"labels": {"$ref": "#/$defs/stringMap"},
|
||||
"attributes": {"$ref": "#/$defs/stringMap"}
|
||||
},
|
||||
"additionalProperties": false
|
||||
},
|
||||
"payload": {
|
||||
"type": "object",
|
||||
"description": "Event specific body; see individual schemas for shapes.",
|
||||
"additionalProperties": true
|
||||
},
|
||||
"attributes": {"$ref": "#/$defs/stringMap"}
|
||||
},
|
||||
"additionalProperties": false,
|
||||
"$defs": {
|
||||
"stringMap": {
|
||||
"type": "object",
|
||||
"patternProperties": {
|
||||
".*": {"type": "string"}
|
||||
},
|
||||
"additionalProperties": false
|
||||
}
|
||||
}
|
||||
}
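To see what the envelope schema accepts, the sketch below hand-checks the top-level rules (required fields, the `kind` enum, and `additionalProperties: false`). It is an illustration only, not the project's validator: `envelope_problems` is a hypothetical helper, the `scope` and `attributes` shapes are not checked, and the `uuid`/`date-time` checks only approximate the schema's `format` keywords.

```python
import uuid
from datetime import datetime

REQUIRED_FIELDS = ("eventId", "kind", "tenant", "ts", "payload")
KNOWN_KINDS = {
    "scanner.report.ready",
    "scanner.scan.completed",
    "scheduler.rescan.delta",
    "attestor.logged",
    "zastava.admission",
    "conselier.export.completed",
    "excitor.export.completed",
}
# Top-level properties declared by the schema; anything else is rejected.
ALLOWED_TOP_LEVEL = set(REQUIRED_FIELDS) | {"version", "actor", "scope", "attributes"}


def envelope_problems(event: dict) -> list[str]:
    """Return human-readable problems; an empty list means the envelope looks valid."""
    problems = [f"missing required field: {name}" for name in REQUIRED_FIELDS if name not in event]
    problems += [f"unexpected field: {name}" for name in sorted(set(event) - ALLOWED_TOP_LEVEL)]
    if "kind" in event and event["kind"] not in KNOWN_KINDS:
        problems.append(f"unknown kind: {event['kind']}")
    if "eventId" in event:
        try:
            uuid.UUID(str(event["eventId"]))  # looser than a strict "uuid" format check
        except ValueError:
            problems.append("eventId is not a UUID")
    if "ts" in event:
        try:
            # Map a trailing "Z" to an explicit offset so older Pythons accept it.
            datetime.fromisoformat(str(event["ts"]).replace("Z", "+00:00"))
        except ValueError:
            problems.append("ts is not an ISO 8601 date-time")
    if "payload" in event and not isinstance(event["payload"], dict):
        problems.append("payload must be an object")
    return problems
```

Note that an event still carrying one of the old kinds (`feedser.export.completed`, `vexer.export.completed`) is reported as an unknown kind under the updated enum.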
|
||||
|
||||
@@ -8,6 +8,8 @@ Policy Engine compiles and evaluates Stella DSL policies deterministically, prod
|
||||
- [Architecture](./architecture.md)
|
||||
- [Implementation plan](./implementation_plan.md)
|
||||
- [Task board](./TASKS.md)
|
||||
- [Secret leak detection readiness](../policy/secret-leak-detection-readiness.md)
|
||||
- [Windows package readiness](../policy/windows-package-readiness.md)
|
||||
|
||||
## How to get started
|
||||
1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module.
|
||||
|
||||
@@ -1,31 +1,33 @@
|
||||
# StellaOps Policy Engine
|
||||
|
||||
Policy Engine compiles and evaluates Stella DSL policies deterministically, producing explainable findings with full provenance.
|
||||
|
||||
## Responsibilities
|
||||
- Compile `stella-dsl@1` packs into executable graphs.
|
||||
- Join advisories, VEX evidence, and SBOM inventories to derive effective findings.
|
||||
- Expose simulation and diff APIs for UI/CLI workflows.
|
||||
- Emit change-stream driven events for Notify/Scheduler integrations.
|
||||
|
||||
## Key components
|
||||
- `StellaOps.Policy.Engine` service host.
|
||||
- Shared libraries under `StellaOps.Policy.*` for evaluation, storage, DSL tooling.
|
||||
|
||||
## Integrations & dependencies
|
||||
- MongoDB findings collections, RustFS explain bundles.
|
||||
- Scheduler for incremental re-evaluation triggers.
|
||||
- CLI/UI for policy authoring and runs.
|
||||
|
||||
## Operational notes
|
||||
- DSL grammar and lifecycle docs in ../../policy/.
|
||||
- Observability guidance in ../../observability/policy.md.
|
||||
- Governance and scope mapping in ../../security/policy-governance.md.
|
||||
|
||||
## Backlog references
|
||||
- DOCS-POLICY-20-001 … DOCS-POLICY-20-012 (completed baseline).
|
||||
- DOCS-POLICY-23-007 (upcoming command updates).
|
||||
|
||||
## Epic alignment
|
||||
- **Epic 2 – Policy Engine & Editor:** deliver deterministic evaluation, DSL infrastructure, explain traces, and incremental runs.
|
||||
- **Epic 4 – Policy Studio:** integrate registry workflows, simulation at scale, approvals, and promotion semantics.
|
||||
# StellaOps Policy Engine
|
||||
|
||||
Policy Engine compiles and evaluates Stella DSL policies deterministically, producing explainable findings with full provenance.
|
||||
|
||||
## Responsibilities
|
||||
- Compile `stella-dsl@1` packs into executable graphs.
|
||||
- Join advisories, VEX evidence, and SBOM inventories to derive effective findings.
|
||||
- Expose simulation and diff APIs for UI/CLI workflows.
|
||||
- Emit change-stream driven events for Notify/Scheduler integrations.
|
||||
|
||||
## Key components
|
||||
- `StellaOps.Policy.Engine` service host.
|
||||
- Shared libraries under `StellaOps.Policy.*` for evaluation, storage, DSL tooling.
|
||||
|
||||
## Integrations & dependencies
|
||||
- MongoDB findings collections, RustFS explain bundles.
|
||||
- Scheduler for incremental re-evaluation triggers.
|
||||
- CLI/UI for policy authoring and runs.
|
||||
|
||||
## Operational notes
|
||||
- DSL grammar and lifecycle docs in ../../policy/.
|
||||
- Observability guidance in ../../observability/policy.md.
|
||||
- Governance and scope mapping in ../../security/policy-governance.md.
|
||||
- Readiness briefs: ../policy/secret-leak-detection-readiness.md, ../policy/windows-package-readiness.md.
|
||||
- Readiness briefs: ../scanner/design/macos-analyzer.md, ../scanner/design/windows-analyzer.md, ../policy/secret-leak-detection-readiness.md, ../policy/windows-package-readiness.md.
|
||||
|
||||
## Backlog references
|
||||
- DOCS-POLICY-20-001 … DOCS-POLICY-20-012 (completed baseline).
|
||||
- DOCS-POLICY-23-007 (upcoming command updates).
|
||||
|
||||
## Epic alignment
|
||||
- **Epic 2 – Policy Engine & Editor:** deliver deterministic evaluation, DSL infrastructure, explain traces, and incremental runs.
|
||||
- **Epic 4 – Policy Studio:** integrate registry workflows, simulation at scale, approvals, and promotion semantics.
|
||||
|
||||
@@ -1,9 +1,11 @@
|
||||
# Task board — Policy Engine
|
||||
|
||||
> Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable.
|
||||
|
||||
| ID | Status | Owner(s) | Description | Notes |
|
||||
|----|--------|----------|-------------|-------|
|
||||
| POLICY ENGINE-DOCS-0001 | TODO | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md |
|
||||
| POLICY ENGINE-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md |
|
||||
| POLICY ENGINE-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow |
|
||||
# Task board — Policy Engine
|
||||
|
||||
> Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable.
|
||||
|
||||
| ID | Status | Owner(s) | Description | Notes |
|
||||
|----|--------|----------|-------------|-------|
|
||||
| POLICY ENGINE-DOCS-0001 | TODO | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md |
|
||||
| POLICY ENGINE-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md |
|
||||
| POLICY ENGINE-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow |
|
||||
| POLICY-READINESS-0001 | DOING (2025-11-03) | Policy Guild, Security Guild | Resolve open questions in `../policy/secret-leak-detection-readiness.md` ahead of SCANNER-ENG-0007. | Decision workshop 2025-11-10 (Northwind demo); cover masking depth, telemetry retention, bundle defaults, tenant overrides. |
|
||||
| POLICY-READINESS-0002 | DOING (2025-11-03) | Policy Guild, Security Guild, Offline Kit Guild | Review `../policy/windows-package-readiness.md`, set signature verification locus, feed mirroring scopes, and legacy installer posture. | FinSecure PCI blocker; deliver Authenticode/feed decision by 2025-11-07 before analyzer spike kickoff. |
|
||||
|
||||
80
docs/modules/policy/secret-leak-detection-readiness.md
Normal file
@@ -0,0 +1,80 @@
|
||||
# Secret Leak Detection Readiness — Policy & Security Brief
|
||||
|
||||
> Audience: Policy Guild, Security Guild
|
||||
> Related backlog: SCANNER-ENG-0007 (deterministic leak detection pipeline), DOCS-SCANNER-BENCH-62-007 (rule bundle documentation), SCANNER-SECRETS-01..03 (Surface.Secrets alignment)
|
||||
|
||||
## 1. Goal & scope
|
||||
- Provide a shared understanding of how the planned `StellaOps.Scanner.Analyzers.Secrets` plug-in will operate so Policy/Security can prepare governance controls in parallel with engineering delivery.
|
||||
- Document evidence flow, policy predicates, and offline distribution requirements to minimise lead time once implementation lands.
|
||||
- Capture open questions requiring Policy/Security sign-off (masking rules, tenancy constraints, waiver workflows).
|
||||
|
||||
## 2. Proposed evidence pipeline
|
||||
1. **Source resolution**
|
||||
- Surface.Secrets providers (Kubernetes, file bundle, inline) continue to resolve operational credentials. Handles remain opaque and never enter analyzer outputs.
|
||||
2. **Deterministic scanning**
|
||||
- New plug-in executes signed rule bundles (regex + entropy signatures) stored under `scanner/rules/secrets/`.
|
||||
- Execution context restricted to read-only layer mount; analyzers emit `secret.leak` evidence with: `{rule.id, rule.version, confidence, severity, mask, file, line}`.
|
||||
3. **Analysis store persistence**
|
||||
- Findings are written into `ScanAnalysisStore` (`ScanAnalysisKeys.secretFindings`) so Policy Engine can ingest them alongside component fragments.
|
||||
4. **Policy overlay**
|
||||
- Policy predicates (see §3) evaluate evidence, lattice scores, and tenant-scoped allow/deny lists.
|
||||
- CLI/export surfaces show masked snippets and remediation hints.
|
||||
5. **Offline parity**
|
||||
- Rule bundles, signature manifests, and validator hash lists ship with Offline Kit; rule updates must be signed and versioned to preserve determinism.
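The evidence shape from step 2 can be pictured as a small immutable record with a canonical serialisation. This is a sketch under assumptions: the field types, the `SecretLeakEvidence` name, and the sort order are illustrative choices, not the analyzer's actual contract.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SecretLeakEvidence:
    """Hypothetical mirror of {rule.id, rule.version, confidence, severity, mask, file, line}."""
    rule_id: str
    rule_version: str
    confidence: str  # assumed to be a label such as "low" or "high"
    severity: str
    mask: str        # masked snippet only; the raw value is never stored
    file: str
    line: int


def to_canonical_json(findings: list[SecretLeakEvidence]) -> str:
    """Serialise findings in a stable order so repeated scans give identical bytes."""
    ordered = sorted(findings, key=lambda f: (f.file, f.line, f.rule_id, f.rule_version))
    return json.dumps([asdict(f) for f in ordered], sort_keys=True, separators=(",", ":"))
```

Sorting before serialising is one simple way to keep output independent of the order in which rules happen to fire, which is the property the offline parity step relies on.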
|
||||
|
||||
## 3. Policy Engine considerations
|
||||
- **New predicates**
|
||||
- `secret.hasFinding(ruleId?, severity?, confidence?)`
|
||||
- `secret.bundle.version(requiredVersion)`
|
||||
- `secret.mask.applied` (bool) — verify masking for high severity hits.
|
||||
- `secret.path.allowlist` — tenant-configured allow list keyed by digest/path.
|
||||
- **Lattice weight suggestions**
|
||||
- High severity & high confidence → escalate to `block` unless waived.
|
||||
- Low confidence → default to `warn` with optional escalation when multiple matches occur (`secret.match.count >= N`).
|
||||
- **Waiver workflow**
|
||||
- Reuse the VEX-first lattice approach: require an attached remediation note, ticket reference, and expiration date.
|
||||
- Ensure waivers record the rule version so that findings are re-evaluated automatically when rules are upgraded.
|
||||
- **Masking / privacy**
|
||||
- Minimum masking: first and last 2 characters retained; remainder replaced with `*`.
|
||||
- Persist masked payload only; full value never leaves scanner context.
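The masking rule and the lattice suggestions above can be sketched as two small functions. Several details are assumptions made for illustration: values too short to mask partially are fully starred, "escalation" is taken to mean `block`, the threshold `escalate_at` stands in for the unspecified `N`, and `waived` is a placeholder outcome name.

```python
def mask_secret(value: str, keep: int = 2) -> str:
    """Keep the first and last `keep` characters and star out the rest."""
    if len(value) <= 2 * keep:
        # Assumption: a value this short would be fully revealed, so mask all of it.
        return "*" * len(value)
    return value[:keep] + "*" * (len(value) - 2 * keep) + value[-keep:]


def suggest_verdict(severity: str, confidence: str, match_count: int = 1,
                    waived: bool = False, escalate_at: int = 3) -> str:
    """Map a finding to a suggested outcome following the lattice bullets."""
    if severity == "high" and confidence == "high":
        return "waived" if waived else "block"
    if confidence == "low":
        # Optional escalation when several matches occur (secret.match.count >= N).
        return "block" if match_count >= escalate_at else "warn"
    # Combinations the brief does not cover default to warn in this sketch.
    return "warn"
```

For example, `mask_secret("AKIAIOSFODNN7EXAMPLE")` keeps `AK` and `LE` and replaces the sixteen characters in between with `*`.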
|
||||
|
||||
## 4. Security guardrails
|
||||
- Rule bundle signing: Signer issues DSSE envelope for each ruleset; Policy must verify signature before enabling new bundle.
|
||||
- Tenant isolation: bundle enablement scoped per tenant; defaults deny unknown bundles.
|
||||
- Telemetry: emit `scanner.secret.finding_total{tenant, ruleId, severity}` with masking applied after count aggregation.
|
||||
- Access control: restrict retrieval of raw finding payloads to roles with `scanner.secret.read` scope; audits log query + tenant + rule id.
|
||||
|
||||
## 5. Offline & update flow
|
||||
1. Engineering publishes new bundle → Signer signs → Offline Kit includes bundle + manifest.
|
||||
2. Operators import bundle via CLI (`stella secrets bundle install --path <bundle>`).
|
||||
- CLI verifies signature, version monotonicity, and rule hash list.
|
||||
3. Policy update: push config snippet enabling bundle, severity mapping, and waiver templates.
|
||||
4. Run `stella secrets scan --dry-run` to verify deterministic output against golden fixtures before enabling in production.
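Two of the import checks in step 2, version monotonicity and the rule hash list, are easy to illustrate. The sketch assumes dotted numeric versions and a manifest that maps rule IDs to SHA-256 hex digests; neither is confirmed by the brief, `bundle_problems` is a hypothetical helper, and signature (DSSE) verification is deliberately left out.

```python
import hashlib
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def bundle_problems(installed_version: Optional[str], candidate_version: str,
                    rules: dict[str, bytes], manifest_hashes: dict[str, str]) -> list[str]:
    """Check version monotonicity and the rule hash list; signatures are not checked here."""
    problems = []
    if installed_version is not None and parse_version(candidate_version) <= parse_version(installed_version):
        problems.append(f"version {candidate_version} is not newer than installed {installed_version}")
    for rule_id in sorted(set(rules) | set(manifest_hashes)):
        if rule_id not in manifest_hashes:
            problems.append(f"rule not listed in manifest: {rule_id}")
        elif rule_id not in rules:
            problems.append(f"rule missing from bundle: {rule_id}")
        elif hashlib.sha256(rules[rule_id]).hexdigest() != manifest_hashes[rule_id].lower():
            problems.append(f"hash mismatch: {rule_id}")
    return problems
```

Comparing parsed tuples rather than raw strings matters here: as text, `1.9.0` sorts after `1.10.0`, which would let a downgrade slip through.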
|
||||
|
||||
## 6. Open questions / owners
|
||||
| Topic | Question | Owner | Target decision |
|
||||
| --- | --- | --- | --- |
|
||||
| Masking depth | Do we mask file paths or only payloads? | Security Guild | Sprint 132 design review |
|
||||
| Telemetry retention | Should secret finding metrics be sampled or full fidelity? | Policy + Observability Guild | Sprint 132 |
|
||||
| Default bundles | Which rule families ship enabled-by-default (cloud creds, SSH keys, JWT)? | Security Guild | Sprint 133 |
|
||||
| Tenant overrides | Format for per-tenant allow lists (path glob vs digest)? | Policy Guild | Sprint 133 |
|
||||
|
||||
### Decision tracker
|
||||
| Decision | Owner(s) | Target date | Status |
|
||||
| --- | --- | --- | --- |
|
||||
| Masking depth (paths vs payloads) | Security Guild | 2025-11-10 | Pending — workshop aligned with Northwind demo |
|
||||
| Telemetry retention granularity | Policy + Observability Guild | 2025-11-10 | Pending |
|
||||
| Default rule bundles (cloud creds/SSH/JWT) | Security Guild | 2025-11-10 | Draft proposals under review |
|
||||
| Tenant override format | Policy Guild | 2025-11-10 | Pending |
|
||||
|
||||
## 7. Next steps
|
||||
1. Policy Guild drafts predicate specs + policy templates (map to DOCS-SCANNER-BENCH-62-007 exit criteria).
|
||||
2. Security Guild reviews signing + masking requirements; align with Surface.Secrets roadmap.
|
||||
3. Docs Guild (this task) continues maintaining `docs/benchmarks/scanner/deep-dives/secrets.md` with finalized rule taxonomy and references.
|
||||
4. Engineering provides prototype fixture outputs for review once SCANNER-ENG-0007 spikes begin.
|
||||
|
||||
|
||||
## Coordination
|
||||
- Capture macOS customer requirements via `docs/benchmarks/scanner/windows-macos-demand.md` (Northwind Health Services).
|
||||
- Update dashboard (`docs/api/scanner/windows-macos-summary.md`) after readiness decisions.
|
||||
- Share outcomes from 2025-11-10 macOS demo before finalising POLICY-READINESS-0001.
|
||||
92
docs/modules/policy/windows-package-readiness.md
Normal file
@@ -0,0 +1,92 @@
|
||||
# Windows Package Coverage — Policy & Security Readiness Brief
|
||||
|
||||
> Audience: Policy Guild, Security Guild, Offline Kit Guild
|
||||
> Related engineering backlog (proposed): SCANNER-ENG-0024..0027
|
||||
> Docs linkage: DOCS-SCANNER-BENCH-62-016
|
||||
|
||||
## 1. Goal
|
||||
- Prepare policy and security guidance ahead of Windows analyzer implementation (MSI, WinSxS, Chocolatey, registry).
|
||||
- Define evidence handling, predicates, waiver expectations, and offline prerequisites so engineering can align during spike execution.
|
||||
|
||||
## 2. Evidence pipeline snapshot (from `design/windows-analyzer.md`)
|
||||
1. **Collection**
|
||||
- MSI database parsing → component records keyed by ProductCode/ComponentCode.
|
||||
- WinSxS manifests → assembly identities, catalog signatures.
|
||||
- Chocolatey packages → nuspec metadata, feed provenance, script hashes.
|
||||
- Registry exports → uninstall/service entries, legacy installers.
|
||||
- Driver/service mapper → capability overlays (kernel-mode, auto-start).
|
||||
2. **Storage**
|
||||
- Results persisted as `LayerComponentFragment`s plus capability overlays (`ScanAnalysisKeys.capability.windows`).
|
||||
- Provenance metadata includes signature thumbprint, catalog hash, feed URL, install context.
|
||||
3. **Downstream**
|
||||
- Policy Engine consumes component + capability evidence; Export Center bundles MSI manifests, nuspec metadata, catalog hashes.
|
||||
|
||||
## 3. Policy predicate requirements
|
||||
| Predicate | Description | Initial default |
|
||||
| --- | --- | --- |
|
||||
| `windows.package.signed(thumbprint?)` | True when Authenticode signature/cert matches allowlist. | Warn on missing signature, fail on mismatched thumbprint for kernel drivers. |
|
||||
| `windows.package.sourceAllowed(sourceId)` | Validates the Chocolatey/NuGet feed against the tenant allowlist. | Fail if feed not in tenant policy. |
|
||||
| `windows.driver.kernelMode()` | Flags kernel-mode drivers for extra scrutiny. | Fail when unsigned; warn otherwise. |
|
||||
| `windows.driver.signedBy(publisher)` | Checks driver publisher matches allowlist. | Warn on unknown publisher. |
|
||||
| `windows.service.autoStart(name)` | Identifies auto-start services. | Warn if unsigned binary or service not in allowlist. |
|
||||
| `windows.package.legacyInstaller()` | Legacy EXE-only installers detected via registry. | Warn by default; escalate if binary unsigned. |
|
||||
|
||||
Additional considerations:
|
||||
- Map KB references (from WinSxS/MSP metadata) to vulnerability posture once Policy Engine supports patch layering.
|
||||
- Provide predicates to waive specific ProductCodes or AssemblyIdentities with expiration.
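A few of the initial defaults from the predicate table can be combined into one outcome as follows. This is a sketch, not Policy Engine behaviour: the `WindowsComponent` record and `default_outcome` function are hypothetical, "escalate" for legacy installers is assumed to mean `fail`, and thumbprint and publisher checks are omitted.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITY = {"pass": 0, "warn": 1, "fail": 2}


@dataclass(frozen=True)
class WindowsComponent:
    """Hypothetical, flattened view of the evidence a predicate would inspect."""
    name: str
    signed: bool
    kernel_mode_driver: bool = False
    feed_id: Optional[str] = None
    legacy_installer: bool = False


def default_outcome(component: WindowsComponent, allowed_feeds: frozenset) -> str:
    """Return the most severe outcome produced by the defaults that apply."""
    outcomes = ["pass"]
    if not component.signed:
        outcomes.append("warn")  # windows.package.signed: warn on missing signature
    if component.kernel_mode_driver:
        # windows.driver.kernelMode: fail when unsigned, warn otherwise
        outcomes.append("warn" if component.signed else "fail")
    if component.feed_id is not None and component.feed_id not in allowed_feeds:
        outcomes.append("fail")  # windows.package.sourceAllowed
    if component.legacy_installer:
        # windows.package.legacyInstaller: warn by default, escalate if unsigned
        outcomes.append("warn" if component.signed else "fail")
    return max(outcomes, key=lambda outcome: SEVERITY[outcome])
```

Taking the maximum severity keeps the result independent of evaluation order, so adding a predicate cannot soften an existing failure.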
|
||||
|
||||
## 4. Waiver & governance model
|
||||
- Waiver key: `{productCode, version, signatureThumbprint}` or for drivers `{driverName, serviceName, signatureThumbprint}`.
|
||||
- Required metadata: remediation ticket, justification, expiry date.
|
||||
- Automated re-evaluation when version or signature changes.
|
||||
- Tenants maintain allow lists for Chocolatey feeds and driver publishers via policy configuration.
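The waiver key and the re-evaluation rule above amount to an exact-match lookup with an expiry. The sketch below assumes the package form of the key, case-insensitive thumbprint comparison, and an inclusive expiry date; `PackageWaiver` and `waiver_applies` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PackageWaiver:
    # Waiver key: {productCode, version, signatureThumbprint}
    product_code: str
    version: str
    signature_thumbprint: str
    # Required metadata
    ticket: str
    justification: str
    expires: date


def waiver_applies(waiver: PackageWaiver, product_code: str, version: str,
                   signature_thumbprint: str, today: date) -> bool:
    """A waiver covers a finding only while the full key matches and it has not expired."""
    same_key = (
        waiver.product_code == product_code
        and waiver.version == version
        and waiver.signature_thumbprint.lower() == signature_thumbprint.lower()
    )
    return same_key and today <= waiver.expires
```

Because version and thumbprint are part of the key, an upgraded or re-signed package simply stops matching, which forces the re-evaluation the model calls for.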
|
||||
|
||||
## 5. Masking & privacy
|
||||
- Findings should not include raw script contents; provide a SHA-256 hash and a limited excerpt (first/last 8 characters).
|
||||
- Registry values (install paths, command lines) must be truncated if they contain secrets; rely on Surface.Secrets to manage environment variables referenced during install scripts.
|
||||
|
||||
## 6. Offline kit guidance
|
||||
- Bundle:
|
||||
- MSI parser binary + schema definitions.
|
||||
- Chocolatey feed snapshot(s) (nupkg files) with hash manifest.
|
||||
- Microsoft root/intermediate certificate bundles; optional CRL/OCSP cache instructions.
|
||||
- Operators must export registry hives (`SOFTWARE`, `SYSTEM`) during image extraction; document PowerShell script and required access.
|
||||
- Provide checksum manifest to verify feed snapshot integrity.
|
||||
|
||||
## 7. Telemetry expectations
|
||||
- Metrics:
|
||||
- `scanner.windows.package_total{tenant,signed}` — count packages per signature state.
|
||||
- `scanner.windows.driver_unsigned_total{tenant}`.
|
||||
- `scanner.windows.choco_feed_total{tenant,feed}`.
|
||||
- Logs:
|
||||
- Include product code, version, signature thumbprint, feed ID (no file paths unless sanitized).
|
||||
- Traces:
|
||||
- Annotate collector spans (`collector.windows.msi`, `collector.windows.winsxs`, etc.) with component counts and parsing duration.
|
||||
|
||||
## 8. Open questions
|
||||
| Topic | Question | Owner | Target decision |
|
||||
| --- | --- | --- | --- |
|
||||
| Signature verification locus | Scanner vs Policy: where to verify Authenticode signatures + revocation? | Security Guild | Sprint 133 |
|
||||
| Feed mirroring scope | Default set of Chocolatey feeds to mirror (official/community). | Product + Security Guild | Sprint 133 |
|
||||
| Legacy installers | Should we block unsigned EXE installers by default or allow warn-only posture? | Policy Guild | Sprint 134 |
|
||||
| Driver taxonomy | Define high-risk driver categories (kernel-mode, filter drivers) for policy severity. | Policy Guild | Sprint 134 |
|
||||
|
||||
### Decision tracker
|
||||
| Decision | Owner(s) | Target date | Status |
|
||||
| --- | --- | --- | --- |
|
||||
| Authenticode verification locus (Scanner vs Policy) | Security Guild | 2025-11-07 | Pending — blocker for FinSecure |
|
||||
| Chocolatey feed mirroring scope | Product + Security Guild | 2025-11-07 | Draft proposal circulating |
|
||||
| Legacy installer posture (warn vs fail) | Policy Guild | 2025-11-14 | Not started |
|
||||
| Driver risk taxonomy | Policy Guild | 2025-11-14 | Not started |
|
||||
|
||||
## 9. Next steps
|
||||
1. Policy Guild drafts predicate specs + policy templates; align with DOCS-SCANNER-BENCH-62-016.
|
||||
2. Security Guild evaluates signature verification approach and revocation handling (online vs offline CRL cache).
|
||||
3. Offline Kit Guild scopes snapshot size and update cadence for Chocolatey feeds and certificate bundles.
|
||||
4. Docs Guild prepares policy/user guidance updates once predicates are finalised.
|
||||
5. Security Guild to report decision for FinSecure Corp (POLICY-READINESS-0002) by 2025-11-07; feed outcome into dashboards.
|
||||
|
||||
## Coordination
|
||||
- Sync demand signals via `docs/benchmarks/scanner/windows-macos-demand.md`.
|
||||
- Log policy readiness status in `docs/api/scanner/windows-coverage.md`.
|
||||
- Update Windows/macOS metrics dashboard when decisions change (`docs/api/scanner/windows-macos-summary.md`).
|
||||
@@ -8,6 +8,12 @@ Scanner analyses container images layer-by-layer, producing deterministic SBOM f
|
||||
- [Architecture](./architecture.md)
|
||||
- [Implementation plan](./implementation_plan.md)
|
||||
- [Task board](./TASKS.md)
|
||||
- [macOS analyzer design](./design/macos-analyzer.md)
|
||||
- [Windows analyzer design](./design/windows-analyzer.md)
|
||||
- [Field engagement workflow](./operations/field-engagement.md)
|
||||
- [Design dossiers](./design/README.md)
|
||||
- [Benchmarks overview](../../benchmarks/scanner/README.md)
|
||||
|
||||
## How to get started
|
||||
1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module.
|
||||
|
||||
@@ -1,38 +1,46 @@
|
||||
# StellaOps Scanner
|
||||
|
||||
Scanner analyses container images layer-by-layer, producing deterministic SBOM fragments, diffs, and signed reports.
|
||||
|
||||
## Responsibilities
|
||||
- Expose APIs (WebService) for scan orchestration, diffing, and artifact retrieval.
|
||||
- Run Worker analyzers for OS, language, and native ecosystems with restart-only plug-ins.
|
||||
- Store SBOM fragments and artifacts in RustFS/object storage.
|
||||
- Publish DSSE-ready metadata for Signer/Attestor and downstream policy evaluation.
|
||||
|
||||
## Key components
|
||||
- `StellaOps.Scanner.WebService` minimal API host.
|
||||
- `StellaOps.Scanner.Worker` analyzer executor.
|
||||
- Analyzer libraries under `StellaOps.Scanner.Analyzers.*`.
|
||||
|
||||
## Integrations & dependencies
|
||||
- Scheduler for job intake and retries.
|
||||
- Policy Engine for evidence handoff.
|
||||
- Export Center / Offline Kit for artifact packaging.
|
||||
|
||||
## Operational notes
|
||||
- CAS caches, bounded retries, DSSE integration.
|
||||
- Monitoring dashboards (see ./operations/analyzers-grafana-dashboard.json).
|
||||
- RustFS migration playbook.
|
||||
|
||||
## Related resources
|
||||
- ./operations/analyzers.md
|
||||
- ./operations/analyzers-grafana-dashboard.json
|
||||
- ./operations/rustfs-migration.md
|
||||
- ./operations/entrypoint.md
|
||||
|
||||
## Backlog references
|
||||
- DOCS-SCANNER updates tracked in ../../TASKS.md.
|
||||
- Analyzer parity work in src/Scanner/**/TASKS.md.
|
||||
|
||||
## Epic alignment
|
||||
- **Epic 6 – Vulnerability Explorer:** provide policy-aware scan outputs, explain traces, and findings ledger hooks for triage workflows.
|
||||
- **Epic 10 – Export Center:** generate export-ready artefacts, manifests, and DSSE metadata for bundles.
|
||||
# StellaOps Scanner
|
||||
|
||||
Scanner analyses container images layer-by-layer, producing deterministic SBOM fragments, diffs, and signed reports.
|
||||
|
||||
## Responsibilities
|
||||
- Expose APIs (WebService) for scan orchestration, diffing, and artifact retrieval.
|
||||
- Run Worker analyzers for OS, language, and native ecosystems with restart-only plug-ins.
|
||||
- Store SBOM fragments and artifacts in RustFS/object storage.
|
||||
- Publish DSSE-ready metadata for Signer/Attestor and downstream policy evaluation.
|
||||
|
||||
## Key components
|
||||
- `StellaOps.Scanner.WebService` minimal API host.
|
||||
- `StellaOps.Scanner.Worker` analyzer executor.
|
||||
- Analyzer libraries under `StellaOps.Scanner.Analyzers.*`.
|
||||
|
||||
## Integrations & dependencies
|
||||
- Scheduler for job intake and retries.
|
||||
- Policy Engine for evidence handoff.
|
||||
- Export Center / Offline Kit for artifact packaging.
|
||||
|
||||
## Operational notes
|
||||
- CAS caches, bounded retries, DSSE integration.
|
||||
- Monitoring dashboards (see ./operations/analyzers-grafana-dashboard.json).
|
||||
- RustFS migration playbook.
|
||||
|
||||
## Related resources
|
||||
- ./operations/analyzers.md
|
||||
- ./operations/analyzers-grafana-dashboard.json
|
||||
- ./operations/rustfs-migration.md
|
||||
- ./operations/entrypoint.md
|
||||
- ./design/macos-analyzer.md
|
||||
- ./design/windows-analyzer.md
|
||||
- ../../benchmarks/scanner/deep-dives/macos.md
|
||||
- ../../benchmarks/scanner/deep-dives/windows.md
|
||||
- ../../benchmarks/scanner/windows-macos-demand.md
|
||||
- ../../benchmarks/scanner/windows-macos-interview-template.md
|
||||
- ./operations/field-engagement.md
|
||||
- ./design/README.md
|
||||
|
||||
## Backlog references
|
||||
- DOCS-SCANNER updates tracked in ../../TASKS.md.
|
||||
- Analyzer parity work in src/Scanner/**/TASKS.md.
|
||||
|
||||
## Epic alignment
|
||||
- **Epic 6 – Vulnerability Explorer:** provide policy-aware scan outputs, explain traces, and findings ledger hooks for triage workflows.
|
||||
- **Epic 10 – Export Center:** generate export-ready artefacts, manifests, and DSSE metadata for bundles.
|
||||
|
||||
@@ -28,5 +28,13 @@
|
||||
| SCANNER-ENG-0005 | TODO | Go Analyzer Guild | Enhance Go stripped-binary fallback inference per `docs/benchmarks/scanner/scanning-gaps-stella-misses-from-competitors.md`. | Include inferred module metadata & policy integration |
|
||||
| SCANNER-ENG-0006 | TODO | Rust Analyzer Guild | Expand Rust fingerprint coverage per `docs/benchmarks/scanner/scanning-gaps-stella-misses-from-competitors.md`. | Ship enriched fingerprint catalogue + policy controls |
|
||||
| SCANNER-ENG-0007 | TODO | Scanner Guild, Policy Guild | Design deterministic secret leak detection pipeline per `docs/benchmarks/scanner/scanning-gaps-stella-misses-from-competitors.md`. | Include rule packaging, Policy Engine integration, CLI workflow |
|
||||
| SCANNER-ENG-0020 | TODO | Scanner Guild (macOS Cellar Squad) | Implement Homebrew collector and fragment mapper per `design/macos-analyzer.md` §3.1. | Emit brew component fragments with tap provenance; integrate Surface.Validation/FS limits. |
|
||||
| SCANNER-ENG-0021 | TODO | Scanner Guild (macOS Receipts Squad) | Implement pkgutil receipt collector per `design/macos-analyzer.md` §3.2. | Parse receipts/BOMs into deterministic component records with install metadata. |
|
||||
| SCANNER-ENG-0022 | TODO | Scanner Guild, Policy Guild (macOS Bundles Squad) | Implement macOS bundle inspector & capability overlays per `design/macos-analyzer.md` §3.3. | Extract signing/entitlements, emit capability evidence, merge with receipts/Homebrew. |
|
||||
| SCANNER-ENG-0023 | TODO | Scanner Guild, Offline Kit Guild, Policy Guild | Deliver macOS policy/offline integration per `design/macos-analyzer.md` §5–6. | Define policy predicates, CLI toggles, Offline Kit packaging, and documentation. |
|
||||
| SCANNER-ENG-0024 | TODO | Scanner Guild (Windows MSI Squad) | Implement Windows MSI collector per `design/windows-analyzer.md` §3.1. | Parse MSI databases, emit component fragments with provenance metadata; blocked until POLICY-READINESS-0002 (decision due 2025-11-07). |
|
||||
| SCANNER-ENG-0025 | TODO | Scanner Guild (Windows WinSxS Squad) | Implement WinSxS manifest collector per `design/windows-analyzer.md` §3.2. | Correlate assemblies with MSI components and catalog signatures; dependent on POLICY-READINESS-0002 outcome. |
|
||||
| SCANNER-ENG-0026 | TODO | Scanner Guild (Windows Packages Squad) | Implement Chocolatey & registry collectors per `design/windows-analyzer.md` §3.3–3.4. | Harvest nuspec metadata and registry uninstall/service evidence; merge with filesystem artefacts; align with feed decisions from POLICY-READINESS-0002. |
|
||||
| SCANNER-ENG-0027 | TODO | Scanner Guild, Policy Guild, Offline Kit Guild | Deliver Windows policy/offline integration per `design/windows-analyzer.md` §5–6. | Define predicates, CLI/Offline docs, and packaging for feeds/certs; start after POLICY-READINESS-0002 sign-off. |
|
||||
| SCANNER-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md |
|
||||
| SCANNER-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow |
|
||||
|
||||
35
docs/modules/scanner/design/README.md
Normal file
@@ -0,0 +1,35 @@
|
||||
# Scanner Design Dossiers
|
||||
|
||||
This directory contains deep technical designs for current and upcoming analyzers and surface components.
|
||||
|
||||
## Language analyzers
|
||||
- `ruby-analyzer.md` — lockfile, runtime graph, capability signals for Ruby.
|
||||
|
||||
## Surface & platform contracts
|
||||
- `surface-fs.md`
|
||||
- `surface-env.md`
|
||||
- `surface-validation.md`
|
||||
- `surface-secrets.md`
|
||||
|
||||
## OS ecosystem designs
|
||||
- `macos-analyzer.md` — Homebrew, pkgutil, `.app` bundle plan.
|
||||
- `windows-analyzer.md` — MSI, WinSxS, Chocolatey, registry collectors.
|
||||
|
||||
## Demand & dashboards
|
||||
- `../../../benchmarks/scanner/windows-macos-demand.md` — demand tracker.
|
||||
- `../../../benchmarks/scanner/windows-macos-interview-template.md` — interview template.
|
||||
- `../../../api/scanner/windows-coverage.md` — coverage summary dashboard.
|
||||
- `../../../api/scanner/windows-macos-summary.md` — metric snapshot.
|
||||
|
||||
## Utility & reference
|
||||
- `../operations/field-engagement.md` — SE workflow guidance.
|
||||
- `../operations/analyzers.md` — operational runbook.
|
||||
- `../operations/rustfs-migration.md` — storage migration notes.
|
||||
|
||||
## Maintenance tips
|
||||
- Keep the demand tracker (`../../../benchmarks/scanner/windows-macos-demand.md`) and API dashboards in sync when updating macOS/Windows designs.
|
||||
- Cross-reference policy readiness briefs for associated predicates and waiver models.
|
||||
|
||||
## Policy readiness
|
||||
- `../policy/secret-leak-detection-readiness.md` — secret leak pipeline decisions.
|
||||
- `../policy/windows-package-readiness.md` — Windows analyzer policy decisions.
|
||||
123
docs/modules/scanner/design/macos-analyzer.md
Normal file
@@ -0,0 +1,123 @@
|
||||
# macOS Analyzer Design Brief (Draft)
|
||||
|
||||
> Owners: Scanner Guild, Policy Guild, Offline Kit Guild
|
||||
> Related backlog (proposed): SCANNER-ENG-0020..0023, DOCS-SCANNER-BENCH-62-002
|
||||
> Status: Draft pending demand validation (see `docs/benchmarks/scanner/windows-macos-demand.md`)
|
||||
|
||||
## 1. Scope & objectives
|
||||
- Deliver deterministic inventory coverage for macOS container and VM images, focusing on Homebrew, Apple/pkgutil receipts, and `.app` bundles.
|
||||
- Preserve StellaOps principles: per-layer provenance, offline parity, policy explainability, and signed evidence pipelines.
|
||||
- Provide capability signals (entitlements, hardened runtime, launch agents) to enable Policy Engine gating.
|
||||
|
||||
Out of scope (Phase 1):
|
||||
- Dynamic runtime tracing of macOS services (deferred to Zastava/EntryTrace).
|
||||
- iOS/tvOS/visionOS package formats (will reuse bundle inspection where feasible).
|
||||
- Direct notarization ticket validation (delegated to Policy Engine unless otherwise decided).
|
||||
|
||||
## 2. High-level architecture
|
||||
```
Scanner.Worker (macOS profile)
├─ Surface.Validation (enforce allowlists, bundle size limits)
├─ Surface.FS (layer/materialized filesystem)
├─ HomebrewCollector (new) -> LayerComponentFragment (brew)
├─ PkgutilCollector (new) -> LayerComponentFragment (pkgutil)
├─ BundleInspector (new) -> Capability records + component fragments
├─ LaunchAgentMapper (optional) -> Usage hints
└─ MacOsAggregator (new) -> merges fragments, emits ComponentGraph & capability overlays
```
|
||||
|
||||
- Each collector runs deterministically against the mounted filesystem; results persist in `ScanAnalysisStore` under dedicated keys before being normalized by `MacOsComponentMapper`.
|
||||
- Layer digests follow existing `LayerComponentFragment` semantics to maintain diff/replay parity.
|
||||
|
||||
## 3. Collectors & data sources
|
||||
### 3.1 Homebrew collector
|
||||
- Scan `/usr/local/Cellar/**` and `/opt/homebrew/Cellar/**` for `INSTALL_RECEIPT.json` and `*.rb` formula metadata.
|
||||
- Output fields: `tap`, `formula`, `version`, `revision`, `poured_from_bottle`, and `installed_with` (a list, possibly empty).
|
||||
- Store `bottle.sha256`, `source.url`, and `tapped_from` metadata for provenance.
|
||||
- Map to PURL: `pkg:brew/<tap>/<formula>@<version>?revision=<revision>`.
|
||||
- Guardrails: limit traversal depth, ignore caches (`/Caskroom` optional), enforce 200MB per formula cap (configurable).
|
||||
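The PURL mapping above can be sketched as follows. This is a minimal illustration, not the collector's actual code: `brew_purl` is a hypothetical helper, and the choice to percent-encode each segment while keeping the tap's slash as a path separator is an assumption.

```python
from urllib.parse import quote


def brew_purl(tap: str, formula: str, version: str, revision: int = 0) -> str:
    """Build pkg:brew/<tap>/<formula>@<version>?revision=<revision> (hypothetical helper)."""
    # A tap such as "homebrew/core" keeps its slash as a path separator;
    # each individual segment is percent-encoded.
    tap_path = "/".join(quote(segment, safe="") for segment in tap.split("/"))
    name = quote(formula, safe="")      # "openssl@3" becomes "openssl%403"
    ver = quote(version, safe="")
    return f"pkg:brew/{tap_path}/{name}@{ver}?revision={revision}"
```

Encoding the formula name matters for versioned formulae such as `openssl@3`, whose `@` would otherwise be confused with the version separator.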
|
||||
### 3.2 pkgutil receipt collector
|
||||
- Parse `/var/db/receipts/*.plist` for pkg identifiers, version, install time, volume, installer domain.
|
||||
- Use `.bom` files to enumerate installed files; capture path hashes via `osx.BomParser`.
|
||||
- Emit `ComponentRecord`s keyed by `pkgutil:<identifier>` with metadata: `bundleIdentifier`, `installTimeUtc`, `volume`.
|
||||
- Provide dependency mapping between pkg receipts and Homebrew formulae when overlapping file hashes exist (optional Phase 2).
|
||||
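A receipt-to-record mapping could look roughly like the sketch below. It assumes the receipt plist carries `PackageIdentifier`, `PackageVersion`, `InstallDate`, and `InstallPrefixPath` keys, and that `InstallPrefixPath` is an acceptable stand-in for `volume`; `receipt_to_record` and the record layout are hypothetical, and BOM parsing is left out.

```python
import plistlib
from datetime import datetime


def receipt_to_record(plist_bytes: bytes) -> dict:
    """Turn one receipt plist into a record keyed by pkgutil:<identifier> (sketch)."""
    data = plistlib.loads(plist_bytes)
    installed = data.get("InstallDate")
    return {
        "key": f"pkgutil:{data['PackageIdentifier']}",
        "version": data.get("PackageVersion"),
        # plistlib returns naive datetimes for <date> values; they are treated as UTC here.
        "installTimeUtc": installed.isoformat() + "Z" if isinstance(installed, datetime) else None,
        # Assumption: the install prefix approximates the target volume.
        "volume": data.get("InstallPrefixPath"),
    }
```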
|
||||
### 3.3 Bundle inspector
|
||||
- Traverse `/Applications`, `/System/Applications`, `/Users/*/Applications`, `/Library/Application Support`.
|
||||
- For each `.app`:
|
||||
- Read `Info.plist` (bundle id, version, short version, minimum system).
|
||||
- Extract code signing metadata via `codesign --display --xml` style parsing (Team ID, certificate chain, hardened runtime flag).
|
||||
- Parse entitlements (`archived-entitlements.xcent`) and map to capability taxonomy (network, camera, automation, etc.).
|
||||
- Hash `CodeResources` manifest to support provenance comparisons.
|
||||
- Link `.app` bundles to receipts and Homebrew formulae using bundle id or install path heuristics.
|
||||
- Emit capability overlays (e.g., `macos.entitlement("com.apple.security.network.server") == true`).
|
||||
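As an illustration of the entitlement-to-capability mapping, the sketch below groups granted entitlements into coarse capabilities. The entitlement keys are standard Apple sandbox keys, but the group names and the `capability_overlay` shape are assumptions, since the taxonomy is still listed as an open item.

```python
# Hypothetical grouping; the real taxonomy is to be finalized by the Policy Guild.
CAPABILITY_GROUPS = {
    "com.apple.security.network.server": "network",
    "com.apple.security.network.client": "network",
    "com.apple.security.device.camera": "camera",
    "com.apple.security.automation.apple-events": "automation",
}


def capability_overlay(entitlements: dict) -> dict:
    """Summarize parsed entitlements as sorted, deterministic overlay data."""
    # Only entitlements explicitly set to True count as granted.
    granted = sorted(name for name, value in entitlements.items() if value is True)
    groups = sorted({CAPABILITY_GROUPS[n] for n in granted if n in CAPABILITY_GROUPS})
    return {"entitlements": granted, "capabilities": groups}
```

Sorting both lists keeps the overlay stable across runs, which is what the determinism requirement needs.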
|
||||
### 3.4 Launch agent mapper (optional Phase 1 extension)
|
||||
- Inspect `/Library/Launch{Agents,Daemons}` and user equivalents.
|
||||
- Parse `plist` for program arguments, run conditions, sockets, environment.
|
||||
- Feed derived runtime hints into EntryTrace (`UsageHints`) to mark active binaries vs dormant installations.
|
||||
|
||||
## 4. Aggregation & output
|
||||
- `MacOsComponentMapper` converts collector results into:
|
||||
- `LayerComponentFragment` arrays keyed by synthetic digests (`sha256:stellaops-os-macbrew`, etc.).
|
||||
- `ComponentMetadata` entries capturing tap origin, Team ID, entitlements, notarization flag (when available).
|
||||
- Capability overlays stored under `ScanAnalysisKeys.capability.macOS`.
|
||||
- Export Center enhancements:
|
||||
- Include Homebrew metadata manifests and CodeResources hashes in attested bundles.
|
||||
- Provide optional notarization proof attachments if Policy Engine later requires them.
|
||||
|
||||
## 5. Policy & governance integration
|
||||
- Predicates to introduce:
|
||||
- `macos.bundle.signed(teamId?, hardenedRuntime?)`
|
||||
- `macos.entitlement(name)`
|
||||
- `macos.pkg.receipt(identifier, version?)`
|
||||
- `macos.notarized` (pending Security decision).
|
||||
- Lattice guidance:
|
||||
- Unsigned/unnotarized third-party apps default to `warn`.
|
||||
- High-risk entitlements (camera, screen capture) escalate severity unless whitelisted.
|
||||
- Waiver model similar to existing EntryTrace/Secrets: require bundle hash + Team ID to avoid broad exceptions.
|
||||
|
||||
## 6. Offline kit considerations
|
||||
- Mirror required Homebrew taps (`homebrew/core`, `homebrew/cask`) and freeze metadata per release.
|
||||
- Bundle notarization cache instructions (CRL/OCSP) so operators can prefetch without Apple connectivity.
|
||||
- Package Apple certificate chain updates (WWDR) and provide verification script to ensure validity.
|
||||
- Document disk space impacts (Homebrew cellar snapshots ~500MB per tap snapshot).
|
||||
|
||||
## 7. Testing strategy
|
||||
- Unit tests with fixture directories:
|
||||
- Sample Cellar tree with multiple formulas (bottled vs source builds).
|
||||
- Synthetic pkg receipts + BOM files.
|
||||
- `.app` bundles with varying signing states (signed/notarized/unsigned).
|
||||
- Integration tests:
|
||||
- Build macOS rootfs tarballs via CI job (runner requiring macOS) that mimic layered container conversion.
|
||||
- Verify deterministic output by re-running collectors and diffing results (CI gating).
|
||||
- Offline compliance tests:
|
||||
- Ensure rule bundles and caches load without network by running integration suite in sealed environment.
|
||||
|
||||
## 8. Dependencies & open items
|
||||
| Item | Description | Owner | Status |
| --- | --- | --- | --- |
| Demand threshold | Need ≥3 qualified customer asks to justify engineering investment | Product Guild | Active (DOCS-SCANNER-BENCH-62-002) |
| macOS rootfs fixtures | Require automation to produce macOS layer tarballs for tests | Scanner Guild | Pending design |
| Notarization validation | Decide whether scanner performs `spctl --assess` or defers | Security Guild | TBD |
| Entitlement taxonomy | Finalize capability grouping for Policy Engine | Policy Guild | TBD |
|
||||
|
||||
## 9. Proposed backlog entries
|
||||
| ID (proposed) | Title | Summary |
| --- | --- | --- |
| SCANNER-ENG-0020 | Implement Homebrew collector and fragment mapper | Parse cellar manifests, emit brew component records, integrate with Surface.FS/Validation. |
| SCANNER-ENG-0021 | Implement pkgutil receipt collector | Parse receipts/BOM, output installer component records with provenance. |
| SCANNER-ENG-0022 | Implement macOS bundle inspector & capability overlays | Extract plist/signing/entitlements, produce capability evidence and merge with receipts. |
| SCANNER-ENG-0023 | Policy & Offline integration for macOS | Define predicates, CLI toggles, Offline Kit packaging, and documentation. |
|
||||
|
||||
## 10. References
|
||||
- `docs/benchmarks/scanner/deep-dives/macos.md` — competitor snapshot and roadmap summary.
|
||||
- `docs/benchmarks/scanner/windows-macos-demand.md` — demand capture log.
|
||||
- `docs/modules/scanner/design/surface-secrets.md`, `surface-fs.md`, `surface-validation.md` — surface contracts leveraged by collectors.
|
||||
|
||||
Further reading: `../../api/scanner/windows-coverage.md` (summary) and `../../api/scanner/windows-macos-summary.md` (metrics dashboard).
|
||||
|
||||
Policy readiness alignment: see `../policy/secret-leak-detection-readiness.md` (POLICY-READINESS-0001).
|
||||
|
||||
Upcoming milestone: Northwind Health Services demo on 2025-11-10 to validate notarization/entitlement outputs before final policy sign-off.
|
||||
135
docs/modules/scanner/design/windows-analyzer.md
Normal file
@@ -0,0 +1,135 @@
|
||||
# Windows Analyzer Design Brief (Draft)
|
||||
|
||||
> Owners: Scanner Guild, Policy Guild, Offline Kit Guild, Security Guild
|
||||
> Related backlog (proposed): SCANNER-ENG-0024..0027, DOCS-SCANNER-BENCH-62-002
|
||||
> Status: Draft — contingent on Windows demand threshold (see `docs/benchmarks/scanner/windows-macos-demand.md`)
|
||||
|
||||
## 1. Objectives & boundaries
|
||||
- Provide deterministic inventory for Windows Server/container images covering MSI/WinSxS assemblies, Chocolatey packages, and registry-derived installers.
|
||||
- Preserve replayability (layer fragments, provenance metadata) and align outputs with existing SBOM/policy pipelines.
|
||||
- Respect sovereignty constraints: offline-friendly, signed rule bundles, no reliance on Windows APIs unavailable in containerized scans.
|
||||
|
||||
Out of scope (Phase 1):
|
||||
- Live registry queries on running Windows hosts (requires runtime agent; defer to Zastava/Runtime roadmap).
|
||||
- Windows Update patch baseline comparison (tracked separately under Runtime/Posture).
|
||||
- UWP/MSIX packages (flagged for follow-up once MSI parity is complete).
|
||||
|
||||
## 2. Architecture overview
|
||||
```
Scanner.Worker (Windows profile)
├─ Surface.Validation (enforce layer size, path allowlists)
├─ Surface.FS (materialized NTFS image via 7z/guestmount)
├─ MsiCollector -> LayerComponentFragment (windows-msi)
├─ WinSxSCollector -> LayerComponentFragment (windows-winsxs)
├─ ChocolateyCollector -> LayerComponentFragment (windows-choco)
├─ RegistryCollector -> Evidence overlays (uninstall/services)
├─ DriverCapabilityMapper -> Capability overlays (kernel/user drivers)
└─ WindowsComponentMapper -> ComponentGraph + capability metadata
```
|
||||
|
||||
- Collectors operate on extracted filesystem snapshots; registry access performed on exported hive files produced during image extraction (document in ops runbooks).
|
||||
- `WindowsComponentMapper` normalizes component identities (ProductCode, AssemblyIdentity, Chocolatey package ID) and merges overlapping evidence into deterministic fragments.
|
||||
|
||||
## 3. Collectors
|
||||
### 3.1 MSI collector
|
||||
- Input: `Windows/Installer/*.msi` database files (OLE compound / structured-storage format), registry hive exports for product mapping.
|
||||
- Implementation approach:
|
||||
- Use open-source MSI parser (custom or MIT-compatible) to avoid COM dependencies.
|
||||
- Extract Product, Component, File, Feature, Media tables.
|
||||
- Compute SHA256 for installed files via Component table, linking to WinSxS manifests.
|
||||
- Output metadata: `productCode`, `upgradeCode`, `productVersion`, `manufacturer`, `language`, `installContext`, `packageCode`, `sourceList`.
|
||||
- Evidence: file paths with digests, component IDs, CAB/patch references.
|
||||
|
||||
### 3.2 WinSxS collector
|
||||
- Input: `Windows/WinSxS/Manifests/*.manifest`, `Windows/WinSxS/` payload directories, catalog (.cat) files.
|
||||
- Parse XML assembly identities (name, version, processor architecture, public key token, language).
|
||||
- Map to MSI components when file hashes match.
|
||||
- Capture catalog signature thumbprint and optional patch KB references for policy gating.
|
||||
|
||||
### 3.3 Chocolatey collector
|
||||
- Input: `ProgramData/Chocolatey/lib/**`, `ProgramData/Chocolatey/package.backup`, `chocolateyinstall.ps1`, `.nuspec`.
|
||||
- Extract package ID, version, checksum, source feed, installed files and scripts.
|
||||
- Note whether install used cache or remote feed; record script hash for determinism.
|
||||
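A rough sketch of the inspect-only extraction described above: read the package ID and version from a `.nuspec` and hash an install script without executing it. `read_nuspec` and `script_digest` are hypothetical names, and the namespace-agnostic lookup is an assumption made because nuspec schema namespaces vary between packages.

```python
import hashlib
import xml.etree.ElementTree as ET


def read_nuspec(nuspec_xml: str) -> dict:
    """Return the first <id> and <version> elements, ignoring XML namespaces."""
    fields = {}
    for element in ET.fromstring(nuspec_xml).iter():
        name = element.tag.rsplit("}", 1)[-1]   # strip "{namespace}" prefix if present
        if name in ("id", "version") and name not in fields:
            fields[name] = (element.text or "").strip()
    return fields


def script_digest(script_bytes: bytes) -> str:
    """Hash script content as-is; the script is never run."""
    return "sha256:" + hashlib.sha256(script_bytes).hexdigest()
```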
|
||||
### 3.4 Registry collector
|
||||
- Input: Exported `SOFTWARE` hive covering:
|
||||
- `Microsoft\Windows\CurrentVersion\Uninstall`
|
||||
- `Microsoft\Windows\CurrentVersion\Installer\UserData`
|
||||
- `Microsoft\Windows\CurrentVersion\Run` (startup apps)
|
||||
- Service/driver configuration from `SYSTEM` hive under `Services`.
|
||||
- Emit fallback evidence for installers not captured by MSI/Chocolatey (legacy EXE installers).
|
||||
- Record uninstall strings, install dates, publisher, estimated size, install location.
|
||||
|
||||
### 3.5 Driver & service mapper
|
||||
- Parse `SYSTEM` hive `Services` entries to detect drivers (`Type` = 1 kernel driver, 2 file-system driver) and critical services (`Start` = 0 boot or 2 automatic).
|
||||
- Output capability overlays (e.g., `windows.driver.kernelMode(true)`, `windows.service.autoStart("Spooler")`) for Policy Engine.
|
||||
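The classification can be sketched as below, using the `Type` and `Start` values found under `Services` keys. The function name and the decision to emit `windows.driver.kernelMode(true)` for both kernel and file-system drivers are assumptions for illustration only.

```python
DRIVER_TYPES = {1: "kernel", 2: "filesystem"}   # registry Type values for drivers
AUTO_START_MODES = {0: "boot", 2: "auto"}       # registry Start values treated as auto-start


def service_overlays(name: str, type_value: int, start_value: int) -> list:
    """Map one Services entry to capability overlay strings (sketch)."""
    overlays = []
    if type_value in DRIVER_TYPES:
        # Assumption: file-system drivers also count as kernel-mode.
        overlays.append("windows.driver.kernelMode(true)")
    if start_value in AUTO_START_MODES:
        overlays.append(f'windows.service.autoStart("{name}")')
    return overlays
```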
|
||||
## 4. Component mapping & output
|
||||
- `WindowsComponentMapper`:
|
||||
- Generate `LayerComponentFragment`s with synthetic layer digests (e.g., `sha256:stellaops-windows-msi`).
|
||||
- Build `ComponentIdentity` with PURL-like scheme: `pkg:msi/<productCode>` or `pkg:winsxs/<assemblyIdentity>`.
|
||||
- Include metadata: signature thumbprint, catalog hash, KB references, install context, manufacturer.
|
||||
- Capability overlays stored under `ScanAnalysisKeys.capability.windows` for policy consumption.
|
||||
- Export Center bundling:
|
||||
- Include MSI manifest extracts, WinSxS assembly manifests, Chocolatey nuspec snapshots, and service/driver capability CSV.
|
||||
|
||||
## 5. Policy integration
|
||||
- Predicates to introduce:
|
||||
- `windows.package.signed(expectedThumbprint?)`
|
||||
- `windows.package.unsupportedInstallerType`
|
||||
- `windows.driver.kernelMode`, `windows.driver.unsigned`
|
||||
- `windows.service.autoStart(name)`
|
||||
- `windows.choco.sourceAllowed(feed)`
|
||||
- Lattice approach:
|
||||
- Unsigned kernel drivers → default `fail`.
|
||||
- Unknown installer sources → `warn` with escalation on critical services.
|
||||
- Chocolatey packages from non-whitelisted feeds → configurable severity.
|
||||
- Waiver semantics bind to product code + signature thumbprint; waivers expire when package version changes.
|
||||
|
||||
## 6. Offline kit & distribution
|
||||
- Package:
|
||||
- MSI schema definitions and parser binaries (signed).
|
||||
- Chocolatey feed snapshot (nupkg archives + index) for allow-listed feeds.
|
||||
- Windows catalog certificate chains + optional CRL/OCSP caches.
|
||||
- Documentation:
|
||||
- Provide instructions for exporting registry hives during image extraction (PowerShell script included).
|
||||
- Note disk space expectations (Chocolatey snapshot size, WinSxS manifest volume).
|
||||
|
||||
## 7. Testing strategy
|
||||
- Fixtures:
|
||||
- Sample MSI packages (with/without transforms), WinSxS manifests, Chocolatey packages.
|
||||
- Registry hive exports representing mixed installer types.
|
||||
- Tests:
|
||||
- Unit tests for each collector parsing edge cases (language-specific manifests, transforms, script hashing).
|
||||
- Integration tests using synthetic Windows container image layers (generated via CI on Windows worker).
|
||||
- Determinism checks ensuring repeated runs produce identical fragments.
|
||||
- Security review:
|
||||
- Validate script execution paths (collectors must never execute Chocolatey scripts; inspect only).
|
||||
|
||||
## 8. Dependencies & open questions
|
||||
| Item | Description | Owner | Status |
| --- | --- | --- | --- |
| MSI parser choice | Select MIT/Apache-compatible parser or build internal reader | Scanner Guild | TBD |
| Registry export tooling | Determine standard script/utility for hive exports in container context | Ops Guild | TBD |
| Authenticode verification locus | Decide scanner vs policy responsibility for signature verification | Security Guild | TBD |
| Feed mirroring policy | Which Chocolatey feeds to mirror by default | Product + Security Guilds | TBD |
|
||||
|
||||
## 9. Proposed backlog entries
|
||||
| ID (proposed) | Title | Summary |
| --- | --- | --- |
| SCANNER-ENG-0024 | Implement Windows MSI collector | Parse MSI databases, emit component fragments with provenance metadata. |
| SCANNER-ENG-0025 | Implement WinSxS manifest collector | Correlate assemblies with MSI components and catalog signatures. |
| SCANNER-ENG-0026 | Implement Chocolatey & registry collectors | Harvest nuspec metadata and uninstall/service registry data. |
| SCANNER-ENG-0027 | Policy & Offline integration for Windows | Define predicates, CLI toggles, Offline Kit packaging, documentation. |
|
||||
|
||||
## 10. References
|
||||
- `docs/benchmarks/scanner/deep-dives/windows.md`
|
||||
- `docs/benchmarks/scanner/windows-macos-demand.md`
|
||||
- `docs/modules/scanner/design/macos-analyzer.md` (structure/composition parallels)
|
||||
- Surface design docs (`surface-fs.md`, `surface-validation.md`, `surface-secrets.md`) for interfacing expectations.
|
||||
|
||||
Further reading: `../../api/scanner/windows-coverage.md` (summary) and `../../api/scanner/windows-macos-summary.md` (metrics dashboard).
|
||||
|
||||
Policy readiness alignment: see `../policy/windows-package-readiness.md` (POLICY-READINESS-0002).
|
||||
|
||||
Upcoming milestone: FinSecure Corp PCI review requires Authenticode/feed decision by 2025-11-07 before Windows analyzer spike kickoff.
|
||||
30
docs/modules/scanner/operations/field-engagement.md
Normal file
@@ -0,0 +1,30 @@
|
||||
# Field Engagement Playbook — Windows & macOS Coverage
|
||||
|
||||
> Audience: Field SEs, Product Specialists • Status: Draft
|
||||
|
||||
## Purpose
|
||||
Provide quick-reference guidance when prospects or customers ask about Windows/macOS coverage.
|
||||
|
||||
## Key talking points
|
||||
- **Current scope**: Scanner supports deterministic Linux coverage; Windows/macOS analyzers are in design.
|
||||
- **Roadmap**: macOS design (brew/pkgutil/.app) at `../design/macos-analyzer.md`; Windows design (MSI/WinSxS/Chocolatey) at `../design/windows-analyzer.md`.
|
||||
- **Demand tracking**: All signals captured in `../../benchmarks/scanner/windows-macos-demand.md` using the interview template.
|
||||
- **Policy readiness**: Secret leak detection briefing (`../../policy/secret-leak-detection-readiness.md`) and Windows package readiness (`../../policy/windows-package-readiness.md`).
|
||||
- **Backlog IDs**: macOS (SCANNER-ENG-0020..0023), Windows (SCANNER-ENG-0024..0027), policy follow-ups (POLICY-READINESS-0001/0002).
|
||||
|
||||
## SE workflow
|
||||
1. Use the interview template to capture customer needs.
|
||||
2. Append structured summary to `windows-macos-demand.md` and update the API dashboards (`docs/api/scanner/windows-macos-summary.md`, `docs/api/scanner/windows-coverage.md`).
|
||||
3. Notify Product/Scanner guild during weekly sync; flag blockers in Jira.
|
||||
4. Add highlight to the “Recent updates” section in `docs/api/scanner/windows-macos-summary.md`.
|
||||
5. Track upcoming milestones (FinSecure decision 2025-11-07, Northwind demo 2025-11-10) and ensure readiness tasks reflect outcomes.
|
||||
|
||||
## FAQ snippets
|
||||
- *When will Windows/macOS analyzers be GA?* — Pending demand threshold; design briefs are drafted and awaiting prioritisation.
|
||||
- *Can we run scans offline?* — Offline parity is a requirement; Offline Kit packaging detailed in design briefs.
|
||||
- *Do we cover Authenticode/notarization?* — Planned via Policy Engine predicates as part of readiness tasks.
|
||||
|
||||
## Contacts
|
||||
- Product lead: TBD (record in demand log when assigned)
|
||||
- Scanner guild rep: TBD
|
||||
- Policy guild rep: TBD
|
||||
@@ -1,426 +1,426 @@
|
||||
# component_architecture_scheduler.md — **Stella Ops Scheduler** (2025Q4)
|
||||
|
||||
> Synthesises the scheduling requirements documented across the Policy, Vulnerability Explorer, and Orchestrator module guides and implementation plans.
|
||||
|
||||
> **Scope.** Implementation‑ready architecture for **Scheduler**: a service that (1) **re‑evaluates** already‑cataloged images when intel changes (Feedser/Vexer/policy), (2) orchestrates **nightly** and **ad‑hoc** runs, (3) targets only the **impacted** images using the BOM‑Index, and (4) emits **report‑ready** events that downstream **Notify** fans out. Default mode is **analysis‑only** (no image pull); optional **content‑refresh** can be enabled per schedule.
|
||||
|
||||
---
|
||||
|
||||
## 0) Mission & boundaries
|
||||
|
||||
**Mission.** Keep scan results **current** without rescanning the world. When new advisories or VEX claims land, **pinpoint** affected images and ask the backend to recompute **verdicts** against the **existing SBOMs**. Surface only **meaningful deltas** to humans and ticket queues.
|
||||
|
||||
**Boundaries.**
|
||||
|
||||
* Scheduler **does not** compute SBOMs and **does not** sign. It calls Scanner/WebService’s **/reports (analysis‑only)** endpoint and lets the backend (Policy + Vexer + Feedser) decide PASS/FAIL.
|
||||
* Scheduler **may** ask Scanner to **content‑refresh** selected targets (e.g., mutable tags) but the default is **no** image pull.
|
||||
* Notifications are **not** sent directly; Scheduler emits events consumed by **Notify**.
|
||||
|
||||
---
|
||||
|
||||
## 1) Runtime shape & projects
|
||||
|
||||
```
src/
├─ StellaOps.Scheduler.WebService/ # REST (schedules CRUD, runs, admin)
├─ StellaOps.Scheduler.Worker/ # planners + runners (N replicas)
├─ StellaOps.Scheduler.ImpactIndex/ # purl→images inverted index (roaring bitmaps)
├─ StellaOps.Scheduler.Models/ # DTOs (Schedule, Run, ImpactSet, Deltas)
├─ StellaOps.Scheduler.Storage.Mongo/ # schedules, runs, cursors, locks
├─ StellaOps.Scheduler.Queue/ # Redis Streams / NATS abstraction
├─ StellaOps.Scheduler.Tests.* # unit/integration/e2e
```
|
||||
|
||||
**Deployables**:
|
||||
|
||||
* **Scheduler.WebService** (stateless)
|
||||
* **Scheduler.Worker** (scale‑out; planners + executors)
|
||||
|
||||
**Dependencies**: Authority (OpTok + DPoP/mTLS), Scanner.WebService, Feedser, Vexer, MongoDB, Redis/NATS, (optional) Notify.
|
||||
|
||||
---
|
||||
|
||||
## 2) Core responsibilities
|
||||
|
||||
1. **Time‑based** runs: cron windows per tenant/timezone (e.g., “02:00 Europe/Sofia”).
|
||||
2. **Event‑driven** runs: react to **Feedser export** and **Vexer export** deltas (changed product keys / advisories / claims).
|
||||
3. **Impact targeting**: map changes to **image sets** using a **global inverted index** built from Scanner’s per‑image **BOM‑Index** sidecars.
|
||||
4. **Run planning**: shard, pace, and rate‑limit jobs to avoid thundering herds.
|
||||
5. **Execution**: call Scanner **/reports (analysis‑only)** or **/scans (content‑refresh)**; aggregate **delta** results.
|
||||
6. **Events**: publish `rescan.delta` and `report.ready` summaries for **Notify** & **UI**.
|
||||
7. **Control plane**: CRUD schedules, **pause/resume**, dry‑run previews, audit.
|
||||
|
||||
---
|
||||
|
||||
## 3) Data model (Mongo)
|
||||
|
||||
**Database**: `scheduler`
|
||||
|
||||
* `schedules`
|
||||
|
||||
```
{ _id, tenantId, name, enabled, whenCron, timezone,
  mode: "analysis-only" | "content-refresh",
  selection: { scope: "all-images" | "by-namespace" | "by-repo" | "by-digest" | "by-labels",
               includeTags?: ["prod-*"], digests?: [sha256...], resolvesTags?: bool },
  onlyIf: { lastReportOlderThanDays?: int, policyRevision?: string },
  notify: { onNewFindings: bool, minSeverity: "low|medium|high|critical", includeKEV: bool },
  limits: { maxJobs?: int, ratePerSecond?: int, parallelism?: int },
  createdAt, updatedAt, createdBy, updatedBy }
```
|
||||
|
||||
* `runs`
|
||||
|
||||
```
{ _id, scheduleId?, tenantId, trigger: "cron|feedser|vexer|manual",
  reason?: { feedserExportId?, vexerExportId?, cursor? },
  state: "planning|queued|running|completed|error|cancelled",
  stats: { candidates: int, deduped: int, queued: int, completed: int, deltas: int, newCriticals: int },
  startedAt, finishedAt, error? }
```
|
||||
|
||||
* `impact_cursors`
|
||||
|
||||
```
|
||||
{ _id: tenantId, feedserLastExportId, vexerLastExportId, updatedAt }
|
||||
```
|
||||
|
||||
* `locks` (singleton schedulers, run leases)
|
||||
|
||||
* `audit` (CRUD actions, run outcomes)
|
||||
|
||||
**Indexes**:
|
||||
|
||||
* `schedules` on `{tenantId, enabled}`, `{whenCron}`.
|
||||
* `runs` on `{tenantId, startedAt desc}`, `{state}`.
|
||||
* TTL optional for completed runs (e.g., 180 days).
|
||||
|
||||
---
|
||||
|
||||
## 4) ImpactIndex (global inverted index)
|
||||
|
||||
Goal: translate **change keys** → **image sets** in **milliseconds**.
|
||||
|
||||
**Source**: Scanner produces per‑image **BOM‑Index** sidecars (purls, and `usedByEntrypoint` bitmaps). Scheduler ingests/refreshes them to build a **global** index.
|
||||
|
||||
**Representation**:
|
||||
|
||||
* Assign **image IDs** (dense ints) to catalog images.
|
||||
* Keep **Roaring Bitmaps**:
|
||||
|
||||
* `Contains[purl] → bitmap(imageIds)`
|
||||
* `UsedBy[purl] → bitmap(imageIds)` (subset of Contains)
|
||||
* Optionally keep **Owner maps**: `{imageId → {tenantId, namespaces[], repos[]}}` for selection filters.
|
||||
* Persist in RocksDB/LMDB or Redis‑modules; cache hot shards in memory; snapshot to Mongo for cold start.
|
||||
|
||||
**Update paths**:
|
||||
|
||||
* On new/updated image SBOM: **merge** per‑image set into global maps.
|
||||
* On image remove/expiry: **clear** id from bitmaps.
|
||||
|
||||
**API (internal)**:
|
||||
|
||||
```csharp
public interface IImpactIndex
{
    // Resolve images that contain (or, with usageOnly, actually use) the given purls.
    ImpactSet ResolveByPurls(IEnumerable<string> purls, bool usageOnly, Selector sel);

    // Optional: vuln->purl mapping precomputed by Feedser.
    ImpactSet ResolveByVulns(IEnumerable<string> vulnIds, bool usageOnly, Selector sel);

    // Used by nightly runs.
    ImpactSet ResolveAll(Selector sel);
}
```
|
||||
|
||||
**Selector filters**: tenant, namespaces, repos, labels, digest allowlists, `includeTags` patterns.
|
||||
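The `Contains` / `UsedBy` maps and their update paths can be illustrated with a small in-memory sketch. Plain Python sets stand in for Roaring Bitmaps here, selector filtering is omitted, and `ImpactIndexSketch` and its method names are hypothetical rather than part of the Scheduler codebase.

```python
from collections import defaultdict


class ImpactIndexSketch:
    """In-memory stand-in for the purl -> image-id inverted index."""

    def __init__(self):
        self.contains = defaultdict(set)   # Contains[purl] -> image ids
        self.used_by = defaultdict(set)    # UsedBy[purl]   -> image ids (subset of Contains)

    def merge_image(self, image_id, purls, used_purls=()):
        """Update path for a new or updated image SBOM."""
        for purl in purls:
            self.contains[purl].add(image_id)
        for purl in used_purls:
            self.contains[purl].add(image_id)   # keep UsedBy a subset of Contains
            self.used_by[purl].add(image_id)

    def remove_image(self, image_id):
        """Update path for image removal or expiry."""
        for table in (self.contains, self.used_by):
            for ids in table.values():
                ids.discard(image_id)

    def resolve_by_purls(self, purls, usage_only):
        """Union the per-purl sets; usage_only restricts to entrypoint usage."""
        table = self.used_by if usage_only else self.contains
        impacted = set()
        for purl in purls:
            impacted |= table.get(purl, set())
        return impacted
```

With real Roaring Bitmaps the union in `resolve_by_purls` becomes a bitmap OR over dense image IDs, which is where the millisecond-level resolution goal comes from.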
|
||||
---
|
||||
|
||||
## 5) External interfaces (REST)
|
||||
|
||||
Base path: `/api/v1/scheduler` (Authority OpToks; scopes: `scheduler.read`, `scheduler.admin`).
|
||||
|
||||
### 5.1 Schedules CRUD
|
||||
|
||||
* `POST /schedules` → create
|
||||
* `GET /schedules` → list (filter by tenant)
|
||||
* `GET /schedules/{id}` → details + next run
|
||||
* `PATCH /schedules/{id}` → pause/resume/update
|
||||
* `DELETE /schedules/{id}` → delete (soft delete, optional)
|
||||
|
||||
### 5.2 Run control & introspection
|
||||
|
||||
* `POST /run` — ad‑hoc run
|
||||
|
||||
```json
|
||||
{ "mode": "analysis-only|content-refresh", "selection": {...}, "reason": "manual" }
|
||||
```
|
||||
* `GET /runs` — list with paging
|
||||
* `GET /runs/{id}` — status, stats, links to deltas
|
||||
* `POST /runs/{id}/cancel` — best‑effort cancel
|
||||
|
||||
### 5.3 Previews (dry‑run)
|
||||
|
||||
* `POST /preview/impact` — returns **candidate count** and a small sample of impacted digests for given change keys or selection.
|
||||
|
||||
### 5.4 Event webhooks (optional push from Feedser/Vexer)
|
||||
|
||||
* `POST /events/feedser-export`
|
||||
|
||||
```json
|
||||
{ "exportId":"...", "changedProductKeys":["pkg:rpm/openssl", ...], "kev": ["CVE-..."], "window": { "from":"...","to":"..." } }
|
||||
```
|
||||
* `POST /events/vexer-export`
|
||||
|
||||
```json
|
||||
{ "exportId":"...", "changedClaims":[ { "productKey":"pkg:deb/...", "vulnId":"CVE-...", "status":"not_affected→affected"} ], ... }
|
||||
```
|
||||
|
||||
**Security**: webhook requires **mTLS** or a signed `X-Scheduler-Signature` header (HMAC‑SHA‑256 or an Ed25519 signature) plus Authority token.
|
||||
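For the HMAC option, verification of the `X-Scheduler-Signature` header might look like the sketch below. It assumes the header carries a hex-encoded HMAC‑SHA‑256 of the raw request body under a shared secret; the encoding, the `verify_signature` name, and the secret distribution are assumptions, not confirmed behaviour.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Check a hex HMAC-SHA-256 signature over the raw request body (sketch)."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest().encode("ascii")
    presented = header_value.strip().lower().encode("utf-8")
    # compare_digest runs in constant time to avoid leaking match position.
    return hmac.compare_digest(expected, presented)
```

The signature is computed over the bytes as received, before any JSON parsing or decompression, so that re-serialization cannot change what was signed.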
|
||||
---
|
||||
|
||||
## 6) Planner → Runner pipeline
|
||||
|
||||
### 6.1 Planning algorithm (event‑driven)
|
||||
|
||||
```
On Export Event (Feedser/Vexer):
  keys = Normalize(change payload) # productKeys or vulnIds→productKeys
  usageOnly = schedule/policy hint? # default true
  sel = Selector for tenant/scope from schedules subscribed to events

  impacted = ImpactIndex.ResolveByPurls(keys, usageOnly, sel)
  impacted = ApplyOwnerFilters(impacted, sel) # namespaces/repos/labels
  impacted = DeduplicateByDigest(impacted)
  impacted = EnforceLimits(impacted, limits.maxJobs)
  shards = Shard(impacted, byHashPrefix, n=limits.parallelism)

  For each shard:
    Enqueue RunSegment (runId, shard, rate=limits.ratePerSecond)
```
|
||||
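The dedupe, limit, and shard steps of the planning algorithm can be sketched in a few lines. `plan_segments` is a hypothetical helper; truncating in sorted order is a simplification (the pacing rules give KEV-tagged and critical items priority when oversubscribed), and taking the first eight hex characters of the digest as the hash prefix is an assumption.

```python
def plan_segments(digests, max_jobs, parallelism):
    """Deduplicate digests, cap the count, and shard by digest hash prefix (sketch)."""
    unique = sorted(set(digests))      # DeduplicateByDigest, with deterministic order
    limited = unique[:max_jobs]        # EnforceLimits (no priority ordering in this sketch)
    shards = [[] for _ in range(parallelism)]
    for digest in limited:
        hex_part = digest.split(":", 1)[-1]           # drop the "sha256:" prefix
        shards[int(hex_part[:8], 16) % parallelism].append(digest)
    return [shard for shard in shards if shard]       # skip empty shards
```

Because the shard is derived from the digest itself, the same image always lands in the same shard for a given parallelism, which keeps replans stable.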
|
||||
**Fairness & pacing**
|
||||
|
||||
* Use **leaky bucket** per tenant and per registry host.
|
||||
* Prioritize **KEV‑tagged** and **critical** first if oversubscribed.
|
||||
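A leaky bucket for pacing can be sketched as below, with one instance kept per tenant and one per registry host. The class, its parameters, and the injected `now` timestamp (used instead of a wall clock to keep the sketch testable) are illustrative assumptions.

```python
class LeakyBucket:
    """Leaky-bucket meter: each job adds one unit, the level drains at a fixed rate."""

    def __init__(self, rate_per_second: float, capacity: float):
        self.rate = rate_per_second
        self.capacity = capacity
        self.level = 0.0
        self.last = None

    def try_acquire(self, now: float) -> bool:
        # Drain the bucket for the time elapsed since the previous call.
        if self.last is not None:
            self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False    # caller should delay or requeue the job
        self.level += 1
        return True
```

A runner would consult both the tenant bucket and the registry-host bucket before dispatching, and requeue the segment when either refuses.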
|
||||
### 6.2 Nightly planning
|
||||
|
||||
```
At cron tick:
  sel = resolve selection
  candidates = ImpactIndex.ResolveAll(sel)
  if lastReportOlderThanDays present → filter by report age (via Scanner catalog)
  shard & enqueue as above
```
|
||||
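The report-age filter in the nightly plan might be expressed as follows. `filter_stale` is hypothetical, the `last_report_at` mapping stands in for a Scanner catalog lookup, and treating images with no report as stale is an assumption.

```python
from datetime import datetime, timedelta, timezone


def filter_stale(candidates, last_report_at, older_than_days, now):
    """Keep digests whose last report is older than the threshold (sketch)."""
    if older_than_days is None:
        return list(candidates)        # onlyIf.lastReportOlderThanDays not set
    cutoff = now - timedelta(days=older_than_days)
    return [
        digest for digest in candidates
        # Assumption: an image that was never reported counts as stale.
        if last_report_at.get(digest) is None or last_report_at[digest] < cutoff
    ]
```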
|
||||
### 6.3 Execution (Runner)
|
||||
|
||||
* Pop **RunSegment** job → for each image digest:
|
||||
|
||||
* **analysis‑only**: `POST scanner/reports { imageDigest, policyRevision? }`
|
||||
* **content‑refresh**: resolve tag→digest if needed; `POST scanner/scans { imageRef, attest? false }` then `POST /reports`
|
||||
* Collect **delta**: `newFindings`, `newCriticals`/`highs`, `links` (UI deep link, Rekor if present).
|
||||
* Persist per‑image outcome in `runs.{id}.stats` (incremental counters).
|
||||
* Emit `scheduler.rescan.delta` events to **Notify** only when **delta > 0** and matches severity rule.
|
||||
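The emit rule (delta greater than zero and matching the severity rule) could be checked along these lines. The field names follow the `notify` block of the schedule document, but `should_notify`, the delta shape, and the reading of `includeKEV` as "KEV hits bypass the severity floor" are assumptions.

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def should_notify(delta: dict, notify: dict) -> bool:
    """Decide whether a per-image delta warrants an event to Notify (sketch)."""
    findings = delta.get("newFindings", [])
    if not findings or not notify.get("onNewFindings", False):
        return False                   # zero delta, or notifications disabled
    if notify.get("includeKEV") and delta.get("kevHits"):
        return True                    # assumption: KEV hits bypass the severity floor
    floor = SEVERITY_RANK[notify.get("minSeverity", "low")]
    return any(SEVERITY_RANK[f["severity"]] >= floor for f in findings)
```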
|
||||
---
|
||||
|
||||
## 7) Event model (outbound)
|
||||
|
||||
**Topic**: `rescan.delta` (internal bus → Notify; UI subscribes via backend).
|
||||
|
||||
```json
{
  "tenant": "tenant-01",
  "runId": "324af…",
  "imageDigest": "sha256:…",
  "newCriticals": 1,
  "newHigh": 2,
  "kevHits": ["CVE-2025-..."],
  "topFindings": [
    { "purl":"pkg:rpm/openssl@3.0.12-...","vulnId":"CVE-2025-...","severity":"critical","link":"https://ui/scans/..." }
  ],
  "reportUrl": "https://ui/.../scans/sha256:.../report",
  "attestation": { "uuid":"rekor-uuid", "verified": true },
  "ts": "2025-10-18T03:12:45Z"
}
```
|
||||
|
||||
**Also**: `report.ready` for “no‑change” summaries (digest + zero delta), which Notify can ignore by rule.
|
||||
|
||||
---
|
||||
|
||||
## 8) Security posture
|
||||
|
||||
* **AuthN/Z**: Authority OpToks with `aud=scheduler`; DPoP (preferred) or mTLS.
|
||||
* **Multi‑tenant**: every schedule, run, and event carries `tenantId`; ImpactIndex filters by tenant‑visible images.
|
||||
* **Webhook** callers (Feedser/Vexer) present **mTLS** or **HMAC** and Authority token.
|
||||
* **Input hardening**: size caps on changed key lists; reject >100k keys per event; compress (zstd/gzip) allowed with limits.
|
||||
* **No secrets** in logs; redact tokens and signatures.
|
||||
|
||||
---
|
||||
|
||||
## 9) Observability & SLOs
|
||||
|
||||
**Metrics (Prometheus)**
|
||||
|
||||
* `scheduler.events_total{source, result}`
|
||||
* `scheduler.impact_resolve_seconds{quantile}`
|
||||
* `scheduler.images_selected_total{mode}`
|
||||
* `scheduler.jobs_enqueued_total{mode}`
|
||||
* `scheduler.run_latency_seconds{quantile}` // event → first verdict
|
||||
* `scheduler.delta_images_total{severity}`
|
||||
* `scheduler.rate_limited_total{reason}`
|
||||
|
||||
**Targets**
|
||||
|
||||
* Resolve 10k changed keys → impacted set in **<300 ms** (hot cache).
|
||||
* Event → first rescan verdict in **≤60 s** (p95).
|
||||
* Nightly coverage 50k images in **≤10 min** with 10 workers (analysis‑only).
|
||||
|
||||
**Tracing** (OTEL): spans `plan`, `resolve`, `enqueue`, `report_call`, `persist`, `emit`.
|
||||
|
||||
---
|
||||
|
||||
## 10) Configuration (YAML)
|
||||
|
||||
```yaml
scheduler:
  authority:
    issuer: "https://authority.internal"
    require: "dpop"            # or "mtls"
  queue:
    kind: "redis"              # or "nats"
    url: "redis://redis:6379/4"
  mongo:
    uri: "mongodb://mongo/scheduler"
  impactIndex:
    storage: "rocksdb"         # "rocksdb" | "redis" | "memory"
    warmOnStart: true
    usageOnlyDefault: true
  limits:
    defaultRatePerSecond: 50
    defaultParallelism: 8
    maxJobsPerRun: 50000
  integrates:
    scannerUrl: "https://scanner-web.internal"
    feedserWebhook: true
    vexerWebhook: true
  notifications:
    emitBus: "internal"        # deliver to Notify via internal bus
```
|
||||
|
||||
---
|
||||
|
||||
## 11) UI touch‑points
|
||||
|
||||
* **Schedules** page: CRUD, enable/pause, next run, last run stats, mode (analysis/content), selector preview.
|
||||
* **Runs** page: timeline; heat‑map of deltas; drill‑down to affected images.
|
||||
* **Dry‑run preview** modal: “This Feedser export touches ~3,214 images; projected deltas: ~420 (34 KEV).”
|
||||
|
||||
---
|
||||
|
||||
## 12) Failure modes & degradations
|
||||
|
||||
| Condition                            | Behavior                                                                                 |
| ------------------------------------ | ---------------------------------------------------------------------------------------- |
| ImpactIndex cold / incomplete        | Fall back to **All** selection for nightly; for events, cap to KEV+critical until warmed |
| Feedser/Vexer webhook storm          | Coalesce by exportId; debounce 30–60 s; keep last                                        |
| Scanner under load (429)             | Backoff with jitter; respect per‑tenant/leaky bucket                                     |
| Oversubscription (too many impacted) | Prioritize KEV/critical first; spillover to next window; UI banner shows backlog         |
| Notify down                          | Buffer outbound events in queue (TTL 24h)                                                |
| Mongo slow                           | Cut batch sizes; sample‑log; alert ops; don’t drop runs unless critical                  |
|
||||
|
||||
---
|
||||
|
||||
## 13) Testing matrix
|
||||
|
||||
* **ImpactIndex**: correctness (purl→image sets), performance, persistence after restart, memory pressure with 1M purls.
|
||||
* **Planner**: dedupe, shard, fairness, limit enforcement, KEV prioritization.
|
||||
* **Runner**: parallel report calls, error backoff, partial failures, idempotency.
|
||||
* **End‑to‑end**: Feedser export → deltas visible in UI in ≤60 s.
|
||||
* **Security**: webhook auth (mTLS/HMAC), DPoP nonce dance, tenant isolation.
|
||||
* **Chaos**: drop scanner availability; simulate registry throttles (content‑refresh mode).
|
||||
* **Nightly**: cron tick correctness across timezones and DST.
|
||||
|
||||
---
|
||||
|
||||
## 14) Implementation notes
|
||||
|
||||
* **Language**: .NET 10 minimal API; Channels‑based pipeline; `System.Threading.RateLimiting`.
|
||||
* **Bitmaps**: Roaring via `RoaringBitmap` bindings; memory‑map large shards if RocksDB used.
|
||||
* **Cron**: Quartz‑style parser with timezone support; clock skew tolerated ±60 s.
|
||||
* **Dry‑run**: use ImpactIndex only; never call scanner.
|
||||
* **Idempotency**: run segments carry deterministic keys; retries safe.
|
||||
* **Backpressure**: per‑tenant buckets; per‑host registry budgets respected when content‑refresh enabled.
|
||||
|
||||
---
|
||||
|
||||
## 15) Sequences (representative)
|
||||
|
||||
**A) Event‑driven rescan (Feedser delta)**
|
||||
|
||||
```mermaid
sequenceDiagram
  autonumber
  participant FE as Feedser
  participant SCH as Scheduler.Worker
  participant IDX as ImpactIndex
  participant SC as Scanner.WebService
  participant NO as Notify

  FE->>SCH: POST /events/feedser-export {exportId, changedProductKeys}
  SCH->>IDX: ResolveByPurls(keys, usageOnly=true, sel)
  IDX-->>SCH: bitmap(imageIds) → digests list
  SCH->>SC: POST /reports {imageDigest} (batch/sequenced)
  SC-->>SCH: report deltas (new criticals/highs)
  alt delta>0
    SCH->>NO: rescan.delta {digest, newCriticals, links}
  end
```
|
||||
|
||||
**B) Nightly rescan**
|
||||
|
||||
```mermaid
sequenceDiagram
  autonumber
  participant CRON as Cron
  participant SCH as Scheduler.Worker
  participant IDX as ImpactIndex
  participant SC as Scanner.WebService

  CRON->>SCH: tick (02:00 Europe/Sofia)
  SCH->>IDX: ResolveAll(selector)
  IDX-->>SCH: candidates
  SCH->>SC: POST /reports {digest} (paced)
  SC-->>SCH: results
  SCH-->>SCH: aggregate, store run stats
```
|
||||
|
||||
**C) Content‑refresh (tag followers)**
|
||||
|
||||
```mermaid
sequenceDiagram
  autonumber
  participant SCH as Scheduler
  participant SC as Scanner
  SCH->>SC: resolve tag→digest (if changed)
  alt digest changed
    SCH->>SC: POST /scans {imageRef} # new SBOM
    SC-->>SCH: scan complete (artifacts)
    SCH->>SC: POST /reports {imageDigest}
  else unchanged
    SCH->>SC: POST /reports {imageDigest} # analysis-only
  end
```
|
||||
|
||||
---
|
||||
|
||||
## 16) Roadmap
|
||||
|
||||
* **Vuln‑centric impact**: pre‑join vuln→purl→images to rank by **KEV** and **exploited‑in‑the‑wild** signals.
|
||||
* **Policy diff preview**: when a staged policy changes, show projected breakage set before promotion.
|
||||
* **Cross‑cluster federation**: one Scheduler instance driving many Scanner clusters (tenant isolation).
|
||||
* **Windows containers**: integrate Zastava runtime hints for Usage view tightening.
|
||||
|
||||
---
|
||||
|
||||
**End — component_architecture_scheduler.md**
|
||||
|
||||
> **Scope.** Implementation‑ready architecture for **Scheduler**: a service that (1) **re‑evaluates** already‑cataloged images when intel changes (Conselier/Excitor/policy), (2) orchestrates **nightly** and **ad‑hoc** runs, (3) targets only the **impacted** images using the BOM‑Index, and (4) emits **report‑ready** events that downstream **Notify** fans out. Default mode is **analysis‑only** (no image pull); optional **content‑refresh** can be enabled per schedule.
|
||||
|
||||
---
|
||||
|
||||
## 0) Mission & boundaries
|
||||
|
||||
**Mission.** Keep scan results **current** without rescanning the world. When new advisories or VEX claims land, **pinpoint** affected images and ask the backend to recompute **verdicts** against the **existing SBOMs**. Surface only **meaningful deltas** to humans and ticket queues.
|
||||
|
||||
**Boundaries.**
|
||||
|
||||
* Scheduler **does not** compute SBOMs and **does not** sign. It calls Scanner/WebService’s **/reports (analysis‑only)** endpoint and lets the backend (Policy + Excitor + Conselier) decide PASS/FAIL.
|
||||
* Scheduler **may** ask Scanner to **content‑refresh** selected targets (e.g., mutable tags) but the default is **no** image pull.
|
||||
* Notifications are **not** sent directly; Scheduler emits events consumed by **Notify**.
|
||||
|
||||
---
|
||||
|
||||
## 1) Runtime shape & projects
|
||||
|
||||
```
src/
 ├─ StellaOps.Scheduler.WebService/      # REST (schedules CRUD, runs, admin)
 ├─ StellaOps.Scheduler.Worker/          # planners + runners (N replicas)
 ├─ StellaOps.Scheduler.ImpactIndex/     # purl→images inverted index (roaring bitmaps)
 ├─ StellaOps.Scheduler.Models/          # DTOs (Schedule, Run, ImpactSet, Deltas)
 ├─ StellaOps.Scheduler.Storage.Mongo/   # schedules, runs, cursors, locks
 ├─ StellaOps.Scheduler.Queue/           # Redis Streams / NATS abstraction
 └─ StellaOps.Scheduler.Tests.*          # unit/integration/e2e
```
|
||||
|
||||
**Deployables**:
|
||||
|
||||
* **Scheduler.WebService** (stateless)
|
||||
* **Scheduler.Worker** (scale‑out; planners + executors)
|
||||
|
||||
**Dependencies**: Authority (OpTok + DPoP/mTLS), Scanner.WebService, Conselier, Excitor, MongoDB, Redis/NATS, (optional) Notify.
|
||||
|
||||
---
|
||||
|
||||
## 2) Core responsibilities
|
||||
|
||||
1. **Time‑based** runs: cron windows per tenant/timezone (e.g., “02:00 Europe/Sofia”).
|
||||
2. **Event‑driven** runs: react to **Conselier export** and **Excitor export** deltas (changed product keys / advisories / claims).
|
||||
3. **Impact targeting**: map changes to **image sets** using a **global inverted index** built from Scanner’s per‑image **BOM‑Index** sidecars.
|
||||
4. **Run planning**: shard, pace, and rate‑limit jobs to avoid thundering herds.
|
||||
5. **Execution**: call Scanner **/reports (analysis‑only)** or **/scans (content‑refresh)**; aggregate **delta** results.
|
||||
6. **Events**: publish `rescan.delta` and `report.ready` summaries for **Notify** & **UI**.
|
||||
7. **Control plane**: CRUD schedules, **pause/resume**, dry‑run previews, audit.
|
||||
|
||||
---
|
||||
|
||||
## 3) Data model (Mongo)
|
||||
|
||||
**Database**: `scheduler`
|
||||
|
||||
* `schedules`
|
||||
|
||||
```
{ _id, tenantId, name, enabled, whenCron, timezone,
  mode: "analysis-only" | "content-refresh",
  selection: { scope: "all-images" | "by-namespace" | "by-repo" | "by-digest" | "by-labels",
               includeTags?: ["prod-*"], digests?: [sha256...], resolvesTags?: bool },
  onlyIf: { lastReportOlderThanDays?: int, policyRevision?: string },
  notify: { onNewFindings: bool, minSeverity: "low|medium|high|critical", includeKEV: bool },
  limits: { maxJobs?: int, ratePerSecond?: int, parallelism?: int },
  createdAt, updatedAt, createdBy, updatedBy }
```
|
||||
|
||||
* `runs`
|
||||
|
||||
```
{ _id, scheduleId?, tenantId, trigger: "cron|conselier|excitor|manual",
  reason?: { conselierExportId?, excitorExportId?, cursor? },
  state: "planning|queued|running|completed|error|cancelled",
  stats: { candidates: int, deduped: int, queued: int, completed: int, deltas: int, newCriticals: int },
  startedAt, finishedAt, error? }
```
|
||||
|
||||
* `impact_cursors`
|
||||
|
||||
```
{ _id: tenantId, conselierLastExportId, excitorLastExportId, updatedAt }
```
|
||||
|
||||
* `locks` (singleton schedulers, run leases)
|
||||
|
||||
* `audit` (CRUD actions, run outcomes)
|
||||
|
||||
**Indexes**:
|
||||
|
||||
* `schedules` on `{tenantId, enabled}`, `{whenCron}`.
|
||||
* `runs` on `{tenantId, startedAt desc}`, `{state}`.
|
||||
* TTL optional for completed runs (e.g., 180 days).
|
||||
|
||||
---
|
||||
|
||||
## 4) ImpactIndex (global inverted index)
|
||||
|
||||
Goal: translate **change keys** → **image sets** in **milliseconds**.
|
||||
|
||||
**Source**: Scanner produces per‑image **BOM‑Index** sidecars (purls, and `usedByEntrypoint` bitmaps). Scheduler ingests/refreshes them to build a **global** index.
|
||||
|
||||
**Representation**:
|
||||
|
||||
* Assign **image IDs** (dense ints) to catalog images.
|
||||
* Keep **Roaring Bitmaps**:
|
||||
|
||||
* `Contains[purl] → bitmap(imageIds)`
|
||||
* `UsedBy[purl] → bitmap(imageIds)` (subset of Contains)
|
||||
* Optionally keep **Owner maps**: `{imageId → {tenantId, namespaces[], repos[]}}` for selection filters.
|
||||
* Persist in RocksDB/LMDB or Redis‑modules; cache hot shards in memory; snapshot to Mongo for cold start.
|
||||
|
||||
**Update paths**:
|
||||
|
||||
* On new/updated image SBOM: **merge** per‑image set into global maps.
|
||||
* On image remove/expiry: **clear** id from bitmaps.
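
The representation and update paths above can be sketched in a few lines. This is a minimal illustration in Python (the service itself is .NET), and it rests on assumptions: plain `set` objects stand in for roaring bitmaps, and the class name `ImpactIndexSketch` and its methods are hypothetical. Clearing stale entries before a merge is one possible reading of "merge" for an updated SBOM.

```python
from collections import defaultdict


class ImpactIndexSketch:
    """Toy inverted index: purl -> set of dense image IDs.

    Plain Python sets stand in for roaring bitmaps; all names are illustrative.
    """

    def __init__(self):
        self.contains = defaultdict(set)  # Contains[purl] -> imageIds
        self.used_by = defaultdict(set)   # UsedBy[purl]   -> imageIds (subset of Contains)

    def merge_image(self, image_id, purls, used_purls):
        """Merge one image's BOM-Index entries into the global maps."""
        self.clear_image(image_id)  # assumption: drop entries from an older SBOM first
        for purl in purls:
            self.contains[purl].add(image_id)
        for purl in used_purls:
            if purl in purls:  # keep UsedBy a subset of Contains
                self.used_by[purl].add(image_id)

    def clear_image(self, image_id):
        """Remove an expired or deleted image from every set."""
        for table in (self.contains, self.used_by):
            for ids in table.values():
                ids.discard(image_id)

    def resolve_by_purls(self, purls, usage_only):
        """Union the per-purl sets, reading UsedBy when usage_only is set."""
        table = self.used_by if usage_only else self.contains
        impacted = set()
        for purl in purls:
            impacted |= table.get(purl, set())
        return impacted
```

A real implementation would swap the sets for compressed bitmaps so that unions over thousands of purls stay cheap; the lookup logic stays the same.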
|
||||
|
||||
**API (internal)**:
|
||||
|
||||
```csharp
using System.Collections.Generic;

public interface IImpactIndex
{
    ImpactSet ResolveByPurls(IEnumerable<string> purls, bool usageOnly, Selector sel);   // usageOnly restricts to UsedBy bitmaps
    ImpactSet ResolveByVulns(IEnumerable<string> vulnIds, bool usageOnly, Selector sel); // optional (vuln->purl precomputed by Conselier)
    ImpactSet ResolveAll(Selector sel);                                                   // for nightly
}
```
|
||||
|
||||
**Selector filters**: tenant, namespaces, repos, labels, digest allowlists, `includeTags` patterns.
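
A selector check might look like the sketch below. It is illustrative Python, not the service's code: `SelectorSketch`, `matches`, and the image record fields (`tenantId`, `namespace`, `tags`) are hypothetical, and treating `includeTags` entries such as `prod-*` as shell-style globs is an assumption.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class SelectorSketch:
    tenant_id: str
    namespaces: set = field(default_factory=set)      # empty set = no restriction
    include_tags: list = field(default_factory=list)  # glob patterns such as "prod-*"


def matches(selector, image):
    """Return True when an image record passes every configured filter."""
    if image["tenantId"] != selector.tenant_id:
        return False  # tenant isolation comes first
    if selector.namespaces and image["namespace"] not in selector.namespaces:
        return False
    if selector.include_tags:
        return any(
            fnmatchcase(tag, pattern)
            for tag in image["tags"]
            for pattern in selector.include_tags
        )
    return True
```

Repos, labels, and digest allowlists would follow the same pattern: an empty filter means "no restriction", a non-empty one must match.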
|
||||
|
||||
---
|
||||
|
||||
## 5) External interfaces (REST)
|
||||
|
||||
Base path: `/api/v1/scheduler` (Authority OpToks; scopes: `scheduler.read`, `scheduler.admin`).
|
||||
|
||||
### 5.1 Schedules CRUD
|
||||
|
||||
* `POST /schedules` → create
|
||||
* `GET /schedules` → list (filter by tenant)
|
||||
* `GET /schedules/{id}` → details + next run
|
||||
* `PATCH /schedules/{id}` → pause/resume/update
|
||||
* `DELETE /schedules/{id}` → delete (soft delete, optional)
|
||||
|
||||
### 5.2 Run control & introspection
|
||||
|
||||
* `POST /run` — ad‑hoc run
|
||||
|
||||
```json
{ "mode": "analysis-only|content-refresh", "selection": {...}, "reason": "manual" }
```
|
||||
* `GET /runs` — list with paging
|
||||
* `GET /runs/{id}` — status, stats, links to deltas
|
||||
* `POST /runs/{id}/cancel` — best‑effort cancel
|
||||
|
||||
### 5.3 Previews (dry‑run)
|
||||
|
||||
* `POST /preview/impact` — returns **candidate count** and a small sample of impacted digests for given change keys or selection.
|
||||
|
||||
### 5.4 Event webhooks (optional push from Conselier/Excitor)
|
||||
|
||||
* `POST /events/conselier-export`
|
||||
|
||||
```json
{ "exportId":"...", "changedProductKeys":["pkg:rpm/openssl", ...], "kev": ["CVE-..."], "window": { "from":"...","to":"..." } }
```
|
||||
* `POST /events/excitor-export`
|
||||
|
||||
```json
{ "exportId":"...", "changedClaims":[ { "productKey":"pkg:deb/...", "vulnId":"CVE-...", "status":"not_affected→affected"} ], ... }
```
|
||||
|
||||
**Security**: webhooks require **mTLS** or a signed `X-Scheduler-Signature` header (an Ed25519 signature or an **HMAC**‑SHA‑256 digest), in addition to an Authority token.
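
For the HMAC variant, verification could be sketched as follows. The header encoding is not specified in this document, so hex‑encoded HMAC‑SHA‑256 over the raw request body is an assumption, and `sign_body` / `verify_signature` are hypothetical helper names (Python used for illustration only).

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA-256 of the raw request body (assumed encoding)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Compare the presented signature to the expected one in constant time."""
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected.encode(), header_value.encode())
```

The constant‑time comparison matters: a plain `==` can leak how many leading characters of the signature matched.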
|
||||
|
||||
---
|
||||
|
||||
## 6) Planner → Runner pipeline
|
||||
|
||||
### 6.1 Planning algorithm (event‑driven)
|
||||
|
||||
```
On Export Event (Conselier/Excitor):
  keys = Normalize(change payload)                 # productKeys or vulnIds→productKeys
  usageOnly = schedule/policy hint?                # default true
  sel = Selector for tenant/scope from schedules subscribed to events

  impacted = ImpactIndex.ResolveByPurls(keys, usageOnly, sel)
  impacted = ApplyOwnerFilters(impacted, sel)      # namespaces/repos/labels
  impacted = DeduplicateByDigest(impacted)
  impacted = EnforceLimits(impacted, limits.maxJobs)
  shards = Shard(impacted, byHashPrefix, n=limits.parallelism)

  For each shard:
    Enqueue RunSegment (runId, shard, rate=limits.ratePerSecond)
```
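
The dedupe, limit, and shard steps of the pseudocode above could look like this in practice. The sketch is illustrative Python with assumptions stated up front: `plan_shards` is a hypothetical name, limit enforcement is simple truncation (no spillover or KEV prioritization), and "hash prefix" is taken to mean the first 8 hex characters of a SHA‑256 over the digest string.

```python
import hashlib


def plan_shards(digests, max_jobs, parallelism):
    """Deduplicate digests, cap the job count, and shard by hash prefix."""
    unique = list(dict.fromkeys(digests))  # DeduplicateByDigest, order-preserving
    capped = unique[:max_jobs]             # EnforceLimits (plain truncation here)
    shards = [[] for _ in range(parallelism)]
    for digest in capped:
        # A hash prefix gives a stable, roughly uniform shard assignment.
        prefix = hashlib.sha256(digest.encode()).hexdigest()[:8]
        shards[int(prefix, 16) % parallelism].append(digest)
    return shards
```

Because the assignment depends only on the digest, replanning the same input yields the same shards, which helps keep retries idempotent.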
|
||||
|
||||
**Fairness & pacing**
|
||||
|
||||
* Use **leaky bucket** per tenant and per registry host.
|
||||
* Prioritize **KEV‑tagged** and **critical** first if oversubscribed.
|
||||
|
||||
### 6.2 Nightly planning
|
||||
|
||||
```
At cron tick:
  sel = resolve selection
  candidates = ImpactIndex.ResolveAll(sel)
  if lastReportOlderThanDays present → filter by report age (via Scanner catalog)
  shard & enqueue as above
```
|
||||
|
||||
### 6.3 Execution (Runner)
|
||||
|
||||
* Pop **RunSegment** job → for each image digest:
|
||||
|
||||
* **analysis‑only**: `POST scanner/reports { imageDigest, policyRevision? }`
|
||||
* **content‑refresh**: resolve tag→digest if needed; `POST scanner/scans { imageRef, attest? false }` then `POST /reports`
|
||||
* Collect **delta**: `newFindings`, `newCriticals`/`highs`, `links` (UI deep link, Rekor if present).
|
||||
* Persist per‑image outcome in `runs.{id}.stats` (incremental counters).
|
||||
* Emit `scheduler.rescan.delta` events to **Notify** only when **delta > 0** and matches severity rule.
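
The "delta > 0 and matches severity rule" check could be expressed as in the sketch below. It is illustrative Python under assumptions: the `newBySeverity` field and the `should_emit` helper are hypothetical, and treating `includeKEV` as "always emit when there are KEV hits" is one possible reading of the schedule's `notify` block.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def should_emit(delta, notify):
    """Decide whether a per-image delta warrants an outbound event."""
    if not notify["onNewFindings"]:
        return False
    if notify.get("includeKEV") and delta.get("kevHits"):
        return True  # assumption: KEV hits bypass the severity threshold
    threshold = SEVERITY_ORDER.index(notify["minSeverity"])
    new_by_severity = delta.get("newBySeverity", {})
    return any(
        count > 0 and SEVERITY_ORDER.index(severity) >= threshold
        for severity, count in new_by_severity.items()
    )
```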
|
||||
|
||||
---
|
||||
|
||||
## 7) Event model (outbound)
|
||||
|
||||
**Topic**: `rescan.delta` (internal bus → Notify; UI subscribes via backend).
|
||||
|
||||
```json
{
  "tenant": "tenant-01",
  "runId": "324af…",
  "imageDigest": "sha256:…",
  "newCriticals": 1,
  "newHigh": 2,
  "kevHits": ["CVE-2025-..."],
  "topFindings": [
    { "purl":"pkg:rpm/openssl@3.0.12-...","vulnId":"CVE-2025-...","severity":"critical","link":"https://ui/scans/..." }
  ],
  "reportUrl": "https://ui/.../scans/sha256:.../report",
  "attestation": { "uuid":"rekor-uuid", "verified": true },
  "ts": "2025-10-18T03:12:45Z"
}
```
|
||||
|
||||
**Also**: `report.ready` for “no‑change” summaries (digest + zero delta), which Notify can ignore by rule.
|
||||
|
||||
---
|
||||
|
||||
## 8) Security posture
|
||||
|
||||
* **AuthN/Z**: Authority OpToks with `aud=scheduler`; DPoP (preferred) or mTLS.
|
||||
* **Multi‑tenant**: every schedule, run, and event carries `tenantId`; ImpactIndex filters by tenant‑visible images.
|
||||
* **Webhook** callers (Conselier/Excitor) present **mTLS** or **HMAC** and Authority token.
|
||||
* **Input hardening**: size caps on changed key lists; reject >100k keys per event; compress (zstd/gzip) allowed with limits.
|
||||
* **No secrets** in logs; redact tokens and signatures.
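
The input‑hardening and log‑redaction rules above might be enforced along these lines. The sketch is illustrative Python; the helper names and the regular expression are assumptions, and the pattern only covers `Bearer` / `DPoP` authorization values, not every secret a real service would need to mask.

```python
import re

MAX_KEYS_PER_EVENT = 100_000  # from the input-hardening rule above

# Matches "Bearer <token>" or "DPoP <token>" regardless of case.
_TOKEN = re.compile(r"(?i)\b(bearer|dpop)\s+[A-Za-z0-9._~+/=-]+")


def validate_changed_keys(keys):
    """Reject events whose changed-key list exceeds the documented cap."""
    if len(keys) > MAX_KEYS_PER_EVENT:
        raise ValueError(f"too many changed keys: {len(keys)} > {MAX_KEYS_PER_EVENT}")
    return keys


def redact(line):
    """Mask token values before a line reaches the log."""
    return _TOKEN.sub(lambda m: f"{m.group(1)} [REDACTED]", line)
```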
|
||||
|
||||
---
|
||||
|
||||
## 9) Observability & SLOs
|
||||
|
||||
**Metrics (Prometheus)**
|
||||
|
||||
* `scheduler.events_total{source, result}`
|
||||
* `scheduler.impact_resolve_seconds{quantile}`
|
||||
* `scheduler.images_selected_total{mode}`
|
||||
* `scheduler.jobs_enqueued_total{mode}`
|
||||
* `scheduler.run_latency_seconds{quantile}` // event → first verdict
|
||||
* `scheduler.delta_images_total{severity}`
|
||||
* `scheduler.rate_limited_total{reason}`
|
||||
|
||||
**Targets**
|
||||
|
||||
* Resolve 10k changed keys → impacted set in **<300 ms** (hot cache).
|
||||
* Event → first rescan verdict in **≤60 s** (p95).
|
||||
* Nightly coverage 50k images in **≤10 min** with 10 workers (analysis‑only).
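
As a sanity check on the nightly target, the implied throughput can be worked out directly. The helper below is a hypothetical illustration in Python; it only restates the arithmetic (50,000 images in 600 s across 10 workers).

```python
def required_rate(images, window_seconds, workers):
    """Return (total reports per second, reports per second per worker)."""
    total = images / window_seconds
    return total, total / workers


total, per_worker = required_rate(50_000, 10 * 60, 10)
print(f"{total:.1f} reports/s overall, {per_worker:.1f} per worker")
```

That is roughly 83 reports per second overall, or about 8 per worker. The sample `defaultRatePerSecond: 50` in the configuration section would fall below this figure if it applied to a whole run rather than to each segment; the document does not say which.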
|
||||
|
||||
**Tracing** (OTEL): spans `plan`, `resolve`, `enqueue`, `report_call`, `persist`, `emit`.
|
||||
|
||||
---
|
||||
|
||||
## 10) Configuration (YAML)
|
||||
|
||||
```yaml
scheduler:
  authority:
    issuer: "https://authority.internal"
    require: "dpop"            # or "mtls"
  queue:
    kind: "redis"              # or "nats"
    url: "redis://redis:6379/4"
  mongo:
    uri: "mongodb://mongo/scheduler"
  impactIndex:
    storage: "rocksdb"         # "rocksdb" | "redis" | "memory"
    warmOnStart: true
    usageOnlyDefault: true
  limits:
    defaultRatePerSecond: 50
    defaultParallelism: 8
    maxJobsPerRun: 50000
  integrates:
    scannerUrl: "https://scanner-web.internal"
    conselierWebhook: true
    excitorWebhook: true
  notifications:
    emitBus: "internal"        # deliver to Notify via internal bus
```
|
||||
|
||||
---
|
||||
|
||||
## 11) UI touch‑points
|
||||
|
||||
* **Schedules** page: CRUD, enable/pause, next run, last run stats, mode (analysis/content), selector preview.
|
||||
* **Runs** page: timeline; heat‑map of deltas; drill‑down to affected images.
|
||||
* **Dry‑run preview** modal: “This Conselier export touches ~3,214 images; projected deltas: ~420 (34 KEV).”
|
||||
|
||||
---
|
||||
|
||||
## 12) Failure modes & degradations
|
||||
|
||||
| Condition                            | Behavior                                                                                 |
| ------------------------------------ | ---------------------------------------------------------------------------------------- |
| ImpactIndex cold / incomplete        | Fall back to **All** selection for nightly; for events, cap to KEV+critical until warmed |
| Conselier/Excitor webhook storm      | Coalesce by exportId; debounce 30–60 s; keep last                                        |
| Scanner under load (429)             | Backoff with jitter; respect per‑tenant/leaky bucket                                     |
| Oversubscription (too many impacted) | Prioritize KEV/critical first; spillover to next window; UI banner shows backlog         |
| Notify down                          | Buffer outbound events in queue (TTL 24h)                                                |
| Mongo slow                           | Cut batch sizes; sample‑log; alert ops; don’t drop runs unless critical                  |
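
"Backoff with jitter" for the Scanner 429 case can be sketched as full‑jitter exponential backoff. This is illustrative Python; the function name and the `base` / `cap` defaults are assumptions, not values taken from this document.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Randomizing the whole interval, rather than adding a small jitter to a fixed delay, spreads retries from many runners so they do not hit the Scanner in lockstep after a 429.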
|
||||
|
||||
---
|
||||
|
||||
## 13) Testing matrix
|
||||
|
||||
* **ImpactIndex**: correctness (purl→image sets), performance, persistence after restart, memory pressure with 1M purls.
|
||||
* **Planner**: dedupe, shard, fairness, limit enforcement, KEV prioritization.
|
||||
* **Runner**: parallel report calls, error backoff, partial failures, idempotency.
|
||||
* **End‑to‑end**: Conselier export → deltas visible in UI in ≤60 s.
|
||||
* **Security**: webhook auth (mTLS/HMAC), DPoP nonce dance, tenant isolation.
|
||||
* **Chaos**: drop scanner availability; simulate registry throttles (content‑refresh mode).
|
||||
* **Nightly**: cron tick correctness across timezones and DST.
|
||||
|
||||
---
|
||||
|
||||
## 14) Implementation notes
|
||||
|
||||
* **Language**: .NET 10 minimal API; Channels‑based pipeline; `System.Threading.RateLimiting`.
|
||||
* **Bitmaps**: Roaring via `RoaringBitmap` bindings; memory‑map large shards if RocksDB used.
|
||||
* **Cron**: Quartz‑style parser with timezone support; clock skew tolerated ±60 s.
|
||||
* **Dry‑run**: use ImpactIndex only; never call scanner.
|
||||
* **Idempotency**: run segments carry deterministic keys; retries safe.
|
||||
* **Backpressure**: per‑tenant buckets; per‑host registry budgets respected when content‑refresh enabled.
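
One way to give run segments deterministic keys is to hash the run ID together with the sorted digest set, as in the sketch below. It is illustrative Python; `segment_key` and the exact key layout are hypothetical, not the service's format.

```python
import hashlib


def segment_key(run_id, digests):
    """Derive a stable key from the run ID and the sorted, de-duplicated digests."""
    h = hashlib.sha256()
    h.update(run_id.encode())
    for digest in sorted(set(digests)):
        h.update(b"\n")  # separator so adjacent values cannot run together
        h.update(digest.encode())
    return h.hexdigest()
```

A retried segment with the same contents produces the same key, so the queue or the store can recognize and discard the duplicate.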
|
||||
|
||||
---
|
||||
|
||||
## 15) Sequences (representative)
|
||||
|
||||
**A) Event‑driven rescan (Conselier delta)**
|
||||
|
||||
```mermaid
sequenceDiagram
  autonumber
  participant FE as Conselier
  participant SCH as Scheduler.Worker
  participant IDX as ImpactIndex
  participant SC as Scanner.WebService
  participant NO as Notify

  FE->>SCH: POST /events/conselier-export {exportId, changedProductKeys}
  SCH->>IDX: ResolveByPurls(keys, usageOnly=true, sel)
  IDX-->>SCH: bitmap(imageIds) → digests list
  SCH->>SC: POST /reports {imageDigest} (batch/sequenced)
  SC-->>SCH: report deltas (new criticals/highs)
  alt delta>0
    SCH->>NO: rescan.delta {digest, newCriticals, links}
  end
```
|
||||
|
||||
**B) Nightly rescan**
|
||||
|
||||
```mermaid
sequenceDiagram
  autonumber
  participant CRON as Cron
  participant SCH as Scheduler.Worker
  participant IDX as ImpactIndex
  participant SC as Scanner.WebService

  CRON->>SCH: tick (02:00 Europe/Sofia)
  SCH->>IDX: ResolveAll(selector)
  IDX-->>SCH: candidates
  SCH->>SC: POST /reports {digest} (paced)
  SC-->>SCH: results
  SCH-->>SCH: aggregate, store run stats
```
|
||||
|
||||
**C) Content‑refresh (tag followers)**
|
||||
|
||||
```mermaid
sequenceDiagram
  autonumber
  participant SCH as Scheduler
  participant SC as Scanner
  SCH->>SC: resolve tag→digest (if changed)
  alt digest changed
    SCH->>SC: POST /scans {imageRef} # new SBOM
    SC-->>SCH: scan complete (artifacts)
    SCH->>SC: POST /reports {imageDigest}
  else unchanged
    SCH->>SC: POST /reports {imageDigest} # analysis-only
  end
```
|
||||
|
||||
---
|
||||
|
||||
## 16) Roadmap
|
||||
|
||||
* **Vuln‑centric impact**: pre‑join vuln→purl→images to rank by **KEV** and **exploited‑in‑the‑wild** signals.
|
||||
* **Policy diff preview**: when a staged policy changes, show projected breakage set before promotion.
|
||||
* **Cross‑cluster federation**: one Scheduler instance driving many Scanner clusters (tenant isolation).
|
||||
* **Windows containers**: integrate Zastava runtime hints for Usage view tightening.
|
||||
|
||||
---
|
||||
|
||||
**End — component_architecture_scheduler.md**
|
||||
|
||||
@@ -1,82 +1,82 @@
|
||||
# Scheduler Worker – Observability & Runbook
|
||||
|
||||
## Purpose
|
||||
Monitor planner and runner health for the Scheduler Worker (Sprint 16 telemetry). The new .NET meters surface queue throughput, latency, backlog, and delta severities so operators can detect stalled runs before rescan SLAs slip.
|
||||
|
||||
> **Grafana note:** Import `docs/modules/scheduler/operations/worker-grafana-dashboard.json` into the Prometheus-backed Grafana stack that scrapes the OpenTelemetry Collector.
|
||||
|
||||
---
|
||||
|
||||
## Key metrics
|
||||
|
||||
| Metric | Use case | Suggested query |
| --- | --- | --- |
| `scheduler_planner_runs_total{status}` | Planner throughput & failure ratio | `sum by (status) (rate(scheduler_planner_runs_total[5m]))` |
| `scheduler_planner_latency_seconds_bucket` | Planning latency (p95 / p99) | `histogram_quantile(0.95, sum by (le) (rate(scheduler_planner_latency_seconds_bucket[5m])))` |
| `scheduler_runner_segments_total{status}` | Runner success vs retries | `sum by (status) (rate(scheduler_runner_segments_total[5m]))` |
| `scheduler_runner_delta_{critical,high,total}` | Newly-detected findings | `sum(rate(scheduler_runner_delta_critical_total[5m]))` |
| `scheduler_runner_backlog{scheduleId}` | Remaining digests awaiting runner | `max by (scheduleId) (scheduler_runner_backlog)` |
| `scheduler_runs_active{mode}` | Active runs in-flight | `sum(scheduler_runs_active)` |
|
||||
|
||||
Reference queries power the bundled Grafana dashboard panels. Use the `mode` template variable to focus on `analysisOnly` versus `contentRefresh` schedules.
|
||||
|
||||
---
|
||||
|
||||
## Grafana dashboard
|
||||
|
||||
1. Import `docs/modules/scheduler/operations/worker-grafana-dashboard.json` (UID `scheduler-worker-observability`).
|
||||
2. Point the `datasource` variable to the Prometheus instance scraping the collector. Optional: pin the `mode` variable to a specific schedule mode.
|
||||
3. Panels included:
|
||||
- **Planner Runs per Status** – visualises success vs failure ratio.
|
||||
- **Planner Latency P95** – highlights degradations in ImpactIndex or Mongo lookups.
|
||||
- **Runner Segments per Status** – shows retry pressure and queue health.
|
||||
- **New Findings per Severity** – rolls up delta counters (critical/high/total).
|
||||
- **Runner Backlog by Schedule** – tabulates outstanding digests per schedule.
|
||||
- **Active Runs** – stat panel showing the current number of in-flight runs.
|
||||
|
||||
Capture screenshots once Grafana provisioning completes and store them under `docs/assets/dashboards/` (pending automation ticket OBS-157).
|
||||
|
||||
---
|
||||
|
||||
## Prometheus alerts
|
||||
|
||||
Import `docs/modules/scheduler/operations/worker-prometheus-rules.yaml` into your Prometheus rule configuration. The bundle defines:
|
||||
|
||||
- **SchedulerPlannerFailuresHigh** – 5%+ of planner runs failed for 10 minutes. Page SRE.
|
||||
- **SchedulerPlannerLatencyHigh** – planner p95 latency remains above 45 s for 10 minutes. Investigate ImpactIndex, Mongo, and Feedser/Vexer event queues.
|
||||
- **SchedulerRunnerBacklogGrowing** – backlog exceeded 500 images for 15 minutes. Inspect runner workers, Scanner availability, and rate limiting.
|
||||
- **SchedulerRunStuck** – active run count stayed flat for 30 minutes while remaining non-zero. Check stuck segments, expired leases, and scanner retries.
|
||||
|
||||
Hook these alerts into the existing Observability notification pathway (`observability-pager` routing key) and ensure `service=scheduler-worker` is mapped to the on-call rotation.
|
||||
|
||||
---
|
||||
|
||||
## Runbook snapshot
|
||||
|
||||
1. **Planner failure/latency:**
|
||||
- Check Planner logs for ImpactIndex or Mongo exceptions.
|
||||
- Verify Feedser/Vexer webhook health; requeue events if necessary.
|
||||
- If planner is overwhelmed, temporarily reduce schedule parallelism via `stella scheduler schedule update`.
|
||||
2. **Runner backlog spike:**
|
||||
- Confirm Scanner WebService health (`/healthz`).
|
||||
- Inspect runner queue for stuck segments; consider increasing runner workers or scaling scanner capacity.
|
||||
- Review rate limits (schedule limits, ImpactIndex throughput) before changing global throttles.
|
||||
3. **Stuck runs:**
|
||||
- Use `stella scheduler runs list --state running` to identify affected runs.
|
||||
- Drill into Grafana panel “Runner Backlog by Schedule” to see offending schedule IDs.
|
||||
- If a segment will not progress, use `stella scheduler segments release --segment <id>` to force retry after resolving root cause.
|
||||
4. **Unexpected critical deltas:**
|
||||
- Correlate `scheduler_runner_delta_critical_total` spikes with Notify events (`scheduler.rescan.delta`).
|
||||
- Pivot to Scanner report links for impacted digests and confirm they match upstream advisories/policies.
|
||||
|
||||
Document incidents and mitigation in `ops/runbooks/INCIDENT_LOG.md` (per SRE policy) and attach Grafana screenshots for post-mortems.
|
||||
|
||||
---
|
||||
|
||||
## Checklist
|
||||
|
||||
- [ ] Grafana dashboard imported and wired to Prometheus datasource.
|
||||
- [ ] Prometheus alert rules deployed (see above).
|
||||
- [ ] Runbook linked from on-call rotation portal.
|
||||
- [ ] Observability Guild sign-off captured for Sprint 16 telemetry (OWNER: @obs-guild).
|
||||
|
||||
# Scheduler Worker – Observability & Runbook
|
||||
|
||||
## Purpose
|
||||
Monitor planner and runner health for the Scheduler Worker (Sprint 16 telemetry). The new .NET meters surface queue throughput, latency, backlog, and delta severities so operators can detect stalled runs before rescan SLAs slip.
|
||||
|
||||
> **Grafana note:** Import `docs/modules/scheduler/operations/worker-grafana-dashboard.json` into the Prometheus-backed Grafana stack that scrapes the OpenTelemetry Collector.
|
||||
|
||||
---
|
||||
|
||||
## Key metrics
|
||||
|
||||
| Metric | Use case | Suggested query |
| --- | --- | --- |
| `scheduler_planner_runs_total{status}` | Planner throughput & failure ratio | `sum by (status) (rate(scheduler_planner_runs_total[5m]))` |
| `scheduler_planner_latency_seconds_bucket` | Planning latency (p95 / p99) | `histogram_quantile(0.95, sum by (le) (rate(scheduler_planner_latency_seconds_bucket[5m])))` |
| `scheduler_runner_segments_total{status}` | Runner success vs retries | `sum by (status) (rate(scheduler_runner_segments_total[5m]))` |
| `scheduler_runner_delta_{critical,high,total}` | Newly-detected findings | `sum(rate(scheduler_runner_delta_critical_total[5m]))` |
| `scheduler_runner_backlog{scheduleId}` | Remaining digests awaiting runner | `max by (scheduleId) (scheduler_runner_backlog)` |
| `scheduler_runs_active{mode}` | Active runs in-flight | `sum(scheduler_runs_active)` |
|
||||
|
||||
Reference queries power the bundled Grafana dashboard panels. Use the `mode` template variable to focus on `analysisOnly` versus `contentRefresh` schedules.
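
For readers unfamiliar with how a p95 is derived from `_bucket` series, the sketch below mirrors the linear interpolation that PromQL's `histogram_quantile()` applies to classic histograms. It is a simplified illustration in Python, not Prometheus code: the bucket values in the usage note are made up, and native histograms and edge cases beyond those shown are ignored.

```python
def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    bound and ending with (float("inf"), total).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0  # the lowest bucket is assumed to start at 0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                return lower_bound  # quantile lies in +Inf: report the last finite bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

With hypothetical buckets `[(1, 50), (5, 90), (15, 99), (45, 100), (inf, 100)]`, the 0.95 quantile falls in the 5–15 s bucket. Accuracy is bounded by bucket width, so bucket boundaries near the 45 s alert threshold matter for the latency alert.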
|
||||
|
||||
---
|
||||
|
||||
## Grafana dashboard
|
||||
|
||||
1. Import `docs/modules/scheduler/operations/worker-grafana-dashboard.json` (UID `scheduler-worker-observability`).
2. Point the `datasource` variable to the Prometheus instance scraping the collector. Optional: pin the `mode` variable to a specific schedule mode.
3. Panels included:
   - **Planner Runs per Status** – visualises success vs failure ratio.
   - **Planner Latency P95** – highlights degradations in ImpactIndex or Mongo lookups.
   - **Runner Segments per Status** – shows retry pressure and queue health.
   - **New Findings per Severity** – rolls up delta counters (critical/high/total).
   - **Runner Backlog by Schedule** – tabulates outstanding digests per schedule.
   - **Active Runs** – stat panel showing the current number of in-flight runs.
|
||||
|
||||
Capture screenshots once Grafana provisioning completes and store them under `docs/assets/dashboards/` (pending automation ticket OBS-157).
|
||||
|
||||
---
|
||||
|
||||
## Prometheus alerts
|
||||
|
||||
Import `docs/modules/scheduler/operations/worker-prometheus-rules.yaml` into your Prometheus rule configuration. The bundle defines:
|
||||
|
||||
- **SchedulerPlannerFailuresHigh** – 5%+ of planner runs failed for 10 minutes. Page SRE.
- **SchedulerPlannerLatencyHigh** – planner p95 latency remains above 45 s for 10 minutes. Investigate ImpactIndex, Mongo, and Conselier/Excitor event queues.
- **SchedulerRunnerBacklogGrowing** – backlog exceeded 500 images for 15 minutes. Inspect runner workers, Scanner availability, and rate limiting.
- **SchedulerRunStuck** – active run count stayed flat for 30 minutes while remaining non-zero. Check stuck segments, expired leases, and scanner retries.
|
||||
|
||||
Hook these alerts into the existing Observability notification pathway (`observability-pager` routing key) and ensure `service=scheduler-worker` is mapped to the on-call rotation.
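The alert descriptions above all pair a threshold with a hold duration. As a rough illustration of that pattern, here is a Python sketch of the SchedulerPlannerFailuresHigh condition (failure ratio of at least 5% sustained for 10 minutes). The rule file itself is not reproduced here, so the `Sample` type, its field names, and the per-minute sampling are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One evaluation point (hypothetical shape, not taken from the rule file)."""
    minute: int         # minutes since observation started
    failed_rate: float  # failed planner runs per second
    total_rate: float   # all planner runs per second


def failure_ratio(sample):
    # Treat "no traffic" as healthy rather than dividing by zero.
    if sample.total_rate <= 0:
        return 0.0
    return sample.failed_rate / sample.total_rate


def fires(samples, threshold=0.05, hold_minutes=10):
    """Return True once the ratio has stayed at or above the threshold
    for `hold_minutes` without interruption."""
    breach_start = None
    for sample in sorted(samples, key=lambda s: s.minute):
        if failure_ratio(sample) >= threshold:
            if breach_start is None:
                breach_start = sample.minute
            if sample.minute - breach_start >= hold_minutes:
                return True
        else:
            # A single healthy evaluation resets the hold timer.
            breach_start = None
    return False
```

The reset on a healthy sample is why a briefly recovering planner does not page: the condition has to hold continuously for the whole window.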
|
||||
|
||||
---
|
||||
|
||||
## Runbook snapshot
|
||||
|
||||
1. **Planner failure/latency:**
   - Check Planner logs for ImpactIndex or Mongo exceptions.
   - Verify Conselier/Excitor webhook health; requeue events if necessary.
   - If planner is overwhelmed, temporarily reduce schedule parallelism via `stella scheduler schedule update`.
2. **Runner backlog spike:**
   - Confirm Scanner WebService health (`/healthz`).
   - Inspect runner queue for stuck segments; consider increasing runner workers or scaling scanner capacity.
   - Review rate limits (schedule limits, ImpactIndex throughput) before changing global throttles.
3. **Stuck runs:**
   - Use `stella scheduler runs list --state running` to identify affected runs.
   - Drill into Grafana panel “Runner Backlog by Schedule” to see offending schedule IDs.
   - If a segment will not progress, use `stella scheduler segments release --segment <id>` to force retry after resolving root cause.
4. **Unexpected critical deltas:**
   - Correlate `scheduler_runner_delta_critical_total` spikes with Notify events (`scheduler.rescan.delta`).
   - Pivot to Scanner report links for impacted digests and confirm they match upstream advisories/policies.
|
||||
|
||||
Document incidents and mitigation in `ops/runbooks/INCIDENT_LOG.md` (per SRE policy) and attach Grafana screenshots for post-mortems.
|
||||
|
||||
---
|
||||
|
||||
## Checklist
|
||||
|
||||
- [ ] Grafana dashboard imported and wired to Prometheus datasource.
- [ ] Prometheus alert rules deployed (see above).
- [ ] Runbook linked from on-call rotation portal.
- [ ] Observability Guild sign-off captured for Sprint 16 telemetry (OWNER: @obs-guild).
|
||||
|
||||
|
||||
@@ -1,63 +1,63 @@
# Implementation plan — VEX Consensus Lens
|
||||
|
||||
## Delivery phases
|
||||
- **Phase 1 – Core lens service**
|
||||
Build normalisation pipeline (CSAF/OpenVEX/CycloneDX), product mapping library, trust weighting functions, consensus algorithm, and persistence (`vex_consensus`, history, conflicts).
|
||||
- **Phase 2 – API & integrations**
|
||||
Expose `/vex/consensus` query/detail/simulate/export endpoints, integrate Policy Engine thresholds, Vuln Explorer UI chips, and VEX Lens change events.
|
||||
- **Phase 3 – Issuer Directory & signatures**
|
||||
Deliver issuer registry, key management, signature verification, RBAC, audit logs, and tenant overrides.
|
||||
- **Phase 4 – Console & CLI experiences**
|
||||
Ship Console module (lists, evidence table, quorum bar, conflicts, simulation drawer) and CLI commands (`stella vex consensus ...`) with export support.
|
||||
- **Phase 5 – Recompute & performance**
|
||||
Implement recompute scheduling (policy activation, Excitor deltas), caching, load tests (10M records/tenant), observability dashboards, and Offline Kit exports.
|
||||
|
||||
## Work breakdown
|
||||
- **VEX Lens service**
|
||||
- Normalise VEX payloads, maintain scope scores, compute consensus digest.
|
||||
- Trust weighting functions (issuer tier, freshness decay, scope quality).
|
||||
- Idempotent workers for consensus projection and history tracking.
|
||||
- Conflict handling queue for manual review and notifications.
|
||||
- **Integrations**
|
||||
- Excitor: enrich VEX events with issuer hints, signatures, product trees.
|
||||
- Policy Engine: trust knobs, simulation endpoints, policy-driven recompute.
|
||||
- Vuln Explorer & Advisory AI: consensus badges, conflict surfacing.
|
||||
- **Issuer Directory**
|
||||
- CRUD for issuers/keys, audit logs, import CSAF publishers, tenant overrides.
|
||||
- Signature verification endpoints consumed by Lens.
|
||||
- **APIs & UX**
|
||||
- REST endpoints for query/detail/conflict export, trust weight updates.
|
||||
- Console module with filters, saved views, evidence table, simulation drawer.
|
||||
- CLI commands for list/show/simulate/export with JSON/CSV output.
|
||||
- **Observability & Ops**
|
||||
- Metrics (consensus latency, conflict rate, signature failures, cache hit rate), logs, traces.
|
||||
- Dashboards + runbooks for recompute storms, mapping failures, signature errors, quota breaches.
|
||||
- Offline exports for Export Center/Offline Kit.
|
||||
|
||||
## Acceptance criteria
|
||||
- Consensus results reproducible across supported VEX formats with deterministic digests and provenance.
|
||||
- Signature verification influences trust weights; unverifiable evidence is down-weighted without pipeline failure.
|
||||
- Policy simulations show quorum shifts without persisting state; Vuln Explorer consumes consensus signals.
|
||||
- Issuer Directory enforces RBAC, audit logs, and key rotation; CLI & Console parity achieved.
|
||||
- Recompute pipeline handles Excitor deltas and policy activations with backpressure and incident surfacing.
|
||||
- Observability dashboards/alerts cover ingestion lag, conflict spikes, signature failures, performance budgets (P95 < 500 ms for 100-row pages at 10M records/tenant).
|
||||
|
||||
## Risks & mitigations
|
||||
- **Product mapping ambiguity:** conservative scope scoring, manual overrides, surfaced warnings, policy review hooks.
|
||||
- **Issuer compromise:** signature verification, trust weighting, tenant overrides, revocation runbooks.
|
||||
- **Evidence storms:** batching, worker sharding, orchestrator rate limiting, priority queues.
|
||||
- **Performance degradation:** caching, indexing, load tests, quota enforcement.
|
||||
- **Offline gaps:** deterministic exports, manifest hashes, Offline Kit tests.
|
||||
|
||||
## Test strategy
|
||||
- **Unit:** normalisers, mapping, trust weights, consensus lattice, signature verification.
|
||||
- **Property:** randomised evidence sets verifying lattice commutativity and determinism.
|
||||
- **Integration:** Excitor → Lens → Policy/Vuln Explorer flow, issuer overrides, simulation.
|
||||
- **Performance:** large tenant datasets, cache behaviour, concurrency tests.
|
||||
- **Security:** RBAC, tenant scoping, signature tampering, issuer revocation.
|
||||
- **Offline:** export/import verification, CLI parity.
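The **Property** item in the test strategy above can be pictured with a small Python sketch: fold randomised evidence sets through a join operation and check that the result does not depend on evidence order. The status precedence below is purely hypothetical and is not the Lens consensus algorithm, which also involves trust weights that this sketch ignores.

```python
import random
from functools import reduce

# Hypothetical precedence used only for this illustration; the real
# consensus lattice is not specified in this plan.
RANK = {
    "under_investigation": 0,
    "not_affected": 1,
    "fixed": 2,
    "affected": 3,
}


def join(a, b):
    """Least upper bound of two statuses under the assumed ordering."""
    return a if RANK[a] >= RANK[b] else b


def consensus(evidence):
    """Fold a non-empty list of statuses into a single result."""
    return reduce(join, evidence)


def check_order_independence(trials=200, seed=16):
    """Property check: shuffling the evidence never changes the outcome."""
    rng = random.Random(seed)  # fixed seed keeps the check deterministic
    statuses = list(RANK)
    for _ in range(trials):
        evidence = [rng.choice(statuses) for _ in range(rng.randint(1, 8))]
        expected = consensus(evidence)
        shuffled = evidence[:]
        rng.shuffle(shuffled)
        if consensus(shuffled) != expected:
            return False
    return True
```

A dedicated property-testing library would normally generate and shrink the inputs; the hand-rolled loop here only keeps the sketch dependency-free.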
|
||||
|
||||
## Definition of done
|
||||
- Lens service, issuer directory, API/CLI/Console components deployed with telemetry and runbooks.
|
||||
- Documentation set (overview, algorithm, issuer directory, API, console, policy trust) updated with imposed rule statements.
|
||||
- ./TASKS.md and ../../TASKS.md reflect current status; Offline Kit parity confirmed.
|
||||
|
||||
@@ -1,9 +0,0 @@
|
||||
# Task board — Vexer
|
||||
|
||||
> Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable.
|
||||
|
||||
| ID | Status | Owner(s) | Description | Notes |
|----|--------|----------|-------------|-------|
| VEXER-DOCS-0001 | DOING (2025-10-29) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md |
| VEXER-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md |
| VEXER-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow |
|
||||
@@ -47,18 +47,25 @@ CLI mirrors these endpoints (`stella findings list|view|update|export`). Console
|
||||
- Scheduler integration triggers follow-up scans or policy re-evaluation when remediation plan reaches checkpoint.
|
||||
- Zastava (Differential SBOM) feeds runtime exposure signals to reprioritise findings automatically.
|
||||
|
||||
## 5) Observability & compliance
|
||||
|
||||
- Metrics: `findings_open_total{severity,tenant}`, `findings_mttr_seconds`, `triage_actions_total{type}`, `report_generation_seconds`.
|
||||
- Logs: structured with `findingId`, `artifactId`, `advisory`, `policyVersion`, `actor`, `actionType`.
|
||||
- Audit exports: `audit_log.jsonl` appended whenever state changes; offline bundles include signed audit log and manifest.
|
||||
- Compliance: accepted risk requires dual approval and stores justification plus expiry reminders (raised through Notify).
|
||||
|
||||
## 6) Identity & access integration
|
||||
|
||||
- **Scopes** – `vuln:view`, `vuln:investigate`, `vuln:operate`, `vuln:audit` map to read-only, triage, workflow, and audit experiences respectively. The deprecated `vuln:read` scope is still honoured for legacy tokens but is no longer advertised.
- **Attribute filters (ABAC)** – Authority enforces per-service-account filters via the client-credential parameters `vuln_env`, `vuln_owner`, and `vuln_business_tier`. Service accounts define the allowed values in `authority.yaml` (`attributes` block). Tokens include the resolved filters as claims (`stellaops:vuln_env`, `stellaops:vuln_owner`, `stellaops:vuln_business_tier`), and tokens persisted to Mongo retain the same values for audit and revocation.
- **Audit trail** – Every token issuance emits `authority.vuln_attr.*` audit properties that mirror the resolved filter set, along with `delegation.service_account` and ordered `delegation.actor[n]` entries so Vuln Explorer can correlate access decisions.
- **Permalinks** – Signed permalinks inherit the caller’s ABAC filters; consuming services must enforce the embedded claims in addition to scope checks when resolving permalinks.
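As an illustration of enforcing the embedded ABAC claims on the consuming side, the Python sketch below checks a finding against the three claim names listed above. The claim encoding (a space-delimited string or a list), the finding field names, and the rule that a missing claim means "unrestricted" are all assumptions made for this example, not documented Authority behaviour.

```python
# Claim names come from the section above; the finding field names on the
# right-hand side are hypothetical.
CLAIM_TO_FIELD = {
    "stellaops:vuln_env": "env",
    "stellaops:vuln_owner": "owner",
    "stellaops:vuln_business_tier": "business_tier",
}


def allowed_values(claims, claim):
    """Return the set of values a claim permits, or None when the claim is absent."""
    raw = claims.get(claim)
    if raw is None:
        return None  # assumption: an absent claim applies no filter
    if isinstance(raw, str):
        return set(raw.split())  # assumption: space-delimited encoding
    return set(raw)


def finding_visible(claims, finding):
    """True when the finding satisfies every ABAC filter present in the token."""
    for claim, field in CLAIM_TO_FIELD.items():
        allowed = allowed_values(claims, claim)
        if allowed is not None and finding.get(field) not in allowed:
            return False
    return True
```

For example, a token carrying `{"stellaops:vuln_env": "prod staging"}` would pass a finding with `env` set to `prod` and reject one set to `dev`. Scope checks such as `vuln:view` would still apply separately.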
|
||||
|
||||
## 7) Offline bundle requirements
|
||||
|
||||
- Bundle structure:
  - `manifest.json` (hashes, counts, policy version, generation timestamp).
  - `findings.jsonl` (current open findings).
  - `history.jsonl` (state changes).
  - `actions.jsonl` (comments, assignments, tickets).
  - `reports/` (generated PDFs/CSVs).
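A manifest of the kind listed above could be assembled along the following lines. This Python sketch is only an illustration: the JSON key names (`policyVersion`, `generatedAt`, `files`, `sha256`, `count`) and the choice of SHA-256 are assumptions, since the section names what the manifest contains but not its schema.

```python
import hashlib
from pathlib import Path


def build_manifest(bundle_dir, policy_version, generated_at):
    """Describe every file in a bundle directory with a hash and, for
    JSONL files, a record count. Key names are hypothetical."""
    bundle = Path(bundle_dir)
    files = {}
    # Sort paths so the manifest is identical across runs on the same input.
    candidates = sorted(
        p for p in bundle.rglob("*")
        if p.is_file() and p.name != "manifest.json"
    )
    for path in candidates:
        data = path.read_bytes()
        entry = {"sha256": hashlib.sha256(data).hexdigest()}
        if path.suffix == ".jsonl":
            # One JSON document per non-empty line.
            entry["count"] = sum(1 for line in data.splitlines() if line.strip())
        files[path.relative_to(bundle).as_posix()] = entry
    return {
        "policyVersion": policy_version,
        "generatedAt": generated_at,  # passed in so output stays reproducible
        "files": files,
    }
```

Passing the timestamp in, rather than reading the clock inside the function, keeps repeated builds of the same bundle byte-for-byte comparable.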
|
||||
|
||||
@@ -1,70 +1,70 @@
# Implementation plan — Vulnerability Explorer
|
||||
|
||||
## Delivery phases
|
||||
- **Phase 1 – Findings Ledger & resolver**
|
||||
Create append-only ledger, projector, ecosystem resolvers (npm/Maven/PyPI/Go/RPM/DEB), canonical advisory keys, and provenance hashing.
|
||||
- **Phase 2 – API & simulation**
|
||||
Ship Vuln Explorer API (list/detail/grouping/simulation), batch evaluation with Policy Engine rationales, and export orchestrator.
|
||||
- **Phase 3 – Console & CLI workflows**
|
||||
Deliver triage UI (assignments, comments, remediation plans, simulation bar), keyboard accessibility, and CLI commands (`stella vuln ...`) with JSON/CSV output.
|
||||
- **Phase 4 – Automation & integrations**
|
||||
Integrate Advisory AI hints, Zastava runtime exposure, Notify rules, Scheduler follow-up scans, and Graph Explorer deep links.
|
||||
- **Phase 5 – Exports & offline parity**
|
||||
Generate deterministic bundles (JSON, CSV, PDF, Offline Kit manifests), audit logs, and signed reports.
|
||||
- **Phase 6 – Observability & hardening**
|
||||
Complete dashboards (projection lag, MTTR, accepted-risk cadence), alerts, runbooks, performance tuning (5M findings/tenant), and security/RBAC validation.
|
||||
|
||||
## Work breakdown
|
||||
- **Findings Ledger**
|
||||
- Define event schema, Merkle root anchoring, append-only storage, history tables.
|
||||
- Projector to `finding_records` and `finding_history`, idempotent event processing, time travel snapshots.
|
||||
- Resolver pipelines referencing SBOM inventory deltas, policy outputs, VEX consensus, runtime signals.
|
||||
- **API & exports**
|
||||
- REST endpoints (`/v1/findings`, `/v1/findings/{id}`, `/actions`, `/reports`, `/exports`) with ABAC filters.
|
||||
- Simulation endpoint returning diffs, integration with Policy Engine batch evaluation.
|
||||
- Export jobs for JSON/CSV/PDF plus Offline Kit bundle assembly and signing.
|
||||
- **Console**
|
||||
- Feature module `vuln-explorer` with grid, filters, saved views, deep links, detail tabs (policy, evidence, paths, remediation).
|
||||
- Simulation drawer, delta chips, accepted-risk approvals, evidence bundle viewer.
|
||||
- Accessibility (keyboard navigation, ARIA), virtualization for large result sets.
|
||||
- **CLI**
|
||||
- Commands `stella vuln list|show|simulate|assign|accept-risk|verify-fix|export`.
|
||||
- Stable schemas for automation; piping support; tests for exit codes.
|
||||
- **Integrations**
|
||||
- Conseiller/Excitor: normalized advisory keys, linksets, evidence retrieval.
|
||||
- SBOM Service: inventory deltas with scope/runtime flags, safe version hints.
|
||||
- Notify: events for SLA breaches, accepted-risk expiries, remediation deadlines.
|
||||
- Scheduler: trigger rescans when remediation plan milestones complete.
|
||||
- **Observability & ops**
|
||||
- Metrics (open findings, MTTR, projection lag, export duration, SLA burn), logs/traces with correlation IDs.
|
||||
- Alerting on projector backlog, API 5xx spikes, export failures, accepted-risk nearing expiry.
|
||||
- Runbooks covering recompute storms, mapping errors, report issues.
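The work breakdown above mentions Merkle root anchoring for the Findings Ledger without describing the construction. The Python sketch below shows one common way to derive a Merkle root over an ordered list of serialised events. It is a simplified, assumed scheme (SHA-256, leaf/node prefixes, last node duplicated on odd levels) and not the ledger's actual design.

```python
import hashlib


def merkle_root(events: list[bytes]) -> str:
    """Hex-encoded Merkle root over serialised ledger events, in order."""
    if not events:
        return hashlib.sha256(b"").hexdigest()

    # Prefix leaves and inner nodes differently so a leaf can never be
    # mistaken for an inner node.
    level = [hashlib.sha256(b"\x00" + event).digest() for event in events]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            left = level[i]
            # Assumption: an unpaired node is combined with itself.
            right = level[i + 1] if i + 1 < len(level) else left
            next_level.append(hashlib.sha256(b"\x01" + left + right).digest())
        level = next_level
    return level[0].hex()
```

Because every event feeds into the root, altering or reordering any historical event yields a different root. That is the property an integrity check against an anchored root relies on.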
|
||||
|
||||
## Acceptance criteria
|
||||
- Ledger/event sourcing reproduces historical states byte-for-byte; Merkle hashes verify integrity.
|
||||
- Resolver respects ecosystem semantics, scope, and runtime context; path evidence presented in UI/CLI.
|
||||
- Triage workflows (assignment, comments, accepted-risk) enforce justification and approval requirements with audit records.
|
||||
- Simulation returns policy diffs without mutating state; CLI/UI parity achieved for simulation and exports.
|
||||
- Exports and Offline Kit bundles reproducible with signed manifests and provenance; reports available in JSON/CSV/PDF.
|
||||
- Observability dashboards show green SLOs, alerts fire for projection lag or SLA burns, and runbooks documented.
|
||||
- RBAC/ABAC validated; attachments encrypted; tenant isolation guaranteed.
|
||||
|
||||
## Risks & mitigations
|
||||
- **Advisory identity collisions:** strict canonicalization, linkset references, raw evidence access.
|
||||
- **Resolver inaccuracies:** property-based tests, path verification, manual override workflows.
|
||||
- **Projection lag/backlog:** autoscaling, queue backpressure, alerting, pause controls.
|
||||
- **Export size/performance:** streaming NDJSON, size estimators, chunked downloads.
|
||||
- **User confusion on suppression:** rationale tab, explicit badges, explain traces.
|
||||
|
||||
## Test strategy
|
||||
- **Unit:** resolver algorithms, state machine transitions, policy mapping, export builders.
|
||||
- **Integration:** ingestion → ledger → projector → API flow, simulation, Notify notifications.
|
||||
- **E2E:** Console triage scenarios, CLI flows, accessibility tests.
|
||||
- **Performance:** 5M findings/tenant, projection rebuild, export generation.
|
||||
- **Security:** RBAC/ABAC matrix, CSRF, attachment encryption, signed URL expiry.
|
||||
- **Determinism:** time-travel snapshots, export manifest hashing, Offline Kit replay.
|
||||
|
||||
## Definition of done
|
||||
- Services, UI/CLI, integrations, exports, and observability deployed with runbooks and Offline Kit parity.
|
||||
- Documentation suite (overview, using-console, API, CLI, findings ledger, policy mapping, VEX/SBOM integration, telemetry, security, runbooks, install) updated with imposed rule statement.
|
||||
- ./TASKS.md and ../../TASKS.md reflect active progress; compliance checklists appended where required.
|
||||
|
||||