feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules
- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
		
							
								
								
									
docs/modules/advisory-ai/AGENTS.md
							| @@ -0,0 +1,22 @@ | ||||
| # Advisory AI agent guide | ||||
|  | ||||
| ## Mission | ||||
| Advisory AI is the retrieval-augmented assistant that synthesizes advisory and VEX evidence into operator-ready summaries, conflict explanations, and remediation plans with strict provenance. | ||||
|  | ||||
| ## Key docs | ||||
| - [Module README](./README.md) | ||||
| - [Architecture](./architecture.md) | ||||
| - [Implementation plan](./implementation_plan.md) | ||||
| - [Task board](./TASKS.md) | ||||
|  | ||||
| ## How to get started | ||||
| 1. Review ./architecture.md for retrieval pipeline, guardrails, and profile support. | ||||
| 2. Open ../../implplan/SPRINTS.md and locate stories for this component. | ||||
| 3. Check ./TASKS.md and update status before/after work. | ||||
| 4. Read README/architecture for design context and update as the implementation evolves. | ||||
|  | ||||
| ## Guardrails | ||||
| - Uphold Aggregation-Only Contract boundaries when consuming ingestion data. | ||||
| - Preserve determinism and provenance in all derived outputs. | ||||
| - Document offline/air-gap pathways for any new feature. | ||||
| - Update telemetry/observability assets alongside feature work. | ||||
							
								
								
									
docs/modules/advisory-ai/README.md
							| @@ -0,0 +1,29 @@ | ||||
| # StellaOps Advisory AI | ||||
|  | ||||
| Advisory AI is the retrieval-augmented assistant that synthesizes advisory and VEX evidence into operator-ready summaries, conflict explanations, and remediation plans with strict provenance. | ||||
|  | ||||
| ## Responsibilities | ||||
| - Generate policy-aware advisory summaries with citations back to Concelier and Excititor evidence. | ||||
| - Explain conflicting advisories/VEX statements using weights from VEX Lens and Policy Engine. | ||||
| - Propose remediation hints aligned with Offline Kit staging and export bundles. | ||||
| - Expose API/UI surfaces with guardrails on model prompts, outputs, and retention. | ||||
|  | ||||
| ## Key components | ||||
| - RAG pipeline drawing from Concelier, Excititor, VEX Lens, Policy Engine, and SBOM Service data. | ||||
| - Prompt templates and guard models enforcing provenance and redaction policies. | ||||
| - Vercel/offline inference workers with deterministic caching of generated artefacts. | ||||
|  | ||||
| ## Integrations & dependencies | ||||
| - Authority for tenant-aware access control. | ||||
| - Policy Engine for context-specific decisions and explain traces. | ||||
| - Console/CLI for interaction surfaces. | ||||
| - Export Center/Vuln Explorer for embedding generated briefs. | ||||
|  | ||||
| ## Operational notes | ||||
| - Model cache management and offline bundle packaging per Epic 8 requirements. | ||||
| - Usage/latency dashboards for prompt/response monitoring. | ||||
| - Redaction policies validated against security/LLM guardrail tests. | ||||
|  | ||||
| ## Epic alignment | ||||
| - Epic 8: Advisory AI Assistant. | ||||
| - DOCS-AI stories to be tracked in ../../TASKS.md. | ||||
							
								
								
									
docs/modules/advisory-ai/TASKS.md
							| @@ -0,0 +1,9 @@ | ||||
| # Task board — Advisory AI | ||||
|  | ||||
| > Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable. | ||||
|  | ||||
| | ID | Status | Owner(s) | Description | Notes | | ||||
| |----|--------|----------|-------------|-------| | ||||
| | ADVISORY-AI-DOCS-0001 | TODO | Docs Guild | Ensure ./README.md reflects the latest epic deliverables. | Align with ./AGENTS.md | | ||||
| | ADVISORY-AI-ENG-0001 | TODO | Module Team | Break down epic milestones into actionable stories. | Sync into ../../TASKS.md | | ||||
| | ADVISORY-AI-OPS-0001 | TODO | Ops Guild | Prepare runbooks/observability assets once MVP lands. | Document outputs in ./README.md | | ||||
							
								
								
									
docs/modules/advisory-ai/architecture.md
							| @@ -0,0 +1,100 @@ | ||||
| # Advisory AI architecture | ||||
|  | ||||
| > Captures the retrieval, guardrail, and inference packaging requirements defined in the Advisory AI implementation plan and related module guides. | ||||
|  | ||||
| ## 1) Goals | ||||
|  | ||||
| - Summarise advisories/VEX evidence into operator-ready briefs with citations. | ||||
| - Explain conflicting statements with provenance and trust weights (using VEX Lens & Excititor data). | ||||
| - Suggest remediation plans aligned with Offline Kit deployment models and scheduler follow-ups. | ||||
| - Operate deterministically where possible; cache generated artefacts with digests for audit. | ||||
|  | ||||
| ## 2) Pipeline overview | ||||
|  | ||||
| ``` | ||||
|                        +---------------------+ | ||||
|    Concelier/VEX Lens  |  Evidence Retriever | | ||||
|    Policy Engine ----> |  (vector + keyword) | ---> Context Pack (JSON) | ||||
|    Zastava runtime     +---------------------+ | ||||
|                                | | ||||
|                                v | ||||
|                         +-------------+ | ||||
|                         | Prompt      | | ||||
|                         | Assembler   | | ||||
|                         +-------------+ | ||||
|                                | | ||||
|                                v | ||||
|                         +-------------+ | ||||
|                         | Guarded LLM | | ||||
|                         | (local/host)| | ||||
|                         +-------------+ | ||||
|                                | | ||||
|                                v | ||||
|                         +-----------------+ | ||||
|                         | Citation &      | | ||||
|                         | Validation      | | ||||
|                         +-----------------+ | ||||
|                                | | ||||
|                                v | ||||
|                         +----------------+ | ||||
|                         | Output cache   | | ||||
|                         | (hash, bundle) | | ||||
|                         +----------------+ | ||||
| ``` | ||||
|  | ||||
| ## 3) Retrieval & context | ||||
|  | ||||
| - Hybrid search: vector embeddings (SBERT-compatible) + keyword filters for advisory IDs, PURLs, CVEs. | ||||
| - Context packs include: | ||||
|   - Advisory raw excerpts with highlighted sections and source URLs. | ||||
|   - VEX statements (normalized tuples + trust metadata). | ||||
|   - Policy explain traces for the affected finding. | ||||
|   - Runtime/impact hints from Zastava (exposure, entrypoints). | ||||
|   - Export-ready remediation data (fixed versions, patches). | ||||
|  | ||||
| All context references include `content_hash` and `source_id` enabling verifiable citations. | ||||
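The hashing rule above can be illustrated with a minimal sketch. It assumes `content_hash` is a hex SHA-256 over the UTF-8 bytes of the cited excerpt, prefixed with `sha256:`; the `ContextReference` class, the `excerpt` field, and both helper functions are hypothetical names introduced only for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextReference:
    """One citable item in a context pack (hypothetical shape)."""
    source_id: str
    content_hash: str
    excerpt: str


def make_reference(source_id: str, excerpt: str) -> ContextReference:
    # Assumption: the hash covers the UTF-8 bytes of the excerpt only.
    digest = hashlib.sha256(excerpt.encode("utf-8")).hexdigest()
    return ContextReference(source_id, f"sha256:{digest}", excerpt)


def verify_reference(ref: ContextReference) -> bool:
    # Recompute the digest to confirm the cited excerpt was not altered.
    digest = hashlib.sha256(ref.excerpt.encode("utf-8")).hexdigest()
    return ref.content_hash == f"sha256:{digest}"
```

A consumer holding only the reference can re-run `verify_reference` offline, which is what makes a citation checkable without trusting the generator.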
|  | ||||
| ## 4) Guardrails | ||||
|  | ||||
| - Prompt templates enforce structure: summary, conflicts, remediation, references. | ||||
| - Response validator ensures: | ||||
|   - No hallucinated advisories (every fact must map to input context). | ||||
|   - Citations follow `[n]` indexing referencing actual sources. | ||||
|   - Remediation suggestions only cite policy-approved sources (fixed versions, vendor hotfixes). | ||||
| - Moderation/PII filters prevent leaking secrets; responses failing validation are rejected and logged. | ||||
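As an illustration of the citation check only, the sketch below verifies that every `[n]` marker in a response maps to one of the supplied sources. It assumes 1-based indexing and that a response without any citation should be rejected; neither detail is stated above, and `validate_citations` is a hypothetical name.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def validate_citations(text: str, source_count: int) -> list:
    """Return a list of problems; an empty list means the citations pass."""
    problems = []
    cited = [int(n) for n in CITATION.findall(text)]
    if not cited:
        # Assumption: uncited output is treated as a validation failure.
        problems.append("no citations found")
    for n in sorted(set(cited)):
        # Assumption: citations are 1-based indexes into the source list.
        if n < 1 or n > source_count:
            problems.append(f"citation [{n}] does not map to a source")
    return problems
```

This covers index validity only; checking that each cited fact actually appears in the referenced context would need a separate step.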
|  | ||||
| ## 5) Output persistence | ||||
|  | ||||
| - Cached artefacts stored in `advisory_ai_outputs` with fields: | ||||
|   - `output_hash` (sha256 of JSON response). | ||||
|   - `input_digest` (hash of context pack). | ||||
|   - `summary`, `conflicts`, `remediation`, `citations`. | ||||
|   - `generated_at`, `model_id`, `profile` (Sovereign/FIPS etc.). | ||||
|   - `signatures` (optional DSSE if run in deterministic mode). | ||||
| - Offline bundle format contains `summary.md`, `citations.json`, `context_manifest.json`, `signatures/`. | ||||
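A record with these fields could be assembled roughly as follows. This is a sketch under stated assumptions: the canonical form (sorted keys, compact separators, UTF-8) is not specified above, and `canonical_json`, `sha256_hex`, and `build_output_record` are hypothetical helpers.

```python
import hashlib
import json


def canonical_json(value) -> bytes:
    # Assumption: canonical form is sorted keys, compact separators, UTF-8.
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sha256_hex(value) -> str:
    return hashlib.sha256(canonical_json(value)).hexdigest()


def build_output_record(context_pack: dict, response: dict,
                        model_id: str, profile: str, generated_at: str) -> dict:
    """Assemble one cached artefact using the field names listed above."""
    return {
        "output_hash": sha256_hex(response),      # sha256 of the JSON response
        "input_digest": sha256_hex(context_pack),  # hash of the context pack
        "summary": response.get("summary"),
        "conflicts": response.get("conflicts", []),
        "remediation": response.get("remediation", []),
        "citations": response.get("citations", []),
        "generated_at": generated_at,
        "model_id": model_id,
        "profile": profile,
    }
```

Hashing a canonical serialisation rather than the raw bytes means two context packs that differ only in key order yield the same `input_digest`, which keeps cache lookups stable.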
|  | ||||
| ## 6) Profiles & sovereignty | ||||
|  | ||||
| - **Profiles:** `default`, `fips-local` (FIPS-compliant local model), `gost-local`, `cloud-openai` (optional, disabled by default). Each profile defines allowed models, key management, and telemetry endpoints. | ||||
| - **CryptoProfile/RootPack integration:** generated artefacts can be signed using configured CryptoProfile to satisfy procurement/trust requirements. | ||||
|  | ||||
| ## 7) APIs | ||||
|  | ||||
| - `POST /v1/advisory-ai/summaries` — generate (or retrieve cached) summary for `{advisoryKey, artifactId, policyVersion}`. | ||||
| - `POST /v1/advisory-ai/conflicts` — explain conflicting VEX statements with trust ranking. | ||||
| - `POST /v1/advisory-ai/remediation` — fetch remediation plan with target fix versions, prerequisites, verification steps. | ||||
| - `GET /v1/advisory-ai/outputs/{hash}` — retrieve cached artefact (used by CLI/Console/Export Center). | ||||
|  | ||||
| All endpoints accept a `profile` parameter (default `fips-local`) and return `output_hash`, `input_digest`, and `citations` for verification. | ||||
|  | ||||
| ## 8) Observability | ||||
|  | ||||
| - Metrics: `advisory_ai_requests_total{profile,type}`, `advisory_ai_latency_seconds`, `advisory_ai_validation_failures_total`. | ||||
| - Logs: include `output_hash`, `input_digest`, `profile`, `model_id`, `tenant`, `artifacts`. Sensitive context is not logged. | ||||
| - Traces: spans for retrieval, prompt assembly, model inference, validation, cache write. | ||||
|  | ||||
| ## 9) Operational controls | ||||
|  | ||||
| - Feature flags per tenant (`ai.summary.enabled`, `ai.remediation.enabled`). | ||||
| - Rate limits (per tenant, per profile) enforced by Orchestrator to prevent runaway usage. | ||||
| - Offline/air-gapped deployments run local models packaged with Offline Kit; model weights validated via manifest digests. | ||||
							
								
								
									
docs/modules/advisory-ai/implementation_plan.md
							| @@ -0,0 +1,19 @@ | ||||
| # Implementation plan — Advisory AI | ||||
|  | ||||
| ## Current objectives | ||||
| - Deliver Epic milestones summarised below while maintaining determinism and offline parity. | ||||
| - Keep documentation, telemetry, and runbooks aligned with sprint outcomes. | ||||
|  | ||||
| ## Workstreams | ||||
| - Roadmap: reconcile open stories in ../../TASKS.md with module backlog. | ||||
| - Delivery: ship features outlined in the epic while preserving AOC guardrails. | ||||
| - Validation: extend tests/fixtures to guarantee reproducibility and provenance. | ||||
|  | ||||
| ## Epic milestones | ||||
| - Epic 8: Advisory AI Assistant. | ||||
| - DOCS-AI stories to be tracked in ../../TASKS.md. | ||||
|  | ||||
| ## Coordination | ||||
| - Review ./AGENTS.md before picking up work. | ||||
| - Sync with owners listed in docs/implplan/SPRINTS.md. | ||||
| - Update this plan whenever scope, dependencies, or guardrails change. | ||||
							
								
								
									
docs/modules/attestor/AGENTS.md
							| @@ -0,0 +1,22 @@ | ||||
| # Attestor agent guide | ||||
|  | ||||
| ## Mission | ||||
| Attestor moves signed evidence through the trust chain by accepting DSSE bundles from Signer, registering them with Rekor v2, and serving deterministic verification payloads to other services. | ||||
|  | ||||
| ## Key docs | ||||
| - [Module README](./README.md) | ||||
| - [Architecture](./architecture.md) | ||||
| - [Implementation plan](./implementation_plan.md) | ||||
| - [Task board](./TASKS.md) | ||||
|  | ||||
| ## How to get started | ||||
| 1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module. | ||||
| 2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED). | ||||
| 3. Read the architecture and README for domain context before editing code or docs. | ||||
| 4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan. | ||||
|  | ||||
| ## Guardrails | ||||
| - Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md). | ||||
| - Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts. | ||||
| - Keep Offline Kit parity in mind—document air-gapped workflows for any new feature. | ||||
| - Update runbooks/observability assets when operational characteristics change. | ||||
							
								
								
									
docs/modules/attestor/README.md
							| @@ -0,0 +1,54 @@ | ||||
| # StellaOps Attestor | ||||
|  | ||||
| Attestor converts signed DSSE evidence from the Signer into transparency-log proofs and verifiable reports for every downstream surface (Policy Engine, Export Center, CLI, Console, Scheduler). It is the trust backbone that proves SBOM, scan, VEX, and policy artefacts were signed, witnessed, and preserved without tampering. | ||||
|  | ||||
| ## Why it exists | ||||
| - **Evidence first:** organisations need portable, verifiable attestations that prove build provenance, SBOM availability, policy verdicts, and VEX statements. | ||||
| - **Policy enforcement:** verification policies ensure only approved issuers, key types, witnesses, and freshness windows are accepted. | ||||
| - **Sovereign/offline-ready:** Attestor archives envelopes, signatures, and proofs so air-gapped deployments can replay verification without contacting external services. | ||||
|  | ||||
| ## Roles & surfaces | ||||
| - **Subjects:** immutable digests for container images, SBOMs, reports, and policy bundles. | ||||
| - **Issuers:** builders, scanners, policy engines, or operators signing DSSE envelopes using keyless (Fulcio), KMS/HSM, or FIDO2 keys. | ||||
| - **Consumers:** CLI/SDK, Console, Export Center, Scanner, Policy Engine, and Notify retrieving verification bundles or triggering policy checks. | ||||
| - **Scopes:** Authority issues `attestor.write`, `attestor.verify`, `attestor.read`, and administrative scopes for issuer/key management; every call is bound with mTLS + DPoP. | ||||
|  | ||||
| ## Supported payloads | ||||
| - `StellaOps.BuildProvenance@1`, `StellaOps.SBOMAttestation@1` | ||||
| - `StellaOps.ScanResults@1`, `StellaOps.VEXAttestation@1` | ||||
| - `StellaOps.PolicyEvaluation@1`, `StellaOps.RiskProfileEvidence@1` | ||||
| All predicates capture subjects, issuer metadata, policy context, materials, optional witnesses, and versioned schemas. Unsupported predicates return `422 predicate_unsupported`. | ||||
|  | ||||
| ## Trust & envelope model | ||||
| - DSSE envelopes are canonicalised, hashed, and stored alongside the Rekor UUID, index, and proof. | ||||
| - Signature modes span keyless (Fulcio), keyful (KMS/HSM), and hardware-backed (FIDO2). Multiple signatures are supported per envelope. | ||||
| - Proofs include Merkle inclusion path, checkpoint metadata, optional witness endorsements, and cached verification verdicts. | ||||
| - CAS/object storage retains envelopes + provenance for later replay; Rekor backends may be primary plus mirrors. | ||||
|  | ||||
| ## UI, CLI, and SDK workflows | ||||
| - **Console:** Evidence browser, verification reports, chain-of-custody graph, issuer/key management, attestation workbench, and bulk verification flows. | ||||
| - **CLI / SDK:** `stella attest sign|verify|list|fetch|key` commands plus language SDKs to integrate build pipelines and offline verification scripts. | ||||
| - **Policy Studio:** Verification policies specify required predicate types, issuers, witness requirements, and freshness windows; simulations show enforcement impact. | ||||
|  | ||||
| ## Storage, offline & air-gap posture | ||||
| - MongoDB stores entry metadata, dedupe keys, and audit events; object storage optionally archives DSSE bundles. | ||||
| - Export Center packages attestation bundles (`stella export attestation-bundle`) for Offline Kit delivery. | ||||
| - Transparency logs can be mirrored; offline mode records gaps and provides compensating controls. | ||||
|  | ||||
| ## Observability & performance | ||||
| - Metrics: `attestor_submission_total`, `attestor_verify_seconds`, `attestor_cache_hit_ratio`, `attestor_rekor_latency_seconds`. | ||||
| - Logs capture tenant, issuer, subject digests, Rekor UUID, proof status, and policy verdict. | ||||
| - Performance target: ≥1 000 envelopes/minute per worker with cached verification, batched operations, and concurrency controls. | ||||
|  | ||||
| ## Key integrations | ||||
| - Signer (DSSE source), Authority (scopes & tenancy), Export Center (attestation bundles), Policy Engine (verification policies), Scanner/Excititor (subject evidence), Notify (key rotation & verification alerts), Observability stack (dashboards/alerts). | ||||
|  | ||||
| ## Backlog references | ||||
| - DOCS-ATTEST-73-001 … DOCS-ATTEST-75-002 (Attestor console, key management, air-gap bundles) in ../../TASKS.md. | ||||
| - EXPORT-ATTEST-75-002 (Export Center attestation packaging) in ../export-center/TASKS.md. | ||||
|  | ||||
| ## Epic alignment | ||||
| - **Epic 19 – Attestor Console:** console experience, verification APIs, issuer/key governance, transparency integration, and offline bundles. | ||||
| - **Epic 10 – Export Center:** provenance alignment so exports carry signed manifests and attestation bundles. | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
							
								
								
									
docs/modules/attestor/TASKS.md
							| @@ -0,0 +1,9 @@ | ||||
| # Task board — Attestor | ||||
|  | ||||
| > Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable. | ||||
|  | ||||
| | ID | Status | Owner(s) | Description | Notes | | ||||
| |----|--------|----------|-------------|-------| | ||||
| | ATTESTOR-DOCS-0001 | DOING (2025-10-29) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md | | ||||
| | ATTESTOR-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md | | ||||
| | ATTESTOR-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow | | ||||
							
								
								
									
docs/modules/attestor/architecture.md
							| @@ -0,0 +1,432 @@ | ||||
| # component_architecture_attestor.md — **Stella Ops Attestor** (2025Q4) | ||||
|  | ||||
| > Derived from Epic 19 – Attestor Console with provenance hooks aligned to the Export Center bundle workflows scoped in Epic 10. | ||||
|  | ||||
| > **Scope.** Implementation‑ready architecture for the **Attestor**: the service that **submits** DSSE envelopes to **Rekor v2**, retrieves/validates inclusion proofs, caches results, and exposes verification APIs. It accepts DSSE **only** from the **Signer** over mTLS, enforces chain‑of‑trust to Stella Ops roots, and returns `{uuid, index, proof, logURL}` to calling services (Scanner.WebService for SBOMs; backend for final reports; Excititor exports when configured). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 0) Mission & boundaries | ||||
|  | ||||
| **Mission.** Turn a signed DSSE envelope from the Signer into a **transparency‑logged, verifiable fact** with a durable, replayable proof (Merkle inclusion + (optional) checkpoint anchoring). Provide **fast verification** for downstream consumers and a stable retrieval interface for UI/CLI. | ||||
|  | ||||
| **Boundaries.** | ||||
|  | ||||
| * Attestor **does not sign**; it **must not** accept unsigned or third‑party‑signed bundles. | ||||
| * Attestor **does not decide PASS/FAIL**; it logs attestations for SBOMs, reports, and export artifacts. | ||||
| * Rekor v2 backends may be **local** (self‑hosted) or **remote**; Attestor handles both with retries, backoff, and idempotency. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) Topology & dependencies | ||||
|  | ||||
| **Process shape:** single stateless service `stellaops/attestor` behind mTLS. | ||||
|  | ||||
| **Dependencies:** | ||||
|  | ||||
| * **Signer** (caller) — authenticated via **mTLS** and **Authority** OpToks. | ||||
| * **Rekor v2** — tile‑backed transparency log endpoint(s). | ||||
| * **MinIO (S3)** — optional archive store for DSSE envelopes & verification bundles. | ||||
| * **MongoDB** — local cache of `{uuid, index, proof, artifactSha256, bundleSha256}`; job state; audit. | ||||
| * **Redis** — dedupe/idempotency keys and short‑lived rate‑limit buckets. | ||||
| * **Licensing Service (optional)** — “endorse” call for cross‑log publishing when customer opts‑in. | ||||
|  | ||||
| Trust boundary: **Only the Signer** is allowed to call submission endpoints; enforced by **mTLS peer cert allowlist** + `aud=attestor` OpTok. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ### Roles, identities & scopes | ||||
| - **Subjects** — immutable digests for artifacts (container images, SBOMs, reports) referenced in DSSE envelopes. | ||||
| - **Issuers** — authenticated builders/scanners/policy engines signing evidence; tracked with mode (`keyless`, `kms`, `hsm`, `fido2`) and tenant scope. | ||||
| - **Consumers** — Scanner, Export Center, CLI, Console, Policy Engine that verify proofs using Attestor APIs. | ||||
| - **Authority scopes** — `attestor.write`, `attestor.verify`, `attestor.read`, and administrative scopes for key management; all calls mTLS/DPoP-bound. | ||||
|  | ||||
| ### Supported predicate types | ||||
| - `StellaOps.BuildProvenance@1` | ||||
| - `StellaOps.SBOMAttestation@1` | ||||
| - `StellaOps.ScanResults@1` | ||||
| - `StellaOps.PolicyEvaluation@1` | ||||
| - `StellaOps.VEXAttestation@1` | ||||
| - `StellaOps.RiskProfileEvidence@1` | ||||
|  | ||||
| Each predicate embeds subject digests, issuer metadata, policy context, materials, and optional transparency hints. Unsupported predicates return `422 predicate_unsupported`. | ||||
|  | ||||
| ### Envelope & signature model | ||||
| - DSSE envelopes canonicalised (stable JSON ordering) prior to hashing. | ||||
| - Signature modes: keyless (Fulcio cert chain), keyful (KMS/HSM), hardware (FIDO2/WebAuthn). Multiple signatures allowed. | ||||
| - Rekor entry stores bundle hash, certificate chain, and optional witness endorsements. | ||||
| - Archive CAS retains original envelope plus metadata for offline verification. | ||||
|  | ||||
| ### Verification pipeline overview | ||||
| 1. Fetch envelope (from request, cache, or storage) and validate DSSE structure. | ||||
| 2. Verify signature(s) against configured trust roots; evaluate issuer policy. | ||||
| 3. Retrieve or acquire inclusion proof from Rekor (primary + optional mirror). | ||||
| 4. Validate Merkle proof against checkpoint; optionally verify witness endorsement. | ||||
| 5. Return cached verification bundle including policy verdict and timestamps. | ||||
|  | ||||
| ### UI & CLI touchpoints | ||||
| - Console: Evidence browser, verification report, chain-of-custody graph, issuer/key management, attestation workbench, bulk verification views. | ||||
| - CLI: `stella attest sign|verify|list|fetch|key` with offline verification and export bundle support. | ||||
| - SDKs expose sign/verify primitives for build pipelines. | ||||
|  | ||||
| ### Performance & observability targets | ||||
| - Throughput goal: ≥1 000 envelopes/minute per worker with cached verification. | ||||
| - Metrics: `attestor_submission_total`, `attestor_verify_seconds`, `attestor_rekor_latency_seconds`, `attestor_cache_hit_ratio`. | ||||
| - Logs include `tenant`, `issuer`, `subjectDigest`, `rekorUuid`, `proofStatus`; traces cover submission → Rekor → cache → response path. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2) Data model (Mongo) | ||||
|  | ||||
| Database: `attestor` | ||||
|  | ||||
| **Collections & schemas** | ||||
|  | ||||
| * `entries` | ||||
|  | ||||
|   ``` | ||||
|   { _id: "<rekor-uuid>", | ||||
|     artifact: { sha256: "<sha256>", kind: "sbom|report|vex-export", imageDigest?, subjectUri? }, | ||||
|     bundleSha256: "<sha256>",                           // canonicalized DSSE | ||||
|     index: <int>,                                       // log index/sequence if provided by backend | ||||
|     proof: {                                            // inclusion proof | ||||
|       checkpoint: { origin, size, rootHash, timestamp }, | ||||
|       inclusion: { leafHash, path[] }                   // Merkle path (tiles) | ||||
|     }, | ||||
|     log: { url, logId? }, | ||||
|     createdAt, status: "included|pending|failed", | ||||
|     signerIdentity: { mode: "keyless|kms", issuer, san?, kid? } | ||||
|   } | ||||
|   ``` | ||||
|  | ||||
| * `dedupe` | ||||
|  | ||||
|   ``` | ||||
|   { key: "bundle:<sha256>", rekorUuid, createdAt, ttlAt }     // idempotency key | ||||
|   ``` | ||||
|  | ||||
| * `audit` | ||||
|  | ||||
|   ``` | ||||
|   { _id, ts, caller: { cn, mTLSThumbprint, sub, aud },        // from mTLS + OpTok | ||||
|     action: "submit|verify|fetch", | ||||
|     artifactSha256, bundleSha256, rekorUuid?, index?, result, latencyMs, backend } | ||||
|   ``` | ||||
|  | ||||
| Indexes: | ||||
|  | ||||
| * `entries` on `artifact.sha256`, `bundleSha256`, `createdAt`, and `{status:1, createdAt:-1}`. | ||||
| * `dedupe.key` unique (TTL 24–48h). | ||||
| * `audit.ts` for time‑range queries. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3) Input contract (from Signer) | ||||
|  | ||||
| **Attestor accepts only** DSSE envelopes that satisfy all of: | ||||
|  | ||||
| 1. **mTLS** peer certificate maps to `signer` service (CA‑pinned). | ||||
| 2. **Authority** OpTok with `aud=attestor`, `scope=attestor.write`, DPoP or mTLS bound. | ||||
| 3. DSSE envelope is **signed by the Signer’s key** (or includes a **Fulcio‑issued** cert chain) and **chains to configured roots** (Fulcio/KMS). | ||||
| 4. **Predicate type** is one of Stella Ops types (sbom/report/vex‑export) with valid schema. | ||||
| 5. `subject[*].digest.sha256` is present and canonicalized. | ||||
|  | ||||
| **Wire shape (JSON):** | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "bundle": { "dsse": { "payloadType": "application/vnd.in-toto+json", "payload": "<b64>", "signatures": [ ... ] }, | ||||
|               "certificateChain": [ "-----BEGIN CERTIFICATE-----..." ], | ||||
|               "mode": "keyless" }, | ||||
|   "meta": { | ||||
|     "artifact": { "sha256": "<subject sha256>", "kind": "sbom|report|vex-export", "imageDigest": "sha256:..." }, | ||||
|     "bundleSha256": "<sha256 of canonical dsse>", | ||||
|     "logPreference": "primary",               // "primary" | "mirror" | "both" | ||||
|     "archive": true                           // whether Attestor should archive bundle to S3 | ||||
|   } | ||||
| } | ||||
| ``` | ||||
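A structural pre-check of this wire shape might look like the sketch below. It covers only field presence and enumerated values, not signature or certificate-chain verification. The lowercase-hex digest format and the `primary` default for `logPreference` are assumptions, and `validate_submission` is a hypothetical helper.

```python
import re

SHA256_HEX = re.compile(r"[0-9a-f]{64}")  # assumption: lowercase hex digests
ALLOWED_KINDS = {"sbom", "report", "vex-export"}
ALLOWED_LOG_PREFERENCES = {"primary", "mirror", "both"}


def validate_submission(body: dict) -> list:
    """Return structural errors for a submission body; empty means well-formed."""
    errors = []
    dsse = body.get("bundle", {}).get("dsse", {})
    if not dsse.get("payload"):
        errors.append("bundle.dsse.payload is missing")
    if not dsse.get("signatures"):
        errors.append("bundle.dsse.signatures is empty")
    meta = body.get("meta", {})
    artifact = meta.get("artifact", {})
    if not SHA256_HEX.fullmatch(str(artifact.get("sha256", ""))):
        errors.append("meta.artifact.sha256 is not a sha256 hex digest")
    if artifact.get("kind") not in ALLOWED_KINDS:
        errors.append("meta.artifact.kind is not one of sbom|report|vex-export")
    if not SHA256_HEX.fullmatch(str(meta.get("bundleSha256", ""))):
        errors.append("meta.bundleSha256 is not a sha256 hex digest")
    # Assumption: an absent logPreference falls back to "primary".
    if meta.get("logPreference", "primary") not in ALLOWED_LOG_PREFERENCES:
        errors.append("meta.logPreference is not one of primary|mirror|both")
    return errors
```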
|  | ||||
| --- | ||||
|  | ||||
| ## 4) APIs | ||||
|  | ||||
| ### 4.1 Submission | ||||
|  | ||||
| `POST /api/v1/rekor/entries`  *(mTLS + OpTok required)* | ||||
|  | ||||
| * **Body**: as above. | ||||
| * **Behavior**: | ||||
|  | ||||
|   * Verify caller (mTLS + OpTok). | ||||
|   * Validate DSSE bundle (signature, cert chain to Fulcio/KMS; DSSE structure; payloadType allowed). | ||||
|   * Idempotency: compute `bundleSha256`; check `dedupe`. If present, return existing `rekorUuid`. | ||||
|   * Submit canonicalized bundle to Rekor v2 (primary or mirror according to `logPreference`). | ||||
|   * Retrieve **inclusion proof** (blocking until inclusion or up to `proofTimeoutMs`); if backend returns promise only, return `status=pending` and retry asynchronously. | ||||
|   * Persist `entries` record; archive DSSE to S3 if `archive=true`. | ||||
| * **Response 200**: | ||||
|  | ||||
|   ```json | ||||
|   { | ||||
|     "uuid": "…", | ||||
|     "index": 123456, | ||||
|     "proof": { | ||||
|       "checkpoint": { "origin": "rekor@site", "size": 987654, "rootHash": "…", "timestamp": "…" }, | ||||
|       "inclusion": { "leafHash": "…", "path": ["…","…"] } | ||||
|     }, | ||||
|     "logURL": "https://rekor…/api/v2/log/…/entries/…", | ||||
|     "status": "included" | ||||
|   } | ||||
|   ``` | ||||
| * **Errors**: `401 invalid_token`, `403 not_signer|chain_untrusted`, `409 duplicate_bundle` (with existing `uuid`), `502 rekor_unavailable`, `504 proof_timeout`. | ||||
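The idempotency step can be sketched with an in-memory stand-in for the `dedupe` collection. The `DedupeStore` class, the `submit` callback, and the 24-hour default TTL are illustrative assumptions; a real deployment would use the Mongo/Redis stores described earlier, and this sketch returns the existing UUID on a repeat rather than modelling the `409 duplicate_bundle` response.

```python
import hashlib
import json
import time


class DedupeStore:
    """In-memory stand-in for the `dedupe` collection (hypothetical helper)."""

    def __init__(self, ttl_seconds: float = 24 * 3600):
        self._ttl = ttl_seconds
        self._entries = {}  # key -> (rekor_uuid, expires_at)

    def get(self, key: str, now: float):
        hit = self._entries.get(key)
        if hit is None or hit[1] <= now:
            self._entries.pop(key, None)  # drop expired entries lazily
            return None
        return hit[0]

    def put(self, key: str, rekor_uuid: str, now: float) -> None:
        self._entries[key] = (rekor_uuid, now + self._ttl)


def bundle_key(dsse: dict) -> str:
    # Assumption: canonical form is sorted keys with compact separators.
    canonical = json.dumps(dsse, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "bundle:" + hashlib.sha256(canonical).hexdigest()


def submit_once(dsse: dict, store: DedupeStore, submit, now=None):
    """Return (rekor_uuid, created); `submit` is called only for unseen bundles."""
    now = time.time() if now is None else now
    key = bundle_key(dsse)
    existing = store.get(key, now)
    if existing is not None:
        return existing, False
    rekor_uuid = submit(dsse)
    store.put(key, rekor_uuid, now)
    return rekor_uuid, True
```

Keying on the canonical bundle hash rather than the request body means a retried request with reordered JSON keys still resolves to the same entry.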
|  | ||||
| ### 4.2 Proof retrieval | ||||
|  | ||||
| `GET /api/v1/rekor/entries/{uuid}` | ||||
|  | ||||
| * Returns `entries` row (refreshes proof from Rekor if stale/missing). | ||||
| * Accepts `?refresh=true` to force backend query. | ||||
|  | ||||
| ### 4.3 Verification (third‑party or internal) | ||||
|  | ||||
| `POST /api/v1/rekor/verify` | ||||
|  | ||||
| * **Body** (one of): | ||||
|  | ||||
|   * `{ "uuid": "…" }` | ||||
|   * `{ "bundle": { …DSSE… } }` | ||||
|   * `{ "artifactSha256": "…" }`  *(looks up most recent entry)* | ||||
|  | ||||
| * **Checks**: | ||||
|  | ||||
|   1. **Bundle signature** → cert chain to Fulcio/KMS roots configured. | ||||
|   2. **Inclusion proof** → recompute leaf hash; verify Merkle path against checkpoint root. | ||||
|   3. Optionally verify **checkpoint** against local trust anchors (if Rekor signs checkpoints). | ||||
|   4. Confirm **subject.digest** matches caller‑provided hash (when given). | ||||
|  | ||||
| * **Response**: | ||||
|  | ||||
|   ```json | ||||
|   { "ok": true, "uuid": "…", "index": 123, "logURL": "…", "checkedAt": "…" } | ||||
|   ``` | ||||
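Check 2 above (recompute the leaf hash, then verify the Merkle path against the checkpoint root) can be sketched as below. The sketch assumes the log uses RFC 6962-style hashing (`0x00` prefix for leaves, `0x01` prefix for interior nodes) and follows the inclusion-proof verification steps from RFC 9162; whether the deployed Rekor v2 backend exposes the leaf index and tree size in exactly this form is an assumption. All function names are illustrative.

```python
import hashlib
from typing import List


def leaf_hash(entry: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256(0x00 || entry)."""
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA-256(0x01 || left || right)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(
    leaf: bytes, index: int, tree_size: int, path: List[bytes], root: bytes
) -> bool:
    """Verify that `leaf` (a leaf hash) sits at `index` in a tree of
    `tree_size` leaves whose root hash is `root`."""
    if index < 0 or index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf
    for p in path:
        if sn == 0:
            return False  # path is longer than the tree is deep
        if (fn & 1) or fn == sn:
            r = node_hash(p, r)
            if not (fn & 1):
                # Skip levels where this node has no right sibling.
                while fn != 0 and not (fn & 1):
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

In terms of the response shape in 4.1, `leaf` would come from `inclusion.leafHash`, `path` from `inclusion.path`, and `tree_size`/`root` from `checkpoint.size`/`checkpoint.rootHash`, after decoding from their wire encoding.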
|  | ||||
| ### 4.4 Batch submission (optional) | ||||
|  | ||||
| `POST /api/v1/rekor/batch` accepts an array of submission objects, processes each item independently, and returns per‑item results. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Rekor v2 driver (backend) | ||||
|  | ||||
| * **Canonicalization**: DSSE envelopes are **normalized** (stable JSON ordering, no insignificant whitespace) before hashing and submission. | ||||
| * **Transport**: HTTP/2 with retries (exponential backoff, jitter), budgeted timeouts. | ||||
| * **Idempotency**: if backend returns “already exists,” map to existing `uuid`. | ||||
| * **Proof acquisition**: | ||||
|  | ||||
|   * In synchronous mode, poll the log for inclusion up to `proofTimeoutMs`. | ||||
|   * In asynchronous mode, return `pending` and schedule a **proof fetcher** job (Mongo job doc + backoff). | ||||
| * **Mirrors/dual logs**: | ||||
|  | ||||
|   * When `logPreference="both"`, submit to primary and mirror; store **both** UUIDs (primary canonical). | ||||
|   * Optional **cloud endorsement**: POST to the Stella Ops cloud `/attest/endorse` with `{uuid, artifactSha256}`; store returned endorsement id. | ||||
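The canonicalization bullet above (stable key ordering, no insignificant whitespace, then hashing) might look like the following. This is a simplified sketch using Python's standard `json` module; it is not a full RFC 8785 (JCS) implementation, and the function names are hypothetical. The actual canonical writer may differ in number and string escaping rules.

```python
import hashlib
import json


def canonicalize(envelope: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace (UTF-8)."""
    return json.dumps(
        envelope,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")


def bundle_sha256(envelope: dict) -> str:
    """Hex SHA-256 over the canonical form; usable as an idempotency key."""
    return hashlib.sha256(canonicalize(envelope)).hexdigest()
```

Because the hash is taken over the canonical bytes, two envelopes that differ only in key order or whitespace map to the same `bundleSha256`, which is what lets the dedupe lookup work.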
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Security model | ||||
|  | ||||
| * **mTLS required** for submission from **Signer** (CA‑pinned). | ||||
| * **Authority token** with `aud=attestor` and DPoP/mTLS binding must be presented; Attestor verifies both. | ||||
| * **Bundle acceptance policy**: | ||||
|  | ||||
|   * DSSE signature must chain to the configured **Fulcio** (keyless) or **KMS/HSM** roots. | ||||
|   * SAN (Subject Alternative Name) must match **Signer identity** policy (e.g., `urn:stellaops:signer` or pinned OIDC issuer). | ||||
|   * Predicate `predicateType` must be on allowlist (sbom/report/vex-export). | ||||
|   * `subject.digest.sha256` values must be present and well‑formed (hex). | ||||
| * **No public submission** path. **Never** accept bundles from untrusted clients. | ||||
| * **Client certificate allowlists**: optional `security.mtls.allowedSubjects` / `allowedThumbprints` tighten peer identity checks beyond CA pinning. | ||||
| * **Rate limits**: a per-caller token bucket derived from `quotas.perCaller` (QPS/burst); Attestor returns `429` + `Retry-After` when the bucket is exhausted. | ||||
| * **Redaction**: Attestor never logs secret material; DSSE payloads **should** be public by design (SBOMs/reports). If customers require redaction, enforce policy at Signer (predicate minimization) **before** Attestor. | ||||
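The per-caller rate limit mentioned above can be illustrated with a small token bucket. This is a sketch under stated assumptions: the class name, the injected `now` clock, and the way the wait time is turned into a `Retry-After` hint are all illustrative, not the service's actual limiter.

```python
class TokenBucket:
    """Token bucket refilled at `qps` tokens/second, capped at `burst`."""

    def __init__(self, qps: float, burst: int, now: float = 0.0):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full
        self.updated = now

    def try_acquire(self, now: float) -> float:
        """Return 0.0 if the request is allowed, otherwise the number of
        seconds until one token is available (a Retry-After hint)."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.qps)
        self.updated = max(self.updated, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0.0
        return (1.0 - self.tokens) / self.qps
```

With `qps=50` and `burst=100`, a caller can send 100 requests back to back, after which a non-zero return value would map to a `429` response.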
|  | ||||
| --- | ||||
|  | ||||
| ## 7) Storage & archival | ||||
|  | ||||
| * **Entries** in Mongo provide a local ledger keyed by `rekorUuid` and **artifact sha256** for quick reverse lookups. | ||||
| * **S3 archival** (if enabled): | ||||
|  | ||||
|   ``` | ||||
|   s3://stellaops/attest/ | ||||
|     dsse/<bundleSha256>.json | ||||
|     proof/<rekorUuid>.json | ||||
|     bundle/<artifactSha256>.zip               # optional verification bundle | ||||
|   ``` | ||||
| * **Verification bundles** (zip): | ||||
|  | ||||
|   * DSSE (`*.dsse.json`), proof (`*.proof.json`), `chain.pem` (certs), `README.txt` with verification steps & hashes. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 8) Observability & audit | ||||
|  | ||||
| **Metrics** (Prometheus): | ||||
|  | ||||
| * `attestor.submit_total{result,backend}` | ||||
| * `attestor.submit_latency_seconds{backend}` | ||||
| * `attestor.proof_fetch_total{result}` | ||||
| * `attestor.verify_total{result}` | ||||
| * `attestor.dedupe_hits_total` | ||||
| * `attestor.errors_total{type}` | ||||
|  | ||||
| **Correlation**: | ||||
|  | ||||
| * HTTP callers may supply `X-Correlation-Id`; Attestor will echo the header and push `CorrelationId` into the log scope for cross-service tracing. | ||||
|  | ||||
| **Tracing**: | ||||
|  | ||||
| * Spans: `validate`, `rekor.submit`, `rekor.poll`, `persist`, `archive`, `verify`. | ||||
|  | ||||
| **Audit**: | ||||
|  | ||||
| * Immutable `audit` rows (ts, caller, action, hashes, uuid, index, backend, result, latency). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 9) Configuration (YAML) | ||||
|  | ||||
| ```yaml | ||||
| attestor: | ||||
|   listen: "https://0.0.0.0:8444" | ||||
|   security: | ||||
|     mtls: | ||||
|       caBundle: /etc/ssl/signer-ca.pem | ||||
|       requireClientCert: true | ||||
|     authority: | ||||
|       issuer: "https://authority.internal" | ||||
|       jwksUrl: "https://authority.internal/jwks" | ||||
|       requireSenderConstraint: "dpop"   # or "mtls" | ||||
|     signerIdentity: | ||||
|       mode: ["keyless","kms"] | ||||
|       fulcioRoots: ["/etc/fulcio/root.pem"] | ||||
|       allowedSANs: ["urn:stellaops:signer"] | ||||
|       kmsKeys: ["kms://cluster-kms/stellaops-signer"] | ||||
|   rekor: | ||||
|     primary: | ||||
|       url: "https://rekor-v2.internal" | ||||
|       proofTimeoutMs: 15000 | ||||
|       pollIntervalMs: 250 | ||||
|       maxAttempts: 60 | ||||
|     mirror: | ||||
|       enabled: false | ||||
|       url: "https://rekor-v2.mirror" | ||||
|   mongo: | ||||
|     uri: "mongodb://mongo/attestor" | ||||
|   s3: | ||||
|     enabled: true | ||||
|     endpoint: "http://minio:9000" | ||||
|     bucket: "stellaops" | ||||
|     prefix: "attest/" | ||||
|     objectLock: "governance" | ||||
|   redis: | ||||
|     url: "redis://redis:6379/2" | ||||
|   quotas: | ||||
|     perCaller: | ||||
|       qps: 50 | ||||
|       burst: 100 | ||||
| ``` | ||||
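As a rough illustration of how `proofTimeoutMs`, `pollIntervalMs`, and `maxAttempts` could interact, here is a polling sketch. With the sample values above, 60 attempts at 250 ms intervals span the 15 000 ms timeout. How the real driver combines these three settings is an assumption; `fetch_proof` is a hypothetical callable that returns `None` until the log has produced a proof.

```python
import time
from typing import Callable, Optional


def poll_for_proof(
    fetch_proof: Callable[[], Optional[dict]],
    proof_timeout_ms: int = 15000,
    poll_interval_ms: int = 250,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[dict]:
    """Poll until a proof is available, the attempt budget is spent, or the
    next sleep would overrun the timeout. Returns None when no proof arrived
    (the caller would then report status=pending)."""
    interval = poll_interval_ms / 1000.0
    deadline = clock() + proof_timeout_ms / 1000.0
    for attempt in range(max_attempts):
        proof = fetch_proof()
        if proof is not None:
            return proof
        last_attempt = attempt == max_attempts - 1
        if last_attempt or clock() + interval > deadline:
            break
        sleep(interval)
    return None
```

The `sleep` and `clock` parameters exist only so the loop can be exercised with a fake clock in tests.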
|  | ||||
| --- | ||||
|  | ||||
| ## 10) End‑to‑end sequences | ||||
|  | ||||
| **A) Submit & include (happy path)** | ||||
|  | ||||
| ```mermaid | ||||
| sequenceDiagram | ||||
|   autonumber | ||||
|   participant SW as Scanner.WebService | ||||
|   participant SG as Signer | ||||
|   participant AT as Attestor | ||||
|   participant RK as Rekor v2 | ||||
|  | ||||
|   SW->>SG: POST /sign/dsse (OpTok+PoE) | ||||
|   SG-->>SW: DSSE bundle (+certs) | ||||
|   SW->>AT: POST /rekor/entries (mTLS + OpTok) | ||||
|   AT->>AT: Validate DSSE (chain to Fulcio/KMS; signer identity) | ||||
|   AT->>RK: submit(bundle) | ||||
|   RK-->>AT: {uuid, index?} | ||||
|   AT->>RK: poll inclusion until proof or timeout | ||||
|   RK-->>AT: inclusion proof (checkpoint + path) | ||||
|   AT-->>SW: {uuid, index, proof, logURL} | ||||
| ``` | ||||
|  | ||||
| **B) Verify by artifact digest (CLI)** | ||||
|  | ||||
| ```mermaid | ||||
| sequenceDiagram | ||||
|   autonumber | ||||
|   participant CLI as stellaops verify | ||||
|   participant SW as Scanner.WebService | ||||
|   participant AT as Attestor | ||||
|  | ||||
|   CLI->>SW: GET /catalog/artifacts/{id} | ||||
|   SW-->>CLI: {artifactSha256, rekor: {uuid}} | ||||
|   CLI->>AT: POST /rekor/verify { uuid } | ||||
|   AT-->>CLI: { ok: true, index, logURL } | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 11) Failure modes & responses | ||||
|  | ||||
| | Condition                             | Return                  | Details                                                   | | ||||
| | ------------------------------------- | ----------------------- | --------------------------------------------------------- | | ||||
| | mTLS/OpTok invalid                    | `401 invalid_token`     | Include `WWW-Authenticate` DPoP challenge when applicable | | ||||
| | Bundle not signed by trusted identity | `403 chain_untrusted`   | DSSE accepted only from Signer identities                 | | ||||
| | Duplicate bundle                      | `409 duplicate_bundle`  | Return existing `uuid` (idempotent)                       | | ||||
| | Rekor unreachable/timeout             | `502 rekor_unavailable` | Retry with backoff; surface `Retry-After`                 | | ||||
| | Inclusion proof timeout               | `202 accepted`          | `status=pending`, background job continues to fetch proof | | ||||
| | Archive failure                       | `207 multi-status`      | Entry recorded; archive will retry asynchronously         | | ||||
| | Verification mismatch                 | `400 verify_failed`     | Include reason: `chain`, `leafHash`, or `rootMismatch`    | | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 12) Performance & scale | ||||
|  | ||||
| * Stateless; scale horizontally. | ||||
| * **Targets**: | ||||
|  | ||||
|   * Submit+proof P95 ≤ **300 ms** (warm log; local Rekor). | ||||
|   * Verify P95 ≤ **30 ms** from cache; ≤ **120 ms** with live proof fetch. | ||||
|   * 1k submissions/minute per replica sustained. | ||||
| * **Hot caches**: `dedupe` (bundle hash → uuid), recent `entries` by artifact sha256. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 13) Testing matrix | ||||
|  | ||||
| * **Happy path**: valid DSSE, inclusion within timeout. | ||||
| * **Idempotency**: resubmit same `bundleSha256` → same `uuid`. | ||||
| * **Security**: reject non‑Signer mTLS, wrong `aud`, DPoP replay, untrusted cert chain, forbidden predicateType. | ||||
| * **Rekor variants**: promise‑then‑proof, proof delayed, mirror dual‑submit, mirror failure. | ||||
| * **Verification**: corrupt leaf path, wrong root, tampered bundle. | ||||
| * **Throughput**: soak test with 10k submissions; latency SLOs, zero drops. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 14) Implementation notes | ||||
|  | ||||
| * Language: **.NET 10** minimal API; `HttpClient` with **sockets handler** tuned for HTTP/2. | ||||
| * JSON: **canonical writer** for DSSE payload hashing. | ||||
| * Crypto: use **BouncyCastle**/**System.Security.Cryptography**; PEM parsing for cert chains. | ||||
| * Rekor client: pluggable driver; treat backend errors as retryable/non‑retryable with granular mapping. | ||||
| * Safety: size caps on bundles; guards against decompression bombs; strict UTF‑8. | ||||
| * CLI integration: `stellaops verify attestation <uuid|bundle|artifact>` calls `/rekor/verify`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 15) Optional features | ||||
|  | ||||
| * **Dual‑log** write (primary + mirror) and **cross‑log proof** packaging. | ||||
| * **Cloud endorsement**: send `{uuid, artifactSha256}` to Stella Ops cloud; store returned endorsement id for marketing/chain‑of‑custody. | ||||
| * **Checkpoint pinning**: periodically pin latest Rekor checkpoints to an external audit store for independent monitoring. | ||||
|  | ||||
							
								
								
									
										74
									
								
								docs/modules/attestor/implementation_plan.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,74 @@ | ||||
| # Implementation plan — Attestor | ||||
|  | ||||
| ## Delivery phases | ||||
| - **Phase 1 – Foundations**   | ||||
|   Build the Attestor service skeleton, DSSE bundle ingestion, mTLS/OpTok enforcement, Rekor v2 client, and cache the `{uuid,index,proof}` tuple. Publish base API (`POST /rekor/entries`, `GET /entries/{uuid}`) and Mongo schemas. | ||||
| - **Phase 2 – Policies & UI**   | ||||
|   Deliver verification policy authoring (Policy Studio integration), console views (evidence browser, verification reports, issuer management), and CLI verbs (`stella attest sign|verify|list|fetch`). | ||||
| - **Phase 3 – Scan & VEX support**   | ||||
|   Accept SBOM, ScanResults, VEX, and PolicyEvaluation predicates; integrate with Scanner, Export Center, Excititor, and Policy Engine pipelines. Ensure AOC invariants on ingestion. | ||||
| - **Phase 4 – Transparency & keys**   | ||||
|   Add multi-log submission (primary + mirror), witness endorsements, KMS/HSM/FIDO2 drivers, key rotation/revocation workflows, and audit trails. | ||||
| - **Phase 5 – Bulk & air gap**   | ||||
|   Implement batch submission/verification, DSSE archival to CAS/object storage, export/import bundles for Offline Kit, and mirror transparency log snapshots. | ||||
| - **Phase 6 – Performance & hardening**   | ||||
|   Optimise cache usage, parallel verification (target ≥1 k envelopes/minute per worker), extend observability (metrics/logs/traces), fuzz parsers, and finalise incident playbooks. | ||||
|  | ||||
| ## Work breakdown | ||||
| - **Attestor service & libraries** | ||||
|   - DSSE validation pipeline (payload-type allowlist, signature verification, trust roots). | ||||
|   - Rekor client with inclusion-proof acquisition, retry/backoff, mirroring controls. | ||||
|   - Mongo repositories for entries, dedupe, audit; CAS storage for DSSE envelopes. | ||||
|   - Batch submission/verification APIs, verification cache, deterministic serialization. | ||||
|   - Observability hooks: metrics (`attestor_submission_total`, `attestor_verify_seconds`), structured logs, OpenTelemetry traces. | ||||
| - **Signer & Authority integration** | ||||
|   - Enforce mTLS peer validation, Authority scope mapping (`attestor.write`, `attestor.verify`), and DPoP binding. | ||||
|   - Provide signer identity attestation metadata consumed by Attestor. | ||||
| - **Policy & Console** | ||||
|   - Extend Policy Studio with `VerificationPolicy` authoring, approvals, and simulated results. | ||||
|   - Console workflows: Evidence browser, verification reports, chain-of-custody graph, key management UI, bulk verification screens. | ||||
| - **CLI & SDK** | ||||
|   - `stella attest` command group (sign/verify/list/fetch/key management) with DSSE canonicalisation and cosign interoperability. | ||||
|   - SDK helpers for DSSE envelope creation, verification, and proof inspection. | ||||
| - **Export Center & Offline Kit** | ||||
|   - Export Center adapters for attestation bundles; CLI/Console flows to export & import evidence in air-gapped environments. | ||||
|   - Offline Kit scripts for replaying verification, mirroring transparency logs, and reporting gaps. | ||||
| - **Security & key management** | ||||
|   - KMS/HSM/FIDO2 driver abstraction, key rotation and revocation runbooks, witness endorsements, and revocation telemetry. | ||||
| - **Docs & training** | ||||
|   - Update module dossier (overview, architecture, implementation plan), key management guides, transparency reference, CLI/Console documentation, and air-gap runbooks. | ||||
|  | ||||
| ## Cross-module dependencies | ||||
| - **Policy Studio / Policy Engine:** verification policy artefacts, explain integration, remediation hints. | ||||
| - **Export Center:** attestation bundle export/import, provenance linking. | ||||
| - **Authority & Tenancy:** scopes, identity attestations, tenant-aware issuer catalogues. | ||||
| - **Notifications:** attestation success/failure events, key rotation alerts. | ||||
| - **Observability:** dashboards and alerting for signing/verification pipelines. | ||||
|  | ||||
| ## Acceptance criteria | ||||
| - Service ingests DSSE envelopes for all supported predicate types, logs them to configured transparency logs, and returns proofs with deterministic hashes. | ||||
| - Verification APIs/CLI/UI validate signatures, inclusion proofs, and policy compliance; cached verification accelerates repeated checks. | ||||
| - Verification policies gate attestation usage, enforcing issuer, freshness, signature count, and witness requirements. | ||||
| - Export Center and Offline Kit workflows bundle attestations and replay verification offline. | ||||
| - Observability coverage includes metrics, traces, logs, audit events, and alert triggers for key compromise, log outages, and verification failure spikes. | ||||
| - Performance target met (≥1 k envelopes/minute per worker) with horizontal scaling. | ||||
|  | ||||
| ## Risks & mitigations | ||||
| - **Key compromise or leakage:** enforce hardware-backed keys, rotation procedures, revocation checks, and incident runbooks. | ||||
| - **Parser bugs / malformed DSSE:** fuzz DSSE and predicate schemas, strict schema validation, fail closed. | ||||
| - **Transparency outage:** mirror logs, support witness endorsements, queue submissions for retry with exponential backoff. | ||||
| - **Policy complexity:** ship curated starter policies, provide simulation tooling, and document common scenarios. | ||||
| - **Offline gaps:** archive bundles and proof material, surface gaps to operators, and document compensating controls. | ||||
|  | ||||
| ## Test strategy | ||||
| - **Unit:** DSSE validation, Rekor client, dedupe logic, key drivers, policy enforcement. | ||||
| - **Integration:** submit/verify flows across predicate types, multi-log publishing, batch operations, CLI/UI end-to-end exercises. | ||||
| - **Security:** tenant isolation, scope enforcement, key rotation regression, tamper detection. | ||||
| - **Performance:** throughput benchmarks, cache hit-rate monitoring, large batch verification. | ||||
| - **Chaos:** inject Rekor outages, network failures, corrupt bundles; ensure graceful degradation and auditable alerts. | ||||
|  | ||||
| ## Definition of done | ||||
| - Phased milestones delivered with telemetry, documentation, and runbooks in place. | ||||
| - CLI/Console parity verified; Offline Kit procedures validated in sealed environment. | ||||
| - Cross-module dependencies acknowledged in ./TASKS.md and ../../TASKS.md. | ||||
| - Documentation set refreshed (overview, architecture, key management, transparency, CLI/UI) with imposed rule statement. | ||||
							
								
								
									
										22
									
								
								docs/modules/authority/AGENTS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,22 @@ | ||||
| # Authority agent guide | ||||
|  | ||||
| ## Mission | ||||
| Authority is the platform OIDC/OAuth2 control plane that mints short-lived, sender-constrained operational tokens (OpToks) for every StellaOps service and tool. | ||||
|  | ||||
| ## Key docs | ||||
| - [Module README](./README.md) | ||||
| - [Architecture](./architecture.md) | ||||
| - [Implementation plan](./implementation_plan.md) | ||||
| - [Task board](./TASKS.md) | ||||
|  | ||||
| ## How to get started | ||||
| 1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module. | ||||
| 2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED). | ||||
| 3. Read the architecture and README for domain context before editing code or docs. | ||||
| 4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan. | ||||
|  | ||||
| ## Guardrails | ||||
| - Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md). | ||||
| - Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts. | ||||
| - Keep Offline Kit parity in mind—document air-gapped workflows for any new feature. | ||||
| - Update runbooks/observability assets when operational characteristics change. | ||||
							
								
								
									
										40
									
								
								docs/modules/authority/README.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,40 @@ | ||||
| # StellaOps Authority | ||||
|  | ||||
| Authority is the platform OIDC/OAuth2 control plane that mints short-lived, sender-constrained operational tokens (OpToks) for every StellaOps service and tool. | ||||
|  | ||||
| ## Responsibilities | ||||
| - Expose device-code, auth-code, and client-credential flows with DPoP or mTLS binding. | ||||
| - Manage signing keys, JWKS rotation, and PoE integration for plan enforcement. | ||||
| - Emit structured audit events and enforce tenant-aware scope policies. | ||||
| - Provide plugin surface for custom identity providers and credential validators. | ||||
|  | ||||
| ## Key components | ||||
| - `StellaOps.Authority` web host. | ||||
| - `StellaOps.Authority.Plugin.*` extensions for secret stores, identity bridges, and OpTok validation. | ||||
| - Telemetry and audit pipeline feeding Security/Observability stacks. | ||||
|  | ||||
| ## Integrations & dependencies | ||||
| - Signer/Attestor for PoE and OpTok introspection. | ||||
| - CLI/UI for login flows and token management. | ||||
| - Scheduler/Scanner for machine-to-machine scope enforcement. | ||||
|  | ||||
| ## Operational notes | ||||
| - MongoDB for tenant, client, and token state. | ||||
| - Key material in KMS/HSM with rotation runbooks (see ./operations/key-rotation.md). | ||||
| - Grafana/Prometheus dashboards for auth latency/issuance. | ||||
|  | ||||
| ## Related resources | ||||
| - ./operations/backup-restore.md | ||||
| - ./operations/key-rotation.md | ||||
| - ./operations/monitoring.md | ||||
| - ./operations/grafana-dashboard.json | ||||
|  | ||||
| ## Backlog references | ||||
| - DOCS-SEC-62-001 (scope hardening doc) in ../../TASKS.md. | ||||
| - AUTH-POLICY-20-001/002 follow-ups in src/Authority/StellaOps.Authority/TASKS.md. | ||||
|  | ||||
| ## Epic alignment | ||||
| - **Epic 1 – AOC enforcement:** enforce OpTok scopes and guardrails supporting raw ingestion boundaries. | ||||
| - **Epic 2 – Policy Engine & Editor:** supply policy evaluation/principal scopes and short-lived tokens for evaluator workflows. | ||||
| - **Epic 4 – Policy Studio:** integrate approval/promotion signatures and policy registry access controls. | ||||
| - **Epic 14 – Identity & Tenancy:** deliver tenant isolation, RBAC hierarchies, and governance tooling for authentication. | ||||
							
								
								
									
										9
									
								
								docs/modules/authority/TASKS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,9 @@ | ||||
| # Task board — Authority | ||||
|  | ||||
| > Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable. | ||||
|  | ||||
| | ID | Status | Owner(s) | Description | Notes | | ||||
| |----|--------|----------|-------------|-------| | ||||
| | AUTHORITY-DOCS-0001 | TODO | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md | | ||||
| | AUTHORITY-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md | | ||||
| | AUTHORITY-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow | | ||||
							
								
								
									
										445
									
								
								docs/modules/authority/architecture.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,445 @@ | ||||
| # component_architecture_authority.md — **Stella Ops Authority** (2025Q4) | ||||
|  | ||||
| > Consolidates identity and tenancy requirements documented across the AOC, Policy, and Platform guides, along with the dedicated Authority implementation plan. | ||||
|  | ||||
| > **Scope.** Implementation‑ready architecture for **Stella Ops Authority**: the on‑prem **OIDC/OAuth2** service that issues **short‑lived, sender‑constrained operational tokens (OpToks)** to first‑party services and tools. Covers protocols (DPoP & mTLS binding), token shapes, endpoints, storage, rotation, HA, RBAC, audit, and testing. This component is the trust anchor for *who* is calling inside a Stella Ops installation. (Entitlement is proven separately by **PoE** from the cloud Licensing Service; Authority does not issue PoE.) | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 0) Mission & boundaries | ||||
|  | ||||
| **Mission.** Provide **fast, local, verifiable** authentication for Stella Ops microservices and tools by minting **very short‑lived** OAuth2/OIDC tokens that are **sender‑constrained** (DPoP or mTLS‑bound). Support RBAC scopes, multi‑tenant claims, and deterministic validation for APIs (Scanner, Signer, Attestor, Excititor, Concelier, UI, CLI, Zastava). | ||||
|  | ||||
| **Boundaries.** | ||||
|  | ||||
| * Authority **does not** validate entitlements/licensing. That’s enforced by **Signer** using **PoE** with the cloud Licensing Service. | ||||
| * Authority tokens are **operational only** (2–5 min TTL) and must not be embedded in long‑lived artifacts or stored in SBOMs. | ||||
| * Authority is **stateless for validation** (JWT) and offers **optional introspection** for services that prefer online checks. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) Protocols & cryptography | ||||
|  | ||||
| * **OIDC Discovery**: `/.well-known/openid-configuration` | ||||
| * **OAuth2** grant types: | ||||
|  | ||||
|   * **Client Credentials** (service↔service, with mTLS or private_key_jwt) | ||||
|   * **Device Code** (CLI login on headless agents; optional) | ||||
|   * **Authorization Code + PKCE** (browser login for UI; optional) | ||||
| * **Sender constraint options** (choose per caller or per audience): | ||||
|  | ||||
|   * **DPoP** (Demonstration of Proof‑of‑Possession): proof JWT on each HTTP request, bound to the access token via `cnf.jkt`. | ||||
|   * **OAuth 2.0 mTLS** (certificate‑bound tokens): token bound to client certificate thumbprint via `cnf.x5t#S256`. | ||||
| * **Signing algorithms**: **EdDSA (Ed25519)** preferred; fallback **ES256 (P‑256)**. Rotation is supported via **kid** in JWKS. | ||||
| * **Token format**: **JWT** access tokens (compact), optionally opaque reference tokens for services that insist on introspection. | ||||
| * **Clock skew tolerance**: ±60 s; issue `nbf`, `iat`, `exp` accordingly. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2) Token model | ||||
|  | ||||
| ### 2.1 Access token (OpTok) — short‑lived (120–300 s) | ||||
|  | ||||
| **Registered claims** | ||||
|  | ||||
| ``` | ||||
| iss   = https://authority.<domain> | ||||
| sub   = <client_id or user_id> | ||||
| aud   = <service audience: signer|scanner|attestor|concelier|excititor|ui|zastava> | ||||
| exp   = <unix ts>  (<= 300 s from iat) | ||||
| iat   = <unix ts> | ||||
| nbf   = iat - 30 | ||||
| jti   = <uuid> | ||||
| scope = "scanner.scan scanner.export signer.sign ..." | ||||
| ``` | ||||
|  | ||||
| **Sender‑constraint (`cnf`)** | ||||
|  | ||||
| * **DPoP**: | ||||
|  | ||||
|   ```json | ||||
|   "cnf": { "jkt": "<base64url(SHA-256(JWK))>" } | ||||
|   ``` | ||||
| * **mTLS**: | ||||
|  | ||||
|   ```json | ||||
|   "cnf": { "x5t#S256": "<base64url(SHA-256(client_cert_der))>" } | ||||
|   ``` | ||||
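The two `cnf` values above are both base64url-encoded SHA-256 digests. The sketch below computes a JWK thumbprint in the manner of RFC 7638 (required members only, lexicographic order, no whitespace) and a certificate thumbprint over the DER bytes. Function names are illustrative; only the three key types listed in `_REQUIRED_MEMBERS` are handled.

```python
import base64
import hashlib
import json

# Required members per key type, as used for JWK thumbprints.
_REQUIRED_MEMBERS = {
    "OKP": ("crv", "kty", "x"),
    "EC": ("crv", "kty", "x", "y"),
    "RSA": ("e", "kty", "n"),
}


def b64url(data: bytes) -> str:
    """Base64url without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """Value for cnf.jkt: hash only the required members, sorted by name."""
    members = _REQUIRED_MEMBERS[jwk["kty"]]
    canonical = json.dumps(
        {name: jwk[name] for name in members},
        sort_keys=True,
        separators=(",", ":"),
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def cert_thumbprint(cert_der: bytes) -> str:
    """Value for cnf.x5t#S256: SHA-256 over the DER-encoded certificate."""
    return b64url(hashlib.sha256(cert_der).digest())
```

Optional JWK members such as `kid` or `use` do not affect the thumbprint, so a client can rotate metadata without invalidating the binding.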
|  | ||||
| **Install/tenant context (custom claims)** | ||||
|  | ||||
| ``` | ||||
| tid          = <tenant id>               // multi-tenant | ||||
| inst         = <installation id>        // unique installation | ||||
| roles        = [ "svc.scanner", "svc.signer", "ui.admin", ... ] | ||||
| plan?        = <plan name>              // optional hint for UIs; not used for enforcement | ||||
| ``` | ||||
|  | ||||
| > **Note**: Do **not** copy PoE claims into OpTok; OpTok ≠ entitlement. Only **Signer** checks PoE. | ||||
|  | ||||
| ### 2.2 Refresh tokens (optional) | ||||
|  | ||||
| * Default **disabled**. If enabled (for UI interactive logins), pair with **DPoP‑bound** refresh tokens or **mTLS** client sessions; short TTL (≤ 8 h), rotating on use (replay‑safe). | ||||
|  | ||||
| ### 2.3 ID tokens (optional) | ||||
|  | ||||
| * Issued for UI/browser OIDC flows (Authorization Code + PKCE); not used for service auth. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3) Endpoints & flows | ||||
|  | ||||
| ### 3.1 OIDC discovery & keys | ||||
|  | ||||
| * `GET /.well-known/openid-configuration` → endpoints, algs, jwks_uri | ||||
| * `GET /jwks` → JSON Web Key Set (rotating, at least 2 active keys during transition) | ||||
|  | ||||
| ### 3.2 Token issuance | ||||
|  | ||||
| * `POST /oauth/token` | ||||
|  | ||||
|   * **Client Credentials** (service→service): | ||||
|  | ||||
|     * **mTLS**: mutual TLS + `client_id` → bound token (`cnf.x5t#S256`) | ||||
|       * `security.senderConstraints.mtls.enforceForAudiences` forces the mTLS path when requested `aud`/`resource` values intersect high-value audiences (defaults include `signer`). Authority rejects clients attempting to use DPoP/basic secrets for these audiences. | ||||
|       * Stored `certificateBindings` are authoritative: thumbprint, subject, issuer, serial number, and SAN values are matched against the presented certificate, with rotation grace applied to activation windows. Failures surface deterministic error codes (e.g. `certificate_binding_subject_mismatch`). | ||||
|     * **private_key_jwt**: JWT‑based client auth + **DPoP** header (preferred for tools and CLI) | ||||
|   * **Device Code** (CLI): `POST /oauth/device/code` + `POST /oauth/token` poll | ||||
|   * **Authorization Code + PKCE** (UI): standard | ||||
|  | ||||
| **DPoP handshake (example)** | ||||
|  | ||||
| 1. Client prepares **JWK** (ephemeral keypair). | ||||
| 2. Client sends **DPoP proof** header with fields: | ||||
|  | ||||
|    ``` | ||||
|    htm=POST | ||||
|    htu=https://authority.../oauth/token | ||||
|    iat=<now> | ||||
|    jti=<uuid> | ||||
|    ``` | ||||
|  | ||||
|    signed with the DPoP private key; header carries JWK. | ||||
| 3. Authority validates proof; issues access token with `cnf.jkt=<thumbprint(JWK)>`. | ||||
| 4. Client uses the same DPoP key to sign **every subsequent API request** to services (Signer, Scanner, …). | ||||
|  | ||||
| **mTLS flow** | ||||
|  | ||||
| * Mutual TLS at the connection; Authority extracts client cert, validates chain; token carries `cnf.x5t#S256`. | ||||
|  | ||||
| ### 3.3 Introspection & revocation (optional) | ||||
|  | ||||
| * `POST /oauth/introspect` → `{ active, sub, scope, aud, exp, cnf, ... }` | ||||
| * `POST /oauth/revoke` → revokes refresh tokens or opaque access tokens. | ||||
| * **Replay prevention**: maintain **DPoP `jti` cache** (TTL ≤ 10 min) to reject duplicate proofs when services supply DPoP nonces (Signer requires nonce for high‑value operations). | ||||
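The replay-prevention bullet can be sketched as a small `jti` cache plus a check of the proof's `htm`, `htu`, and `iat` fields. This is an in-memory illustration only; signature verification of the proof JWT is deliberately out of scope, and the class and function names are hypothetical. A multi-replica deployment would need a shared store rather than a local dict.

```python
from typing import Dict


class JtiReplayCache:
    """Remembers DPoP proof identifiers for `ttl_seconds` (default 10 min)."""

    def __init__(self, ttl_seconds: float = 600.0):
        self.ttl = ttl_seconds
        self._expires_at: Dict[str, float] = {}

    def check_and_store(self, jti: str, now: float) -> bool:
        """Return True for a first-seen jti, False for a replay."""
        # Drop entries whose TTL has elapsed.
        for key in [k for k, exp in self._expires_at.items() if exp <= now]:
            del self._expires_at[key]
        if jti in self._expires_at:
            return False
        self._expires_at[jti] = now + self.ttl
        return True


def proof_claims_ok(
    claims: dict, method: str, url: str, now: float, skew: float = 60.0
) -> bool:
    """Check that the proof targets this request and is fresh (within skew)."""
    iat = claims.get("iat")
    if not isinstance(iat, (int, float)):
        return False
    return (
        claims.get("htm") == method
        and claims.get("htu") == url
        and abs(now - iat) <= skew
    )
```

The default skew of 60 s mirrors the clock skew tolerance stated in section 1.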
|  | ||||
| ### 3.4 UserInfo (optional for UI) | ||||
|  | ||||
| * `GET /userinfo` (ID token context). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4) Audiences, scopes & RBAC | ||||
|  | ||||
| ### 4.1 Audiences | ||||
|  | ||||
| * `signer` — only the **Signer** service should accept tokens with `aud=signer`. | ||||
| * `attestor`, `scanner`, `concelier`, `excititor`, `ui`, `zastava` similarly. | ||||
|  | ||||
| Services **must** verify `aud` and **sender constraint** (DPoP/mTLS) per their policy. | ||||
|  | ||||
| ### 4.2 Core scopes | ||||
|  | ||||
| | Scope                              | Service            | Operation                  | | ||||
| | ---------------------------------- | ------------------ | -------------------------- | | ||||
| | `signer.sign`                      | Signer             | Request DSSE signing       | | ||||
| | `attestor.write`                   | Attestor           | Submit Rekor entries       | | ||||
| | `scanner.scan`                     | Scanner.WebService | Submit scan jobs           | | ||||
| | `scanner.export`                   | Scanner.WebService | Export SBOMs               | | ||||
| | `scanner.read`                     | Scanner.WebService | Read catalog/SBOMs         | | ||||
| | `vex.read` / `vex.admin`           | Excititor              | Query/operate              | | ||||
| | `concelier.read` / `concelier.export`  | Concelier            | Query/exports              | | ||||
| | `ui.read` / `ui.admin`             | UI                 | View/admin                 | | ||||
| | `zastava.emit` / `zastava.enforce` | Scanner/Zastava    | Runtime events / admission | | ||||
|  | ||||
| **Roles → scopes mapping** is configured centrally (Authority policy) and pushed during token issuance. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Storage & state | ||||
|  | ||||
| * **Configuration DB** (PostgreSQL/MySQL): clients, audiences, role→scope maps, tenant/installation registry, device code grants, persistent consents (if any). | ||||
| * **Cache** (Redis): | ||||
|  | ||||
|   * DPoP **jti** replay cache (short TTL) | ||||
|   * **Nonce** store (per resource server, if they demand nonce) | ||||
|   * Device code pollers, rate limiting buckets | ||||
| * **JWKS**: key material in HSM/KMS or encrypted at rest; JWKS served from memory. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Key management & rotation | ||||
|  | ||||
| * Maintain **at least 2 signing keys** active during rotation; tokens carry `kid`. | ||||
| * Prefer **Ed25519** for compact tokens; maintain **ES256** fallback for FIPS contexts. | ||||
| * Rotation cadence: 30–90 days; emergency rotation supported. | ||||
| * Publish new JWKS **before** issuing tokens with the new `kid` to avoid cold‑start validation misses. | ||||
| * Keep **old keys** available **at least** for max token TTL + 5 minutes. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 7) HA & performance | ||||
|  | ||||
| * **Stateless issuance** (except device codes/refresh) → scale horizontally behind a load‑balancer. | ||||
| * **DB** only for client metadata and optional flows; token checks are JWT‑local; introspection endpoints hit cache/DB minimally. | ||||
| * **Targets**: | ||||
|  | ||||
|   * Token issuance P95 ≤ **20 ms** under warm cache. | ||||
|   * DPoP proof validation ≤ **1 ms** extra per request at resource servers (Signer/Scanner). | ||||
|   * 99.9% uptime; HPA on CPU/latency. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 8) Security posture | ||||
|  | ||||
| * **Strict TLS** (1.3 preferred); HSTS; modern cipher suites. | ||||
| * **mTLS** enabled where required (Signer/Attestor paths). | ||||
| * **Replay protection**: DPoP `jti` cache, nonce support for **Signer** (add `DPoP-Nonce` header on 401; clients re‑sign). | ||||
| * **Rate limits** per client & per IP; exponential backoff on failures. | ||||
| * **Secrets**: clients use **private_key_jwt** or **mTLS**; never basic secrets over the wire. | ||||
| * **CSP/CSRF** hardening on UI flows; `SameSite=Lax` cookies; PKCE enforced. | ||||
| * **Logs** redact `Authorization` and DPoP proofs; store `sub`, `aud`, `scopes`, `inst`, `tid`, `cnf` thumbprints, not full keys. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 9) Multi‑tenancy & installations | ||||
|  | ||||
| * **Tenant (`tid`)** and **Installation (`inst`)** registries define which audiences/scopes a client can request. | ||||
| * Cross‑tenant isolation enforced at issuance (disallow rogue `aud`), and resource servers **must** check that `tid` matches their configured tenant. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 10) Admin & operations APIs | ||||
|  | ||||
| All under `/admin` (mTLS + `authority.admin` scope). | ||||
|  | ||||
| ``` | ||||
| POST /admin/clients                 # create/update client (confidential/public) | ||||
| POST /admin/audiences               # register audience resource URIs | ||||
| POST /admin/roles                   # define role→scope mappings | ||||
| POST /admin/tenants                 # create tenant/install entries | ||||
| POST /admin/keys/rotate             # rotate signing key (zero-downtime) | ||||
| GET  /admin/metrics                 # Prometheus exposition (token issue rates, errors) | ||||
| GET  /admin/healthz|readyz          # health/readiness | ||||
| ``` | ||||
|  | ||||
| Declared client `audiences` flow through to the issued JWT `aud` claim and the token request's `resource` indicators. Authority relies on this metadata to enforce DPoP nonce challenges for `signer`, `attestor`, and other high-value services without requiring clients to repeat the audience parameter on every request. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 11) Integration hard lines (what resource servers must enforce) | ||||
|  | ||||
| Every Stella Ops service that consumes Authority tokens **must**: | ||||
|  | ||||
| 1. Verify JWT signature (`kid` in JWKS), `iss`, `aud`, `exp`, `nbf`. | ||||
| 2. Enforce **sender‑constraint**: | ||||
|  | ||||
|    * **DPoP**: validate DPoP proof (`htu`, `htm`, `iat`, `jti`) and match `cnf.jkt`; cache `jti` for replay defense; honor nonce challenges. | ||||
|    * **mTLS**: match presented client cert thumbprint to token `cnf.x5t#S256`. | ||||
| 3. Check **scopes**; optionally map to internal roles. | ||||
| 4. Check **tenant** (`tid`) and **installation** (`inst`) as appropriate. | ||||
| 5. For **Signer** only: require **both** OpTok and **PoE** in the request (enforced by Signer, not Authority). | ||||
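The claim checks in steps 1, 3 and 4 can be sketched as a single function. This is a simplified illustration that assumes the JWT signature has already been verified and that sender-constraint validation happens separately. The function name `check_claims` and the 30-second skew default are assumptions, not Authority behaviour.

```python
import time


def check_claims(claims, *, issuer, audience, required_scope, tenant,
                 now=None, skew=30):
    """Return the names of failed checks; an empty list means the token passes.

    The caller must verify the JWT signature before trusting `claims`.
    """
    now = time.time() if now is None else now
    problems = []
    if claims.get("iss") != issuer:
        problems.append("iss")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        problems.append("aud")
    if claims.get("exp", 0) <= now - skew:
        problems.append("exp")  # expired, allowing for clock skew
    if claims.get("nbf", 0) > now + skew:
        problems.append("nbf")  # not yet valid
    if required_scope not in claims.get("scope", "").split():
        problems.append("scope")
    if claims.get("tid") != tenant:
        problems.append("tid")
    if not claims.get("cnf"):
        problems.append("cnf")  # bearer token without sender constraint
    return problems
```

Returning the list of failed checks, rather than a boolean, makes it easier to log a deterministic rejection reason.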
|  | ||||
| --- | ||||
|  | ||||
| ## 12) Error surfaces & UX | ||||
|  | ||||
| * Token endpoint errors follow OAuth2 (`invalid_client`, `invalid_grant`, `invalid_scope`, `unauthorized_client`). | ||||
| * Resource servers use RFC 6750 style (`WWW-Authenticate: DPoP error="invalid_token", error_description="…", dpop_nonce="…"`). | ||||
| * For DPoP nonce challenges, clients retry once with the server‑supplied nonce. | ||||
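The single-retry rule for nonce challenges might look like the sketch below on the client side. The callables `send` and `make_proof` are hypothetical stand-ins for an HTTP client and a DPoP proof generator, and reading the nonce from a `DPoP-Nonce` response header is an assumption based on the header named in the security posture section.

```python
def call_with_dpop_nonce(send, make_proof):
    """Send a request, retrying exactly once if the server demands a nonce.

    send(proof) -> (status, headers, body)
    make_proof(nonce_or_None) -> proof string
    """
    status, headers, body = send(make_proof(None))
    nonce = headers.get("DPoP-Nonce")
    if status == 401 and nonce:
        # Re-sign the proof with the server-supplied nonce and retry once.
        status, headers, body = send(make_proof(nonce))
    return status, body
```

Capping the retry at one attempt keeps a misbehaving server from driving the client into a loop.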
|  | ||||
| --- | ||||
|  | ||||
| ## 13) Observability & audit | ||||
|  | ||||
| * **Metrics**: | ||||
|  | ||||
|   * `authority.tokens_issued_total{grant,aud}` | ||||
|   * `authority.dpop_validations_total{result}` | ||||
|   * `authority.mtls_bindings_total{result}` | ||||
|   * `authority.jwks_rotations_total` | ||||
|   * `authority.errors_total{type}` | ||||
| * **Audit log** (immutable sink): token issuance (`sub`, `aud`, `scopes`, `tid`, `inst`, `cnf thumbprint`, `jti`), revocations, admin changes. | ||||
| * **Tracing**: token flows, DB reads, JWKS cache. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 14) Configuration (YAML) | ||||
|  | ||||
| ```yaml | ||||
| authority: | ||||
|   issuer: "https://authority.internal" | ||||
|   signing: | ||||
|     enabled: true | ||||
|     activeKeyId: "authority-signing-2025" | ||||
|     keyPath: "../certificates/authority-signing-2025.pem" | ||||
|     algorithm: "ES256" | ||||
|     keySource: "file" | ||||
|   security: | ||||
|     rateLimiting: | ||||
|       token: | ||||
|         enabled: true | ||||
|         permitLimit: 30 | ||||
|         window: "00:01:00" | ||||
|         queueLimit: 0 | ||||
|       authorize: | ||||
|         enabled: true | ||||
|         permitLimit: 60 | ||||
|         window: "00:01:00" | ||||
|         queueLimit: 10 | ||||
|       internal: | ||||
|         enabled: false | ||||
|         permitLimit: 5 | ||||
|         window: "00:01:00" | ||||
|         queueLimit: 0 | ||||
|     senderConstraints: | ||||
|       dpop: | ||||
|         enabled: true | ||||
|         allowedAlgorithms: [ "ES256", "ES384" ] | ||||
|         proofLifetime: "00:02:00" | ||||
|         allowedClockSkew: "00:00:30" | ||||
|         replayWindow: "00:05:00" | ||||
|         nonce: | ||||
|           enabled: true | ||||
|           ttl: "00:10:00" | ||||
|           maxIssuancePerMinute: 120 | ||||
|           store: "redis" | ||||
|           redisConnectionString: "redis://authority-redis:6379?ssl=false" | ||||
|           requiredAudiences: | ||||
|             - "signer" | ||||
|             - "attestor" | ||||
|       mtls: | ||||
|         enabled: true | ||||
|         requireChainValidation: true | ||||
|         rotationGrace: "00:15:00" | ||||
|         enforceForAudiences: | ||||
|           - "signer" | ||||
|         allowedSanTypes: | ||||
|           - "dns" | ||||
|           - "uri" | ||||
|         allowedCertificateAuthorities: | ||||
|           - "/etc/ssl/mtls/clients-ca.pem" | ||||
|   clients: | ||||
|     - clientId: scanner-web | ||||
|       grantTypes: [ "client_credentials" ] | ||||
|       audiences: [ "scanner" ] | ||||
|       auth: { type: "private_key_jwt", jwkFile: "/secrets/scanner-web.jwk" } | ||||
|       senderConstraint: "dpop" | ||||
|       scopes: [ "scanner.scan", "scanner.export", "scanner.read" ] | ||||
|     - clientId: signer | ||||
|       grantTypes: [ "client_credentials" ] | ||||
|       audiences: [ "signer" ] | ||||
|       auth: { type: "mtls" } | ||||
|       senderConstraint: "mtls" | ||||
|       scopes: [ "signer.sign" ] | ||||
|     - clientId: notify-web-dev | ||||
|       grantTypes: [ "client_credentials" ] | ||||
|       audiences: [ "notify.dev" ] | ||||
|       auth: { type: "client_secret", secretFile: "/secrets/notify-web-dev.secret" } | ||||
|       senderConstraint: "dpop" | ||||
|       scopes: [ "notify.read", "notify.admin" ] | ||||
|     - clientId: notify-web | ||||
|       grantTypes: [ "client_credentials" ] | ||||
|       audiences: [ "notify" ] | ||||
|       auth: { type: "client_secret", secretFile: "/secrets/notify-web.secret" } | ||||
|       senderConstraint: "dpop" | ||||
|       scopes: [ "notify.read", "notify.admin" ] | ||||
| ``` | ||||
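To make the DPoP timing settings concrete, the sketch below parses the `hh:mm:ss` values and applies them to a proof's `iat`. How Authority actually combines `proofLifetime` and `allowedClockSkew` is not stated here, so the acceptance window used below is an assumption, as are the helper names `parse_hms` and `proof_is_fresh`.

```python
from datetime import timedelta


def parse_hms(value: str) -> timedelta:
    """Parse an "hh:mm:ss" duration (day components are not handled)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def proof_is_fresh(iat: float, now: float,
                   proof_lifetime: str = "00:02:00",
                   allowed_clock_skew: str = "00:00:30") -> bool:
    """Accept iat no older than lifetime + skew and no newer than now + skew."""
    lifetime = parse_hms(proof_lifetime).total_seconds()
    skew = parse_hms(allowed_clock_skew).total_seconds()
    return (now - lifetime - skew) <= iat <= (now + skew)
```

With the values shown in the configuration, a proof is accepted for at most 150 seconds after its `iat` under this interpretation.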
|  | ||||
| --- | ||||
|  | ||||
| ## 15) Testing matrix | ||||
|  | ||||
| * **JWT validation**: wrong `aud`, expired `exp`, skewed `nbf`, stale `kid`. | ||||
| * **DPoP**: invalid `htu`/`htm`, replayed `jti`, stale `iat`, wrong `jkt`, nonce dance. | ||||
| * **mTLS**: wrong client cert, wrong CA, thumbprint mismatch. | ||||
| * **RBAC**: scope enforcement per audience; over‑privileged client denied. | ||||
| * **Rotation**: JWKS rotation while load‑testing; zero‑downtime verification. | ||||
| * **HA**: kill one Authority instance; verify issuance continues; JWKS served by peers. | ||||
| * **Performance**: 1k token issuance/sec on 2 cores with Redis enabled for jti caching. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 16) Threat model & mitigations (summary) | ||||
|  | ||||
| | Threat              | Vector           | Mitigation                                                                                 | | ||||
| | ------------------- | ---------------- | ------------------------------------------------------------------------------------------ | | ||||
| | Token theft         | Copy of JWT      | **Short TTL**, **sender‑constraint** (DPoP/mTLS); replay blocked by `jti` cache and nonces | | ||||
| | Replay across hosts | Reuse DPoP proof | Enforce `htu`/`htm`, `iat` freshness, `jti` uniqueness; services may require **nonce**     | | ||||
| | Impersonation       | Fake client      | mTLS or `private_key_jwt` with pinned JWK; client registration & rotation                  | | ||||
| | Key compromise      | Signing key leak | HSM/KMS storage, key rotation, audit; emergency key revoke path; narrow token TTL          | | ||||
| | Cross‑tenant abuse  | Scope elevation  | Enforce `aud`, `tid`, `inst` at issuance and resource servers                              | | ||||
| | Downgrade to bearer | Strip DPoP       | Resource servers require DPoP/mTLS based on `aud`; reject bearer without `cnf`             | | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 17) Deployment & HA | ||||
|  | ||||
| * **Stateless** microservice, containerized; run ≥ 2 replicas behind LB. | ||||
| * **DB**: HA Postgres (or MySQL) for clients/roles; **Redis** for device codes, DPoP nonces/jtis. | ||||
| * **Secrets**: mount client JWKs via K8s Secrets/HashiCorp Vault; signing keys via KMS. | ||||
| * **Backups**: DB daily; Redis not critical (ephemeral). | ||||
| * **Disaster recovery**: export/import of client registry; JWKS rehydrate from KMS. | ||||
| * **Compliance**: TLS audit; penetration testing for OIDC flows. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 18) Implementation notes | ||||
|  | ||||
| * Reference stack: **.NET 10** + **OpenIddict 6** (or IdentityServer if licensed) with custom DPoP validator and mTLS binding middleware. | ||||
| * Keep the DPoP/JTI cache pluggable; allow Redis/Memcached. | ||||
| * Provide **client SDKs** for C# and Go: DPoP key mgmt, proof generation, nonce handling, token refresh helper. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 19) Quick reference — wire examples | ||||
|  | ||||
| **Access token (payload excerpt)** | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "iss": "https://authority.internal", | ||||
|   "sub": "scanner-web", | ||||
|   "aud": "signer", | ||||
|   "exp": 1760668800, | ||||
|   "iat": 1760668620, | ||||
|   "nbf": 1760668620, | ||||
|   "jti": "9d9c3f01-6e1a-49f1-8f77-9b7e6f7e3c50", | ||||
|   "scope": "signer.sign", | ||||
|   "tid": "tenant-01", | ||||
|   "inst": "install-7A2B", | ||||
|   "cnf": { "jkt": "KcVb2V...base64url..." } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| **DPoP proof claims (JWT payload of the `DPoP` header, for POST /sign/dsse)** | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "htu": "https://signer.internal/sign/dsse", | ||||
|   "htm": "POST", | ||||
|   "iat": 1760668620, | ||||
|   "jti": "4b1c9b3c-8a95-4c58-8a92-9c6cfb4a6a0b" | ||||
| } | ||||
| ``` | ||||
|  | ||||
| Signer validates that the SHA‑256 thumbprint (RFC 7638) of the JWK carried in the proof's JOSE header matches `cnf.jkt` in the token. | ||||
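The thumbprint comparison can be sketched as follows, assuming `cnf.jkt` is the RFC 7638 JWK thumbprint: SHA-256 over a canonical JSON object containing only the required key members, base64url-encoded without padding. The function name `jwk_thumbprint` is a hypothetical helper.

```python
import base64
import hashlib
import json

# Required members per key type (RFC 7638; OKP members per RFC 8037).
_REQUIRED_MEMBERS = {
    "EC": ("crv", "kty", "x", "y"),
    "OKP": ("crv", "kty", "x"),
    "RSA": ("e", "kty", "n"),
}


def jwk_thumbprint(jwk: dict) -> str:
    """Compute the SHA-256 JWK thumbprint used as cnf.jkt."""
    members = _REQUIRED_MEMBERS[jwk["kty"]]
    # Canonical form: required members only, sorted keys, no whitespace.
    canonical = json.dumps({name: jwk[name] for name in members},
                           separators=(",", ":"), sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

Optional members such as `kid` or `use` are excluded from the hash, so the same key always yields the same thumbprint regardless of how the client annotates it.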
|  | ||||
| --- | ||||
|  | ||||
| ## 20) Rollout plan | ||||
|  | ||||
| 1. **MVP**: Client Credentials (private_key_jwt + DPoP), JWKS, short OpToks, per‑audience scopes. | ||||
| 2. **Add**: mTLS‑bound tokens for Signer/Attestor; device code for CLI; optional introspection. | ||||
| 3. **Hardening**: DPoP nonce support; full audit pipeline; HA tuning. | ||||
| 4. **UX**: Tenant/installation admin UI; role→scope editors; client bootstrap wizards. | ||||
							
								
								
									
										22
									
								
								docs/modules/authority/implementation_plan.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,22 @@ | ||||
| # Implementation plan — Authority | ||||
|  | ||||
| ## Current objectives | ||||
| - Maintain deterministic behaviour and offline parity across releases. | ||||
| - Keep documentation, telemetry, and runbooks aligned with the latest sprint outcomes. | ||||
|  | ||||
| ## Workstreams | ||||
| - Backlog grooming: reconcile open stories in ../../TASKS.md with this module's roadmap. | ||||
| - Implementation: collaborate with service owners to land feature work defined in SPRINTS/EPIC docs. | ||||
| - Validation: extend tests/fixtures to preserve determinism and provenance requirements. | ||||
|  | ||||
| ## Epic milestones | ||||
| - **Epic 1 – AOC enforcement:** deliver OpTok scopes, guardrails, and AOC verifier hooks for ingestion services. | ||||
| - **Epic 2 – Policy Engine & Editor:** support policy evaluator flows (device-code, client credentials, scope sandboxing). | ||||
| - **Epic 4 – Policy Studio:** provide registry/promotion signing, approvals, and fresh-auth prompts. | ||||
| - **Epic 14 – Identity & Tenancy:** implement tenant isolation, RBAC hierarchies, audit trails, and PoE integration. | ||||
| - Track additional work (DOCS-SEC-62-001, AUTH-POLICY-20-001/002) in ../../TASKS.md and src/Authority/**/TASKS.md. | ||||
|  | ||||
| ## Coordination | ||||
| - Review ./AGENTS.md before picking up new work. | ||||
| - Sync with cross-cutting teams noted in ../../implplan/SPRINTS.md. | ||||
| - Update this plan whenever scope, dependencies, or guardrails change. | ||||
							
								
								
									
										97
									
								
								docs/modules/authority/operations/backup-restore.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,97 @@ | ||||
| # Authority Backup & Restore Runbook | ||||
|  | ||||
| ## Scope | ||||
| - **Applies to:** StellaOps Authority deployments running the official `ops/authority/docker-compose.authority.yaml` stack or equivalent Kubernetes packaging. | ||||
| - **Artifacts covered:** MongoDB (`stellaops-authority` database), Authority configuration (`etc/authority.yaml`), plugin manifests under `etc/authority.plugins/`, and signing key material stored in the `authority-keys` volume (defaults to `/app/keys` inside the container). | ||||
| - **Frequency:** Run the full procedure prior to upgrades, before rotating keys, and at least once per 24 h in production. Store snapshots in an encrypted, access-controlled vault. | ||||
|  | ||||
| ## Inventory Checklist | ||||
| | Component | Location (compose default) | Notes | | ||||
| | --- | --- | --- | | ||||
| | Mongo data | `mongo-data` volume (`/var/lib/docker/volumes/.../mongo-data`) | Contains all Authority collections (`AuthorityUser`, `AuthorityClient`, `AuthorityToken`, etc.). | | ||||
| | Configuration | `etc/authority.yaml` | Mounted read-only into the container at `/etc/authority.yaml`. | | ||||
| | Plugin manifests | `etc/authority.plugins/*.yaml` | Includes `standard.yaml` with `tokenSigning.keyDirectory`. | | ||||
| | Signing keys | `authority-keys` volume -> `/app/keys` | Path is derived from `tokenSigning.keyDirectory` (defaults to `../keys` relative to the manifest). | | ||||
|  | ||||
| > **TIP:** Confirm the deployed key directory via `tokenSigning.keyDirectory` in `etc/authority.plugins/standard.yaml`; some installations relocate keys to `/var/lib/stellaops/authority/keys`. | ||||
|  | ||||
| ## Hot Backup (no downtime) | ||||
| 1. **Create output directory:** `mkdir -p backup/$(date +%Y-%m-%d)` on the host. | ||||
| 2. **Dump Mongo:** | ||||
|    ```bash | ||||
|    TS=$(date -u +%Y%m%dT%H%M%SZ)  # capture once so the dump and the copy use the same file name | ||||
|    docker compose -f ops/authority/docker-compose.authority.yaml exec mongo \ | ||||
|      mongodump --archive=/dump/authority-${TS}.gz \ | ||||
|      --gzip --db stellaops-authority | ||||
|    docker compose -f ops/authority/docker-compose.authority.yaml cp \ | ||||
|      mongo:/dump/authority-${TS}.gz backup/ | ||||
|    ``` | ||||
|    The `mongodump` archive preserves indexes and can be restored with `mongorestore --archive --gzip`. | ||||
| 3. **Capture configuration + manifests:** | ||||
|    ```bash | ||||
|    cp etc/authority.yaml backup/ | ||||
|    rsync -a etc/authority.plugins/ backup/authority.plugins/ | ||||
|    ``` | ||||
| 4. **Export signing keys:** the compose file maps `authority-keys` to a local Docker volume. Snapshot it without stopping the service: | ||||
|    ```bash | ||||
|    docker run --rm \ | ||||
|      -v authority-keys:/keys \ | ||||
|      -v "$(pwd)/backup:/backup" \ | ||||
|      busybox tar czf /backup/authority-keys-$(date -u +%Y%m%dT%H%M%SZ).tar.gz -C /keys . | ||||
|    ``` | ||||
| 5. **Checksum:** generate SHA-256 digests for every file and store them alongside the artefacts. | ||||
| 6. **Encrypt & upload:** wrap the backup folder using your secrets management standard (e.g., age, GPG) and upload to the designated offline vault. | ||||
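The checksum step can be sketched as a small script. It assumes the artefacts live under a single backup directory and writes a manifest in the `<hex digest>  <relative path>` layout that `sha256sum -c` reads. The function name `write_sha256_manifest` and the manifest file name `SHA256SUMS` are assumptions for the example.

```python
import hashlib
from pathlib import Path


def write_sha256_manifest(backup_dir, manifest_name="SHA256SUMS"):
    """Hash every file under backup_dir and write a manifest next to them."""
    root = Path(backup_dir)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.name == manifest_name:
            continue  # never hash the manifest itself
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large dumps do not load into memory.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        lines.append(f"{digest.hexdigest()}  {path.relative_to(root).as_posix()}")
    manifest = root / manifest_name
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Run it before the encrypt step so the digests describe the plaintext artefacts; after a restore, recompute and compare them before starting services.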
|  | ||||
| ## Cold Backup (planned downtime) | ||||
| 1. Notify stakeholders and drain traffic (CLI clients should refresh tokens afterwards). | ||||
| 2. Stop services: | ||||
|    ```bash | ||||
|    docker compose -f ops/authority/docker-compose.authority.yaml down | ||||
|    ``` | ||||
| 3. Back up volumes directly using `tar`: | ||||
|    ```bash | ||||
|    docker run --rm -v mongo-data:/data -v "$(pwd)/backup:/backup" \ | ||||
|      busybox tar czf /backup/mongo-data-$(date +%Y%m%d).tar.gz -C /data . | ||||
|    docker run --rm -v authority-keys:/keys -v "$(pwd)/backup:/backup" \ | ||||
|      busybox tar czf /backup/authority-keys-$(date +%Y%m%d).tar.gz -C /keys . | ||||
|    ``` | ||||
| 4. Copy configuration + manifests as in hot backup step 3, then checksum and encrypt the artefacts as in hot backup steps 5–6. | ||||
| 5. Restart services and verify health: | ||||
|    ```bash | ||||
|    docker compose -f ops/authority/docker-compose.authority.yaml up -d | ||||
|    curl -fsS http://localhost:8080/ready | ||||
|    ``` | ||||
|  | ||||
| ## Restore Procedure | ||||
| 1. **Provision clean volumes:** remove existing volumes if you’re rebuilding a node (`docker volume rm mongo-data authority-keys`), then recreate the compose stack so empty volumes exist. | ||||
| 2. **Restore Mongo:** | ||||
|    ```bash | ||||
|    docker compose -f ops/authority/docker-compose.authority.yaml exec -T mongo mongorestore --archive --gzip --drop < backup/authority-YYYYMMDDTHHMMSSZ.gz | ||||
|    ``` | ||||
|    Use `--drop` to replace collections; omit if doing a partial restore. | ||||
| 3. **Restore configuration/manifests:** copy `authority.yaml` and `authority.plugins/*` into place before starting the Authority container. | ||||
| 4. **Restore signing keys:** untar into the mounted volume: | ||||
|    ```bash | ||||
|    docker run --rm -v authority-keys:/keys -v "$(pwd)/backup:/backup" \ | ||||
|      busybox tar xzf /backup/authority-keys-YYYYMMDD.tar.gz -C /keys | ||||
|    ``` | ||||
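|    Ensure private key files keep mode `600`. Apply `chmod 600` to the key files only; a recursive `chmod -R 600` would also strip the execute bit from directories and make them untraversable. | ||||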
| 5. **Start services & validate:** | ||||
|    ```bash | ||||
|    docker compose -f ops/authority/docker-compose.authority.yaml up -d | ||||
|    curl -fsS http://localhost:8080/health | ||||
|    ``` | ||||
| 6. **Validate JWKS and tokens:** call `/jwks` and issue a short-lived token via the CLI to confirm key material matches expectations. If the restored environment requires a fresh signing key, follow the rotation SOP in [`docs/11_AUTHORITY.md`](../11_AUTHORITY.md) using `ops/authority/key-rotation.sh` to invoke `/internal/signing/rotate`. | ||||
|  | ||||
| ## Disaster Recovery Notes | ||||
| - **Air-gapped replication:** replicate archives via the Offline Update Kit transport channels; never attach USB devices without scanning. | ||||
| - **Retention:** maintain 30 daily snapshots + 12 monthly archival copies. Rotate encryption keys annually. | ||||
| - **Key compromise:** if signing keys are suspected compromised, restore from the latest clean backup, rotate via OPS3 (see `ops/authority/key-rotation.sh` and `docs/11_AUTHORITY.md`), and publish a revocation notice. | ||||
| - **Mongo version:** keep dump/restore images pinned to the deployment version (compose uses `mongo:7`). Driver 3.5.0 requires MongoDB **4.2+**; clusters still on 4.0 must be upgraded before restore, and future driver releases will drop 4.0 entirely. | ||||
|  | ||||
| ## Verification Checklist | ||||
| - [ ] `/ready` reports all identity providers ready. | ||||
| - [ ] OAuth flows issue tokens signed by the restored keys. | ||||
| - [ ] `PluginRegistrationSummary` logs expected providers on startup. | ||||
| - [ ] Revocation manifest export (`dotnet run --project src/Authority/StellaOps.Authority`) succeeds. | ||||
| - [ ] Monitoring dashboards show metrics resuming (see OPS5 deliverables). | ||||
|  | ||||
							
								
								
									
										174
									
								
								docs/modules/authority/operations/grafana-dashboard.json
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,174 @@ | ||||
| { | ||||
|   "title": "StellaOps Authority - Token & Access Monitoring", | ||||
|   "uid": "authority-token-monitoring", | ||||
|   "schemaVersion": 38, | ||||
|   "version": 1, | ||||
|   "editable": true, | ||||
|   "timezone": "", | ||||
|   "graphTooltip": 0, | ||||
|   "time": { | ||||
|     "from": "now-6h", | ||||
|     "to": "now" | ||||
|   }, | ||||
|   "templating": { | ||||
|     "list": [ | ||||
|       { | ||||
|         "name": "datasource", | ||||
|         "type": "datasource", | ||||
|         "query": "prometheus", | ||||
|         "refresh": 1, | ||||
|         "hide": 0, | ||||
|         "current": {} | ||||
|       } | ||||
|     ] | ||||
|   }, | ||||
|   "panels": [ | ||||
|     { | ||||
|       "id": 1, | ||||
|       "title": "Token Requests – Success vs Failure", | ||||
|       "type": "timeseries", | ||||
|       "datasource": { | ||||
|         "type": "prometheus", | ||||
|         "uid": "${datasource}" | ||||
|       }, | ||||
|       "fieldConfig": { | ||||
|         "defaults": { | ||||
|           "unit": "req/s", | ||||
|           "displayName": "{{grant_type}} ({{status}})" | ||||
|         }, | ||||
|         "overrides": [] | ||||
|       }, | ||||
|       "targets": [ | ||||
|         { | ||||
|           "refId": "A", | ||||
|           "expr": "sum by (grant_type, status) (rate(http_server_duration_seconds_count{service_name=\"stellaops-authority\", http_route=\"/token\"}[5m]))", | ||||
|           "legendFormat": "{{grant_type}} {{status}}" | ||||
|         } | ||||
|       ], | ||||
|       "options": { | ||||
|         "legend": { | ||||
|           "displayMode": "table", | ||||
|           "placement": "bottom" | ||||
|         }, | ||||
|         "tooltip": { | ||||
|           "mode": "multi" | ||||
|         } | ||||
|       } | ||||
|     }, | ||||
|     { | ||||
|       "id": 2, | ||||
|       "title": "Rate Limiter Rejections", | ||||
|       "type": "timeseries", | ||||
|       "datasource": { | ||||
|         "type": "prometheus", | ||||
|         "uid": "${datasource}" | ||||
|       }, | ||||
|       "fieldConfig": { | ||||
|         "defaults": { | ||||
|           "unit": "req/s", | ||||
|           "displayName": "{{limiter}}" | ||||
|         }, | ||||
|         "overrides": [] | ||||
|       }, | ||||
|       "targets": [ | ||||
|         { | ||||
|           "refId": "A", | ||||
|           "expr": "sum by (limiter) (rate(aspnetcore_rate_limiting_rejections_total{service_name=\"stellaops-authority\"}[5m]))", | ||||
|           "legendFormat": "{{limiter}}" | ||||
|         } | ||||
|       ] | ||||
|     }, | ||||
|     { | ||||
|       "id": 3, | ||||
|       "title": "Bypass Events (5m)", | ||||
|       "type": "stat", | ||||
|       "datasource": { | ||||
|         "type": "prometheus", | ||||
|         "uid": "${datasource}" | ||||
|       }, | ||||
|       "fieldConfig": { | ||||
|         "defaults": { | ||||
|           "unit": "short", | ||||
|           "color": { | ||||
|             "mode": "thresholds" | ||||
|           }, | ||||
|           "thresholds": { | ||||
|             "mode": "absolute", | ||||
|             "steps": [ | ||||
|               { "color": "green", "value": null }, | ||||
|               { "color": "orange", "value": 1 }, | ||||
|               { "color": "red", "value": 5 } | ||||
|             ] | ||||
|           } | ||||
|         }, | ||||
|         "overrides": [] | ||||
|       }, | ||||
|       "targets": [ | ||||
|         { | ||||
|           "refId": "A", | ||||
|           "expr": "sum(increase(log_messages_total{message_template=\"Granting StellaOps bypass for remote {RemoteIp}; required scopes {RequiredScopes}.\"}[5m]))" | ||||
|         } | ||||
|       ], | ||||
|       "options": { | ||||
|         "reduceOptions": { | ||||
|           "calcs": ["last"], | ||||
|           "fields": "", | ||||
|           "values": false | ||||
|         }, | ||||
|         "orientation": "horizontal", | ||||
|         "textMode": "auto" | ||||
|       } | ||||
|     }, | ||||
|     { | ||||
|       "id": 4, | ||||
|       "title": "Lockout Events (15m)", | ||||
|       "type": "stat", | ||||
|       "datasource": { | ||||
|         "type": "prometheus", | ||||
|         "uid": "${datasource}" | ||||
|       }, | ||||
|       "fieldConfig": { | ||||
|         "defaults": { | ||||
|           "unit": "short", | ||||
|           "color": { | ||||
|             "mode": "thresholds" | ||||
|           }, | ||||
|           "thresholds": { | ||||
|             "mode": "absolute", | ||||
|             "steps": [ | ||||
|               { "color": "green", "value": null }, | ||||
|               { "color": "orange", "value": 5 }, | ||||
|               { "color": "red", "value": 10 } | ||||
|             ] | ||||
|           } | ||||
|         }, | ||||
|         "overrides": [] | ||||
|       }, | ||||
|       "targets": [ | ||||
|         { | ||||
|           "refId": "A", | ||||
|           "expr": "sum(increase(log_messages_total{message_template=\"Plugin {PluginName} denied access for {Username} due to lockout (retry after {RetryAfter}).\"}[15m]))" | ||||
|         } | ||||
|       ], | ||||
|       "options": { | ||||
|         "reduceOptions": { | ||||
|           "calcs": ["last"], | ||||
|           "fields": "", | ||||
|           "values": false | ||||
|         }, | ||||
|         "orientation": "horizontal", | ||||
|         "textMode": "auto" | ||||
|       } | ||||
|     }, | ||||
|     { | ||||
|       "id": 5, | ||||
|       "title": "Trace Explorer Shortcut", | ||||
|       "type": "text", | ||||
|       "options": { | ||||
|         "mode": "markdown", | ||||
|         "content": "[Open Trace Explorer](#/explore?left={\"datasource\":\"tempo\",\"queries\":[{\"query\":\"{service.name=\\\"stellaops-authority\\\", span_name=~\\\"authority.token.*\\\"}\",\"refId\":\"A\"}]})" | ||||
|       } | ||||
|     } | ||||
|   ], | ||||
|   "links": [] | ||||
| } | ||||
							
								
								
									
										94
									
								
								docs/modules/authority/operations/key-rotation.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,94 @@ | ||||
| # Authority Signing Key Rotation Playbook | ||||
|  | ||||
| > **Status:** Authored 2025-10-12 as part of OPS3.KEY-ROTATION rollout.   | ||||
| > Use together with `docs/11_AUTHORITY.md` (Authority service guide) and the automation shipped under `ops/authority/`. | ||||
|  | ||||
| ## 1. Overview | ||||
|  | ||||
| Authority publishes JWKS and revocation bundles signed with ES256 keys. To rotate those keys without downtime we now provide: | ||||
|  | ||||
| - **Automation script:** `ops/authority/key-rotation.sh`   | ||||
|   Shell helper that POSTs to `/internal/signing/rotate`, supports metadata and dry-run, and confirms JWKS afterwards. | ||||
| - **CI workflow:** `.gitea/workflows/authority-key-rotation.yml`   | ||||
|   Manual dispatch workflow that pulls environment-specific secrets, runs the script, and records the result. Works across staging/production by passing the `environment` input. | ||||
|  | ||||
| This playbook documents the repeatable sequence for all environments. | ||||
|  | ||||
| ## 2. Pre-requisites | ||||
|  | ||||
| 1. **Generate a new PEM key (per environment)** | ||||
|    ```bash | ||||
|    openssl ecparam -name prime256v1 -genkey -noout \ | ||||
|      -out certificates/authority-signing-<env>-<year>.pem | ||||
|    chmod 600 certificates/authority-signing-<env>-<year>.pem | ||||
|    ``` | ||||
| 2. **Stash the previous key** under the same volume so it can be referenced in `signing.additionalKeys` after rotation. | ||||
| 3. **Ensure secrets/vars exist in Gitea** | ||||
|    - `<ENV>_AUTHORITY_BOOTSTRAP_KEY` | ||||
|    - `<ENV>_AUTHORITY_URL` | ||||
|    - Optional shared defaults `AUTHORITY_BOOTSTRAP_KEY`, `AUTHORITY_URL`. | ||||
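
A pre-flight check for the variables in step 3 could look like the sketch below. It assumes the `<ENV>` prefix is the upper-cased environment name and that the shared defaults act as fallbacks; `resolve_authority_var` is a hypothetical helper, not something shipped under `ops/authority/`.

```bash
# Hypothetical helper: print <ENV>_AUTHORITY_<SUFFIX>, falling back to the
# shared AUTHORITY_<SUFFIX> default when the environment-specific value is unset.
resolve_authority_var() (
  # The ( ... ) body runs in a subshell, so these variables stay local.
  prefix=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  specific="${prefix}_AUTHORITY_$2"
  shared="AUTHORITY_$2"
  # Refuse anything that is not a plain variable name before handing it to eval.
  case "$specific" in
    *[!A-Z0-9_]*) echo "invalid variable name: $specific" >&2; exit 2 ;;
  esac
  eval "value=\${$specific:-\${$shared:-}}"
  if [ -z "$value" ]; then
    echo "missing $specific (and no $shared default)" >&2
    exit 1
  fi
  printf '%s\n' "$value"
)
```

For example, `resolve_authority_var staging URL` prints `STAGING_AUTHORITY_URL` when it is set and `AUTHORITY_URL` otherwise, and fails loudly when neither exists.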
|  | ||||
| ## 3. Executing the rotation | ||||
|  | ||||
| ### Option A – via CI workflow (recommended) | ||||
|  | ||||
| 1. Navigate to **Actions → Authority Key Rotation**. | ||||
| 2. Provide inputs: | ||||
|    - `environment`: `staging`, `production`, etc. | ||||
|    - `key_id`: new `kid` (e.g. `authority-signing-2025-dev`). | ||||
|    - `key_path`: path as seen by the Authority service (e.g. `../certificates/authority-signing-2025-dev.pem`). | ||||
|    - Optional `metadata`: comma-separated `key=value` pairs (for audit trails). | ||||
| 3. Trigger. The workflow: | ||||
|    - Reads the bootstrap key/URL from secrets. | ||||
|    - Runs `ops/authority/key-rotation.sh`. | ||||
|    - Prints the JWKS response for verification. | ||||
|  | ||||
| ### Option B – manual shell invocation | ||||
|  | ||||
| ```bash | ||||
| AUTHORITY_BOOTSTRAP_KEY=$(cat /secure/authority-bootstrap.key) \ | ||||
| ./ops/authority/key-rotation.sh \ | ||||
|   --authority-url https://authority.example.com \ | ||||
|   --key-id authority-signing-2025-dev \ | ||||
|   --key-path ../certificates/authority-signing-2025-dev.pem \ | ||||
|   --meta rotatedBy=ops --meta changeTicket=OPS-1234 | ||||
| ``` | ||||
|  | ||||
| Use `--dry-run` to inspect the payload before execution. | ||||
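
The workflow input takes metadata as comma-separated `key=value` pairs, while the script takes repeated `--meta` flags. How the workflow converts one into the other is not shown here; the sketch below is one possible mapping, and `metadata_to_flags` is a hypothetical name.

```bash
# Hypothetical helper: turn "k1=v1,k2=v2" into "--meta k1=v1 --meta k2=v2".
metadata_to_flags() (
  set -f        # disable globbing while the input is split on commas
  IFS=','
  flags=""
  for pair in $1; do
    case "$pair" in
      ?*=*) flags="${flags:+$flags }--meta $pair" ;;
      *)    echo "skipping malformed pair: '$pair'" >&2 ;;
    esac
  done
  printf '%s\n' "$flags"
)
```

The output is meant to be expanded unquoted on the script's command line, so this sketch only suits values without whitespace.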
|  | ||||
| ## 4. Post-rotation checklist | ||||
|  | ||||
| 1. Update `authority.yaml` (or environment-specific overrides): | ||||
|    - Set `signing.activeKeyId` to the new key. | ||||
|    - Set `signing.keyPath` to the new PEM. | ||||
|    - Append the previous key into `signing.additionalKeys`. | ||||
|    - Ensure `keySource`/`provider` match the values passed to the script. | ||||
| 2. Run `stellaops-cli auth revoke export` so revocation bundles are re-signed with the new key. | ||||
| 3. Confirm `/jwks` lists the new `kid` with `status: "active"` and the previous one as `retired`. | ||||
| 4. Archive the old key securely; keep it available until all tokens/bundles signed with it have expired. | ||||
|  | ||||
| ## 5. Development key state | ||||
|  | ||||
| For the sample configuration (`etc/authority.yaml.sample`) we minted a placeholder dev key: | ||||
|  | ||||
| - Active: `authority-signing-2025-dev` (`certificates/authority-signing-2025-dev.pem`) | ||||
| - Retired: `authority-signing-dev` | ||||
|  | ||||
| Treat these as examples; real environments must maintain their own PEM material. | ||||
|  | ||||
| ## 6. References | ||||
|  | ||||
| - `docs/11_AUTHORITY.md` – Architecture and rotation SOP (Section 5). | ||||
| - `docs/modules/authority/operations/backup-restore.md` – Recovery flow referencing this playbook. | ||||
| - `ops/authority/README.md` – CLI usage and examples. | ||||
| - `scripts/rotate-policy-cli-secret.sh` – Helper to mint new `policy-cli` shared secrets when policy scope bundles change. | ||||
|  | ||||
| ## 7. Appendix — Policy CLI secret rotation | ||||
|  | ||||
| Scope migrations such as AUTH-POLICY-23-004 require issuing fresh credentials for the `policy-cli` client. Use the helper script committed with the repo to keep secrets deterministic across environments. | ||||
|  | ||||
| ```bash | ||||
| ./scripts/rotate-policy-cli-secret.sh --output etc/secrets/policy-cli.secret | ||||
| ``` | ||||
|  | ||||
| The script writes a timestamped header and a random secret into the target file. Use `--dry-run` when generating material for external secret stores. After updating secrets in staging/production, recycle the Authority pods and confirm the new client credentials work before the next release freeze. | ||||
							
								
								
									
										83
									
								
								docs/modules/authority/operations/monitoring.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,83 @@ | ||||
| # Authority Monitoring & Alerting Playbook | ||||
|  | ||||
| ## Telemetry Sources | ||||
| - **Traces:** Activity source `StellaOps.Authority` emits spans for every token flow (`authority.token.validate_*`, `authority.token.handle_*`, `authority.token.validate_access`). Key tags include `authority.endpoint`, `authority.grant_type`, `authority.username`, `authority.client_id`, and `authority.identity_provider`. | ||||
| - **Metrics:** OpenTelemetry instrumentation (`AddAspNetCoreInstrumentation`, `AddHttpClientInstrumentation`, custom meter `StellaOps.Authority`) exports: | ||||
|   - `http.server.request.duration` histogram (`http_route`, `http_status_code`, `authority.endpoint` tag via `aspnetcore` enrichment). | ||||
|   - `process.runtime.gc.*`, `process.runtime.dotnet.*` (from `AddRuntimeInstrumentation`). | ||||
| - **Logs:** Serilog writes structured events to stdout. Notable templates: | ||||
|   - `"Password grant verification failed ..."` and `"Plugin {PluginName} denied access ... due to lockout"` (lockout spike detector). | ||||
|   - `"Password grant validation failed for {Username}: provider '{Provider}' does not support MFA required for exception approvals."` (identifies users attempting `exceptions:approve` without MFA support; tie to fresh-auth errors). | ||||
|   - `"Client credentials validation failed for {ClientId}: exception scopes require tenant assignment."` (signals misconfigured exception service identities). | ||||
|   - `"Granting StellaOps bypass for remote {RemoteIp}"` (bypass usage). | ||||
|   - `"Rate limit exceeded for path {Path} from {RemoteIp}"` (limiter alerts). | ||||
|  | ||||
| ## Prometheus Metrics to Collect | ||||
| | Metric | Query | Purpose | | ||||
| | --- | --- | --- | | ||||
| | `token_requests_total` | `sum by (grant_type, status) (rate(http_server_duration_seconds_count{service_name="stellaops-authority", http_route="/token"}[5m]))` | Token issuance volume per grant type (`grant_type` comes via `authority.grant_type` span attribute → Exemplars in Grafana). | | ||||
| | `token_failure_ratio` | `sum(rate(http_server_duration_seconds_count{service_name="stellaops-authority", http_route="/token", http_status_code=~"[45].."}[5m])) / sum(rate(http_server_duration_seconds_count{service_name="stellaops-authority", http_route="/token"}[5m]))` | Alert when > 5 % for 10 min. | | ||||
| | `authorize_rate_limit_hits` | `sum(rate(aspnetcore_rate_limiting_rejections_total{service_name="stellaops-authority", limiter="authority-token"}[5m]))` | Detect rate limiting saturations (requires OTEL ASP.NET rate limiter exporter). | | ||||
| | `lockout_events` | `sum by (plugin) (rate(log_messages_total{app="stellaops-authority", level="Warning", message_template="Plugin {PluginName} denied access for {Username} due to lockout (retry after {RetryAfter})."}[5m]))` | Derived from Loki/Promtail log counter. | | ||||
| | `bypass_usage_total` | `sum(rate(log_messages_total{app="stellaops-authority", level="Information", message_template="Granting StellaOps bypass for remote {RemoteIp}; required scopes {RequiredScopes}."}[5m]))` | Track trusted bypass invocations. | | ||||
|  | ||||
| > **Exporter note:** Enable `aspnetcore` meters (`dotnet-counters` name `Microsoft.AspNetCore.Hosting`), or configure the OpenTelemetry Collector `metrics` pipeline with `metric_statements` to remap histogram counts into the shown series. | ||||
|  | ||||
| ## Alert Rules | ||||
| 1. **Token Failure Surge** | ||||
|    - _Expression_: `token_failure_ratio > 0.05` | ||||
|    - _For_: `10m` | ||||
|    - _Labels_: `severity="critical"` | ||||
|    - _Annotations_: Include `topk(5, sum by (authority_identity_provider) (increase(authority_token_rejections_total[10m])))` as diagnostic hint (requires span → metric transformation). | ||||
| 2. **Lockout Spike** | ||||
|    - _Expression_: `sum(rate(log_messages_total{message_template="Plugin {PluginName} denied access for {Username} due to lockout (retry after {RetryAfter})."}[15m])) > 10` | ||||
|    - _For_: `15m` | ||||
|    - Investigate credential stuffing; consider temporarily tightening `RateLimiting.Token`. | ||||
| 3. **Bypass Threshold** | ||||
|    - _Expression_: `sum(rate(log_messages_total{message_template="Granting StellaOps bypass for remote {RemoteIp}; required scopes {RequiredScopes}."}[5m])) > 1` | ||||
|    - _For_: `5m` | ||||
|    - Alert severity `warning` — verify the calling host list. | ||||
| 4. **Rate Limiter Saturation** | ||||
|    - _Expression_: `sum(rate(aspnetcore_rate_limiting_rejections_total{service_name="stellaops-authority"}[5m])) > 0` | ||||
|    - Escalate if sustained for 5 min; confirm trusted clients aren’t misconfigured. | ||||
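
To sanity-check the Token Failure Surge threshold by hand, the ratio can be computed from two request counts taken over the same window. This is only an arithmetic sketch; `failure_ratio_exceeds` is a hypothetical helper and does not model the 10-minute `for` clause.

```bash
# Exit 0 when failed/total is strictly above the threshold, 1 otherwise
# (including when total is 0, where the ratio is undefined).
failure_ratio_exceeds() {
  awk -v failed="$1" -v total="$2" -v threshold="$3" 'BEGIN {
    if (total + 0 == 0) exit 1
    exit !((failed / total) > threshold)
  }'
}
```

With 12 failures out of 200 token requests the ratio is 0.06, so `failure_ratio_exceeds 12 200 0.05` succeeds; with 5 out of 200 (0.025) it does not.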
|  | ||||
| ## Grafana Dashboard | ||||
| - Import `docs/modules/authority/operations/grafana-dashboard.json` to provision baseline panels: | ||||
|   - **Token Success vs Failure** – stacked rate visualization split by grant type. | ||||
|   - **Rate Limiter Hits** – bar chart showing `authority-token` and `authority-authorize`. | ||||
|   - **Bypass & Lockout Events** – dual-stat panel using Loki-derived counters. | ||||
|   - **Trace Explorer Shortcut** – text panel linking to a Tempo span search pre-filtered to `service.name="stellaops-authority"` and span names matching `authority.token.*`. | ||||
|  | ||||
| ## Collector Configuration Snippets | ||||
| ```yaml | ||||
| receivers: | ||||
|   otlp: | ||||
|     protocols: | ||||
|       http: | ||||
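| exporters: | ||||
|   prometheus: | ||||
|     endpoint: "0.0.0.0:9464" | ||||
|   # The logs pipeline below references a `loki` exporter, so it must be defined. | ||||
|   # The endpoint is a placeholder (assumption); point it at your Loki push API. | ||||
|   loki: | ||||
|     endpoint: "http://loki:3100/loki/api/v1/push" | ||||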
| processors: | ||||
|   batch: | ||||
|   attributes/token_grant: | ||||
|     actions: | ||||
|       - key: grant_type | ||||
|         action: upsert | ||||
|         from_attribute: authority.grant_type | ||||
| service: | ||||
|   pipelines: | ||||
|     metrics: | ||||
|       receivers: [otlp] | ||||
|       processors: [attributes/token_grant, batch] | ||||
|       exporters: [prometheus] | ||||
|     logs: | ||||
|       receivers: [otlp] | ||||
|       processors: [batch] | ||||
|       exporters: [loki] | ||||
| ``` | ||||
|  | ||||
| ## Operational Checklist | ||||
| - [ ] Confirm `STELLAOPS_AUTHORITY__OBSERVABILITY__EXPORTERS` enables OTLP in production builds. | ||||
| - [ ] Ensure Promtail captures container stdout with Serilog structured formatting. | ||||
| - [ ] Periodically validate alert noise by running load tests that trigger the rate limiter. | ||||
| - [ ] Include dashboard JSON in Offline Kit for air-gapped clusters; update version header when metrics change. | ||||
							
								
								
									
										22
									
								
								docs/modules/ci/AGENTS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,22 @@ | ||||
| # CI Recipes agent guide | ||||
|  | ||||
| ## Mission | ||||
| The CI module collects reproducible pipeline recipes for builds, tests, and release promotion across supported platforms. | ||||
|  | ||||
| ## Key docs | ||||
| - [Module README](./README.md) | ||||
| - [Architecture](./architecture.md) | ||||
| - [Implementation plan](./implementation_plan.md) | ||||
| - [Task board](./TASKS.md) | ||||
|  | ||||
| ## How to get started | ||||
| 1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module. | ||||
| 2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED). | ||||
| 3. Read the architecture and README for domain context before editing code or docs. | ||||
| 4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan. | ||||
|  | ||||
| ## Guardrails | ||||
| - Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md). | ||||
| - Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts. | ||||
| - Keep Offline Kit parity in mind—document air-gapped workflows for any new feature. | ||||
| - Update runbooks/observability assets when operational characteristics change. | ||||
							
								
								
									
										29
									
								
								docs/modules/ci/README.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,29 @@ | ||||
| # StellaOps CI Recipes | ||||
|  | ||||
| The CI module collects reproducible pipeline recipes for builds, tests, and release promotion across supported platforms. | ||||
|  | ||||
| ## Responsibilities | ||||
| - Provide ready-to-use pipeline snippets for ingestion, scanning, policy evaluation, and exports. | ||||
| - Document required secrets/scopes and deterministic build knobs. | ||||
| - Highlight offline-compatible workflows and cache strategies. | ||||
|  | ||||
| ## Key components | ||||
| - Recipe catalogue in ./recipes.md. | ||||
|  | ||||
| ## Integrations & dependencies | ||||
| - DevOps release workflows. | ||||
| - Module-specific test suites referenced in recipes. | ||||
|  | ||||
| ## Operational notes | ||||
| - Encourage reuse through templated YAML/JSON fragments. | ||||
|  | ||||
| ## Related resources | ||||
| - ./recipes.md | ||||
|  | ||||
| ## Backlog references | ||||
| - CI recipes refresh tracked in ../../TASKS.md under DOCS-CI stories. | ||||
|  | ||||
| ## Epic alignment | ||||
| - **Epic 1 – AOC enforcement:** bake ingestion/verifier guardrails into CI recipes. | ||||
| - **Epic 10 – Export Center:** provide pipeline snippets for export packaging, signing, and Offline Kit publication. | ||||
| - **Epic 11 – Notifications Studio:** offer CI hooks for notification previews/tests where relevant. | ||||
							
								
								
									
										9
									
								
								docs/modules/ci/TASKS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,9 @@ | ||||
| # Task board — CI Recipes | ||||
|  | ||||
| > Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable. | ||||
|  | ||||
| | ID | Status | Owner(s) | Description | Notes | | ||||
| |----|--------|----------|-------------|-------| | ||||
| | CI RECIPES-DOCS-0001 | TODO | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md | | ||||
| | CI RECIPES-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md | | ||||
| | CI RECIPES-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow | | ||||
							
								
								
									
										7
									
								
								docs/modules/ci/architecture.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,7 @@ | ||||
| # CI Recipes architecture | ||||
|  | ||||
| > Reference the AOC guardrails, export workflows, and notification patterns documented in the Authority, Export Center, and Notify module guides when designing CI templates. | ||||
|  | ||||
| This placeholder summarises the planned architecture for CI Recipes. Consolidate design details from implementation plans and upcoming epics before coding. | ||||
|  | ||||
| Refer to the module README and implementation plan for immediate context, and update this document once component boundaries and data flows are finalised. | ||||
							
								
								
									
										21
									
								
								docs/modules/ci/implementation_plan.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,21 @@ | ||||
| # Implementation plan — CI Recipes | ||||
|  | ||||
| ## Current objectives | ||||
| - Maintain deterministic behaviour and offline parity across releases. | ||||
| - Keep documentation, telemetry, and runbooks aligned with the latest sprint outcomes. | ||||
|  | ||||
| ## Workstreams | ||||
| - Backlog grooming: reconcile open stories in ../../TASKS.md with this module's roadmap. | ||||
| - Implementation: collaborate with service owners to land feature work defined in SPRINTS/EPIC docs. | ||||
| - Validation: extend tests/fixtures to preserve determinism and provenance requirements. | ||||
|  | ||||
| ## Epic milestones | ||||
| - **Epic 1 – AOC enforcement:** ensure pipelines enforce schemas, provenance, and verifier jobs. | ||||
| - **Epic 10 – Export Center:** add export/signing/Offline Kit automation templates. | ||||
| - **Epic 11 – Notifications Studio:** document CI hooks for notification previews/tests. | ||||
| - Track DOCS-CI stories in ../../TASKS.md. | ||||
|  | ||||
| ## Coordination | ||||
| - Review ./AGENTS.md before picking up new work. | ||||
| - Sync with cross-cutting teams noted in ../../implplan/SPRINTS.md. | ||||
| - Update this plan whenever scope, dependencies, or guardrails change. | ||||
							
								
								
									
										353
									
								
								docs/modules/ci/recipes.md
									
									
									
									
									
										Executable file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,353 @@ | ||||
| # Stella Ops CI Recipes — (2025‑08‑04) | ||||
|  | ||||
| ## 0 · Key variables (export these once) | ||||
|  | ||||
| | Variable      | Meaning                                                                                                                           | Typical value                                        | | ||||
| | ------------- | --------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------- | | ||||
| | `STELLA_URL`  | Host that: ① stores the **CLI** & **SBOM‑builder** images under `/registry` **and** ② receives API calls at `https://$STELLA_URL` | `stella-ops.ci.acme.example`                         | | ||||
| | `DOCKER_HOST` | How containers reach your Docker daemon (because we no longer mount `/var/run/docker.sock`)                                       | `tcp://docker:2375`                                  | | ||||
| | `WORKSPACE`   | Directory where the pipeline stores artefacts (SBOM file)                                                                         | `$(pwd)`                                             | | ||||
| | `IMAGE`       | The image you are building & scanning                                                                                             | `acme/backend:sha-${COMMIT_SHA}`                     | | ||||
| | `SBOM_FILE`   | Immutable SBOM name – `<image-ref>‑YYYYMMDDThhmmssZ.sbom.json`                                                                    | `acme_backend_sha‑abc123‑20250804T153050Z.sbom.json` | | ||||
|  | ||||
| > **Authority graph scopes note (2025-10-27):** CI stages that spin up the Authority compose profile now rely on the checked-in `etc/authority.yaml`. Before running integration smoke jobs, inject real secrets for every `etc/secrets/*.secret` file (Cartographer, Graph API, Policy Engine, Concelier, Excititor). The repository defaults contain `*-change-me` placeholders, and Authority will reject tokens if those secrets are not overridden. Reissue CI tokens that previously used the `policy:write`/`policy:submit`/`policy:edit` scopes; new bundles must request `policy:read`, `policy:author`, `policy:review`, `policy:simulate`, and, when pipelines promote policies, `policy:approve`/`policy:operate`/`policy:activate`. | ||||
|  | ||||
| ```bash | ||||
| export STELLA_URL="stella-ops.ci.acme.example" | ||||
| export DOCKER_HOST="tcp://docker:2375"               # Jenkins/Circle often expose it like this | ||||
| export WORKSPACE="$(pwd)" | ||||
| export IMAGE="acme/backend:sha-${COMMIT_SHA}" | ||||
| export SBOM_FILE="$(echo "${IMAGE}" | tr '/:+' '___')-$(date -u +%Y%m%dT%H%M%SZ).sbom.json" | ||||
| ``` | ||||
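
As a worked example of the naming rule, the sketch below derives the SBOM file name from a fixed image reference and timestamp so the result is reproducible. `sbom_file_name` is a hypothetical helper introduced only for illustration.

```bash
# $1 = image reference, $2 = UTC timestamp in YYYYMMDDThhmmssZ form.
# Each of '/', ':' and '+' in the image reference becomes '_'.
sbom_file_name() {
  printf '%s-%s.sbom.json\n' "$(printf '%s' "$1" | tr '/:+' '___')" "$2"
}

sbom_file_name "acme/backend:sha-abc123" "20250804T153050Z"
# prints: acme_backend_sha-abc123-20250804T153050Z.sbom.json
```

Passing the timestamp in, instead of calling `date` inside the function, keeps the name stable when several pipeline steps need to refer to the same file.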
|  | ||||
| --- | ||||
|  | ||||
| ## 1 · SBOM creation strategies | ||||
|  | ||||
| ### Option A – **Buildx attested SBOM** (preferred if you can use BuildKit) | ||||
|  | ||||
| You pass **two build args** so the Dockerfile can run the builder and copy the result out of the build context. | ||||
|  | ||||
| ```bash | ||||
| docker buildx build \ | ||||
|   --build-arg STELLA_SBOM_BUILDER="$STELLA_URL/registry/stella-sbom-builder:latest" \ | ||||
|   --provenance=true --sbom=true \ | ||||
|   --build-arg SBOM_FILE="$SBOM_FILE" \ | ||||
|   -t "$IMAGE" . | ||||
| ``` | ||||
|  | ||||
| **If you cannot use Buildx, use Option B below.** The older “run a builder stage inside the Dockerfile” pattern, shown here for reference, is unreliable for producing an SBOM of the final image. | ||||
|  | ||||
| ```Dockerfile | ||||
|  | ||||
| ARG STELLA_SBOM_BUILDER | ||||
| ARG SBOM_FILE | ||||
|  | ||||
| FROM $STELLA_SBOM_BUILDER as sbom | ||||
| ARG IMAGE | ||||
| ARG SBOM_FILE | ||||
| RUN $STELLA_SBOM_BUILDER build --image $IMAGE --output /out/$SBOM_FILE | ||||
|  | ||||
| # ---- actual build stages … ---- | ||||
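| FROM alpine:3.20 | ||||
| # ARG values do not carry across stages; redeclare before use | ||||
| ARG SBOM_FILE | ||||
| # (optional) keep or discard the SBOM in the final image | ||||
| COPY --from=sbom /out/$SBOM_FILE / | ||||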
|  | ||||
| # (rest of your Dockerfile) | ||||
| ``` | ||||
|  | ||||
| ### Option B – **External builder step** (works everywhere; recommended baseline if Buildx isn’t available) | ||||
|  | ||||
| *(keep this block if your pipeline already has an image‑build step that you can’t modify)* | ||||
|  | ||||
| ```bash | ||||
| # DOCKER_HOST lets the builder reach the daemon remotely; | ||||
| # the workspace mount places the SBOM beside the source code. | ||||
| docker run --rm \ | ||||
|   -e DOCKER_HOST="$DOCKER_HOST" \ | ||||
|   -v "$WORKSPACE:/workspace" \ | ||||
|   "$STELLA_URL/registry/stella-sbom-builder:latest" \ | ||||
|     build --image "$IMAGE" --output "/workspace/${SBOM_FILE}" | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2 · Scan the image & upload results | ||||
|  | ||||
| ```bash | ||||
| # DOCKER_HOST points at the remote daemon, the SBOM is mounted read-only under | ||||
| # the same name at the container root, and STELLA_OPS_URL is where the CLI posts findings. | ||||
| docker run --rm \ | ||||
|   -e DOCKER_HOST="$DOCKER_HOST" \ | ||||
|   -v "$WORKSPACE/${SBOM_FILE}:/${SBOM_FILE}:ro" \ | ||||
|   -e STELLA_OPS_URL="https://${STELLA_URL}" \ | ||||
|   "$STELLA_URL/registry/stella-cli:latest" \ | ||||
|     scan --sbom "/${SBOM_FILE}" "$IMAGE" | ||||
| ``` | ||||
|  | ||||
| The CLI returns **exit 0** if policies pass, **>0** if blocked — perfect for failing the job. | ||||
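
Because the gate is just an exit code, a thin wrapper is enough to make the outcome explicit in job logs. The sketch below wraps any command; `run_gate` is a hypothetical name, and the only assumption about the CLI is the exit-code contract described above.

```bash
# Run the given command; report pass/block and propagate its exit status.
run_gate() {
  if "$@"; then
    echo "policy gate: passed"
  else
    status=$?
    echo "policy gate: blocked (exit $status)" >&2
    return "$status"
  fi
}

# Usage (the scan command from above would follow run_gate):
#   run_gate docker run --rm ... scan --sbom "/${SBOM_FILE}" "$IMAGE"
```

Propagating the original status, instead of collapsing it to 1, keeps any distinction the CLI makes between non-zero codes visible to the CI system.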
|  | ||||
| --- | ||||
|  | ||||
| ## 3 · CI templates | ||||
|  | ||||
| Below are minimal, cut‑and‑paste snippets. | ||||
| **Feel free to delete Option B** if you adopt Option A. | ||||
|  | ||||
| ### 3.1 Jenkins (Declarative Pipeline) | ||||
|  | ||||
| ```groovy | ||||
| pipeline { | ||||
|   agent { | ||||
|     docker { | ||||
|       image 'docker:25'        // provides the Docker CLI | ||||
|       args '--privileged' | ||||
|     } | ||||
|   } | ||||
|   environment { | ||||
|     STELLA_URL = 'stella-ops.ci.acme.example' | ||||
|     DOCKER_HOST = 'tcp://docker:2375' | ||||
|     IMAGE = "acme/backend:${env.BUILD_NUMBER}" | ||||
|     SBOM_FILE = "acme_backend_${env.BUILD_NUMBER}-${new Date().format('yyyyMMdd\'T\'HHmmss\'Z\'', TimeZone.getTimeZone('UTC'))}.sbom.json" | ||||
|   } | ||||
|   stages { | ||||
|     stage('Build image + SBOM (Option A)') { | ||||
|       steps { | ||||
|         sh ''' | ||||
|           docker build \ | ||||
|             --build-arg STELLA_SBOM_BUILDER="$STELLA_URL/registry/stella-sbom-builder:latest" \ | ||||
|             --build-arg SBOM_FILE="$SBOM_FILE" \ | ||||
|             -t "$IMAGE" . | ||||
|         ''' | ||||
|       } | ||||
|     } | ||||
|     /* ---------- Option B fallback (when you must keep the existing build step as‑is) ---------- | ||||
|     stage('SBOM builder (Option B)') { | ||||
|       steps { | ||||
|         sh ''' | ||||
|           docker run --rm -e DOCKER_HOST="$DOCKER_HOST" \ | ||||
|             -v "$WORKSPACE:/workspace" \ | ||||
|             "$STELLA_URL/registry/stella-sbom-builder:latest" \ | ||||
|               build --image "$IMAGE" --output "/workspace/${SBOM_FILE}" | ||||
|         ''' | ||||
|       } | ||||
|     } | ||||
|     ------------------------------------------------------------------------------------------ */ | ||||
|     stage('Scan & upload') { | ||||
|       steps { | ||||
|         sh ''' | ||||
|           docker run --rm -e DOCKER_HOST="$DOCKER_HOST" \ | ||||
|             -v "$WORKSPACE/${SBOM_FILE}:/${SBOM_FILE}:ro" \ | ||||
|             -e STELLA_OPS_URL="https://$STELLA_URL" \ | ||||
|             "$STELLA_URL/registry/stella-cli:latest" \ | ||||
|               scan --sbom "/${SBOM_FILE}" "$IMAGE" | ||||
|         ''' | ||||
|       } | ||||
|     } | ||||
|   } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ### 3.2 CircleCI `.circleci/config.yml` | ||||
|  | ||||
| ```yaml | ||||
| version: 2.1 | ||||
| jobs: | ||||
|   stella_scan: | ||||
|     docker: | ||||
|       - image: cimg/base:stable           # minimal image with Docker CLI | ||||
|     environment: | ||||
|       STELLA_URL: stella-ops.ci.acme.example | ||||
|       DOCKER_HOST: tcp://docker:2375      # Circle’s “remote Docker” socket | ||||
|     steps: | ||||
|       - checkout | ||||
|  | ||||
|       - run: | ||||
|           name: Compute vars | ||||
|           command: | | ||||
|             echo 'export IMAGE="acme/backend:${CIRCLE_SHA1}"' >> $BASH_ENV | ||||
|             echo 'export SBOM_FILE="$(echo acme/backend:${CIRCLE_SHA1} | tr "/:+" "___")-$(date -u +%Y%m%dT%H%M%SZ).sbom.json"' >> $BASH_ENV | ||||
|       - run: | ||||
|           name: Build image + SBOM (Option A) | ||||
|           command: | | ||||
|             docker build \ | ||||
|               --build-arg STELLA_SBOM_BUILDER="$STELLA_URL/registry/stella-sbom-builder:latest" \ | ||||
|               --build-arg SBOM_FILE="$SBOM_FILE" \ | ||||
|               -t "$IMAGE" . | ||||
|       # --- Option B fallback (when you must keep the existing build step as‑is) --- | ||||
|       #- run: | ||||
|       #    name: SBOM builder (Option B) | ||||
|       #    command: | | ||||
|       #      docker run --rm -e DOCKER_HOST="$DOCKER_HOST" \ | ||||
|       #        -v "$PWD:/workspace" \ | ||||
|       #        "$STELLA_URL/registry/stella-sbom-builder:latest" \ | ||||
|       #          build --image "$IMAGE" --output "/workspace/${SBOM_FILE}" | ||||
|       - run: | ||||
|           name: Scan | ||||
|           command: | | ||||
|             docker run --rm -e DOCKER_HOST="$DOCKER_HOST" \ | ||||
|               -v "$PWD/${SBOM_FILE}:/${SBOM_FILE}:ro" \ | ||||
|               -e STELLA_OPS_URL="https://$STELLA_URL" \ | ||||
|               "$STELLA_URL/registry/stella-cli:latest" \ | ||||
|                 scan --sbom "/${SBOM_FILE}" "$IMAGE" | ||||
| workflows: | ||||
|   stella: | ||||
|     jobs: [stella_scan] | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ### 3.3 Gitea Actions `.gitea/workflows/stella.yml` | ||||
|  | ||||
| *(Gitea 1.22+ ships native Actions compatible with GitHub syntax)* | ||||
|  | ||||
| ```yaml | ||||
| name: Stella Scan | ||||
| on: [push] | ||||
|  | ||||
| jobs: | ||||
|   stella: | ||||
|     runs-on: ubuntu-latest | ||||
|     env: | ||||
|       STELLA_URL: ${{ secrets.STELLA_URL }} | ||||
|       DOCKER_HOST: tcp://docker:2375       # provided by the docker:dind service | ||||
|     services: | ||||
|       docker: | ||||
|         image: docker:dind | ||||
|         options: >- | ||||
|           --privileged | ||||
|     steps: | ||||
|       - uses: actions/checkout@v4 | ||||
|  | ||||
|       - name: Compute vars | ||||
|         id: vars | ||||
|         run: | | ||||
|           echo "IMAGE=ghcr.io/${{ gitea.repository }}:${{ gitea.sha }}" >> $GITEA_OUTPUT | ||||
|           echo "SBOM_FILE=$(echo ghcr.io/${{ gitea.repository }}:${{ gitea.sha }} | tr '/:+' '___')-$(date -u +%Y%m%dT%H%M%SZ).sbom.json" >> $GITEA_OUTPUT | ||||
|  | ||||
|       - name: Build image + SBOM (Option A) | ||||
|         run: | | ||||
|           docker build \ | ||||
|             --build-arg STELLA_SBOM_BUILDER="${STELLA_URL}/registry/stella-sbom-builder:latest" \ | ||||
|             --build-arg SBOM_FILE="${{ steps.vars.outputs.SBOM_FILE }}" \ | ||||
|             -t "${{ steps.vars.outputs.IMAGE }}" . | ||||
|  | ||||
|       # --- Option B fallback (when you must keep the existing build step as‑is) --- | ||||
|       #- name: SBOM builder (Option B) | ||||
|       #  run: | | ||||
|       #    docker run --rm -e DOCKER_HOST="$DOCKER_HOST" \ | ||||
|       #      -v "$(pwd):/workspace" \ | ||||
|       #      "${STELLA_URL}/registry/stella-sbom-builder:latest" \ | ||||
|       #        build --image "${{ steps.vars.outputs.IMAGE }}" --output "/workspace/${{ steps.vars.outputs.SBOM_FILE }}" | ||||
|  | ||||
|       - name: Scan | ||||
|         run: | | ||||
|           docker run --rm -e DOCKER_HOST="$DOCKER_HOST" \ | ||||
|             -v "$(pwd)/${{ steps.vars.outputs.SBOM_FILE }}:/${{ steps.vars.outputs.SBOM_FILE }}:ro" \ | ||||
|             -e STELLA_OPS_URL="https://${STELLA_URL}" \ | ||||
|             "${STELLA_URL}/registry/stella-cli:latest" \ | ||||
|               scan --sbom "/${{ steps.vars.outputs.SBOM_FILE }}" "${{ steps.vars.outputs.IMAGE }}" | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4 · Docs CI (Gitea Actions & Offline Mirror) | ||||
|  | ||||
| StellaOps ships a dedicated Docs workflow at `.gitea/workflows/docs.yml`. When mirroring the pipeline offline or running it locally, install the same toolchain so markdown linting, schema validation, and HTML preview stay deterministic. | ||||
|  | ||||
| ### 4.1 Toolchain bootstrap | ||||
|  | ||||
| ```bash | ||||
| # Node.js 20.x is required; install once per runner | ||||
| npm install --no-save \ | ||||
|   markdown-link-check \ | ||||
|   remark-cli \ | ||||
|   remark-preset-lint-recommended \ | ||||
|   ajv \ | ||||
|   ajv-cli \ | ||||
|   ajv-formats | ||||
|  | ||||
| # Python 3.11+ powers the preview renderer | ||||
| python -m pip install --upgrade pip | ||||
| python -m pip install markdown pygments | ||||
| ``` | ||||
|  | ||||
| > **No `pip` available?** Some hardened Python builds (including the repo’s `tmp/docenv` | ||||
| > interpreter) ship without `pip`/`ensurepip`. In that case download the pure‑Python | ||||
| > sdists (e.g. `Markdown-3.x.tar.gz`, `pygments-2.x.tar.gz`) and extract their | ||||
| > packages directly into the virtualenv’s `lib/python*/site-packages/` folder. | ||||
| > This keeps the renderer working even when package managers are disabled. | ||||
|  | ||||
| **Offline tip.** Add the packages above to your artifact mirror (for example `ops/devops/offline-kit.json`) so runners can install them via `npm --offline` / `pip --no-index`. | ||||
|  | ||||
| ### 4.2 Schema validation step | ||||
|  | ||||
| Ajv compiles every event schema to guard against syntax or format regressions. The workflow uses `ajv-formats` for UUID/date-time support. | ||||
|  | ||||
| ```bash | ||||
| for schema in docs/events/*.json; do | ||||
|   npx ajv compile -c ajv-formats -s "$schema" | ||||
| done | ||||
| ``` | ||||
|  | ||||
| Run this loop before committing schema changes. For new references, append `-r additional-file.json` so CI and local runs stay aligned. | ||||
|  | ||||
| ### 4.3 Preview build | ||||
|  | ||||
| ```bash | ||||
| python scripts/render_docs.py --source docs --output artifacts/docs-preview --clean | ||||
| ``` | ||||
|  | ||||
| Host the resulting bundle via any static file server for review (for example `python -m http.server`). | ||||
|  | ||||
| ### 4.4 Publishing checklist | ||||
|  | ||||
| - [ ] Toolchain installs succeed without hitting the public internet (mirror or cached tarballs). | ||||
| - [ ] Ajv validation passes for `scanner.report.ready@1`, `scheduler.rescan.delta@1`, `attestor.logged@1`. | ||||
| - [ ] Markdown link check (`npx markdown-link-check`) reports no broken references. | ||||
| - [ ] Preview bundle archived (or attached) for stakeholders. | ||||
|  | ||||
| ### 4.5 Policy DSL lint stage | ||||
|  | ||||
| Policy Engine v2 pipelines now fail fast if policy documents are malformed. After checkout and `dotnet restore`, run: | ||||
|  | ||||
| ```bash | ||||
| dotnet run \ | ||||
|   --project src/Tools/PolicyDslValidator/PolicyDslValidator.csproj \ | ||||
|   -- \ | ||||
|   --strict docs/examples/policies/*.yaml | ||||
| ``` | ||||
|  | ||||
| - `--strict` treats warnings as errors so missing metadata doesn’t slip through. | ||||
| - The validator accepts globs, so you can point it at tenant policy directories later (`policies/**/*.yaml`). | ||||
| - Exit codes follow UNIX conventions: `0` success, `1` parse errors, `2` warnings when `--strict` is set, `64` usage mistakes. | ||||
|  | ||||
| Capture the validator output as part of your build logs; Support uses it when triaging policy rollout issues. | ||||
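
The exit codes listed above can be turned into readable build-log labels. This is a minimal sketch, not part of the validator: the function name `describe_validator_exit` is hypothetical, and only the code-to-meaning mapping is taken from the list above.

```bash
# Hypothetical helper: translate a validator exit code into a log label.
# The mapping mirrors the documented codes; the function name is an assumption.
describe_validator_exit() {
  case "$1" in
    0)  echo "success" ;;
    1)  echo "parse errors" ;;
    2)  echo "warnings promoted to errors by --strict" ;;
    64) echo "usage mistake" ;;
    *)  echo "unexpected exit code $1" ;;
  esac
}

# Example: `true` stands in for the validator invocation.
true
echo "policy lint: $(describe_validator_exit "$?")"
```

Keeping the label next to the raw validator output makes triage easier when only the log tail is available.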
|  | ||||
| ### 4.6 Policy simulation smoke | ||||
|  | ||||
| Catch unexpected policy regressions by exercising a small set of golden SBOM findings via the simulation smoke tool: | ||||
|  | ||||
| ```bash | ||||
| dotnet run \ | ||||
|   --project src/Tools/PolicySimulationSmoke/PolicySimulationSmoke.csproj \ | ||||
|   -- \ | ||||
|   --scenario-root samples/policy/simulations \ | ||||
|   --output artifacts/policy-simulations | ||||
| ``` | ||||
|  | ||||
| - The tool loads each `scenario.json` under `samples/policy/simulations`, evaluates the referenced policy, and fails the build if projected verdicts change. | ||||
| - In CI the command runs twice (to `run1/` and `run2/`) and `diff -u` compares the summaries—any mismatch signals a determinism regression. | ||||
| - Artifacts land in `artifacts/policy-simulations/policy-simulation-summary.json`; upload them for later inspection (see CI workflow). | ||||
| - Expand scenarios by copying real-world findings into the samples directory—ensure expected statuses are recorded so regressions trip the pipeline. | ||||
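
The run-twice-and-diff check described above could look roughly like the sketch below. It is an illustration under assumptions: `check_determinism` and `fake_smoke` are hypothetical names, and the real CI workflow may be wired differently. Only the `run1/`, `run2/`, `diff -u`, and summary file name come from the notes above.

```bash
# Run a command twice into separate directories and diff one output file.
# Usage: check_determinism <file-name> <command...>
# The command receives the output directory as its final argument.
check_determinism() {
  local file="$1"; shift
  local work status=0
  work="$(mktemp -d)" || return 1
  mkdir -p "$work/run1" "$work/run2"
  "$@" "$work/run1" && "$@" "$work/run2" || status=1
  if [ "$status" -eq 0 ]; then
    # Any difference between the two summaries signals non-determinism.
    diff -u "$work/run1/$file" "$work/run2/$file" || status=1
  fi
  rm -rf "$work"
  return "$status"
}

# Stand-in for the smoke tool: writes a fixed summary into the given directory.
fake_smoke() { printf '{"verdicts":3}\n' > "$1/policy-simulation-summary.json"; }

check_determinism policy-simulation-summary.json fake_smoke && echo "deterministic"
```

To use it with the real tool, pass the `dotnet run` command from the block above with `--output` as its last token, so the helper can append each run directory.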
|  | ||||
| --- | ||||
|  | ||||
| ## 5 · Troubleshooting cheat‑sheet | ||||
|  | ||||
| | Symptom                               | Root cause                  | First things to try                                             | | ||||
| | ------------------------------------- | --------------------------- | --------------------------------------------------------------- | | ||||
| | `no such host $STELLA_URL`            | DNS typo or VPN outage      | `ping $STELLA_URL` from runner                                  | | ||||
| | `connection refused` when CLI uploads | Port 443 blocked            | open firewall / check ingress                                   | | ||||
| | `failed to stat /<sbom>.json`         | SBOM wasn’t produced        | Did Option A actually run builder? If not, enable Option B      | | ||||
| | `registry unauthorized`               | Runner lacks registry creds | `docker login $STELLA_URL/registry` (store creds in CI secrets) | | ||||
| | Non‑zero scan exit                    | Blocking vuln/licence       | Open project in Ops UI → triage or waive                        | | ||||
|  | ||||
| --- | ||||
|  | ||||
| ### Change log | ||||
|  | ||||
| * **2025‑10‑18** – Documented Docs CI toolchain (Ajv validation, static preview) and offline checklist. | ||||
| * **2025‑08‑04** – Variable clean‑up, removed Docker‑socket & cache mounts, added Jenkins / CircleCI / Gitea examples, clarified Option B comment. | ||||
							
								
								
									
										22
									
								
								docs/modules/cli/AGENTS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,22 @@ | ||||
| # CLI agent guide | ||||
|  | ||||
| ## Mission | ||||
| The `stella` CLI is the operator-facing Swiss Army knife for scans, exports, policy management, offline kit operations, and automation scripting. | ||||
|  | ||||
| ## Key docs | ||||
| - [Module README](./README.md) | ||||
| - [Architecture](./architecture.md) | ||||
| - [Implementation plan](./implementation_plan.md) | ||||
| - [Task board](./TASKS.md) | ||||
|  | ||||
| ## How to get started | ||||
| 1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module. | ||||
| 2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED). | ||||
| 3. Read the architecture and README for domain context before editing code or docs. | ||||
| 4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan. | ||||
|  | ||||
| ## Guardrails | ||||
| - Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md). | ||||
| - Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts. | ||||
| - Keep Offline Kit parity in mind—document air-gapped workflows for any new feature. | ||||
| - Update runbooks/observability assets when operational characteristics change. | ||||
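
The determinism guardrail can be illustrated with two small shell helpers. This is a sketch under assumptions: the helper names are hypothetical, and `date -d "@..."` is GNU `date` syntax (BSD/macOS `date` uses `-r` instead).

```bash
# Format a Unix timestamp as UTC ISO-8601 (GNU date syntax).
utc_iso8601() { date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ; }

# Sort lines bytewise so the order does not depend on the runner's locale.
stable_sort() { LC_ALL=C sort; }

utc_iso8601 0
printf 'b\nA\na\n' | stable_sort
```

Pinning `LC_ALL=C` matters because locale-aware collation can order mixed-case lines differently from one machine to the next.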
							
								
								
									
										40
									
								
								docs/modules/cli/README.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,40 @@ | ||||
| # StellaOps CLI | ||||
|  | ||||
| The `stella` CLI is the operator-facing Swiss Army knife for scans, exports, policy management, offline kit operations, and automation scripting. | ||||
|  | ||||
| ## Responsibilities | ||||
| - Deliver deterministic verbs for scan, diff, export, policy, and observability operations. | ||||
| - Handle interactive and non-interactive authentication via Authority (device code, client credentials). | ||||
| - Support offline kit workflows including bundle verification and seed installation. | ||||
| - Expose JSON outputs suitable for CI parity and golden tests. | ||||
|  | ||||
| ## Key components | ||||
| - `StellaOps.Cli` native AOT host. | ||||
| - Shared helpers in `StellaOps.Cli.Core`. | ||||
| - Restart-time plug-ins under `StellaOps.Cli.Plugins.*`. | ||||
|  | ||||
| ## Integrations & dependencies | ||||
| - Authority for token exchange. | ||||
| - Backend APIs (Scanner, Policy, Export Center, Notify). | ||||
| - Offline kit bundles and local keychain/DPoP storage. | ||||
|  | ||||
| ## Operational notes | ||||
| - Deterministic output fixtures under `src/Cli/StellaOps.Cli.Tests`. | ||||
| - Versioned command docs in `docs/modules/cli/guides`. | ||||
| - Plugin catalogue in `plugins/cli/**` (restart-only). | ||||
|  | ||||
| ## Related resources | ||||
| - ./guides/20_REFERENCE.md | ||||
| - ./guides/cli-reference.md | ||||
| - ./guides/policy.md | ||||
|  | ||||
| ## Backlog references | ||||
| - DOCS-CLI-OBS-52-001 / DOCS-CLI-FORENSICS-53-001 in ../../TASKS.md. | ||||
| - CLI-CORE-41-001 epic in `src/Cli/StellaOps.Cli/TASKS.md`. | ||||
|  | ||||
| ## Epic alignment | ||||
| - **Epic 2 – Policy Engine & Editor:** deliver deterministic policy authoring, simulation, and explain verbs. | ||||
| - **Epic 4 – Policy Studio:** integrate registry/promotion workflows, approvals, and lint tooling. | ||||
| - **Epic 6 – Vulnerability Explorer:** surface triage and ledger operations. | ||||
| - **Epic 10 – Export Center:** orchestrate export requests, verification, and Offline Kit automation. | ||||
| - **Epic 11 – Notifications Studio:** manage notification authoring/previews from the command line. | ||||
							
								
								
									
										9
									
								
								docs/modules/cli/TASKS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,9 @@ | ||||
| # Task board — CLI | ||||
|  | ||||
| > Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable. | ||||
|  | ||||
| | ID | Status | Owner(s) | Description | Notes | | ||||
| |----|--------|----------|-------------|-------| | ||||
| | CLI-DOCS-0001 | TODO | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md | | ||||
| | CLI-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md | | ||||
| | CLI-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow | | ||||
							
								
								
									
										408
									
								
								docs/modules/cli/architecture.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,408 @@ | ||||
| # component_architecture_cli.md — **Stella Ops CLI** (2025Q4) | ||||
|  | ||||
| > Consolidates requirements captured in the Policy Engine, Policy Studio, Vulnerability Explorer, Export Center, and Notifications implementation plans and module guides. | ||||
|  | ||||
| > **Scope.** Implementation‑ready architecture for **Stella Ops CLI**: command surface, process model, auth (Authority/DPoP), integration with Scanner/Excititor/Concelier/Signer/Attestor, Buildx plug‑in management, offline kit behavior, packaging, observability, security posture, and CI ergonomics. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 0) Mission & boundaries | ||||
|  | ||||
| **Mission.** Provide a **fast, deterministic, CI‑friendly** command‑line interface to drive Stella Ops workflows: | ||||
|  | ||||
| * Build‑time SBOM generation via **Buildx generator** orchestration. | ||||
| * Post‑build **scan/compose/diff/export** against **Scanner.WebService**. | ||||
| * **Policy** operations and **VEX/Vuln** data pulls (operator tasks). | ||||
| * **Verification** (attestation, referrers, signatures) for audits. | ||||
| * Air‑gapped/offline **kit** administration. | ||||
|  | ||||
| **Boundaries.** | ||||
|  | ||||
| * CLI **never** signs; it only calls **Signer**/**Attestor** via backend APIs when needed (e.g., `report --attest`). | ||||
| * CLI **does not** store long‑lived credentials beyond the OS keychain; tokens are **short‑lived** (Authority OpToks). | ||||
| * Heavy work (scanning, merging, policy) is executed **server‑side** (Scanner/Excititor/Concelier). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) Solution layout & runtime form | ||||
|  | ||||
| ``` | ||||
| src/ | ||||
|  ├─ StellaOps.Cli/                         # net10.0 (Native AOT) single binary | ||||
|  ├─ StellaOps.Cli.Core/                    # verb plumbing, config, HTTP, auth | ||||
|  ├─ StellaOps.Cli.Plugins/                 # optional verbs packaged as plugins | ||||
|  ├─ StellaOps.Cli.Tests/                   # unit + golden-output tests | ||||
|  └─ packaging/ | ||||
|      ├─ msix / msi / deb / rpm / brew formula | ||||
|      └─ scoop manifest / winget manifest | ||||
| ``` | ||||
|  | ||||
| **Language/runtime**: .NET 10 **Native AOT** for speed and fast startup; Linux builds are statically linked against **musl** when possible. | ||||
|  | ||||
| **Plug-in verbs.** Non-core verbs (Excititor, runtime helpers, future integrations) ship as restart-time plug-ins under `plugins/cli/**` with manifest descriptors. The launcher loads plug-ins on startup; hot reloading is intentionally unsupported. The inaugural bundle, `StellaOps.Cli.Plugins.NonCore`, packages the Excititor, runtime, and offline-kit command groups and publishes its manifest at `plugins/cli/StellaOps.Cli.Plugins.NonCore/`. | ||||
|  | ||||
| **OS targets**: linux‑x64/arm64, windows‑x64/arm64, macOS‑x64/arm64. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2) Command surface (verbs) | ||||
|  | ||||
| > All verbs emit **JSON** output when `--json` is set (CI mode). Human output is concise and deterministic. | ||||
|  | ||||
| ### 2.1 Auth & profile | ||||
|  | ||||
| * `auth login` | ||||
|  | ||||
|   * Modes: **device‑code** (default), **client‑credentials** (service principal). | ||||
|   * Produces **Authority** access token (OpTok) + stores **DPoP** keypair in OS keychain. | ||||
| * `auth status` — show current issuer, subject, audiences, expiry. | ||||
| * `auth logout` — wipe cached tokens/keys. | ||||
|  | ||||
| ### 2.2 Build‑time SBOM (Buildx) | ||||
|  | ||||
| * `buildx install` — install/update the **StellaOps.Scanner.Sbomer.BuildXPlugin** on the host. | ||||
| * `buildx verify` — ensure generator is usable. | ||||
| * `buildx build` — thin wrapper around `docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer` with convenience flags: | ||||
|  | ||||
|   * `--attest` (request Signer/Attestor via backend post‑push) | ||||
|   * `--provenance` pass‑through (optional) | ||||
|  | ||||
| ### 2.3 Scanning & artifacts | ||||
|  | ||||
| * `scan image <ref|digest>` | ||||
|  | ||||
|   * Options: `--force`, `--wait`, `--view=inventory|usage|both`, `--format=cdx-json|cdx-pb|spdx-json`, `--attest` (ask backend to sign/log). | ||||
|   * Streams progress; exits early unless `--wait` is set. | ||||
| * `diff image --old <digest> --new <digest> [--view ...]` — show layer‑attributed changes. | ||||
| * `export sbom <digest> [--view ... --format ... --out file]` — download artifact. | ||||
| * `report final <digest> [--policy-revision ... --attest]` — request PASS/FAIL report from backend (policy+vex) and optional attestation. | ||||
|  | ||||
| ### 2.4 Policy & data | ||||
|  | ||||
| * `policy get/set/apply` — fetch active policy, apply staged policy, compute digest. | ||||
| * `concelier export` — trigger/export canonical JSON or Trivy DB (admin). | ||||
| * `excititor export` — trigger/export consensus/raw claims (admin). | ||||
|  | ||||
| ### 2.5 Verification | ||||
|  | ||||
| * `verify attestation --uuid <rekor-uuid> | --artifact <sha256> | --bundle <path>` — call **Attestor /verify** and print proof summary. | ||||
| * `verify referrers <digest>` — ask **Signer /verify/referrers** (is image Stella‑signed?). | ||||
| * `verify image-signature <ref|digest>` — standalone cosign verification (optional, local). | ||||
|  | ||||
| ### 2.6 Runtime (Zastava helper) | ||||
|  | ||||
| * `runtime policy test --image/-i <digest> [--file <path> --ns <name> --label key=value --json]` — ask backend `/policy/runtime` like the webhook would (accepts multiple `--image`, comma/space lists, or stdin pipelines). | ||||
|  | ||||
| ### 2.7 Offline kit | ||||
|  | ||||
| * `offline kit pull` — fetch latest **Concelier JSON + Trivy DB + Excititor exports** as a tarball from a mirror. | ||||
| * `offline kit import <tar>` — upload the kit to on‑prem services (Concelier/Excititor). | ||||
| * `offline kit status` — list current seed versions. | ||||
|  | ||||
| ### 2.8 Utilities | ||||
|  | ||||
| * `config set/get` — endpoint & defaults. | ||||
| * `whoami` — short auth display. | ||||
| * `version` — CLI + protocol versions; release channel. | ||||
|  | ||||
| ### 2.9 Aggregation-only guard helpers | ||||
|  | ||||
| * `sources ingest --dry-run --source <id> --input <path|uri> [--tenant ... --format table|json --output file]` | ||||
|  | ||||
|   * Normalises documents (handles gzip/base64), posts them to the backend `aoc/ingest/dry-run` route, and exits non-zero when guard violations are detected. | ||||
|   * Defaults to table output with ANSI colour; `--json`/`--output` produce deterministic JSON for CI pipelines. | ||||
|  | ||||
| * `aoc verify [--since <ISO8601|duration>] [--limit <count>] [--sources list] [--codes list] [--format table|json] [--export file] [--tenant id] [--no-color]` | ||||
|  | ||||
|   * Replays guard checks against stored raw documents. Maps backend `ERR_AOC_00x` codes onto deterministic exit codes so CI can block regressions. | ||||
|   * Supports pagination hints (`--limit`, `--since`), tenant scoping via `--tenant` or `STELLA_TENANT`, and JSON exports for evidence lockers. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3) AuthN: Authority + DPoP | ||||
|  | ||||
| ### 3.1 Token acquisition | ||||
|  | ||||
| * **Device‑code**: the CLI starts an OIDC device‑code flow against **Authority**; service principals can skip the browser login by using client credentials instead. | ||||
| * **Client‑credentials**: service principals use **private_key_jwt** or **mTLS** to get tokens. | ||||
|  | ||||
| ### 3.2 DPoP key management | ||||
|  | ||||
| * On first login, the CLI generates an **ephemeral JWK** (Ed25519) and stores it in the **OS keychain** (Keychain/DPAPI/KWallet/Gnome Keyring). | ||||
| * Every request to backend services includes a **DPoP proof**; CLI refreshes tokens as needed. | ||||
|  | ||||
| ### 3.3 Multi‑audience & scopes | ||||
|  | ||||
| * CLI requests **audiences** as needed per verb: | ||||
|  | ||||
|   * `scanner` for scan/export/report/diff | ||||
|   * `signer` (indirect; usually backend calls Signer) | ||||
|   * `attestor` for verify | ||||
|   * `concelier`/`excititor` for admin verbs | ||||
|  | ||||
| CLI rejects verbs if required scopes are missing. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4) Process model & reliability | ||||
|  | ||||
| ### 4.1 HTTP client | ||||
|  | ||||
| * Single **http2** client with connection pooling, DNS pinning, retry/backoff (idempotent GET/POST marked safe). | ||||
| * **DPoP nonce** handling: on `401` with nonce challenge, CLI replays once. | ||||
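
For CI scripts that wrap the CLI, the retry/backoff idea can be sketched in shell as shown here. This is an illustration only, not the CLI's internal HTTP client: `retry_with_backoff` is a hypothetical helper, the delay doubles after each failed attempt, and it should only wrap operations that are safe to repeat.

```bash
# Retry a command with exponential backoff.
# Usage: retry_with_backoff <max-attempts> <base-delay-seconds> <command...>
retry_with_backoff() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))        # double the wait before the next attempt
    attempt=$((attempt + 1))
  done
}

# Example: `true` stands in for an idempotent request; it succeeds immediately.
retry_with_backoff 3 1 true && echo "request succeeded"
```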
|  | ||||
| ### 4.2 Streaming | ||||
|  | ||||
| * `scan` and `report` support **server‑sent JSON lines** (progress events). | ||||
| * `--json` prints machine events; human mode shows compact spinners and crucial updates only. | ||||
|  | ||||
| ### 4.3 Exit codes (CI‑safe) | ||||
|  | ||||
| | Code | Meaning                                     | | ||||
| | ---- | ------------------------------------------- | | ||||
| | 0    | Success                                     | | ||||
| | 2    | Policy fail (final report verdict=fail)     | | ||||
| | 3    | Verification failed (attestation/signature) | | ||||
| | 4    | Auth error (invalid/missing token/DPoP)     | | ||||
| | 5    | Resource not found (image/SBOM)             | | ||||
| | 6    | Rate limited / quota exceeded               | | ||||
| | 7    | Backend unavailable (retryable)             | | ||||
| | 9    | Invalid arguments                           | | ||||
| | 11–17 | Aggregation-only guard violation (`ERR_AOC_00x`) | | ||||
| | 18   | Verification truncated (increase `--limit`) | | ||||
| | 70   | Transport/authentication failure            | | ||||
| | 71   | CLI usage error (missing tenant, invalid cursor) | | ||||
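
A pipeline can branch on the codes in the table above. The sketch below is one possible policy, not prescribed behaviour: the function name `classify_exit` and the action labels (`pass`, `retry`, `block`, `raise-limit`, `fail`) are assumptions made for illustration.

```bash
# Hypothetical CI helper: decide what to do with a CLI exit code.
# Codes come from the exit-code table; the action names are assumptions.
classify_exit() {
  case "$1" in
    0)      echo "pass" ;;
    7)      echo "retry" ;;        # backend unavailable (retryable)
    2|3)    echo "block" ;;        # policy or verification failure
    1[1-7]) echo "block" ;;        # aggregation-only guard violation
    18)     echo "raise-limit" ;;  # verification truncated
    *)      echo "fail" ;;         # auth, not found, usage, transport, unknown
  esac
}

# Example: a policy failure (exit code 2) blocks the pipeline.
classify_exit 2
```

Treating only code 7 as retryable follows the table, which marks no other code as safe to retry.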
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Configuration model | ||||
|  | ||||
| **Precedence:** CLI flags → env vars → config file → defaults. | ||||
|  | ||||
| **Config file**: `${XDG_CONFIG_HOME}/stellaops/config.yaml` (Windows: `%APPDATA%\StellaOps\config.yaml`) | ||||
|  | ||||
| ```yaml | ||||
| cli: | ||||
|   authority: "https://authority.internal" | ||||
|   backend: | ||||
|     scanner: "https://scanner-web.internal" | ||||
|     attestor: "https://attestor.internal" | ||||
|     concelier: "https://concelier-web.internal" | ||||
|     excititor: "https://excititor-web.internal" | ||||
|   auth: | ||||
|     audienceDefault: "scanner" | ||||
|     deviceCode: true | ||||
|   output: | ||||
|     json: false | ||||
|     color: auto | ||||
|   tls: | ||||
|     caBundle: "/etc/ssl/certs/ca-bundle.crt" | ||||
|   offline: | ||||
|     kitMirror: "s3://mirror/stellaops-kit" | ||||
| ``` | ||||
|  | ||||
| Environment variables: `STELLAOPS_AUTHORITY`, `STELLAOPS_SCANNER_URL`, etc. | ||||
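
The precedence rule (flags, then env vars, then config file, then defaults) amounts to "first non-empty value wins". This is a minimal sketch of that rule; `resolve_setting` is a hypothetical helper and the URLs are placeholder values, not real endpoints.

```bash
# Return the first non-empty value in precedence order:
# CLI flag, environment variable, config file value, built-in default.
resolve_setting() {
  local value
  for value in "$@"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1   # nothing was set at any level
}

# Example: no flag given, env var set, config file has a value, default exists.
flag_value=""
STELLAOPS_AUTHORITY="https://authority.example.test"
file_value="https://authority.internal"
resolve_setting "$flag_value" "$STELLAOPS_AUTHORITY" "$file_value" "https://localhost"
```

In this example the environment variable wins because the flag is empty.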
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Buildx generator orchestration | ||||
|  | ||||
| * `buildx install` locates the Docker root directory, writes the **generator** plugin manifest, and pulls `stellaops/sbom-indexer` image (pinned digest). | ||||
| * `buildx build` wrapper injects: | ||||
|  | ||||
|   * `--attest=type=sbom,generator=stellaops/sbom-indexer` | ||||
|   * `--label org.stellaops.request=sbom` | ||||
| * Post‑build: CLI optionally calls **Scanner.WebService** to **verify referrers**, **compose** image SBOMs, and **attest** via Signer/Attestor. | ||||
|  | ||||
| **Detection**: If Buildx or the generator is unavailable, the CLI falls back to a **post‑build scan** with a warning. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 7) Artifact handling | ||||
|  | ||||
| * **Downloads** (`export sbom`, `report final`): stream to file; compute sha256 on the fly; write sidecar `.sha256` and optional **verification bundle** (if `--bundle`). | ||||
| * **Uploads** (`offline kit import`): chunked upload; retry on transient errors; show progress bar (unless `--json`). | ||||
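
The "hash while streaming" download behaviour can be approximated in shell with `tee`. This sketch only illustrates the idea, not how the CLI implements it: `save_with_sha256` is a hypothetical helper, and the sidecar uses the line format that `sha256sum -c` accepts.

```bash
# Stream stdin to a file while hashing it in the same pass, then write a
# sidecar in the format understood by `sha256sum -c`.
save_with_sha256() {
  local out="$1" digest
  digest="$(tee "$out" | sha256sum | cut -d' ' -f1)" || return 1
  printf '%s  %s\n' "$digest" "$(basename "$out")" > "$out.sha256"
}

# Example: printf stands in for the download stream.
tmp="$(mktemp -d)"
printf 'sbom-bytes' | save_with_sha256 "$tmp/sbom.pb"
(cd "$tmp" && sha256sum -c sbom.pb.sha256)
rm -rf "$tmp"
```

Hashing in the same pass avoids reading a large artifact twice.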
|  | ||||
| --- | ||||
|  | ||||
| ## 8) Security posture | ||||
|  | ||||
| * **DPoP private keys** stored in **OS keychain**; metadata cached in config. | ||||
| * **No plaintext tokens** on disk; short‑lived **OpToks** held in memory. | ||||
| * **TLS**: verify backend certificates; allow custom CA bundle for on‑prem. | ||||
| * **Redaction**: CLI logs remove `Authorization`, DPoP headers, PoE tokens. | ||||
| * **Supply chain**: CLI distribution binaries are **cosign‑signed**; `stellaops version --verify` checks its own signature. | ||||
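
When CLI traces pass through other tooling, a similar redaction can be applied as a stream filter. This is a hedged sketch: `redact_headers` is a hypothetical helper that masks only `Authorization` and `DPoP` header lines. PoE tokens are not covered because their format is not described here.

```bash
# Mask the values of Authorization and DPoP header lines in a log stream.
# Header names are matched in common casings without GNU-only sed flags.
redact_headers() {
  sed -E 's/^([[:space:]]*([Aa]uthorization|[Dd][Pp][Oo][Pp]):[[:space:]]*).*/\1[REDACTED]/'
}

# Example: only the sensitive header values are replaced.
printf 'Authorization: DPoP abc.def\nAccept: application/json\n' | redact_headers
```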
|  | ||||
| --- | ||||
|  | ||||
| ## 9) Observability | ||||
|  | ||||
| * `--verbose` adds request IDs, timings, and retry traces. | ||||
| * **Metrics** (optional, disabled by default): Prometheus text file exporter for local monitoring in long‑running agents. | ||||
| * **Structured logs** (`--json`): per‑event JSON lines with `ts`, `verb`, `status`, `latencyMs`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 10) Performance targets | ||||
|  | ||||
| * Startup ≤ **20 ms** (AOT). | ||||
| * `scan image` request/response overhead ≤ **5 ms** (excluding server work). | ||||
| * Buildx wrapper overhead negligible (<1 ms). | ||||
| * Large artifact download (100 MB) sustained ≥ **80 MB/s** on local networks. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 11) Tests & golden outputs | ||||
|  | ||||
| * **Unit tests**: argument parsing, config precedence, URL resolution, DPoP proof creation. | ||||
| * **Integration tests** (Testcontainers): mock Authority/Scanner/Attestor; CI pipeline with fake registry. | ||||
| * **Golden outputs**: verb snapshots for `--json` across OSes; kept in `tests/golden/…`. | ||||
| * **Contract tests**: ensure API shapes match service OpenAPI; fail build if incompatible. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 12) Error envelopes (human + JSON) | ||||
|  | ||||
| **Human:** | ||||
|  | ||||
| ``` | ||||
| ✖ Policy FAIL: 3 high, 1 critical (VEX suppressed 12) | ||||
|   - pkg:rpm/openssl (CVE-2025-12345) — affected (vendor) — fixed in 3.0.14 | ||||
|   - pkg:npm/lodash (GHSA-xxxx) — affected — no fix | ||||
|   See: https://ui.internal/scans/sha256:... | ||||
| Exit code: 2 | ||||
| ``` | ||||
|  | ||||
| **JSON (`--json`):** | ||||
|  | ||||
| ```json | ||||
| { "event":"report", "status":"fail", "critical":1, "high":3, "url":"https://ui..." } | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 13) Admin & advanced flags | ||||
|  | ||||
| * `--authority`, `--scanner`, `--attestor`, `--concelier`, `--excititor` override config URLs. | ||||
| * `--no-color`, `--quiet`, `--json`. | ||||
| * `--timeout`, `--retries`, `--retry-backoff-ms`. | ||||
| * `--ca-bundle`, `--insecure` (dev only; prints warning). | ||||
| * `--trace` (dump HTTP traces to file; scrubbed). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 14) Interop with other tools | ||||
|  | ||||
| * Emits **CycloneDX Protobuf** directly to stdout when invoked as `export sbom --format cdx-pb --out -`. | ||||
| * Pipes to `jq`/`yq` cleanly in JSON mode. | ||||
| * Can act as a **credential helper** for scripts: `stellaops auth token --aud scanner` prints a one‑shot token for curl. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 15) Packaging & distribution | ||||
|  | ||||
| * **Installers**: deb/rpm (postinst registers completions), Homebrew, Scoop, Winget, MSI/MSIX. | ||||
| * **Shell completions**: bash/zsh/fish/pwsh. | ||||
| * **Update channel**: `stellaops self-update` (optional) fetches cosign‑signed release manifest; corporate environments can disable. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 16) Security hard lines | ||||
|  | ||||
| * Refuse to print token values; redact Authorization headers in verbose output. | ||||
| * Disallow `--insecure` unless `STELLAOPS_CLI_ALLOW_INSECURE=1` set (double opt‑in). | ||||
| * Enforce **short token TTL**; refresh proactively when <30 s left. | ||||
| * Bind the device‑code cache to **machine** and **user** (protects against copying the cache to other machines). | ||||
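
Two of the hard lines above are simple predicates: the 30-second refresh threshold and the double opt-in for `--insecure`. The sketch below expresses them in shell for illustration only. Both function names are hypothetical and the `yes`/`no` flag encoding is an assumption; only the threshold and the `STELLAOPS_CLI_ALLOW_INSECURE` variable come from the list.

```bash
# True (exit 0) when the token expires in less than 30 seconds.
# Both arguments are Unix timestamps in seconds.
needs_refresh() {
  local expires_at="$1" now="$2"
  [ $((expires_at - now)) -lt 30 ]
}

# True only when the flag was passed AND the environment variable is "1".
insecure_allowed() {
  local flag_passed="$1"
  [ "$flag_passed" = "yes" ] && [ "${STELLAOPS_CLI_ALLOW_INSECURE:-}" = "1" ]
}

# Example: a token with 10 seconds left should be refreshed.
now="$(date +%s)"
needs_refresh "$((now + 10))" "$now" && echo "refresh now"
```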
|  | ||||
| --- | ||||
|  | ||||
| ## 17) Wire sequences | ||||
|  | ||||
| **A) Scan & wait with attestation** | ||||
|  | ||||
| ```mermaid | ||||
| sequenceDiagram | ||||
|   autonumber | ||||
|   participant CLI | ||||
|   participant Auth as Authority | ||||
|   participant SW as Scanner.WebService | ||||
|   participant SG as Signer | ||||
|   participant AT as Attestor | ||||
|  | ||||
|   CLI->>Auth: device code flow (DPoP) | ||||
|   Auth-->>CLI: OpTok (aud=scanner) | ||||
|  | ||||
|   CLI->>SW: POST /scans { imageRef, attest:true } | ||||
|   SW-->>CLI: { scanId } | ||||
|   CLI->>SW: GET /scans/{id} (poll) | ||||
|   SW-->>CLI: { status: completed, artifacts, rekor? }  # if attested | ||||
|  | ||||
|   alt attestation pending | ||||
|     SW->>SG: POST /sign/dsse (server-side) | ||||
|     SG-->>SW: DSSE | ||||
|     SW->>AT: POST /rekor/entries | ||||
|     AT-->>SW: { uuid, proof } | ||||
|   end | ||||
|  | ||||
|   CLI->>SW: GET /sboms/<digest>?format=cdx-pb&view=usage | ||||
|   SW-->>CLI: bytes | ||||
| ``` | ||||
|  | ||||
| **B) Verify attestation by artifact** | ||||
|  | ||||
| ```mermaid | ||||
| sequenceDiagram | ||||
|   autonumber | ||||
|   participant CLI | ||||
|   participant AT as Attestor | ||||
|  | ||||
|   CLI->>AT: POST /rekor/verify { artifactSha256 } | ||||
|   AT-->>CLI: { ok:true, uuid, index, logURL } | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 18) Roadmap (CLI) | ||||
|  | ||||
| * `scan fs <path>` (local filesystem tree) → upload to backend for analysis. | ||||
| * `policy test --sbom <file>` (simulate policy results offline using local policy bundle). | ||||
| * `runtime capture` (developer mode) — capture small `/proc/<pid>/maps` for troubleshooting. | ||||
| * Pluggable output renderers for SARIF/HTML (admin‑controlled). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 19) Example CI snippets | ||||
|  | ||||
| **GitHub Actions (post‑build)** | ||||
|  | ||||
| ```yaml | ||||
| - name: Login (device code w/ OIDC broker) | ||||
|   run: stellaops auth login --json --authority ${{ secrets.AUTHORITY_URL }} | ||||
|  | ||||
| - name: Scan | ||||
|   run: stellaops scan image ${{ steps.build.outputs.digest }} --wait --json | ||||
|  | ||||
| - name: Export (usage view, protobuf) | ||||
|   run: stellaops export sbom ${{ steps.build.outputs.digest }} --view usage --format cdx-pb --out sbom.pb | ||||
|  | ||||
| - name: Verify attestation | ||||
|   run: stellaops verify attestation --artifact $(sha256sum sbom.pb | cut -d' ' -f1) --json | ||||
| ``` | ||||
|  | ||||
| **GitLab (buildx generator)** | ||||
|  | ||||
| ```yaml | ||||
| script: | ||||
|   - stellaops buildx install | ||||
|   - docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA . | ||||
|   - stellaops scan image $CI_REGISTRY_IMAGE@$IMAGE_DIGEST --wait --json | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 20) Test matrix (OS/arch) | ||||
|  | ||||
| * Linux: ubuntu‑20.04/22.04/24.04 (x64, arm64), alpine (musl). | ||||
| * macOS: 13–15 (x64, arm64). | ||||
| * Windows: 10/11, Server 2019/2022 (x64, arm64). | ||||
| * Docker engines: Docker Desktop, containerd‑based runners. | ||||
							
								
								
									
										8
									
								
								docs/modules/cli/guides/20_REFERENCE.md
									
									
									
									
									
										Executable file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,8 @@ | ||||
| # CLI Reference (`stella --help`) | ||||
|  | ||||
| > **Auto‑generated file — do not edit manually.**   | ||||
| > On every tagged release the CI pipeline runs   | ||||
| > `stella --help --markdown > docs/modules/cli/guides/20_REFERENCE.md`   | ||||
| > ensuring this document always matches the shipped binary. | ||||
|  | ||||
| *(The reference will appear after the first public α release.)* | ||||
							
								
								
									
										316
									
								
								docs/modules/cli/guides/cli-reference.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,316 @@ | ||||
| # CLI AOC Commands Reference | ||||
|  | ||||
| > **Audience:** DevEx engineers, operators, and CI authors integrating the `stella` CLI with Aggregation-Only Contract (AOC) workflows.   | ||||
| > **Scope:** Command synopsis, options, exit codes, and offline considerations for `stella sources ingest --dry-run` and `stella aoc verify` as introduced in Sprint 19. | ||||
|  | ||||
| Both commands are designed to enforce the AOC guardrails documented in the [aggregation-only reference](../../ingestion/aggregation-only-contract.md) and the [architecture overview](../../platform/architecture-overview.md). They consume Authority-issued tokens with tenant scopes and never mutate ingestion stores. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1 · Prerequisites | ||||
|  | ||||
| - CLI version: `stella` ≥ 0.19.0 (AOC feature gate enabled). | ||||
| - Required scopes (DPoP-bound): | ||||
|   - `advisory:read` for Concelier sources. | ||||
|   - `vex:read` for Excititor sources (optional but required for VEX checks). | ||||
|   - `aoc:verify` to invoke guard verification endpoints. | ||||
|   - `tenant:select` if your deployment uses tenant switching. | ||||
| - Connectivity: direct access to Concelier/Excititor APIs or an Offline Kit snapshot (see § 2.7 and § 3.8). | ||||
| - Environment: set `STELLA_AUTHORITY_URL`, `STELLA_TENANT`, and export a valid OpTok via `stella auth login` or existing token cache. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2 · `stella sources ingest --dry-run` | ||||
|  | ||||
| ### 2.1 Synopsis | ||||
|  | ||||
| ```bash | ||||
| stella sources ingest --dry-run \ | ||||
|   --source <source-key> \ | ||||
|   --input <path-or-uri> \ | ||||
|   [--tenant <tenant-id>] \ | ||||
|   [--format json|table] \ | ||||
|   [--no-color] \ | ||||
|   [--output <file>] | ||||
| ``` | ||||
|  | ||||
| ### 2.2 Description | ||||
|  | ||||
| Previews an ingestion write without touching MongoDB. The command loads an upstream advisory or VEX document, computes the would-write payload, runs it through the `AOCWriteGuard`, and reports any forbidden fields, provenance gaps, or idempotency issues. Use it during connector development, CI validation, or while triaging incidents. | ||||
|  | ||||
| ### 2.3 Options | ||||
|  | ||||
| | Option | Description | | ||||
| |--------|-------------| | ||||
| | `--source <source-key>` | Logical source name (`redhat`, `ubuntu`, `osv`, etc.). Mirrors connector configuration. | | ||||
| | `--input <path-or-uri>` | Path to local CSAF/OSV/VEX file or HTTPS URI. CLI normalises transport (gzip/base64) before guard evaluation. | | ||||
| | `--tenant <tenant-id>` | Overrides default tenant for multi-tenant deployments. Mandatory when `STELLA_TENANT` is not set. | | ||||
| | `--format json|table` | Output format. `table` (default) prints summary with highlighted violations; `json` emits machine-readable report (see below). | | ||||
| | `--no-color` | Disables ANSI colour output for CI logs. | | ||||
| | `--output <file>` | Writes the JSON report to file while still printing human-readable summary to stdout. | | ||||
|  | ||||
| ### 2.4 Output schema (JSON) | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "source": "redhat", | ||||
|   "tenant": "default", | ||||
|   "guardVersion": "1.0.0", | ||||
|   "status": "ok", | ||||
|   "document": { | ||||
|     "contentHash": "sha256:…", | ||||
|     "supersedes": null, | ||||
|     "provenance": { | ||||
|       "signature": { "format": "pgp", "present": true } | ||||
|     } | ||||
|   }, | ||||
|   "violations": [] | ||||
| } | ||||
| ``` | ||||
|  | ||||
| When violations exist, `status` becomes `error` and `violations` contains entries with `code` (`ERR_AOC_00x`), a short `message`, and JSON Pointer `path` values indicating offending fields. | ||||
|  | ||||
| ### 2.5 Exit codes | ||||
|  | ||||
| | Exit code | Meaning | | ||||
| |-----------|---------| | ||||
| | `0` | Guard passed; would-write payload is AOC compliant. | | ||||
| | `11` | `ERR_AOC_001` – Forbidden field (`severity`, `cvss`, etc.) detected. | | ||||
| | `12` | `ERR_AOC_002` – Merge attempt (multiple upstream sources fused). | | ||||
| | `13` | `ERR_AOC_003` – Idempotency violation (duplicate without supersedes). | | ||||
| | `14` | `ERR_AOC_004` – Missing provenance fields. | | ||||
| | `15` | `ERR_AOC_005` – Signature/checksum mismatch. | | ||||
| | `16` | `ERR_AOC_006` – Effective findings present (Policy-only data). | | ||||
| | `17` | `ERR_AOC_007` – Unknown top-level fields / schema violation. | | ||||
| | `70` | Transport error (network, auth, malformed input). | | ||||
|  | ||||
| > Exit codes map directly to the `ERR_AOC_00x` table for scripting consistency. Multiple violations yield the highest-priority code (e.g., 11 takes precedence over 14). | ||||
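For scripting, the table above can be captured in a small shell helper. This is a minimal sketch: the function name `aoc_exit_to_code` is hypothetical and not part of the `stella` CLI, and it only covers the statuses documented in this section.

```bash
# Hypothetical helper (not shipped with stella): translate a dry-run exit
# status into the ERR_AOC code from the table in § 2.5.
aoc_exit_to_code() {
  case "${1-}" in
    0)      echo "OK" ;;
    1[1-7]) printf 'ERR_AOC_00%s\n' "$(( $1 - 10 ))" ;;  # 11..17 -> 001..007
    70)     echo "TRANSPORT_ERROR" ;;
    *)      echo "UNKNOWN" ;;                            # undocumented status
  esac
}

# Usage sketch (requires the stella CLI):
#   stella sources ingest --dry-run --source redhat --input ./advisory.json
#   status=$?
#   echo "guard result: $(aoc_exit_to_code "$status")"
```

Capturing `$?` in a variable immediately after the command keeps the status from being overwritten by later commands.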
|  | ||||
| ### 2.6 Examples | ||||
|  | ||||
| Dry-run a local CSAF file: | ||||
|  | ||||
| ```bash | ||||
| stella sources ingest --dry-run \ | ||||
|   --source redhat \ | ||||
|   --input ./fixtures/redhat/RHSA-2025-1234.json | ||||
| ``` | ||||
|  | ||||
| Stream from HTTPS and emit JSON for CI: | ||||
|  | ||||
| ```bash | ||||
| stella sources ingest --dry-run \ | ||||
|   --source osv \ | ||||
|   --input https://osv.dev/vulnerability/GHSA-aaaa-bbbb \ | ||||
|   --format json \ | ||||
|   --output artifacts/osv-dry-run.json | ||||
|  | ||||
| jq '.violations' artifacts/osv-dry-run.json | ||||
| ``` | ||||
|  | ||||
| ### 2.7 Offline notes | ||||
|  | ||||
| When operating in sealed/offline mode: | ||||
|  | ||||
| - Use `--input` paths pointing to Offline Kit snapshots (`offline-kit/advisories/*.json`). | ||||
| - Provide `--tenant` explicitly if the offline bundle contains multiple tenants. | ||||
| - The command does not attempt network access when given a file path. | ||||
| - Store reports with `--output` to include in transfer packages for policy review. | ||||
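The offline notes above could be combined into a batch step along the following lines. This is a sketch under assumptions: the function name, the argument order, and the `<name>.report.json` naming are illustrative choices, while the flags are the ones listed in § 2.3.

```bash
# Hypothetical helper: dry-run every JSON file in an Offline Kit snapshot
# directory and keep one JSON report per input for the transfer package.
# Arguments: <source-key> <tenant-id> <snapshot-dir> <report-dir>
dry_run_snapshot() {
  drs_source=${1-} drs_tenant=${2-} drs_in=${3-} drs_out=${4-}
  drs_failed=0
  mkdir -p "$drs_out" || return 1
  for drs_file in "$drs_in"/*.json; do
    [ -e "$drs_file" ] || continue          # directory had no matching files
    drs_name=$(basename "$drs_file" .json)
    # A file path as --input means no network access is attempted (§ 2.7).
    stella sources ingest --dry-run \
      --source "$drs_source" --tenant "$drs_tenant" \
      --input "$drs_file" --no-color \
      --output "$drs_out/$drs_name.report.json" \
      || drs_failed=$((drs_failed + 1))     # keep going so every file gets a report
  done
  [ "$drs_failed" -eq 0 ]                   # non-zero status if any file failed
}

# Usage sketch:
#   dry_run_snapshot redhat default offline-kit/advisories artifacts/dry-run
```

The loop deliberately continues after a failure so that reviewers receive a report for every document rather than only the first offender.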
|  | ||||
| --- | ||||
|  | ||||
| ## 3 · `stella aoc verify` | ||||
|  | ||||
| ### 3.1 Synopsis | ||||
|  | ||||
| ```bash | ||||
| stella aoc verify \ | ||||
|   [--since <iso8601|duration>] \ | ||||
|   [--limit <count>] \ | ||||
|   [--sources <list>] \ | ||||
|   [--codes <ERR_AOC_00x,...>] \ | ||||
|   [--format table|json] \ | ||||
|   [--export <file>] \ | ||||
|   [--tenant <tenant-id>] \ | ||||
|   [--no-color] | ||||
| ``` | ||||
|  | ||||
| ### 3.2 Description | ||||
|  | ||||
| Replays the AOC guard against stored raw documents. By default it checks all advisories and VEX statements ingested in the last 24 hours for the active tenant, reporting totals, top violation codes, and sample documents. Use it in CI pipelines, scheduled verifications, or during incident response. | ||||
|  | ||||
| ### 3.3 Options | ||||
|  | ||||
| | Option | Description | | ||||
| |--------|-------------| | ||||
| | `--since <value>` | Verification window. Accepts ISO 8601 timestamp (`2025-10-25T12:00:00Z`) or duration (`48h`, `7d`). Defaults to `24h`. | | ||||
| | `--limit <count>` | Maximum number of violations to display (per code). `0` means show all. Defaults to `20`. | | ||||
| | `--sources <list>` | Comma-separated list of sources (`redhat,ubuntu,osv`). Filters both advisories and VEX entries. | | ||||
| | `--codes <list>` | Restricts output to specific `ERR_AOC_00x` codes. Useful for regression tracking. | | ||||
| | `--format table|json` | `table` (default) prints summary plus top violations; `json` outputs machine-readable report identical to the `/aoc/verify` API. | | ||||
| | `--export <file>` | Writes the JSON report to disk (useful for audits/offline uploads). | | ||||
| | `--tenant <tenant-id>` | Overrides tenant context. Required for cross-tenant verifications when run by platform operators. | | ||||
| | `--no-color` | Disables ANSI colours. | | ||||
|  | ||||
| `table` mode prints a summary showing the active tenant, evaluated window, counts of checked advisories/VEX statements, the active limit, total writes/violations, and whether the page was truncated. Status is colour-coded as `ok`, `violations`, or `truncated`. When violations exist the detail table lists the code, total occurrences, first sample document (`source` + `documentId` + `contentHash`), and JSON pointer path. | ||||
|  | ||||
| ### 3.4 Report structure (JSON) | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "tenant": "default", | ||||
|   "window": { | ||||
|     "from": "2025-10-25T12:00:00Z", | ||||
|     "to": "2025-10-26T12:00:00Z" | ||||
|   }, | ||||
|   "checked": { | ||||
|     "advisories": 482, | ||||
|     "vex": 75 | ||||
|   }, | ||||
|   "violations": [ | ||||
|     { | ||||
|       "code": "ERR_AOC_001", | ||||
|       "count": 2, | ||||
|       "examples": [ | ||||
|         { | ||||
|           "source": "redhat", | ||||
|           "documentId": "advisory_raw:redhat:RHSA-2025:1", | ||||
|           "contentHash": "sha256:…", | ||||
|           "path": "/content/raw/cvss" | ||||
|         } | ||||
|       ] | ||||
|     } | ||||
|   ], | ||||
|   "metrics": { | ||||
|     "ingestion_write_total": 557, | ||||
|     "aoc_violation_total": 2 | ||||
|   }, | ||||
|   "truncated": false | ||||
| } | ||||
| ``` | ||||
|  | ||||
| ### 3.5 Exit codes | ||||
|  | ||||
| | Exit code | Meaning | | ||||
| |-----------|---------| | ||||
| | `0` | Verification succeeded with zero violations. | | ||||
| | `11…17` | Same mapping as § 2.5 when violations are detected. Highest-priority code returned. | | ||||
| | `18` | Verification ran but results truncated (limit reached) – treat as warning; rerun with higher `--limit`. | | ||||
| | `70` | Transport/authentication error. | | ||||
| | `71` | CLI misconfiguration (missing tenant, invalid `--since`, etc.). | | ||||
|  | ||||
| ### 3.6 Examples | ||||
|  | ||||
| Daily verification across all sources: | ||||
|  | ||||
| ```bash | ||||
| stella aoc verify --since 24h --format table | ||||
| ``` | ||||
|  | ||||
| CI pipeline focusing on errant sources and exporting evidence: | ||||
|  | ||||
| ```bash | ||||
| stella aoc verify \ | ||||
|   --sources redhat,ubuntu \ | ||||
|   --codes ERR_AOC_001,ERR_AOC_004 \ | ||||
|   --format json \ | ||||
|   --limit 100 \ | ||||
|   --export artifacts/aoc-verify.json | ||||
|  | ||||
| jq '.violations[] | {code, count}' artifacts/aoc-verify.json | ||||
| ``` | ||||
|  | ||||
| Air-gapped verification using Offline Kit snapshot (example script): | ||||
|  | ||||
| ```bash | ||||
| stella aoc verify \ | ||||
|   --since 7d \ | ||||
|   --format json \ | ||||
|   --export /mnt/offline/aoc-verify-$(date +%F).json | ||||
|  | ||||
| sha256sum /mnt/offline/aoc-verify-*.json > /mnt/offline/checksums.txt | ||||
| ``` | ||||
|  | ||||
| ### 3.7 Automation tips | ||||
|  | ||||
| - Schedule with `cron` or a platform scheduler and fail the job when the exit code is ≥ 11; exit code `18` (truncated results) can be downgraded to a warning as described in § 3.5. | ||||
| - Pair with `stella sources ingest --dry-run` for pre-flight validation before re-enabling a paused source. | ||||
| - Push JSON exports to observability pipelines for historical tracking of violation counts. | ||||
|  | ||||
| ### 3.8 Offline notes | ||||
|  | ||||
| - Works against Offline Kit Mongo snapshots when CLI is pointed at the local API gateway included in the bundle. | ||||
| - When fully disconnected, run against exported `aoc verify` reports generated on production and replay them using `--format json --export` (automation recipe above). | ||||
| - Include verification output in compliance packages alongside Offline Kit manifests. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4 · Global exit-code reference | ||||
|  | ||||
| | Code | Summary | | ||||
| |------|---------| | ||||
| | `0` | Success / no violations. | | ||||
| | `11` | `ERR_AOC_001` – Forbidden field present. | | ||||
| | `12` | `ERR_AOC_002` – Merge attempt detected. | | ||||
| | `13` | `ERR_AOC_003` – Idempotency violation. | | ||||
| | `14` | `ERR_AOC_004` – Missing provenance/signature metadata. | | ||||
| | `15` | `ERR_AOC_005` – Signature/checksum mismatch. | | ||||
| | `16` | `ERR_AOC_006` – Effective findings in ingestion payload. | | ||||
| | `17` | `ERR_AOC_007` – Schema violation / unknown fields. | | ||||
| | `18` | Partial verification (limit reached). | | ||||
| | `70` | Transport or HTTP failure. | | ||||
| | `71` | CLI usage error (invalid arguments, missing tenant). | | ||||
|  | ||||
| Use these codes in CI to map outcomes to build statuses or alert severities. | ||||
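One way to map these codes to build outcomes is a thin wrapper around the CLI call. The sketch below is an assumption about how a pipeline might choose to react; the function name `aoc_gate` and the pass/warn/fail split are illustrative, not behaviour the CLI mandates.

```bash
# Hypothetical CI wrapper: run any AOC command and collapse its exit status
# into 0 (pass or warning), 1 (AOC violation), or 2 (tool or usage error).
aoc_gate() {
  gate_rc=0
  "$@" || gate_rc=$?
  case "$gate_rc" in
    0)      echo "aoc: pass" ;;
    18)     echo "aoc: warning, results truncated; rerun with a higher --limit" >&2 ;;
    1[1-7]) echo "aoc: violation ERR_AOC_00$((gate_rc - 10))" >&2; return 1 ;;
    *)      echo "aoc: tool error (exit $gate_rc)" >&2; return 2 ;;
  esac
}

# Usage sketch:
#   aoc_gate stella aoc verify --since 24h --format table
```

Treating `18` as a warning follows the guidance in § 3.5; stricter pipelines may prefer to fail on it instead.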
|  | ||||
| --- | ||||
|  | ||||
| ## 4 · `stella vuln observations` (Overlay paging) | ||||
|  | ||||
| `stella vuln observations` lists raw advisory observations for downstream overlays (Graph Explorer, Policy simulations, Console). Large tenants can now page through results deterministically. | ||||
|  | ||||
| | Option | Description | | ||||
| |--------|-------------| | ||||
| | `--limit <count>` | Caps the number of observations returned in a single call. Defaults to `200`; values above `500` are clamped server-side. | | ||||
| | `--cursor <token>` | Opaque continuation token produced by the previous page (`nextCursor` in JSON output). Pass it back to resume iteration. | | ||||
|  | ||||
| Additional notes: | ||||
|  | ||||
| - Table mode prints a hint when `hasMore` is `true`:   | ||||
|   `[yellow]More observations available. Continue with --cursor <token>[/]`. | ||||
| - JSON mode returns `nextCursor` and `hasMore` alongside the observation list so automation can loop until `hasMore` is `false`. | ||||
| - Supplying a non-positive limit falls back to the default (`200`). Invalid/expired cursors yield `400 Bad Request`; restart without `--cursor` to begin a fresh iteration. | ||||
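The paging contract described above can be driven by a loop such as the following. It is a sketch with stated assumptions: `--format json` is assumed to select JSON mode for this command (the option table above lists only `--limit` and `--cursor`), `jq` is assumed to be installed, and the function name is hypothetical.

```bash
# Sketch: print every page of observations, one JSON document per page,
# following nextCursor until hasMore is false.
# Assumptions: `--format json` enables JSON mode; jq is available.
fetch_all_observations() {
  obs_cursor=""
  while :; do
    if [ -n "$obs_cursor" ]; then
      obs_page=$(stella vuln observations --format json --limit 200 \
        --cursor "$obs_cursor") || return $?
    else
      obs_page=$(stella vuln observations --format json --limit 200) || return $?
    fi
    printf '%s\n' "$obs_page"
    # Stop once the server reports that no further pages exist.
    [ "$(printf '%s' "$obs_page" | jq -r '.hasMore')" = "true" ] || break
    obs_cursor=$(printf '%s' "$obs_page" | jq -r '.nextCursor')
  done
}

# Usage sketch:
#   fetch_all_observations > artifacts/observations.jsonl
```

If a cursor expires mid-run, the `400 Bad Request` surfaces as a non-zero exit status from the function, and the caller can restart the iteration without a cursor.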
|  | ||||
| --- | ||||
|  | ||||
| ## 5 · Related references | ||||
|  | ||||
| - [Aggregation-Only Contract reference](../../ingestion/aggregation-only-contract.md) | ||||
| - [Architecture overview](../../platform/architecture-overview.md) | ||||
| - [Console AOC dashboard](../../ui/console.md) | ||||
| - [Authority scopes](../../authority/architecture.md) | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6 · Compliance checklist | ||||
|  | ||||
| - [ ] Usage documented for both table and JSON formats. | ||||
| - [ ] Exit-code mapping matches `ERR_AOC_00x` definitions and automation guidance. | ||||
| - [ ] Offline/air-gap workflow captured for both commands. | ||||
| - [ ] References to AOC architecture and console docs included. | ||||
| - [ ] Examples validated against current CLI syntax (update post-implementation). | ||||
| - [ ] Docs guild screenshot/narrative placeholder logged for release notes (pending CLI team capture). | ||||
|  | ||||
| --- | ||||
|  | ||||
| *Last updated: 2025-10-29 (Sprint 24).* | ||||
|  | ||||
| ## 13. Authority configuration quick reference | ||||
|  | ||||
| | Setting | Purpose | How to set | | ||||
| |---------|---------|------------| | ||||
| | `StellaOps:Authority:OperatorReason` | Incident/change description recorded with `orch:operate` tokens. | CLI flag `--Authority:OperatorReason=...` or env `STELLAOPS_ORCH_REASON`. | | ||||
| | `StellaOps:Authority:OperatorTicket` | Change/incident ticket reference paired with orchestrator control actions. | CLI flag `--Authority:OperatorTicket=...` or env `STELLAOPS_ORCH_TICKET`. | | ||||
|  | ||||
| > Tokens requesting `orch:operate` will fail with `invalid_request` unless both values are present. Choose concise strings (≤256 chars for reason, ≤128 chars for ticket) and avoid sensitive data. | ||||
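Because a missing or oversized value only fails at token request time, a pre-flight check in the calling script can catch the problem earlier. The following is a minimal sketch; the function name is hypothetical, and only the two environment variables and the length limits come from the table and note above.

```bash
# Hypothetical pre-flight check: require both operator values, enforce the
# documented length limits, then export them for the CLI to pick up.
set_operator_context() {
  op_reason=${1-} op_ticket=${2-}
  if [ -z "$op_reason" ] || [ -z "$op_ticket" ]; then
    echo "both a reason and a ticket are required for orch:operate" >&2
    return 1
  fi
  # ${#var} is the string length as counted by the shell.
  if [ "${#op_reason}" -gt 256 ] || [ "${#op_ticket}" -gt 128 ]; then
    echo "reason must be <=256 chars and ticket <=128 chars" >&2
    return 1
  fi
  export STELLAOPS_ORCH_REASON="$op_reason"
  export STELLAOPS_ORCH_TICKET="$op_ticket"
}

# Usage sketch:
#   set_operator_context "Pause misbehaving source" "CHG-1234" || exit 1
```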
|  | ||||
							
								
								
									
										318
									
								
								docs/modules/cli/guides/policy.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,318 @@ | ||||
| # Stella CLI — Policy Commands | ||||
|  | ||||
| > **Audience:** Policy authors, reviewers, operators, and CI engineers using the `stella` CLI to interact with Policy Engine.   | ||||
| > **Supported from:** `stella` CLI ≥ 0.20.0 (Policy Engine v2 sprint line).   | ||||
| > **Prerequisites:** Authority-issued bearer token with the scopes noted per command (export `STELLA_TOKEN` or pass `--token`). | ||||
| > **2025-10-27 scope update:** CLI/CI tokens issued prior to Sprint 23 (AUTH-POLICY-23-001) must drop `policy:write`/`policy:submit`/`policy:edit` and instead request `policy:read`, `policy:author`, `policy:review`, and `policy:simulate` (plus `policy:approve`/`policy:operate`/`policy:activate` for promotion pipelines). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1 · Global Options & Output Modes | ||||
|  | ||||
| All `stella policy *` commands honour the common CLI options: | ||||
|  | ||||
| | Flag | Default | Description | | ||||
| |------|---------|-------------| | ||||
| | `--server <url>` | `https://stella.local` | Policy Engine gateway root. | | ||||
| | `--tenant <id>` | token default | Override tenant for multi-tenant installs. | | ||||
| | `--format <table\|json\|yaml>` | `table` for TTY, `json` otherwise | Output format for listings/diffs. | | ||||
| | `--output <file>` | stdout | Write full JSON payload to file. | | ||||
| | `--sealed` | false | Force sealed-mode behaviour (no outbound fetch). | | ||||
| | `--trace` | false | Emit verbose timing/log correlation info. | | ||||
|  | ||||
| > **Tip:** Set `STELLA_PROFILE=policy` in CI to load saved defaults from `~/.stella/profiles/policy.toml`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2 · Authoring & Drafting Commands | ||||
|  | ||||
| ### 2.1 `stella policy new` | ||||
|  | ||||
| Create a draft policy from a template or from scratch. | ||||
|  | ||||
| ``` | ||||
| stella policy new --policy-id P-7 --name "Default Org Policy" \ | ||||
|   --template baseline --output-path policies/P-7.stella | ||||
| ``` | ||||
|  | ||||
| Options: | ||||
|  | ||||
| | Flag | Description | | ||||
| |------|-------------| | ||||
| | `--policy-id` *(required)* | Stable identifier (e.g., `P-7`). | | ||||
| | `--name` | Friendly display name. | | ||||
| | `--template` | `baseline`, `serverless`, `blank`. | | ||||
| | `--from` | Start from existing version (`policyId@version`). | | ||||
| | `--open` | Launches `$EDITOR` after creation. | | ||||
|  | ||||
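| Writes the DSL to a local file and registers a draft version (`status=draft`). Requires `policy:write`; tokens issued after the 2025-10-27 scope update use the replacement scopes listed at the top of this guide. | ||||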
|  | ||||
| ### 2.2 `stella policy edit` | ||||
|  | ||||
| Open an existing draft in the local editor. | ||||
|  | ||||
| ``` | ||||
| stella policy edit P-7 --version 4 | ||||
| ``` | ||||
|  | ||||
| - Auto-checks out latest draft if `--version` omitted. | ||||
| - Saves to temp file, uploads on editor exit (unless `--no-upload`). | ||||
| - Use `--watch` to keep command alive and re-upload on every save. | ||||
|  | ||||
| ### 2.3 `stella policy lint` | ||||
|  | ||||
| Static validation without submitting. | ||||
|  | ||||
| ``` | ||||
| stella policy lint policies/P-7.stella --format json | ||||
| ``` | ||||
|  | ||||
| Outputs diagnostics (line/column, code, message). Exit codes: | ||||
|  | ||||
| | Code | Meaning | | ||||
| |------|---------| | ||||
| | `0` | No lint errors. | | ||||
| | `10` | Syntax/compile errors (`ERR_POL_001`). | | ||||
| | `11` | Unsupported syntax version. | | ||||
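In CI, these exit codes can gate a whole directory of policy files. The sketch below assumes policies live in files ending in `.stella`, as in the example above; the function name `lint_policies` is hypothetical.

```bash
# Hypothetical CI step: lint every *.stella file in a directory and return
# non-zero if any file fails, reporting the documented reason on stderr.
lint_policies() {
  lint_dir=${1-}
  lint_failed=0
  for lint_file in "$lint_dir"/*.stella; do
    [ -e "$lint_file" ] || continue        # directory had no policy files
    lint_rc=0
    stella policy lint "$lint_file" --format json || lint_rc=$?
    case "$lint_rc" in
      0)  ;;
      10) echo "syntax/compile errors: $lint_file" >&2; lint_failed=1 ;;
      11) echo "unsupported syntax version: $lint_file" >&2; lint_failed=1 ;;
      *)  echo "unexpected lint exit status $lint_rc: $lint_file" >&2; lint_failed=1 ;;
    esac
  done
  return "$lint_failed"
}

# Usage sketch:
#   lint_policies policies || exit 1
```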
|  | ||||
| ### 2.4 `stella policy compile` | ||||
|  | ||||
| Emits IR digest and rule summary. | ||||
|  | ||||
| ``` | ||||
| stella policy compile P-7 --version 4 | ||||
| ``` | ||||
|  | ||||
| Returns JSON with `digest`, `rules.count`, and action counts. Exits `0` on success and `10` on compile errors. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3 · Lifecycle Workflow | ||||
|  | ||||
| ### 3.1 Submit | ||||
|  | ||||
| ``` | ||||
| stella policy submit P-7 --version 4 \ | ||||
|   --reviewer user:kay --reviewer group:sec-reviewers \ | ||||
|   --note "Simulated against golden SBOM set" \ | ||||
|   --attach sims/P-7-v4-vs-v3.json | ||||
| ``` | ||||
|  | ||||
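| Requires `policy:submit` (see the 2025-10-27 scope update at the top of this guide for the replacement scopes). The CLI validates that lint/compile ran within the last 24 h and that bundle attachments exist. | ||||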
|  | ||||
| ### 3.2 Review | ||||
|  | ||||
| ``` | ||||
| stella policy review P-7 --version 4 --approve \ | ||||
|   --note "Looks good; ensure incident playbook updated." | ||||
| ``` | ||||
|  | ||||
| - `--approve`, `--request-changes`, or `--comment`. | ||||
| - Provide `--blocking` to mark comment as blocking. | ||||
| - Requires `policy:review`. | ||||
|  | ||||
| ### 3.3 Approve | ||||
|  | ||||
| ``` | ||||
| stella policy approve P-7 --version 4 \ | ||||
|   --note "Determinism CI green; simulation diff attached." \ | ||||
|   --attach sims/P-7-v4-vs-v3.json | ||||
| ``` | ||||
|  | ||||
| Prompts for confirmation; refuses if approver == submitter. Requires `policy:approve`. | ||||
|  | ||||
| ### 3.4 Activate | ||||
|  | ||||
| ``` | ||||
| stella policy activate P-7 --version 4 --run-now --priority high | ||||
| ``` | ||||
|  | ||||
| - Optional `--scheduled-at 2025-10-27T02:00:00Z`. | ||||
| - Requires `policy:activate` and `policy:run`. | ||||
|  | ||||
| **Options** | ||||
|  | ||||
| - `--version <number>` (required) – target revision to promote. | ||||
| - `--note <text>` – record an activation note alongside the approval. | ||||
| - `--run-now` – enqueue an immediate full run after activation. | ||||
| - `--scheduled-at <timestamp>` – schedule activation for a specific UTC time (ISO-8601 format). | ||||
| - `--priority <label>` – optional scheduling priority hint (`low`, `standard`, `high`). | ||||
| - `--rollback` – mark the activation as a rollback of a previously active version. | ||||
| - `--incident <id>` – associate the activation with an incident identifier. | ||||
|  | ||||
| **Exit codes** | ||||
|  | ||||
| | Code | Meaning | | ||||
| |------|---------| | ||||
| | `0` | Activation completed (or policy already active). | | ||||
| | `75` | Activation recorded but awaiting a second approver. | | ||||
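Promotion pipelines usually need to distinguish a completed activation from one that is waiting on a second approver. The wrapper below is a sketch of that distinction; the function name and the printed labels are illustrative, and any exit status other than `0` or `75` is simply passed through as a failure.

```bash
# Hypothetical wrapper: activate a policy version and report which of the
# documented outcomes occurred. Status 75 is not treated as a failure.
activate_policy() {
  act_policy=${1-} act_version=${2-}
  act_rc=0
  stella policy activate "$act_policy" --version "$act_version" || act_rc=$?
  case "$act_rc" in
    0)  echo "activated" ;;
    75) echo "pending-second-approval" ;;   # recorded, awaiting second approver
    *)  echo "failed"; return "$act_rc" ;;
  esac
}

# Usage sketch:
#   outcome=$(activate_policy P-7 4) || exit 1
#   [ "$outcome" = "pending-second-approval" ] && echo "waiting for second approver"
```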
|  | ||||
| ### 3.5 Archive / Rollback | ||||
|  | ||||
| ``` | ||||
| stella policy archive P-7 --version 3 --reason "Superseded by v4" | ||||
| stella policy activate P-7 --version 3 --rollback --incident INC-2025-104 | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4 · Simulation & Runs | ||||
|  | ||||
| ### 4.1 Simulate | ||||
|  | ||||
| ``` | ||||
| stella policy simulate P-7 \ | ||||
|   --base 3 --candidate 4 \ | ||||
|   --sbom sbom:S-42 --sbom sbom:S-318 \ | ||||
|   --env exposure=internet --env sealed=false \ | ||||
|   --format json --output sims/P-7-v4-vs-v3.json | ||||
| ``` | ||||
|  | ||||
| Output fields (JSON): | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "diff": { | ||||
|     "added": 12, | ||||
|     "removed": 8, | ||||
|     "unchanged": 657, | ||||
|     "bySeverity": { | ||||
|       "Critical": {"up": 1, "down": 0}, | ||||
|       "High": {"up": 3, "down": 4} | ||||
|     } | ||||
|   }, | ||||
|   "explainUri": "blob://policy/P-7/simulations/2025-10-26.json" | ||||
| } | ||||
| ``` | ||||
|  | ||||
| > Schema reminder: CLI commands surface objects defined in `src/Scheduler/__Libraries/StellaOps.Scheduler.Models/docs/SCHED-MODELS-20-001-POLICY-RUNS.md`; use the samples in `samples/api/scheduler/` for contract validation when extending output parsing. | ||||
|  | ||||
| Exit codes: | ||||
|  | ||||
| | Code | Meaning | | ||||
| |------|---------| | ||||
| | `0` | Simulation succeeded; diffs informational. | | ||||
| | `20` | Blocking delta (`--fail-on-diff` triggered). | | ||||
| | `21` | Simulation input missing (`ERR_POL_003`). | | ||||
| | `22` | Determinism guard (`ERR_POL_004`). | | ||||
| | `23` | API/permission error (`ERR_POL_002`, `ERR_POL_005`). | | ||||
|  | ||||
| ### 4.2 Run | ||||
|  | ||||
| ``` | ||||
| stella policy run P-7 --mode full \ | ||||
|   --sbom sbom:S-42 --env exposure=internal-only \ | ||||
|   --wait --watch | ||||
| ``` | ||||
|  | ||||
| Options: | ||||
|  | ||||
| | Flag | Description | | ||||
| |------|-------------| | ||||
| | `--mode` | `full` or `incremental` (default incremental). | | ||||
| | `--sbom` | Explicit SBOM IDs (optional). | | ||||
| | `--priority` | `normal`, `high`, `emergency`. | | ||||
| | `--wait` | Poll run status until completion. | | ||||
| | `--watch` | Stream progress events (requires TTY). | | ||||
|  | ||||
| `stella policy run status <runId>` retrieves run metadata.   | ||||
| `stella policy run list --status failed --limit 20` returns recent runs. | ||||
|  | ||||
| ### 4.3 Replay & Cancel | ||||
|  | ||||
| ``` | ||||
| stella policy run replay run:P-7:2025-10-26:auto --output bundles/replay.tgz | ||||
| stella policy run cancel run:P-7:2025-10-26:auto | ||||
| ``` | ||||
|  | ||||
| Replay downloads a sealed bundle for deterministic verification. | ||||
|  | ||||
| ### 4.4 Schema artefacts for CLI validation | ||||
|  | ||||
| - CI publishes canonical JSON Schema exports for `PolicyRunRequest`, `PolicyRunStatus`, `PolicyDiffSummary`, and `PolicyExplainTrace` as the `policy-schema-exports` artifact (see `.gitea/workflows/build-test-deploy.yml`). | ||||
| - Each run writes the files to `artifacts/policy-schemas/<commit>/` and stores a unified diff (`policy-schema-diff.patch`) comparing them with the tracked baseline in `docs/schemas/`. | ||||
| - Schema changes trigger an alert in Slack `#policy-engine` via the `POLICY_ENGINE_SCHEMA_WEBHOOK` secret so CLI maintainers know to refresh fixtures or validation rules. | ||||
| - Consume these artefacts in CLI tests to keep payload validation aligned without committing generated files into the repo. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5 · Findings & Explainability | ||||
|  | ||||
| ### 5.1 List Findings | ||||
|  | ||||
| ``` | ||||
| stella findings ls --policy P-7 \ | ||||
|   --sbom sbom:S-42 \ | ||||
|   --status affected --severity High,Critical \ | ||||
|   --since 2025-10-01T00:00:00Z \ | ||||
|   --page 2 --page-size 100 \ | ||||
|   --format table | ||||
| ``` | ||||
|  | ||||
| Common flags: | ||||
|  | ||||
| | Flag | Description | | ||||
| |------|-------------| | ||||
| | `--sbom` | Repeatable filter for SBOM identifiers. | | ||||
| | `--status` | Repeatable filter (`affected`, `quieted`, `mitigated`, `not_affected`, etc.). | | ||||
| | `--severity` | Repeatable filter using normalized labels (`Critical`, `High`, `Medium`, `Low`, `Unknown`). | | ||||
| | `--since` | Return findings updated on/after the ISO-8601 timestamp. | | ||||
| | `--cursor` | Resume listing using the opaque token from a prior page. | | ||||
| | `--page`, `--page-size` | Page-based pagination (page >=1, size <=500; falls back to backend defaults). | | ||||
| | `--output` | Persist JSON payload to disk (implies JSON rendering). | | ||||
| | `--format` | `table` (default for TTY) or `json`. | | ||||
|  | ||||
| ### 5.2 Fetch Explain | ||||
|  | ||||
| ``` | ||||
| stella findings explain --policy P-7 \ | ||||
|   P-7:S-42:pkg:npm/lodash@4.17.21:CVE-2021-23337 \ | ||||
|   --mode verbose \ | ||||
|   --format json --output explains/lodash.json | ||||
| ``` | ||||
|  | ||||
| Outputs ordered rule hits, inputs, evidence snapshots, and sealed-mode hints. Supported `--mode` values mirror API contracts (for example `summary`, `verbose`); omit to use backend default. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6 · Exit Codes Summary | ||||
|  | ||||
| | Exit code | Description | Typical ERR codes | | ||||
| |-----------|-------------|-------------------| | ||||
| | `0` | Success (command completed, warnings only). | — | | ||||
| | `10` | DSL syntax/compile failure. | `ERR_POL_001` | | ||||
| | `11` | Unsupported DSL version / schema mismatch. | `ERR_POL_001` | | ||||
| | `12` | Approval/rbac failure. | `ERR_POL_002`, `ERR_POL_005` | | ||||
| | `20` | Simulation diff exceeded thresholds (`--fail-on-diff`). | — | | ||||
| | `21` | Required inputs missing (SBOM/advisory/VEX). | `ERR_POL_003` | | ||||
| | `22` | Determinism guard triggered. | `ERR_POL_004` | | ||||
| | `23` | Run canceled or timed out. | `ERR_POL_006` | | ||||
| | `30` | Network/transport error (non-HTTP success). | — | | ||||
| | `64` | CLI usage error (invalid flag/argument). | — | | ||||
|  | ||||
| All non-zero exits emit a structured error envelope on stderr when `--format json` is passed or `STELLA_JSON_ERRORS=1` is set. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 7 · Offline & Air-Gap Usage | ||||
|  | ||||
| - Use `--sealed` to ensure commands avoid outbound calls; required for sealed enclaves. | ||||
| - `stella policy bundle export --policy P-7 --version 4 --output bundles/policy-P-7-v4.bundle` pairs with Offline Kit import. | ||||
| - Replay bundles (`run replay`) are DSSE-signed; verify with `stella offline verify`. | ||||
| - Store credentials in `~/.stella/offline.toml` for non-interactive air-gapped pipelines. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 8 · Compliance Checklist | ||||
|  | ||||
| - [ ] **Help text synced:** `stella policy --help` matches documented flags/examples (update during release pipeline). | ||||
| - [ ] **Exit codes mapped:** Table above reflects CLI implementation and CI asserts mapping for `ERR_POL_*`. | ||||
| - [ ] **JSON schemas verified:** Example payloads validated against OpenAPI/SDK contracts before publishing. (_CI now exports canonical schemas as `policy-schema-exports`; wire tests to consume them._) | ||||
| - [ ] **Scope guidance present:** Each command lists required Authority scopes. | ||||
| - [ ] **Offline guidance included:** Sealed-mode steps and bundle workflows documented. | ||||
| - [ ] **Cross-links tested:** Links to DSL, lifecycle, runs, and API docs resolve locally (`yarn docs:lint`). | ||||
| - [ ] **Examples no-op safe:** Command examples either read-only or use placeholders (no destructive defaults). | ||||
|  | ||||
| --- | ||||
|  | ||||
| *Last updated: 2025-10-27 (Sprint 20).* | ||||
							
								
								
									
										23
									
								
								docs/modules/cli/implementation_plan.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,23 @@ | ||||
| # Implementation plan — CLI | ||||
|  | ||||
| ## Current objectives | ||||
| - Maintain deterministic behaviour and offline parity across releases. | ||||
| - Keep documentation, telemetry, and runbooks aligned with the latest sprint outcomes. | ||||
|  | ||||
| ## Workstreams | ||||
| - Backlog grooming: reconcile open stories in ../../TASKS.md with this module's roadmap. | ||||
| - Implementation: collaborate with service owners to land feature work defined in SPRINTS/EPIC docs. | ||||
| - Validation: extend tests/fixtures to preserve determinism and provenance requirements. | ||||
|  | ||||
| ## Epic milestones | ||||
| - **Epic 2 – Policy Engine & Editor:** deliver deterministic policy verbs, simulation, and explain outputs. | ||||
| - **Epic 4 – Policy Studio:** add registry/promotion workflows, lint tooling, and approvals UX. | ||||
| - **Epic 6 – Vulnerability Explorer:** integrate ledger/triage operations. | ||||
| - **Epic 10 – Export Center:** automate export verification and Offline Kit flows. | ||||
| - **Epic 11 – Notifications Studio:** manage rule/channel authoring and previews via CLI. | ||||
| - Track CLI-specific work (e.g., CLI-CORE-41-001, DOCS-CLI-OBS-52-001) in ../../TASKS.md and src/Cli/**/TASKS.md. | ||||
|  | ||||
| ## Coordination | ||||
| - Review ./AGENTS.md before picking up new work. | ||||
| - Sync with cross-cutting teams noted in ../../implplan/SPRINTS.md. | ||||
| - Update this plan whenever scope, dependencies, or guardrails change. | ||||
							
								
								
									
										134
									
								
								docs/modules/cli/operations/release-and-packaging.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,134 @@ | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
|  | ||||
| # CLI Release & Packaging Runbook | ||||
|  | ||||
| This runbook describes how to build, sign, package, and distribute the StellaOps CLI with Task Pack support. It covers connected and air-gapped workflows, SBOM generation, parity gating, and distribution artifacts required by Sprint 43 (`DEVOPS-CLI-43-001`, `DEPLOY-PACKS-43-001`). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1 · Release Artifacts | ||||
|  | ||||
| | Artifact | Description | Notes | | ||||
| |----------|-------------|-------| | ||||
| | `stella-<version>-linux-x64.tar.gz` | Linux binary + completions | Includes man pages, localization files. | | ||||
| | `stella-<version>-macos-universal.tar.gz` | macOS universal binary | Signed/notarized where applicable. | | ||||
| | `stella-<version>-windows-x64.zip` | Windows binary + PowerShell modules | Code-signed. | | ||||
| | `stella-cli-container:<version>` | OCI image with CLI + pack runtime | Deterministic rootfs (scratch/distroless). | | ||||
| | SBOM (`.cdx.json`) | CycloneDX SBOM per artifact | Generated via `stella sbom generate` or `syft`. | | ||||
| | Checksums (`SHA256SUMS`) | Aggregated digest list | Signed with cosign. | | ||||
| | Provenance (`.intoto.jsonl`) | DSSE attestation (SLSA L2) | Contains build metadata. | | ||||
| | Release notes | Markdown summary | Links to task packs docs, parity matrix. | | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2 · Build Pipeline | ||||
|  | ||||
| 1. **Source checkout** – pinned commit, reproducible environment (Docker). | ||||
| 2. **Dependency lock** – `dotnet restore`, `npm ci` (for CLI frontends), ensure deterministic build flags. | ||||
| 3. **Build binaries** – cross-platform targets with reproducible timestamps. | ||||
| 4. **Run tests** – unit + integration; include `stella pack` commands (plan/run/verify) in CI. | ||||
| 5. **Generate SBOM** – `syft packages dist/stella-linux-x64 --output cyclonedx-json`. | ||||
| 6. **Bundle** – compress artifacts, include completions (`bash`, `zsh`, `fish`, PowerShell). | ||||
| 7. **Sign** – cosign signatures for binaries, checksums, container image. | ||||
| 8. **Publish** – upload to `downloads.stella-ops.org`, container registry, Packs Registry (for CLI container). | ||||
| 9. **Parity gating** – run CLI parity matrix tests vs Console features (automation in `DEVOPS-CLI-43-001`). | ||||
|  | ||||
| CI must run in an isolated environment (no network access beyond the allowlist). Cache dependencies for offline bundling. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3 · Versioning & Channels | ||||
|  | ||||
| - Calendar-based versioning (`YYYY.MM.patch`), e.g., `2025.10.0`. | ||||
| - Channels: | ||||
|   - `edge` – nightly builds, limited support. | ||||
|   - `beta` – pre-release candidates. | ||||
|   - `stable` – production-ready, after parity gating. | ||||
| - Release promotions mirror Task Pack channels; update downloads manifest (`deploy/downloads/manifest.json`). | ||||
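The `YYYY.MM.patch` scheme sorts correctly once each part is parsed as an integer; comparing the raw strings would place `2025.9.3` after `2025.10.0`. A minimal sketch in Python (illustrative only; `parse_version` is a hypothetical helper, not part of the CLI):

```python
import re

_VERSION_RE = re.compile(r"^(\d{4})\.(\d{1,2})\.(\d+)$")

def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'YYYY.MM.patch' into a tuple that sorts chronologically."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not a YYYY.MM.patch version: {text!r}")
    year, month, patch = (int(part) for part in match.groups())
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {text!r}")
    return year, month, patch

# Sort release tags numerically rather than lexicographically.
releases = sorted(["2025.10.0", "2025.9.3", "2025.10.1"], key=parse_version)
```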
|  | ||||
| --- | ||||
|  | ||||
| ## 4 · Signing & Verification | ||||
|  | ||||
| - Binaries signed with cosign (`cosign sign-blob`). | ||||
| - Container image signed (`cosign sign stella-cli-container:<version>`). | ||||
| - DSSE provenance includes: | ||||
|   - Build pipeline ID. | ||||
|   - Source commit and repo. | ||||
|   - Dependencies SBOM digest. | ||||
|   - Test results summary. | ||||
| - Verification command for operators: | ||||
|  | ||||
| ```bash | ||||
| cosign verify-blob \ | ||||
|   --certificate-identity https://ci.stella-ops.org \ | ||||
|   --certificate-oidc-issuer https://fulcio.sigstore.dev \ | ||||
|   --signature stella-2025.10.0-linux-x64.sig \ | ||||
|   stella-2025.10.0-linux-x64.tar.gz | ||||
| ``` | ||||
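Once the signature on `SHA256SUMS` has been verified, each downloaded artifact can be checked against the digest list. The sketch below is illustrative Python and assumes the file uses the conventional `sha256sum` layout (`<hex digest>  <file name>` per line); the runbook does not state the exact format, and the helper names are hypothetical.

```python
import hashlib

def parse_sha256sums(text: str) -> dict[str, str]:
    """Map file name -> hex digest from sha256sum-style lines."""
    digests = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, name = line.partition(" ")
        # sha256sum separates fields with two spaces, or " *" in binary mode.
        digests[name.lstrip(" *")] = digest.lower()
    return digests

def matches(data: bytes, name: str, digests: dict[str, str]) -> bool:
    """True when `data` hashes to the digest recorded for `name`."""
    return digests.get(name) == hashlib.sha256(data).hexdigest()
```

An artifact that is missing from the list fails the check rather than passing silently.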
|  | ||||
| --- | ||||
|  | ||||
| ## 5 · Distribution | ||||
|  | ||||
| ### 5.1 Online | ||||
|  | ||||
| - Publish artifacts to Downloads service; update manifest with digests, SBOM URLs, attestations. | ||||
| - Update CLI parity docs (`docs/cli-vs-ui-parity.md`) and release notes. | ||||
| - Push container image to registry with SBOM + attestations referenced as OCI referrers. | ||||
| - Notify stakeholders via `#release-cli` channel and release mailing list. | ||||
|  | ||||
| ### 5.2 Offline / Air-Gap | ||||
|  | ||||
| - Bundle CLI artifacts, Task Pack samples, and registry mirror: | ||||
|  | ||||
| ```bash | ||||
| stella pack bundle export \ | ||||
|   --packs "sbom-remediation:1.3.0" \ | ||||
|   --output offline/packs-bundle-2025.10.0.tgz | ||||
|  | ||||
| stella cli bundle export \ | ||||
|   --output offline/cli-2025.10.0.tgz \ | ||||
|   --include-container \ | ||||
|   --include-sbom | ||||
| ``` | ||||
|  | ||||
| - Update Offline Kit manifest with new CLI version and pack bundle entries. | ||||
| - Provide import scripts (`ouk import`) for sealed sites. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6 · Parity Gating | ||||
|  | ||||
| - `stella cli parity check` compares CLI commands vs parity matrix. | ||||
| - CI fails the release if any required command is flagged `🟥` or `🟡` with severity > threshold. | ||||
| - Parity report uploaded to Downloads workspace and linked in docs. | ||||
| - Manual review is required for new commands (ensure `man` pages and help text are localized). | ||||
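The gating rule can be read in more than one way. The sketch below (illustrative Python) assumes that a red entry always blocks and that a yellow entry blocks only when its severity exceeds the threshold. The `ParityEntry` shape, the status strings standing in for the matrix emoji, and the numeric severity are all hypothetical; the real parity report format is not described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParityEntry:
    command: str
    required: bool
    status: str    # "green", "yellow", or "red" (stand-ins for the matrix emoji)
    severity: int  # hypothetical numeric severity

def blocking_entries(entries, threshold):
    """Return the required entries that should fail the release."""
    blocked = []
    for entry in entries:
        if not entry.required:
            continue  # optional commands never gate a release
        if entry.status == "red":
            blocked.append(entry)
        elif entry.status == "yellow" and entry.severity > threshold:
            blocked.append(entry)
    return blocked
```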
|  | ||||
| --- | ||||
|  | ||||
| ## 7 · Localization & Documentation | ||||
|  | ||||
| - CLI includes localization bundles; ensure `i18n.txz` is packaged. | ||||
| - Update man pages (`man/stella-pack.1`) and HTML docs. | ||||
| - Sync docs: `docs/modules/cli/guides/overview.md`, pack authoring guide, release notes. | ||||
| - Document new flags/commands in `docs/modules/cli/guides/commands/pack.md` (tracked in Sprint 42 tasks). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 8 · Release Checklist | ||||
|  | ||||
| - [ ] All binaries built reproducibly (CI logs archived).   | ||||
| - [ ] Tests + parity matrix passing.   | ||||
| - [ ] SBOM + provenance generated and published.   | ||||
| - [ ] Cosign signatures created and verified.   | ||||
| - [ ] Downloads manifest updated (edge/beta/stable).   | ||||
| - [ ] Offline bundle exported and validated.   | ||||
| - [ ] Release notes + documentation updates merged.   | ||||
| - [ ] Notifications sent (chat/email).   | ||||
| - [ ] Imposed rule reminder present at top of document. | ||||
|  | ||||
| --- | ||||
|  | ||||
| *Last updated: 2025-10-27 (Sprint 43).*  | ||||
|  | ||||
							
								
								
									
										22
									
								
								docs/modules/concelier/AGENTS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,22 @@ | ||||
| # Concelier agent guide | ||||
|  | ||||
| ## Mission | ||||
| Concelier ingests signed advisories from dozens of sources and converts them into immutable observations plus linksets under the Aggregation-Only Contract (AOC). | ||||
|  | ||||
| ## Key docs | ||||
| - [Module README](./README.md) | ||||
| - [Architecture](./architecture.md) | ||||
| - [Implementation plan](./implementation_plan.md) | ||||
| - [Task board](./TASKS.md) | ||||
|  | ||||
| ## How to get started | ||||
| 1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module. | ||||
| 2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED). | ||||
| 3. Read the architecture and README for domain context before editing code or docs. | ||||
| 4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan. | ||||
|  | ||||
| ## Guardrails | ||||
| - Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md). | ||||
| - Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts. | ||||
| - Keep Offline Kit parity in mind—document air-gapped workflows for any new feature. | ||||
| - Update runbooks/observability assets when operational characteristics change. | ||||
							
								
								
									
										36
									
								
								docs/modules/concelier/README.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,36 @@ | ||||
| # StellaOps Concelier | ||||
|  | ||||
| Concelier ingests signed advisories from dozens of sources and converts them into immutable observations plus linksets under the Aggregation-Only Contract (AOC). | ||||
|  | ||||
| ## Responsibilities | ||||
| - Fetch and normalise vulnerability advisories via restart-time connectors. | ||||
| - Persist observations and correlation linksets without precedence decisions. | ||||
| - Emit deterministic exports (JSON, Trivy DB) for downstream policy evaluation. | ||||
| - Coordinate offline/air-gap updates via Offline Kit bundles. | ||||
|  | ||||
| ## Key components | ||||
| - `StellaOps.Concelier.WebService` orchestration host. | ||||
| - Connector libraries under `StellaOps.Concelier.Connector.*`. | ||||
| - Exporter packages (`StellaOps.Concelier.Exporter.*`). | ||||
|  | ||||
| ## Integrations & dependencies | ||||
| - MongoDB for canonical observations and schedules. | ||||
| - Policy Engine / Export Center / CLI for evidence consumption. | ||||
| - Notify and UI for advisory deltas. | ||||
|  | ||||
| ## Operational notes | ||||
| - Connector runbooks in ./operations/connectors/. | ||||
| - Mirror operations for Offline Kit parity. | ||||
| - Grafana dashboards for connector health. | ||||
|  | ||||
| ## Related resources | ||||
| - ./operations/conflict-resolution.md | ||||
| - ./operations/mirror.md | ||||
|  | ||||
| ## Backlog references | ||||
| - DOCS-LNM-22-001, DOCS-LNM-22-007 in ../../TASKS.md. | ||||
| - Connector-specific TODOs in `src/Concelier/**/TASKS.md`. | ||||
|  | ||||
| ## Epic alignment | ||||
| - **Epic 1 – AOC enforcement:** uphold raw observation invariants, provenance requirements, linkset-only enrichment, and AOC verifier guardrails across every connector. | ||||
| - **Epic 10 – Export Center:** expose deterministic advisory exports and metadata required by JSON/Trivy/mirror bundles. | ||||
							
								
								
									
										9
									
								
								docs/modules/concelier/TASKS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,9 @@ | ||||
| # Task board — Concelier | ||||
|  | ||||
| > Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable. | ||||
|  | ||||
| | ID | Status | Owner(s) | Description | Notes | | ||||
| |----|--------|----------|-------------|-------| | ||||
| | CONCELIER-DOCS-0001 | DOING (2025-10-29) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md | | ||||
| | CONCELIER-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md | | ||||
| | CONCELIER-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow | | ||||
							
								
								
									
										600
									
								
								docs/modules/concelier/architecture.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,600 @@ | ||||
| # component_architecture_concelier.md — **Stella Ops Concelier** (Sprint 22) | ||||
|  | ||||
| > Derived from Epic 1 – AOC enforcement and aligned with the Export Center evidence interfaces first scoped in Epic 10. | ||||
|  | ||||
| > **Scope.** Implementation-ready architecture for **Concelier**: the advisory ingestion and Link-Not-Merge (LNM) observation pipeline that produces deterministic raw observations, correlation linksets, and evidence events consumed by Policy Engine, Console, CLI, and Export Center. Covers domain models, connectors, observation/linkset builders, storage schema, events, APIs, performance, security, and test matrices. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 0) Mission & boundaries | ||||
|  | ||||
| **Mission.** Acquire authoritative **vulnerability advisories** (vendor PSIRTs, distros, OSS ecosystems, CERTs), persist them as immutable **observations** under the Aggregation-Only Contract (AOC), construct **linksets** that correlate observations without merging or precedence, and export deterministic evidence bundles (JSON, Trivy DB, Offline Kit) for downstream policy evaluation and operator tooling. | ||||
|  | ||||
| **Boundaries.** | ||||
|  | ||||
| * Concelier **does not** sign with private keys. When attestation is required, the export artifact is handed to the **Signer**/**Attestor** pipeline (out‑of‑process). | ||||
| * Concelier **does not** decide PASS/FAIL; it provides data to the **Policy** engine. | ||||
| * Online operation is **allowlist‑only**; air‑gapped deployments use the **Offline Kit**. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) Aggregation-Only Contract guardrails | ||||
|  | ||||
| **Epic 1 distilled** — the service itself is the enforcement point for AOC. The guardrail checklist is embedded in code (`AOCWriteGuard`) and must be satisfied before any advisory hits Mongo: | ||||
|  | ||||
| 1. **No derived semantics in ingestion.** The DTOs produced by connectors cannot contain severity, consensus, reachability, merged status, or fix hints. Roslyn analyzers (`StellaOps.AOC.Analyzers`) scan connectors and fail builds if forbidden properties appear. | ||||
| 2. **Immutable raw docs.** Every upstream advisory is persisted in `advisory_raw` with append-only semantics. Revisions produce new `_id`s via version suffix (`:v2`, `:v3`), linking back through `supersedes`. | ||||
| 3. **Mandatory provenance.** Collectors record `source`, `upstream` metadata (`document_version`, `fetched_at`, `received_at`, `content_hash`), and signature presence before writing. | ||||
| 4. **Linkset only.** Derived joins (aliases, PURLs, CPEs, references) are stored inside `linkset` and never mutate `content.raw`. | ||||
| 5. **Deterministic canonicalisation.** Writers use canonical JSON (sorted object keys, lexicographic arrays) ensuring identical inputs yield the same hashes/diff-friendly outputs. | ||||
| 6. **Idempotent upserts.** The tuple `(source.vendor, upstream.upstream_id, upstream.content_hash)` uniquely identifies a document. Duplicate hashes short-circuit; new hashes create a new version. | ||||
| 7. **Verifier & CI.** `StellaOps.AOC.Verifier` processes observation batches in CI and at runtime, rejecting writes lacking provenance, introducing unordered collections, or violating the schema. | ||||
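Guardrail 5 (deterministic canonicalisation) is easiest to see with a small example. The sketch below is illustrative Python, not the service's canonical writer: it assumes that sorting object keys and using compact separators is sufficient, and that only set-like string arrays (aliases, PURLs) are sorted. The helper names are hypothetical.

```python
import hashlib
import json

def sorted_unique(values):
    """Normalise a set-like list of strings (e.g. aliases, PURLs)."""
    return sorted(set(values))

def canonical_json(value) -> str:
    """Serialise with sorted object keys and no insignificant whitespace."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

def content_hash(value) -> str:
    """SHA-256 over the canonical form, in `sha256:<hex>` notation."""
    digest = hashlib.sha256(canonical_json(value).encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

Two documents that differ only in key order then produce the same `content_hash`, which is what lets duplicate detection work on the hash alone.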
|  | ||||
| ### 1.1 Advisory raw document shape | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "_id": "advisory_raw:osv:GHSA-xxxx-....:v3", | ||||
|   "source": { | ||||
|     "vendor": "OSV", | ||||
|     "stream": "github", | ||||
|     "api": "https://api.osv.dev/v1/.../GHSA-...", | ||||
|     "collector_version": "concelier/1.7.3" | ||||
|   }, | ||||
|   "upstream": { | ||||
|     "upstream_id": "GHSA-xxxx-....", | ||||
|     "document_version": "2025-09-01T12:13:14Z", | ||||
|     "fetched_at": "2025-09-01T13:04:05Z", | ||||
|     "received_at": "2025-09-01T13:04:06Z", | ||||
|     "content_hash": "sha256:...", | ||||
|     "signature": { | ||||
|       "present": true, | ||||
|       "format": "dsse", | ||||
|       "key_id": "rekor:.../key/abc", | ||||
|       "sig": "base64..." | ||||
|     } | ||||
|   }, | ||||
|   "content": { | ||||
|     "format": "OSV", | ||||
|     "spec_version": "1.6", | ||||
|     "raw": { /* unmodified upstream document */ } | ||||
|   }, | ||||
|   "identifiers": { | ||||
|     "cve": ["CVE-2025-12345"], | ||||
|     "ghsa": ["GHSA-xxxx-...."], | ||||
|     "aliases": ["CVE-2025-12345", "GHSA-xxxx-...."] | ||||
|   }, | ||||
|   "linkset": { | ||||
|     "purls": ["pkg:npm/lodash@4.17.21"], | ||||
|     "cpes": ["cpe:2.3:a:lodash:lodash:4.17.21:*:*:*:*:*:*:*"], | ||||
|     "references": [ | ||||
|       {"type":"advisory","url":"https://..."}, | ||||
|       {"type":"fix","url":"https://..."} | ||||
|     ], | ||||
|     "reconciled_from": ["content.raw.affected.ranges", "content.raw.pkg"] | ||||
|   }, | ||||
|   "supersedes": "advisory_raw:osv:GHSA-xxxx-....:v2", | ||||
|   "tenant": "default" | ||||
| } | ||||
| ``` | ||||
|  | ||||
| ### 1.2 Connector lifecycle | ||||
|  | ||||
| 1. **Snapshot stage** — connectors fetch signed feeds or use offline mirrors keyed by `{vendor, stream, snapshot_date}`. | ||||
| 2. **Parse stage** — upstream payloads are normalised into strongly-typed DTOs with UTC timestamps. | ||||
| 3. **Guard stage** — DTOs run through `AOCWriteGuard` performing schema validation, forbidden-field checks, provenance validation, deterministic sorting, and `_id` computation. | ||||
| 4. **Write stage** — append-only Mongo insert; duplicate hash is ignored, changed hash creates a new version and emits `supersedes` pointer. | ||||
| 5. **Event stage** — DSSE-backed events `advisory.observation.updated` and `advisory.linkset.updated` notify downstream services (Policy, Export Center, CLI). | ||||
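The guard stage can be pictured as a pure function that returns a list of violations. This is a minimal Python sketch, not the actual `AOCWriteGuard`: the forbidden key names are guesses based on the categories in guardrail 1, only top-level keys are inspected, and the violation strings are invented for illustration.

```python
# Hypothetical key names for the derived semantics that ingestion must not carry.
FORBIDDEN_KEYS = {"severity", "consensus", "reachability", "merged", "fix_hints"}
# Provenance fields listed under `upstream` in the raw document shape.
REQUIRED_UPSTREAM = ("document_version", "fetched_at", "received_at", "content_hash")

def guard_violations(doc: dict) -> list[str]:
    """Return human-readable violations; an empty list means the write may proceed."""
    violations = [f"forbidden field: {key}" for key in sorted(FORBIDDEN_KEYS & doc.keys())]
    if "source" not in doc:
        violations.append("missing provenance: source")
    upstream = doc.get("upstream", {})
    for field in REQUIRED_UPSTREAM:
        if not upstream.get(field):
            violations.append(f"missing provenance: upstream.{field}")
    return violations
```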
|  | ||||
| ### 1.3 Export readiness | ||||
|  | ||||
| Concelier feeds Export Center profiles (Epic 10) by: | ||||
|  | ||||
| - Maintaining canonical JSON exports with deterministic manifests (`export.json`) listing content hashes, counts, and `supersedes` chains. | ||||
| - Producing Trivy DB-compatible artifacts (SQLite + metadata) packaged under `db/` with hash manifests. | ||||
| - Surfacing mirror manifests that reference Mongo snapshot digests, enabling Offline Kit bundle verification. | ||||
|  | ||||
| Running the same export job twice against the same snapshot must yield byte-identical archives and manifest hashes. | ||||
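Byte-identical archives require pinning everything the archive format would otherwise take from the environment: member order, timestamps, ownership, and permissions. The following Python sketch shows the idea with an uncompressed tar; the function name is hypothetical and the real exporters may use a different container format.

```python
import io
import tarfile

def build_archive(files: dict[str, bytes]) -> bytes:
    """Write a tar whose bytes depend only on member names and contents."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):           # stable member order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = 0                   # fixed timestamp
            info.mode = 0o644                # fixed permissions
            info.uid = info.gid = 0          # no host-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()
```

If the archive is compressed afterwards, the compressor's own header timestamp has to be pinned as well, otherwise the outer bytes still vary between runs.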
|  | ||||
| --- | ||||
|  | ||||
| ## 2) Topology & processes | ||||
|  | ||||
| **Process shape:** single ASP.NET Core service `StellaOps.Concelier.WebService` hosting: | ||||
|  | ||||
| * **Scheduler** with distributed locks (Mongo backed). | ||||
| * **Connectors** (fetch/parse/map) that emit immutable observation candidates. | ||||
| * **Observation writer** enforcing AOC invariants via `AOCWriteGuard`. | ||||
| * **Linkset builder** that correlates observations into `advisory_linksets` and annotates conflicts. | ||||
| * **Event publisher** emitting `advisory.observation.updated` and `advisory.linkset.updated` messages. | ||||
| * **Exporters** (JSON, Trivy DB, Offline Kit slices) fed from observation/linkset stores. | ||||
| * **Minimal REST** for health/status/trigger/export and observation/linkset reads. | ||||
|  | ||||
| **Scale:** HA by running N replicas; **locks** prevent overlapping jobs per source/exporter. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3) Canonical domain model | ||||
|  | ||||
| > Stored in MongoDB (database `concelier`), serialized with a **canonical JSON** writer (stable order, camelCase, normalized timestamps). | ||||
|  | ||||
| ### 3.1 Core entities | ||||
|  | ||||
| #### AdvisoryObservation | ||||
|  | ||||
| ```jsonc | ||||
| observationId       // deterministic id: {tenant}:{source.vendor}:{upstreamId}:{revision} | ||||
| tenant              // issuing tenant (lower-case) | ||||
| source{ | ||||
|     vendor, stream, api, collectorVersion | ||||
| } | ||||
| upstream{ | ||||
|     upstreamId, documentVersion, fetchedAt, receivedAt, | ||||
|     contentHash, signature{present, format?, keyId?, signature?} | ||||
| } | ||||
| content{ | ||||
|     format, specVersion, raw, metadata? | ||||
| } | ||||
| identifiers{ | ||||
|     cve?, ghsa?, vendorIds[], aliases[] | ||||
| } | ||||
| linkset{ | ||||
|     purls[], cpes[], aliases[], references[{type,url}], | ||||
|     reconciledFrom[] | ||||
| } | ||||
| createdAt           // when Concelier recorded the observation | ||||
| attributes          // optional provenance metadata (batch ids, ingest cursor) | ||||
| ``` | ||||
|  | ||||
| #### AdvisoryLinkset | ||||
|  | ||||
| ```jsonc | ||||
| linksetId           // sha256 over sorted (tenant, product/vuln tuple, observation ids) | ||||
| tenant | ||||
| key{ | ||||
|     vulnerabilityId, | ||||
|     productKey, | ||||
|     confidence        // low|medium|high | ||||
| } | ||||
| observations[] = [ | ||||
|   { | ||||
|     observationId, | ||||
|     sourceVendor, | ||||
|     statement{ | ||||
|       status?, severity?, references?, notes? | ||||
|     }, | ||||
|     collectedAt | ||||
|   } | ||||
| ] | ||||
| aliases{ | ||||
|     primary, | ||||
|     others[] | ||||
| } | ||||
| purls[] | ||||
| cpes[] | ||||
| conflicts[]?        // see AdvisoryLinksetConflict | ||||
| createdAt | ||||
| updatedAt | ||||
| ``` | ||||
|  | ||||
| #### AdvisoryLinksetConflict | ||||
|  | ||||
| ```jsonc | ||||
| conflictId          // deterministic hash | ||||
| type                // severity-mismatch | affected-range-divergence | reference-clash | alias-inconsistency | metadata-gap | ||||
| field?              // optional JSON pointer (e.g., /statement/severity/vector) | ||||
| observations[]      // per-source values contributing to the conflict | ||||
| confidence          // low|medium|high (heuristic weight) | ||||
| detectedAt | ||||
| ``` | ||||
|  | ||||
| #### ObservationEvent / LinksetEvent | ||||
|  | ||||
| ```jsonc | ||||
| eventId             // ULID | ||||
| tenant | ||||
| type                // advisory.observation.updated | advisory.linkset.updated | ||||
| key{ | ||||
|     observationId?  // on observation event | ||||
|     linksetId?      // on linkset event | ||||
|     vulnerabilityId?, | ||||
|     productKey? | ||||
| } | ||||
| delta{ | ||||
|     added[], removed[], changed[]   // normalized summary for consumers | ||||
| } | ||||
| hash               // canonical hash of serialized delta payload | ||||
| occurredAt | ||||
| ``` | ||||
|  | ||||
| #### ExportState | ||||
|  | ||||
| ```jsonc | ||||
| exportKind          // json | trivydb | ||||
| baseExportId?       // last full baseline | ||||
| baseDigest?         // digest of last full baseline | ||||
| lastFullDigest?     // digest of last full export | ||||
| lastDeltaDigest?    // digest of last delta export | ||||
| cursor              // per-kind incremental cursor | ||||
| files[]             // last manifest snapshot (path → sha256) | ||||
| ``` | ||||
|  | ||||
| Legacy `Advisory`, `Affected`, and merge-centric entities remain in the repository for historical exports and replay but are being phased out as Link-Not-Merge takes over. New code paths must interact with `AdvisoryObservation` / `AdvisoryLinkset` exclusively and emit conflicts through the structured payloads described above. | ||||
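The `linksetId` comment in the `AdvisoryLinkset` shape ("sha256 over sorted (tenant, product/vuln tuple, observation ids)") leaves the exact byte encoding open. The Python sketch below assumes a newline-joined UTF-8 string with the tenant lower-cased and the observation ids de-duplicated and sorted. The encoding and the function name are assumptions, so the resulting ids are not claimed to match Concelier's.

```python
import hashlib

def linkset_id(tenant, vulnerability_id, product_key, observation_ids):
    """Derive an order-independent id from the linkset key and its members."""
    parts = [
        tenant.lower(),
        vulnerability_id,
        product_key,
        *sorted(set(observation_ids)),  # member order must not affect the id
    ]
    # Assumes no part contains a newline, so the join is unambiguous.
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

Because the observation ids feed the hash, adding or removing an observation yields a different id under this reading.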
|  | ||||
| ### 3.2 Product identity (`productKey`) | ||||
|  | ||||
| * **Primary:** `purl` (Package URL). | ||||
| * **OS packages:** RPM (NEVRA→purl:rpm), DEB (dpkg→purl:deb), APK (apk→purl:alpine), with **EVR/NVRA** preserved. | ||||
| * **Secondary:** `cpe` retained for compatibility; advisory records may carry both. | ||||
| * **Image/platform:** `oci:<registry>/<repo>@<digest>` for image‑level advisories (rare). | ||||
| * **Unmappable:** if a source is non‑deterministic, keep native string under `productKey="native:<provider>:<id>"` and mark **non‑joinable**. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4) Source families & precedence | ||||
|  | ||||
| ### 4.1 Families | ||||
|  | ||||
| * **Vendor PSIRTs**: Microsoft, Oracle, Cisco, Adobe, Apple, VMware, Chromium… | ||||
| * **Linux distros**: Red Hat, SUSE, Ubuntu, Debian, Alpine… | ||||
| * **OSS ecosystems**: OSV, GHSA (GitHub Security Advisories), PyPI, npm, Maven, NuGet, Go. | ||||
| * **CERTs / national CSIRTs**: CISA (KEV, ICS), JVN, ACSC, CCCS, KISA, CERT‑FR/BUND, etc. | ||||
|  | ||||
| ### 4.2 Precedence (when claims conflict) | ||||
|  | ||||
| 1. **Vendor PSIRT** (authoritative for their product). | ||||
| 2. **Distro** (authoritative for packages they ship, including backports). | ||||
| 3. **Ecosystem** (OSV/GHSA) for library semantics. | ||||
| 4. **CERTs/aggregators** for enrichment (KEV/known exploited). | ||||
|  | ||||
| > Precedence affects **Affected** ranges and **fixed** info; **severity** is normalized to the **maximum** credible severity unless policy overrides. Conflicts are retained with **source provenance**. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Connectors & normalization | ||||
|  | ||||
| ### 5.1 Connector contract | ||||
|  | ||||
| ```csharp | ||||
| public interface IFeedConnector { | ||||
|   string SourceName { get; } | ||||
|   Task FetchAsync(IServiceProvider sp, CancellationToken ct);   // -> document collection | ||||
|   Task ParseAsync(IServiceProvider sp, CancellationToken ct);   // -> dto collection (validated) | ||||
|   Task MapAsync(IServiceProvider sp, CancellationToken ct);     // -> advisory/alias/affected/reference | ||||
| } | ||||
| ``` | ||||
|  | ||||
| * **Fetch**: windowed (cursor), conditional GET (ETag/Last‑Modified), retry/backoff, rate limiting. | ||||
| * **Parse**: schema validation (JSON Schema, XSD/CSAF), content type checks; write **DTO** with normalized casing. | ||||
| * **Map**: build canonical records; all outputs carry **provenance** (doc digest, URI, anchors). | ||||
|  | ||||
| ### 5.2 Version range normalization | ||||
|  | ||||
| * **SemVer** ecosystems (npm, pypi, maven, nuget, golang): normalize to `introduced`/`fixed` semver ranges (use `~`, `^`, `<`, `>=` canonicalized to intervals). | ||||
| * **RPM EVR**: `epoch:version-release` with `rpmvercmp` semantics; store raw EVR strings and also **computed order keys** for query. | ||||
| * **DEB**: dpkg version comparison semantics mirrored; store computed keys. | ||||
| * **APK**: Alpine version semantics; compute order keys. | ||||
| * **Generic**: if provider uses text, retain raw; do **not** invent ranges. | ||||
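To make "canonicalized to intervals" concrete for the SemVer case, the sketch below expands `^` and `~` shorthands into a lower-inclusive / upper-exclusive pair, following the range semantics documented for npm's node-semver. It is illustrative Python that handles only full `x.y.z` versions without pre-release tags; whether Concelier applies exactly these rules for every ecosystem is an assumption.

```python
def _parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def _format(parts) -> str:
    return ".".join(str(part) for part in parts)

def to_interval(spec: str) -> tuple[str, str]:
    """Expand '^x.y.z' or '~x.y.z' into (lower inclusive, upper exclusive)."""
    operator, (major, minor, patch) = spec[0], _parse(spec[1:])
    if operator == "~":
        upper = (major, minor + 1, 0)        # patch-level changes only
    elif operator == "^":
        if major > 0:
            upper = (major + 1, 0, 0)        # keep left-most non-zero part
        elif minor > 0:
            upper = (0, minor + 1, 0)
        else:
            upper = (0, 0, patch + 1)
    else:
        raise ValueError(f"unsupported range operator in {spec!r}")
    return _format((major, minor, patch)), _format(upper)
```

Anything that does not parse should be kept as the raw string, in line with the rule against inventing ranges.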
|  | ||||
| ### 5.3 Severity & CVSS | ||||
|  | ||||
| * Normalize **CVSS v2/v3/v4** where available (vector, baseScore, severity). | ||||
| * If multiple CVSS sources exist, track them all; **effective severity** defaults to **max** by policy (configurable). | ||||
| * **ExploitKnown** toggled by KEV and equivalent sources; store **evidence** (source, date). | ||||
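The "track them all, default to max" rule is sketched below in Python. The entry shape (`source`, `baseScore`) is hypothetical, and the label boundaries are the CVSS v3.x qualitative scale; CVSS v2 uses different bands, so a production mapping would need to branch on the version.

```python
def effective_cvss(entries):
    """Pick the highest-scoring entry as effective; callers keep the full list."""
    if not entries:
        return None
    return max(entries, key=lambda entry: entry["baseScore"])

def severity_label(score: float) -> str:
    """Qualitative rating per the CVSS v3.x scale."""
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"
```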
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Observation & linkset pipeline | ||||
|  | ||||
| > **Goal:** deterministically ingest raw documents into immutable observations, correlate them into evidence-rich linksets, and broadcast changes without precedence or mutation. | ||||
|  | ||||
| ### 6.1 Observation flow | ||||
|  | ||||
| 1. **Connector fetch/parse/map** — connectors download upstream payloads, validate signatures, and map to DTOs (identifiers, references, raw payload, provenance). | ||||
| 2. **AOC guard** — `AOCWriteGuard` verifies forbidden keys, provenance completeness, tenant claims, timestamp normalization, and content hash idempotency. Violations raise `ERR_AOC_00x` mapped to structured logs and metrics. | ||||
| 3. **Append-only write** — observations insert into `advisory_observations`; duplicates by `(tenant, source.vendor, upstream.upstreamId, upstream.contentHash)` become no-ops; new content for same upstream id creates a supersedes chain. | ||||
| 4. **Change feed + event** — Mongo change streams trigger `advisory.observation.updated@1` events with deterministic payloads (IDs, hash, supersedes pointer, linkset summary). Policy Engine, Offline Kit builder, and guard dashboards subscribe. | ||||
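The append-only write in step 3 can be modelled with an in-memory store. This Python sketch stands in for the Mongo collection; the `v<n>` revision format and the class itself are assumptions made for illustration.

```python
class ObservationStore:
    """In-memory stand-in for an append-only observation collection."""

    def __init__(self):
        self._docs = {}      # observation id -> document
        self._seen = set()   # (tenant, vendor, upstream_id, content_hash)
        self._latest = {}    # (tenant, vendor, upstream_id) -> (revision, id)

    def append(self, tenant, vendor, upstream_id, content_hash, raw):
        """Insert a new revision, or return None when the content is a duplicate."""
        dedupe_key = (tenant, vendor, upstream_id, content_hash)
        if dedupe_key in self._seen:
            return None                       # duplicate content is a no-op
        chain_key = dedupe_key[:3]
        revision, previous_id = self._latest.get(chain_key, (0, None))
        revision += 1
        doc_id = f"{tenant}:{vendor}:{upstream_id}:v{revision}"
        self._docs[doc_id] = {
            "_id": doc_id,
            "contentHash": content_hash,
            "raw": raw,
            "supersedes": previous_id,        # link back to the prior revision
        }
        self._seen.add(dedupe_key)
        self._latest[chain_key] = (revision, doc_id)
        return doc_id

    def get(self, doc_id):
        return self._docs[doc_id]
```

Existing documents are never modified; a change upstream only ever adds a new revision that points at its predecessor.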
|  | ||||
| ### 6.2 Linkset correlation | ||||
|  | ||||
| 1. **Queue** — observation deltas enqueue correlation jobs keyed by `(tenant, vulnerabilityId, productKey)` candidates derived from identifiers + alias graph. | ||||
| 2. **Canonical grouping** — builder resolves aliases using Concelier’s alias store and deterministic heuristics (vendor > distro > cert), deriving normalized product keys (purl preferred) and confidence scores. | ||||
| 3. **Linkset materialization** — `advisory_linksets` documents store sorted observation references, alias sets, product keys, range metadata, and conflict payloads. Writes are idempotent; unchanged hashes skip updates. | ||||
| 4. **Conflict detection** — builder emits structured conflicts (`severity-mismatch`, `affected-range-divergence`, `reference-clash`, `alias-inconsistency`, `metadata-gap`). Conflicts carry per-observation values for explainability. | ||||
| 5. **Event emission** — `advisory.linkset.updated@1` summarizes deltas (`added`, `removed`, `changed` observation IDs, conflict updates, confidence changes) and includes a canonical hash for replay validation. | ||||
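As an example of step 4, a `severity-mismatch` conflict can be detected by grouping observations by their stated severity and reporting when more than one distinct value appears. The Python sketch below uses a hypothetical observation shape loosely modelled on `AdvisoryLinkset.observations[]`; the builder's actual output fields may differ.

```python
def severity_conflicts(observations):
    """Return a severity-mismatch conflict when sources disagree, else []."""
    by_value = {}
    for obs in observations:
        severity = obs.get("statement", {}).get("severity")
        if severity is not None:              # sources without a severity abstain
            by_value.setdefault(severity, []).append(obs["observationId"])
    if len(by_value) < 2:
        return []
    return [{
        "type": "severity-mismatch",
        "field": "/statement/severity",
        # Sorted so that identical inputs always yield identical payloads.
        "observations": [
            {"value": value, "observationIds": sorted(ids)}
            for value, ids in sorted(by_value.items())
        ],
    }]
```

The conflict records each competing value with its sources and does not pick a winner.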
|  | ||||
| ### 6.3 Event contract | ||||
|  | ||||
| | Event | Schema | Notes | | ||||
| |-------|--------|-------| | ||||
| | `advisory.observation.updated@1` | `events/advisory.observation.updated@1.json` | Fired on new or superseded observations. Includes `observationId`, source metadata, `linksetSummary` (aliases/purls), supersedes pointer (if any), SHA-256 hash, and `traceId`. | | ||||
| | `advisory.linkset.updated@1` | `events/advisory.linkset.updated@1.json` | Fired when correlation changes. Includes `linksetId`, `key{vulnerabilityId, productKey, confidence}`, observation deltas, conflicts, `updatedAt`, and canonical hash. | | ||||
|  | ||||
| Events are emitted via NATS (primary) and Redis Stream (fallback). Consumers acknowledge idempotently using the hash; duplicates are safe. Offline Kit captures both topics during bundle creation for air-gapped replay. | ||||
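Hash-based acknowledgement on the consumer side might look like the following Python sketch. It keeps seen hashes in memory for brevity; a real consumer would need a durable store so that deduplication survives restarts. The class and its method names are hypothetical.

```python
class IdempotentConsumer:
    """Invoke the handler at most once per event hash."""

    def __init__(self, handler):
        self._handler = handler
        self._seen_hashes = set()

    def receive(self, event: dict) -> bool:
        """Process the event; return False when it was a duplicate delivery."""
        if event["hash"] in self._seen_hashes:
            return False
        self._handler(event)
        # Record the hash only after the handler succeeds, so failures are retried.
        self._seen_hashes.add(event["hash"])
        return True
```

With this pattern the same event arriving over both NATS and the Redis fallback is handled once.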
|  | ||||
| --- | ||||
|  | ||||
| ## 7) Storage schema (MongoDB) | ||||
|  | ||||
| ### Collections & indexes (LNM path) | ||||
|  | ||||
| * `concelier.sources` `{_id, type, baseUrl, enabled, notes}` — connector catalog. | ||||
| * `concelier.source_state` `{sourceName(unique), enabled, cursor, lastSuccess, backoffUntil, paceOverrides}` — run-state (TTL indexes on `backoffUntil`). | ||||
| * `concelier.documents` `{_id, sourceName, uri, fetchedAt, sha256, contentType, status, metadata, gridFsId?, etag?, lastModified?}` — raw payload registry. | ||||
|   * Indexes: `{sourceName:1, uri:1}` unique; `{fetchedAt:-1}` for recent fetches. | ||||
| * `concelier.dto` `{_id, sourceName, documentId, schemaVer, payload, validatedAt}` — normalized connector DTOs used for replay. | ||||
|   * Index: `{sourceName:1, documentId:1}`. | ||||
| * `concelier.advisory_observations` | ||||
|  | ||||
| ``` | ||||
| { | ||||
|   _id: "tenant:vendor:upstreamId:revision", | ||||
|   tenant, | ||||
|   source: { vendor, stream, api, collectorVersion }, | ||||
|   upstream: { upstreamId, documentVersion, fetchedAt, receivedAt, contentHash, signature }, | ||||
|   content: { format, specVersion, raw, metadata? }, | ||||
|   identifiers: { cve?, ghsa?, vendorIds[], aliases[] }, | ||||
|   linkset: { purls[], cpes[], aliases[], references[], reconciledFrom[] }, | ||||
|   supersedes?: "prevObservationId", | ||||
|   createdAt, | ||||
|   attributes?: object | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Indexes: `{tenant:1, upstream.upstreamId:1}`, `{tenant:1, source.vendor:1, linkset.purls:1}`, `{tenant:1, linkset.aliases:1}`, `{tenant:1, createdAt:-1}`. | ||||
| * `concelier.advisory_linksets` | ||||
|  | ||||
| ``` | ||||
| { | ||||
|   _id: "sha256:...", | ||||
|   tenant, | ||||
|   key: { vulnerabilityId, productKey, confidence }, | ||||
|   observations: [ | ||||
|     { observationId, sourceVendor, statement, collectedAt } | ||||
|   ], | ||||
|   aliases: { primary, others: [] }, | ||||
|   purls: [], | ||||
|   cpes: [], | ||||
|   conflicts: [], | ||||
|   createdAt, | ||||
|   updatedAt | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Indexes: `{tenant:1, key.vulnerabilityId:1, key.productKey:1}`, `{tenant:1, purls:1}`, `{tenant:1, aliases.primary:1}`, `{tenant:1, updatedAt:-1}`. | ||||
| * `concelier.advisory_events` | ||||
|  | ||||
| ``` | ||||
| { | ||||
|   _id: ObjectId, | ||||
|   tenant, | ||||
|   type: "advisory.observation.updated" | "advisory.linkset.updated", | ||||
|   key, | ||||
|   delta, | ||||
|   hash, | ||||
|   occurredAt | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Indexes: TTL on `occurredAt` (configurable retention); `{type:1, occurredAt:-1}` for replay. | ||||
| * `concelier.export_state` `{_id(exportKind), baseExportId?, baseDigest?, lastFullDigest?, lastDeltaDigest?, cursor, files[]}` | ||||
| * `locks` `{_id(jobKey), holder, acquiredAt, heartbeatAt, leaseMs, ttlAt}` (TTL cleans dead locks) | ||||
| * `jobs` `{_id, type, args, state, startedAt, heartbeatAt, endedAt, error}` | ||||
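
To illustrate how the `locks` fields could interact, here is a minimal in-memory sketch of a lease. The expiry rule (a lease lapses once no heartbeat arrives within `leaseMs`) is an assumption, and `LockTable` is a hypothetical stand-in for the collection, not Concelier's implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Lock:
    holder: str
    acquired_at_ms: int
    heartbeat_at_ms: int
    lease_ms: int

    def expired(self, now_ms: int) -> bool:
        # Assumption: the lease lapses once no heartbeat arrived within lease_ms.
        return now_ms - self.heartbeat_at_ms > self.lease_ms


class LockTable:
    """In-memory stand-in for the `locks` collection, keyed by jobKey."""

    def __init__(self) -> None:
        self._locks: Dict[str, Lock] = {}

    def try_acquire(self, job_key: str, holder: str, now_ms: int, lease_ms: int) -> bool:
        current = self._locks.get(job_key)
        if current is not None and current.holder != holder and not current.expired(now_ms):
            return False  # another holder still has a live lease
        self._locks[job_key] = Lock(holder, now_ms, now_ms, lease_ms)
        return True

    def heartbeat(self, job_key: str, holder: str, now_ms: int) -> bool:
        current = self._locks.get(job_key)
        if current is None or current.holder != holder:
            return False  # only the current holder may extend the lease
        current.heartbeat_at_ms = now_ms
        return True
```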
|  | ||||
| **Legacy collections** (`advisory`, `alias`, `affected`, `reference`, `merge_event`) remain read-only during the migration window to support back-compat exports. New code must not write to them; scheduled cleanup removes them after Link-Not-Merge GA. | ||||
|  | ||||
| **GridFS buckets**: `fs.documents` for raw payloads (immutable); `fs.exports` for historical JSON/Trivy archives. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 8) Exporters | ||||
|  | ||||
| ### 8.1 Deterministic JSON (vuln‑list style) | ||||
|  | ||||
| * Folder structure mirroring `/<scheme>/<first-two>/<rest>/…` with one JSON per advisory; deterministic ordering, stable timestamps, normalized whitespace. | ||||
| * `manifest.json` lists all files with SHA‑256 and a top‑level **export digest**. | ||||
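
A minimal sketch of the determinism idea: serialize with sorted keys and fixed separators, then hash each file and the sorted entry list. How Concelier actually derives the top-level export digest is not stated here, so `build_manifest` and the `exportDigest` key are assumptions for illustration.

```python
import hashlib
import json


def canonical_json(doc: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal inputs give equal bytes."""
    return json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def build_manifest(files: dict) -> dict:
    """files maps relative path -> file bytes; returns per-file digests plus one export digest."""
    entries = [
        {"path": path, "sha256": hashlib.sha256(data).hexdigest()}
        for path, data in sorted(files.items())  # stable ordering by path
    ]
    # Assumption: the export digest is the SHA-256 of the canonical entry list.
    export_digest = hashlib.sha256(canonical_json({"files": entries})).hexdigest()
    return {"files": entries, "exportDigest": f"sha256:{export_digest}"}
```

Because both the per-file order and the serialization are fixed, two runs over the same inputs produce the same manifest regardless of the order in which files were written.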
|  | ||||
| ### 8.2 Trivy DB exporter | ||||
|  | ||||
| * Builds Bolt DB archives compatible with Trivy; supports **full** and **delta** modes. | ||||
| * In delta, unchanged blobs are reused from the base; metadata captures: | ||||
|  | ||||
|   ```json | ||||
|   { | ||||
|     "mode": "delta|full", | ||||
|     "baseExportId": "...", | ||||
|     "baseManifestDigest": "sha256:...", | ||||
|     "changed": ["path1", "path2"], | ||||
|     "removed": ["path3"] | ||||
|   } | ||||
|   ``` | ||||
| * Optional ORAS push (OCI layout) for registries. | ||||
| * Offline kit bundles include Trivy DB + JSON tree + export manifest. | ||||
| * Mirror-ready bundles: when `concelier.trivy.mirror` defines domains, the exporter emits `mirror/index.json` plus per-domain `manifest.json`, `metadata.json`, and `db.tar.gz` files with SHA-256 digests so Concelier mirrors can expose domain-scoped download endpoints. | ||||
| * Concelier.WebService serves `/concelier/exports/index.json` and `/concelier/exports/mirror/{domain}/…` directly from the export tree with short cache budgets (index: 60 s, bundles: 300 s, immutable) and per-domain rate limiting; the endpoints honour StellaOps Authority or CIDR bypass lists depending on mirror topology. | ||||
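
The delta metadata shown in this section can be derived by comparing two path-to-digest maps. The sketch below assumes that new paths count as `changed`; the function name and inputs are hypothetical.

```python
def compute_delta(base: dict, current: dict, base_export_id: str, base_manifest_digest: str) -> dict:
    """base/current map path -> sha256 digest; returns metadata in the documented shape."""
    # A path is "changed" when it is new or its digest differs from the base export.
    changed = sorted(p for p, digest in current.items() if base.get(p) != digest)
    # A path is "removed" when the base had it and the current export does not.
    removed = sorted(p for p in base if p not in current)
    return {
        "mode": "delta",
        "baseExportId": base_export_id,
        "baseManifestDigest": base_manifest_digest,
        "changed": changed,
        "removed": removed,
    }
```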
|  | ||||
| ### 8.3 Hand‑off to Signer/Attestor (optional) | ||||
|  | ||||
| * On export completion, if `attest: true` is set in job args, Concelier **posts** the artifact metadata to **Signer**/**Attestor**; Concelier itself **does not** hold signing keys. | ||||
| * Export record stores returned `{ uuid, index, url }` from **Rekor v2**. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 9) REST APIs | ||||
|  | ||||
| All under `/api/v1/concelier`. | ||||
|  | ||||
| **Health & status** | ||||
|  | ||||
| ``` | ||||
| GET  /healthz | /readyz | ||||
| GET  /status                              → sources, last runs, export cursors | ||||
| ``` | ||||
|  | ||||
| **Sources & jobs** | ||||
|  | ||||
| ``` | ||||
| GET  /sources                              → list of configured sources | ||||
| POST /sources/{name}/trigger               → { jobId } | ||||
| POST /sources/{name}/pause | /resume       → toggle | ||||
| GET  /jobs/{id}                            → job status | ||||
| ``` | ||||
|  | ||||
| **Exports** | ||||
|  | ||||
| ``` | ||||
| POST /exports/json   { full?:bool, force?:bool, attest?:bool } → { exportId, digest, rekor? } | ||||
| POST /exports/trivy  { full?:bool, force?:bool, publish?:bool, attest?:bool } → { exportId, digest, rekor? } | ||||
| GET  /exports/{id}   → export metadata (kind, digest, createdAt, rekor?) | ||||
| GET  /concelier/exports/index.json        → mirror index describing available domains/bundles | ||||
| GET  /concelier/exports/mirror/{domain}/manifest.json | ||||
| GET  /concelier/exports/mirror/{domain}/bundle.json | ||||
| GET  /concelier/exports/mirror/{domain}/bundle.json.jws | ||||
| ``` | ||||
|  | ||||
| **Search (operator debugging)** | ||||
|  | ||||
| ``` | ||||
| GET  /advisories/{key} | ||||
| GET  /advisories?scheme=CVE&value=CVE-2025-12345 | ||||
| GET  /affected?productKey=pkg:rpm/openssl&limit=100 | ||||
| ``` | ||||
|  | ||||
| **AuthN/Z:** Authority tokens (OpTok) with roles: `concelier.read`, `concelier.admin`, `concelier.export`. | ||||
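
As a hedged illustration of calling the export endpoints, the sketch below builds (but does not send) a request with a bearer token. The host name and the helper `build_export_request` are assumptions; only the path and the boolean options come from the listing above.

```python
import json
import urllib.request

BASE_URL = "https://concelier.internal/api/v1/concelier"  # hypothetical host


def build_export_request(kind: str, token: str, **options: bool) -> urllib.request.Request:
    """Build POST /exports/{kind} carrying the documented boolean options as JSON."""
    if kind not in ("json", "trivy"):
        raise ValueError(f"unknown export kind: {kind}")
    body = json.dumps(options, sort_keys=True).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/exports/{kind}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # Authority token (OpTok)
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; that step is left out so the sketch has no network dependency.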
|  | ||||
| --- | ||||
|  | ||||
| ## 10) Configuration (YAML) | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   mongo: { uri: "mongodb://mongo/concelier" } | ||||
|   s3: | ||||
|     endpoint: "http://minio:9000" | ||||
|     bucket: "stellaops-concelier" | ||||
|   scheduler: | ||||
|     windowSeconds: 30 | ||||
|     maxParallelSources: 4 | ||||
|   sources: | ||||
|     - name: redhat | ||||
|       kind: csaf | ||||
|       baseUrl: https://access.redhat.com/security/data/csaf/v2/ | ||||
|       signature: { type: pgp, keys: [ "…redhat PGP…" ] } | ||||
|       enabled: true | ||||
|       windowDays: 7 | ||||
|     - name: suse | ||||
|       kind: csaf | ||||
|       baseUrl: https://ftp.suse.com/pub/projects/security/csaf/ | ||||
|       signature: { type: pgp, keys: [ "…suse PGP…" ] } | ||||
|     - name: ubuntu | ||||
|       kind: usn-json | ||||
|       baseUrl: https://ubuntu.com/security/notices.json | ||||
|       signature: { type: none } | ||||
|     - name: osv | ||||
|       kind: osv | ||||
|       baseUrl: https://api.osv.dev/v1/ | ||||
|       signature: { type: none } | ||||
|     - name: ghsa | ||||
|       kind: ghsa | ||||
|       baseUrl: https://api.github.com/graphql | ||||
|       auth: { tokenRef: "env:GITHUB_TOKEN" } | ||||
|   exporters: | ||||
|     json: | ||||
|       enabled: true | ||||
|       output: s3://stellaops-concelier/json/ | ||||
|     trivy: | ||||
|       enabled: true | ||||
|       mode: full | ||||
|       output: s3://stellaops-concelier/trivy/ | ||||
|       oras: | ||||
|         enabled: false | ||||
|         repo: ghcr.io/org/concelier | ||||
|   precedence: | ||||
|     vendorWinsOverDistro: true | ||||
|     distroWinsOverOsv: true | ||||
|   severity: | ||||
|     policy: max    # or 'vendorPreferred' / 'distroPreferred' | ||||
| ``` | ||||
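
One way to read the `severity.policy` setting is sketched below. The severity scale, the source-class names, and the fallback to `max` when the preferred class is absent are all assumptions made for illustration; the configuration above does not define them.

```python
SEVERITY_ORDER = ["none", "low", "medium", "high", "critical"]  # assumed scale


def resolve_severity(claims: dict, policy: str = "max") -> str:
    """claims maps a source class ('vendor', 'distro', 'osv') to a severity label."""
    if not claims:
        raise ValueError("no severity claims to resolve")
    if policy == "max":
        return max(claims.values(), key=SEVERITY_ORDER.index)
    preferred = {"vendorPreferred": "vendor", "distroPreferred": "distro"}.get(policy)
    if preferred is None:
        raise ValueError(f"unknown severity policy: {policy}")
    if preferred in claims:
        return claims[preferred]
    # Assumed fallback: behave like 'max' when the preferred class made no claim.
    return max(claims.values(), key=SEVERITY_ORDER.index)
```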
|  | ||||
| --- | ||||
|  | ||||
| ## 11) Security & compliance | ||||
|  | ||||
| * **Outbound allowlist** per connector (domains, protocols); proxy support; TLS pinning where possible. | ||||
| * **Signature verification** for raw docs (PGP/cosign/x509) with results stored in `document.metadata.sig`. Docs failing verification may still be ingested but flagged; Policy Engine or downstream policy can down-weight them. | ||||
| * **No secrets in logs**; auth material via `env:` or mounted files; HTTP redaction of `Authorization` headers. | ||||
| * **Multi‑tenant**: per‑tenant DBs or prefixes; per‑tenant S3 prefixes; tenant‑scoped API tokens. | ||||
| * **Determinism**: canonical JSON writer; export digests stable across runs given same inputs. | ||||
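
Header redaction before logging can be as small as the sketch below. Only `Authorization` is named in the list above; the other entries in `SENSITIVE_HEADERS` are assumptions.

```python
# Assumption: only Authorization is documented; the other names are illustrative.
SENSITIVE_HEADERS = {"authorization", "proxy-authorization", "cookie"}


def redact_headers(headers: dict) -> dict:
    """Return a copy of the headers that is safe to log; header names match case-insensitively."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```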
|  | ||||
| --- | ||||
|  | ||||
| ## 12) Performance targets & scale | ||||
|  | ||||
| * **Ingest**: ≥ 5k documents/min on 4 cores (CSAF/OpenVEX/JSON). | ||||
| * **Normalize/map**: ≥ 50k observation statements/min on 4 cores. | ||||
| * **Observation write**: ≤ 5 ms P95 per document (including guard + Mongo write). | ||||
| * **Linkset build**: ≤ 15 ms P95 per `(vulnerabilityId, productKey)` update, even with 20+ contributing observations. | ||||
| * **Export**: 1M advisories JSON in ≤ 90 s (streamed, zstd), Trivy DB in ≤ 60 s on 8 cores. | ||||
| * **Memory**: hard cap per job; chunked streaming writers; backpressure to avoid GC spikes. | ||||
|  | ||||
| **Scale pattern**: add Concelier replicas; Mongo scaling via indices and read/write concerns; GridFS only for oversized docs. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 13) Observability | ||||
|  | ||||
| * **Metrics** | ||||
|  | ||||
|   * `concelier.fetch.docs_total{source}` | ||||
|   * `concelier.fetch.bytes_total{source}` | ||||
|   * `concelier.parse.failures_total{source}` | ||||
|   * `concelier.map.statements_total{source}` | ||||
|   * `concelier.observations.write_total{result=ok|noop|error}` | ||||
|   * `concelier.linksets.updated_total{result=ok|skip|error}` | ||||
|   * `concelier.linksets.conflicts_total{type}` | ||||
|   * `concelier.export.bytes{kind}` | ||||
|   * `concelier.export.duration_seconds{kind}` | ||||
| * **Tracing** around fetch/parse/map/observe/linkset/export. | ||||
| * **Logs**: structured with `source`, `uri`, `docDigest`, `advisoryKey`, `exportId`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 14) Testing matrix | ||||
|  | ||||
| * **Connectors:** fixture suites for each provider/format (happy path; malformed; signature fail). | ||||
| * **Version semantics:** EVR vs dpkg vs semver edge cases (epoch bumps, tilde versions, pre‑releases). | ||||
| * **Linkset correlation:** multi-source conflicts (severity, range, alias) produce deterministic conflict payloads; ensure confidence scoring is stable. | ||||
| * **Export determinism:** byte‑for‑byte stable outputs across runs; digest equality. | ||||
| * **Performance:** soak tests with 1M advisories; cap memory; verify backpressure. | ||||
| * **API:** pagination, filters, RBAC, error envelopes (RFC 7807). | ||||
| * **Offline kit:** bundle build & import correctness. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 15) Failure modes & recovery | ||||
|  | ||||
| * **Source outages:** the scheduler backs off with exponential delay and records the resume time in `source_state.backoffUntil`; alerts fire on staleness. | ||||
| * **Schema drifts:** parse stage marks DTO invalid; job fails with clear diagnostics; connector version flags track supported schema ranges. | ||||
| * **Partial exports:** exporters write to temp prefix; **manifest commit** is atomic; only then move to final prefix and update `export_state`. | ||||
| * **Resume:** all stages idempotent; `source_state.cursor` supports window resume. | ||||
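
The backoff behaviour for source outages could look like the following sketch. The base delay, the cap, and the doubling rule are assumptions; only the `backoffUntil` concept comes from the text above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_backoff_until(
    now: datetime,
    consecutive_failures: int,
    base_seconds: float = 30.0,   # assumed starting delay
    cap_seconds: float = 3600.0,  # assumed upper bound
) -> datetime:
    """Double the delay per consecutive failure, capped, and return the resume time."""
    delay = min(cap_seconds, base_seconds * 2 ** max(0, consecutive_failures - 1))
    return now + timedelta(seconds=delay)


def is_due(now: datetime, backoff_until: Optional[datetime]) -> bool:
    """A source may run when it has no backoff or the backoff has elapsed."""
    return backoff_until is None or now >= backoff_until
```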
|  | ||||
| --- | ||||
|  | ||||
| ## 16) Operator runbook (quick) | ||||
|  | ||||
| * **Trigger all sources:** `POST /api/v1/concelier/sources/*/trigger` | ||||
| * **Force full export JSON:** `POST /api/v1/concelier/exports/json { "full": true, "force": true }` | ||||
| * **Force Trivy DB delta publish:** `POST /api/v1/concelier/exports/trivy { "full": false, "publish": true }` | ||||
| * **Inspect observation:** `GET /api/v1/concelier/observations/{observationId}` | ||||
| * **Query linkset:** `GET /api/v1/concelier/linksets?vulnerabilityId=CVE-2025-12345&productKey=pkg:rpm/redhat/openssl` | ||||
| * **Pause noisy source:** `POST /api/v1/concelier/sources/osv/pause` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 17) Rollout plan | ||||
|  | ||||
| 1. **MVP**: Red Hat (CSAF), SUSE (CSAF), Ubuntu (USN JSON), OSV; JSON export. | ||||
| 2. **Add**: GHSA GraphQL, Debian (DSA HTML/JSON), Alpine secdb; Trivy DB export. | ||||
| 3. **Attestation hand‑off**: integrate with **Signer/Attestor** (optional). | ||||
| 4. **Scale & diagnostics**: provider dashboards, staleness alerts, export cache reuse. | ||||
| 5. **Offline kit**: end‑to‑end verified bundles for air‑gap. | ||||
							
								
								
									
										67
									
								
								docs/modules/concelier/implementation_plan.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,67 @@ | ||||
| # Implementation plan — Concelier | ||||
|  | ||||
| ## Delivery timeline | ||||
| - **Phase 1 — Guardrails & schema**   | ||||
|   Stand up Mongo JSON validators for `advisory_raw` and `vex_raw`, wire the `AOCWriteGuard` repository interceptor, and seed deterministic linkset builders. Freeze legacy normalisation paths and migrate callers to the new raw schema. | ||||
| - **Phase 2 — API & observability**   | ||||
|   Publish ingestion and verification endpoints (`POST /ingest/*`, `GET /advisories/raw`, `POST /aoc/verify`) with Authority scopes, expose telemetry (`aoc_violation_total`, guard spans, structured logs), and ensure Offline Kit packaging captures validator deployment steps. | ||||
| - **Phase 3 — Experience polish**   | ||||
|   Ship CLI/Console affordances (`stella sources ingest --dry-run`, dashboard tiles, violation drill-downs), finish Export Center hand-off metadata, and close out CI enforcement (`stella aoc verify` preflight, AST lint, seeded fixtures). | ||||
|  | ||||
| ## Work breakdown by component | ||||
| - **Concelier WebService & worker** | ||||
|   - Add Mongo validators and unique indexes over `(tenant, source.vendor, upstream.upstream_id, upstream.content_hash)`. | ||||
|   - Implement write interceptors rejecting forbidden fields, missing provenance, or merge attempts. | ||||
|   - Deterministically compute linksets and persist canonical JSON payloads. | ||||
|   - Introduce `/ingest/advisory`, `/advisories/raw*`, and `/aoc/verify` surfaces guarded by `advisory:*` and `aoc:verify` scopes. | ||||
|   - Emit guard metrics/traces and surface supersedes/violation audit logs. | ||||
| - **Excititor (shared ingestion contract)** | ||||
|   - Mirror Concelier guard and schema changes for `vex_raw`. | ||||
|   - Maintain restart-time plug-in determinism and linkset extraction parity. | ||||
| - **Shared libraries** | ||||
|   - Publish `StellaOps.Ingestion.AOC` (forbidden key catalog, guard middleware, provenance helpers, signature verification). | ||||
|   - Share error codes (`ERR_AOC_00x`) and deterministic hashing utilities. | ||||
| - **Policy Engine integration** | ||||
|   - Enforce `effective_finding_*` write exclusivity. | ||||
|   - Consume only raw documents + linksets, removing any implicit normalisation. | ||||
| - **Authority scopes** | ||||
|   - Provision `advisory:ingest|read`, `vex:ingest|read`, `aoc:verify`; propagate tenant claims to ingestion services. | ||||
| - **CLI & Console** | ||||
|   - Implement `stella sources ingest --dry-run` and `stella aoc verify` (with exit codes mapped to `ERR_AOC_00x`). | ||||
|   - Surface AOC dashboards, violation drill-down, and verification shortcuts in the Console. | ||||
| - **CI/CD** | ||||
|   - Add Roslyn analyzer / AST lint to block forbidden writes. | ||||
|   - Seed fixtures and run `stella aoc verify` against snapshots in pipeline gating. | ||||
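
The write-guard idea from the work breakdown can be sketched as a single pass over top-level keys. The names in `FORBIDDEN_KEYS` are hypothetical placeholders for the real forbidden key catalog, and `GuardViolation` stands in for the `ERR_AOC_00x` codes, which are not enumerated here.

```python
FORBIDDEN_KEYS = {"severity", "merged_from"}   # hypothetical catalog entries
FORBIDDEN_PREFIXES = ("effective_finding_",)   # reserved for Policy Engine per the plan


class GuardViolation(Exception):
    """Raised when a raw document breaks the aggregation-only contract."""


def check_raw_document(doc: dict) -> None:
    """Reject documents with forbidden fields or missing tenant/provenance."""
    violations = []
    if not doc.get("tenant"):
        violations.append("missing tenant")
    upstream = doc.get("upstream") or {}
    for field in ("upstream_id", "content_hash"):
        if not upstream.get(field):
            violations.append(f"missing provenance: upstream.{field}")
    for key in doc:  # one pass over top-level keys
        if key in FORBIDDEN_KEYS or key.startswith(FORBIDDEN_PREFIXES):
            violations.append(f"forbidden field: {key}")
    if violations:
        raise GuardViolation("; ".join(violations))
```

Collecting every violation before raising gives operators one complete report per rejected document instead of one error per retry.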
|  | ||||
| ## Documentation deliverables | ||||
| - Update `docs/ingestion/aggregation-only-contract.md` with guard invariants, schemas, error codes, and migration guidance. | ||||
| - Refresh `docs/modules/concelier/operations/*.md` (mirror, conflict-resolution, authority audit) with validator rollouts and observability dashboards. | ||||
| - Cross-link Authority scope definitions, CLI reference, Console sources guide, and observability runbooks to the AOC guard changes. | ||||
| - Ensure Offline Kit documentation captures validator bootstrap and verify workflows. | ||||
|  | ||||
| ## Acceptance criteria | ||||
| - Mongo validators and runtime guards reject forbidden fields and missing provenance with the documented `ERR_AOC_00x` codes. | ||||
| - Linksets and supersedes chains are deterministic; rerunning ingestion over identical payloads yields byte-identical documents. | ||||
| - CLI `stella aoc verify` exits non-zero on seeded violations and zero on clean datasets; Console dashboards show real-time guard status. | ||||
| - Export Center consumes advisory datasets without relying on legacy normalised fields. | ||||
| - CI fails if lint rules detect forbidden writes or if seeded guard tests regress. | ||||
|  | ||||
| ## Risks & mitigations | ||||
| - **Collector drift introduces new forbidden keys.** Mitigated by guard middleware + CI lint + schema validation; RFC required for linkset changes. | ||||
| - **Migration complexity from legacy normalisation.** Staged cutover with `_backup_*` copies and temporary views to keep Policy Engine parity. | ||||
| - **Performance overhead during ingest.** Guard remains O(number of keys); index review ensures insert latency stays within warm (<5 s) / cold (<30 s) targets. | ||||
| - **Tenancy leakage.** `tenant` required in schema, Authority-supplied claims enforced per request, observability alerts fire on missing tenant identifiers. | ||||
|  | ||||
| ## Test strategy | ||||
| - **Unit**: guard rejection paths, provenance enforcement, idempotent insertions, linkset determinism. | ||||
| - **Property**: fuzz upstream payloads to guarantee no forbidden fields emerge. | ||||
| - **Integration**: batch ingest (50k advisories, mixed VEX fixtures), verifying zero guard violations and consistent supersedes. | ||||
| - **Contract**: Policy Engine consumers verify raw-only reads; Export Center consumes canonical datasets. | ||||
| - **End-to-end**: ingest/verify flow with CLI + Console actions to confirm observability and guard reporting. | ||||
|  | ||||
| ## Definition of done | ||||
| - Validators deployed and verified in staging/offline environments. | ||||
| - Runtime guards, CLI/Console workflows, and CI linting all active. | ||||
| - Observability dashboards and runbooks updated; metrics visible. | ||||
| - Documentation updates merged; Offline Kit instructions published. | ||||
| - ./TASKS.md reflects status transitions; cross-module dependencies acknowledged in ../../TASKS.md. | ||||
							
								
								
									
										159
									
								
								docs/modules/concelier/operations/authority-audit-runbook.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,159 @@ | ||||
| # Concelier Authority Audit Runbook | ||||
|  | ||||
| _Last updated: 2025-10-22_ | ||||
|  | ||||
| This runbook helps operators verify and monitor the StellaOps Concelier ⇆ Authority integration. It focuses on the `/jobs*` surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity. | ||||
|  | ||||
| ## 1. Prerequisites | ||||
|  | ||||
| - Authority integration is enabled in `concelier.yaml` (or via `CONCELIER_AUTHORITY__*` environment variables) with a valid `clientId`, secret, audience, and required scopes. | ||||
| - OTLP metrics/log exporters are configured (`concelier.telemetry.*`) or container stdout is shipped to your SIEM. | ||||
| - Operators have access to the Concelier job trigger endpoints via CLI or REST for smoke tests. | ||||
| - The rollout table in `docs/10_CONCELIER_CLI_QUICKSTART.md` has been reviewed so stakeholders align on the staged → enforced toggle timeline. | ||||
|  | ||||
| ### Configuration snippet | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   authority: | ||||
|     enabled: true | ||||
|     allowAnonymousFallback: false          # keep true only during initial rollout | ||||
|     issuer: "https://authority.internal" | ||||
|     audiences: | ||||
|       - "api://concelier" | ||||
|     requiredScopes: | ||||
|       - "concelier.jobs.trigger" | ||||
|       - "advisory:read" | ||||
|       - "advisory:ingest" | ||||
|     requiredTenants: | ||||
|       - "tenant-default" | ||||
|     bypassNetworks: | ||||
|       - "127.0.0.1/32" | ||||
|       - "::1/128" | ||||
|     clientId: "concelier-jobs" | ||||
|     clientSecretFile: "/run/secrets/concelier_authority_client" | ||||
|     tokenClockSkewSeconds: 60 | ||||
|     resilience: | ||||
|       enableRetries: true | ||||
|       retryDelays: | ||||
|         - "00:00:01" | ||||
|         - "00:00:02" | ||||
|         - "00:00:05" | ||||
|       allowOfflineCacheFallback: true | ||||
|       offlineCacheTolerance: "00:10:00" | ||||
| ``` | ||||
|  | ||||
| > Store secrets outside source control. Concelier reads `clientSecretFile` on startup; rotate by updating the mounted file and restarting the service. | ||||
|  | ||||
| ### Resilience tuning | ||||
|  | ||||
| - **Connected sites:** keep the default 1 s / 2 s / 5 s retry ladder so Concelier retries transient Authority hiccups but still surfaces outages quickly. Leave `allowOfflineCacheFallback=true` so cached discovery/JWKS data can bridge short Authority restarts. | ||||
| - **Air-gapped/Offline Kit installs:** extend `offlineCacheTolerance` (15–30 minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (`enableRetries=false`) if infrastructure teams prefer to handle exponential backoff at the network layer; Concelier will fail fast but keep deterministic logs. | ||||
| - Concelier resolves these knobs through `IOptionsMonitor<StellaOpsAuthClientOptions>`. Edits to `concelier.yaml` are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled. | ||||
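
The retry ladder semantics can be illustrated with the sketch below: one attempt per configured delay, then a final attempt whose failure propagates. The choice of `ConnectionError` as the transient error and the injectable `sleep` are assumptions for the example; this is not the `StellaOpsAuthClientOptions` implementation.

```python
import time


def parse_timespan(value: str) -> int:
    """Convert an 'hh:mm:ss' string such as '00:00:05' to whole seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def call_with_retries(operation, retry_delays, enable_retries=True, sleep=time.sleep):
    """Run operation, sleeping through the ladder on transient errors."""
    delays = [parse_timespan(d) for d in retry_delays] if enable_retries else []
    for delay in delays:
        try:
            return operation()
        except ConnectionError:  # assumed transient failure type
            sleep(delay)
    return operation()  # final attempt: let any exception propagate
```

With `enable_retries=False` the ladder is empty, so the first failure surfaces immediately, which matches the fail-fast behaviour described for air-gapped installs.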
|  | ||||
| ## 2. Key Signals | ||||
|  | ||||
| ### 2.1 Audit log channel | ||||
|  | ||||
| Concelier emits structured audit entries via the `Concelier.Authorization.Audit` logger for every `/jobs*` request once Authority enforcement is active. | ||||
|  | ||||
| ``` | ||||
| Concelier authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=concelier-cli scopes=concelier.jobs.trigger advisory:ingest bypass=False remote=10.1.4.7 | ||||
| ``` | ||||
|  | ||||
| | Field        | Sample value            | Meaning                                                                                  | | ||||
| |--------------|-------------------------|------------------------------------------------------------------------------------------| | ||||
| | `route`      | `/jobs/definitions`     | Endpoint that processed the request.                                                     | | ||||
| | `status`     | `200` / `401` / `409`   | Final HTTP status code returned to the caller.                                           | | ||||
| | `subject`    | `ops@example.com`       | User or service principal subject (falls back to `(anonymous)` when unauthenticated).    | | ||||
| | `clientId`   | `concelier-cli`         | OAuth client ID provided by Authority (`(none)` if the token lacked the claim).         | | ||||
| | `scopes`     | `concelier.jobs.trigger advisory:ingest advisory:read` | Normalised scope list extracted from token claims; `(none)` if the token carried none.   | | ||||
| | `tenant`     | `tenant-default`        | Tenant claim extracted from the Authority token (`(none)` when the token lacked it).     | | ||||
| | `bypass`     | `True` / `False`        | Indicates whether the request succeeded because its source IP matched a bypass CIDR.    | | ||||
| | `remote`     | `10.1.4.7`              | Remote IP recorded from the connection / forwarded header test hooks.                    | | ||||
|  | ||||
| Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations: | ||||
|  | ||||
| - `status=401 AND bypass=True` – bypass network accepted an unauthenticated call (should be temporary during rollout). | ||||
| - `status=202 AND scopes="(none)"` – a token without scopes triggered a job; tighten client configuration. | ||||
| - `status=202 AND NOT contains(scopes,"advisory:ingest")` – ingestion attempted without the new AOC scopes; confirm the Authority client registration matches the sample above. | ||||
| - `tenant!=(tenant-default)` – indicates a cross-tenant token was accepted. Ensure Concelier `requiredTenants` is aligned with Authority client registration. | ||||
| - Spike in `clientId="(none)"` – indicates upstream Authority is not issuing `client_id` claims or the CLI is outdated. | ||||
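
For ad-hoc triage outside a logging backend, the audit line can be split on its known keys. The sketch below assumes the field names from the table above and encodes two of the filters from this list; `parse_audit_line` and `is_suspicious` are hypothetical helpers, not part of Concelier.

```python
import re

AUDIT_KEYS = ("route", "status", "subject", "clientId", "scopes", "tenant", "bypass", "remote")
_KEY_RE = re.compile(r"(?:^|\s)(" + "|".join(AUDIT_KEYS) + r")=")


def parse_audit_line(line: str) -> dict:
    """Split on known keys so space-separated scope lists stay in one value."""
    matches = list(_KEY_RE.finditer(line))
    fields = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
        fields[match.group(1)] = line[match.end():end].strip()
    return fields


def is_suspicious(fields: dict) -> bool:
    """Flag the 401-with-bypass and 202-without-ingest-scope combinations."""
    scopes = fields.get("scopes", "(none)").split()
    if fields.get("status") == "401" and fields.get("bypass") == "True":
        return True
    if fields.get("status") == "202" and "advisory:ingest" not in scopes:
        return True
    return False
```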
|  | ||||
| ### 2.2 Metrics | ||||
|  | ||||
| Concelier publishes counters under the OTEL meter `StellaOps.Concelier.WebService.Jobs`. Tags: `job.kind`, `job.trigger`, `job.outcome`. | ||||
|  | ||||
| | Metric name                   | Description                                        | PromQL example | | ||||
| |-------------------------------|----------------------------------------------------|----------------| | ||||
| | `web.jobs.triggered`          | Accepted job trigger requests.                     | `sum by (job_kind) (rate(web_jobs_triggered_total[5m]))` | | ||||
| | `web.jobs.trigger.conflict`   | Rejected triggers (already running, disabled…).    | `sum(rate(web_jobs_trigger_conflict_total[5m]))` | | ||||
| | `web.jobs.trigger.failed`     | Server-side job failures.                          | `sum(rate(web_jobs_trigger_failed_total[5m]))` | | ||||
|  | ||||
| > Prometheus/OTEL collectors typically surface counters with `_total` suffix. Adjust queries to match your pipeline’s generated metric names. | ||||
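
The naming adjustment mentioned in the note can be approximated as below. This mirrors the common dot-to-underscore and `_total` convention only; actual collector output depends on pipeline configuration, so treat the helper as an assumption to verify against your own metric names.

```python
def prometheus_counter_name(otel_name: str) -> str:
    """Map an OTEL counter name to the form Prometheus typically exposes."""
    name = otel_name.replace(".", "_")
    return name if name.endswith("_total") else f"{name}_total"


def prometheus_label_name(otel_tag: str) -> str:
    """Tags such as job.kind usually surface as labels like job_kind."""
    return otel_tag.replace(".", "_")
```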
|  | ||||
| Correlate audit logs with the following global meter exported via `Concelier.SourceDiagnostics`: | ||||
|  | ||||
| - `concelier.source.http.requests_total{concelier_source="jobs-run"}` – ensures REST/manual triggers route through Authority. | ||||
| - If Grafana dashboards are deployed, extend the “Concelier Jobs” board with the above counters plus a table of recent audit log entries. | ||||
|  | ||||
| ## 3. Alerting Guidance | ||||
|  | ||||
| 1. **Unauthorized bypass attempt**   | ||||
|    - Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", status="401", bypass="True"}[5m])) > 0`   | ||||
|    - Action: verify `bypassNetworks` list; confirm expected maintenance windows; rotate credentials if suspicious. | ||||
|  | ||||
| 2. **Missing scopes**   | ||||
|    - Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0`   | ||||
|    - Action: audit Authority client registration; ensure `requiredScopes` includes `concelier.jobs.trigger`, `advisory:ingest`, and `advisory:read`. | ||||
|  | ||||
| 3. **Trigger failure surge**   | ||||
|    - Query: `sum(rate(web_jobs_trigger_failed_total[10m])) > 0` with severity `warning` if sustained for 10 minutes.   | ||||
|    - Action: inspect correlated audit entries and `Concelier.Telemetry` traces for job execution errors. | ||||
|  | ||||
| 4. **Conflict spike**   | ||||
|    - Query: `sum(rate(web_jobs_trigger_conflict_total[10m])) > 5` (tune threshold).   | ||||
|    - Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly. | ||||
|  | ||||
| 5. **Authority offline**   | ||||
|    - Watch `Concelier.Authorization.Audit` logs for `status=503` or `status=500` along with `clientId="(none)"`. Investigate Authority availability before re-enabling anonymous fallback. | ||||
|  | ||||
| ## 4. Rollout & Verification Procedure | ||||
|  | ||||
| 1. **Pre-checks** | ||||
|    - Align with the rollout phases documented in `docs/10_CONCELIER_CLI_QUICKSTART.md` (validation → rehearsal → enforced) and record the target dates in your change request. | ||||
|    - Confirm `allowAnonymousFallback` is `false` in production; keep `true` only during staged validation. | ||||
|    - Validate Authority issuer metadata is reachable from Concelier (`curl https://authority.internal/.well-known/openid-configuration` from the host). | ||||
|  | ||||
| 2. **Smoke test with valid token** | ||||
|    - Obtain a token via CLI: `stella auth login --scope "concelier.jobs.trigger advisory:ingest" --scope advisory:read`. | ||||
|    - Trigger a read-only endpoint: `curl -H "Authorization: Bearer $TOKEN" https://concelier.internal/jobs/definitions`. | ||||
|    - Expect HTTP 200/202 and an audit log with `bypass=False`, `scopes=concelier.jobs.trigger advisory:ingest advisory:read`, and `tenant=tenant-default`. | ||||
|  | ||||
| 3. **Negative test without token** | ||||
|    - Call the same endpoint without a token. Expect HTTP 401, `bypass=False`. | ||||
|    - If the request succeeds, double-check `bypassNetworks` and ensure fallback is disabled. | ||||
|  | ||||
| 4. **Bypass check (if applicable)** | ||||
|    - From an allowed maintenance IP, call `/jobs/definitions` without a token. Confirm the audit log shows `bypass=True`. Review business justification and expiry date for such entries. | ||||
|  | ||||
| 5. **Metrics validation** | ||||
|    - Ensure `web.jobs.triggered` counter increments during accepted runs. | ||||
|    - Exporters should show corresponding spans (`concelier.job.trigger`) if tracing is enabled. | ||||
|  | ||||
| ## 5. Troubleshooting | ||||
|  | ||||
| | Symptom | Probable cause | Remediation | | ||||
| |---------|----------------|-------------| | ||||
| | Audit log shows `clientId=(none)` for all requests | Authority not issuing `client_id` claim or CLI outdated | Update StellaOps Authority configuration (`StellaOpsAuthorityOptions.Token.Claims.ClientId`), or upgrade the CLI token acquisition flow. | | ||||
| | Requests succeed with `bypass=True` unexpectedly | Local network added to `bypassNetworks` or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Concelier. | | ||||
| | HTTP 401 with valid token | `requiredScopes` missing from client registration or token audience mismatch | Verify Authority client scopes (`concelier.jobs.trigger`) and ensure the token audience matches `audiences` config. | | ||||
| | Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set `concelier.telemetry.enableMetrics=true`, ensure collector includes `StellaOps.Concelier.WebService.Jobs` meter. | | ||||
| | Sudden spike in `web.jobs.trigger.failed` | Downstream job failure or Authority timeout mid-request | Inspect Concelier job logs, re-run with tracing enabled, validate Authority latency. | | ||||
|  | ||||
| ## 6. References | ||||
|  | ||||
| - `docs/21_INSTALL_GUIDE.md` – Authority configuration quick start. | ||||
| - `docs/17_SECURITY_HARDENING_GUIDE.md` – Security guardrails and enforcement deadlines. | ||||
| - `docs/modules/authority/operations/monitoring.md` – Authority-side monitoring and alerting playbook. | ||||
| - `StellaOps.Concelier.WebService/Filters/JobAuthorizationAuditFilter.cs` – source of audit log fields. | ||||
							
								
								
									
										160
									
								
								docs/modules/concelier/operations/conflict-resolution.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,160 @@ | ||||
| # Concelier Conflict Resolution Runbook (Sprint 3) | ||||
|  | ||||
| This runbook equips Concelier operators to detect, triage, and resolve advisory conflicts now that the Sprint 3 merge engine has landed (`AdvisoryPrecedenceMerger`, merge-event hashing, and telemetry counters). It builds on the canonical rules defined in `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md` and the metrics/logging instrumentation delivered this sprint. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1. Precedence Model (recap) | ||||
|  | ||||
| - **Default ranking:** `GHSA -> NVD -> OSV`, with distro/vendor PSIRTs outranking ecosystem feeds (`AdvisoryPrecedenceDefaults`). Use `concelier:merge:precedence:ranks` to override per source when incident response requires it. | ||||
| - **Freshness override:** if a lower-ranked source is >= 48 hours newer for a freshness-sensitive field (title, summary, affected ranges, references, credits), it wins. Every override stamps `provenance[].decisionReason = freshness`. | ||||
| - **Tie-breakers:** when precedence and freshness tie, the engine falls back to (1) primary source order, (2) shortest normalized text, (3) lowest stable hash. Merge-generated provenance records set `decisionReason = tie-breaker`. | ||||
| - **Audit trail:** each merged advisory receives a `merge` provenance entry listing the participating sources plus a `merge_event` record with canonical before/after SHA-256 hashes. | ||||
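
The ranking, freshness, and tie-breaker rules above can be sketched in a few lines of Python. This is an illustrative model only, not the `AdvisoryPrecedenceMerger` implementation: the `Candidate` type, the `pick_field` helper, the `primary_order` mapping, and the convention that a lower rank number means higher precedence are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
import hashlib

FRESHNESS_WINDOW = timedelta(hours=48)  # threshold quoted in the runbook


@dataclass(frozen=True)
class Candidate:
    source: str         # e.g. "ghsa", "nvd", "osv"
    rank: int           # assumption: lower number = higher precedence
    value: str          # candidate text for one freshness-sensitive field
    modified: datetime  # when the source last changed this field


def pick_field(candidates, primary_order=None):
    """Return (winner, decision_reason) for a single field."""
    primary_order = primary_order or {}
    best_rank = min(c.rank for c in candidates)
    top = [c for c in candidates if c.rank == best_rank]
    newest_top = max(c.modified for c in top)

    # Freshness override: a lower-ranked source that is >= 48 h newer wins.
    fresher = [
        c for c in candidates
        if c.rank > best_rank and c.modified - newest_top >= FRESHNESS_WINDOW
    ]
    if fresher:
        return max(fresher, key=lambda c: c.modified), "freshness"

    if len(top) == 1:
        return top[0], "precedence"

    # Tie-breakers: primary source order, shortest normalized text, lowest hash.
    def tie_key(c):
        text = " ".join(c.value.split()).lower()
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return (primary_order.get(c.source, len(primary_order)), len(text), digest)

    return min(top, key=tie_key), "tie-breaker"
```

With a GHSA candidate at rank 0 and an OSV candidate at rank 2 that is 72 hours newer, the sketch returns the OSV candidate with reason `freshness`; at 47 hours newer it returns the GHSA candidate with reason `precedence`.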
|  | ||||
| --- | ||||
|  | ||||
| ## 2. Telemetry Shipped This Sprint | ||||
|  | ||||
| | Instrument | Type | Key Tags | Purpose | | ||||
| |------------|------|----------|---------| | ||||
| | `concelier.merge.operations` | Counter | `inputs` | Total precedence merges executed. | | ||||
| | `concelier.merge.overrides` | Counter | `primary_source`, `suppressed_source`, `primary_rank`, `suppressed_rank` | Field-level overrides chosen by precedence. | | ||||
| | `concelier.merge.range_overrides` | Counter | `advisory_key`, `package_type`, `primary_source`, `suppressed_source`, `primary_range_count`, `suppressed_range_count` | Package range overrides emitted by `AffectedPackagePrecedenceResolver`. | | ||||
| | `concelier.merge.conflicts` | Counter | `type` (`severity`, `precedence_tie`), `reason` (`mismatch`, `primary_missing`, `equal_rank`) | Conflicts requiring operator review. | | ||||
| | `concelier.merge.identity_conflicts` | Counter | `scheme`, `alias_value`, `advisory_count` | Alias collisions surfaced by the identity graph. | | ||||
|  | ||||
| ### Structured logs | ||||
|  | ||||
| - `AdvisoryOverride` (EventId 1000) - logs merge suppressions with alias/provenance counts. | ||||
| - `PackageRangeOverride` (EventId 1001) - logs package-level precedence decisions. | ||||
| - `PrecedenceConflict` (EventId 1002) - logs mismatched severity or equal-rank scenarios. | ||||
| - `Alias collision ...` (no EventId) - emitted when `concelier.merge.identity_conflicts` increments. | ||||
|  | ||||
| Expect all logs at `Information`. Ensure OTEL exporters include the scope `StellaOps.Concelier.Merge`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3. Detection & Alerting | ||||
|  | ||||
| 1. **Dashboard panels** | ||||
|    - `concelier.merge.conflicts` - table grouped by `type/reason`. Alert when > 0 in a 15 minute window. | ||||
|    - `concelier.merge.range_overrides` - stacked bar by `package_type`. Spikes highlight vendor PSIRT overrides over registry data. | ||||
|    - `concelier.merge.overrides` with `primary_source|suppressed_source` - catches unexpected precedence flips (e.g., OSV overtaking GHSA). | ||||
|    - `concelier.merge.identity_conflicts` - single-stat; alert when alias collisions occur more than once per day. | ||||
| 2. **Log based alerts** | ||||
|    - `eventId=1002` with `reason="equal_rank"` - indicates precedence table gaps; page merge owners. | ||||
|    - `eventId=1002` with `reason="mismatch"` - severity disagreement; open connector bug if sustained. | ||||
| 3. **Job health** | ||||
|    - `stellaops-cli db merge` exit code `1` signifies unresolved conflicts. Pipe to automation that captures logs and notifies #concelier-ops. | ||||
|  | ||||
| ### Threshold updates (2025-10-12) | ||||
|  | ||||
| - `concelier.merge.conflicts` – Page only when ≥ 2 events fire within 30 minutes; the synthetic conflict fixture run produces 0 conflicts, so the first event now routes to Slack for manual review instead of paging. | ||||
| - `concelier.merge.overrides` – Raise a warning when the 30-minute sum exceeds 10 (canonical triple yields exactly 1 summary override with `primary_source=osv`, `suppressed_source=ghsa`). | ||||
| - `concelier.merge.range_overrides` – Maintain the 15-minute alert at ≥ 3 but annotate dashboards that the regression triple emits a single `package_type=semver` override so ops can spot unexpected spikes. | ||||
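
The thresholds above amount to counting events in a sliding window. The following is a minimal sketch of that logic, assuming each metric increment is available as a timestamp; the function names and the `"page"`/`"slack"`/`"none"` labels are hypothetical and do not correspond to any alerting rule shipped with Concelier.

```python
from datetime import datetime, timedelta


def count_in_window(events, now, window):
    """Count event timestamps that fall in the interval (now - window, now]."""
    return sum(1 for t in events if now - window < t <= now)


def route_conflicts(events, now):
    """concelier.merge.conflicts: page at >= 2 events in 30 minutes, else Slack."""
    n = count_in_window(events, now, timedelta(minutes=30))
    if n >= 2:
        return "page"
    if n == 1:
        return "slack"
    return "none"


def overrides_warning(events, now):
    """concelier.merge.overrides: warn when the 30-minute sum exceeds 10."""
    return count_in_window(events, now, timedelta(minutes=30)) > 10


def range_overrides_alert(events, now):
    """concelier.merge.range_overrides: alert at >= 3 events in 15 minutes."""
    return count_in_window(events, now, timedelta(minutes=15)) >= 3
```

In practice these rules would live in the alerting backend; the sketch is only meant to make the window and comparison operators explicit.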
|  | ||||
| --- | ||||
|  | ||||
| ## 4. Triage Workflow | ||||
|  | ||||
| 1. **Confirm job context** | ||||
|    - Run `stellaops-cli db merge` (CLI) or call `POST /jobs/merge:reconcile` (API) to rehydrate the merge job. Use `--verbose` to stream structured logs during triage. | ||||
| 2. **Inspect metrics** | ||||
|    - Correlate spikes in `concelier.merge.conflicts` with `primary_source`/`suppressed_source` tags from `concelier.merge.overrides`. | ||||
| 3. **Pull structured logs** | ||||
|    - Example (vector output): | ||||
|      ``` | ||||
|      jq 'select(.EventId.Name=="PrecedenceConflict") | {advisory: .State[0].Value, type: .ConflictType, reason: .Reason, primary: .PrimarySources, suppressed: .SuppressedSources}' stellaops-concelier.log | ||||
|      ``` | ||||
| 4. **Review merge events** | ||||
|    - `mongosh`: | ||||
|      ```javascript | ||||
|      use concelier; | ||||
|      db.merge_event.find({ advisoryKey: "CVE-2025-1234" }).sort({ mergedAt: -1 }).limit(5); | ||||
|      ``` | ||||
|    - Compare `beforeHash` vs `afterHash` to confirm the merge actually changed canonical output. | ||||
| 5. **Interrogate provenance** | ||||
|    - `db.advisories.findOne({ advisoryKey: "CVE-2025-1234" }, { title: 1, severity: 1, provenance: 1, "affectedPackages.provenance": 1 })` | ||||
|    - Check `provenance[].decisionReason` values (`precedence`, `freshness`, `tie-breaker`) to understand why the winning field was chosen. | ||||
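
When `jq` is not at hand, the log-pulling step can be approximated with a short Python sketch that tallies `PrecedenceConflict` records by type and reason. It assumes one JSON object per log line and the field names used in the `jq` example above (`EventId.Name`, `ConflictType`, `Reason`); `summarize_conflicts` is a hypothetical helper, not part of the CLI.

```python
import json
from collections import Counter


def summarize_conflicts(lines):
    """Count PrecedenceConflict log records by (ConflictType, Reason)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON (banners, stack traces)
        if not isinstance(record, dict):
            continue
        event = record.get("EventId")
        if not isinstance(event, dict) or event.get("Name") != "PrecedenceConflict":
            continue
        counts[(record.get("ConflictType"), record.get("Reason"))] += 1
    return counts
```

A typical call would be `summarize_conflicts(open("stellaops-concelier.log", encoding="utf-8"))`, after which the counts can be compared with the `type`/`reason` tags on `concelier.merge.conflicts`.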
|  | ||||
| --- | ||||
|  | ||||
| ## 5. Conflict Classification Matrix | ||||
|  | ||||
| | Signal | Likely Cause | Immediate Action | | ||||
| |--------|--------------|------------------| | ||||
| | `reason="mismatch"` with `type="severity"` | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. | | ||||
| | `reason="primary_missing"` | Higher-ranked source lacks the field entirely. | Backfill connector data or temporarily allow lower-ranked source via precedence override. | | ||||
| | `reason="equal_rank"` | Two feeds share the same precedence rank (custom config or missing entry). | Update `concelier:merge:precedence:ranks` to break the tie; restart merge job. | | ||||
| | Rising `concelier.merge.range_overrides` for a package type | Vendor PSIRT now supplies richer ranges. | Validate connectors emit `decisionReason="precedence"` and update dashboards to treat registry ranges as fallback. | | ||||
| | `concelier.merge.identity_conflicts` > 0 | Alias scheme mapping produced collisions (duplicate CVE <-> advisory pairs). | Inspect `Alias collision` log payload; reconcile the alias graph by adjusting connector alias output. | | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6. Resolution Playbook | ||||
|  | ||||
| 1. **Connector data fix** | ||||
|    - Re-run the offending connector stages (`stellaops-cli db fetch --source ghsa --stage map` etc.). | ||||
|    - Once fixed, rerun merge and verify `decisionReason` reflects `freshness` or `precedence` as expected. | ||||
| 2. **Temporary precedence override** | ||||
|    - Edit `etc/concelier.yaml`: | ||||
|      ```yaml | ||||
|      concelier: | ||||
|        merge: | ||||
|          precedence: | ||||
|            ranks: | ||||
|              osv: 1 | ||||
|              ghsa: 0 | ||||
|      ``` | ||||
|    - Restart Concelier workers; confirm tags in `concelier.merge.overrides` show the new ranks. | ||||
|    - Document the override with expiry in the change log. | ||||
| 3. **Alias remediation** | ||||
|    - Update connector mapping rules to weed out duplicate aliases (e.g., skip GHSA aliases that mirror CVE IDs). | ||||
|    - Flush cached alias graphs if necessary. `db.alias_graph.drop()` is destructive, so coordinate with Storage before issuing it. | ||||
| 4. **Escalation** | ||||
|    - If override metrics spike due to upstream regression, open an incident with Security Guild, referencing merge logs and `merge_event` IDs. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 7. Validation Checklist | ||||
|  | ||||
| - [ ] Merge job rerun returns exit code `0`. | ||||
| - [ ] `concelier.merge.conflicts` baseline returns to zero after corrective action. | ||||
| - [ ] Latest `merge_event` entry shows expected hash delta. | ||||
| - [ ] Affected advisory document shows updated `provenance[].decisionReason`. | ||||
| - [ ] Ops change log updated with incident summary, config overrides, and rollback plan. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 8. Reference Material | ||||
|  | ||||
| - Canonical conflict rules: `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md`. | ||||
| - Merge engine internals: `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/AdvisoryPrecedenceMerger.cs`. | ||||
| - Metrics definitions: `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/AdvisoryMergeService.cs` (identity conflicts) and `AdvisoryPrecedenceMerger`. | ||||
| - Storage audit trail: `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/MergeEventWriter.cs`, `src/Concelier/__Libraries/StellaOps.Concelier.Storage.Mongo/MergeEvents`. | ||||
|  | ||||
| Keep this runbook synchronized with future sprint notes and update alert thresholds as baseline volumes change. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 9. Synthetic Regression Fixtures | ||||
|  | ||||
| - **Locations** – Canonical conflict snapshots now live at `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ghsa.Tests/Fixtures/conflict-ghsa.canonical.json`, `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Nvd.Tests/Nvd/Fixtures/conflict-nvd.canonical.json`, and `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Osv.Tests/Fixtures/conflict-osv.canonical.json`. | ||||
| - **Validation commands** – To regenerate and verify the fixtures offline, run: | ||||
|  | ||||
| ```bash | ||||
| dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ghsa.Tests/StellaOps.Concelier.Connector.Ghsa.Tests.csproj --filter GhsaConflictFixtureTests | ||||
| dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Nvd.Tests/StellaOps.Concelier.Connector.Nvd.Tests.csproj --filter NvdConflictFixtureTests | ||||
| dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj --filter OsvConflictFixtureTests | ||||
| dotnet test src/Concelier/__Tests/StellaOps.Concelier.Merge.Tests/StellaOps.Concelier.Merge.Tests.csproj --filter MergeAsync_AppliesCanonicalRulesAndPersistsDecisions | ||||
| ``` | ||||
|  | ||||
| - **Expected signals** – The triple produces one freshness-driven summary override (`primary_source=osv`, `suppressed_source=ghsa`) and one range override for the npm SemVer package while leaving `concelier.merge.conflicts` at zero. Use these values as the baseline when tuning dashboards or load-testing alert pipelines. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 10. Change Log | ||||
|  | ||||
| | Date (UTC) | Change | Notes | | ||||
| |------------|--------|-------| | ||||
| | 2025-10-16 | Ops review signed off after connector expansion (CCCS, CERT-Bund, KISA, ICS CISA, MSRC) landed. Alert thresholds from §3 reaffirmed; dashboards updated to watch attachment signals emitted by ICS CISA connector. | Ops sign-off recorded by Concelier Ops Guild; no additional overrides required. | | ||||
							
								
								
									
										77
									
								
								docs/modules/concelier/operations/connectors/apple.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,77 @@ | ||||
| # Concelier Apple Security Update Connector Operations | ||||
|  | ||||
| This runbook covers staging and production rollout for the Apple security updates connector (`source:vndr-apple:*`), including observability checks and fixture maintenance. | ||||
|  | ||||
| ## 1. Prerequisites | ||||
|  | ||||
| - Network egress (or mirrored cache) for `https://gdmf.apple.com/v2/pmv` and the Apple Support domain (`https://support.apple.com/`). | ||||
| - Optional: corporate proxy exclusions for the Apple hosts if outbound traffic is normally filtered. | ||||
| - Updated configuration (environment variables or `concelier.yaml`) with an `apple` section. Example baseline: | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     apple: | ||||
|       softwareLookupUri: "https://gdmf.apple.com/v2/pmv" | ||||
|       advisoryBaseUri: "https://support.apple.com/" | ||||
|       localeSegment: "en-us" | ||||
|       maxAdvisoriesPerFetch: 25 | ||||
|       initialBackfill: "120.00:00:00" | ||||
|       modifiedTolerance: "02:00:00" | ||||
|       failureBackoff: "00:05:00" | ||||
| ``` | ||||
|  | ||||
| > ℹ️  `softwareLookupUri` and `advisoryBaseUri` must stay absolute and aligned with the HTTP allow-list; Concelier automatically adds both hosts to the connector HttpClient. | ||||
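
The duration values in the baseline (`initialBackfill`, `modifiedTolerance`, `failureBackoff`) appear to follow the .NET `TimeSpan` constant format `[d.]hh:mm:ss[.fffffff]`. Assuming that is how they are bound, a small Python sketch like the one below can sanity-check a value before rollout. `parse_timespan` is a hypothetical helper, and it does not enforce range limits such as hours below 24.

```python
import re
from datetime import timedelta

# [d.]hh:mm:ss[.fffffff] -- assumed to match the .NET TimeSpan constant format.
_TIMESPAN = re.compile(
    r"^(?:(?P<d>\d+)\.)?(?P<h>\d{1,2}):(?P<m>\d{2}):(?P<s>\d{2})(?:\.(?P<f>\d{1,7}))?$"
)


def parse_timespan(text):
    """Convert a TimeSpan-style string into a datetime.timedelta."""
    match = _TIMESPAN.match(text)
    if match is None:
        raise ValueError(f"not a [d.]hh:mm:ss[.fffffff] value: {text!r}")
    fraction = match.group("f") or ""
    # Fractions have up to 7 digits (100 ns ticks); timedelta keeps microseconds.
    microseconds = int(fraction.ljust(7, "0")[:6]) if fraction else 0
    return timedelta(
        days=int(match.group("d") or 0),
        hours=int(match.group("h")),
        minutes=int(match.group("m")),
        seconds=int(match.group("s")),
        microseconds=microseconds,
    )
```

Under this reading, `"120.00:00:00"` is a 120-day backfill window and `"02:00:00"` is a two-hour tolerance.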
|  | ||||
| ## 2. Staging Smoke Test | ||||
|  | ||||
| 1. Deploy the configuration and restart the Concelier workers to ensure the Apple connector options are bound. | ||||
| 2. Trigger a full connector cycle: | ||||
|    - CLI: `stella db jobs run source:vndr-apple:fetch --and-then source:vndr-apple:parse --and-then source:vndr-apple:map` | ||||
|    - REST: `POST /jobs/run { "kind": "source:vndr-apple:fetch", "chain": ["source:vndr-apple:parse", "source:vndr-apple:map"] }` | ||||
| 3. Validate metrics exported under meter `StellaOps.Concelier.Connector.Vndr.Apple`: | ||||
|    - `apple.fetch.items` (documents fetched) | ||||
|    - `apple.fetch.failures` | ||||
|    - `apple.fetch.unchanged` | ||||
|    - `apple.parse.failures` | ||||
|    - `apple.map.affected.count` (histogram of affected package counts) | ||||
| 4. Cross-check the shared HTTP counters: | ||||
|    - `concelier.source.http.requests_total{concelier_source="vndr-apple"}` should increase for both index and detail phases. | ||||
|    - `concelier.source.http.failures_total{concelier_source="vndr-apple"}` should remain flat (0) during a healthy run. | ||||
| 5. Inspect the info logs: | ||||
|    - `Apple software index fetch … processed=X newDocuments=Y` | ||||
|    - `Apple advisory parse complete … aliases=… affected=…` | ||||
|    - `Mapped Apple advisory … pendingMappings=0` | ||||
| 6. Confirm MongoDB state: | ||||
|    - `raw_documents` store contains the HT article HTML with metadata (`apple.articleId`, `apple.postingDate`). | ||||
|    - `dtos` store has `schemaVersion="apple.security.update.v1"`. | ||||
|    - `advisories` collection includes keys `HTxxxxxx` with normalized SemVer rules. | ||||
|    - `source_states` entry for `apple` shows a recent `cursor.lastPosted`. | ||||
|  | ||||
| ## 3. Production Monitoring | ||||
|  | ||||
| - **Dashboards** – Add the following expressions to your Concelier Grafana board (OTLP/Prometheus naming assumed): | ||||
|   - `rate(apple_fetch_items_total[15m])` vs `rate(concelier_source_http_requests_total{concelier_source="vndr-apple"}[15m])` | ||||
|   - `rate(apple_fetch_failures_total[5m])` for error spikes (`severity=warning` at `>0`) | ||||
|   - `histogram_quantile(0.95, rate(apple_map_affected_count_bucket[1h]))` to watch affected-package fan-out | ||||
|   - `increase(apple_parse_failures_total[6h])` to catch parser drift (alerts at `>0`) | ||||
| - **Alerts** – Page if `rate(apple_fetch_items_total[2h]) == 0` during business hours while other connectors are active. This often indicates lookup feed failures or misconfigured allow-lists. | ||||
| - **Logs** – Surface warnings `Apple document {DocumentId} missing GridFS payload` or `Apple parse failed`—repeated hits imply storage issues or HTML regressions. | ||||
| - **Telemetry pipeline** – `StellaOps.Concelier.WebService` now exports `StellaOps.Concelier.Connector.Vndr.Apple` alongside existing Concelier meters; ensure your OTEL collector or Prometheus scraper includes it. | ||||
|  | ||||
| ## 4. Fixture Maintenance | ||||
|  | ||||
| Regression fixtures live under `src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests/Apple/Fixtures`. Refresh them whenever Apple reshapes the HT layout or when new platforms appear. | ||||
|  | ||||
| 1. Run the helper script matching your platform: | ||||
|    - Bash: `./scripts/update-apple-fixtures.sh` | ||||
|    - PowerShell: `./scripts/update-apple-fixtures.ps1` | ||||
| 2. Each script exports `UPDATE_APPLE_FIXTURES=1`, updates the `WSLENV` passthrough, and touches `.update-apple-fixtures` so WSL+VS Code test runs observe the flag. The subsequent test execution fetches the live HT articles listed in `AppleFixtureManager`, sanitises the HTML, and rewrites the `.expected.json` DTO snapshots. | ||||
| 3. Review the diff for localisation or nav noise. Once satisfied, re-run the tests without the env var (`dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests.csproj`) to verify determinism. | ||||
| 4. Commit fixture updates together with any parser/mapping changes that motivated them. | ||||
|  | ||||
| ## 5. Known Issues & Follow-up Tasks | ||||
|  | ||||
| - Apple occasionally throttles anonymous requests after bursts. The connector backs off automatically, but persistent `apple.fetch.failures` spikes might require mirroring the HT content or scheduling wider fetch windows. | ||||
| - Rapid Security Responses may appear before the general patch notes surface in the lookup JSON. When that happens, the fetch run will log `detailFailures>0`. Collect sample HTML and refresh fixtures to confirm parser coverage. | ||||
| - Multi-locale content is still under regression sweep (`src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Vndr.Apple/TASKS.md`). Capture non-`en-us` snapshots once the fixture tooling stabilises. | ||||
							
								
								
									
										72
									
								
								docs/modules/concelier/operations/connectors/cccs.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,72 @@ | ||||
| # Concelier CCCS Connector Operations | ||||
|  | ||||
| This runbook covers day‑to‑day operation of the Canadian Centre for Cyber Security (`source:cccs:*`) connector, including configuration, telemetry, and historical backfill guidance for English/French advisories. | ||||
|  | ||||
| ## 1. Configuration Checklist | ||||
|  | ||||
| - Network egress (or mirrored cache) for `https://www.cyber.gc.ca/` and the JSON API endpoints under `/api/cccs/`. | ||||
| - Set the Concelier options before restarting workers. Example `concelier.yaml` snippet: | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     cccs: | ||||
|       feeds: | ||||
|         - language: "en" | ||||
|           uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=en&content_type=cccs_threat" | ||||
|         - language: "fr" | ||||
|           uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=fr&content_type=cccs_threat" | ||||
|       maxEntriesPerFetch: 80        # increase temporarily for backfill runs | ||||
|       maxKnownEntries: 512 | ||||
|       requestTimeout: "00:00:30" | ||||
|       requestDelay: "00:00:00.250" | ||||
|       failureBackoff: "00:05:00" | ||||
| ``` | ||||
|  | ||||
| > ℹ️  The `/api/cccs/threats/v1/get` endpoint returns thousands of records per language (≈5 100 rows each as of 2025‑10‑14). The connector honours `maxEntriesPerFetch`, so leave it low for steady‑state and raise it for planned backfills. | ||||
|  | ||||
| ## 2. Telemetry & Logging | ||||
|  | ||||
| - **Metrics (Meter `StellaOps.Concelier.Connector.Cccs`):** | ||||
|   - `cccs.fetch.attempts`, `cccs.fetch.success`, `cccs.fetch.failures` | ||||
|   - `cccs.fetch.documents`, `cccs.fetch.unchanged` | ||||
|   - `cccs.parse.success`, `cccs.parse.failures`, `cccs.parse.quarantine` | ||||
|   - `cccs.map.success`, `cccs.map.failures` | ||||
| - **Shared HTTP metrics** via `SourceDiagnostics`: | ||||
|   - `concelier.source.http.requests{concelier.source="cccs"}` | ||||
|   - `concelier.source.http.failures{concelier.source="cccs"}` | ||||
|   - `concelier.source.http.duration{concelier.source="cccs"}` | ||||
| - **Structured logs** | ||||
|   - `CCCS fetch completed feeds=… items=… newDocuments=… pendingDocuments=…` | ||||
|   - `CCCS parse completed parsed=… failures=…` | ||||
|   - `CCCS map completed mapped=… failures=…` | ||||
|   - Warnings fire when GridFS payloads/DTOs go missing or parser sanitisation fails. | ||||
|  | ||||
| Suggested Grafana alerts: | ||||
| - `increase(cccs_fetch_failures_total[15m]) > 0` | ||||
| - `rate(cccs_map_success_total[1h]) == 0` while other connectors are active | ||||
| - `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cccs"}[1h])) > 5s` | ||||
|  | ||||
| ## 3. Historical Backfill Plan | ||||
|  | ||||
| 1. **Snapshot the source** – the API accepts `page=<n>` and `lang=<en|fr>` query parameters. `page=0` returns the full dataset (earliest observed `date_created`: 2018‑06‑08 for both EN and FR). Mirror those responses into Offline Kit storage when operating air‑gapped. | ||||
| 2. **Stage ingestion**: | ||||
|    - Temporarily raise `maxEntriesPerFetch` (e.g. 500) and restart Concelier workers. | ||||
|    - Run chained jobs until `pendingDocuments` drains:   | ||||
|      `stella db jobs run source:cccs:fetch --and-then source:cccs:parse --and-then source:cccs:map` | ||||
|    - Monitor `cccs.fetch.unchanged` growth; once it approaches dataset size the backfill is complete. | ||||
| 3. **Optional pagination sweep** – for incremental mirrors, iterate `page=<n>` (0…N) while `response.Count == 50`, persisting JSON to disk. Store alongside metadata (`language`, `page`, SHA256) so repeated runs detect drift. | ||||
| 4. **Language split** – keep EN/FR payloads separate to preserve canonical language fields. The connector emits `Language` directly from the feed entry, so mixed ingestion simply produces parallel advisories keyed by the same serial number. | ||||
| 5. **Throttle planning** – schedule backfills during maintenance windows. The API tolerates burst downloads, but keep the 250 ms request delay, or raise it if no mirror is available. | ||||
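
The optional pagination sweep in step 3 can be sketched as follows. The HTTP call is deliberately left to a caller-supplied `fetch_page(language, page)` function that returns the raw response bytes and the number of entries in it, because the exact response shape is not documented here. The file naming scheme, the manifest layout, and the `sweep` helper itself are assumptions for illustration.

```python
import hashlib
import json
import time
from pathlib import Path

PAGE_SIZE = 50  # the runbook iterates while a page returns exactly 50 entries


def sweep(fetch_page, language, out_dir, delay_seconds=0.25, max_pages=1000):
    """Persist pages for one language and record language, page, and SHA256."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = []
    for page in range(max_pages):
        raw, count = fetch_page(language, page)  # caller performs the HTTP GET
        name = f"cccs-{language}-page{page}.json"
        (out / name).write_bytes(raw)
        manifest.append({
            "language": language,
            "page": page,
            "file": name,
            "sha256": hashlib.sha256(raw).hexdigest(),
        })
        if count != PAGE_SIZE:
            break  # a short page marks the end of the dataset
        time.sleep(delay_seconds)  # mirrors the 250 ms requestDelay
    manifest_path = out / f"manifest-{language}.json"
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

Running the sweep once per language keeps the EN and FR payloads separate, and comparing the recorded hashes across runs is one way to detect drift.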
|  | ||||
| ## 4. Selector & Sanitiser Notes | ||||
|  | ||||
| - `CccsHtmlParser` now parses the **unsanitised DOM** (via AngleSharp) and only sanitises when persisting `ContentHtml`. | ||||
| - Product extraction walks headings (`Affected Products`, `Produits touchés`, `Mesures recommandées`) and consumes nested lists within `div/section/article` containers. | ||||
| - `HtmlContentSanitizer` allows `<h1>…<h6>` and `<section>` so stored HTML keeps headings for UI rendering and downstream summarisation. | ||||
|  | ||||
| ## 5. Fixture Maintenance | ||||
|  | ||||
| - Regression fixtures live in `src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/Fixtures`. | ||||
| - Refresh via `UPDATE_CCCS_FIXTURES=1 dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/StellaOps.Concelier.Connector.Cccs.Tests.csproj`. | ||||
| - Fixtures capture both EN/FR advisories with nested lists to guard against sanitiser regressions; review diffs for heading/list changes before committing. | ||||
							
								
								
									
										146
									
								
								docs/modules/concelier/operations/connectors/certbund.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,146 @@ | ||||
| # Concelier CERT-Bund Connector Operations | ||||
|  | ||||
| _Last updated: 2025-10-17_ | ||||
|  | ||||
| Germany’s Federal Office for Information Security (BSI) operates the Warn- und Informationsdienst (WID) portal. The Concelier CERT-Bund connector (`source:cert-bund:*`) ingests the public RSS feed, hydrates the portal’s JSON detail endpoint, and maps the result into canonical advisories while preserving the original German content. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1. Configuration Checklist | ||||
|  | ||||
| - Allow outbound access (or stage mirrors) for: | ||||
|   - `https://wid.cert-bund.de/content/public/securityAdvisory/rss` | ||||
|   - `https://wid.cert-bund.de/portal/` (session/bootstrap) | ||||
|   - `https://wid.cert-bund.de/portal/api/securityadvisory` (detail/search/export JSON) | ||||
| - Ensure the HTTP client reuses a cookie container (the connector’s dependency injection wiring already sets this up). | ||||
|  | ||||
| Example `concelier.yaml` fragment: | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     cert-bund: | ||||
|       feedUri: "https://wid.cert-bund.de/content/public/securityAdvisory/rss" | ||||
|       portalBootstrapUri: "https://wid.cert-bund.de/portal/" | ||||
|       detailApiUri: "https://wid.cert-bund.de/portal/api/securityadvisory" | ||||
|       maxAdvisoriesPerFetch: 50 | ||||
|       maxKnownAdvisories: 512 | ||||
|       requestTimeout: "00:00:30" | ||||
|       requestDelay: "00:00:00.250" | ||||
|       failureBackoff: "00:05:00" | ||||
| ``` | ||||
|  | ||||
| > Leave `maxAdvisoriesPerFetch` at 50 during normal operation. Raise it only for controlled backfills, then restore the default to avoid overwhelming the portal. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2. Telemetry & Logging | ||||
|  | ||||
| - **Meter**: `StellaOps.Concelier.Connector.CertBund` | ||||
| - **Counters / histograms**: | ||||
|   - `certbund.feed.fetch.attempts|success|failures` | ||||
|   - `certbund.feed.items.count` | ||||
|   - `certbund.feed.enqueued.count` | ||||
|   - `certbund.feed.coverage.days` | ||||
|   - `certbund.detail.fetch.attempts|success|not_modified|failures{reason}` | ||||
|   - `certbund.parse.success|failures{reason}` | ||||
|   - `certbund.parse.products.count`, `certbund.parse.cve.count` | ||||
|   - `certbund.map.success|failures{reason}` | ||||
|   - `certbund.map.affected.count`, `certbund.map.aliases.count` | ||||
| - Shared HTTP metrics remain available through `concelier.source.http.*`. | ||||
|  | ||||
| **Structured logs** (all emitted at information level when work occurs): | ||||
|  | ||||
| - `CERT-Bund fetch cycle: … truncated {Truncated}, coverageDays={CoverageDays}` | ||||
| - `CERT-Bund parse cycle: parsed {Parsed}, failures {Failures}, …` | ||||
| - `CERT-Bund map cycle: mapped {Mapped}, failures {Failures}, …` | ||||
|  | ||||
| Alerting ideas: | ||||
|  | ||||
| 1. `increase(certbund_detail_fetch_failures_total[10m]) > 0` | ||||
| 2. `rate(certbund_map_success_total[30m]) == 0` | ||||
| 3. `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cert-bund"}[15m])) > 5s` | ||||
|  | ||||
| The WebService now registers the meter so metrics surface automatically once OpenTelemetry metrics are enabled. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3. Historical Backfill & Export Strategy | ||||
|  | ||||
| ### 3.1 Retention snapshot | ||||
|  | ||||
| - RSS window: ~250 advisories (≈90 days at current cadence). | ||||
| - Older advisories are accessible through the JSON search/export APIs once the anti-CSRF token is supplied. | ||||
|  | ||||
| ### 3.2 JSON search pagination | ||||
|  | ||||
| ```bash | ||||
| # 1. Bootstrap cookies (client_config + XSRF-TOKEN) | ||||
| curl -s -c cookies.txt "https://wid.cert-bund.de/portal/" > /dev/null | ||||
| curl -s -b cookies.txt -c cookies.txt \ | ||||
|      -H "X-Requested-With: XMLHttpRequest" \ | ||||
|      "https://wid.cert-bund.de/portal/api/security/csrf" > /dev/null | ||||
|  | ||||
| XSRF=$(awk '/XSRF-TOKEN/ {print $7}' cookies.txt) | ||||
|  | ||||
| # 2. Page search results | ||||
| curl -s -b cookies.txt \ | ||||
|      -H "Content-Type: application/json" \ | ||||
|      -H "Accept: application/json" \ | ||||
|      -H "X-XSRF-TOKEN: ${XSRF}" \ | ||||
|      -X POST \ | ||||
|      --data '{"page":4,"size":100,"sort":["published,desc"]}' \ | ||||
|      "https://wid.cert-bund.de/portal/api/securityadvisory/search" \ | ||||
|      > certbund-page4.json | ||||
| ``` | ||||
|  | ||||
| Iterate `page` until the response `content` array is empty. Pages 0–9 currently cover 2014→present. Persist JSON responses (plus SHA256) for Offline Kit parity. | ||||
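
For scripted runs, the two manual steps above (reading the `XSRF-TOKEN` cookie and paging until `content` is empty) can be sketched in Python. Both helpers are hypothetical: `read_cookie` parses the tab-separated cookie file that `curl -c` writes, and `collect_pages` expects a caller-supplied `post_search(body)` function that performs the POST and returns the decoded JSON response.

```python
def read_cookie(cookie_file_text, name="XSRF-TOKEN"):
    """Return the value of a cookie from a curl/Netscape cookie file, or None."""
    for line in cookie_file_text.splitlines():
        if line.startswith("#HttpOnly_"):
            line = line[len("#HttpOnly_"):]  # curl marks HttpOnly cookies this way
        elif line.startswith("#") or not line.strip():
            continue  # comment or blank line
        fields = line.split("\t")
        # domain, subdomain flag, path, secure, expiry, name, value
        if len(fields) == 7 and fields[5] == name:
            return fields[6]
    return None


def collect_pages(post_search, size=100, max_pages=1000):
    """Request pages 0, 1, 2, ... until the response `content` array is empty."""
    pages = []
    for page in range(max_pages):
        body = {"page": page, "size": size, "sort": ["published,desc"]}
        response = post_search(body)
        if not response.get("content"):
            break
        pages.append(response)
    return pages
```

`read_cookie` does the same job as the `awk` one-liner but matches the cookie name exactly rather than by substring.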
|  | ||||
| > **Shortcut** – run `python src/Tools/certbund_offline_snapshot.py --output seed-data/cert-bund` | ||||
| > to bootstrap the session, capture the paginated search responses, and regenerate | ||||
| > the manifest/checksum files automatically. Supply `--cookie-file` and `--xsrf-token` | ||||
| > if the portal requires a browser-derived session (see options via `--help`). | ||||
|  | ||||
| ### 3.3 Export bundles | ||||
|  | ||||
| ```bash | ||||
| python src/Tools/certbund_offline_snapshot.py \ | ||||
|   --output seed-data/cert-bund \ | ||||
|   --start-year 2014 \ | ||||
|   --end-year "$(date -u +%Y)" | ||||
| ``` | ||||
|  | ||||
| The helper stores yearly exports under `seed-data/cert-bund/export/`, | ||||
| captures paginated search snapshots in `seed-data/cert-bund/search/`, | ||||
| and generates the manifest + SHA files in `seed-data/cert-bund/manifest/`. | ||||
| Split ranges according to your compliance window (default: one file per | ||||
| calendar year). Concelier can ingest these JSON payloads directly when | ||||
| operating offline. | ||||
|  | ||||
| > When automatic bootstrap fails (e.g. portal introduces CAPTCHA), run the | ||||
| > manual `curl` flow above, then rerun the helper with `--skip-fetch` to | ||||
| > rebuild the manifest from the existing files. | ||||
|  | ||||
| ### 3.4 Connector-driven catch-up | ||||
|  | ||||
| 1. Temporarily raise `maxAdvisoriesPerFetch` (e.g. 150) and reduce `requestDelay`. | ||||
| 2. Run `stella db jobs run source:cert-bund:fetch --and-then source:cert-bund:parse --and-then source:cert-bund:map` until the fetch log reports `enqueued=0`. | ||||
| 3. Restore defaults and capture the cursor snapshot for audit. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4. Locale & Translation Guidance | ||||
|  | ||||
| - Advisories remain in German (`language: "de"`). Preserve wording for provenance and legal accuracy. | ||||
| - UI localisation: enable the translation bundles documented in `docs/15_UI_GUIDE.md` if English UI copy is required. Operators can overlay machine or human translations, but the canonical database stores the source text. | ||||
| - The Docs guild is compiling a CERT-Bund terminology glossary under `docs/locale/certbund-glossary.md` so downstream teams can reference consistent English equivalents without altering the stored advisories. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5. Verification Checklist | ||||
|  | ||||
| 1. Observe `certbund.feed.fetch.success` and `certbund.detail.fetch.success` increments after runs; `certbund.feed.coverage.days` should hover near the observed RSS window. | ||||
| 2. Ensure summary logs report `truncated=false` in steady state—`true` indicates the fetch cap was hit. | ||||
| 3. During backfills, watch `certbund.feed.enqueued.count` trend to zero. | ||||
| 4. Spot-check stored advisories in Mongo to confirm `language="de"` and reference URLs match the portal detail endpoint. | ||||
| 5. For Offline Kit exports, validate SHA256 hashes before distribution. | ||||
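
The hash validation in step 5 can be scripted. The following is a minimal sketch, assuming the expected digests are available as a mapping from relative file path to hex SHA-256 value. The real manifest layout written by `certbund_offline_snapshot.py` is not shown in this runbook, so `verify_bundle` and the mapping shape are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large exports never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(root: Path, expected: dict) -> list:
    """Return the relative paths whose file is missing or whose digest differs.

    `expected` maps relative paths to hex digests (an assumed shape, not the
    helper's documented manifest format).
    """
    mismatches = []
    for relative, want in sorted(expected.items()):
        candidate = root / relative
        if not candidate.is_file() or sha256_of(candidate) != want.lower():
            mismatches.append(relative)
    return mismatches
```

An empty return value means every listed file matched; any returned path should block distribution until the bundle is regenerated.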
							
								
								
									
										94
									
								
								docs/modules/concelier/operations/connectors/cisco.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,94 @@ | ||||
| # Concelier Cisco PSIRT Connector – OAuth Provisioning SOP | ||||
|  | ||||
| _Last updated: 2025-10-14_ | ||||
|  | ||||
| ## 1. Scope | ||||
|  | ||||
| This runbook describes how Ops provisions, rotates, and distributes Cisco PSIRT openVuln OAuth client credentials for the Concelier Cisco connector. It covers online and air-gapped (Offline Kit) environments, quota-aware execution, and escalation paths. | ||||
|  | ||||
| ## 2. Prerequisites | ||||
|  | ||||
| - Active Cisco.com (CCO) account with access to the Cisco API Console. | ||||
| - Cisco PSIRT openVuln API entitlement (visible under “My Apps & Keys” once granted). | ||||
| - Concelier configuration location (typically `/etc/stella/concelier.yaml` in production) or Offline Kit secret bundle staging directory. | ||||
|  | ||||
| ## 3. Provisioning workflow | ||||
|  | ||||
| 1. **Register the application** | ||||
|    - Sign in at <https://apiconsole.cisco.com>. | ||||
|    - Select **Register a New App** → Application Type: `Service`, Grant Type: `Client Credentials`, API: `Cisco PSIRT openVuln API`. | ||||
|    - Record the generated `clientId` and `clientSecret` in the Ops vault. | ||||
| 2. **Verify token issuance** | ||||
|    - Request an access token with: | ||||
|      ```bash | ||||
|      curl -s https://id.cisco.com/oauth2/default/v1/token \ | ||||
|        -H "Content-Type: application/x-www-form-urlencoded" \ | ||||
|        -d "grant_type=client_credentials" \ | ||||
|        -d "client_id=${CLIENT_ID}" \ | ||||
|        -d "client_secret=${CLIENT_SECRET}" | ||||
|      ``` | ||||
|    - Confirm HTTP 200 and an `expires_in` value of 3600 seconds (tokens live for one hour). | ||||
|    - Preserve the response only long enough to validate syntax; do **not** persist tokens. | ||||
| 3. **Authorize Concelier runtime** | ||||
|    - Update `concelier:sources:cisco:auth` (or the module-specific secret template) with the stored credentials. | ||||
|    - For Offline Kit delivery, export encrypted secrets into `offline-kit/secrets/cisco-openvuln.json` using the platform’s sealed secret format. | ||||
| 4. **Connectivity validation** | ||||
|    - From the Concelier control plane, run `stella db jobs run source:vndr-cisco:fetch --dry-run`. | ||||
|    - Ensure the Source HTTP diagnostics record `Bearer` authorization headers and no 401/403 responses. | ||||
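
The token check in step 2 can also be scripted so that the token itself is never printed or stored. This is a minimal sketch using only the Python standard library: the endpoint URL and form fields come from the `curl` example above, while the function names `build_token_request` and `check_token_response` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://id.cisco.com/oauth2/default/v1/token"  # endpoint from the curl example


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the client-credentials POST without sending it."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def check_token_response(status: int, payload: bytes) -> int:
    """Validate the response shape and return `expires_in`; the token is never returned."""
    if status != 200:
        raise ValueError(f"token endpoint returned HTTP {status}")
    document = json.loads(payload)
    if not document.get("access_token"):
        raise ValueError("response carries no access_token")
    return int(document["expires_in"])
```

To run it against the live endpoint, pass the request to `urllib.request.urlopen` and hand `response.status` and `response.read()` to `check_token_response`; the expected result is 3600.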
|  | ||||
| ## 4. Rotation SOP | ||||
|  | ||||
| | Step | Owner | Notes | | ||||
| | --- | --- | --- | | ||||
| | 1. Schedule rotation | Ops (monthly board) | Rotate every 90 days or immediately after suspected credential exposure. | | ||||
| | 2. Create replacement app | Ops | Repeat §3 step 1 with a “-next” suffix; verify token issuance. | | ||||
| | 3. Stage dual credentials | Ops + Concelier On-Call | Publish new credentials to secret store alongside current pair. | | ||||
| | 4. Cut over | Concelier On-Call | Restart connector workers during a low-traffic window (<10 min) to pick up the new secret. | | ||||
| | 5. Deactivate legacy app | Ops | Delete prior app in Cisco API Console once telemetry confirms successful fetch/parse cycles for 2 consecutive hours. | | ||||
|  | ||||
| **Automation hooks** | ||||
| - Rotation reminders are tracked on the OpsRunbookOps board (`OPS-RUN-KEYS` swim lane); add checklist items for Concelier Cisco when opening a rotation task. | ||||
| - Use the secret management pipeline (`ops/secrets/rotate.sh --connector cisco`) to template vault updates; the script renders a redacted diff for audit. | ||||
|  | ||||
| ## 5. Offline Kit packaging | ||||
|  | ||||
| 1. Generate the credential bundle using the Offline Kit CLI:   | ||||
|    `offline-kit secrets add cisco-openvuln --client-id … --client-secret …` | ||||
| 2. Store the encrypted payload under `offline-kit/secrets/cisco-openvuln.enc`. | ||||
| 3. Distribute via the Offline Kit channel; update `offline-kit/MANIFEST.md` with the credential fingerprint (SHA256 of plaintext concatenated with metadata). | ||||
| 4. Document validation steps for the receiving site (token request from an air-gapped relay or cached token mirror). | ||||
|  | ||||
| ## 6. Quota and throttling guidance | ||||
|  | ||||
| - Cisco enforces combined limits of 5 requests/second, 30 requests/minute, and 5 000 requests/day per application. | ||||
| - Concelier fetch jobs must respect `Retry-After` headers on HTTP 429 responses; Ops should monitor for sustained quota saturation and consider paging window adjustments. | ||||
| - Telemetry to watch: `concelier.source.http.requests{concelier.source="vndr-cisco"}`, `concelier.source.http.failures{...}`, and connector-specific metrics once implemented. | ||||
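
To illustrate how the three combined limits interact, here is a minimal sliding-window sketch. It is not the connector's implementation: the class name `MultiWindowLimiter` and its methods are hypothetical, and only the limit values come from the guidance above.

```python
import time
from collections import deque

# (window in seconds, max requests) taken from the limits quoted above.
CISCO_LIMITS = ((1.0, 5), (60.0, 30), (86_400.0, 5_000))


class MultiWindowLimiter:
    """Sliding-window limiter that enforces several request caps at once."""

    def __init__(self, limits=CISCO_LIMITS, clock=time.monotonic):
        self._limits = tuple(limits)
        self._clock = clock
        self._longest = max(window for window, _ in self._limits)
        self._sent = deque()  # timestamps of recorded requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait until the next request fits every window (0.0 if it fits now)."""
        now = self._clock()
        # Drop timestamps that no longer count toward even the longest window.
        while self._sent and now - self._sent[0] >= self._longest:
            self._sent.popleft()
        delay = 0.0
        for window, cap in self._limits:
            recent = [stamp for stamp in self._sent if now - stamp < window]
            if len(recent) >= cap:
                # The request fits once the cap-th most recent stamp leaves the window.
                delay = max(delay, recent[-cap] + window - now)
        return delay

    def record(self) -> None:
        """Register a request that was just sent."""
        self._sent.append(self._clock())
```

A caller would sleep for `wait_time()` before each request and call `record()` after sending it. In practice the per-minute cap dominates: a steady stream is held to 30 requests per minute even though short bursts of 5 per second are allowed. The sketch does not parse `Retry-After`; that header still takes priority on an HTTP 429.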
|  | ||||
| ## 7. Telemetry & Monitoring | ||||
|  | ||||
| - **Metrics (Meter `StellaOps.Concelier.Connector.Vndr.Cisco`)** | ||||
|   - `cisco.fetch.documents`, `cisco.fetch.failures`, `cisco.fetch.unchanged` | ||||
|   - `cisco.parse.success`, `cisco.parse.failures` | ||||
|   - `cisco.map.success`, `cisco.map.failures`, `cisco.map.affected.packages` | ||||
| - **Shared HTTP metrics** via `SourceDiagnostics`: | ||||
|   - `concelier.source.http.requests{concelier.source="vndr-cisco"}` | ||||
|   - `concelier.source.http.failures{concelier.source="vndr-cisco"}` | ||||
|   - `concelier.source.http.duration{concelier.source="vndr-cisco"}` | ||||
| - **Structured logs** | ||||
|   - `Cisco fetch completed date=… pages=… added=…` (info) | ||||
|   - `Cisco parse completed parsed=… failures=…` (info) | ||||
|   - `Cisco map completed mapped=… failures=…` (info) | ||||
|   - Warnings surface when DTO serialization fails or GridFS payload is missing. | ||||
| - Suggested alerts: non-zero `cisco.fetch.failures` in 15m, or `cisco.map.success` flatlines while fetch continues. | ||||
|  | ||||
| ## 8. Incident response | ||||
|  | ||||
| - **Token compromise** – revoke the application in the Cisco API Console, purge cached secrets, rotate immediately per §4. | ||||
| - **Persistent 401/403** – confirm credentials in vault, then validate token issuance; if unresolved, open a Cisco DevNet support ticket referencing the application ID. | ||||
| - **429 spikes** – inspect job scheduler cadence and adjust connector options (`maxRequestsPerWindow`) before requesting higher quotas from Cisco. | ||||
|  | ||||
| ## 9. References | ||||
|  | ||||
| - Cisco PSIRT openVuln API Authentication Guide. | ||||
| - Accessing the openVuln API using curl (token lifetime). | ||||
| - openVuln API rate limit documentation. | ||||
| @@ -0,0 +1,151 @@ | ||||
| { | ||||
|   "title": "Concelier CVE & KEV Observability", | ||||
|   "uid": "concelier-cve-kev", | ||||
|   "schemaVersion": 38, | ||||
|   "version": 1, | ||||
|   "editable": true, | ||||
|   "timezone": "", | ||||
|   "time": { | ||||
|     "from": "now-24h", | ||||
|     "to": "now" | ||||
|   }, | ||||
|   "refresh": "5m", | ||||
|   "templating": { | ||||
|     "list": [ | ||||
|       { | ||||
|         "name": "datasource", | ||||
|         "type": "datasource", | ||||
|         "query": "prometheus", | ||||
|         "refresh": 1, | ||||
|         "hide": 0 | ||||
|       } | ||||
|     ] | ||||
|   }, | ||||
|   "panels": [ | ||||
|     { | ||||
|       "type": "timeseries", | ||||
|       "title": "CVE fetch success vs failure", | ||||
|       "gridPos": { "h": 9, "w": 12, "x": 0, "y": 0 }, | ||||
|       "fieldConfig": { | ||||
|         "defaults": { | ||||
|           "unit": "ops", | ||||
|           "custom": { | ||||
|             "drawStyle": "line", | ||||
|             "lineWidth": 2, | ||||
|             "fillOpacity": 10 | ||||
|           } | ||||
|         }, | ||||
|         "overrides": [] | ||||
|       }, | ||||
|       "targets": [ | ||||
|         { | ||||
|           "refId": "A", | ||||
|           "expr": "rate(cve_fetch_success_total[5m])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "success" | ||||
|         }, | ||||
|         { | ||||
|           "refId": "B", | ||||
|           "expr": "rate(cve_fetch_failures_total[5m])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "failure" | ||||
|         } | ||||
|       ] | ||||
|     }, | ||||
|     { | ||||
|       "type": "timeseries", | ||||
|       "title": "KEV fetch cadence", | ||||
|       "gridPos": { "h": 9, "w": 12, "x": 12, "y": 0 }, | ||||
|       "fieldConfig": { | ||||
|         "defaults": { | ||||
|           "unit": "ops", | ||||
|           "custom": { | ||||
|             "drawStyle": "line", | ||||
|             "lineWidth": 2, | ||||
|             "fillOpacity": 10 | ||||
|           } | ||||
|         }, | ||||
|         "overrides": [] | ||||
|       }, | ||||
|       "targets": [ | ||||
|         { | ||||
|           "refId": "A", | ||||
|           "expr": "rate(kev_fetch_success_total[30m])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "success" | ||||
|         }, | ||||
|         { | ||||
|           "refId": "B", | ||||
|           "expr": "rate(kev_fetch_failures_total[30m])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "failure" | ||||
|         }, | ||||
|         { | ||||
|           "refId": "C", | ||||
|           "expr": "rate(kev_fetch_unchanged_total[30m])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "unchanged" | ||||
|         } | ||||
|       ] | ||||
|     }, | ||||
|     { | ||||
|       "type": "table", | ||||
|       "title": "KEV parse anomalies (24h)", | ||||
|       "gridPos": { "h": 8, "w": 12, "x": 0, "y": 9 }, | ||||
|       "fieldConfig": { | ||||
|         "defaults": { | ||||
|           "unit": "short" | ||||
|         }, | ||||
|         "overrides": [] | ||||
|       }, | ||||
|       "targets": [ | ||||
|         { | ||||
|           "refId": "A", | ||||
|           "expr": "sum by (reason) (increase(kev_parse_anomalies_total[24h]))", | ||||
|           "format": "table", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" } | ||||
|         } | ||||
|       ], | ||||
|       "transformations": [ | ||||
|         { | ||||
|           "id": "organize", | ||||
|           "options": { | ||||
|             "renameByName": { | ||||
|               "Value": "count" | ||||
|             } | ||||
|           } | ||||
|         } | ||||
|       ] | ||||
|     }, | ||||
|     { | ||||
|       "type": "timeseries", | ||||
|       "title": "Advisories emitted", | ||||
|       "gridPos": { "h": 8, "w": 12, "x": 12, "y": 9 }, | ||||
|       "fieldConfig": { | ||||
|         "defaults": { | ||||
|           "unit": "ops", | ||||
|           "custom": { | ||||
|             "drawStyle": "line", | ||||
|             "lineWidth": 2, | ||||
|             "fillOpacity": 10 | ||||
|           } | ||||
|         }, | ||||
|         "overrides": [] | ||||
|       }, | ||||
|       "targets": [ | ||||
|         { | ||||
|           "refId": "A", | ||||
|           "expr": "rate(cve_map_success_total[15m])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "CVE" | ||||
|         }, | ||||
|         { | ||||
|           "refId": "B", | ||||
|           "expr": "rate(kev_map_advisories_total[24h])", | ||||
|           "datasource": { "type": "prometheus", "uid": "${datasource}" }, | ||||
|           "legendFormat": "KEV" | ||||
|         } | ||||
|       ] | ||||
|     } | ||||
|   ] | ||||
| } | ||||
							
								
								
									
										143
									
								
								docs/modules/concelier/operations/connectors/cve-kev.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,143 @@ | ||||
| # Concelier CVE & KEV Connector Operations | ||||
|  | ||||
| This playbook equips operators with the steps required to roll out and monitor the CVE Services and CISA KEV connectors across environments. | ||||
|  | ||||
| ## 1. CVE Services Connector (`source:cve:*`) | ||||
|  | ||||
| ### 1.1 Prerequisites | ||||
|  | ||||
| - CVE Services API credentials (organisation ID, user ID, API key) with access to the JSON 5 API. | ||||
| - Network egress to `https://cveawg.mitre.org` (or a mirrored endpoint) from the Concelier workers. | ||||
| - Updated `concelier.yaml` (or the matching environment variables) with the following section: | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     cve: | ||||
|       baseEndpoint: "https://cveawg.mitre.org/api/" | ||||
|       apiOrg: "ORG123" | ||||
|       apiUser: "user@example.org" | ||||
|       apiKeyFile: "/var/run/secrets/concelier/cve-api-key" | ||||
|       seedDirectory: "./seed-data/cve" | ||||
|       pageSize: 200 | ||||
|       maxPagesPerFetch: 5 | ||||
|       initialBackfill: "30.00:00:00" | ||||
|       requestDelay: "00:00:00.250" | ||||
|       failureBackoff: "00:10:00" | ||||
| ``` | ||||
|  | ||||
| > ℹ️  Store the API key outside source control. When using `apiKeyFile`, mount the secret file into the container/host; alternatively supply `apiKey` via `CONCELIER_SOURCES__CVE__APIKEY`. | ||||
|  | ||||
| > 🪙  When credentials are not yet available, configure `seedDirectory` to point at mirrored CVE JSON (for example, the repo’s `seed-data/cve/` bundle). The connector will ingest those records and log a warning instead of failing the job; live fetching resumes automatically once `apiOrg` / `apiUser` / `apiKey` are supplied. | ||||
|  | ||||
| ### 1.2 Smoke Test (staging) | ||||
|  | ||||
| 1. Deploy the updated configuration and restart the Concelier service so the connector picks up the credentials. | ||||
| 2. Trigger one end-to-end cycle: | ||||
|    - Concelier CLI: `stella db jobs run source:cve:fetch --and-then source:cve:parse --and-then source:cve:map` | ||||
|    - REST fallback: `POST /jobs/run { "kind": "source:cve:fetch", "chain": ["source:cve:parse", "source:cve:map"] }` | ||||
| 3. Observe the following metrics (exported via OTEL meter `StellaOps.Concelier.Connector.Cve`): | ||||
|    - `cve.fetch.attempts`, `cve.fetch.success`, `cve.fetch.documents`, `cve.fetch.failures`, `cve.fetch.unchanged` | ||||
|    - `cve.parse.success`, `cve.parse.failures`, `cve.parse.quarantine` | ||||
|    - `cve.map.success` | ||||
| 4. Verify Prometheus shows matching `concelier_source_http_requests_total{concelier_source="cve"}` deltas (list vs detail phases) while `concelier_source_http_failures_total{concelier_source="cve"}` stays flat. | ||||
| 5. Confirm the info-level summary log `CVEs fetch window … pages=X detailDocuments=Y detailFailures=Z` appears once per fetch run and shows `detailFailures=0`. | ||||
| 6. Verify the MongoDB advisory store contains fresh CVE advisories (`advisoryKey` prefix `cve/`) and that the source cursor (`source_states` collection) advanced. | ||||
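
The REST fallback in step 2 can be wrapped in a small helper. This is a minimal sketch: the `/jobs/run` path and the `kind`/`chain` payload come from the step above, while the base URL, the `Content-Type` header, and the function name `build_job_request` are assumptions. Authentication is not covered here.

```python
import json
import urllib.request


def build_job_request(base_url: str, source: str) -> urllib.request.Request:
    """Build POST /jobs/run for a fetch -> parse -> map chain without sending it."""
    payload = {
        "kind": f"source:{source}:fetch",
        "chain": [f"source:{source}:parse", f"source:{source}:map"],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/jobs/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed; not stated in the runbook
        method="POST",
    )
```

Passing `"kev"` instead of `"cve"` yields the equivalent request for the KEV connector.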
|  | ||||
| ### 1.3 Production Monitoring | ||||
|  | ||||
| - **Dashboards** – Plot `rate(cve_fetch_success_total[5m])`, `rate(cve_fetch_failures_total[5m])`, and `rate(cve_fetch_documents_total[5m])` alongside `concelier_source_http_requests_total{concelier_source="cve"}` to confirm HTTP and connector counters stay aligned. Keep `concelier.range.primitives{scheme=~"semver|vendor"}` on the same board for range coverage. Example alerts: | ||||
|   - `rate(cve_fetch_failures_total[5m]) > 0` for 10 minutes (`severity=warning`) | ||||
|   - `rate(cve_map_success_total[15m]) == 0` while `rate(cve_fetch_success_total[15m]) > 0` (`severity=critical`) | ||||
|   - `increase(cve_parse_quarantine_total[1h]) > 0` to catch schema anomalies | ||||
| - **Logs** – Monitor warnings such as `Failed fetching CVE record {CveId}` and `Malformed CVE JSON`, and surface the summary info log `CVEs fetch window … detailFailures=0 detailUnchanged=0` on dashboards. A non-zero `detailFailures` usually indicates rate-limit or auth issues on detail requests. | ||||
| - **Grafana pack** – Import `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` and filter by panel legend (`CVE`, `KEV`) to reuse the canned layout. | ||||
| - **Backfill window** – Operators can tighten or widen `initialBackfill` / `maxPagesPerFetch` after validating throughput. Update config and restart Concelier to apply changes. | ||||
|  | ||||
| ### 1.4 Staging smoke log (2025-10-15) | ||||
|  | ||||
| While Ops finalises long-lived CVE Services credentials, we validated the connector end-to-end against the recorded CVE-2024-0001 payloads used in regression tests: | ||||
|  | ||||
| - Command: `dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Cve.Tests/StellaOps.Concelier.Connector.Cve.Tests.csproj -l "console;verbosity=detailed"` | ||||
| - Summary log emitted by the connector: | ||||
|   ``` | ||||
|   CVEs fetch window 2024-09-01T00:00:00Z->2024-10-01T00:00:00Z pages=1 listSuccess=1 detailDocuments=1 detailFailures=0 detailUnchanged=0 pendingDocuments=0->1 pendingMappings=0->1 hasMorePages=False nextWindowStart=2024-09-15T12:00:00Z nextWindowEnd=(none) nextPage=1 | ||||
|   ``` | ||||
| - Telemetry captured by `Meter` `StellaOps.Concelier.Connector.Cve`: | ||||
|   | Metric | Value | | ||||
|   |--------|-------| | ||||
|   | `cve.fetch.attempts` | 1 | | ||||
|   | `cve.fetch.success` | 1 | | ||||
|   | `cve.fetch.documents` | 1 | | ||||
|   | `cve.parse.success` | 1 | | ||||
|   | `cve.map.success` | 1 | | ||||
|  | ||||
| The Grafana pack `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` has been imported into staging so the panels referenced above render against these counters once the live API keys are in place. | ||||
|  | ||||
| ## 2. CISA KEV Connector (`source:kev:*`) | ||||
|  | ||||
| ### 2.1 Prerequisites | ||||
|  | ||||
| - Network egress (or mirrored content) for `https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json`. | ||||
| - No credentials are required, but the HTTP allow-list must include `www.cisa.gov`. | ||||
| - Confirm the following snippet in `concelier.yaml` (defaults shown; tune as needed): | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     kev: | ||||
|       feedUri: "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json" | ||||
|       requestTimeout: "00:01:00" | ||||
|       failureBackoff: "00:05:00" | ||||
| ``` | ||||
|  | ||||
| ### 2.2 Schema validation & anomaly handling | ||||
|  | ||||
| The connector validates each catalog against `Schemas/kev-catalog.schema.json`. Failures increment `kev_parse_failures_total{reason="schema"}` and the document is quarantined (status `Failed`). Additional failure reasons include `download`, `invalidJson`, `deserialize`, `missingPayload`, and `emptyCatalog`. Entry-level anomalies are surfaced through `kev_parse_anomalies_total` with reasons: | ||||
|  | ||||
| | Reason | Meaning | | ||||
| | --- | --- | | ||||
| | `missingCveId` | Catalog entry omitted `cveID`; the entry is skipped. | | ||||
| | `countMismatch` | Catalog `count` field disagreed with the actual entry total. | | ||||
| | `nullEntry` | Upstream emitted a `null` entry object (rare upstream defect). | | ||||
|  | ||||
| Treat repeated schema failures or growing anomaly counts as an upstream regression and coordinate with CISA or mirror maintainers. | ||||
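
For triaging a mirrored catalog by hand, the three anomaly reasons can be reproduced with a short script. This is a sketch under stated assumptions, not the connector's code: the field names `vulnerabilities`, `count`, and `cveID` follow the public KEV JSON feed, and comparing `count` against the raw list length (null entries included) is a guess at what "actual entry total" means.

```python
from collections import Counter


def tally_kev_anomalies(catalog: dict) -> Counter:
    """Count entry-level anomalies by reason for one KEV catalog document."""
    reasons = Counter()
    entries = catalog.get("vulnerabilities") or []
    for entry in entries:
        if entry is None:
            reasons["nullEntry"] += 1  # upstream emitted a null entry object
        elif not str(entry.get("cveID") or "").strip():
            reasons["missingCveId"] += 1  # entry has no usable cveID
    declared = catalog.get("count")
    if declared is not None and declared != len(entries):
        reasons["countMismatch"] += 1  # declared count disagrees with the list length
    return reasons
```

Running it over `json.load(...)` of a downloaded catalog gives a per-reason tally that can be compared with the anomaly counters after a parse run.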
|  | ||||
| ### 2.3 Smoke Test (staging) | ||||
|  | ||||
| 1. Deploy the configuration and restart Concelier. | ||||
| 2. Trigger a pipeline run: | ||||
|    - CLI: `stella db jobs run source:kev:fetch --and-then source:kev:parse --and-then source:kev:map` | ||||
|    - REST: `POST /jobs/run { "kind": "source:kev:fetch", "chain": ["source:kev:parse", "source:kev:map"] }` | ||||
| 3. Verify the metrics exposed by meter `StellaOps.Concelier.Connector.Kev`: | ||||
|    - `kev.fetch.attempts`, `kev.fetch.success`, `kev.fetch.unchanged`, `kev.fetch.failures` | ||||
|    - `kev.parse.entries` (tag `catalogVersion`), `kev.parse.failures`, `kev.parse.anomalies` (tag `reason`) | ||||
|    - `kev.map.advisories` (tag `catalogVersion`) | ||||
| 4. Confirm `concelier_source_http_requests_total{concelier_source="kev"}` increments once per fetch and that the paired `concelier_source_http_failures_total` stays flat (zero increase). | ||||
| 5. Inspect the info logs `Fetched KEV catalog document … pendingDocuments=…` and `Parsed KEV catalog document … entries=…`—they should appear exactly once per run and `Mapped X/Y… skipped=0` should match the `kev.map.advisories` delta. | ||||
| 6. Confirm MongoDB documents exist for the catalog JSON (`raw_documents` & `dtos`) and that advisories with prefix `kev/` are written. | ||||
|  | ||||
| ### 2.4 Production Monitoring | ||||
|  | ||||
| - Alert when `rate(kev_fetch_success_total[8h]) == 0` during working hours (daily cadence breach) and when `increase(kev_fetch_failures_total[1h]) > 0`. | ||||
| - Page the on-call if `increase(kev_parse_failures_total{reason="schema"}[6h]) > 0`—this usually signals an upstream payload change. Treat repeated `reason="download"` spikes as networking issues to the mirror. | ||||
| - Track anomaly spikes through `increase(kev_parse_anomalies_total{reason="missingCveId"}[24h])`. Rising `countMismatch` trends point to catalog publishing bugs. | ||||
| - Surface the fetch/mapping info logs (`Fetched KEV catalog document …` and `Mapped X/Y KEV advisories … skipped=S`) on dashboards; absence of those logs while metrics show success typically means schema validation short-circuited the run. | ||||
|  | ||||
| ### 2.5 Known good dashboard tiles | ||||
|  | ||||
| Add the following panels to the Concelier observability board: | ||||
|  | ||||
| | Metric | Recommended visualisation | | ||||
| |--------|---------------------------| | ||||
| | `rate(kev_fetch_success_total[30m])` | Single-stat (last 24 h) with warning threshold `>0` | | ||||
| | `rate(kev_parse_entries_total[1h])` by `catalogVersion` | Stacked area – highlights daily release size | | ||||
| | `sum by (reason) (increase(kev_parse_anomalies_total[24h]))` | Table – anomaly breakdown (matches dashboard panel) | | ||||
| | `rate(cve_map_success_total[15m])` vs `rate(kev_map_advisories_total[24h])` | Comparative timeseries for advisories emitted | | ||||
|  | ||||
| ## 3. Runbook updates | ||||
|  | ||||
| - Record staging/production smoke test results (date, catalog version, advisory counts) in your team’s change log. | ||||
| - Add the CVE/KEV job kinds to the standard maintenance checklist so operators can manually trigger them after planned downtime. | ||||
| - Keep this document in sync with future connector changes (for example, new anomaly reasons or additional metrics). | ||||
| - Version-control dashboard tweaks alongside `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` so operations can re-import the observability pack during restores. | ||||
							
								
								
									
										123
									
								
								docs/modules/concelier/operations/connectors/ghsa.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,123 @@ | ||||
| # Concelier GHSA Connector – Operations Runbook | ||||
|  | ||||
| _Last updated: 2025-10-16_ | ||||
|  | ||||
| ## 1. Overview | ||||
| The GitHub Security Advisories (GHSA) connector pulls advisory metadata from the GitHub REST API `/security/advisories` endpoint. GitHub enforces both primary and secondary rate limits, so operators must monitor usage and configure retries to avoid throttling incidents. | ||||
|  | ||||
| ## 2. Rate-limit telemetry | ||||
| The connector now surfaces rate-limit headers on every fetch and exposes the following metrics via OpenTelemetry: | ||||
|  | ||||
| | Metric | Description | Tags | | ||||
| |--------|-------------|------| | ||||
| | `ghsa.ratelimit.limit` (histogram) | Samples the reported request quota at fetch time. | `phase` = `list` or `detail`, `resource` (e.g., `core`). | | ||||
| | `ghsa.ratelimit.remaining` (histogram) | Remaining requests returned by `X-RateLimit-Remaining`. | `phase`, `resource`. | | ||||
| | `ghsa.ratelimit.reset_seconds` (histogram) | Seconds until `X-RateLimit-Reset`. | `phase`, `resource`. | | ||||
| | `ghsa.ratelimit.headroom_pct` (histogram) | Percentage of the quota still available (`remaining / limit * 100`). | `phase`, `resource`. | | ||||
| | `ghsa.ratelimit.headroom_pct_current` (observable gauge) | Latest headroom percentage reported per resource. | `phase`, `resource`. | | ||||
| | `ghsa.ratelimit.exhausted` (counter) | Incremented whenever GitHub returns a zero remaining quota and the connector delays before retrying. | `phase`. | | ||||
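
The values behind these metrics can be derived from a single response's headers. The sketch below is illustrative rather than the connector's code: the header names are GitHub's documented `X-RateLimit-*` set, the function name `rate_limit_snapshot` is hypothetical, and the mapping is assumed to use exactly this header casing (many HTTP clients expose case-insensitive header objects instead).

```python
import time
from typing import Mapping, Optional


def rate_limit_snapshot(headers: Mapping[str, str], now: Optional[float] = None) -> dict:
    """Derive limit, remaining, reset_seconds, and headroom_pct from response headers."""
    now = time.time() if now is None else now
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    reset_epoch = int(headers["X-RateLimit-Reset"])  # UTC epoch seconds
    return {
        "limit": limit,
        "remaining": remaining,
        "reset_seconds": max(0.0, reset_epoch - now),
        # Same quantity as remaining / limit * 100, guarded against a zero limit.
        "headroom_pct": remaining * 100.0 / limit if limit > 0 else 0.0,
        "resource": headers.get("X-RateLimit-Resource", "core"),
    }
```

With a limit of 5000 and 500 calls remaining, the headroom is 10 %, which matches the default warning threshold of 500 remaining calls on a 5000-call quota.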
|  | ||||
| ### Dashboards & alerts | ||||
| - Plot `ghsa.ratelimit.remaining` as the latest value to watch the runway. Alert when the value stays below **`RateLimitWarningThreshold`** (default `500`) for more than 5 minutes. | ||||
| - Use `ghsa.ratelimit.headroom_pct_current` to visualise remaining quota % — paging once it sits below **10 %** for longer than a single reset window helps avoid secondary limits. | ||||
| - Raise a separate alert on `increase(ghsa_ratelimit_exhausted_total[15m]) > 0` (the Prometheus-exported form of `ghsa.ratelimit.exhausted`) to catch hard throttles. | ||||
| - Overlay `ghsa.fetch.attempts` vs `ghsa.fetch.failures` to confirm retries are effective. | ||||
|  | ||||
| ## 3. Logging signals | ||||
| When `X-RateLimit-Remaining` falls below `RateLimitWarningThreshold`, the connector emits: | ||||
| ``` | ||||
| GHSA rate limit warning: remaining {Remaining}/{Limit} for {Phase} {Resource} (headroom {Headroom}%) | ||||
| ``` | ||||
| When GitHub reports zero remaining calls, the connector logs and sleeps for the reported `Retry-After`/`X-RateLimit-Reset` interval (falling back to `SecondaryRateLimitBackoff`). | ||||
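
The delay selection described above can be sketched as follows. The precedence (`Retry-After` first, then `X-RateLimit-Reset`, then the fallback) is an assumption drawn from the order in which the text lists them, and `throttle_delay` is a hypothetical name. The 120-second default mirrors a `secondaryRateLimitBackoff` of `00:02:00`.

```python
import time
from typing import Mapping, Optional


def throttle_delay(
    headers: Mapping[str, str],
    fallback_seconds: float = 120.0,  # stands in for secondaryRateLimitBackoff
    now: Optional[float] = None,
) -> float:
    """Pick how long to sleep after GitHub reports zero remaining calls."""
    now = time.time() if now is None else now
    retry_after = headers.get("Retry-After", "").strip()
    if retry_after.isdigit():
        return float(retry_after)  # server-specified delay in seconds
    reset = headers.get("X-RateLimit-Reset", "").strip()
    if reset.isdigit():
        return max(0.0, float(reset) - now)  # seconds until the quota window resets
    return fallback_seconds
```

Only the delta-seconds form of `Retry-After` is handled; an HTTP-date value falls through to the next option.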
|  | ||||
| After the quota recovers above the warning threshold the connector writes an informational log with the refreshed remaining/headroom, letting operators clear alerts quickly. | ||||
|  | ||||
| ## 4. Configuration knobs (`concelier.yaml`) | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     ghsa: | ||||
|       apiToken: "${GITHUB_PAT}" | ||||
|       pageSize: 50 | ||||
|       requestDelay: "00:00:00.200" | ||||
|       failureBackoff: "00:05:00" | ||||
|       rateLimitWarningThreshold: 500    # warn below this many remaining calls | ||||
|       secondaryRateLimitBackoff: "00:02:00"  # fallback delay when GitHub omits Retry-After | ||||
| ``` | ||||
|  | ||||
| ### Recommendations | ||||
| - Increase `requestDelay` in air-gapped or burst-heavy deployments to smooth token consumption. | ||||
| - Lower `rateLimitWarningThreshold` only if your dashboards already page on the new histogram; never set it negative. | ||||
| - For bots using a low-privilege PAT, keep `secondaryRateLimitBackoff` at ≥60 seconds to respect GitHub’s secondary-limit guidance. | ||||
|  | ||||
| #### Default job schedule | ||||
|  | ||||
| | Job kind | Cron | Timeout | Lease | | ||||
| |----------|------|---------|-------| | ||||
| | `source:ghsa:fetch` | `1,11,21,31,41,51 * * * *` | 6 minutes | 4 minutes | | ||||
| | `source:ghsa:parse` | `3,13,23,33,43,53 * * * *` | 5 minutes | 4 minutes | | ||||
| | `source:ghsa:map` | `5,15,25,35,45,55 * * * *` | 5 minutes | 4 minutes | | ||||
|  | ||||
| These defaults spread GHSA stages across the hour so fetch completes before parse/map fire. Override them via `concelier.jobs.definitions[...]` when coordinating multiple connectors on the same runner. | ||||
|  | ||||
| ## 5. Provisioning credentials | ||||
|  | ||||
| Concelier requires a GitHub personal access token (classic) with the **`read:org`** and **`security_events`** scopes to pull GHSA data. Store it as a secret and reference it via `concelier.sources.ghsa.apiToken`. | ||||
|  | ||||
| ### Docker Compose (stack operators) | ||||
| ```yaml | ||||
| services: | ||||
|   concelier: | ||||
|     environment: | ||||
|       CONCELIER__SOURCES__GHSA__APITOKEN: /run/secrets/ghsa_pat | ||||
|     secrets: | ||||
|       - ghsa_pat | ||||
|  | ||||
| secrets: | ||||
|   ghsa_pat: | ||||
|     file: ./secrets/ghsa_pat.txt  # contains only the PAT value | ||||
| ``` | ||||
|  | ||||
| ### Helm values (cluster operators) | ||||
| ```yaml | ||||
| concelier: | ||||
|   extraEnv: | ||||
|     - name: CONCELIER__SOURCES__GHSA__APITOKEN | ||||
|       valueFrom: | ||||
|         secretKeyRef: | ||||
|           name: concelier-ghsa | ||||
|           key: apiToken | ||||
|  | ||||
| extraSecrets: | ||||
|   concelier-ghsa: | ||||
|     apiToken: "<paste PAT here or source from external secret store>" | ||||
| ``` | ||||
|  | ||||
| After rotating the PAT, restart the Concelier workers (or run `kubectl rollout restart deployment/concelier`) to ensure the configuration reloads. | ||||
|  | ||||
| When enabling GHSA the first time, run a staged backfill: | ||||
|  | ||||
| 1. Trigger `source:ghsa:fetch` manually (CLI or API) outside of peak hours. | ||||
| 2. Watch `concelier.jobs.health` for the GHSA jobs until they report `healthy`. | ||||
| 3. Allow the scheduled cron cadence to resume once the initial backlog drains (typically < 30 minutes). | ||||
|  | ||||
| ## 6. Runbook steps when throttled | ||||
| 1. Check `ghsa.ratelimit.exhausted` for the affected phase (`list` vs `detail`). | ||||
| 2. Confirm the connector is delaying—logs will show `GHSA rate limit exhausted...` with the chosen backoff. | ||||
| 3. If rate limits stay exhausted: | ||||
|    - Verify no other jobs are sharing the PAT. | ||||
|    - Temporarily reduce `MaxPagesPerFetch` or `PageSize` to shrink burst size. | ||||
|    - Consider provisioning a dedicated PAT (GHSA permissions only) for Concelier. | ||||
| 4. After the quota resets, restore `rateLimitWarningThreshold`/`requestDelay` to their normal values and monitor the histograms for at least one hour. | ||||
|  | ||||
| ## 7. Alert integration quick reference | ||||
| - Prometheus: `ghsa_ratelimit_remaining_bucket` (from histogram) – use `histogram_quantile(0.99, ...)` to trend capacity. | ||||
| - VictoriaMetrics: `LAST_over_time(ghsa_ratelimit_remaining_sum[5m])` for simple last-value graphs. | ||||
| - Grafana: stack remaining + used to visualise total limit per resource. | ||||
|  | ||||
| ## 8. Canonical metric fallback analytics | ||||
| When GitHub omits CVSS vectors/scores, the connector now assigns a deterministic canonical metric id in the form `ghsa:severity/<level>` and publishes it to Merge so severity precedence still resolves against GHSA even without CVSS data. | ||||
|  | ||||
| - Metric: `ghsa.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `reason=no_cvss`. | ||||
| - Monitor the counter alongside Merge parity checks; a sudden spike suggests GitHub is shipping advisories without vectors and warrants cross-checking downstream exporters. | ||||
| - Because the canonical id feeds Merge, parity dashboards should overlay this metric to confirm fallback advisories continue to merge ahead of downstream sources when GHSA supplies more recent data. | ||||
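
The fallback id format described above can be illustrated with a minimal Python sketch. The helper name `canonical_metric_id` and the lowercase/trim normalisation of `<level>` are assumptions made for illustration; the connector's real implementation is not shown in this runbook.

```python
from typing import Optional


def canonical_metric_id(source: str, severity: Optional[str]) -> Optional[str]:
    """Build a fallback id such as 'ghsa:severity/high'.

    Hypothetical helper: trimming and lowercasing the level are assumptions.
    """
    if severity is None or not severity.strip():
        return None  # no severity level, so no fallback id can be derived
    return f"{source}:severity/{severity.strip().lower()}"
```

When triaging a spike in `ghsa.map.canonical_metric_fallbacks`, a helper like this can be used to predict the id an advisory should carry before comparing it with the `canonical_metric_id` tag.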
							
								
								
									
										122
									
								
								docs/modules/concelier/operations/connectors/ics-cisa.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,122 @@ | ||||
| # Concelier CISA ICS Connector Operations | ||||
|  | ||||
| This runbook documents how to provision, rotate, and validate credentials for the CISA Industrial Control Systems (ICS) connector (`source:ics-cisa:*`). Follow it before enabling the connector in staging or offline installations. | ||||
|  | ||||
| ## 1. Credential Provisioning | ||||
|  | ||||
| 1. **Create a service mailbox** reachable by the Ops crew (shared mailbox recommended).   | ||||
| 2. Browse to `https://public.govdelivery.com/accounts/USDHSCISA/subscriber/new` and subscribe the mailbox to the following GovDelivery topics: | ||||
|    - `USDHSCISA_16` — ICS-CERT advisories (legacy numbering: `ICSA-YY-###`). | ||||
|    - `USDHSCISA_19` — ICS medical advisories (`ICSMA-YY-###`). | ||||
|    - `USDHSCISA_17` — ICS alerts (`IR-ALERT-YY-###`) for completeness. | ||||
| 3. Complete the verification email. After confirmation, note the **personalised subscription code** included in the “Manage Preferences” link. It has the shape `code=AB12CD34EF`. | ||||
| 4. Store the code in the shared secret vault (or Offline Kit secrets bundle) as `concelier/sources/icscisa/govdelivery/code`. | ||||
|  | ||||
| > ℹ️  GovDelivery does not expose a one-time API key; the personalised code is what authenticates the RSS pull. Never commit it to git. | ||||
|  | ||||
| ## 2. Feed Validation | ||||
|  | ||||
| Use the following command to confirm the feed is reachable before wiring it into Concelier (substitute `<CODE>` with the personalised value): | ||||
|  | ||||
| ```bash | ||||
| curl -H "User-Agent: StellaOpsConcelier/ics-cisa" \ | ||||
|      "https://content.govdelivery.com/accounts/USDHSCISA/topics/ICS-CERT/feed.rss?format=xml&code=<CODE>" | ||||
| ``` | ||||
|  | ||||
| If the endpoint returns HTTP 200 and an RSS payload, record the sample response under `docs/artifacts/icscisa/` (see Task `FEEDCONN-ICSCISA-02-007`). HTTP 403 or 406 usually means the subscription was not confirmed or the code was mistyped. | ||||
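
For scripted checks, the same validation can be sketched in Python. The URL layout is copied from the `curl` command above; the helper names `build_feed_url` and `explain_status` are hypothetical, and the status guidance only restates this section.

```python
from urllib.parse import urlencode

# Base URI taken from the curl example in this section.
RSS_BASE = "https://content.govdelivery.com/accounts/USDHSCISA"


def build_feed_url(code: str, topic: str = "ICS-CERT") -> str:
    """Return the RSS URL for a topic, with the personalised code appended."""
    query = urlencode({"format": "xml", "code": code})
    return f"{RSS_BASE}/topics/{topic}/feed.rss?{query}"


def explain_status(status: int) -> str:
    """Map an HTTP status to the guidance given in this section."""
    if status == 200:
        return "reachable: archive the sample response"
    if status in (403, 406):
        return "check that the subscription is confirmed and the code is typed correctly"
    return "unexpected status: inspect the response manually"
```

Never log the full URL in shared channels, since it embeds the personalised code.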
|  | ||||
| ## 3. Configuration Snippet | ||||
|  | ||||
| Add the connector configuration to `concelier.yaml` (or equivalent environment variables): | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     icscisa: | ||||
|       govDelivery: | ||||
|         code: "${CONCELIER_ICS_CISA_GOVDELIVERY_CODE}" | ||||
|         topics: | ||||
|           - "USDHSCISA_16" | ||||
|           - "USDHSCISA_19" | ||||
|           - "USDHSCISA_17" | ||||
|       rssBaseUri: "https://content.govdelivery.com/accounts/USDHSCISA" | ||||
|       requestDelay: "00:00:01" | ||||
|       failureBackoff: "00:05:00" | ||||
| ``` | ||||
|  | ||||
| Environment variable example: | ||||
|  | ||||
| ```bash | ||||
| export CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE="AB12CD34EF" | ||||
| ``` | ||||
|  | ||||
| Concelier automatically registers the host with the Source.Common HTTP allow-list when the connector assembly is loaded. | ||||
|  | ||||
|  | ||||
| Optional tuning keys (set only when needed): | ||||
|  | ||||
| - `proxyUri` — HTTP/HTTPS proxy URL used when Akamai blocks direct pulls. | ||||
| - `requestVersion` / `requestVersionPolicy` — override HTTP negotiation when the proxy requires HTTP/1.1. | ||||
| - `enableDetailScrape` — toggle HTML detail fallback (defaults to true). | ||||
| - `captureAttachments` — collect PDF attachments from detail pages (defaults to true). | ||||
| - `detailBaseUri` — alternate host for detail enrichment if CISA changes their layout. | ||||
|  | ||||
| ## 4. Seeding Without GovDelivery | ||||
|  | ||||
| If credentials are still pending, populate the connector with the community CSV dataset before enabling the live fetch: | ||||
|  | ||||
| 1. Run `./scripts/fetch-ics-cisa-seed.sh` (or `.ps1`) to download the latest `CISA_ICS_ADV_*.csv` files into `seed-data/ics-cisa/`. | ||||
| 2. Copy the CSVs (and the generated `.sha256` files) into your Offline Kit staging area so they ship alongside the other feeds. | ||||
| 3. Import the kit as usual. The connector can parse the seed data for historical context, but **live GovDelivery credentials are still required** for fresh advisories. | ||||
| 4. Once credentials arrive, update `concelier:sources:icscisa:govDelivery:code` and re-trigger `source:ics-cisa:fetch` so the connector switches to the authorised feed. | ||||
|  | ||||
| > The CSVs are licensed under ODbL 1.0 by the ICS Advisory Project. Preserve the attribution when redistributing them. | ||||
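
Before copying the seed files into the Offline Kit staging area, the checksums can be verified with a short Python sketch. It assumes each sidecar is named `<file>.csv.sha256` and starts with the hex digest (the `sha256sum` layout); the actual output of the fetch script may differ, so adjust accordingly.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file so large CSVs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(seed_dir: str) -> list[str]:
    """Return CSV names whose sidecar digest is missing or does not match."""
    mismatched = []
    for csv_path in sorted(Path(seed_dir).glob("CISA_ICS_ADV_*.csv")):
        # Assumed sidecar name: the CSV file name with ".sha256" appended.
        sidecar = csv_path.with_name(csv_path.name + ".sha256")
        tokens = sidecar.read_text().split() if sidecar.exists() else []
        expected = tokens[0].lower() if tokens else None
        if expected != sha256_of(csv_path):
            mismatched.append(csv_path.name)
    return mismatched
```

An empty list from `find_mismatches("seed-data/ics-cisa")` suggests the files are intact and ready to be copied.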
|  | ||||
| ## 5. Integration Validation | ||||
|  | ||||
| 1. Ensure secrets are in place and restart the Concelier workers. | ||||
| 2. Run a dry-run fetch/parse/map chain against an Akamai-protected topic: | ||||
|    ```bash | ||||
|    CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=... \ | ||||
|    CONCELIER_SOURCES_ICSCISA_ENABLEDETAILSCRAPE=1 \ | ||||
|    stella db jobs run source:ics-cisa:fetch --and-then source:ics-cisa:parse --and-then source:ics-cisa:map | ||||
|    ``` | ||||
| 3. Confirm logs contain `ics-cisa detail fetch` entries and that new documents/DTOs include attachments (see `docs/artifacts/icscisa`). Canonical advisories should expose PDF links as `references.kind == "attachment"` and affected packages should surface `primitives.semVer.exactValue` for single-version hits. | ||||
| 4. If Akamai blocks direct fetches, set `concelier:sources:icscisa:proxyUri` to your allow-listed egress proxy and rerun the dry-run. | ||||
|  | ||||
| ## 6. Rotation & Incident Response | ||||
|  | ||||
| - Review GovDelivery access quarterly. Rotate the personalised code whenever Ops changes the service mailbox password or membership.   | ||||
| - Revoking the subscription in GovDelivery invalidates the code immediately; update the vault and configuration in the same change.   | ||||
| - If the code leaks, remove the subscription (`https://public.govdelivery.com/accounts/USDHSCISA/subscriber/manage_preferences?code=<CODE>`), resubscribe, and distribute the new value via the vault. | ||||
|  | ||||
| ## 7. Offline Kit Handling | ||||
|  | ||||
| Include the personalised code in `offline-kit/secrets/concelier/icscisa.env`: | ||||
|  | ||||
| ``` | ||||
| CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=AB12CD34EF | ||||
| ``` | ||||
|  | ||||
| The Offline Kit deployment script copies this file into the container secret directory mounted at `/run/secrets/concelier`. Ensure permissions are `600` and ownership matches the Concelier runtime user. | ||||
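
A pre-flight check for the permission and ownership requirements could look like the following Python sketch. The function names are hypothetical, and the expected UID must be supplied by the operator because the Concelier runtime user is deployment-specific.

```python
import os
import stat


def has_owner_only_mode(path: str) -> bool:
    """True when the permission bits are exactly 0600 (read/write for owner only)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600


def is_owned_by(path: str, expected_uid: int) -> bool:
    """True when the file owner matches the UID of the runtime user."""
    return os.stat(path).st_uid == expected_uid
```

Run both checks against `offline-kit/secrets/concelier/icscisa.env` before sealing the kit.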
|  | ||||
| ## 8. Telemetry & Monitoring | ||||
|  | ||||
| The connector emits metrics under the meter `StellaOps.Concelier.Connector.Ics.Cisa`. They allow operators to track Akamai fallbacks, detail enrichment health, and advisory fan-out. | ||||
|  | ||||
| - `icscisa.fetch.*` – counters for `attempts`, `success`, `failures`, `not_modified`, and `fallbacks`, plus histogram `icscisa.fetch.documents` showing documents added per topic pull (tags: `concelier.source`, `icscisa.topic`). | ||||
| - `icscisa.parse.*` – counters for `success`/`failures` and histograms `icscisa.parse.advisories`, `icscisa.parse.attachments`, `icscisa.parse.detail_fetches` to monitor enrichment workload per feed document. | ||||
| - `icscisa.detail.*` – counters `success` / `failures` per advisory (tagged with `icscisa.advisory`) to alert when Akamai blocks detail pages. | ||||
| - `icscisa.map.*` – counters for `success`/`failures` and histograms `icscisa.map.references`, `icscisa.map.packages`, `icscisa.map.aliases` capturing canonical fan-out. | ||||
|  | ||||
| Suggested alerts: | ||||
|  | ||||
| - `increase(icscisa_fetch_failures_total[15m]) > 0` or `increase(icscisa_fetch_fallbacks_total[15m]) > 5` – sustained Akamai or proxy issues. | ||||
| - `increase(icscisa_detail_failures_total[30m]) > 0` – detail enrichment breaking (potential HTML layout change). | ||||
| - `histogram_quantile(0.95, rate(icscisa_map_references_bucket[1h]))` trending sharply higher – sudden advisory reference explosion worth investigating. | ||||
| - Keep an eye on shared HTTP metrics (`concelier.source.http.*{concelier.source="ics-cisa"}`) for request latency and retry patterns. | ||||
|  | ||||
| ## 9. Related Tasks | ||||
|  | ||||
| - `FEEDCONN-ICSCISA-02-009` (GovDelivery credential onboarding) — completed once this runbook is followed and secrets are placed in the vault. | ||||
| - `FEEDCONN-ICSCISA-02-007` (document inventory) — archive the first successful RSS response and any attachment URL schema under `docs/artifacts/icscisa/`. | ||||
							
								
								
									
										74
									
								
								docs/modules/concelier/operations/connectors/kisa.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,74 @@ | ||||
| # Concelier KISA Connector Operations | ||||
|  | ||||
| Operational guidance for the Korea Internet & Security Agency (KISA / KNVD) connector (`source:kisa:*`). Pair this with the engineering brief in `docs/dev/kisa_connector_notes.md`. | ||||
|  | ||||
| ## 1. Prerequisites | ||||
|  | ||||
| - Outbound HTTPS (or mirrored cache) for `https://knvd.krcert.or.kr/`. | ||||
| - Connector options defined under `concelier:sources:kisa`: | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     kisa: | ||||
|       feedUri: "https://knvd.krcert.or.kr/rss/securityInfo.do" | ||||
|       detailApiUri: "https://knvd.krcert.or.kr/rssDetailData.do" | ||||
|       detailPageUri: "https://knvd.krcert.or.kr/detailDos.do" | ||||
|       maxAdvisoriesPerFetch: 10 | ||||
|       requestDelay: "00:00:01" | ||||
|       failureBackoff: "00:05:00" | ||||
| ``` | ||||
|  | ||||
| > Ensure the URIs stay absolute—Concelier adds the `feedUri`/`detailApiUri` hosts to the HttpClient allow-list automatically. | ||||
|  | ||||
| ## 2. Staging Smoke Test | ||||
|  | ||||
| 1. Restart the Concelier workers so the KISA options bind. | ||||
| 2. Run a full connector cycle: | ||||
|    - CLI: `stella db jobs run source:kisa:fetch --and-then source:kisa:parse --and-then source:kisa:map` | ||||
|    - REST: `POST /jobs/run { "kind": "source:kisa:fetch", "chain": ["source:kisa:parse", "source:kisa:map"] }` | ||||
| 3. Confirm telemetry (Meter `StellaOps.Concelier.Connector.Kisa`): | ||||
|    - `kisa.feed.success`, `kisa.feed.items` | ||||
|    - `kisa.detail.success` / `.failures` | ||||
|    - `kisa.parse.success` / `.failures` | ||||
|    - `kisa.map.success` / `.failures` | ||||
|    - `kisa.cursor.updates` | ||||
| 4. Inspect logs for structured entries: | ||||
|    - `KISA feed returned {ItemCount}` | ||||
|    - `KISA fetched detail for {Idx} … category={Category}` | ||||
|    - `KISA mapped advisory {AdvisoryId} (severity={Severity})` | ||||
|    - Absence of warnings such as `document missing GridFS payload`. | ||||
| 5. Validate MongoDB state: | ||||
|    - `raw_documents.metadata` has `kisa.idx`, `kisa.category`, `kisa.title`. | ||||
|    - DTO store contains `schemaVersion="kisa.detail.v1"`. | ||||
|    - Advisories include aliases (`IDX`, CVE) and `language="ko"`. | ||||
|    - `source_states` entry for `kisa` shows recent `cursor.lastFetchAt`. | ||||
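
The REST payload from step 2 can be generated programmatically, which helps when the same smoke test is repeated for several connectors. This is a minimal Python sketch: the helper name is hypothetical, and only the `kind`/`chain` body shape shown above is taken from this runbook.

```python
import json


def chained_job_request(source: str, stages=("fetch", "parse", "map")) -> str:
    """Serialise the body for POST /jobs/run, chaining the remaining stages."""
    kinds = [f"source:{source}:{stage}" for stage in stages]
    return json.dumps({"kind": kinds[0], "chain": kinds[1:]})
```

For example, `chained_job_request("kisa")` yields a body equivalent to the REST call in step 2.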
|  | ||||
| ## 3. Production Monitoring | ||||
|  | ||||
| - **Dashboards** – Add the following Prometheus/OTEL expressions: | ||||
|   - `rate(kisa_feed_items_total[15m])` versus `rate(concelier_source_http_requests_total{concelier_source="kisa"}[15m])` | ||||
|   - `increase(kisa_detail_failures_total{reason!="empty-document"}[1h])` alert at `>0` | ||||
|   - `increase(kisa_parse_failures_total[1h])` for storage/JSON issues | ||||
|   - `increase(kisa_map_failures_total[1h])` to flag schema drift | ||||
|   - `increase(kisa_cursor_updates_total[6h]) == 0` during active windows → warn | ||||
| - **Alerts** – Page when `rate(kisa_feed_success_total[2h]) == 0` while other connectors are active; back off for maintenance windows announced on `https://knvd.krcert.or.kr/`. | ||||
| - **Logs** – Watch for repeated warnings (`document missing`, `DTO missing`) or errors with reason tags `HttpRequestException`, `download`, `parse`, `map`. | ||||
|  | ||||
| ## 4. Localisation Handling | ||||
|  | ||||
| - Hangul categories (for example `취약점정보`) flow into telemetry tags (`category=…`) and logs. Dashboards must render UTF‑8 and avoid transliteration. | ||||
| - HTML content is sanitised before storage; translation teams can consume the `ContentHtml` field safely. | ||||
| - Advisory severity remains as provided by KISA (`High`, `Medium`, etc.). Map-level failures include the severity tag for filtering. | ||||
|  | ||||
| ## 5. Fixture & Regression Maintenance | ||||
|  | ||||
| - Regression fixtures: `src/Concelier/__Tests/StellaOps.Concelier.Connector.Kisa.Tests/Fixtures/kisa-feed.xml` and `kisa-detail.json`. | ||||
| - Refresh via `UPDATE_KISA_FIXTURES=1 dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Kisa.Tests/StellaOps.Concelier.Connector.Kisa.Tests.csproj`. | ||||
| - The telemetry regression (`KisaConnectorTests.Telemetry_RecordsMetrics`) will fail if counters/log wiring drifts—treat failures as gating. | ||||
|  | ||||
| ## 6. Known Issues | ||||
|  | ||||
| - RSS feeds only expose the latest 10 advisories; long outages require replay via archived feeds or manual IDX seeds. | ||||
| - Detail endpoint occasionally throttles; the connector honours `requestDelay` and reports failures with reason `HttpRequestException`. Consider increasing delay for weekend backfills. | ||||
| - If `kisa.category` tags suddenly appear as `unknown`, verify KISA has not renamed RSS elements; update the parser fixtures before production rollout. | ||||
							
								
								
									
										86
									
								
								docs/modules/concelier/operations/connectors/msrc.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,86 @@ | ||||
| # Concelier MSRC Connector – Azure AD Onboarding Brief | ||||
|  | ||||
| _Drafted: 2025-10-15_ | ||||
|  | ||||
| ## 1. App registration requirements | ||||
|  | ||||
| - **Tenant**: shared StellaOps production Azure AD. | ||||
| - **Application type**: confidential client (web/API) issuing client credentials. | ||||
| - **API permissions**: `api://api.msrc.microsoft.com/.default` (Application). Admin consent required once. | ||||
| - **Token audience**: `https://api.msrc.microsoft.com/`. | ||||
| - **Grant type**: client credentials. Concelier will request tokens via `POST https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token`. | ||||
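
To validate a new app registration before wiring it into Concelier, the token request can be assembled as in the Python sketch below. It follows the standard OAuth 2.0 client-credentials form fields; the helper name is hypothetical, and the scope value is the API permission listed above.

```python
from urllib.parse import urlencode

# Scope taken from the API permission listed in this section.
SCOPE = "api://api.msrc.microsoft.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Return (url, form_body) for the client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": SCOPE,
    })
    return url, body
```

Send the body with `Content-Type: application/x-www-form-urlencoded`, and keep the secret out of shell history and logs.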
|  | ||||
| ## 2. Secret/credential policy | ||||
|  | ||||
| - Maintain two client secrets (primary + standby) rotating every 90 days. | ||||
| - Store secrets in the Concelier secrets vault; Offline Kit deployments must mirror the secret payloads in their encrypted store. | ||||
| - Record rotation cadence in Ops runbook and update Concelier configuration (`CONCELIER__SOURCES__VNDR__MSRC__CLIENTSECRET`) ahead of expiry. | ||||
|  | ||||
| ## 3. Concelier configuration sample | ||||
|  | ||||
| ```yaml | ||||
| concelier: | ||||
|   sources: | ||||
|     vndr.msrc: | ||||
|       tenantId: "<azure-tenant-guid>" | ||||
|       clientId: "<app-registration-client-id>" | ||||
|       clientSecret: "<pull from secret store>" | ||||
|       apiVersion: "2024-08-01" | ||||
|       locale: "en-US" | ||||
|       requestDelay: "00:00:00.250" | ||||
|       failureBackoff: "00:05:00" | ||||
|       cursorOverlapMinutes: 10 | ||||
|       downloadCvrf: false  # set true to persist CVRF ZIP alongside JSON detail | ||||
| ``` | ||||
|  | ||||
| ## 4. CVRF artefacts | ||||
|  | ||||
| - The MSRC REST payload exposes `cvrfUrl` per advisory. The current connector persists the link as advisory metadata and as a reference; it does **not** download the ZIP by default. | ||||
| - Ops should mirror CVRF ZIPs when preparing Offline Kits so air-gapped deployments can reconcile advisories without direct internet access. | ||||
| - Once Offline Kit storage guidelines are finalised, extend the connector configuration with `downloadCvrf: true` to enable automatic attachment retrieval. | ||||
|  | ||||
| ### 4.1 State seeding helper | ||||
|  | ||||
| Use `src/Tools/SourceStateSeeder` to queue historical advisories (detail JSON + optional CVRF artefacts) for replay without manual Mongo edits. Example seed file: | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "source": "vndr.msrc", | ||||
|   "cursor": { | ||||
|     "lastModifiedCursor": "2024-01-01T00:00:00Z" | ||||
|   }, | ||||
|   "documents": [ | ||||
|     { | ||||
|       "uri": "https://api.msrc.microsoft.com/sug/v2.0/vulnerability/ADV2024-0001", | ||||
|       "contentFile": "./seeds/adv2024-0001.json", | ||||
|       "contentType": "application/json", | ||||
|       "metadata": { "msrc.vulnerabilityId": "ADV2024-0001" }, | ||||
|       "addToPendingDocuments": true | ||||
|     }, | ||||
|     { | ||||
|       "uri": "https://download.microsoft.com/msrc/2024/ADV2024-0001.cvrf.zip", | ||||
|       "contentFile": "./seeds/adv2024-0001.cvrf.zip", | ||||
|       "contentType": "application/zip", | ||||
|       "status": "mapped", | ||||
|       "addToPendingDocuments": false | ||||
|     } | ||||
|   ] | ||||
| } | ||||
| ``` | ||||
|  | ||||
| Run the helper: | ||||
|  | ||||
| ```bash | ||||
| dotnet run --project src/Tools/SourceStateSeeder -- \ | ||||
|   --connection-string "mongodb://localhost:27017" \ | ||||
|   --database concelier \ | ||||
|   --input seeds/msrc-backfill.json | ||||
| ``` | ||||
|  | ||||
| Any document with `addToPendingDocuments` set to `true` will appear in the connector cursor; `downloadCvrf` can remain disabled if the ZIP artefact is pre-seeded. | ||||
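
Before running the helper, it can be useful to preview which documents a seed file will queue. The Python sketch below only reads the JSON layout shown in the example seed file; the function name is hypothetical and it does not call the seeder.

```python
import json


def pending_document_uris(seed_text: str) -> list[str]:
    """List the URIs of documents flagged with addToPendingDocuments."""
    seed = json.loads(seed_text)
    return [
        doc["uri"]
        for doc in seed.get("documents", [])
        if doc.get("addToPendingDocuments") is True
    ]
```

With the example seed file, only the detail JSON document would be listed, because the CVRF ZIP is marked `"addToPendingDocuments": false`.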
|  | ||||
| ## 5. Outstanding items | ||||
|  | ||||
| - Ops to confirm tenant/app names and provide client credentials through the secure channel. | ||||
| - Connector team monitors token cache health (already implemented); validate instrumentation once Ops supplies credentials. | ||||
| - Offline Kit packaging: add encrypted blob containing client credentials with rotation instructions. | ||||
							
								
								
									
										48
									
								
								docs/modules/concelier/operations/connectors/nkcki.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,48 @@ | ||||
| # NKCKI Connector Operations Guide | ||||
|  | ||||
| ## Overview | ||||
|  | ||||
| The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring. | ||||
|  | ||||
| ## Configuration | ||||
|  | ||||
| Key options exposed through `concelier:sources:ru-nkcki:http`: | ||||
|  | ||||
| - `maxBulletinsPerFetch` – limits new bulletin downloads in a single run (default `5`). | ||||
| - `maxListingPagesPerFetch` – maximum listing pages visited during pagination (default `3`). | ||||
| - `listingCacheDuration` – minimum interval between listing fetches before falling back to cached artefacts (default `00:10:00`). | ||||
| - `cacheDirectory` – optional path for persisted bulletin archives used during offline or failure scenarios. | ||||
| - `requestDelay` – delay inserted between bulletin downloads to respect upstream politeness. | ||||
|  | ||||
| When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/concelier/cache/ru-nkcki`) and pre-populate bulletin archives via the offline kit. | ||||
|  | ||||
| ## Telemetry | ||||
|  | ||||
| `RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Concelier.Connector.Ru.Nkcki`: | ||||
|  | ||||
| - `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures` | ||||
| - `nkcki.listing.pages.visited` (histogram, `pages`) | ||||
| - `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new` | ||||
| - `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures` | ||||
| - `nkcki.entries.processed` (histogram, `entries`) | ||||
|  | ||||
| Integrate these metrics into the standard Concelier observability dashboards to track crawl coverage and cache hit rates. | ||||
|  | ||||
| ## Archive Backfill Strategy | ||||
|  | ||||
| Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy: | ||||
|  | ||||
| 1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms. | ||||
| 2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked. | ||||
| 3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`. | ||||
| 4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in `docs/concelier-connector-research-20251011.md`). | ||||
|  | ||||
| For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs. | ||||
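
The page walk and deduplication described in the backfill strategy can be sketched in Python as follows. This is an illustration under stated assumptions, not the connector's code: the listing URL is a placeholder, every page (including the first) is assumed to be addressable via `?PAGEN_1=n`, and the function names are hypothetical.

```python
def listing_page_urls(listing_url: str, max_pages: int) -> list[str]:
    """Enumerate listing pages from newest to oldest, capped at max_pages."""
    return [f"{listing_url}?PAGEN_1={n}" for n in range(1, max_pages + 1)]


def select_new_bulletins(discovered, known, max_new: int) -> list[str]:
    """Keep first-seen order, skip known or repeated ids, and cap the result."""
    seen = set(known)
    fresh = []
    for bulletin_id in discovered:
        if len(fresh) >= max_new:
            break  # mirrors the per-run limit, e.g. maxBulletinsPerFetch
        if bulletin_id in seen:
            continue
        seen.add(bulletin_id)
        fresh.append(bulletin_id)
    return fresh
```

Processing ids in a stable order is what keeps repeated runs deterministic when the same archive appears on more than one page.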
|  | ||||
| ## Failure Handling | ||||
|  | ||||
| - Listing failures mark the source state with exponential backoff while attempting cache replay. | ||||
| - Bulletin fetches fall back to cached copies before surfacing an error. | ||||
| - Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`src/Tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros. | ||||
|  | ||||
| Refer to `ru-nkcki` entries in `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md` for outstanding items. | ||||
							
								
								
									
										24
									
								
								docs/modules/concelier/operations/connectors/osv.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,24 @@ | ||||
| # Concelier OSV Connector – Operations Notes | ||||
|  | ||||
| _Last updated: 2025-10-16_ | ||||
|  | ||||
| The OSV connector ingests advisories from OSV.dev across OSS ecosystems. This note highlights the additional merge/export expectations introduced with the canonical metric fallback work in Sprint 4. | ||||
|  | ||||
| ## 1. Canonical metric fallbacks | ||||
| - When OSV omits CVSS vectors (common for CVSS v4-only payloads) the mapper now emits a deterministic canonical metric id in the form `osv:severity/<level>` and normalises the advisory severity to the same `<level>`. | ||||
| - Metric: `osv.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `ecosystem`, `reason=no_cvss`. Watch this alongside merge parity dashboards to catch spikes where OSV publishes severity-only advisories. | ||||
| - Merge precedence still prefers GHSA over OSV; the shared severity-based canonical id keeps Merge/export parity deterministic even when only OSV supplies severity data. | ||||
|  | ||||
| ## 2. CWE provenance | ||||
| - `database_specific.cwe_ids` now populates provenance decision reasons for every mapped weakness. Expect `decisionReason="database_specific.cwe_ids"` on OSV weakness provenance and confirm exporters preserve the value. | ||||
| - If OSV ever attaches `database_specific.cwe_notes`, the connector will surface the joined note string in `decisionReason` instead of the default marker. | ||||
|  | ||||
| ## 3. Dashboards & alerts | ||||
| - Extend existing merge dashboards with the new counter: | ||||
|   - Overlay `sum(osv.map.canonical_metric_fallbacks{ecosystem=~".+"})` with Merge severity overrides to confirm fallback advisories are reconciling cleanly. | ||||
|   - Alert when the 1-hour sum exceeds 50 for any ecosystem; baseline volume is currently <5 per day (mostly GHSA mirrors emitting CVSS v4 only). | ||||
| - Exporters already surface `canonicalMetricId`; no schema change is required, but ORAS/Trivy bundles should be spot-checked after deploying the connector update. | ||||
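
The alert condition above (1-hour sum above 50 for any ecosystem) can be expressed as a small Python sketch, for example inside a report script that post-processes exported counter values. The input shape (ecosystem name mapped to its 1-hour sum) and the function name are assumptions.

```python
def ecosystems_over_threshold(hourly_sums: dict[str, int], threshold: int = 50) -> list[str]:
    """Return the ecosystems whose 1-hour fallback sum exceeds the threshold."""
    return sorted(name for name, total in hourly_sums.items() if total > threshold)
```

A value equal to the threshold does not fire, matching the "exceeds 50" wording.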
|  | ||||
| ## 4. Runbook updates | ||||
| - Fixture parity suites (`osv-ghsa.*`) now assert the fallback id and provenance notes. Regenerate via `dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj`. | ||||
| - When investigating merge severity conflicts, include the fallback counter and confirm OSV advisories carry the expected `osv:severity/<level>` id before raising connector bugs. | ||||
							
								
								
									
										238
									
								
								docs/modules/concelier/operations/mirror.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,238 @@ | ||||
| # Concelier & Excititor Mirror Operations | ||||
|  | ||||
| This runbook describes how Stella Ops operates the managed mirrors under `*.stella-ops.org`. | ||||
| It covers Docker Compose and Helm deployment overlays, secret handling for multi-tenant | ||||
| authn, CDN fronting, and the recurring sync pipeline that keeps mirror bundles current. | ||||
|  | ||||
| ## 1. Prerequisites | ||||
|  | ||||
| - **Authority access** – client credentials (`client_id` + secret) authorised for | ||||
|   `concelier.mirror.read` and `excititor.mirror.read` scopes. Secrets live outside git. | ||||
| - **Signed TLS certificates** – wildcard or per-domain (`mirror-primary`, `mirror-community`). | ||||
|   Store them under `deploy/compose/mirror-gateway/tls/` or in Kubernetes secrets. | ||||
| - **Mirror gateway credentials** – Basic Auth htpasswd files per domain. Generate with | ||||
|   `htpasswd -B`. Operators distribute credentials to downstream consumers. | ||||
| - **Export artifact source** – read access to the canonical S3 buckets (or rsync share) | ||||
|   that hold `concelier` JSON bundles and `excititor` VEX exports. | ||||
| - **Persistent volumes** – storage for Concelier job metadata and mirror export trees. | ||||
|   For Helm, provision PVCs (`concelier-mirror-jobs`, `concelier-mirror-exports`, | ||||
|   `excititor-mirror-exports`, `mirror-mongo-data`, `mirror-minio-data`) before rollout. | ||||
|  | ||||
| ### 1.1 Service configuration quick reference | ||||
|  | ||||
| Concelier.WebService exposes the mirror HTTP endpoints once `CONCELIER__MIRROR__ENABLED=true`. | ||||
| Key knobs: | ||||
|  | ||||
| - `CONCELIER__MIRROR__EXPORTROOT` – root folder containing export snapshots (`<exportId>/mirror/*`). | ||||
| - `CONCELIER__MIRROR__ACTIVEEXPORTID` – optional explicit export id; otherwise the service auto-falls back to the `latest/` symlink or newest directory. | ||||
| - `CONCELIER__MIRROR__REQUIREAUTHENTICATION` – default auth requirement; override per domain with `CONCELIER__MIRROR__DOMAINS__{n}__REQUIREAUTHENTICATION`. | ||||
| - `CONCELIER__MIRROR__MAXINDEXREQUESTSPERHOUR` – budget for `/concelier/exports/index.json`. Domains inherit this value unless they define `__MAXDOWNLOADREQUESTSPERHOUR`. | ||||
| - `CONCELIER__MIRROR__DOMAINS__{n}__ID` – domain identifier matching the exporter manifest; additional keys configure display name and rate budgets. | ||||
|  | ||||
| > The service honours Stella Ops Authority when `CONCELIER__AUTHORITY__ENABLED=true` and `ALLOWANONYMOUSFALLBACK=false`. Use the bypass CIDR list (`CONCELIER__AUTHORITY__BYPASSNETWORKS__*`) for in-cluster ingress gateways that terminate Basic Auth. Unauthorized requests emit `WWW-Authenticate: Bearer` so downstream automation can detect token failures. | ||||
|  | ||||
| Mirror responses carry deterministic cache headers: `/index.json` returns `Cache-Control: public, max-age=60`, while per-domain manifests/bundles include `Cache-Control: public, max-age=300, immutable`. Rate limiting surfaces `Retry-After` when quotas are exceeded. | ||||
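
When validating a CDN or gateway in front of the mirror, the expected header per path can be encoded in a small Python check. This sketch assumes that any path other than the index is a per-domain manifest or bundle; the function name is hypothetical.

```python
def expected_cache_control(path: str) -> str:
    """Return the Cache-Control value this runbook documents for a mirror path."""
    if path.endswith("/index.json"):
        return "public, max-age=60"  # index is refreshed frequently
    return "public, max-age=300, immutable"  # manifests and bundles
```

Comparing this value with the header returned through the CDN helps catch gateways that rewrite or strip caching directives.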
|  | ||||
| ### 1.2 Mirror connector configuration | ||||
|  | ||||
| Downstream Concelier instances ingest published bundles using the `StellaOpsMirrorConnector`. Operators running the connector in air‑gapped or limited connectivity environments can tune the following options (environment prefix `CONCELIER__SOURCES__STELLAOPSMIRROR__`): | ||||
|  | ||||
| - `BASEADDRESS` – absolute mirror root (e.g., `https://mirror-primary.stella-ops.org`). | ||||
| - `INDEXPATH` – relative path to the mirror index (`/concelier/exports/index.json` by default). | ||||
| - `DOMAINID` – mirror domain identifier from the index (`primary`, `community`, etc.). | ||||
| - `HTTPTIMEOUT` – request timeout; raise when mirrors sit behind slow WAN links. | ||||
| - `SIGNATURE__ENABLED` – require detached JWS verification for `bundle.json`. | ||||
| - `SIGNATURE__KEYID` / `SIGNATURE__PROVIDER` – expected signing key metadata. | ||||
| - `SIGNATURE__PUBLICKEYPATH` – PEM fallback used when the mirror key registry is offline. | ||||
|  | ||||
| The connector keeps a per-export fingerprint (bundle digest + generated-at timestamp) and tracks outstanding document IDs. If a scan is interrupted, the next run resumes parse/map work using the stored fingerprint and pending document lists—no network requests are reissued unless the upstream digest changes. | ||||
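
The resume behaviour can be pictured with the following Python sketch. The `MirrorCursor` type, its field names, and the three outcome labels are hypothetical; only the rule (download again when the upstream digest changes, otherwise resume pending work) comes from the paragraph above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MirrorCursor:
    """Hypothetical view of the stored per-export fingerprint and pending work."""
    bundle_digest: Optional[str] = None
    generated_at: Optional[str] = None
    pending_documents: list = field(default_factory=list)


def plan_next_run(cursor: MirrorCursor, upstream_digest: str) -> str:
    """Decide what the next run does, based on the stored fingerprint."""
    if cursor.bundle_digest != upstream_digest:
        return "download"  # upstream changed (or first run): fetch the bundle
    if cursor.pending_documents:
        return "resume"  # same bundle: finish parse/map without refetching
    return "idle"  # nothing new and nothing pending
```

This is why an interrupted scan in an air-gapped environment does not need the mirror to be reachable again until a new export is published.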
|  | ||||
| ## 2. Secret & certificate layout | ||||
|  | ||||
| ### Docker Compose (`deploy/compose/docker-compose.mirror.yaml`) | ||||
|  | ||||
| - `deploy/compose/env/mirror.env.example` – copy to `deploy/compose/env/mirror.env` (see §3.1) and adjust quotas or domain IDs. | ||||
| - `deploy/compose/mirror-secrets/` – mount read-only into `/run/secrets`. Place: | ||||
|   - `concelier-authority-client` – Authority client secret. | ||||
|   - `excititor-authority-client` (optional) – reserve for future authn. | ||||
| - `deploy/compose/mirror-gateway/tls/` – PEM-encoded cert/key pairs: | ||||
|   - `mirror-primary.crt`, `mirror-primary.key` | ||||
|   - `mirror-community.crt`, `mirror-community.key` | ||||
| - `deploy/compose/mirror-gateway/secrets/` – htpasswd files: | ||||
|   - `mirror-primary.htpasswd` | ||||
|   - `mirror-community.htpasswd` | ||||
|  | ||||
| ### Helm (`deploy/helm/stellaops/values-mirror.yaml`) | ||||
|  | ||||
| Create secrets in the target namespace: | ||||
|  | ||||
| ```bash | ||||
| kubectl create secret generic concelier-mirror-auth \ | ||||
|   --from-file=concelier-authority-client=concelier-authority-client | ||||
|  | ||||
| kubectl create secret generic excititor-mirror-auth \ | ||||
|   --from-file=excititor-authority-client=excititor-authority-client | ||||
|  | ||||
| kubectl create secret tls mirror-gateway-tls \ | ||||
|   --cert=mirror-primary.crt --key=mirror-primary.key | ||||
|  | ||||
| kubectl create secret generic mirror-gateway-htpasswd \ | ||||
|   --from-file=mirror-primary.htpasswd --from-file=mirror-community.htpasswd | ||||
| ``` | ||||
|  | ||||
| > Keep Basic Auth lists short-lived (rotate quarterly) and document credential recipients. | ||||
|  | ||||
| ## 3. Deployment | ||||
|  | ||||
| ### 3.1 Docker Compose (edge mirrors, lab validation) | ||||
|  | ||||
| 1. `cp deploy/compose/env/mirror.env.example deploy/compose/env/mirror.env` | ||||
| 2. Populate secrets/tls directories as described above. | ||||
| 3. Sync mirror bundles (see §4) into `deploy/compose/mirror-data/…` and ensure they are mounted | ||||
|    on the host path backing the `concelier-exports` and `excititor-exports` volumes. | ||||
| 4. Run the profile validator: `deploy/tools/validate-profiles.sh`. | ||||
| 5. Launch from `deploy/compose/`: `docker compose --env-file env/mirror.env -f docker-compose.mirror.yaml up -d`. | ||||
|  | ||||
| ### 3.2 Helm (production mirrors) | ||||
|  | ||||
| 1. Provision PVCs sized for mirror bundles (baseline: 20 GiB per domain). | ||||
| 2. Create the Authority, TLS, and htpasswd secrets (§2). | ||||
| 3. `helm upgrade --install mirror deploy/helm/stellaops -f deploy/helm/stellaops/values-mirror.yaml`. | ||||
| 4. Annotate the `stellaops-mirror-gateway` service with ingress/LoadBalancer metadata required by | ||||
|    your CDN (e.g., AWS load balancer scheme internal + NLB idle timeout). | ||||
|  | ||||
| ## 4. Artifact sync workflow | ||||
|  | ||||
| Mirrors never generate exports—they ingest signed bundles produced by the Concelier and Excititor | ||||
| export jobs. Recommended sync pattern: | ||||
|  | ||||
| ### 4.1 Compose host (systemd timer) | ||||
|  | ||||
| `/usr/local/bin/mirror-sync.sh`: | ||||
|  | ||||
| ```bash | ||||
| #!/usr/bin/env bash | ||||
| set -euo pipefail | ||||
| export AWS_ACCESS_KEY_ID=… | ||||
| export AWS_SECRET_ACCESS_KEY=… | ||||
|  | ||||
| aws s3 sync s3://mirror-stellaops/concelier/latest \ | ||||
|   /opt/stellaops/mirror-data/concelier --delete --size-only | ||||
|  | ||||
| aws s3 sync s3://mirror-stellaops/excititor/latest \ | ||||
|   /opt/stellaops/mirror-data/excititor --delete --size-only | ||||
| ``` | ||||
|  | ||||
| Schedule with a systemd timer every 5 minutes. The Compose volumes mount `/opt/stellaops/mirror-data/*` | ||||
| into the containers read-only, matching `CONCELIER__MIRROR__EXPORTROOT=/exports/json` and | ||||
| `EXCITITOR__ARTIFACTS__FILESYSTEM__ROOT=/exports`. | ||||
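The timer itself is not shown in this runbook. A minimal sketch, assuming unit names `mirror-sync.service` and `mirror-sync.timer` (hypothetical), could be generated like this:

```bash
# Write a oneshot service and a 5-minute timer for mirror-sync.sh into a
# target directory (normally /etc/systemd/system; a scratch dir for a dry run).
write_mirror_sync_units() {
  unit_dir="$1"
  mkdir -p "$unit_dir"
  cat > "$unit_dir/mirror-sync.service" <<'EOF'
[Unit]
Description=Sync Stella Ops mirror bundles
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/mirror-sync.sh
EOF
  cat > "$unit_dir/mirror-sync.timer" <<'EOF'
[Unit]
Description=Run mirror-sync every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Typical use (as root):
#   write_mirror_sync_units /etc/systemd/system
#   systemctl daemon-reload
#   systemctl enable --now mirror-sync.timer
```

A oneshot service paired with a timer keeps overlapping runs from stacking up, since systemd will not start the service again while a previous sync is still active.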
|  | ||||
| ### 4.2 Kubernetes (CronJob) | ||||
|  | ||||
| Create a CronJob running the AWS CLI (or rclone) in the same namespace, writing into the PVCs: | ||||
|  | ||||
| ```yaml | ||||
| apiVersion: batch/v1 | ||||
| kind: CronJob | ||||
| metadata: | ||||
|   name: mirror-sync | ||||
| spec: | ||||
|   schedule: "*/5 * * * *" | ||||
|   jobTemplate: | ||||
|     spec: | ||||
|       template: | ||||
|         spec: | ||||
|           containers: | ||||
|           - name: sync | ||||
|             image: public.ecr.aws/aws-cli/aws-cli@sha256:5df5f52c29f5e3ba46d0ad9e0e3afc98701c4a0f879400b4c5f80d943b5fadea | ||||
|             command: | ||||
|               - /bin/sh | ||||
|               - -c | ||||
|               - > | ||||
|                 aws s3 sync s3://mirror-stellaops/concelier/latest /exports/concelier --delete --size-only && | ||||
|                 aws s3 sync s3://mirror-stellaops/excititor/latest /exports/excititor --delete --size-only | ||||
|             volumeMounts: | ||||
|               - name: concelier-exports | ||||
|                 mountPath: /exports/concelier | ||||
|               - name: excititor-exports | ||||
|                 mountPath: /exports/excititor | ||||
|             envFrom: | ||||
|               - secretRef: | ||||
|                   name: mirror-sync-aws | ||||
|           restartPolicy: OnFailure | ||||
|           volumes: | ||||
|             - name: concelier-exports | ||||
|               persistentVolumeClaim: | ||||
|                 claimName: concelier-mirror-exports | ||||
|             - name: excititor-exports | ||||
|               persistentVolumeClaim: | ||||
|                 claimName: excititor-mirror-exports | ||||
| ``` | ||||
|  | ||||
| ## 5. CDN integration | ||||
|  | ||||
| 1. Point the CDN origin at the mirror gateway (Compose host or Kubernetes LoadBalancer). | ||||
| 2. Honour the response headers emitted by the gateway and Concelier/Excititor: | ||||
|    `Cache-Control: public, max-age=300, immutable` for mirror payloads. | ||||
| 3. Configure origin shields in the CDN to prevent cache stampedes. Recommended TTLs: | ||||
|    - Index (`/concelier/exports/index.json`, `/excititor/mirror/*/index`) → 60 s. | ||||
|    - Bundle/manifest payloads → 300 s. | ||||
| 4. Forward the `Authorization` header—Basic Auth terminates at the gateway. | ||||
| 5. Enforce per-domain rate limits at the CDN (matching gateway budgets) and enable logging | ||||
|    to SIEM for anomaly detection. | ||||
|  | ||||
| ## 6. Smoke tests | ||||
|  | ||||
| After each deployment or sync cycle (temporarily set low budgets if you need to observe 429 responses): | ||||
|  | ||||
| ```bash | ||||
| # Index with Basic Auth | ||||
| curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/index.json | jq 'keys' | ||||
|  | ||||
| # Mirror manifest signature and cache headers | ||||
| curl -u $PRIMARY_CREDS -I https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/manifest.json \ | ||||
|   | tee /tmp/manifest-headers.txt | ||||
| grep -iE '^Cache-Control: ' /tmp/manifest-headers.txt   # case-insensitive (HTTP/2 lowercases names); expect public, max-age=300, immutable | ||||
|  | ||||
| # Excititor consensus bundle metadata | ||||
| curl -u $COMMUNITY_CREDS https://mirror-community.stella-ops.org/excititor/mirror/community/index \ | ||||
|   | jq '.exports[].exportKey' | ||||
|  | ||||
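| # Signed bundle + detached JWS (spot check digests); fetch the bundle and its signature | ||||
| curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/bundle.json \ | ||||
|   -o bundle.json | ||||
| curl -u $PRIMARY_CREDS https://mirror-primary.stella-ops.org/concelier/exports/mirror/primary/bundle.json.jws \ | ||||
|   -o bundle.json.jws | ||||
| cosign verify-blob --signature bundle.json.jws --key mirror-key.pub bundle.json | ||||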
|  | ||||
| # Service-level auth check (inside cluster – no gateway credentials) | ||||
| kubectl exec deploy/stellaops-concelier -- curl -si http://localhost:8443/concelier/exports/mirror/primary/manifest.json \ | ||||
|   | head -n 5   # expect HTTP/1.1 401 with WWW-Authenticate: Bearer | ||||
|  | ||||
| # Rate limit smoke (repeat quickly; second call should return 429 + Retry-After) | ||||
| for i in 1 2; do | ||||
|   curl -s -o /dev/null -D - https://mirror-primary.stella-ops.org/concelier/exports/index.json \ | ||||
|     -u $PRIMARY_CREDS | grep -iE '^(HTTP/|Retry-After:)' | ||||
|   sleep 1 | ||||
| done | ||||
| ``` | ||||
|  | ||||
| Watch the gateway metrics (`nginx_vts` or access logs) for cache hits. In Kubernetes, `kubectl logs deploy/stellaops-mirror-gateway` | ||||
| should show `X-Cache-Status: HIT/MISS`. | ||||
|  | ||||
| ## 7. Maintenance & rotation | ||||
|  | ||||
| - **Bundle freshness** – alert if sync job lag exceeds 15 minutes or if `concelier` logs | ||||
|   `Mirror export root is not configured`. | ||||
| - **Secret rotation** – change Authority client secrets and Basic Auth credentials quarterly. | ||||
|   Update the mounted secrets and restart deployments (`docker compose restart concelier` or | ||||
|   `kubectl rollout restart deploy/stellaops-concelier`). | ||||
| - **TLS renewal** – reissue certificates, place new files, and reload gateway (`docker compose exec mirror-gateway nginx -s reload`). | ||||
| - **Quota tuning** – adjust per-domain `MAXDOWNLOADREQUESTSPERHOUR` in `.env` or values file. | ||||
|   Align CDN rate limits and inform downstreams. | ||||
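The bundle-freshness alert can be approximated with a marker file. This sketch assumes the sync script touches a marker (for example `/opt/stellaops/mirror-data/.last-sync`, a hypothetical path) after each successful run, and it relies on GNU `stat`:

```bash
# Print the age in seconds of a marker file. An optional second argument
# overrides "now" (epoch seconds), which is handy for dry runs.
marker_age_seconds() {
  marker="$1"
  now="${2:-$(date +%s)}"
  mtime=$(stat -c %Y "$marker") || return 1
  echo $(( now - mtime ))
}

# Succeed when the last successful sync is within max_lag seconds
# (default 900 s, the 15-minute threshold); fail when stale or missing.
sync_is_fresh() {
  marker="$1"; max_lag="${2:-900}"; now="${3:-}"
  age=$(marker_age_seconds "$marker" "$now") || return 1
  [ "$age" -le "$max_lag" ]
}

# Example alert hook:
#   sync_is_fresh /opt/stellaops/mirror-data/.last-sync || echo "mirror sync is stale" >&2
```

File modification times inside the export tree are a poor proxy here because `aws s3 sync` sets them from the upstream object timestamps, which is why the sketch uses a separate marker.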
|  | ||||
| ## 8. References | ||||
|  | ||||
| - Deployment profiles: `deploy/compose/docker-compose.mirror.yaml`, | ||||
|   `deploy/helm/stellaops/values-mirror.yaml` | ||||
| - Mirror architecture dossiers: `docs/modules/concelier/architecture.md`, | ||||
|   `docs/modules/excititor/mirrors.md` | ||||
| - Export bundling: `docs/modules/devops/architecture.md` §3, `docs/modules/excititor/architecture.md` §7 | ||||
							
								
								
									
										22
									
								
								docs/modules/devops/AGENTS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,22 @@ | ||||
| # DevOps agent guide | ||||
|  | ||||
| ## Mission | ||||
| The DevOps module captures release, deployment, and migration playbooks that keep StellaOps deterministic across environments. | ||||
|  | ||||
| ## Key docs | ||||
| - [Module README](./README.md) | ||||
| - [Architecture](./architecture.md) | ||||
| - [Implementation plan](./implementation_plan.md) | ||||
| - [Task board](./TASKS.md) | ||||
|  | ||||
| ## How to get started | ||||
| 1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module. | ||||
| 2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED). | ||||
| 3. Read the architecture and README for domain context before editing code or docs. | ||||
| 4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan. | ||||
|  | ||||
| ## Guardrails | ||||
| - Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md). | ||||
| - Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts. | ||||
| - Keep Offline Kit parity in mind—document air-gapped workflows for any new feature. | ||||
| - Update runbooks/observability assets when operational characteristics change. | ||||
							
								
								
									
										41
									
								
								docs/modules/devops/README.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,41 @@ | ||||
| # StellaOps DevOps | ||||
|  | ||||
| The DevOps module captures release, deployment, and migration playbooks that keep StellaOps deterministic across environments. | ||||
|  | ||||
| ## Responsibilities | ||||
| - Maintain CI pipelines, signing workflows, and release packaging steps. | ||||
| - Operate shared runbooks for launch readiness, upgrades, and NuGet previews. | ||||
| - Provide offline kit assembly instructions and tooling integration. | ||||
| - Wrap observability/telemetry bootstrap flows for platform teams. | ||||
|  | ||||
| ## Key components | ||||
| - Runbooks under ./runbooks/ (launch, deployment, nuget). | ||||
| - Migration guidance under ./migrations/. | ||||
| - Architecture overview bridging CI/CD & infrastructure concerns. | ||||
|  | ||||
| ## Integrations & dependencies | ||||
| - Ops pipelines (Gitea, GitHub Actions) and artifact registries. | ||||
| - Authority/Signer for supply chain signing. | ||||
| - Telemetry stack bootstrap scripts. | ||||
|  | ||||
| ## Operational notes | ||||
| - Offline bundle packaging guidance in docs/modules/export-center/operations/runbook.md. | ||||
| - Dashboards for launch cutover rehearsals. | ||||
| - Coordination with Security for enforced guardrails. | ||||
|  | ||||
| ## Related resources | ||||
| - ./runbooks/launch-readiness.md | ||||
| - ./runbooks/launch-cutover.md | ||||
| - ./runbooks/deployment-upgrade.md | ||||
| - ./runbooks/nuget-preview-bootstrap.md | ||||
| - ./migrations/semver-style.md | ||||
|  | ||||
| ## Backlog references | ||||
| - DEVOPS-LAUNCH-18-001 / 18-900 runbooks in ../../TASKS.md. | ||||
| - Telemetry bootstrap automation tracked in `ops/devops/TASKS.md`. | ||||
|  | ||||
| ## Epic alignment | ||||
| - **Epic 1 – AOC enforcement:** bake AOC verifier steps, CI guards, and schema validation into pipelines. | ||||
| - **Epic 9 – Orchestrator Dashboard:** support operational dashboards, job recovery runbooks, and rate-limit governance. | ||||
| - **Epic 10 – Export Center:** manage signing workflows, Offline Kit packaging, and release promotion for exports. | ||||
| - **Epic 15 – Observability & Forensics:** coordinate telemetry deployment, evidence retention, and forensic automation. | ||||
							
								
								
									
										9
									
								
								docs/modules/devops/TASKS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,9 @@ | ||||
| # Task board — DevOps | ||||
|  | ||||
| > Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable. | ||||
|  | ||||
| | ID | Status | Owner(s) | Description | Notes | | ||||
| |----|--------|----------|-------------|-------| | ||||
| | DEVOPS-DOCS-0001 | TODO | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md | | ||||
| | DEVOPS-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md | | ||||
| | DEVOPS-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow | | ||||
							
								
								
									
										488
									
								
								docs/modules/devops/architecture.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,488 @@ | ||||
| # component_architecture_devops.md — **Stella Ops Release & Operations** (2025Q4) | ||||
|  | ||||
| > Draws from the AOC guardrails, Orchestrator, Export Center, and Observability module plans to describe how Stella Ops is built, signed, distributed, and operated. | ||||
|  | ||||
| > **Scope.** Implementation‑ready blueprint for **how Stella Ops is built, versioned, signed, distributed, upgraded, licensed (PoE)**, and operated in customer environments (online and air‑gapped). Covers reproducible builds, supply‑chain attestations, registries, offline kits, migration/rollback, artifact lifecycle (RustFS default + Mongo, S3 fallback), monitoring SLOs, and customer activation. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 0) Product vision (operations lens) | ||||
|  | ||||
| Stella Ops must be **trustable at a glance** and **boringly operable**: | ||||
|  | ||||
| * Every release ships with **first‑party SBOMs, provenance, and signatures**; services verify **each other’s** integrity at runtime. | ||||
| * Customers can deploy by **digest** and stay aligned with **LTS/stable/edge** channels. | ||||
| * Paid customers receive **attestation authority** (Signer accepts their PoE) while the core platform remains **free to run**. | ||||
| * Air‑gapped customers receive **offline kits** with verifiable digests and deterministic import. | ||||
| * Artifacts expire predictably; operators know what’s kept, for how long, and why. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) Release trains & versioning | ||||
|  | ||||
| ### 1.1 Channels | ||||
|  | ||||
| * **LTS** (12‑month support window): quarterly cadence (Q1/Q2/Q3/Q4). | ||||
| * **Stable** (default): monthly rollup (bug fixes + compatible features). | ||||
| * **Edge**: weekly; for early adopters, no guarantees. | ||||
|  | ||||
| ### 1.2 Version strings | ||||
|  | ||||
| Semantic core + calendar tag: | ||||
|  | ||||
| ``` | ||||
| <MAJOR>.<MINOR>.<PATCH>  (<YYYY>.<MM>)   e.g., 2.4.1 (2027.06) | ||||
| ``` | ||||
|  | ||||
| * **MAJOR**: breaking API/DB changes (rare). | ||||
| * **MINOR**: new features, compatible schema migrations (expand/contract pattern). | ||||
| * **PATCH**: bug fixes, perf and security updates. | ||||
| * **Calendar tag** exposes **release year** used by Signer for **PoE window checks**. | ||||
|  | ||||
| ### 1.3 Component alignment | ||||
|  | ||||
| A release is a **bundle** of image digests + charts + manifests. All services in a bundle are **wire‑compatible**. Mixed minor versions are allowed within a bounded skew: | ||||
|  | ||||
| * **Web UI ↔ backend**: `±1 minor`. | ||||
| * **Scanner ↔ Policy/Excititor/Concelier**: `±1 minor`. | ||||
| * **Authority/Signer/Attestor triangle**: **must** be same minor (crypto and DPoP/mTLS binding rules). | ||||
|  | ||||
| At startup, services **self‑advertise** their semver & channel; the UI surfaces **mismatch warnings**. | ||||
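The skew rules can be expressed as a small check. This is a sketch only: it assumes plain `MAJOR.MINOR.PATCH` strings and additionally requires matching major versions, which the text implies but does not state.

```bash
# Succeed when two semver strings share a major version and their minor
# versions differ by at most max_skew (default 1).
minor_skew_ok() {
  a="$1"; b="$2"; max_skew="${3:-1}"
  a_major=${a%%.*}; a_rest=${a#*.}; a_minor=${a_rest%%.*}
  b_major=${b%%.*}; b_rest=${b#*.}; b_minor=${b_rest%%.*}
  [ "$a_major" -eq "$b_major" ] || return 1
  diff=$(( a_minor - b_minor ))
  if [ "$diff" -lt 0 ]; then diff=$(( -diff )); fi
  [ "$diff" -le "$max_skew" ]
}

# Web UI vs backend, or Scanner vs Policy/Excititor/Concelier (skew of 1):
#   minor_skew_ok 2.4.1 2.5.0 && echo "compatible"
# Authority/Signer/Attestor must share the same minor (skew of 0):
#   minor_skew_ok 2.4.1 2.4.3 0 && echo "compatible"
```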
|  | ||||
| --- | ||||
|  | ||||
| ## 2) Supply‑chain pipeline (how a release is built) | ||||
|  | ||||
| ### 2.1 Deterministic builds | ||||
|  | ||||
| * **Builders**: isolated **BuildKit** workers with pinned base images (digest only). | ||||
| * **Pinning**: dependency manifests and lock files (`go.mod`, `package-lock.json`, `global.json`, `Directory.Packages.props`) are **frozen** at tag. | ||||
| * **Reproducibility**: timestamps normalized; source date epoch; deterministic zips/tars. | ||||
| * **Multi‑arch**: linux/amd64 + linux/arm64 (Windows images track M2 roadmap). | ||||
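To make the reproducibility bullet concrete, here is one way to build a deterministic tar archive. It is a sketch of the general technique, not the project's build script, and it needs GNU tar 1.28 or later for `--sort=name`.

```bash
# Create a tar archive whose bytes depend only on file names, modes, and
# contents: entries are sorted, timestamps are clamped to SOURCE_DATE_EPOCH,
# and ownership is normalised to 0:0.
deterministic_tar() {
  src_dir="$1"; out="$2"
  epoch="${SOURCE_DATE_EPOCH:-0}"
  tar --sort=name \
      --mtime="@${epoch}" \
      --owner=0 --group=0 --numeric-owner \
      -C "$src_dir" -cf "$out" .
}

# Example:
#   SOURCE_DATE_EPOCH=$(git log -1 --format=%ct) deterministic_tar ./dist dist.tar
```

Two checkouts of the same commit should then produce byte-identical archives, which can be confirmed with `cmp` or by comparing SHA-256 digests.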
|  | ||||
| ### 2.2 First‑party SBOMs & provenance | ||||
|  | ||||
| * Each image gets **CycloneDX (JSON+Protobuf) SBOM** and **SLSA‑style provenance** attached as **OCI referrers**. | ||||
| * Scanner’s **Buildx generator** is used to produce SBOMs *during* build; a separate post‑build scan verifies parity (red flag if drift). | ||||
| * **Release manifest** (see §6.1) lists all digests and SBOM/attestation refs. | ||||
|  | ||||
| ### 2.3 Signing & transparency | ||||
|  | ||||
| * Images are **cosign‑signed** (keyless) with a Stella Ops release identity; inclusion in a **transparency log** (Rekor) is required. | ||||
| * SBOM and provenance attestations are **DSSE** and also transparency‑logged. | ||||
| * Release keys (Fulcio roots or public keys) are embedded in **Signer** policy (for **scanner‑release validation** at customer side). | ||||
|  | ||||
| ### 2.4 Gates & tests | ||||
|  | ||||
| * **Static**: linters, codegen checks, protobuf API freeze (backward‑compat tests). | ||||
| * **Unit/integration**: per‑component, plus **end‑to‑end** flows (scan→vex→policy→sign→attest). | ||||
| * **Perf SLOs**: hot paths (SBOM compose, diff, export) measured against budgets. | ||||
| * **Security**: dependency audit vs Concelier export; container hardening tests; minimal caps. | ||||
| * **Analyzer smoke**: restart-time language plug-ins (currently Python) verified via `dotnet run --project src/Tools/LanguageAnalyzerSmoke` to ensure manifest integrity plus cold vs warm determinism (< 30 s / < 5 s budgets); the harness logs deviations from repository goldens for follow-up. | ||||
| * **Canary cohort**: internal staging + selected customers; one week on **edge** before **stable** tag. | ||||
|  | ||||
| ### 2.5 Debug-store artefacts | ||||
|  | ||||
| * Every release exports stripped debug information for ELF binaries discovered in service images. Debug files follow the GNU build-id layout (`debug/.build-id/<aa>/<rest>.debug`) and are generated via `objcopy --only-keep-debug`. | ||||
| * `debug/debug-manifest.json` captures build-id → component/image/source mappings with SHA-256 checksums so operators can mirror the directory into debuginfod or offline symbol stores. The manifest (and its `.sha256` companion) ships with every release bundle and Offline Kit. | ||||
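The build-id layout maps directly onto a path helper. The function name is hypothetical; the commented usage assumes binutils (`readelf`, `objcopy`) is installed and that `./service` stands in for a real binary.

```bash
# Map a GNU build-id (hex string) to its location in the debug store:
# the first two hex characters form the directory, the rest the file name.
debug_path_for_build_id() {
  build_id="$1"
  prefix=$(printf '%s' "$build_id" | cut -c1-2)
  rest=$(printf '%s' "$build_id" | cut -c3-)
  printf 'debug/.build-id/%s/%s.debug\n' "$prefix" "$rest"
}

# Extracting symbols for one binary:
#   id=$(readelf -n ./service | awk '/Build ID:/ {print $3}')
#   out=$(debug_path_for_build_id "$id")
#   mkdir -p "$(dirname "$out")"
#   objcopy --only-keep-debug ./service "$out"
```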
|  | ||||
| --- | ||||
|  | ||||
| ## 3) Distribution & activation | ||||
|  | ||||
| ### 3.1 Registries | ||||
|  | ||||
| * **Primary**: `registry.stella-ops.org` (OCI v2, supports Referrers API). | ||||
| * **Mirrors**: GHCR (read‑only), regional mirrors for latency. | ||||
|   * Operational runbook: see `docs/modules/concelier/operations/mirror.md` for deployment profiles, CDN guidance, and sync automation. | ||||
| * **Pull by digest only** in Kubernetes/Compose manifests. | ||||
|  | ||||
| **Gating policy**: | ||||
|  | ||||
| * **Core images** (Authority, Scanner, Concelier, Excititor, Attestor, UI): public **read**. | ||||
| * **Enterprise add‑ons** (if any) and **pre‑release**: private repos via the **Registry Token Service** (`src/Registry/StellaOps.Registry.TokenService`) which exchanges Authority-issued OpToks for short-lived Docker registry bearer tokens. | ||||
|  | ||||
| > Monetization lever is **signing** (PoE gate), not image pulls, so the core remains simple to consume. | ||||
|  | ||||
| ### 3.2 OAuth2 token service (for private repos) | ||||
|  | ||||
| * Docker Registry’s token flow backed by **Authority**: | ||||
|  | ||||
|   1. Client hits registry (`401` with `WWW-Authenticate: Bearer realm=…`). | ||||
|   2. Client gets an **access token** from the token service (validated by Authority) with `scope=repository:…:pull`. | ||||
|   3. Registry allows pull for the requested repo. | ||||
| * Tokens are **short‑lived** (60–300 s) and **DPoP‑bound**. | ||||
|  | ||||
| The token service enforces plan gating via `registry-token.yaml` (see `docs/modules/registry/operations/token-service.md`) and exposes Prometheus metrics (`registry_token_issued_total`, `registry_token_rejected_total`). Revoked licence identifiers halt issuance even when scope requirements are met. | ||||
|  | ||||
| ### 3.3 Offline kits (air‑gapped) | ||||
|  | ||||
| * Tarball per release channel: | ||||
|  | ||||
|   ``` | ||||
|   stellaops-kit-<ver>-<channel>.tar.zst | ||||
|     /images/   OCI layout with all first-party images (multi-arch) | ||||
|     /sboms/    CycloneDX JSON+PB for each image | ||||
|     /attest/   DSSE bundles + Rekor proofs | ||||
|     /charts/   Helm charts + values templates | ||||
|     /compose/  docker-compose.yml + .env template | ||||
|     /plugins/  Concelier/Excititor connectors (restart-time) | ||||
|     /policy/   example policies | ||||
|     /manifest/ release.yaml  (see §6.1) | ||||
|   ``` | ||||
| * Import via CLI `offline kit import`; checks digests and signatures before load. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4) Licensing (PoE) & monetization | ||||
|  | ||||
| **Principle**: **Only paid Stella Ops issues valid signed attestations.** Running the stack is free; signing requires PoE. | ||||
|  | ||||
| ### 4.1 PoE issuance | ||||
|  | ||||
| * Customers purchase a plan and obtain a **PoE artifact** from `www.stella-ops.org`: | ||||
|  | ||||
|   * **PoE‑JWT** (DPoP/mTLS‑bound) **or** **PoE mTLS client certificate**. | ||||
|   * Contains: `license_id`, `plan`, `valid_release_year`, `max_version`, `exp`, optional `tenant/customer` IDs. | ||||
|  | ||||
| ### 4.2 Online enforcement | ||||
|  | ||||
| * **Signer** calls **Licensing /license/introspect** on every signing request (see signer doc). | ||||
| * If **revoked/expired/out‑of‑window** → deny with machine‑readable reason. | ||||
| * All **valid** bundles are DSSE‑signed and **Attestor** logs them; Rekor UUID returned. | ||||
| * UI badges: “**Verified by Stella Ops**” with link to the public log. | ||||
|  | ||||
| ### 4.3 Air‑gapped / offline | ||||
|  | ||||
| * Customers obtain a **time‑boxed PoE lease** (signed JSON, 7–30 days). | ||||
| * Signer accepts the lease and emits **provisional** attestations (clearly labeled). | ||||
| * When connectivity returns, a background job **endorses** the provisional entries with the cloud service, updating their status to **verified**. | ||||
| * Operators can export a **verification bundle** for auditors even before endorsement (contains DSSE + local Rekor proof + lease snapshot). | ||||
|  | ||||
| ### 4.4 Stolen/abused PoE | ||||
|  | ||||
| * Customers report theft; **Licensing** flags `license_id` as **revoked**. | ||||
| * Subsequent Signer requests are **denied**; previous attestations remain but can be marked **contested** (UI shows badge, optional re‑sign path upon new PoE). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Deployment path (customer side) | ||||
|  | ||||
| ### 5.1 First install | ||||
|  | ||||
| * **Helm** (Kubernetes) or **Compose** (VMs). Example (K8s): | ||||
|  | ||||
| ```bash | ||||
| helm repo add stellaops https://charts.stella-ops.org | ||||
| helm install stella stellaops/platform \ | ||||
|   --version 2.4.0 \ | ||||
|   --set global.channel=stable \ | ||||
|   --set authority.issuer=https://authority.stella.local \ | ||||
|   --set scanner.minio.endpoint=http://minio.stella.local:9000 \ | ||||
|   --set scanner.mongo.uri=mongodb://mongo/scanner \ | ||||
|   --set concelier.mongo.uri=mongodb://mongo/concelier \ | ||||
|   --set excititor.mongo.uri=mongodb://mongo/excititor | ||||
| ``` | ||||
|  | ||||
| * Post‑install job registers **Authority clients** (Scanner, Signer, Attestor, UI) and prints **bootstrap** URLs and client credentials (sealed secrets). | ||||
| * UI banner shows **release bundle** and verification state (cosign OK? Rekor OK?). | ||||
|  | ||||
| ### 5.2 Updates | ||||
|  | ||||
| * **Blue/green**: pull new bundle by **digest**; deploy side‑by‑side; cut traffic. | ||||
|  | ||||
| * **Rolling**: upgrade components in a safe order: | ||||
|  | ||||
|   1. Authority (stateless, dual‑key rotation ready) | ||||
|   2. Signer/Attestor (same minor) | ||||
|   3. Scanner WebService & Workers | ||||
|   4. Concelier, then Excititor (schema migrations are expand/contract) | ||||
|   5. UI last | ||||
|  | ||||
| * **DB migrations** are **expand/contract**: | ||||
|  | ||||
|   * Phase A (release N): **add** new fields/indexes, write old+new. | ||||
|   * Phase B (N+1): **read** new fields; **drop** old. | ||||
|   * Rollback is a matter of redeploying previous images and keeping both schemas valid. | ||||
|  | ||||
| ### 5.3 Rollback | ||||
|  | ||||
| * Images are referenced by **digest**; keep the release manifests for the previous `K` releases. | ||||
| * `helm rollback` or compose `docker compose -f release-K.yml up -d`. | ||||
| * Mongo migrations are additive; **no destructive changes** within a single minor. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Release payloads & manifests | ||||
|  | ||||
| ### 6.1 Release manifest (`release.yaml`) | ||||
|  | ||||
| ```yaml | ||||
| release: | ||||
|   version: "2.4.1" | ||||
|   channel: "stable" | ||||
|   date: "2027-06-20T12:00:00Z" | ||||
|   calendar: "2027.06" | ||||
|   components: | ||||
|     - name: scanner-webservice | ||||
|       image: registry.stella-ops.org/stellaops/scanner-web@sha256:aa..bb | ||||
|       sbom: oci://.../referrers/cdx-json@sha256:11..22 | ||||
|       provenance: oci://.../attest/provenance@sha256:33..44 | ||||
|       signature: { rekorUUID: "…" } | ||||
|     - name: signer | ||||
|       image: registry.stella-ops.org/stellaops/signer@sha256:cc..dd | ||||
|       signature: { rekorUUID: "…" } | ||||
|   charts: | ||||
|     - name: platform | ||||
|       version: "2.4.1" | ||||
|       digest: "sha256:ee..ff" | ||||
|   compose: | ||||
|     file: "docker-compose.yml" | ||||
|     digest: "sha256:77..88" | ||||
|   checksums: | ||||
|     sha256: "… digest of this release.yaml …" | ||||
| ``` | ||||
|  | ||||
| The manifest is **cosign‑signed**; UI/CLI can verify a bundle without talking to registries. | ||||
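Once the manifest signature has been verified, individual payloads can be checked against the digests it lists. A minimal sketch, assuming `sha256sum` from coreutils and digests written in the `sha256:<hex>` form used above:

```bash
# Compare a local file against a "sha256:<hex>" digest string.
verify_digest() {
  file="$1"; expected="${2#sha256:}"
  actual=$(sha256sum "$file" 2>/dev/null | cut -d' ' -f1)
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Example, using the compose digest copied from release.yaml:
#   verify_digest docker-compose.yml "sha256:<digest from manifest>" \
#     || { echo "docker-compose.yml does not match the release manifest" >&2; exit 1; }
```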
|  | ||||
| > Deployment guardrails – The repository keeps channel-aligned Compose bundles | ||||
| > in `deploy/compose/` and Helm overlays in `deploy/helm/stellaops/`. Both sets | ||||
| > pull their digests from `deploy/releases/` and are validated by | ||||
| > `deploy/tools/validate-profiles.sh` to guarantee lint/dry-run cleanliness. | ||||
|  | ||||
| ### 6.2 Image labels (release metadata) | ||||
|  | ||||
| Each image sets OCI labels: | ||||
|  | ||||
| ``` | ||||
| org.opencontainers.image.version = "2.4.1" | ||||
| org.opencontainers.image.revision = "<git sha>" | ||||
| org.opencontainers.image.created = "2027-06-20T12:00:00Z" | ||||
| org.stellaops.release.calendar = "2027.06" | ||||
| org.stellaops.release.channel  = "stable" | ||||
| org.stellaops.build.slsaProvenance = "oci://…" | ||||
| ``` | ||||
|  | ||||
| Signer validates **scanner** image’s cosign identity + calendar tag for **release window** checks. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 7) Artifact lifecycle & storage (RustFS/Mongo) | ||||
|  | ||||
| ### 7.1 Buckets & prefixes (RustFS) | ||||
|  | ||||
| ``` | ||||
| rustfs://stellaops/ | ||||
|   scanner/ | ||||
|     layers/<sha256>/sbom.cdx.json.zst | ||||
|     images/<imgDigest>/inventory.cdx.pb | ||||
|     images/<imgDigest>/usage.cdx.pb | ||||
|     diffs/<old>_<new>/diff.json.zst | ||||
|     attest/<artifactSha256>.dsse.json | ||||
|   concelier/ | ||||
|     json/<exportId>/... | ||||
|     trivy/<exportId>/... | ||||
|   excititor/ | ||||
|     exports/<exportId>/... | ||||
|   attestor/ | ||||
|     dsse/<bundleSha256>.json | ||||
|     proof/<rekorUuid>.json | ||||
| ``` | ||||
|  | ||||
| ### 7.2 ILM classes | ||||
|  | ||||
| * **`short`**: working artifacts (diffs, queues) — TTL 7–14 days. | ||||
| * **`default`**: SBOMs & indexes — TTL 90–180 days (configurable). | ||||
| * **`compliance`**: signed reports & attested exports — retention enforced via RustFS hold or S3 Object Lock (governance/compliance) 1–7 years. | ||||
|  | ||||
| ### 7.3 Artifact Lifecycle Controller (ALC) | ||||
|  | ||||
| * A background worker (part of Scanner.WebService) enforces **TTL** and **reference counting**: | ||||
|  | ||||
|   * Artifacts referenced by **reports** or **tickets** are pinned. | ||||
|   * ILM actions logged; UI shows per‑class usage & upcoming purges. | ||||
|  | ||||
| > **Migration note.** Follow `docs/modules/scanner/operations/rustfs-migration.md` when transitioning existing | ||||
| > MinIO buckets to RustFS. The provided migrator is idempotent and safe to rerun per prefix. | ||||
|  | ||||
| ### 7.4 Mongo retention | ||||
|  | ||||
| * **Scanner**: `runtime.events` use TTL (e.g., 30–90 days); **catalog** permanent. | ||||
| * **Concelier/Excititor**: raw docs keep **last N windows**; canonical stores permanent. | ||||
| * **Attestor**: `entries` permanent; `dedupe` TTL 24–48h. | ||||
|  | ||||
| ### 7.5 Mongo server baseline | ||||
|  | ||||
| * **Minimum supported server:** MongoDB **4.2+**. Driver 3.5.0 removes compatibility shims for 4.0; upstream has already announced 4.0 support will be dropped in upcoming C# driver releases. | ||||
| * **Deploy images:** Compose/Helm defaults stay on `mongo:7.x`. For air-gapped installs, refresh Offline Kit bundles so the packaged `mongod` matches ≥4.2. | ||||
| * **Upgrade guard:** During rollout, verify replica sets reach FCV `4.2` or above before swapping binaries; automation should hard-stop if FCV is <4.2. | ||||
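
A hard-stop on FCV could look roughly like the sketch below. It assumes the automation has already read the version string (for example from the `getParameter` admin command with `featureCompatibilityVersion: 1`); the function name `fcv_allows_upgrade` is hypothetical.

```python
def fcv_allows_upgrade(fcv, minimum=(4, 2)):
    """Return True when an FCV string such as "4.2" meets the minimum."""
    major, minor = (int(part) for part in fcv.split(".")[:2])
    # Tuple comparison avoids the pitfalls of comparing version strings.
    return (major, minor) >= minimum
```

Automation would abort the rollout whenever this returns `False` for any replica set member.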
|  | ||||
| --- | ||||
|  | ||||
| ## 8) Observability & SLOs (operations) | ||||
|  | ||||
| * **Uptime SLO**: 99.9% for Signer/Authority/Attestor; 99.5% for Scanner WebService; Excititor/Concelier 99.0%. | ||||
| * **Error budgets**: tracked per month; dashboards show burn rates. | ||||
| * **Golden signals**: | ||||
|  | ||||
|   * **Latency**: token issuance, sign→attest round‑trip, scan enqueue→emit, export build. | ||||
|   * **Saturation**: queue depth, Mongo write IOPS, RustFS throughput / queue depth (or S3 metrics when in fallback mode). | ||||
|   * **Traffic**: scans/min, attestations/min, webhook admits/min. | ||||
|   * **Errors**: 5xx rates, cosign verification failures, Rekor timeouts. | ||||
|  | ||||
| Prometheus + OTLP; Grafana dashboards ship in the charts. | ||||
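
To make the error-budget language concrete, the sketch below converts an uptime SLO into allowed downtime and a simple burn rate. It assumes a 30-day month and an even budget spend; both function names are illustrative and not part of the shipped dashboards.

```python
def error_budget_minutes(slo_percent, days=30):
    """Allowed downtime in minutes for an uptime SLO over `days`."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - slo_percent) / 100


def burn_rate(downtime_minutes, elapsed_days, slo_percent, days=30):
    """Budget consumed relative to an even spend over the window.

    Values above 1.0 mean the budget will run out before the window ends.
    """
    budget = error_budget_minutes(slo_percent, days)
    expected_so_far = budget * elapsed_days / days
    return downtime_minutes / expected_so_far
```

Under these assumptions a 99.9% SLO allows roughly 43.2 minutes of downtime per month, 99.5% allows 216 minutes, and 99.0% allows 432 minutes.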
|  | ||||
| --- | ||||
|  | ||||
| ## 9) Security & compliance operations | ||||
|  | ||||
| * **Key rotation**: | ||||
|  | ||||
|   * Authority JWKS: 60‑day cadence, dual‑key overlap. | ||||
|   * Release signing identities: rotate per minor or quarterly. | ||||
|   * Sigstore roots mirrored and pinned; alarms on drift. | ||||
|  | ||||
| * **FIPS mode** (Gov build): | ||||
|  | ||||
|   * Enforce `ES256` + KMS/HSM; disable Ed25519; TLS ciphers only. | ||||
|   * Local **Rekor v2** and **Fulcio** alternatives; **air‑gapped** CA. | ||||
|  | ||||
| * **Vulnerability response**: | ||||
|  | ||||
|   * Concelier red-flag advisories trigger accelerated **stable** patch rollout; UI/CLI “security patch available” notice. | ||||
|   * 2025-10: Pinned `MongoDB.Driver` **3.5.0** and `SharpCompress` **0.41.0** across services (DEVOPS-SEC-10-301) to eliminate NU1902/NU1903 warnings surfaced during scanner cache/worker test runs; repacked the local `Mongo2Go` feed so test fixtures inherit the patched dependencies; future bumps follow the same central override pattern. | ||||
|  | ||||
| * **Backups/DR**: | ||||
|  | ||||
|   * Mongo nightly snapshots; MinIO versioning + replication (if configured). | ||||
|   * Restore runbooks tested quarterly with synthetic data. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 10) Customer update flow (how versions are fetched & activated) | ||||
|  | ||||
| ### 10.1 Online clusters | ||||
|  | ||||
| * **UI** surfaces update banner with **release manifest** diff and risk notes. | ||||
| * Operator approves → **Controller** pulls new images by digest; health‑checks; moves traffic; deprecates old revision. | ||||
| * Post‑switch, **schema Phase B** migrations (if any) run automatically. | ||||
|  | ||||
| ### 10.2 Air‑gapped clusters | ||||
|  | ||||
| * Operator downloads **offline kit** from a mirror → `stellaops offline kit import`. | ||||
| * Controller validates bundle checksums and **cosign signatures**; applies charts/compose by digest. | ||||
| * After install, **verify** page shows green checks: image sigs, SBOMs attached, provenance logged. | ||||
|  | ||||
| ### 10.3 CLI self‑update (optional) | ||||
|  | ||||
| * `stellaops self-update` pulls a **signed release manifest** and verifies the **CLI binary** with cosign before swapping (admin can disable). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 11) Compatibility & deprecation policy | ||||
|  | ||||
| * **APIs** are stable within a **major**; breaking changes imply **MAJOR++** and deprecation period of one minor. | ||||
| * **Storage**: expand/contract; “drop old fields” only after one minor grace. | ||||
| * **Config**: feature flags (default off) for risky features (e.g., eBPF). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 12) Runbooks (selected) | ||||
|  | ||||
| ### 12.1 Lost PoE | ||||
|  | ||||
| 1. Suspend **automatic attestation** jobs. | ||||
| 2. Use CLI `stellaops signer status` to confirm `entitlement_denied`. | ||||
| 3. Obtain new PoE from portal; verify on Signer `/poe/verify`. | ||||
| 4. Re‑enable; optionally **re‑sign** last N reports (UI button → batch). | ||||
|  | ||||
| ### 12.2 Rekor outage (self‑hosted) | ||||
|  | ||||
| * Attestor returns `202 (pending)` with queued proof fetch. | ||||
| * Keep DSSE bundles locally; re‑submit on schedule; UI badge shows **Pending**. | ||||
| * If outage > SLA, you can switch to a **mirror** log in config; Attestor writes to both when restored. | ||||
|  | ||||
| ### 12.3 Emergency downgrade | ||||
|  | ||||
| * Identify prior release manifest (UI → Admin → Releases). | ||||
| * `helm rollback stella <revision>` (or compose apply previous file). | ||||
| * Services tolerate skew per §1.3; ensure **Signer/Authority/Attestor** are rolled together. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 13) Example: cluster bootstrap (Compose) | ||||
|  | ||||
| ```yaml | ||||
| version: "3.9" | ||||
| services: | ||||
|   authority: | ||||
|     image: registry.stella-ops.org/stellaops/authority@sha256:... | ||||
|     env_file: ./env/authority.env | ||||
|     ports: ["8440:8440"] | ||||
|   signer: | ||||
|     image: registry.stella-ops.org/stellaops/signer@sha256:... | ||||
|     depends_on: [authority] | ||||
|     environment: | ||||
|       - SIGNER__POE__LICENSING__INTROSPECTURL=https://www.stella-ops.org/api/v1/license/introspect | ||||
|   attestor: | ||||
|     image: registry.stella-ops.org/stellaops/attestor@sha256:... | ||||
|     depends_on: [signer] | ||||
|   scanner-web: | ||||
|     image: registry.stella-ops.org/stellaops/scanner-web@sha256:... | ||||
|     environment: | ||||
|       - SCANNER__S3__ENDPOINT=http://minio:9000 | ||||
|   scanner-worker: | ||||
|     image: registry.stella-ops.org/stellaops/scanner-worker@sha256:... | ||||
|     deploy: { replicas: 4 } | ||||
|   concelier: | ||||
|     image: registry.stella-ops.org/stellaops/concelier@sha256:... | ||||
|   excititor: | ||||
|     image: registry.stella-ops.org/stellaops/excititor@sha256:... | ||||
|   web-ui: | ||||
|     image: registry.stella-ops.org/stellaops/web-ui@sha256:... | ||||
|   mongo: | ||||
|     image: mongo:7 | ||||
|   minio: | ||||
|     image: minio/minio:RELEASE.2025-07-10T00-00-00Z | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 14) Governance & keys (who owns the trust root) | ||||
|  | ||||
| * **Release key policy**: only the Release Engineering group can push signed releases; 4‑eyes approval; TUF‑style manifest possible in future. | ||||
| * **Signer acceptance policy**: embedded release identities are updated **only** via minor upgrade; emergency CRL supported. | ||||
| * **Customer keys**: none needed for core use; enterprise add‑ons may require per‑customer registries and keys. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 15) Roadmap (Ops) | ||||
|  | ||||
| * **Windows containers GA** (Scanner + Zastava). | ||||
| * **Key Transparency** for Signer certs. | ||||
| * **Delta‑kit** (offline) for incremental updates. | ||||
| * **Operator CRDs** (K8s) to manage policy and ILM declaratively. | ||||
| * **SBOM protobuf** as default transport at rest (smaller, faster). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ### Appendix A — Minimal SLO monitors | ||||
|  | ||||
| * `authority.tokens_issued_total` slope ≈ normal. | ||||
| * `signer.requests_total{result="success"}/minute` > 0 (when scans occur). | ||||
| * `attestor.submit_latency_seconds{quantile="0.95"}` < 0.3. | ||||
| * `scanner.scan_latency_seconds{quantile="0.95"}` < target per image size. | ||||
| * `concelier.export.duration_seconds` stable; `excititor.consensus.conflicts_total` not exploding after policy changes. | ||||
| * RustFS request error rate near zero (or `s3_requests_errors_total` when operating against S3); Mongo `opcounters` hit expected baseline. | ||||
|  | ||||
| ### Appendix B — Upgrade safety checklist | ||||
|  | ||||
| * Verify **release manifest** signature. | ||||
| * Ensure **Signer/Authority/Attestor** are same minor. | ||||
| * Verify **DB backups** < 24h old. | ||||
| * Confirm **ILM** won’t purge compliance artifacts during upgrade window. | ||||
| * Roll **one component** at a time; watch SLOs; abort on regression. | ||||
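
Two of the checklist items lend themselves to automation. The sketch below assumes `major.minor.patch` version strings (such as `2025.09.2`) and a known backup timestamp; `TRUST_CORE`, `same_minor`, and `backup_is_fresh` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Components that must be rolled together on the same minor.
TRUST_CORE = ("signer", "authority", "attestor")


def same_minor(versions):
    """True when Signer/Authority/Attestor share the same major.minor."""
    minors = {tuple(versions[name].split(".")[:2]) for name in TRUST_CORE}
    return len(minors) == 1


def backup_is_fresh(backup_time, now, max_age=timedelta(hours=24)):
    """True when the latest DB backup is younger than `max_age`."""
    return now - backup_time < max_age
```

A pre-upgrade gate would require both checks to pass before the first component is rolled.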
|  | ||||
| --- | ||||
|  | ||||
| **End — component_architecture_devops.md** | ||||
							
								
								
									
										22
									
								
								docs/modules/devops/implementation_plan.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,22 @@ | ||||
| # Implementation plan — DevOps | ||||
|  | ||||
| ## Current objectives | ||||
| - Maintain deterministic behaviour and offline parity across releases. | ||||
| - Keep documentation, telemetry, and runbooks aligned with the latest sprint outcomes. | ||||
|  | ||||
| ## Workstreams | ||||
| - Backlog grooming: reconcile open stories in ../../TASKS.md with this module's roadmap. | ||||
| - Implementation: collaborate with service owners to land feature work defined in SPRINTS/EPIC docs. | ||||
| - Validation: extend tests/fixtures to preserve determinism and provenance requirements. | ||||
|  | ||||
| ## Epic milestones | ||||
| - **Epic 1 – AOC enforcement:** ensure CI/CD guardrails, schema validation, and verifier pipelines are enforced. | ||||
| - **Epic 9 – Orchestrator Dashboard:** deliver dashboards, recovery runbooks, and rate-limit governance. | ||||
| - **Epic 10 – Export Center:** manage signing/promotions and Offline Kit bundle publishing. | ||||
| - **Epic 15 – Observability & Forensics:** coordinate telemetry deployments, evidence retention, and forensic automation. | ||||
| - Track module runbooks (DEVOPS-LAUNCH-18-001/900) and telemetry automation via ../../TASKS.md and ops/devops/TASKS.md. | ||||
|  | ||||
| ## Coordination | ||||
| - Review ./AGENTS.md before picking up new work. | ||||
| - Sync with cross-cutting teams noted in ../../implplan/SPRINTS.md. | ||||
| - Update this plan whenever scope, dependencies, or guardrails change. | ||||
							
								
								
									
										50
									
								
								docs/modules/devops/migrations/semver-style.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,50 @@ | ||||
| # SemVer Style Backfill Runbook | ||||
|  | ||||
| _Last updated: 2025-10-11_ | ||||
|  | ||||
| ## Overview | ||||
|  | ||||
| The SemVer style migration populates the new `normalizedVersions` field on advisory documents and ensures | ||||
| provenance `decisionReason` values are preserved during future reads. The migration is idempotent and only | ||||
| runs when the feature flag `concelier.storage.enableSemVerStyle` is enabled. | ||||
|  | ||||
| ## Preconditions | ||||
|  | ||||
| 1. **Review configuration** – set `concelier.storage.enableSemVerStyle` to `true` on all Concelier services. | ||||
| 2. **Confirm batch size** – adjust `concelier.storage.backfillBatchSize` if you need smaller batches for older | ||||
|    deployments (default: `250`). | ||||
| 3. **Back up** – capture a fresh snapshot of the `advisory` collection or a full MongoDB backup. | ||||
| 4. **Staging dry-run** – enable the flag in a staging environment and observe the migration output before | ||||
|    rolling to production. | ||||
|  | ||||
| ## Execution | ||||
|  | ||||
| No manual command is required. After deploying the configuration change, restart the Concelier WebService or | ||||
| any component that hosts the Mongo migration runner. During startup you will see log entries similar to: | ||||
|  | ||||
| ``` | ||||
| Applying Mongo migration 20251011-semver-style-backfill: Populate advisory.normalizedVersions for existing documents when SemVer style storage is enabled. | ||||
| Mongo migration 20251011-semver-style-backfill applied | ||||
| ``` | ||||
|  | ||||
| The migration reads advisories in batches (`concelier.storage.backfillBatchSize`) and writes flattened | ||||
| `normalizedVersions` arrays. Existing documents without SemVer ranges remain untouched. | ||||
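
The batching and flattening behaviour can be pictured with the in-memory sketch below. It is not the migration's code: the document shape (`affectedPackages`, `identifier`, per-package `normalizedVersions`) and all function names are assumptions chosen for illustration.

```python
from itertools import islice


def batches(items, size=250):
    """Yield lists of at most `size` items, mirroring the backfill batch size."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def flatten_normalized_versions(advisory):
    """Collect per-package rules into one list (field names are hypothetical)."""
    flattened = []
    for package in advisory.get("affectedPackages", []):
        for rule in package.get("normalizedVersions", []):
            flattened.append({"package": package.get("identifier"), **rule})
    return flattened


def backfill(advisories, size=250):
    """Return the number of advisories updated; documents without rules stay untouched."""
    updated = 0
    for chunk in batches(advisories, size):
        for advisory in chunk:
            flattened = flatten_normalized_versions(advisory)
            if flattened and advisory.get("normalizedVersions") != flattened:
                advisory["normalizedVersions"] = flattened
                updated += 1
    return updated
```

Because a document is only rewritten when the flattened array differs from what is stored, a second run changes nothing, which is the idempotency property the runbook relies on.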
|  | ||||
| ## Post-checks | ||||
|  | ||||
| 1. Verify the new indexes exist: | ||||
|    ``` | ||||
|    db.advisory.getIndexes() | ||||
|    ``` | ||||
|    You should see `advisory_normalizedVersions_pkg_scheme_type` and `advisory_normalizedVersions_value`. | ||||
| 2. Spot check a few advisories to confirm the top-level `normalizedVersions` array exists and matches | ||||
|    the embedded package data. | ||||
| 3. Run `dotnet test` for `StellaOps.Concelier.Storage.Mongo.Tests` (optional but recommended) in CI to confirm | ||||
|    the storage suite passes with the feature flag enabled. | ||||
|  | ||||
| ## Rollback | ||||
|  | ||||
| Set `concelier.storage.enableSemVerStyle` back to `false` and redeploy. The migration will be skipped on | ||||
| subsequent startups. You can leave the populated `normalizedVersions` arrays in place; they are ignored when | ||||
| the feature flag is off. If you must remove them entirely, restore from the backup captured in the | ||||
| Preconditions step. | ||||
							
								
								
									
										151
									
								
								docs/modules/devops/runbooks/deployment-upgrade.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,151 @@ | ||||
| # Stella Ops Deployment Upgrade & Rollback Runbook | ||||
|  | ||||
| _Last updated: 2025-10-26 (Sprint 14 – DEVOPS-OPS-14-003)._ | ||||
|  | ||||
| This runbook describes how to promote a new release across the supported deployment profiles (Helm and Docker Compose), how to roll back safely, and how to keep channels (`edge`, `stable`, `airgap`) aligned. All steps assume you are working from a clean checkout of the release branch/tag. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1. Channel overview | ||||
|  | ||||
| | Channel | Release manifest | Helm values | Compose profile | | ||||
| |---------|------------------|-------------|-----------------| | ||||
| | `edge`  | `deploy/releases/2025.10-edge.yaml` | `deploy/helm/stellaops/values-dev.yaml` | `deploy/compose/docker-compose.dev.yaml` | | ||||
| | `stable` | `deploy/releases/2025.09-stable.yaml` | `deploy/helm/stellaops/values-stage.yaml`, `deploy/helm/stellaops/values-prod.yaml` | `deploy/compose/docker-compose.stage.yaml`, `deploy/compose/docker-compose.prod.yaml` | | ||||
| | `airgap` | `deploy/releases/2025.09-airgap.yaml` | `deploy/helm/stellaops/values-airgap.yaml` | `deploy/compose/docker-compose.airgap.yaml` | | ||||
|  | ||||
| Infrastructure components (MongoDB, MinIO, RustFS) are pinned in the release manifests and inherited by the deployment profiles. Supporting dependencies such as `nats` remain on upstream LTS tags; review `deploy/compose/*.yaml` for the authoritative set. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2. Pre-flight checklist | ||||
|  | ||||
| 1. **Refresh release manifest**   | ||||
|    Pull the latest manifest for the channel you are promoting (`deploy/releases/<version>-<channel>.yaml`). | ||||
|  | ||||
| 2. **Align deployment bundles with the manifest**   | ||||
|    Run the alignment checker for every profile that should pick up the release. Pass `--ignore-repo nats` to skip auxiliary services. | ||||
|    ```bash | ||||
|    ./deploy/tools/check-channel-alignment.py \ | ||||
|        --release deploy/releases/2025.10-edge.yaml \ | ||||
|        --target deploy/helm/stellaops/values-dev.yaml \ | ||||
|        --target deploy/compose/docker-compose.dev.yaml \ | ||||
|        --ignore-repo nats | ||||
|    ``` | ||||
|    Repeat for other channels (`stable`, `airgap`), substituting the manifest and target files. | ||||
|  | ||||
| 3. **Lint and template profiles** | ||||
|    ```bash | ||||
|    ./deploy/tools/validate-profiles.sh | ||||
|    ``` | ||||
|  | ||||
| 4. **Smoke the Offline Kit debug store (edge/stable only)**   | ||||
|    When the release pipeline has generated `out/release/debug/.build-id/**`, mirror the assets into the Offline Kit staging tree: | ||||
|    ```bash | ||||
|    ./ops/offline-kit/mirror_debug_store.py \ | ||||
|        --release-dir out/release \ | ||||
|        --offline-kit-dir out/offline-kit | ||||
|    ``` | ||||
|    Archive the resulting `out/offline-kit/metadata/debug-store.json` alongside the kit bundle. | ||||
|  | ||||
| 5. **Review compatibility matrix**   | ||||
|    Confirm MongoDB, MinIO, and RustFS versions in the release manifest match platform SLOs. The default targets are `mongo@sha256:c258…`, `minio@sha256:14ce…`, `rustfs:2025.10.0-edge`. | ||||
|  | ||||
| 6. **Create a rollback bookmark**   | ||||
|    Record the current Helm revision (`helm history stellaops -n stellaops`) and compose tag (`git describe --tags`) before applying changes. | ||||
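
Conceptually, the alignment check in step 2 compares the image references pinned in the release manifest against those in each deployment profile. The sketch below shows that idea only; it is not the logic of `check-channel-alignment.py`, and the `{repository: reference}` input shape and the `find_misaligned` name are assumptions.

```python
def find_misaligned(release_images, target_images, ignore_repos=()):
    """Return {repo: (expected, actual)} for references that differ from the release.

    Both inputs are hypothetical {repository: digest-or-tag} mappings that the
    caller has already extracted from the manifest and the profile.
    """
    mismatches = {}
    for repo, expected in release_images.items():
        if repo in ignore_repos:
            continue
        actual = target_images.get(repo)
        # A repository absent from the profile is skipped, not flagged.
        if actual is not None and actual != expected:
            mismatches[repo] = (expected, actual)
    return mismatches
```

An empty result means the profile matches the manifest for every image it deploys; passing `ignore_repos=("nats",)` plays the role of `--ignore-repo nats`.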
|  | ||||
| --- | ||||
|  | ||||
| ## 3. Helm upgrade procedure (staging → production) | ||||
|  | ||||
| 1. Switch to the deployment branch and ensure secrets/config maps are current. | ||||
| 2. Apply the upgrade in the staging cluster: | ||||
|    ```bash | ||||
|    helm upgrade stellaops deploy/helm/stellaops \ | ||||
|      -f deploy/helm/stellaops/values-stage.yaml \ | ||||
|      --namespace stellaops \ | ||||
|      --atomic \ | ||||
|      --timeout 15m | ||||
|    ``` | ||||
| 3. Run smoke tests (`scripts/smoke-tests.sh` or environment-specific checks). | ||||
| 4. Promote to production using the prod values file and the same command. | ||||
| 5. Record the new revision number and Git SHA in the change log. | ||||
|  | ||||
| ### Rollback (Helm) | ||||
|  | ||||
| 1. Identify the previous revision: `helm history stellaops -n stellaops`. | ||||
| 2. Execute: | ||||
|    ```bash | ||||
|    helm rollback stellaops <revision> \ | ||||
|      --namespace stellaops \ | ||||
|      --wait \ | ||||
|      --timeout 10m | ||||
|    ``` | ||||
| 3. Verify `kubectl get pods` returns healthy workloads; rerun smoke tests. | ||||
| 4. Update the incident/operations log with root cause and rollback details. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4. Docker Compose upgrade procedure | ||||
|  | ||||
| 1. Update environment files (`deploy/compose/env/*.env.example`) with any new settings and sync secrets to hosts. | ||||
| 2. Pull the tagged repository state corresponding to the release (e.g. `git checkout 2025.09.2` for stable). | ||||
| 3. Apply the upgrade: | ||||
|    ```bash | ||||
|    docker compose \ | ||||
|      --env-file deploy/compose/env/prod.env \ | ||||
|      -f deploy/compose/docker-compose.prod.yaml \ | ||||
|      pull | ||||
|  | ||||
|    docker compose \ | ||||
|      --env-file deploy/compose/env/prod.env \ | ||||
|      -f deploy/compose/docker-compose.prod.yaml \ | ||||
|      up -d | ||||
|    ``` | ||||
| 4. Tail logs for critical services (`docker compose logs -f authority concelier`). | ||||
| 5. Check monitoring dashboards/alerts to confirm normal operation. | ||||
|  | ||||
| ### Rollback (Compose) | ||||
|  | ||||
| 1. Check out the previous release tag (e.g. `git checkout 2025.09.1`). | ||||
| 2. Re-run `docker compose pull` and `docker compose up -d` with that profile. Docker will restore the prior digests. | ||||
| 3. If reverting to a known-good snapshot is required, restore volume backups (see `docs/modules/authority/operations/backup-restore.md` and associated service guides). | ||||
| 4. Log the rollback in the operations journal. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5. Channel promotion workflow | ||||
|  | ||||
| 1. Author or update the channel manifest under `deploy/releases/`. | ||||
| 2. Mirror the new digests into Helm/Compose values and run the alignment script for each profile. | ||||
| 3. Commit the changes with a message that references the release version and channel (e.g. `deploy: promote 2025.10.0-edge`). | ||||
| 4. Publish release notes and update `deploy/releases/README.md` (if applicable). | ||||
| 5. Tag the repository when promoting stable or airgap builds. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6. Upgrade rehearsal & rollback drill log | ||||
|  | ||||
| Maintain rehearsal notes in `docs/modules/devops/runbooks/launch-cutover.md` or the relevant sprint planning document. After each drill capture: | ||||
|  | ||||
| - Release version tested | ||||
| - Date/time | ||||
| - Participants | ||||
| - Issues encountered & fixes | ||||
| - Rollback duration (if executed) | ||||
|  | ||||
| Attach the log to the sprint retro or operational wiki. | ||||
|  | ||||
| | Date (UTC) | Channel | Outcome | Notes | | ||||
| |------------|---------|---------|-------| | ||||
| | 2025-10-26 | Documentation dry-run | Planned | Runbook refreshed; next live drill scheduled for 2025-11 edge → stable promotion. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 7. References | ||||
|  | ||||
| - `deploy/README.md` – structure and validation workflow for deployment bundles. | ||||
| - `docs/13_RELEASE_ENGINEERING_PLAYBOOK.md` – release automation and signing pipeline. | ||||
| - `docs/modules/devops/architecture.md` – high-level DevOps architecture, SLOs, and compliance requirements. | ||||
| - `ops/offline-kit/mirror_debug_store.py` – debug-store mirroring helper. | ||||
| - `deploy/tools/check-channel-alignment.py` – release vs deployment digest alignment checker. | ||||
							
								
								
									
										128
									
								
								docs/modules/devops/runbooks/launch-cutover.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,128 @@ | ||||
| # Launch Cutover Runbook - Stella Ops | ||||
|  | ||||
| _Document owner: DevOps Guild (2025-10-26)_   | ||||
| _Scope:_ Full-platform launch from staging to production for release `2025.09.2`. | ||||
|  | ||||
| ## 1. Roles and Communication | ||||
|  | ||||
| | Role | Primary | Backup | Contact | | ||||
| | --- | --- | --- | --- | | ||||
| | Cutover lead | DevOps Guild (on-call engineer) | Platform Ops lead | `#launch-bridge` (Mattermost) | | ||||
| | Authority stack | Authority Core guild rep | Security guild rep | `#authority` | | ||||
| | Scanner / Queue | Scanner WebService guild rep | Runtime guild rep | `#scanner` | | ||||
| | Storage | Mongo/MinIO operators | Backup DB admin | Pager escalation | | ||||
| | Observability | Telemetry guild rep | SRE on-call | `#telemetry` | | ||||
| | Approvals | Product owner + CTO | DevOps lead | Approval recorded in change ticket | | ||||
|  | ||||
| Set up a bridge call 30 minutes before start and keep `#launch-bridge` updated every 10 minutes. | ||||
|  | ||||
| ## 2. Timeline Overview (UTC) | ||||
|  | ||||
| | Time | Activity | Owner | | ||||
| | --- | --- | --- | | ||||
| | T-24h | Change ticket approved, prod secrets verified, offline kit build status checked (`DEVOPS-OFFLINE-18-005`). | DevOps lead | | ||||
| | T-12h | Run `deploy/tools/validate-profiles.sh`; capture logs in ticket. | DevOps engineer | | ||||
| | T-6h | Freeze non-launch deployments; notify guild leads. | Product owner | | ||||
| | T-2h | Execute rehearsal in staging (Section 3) using `values-stage.yaml` to verify scripts. | DevOps + module reps | | ||||
| | T-30m | Final go/no-go with guild leads; confirm monitoring dashboards green. | Cutover lead | | ||||
| | T0 | Execute production cutover steps (Section 4). | Cutover team | | ||||
| | T+45m | Smoke tests complete (Section 5); announce success or trigger rollback. | Cutover lead | | ||||
| | T+4h | Post-cutover metrics review, notify stakeholders, close ticket. | DevOps + product owner | | ||||
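
When the cutover start time is fixed, the relative checkpoints in the table can be turned into absolute UTC times for the change ticket. This is a small helper sketch; `OFFSETS` and `schedule` are illustrative names.

```python
from datetime import datetime, timedelta, timezone

# Offsets relative to T0, taken from the timeline table above.
OFFSETS = {
    "T-24h": timedelta(hours=-24),
    "T-12h": timedelta(hours=-12),
    "T-6h": timedelta(hours=-6),
    "T-2h": timedelta(hours=-2),
    "T-30m": timedelta(minutes=-30),
    "T0": timedelta(0),
    "T+45m": timedelta(minutes=45),
    "T+4h": timedelta(hours=4),
}


def schedule(t0):
    """Map each checkpoint label to its absolute time for a given T0."""
    return {label: t0 + offset for label, offset in OFFSETS.items()}
```

Passing a timezone-aware `datetime` in UTC keeps the resulting schedule unambiguous for participants in different regions.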
|  | ||||
| ## 3. Rehearsal (Staging) Checklist | ||||
|  | ||||
| 1. `docker network create stellaops_frontdoor || true` (if not present on staging jump host). | ||||
| 2. Run `deploy/tools/validate-profiles.sh` and archive output. | ||||
| 3. Apply staging secrets (`kubectl apply -f secrets/stage/*.yaml` or `helm secrets upgrade`) ensuring `stellaops-stage` credentials align with `values-stage.yaml`. | ||||
| 4. Perform `helm upgrade stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-stage.yaml` in staging cluster. | ||||
| 5. Verify health endpoints: `curl https://authority.stage.../healthz`, `curl https://scanner.stage.../healthz`. | ||||
| 6. Execute smoke CLI: `stellaops-cli scan submit --profile staging --sbom samples/sbom/demo.json` and confirm report status in UI. | ||||
| 7. Document total wall time and any deviations in the rehearsal log. | ||||
|  | ||||
| Rehearsal must complete without manual interventions before proceeding to production. | ||||
|  | ||||
| ## 4. Production Cutover Steps | ||||
|  | ||||
| ### 4.1 Pre-flight | ||||
| - Confirm production secrets in the appropriate secret store (`stellaops-prod-core`, `stellaops-prod-mongo`, `stellaops-prod-minio`, `stellaops-prod-notify`) contain the keys referenced in `values-prod.yaml`. | ||||
| - Ensure the external reverse proxy network exists: `docker network create stellaops_frontdoor || true` on each compose host. | ||||
| - Back up current configuration and data: | ||||
|   - Mongo snapshot: `mongodump --uri "$MONGO_BACKUP_URI" --out /backups/launch-$(date -Iseconds)`. | ||||
|   - MinIO bucket mirror: `mc mirror --overwrite minio/stellaops minio-backup/stellaops-$(date +%Y%m%d%H%M)`. | ||||
|  | ||||
| ### 4.2 Apply Updates (Compose) | ||||
| 1. On each compose node, pull updated images for release `2025.09.2`: | ||||
|    ```bash | ||||
|    docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml pull | ||||
|    ``` | ||||
| 2. Deploy changes: | ||||
|    ```bash | ||||
|    docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml up -d | ||||
|    ``` | ||||
| 3. Confirm containers healthy via `docker compose ps` and `docker logs <service> --tail 50`. | ||||
|  | ||||
| ### 4.3 Apply Updates (Helm/Kubernetes) | ||||
| If using Kubernetes, perform: | ||||
| ```bash | ||||
| helm upgrade stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml --atomic --timeout 15m | ||||
| ``` | ||||
| Monitor rollout with `kubectl get pods -n stellaops --watch` and `kubectl rollout status deployment/<service>`. | ||||
|  | ||||
| ### 4.4 Configuration Validation | ||||
| - Verify Authority issuer metadata: `curl https://authority.prod.../.well-known/openid-configuration`. | ||||
| - Validate Signer DSSE endpoint: `stellaops-cli signer verify --base-url https://signer.prod... --bundle samples/dsse/demo.json`. | ||||
| - Check Scanner queue connectivity: `docker exec stellaops-scanner-web dotnet StellaOps.Scanner.WebService.dll health queue` (returns success). | ||||
| - Ensure Notify (legacy) still accessible while Notifier migration pending. | ||||
|  | ||||
| ## 5. Smoke Tests | ||||
|  | ||||
| | Test | Command / Action | Expected Result | | ||||
| | --- | --- | --- | | ||||
| | API health | `curl https://scanner.prod.../healthz` | HTTP 200 with `"status":"Healthy"` | | ||||
| | Scan submit | `stellaops-cli scan submit --profile prod --sbom samples/sbom/demo.json` | Scan completes < 5 minutes; report accessible with signed DSSE | | ||||
| | Runtime event ingest | Post sample event from Zastava observer fixture | `/runtime/events` responds 202 Accepted; record visible in Mongo `runtime_events` | | ||||
| | Signing | `stellaops-cli signer sign --bundle demo.json` | Returns DSSE with matching SHA256 and signer metadata | | ||||
| | Attestor verify | `stellaops-cli attestor verify --uuid <uuid>` | Verification result `ok=true` | | ||||
| | Web UI | Manual login, verify dashboards render and latency within budget | UI loads under 2 seconds; policy views consistent | | ||||
|  | ||||
| Log results in the change ticket with timestamps and screenshots where applicable. | ||||
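
The API health row can be evaluated mechanically once the response has been captured. The sketch below assumes the health endpoint returns a JSON object with a top-level `status` field, as the expected result suggests; `is_healthy` is a hypothetical helper.

```python
import json


def is_healthy(status_code, body):
    """True for HTTP 200 with a JSON body whose "status" is "Healthy"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # Non-JSON bodies (proxy error pages, empty replies) count as unhealthy.
        return False
    return isinstance(payload, dict) and payload.get("status") == "Healthy"
```

Treating unparsable bodies as failures avoids a false pass when a reverse proxy answers 200 on behalf of a service that is down.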
|  | ||||
| ## 6. Rollback Procedure | ||||
|  | ||||
| 1. Assess failure scope; if systemic, initiate rollback immediately while preserving logs/artifacts. | ||||
| 2. For Compose: | ||||
|    ```bash | ||||
|    docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml down | ||||
|    docker compose --env-file stage.env -f deploy/compose/docker-compose.stage.yaml up -d | ||||
|    ``` | ||||
| 3. For Helm: | ||||
|    ```bash | ||||
|    helm rollback stellaops <previous-release-number> --namespace stellaops | ||||
|    ``` | ||||
| 4. Restore Mongo snapshot if data inconsistency detected: `mongorestore --uri "$MONGO_BACKUP_URI" --drop /backups/launch-<timestamp>`. | ||||
| 5. Restore MinIO mirror if required: `mc mirror minio-backup/stellaops-<timestamp> minio/stellaops`. | ||||
| 6. Notify stakeholders of rollback and capture root cause notes in incident ticket. | ||||
|  | ||||
| ## 7. Post-cutover Actions | ||||
|  | ||||
| - Keep heightened monitoring for 4 hours post cutover; track latency, error rates, and queue depth. | ||||
| - Confirm audit trails: Authority tokens issued, Scanner events recorded, Attestor submissions stored. | ||||
| - Update `docs/modules/devops/runbooks/launch-readiness.md` if any new gaps or follow-ups discovered. | ||||
| - Schedule retrospective within 48 hours; include DevOps, module guilds, and product owner. | ||||
|  | ||||
| ## 8. Approval Matrix | ||||
|  | ||||
| | Step | Required Approvers | Record Location | | ||||
| | --- | --- | --- | | ||||
| | Production deployment plan | CTO + DevOps lead | Change ticket comment | | ||||
| | Cutover start (T0) | DevOps lead + module reps | `#launch-bridge` summary | | ||||
| | Post-smoke success | DevOps lead + product owner | Change ticket closure | | ||||
| | Rollback (if invoked) | DevOps lead + CTO | Incident ticket | | ||||
|  | ||||
| Retain all approvals and logs for audit. Update this runbook after each execution to record actual timings and lessons learned. | ||||
|  | ||||
| ## 9. Rehearsal Log | ||||
|  | ||||
| | Date (UTC) | What We Exercised | Outcome | Follow-up | | ||||
| | --- | --- | --- | --- | | ||||
| | 2025-10-26 | Dry-run of compose/Helm validation via `deploy/tools/validate-profiles.sh` (dev/stage/prod/airgap/mirror). Network creation simulated (`docker network create stellaops_frontdoor` planned) and stage CLI submission reviewed. | Validation script succeeded; all profiles templated cleanly. Stage deployment apply deferred because no staging cluster is accessible from the current environment. | Schedule full stage rehearsal once staging cluster credentials are available; reuse this log section to capture timings. | | ||||
							
								
								
									
										49
									
								
								docs/modules/devops/runbooks/launch-readiness.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,49 @@ | ||||
| # Launch Readiness Record - Stella Ops | ||||
|  | ||||
| _Updated: 2025-10-26 (UTC)_ | ||||
|  | ||||
| This document captures production launch sign-offs, deployment readiness checkpoints, and any open risks that must be tracked before GA cutover. | ||||
|  | ||||
| ## 1. Sign-off Summary | ||||
|  | ||||
| | Module / Service | Guild / Point of Contact | Evidence (Task or Runbook) | Status | Timestamp (UTC) | Notes | | ||||
| | --- | --- | --- | --- | --- | --- | | ||||
| | Authority (Issuer) | Authority Core Guild | `AUTH-AOC-19-001` - scope issuance & configuration complete (DONE 2025-10-26) | READY | 2025-10-26T14:05Z | Tenant scope propagation follow-up (`AUTH-AOC-19-002`) tracked in gaps section. | | ||||
| | Signer | Signer Guild | `SIGNER-API-11-101` / `SIGNER-REF-11-102` / `SIGNER-QUOTA-11-103` (DONE 2025-10-21) | READY | 2025-10-26T14:07Z | DSSE signing, referrer verification, and quota enforcement validated in CI. | | ||||
| | Attestor | Attestor Guild | `ATTESTOR-API-11-201` / `ATTESTOR-VERIFY-11-202` / `ATTESTOR-OBS-11-203` (DONE 2025-10-19) | READY | 2025-10-26T14:10Z | Rekor submission/verification pipeline green; telemetry pack published. | | ||||
| | Scanner Web + Worker | Scanner WebService Guild | `SCANNER-WEB-09-10x`, `SCANNER-RUNTIME-12-30x` (DONE 2025-10-18 -> 2025-10-24) | READY* | 2025-10-26T14:20Z | Orchestrator envelope work (`SCANNER-EVENTS-16-301/302`) still open; see gaps. | | ||||
| | Concelier Core & Connectors | Concelier Core / Ops Guild | Ops runbook sign-off in `docs/modules/concelier/operations/conflict-resolution.md` (2025-10-16) | READY | 2025-10-26T14:25Z | Conflict resolution & connector coverage accepted; Mongo schema hardening pending (see gaps). | | ||||
| | Excititor API | Excititor Core Guild | Wave 0 connector ingest sign-offs (EXECPLAN, section "Wave 0") | READY | 2025-10-26T14:28Z | VEX linkset publishing complete for launch datasets. | ||||
| | Notify Web (legacy) | Notify Guild | Existing stack carried forward; Notifier program tracked separately (Sprint 38-40) | PENDING | 2025-10-26T14:32Z | Legacy notify web remains operational; migration to Notifier blocked on `SCANNER-EVENTS-16-301`. | | ||||
| | Web UI | UI Guild | Stable build `registry.stella-ops.org/.../web-ui@sha256:10d9248...` deployed in stage and smoke-tested | READY | 2025-10-26T14:35Z | Policy editor GA items (Sprint 20) outside launch scope. | | ||||
| | DevOps / Release | DevOps Guild | `deploy/tools/validate-profiles.sh` run (2025-10-26) covering dev/stage/prod/airgap/mirror | READY | 2025-10-26T15:02Z | Compose/Helm lint + docker compose config validated; see Section 2 for details. | | ||||
| | Offline Kit | Offline Kit Guild | `DEVOPS-OFFLINE-18-004` (Go analyzer) and `DEVOPS-OFFLINE-18-005` (Python analyzer) complete; debug-store mirror pending (`DEVOPS-OFFLINE-17-004`). | PENDING | 2025-10-26T15:05Z | Awaiting release debug artefacts to finalise `DEVOPS-OFFLINE-17-004`; tracked in Section 3. | | ||||
|  | ||||
| _\* READY with caveat - remaining work noted in Section 3._ | ||||
|  | ||||
| ## 2. Deployment Readiness Checklist | ||||
|  | ||||
| - **Production profiles committed:** `deploy/compose/docker-compose.prod.yaml` and `deploy/helm/stellaops/values-prod.yaml` added with front-door network hand-off and secret references for Mongo/MinIO/core services. | ||||
| - **Secrets placeholders documented:** `deploy/compose/env/prod.env.example` enumerates required credentials (`MONGO_INITDB_ROOT_PASSWORD`, `MINIO_ROOT_PASSWORD`, Redis/NATS endpoints, `FRONTDOOR_NETWORK`). Helm values reference Kubernetes secrets (`stellaops-prod-core`, `stellaops-prod-mongo`, `stellaops-prod-minio`, `stellaops-prod-notify`). | ||||
| - **Static validation executed:** `deploy/tools/validate-profiles.sh` run on 2025-10-26 (docker compose config + helm lint/template) with all profiles passing. | ||||
| - **Ingress model defined:** Production compose profile introduces external `frontdoor` network; README updated with creation instructions and scope of externally reachable services. | ||||
| - **Observability hooks:** Authority/Signer/Attestor telemetry packs verified; scanner runtime build-id metrics landed (`SCANNER-RUNTIME-17-401`). Grafana dashboards referenced in component runbooks. | ||||
| - **Rollback assets:** Stage Compose profile remains aligned (`docker-compose.stage.yaml`), enabling rehearsals before prod cutover; release manifests (`deploy/releases/2025.09-stable.yaml`) map digests for reproducible rollback. | ||||
| - **Rehearsal status:** 2025-10-26 validation dry-run executed (`deploy/tools/validate-profiles.sh` across dev/stage/prod/airgap/mirror). Full stage Helm rollout pending access to the managed staging cluster; target to complete once credentials are provisioned. | ||||
|  | ||||
| ## 3. Outstanding Gaps & Follow-ups | ||||
|  | ||||
| | Item | Owner | Tracking Ref | Target / Next Step | Impact | | ||||
| | --- | --- | --- | --- | --- | | ||||
| | Tenant scope propagation and audit coverage | Authority Core Guild | `AUTH-AOC-19-002` (DOING 2025-10-26) | Land enforcement + audit fixtures by Sprint 19 freeze | Medium - required for multi-tenant GA but does not block initial cutover if tenants scoped manually. | | ||||
| | Orchestrator event envelopes + Notifier handshake | Scanner WebService Guild | `SCANNER-EVENTS-16-301` (BLOCKED), `SCANNER-EVENTS-16-302` (DOING) | Coordinate with Gateway/Notifier owners on preview package replacement or binding redirects; rerun `dotnet test` once patch lands and refresh schema docs. Share envelope samples in `docs/events/` after tests pass. | High - gating Notifier migration; legacy notify path remains functional meanwhile. | ||||
| | Offline Kit Python analyzer bundle | Offline Kit Guild + Scanner Guild | `DEVOPS-OFFLINE-18-005` (DONE 2025-10-26) | Monitor for follow-up manifest updates and rerun smoke script when analyzers change. | Medium - ensures language analyzer coverage stays current for offline installs. | | ||||
| | Offline Kit debug store mirror | Offline Kit Guild + DevOps Guild | `DEVOPS-OFFLINE-17-004` (BLOCKED 2025-10-26) | Release pipeline must publish `out/release/debug` artefacts; once available, run `mirror_debug_store.py` and commit `metadata/debug-store.json`. | Low - symbol lookup remains accessible from staging assets but required before next Offline Kit tag. | | ||||
| | Mongo schema validators for advisory ingestion | Concelier Storage Guild | `CONCELIER-STORE-AOC-19-001` (TODO) | Finalize JSON schema + migration toggles; coordinate with Ops for rollout window | Low - current validation handled in app layer; schema guard adds defense-in-depth. | | ||||
| | Authority plugin telemetry alignment | Security Guild | `SEC2.PLG`, `SEC3.PLG`, `SEC5.PLG` (BLOCKED pending AUTH DPoP/MTLS tasks) | Resume once upstream auth surfacing stabilises | Low - plugin remains optional; launch uses default Authority configuration. | | ||||
|  | ||||
| ## 4. Approvals & Distribution | ||||
|  | ||||
| - Record shared in `#launch-readiness` (Mattermost) 2025-10-26 15:15 UTC with DevOps + Guild leads for acknowledgement. | ||||
| - Updates to this document require dual sign-off from DevOps Guild (owner) and impacted module guild lead; retain change log via Git history. | ||||
| - Cutover rehearsal and rollback drills are tracked separately in `docs/modules/devops/runbooks/launch-cutover.md` (see associated Task `DEVOPS-LAUNCH-18-001`). | ||||
							
								
								
									
										64
									
								
								docs/modules/devops/runbooks/nuget-preview-bootstrap.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,64 @@ | ||||
| # NuGet Preview Bootstrap (Offline-Friendly) | ||||
|  | ||||
| The StellaOps build relies on .NET 10 RC2 packages (Microsoft.Extensions.*, JwtBearer 10.0 RC). | ||||
| `NuGet.config` now wires three sources: | ||||
|  | ||||
| 1. `local` → `./local-nuget` (preferred, air-gapped mirror) | ||||
| 2. `dotnet-public` → `https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-public/nuget/v3/index.json` | ||||
| 3. `nuget.org` → fallback for everything else | ||||
|  | ||||
| Follow the steps below whenever you refresh the repo or roll a new Offline Kit drop. | ||||
|  | ||||
| ## 1. Mirror the preview packages | ||||
|  | ||||
| ```bash | ||||
| ./ops/devops/sync-preview-nuget.sh | ||||
| ``` | ||||
|  | ||||
| * Reads `ops/devops/nuget-preview-packages.csv`. Each line specifies the package, version, expected SHA-256 hash, and (optionally) the flat-container base URL (we pin to `dotnet-public`). | ||||
| * Downloads the `.nupkg` straight into `./local-nuget/` and re-verifies the checksum. Existing files are skipped when hashes already match. | ||||
| * Use `NUGET_V2_BASE` if you need to temporarily point at a different mirror. | ||||
|  | ||||
| 💡 The script never mutates packages in place—if a checksum changes you will see a “SHA mismatch … refreshing” message. | ||||
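
The checksum step described above can be reproduced independently of the sync script. The sketch below is illustrative only and is not the logic of `sync-preview-nuget.sh`: it assumes the manifest has no header row, that the first three columns are package, version, and SHA-256, and that mirrored files are named `<package>.<version>.nupkg`. The helper names `sha256_of` and `check_mirror` are hypothetical.

```python
import csv
import hashlib
import io
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the lowercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_mirror(manifest_text: str, mirror_dir: Path) -> list[tuple[str, str]]:
    """Return (file name, state) pairs where state is ok, missing, or mismatch."""
    results = []
    for row in csv.reader(io.StringIO(manifest_text)):
        # Skip blank lines, comment lines, and rows without the three required columns.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        package, version, expected = (field.strip() for field in row[:3])
        # Assumed naming convention for mirrored packages (hypothetical).
        name = f"{package}.{version}.nupkg"
        target = mirror_dir / name
        if not target.exists():
            state = "missing"
        elif sha256_of(target) == expected.lower():
            state = "ok"
        else:
            state = "mismatch"
        results.append((name, state))
    return results
```

For example, calling `check_mirror(Path("ops/devops/nuget-preview-packages.csv").read_text(), Path("local-nuget"))` from the repo root would list any package that still needs a refresh, provided the assumptions about the manifest layout hold.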
|  | ||||
| ## 2. Restore using the shared `NuGet.config` | ||||
|  | ||||
| From the repo root: | ||||
|  | ||||
| ```bash | ||||
| DOTNET_NOLOGO=1 dotnet restore src/Excititor/__Libraries/StellaOps.Excititor.Connectors.Abstractions/StellaOps.Excititor.Connectors.Abstractions.csproj \ | ||||
|   --configfile NuGet.config | ||||
| ``` | ||||
|  | ||||
| The `packageSourceMapping` section keeps `Microsoft.Extensions.*`, `Microsoft.AspNetCore.*`, and `Microsoft.Data.Sqlite` bound to `local`/`dotnet-public`, so `dotnet restore` never has to reach out to nuget.org when mirrors are populated. | ||||
|  | ||||
| Before committing changes (or when wiring up a new environment) run: | ||||
|  | ||||
| ```bash | ||||
| python3 ops/devops/validate_restore_sources.py | ||||
| ``` | ||||
|  | ||||
| The validator asserts: | ||||
|  | ||||
| - `NuGet.config` lists `local` → `dotnet-public` → `nuget.org` in that order. | ||||
| - `Directory.Build.props` pins `RestoreSources` so every project prioritises the local mirror. | ||||
| - No stray `NuGet.config` files shadow the repo root configuration. | ||||
|  | ||||
| CI executes the validator in both the `build-test-deploy` and `release` workflows, | ||||
| so regressions trip before any restore/build begins. | ||||
|  | ||||
| If you run fully air-gapped, remember to clear the cache between SDK upgrades: | ||||
|  | ||||
| ```bash | ||||
| dotnet nuget locals all --clear | ||||
| ``` | ||||
|  | ||||
| ## 3. Troubleshooting | ||||
|  | ||||
| | Symptom | Fix | | ||||
| | --- | --- | | ||||
| | `dotnet restore` still hits nuget.org for preview packages | Re-run `sync-preview-nuget.sh` to ensure the `.nupkg` exists locally, then delete `~/.nuget/packages/microsoft.extensions.*` so the resolver picks up the mirrored copy. | | ||||
| | SHA mismatch in the manifest | Update `ops/devops/nuget-preview-packages.csv` with the new version + checksum (from the feed) and re-run the sync script. | | ||||
| | Azure DevOps feed throttling | Set `DOTNET_PUBLIC_FLAT_BASE` env var and point it at your own mirrored flat-container, then add the URL to the 4th column of the manifest. | | ||||
|  | ||||
| Keep this doc alongside Offline Kit instructions so air-gapped operators know exactly how to refresh the mirror and verify packages before restore. | ||||
							
								
								
									
										22
									
								
								docs/modules/excititor/AGENTS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,22 @@ | ||||
| # Excititor agent guide | ||||
|  | ||||
| ## Mission | ||||
| Excititor converts heterogeneous VEX feeds into raw observations and linksets that honour the Aggregation-Only Contract. | ||||
|  | ||||
| ## Key docs | ||||
| - [Module README](./README.md) | ||||
| - [Architecture](./architecture.md) | ||||
| - [Implementation plan](./implementation_plan.md) | ||||
| - [Task board](./TASKS.md) | ||||
|  | ||||
| ## How to get started | ||||
| 1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module. | ||||
| 2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED). | ||||
| 3. Read the architecture and README for domain context before editing code or docs. | ||||
| 4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan. | ||||
|  | ||||
| ## Guardrails | ||||
| - Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md). | ||||
| - Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts. | ||||
| - Keep Offline Kit parity in mind—document air-gapped workflows for any new feature. | ||||
| - Update runbooks/observability assets when operational characteristics change. | ||||
							
								
								
									
										33
									
								
								docs/modules/excititor/README.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,33 @@ | ||||
| # StellaOps Excititor | ||||
|  | ||||
| Excititor converts heterogeneous VEX feeds into raw observations and linksets that honour the Aggregation-Only Contract. | ||||
|  | ||||
| ## Responsibilities | ||||
| - Fetch OpenVEX/CSAF/CycloneDX statements via restart-only connectors. | ||||
| - Store immutable VEX observations with full provenance. | ||||
| - Publish linksets and events that drive policy suppression decisions. | ||||
| - Provide deterministic exports for Offline Kit and downstream tooling. | ||||
|  | ||||
| ## Key components | ||||
| - `StellaOps.Excititor.WebService` scheduler/API host. | ||||
| - Connector libraries under `StellaOps.Excititor.Connector.*`. | ||||
| - Normalization helpers and exporters in `StellaOps.Excititor.*`. | ||||
|  | ||||
| ## Integrations & dependencies | ||||
| - Policy Engine for evidence queries. | ||||
| - UI/CLI for conflict visibility and explanation. | ||||
| - Notify for VEX-driven alerts. | ||||
|  | ||||
| ## Operational notes | ||||
| - MongoDB for observation storage and job metadata. | ||||
| - Offline kit packaging aligned with Concelier merges. | ||||
| - Connector-specific runbooks (see `docs/modules/concelier/operations/connectors`). | ||||
|  | ||||
| ## Backlog references | ||||
| - DOCS-LNM-22-006 / DOCS-LNM-22-007 (shared with Concelier). | ||||
| - CLI-EXC-25-001..002 follow-up for CLI parity. | ||||
|  | ||||
| ## Epic alignment | ||||
| - **Epic 1 – AOC enforcement:** maintain immutable VEX observations, provenance, and AOC verifier coverage. | ||||
| - **Epic 7 – VEX Consensus Lens:** supply trustworthy raw inputs, trust metadata, and consensus hooks for the lens computations. | ||||
| - **Epic 8 – Advisory AI:** expose citation-ready VEX payloads for the advisory assistant pipeline. | ||||
							
								
								
									
										9
									
								
								docs/modules/excititor/TASKS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,9 @@ | ||||
| # Task board — Excititor | ||||
|  | ||||
| > Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable. | ||||
|  | ||||
| | ID | Status | Owner(s) | Description | Notes | | ||||
| |----|--------|----------|-------------|-------| | ||||
| | EXCITITOR-DOCS-0001 | TODO | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md | | ||||
| | EXCITITOR-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md | | ||||
| | EXCITITOR-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow | | ||||
							
								
								
									
										749
									
								
								docs/modules/excititor/architecture.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,749 @@ | ||||
| # component_architecture_excititor.md — **Stella Ops Excititor** (Sprint 22) | ||||
|  | ||||
| > Consolidates the VEX ingestion guardrails from Epic 1 with consensus and AI-facing requirements from Epics 7 and 8. This is the authoritative architecture record for Excititor. | ||||
|  | ||||
| > **Scope.** This document specifies the **Excititor** service: its purpose, trust model, data structures, observation/linkset pipelines, APIs, plug-in contracts, storage schema, performance budgets, testing matrix, and how it integrates with Concelier, Policy Engine, and evidence surfaces. It is implementation-ready. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 0) Mission & role in the platform | ||||
|  | ||||
| **Mission.** Convert heterogeneous **VEX** statements (OpenVEX, CSAF VEX, CycloneDX VEX; vendor/distro/platform sources) into immutable **VEX observations**, correlate them into **linksets** that retain provenance/conflicts without precedence, and publish deterministic evidence exports and events that Policy Engine, Console, and CLI use to suppress or explain findings. | ||||
|  | ||||
| **Boundaries.** | ||||
|  | ||||
| * Excititor **does not** decide PASS/FAIL. It supplies **evidence** (statuses + justifications + provenance weights). | ||||
| * Excititor preserves **conflicting observations** unchanged; consensus (when enabled) merely annotates how policy might choose, but raw evidence remains exportable. | ||||
| * VEX consumption is **backend-only**: Scanner never applies VEX. The backend’s **Policy Engine** asks Excititor for status evidence and then decides what to show. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) Aggregation guardrails (AOC baseline) | ||||
|  | ||||
| Excititor enforces the same ingestion covenant as Concelier, tailored to VEX payloads: | ||||
|  | ||||
| 1. **Immutable `vex_raw` documents.** Upstream OpenVEX/CSAF/CycloneDX files are stored verbatim (`content.raw`) with provenance (`issuer`, `statement_id`, timestamps, signatures). Revisions append new versions linked by `supersedes`. | ||||
| 2. **No derived consensus at ingest time.** Fields such as `effective_status`, `merged_state`, `severity`, or reachability are forbidden. Roslyn analyzers and runtime guards block violations before writes. | ||||
| 3. **Linkset-only joins.** Product aliases, CVE keys, SBOM hints, and references live under `linkset`; ingestion must never mutate the underlying statement. | ||||
| 4. **Deterministic canonicalisation.** Writers sort JSON keys/arrays, normalize timestamps (UTC ISO‑8601), and hash content for reproducible exports. | ||||
| 5. **AOC verifier.** `StellaOps.AOC.Verifier` runs in CI and production, checking schema compliance, provenance completeness, sorted collections, and signature metadata. | ||||
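
Guardrail 4 can be illustrated with a minimal sketch. This is not the Excititor writer (which is .NET); it only shows the idea of sorting keys, normalising timestamps to UTC ISO-8601, and hashing the canonical bytes. The function names are hypothetical, and two rules are assumptions made for the example: only lists of plain strings are treated as order-insensitive, and timestamps without an offset are read as UTC.

```python
import hashlib
import json
from datetime import datetime, timezone


def normalise_timestamp(value: str) -> str:
    """Convert an ISO-8601 timestamp to UTC with a trailing 'Z'."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def canonicalise(value):
    """Recursively sort dict keys and (by assumption) lists of plain strings."""
    if isinstance(value, dict):
        return {key: canonicalise(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        items = [canonicalise(item) for item in value]
        if all(isinstance(item, str) for item in items):
            return sorted(items)
        return items  # lists of objects keep their upstream order here
    return value


def content_hash(document: dict) -> str:
    """Hash the canonical, whitespace-free UTF-8 serialisation of a document."""
    payload = json.dumps(
        canonicalise(document),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()
```

Two documents that differ only in key order or in the order of a string array produce the same hash, which is what makes exports reproducible across machines.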
|  | ||||
| ### 1.1 VEX raw document shape | ||||
|  | ||||
| ```jsonc | ||||
| { | ||||
|   "_id": "vex_raw:openvex:VEX-2025-00001:v2", | ||||
|   "source": { | ||||
|     "issuer": "vendor:redhat", | ||||
|     "stream": "openvex", | ||||
|     "api": "https://vendor/api/vex/VEX-2025-00001.json", | ||||
|     "collector_version": "excititor/0.9.4" | ||||
|   }, | ||||
|   "upstream": { | ||||
|     "statement_id": "VEX-2025-00001", | ||||
|     "document_version": "2025-08-30T12:00:00Z", | ||||
|     "fetched_at": "2025-08-30T12:05:00Z", | ||||
|     "received_at": "2025-08-30T12:05:01Z", | ||||
|     "content_hash": "sha256:...", | ||||
|     "signature": { | ||||
|       "present": true, | ||||
|       "format": "dsse", | ||||
|       "key_id": "rekor:uuid", | ||||
|       "sig": "base64..." | ||||
|     } | ||||
|   }, | ||||
|   "content": { | ||||
|     "format": "openvex", | ||||
|     "spec_version": "1.0", | ||||
|     "raw": { /* upstream statement */ } | ||||
|   }, | ||||
|   "identifiers": { | ||||
|     "cve": ["CVE-2025-13579"], | ||||
|     "products": [ | ||||
|       {"purl": "pkg:rpm/redhat/openssl@3.0.9", "component": "openssl"} | ||||
|     ] | ||||
|   }, | ||||
|   "linkset": { | ||||
|     "aliases": ["REDHAT:RHSA-2025:1234"], | ||||
|     "sbom_products": ["pkg:rpm/redhat/openssl@3.0.9"], | ||||
|     "justifications": ["reasonable_worst_case_assumption"], | ||||
|     "references": [ | ||||
|       {"type": "advisory", "url": "https://..."} | ||||
|     ] | ||||
|   }, | ||||
|   "supersedes": "vex_raw:openvex:VEX-2025-00001:v1", | ||||
|   "tenant": "default" | ||||
| } | ||||
| ``` | ||||
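
A document of this shape can be checked against guardrail 2 with a simple recursive walk. The sketch below is an illustration in Python, not the Roslyn analyzer or the runtime guard mentioned above. The forbidden names come from the guardrail text; skipping `content.raw` (so that verbatim upstream payloads are never flagged) is an assumption of this example, and `find_forbidden_fields` is a hypothetical helper.

```python
FORBIDDEN_FIELDS = frozenset(
    {"effective_status", "merged_state", "severity", "reachability"}
)


def find_forbidden_fields(document, path=""):
    """Return dotted paths of derived fields that must not appear at ingest time."""
    violations = []
    if isinstance(document, dict):
        for key, value in document.items():
            child = f"{path}.{key}" if path else key
            if child == "content.raw":
                # Assumption: the verbatim upstream payload is exempt from the check.
                continue
            if key in FORBIDDEN_FIELDS:
                violations.append(child)
            violations.extend(find_forbidden_fields(value, child))
    elif isinstance(document, list):
        for index, item in enumerate(document):
            violations.extend(find_forbidden_fields(item, f"{path}[{index}]"))
    return violations
```

An empty result means the document carries no derived consensus fields outside the raw payload; any returned path points at the exact location a writer would need to drop.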
|  | ||||
| ### 1.2 Issuer trust registry | ||||
|  | ||||
| To enable Epic 7’s consensus lens, Excititor maintains `vex_issuer_registry` documents containing: | ||||
|  | ||||
| - `issuer_id`, canonical name, and allowed domains. | ||||
| - `trust.tier` (`critical`, `high`, `medium`, `low`), `trust.confidence` (0–1). | ||||
| - `products` PURL patterns the issuer is authoritative for. | ||||
| - `signing_keys` with key IDs and expiry. | ||||
| - `last_validated_at`, `revocation_status`. | ||||
|  | ||||
| The registry is distributed as a signed bundle and cached locally; ingestion rejects statements from issuers without registry entries or valid signatures. | ||||
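
The rejection rule can be sketched as a lookup against the cached registry. The example is a rough illustration under stated assumptions: the registry is modelled as a plain dict keyed by `issuer_id`, the per-key field names `key_id` and `expires_at` and the `revoked` status value are hypothetical, and cryptographic signature verification is out of scope here.

```python
from datetime import datetime, timezone


def admission_error(registry, issuer_id, key_id, now):
    """Return None if the statement may be ingested, else a short rejection reason."""
    entry = registry.get(issuer_id)
    if entry is None:
        return "unknown issuer"
    if entry.get("revocation_status") == "revoked":  # status value is an assumption
        return "issuer revoked"
    for key in entry.get("signing_keys", []):
        if key["key_id"] == key_id:
            expires = datetime.fromisoformat(key["expires_at"].replace("Z", "+00:00"))
            return None if now < expires else "signing key expired"
    return "unknown signing key"
```

Passing `now` explicitly (for example `datetime.now(timezone.utc)`) keeps the check deterministic in tests and replayable against a fixed time anchor.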
|  | ||||
| ### 1.3 Normalised tuple store | ||||
|  | ||||
| Excititor derives `vex_normalized` tuples (without making decisions) for downstream consumers: | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "advisory_key": "CVE-2025-13579", | ||||
|   "artifact": "pkg:rpm/redhat/openssl@3.0.9", | ||||
|   "issuer": "vendor:redhat", | ||||
|   "status": "not_affected", | ||||
|   "justification": "component_not_present", | ||||
|   "scope": "runtime_path", | ||||
|   "timestamp": "2025-08-30T12:00:00Z", | ||||
|   "trust": {"tier": "high", "confidence": 0.95}, | ||||
|   "statement_id": "VEX-2025-00001:v2", | ||||
|   "content_hash": "sha256:..." | ||||
| } | ||||
| ``` | ||||
|  | ||||
| These tuples allow VEX Lens to compute deterministic consensus without re-parsing heavy upstream documents. | ||||
|  | ||||
| ### 1.4 AI-ready citations | ||||
|  | ||||
| `GET /v1/vex/statements/{advisory_key}` produces sorted JSON responses containing raw statement metadata (`issuer`, `content_hash`, `signature`), normalised tuples, and provenance pointers. Advisory AI consumes this endpoint to build retrieval contexts with explicit citations. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2) Inputs, outputs & canonical domain | ||||
|  | ||||
| ### 2.1 Accepted input formats (ingest) | ||||
|  | ||||
| * **OpenVEX** JSON documents (attested or raw). | ||||
| * **CSAF VEX** 2.x (vendor PSIRTs and distros commonly publish CSAF). | ||||
| * **CycloneDX VEX** 1.4+ (standalone VEX or embedded VEX blocks). | ||||
| * **OCI‑attached attestations** (VEX statements shipped as OCI referrers) — optional connectors. | ||||
|  | ||||
| All connectors register **source metadata**: provider identity, trust tier, signature expectations (PGP/cosign/PKI), fetch windows, rate limits, and time anchors. | ||||
|  | ||||
| ### 2.2 Canonical model (observations & linksets) | ||||
|  | ||||
| #### VexObservation | ||||
|  | ||||
| ```jsonc | ||||
| observationId       // {tenant}:{providerId}:{upstreamId}:{revision} | ||||
| tenant | ||||
| providerId          // e.g., redhat, suse, ubuntu, osv | ||||
| streamId            // connector stream (csaf, openvex, cyclonedx, attestation) | ||||
| upstream{ | ||||
|     upstreamId, | ||||
|     documentVersion?, | ||||
|     fetchedAt, | ||||
|     receivedAt, | ||||
|     contentHash, | ||||
|     signature{present, format?, keyId?, signature?} | ||||
| } | ||||
| statements[ | ||||
|   { | ||||
|     vulnerabilityId, | ||||
|     productKey, | ||||
|     status,                    // affected | not_affected | fixed | under_investigation | ||||
|     justification?, | ||||
|     introducedVersion?, | ||||
|     fixedVersion?, | ||||
|     lastObserved, | ||||
|     locator?,                  // JSON Pointer/line for provenance | ||||
|     evidence?[] | ||||
|   } | ||||
| ] | ||||
| content{ | ||||
|     format, | ||||
|     specVersion?, | ||||
|     raw | ||||
| } | ||||
| linkset{ | ||||
|     aliases[],                 // CVE/GHSA/vendor IDs | ||||
|     purls[], | ||||
|     cpes[], | ||||
|     references[{type,url}], | ||||
|     reconciledFrom[] | ||||
| } | ||||
| supersedes? | ||||
| createdAt | ||||
| attributes? | ||||
| ``` | ||||
|  | ||||
| #### VexLinkset | ||||
|  | ||||
| ```jsonc | ||||
| linksetId           // sha256 over sorted (tenant, vulnId, productKey, observationIds) | ||||
| tenant | ||||
| key{ | ||||
|     vulnerabilityId, | ||||
|     productKey, | ||||
|     confidence          // low|medium|high | ||||
| } | ||||
| observations[] = [ | ||||
|   { | ||||
|     observationId, | ||||
|     providerId, | ||||
|     status, | ||||
|     justification?, | ||||
|     introducedVersion?, | ||||
|     fixedVersion?, | ||||
|     evidence?, | ||||
|     collectedAt | ||||
|   } | ||||
| ] | ||||
| aliases{ | ||||
|     primary, | ||||
|     others[] | ||||
| } | ||||
| purls[] | ||||
| cpes[] | ||||
| conflicts[]?        // see VexLinksetConflict | ||||
| createdAt | ||||
| updatedAt | ||||
| ``` | ||||
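
The `linksetId` comment above describes a hash over the key fields plus the sorted observation IDs. One possible reading is sketched below; the newline delimiter, the field order, and the `sha256:` prefix are assumptions of this example rather than the documented encoding, and `linkset_id` is a hypothetical helper.

```python
import hashlib


def linkset_id(tenant, vulnerability_id, product_key, observation_ids):
    """Derive a deterministic identifier that ignores observation ordering."""
    # Sorting the observation IDs makes the result independent of arrival order.
    parts = [tenant, vulnerability_id, product_key, *sorted(observation_ids)]
    # Assumption: fields are joined with a newline before hashing.
    payload = "\n".join(parts).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()
```

Because the observation IDs are part of the input, adding an observation yields a new identifier under this reading.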
|  | ||||
| #### VexLinksetConflict | ||||
|  | ||||
| ```jsonc | ||||
| conflictId | ||||
| type                // status-mismatch | justification-divergence | version-range-clash | non-joinable-overlap | metadata-gap | ||||
| field?              // optional pointer for UI rendering | ||||
| statements[]        // per-observation values with providerId + status/justification/version data | ||||
| confidence | ||||
| detectedAt | ||||
| ``` | ||||
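
As an illustration of the simplest conflict type, the sketch below flags a `status-mismatch` when the observations in a linkset disagree on `status`. It is a simplified example, not the Excititor correlator: it omits `conflictId`, `confidence`, and `detectedAt`, and the helper name is hypothetical.

```python
def detect_status_mismatch(observations):
    """Return a conflict record if providers disagree on status, else None."""
    statuses = {observation["status"] for observation in observations}
    if len(statuses) < 2:
        return None
    return {
        "type": "status-mismatch",
        "field": "status",
        # Sorted so the conflict record is stable regardless of input order.
        "statements": sorted(
            (
                {"providerId": o["providerId"], "status": o["status"]}
                for o in observations
            ),
            key=lambda statement: (statement["providerId"], statement["status"]),
        ),
    }
```

The conflict only records what each provider said; it does not pick a winner, in line with the boundary that Excititor supplies evidence and leaves decisions to policy.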
|  | ||||
| #### VexConsensus (optional) | ||||
|  | ||||
| ```jsonc | ||||
| consensusId         // sha256(vulnerabilityId, productKey, policyRevisionId) | ||||
| vulnerabilityId | ||||
| productKey | ||||
| rollupStatus        // derived by Excititor policy adapter (linkset aware) | ||||
| sources[]           // observation references with weight, accepted flag, reason | ||||
| policyRevisionId | ||||
| evaluatedAt | ||||
| consensusDigest | ||||
| ``` | ||||
|  | ||||
| Consensus records are persisted only when Excititor policy adapters require pre-computed rollups (e.g., Offline Kit). Policy Engine can also compute consensus on demand from linksets. | ||||
|  | ||||
| ### 2.3 Exports & evidence bundles | ||||
|  | ||||
| * **Raw observations** — JSON tree per observation for auditing/offline. | ||||
| * **Linksets** — grouped evidence for policy/Console/CLI consumption. | ||||
| * **Consensus (optional)** — if enabled, mirrors existing API contracts. | ||||
| * **Provider snapshots** — last N days of observations per provider to support diagnostics. | ||||
| * **Index** — `(productKey, vulnerabilityId) → {status candidates, confidence, observationIds}` for high-speed joins. | ||||
|  | ||||
| All exports remain deterministic and, when configured, attested via DSSE + Rekor v2. | ||||
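
The index export listed above can be pictured as a fold over linksets. The following sketch assumes linksets shaped like the `VexLinkset` model; the output field names (`statusCandidates`, `confidence`, `observationIds`) are illustrative labels for the three items named in the bullet, not a confirmed schema.

```python
def build_index(linksets):
    """Map (productKey, vulnerabilityId) to candidate statuses and observation IDs."""
    index = {}
    for linkset in linksets:
        key = (linkset["key"]["productKey"], linkset["key"]["vulnerabilityId"])
        observations = linkset["observations"]
        index[key] = {
            # Sets are sorted into lists so the export is byte-stable.
            "statusCandidates": sorted({o["status"] for o in observations}),
            "confidence": linkset["key"]["confidence"],
            "observationIds": sorted(o["observationId"] for o in observations),
        }
    # Emit entries in key order for deterministic output.
    return dict(sorted(index.items()))
```

Conflicting statuses simply appear side by side in `statusCandidates`; resolving them remains the consumer's job.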
|  | ||||
| --- | ||||
|  | ||||
| ## 3) Identity model — products & joins | ||||
|  | ||||
| ### 3.1 Vuln identity | ||||
|  | ||||
| * Accepts **CVE**, **GHSA**, vendor IDs (MSRC, RHSA…), distro IDs (DSA/USN/RHSA…) — normalized to `vulnId` with alias sets. | ||||
| * **Alias graph** maintained (from Concelier) to map vendor/distro IDs → CVE (primary) and to **GHSA** where applicable. | ||||
|  | ||||
| ### 3.2 Product identity (`productKey`) | ||||
|  | ||||
| * **Primary:** `purl` (Package URL). | ||||
| * **Secondary links:** `cpe`, **OS package NVRA/EVR**, NuGet/Maven/Golang identity, and **OS package name** when purl unavailable. | ||||
| * **Fallback:** `oci:<registry>/<repo>@<digest>` for image‑level VEX. | ||||
| * **Special cases:** kernel modules, firmware, platforms → provider‑specific mapping helpers (connector captures provider’s product taxonomy → canonical `productKey`). | ||||
|  | ||||
| > Excititor does not invent identities. If a provider cannot be mapped to purl/CPE/NVRA deterministically, we keep the native **product string** and mark the claim as **non‑joinable**; the backend will ignore it unless a policy explicitly whitelists that provider mapping. | ||||
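
The precedence described in this section can be sketched as a small resolver. This is a hedged illustration, not connector code: the input field names (`purl`, `cpe`, `nvra`, `image`, `name`) and the returned `joinable` flag are hypothetical, and the order among the secondary identifiers is an assumption.

```python
def resolve_product_key(product):
    """Pick a productKey by precedence; fall back to the native string as non-joinable."""
    if product.get("purl"):
        return {"productKey": product["purl"], "joinable": True}
    # Assumed order among secondary identifiers: CPE first, then NVRA.
    for field in ("cpe", "nvra"):
        if product.get(field):
            return {"productKey": product[field], "joinable": True}
    image = product.get("image")
    if image and all(image.get(part) for part in ("registry", "repo", "digest")):
        key = f"oci:{image['registry']}/{image['repo']}@{image['digest']}"
        return {"productKey": key, "joinable": True}
    # No deterministic mapping: keep the provider's own string and mark it non-joinable.
    return {"productKey": product["name"], "joinable": False}
```

Keeping the native string with `joinable: False` preserves the evidence for audit while letting the backend skip it unless policy opts in.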
|  | ||||
| --- | ||||
|  | ||||
| ## 4) Storage schema (MongoDB) | ||||
|  | ||||
| Database: `excititor` | ||||
|  | ||||
| ### 4.1 Collections | ||||
|  | ||||
| **`vex.providers`** | ||||
|  | ||||
| ``` | ||||
| _id: providerId | ||||
| name, homepage, contact | ||||
| trustTier: enum {vendor, distro, platform, hub, attestation} | ||||
| signaturePolicy: { type: pgp|cosign|x509|none, keys[], certs[], cosignKeylessRoots[] } | ||||
| fetch: { baseUrl, kind: http|oci|file, rateLimit, etagSupport, windowDays } | ||||
| enabled: bool | ||||
| createdAt, modifiedAt | ||||
| ``` | ||||
|  | ||||
| **`vex.raw`** (immutable raw documents) | ||||
|  | ||||
| ``` | ||||
| _id: sha256(doc bytes) | ||||
| providerId | ||||
| uri | ||||
| ingestedAt | ||||
| contentType | ||||
| sig: { verified: bool, method: pgp|cosign|x509|none, keyId|certSubject, bundle? } | ||||
| payload: GridFS pointer (if large) | ||||
| disposition: kept|replaced|superseded | ||||
| correlation: { replaces?: sha256, replacedBy?: sha256 } | ||||
| ``` | ||||
|  | ||||
| **`vex.observations`** | ||||
|  | ||||
| ``` | ||||
| { | ||||
|   _id: "tenant:providerId:upstreamId:revision", | ||||
|   tenant, | ||||
|   providerId, | ||||
|   streamId, | ||||
|   upstream: { upstreamId, documentVersion?, fetchedAt, receivedAt, contentHash, signature }, | ||||
|   statements: [ | ||||
|     { | ||||
|       vulnerabilityId, | ||||
|       productKey, | ||||
|       status, | ||||
|       justification?, | ||||
|       introducedVersion?, | ||||
|       fixedVersion?, | ||||
|       lastObserved, | ||||
|       locator?, | ||||
|       evidence? | ||||
|     } | ||||
|   ], | ||||
|   content: { format, specVersion?, raw }, | ||||
|   linkset: { aliases[], purls[], cpes[], references[], reconciledFrom[] }, | ||||
|   supersedes?, | ||||
|   createdAt, | ||||
|   attributes? | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Indexes: `{tenant:1, providerId:1, upstream.upstreamId:1}`, `{tenant:1, statements.vulnerabilityId:1}`, `{tenant:1, linkset.purls:1}`, `{tenant:1, createdAt:-1}`. | ||||
|  | ||||
| **`vex.linksets`** | ||||
|  | ||||
| ``` | ||||
| { | ||||
|   _id: "sha256:...", | ||||
|   tenant, | ||||
|   key: { vulnerabilityId, productKey, confidence }, | ||||
|   observations: [ | ||||
|     { observationId, providerId, status, justification?, introducedVersion?, fixedVersion?, evidence?, collectedAt } | ||||
|   ], | ||||
|   aliases: { primary, others: [] }, | ||||
|   purls: [], | ||||
|   cpes: [], | ||||
|   conflicts: [], | ||||
|   createdAt, | ||||
|   updatedAt | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Indexes: `{tenant:1, key.vulnerabilityId:1, key.productKey:1}`, `{tenant:1, purls:1}`, `{tenant:1, updatedAt:-1}`. | ||||
|  | ||||
| **`vex.events`** (observation/linkset events, optional long retention) | ||||
|  | ||||
| ``` | ||||
| { | ||||
|   _id: ObjectId, | ||||
|   tenant, | ||||
|   type: "vex.observation.updated" | "vex.linkset.updated", | ||||
|   key, | ||||
|   delta, | ||||
|   hash, | ||||
|   occurredAt | ||||
| } | ||||
| ``` | ||||
|  | ||||
|   * Indexes: `{type:1, occurredAt:-1}`, TTL on `occurredAt` for configurable retention. | ||||
|  | ||||
| **`vex.consensus`** (optional rollups) | ||||
|  | ||||
| ``` | ||||
| _id: sha256(canonical(vulnerabilityId, productKey, policyRevisionId)) | ||||
| vulnerabilityId | ||||
| productKey | ||||
| rollupStatus | ||||
| sources[]      // observation references with weights/reasons | ||||
| policyRevisionId | ||||
| evaluatedAt | ||||
| signals?       // optional severity/kev/epss hints | ||||
| consensusDigest | ||||
| ``` | ||||
|  | ||||
|   * Indexes: `{vulnerabilityId:1, productKey:1}`, `{policyRevisionId:1, evaluatedAt:-1}`. | ||||
|  | ||||
| **`vex.exports`** (manifest of emitted artifacts) | ||||
|  | ||||
| ``` | ||||
| _id | ||||
| querySignature | ||||
| format: raw|consensus|index | ||||
| artifactSha256 | ||||
| rekor { uuid, index, url }? | ||||
| createdAt | ||||
| policyRevisionId | ||||
| cacheable: bool | ||||
| ``` | ||||
|  | ||||
| **`vex.cache`** — observation/linkset export cache: `{querySignature, exportId, ttl, hits}`. | ||||
|  | ||||
| **`vex.migrations`** — ordered migrations ensuring new indexes (`20251027-linksets-introduced`, etc.). | ||||
|  | ||||
| ### 3.2 Indexing strategy | ||||
|  | ||||
| * Hot path queries rely on `{tenant, key.vulnerabilityId, key.productKey}` covering linkset lookup. | ||||
| * Observability queries use `{tenant, updatedAt}` to monitor staleness. | ||||
| * Consensus (if enabled) keyed by `{vulnerabilityId, productKey, policyRevisionId}` for deterministic reuse. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Ingestion pipeline | ||||
|  | ||||
| ### 5.1 Connector contract | ||||
|  | ||||
| ```csharp | ||||
| public interface IVexConnector | ||||
| { | ||||
|     string ProviderId { get; } | ||||
|     Task FetchAsync(VexConnectorContext ctx, CancellationToken ct);   // raw docs | ||||
|     Task NormalizeAsync(VexConnectorContext ctx, CancellationToken ct); // raw -> ObservationStatements[] | ||||
| } | ||||
| ``` | ||||
|  | ||||
| * **Fetch** must implement: window scheduling, conditional GET (ETag/If‑Modified‑Since), rate limiting, retry/backoff. | ||||
| * **Normalize** parses the format, validates schema, maps product identities deterministically, emits observation statements with **provenance** metadata (locator, justification, version ranges). | ||||
|  | ||||
| ### 5.2 Signature verification (per provider) | ||||
|  | ||||
| * **cosign (keyless or keyful)** for OCI referrers or HTTP‑served JSON with Sigstore bundles. | ||||
| * **PGP** (provider keyrings) for distro/vendor feeds that sign docs. | ||||
| * **x509** (mutual TLS / provider‑pinned certs) where applicable. | ||||
| * Signature state is stored on **vex.raw.sig** and copied into `statements[].signatureState` so downstream policy can gate by verification result. | ||||
|  | ||||
| > Observation statements from sources failing signature policy are marked `signatureState.verified=false`, and policy can down-weight or ignore them. | ||||
|  | ||||
| ### 5.3 Time discipline | ||||
|  | ||||
| * For each doc, prefer **provider’s document timestamp**; if absent, use fetch time. | ||||
| * Statements carry `lastObserved` which drives **tie-breaking** within equal weight tiers. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Normalization: product & status semantics | ||||
|  | ||||
| ### 6.1 Product mapping | ||||
|  | ||||
| * **purl** first; **cpe** second; OS package NVRA/EVR mapping helpers (distro connectors) produce purls via canonical tables (e.g., rpm→purl:rpm, deb→purl:deb). | ||||
| * Where a provider publishes **platform‑level** VEX (e.g., “RHEL 9 not affected”), connectors expand to known product inventory rules (e.g., map to sets of packages/components shipped in the platform). Expansion tables are versioned and kept per provider; every expansion emits **evidence** indicating the rule applied. | ||||
| * If expansion would be speculative, the statement remains **platform-scoped** with `productKey="platform:redhat:rhel:9"` and is flagged **non-joinable**; backend can decide to use platform VEX only when Scanner proves the platform runtime. | ||||
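The purl-first mapping and the platform-scoped fallback can be illustrated with a minimal Python sketch. It is not the connector code: `rpm_purl` and `platform_key` are hypothetical helpers, the `pkg:rpm/<namespace>/<name>@<version>-<release>` layout follows the general purl convention for RPM packages, and the real canonical tables may normalise names differently.

```python
from urllib.parse import quote


def rpm_purl(namespace, name, version, release, arch=None, epoch=None):
    """Build a purl string from RPM NEVRA parts (hypothetical helper)."""
    ns = quote(namespace.lower(), safe="")
    pkg = quote(name, safe="")
    ver = quote(version + "-" + release, safe="")
    purl = "pkg:rpm/" + ns + "/" + pkg + "@" + ver
    qualifiers = {}
    if arch:
        qualifiers["arch"] = arch
    if epoch:  # a missing or zero epoch is omitted
        qualifiers["epoch"] = str(epoch)
    if qualifiers:
        # Sorted qualifier keys keep the output deterministic.
        pairs = [k + "=" + quote(v, safe="") for k, v in sorted(qualifiers.items())]
        purl += "?" + "&".join(pairs)
    return purl


def platform_key(vendor, product, version):
    """Platform-scoped key for statements that cannot be expanded safely."""
    return {
        "productKey": "platform:" + vendor + ":" + product + ":" + version,
        "joinable": False,  # assumption: the non-joinable flag is modelled as a boolean
    }
```

For example, `platform_key("redhat", "rhel", "9")` yields the `platform:redhat:rhel:9` key quoted above, flagged as non-joinable.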
|  | ||||
| ### 6.2 Status + justification mapping | ||||
|  | ||||
| * Canonical **status**: `affected | not_affected | fixed | under_investigation`. | ||||
| * **Justifications** normalized to a controlled vocabulary (CISA‑aligned), e.g.: | ||||
|  | ||||
|   * `component_not_present` | ||||
|   * `vulnerable_code_not_in_execute_path` | ||||
|   * `vulnerable_configuration_unused` | ||||
|   * `inline_mitigation_applied` | ||||
|   * `fix_available` (with `fixedVersion`) | ||||
|   * `under_investigation` | ||||
| * Providers with free‑text justifications are mapped by deterministic tables; raw text preserved as `evidence`. | ||||
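A deterministic table for free-text justifications might look like the following Python sketch. The patterns in `_RULES` are invented for illustration and are not the tables Excititor ships. The point is only that rules are evaluated in a fixed order and the raw text is always kept as evidence.

```python
import re

# Hypothetical rule table: evaluated top to bottom, first match wins.
_RULES = (
    (re.compile(r"\bnot (present|shipped|included)\b"), "component_not_present"),
    (re.compile(r"\bnot (reachable|executed)\b"), "vulnerable_code_not_in_execute_path"),
    (re.compile(r"\bmitigat\w*\b"), "inline_mitigation_applied"),
)


def normalize_justification(raw_text):
    """Map provider free text to the controlled vocabulary, keeping the raw text."""
    text = " ".join(raw_text.lower().split())  # collapse case and whitespace
    for pattern, code in _RULES:
        if pattern.search(text):
            return {"justification": code, "evidence": raw_text}
    # Unknown wording: no justification is guessed; the text survives as evidence.
    return {"justification": None, "evidence": raw_text}
```

Because the table is an ordered tuple and matching has no randomness, the same input text always yields the same justification.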
|  | ||||
| --- | ||||
|  | ||||
| ## 7) Consensus algorithm | ||||
|  | ||||
| **Goal:** produce a **stable**, explainable `rollupStatus` per `(vulnId, productKey)` when consumers opt into Excititor-managed consensus derived from linksets. | ||||
|  | ||||
| ### 7.1 Inputs | ||||
|  | ||||
| * Set **S** of observation statements drawn from the current `VexLinkset` for `(tenant, vulnId, productKey)`. | ||||
| * **Excititor policy snapshot**: | ||||
|  | ||||
|   * **weights** per provider tier and per provider overrides. | ||||
|   * **justification gates** (e.g., require justification for `not_affected` to be acceptable). | ||||
|   * **minEvidence** rules (e.g., `not_affected` must come from ≥1 vendor or 2 distros). | ||||
|   * **signature requirements** (e.g., require a verified signature for `fixed` to be considered). | ||||
|  | ||||
| ### 7.2 Steps | ||||
|  | ||||
| 1. **Filter invalid** statements by signature policy & justification gates → set `S'`. | ||||
| 2. **Score** each statement: | ||||
|    `score = weight(provider) * freshnessFactor(lastObserved)` where freshnessFactor ∈ [0.8, 1.0] for staleness decay (configurable; small effect). Observations lacking verified signatures receive policy-configured penalties. | ||||
| 3. **Aggregate** scores per status: `W(status) = Σ score(statements with that status)`. | ||||
| 4. **Pick** `rollupStatus = argmax_status W(status)`. | ||||
| 5. **Tie‑breakers** (in order): | ||||
|  | ||||
|    * Higher **max single** provider score wins (vendor > distro > platform > hub). | ||||
|    * More **recent** lastObserved wins. | ||||
|    * Fixed, deterministic status precedence (`fixed` > `not_affected` > `under_investigation` > `affected`) as final tiebreaker. | ||||
| 6. **Explain**: mark accepted observations (`accepted=true; reason="weight"`/`"freshness"`/`"confidence"`) and rejected ones with explicit `reason` (`"insufficient_justification"`, `"signature_unverified"`, `"lower_weight"`, `"low_confidence_linkset"`). | ||||
|  | ||||
| > The algorithm is **pure** given `S` and policy snapshot; result is reproducible and hashed into `consensusDigest`. | ||||
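The steps above can be sketched in Python as a pure function. This is an illustration under stated assumptions, not the Excititor implementation: the `Statement` shape, the linear decay in `freshness_factor`, the 30-day horizon, and the simplification that every unverified statement is filtered (rather than penalised per policy) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Final tie-break precedence from step 5 (earlier entry wins).
STATUS_PRECEDENCE = ["fixed", "not_affected", "under_investigation", "affected"]


@dataclass(frozen=True)
class Statement:
    provider_id: str
    status: str
    weight: float                 # provider weight from the policy snapshot
    last_observed: datetime
    verified: bool = True
    justification: Optional[str] = None


def freshness_factor(last_observed, now, max_age_days=30.0):
    """Decay linearly from 1.0 to 0.8 over max_age_days (decay shape is assumed)."""
    age_days = max((now - last_observed).total_seconds() / 86400.0, 0.0)
    return 1.0 - 0.2 * min(age_days / max_age_days, 1.0)


def rollup(statements, now, require_justification=True):
    """Return the winning status, or None when no statement survives filtering."""
    # Step 1: signature policy and justification gate (simplified).
    valid = [
        s for s in statements
        if s.verified
        and not (require_justification and s.status == "not_affected" and not s.justification)
    ]
    if not valid:
        return None
    totals, best_single, latest = {}, {}, {}
    for s in valid:
        score = s.weight * freshness_factor(s.last_observed, now)            # step 2
        totals[s.status] = totals.get(s.status, 0.0) + score                 # step 3
        best_single[s.status] = max(best_single.get(s.status, 0.0), score)
        latest[s.status] = max(latest.get(s.status, s.last_observed), s.last_observed)
    # Steps 4-5: argmax with tie-breakers applied in order.
    return max(
        totals,
        key=lambda st: (totals[st], best_single[st], latest[st], -STATUS_PRECEDENCE.index(st)),
    )
```

Passing `now` in explicitly, instead of reading the clock inside the function, is what keeps the result reproducible for a given `S` and policy snapshot.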
|  | ||||
| --- | ||||
|  | ||||
| ## 8) Query & export APIs | ||||
|  | ||||
| All endpoints are versioned under `/api/v1/vex`. | ||||
|  | ||||
| ### 8.1 Query (online) | ||||
|  | ||||
| ``` | ||||
| POST /observations/search | ||||
|   body: { vulnIds?: string[], productKeys?: string[], providers?: string[], since?: timestamp, limit?: int, pageToken?: string } | ||||
|   → { observations[], nextPageToken? } | ||||
|  | ||||
| POST /linksets/search | ||||
|   body: { vulnIds?: string[], productKeys?: string[], confidence?: string[], since?: timestamp, limit?: int, pageToken?: string } | ||||
|   → { linksets[], nextPageToken? } | ||||
|  | ||||
| POST /consensus/search | ||||
|   body: { vulnIds?: string[], productKeys?: string[], policyRevisionId?: string, since?: timestamp, limit?: int, pageToken?: string } | ||||
|   → { entries[], nextPageToken? } | ||||
|  | ||||
| POST /excititor/resolve (scope: vex.read) | ||||
|   body: { productKeys?: string[], purls?: string[], vulnerabilityIds: string[], policyRevisionId?: string } | ||||
|   → { policy, resolvedAt, results: [ { vulnerabilityId, productKey, status, observations[], conflicts[], linksetConfidence, consensus?, signals?, envelope? } ] } | ||||
| ``` | ||||
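The search endpoints share the `limit` / `pageToken` / `nextPageToken` pattern, so a client can drain them with one loop. The Python sketch below assumes a caller-supplied `post(path, body)` transport that returns the decoded JSON response; `search_all` and `items_key` are illustrative names, not part of any shipped client.

```python
def search_all(post, path, body, items_key, limit=100):
    """Yield every item from a paged search endpoint.

    `post` is a hypothetical transport callable: post(path, body) -> dict.
    `items_key` names the result array ("observations", "linksets" or "entries").
    """
    token = None
    while True:
        request = {**body, "limit": limit}
        if token is not None:
            request["pageToken"] = token
        page = post(path, request)
        yield from page.get(items_key, [])
        token = page.get("nextPageToken")
        if not token:  # a missing or empty token ends the iteration
            return
```

A caller would iterate, for example, `search_all(post, "/api/v1/vex/linksets/search", {"vulnIds": ["CVE-2025-0001"]}, "linksets")`.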
|  | ||||
| ### 8.2 Exports (cacheable snapshots) | ||||
|  | ||||
| ``` | ||||
| POST /exports | ||||
|   body: { signature: { vulnFilter?, productFilter?, providers?, since? }, format: raw|consensus|index, policyRevisionId?: string, force?: bool } | ||||
|   → { exportId, artifactSha256, rekor? } | ||||
|  | ||||
| GET  /exports/{exportId}        → bytes (application/json or binary index) | ||||
| GET  /exports/{exportId}/meta   → { signature, policyRevisionId, createdAt, artifactSha256, rekor? } | ||||
| ``` | ||||
|  | ||||
| ### 8.3 Provider operations | ||||
|  | ||||
| ``` | ||||
| GET  /providers                  → provider list & signature policy | ||||
| POST /providers/{id}/refresh     → trigger fetch/normalize window | ||||
| GET  /providers/{id}/status      → last fetch, doc counts, signature stats | ||||
| ``` | ||||
|  | ||||
| **Auth:** service‑to‑service via Authority tokens; operator operations via UI/CLI with RBAC. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 9) Attestation integration | ||||
|  | ||||
| * Exports can be **DSSE‑signed** via **Signer** and logged to **Rekor v2** via **Attestor** (optional but recommended for regulated pipelines). | ||||
| * `vex.exports.rekor` stores `{uuid, index, url}` when present. | ||||
| * **Predicate type**: `https://stella-ops.org/attestations/vex-export/1` with fields: | ||||
|  | ||||
|   * `querySignature`, `policyRevisionId`, `artifactSha256`, `createdAt`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 10) Configuration (YAML) | ||||
|  | ||||
| ```yaml | ||||
| excititor: | ||||
|   mongo: { uri: "mongodb://mongo/excititor" } | ||||
|   s3: | ||||
|     endpoint: http://minio:9000 | ||||
|     bucket: stellaops | ||||
|   policy: | ||||
|     weights: | ||||
|       vendor: 1.0 | ||||
|       distro: 0.9 | ||||
|       platform: 0.7 | ||||
|       hub: 0.5 | ||||
|       attestation: 0.6 | ||||
|       ceiling: 1.25 | ||||
|     scoring: | ||||
|       alpha: 0.25 | ||||
|       beta: 0.5 | ||||
|     providerOverrides: | ||||
|       redhat: 1.0 | ||||
|       suse: 0.95 | ||||
|     requireJustificationForNotAffected: true | ||||
|     signatureRequiredForFixed: true | ||||
|     minEvidence: | ||||
|       not_affected: | ||||
|         vendorOrTwoDistros: true | ||||
|   connectors: | ||||
|     - providerId: redhat | ||||
|       kind: csaf | ||||
|       baseUrl: https://access.redhat.com/security/data/csaf/v2/ | ||||
|       signaturePolicy: { type: pgp, keys: [ "…redhat-pgp-key…" ] } | ||||
|       windowDays: 7 | ||||
|     - providerId: suse | ||||
|       kind: csaf | ||||
|       baseUrl: https://ftp.suse.com/pub/projects/security/csaf/ | ||||
|       signaturePolicy: { type: pgp, keys: [ "…suse-pgp-key…" ] } | ||||
|     - providerId: ubuntu | ||||
|       kind: openvex | ||||
|       baseUrl: https://…/vex/ | ||||
|       signaturePolicy: { type: none } | ||||
|     - providerId: vendorX | ||||
|       kind: cyclonedx-vex | ||||
|       ociRef: ghcr.io/vendorx/vex@sha256:… | ||||
|       signaturePolicy: { type: cosign, cosignKeylessRoots: [ "sigstore-root" ] } | ||||
| ``` | ||||
|  | ||||
| ### 10.1 WebService endpoints | ||||
|  | ||||
| With storage configured, the WebService exposes the following ingress and diagnostic APIs: | ||||
|  | ||||
| * `GET /excititor/status` – returns the active storage configuration and registered artifact stores. | ||||
| * `GET /excititor/health` – simple liveness probe. | ||||
| * `POST /excititor/statements` – accepts normalized VEX statements and persists them via `IVexClaimStore`; use this for migrations/backfills. | ||||
| * `GET /excititor/statements/{vulnId}/{productKey}?since=` – returns the immutable statement log for a vulnerability/product pair. | ||||
| * `POST /excititor/resolve` – requires `vex.read` scope; accepts up to 256 `(vulnId, productKey)` pairs via `productKeys` or `purls` and returns deterministic consensus results, decision telemetry, and a signed envelope (`artifact` digest, optional signer signature, optional attestation metadata + DSSE envelope). Returns **409 Conflict** when the requested `policyRevisionId` mismatches the active snapshot. | ||||
|  | ||||
| Run the ingestion endpoint once after applying migration `20251019-consensus-signals-statements` to repopulate historical statements with the new severity/KEV/EPSS signal fields. | ||||
|  | ||||
| * `weights.ceiling` raises the deterministic clamp applied to provider tiers/overrides (range 1.0‒5.0). Values outside the range are clamped with warnings so operators can spot typos. | ||||
| * `scoring.alpha` / `scoring.beta` configure KEV/EPSS boosts for the Phase 1 → Phase 2 scoring pipeline. Defaults (0.25, 0.5) preserve prior behaviour; negative or excessively large values fall back with diagnostics. | ||||
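The clamping behaviour described for `weights.ceiling` can be sketched as follows. This Python fragment is illustrative only: the function names are hypothetical, and the lower bound of `0.0` for individual weights is an assumption the text does not state.

```python
import warnings

CEILING_MIN, CEILING_MAX = 1.0, 5.0  # documented range for weights.ceiling


def effective_ceiling(configured):
    """Clamp the configured ceiling into the documented range, warning on change."""
    clamped = min(max(configured, CEILING_MIN), CEILING_MAX)
    if clamped != configured:
        warnings.warn(
            "weights.ceiling %s is outside [%s, %s]; using %s"
            % (configured, CEILING_MIN, CEILING_MAX, clamped)
        )
    return clamped


def clamp_weight(weight, ceiling):
    """Clamp a tier or override weight to [0.0, ceiling] (lower bound assumed)."""
    return min(max(weight, 0.0), ceiling)
```

With the sample configuration above (`ceiling: 1.25`), a provider override of `1.4` would be clamped to `1.25`, while `0.95` passes through unchanged.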
|  | ||||
| --- | ||||
|  | ||||
| ## 11) Security model | ||||
|  | ||||
| * **Input signature verification** enforced per provider policy (PGP, cosign, x509). | ||||
| * **Connector allowlists**: outbound fetch constrained to configured domains. | ||||
| * **Tenant isolation**: per‑tenant DB prefixes or separate DBs; per‑tenant S3 prefixes; per‑tenant policies. | ||||
| * **AuthN/Z**: Authority‑issued OpToks; RBAC roles (`vex.read`, `vex.admin`, `vex.export`). | ||||
| * **No secrets in logs**; deterministic logging contexts include providerId, docDigest, observationId, and linksetId. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 12) Performance & scale | ||||
|  | ||||
| * **Targets:** | ||||
|  | ||||
|   * Normalize 10k observation statements/minute/core. | ||||
|   * Linkset rebuild ≤ 20 ms P95 for 1k unique `(vuln, product)` pairs in hot cache. | ||||
|   * Consensus (when enabled) compute ≤ 50 ms for 1k unique `(vuln, product)` pairs. | ||||
|   * Export (observations + linksets) 1M rows in ≤ 60 s on 8 cores with streaming writer. | ||||
|  | ||||
| * **Scaling:** | ||||
|  | ||||
|   * WebService handles control APIs; **Worker** background services (same image) execute fetch/normalize in parallel with rate‑limits; Mongo writes batched; upserts by natural keys. | ||||
|   * Exports stream straight to S3 (MinIO) with rolling buffers. | ||||
|  | ||||
| * **Caching:** | ||||
|  | ||||
|   * `vex.cache` maps query signatures → export; TTL to avoid stampedes; optimistic reuse unless `force`. | ||||
|  | ||||
| ### 12.1 Worker TTL refresh controls | ||||
|  | ||||
| Excititor.Worker ships with a background refresh service that re-evaluates stale consensus rows and applies stability dampers before publishing status flips. Operators can tune its behaviour through the following configuration (shown in `appsettings.json` syntax): | ||||
|  | ||||
| ```jsonc | ||||
| { | ||||
|   "Excititor": { | ||||
|     "Worker": { | ||||
|       "Refresh": { | ||||
|         "Enabled": true, | ||||
|         "ConsensusTtl": "02:00:00",       // refresh consensus older than 2 hours | ||||
|         "ScanInterval": "00:10:00",       // sweep cadence | ||||
|         "ScanBatchSize": 250,              // max documents examined per sweep | ||||
|         "Damper": { | ||||
|           "Minimum": "1.00:00:00",       // lower bound before status flip publishes | ||||
|           "Maximum": "2.00:00:00",       // upper bound guardrail | ||||
|           "DefaultDuration": "1.12:00:00", | ||||
|           "Rules": [ | ||||
|             { "MinWeight": 0.90, "Duration": "1.00:00:00" }, | ||||
|             { "MinWeight": 0.75, "Duration": "1.06:00:00" }, | ||||
|             { "MinWeight": 0.50, "Duration": "1.12:00:00" } | ||||
|           ] | ||||
|         } | ||||
|       } | ||||
|     } | ||||
|   } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| * `ConsensusTtl` governs when the worker issues a fresh resolve for cached consensus data. | ||||
| * `Damper` lengths are clamped between `Minimum`/`Maximum`; duration is bypassed when component fingerprints (`VexProduct.ComponentIdentifiers`) change. | ||||
| * The same keys are available through environment variables (e.g., `Excititor__Worker__Refresh__ConsensusTtl=02:00:00`). | ||||
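How a damper duration might be chosen from the `Rules` table is sketched below in Python. The selection rule is an assumption: it picks the rule with the highest `MinWeight` that the observation weight satisfies and falls back to `DefaultDuration` otherwise. The text only guarantees the final clamp between `Minimum` and `Maximum`.

```python
from datetime import timedelta


def damper_duration(weight, rules, default, minimum, maximum):
    """Pick a damper duration for a weight, then clamp it to [minimum, maximum].

    `rules` is a list of (min_weight, duration) pairs; matching order is assumed.
    """
    chosen = default
    for min_weight, duration in sorted(rules, key=lambda r: r[0], reverse=True):
        if weight >= min_weight:
            chosen = duration
            break
    return min(max(chosen, minimum), maximum)


# Values mirror the sample configuration above.
RULES = [
    (0.90, timedelta(days=1)),
    (0.75, timedelta(days=1, hours=6)),
    (0.50, timedelta(days=1, hours=12)),
]
DEFAULT = timedelta(days=1, hours=12)
MINIMUM, MAXIMUM = timedelta(days=1), timedelta(days=2)
```

Under these assumptions a weight of `0.8` matches the `0.75` rule, so a status flip would be held for 1 day 6 hours before publishing.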
|  | ||||
| --- | ||||
|  | ||||
| ## 13) Observability | ||||
|  | ||||
| * **Metrics:** | ||||
|  | ||||
|   * `vex.fetch.requests_total{provider}` / `vex.fetch.bytes_total{provider}` | ||||
|   * `vex.fetch.failures_total{provider,reason}` / `vex.signature.failures_total{provider,method}` | ||||
|   * `vex.normalize.statements_total{provider}` | ||||
|   * `vex.observations.write_total{result}` | ||||
|   * `vex.linksets.updated_total{result}` / `vex.linksets.conflicts_total{type}` | ||||
|   * `vex.consensus.rollup_total{status}` (when enabled) | ||||
|   * `vex.exports.bytes_total{format}` / `vex.exports.latency_seconds{format}` | ||||
| * **Tracing:** spans for fetch, verify, parse, map, observe, linkset, consensus, export. | ||||
| * **Dashboards:** provider staleness, linkset conflict hot spots, signature posture, export cache hit-rate. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 14) Testing matrix | ||||
|  | ||||
| * **Connectors:** golden raw docs → deterministic observation statements (fixtures per provider/format). | ||||
| * **Signature policies:** valid/invalid PGP/cosign/x509 samples; ensure rejects are recorded but not accepted. | ||||
| * **Normalization edge cases:** platform-scoped statements, free-text justifications, non-purl products. | ||||
| * **Linksets:** conflict scenarios across tiers; verify confidence scoring + conflict payload stability. | ||||
| * **Consensus (optional):** ensure tie-breakers honour policy weights/justification gates. | ||||
| * **Performance:** 1M-row observation/linkset export timing; memory ceilings; stream correctness. | ||||
| * **Determinism:** same inputs + policy → identical linkset hashes, conflict payloads, optional `consensusDigest`, and export bytes. | ||||
| * **API contract tests:** pagination, filters, RBAC, rate limits. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 15) Integration points | ||||
|  | ||||
| * **Backend Policy Engine** (in Scanner.WebService): calls `POST /excititor/resolve` (scope `vex.read`) with batched `(purl, vulnId)` pairs to fetch `rollupStatus + sources`. | ||||
| * **Concelier**: provides alias graph (CVE↔vendor IDs) and may supply VEX‑adjacent metadata (e.g., KEV flag) for policy escalation. | ||||
| * **UI**: VEX explorer screens use `/observations/search`, `/linksets/search`, and `/consensus/search`; show conflicts & provenance. | ||||
| * **CLI**: `stella vex linksets export --since 7d --out vex-linksets.json` (optionally `--include-consensus`) for audits and Offline Kit parity. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 16) Failure modes & fallback | ||||
|  | ||||
| * **Provider unreachable:** stale thresholds trigger warnings; policy can down‑weight stale providers automatically (freshness factor). | ||||
| * **Signature outage:** continue to ingest but mark `signatureState.verified=false`; consensus will likely exclude or down‑weight per policy. | ||||
| * **Schema drift:** unknown fields are preserved as `evidence`; normalization rejects only on **invalid identity** or **status**. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 17) Rollout plan (incremental) | ||||
|  | ||||
| 1. **MVP**: OpenVEX + CSAF connectors for 3 major providers (e.g., Red Hat/SUSE/Ubuntu), normalization + consensus + `/excititor/resolve`. | ||||
| 2. **Signature policies**: PGP for distros; cosign for OCI. | ||||
| 3. **Exports + optional attestation**. | ||||
| 4. **CycloneDX VEX** connectors; platform claim expansion tables; UI explorer. | ||||
| 5. **Scale hardening**: export indexes; conflict analytics. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 18) Operational runbooks | ||||
|  | ||||
| * **Statement backfill** — see `docs/dev/EXCITITOR_STATEMENT_BACKFILL.md` for the CLI workflow, required permissions, observability guidance, and rollback steps. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 19) Appendix — canonical JSON (stable ordering) | ||||
|  | ||||
| All exports and consensus entries are serialized via `VexCanonicalJsonSerializer`: | ||||
|  | ||||
| * UTF‑8 without BOM; | ||||
| * keys sorted (ASCII); | ||||
| * arrays sorted by `(providerId, vulnId, productKey, lastObserved)` unless semantic order mandated; | ||||
| * timestamps in `YYYY‑MM‑DDThh:mm:ssZ`; | ||||
| * no insignificant whitespace. | ||||
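The rules above can be approximated in a few lines of Python. This is a sketch of the same conventions, not a port of `VexCanonicalJsonSerializer`: number formatting and escaping may differ from the real serializer, and the helper names are hypothetical.

```python
import json
from datetime import timezone


def canonical_timestamp(dt):
    """Render a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def sort_statements(items):
    """Default array order: (providerId, vulnId, productKey, lastObserved)."""
    return sorted(
        items,
        key=lambda s: (s["providerId"], s["vulnId"], s["productKey"], s["lastObserved"]),
    )


def canonical_json(value):
    """Serialize with sorted keys and no insignificant whitespace, as UTF-8 without BOM."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")
```

Hashing the bytes returned by `canonical_json` then gives a digest that does not depend on key insertion order.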
|  | ||||
							
								
								
									
21  docs/modules/excititor/implementation_plan.md  Normal file
									
								
							| @@ -0,0 +1,21 @@ | ||||
| # Implementation plan — Excititor | ||||
|  | ||||
| ## Current objectives | ||||
| - Maintain deterministic behaviour and offline parity across releases. | ||||
| - Keep documentation, telemetry, and runbooks aligned with the latest sprint outcomes. | ||||
|  | ||||
| ## Workstreams | ||||
| - Backlog grooming: reconcile open stories in ../../TASKS.md with this module's roadmap. | ||||
| - Implementation: collaborate with service owners to land feature work defined in SPRINTS/EPIC docs. | ||||
| - Validation: extend tests/fixtures to preserve determinism and provenance requirements. | ||||
|  | ||||
| ## Epic milestones | ||||
| - **Epic 1 – AOC enforcement:** enforce immutable VEX observation schema, provenance capture, and guardrails. | ||||
| - **Epic 7 – VEX Consensus Lens:** provide lens-ready metadata (issuer trust, temporal scoping) and consensus APIs. | ||||
| - **Epic 8 – Advisory AI:** guarantee citation-ready payloads and normalized context for AI summaries/explainers. | ||||
| - Track DOCS-LNM-22-006/007 and CLI-EXC-25-001..002 in ../../TASKS.md. | ||||
|  | ||||
| ## Coordination | ||||
| - Review ./AGENTS.md before picking up new work. | ||||
| - Sync with cross-cutting teams noted in ../../implplan/SPRINTS.md. | ||||
| - Update this plan whenever scope, dependencies, or guardrails change. | ||||
							
								
								
									
164  docs/modules/excititor/mirrors.md  Normal file
									
								
							| @@ -0,0 +1,164 @@ | ||||
| # architecture_excititor_mirrors.md — Excititor Mirror Distribution | ||||
|  | ||||
| > **Status:** Draft (Sprint 7). Complements `docs/modules/excititor/architecture.md` by describing the mirror export surface exposed by `Excititor.WebService` and the configuration hooks used by operators and downstream mirrors. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 0) Purpose | ||||
|  | ||||
| Excititor publishes canonical VEX consensus data. Operators (or StellaOps-managed mirrors) need a deterministic way to sync those exports into downstream environments. Mirror distribution provides: | ||||
|  | ||||
| * A declarative map of export bundles (`json`, `jsonl`, `openvex`, `csaf`) reachable via signed HTTP endpoints under `/excititor/mirror`. | ||||
| * Thin quota/authentication controls on top of the existing export cache so mirrors cannot starve the web service. | ||||
| * Stable payload shapes that downstream automation can monitor (index → fetch updates → download artifact → verify signature). | ||||
|  | ||||
| Mirror endpoints are intentionally **read-only**. Write paths (export generation, attestation, cache) remain the responsibility of the export pipeline. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) Configuration model | ||||
|  | ||||
| The web service reads mirror configuration from `Excititor:Mirror` (YAML/JSON/appsettings). Each domain groups a set of exports that share rate limits and authentication rules. | ||||
|  | ||||
| ```yaml | ||||
| Excititor: | ||||
|   Mirror: | ||||
|     Domains: | ||||
|       - id: primary | ||||
|         displayName: Primary Mirror | ||||
|         requireAuthentication: false | ||||
|         maxIndexRequestsPerHour: 600 | ||||
|         maxDownloadRequestsPerHour: 1200 | ||||
|         exports: | ||||
|           - key: consensus | ||||
|             format: json | ||||
|             filters: | ||||
|               vulnId: CVE-2025-0001 | ||||
|               productKey: pkg:test/demo | ||||
|             sort: | ||||
|               createdAt: false     # descending | ||||
|             limit: 1000 | ||||
|           - key: consensus-openvex | ||||
|             format: openvex | ||||
|             filters: | ||||
|               vulnId: CVE-2025-0001 | ||||
| ``` | ||||
|  | ||||
| ### Root settings | ||||
|  | ||||
| | Field | Required | Description | | ||||
| | --- | --- | --- | | ||||
| | `outputRoot` | – | Filesystem root where mirror artefacts are written. Defaults to the Excititor file-system artifact store root when omitted. | | ||||
| | `directoryName` | – | Optional subdirectory created under `outputRoot`; defaults to `mirror`. | | ||||
| | `targetRepository` | – | Hint propagated to manifests/index files indicating the operator-visible location (for example `s3://mirror/excititor`). | | ||||
| | `signing` | – | Bundle signing configuration. When enabled, the exporter emits a detached JWS (`bundle.json.jws`) alongside each domain bundle. | | ||||
|  | ||||
| `signing` supports the following fields: | ||||
|  | ||||
| | Field | Required | Description | | ||||
| | --- | --- | --- | | ||||
| | `enabled` | – | Toggles detached signing for domain bundles. | | ||||
| | `algorithm` | – | Signing algorithm identifier (default `ES256`). | | ||||
| | `keyId` | ✅ (when `enabled`) | Signing key identifier resolved via the configured crypto provider registry. | | ||||
| | `provider` | – | Optional provider hint when multiple registries are available. | | ||||
| | `keyPath` | – | Optional PEM path used to seed the provider when the key is not already loaded. | | ||||
|  | ||||
| ### Domain field reference | ||||
|  | ||||
| | Field | Required | Description | | ||||
| | --- | --- | --- | | ||||
| | `id` | ✅ | Stable identifier. Appears in URLs (`/excititor/mirror/domains/{id}`) and download filenames. | | ||||
| | `displayName` | – | Human-friendly label surfaced in the `/domains` listing. Falls back to `id`. | | ||||
| | `requireAuthentication` | – | When `true` the service enforces that the caller is authenticated (Authority token). | | ||||
| | `maxIndexRequestsPerHour` | – | Per-domain quota for index endpoints. `0`/negative disables the guard. | | ||||
| | `maxDownloadRequestsPerHour` | – | Per-domain quota for artifact downloads. | | ||||
| | `exports` | ✅ | Collection of export projections. | | ||||
|  | ||||
| Export-level fields: | ||||
|  | ||||
| | Field | Required | Description | | ||||
| | --- | --- | --- | | ||||
| | `key` | ✅ | Unique key within the domain. Used in URLs (`/exports/{key}`) and filenames/bundle entries. | | ||||
| | `format` | ✅ | One of `json`, `jsonl`, `openvex`, `csaf`. Maps to `VexExportFormat`. | | ||||
| | `filters` | – | Key/value pairs executed via `VexQueryFilter`. Keys must match export data source columns (e.g., `vulnId`, `productKey`). | | ||||
| | `sort` | – | Key/boolean map (false = descending). | | ||||
| | `limit`, `offset`, `view` | – | Optional query bounds passed through to the export query. | | ||||
|  | ||||
| ⚠️ **Misconfiguration:** invalid formats or missing keys cause exports to be flagged with `status` in the index response; they are not exposed downstream. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2) HTTP surface | ||||
|  | ||||
| Routes are grouped under `/excititor/mirror`. | ||||
|  | ||||
| | Method | Path | Description | | ||||
| | --- | --- | --- | | ||||
| | `GET` | `/domains` | Returns configured domains with quota metadata. | | ||||
| | `GET` | `/domains/{domainId}` | Domain detail (auth/quota + export keys). `404` for unknown domains. | | ||||
| | `GET` | `/domains/{domainId}/index` | Lists exports with exportId, query signature, format, artifact digest, attestation metadata, and size. Applies index quota. | | ||||
| | `GET` | `/domains/{domainId}/exports/{exportKey}` | Returns manifest metadata (single export). `404` if unknown/missing. | | ||||
| | `GET` | `/domains/{domainId}/exports/{exportKey}/download` | Streams export content from the artifact store. Applies download quota. | | ||||
|  | ||||
| Responses are serialized via `VexCanonicalJsonSerializer` ensuring stable ordering. Download responses include a content-disposition header naming the file `<domain>-<export>.<ext>`. | ||||
|  | ||||
| ### Error handling | ||||
|  | ||||
| * `401` – authentication required (`requireAuthentication=true`). | ||||
| * `404` – domain/export not found or manifest not persisted. | ||||
| * `429` – per-domain quota exceeded (`Retry-After` header set in seconds). | ||||
| * `503` – export misconfiguration (invalid format/query). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3) Rate limiting | ||||
|  | ||||
| `MirrorRateLimiter` implements a simple rolling 1-hour window using `IMemoryCache`. Each domain has two quotas: | ||||
|  | ||||
| * `index` scope → `maxIndexRequestsPerHour` | ||||
| * `download` scope → `maxDownloadRequestsPerHour` | ||||
|  | ||||
| `0` or negative limits disable enforcement. Quotas are best-effort (per-instance). For HA deployments, configure sticky routing at the ingress or replace the limiter with a distributed implementation. | ||||
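A per-instance rolling window of this kind can be sketched in Python as below. It is not `MirrorRateLimiter`: the real limiter is built on `IMemoryCache`, and its window bookkeeping may differ. This sketch keeps a timestamp log per `(domain, scope)` and takes an injectable clock so it can be exercised without waiting.

```python
import math
import time
from collections import deque


class RollingWindowLimiter:
    """In-memory rolling-window limiter keyed by (domain, scope)."""

    def __init__(self, window_seconds=3600.0, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock
        self._hits = {}

    def try_acquire(self, domain_id, scope, limit):
        """Return (allowed, retry_after_seconds)."""
        if limit <= 0:  # zero or negative limits disable enforcement
            return True, 0
        now = self._clock()
        hits = self._hits.setdefault((domain_id, scope), deque())
        while hits and now - hits[0] >= self._window:
            hits.popleft()  # drop requests that have left the window
        if len(hits) < limit:
            hits.append(now)
            return True, 0
        # Seconds until the oldest request leaves the window.
        retry_after = math.ceil(self._window - (now - hits[0]))
        return False, max(retry_after, 1)
```

The second tuple element is the kind of value a handler could place in the `Retry-After` header of a `429` response. State lives in process memory, which is why the quotas are best-effort per instance.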
|  | ||||
| --- | ||||
|  | ||||
| ## 4) Interaction with export pipeline | ||||
|  | ||||
| Mirror endpoints consume manifests produced by the export engine (`MongoVexExportStore`). They do **not** trigger new exports. Operators must configure connectors/exporters to keep targeted exports fresh (see `EXCITITOR-EXPORT-01-005/006/007`). | ||||
|  | ||||
| Recommended workflow: | ||||
|  | ||||
| 1. Define export plans at the export layer (JSON/OpenVEX/CSAF). | ||||
| 2. Configure mirror domains mapping to those plans. | ||||
| 3. Downstream mirror automation: | ||||
|    * `GET /domains/{id}/index` | ||||
|    * Compare `exportId` / `consensusRevision` | ||||
|    * `GET /domains/{id}/exports/{key}/download` when new | ||||
|    * Verify digest + attestation | ||||
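The compare-and-verify part of step 3 might look like the following Python sketch. The index entry field names (`exportKey`, `exportId`) and the `sha256:<hex>` digest notation are assumptions for illustration; attestation verification is out of scope here.

```python
import hashlib


def exports_to_fetch(index_entries, known_export_ids):
    """Return export keys whose exportId differs from what the mirror already holds.

    `known_export_ids` maps export key -> last synced exportId (hypothetical state).
    """
    return [
        entry["exportKey"]
        for entry in index_entries
        if known_export_ids.get(entry["exportKey"]) != entry["exportId"]
    ]


def verify_digest(payload, expected):
    """Check downloaded bytes against a 'sha256:<hex>' digest string."""
    algorithm, _, hex_digest = expected.partition(":")
    if algorithm != "sha256" or not hex_digest:
        raise ValueError("unsupported digest: %r" % expected)
    return hashlib.sha256(payload).hexdigest() == hex_digest.lower()
```

A mirror would download only the keys returned by `exports_to_fetch`, and record the new `exportId` only after `verify_digest` succeeds.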
|  | ||||
| When the export engine runs, it materializes the following artefacts under `outputRoot/<directoryName>`: | ||||
|  | ||||
| - `index.json` – canonical index listing each configured domain, manifest/bundle descriptors (with SHA-256 digests), and available export keys. | ||||
| - `<domain>/manifest.json` – per-domain summary with export metadata (query signature, consensus/score digests, source providers) and a descriptor pointing at the bundle. | ||||
| - `<domain>/bundle.json` – canonical payload containing serialized consensus, score envelopes, and normalized VEX claims for the matching export definitions. | ||||
| - `<domain>/bundle.json.jws` – optional detached JWS when signing is enabled. | ||||
|  | ||||
| Downstream automation reads `manifest.json`/`bundle.json` directly, while `/excititor/mirror` endpoints stream the same artefacts over HTTP (authenticated when the domain sets `requireAuthentication`). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Operational guidance | ||||
|  | ||||
| * Track quota utilisation via HTTP 429 metrics (configure structured logging or OTEL counters when rate limiting triggers). | ||||
| * Mirror domains can be deployed per tenant (e.g., `tenant-a`, `tenant-b`) with different auth requirements. | ||||
| * Ensure the underlying artifact stores (`FileSystem`, `S3`, offline bundle) retain artefacts long enough for mirrors to sync. | ||||
| * For air-gapped mirrors, combine mirror endpoints with the Offline Kit (see `docs/24_OFFLINE_KIT.md`). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Future alignment | ||||
|  | ||||
| * Replace manual export definitions with generated mirror bundle manifests once `EXCITITOR-EXPORT-01-007` ships. | ||||
| * Extend `/index` payload with quiet-provenance when `EXCITITOR-EXPORT-01-006` adds that metadata. | ||||
| * Integrate domain manifests with DevOps mirror profiles (`DEVOPS-MIRROR-08-001`) so helm/compose overlays can enable or disable domains declaratively. | ||||
|  | ||||
							
								
								
									
										104
									
								
								docs/modules/excititor/scoring.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,104 @@ | ||||
| ## Status | ||||
|  | ||||
| This document tracks the forward-looking risk scoring model for Excititor. The calculation below is not active yet; Sprint 7 work will add the required schema fields, policy controls, and services. Until that ships, Excititor emits consensus statuses without numeric scores. | ||||
|  | ||||
| ## Scoring model (target state) | ||||
|  | ||||
| **S = Gate(VEX_status) × W_trust(source) × [Severity_base × (1 + α·KEV + β·EPSS)]** | ||||
|  | ||||
| * **Gate(VEX_status)**: `affected`/`under_investigation` → 1, `not_affected`/`fixed` → 0. A trusted “not affected” or “fixed” still zeroes the score. | ||||
| * **W_trust(source)**: normalized policy weight (baseline 0‒1). Policies may opt into >1 boosts for signed vendor feeds once Phase 1 closes. | ||||
| * **Severity_base**: canonical numeric severity from Concelier (CVSS or org-defined scale). | ||||
| * **KEV flag**: 0/1 boost when CISA Known Exploited Vulnerabilities applies. | ||||
| * **EPSS**: exploit probability in [0, 1]; it enters the score only through the bounded term β·EPSS. | ||||
| * **α, β**: configurable coefficients (default α=0.25, β=0.5) stored in policy. | ||||
|  | ||||
| Safeguards: freeze boosts when product identity is unknown, clamp outputs ≥0, and log every factor in the audit trail. | ||||
|  | ||||
| ## Implementation roadmap | ||||
|  | ||||
| | Phase | Scope | Artifacts | | ||||
| | --- | --- | --- | | ||||
| | **Phase 1 – Schema foundations** | Extend Excititor consensus/claims and Concelier canonical advisories with severity, KEV, EPSS, and expose α/β + weight ceilings in policy. | Sprint 7 tasks `EXCITITOR-CORE-02-001`, `EXCITITOR-POLICY-02-001`, `EXCITITOR-STORAGE-02-001`, `FEEDCORE-ENGINE-07-001`. | | ||||
| | **Phase 2 – Deterministic score engine** | Implement a scoring component that executes alongside consensus and persists score envelopes with hashes. | Planned task `EXCITITOR-CORE-02-002` (backlog). | | ||||
| | **Phase 3 – Surfacing & enforcement** | Expose scores via WebService/CLI, integrate with Concelier noise priors, and enforce policy-based suppressions. | To be scheduled after Phase 2. | | ||||
|  | ||||
| ## Policy controls (Phase 1) | ||||
|  | ||||
| Operators tune scoring inputs through the Excititor policy document: | ||||
|  | ||||
| ```yaml | ||||
| excititor: | ||||
|   policy: | ||||
|     weights: | ||||
|       vendor: 1.10      # per-tier weight | ||||
|       ceiling: 1.40     # max clamp applied to tiers and overrides (1.0‒5.0) | ||||
|     providerOverrides: | ||||
|       trusted.vendor: 1.35 | ||||
|     scoring: | ||||
|       alpha: 0.30       # KEV boost coefficient (defaults to 0.25) | ||||
|       beta: 0.60        # EPSS boost coefficient (defaults to 0.50) | ||||
| ``` | ||||
|  | ||||
| * All weights (tiers + overrides) are clamped to `[0, weights.ceiling]` with structured warnings when a value is out of range or not a finite number. | ||||
| * `weights.ceiling` itself is constrained to `[1.0, 5.0]`, preserving prior behaviour when omitted. | ||||
| * `scoring.alpha` / `scoring.beta` accept non-negative values up to 5.0; values outside the range fall back to defaults and surface diagnostics to operators. | ||||
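The clamping rules above can be sketched as follows. This is a minimal sketch, not the Excititor implementation: the function names and warning strings are hypothetical, and the fallback of `0.0` for a non-finite weight is an assumption, since the text only says that such values produce a warning.

```python
import math

DEFAULT_ALPHA = 0.25
DEFAULT_BETA = 0.50
MAX_COEFFICIENT = 5.0


def clamp_ceiling(value: float) -> float:
    """Constrain weights.ceiling to [1.0, 5.0]."""
    return min(5.0, max(1.0, value))


def clamp_weight(value: float, ceiling: float, warnings: list, name: str) -> float:
    """Clamp a tier or override weight to [0, ceiling], recording a warning if adjusted."""
    if not math.isfinite(value):
        warnings.append(f"{name}: not a finite number")
        return 0.0  # assumption: the documented fallback for NaN/inf is not specified
    clamped = min(ceiling, max(0.0, value))
    if clamped != value:
        warnings.append(f"{name}: {value} out of range, clamped to {clamped}")
    return clamped


def coefficient_or_default(value: float, default: float, warnings: list, name: str) -> float:
    """Accept alpha/beta in [0, 5.0]; anything else falls back to the default."""
    if math.isfinite(value) and 0.0 <= value <= MAX_COEFFICIENT:
        return value
    warnings.append(f"{name}: {value} rejected, using default {default}")
    return default
```

With the sample policy above, `trusted.vendor: 1.35` passes through unchanged under a ceiling of 1.40, while a weight of 2.0 would be clamped to 1.40 and reported.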
|  | ||||
| ## Data model (after Phase 1) | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "vulnerabilityId": "CVE-2025-12345", | ||||
|   "product": "pkg:name@version", | ||||
|   "consensus": { | ||||
|     "status": "affected", | ||||
|     "policyRevisionId": "rev-12", | ||||
|     "policyDigest": "0D9AEC…" | ||||
|   }, | ||||
|   "signals": { | ||||
|     "severity": {"scheme": "CVSS:3.1", "score": 7.5}, | ||||
|     "kev": true, | ||||
|     "epss": 0.40 | ||||
|   }, | ||||
|   "policy": { | ||||
|     "weight": 1.15, | ||||
|     "alpha": 0.25, | ||||
|     "beta": 0.5 | ||||
|   }, | ||||
|   "score": { | ||||
|     "value": 12.51, | ||||
|     "generatedAt": "2025-11-05T14:12:30Z", | ||||
|     "audit": [ | ||||
|       "gate:affected", | ||||
|       "weight:1.15", | ||||
|       "severity:7.5", | ||||
|       "kev:1", | ||||
|       "epss:0.40" | ||||
|     ] | ||||
|   } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| ## Operational guidance | ||||
|  | ||||
| * **Inputs**: Concelier delivers severity/KEV/EPSS via the advisory event log; Excititor connectors load VEX statements. Policy owns trust tiers and coefficients. | ||||
| * **Processing**: the scoring engine (Phase 2) runs next to consensus, storing results with deterministic hashes so exports and attestations can reference them. | ||||
| * **Consumption**: WebService/CLI will return consensus plus score; scanners may suppress findings only when policy-authorized VEX gating and signed score envelopes agree. | ||||
|  | ||||
| ## Pseudocode (Phase 2 preview) | ||||
|  | ||||
| ```python | ||||
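| def risk_score(gate, weight, severity, kev, epss, alpha, beta, freeze_boosts=False): | ||||
|     """Return S = gate * weight * severity * (1 + alpha*kev + beta*epss), clamped to >= 0.""" | ||||
|     # Gate: 0 for not_affected/fixed, 1 for affected/under_investigation. | ||||
|     if gate == 0: | ||||
|         return 0 | ||||
|     # Safeguard: drop the KEV/EPSS boosts when product identity is unknown. | ||||
|     if freeze_boosts: | ||||
|         kev, epss = 0, 0 | ||||
|     boost = 1 + alpha * kev + beta * epss | ||||
|     return max(0, weight * severity * boost) | ||||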
| ``` | ||||
|  | ||||
| ## FAQ | ||||
|  | ||||
| * **Can operators opt out?** Set α=β=0 or keep weights ≤1.0 via policy. | ||||
| * **What about missing signals?** Treat them as zero and log the omission. | ||||
| * **When will this ship?** Phase 1 is planned for Sprint 7; later phases depend on connector coverage and attestation delivery. | ||||
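The missing-signal rule can be sketched as follows. This is a minimal sketch, assuming the `signals` object has the shape shown in the data model section; the `*:missing` audit labels and the function name are hypothetical and only illustrate logging the omission.

```python
def normalise_signals(signals: dict) -> tuple:
    """Return (severity, kev, epss, audit) with absent signals treated as zero."""
    audit = []
    severity = (signals.get("severity") or {}).get("score")
    kev = signals.get("kev")
    epss = signals.get("epss")
    if severity is None:
        audit.append("severity:missing")  # hypothetical audit label
        severity = 0.0
    if kev is None:
        audit.append("kev:missing")
        kev = False
    if epss is None:
        audit.append("epss:missing")
        epss = 0.0
    # KEV becomes the 0/1 flag used in the formula.
    return float(severity), int(bool(kev)), float(epss), audit
```

Treating a missing severity as zero also zeroes the final score, so the audit entry is what tells an operator that the result reflects absent data rather than a benign finding.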
							
								
								
									
										22
									
								
								docs/modules/export-center/AGENTS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,22 @@ | ||||
| # Export Center agent guide | ||||
|  | ||||
| ## Mission | ||||
| Export Center packages reproducible evidence bundles (JSON, Trivy DB, mirror) with provenance metadata and optional signing for offline or mirrored deployments. | ||||
|  | ||||
| ## Key docs | ||||
| - [Module README](./README.md) | ||||
| - [Architecture](./architecture.md) | ||||
| - [Implementation plan](./implementation_plan.md) | ||||
| - [Task board](./TASKS.md) | ||||
|  | ||||
| ## How to get started | ||||
| 1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module. | ||||
| 2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED). | ||||
| 3. Read the architecture and README for domain context before editing code or docs. | ||||
| 4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan. | ||||
|  | ||||
| ## Guardrails | ||||
| - Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md). | ||||
| - Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts. | ||||
| - Keep Offline Kit parity in mind—document air-gapped workflows for any new feature. | ||||
| - Update runbooks/observability assets when operational characteristics change. | ||||
							
								
								
									
										34
									
								
								docs/modules/export-center/README.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,34 @@ | ||||
| # StellaOps Export Center | ||||
|  | ||||
| Export Center packages reproducible evidence bundles (JSON, Trivy DB, mirror) with provenance metadata and optional signing for offline or mirrored deployments. | ||||
|  | ||||
| ## Responsibilities | ||||
| - Coordinate export jobs based on profiles and scope selectors. | ||||
| - Assemble manifests, provenance documents, and cosign signatures. | ||||
| - Stream bundles via HTTP/OCI and stage them for Offline Kit use. | ||||
| - Expose CLI/API surfaces for automation. | ||||
|  | ||||
| ## Key components | ||||
| - `StellaOps.ExportCenter.WebService` planner. | ||||
| - `StellaOps.ExportCenter.Worker` bundle builder. | ||||
| - Adapters in `StellaOps.ExportCenter.*` for JSON/Trivy/mirror variants. | ||||
|  | ||||
| ## Integrations & dependencies | ||||
| - Concelier/Excititor/Policy data stores for evidence. | ||||
| - Signer/Attestor for provenance signing. | ||||
| - CLI for operator-managed exports. | ||||
|  | ||||
| ## Operational notes | ||||
| - Runbooks in ./operations/ for deployment and monitoring. | ||||
| - Mirror bundle instructions and validation notes. | ||||
| - Telemetry dashboards for export latency and retry rates. | ||||
|  | ||||
| ## Related resources | ||||
| - ./operations/runbook.md | ||||
|  | ||||
| ## Backlog references | ||||
| - DOCS-EXPORT-35-001 … DOCS-EXPORT-37-002 in ../../TASKS.md. | ||||
| - EXPORT-ATTEST-75-002 cross-team deliverable. | ||||
|  | ||||
| ## Epic alignment | ||||
| - **Epic 10 – Export Center:** deliver canonical JSON, Trivy DB, and mirror bundle workflows with provenance, signatures, and offline parity. | ||||
							
								
								
									
										9
									
								
								docs/modules/export-center/TASKS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,9 @@ | ||||
| # Task board — Export Center | ||||
|  | ||||
| > Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable. | ||||
|  | ||||
| | ID | Status | Owner(s) | Description | Notes | | ||||
| |----|--------|----------|-------------|-------| | ||||
| | EXPORT CENTER-DOCS-0001 | DOING (2025-10-29) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md | | ||||
| | EXPORT CENTER-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md | | ||||
| | EXPORT CENTER-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow | | ||||
							
								
								
									
										337
									
								
								docs/modules/export-center/api.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,337 @@ | ||||
| # Export Center REST API | ||||
|  | ||||
| > **Audience:** Platform integrators, Console/CLI developers, and automation engineers orchestrating export runs.   | ||||
| > **Base route:** `/api/export/*` behind the StellaOps gateway; requires Authority-issued tokens with export scopes. | ||||
|  | ||||
| This reference describes the Export Center API introduced in Export Center Phase 1 (Epic 10) and extended in Phase 2. Use it alongside the [Export Center Architecture](architecture.md) and [Profiles](profiles.md) guides for service-level semantics. | ||||
|  | ||||
| > Status: Endpoint implementation lands with `EXPORT-SVC-35-006` (Sprint 35) and related follow-on tasks. As of the current build the WebService hosts only the template stub; use this contract for coordination and update once the API is wired. | ||||
|  | ||||
| ## 1. Authentication and headers | ||||
|  | ||||
| - **Authorization:** Bearer tokens in `Authorization: Bearer <token>` paired with DPoP proof. Required scopes per endpoint: | ||||
|   - `export:profile:manage` for profile CRUD. | ||||
|   - `export:run` to submit and cancel runs. | ||||
|   - `export:read` to list and inspect runs. | ||||
|   - `export:download` for bundle downloads and manifests. | ||||
| - **Tenant context:** Provide `X-Stella-Tenant` when the token carries multiple tenants; defaults to token tenant otherwise. | ||||
| - **Idempotency:** Mutating endpoints accept `Idempotency-Key` (UUID). Retrying with the same key returns the original result. | ||||
| - **Rate limits and quotas:** Responses include `X-Stella-Quota-Limit`, `X-Stella-Quota-Remaining`, and `X-Stella-Quota-Reset`. Exceeding quotas returns `429 Too Many Requests` with `ERR_EXPORT_QUOTA`. | ||||
| - **Content negotiation:** Requests and responses use `application/json; charset=utf-8` unless otherwise stated. Downloads stream binary content with profile-specific media types. | ||||
| - **SSE:** Event streams set `Content-Type: text/event-stream` and keep connections alive with comment heartbeats every 15 seconds. | ||||
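The header conventions above can be sketched as a small helper. This is a minimal sketch under stated assumptions: the helper names are illustrative, and sending the proof in a `DPoP` request header follows RFC 9449 rather than anything this reference spells out.

```python
import uuid
from typing import Dict, Optional


def new_idempotency_key() -> str:
    """Generate a UUID for Idempotency-Key; reuse the same value when retrying."""
    return str(uuid.uuid4())


def build_headers(token: str, dpop_proof: str, tenant: Optional[str] = None,
                  idempotency_key: Optional[str] = None) -> Dict[str, str]:
    """Assemble request headers for an Export Center call."""
    headers = {
        "Authorization": f"Bearer {token}",
        "DPoP": dpop_proof,  # assumption: proof carried in the RFC 9449 header
        "Content-Type": "application/json; charset=utf-8",
    }
    if tenant:
        # Only needed when the token carries more than one tenant.
        headers["X-Stella-Tenant"] = tenant
    if idempotency_key:
        headers["Idempotency-Key"] = idempotency_key
    return headers
```

Generating the key once per logical operation, outside the retry loop, is what lets a retried request return the original result.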
|  | ||||
| ## 2. Error model | ||||
|  | ||||
| Errors follow standard HTTP codes with structured payloads: | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "code": "ERR_EXPORT_002", | ||||
|   "message": "Profile not found for tenant acme", | ||||
|   "details": [], | ||||
|   "traceId": "01J9N4Y4K2XY8C5V7T2S", | ||||
|   "timestamp": "2025-10-29T13:42:11Z" | ||||
| } | ||||
| ``` | ||||
|  | ||||
| | Code | Description | Typical HTTP status | Notes | | ||||
| |------|-------------|---------------------|-------| | ||||
| | `ERR_EXPORT_001` | Validation failure (selectors, configuration) | 400 | `details` enumerates offending fields. | | ||||
| | `ERR_EXPORT_002` | Profile missing or not accessible for tenant | 404 | Returned on run submission or profile fetch. | | ||||
| | `ERR_EXPORT_003` | Concurrency or quota exceeded | 429 | Includes `retryAfterSeconds` in `details`. | | ||||
| | `ERR_EXPORT_004` | Adapter failure (schema mismatch, upstream outage) | 502 | Worker logs contain adapter error reason. | | ||||
| | `ERR_EXPORT_005` | Signing or KMS error | 500 | Run marked failed with `errorCode=signing`. | | ||||
| | `ERR_EXPORT_006` | Distribution failure (HTTP, OCI, object storage) | 502 | `details` lists failing distribution driver. | | ||||
| | `ERR_EXPORT_007` | Run canceled or expired | 409 | Includes cancel author and timestamp. | | ||||
| | `ERR_EXPORT_BASE_MISSING` | Base manifest for delta exports not found | 400 | Specific to `mirror:delta`. | | ||||
| | `ERR_EXPORT_EMPTY` | No records matched selectors (when `allowEmpty=false`) | 422 | Useful for guard-railed automation. | | ||||
| | `ERR_EXPORT_QUOTA` | Daily quota exhausted | 429 | Always paired with quota headers. | | ||||
|  | ||||
| All responses include `traceId` for correlation with logs and metrics. | ||||
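A client might use the error model to decide whether a retry is worthwhile. This is a minimal sketch, assuming `details` is a list of objects and that a throttling response may carry `retryAfterSeconds` in one of them; that shape, the function name, and the 30-second fallback are all assumptions rather than part of the contract.

```python
import json
from typing import Optional

# Codes the table above associates with HTTP 429.
THROTTLED_CODES = {"ERR_EXPORT_003", "ERR_EXPORT_QUOTA"}


def retry_delay_seconds(status: int, body: str) -> Optional[float]:
    """Return a delay when the error looks retryable, else None."""
    error = json.loads(body)
    if status != 429 or error.get("code") not in THROTTLED_CODES:
        return None
    # Assumed shape: one detail object may carry the server's retry hint.
    for detail in error.get("details") or []:
        if isinstance(detail, dict) and "retryAfterSeconds" in detail:
            return float(detail["retryAfterSeconds"])
    return 30.0  # hypothetical fallback when no hint is present
```

For `ERR_EXPORT_QUOTA`, the `X-Stella-Quota-Reset` header is likely a better signal than a fixed fallback, since a daily quota will not recover within seconds.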
|  | ||||
| ## 3. Profiles endpoints | ||||
|  | ||||
| ### 3.1 List profiles | ||||
|  | ||||
| ``` | ||||
| GET /api/export/profiles?kind=json&variant=raw&page=1&pageSize=20 | ||||
| Scopes: export:read | ||||
| ``` | ||||
|  | ||||
| Returns tenant-scoped profiles. Response headers: `X-Total-Count`, `Link` for pagination. | ||||
|  | ||||
| **Response** | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "items": [ | ||||
|     { | ||||
|       "profileId": "prof-json-raw", | ||||
|       "name": "Daily JSON Raw", | ||||
|       "kind": "json", | ||||
|       "variant": "raw", | ||||
|       "distribution": ["http", "object"], | ||||
|       "retention": {"mode": "days", "value": 14}, | ||||
|       "createdAt": "2025-10-23T08:00:00Z", | ||||
|       "createdBy": "user:ops" | ||||
|     } | ||||
|   ], | ||||
|   "page": 1, | ||||
|   "pageSize": 20 | ||||
| } | ||||
| ``` | ||||
|  | ||||
| ### 3.2 Get a profile | ||||
|  | ||||
| ``` | ||||
| GET /api/export/profiles/{profileId} | ||||
| Scopes: export:read | ||||
| ``` | ||||
|  | ||||
| Returns full configuration, including `config` payload, distribution options, and metadata. | ||||
|  | ||||
| ### 3.3 Create a profile | ||||
|  | ||||
| ``` | ||||
| POST /api/export/profiles | ||||
| Scopes: export:profile:manage | ||||
| ``` | ||||
|  | ||||
| **Request** | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "profileId": "prof-airgap-mirror", | ||||
|   "name": "Airgap Mirror Weekly", | ||||
|   "kind": "mirror", | ||||
|   "variant": "full", | ||||
|   "include": ["advisories", "vex", "sboms", "policy"], | ||||
|   "distribution": ["http", "object"], | ||||
|   "encryption": { | ||||
|     "enabled": true, | ||||
|     "recipientKeys": ["age1tenantkey..."], | ||||
|     "strict": false | ||||
|   }, | ||||
|   "retention": {"mode": "days", "value": 30} | ||||
| } | ||||
| ``` | ||||
|  | ||||
| **Response 201** | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "profileId": "prof-airgap-mirror", | ||||
|   "version": 1, | ||||
|   "createdAt": "2025-10-29T12:05:22Z", | ||||
|   "createdBy": "user:ops", | ||||
|   "status": "active" | ||||
| } | ||||
| ``` | ||||
|  | ||||
| ### 3.4 Update profile metadata | ||||
|  | ||||
| ``` | ||||
| PATCH /api/export/profiles/{profileId} | ||||
| Scopes: export:profile:manage | ||||
| ``` | ||||
|  | ||||
| Allows renaming, toggling distribution switches, or updating retention. Structural configuration updates (kind/variant/include) create a new revision; the API returns `revisionCreated=true` and the new `profileId` (e.g., `prof-airgap-mirror@2`). | ||||
|  | ||||
| ### 3.5 Archive profile | ||||
|  | ||||
| ``` | ||||
| POST /api/export/profiles/{profileId}:archive | ||||
| Scopes: export:profile:manage | ||||
| ``` | ||||
|  | ||||
| Marks profile as inactive; existing runs remain accessible. Use `:restore` to reactivate. | ||||
|  | ||||
| ## 4. Run management | ||||
|  | ||||
| ### 4.1 Submit an export run | ||||
|  | ||||
| ``` | ||||
| POST /api/export/runs | ||||
| Scopes: export:run | ||||
| ``` | ||||
|  | ||||
| **Request** | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "profileId": "prof-json-raw", | ||||
|   "selectors": { | ||||
|     "tenants": ["acme"], | ||||
|     "timeWindow": { | ||||
|       "from": "2025-10-01T00:00:00Z", | ||||
|       "to": "2025-10-29T00:00:00Z" | ||||
|     }, | ||||
|     "products": ["registry.example.com/app:*"], | ||||
|     "sboms": ["sbom:S-1001", "sbom:S-2004"] | ||||
|   }, | ||||
|   "policySnapshotId": "policy-snap-42", | ||||
|   "options": { | ||||
|     "allowEmpty": false, | ||||
|     "priority": "standard" | ||||
|   } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| **Response 202** | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "runId": "run-20251029-01", | ||||
|   "status": "pending", | ||||
|   "profileId": "prof-json-raw", | ||||
|   "createdAt": "2025-10-29T12:12:11Z", | ||||
|   "createdBy": "user:ops", | ||||
|   "selectors": { "...": "..." }, | ||||
|   "links": { | ||||
|     "self": "/api/export/runs/run-20251029-01", | ||||
|     "events": "/api/export/runs/run-20251029-01/events" | ||||
|   } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| ### 4.2 List runs | ||||
|  | ||||
| ``` | ||||
| GET /api/export/runs?status=active&profileId=prof-json-raw&page=1&pageSize=10 | ||||
| Scopes: export:read | ||||
| ``` | ||||
|  | ||||
| Returns latest runs with pagination. Each item includes summary counts, duration, and last event. | ||||
|  | ||||
| ### 4.3 Get run status | ||||
|  | ||||
| ``` | ||||
| GET /api/export/runs/{runId} | ||||
| Scopes: export:read | ||||
| ``` | ||||
|  | ||||
| Response fields: | ||||
|  | ||||
| | Field | Description | | ||||
| |-------|-------------| | ||||
| | `status` | `pending`, `running`, `success`, `failed`, `canceled`. | | ||||
| | `progress` | Object with `adapters`, `bytesWritten`, `recordsProcessed`. | | ||||
| | `errorCode` | Populated when `status=failed` (`signing`, `distribution`, etc). | | ||||
| | `policySnapshotId` | Returned for policy-aware profiles. | | ||||
| | `distributions` | List of available distribution descriptors (type, location, sha256, expiresAt). | | ||||
|  | ||||
| ### 4.4 Cancel a run | ||||
|  | ||||
| ``` | ||||
| POST /api/export/runs/{runId}:cancel | ||||
| Scopes: export:run | ||||
| ``` | ||||
|  | ||||
| The body is optional (`{"reason": "Aborted due to incident INC-123"}`). Returns `202 Accepted` and emits a `run.canceled` event. | ||||
|  | ||||
| ## 5. Events and telemetry | ||||
|  | ||||
| ### 5.1 Server-sent events | ||||
|  | ||||
| ``` | ||||
| GET /api/export/runs/{runId}/events | ||||
| Scopes: export:read | ||||
| Accept: text/event-stream | ||||
| ``` | ||||
|  | ||||
| Event payload example: | ||||
|  | ||||
| ``` | ||||
| event: run.progress | ||||
| data: {"runId":"run-20251029-01","phase":"adapter","adapter":"json","records":1024,"bytes":7340032,"timestamp":"2025-10-29T12:13:15Z"} | ||||
| ``` | ||||
|  | ||||
| Event types: | ||||
|  | ||||
| | Event | Meaning | | ||||
| |-------|---------| | ||||
| | `run.accepted` | Planner accepted job and queued with Orchestrator. | | ||||
| | `run.progress` | Periodic updates with phase, adapter, counts. | | ||||
| | `run.distribution` | Distribution driver finished (includes descriptor). | | ||||
| | `run.signed` | Signing completed successfully. | | ||||
| | `run.succeeded` | Run marked `success`. | | ||||
| | `run.failed` | Run failed; payload includes `errorCode`. | | ||||
| | `run.canceled` | Run canceled; includes `canceledBy`. | | ||||
|  | ||||
| SSE heartbeats (`: ping`) keep long-lived connections alive and should be ignored by clients. | ||||
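A client-side reader for this stream can be sketched as follows. This is a minimal sketch that handles only the `event` and `data` fields plus comment heartbeats, which is all the example above uses; the function name is illustrative and it assumes every `data` payload is JSON.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event, payload) pairs from SSE lines, skipping comment heartbeats."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # heartbeat comment such as ": ping"
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is not data
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

A consumer would typically stop reading once it sees `run.succeeded`, `run.failed`, or `run.canceled`, since those mark the end of a run.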
|  | ||||
| ### 5.2 Audit events | ||||
|  | ||||
| `GET /api/export/runs/{runId}/events?format=audit` returns the same event stream in newline-delimited JSON for offline ingestion. | ||||
|  | ||||
| ## 6. Download endpoints | ||||
|  | ||||
| ### 6.1 Bundle download | ||||
|  | ||||
| ``` | ||||
| GET /api/export/runs/{runId}/download | ||||
| Scopes: export:download | ||||
| ``` | ||||
|  | ||||
| Streams the primary bundle (tarball, zip, or profile-specific layout). Headers: | ||||
|  | ||||
| - `Content-Disposition: attachment; filename="export-run-20251029-01.tar.zst"` | ||||
| - `X-Export-Digest: sha256:...` | ||||
| - `X-Export-Size: 73482019` | ||||
| - `X-Export-Encryption: age` (when mirror encryption enabled) | ||||
|  | ||||
| Supports HTTP range requests for resume functionality. If no bundle exists yet, responds `409` with `ERR_EXPORT_007`. | ||||
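Resuming and verifying a download can be sketched as follows. This is a minimal sketch, assuming `X-Export-Digest` covers the complete bundle and uses the `sha256:<hex>` form shown above; the helper names are illustrative and the HTTP transfer itself is left out.

```python
import hashlib
from typing import Dict, Iterable


def range_header(bytes_already_written: int) -> Dict[str, str]:
    """Build the request header for resuming a partial download."""
    if bytes_already_written <= 0:
        return {}
    return {"Range": f"bytes={bytes_already_written}-"}


def verify_bundle(chunks: Iterable[bytes], digest_header: str, size_header: str) -> bool:
    """Hash the full bundle and compare it with X-Export-Digest and X-Export-Size."""
    hasher, total = hashlib.sha256(), 0
    for chunk in chunks:
        hasher.update(chunk)
        total += len(chunk)
    algorithm, _, expected = digest_header.partition(":")
    return (
        algorithm == "sha256"
        and hasher.hexdigest() == expected.lower()
        and total == int(size_header)
    )
```

After a resumed transfer, the chunks passed to `verify_bundle` need to include the bytes already on disk, because the digest is computed over the whole file rather than the requested range.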
|  | ||||
| ### 6.2 Manifest download | ||||
|  | ||||
| ``` | ||||
| GET /api/export/runs/{runId}/manifest | ||||
| Scopes: export:download | ||||
| ``` | ||||
|  | ||||
| Returns signed `export.json`. To fetch the detached signature, append `?signature=true`. | ||||
|  | ||||
| ### 6.3 Provenance download | ||||
|  | ||||
| ``` | ||||
| GET /api/export/runs/{runId}/provenance | ||||
| Scopes: export:download | ||||
| ``` | ||||
|  | ||||
| Returns signed `provenance.json`. Supports `?signature=true`. Provenance includes attestation subject digests, policy snapshot ids, adapter versions, and KMS key identifiers. | ||||
|  | ||||
| ### 6.4 Distribution descriptors | ||||
|  | ||||
| ``` | ||||
| GET /api/export/runs/{runId}/distributions | ||||
| Scopes: export:read | ||||
| ``` | ||||
|  | ||||
| Lists all registered distribution targets (HTTP, OCI, object storage). Each item includes `type`, `location`, `sha256`, `sizeBytes`, and `expiresAt`. | ||||
|  | ||||
| ## 7. Webhook hand-off | ||||
|  | ||||
| Exports can notify external systems once a run succeeds by registering an HTTP webhook: | ||||
|  | ||||
| ``` | ||||
| POST /api/export/webhooks | ||||
| Scopes: export:profile:manage | ||||
| ``` | ||||
|  | ||||
| Payload includes `targetUrl`, `events` (e.g., `run.succeeded`), and optional secret for HMAC signatures. Webhook deliveries sign payloads with `X-Stella-Signature` header (`sha256=...`). Retries follow exponential backoff with dead-letter capture in `export_events`. | ||||
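On the receiving side, signature checking might look like the following. This is a minimal sketch, assuming the signature is a hex-encoded HMAC-SHA256 over the raw request body using the registered secret; the text above only gives the `sha256=...` header form, so the exact signing input is an assumption.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute the assumed X-Stella-Signature value for a raw request body."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Constant-time comparison of the received header against the expected value."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

Verification should run against the bytes exactly as received, before any JSON parsing or re-serialisation, since re-encoding can change the byte sequence and invalidate the signature.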
|  | ||||
| ## 8. Observability | ||||
|  | ||||
| - **Metrics endpoint:** `/metrics` (service-local) exposes Prometheus metrics listed in [Architecture](architecture.md#observability). | ||||
| - **Tracing:** When `traceparent` header is provided, worker spans join the calling trace. | ||||
| - **Run lookup by trace:** Use `GET /api/export/runs?traceId={id}` when troubleshooting distributed traces. | ||||
|  | ||||
| ## 9. Related documentation | ||||
|  | ||||
| - [Export Center Overview](overview.md) | ||||
| - [Export Center Architecture](architecture.md) | ||||
| - [Export Center Profiles](profiles.md) | ||||
| - [Export Center CLI Guide](cli.md) *(companion document)* | ||||
| - [Aggregation-Only Contract reference](../../ingestion/aggregation-only-contract.md) | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
							
								
								
									
										127
									
								
								docs/modules/export-center/architecture.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,127 @@ | ||||
| # Export Center Architecture | ||||
|  | ||||
| > Derived from Epic 10 – Export Center and the subsequent export adapter deep dives. | ||||
|  | ||||
| The Export Center is the dedicated service layer that packages StellaOps evidence and policy overlays into reproducible bundles. It runs as a multi-surface API backed by asynchronous workers and format adapters, enforcing Aggregation-Only Contract (AOC) guardrails while providing deterministic manifests, signing, and distribution paths. | ||||
|  | ||||
| ## Runtime topology | ||||
| - **Export Center API (`StellaOps.ExportCenter.WebService`).** Receives profile CRUD, export run requests, status queries, and download streams through the unified Web API gateway. Enforces tenant scopes, RBAC, quotas, and concurrency guards. | ||||
| - **Export Center Worker (`StellaOps.ExportCenter.Worker`).** Dequeues export jobs from the Orchestrator, resolves selectors, invokes adapters, and writes manifests and bundle artefacts. Stateless; scales horizontally. | ||||
| - **Backing stores.** | ||||
|   - MongoDB collections: `export_profiles`, `export_runs`, `export_inputs`, `export_distributions`, `export_events`. | ||||
|   - Object storage bucket or filesystem for staging bundle payloads. | ||||
|   - Optional registry/object storage credentials injected via Authority-scoped secrets. | ||||
| - **Integration peers.** | ||||
|   - **Findings Ledger** for advisory, VEX, SBOM payload streaming. | ||||
|   - **Policy Engine** for deterministic policy snapshots and evaluated findings. | ||||
|   - **Orchestrator** for job scheduling, quotas, and telemetry fan-out. | ||||
|   - **Authority** for tenant-aware access tokens and KMS key references. | ||||
|   - **Console & CLI** as presentation surfaces consuming the API. | ||||
|  | ||||
| ## Job lifecycle | ||||
| 1. **Profile selection.** Operator or automation picks a profile (`json:raw`, `json:policy`, `trivy:db`, `trivy:java-db`, `mirror:full`, `mirror:delta`) and submits scope selectors (tenant, time window, products, SBOM subjects, ecosystems). See `docs/modules/export-center/profiles.md` for profile definitions and configuration fields. | ||||
| 2. **Planner resolution.** API validates selectors, expands include/exclude lists, and writes a pending `export_run` with immutable parameters and deterministic ordering hints. | ||||
| 3. **Orchestrator dispatch.** `export_run` triggers a job lease via Orchestrator with quotas per tenant/profile and concurrency caps (default 4 active per tenant). | ||||
| 4. **Worker execution.** Worker streams data from Findings Ledger and Policy Engine using pagination cursors. Adapters write canonical payloads to staging storage, compute checksums, and emit streaming progress events (SSE). | ||||
| 5. **Manifest and provenance emission.** Worker writes `export.json` and `provenance.json`, signs them with configured KMS keys (cosign-compatible), and uploads signatures alongside content. | ||||
| 6. **Distribution registration.** Worker records available distribution methods (download URL, OCI reference, object storage path), raises completion/failure events, and exposes metrics/logs. | ||||
| 7. **Download & verification.** Clients download bundles or pull OCI artefacts, verify signatures, and consume provenance to trace source artefacts. | ||||
|  | ||||
| Cancellation requests mark runs as `aborted` and cause workers to stop iterating sources; partially written files are deleted and the cancellation is recorded as an audit entry on the run. | ||||
|  | ||||
| ## Core components | ||||
| ### API surface | ||||
| - Detailed request and response payloads are catalogued in `docs/modules/export-center/api.md`. | ||||
| - **Profiles API.** | ||||
|   - `GET /api/export/profiles`: list tenant-scoped profiles. | ||||
|   - `POST /api/export/profiles`: create custom profiles (variants of JSON, Trivy, mirror) with validated configuration schema. | ||||
|   - `PATCH /api/export/profiles/{id}`: update metadata; config changes clone new revision to preserve determinism. | ||||
| - **Runs API.** | ||||
|   - `POST /api/export/runs`: submit export run for a profile with selectors and options (policy snapshot id, mirror base manifest). | ||||
|   - `GET /api/export/runs/{id}`: status, progress counters, provenance summary. | ||||
|   - `GET /api/export/runs/{id}/events`: server-sent events with state transitions, adapter milestones, signing status. | ||||
|   - `POST /api/export/runs/{id}:cancel`: cooperative cancellation with audit logging. | ||||
| - **Downloads API.** | ||||
|   - `GET /api/export/runs/{id}/download`: streaming download with range support and checksum trailers. | ||||
|   - `GET /api/export/runs/{id}/manifest`: signed `export.json`. | ||||
|   - `GET /api/export/runs/{id}/provenance`: signed `provenance.json`. | ||||
|  | ||||
| All endpoints require Authority-issued JWT + DPoP tokens carrying the scope that matches the operation (`export:profile:manage`, `export:run`, `export:read`, or `export:download`) and a tenant claim aligned with the request. Rate limits and quotas surface via `X-Stella-Quota-*` headers. | ||||
|  | ||||
| ### Worker pipeline | ||||
| - **Input resolvers.** Query Findings Ledger and Policy Engine using stable pagination (Mongo `_id` ascending, or resume tokens for change streams). Selector expressions compile into Mongo filter fragments and/or API query parameters. | ||||
| - **Adapter host.** Adapter plugin loader (restart-time only) resolves profile variant to adapter implementation. Adapters present a deterministic `RunAsync(context)` contract with streaming writers and telemetry instrumentation. | ||||
| - **Content writers.** | ||||
|   - JSON adapters emit `.jsonl.zst` files with canonical ordering (tenant, subject, document id). | ||||
|   - Trivy adapters materialise SQLite databases or tar archives matching Trivy DB expectations; schema version gates prevent unsupported outputs. | ||||
|   - Mirror adapters assemble deterministic filesystem trees (manifests, indexes, payload subtrees) and, when configured, OCI artefact layers. | ||||
| - **Manifest generator.** Aggregates counts, bytes, hash digests (SHA-256), profile metadata, and input references. Writes `export.json` and `provenance.json` using canonical JSON (sorted keys, RFC3339 UTC timestamps). | ||||
| - **Signing service.** Integrates with platform KMS via Authority (default cosign signer). Produces in-toto SLSA attestations when configured. Supports detached signatures and optional in-bundle signatures. | ||||
| - **Distribution drivers.** `dist-http` exposes staged files via download endpoint; `dist-oci` pushes artefacts to registries using ORAS with digest pinning; `dist-objstore` uploads to tenant-specific prefixes with immutability flags. | ||||
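The manifest generator's canonical JSON and SHA-256 digests can be illustrated with a short sketch. It assumes "canonical" means sorted keys, compact separators, and UTF-8 encoding; the actual exporter may follow a stricter scheme, and the function names here are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def rfc3339_utc(ts: datetime) -> str:
    """Format an aware datetime as an RFC 3339 UTC timestamp with a Z suffix."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def canonical_json(doc: dict) -> bytes:
    """Serialise with sorted keys and no insignificant whitespace (assumed scheme)."""
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sha256_hex(payload: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the payload."""
    return hashlib.sha256(payload).hexdigest()


# Two documents with the same content but different key order.
a = {"runId": "run-20251029-01", "counts": {"vex": 3045, "advisories": 15234}}
b = {"counts": {"advisories": 15234, "vex": 3045}, "runId": "run-20251029-01"}

# Identical bytes, and therefore identical digests, regardless of insertion order.
print(canonical_json(a) == canonical_json(b))
print(sha256_hex(canonical_json(a)) == sha256_hex(canonical_json(b)))

# A timestamp given with a +02:00 offset is normalised to UTC.
local = datetime(2025, 10, 29, 2, 0, 0, tzinfo=timezone(timedelta(hours=2)))
print(rfc3339_utc(local))
```

Normalising before hashing is what lets retried runs with identical inputs produce matching manifests.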
|  | ||||
| ## Data model snapshots | ||||
|  | ||||
| | Collection | Purpose | Key fields | Notes | | ||||
| |------------|---------|------------|-------| | ||||
| | `export_profiles` | Profile definitions (kind, variant, config). | `_id`, `tenant`, `name`, `kind`, `variant`, `config_json`, `created_by`, `created_at`. | Config includes adapter parameters (included record types, compression, encryption). | | ||||
| | `export_runs` | Run state machine and audit info. | `_id`, `profile_id`, `tenant`, `status`, `requested_by`, `selectors`, `policy_snapshot_id`, `started_at`, `completed_at`, `duration_ms`, `error_code`. | Immutable selectors; status transitions recorded in `export_events`. | | ||||
| | `export_inputs` | Resolved input ranges. | `run_id`, `source`, `cursor`, `count`, `hash`. | Enables resumable retries and audit. | | ||||
| | `export_distributions` | Distribution artefacts. | `run_id`, `type` (`http`, `oci`, `object`), `location`, `sha256`, `size_bytes`, `expires_at`. | `expires_at` used for retention policies and automatic pruning. | | ||||
| | `export_events` | Timeline of state transitions and metrics. | `run_id`, `event_type`, `message`, `at`, `metrics`. | Feeds SSE stream and audit trails. | | ||||
|  | ||||
| ## Adapter responsibilities | ||||
| - **JSON (`json:raw`, `json:policy`).** | ||||
|   - Ensures canonical casing, timezone normalization, and linkset preservation. | ||||
|   - Policy variant embeds policy snapshot metadata (`policy_version`, `inputs_hash`, `decision_trace` fingerprint) and emits evaluated findings as separate files. | ||||
|   - Enforces AOC guardrails: no derived modifications to raw evidence fields. | ||||
| - **Trivy (`trivy:db`, `trivy:java-db`).** | ||||
|   - Maps StellaOps advisory schema to Trivy DB format, handling namespace collisions and ecosystem-specific ranges. | ||||
|   - Validates compatibility against supported Trivy schema versions; run fails fast if mismatch. | ||||
|   - Emits optional manifest summarising package counts and severity distribution. | ||||
| - **Mirror (`mirror:full`, `mirror:delta`).** | ||||
|   - Builds self-contained filesystem layout (`/manifests`, `/data/raw`, `/data/policy`, `/indexes`). | ||||
|   - Delta variant compares against base manifest (`base_export_id`) to write only changed artefacts; records `removed` entries for cleanup. | ||||
|   - Supports optional encryption of `/data` subtree (age/AES-GCM) with key wrapping stored in `provenance.json`. | ||||
|  | ||||
| Adapters expose structured telemetry events (`adapter.start`, `adapter.chunk`, `adapter.complete`) with record counts and byte totals per chunk. Failures emit `adapter.error` with reason codes. | ||||
|  | ||||
| ## Signing and provenance | ||||
| - **Manifest schema.** `export.json` contains run metadata, profile descriptor, selector summary, counts, SHA-256 digests, compression hints, and distribution list. Deterministic field ordering and normalized timestamps. | ||||
| - **Provenance schema.** `provenance.json` captures in-toto subject listing (bundle digest, manifest digest), referenced inputs (findings ledger queries, policy snapshot ids, SBOM identifiers), tool version (`exporter_version`, adapter versions), and KMS key identifiers. | ||||
| - **Attestation.** Cosign SLSA Level 2 template by default; optional SLSA Level 3 when supply chain attestations are enabled. Detached signatures stored alongside manifests; CLI/Console encourage `cosign verify --key <tenant-key>` workflow. | ||||
| - **Audit trail.** Each run stores success/failure status, signature identifiers, and verification hints for downstream automation (CI pipelines, offline verification scripts). | ||||
|  | ||||
| ## Distribution flows | ||||
| - **HTTP download.** Console and CLI stream bundles via chunked transfer; supports range requests and resumable downloads. Response includes `X-Export-Digest`, `X-Export-Length`, and optional encryption metadata. | ||||
| - **OCI push.** Worker uses ORAS to publish bundles as OCI artefacts with annotations describing profile, tenant, manifest digest, and provenance reference. Supports multi-tenant registries with `repository-per-tenant` naming. | ||||
| - **Object storage.** Writes to tenant-prefixed paths (`s3://stella-exports/{tenant}/{run-id}/...`) with immutable retention policies. Retention scheduler purges expired runs based on profile configuration. | ||||
| - **Offline Kit seeding.** Mirror bundles optionally staged into Offline Kit assembly pipelines, inheriting the same manifests and signatures. | ||||
|  | ||||
| ## Observability | ||||
| - **Metrics.** Emits `exporter_run_duration_seconds`, `exporter_run_bytes_total{profile}`, `exporter_run_failures_total{error_code}`, `exporter_active_runs{tenant}`, `exporter_distribution_push_seconds{type}`. | ||||
| - **Logs.** Structured logs with fields `run_id`, `tenant`, `profile_kind`, `adapter`, `phase`, `correlation_id`, `error_code`. Phases include `plan`, `resolve`, `adapter`, `manifest`, `sign`, `distribute`. | ||||
| - **Traces.** Optional OpenTelemetry spans (`export.plan`, `export.fetch`, `export.write`, `export.sign`, `export.distribute`) for cross-service correlation. | ||||
| - **Dashboards & alerts.** DevOps pipeline seeds Grafana dashboards summarising throughput, size, failure ratios, and distribution latency. Alert thresholds: failure rate >5% per profile, median run duration >p95 baseline, signature verification failures >0. | ||||
|  | ||||
| ## Security posture | ||||
| - Tenant claim enforced at every query and distribution path; cross-tenant selectors rejected unless explicit cross-tenant mirror feature toggled with signed approval. | ||||
| - RBAC scopes: `export:profile:manage`, `export:run`, `export:read`, `export:download`. Console hides actions without scope; CLI returns `401/403`. | ||||
| - Encryption options configurable per profile; keys derived from Authority-managed KMS. Mirror encryption uses tenant-specific recipients; JSON/Trivy rely on transport security plus optional encryption at rest. | ||||
| - Restart-only plugin loading ensures adapters and distribution drivers are vetted at deployment time, reducing runtime injection risks. | ||||
| - Deterministic output ensures tamper detection via content hashes; provenance links to source runs and policy snapshots to maintain auditability. | ||||
|  | ||||
| ## Deployment considerations | ||||
| - Packaged as separate API and worker containers. Helm chart and compose overlays define horizontal scaling, worker concurrency, queue leases, and object storage credentials. | ||||
| - Requires Authority client credentials for KMS and optional registry credentials stored via sealed secrets. | ||||
| - Offline-first deployments disable OCI distribution by default and provide local object storage endpoints; HTTP downloads served via internal gateway. | ||||
| - Health endpoints: `/health/ready` validates Mongo connectivity, object storage access, adapter registry integrity, and KMS signer readiness. | ||||
|  | ||||
| ## Compliance checklist | ||||
| - [ ] Profiles and runs enforce tenant scoping; cross-tenant exports disabled unless approved. | ||||
| - [ ] Manifests and provenance files are generated with deterministic hashes and signed via configured KMS. | ||||
| - [ ] Adapters run with restart-time registration only; no runtime plugin loading. | ||||
| - [ ] Distribution drivers respect allowlist; OCI push disabled when offline mode is active. | ||||
| - [ ] Metrics, logs, and traces follow observability guidelines; dashboards and alerts configured. | ||||
| - [ ] Retention policies and pruning jobs configured for staged bundles. | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
							
								
								
									
										231
									
								
								docs/modules/export-center/cli.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,231 @@ | ||||
| # Stella CLI - Export Center Commands | ||||
|  | ||||
| > **Audience:** Operators, release engineers, and CI maintainers using the `stella` CLI to manage Export Center profiles and runs.   | ||||
| > **Supported from:** `stella` CLI >= 0.22.0 (Export Center Phase 1).   | ||||
| > **Prerequisites:** Authority token with the scopes noted per command (`export:profile:manage`, `export:run`, `export:read`, `export:download`). | ||||
|  | ||||
| Use this guide with the [Export Center API reference](api.md) and [Profiles](profiles.md) catalogue. The CLI wraps the same REST endpoints, preserving deterministic behaviour and guardrails. | ||||
|  | ||||
| > Status: CLI support is tracked under `CLI-EXPORT-35-001` and `CLI-EXPORT-36-001`. The current CLI build does not yet surface these commands; treat this guide as the target contract and adjust once implementations merge. | ||||
|  | ||||
| ## 1. Global options and configuration | ||||
|  | ||||
| | Flag | Default | Description | | ||||
| |------|---------|-------------| | ||||
| | `--server <url>` | `https://stella.local` | Gateway root. Matches `STELLA_SERVER`. | | ||||
| | `--tenant <id>` | Token tenant | Override tenant for multi-tenant tokens. | | ||||
| | `--profile <name>` | none | Loads saved defaults from `~/.stella/profiles/<name>.toml`. | | ||||
| | `--output <file>` | stdout | Redirect full JSON response. | | ||||
| | `--format <table\|json\|yaml>` | `table` on TTY | Controls the output format for list commands. | ||||
| | `--trace` | false | Emit request timing and correlation ids. | | ||||
|  | ||||
| Environment variables: `STELLA_TOKEN`, `STELLA_SERVER`, `STELLA_TENANT`, `STELLA_PROFILE`. | ||||
|  | ||||
| Exit codes align with API error codes (see section 6). | ||||
|  | ||||
| ## 2. Profile management commands | ||||
|  | ||||
| ### 2.1 `stella export profile list` | ||||
|  | ||||
| List profiles for the current tenant. | ||||
|  | ||||
| ``` | ||||
| stella export profile list --kind json --variant raw --format table | ||||
| ``` | ||||
|  | ||||
| Outputs columns `PROFILE`, `KIND`, `VARIANT`, `DISTRIBUTION`, `RETENTION`. Use `--format json` for automation. | ||||
|  | ||||
| ### 2.2 `stella export profile show` | ||||
|  | ||||
| ``` | ||||
| stella export profile show prof-json-raw --output profile.json | ||||
| ``` | ||||
|  | ||||
| Fetches full configuration and writes it to file. | ||||
|  | ||||
| ### 2.3 `stella export profile create` | ||||
|  | ||||
| ``` | ||||
| stella export profile create --file profiles/prof-json-raw.json | ||||
| ``` | ||||
|  | ||||
| JSON schema matches `POST /api/export/profiles`. CLI validates against built-in schema before submission. Requires `export:profile:manage`. | ||||
|  | ||||
| ### 2.4 `stella export profile update` | ||||
|  | ||||
| ``` | ||||
| stella export profile update prof-json-raw \ | ||||
|   --retention "days:21" \ | ||||
|   --distribution http,object | ||||
| ``` | ||||
|  | ||||
| Supports toggling retention, adding/removing distribution targets, and renaming. Structural changes (kind, variant, include set) require editing the JSON and using `--replace-file` to create a new revision. | ||||
|  | ||||
| ### 2.5 `stella export profile archive` | ||||
|  | ||||
| ``` | ||||
| stella export profile archive prof-json-raw --reason "Superseded by Phase 2 profile" | ||||
| ``` | ||||
|  | ||||
| Marks the profile inactive. Use `stella export profile restore` to re-activate. | ||||
|  | ||||
| ## 3. Run lifecycle commands | ||||
|  | ||||
| ### 3.1 `stella export run submit` | ||||
|  | ||||
| ``` | ||||
| stella export run submit prof-json-raw \ | ||||
|   --selector tenant=acme \ | ||||
|   --selector 'product=registry.example.com/app:*' \ | ||||
|   --selector time=2025-10-01T00:00:00Z,2025-10-29T00:00:00Z \ | ||||
|   --policy-snapshot policy-snap-42 \ | ||||
|   --allow-empty=false | ||||
| ``` | ||||
|  | ||||
| Selectors accept `key=value` pairs; use `time=<from>,<to>` for windows. The command prints the `runId` and initial status. | ||||
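The selector syntax described above can be sketched as a small parser. This is an illustration of the `key=value` and `time=<from>,<to>` shapes only; the function names, the returned structure, and the validation rules are assumptions, not the CLI's actual behaviour.

```python
from datetime import datetime


def parse_ts(text: str) -> datetime:
    """Parse an RFC 3339 timestamp; a trailing Z is rewritten for older Pythons."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def parse_selectors(pairs):
    """Turn repeated key=value selector arguments into a dict.

    Ordinary keys collect their values in a list; the time key becomes a
    (from, to) tuple of datetimes.
    """
    selectors = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"selector must be key=value: {pair!r}")
        if key == "time":
            start, comma, end = value.partition(",")
            if not comma:
                raise ValueError("time selector must be <from>,<to>")
            window = (parse_ts(start), parse_ts(end))
            if window[0] >= window[1]:
                raise ValueError("time window start must precede its end")
            selectors["time"] = window
        else:
            selectors.setdefault(key, []).append(value)
    return selectors


sel = parse_selectors([
    "tenant=acme",
    "product=registry.example.com/app:*",
    "time=2025-10-01T00:00:00Z,2025-10-29T00:00:00Z",
])
print(sel["product"], sel["time"][0].isoformat())
```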
|  | ||||
| ### 3.2 `stella export run ls` | ||||
|  | ||||
| ``` | ||||
| stella export run ls --profile prof-json-raw --status active --tail 5 | ||||
| ``` | ||||
|  | ||||
| Shows recent runs with columns `RUN`, `PROFILE`, `STATUS`, `PROGRESS`, `UPDATED`. | ||||
|  | ||||
| ### 3.3 `stella export run show` | ||||
|  | ||||
| ``` | ||||
| stella export run show run-20251029-01 --format json | ||||
| ``` | ||||
|  | ||||
| Outputs full metadata, progress counters, distribution descriptors, and links. | ||||
|  | ||||
| ### 3.4 `stella export run watch` | ||||
|  | ||||
| ``` | ||||
| stella export run watch run-20251029-01 --follow | ||||
| ``` | ||||
|  | ||||
| Streams server-sent events and renders a live progress bar. `--json` prints raw events for scripting. | ||||
|  | ||||
| ### 3.5 `stella export run cancel` | ||||
|  | ||||
| ``` | ||||
| stella export run cancel run-20251029-01 --reason "Replacing with refined selectors" | ||||
| ``` | ||||
|  | ||||
| Gracefully cancels the run; exit code `0` indicates that the cancellation request was accepted. | ||||
|  | ||||
| ## 4. Download and verification commands | ||||
|  | ||||
| ### 4.1 `stella export download` | ||||
|  | ||||
| ``` | ||||
| stella export download run-20251029-01 \ | ||||
|   --output out/exports/run-20251029-01.tar.zst \ | ||||
|   --resume | ||||
| ``` | ||||
|  | ||||
| Downloads the primary bundle. `--resume` enables HTTP range requests; the CLI checkpoints progress to `.part` files. | ||||
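Resuming from a `.part` file comes down to asking the server for the bytes that are still missing. The sketch below shows one way to derive the `Range` header from the size of the partial file; the helper name and the idea of passing an expected length (for example from a length header on the response) are assumptions, not a description of the CLI's internals.

```python
import os


def resume_headers(part_path, expected_length=None):
    """Return request headers for resuming into part_path.

    Returns None when the partial file already holds expected_length bytes,
    an empty dict when nothing has been downloaded yet, and otherwise a
    Range header that starts at the first missing byte.
    """
    offset = os.path.getsize(part_path) if os.path.exists(part_path) else 0
    if expected_length is not None and offset >= expected_length:
        return None  # nothing left to fetch
    if offset == 0:
        return {}  # plain request, no range needed
    return {"Range": f"bytes={offset}-"}


# No partial file exists at this hypothetical path, so a full download is requested.
print(resume_headers("out/exports/does-not-exist.tar.zst.part"))
```

A client using this would append the response body to the `.part` file and rename it once the digest matches.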
|  | ||||
| ### 4.2 `stella export manifest` | ||||
|  | ||||
| ``` | ||||
| stella export manifest run-20251029-01 --output manifests/export.json | ||||
| ``` | ||||
|  | ||||
| Fetches the signed manifest. Use `--signature manifests/export.json.sig` to save the detached signature. | ||||
|  | ||||
| ### 4.3 `stella export provenance` | ||||
|  | ||||
| ``` | ||||
| stella export provenance run-20251029-01 --output manifests/provenance.json | ||||
| ``` | ||||
|  | ||||
| Retrieves the signed provenance file. `--signature` behaves like the manifest command. | ||||
|  | ||||
| ### 4.4 `stella export verify` | ||||
|  | ||||
| ``` | ||||
| stella export verify run-20251029-01 \ | ||||
|   --manifest manifests/export.json \ | ||||
|   --provenance manifests/provenance.json \ | ||||
|   --key keys/acme-export.pub | ||||
| ``` | ||||
|  | ||||
| Wrapper around `cosign verify`. Returns exit code `0` when signatures and digests validate, and `20` when verification fails. | ||||
|  | ||||
| ## 5. CI recipe (GitHub Actions example) | ||||
|  | ||||
| ```yaml | ||||
| name: Export Center Bundle | ||||
| on: | ||||
|   workflow_dispatch: | ||||
| jobs: | ||||
|   export: | ||||
|     runs-on: ubuntu-latest | ||||
|     steps: | ||||
|       - uses: actions/checkout@v4 | ||||
|       - name: Install Stella CLI | ||||
|         run: curl -sSfL https://downloads.stellaops.org/cli/install.sh | sh | ||||
|       - name: Submit export run | ||||
|         env: | ||||
|           STELLA_TOKEN: ${{ secrets.STELLA_TOKEN }} | ||||
|         run: | | ||||
|           run_id=$(stella export run submit prof-json-raw \ | ||||
|             --selector tenant=acme \ | ||||
|             --selector 'product=registry.example.com/app:*' \ | ||||
|             --allow-empty=false \ | ||||
|             --format json | jq -r '.runId') | ||||
|           echo "RUN_ID=$run_id" >> "$GITHUB_ENV" | ||||
|       - name: Wait for completion | ||||
|         env: | ||||
|           STELLA_TOKEN: ${{ secrets.STELLA_TOKEN }} | ||||
|         run: | | ||||
|           mkdir -p artifacts  # tee below needs the directory to exist | ||||
|           stella export run watch "$RUN_ID" --json \ | ||||
|             | tee artifacts/run.log \ | ||||
|             | jq -e 'select(.event == "run.succeeded")' > /dev/null | ||||
|       - name: Download bundle | ||||
|         env: | ||||
|           STELLA_TOKEN: ${{ secrets.STELLA_TOKEN }} | ||||
|         run: | | ||||
|           stella export download "$RUN_ID" --output artifacts/export.tar.zst --resume | ||||
|           stella export manifest "$RUN_ID" --output artifacts/export.json --signature artifacts/export.json.sig | ||||
|           stella export provenance "$RUN_ID" --output artifacts/provenance.json --signature artifacts/provenance.json.sig | ||||
|       - name: Verify signatures | ||||
|         run: | | ||||
|           stella export verify "$RUN_ID" \ | ||||
|             --manifest artifacts/export.json \ | ||||
|             --provenance artifacts/provenance.json \ | ||||
|             --key keys/acme-export.pub | ||||
| ``` | ||||
|  | ||||
| ## 6. Exit codes | ||||
|  | ||||
| | Code | Meaning | | ||||
| |------|---------| | ||||
| | `0` | Command succeeded. | | ||||
| | `10` | Validation error (`ERR_EXPORT_001`). | | ||||
| | `11` | Profile missing or inaccessible (`ERR_EXPORT_002`). | | ||||
| | `12` | Quota or concurrency exceeded (`ERR_EXPORT_003` or `ERR_EXPORT_QUOTA`). | | ||||
| | `13` | Run failed due to adapter/signing/distribution error. | | ||||
| | `20` | Verification failure (`stella export verify`). | | ||||
| | `21` | Download incomplete after retries (network errors). | | ||||
| | `30` | CLI configuration error (missing token, invalid profile file). | | ||||
|  | ||||
| Exit codes above 100 are reserved for future profile-specific tooling. | ||||
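For scripts that wrap the CLI, the table above can be turned into a small lookup. The meanings are copied from the table; the choice of which codes to treat as retryable (`12` and `21`) is an assumption made for illustration, and the function name is hypothetical.

```python
EXIT_CODES = {
    0: "Command succeeded",
    10: "Validation error (ERR_EXPORT_001)",
    11: "Profile missing or inaccessible (ERR_EXPORT_002)",
    12: "Quota or concurrency exceeded (ERR_EXPORT_003 or ERR_EXPORT_QUOTA)",
    13: "Run failed due to adapter/signing/distribution error",
    20: "Verification failure (stella export verify)",
    21: "Download incomplete after retries (network errors)",
    30: "CLI configuration error (missing token, invalid profile file)",
}

# Assumption: quota and network failures are worth retrying later.
RETRYABLE = {12, 21}


def classify(code: int) -> str:
    """Map a CLI exit code to one of: ok, retry, fail, reserved, unknown."""
    if code == 0:
        return "ok"
    if code > 100:
        return "reserved"
    if code in RETRYABLE:
        return "retry"
    if code in EXIT_CODES:
        return "fail"
    return "unknown"


for code in (0, 12, 20, 101):
    print(code, classify(code), EXIT_CODES.get(code, "n/a"))
```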
|  | ||||
| ## 7. Offline usage notes | ||||
|  | ||||
| - Use profiles that enable `object` distribution with local object storage endpoints. CLI reads `STELLA_EXPORT_OBJECT_ENDPOINT` when provided (falls back to gateway). | ||||
| - Mirror bundles work offline by skipping OCI distribution. The CLI provides an `--offline` flag to bypass OCI checks. | ||||
| - `stella export verify` works fully offline when provided with tenant public keys (packaged in Offline Kit). | ||||
|  | ||||
| ## 8. Related documentation | ||||
|  | ||||
| - [Export Center Profiles](profiles.md) | ||||
| - [Export Center API reference](api.md) | ||||
| - [Export Center Architecture](architecture.md) | ||||
| - [Aggregation-Only Contract reference](../../ingestion/aggregation-only-contract.md) | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
							
								
								
									
										66
									
								
								docs/modules/export-center/implementation_plan.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,66 @@ | ||||
| # Implementation plan — Export Center | ||||
|  | ||||
| ## Delivery phases | ||||
| - **Phase 1 – JSON & mirror foundations**   | ||||
|   Stand up the Export Center service + worker, deliver canonical JSON (`json:raw`, `json:policy`) and `mirror:full` profiles as download-only bundles, seed schema migrations, and publish manifest/provenance formats. | ||||
| - **Phase 2 – Trivy adapters & distribution**   | ||||
|   Implement Trivy DB / Java DB adapters, wire OCI/object storage distribution paths, and expose policy snapshot embedding + verification tooling. | ||||
| - **Phase 3 – Delta, encryption, scheduling**   | ||||
|   Release mirror deltas, bundle encryption, advanced scheduling/automation, resumable downloads, and CLI/Console verification workflows. | ||||
|  | ||||
| ## Component work breakdown | ||||
| - **Service & worker** | ||||
|   - Define migrations for `export_profiles`, `export_runs`, `export_inputs`, `export_distributions`. | ||||
|   - Implement planner, adapter host, signing/attestation layer, distribution engines, and deterministic manifests. | ||||
|   - Enforce tenant quotas, concurrency controls, and audit logging for create/cancel/distribute events. | ||||
| - **Adapters** | ||||
|   - JSON adapters: canonical JSONL writers, redaction guardrails, compression (zstd). | ||||
|   - Trivy adapters: field mapping, schema compatibility gating, validation suite. | ||||
|   - Mirror adapters: filesystem/OCI layout, delta computation, optional encryption with manifest updates. | ||||
| - **Integrations** | ||||
|   - Findings Ledger streaming APIs for advisories, VEX, SBOMs, findings. | ||||
|   - Policy Engine deterministic snapshot endpoint; VEX Lens consensus snapshot. | ||||
|   - Export Center telemetry surfaced through Observability stack. | ||||
| - **Surfaces** | ||||
|   - Console: profiles CRUD, run wizard, run detail + verification panel, distribution dashboards. | ||||
|   - CLI: `stella export profile|run|download|verify` with resumable downloads and signature verification. | ||||
| - **Security / RBAC** | ||||
|   - Scope enforcement per tenant, role matrix coverage, encryption key rotation tests, redaction filters. | ||||
| - **Docs & ops** | ||||
|   - Author module dossier (overview, architecture, profiles, API, CLI, mirror bundles, Trivy adapter, provenance & signing). | ||||
|   - Produce runbooks (`docs/operations/export-runbook.md`) and hardening guidance (`docs/security/export-hardening.md`). | ||||
|  | ||||
| ## Documentation deliverables | ||||
| - `docs/modules/export-center/overview.md` — responsibilities, profiles, surfaces. | ||||
| - `docs/modules/export-center/architecture.md` — service topology, adapters, manifests, distribution flow. | ||||
| - `docs/modules/export-center/profiles.md`, `trivy-adapter.md`, `mirror-bundles.md`, `provenance-and-signing.md`, `api.md`, `cli.md` — keep aligned with shipped features. | ||||
| - Cross-link Orchestrator, Policy, VEX Lens, CLI, and Offline Kit docs whenever exports become dependencies. | ||||
|  | ||||
| ## Acceptance criteria | ||||
| - Operators can create, monitor, and download an export; `cosign verify` (and CLI verify) succeeds against manifest + provenance, mapping back to source artifacts. | ||||
| - Trivy bundles import cleanly into Trivy across supported versions; mirror bundles run in Offline Kit reference environment (full + delta). | ||||
| - Policy snapshot runs reproduce deterministic decisions and include embedded `policyVersion` + `inputsHash`. | ||||
| - Tenant scoping and RBAC block unauthorized actions; encryption-enabled bundles lock data to recipient keys. | ||||
| - Metrics (`exporter_run_duration_seconds`, `exporter_bundle_bytes_total`, `exporter_run_failures_total`) and dashboards reflect live runs; alerts trigger on sustained failure rates. | ||||
| - Retried runs remain idempotent: manifests, hashes, and distribution artefacts match across identical inputs. | ||||
|  | ||||
| ## Risks & mitigations | ||||
| - **Schema drift (Trivy / policy):** versioned adapters with compatibility gates, CI integration tests, fail-fast with actionable errors. | ||||
| - **Bundle bloat:** zstd compression, sharding, delta exports, OCI dedupe. | ||||
| - **Data leakage:** strict schema allowlists, tenancy filters, redaction enforcement, encryption options. | ||||
| - **Non-determinism:** embed policy snapshots, enforce deterministic ordering, include content hashes in manifest. | ||||
| - **Operational slowness:** streaming downloads with range support, resumable CLI, concurrency limits, retry policies for workers. | ||||
|  | ||||
| ## Test strategy | ||||
| - **Unit:** adapter mapping, manifest hashing, signing/attestation, delta computation, encryption round-trips. | ||||
| - **Integration:** end-to-end runs for every profile, verification workflows, OCI push/pull, resume/abort scenarios. | ||||
| - **Compatibility:** matrix tests for Trivy versions, mirror bundle import in Offline Kit sample environment. | ||||
| - **Security:** tenant fuzzing, RBAC coverage, redaction/PII filters, key rotation. | ||||
| - **Performance & chaos:** stress exports with large datasets, simulate worker/API failures mid-run, confirm deterministic recovery. | ||||
|  | ||||
| ## Definition of done | ||||
| - Service, worker, and adapters deployed with telemetry & alerting. | ||||
| - CLI & Console workflows published, Offline Kit instructions updated. | ||||
| - Documentation set listed above refreshed; imposed rule statements appended where required. | ||||
| - CI pipelines include schema validation, profile verification, and determinism checks. | ||||
| - ./TASKS.md + ../../TASKS.md reflect current status for in-flight stories. | ||||
							
								
								
									
										202
									
								
								docs/modules/export-center/mirror-bundles.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,202 @@ | ||||
| # Export Center Mirror Bundles | ||||
|  | ||||
| Mirror bundles package StellaOps evidence, policy overlays, and indexes for air-gapped or bandwidth-constrained environments. They are produced by the `mirror:full` and `mirror:delta` profiles described in Epic 10 (Export Center) and implemented across Sprints 35-37 (`EXPORT-SVC-35-004`, `EXPORT-SVC-37-001`, `EXPORT-SVC-37-002`). This guide details bundle layouts, delta mechanics, encryption workflow, import procedures, and operational best practices. | ||||
|  | ||||
| > Export Center workers are being wired while this document is written. Treat the content as the target contract for adapter development and update specifics as the implementation lands. | ||||
|  | ||||
| ## 1. Bundle overview | ||||
|  | ||||
| | Profile | Contents | Typical use cases | Distribution | | ||||
| |---------|----------|-------------------|--------------| | ||||
| | `mirror:full` | Complete snapshot of raw evidence, normalized records, indexes, policy snapshots, provenance, signatures. | Initial seeding of an air-gapped mirror, disaster recovery drills. | Download bundle, optional OCI artifact. | | ||||
| | `mirror:delta` | Changes since a specified base export: added/updated/removed advisories, VEX statements, SBOMs, indexes, manifests. | Incremental updates, bandwidth reduction, nightly refreshes. | Download bundle, optional OCI artifact. | | ||||
|  | ||||
| Both profiles respect AOC boundaries: raw ingestion data remains untouched, and policy outputs live under their own directory with explicit provenance. | ||||
|  | ||||
| ## 2. Filesystem layout | ||||
|  | ||||
| Directory structure inside the extracted bundle: | ||||
|  | ||||
| ``` | ||||
| mirror/ | ||||
|   manifest.yaml | ||||
|   export.json | ||||
|   provenance.json | ||||
|   README.md | ||||
|   indexes/ | ||||
|     advisories.index.json | ||||
|     vex.index.json | ||||
|     sbom.index.json | ||||
|     findings.index.json | ||||
|   data/ | ||||
|     raw/ | ||||
|       advisories/*.jsonl.zst | ||||
|       vex/*.jsonl.zst | ||||
|       sboms/<subject>/sbom.json | ||||
|     normalized/ | ||||
|       advisories/*.jsonl.zst | ||||
|       vex/*.jsonl.zst | ||||
|     policy/ | ||||
|       snapshot.json | ||||
|       evaluations.jsonl.zst | ||||
|     consensus/ | ||||
|       vex_consensus.jsonl.zst | ||||
|   signatures/ | ||||
|     export.sig | ||||
|     manifest.sig | ||||
| ``` | ||||
|  | ||||
| `manifest.yaml` summarises profile metadata, selectors, counts, sizes, and SHA-256 digests. `export.json` and `provenance.json` mirror the JSON profile manifests and are signed using the configured KMS key. | ||||
|  | ||||
| Example `manifest.yaml`: | ||||
|  | ||||
| ```yaml | ||||
| profile: mirror:full | ||||
| runId: run-20251029-01 | ||||
| tenant: acme | ||||
| selectors: | ||||
|   products: | ||||
|     - registry.example.com/app:* | ||||
|   timeWindow: | ||||
|     from: 2025-10-01T00:00:00Z | ||||
|     to: 2025-10-29T00:00:00Z | ||||
| counts: | ||||
|   advisories: 15234 | ||||
|   vex: 3045 | ||||
|   sboms: 872 | ||||
|   policyEvaluations: 19876 | ||||
| artifacts: | ||||
|   - path: data/raw/advisories/a0.jsonl.zst | ||||
|     sha256: 9f4b... | ||||
|     bytes: 7340021 | ||||
| encryption: | ||||
|   mode: age | ||||
|   strict: false | ||||
|   recipients: | ||||
|     - age1tenantkey... | ||||
| ``` | ||||
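The `artifacts` entries in the manifest lend themselves to a local integrity check after extraction. The sketch below assumes the manifest has already been parsed into a list of dicts with `path`, `sha256`, and `bytes` keys, and that `sha256` is a bare lowercase hex string; both the function names and that encoding are assumptions. It complements signature verification and does not replace it.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(bundle_root, artifacts):
    """Return a list of (path, problem) tuples; an empty list means all entries match."""
    problems = []
    for entry in artifacts:
        full = os.path.join(bundle_root, entry["path"])
        if not os.path.isfile(full):
            problems.append((entry["path"], "missing"))
        elif os.path.getsize(full) != entry["bytes"]:
            problems.append((entry["path"], "size mismatch"))
        elif file_sha256(full) != entry["sha256"]:
            problems.append((entry["path"], "digest mismatch"))
    return problems
```

A caller would pass the extracted `mirror/` directory as `bundle_root` and fail the import when the returned list is not empty.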
|  | ||||
| ## 3. Delta mechanics | ||||
|  | ||||
| Delta bundles reference a previous full or delta run via `baseExportId` and `baseManifestDigest`. They contain: | ||||
|  | ||||
| ``` | ||||
| delta/ | ||||
|   changed/ | ||||
|     data/raw/advisories/*.jsonl.zst | ||||
|     ... | ||||
|   removed/ | ||||
|     advisories.jsonl        # list of advisory IDs removed | ||||
|     vex.jsonl | ||||
|     sboms.jsonl | ||||
|   manifest.diff.json        # summary of counts, hashes, base export metadata | ||||
| ``` | ||||
|  | ||||
| - **Base lookup:** The worker verifies that the base export is reachable (download path or OCI reference). If missing, the run fails with `ERR_EXPORT_BASE_MISSING`. | ||||
| - **Change detection:** Uses deterministic hashing of normalized records to compute additions/updates. Indexes are regenerated only for affected subjects. | ||||
| - **Application order:** Consumers apply deltas sequentially. A `resetBaseline=true` flag instructs them to drop cached state and apply the bundle as a full refresh. | ||||
|  | ||||
| Example `manifest.diff.json` (delta): | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "baseExportId": "run-20251025-01", | ||||
|   "baseManifestDigest": "sha256:aa11...", | ||||
|   "resetBaseline": false, | ||||
|   "added": { | ||||
|     "advisories": 43, | ||||
|     "vex": 12, | ||||
|     "sboms": 5 | ||||
|   }, | ||||
|   "changed": { | ||||
|     "advisories": 18, | ||||
|     "vex": 7 | ||||
|   }, | ||||
|   "removed": { | ||||
|     "advisories": 2, | ||||
|     "vex": 0, | ||||
|     "sboms": 0 | ||||
|   } | ||||
| } | ||||
| ``` | ||||
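The change detection behind these counts can be sketched as a comparison of two maps from record id to content digest. The map shape, the function names, and the sample ids are hypothetical; note that this sketch counts added and changed records separately, as the example above does.

```python
def compute_delta(base, current):
    """Compare two {record_id: digest} maps and list added, changed, and removed ids."""
    added = sorted(k for k in current if k not in base)
    removed = sorted(k for k in base if k not in current)
    changed = sorted(k for k in current if k in base and current[k] != base[k])
    return {"added": added, "changed": changed, "removed": removed}


def summarise(delta):
    """Reduce the id lists to the per-category counts a diff summary would carry."""
    return {name: len(ids) for name, ids in delta.items()}


# Placeholder ids and digests for illustration only.
base = {"ADV-1": "aa", "ADV-2": "bb", "ADV-3": "cc"}
current = {"ADV-1": "aa", "ADV-2": "b2", "ADV-4": "dd"}

delta = compute_delta(base, current)
print(delta)
print(summarise(delta))
```

Sorting the id lists keeps the output deterministic, which matters if the result feeds a hashed manifest.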
|  | ||||
| ## 4. Encryption workflow | ||||
|  | ||||
| Mirror bundles support optional encryption of the `data/` subtree: | ||||
|  | ||||
| - **Algorithm:** Age (X25519) or AES-GCM (256-bit) based on profile configuration. | ||||
| - **Key wrapping:** Keys fetched from Authority-managed KMS through Export Center. Wrapped data keys stored in `provenance.json` under `encryption.recipients[]`. | ||||
| - **Metadata:** `manifest.yaml` records `encryption.mode`, `recipients`, and `encryptedPaths`. | ||||
| - **Strict mode:** `strict=true` encrypts everything except `manifest.yaml` and `export.json`. The default (`false`) encrypts only the `data/` subtree and leaves the remaining files unencrypted to simplify discovery. | ||||
| - **Verification:** CLI (`stella export verify`) and Offline Kit scripts perform signature checks prior to decryption. | ||||
|  | ||||
| Operators must distribute recipient keys out of band. Export Center does not transmit private keys. | ||||
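As a rough illustration of the default and strict modes described above, the sketch below selects which bundle paths would be encrypted. It assumes paths are relative to the bundle root and that the plaintext exceptions are matched by file name; `encrypted_paths` is a hypothetical helper, and the real `encryptedPaths` computation may differ.

```python
from pathlib import PurePosixPath

# Files the text says stay plaintext when strict=true.
PLAINTEXT_IN_STRICT_MODE = {"manifest.yaml", "export.json"}


def encrypted_paths(bundle_paths: list, strict: bool) -> list:
    """Return the bundle-relative paths that would be encrypted for the given mode."""
    selected = []
    for raw in bundle_paths:
        path = PurePosixPath(raw)
        if strict:
            # Strict mode: everything except the two manifests.
            if path.name not in PLAINTEXT_IN_STRICT_MODE:
                selected.append(raw)
        elif path.parts and path.parts[0] == "data":
            # Default mode: only the data/ subtree.
            selected.append(raw)
    return sorted(selected)
```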
|  | ||||
| ## 5. Import workflow | ||||
|  | ||||
| ### 5.1 Offline Kit | ||||
|  | ||||
| Offline Kit bundles reference the latest full mirror export plus the last `N` deltas. Administrators run: | ||||
|  | ||||
| ```bash | ||||
| ./offline-kit/bin/mirror import /path/to/mirror-20251029-full.tar.zst | ||||
| ./offline-kit/bin/mirror import /path/to/mirror-20251030-delta.tar.zst | ||||
| ``` | ||||
|  | ||||
| The tool verifies signatures, applies deltas, and updates the mirror index served by the local gateway. | ||||
|  | ||||
| ### 5.2 Custom automation | ||||
|  | ||||
| 1. Download bundle (`stella export download`) and verify signatures (`stella export verify`). | ||||
| 2. Extract archive into a staging directory. | ||||
| 3. For encrypted bundles, decrypt using the provided age/AES key. | ||||
| 4. Sync `mirror/data` onto the target mirror store (object storage, NFS, etc.). | ||||
| 5. Republish indexes or reload services that depend on the mirror. | ||||
|  | ||||
| Delta consumers must track `appliedExportIds` to ensure ordering. | ||||
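One possible way to track `appliedExportIds` is sketched below. It assumes deltas are chained, so each delta's `baseExportId` must equal the most recently applied export, and that a full bundle or `resetBaseline=true` starts a new baseline. `apply_bundle` and `DeltaOrderError` are hypothetical names.

```python
class DeltaOrderError(Exception):
    """Raised when a delta does not build on the most recently applied export."""


def apply_bundle(applied_export_ids: list, manifest: dict, export_id: str) -> list:
    """Return the updated applied-id list, or raise if the delta is out of order.

    `manifest` is the parsed manifest.diff.json; pass an empty dict for a full bundle.
    """
    if not manifest or manifest.get("resetBaseline", False):
        # Full refresh: drop cached state and start a new baseline.
        return [export_id]
    expected_base = manifest["baseExportId"]
    if not applied_export_ids or applied_export_ids[-1] != expected_base:
        raise DeltaOrderError(
            f"delta {export_id} expects base {expected_base}, "
            f"but last applied export is {applied_export_ids[-1] if applied_export_ids else None}"
        )
    return applied_export_ids + [export_id]
```

Persisting the returned list alongside the mirror store lets an importer refuse an out-of-order delta before any data is synced.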
|  | ||||
| Sequence diagram of download/import: | ||||
|  | ||||
| ```mermaid | ||||
| sequenceDiagram | ||||
|   participant CLI as stella CLI | ||||
|   participant Mirror as Mirror Store | ||||
|   participant Verify as Verification Tool | ||||
|   CLI->>CLI: stella export download run-20251029-01 | ||||
|   CLI->>Verify: stella export verify run-20251029-01 | ||||
|   CLI->>Mirror: mirror import mirror-20251029-full.tar.zst | ||||
|   CLI->>Mirror: mirror import mirror-20251030-delta.tar.zst | ||||
|   Mirror-->>CLI: import complete (run-20251030-02) | ||||
| ``` | ||||
|  | ||||
| ## 6. Operational guidance | ||||
|  | ||||
| - **Retention:** Keep at least one full bundle plus the deltas required for disaster recovery. Configure `ExportCenter:Retention:Mirror` to prune older bundles automatically. | ||||
| - **Storage footprint:** Full bundles can exceed tens of gigabytes. Plan object storage or NAS capacity accordingly and enable compression (`compression.codec=zstd`). | ||||
| - **Scheduling:** For high-churn environments, run daily full exports and hourly deltas. Record cadence in `manifest.yaml` (`schedule.cron`). | ||||
| - **Incident recovery:** To rebuild a mirror: | ||||
|   1. Apply the most recent full bundle. | ||||
|   2. Apply deltas in order of `createdAt`. | ||||
|   3. Re-run integrity checks (`mirror verify <path>`). | ||||
| - **Audit logging:** Export Center logs `mirror.bundle.created`, `mirror.delta.applied`, and `mirror.encryption.enabled` events. Consume them in the central observability pipeline. | ||||
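The incident-recovery steps above (latest full bundle first, then deltas by `createdAt`) could be planned as in this sketch. The bundle catalogue shape, including the `kind` and `exportId` fields, is an assumption for illustration; only `createdAt` comes from the text.

```python
from datetime import datetime


def _created(bundle: dict) -> datetime:
    # Accept a trailing "Z" on older Python versions that do not parse it natively.
    return datetime.fromisoformat(bundle["createdAt"].replace("Z", "+00:00"))


def recovery_plan(bundles: list) -> list:
    """Return the most recent full bundle followed by later deltas in createdAt order."""
    fulls = [b for b in bundles if b["kind"] == "full"]
    if not fulls:
        raise ValueError("no full bundle available; a mirror cannot be rebuilt from deltas alone")
    latest_full = max(fulls, key=_created)
    deltas = sorted(
        (b for b in bundles if b["kind"] == "delta" and _created(b) > _created(latest_full)),
        key=_created,
    )
    return [latest_full, *deltas]
```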
|  | ||||
| ## 7. Troubleshooting | ||||
|  | ||||
| | Symptom | Meaning | Action | | ||||
| |---------|---------|--------| | ||||
| | `ERR_EXPORT_BASE_MISSING` | Base export not available | Republish base bundle or rebuild as full export. | | ||||
| | Delta applies but mirror misses entries | Deltas applied out of order | Rebuild from last full bundle and reapply in sequence. | | ||||
| | Decryption fails | Recipient key mismatch or corrupted bundle | Confirm key distribution and re-download bundle. | | ||||
| | Verification errors | Signature mismatch | Do not import; regenerate bundle and investigate signing pipeline. | | ||||
| | Manifest hash mismatch | Files changed after extraction | Re-extract bundle and re-run verification; check storage tampering. | | ||||
|  | ||||
| ## 8. References | ||||
|  | ||||
| - [Export Center Overview](overview.md) | ||||
| - [Export Center Architecture](architecture.md) | ||||
| - [Export Center API reference](api.md) | ||||
| - [Export Center CLI Guide](cli.md) | ||||
| - [Concelier mirror runbook](../concelier/operations/mirror.md) | ||||
| - [Aggregation-Only Contract reference](../../ingestion/aggregation-only-contract.md) | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
							
								
								
									
										203
									
								
								docs/modules/export-center/operations/runbook.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,203 @@ | ||||
| # Export Center Operations Runbook | ||||
|  | ||||
| > Export Center workers and API are landing across Sprints 35-37. This runbook captures the target operational procedures so DevOps can validate them as each milestone goes live. Update specific commands once `EXPORT-SVC-35-006`, `EXPORT-SVC-36-001..004`, and related CLI tasks ship. | ||||
|  | ||||
| ## 1. Service scope | ||||
|  | ||||
| The Export Center packages StellaOps evidence and policy overlays into reproducible bundles (JSON, Trivy DB, mirror). Operations owns: | ||||
|  | ||||
| - Worker scaling, queue management, and distribution storage. | ||||
| - Monitoring and alerts for run throughput, failures, and verification issues. | ||||
| - Runbook execution for recovery, retention, and compliance. | ||||
| - Coordination with DevOps validation (cosign + `trivy module db import` smoke tests). | ||||
|  | ||||
| Related documentation: | ||||
|  | ||||
| - `docs/modules/export-center/overview.md` | ||||
| - `docs/modules/export-center/architecture.md` | ||||
| - `docs/modules/export-center/profiles.md` | ||||
| - `docs/modules/export-center/trivy-adapter.md` | ||||
| - `docs/modules/export-center/mirror-bundles.md` | ||||
| - `docs/modules/export-center/api.md` | ||||
| - `docs/modules/export-center/cli.md` | ||||
|  | ||||
| ## 2. Contacts & tooling | ||||
|  | ||||
| | Area | Owner(s) | Escalation | | ||||
| |------|----------|------------| | ||||
| | Export Center service | Exporter Service Guild | `#export-center-ops`, on-call rotation | | ||||
| | Distribution & CI smoke | DevOps Guild | CI channel, PagerDuty `devops-export` | | ||||
| | KMS / encryption | Authority Core | `#authority-core` | | ||||
| | Offline Kit dissemination | Offline Kit Guild | `#offline-kit` | | ||||
|  | ||||
| Primary tooling: | ||||
|  | ||||
| - `stella export` CLI (submit, watch, download, verify). | ||||
| - Export Center API (`/api/export/*`) for automation. | ||||
| - Grafana dashboards (`Export Center / Run Health`, `Export Center / Distribution`). | ||||
| - Alertmanager routes (`Export.Center.Failures`, `Export.Center.Verify`). | ||||
|  | ||||
| ## 3. Monitoring & SLOs | ||||
|  | ||||
| Key metrics (exposed by workers and API): | ||||
|  | ||||
| | Metric | SLO / Alert | Notes | | ||||
| |--------|-------------|-------| | ||||
| | `exporter_run_duration_seconds` | p95 < 300 s (full), < 120 s (delta) | Break down by profile (`profile_kind`). | | ||||
| | `exporter_run_failures_total` | Alert when > 3 failures/15 min per profile | Include `error_code` label. | | ||||
| | `exporter_run_bytes_total` | Track growth trends | Helps with storage planning. | | ||||
| | `exporter_distribution_push_seconds` | p95 < 60 s | Covers OCI/object storage. | | ||||
| | `exporter_verify_failures_total` | Alert on any non-zero | Raised when cosign/Trivy smoke tests fail. | | ||||
| | `exporter_retention_pruned_total` | Should increase nightly | Confirms retention job success. | | ||||
|  | ||||
| Dashboards must include: | ||||
|  | ||||
| - Run throughput by profile. | ||||
| - Failure breakdown (adapter, signing, distribution). | ||||
| - Queue depth and worker concurrency (via Orchestrator metrics). | ||||
| - Storage consumption (object storage buckets, local staging). | ||||
|  | ||||
| Alerts (Alertmanager): | ||||
|  | ||||
| - `ExportCenterRunFailureSpike` - `exporter_run_failures_total` increases by more than 3 within 15 minutes for a profile. | ||||
| - `ExportCenterVerifyFailure` - any increase in `exporter_verify_failures_total`. | ||||
| - `ExportCenterWorkerLag` - queue backlog > threshold for 10 minutes. | ||||
| - `ExportCenterRetentionStale` - no pruning events in 24 hours. | ||||
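The real rules would be expressed in the alerting stack, but the threshold arithmetic behind `ExportCenterRunFailureSpike` can be sketched in a few lines. `failure_spike` is a hypothetical helper that works on failure timestamps for a single profile.

```python
from datetime import datetime, timedelta, timezone


def failure_spike(failure_times: list, now: datetime,
                  window: timedelta = timedelta(minutes=15), threshold: int = 3) -> bool:
    """True when more than `threshold` failures fall inside the trailing window."""
    window_start = now - window
    recent = sum(1 for moment in failure_times if window_start < moment <= now)
    return recent > threshold
```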
|  | ||||
| ## 4. Routine operations | ||||
|  | ||||
| ### 4.1 Daily checklist | ||||
|  | ||||
| - Review dashboard for run throughput and error classes. | ||||
| - Confirm CI smoke job (cosign + `trivy module db import`) passed. | ||||
| - Check storage usage against capacity thresholds. | ||||
| - Verify retention job executed (look for `exporter_retention_pruned_total` increment). | ||||
| - Scan logs for `adapter.trivy.unsupported_schema_version` or `mirror.delta.apply_failed`. | ||||
|  | ||||
| ### 4.2 Weekly tasks | ||||
|  | ||||
| - Rotate Download/OCI API tokens if configured with short-lived credentials. | ||||
| - Review upcoming profile changes (new tenants, profile updates). | ||||
| - Test `stella export verify` against a recent run for each profile. | ||||
| - Exercise worker failover (scale one replica to zero and confirm the remaining workers pick up its jobs). | ||||
|  | ||||
| ### 4.3 Pre-release | ||||
|  | ||||
| - Ensure bundles generated for release candidates pass cosign verification. | ||||
| - Capture sample manifests (`export.json`, `manifest.yaml`) for documentation archives. | ||||
| - Validate Offline Kit packaging includes latest full + delta mirror bundles. | ||||
|  | ||||
| ## 5. Capacity & scaling | ||||
|  | ||||
| ### 5.1 Worker sizing | ||||
|  | ||||
| - Default workers handle ~2 full runs or 6 delta runs concurrently per 4 vCPU. | ||||
| - Scale out when: | ||||
|   - Queue depth (`exporter_jobs_ready`) > 10 for 10 minutes. | ||||
|   - p95 durations exceed SLO for multiple runs without failures. | ||||
| - Use Orchestrator quotas: ensure per-tenant concurrency (`max_active_runs`) is tuned. | ||||
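The queue-depth trigger above can be read as "every sample in the last 10 minutes exceeds 10". A minimal sketch of that check follows, assuming samples arrive as `(unix_timestamp, exporter_jobs_ready)` pairs; `should_scale_out` is a hypothetical helper, not an Orchestrator API.

```python
def should_scale_out(samples: list, now: float,
                     depth_threshold: int = 10, sustain_seconds: int = 600) -> bool:
    """True when queue depth stayed above the threshold for the whole sustain window."""
    window_start = now - sustain_seconds
    in_window = [depth for ts, depth in samples if window_start <= ts <= now]
    # Require history reaching back to the window start so a short burst does not trigger scaling.
    has_full_history = any(ts <= window_start for ts, _ in samples)
    return has_full_history and bool(in_window) and all(d > depth_threshold for d in in_window)
```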
|  | ||||
| ### 5.2 Storage planning | ||||
|  | ||||
| - Staging storage (object store or filesystem) must hold at least: | ||||
|   - Latest full bundle per tenant per profile. | ||||
|   - Last `N` deltas (default N=5). | ||||
| - Set retention policy via configuration: | ||||
|  | ||||
| ```yaml | ||||
| ExportCenter: | ||||
|   Retention: | ||||
|     Mirror: | ||||
|       Mode: days | ||||
|       Value: 30 | ||||
|     Trivy: | ||||
|       Mode: count | ||||
|       Value: 10 | ||||
| ``` | ||||
|  | ||||
| - Monitor `exporter_storage_bytes_total` (if available) or use bucket metrics from storage provider. | ||||
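To make the two retention modes concrete, here is a sketch of how a pruning job might choose candidates. It assumes `days` keeps bundles newer than `Value` days and `count` keeps the newest `Value` bundles; the actual semantics of `ExportCenter:Retention` may differ, and `bundles_to_prune` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def bundles_to_prune(bundles: list, mode: str, value: int, now: datetime) -> list:
    """Return bundle ids to prune, newest first; `bundles` holds (bundle_id, created_at) pairs."""
    ordered = sorted(bundles, key=lambda item: item[1], reverse=True)
    if mode == "days":
        cutoff = now - timedelta(days=value)
        return [bundle_id for bundle_id, created in ordered if created < cutoff]
    if mode == "count":
        return [bundle_id for bundle_id, _ in ordered[value:]]
    raise ValueError(f"unsupported retention mode: {mode}")
```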
|  | ||||
| ## 6. Failure response | ||||
|  | ||||
| | Symptom | Likely cause | Immediate action | Follow-up | | ||||
| |---------|--------------|------------------|-----------| | ||||
| | `ERR_EXPORT_UNSUPPORTED_SCHEMA` | Trivy schema mismatch | Pin `SchemaVersion` to previous value; rerun export | Coordinate with Exporter Guild to add new mapping | | ||||
| | `ERR_EXPORT_BASE_MISSING` | Base manifest unavailable | Trigger full export (`mirror:full`), notify tenant | Investigate storage retention settings | | ||||
| Run stuck in `pending` | Worker unavailable / queue paused | Check worker pods / Orchestrator status | Scale workers or fix queue | | ||||
| | Signing failure (`errorCode=signing`) | KMS outage or permission change | Verify KMS health; retry run; escalate to Authority | Document incident, review key rotation schedule | | ||||
| | Distribution failure (`errorCode=distribution`) | OCI/object store outage | Switch profile distribution to download-only (`distribution: ["http"]`) | Restore distribution backend, resume normal config | | ||||
| | CLI verification failure in CI | New bundle did not pass cosign or Trivy import | Inspect pipeline logs; download bundle; rerun verification manually | Engage Exporter Guild if data quality issue | | ||||
| | Retention job skipped | Scheduler failure or misconfiguration | Run retention job manually (`stella export retention run`) | Audit scheduler configuration | | ||||
|  | ||||
| Log locations: `exporter` service emits structured logs with `runId`, `profile`, `errorCode`. For Kubernetes deployments, check `kubectl logs deployment/export-center-worker`. | ||||
|  | ||||
| ## 7. Recovery playbooks | ||||
|  | ||||
| ### 7.1 Replaying a failed run | ||||
|  | ||||
| 1. Identify run (`runId`) and root cause via `GET /api/export/runs/{id}`. | ||||
| 2. If configuration changed, clone profile and adjust settings. | ||||
| 3. Resubmit run (`stella export run submit` or API) with `--allow-empty` if intentionally empty. | ||||
| 4. Monitor SSE stream or `stella export run watch`. | ||||
| 5. After success, prune failed run data if necessary. | ||||
|  | ||||
| ### 7.2 Restoring from previous full bundle | ||||
|  | ||||
| 1. Locate last successful full bundle (`mirror:full`) and associated manifest. | ||||
| 2. Download and verify signatures. | ||||
| 3. Extract into mirror staging area. | ||||
| 4. Apply subsequent delta bundles in order. | ||||
| 5. Trigger mirror verification script (`mirror verify <path>`). | ||||
|  | ||||
| ### 7.3 KMS outage response | ||||
|  | ||||
| 1. Disable new export submissions temporarily (set per-tenant quota to 0). | ||||
| 2. Coordinate with Authority Core to restore KMS. | ||||
| 3. Once KMS is back, run `stella export run submit --profile <id> --selectors ... --priority catch-up` for affected tenants. | ||||
|  | ||||
| ## 8. Verification workflow | ||||
|  | ||||
| All bundles must pass both signature and content verification. | ||||
|  | ||||
| ### 8.1 Trivy bundle validation (CI job) | ||||
|  | ||||
| ```bash | ||||
| cosign verify-blob \ | ||||
|   --key tenants/acme/export-center.pub \ | ||||
|   --signature signatures/trivy-db.sig \ | ||||
|   trivy/db.bundle | ||||
|  | ||||
| trivy module db import trivy/db.bundle --cache-dir /tmp/trivy-cache | ||||
| ``` | ||||
|  | ||||
| Automation: `DEVOPS-EXPORT-36-001` ensures this runs on every pipeline. | ||||
|  | ||||
| ### 8.2 Mirror bundle validation | ||||
|  | ||||
| ```bash | ||||
| cosign verify-blob \ | ||||
|   --key tenants/acme/export-center.pub \ | ||||
|   --signature signatures/export.sig \ | ||||
|   mirror/export.json | ||||
|  | ||||
| ./offline-kit/bin/mirror verify mirror-20251029-full.tar.zst | ||||
| ``` | ||||
|  | ||||
| If encryption is enabled, decrypt using the age or AES key before verification. | ||||
|  | ||||
| ## 9. Change management | ||||
|  | ||||
| - Profile changes require change record referencing tenant impact and expected bundle size. | ||||
| - Distribution configuration updates (`OCI` vs `HTTP`) must be tested in staging. | ||||
| - Schema upgrades (e.g., Trivy schema v3) need coordination with DevOps, Exporter, and Docs. | ||||
| - Update runbook and related docs when processes change (tie updates to `DOCS-EXPORT-37-005`). | ||||
|  | ||||
| ## 10. References | ||||
|  | ||||
| - `docs/modules/export-center/trivy-adapter.md` | ||||
| - `docs/modules/export-center/mirror-bundles.md` | ||||
| - `ops/devops/TASKS.md` (`DEVOPS-EXPORT-36-001`, `DEVOPS-EXPORT-37-001`) | ||||
| - `docs/ingestion/aggregation-only-contract.md` | ||||
| - `docs/24_OFFLINE_KIT.md` | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
							
								
								
									
										63
									
								
								docs/modules/export-center/overview.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,63 @@ | ||||
| # Export Center Overview | ||||
|  | ||||
| The Export Center packages StellaOps evidence and policy outputs into portable, verifiable bundles. It provides one workflow for operators to deliver advisories, SBOMs, VEX statements, and policy decisions into downstream systems or air-gapped environments without rewriting data or violating the Aggregation-Only Contract (AOC). | ||||
|  | ||||
| ## What the Export Center delivers | ||||
| - **Unified export service.** A dedicated `exporter` service coordinates profiles, runs, signing, and distribution targets with deterministic manifests. | ||||
| - **Profile catalogue.** Out-of-the-box variants include `json:raw`, `json:policy`, `trivy:db`, `trivy:java-db`, `mirror:full`, and `mirror:delta`, each aligned with AOC rules and downstream compatibility requirements. | ||||
| - **Surface parity.** Operators can create, monitor, and download exports through the Web API gateway, Console workflows, and the CLI (`stella export ...`). All surfaces enforce tenant scope and RBAC consistently. | ||||
| - **Automation hooks.** One-off, cron, and event triggers are orchestrated via the Scheduler/Orchestrator integration. Export telemetry (durations, bundle size, verification outcomes) feeds structured logs, metrics, and optional OpenTelemetry traces. | ||||
|  | ||||
| ### Profile variants at a glance | ||||
|  | ||||
| | Profile | Contents | Primary scenarios | Distribution defaults | | ||||
| |---------|----------|-------------------|-----------------------| | ||||
| | `json:raw` | Canonical advisories, VEX, SBOM JSONL with hashes | Downstream analytics, evidence escrow | HTTP download, object storage | | ||||
| | `json:policy` | `json:raw` plus policy snapshot, evaluated findings | Policy attestation, audit packages | HTTP download, object storage | | ||||
| | `trivy:db` / `trivy:java-db` | Trivy-compatible vulnerability databases | Feed external scanners and CI | OCI artifact push, download | | ||||
| | `mirror:full` | Complete evidence, indexes, policy, provenance | Air-gap mirror, disaster recovery | Filesystem bundle, OCI artifact | | ||||
| | `mirror:delta` | Changes relative to prior manifest | Incremental updates to mirrors | Filesystem bundle, OCI artifact | | ||||
|  | ||||
| ## How it works end-to-end | ||||
| 1. **Profile & scope resolution.** A profile defines export type, content filters, and bundle settings. Scope selectors target tenants, artifacts, time windows, ecosystems, or SBOM subjects. | ||||
| 2. **Ledger collection.** Workers stream canonical data from Findings Ledger, VEX Lens, Concelier feeds, and SBOM service. Policy exports pin a deterministic policy snapshot from Policy Engine. | ||||
| 3. **Adapter execution.** JSON adapters produce normalized `.jsonl.zst` outputs, Trivy adapters translate to the Trivy DB schema, and mirror adapters build filesystem or OCI bundle layouts. | ||||
| 4. **Manifesting & provenance.** Every run emits `export.json` (profile, filters, counts, checksums) and `provenance.json` (source artifacts, policy snapshot ids, signature references). | ||||
| 5. **Signing & distribution.** Bundles are signed via configured KMS (cosign-compatible) and distributed through HTTP streaming, OCI registry pushes, or object storage staging. | ||||
|  | ||||
| Refer to `docs/modules/export-center/architecture.md` (Sprint 35 task) for component diagrams and adapter internals once published. | ||||
|  | ||||
| ## Security and compliance guardrails | ||||
| - **AOC alignment.** Exports bundle raw evidence and optional policy evaluations without mutating source content. Policy overlays remain attributed to Policy Engine and are clearly partitioned. | ||||
| - **Tenant isolation.** All queries, manifests, and bundle paths carry tenant identifiers. Cross-tenant exports require explicit signed approval and ship with provenance trails. | ||||
| - **Signing and encryption.** Manifests and payloads are signed using the platform KMS. Mirror profiles support optional in-bundle encryption (age/AES-GCM) with key wrapping. | ||||
| - **Determinism.** Identical inputs yield identical bundles. Timestamps serialize in UTC ISO-8601; manifests include content hashes for audit replay. | ||||
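The determinism guardrail can be illustrated with two small helpers: one that serializes timestamps in UTC ISO-8601 and one that derives a stable digest for a manifest. Both are a sketch under the assumption that manifests are JSON objects serialized with sorted keys; `utc_iso8601` and `manifest_digest` are hypothetical names.

```python
import hashlib
import json
from datetime import datetime, timezone


def utc_iso8601(moment: datetime) -> str:
    """Serialize an aware datetime as UTC ISO-8601 with a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def manifest_digest(manifest: dict) -> str:
    """Digest of the manifest's canonical (key-sorted, compact) JSON form."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With both in place, the same logical manifest produces the same digest regardless of the local time zone or the key order used when it was built.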
|  | ||||
| See `docs/security/policy-governance.md` and `docs/ingestion/aggregation-only-contract.md` for broader guardrail context. | ||||
|  | ||||
| ## Operating it offline | ||||
| - **Offline Kit integration.** Air-gapped deployments receive pre-built export profiles and object storage layout templates through the Offline Kit bundles. | ||||
| - **Mirror bundles.** `mirror:full` packages raw evidence, normalized indexes, policy snapshots, and provenance in a portable filesystem layout suitable for disconnected environments. `mirror:delta` tracks changes relative to a prior export manifest. | ||||
| - **No unsanctioned egress.** The exporter respects the platform allowlist. External calls (e.g., OCI pushes) require explicit configuration and are disabled by default for offline installs. | ||||
|  | ||||
| Consult `docs/24_OFFLINE_KIT.md` for Offline Kit delivery and `docs/modules/concelier/operations/mirror.md` for mirror ingestion procedures. | ||||
|  | ||||
| ## Getting started | ||||
| 1. **Choose a profile.** Map requirements to the profile table above. Policy-aware exports need a published policy snapshot. | ||||
| 2. **Define selectors.** Decide on tenants, products, SBOM subjects, or time windows to include. Default selectors export the entire tenant scope. | ||||
| 3. **Run via preferred surface.** | ||||
|    - **Console:** Navigate to the Export Center view, create a run, monitor progress, and download artifacts. | ||||
|    - **CLI:** Use `stella export run --profile <name> --selector <filters>` to submit a job, then `stella export download`. | ||||
|    - **API:** POST to `/api/export/runs` with profile id and scope payload; stream results from `/api/export/runs/{id}/download`. | ||||
| 4. **Verify bundles.** Use the attached provenance manifest and cosign signature to validate contents before distributing downstream. | ||||
|  | ||||
| Refer to `docs/modules/export-center/cli.md` for detailed command syntax and automation examples. | ||||
|  | ||||
| ## Observability & troubleshooting | ||||
| - Structured logs emit lifecycle events (`fetch`, `adapter`, `sign`, `publish`) with correlation IDs for parallel job tracing. | ||||
| - Metrics `exporter_run_duration_seconds`, `exporter_bundle_bytes_total`, and `exporter_run_failures_total` feed Grafana dashboards defined in the deployment runbooks. | ||||
| - Verification failures or schema mismatches bubble up through failure events and appear in Console/CLI with actionable error messages. Inspect the run's audit log and `provenance.json` for root cause. | ||||
|  | ||||
| See `docs/observability/policy.md` and `docs/modules/devops/runbooks/deployment-upgrade.md` for telemetry and operations guidance. | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
							
								
								
									
										139
									
								
								docs/modules/export-center/profiles.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,139 @@ | ||||
| # Export Center Profiles | ||||
|  | ||||
| Export Center profiles define what data is collected, how it is encoded, and which distribution paths are enabled for a run. Profiles are tenant-scoped and deterministic: identical selectors and source data produce identical bundles. This guide summarises built-in profiles, configuration fields, schema conventions, and compatibility notes. | ||||
|  | ||||
| ## Profile catalogue | ||||
|  | ||||
| | Profile | Kind / Variant | Output artefacts | Primary use cases | | ||||
| |---------|----------------|------------------|-------------------| | ||||
| | `json:raw` | `json` / `raw` | Canonical JSONL archives of advisories, VEX, SBOMs | Evidence escrow, analytics pipelines | | ||||
| | `json:policy` | `json` / `policy` | `json:raw` artefacts plus policy snapshot + evaluated findings | Audit, compliance attestations | | ||||
| | `trivy:db` | `trivy` / `db` | Trivy-compatible vulnerability database | Feeding external scanners / CI | | ||||
| | `trivy:java-db` | `trivy` / `java-db` | Java ecosystem supplement for Trivy | Supply Java CVE data to Trivy | | ||||
| | `mirror:full` | `mirror` / `full` | Complete mirror bundle (raw, policy, indexes, provenance) | Air-gap deployments, disaster recovery | | ||||
| | `mirror:delta` | `mirror` / `delta` | Incremental changes relative to a prior manifest | Efficient mirror updates | | ||||
|  | ||||
| Profiles can be cloned and customised; configuration is immutable per revision to keep runs reproducible. | ||||
|  | ||||
| ## Common configuration fields | ||||
|  | ||||
| | Field | Description | Applies to | Notes | | ||||
| |-------|-------------|------------|-------| | ||||
| | `name` | Human-readable identifier displayed in Console/CLI | All | Unique per tenant. | | ||||
| | `kind` | Logical family (`json`, `trivy`, `mirror`) | All | Determines eligible adapters. | | ||||
| | `variant` | Specific export flavour (see table above) | All | Controls adapter behaviour. | | ||||
| | `include` | Record types to include (`advisories`, `vex`, `sboms`, `findings`) | JSON, mirror | Defaults depend on variant. | | ||||
| | `policySnapshotMode` | `required`, `optional`, or `none` | JSON policy, mirror | `required` forces a policy snapshot id when creating runs. | | ||||
| | `distribution` | Enabled distribution drivers (`http`, `oci`, `object`) | All | Offline installs typically disable `oci`. | | ||||
| | `compression` | Compression settings (`zstd`, level) | JSON, mirror | Trivy adapters manage compression internally. | | ||||
| | `encryption` | Mirror encryption options (`enabled`, `recipientKeys`, `strict`) | Mirror | When enabled, only `/data` subtree is encrypted; manifests remain plaintext. | | ||||
| | `retention` | Retention policy (days or `never`) | All | Drives pruning jobs for staged bundles. | | ||||
|  | ||||
| Selectors (time windows, tenants, products, SBOM subjects, ecosystems) are supplied per run, not stored in the profile. | ||||
|  | ||||
| ## JSON profiles | ||||
|  | ||||
| ### `json:raw` | ||||
| - **Content:** Exports raw advisories, VEX statements, and SBOMs as newline-delimited JSON (`.jsonl.zst`). | ||||
| - **Schema:** Follows canonical StellaOps schema with casing and timestamps normalised. Each record includes `tenant`, `source`, `linkset`, and `content` fields. | ||||
| - **Options:** | ||||
|   - `include` defaults to `["advisories", "vex", "sboms"]`. | ||||
|   - `compression` defaults to `zstd` level 9. | ||||
|   - `maxRecordsPerFile` (optional) splits outputs for large datasets. | ||||
| - **Compatibility:** Intended for analytics platforms, data escrow, or feeding downstream normalisation pipelines. | ||||
| - **Sample manifest excerpt:** | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "profile": { "kind": "json", "variant": "raw" }, | ||||
|   "outputs": [ | ||||
|     { "type": "advisories.jsonl.zst", "sha256": "...", "count": 15234 }, | ||||
|     { "type": "vex.jsonl.zst", "sha256": "...", "count": 3045 }, | ||||
|     { "type": "sboms.jsonl.zst", "sha256": "...", "count": 872 } | ||||
|   ], | ||||
|   "selectors": { "tenant": "acme", "products": ["registry.example/app"] } | ||||
| } | ||||
| ``` | ||||
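How `maxRecordsPerFile` might split a record set, and how each part could yield the `count` and `sha256` values seen in the manifest excerpt, is sketched below. Compression is left out to keep the example dependency-free, and `chunk_jsonl` with its `part` and `payload` fields is a hypothetical helper rather than the adapter's actual interface.

```python
import hashlib
import json


def chunk_jsonl(records: list, max_records_per_file: int) -> list:
    """Split records into JSONL parts and describe each part with a count and SHA-256."""
    if max_records_per_file < 1:
        raise ValueError("max_records_per_file must be at least 1")
    outputs = []
    for start in range(0, len(records), max_records_per_file):
        chunk = records[start:start + max_records_per_file]
        # One canonical JSON document per line keeps the output byte-for-byte reproducible.
        payload = "".join(
            json.dumps(record, sort_keys=True, separators=(",", ":")) + "\n" for record in chunk
        ).encode("utf-8")
        outputs.append({
            "part": len(outputs),
            "count": len(chunk),
            "sha256": hashlib.sha256(payload).hexdigest(),
            "payload": payload,
        })
    return outputs
```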
|  | ||||
| ### `json:policy` | ||||
| - **Content:** Everything from `json:raw` plus: | ||||
|   - `policy_snapshot.json` (policy metadata, version, hash). | ||||
|   - `findings.policy.jsonl.zst` (evaluated findings with decision, rationale, rule id, inputs hash). | ||||
| - **Determinism:** Requires a policy snapshot id; runs fail if snapshot is missing or non-deterministic mode is active. | ||||
| - **Use cases:** Compliance exports, auditor packages, policy attestation archives. | ||||
| - **Guardrails:** AOC boundaries preserved: policy outputs are clearly partitioned from raw evidence. | ||||
|  | ||||
| ## Trivy profiles | ||||
|  | ||||
| ### `trivy:db` | ||||
| - Detailed adapter behaviour is documented in `docs/modules/export-center/trivy-adapter.md`. | ||||
| - **Content:** Produces a Trivy DB-compatible bundle (SQLite database or tarball as required by Trivy version). | ||||
| - **Mapping rules:** | ||||
|   - Advisory namespaces mapped to Trivy vendor IDs (e.g., `ubuntu`, `debian`, `npm`). | ||||
|   - Version ranges translated into Trivy's semantic version syntax. | ||||
|   - Severity mapped to Trivy standard (`UNKNOWN`, `LOW`, `MEDIUM`, `HIGH`, `CRITICAL`). | ||||
| - **Validation:** Adapter enforces supported Trivy schema versions; configuration includes `targetSchemaVersion`. | ||||
| - **Distribution:** Typically pushed to OCI or object storage for downstream scanners; Console download remains available. | ||||
|  | ||||
| ### `trivy:java-db` | ||||
| - Refer to `docs/modules/export-center/trivy-adapter.md` for ecosystem-specific notes. | ||||
| - **Content:** Optional Java ecosystem supplement for Trivy (matching Trivy's separate Java DB). | ||||
| - **Dependencies:** Requires Java advisories in Findings Ledger; run fails with `ERR_EXPORT_EMPTY` if no Java data present and `allowEmpty=false`. | ||||
| - **Compatibility:** Intended for organisations using Trivy's Java plugin or hardened pipelines that split general and Java feeds. | ||||
|  | ||||
| ## Mirror profiles | ||||
|  | ||||
| ### `mirror:full` | ||||
| - Bundle structure and delta strategy are covered in `docs/modules/export-center/mirror-bundles.md`. | ||||
| - **Content:** Complete export with: | ||||
|   - Raw advisories, VEX, SBOMs (`/data/raw`). | ||||
|   - Policy overlays (`/data/policy`), including evaluated findings and policy snapshots. | ||||
|   - Indexes for fast lookup (`/indexes/advisories.pb`, `/indexes/sboms.pb`). | ||||
|   - Manifests and provenance (`/manifests/export.json`, `/manifests/provenance.json`). | ||||
| - **Layout:** Deterministic directory structure with hashed filenames to reduce duplication. | ||||
| - **Encryption:** Optional `encryption` block enables age/AES-GCM encryption of `/data`. `strict=true` encrypts everything except `export.json`. | ||||
| - **Use cases:** Air-gap replication, disaster recovery drills, Offline Kit seeding. | ||||
|  | ||||
| ### `mirror:delta` | ||||
| - See `docs/modules/export-center/mirror-bundles.md` for delta mechanics and application order. | ||||
| - **Content:** Includes only changes relative to a base manifest (specified by `baseExportId` when running). | ||||
|   - `changed`, `added`, `removed` lists in `manifests/delta.json`. | ||||
|   - Incremental indexes capturing only updated subjects. | ||||
| - **Constraints:** Requires the base manifest to exist in object storage or artifact registry accessible to the worker. Fails with `ERR_EXPORT_BASE_MISSING` otherwise. | ||||
| - **Workflow:** Ideal for frequent updates to mirrored environments with limited bandwidth. | ||||
|  | ||||
| ## Compatibility and guardrails | ||||
| - **Aggregation-Only Contract:** All profiles respect AOC boundaries: raw evidence is never mutated. Policy outputs are appended separately with clear provenance. | ||||
| - **Tenant scoping:** Profiles are tenant-specific. Cross-tenant exports require explicit administrative approval and signed justification. | ||||
| - **Retriable runs:** Re-running a profile with identical selectors yields matching manifests and hashes, facilitating verify-on-download workflows. | ||||
| - **Offline operation:** JSON and mirror profiles function in offline mode without additional configuration. Trivy profiles require pre-seeded schema metadata shipped via Offline Kit. | ||||
| - **Quota integration:** Profiles can define run quotas (per tenant per day). Quota exhaustion surfaces as `429 Too Many Requests` with `X-Stella-Quota-*` hints. | ||||
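A client reacting to quota exhaustion might collect the `X-Stella-Quota-*` hints as in this sketch. Only the header prefix comes from the text above; the individual header names used in any example call are hypothetical, as is the `quota_hints` helper.

```python
from typing import Optional

QUOTA_HEADER_PREFIX = "x-stella-quota-"


def quota_hints(status: int, headers: dict) -> Optional[dict]:
    """Return quota hints keyed by header suffix for a 429 response, else None."""
    if status != 429:
        return None
    hints = {}
    for name, value in headers.items():
        lowered = name.lower()  # HTTP header names are case-insensitive
        if lowered.startswith(QUOTA_HEADER_PREFIX):
            hints[lowered[len(QUOTA_HEADER_PREFIX):]] = value
    return hints
```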
|  | ||||
| ## Example profile definition (CLI) | ||||
|  | ||||
| ```jsonc | ||||
| { | ||||
|   "name": "daily-json-raw", | ||||
|   "kind": "json", | ||||
|   "variant": "raw", | ||||
|   "include": ["advisories", "vex", "sboms"], | ||||
|   "distribution": ["http", "object"], | ||||
|   "compression": { "codec": "zstd", "level": 9 }, | ||||
|   "retention": { "mode": "days", "value": 14 } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| Create via `stella export profile create --file profile.json` (CLI command documented separately). | ||||
|  | ||||
| ## Verification workflow | ||||
| - Download bundle via Console/CLI and extract `export.json` and `provenance.json`. | ||||
| - Run `cosign verify-blob --key <tenant-key> --signature export.json.sig export.json` to check the detached signature (repeat for `provenance.json`). | ||||
| - Validate Trivy bundles with `trivy --cache-dir <temp> --debug --db-repository <oci-ref>` or local `trivy module db import`. | ||||
| - For mirror bundles, run internal `mirror verify` script (bundled in Offline Kit) to ensure directory layout and digests match manifest. | ||||
|  | ||||
| ## Extending profiles | ||||
| - Use API/CLI to clone an existing profile and adjust `include`, `distribution`, or retention policies. | ||||
| - Adapter plug-ins can introduce new variants (e.g., `json:raw-lite`, `mirror:policy-only`). Plug-ins must be registered at service restart and documented under `/docs/modules/export-center/profiles.md`. | ||||
| - Any new profile must append the imposed rule line and follow determinism and guardrail requirements. | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
							
								
								
									
										150
									
								
								docs/modules/export-center/provenance-and-signing.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,150 @@ | ||||
| # Export Center Provenance & Signing | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
|  | ||||
| Export Center runs emit deterministic manifests, provenance records, and signatures so operators can prove bundle integrity end-to-end, whether the artefact is downloaded over HTTPS, pulled as an OCI object, or staged through the Offline Kit. This guide captures the canonical artefacts, signing pipeline, verification workflows, and failure-handling expectations implemented by backlog items `EXPORT-SVC-35-005` and `EXPORT-SVC-37-002`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1. Goals & scope | ||||
|  | ||||
| - **Authenticity.** Every export manifest and provenance document is signed using Authority-managed KMS keys (cosign-compatible) with optional SLSA Level 3 attestation. | ||||
| - **Traceability.** Provenance links each bundle to the inputs that produced it: tenant, findings ledger queries, policy snapshots, SBOM identifiers, adapter versions, and encryption recipients. | ||||
| - **Determinism.** Canonical JSON (sorted keys, RFC 3339 UTC timestamps, normalized numbers) guarantees byte-for-byte stability across reruns with identical input. | ||||
| - **Portability.** Signatures and attestations travel with filesystem bundles, OCI artefacts, and Offline Kit staging trees. Verification does not require online Authority access when the bundle includes the cosign public key. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2. Artefact inventory | ||||
|  | ||||
| | File | Location | Description | Notes | | ||||
| |------|----------|-------------|-------| | ||||
| | `export.json` | `manifests/export.json` or HTTP `GET /api/export/runs/{id}/manifest` | Canonical manifest describing profile, selectors, counts, SHA-256 digests, compression hints, distribution targets. | Hash of this file is included in provenance `subject[]`. | | ||||
| | `provenance.json` | `manifests/provenance.json` or `GET /api/export/runs/{id}/provenance` | In-toto provenance record listing subjects, materials, toolchain metadata, encryption recipients, and KMS key identifiers. | Mirrors SLSA Level 2 schema; optionally upgraded to Level 3 with builder attestations. | | ||||
| | `export.json.sig` / `export.json.dsse` | `signatures/export.json.sig` | Cosign signature (and optional DSSE envelope) for manifest. | File naming matches cosign defaults; offline verification scripts expect `.sig`. | | ||||
| | `provenance.json.sig` / `provenance.json.dsse` | `signatures/provenance.json.sig` | Cosign signature (and optional DSSE envelope) for provenance document. | `dsse` present when SLSA Level 3 is enabled. | | ||||
| | `bundle.attestation` | `signatures/bundle.attestation` (optional) | SLSA Level 2/3 attestation binding bundle tarball/OCI digest to the run. | Only produced when `export.attestation.enabled=true`. | | ||||
| | `manifest.yaml` | bundle root | Human-readable summary including digests, sizes, encryption metadata, and verification hints. | Unsigned but redundant; signatures cover the JSON manifests. | | ||||
|  | ||||
| All digests use lowercase hex SHA-256 (`sha256:<digest>`). When bundle encryption is enabled, `provenance.json` records wrapped data keys and recipient fingerprints under `encryption.recipients[]`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3. Signing pipeline | ||||
|  | ||||
| 1. **Canonicalisation.** Export worker serialises `export.json` and `provenance.json` using `NotifyCanonicalJsonSerializer` (identical canonical JSON helpers shared across services). Keys are sorted lexicographically, arrays ordered deterministically, timestamps normalised to UTC. | ||||
| 2. **Digest creation.** SHA-256 digests are computed and recorded: | ||||
|    - `manifest_hash` and `provenance_hash` stored in the run metadata (Mongo) and exported via `/api/export/runs/{id}`. | ||||
|    - Provenance `subject[]` contains both the manifest hash and the bundle/archive hash. | ||||
| 3. **Key retrieval.** Worker obtains a short-lived signing token from Authority’s KMS client using tenant-scoped credentials (`export.sign` scope). Keys live in Authority or tenant-specific HSMs depending on deployment. | ||||
| 4. **Signature emission.** Cosign generates detached signatures (`*.sig`). If DSSE is enabled, cosign wraps payload bytes in a DSSE envelope (`*.dsse`). Attestations follow the SLSA Level 2 provenance template; Level 3 requires builder metadata (`EXPORT-SVC-37-002` optional feature flag). | ||||
| 5. **Storage & distribution.** Signatures and attestations are written alongside manifests in object storage, included in filesystem bundles, and attached as OCI artefact layers/annotations. | ||||
| 6. **Audit trail.** Run metadata captures signer identity (`signing_key_id`), cosign certificate serial, signature timestamps, and verification hints. Console/CLI surface these details for downstream automation. | ||||
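
Steps 1 and 2 of the pipeline can be sketched in a few lines. This is an illustrative stand-in, not the shared `NotifyCanonicalJsonSerializer`: the helper names are hypothetical, and number normalisation and array ordering are left out.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def canonical_bytes(doc: dict) -> bytes:
    """Serialise with lexicographically sorted keys and no insignificant whitespace."""
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def utc_timestamp(moment: datetime) -> str:
    """Normalise an aware datetime to an RFC 3339 UTC string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def sha256_digest(payload: bytes) -> str:
    """Return the lowercase hex digest in the `sha256:<digest>` form used by the manifests."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


# Two logically identical documents with different key order.
first = {"profile": "mirror:delta", "counts": {"vex": 2, "advisories": 5}}
second = {"counts": {"advisories": 5, "vex": 2}, "profile": "mirror:delta"}
same_digest = sha256_digest(canonical_bytes(first)) == sha256_digest(canonical_bytes(second))

# A local timestamp at UTC+2 is normalised to UTC before serialisation.
stamp = utc_timestamp(datetime(2025, 10, 29, 13, 42, 3, tzinfo=timezone(timedelta(hours=2))))
```

Because the digest is computed over the canonical bytes, key order in the in-memory document does not affect the hash.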
|  | ||||
| > **Key management.** Secrets and key references are configured per tenant via `export.signing`, pointing to Authority clients or external HSM aliases. Offline deployments pre-load cosign public keys into the bundle (`signatures/pubkeys/{tenant}.pem`). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4. Provenance schema highlights | ||||
|  | ||||
| `provenance.json` follows the SLSA provenance (`https://slsa.dev/provenance/v1`) structure with StellaOps-specific extensions. Key fields: | ||||
|  | ||||
| | Path | Description | | ||||
| |------|-------------| | ||||
| | `subject[]` | Array of `{name,digest}` pairs. Includes bundle tarball/OCI digest and `export.json` digest. | | ||||
| | `predicateType` | SLSA v1 (default). | | ||||
| | `predicate.builder` | `{id:"stellaops/export-center@<region>"}` identifies the worker instance/cluster. | | ||||
| | `predicate.buildType` | Profile identifier (`mirror:full`, `mirror:delta`, etc.). | | ||||
| | `predicate.invocation.parameters` | Profile selectors, retention flags, encryption mode, base export references. | | ||||
| | `predicate.materials[]` | Source artefacts with digests: findings ledger query snapshots, policy snapshot IDs + hashes, SBOM identifiers, adapter release digests. | | ||||
| | `predicate.metadata.buildFinishedOn` | RFC 3339 timestamp when signing completed. | | ||||
| | `predicate.metadata.reproducible` | Always `true`—workers guarantee determinism. | | ||||
| | `predicate.environment.encryption` | Records encryption recipients, wrapped keys, algorithm (`age` or `aes-gcm`). | | ||||
| | `predicate.environment.kms` | Signing key identifier (`authority://tenant/export-signing-key`) and certificate chain fingerprints. | | ||||
|  | ||||
| Sample (abridged): | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "subject": [ | ||||
|     { "name": "bundle.tar.zst", "digest": { "sha256": "c1fe..." } }, | ||||
|     { "name": "manifests/export.json", "digest": { "sha256": "ad42..." } } | ||||
|   ], | ||||
|   "predicate": { | ||||
|     "buildType": "mirror:delta", | ||||
|     "invocation": { | ||||
|       "parameters": { | ||||
|         "tenant": "tenant-01", | ||||
|         "baseExportId": "run-20251020-01", | ||||
|         "selectors": { "sources": ["concelier","vexer"], "profiles": ["mirror"] } | ||||
|       } | ||||
|     }, | ||||
|     "materials": [ | ||||
|       { "uri": "ledger://tenant-01/findings?cursor=rev-42", "digest": { "sha256": "0f9a..." } }, | ||||
|       { "uri": "policy://tenant-01/snapshots/rev-17", "digest": { "sha256": "8c3d..." } } | ||||
|     ], | ||||
|     "environment": { | ||||
|       "encryption": { | ||||
|         "mode": "age", | ||||
|         "recipients": [ | ||||
|           { "recipient": "age1qxyz...", "wrappedKey": "BASE64...", "keyId": "tenant-01/notify-age" } | ||||
|         ] | ||||
|       }, | ||||
|       "kms": { | ||||
|         "signingKeyId": "authority://tenant-01/export-signing", | ||||
|         "certificateChainSha256": "1f5e..." | ||||
|       } | ||||
|     } | ||||
|   } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 5. Verification workflows | ||||
|  | ||||
| | Scenario | Steps | | ||||
| |----------|-------| | ||||
| | **CLI verification** | 1. `stella export manifest <runId> --output manifests/export.json --signature manifests/export.json.sig`<br>2. `stella export provenance <runId> --output manifests/provenance.json --signature manifests/provenance.json.sig`<br>3. `cosign verify-blob --key pubkeys/tenant.pem --signature manifests/export.json.sig manifests/export.json`<br>4. `cosign verify-blob --key pubkeys/tenant.pem --signature manifests/provenance.json.sig manifests/provenance.json` | | ||||
| | **Bundle verification (offline)** | 1. Extract bundle (or mount OCI artefact).<br>2. Validate manifest/provenance signatures using bundled public key.<br>3. Recompute SHA-256 for `data/` files and compare with entries in `export.json`.<br>4. If encrypted, decrypt with Age/AES-GCM recipient key, then re-run digest comparisons on decrypted content. | | ||||
| | **CI pipeline** | Use `stella export verify --manifest manifests/export.json --provenance manifests/provenance.json --signature manifests/export.json.sig --signature manifests/provenance.json.sig` (task `CLI-EXPORT-37-001`). Failure exits non-zero with reason codes (`ERR_EXPORT_SIG_INVALID`, `ERR_EXPORT_DIGEST_MISMATCH`). | | ||||
| | **Console download** | Console automatically verifies signatures before exposing the bundle; failure surfaces an actionable error referencing the export run ID and required remediation. | | ||||
|  | ||||
| Verification guidance (docs/modules/cli/guides/cli-reference.md §export) cross-links here; keep both docs in sync when CLI behaviour changes. | ||||
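
The digest recomputation in the offline scenario can be sketched against the `subject[]` shape shown in the sample provenance. This is an assumption-laden illustration: `check_subjects` is a hypothetical helper, it only compares digests, and it does not replace the cosign signature checks.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def check_subjects(provenance: dict, root: Path) -> list:
    """Return the names of subjects that are missing or whose digest differs."""
    failures = []
    for subject in provenance.get("subject", []):
        target = root / subject["name"]
        expected = subject["digest"]["sha256"]
        if not target.is_file() or file_sha256(target) != expected:
            failures.append(subject["name"])
    return failures


# Self-contained demonstration using a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "manifests").mkdir()
    (root / "manifests" / "export.json").write_bytes(b"{}")
    good = {"subject": [{"name": "manifests/export.json",
                         "digest": {"sha256": hashlib.sha256(b"{}").hexdigest()}}]}
    bad = {"subject": [{"name": "manifests/export.json",
                        "digest": {"sha256": "0" * 64}}]}
    ok_failures = check_subjects(good, root)
    bad_failures = check_subjects(bad, root)
```

An empty result means every listed subject was found with the expected digest; signatures must still be verified separately.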
|  | ||||
| --- | ||||
|  | ||||
| ## 6. Distribution considerations | ||||
|  | ||||
| - **HTTP headers.** `X-Export-Digest` includes bundle digest; `X-Export-Provenance` references `provenance.json` URL; `X-Export-Signature` references `.sig`. Clients use these hints to short-circuit re-downloads. | ||||
| - **OCI annotations.** `org.opencontainers.image.ref.name`, `io.stellaops.export.manifest-digest`, and `io.stellaops.export.provenance-ref` allow registry tooling to locate manifests/signatures quickly. | ||||
| - **Offline Kit staging.** Offline kit assembler copies `manifests/`, `signatures/`, and `pubkeys/` verbatim. Verification scripts (`offline-kits/bin/verify-export.sh`) wrap the cosign commands described above. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 7. Failure handling & observability | ||||
|  | ||||
| - Runs surface signature status via `/api/export/runs/{id}` (`signing.status`, `signing.lastError`). Common errors include `ERR_EXPORT_KMS_UNAVAILABLE`, `ERR_EXPORT_ATTESTATION_FAILED`, `ERR_EXPORT_CANONICALIZE`. | ||||
| - Metrics: `exporter_sign_duration_seconds`, `exporter_sign_failures_total{error_code}`, `exporter_provenance_verify_failures_total`. | ||||
| - Logs: `phase=sign`, `error_code`, `signing_key_id`, `cosign_certificate_sn`. | ||||
| - Alerts: DevOps dashboards (task `DEVOPS-EXPORT-37-001`) trigger on consecutive signing failures or verification failures >0. | ||||
|  | ||||
| When verification fails downstream, operators should: | ||||
| 1. Confirm signatures using the known-good key. | ||||
| 2. Inspect `provenance.json` materials; rerun the source queries to ensure matching digests. | ||||
| 3. Review run audit logs and retry export with `--resume` to regenerate manifests. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 8. Compliance checklist | ||||
|  | ||||
| - [ ] Manifests and provenance documents generated with canonical JSON, deterministic digests, and signatures. | ||||
| - [ ] Cosign public keys published per tenant, rotated through Authority, and distributed to Offline Kit consumers. | ||||
| - [ ] SLSA attestations enabled where supply-chain requirements demand Level 3 evidence. | ||||
| - [ ] CLI/Console verification paths documented and tested (CI pipelines exercise `stella export verify`). | ||||
| - [ ] Encryption metadata (recipients, wrapped keys) recorded in provenance and validated during verification. | ||||
| - [ ] Run audit logs capture signature timestamps, signer identity, and failure reasons. | ||||
|  | ||||
| --- | ||||
|  | ||||
| > **Imposed rule reminder:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
							
								
								
									
										246
									
								
								docs/modules/export-center/trivy-adapter.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,246 @@ | ||||
| # Export Center Trivy Adapters | ||||
|  | ||||
| The Trivy adapters translate StellaOps normalized advisories into the format consumed by Aqua Security's Trivy scanner. They enable downstream tooling to reuse StellaOps' curated data without bespoke converters, while preserving Aggregation-Only Contract (AOC) boundaries. This guide documents bundle layouts, field mappings, compatibility guarantees, validation workflows, and configuration toggles introduced in Sprint 36 (`EXPORT-SVC-36-001`, `EXPORT-SVC-36-002`). | ||||
|  | ||||
| > The current Export Center build is still wiring up the API and workers. Treat this document as the canonical interface for adapter implementation and record any behavioural changes here during task sign-off. | ||||
|  | ||||
| ## 1. Adapter overview | ||||
|  | ||||
| | Variant | Bundle | Default profile | Notes | | ||||
| |---------|--------|-----------------|-------| | ||||
| | `trivy:db` | `db.bundle` | `trivy:db` | Core vulnerability database compatible with Trivy CLI >= 0.50.0 (schema v2). | | ||||
| | `trivy:java-db` | `java-db.bundle` | Optional extension | Java ecosystem supplement (Maven, Gradle). Enabled when `ExportCenter:Adapters:Trivy:IncludeJavaDb=true`. | | ||||
|  | ||||
| Both variants ship inside the export run under `/export/trivy/`. Each bundle is a gzip-compressed tarball containing: | ||||
|  | ||||
| ``` | ||||
| metadata.json | ||||
| trivy.db                # BoltDB file with vulnerability/provider tables | ||||
| packages/*.json         # Only when schema requires JSON overlays (language ecosystems) | ||||
| ``` | ||||
|  | ||||
| The adapters never mutate input evidence. They only reshape normalized advisories and copy the exact upstream references so consumers can trace provenance. | ||||
|  | ||||
| ## 2. Bundle layout | ||||
|  | ||||
| ``` | ||||
| trivy/ | ||||
|   db.bundle | ||||
|     +-- metadata.json | ||||
|     +-- trivy.db | ||||
|   java-db.bundle        # present when Java DB enabled | ||||
|     +-- metadata.json | ||||
|     +-- trivy-java.db | ||||
|     +-- ecosystem/... | ||||
| signatures/ | ||||
|   trivy-db.sig | ||||
|   trivy-java-db.sig | ||||
| ``` | ||||
|  | ||||
| `metadata.json` aligns with Trivy's expectations (`schemaVersion`, `buildInfo`, `updatedAt`, etc.). Export Center adds a `stella` block to capture the profile id, run id, and policy snapshot hints. | ||||
|  | ||||
| Example `metadata.json` (trimmed): | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "schemaVersion": 2, | ||||
|   "buildInfo": { | ||||
|     "trivyVersion": "0.50.1", | ||||
|     "vulnerabilityDBVersion": "2025-10-28T00:00:00Z" | ||||
|   }, | ||||
|   "updatedAt": "2025-10-29T11:42:03Z", | ||||
|   "stella": { | ||||
|     "runId": "run-20251029-01", | ||||
|     "profileId": "prof-trivy-db", | ||||
|     "tenant": "acme", | ||||
|     "policySnapshotId": "policy-snap-42", | ||||
|     "schemaVersion": 2 | ||||
|   } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| ## 3. Field mappings | ||||
|  | ||||
| ### 3.1 Namespace resolution | ||||
|  | ||||
| | Stella field | Trivy field | Notes | | ||||
| |--------------|-------------|-------| | ||||
| | `advisory.source.vendor` | `namespace` | Canonicalized to lowercase; e.g. `Ubuntu` -> `ubuntu`. | | ||||
| | `advisory.source.product` | `distribution` / `ecosystem` | Mapped via allowlist (`Ubuntu 22.04` -> `ubuntu:22.04`). | | ||||
| | `package.ecosystem` | `package.ecosystem` | OSS ecosystems (`npm`, `pip`, `nuget`, etc.). | | ||||
| | `package.nevra` / `package.evr` | `package.version` (OS) | RPM/DEB version semantics preserved. | | ||||
|  | ||||
| If a record lacks a supported namespace, the adapter drops it and logs `adapter.trivy.unsupported_namespace`. | ||||
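
The resolution rule above can be sketched as a lowercase step followed by an allowlist lookup. The allowlist contents and the name `resolve_namespace` are hypothetical; only the `Ubuntu` / `22.04` pair comes from the table.

```python
from typing import Optional

# Hypothetical allowlist keyed by (canonical vendor, product).
DISTRIBUTION_ALLOWLIST = {
    ("ubuntu", "22.04"): "ubuntu:22.04",
}


def resolve_namespace(vendor: str, product: str) -> Optional[dict]:
    """Map a Stella source to a Trivy namespace, or None when unsupported.

    A None result means the caller should drop the record and log
    `adapter.trivy.unsupported_namespace`.
    """
    namespace = vendor.strip().lower()
    distribution = DISTRIBUTION_ALLOWLIST.get((namespace, product.strip()))
    if distribution is None:
        return None
    return {"namespace": namespace, "distribution": distribution}


resolved = resolve_namespace("Ubuntu", "22.04")
unsupported = resolve_namespace("Ubuntu", "99.99")
```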
|  | ||||
| ### 3.2 Vulnerability metadata | ||||
|  | ||||
| | Stella field | Trivy field | Transformation | | ||||
| |--------------|-------------|----------------| | ||||
| | `advisory.identifiers.cve[]` | `vulnerability.CVEIDs` | Array of strings. | | ||||
| | `advisory.identifiers.aliases[]` | `vulnerability.CWEIDs` / `References` | CVE -> `CVEIDs`, others appended to `References`. | | ||||
| | `advisory.summary` | `vulnerability.Title` | Truncated to 256 characters; the remainder is moved to `Description`. | | ||||
| | `advisory.description` | `vulnerability.Description` | Markdown allowed, normalized to LF line endings. | | ||||
| | `advisory.severity.normalized` | `vulnerability.Severity` | Uses table below. | | ||||
| | `advisory.cvss[]` | `vulnerability.CVSS` | Stored as `{"vector": "...", "score": 7.8, "source": "NVD"}`. | | ||||
| | `advisory.published` | `vulnerability.PublishedDate` | ISO 8601 UTC. | | ||||
| | `advisory.modified` | `vulnerability.LastModifiedDate` | ISO 8601 UTC. | | ||||
| | `advisory.vendorStatement` | `vulnerability.VendorSeverity` / `VendorVectors` | Preserved in vendor block. | | ||||
|  | ||||
| Severity mapping: | ||||
|  | ||||
| | Stella severity | Trivy severity | | ||||
| |-----------------|----------------| | ||||
| | `critical` | `CRITICAL` | | ||||
| | `high` | `HIGH` | | ||||
| | `medium` | `MEDIUM` | | ||||
| | `low` | `LOW` | | ||||
| | `none` / `info` | `UNKNOWN` | | ||||
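
The severity table translates directly into a lookup. The sketch below assumes that missing or unrecognised values fall back to `UNKNOWN`, which matches the troubleshooting entry for downgraded severities; the function name is hypothetical.

```python
from typing import Optional

SEVERITY_MAP = {
    "critical": "CRITICAL",
    "high": "HIGH",
    "medium": "MEDIUM",
    "low": "LOW",
    "none": "UNKNOWN",
    "info": "UNKNOWN",
}


def to_trivy_severity(value: Optional[str]) -> str:
    """Convert a Stella normalized severity to the Trivy label."""
    if not value:
        return "UNKNOWN"
    return SEVERITY_MAP.get(value.strip().lower(), "UNKNOWN")
```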
|  | ||||
| ### 3.3 Affected packages | ||||
|  | ||||
| | Stella field | Trivy field | Notes | | ||||
| |--------------|-------------|-------| | ||||
| | `package.name` | `package.name` | For OS distros uses source package when available. | | ||||
| | `package.purl` | `package.PURL` | Copied verbatim. | | ||||
| | `affects.vulnerableRange` | `package.vulnerableVersionRange` | SemVer or distro version range. | | ||||
| | `remediations.fixedVersion` | `package.fixedVersion` | Latest known fix. | | ||||
| | `remediations.urls[]` | `package.links` | Array; duplicates removed. | | ||||
| | `states.cpes[]` | `package.cpes` | For CPE-backed advisories. | | ||||
|  | ||||
| The adapter deduplicates entries by `(namespace, package.name, vulnerableRange)` to avoid duplicate records when multiple upstream segments agree. | ||||
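
A minimal sketch of that deduplication, assuming each entry is a dict carrying `namespace`, `package.name`, and `vulnerableRange`, and assuming the first occurrence wins (the source does not say which duplicate is kept):

```python
from typing import Iterable, List


def dedupe_entries(entries: Iterable[dict]) -> List[dict]:
    """Keep the first entry for each (namespace, package name, vulnerable range)."""
    seen = set()
    unique = []
    for entry in entries:
        key = (entry["namespace"], entry["package"]["name"], entry["vulnerableRange"])
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


entries = [
    {"namespace": "ubuntu", "package": {"name": "openssl"}, "vulnerableRange": "< 1.1.1f-1ubuntu2.13", "source": "a"},
    {"namespace": "ubuntu", "package": {"name": "openssl"}, "vulnerableRange": "< 1.1.1f-1ubuntu2.13", "source": "b"},
    {"namespace": "ubuntu", "package": {"name": "zlib"}, "vulnerableRange": "< 1.2.11", "source": "a"},
]
unique_entries = dedupe_entries(entries)
```

Input order is preserved, so a deterministic upstream ordering yields a deterministic result.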
|  | ||||
| Example mapping (Ubuntu advisory): | ||||
|  | ||||
| ```jsonc | ||||
| // Stella normalized input | ||||
| { | ||||
|   "source": {"vendor": "Ubuntu", "product": "22.04"}, | ||||
|   "identifiers": {"cve": ["CVE-2024-12345"]}, | ||||
|   "severity": {"normalized": "high"}, | ||||
|   "affects": [{ | ||||
|     "package": {"name": "openssl", "ecosystem": "ubuntu", "nevra": "1.1.1f-1ubuntu2.12"}, | ||||
|     "vulnerableRange": "< 1.1.1f-1ubuntu2.13", | ||||
|     "remediations": [{"fixedVersion": "1.1.1f-1ubuntu2.13"}] | ||||
|   }] | ||||
| } | ||||
|  | ||||
| // Trivy vulnerability entry | ||||
| { | ||||
|   "namespace": "ubuntu", | ||||
|   "package": { | ||||
|     "name": "openssl", | ||||
|     "vulnerableVersionRange": "< 1.1.1f-1ubuntu2.13", | ||||
|     "fixedVersion": "1.1.1f-1ubuntu2.13" | ||||
|   }, | ||||
|   "vulnerability": { | ||||
|     "ID": "CVE-2024-12345", | ||||
|     "Severity": "HIGH" | ||||
|   } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| ### 3.4 Java DB specifics | ||||
|  | ||||
| The Java supplement only includes ecosystems `maven`, `gradle`, `sbt`. Additional fields: | ||||
|  | ||||
| | Stella field | Trivy Java field | Notes | | ||||
| |--------------|------------------|-------| | ||||
| | `package.group` | `GroupID` | Derived from Maven coordinates. | | ||||
| | `package.artifact` | `ArtifactID` | Derived from Maven coordinates. | | ||||
| | `package.version` | `Version` | Compared with semver-lite rules. | | ||||
| | `affects.symbolicRanges[]` | `VulnerableVersions` | Strings like `[1.0.0,1.2.3)`. | | ||||
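
To illustrate how a range such as `[1.0.0,1.2.3)` could be evaluated, the sketch below handles only two-bound ranges with purely numeric dotted versions. The actual "semver-lite" rules are not specified here, so qualifiers such as `-SNAPSHOT` and single-version ranges are out of scope, and both function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Parse a numeric dotted version such as '1.2.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def in_range(version: str, spec: str) -> bool:
    """Check a version against a Maven-style range like '[1.0.0,1.2.3)'.

    '[' and ']' are inclusive bounds, '(' and ')' are exclusive,
    and an empty bound means unbounded on that side.
    """
    lower_inclusive = spec[0] == "["
    upper_inclusive = spec[-1] == "]"
    low_text, high_text = (part.strip() for part in spec[1:-1].split(","))
    candidate = parse_version(version)
    if low_text:
        low = parse_version(low_text)
        if candidate < low or (candidate == low and not lower_inclusive):
            return False
    if high_text:
        high = parse_version(high_text)
        if candidate > high or (candidate == high and not upper_inclusive):
            return False
    return True
```

Note that tuple comparison treats `1.2` as lower than `1.2.0`; a production comparer would need to pad or otherwise normalise components.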
|  | ||||
| ## 4. Compatibility matrix | ||||
|  | ||||
| | Trivy version | Schema version | Supported by adapter | Notes | | ||||
| |---------------|----------------|----------------------|-------| | ||||
| | 0.46.x | 2 | Yes | Baseline compatibility target. | | ||||
| | 0.50.x | 2 | Yes | Default validation target in CI. | | ||||
| | 0.51.x+ | 3 | Pending | Adapter throws `ERR_EXPORT_UNSUPPORTED_SCHEMA` until implemented. | | ||||
|  | ||||
| Schema mismatches emit `adapter.trivy.unsupported_schema_version` and abort the run. Operators can pin the schema via `ExportCenter:Adapters:Trivy:SchemaVersion`. | ||||
|  | ||||
| ## 5. Validation workflow | ||||
|  | ||||
| 1. **Unit tests** (`StellaOps.ExportCenter.Tests`): | ||||
|    - Mapping tests for OS and ecosystem packages. | ||||
|    - Severity conversion and range handling property tests. | ||||
| 2. **Integration tests** (`EXPORT-SVC-36-001`): | ||||
|    - Generate bundle from fixture dataset. | ||||
|    - Run `trivy module db import <bundle>` (Trivy CLI) to ensure the bundle is accepted. | ||||
|    - For Java DB, run `trivy java-repo --db <bundle>` against sample repository. | ||||
| 3. **CI smoke (`DEVOPS-EXPORT-36-001`)**: | ||||
|    - Validate metadata fields using `jq`. | ||||
|    - Ensure signatures verify with `cosign`. | ||||
|    - Check runtime by invoking `trivy fs --cache-dir <temp> --skip-update --custom-db <bundle> fixtures/image`. | ||||
|  | ||||
| Failures set the run status to `failed` with `errorCode="adapter-trivy"` so Console/CLI expose the reason. | ||||
|  | ||||
| ## 6. Configuration knobs | ||||
|  | ||||
| ```yaml | ||||
| ExportCenter: | ||||
|   Adapters: | ||||
|     Trivy: | ||||
|       SchemaVersion: 2           # enforce schema version | ||||
|       IncludeJavaDb: true        # enable java-db.bundle | ||||
|       AllowEmpty: false          # fail when no records match | ||||
|       MaxCvssVectorsPerEntry: 5  # truncate to avoid oversized payloads | ||||
|   Distribution: | ||||
|     Oras: | ||||
|       TrivyRepository: "registry.example.com/stella/trivy-db" | ||||
|       PublishDelta: false | ||||
|     Download: | ||||
|       FilenameFormat: "trivy-db-{runId}.tar.gz" | ||||
|       IncludeMetadata: true | ||||
| ``` | ||||
|  | ||||
| - `AllowEmpty=false` converts empty datasets into `ERR_EXPORT_EMPTY`. | ||||
| - `MaxCvssVectorsPerEntry` prevents extremely large multi-vector advisories from bloating the DB. | ||||
| - `PublishDelta` works in tandem with the planner's delta logic; when true, only changed blobs are pushed. | ||||
| - `FilenameFormat` lets operators align downloads with existing mirror tooling. | ||||
| - `IncludeMetadata` toggles whether `metadata.json` is stored alongside the bundle in the staging directory for quick inspection. | ||||
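
The effect of `AllowEmpty` and `MaxCvssVectorsPerEntry` can be sketched as follows. The function and exception names are hypothetical, and keeping the first N vectors on truncation is an assumption; the source does not say which vectors survive.

```python
from typing import List


class ExportEmptyError(Exception):
    """Raised when no records match and empty exports are not allowed."""

    code = "ERR_EXPORT_EMPTY"


def apply_knobs(entries: List[dict], allow_empty: bool = False, max_cvss_vectors: int = 5) -> List[dict]:
    """Enforce the empty-dataset rule and cap the CVSS vectors per entry."""
    if not entries and not allow_empty:
        raise ExportEmptyError("no records matched the selectors")
    # Copy each entry so the input evidence is never mutated.
    return [{**entry, "cvss": entry.get("cvss", [])[:max_cvss_vectors]} for entry in entries]


sample = [{"id": "CVE-2024-12345", "cvss": [{"source": f"s{i}"} for i in range(8)]}]
capped = apply_knobs(sample)
```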
|  | ||||
| ## 7. Distribution guidelines | ||||
|  | ||||
| - **Download profile**: `db.bundle` placed under `/export/trivy/` and signed. Recommended filename `trivy-db-<runId>.tar.gz`. | ||||
| - **OCI push**: ORAS artifact with annotations: | ||||
|   - `org.opencontainers.artifact.description=StellaOps Trivy DB` | ||||
|   - `io.stella.export.profile=trivy:db` | ||||
|   - `io.stella.export.run=<runId>` | ||||
|   - `io.stella.export.schemaVersion=2` | ||||
| - **Offline Kit**: When `offlineBundle.includeTrivyDb=true`, the exporter copies the latest full bundle plus the last `N` deltas (configurable) with manifests for quick import. | ||||
|  | ||||
| Consumers should always verify signatures using `trivy-db.sig` / `trivy-java-db.sig` before trusting the bundle. | ||||
|  | ||||
| Example verification flow: | ||||
|  | ||||
| ```bash | ||||
| cosign verify-blob \ | ||||
|   --key tenants/acme/export-center.pub \ | ||||
|   --signature signatures/trivy-db.sig \ | ||||
|   trivy/db.bundle | ||||
|  | ||||
| trivy module db import trivy/db.bundle --cache-dir /tmp/trivy-cache | ||||
| ``` | ||||
|  | ||||
| ## 8. Troubleshooting | ||||
|  | ||||
| | Symptom | Likely cause | Remedy | | ||||
| |---------|--------------|--------| | ||||
| | `ERR_EXPORT_UNSUPPORTED_SCHEMA` | Trivy CLI updated schema version. | Bump `SchemaVersion`, extend mapping tables, regenerate fixtures. | | ||||
| | `adapter.trivy.unsupported_namespace` | Advisory namespace not in allowlist. | Extend namespace mapping or exclude in selector. | | ||||
| | `trivy import` fails with "invalid bolt page" | Corrupted bundle or truncated upload. | Re-run export; verify storage backend and signatures. | | ||||
| | Missing Java advisories | `IncludeJavaDb=false` or no Java data in Findings Ledger. | Enable flag and confirm upstream connectors populate Java ecosystems. | | ||||
| | Severity downgraded to UNKNOWN | Source severity missing or unrecognized. | Ensure upstream connectors populate severity or supply CVSS scores. | | ||||
| | `ERR_EXPORT_EMPTY` returned unexpectedly | Selectors yielded zero records while `AllowEmpty=false`. | Review selectors; set `AllowEmpty=true` if empty exports are acceptable. | | ||||
|  | ||||
| ## 9. References | ||||
|  | ||||
| - [Export Center API reference](api.md) | ||||
| - [Export Center CLI Guide](cli.md) | ||||
| - [Export Center Architecture](architecture.md) | ||||
| - [Export Center Overview](overview.md) | ||||
| - [Aqua Security Trivy documentation](https://aquasecurity.github.io/trivy/dev/database/structure/) *(external reference for schema expectations)* | ||||
|  | ||||
| > **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied. | ||||
							
								
								
									
										22
									
								
								docs/modules/graph/AGENTS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,22 @@ | ||||
| # Graph agent guide | ||||
|  | ||||
| ## Mission | ||||
| The Graph module (upcoming) will power graph-indexed queries for SBOM relationships, lineage, and blast-radius analysis. | ||||
|  | ||||
| ## Key docs | ||||
| - [Module README](./README.md) | ||||
| - [Architecture](./architecture.md) | ||||
| - [Implementation plan](./implementation_plan.md) | ||||
| - [Task board](./TASKS.md) | ||||
|  | ||||
| ## How to get started | ||||
| 1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module. | ||||
| 2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED). | ||||
| 3. Read the architecture and README for domain context before editing code or docs. | ||||
| 4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan. | ||||
|  | ||||
| ## Guardrails | ||||
| - Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md). | ||||
| - Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts. | ||||
| - Keep Offline Kit parity in mind—document air-gapped workflows for any new feature. | ||||
| - Update runbooks/observability assets when operational characteristics change. | ||||
							
								
								
									
										31
									
								
								docs/modules/graph/README.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,31 @@ | ||||
| # StellaOps Graph | ||||
|  | ||||
| The Graph module (upcoming) will power graph-indexed queries for SBOM relationships, lineage, and blast-radius analysis. | ||||
|  | ||||
| ## Responsibilities | ||||
| - Model SBOM and advisory entities as a navigable graph. | ||||
| - Provide APIs for dependency impact, provenance chains, and reachability analysis. | ||||
| - Integrate with Scheduler/Policy for graph-driven re-evaluation. | ||||
| - Expose tooling for offline explorers. | ||||
|  | ||||
| ### Domain highlights (Epic 5) | ||||
| - **Nodes:** artifacts/images, SBOM components, packages/versions, files/paths, licences, advisories, VEX statements, provenance attestations, policy versions. | ||||
| - **Edges:** `depends_on`, `contains`, `built_from`, `declared_in`, `affected_by`, `vex_exempts`, `governs_with`, `produced_by`, each timestamped and tenant-scoped. | ||||
| - **Overlays:** policy verdict overlays, VEX consensus, runtime telemetry, and export-ready snapshots with diff support. | ||||
| - **Queries:** reachability (`impact(graph, advisory)`), blast radius (`reverseDepends(component)`), provenance timeline, saved query library with semantic zoom for Console. | ||||
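
As an illustration of the blast-radius query, the sketch below walks `depends_on` edges in reverse with a breadth-first search. The module is still upcoming, so the edge representation (plain `(dependent, dependency)` pairs) and the function name `reverse_depends` are assumptions, and tenant scoping and timestamps are ignored.

```python
from collections import deque
from typing import Iterable, List, Tuple


def reverse_depends(edges: Iterable[Tuple[str, str]], component: str) -> List[str]:
    """Return every node that depends on `component`, directly or transitively."""
    # Build a reverse adjacency map: dependency -> set of dependents.
    reverse = {}
    for dependent, dependency in edges:
        reverse.setdefault(dependency, set()).add(dependent)

    seen = set()
    queue = deque([component])
    while queue:
        node = queue.popleft()
        for parent in sorted(reverse.get(node, ())):
            if parent != component and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


edges = [
    ("app", "libA"),
    ("libA", "openssl"),
    ("tool", "openssl"),
    ("other", "zlib"),
]
impacted = reverse_depends(edges, "openssl")
```

The `seen` set guards against dependency cycles, and the sorted result keeps the output deterministic.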
|  | ||||
| ## Key components | ||||
| - Planned services documented in implementation plan (to be delivered). | ||||
|  | ||||
| ## Integrations & dependencies | ||||
| - SBOM Service / Cartographer for data ingestion. | ||||
| - Policy & CLI for query surfaces. | ||||
|  | ||||
| ## Operational notes | ||||
| - Pending — see implementation plan for staged milestones. | ||||
|  | ||||
| ## Backlog references | ||||
| - DOCS-GRAPH-24-003 (architecture index) and SCHED-MODELS-21-001 tasks. | ||||
|  | ||||
| ## Epic alignment | ||||
| - **Epic 5 – SBOM Graph Explorer:** deliver graph indexer, API, Console explorer, saved queries, overlays, and exports. | ||||
							
								
								
									
										9
									
								
								docs/modules/graph/TASKS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,9 @@ | ||||
| # Task board — Graph | ||||
|  | ||||
| > Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable. | ||||
|  | ||||
| | ID | Status | Owner(s) | Description | Notes | | ||||
| |----|--------|----------|-------------|-------| | ||||
| | GRAPH-DOCS-0001 | DOING (2025-10-29) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md | | ||||
| | GRAPH-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md | | ||||
| | GRAPH-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow | | ||||
							
								
								
									
										56
									
								
								docs/modules/graph/architecture.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,56 @@ | ||||
| # Graph architecture | ||||
|  | ||||
| > Derived from Epic 5 – SBOM Graph Explorer; this section captures the core model, pipeline, and API expectations. Extend with diagrams as implementation matures. | ||||
|  | ||||
| ## 1) Core model | ||||
|  | ||||
| - **Nodes:** | ||||
|   - `Artifact` (application/image digest) with metadata (tenant, environment, labels). | ||||
|   - `Component` (package/version, purl, ecosystem). | ||||
|   - `File`/`Path` (source files, binary paths) with hash/time metadata. | ||||
|   - `License` nodes linked to components and SBOM attestations. | ||||
|   - `Advisory` and `VEXStatement` nodes linking to Concelier/Excititor records via digests. | ||||
|   - `PolicyVersion` nodes representing signed policy packs. | ||||
| - **Edges:** directed, timestamped relationships such as `DEPENDS_ON`, `BUILT_FROM`, `DECLARED_IN`, `AFFECTED_BY`, `VEX_EXEMPTS`, `GOVERNS_WITH`, `OBSERVED_RUNTIME`. Each edge carries provenance (SRM hash, SBOM digest, policy run ID). | ||||
| - **Overlays:** computed index tables providing fast access to reachability, blast radius, and differential views (e.g., `graph_overlay/vuln/{tenant}/{advisoryKey}`). | ||||
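
The node and edge model above can be illustrated with a small in-memory sketch. It is not the planned Graph Indexer implementation: the `Edge` dataclass, the `blast_radius` helper, and the sample identifiers are hypothetical, and only the `DEPENDS_ON` edge kind and tenant scoping come from the model above. The sketch computes a blast radius by walking `DEPENDS_ON` edges in reverse within a single tenant.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Directed, tenant-scoped edge (hypothetical shape for illustration)."""
    tenant: str
    kind: str
    source: str  # the dependent node
    target: str  # the node being depended on


def blast_radius(edges, tenant, component):
    """Return every node that transitively depends on `component`, sorted."""
    # Build a reverse adjacency map restricted to one tenant and one edge kind.
    reverse = defaultdict(set)
    for e in edges:
        if e.tenant == tenant and e.kind == "DEPENDS_ON":
            reverse[e.target].add(e.source)

    # Breadth-first walk from the component towards its dependents.
    seen, queue = set(), deque([component])
    while queue:
        for dependent in reverse[queue.popleft()]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


# Hypothetical sample data: two tenants sharing a package identifier.
EDGES = [
    Edge("tenant-01", "DEPENDS_ON", "artifact:api", "pkg:npm/express"),
    Edge("tenant-01", "DEPENDS_ON", "pkg:npm/express", "pkg:npm/qs"),
    Edge("tenant-02", "DEPENDS_ON", "artifact:other", "pkg:npm/qs"),
]
```

Filtering by tenant while building the adjacency map, rather than after traversal, keeps one tenant's edges out of another tenant's results by construction.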
|  | ||||
| ## 2) Pipelines | ||||
|  | ||||
| 1. **Ingestion:** Cartographer/SBOM Service emit SBOM snapshots (`sbom_snapshot` events) captured by the Graph Indexer. Advisories/VEX from Concelier/Excititor generate edge updates; policy runs attach overlay metadata. | ||||
| 2. **ETL:** Normalises nodes/edges into canonical IDs, deduplicates, enforces tenant partitions, and writes to the graph store (planned: Neo4j-compatible or document + adjacency lists in Mongo). | ||||
| 3. **Overlay computation:** Batch workers build materialised views for frequently used queries (impact lists, saved queries, policy overlays) and store as immutable blobs for Offline Kit exports. | ||||
| 4. **Diffing:** `graph_diff` jobs compare two snapshots (e.g., pre/post deploy) and generate signed diff manifests for UI/CLI consumption. | ||||
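
The diffing step can be sketched as a set difference over edges. This is a minimal illustration, assuming each snapshot can be reduced to `(source, kind, target)` tuples; the `diff_edges` helper and the sample snapshots are hypothetical, and manifest signing is left out.

```python
def diff_edges(snapshot_a, snapshot_b):
    """Compare two snapshots given as iterables of (source, kind, target) tuples."""
    before, after = set(snapshot_a), set(snapshot_b)
    # Sorting keeps the diff output deterministic regardless of input order.
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }


# Hypothetical pre/post deploy snapshots.
PRE_DEPLOY = [
    ("artifact:api", "DEPENDS_ON", "pkg:npm/express"),
    ("artifact:api", "DEPENDS_ON", "pkg:npm/lodash"),
]
POST_DEPLOY = [
    ("artifact:api", "DEPENDS_ON", "pkg:npm/express"),
    ("artifact:api", "DEPENDS_ON", "pkg:npm/qs"),
]
```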
|  | ||||
| ## 3) APIs | ||||
|  | ||||
| - `GET /graph/nodes/{id}` — fetch node with metadata and attached provenance. | ||||
| - `POST /graph/query/saved` — execute saved query (Cypher-like DSL) with tenant filtering; supports paging, citation metadata, and `explain` traces. | ||||
| - `GET /graph/impact/{advisoryKey}` — returns impacted artifacts with path context and policy/VEX overlays. | ||||
| - `GET /graph/diff/{snapshotA}/{snapshotB}` — streaming API returning diff manifest including new/removed edges, risk summary, and export references. | ||||
| - `POST /graph/overlay/policy` — create or retrieve overlay for policy version + advisory set, referencing `effective_finding` results. | ||||
|  | ||||
| ## 4) Storage considerations | ||||
|  | ||||
| - Backed by either: | ||||
|   - **Document + adjacency** (Mongo collections `graph_nodes`, `graph_edges`, `graph_overlays`) with deterministic ordering and streaming exports. | ||||
|   - Or **Graph DB** (e.g., Neo4j/Cosmos Gremlin) behind an abstraction layer; choice depends on deployment footprint. | ||||
| - All storage options require tenant partitioning, append-only change logs, and export manifests for Offline Kits. | ||||
|  | ||||
| ## 5) Offline & export | ||||
|  | ||||
| - Each snapshot packages `nodes.jsonl`, `edges.jsonl`, `overlays/` plus manifest with hash, counts, and provenance. Export Center consumes these artefacts for graph-specific bundles. | ||||
| - Saved queries and overlays include deterministic IDs so Offline Kit consumers can import and replay results. | ||||
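
A deterministic export along these lines could be sketched as follows. The manifest layout, the `id` field used for ordering, and the choice of SHA-256 are assumptions for illustration; only the `nodes.jsonl` and `edges.jsonl` file names and the "hash plus counts" requirement come from the text above.

```python
import hashlib
import json


def to_jsonl(records, key):
    """Serialise records as JSON Lines with stable record and key ordering."""
    ordered = sorted(records, key=lambda r: r[key])
    # sort_keys and fixed separators make the bytes independent of dict order.
    lines = (json.dumps(r, sort_keys=True, separators=(",", ":")) for r in ordered)
    return "".join(line + "\n" for line in lines).encode("utf-8")


def build_manifest(nodes, edges):
    """Build a manifest with a SHA-256 hash and record count per file."""
    files = {
        "nodes.jsonl": to_jsonl(nodes, "id"),
        "edges.jsonl": to_jsonl(edges, "id"),
    }
    return {
        name: {"sha256": hashlib.sha256(data).hexdigest(), "count": data.count(b"\n")}
        for name, data in files.items()
    }
```

Because records are sorted and keys are emitted in a fixed order, two exports of the same snapshot should yield identical hashes even when the input arrives in a different order.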
|  | ||||
| ## 6) Observability | ||||
|  | ||||
| - Metrics: ingestion lag (`graph_ingest_lag_seconds`), node/edge counts, query latency per saved query, overlay generation duration. | ||||
| - Logs: structured events for ETL stages and query execution (with trace IDs). | ||||
| - Traces: ETL pipeline spans, query engine spans. | ||||
|  | ||||
| ## 7) Rollout notes | ||||
|  | ||||
| - Phase 1: ingest SBOM + advisories, deliver impact queries. | ||||
| - Phase 2: add VEX overlays, policy overlays, diff tooling. | ||||
| - Phase 3: expose runtime/Zastava edges and AI-assisted recommendations (future). | ||||
|  | ||||
| Refer to the module README and implementation plan for immediate context, and update this document once component boundaries and data flows are finalised. | ||||
							
								
								
									
										64
									
								
								docs/modules/graph/implementation_plan.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,64 @@ | ||||
| # Implementation plan — Graph | ||||
|  | ||||
| ## Delivery phases | ||||
| - **Phase 1 – Graph Indexer foundations**   | ||||
|   Stand up Graph Indexer service, node/edge schemas, ingestion from SBOM/Concelier/Excititor events, identity stability, and snapshot materialisation. | ||||
| - **Phase 2 – Graph API service**   | ||||
|   Expose search, query, path, impact, diff, and overlay endpoints with RBAC, cost controls, and streaming responses. | ||||
| - **Phase 3 – Console & CLI experiences**   | ||||
|   Ship Graph Explorer UI (WebGL canvas, filters, diff mode, overlays) and CLI (`stella sbom graph ...`) for automation pipelines. | ||||
| - **Phase 4 – Advanced analytics**   | ||||
|   Implement clustering, centrality, saved queries, overlay caching, and Policy Engine explain integration. | ||||
| - **Phase 5 – Exports & offline**   | ||||
|   Deliver GraphML/CSV/NDJSON exports, Offline Kit bundles (`nodes.jsonl`, `edges.jsonl`, overlays), and deterministic manifests. | ||||
| - **Phase 6 – Observability & hardening**   | ||||
|   Complete dashboards, alerts, runbooks, load/perf testing, and accessibility (a11y) review. | ||||
|  | ||||
| ## Work breakdown | ||||
| - **Services** | ||||
|   - Graph Indexer: event consumers, node/edge builders, snapshot/version handling, aggregate metrics. | ||||
|   - Graph API: validation, planner/cost guard, streaming tile engine, diff/overlay builder, exports. | ||||
|   - Worker jobs: clustering, diff, overlay materialisation with backpressure awareness. | ||||
| - **Data model & storage** | ||||
|   - Collections/tables (`graph_nodes`, `graph_edges`, `graph_snapshots`, `graph_saved_queries`, `graph_overlays_cache`), indexes, tenant partitioning, append-only change logs. | ||||
|   - Evaluate document + adjacency vs graph DB abstraction; ensure deterministic serialization for exports. | ||||
| - **Console** | ||||
|   - Feature module `graph-explorer` with routes, canvas renderer, panels, diff UI, saved queries, export workflows, a11y pass. | ||||
|   - Telemetry instrumentation for user interactions and query budgets. | ||||
| - **CLI & SDK** | ||||
|   - `stella sbom graph query|diff|impact|export`, with JSON schema and piping support. | ||||
|   - SDK utilities for automation and CI pipelines. | ||||
| - **Policy & VEX integration** | ||||
|   - Fetch explain traces for policy overlays, integrate VEX suppressions, align with Policy Engine & VEX Lens data models. | ||||
| - **Observability & Ops** | ||||
|   - Metrics (ingest lag, query latency, cache hit rate), logs/traces, dashboards, alerting for runaway queries and OOM. | ||||
|   - Runbooks for incident classes (query denial, cache poisoning, degraded render). | ||||
| - **Documentation** | ||||
|   - Maintain overview, API, query language, console guide, CLI reference, policy/VEX integration docs with compliance checklists. | ||||
|  | ||||
| ## Acceptance criteria | ||||
| - Graph Indexer ingests SBOM/advisory/VEX events deterministically with tenant isolation and append-only provenance. | ||||
| - Graph API serves search/query/path/diff/overlay endpoints within budgeted latency and enforces cost limits + RBAC. | ||||
| - Console explorer visualises topology, overlays, diffs, saved queries; CLI commands mirror functionality for automation. | ||||
| - Exports (GraphML/CSV/NDJSON) and Offline Kit bundles reproduce snapshots and overlays with signed manifests. | ||||
| - Observability dashboards/alerts detect ingest lag, query failures, cache churn, and memory pressure; runbooks guide remediation. | ||||
| - Policy/VEX overlays align with Policy Engine explain traces and VEX suppressions. | ||||
|  | ||||
| ## Risks & mitigations | ||||
| - **Graph scale/complexity:** adopt adjacency compression, cached overlays, streaming pagination, enforced query budgets. | ||||
| - **Tenant bleed:** strict tenant filters, fuzz tests, data masking, compliance reviews. | ||||
| - **Runaway queries/visualization:** cost planner, query timeout, UI hints, safe mode renders. | ||||
| - **Cache poisoning:** input validation, schema versioning, eviction policies. | ||||
| - **Offline parity gaps:** deterministic export pipeline, integration tests for Offline Kit import. | ||||
|  | ||||
| ## Test strategy | ||||
| - **Unit:** node/edge builders, identifier stability, overlay computations, query planner, diff engine. | ||||
| - **Integration:** end-to-end ingest + query flows across SBOM/advisory/VEX, saved query execution, diff exports. | ||||
| - **Performance:** large SBOM datasets, concurrency, memory profiling, WebGL rendering. | ||||
| - **Security:** tenant isolation tests, RBAC, query cost abuse. | ||||
| - **Offline:** export/import verification, manifest hashing, CLI replay. | ||||
|  | ||||
| ## Definition of done | ||||
| - All phases delivered with telemetry, documentation, runbooks, and Offline Kit parity. | ||||
| - Console/CLI parity validated; a11y review complete. | ||||
| - ./TASKS.md and ../../TASKS.md updated; README/architecture/plan kept current with imposed rule references. | ||||
							
								
								
									
										22
									
								
								docs/modules/notify/AGENTS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,22 @@ | ||||
| # Notify agent guide | ||||
|  | ||||
| ## Mission | ||||
| Notify evaluates operator-defined rules against platform events and dispatches channel-specific payloads with full auditability. | ||||
|  | ||||
| ## Key docs | ||||
| - [Module README](./README.md) | ||||
| - [Architecture](./architecture.md) | ||||
| - [Implementation plan](./implementation_plan.md) | ||||
| - [Task board](./TASKS.md) | ||||
|  | ||||
| ## How to get started | ||||
| 1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module. | ||||
| 2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED). | ||||
| 3. Read the architecture and README for domain context before editing code or docs. | ||||
| 4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan. | ||||
|  | ||||
| ## Guardrails | ||||
| - Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md). | ||||
| - Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts. | ||||
| - Keep Offline Kit parity in mind—document air-gapped workflows for any new feature. | ||||
| - Update runbooks/observability assets when operational characteristics change. | ||||
							
								
								
									
										35
									
								
								docs/modules/notify/README.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,35 @@ | ||||
| # StellaOps Notify | ||||
|  | ||||
| Notify evaluates operator-defined rules against platform events and dispatches channel-specific payloads with full auditability. | ||||
|  | ||||
| ## Responsibilities | ||||
| - Process event streams and apply tenant-scoped routing rules. | ||||
| - Render connector-specific payloads (email, Slack, Teams, webhook, custom). | ||||
| - Enforce throttling, digests, and delivery retries. | ||||
| - Surface delivery/audit data for UI and CLI consumers. | ||||
|  | ||||
| ## Key components | ||||
| - `StellaOps.Notify.WebService` (rules API + preview). | ||||
| - `StellaOps.Notify.Worker` (delivery engine). | ||||
| - Connector libraries under `StellaOps.Notify.Connectors.*`. | ||||
|  | ||||
| ## Integrations & dependencies | ||||
| - MongoDB for rule/channel storage. | ||||
| - Redis/NATS for delivery queues. | ||||
| - CLI/UI for authoring and monitoring notifications. | ||||
|  | ||||
| ## Operational notes | ||||
| - Schema fixtures in ./resources/schemas & ./resources/samples. | ||||
| - Connector-specific monitoring dashboards. | ||||
| - Offline runner guidance inside operations playbook. | ||||
|  | ||||
| ## Related resources | ||||
| - ./resources/schemas | ||||
| - ./resources/samples | ||||
|  | ||||
| ## Backlog references | ||||
| - NOTIFY-SVC-38..40 (Notify backlog) referenced in `docs/README.md`. | ||||
| - DOCS-NOTIFY updates tracked in ../../TASKS.md when available. | ||||
|  | ||||
| ## Epic alignment | ||||
| - **Epic 11 – Notifications Studio:** deliver notifications workspace, preview tooling, immutable delivery ledger, and tenant-scoped throttling/digest controls. | ||||
							
								
								
									
										9
									
								
								docs/modules/notify/TASKS.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,9 @@ | ||||
| # Task board — Notify | ||||
|  | ||||
| > Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable. | ||||
|  | ||||
| | ID | Status | Owner(s) | Description | Notes | | ||||
| |----|--------|----------|-------------|-------| | ||||
| | NOTIFY-DOCS-0001 | DOING (2025-10-29) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md | | ||||
| | NOTIFY-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md | | ||||
| | NOTIFY-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow | | ||||
							
								
								
									
										515
									
								
								docs/modules/notify/architecture.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| @@ -0,0 +1,515 @@ | ||||
| > **Scope.** Implementation‑ready architecture for **Notify** (aligned with Epic 11 – Notifications Studio): a rules‑driven, tenant‑aware notification service that consumes platform events (scan completed, report ready, rescan deltas, attestation logged, admission decisions, etc.), evaluates operator‑defined routing rules, renders **channel‑specific messages** (Slack/Teams/Email/Webhook), and delivers them **reliably** with idempotency, throttling, and digests. It is UI‑managed, auditable, and safe by default (no secrets leakage, no spam storms). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 0) Mission & boundaries | ||||
|  | ||||
| **Mission.** Convert **facts** from Stella Ops into **actionable, noise‑controlled** signals where teams already live (chat/email/webhooks), with **explainable** reasons and deep links to the UI. | ||||
|  | ||||
| **Boundaries.** | ||||
|  | ||||
| * Notify **does not make policy decisions** and **does not rescan**; it **consumes** events from Scanner/Scheduler/Vexer/Feedser/Attestor/Zastava and routes them. | ||||
| * Attachments are **links** (UI/attestation pages); Notify **does not** attach SBOMs or large blobs to messages. | ||||
| * Secrets for channels (Slack tokens, SMTP creds) are **referenced**, not stored raw in Mongo. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 1) Runtime shape & projects | ||||
|  | ||||
| ``` | ||||
| src/ | ||||
|  ├─ StellaOps.Notify.WebService/        # REST: rules/channels CRUD, test send, deliveries browse | ||||
|  ├─ StellaOps.Notify.Worker/            # consumers + evaluators + renderers + delivery workers | ||||
|  ├─ StellaOps.Notify.Connectors.* /     # channel plug-ins: Slack, Teams, Email, Webhook (v1) | ||||
|  │    └─ *.Tests/ | ||||
|  ├─ StellaOps.Notify.Engine/            # rules engine, templates, idempotency, digests, throttles | ||||
|  ├─ StellaOps.Notify.Models/            # DTOs (Rule, Channel, Event, Delivery, Template) | ||||
|  ├─ StellaOps.Notify.Storage.Mongo/     # rules, channels, deliveries, digests, locks | ||||
|  ├─ StellaOps.Notify.Queue/             # bus client (Redis Streams/NATS JetStream) | ||||
|  └─ StellaOps.Notify.Tests.*            # unit/integration/e2e | ||||
| ``` | ||||
|  | ||||
| **Deployables**: | ||||
|  | ||||
| * **Notify.WebService** (stateless API) | ||||
| * **Notify.Worker** (horizontal scale) | ||||
|  | ||||
| **Dependencies**: Authority (OpToks; DPoP/mTLS), MongoDB, Redis/NATS (bus), HTTP egress to Slack/Teams/Webhooks, SMTP relay for Email. | ||||
|  | ||||
| > **Configuration.** Notify.WebService bootstraps from `notify.yaml` (see `etc/notify.yaml.sample`). Use `storage.driver: mongo` with a production connection string; the optional `memory` driver exists only for tests. Authority settings follow the platform defaults—when running locally without Authority, set `authority.enabled: false` and supply `developmentSigningKey` so JWTs can be validated offline. | ||||
| > | ||||
| > `api.rateLimits` exposes token-bucket controls for delivery history queries and test-send previews (`deliveryHistory`, `testSend`). Default values allow generous browsing while preventing accidental bursts; operators can relax/tighten the buckets per deployment. | ||||
|  | ||||
| > **Plug-ins.** All channel connectors are packaged under `<baseDirectory>/plugins/notify`. The ordered load list must start with Slack/Teams before Email/Webhook so chat-first actions are registered deterministically for Offline Kit bundles: | ||||
| > | ||||
| > ```yaml | ||||
| > plugins: | ||||
| >   baseDirectory: "/var/opt/stellaops" | ||||
| >   directory: "plugins/notify" | ||||
| >   orderedPlugins: | ||||
| >     - StellaOps.Notify.Connectors.Slack | ||||
| >     - StellaOps.Notify.Connectors.Teams | ||||
| >     - StellaOps.Notify.Connectors.Email | ||||
| >     - StellaOps.Notify.Connectors.Webhook | ||||
| > ``` | ||||
| > | ||||
| > The Offline Kit job simply copies the `plugins/notify` tree into the air-gapped bundle; the ordered list keeps connector manifests stable across environments. | ||||
|  | ||||
| > **Authority clients.** Register two OAuth clients in StellaOps Authority: `notify-web-dev` (audience `notify.dev`) for development and `notify-web` (audience `notify`) for staging/production. Both require `notify.read` and `notify.admin` scopes and use DPoP-bound client credentials (`client_secret` in the samples). Reference entries live in `etc/authority.yaml.sample`, with placeholder secrets under `etc/secrets/notify-web*.secret.example`. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 2) Responsibilities | ||||
|  | ||||
| 1. **Ingest** platform events from internal bus with strong ordering per key (e.g., image digest). | ||||
| 2. **Evaluate rules** (tenant‑scoped) with matchers: severity changes, namespaces, repos, labels, KEV flags, provider provenance (VEX), component keys, admission decisions, etc. | ||||
| 3. **Control noise**: **throttle**, **coalesce** (digest windows), and **dedupe** via idempotency keys. | ||||
| 4. **Render** channel‑specific messages using safe templates; include **evidence** and **links**. | ||||
| 5. **Deliver** with retries/backoff; record outcome; expose delivery history to UI. | ||||
| 6. **Test** paths (send test to channel targets) without touching live rules. | ||||
| 7. **Audit**: log who configured what, when, and why a message was sent. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 3) Event model (inputs) | ||||
|  | ||||
| Notify subscribes to the **internal event bus** (produced by services, escaped JSON; gzip allowed with caps): | ||||
|  | ||||
| * `scanner.scan.completed` — new SBOM(s) composed; artifacts ready | ||||
| * `scanner.report.ready` — analysis verdict (policy+vex) available; carries deltas summary | ||||
| * `scheduler.rescan.delta` — new findings after Feedser/Vexer deltas (already summarized) | ||||
| * `attestor.logged` — Rekor UUID returned (sbom/report/vex export) | ||||
| * `zastava.admission` — admit/deny with reasons, namespace, image digests | ||||
| * `feedser.export.completed` — new export ready (rarely notified directly; usually drives Scheduler) | ||||
| * `vexer.export.completed` — new consensus snapshot (ditto) | ||||
|  | ||||
| **Canonical envelope (bus → Notify.Engine):** | ||||
|  | ||||
| ```json | ||||
| { | ||||
|   "eventId": "uuid", | ||||
|   "kind": "scanner.report.ready", | ||||
|   "tenant": "tenant-01", | ||||
|   "ts": "2025-10-18T05:41:22Z", | ||||
|   "actor": "scanner-webservice", | ||||
|   "scope": { "namespace":"payments", "repo":"ghcr.io/acme/api", "digest":"sha256:..." }, | ||||
|   "payload": { /* kind-specific fields, see below */ } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| **Examples (payload cores):** | ||||
|  | ||||
| * `scanner.report.ready`: | ||||
|  | ||||
|   ```json | ||||
|   { | ||||
|     "reportId": "report-3def...", | ||||
|     "verdict": "fail", | ||||
|     "summary": {"total": 12, "blocked": 2, "warned": 3, "ignored": 5, "quieted": 2}, | ||||
|     "delta": {"newCritical": 1, "kev": ["CVE-2025-..."]}, | ||||
|     "links": {"ui": "https://ui/.../reports/report-3def...", "rekor": "https://rekor/..."}, | ||||
|     "dsse": { "...": "..." }, | ||||
|     "report": { "...": "..." } | ||||
|   } | ||||
|   ``` | ||||
|  | ||||
|   Payload embeds both the canonical report document and the DSSE envelope so connectors, Notify, and UI tooling can reuse the signed bytes without re-serialising. | ||||
|  | ||||
| * `scanner.scan.completed`: | ||||
|  | ||||
|   ```json | ||||
|   { | ||||
|     "reportId": "report-3def...", | ||||
|     "digest": "sha256:...", | ||||
|     "verdict": "fail", | ||||
|     "summary": {"total": 12, "blocked": 2, "warned": 3, "ignored": 5, "quieted": 2}, | ||||
|     "delta": {"newCritical": 1, "kev": ["CVE-2025-..."]}, | ||||
|     "policy": {"revisionId": "rev-42", "digest": "27d2..."}, | ||||
|     "findings": [{"id": "finding-1", "severity": "Critical", "cve": "CVE-2025-...", "reachability": "runtime"}], | ||||
|     "dsse": { "...": "..." } | ||||
|   } | ||||
|   ``` | ||||
|  | ||||
| * `zastava.admission`: | ||||
|  | ||||
|   ```json | ||||
|   { "decision":"deny|allow", "reasons":["unsigned image","missing SBOM"], | ||||
|     "images":[{"digest":"sha256:...","signed":false,"hasSbom":false}] } | ||||
|   ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 4) Rules engine — semantics | ||||
|  | ||||
| **Rule shape (simplified):** | ||||
|  | ||||
| ```yaml | ||||
| name: "high-critical-alerts-prod" | ||||
| enabled: true | ||||
| match: | ||||
|   eventKinds: ["scanner.report.ready","scheduler.rescan.delta","zastava.admission"] | ||||
|   namespaces: ["prod-*"] | ||||
|   repos: ["ghcr.io/acme/*"] | ||||
|   minSeverity: "high"            # min of new findings (delta context) | ||||
|   kev: true                      # require KEV-tagged or allow any if false | ||||
|   verdict: ["fail","deny"]       # filter for report/admission | ||||
|   vex: | ||||
|     includeRejectedJustifications: false    # notify only on accepted 'affected' | ||||
| actions: | ||||
|   - channel: "slack:sec-alerts"  # reference to Channel object | ||||
|     template: "concise" | ||||
|     throttle: "5m" | ||||
|   - channel: "email:soc" | ||||
|     digest: "hourly" | ||||
|     template: "detailed" | ||||
| ``` | ||||
|  | ||||
| **Evaluation order** | ||||
|  | ||||
| 1. **Tenant check** → discard if rule tenant ≠ event tenant. | ||||
| 2. **Kind filter** → discard early. | ||||
| 3. **Scope match** (namespace/repo/labels). | ||||
| 4. **Delta/severity gates** (if event carries `delta`). | ||||
| 5. **VEX gate** (drop if event’s finding is not affected under policy consensus unless rule says otherwise). | ||||
| 6. **Throttling/dedup** (idempotency key) — skip if suppressed. | ||||
| 7. **Actions** → enqueue per‑channel job(s). | ||||
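
The first gates of the evaluation order can be sketched as a pure function. This is an illustration under stated assumptions, not the Notify.Engine implementation: `rule_matches` and `_globs_match` are hypothetical helpers, glob matching via `fnmatchcase` is an assumed interpretation of patterns such as `prod-*`, and only the KEV part of the delta gate is shown. The severity gate, VEX gate, throttling, and action enqueueing (steps 4 to 7) are left out.

```python
from fnmatch import fnmatchcase


def _globs_match(value, patterns):
    """True when no patterns are configured or any glob pattern matches."""
    return not patterns or any(fnmatchcase(value, p) for p in patterns)


def rule_matches(rule, event):
    """Apply the tenant, kind, scope, and KEV gates in order, failing fast."""
    match = rule["match"]
    if rule["tenantId"] != event["tenant"]:        # 1. tenant check
        return False
    if event["kind"] not in match["eventKinds"]:   # 2. kind filter
        return False
    scope = event["scope"]                         # 3. scope match
    if not _globs_match(scope["namespace"], match.get("namespaces")):
        return False
    if not _globs_match(scope["repo"], match.get("repos")):
        return False
    delta = event["payload"].get("delta")          # 4. delta gate (KEV only)
    if delta is not None and match.get("kev") and not delta.get("kev"):
        return False
    return True


# Hypothetical rule and event, shaped like the samples in this document.
RULE = {
    "tenantId": "tenant-01",
    "match": {
        "eventKinds": ["scanner.report.ready"],
        "namespaces": ["prod-*"],
        "repos": ["ghcr.io/acme/*"],
        "kev": True,
    },
}
EVENT = {
    "tenant": "tenant-01",
    "kind": "scanner.report.ready",
    "scope": {"namespace": "prod-eu", "repo": "ghcr.io/acme/api"},
    "payload": {"delta": {"newCritical": 1, "kev": ["CVE-2025-0001"]}},
}
```

Ordering the cheap checks (tenant, kind) first means most non-matching events are discarded before any pattern matching runs.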
|  | ||||
| **Idempotency key**: `hash(ruleId | actionId | event.kind | scope.digest | delta.hash | day-bucket)`; ensures “same alert” doesn’t fire more than once within throttle window. | ||||
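
A minimal sketch of the key derivation, assuming SHA-256 as the hash, `|` as the field separator, and a UTC calendar day as the day bucket (none of which the text above fixes). The `idempotency_key` function and its parameter names are hypothetical; the `idem:` prefix mirrors the key format shown for the `throttles` collection in the data model.

```python
import hashlib
from datetime import datetime, timezone


def idempotency_key(rule_id, action_id, kind, digest, delta_hash, ts):
    """Hash the key parts together with a UTC day bucket taken from the event time."""
    # Accept ISO-8601 timestamps with a trailing "Z" and normalise to UTC.
    moment = datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)
    day_bucket = moment.strftime("%Y-%m-%d")
    parts = [rule_id, action_id, kind, digest, delta_hash, day_bucket]
    return "idem:" + hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

Two events with the same rule, action, kind, digest, and delta hash on the same UTC day map to the same key, so a store lookup on that key is enough to suppress the duplicate.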
|  | ||||
| **Digest windows**: maintain per action a **coalescer**: | ||||
|  | ||||
| * Window: `5m|15m|1h|1d` (configurable); coalesces events by tenant + namespace/repo or by digest group. | ||||
| * Digest messages summarize top N items and counts, with safe truncation. | ||||
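
The coalescing behaviour can be sketched with an in-memory grouper. The `DigestCoalescer` class, its `top_n` parameter, and the summary field names are hypothetical; window timing is left to the caller, which would invoke `flush()` when the configured window elapses.

```python
from collections import defaultdict


class DigestCoalescer:
    """Group events per (tenant, namespace, repo) and flush a top-N summary."""

    def __init__(self, top_n=3):
        self.top_n = top_n
        self.open = defaultdict(list)  # group key -> events in the open window

    def add(self, event):
        scope = event["scope"]
        key = (event["tenant"], scope["namespace"], scope["repo"])
        self.open[key].append(event)

    def flush(self):
        """Close all open windows and return one summary per group."""
        summaries = []
        for key in sorted(self.open):
            events = self.open[key]
            summaries.append({
                "group": key,
                "count": len(events),
                "items": [e["eventId"] for e in events[: self.top_n]],
                # Number of events omitted from `items` by truncation.
                "truncated": max(0, len(events) - self.top_n),
            })
        self.open.clear()
        return summaries
```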
|  | ||||
| --- | ||||
|  | ||||
| ## 5) Channels & connectors (plug‑ins) | ||||
|  | ||||
| Channel config is **two‑part**: a **Channel** record (name, type, options) and a Secret **reference** (Vault/K8s Secret). Connectors are **restart-time plug-ins** discovered on service start (same manifest convention as Concelier/Excititor) and live under `plugins/notify/<channel>/`. | ||||
|  | ||||
| **Built‑in v1:** | ||||
|  | ||||
| * **Slack**: Bot token (xoxb‑…), `chat.postMessage` + `blocks`; rate limit aware (HTTP 429). | ||||
| * **Microsoft Teams**: Incoming Webhook (or Graph card later); adaptive card payloads. | ||||
| * **Email (SMTP)**: TLS (STARTTLS or implicit), From/To/CC/BCC; HTML+text alt; DKIM optional. | ||||
| * **Generic Webhook**: POST JSON with a signature in headers (HMAC‑SHA‑256 or Ed25519). | ||||
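
For the SHA-256 variant of the generic webhook signature, signing and verification could look like the sketch below. The `sha256=` value prefix and the function names are assumptions, and the header name that carries the value is not specified here; the connector's actual wire format may differ.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Return an HMAC-SHA-256 signature over the exact request body bytes."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_body(secret: bytes, body: bytes, header_value: str) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_body(secret, body), header_value)
```

Receivers should verify against the raw bytes they received, since re-serialising the JSON can change whitespace or key order and invalidate the signature.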
|  | ||||
| **Connector contract:** (implemented by plug-in assemblies) | ||||
|  | ||||
| ```csharp | ||||
| public interface INotifyConnector { | ||||
|   string Type { get; } // "slack" | "teams" | "email" | "webhook" | ... | ||||
|   Task<DeliveryResult> SendAsync(DeliveryContext ctx, CancellationToken ct); | ||||
|   Task<HealthResult> HealthAsync(ChannelConfig cfg, CancellationToken ct); | ||||
| } | ||||
| ``` | ||||
|  | ||||
| **DeliveryContext** includes **rendered content** and **raw event** for audit. | ||||
|  | ||||
| **Test-send previews.** Plug-ins can optionally implement `INotifyChannelTestProvider` to shape `/channels/{id}/test` responses. Providers receive a sanitised `ChannelTestPreviewContext` (channel, tenant, target, timestamp, trace) and return a `NotifyDeliveryRendered` preview + metadata. When no provider is present, the host falls back to a generic preview so the endpoint always responds. | ||||
|  | ||||
| **Secrets**: `ChannelConfig.secretRef` points to Authority‑managed secret handle or K8s Secret path; workers load at send-time; plug-in manifests (`notify-plugin.json`) declare capabilities and version. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 6) Templates & rendering | ||||
|  | ||||
| **Template engine**: strongly typed, safe Handlebars‑style; no arbitrary code. Partial templates per channel. Deterministic outputs (property order, no locale drift unless requested). | ||||
|  | ||||
| **Variables** (examples): | ||||
|  | ||||
| * `event.kind`, `event.ts`, `scope.namespace`, `scope.repo`, `scope.digest` | ||||
| * `payload.verdict`, `payload.delta.newCritical`, `payload.links.ui`, `payload.links.rekor` | ||||
| * `topFindings[]` with `purl`, `vulnId`, `severity` | ||||
| * `policy.name`, `policy.revision` (if available) | ||||
|  | ||||
| **Helpers**: | ||||
|  | ||||
| * `severity_icon(sev)`, `link(text,url)`, `pluralize(n, "finding")`, `truncate(text, n)`, `code(text)`. | ||||
|  | ||||
| **Channel mapping**: | ||||
|  | ||||
| * Slack: title + blocks, limited to 50 blocks/3000 chars per section; long lists → link to UI. | ||||
| * Teams: Adaptive Card schema 1.5; fallback text for older channels (surfaced as `teams.fallbackText` metadata alongside webhook hash). | ||||
| * Email: HTML + text; inline table of top N findings, rest behind UI link. | ||||
| * Webhook: JSON with `event`, `ruleId`, `actionId`, `summary`, `links`, and raw `payload` subset. | ||||
|  | ||||
| **i18n**: template set per locale (English default; Bulgarian built‑in). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 7) Data model (Mongo) | ||||
|  | ||||
| Canonical JSON Schemas for rules/channels/events live in `docs/modules/notify/resources/schemas/`. Sample payloads intended for tests/UI mock responses are captured in `docs/modules/notify/resources/samples/`. | ||||
|  | ||||
| **Database**: `notify` | ||||
|  | ||||
| * `rules` | ||||
|  | ||||
|   ``` | ||||
|   { _id, tenantId, name, enabled, match, actions, createdBy, updatedBy, createdAt, updatedAt } | ||||
|   ``` | ||||
|  | ||||
| * `channels` | ||||
|  | ||||
|   ``` | ||||
|   { _id, tenantId, name:"slack:sec-alerts", type:"slack", | ||||
|     config:{ webhookUrl?:"", channel:"#sec-alerts", workspace?: "...", secretRef:"ref://..." }, | ||||
|     createdAt, updatedAt } | ||||
|   ``` | ||||
|  | ||||
| * `deliveries` | ||||
|  | ||||
|   ``` | ||||
|   { _id, tenantId, ruleId, actionId, eventId, kind, scope, status:"sent|failed|throttled|digested|dropped", | ||||
|     attempts:[{ts, status, code, reason}], | ||||
|     rendered:{ title, body, target },    // redacted for PII; body hash stored | ||||
|     sentAt, lastError? } | ||||
|   ``` | ||||
|  | ||||
| * `digests` | ||||
|  | ||||
|   ``` | ||||
|   { _id, tenantId, actionKey, window:"hourly", openedAt, items:[{eventId, scope, delta}], status:"open|flushed" } | ||||
|   ``` | ||||
|  | ||||
| * `throttles` | ||||
|  | ||||
|   ``` | ||||
|   { key:"idem:<hash>", ttlAt }   // short-lived, also cached in Redis | ||||
|   ``` | ||||
|  | ||||
| **Indexes**: rules by `{tenantId, enabled}`, deliveries by `{tenantId, sentAt desc}`, digests by `{tenantId, actionKey}`. | ||||
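To illustrate how a `digests` document of the shape above might move from `open` to `flushed`, here is a minimal in-memory sketch. The `WINDOWS` mapping and the function names are assumptions made for illustration; the real service persists these documents in Mongo.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed mapping from window name to duration; only "hourly" appears in the docs.
WINDOWS = {"hourly": timedelta(hours=1)}


def new_digest(tenant_id: str, action_key: str, now: datetime, window: str = "hourly") -> dict:
    """Create an open digest document mirroring the `digests` collection shape."""
    return {
        "tenantId": tenant_id,
        "actionKey": action_key,
        "window": window,
        "openedAt": now,
        "items": [],
        "status": "open",
    }


def coalesce(digest: dict, item: dict, now: datetime) -> tuple[dict, Optional[dict]]:
    """Append an item to the open window, flushing first if the window has elapsed.

    Returns (currently open digest, flushed digest or None).
    """
    flushed = None
    if now - digest["openedAt"] >= WINDOWS[digest["window"]]:
        flushed = {**digest, "status": "flushed"}
        digest = new_digest(digest["tenantId"], digest["actionKey"], now, digest["window"])
    digest["items"].append(item)
    return digest, flushed
```

A flushed digest would then be handed to the renderer, while the new open digest keeps collecting items for the same `actionKey`.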
|  | ||||
| --- | ||||
|  | ||||
| ## 8) External APIs (WebService) | ||||
|  | ||||
| Base path: `/api/v1/notify` (Authority OpToks; scopes: `notify.admin` for write, `notify.read` for view). | ||||
|  | ||||
| *All* REST calls require the tenant header `X-StellaOps-Tenant` (matches the canonical `tenantId` stored in Mongo). Payloads are normalised via `NotifySchemaMigration` before persistence to guarantee schema version pinning. | ||||
|  | ||||
| Authentication today is stubbed with Bearer tokens (`Authorization: Bearer <token>`). When Authority wiring lands, this will switch to OpTok validation + scope enforcement, but the header contract will remain the same. | ||||
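As a rough illustration of the header contract described above, the sketch below builds (but does not send) a request carrying the Bearer token and tenant header. The helper name `build_notify_request` and the host used in the usage note are hypothetical.

```python
import urllib.request


def build_notify_request(base_url: str, path: str, tenant: str, token: str,
                         method: str = "GET") -> urllib.request.Request:
    """Build a request against the Notify base path with the required headers.

    Hypothetical helper; only the base path and header names come from the docs.
    """
    url = f"{base_url.rstrip('/')}/api/v1/notify{path}"
    request = urllib.request.Request(url, method=method)
    request.add_header("Authorization", f"Bearer {token}")
    # urllib normalises header capitalisation; HTTP header names are case-insensitive.
    request.add_header("X-StellaOps-Tenant", tenant)
    request.add_header("Accept", "application/json")
    return request
```

For example, `build_notify_request("https://notify.example.internal", "/channels", "tenant-01", token)` yields a request that could be passed to `urllib.request.urlopen`.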
|  | ||||
| Service configuration exposes `notify:auth:*` keys (issuer, audience, signing key, scope names) so operators can wire the Authority JWKS or (in dev) a symmetric test key. `notify:storage:*` keys cover Mongo URI/database/collection overrides. Both sets are required for the new API surface. | ||||
|  | ||||
| Internal tooling can hit `/internal/notify/<entity>/normalize` to upgrade legacy JSON and return canonical output used in the docs fixtures. | ||||
|  | ||||
| * **Channels** | ||||
|  | ||||
|   * `POST /channels` | `GET /channels` | `GET /channels/{id}` | `PATCH /channels/{id}` | `DELETE /channels/{id}` | ||||
|   * `POST /channels/{id}/test` → send sample message (no rule evaluation); returns `202 Accepted` with rendered preview + metadata (base keys: `channelType`, `target`, `previewProvider`, `traceId` + connector-specific entries); governed by `api.rateLimits:testSend`. | ||||
|   * `GET /channels/{id}/health` → connector self‑check (returns redacted metadata: secret refs hashed, sensitive config keys masked, fallbacks noted via `teams.fallbackText`/`teams.validation.*`) | ||||
|  | ||||
| * **Rules** | ||||
|  | ||||
|   * `POST /rules` | `GET /rules` | `GET /rules/{id}` | `PATCH /rules/{id}` | `DELETE /rules/{id}` | ||||
|   * `POST /rules/{id}/test` → dry‑run rule against a **sample event** (no delivery unless `--send`) | ||||
|  | ||||
| * **Deliveries** | ||||
|  | ||||
|   * `POST /deliveries` → ingest worker delivery state (idempotent via `deliveryId`). | ||||
|   * `GET /deliveries?since=...&status=...&limit=...` → list envelope `{ items, count, continuationToken }` (most recent first); base metadata keys match the test-send response (`channelType`, `target`, `previewProvider`, `traceId`); rate-limited via `api.rateLimits.deliveryHistory`. See `docs/modules/notify/resources/samples/notify-delivery-list-response.sample.json`. | ||||
|   * `GET /deliveries/{id}` → detail (redacted body + metadata) | ||||
|   * `POST /deliveries/{id}/retry` → force retry (admin, future sprint) | ||||
|  | ||||
| * **Admin** | ||||
|  | ||||
|   * `GET /stats` (per tenant counts, last hour/day) | ||||
|   * `GET /healthz|readyz` (liveness and readiness probes) | ||||
|   * `POST /locks/acquire` | `POST /locks/release` – worker coordination primitives (short TTL). | ||||
|   * `POST /digests` | `GET /digests/{actionKey}` | `DELETE /digests/{actionKey}` – manage open digest windows. | ||||
|   * `POST /audit` | `GET /audit?since=&limit=` – append/query structured audit trail entries. | ||||
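The delivery list envelope `{ items, count, continuationToken }` lends itself to a simple paging loop. The sketch below assumes a caller-supplied `fetch_page` callable, because the docs do not say how the token is passed back to the API (query parameter name or header); that part is deliberately left abstract.

```python
from typing import Callable, Iterator, Optional


def iter_deliveries(fetch_page: Callable[[Optional[str]], dict],
                    max_pages: int = 100) -> Iterator[dict]:
    """Yield deliveries across pages until no continuation token is returned.

    `fetch_page(token)` is a hypothetical callable that performs one
    `GET /deliveries` call and returns the parsed list envelope.
    """
    token: Optional[str] = None
    for _ in range(max_pages):  # hard stop so a misbehaving server cannot loop forever
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("continuationToken")
        if not token:
            break
```

Because results are listed most recent first, a consumer that only needs new deliveries can stop iterating as soon as it sees a known `deliveryId`.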
|  | ||||
| **Ingestion**: workers do **not** expose public ingestion; they **subscribe** to the internal bus. (Optional `/events/test` for integration testing, admin‑only.) | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 9) Delivery pipeline (worker) | ||||
|  | ||||
| ``` | ||||
| [Event bus] → [Ingestor] → [RuleMatcher] → [Throttle/Dedupe] → [DigestCoalescer] → [Renderer] → [Connector] → [Result] | ||||
|                                                  └────────→ [DeliveryStore] | ||||
| ``` | ||||
|  | ||||
| * **Ingestor**: N consumers with per‑key ordering (key = tenant|digest|namespace). | ||||
| * **RuleMatcher**: loads active rules snapshot for tenant into memory; vectorized predicate check. | ||||
| * **Throttle/Dedupe**: consult Redis + Mongo `throttles`; if hit → record `status=throttled`. | ||||
| * **DigestCoalescer**: append to open digest window or flush when timer expires. | ||||
| * **Renderer**: select template (channel+locale), inject variables, enforce length limits, compute `bodyHash`. | ||||
| * **Connector**: send; handle provider‑specific rate limits and backoffs; `maxAttempts` with exponential jitter; overflow → DLQ (dead‑letter topic) + UI surfacing. | ||||
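The RuleMatcher stage can be pictured as a pure predicate over a rule and an event. The sketch below covers only three of the `match` fields from the sample rule (`eventKinds`, `namespaces`, `kevOnly`) and assumes that an empty list means "match anything" and that namespace patterns are shell-style globs; both are assumptions, not documented semantics.

```python
from fnmatch import fnmatchcase


def rule_matches(rule: dict, event: dict) -> bool:
    """Return True when the event satisfies the (partial) rule predicate.

    Illustrative subset: eventKinds, namespaces (glob), kevOnly.
    """
    if not rule.get("enabled", True):
        return False
    match = rule.get("match", {})

    kinds = match.get("eventKinds") or []
    if kinds and event.get("kind") not in kinds:
        return False

    namespaces = match.get("namespaces") or []
    namespace = event.get("scope", {}).get("namespace", "")
    if namespaces and not any(fnmatchcase(namespace, pattern) for pattern in namespaces):
        return False

    # kevOnly: require at least one KEV-listed CVE in the event delta.
    kev = event.get("payload", {}).get("delta", {}).get("kev") or []
    if match.get("kevOnly") and not kev:
        return False
    return True
```

Keeping the predicate free of I/O makes it easy to evaluate against an in-memory rules snapshot, as the stage description suggests.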
|  | ||||
| **Idempotency**: a per‑action **idempotency key** is stored in Redis (TTL = `throttle window` or `digest window`). Connectors also respect **provider** idempotency where available (e.g., Slack `client_msg_id`). | ||||
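One way to picture the idempotency check is a hashed key plus a claim-with-TTL operation. In the sketch below, the choice of key material (tenant, rule, action, event) is an assumption, and `TtlKeyStore` is an in-memory stand-in for the Redis-backed store, used only to show the claim semantics.

```python
import hashlib
import time
from typing import Callable


def idempotency_key(tenant_id: str, rule_id: str, action_id: str, event_id: str) -> str:
    """Derive an `idem:<hash>` key; the fields hashed here are an assumption."""
    material = "\x1f".join([tenant_id, rule_id, action_id, event_id])
    return "idem:" + hashlib.sha256(material.encode("utf-8")).hexdigest()


class TtlKeyStore:
    """In-memory stand-in for a TTL key store (hypothetical, for illustration)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expiry: dict[str, float] = {}

    def try_claim(self, key: str, ttl_seconds: float) -> bool:
        """Return True if the key was free (first delivery), False if still held."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False
        self._expiry[key] = now + ttl_seconds
        return True
```

A worker would send only when `try_claim` returns `True` and record `status=throttled` otherwise.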
|  | ||||
| --- | ||||
|  | ||||
| ## 10) Reliability & rate controls | ||||
|  | ||||
| * **Per‑tenant** RPM caps (default 600/min) + **per‑channel** concurrency (Slack 1–4, Teams 1–2, Email 8–32 based on relay). | ||||
| * **Backoff** map: Slack 429 → respect `Retry‑After`; SMTP 4xx → retry; HTTP 5xx → retry with jitter; permanent rejects → drop with status recorded. | ||||
| * **DLQ**: NATS/Redis stream `notify.dlq` with `{event, rule, action, error}` for operator inspection; UI shows DLQ items. | ||||
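The backoff rules above can be sketched as a single delay function: honour `Retry-After` when the provider sends it, otherwise fall back to capped exponential backoff with jitter. The base, cap, and "full jitter" strategy are assumptions chosen for illustration; the docs only say "exponential jitter".

```python
import random
from typing import Callable, Optional


def next_delay(attempt: int,
               retry_after: Optional[float] = None,
               base: float = 1.0,
               cap: float = 60.0,
               rng: Callable[[], float] = random.random) -> float:
    """Return the number of seconds to wait before retry number `attempt` (0-based).

    Assumed policy: provider-supplied Retry-After wins; otherwise full jitter
    over a capped exponential ceiling.
    """
    if retry_after is not None:
        return max(0.0, retry_after)
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

Once `attempt` reaches the configured `maxAttempts`, the delivery would be routed to the DLQ instead of being rescheduled.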
|  | ||||
| --- | ||||
|  | ||||
| ## 11) Security & privacy | ||||
|  | ||||
| * **AuthZ**: all APIs require **Authority** OpToks; actions scoped by tenant. | ||||
| * **Secrets**: `secretRef` only; Notify fetches just‑in‑time from Authority Secret proxy or K8s Secret (mounted). No plaintext secrets in Mongo. | ||||
| * **Egress TLS**: validate server certificates; pin domains per channel config; optional CA bundle override for on‑prem SMTP. | ||||
| * **Webhook signing**: HMAC or Ed25519 signatures in `X-StellaOps-Signature` + replay‑window timestamp; include canonical body hash in header. | ||||
| * **Redaction**: deliveries store **hashes** of bodies, not full payloads for chat/email to minimize PII retention (configurable). | ||||
| * **Quiet hours**: per tenant (e.g., 22:00–06:00) route high‑sev only; defer others to digests. | ||||
| * **Loop prevention**: Webhook target allowlist + event origin tags; do not ingest own webhooks. | ||||
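The webhook signing bullet can be illustrated with the HMAC variant. The header layout used below (`t=<timestamp>,sha256=<body hash>,v1=<signature>`) and the 300-second replay window are assumptions made for this sketch; the docs only state that the header carries a signature, a replay-window timestamp, and the canonical body hash.

```python
import hashlib
import hmac


def sign_webhook(secret: bytes, body: bytes, timestamp: int) -> dict[str, str]:
    """Return the signature header for a webhook body (assumed header layout)."""
    body_hash = hashlib.sha256(body).hexdigest()
    message = f"{timestamp}.{body_hash}".encode("utf-8")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {"X-StellaOps-Signature": f"t={timestamp},sha256={body_hash},v1={signature}"}


def verify_webhook(secret: bytes, body: bytes, header_value: str,
                   now: int, tolerance_seconds: int = 300) -> bool:
    """Check the replay window, then compare signatures in constant time."""
    try:
        fields = dict(part.split("=", 1) for part in header_value.split(","))
        timestamp = int(fields["t"])
    except (KeyError, ValueError):
        return False  # malformed header
    if abs(now - timestamp) > tolerance_seconds:
        return False  # outside the replay window
    expected = sign_webhook(secret, body, timestamp)["X-StellaOps-Signature"]
    return hmac.compare_digest(expected, header_value)
```

Receivers recompute the header from the raw request body, so any change to the body or the timestamp invalidates the signature.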
|  | ||||
| --- | ||||
|  | ||||
| ## 12) Observability (Prometheus + OTEL) | ||||
|  | ||||
| * `notify.events_consumed_total{kind}` | ||||
| * `notify.rules_matched_total{ruleId}` | ||||
| * `notify.throttled_total{reason}` | ||||
| * `notify.digest_coalesced_total{window}` | ||||
| * `notify.sent_total{channel}` / `notify.failed_total{channel,code}` | ||||
| * `notify.delivery_latency_seconds{channel}` (end‑to‑end) | ||||
| * **Tracing**: spans `ingest`, `match`, `render`, `send`; correlation id = `eventId`. | ||||
|  | ||||
| **SLO targets** | ||||
|  | ||||
| * Event→delivery p95 **≤ 30–60 s** under nominal load. | ||||
| * Failure rate p95 **< 0.5%** per hour (excluding provider outages). | ||||
| * Duplicate rate **≈ 0** (idempotency working). | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 13) Configuration (YAML) | ||||
|  | ||||
| ```yaml | ||||
| notify: | ||||
|   authority: | ||||
|     issuer: "https://authority.internal" | ||||
|     require: "dpop"               # or "mtls" | ||||
|   bus: | ||||
|     kind: "redis"                 # or "nats" | ||||
|     streams: | ||||
|       - "scanner.events" | ||||
|       - "scheduler.events" | ||||
|       - "attestor.events" | ||||
|       - "zastava.events" | ||||
|   mongo: | ||||
|     uri: "mongodb://mongo/notify" | ||||
|   limits: | ||||
|     perTenantRpm: 600 | ||||
|     perChannel: | ||||
|       slack:   { concurrency: 2 } | ||||
|       teams:   { concurrency: 1 } | ||||
|       email:   { concurrency: 8 } | ||||
|       webhook: { concurrency: 8 } | ||||
|   digests: | ||||
|     defaultWindow: "1h" | ||||
|     maxItems: 100 | ||||
|   quietHours: | ||||
|     enabled: true | ||||
|     window: "22:00-06:00" | ||||
|     minSeverity: "critical" | ||||
|   webhooks: | ||||
|     sign: | ||||
|       method: "ed25519"           # or "hmac-sha256" | ||||
|       keyRef: "ref://notify/webhook-sign-key" | ||||
| ``` | ||||
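The `quietHours` block above uses a window that wraps past midnight (`22:00-06:00`), which is easy to get wrong. The following sketch shows one way to evaluate it; the severity ordering in `SEVERITY_ORDER` and the function names are assumptions, and time zone handling is left out.

```python
from datetime import time

# Assumed ordering from least to most severe; not specified in the docs.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def in_quiet_hours(now: time, window: str) -> bool:
    """Return True if `now` falls inside an "HH:MM-HH:MM" window (end exclusive)."""
    start_text, end_text = window.split("-")
    start, end = time.fromisoformat(start_text), time.fromisoformat(end_text)
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00-06:00.
    return now >= start or now < end


def should_defer(now: time, severity: str,
                 window: str = "22:00-06:00", min_severity: str = "critical") -> bool:
    """Defer to a digest when inside quiet hours and below the severity floor."""
    if not in_quiet_hours(now, window):
        return False
    return SEVERITY_ORDER.index(severity) < SEVERITY_ORDER.index(min_severity)
```

With the configuration shown, a `high` finding at 23:00 would be deferred, while a `critical` one would still be routed immediately.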
|  | ||||
| --- | ||||
|  | ||||
| ## 14) UI touch‑points | ||||
|  | ||||
| * **Notifications → Channels**: add Slack/Teams/Email/Webhook; run **health**; rotate secrets. | ||||
| * **Notifications → Rules**: create/edit YAML rules with linting; test with sample events; see match rate. | ||||
| * **Notifications → Deliveries**: timeline with filters (status, channel, rule); inspect last error; retry. | ||||
| * **Digest preview**: shows current window contents and when it will flush. | ||||
| * **Quiet hours**: configure per tenant; show overrides. | ||||
| * **DLQ**: browse dead‑letters; requeue after fix. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 15) Failure modes & responses | ||||
|  | ||||
| | Condition                           | Behavior                                                                              | | ||||
| | ----------------------------------- | ------------------------------------------------------------------------------------- | | ||||
| | Slack 429 / Teams 429               | Respect `Retry‑After`, backoff with jitter, reduce concurrency                        | | ||||
| | SMTP transient 4xx                  | Retry up to `maxAttempts`; escalate to DLQ on exhaust                                 | | ||||
| | Invalid channel secret              | Mark channel unhealthy; suppress sends; surface in UI                                 | | ||||
| | Rule explosion (matches everything) | Safety valve: per‑tenant RPM caps; auto‑pause rule after X drops; UI alert            | | ||||
| | Bus outage                          | Buffer to local queue (bounded); resume consuming when healthy                        | | ||||
| | Mongo slowness                      | Fall back to Redis throttles; batch write deliveries; shed low‑priority notifications | | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 16) Testing matrix | ||||
|  | ||||
| * **Unit**: matchers, throttle math, digest coalescing, idempotency keys, template rendering edge cases. | ||||
| * **Connectors**: provider‑level rate limits, payload size truncation, error mapping. | ||||
| * **Integration**: synthetic event storm (10k/min), ensure p95 latency & duplicate rate. | ||||
| * **Security**: DPoP/mTLS on APIs; secretRef resolution; webhook signing & replay windows. | ||||
| * **i18n**: localized templates render deterministically. | ||||
| * **Chaos**: Slack/Teams API flaps; SMTP greylisting; Redis hiccups; ensure graceful degradation. | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 17) Sequences (representative) | ||||
|  | ||||
| **A) New criticals after Feedser delta (Slack immediate + Email hourly digest)** | ||||
|  | ||||
| ```mermaid | ||||
| sequenceDiagram | ||||
|   autonumber | ||||
|   participant SCH as Scheduler | ||||
|   participant NO as Notify.Worker | ||||
|   participant SL as Slack | ||||
|   participant SMTP as Email | ||||
|  | ||||
|   SCH->>NO: bus event scheduler.rescan.delta { newCritical:1, digest:sha256:... } | ||||
|   NO->>NO: match rules (Slack immediate; Email hourly digest) | ||||
|   NO->>SL: chat.postMessage (concise) | ||||
|   SL-->>NO: 200 OK | ||||
|   NO->>NO: append to digest window (email:soc) | ||||
|   Note over NO: At window close → render digest email | ||||
|   NO->>SMTP: send email (detailed digest) | ||||
|   SMTP-->>NO: 250 OK | ||||
| ``` | ||||
|  | ||||
| **B) Admission deny (Teams card + Webhook)** | ||||
|  | ||||
| ```mermaid | ||||
| sequenceDiagram | ||||
|   autonumber | ||||
|   participant ZA as Zastava | ||||
|   participant NO as Notify.Worker | ||||
|   participant TE as Teams | ||||
|   participant WH as Webhook | ||||
|  | ||||
|   ZA->>NO: bus event zastava.admission { decision: "deny", reasons: [...] } | ||||
|   NO->>TE: POST adaptive card | ||||
|   TE-->>NO: 200 OK | ||||
|   NO->>WH: POST JSON (signed) | ||||
|   WH-->>NO: 2xx | ||||
| ``` | ||||
|  | ||||
| --- | ||||
|  | ||||
| ## 18) Implementation notes | ||||
|  | ||||
| * **Language**: .NET 10; minimal API; `System.Text.Json` with canonical writer for body hashing; Channels for pipelines. | ||||
| * **Bus**: Redis Streams (**XGROUP** consumers) or NATS JetStream for at‑least‑once with ack; per‑tenant consumer groups to localize backpressure. | ||||
| * **Templates**: compile and cache per rule+channel+locale; version with rule `updatedAt` to invalidate. | ||||
| * **Rules**: store raw YAML + parsed AST; validate with schema + static checks (e.g., nonsensical combos). | ||||
| * **Secrets**: pluggable secret resolver (Authority Secret proxy, K8s, Vault). | ||||
| * **Rate limiting**: `System.Threading.RateLimiting` + per‑connector adapters. | ||||
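The body hashing mentioned under **Language** depends on a canonical serialisation, so that logically equal payloads always hash to the same value. The service itself uses `System.Text.Json`; the Python sketch below only illustrates the idea, and its canonical form (sorted keys, compact separators, UTF-8) is an assumption rather than the documented writer behaviour.

```python
import hashlib
import json
from typing import Any


def canonical_json(value: Any) -> bytes:
    """Serialise with sorted keys and no insignificant whitespace (assumed form)."""
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def body_hash(value: Any) -> str:
    """Return the hex SHA-256 of the canonical serialisation."""
    return hashlib.sha256(canonical_json(value)).hexdigest()
```

Key order in the input no longer affects the result, which is what makes a stored `bodyHash` comparable across renders.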
|  | ||||
| --- | ||||
|  | ||||
| ## 19) Roadmap (post‑v1) | ||||
|  | ||||
| * **PagerDuty/Opsgenie** connectors; **Jira** ticket creation. | ||||
| * **User inbox** (in‑app notifications) + mobile push via webhook relay. | ||||
| * **Anomaly suppression**: auto‑pause noisy rules with hints (learned thresholds). | ||||
| * **Graph rules**: “only notify if *not_affected → affected* transition at consensus layer”. | ||||
| * **Label enrichment**: pluggable taggers (business criticality, data classification) to refine matchers. | ||||
							
								
								
									
										61
									
								
								docs/modules/notify/implementation_plan.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,61 @@ | ||||
| # Implementation plan — Notify | ||||
|  | ||||
| ## Delivery phases | ||||
| - **Phase 1 – Core rules engine & delivery ledger**   | ||||
|   Implement rules/channels schema, event ingestion, rule evaluation, idempotent deliveries, and audit logging. | ||||
| - **Phase 2 – Connectors & rendering**   | ||||
|   Ship Slack/Teams/Email/Webhook connectors, template rendering, localization, throttling, retries, and secret referencing. | ||||
| - **Phase 3 – Console & CLI authoring**   | ||||
|   Provide UI/CLI for rule authoring, previews, channel health, delivery browsing, digests, and test sends. | ||||
| - **Phase 4 – Governance & observability**   | ||||
|   Add approvals, RBAC, tenant quotas, Notify metrics/logs/traces, dashboards, Notify-specific alerts, and Notify runbooks. | ||||
| - **Phase 5 – Offline & compliance**   | ||||
|   Produce Offline Kit bundles (rules/channels/deploy scripts), signed exports, retention policies, and auditing for regulated environments. | ||||
|  | ||||
| ## Work breakdown | ||||
| - **Service & worker** | ||||
|   - REST API for rules/channels/delivery history, idempotency middleware, digest scheduler. | ||||
|   - Worker pipelines for event intake, rule matching, template rendering, delivery execution, retries, and throttling. | ||||
|   - Delivery ledger capturing payload metadata, response, retry state, DSSE signatures. | ||||
| - **Connectors** | ||||
|   - Slack/Teams/Email/Webhook plug-ins with configuration validation, rate limiting, error classification. | ||||
|   - Secrets referenced via Authority/Secret store; no plaintext storage. | ||||
| - **Console & CLI** | ||||
|   - Console module for rules builder, condition editor, preview, test send, delivery insights, digests and schedule configuration. | ||||
|   - CLI (`stella notify rule|channel|delivery`) for automation, export/import. | ||||
| - **Integrations** | ||||
|   - Event sources: Concelier, Excititor, Policy Engine, Vuln Explorer, Export Center, Attestor, Zastava, Scheduler. | ||||
|   - Notify's own events fed back into Notify (meta-notifications) for failure escalations and accepted-risk expiration reminders. | ||||
| - **Observability & ops** | ||||
|   - Metrics: delivery success/failure, retry counts, throttle hits, digest generation, channel health. | ||||
|   - Logs/traces with tenant, rule ID, channel, correlation ID; dashboards and alerts. | ||||
|   - Runbooks for misconfigured channels, throttling, event backlog, incident digest. | ||||
| - **Docs & compliance** | ||||
|   - Update Notifications Studio guides, channel runbooks, security/RBAC docs, Offline Kit instructions. | ||||
|   - Provide compliance checklist (audit logging, retention, opt-out). | ||||
|  | ||||
| ## Acceptance criteria | ||||
| - Rules evaluate deterministically per event; deliveries idempotent with audit trail and DSSE signatures. | ||||
| - Channel connectors support retries, rate limits, health checks, previews; secrets referenced securely. | ||||
| - Console/CLI support rule creation, testing, digests, delivery browsing, and export/import workflows. | ||||
| - Observability dashboards track delivery health; alerts fire for sustained failures or backlog; runbooks cover remediation. | ||||
| - Offline Kit bundle contains configs, rules, digests, and deployment scripts for air-gapped installs. | ||||
| - Notify respects tenancy and RBAC; governance (approvals, change log) enforced for high-impact rules. | ||||
|  | ||||
| ## Risks & mitigations | ||||
| - **Notification storms:** throttling, digests, dedupe windows, preview/test gating. | ||||
| - **Secret compromise:** secret references only, rotation workflows, audit logging. | ||||
| - **Connector API changes:** versioned adapter layer, nightly health checks, fallback channels. | ||||
| - **Noise vs signal:** simulation previews, metrics, rule scoring, recommended defaults. | ||||
| - **Offline parity:** export/import of rules, connectors, and digests with signed manifests. | ||||
|  | ||||
| ## Test strategy | ||||
| - **Unit:** rule evaluation, template rendering, connector clients, throttling, digests. | ||||
| - **Integration:** end-to-end events from core services, multi-channel routing, retries, audit logging. | ||||
| - **Performance:** burst throttling, digest creation, large rule sets. | ||||
| - **Security:** RBAC tests, tenant isolation, secret reference validation, DSSE signature verification. | ||||
| - **Offline:** export/import round-trips, Offline Kit deployment, manual delivery replay. | ||||
|  | ||||
| ## Definition of done | ||||
| - Notify service, workers, connectors, Console/CLI, observability, and Offline Kit assets shipped with documentation and runbooks. | ||||
| - Compliance checklist appended to docs; ./TASKS.md and ../../TASKS.md updated with progress. | ||||
| @@ -0,0 +1,32 @@ | ||||
| { | ||||
|   "schemaVersion": "notify.channel@1", | ||||
|   "channelId": "channel-slack-sec-ops", | ||||
|   "tenantId": "tenant-01", | ||||
|   "name": "slack:sec-ops", | ||||
|   "type": "slack", | ||||
|   "displayName": "SecOps Slack", | ||||
|   "description": "Primary incident response channel.", | ||||
|   "config": { | ||||
|     "secretRef": "ref://notify/channels/slack/sec-ops", | ||||
|     "target": "#sec-ops", | ||||
|     "properties": { | ||||
|       "workspace": "stellaops-sec" | ||||
|     }, | ||||
|     "limits": { | ||||
|       "concurrency": 2, | ||||
|       "requestsPerMinute": 60, | ||||
|       "timeout": "PT10S" | ||||
|     } | ||||
|   }, | ||||
|   "enabled": true, | ||||
|   "labels": { | ||||
|     "team": "secops" | ||||
|   }, | ||||
|   "metadata": { | ||||
|     "createdByTask": "NOTIFY-MODELS-15-102" | ||||
|   }, | ||||
|   "createdBy": "ops:amir", | ||||
|   "createdAt": "2025-10-18T17:02:11+00:00", | ||||
|   "updatedBy": "ops:amir", | ||||
|   "updatedAt": "2025-10-18T17:45:00+00:00" | ||||
| } | ||||
| @@ -0,0 +1,46 @@ | ||||
| { | ||||
|   "items": [ | ||||
|     { | ||||
|       "deliveryId": "delivery-7f3b6c51", | ||||
|       "tenantId": "tenant-acme", | ||||
|       "ruleId": "rule-critical-slack", | ||||
|       "actionId": "slack-secops", | ||||
|       "eventId": "4f6e9c09-01b4-4c2a-8a57-3d06de182d74", | ||||
|       "kind": "scanner.report.ready", | ||||
|       "status": "Sent", | ||||
|       "statusReason": null, | ||||
|       "rendered": { | ||||
|         "channelType": "Slack", | ||||
|         "format": "Slack", | ||||
|         "target": "#sec-alerts", | ||||
|         "title": "Critical findings detected", | ||||
|         "body": "{\"text\":\"Critical findings detected\",\"blocks\":[{\"type\":\"section\",\"text\":{\"type\":\"mrkdwn\",\"text\":\"*Critical findings detected*\\n1 new critical finding across 2 images.\"}},{\"type\":\"context\",\"elements\":[{\"type\":\"mrkdwn\",\"text\":\"Preview generated 2025-10-19T16:23:41.889Z · Trace `trace-58c212`\"}]}]}", | ||||
|         "summary": "1 new critical finding across 2 images.", | ||||
|         "textBody": "1 new critical finding across 2 images.\nTrace: trace-58c212", | ||||
|         "locale": "en-us", | ||||
|         "bodyHash": "febf9b2a630d862b07f4390edfbf31f5e8b836529f5232c491f4b3f6dba4a4b2", | ||||
|         "attachments": [] | ||||
|       }, | ||||
|       "attempts": [ | ||||
|         { | ||||
|           "timestamp": "2025-10-19T16:23:42.112Z", | ||||
|           "status": "Succeeded", | ||||
|           "statusCode": 200, | ||||
|           "reason": null | ||||
|         } | ||||
|       ], | ||||
|       "metadata": { | ||||
|         "channelType": "slack", | ||||
|         "target": "#sec-alerts", | ||||
|         "previewProvider": "fallback", | ||||
|         "traceId": "trace-58c212", | ||||
|         "slack.channel": "#sec-alerts" | ||||
|       }, | ||||
|       "createdAt": "2025-10-19T16:23:41.889Z", | ||||
|       "sentAt": "2025-10-19T16:23:42.101Z", | ||||
|       "completedAt": "2025-10-19T16:23:42.112Z" | ||||
|     } | ||||
|   ], | ||||
|   "count": 1, | ||||
|   "continuationToken": "2025-10-19T16:23:41.889Z|tenant-acme:delivery-7f3b6c51" | ||||
| } | ||||
| @@ -0,0 +1,34 @@ | ||||
| { | ||||
|   "eventId": "8a8d6a2f-9315-49fe-9d52-8fec79ec7aeb", | ||||
|   "kind": "scanner.report.ready", | ||||
|   "version": "1", | ||||
|   "tenant": "tenant-01", | ||||
|   "ts": "2025-10-19T03:58:42+00:00", | ||||
|   "actor": "scanner-webservice", | ||||
|   "scope": { | ||||
|     "namespace": "prod-payment", | ||||
|     "repo": "ghcr.io/acme/api", | ||||
|     "digest": "sha256:79c1f9e5...", | ||||
|     "labels": { | ||||
|       "environment": "production" | ||||
|     }, | ||||
|     "attributes": {} | ||||
|   }, | ||||
|   "payload": { | ||||
|     "delta": { | ||||
|       "kev": [ | ||||
|         "CVE-2025-40123" | ||||
|       ], | ||||
|       "newCritical": 1, | ||||
|       "newHigh": 2 | ||||
|     }, | ||||
|     "links": { | ||||
|       "rekor": "https://rekor.stella.local/api/v1/log/entries/1", | ||||
|       "ui": "https://ui.stella.local/reports/sha256-79c1f9e5" | ||||
|     }, | ||||
|     "verdict": "fail" | ||||
|   }, | ||||
|   "attributes": { | ||||
|     "correlationId": "scan-23a6" | ||||
|   } | ||||
| } | ||||
| @@ -0,0 +1,63 @@ | ||||
| { | ||||
|   "schemaVersion": "notify.rule@1", | ||||
|   "ruleId": "rule-secops-critical", | ||||
|   "tenantId": "tenant-01", | ||||
|   "name": "Critical digests to SecOps", | ||||
|   "description": "Escalate KEV-tagged findings to on-call feeds.", | ||||
|   "enabled": true, | ||||
|   "match": { | ||||
|     "eventKinds": [ | ||||
|       "scanner.report.ready", | ||||
|       "scheduler.rescan.delta" | ||||
|     ], | ||||
|     "namespaces": [ | ||||
|       "prod-*" | ||||
|     ], | ||||
|     "repositories": [], | ||||
|     "digests": [], | ||||
|     "labels": [], | ||||
|     "componentPurls": [], | ||||
|     "minSeverity": "high", | ||||
|     "verdicts": [], | ||||
|     "kevOnly": true, | ||||
|     "vex": { | ||||
|       "includeAcceptedJustifications": false, | ||||
|       "includeRejectedJustifications": false, | ||||
|       "includeUnknownJustifications": false, | ||||
|       "justificationKinds": [ | ||||
|         "component-remediated", | ||||
|         "not-affected" | ||||
|       ] | ||||
|     } | ||||
|   }, | ||||
|   "actions": [ | ||||
|     { | ||||
|       "actionId": "email-digest", | ||||
|       "channel": "email:soc", | ||||
|       "digest": "hourly", | ||||
|       "template": "digest", | ||||
|       "enabled": true, | ||||
|       "metadata": { | ||||
|         "locale": "en-us" | ||||
|       } | ||||
|     }, | ||||
|     { | ||||
|       "actionId": "slack-oncall", | ||||
|       "channel": "slack:sec-ops", | ||||
|       "template": "concise", | ||||
|       "throttle": "PT5M", | ||||
|       "metadata": {}, | ||||
|       "enabled": true | ||||
|     } | ||||
|   ], | ||||
|   "labels": { | ||||
|     "team": "secops" | ||||
|   }, | ||||
|   "metadata": { | ||||
|     "source": "sprint-15" | ||||
|   }, | ||||
|   "createdBy": "ops:zoya", | ||||
|   "createdAt": "2025-10-19T04:12:27+00:00", | ||||
|   "updatedBy": "ops:zoya", | ||||
|   "updatedAt": "2025-10-19T04:45:03+00:00" | ||||
| } | ||||
| @@ -0,0 +1,19 @@ | ||||
| { | ||||
|   "schemaVersion": "notify.template@1", | ||||
|   "templateId": "tmpl-slack-concise", | ||||
|   "tenantId": "tenant-01", | ||||
|   "channelType": "slack", | ||||
|   "key": "concise", | ||||
|   "locale": "en-us", | ||||
|   "body": "{{severity_icon payload.delta.newCritical}} {{summary}}", | ||||
|   "description": "Slack concise message for high severity findings.", | ||||
|   "renderMode": "markdown", | ||||
|   "format": "slack", | ||||
|   "metadata": { | ||||
|     "version": "2025-10-19" | ||||
|   }, | ||||
|   "createdBy": "ops:zoya", | ||||
|   "createdAt": "2025-10-19T05:00:00+00:00", | ||||
|   "updatedBy": "ops:zoya", | ||||
|   "updatedAt": "2025-10-19T05:45:00+00:00" | ||||
| } | ||||
							
								
								
									
										73
									
								
								docs/modules/notify/resources/schemas/notify-channel@1.json
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,73 @@ | ||||
| { | ||||
|   "$id": "https://stella-ops.org/schemas/notify/notify-channel@1.json", | ||||
|   "$schema": "http://json-schema.org/draft-07/schema#", | ||||
|   "title": "Notify Channel", | ||||
|   "type": "object", | ||||
|   "required": [ | ||||
|     "schemaVersion", | ||||
|     "channelId", | ||||
|     "tenantId", | ||||
|     "name", | ||||
|     "type", | ||||
|     "config", | ||||
|     "enabled", | ||||
|     "createdAt", | ||||
|     "updatedAt" | ||||
|   ], | ||||
|   "properties": { | ||||
|     "schemaVersion": {"type": "string", "const": "notify.channel@1"}, | ||||
|     "channelId": {"type": "string"}, | ||||
|     "tenantId": {"type": "string"}, | ||||
|     "name": {"type": "string"}, | ||||
|     "type": { | ||||
|       "type": "string", | ||||
|       "enum": ["slack", "teams", "email", "webhook", "custom"] | ||||
|     }, | ||||
|     "displayName": {"type": "string"}, | ||||
|     "description": {"type": "string"}, | ||||
|     "config": {"$ref": "#/$defs/channelConfig"}, | ||||
|     "enabled": {"type": "boolean"}, | ||||
|     "labels": {"$ref": "#/$defs/stringMap"}, | ||||
|     "metadata": {"$ref": "#/$defs/stringMap"}, | ||||
|     "createdBy": {"type": "string"}, | ||||
|     "createdAt": {"type": "string", "format": "date-time"}, | ||||
|     "updatedBy": {"type": "string"}, | ||||
|     "updatedAt": {"type": "string", "format": "date-time"} | ||||
|   }, | ||||
|   "additionalProperties": false, | ||||
|   "$defs": { | ||||
|     "channelConfig": { | ||||
|       "type": "object", | ||||
|       "required": ["secretRef"], | ||||
|       "properties": { | ||||
|         "secretRef": {"type": "string"}, | ||||
|         "target": {"type": "string"}, | ||||
|         "endpoint": {"type": "string", "format": "uri"}, | ||||
|         "properties": {"$ref": "#/$defs/stringMap"}, | ||||
|         "limits": {"$ref": "#/$defs/channelLimits"} | ||||
|       }, | ||||
|       "additionalProperties": false | ||||
|     }, | ||||
|     "channelLimits": { | ||||
|       "type": "object", | ||||
|       "properties": { | ||||
|         "concurrency": {"type": "integer", "minimum": 1}, | ||||
|         "requestsPerMinute": {"type": "integer", "minimum": 1}, | ||||
|         "timeout": { | ||||
|           "type": "string", | ||||
|           "pattern": "^P(T.*)?$", | ||||
|           "description": "ISO 8601 duration" | ||||
|         }, | ||||
|         "maxBatchSize": {"type": "integer", "minimum": 1} | ||||
|       }, | ||||
|       "additionalProperties": false | ||||
|     }, | ||||
|     "stringMap": { | ||||
|       "type": "object", | ||||
|       "patternProperties": { | ||||
|         ".*": {"type": "string"} | ||||
|       }, | ||||
|       "additionalProperties": false | ||||
|     } | ||||
|   } | ||||
| } | ||||
Some files were not shown because too many files have changed in this diff.