docs consolidation and others

2026-01-06 19:02:21 +02:00
parent d7bdca6d97
commit 4789027317
849 changed files with 16551 additions and 66770 deletions


@@ -0,0 +1,36 @@
# CycloneDX 1.6 to 1.7 migration
> **STATUS: MIGRATION COMPLETED**
> CycloneDX 1.7 support completed in Sprint 3200 (November 2024).
> All scanner output now generates CycloneDX 1.7 by default.
> This document is preserved for operators migrating from StellaOps versions <0.9.0.
## Summary
- Default SBOM output is now CycloneDX 1.7 (JSON and Protobuf).
- CycloneDX 1.6 ingestion remains supported for backward compatibility.
- VEX exports include CycloneDX 1.7 fields for ratings, sources, and affected versions.
## What changed
- `specVersion` is emitted as `1.7`.
- Media types include explicit 1.7 versions:
- `application/vnd.cyclonedx+json; version=1.7`
- `application/vnd.cyclonedx+protobuf; version=1.7`
- VEX documents may now include:
- `vulnerability.ratings[]` with CVSS v4/v3/v2 metadata
- `vulnerability.source` with provider and PURL/URL reference
- `vulnerability.affects[].versions[]` entries
## Required updates for consumers
1. Update Accept and Content-Type headers to request or send CycloneDX 1.7.
2. If you validate against JSON schemas, switch to the CycloneDX 1.7 schema.
3. Ensure parsers ignore unknown fields for forward compatibility.
4. Update OCI referrer media types to the 1.7 values.
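
The consumer updates above can be sketched in a few lines of Python. This is a minimal illustration, not StellaOps client code: the function names `request_headers` and `read_sbom` are hypothetical, and the only values taken from this document are the 1.7 media type and the fact that 1.6 is still accepted.

```python
import json

# Media type quoted from the "What changed" list in this document.
CDX_17_JSON = "application/vnd.cyclonedx+json; version=1.7"
# 1.6 remains accepted on ingest, 1.7 is the default output.
SUPPORTED_SPEC_VERSIONS = {"1.6", "1.7"}


def request_headers() -> dict:
    """Headers a consumer might send to ask for CycloneDX 1.7 JSON (hypothetical helper)."""
    return {"Accept": CDX_17_JSON}


def read_sbom(raw: str) -> dict:
    """Parse an SBOM, reading only known fields so unknown ones are ignored."""
    doc = json.loads(raw)
    version = doc.get("specVersion")
    if version not in SUPPORTED_SPEC_VERSIONS:
        raise ValueError(f"unsupported specVersion: {version!r}")
    # Pick out only the fields this consumer understands; extra fields
    # added by later spec versions are dropped instead of causing errors.
    return {
        "specVersion": version,
        "components": doc.get("components", []),
    }
```

Reading fields by name, instead of rejecting documents with unexpected keys, is what keeps a parser forward compatible when new fields appear.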
## Compatibility notes
- CycloneDX 1.6 SBOMs are still accepted on ingest.
- CycloneDX 1.7 is the default output on Scanner and export surfaces.
## References
- CycloneDX 1.7 specification: https://cyclonedx.org/docs/1.7/
- Scanner architecture: `docs/modules/scanner/architecture.md`
- SBOM service architecture: `docs/modules/sbomservice/architecture.md`


@@ -0,0 +1,58 @@
# Exception Governance Migration Guide
> **Imposed rule:** All exceptions must be time-bound, tenant-scoped, and auditable; legacy perpetual suppressions are prohibited after cutover.
This guide explains how to migrate from legacy suppressions/notifications to the unified Exception Governance model in Excititor and Console.
## 1. What changes
- **Unified exception object:** replaces ad-hoc suppressions. Fields: `tenant`, `scope` (purl/image/component), `vuln` (CVE/alias), `justification`, `expiration`, `owner`, `evidence_refs`, `policy_binding`, `status` (draft/staged/active/expired).
- **Two-phase activation:** `draft → staged → active` with policy simulator snapshot; rollbacks produce a compensating exception marked `supersedes`.
- **Notifications:** move from broad email hooks to route-specific notifications (policy events, expiring exceptions) using Notify service templates.
- **Auditability:** each lifecycle change emits Timeline + Evidence Locker entries; exports include DSSE attestation of the exception set.
## 2. Migration phases
1. **Inventory legacy suppressions**
- Export current suppressions and notification rules (per tenant) to NDJSON.
- Classify by scope: package, image, repo, tenant-wide.
2. **Normalize and enrich**
- Map each suppression to the unified schema; add `expiration` (default 30/90 days), `owner`, `justification` (use VEX schema categories when available).
- Attach evidence references (ticket URL, VEX claim ID, scan report digest) where missing.
3. **Create staged exceptions**
- Import NDJSON via Console or `stella exceptions import --stage` (CLI guide: `docs/modules/cli/guides/exceptions.md`).
- Run policy simulator; resolve conflicts flagged by Aggregation-Only Contract (AOC) enforcement.
4. **Activate with guardrails**
- Promote staged → active in batches; each promotion emits Timeline events and optional Rekor-backed attestation bundle (if Attestor is enabled).
- Configure Notify templates for expiring exceptions (T-14/T-3 days) and denied promotions.
5. **Decommission legacy paths**
- Disable legacy suppression writes; keep read-only for 30 days with banner noting deprecation.
- Remove legacy notification hooks after confirming staged/active parity.
## 3. Data shapes
- **Import NDJSON record (minimal):** `{ tenant, vuln, scope:{type:'purl'|'image'|'component', value}, justification, expiration, owner }`
- **Export manifest:** `{ generated_at, tenant, count, sha256, aoc_enforced, source:'migration-legacy-suppressions' }`
- **Attestation (optional):** DSSE over exception set digest; stored alongside manifest in Evidence Locker.
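
A minimal sketch of the "normalize and enrich" step, producing records in the minimal import shape listed above. The legacy suppression layout (`scope_type`, `scope_value`), the ISO date format for `expiration`, and all function names are assumptions made for illustration; only the target field names and scope types come from this guide.

```python
import json
from datetime import date, timedelta

REQUIRED = ("tenant", "vuln", "scope", "justification", "expiration", "owner")
SCOPE_TYPES = {"purl", "image", "component"}


def validate(record: dict) -> None:
    """Reject records that lack a required field or use an unknown scope type."""
    missing = [key for key in REQUIRED if not record.get(key)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    scope = record["scope"]
    if scope.get("type") not in SCOPE_TYPES:
        raise ValueError(f"unknown scope type: {scope.get('type')!r}")
    if not scope.get("value"):
        raise ValueError("scope value is empty")


def normalize(legacy: dict, today: date, default_days: int = 90) -> dict:
    """Map a legacy suppression (hypothetical shape) to the minimal import record."""
    record = {
        "tenant": legacy["tenant"],
        "vuln": legacy["vuln"],
        "scope": {"type": legacy["scope_type"], "value": legacy["scope_value"]},
        "justification": legacy.get("justification", ""),
        # Perpetual suppressions get a default expiration (30 or 90 days).
        "expiration": legacy.get("expiration")
        or (today + timedelta(days=default_days)).isoformat(),
        "owner": legacy.get("owner", ""),
    }
    validate(record)
    return record


def to_ndjson(records) -> str:
    """Serialize records one JSON object per line, with stable key order."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
```

Validating at normalization time surfaces suppressions without an owner or justification before they reach the staged import.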
## 4. Rollback plan
- Keep legacy suppressions read-only for 30 days.
- If a promotion batch causes regressions, mark affected exceptions `expired` and re-enable corresponding legacy suppressions for that tenant only.
- Emit `rollback_notice` Timeline events and Notify operators.
## 5. Air-gap considerations
- Imports/exports are file-based (NDJSON + manifest); no external calls required.
- Verification uses bundled attestations; Rekor proofs are optional offline.
- Console shows AOC badge when Aggregation-Only Contract limits apply; exports record `aoc=true`.
## 6. Checklists
- [ ] All legacy suppressions exported to NDJSON per tenant.
- [ ] Every exception has justification, owner, expiration.
- [ ] Policy simulator run and results attached to exception batch.
- [ ] Notify templates enabled for expiring/denied promotions.
- [ ] Legacy write paths disabled; read-only banner present.
- [ ] Attestation bundle stored (if Attestor available) and Evidence Locker entry created.
## 7. References
- `docs/modules/excititor/architecture.md`
- `docs/modules/excititor/implementation_plan.md`
- `docs/modules/cli/guides/exceptions.md`
- `docs/security/export-hardening.md`
- `docs/policy/ui-integration.md`


@@ -0,0 +1,61 @@
# Graph Parity Rollout Guide
Status: Draft (2025-11-26) — DOCS-GRAPH-24-007.
## Goal
Transition from legacy graph surfaces (Cartographer/UI stubs) to the new Graph API + Indexer stack with clear rollback and parity checks.
## Scope
- Graph API (Sprint 0207) + Graph Indexer (Sprint 0141)
- Consumers: Graph Explorer, Vuln Explorer, Console/CLI, Export Center, Advisory AI overlays
- Tenants: all; pilot recommended with 1-2 tenants first
## Phased rollout
1) **Pilot**
- Enable new Graph API for pilot tenants behind feature flag `graph.api.v2`.
- Run daily parity job: compare node/edge counts and hashes against legacy output for selected snapshots.
2) **Shadow**
- Mirror queries from UI/CLI to both legacy and new APIs; log differences.
- Metrics to track: `parity_diff_nodes_total`, `parity_diff_edges_total`, p95 latency deltas.
3) **Cutover**
- Switch UI/CLI to new endpoints; keep shadow logging for 1 week.
- Freeze legacy write paths; keep read-only export for rollback.
4) **Cleanup**
- Remove legacy routes; retain archived parity reports and exports.
## Parity checks
- Deterministic snapshots: compare SHA256 of `nodes.jsonl` and `edges.jsonl` (sorted).
- Query parity: run canned queries (search/query/paths/diff) and compare:
- Node/edge counts, first/last IDs
- Presence of overlays (policy/vex)
- Cursor progression
- Performance: ensure p95 latency within ±20% of legacy baseline during shadow.
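
The snapshot and latency checks above could look roughly like the following. This is a sketch under stated assumptions: the helper names are hypothetical, "sorted" is taken to mean sorting the JSONL lines before hashing, and the actual parity job may canonicalize differently.

```python
import hashlib


def sorted_jsonl_sha256(lines) -> str:
    """SHA-256 over the non-empty lines, sorted and newline-terminated."""
    digest = hashlib.sha256()
    cleaned = sorted(line.rstrip("\n") for line in lines if line.strip())
    for line in cleaned:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def snapshot_matches(legacy_lines, new_lines) -> bool:
    """True when both snapshots hash identically after sorting."""
    return sorted_jsonl_sha256(legacy_lines) == sorted_jsonl_sha256(new_lines)


def latency_within_budget(legacy_p95_ms: float, new_p95_ms: float,
                          tolerance: float = 0.20) -> bool:
    """True when the new p95 is within +/- tolerance of the legacy baseline."""
    if legacy_p95_ms <= 0:
        raise ValueError("legacy baseline must be positive")
    return abs(new_p95_ms - legacy_p95_ms) / legacy_p95_ms <= tolerance
```

Sorting before hashing makes the comparison independent of emission order, so a hash mismatch points at a real content difference.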
## Rollback
- Keep legacy service in read-only mode; toggle feature flag back if parity fails.
- Retain last good exports and parity reports for each tenant.
- If overlays mismatch: clear overlay cache and rerun policy overlay ingestion; fall back to legacy overlays temporarily.
## Observability
- Dashboards: add panels for parity diff counters and latency delta.
- Alerts:
- `parity_diff_nodes_total > 0` for 10m
- Latency delta > 20% for 10m
- Logs should include tenant, snapshotId, query type, cursor, hash comparisons.
## Owners
- Graph API Guild (API/runtime)
- Graph Indexer Guild (snapshots/ingest)
- Observability Guild (dashboards/alerts)
- UI/CLI Guilds (client cutover)
## Checklists
- [ ] Feature flag wired and default off.
- [ ] Canned query set stored in repo (deterministic inputs).
- [ ] Parity job outputs SHA256 comparison and stores reports per tenant/date.
- [ ] Rollback tested in staging.
## References
- `docs/api/graph.md`, `docs/modules/graph/architecture-index.md`
- `docs/implplan/SPRINT_0141_0001_0001_graph_indexer.md`
- `docs/implplan/SPRINT_0207_0001_0001_graph.md`


@@ -0,0 +1,146 @@
# No-Merge Migration Playbook
_Last updated: 2025-11-06_
This playbook guides the full retirement of the legacy Merge service (`AdvisoryMergeService`) in favour of Link-Not-Merge (LNM) observations plus linksets. It is written for the BE-Merge, Architecture, DevOps, and Docs guilds coordinating Sprint 110 (Ingestion & Evidence) deliverables, and it feeds CONCELIER-LNM-21-101 / MERGE-LNM-21-001 and downstream DOCS-LNM-22-008.
## 0. Scope & objectives
- **Primary goal:** cut over all advisory pipelines to Link-Not-Merge with no residual dependencies on `AdvisoryMergeService`.
- **Secondary goals:** maintain deterministic evidence, zero data loss, and reversible deployment across online and offline tenants.
- **Success criteria:**
- All connectors emit observation `affected.versions[]` with provenance and pass LNM guardrails.
- Linkset dashboards show zero `missing_version_entries_total` and no `Normalized version rules missing…` warnings.
- Policy, Export Center, and CLI consumers operate solely on observations/linksets.
- Rollback playbook validated and rehearsed in staging.
## 1. Prerequisites checklist
| Item | Owner | Notes |
| --- | --- | --- |
| Normalized version ranges emitted for all Sprint 110 connectors (`Acsc`, `Cccs`, `CertBund`, `CertCc`, `Cve`, `Ghsa`, `Ics.Cisa`, `Kisa`, `Ru.Bdu`, `Ru.Nkcki`, `Vndr.Apple`, `Vndr.Cisco`, `Vndr.Msrc`). | Connector guilds | Follow `docs/dev/normalized-rule-recipes.md`; update fixtures with `UPDATE_*_FIXTURES=1`. |
| Metrics dashboards (`LinksetVersionCoverage`, `Normalized version rules missing`) available in Grafana/CI snapshots. | Observability guild | Publish baseline before shadow rollout. |
| Concelier WebService exposes `linkset` and `observation` read APIs for policy/CLI consumers. | BE-Merge / Platform | Confirm contract parity with Merge outputs. |
| Export Center / Offline Kit aware of new manifests. | Export Center guild | Provide beta bundle for QA verification. |
| Docs guild aligned on public migration messaging. | Docs guild | Update `docs/dev`, `docs/modules/concelier`, and release notes once cutover date is locked. |
Do not proceed to Phase 1 until all prerequisites are checked or explicitly waived by Architecture guild.
## 2. Feature flag & configuration plan
| Toggle | Default | Purpose | Notes |
| --- | --- | --- | --- |
| `concelier:features:noMergeEnabled` | `true` | Master switch to disable legacy Merge job scheduling/execution. | Applies to WebService + Worker; gate `AdvisoryMergeService` DI registration. |
| `concelier:features:lnmShadowWrites` | `true` | Enables dual-write of linksets while Merge remains active. | Keep enabled throughout Phase 0-1 to validate parity. |
| `concelier:jobs:merge:allowlist` | `[]` | Explicit allowlist for Merge jobs when noMergeEnabled is `false`. | Set to empty during Phase 2+ to prevent accidental restarts. |
| `policy:overlays:requireLinksetEvidence` | `false` | Policy engine safety net to require linkset-backed findings. | Flip to `true` only after cutover (Phase 2). |
> 2025-11-06: WebService now defaults `concelier:features:noMergeEnabled` to `true`, skipping Merge DI registration and removing the `merge:reconcile` job unless operators set the flag to `false` and allowlist the job (MERGE-LNM-21-002).
>
> 2025-11-06: Analyzer `CONCELIER0002` ships with Concelier hosts to block new references to `AdvisoryMergeService` / `AddMergeModule`. Suppressions must be paired with an explicit migration note.
> 2025-11-06: Analyzer coverage validated via unit tests catching object creation, field declarations, `typeof`, and DI extension invocations; merge assemblies remain exempt for legacy cleanup helpers.
> **Configuration hygiene:** Document the toggle values per environment in `ops/devops/configuration/staging.md` and `ops/devops/configuration/production.md`. Air-gapped customers receive defaults through the Offline Kit release notes.
## 3. Rollout phases
| Phase | Goal | Duration | Key actions |
| --- | --- | --- | --- |
| **0 Preparation** | Ensure readiness | 2-3 days | Finalise prerequisites, snapshot Merge metrics, dry-run backfill scripts in dev. |
| **1 Shadow / Dual Write** | Validate parity | 5-7 days | Enable `lnmShadowWrites`, keep Merge primary. Compare linkset vs merged outputs using `stella concelier diff-merge --snapshot <date>`; fix discrepancies. |
| **2 Cutover** | Switch to LNM | 1 day (per env) | Enable `noMergeEnabled`, disable Merge job schedules, update Policy/Export configs, run post-cutover smoke tests. |
| **3 Harden** | Decommission Merge | 2-3 days | Remove Merge background services, delete `merge_event` retention jobs, clean dashboards, notify operators. |
### 3.1 Environment sequencing
1. **Dev/Test clusters:** Validate all automation. Run full regression suite (`dotnet test src/Concelier/...`).
2. **Staging:** Execute complete backfill (see §4) and collect 24h of telemetry before sign-off.
3. **Production:** Perform cutover during low-ingest window; communicate via Slack/email + status page two days in advance.
4. **Offline kit:** Package new Observer snapshots with LNM-only data; ensure instructions cover flag toggles for air-gapped deployments.
### 3.2 Smoke test matrix
- `stella concelier status --include linkset` returns healthy and shows zero Merge workers.
- `stella policy evaluate` against sample tenants produces identical findings pre/post cutover.
- Export Center bundle diff shows only expected metadata changes (manifest ID, timestamps).
- Grafana dashboards: `linkset_insert_duration_ms` steady, `merge.identity.conflicts` flatlined.
## 4. Backfill strategy
1. **Freeze Merge writes:** Pause Merge job scheduler (`MergeJobScheduler.PauseAsync`) to prevent new merge events while snapshots are taken.
2. **Generate linkset baseline:** Run `dotnet run --project src/Concelier/StellaOps.Concelier.WebService -- linkset backfill --from 2024-01-01` (or equivalent CLI job) to rebuild linksets from `advisory_raw`. Capture job output artefacts and attach to the sprint issue.
3. **Validate parity:** Use the internal diff tool (`tools/concelier/compare-linkset-merge.ps1`) to compare sample advisories. Any diffs must be triaged before production cutover.
4. **Publish evidence:** For air-gapped tenants, create a one-off Offline Kit slice (`export profile linkset-backfill`) and push to staging mirror.
5. **Tag snapshot:** Record Mongo `oplog` timestamp and S3/object storage manifests in `ops/devops/runbooks/concelier/no-merge.md` (new section) so rollback knows the safe point.
> **Determinism:** rerunning the backfill with identical inputs must produce byte-identical linkset documents. Use the `--verify-determinism` flag where available and archive the checksum report under `artifacts/lnm-backfill/<date>/`.
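
Where the `--verify-determinism` flag is not available, the byte-identical requirement can be checked by hashing the output of two backfill runs. The sketch below assumes each run wrote its linkset documents into its own directory; the function names and report layout are hypothetical and are not the format of the archived checksum report.

```python
import hashlib
from pathlib import Path


def checksum_tree(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def determinism_report(run_a: Path, run_b: Path) -> dict:
    """Compare two backfill output directories file by file."""
    a, b = checksum_tree(run_a), checksum_tree(run_b)
    # A file counts as mismatched if it differs or exists in only one run.
    mismatched = sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))
    return {"identical": not mismatched, "mismatched": mismatched, "checksums": a}
```

Hashing raw bytes, not parsed JSON, is deliberate here: the requirement is byte-identical output, so key order and whitespace differences must count as failures.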
## 5. Validation gates
- **Metrics:** `linkset_insert_duration_ms`, `linkset_documents_total`, `normalized_version_rules_missing`, `merge.identity.conflicts`.
- Gate: `normalized_version_rules_missing == 0` for 48h before enabling `noMergeEnabled`.
- **Logs:** Ensure no occurrences of `Fallbacking to merge service` after cutover.
- **Change streams:** Policy and Scheduler should observe only `advisory.linkset.updated` events; monitor for stragglers referencing merge IDs.
- **QA:** Golden tests in `StellaOps.Concelier.Merge.Tests` updated to assert absence of merge outputs, plus integration tests verifying LNM-only exports.
Capture validation evidence in the sprint journal (attach Grafana screenshots + CLI output).
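
The 48h gate on `normalized_version_rules_missing` can be expressed as a small check over exported metric samples. This is a sketch only: the sample format (timestamp, value pairs) and the function name are assumptions, and in practice the gate would more likely be a Grafana or alerting rule.

```python
from datetime import datetime, timedelta


def gate_passes(samples, now: datetime,
                window: timedelta = timedelta(hours=48)) -> bool:
    """Check the gate for normalized_version_rules_missing.

    samples is a list of (timestamp, value) pairs. The gate passes only
    if the data reaches back to the start of the window and every value
    inside the window is zero.
    """
    start = now - window
    in_window = [(ts, value) for ts, value in samples if start <= ts <= now]
    if not in_window:
        return False
    # Require at least one sample at or before the window start, so a
    # short history is not mistaken for 48 clean hours.
    has_coverage = any(ts <= start for ts, _ in samples)
    return has_coverage and all(value == 0 for _, value in in_window)
```

Treating missing history as a failure keeps the gate conservative: a freshly deployed dashboard with two hours of zeros does not unlock `noMergeEnabled`.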
## 6. Rollback plan
1. **Toggle sequence:**
- Set `concelier:features:noMergeEnabled=false`.
- Re-enable Merge job schedules (`concelier:jobs:merge:allowlist=["merge:default"]`).
- Disable `policy:overlays:requireLinksetEvidence`.
2. **Data considerations:**
- Linkset writes continue, so no data is lost; ensure Policy consumers ignore linkset-only fields during rollback window.
- If Merge pipeline was fully removed (Phase 3 complete), redeploy the Merge service container image from the `rollback` tag published before cutover.
3. **Verification:**
- Run `stella concelier status` to confirm Merge workers active.
- Monitor `merge.identity.conflicts` for spikes; if present, roll forward and re-open incident with Architecture guild.
4. **Communication:**
- Post incident note in #release-infra and customer status page.
- Log rollback reason, window, and configs in `ops/devops/incidents/<yyyy-mm-dd>-no-merge.md`.
Rollback window should not exceed 4 hours; beyond that, plan to roll forward with a hotfix rather than reintroducing Merge.
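
For rehearsals, the toggle sequence and the rollback window can be captured as data. The keys and values below are quoted from the toggle sequence in this section; the flat dictionary representation and the helper names are assumptions, since the real settings live in per-environment configuration.

```python
from datetime import datetime, timedelta

# Values quoted from the rollback toggle sequence.
ROLLBACK_OVERRIDES = {
    "concelier:features:noMergeEnabled": False,
    "concelier:jobs:merge:allowlist": ["merge:default"],
    "policy:overlays:requireLinksetEvidence": False,
}

MAX_ROLLBACK_WINDOW = timedelta(hours=4)


def apply_rollback(config: dict) -> dict:
    """Return a copy of config with the rollback overrides applied."""
    return {**config, **ROLLBACK_OVERRIDES}


def should_roll_forward(rollback_started: datetime, now: datetime) -> bool:
    """True once the rollback has run longer than the allowed window."""
    return now - rollback_started > MAX_ROLLBACK_WINDOW
```

Keeping the overrides in one place makes the config diff for a rollback easy to review and to attach to the incident log.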
## 7. Documentation & communications
- Update `docs/modules/concelier/architecture.md` appendix to mark Merge deprecated and link back to this playbook.
- Coordinate with Docs guild to publish operator-facing guidance (`docs/releases/2025-q4.md`) and update CLI help text.
- Notify product/CS teams with a short FAQ covering timelines, customer impact, and steps for self-hosted installations.
## 8. Responsibilities matrix
| Area | Lead guild(s) | Supporting |
| --- | --- | --- |
| Feature flags & config | BE-Merge | DevOps |
| Backfill scripting | BE-Merge | Tools |
| Observability dashboards | Observability | QA |
| Offline kit packaging | Export Center | AirGap |
| Customer comms | Docs | Product, Support |
## 9. Deliverables & artefacts
- Config diff per environment (stored in GitOps repo).
- Backfill checksum report (`artifacts/lnm-backfill/<date>/checksums.json`).
- Grafana export (PDF) showing validation metrics.
- QA test run attesting to LNM-only regressions passing.
- Updated runbook entry in `ops/devops/runbooks/concelier/`.
---
## 10. Migration readiness checklist
| Item | Primary owner | Status notes |
| --- | --- | --- |
| Capture Linkset coverage baselines (`version_entries_total`, `missing_version_entries_total`) and archive Grafana export. | Observability Guild | [ ] Pending |
| Stage and verify linkset backfill using `linkset backfill` job; store checksum report under `artifacts/lnm-backfill/<date>/`. | BE-Merge, DevOps Guild | [ ] Pending |
| Confirm feature flags per environment (`noMergeEnabled`, `lnmShadowWrites`, `policy:overlays:requireLinksetEvidence`) match Phase 0-3 plan. | DevOps Guild | [ ] Pending |
| Publish operator comms (status page, Slack/email) with cutover + rollback windows. | Docs Guild, Product | [ ] Pending |
| Execute rollback rehearsal in staging and log results in `ops/devops/incidents/<date>-no-merge.md`. | DevOps Guild, Architecture Guild | [ ] Pending |
> Update the checklist as each item completes; completion of every row is required before moving to Phase 2 (Cutover).
---
With this playbook completed, proceed to MERGE-LNM-21-002 to remove the Merge service code paths and enforce compile-time analyzers that block new merge dependencies.


@@ -0,0 +1,41 @@
# Policy Parity Migration Guide
> **Imposed rule:** Parity runs must use frozen inputs (SBOM, advisories, VEX, reachability, signals) and record hashes; activation is blocked until parity success is attested.
This guide describes how to dual-run old vs new policies and activate only after parity is proven.
## 1. Scope
- Applies to migration from legacy policy engine to SPL/DSL v1.
- Covers dual-run, comparison, rollback, and air-gap parity.
## 2. Dual-run process
1. **Freeze inputs**: snapshot SBOM/advisory/VEX/reachability feeds; record hashes.
2. **Shadow new policy**: run in shadow with same inputs; record findings and explain traces.
3. **Compare**: use `stella policy compare --base <legacy> --candidate <new>` to diff findings (status/severity) and rule hits.
4. **Thresholds**: parity passes when diff counts are zero or within approved budget (`--max-diff`); any status downgrade to `affected` must be reviewed.
5. **Attest**: generate parity report (hashes, diffs, runs) and DSSE-sign it; store in Evidence Locker.
6. **Promote**: activate new policy only after parity attestation verified and approvals captured.
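
The compare and threshold steps can be illustrated with a small diff over findings. This is not how `stella policy compare` is implemented; the `Finding` shape, the function name, and the result layout are assumptions, and `max_diff` simply mirrors the `--max-diff` budget mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    finding_id: str
    status: str
    severity: str


def evaluate_parity(base, candidate, max_diff: int = 0) -> dict:
    """Diff legacy (base) and new (candidate) findings by status and severity."""
    base_by_id = {f.finding_id: f for f in base}
    cand_by_id = {f.finding_id: f for f in candidate}
    diffs, needs_review = [], []
    for fid in sorted(base_by_id.keys() | cand_by_id.keys()):
        old, new = base_by_id.get(fid), cand_by_id.get(fid)
        if old == new:
            continue
        diffs.append(fid)
        # Any change that lands on `affected` is flagged for manual review.
        became_affected = (
            new is not None
            and new.status == "affected"
            and (old is None or old.status != "affected")
        )
        if became_affected:
            needs_review.append(fid)
    return {
        "diff_count": len(diffs),
        "within_budget": len(diffs) <= max_diff,
        "needs_review": needs_review,
    }
```

Reporting the budget result and the review list separately keeps the two rules distinct: a run can be within budget and still carry findings that a reviewer must sign off.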
## 3. CLI commands
- `stella policy compare --base policy-legacy@42 --candidate policy-new@3 --inputs frozen.inputs.json --max-diff 0`
- `stella policy parity report --base ... --candidate ... --output parity-report.json --sign`
## 4. Air-gap workflow
- Run compare offline using bundled inputs; export parity report + DSSE; import into Console/Authority when back online.
## 5. Rollback
- Keep the legacy policy approved/archivable; roll back with `stella policy activate <legacy>` if a parity regression is discovered.
## 6. Checklist
- [ ] Inputs frozen and hashed.
- [ ] Shadow runs executed and stored.
- [ ] Diff computed and within budget.
- [ ] Parity report DSSE-signed and stored.
- [ ] Approvals recorded; two-person rule satisfied.
- [ ] Rollback path documented.
## References
- `docs/policy/runtime.md`
- `docs/policy/editor.md`
- `docs/policy/governance.md`
- `docs/policy/overview.md`