# No-Merge Migration Playbook

_Last updated: 2025-11-03_

This playbook guides the full retirement of the legacy Merge service (`AdvisoryMergeService`) in favour of Link-Not-Merge (LNM) observations and linksets. It is written for the BE-Merge, Architecture, DevOps, and Docs guilds coordinating Sprint 110 (Ingestion & Evidence) deliverables, and it feeds CONCELIER-LNM-21-101 / MERGE-LNM-21-001 and, downstream, DOCS-LNM-22-008.

## 0. Scope & objectives

- **Primary goal:** cut over all advisory pipelines to Link-Not-Merge with no residual dependencies on `AdvisoryMergeService`.
- **Secondary goals:** maintain deterministic evidence, zero data loss, and reversible deployment across online and offline tenants.
- **Success criteria:**
  - All connectors emit observation `affected.versions[]` entries with provenance and pass the LNM guardrails.
  - Linkset dashboards show zero `missing_version_entries_total` and no `Normalized version rules missing…` warnings.
  - Policy, Export Center, and CLI consumers operate solely on observations/linksets.
  - The rollback playbook has been validated and rehearsed in staging.
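The first two success criteria can be spot-checked offline before relying on the dashboards. The sketch below is a minimal illustration, assuming observations are plain dictionaries with a hypothetical `affected.versions[]` layout in which each entry carries its own `provenance` object. The real observation schema, the LNM guardrail rules, and the way `missing_version_entries_total` is actually computed may all differ.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageReport:
    """Illustrative counters loosely mirroring the linkset coverage metrics."""
    total_observations: int = 0
    missing_versions: list[str] = field(default_factory=list)
    missing_provenance: list[str] = field(default_factory=list)

    @property
    def passes(self) -> bool:
        # Both success criteria hold only if nothing was flagged.
        return not self.missing_versions and not self.missing_provenance


def check_version_coverage(observations: list[dict]) -> CoverageReport:
    """Flag observations with no affected.versions[] entries, or entries lacking provenance.

    Hypothetical document shape (not the real Concelier schema):
      {"id": "...", "affected": {"versions": [{"range": "...", "provenance": {...}}]}}
    """
    report = CoverageReport()
    for obs in observations:
        report.total_observations += 1
        versions = obs.get("affected", {}).get("versions", [])
        if not versions:
            report.missing_versions.append(obs["id"])
        elif any(not entry.get("provenance") for entry in versions):
            report.missing_provenance.append(obs["id"])
    return report


sample = [
    {"id": "ghsa-1", "affected": {"versions": [{"range": ">=1.0,<1.2", "provenance": {"source": "ghsa"}}]}},
    {"id": "cve-2", "affected": {"versions": []}},
    {"id": "kisa-3", "affected": {"versions": [{"range": "<2.0"}]}},
]
report = check_version_coverage(sample)
print(report.missing_versions, report.missing_provenance, report.passes)
# ['cve-2'] ['kisa-3'] False
```

Keeping "no versions" and "versions without provenance" as separate lists makes it easier to route each failure to the owning connector guild.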
## 1. Prerequisites checklist

| Item | Owner | Notes |
| --- | --- | --- |
| Normalized version ranges emitted for all Sprint 110 connectors (`Acsc`, `Cccs`, `CertBund`, `CertCc`, `Cve`, `Ghsa`, `Ics.Cisa`, `Kisa`, `Ru.Bdu`, `Ru.Nkcki`, `Vndr.Apple`, `Vndr.Cisco`, `Vndr.Msrc`). | Connector guilds | Follow `docs/dev/normalized-rule-recipes.md`; update fixtures with `UPDATE_*_FIXTURES=1`. |
| Metrics dashboards (`LinksetVersionCoverage`, `Normalized version rules missing`) available in Grafana/CI snapshots. | Observability guild | Publish a baseline before the shadow rollout. |
| Concelier WebService exposes `linkset` and `observation` read APIs for policy/CLI consumers. | BE-Merge / Platform | Confirm contract parity with Merge outputs. |
| Export Center / Offline Kit recognise the new manifests. | Export Center guild | Provide a beta bundle for QA verification. |
| Docs guild aligned on public migration messaging. | Docs guild | Update `docs/dev`, `docs/modules/concelier`, and release notes once the cutover date is locked. |

Do not proceed to Phase 1 until every prerequisite is checked off or explicitly waived by the Architecture guild.

## 2. Feature flag & configuration plan

| Toggle | Default | Purpose | Notes |
| --- | --- | --- | --- |
| `concelier:features:noMergeEnabled` | `false` | Master switch that disables legacy Merge job scheduling/execution. | Applies to WebService + Worker; gates `AdvisoryMergeService` DI registration. |
| `concelier:features:lnmShadowWrites` | `true` | Enables dual-write of linksets while Merge remains active. | Keep enabled throughout Phases 0–1 to validate parity. |
| `concelier:jobs:merge:allowlist` | `[]` | Explicit allowlist for Merge jobs while `noMergeEnabled` is `false`. | Keep empty from Phase 2 onward to prevent accidental restarts. |
| `policy:overlays:requireLinksetEvidence` | `false` | Policy engine safety net that requires linkset-backed findings. | Flip to `true` only after cutover (Phase 2). |

> **Configuration hygiene:** Document the toggle values per environment in `ops/devops/configuration/staging.md` and `ops/devops/configuration/production.md`. Air-gapped customers receive defaults through the Offline Kit release notes.
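Read together, the first and third toggles suggest a simple scheduling rule: a Merge job may run only while `noMergeEnabled` is `false` and its job ID is on the allowlist. The sketch below encodes that reading over a flat dictionary of colon-delimited keys. The job ID `merge:default` comes from the rollback plan; the function name and the flat-dictionary representation are assumptions. The table does not say how an empty allowlist behaves while `noMergeEnabled` is `false`, and this sketch assumes it blocks every job, which may not match the real scheduler.

```python
def merge_job_allowed(config: dict, job_id: str) -> bool:
    """Return True if a legacy Merge job may be scheduled (illustrative rule only).

    Assumes `config` is a flat mapping of colon-delimited keys; missing keys
    fall back to the defaults listed in the feature flag table.
    """
    no_merge = config.get("concelier:features:noMergeEnabled", False)
    allowlist = config.get("concelier:jobs:merge:allowlist", [])
    # The master switch wins: once noMergeEnabled is true, no Merge job runs.
    if no_merge:
        return False
    # Otherwise only explicitly allowlisted jobs may run; an empty list blocks all.
    return job_id in allowlist


# Configuration used by the rollback plan to re-enable the default Merge job.
rollback_cfg = {
    "concelier:features:noMergeEnabled": False,
    "concelier:jobs:merge:allowlist": ["merge:default"],
}
cutover_cfg = {
    "concelier:features:noMergeEnabled": True,
    "concelier:jobs:merge:allowlist": [],
}
print(merge_job_allowed(rollback_cfg, "merge:default"))  # True
print(merge_job_allowed(cutover_cfg, "merge:default"))   # False
```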
## 3. Rollout phases

| Phase | Goal | Duration | Key actions |
| --- | --- | --- | --- |
| **0 – Preparation** | Ensure readiness | 2–3 days | Finalise prerequisites, snapshot Merge metrics, dry-run backfill scripts in dev. |
| **1 – Shadow / Dual Write** | Validate parity | 5–7 days | Enable `lnmShadowWrites` and keep Merge primary. Compare linkset and merged outputs using `stella concelier diff-merge --snapshot <date>`; fix any discrepancies. |
| **2 – Cutover** | Switch to LNM | 1 day per environment | Enable `noMergeEnabled`, disable Merge job schedules, update Policy/Export configs, and run post-cutover smoke tests. |
| **3 – Harden** | Decommission Merge | 2–3 days | Remove Merge background services, delete `merge_event` retention jobs, clean up dashboards, and notify operators. |
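The flag expectations spread across sections 2 and 3 can be collected into a per-phase check, for example to support the readiness item that asks DevOps to confirm flags per environment. This sketch encodes only what the two tables state; anything they leave open, such as `lnmShadowWrites` after Phase 1 or the exact moment `requireLinksetEvidence` flips on, is deliberately left unchecked. The key names come from the flag table; the function and constant names are hypothetical.

```python
NO_MERGE = "concelier:features:noMergeEnabled"
SHADOW = "concelier:features:lnmShadowWrites"
ALLOWLIST = "concelier:jobs:merge:allowlist"
REQUIRE_LINKSET = "policy:overlays:requireLinksetEvidence"

# Defaults from the feature flag table.
DEFAULTS = {NO_MERGE: False, SHADOW: True, ALLOWLIST: [], REQUIRE_LINKSET: False}


def check_phase_flags(phase: int, config: dict) -> list[str]:
    """Return human-readable violations of the phase plan (empty list means OK)."""
    cfg = {**DEFAULTS, **config}
    problems = []
    if phase in (0, 1):
        # Merge stays primary while shadow writes validate parity.
        if cfg[NO_MERGE]:
            problems.append("noMergeEnabled must stay false before cutover")
        if not cfg[SHADOW]:
            problems.append("lnmShadowWrites must be enabled in Phases 0-1")
        if cfg[REQUIRE_LINKSET]:
            problems.append("requireLinksetEvidence must not be flipped before cutover")
    elif phase in (2, 3):
        # Cutover and hardening: Merge disabled and nothing allowlisted.
        if not cfg[NO_MERGE]:
            problems.append("noMergeEnabled must be true from Phase 2 onward")
        if cfg[ALLOWLIST]:
            problems.append("Merge allowlist must be empty from Phase 2 onward")
    else:
        raise ValueError(f"unknown phase {phase}")
    return problems


print(check_phase_flags(1, {}))                                    # []
print(len(check_phase_flags(2, {ALLOWLIST: ["merge:default"]})))  # 2
```

Returning a list of messages, rather than a single boolean, lets a CI job print every mismatch for an environment in one pass.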
### 3.1 Environment sequencing

1. **Dev/Test clusters:** Validate all automation. Run the full regression suite (`dotnet test src/Concelier/...`).
2. **Staging:** Execute the complete backfill (see §4) and collect 24 h of telemetry before sign-off.
3. **Production:** Perform the cutover during a low-ingest window; announce it via Slack/email and the status page two days in advance.
4. **Offline Kit:** Package new Observer snapshots with LNM-only data; ensure the instructions cover flag toggles for air-gapped deployments.

### 3.2 Smoke test matrix

- `stella concelier status --include linkset` reports healthy and shows zero Merge workers.
- `stella policy evaluate` against sample tenants produces identical findings before and after cutover.
- The Export Center bundle diff shows only expected metadata changes (manifest ID, timestamps).
- Grafana dashboards: `linkset_insert_duration_ms` is steady and `merge.identity.conflicts` is flat.
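For the Export Center check, a structural diff that skips the fields allowed to change keeps reviewers focused on real differences. The sketch compares two manifests loaded as dictionaries and reports every differing path except those on an ignore list. The manifest layout and the ignored key names (`manifestId`, `createdAt`, `generatedAt`) are hypothetical stand-ins for the real manifest ID and timestamp fields.

```python
from typing import Any

# Hypothetical metadata keys that are expected to change between bundles.
EXPECTED_CHANGES = {"manifestId", "createdAt", "generatedAt"}


def unexpected_diffs(before: Any, after: Any, path: str = "$") -> list[str]:
    """Return JSON-path-like locations where two manifests differ unexpectedly."""
    if isinstance(before, dict) and isinstance(after, dict):
        diffs = []
        for key in sorted(set(before) | set(after)):
            if key in EXPECTED_CHANGES:
                continue  # manifest ID and timestamps may legitimately differ
            child = f"{path}.{key}"
            if key not in before or key not in after:
                diffs.append(child)  # key added or removed
            else:
                diffs.extend(unexpected_diffs(before[key], after[key], child))
        return diffs
    if isinstance(before, list) and isinstance(after, list):
        if len(before) != len(after):
            return [path]
        diffs = []
        for i, (b, a) in enumerate(zip(before, after)):
            diffs.extend(unexpected_diffs(b, a, f"{path}[{i}]"))
        return diffs
    return [] if before == after else [path]


pre = {"manifestId": "m-1", "createdAt": "2025-11-01T00:00:00Z",
       "entries": [{"advisory": "CVE-2024-0001", "sha256": "aa"}]}
post = {"manifestId": "m-2", "createdAt": "2025-11-05T00:00:00Z",
        "entries": [{"advisory": "CVE-2024-0001", "sha256": "aa"}]}
print(unexpected_diffs(pre, post))  # []
```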
## 4. Backfill strategy

1. **Freeze Merge writes:** Pause the Merge job scheduler (`MergeJobScheduler.PauseAsync`) to prevent new merge events while snapshots are taken.
2. **Generate the linkset baseline:** Run `dotnet run --project src/Concelier/StellaOps.Concelier.WebService -- linkset backfill --from 2024-01-01` (or the equivalent CLI job) to rebuild linksets from `advisory_raw`. Capture the job output artefacts and attach them to the sprint issue.
3. **Validate parity:** Use the internal diff tool (`tools/concelier/compare-linkset-merge.ps1`) to compare sample advisories. Any diffs must be triaged before the production cutover.
4. **Publish evidence:** For air-gapped tenants, create a one-off Offline Kit slice (`export profile linkset-backfill`) and push it to the staging mirror.
5. **Tag the snapshot:** Record the Mongo `oplog` timestamp and the S3/object storage manifests in `ops/devops/runbooks/concelier/no-merge.md` (new section) so the rollback procedure has a recorded safe point.

> **Determinism:** Rerunning the backfill with identical inputs must produce byte-identical linkset documents. Use the `--verify-determinism` flag where available and archive the checksum report under `artifacts/lnm-backfill/<date>/`.
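One way to build a checksum report for the determinism check is to hash a canonical serialization of each linkset document and compare two runs. The sketch uses SHA-256 over JSON with sorted keys and compact separators. That canonicalization, the `id` field, and the report layout are assumptions, and this does not replace `--verify-determinism`. If the stored documents are not themselves written in this canonical form, matching digests only show content equality, not byte identity of the stored bytes.

```python
import hashlib
import json


def canonical_digest(document: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, no insignificant whitespace)."""
    encoded = json.dumps(document, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def checksum_report(documents: list[dict]) -> dict[str, str]:
    """Map each linkset document ID to its digest (hypothetical `id` field)."""
    return {doc["id"]: canonical_digest(doc) for doc in documents}


def determinism_mismatches(run_a: dict[str, str], run_b: dict[str, str]) -> list[str]:
    """IDs whose digests differ between runs, or that appear in only one run."""
    ids = set(run_a) | set(run_b)
    return sorted(i for i in ids if run_a.get(i) != run_b.get(i))


first = checksum_report([{"id": "ls-1", "advisories": ["CVE-2024-0001"], "tenant": "t1"}])
# The same content with keys in a different order must hash identically.
second = checksum_report([{"tenant": "t1", "advisories": ["CVE-2024-0001"], "id": "ls-1"}])
print(determinism_mismatches(first, second))  # []
print(json.dumps(first, indent=2, sort_keys=True))  # one possible checksums.json layout
```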
## 5. Validation gates

- **Metrics:** `linkset_insert_duration_ms`, `linkset_documents_total`, `normalized_version_rules_missing`, `merge.identity.conflicts`.
  - Gate: `normalized_version_rules_missing == 0` for 48 h before enabling `noMergeEnabled`.
- **Logs:** Confirm there are no occurrences of the `Fallbacking to merge service` log message after cutover.
- **Change streams:** Policy and Scheduler should observe only `advisory.linkset.updated` events; monitor for stragglers referencing merge IDs.
- **QA:** Update the golden tests in `StellaOps.Concelier.Merge.Tests` to assert the absence of merge outputs, and add integration tests that verify LNM-only exports.

Capture validation evidence in the sprint journal (attach Grafana screenshots and CLI output).
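The 48 h gate on `normalized_version_rules_missing` can also be evaluated from exported metric samples rather than by eye. This sketch assumes the metric is available as (timestamp, value) pairs, for example exported from a dashboard or a range query. It treats the gate as failed when the window has data gaps, not only when a non-zero value appears. The function name and the 15-minute gap tolerance are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone

GATE_WINDOW = timedelta(hours=48)


def gate_passes(samples: list[tuple[datetime, float]], now: datetime,
                max_gap: timedelta = timedelta(minutes=15)) -> bool:
    """True if normalized_version_rules_missing stayed at 0 for the last 48 h.

    The window must be fully covered: no gap between the window start, the
    consecutive samples, and `now` may exceed `max_gap` (hypothetical tolerance).
    """
    start = now - GATE_WINDOW
    window = sorted((ts, v) for ts, v in samples if start <= ts <= now)
    if not window:
        return False
    # Reject windows with missing data at the start, the end, or in between.
    timestamps = [start] + [ts for ts, _ in window] + [now]
    if any(b - a > max_gap for a, b in zip(timestamps, timestamps[1:])):
        return False
    return all(v == 0 for _, v in window)


now = datetime(2025, 11, 3, 12, 0, tzinfo=timezone.utc)
# One zero-valued sample every 10 minutes across exactly 48 hours.
clean = [(now - timedelta(minutes=10 * i), 0.0) for i in range(289)]
print(gate_passes(clean, now))  # True
```

Failing on gaps is a conservative choice: a scrape outage during the window should not count as 48 clean hours.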
## 6. Rollback plan

1. **Toggle sequence:**
   - Set `concelier:features:noMergeEnabled=false`.
   - Re-enable Merge job schedules (`concelier:jobs:merge:allowlist=["merge:default"]`).
   - Disable `policy:overlays:requireLinksetEvidence` (set it to `false`).
2. **Data considerations:**
   - Linkset writes continue, so no data is lost; ensure Policy consumers ignore linkset-only fields during the rollback window.
   - If the Merge pipeline was fully removed (Phase 3 complete), redeploy the Merge service container image from the `rollback` tag published before cutover.
3. **Verification:**
   - Run `stella concelier status` to confirm that Merge workers are active.
   - Monitor `merge.identity.conflicts` for spikes; if spikes appear, roll forward and re-open the incident with the Architecture guild.
4. **Communication:**
   - Post an incident note in #release-infra and on the customer status page.
   - Log the rollback reason, window, and configs in `ops/devops/incidents/<yyyy-mm-dd>-no-merge.md`.

The rollback window should not exceed 4 hours; beyond that, plan to roll forward with a hotfix rather than reintroducing Merge.
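The 4-hour limit and the toggle sequence can be combined into a small decision helper for incident tooling. This sketch assumes the window is measured from the moment of cutover, which the playbook does not state explicitly. The toggle keys and values come from the rollback steps; the function name and the return shape are illustrative.

```python
from datetime import datetime, timedelta, timezone

ROLLBACK_WINDOW = timedelta(hours=4)

# Toggle sequence from the rollback plan, in the order it should be applied.
ROLLBACK_TOGGLES = [
    ("concelier:features:noMergeEnabled", False),
    ("concelier:jobs:merge:allowlist", ["merge:default"]),
    ("policy:overlays:requireLinksetEvidence", False),
]


def plan_recovery(cutover_at: datetime, now: datetime) -> tuple[str, list]:
    """Return ("rollback", toggles) inside the window, else ("roll-forward", [])."""
    if now - cutover_at <= ROLLBACK_WINDOW:
        return "rollback", list(ROLLBACK_TOGGLES)
    # Past the window: fix forward with a hotfix instead of reintroducing Merge.
    return "roll-forward", []


cutover = datetime(2025, 11, 10, 2, 0, tzinfo=timezone.utc)
decision, toggles = plan_recovery(cutover, cutover + timedelta(hours=1))
print(decision)                                                   # rollback
print(plan_recovery(cutover, cutover + timedelta(hours=5))[0])    # roll-forward
```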
## 7. Documentation & communications

- Update the `docs/modules/concelier/architecture.md` appendix to mark Merge as deprecated and link back to this playbook.
- Coordinate with the Docs guild to publish operator-facing guidance (`docs/releases/2025-q4.md`) and update the CLI help text.
- Notify product/CS teams with a short FAQ covering timelines, customer impact, and steps for self-hosted installations.

## 8. Responsibilities matrix

| Area | Lead guild(s) | Supporting |
| --- | --- | --- |
| Feature flags & config | BE-Merge | DevOps |
| Backfill scripting | BE-Merge | Tools |
| Observability dashboards | Observability | QA |
| Offline Kit packaging | Export Center | AirGap |
| Customer comms | Docs | Product, Support |

## 9. Deliverables & artefacts

- Config diff per environment (stored in the GitOps repo).
- Backfill checksum report (`artifacts/lnm-backfill/<date>/checksums.json`).
- Grafana export (PDF) showing the validation metrics.
- QA test run report showing the LNM-only regression suite passing.
- Updated runbook entry in `ops/devops/runbooks/concelier/`.

---

## 10. Migration readiness checklist

| Item | Primary owner | Status notes |
| --- | --- | --- |
| Capture linkset coverage baselines (`version_entries_total`, `missing_version_entries_total`) and archive the Grafana export. | Observability Guild | [ ] Pending |
| Stage and verify the linkset backfill using the `linkset backfill` job; store the checksum report under `artifacts/lnm-backfill/<date>/`. | BE-Merge, DevOps Guild | [ ] Pending |
| Confirm that feature flags per environment (`noMergeEnabled`, `lnmShadowWrites`, `policy:overlays:requireLinksetEvidence`) match the Phase 0–3 plan. | DevOps Guild | [ ] Pending |
| Publish operator comms (status page, Slack/email) with the cutover and rollback windows. | Docs Guild, Product | [ ] Pending |
| Execute a rollback rehearsal in staging and log the results in `ops/devops/incidents/<date>-no-merge.md`. | DevOps Guild, Architecture Guild | [ ] Pending |

> Update the checklist as each item completes; every row must be complete before moving to Phase 2 (Cutover).
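If the checklist stays in this Markdown table, a CI step could refuse to mark Phase 2 as ready while any status cell still starts with an unchecked box. The sketch below parses pipe-delimited rows and returns the items still pending. It assumes the status is the last cell and begins with `[ ]` when open, as in the table above; the `[x] Done` marker and the function name are hypothetical.

```python
def pending_items(markdown: str) -> list[str]:
    """Return the Item cell of every row whose status cell starts with '[ ]'."""
    pending = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < 3 or set(cells[0]) <= {"-", " ", ":"}:
            continue  # skip malformed rows and the delimiter row
        if cells[-1].startswith("[ ]"):
            pending.append(cells[0])
    return pending


CHECKLIST = """
| Item | Primary owner | Status notes |
| --- | --- | --- |
| Capture linkset coverage baselines. | Observability Guild | [x] Done |
| Execute rollback rehearsal in staging. | DevOps Guild | [ ] Pending |
"""
ready_for_cutover = not pending_items(CHECKLIST)
print(pending_items(CHECKLIST))  # ['Execute rollback rehearsal in staging.']
print(ready_for_cutover)         # False
```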
---

With this playbook completed, proceed to MERGE-LNM-21-002 to remove the Merge service code paths and enforce compile-time analyzers that block new merge dependencies.