Merge branch 'main' of https://git.stella-ops.org/stella-ops.org/git.stella-ops.org

2025-10-30 00:12:41 +02:00
parent 7b5bdcf4d3 55464f8498
commit 7600caea63
39 changed files with 2130 additions and 164 deletions
--- a/docs/03_VISION.md
+++ b/docs/03_VISION.md
@@ -1,95 +1,379 @@
 #  3 · Product Vision — **Stella Ops**  
-*(v1.3 — 12 Jul 2025 · supersedes v1.2; expanded with ecosystem integration, refined metrics, and alignment to emerging trends)*  
+
+## 1) Problem Statement & Goals
+
+We ship containers. We need:
+- **Authenticity & integrity** of build artifacts and metadata.
+- **Provenance** attached to artifacts, not platforms.
+- **Transparency** to detect tampering and retroactive edits.
+- **Determinism & explainability** so scanner judgments can be replayed and justified.
+- **Actionability** to separate theoretical from exploitable risk (VEX).
+- **Minimal trust** across multi‑tenant and third‑party boundaries.
+
+**Non‑goals:** Building a new package manager, inventing new SBOM/attestation formats, or depending on closed standards.

 ---

-##  0 Preamble  
+## 2) Golden Path (Minimal End‑to‑End Flow)

-This Vision builds on the purpose and gap analysis defined in **01 WHY**.  
-It paints a three‑year “north‑star” picture of success for the open‑source project and sets the measurable guard‑rails that every roadmap item must serve, while fostering ecosystem growth and adaptability to trends like SBOM mandates, AI‑assisted security **and transparent usage quotas**.
+```mermaid
+flowchart LR
+    A[Source / Image / Rootfs] --> B[SBOM Producer\nCycloneDX 1.6]
+    B --> C[Signer\nin‑toto Attestation + DSSE]
+    C --> D[Transparency\nSigstore Rekor - optional but RECOMMENDED]
+    D --> E[Durable Storage\nSBOMs, Attestations, Proofs]
+    E --> F[Scanner\nPkg analyzers + Entry‑trace + Layer cache]
+    F --> G[VEX Authoring\nOpenVEX + SPDX 3.0.1 relationships]
+    G --> H[Policy Gate\nOPA/Rego: allow/deny + waivers]
+    H --> I[Artifacts Store\nReports, SARIF, VEX, Audit log]
+```
+
+**Adopted standards (pinned for interoperability):**
+
+* **SBOM:** CycloneDX **1.6** (JSON/XML)
+* **Attestation & signing:** **in‑toto Attestations** (Statement + Predicate) in **DSSE** envelopes
+* **Transparency:** **Sigstore Rekor** (inclusion proofs, monitoring)
+* **Exploitability:** **OpenVEX** (statuses & justifications)
+* **Modeling & interop:** **SPDX 3.0.1** (relationships / VEX modeling)
+* **Findings interchange (optional):** SARIF for analyzer output
+
+> Pinnings are *policy*, not claims about “latest”. We may update pins via normal change control.

 ---

-##  1 North‑Star Vision Statement (2027)  
+## 3) Security Invariants (What MUST Always Hold)

-> *By mid‑2027, Stella Ops is the fastest, most‑trusted self‑hosted SBOM scanner. Developers expect vulnerability feedback in **five seconds or less**—even while the free tier enforces a transparent **{{ quota_token }} scans/day** limit with graceful waiting. The project thrives on a vibrant plug‑in marketplace, weekly community releases, transparent governance, and seamless integrations with major CI/CD ecosystems—while never breaking the five‑second promise.*
+1. **Artifact identity is content‑addressed.**
+
+   * All identities are SHA‑256 digests of immutable blobs (images, SBOMs, attestations).
+2. **Every SBOM is signed.**
+
+   * SBOMs MUST be wrapped in **in‑toto DSSE** attestations tied to the container digest.
+3. **Provenance is attached, not implied.**
+
+   * Build metadata (who/where/how) MUST ride as attestations linked by digest.
+4. **Transparency FIRST mindset.**
+
+   * Signatures/attestations SHOULD be logged to **Rekor**, and the resulting inclusion proofs SHOULD be stored with them.
+5. **Determinism & replay.**
+
+   * Scans MUST be reproducible given: input digests, scanner version, DB snapshot, and config.
+6. **Explainability.**
+
+   * Findings MUST show the *why*: package → file path → call‑stack / entrypoint (when available).
+7. **Exploitability over enumeration.**
+
+   * Risk MUST be communicated via **VEX** (OpenVEX), including **under_investigation** where appropriate.
+8. **Least privilege & minimal trust.**
+
+   * Build keys are short‑lived; scanners run on ephemeral, least‑privileged workers.
+9. **Air‑gap friendly.**
+
+   * Mirrors for vuln DBs and containers; all verification MUST work without public egress.
+10. **No hidden blockers.**
+
+    * Policy gates MUST be code‑reviewable (e.g., Rego) and auditable; waivers are attestations, not emails.
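Invariant 1 can be sketched in shell. The `blob_digest` helper below is a hypothetical name, not a Stella Ops command; it assumes GNU coreutils `sha256sum` is available and only renders a file's SHA-256 in the `sha256:<hex>` form used for image and SBOM references in this document.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the content-addressed identity of a blob.
# Assumes GNU coreutils `sha256sum`; on macOS, `shasum -a 256` is the usual substitute.
blob_digest() {
  local file="$1"
  # sha256sum prints "<hex>  <path>"; keep only the hex field.
  printf 'sha256:%s\n' "$(sha256sum "$file" | cut -d' ' -f1)"
}
```

Calling `blob_digest sbom.cdx.json` before and after a transfer gives a quick check that the stored blob is still the one that was attested; byte-identical files always produce the same identity.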

 ---

-##  2 Outcomes & Success Metrics  
+## 4) Trust Boundaries & Roles

-| KPI (community‑centric)          | Baseline Jul 2025 | Target Q2‑2026 | North‑Star 2027 |
-| -------------------------------- | ----------------- | -------------- | --------------- |
-| ⭐ Gitea / GitHub stars           | 0                 | 4 000          | 10 000          |
-| Weekly active Docker pulls       | 0                 | 1 500          | 4 000           |
-| P95 SBOM scan time (alpine)      | 5 s               | **≤ 5 s**      | **≤ 4 s**       |
-| Free‑tier scan satisfaction*     | n/a               | ≥ 90 %         | ≥ 95 %          |
-| First‑time‑contributor PRs / qtr | 0                 | 15             | 30              |
+<!-- ```mermaid
+flowchart TB
+    subgraph DevTenant[Dev Tenant]
+      SRC[Source Code]
+      CI[CI Runner]
+    end
+    subgraph SecPlatform[Security Platform]
+      SB[SBOM Service]
+      AT[Attestation Service]
+      TR[Transparency Client]
+      SCN[Scanner Pool]
+      POL[Policy Gate]
+      ST[Artifacts Store]
+    end
+    subgraph External[External/3rd‑party]
+      REG[Container Registry]
+      REK[Rekor]
+    end

-\*Measured via anonymous telemetry *opt‑in only*: ratio of successful scans to `429 QuotaExceeded` errors.
+    SRC --> CI
+    CI -->|image digest| REG
+    REG -->|pull by digest| SB
+    SB --> AT --> TR --> REK
+    AT --> ST
+    REK --> ST
+    ST --> SCN --> POL --> ST
+
+``` -->
+
+* **Build/CI:** Holds signing capability (short‑lived keys or keyless signing).
+* **Registry:** Source of truth for image bytes; access via digest only.
+* **Scanner Pool:** Ephemeral nodes; content‑addressed caches; no shared mutable state.
+* **Artifacts Store:** Immutable, WORM‑like storage for SBOMs, attestations, proofs, SARIF, VEX.

 ---

-##  3 Strategic Pillars  
+## 5) Data & Evidence We Persist

-1. **Speed First** – preserve the sub‑5 s P95 wall‑time; any feature that hurts it must ship behind a toggle or plug‑in. **Quota throttling must apply a soft 5 s delay first, so “speed first” remains true even at the limit.**  
-2. **Offline‑by‑Design** – every byte required to scan ships in public images; Internet access is optional.  
-3. **Modular Forever** – capabilities land as hot‑load plug‑ins; the monolith can split without rewrites.  
-4. **Community Ownership** – ADRs and governance decisions live in public; new maintainers elected by meritocracy.  
-5. **Zero‑Surprise Upgrades & Limits** – SemVer discipline; `main` is always installable; minor upgrades never break CI YAML **and free‑tier limits are clearly documented, with early UI warnings.**  
-6. **Ecosystem Harmony** – Prioritise integrations with popular OSS tools (e.g., Trivy extensions, BuildKit hooks) to lower adoption barriers.
+| Artifact             | MUST Persist                         | Why                          |
+| -------------------- | ------------------------------------ | ---------------------------- |
+| SBOM (CycloneDX 1.6) | Raw file + DSSE attestation          | Reproducibility, audit       |
+| in‑toto Statement    | Full JSON                            | Traceability                 |
+| Rekor entry          | UUID + inclusion proof               | Tamper‑evidence              |
+| Scanner output       | SARIF + raw notes                    | Triage & tooling interop     |
+| VEX                  | OpenVEX + links to findings          | Noise reduction & compliance |
+| Policy decisions     | Input set + decision + rule versions | Governance & forensics       |
+
+Retention follows our Compliance policy; default **≥ 18 months**.

 ---

-##  4 Road‑map Themes (18‑24 months)  
+## 6) Scanner Requirements (Determinism & Explainability)

-| Horizon            | Theme                   | Example EPIC                                                                                                                       |
-| ------------------ | ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
-| **Q3‑2025** (3 mo) | **Core Stability & UX** | One‑command installer; dark‑mode UI; baseline SBOM scanning; **Free‑tier Quota Service ({{ quota_token }}  scans/day, early banner, wait‑wall).** |
-| 6–12 mo            | *Extensibility*         | Scan‑service micro‑split PoC; community plugin marketplace beta.                                                                   |
-| 12–18 mo           | *Ecosystem*             | Community plug‑in marketplace launch; integrations with Syft and Harbor.                                                           |
-| 18–24 mo           | *Resilience & Scale*    | Redis Cluster auto‑sharding; AI‑assisted triage plugin framework.                                                                  |
-
-*(Granular decomposition lives in 25_LEDGER.md.)
+* **Inputs pinned:** image digest(s), SBOM(s), scanner version, vuln DB snapshot date, config hash.
+* **Explainability:** show file paths, package coords (e.g., purl), and—when possible—**entry‑trace/call‑stack** from executable entrypoints to vulnerable symbol(s).
+* **Caching:** content‑addressed per‑layer & per‑ecosystem caches; warming does not change decisions.
+* **Unknowns:** output **under_investigation** where exploitability is not yet known; roll into VEX.
+* **Interchange:** emit **SARIF** for IDE and pipeline consumption (optional but recommended).
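One way to make the "inputs pinned" requirement concrete is to derive a single replay key from the pinned inputs, so two scans with the same key are expected to yield the same findings. The sketch below is an assumption for illustration: `replay_key`, its argument order, and the `scanner=`/`db=`/`config=`/`input=` line format are invented here and are not part of any Stella Ops tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive one key from the pinned scan inputs.
# Arguments: scanner version, DB snapshot date, config hash, then one or more input digests.
replay_key() {
  local scanner_version="$1" db_snapshot="$2" config_hash="$3"
  shift 3
  {
    printf 'scanner=%s\ndb=%s\nconfig=%s\n' "$scanner_version" "$db_snapshot" "$config_hash"
    # Sort the digests so argument order does not change the key.
    printf 'input=%s\n' "$@" | LC_ALL=C sort
  } | sha256sum | cut -d' ' -f1
}
```

Storing such a key next to each report would let an auditor confirm that a replayed scan used the same scanner version, DB snapshot, config, and input digests before comparing results.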

 ---

-##  5 Stakeholder Personas & Benefits  
+## 7) Policy Gate (OPA/Rego) — Examples

-| Persona               | Core Benefit                                                     |
-| --------------------- | ---------------------------------------------------------------- |
-| Solo OSS maintainer   | Laptop scans in **≤ 5 s**; zero cloud reliance.                  |
-| CI Platform Engineer  | Single‑binary backend + Redis; stable YAML integrations.         |
-| Security Auditor      | AGPL code, traceable CVE sources, reproducible benchmarks.       |
-| Community Contributor | Plugin hooks and good‑first issues; merit‑based maintainer path. |
-| Budget‑conscious Lead | Clear **{{ quota_token }} scans/day** allowance before upgrades are required.  |
+> Gate runs after scan + VEX merge. It treats VEX as first‑class input.

-(See **01 WHY §3** for detailed pain‑points & evidence.)
+### 7.1 Deny unreconciled criticals that are exploitable
+
+```rego
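+package stella.policy
+
+# rego.v1 enables the `if`, `in`, and `contains` keywords (OPA 0.59+ and 1.x).
+import rego.v1
+
+default allow := false
+
+# A finding is exploitable when it is critical and marked as affected.
+exploitable(v) if {
+  v.severity == "CRITICAL"
+  v.exploitability == "affected"
+}
+
+allow if {
+  not exploitable_some
+}
+
+# True when at least one exploitable finding has no matching waiver.
+exploitable_some if {
+  some v in input.findings
+  exploitable(v)
+  not waived(v.id)
+}
+
+# A waiver is a VEX entry for the same vulnerability with a non-empty justification.
+waived(id) if {
+  some w in input.vex
+  w.vuln_id == id
+  w.status == "not_affected"
+  w.justification != ""
+}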
+```
+
+### 7.2 Require Rekor inclusion for attestations
+
+```rego
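+package stella.policy
+
+# rego.v1 enables the `if`, `in`, and `contains` keywords (OPA 0.59+ and 1.x).
+import rego.v1
+
+# Collect one message per attestation that has no Rekor inclusion proof.
+violation contains msg if {
+  some a in input.attestations
+  not a.rekor.inclusion_proof
+  msg := sprintf("Attestation %s lacks Rekor inclusion proof", [a.id])
+}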
+```
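The two rules above read `input.findings`, `input.vex`, and `input.attestations`. A minimal input document with those fields can be sketched as follows; only the field names come from the rules, while every value (the CVE, the attestation id `att-1`, the proof placeholder) is hypothetical.

```bash
#!/usr/bin/env bash
# Write a minimal, hypothetical gate input; only the field names come from the rules above.
cat > gate-input.json <<'EOF'
{
  "findings": [
    { "id": "CVE-2025-0001", "severity": "CRITICAL", "exploitability": "affected" }
  ],
  "vex": [
    {
      "vuln_id": "CVE-2025-0001",
      "status": "not_affected",
      "justification": "vulnerable_code_not_in_execute_path"
    }
  ],
  "attestations": [
    { "id": "att-1", "rekor": { "inclusion_proof": "PLACEHOLDER_PROOF" } }
  ]
}
EOF
```

With this input the critical finding is matched by a VEX waiver and the attestation carries a proof, so `allow` should evaluate to true and `violation` should be empty. Removing the `vex` entry should flip `allow` to false.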

 ---

-##  6 Non‑Goals (2025‑2027)  
+## 8) Version Pins & Compatibility

-* Multi‑tenant SaaS offering.  
-* Automated “fix PR” generation.  
-* Proprietary compliance certifications (left to downstream distros).  
-* Windows **container** scanning (agents only).  
+| Domain       | Standard       | Stella Pin       | Notes                                            |
+| ------------ | -------------- | ---------------- | ------------------------------------------------ |
+| SBOM         | CycloneDX      | **1.6**          | JSON or XML accepted; JSON preferred             |
+| Attestation  | in‑toto        | **Statement v1** | Predicates per use case (e.g., sbom, provenance) |
+| Envelope     | DSSE           | **v1**           | Canonical JSON payloads                          |
+| Transparency | Sigstore Rekor | **API stable**   | Inclusion proof stored alongside artifacts       |
+| VEX          | OpenVEX        | **spec current** | Map to SPDX 3.0.1 relationships as needed        |
+| Interop      | SPDX           | **3.0.1**        | Use for modeling & cross‑ecosystem exchange      |
+| Findings     | SARIF          | **2.1.0**        | Optional but recommended                         |

 ---

-##  7 Review & Change Process  
+## 9) Minimal CLI Playbook (Illustrative)

-* **Cadence:** product owner leads a public Vision review every **2 sprints (≈ 1 quarter)**.  
-* **Amendments:** material changes require PR labelled `type:vision` + two maintainer approvals.  
-* **Versioning:** bump patch for typo, minor for KPI tweak, major if North‑Star statement shifts.  
-* **Community Feedback:** Open GitHub Discussions for input; incorporate top‑voted suggestions quarterly.
+> Commands below are illustrative; wire them into CI with short‑lived credentials.
+
+```bash
+# 1) Produce SBOM (CycloneDX 1.6) from image digest
+syft registry:5000/myimg@sha256:... -o cyclonedx-json > sbom.cdx.json
+
+# 2) Create in‑toto DSSE attestation bound to the image digest
+cosign attest --predicate sbom.cdx.json \
+  --type https://stella-ops.org/attestations/sbom/1 \
+  --key env://COSIGN_KEY \
+  registry:5000/myimg@sha256:...
+
+# 3) (Optional but recommended) Rekor transparency
+cosign sign --key env://COSIGN_KEY registry:5000/myimg@sha256:...
+cosign verify-attestation --type ... --certificate-oidc-issuer https://token.actions... registry:5000/myimg@sha256:... > rekor-proof.json
+
+# 4) Scan (pinned DB snapshot)
+stella-scan --image registry:5000/myimg@sha256:... \
+  --sbom sbom.cdx.json \
+  --db-snapshot 2025-10-01 \
+  --out findings.sarif
+
+# 5) Emit VEX
+stella-vex --from findings.sarif --policy vex-policy.yaml --out vex.json
+
+# 6) Gate
+opa eval -i gate-input.json -d policy/ -f pretty "data.stella.policy.allow"
+```

 ---

+## 10) JSON Skeletons (Copy‑Ready)
+
+### 10.1 in‑toto Statement (DSSE payload)
+
+```json
+{
+  "_type": "https://in-toto.io/Statement/v1",
+  "subject": [
+    {
+      "name": "registry:5000/myimg",
+      "digest": { "sha256": "IMAGE_DIGEST_SHA256" }
+    }
+  ],
+  "predicateType": "https://stella-ops.org/attestations/sbom/1",
+  "predicate": {
+    "sbomFormat": "CycloneDX",
+    "sbomVersion": "1.6",
+    "mediaType": "application/vnd.cyclonedx+json",
+    "location": "sha256:SBOM_BLOB_SHA256"
+  }
+}
+```
+
+### 10.2 DSSE Envelope (wrapping the Statement)
+
+```json
+{
+  "payloadType": "application/vnd.in-toto+json",
+  "payload": "BASE64URL_OF_CANONICAL_STATEMENT_JSON",
+  "signatures": [
+    {
+      "keyid": "KEY_ID_OR_CERT_ID",
+      "sig": "BASE64URL_SIGNATURE"
+    }
+  ]
+}
+```
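Per the DSSE specification, the signature in the envelope is not computed over the raw payload but over its pre-authentication encoding (PAE), which binds the payload type to the payload bytes. The sketch below illustrates that encoding in shell; `dsse_pae` is a hypothetical helper name and the script assumes an ASCII `payloadType`.

```bash
#!/usr/bin/env bash
# DSSE pre-authentication encoding (PAE):
#   "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body
# LEN is the byte length in ASCII decimal; the signer signs exactly these bytes.
dsse_pae() {
  local payload_type="$1" payload_file="$2"
  local body_len
  # Arithmetic expansion strips any padding that some `wc` implementations emit.
  body_len=$(( $(wc -c < "$payload_file") ))
  # ${#payload_type} counts characters, which equals bytes for ASCII media types.
  printf 'DSSEv1 %d %s %d ' "${#payload_type}" "$payload_type" "$body_len"
  cat "$payload_file"
}
```

For example, `dsse_pae application/vnd.in-toto+json statement.json` emits the byte string a signer would sign for the Statement in 10.1 (the file name is illustrative). Because the type is part of the signed bytes, a verifier cannot be tricked into interpreting the payload under a different `payloadType`.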
+
+### 10.3 OpenVEX (compact)
+
+```json
+{
+  "@context": "https://openvex.dev/ns/v0.2.0",
+  "author": "Stella Ops Security",
+  "timestamp": "2025-10-29T00:00:00Z",
+  "statements": [
+    {
+      "vulnerability": { "name": "CVE-2025-0001" },
+      "products": [{ "@id": "pkg:purl/example@1.2.3?arch=amd64" }],
+      "status": "under_investigation",
+      "status_notes": "Analysis ongoing.",
+      "timestamp": "2025-10-29T00:00:00Z"
+    }
+  ]
+}
+```
+
+---
+
+## 11) Handling “Unknowns” & Noise
+
+* Use **OpenVEX** statuses: `affected`, `not_affected`, `fixed`, `under_investigation`.
+* Prefer **justifications** over free‑text.
+* Time‑bound **waivers** are modeled as VEX with `not_affected` + justification or `affected` + compensating controls.
+* Dashboards MUST surface counts separately for `under_investigation` so risk is visible.
+
+---
+
+## 12) Operational Guidance
+
+**Key management**
+
+* Use **ephemeral OIDC** or short‑lived keys (HSM/KMS bound).
+* Rotate signer identities at least quarterly; no shared long‑term keys in CI.
+
+**Caching & performance**
+
+* Layer caches keyed by digest + analyzer version.
+* Pre‑warm vuln DB snapshots; mirror into air‑gapped envs.
+
+**Multi‑tenancy**
+
+* Strict tenant isolation for storage and compute.
+* Rate‑limit and bound memory/CPU per scan job.
+
+**Auditing**
+
+* Every decision is a record: inputs, versions, rule commit, actor, result.
+* Preserve Rekor inclusion proofs with the attestation record.
+
+---
+
+## 13) Exceptions Process (Break‑glass)
+
+1. Open a tracked exception with: artifact digest, CVE(s), business justification, expiry.
+2. Generate VEX entry reflecting the exception (`not_affected` with justification or `affected` with compensating controls).
+3. Merge into policy inputs; **policy MUST read VEX**, not tickets.
+4. Re‑review before expiry; exceptions cannot auto‑renew.
+
+---
+
+## 14) Threat Model (Abbreviated)
+
+* **Tampering**: modified SBOMs/attestations → mitigated by DSSE + Rekor + WORM storage.
+* **Confused deputy**: scanning a different image → mitigated by digest‑only pulls and subject digests in attestations.
+* **TOCTOU / re‑tagging**: registry tags drift → mitigated by digest pinning everywhere.
+* **Scanner poisoning**: unpinned DBs → mitigated by snapshotting and recording version/date.
+* **Key compromise**: long‑lived CI keys → mitigated by OIDC keyless or short‑lived KMS keys.
+
+---
+
+## 15) Implementation Checklist
+
+* [ ] SBOM producer emits CycloneDX 1.6; bound to image digest.
+* [ ] in‑toto+DSSE signing wired in CI; Rekor logging enabled.
+* [ ] Durable artifact store with WORM semantics.
+* [ ] Scanner produces explainable findings; SARIF optional.
+* [ ] OpenVEX emitted and archived; linked to findings & image.
+* [ ] Policy gate enforced; waivers modeled as VEX; decisions logged.
+* [ ] Air‑gap mirrors for registry and vuln DBs.
+* [ ] Runbooks for key rotation, Rekor outage, and database rollback.
+
+---
+
+## 16) Glossary
+
+* **SBOM**: Software Bill of Materials describing packages/components within an artifact.
+* **Attestation**: Signed statement binding facts (predicate) to a subject (artifact) using in‑toto.
+* **DSSE**: Dead Simple Signing Envelope; wraps a payload and its type together with one or more signatures, independent of the transport used.
+* **Transparency Log**: Append‑only log (e.g., Rekor) giving inclusion and temporal proofs.
+* **VEX**: Vulnerability Exploitability eXchange expressing exploitability status & justification.
+
+---
+
+
 ## 8 · Change Log

 | Version | Date        | Note (high‑level)                                                                                     |
 | ------- | ----------- | ----------------------------------------------------------------------------------------------------- |
+| v1.4    | 29-Oct-2025 | Initial principles, golden path, policy examples, and JSON skeletons.                                    |
 | v1.4    | 14‑Jul‑2025 | First public revision reflecting quarterly roadmap & KPI baseline.                                    |
 | v1.3    | 12‑Jul‑2025 | Expanded ecosystem pillar, added metrics/integrations, refined non-goals, community persona/feedback. |
 | v1.2    | 11‑Jul‑2025 | Restructured to link with WHY; merged principles into Strategic Pillars; added review §7              |
--- a/docs/dev/cartographer-graph-handshake.md
+++ b/docs/dev/cartographer-graph-handshake.md
@@ -0,0 +1,49 @@
+# Cartographer Graph Handshake Plan
+
+_Status: 2025-10-29_
+
+## Why this exists
+The Concelier/Excititor graph enrichment work (CONCELIER-GRAPH-21-001/002, EXCITITOR-GRAPH-21-001/002/005) and the merge-side coordination tasks (FEEDMERGE-COORD-02-901/902) are blocked on a clear contract with Cartographer and the Policy Engine. This document captures the minimum artefacts each guild owes so we can unblock the graph pipeline and resume implementation without re-scoping every stand-up.
+
+## Deliverables by guild
+
+### Cartographer Guild
+- **CARTO-GRAPH-21-002** (Inspector contract): publish the inspector payload schema (`graph.inspect.v1`) including the fields Cartographer needs from Concelier/Excititor (SBOM relationships, advisory/VEX linkouts, justification summaries). Target format: shared Proto/JSON schema stored under `src/Cartographer/Contracts/`.
+- **CARTO-GRAPH-21-005** (Inspector access patterns): document the query shapes Cartographer will execute (PURL → advisory, PURL → VEX statement, policy scope filters) so storage can project the right indexes/materialized views. Include sample `mongosh` queries and desired TTL/limit behaviour.
+- Provide a test harness (e.g., Postman collection or integration fixture) Cartographer will use to validate the Concelier/Excititor endpoints once they land.
+
+### Concelier Core Guild
+- Derive adjacency data from SBOM normalization as described in CONCELIER-GRAPH-21-001 (depends on `CONCELIER-POLICY-20-002`). Once Cartographer publishes the schema above, implement:
+  - Node payloads: component metadata, scopes, entrypoint annotations.
+  - Edge payloads: `contains`, `depends_on`, `provides`, provenance array.
+- **Change events (CONCELIER-GRAPH-21-002)**: define `sbom.relationship.changed` event contract with tenant + context metadata, referencing Cartographer’s filter requirements. Include event samples and replay instructions in `docs/graph/concelier-events.md`.
+- Coordinate with Cartographer on pagination/streaming expectations (page size, continuation token, retention window).
+
+### Excititor Core & Storage Guilds
+- **Inspector linkouts (EXCITITOR-GRAPH-21-001)**: expose a batched VEX/advisory lookup endpoint that accepts graph node PURLs and responds with raw document slices + justification metadata. Ensure Policy Engine scope enrichment (EXCITITOR-POLICY-20-002) feeds this response so Cartographer does not need to call multiple services.
+- **Overlay enrichment (EXCITITOR-GRAPH-21-002)**: align the overlay metadata with Cartographer’s schema once it lands (include justification summaries, document versions, and provenance).
+- **Indexes/materialized views (EXCITITOR-GRAPH-21-005)**: after Cartographer publishes query shapes, create the necessary indexes (PURL + tenant, policy scope) and document migrations in storage runbooks. Provide load testing evidence before enabling in production.
+
+### Policy Guild
+- **CONCELIER-POLICY-20-002**: publish the enriched linkset schema that powers both Concelier and Excititor payloads. Include enumerations for relationship types and scope tags.
+- Share the Policy Engine timeline for policy overlay metadata (`POLICY-ENGINE-30-001`) so Excititor can plan the overlay enrichment delivery.
+
+## Shared action items
+
+| Owner | Task | Deadline | Notes |
+|-------|------|----------|-------|
+| Cartographer | Publish inspector schema + query patterns (`CARTO-GRAPH-21-002`/`21-005`) | 2025-11-04 | Attach schema files + examples to this doc once merged. |
+| Concelier Core | Draft change-event payload with sample JSON | 2025-11-06 | Blocked until Cartographer schema lands; prepare skeleton PR in `docs/graph/concelier-events.md`. |
+| Excititor Core/Storage | Prototype batch linkout API + index design doc | 2025-11-07 | Leverage Cartographer query patterns to size indexes; include perf targets. |
+| Policy Guild | Confirm linkset enrichment fields + overlay timeline | 2025-11-05 | Needed to unblock both Concelier enrichment and Excititor overlay tasks. |
+
+## Reporting
+- Track progress in the `#cartographer-handshake` Slack thread (create once Cartographer posts the schema MR).
+- During the twice-weekly graph sync, review outstanding checklist items above and update the task notes (`TASKS.md`) so the backlog reflects real-time status.
+- Once the schema and query contracts are merged, the Concelier/Excititor teams can flip their tasks from **BLOCKED** to **DOING** and attach implementation plans referencing this document.
+
+## Appendix: references
+- `CONCELIER-GRAPH-21-001`, `CONCELIER-GRAPH-21-002` (Concelier Core task board)
+- `EXCITITOR-GRAPH-21-001`, `EXCITITOR-GRAPH-21-002`, `EXCITITOR-GRAPH-21-005` (Excititor Core/Storage task boards)
+- `CARTO-GRAPH-21-002`, `CARTO-GRAPH-21-005` (Cartographer task board)
+- `POLICY-ENGINE-30-001`, `CONCELIER-POLICY-20-002`, `EXCITITOR-POLICY-20-002` (Policy Engine roadmap)
--- a/docs/dev/java-analyzer-observation-plan.md
+++ b/docs/dev/java-analyzer-observation-plan.md
@@ -0,0 +1,37 @@
+# Java Analyzer Observation Writer Plan
+
+_Status: 2025-10-29_
+
+SCANNER-ANALYZERS-JAVA-21-008 (resolver + AOC writer) is blocked by upstream heuristics that need to settle before we can emit observation JSON. This note itemises the remaining work so the analyzer guild can sequence delivery without re-opening design discussions in every stand-up.
+
+## Prerequisite summary
+- **SCANNER-ANALYZERS-JAVA-21-004** (reflection / dynamic loader heuristics) – must emit normalized reflection edges with confidence + call-site metadata. Outstanding items: TCCL coverage for servlet containers and resource-based plugin hints. Owners: Java Analyzer Guild.
+- **SCANNER-ANALYZERS-JAVA-21-005** (framework config extraction) – required to surface Spring/Jakarta entrypoints that feed observation entrypoint metadata. Add YAML/property parsing fixtures and document reason codes (`config-spring`, `config-jaxrs`, etc.).
+- **SCANNER-ANALYZERS-JAVA-21-006** (JNI/native hints) – optional but highly recommended before observation writer so JNI edges land alongside static ones. Coordinate with native analyzer on reason codes.
+- **Advisory core** – ensure AOC writer schema (`JavaObservation.json`) is frozen before we serialise to avoid churn downstream.
+
+## Deliverables for SCANNER-ANALYZERS-JAVA-21-008
+1. **Observation projection (`JavaObservationWriter`)**
+   - Inputs: normalised workspace + analyzer outputs (classpath graph, SPI table, reflection edges, config hints, JNI hints).
+   - Outputs: deterministic JSON containing entrypoints, components, edges, warnings, provenance. Align with `docs/aoc/java-observation-schema.md` once published.
+2. **AOC guard integration**
+   - Serialize observation documents through `Scanner.Aoc` guard pipeline; add unit tests covering required fields and forbidden derived data.
+3. **Fixture updates**
+   - Expand `fixtures/lang/java/` set to include reflection-heavy app, Spring Boot sample, JNI sample, modular app. Record golden outputs with `UPDATE_JAVA_FIXTURES=1`.
+4. **Metrics & logging**
+   - Emit counters (`scanner.java.observation.edges_total`, etc.) to trace observation completeness during CI runs.
+5. **Documentation**
+   - Update `docs/scanner/java-analyzer.md` with reason code matrix and observation field definitions.
+
+## Action items
+| Owner | Task | Due | Notes |
+|-------|------|-----|-------|
+| Java Analyzer Guild | Land reflection TODOs (TCCL + resource plugin hints) | 2025-11-01 | Required for reliable dynamic edges. |
+| Java Analyzer Guild | Finish config extractor for Spring/Jakarta | 2025-11-02 | Use sample apps in `fixtures/lang/java/config-*`. |
+| Java Analyzer Guild | Draft observation writer spike PR using new schema | 2025-11-04 | PR can be draft but should include JSON schema + sample. |
+| Scanner AOC Owners | Validate observation JSON against AOC guard + schema | 2025-11-05 | Blocker for marking 21-008 as DOING. |
+| QA Guild | Prepare regression harness + performance gate (<300 ms per fat jar) | 2025-11-06 | Align with SCANNER-ANALYZERS-JAVA-21-009. |
+
+## Reporting
+- Track these checkpoints in the Java analyzer weekly sync; once prerequisites are green, flip SCANNER-ANALYZERS-JAVA-21-008 to **DOING**.
+- Store schema and sample output under `docs/scanner/java-observations/` so AOC reviewers have a stable reference.
--- a/docs/dev/normalized-rule-recipes.md
+++ b/docs/dev/normalized-rule-recipes.md
@@ -0,0 +1,94 @@
+# Normalized Version Rule Recipes
+
+_Status: 2025-10-29_
+
+This guide captures the minimum wiring required for connectors and Merge coordination tasks to finish the normalized-version rollout that unblocks FEEDMERGE-COORD-02-9xx.
+
+## 1. Quick-start checklist
+
+1. Ensure your mapper already emits `AffectedPackage.VersionRanges` (SemVer, NEVRA, EVR).  If you only have vendor/product strings, capture the raw range text before trimming so it can feed the helper.
+2. Call `SemVerRangeRuleBuilder.BuildNormalizedRules(rawRange, patchedVersion, provenance)` for each range and place the result in `AffectedPackage.NormalizedVersions`.
+3. Set a provenance note in the format `{sourceName}:{advisoryId}:{index}` (the snippet in §2 uses `MyConnectorPlugin.SourceName` as the first segment) so Merge can differentiate connector-provided rules from canonical fallbacks.
+4. Verify with `dotnet test` that the connector snapshot fixtures now include the `normalizedVersions` array and update fixtures by setting the connector-specific `UPDATE_*_FIXTURES=1` environment variable.
+5. Tail Merge logs (or the test output) for the new warning `Normalized version rules missing for {AdvisoryKey}`; an empty warning stream means the connector/merge artefacts are ready to close FEEDMERGE-COORD-02-901/902.
+
+## 2. Code snippet: SemVer connector (CCCS/Cisco/ICS-CISA)
+
+```csharp
+using StellaOps.Concelier.Normalization.SemVer;
+
+private static IReadOnlyList<AffectedPackage> BuildPackages(MyDto dto, DateTimeOffset recordedAt)
+{
+    var packages = new List<AffectedPackage>();
+
+    foreach (var entry in dto.AffectedEntries.Select((value, index) => (value, index)))
+    {
+        var rangeText = entry.value.Range?.Trim();
+        var patched = entry.value.FixedVersion;
+        var provenance = $"{MyConnectorPlugin.SourceName}:{dto.AdvisoryId}:{entry.index}";
+
+        var normalizedRules = SemVerRangeRuleBuilder.BuildNormalizedRules(rangeText, patched, provenance);
+        var primitives = SemVerRangeRuleBuilder.Build(rangeText, patched, provenance)
+            .Select(result => result.Primitive.ToAffectedVersionRange(provenance))
+            .ToArray();
+
+        packages.Add(new AffectedPackage(
+            AffectedPackageTypes.SemVer,
+            entry.value.PackageId,
+            versionRanges: primitives,
+            normalizedVersions: normalizedRules,
+            provenance: new[]
+            {
+                new AdvisoryProvenance(
+                    MyConnectorPlugin.SourceName,
+                    "package",
+                    entry.value.PackageId,
+                    recordedAt,
+                    new[] { ProvenanceFieldMasks.AffectedPackages })
+            }));
+    }
+
+    return packages;
+}
+```
+
+A few notes:
+
+- If you already have `SemVerPrimitive` instances, call `.ToNormalizedVersionRule(provenance)` on each primitive instead of rebuilding from raw strings.
+- Use `SemVerRangeRuleBuilder.BuildNormalizedRules` when the connector only tracks raw range text plus an optional fixed/patched version.
+- For products that encode ranges like `"ExampleOS 4.12 - 4.14"`, run a small regex to peel off the version substring (see §3) and use the same provenance note when emitting the rule and the original range primitive.
+
+## 3. Parsing helper for trailing version phrases
+
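+Many of the overdue connectors store affected products as natural-language phrases. The following helper extracts the numeric range fragment from common patterns (`1.2 - 1.4`, `<= 3.5`, `Version 7.2 and later`). It returns only the first match, and for `Version 7.2 and later` it yields just `7.2`, so the connector must map the trailing phrase to a lower bound itself.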
+
+```csharp
+using System.Text.RegularExpressions;
+
+private static string? TryExtractRangeSuffix(string productString)
+{
+    if (string.IsNullOrWhiteSpace(productString))
+    {
+        return null;
+    }
+
+    // Verbatim string (@"...") is required: \s and \d are not valid escapes in a regular C# string literal.
+    var match = Regex.Match(productString, @"(?<range>(?:<=?|>=?)?\s*\d+(?:\.\d+){0,2}(?:\s*-\s*\d+(?:\.\d+){0,2})?)", RegexOptions.CultureInvariant);
+    return match.Success ? match.Groups["range"].Value.Trim() : null;
+}
+```
+
+Once you extract the `range` fragment, feed it to `SemVerRangeRuleBuilder.BuildNormalizedRules(range, null, provenance)`.  Keep the original product string as-is so operators can still see the descriptive text.
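To see what the pattern extracts without building the connector, it can be approximated in shell. This is a POSIX ERE port written for illustration, not the connector code: `\d` and `\s` are replaced with bracket classes, and `extract_range_suffix` and `RANGE_RE` are hypothetical names. It assumes a `grep` that supports `-o` and `-E` (GNU grep does).

```bash
#!/usr/bin/env bash
# Approximate POSIX ERE port of the C# pattern above (an illustration, not the connector code).
RANGE_RE='(<=?|>=?)?[[:space:]]*[0-9]+(\.[0-9]+){0,2}([[:space:]]*-[[:space:]]*[0-9]+(\.[0-9]+){0,2})?'

extract_range_suffix() {
  # Print the first match, trimmed of surrounding whitespace; print nothing if there is none.
  printf '%s\n' "$1" \
    | grep -oE "$RANGE_RE" \
    | head -n 1 \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

extract_range_suffix "ExampleOS 4.12 - 4.14"   # prints: 4.12 - 4.14
extract_range_suffix "<= 3.5"                  # prints: <= 3.5
extract_range_suffix "Version 7.2 and later"   # prints: 7.2
```

The last call shows a limit of the pattern: the words "and later" are dropped, so the extracted `7.2` carries no direction on its own. Product names that contain digits before the version would also match first, which is worth covering in connector fixtures.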
+
+## 4. Merge dashboard hygiene
+
+- Run `dotnet build src/Concelier/__Libraries/StellaOps.Concelier.Merge/StellaOps.Concelier.Merge.csproj` after wiring a connector to confirm no warnings appear.
+- Merge counter tag pairs to watch in Grafana/CI logs:
+  - `concelier.merge.normalized_rules{package_type="npm"}` – increases once the connector emits normalized arrays.
+  - `concelier.merge.normalized_rules_missing{package_type="vendor"}` – should trend to zero once rollout completes.
+- The Merge service now logs `Normalized version rules missing for {AdvisoryKey}; sources=...; packageTypes=...` when a connector still needs to supply normalized rules.  Use this as the acceptance gate for FEEDMERGE-COORD-02-901/902.
+
+## 5. Documentation touchpoints
+
+- Update the connector `TASKS.md` entry with the date you flipped on normalized rules and note the provenance format you chose.
+- Record any locale-specific parsing (e.g., German `bis`) in the connector README so future contributors can regenerate fixtures confidently.
+- When opening the PR, include `dotnet test` output covering the connector tests so reviewers see the normalized array diff.
+
+Once each connector follows the steps above, we can mark FEEDCONN-CCCS-02-009, FEEDCONN-CISCO-02-009, FEEDCONN-CERTBUND-02-010, FEEDCONN-ICSCISA-02-012, and the FEEDMERGE-COORD-02-90x tasks as resolved.
--- a/docs/ingestion/aggregation-only-contract.md
+++ b/docs/ingestion/aggregation-only-contract.md
@@ -50,7 +50,7 @@
 | `content.format` | string | Source format (`CSAF`, `OSV`, etc.). |
 | `content.spec_version` | string | Upstream spec version when known. |
 | `content.raw` | object | Full upstream payload, untouched except for transport normalisation. |
-| `identifiers` | object | Normalised identifiers (`cve`, `ghsa`, `aliases`, etc.) derived losslessly from raw content. |
+| `identifiers` | object | Upstream identifiers (`cve`, `ghsa`, `aliases`, etc.) captured as provided (trimmed, order preserved, duplicates allowed). |
 | `linkset` | object | Join hints (see section 4.3). |
 | `supersedes` | string or null | Points to previous revision of same upstream doc when content hash changes. |

@@ -79,6 +79,7 @@
 Canonicalisation rules:
 - Package URLs are rendered in canonical form without qualifiers/subpaths (`pkg:type/namespace/name@version`).
 - CPE values are normalised to the 2.3 binding (`cpe:2.3:part:vendor:product:version:*:*:*:*:*:*:*`).
+- Connector mapping stages are responsible for the canonical form; ingestion trims whitespace but otherwise preserves the original order and duplicate entries so downstream policy can reason about upstream intent.

 ### 4.4 `advisory_observations`

@@ -99,10 +100,10 @@ Canonicalisation rules:
 | `content.format` / `content.specVersion` | string | Raw payload format metadata (CSAF, OSV, JSON, etc.). |
 | `content.raw` | object | Full upstream document stored losslessly (Relaxed Extended JSON). |
 | `content.metadata` | object | Optional connector-specific metadata (batch ids, hints). |
-| `linkset.aliases` | array | Normalized aliases (lower-case, sorted). |
-| `linkset.purls` | array | Normalized PURLs extracted from the document. |
-| `linkset.cpes` | array | Normalized CPE URIs. |
-| `linkset.references` | array | `{ type, url }` pairs (type lower-case). |
+| `linkset.aliases` | array | Connector-supplied aliases (trimmed, order preserved, duplicates allowed). |
+| `linkset.purls` | array | Connector-supplied PURLs (ingestion preserves order and duplicates). |
+| `linkset.cpes` | array | Connector-supplied CPE URIs (trimmed, order preserved). |
+| `linkset.references` | array | `{ type, url }` pairs (trimmed; ingestion preserves order). |
 | `createdAt` | datetime | Timestamp when Concelier persisted the observation. |
 | `attributes` | object | Optional provenance attributes keyed by connector. |