From 01f4943ab93475ffa49581e8e4d1d3603151e465 Mon Sep 17 00:00:00 2001 From: master Date: Sun, 14 Dec 2025 16:23:44 +0200 Subject: [PATCH] up --- ...5 - Define a north star metric for TTFS.md | 866 +++++++++++ ...ning the Call‑Stack Reachability Engine.md | 1258 ++++++++++++++++ ...‑Diff - Defining Meaningful Risk Change.md | 892 ++++++++++++ ... - Add a dedicated “first_signal” event.md | 1295 +++++++++++++++++ ...25 - Create a small ground‑truth corpus.md | 787 ++++++++++ ...- Dissect triage and evidence workflows.md | 551 +++++++ ...ate PostgreSQL vs MongoDB for StellaOps.md | 544 +++++++ .../archived/AR-REVIVE-PLAN.md | 12 - 8 files changed, 6193 insertions(+), 12 deletions(-) create mode 100644 docs/product-advisories/13-Dec-2025 - Define a north star metric for TTFS.md create mode 100644 docs/product-advisories/13-Dec-2025 - Designing the Call‑Stack Reachability Engine.md create mode 100644 docs/product-advisories/13-Dec-2025 - Smart‑Diff - Defining Meaningful Risk Change.md create mode 100644 docs/product-advisories/14-Dec-2025 - Add a dedicated “first_signal” event.md create mode 100644 docs/product-advisories/14-Dec-2025 - Create a small ground‑truth corpus.md create mode 100644 docs/product-advisories/14-Dec-2025 - Dissect triage and evidence workflows.md create mode 100644 docs/product-advisories/14-Dec-2025 - Evaluate PostgreSQL vs MongoDB for StellaOps.md delete mode 100644 docs/product-advisories/archived/AR-REVIVE-PLAN.md diff --git a/docs/product-advisories/13-Dec-2025 - Define a north star metric for TTFS.md b/docs/product-advisories/13-Dec-2025 - Define a north star metric for TTFS.md new file mode 100644 index 000000000..df3a177a1 --- /dev/null +++ b/docs/product-advisories/13-Dec-2025 - Define a north star metric for TTFS.md @@ -0,0 +1,866 @@ +Here’s a simple, battle‑tested way to make your UX feel fast under pressure: treat **Time‑to‑First‑Signal (TTFS)** as a product SLO and design everything backwards from it. 
+ +--- + +# TTFS SLO: the idea in one line + +Guarantee **p50 < 2s, p95 < 5s** from user action (or CI event) to the **first meaningful signal** (status, cause, or next step)—fast enough to calm triage, short enough to be felt. + +--- + +## What counts as “First Signal”? + +* A clear, human message like: “Scan started; last error matched: `NU1605` (likely transitive). Retry advice →” +* Or a progress token with context: “Queued (ETA ~18s). Cached reachability graph loaded.” + +Not a spinner. Not 0% progress. A real, decision‑shaping hint. + +--- + +## Budget the pipeline backwards (guardrails) + +* **Frontend (≤150 ms):** render instant skeleton + last known state; optimistic UI; no blocking on fresh data. +* **Edge/API (≤250 ms):** return a “signal frame” fast path (status + last error signature + cached ETA) from cache. +* **Core services (≤500–1500 ms):** pre‑index failures, warm reachability summaries, enqueue heavy work, emit stream token. +* **Slow work (async):** full scan, lattice policy merge, provenance trails—arrive later via push updates. + +--- + +## Minimal implementation (1–2 sprints) + +1. **Define the signal contract** + + * `FirstSignal { kind, verb, scope, lastKnownOutcome?, ETA?, nextAction? }` + * Version it; keep it <1 KB; always return within the SLO window. + +2. **Cache last error signature** + + * Key: `(repo, branch|imageDigest, toolchain-hash)` + * Value: `{errorCode, excerpt, fixLink, firstSeenAt, hitCount}` + * Evict by LRU + TTL (e.g., 7–14 days). Use Valkey in default profile; Postgres JSONB in air‑gap. + +3. **Pre‑index the failing step** + + * When a job fails, extract and store: + + * normalized step id (e.g., `scanner:deps-restore`) + * top 1–3 error tokens (codes, regex’d phrases) + * minimal context (package id, version range) + * Write a tiny **“failure indexer”** that runs in‑band on failure and out‑of‑band on success. + +4. 
**Lazy‑load everything else** + + * UI shows FirstSignal + “Details loading…” + * Fetch heavy panes (full CVE list, call‑graph, SBOM diff) after paint. + +5. **Fast path endpoint** + + * `GET /signal/{jobId}` returns from cache or snapshot table. + * If cache miss: fall back to “cold signal” (`queued`, basic ETA) and **immediately** enqueue warmup tasks. + +6. **Streaming updates** + + * Emit compact deltas: `status:started → status:analyzing → triage:blocked(POLICY_X)` etc. + * UI subscribes; CI annotates with the same tokens. + +--- + +## TTFS SLO monitor (keep it honest) + +* Emit for every user‑visible action: `ttfs_ms`, `path` (UI|CLI|CI), `signal_kind`, `cache_hit` (T/F). +* Track **p50/p95** by surface and by repo size. +* Page on **p95 > 5s** for 5 consecutive minutes (or >2% of traffic). +* Store exemplars (trace ids) to replay slow paths. + +--- + +## Stella Ops–specific hooks (drop‑in) + +* **Scanner.WebService:** on job accept, write `FirstSignal{kind:"queued", ETA}`; if failure index has a hit, attach `lastKnownOutcome`. +* **Feedser/Vexer:** publish “known criticals changed since last run” as a hint in FirstSignal. +* **Policy Engine:** pre‑evaluate “obvious blocks” (e.g., banned license) and surface as `nextAction:"toggle waiver or update license map"`. +* **Air‑gapped profile:** skip Valkey; keep a `first_signal_snapshots` Postgres table + NOTIFY/LISTEN for streaming. + +--- + +## UX micro‑rules + +* **Never show a spinner alone**; always pair with a sentence or chip (“Warm cache found; verifying”). +* **3 taps max** to reach evidence: Button → FirstSignal → Evidence card. +* **Always include a next step** (“Retry with `--ignore NU1605` is unsafe; use `PackageReference` pin → link”). + +--- + +## Quick success criteria + +* New incident claims: “I knew what was happening within 2 seconds.” +* CI annotates within 5s on p95. +* Support tickets referencing “stuck scans” drop ≥40%. 
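The fast‑path resolution order described above (cache → snapshot → cold signal + immediate warm‑up) can be sketched in a few lines. The type and store names below are illustrative assumptions, not Stella Ops APIs:

```typescript
// Hypothetical FirstSignal shape and storage interface; field names are assumptions.
interface FirstSignal {
  kind: "queued" | "started" | "blocked" | "failed" | "succeeded";
  summary: string;
  etaSeconds?: number;
  cacheHit: boolean;
}

interface SignalStore {
  readCache(jobId: string): FirstSignal | undefined;    // Valkey in default profile
  readSnapshot(jobId: string): FirstSignal | undefined; // Postgres snapshot table
  enqueueWarmup(jobId: string): void;                   // non-blocking warm-up task
}

// Fast path: never block on scan work; always return something decision-shaping.
function getSignal(store: SignalStore, jobId: string): FirstSignal {
  const cached = store.readCache(jobId);
  if (cached) return { ...cached, cacheHit: true };

  const snapshot = store.readSnapshot(jobId);
  if (snapshot) return { ...snapshot, cacheHit: false };

  // Cache and snapshot miss: serve a cold signal and kick off warm-up immediately.
  store.enqueueWarmup(jobId);
  return { kind: "queued", summary: "Queued. Preparing scan…", cacheHit: false };
}
```

The point of the sketch is the ordering: the endpoint degrades from cached to snapshot to cold, but every branch returns within the SLO window.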
+ +--- + +If you want, I can turn this into a ready‑to‑paste **TASKS.md** (owners, DOD, metrics, endpoints, DB schemas) for your Stella Ops repos. +````md +# TASKS.md — TTFS (Time‑to‑First‑Signal) Fast Signal + Progressive Updates + +> Paste this file into the repo root (or `/docs/TTFS/TASKS.md`). +> This plan is structured as two sprints (A + B) with clear owners, dependencies, and DoD. + +--- + +## 0) Product SLO and non‑negotiables + +### SLO +- **TTFS p50 < 2s, p95 < 5s** +- Applies to: **Web UI**, **CLI**, **CI annotations** +- TTFS = time from **user action / CI start** → **first meaningful signal rendered/logged** + +### What counts as “First Signal” +A First Signal must include at least one of: +- Status + context (“Queued, ETA ~18s”; “Started, phase: restore”; “Blocked by policy XYZ”) +- Known cause hint (error token/code/category) +- Next action (open logs, docs link, retry command) + +A spinner alone does **not** count. + +### Hard constraints +- `/jobs/{id}/signal` must **never block** on full scan work +- FirstSignal payload in normal cases **< 1KB** +- **No secrets** in snapshots, excerpts, telemetry + +--- + +## 1) Scope and module owners + +### Modules (assumed) +- **Scanner.WebService** (job API + signal provider) +- **Scanner.Worker** (phase changes + event publishing) +- **Policy Engine** (block reasons + quick pre-eval hooks) +- **Feedser/Vexer** (optional: “critical changed” hint) +- **Web UI** (progressive rendering + streaming) +- **CLI** (first signal + streaming) +- **CI Integration** (checks/annotations) +- **Platform/Observability** (metrics, dashboards, alerts) +- **Security/Compliance** (redaction + tenant isolation) + +### Owners (replace with actual people/teams) +- **Backend Lead:** @be-owner +- **Frontend Lead:** @fe-owner +- **DevEx/CLI Lead:** @dx-owner +- **CI Integrations Lead:** @ci-owner +- **SRE/Obs Lead:** @sre-owner +- **Security Lead:** @sec-owner +- **PM:** @pm-owner + +--- + +## 2) Canonical contract: FirstSignal v1.0 + 
+### FirstSignal shape (canonical) +All surfaces (UI/CLI/CI) must be representable via this contract. + +```json +{ + "version": "1.0", + "signalId": "sig_...", + "jobId": "job_...", + + "timestamp": "2025-12-14T18:22:31.014Z", + "kind": "queued|started|phase|blocked|failed|succeeded|canceled|unavailable", + "phase": "resolve|fetch|restore|analyze|policy|report|unknown", + + "scope": { "type": "repo|image|artifact", "id": "org/repo@branch-or-digest" }, + + "summary": "Queued (ETA ~18s). Last failure matched: NU1605 (dependency downgrade).", + "etaSeconds": 18, + + "lastKnownOutcome": { + "signatureId": "sigerr_...", + "errorCode": "NU1605", + "token": "dependency-downgrade", + "excerpt": "Detected package downgrade: ...", + "confidence": "low|medium|high", + "firstSeenAt": "2025-12-01T00:00:00Z", + "hitCount": 14 + }, + + "nextActions": [ + { "type": "open_logs|open_job|docs|retry|cli_command", "label": "Open logs", "target": "/jobs/job_.../logs" } + ], + + "diagnostics": { + "cacheHit": true, + "source": "snapshot|failure_index|cold_start", + "correlationId": "corr_..." + } +} +```` + +### Contract rules + +* Must always include: `version`, `jobId`, `timestamp`, `kind`, `summary` +* Keep normal payload < 1KB (enforce excerpt max length; avoid lists) +* Never include secrets; excerpts must be redacted + +--- + +## 3) Milestones + +### Sprint A — “TTFS Baseline” + +Goal: Always show **some** meaningful First Signal quickly. + +Deliverables: + +* Snapshot persistence (DB) + optional cache +* `/jobs/{id}/signal` fast path +* UI skeleton + immediate FirstSignal rendering (poll fallback OK) +* Base telemetry: `ttfs_ms`, endpoint latency, cache hit + +### Sprint B — “Smart Hints + Streaming” + +Goal: First Signal is helpful and updates live. 
+ +Deliverables: + +* Failure signature indexer + lookup +* SSE events (or WebSocket) for incremental updates +* CLI streaming + CI annotations +* Dashboards + alerts + exemplars/traces +* Redaction hardening and tenant isolation validation + +--- + +## 4) Sprint A tasks — TTFS baseline + +### A1 — Implement FirstSignal types and helpers (shared package) + +**Owner:** @be-owner +**Depends on:** none +**Est:** 2–4 pts + +**Tasks** + +* [ ] Define FirstSignal v1.0 schema in a shared package (`/common/contracts/firstsignal`) +* [ ] Add validators: + + * [ ] required fields present + * [ ] size limits (excerpt length; total serialized bytes threshold warning) + * [ ] allowed enums for kind/phase +* [ ] Add builders: + + * [ ] `buildQueuedSignal(job, eta?)` + * [ ] `buildColdSignal(job)` + * [ ] `mergeHint(signal, lastKnownOutcome)` + * [ ] `addNextActions(signal, actions[])` + +**DoD** + +* Contract is versioned, unit-tested, and used by backend endpoint +* Validation rejects/flags invalid signals in tests + +--- + +### A2 — Snapshot storage: `first_signal_snapshots` table + migrations + +**Owner:** @be-owner +**Depends on:** A1 +**Est:** 3–5 pts + +**Tasks** + +* [ ] Add Postgres migration for `first_signal_snapshots` +* [ ] Implement CRUD: + + * [ ] `createSnapshot(jobId, signal)` + * [ ] `updateSnapshot(jobId, partialSignal)` (phase transitions) + * [ ] `getSnapshot(jobId)` +* [ ] Enforce: + + * [ ] `payload_json` size guard (soft warn + hard cap via excerpt limit) + * [ ] `updated_at` maintained automatically + +**Suggested schema** + +```sql +CREATE TABLE first_signal_snapshots ( + job_id TEXT PRIMARY KEY, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + kind TEXT NOT NULL, + phase TEXT NOT NULL, + summary TEXT NOT NULL, + eta_seconds INT NULL, + payload_json JSONB NOT NULL +); +CREATE INDEX ON first_signal_snapshots (updated_at DESC); +``` + +**DoD** + +* Migration included +* Integration test: create job → 
snapshot exists within request lifecycle (or best-effort async write + immediate cold response) + +--- + +### A3 — Cache layer (default profile) with Postgres fallback + +**Owner:** @be-owner +**Depends on:** A2 +**Est:** 3–6 pts + +**Tasks** + +* [ ] Add optional Valkey/Redis support: + + * [ ] key: `signal:job:{jobId}` TTL: 24h + * [ ] read-through cache on `/signal` + * [ ] write-through on snapshot updates +* [ ] Air-gapped mode behavior: + + * [ ] cache disabled → read/write snapshots in Postgres only +* [ ] Add config toggles: + + * [ ] `TTFS_CACHE_BACKEND=valkey|postgres|none` + * [ ] `TTFS_CACHE_TTL_SECONDS=86400` + +**DoD** + +* With cache enabled: `/signal` p95 latency meets budget in load test +* With cache disabled: correctness remains; p95 within acceptable baseline + +--- + +### A4 — `/jobs/{jobId}/signal` fast-path endpoint + +**Owner:** @be-owner +**Depends on:** A2, A3 +**Est:** 4–8 pts + +**Tasks** + +* [ ] Implement `GET /jobs/{jobId}/signal` + + * [ ] Try cache snapshot + * [ ] Else DB snapshot + * [ ] Else cold signal (`kind=queued`, `phase=unknown`, summary “Queued. 
Preparing scan…”) + * [ ] Best-effort snapshot write if missing (non-blocking) +* [ ] Response headers: + + * [ ] `X-Correlation-Id` + * [ ] `Cache-Status: hit|miss|bypass` +* [ ] Add server-side timing logs (debug-level) for: + + * [ ] cache read time + * [ ] db read time + * [ ] cold path time + +**Performance budget** + +* Cache-hit response: **p95 ≤ 250ms** +* Cold response: **p95 ≤ 500ms** + +**DoD** + +* Endpoint never blocks on scan work +* Returns a valid FirstSignal every time job exists +* Load test demonstrates budgets + +--- + +### A5 — Create snapshot at job creation and update on phase changes + +**Owner:** @be-owner + @worker-owner +**Depends on:** A2 +**Est:** 5–8 pts + +**Tasks** + +* [ ] In `POST /jobs`: + + * [ ] Immediately write initial snapshot: + + * `kind=queued` + * `phase=unknown` + * summary includes “Queued” and optional ETA +* [ ] In worker: + + * [ ] When job starts: update snapshot to `kind=started`, `phase=resolve|fetch|restore…` + * [ ] On phase transitions: update snapshot + * [ ] On terminal: `kind=succeeded|failed|canceled` +* [ ] Ensure updates are idempotent and safe (replays) + +**DoD** + +* For any started job, snapshot shows phase changes within a few seconds +* Terminal kind always correct + +--- + +### A6 — UI: Immediate “First Signal” rendering with polling fallback + +**Owner:** @fe-owner +**Depends on:** A4 +**Est:** 6–10 pts + +**Tasks** + +* [ ] On scan trigger: + + * [ ] Render skeleton + “Preparing scan…” message (no spinner-only) + * [ ] Call `POST /jobs` (get jobId) + * [ ] Immediately call `GET /jobs/{jobId}/signal` + * [ ] Render summary + at least one next action button (Open job/logs) +* [ ] Poll fallback: + + * [ ] If streaming not available yet (Sprint A), poll `/signal` every 2–5s until terminal +* [ ] Lazy-load heavy panels (must not block First Signal): + + * [ ] vulnerability list + * [ ] dependency graph + * [ ] SBOM diff + +**DoD** + +* Real user monitoring shows UI TTFS p50 < 2s, p95 < 5s for the 
baseline path +* No spinner-only states + +--- + +### A7 — Telemetry: baseline metrics and tracing + +**Owner:** @sre-owner + @be-owner + @fe-owner +**Depends on:** A4, A6 +**Est:** 5–10 pts + +**Metrics** + +* [ ] `ttfs_ms` (emitted client-side for UI; server-side for CLI/CI if needed) + + * tags: `surface=ui|cli|ci`, `cache_hit=true|false`, `signal_source=snapshot|cold_start`, `kind`, `repo_size_bucket` +* [ ] `signal_endpoint_latency_ms` +* [ ] `signal_payload_bytes` +* [ ] `signal_error_rate` + +**Tracing** + +* [ ] Correlation id propagated: + + * [ ] API response header + * [ ] worker logs + * [ ] events (Sprint B) + +**Dashboards** + +* [ ] TTFS p50/p95 by surface +* [ ] cache hit rate +* [ ] endpoint latency percentiles + +**DoD** + +* Metrics visible in dashboard +* Correlation ids make it possible to trace slow examples end-to-end + +--- + +## 5) Sprint B tasks — smart hints + streaming + +### B1 — Failure signature extraction + redaction library + +**Owner:** @be-owner + @sec-owner +**Depends on:** A1 +**Est:** 6–12 pts + +**Tasks** + +* [ ] Implement redaction utility (unit-tested): + + * [ ] strip bearer tokens, API keys, access tokens, private URLs + * [ ] cap excerpt length (e.g., 240 chars) + * [ ] normalize whitespace +* [ ] Implement signature extraction from: + + * [ ] structured step errors (preferred) + * [ ] raw logs (fallback) via regex ruleset +* [ ] Map to: + + * `errorCode` (if present) + * `token` (normalized category) + * `confidence` (high/med/low) + +**DoD** + +* Redaction unit tests include “known secret-like patterns” +* Extraction produces stable tokens for top failure families + +--- + +### B2 — Failure signature storage: `failure_signatures` table + upsert on failures + +**Owner:** @be-owner +**Depends on:** B1 +**Est:** 5–10 pts + +**Tasks** + +* [ ] Add Postgres migration for `failure_signatures` +* [ ] Implement lookup key: + + * `(scope_type, scope_id, toolchain_hash)` +* [ ] On job failure: + + * [ ] extract signature → 
redaction → upsert + * [ ] increment hit_count; update last_seen_at +* [ ] Retention: + + * [ ] TTL job: delete signatures older than 14 days (configurable) + * [ ] or retain last N signatures per scope + +**Suggested schema** + +```sql +CREATE TABLE failure_signatures ( + signature_id TEXT PRIMARY KEY, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + scope_type TEXT NOT NULL, + scope_id TEXT NOT NULL, + toolchain_hash TEXT NOT NULL, + error_code TEXT NULL, + token TEXT NOT NULL, + excerpt TEXT NULL, + confidence TEXT NOT NULL, + first_seen_at TIMESTAMPTZ NOT NULL, + last_seen_at TIMESTAMPTZ NOT NULL, + hit_count INT NOT NULL DEFAULT 1 +); +CREATE INDEX ON failure_signatures (scope_type, scope_id, toolchain_hash); +CREATE INDEX ON failure_signatures (token); +``` + +**DoD** + +* Failure runs populate signatures +* Excerpts are redacted and capped +* Retention job verified + +--- + +### B3 — Enrich FirstSignal with “lastKnownOutcome” hint + +**Owner:** @be-owner +**Depends on:** B2 +**Est:** 3–6 pts + +**Tasks** + +* [ ] On `/signal` (fast path): + + * [ ] if snapshot exists but has no hint, attempt signature lookup by scope+toolchain hash + * [ ] merge hint into signal + * [ ] include `diagnostics.source=failure_index` when used +* [ ] Add “next actions” for common tokens: + + * [ ] docs link for known error codes/tokens + * [ ] “open logs” always present + +**DoD** + +* For scopes with prior failures, FirstSignal includes hint within SLO budgets + +--- + +### B4 — Streaming updates via SSE (recommended) + +**Owner:** @be-owner + @worker-owner + @fe-owner +**Depends on:** A5 +**Est:** 8–16 pts + +**Backend tasks** + +* [ ] Add `GET /jobs/{jobId}/events` SSE endpoint +* [ ] Define event payloads: + + * `status` (kind+phase+message) + * `hint` (token+errorCode+confidence) + * `policy` (blocked + policyId) + * `complete` (terminal) +* [ ] Worker publishes events at: + + * start + * phase transitions + * policy 
decision + * terminal +* [ ] Ensure reconnect safety: + + * [ ] event id monotonic or timestamp + * [ ] optional replay window (last N events in memory or DB) + +**Frontend tasks** + +* [ ] Subscribe after jobId known +* [ ] Update FirstSignal UI in-place on deltas +* [ ] Fallback to polling when SSE fails + +**DoD** + +* UI updates without refresh +* Event stream doesn’t spam (3–8 meaningful events per job typical) +* SSE failure degrades gracefully + +--- + +### B5 — Policy Engine: “obvious block” pre-eval for early signal + +**Owner:** @be-owner + @policy-owner +**Depends on:** B4 (optional), or can enrich snapshot directly +**Est:** 5–10 pts + +**Tasks** + +* [ ] Add a quick pre-evaluation hook for high-signal blocks: + + * banned license + * disallowed package + * org-level denylist +* [ ] Emit early policy event or update snapshot: + + * `kind=blocked`, `phase=policy`, summary names the policy + * next action points to waiver/docs (if supported) + +**DoD** + +* When an obvious block is present, users see it in FirstSignal without waiting for full analysis + +--- + +### B6 — CLI: First Signal + streaming + +**Owner:** @dx-owner +**Depends on:** A4, B4 +**Est:** 5–10 pts + +**Tasks** + +* [ ] Ensure CLI prints FirstSignal within TTFS budget +* [ ] Add `--follow` default behavior: + + * connect to SSE and stream deltas +* [ ] Provide minimal, non-spammy output: + + * only on meaningful transitions +* [ ] Print correlation id for support triage + +**DoD** + +* CLI TTFS p50 < 2s, p95 < 5s +* Streaming works and degrades to polling + +--- + +### B7 — CI annotations/checks: initial First Signal within 5s p95 + +**Owner:** @ci-owner +**Depends on:** A4, B4 (optional) +**Est:** 6–12 pts + +**Tasks** + +* [ ] On CI job start: + + * [ ] call `/signal` and publish check/annotation with summary + job link +* [ ] Update annotations only on state changes: + + * queued → started + * started → blocked/failed/succeeded +* [ ] Avoid annotation spam (max 3–5 updates) + +**DoD** 
+ +* CI shows actionable first message within 5s p95 +* Updates are minimal and meaningful + +--- + +### B8 — Observability: SLO alerts + exemplars + +**Owner:** @sre-owner +**Depends on:** A7 +**Est:** 5–10 pts + +**Tasks** + +* [ ] Alerts: + + * [ ] page when `p95(ttfs_ms) > 5000` for 5 mins + * [ ] page when `signal_endpoint_error_rate > 1%` +* [ ] Add exemplars / trace links on slow TTFS samples +* [ ] Add breakdown dashboard: + + * surface (ui/cli/ci) + * cacheHit + * repo size bucket + * kind/phase + +**DoD** + +* On-call can diagnose slow TTFS with one click to traces/logs + +--- + +## 6) Cross-cutting: security, privacy, and tenancy + +### S1 — Tenant-safe caching and lookups + +**Owner:** @sec-owner + @be-owner +**Depends on:** A3, B2 +**Est:** 3–6 pts + +**Tasks** + +* [ ] Ensure cache keys include tenant/org boundary where applicable: + + * `tenant:{tenantId}:signal:job:{jobId}` +* [ ] Ensure failure signatures are only looked up within same tenant +* [ ] Add tests for cross-tenant leakage + +**DoD** + +* No cross-tenant access possible via cache or signature index + +--- + +### S2 — No secrets policy enforcement + +**Owner:** @sec-owner +**Depends on:** B1 +**Est:** 2–5 pts + +**Tasks** + +* [ ] Add “secret scanning” unit tests for redaction +* [ ] Add runtime guardrails: + + * if excerpt contains forbidden patterns → replace with “[redacted]” +* [ ] Ensure telemetry attributes never include excerpts + +**DoD** + +* Security review sign-off for snapshot + signature + telemetry + +--- + +## 7) Global Definition of Done + +A feature is “done” only when: + +* [ ] Meets TTFS SLO in staging load test and in production RUM (within agreed rollout window) +* [ ] Has: + + * [ ] unit tests + * [ ] integration tests + * [ ] basic load test coverage for `/signal` +* [ ] Has: + + * [ ] dashboards + * [ ] alerts (or explicitly deferred with signed waiver) +* [ ] Has: + + * [ ] secure redaction + * [ ] tenant isolation +* [ ] Has a rollback plan via feature flag + 
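The S2 redaction guardrail above (forbidden patterns → `[redacted]`, excerpt cap, whitespace normalization) can be sketched as follows. The regex patterns and the 240‑character cap are assumptions taken from task B1; a real deployment should use a vetted secret‑scanning ruleset:

```typescript
// Illustrative redaction guardrail for log excerpts; patterns are assumptions.
const FORBIDDEN: RegExp[] = [
  /bearer\s+[a-z0-9._-]+/gi,                // bearer tokens
  /(?:api[_-]?key|token)\s*[:=]\s*\S+/gi,   // key=value style secrets
  /https?:\/\/[^\s]*:[^\s]*@\S+/gi,         // URLs with embedded credentials
];

function redactExcerpt(raw: string, maxLen = 240): string {
  let text = raw.replace(/\s+/g, " ").trim(); // normalize whitespace first
  for (const pattern of FORBIDDEN) {
    text = text.replace(pattern, "[redacted]");
  }
  // Hard cap keeps snapshot payloads under the size budget.
  return text.length > maxLen ? text.slice(0, maxLen) : text;
}
```

Running redaction before persistence (and never in telemetry attributes) keeps the snapshot, signature, and metrics paths covered by one tested utility.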
+--- + +## 8) Test plan + +### Unit tests + +* FirstSignal contract validation (required fields, enums) +* Redaction patterns (bearer tokens, API keys, URLs, long strings) +* Signature extraction rule correctness + +### Integration tests + +* Create job → snapshot exists → `/signal` returns it +* Worker phase transitions update snapshot +* Job fail → signature stored → next job → `/signal` includes lastKnownOutcome +* SSE connect → receive events in order → terminal event once + +### Load tests (must-have) + +* `/jobs/{id}/signal`: + + * cache-hit p95 ≤ 250ms + * cold path p95 ≤ 500ms + * error rate < 0.1% under expected concurrency + +### Chaos/degraded tests + +* Cache down → Postgres fallback works +* SSE blocked → UI polls and still updates + +--- + +## 9) Feature flags and rollout + +### Flags + +* `ttfs.first_signal_enabled` (default ON in staging) +* `ttfs.cache_enabled` +* `ttfs.failure_index_enabled` +* `ttfs.sse_enabled` +* `ttfs.policy_preeval_enabled` + +### Rollout steps + +1. Enable baseline FirstSignal + snapshots for internal/staging +2. Enable cache in default profile +3. Enable failure index (read-only first; then write) +4. Enable SSE for 10% traffic → 50% → 100% +5. 
Enable CI annotations (start with non-blocking informational checks) + +--- + +## 10) PR review checklist (paste into PR template) + +* [ ] No blocking heavy work added to `/signal` path +* [ ] Signal payload size remains < 1KB in normal cases +* [ ] Excerpts are redacted + length-capped +* [ ] Tenant boundary included in cache keys and DB queries +* [ ] Metrics emitted (`ttfs_ms`, endpoint latency, cacheHit) +* [ ] UI has no spinner-only state; always shows message + next action +* [ ] Streaming has polling fallback +* [ ] Tests added/updated (unit + integration) + +--- + +## 11) “Ready for QA” scenarios + +QA should validate: + +* UI: + + * click scan → first message within 2s typical + * see queued/started/blocked states clearly + * open logs works +* CLI: + + * first output within 2s typical + * follow stream updates +* CI: + + * first annotation/check appears quickly and links to job +* Security: + + * inject fake token into logs → stored excerpt is redacted +* Multi-tenant: + + * run jobs across tenants → no leakage in signals or hints + +--- + +``` + +If you want this split into **multiple repo-local files** (e.g., `/docs/TTFS/ARCH.md`, `/docs/TTFS/SCHEMAS.sql`, `/docs/TTFS/RUNBOOK.md`, plus a PR template snippet), say the folder structure you prefer and I’ll output them in the same paste-ready format. +``` diff --git a/docs/product-advisories/13-Dec-2025 - Designing the Call‑Stack Reachability Engine.md b/docs/product-advisories/13-Dec-2025 - Designing the Call‑Stack Reachability Engine.md new file mode 100644 index 000000000..8e29fea7a --- /dev/null +++ b/docs/product-advisories/13-Dec-2025 - Designing the Call‑Stack Reachability Engine.md @@ -0,0 +1,1258 @@ +Here’s a practical blueprint for building a **reachability‑first code+binary scanner** that fuses static call‑graphs with runtime evidence, and scales to large monorepos/microservices. 
+ +--- + +# 1) Static analyzers (per language) + +* **.NET (Roslyn / IL)** + + * Parse solutions with `Microsoft.CodeAnalysis.MSBuild`, collect symbols, build call graph from `ISymbol` → `IInvocationOperation`. + * Handle reflection edges by heuristics (string literals, `Type.GetType`, DI registrations). + * IL pass: read assemblies with `System.Reflection.Metadata` to connect external/library calls. + * Minimal sample: + + ```csharp + using System.Linq; + using Microsoft.CodeAnalysis; + using Microsoft.CodeAnalysis.CSharp.Syntax; + using Microsoft.CodeAnalysis.MSBuild; + + // Call MSBuildLocator.RegisterDefaults() once before creating the workspace. + var ws = MSBuildWorkspace.Create(); + var sln = await ws.OpenSolutionAsync(@"path\to.sln"); + foreach (var proj in sln.Projects) + foreach (var doc in proj.Documents) + { + var model = await doc.GetSemanticModelAsync(); + var root = await doc.GetSyntaxRootAsync(); + foreach (var node in root.DescendantNodes().OfType<InvocationExpressionSyntax>()) + { + var sym = model.GetSymbolInfo(node).Symbol as IMethodSymbol; + if (sym != null) + { + // record edge: caller -> sym.ContainingType.Name + "." + sym.Name + } + } + } + ``` +* **Java (Soot or WALA)** + + * Build bytecode call graph (CHA/RTA/points‑to) and export edges. + * Seed entrypoints from `public static void main`, Spring Boot controllers, servlet mappings. +* **Node/Python** + + * Build AST + import graph; resolve exports (`module.exports`, `export default`, Python `__all__`). + * Track dynamic requires (best‑effort string eval); record web/router handlers as entrypoints. +* **Go/Rust** + + * Use build graph (Go modules, Cargo metadata) + AST to map `main` and handler functions. + * Include linker‑time features/conditions to avoid dead edges. +* **Binary‑only (containers, closed libs)** + + * Recover function boundaries (Ghidra/rizin), mine strings/imports, detect candidates for entrypoints from container `ENTRYPOINT/CMD`, service files, and exposed ports. + * Heuristics: exported symbols, syscall usage, and common framework stubs. 
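All of the per‑language analyzers above can reduce to one shared edge shape before storage. A minimal sketch of that model, with a merge step that deduplicates edges and lets runtime evidence upgrade a static edge (type and field names are illustrative, not the tool's actual schema):

```typescript
type EdgeKind = "static" | "runtime";

interface CallEdge {
  caller: string; // stable node/symbol id
  callee: string;
  kind: EdgeKind;
  weight: number; // observed frequency, or 1.0 for static edges
}

// Merge duplicate edges: sum weights, and prefer "runtime" over "static"
// so exactly one edge per (caller, callee) pair survives.
function mergeEdges(edges: CallEdge[]): CallEdge[] {
  const merged = new Map<string, CallEdge>();
  for (const e of edges) {
    const key = `${e.caller}→${e.callee}`;
    const prev = merged.get(key);
    if (!prev) {
      merged.set(key, { ...e });
    } else {
      prev.weight += e.weight;
      if (e.kind === "runtime") prev.kind = "runtime"; // evidence upgrades the edge
    }
  }
  return [...merged.values()];
}
```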
+ +--- + +# 2) Runtime confirmation (evidence) + +* **Windows/.NET:** ETW sampling to “mint” runtime edges (method IDs, stack samples) without heavy overhead. +* **Linux/containers:** eBPF/usdt or perf sampling to confirm hot paths; record PID→image→build info to link evidence back to SBOM components. +* **Rule:** static edge exists → mark **probable**; static+runtime match → mark **proven** (confidence ↑, prioritize). + +--- + +# 3) Entrypoint discovery + +* **Web services:** framework routers (ASP.NET Core endpoints, Spring mappings, Express routes, FastAPI decorators). +* **Jobs/CLIs:** scheduler configs (Cron, systemd timers, k8s CronJobs). +* **Events:** message consumers (RabbitMQ/Kafka topics), gRPC service maps. + +Entrypoints seed reachability: start from entry, traverse call graph, intersect with SBOM → “reachable components + reachable vulns”. + +--- + +# 4) Scale & storage + +* **Shard** by repo/service; compute graphs independently. +* **Compress** with SCCs (strongly connected components) to shrink graph size. +* **Cap cardinality** using hot‑path sampling (keep top‑N edges by observed frequency). +* **Cache**: content‑addressed graphs keyed by `(SBOM hash, compiler flags, env)`; invalidate on source/SBOM/CFG changes or new VEX/policy. +* **Store** edges as `(caller, callee, kind: static|runtime, weight, build-id)` in Postgres; keep Valkey for ephemeral reachability queries. + +--- + +# 5) SBOM/VEX linkage + +* Normalize package coordinates (purl), map symbols/binaries → SBOM components. +* For each CVE: + + * **Reachable?** (entrypoint‑anchored traversal hits affected symbol/library) + * **Proven at runtime?** (evidence present) + * **Gated by config?** (feature flags, platform checks) +* Emit VEX with machine‑explainable reasons (e.g., *not reachable*, *reachable but not loaded*, *reachable+proven*). 
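Sections 2–5 above combine into a short core algorithm: traverse from the discovered entrypoints, intersect with the symbols of the affected component, then apply the static‑vs‑runtime rule ("static edge → probable; static+runtime → proven"). A sketch under assumed node ids and label names:

```typescript
// Adjacency list keyed by caller node id; a hypothetical shape, not the real store.
interface Graph { edges: Map<string, string[]> }

// Entrypoint-seeded BFS over the call graph.
function reachableNodes(graph: Graph, entrypoints: string[]): Set<string> {
  const seen = new Set<string>(entrypoints);
  const queue = [...entrypoints];
  while (queue.length > 0) {
    const node = queue.shift()!;
    for (const next of graph.edges.get(node) ?? []) {
      if (!seen.has(next)) { seen.add(next); queue.push(next); }
    }
  }
  return seen;
}

// Intersect reachability with the SBOM mapping for one CVE's affected symbols.
function classify(
  reachable: Set<string>,
  affectedSymbols: string[],
  runtimeHits: Set<string> // node ids observed in runtime samples
): "unreachable" | "probable" | "proven" {
  const hit = affectedSymbols.filter(s => reachable.has(s));
  if (hit.length === 0) return "unreachable";
  return hit.some(s => runtimeHits.has(s)) ? "proven" : "probable";
}
```

The SCC compression and hot‑path capping from section 4 only shrink the graph this traversal runs over; they don't change the classification logic.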
+ +--- + +# 6) APIs and outputs (developer‑friendly) + +* **CLI** + + * `scan graph --lang dotnet --sln path.sln --out graph.scc.json` + * `scan runtime --target pod/myservice --duration 30s --out stacks.json` + * `reachability join --graph graph.scc.json --runtime stacks.json --sbom bom.cdx.json --out reach.cdxr.json` +* **HTTP** + + * `POST /graph` (upload call graph) + * `POST /runtime` (upload evidence) + * `POST /reachability` → returns ranked, evidence‑linked findings +* **Artifacts** + + * `graph.scc.json` (SCC‑compressed call graph) + * `reach.cdxr.json` (CycloneDX extension with evidence) + * `vex.json` (OpenVEX/CSAF w/ “justifications”) + +--- + +# 7) Quality gates & tests + +* **Golden images**: tiny test services where reachable/unreachable CVEs are known. +* **Mutation tests**: toggle entrypoints, flags, and ensure reachability shifts correctly. +* **Drift checks**: if runtime sees edges not in static graph → open “coverage debt” issue. + +--- + +# 8) Security & perf knobs + +* Sampling rate caps (CPU bound), PID/image allowlists, PII‑safe symbol hashing option. +* Offline mode: bundle symbols + evidence into a replayable archive (deterministic re‑evaluation). + +--- + +If you want, I can generate a **starter repo layout** (Roslyn worker, Java WALA worker, eBPF sampler, joiner, and a Postgres schema) tailored to your .NET 10 + microservices stack. +Below is a developer-ready **product + BA implementation specification** for the **Reachability-First Scanner** described earlier, tailored to **StellaOps (.NET 10)** and your standing architecture rules (**lattice algorithms run in `scanner.webservice`; Concelier/Excititor preserve prune source; Postgres is SoR; Valkey is ephemeral only**). 
+ +--- + +# StellaOps Reachability-First Scanner + +## Developer Implementation Specification (v1) + +## 0) Objective and boundaries + +### Objective + +Reduce vulnerability noise by classifying findings as **Unreachable / Possibly Reachable / Reachable (Static) / Proven Reachable (Runtime)** using: + +1. **Static call graph** (best-effort; language-aware) +2. **Runtime evidence** (sampling, low overhead) +3. **Entrypoint seeding** (framework-aware) +4. **Join** against SBOM component mapping + vulnerability data (from Concelier) + VEX (from Excititor) + +### Non-goals (v1) + +* Perfect points-to analysis for all languages. +* Full decompilation for every binary (support is “best-effort” with confidence). +* Executing or fuzzing workloads. + +--- + +# 1) Product behavior: what the user sees + +## 1.1 Reachability statuses (canonical) + +These labels must be stable across UI/CLI/API: + +* **UNREACHABLE**: no path from any discovered entrypoint to affected component/symbol. +* **POSSIBLY_REACHABLE**: graph incomplete / dynamic behavior; heuristics indicate risk. +* **REACHABLE_STATIC**: a static path exists from at least one entrypoint. +* **REACHABLE_PROVEN**: runtime evidence confirms code path or library load (stronger than static). + +### Required explanation fields (always returned) + +Every reachability classification must include: + +* `why[]`: list of structured reasons (machine-readable codes + human text) +* `evidence[]`: references to graph paths and/or runtime samples +* `confidence`: 0.0–1.0 +* `scope`: component-only or symbol-level (if symbol mapping exists) + +## 1.2 Key UX outputs (pipeline-first) + +* CLI output for CI gates: `stella scan reachability --format sarif|json` +* UI detail panel must show: + + * Entry point(s) → path summary (k shortest paths, default k=3) + * Whether runtime proved it (samples, timestamps, container/build IDs) + * Which assumptions/heuristics were used (reflection, DI, dynamic import, etc.) 
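The required explanation fields in §1.1 lend themselves to a small shared contract plus a validator that enforces the invariants (non‑empty `why[]`, bounded `confidence`, evidence for runtime‑proven results). The status names follow the spec; the record and validator shapes are illustrative assumptions:

```typescript
type ReachabilityStatus =
  | "UNREACHABLE" | "POSSIBLY_REACHABLE" | "REACHABLE_STATIC" | "REACHABLE_PROVEN";

interface ReachabilityFinding {
  status: ReachabilityStatus;
  why: { code: string; text: string }[]; // machine-readable code + human text
  evidence: string[];                    // refs to graph paths / runtime samples
  confidence: number;                    // 0.0–1.0
  scope: "component" | "symbol";
}

// Returns a list of violations; empty list means the finding is well-formed.
function validateFinding(f: ReachabilityFinding): string[] {
  const errors: string[] = [];
  if (f.why.length === 0) errors.push("why[] must not be empty");
  if (f.confidence < 0 || f.confidence > 1) errors.push("confidence out of range");
  // A runtime-proven claim without evidence references would be unexplainable.
  if (f.status === "REACHABLE_PROVEN" && f.evidence.length === 0)
    errors.push("REACHABLE_PROVEN requires evidence");
  return errors;
}
```

Keeping validation in one place makes the "always returned" guarantee for `why[]`, `evidence[]`, and `confidence` testable across UI, CLI, and API surfaces.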
+ +--- + +# 2) System architecture (StellaOps modules) + +## 2.1 Services and responsibilities + +### `StellaOps.Scanner.WebService` (authoritative) + +**Owns the reachability pipeline and the lattice computation for reachability decisions.** +Responsibilities: + +* Ingest static graphs from language workers +* Ingest runtime evidence (from collectors) +* Normalize symbols → components (SBOM join) +* Compute reachability results, confidence, and explanation artifacts +* Expose query APIs and CI export formats +* Persist everything to Postgres (SoR) +* Use Valkey only as ephemeral accelerator + +### Language workers (stateless compute) + +Examples: + +* `StellaOps.Scanner.Worker.DotNet` +* `StellaOps.Scanner.Worker.Java` +* `StellaOps.Scanner.Worker.Node` +* `StellaOps.Scanner.Worker.Python` +* `StellaOps.Scanner.Worker.Go` +* `StellaOps.Scanner.Worker.Rust` +* `StellaOps.Scanner.Worker.Binary` + +Responsibilities: + +* Produce `CallGraph.v1.json` (+ optional `Entrypoints.v1.json`) +* Provide symbol IDs stable within a scan (see hashing rules) + +### Runtime collectors (agent/sidecar; optional) + +* Windows: ETW/EventPipe sampling for .NET +* Linux: eBPF/perf sampling for native; plus runtime-specific exporters where feasible + +Collectors only emit **evidence events**; they never compute reachability. + +### Concelier / Excititor integration + +* Concelier provides vulnerability facts (CVE ↔ component versions). +* Excititor provides VEX statements. + **Neither computes reachability or lattice merges**; they provide **pruned sources** only. + +--- + +# 3) Data contracts (hard requirements) + +## 3.1 Stable identifiers + +All graph nodes must have: + +* `nodeId`: stable across replays when code is unchanged. 
+* `symbolKey`: canonical string (language-specific) +* `artifactKey`: assembly/jar/module/binary identity (prefer build ID + path + hash) +* Optional: `purlCandidates[]` (library mapping hints) + +**DotNet nodeId rule (v1):** +`nodeId = SHA256(assemblyMvid + ":" + metadataToken + ":" + genericArity + ":" + signatureShape)` + +* If token unavailable (source-only), fallback: SHA256(projectPath + ":" + file + ":" + span + ":" + symbolDisplayString) + +## 3.2 CallGraph.v1.json + +Minimum required schema: + +```json +{ + "schema": "stella.callgraph.v1", + "scanKey": "uuid", + "language": "dotnet|java|node|python|go|rust|binary", + "artifacts": [{ "artifactKey": "…", "kind": "assembly|jar|module|binary", "sha256": "…" }], + "nodes": [{ + "nodeId": "…", + "artifactKey": "…", + "symbolKey": "Namespace.Type::Method(…)", + "visibility": "public|internal|private|unknown", + "isEntrypointCandidate": false + }], + "edges": [{ + "from": "nodeId", + "to": "nodeId", + "kind": "static|heuristic", + "reason": "direct_call|virtual_call|reflection_string|di_binding|dynamic_import|unknown", + "weight": 1.0 + }], + "entrypoints": [{ + "nodeId": "…", + "kind": "http|grpc|cli|job|event|unknown", + "route": "/api/orders/{id}", + "framework": "aspnetcore|minimalapi|spring|express|unknown" + }] +} +``` + +## 3.3 RuntimeEvidence.v1.json + +```json +{ + "schema": "stella.runtimeevidence.v1", + "scanKey": "uuid", + "collectedAt": "2025-12-14T10:00:00Z", + "environment": { + "os": "linux|windows", + "k8s": { "namespace": "…", "pod": "…", "container": "…" }, + "imageDigest": "sha256:…", + "buildId": "…" + }, + "samples": [{ + "timestamp": "…", + "pid": 1234, + "threadId": 77, + "frames": ["nodeId","nodeId","nodeId"], + "sampleWeight": 1.0 + }], + "loadedArtifacts": [{ + "artifactKey": "…", + "evidence": "loaded_module|mapped_file|jar_loaded" + }] +} +``` + +--- + +# 4) Postgres schema (system of record) + +## 4.1 Core tables + +You can implement with migrations in `StellaOps.Scanner.Persistence` 
(EF Core 9). + +### `scan` + +* `scan_id uuid pk` +* `created_at timestamptz` +* `repo_uri text null` +* `commit_sha text null` +* `sbom_digest text` (hash of SBOM input) +* `policy_digest text` (hash of reachability policy inputs) +* `status text` (NEW/RUNNING/DONE/FAILED) + +Indexes: + +* `(commit_sha, sbom_digest)` for caching + +### `artifact` + +* `artifact_id uuid pk` +* `scan_id uuid fk` +* `artifact_key text` unique per scan +* `kind text` +* `sha256 text` +* `build_id text null` +* `purl text null` + +Index: + +* `(scan_id, artifact_key)` unique + +### `cg_node` + +* `scan_id uuid fk` +* `node_id text` (hash string) +* `artifact_key text` +* `symbol_key text` +* `visibility text` +* `flags int` (bitset: entrypointCandidate, external, generated, etc.) + PK: `(scan_id, node_id)` + +GIN index: + +* `symbol_key` trigram for search (optional) + +### `cg_edge` + +* `scan_id uuid fk` +* `from_node_id text` +* `to_node_id text` +* `kind smallint` (0 static, 1 heuristic, 2 runtime_minted) +* `reason smallint` +* `weight real` + PK: `(scan_id, from_node_id, to_node_id, kind, reason)` + +Indexes: + +* `(scan_id, from_node_id)` +* `(scan_id, to_node_id)` + +### `entrypoint` + +* `scan_id uuid` +* `node_id text` +* `kind text` +* `framework text` +* `route text null` + PK: `(scan_id, node_id, kind, framework, route)` + +### `runtime_sample` + +* `scan_id uuid` +* `collected_at timestamptz` +* `env_hash text` (hash of environment identity) +* `sample_id bigserial pk` +* `timestamp timestamptz` +* `pid int` +* `thread_id int` +* `frames text[]` (nodeIds) +* `weight real` + +Partition suggestion: + +* Partition by `scan_id` or by month depending on retention. 
+ +### `symbol_component_map` + +* `scan_id uuid` +* `node_id text` +* `purl text` +* `mapping_kind text` (exact|heuristic|external) +* `confidence real` + PK: `(scan_id, node_id, purl)` + +### `reachability_component` + +* `scan_id uuid` +* `purl text` +* `status smallint` (0 unreachable, 1 possible, 2 reachable_static, 3 reachable_proven) +* `confidence real` +* `why jsonb` +* `evidence jsonb` + PK: `(scan_id, purl)` + +### `reachability_finding` + +* `scan_id uuid` +* `cve_id text` +* `purl text` +* `status smallint` +* `confidence real` +* `why jsonb` +* `evidence jsonb` + PK: `(scan_id, cve_id, purl)` + +## 4.2 Valkey usage (ephemeral only) + +Allowed: + +* Dedup keys for evidence ingest (short TTL) +* Hot query cache: `(scan_id, purl)` → reachability result +* Rate limits / nonces + +Not allowed: + +* Authoritative queueing for scan state +* Any “only copy” of results + +--- + +# 5) Reachability computation (the actual algorithm) + +## 5.1 Inputs + +* Call graph nodes/edges + entrypoints +* Runtime evidence (optional) +* SBOM (CycloneDX/SPDX) with purls +* Concelier vulnerability facts (CVE ↔ purl/version ranges) +* Excititor VEX statements (not affected / affected / under investigation) + +## 5.2 Normalize to a graph suitable for traversal + +In `scanner.webservice`: + +1. Build adjacency list for `cg_edge.kind in (static, heuristic)` +2. Optionally compress SCCs: + + * Compute SCCs (Tarjan/Kosaraju) + * Store SCC mapping for explanation paths (must remain explainable) + +## 5.3 Entrypoint seeding rules + +Entrypoints come from: + +* Worker-reported entrypoints (preferred) +* Framework discovery in worker (ASP.NET maps, Spring mappings, etc.) +* Fallback: `Main`, exported symbols, container CMD/ENTRYPOINT + +**If entrypoints are empty**, mark all results as `POSSIBLY_REACHABLE` with reason `NO_ENTRYPOINTS_DISCOVERED`, unless runtime evidence exists. + +## 5.4 Traversal + +For each scan: + +* Start from all entrypoints; traverse reachable nodes. 
+* Track: + + * `firstSeenFromEntrypoint[node]` (for k-shortest path reconstruction) + * `pathWitness[node]` (parent pointers or compressed witness) + +Produce: + +* `reachableNodesStatic` set + +## 5.5 Join to components (SBOM) + +Map reachable nodes to purls using `symbol_component_map`. + +Mapping sources (priority order): + +1. Exact binary symbol → package metadata (where available) +2. Assembly/jar/module to SBOM component (by hash/purl) +3. Heuristics: namespace prefixes, import paths, jar manifest, npm package.json, go module path + +If a vulnerable purl is in SBOM but has **no symbol mapping**, component reachability defaults: + +* If artifact is **loaded at runtime** → at least `REACHABLE_PROVEN` (component level) +* Else if referenced by static dependency graph → `POSSIBLY_REACHABLE` +* Else → `UNREACHABLE` (with `NO_SYMBOL_MAPPING` reason) + +## 5.6 Runtime evidence upgrade (“minting”) + +If runtime evidence is present: + +* For each sample stack: + + * Mark each frame node as “executed” + * Mint runtime edges: consecutive frames become `cg_edge.kind=runtime_minted` (optional table or derived view) +* If any executed node maps to purl affected by CVE: + + * Upgrade status to `REACHABLE_PROVEN` +* If only loaded artifact exists: + + * Upgrade component status to `REACHABLE_PROVEN` (component-only), but keep symbol-level as unknown. + +## 5.7 Confidence scoring (deterministic) + +A simple deterministic scoring function (v1) used everywhere: + +* Base: + + * `UNREACHABLE` → 0.05 + * `POSSIBLY_REACHABLE` → 0.35 + * `REACHABLE_STATIC` → 0.70 + * `REACHABLE_PROVEN` → 0.95 +* Modifiers: + + * +0.10 if path uses only `static` edges (no heuristic) + * −0.15 if path includes `reflection_string|dynamic_import` + * +0.10 if runtime evidence hits a node in affected component + * −0.10 if entrypoints incomplete (`NO_ENTRYPOINTS_DISCOVERED`) + Clamp to `[0, 1]`. + +All modifiers must be recorded in `why[]`. 
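The scoring rules above are deterministic by design and fit in a few lines. A minimal Python sketch (the reason strings and parameter names are illustrative; the numbers are the v1 table above):

```python
BASE = {
    "UNREACHABLE": 0.05,
    "POSSIBLY_REACHABLE": 0.35,
    "REACHABLE_STATIC": 0.70,
    "REACHABLE_PROVEN": 0.95,
}

def confidence(status, static_only_path=False, uses_dynamic_edges=False,
               runtime_hit=False, entrypoints_incomplete=False):
    """Deterministic v1 score: base for the status plus recorded modifiers, clamped to [0, 1]."""
    why = []
    score = BASE[status]
    if static_only_path:
        score += 0.10
        why.append("MOD_STATIC_ONLY_PATH(+0.10)")
    if uses_dynamic_edges:
        score -= 0.15
        why.append("MOD_DYNAMIC_EDGE(-0.15)")
    if runtime_hit:
        score += 0.10
        why.append("MOD_RUNTIME_HIT(+0.10)")
    if entrypoints_incomplete:
        score -= 0.10
        why.append("MOD_NO_ENTRYPOINTS(-0.10)")
    return max(0.0, min(1.0, score)), why
```

Because the modifiers are returned alongside the score, the caller can copy them straight into the result's `why[]` array.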
+
+---
+
+# 6) Language worker specs (what each worker must do)
+
+## 6.1 .NET worker (Roslyn + optional IL)
+
+**Goal (v1):** produce good-enough call graph + entrypoints for ASP.NET Core and workers.
+
+### Required features
+
+* Direct invocation edges: `InvocationExpressionSyntax`
+* Object creation edges: constructors
+* Delegate invocation: best-effort; record a heuristic edge when the target is unresolved
+* Virtual/interface dispatch:
+
+  * record `virtual_call` edge to declared method
+  * optionally add edges to known overrides within solution (static, conservative)
+* Async/await: treat state machine calls as implementation detail; connect logical caller → awaited method
+
+### Entrypoint discovery (.NET)
+
+Implement these detectors:
+
+* `Program.Main` (classic)
+* ASP.NET Core:
+
+  * Controllers: `[ApiController]`, route attributes, action methods
+  * Minimal APIs: `MapGet/MapPost/MapMethods` patterns (syntactic + semantic)
+  * gRPC: `MapGrpcService<TService>()` and service methods
+  * Hosted services: `IHostedService`, `BackgroundService.ExecuteAsync` as job entrypoints
+* Message consumers (if present): known library patterns (e.g., MassTransit consumers)
+
+### Reflection and DI heuristics
+
+Produce **heuristic edges** when you see:
+
+* `Type.GetType("…")`, `Assembly.GetType`, `GetMethod("…")`, `Invoke`
+* `services.AddTransient<IFoo, Foo>()` / `AddScoped` / `AddSingleton`
+
+  * Add edge `IFoo` → `Foo` constructor as `di_binding` heuristic
+* `Activator.CreateInstance`, `ServiceProvider.GetService` patterns
+
+### Output guarantees
+
+* Must not crash on partial compilation (missing refs); produce a partial graph with `why=COMPILATION_PARTIAL`
+* Provide `artifact_key` per assembly/project output
+
+## 6.2 Java / Node / Python / Go / Rust workers
+
+v1 expectations:
+
+* Provide import graph + framework entrypoints + best-effort call edges.
+* Always label uncertain resolution as `heuristic` with a reason code.
+ +## 6.3 Binary worker + +v1 expectations: + +* Identify artifacts, exported symbols, imported libs, and candidate entrypoints from container metadata. +* Provide component-level mapping primarily; symbol-level mapping only when confident. + +--- + +# 7) APIs (scanner.webservice) + +## 7.1 Ingestion endpoints + +* `POST /api/scans` → creates scan record (returns `scanId`) +* `POST /api/scans/{scanId}/callgraphs` → accepts `CallGraph.v1.json` +* `POST /api/scans/{scanId}/runtimeevidence` → accepts `RuntimeEvidence.v1.json` +* `POST /api/scans/{scanId}/sbom` → accepts CycloneDX/SPDX +* `POST /api/scans/{scanId}/compute-reachability` → triggers computation (idempotent) + +Rules: + +* All ingests must be **idempotent** via `contentDigest` header (store seen digests in Postgres; Valkey may accelerate dedupe). +* Reject mismatched `scanKey/scanId`. + +## 7.2 Query endpoints + +* `GET /api/scans/{scanId}/reachability/components?purl=...` +* `GET /api/scans/{scanId}/reachability/findings?cve=...` +* `GET /api/scans/{scanId}/reachability/explain?cve=...&purl=...` + + * returns `why[]` + path witness + sample refs + +## 7.3 Export endpoints + +* `GET /api/scans/{scanId}/exports/sarif` +* `GET /api/scans/{scanId}/exports/cdxr` (CycloneDX reachability extension) +* `GET /api/scans/{scanId}/exports/openvex` (reachability justifications as VEX annotations) + +--- + +# 8) Deterministic replay requirements (must-have) + +Every reachability result must be reproducible from: + +* SBOM digest +* CallGraph digests (per worker) +* RuntimeEvidence digests (optional) +* Concelier feed snapshot digest +* Excititor VEX snapshot digest +* Policy digest (confidence scoring + gating rules) + +Implement `ReplayManifest.json`: + +```json +{ + "schema": "stella.replaymanifest.v1", + "scanId": "uuid", + "inputs": { + "sbomDigest": "sha256:…", + "callGraphs": [{"language":"dotnet","digest":"sha256:…"}], + "runtimeEvidence": [{"digest":"sha256:…"}], + "concelierSnapshot": "sha256:…", + 
"excititorSnapshot": "sha256:…", + "policyDigest": "sha256:…" + } +} +``` + +--- + +# 9) Quality gates and acceptance criteria + +## 9.1 Golden corpus (mandatory) + +Create `/tests/Reachability.Golden/` with: + +* Minimal ASP.NET controller app with known reachable endpoint → vulnerable lib call +* Minimal app with vulnerable lib present but never called → unreachable +* Reflection-based activation case → “possible” unless runtime proves +* BackgroundService job case + +**Acceptance**: + +* Each golden test asserts: + + * Reachability status + * At least one `why[]` reason + * Deterministic `confidence` within ±0.01 + +## 9.2 Drift detection (mandatory) + +If runtime minted edges not present in static graph above a threshold: + +* Emit `COVERAGE_DRIFT` warning with top missing edges +* Store drift report in Postgres (`reachability_drift` table or JSONB field) + +## 9.3 Performance SLOs (v1 targets) + +* 1 medium service (100k LOC .NET) static graph: < 2 minutes on CI runner class machine +* Reachability compute: < 30 seconds +* Query `GET finding`: < 200ms p95 (use Postgres indexes + optional Valkey cache) + +--- + +# 10) Implementation plan (developer checklist) + +## Milestone A — Data plumbing (1) + +* Implement Postgres tables + migrations +* Implement ingestion endpoints +* Persist call graphs and runtime evidence +* Implement digesting + idempotency + +**Done when:** you can ingest a call graph + SBOM and query stored nodes/edges. + +## Milestone B — DotNet worker MVP (2) + +* Roslyn graph extraction (direct calls + constructors) +* ASP.NET Core entrypoints (controllers + minimal APIs) +* Output `CallGraph.v1.json` + +**Done when:** golden corpus “reachable/unreachable” works on static only. 
+ +## Milestone C — Reachability compute (3) + +* Entry point seeding +* SCC compression optional +* Traversal + path witness generation +* Join to SBOM purls +* Generate `reachability_component` + `reachability_finding` + +**Done when:** component-level reachability returns stable answers. + +## Milestone D — Runtime evidence upgrade (4) + +* Ingest runtime evidence +* Upgrade statuses to `REACHABLE_PROVEN` +* Drift report + +**Done when:** runtime samples can flip a “possible” to “proven”. + +## Milestone E — Exports (5) + +* SARIF export +* CycloneDX reachability extension export +* OpenVEX annotations export + +**Done when:** CI pipeline can gate on proven/static reachability. + +--- + +If you want this converted into **repo-facing artifacts** (e.g., `docs/modules/scanner/reachability-spec.md`, EF Core entity classes skeletons, and OpenAPI snippets for the endpoints), tell me the exact module paths you use under `/src/` and I’ll emit drop-in files matching your StellaOps naming conventions. +Below is a concrete, implementable **architecture + technology stack** for **`StellaOps.Scanner.Worker.DotNet`** (the .NET static graph worker), written as if you’re handing it directly to an engineer to build. + +--- + +# 1) Worker’s purpose in the StellaOps scanner system + +**`StellaOps.Scanner.Worker.DotNet` is a stateless “graph producer.”** It does **not** compute reachability, confidence, or any lattice logic (that lives in `scanner.webservice`). The worker: + +1. **Discovers entrypoints** (ASP.NET Core controllers, minimal APIs, gRPC, hosted services, etc.) +2. **Extracts a static call graph** (method → method edges) +3. **Adds heuristic edges** for DI/reflection/dynamic patterns +4. **Emits `CallGraph.v1.json`** and optionally uploads it to `scanner.webservice` + +Key constraint: node IDs must be compatible with runtime evidence (EventPipe/ETW) mapping. That’s why we build node IDs from **(Module MVID + metadata token)** whenever possible. 
+ +--- + +# 2) Deployment model + +## 2.1 Container image choice + +You have two legitimate modes; implement both: + +### Mode A — “Artifacts-first” (preferred for security) + +* Input: already-built assemblies from CI (`bin/Release/.../*.dll` + associated files) +* Worker does **no `dotnet build`** +* Worker performs **IL/metadata scanning** + optional Roslyn source parsing for entrypoints/heuristics + +### Mode B — “Build-and-scan” (convenience; higher risk) + +* Input: repo checkout with `.sln` +* Worker runs `dotnet restore`/`dotnet build` inside a sandboxed container, then scans outputs + +Because .NET build can execute **MSBuild tasks, analyzers, and source generators** (code execution risk), the product-default should be Mode A in any untrusted scenario. + +## 2.2 Runtime requirements + +* Base runtime: **.NET 10 (LTS)**. Microsoft’s support policy lists .NET 10 as LTS with original release **Nov 11, 2025** and latest patch **10.0.1 (Dec 9, 2025)**. ([Microsoft][1]) +* If you use Mode B, the image must include **.NET 10 SDK** (not just runtime). ([Microsoft][2]) + +## 2.3 Sandbox controls (Mode B) + +If you allow building: + +* Run with **no outbound network** (or allowlist only internal NuGet proxy). +* Read-only root FS; writable temp only. +* Drop Linux capabilities; use seccomp/apparmor defaults. +* Mount repo read-only; write outputs to a dedicated volume. +* Disable telemetry: `DOTNET_CLI_TELEMETRY_OPTOUT=1`. 
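The sandbox bullets above map mechanically onto container run flags. A hedged Python sketch that assembles them (Docker-style flag names assumed; the custom network name is a placeholder, not a real resource):

```python
def sandbox_args(repo_dir, out_dir, nuget_proxy=None):
    """Assemble docker-run style flags for a Mode B build sandbox (illustrative only)."""
    # "nuget-proxy-only" is a hypothetical network that allowlists the internal NuGet proxy.
    network = "none" if nuget_proxy is None else "nuget-proxy-only"
    return [
        "--network", network,
        "--read-only",                        # read-only root FS
        "--tmpfs", "/tmp",                    # writable temp only
        "--cap-drop", "ALL",                  # drop Linux capabilities
        "--security-opt", "no-new-privileges",
        "-v", f"{repo_dir}:/src:ro",          # repo mounted read-only
        "-v", f"{out_dir}:/out",              # dedicated output volume
        "-e", "DOTNET_CLI_TELEMETRY_OPTOUT=1",
    ]
```

Keeping flag assembly in a pure function makes the sandbox policy trivially unit-testable and auditable.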
+ +--- + +# 3) Core architecture (pipeline) + +Implement the worker as a single executable (CLI) with internal pipeline stages: + +``` +┌───────────────────────────────────────────────────────────────┐ +│ Worker.DotNet CLI │ +│ Inputs: --sln / --assemblies / --repo, --scanKey, --out │ +└───────────────┬───────────────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────┐ +│ Stage 0: Discovery │ +│ - Find solutions/projects or assemblies │ +│ - Determine configuration/TFM │ +└───────────────┬───────────────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────┐ +│ Stage 1: Build (optional) │ +│ - dotnet restore/build OR skip │ +│ - Collect output assembly paths │ +└───────────────┬───────────────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────┐ +│ Stage 2: Reference Indexer │ +│ - Build mapping: (AssemblyName, Version) -> artifactKey │ +│ - Compute sha256 per referenced dll │ +└───────────────┬───────────────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────┐ +│ Stage 3: IL Call Graph Extractor │ +│ - Parse each project assembly │ +│ - Create method nodes (nodeId = hash(MVID:token)) │ +│ - Parse IL & add static edges (call/callvirt/newobj/ldftn...) │ +│ - Emit external nodes for member refs │ +└───────────────┬───────────────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────┐ +│ Stage 4: Roslyn Entrypoints + Heuristics │ +│ - Controllers/minimal APIs/gRPC/HostedService entrypoints │ +│ - DI binding edges (AddTransient/AddScoped/AddSingleton etc.) │ +│ - Reflection edges (Type.GetType/GetMethod/Invoke etc.) 
│ +│ - Resolve Roslyn symbols -> nodeIds via symbolKey dictionary │ +└───────────────┬───────────────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────┐ +│ Stage 5: Merge + Emit │ +│ - Merge nodes/edges/entrypoints │ +│ - Output CallGraph.v1.json │ +│ - Optional POST to scanner.webservice │ +└───────────────────────────────────────────────────────────────┘ +``` + +**Why IL-first?** +Because you want **metadata token + MVID** node IDs that correlate naturally with runtime stacks. Deterministic builds make MVID stable for identical compilation inputs. ([Microsoft Learn][3]) + +--- + +# 4) Technology stack (NuGet + platform APIs) + +## 4.1 Roslyn / MSBuild loading + +Use Roslyn MSBuild workspace packages: + +* `Microsoft.CodeAnalysis.Workspaces.MSBuild` (MSBuildWorkspace support) ([NuGet][4]) +* `Microsoft.CodeAnalysis.CSharp.Workspaces` (C# semantic model / operations API) +* Optional: `Microsoft.CodeAnalysis` meta-package (superset) ([NuGet][5]) +* `Microsoft.Build.Locator` (register MSBuild instances for workspace loading) + +Roslyn packages are actively published by RoslynTeam (latest shown as **5.0.0** as of Nov 2025). ([NuGet][6]) + +## 4.2 IL + metadata scanning + +Prefer BCL APIs (no extra dependencies): + +* `System.Reflection.Metadata` +* `System.Reflection.PortableExecutable` +* `System.Reflection.Emit.OpCodes` for IL decoding (operand sizes) + (This lets you implement a compact IL parser without Cecil.) 
+ +Optional alternative (faster development, more deps): + +* `Mono.Cecil` (makes IL traversal trivial) ([NuGet][7]) + +## 4.3 CLI + logging + JSON + +* `System.CommandLine` (recommended) +* `Microsoft.Extensions.Logging` (+ Console logger) +* `System.Text.Json` (source-generated serializers strongly recommended) + +## 4.4 Runtime alignment note + +Runtime collectors commonly rely on EventPipe/ETW; the .NET diagnostics client library (`Microsoft.Diagnostics.NETCore.Client`) is the standard managed API for EventPipe sessions. ([Microsoft Learn][8]) +The worker itself doesn’t collect runtime evidence, but the **nodeId algorithm must match what runtime collectors can compute** (hence MVID+token). + +--- + +# 5) Internal module decomposition + +Implement these internal components as classes/services. Keep them testable (pure functions where possible). + +## 5.1 `WorkerOptions` + +Holds CLI options: + +* `ScanKey` (uuid) +* `RepoRoot`, `SolutionPath` OR `AssembliesPath[]` +* `Configuration` (default Release) +* `TargetFramework` (optional) +* `BuildMode` = `Artifacts | Build` +* `OutFile` +* `UploadUrl` + `ApiKey` (optional) +* `MaxEdgesPerNode` (optional throttle) +* `IncludeExternalNodes` (bool) +* `Concurrency` (int) + +## 5.2 `BuildOrchestrator` (Mode B only) + +Responsibilities: + +* Run `dotnet restore` and `dotnet build` +* Capture output logs and surface them as structured diagnostics +* Return discovered output assemblies (dll paths) + +Hard requirements: + +* Support `--no-restore` and `--no-build` toggles (or equivalent) +* Support `ContinuousIntegrationBuild=true` to improve determinism when available +* If build fails, still attempt to scan any assemblies that exist, but mark output with `why=BUILD_FAILED_PARTIAL`. 
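The `BuildOrchestrator` invocation can likewise be a pure function that the sandboxed runner executes. A sketch under the requirements above (`ContinuousIntegrationBuild=true` is a standard MSBuild property; the function shape is ours):

```python
def build_commands(sln, configuration="Release", restore=True):
    """Compose the dotnet invocations for Mode B; the caller executes them in the sandbox."""
    cmds = []
    if restore:
        cmds.append(["dotnet", "restore", sln])
    cmds.append([
        "dotnet", "build", sln,
        "-c", configuration,
        "--no-restore",                        # restore is handled explicitly above (or skipped)
        "-p:ContinuousIntegrationBuild=true",  # improves determinism where supported
    ])
    return cmds
```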
+ +## 5.3 `MsbuildWorkspaceLoader` (Roslyn) + +Responsibilities: + +* Register MSBuild with `MSBuildLocator` +* Load `.sln` via `MSBuildWorkspace` +* Provide: + + * `Solution` object + * `Project` list (C# only for v1) + * Compilation(s) when needed (for semantic analysis) + +MSBuildWorkspace is the canonical Roslyn path for analyzing MSBuild solutions. ([NuGet][4]) + +## 5.4 `ReferenceIndexer` + +Responsibilities: + +* Build a map from referenced assemblies to `artifactKey` +* For each `PortableExecutableReference` with a file path: + + * compute sha256 + * read assembly identity (name, version) + * create `artifactKey` + * add to: + + * `AssemblyIdentity -> artifactKey` + * `artifactKey -> sha256/path/version` + +This index is used by IL extractor to attribute **external nodes** to correct artifacts. + +## 5.5 `IlCallGraphExtractor` + +Responsibilities: + +* For each “root” assembly (project output): + + * open PE + * get module MVID + * enumerate `MethodDefinition` rows + * create nodes for all methods + * parse IL bodies and emit edges + +### IL parsing scope (v1) + +You only need to recognize these opcodes as “calls”: + +* `call` +* `callvirt` +* `newobj` +* `jmp` +* `ldftn` +* `ldvirtftn` + +### Node identity + +* Internal method nodeId: + + * `nodeId = SHA256( MVID + ":" + metadataToken + ":" + arity + ":" + signatureShape )` + * Minimal acceptable: `SHA256(MVID + ":" + metadataToken)` + +This is intentionally compatible with how runtime stacks identify methods (module + token). + +### External method nodes + +If a call operand is a `MemberRef`/`MethodSpec` that targets another assembly: + +* Create an “external node” with: + + * `symbolKey` computed from metadata signature + * `artifactKey` resolved via `ReferenceIndexer` (assembly identity match) + * `nodeId = SHA256("ext:" + artifactKey + ":" + symbolKey)` (runtime-proof not required) + +Set `flags |= External`. 
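The identity rules above reduce to two small hash helpers. A Python sketch of the v1 scheme (the exact token formatting and separator encoding are assumptions that the spec must pin down to one canonical form):

```python
import hashlib

def _sha256(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def internal_node_id(mvid, metadata_token, arity, signature_shape):
    """nodeId for a method defined in a scanned assembly (MVID + metadata token based)."""
    # Hypothetical encoding: token rendered as 8 uppercase hex digits.
    return _sha256(f"{mvid}:{metadata_token:08X}:{arity}:{signature_shape}")

def external_node_id(artifact_key, symbol_key):
    """nodeId for a MemberRef/MethodSpec that targets another assembly."""
    return _sha256(f"ext:{artifact_key}:{symbol_key}")
```

The same inputs always yield the same 64-character hex id, which is what makes replay and runtime-stack correlation work.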
+
+## 5.6 `RoslynEntrypointExtractor`
+
+Responsibilities:
+
+* Produce `entrypoints[]` records pointing to nodeIds.
+
+### Must support (v1)
+
+**ASP.NET Core MVC controllers**
+
+* Type has `[ApiController]` or derives from `ControllerBase`
+* Action methods: public instance methods with routing attributes `[HttpGet]`, `[HttpPost]`, `[Route]`, etc.
+* Route template:
+
+  * combine controller + action route attributes (best effort)
+* `entrypoint.kind = http`, `framework=aspnetcore`
+
+**Minimal APIs**
+
+* Detect invocation of `MapGet`, `MapPost`, `MapPut`, `MapDelete`, `MapMethods`
+* Extract route string literal when available
+* Handler target:
+
+  * lambda => map to the compiler-generated method (best effort)
+  * method group => resolve to method symbolKey => nodeId
+
+**gRPC**
+
+* Detect `MapGrpcService<TService>()` (endpoint registration)
+* Entry points: service methods on generated base types (best effort)
+
+**Background jobs**
+
+* Types implementing `IHostedService`
+* `BackgroundService.ExecuteAsync` override
+* `entrypoint.kind = job`
+
+### Mapping Roslyn → nodeId
+
+Do **not** attempt to compute metadata tokens from Roslyn symbols directly.
+
+Instead:
+
+* Generate the same canonical `symbolKey` for Roslyn symbols
+* Resolve `symbolKey -> nodeId` using a dictionary built from IL nodes
+
+If not resolvable, emit an entrypoint with a synthetic “unresolved” node:
+
+* `nodeId = SHA256("unresolved:" + symbolKey)`
+* `flags |= Unresolved`
+* `why += ENTRYPOINT_SYMBOL_UNRESOLVED`
+
+## 5.7 `RoslynHeuristicEdgeExtractor`
+
+Responsibilities:
+
+* Add **heuristic edges** that IL won’t reliably capture.
+
+### DI bindings (must-have)
+
+Detect common DI registration patterns:
+
+* `services.AddTransient<TService, TImplementation>()`
+* `AddScoped`, `AddSingleton`
+  Emit heuristic edge:
+* from: interface method set?
(v1 simplify to type-level constructor edge) +* to: `Foo..ctor(...)` node +* `reason = di_binding` + +Practical v1 implementation: + +* Create edge from a synthetic “DI container” node per assembly to implementation constructors. +* Or create edges from the registration site method to the constructor. + (Choose one and keep consistent.) + +### Reflection (must-have) + +Emit heuristic edges with lower confidence: + +* `Type.GetType("Namespace.Type, Assembly")` +* `Assembly.Load(...)`, `GetMethod("X")`, `Invoke` +* `Activator.CreateInstance(...)` + +If string literal resolves to a type/method in the solution, create edge: + +* from: caller method +* to: target method/ctor +* `reason = reflection_string` + +If not resolvable, record a `why=REFLECTION_UNRESOLVED_STRING` diagnostic; do not crash. + +## 5.8 `GraphMerger` + +Responsibilities: + +* Merge nodes/edges/entrypoints from IL and Roslyn stages +* De-duplicate edges by `(from,to,kind,reason)` +* Apply optional throttles: + + * cap edges per node + * drop low-weight heuristics if too many + +## 5.9 `CallGraphWriter` + +Responsibilities: + +* Serialize `CallGraph.v1.json` exactly to spec +* Include: + + * `artifacts[]` (project outputs + references) + * `nodes[]`, `edges[]` + * `entrypoints[]` + * `language = "dotnet"` + * `scanKey` + +--- + +# 6) Canonical symbolKey format (critical for merges) + +Pick one canonical form and use it everywhere. + +Recommended v1 `symbolKey` shape: + +``` +{Namespace}.{TypeName}[`Arity][+Nested]::{MethodName}[`Arity]({ParamType1},{ParamType2},...) +``` + +Rules: + +* Use `System.*` full names for BCL types +* Use `+` for nested types (metadata style) +* Use backtick arity for generic type/method definitions +* For arrays: `System.String[]` +* For byref: `System.String&` + +**Implementation detail:** + +* IL extractor can build this from metadata signatures. +* Roslyn extractor can build this using a controlled `SymbolDisplayFormat`. 
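To make the format concrete, here is a minimal Python sketch of a canonical symbolKey builder (the helper signature is illustrative; the bullet rules above are normative):

```python
def symbol_key(namespace, type_name, method, params,
               type_arity=0, method_arity=0, nested=None):
    """Build the canonical v1 symbolKey: Ns.Type[`N][+Nested]::Method[`N](P1,P2,...)."""
    t = f"{namespace}.{type_name}"
    if type_arity:
        t += f"`{type_arity}"          # backtick arity for generic definitions
    for n in (nested or []):
        t += f"+{n}"                   # metadata-style nested types
    m = method + (f"`{method_arity}" if method_arity else "")
    return f"{t}::{m}({','.join(params)})"
```

Both the IL extractor and the Roslyn formatter must emit byte-identical strings from their respective inputs; a shared golden test over examples like these is cheap insurance.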
+ +If you get this right, Roslyn → IL mapping becomes reliable. + +--- + +# 7) CLI surface (what developers will actually run) + +Minimum viable commands: + +### Artifacts-first scan + +```bash +stella-worker-dotnet scan \ + --scanKey 00000000-0000-0000-0000-000000000000 \ + --assemblies ./artifacts/bin/Release \ + --out ./callgraph.json +``` + +### Build-and-scan (internal trusted only) + +```bash +stella-worker-dotnet scan \ + --scanKey ... \ + --sln ./src/MySolution.sln \ + --configuration Release \ + --tfm net10.0 \ + --buildMode build \ + --out ./callgraph.json +``` + +### Upload to scanner.webservice + +```bash +stella-worker-dotnet scan \ + --scanKey ... \ + --assemblies ./artifacts/bin/Release \ + --upload https://scanner/api/scans/{scanId}/callgraphs \ + --apiKey $STELLA_API_KEY +``` + +--- + +# 8) Observability and failure behavior + +## 8.1 Structured diagnostics + +Always emit: + +* counts: nodes/edges/entrypoints +* build outcome: success/failed/partial +* list of projects scanned/skipped +* unresolved symbol counts (entrypoints + heuristic edges) + +## 8.2 Hard failure vs partial output + +* If at least one assembly scanned, output a graph even if others fail. +* Mark diagnostics in output: + + * add `why`/`notes` (if you extend schema) OR log to stderr and let webservice record the warning on ingest. + +--- + +# 9) Why this architecture works for reachability + +* **IL extraction** provides the most faithful call edges and stable node IDs. +* **Roslyn adds what IL can’t:** framework entrypoints, DI and reflection heuristics. +* Node IDs based on **MVID+token** align with deterministic compilation behavior and runtime tooling patterns. Deterministic compilation replaces timestamp/MVID with values derived from compilation inputs, enabling stable identity under stable inputs. ([Microsoft Learn][3]) +* Roslyn MSBuildWorkspace is the canonical way to load solutions/projects with correct references and compilation options. 
([NuGet][4]) + +--- + +# 10) Implementation “must-do” checklist for the developer + +1. **Define canonical symbolKey** and implement it in: + + * IL extractor + * Roslyn symbol formatter +2. Implement `ReferenceIndexer` to map assembly identity → artifactKey. +3. Implement IL extractor: + + * nodes for method defs + * edges for call opcodes +4. Implement entrypoint detectors (controllers + minimal APIs + hosted service). +5. Implement DI + reflection heuristic edges. +6. Merge and output `CallGraph.v1.json` matching schema. +7. Add golden tests (small ASP.NET apps) verifying: + + * entrypoint detection + * at least one static path exists + * DI/reflection edges appear with correct reason codes + +--- + +If you want, I can also provide: + +* a **concrete folder layout** (`/src/StellaOps.Scanner.Worker.DotNet/...`) with project files, +* the **exact `SymbolDisplayFormat`** you should use for symbolKey stability, +* and an IL opcode decoder snippet that only records call-like opcodes (fast, minimal). 
+ +[1]: https://dotnet.microsoft.com/en-us/platform/support/policy/dotnet-core ".NET and .NET Core official support policy | .NET" +[2]: https://dotnet.microsoft.com/en-US/download/dotnet/10.0 "Download .NET 10.0 (Linux, macOS, and Windows) | .NET" +[3]: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/compiler-options/code-generation?utm_source=chatgpt.com "C# Compiler Options that control code generation" +[4]: https://www.nuget.org/packages/Microsoft.CodeAnalysis.Workspaces.MSBuild/?utm_source=chatgpt.com "Microsoft.CodeAnalysis.Workspaces.MSBuild 5.0.0" +[5]: https://www.nuget.org/packages/microsoft.codeanalysis?utm_source=chatgpt.com "Microsoft.CodeAnalysis 5.0.0" +[6]: https://www.nuget.org/profiles/RoslynTeam?utm_source=chatgpt.com "NuGet Gallery | RoslynTeam" +[7]: https://www.nuget.org/packages/mono.cecil/?utm_source=chatgpt.com "Mono.Cecil 0.11.6" +[8]: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/diagnostics-client-library?utm_source=chatgpt.com "Diagnostics client library - .NET" diff --git a/docs/product-advisories/13-Dec-2025 - Smart‑Diff - Defining Meaningful Risk Change.md b/docs/product-advisories/13-Dec-2025 - Smart‑Diff - Defining Meaningful Risk Change.md new file mode 100644 index 000000000..253b7615e --- /dev/null +++ b/docs/product-advisories/13-Dec-2025 - Smart‑Diff - Defining Meaningful Risk Change.md @@ -0,0 +1,892 @@ +Here’s a crisp, first‑time‑friendly blueprint for **Smart‑Diff**—a minimal‑noise way to highlight only changes that actually shift security risk, not every tiny SBOM/VEX delta. + +--- + +# What “Smart‑Diff” means (in plain terms) + +Smart‑Diff is the **smallest set of changes** between two builds/releases that **materially change risk**. We only surface a change when it affects exploitability or policy—not when a dev-only transitive bumped a patch with no runtime path. 
+ +**Count it as a Smart‑Diff only if at least one of these flips:** + +* **Reachability:** new reachable vulnerable code appears, or previously reachable code becomes unreachable. +* **VEX status:** a CVE’s status changes (e.g., to `not_affected`). +* **Version vs affected ranges:** a dependency crosses into/out of a known vulnerable range. +* **KEV/EPSS/Policy:** CISA KEV listing, EPSS spike, or your org policy gates change. + +Ignore: + +* CVEs that are both **unreachable** and **VEX = not_affected**. +* Pure patch‑level churn that doesn’t cross an affected range and isn’t KEV‑listed. +* Dev/test‑only deps with **no runtime path**. + +--- + +# Minimal data model (practical) + +* **DiffSet { added, removed, changed }** for packages, symbols, CVEs, and policy gates. +* **AffectedGraph { package → symbol → call‑site }**: reachability edges from entrypoints to vulnerable sinks. +* **EvidenceLink { attestation | VEX | KEV | scanner trace }** per item, so every claim is traceable. + +--- + +# Core algorithms (what makes it “smart”) + +* **Reachability‑aware set ops:** run set diffs only on **reachable** vuln findings. +* **SemVer gates:** treat “crossing an affected range” as a boolean boundary; patch bumps inside a safe range don’t alert. +* **VEX merge logic:** vendor or internal VEX that says `not_affected` suppresses noise unless KEV contradicts. +* **EPSS‑weighted priority:** rank surfaced diffs by latest EPSS; KEV always escalates to top. +* **Policy overlays:** org rules (e.g., “block any KEV,” “warn if EPSS > 0.7”) applied last. + +--- + +# Example (why it’s quieter, but safer) + +* **OpenSSL 3.0.10 → 3.0.11** with VEX `not_affected` for a CVE: Smart‑Diff marks **risk down** and **closes** the prior alert. +* A **transitive dev dependency** changes with **no runtime path**: Smart‑Diff **logs only**, no red flag. 
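The flip conditions above reduce to a pure predicate over two finding states. A minimal sketch (Python for illustration; field names are assumed, and the EPSS threshold is an org policy knob, not a fixed value):

```python
EPSS_THRESHOLD = 0.7  # assumed org policy gate, e.g. "warn if EPSS > 0.7"

def is_material_change(old, new):
    """Return (material?, reasons) given old/new finding states.

    Each state is a dict with: reachable, vex_status, in_affected_range,
    kev, epss. A change is material only if at least one gate flips.
    """
    reasons = []
    if old["reachable"] != new["reachable"]:
        reasons.append("REACHABILITY_FLIP")
    if old["vex_status"] != new["vex_status"]:
        reasons.append("VEX_FLIP")
    if old["in_affected_range"] != new["in_affected_range"]:
        reasons.append("RANGE_FLIP")
    if old["kev"] != new["kev"]:
        reasons.append("KEV_FLIP")
    if (old["epss"] >= EPSS_THRESHOLD) != (new["epss"] >= EPSS_THRESHOLD):
        reasons.append("EPSS_THRESHOLD")
    return bool(reasons), reasons
```

Because the predicate only compares booleans and a thresholded score, pure patch churn inside a safe range yields `(False, [])` and never reaches the report.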
+ +--- + +# Implementation plan (Stella Ops‑ready) + +**1) Inputs** + +* SBOM (CycloneDX/SPDX) old vs new +* VEX (OpenVEX/CycloneDX VEX) +* Vuln feeds (NVD, vendor), **CISA KEV**, **EPSS** +* Reachability traces (per language analyzers) + +**2) Normalize** + +* Map all deps to **purl**, normalize versions, index CVEs → affected ranges. +* Ingest VEX and attach to CVE ↔ component with clear status precedence. + +**3) Build graphs** + +* Generate/refresh **AffectedGraph** per build: entrypoints → call stacks → vulnerable symbols. +* Tag each finding with `{reachable?, vex_status, kev?, epss, policy_flags}`. + +**4) Diff** + +* Compute **DiffSet** between builds for: + + * Reachable findings + * VEX statuses + * Version/range crossings + * Policy/KEV/EPSS gates + +**5) Prioritize & suppress** + +* Drop items that are **unreachable AND not_affected**. +* Collapse patch‑level churn unless **KEV‑listed**. +* Sort remaining by **KEV first**, then **EPSS**, then **runtime blast‑radius** (fan‑in/fan‑out). + +**6) Evidence** + +* Attach **EvidenceLink** to each surfaced change: + + * VEX doc (line/ID) + * KEV entry + * EPSS score + timestamp + * Reachability call stack (top 1‑3 paths) + +**7) UX** + +* Pipeline‑first: output a **Smart‑Diff report JSON** + concise CLI table: + + * `risk ↑/↓`, reason (reachability/VEX/KEV/EPSS), component@version, CVE, **one** example call‑stack. +* UI is an explainer: expand to full stack, VEX note, KEV link, and “minimum safe change” suggestion. + +--- + +# Module sketch (your stack) + +* **Services:** `Sbomer.Diff`, `Vexer.Merge`, `Scanner.Reachability`, `Feedser.KEV/EPSS`, `Policy.Engine`, `SmartDiff.Service` +* **Store:** PostgreSQL (SoR), Valkey cache (ephemeral). Tables: `components`, `cves`, `vex_entries`, `reachability_edges`, `smartdiff_events`, `evidence_links`. 
+* **APIs:** + + * `POST /smartdiff/compare` → returns filtered diff + priorities + * `GET /smartdiff/:id/evidence` → links to VEX/KEV/EPSS + trace +* **CI usage:** `stella smart-diff --old sbomA.json --new sbomB.json --vex vex.json --out smartdiff.json` + +--- + +# Guardrails (to keep it deterministic) + +* Freeze feed snapshots per run (hash KEV/EPSS CSVs + VEX docs). +* Version the merge rules (VEX precedence + policy) and emit in the report header. +* Log the **exact** semver comparisons that triggered/exempted an alert. + +If you want, I can draft the **Postgres schema**, the **.NET 10 DTOs** for `DiffSet` and `AffectedGraph`, and a **CLI prototype** (`stella smart-diff`) you can drop into your pipeline. +Noted: the services are **Concelier** (feeds: KEV/EPSS/NVD/vendor snapshots) and **Excititor** (VEX merge + status resolution). I’ll use those names going forward. + +Below is a **product + business analysis implementation spec** that a developer can follow to build the Smart‑Diff capability you described. + +--- + +# 1) Product objective + +## Problem + +Classic SBOM/VEX diffs are noisy: they surface *all* dependency/CVE churn, even when nothing changes in **actual exploitable risk**. + +## Goal + +Produce a **Smart‑Diff report** between two builds/releases that highlights only changes that **materially impact security risk**, with evidence attached. + +## Success criteria + +* **Noise reduction:** >80% fewer diff items vs raw SBOM diff for typical builds (measured by count). +* **No missed “high-risk flips”:** any change that creates or removes a **reachable vulnerable path** must appear. +* **Traceability:** every surfaced Smart‑Diff item has at least **one evidence link** (VEX entry, reachability trace, KEV reference, feed snapshot hash, scanner output). + +--- + +# 2) Scope + +## In scope (MVP) + +* Compare two “build snapshots”: `{SBOM, VEX, reachability traces, vuln feed snapshot, policy snapshot}` +* Detect & report these change types: + + 1. 
**Reachability flips** (reachable ↔ unreachable) + 2. **VEX status changes** (e.g., `affected` → `not_affected`) + 3. **Version crosses vuln boundary** (safe ↔ affected range) + 4. **KEV/EPSS/policy gate flips** (e.g., becomes KEV-listed) +* Suppress noise using explicit rules (see section 6) +* Output: + + * JSON report for CI + * concise CLI output (table) + * optional UI list view (later) + +## Out of scope (for now) + +* Full remediation planning / patch PR automation +* Cross-repo portfolio aggregation (doable later) +* Advanced exploit intelligence beyond KEV/EPSS + +--- + +# 3) Key definitions (developers must implement these exactly) + +## 3.1 Finding + +A “finding” is a tuple: + +`FindingKey = (component_purl, component_version, cve_id)` + +…and includes computed fields: + +* `reachable: bool | unknown` +* `vex_status: enum` (see 3.3) +* `in_affected_range: bool | unknown` +* `kev: bool` +* `epss_score: float | null` +* `policy_flags: set` +* `evidence_links: list` + +## 3.2 Material risk change (Smart‑Diff item) + +A change is “material” if it changes the computed **RiskState** for any `FindingKey` or creates/removes a `FindingKey` that is in-scope after suppression rules. + +## 3.3 VEX status vocabulary + +Normalize all incoming VEX statuses into a fixed internal enum: + +* `AFFECTED` +* `NOT_AFFECTED` +* `FIXED` +* `UNDER_INVESTIGATION` +* `UNKNOWN` (no statement or unparseable) + +> Note: Use OpenVEX/CycloneDX VEX mappings, but internal logic must operate on the above set. + +--- + +# 4) System context and responsibilities + +You already have a modular setup. 
Developers should implement Smart‑Diff as a pipeline over these components: + +## Components (names aligned to your system) + +* **Sbomer** + + * Ingest SBOM(s), normalize to purl/version graph +* **Scanner.Reachability** + + * Produce reachability traces: entrypoints → call paths → vulnerable symbol/sink +* **Concelier** + + * Fetch + snapshot vulnerability intelligence (NVD/vendor/OSV as applicable), **CISA KEV**, **EPSS** + * Provide *feed snapshot identifiers* (hashes) per run +* **Excititor** + + * Ingest and merge VEX sources + * Resolve a final `vex_status` per (component, cve) + * Provide precedence + explanation +* **Policy.Engine** + + * Evaluate org rules against a computed finding (e.g., “block if KEV”) +* **SmartDiff.Service** + + * Compute risk states for “old” and “new” + * Diff them + * Suppress noise + * Rank + output report with evidence + +--- + +# 5) Developer deliverables + +## Deliverable A: Smart‑Diff computation library + +A deterministic library that takes: + +* `OldSnapshot` and `NewSnapshot` (see section 7) +* returns a `SmartDiffReport` + +## Deliverable B: Service endpoint + +`POST /smartdiff/compare` returns report JSON. + +## Deliverable C: CLI command + +`stella smart-diff --old --new [--policy policy.json] --out smartdiff.json` + +--- + +# 6) Smart‑Diff rules + +Developers must implement these as **explicit, testable rule functions**. + +## 6.1 Suppression rules (noise filters) + +A finding is **suppressed** if ALL apply: + +1. `reachable == false` (or `unknown` treated as false only if you explicitly decide; recommended: unknown is *not* suppressible) +2. `vex_status == NOT_AFFECTED` +3. `kev == false` +4. no policy requires it (e.g., “report all vuln findings” override) + +**Patch churn suppression** + +* If a component version changes but: + + * `in_affected_range` remains false in both versions, AND + * no KEV/policy flag flips, + * then suppress (don’t surface). 
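Conditions 1–4 and the patch-churn filter above are exactly the kind of rules that benefit from table-driven tests. A sketch of both as pure predicates (Python for illustration; field names assumed):

```python
def is_suppressed(finding, policy_requires_all=False):
    """Suppress only when ALL of conditions 1-4 hold.

    Note `is False`: an unknown reachability (None) is deliberately
    NOT suppressible, per the recommendation above.
    """
    return (
        finding["reachable"] is False
        and finding["vex_status"] == "NOT_AFFECTED"
        and not finding["kev"]
        and not policy_requires_all
    )

def is_patch_churn(old_finding, new_finding):
    """Version changed, but stayed outside affected ranges and no KEV flip."""
    return (
        old_finding["version"] != new_finding["version"]
        and old_finding["in_affected_range"] is False
        and new_finding["in_affected_range"] is False
        and old_finding["kev"] == new_finding["kev"]
    )
```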
+ +**Dev/test dependency suppression (optional if you already tag scopes)** + +* If SBOM scope indicates `dev/test` AND `reachable == false`, suppress. +* If reachability is unknown, do **not** suppress by scope alone (avoid false negatives). + +## 6.2 Material change detection rules + +Surface a Smart‑Diff item when any of the following changes between old and new: + +### Rule R1: Reachability flip + +* `reachable` changes: `false → true` (risk ↑) or `true → false` (risk ↓) +* Include at least one call path as evidence if reachable is true. + +### Rule R2: VEX status flip + +* `vex_status` changes meaningfully: + + * `AFFECTED ↔ NOT_AFFECTED` + * `UNDER_INVESTIGATION → NOT_AFFECTED` etc. +* Changes involving `UNKNOWN` should be shown but ranked lower unless KEV. + +### Rule R3: Affected range boundary + +* `in_affected_range` flips: + + * `false → true` (risk ↑) + * `true → false` (risk ↓) +* This is the main guard against patch churn noise. + +### Rule R4: Intelligence / policy flip + +* `kev` changes `false → true` or `epss_score` crosses a configured threshold +* any `policy_flag` changes severity (warn → block) + +--- + +# 7) Snapshot contract (what Smart‑Diff compares) + +Define a stable internal format: + +```json +{ + "snapshot_id": "build-2025.12.14+sha.abc123", + "created_at": "2025-12-14T12:34:56Z", + "sbom": { "...": "CycloneDX or SPDX raw" }, + "vex_documents": [ { "...": "OpenVEX/CycloneDX VEX raw" } ], + "reachability": { + "analyzer": "java-callgraph@1.2.0", + "entrypoints": ["com.app.Main#main"], + "paths": [ + { + "component_purl": "pkg:maven/org.example/foo@1.2.3", + "cve": "CVE-2024-1234", + "sink": "org.example.foo.VulnClass#vulnMethod", + "callstack": ["...", "..."] + } + ] + }, + "concelier_feed_snapshot": { + "kev_hash": "sha256:...", + "epss_hash": "sha256:...", + "vuln_db_hash": "sha256:..." + }, + "policy_snapshot": { "policy_hash": "sha256:...", "rules": [ ... 
] } +} +``` + +**Implementation note** + +* SBOM/VEX can remain “raw”, but you must also build normalized indexes (in-memory or stored) for diffing. + +--- + +# 8) Data normalization requirements + +## 8.1 Component identity + +* Use **purl** as canonical component ID. +* Normalize casing, qualifiers, and version string normalization per ecosystem. + +## 8.2 Vulnerability identity + +* Use `CVE-*` as primary key where available. +* If you ingest OSV IDs too, map them to CVE when possible but keep OSV ID in evidence. + +## 8.3 Affected range evaluation + +Implement: +`bool? IsVersionInAffectedRange(version, affectedRanges)` + +Return `null` (unknown) if version cannot be parsed or range semantics are unknown. + +--- + +# 9) Excititor: VEX merge requirements + +Developers should implement Excititor as a deterministic resolver: + +## 9.1 Inputs + +* List of VEX documents, each with metadata: + + * `source` (vendor/internal/scanner) + * `issued_at` + * `signature/attestation` info (if present) + +## 9.2 Output + +For each `(component_purl, cve_id)`: + +* `final_status` +* `winning_statement_id` +* `precedence_reason` +* `all_statements[]` (for audit) + +## 9.3 Precedence rules (recommendation) + +Implement as ordered priority (highest wins), unless overridden by your org: + +1. **Internal signed VEX** (security team attested) +2. **Vendor signed VEX** +3. **Internal unsigned VEX** +4. **Scanner/VEX-like annotations** +5. None → `UNKNOWN` + +Conflict handling: + +* If two same-priority statements disagree, pick newest by `issued_at`, but **record conflict** and surface it as a low-priority Smart‑Diff meta-item (optional). + +--- + +# 10) Concelier: feed snapshot requirements + +Concelier must provide deterministic inputs to Smart‑Diff. 
+ +## 10.1 What Concelier stores + +* KEV list snapshot +* EPSS snapshot +* Vulnerability database snapshot (your choice: NVD mirror, OSV, vendor advisories) + +## 10.2 Required APIs (internal) + +* `GET /concelier/snapshots/latest` +* `GET /concelier/snapshots/{hash}` +* `GET /concelier/kev/{snapshotHash}/is_listed?cve=CVE-...` +* `GET /concelier/epss/{snapshotHash}/score?cve=CVE-...` + +## 10.3 Determinism + +Smart‑Diff report must include the snapshot hashes used, so the result can be reproduced. + +--- + +# 11) RiskState computation (core dev logic) + +Implement a pure function: + +`RiskState ComputeRiskState(FindingKey key, Snapshot snapshot)` + +### Inputs used + +* SBOM: to confirm component exists, scope, runtime path +* Concelier feeds: KEV, EPSS, affected ranges +* Excititor: VEX status +* Reachability analyzer output +* Policy engine: flags based on org rules + +### Output + +```json +{ + "finding_key": { "purl": "...", "version": "...", "cve": "..." }, + "reachable": true, + "vex_status": "AFFECTED", + "in_affected_range": true, + "kev": false, + "epss": 0.42, + "policy": { + "decision": "WARN|BLOCK|ALLOW", + "flags": ["epss_over_0_4"] + }, + "evidence": [ + { "type": "reachability_trace", "ref": "trace:abc", "detail": "short call stack..." }, + { "type": "vex", "ref": "openvex:doc123#stmt7" }, + { "type": "concelier_snapshot", "ref": "sha256:..." } + ] +} +``` + +--- + +# 12) Diff engine specification + +## 12.1 Inputs + +* `OldRiskStates: map` +* `NewRiskStates: map` + +You build these maps by: + +1. Enumerating candidate findings in each snapshot: + + * from vulnerability matching against SBOM components (affected ranges) + * plus any VEX statements referencing components +2. Joining with reachability traces +3. Resolving status via Excititor +4. 
Applying Concelier intelligence + policy + +## 12.2 Diff output types + +Return `SmartDiffItem` with: + +* `change_type`: `ADDED|REMOVED|CHANGED` +* `risk_direction`: `UP|DOWN|NEUTRAL` +* `reason_codes`: `[REACHABILITY_FLIP, VEX_FLIP, RANGE_FLIP, KEV_FLIP, POLICY_FLIP, EPSS_THRESHOLD]` +* `old_state` / `new_state` +* `priority_score` +* `evidence_links[]` + +## 12.3 Suppress AFTER diff, not before + +Important: compute diff on full sets, then suppress items by rules, because: + +* suppression itself can flip (e.g., VEX becomes `not_affected` → item disappears, which is meaningful as “risk down”). + +--- + +# 13) Priority scoring & ranking + +Implement a deterministic score: + +### Hard ordering + +1. `kev == true` in new state → top tier +2. Reachable in new state (`reachable == true`) → next tier + +### Numeric scoring (example) + +``` +score = + + 1000 if new.kev + + 500 if new.reachable + + 200 if reason includes RANGE_FLIP to affected + + 150 if VEX_FLIP to AFFECTED + + 0..100 based on EPSS (epss * 100) + + policy weight: +300 if decision BLOCK, +100 if WARN +``` + +Always include `score_breakdown` in report for explainability. 
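The scoring formula above can be implemented directly as a pure function that also emits the required `score_breakdown`. A sketch (Python for illustration; the weights are the example values from this section, not a tuned model):

```python
def priority_score(new_state, reasons):
    """Deterministic priority per the example formula; returns (score, breakdown)."""
    breakdown = {}
    if new_state["kev"]:
        breakdown["kev"] = 1000
    if new_state["reachable"]:
        breakdown["reachable"] = 500
    if "RANGE_FLIP" in reasons and new_state["in_affected_range"]:
        breakdown["range_flip_to_affected"] = 200
    if "VEX_FLIP" in reasons and new_state["vex_status"] == "AFFECTED":
        breakdown["vex_flip_to_affected"] = 150
    breakdown["epss"] = round((new_state.get("epss") or 0.0) * 100)
    decision = new_state.get("policy_decision")
    if decision == "BLOCK":
        breakdown["policy"] = 300
    elif decision == "WARN":
        breakdown["policy"] = 100
    return sum(breakdown.values()), breakdown
```

Keeping the breakdown as a plain dict makes it trivial to serialize into the report and to assert on in the determinism tests.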
+ +--- + +# 14) Evidence requirements (must implement) + +Every Smart‑Diff item must include **at least one** evidence link, and ideally 2–4: + +EvidenceLink schema: + +```json +{ + "type": "vex|reachability|kev|epss|scanner|sbom|policy", + "ref": "stable identifier", + "summary": "one-line human readable", + "blob_hash": "sha256 of raw evidence payload (optional)" +} +``` + +Examples: + +* `type=kev`: ref is `concelier:kev@{snapshotHash}#CVE-2024-1234` +* `type=reachability`: ref is `reach:{snapshotId}:{traceId}` +* `type=vex`: ref is `openvex:{docHash}#statement:{id}` + +--- + +# 15) API specification + +## 15.1 Compare endpoint + +`POST /smartdiff/compare` + +Request: + +```json +{ + "old_snapshot_id": "buildA", + "new_snapshot_id": "buildB", + "options": { + "include_suppressed": false, + "max_items": 200, + "epss_threshold": 0.7 + } +} +``` + +Response: + +```json +{ + "report_id": "smartdiff:2025-12-14:xyz", + "old": { "snapshot_id": "buildA", "feed_hashes": { ... } }, + "new": { "snapshot_id": "buildB", "feed_hashes": { ... } }, + "summary": { + "risk_up": 3, + "risk_down": 8, + "reachable_new": 2, + "kev_new": 1, + "suppressed": 143 + }, + "items": [ + { + "change_type": "CHANGED", + "risk_direction": "UP", + "priority_score": 1680, + "reason_codes": ["REACHABILITY_FLIP","RANGE_FLIP"], + "finding_key": { + "purl": "pkg:maven/org.example/foo", + "version_old": "1.2.3", + "version_new": "1.2.4", + "cve": "CVE-2024-1234" + }, + "old_state": { "...": "RiskState" }, + "new_state": { "...": "RiskState" }, + "evidence": [ ... ] + } + ] +} +``` + +## 15.2 Evidence endpoint + +`GET /smartdiff/{report_id}/evidence/{evidence_ref}` + +Returns raw stored evidence (or a signed URL if you store blobs elsewhere). 
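Building the EvidenceLink objects from section 14 is mostly a matter of deterministic hashing. A sketch of a constructor that fills the optional `blob_hash` when a raw payload is available (Python for illustration; helper name is hypothetical):

```python
import hashlib
import json

def make_evidence_link(type_, ref, summary, raw_payload=None):
    """Build an EvidenceLink; blob_hash is sha256 of the raw evidence bytes.

    JSON payloads are canonicalized (sorted keys) so the same evidence
    always hashes to the same value, which keeps reports reproducible.
    """
    link = {"type": type_, "ref": ref, "summary": summary[:140]}
    if raw_payload is not None:
        blob = (raw_payload if isinstance(raw_payload, bytes)
                else json.dumps(raw_payload, sort_keys=True).encode("utf-8"))
        link["blob_hash"] = "sha256:" + hashlib.sha256(blob).hexdigest()
    return link
```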
+ +--- + +# 16) CLI behavior + +Command: + +``` +stella smart-diff \ + --old ./snapshots/buildA \ + --new ./snapshots/buildB \ + --policy ./policy.json \ + --out ./smartdiff.json +``` + +CLI output (human): + +* Summary line: `risk ↑ 3 | risk ↓ 8 | new reachable 2 | new KEV 1` +* Then top N items sorted by priority, each one line: + + * `↑ REACHABILITY_FLIP foo@1.2.4 CVE-2024-1234 (EPSS 0.42) path: Main→...→vulnMethod` + +Exit code: + +* `0` if policy decision overall is ALLOW/WARN +* `2` if any item triggers policy BLOCK in new snapshot (configurable) + +--- + +# 17) Storage schema (Postgres) — implementation-ready + +You can implement in a single schema to start; split later. + +## Core tables + +### `snapshots` + +* `snapshot_id (pk)` +* `created_at` +* `sbom_hash` +* `policy_hash` +* `kev_hash` +* `epss_hash` +* `vuln_db_hash` +* `metadata jsonb` + +### `components` + +* `component_id (pk)` (internal UUID) +* `snapshot_id (fk)` +* `purl` +* `version` +* `scope` (runtime/dev/test/unknown) +* `direct bool` +* indexes on `(snapshot_id, purl)` and `(purl, version)` + +### `findings` + +* `finding_id (pk)` +* `snapshot_id (fk)` +* `purl` +* `version` +* `cve` +* `reachable bool null` +* `vex_status text` +* `in_affected_range bool null` +* `kev bool` +* `epss real null` +* `policy_decision text` +* `policy_flags text[]` +* index `(snapshot_id, purl, cve)` + +### `reachability_traces` + +* `trace_id (pk)` +* `snapshot_id (fk)` +* `purl` +* `cve` +* `sink` +* `callstack jsonb` +* index `(snapshot_id, purl, cve)` + +### `vex_statements` + +* `stmt_id (pk)` +* `snapshot_id (fk)` +* `purl` +* `cve` +* `source` +* `issued_at` +* `status` +* `doc_hash` +* `raw jsonb` +* index `(snapshot_id, purl, cve)` + +### `smartdiff_reports` + +* `report_id (pk)` +* `created_at` +* `old_snapshot_id` +* `new_snapshot_id` +* `options jsonb` +* `summary jsonb` + +### `smartdiff_items` + +* `item_id (pk)` +* `report_id (fk)` +* `change_type` +* `risk_direction` +* `priority_score` +* 
`reason_codes text[]` +* `purl` +* `cve` +* `old_version` +* `new_version` +* `old_state jsonb` +* `new_state jsonb` + +### `evidence_links` + +* `evidence_id (pk)` +* `report_id (fk)` +* `item_id (fk)` +* `type` +* `ref` +* `summary` +* `blob_hash` + +--- + +# 18) Implementation plan (developer-focused) + +## Phase 1 — MVP (end-to-end working) + +1. **Normalize SBOM** + + * Parse CycloneDX/SPDX + * Build `components` list with purl + version + scope +2. **Concelier integration** + + * Load KEV + EPSS snapshots (even from local files initially) + * Expose snapshot hashes +3. **Excititor integration** + + * Parse OpenVEX/CycloneDX VEX + * Implement precedence rules and output `final_status` +4. **Affected range matching** + + * For each component, query vulnerability DB snapshot for affected ranges + * Produce candidate findings `(purl, version, cve)` +5. **Reachability ingestion** + + * Accept reachability JSON traces (even if generated elsewhere initially) + * Mark `reachable=true` when trace exists for (purl,cve) +6. **Compute RiskState** + + * For each finding compute `kev`, `epss`, `policy_decision` +7. **Diff + suppression + ranking** + + * Generate `SmartDiffReport` +8. **Outputs** + + * JSON report + CLI table + * Store report + items in Postgres + +Acceptance tests for Phase 1: + +* Given a known pair of snapshots, Smart‑Diff only includes: + + * reachable vulnerable changes + * VEX flips + * affected range boundary flips + * KEV flips +* Patch churn not crossing ranges is absent. 
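The acceptance criterion "patch churn not crossing ranges is absent" hinges on the range check from step 4 (section 8.3's `IsVersionInAffectedRange`). A sketch assuming dotted-integer versions with `[introduced, fixed)` semantics — real ecosystems (maven, npm, deb, …) each need their own comparator:

```python
def _parse(v):
    """Parse 'x.y.z' into a comparable tuple; None if unparseable."""
    if v is None:
        return None
    try:
        return tuple(int(part) for part in v.split("."))
    except ValueError:
        return None

def is_version_in_affected_range(version, affected_ranges):
    """Return True/False, or None (unknown) when parsing fails."""
    parsed = _parse(version)
    if parsed is None:
        return None
    verdicts = []
    for rng in affected_ranges:
        lo = _parse(rng.get("introduced", "0"))
        hi = _parse(rng.get("fixed"))  # missing 'fixed' => open-ended range
        if lo is None:
            return None
        verdicts.append(lo <= parsed and (hi is None or parsed < hi))
    return any(verdicts)
```

Returning `None` rather than guessing is what lets edge-case rule 2 ("version parse failures") stay honest downstream.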
+ +## Phase 2 — Determinism & evidence hardening + +* Store raw evidence blobs (VEX doc hash, trace payload hash) +* Ensure feed snapshots are immutable and referenced by hash +* Add `score_breakdown` +* Add conflict surfacing for VEX merge + +## Phase 3 — Performance & scale + +* Incremental computation (only recompute affected components changed) +* Cache Concelier lookups by `(snapshotHash, cve)` +* Batch range matching queries +* Add pagination and `max_items` enforcement + +--- + +# 19) Edge cases developers must handle + +1. **Reachability unknown** + + * If no analyzer output exists, set `reachable = null` + * Do not suppress solely based on `reachable=null` +2. **Version parse failures** + + * `in_affected_range = null` + * Surface range-related changes only when one side is determinable +3. **Component renamed / purl drift** + + * Consider purl normalization rules (namespace casing, qualifiers) + * If purl changes but is same artifact, treat as new component (unless you implement alias mapping later) +4. **Multiple CVE sources / duplicates** + + * Deduplicate by CVE ID per component+version +5. **Conflicting VEX statements** + + * Pick winner deterministically, but log conflict evidence +6. **KEV listed but VEX says not affected** + + * Still suppress? Recommended: + + * Do **not** suppress; surface as “KEV listed but VEX not_affected” and rank high (KEV tier) +7. 
**Policy config changes** + + * Treat policy hash difference as a diff dimension; surface “policy flip” items even if underlying vuln unchanged + +--- + +# 20) Testing strategy (must implement) + +## Unit tests + +* SemVer compare + affected range evaluation +* Excititor precedence resolution +* Suppression rules (table-driven tests) +* Priority scoring determinism + +## Integration tests + +* Build synthetic snapshots: + + * A: vuln present, unreachable, VEX not_affected + * B: same vuln reachable + * Assert Smart‑Diff surfaces exactly one item with `REACHABILITY_FLIP` +* KEV flip test: + + * Same findings, KEV list changes between Concelier snapshots + * Assert item surfaces with `KEV_FLIP` + +## Regression suite + +Keep a folder of snapshot pairs and expected outputs: + +* `fixtures/snapA`, `fixtures/snapB`, `expected.smartdiff.json` + +--- + +# 21) What the developer should code first (practical order) + +1. DTOs: + + * `Snapshot`, `Component`, `VexStatement`, `ReachTrace`, `FindingKey`, `RiskState`, `SmartDiffItem`, `SmartDiffReport` +2. Pure functions: + + * `NormalizePurl` + * `IsVersionInAffectedRange` + * `ResolveVexStatus` (Excititor) + * `ComputeRiskState` + * `DiffRiskStates` + * `ApplySuppression` + * `ScoreAndRank` +3. Persistence: + + * store snapshots and computed findings +4. 
API + CLI wrappers + +--- + +If you want, I can also provide: + +* a **concrete JSON Schema** for `SmartDiffReport` +* **C# (.NET 10) interfaces + class skeletons** for `ConcelierClient`, `ExcititorResolver`, and `SmartDiffService` +* a **fixture set** (sample SBOM/VEX/reach traces) to bootstrap the test suite diff --git a/docs/product-advisories/14-Dec-2025 - Add a dedicated “first_signal” event.md b/docs/product-advisories/14-Dec-2025 - Add a dedicated “first_signal” event.md new file mode 100644 index 000000000..69fb49ac4 --- /dev/null +++ b/docs/product-advisories/14-Dec-2025 - Add a dedicated “first_signal” event.md @@ -0,0 +1,1295 @@ +Here’s a lightweight pattern to make failures show up instantly while keeping backends decoupled: **emit a tiny, versioned event the moment you know something failed**, and attach pointers to heavier evidence that can arrive later. + +--- + +# Why this helps + +* **UI reacts in real time**: show “Failed at Step X (E123)” immediately—no waiting for logs, SBOMs, or artifacts to upload/process. +* **Backends evolve safely**: logs, traces, SBOM/VEX, heap dumps, etc., can change format or arrive out of order without breaking the UI contract. +* **Deterministic UX**: a small, stable schema prevents flaky pipelines from blocking visibility. +* **Great for air‑gapped/offline**: the tiny event rides your internal bus/storage; bulky payloads sync or materialize when available. + +--- + +# The event itself (keep it tiny) + +**Fields (stable, versioned):** + +* `v` — schema version (e.g., `1`). +* `ts` — event timestamp (UTC, ISO 8601). +* `run_id` — pipeline/execution correlation ID. +* `stage` — coarse phase (e.g., `fetch`, `build`, `scan`, `policy`, `deploy`). +* `step` — fine-grained step (e.g., `trivy-scan`, `dotnet-restore`). +* `status` — `fail|warn|pass|info` (for this pattern, you’ll use `fail`). +* `error_class` — stable classifier (e.g., `NETWORK_DNS`, `AUTH_EXPIRED`, `POLICY_BLOCK`, `VULN_REACHABLE`). 
+* `summary` — short human string (“Reachable vuln blocks release”). +* `pointers` — array of *opaque, resolvable references* (log offsets, artifact URIs, attestation IDs). +* `kv` — optional tiny key/values for quick filtering (e.g., `severity=A`, `package=openssl`). +* `sig` (optional) — detached/inline signature (DSSE) for integrity. + +**Example** + +```json +{ + "v": 1, + "ts": "2025-12-13T12:10:03Z", + "run_id": "run_7f3c6a8", + "stage": "policy", + "step": "vex-gate", + "status": "fail", + "error_class": "VULN_REACHABLE", + "summary": "Reachable CVE blocks release", + "pointers": [ + {"type":"log", "ref":"logs://scanner/7f3c6a8#L1423-L1480"}, + {"type":"attestation", "ref":"rekor://sha256:…"}, + {"type":"sbom", "ref":"artifact://sbom/cyclonedx@run_7f3c6a8.json"} + ], + "kv": {"cve":"CVE-2025-12345", "component":"openssl", "severity":"A"} +} +``` + +--- + +# UI behavior (instant, then enrich) + +1. **Instant render** (sub-200 ms): show a red card with stage/step, `error_class`, and `summary`. +2. **Progressive hydration**: as pointers resolve, add: + + * “View log excerpt” (jump to `#L1423-L1480`) + * “Open attestation” (verify DSSE/Rekor) + * “Inspect SBOM diff” (component → version → call‑graph) +3. **Stable affordances**: UI never breaks if a pointer is slow/missing; it just shows a spinner or “awaiting evidence”. + +--- + +# Backend contract + +* **Publish early**: emit on first knowledge of failure (e.g., non‑zero exit, policy deny, TLS error). +* **Don’t embed heavy data**: only pointers or tiny facts for filters. +* **Pointer resolution is pluggable**: files, object storage, Postgres row, Valkey cache key, Rekor entry—whatever suits the deployment. +* **Version discipline**: bump `v` only for breaking schema changes; additive fields are fine. 
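The "publish early, pointers only" contract fits in one small helper. A sketch of the `FailNow`-style emitter from the checklist below it (Python for illustration; the .NET library would mirror this shape):

```python
import datetime

def fail_now(run_id, stage, step, error_class, summary, pointers=(), kv=None):
    """Build a v1 tiny failure event on first knowledge of failure.

    Only pointers and tiny key/values go in; heavy evidence (logs, SBOMs,
    dumps) arrives later via the resolvable refs. Summary is clamped so
    producers can never blow the size budget through the human string.
    """
    ts = (datetime.datetime.now(datetime.timezone.utc)
          .isoformat(timespec="seconds").replace("+00:00", "Z"))
    return {
        "v": 1,
        "ts": ts,
        "run_id": run_id,
        "stage": stage,
        "step": step,
        "status": "fail",
        "error_class": error_class,
        "summary": summary[:140],
        "pointers": list(pointers),
        "kv": dict(kv or {}),
    }
```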
+ +--- + +# Minimal topic map (so teams agree on names) + +* `stage`: `fetch|build|scan|policy|sign|package|deploy` +* `error_class` suggestions: + + * Infra: `NETWORK_DNS`, `NETWORK_TIMEOUT`, `REGISTRY_403`, `DISK_FULL` + * AuthN/Z: `AUTH_EXPIRED`, `TOKEN_SCOPE_MISS` + * Supply chain: `ATTESTATION_MISSING`, `SIGNATURE_INVALID`, `SBOM_STALE` + * Secure build: `POLICY_BLOCK`, `VULN_REACHABLE`, `MALWARE_FLAG` + * Runtime: `IMAGE_DRIFT`, `PROVENANCE_MISMATCH` + +Keep each to a 1–2 line definition in a shared doc. + +--- + +# Drop‑in for Stella Ops (tailored) + +* **Emitter**: `StellaOps.Events` (tiny .NET lib) used by Scanner/Policy/Scheduler to publish `TinyFailureEvent`. +* **Transport**: Postgres notify (default) + Valkey pub/sub accelerator. (Matches your Postgres+Valkey architecture choice.) +* **Resolver service**: `EvidenceGateway` that turns `pointers` into viewable slices (log excerpts, SBOM component focus, Rekor proof). +* **UI**: “Failure Feed” panel shows cards from the event stream; detail drawer resolves pointers on demand. +* **Signing**: optional DSSE for events; Rekor (or mirror) for attestations—your “Proof‑Linked” moat. +* **Air‑gap**: pointers use `artifact://` and `row://` schemes resolvable entirely on‑prem. + +--- + +# Quick implementation checklist + +* Define `TinyFailureEvent` schema v1 and `error_class` registry. +* Add emit helpers for each module (`FailNow(summary, error_class, pointers, kv)`). +* Build `EvidenceGateway.Resolve(pointer)` handlers. +* UI: render card instantly; hydrate sections as resolvers return. +* Telemetry: metrics on TTF**E** (Time‑To‑Failure‑Event) and pointer hydration latencies. +* Docs: 1‑page contract; examples for each error_class. + +If you want, I can draft the .NET 10 interfaces (`ITinyEventEmitter`, resolvers, and a small Razor/Angular card) and a Postgres schema you can paste into your repo. 
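One pleasant property of the "Postgres notify (default)" transport: Postgres caps `NOTIFY` payloads at 8000 bytes by default, which lines up with the tiny-event budget and enforces the "don't embed heavy data" rule at the transport layer. A sketch of the serialization step a publisher would run before `SELECT pg_notify('failure_events', $1)` (channel name is an assumption):

```python
import json

PG_NOTIFY_LIMIT = 8000  # Postgres default NOTIFY payload cap, in bytes

def to_notify_payload(event):
    """Serialize a tiny event for pg_notify.

    Compact separators and sorted keys keep the payload small and
    deterministic; oversize events fail loudly instead of being
    silently truncated, pushing bulk data back into pointers.
    """
    payload = json.dumps(event, separators=(",", ":"), sort_keys=True)
    if len(payload.encode("utf-8")) > PG_NOTIFY_LIMIT:
        raise ValueError(
            "tiny event exceeds pg_notify payload limit; move data into pointers")
    return payload
```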
+Below is a **PM-grade implementation spec** for “Real-time Failure Signaling” using **Tiny Failure Events** + **Evidence Pointers**, written so engineers can build it without guessing. + +--- + +# Product: Real-time Failure Signaling (Tiny Failure Events) + +## Goal + +When any pipeline run fails, users must see **what failed and where** (stage/step + error class + short summary) **immediately**, even if logs/SBOM/attestations are delayed, huge, or unavailable. + +The UI must render a failure card from a tiny event and then progressively enrich with evidence as it becomes resolvable. + +## Outcomes we must deliver + +1. **Instant visibility:** “Failed at Step X” appears within seconds of failure. +2. **Decoupling:** UI depends only on a stable tiny schema, not on log formats/artifact structures. +3. **Evidence linking:** Users can open logs/SBOM/attestations when available, via pointers. +4. **Reliability:** Duplicate/out-of-order events don’t break the UI; state remains consistent. +5. **Security:** Evidence access is authorized; pointers do not leak sensitive info. + +--- + +# Scope + +## In scope (MVP) + +* Emit **TinyFailureEvent v1** on first detected failure for a step. +* Transport events in near real-time to UI. +* Store events durably and allow UI to fetch a run’s event timeline. +* Support evidence pointers for: + + * logs (excerptable) + * artifacts (SBOM, reports) + * attestations (provenance/signature) +* UI: + + * show run timeline + * show failure card instantly + * hydrate evidence sections on demand (or automatically where feasible) + +## Out of scope (MVP) + +* Full trace viewer / distributed tracing UI (we can link to external trace systems via pointer). +* Automated remediation (“fix it”) actions. +* Full-blown case management. + +--- + +# Key terms and definitions + +* **Run:** A single execution of a pipeline. Identified by `run_id`. +* **Stage:** Coarse lifecycle phase (`fetch`, `build`, `scan`, `policy`, `sign`, `package`, `deploy`). 
+* **Step:** A concrete activity within a stage (`dotnet-restore`, `trivy-scan`, `vex-gate`). +* **Tiny Failure Event:** A small message representing “this step failed”, including stable classification and references to evidence. +* **Pointer:** An opaque reference that can be resolved into evidence content or a link later. + +--- + +# User stories and acceptance criteria + +## Story 1: I see failure instantly + +**As a** developer +**I want** to see which step failed immediately +**So that** I don’t wait on logs/artifacts + +**Acceptance criteria** + +* When a step fails, the UI updates within **≤ 2 seconds p95** from the time the orchestrator/runner detects failure. +* The failure card includes: + + * stage, step + * error class + * human summary + * timestamp + * (optional) primary key/value details (e.g., CVE, severity) + +## Story 2: I can open evidence when available + +**As a** release engineer +**I want** to click evidence links (logs/SBOM/attestation) +**So that** I can diagnose/root-cause + +**Acceptance criteria** + +* Failure card shows evidence sections as: + + * **Available** (clickable) + * **Pending** (spinner / “awaiting evidence”) + * **Unavailable** (“not produced” or “access denied”) +* Clicking log evidence opens an excerpt view, not a 500MB file download. +* Evidence access enforces authorization (same as run access). + +## Story 3: Events are robust to duplicates/out-of-order + +**As a** user +**I want** the timeline to remain correct +**Even if** event delivery is at-least-once + +**Acceptance criteria** + +* UI displays exactly one current “failed” state per step attempt. +* Duplicate events do not create duplicate cards. +* Out-of-order arrival does not revert a step from fail → pass. + +--- + +# Functional requirements (what developers must build) + +## FR1: TinyFailureEvent schema v1 + +### Required fields + +All producers MUST emit events that validate against this schema. 
+
+```json
+{
+  "v": 1,
+  "event_id": "evt_01J…",
+  "ts": "2025-12-13T12:10:03.123Z",
+  "run_id": "run_7f3c6a8",
+  "stage": "policy",
+  "step": "vex-gate",
+  "attempt": 1,
+  "status": "fail",
+  "error_class": "VULN_REACHABLE",
+  "summary": "Reachable CVE blocks release",
+  "pointers": [],
+  "kv": {}
+}
+```
+
+### Field definitions & constraints
+
+* `v` (int, required): must be `1` for this spec.
+* `event_id` (string, required): globally unique.
+
+  * Format: `evt_<ULID>` (ULID recommended for time-sortable IDs).
+* `ts` (RFC3339 UTC, required): creation timestamp.
+* `run_id` (string, required): stable correlation id for run.
+* `stage` (enum string, required): one of:
+
+  * `fetch|build|scan|policy|sign|package|deploy|runtime`
+* `step` (string, required): lowercase kebab-case recommended; max 80 chars.
+* `attempt` (int, required): starts at 1; increments for retries.
+* `status` (enum string, required for this feature): `fail` (MVP supports fail only; schema allows later expansion)
+* `error_class` (string, required): stable classifier from a shared registry (see FR2).
+
+  * max 64 chars; uppercase snake-case.
+* `summary` (string, required): human readable, max 140 chars.
+* `pointers` (array, optional): max 20 items; each item is a `Pointer` object (see FR3).
+* `kv` (object, optional): small metadata map for filtering.
+
+  * max 20 keys
+  * key max 32 chars; value max 120 chars
+  * no nested objects/arrays
+
+### Size limits
+
+* Entire event payload MUST be ≤ **8 KB** serialized JSON.
+* If producers exceed limits, they MUST truncate `summary` and drop low-priority `kv` keys before failing the emission.
+
+---
+
+## FR2: Error class registry (stable contract)
+
+We maintain a canonical list of `error_class` values in a shared repo/module.
+
+### Requirements
+
+* Each `error_class` MUST have:
+
+  * name (e.g., `NETWORK_DNS`)
+  * short description
+  * severity mapping (optional)
+  * recommended remediation hints (optional, can be UI-side)
+* Producers MUST use a registry value if applicable.
+* Producers MAY emit `error_class="UNKNOWN"` if no mapping exists, but must log a warning and increment a metric.
+
+### Initial registry (minimum)
+
+Infra/Network:
+
+* `NETWORK_DNS`
+* `NETWORK_TIMEOUT`
+* `DISK_FULL`
+
+Auth:
+
+* `AUTH_EXPIRED`
+* `REGISTRY_403`
+
+Supply chain:
+
+* `SIGNATURE_INVALID`
+* `ATTESTATION_MISSING`
+* `SBOM_MISSING`
+
+Policy/Security:
+
+* `POLICY_BLOCK`
+* `VULN_REACHABLE`
+* `MALWARE_FLAG`
+
+Runner/Orchestrator:
+
+* `STEP_TIMEOUT`
+* `RUN_ABORTED`
+* `WORKER_LOST`
+
+---
+
+## FR3: Evidence pointer format and rules
+
+### Pointer object schema
+
+```json
+{
+  "type": "log|artifact|attestation|url|trace",
+  "ref": "logs://scanner/run_7f3c6a8#L1423-L1480",
+  "mime": "text/plain",
+  "label": "Scanner log excerpt",
+  "expires_at": "2025-12-20T00:00:00Z",
+  "sha256": "optional hex"
+}
+```
+
+### Rules
+
+* `type` and `ref` are required.
+* `ref` is opaque to UI; UI passes it to the resolver service.
+* `label` is optional, but strongly recommended for UI friendliness.
+* `expires_at` is optional; if present UI should show “may expire”.
+* `sha256` optional for immutability verification (artifacts/attestations especially).
+
+### Allowed schemes (MVP)
+
+* `logs://<service>/<run_id>#Lx-Ly`
+* `artifact://<store>/<artifact_id>@<digest>`
+* `attestation://<store>/<attestation_id>`
+* `url://<internal-url>` (only internal allowed; resolver enforces)
+* `trace://<system>/<trace_id>`
+
+### Security constraints
+
+* Pointers MUST NOT embed secrets (tokens, passwords).
+* Any pointer that could expose sensitive data MUST be resolvable only through the Evidence Gateway (FR6), never directly client-side.
+* The resolver MUST enforce authorization for the requesting user.
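A minimal sketch of how the Evidence Gateway might split an opaque `ref` into a scheme and body before routing it to a resolver. The function and type names here are illustrative assumptions, not part of the contract; the only contract-level facts used are the allowed scheme list and the `#Lx-Ly` line-range fragment for logs.

```typescript
// Illustrative parser: scheme + body (+ optional line range) from an opaque ref.
type ParsedRef = { scheme: string; body: string; lineRange?: [number, number] };

const ALLOWED_SCHEMES = new Set(['logs', 'artifact', 'attestation', 'url', 'trace']);

function parseRef(ref: string): ParsedRef | null {
  // <scheme>://<body> with an optional #Lx-Ly fragment for log excerpts
  const m = /^([a-z]+):\/\/([^#]*)(?:#L(\d+)-L(\d+))?$/.exec(ref);
  if (!m) return null; // malformed -> resolver returns a safe 'error', never a 500
  const [, scheme, body, from, to] = m;
  if (!ALLOWED_SCHEMES.has(scheme)) return null; // unknown scheme rejected
  const parsed: ParsedRef = { scheme, body };
  if (from && to) parsed.lineRange = [Number(from), Number(to)];
  return parsed;
}
```

Keeping the `ref` opaque to the UI and centralizing parsing like this is what lets the scheme list evolve server-side without UI changes.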
+ +--- + +## FR4: Emission rules (when and how events are produced) + +### When to emit + +Producers MUST emit a TinyFailureEvent when: + +1. A step exits non-zero. +2. A policy decision is “deny/block”. +3. A required artifact/attestation is missing at gate time. +4. A step times out. +5. The worker is lost (emitted by orchestrator watchdog). + +### Exactly-once vs at-least-once + +* Transport can be **at-least-once**. +* Consumers MUST be idempotent using `(run_id, stage, step, attempt, status)` + `event_id`. + +### One failure event per step attempt + +* For a given `(run_id, stage, step, attempt)`: + + * First emitted `status=fail` is canonical. + * Later fail events for the same tuple are treated as “updates” only if they add pointers/kv (see FR5). + +### Updates / enrichment + +We support enrichment without breaking “tiny”: + +* Producers MAY emit a second event **with the same tuple** (run_id/stage/step/attempt/status) that adds pointers or kv after the initial fail. +* Consumers MUST merge pointers (dedupe identical `type+ref`) and merge kv (new keys overwrite old keys). +* Producers MUST NOT spam; max 3 enrichment events per tuple. + +--- + +## FR5: Event storage and aggregation + +### Required services/components + +1. **Event Ingest** (API or internal library endpoint) +2. **Event Store** (durable DB table) +3. **Realtime Fanout** (pub/sub channel) +4. **Run Timeline API** (query per run) + +### Behavior + +* On ingest: + + * Validate schema (reject invalid with 400/validation error). + * Persist to event store. + * Publish to realtime channel. 
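The FR4 idempotency rules (at-least-once transport, dedupe by `event_id`, later same-tuple fails treated as enrichment) can be sketched as a small consumer-side classifier. Class and field names are assumptions for illustration; the dedupe keys are the ones the spec mandates.

```typescript
// Sketch: classify each delivery as new / duplicate / enrichment per FR4.
type Ev = { event_id: string; run_id: string; stage: string; step: string; attempt: number; status: string };

class IdempotentIngest {
  private seen = new Set<string>();              // event_id dedupe (duplicates are no-ops)
  private canonical = new Map<string, string>(); // tuple -> first event_id seen

  ingest(e: Ev): 'new' | 'duplicate' | 'enrichment' {
    if (this.seen.has(e.event_id)) return 'duplicate';
    this.seen.add(e.event_id);
    const tuple = `${e.run_id}|${e.stage}|${e.step}|${e.attempt}|${e.status}`;
    if (this.canonical.has(tuple)) return 'enrichment'; // same tuple, new id: merge pointers/kv
    this.canonical.set(tuple, e.event_id);
    return 'new';
  }
}
```

A production ingest would back both maps with the `run_events` table's uniqueness constraints rather than process memory.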
+ +### Suggested DB model (Postgres) + +Table: `run_events` + +* `event_id` PK +* `run_id` indexed +* `ts` indexed +* `stage`, `step`, `attempt`, `status` indexed composite +* `payload` jsonb +* `ingested_at` + +Uniqueness constraints: + +* `event_id` unique +* Optional: unique on `(run_id, stage, step, attempt, status, hash(summary))` if you want stronger dedupe + +### Query API + +* `GET /runs/{run_id}/events` returns events sorted by `ts` ascending. +* UI should also subscribe realtime to avoid polling. + +--- + +## FR6: Evidence Gateway (pointer resolver) + +### Purpose + +A single service that resolves pointers into either: + +* log excerpts +* signed download URLs +* attestation display + verification data +* external trace links (sanitized) + +### Endpoints (MVP) + +1. **Resolve metadata** + + * `POST /evidence/resolve` + * body: `{ "run_id": "...", "pointers": [ { "type": "...", "ref": "..." } ] }` + * returns per pointer: + + * `status`: `available|pending|missing|denied|expired|error` + * `kind`: `inline|link` + * `title` + * `mime` + * `size_bytes` (if known) + * `link` (if kind=link) – must be short-lived, server-generated + * `inline_preview` (optional, small excerpt) + +2. **Fetch log excerpt** + + * `GET /evidence/log-excerpt?ref=...` + * returns: + + * `text` (max 64 KB) + * `start_line`, `end_line` + * `source` (provider info) + +3. **Fetch artifact** + + * `GET /evidence/artifact?ref=...` + * returns either: + + * short-lived download link + * or 404/403/410 + +### AuthZ requirements + +* Evidence Gateway MUST verify the caller has access to the `run_id`. +* Gateway MUST validate that the pointer belongs to that run (or is explicitly declared “global shared”). +* Gateway MUST audit-log every evidence resolution. + +### Resilience + +* If evidence is not ready, resolver returns `pending`, not 500. +* If pointer is unknown format, return `error` with a safe message. 
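One way to picture the FR6 resilience rule (evidence that is not ready yields `pending`, never a 500) is as a mapping from provider outcomes to the resolver's status vocabulary. The provider shape below is an assumption; the status values are the ones FR6 defines.

```typescript
// Sketch: map a hypothetical provider lookup to the FR6 status vocabulary.
type ResolveStatus = 'available' | 'pending' | 'missing' | 'denied' | 'expired' | 'error';

type ProviderLookup = { found: boolean; ready: boolean; authorized: boolean; expired: boolean };

function toResolveStatus(p: ProviderLookup | null): ResolveStatus {
  if (p === null) return 'error';   // unknown pointer format / provider failure -> safe error
  if (!p.authorized) return 'denied'; // authz first: never reveal existence to unauthorized callers
  if (!p.found) return 'missing';
  if (p.expired) return 'expired';
  if (!p.ready) return 'pending';   // evidence not produced yet -> pending, not 500
  return 'available';
}
```

Ordering the checks with authorization first keeps `denied` from leaking whether the evidence exists at all.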
+ +--- + +# UI requirements (what the product must do) + +## UI1: Run timeline renders from events + +* The run detail page MUST show: + + * stages/steps list + * current state per step (pass/warn/fail/running) + * failure details if fail exists +* The failure state MUST be derived from TinyFailureEvent without requiring any log fetch. + +## UI2: Failure card content (minimum) + +When a fail event arrives: + +* Show a red failure card with: + + * `stage` + `step` + * `summary` + * `error_class` badge + * `ts` (relative + absolute on hover) + * key kv fields (up to 4 shown; remainder behind “Show more”) + +## UI3: Progressive hydration + +* The card MUST include an “Evidence” section. +* For each pointer: + + * show a row with label and availability status + * if available, show “Open” + * if pending, show spinner + “Awaiting evidence” + * if denied, show lock icon + “No access” + * if missing, show “Not produced” +* Clicking “Open”: + + * logs open excerpt viewer (modal/drawer) + * artifacts open in viewer or download (type-dependent) + * attestations open verification view + +## UI4: Realtime behavior + +* UI MUST subscribe to realtime events for the run. +* UI MUST apply idempotent merge logic: + + * dedupe by `event_id` + * merge enrichment events by tuple (run_id/stage/step/attempt/status) + +## UI5: Ordering and out-of-order handling + +* UI MUST sort by `ts` for display. +* UI MUST NOT regress a step state if a late “pass/info” arrives after fail. + + * Rule: `fail` is terminal for a step attempt. + +--- + +# Non-functional requirements + +## Latency + +* From failure detection to UI update: **≤ 2s p95**, **≤ 5s p99** (within the same network). +* Evidence resolution: + + * `resolve` call should return in **≤ 300ms p95** for cached/known pointers. + +## Reliability + +* Event ingestion must be durable (stored) before fanout. 
+* System must tolerate: + + * duplicates + * retries + * out-of-order delivery + * partial evidence availability + +## Payload limits + +* Event size ≤ 8KB +* Evidence inline previews ≤ 4KB per pointer + +## Retention + +* Tiny events retained ≥ 30 days (configurable). +* Evidence retention depends on provider, but resolver must surface expiry. + +--- + +# Metrics and instrumentation (definition of success) + +Producers + ingestion MUST emit: + +* `ttfe_ms`: time to failure event (from step start or from failure detection) +* `event_ingest_latency_ms` +* `event_validation_fail_count` +* `unknown_error_class_count` +* `pointer_resolution_status_count{available|pending|missing|denied|expired|error}` +* `pointer_hydration_latency_ms` + +UI MUST log: + +* time from run page open → first event rendered +* evidence open clickthrough rate +* evidence resolution failure rate + +--- + +# Edge cases we explicitly handle + +1. **Runner killed before it can emit** + + * Orchestrator watchdog emits `WORKER_LOST` with stage/step best-effort. + +2. **Logs produced after failure** + + * Initial fail event has no log pointer. + * Later enrichment event adds log pointer (same tuple). + +3. **Evidence exists but user lacks access** + + * Resolver returns `denied`; UI shows locked state. + +4. **Evidence link expired** + + * Resolver returns `expired` and provides a “Refresh” action that re-resolves. + +5. **Multiple retries** + + * `attempt` increments; UI shows attempt number and keeps prior attempt history. + +--- + +# Definition of Done (engineers can ship when…) + +## Backend DoD + +* Schema validation implemented. +* Ingest → store → fanout pipeline working. +* Enrichment merge logic implemented. +* Evidence Gateway resolves at least: + + * log excerpt pointers + * artifact pointers + * attestation pointers +* AuthZ enforced. + +## Frontend DoD + +* Run page shows failure card from TinyFailureEvent alone. +* Evidence hydration works and handles all resolver statuses. 
+* Realtime updates work; duplicates/out-of-order are safe.
+
+## QA DoD (minimum test cases)
+
+1. Step fails; event arrives; UI shows failure without logs.
+2. Log pointer arrives later; UI updates evidence section.
+3. Duplicate event delivery; UI shows one failure.
+4. Evidence denied; UI shows lock state.
+5. Out-of-order: enrichment arrives before initial fail; UI still resolves correctly.
+
+---
+
+# Implementation checklist (suggested division of work)
+
+### Team A: Producer SDK
+
+* `StellaOps.Events` library:
+
+  * `EmitFailure(run_id, stage, step, attempt, error_class, summary, pointers, kv)`
+  * ULID generation
+  * payload size enforcement
+  * retry w/ backoff
+
+### Team B: Ingest + Store + Fanout
+
+* API endpoint or internal gRPC for ingest
+* Postgres `run_events` table
+* Valkey pub/sub channel: `run:{run_id}:events`
+
+### Team C: Evidence Gateway
+
+* pointer parsing + resolvers
+* log excerpt adapter(s)
+* artifact download adapter(s)
+* attestation viewer adapter(s)
+
+### Team D: UI
+
+* realtime subscription
+* timeline state reducer w/ merge rules
+* evidence drawer + resolver calls
+
+---
+
+Below is a **UI State Reducer Spec (Pure Function Inputs/Outputs)** for the Run Detail page that renders the **timeline + step statuses + failure cards + evidence hydration** using TinyFailureEvents (and future-compatible with pass/warn/info).
+
+This is written so devs can implement it as a deterministic reducer (Redux, Zustand w/ reducer, Elm-style update, etc.).
+
+---
+
+# UI State Reducer Spec v1: Run Timeline + Failure Cards
+
+## Reducer contract
+
+### Pure function
+
+```ts
+reduceRunView(state: RunViewState, action: Action): RunViewState
+```
+
+### Guarantees
+
+* **Pure & deterministic**: no IO, no timers, no random IDs, no Date.now() inside reducer.
+* **Idempotent**: applying the same `RUN_EVENT_RECEIVED` twice yields the same state after the first time.
+* **Order-safe**: out-of-order events never “downgrade” a step attempt from `fail` → `pass`.
+
+---
+
+# 1) Data types
+
+## 1.1 Event type used by reducer
+
+```ts
+type StageName =
+  | 'fetch' | 'build' | 'scan' | 'policy'
+  | 'sign' | 'package' | 'deploy' | 'runtime';
+
+type StepStatus =
+  // present now (MVP)
+  | 'fail'
+  // future-compatible
+  | 'warn' | 'pass' | 'running' | 'queued' | 'info' | 'unknown';
+
+type PointerType = 'log' | 'artifact' | 'attestation' | 'url' | 'trace';
+
+type Pointer = {
+  type: PointerType;
+  ref: string;
+  mime?: string;
+  label?: string;
+  expires_at?: string; // RFC3339
+  sha256?: string;
+};
+
+type TinyEventV1 = {
+  v: 1;
+  event_id: string;
+  ts: string; // RFC3339 UTC
+  run_id: string;
+  stage: StageName;
+  step: string;
+  attempt: number;
+  status: StepStatus; // MVP sends 'fail' only
+  error_class: string;
+  summary: string;
+  pointers?: Pointer[];
+  kv?: Record<string, string>;
+};
+
+// Normalized for sorting and comparisons (created outside or inside reducer deterministically)
+type NormalizedEvent = TinyEventV1 & {
+  tsMs: number; // parse(ts) -> number, invalid => 0
+};
+```
+
+---
+
+## 1.2 Keys and comparisons
+
+```ts
+type TupleKey = string;       // `${stage}|${step}|${attempt}|${status}`
+type StepAttemptKey = string; // `${stage}|${step}|${attempt}`
+type StepIdentityKey = string;// `${stage}|${step}` (no attempt)
+type PointerKey = string;     // `${type}|${ref}`
+
+function tupleKey(e: TinyEventV1): TupleKey {
+  return `${e.stage}|${e.step}|${e.attempt}|${e.status}`;
+}
+function stepAttemptKey(e: TinyEventV1): StepAttemptKey {
+  return `${e.stage}|${e.step}|${e.attempt}`;
+}
+function stepIdentityKey(e: TinyEventV1): StepIdentityKey {
+  return `${e.stage}|${e.step}`;
+}
+function pointerKey(p: Pointer): PointerKey {
+  return `${p.type}|${p.ref}`;
+}
+
+// Sort: ts ascending, then event_id lexicographically (stable deterministic tiebreak)
+function compareEvent(a: NormalizedEvent, b: NormalizedEvent): number {
+  if (a.tsMs !== b.tsMs) return a.tsMs - b.tsMs;
+  return a.event_id < b.event_id ? -1 : (a.event_id > b.event_id ? 1 : 0);
+}
+```
+
+---
+
+## 1.3 Status ranking rule (terminal safety)
+
+We need a single numeric ranking so we can:
+
+* prevent regressions (`fail` must remain terminal), and
+* compute rollups.
+
+```ts
+const STATUS_RANK: Record<StepStatus, number> = {
+  unknown: 0,
+  queued: 1,
+  running: 2,
+  info: 3,
+  pass: 4,
+  warn: 5,
+  fail: 6,
+};
+
+function isTerminal(status: StepStatus): boolean {
+  return status === 'fail' || status === 'warn' || status === 'pass';
+}
+```
+
+**Invariant:** A step attempt’s displayed status must never decrease in rank.
+
+---
+
+# 2) State shape
+
+This state is for a single Run Detail page (one `runId` at a time). If you store multiple runs in a global store, wrap this in a `Record<string, RunViewState>` keyed by run id.
+
+```ts
+type RealtimeStatus = 'idle' | 'connecting' | 'connected' | 'disconnected' | 'error';
+type LoadStatus = 'idle' | 'loading' | 'loaded' | 'error';
+
+type EvidenceResolveStatus =
+  | 'unresolved' // pointer exists but no resolver call made yet
+  | 'loading'    // resolver call in-flight
+  | 'available' | 'pending' | 'missing' | 'denied' | 'expired' | 'error';
+
+type EvidenceResolution = {
+  status: EvidenceResolveStatus;
+  kind?: 'inline' | 'link';
+  title?: string;
+  mime?: string;
+  size_bytes?: number;
+  inline_preview?: string; // small preview
+  link?: string;           // short-lived link
+  error_message?: string;
+};
+
+type EvidenceState = {
+  pointer: Pointer; // latest metadata merged from events
+  status: EvidenceResolveStatus;
+  lastResolvedAtMs?: number; // from action payload (not Date.now)
+  // for stale response protection
+  seq: number;          // increments each request
+  inFlightSeq?: number; // seq currently in-flight
+  resolution?: EvidenceResolution;
+};
+
+type PointerAggregate = {
+  pointerKey: PointerKey;
+  pointer: Pointer; // merged metadata
+};
+
+type TupleAggregate = {
+  tupleKey: TupleKey;
+
+  // all events contributing to this tuple (same stage/step/attempt/status)
+  eventIdsSorted: string[]; // sorted by (tsMs, event_id)
+  canonicalEventId: string; // min by (tsMs, event_id)
+
+  // merged view computed deterministically from eventIdsSorted
+  merged: {
+    summary: string;              // from canonical event
+    error_class: string;          // from canonical event
+    kv: Record<string, string>;   // merged by sorted order (later overwrites)
+    pointers: PointerAggregate[]; // dedup by pointerKey, merged by sorted order
+    updatedAtMs: number;          // max tsMs among contributing events
+  };
+};
+
+type StepAttemptState = {
+  key: StepAttemptKey;
+  stage: StageName;
+  step: string;
+  attempt: number;
+
+  // all tuple aggregates for this attempt (one per status)
+  tuplesByStatus: Partial<Record<StepStatus, TupleKey>>;
+
+  // derived “best” status for this attempt
+  bestStatus: StepStatus;
+  bestStatusRank: number;
+
+  updatedAtMs: number; // max of all tupleAgg.updatedAtMs for this attempt
+};
+
+type StageRollup = {
+  stage: StageName;
+  // worst status among latest attempts of steps in this stage
+  rollupStatus: StepStatus;
+  rollupRank: number;
+};
+
+type RunViewState = {
+  runId: string | null;
+
+  loading: { initialEvents: LoadStatus; error?: string };
+  realtime: { status: RealtimeStatus; error?: string };
+
+  // storage
+  eventsById: Record<string, NormalizedEvent>;
+  timelineEventIds: string[]; // global timeline sorted by (tsMs, event_id)
+
+  tupleAggByKey: Record<TupleKey, TupleAggregate>;
+  stepAttemptByKey: Record<StepAttemptKey, StepAttemptState>;
+  latestAttemptByStep: Record<StepIdentityKey, number>; // max attempt observed
+
+  stageRollups: Record<StageName, StageRollup>;
+
+  evidenceByPointer: Record<PointerKey, EvidenceState>;
+};
+```
+
+---
+
+# 3) Actions (inputs to reducer)
+
+```ts
+type Action =
+  | { type: 'RUN_VIEW_OPENED'; runId: string }
+  | { type: 'RUN_EVENTS_LOAD_STARTED'; runId: string }
+  | { type: 'RUN_EVENTS_LOADED'; runId: string; events: TinyEventV1[] }
+  | { type: 'RUN_EVENTS_LOAD_FAILED'; runId: string; error: string }
+
+  | { type: 'REALTIME_STATUS_CHANGED'; runId: string; status: RealtimeStatus; error?: string }
+  | { type: 'RUN_EVENT_RECEIVED'; event: TinyEventV1 }
+
+  // Evidence hydration lifecycle (pure reducer; side-effects happen elsewhere)
+  | { type: 'EVIDENCE_RESOLVE_REQUESTED'; runId: string; pointerKey: PointerKey }
+  | { type: 'EVIDENCE_RESOLVE_RESULT'; runId: string; pointerKey: PointerKey; seq: number; resolvedAtMs: number; resolution: EvidenceResolution }
+  | { type: 'EVIDENCE_RESOLVE_CLEARED'; runId: string; pointerKey: PointerKey };
+```
+
+**Reducer must ignore** any action where `action.runId !== state.runId` (except `RUN_VIEW_OPENED` which sets it).
+
+---
+
+# 4) Reducer semantics (outputs)
+
+## 4.1 RUN_VIEW_OPENED
+
+**Input:** `{ runId }`
+**Output:** resets all run-specific state.
+
+Rules:
+
+* Set `state.runId = runId`
+* Clear events, aggregates, evidence, timeline.
+* Set `loading.initialEvents = 'loading'` +* Set `realtime.status = 'connecting'` (optional) + +--- + +## 4.2 RUN_EVENTS_LOAD_STARTED / LOADED / FAILED + +### RUN_EVENTS_LOAD_STARTED + +* If runId matches, set `loading.initialEvents = 'loading'`. + +### RUN_EVENTS_LOADED + +* If runId matches: + + * For each event in `events`: apply the exact same logic as `RUN_EVENT_RECEIVED`. + * Then set `loading.initialEvents = 'loaded'`. + +### RUN_EVENTS_LOAD_FAILED + +* If runId matches: `loading.initialEvents = 'error'`, store error string. + +--- + +## 4.3 REALTIME_STATUS_CHANGED + +* Update `realtime.status` and `realtime.error` if runId matches. + +--- + +## 4.4 RUN_EVENT_RECEIVED (core ingestion) + +### Preconditions + +If `state.runId` is null, ignore (or treat as no-op). +If `event.run_id !== state.runId`, ignore. + +### Step A — normalize + dedupe + +* Convert to `NormalizedEvent`: + + * `tsMs = parseRFC3339ToMs(event.ts)`; if parse fails, `tsMs = 0`. + * Default `pointers = []`, `kv = {}` if missing. +* If `eventsById[event_id]` exists: **no-op**. + +### Step B — insert into global stores + +* Add to `eventsById[event_id]`. +* Insert `event_id` into `timelineEventIds` keeping sorted order by `(tsMs, event_id)`. + +### Step C — ensure evidence entries exist for pointers + +For each pointer `p`: + +* `pk = pointerKey(p)` +* If `evidenceByPointer[pk]` is missing: + + * create `{ pointer: p, status: 'unresolved', seq: 0 }` +* Else merge pointer metadata into `evidenceByPointer[pk].pointer` using pointer-merge rules (below). + (Do **not** overwrite existing resolver resolution fields.) + +### Step D — update tuple aggregate (merge/enrichment) + +Let `tk = tupleKey(event)`. + +* If `tupleAggByKey[tk]` missing, create new `TupleAggregate` with: + + * `eventIdsSorted = [event_id]` + * `canonicalEventId = event_id` + * `merged` from this event + +* Else: + + * Insert `event_id` into `eventIdsSorted` in sorted order (using `compareEvent` via `eventsById`). 
+  * Recompute:
+
+    * `canonicalEventId = min(eventIdsSorted)` by compareEvent
+    * `merged` deterministically from all contributing events (see merge rules)
+
+### Tuple merge rules (deterministic)
+
+Given contributing events `E` sorted by `(tsMs, event_id)` ascending:
+
+* `canonical = E[0]`
+* `merged.summary = canonical.summary`
+* `merged.error_class = canonical.error_class`
+* `merged.kv`:
+
+  * start empty `{}`
+  * for each event `e` in order, for each `(k,v)` in `e.kv`: `merged.kv[k] = v`
+    (later events overwrite earlier keys)
+* `merged.pointers`:
+
+  * maintain `map: Record<PointerKey, Pointer>`
+  * for each event `e` in order, for each pointer `p`:
+
+    * `pk = pointerKey(p)`
+    * if not present: set map[pk] = p
+    * else: map[pk] = mergePointerMeta(map[pk], p) (see below)
+  * output pointers as an array sorted by `PointerKey` lexicographically (for stable UI lists)
+* `merged.updatedAtMs = max(e.tsMs)`
+
+### Pointer metadata merge rule (non-null wins, later wins)
+
+```ts
+function mergePointerMeta(oldP: Pointer, newP: Pointer): Pointer {
+  // type/ref must match
+  return {
+    type: oldP.type,
+    ref: oldP.ref,
+    // later non-empty wins
+    mime: newP.mime ?? oldP.mime,
+    label: newP.label ?? oldP.label,
+    expires_at: newP.expires_at ?? oldP.expires_at,
+    sha256: newP.sha256 ?? oldP.sha256,
+  };
+}
+```
+
+---
+
+## 4.5 Update StepAttemptState (best status + no regression)
+
+After tuple aggregate update, update the parent step attempt:
+
+* Let `sak = stepAttemptKey(event)` and `sid = stepIdentityKey(event)`.
+ +### latest attempt tracking + +* `latestAttemptByStep[sid] = max(previous, event.attempt)` + +### StepAttemptState update + +* If missing, create: + + * `bestStatus = 'unknown'`, `bestStatusRank = 0`, `tuplesByStatus = {}` +* Set `tuplesByStatus[event.status] = tk` + +### Recompute best status (never decreases) + +Compute candidate best by checking all statuses present for this attempt: + +```ts +candidateBest = argmax(status in tuplesByStatus) STATUS_RANK[status] +``` + +Then apply **no-regression rule**: + +* If `STATUS_RANK[candidateBest] >= step.bestStatusRank`: + + * update `bestStatus`, `bestStatusRank` +* Else: + + * keep existing `bestStatus` (prevents fail → pass regressions) + +Set `updatedAtMs = max(updatedAtMs, tupleAgg.merged.updatedAtMs)`. + +**Important:** This rule guarantees “late pass/info” cannot override a prior fail. + +--- + +## 4.6 Stage rollups (optional but recommended) + +Whenever any `StepAttemptState` changes, update `stageRollups[stage]` deterministically: + +For each stage: + +* Consider only the **latest attempt per step identity** in that stage: + + * For each `StepIdentityKey = stage|step`, find `attempt = latestAttemptByStep[stage|step]` + * Look up `StepAttemptState` for that attempt. +* Roll up stage status as the **worst rank** among those: + + * `rollupRank = max(step.bestStatusRank)` + * `rollupStatus = status with that rank` + +If a stage has no steps yet, set `rollupStatus='unknown'`. + +--- + +# 5) Evidence hydration reducer rules + +Evidence actions update `evidenceByPointer` only; they must not mutate events/aggregates. + +## 5.1 EVIDENCE_RESOLVE_REQUESTED + +**Input:** `{ pointerKey }` + +Rules: + +* If no evidence entry exists: create one with status `unresolved` and `seq=0` (should be rare). 
+* Increment `seq = seq + 1` +* Set `inFlightSeq = seq` +* Set `status = 'loading'` +* Keep `resolution` (optional: clear it if you want UI to hide stale info; recommended to keep and show “Refreshing…”) + +**Middleware/effect contract (outside reducer):** + +* After dispatching `EVIDENCE_RESOLVE_REQUESTED`, the effect layer reads `inFlightSeq` from state and uses it in the API call. +* When the response returns, dispatch `EVIDENCE_RESOLVE_RESULT` with that same `seq`. + +## 5.2 EVIDENCE_RESOLVE_RESULT + +**Input:** `{ pointerKey, seq, resolvedAtMs, resolution }` + +Rules: + +* If `evidenceByPointer[pointerKey]` missing: ignore or create (implementation choice). +* If `evidence.inFlightSeq !== seq`: **ignore stale response**. +* Else: + + * `status = resolution.status` + * `resolution = resolution` + * `lastResolvedAtMs = resolvedAtMs` + * `inFlightSeq = undefined` + +## 5.3 EVIDENCE_RESOLVE_CLEARED + +* Reset entry back to `{ status:'unresolved', resolution: undefined, inFlightSeq: undefined }` +* Keep `pointer` metadata. + +--- + +# 6) Selectors (pure outputs for rendering) + +These are not reducer logic, but they define how UI consumes state deterministically. 
+
+## 6.1 Timeline view model
+
+```ts
+selectTimeline(state): NormalizedEvent[] {
+  return state.timelineEventIds.map(id => state.eventsById[id]);
+}
+```
+
+## 6.2 Latest attempt cards per step identity
+
+```ts
+type StepCardVM = {
+  stage: StageName;
+  step: string;
+  attempt: number;
+  status: StepStatus;
+  error_class?: string;
+  summary?: string;
+  kv: Record<string, string>;
+  pointers: PointerAggregate[];
+  updatedAtMs: number;
+};
+
+// Canonical display order for stages (matches the StageName union)
+const STAGE_ORDER: StageName[] =
+  ['fetch', 'build', 'scan', 'policy', 'sign', 'package', 'deploy', 'runtime'];
+
+selectLatestStepCards(state): StepCardVM[] {
+  const cards: StepCardVM[] = [];
+  for (const sid in state.latestAttemptByStep) {
+    const attempt = state.latestAttemptByStep[sid];
+    const [stage, step] = sid.split('|') as [StageName, string];
+    const sak = `${stage}|${step}|${attempt}`;
+
+    const sa = state.stepAttemptByKey[sak];
+    if (!sa) continue;
+
+    // Prefer fail tuple for details if present
+    const failTk = sa.tuplesByStatus['fail'];
+    const bestTk = sa.tuplesByStatus[sa.bestStatus];
+    const tk = failTk ?? bestTk;
+    const agg = tk ? state.tupleAggByKey[tk] : undefined;
+
+    cards.push({
+      stage, step, attempt,
+      status: sa.bestStatus,
+      error_class: agg?.merged.error_class,
+      summary: agg?.merged.summary,
+      kv: agg?.merged.kv ?? {},
+      pointers: agg?.merged.pointers ?? [],
+      updatedAtMs: sa.updatedAtMs,
+    });
+  }
+  // stable ordering: by stage order, then step name
+  return cards.sort((a,b) =>
+    (STAGE_ORDER.indexOf(a.stage) - STAGE_ORDER.indexOf(b.stage)) ||
+    a.step.localeCompare(b.step)
+  );
+}
+```
+
+## 6.3 Failure banner (first failure by time)
+
+```ts
+selectFirstFailure(state): StepCardVM | null {
+  const cards = selectLatestStepCards(state).filter(c => c.status === 'fail');
+  if (cards.length === 0) return null;
+  return cards.sort((a,b) => a.updatedAtMs - b.updatedAtMs)[0];
+}
+```
+
+---
+
+# 7) Worked examples (expected reducer behavior)
+
+## Example A: fail event arrives, then enrichment adds pointers
+
+1. Receive fail event (no pointers)
+
+* Step card shows `fail`, summary, error_class, evidence list empty.
+
+2. Receive second event same tupleKey with pointers
+
+* Same step card remains `fail` (no regression)
+* Evidence section now lists pointers (status `unresolved` until resolved).
+
+## Example B: out-of-order enrichment arrives before initial fail
+
+* Enrichment event arrives first (later tsMs) → creates tupleAgg; canonical is that (for now).
+* Later initial fail arrives with earlier tsMs:
+
+  * canonical becomes the earlier event (smaller tsMs)
+  * **pointers remain**, because merged pointers are union across all contributing events.
+
+## Example C: duplicate delivery
+
+* Same `event_id` received twice → second is ignored (idempotent).
+
+## Example D: late pass after fail (future-proof)
+
+* If a `pass` event arrives after a `fail` for the same step attempt:
+
+  * `bestStatusRank` is already `fail` (6)
+  * candidate is `pass` (4)
+  * no-regression rule keeps `fail`
+
+---
+
+# 8) Implementation notes (non-binding but useful)
+
+* Event counts per run are usually small; simple array insert + sort is fine.
+* If you expect thousands of events, maintain a binary insertion for `timelineEventIds` and `eventIdsSorted`.
+* Keep all “current time” out of reducer. Any timestamps used in actions (e.g., `resolvedAtMs`) must be created outside.
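The no-regression rule at the heart of the spec (section 1.3 and 4.5) condenses to a few lines. This sketch restates the spec's `STATUS_RANK` table and applies the "equal-or-higher rank wins" update; the helper name `nextBestStatus` is an illustrative assumption.

```typescript
// Sketch of the spec's terminal-safety rule: a step attempt's displayed
// status may only move up in rank, so a late 'pass' never overrides 'fail'.
type StepStatus = 'unknown' | 'queued' | 'running' | 'info' | 'pass' | 'warn' | 'fail';

const STATUS_RANK: Record<StepStatus, number> = {
  unknown: 0, queued: 1, running: 2, info: 3, pass: 4, warn: 5, fail: 6,
};

function nextBestStatus(current: StepStatus, candidate: StepStatus): StepStatus {
  // equal-or-higher rank wins; lower rank is ignored (no regression)
  return STATUS_RANK[candidate] >= STATUS_RANK[current] ? candidate : current;
}
```

Because the update depends only on the two statuses, it is trivially pure and order-safe: any interleaving of event arrivals converges to the same displayed status.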
diff --git a/docs/product-advisories/14-Dec-2025 - Create a small ground‑truth corpus.md b/docs/product-advisories/14-Dec-2025 - Create a small ground‑truth corpus.md new file mode 100644 index 000000000..8b191f3cf --- /dev/null +++ b/docs/product-advisories/14-Dec-2025 - Create a small ground‑truth corpus.md @@ -0,0 +1,787 @@ +Here’s a compact playbook for building **10–20 “toy services” with planted, labeled vulnerabilities** so you can demo reachability, measure scanner accuracy, and make the “why” behind each finding obvious. + +### Why do this + +* **Repeatable benchmarks:** same inputs → same findings → track accuracy over time. +* **Explainable demos:** each vuln has a story, proof path, and a fix. +* **Coverage sanity checks:** distinguish **reachable** vs **unreachable** vulns so tools can’t inflate results. + +### Core design + +* Each service = 1 repo with: + + * `/app` (tiny API or worker), `/infra` (Dockerfile/compose), `/tests` (PyTest/Jest + attack scripts), `/labels.yaml` (ground‑truth). + * `labels.yaml` schema: + + ```yaml + service: svc-01-password-reset + vulns: + - id: V1 + cve: CVE-2022-XXXXX + type: dep_runtime + package: express + version: 4.17.0 + reachable: true + path_tags: ["route:/reset", "call:crypto.md5", "env:DEV_MODE"] + proof: ["curl.sh#L10", "trace.json:/reset stack -> md5()"] + fix_hint: "upgrade express to 4.18.3" + - id: V2 + type: dep_build + package: lodash + version: 4.17.5 + reachable: false + path_tags: ["devDependency", "no-import"] + ``` +* **Tagged paths**: add lightweight traces (e.g., log “TAG:route:/reset” before vulnerable call) so tests can assert reachability. + +### Suggested catalog (pick 10–20) + +1. **Password reset token** (MD5, predictable tokens) – reachable via `/reset`. +2. **SQL injection** (string‑concat query) – reachable via `/search`. +3. **Path traversal** (`../` in `?file=`) – reachable but sandboxed; variant unreachable behind dead route flag. +4. 
**Deserialization bug** (unsafe `pickle`/`BinaryFormatter`) – reachable in worker queue. +5. **SSRF** (proxy fetch) – guarded by allow‑list in unreachable variant. +6. **Command injection** (`child_process.exec`) – reachable via debug param; unreachable alt uses execFile. +7. **JWT none‑alg** acceptance – only when `DEV_MODE=1`. +8. **Hardcoded credentials** (in config) – present but not used (unreachable). +9. **Dependency vuln (runtime)** old `express/fastapi` called in hot path. +10. **Dependency vuln (build‑time only)** devDependency only (unreachable at runtime). +11. **Insecure TLS** (skip verify) – gated behind feature flag. +12. **Open redirect** – requires crafted `next=` param. +13. **XXE** in XML upload – off by default in unreachable variant. +14. **Insecure deserialization in message bus consumer** – invoked by test producer. +15. **Race condition** (TOCTOU temp file) – demonstrated by parallel test. +16. **Use‑after‑free style bug** (C tiny service) – reachable with specific sequence; alt path never called. +17. **CSRF** on state‑changing route – reachable only without SameSite/CSRF tokens. +18. **Directory listing** (misconfigured static server) – reachable under `/public`. +19. **Prototype pollution** (JS merge) – only reachable when `content-type: application/json`. +20. **Zip‑slip** in archive import – prevented in unreachable variant via safe unzip. + +### Tech stack mix + +* **Languages:** Node (Express), Python (FastAPI/Flask), Go (net/http), C# (.NET Minimal API), one small C binary. +* **Packaging:** Docker per service; one multi‑stage with vulnerable build‑tool only (to test build‑time vs runtime vulns). +* **Data:** SQLite or in‑memory maps to avoid ops noise. + +### Test harness (deterministic) + +* `make test` runs: + + 1. **Smoke** (service up). + 2. **Exploit scripts** trigger each *reachable* vuln and store `evidence/trace.json`. + 3. **Scanner run** (your tool + competitors) against the image/container/fs. + 4. 
**Evaluator** compares scanner output to `labels.yaml`. + +### Metrics you’ll get + +* **Precision/recall** overall and by class (dep_runtime, dep_build, code, config). +* **Reachability precision**: % of flagged vulns with a proven path tag match. +* **Overreport index**: unreachable‑flag hits / total hits. +* **TTFS (Time‑to‑first‑signal)**: from scan start to first evidence‑backed block. +* **Fix guidance score**: did the tool propose the correct minimal upgrade/patch? + +### Minimal evaluator format + +Scanner output → normalized JSON: + +```json +{ "findings": [ + {"cve":"CVE-2022-XXXXX","package":"express","version":"4.17.0", + "class":"dep_runtime","path_tags":["route:/reset","call:crypto.md5"]} +]} +``` + +Evaluator joins on `(cve|type|package)` and checks: + +* tag overlap with `labels.vulns[*].path_tags` +* reachable expectation matches +* counts per class; exports `report.md` + `report.csv`. + +### Demo storyline (5 min) + +1. Run **svc‑01**; hit `/reset`; show trace marker. +2. Run your scanner; show it ranks the **reachable dep vuln** above the **devDependency vuln**. +3. Flip env to disable route; rerun → reachable finding disappears → score improves. +4. Show **fix hint** applied (upgrade) → green. + +### Repo layout (monorepo) + +``` +/toys/ + svc-01-reset-md5/ + svc-02-sql-injection/ + ... +/harness/ + normalize.py + evaluate.py + run_scans.sh +/docs/ + rubric.md # metric definitions & thresholds +``` + +### Guardrails + +* Keep images tiny (<150MB) and ports unique. +* Deterministic seeds for any randomness. +* No outbound calls in tests (use local mocks). +* Clearly mark **unsafe** code blocks with comments. + +### First 5 to build this week + +1. `svc-01-reset-md5` (Node) +2. `svc-02-sql-injection` (Python/FastAPI) +3. `svc-03-dep-build-only` (Node devDependency) +4. `svc-04-cmd-injection` (.NET Minimal API) +5. 
`svc-05-zip-slip` (Go) + +If you want, I can generate the skeleton repos (Dockerfile, app, tests, `labels.yaml`, and the evaluator script) so you can drop them into your monorepo and start measuring immediately. +Below is a **developer framework** you can hand to the team as the governing “contract” for implementing the full toy-service catalogue at a **best-in-class** standard, while keeping the suite deterministic, safe, and maximally useful for scanner R&D. + +--- + +## 1) Non-negotiable principles + +1. **Determinism first** + + * Same git SHA + same inputs ⇒ identical images, SBOMs, findings, scores. + * Pin everything: base image **by digest**, language deps **by lockfiles**, tool versions **by exact semver**, and record it in an evidence manifest. + +2. **Ground truth is authoritative** + + * Every planted weakness must have a **machine-readable label**, and at least one **verifiable proof artifact**. + * No “implicit” vulnerabilities; if it’s not labeled, it does not exist for scoring. + +3. **Reachability is tiered, not binary** + + * You will label and prove *how* it is reachable (imported vs executed vs tainted input), not just “reachable: true”. + +4. **Safety by construction** + + * Services run on an isolated docker network; tests must not require internet. + * Proofs should demonstrate *execution and dataflow* rather than “weaponized exploitation”. 
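Principle 1's "record it in an evidence manifest" can be as small as a content-addressing pass over the evidence directory. A hedged Python sketch (the field names are assumptions, not a fixed schema):

```python
import hashlib
import json
import pathlib
from datetime import datetime, timezone

def write_manifest(evidence_dir, git_sha, image_digest, tool_versions):
    """Content-address every evidence file so a re-run can be byte-compared."""
    ev = pathlib.Path(evidence_dir)
    hashes = {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(ev.glob("*"))
        if p.is_file() and p.name != "manifest.json"
    }
    manifest = {
        "git_sha": git_sha,
        "image_digest": image_digest,        # base image pinned by digest (principle 1)
        "tool_versions": tool_versions,      # exact semver per tool (principle 1)
        "generated_at_utc": datetime.now(timezone.utc).isoformat(),
        "evidence_sha256": hashes,
    }
    (ev / "manifest.json").write_text(json.dumps(manifest, sort_keys=True, indent=2))
    return manifest
```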
+ +--- + +## 2) Repository and service contract + +### Standard monorepo layout + +``` +/toys/ + svc-01-.../ + app/ + infra/ # Dockerfile, compose, network policy + tests/ # positive + negative reachability tests + labels.yaml # ground truth + evidence/ # generated by tests (trace, tags, manifests) + fix/ # minimal patch proving remediation +/harness/ + run-suite/ + normalize/ + evaluate/ +/schemas/ + labels.schema.json +/docs/ + benchmark-contract.md + scoring.md + reviewer-checklist.md +``` + +### Required service deliverables (Definition of Done) + +A service PR is “DONE” only if it includes: + +* `labels.yaml` validated by `schemas/labels.schema.json` +* Docker build reproducible enough to be stable in CI (digest pinned; lockfiles committed) +* **Positive tests** that generate evidence proving reachability tiers (see §3) +* **Negative tests** proving “unreachable” claims (feature flags off, devDependency only, dead route, etc.) +* `fix/` patch that removes/mitigates the weakness and produces a measurable delta (findings drop, reachability flips, or config gate blocks) +* An `evidence/manifest.json` capturing tool versions, git sha, image digest, timestamps (UTC), and hashes of evidence files + +--- + +## 3) Reachability tiers and evidence requirements + +### Reachability levels (use these everywhere) + +* **R0 Present**: vulnerable component exists in image/SBOM, not imported/loaded. +* **R1 Loaded**: imported/linked/initialized, but no executed path proven. +* **R2 Executed**: vulnerable function/module is executed in a test (deterministic trace). +* **R3 Tainted execution**: execution occurs with externally influenced input (route param/message/body). +* **R4 Exploitable** (optional): controlled, non-harmful PoC demonstrates full impact. 
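One way to make the tier ladder enforceable is to encode each level's minimum evidence as data and validate labels against it mechanically. A sketch with assumed evidence-kind names:

```python
# Sketch: tier -> minimum evidence kinds, so the harness can reject under-proven labels.
# The evidence-kind names here are assumptions for illustration.
TIER_MIN_EVIDENCE = {
    "R0": {"sbom"},
    "R1": {"sbom", "load_trace"},
    "R2": {"sbom", "load_trace", "sink_tag"},
    "R3": {"sbom", "load_trace", "sink_tag", "taint_tag"},
    "R4": {"sbom", "load_trace", "sink_tag", "taint_tag", "poc"},
}

def check_tier(vuln):
    """Return (ok, missing): ok is False when the claimed tier lacks required evidence."""
    need = TIER_MIN_EVIDENCE[vuln["reachability_level"]]
    have = set(vuln.get("evidence_kinds", ()))
    missing = need - have
    return (not missing, missing)
```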
+ +### Minimum evidence per level + +* R0: SBOM + file hash / package metadata +* R1: runtime startup logs or module load trace tag +* R2: callsite tag + stack trace snippet (or deterministic trace file) +* R3: R2 + taint marker showing data originated from external boundary (HTTP/queue/env) and reached call +* R4: only if safe and necessary; keep it non-weaponized and sandboxed + +**Key rule:** prefer proving **execution + dataflow** over providing “payload recipes”. + +--- + +## 4) Ground truth schema (what `labels.yaml` must capture) + +Every vuln entry must have: + +* Stable ID: `svc-XX:Vn` (never renumber once published) +* Class: `dep_runtime | dep_build | code | config | os_pkg | supply_chain` +* Identity: `cve` (if applicable), `purl`, `package`, `version`, `location` (path/module) +* Reachability: `reachability_level: R0..R4`, `entrypoint` (route/topic/cli), `preconditions` (flags/env/auth) +* Proofs: + + * `proof.artifacts[]` (e.g., trace file, tag log, coverage snippet) + * `proof.tags[]` (canonical tag strings) +* Fix: + + * `fix.type` (upgrade/config/code) + * `fix.patch_path` (under `fix/`) + * `fix.expected_delta` (what should change in findings/evidence) +* Negatives (if unreachable): + + * `negative_proof` explaining and proving why it is unreachable + +Canonical tag format (consistent across languages): + +* `TAG:route:/reset` +* `TAG:call:Crypto.Md5` +* `TAG:taint:http.body.resetToken` +* `TAG:flag:DEV_MODE=true` + +--- + +## 5) Service implementation standards (how developers build each toy) + +### A. Vulnerability planting patterns (approved) + +* **Dependency runtime**: vulnerable version is a production dependency and exercised on a normal route/job. +* **Dependency build-only**: devDependency only, or used only in build stage; prove it never ships in final image. +* **Code vuln**: the vulnerable sink is behind a clean, deterministic entrypoint and instrumented. 
+* **Config vuln**: misconfig is explicit and versioned (headers, TLS settings, authz rules), with a fix patch. + +### B. Instrumentation requirements + +* Every reachable vuln must emit: + + * one **entrypoint tag** (route/topic/command) + * one **sink tag** (the vulnerable call or module) + * optional **taint tag** for R3 +* Evidence generation must be stable and machine-parsable: + + * JSON trace preferred (`evidence/trace.json`) + * Logs acceptable if structured and anchored with tags + +### C. Negative-case discipline (unreachable means proven unreachable) + +Unreachable claims must be backed by one of: + +* compilation/linker exclusion (dead code eliminated) + proof +* dependency not present in final image (multi-stage) + proof (image file listing / SBOM diff) +* feature flag off + proof (config captured + route unavailable) +* auth gate + proof (unauthorized cannot reach sink) + +--- + +## 6) Harness and scoring gates (how you enforce “best in class”) + +### Normalization + +All scanners’ outputs must normalize into one internal shape: + +* `(identity: purl+cve+version+location) + class + reachability_claim + evidence_refs` + +### Core metrics (tracked per commit) + +* **Recall (by class)**: runtime deps, OS pkgs, code, config +* **Precision**: false positive rate, especially R0/R1 misclassified as R2/R3 +* **Reachability accuracy**: + + * overreach: predicted reachable but labeled R0/R1 + * underreach: labeled R2/R3 but predicted non-reachable +* **TTFS** (Time-to-First-Signal): time to first *evidence-backed* blocking issue +* **Fix validation**: applying `fix/` must produce the expected delta + +### Quality gates (example thresholds you can enforce in CI) + +* Runtime dependency recall ≥ 0.95 +* Unreachable false positives ≤ 0.05 (for R0/R1) +* Reachability underreport ≤ 0.10 (for labeled R2/R3) +* TTFS regression: no worse than +10% vs main +* Fix validation pass rate = 100% for modified services + +(Adjust numbers as your suite matures; the framework is 
the key.) + +--- + +## 7) Review checklist (what reviewers enforce) + +A PR adding/modifying a service is rejected if any of these fail: + +* Labels complete, schema-valid, and stable IDs preserved +* Proof artifacts are deterministic and generated by tests +* Reachability tier justified and matches evidence +* Unreachable claims have negative proofs +* Docker build uses pinned digests + lockfiles committed +* `fix/` produces measurable delta and does not introduce new unlabeled issues +* No network egress required; tests are hermetic + +--- + +## 8) Catalogue coverage plan (so the suite actually pushes “best in class”) + +Define coverage targets across axes (each service must declare which axes it covers): + +* Language/runtime: Node, Python, Go, .NET 10, Java, native +* Packaging: fat images, slim, distroless, multi-stage, vendored deps +* Dependency complexity: direct, transitive, aliased/renamed, bundled +* Reachability: R0..R3 mandatory; R4 only for a small subset +* Supply chain: SBOM mismatches, provenance metadata, build-time tools present but excluded + +This prevents you from building 20 near-duplicates and thinking you have “coverage”. + +--- + +## 9) What to hand developers today (copy/paste into your repo) + +Create these docs/templates as the “contract”: + +* `/docs/benchmark-contract.md` (principles + DoD + reachability tiers) +* `/docs/reviewer-checklist.md` (the rejection criteria) +* `/schemas/labels.schema.json` (strict validation) +* `/toys/_TEMPLATE/` (service skeleton with instrumentation + tests scaffolding) +* `/harness/evaluate/` (metrics + CI gating) + +If you want, I can draft the actual **`labels.schema.json`**, the **service template**, and the **review checklist** in ready-to-commit form (paths, names, and .NET 10 conventions included). +Below is an **extended, implementable catalogue** (20 services + 4 “meta” repos) with **concrete stacks, entrypoints, reachability tiers, canonical tags, proof tests, and fixes**. 
It is written so developers can start building immediately while keeping the suite deterministic and scanner-grade. + +--- + +## A. Catalogue index (what each service is for) + +| ID | Service | Stack | Primary capability tested | +| -: | --------------------------- | --------------------- | ------------------------------------------------------ | +| 01 | reset-token-weak-crypto | Node/Express | Code reachability + crypto misuse | +| 02 | search-sql-injection | Python/FastAPI | Taint → sink (SQLi), route evidence | +| 03 | cmd-injection-diagnostics | .NET 10 Minimal API | Taint → shell sink + gating | +| 04 | zip-import-zip-slip | Go net/http | Archive handling (Zip Slip), filesystem proof | +| 05 | xml-upload-xxe | Java/Spring Boot | XML parser config (XXE), safe proof | +| 06 | jwt-none-devmode | .NET 10 | Config-gated auth bypass (reachability depends on env) | +| 07 | fetcher-ssrf | Node/Express | SSRF to internal-only target, network isolation | +| 08 | outbound-tls-skipverify | Go | TLS misconfig + “reachable only if feature enabled” | +| 09 | queue-pickle-deser | Python worker | Async reachability via queue + unsafe deserialization | +| 10 | efcore-rawsql | .NET 10 + EF Core | ORM raw SQL misuse + input flow | +| 11 | shaded-jar-deps | Java/Gradle | Shaded/fat jar dependency discovery | +| 12 | webpack-bundled-dep | Node/Webpack | Bundled deps + SBOM correctness | +| 13 | go-static-modver | Go static | Detect module versions in static binaries | +| 14 | dotnet-singlefile-trim | .NET 10 publish | Single-file/trimmed dependency evidence | +| 15 | cors-credentials-wildcard | .NET 10 or Node | Config vulnerability (CORS) + fix delta | +| 16 | open-redirect | Node/Express | Web vuln classification + allowlist fix | +| 17 | csrf-state-change | .NET 10 Razor/Minimal | Missing CSRF protections + cookie semantics | +| 18 | prototype-pollution-merge | Node | JSON-body gated path + sink | +| 19 | path-traversal-download | Python/Flask | File handling traversal + 
normalization | +| 20 | insecure-tempfile-toctou | Go or .NET | Concurrency/race evidence (safe) | +| 21 | k8s-misconfigs | YAML/Helm | IaC scanning (privileged, hostPath, etc.) | +| 22 | docker-multistage-buildonly | Any | Build-time-only vuln exclusion proof | +| 23 | secrets-fakes-corpus | Any | Secret detection precision (fake tokens) | +| 24 | sbom-mismatch-lab | Any | SBOM validation + diff correctness | + +--- + +## B. Canonical tagging (use across all services) + +Every reachable vuln must produce at least: + +* `TAG:route: ` or `TAG:topic:` +* `TAG:call:` +* If R3: `TAG:taint:` (http.query, http.body, queue.msg, env.var) + +**Evidence artifact:** `evidence/trace.json` lines such as: + +```json +{"ts":"...","corr":"...","tags":["TAG:route:POST /reset","TAG:taint:http.body.email","TAG:call:Crypto.MD5"]} +``` + +--- + +## C. Service specs (developers can implement 1:1) + +### 01) `svc-01-reset-token-weak-crypto` (Node/Express) + +**Purpose:** R3 code reachability; crypto misuse; ensure scanner doesn’t over-rank unreachable dev deps. +**Entrypoints:** `POST /reset` and `POST /reset/confirm` +**Vulns:** + +* `V1` **CWE-327 Weak Crypto** — reset token derived from deterministic inputs (no CSPRNG). + + * Reachability: **R3** + * Tags: `TAG:route:POST /reset`, `TAG:taint:http.body.email`, `TAG:call:Crypto.WeakToken` + * Proof test: request reset; assert trace contains sink tag. + * Fix: use `crypto.randomBytes()` and store hashed token. +* `V2` **dep_build** — vulnerable npm devDependency present only in `devDependencies`. + + * Reachability: **R0** + * Negative proof: final image contains no node_modules entry for it OR it is never imported (coverage + grep import map). + +**Hard mode variant:** token generation only happens when `FEATURE_RESET_V1=1` → label unreachable when off. + +--- + +### 02) `svc-02-search-sql-injection` (Python/FastAPI + SQLite) + +**Purpose:** Classic taint → SQL sink; evidence-driven. 
+**Entrypoint:** `GET /search?q=` +**Vulns:** + +* `V1` **CWE-89 SQL Injection** — query constructed via string concatenation. + + * Reachability: **R3** + * Tags: `TAG:route:GET /search`, `TAG:taint:http.query.q`, `TAG:call:SQL.Unparameterized` + * Proof test: send query with SQL metacharacters; verify trace hits sink. + * Fix: parameterized query / query builder. + +**Hard mode variant:** same route exists but safe path uses parameters; unsafe path only if header `X-Debug=1` and env `DEV_MODE=1`. + +--- + +### 03) `svc-03-cmd-injection-diagnostics` (.NET 10 Minimal API) + +**Purpose:** Detect command execution sink and prove gating. +**Entrypoint:** `GET /diag/ping?host=` +**Vulns:** + +* `V1` **CWE-78 Command Injection** — shell invocation with user-influenced argument. + + * Reachability: **R3** when `DIAG_ENABLED=1` + * Tags: `TAG:route:GET /diag/ping`, `TAG:taint:http.query.host`, `TAG:call:Process.Start.Shell` + * Proof test: call endpoint with characters that would alter shell parsing; evidence is sink tag + controlled output marker (not destructive). + * Fix: avoid shell, use argument arrays (`ProcessStartInfo.ArgumentList`) + allowlist hostnames. + +**Hard mode variant:** sink is in a helper library referenced transitively; scanner must resolve call graph. + +--- + +### 04) `svc-04-zip-import-zip-slip` (Go) + +**Purpose:** File/archive handling; safe filesystem proof; no “real system” impact. +**Entrypoint:** `POST /import-zip` +**Vulns:** + +* `V1` **CWE-22 Path Traversal (Zip Slip)** — extraction path not normalized/validated. + + * Reachability: **R3** + * Tags: `TAG:route:POST /import-zip`, `TAG:taint:http.body.zip`, `TAG:call:Archive.Extract.UnsafeJoin` + * Proof test: upload crafted zip that attempts to place `evidence/sentinel.txt` outside dest; assert sentinel ends up outside intended folder. + * Fix: clean paths; reject entries escaping dest; forbid absolute paths. 
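The svc-04 fix (clean paths, reject escaping entries, forbid absolute paths) is the same check in any language; a Python sketch of the guard, though the service itself is Go:

```python
import os
import zipfile

def safe_extract(zip_path: str, dest: str) -> None:
    """Reject any archive entry that would land outside dest (the svc-04 fix)."""
    dest_real = os.path.realpath(dest)
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if os.path.isabs(name):                      # forbid absolute paths
                raise ValueError(f"absolute path entry: {name}")
            target = os.path.realpath(os.path.join(dest_real, name))
            if os.path.commonpath([dest_real, target]) != dest_real:
                raise ValueError(f"zip-slip entry rejected: {name}")
        zf.extractall(dest_real)
```

The key move is resolving the joined path before comparing it against the destination root, so `../` sequences and symlink tricks are caught after normalization rather than by string matching.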
+ +--- + +### 05) `svc-05-xml-upload-xxe` (Java/Spring Boot) + +**Purpose:** Parser config scanning + code-path proof. +**Entrypoint:** `POST /upload-xml` +**Vulns:** + +* `V1` **CWE-611 XXE** — DocumentBuilderFactory with external entities enabled. + + * Reachability: **R3** + * Tags: `TAG:route:POST /upload-xml`, `TAG:taint:http.body.xml`, `TAG:call:XML.Parse.XXEEnabled` + * Proof test: XML references a **local test file under `/app/testdata/`** and returns its sentinel string (no external network). + * Fix: disable external entity resolution and secure processing. + +--- + +### 06) `svc-06-jwt-none-devmode` (.NET 10) + +**Purpose:** Reachability depends on environment and config. +**Entrypoint:** `GET /admin` (Bearer JWT) +**Vulns:** + +* `V1` **CWE-345 Insufficient Verification** — accepts unsigned token when `DEV_MODE=1`. + + * Reachability: **R2** (exec) / **R3** (if token from request) + * Tags: `TAG:route:GET /admin`, `TAG:flag:DEV_MODE=true`, `TAG:call:Auth.JWT.AcceptNoneAlg` + * Proof test: run container with DEV_MODE=1; request triggers sink tag. + * Negative test: DEV_MODE=0 must not hit sink tag. + * Fix: enforce algorithm + signature validation always. + +--- + +### 07) `svc-07-fetcher-ssrf` (Node/Express) + +**Purpose:** SSRF detection with internal-only target in docker network. +**Entrypoint:** `GET /fetch?url=` +**Vulns:** + +* `V1` **CWE-918 SSRF** — URL fetched without scheme/host restrictions. + + * Reachability: **R3** + * Tags: `TAG:route:GET /fetch`, `TAG:taint:http.query.url`, `TAG:call:HTTP.Client.Fetch` + * Proof test: fetch `http://internal-metadata/health` (a companion container in compose); assert response contains sentinel + sink tag. + * Fix: allowlist hosts/schemes; block private ranges; require signed destinations. 
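The svc-07 fix reduces to a scheme allowlist, a host allowlist, and a private-range block. A Python sketch (the allowlist contents are hypothetical):

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.example.com"}   # hypothetical allowlist for illustration

def is_safe_fetch(url: str) -> bool:
    """Scheme allowlist + host allowlist + private/link-local IP block (svc-07 fix)."""
    p = urlparse(url)
    if p.scheme not in {"http", "https"}:
        return False
    host = p.hostname or ""
    try:
        ip = ipaddress.ip_address(host)
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False               # literal IPs into private space are always refused
    except ValueError:
        pass                           # not an IP literal; fall through to the allowlist
    return host in ALLOWED_HOSTS
```

A production guard would also resolve the hostname and re-check the resulting addresses, since DNS can point an allowed-looking name at a private range.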
+ +--- + +### 08) `svc-08-outbound-tls-skipverify` (Go) + +**Purpose:** Config vuln + “reachable only when feature on.” +**Entrypoint:** `POST /sync` triggers outbound HTTPS call +**Vulns:** + +* `V1` **CWE-295 Improper Cert Validation** — `InsecureSkipVerify=true` when `SYNC_FAST=1`. + + * Reachability: **R2** (exec) + * Tags: `TAG:route:POST /sync`, `TAG:flag:SYNC_FAST=true`, `TAG:call:TLS.InsecureSkipVerify` + * Fix: proper CA pinning / system pool; explicit cert verification. + +--- + +### 09) `svc-09-queue-pickle-deser` (Python API + worker) + +**Purpose:** Async reachability: API enqueues → worker executes sink. +**Entrypoints:** `POST /enqueue` + worker consumer +**Vulns:** + +* `V1` **CWE-502 Unsafe Deserialization** — worker uses unsafe deserializer. + + * Reachability: **R3** (taint from HTTP → queue → worker) + * Tags: `TAG:route:POST /enqueue`, `TAG:topic:jobs`, `TAG:call:Deserialize.Unsafe` + * Proof test: enqueue benign payload that triggers sink tag and deterministic “handled” response (no arbitrary execution PoC). + * Fix: switch to safe format (JSON) and validate schema. + +--- + +### 10) `svc-10-efcore-rawsql` (.NET 10 + EF Core) + +**Purpose:** ORM misuse; taint → SQL sink detection. +**Entrypoint:** `GET /reports?where=` +**Vulns:** + +* `V1` **CWE-89 SQLi** — `FromSqlRaw`/`ExecuteSqlRaw` with interpolated input. + + * Reachability: **R3** + * Tags: `TAG:route:GET /reports`, `TAG:taint:http.query.where`, `TAG:call:EFCore.FromSqlRaw.Unsafe` + * Fix: `FromSqlInterpolated` with parameters or LINQ predicates. + +--- + +### 11) `svc-11-shaded-jar-deps` (Java/Gradle) + +**Purpose:** Dependency discovery inside fat/shaded jar; reachable vs present-only. +**Entrypoint:** `GET /parse` +**Vulns:** + +* `V1` **dep_runtime** — vulnerable lib included in shaded jar and actually invoked. 
+ + * Reachability: **R2** + * Tags: `TAG:route:GET /parse`, `TAG:call:Lib.Parse.VulnerableMethod` +* `V2` **dep_build/test** — test-scoped vulnerable lib not packaged in runtime jar. + + * Reachability: **R0** + * Negative proof: SBOM for runtime jar excludes it; file listing confirms. + +**Fix:** bump dependency and rebuild shaded jar. + +--- + +### 12) `svc-12-webpack-bundled-dep` (Node/Webpack) + +**Purpose:** Bundled dependencies, source map presence/absence, SBOM correctness. +**Entrypoint:** `GET /render?template=` +**Vulns:** + +* `V1` **dep_runtime** — vulnerable template lib bundled; invoked by render. + + * Reachability: **R2/R3** depending on input usage + * Tags: `TAG:route:GET /render`, `TAG:taint:http.query.template`, `TAG:call:Template.Render` +* `V2` **R0** — vulnerable package in lockfile but tree-shaken and absent from output bundle. + + * Negative proof: bundle inspection + build manifest. + +**Fix:** upgrade dependency and rebuild bundle; ensure SBOM maps bundle contents. + +--- + +### 13) `svc-13-go-static-modver` (Go static binary) + +**Purpose:** Scanner capability to extract module versions from static binary. +**Entrypoint:** `GET /hash?alg=` +**Vulns:** + +* `V1` **dep_runtime** — vulnerable Go module version linked; executed on route. + + * Reachability: **R2** + * Tags: `TAG:route:GET /hash`, `TAG:call:GoMod.VulnFunc` +* `V2` **R1** — module linked but only used in dead code path (guarded by constant false). + + * Negative proof: coverage/trace never hits sink. + +**Fix:** update `go.mod` and rebuild. + +--- + +### 14) `svc-14-dotnet-singlefile-trim` (.NET 10 publish single-file) + +**Purpose:** Detect assemblies in single-file + trimming edge cases. +**Entrypoint:** `GET /export` +**Vulns:** + +* `V1` **dep_runtime** — vulnerable NuGet referenced and executed. + + * Reachability: **R2** + * Tags: `TAG:route:GET /export`, `TAG:call:NuGet.VulnMethod` +* `V2` **R0** — package referenced in project but trimmed out and not present. 
+ + * Negative proof: runtime file map (single-file manifest) excludes it. + +**Fix:** bump NuGet; adjust trimming settings if needed. + +--- + +### 15) `svc-15-cors-credentials-wildcard` (.NET 10) + +**Purpose:** Config/misconfig detection; clear fix delta. +**Entrypoint:** any API route +**Vulns:** + +* `V1` **CWE-942 / CORS Misconfig** — `Access-Control-Allow-Origin: *` with credentials. + + * Reachability: **R2** (observed in response headers) + * Tags: `TAG:route:GET /health`, `TAG:call:HTTP.Headers.CORSWildcardCreds` + * Proof test: request and assert headers + tag. + * Fix: explicit allowed origins + disable credentials unless needed. + +--- + +### 16) `svc-16-open-redirect` (Node/Express) + +**Purpose:** Web vuln classification, allowlist fix. +**Entrypoint:** `GET /login?next=` +**Vulns:** + +* `V1` **CWE-601 Open Redirect** — next param used directly. + + * Reachability: **R3** + * Tags: `TAG:route:GET /login`, `TAG:taint:http.query.next`, `TAG:call:Redirect.Unvalidated` + * Fix: allowlist relative paths; reject absolute URLs. + +--- + +### 17) `svc-17-csrf-state-change` (.NET 10) + +**Purpose:** CSRF detection + cookie semantics. +**Entrypoint:** `POST /account/email` (cookie auth) +**Vulns:** + +* `V1` **CWE-352 CSRF** — no anti-forgery token; SameSite mis-set. + + * Reachability: **R2** + * Tags: `TAG:route:POST /account/email`, `TAG:call:Auth.CSRF.MissingProtection` + * Fix: antiforgery token + SameSite=Lax/Strict and proper CORS. + +--- + +### 18) `svc-18-prototype-pollution-merge` (Node) + +**Purpose:** JSON-body gated sink; reachability must respect content-type and route. +**Entrypoint:** `POST /profile` (application/json) +**Vulns:** + +* `V1` **CWE-1321 Prototype Pollution** — unsafe deep merge of user object into defaults. + + * Reachability: **R3** (only if JSON) + * Tags: `TAG:route:POST /profile`, `TAG:taint:http.body.json`, `TAG:call:Object.Merge.Unsafe` + * Negative test: same request with non-JSON must not hit sink tag. 
+ * Fix: safe merge, deny `__proto__` / `constructor` keys. + +--- + +### 19) `svc-19-path-traversal-download` (Python/Flask) + +**Purpose:** File traversal with safe, local sentinel proof. +**Entrypoint:** `GET /download?file=` +**Vulns:** + +* `V1` **CWE-22 Path Traversal** — file path concatenated without normalization. + + * Reachability: **R3** + * Tags: `TAG:route:GET /download`, `TAG:taint:http.query.file`, `TAG:call:FS.Read.UnsafePath` + * Proof test: attempt to read a known sentinel file outside the allowed directory (within container). + * Fix: normalize path, enforce base dir constraint. + +--- + +### 20) `svc-20-insecure-tempfile-toctou` (Go or .NET) + +**Purpose:** Concurrency/race category; deterministic reproduction via controlled scheduling. +**Entrypoint:** `POST /export` creates temp file and then reopens by name +**Vulns:** + +* `V1` **CWE-367 TOCTOU** — uses predictable temp name + separate open. + + * Reachability: **R2** (requires parallel test harness) + * Tags: `TAG:route:POST /export`, `TAG:call:FS.TempFile.InsecurePattern` + * Proof test: run two coordinated requests; assert race condition triggers sentinel behavior. + * Fix: use secure temp APIs + hold open FD; atomic operations. + +--- + +## D. Meta repos (not “services” but essential for best-in-class scanning) + +### 21) `svc-21-k8s-misconfigs` (YAML/Helm) + +**Purpose:** IaC scanning; false-positive discipline. +**Artifacts:** `manifests/*.yaml`, `helm/Chart.yaml` +**Findings to plant:** + +* privileged container, `hostPath`, `runAsUser: 0`, missing resource limits, writable rootfs, wildcard RBAC + **Proof:** static assertions in tests (OPA/Conftest or your harness) generate evidence tags like `TAG:iac:k8s.privileged`. + +--- + +### 22) `svc-22-docker-multistage-buildonly` + +**Purpose:** Prove build-time-only deps do not ship; prevent scanners from overreporting. +**Pattern:** builder stage installs vulnerable tooling; final stage is distroless and excludes it. 
+**Proof:** final image SBOM + `docker export` file list hash; must not include builder artifacts. + +--- + +### 23) `svc-23-secrets-fakes-corpus` + +**Purpose:** Secret detection precision/recall without storing real secrets. +**Pattern:** files containing **fake** tokens matching common regexes but clearly marked `FAKE_` and useless. +**Labels:** must distinguish: + +* `R0 present` fake secret in docs/examples +* `R2 reachable` secret injected into runtime env accidentally (then fixed) + +--- + +### 24) `svc-24-sbom-mismatch-lab` + +**Purpose:** SBOM validation and drift detection. +**Pattern:** generate an SBOM, then change deps without regenerating; label mismatch as a “supply_chain” issue. +**Proof:** harness compares `image digest + lockfile hash + sbom hash`. + +--- + +## E. Implementation notes that raise the bar (recommended defaults) + +1. **Each service ships with both**: + + * `tests/test_positive_v*.{py,js,cs}` producing evidence for reachable vulns + * `tests/test_negative_v*.{py,js,cs}` proving unreachable claims +2. **Every service includes a `fix/` patch** and a CI job that: + + * builds “vuln image”, scans, evaluates + * applies fix, rebuilds, re-scans, confirms expected delta +3. **Hard-mode toggle per service** (optional but valuable): + + * `MODE=easy`: vuln sits on hot path (for demos) + * `MODE=hard`: same vuln behind realistic conditions (auth, header, flag, content-type, async) + +--- + +If you want this to be “maxim degree” for scanner R&D, the next step is to add **one additional dimension per service** (fat jar, single-file, distroless, vendored deps, shaded deps, optional extras, transitive only, etc.). I can propose a precise pairing (which dimension goes to which service) so the suite covers all packaging and reachability edge cases without duplication. 
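The fix-validation CI job in §E.2 (build the vuln image, scan, apply `fix/`, rebuild, re-scan, confirm the expected delta) boils down to a findings diff; a minimal sketch:

```python
def fix_delta(before, after, expected_gone):
    """Compare findings before/after applying fix/: the expected IDs must disappear
    and nothing new and unlabeled may appear (reviewer-checklist rule)."""
    before_ids = {f["id"] for f in before}
    after_ids = {f["id"] for f in after}
    gone = before_ids - after_ids
    introduced = after_ids - before_ids
    return {
        "ok": set(expected_gone) <= gone and not introduced,
        "gone": gone,
        "introduced": introduced,
    }
```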
diff --git a/docs/product-advisories/14-Dec-2025 - Dissect triage and evidence workflows.md b/docs/product-advisories/14-Dec-2025 - Dissect triage and evidence workflows.md new file mode 100644 index 000000000..b1a3b9be8 --- /dev/null +++ b/docs/product-advisories/14-Dec-2025 - Dissect triage and evidence workflows.md @@ -0,0 +1,551 @@ +Here’s a tight, practical blueprint for building (and proving) a fast, evidence‑first triage workflow—plus the power‑user affordances that make Stella Ops feel “snappy” even offline. + +# What “good” looks like (background in plain words) + +* **Alert → evidence → decision** in one flow: an alert should open directly onto the concrete proof (reachability, call‑stack, provenance), then offer a one‑click decision (VEX/CSAF status) with audit logging. +* **Time‑to‑First‑Signal (TTFS)** is king: how fast a human sees the first credible piece of evidence that explains *why this alert matters here*. +* **Clicks‑to‑Closure**: count how many interactions to reach a defensible decision recorded in the audit log. + +# Minimal evidence bundle per finding + +* **Reachability proof**: function‑level path or package‑level import chain (with “toggle reachability view” hotkey). +* **Call‑stack snippet**: 5–10 frames around the sink/source with file:line anchors. +* **Provenance**: attestation / DSSE + build ancestry (image → layer → artifact → commit). +* **VEX/CSAF status**: affected/not‑affected/under‑investigation + reason. +* **Diff**: what changed since last scan (SBOM or VEX delta), rendered as a small, human‑readable “smart‑diff.” + +# KPIs to measure in CI and UI + +* **TTFS (p50/p95)** from alert creation to first rendered evidence. +* **Clicks‑to‑Closure (median)** per decision type. +* **Evidence completeness score** (0–4): reachability, call‑stack, provenance, VEX/CSAF present. +* **Offline friendliness score**: % of evidence resolvable with no network. 
+* **Audit log completeness**: every decision has: evidence hash set, actor, policy context, replay token. + +# Power‑user affordances (keyboard first) + +* **Jump to evidence** (`J`): focuses the first incomplete evidence pane. +* **Copy DSSE** (`Y`): copies the attestation block or Rekor entry ref. +* **Toggle reachability view** (`R`): path list ↔ compact graph ↔ textual proof. +* **Search‑within‑graph** (`/`): node/func/package, instant. +* **Deterministic sort** (`S`): stable sort by (reachability→severity→age→component) to remove hesitation. +* **Quick VEX set** (`A`, `N`, `U`): Affected / Not‑affected / Under‑investigation with templated reasons. + +# UX flow to implement (end‑to‑end) + +1. **Alert row** shows: TTFS timer, reachability badge, “decision state,” and a diff‑dot if something changed. +2. **Open alert** lands on **Evidence tab** (not Details). Top strip = three proof pills: + + * Reachability ✓ / Call‑stack ✓ / Provenance ✓ (click to expand inline). +3. **Decision drawer** pinned on the right: + + * VEX/CSAF radio (A/N/U) → Reason presets → “Record decision.” + * Shows **audit‑ready summary** (hashes, timestamps, policy). +4. **Diff tab**: SBOM/VEX delta since last run, grouped by “meaningful risk shift.” +5. **Activity tab**: immutable audit log; export as a signed bundle for audits. + +# Graph performance on large call‑graphs + +* **Minimal‑latency snapshots**: pre‑render static PNG/SVG thumbnails server‑side; open with tiny preview then hydrate to interactive graph lazily. +* **Progressive neighborhood expansion**: load 1‑hop first, expand on demand; keep the first TTFS < 500 ms. +* **Stable node ordering**: deterministic layout with consistent anchors to avoid “graph shuffle” anxiety. +* **Chunked graph edges** with capped fan‑out; collapse identical library paths into a **reachability macro‑edge**. 
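A minimal sketch of the macro-edge collapse described above — the edge shape here is an assumption for illustration, not an actual Stella Ops structure:

```python
from collections import defaultdict

def collapse_macro_edges(edges):
    """Collapse parallel edges over the same (src, dst) pair into one
    macro-edge with a count, capping visual fan-out."""
    grouped = defaultdict(list)
    for e in edges:  # each edge: {"src": ..., "dst": ..., "via": ...}
        grouped[(e["src"], e["dst"])].append(e["via"])
    macro = []
    # Sorted iteration keeps output deterministic across refreshes,
    # which is what prevents "graph shuffle" anxiety.
    for (src, dst), vias in sorted(grouped.items()):
        macro.append({"src": src, "dst": dst, "count": len(vias), "vias": sorted(vias)})
    return macro
```

Deterministic ordering here matters as much as the collapse itself: the same input must always yield the same macro-edge list so layout anchors stay put.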
+ +# Offline‑friendly design + +* **Local evidence cache**: store (SBOM slices, path proofs, DSSE attestations, compiled call‑stacks) in a signed bundle beside the SARIF/VEX. +* **Deferred enrichment**: mark fields that need internet (e.g., upstream CSAF fetch) and queue a background “enricher” when network returns. +* **Predictable fallbacks**: if provenance server missing, show embedded DSSE and “verification pending,” never blank states. + +# Audit & replay + +* **Deterministic replay token**: hash(feed manifests + rules + lattice policy + inputs) → attach to every decision. +* **One‑click “Reproduce”**: opens CLI snippet pinned to the exact versions and policies. +* **Evidence hash‑set**: content‑address each proof artifact; the audit entry stores only hashes + signer. + +# TTFS & Clicks‑to‑Closure: how to measure in code + +* Emit a `ttfs.start` at alert creation; first paint of any evidence card emits `ttfs.signal`. +* Increment a per‑alert **interaction counter**; on “Record decision” emit `close.clicks`. +* Log **evidence bitset** (reach, stack, prov, vex) at decision time for completeness scoring. + +# Developer tasks (concrete, shippable) + +* **Evidence API**: `GET /alerts/{id}/evidence` returns `{reachability, callstack, provenance, vex, hashes[]}` with deterministic sort. +* **Proof renderer**: tiny, no‑framework widget that can render from the offline bundle; hydrate to full only on interaction. +* **Keyboard map**: global handler with overlay help (`?`); no collisions; all actions are idempotent. +* **Graph service**: server‑side layout + snapshot PNG; client hydrates WebGL only when user expands. +* **Smart‑diff**: diff SBOM/VEX → classify into “risk‑raising / neutral / reducing,” surface only the first item by default. +* **Audit logger**: append‑only stream; signed checkpoints; export `.stella-audit.tgz` (attestations + JSONL). + +# Benchmarks to run weekly + +* **TTFS under poor network** (100 ms RTT, 1% loss): p95 < 1.5 s to first evidence. 
+* **Graph hydration on 250k‑edge image**: preview < 300 ms, interactive < 2.0 s. +* **Keyboard coverage**: ≥90% of triage actions executable without mouse. +* **Offline replay**: 100% of decisions re‑render from bundle; zero web calls required. + +# Why Stella’s approach reduces hesitation + +* **Deterministic sort orders** keep findings in place between refreshes. +* **Minimal‑latency graph snapshots** show something trustworthy immediately, then refine—no “blank panel” delay. +* **Replayable, signed bundles** make every click auditable and reversible, which builds operator confidence. + +If you want, I can turn this into: + +* a **UI checklist** for a design review, +* a **.NET 10 API contract** (DTOs + endpoints), +* or a **Cypress/Playwright test plan** that measures TTFS and clicks‑to‑closure automatically. +Below is a PM‑style implementation guideline you can hand to developers. It’s written as a **build spec**: clear goals, “MUST/SHOULD” requirements, acceptance criteria, and the non‑functional guardrails (performance, offline, auditability) that make triage feel fast and defensible. + +--- + +# Stella Ops — Evidence‑First Triage Implementation Guidelines (PM Spec) + +## 0) Assumptions and scope + +**Assumptions** + +* Stella Ops ingests vulnerability findings (SCA/SAST/image scans), has SBOM context, and can compute reachability/call paths. +* Triage outcomes must be recorded as VEX/CSAF‑compatible states with reasons and audit trails. +* Users may operate in restricted networks and need an offline mode that still shows evidence. + +**In scope** + +* Evidence‑first alert triage UI + APIs + telemetry. +* Reachability proof + call stack view + provenance attestation view. +* VEX/CSAF decision recording with audit export. +* Offline evidence bundle and deterministic replay token. + +**Out of scope (for this phase)** + +* Building the underlying static analyzer or SBOM generator (we consume their outputs). 
+* Full CSAF publishing workflow (we store and export; publishing is separate). +* Remediation automation (PRs, patching). + +--- + +## 1) Product principles (non‑negotiables) + +1. **Evidence before detail** + Opening an alert **MUST** show the best available evidence immediately (even partial/placeholder), not a generic “details” page. +2. **Fast first signal** + The UI **MUST** render a credible “first signal” quickly (reachability badge, call stack snippet, or provenance block). +3. **Determinism reduces hesitation** + Sorting, graphs, and diffs **MUST** be stable across refreshes. No jittery re-layout. +4. **Offline by design** + If evidence exists locally (bundle), the UI **MUST** render it without network access. +5. **Audit-ready by default** + Every decision **MUST** be reproducible, attributable, and exportable with evidence hashes. + +--- + +## 2) Success metrics (what we ship toward) + +These become acceptance criteria and dashboards. + +### Primary metrics (P0) + +* **TTFS (Time‑to‑First‑Signal)**: p95 < **1.5s** from opening an alert to first evidence card rendering (with 100ms RTT, 1% loss simulation). +* **Clicks‑to‑Closure**: median < **6** interactions to record a VEX decision. +* **Evidence completeness** at decision time: ≥ **90%** of decisions include evidence hash set + reason + replay token. + +### Secondary metrics (P1) + +* **Offline resolution rate**: ≥ **95%** of alerts opened with a local bundle show reachability + provenance without network. +* **Graph usability**: preview render < **300ms**, interactive hydration < **2.0s** for large graphs (see §7). + +--- + +## 3) User workflows and “Definition of Done” + +### Workflow A: Triage an alert to a decision + +**DoD**: user can open an alert, see evidence, set VEX state, and the system records a signed/auditable decision event. + +**Steps** + +1. Alert list shows key signals (reachability badge, decision state, diff indicator). +2. Open alert → Evidence view loads first. +3. 
User reviews reachability/call stack/provenance. +4. User sets VEX status + reason preset (editable). +5. User records decision. +6. Audit log entry appears instantly and is exportable. + +### Workflow B: Explain “why is this flagged?” + +**DoD**: user can show a defensible proof (path/call stack/provenance) and copy it into a ticket. + +--- + +## 4) UI requirements (MUST/SHOULD/MAY) + +## 4.1 Alert list page + +**MUST** + +* Each row includes: + + * Severity + component identifier + * **Decision state** (Unset / Under Investigation / Not Affected / Affected) + * **Reachability badge** (Reachable / Not Reachable / Unknown) where available + * **Diff indicator** if SBOM/VEX changed since last scan (simple dot/label) + * Age / first seen / last updated +* **Deterministic sort** default: + `Reachability DESC → Severity DESC → Decision state (Unset first) → Age DESC → Component name ASC` +* Keyboard navigation: + + * `↑/↓` move selection, `Enter` open alert. + * `/` search/filter focus. + +**SHOULD** + +* Inline “quick set” decision menu (Affected / Not affected / Under investigation) without leaving list for obvious cases, but still requires reason and logs evidence hashes. + +## 4.2 Alert detail — landing tab MUST be Evidence + +**MUST** + +* Default landing is **Evidence** (not “Overview”). +* Top section shows 3 “proof pills” with status: + + * Reachability (✓ / ! / …) + * Call stack (✓ / ! / …) + * Provenance (✓ / ! / …) +* Each pill expands inline (no navigation) into a compact evidence panel. + +**MUST: No blank panels** + +* If evidence is loading, show skeleton + “what’s coming.” +* If evidence missing, show a reason (“not computed”, “requires source map”, “offline – enrichment pending”). + +## 4.3 Decision drawer + +**MUST** + +* Pinned right drawer (or persistent bottom sheet on small screens). 
+* Controls: + + * VEX/CSAF status: **Affected / Not affected / Under investigation** + * Reason preset dropdown + editable reason text + * “Record decision” button +* Preview “Audit summary” before submit: + + * Evidence hashes included + * Policy context (ruleset version) + * Replay token + * Actor identity + +**MUST** + +* On submit, create an append-only audit event and immediately reflect status in UI. + +**SHOULD** + +* Allow attaching references: ticket URL, incident ID, PR link (stored as metadata). + +## 4.4 Diff tab + +**MUST** + +* Show delta since last scan: + + * SBOM diffs (component version changes, removals/additions) + * VEX diffs (status changes) +* Group diffs by **risk shift**: + + * Risk‑raising (new reachable vuln, severity increase) + * Neutral (metadata-only) + * Risk‑reducing (fixed version, reachability removed) + +**SHOULD** + +* Provide “Copy diff summary” for change management. + +## 4.5 Activity/Audit tab + +**MUST** + +* Immutable timeline of decisions and evidence changes. +* Each entry includes: + + * actor, timestamp, decision, reason + * evidence hash set + * replay token + * bundle/export availability + +--- + +## 5) Power-user and accessibility requirements + +### Keyboard shortcuts (MUST) + +* `J`: jump to next missing/incomplete evidence panel +* `R`: toggle reachability view (list ↔ compact graph ↔ textual proof) +* `Y`: copy selected evidence block (call stack / DSSE / path proof) +* `A`: set “Affected” (opens reason preset selection) +* `N`: set “Not affected” +* `U`: set “Under investigation” +* `?`: keyboard help overlay + +### Accessibility (MUST) + +* Fully navigable by keyboard +* Visible focus states +* Screen-reader labels for evidence pills and drawer controls +* Color is never the only signal (badges must have text/icon) + +--- + +## 6) Evidence model: what every alert should attempt to provide + +Treat this as the **minimum evidence bundle**. Each item may be “unavailable,” but must be explicit. 
+ +**MUST** support: + +1. **Reachability proof** + + * At least one of: + + * function-level call path: `entry → … → vulnerable_sink` + * package/module import chain + * Includes confidence/algorithm tag: `static`, `dynamic`, `heuristic` +2. **Call stack snippet** + + * 5–10 frames around the relevant node with file:line anchors where possible +3. **Provenance** + + * DSSE attestation or equivalent statement + * Artifact ancestry chain: image → layer → artifact → commit (as available) + * Verification status: verified / pending / failed (with reason) +4. **Decision state** + + * VEX status + reason + timestamps +5. **Evidence hash set** + + * Content-addressed hashes of each evidence artifact included in the decision + +**SHOULD** + +* “Evidence freshness”: when computed, tool version, input revisions. + +--- + +## 7) Performance and graph rendering requirements + +### TTFS budget (MUST) + +* When opening an alert: + + * **<200ms**: show skeleton and cached row metadata + * **<500ms**: render at least one evidence pill with meaningful content OR a cached preview image + * **<1.5s p95**: render reachability + provenance for typical alerts + +### Graph rendering for large call graphs (MUST) + +* **Two-phase rendering** + + 1. Server-generated **static snapshot** (PNG/SVG) displayed immediately + 2. Interactive graph hydrates lazily on user expand +* **Progressive expansion** + + * Load 1-hop neighborhood first; expand on click +* **Deterministic layout** + + * Same input produces same layout anchors (no reshuffles between refreshes) +* **Fan-out control** + + * Collapse repeated library paths into “macro edges” to keep the graph readable + +--- + +## 8) Offline mode requirements + +Offline is not “nice to have”; it is a defined mode. 
+ +### Offline evidence bundle (MUST) + +* A single file (e.g., `.stella.bundle.tgz`) that contains: + + * Alert metadata snapshot + * Evidence artifacts (reachability proofs, call stacks, provenance attestations) + * SBOM slice(s) necessary for diffs + * VEX decision history (if available) + * Manifest with content hashes (Merkle-ish) +* Bundle must be **signed** (or include signature material) and verifiable. + +### UI behavior (MUST) + +* If bundle is present: + + * UI loads evidence from it first + * Any missing items show “enrichment pending” (not “error”) +* If network returns: + + * Background refresh allowed, but **must not reorder** the alert list unexpectedly + * Must surface “updated evidence available” as a user-controlled refresh, not an auto-switch that changes context mid-triage + +--- + +## 9) Auditability and replay requirements + +### Decision event schema (MUST) + +Every recorded decision must store: + +* `alert_id`, `artifact_id` (image digest or commit hash) +* `actor_id`, `timestamp` +* `decision_status` (Affected/Not affected/Under investigation) +* `reason_code` (preset) + `reason_text` +* `evidence_hashes[]` (content-addressed hashes) +* `policy_context` (ruleset version, policy id) +* `replay_token` (hash of inputs needed to reproduce) + +### Replay token (MUST) + +* Deterministic hash of: + + * scan inputs (SBOM digest, image digest, tool versions) + * policy/rules versions + * reachability algorithm version +* “Reproduce” button produces a CLI snippet (copyable) pinned to these versions. + +### Export (MUST) + +* Exportable audit bundle that includes: + + * JSONL of decision events + * evidence artifacts referenced by hashes + * signatures/attestations +* Export must be stable and verifiable later. + +--- + +## 10) API and data contract guidelines (developer-facing) + +This is an implementation guideline, not a full API spec—keep it simple and cache-friendly. 
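The replay token these APIs must attach (§9) can be derived with nothing more than a canonical hash; a minimal sketch, where the field names are placeholders for whatever §9 enumerates:

```python
import hashlib
import json

def replay_token(inputs: dict) -> str:
    """Deterministic token: sha256 over a canonical (sorted-key, no-whitespace)
    JSON encoding of the replay-relevant inputs."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Identical inputs always produce the same token; changing any pinned version (policy, tool, algorithm) produces a new one, which is what makes "Reproduce" trustworthy.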
+
+### MUST endpoints (or equivalent)
+
+* `GET /alerts?filters…` → list view payload (small, cacheable)
+* `GET /alerts/{id}/evidence` → evidence payload (reachability, call stack, provenance, hashes)
+* `POST /alerts/{id}/decisions` → record decision event (append-only)
+* `GET /alerts/{id}/audit` → audit timeline
+* `GET /alerts/{id}/diff?baseline=…` → SBOM/VEX diff view
+* `GET /bundles/{id}` and/or `POST /bundles/verify` → offline bundle download/verify
+
+### Evidence payload guidelines (MUST)
+
+* Deterministic ordering for arrays and nodes (stable sorts).
+* Explicit `status` per evidence section: `available | loading | unavailable | error`.
+* Include `hash` per artifact for content addressing.
+
+**Example shape**
+
+```json
+{
+  "alert_id": "a123",
+  "reachability": { "status": "available", "hash": "sha256:…", "proof": { "type": "call_path", "nodes": [...] } },
+  "callstack": { "status": "available", "hash": "sha256:…", "frames": [...] },
+  "provenance": { "status": "loading", "hash": null, "dsse": { "embedded": true, "payload": "…" } },
+  "vex": { "status": "available", "current": {...}, "history": [...] },
+  "hashes": ["sha256:…", "sha256:…"]
+}
+```
+
+---
+
+## 11) Telemetry requirements (how we prove it’s fast)
+
+**MUST** instrument:
+
+* `alert_opened` (timestamp, alert_id)
+* `evidence_first_paint` (timestamp, evidence_type)
+* `decision_recorded` (timestamp, clicks_count, evidence_bitset)
+* `bundle_loaded` (hit/miss, size, verification_status)
+* `graph_preview_paint` and `graph_hydrated`
+
+**MUST** compute:
+
+* TTFS = `evidence_first_paint - alert_opened`
+* Clicks‑to‑Closure = interaction counter per alert until decision recorded
+* Evidence completeness bitset at decision time: reachability/callstack/provenance/vex present
+
+---
+
+## 12) Error handling and edge cases
+
+**MUST**
+
+* Never show empty states without explanation.
+* Distinguish between: + + * “not computed yet” + * “not possible due to missing inputs” + * “blocked by permissions” + * “offline—enrichment pending” + * “verification failed” + +**SHOULD** + +* Offer “Request enrichment” action when evidence missing (creates a job/task id). + +--- + +## 13) Security, permissions, and multi-tenancy + +**MUST** + +* RBAC gating for: + + * viewing provenance attestations + * recording decisions + * exporting audit bundles +* All decision events are immutable; corrections are new events (append-only). +* PII handling: + + * Avoid storing freeform reasons with secrets; warn on paste patterns (optional P1). + +--- + +## 14) Engineering execution plan (priorities) + +### P0 (ship first) + +* Evidence-first alert detail landing +* Decision drawer + append-only audit +* Deterministic alert list sort + reachability badge +* Evidence API + decision POST +* TTFS + clicks telemetry +* Static graph preview + lazy hydration + +### P1 + +* Offline bundle load/verify + offline rendering +* Smart diff view (risk shift grouping) +* Exportable audit bundle +* Keyboard shortcuts + help overlay + +### P2 + +* Inline quick decisions from list +* Advanced graph search within view +* Suggest reason presets based on evidence patterns + +--- + +## 15) Acceptance criteria checklist (what QA signs off) + +A build is acceptable when: + +* Opening an alert renders at least one evidence pill within **500ms** (with cache) and TTFS p95 meets target under network simulation. +* Users can record A/N/U decisions with reason and see an audit event immediately. +* Decision event includes evidence hashes + replay token. +* Alert list sorting is stable and deterministic across refresh. +* Graph preview appears instantly; interactive graph hydrates only on expand. +* Offline bundle renders evidence without network; missing items show “enrichment pending,” not errors. +* Keyboard shortcuts work; `?` overlay lists them; full keyboard navigation is possible. 
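The §11 telemetry computations that QA verifies above are simple folds over the per-alert event stream; a sketch (the event shapes are assumptions):

```python
# Bit flags for the evidence completeness bitset (§11).
EVIDENCE_FLAGS = {"reachability": 1, "callstack": 2, "provenance": 4, "vex": 8}

def compute_triage_metrics(events):
    """Derive TTFS, clicks-to-closure, and the evidence bitset for one alert
    from its ordered telemetry events; returns None if no decision yet."""
    opened = first_paint = None
    clicks = 0
    bitset = 0
    for ev in events:
        if ev["type"] == "alert_opened":
            opened = ev["ts"]
        elif ev["type"] == "evidence_first_paint":
            if first_paint is None:
                first_paint = ev["ts"]  # TTFS uses the *first* paint only
            bitset |= EVIDENCE_FLAGS.get(ev.get("evidence_type"), 0)
        elif ev["type"] == "interaction":
            clicks += 1
        elif ev["type"] == "decision_recorded":
            ttfs = None if opened is None or first_paint is None else first_paint - opened
            return {"ttfs_ms": ttfs, "clicks_to_closure": clicks, "evidence_bitset": bitset}
    return None
```

Aggregating `ttfs_ms` across alerts gives the p50/p95 dashboards; the bitset at decision time gives the completeness score directly.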
+ +--- + +If you want, I can also format this into a **developer-ready ticket pack** (epics + user stories + acceptance tests) so engineers can implement without interpretation drift. diff --git a/docs/product-advisories/14-Dec-2025 - Evaluate PostgreSQL vs MongoDB for StellaOps.md b/docs/product-advisories/14-Dec-2025 - Evaluate PostgreSQL vs MongoDB for StellaOps.md new file mode 100644 index 000000000..9f73ca4b2 --- /dev/null +++ b/docs/product-advisories/14-Dec-2025 - Evaluate PostgreSQL vs MongoDB for StellaOps.md @@ -0,0 +1,544 @@ +Here’s a quick, practical cheat‑sheet on choosing **PostgreSQL vs MongoDB** for security/DevOps apps—plus how I’d model SBOM/VEX and queues in Stella Ops without adding moving parts. + +--- + +# PostgreSQL you can lean on (why it often wins for ops apps) + +* **JSONB that flies:** Store documents yet query like SQL. Add **GIN indexes** on JSONB fields for fast lookups (`jsonb_ops` general; `jsonb_path_ops` great for `@>` containment). +* **Queue pattern built‑in:** `SELECT … FOR UPDATE SKIP LOCKED` lets multiple workers pop jobs from the same table safely—no head‑of‑line blocking, no extra broker. +* **Cooperative locks:** **Advisory locks** (session/transaction) for “at‑most‑once” sections or leader election. +* **Lightweight pub/sub:** **LISTEN/NOTIFY** for async nudges between services (poke a worker to re‑scan, refresh cache, etc.). +* **Search included:** **Full‑text search** (tsvector/tsquery) is native—no separate search service for moderate needs. +* **Serious backups:** **PITR** with WAL archiving / `pg_basebackup` for deterministic rollbacks and offline bundles. + +# MongoDB facts to factor in + +* **Flexible ingest:** Schemaless docs make it easy to absorb varied telemetry and vendor feeds. +* **Horizontal scale:** Sharding is mature for huge, read‑heavy datasets. +* **Consistency is a choice:** Design embedding vs refs and when to use multi‑document transactions. 
+
+---
+
+# A simple rule of thumb (Stella Ops‑style)
+
+* **System of record:** PostgreSQL (JSONB first).
+* **Hot paths:** Materialized views + JSONB GIN indexes.
+* **Queues & coordination:** PostgreSQL (skip‑locked + advisory locks).
+* **Cache/accel only:** Valkey (ephemeral).
+* **MongoDB:** Optional for **very large, read‑optimized graph snapshots** (e.g., periodically baked reachability graphs) if Postgres starts to strain.
+
+---
+
+# Concrete patterns you can drop in today
+
+**1) SBOM/VEX storage (Postgres JSONB)**
+
+```sql
+-- Documents
+CREATE TABLE sbom (
+  id BIGSERIAL PRIMARY KEY,
+  artifact_purl TEXT NOT NULL,
+  doc JSONB NOT NULL,
+  created_at TIMESTAMPTZ DEFAULT now()
+);
+CREATE INDEX sbom_purl_idx ON sbom(artifact_purl);
+CREATE INDEX sbom_doc_gin ON sbom USING GIN (doc jsonb_path_ops);
+
+-- Common queries
+-- find components by name/version:
+-- SELECT * FROM sbom WHERE doc @> '{"components":[{"name":"openssl","version":"3.0.14"}]}';
+
+-- VEX
+CREATE TABLE vex (
+  id BIGSERIAL PRIMARY KEY,
+  subject_purl TEXT NOT NULL,
+  vex_doc JSONB NOT NULL,
+  created_at TIMESTAMPTZ DEFAULT now()
+);
+CREATE INDEX vex_subject_idx ON vex(subject_purl);
+CREATE INDEX vex_doc_gin ON vex USING GIN (vex_doc jsonb_path_ops);
+```
+
+**2) Hot reads via materialized views**
+
+```sql
+CREATE MATERIALIZED VIEW mv_open_findings AS
+SELECT
+  s.artifact_purl,
+  c->>'name' AS comp,
+  c->>'version' AS ver,
+  v.vex_doc
+FROM sbom s
+CROSS JOIN LATERAL jsonb_array_elements(s.doc->'components') c
+LEFT JOIN vex v ON v.subject_purl = s.artifact_purl
+-- add WHERE clauses to pre‑filter only actionable rows
+;
+-- REFRESH … CONCURRENTLY requires a unique index on the MV
+CREATE UNIQUE INDEX mv_open_findings_idx ON mv_open_findings(artifact_purl, comp, ver);
+```
+
+Refresh cadence: on feed import or via a scheduler; `REFRESH MATERIALIZED VIEW CONCURRENTLY mv_open_findings;` (the `CONCURRENTLY` form needs the unique index above, which assumes one row per `(artifact_purl, comp, ver)`).
+
+**3) Queue without a broker**
+
+```sql
+CREATE TABLE job_queue(
+  id BIGSERIAL PRIMARY KEY,
+  kind TEXT NOT NULL,  -- e.g., 'scan', 'sbom-diff'
+  payload JSONB NOT NULL,
+ 
run_after TIMESTAMPTZ DEFAULT now(),
+  attempts INT DEFAULT 0,
+  locked_at TIMESTAMPTZ,
+  locked_by TEXT
+);
+CREATE INDEX job_queue_ready_idx ON job_queue(kind, run_after);
+
+-- Worker loop
+WITH cte AS (
+  SELECT id FROM job_queue
+  WHERE kind = $1 AND run_after <= now() AND locked_at IS NULL
+  ORDER BY id
+  FOR UPDATE SKIP LOCKED
+  LIMIT 1
+)
+UPDATE job_queue j
+SET locked_at = now(), locked_by = $2
+FROM cte
+WHERE j.id = cte.id
+RETURNING j.*;
+```
+
+Release/fail with: set `locked_at=NULL, locked_by=NULL, attempts=attempts+1` or delete on success.
+
+**4) Advisory lock for singletons**
+
+```sql
+-- Acquire (per tenant, per artifact)
+SELECT pg_try_advisory_xact_lock(hashtextextended('recalc:'||tenant||':'||artifact, 0));
+```
+
+**5) Nudge workers without a bus**
+
+```sql
+-- NOTIFY only accepts a string literal, so use pg_notify() for dynamic payloads
+SELECT pg_notify('stella_scan', json_build_object('purl', $1, 'priority', 5)::TEXT);
+-- workers LISTEN stella_scan and enqueue quickly
+```
+
+---
+
+# When to add MongoDB
+
+* You need **interactive exploration** over **hundreds of millions of nodes/edges** (e.g., historical “proof‑of‑integrity” graphs) where document fan‑out and denormalized reads beat relational joins.
+* Snapshot cadence is **batchy** (hourly/daily), and you can **re‑emit** snapshots deterministically from Postgres (single source of truth).
+* You want to isolate read spikes from the transactional core.
+
+**Snapshot pipe:** Postgres → (ETL) → MongoDB collection `{graph_id, node, edges[], attrs}` with **compound shard keys** tuned to your UI traversal.
+
+---
+
+# Why this fits Stella Ops
+
+* Fewer moving parts on‑prem/air‑gapped.
+* Deterministic replays (PITR + immutable imports).
+* Clear performance levers (GIN indexes, MVs, skip‑locked queues).
+* MongoDB stays optional, purpose‑built for giant read graphs—not a default dependency.
+
+If you want, I can turn the above into ready‑to‑run `.sql` migrations and a small **.NET 10** worker (Dapper/EF Core) that implements the queue loop + advisory locks + LISTEN/NOTIFY hooks.
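As a sketch of the snapshot pipe's transform step — the `{graph_id, node, edges[], attrs}` shape comes from above, everything else (row layout, field names) is illustrative:

```python
def bake_graph_snapshot(graph_id, edge_rows):
    """Turn relational edge rows (src, dst, attrs) into one denormalized
    document per source node, ready for a read-optimized store."""
    docs = {}
    for src, dst, attrs in edge_rows:
        doc = docs.setdefault(src, {"graph_id": graph_id, "node": src,
                                    "edges": [], "attrs": {}})
        doc["edges"].append(dst)
        doc["attrs"].update(attrs)
    for doc in docs.values():
        doc["edges"].sort()  # deterministic re-emission from the source of truth
    return sorted(docs.values(), key=lambda d: d["node"])
```

Because the transform is a pure function of Postgres rows, snapshots can be re-emitted deterministically at any time, keeping MongoDB strictly downstream of the system of record.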
+Below is a handoff-ready set of **PostgreSQL tables/views engineering guidelines** intended for developer review. It is written as a **gap-finding checklist** with **concrete DDL patterns** and **performance red flags** (Postgres as system of record, JSONB where useful, derived projections where needed). + +--- + +# PostgreSQL Tables & Views Engineering Guide + +## 0) Non-negotiable principles + +1. **Every hot query must have an index story.** If you cannot name the index that serves it, you have a performance gap. +2. **Write path stays simple.** Prefer **append-only** versioning to large updates (especially for JSONB). +3. **Multi-tenant must be explicit.** Every core table includes `tenant_id` and indexes are tenant-prefixed. +4. **Derived data is a product.** If the UI needs it fast, model it as a **projection table or materialized view**, not as an ad-hoc mega-join. +5. **Idempotency is enforced in the DB.** Unique keys for imports/jobs/results; no “best effort” dedupe in application only. + +--- + +# 1) Table taxonomy and what to look for + +Use this to classify every table; each class has different indexing/retention/locking rules. + +### A. Source-of-truth (SOR) tables + +Examples: `sbom_document`, `vex_document`, `feed_import`, `scan_manifest`, `attestation`. + +* **Expect:** immutable rows, versioning via new row inserts. +* **Gaps:** frequent updates to large JSONB; missing `content_hash`; no unique idempotency key. + +### B. Projection tables (query-optimized) + +Examples: `open_findings`, `artifact_risk_summary`, `component_index`. + +* **Expect:** denormalized, indexed for UI/API; refresh/update strategy defined. +* **Gaps:** projections rebuilt from scratch too often; missing incremental update plan; no retention plan. + +### C. Queue/outbox tables + +Examples: `job_queue`, `outbox_events`. + +* **Expect:** `SKIP LOCKED` claim pattern; retry + DLQ; minimal lock duration. 
+* **Gaps:** holding row locks while doing work; missing partial index for “ready” jobs. + +### D. Audit/event tables + +Examples: `scan_run_event`, `decision_event`, `access_audit`. + +* **Expect:** append-only; partitioned by time; BRIN on timestamps. +* **Gaps:** single huge table without partitioning; slow deletes instead of partition drops. + +--- + +# 2) Naming, keys, and required columns + +## Required columns per class + +### SOR documents (SBOM/VEX/Attestations) + +* `tenant_id uuid` +* `id bigserial` (internal PK) +* `external_id uuid` (optional API-facing id) +* `content_hash bytea` (sha256) **NOT NULL** +* `doc jsonb` **NOT NULL** +* `created_at timestamptz` **NOT NULL default now()** +* `supersedes_id bigint NULL` (version chain) OR `version int` + +**Checklist** + +* [ ] Unique constraint exists: `(tenant_id, content_hash)` +* [ ] Version strategy exists (supersedes/version) and is queryable +* [ ] “Latest” access is index-backed (see §4) + +### Queue + +* `tenant_id uuid` (if multi-tenant) +* `id bigserial` +* `kind text` +* `payload jsonb` +* `run_after timestamptz` +* `attempts int` +* `locked_at timestamptz NULL` +* `locked_by text NULL` +* `status smallint` (optional; e.g., ready/running/done/dead) + +**Checklist** + +* [ ] “Ready to claim” has a partial index (see §4) +* [ ] Claim transaction is short (claim+commit; work outside lock) + +--- + +# 3) JSONB rules that prevent “looks fine → melts in prod” + +## When JSONB is appropriate + +* Storing signed envelopes (DSSE), SBOM/VEX raw docs, vendor payloads. +* Ingest-first scenarios where schema evolves. + +## When JSONB is a performance hazard + +* You frequently query deep keys/arrays (components, vulnerabilities, call paths). +* You need sorting/aggregations on doc fields. + +**Mandatory pattern for hot JSON fields** + +1. Keep the raw JSONB for fidelity. +2. Extract **hot keys** into **stored generated columns** (or real columns), index those. +3. 
Extract **hot arrays** into child tables (components, vulnerabilities). + +Example: + +```sql +CREATE TABLE sbom_document ( + id bigserial PRIMARY KEY, + tenant_id uuid NOT NULL, + artifact_purl text NOT NULL, + content_hash bytea NOT NULL, + doc jsonb NOT NULL, + created_at timestamptz NOT NULL DEFAULT now(), + + -- hot keys as generated columns + bom_format text GENERATED ALWAYS AS ((doc->>'bomFormat')) STORED, + spec_version text GENERATED ALWAYS AS ((doc->>'specVersion')) STORED +); + +CREATE UNIQUE INDEX ux_sbom_doc_hash ON sbom_document(tenant_id, content_hash); +CREATE INDEX ix_sbom_doc_tenant_artifact ON sbom_document(tenant_id, artifact_purl, created_at DESC); +CREATE INDEX ix_sbom_doc_json_gin ON sbom_document USING GIN (doc jsonb_path_ops); +CREATE INDEX ix_sbom_doc_bomformat ON sbom_document(tenant_id, bom_format); +``` + +**Checklist** + +* [ ] Any query using `doc->>` in WHERE has either an expression index or a generated column index +* [ ] Any query using `jsonb_array_elements(...)` in hot path has been replaced by a normalized child table or a projection table + +--- + +# 4) Indexing standards (what devs must justify) + +## Core rules + +1. **Tenant-first**: `INDEX(tenant_id, …)` for anything read per tenant. +2. **Sort support**: if query uses `ORDER BY created_at DESC`, index must end with `created_at DESC`. +3. **Partial indexes** for sparse predicates (status/locked flags). +4. **BRIN** for massive append-only time series. +5. **GIN jsonb_path_ops** for containment (`@>`) on JSONB; avoid GIN for everything. + +## Required index patterns by use case + +### “Latest version per artifact” + +If you store versions as rows: + +```sql +-- supports: WHERE tenant_id=? AND artifact_purl=? 
ORDER BY created_at DESC LIMIT 1 +CREATE INDEX ix_sbom_latest ON sbom_document(tenant_id, artifact_purl, created_at DESC); +``` + +### Ready queue claims + +```sql +CREATE INDEX ix_job_ready +ON job_queue(kind, run_after, id) +WHERE locked_at IS NULL; + +-- Optional: tenant scoped +CREATE INDEX ix_job_ready_tenant +ON job_queue(tenant_id, kind, run_after, id) +WHERE locked_at IS NULL; +``` + +### JSON key lookup (expression index) + +```sql +-- supports: WHERE (doc->>'subject') = ? +CREATE INDEX ix_vex_subject_expr +ON vex_document(tenant_id, (doc->>'subject')); +``` + +### Massive event table time filtering + +```sql +CREATE INDEX brin_scan_events_time +ON scan_run_event USING BRIN (occurred_at); +``` + +**Red flags** + +* GIN index on a JSONB column + frequent updates = bloat and write amplification. +* No partial index for queue readiness → sequential scans under load. +* Composite indexes with wrong leading column order (e.g., `created_at, tenant_id`) → not used. + +--- + +# 5) Partitioning and retention (avoid “infinite tables”) + +Use partitioning for: + +* audit/events +* scan run logs +* large finding histories +* anything > tens of millions rows with time-based access + +## Standard approach + +* Partition by `occurred_at` (monthly) for event/audit tables. +* Retention by dropping partitions (fast and vacuum-free). + +Example: + +```sql +CREATE TABLE scan_run_event ( + tenant_id uuid NOT NULL, + scan_run_id bigint NOT NULL, + occurred_at timestamptz NOT NULL, + event_type text NOT NULL, + payload jsonb NOT NULL +) PARTITION BY RANGE (occurred_at); +``` + +**Checklist** + +* [ ] Partition creation/rollover process exists (migration or scheduler) +* [ ] Retention is “DROP PARTITION”, not “DELETE WHERE occurred_at < …” +* [ ] Each partition has needed local indexes (BRIN/time + tenant filters) + +--- + +# 6) Views vs Materialized Views vs Projection Tables + +## Use a normal VIEW when + +* It’s thin (renaming columns, simple joins) and not used in hot paths. 
+ +## Use a MATERIALIZED VIEW when + +* It accelerates complex joins/aggregations and can be refreshed on a schedule. +* You can tolerate refresh lag. + +**Materialized view requirements** + +* Must have a **unique index** to use `REFRESH … CONCURRENTLY`. +* Refresh must be **outside** an explicit transaction block. + +Example: + +```sql +CREATE MATERIALIZED VIEW mv_artifact_risk AS +SELECT tenant_id, artifact_purl, max(score) AS risk_score +FROM open_findings +GROUP BY tenant_id, artifact_purl; + +CREATE UNIQUE INDEX ux_mv_artifact_risk +ON mv_artifact_risk(tenant_id, artifact_purl); +``` + +## Prefer projection tables over MV when + +* You need **incremental updates** (on import/scan completion). +* You need deterministic “point-in-time” snapshots per manifest. + +**Checklist** + +* [ ] Every MV has refresh cadence + owner (which worker/job triggers it) +* [ ] UI/API queries do not depend on a heavy non-materialized view +* [ ] If “refresh cost” scales with whole dataset, projection table exists instead + +--- + +# 7) Queue and outbox patterns that do not deadlock + +## Claim pattern (short transaction) + +```sql +WITH cte AS ( + SELECT id + FROM job_queue + WHERE kind = $1 + AND run_after <= now() + AND locked_at IS NULL + ORDER BY id + FOR UPDATE SKIP LOCKED + LIMIT 1 +) +UPDATE job_queue j +SET locked_at = now(), + locked_by = $2 +FROM cte +WHERE j.id = cte.id +RETURNING j.*; +``` + +**Rules** + +* Claim + commit quickly. +* Do work outside the lock. +* On completion: update row to done (or delete if you want compactness). +* On failure: increment attempts, set `run_after = now() + backoff`, release lock. 
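The completion and failure branches above can be sketched as follows; the `done_at` column, the attempt cap, and the backoff formula are illustrative assumptions, not the actual schema:

```sql
-- On success: release the claim and mark the job done.
UPDATE job_queue
SET locked_at = NULL, locked_by = NULL, done_at = now()
WHERE id = $1;

-- On failure: bump attempts, schedule a retry with exponential backoff,
-- and release the lock so another worker can claim the row later.
UPDATE job_queue
SET attempts  = attempts + 1,
    run_after = now() + make_interval(secs => 30 * power(2, attempts)),
    locked_at = NULL,
    locked_by = NULL
WHERE id = $1;

-- DLQ condition (attempts > N) stays a plain query:
SELECT * FROM job_queue WHERE attempts > 5 AND done_at IS NULL;
```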


**Checklist**

* [ ] Worker does not keep a transaction open while scanning/importing
* [ ] Backoff policy is encoded (in DB columns) and observable
* [ ] DLQ condition exists (attempts > N) and is queryable

---

# 8) Query performance review checklist (what to require in PRs)

For each new endpoint/query:

* [ ] Provide the query (SQL) and the intended parameters.
* [ ] Provide `EXPLAIN (ANALYZE, BUFFERS)` from a dataset size that resembles staging.
* [ ] Identify the serving index(es).
* [ ] Confirm row estimates are not wildly wrong (if they are: stale statistics or a predicate mismatch).
* [ ] Confirm it is tenant-scoped and uses the tenant-leading index.

**Common fixes**

* Replace `IN (SELECT …)` with `EXISTS` for correlated checks.
* Back `ORDER BY … LIMIT` queries with an index that matches the ordering; otherwise Postgres must scan and sort the candidate rows before applying the limit.
* Avoid exploding joins with JSON arrays; pre-extract.

---

# 9) Vacuum, bloat, and “why is disk growing”

## Design to avoid bloat

* Append-only for large docs and events.
* If frequent updates are needed, isolate hot-updated columns into a smaller table.

Example split:

* `job_queue_payload` (stable)
* `job_queue_state` (locked/status/attempts, updated frequently)

**Checklist**

* [ ] Large frequently-updated JSONB tables have been questioned
* [ ] Updates do not rewrite big TOAST values repeatedly
* [ ] Retention is partition-drop where possible

---

# 10) Migration safety rules (prevent production locks)

* Index creation: `CREATE INDEX CONCURRENTLY`.
* Dropping indexes: `DROP INDEX CONCURRENTLY`.
* New column with a default on a large table (note: on PostgreSQL 11+, adding a column with a constant default is metadata-only and safe; the staged path below is for older versions or volatile defaults):

  1. `ADD COLUMN` nullable
  2. backfill in batches
  3. `ALTER COLUMN SET NOT NULL`
  4. 
add default if needed + +**Checklist** + +* [ ] No long-running `ALTER TABLE` on huge tables without plan +* [ ] Any new NOT NULL constraint is staged safely + +--- + +# 11) Stella Ops-specific schema guidance (SBOM/VEX/Finding) + +## Minimum recommended normalized tables + +Even if you keep raw SBOM/VEX JSON: + +* `sbom_document` (raw, immutable) +* `sbom_component` (extracted components) +* `vex_document` (raw, immutable) +* `vex_statement` (extracted statements per CVE/component) +* `finding` (facts: CVE ↔ component ↔ artifact ↔ scan_run) +* `scan_manifest` (determinism: feed versions/hashes, policy hash) +* `scan_run` (links results to manifest) + +**Key gap detectors** + +* If “find all artifacts affected by CVE X” is slow → missing `finding` indexing. +* If “component search” is slow → missing `sbom_component` and its indexes. +* If “replay this scan” is not exact → missing `scan_manifest` + feed import hashes. + +--- + +# 12) Minimal “definition of done” for a new table/view + +A PR adding a table/view is incomplete unless it includes: + +* [ ] Table classification (SOR / projection / queue / event) +* [ ] Primary key and idempotency unique key +* [ ] Tenant scoping strategy +* [ ] Index plan mapped to known queries +* [ ] Retention plan (especially for event/projection tables) +* [ ] Refresh/update plan if derived +* [ ] Example query + `EXPLAIN` for the top 1–3 access patterns + +--- + +If you want this as a single drop-in repo document, tell me the target path (e.g., `/docs/platform/postgres-table-view-guidelines.md`) and I will format it exactly as a team-facing guideline, including a one-page “Architecture/Performance Gaps” review form that engineers can paste into PR descriptions. 
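
As an illustration of the first gap detector above (“find all artifacts affected by CVE X”), a sketch assuming illustrative `finding` column names:

```sql
-- Hypothetical serving index for CVE → artifact lookups.
CREATE INDEX ix_finding_cve
ON finding(tenant_id, cve_id) INCLUDE (artifact_purl);

-- Tenant-scoped and index-only-scan friendly.
SELECT DISTINCT artifact_purl
FROM finding
WHERE tenant_id = $1
  AND cve_id = $2;
```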
diff --git a/docs/product-advisories/archived/AR-REVIVE-PLAN.md b/docs/product-advisories/archived/AR-REVIVE-PLAN.md deleted file mode 100644 index c7aff3cf5..000000000 --- a/docs/product-advisories/archived/AR-REVIVE-PLAN.md +++ /dev/null @@ -1,12 +0,0 @@ -# Archived Advisories Revival Plan (Stub) - -Use with sprint task 13 (ARCHIVED-GAPS-300-020). - -- Candidate advisories to revive: - - SBOM-Provenance-Spine - - Binary reachability (VB branch) - - Function-level VEX explainability - - PostgreSQL storage blueprint -- Decide canonical schemas/recipes (provenance, reachability, PURL/Build-ID). -- Document determinism seeds/SLOs, redaction/isolation rules, changelog/signing approach. -- Mark supersedes/duplicates and PostgreSQL storage blueprint guardrails.