Merge branch 'main' of https://git.stella-ops.org/stella-ops.org/git.stella-ops.org
@@ -0,0 +1,750 @@
Here’s a simple, practical way to score vulnerabilities that’s more auditable than plain CVSS: build a **deterministic score** from three reproducible inputs—**Reachability**, **Evidence**, and **Provenance**—so every number is explainable and replayable.
|
||||
|
||||
---
|
||||
|
||||
### Why move beyond CVSS?
|
||||
|
||||
* **CVSS is context-light**: it rarely knows *your* call paths, configs, or runtime.
|
||||
* **Audits need proof**: regulators and customers increasingly ask, “show me how you got this score.”
|
||||
* **Teams need consistency**: the same image should get the same score across environments when inputs are identical.
|
||||
|
||||
---
|
||||
|
||||
### The scoring idea (plain English)
|
||||
|
||||
Score = a weighted function of:
|
||||
|
||||
1. **Reachability Depth (R)** — how close the vulnerable function is to a real entry point in *your* app (e.g., public HTTP route → handler → library call).
|
||||
2. **Evidence Density (E)** — how much concrete proof you have (stack traces, symbol hits, config toggles, feature flags, SCA vs. SAST vs. DAST vs. runtime).
|
||||
3. **Provenance Integrity (P)** — how trustworthy the artifact chain is (signed SBOM, DSSE attestations, SLSA/Rekor entries, reproducible build match).
|
||||
|
||||
A compact, auditable formula you can start with:
|
||||
|
||||
```
NormalizedScore = W_R * f(R) + W_E * g(E) + W_P * h(P)
```
|
||||
|
||||
* Pick monotonic, bounded transforms (e.g., map to 0..1):
|
||||
|
||||
* f(R): inverse of hops (shorter path ⇒ higher value)
|
||||
* g(E): weighted sum of evidence types (runtime>DAST>SAST>SCA, with decay for stale data)
|
||||
* h(P): cryptographic/provenance checks (unsigned < signed < signed+attested < signed+attested+reproducible)
|
||||
|
||||
Keep **W_R + W_E + W_P = 1** (e.g., 0.5, 0.35, 0.15 for reachability-first triage).
|
||||
|
||||
---
|
||||
|
||||
### What makes this “deterministic”?
|
||||
|
||||
* Inputs are **machine-replayable**: call-graph JSON, evidence bundle (hashes + timestamps), provenance attestations.
|
||||
* The score is **purely a function of those inputs**, so anyone can recompute it later and match your result byte-for-byte.
|
||||
|
||||
---
|
||||
|
||||
### Minimal rubric (ready to implement)
|
||||
|
||||
* **Reachability (R, 0..1)**
|
||||
|
||||
* 1.00 = vulnerable symbol called on a hot path from a public route (≤3 hops)
|
||||
* 0.66 = reachable but behind uncommon feature flag or deep path (4–7 hops)
|
||||
* 0.33 = only theoretically reachable (code present, no discovered path)
|
||||
* 0.00 = dead/unreferenced code in this build
|
||||
* **Evidence (E, 0..1)** (sum, capped at 1.0)
|
||||
|
||||
* +0.6 runtime trace hitting the symbol
|
||||
* +0.3 DAST/integ test activating vulnerable behavior
|
||||
* +0.2 SAST precise sink match
|
||||
* +0.1 SCA presence only (no call evidence)
|
||||
* (Apply 10–30% decay if older than N days)
|
||||
* **Provenance (P, 0..1)**
|
||||
|
||||
* 0.0 unsigned/unknown origin
|
||||
* 0.3 signed image only
|
||||
* 0.6 signed + SBOM (hash-linked)
|
||||
* 1.0 signed + SBOM + DSSE attestations + reproducible build match
|
||||
|
||||
Example weights: `W_R=0.5, W_E=0.35, W_P=0.15`.
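
To make the arithmetic concrete, a hypothetical finding with a runtime-confirmed hit three hops from a public route might score f(R)=1.00, g(E)=0.70 (runtime +0.6, SCA +0.1), h(P)=0.60 (signed + hash-linked SBOM), giving `NormalizedScore = 0.5*1.00 + 0.35*0.70 + 0.15*0.60 = 0.835`.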
|
||||
|
||||
---
|
||||
|
||||
### How this plugs into **Stella Ops**
|
||||
|
||||
* **Scanner** produces call-graphs & symbol maps (R).
|
||||
* **Vexer**/Evidence store aggregates SCA/SAST/DAST/runtime proofs with timestamps (E).
|
||||
* **Authority/Proof‑Graph** verifies signatures, SBOM↔image hash links, DSSE/Rekor (P).
|
||||
* **Policy Engine** applies the scoring formula (YAML policy) and emits a signed VEX note with the score + input hashes.
|
||||
* **Replay**: any audit can re-run the same policy with the same inputs and get the same score.
|
||||
|
||||
---
|
||||
|
||||
### Developer checklist (do this first)
|
||||
|
||||
* Emit a **Reachability JSON** per build: entrypoints, hops, functions, edges, timestamps, hashes.
|
||||
* Normalize **Evidence Types** with IDs, confidence, freshness, and content hashes.
|
||||
* Record **Provenance Facts** (signing certs, SBOM digest, DSSE bundle, reproducible-build fingerprint).
|
||||
* Implement the **score function as a pure library** (no I/O), version it (e.g., `score.v1`), and include the version + inputs’ hashes in every VEX note.
|
||||
* Add a **30‑sec “Time‑to‑Evidence” UI**: click a score → see the exact call path, evidence list, and provenance checks.
|
||||
|
||||
---
|
||||
|
||||
### Why this helps compliance & sales
|
||||
|
||||
* Every number is **auditable** (inputs + function are transparent).
|
||||
* Scores remain **consistent across air‑gapped sites** (deterministic, no hidden heuristics).
|
||||
* You can **prove reduction** after a fix (paths disappear, evidence decays, provenance improves).
|
||||
|
||||
If you want, I can draft the YAML policy schema and a tiny .NET 10 library stub for `score.v1` so you can drop it into Stella Ops today.
|
||||
Below is an extended, **developer-ready implementation plan** to build the deterministic vulnerability score into **Stella Ops** (Scanner → Evidence/Vexer → Authority/Proof‑Graph → Policy Engine → UI/VEX output). I’m assuming a .NET-centric stack (since you mentioned .NET 10 earlier), but everything is laid out so the scoring core stays language-agnostic.
|
||||
|
||||
---
|
||||
|
||||
## 1) Extend the scoring model into a stable, “auditable primitive”
|
||||
|
||||
### 1.1 Outputs you should standardize on
|
||||
|
||||
Produce **two** signed artifacts per finding (plus optional UI views):
|
||||
|
||||
1. **ScoreResult** (primary):
|
||||
|
||||
* `riskScore` (0–100 integer)
|
||||
* `subscores` (each 0–100 integer): `baseSeverity`, `reachability`, `evidence`, `provenance`
|
||||
* `explain[]` (structured reasons, ordered deterministically)
|
||||
* `inputs` (digests of all upstream inputs)
|
||||
* `policy` (policy version + digest)
|
||||
* `engine` (engine version + digest)
|
||||
* `asOf` timestamp (the only “time” allowed to affect the result)
|
||||
|
||||
2. **VEX note** (OpenVEX/CSAF-compatible wrapper):
|
||||
|
||||
* references ScoreResult digest
|
||||
* embeds the score (optional) + the input digests
|
||||
* signed by Stella Ops Authority
|
||||
|
||||
> Key audit requirement: anyone can recompute the score **offline** from the input bundle + policy + engine version.
|
||||
|
||||
---
|
||||
|
||||
## 2) Make determinism non-negotiable
|
||||
|
||||
### 2.1 Determinism rules (implement as “engineering constraints”)
|
||||
|
||||
These rules guard against the most common ways deterministic systems become non-deterministic:
|
||||
|
||||
* **No floating point** in scoring math. Use integer “basis points” and integer bucket tables.
|
||||
* **No implicit time**. Scoring takes `asOf` as an explicit input. Evidence “freshness” is computed as `asOf - evidence.timestamp`.
|
||||
* **Canonical serialization** for hashing:
|
||||
|
||||
* Use RFC-style canonical JSON (e.g., JCS) or a strict canonical CBOR profile.
|
||||
* Sort keys and arrays deterministically.
|
||||
* **Stable ordering** for explanation lists:
|
||||
|
||||
* Always sort factors by `(factorId, contributingObjectDigest)`.
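
As a rough illustration of the canonical-hashing rule, here is a simplified .NET sketch — not a full RFC 8785/JCS implementation (it only sorts object keys ordinally and serializes compactly before hashing); the helper name and shape are placeholders:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

public static class CanonicalHash
{
    // Rebuild the tree with object keys sorted ordinally so logically equal
    // documents always serialize to identical bytes.
    private static JsonNode? Canonicalize(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(
            obj.OrderBy(p => p.Key, StringComparer.Ordinal)
               .Select(p => KeyValuePair.Create(p.Key, Canonicalize(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(Canonicalize).ToArray()),
        JsonValue val => JsonNode.Parse(val.ToJsonString()), // fresh copy of the scalar
        _ => null
    };

    // SHA-256 over the compact canonical serialization, as lowercase hex.
    public static string DigestHex(string json)
    {
        string canonical = Canonicalize(JsonNode.Parse(json))!.ToJsonString();
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```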
|
||||
|
||||
### 2.2 Fixed-point scoring approach (recommended)
|
||||
|
||||
Represent weights and multipliers as **basis points** (bps):
|
||||
|
||||
* 100% = 10,000 bps
|
||||
* 1% = 100 bps
|
||||
|
||||
Example: `totalScore = (wB*B + wR*R + wE*E + wP*P) / 10000`
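
For example, with `wR=4500` bps and `R=80`, the reachability term contributes `4500*80 = 360,000`, and the single division by 10,000 at the end turns that into 36 of the final 0–100 points — no floats anywhere.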
|
||||
|
||||
---
|
||||
|
||||
## 3) Extended score definition (v1)
|
||||
|
||||
### 3.1 Subscores (0–100 integers)
|
||||
|
||||
#### BaseSeverity (B)
|
||||
|
||||
* Source: CVSS if present, else vendor severity, else default.
|
||||
* Normalize to 0–100:
|
||||
|
||||
* CVSS 0.0–10.0 → 0–100 by `B = round(CVSS * 10)`
|
||||
|
||||
Keep its weight small so you’re “beyond CVSS” but still anchored.
|
||||
|
||||
#### Reachability (R)
|
||||
|
||||
Computed from reachability report (call-path depth + gating conditions).
|
||||
|
||||
**Hop buckets** (example):
|
||||
|
||||
* 0–2 hops: 100
|
||||
* 3 hops: 85
|
||||
* 4 hops: 70
|
||||
* 5 hops: 55
|
||||
* 6 hops: 45
|
||||
* 7 hops: 35
|
||||
* 8+ hops: 20
|
||||
* unreachable: 0
|
||||
|
||||
**Gate multipliers** (apply multiplicatively in bps):
|
||||
|
||||
* behind feature flag: ×7000
|
||||
* auth required: ×8000
|
||||
* only admin role: ×8500
|
||||
* non-default config: ×7500
|
||||
|
||||
Final: `R = bucketScore * gateMultiplier / 10000`
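
A minimal sketch of that computation, using the example buckets and multipliers above (type and method names are hypothetical, not existing Stella Ops APIs):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReachabilityScore
{
    // Example hop buckets from the table above: (maxHops, score), evaluated in order.
    private static readonly (int MaxHops, int Score)[] HopBuckets =
    {
        (2, 100), (3, 85), (4, 70), (5, 55), (6, 45), (7, 35), (int.MaxValue, 20)
    };

    // Example gate multipliers in basis points (10,000 = no reduction).
    private static readonly Dictionary<string, int> GateMultipliersBps = new()
    {
        ["featureFlag"] = 7000,
        ["authRequired"] = 8000,
        ["adminOnly"] = 8500,
        ["nonDefaultConfig"] = 7500,
    };

    // hops = null means no path was found (unreachable).
    public static int Compute(int? hops, IEnumerable<string> gates)
    {
        if (hops is null) return 0;

        int score = HopBuckets.First(b => hops.Value <= b.MaxHops).Score;

        // Apply each detected gate multiplicatively, staying in integer math.
        foreach (var gate in gates)
            if (GateMultipliersBps.TryGetValue(gate, out var bps))
                score = score * bps / 10000;

        return score;
    }
}

// Example: 3 hops behind a feature flag -> 85 * 7000 / 10000 = 59.
```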
|
||||
|
||||
#### Evidence (E)
|
||||
|
||||
Sum evidence “points” capped at 100, then apply freshness multiplier.
|
||||
|
||||
Evidence points (example):
|
||||
|
||||
* runtime trace hitting vulnerable symbol: +60
|
||||
* DAST / integration test triggers behavior: +30
|
||||
* SAST precise sink match: +20
|
||||
* SCA presence only: +10
|
||||
|
||||
Freshness bucket multiplier (example):
|
||||
|
||||
* age ≤ 7 days: ×10000
|
||||
* ≤ 30 days: ×9000
|
||||
* ≤ 90 days: ×7500
|
||||
* ≤ 180 days: ×6000
|
||||
* ≤ 365 days: ×4000
|
||||
* > 365: ×2000
|
||||
|
||||
Final: `E = min(100, sum(points)) * freshness / 10000`
|
||||
|
||||
#### Provenance (P)
|
||||
|
||||
Based on verified supply-chain checks.
|
||||
|
||||
Levels:
|
||||
|
||||
* unsigned/unknown: 0
|
||||
* signed image: 30
|
||||
* signed + SBOM hash-linked to image: 60
|
||||
* signed + SBOM + DSSE attestations verified: 80
|
||||
* above + reproducible build match: 100
|
||||
|
||||
### 3.2 Total score and overrides
|
||||
|
||||
Weights (example):
|
||||
|
||||
* `wB=1000` (10%)
|
||||
* `wR=4500` (45%)
|
||||
* `wE=3000` (30%)
|
||||
* `wP=1500` (15%)
|
||||
|
||||
Total:
|
||||
|
||||
* `riskScore = (wB*B + wR*R + wE*E + wP*P) / 10000`
|
||||
|
||||
Override examples (still deterministic, because they depend on evidence flags):
|
||||
|
||||
* If `knownExploited=true` AND `R >= 70` → force score to 95+
|
||||
* If unreachable (`R=0`) AND only SCA evidence (`E<=10`) → clamp score ≤ 25
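
Sketching the total plus those two overrides in integer-only C# (names are illustrative; the real engine would read weights and override rules from the `score.v1` policy rather than hard-coding them):

```csharp
using System;

// Subscores are 0–100 integers; weights are basis points summing to 10,000.
public readonly record struct Subscores(int Base, int Reachability, int Evidence, int Provenance);

public static class RiskScore
{
    private const int WB = 1000, WR = 4500, WE = 3000, WP = 1500; // example weights

    public static int Compute(Subscores s, bool knownExploited)
    {
        int score = (WB * s.Base + WR * s.Reachability + WE * s.Evidence + WP * s.Provenance) / 10000;

        // Override 1: actively exploited and clearly reachable -> floor at 95.
        if (knownExploited && s.Reachability >= 70)
            score = Math.Max(score, 95);

        // Override 2: unreachable with only SCA-level evidence -> cap at 25.
        if (s.Reachability == 0 && s.Evidence <= 10)
            score = Math.Min(score, 25);

        return score;
    }
}
```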
|
||||
|
||||
---
|
||||
|
||||
## 4) Canonical schemas (what to build first)
|
||||
|
||||
### 4.1 ReachabilityReport (per artifact + vuln)
|
||||
|
||||
Minimum fields:
|
||||
|
||||
* `artifactDigest` (sha256 of image or build artifact)
|
||||
* `graphDigest` (sha256 of canonical call-graph representation)
|
||||
* `vulnId` (CVE/OSV/etc)
|
||||
* `vulnerableSymbol` (fully-qualified)
|
||||
* `entrypoints[]` (HTTP routes, queue consumers, CLI commands, cron handlers)
|
||||
* `shortestPath`:
|
||||
|
||||
* `hops` (int)
|
||||
* `nodes[]` (ordered list of symbols)
|
||||
* `edges[]` (optional)
|
||||
* `gates[]`:
|
||||
|
||||
* `type` (“featureFlag” | “authRequired” | “configNonDefault” | …)
|
||||
* `detail` (string)
|
||||
* `computedAt` (timestamp)
|
||||
* `toolVersion`
|
||||
|
||||
### 4.2 EvidenceBundle (per artifact + vuln)
|
||||
|
||||
Evidence items are immutable and deduped by content hash.
|
||||
|
||||
* `evidenceId` (content hash)
|
||||
* `artifactDigest`
|
||||
* `vulnId`
|
||||
* `type` (“SCA” | “SAST” | “DAST” | “RUNTIME” | “ADVISORY”)
|
||||
* `tool` (name/version)
|
||||
* `timestamp`
|
||||
* `confidence` (0–100)
|
||||
* `subject` (package, symbol, endpoint)
|
||||
* `payloadDigest` (hash of raw payload stored separately)
|
||||
|
||||
### 4.3 ProvenanceReport (per artifact)
|
||||
|
||||
* `artifactDigest`
|
||||
* `signatureChecks[]` (who signed, what key, result)
|
||||
* `sbomDigest` + `sbomType`
|
||||
* `attestations[]` (DSSE digests + verification result)
|
||||
* `transparencyLogRefs[]` (optional)
|
||||
* `reproducibleMatch` (bool)
|
||||
* `computedAt`
|
||||
* `toolVersion`
|
||||
* `verificationLogDigest`
|
||||
|
||||
### 4.4 ScoreInput + ScoreResult
|
||||
|
||||
**ScoreInput** should include:
|
||||
|
||||
* `asOf`
|
||||
* `policyVersion`
|
||||
* digests for reachability/evidence/provenance/base severity source
|
||||
|
||||
**ScoreResult** should include:
|
||||
|
||||
* `riskScore`, `subscores`
|
||||
* `explain[]` (deterministic)
|
||||
* `engineVersion`, `policyDigest`
|
||||
* `inputs[]` (digests)
|
||||
* `resultDigest` (hash of canonical ScoreResult)
|
||||
* `signature` (Authority signs the digest)
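
If it helps to see the shapes as code, here is one possible C# rendering of those two records — field names follow the bullets above, but the exact schema should come out of the Phase A work below:

```csharp
using System;
using System.Collections.Generic;

// Digests are lowercase hex SHA-256 strings over canonical serializations.
public sealed record ScoreInput(
    DateTimeOffset AsOf,
    string PolicyVersion,
    string ReachabilityDigest,
    string EvidenceDigest,
    string ProvenanceDigest,
    string BaseSeveritySourceDigest);

public sealed record ScoreResult(
    int RiskScore,
    IReadOnlyDictionary<string, int> Subscores,   // baseSeverity, reachability, evidence, provenance
    IReadOnlyList<string> Explain,                // deterministically ordered reasons
    string EngineVersion,
    string PolicyDigest,
    IReadOnlyList<string> Inputs,                 // input digests
    string ResultDigest,                          // hash of the canonical ScoreResult
    string Signature);                            // Authority signs ResultDigest
```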
|
||||
|
||||
---
|
||||
|
||||
## 5) Development implementation plan (phased, with deliverables + acceptance criteria)
|
||||
|
||||
### Phase A — Foundations: schemas, hashing, policy format, test harness
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Canonical JSON format rules + hashing utilities (shared lib)
|
||||
* JSON Schemas for: ReachabilityReport, EvidenceItem, ProvenanceReport, ScoreInput, ScoreResult
|
||||
* “Golden fixture” repo: a set of input bundles and expected ScoreResults
|
||||
* Policy format `score.v1` (YAML or JSON) using **integer bps**
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Same input bundle → identical `resultDigest` across:
|
||||
|
||||
* OS (Linux/Windows)
|
||||
* CPU (x64/ARM64)
|
||||
* runtime versions (supported .NET versions)
|
||||
* Fixtures run in CI and fail on any byte-level diff
|
||||
|
||||
---
|
||||
|
||||
### Phase B — Scoring engine (pure function library)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* `Stella.ScoreEngine` as a pure library:
|
||||
|
||||
* `ComputeScore(ScoreInputBundle) -> ScoreResult`
|
||||
* `Explain(ScoreResult) -> structured explanation` (already embedded)
|
||||
* Policy parser + validator:
|
||||
|
||||
* weights sum to 10,000
|
||||
* bucket tables monotonic
|
||||
  * override rules deterministic and totally ordered
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* 100% deterministic tests passing (golden fixtures)
|
||||
* “Explain” always includes:
|
||||
|
||||
* subscores
|
||||
* applied buckets
|
||||
* applied gate multipliers
|
||||
* freshness bucket selected
|
||||
* provenance level selected
|
||||
* No non-deterministic dependencies (time, random, locale, float)
|
||||
|
||||
---
|
||||
|
||||
### Phase C — Evidence pipeline (Vexer / Evidence Store)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Normalized evidence ingestion adapters:
|
||||
|
||||
* SCA ingest (from your existing scanner output)
|
||||
* SAST ingest
|
||||
* DAST ingest
|
||||
* runtime trace ingest (optional MVP → “symbol hit” events)
|
||||
* Evidence Store service:
|
||||
|
||||
* immutability (append-only)
|
||||
* dedupe by `evidenceId`
|
||||
* query by `(artifactDigest, vulnId)`
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Ingesting the same evidence twice yields identical state (idempotent)
|
||||
* Every evidence record can be exported as a bundle with content hashes
|
||||
* Evidence timestamps preserved; `asOf` drives freshness deterministically
|
||||
|
||||
---
|
||||
|
||||
### Phase D — Reachability analyzer (Scanner extension)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Call-graph builder and symbol resolver:
|
||||
|
||||
* for .NET: IL-level call graph + ASP.NET route discovery
|
||||
* Reachability computation:
|
||||
|
||||
* compute shortest path hops from entrypoints to vulnerable symbol
|
||||
* attach gating detections (config/feature/auth heuristics)
|
||||
* Reachability report emitter:
|
||||
|
||||
* emits ReachabilityReport with stable digests
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Given the same build artifact, reachability report digest is stable
|
||||
* Paths are replayable and visualizable (nodes are resolvable)
|
||||
* Unreachable findings are explicitly marked and explainable
|
||||
|
||||
---
|
||||
|
||||
### Phase E — Provenance verification (Authority / Proof‑Graph)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Verification pipeline:
|
||||
|
||||
* signature verification for artifact digest
|
||||
* SBOM hash linking
|
||||
* attestation verification (DSSE/in‑toto style)
|
||||
* optional transparency log reference capture
|
||||
* optional reproducible-build comparison input
|
||||
* ProvenanceReport emitter (signed verification log digest)
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Verification is offline-capable if given the necessary bundles
|
||||
* Any failed check is captured with a deterministic error code + message
|
||||
* ProvenanceReport digest is stable for same inputs
|
||||
|
||||
---
|
||||
|
||||
### Phase F — Orchestration: “score a finding” workflow + VEX output
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Orchestrator service (or existing pipeline step) that:
|
||||
|
||||
1. receives a vulnerability finding
|
||||
2. fetches reachability/evidence/provenance bundles
|
||||
3. builds ScoreInput with `asOf`
|
||||
4. computes ScoreResult
|
||||
5. signs ScoreResult digest
|
||||
6. emits VEX note referencing ScoreResult digest
|
||||
* Storage for ScoreResult + VEX note (immutable, versioned)
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* “Recompute” produces same ScoreResult digest if inputs unchanged
|
||||
* VEX note includes:
|
||||
|
||||
* policy version + digest
|
||||
* engine version
|
||||
* input digests
|
||||
* score + subscores
|
||||
* End-to-end API returns “why” data in at most one round trip (served from cache)
|
||||
|
||||
---
|
||||
|
||||
### Phase G — UI: “Why this score?” and replay/export
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Findings view enhancements:
|
||||
|
||||
* score badge + risk bucket (Low/Med/High/Critical)
|
||||
* click-through “Why this score”
|
||||
* “Why this score” panel:
|
||||
|
||||
* call path visualization (at least as an ordered list for MVP)
|
||||
* evidence list with freshness + confidence
|
||||
* provenance checks list (pass/fail)
|
||||
* export bundle (inputs + policy + engine version) for audit replay
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Any score is explainable in <30 seconds by a human reviewer
|
||||
* Exported bundle can reproduce score offline
|
||||
|
||||
---
|
||||
|
||||
### Phase H — Governance: policy-as-code, versioning, calibration, rollout
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Policy registry:
|
||||
|
||||
* store `score.v1` policies by org/project/environment
|
||||
* approvals + change log
|
||||
* Versioning strategy:
|
||||
|
||||
* engine semantic versioning
|
||||
* policy digest pinned in ScoreResult
|
||||
* migration tooling (e.g., score.v1 → score.v2)
|
||||
* Rollout mechanics:
|
||||
|
||||
* shadow mode: compute score but don’t enforce
|
||||
* enforcement gates: block deploy if score ≥ threshold
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Policy changes never rewrite past scores
|
||||
* You can backfill new scores with a new policy version without ambiguity
|
||||
* Audit log shows: who changed policy, when, why (optional but recommended)
|
||||
|
||||
---
|
||||
|
||||
## 6) Engineering backlog (epics → stories → DoD)
|
||||
|
||||
### Epic 1: Deterministic core
|
||||
|
||||
* Story: implement canonical JSON + hashing
|
||||
* Story: implement fixed-point math helpers (bps)
|
||||
* Story: implement score.v1 buckets + overrides
|
||||
* DoD:
|
||||
|
||||
* no floats
|
||||
* golden test suite
|
||||
* deterministic explain ordering
|
||||
|
||||
### Epic 2: Evidence normalization
|
||||
|
||||
* Story: evidence schema + dedupe
|
||||
* Story: adapters (SCA/SAST/DAST/runtime)
|
||||
* Story: evidence query API
|
||||
* DoD:
|
||||
|
||||
* idempotent ingest
|
||||
* bundle export with digests
|
||||
|
||||
### Epic 3: Reachability
|
||||
|
||||
* Story: entrypoint discovery for target frameworks
|
||||
* Story: call graph extraction
|
||||
* Story: shortest-path computation
|
||||
* Story: gating heuristics
|
||||
* DoD:
|
||||
|
||||
* stable digests
|
||||
* replayable paths
|
||||
|
||||
### Epic 4: Provenance
|
||||
|
||||
* Story: verify signatures
|
||||
* Story: verify SBOM link
|
||||
* Story: verify attestations
|
||||
* Story: reproducible match input support
|
||||
* DoD:
|
||||
|
||||
* deterministic error codes
|
||||
* stable provenance scoring
|
||||
|
||||
### Epic 5: End-to-end score + VEX
|
||||
|
||||
* Story: orchestration
|
||||
* Story: ScoreResult signing
|
||||
* Story: VEX generation and storage
|
||||
* DoD:
|
||||
|
||||
* recompute parity
|
||||
* verifiable signatures
|
||||
|
||||
### Epic 6: UI
|
||||
|
||||
* Story: score badge + buckets
|
||||
* Story: why panel
|
||||
* Story: export bundle + recompute button
|
||||
* DoD:
|
||||
|
||||
* human explainability
|
||||
* offline replay works
|
||||
|
||||
---
|
||||
|
||||
## 7) APIs to implement (minimal but complete)
|
||||
|
||||
### 7.1 Compute score (internal)
|
||||
|
||||
* `POST /api/score/compute`
|
||||
|
||||
* input: `ScoreInput` + references or inline bundles
|
||||
* output: `ScoreResult`
|
||||
|
||||
### 7.2 Get score (product)
|
||||
|
||||
* `GET /api/findings/{findingId}/score`
|
||||
|
||||
* returns latest ScoreResult + VEX reference
|
||||
|
||||
### 7.3 Explain score
|
||||
|
||||
* `GET /api/findings/{findingId}/score/explain`
|
||||
|
||||
* returns `explain[]` + call path + evidence list + provenance checks
|
||||
|
||||
### 7.4 Export replay bundle
|
||||
|
||||
* `GET /api/findings/{findingId}/score/bundle`
|
||||
|
||||
* returns a tar/zip containing:
|
||||
|
||||
* ScoreInput
|
||||
* policy file
|
||||
* reachability/evidence/provenance reports
|
||||
* engine version manifest
|
||||
|
||||
---
|
||||
|
||||
## 8) Testing strategy (what to automate early)
|
||||
|
||||
### Unit tests
|
||||
|
||||
* bucket selection correctness
|
||||
* gate multiplier composition
|
||||
* evidence freshness bucketing
|
||||
* provenance level mapping
|
||||
* override rule ordering
|
||||
|
||||
### Golden fixtures
|
||||
|
||||
* fixed input bundles → fixed ScoreResult digest
|
||||
* run on every supported platform/runtime
|
||||
|
||||
### Property-based tests
|
||||
|
||||
* monotonicity:
|
||||
|
||||
* fewer hops should not reduce R
|
||||
* more evidence points should not reduce E
|
||||
* stronger provenance should not reduce P
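
Even without a property-testing library, a hand-rolled exhaustive check over a bounded domain covers the hop case; for example (assuming a bucket-lookup function such as the earlier `ReachabilityScore` sketch):

```csharp
using System;
using System.Linq;

public static class MonotonicityChecks
{
    // Verify "fewer hops never lowers R" over a bounded domain.
    // bucketScore stands in for the engine's real hop-bucket lookup.
    public static void AssertHopMonotonic(Func<int, int> bucketScore, int maxHops = 64)
    {
        foreach (var hops in Enumerable.Range(0, maxHops))
        {
            if (bucketScore(hops) < bucketScore(hops + 1))
                throw new Exception($"R increased when hops grew from {hops} to {hops + 1}");
        }
    }
}

// e.g. MonotonicityChecks.AssertHopMonotonic(h => ReachabilityScore.Compute(h, Array.Empty<string>()));
```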
|
||||
|
||||
### Integration tests
|
||||
|
||||
* full pipeline: finding → bundles → score → VEX
|
||||
* “recompute” parity tests
|
||||
|
||||
---
|
||||
|
||||
## 9) Operational concerns and hardening
|
||||
|
||||
### Performance
|
||||
|
||||
* Cache reachability per `(artifactDigest, vulnId, symbol)`
|
||||
* Cache provenance per `artifactDigest`
|
||||
* Evidence queries should be indexed by `(artifactDigest, vulnId, type)`
|
||||
|
||||
### Security
|
||||
|
||||
* Treat evidence ingestion as untrusted input:
|
||||
|
||||
* strict schema validation
|
||||
* content-hash dedupe prevents tampering via overwrite
|
||||
* Sign ScoreResults and VEX notes
|
||||
* RBAC:
|
||||
|
||||
* who can change policy
|
||||
* who can override scores (if allowed at all)
|
||||
|
||||
### Data retention
|
||||
|
||||
* Evidence payloads can be large; keep digests + store raw payloads in object storage
|
||||
* Keep a “minimal replay bundle” always (schemas + digests + policy + engine)
|
||||
|
||||
---
|
||||
|
||||
## 10) Concrete “MVP first” slice (smallest valuable product)
|
||||
|
||||
If you want a crisp MVP that still satisfies “auditable determinism”:
|
||||
|
||||
1. Scoring engine (`B + R + E + P`), fixed-point, golden tests
|
||||
2. Evidence store (SCA + runtime optional)
|
||||
3. Reachability: only hop depth from HTTP routes to symbol (no fancy gates)
|
||||
4. Provenance: signed image + SBOM link only
|
||||
5. UI: score + “why” panel showing:
|
||||
|
||||
* hops/path list
|
||||
* evidence list
|
||||
* provenance checklist
|
||||
6. Emit a signed VEX note containing the score + input digests
|
||||
|
||||
That MVP already proves the core differentiator: **deterministic, replayable risk scoring**.
|
||||
|
||||
---
|
||||
|
||||
## 11) Starter policy file (score.v1) using basis points
|
||||
|
||||
Here’s a good “real implementation” starting point (int-only):
|
||||
|
||||
```yaml
policyVersion: score.v1
weightsBps:
  baseSeverity: 1000
  reachability: 4500
  evidence: 3000
  provenance: 1500

reachability:
  hopBuckets:
    - { maxHops: 2, score: 100 }
    - { maxHops: 3, score: 85 }
    - { maxHops: 4, score: 70 }
    - { maxHops: 5, score: 55 }
    - { maxHops: 6, score: 45 }
    - { maxHops: 7, score: 35 }
    - { maxHops: 9999, score: 20 }
  unreachableScore: 0
  gateMultipliersBps:
    featureFlag: 7000
    authRequired: 8000
    adminOnly: 8500
    nonDefaultConfig: 7500

evidence:
  points:
    runtime: 60
    dast: 30
    sast: 20
    sca: 10
  freshnessBuckets:
    - { maxAgeDays: 7, multiplierBps: 10000 }
    - { maxAgeDays: 30, multiplierBps: 9000 }
    - { maxAgeDays: 90, multiplierBps: 7500 }
    - { maxAgeDays: 180, multiplierBps: 6000 }
    - { maxAgeDays: 365, multiplierBps: 4000 }
    - { maxAgeDays: 99999, multiplierBps: 2000 }

provenance:
  levels:
    unsigned: 0
    signed: 30
    signedWithSbom: 60
    signedWithSbomAndAttestations: 80
    reproducible: 100

overrides:
  - name: knownExploitedAndReachable
    when:
      flags:
        knownExploited: true
      minReachability: 70
    setScore: 95

  - name: unreachableAndOnlySca
    when:
      maxReachability: 0
      maxEvidence: 10
    clampMaxScore: 25
```
|
||||
|
||||
---
|
||||
|
||||
If you want, I can also include a **repo layout + CI “golden fixture” test runner** (dotnet test + cross-platform determinism checks) and a **.NET 10 ScoreEngine skeleton** that enforces: no floats, canonical JSON hashing, and stable explanation ordering.
|
||||
@@ -0,0 +1,744 @@
Here’s a simple, high‑leverage UX metric to add to your pipeline run view that will immediately make DevOps feel faster and calmer:
|
||||
|
||||
# Time‑to‑First‑Signal (TTFS)
|
||||
|
||||
**What it is:** the time from opening a run’s details page until the UI renders the **first actionable insight** (e.g., “Stage `build` failed – `dotnet restore` 401 – token expired”).
|
||||
**Why it matters:** engineers don’t need *all* data instantly—just the first trustworthy clue to start acting. Lower TTFS = quicker triage, lower stress, tighter MTTR.
|
||||
|
||||
---
|
||||
|
||||
## What counts as a “first signal”
|
||||
|
||||
* Failed stage + reason (exit code, key log line, failing test name)
|
||||
* Degraded but actionable status (e.g., flaky test signature)
|
||||
* Policy gate block with the specific rule that failed
|
||||
* Reachability‑aware security finding that blocks deploy (one concrete example, not the whole list)
|
||||
|
||||
> Not a signal: spinners, generic “loading…”, or unactionable counts.
|
||||
|
||||
---
|
||||
|
||||
## How to optimize TTFS (practical steps)
|
||||
|
||||
1. **Deferred loading (prioritize critical panes):**
|
||||
|
||||
* Render header + failing stage card first; lazy‑load artifacts, full logs, and graphs after.
|
||||
* Pre‑expand the *first failing node* in the stage graph.
|
||||
|
||||
2. **Log pre‑indexing at ingest:**
|
||||
|
||||
* During CI, stream logs into chunks keyed by `[jobId, phase, severity, firstErrorLine]`.
|
||||
* Extract the **first error tuple** (timestamp, step, message) and store it next to the job record.
|
||||
* On UI open, fetch only that tuple (sub‑100 ms) before fetching the rest.
|
||||
|
||||
3. **Cached summaries:**
|
||||
|
||||
* Persist a tiny JSON “run.summary.v1” (status, first failing stage, first error line, blocking policies) in Redis/Postgres.
|
||||
* Invalidate on new job events; always serve this summary first.
|
||||
|
||||
4. **Edge prefetch:**
|
||||
|
||||
* When the runs table is visible, prefetch summaries for rows in viewport so details pages open “warm”.
|
||||
|
||||
5. **Compress + cap first log burst:**
|
||||
|
||||
* Send the first **5–10 error lines** (already extracted) immediately; stream the rest.
|
||||
|
||||
---
|
||||
|
||||
## Instrumentation (so you can prove it)
|
||||
|
||||
Emit these points as telemetry:
|
||||
|
||||
* `ttfs_start`: when the run details route is entered (or when tab becomes visible)
|
||||
* `ttfs_signal_rendered`: when the first actionable card is in the DOM
|
||||
* `ttfs_ms = signal_rendered - start`
|
||||
* Dimensions: `pipeline_provider`, `repo`, `branch`, `run_type` (PR/main), `device`, `release`, `network_state`
|
||||
|
||||
**SLO:** *P50 ≤ 700 ms, P95 ≤ 2.5 s* (adjust to your infra).
|
||||
|
||||
**Dashboards to track:**
|
||||
|
||||
* TTFS distribution (P50/P90/P95) by release
|
||||
* Correlate TTFS with bounce rate and “open → rerun” delay
|
||||
* Error budget: % of views with TTFS > 3 s
|
||||
|
||||
---
|
||||
|
||||
## Minimal backend contract (example)
|
||||
|
||||
```json
GET /api/runs/{runId}/first-signal
{
  "runId": "123",
  "firstSignal": {
    "type": "stage_failed",
    "stage": "build",
    "step": "dotnet restore",
    "message": "401 Unauthorized: token expired",
    "at": "2025-12-11T09:22:31Z",
    "artifact": { "kind": "log", "range": { "start": 1880, "end": 1896 } }
  },
  "summaryEtag": "W/\"a1b2c3\""
}
```
|
||||
|
||||
---
|
||||
|
||||
## Frontend pattern (Angular 17, signal‑first)
|
||||
|
||||
* Fire `first-signal` request in route resolver.
|
||||
* Render `FirstSignalCard` immediately.
|
||||
* Lazy‑load stage graph, full logs, security panes.
|
||||
* Fire `ttfs_signal_rendered` when `FirstSignalCard` enters viewport.
|
||||
|
||||
---
|
||||
|
||||
## CI adapter hints (GitLab/GitHub/Azure)
|
||||
|
||||
* Hook on job status webhooks to compute & store the first error tuple.
|
||||
* For GitLab: scan `trace` stream for first `ERRO|FATAL|##[error]` match; store to DB table `ci_run_first_signal(run_id, stage, step, message, t)`.
|
||||
|
||||
---
|
||||
|
||||
## “Good TTFS” acceptance tests
|
||||
|
||||
* Run with early fail → first signal < 1 s, shows exact command + exit code.
|
||||
* Run with policy gate fail → rule name + fix hint visible first.
|
||||
* Offline/slow network → cached summary still renders an actionable hint.
|
||||
|
||||
---
|
||||
|
||||
## Copy to put in your UX guidelines
|
||||
|
||||
> “Optimize **Time‑to‑First‑Signal (TTFS)** above all. Users must see one trustworthy, actionable clue within 1 second on a warm path—even if the rest of the UI is still loading.”
|
||||
|
||||
If you want, I can sketch the exact DB schema for the pre‑indexed log tuples and the Angular resolver + telemetry hooks next.
|
||||
Below is an extended, end‑to‑end implementation plan for **Time‑to‑First‑Signal (TTFS)** that you can drop into your backlog. It includes architecture, data model, API contracts, frontend work, observability, QA, and rollout—structured as epics/phases with “definition of done” and acceptance criteria.
|
||||
|
||||
---
|
||||
|
||||
# Scope extension
|
||||
|
||||
## What we’re building
|
||||
|
||||
A run details experience that renders **one actionable clue** fast—before loading heavy UI like full logs, graphs, artifacts.
|
||||
|
||||
**“First signal”** is a small payload derived from run/job events and the earliest meaningful error evidence (stage/step + key log line(s) + reason/classification).
|
||||
|
||||
## What we’re extending beyond the initial idea
|
||||
|
||||
1. **First‑Signal Quality** (not just speed)
|
||||
|
||||
* Classify error type (auth, dependency, compilation, test, infra, policy, timeout).
|
||||
* Identify “culprit step” and a stable “signature” for dedupe and search.
|
||||
2. **Progressive disclosure UX**
|
||||
|
||||
* Summary → First signal card → expanded context (stage graph, logs, artifacts).
|
||||
3. **Provider‑agnostic ingestion**
|
||||
|
||||
* Adapters for GitLab/GitHub/Azure (or your CI provider).
|
||||
4. **Caching + prefetch**
|
||||
|
||||
* Warm open from list/table, with ETags and stale‑while‑revalidate.
|
||||
5. **Observability & SLOs**
|
||||
|
||||
* TTFS metrics, dashboards, alerting, and quality metrics (false signals).
|
||||
6. **Rollout safety**
|
||||
|
||||
* Feature flags, canary, A/B gating, and a guaranteed fallback path.
|
||||
|
||||
---
|
||||
|
||||
# Success criteria
|
||||
|
||||
## Primary metric
|
||||
|
||||
* **TTFS (ms)**: time from details page route enter → first actionable signal rendered.
|
||||
|
||||
## Targets (example SLOs)
|
||||
|
||||
* **P50 ≤ 700 ms**, **P95 ≤ 2500 ms** on warm path.
|
||||
* **Cold path**: P95 ≤ 4000 ms (depends on infra).
|
||||
|
||||
## Secondary outcome metrics
|
||||
|
||||
* **Open→Action time**: time from opening run to first user action (rerun, cancel, assign, open failing log line).
|
||||
* **Bounce rate**: close page within 10 seconds without interaction.
|
||||
* **MTTR proxy**: time from failure to first rerun or fix commit.
|
||||
|
||||
## Quality metrics
|
||||
|
||||
* **Signal availability rate**: % of run views that show a first signal card within 3s.
|
||||
* **Signal accuracy score** (sampled): engineer confirms “helpful vs not”.
|
||||
* **Extractor failure rate**: parsing errors / missing mappings / timeouts.
|
||||
|
||||
---
|
||||
|
||||
# Architecture overview
|
||||
|
||||
## Data flow
|
||||
|
||||
1. **CI provider events** (job started, job finished, stage failed, log appended) land in your backend.
|
||||
2. **Run summarizer** maintains:
|
||||
|
||||
* `run_summary` (small JSON)
|
||||
* `first_signal` (small, actionable payload)
|
||||
3. **UI opens run details**
|
||||
|
||||
* Immediately calls `GET /runs/{id}/first-signal` (or `/summary`).
|
||||
* Renders FirstSignalCard as soon as payload arrives.
|
||||
4. Background fetches:
|
||||
|
||||
* Stage graph, full logs, artifacts, security scans, trends.
|
||||
|
||||
## Key decision: where to compute first signal
|
||||
|
||||
* **Option A: at ingest time (recommended)**
|
||||
Compute first signal when logs/events arrive, store it, serve it instantly.
|
||||
* **Option B: on demand**
|
||||
Compute when user opens run details (simpler initially, worse TTFS and load).
|
||||
|
||||
---
|
||||
|
||||
# Data model
|
||||
|
||||
## Tables (relational example)
|
||||
|
||||
### `ci_run`
|
||||
|
||||
* `run_id (pk)`
|
||||
* `provider`
|
||||
* `repo_id`
|
||||
* `branch`
|
||||
* `status`
|
||||
* `created_at`, `updated_at`
|
||||
|
||||
### `ci_job`
|
||||
|
||||
* `job_id (pk)`
|
||||
* `run_id (fk)`
|
||||
* `stage_name`
|
||||
* `job_name`
|
||||
* `status`
|
||||
* `started_at`, `finished_at`
|
||||
|
||||
### `ci_log_chunk`
|
||||
|
||||
* `chunk_id (pk)`
|
||||
* `job_id (fk)`
|
||||
* `seq` (monotonic)
|
||||
* `byte_start`, `byte_end` (range into blob)
|
||||
* `first_error_line_no` (nullable)
|
||||
* `first_error_excerpt` (nullable, short)
|
||||
* `severity_max` (info/warn/error)
|
||||
|
||||
### `ci_run_summary`
|
||||
|
||||
* `run_id (pk)`
|
||||
* `version` (e.g., `1`)
|
||||
* `etag` (hash)
|
||||
* `summary_json` (small, 1–5 KB)
|
||||
* `updated_at`
|
||||
|
||||
### `ci_first_signal`
|
||||
|
||||
* `run_id (pk)`
|
||||
* `etag`
|
||||
* `signal_json` (small, 0.5–2 KB)
|
||||
* `quality_flags` (bitmask or json)
|
||||
* `updated_at`
|
||||
|
||||
## Cache layer
|
||||
|
||||
* Redis keys:
|
||||
|
||||
* `run:{runId}:summary:v1`
|
||||
* `run:{runId}:first-signal:v1`
|
||||
* TTL: generous but safe (e.g., 24h) with “write‑through” on event updates.
|
||||
|
||||
---
|
||||
|
||||
# First signal definition
|
||||
|
||||
## `FirstSignal` object (recommended shape)
|
||||
|
||||
```json
{
  "runId": "123",
  "computedAt": "2025-12-12T09:22:31Z",
  "status": "failed",
  "firstSignal": {
    "type": "stage_failed",
    "classification": "dependency_auth",
    "stage": "build",
    "job": "build-linux-x64",
    "step": "dotnet restore",
    "message": "401 Unauthorized: token expired",
    "signature": "dotnet-restore-401-unauthorized",
    "log": {
      "jobId": "job-789",
      "lines": [
        "error : Response status code does not indicate success: 401 (Unauthorized).",
        "error : The token is expired."
      ],
      "range": { "start": 1880, "end": 1896 }
    },
    "suggestedActions": [
      { "label": "Rotate token", "type": "doc", "target": "internal://docs/tokens" },
      { "label": "Rerun job", "type": "action", "target": "rerun-job:job-789" }
    ]
  },
  "etag": "W/\"a1b2c3\""
}
```
|
||||
|
||||
### Notes
|
||||
|
||||
* `signature` should be stable for grouping.
|
||||
* `suggestedActions` is optional but hugely valuable (even 1–2 actions).
|
||||
|
||||
---
|
||||
|
||||
# APIs
|
||||
|
||||
## 1) First signal endpoint
|
||||
|
||||
**GET** `/api/runs/{runId}/first-signal`
|
||||
|
||||
Headers:
|
||||
|
||||
* `If-None-Match: W/"..."` supported
|
||||
* Response includes `ETag` and `Cache-Control`
|
||||
|
||||
Responses:
|
||||
|
||||
* `200`: full first signal object
|
||||
* `304`: not modified
|
||||
* `404`: run not found
|
||||
* `204`: run exists but signal not available yet (rare; should degrade gracefully)
|
||||
|
||||
## 2) Summary endpoint (optional but useful)
|
||||
|
||||
**GET** `/api/runs/{runId}/summary`
|
||||
|
||||
* Includes: status, first failing stage/job, timestamps, blocking policies, artifact counts.
|
||||
|
||||
## 3) SSE / WebSocket updates (nice-to-have)
|
||||
|
||||
**GET** `/api/runs/{runId}/events` (SSE)
|
||||
|
||||
* Push new signal or summary updates in near real-time while user is on the page.
|
||||
|
||||
---
|
||||
|
||||
# Frontend implementation plan (Angular 17)
|
||||
|
||||
## UX behavior
|
||||
|
||||
1. **Route enter**
|
||||
|
||||
* Start TTFS timer.
|
||||
2. Render instantly:
|
||||
|
||||
* Title, status badge, pipeline metadata (run id, commit, branch).
|
||||
* Skeleton for details area.
|
||||
3. Fetch first signal:
|
||||
|
||||
* Render `FirstSignalCard` immediately when available.
|
||||
* Fire telemetry event when card is **in DOM and visible**.
|
||||
4. Lazy-load:
|
||||
|
||||
* Stage graph
|
||||
* Full logs viewer
|
||||
* Artifacts list
|
||||
* Security findings
|
||||
* Trends, flaky tests, etc.
|
||||
|
||||
## Angular structure
|
||||
|
||||
* `RunDetailsResolver` (or `resolveFn`) requests first signal.
|
||||
* `RunDetailsComponent` uses signals to render quickly.
|
||||
* `FirstSignalCardComponent` is standalone + minimal deps.
|
||||
|
||||
## Prefetch strategy from runs list view
|
||||
|
||||
* When the runs table is visible, prefetch summaries/first signals for items in viewport:
|
||||
|
||||
* Use `IntersectionObserver` to prefetch only visible rows.
|
||||
* Store results in an in-memory cache (e.g., `Map<runId, FirstSignal>`).
|
||||
* Respect ETag to avoid redundant payloads.
|
||||
|
||||
## Telemetry hooks
|
||||
|
||||
* `ttfs_start`: route activation + tab visible
|
||||
* `ttfs_signal_rendered`: FirstSignalCard attached and visible
|
||||
* Dimensions: provider, repo, branch, run_type, release_version, network_state
|
||||
|
||||
---
|
||||
|
||||
# Backend implementation plan
|
||||
|
||||
## Summarizer / First-signal service
|
||||
|
||||
A service or module that:
|
||||
|
||||
* subscribes to run/job events
|
||||
* receives log chunks (or pointers)
|
||||
* computes and stores:
|
||||
|
||||
* `run_summary`
|
||||
* `first_signal`
|
||||
* publishes updates (optional) to an event stream for SSE
|
||||
|
||||
### Concurrency rule
|
||||
|
||||
First signal should be set once per run unless a “better” signal appears:
|
||||
|
||||
* if current signal is missing → set
|
||||
* if current signal is “generic” and new one is “specific” → replace
|
||||
* otherwise keep (avoid churn)
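
Expressed as code (C# here only to match the other sketches in this note — the backend language is still open, and the types are hypothetical):

```csharp
// A signal counts as "generic" here when it lacks a classification or an extracted message.
public sealed record FirstSignal(string Type, string? Classification, string? Message)
{
    public bool IsGeneric => Classification is null || Message is null;
}

public static class FirstSignalStore
{
    // Returns the signal that should be persisted for the run.
    public static FirstSignal Merge(FirstSignal? current, FirstSignal incoming)
    {
        if (current is null) return incoming;                          // nothing yet -> set
        if (current.IsGeneric && !incoming.IsGeneric) return incoming; // specific beats generic
        return current;                                                // otherwise keep (avoid churn)
    }
}
```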
|
||||
|
||||
---
|
||||
|
||||
# Extraction & classification logic
|
||||
|
||||
## Minimum viable extractor (Phase 1)
|
||||
|
||||
* Heuristics:
|
||||
|
||||
* first match among patterns: `FATAL`, `ERROR`, `##[error]`, `panic:`, `Unhandled exception`, `npm ERR!`, `BUILD FAILED`, etc.
|
||||
* plus provider-specific fail markers
|
||||
* Pull:
|
||||
|
||||
* stage/job/step context (from job metadata or step boundaries)
|
||||
* 5–10 log lines around first error line
|
||||
|
||||
## Improved extractor (Phase 2+)
|
||||
|
||||
* Language/tool specific rules:
|
||||
|
||||
* dotnet, maven/gradle, npm/yarn/pnpm, python/pytest, go test, docker build, terraform, helm
|
||||
* Add `classification` and `signature`:
|
||||
|
||||
* normalize common errors:
|
||||
|
||||
* auth expired/forbidden
|
||||
* missing dependency / DNS / TLS
|
||||
* compilation error
|
||||
* test failure (include test name)
|
||||
* infra capacity / agent lost
|
||||
* policy gate failure
|
||||
|
||||
## Guardrails
|
||||
|
||||
* **Secret redaction**: before storing excerpts, run your existing redaction pipeline.
|
||||
* **Payload cap**: cap message length and excerpt lines.
|
||||
* **PII discipline**: avoid including arbitrary stack traces if they contain sensitive paths; include only key lines.
|
||||
|
||||
---
|
||||
|
||||
# Development plan by phases (epics)
|
||||
|
||||
Each phase below includes deliverables + acceptance criteria. You can treat each as a sprint/iteration.
|
||||
|
||||
---
|
||||
|
||||
## Phase 0 — Baseline and alignment
|
||||
|
||||
### Deliverables
|
||||
|
||||
* Baseline TTFS measurement (current behavior)
|
||||
* Definition of “actionable signal” and priority rules
|
||||
* Performance budget for run details view
|
||||
|
||||
### Tasks
|
||||
|
||||
* Add client-side telemetry for current page load steps:
|
||||
|
||||
* route enter, summary loaded, logs loaded, graph loaded
|
||||
* Measure TTFS proxy today (likely “time to status shown”)
|
||||
* Identify top 20 failure modes in your CI (from historical logs)
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* Dashboard shows baseline P50/P95 for current experience.
|
||||
* “First signal” contract signed off with UI + backend teams.
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 — Data model and storage
|
||||
|
||||
### Deliverables
|
||||
|
||||
* DB migrations for `ci_run_summary` and `ci_first_signal`
|
||||
* Redis cache keys and invalidation strategy
|
||||
* ADR: where summaries live and how they update
|
||||
|
||||
### Tasks
|
||||
|
||||
* Create tables and indices:
|
||||
|
||||
* index on `run_id`, `updated_at`, `provider`
|
||||
* Add serializer/deserializer for `summary_json` and `signal_json`
|
||||
* Implement ETag generation (hash of JSON payload)
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* Can store and retrieve summary + first signal for a run in < 50ms (DB) and < 10ms (cache).
|
||||
* ETag works end-to-end.
|
||||
|
||||
---
|
||||
|
||||
## Phase 2 — Ingestion and first signal computation
|
||||
|
||||
### Deliverables
|
||||
|
||||
* First-signal computation module
|
||||
* Provider adapter integration points (webhook consumers)
|
||||
* “first error tuple” extraction from logs
|
||||
|
||||
### Tasks
|
||||
|
||||
* On job log append:
|
||||
|
||||
* scan incrementally for first error markers
|
||||
* store excerpt + line range + job/stage/step mapping
|
||||
* On job finish/fail:
|
||||
|
||||
* finalize first signal with best known context
|
||||
* Implement the “better signal replaces generic” rule
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* For a known failing run, API returns first signal without reading full log blob.
|
||||
* Computation does not exceed a small CPU budget per log chunk (guard with limits).
|
||||
* Extraction failure rate < 1% for sampled runs (initial).
|
||||
|
||||
---
|
||||
|
||||
## Phase 3 — API endpoints and caching
|
||||
|
||||
### Deliverables
|
||||
|
||||
* `/runs/{id}/first-signal` endpoint
|
||||
* Optional `/runs/{id}/summary`
|
||||
* Cache-control + ETag support
|
||||
* Access control checks consistent with existing run authorization
|
||||
|
||||
### Tasks
|
||||
|
||||
* Serve cached first signal first; fallback to DB
|
||||
* If missing:
|
||||
|
||||
* return `204` (or a “pending” object) and allow UI fallback
|
||||
* Add server-side metrics:
|
||||
|
||||
* endpoint latency, cache hit rate, payload size
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* Endpoint P95 latency meets target (e.g., < 200ms internal).
|
||||
* Cache hit rate is high for active runs (after prefetch).
|
||||
|
||||
---
|
||||
|
||||
## Phase 4 — Frontend progressive rendering
|
||||
|
||||
### Deliverables
|
||||
|
||||
* FirstSignalCard component
|
||||
* Route resolver + local cache
|
||||
* Prefetch on runs list view
|
||||
* Telemetry for TTFS
|
||||
|
||||
### Tasks
|
||||
|
||||
* Render shell immediately
|
||||
* Fetch and render first signal
|
||||
* Lazy-load heavy panels using `@defer` / dynamic imports
|
||||
* Implement “open failing stage” default behavior
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* In throttled network test, first signal card appears significantly earlier than logs and graphs.
|
||||
* `ttfs_signal_rendered` fires exactly once per view, with correct dimensions.
|
||||
|
||||
---
|
||||
|
||||
## Phase 5 — Observability, dashboards, and alerting
|
||||
|
||||
### Deliverables
|
||||
|
||||
* TTFS dashboards by:
|
||||
|
||||
* provider, repo, run type, release version
|
||||
* Alerts:
|
||||
|
||||
* P95 regression threshold
|
||||
* Quality dashboard:
|
||||
|
||||
* availability rate, extraction failures, “generic signal rate”
|
||||
|
||||
### Tasks
|
||||
|
||||
* Create event pipeline for telemetry into your analytics system
|
||||
* Define SLO/error budget alerts
|
||||
* Add tracing (OpenTelemetry) for endpoint and summarizer
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* You can correlate TTFS with:
|
||||
|
||||
* bounce rate
|
||||
* open→action time
|
||||
* You can pinpoint whether regressions are backend, frontend, or provider‑specific.
|
||||
|
||||
---
|
||||
|
||||
## Phase 6 — QA, performance testing, rollout
|
||||
|
||||
### Deliverables
|
||||
|
||||
* Automated tests
|
||||
* Feature flag + gradual rollout
|
||||
* A/B experiment (optional)
|
||||
|
||||
### Tasks
|
||||
|
||||
**Testing**
|
||||
|
||||
* Unit tests:
|
||||
|
||||
* extractor patterns
|
||||
* classification rules
|
||||
* Integration tests:
|
||||
|
||||
* simulated job logs with known outcomes
|
||||
* E2E (Playwright/Cypress):
|
||||
|
||||
* verify first signal appears before logs
|
||||
* verify fallback path works if endpoint fails
|
||||
* Performance tests:
|
||||
|
||||
* cold cache vs warm cache
|
||||
* throttled CPU/network profiles
|
||||
|
||||
**Rollout**
|
||||
|
||||
* Feature flag:
|
||||
|
||||
* enabled for internal users first
|
||||
* ramp by repo or percentage
|
||||
* Monitor key metrics during ramp:
|
||||
|
||||
* TTFS P95
|
||||
* API error rate
|
||||
* UI error rate
|
||||
* cache miss spikes
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* No increase in overall error rates.
|
||||
* TTFS improves at least X% for a meaningful slice of users (define X from baseline).
|
||||
* Fallback UX remains usable when signals are unavailable.
|
||||
|
||||
---
|
||||
|
||||
# Backlog examples (ready-to-create Jira tickets)
|
||||
|
||||
## Epic: Run summary and first signal storage
|
||||
|
||||
* Create `ci_first_signal` table
|
||||
* Create `ci_run_summary` table
|
||||
* Implement ETag hashing
|
||||
* Implement Redis caching layer
|
||||
* Add admin/debug endpoint (internal only) to inspect computed signals
|
||||
|
||||
## Epic: Log chunk extraction
|
||||
|
||||
* Implement incremental log scanning
|
||||
* Store first error excerpt + range
|
||||
* Map excerpt to job + step
|
||||
* Add redaction pass to excerpts
|
||||
|
||||
## Epic: Run details progressive UI
|
||||
|
||||
* FirstSignalCard UI component
|
||||
* Lazy-load logs viewer
|
||||
* Default to opening failing stage
|
||||
* Prefetch signals in runs list
|
||||
|
||||
## Epic: Telemetry and dashboards
|
||||
|
||||
* Add `ttfs_start` and `ttfs_signal_rendered`
|
||||
* Add endpoint latency metrics
|
||||
* Build dashboards + alerts
|
||||
* Add sampling for “signal helpfulness” feedback
|
||||
|
||||
---
|
||||
|
||||
# Risk register and mitigations
|
||||
|
||||
## Risk: First signal is wrong/misleading
|
||||
|
||||
* Mitigation:
|
||||
|
||||
* track “generic signal rate” and “corrected by user” feedback
|
||||
* classification confidence scoring
|
||||
* always provide quick access to full logs as fallback
|
||||
|
||||
## Risk: Logs contain secrets
|
||||
|
||||
* Mitigation:
|
||||
|
||||
* redact excerpts before storing/serving
|
||||
* cap excerpt lines and length
|
||||
* keep raw logs behind existing permissions
|
||||
|
||||
## Risk: Increased ingest CPU cost
|
||||
|
||||
* Mitigation:
|
||||
|
||||
* incremental scanning with early stop after first error captured
|
||||
* limit scanning per chunk
|
||||
* sample/skip overly large logs until job completion
|
||||
|
||||
## Risk: Cache invalidation bugs
|
||||
|
||||
* Mitigation:
|
||||
|
||||
* ETag-based correctness
|
||||
* versioned keys: `:v1`
|
||||
* “write-through” cache updates from summarizer
|
||||
|
||||
---
|
||||
|
||||
# Definition of Done checklist
|
||||
|
||||
A phase is “done” when:
|
||||
|
||||
* ✅ TTFS measured with reliable client events
|
||||
* ✅ FirstSignalCard renders from `/first-signal` endpoint
|
||||
* ✅ ETag caching works
|
||||
* ✅ Fallback path is solid (no blank screens)
|
||||
* ✅ Dashboards exist and are actively watched during rollout
|
||||
* ✅ Security review completed for log excerpts/redaction
|
||||
* ✅ Load tests show no unacceptable backend regressions
|
||||
|
||||
---
|
||||
|
||||
# Optional enhancements after initial launch
|
||||
|
||||
1. **Next-step recommendations**
|
||||
Add action suggestions and deep links (rotate token, open failing test, open doc).
|
||||
2. **Flaky test / known issue detection**
|
||||
Show “this matches known flaky signature” with last-seen frequency.
|
||||
3. **“Compare to last green”**
|
||||
Summarize what changed since last successful run (commit diff, dependency bump).
|
||||
4. **SSE live updates**
|
||||
Update first signal as soon as failure occurs while user watches.
|
||||
|
||||
---
|
||||
|
||||
If you tell me your current backend stack (Node/Go/.NET), log storage (S3/Elastic/Loki), and which CI providers you support, I can translate this into a concrete set of modules/classes, exact schema migrations, and the Angular routing + signals code structure you’d implement.
|
||||
@@ -0,0 +1,643 @@
Here’s a simple, practical idea to make your scans provably repeatable over time and catch drift fast.
|
||||
|
||||
# Replay Fidelity (what, why, how)
|
||||
|
||||
**What it is:** the share of historical scans that reproduce **bit‑for‑bit** when re‑run using their saved manifests (inputs, versions, rules, seeds). Higher = more deterministic system.
|
||||
|
||||
**Why you want it:** it exposes hidden nondeterminism (feed drift, time‑dependent rules, race conditions, unstable dependency resolution) and proves auditability for customers/compliance.
|
||||
|
||||
---
|
||||
|
||||
## The metric
|
||||
|
||||
* **Per‑scan:** `replay_match = 1` if SBOM/VEX/findings + hashes are identical; else `0`.
|
||||
* **Windowed:** `Replay Fidelity = (Σ replay_match) / (# historical replays in window)`.
|
||||
* **Breakdown:** also track by scanner, language, image base, feed version, and environment.
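
For example, if a nightly job replays 200 historical scans and 196 reproduce bit-for-bit, the 30-day Replay Fidelity is 196/200 = 0.98 — right at the “Green” threshold defined below.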
|
||||
|
||||
---
|
||||
|
||||
## What must be captured in the scan manifest
|
||||
|
||||
* Exact source refs (image digest / repo SHA), container layers’ digests
|
||||
* Scanner build ID + config (flags, rules, lattice/policy sets, seeds)
|
||||
* Feed snapshots (CVE DB, OVAL, vendor advisories) as **content‑addressed** bundles
|
||||
* Normalization/version of SBOM schema (e.g., CycloneDX 1.6 vs SPDX 3.0.1)
|
||||
* Platform facts (OS/kernel, tz, locale), toolchain versions, clock policy
|
||||
|
||||
---
|
||||
|
||||
## Pass/Fail rules you can ship
|
||||
|
||||
* **Green:** Fidelity ≥ 0.98 over 30 days, and no bucket < 0.95
|
||||
* **Warn:** Any bucket drops by ≥ 2% week‑over‑week
|
||||
* **Fail the pipeline:** If fidelity < 0.90 or any regulated project < 0.95
|
||||
|
||||
---
|
||||
|
||||
## Minimal replay harness (outline)
|
||||
|
||||
1. Pick N historical scans (e.g., last 200 or stratified by image language).
|
||||
2. Restore their **frozen** manifest (scanner binary, feed bundle, policy lattice, seeds).
|
||||
3. Re‑run in a pinned runtime (OCI digest, pinned kernel in VM, fixed TZ/locale).
|
||||
4. Compare artifacts: SBOM JSON, VEX JSON, findings list, evidence blobs → SHA‑256.
|
||||
5. Emit: pass/fail, diff summary, and the “cause” tag if mismatch (feed, policy, runtime, code).
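
Step 4 reduces to hashing and comparing the normalized artifacts; a tiny C# sketch (it assumes the artifacts were already canonicalized as described under “Quick wins” below):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ReplayCompare
{
    // SHA-256 of a (pre-normalized) artifact file, as lowercase hex.
    public static string Sha256Hex(string path)
    {
        using var stream = File.OpenRead(path);
        return Convert.ToHexString(SHA256.HashData(stream)).ToLowerInvariant();
    }

    // Step 4 of the harness: original vs. replayed artifact must hash identically.
    public static bool Matches(string originalPath, string replayedPath) =>
        Sha256Hex(originalPath) == Sha256Hex(replayedPath);
}
```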
|
||||
|
||||
---
|
||||
|
||||
## Dashboard (what to show)
|
||||
|
||||
* Fidelity % (30/90‑day) + sparkline
|
||||
* Top offenders (by language/scanner/policy set)
|
||||
* “Cause of mismatch” histogram (feed vs runtime vs code vs policy)
|
||||
* Click‑through: deterministic diff (e.g., which CVEs flipped and why)
|
||||
|
||||
---
|
||||
|
||||
## Quick wins for Stella Ops
|
||||
|
||||
* Treat **feeds as immutable snapshots** (content‑addressed tar.zst) and record their digest in each scan.
|
||||
* Run scanner in a **repro shell** (OCI image digest + fixed TZ/locale + no network).
|
||||
* Normalize SBOM/VEX (key order, whitespace, float precision) before hashing.
|
||||
* Add a `stella replay --from MANIFEST.json` command + nightly cron to sample replays.
|
||||
* Store `replay_result` rows; expose `/metrics` for Prometheus and a CI badge: `Replay Fidelity: 99.2%`.
|
||||
|
||||
Want me to draft the `stella replay` CLI spec and the DB table (DDL) you can drop into Postgres?
|
||||
Below is an extended “Replay Fidelity” design **plus a concrete development implementation plan** you can hand to engineering. I’m assuming Stella Ops is doing container/app security scans that output SBOM + findings (and optionally VEX), and uses vulnerability “feeds” and policy/lattice/rules.
|
||||
|
||||
---
|
||||
|
||||
## 1) Extend the concept: Replay Fidelity as a product capability
|
||||
|
||||
### 1.1 Fidelity levels (so you can be strict without being brittle)
|
||||
|
||||
Instead of a single yes/no, define **tiers** that you can report and gate on:
|
||||
|
||||
1. **Bitwise Fidelity (BF)**
|
||||
|
||||
* *Definition:* All primary artifacts (SBOM, findings, VEX, evidence) match **byte-for-byte** after canonicalization.
|
||||
* *Use:* strongest auditability, catch ordering/nondeterminism.
|
||||
|
||||
2. **Semantic Fidelity (SF)**
|
||||
|
||||
* *Definition:* The *meaning* matches even if formatting differs (e.g., key order, whitespace, timestamps).
|
||||
* *How:* compare normalized objects: same packages, versions, CVEs, fix versions, severities, policy verdicts.
|
||||
* *Use:* protects you from “cosmetic diffs” and helps triage.
|
||||
|
||||
3. **Policy Fidelity (PF)**
|
||||
|
||||
* *Definition:* Final policy decision (pass/fail + reason codes) matches.
|
||||
* *Use:* useful when outputs may evolve but governance outcome must remain stable.
|
||||
|
||||
**Recommended reporting:**
|
||||
|
||||
* Dashboard shows BF, SF, PF together.
|
||||
* Default engineering SLO: **BF ≥ 0.98**; compliance SLO: **BF ≥ 0.95** for regulated projects; PF should be ~1.0 unless policy changed intentionally.
|
||||
|
||||
---
|
||||
|
||||
### 1.2 “Why did it drift?”—Mismatch classification taxonomy
|
||||
|
||||
When a replay fails, auto-tag the cause so humans don’t diff JSON by hand.
|
||||
|
||||
**Primary mismatch classes**
|
||||
|
||||
* **Feed drift:** CVE/OVAL/vendor advisory snapshot differs.
|
||||
* **Policy drift:** policy/lattice/rules differ (or default rule set changed).
|
||||
* **Runtime drift:** base image / libc / kernel / locale / tz / CPU arch differences.
|
||||
* **Scanner drift:** scanner binary build differs or dependency versions changed.
|
||||
* **Nondeterminism:** ordering instability, concurrency race, unseeded RNG, time-based logic.
|
||||
* **External IO:** network calls, “latest” resolution, remote package registry changes.
|
||||
|
||||
**Output:** a `mismatch_reason` plus a short `diff_summary`.
|
||||
|
||||
---
|
||||
|
||||
### 1.3 Deterministic “scan envelope” design
|
||||
|
||||
A replay only works if the scan is fully specified.
|
||||
|
||||
**Scan envelope components**
|
||||
|
||||
* **Inputs:** image digest, repo commit SHA, build provenance, layers digests.
|
||||
* **Scanner:** scanner OCI image digest (or binary digest), config flags, feature toggles.
|
||||
* **Feeds:** content-addressed feed bundle digests (see §2.3).
|
||||
* **Policy/rules:** git commit SHA + content digest of compiled rules.
|
||||
* **Environment:** OS/arch, tz/locale, “clock mode”, network mode, CPU count.
|
||||
* **Normalization:** “canonicalization version” for SBOM/VEX/findings.
|
||||
|
||||
---
|
||||
|
||||
### 1.4 Canonicalization so “bitwise” is meaningful
|
||||
|
||||
To make BF achievable (a minimal canonicalization sketch follows this list):
|
||||
|
||||
* Canonical JSON serialization (sorted keys, stable array ordering, normalized floats)
|
||||
* Strip/normalize volatile fields (timestamps, “scan_duration_ms”, hostnames)
|
||||
* Stable ordering for lists: packages sorted by `(purl, version)`, vulnerabilities by `(cve_id, affected_purl)`
|
||||
* Deterministic IDs: if you generate internal IDs, derive from stable hashes of content (not UUID4)
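A minimal Go sketch of the canonical-hash step under these rules, assuming artifacts are generic JSON documents; the `volatileKeys` list is illustrative, and domain-specific array sorting (by purl, CVE id, etc.) still has to be plugged in where marked.

```go
package canon

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// volatileKeys are fields stripped before hashing (illustrative; adjust per artifact type).
var volatileKeys = map[string]bool{"timestamp": true, "scan_duration_ms": true, "hostname": true}

// strip removes volatile fields recursively from a decoded JSON document.
func strip(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, val := range t {
			if volatileKeys[k] {
				continue
			}
			out[k] = strip(val)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, val := range t {
			out[i] = strip(val) // NOTE: domain-specific sorting (by purl, cve_id, ...) goes here
		}
		return out
	default:
		return t
	}
}

// CanonicalSHA256 decodes raw JSON, strips volatile fields, re-encodes it with
// sorted keys (encoding/json sorts map keys), and hashes the result.
func CanonicalSHA256(raw []byte) (string, error) {
	var doc interface{}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", err
	}
	canonical, err := json.Marshal(strip(doc))
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(canonical)
	return hex.EncodeToString(sum[:]), nil
}
```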
|
||||
|
||||
---
|
||||
|
||||
### 1.5 Sampling strategy
|
||||
|
||||
You don’t need to replay everything.
|
||||
|
||||
**Nightly sample:** stratified by:
|
||||
|
||||
* language ecosystem (npm, pip, maven, go, rust…)
|
||||
* scanner engine
|
||||
* base OS
|
||||
* “regulatory tier”
|
||||
* image size/complexity
|
||||
|
||||
**Plus:** always replay “golden canaries” (a fixed set of reference images) after every scanner release and feed ingestion pipeline change.
|
||||
|
||||
---
|
||||
|
||||
## 2) Technical architecture blueprint
|
||||
|
||||
### 2.1 System components
|
||||
|
||||
1. **Manifest Writer (in the scan pipeline)**
|
||||
|
||||
* Produces `ScanManifest v1` JSON
|
||||
* Records all digests and versions
|
||||
|
||||
2. **Artifact Store**
|
||||
|
||||
* Stores SBOM, findings, VEX, evidence blobs
|
||||
* Stores canonical hashes for BF checks
|
||||
|
||||
3. **Feed Snapshotter**
|
||||
|
||||
* Periodically builds immutable feed bundles
|
||||
* Content-addressed (digest-keyed)
|
||||
* Stores metadata (source URLs, generation timestamp, signature)
|
||||
|
||||
4. **Replay Orchestrator**
|
||||
|
||||
* Chooses historical scans to replay
|
||||
* Launches “replay executor” jobs
|
||||
|
||||
5. **Replay Executor**
|
||||
|
||||
* Runs scanner in pinned container image
|
||||
* Network off, tz fixed, clock policy applied
|
||||
* Produces new artifacts + hashes
|
||||
|
||||
6. **Diff & Scoring Engine**
|
||||
|
||||
* Computes BF/SF/PF
|
||||
* Generates mismatch classification + diff summary
|
||||
|
||||
7. **Metrics + UI Dashboard**
|
||||
|
||||
* Prometheus metrics
|
||||
* UI for drill-down diffs
|
||||
|
||||
---
|
||||
|
||||
### 2.2 Data model (Postgres-friendly)
|
||||
|
||||
**Core tables**
|
||||
|
||||
* `scan_manifests`
|
||||
|
||||
* `scan_id (pk)`
|
||||
* `manifest_json`
|
||||
* `manifest_sha256`
|
||||
* `created_at`
|
||||
* `scan_artifacts`
|
||||
|
||||
* `scan_id (fk)`
|
||||
* `artifact_type` (sbom|findings|vex|evidence)
|
||||
* `artifact_uri`
|
||||
* `canonical_sha256`
|
||||
* `schema_version`
|
||||
* `feed_snapshots`
|
||||
|
||||
* `feed_digest (pk)`
|
||||
* `bundle_uri`
|
||||
* `sources_json`
|
||||
* `generated_at`
|
||||
* `signature`
|
||||
* `replay_runs`
|
||||
|
||||
* `replay_id (pk)`
|
||||
* `original_scan_id (fk)`
|
||||
* `status` (queued|running|passed|failed)
|
||||
* `bf_match bool`, `sf_match bool`, `pf_match bool`
|
||||
* `mismatch_reason`
|
||||
* `diff_summary_json`
|
||||
* `started_at`, `finished_at`
|
||||
* `executor_env_json` (arch, tz, cpu, image digest)
|
||||
|
||||
**Indexes**
|
||||
|
||||
* `(created_at)` for sampling windows
|
||||
* `(mismatch_reason, finished_at)` for triage
|
||||
* `(scanner_version, ecosystem)` for breakdown dashboards
|
||||
|
||||
---
|
||||
|
||||
### 2.3 Feed Snapshotting (the key to long-term replay)
|
||||
|
||||
**Feed bundle format**
|
||||
|
||||
* `feeds/<source>/<date>/...` inside a tar.zst
|
||||
* manifest file inside bundle: `feed_bundle_manifest.json` containing:
|
||||
|
||||
* source URLs
|
||||
* retrieval commit/etag (if any)
|
||||
* file hashes
|
||||
* generated_by version
|
||||
|
||||
**Content addressing**
|
||||
|
||||
* Digest of the entire bundle (`sha256(tar.zst)`) is the reference (see the digest sketch below).
|
||||
* Scans record only the digest + URI.
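A minimal Go sketch of that content addressing, assuming the bundle has already been written to disk as a `tar.zst`; the resulting `sha256:<hex>` string is what each scan manifest records alongside the storage URI.

```go
package feeds

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
)

// BundleDigest streams the tar.zst and returns the content address
// ("sha256:<hex>") under which the bundle is stored and referenced by scans.
func BundleDigest(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return fmt.Sprintf("sha256:%x", h.Sum(nil)), nil
}
```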
|
||||
|
||||
**Immutability**
|
||||
|
||||
* Store bundles in object storage with WORM / retention if you need compliance.
|
||||
|
||||
---
|
||||
|
||||
### 2.4 Replay execution sandbox
|
||||
|
||||
For determinism, enforce:
|
||||
|
||||
* **No network** (K8s NetworkPolicy, firewall rules, or container runtime flags)
|
||||
* **Fixed TZ/locale**
|
||||
* **Pinned container image digest**
|
||||
* **Clock policy**
|
||||
|
||||
* Either “real time but recorded” or “frozen time at original scan timestamp”
|
||||
* If scanner logic uses current date for severity windows, freeze time
|
||||
|
||||
---
|
||||
|
||||
## 3) Development implementation plan
|
||||
|
||||
I’ll lay this out as **workstreams** plus **a sprint-by-sprint plan**. You can compress or expand it depending on team size.
|
||||
|
||||
### Workstream A — Scan Manifest & Canonical Artifacts
|
||||
|
||||
**Goal:** every scan is replayable on paper, even before replays run.
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* `ScanManifest v1` schema + writer integrated into scan pipeline
|
||||
* Canonicalization library + canonical hashing for all artifacts
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Every scan stores: input digests, scanner digest, policy digest, feed digest placeholders
|
||||
* Artifact hashes are stable across repeated runs in the same environment
|
||||
|
||||
---
|
||||
|
||||
### Workstream B — Feed Snapshotting & Policy Versioning
|
||||
|
||||
**Goal:** eliminate “feed drift” by pinning immutable inputs.
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Feed bundle builder + signer + uploader
|
||||
* Policy/rules bundler (compiled rules bundle, digest recorded)
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* New scans reference feed bundle digests (not “latest”)
|
||||
* A scan can be re-run with the same feed bundle and policy bundle
|
||||
|
||||
---
|
||||
|
||||
### Workstream C — Replay Runner & Diff Engine
|
||||
|
||||
**Goal:** execute historical scans and score BF/SF/PF with actionable diffs.
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* `stella replay --from manifest.json`
|
||||
* Orchestrator job to schedule replays
|
||||
* Diff engine + mismatch classifier
|
||||
* Storage of replay results
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Replay produces deterministic artifacts in a pinned environment
|
||||
* Dashboard/CLI shows BF/SF/PF + diff summary for failures
|
||||
|
||||
---
|
||||
|
||||
### Workstream D — Observability, Dashboard, and CI Gates
|
||||
|
||||
**Goal:** make fidelity visible and enforceable.
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Prometheus metrics: `replay_fidelity_bf`, `replay_fidelity_sf`, `replay_fidelity_pf` (a metrics sketch follows this list)
|
||||
* Breakdown labels (scanner, ecosystem, policy_set, base_os)
|
||||
* Alerts for drop thresholds
|
||||
* CI gate option: “block release if BF < threshold on canary set”
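A minimal sketch of that metrics contract using `prometheus/client_golang`, assuming the replay service recomputes windowed fidelity per bucket and publishes it as gauges; metric and label names match the list above, everything else is illustrative.

```go
package metrics

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var labels = []string{"scanner", "ecosystem", "policy_set", "base_os"}

var (
	bf = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "replay_fidelity_bf", Help: "Bitwise replay fidelity (0..1) per bucket.",
	}, labels)
	sf = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "replay_fidelity_sf", Help: "Semantic replay fidelity (0..1) per bucket.",
	}, labels)
	pf = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "replay_fidelity_pf", Help: "Policy replay fidelity (0..1) per bucket.",
	}, labels)
)

func init() {
	prometheus.MustRegister(bf, sf, pf)
}

// Record publishes the latest windowed fidelity values for one bucket.
func Record(scanner, ecosystem, policySet, baseOS string, bfVal, sfVal, pfVal float64) {
	l := prometheus.Labels{"scanner": scanner, "ecosystem": ecosystem, "policy_set": policySet, "base_os": baseOS}
	bf.With(l).Set(bfVal)
	sf.With(l).Set(sfVal)
	pf.With(l).Set(pfVal)
}

// Serve exposes /metrics for Prometheus scraping.
func Serve(addr string) error {
	http.Handle("/metrics", promhttp.Handler())
	return http.ListenAndServe(addr, nil)
}
```

Alert rules and the CI badge can then key off these series (e.g., week-over-week drops per label set).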
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Engineering can see drift within 24h
|
||||
* Releases are blocked when fidelity regressions occur
|
||||
|
||||
---
|
||||
|
||||
## 4) Suggested sprint plan with concrete tasks
|
||||
|
||||
### Sprint 0 — Design lock + baseline
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Define manifest schema: `ScanManifest v1` fields + versioning rules
|
||||
* Decide canonicalization rules (what is normalized vs preserved)
|
||||
* Choose initial “golden canary” scan set (10–20 representative targets)
|
||||
* Add “replay-fidelity” epic with ownership & SLIs/SLOs
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Approved schema + canonicalization spec
|
||||
* Canary set stored and tagged
|
||||
|
||||
---
|
||||
|
||||
### Sprint 1 — Manifest writer + artifact hashing (MVP)
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Implement manifest writer in scan pipeline
|
||||
* Store `manifest_json` + `manifest_sha256`
|
||||
* Implement canonicalization + hashing for:
|
||||
|
||||
* findings list (sorted)
|
||||
* SBOM (normalized)
|
||||
* VEX (if present)
|
||||
* Persist canonical hashes in `scan_artifacts`
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Two identical scans in the same environment yield identical artifact hashes
|
||||
* A “manifest export” endpoint/CLI works:
|
||||
|
||||
* `stella scan --emit-manifest out.json`
|
||||
|
||||
---
|
||||
|
||||
### Sprint 2 — Feed snapshotter + policy bundling
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Build feed bundler job:
|
||||
|
||||
* pull raw sources
|
||||
* normalize layout
|
||||
* generate `feed_bundle_manifest.json`
|
||||
* tar.zst + sha256
|
||||
* upload + record in `feed_snapshots`
|
||||
* Update scan pipeline:
|
||||
|
||||
* resolve feed bundle digest at scan start
|
||||
* record digest in scan manifest
|
||||
* Bundle policy/lattice:
|
||||
|
||||
* compile rules into an immutable artifact
|
||||
* record policy bundle digest in manifest
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Scans reference immutable feed + policy digests
|
||||
* You can fetch feed bundle by digest and reproduce the same feed inputs
|
||||
|
||||
---
|
||||
|
||||
### Sprint 3 — Replay executor + “no network” sandbox
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Create replay container image / runtime wrapper
|
||||
* Implement `stella replay --from MANIFEST.json`
|
||||
|
||||
* pulls scanner image by digest
|
||||
* mounts feed bundle + policy bundle
|
||||
* runs in network-off mode
|
||||
* applies tz/locale + clock mode
|
||||
* Store replay outputs as artifacts (`replay_scan_id` or `replay_id` linkage)
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Replay runs end-to-end for canary scans
|
||||
* Deterministic runtime controls verified (no DNS egress, fixed tz)
|
||||
|
||||
---
|
||||
|
||||
### Sprint 4 — Diff engine + mismatch classification
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Implement BF compare (canonical hashes)
|
||||
* Implement SF compare (semantic JSON/object comparison)
|
||||
* Implement PF compare (policy decision equivalence)
|
||||
* Implement mismatch classification rules (a classifier sketch follows this task list):
|
||||
|
||||
* if feed digest differs → feed drift
|
||||
* if scanner digest differs → scanner drift
|
||||
* if environment differs → runtime drift
|
||||
* else → nondeterminism (with sub-tags for ordering/time/RNG)
|
||||
* Generate `diff_summary_json`:
|
||||
|
||||
* top N changed CVEs
|
||||
* packages added/removed
|
||||
* policy verdict changes
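A minimal Go sketch of those classification rules, assuming the relevant manifest fields have already been extracted into a small struct; field names are illustrative, not a fixed schema.

```go
package diffengine

// ManifestFacts holds the handful of pinned inputs the classifier compares
// (illustrative field names).
type ManifestFacts struct {
	FeedDigest     string
	PolicyDigest   string
	ScannerDigest  string
	EnvFingerprint string // arch + tz + locale + image digest, hashed
}

// ClassifyMismatch applies the drift rules in priority order and falls back
// to "nondeterminism" when every pinned input matches but artifacts differ.
func ClassifyMismatch(orig, replay ManifestFacts) string {
	switch {
	case orig.FeedDigest != replay.FeedDigest:
		return "feed_drift"
	case orig.PolicyDigest != replay.PolicyDigest:
		return "policy_drift"
	case orig.ScannerDigest != replay.ScannerDigest:
		return "scanner_drift"
	case orig.EnvFingerprint != replay.EnvFingerprint:
		return "runtime_drift"
	default:
		return "nondeterminism" // sub-tags (ordering/time/RNG) need artifact-level diffing
	}
}
```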
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Every failed replay has a cause tag and a diff summary that’s useful in <2 minutes
|
||||
* Engineers can reproduce failures locally with the manifest
|
||||
|
||||
---
|
||||
|
||||
### Sprint 5 — Dashboard + alerts + CI gate
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Expose Prometheus metrics from replay service
|
||||
* Build dashboard:
|
||||
|
||||
* BF/SF/PF trends
|
||||
* breakdown by ecosystem/scanner/policy
|
||||
* mismatch cause histogram
|
||||
* Add alerting rules (drop threshold, bucket regression)
|
||||
* Add CI gate mode:
|
||||
|
||||
* “run replays on canary set for this release candidate”
|
||||
* block merge if BF < target
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Fidelity visible to leadership and engineering
|
||||
* Release process is protected by canary replays
|
||||
|
||||
---
|
||||
|
||||
### Sprint 6 — Hardening + compliance polish
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Backward compatible manifest upgrades:
|
||||
|
||||
* `manifest_version` bump rules
|
||||
* migration support
|
||||
* Artifact signing / integrity:
|
||||
|
||||
* sign manifest hash
|
||||
* optional transparency log later
|
||||
* Storage & retention policies (cost controls)
|
||||
* Runbook + oncall playbook
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Audit story is complete: “show me exactly how scan X was produced”
|
||||
* Operational load is manageable and cost-bounded
|
||||
|
||||
---
|
||||
|
||||
## 5) Engineering specs you can start implementing immediately
|
||||
|
||||
### 5.1 `ScanManifest v1` skeleton (example)
|
||||
|
||||
```json
|
||||
{
|
||||
"manifest_version": "1.0",
|
||||
"scan_id": "scan_123",
|
||||
"created_at": "2025-12-12T10:15:30Z",
|
||||
|
||||
"input": {
|
||||
"type": "oci_image",
|
||||
"image_ref": "registry/app@sha256:...",
|
||||
"layers": ["sha256:...", "sha256:..."],
|
||||
"source_provenance": {"repo_sha": "abc123", "build_id": "ci-999"}
|
||||
},
|
||||
|
||||
"scanner": {
|
||||
"engine": "stella",
|
||||
"scanner_image_digest": "sha256:...",
|
||||
"scanner_version": "2025.12.0",
|
||||
"config_digest": "sha256:...",
|
||||
"flags": ["--deep", "--vex"]
|
||||
},
|
||||
|
||||
"feeds": {
|
||||
"vuln_feed_bundle_digest": "sha256:...",
|
||||
"license_db_digest": "sha256:..."
|
||||
},
|
||||
|
||||
"policy": {
|
||||
"policy_bundle_digest": "sha256:...",
|
||||
"policy_set": "prod-default"
|
||||
},
|
||||
|
||||
"environment": {
|
||||
"arch": "amd64",
|
||||
"os": "linux",
|
||||
"tz": "UTC",
|
||||
"locale": "C",
|
||||
"network": "disabled",
|
||||
"clock_mode": "frozen",
|
||||
"clock_value": "2025-12-12T10:15:30Z"
|
||||
},
|
||||
|
||||
"normalization": {
|
||||
"canonicalizer_version": "1.2.0",
|
||||
"sbom_schema": "cyclonedx-1.6",
|
||||
"vex_schema": "cyclonedx-vex-1.0"
|
||||
}
|
||||
}
|
||||
```
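A minimal Go sketch of how a replay executor might load this manifest, covering only a subset of the fields above (enough to pin the scanner image, feeds, policy, and environment); it is not a full schema binding.

```go
package manifest

import (
	"encoding/json"
	"os"
)

// ScanManifest mirrors a subset of the ScanManifest v1 fields shown above.
type ScanManifest struct {
	ManifestVersion string `json:"manifest_version"`
	ScanID          string `json:"scan_id"`
	Scanner         struct {
		ScannerImageDigest string   `json:"scanner_image_digest"`
		Flags              []string `json:"flags"`
	} `json:"scanner"`
	Feeds struct {
		VulnFeedBundleDigest string `json:"vuln_feed_bundle_digest"`
	} `json:"feeds"`
	Policy struct {
		PolicyBundleDigest string `json:"policy_bundle_digest"`
	} `json:"policy"`
	Environment struct {
		TZ        string `json:"tz"`
		Locale    string `json:"locale"`
		Network   string `json:"network"`
		ClockMode string `json:"clock_mode"`
	} `json:"environment"`
}

// Load reads and parses a manifest file for `stella replay --from MANIFEST.json`.
func Load(path string) (*ScanManifest, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var m ScanManifest
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return &m, nil
}
```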
|
||||
|
||||
---
|
||||
|
||||
### 5.2 CLI spec (minimal)
|
||||
|
||||
* `stella scan ... --emit-manifest MANIFEST.json --emit-artifacts-dir out/`
|
||||
* `stella replay --from MANIFEST.json --out-dir replay_out/`
|
||||
* `stella diff --a out/ --b replay_out/ --mode bf|sf|pf --json`
|
||||
|
||||
---
|
||||
|
||||
## 6) Testing strategy (to prevent determinism regressions)
|
||||
|
||||
### Unit tests
|
||||
|
||||
* Canonicalization: same object → same bytes
|
||||
* Sorting stability: randomized input order → stable output
|
||||
* Hash determinism
|
||||
|
||||
### Integration tests
|
||||
|
||||
* Golden canaries (a double-run test sketch follows this list):
|
||||
|
||||
* run scan twice in same runner → BF match
|
||||
* replay from manifest → BF match
|
||||
* “Network leak” test:
|
||||
|
||||
* DNS requests must be zero
|
||||
* “Clock leak” test:
|
||||
|
||||
* freeze time; ensure outputs do not include real timestamps
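A minimal Go test sketch of the double-run canary check, where `runScan` is a hypothetical helper that executes one canary in the pinned runtime and returns its canonical artifact hash; canary names are illustrative.

```go
package replay_test

import "testing"

// runScan is a placeholder for the harness call that scans one golden canary
// in the repro shell and returns the canonical SHA-256 of its artifacts.
func runScan(t *testing.T, canary string) string {
	t.Helper()
	// ... invoke the scanner in the pinned runtime and hash the canonical output ...
	return ""
}

// TestGoldenCanaryBitwiseMatch runs the same canary twice in the same runner
// and requires identical canonical hashes (BF match).
func TestGoldenCanaryBitwiseMatch(t *testing.T) {
	for _, canary := range []string{"alpine-base", "node-service"} { // illustrative canary names
		first := runScan(t, canary)
		second := runScan(t, canary)
		if first != second {
			t.Errorf("canary %s: hash mismatch: %s vs %s", canary, first, second)
		}
	}
}
```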
|
||||
|
||||
### Chaos tests
|
||||
|
||||
* Vary CPU count, run concurrency, run order → still BF match
|
||||
* Randomized scheduling / thread interleavings to find races
|
||||
|
||||
---
|
||||
|
||||
## 7) Operational policies (so it stays useful)
|
||||
|
||||
### Retention & cost controls
|
||||
|
||||
* Keep full artifacts for regulated scans (e.g., 1–7 years)
|
||||
* For non-regulated:
|
||||
|
||||
* keep manifests + canonical hashes long-term
|
||||
* expire heavy evidence blobs after N days
|
||||
* Compress large artifacts and dedupe by digest
|
||||
|
||||
### Alerting examples
|
||||
|
||||
* BF drops by ≥2% week-over-week (any major bucket) → warn
|
||||
* BF < 0.90 overall or regulated BF < 0.95 → page / block release
|
||||
|
||||
### Triage workflow
|
||||
|
||||
* Failed replay auto-creates a ticket with:
|
||||
|
||||
* manifest link
|
||||
* mismatch_reason
|
||||
* diff_summary
|
||||
* reproduction command
|
||||
|
||||
---
|
||||
|
||||
## 8) What “done” looks like (definition of success)
|
||||
|
||||
* Any customer/auditor can pick a scan from 6 months ago and you can:
|
||||
|
||||
1. retrieve manifest + feed bundle + policy bundle by digest
|
||||
2. replay in a pinned sandbox
|
||||
3. show BF/SF/PF results and diffs
|
||||
* Engineering sees drift quickly and can attribute it to feed vs scanner vs runtime.
|
||||
|
||||
---
|
||||
|
||||
If you want, I can also provide:
|
||||
|
||||
* a **Postgres DDL** for the tables above,
|
||||
* a **Prometheus metrics contract** (names + labels + example queries),
|
||||
* and a **diff_summary_json schema** that supports a UI “diff view” without reprocessing artifacts.
|
||||
|
||||
Here’s a quick, plain‑English idea you can use right away: **not all code diffs are equal**—some actually change what’s *reachable* at runtime (and thus security posture), while others just refactor internals. A “**Smart‑Diff**” pipeline flags only the diffs that open or close attack paths by combining (1) call‑stack traces, (2) dependency graphs, and (3) dataflow.
|
||||
|
||||
---
|
||||
|
||||
### Why this matters (background)
|
||||
|
||||
* Text diffs ≠ behavior diffs. A rename or refactor can look big in Git but do nothing to reachable flows from external entry points (HTTP, gRPC, CLI, message consumers).
|
||||
* Security triage gets noisy because scanners attach CVEs to all present packages, not to the code paths you can actually hit.
|
||||
* **Dataflow‑aware diffs** shrink noise and make VEX generation honest: “vuln present but **not exploitable** because the sink is unreachable from any policy‑defined entrypoint.”
|
||||
|
||||
---
|
||||
|
||||
### Minimal architecture (fits Stella Ops)
|
||||
|
||||
1. **Entrypoint map** (per service): controllers, handlers, consumers.
|
||||
2. **Call graph + dataflow** (per commit): Roslyn for C#, `golang.org/x/tools/go/callgraph` for Go, plus taint rules (source→sink).
|
||||
3. **Reachability cache** keyed by (commit, entrypoint, package@version).
|
||||
4. **Smart‑Diff** = `reachable_paths(commit_B) – reachable_paths(commit_A)`.
|
||||
|
||||
* If a path to a sensitive sink is newly reachable → **High**.
|
||||
* If a path disappears → auto‑generate **VEX “not affected (no reachable path)”**.
|
||||
|
||||
---
|
||||
|
||||
### Tiny working seeds
|
||||
|
||||
**C# (.NET 10) — Roslyn skeleton to diff call‑reachability**
|
||||
|
||||
```csharp
// SmartDiff.csproj targets net10.0; call MSBuildLocator.RegisterDefaults() at startup before opening solutions.
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.FindSymbols;
using Microsoft.CodeAnalysis.MSBuild;

public static class SmartDiff
{
    public static async Task<HashSet<string>> ReachableSinks(string solutionPath, string[] entrypoints, string[] sinks)
    {
        var workspace = MSBuildWorkspace.Create();
        var solution = await workspace.OpenSolutionAsync(solutionPath);
        var index = new HashSet<string>();

        foreach (var proj in solution.Projects)
        {
            var comp = await proj.GetCompilationAsync();
            if (comp is null) continue;

            // Resolve entrypoints & sinks by fully-qualified display name
            var epSymbols = comp.GlobalNamespace.GetMembers().SelectMany(Descend)
                .OfType<IMethodSymbol>().Where(m => entrypoints.Contains(m.ToDisplayString())).ToList();
            var sinkSymbols = comp.GlobalNamespace.GetMembers().SelectMany(Descend)
                .OfType<IMethodSymbol>().Where(m => sinks.Contains(m.ToDisplayString())).ToList();

            foreach (var ep in epSymbols)
            foreach (var sink in sinkSymbols)
            {
                // Heuristic reachability: cheap reference check via SymbolFinder
                var refs = await SymbolFinder.FindReferencesAsync(sink, solution);
                if (refs.SelectMany(r => r.Locations).Any()) // replace with a real graph walk
                    index.Add($"{ep.ToDisplayString()} -> {sink.ToDisplayString()}");
            }
        }
        return index;

        // Walk all namespaces/types to enumerate candidate method symbols
        static IEnumerable<ISymbol> Descend(INamespaceOrTypeSymbol sym)
        {
            foreach (var m in sym.GetMembers())
            {
                yield return m;
                if (m is INamespaceOrTypeSymbol nt) foreach (var x in Descend(nt)) yield return x;
            }
        }
    }
}
```
|
||||
|
||||
**Go — SSA & callgraph seed**
|
||||
|
||||
```go
// go.mod: require golang.org/x/tools (latest)
package main

import (
	"fmt"
	"log"

	"golang.org/x/tools/go/callgraph/cha"
	"golang.org/x/tools/go/packages"
	"golang.org/x/tools/go/ssa"
	"golang.org/x/tools/go/ssa/ssautil"
)

func main() {
	// Load every package in the module with full syntax + type info.
	cfg := &packages.Config{Mode: packages.LoadAllSyntax, Tests: false}
	pkgs, err := packages.Load(cfg, "./...")
	if err != nil {
		log.Fatal(err)
	}

	// Build SSA for all loaded packages, then the whole program.
	prog, _ := ssautil.AllPackages(pkgs, ssa.BuilderMode(0))
	prog.Build()

	// Conservative CHA callgraph; swap in pointer analysis later for interface precision.
	cg := cha.CallGraph(prog)
	// TODO: map entrypoints & sinks, then walk cg from EPs to sinks
	fmt.Println("nodes:", len(cg.Nodes))
}
```
|
||||
|
||||
---
|
||||
|
||||
### How to use it in your pipeline (fast win)
|
||||
|
||||
* **Pre‑merge job**:
|
||||
|
||||
1. Build call graph for `HEAD` and `HEAD^`.
|
||||
2. Compute Smart‑Diff.
|
||||
3. If any *new* EP→sink path appears, fail with a short, proof‑linked note:
|
||||
“New reachable path: `POST /Invoices -> PdfExporter.Save(string path)` (writes outside sandbox).”
|
||||
* **Post‑scan VEX**:
|
||||
|
||||
* For each CVE on a package, mark **Affected** only if any EP can reach a symbol that uses that package’s vulnerable surface.
|
||||
|
||||
---
|
||||
|
||||
### Evidence to show in the UI
|
||||
|
||||
* “**Path card**”: EP → … → Sink, with file:line hop‑list and commit hash.
|
||||
* “**What changed**”: before/after path diff (green removed, red added).
|
||||
* “**Why it matters**”: sink classification (network write, file write, deserialization, SQL, crypto).
|
||||
|
||||
---
|
||||
|
||||
### Developer checklist (Stella Ops style)
|
||||
|
||||
* [ ] Define entrypoints per service (attribute or YAML).
|
||||
* [ ] Define sink taxonomy (FS, NET, DESER, SQL, CRYPTO).
|
||||
* [ ] Implement language adapters: `.NET (Roslyn)`, `Go (SSA)`, later `Java (Soot/WALA)`.
|
||||
* [ ] Add a **ReachabilityCache** (Postgres table keyed by commit+lang+service).
|
||||
* [ ] Wire a `SmartDiffJob` in CI; emit SARIF + CycloneDX `vulnerability-assertions` extension or OpenVEX.
|
||||
* [ ] Gate merges on **newly‑reachable sensitive sinks**; auto‑VEX when paths disappear.
|
||||
|
||||
If you want, I can turn this into a small repo scaffold (Roslyn + Go adapters, Postgres schema, a GitLab/GitHub pipeline, and a minimal UI “path card”).
|
||||
Below is a concrete **development implementation plan** to take the “Smart‑Diff” idea (reachability + dataflow + dependency/vuln context) into a shippable product integrated into your pipeline (Stella Ops style). I’ll assume the initial languages are **.NET (C#)** and **Go**, and the initial goal is **PR gating + VEX automation** with strong evidence (paths + file/line hops).
|
||||
|
||||
---
|
||||
|
||||
## 1) Product definition
|
||||
|
||||
### Problem you’re solving
|
||||
|
||||
Security noise comes from:
|
||||
|
||||
* “Vuln exists in dependency” ≠ “vuln exploitable from any entrypoint”
|
||||
* Git diffs look big even when behavior is unchanged
|
||||
* Teams struggle to triage “is this change actually risky?”
|
||||
|
||||
### What Smart‑Diff should do (core behavior)
|
||||
|
||||
Given **base commit A** and **head commit B**:
|
||||
|
||||
1. Identify **entrypoints** (web handlers, RPC methods, message consumers, CLI commands).
|
||||
2. Identify **sinks** (file write, command exec, SQL, SSRF, deserialization, crypto misuse, templating, etc.).
|
||||
3. Compute **reachable paths** from entrypoints → sinks (call graph + dataflow/taint).
|
||||
4. Emit **Smart‑Diff**:
|
||||
|
||||
* **Newly reachable** EP→sink paths (risk ↑)
|
||||
* **Removed** EP→sink paths (risk ↓)
|
||||
* **Changed** paths (same sink but different sanitization/guards)
|
||||
5. Attach **dependency vulnerability context**:
|
||||
|
||||
* If a vulnerable API surface is reachable (or data reaches it), mark “affected/exploitable”
|
||||
* Otherwise generate **VEX**: “not affected” / “not exploitable” with evidence
|
||||
|
||||
### MVP definition (minimum shippable)
|
||||
|
||||
A PR check that:
|
||||
|
||||
* Flags **new** reachable paths to a small set of high‑risk sinks (e.g., command exec, unsafe deserialization, filesystem write, SSRF/network dial, raw SQL).
|
||||
* Produces:
|
||||
|
||||
* SARIF report (for code scanning UI)
|
||||
* JSON artifact containing proof paths (EP → … → sink with file:line)
|
||||
* Optional VEX statement for dependency vulnerabilities (if you already have an SCA feed)
|
||||
|
||||
---
|
||||
|
||||
## 2) Architecture you can actually build
|
||||
|
||||
### High‑level components
|
||||
|
||||
1. **Policy & Taxonomy Service**
|
||||
|
||||
* Defines entrypoints, sources, sinks, sanitizers, confidence rules
|
||||
* Versioned and centrally managed (but supports repo overrides)
|
||||
|
||||
2. **Analyzer Workers (language adapters)**
|
||||
|
||||
* .NET analyzer (Roslyn + control flow)
|
||||
* Go analyzer (SSA + callgraph)
|
||||
* Outputs standardized IR (Intermediate Representation)
|
||||
|
||||
3. **Graph Store + Reachability Engine**
|
||||
|
||||
* Stores symbol nodes + call edges + dataflow edges
|
||||
* Computes reachable sinks per entrypoint
|
||||
* Computes diff between commits A and B
|
||||
|
||||
4. **Vulnerability Mapper + VEX Generator**
|
||||
|
||||
* Maps vulnerable packages/functions → “surfaces”
|
||||
* Joins with reachability results
|
||||
* Emits OpenVEX (or CycloneDX VEX) with evidence links
|
||||
|
||||
5. **CI/PR Integrations**
|
||||
|
||||
* CLI that runs in CI
|
||||
* Optional server mode (cache + incremental processing)
|
||||
|
||||
6. **UI/API**
|
||||
|
||||
* Path cards: “what changed”, “why it matters”, “proof”
|
||||
* Filters by sink class, confidence, service, entrypoint
|
||||
|
||||
### Data contracts (standardized IR)
|
||||
|
||||
Make every analyzer output the same shapes so the rest of the pipeline is language‑agnostic:
|
||||
|
||||
* **Symbols**
|
||||
|
||||
* `symbol_id`: stable hash of (lang, module, fully-qualified name, signature)
|
||||
* metadata: file, line ranges, kind (method/function), accessibility
|
||||
|
||||
* **Edges**
|
||||
|
||||
* Call edge: `caller_symbol_id -> callee_symbol_id`
|
||||
* Dataflow edge: `source_symbol_id -> sink_symbol_id` with variable/parameter traces
|
||||
* Edge metadata: type, confidence, reason (static, reflection guess, interface dispatch, etc.)
|
||||
|
||||
* **Entrypoints / Sources / Sinks**
|
||||
|
||||
* entrypoint: (symbol_id, route/topic/command metadata)
|
||||
* sink: (symbol_id, sink_type, severity, cwe mapping optional)
|
||||
|
||||
* **Paths**
|
||||
|
||||
* `entrypoint -> ... -> sink`
|
||||
* hop list: symbol_id + file:line, plus “dataflow step evidence” when relevant
|
||||
|
||||
---
|
||||
|
||||
## 3) Workstreams and deliverables
|
||||
|
||||
### Workstream A — Policy, taxonomy, configuration
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* `smartdiff.policy.yaml` schema and validator
|
||||
* A default sink taxonomy:
|
||||
|
||||
* `CMD_EXEC`, `UNSAFE_DESER`, `SQL_RAW`, `SSRF`, `FILE_WRITE`, `PATH_TRAVERSAL`, `TEMPLATE_INJECTION`, `CRYPTO_WEAK`, `AUTHZ_BYPASS` (expand later)
|
||||
* Initial sanitizer patterns:
|
||||
|
||||
* For example: parameter validation, safe deserialization wrappers, ORM parameterized APIs, path normalization, allowlists
|
||||
|
||||
**Implementation notes**
|
||||
|
||||
* Start strict and small: 10–20 sinks, 10 sources, 10 sanitizers.
|
||||
* Provide repo-level overrides:
|
||||
|
||||
* `smartdiff.policy.yaml` in repo root
|
||||
* Central policies referenced by version tag
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* A service can onboard by configuring:
|
||||
|
||||
* entrypoint discovery mode (auto + manual)
|
||||
* sink classes to enforce
|
||||
* severity threshold to fail PR
|
||||
|
||||
---
|
||||
|
||||
### Workstream B — .NET analyzer (Roslyn)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Build pipeline that produces:
|
||||
|
||||
* call graph (methods and invocations)
|
||||
* basic control-flow guards for reachability (optional for MVP)
|
||||
* taint propagation for common patterns (MVP: parameter → sink)
|
||||
* Entry point discovery for:
|
||||
|
||||
* ASP.NET controllers (`[HttpGet]`, `[HttpPost]`)
|
||||
* Minimal APIs (`MapGet/MapPost`)
|
||||
* gRPC service methods
|
||||
* message consumers (configurable attributes/interfaces)
|
||||
|
||||
**Implementation notes (practical path)**
|
||||
|
||||
* MVP static callgraph:
|
||||
|
||||
* Use Roslyn semantic model to resolve invocation targets
|
||||
* For virtual/interface calls: conservative resolution to possible implementations within the compilation
|
||||
* MVP taint:
|
||||
|
||||
* “Sources”: request params/body, headers, query string, message payloads
|
||||
* “Sinks”: wrappers around `Process.Start`, `SqlCommand`, `File.WriteAllText`, `HttpClient.Send`, deserializers, etc.
|
||||
* Propagate taint across:
|
||||
|
||||
* parameter → local → argument
|
||||
* return values
|
||||
* simple assignments and concatenations (heuristic)
|
||||
* Confidence scoring:
|
||||
|
||||
* Direct static call resolution: high
|
||||
* Reflection/dynamic: low (flag separately)
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* On a demo ASP.NET service, if a PR adds:
|
||||
|
||||
* `HttpPost /upload` → `File.WriteAllBytes(userPath, ...)`
|
||||
Smart‑Diff flags **new EP→FILE_WRITE path** and shows hops with file/line.
|
||||
|
||||
---
|
||||
|
||||
### Workstream C — Go analyzer (SSA)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* SSA build + callgraph extraction
|
||||
* Entrypoint discovery for:
|
||||
|
||||
* `net/http` handlers
|
||||
* common routers (Gin/Echo/Chi) via adapter rules
|
||||
* gRPC methods
|
||||
* consumers (Kafka/NATS/etc.) by config
|
||||
|
||||
**Implementation notes**
|
||||
|
||||
* Use `golang.org/x/tools/go/packages` + `ssa` build
|
||||
* Callgraph:
|
||||
|
||||
* start with CHA (Class Hierarchy Analysis) for speed
|
||||
* later add pointer analysis for precision on interfaces
|
||||
* Taint:
|
||||
|
||||
* sources: `http.Request`, router params, message payloads
|
||||
* sinks: `os/exec`, `database/sql` raw query, file I/O, `net/http` outbound, unsafe deserialization libs
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* A PR that adds `exec.Command(req.FormValue("cmd"))` becomes a **new EP→CMD_EXEC** finding.
|
||||
|
||||
---
|
||||
|
||||
### Workstream D — Graph store + reachability computation
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Schema in Postgres (recommended first) for:
|
||||
|
||||
* commits, services, languages
|
||||
* symbols, edges, entrypoints, sinks
|
||||
* computed reachable “facts” (entrypoint→sink with shortest path(s))
|
||||
* Reachability engine:
|
||||
|
||||
* BFS/DFS per entrypoint with early cutoffs (a BFS sketch follows this list)
|
||||
* path reconstruction storage (store predecessor map or store k-shortest paths)
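A minimal Go sketch of that BFS with predecessor tracking, assuming call/dataflow edges are loaded into an adjacency map keyed by `symbol_id`; the depth cutoff and single-path reconstruction are simplifications of the k-shortest-paths idea.

```go
package reach

// ReachableWithPath walks the call/dataflow graph breadth-first from one
// entrypoint and returns a proof path (hop list of symbol_ids) to the first
// sink found, or nil if no sink is reachable within maxDepth.
func ReachableWithPath(adj map[string][]string, entrypoint string, sinks map[string]bool, maxDepth int) []string {
	type item struct {
		node  string
		depth int
	}
	pred := map[string]string{} // child -> parent, for path reconstruction
	visited := map[string]bool{entrypoint: true}
	queue := []item{{entrypoint, 0}}

	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if sinks[cur.node] {
			// Reconstruct entrypoint -> ... -> sink from the predecessor map.
			var path []string
			for n := cur.node; n != ""; n = pred[n] {
				path = append([]string{n}, path...)
				if n == entrypoint {
					break
				}
			}
			return path
		}
		if cur.depth >= maxDepth { // early cutoff
			continue
		}
		for _, next := range adj[cur.node] {
			if !visited[next] {
				visited[next] = true
				pred[next] = cur.node
				queue = append(queue, item{next, cur.depth + 1})
			}
		}
	}
	return nil
}
```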
|
||||
|
||||
**Implementation notes**
|
||||
|
||||
* Don’t start with a graph DB unless you must.
|
||||
* Use Postgres tables + indexes:
|
||||
|
||||
* `edges(from_symbol, to_symbol, commit_id, kind)`
|
||||
* `symbols(symbol_id, lang, module, fqn, file, line_start, line_end)`
|
||||
* `reachability(entrypoint_id, sink_id, commit_id, path_hash, confidence, severity, evidence_json)`
|
||||
* Cache:
|
||||
|
||||
* keyed by (commit, policy_version, analyzer_version)
|
||||
* avoids recompute on re-runs
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* For any analyzed commit, you can answer:
|
||||
|
||||
* “Which sinks are reachable from these entrypoints?”
|
||||
* “Show me one proof path per (entrypoint, sink_type).”
|
||||
|
||||
---
|
||||
|
||||
### Workstream E — Smart‑Diff engine (the “diff” part)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Diff algorithm producing three buckets:
|
||||
|
||||
* `added_paths`, `removed_paths`, `changed_paths`
|
||||
* “Changed” means:
|
||||
|
||||
* same entrypoint + sink type, but path differs OR taint/sanitization differs OR confidence changes
|
||||
|
||||
**Implementation notes**
|
||||
|
||||
* Identify a path by a stable fingerprint (a fingerprint/diff sketch follows these notes):
|
||||
|
||||
* `path_id = hash(entrypoint_symbol + sink_symbol + sink_type + policy_version + analyzer_version)`
|
||||
* Store:
|
||||
|
||||
* top-k paths for each pair for evidence (k=1 for MVP, add more later)
|
||||
* Severity gating rules:
|
||||
|
||||
* Example:
|
||||
|
||||
* New path to `CMD_EXEC` = fail
|
||||
* New path to `FILE_WRITE` = warn unless under `/tmp` allowlist
|
||||
* New path to `SQL_RAW` = fail unless parameterized sanitizer present
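A minimal Go sketch of the fingerprint and the added/removed bucketing, assuming reachability facts for base and head commits are already loaded; the `PathFact` shape is illustrative rather than the real schema.

```go
package smartdiff

import (
	"crypto/sha256"
	"fmt"
)

// PathFact is one reachable entrypoint->sink fact for a given commit (illustrative shape).
type PathFact struct {
	Entrypoint, Sink, SinkType     string
	PolicyVersion, AnalyzerVersion string
}

// Fingerprint gives the stable path_id described above.
func (p PathFact) Fingerprint() string {
	h := sha256.Sum256([]byte(p.Entrypoint + "|" + p.Sink + "|" + p.SinkType + "|" + p.PolicyVersion + "|" + p.AnalyzerVersion))
	return fmt.Sprintf("%x", h[:])
}

// Diff buckets head facts against base facts into added and removed paths.
func Diff(base, head []PathFact) (added, removed []PathFact) {
	baseIDs := map[string]bool{}
	for _, p := range base {
		baseIDs[p.Fingerprint()] = true
	}
	headIDs := map[string]bool{}
	for _, p := range head {
		headIDs[p.Fingerprint()] = true
		if !baseIDs[p.Fingerprint()] {
			added = append(added, p)
		}
	}
	for _, p := range base {
		if !headIDs[p.Fingerprint()] {
			removed = append(removed, p)
		}
	}
	return added, removed
}
```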
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Given commits A and B:
|
||||
|
||||
* If B introduces a new reachable sink, CI fails with a single actionable card:
|
||||
|
||||
* **EP**: route / handler
|
||||
* **Sink**: type + symbol
|
||||
* **Proof**: hop list
|
||||
* **Why**: policy rule triggered
|
||||
|
||||
---
|
||||
|
||||
### Workstream F — Vulnerability mapping + VEX
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Ingest dependency inventory (SBOM or lockfiles)
|
||||
* Map vulnerabilities to “surfaces”
|
||||
|
||||
* package → vulnerable module/function patterns
|
||||
* minimal version/range matching (from your existing vuln feed)
|
||||
* Decision logic (a decision sketch follows this list):
|
||||
|
||||
* **Affected** if any reachable path intersects vulnerable surface OR dataflow reaches vulnerable sink
|
||||
* else **Not affected / Not exploitable** with justification
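A minimal Go sketch of that decision logic at package-level granularity, using a simplified statement shape rather than the full OpenVEX schema; `reachablePaths` is assumed to map a package purl to the path_ids that reach it.

```go
package vexgen

// Statement is a simplified VEX-style decision for one (vulnerability, package) pair;
// a real emitter would map this onto OpenVEX or CycloneDX VEX documents.
type Statement struct {
	VulnID        string
	Purl          string
	Status        string // "affected" | "not_affected"
	Justification string
	EvidencePaths []string // path_ids proving (or disproving) reachability
}

// Decide marks the vulnerability affected only when some symbol of the
// vulnerable package is reachable from an entrypoint (package-level MVP).
func Decide(vulnID, purl string, reachablePaths map[string][]string) Statement {
	if paths, ok := reachablePaths[purl]; ok && len(paths) > 0 {
		return Statement{VulnID: vulnID, Purl: purl, Status: "affected", EvidencePaths: paths}
	}
	return Statement{
		VulnID:        vulnID,
		Purl:          purl,
		Status:        "not_affected",
		Justification: "vulnerable_code_not_in_execute_path", // wording illustrative
	}
}
```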
|
||||
|
||||
**Implementation notes**
|
||||
|
||||
* Start with a pragmatic approach:
|
||||
|
||||
* package‑level reachability: “is any symbol in that package reachable?”
|
||||
* then iterate toward function‑level surfaces
|
||||
* VEX output:
|
||||
|
||||
* include commit hash, policy version, evidence paths
|
||||
* embed links to internal “path card” URLs if available
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* For a known vulnerable dependency, the system emits:
|
||||
|
||||
* VEX “not affected” if package code is never reached from any entrypoint, with proof references.
|
||||
|
||||
---
|
||||
|
||||
### Workstream G — CI integration + developer UX
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* A single CLI:
|
||||
|
||||
* `smartdiff analyze --commit <sha> --service <svc> --lang <dotnet|go>`
|
||||
* `smartdiff diff --base <shaA> --head <shaB> --out sarif`
|
||||
* CI templates for:
|
||||
|
||||
* GitHub Actions / GitLab CI
|
||||
* Outputs:
|
||||
|
||||
* SARIF
|
||||
* JSON evidence bundle
|
||||
* optional OpenVEX file
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Teams can enable Smart‑Diff by adding:
|
||||
|
||||
* CI job + config file
|
||||
* no additional infra required for MVP (local artifacts mode)
|
||||
* When infra is available, enable server caching mode for speed.
|
||||
|
||||
---
|
||||
|
||||
### Workstream H — UI “Path Cards”
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* UI components:
|
||||
|
||||
* Path card list with filters (sink type, severity, confidence)
|
||||
* “What changed” diff view:
|
||||
|
||||
* red = added hops
|
||||
* green = removed hops
|
||||
* “Evidence” panel:
|
||||
|
||||
* file:line for each hop
|
||||
* code snippets (optional)
|
||||
* APIs:
|
||||
|
||||
* `GET /smartdiff/{repo}/{pr}/findings`
|
||||
* `GET /smartdiff/{repo}/{commit}/path/{path_id}`
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* A developer can click one finding and understand:
|
||||
|
||||
* how the data got there
|
||||
* exactly what line introduced the risk
|
||||
* how to fix (sanitize/guard/allowlist)
|
||||
|
||||
---
|
||||
|
||||
## 4) Milestone plan (sequenced, no time promises)
|
||||
|
||||
### Milestone 0 — Foundation
|
||||
|
||||
* Repo scaffolding:
|
||||
|
||||
* `smartdiff-cli/`
|
||||
* `analyzers/dotnet/`
|
||||
* `analyzers/go/`
|
||||
* `core-ir/` (schemas + validation)
|
||||
* `server/` (optional; can come later)
|
||||
* Define IR JSON schema + versioning rules
|
||||
* Implement policy YAML + validator + sample policies
|
||||
* Implement “local mode” artifact output
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* You can run `smartdiff analyze` and get a valid IR file for at least one trivial repo.
|
||||
|
||||
---
|
||||
|
||||
### Milestone 1 — Callgraph reachability MVP
|
||||
|
||||
* .NET: build call edges + entrypoint discovery (basic)
|
||||
* Go: build call edges + entrypoint discovery (basic)
|
||||
* Graph store: in-memory or local sqlite/postgres
|
||||
* Compute reachable sinks (callgraph only, no taint)
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* On a demo repo, you can list:
|
||||
|
||||
* entrypoints
|
||||
* reachable sinks (callgraph reachability only)
|
||||
* a proof path (hop list)
|
||||
|
||||
---
|
||||
|
||||
### Milestone 2 — Smart‑Diff MVP (PR gating)
|
||||
|
||||
* Compute diff between base/head reachable sink sets
|
||||
* Produce SARIF with:
|
||||
|
||||
* rule id = sink type
|
||||
* message includes entrypoint + sink + link to evidence JSON
|
||||
* CI templates + documentation
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* In PR checks, the job fails on new EP→sink paths and links to a proof.
|
||||
|
||||
---
|
||||
|
||||
### Milestone 3 — Taint/dataflow MVP (high-value sinks only)
|
||||
|
||||
* Add taint propagation to reduce false positives:
|
||||
|
||||
* differentiate “sink reachable” vs “untrusted data reaches sink”
|
||||
* Add sanitizer recognition
|
||||
* Add confidence scoring + suppression mechanisms (policy allowlists)
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* A sink is only “high severity” if it is both reachable and tainted (or policy says otherwise).
|
||||
|
||||
---
|
||||
|
||||
### Milestone 4 — VEX integration MVP
|
||||
|
||||
* Join reachability with dependency vulnerabilities
|
||||
* Emit OpenVEX (and/or CycloneDX VEX)
|
||||
* Store evidence references (paths) inside VEX justification
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* For a repo with a vulnerable dependency, you can automatically produce:
|
||||
|
||||
* affected/not affected with evidence.
|
||||
|
||||
---
|
||||
|
||||
### Milestone 5 — Scale and precision improvements
|
||||
|
||||
* Incremental analysis (only analyze changed projects/packages)
|
||||
* Better dynamic dispatch handling (Go pointer analysis, .NET interface dispatch expansion)
|
||||
* Optional runtime telemetry integration:
|
||||
|
||||
* import production traces to prioritize “actually observed” entrypoints
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Works on large services with acceptable run time and stable noise levels.
|
||||
|
||||
---
|
||||
|
||||
## 5) Backlog you can paste into Jira (epics + key stories)
|
||||
|
||||
### Epic: Policy & taxonomy
|
||||
|
||||
* Story: Define `smartdiff.policy.yaml` schema and validator
|
||||
**AC:** invalid configs fail with clear errors; configs are versioned.
|
||||
* Story: Provide default sink list and severities
|
||||
**AC:** at least 10 sink rules with test cases.
|
||||
|
||||
### Epic: .NET analyzer
|
||||
|
||||
* Story: Resolve method invocations to symbols (Roslyn)
|
||||
**AC:** correct targets for direct calls; conservative handling for virtual calls.
|
||||
* Story: Discover ASP.NET routes and bind to entrypoint symbols
|
||||
**AC:** entrypoints include route/method metadata.
|
||||
|
||||
### Epic: Go analyzer
|
||||
|
||||
* Story: SSA build and callgraph extraction
|
||||
**AC:** function nodes and edges generated for a multi-package repo.
|
||||
* Story: net/http entrypoint discovery
|
||||
**AC:** handler functions recognized as entrypoints with path labels.
|
||||
|
||||
### Epic: Reachability engine
|
||||
|
||||
* Story: Compute reachable sinks per entrypoint
|
||||
**AC:** store at least one path with hop list.
|
||||
* Story: Smart‑Diff A vs B
|
||||
**AC:** added/removed paths computed deterministically.
|
||||
|
||||
### Epic: CI/SARIF
|
||||
|
||||
* Story: Emit SARIF results
|
||||
**AC:** findings appear in code scanning UI; include file/line.
|
||||
|
||||
### Epic: Taint analysis
|
||||
|
||||
* Story: Propagate taint from request to sink for 3 sink classes
|
||||
**AC:** produces “tainted” evidence with a variable/argument trace.
|
||||
* Story: Sanitizer recognition
|
||||
**AC:** path marked “sanitized” and downgraded per policy.
|
||||
|
||||
### Epic: VEX
|
||||
|
||||
* Story: Generate OpenVEX statements from reachability + vuln feed
|
||||
**AC:** for “not affected” includes justification and evidence references.
|
||||
|
||||
---
|
||||
|
||||
## 6) Key engineering decisions (recommended defaults)
|
||||
|
||||
### Storage
|
||||
|
||||
* Start with **Postgres** (or even local sqlite for MVP) for simplicity.
|
||||
* Introduce a graph DB only if:
|
||||
|
||||
* you need very large multi-commit graph queries at low latency
|
||||
* Postgres performance becomes a hard blocker
|
||||
|
||||
### Confidence model
|
||||
|
||||
Every edge/path should carry:
|
||||
|
||||
* `confidence`: High/Med/Low
|
||||
* `reasons`: e.g., `DirectCall`, `InterfaceDispatch`, `ReflectionGuess`, `RouterHeuristic`
|
||||
This lets you:
|
||||
* gate only on high-confidence paths in early rollout
|
||||
* keep low-confidence as “informational”
|
||||
|
||||
### Suppression model
|
||||
|
||||
* Local suppressions:
|
||||
|
||||
* `smartdiff.suppress.yaml` with rule id + symbol id + reason + expiry
|
||||
* Policy allowlists:
|
||||
|
||||
* allow file writes only under certain directories
|
||||
* allow outbound network only to configured domains
|
||||
|
||||
---
|
||||
|
||||
## 7) Testing strategy (to avoid “cool demo, unusable tool”)
|
||||
|
||||
### Unit tests
|
||||
|
||||
* Symbol hashing stability tests
|
||||
* Call resolution tests:
|
||||
|
||||
* overloads, generics, interfaces, lambdas
|
||||
* Policy parsing/validation tests
|
||||
|
||||
### Integration tests (must-have)
|
||||
|
||||
* Golden repos in `testdata/`:
|
||||
|
||||
* one ASP.NET minimal API
|
||||
* one MVC controller app
|
||||
* one Go net/http + one Gin app
|
||||
* Golden outputs:
|
||||
|
||||
* expected entrypoints
|
||||
* expected reachable sinks
|
||||
* expected diff between commits
|
||||
|
||||
### Regression tests
|
||||
|
||||
* A curated corpus of “known issues”:
|
||||
|
||||
* false positives you fixed should never return
|
||||
* false negatives: ensure known risky path is always found
|
||||
|
||||
### Performance tests
|
||||
|
||||
* Measure:
|
||||
|
||||
* analysis time per 50k LOC
|
||||
* memory peak
|
||||
* graph size
|
||||
* Budget enforcement:
|
||||
|
||||
* if over budget, degrade gracefully (lower precision, mark low confidence)
|
||||
|
||||
---
|
||||
|
||||
## 8) Example configs and outputs (to make onboarding easy)
|
||||
|
||||
### Example policy YAML (minimal)
|
||||
|
||||
```yaml
|
||||
version: 1
|
||||
service: invoices-api
|
||||
entrypoints:
|
||||
autodiscover:
|
||||
dotnet:
|
||||
aspnet: true
|
||||
go:
|
||||
net_http: true
|
||||
|
||||
sinks:
|
||||
- type: CMD_EXEC
|
||||
severity: high
|
||||
match:
|
||||
dotnet:
|
||||
symbols:
|
||||
- "System.Diagnostics.Process.Start(string)"
|
||||
go:
|
||||
symbols:
|
||||
- "os/exec.Command"
|
||||
- type: FILE_WRITE
|
||||
severity: medium
|
||||
match:
|
||||
dotnet:
|
||||
namespaces: ["System.IO"]
|
||||
go:
|
||||
symbols: ["os.WriteFile"]
|
||||
|
||||
gating:
|
||||
fail_on:
|
||||
- sink_type: CMD_EXEC
|
||||
when: "added && confidence >= medium"
|
||||
- sink_type: FILE_WRITE
|
||||
when: "added && tainted && confidence >= medium"
|
||||
```
|
||||
|
||||
### Evidence JSON shape (what the UI consumes)
|
||||
|
||||
```json
|
||||
{
|
||||
"commit": "abc123",
|
||||
"entrypoint": {"symbol": "InvoicesController.Upload()", "route": "POST /upload"},
|
||||
"sink": {"type": "FILE_WRITE", "symbol": "System.IO.File.WriteAllBytes"},
|
||||
"confidence": "high",
|
||||
"tainted": true,
|
||||
"path": [
|
||||
{"symbol": "InvoicesController.Upload()", "file": "Controllers/InvoicesController.cs", "line": 42},
|
||||
{"symbol": "UploadService.Save()", "file": "Services/UploadService.cs", "line": 18},
|
||||
{"symbol": "System.IO.File.WriteAllBytes", "file": null, "line": null}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 9) Risks and mitigations (explicit)
|
||||
|
||||
1. **Dynamic behavior (reflection, DI, router magic)**
|
||||
|
||||
* Mitigation: conservative fallbacks + confidence labels + optional runtime traces later
|
||||
|
||||
2. **Noise from huge callgraphs**
|
||||
|
||||
* Mitigation: sink-first slicing (compute reachability backwards from sinks), entrypoint scoping, k‑shortest paths only
|
||||
|
||||
3. **Large repo build failures**
|
||||
|
||||
* Mitigation: analyzer runs inside build containers; allow partial analysis with explicit “incomplete” result flag
|
||||
|
||||
4. **Teams rejecting gating**
|
||||
|
||||
* Mitigation: staged rollout:
|
||||
|
||||
* Observe-only mode → warn-only → fail-only for high-confidence CMD_EXEC/UNSAFE_DESER
|
||||
|
||||
---
|
||||
|
||||
## 10) Definition of done (what “implemented” means)
|
||||
|
||||
You should consider Smart‑Diff “implemented” when:
|
||||
|
||||
* A repo can enable it with one config + one CI job.
|
||||
* PRs get:
|
||||
|
||||
* a small number of **actionable** findings (not hundreds)
|
||||
* each finding has a proof path with file/line hops
|
||||
* It reliably detects at least:
|
||||
|
||||
* new command execution paths
|
||||
* new unsafe deserialization paths
|
||||
* new tainted filesystem write paths
|
||||
* It can optionally emit VEX decisions backed by reachability evidence.
|
||||
|
||||
---
|
||||
|
||||
If you want the next step, I can also give you:
|
||||
|
||||
* a **concrete repo layout** with module boundaries,
|
||||
* the **Postgres schema** (tables + indexes),
|
||||
* and a **language adapter interface** (so adding Java/Python later is straightforward).
|
||||