Here’s a simple, practical way to score vulnerabilities that’s more auditable than plain CVSS: build a **deterministic score** from three reproducible inputs—**Reachability**, **Evidence**, and **Provenance**—so every number is explainable and replayable.

---

### Why move beyond CVSS?

* **CVSS is context-light**: it rarely knows *your* call paths, configs, or runtime.
* **Audits need proof**: regulators and customers increasingly ask, “show me how you got this score.”
* **Teams need consistency**: the same image should get the same score across environments when inputs are identical.

---
### The scoring idea (plain English)

Score = a weighted function of:

1. **Reachability Depth (R)** — how close the vulnerable function is to a real entry point in *your* app (e.g., public HTTP route → handler → library call).
2. **Evidence Density (E)** — how much concrete proof you have (stack traces, symbol hits, config toggles, feature flags, SCA vs. SAST vs. DAST vs. runtime).
3. **Provenance Integrity (P)** — how trustworthy the artifact chain is (signed SBOM, DSSE attestations, SLSA/Rekor entries, reproducible build match).

A compact, auditable formula you can start with:

```
NormalizedScore = W_R * f(R) + W_E * g(E) + W_P * h(P)
```

* Pick monotonic, bounded transforms (e.g., map to 0..1):

  * f(R): inverse of hops (shorter path ⇒ higher value)
  * g(E): weighted sum of evidence types (runtime > DAST > SAST > SCA, with decay for stale data)
  * h(P): cryptographic/provenance checks (unsigned < signed < signed+attested < signed+attested+reproducible)

Keep **W_R + W_E + W_P = 1** (e.g., 0.5, 0.35, 0.15 for reachability-first triage).
---

### What makes this “deterministic”?

* Inputs are **machine-replayable**: call-graph JSON, evidence bundle (hashes + timestamps), provenance attestations.
* The score is **purely a function of those inputs**, so anyone can recompute it later and match your result byte-for-byte.

---

### Minimal rubric (ready to implement)

* **Reachability (R, 0..1)**

  * 1.00 = vulnerable symbol called on a hot path from a public route (≤3 hops)
  * 0.66 = reachable but behind uncommon feature flag or deep path (4–7 hops)
  * 0.33 = only theoretically reachable (code present, no discovered path)
  * 0.00 = dead/unreferenced code in this build
* **Evidence (E, 0..1)** (sum, capped at 1.0)

  * +0.6 runtime trace hitting the symbol
  * +0.3 DAST/integ test activating vulnerable behavior
  * +0.2 SAST precise sink match
  * +0.1 SCA presence only (no call evidence)
  * (Apply 10–30% decay if older than N days)
* **Provenance (P, 0..1)**

  * 0.0 unsigned/unknown origin
  * 0.3 signed image only
  * 0.6 signed + SBOM (hash-linked)
  * 1.0 signed + SBOM + DSSE attestations + reproducible build match

Example weights: `W_R=0.5, W_E=0.35, W_P=0.15`.
---

### How this plugs into **Stella Ops**

* **Scanner** produces call-graphs & symbol maps (R).
* **Vexer**/Evidence store aggregates SCA/SAST/DAST/runtime proofs with timestamps (E).
* **Authority/Proof‑Graph** verifies signatures, SBOM↔image hash links, DSSE/Rekor (P).
* **Policy Engine** applies the scoring formula (YAML policy) and emits a signed VEX note with the score + input hashes.
* **Replay**: any audit can re-run the same policy with the same inputs and get the same score.

---

### Developer checklist (do this first)

* Emit a **Reachability JSON** per build: entrypoints, hops, functions, edges, timestamps, hashes.
* Normalize **Evidence Types** with IDs, confidence, freshness, and content hashes.
* Record **Provenance Facts** (signing certs, SBOM digest, DSSE bundle, reproducible-build fingerprint).
* Implement the **score function as a pure library** (no I/O), version it (e.g., `score.v1`), and include the version + inputs’ hashes in every VEX note.
* Add a **30‑sec “Time‑to‑Evidence” UI**: click a score → see the exact call path, evidence list, and provenance checks.

---

### Why this helps compliance & sales

* Every number is **auditable** (inputs + function are transparent).
* Scores remain **consistent across air‑gapped sites** (deterministic, no hidden heuristics).
* You can **prove reduction** after a fix (paths disappear, evidence decays, provenance improves).

If you want, I can draft the YAML policy schema and a tiny .NET 10 library stub for `score.v1` so you can drop it into Stella Ops today.
Below is an extended, **developer-ready implementation plan** to build the deterministic vulnerability score into **Stella Ops** (Scanner → Evidence/Vexer → Authority/Proof‑Graph → Policy Engine → UI/VEX output). I’m assuming a .NET-centric stack (since you mentioned .NET 10 earlier), but everything is laid out so the scoring core stays language-agnostic.

---

## 1) Extend the scoring model into a stable, “auditable primitive”

### 1.1 Outputs you should standardize on

Produce **two** signed artifacts per finding (plus optional UI views):

1. **ScoreResult** (primary):

   * `riskScore` (0–100 integer)
   * `subscores` (each 0–100 integer): `baseSeverity`, `reachability`, `evidence`, `provenance`
   * `explain[]` (structured reasons, ordered deterministically)
   * `inputs` (digests of all upstream inputs)
   * `policy` (policy version + digest)
   * `engine` (engine version + digest)
   * `asOf` timestamp (the only “time” allowed to affect the result)
2. **VEX note** (OpenVEX/CSAF-compatible wrapper):

   * references the ScoreResult digest
   * embeds the score (optional) + the input digests
   * signed by Stella Ops Authority

> Key audit requirement: anyone can recompute the score **offline** from the input bundle + policy + engine version.

---
## 2) Make determinism non-negotiable

### 2.1 Determinism rules (implement as “engineering constraints”)

These rules close off the common ways deterministic systems drift into nondeterminism:

* **No floating point** in scoring math. Use integer “basis points” and integer bucket tables.
* **No implicit time**. Scoring takes `asOf` as an explicit input. Evidence “freshness” is computed as `asOf - evidence.timestamp`.
* **Canonical serialization** for hashing:

  * Use RFC-style canonical JSON (e.g., JCS) or a strict canonical CBOR profile.
  * Sort keys and arrays deterministically.
* **Stable ordering** for explanation lists:

  * Always sort factors by `(factorId, contributingObjectDigest)`.

### 2.2 Fixed-point scoring approach (recommended)

Represent weights and multipliers as **basis points** (bps):

* 100% = 10,000 bps
* 1% = 100 bps

Example: `totalScore = (wB*B + wR*R + wE*E + wP*P) / 10000`
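In C#, a minimal sketch of that computation, assuming subscores are already 0–100 integers (names here are illustrative, not an existing Stella Ops API):

```csharp
using System;

// Minimal sketch of fixed-point (basis-point) scoring.
// Subscores b, r, e, p are 0–100 integers; weights must sum to 10,000 bps.
public static class FixedPoint
{
    public const int FullBps = 10_000;

    public static int WeightedTotal(int b, int r, int e, int p,
                                    int wB, int wR, int wE, int wP)
    {
        if (wB + wR + wE + wP != FullBps)
            throw new ArgumentException("weights must sum to 10,000 bps");

        // All intermediates fit comfortably in int (max 100 * 10_000 * 4).
        return (wB * b + wR * r + wE * e + wP * p) / FullBps;
    }

    // Apply a bps multiplier (e.g., a gate or freshness factor) to a 0–100 score.
    public static int ApplyBps(int score, int multiplierBps)
        => score * multiplierBps / FullBps;
}
```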
---

## 3) Extended score definition (v1)

### 3.1 Subscores (0–100 integers)

#### BaseSeverity (B)

* Source: CVSS if present, else vendor severity, else default.
* Normalize to 0–100:

  * CVSS 0.0–10.0 → 0–100 by `B = round(CVSS * 10)`

Keep its weight small so you’re “beyond CVSS” but still anchored.

#### Reachability (R)

Computed from the reachability report (call-path depth + gating conditions).

**Hop buckets** (example):

* 0–2 hops: 100
* 3 hops: 85
* 4 hops: 70
* 5 hops: 55
* 6 hops: 45
* 7 hops: 35
* 8+ hops: 20
* unreachable: 0

**Gate multipliers** (apply multiplicatively in bps):

* behind feature flag: ×7000
* auth required: ×8000
* only admin role: ×8500
* non-default config: ×7500

Final: `R = bucketScore * gateMultiplier / 10000`
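A sketch of that R computation under the example tables (gate names match the starter policy in §11; sorting the gates gives a stable composition order):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: reachability subscore from hop depth plus gate multipliers (bps).
// Bucket and gate values mirror the examples above; the class name is illustrative.
public static class Reachability
{
    public static int Score(int? hops, IEnumerable<string> gates)
    {
        if (hops is null) return 0; // unreachable

        int bucket = hops.Value switch
        {
            <= 2 => 100, 3 => 85, 4 => 70, 5 => 55,
            6 => 45, 7 => 35, _ => 20
        };

        var multipliers = new Dictionary<string, int>
        {
            ["featureFlag"] = 7000, ["authRequired"] = 8000,
            ["adminOnly"] = 8500, ["nonDefaultConfig"] = 7500
        };

        long r = bucket;
        foreach (var gate in gates.OrderBy(g => g, StringComparer.Ordinal)) // stable order
            if (multipliers.TryGetValue(gate, out var bps))
                r = r * bps / 10_000;

        return (int)r;
    }
}
```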
#### Evidence (E)

Sum evidence “points” capped at 100, then apply a freshness multiplier.

Evidence points (example):

* runtime trace hitting vulnerable symbol: +60
* DAST / integration test triggers behavior: +30
* SAST precise sink match: +20
* SCA presence only: +10

Freshness bucket multiplier (example):

* age ≤ 7 days: ×10000
* ≤ 30 days: ×9000
* ≤ 90 days: ×7500
* ≤ 180 days: ×6000
* ≤ 365 days: ×4000
* > 365: ×2000

Final: `E = min(100, sum(points)) * freshness / 10000`
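A sketch of E, with freshness taken from the explicit `asOf` input; here the newest item's age picks the bucket, which is one possible policy, not the only one:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: evidence subscore. `asOf` is the only time source, per the determinism rules.
public static class Evidence
{
    public static int Score(IReadOnlyList<(int Points, DateTimeOffset Timestamp)> items,
                            DateTimeOffset asOf)
    {
        if (items.Count == 0) return 0;

        int raw = Math.Min(100, items.Sum(i => i.Points));

        // Integer day arithmetic keeps the "no floats" constraint.
        long ageDays = (asOf - items.Max(i => i.Timestamp)).Ticks / TimeSpan.TicksPerDay;

        int freshnessBps = ageDays switch
        {
            <= 7 => 10_000, <= 30 => 9_000, <= 90 => 7_500,
            <= 180 => 6_000, <= 365 => 4_000, _ => 2_000
        };

        return raw * freshnessBps / 10_000;
    }
}
```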
#### Provenance (P)

Based on verified supply-chain checks.

Levels:

* unsigned/unknown: 0
* signed image: 30
* signed + SBOM hash-linked to image: 60
* signed + SBOM + DSSE attestations verified: 80
* above + reproducible build match: 100

### 3.2 Total score and overrides

Weights (example):

* `wB=1000` (10%)
* `wR=4500` (45%)
* `wE=3000` (30%)
* `wP=1500` (15%)

Total:

* `riskScore = (wB*B + wR*R + wE*E + wP*P) / 10000`

Override examples (still deterministic, because they depend only on recorded flags and subscores; a sketch combining them follows):

* If `knownExploited=true` AND `R >= 70` → force score to 95+
* If unreachable (`R=0`) AND only SCA evidence (`E<=10`) → clamp score ≤ 25
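Combining the subscores with the example weights, plus the two overrides in a fixed order (a sketch; the `knownExploited` flag would come from the evidence bundle):

```csharp
using System;

// Sketch: total score with the example weights and the two override rules above.
public static class Scoring
{
    public static int RiskScore(int b, int r, int e, int p, bool knownExploited)
    {
        const int wB = 1_000, wR = 4_500, wE = 3_000, wP = 1_500;

        int score = (wB * b + wR * r + wE * e + wP * p) / 10_000;

        // Overrides apply in a fixed order so the result stays replayable.
        if (knownExploited && r >= 70)
            score = Math.Max(score, 95);   // "force score to 95+"

        if (r == 0 && e <= 10)
            score = Math.Min(score, 25);   // unreachable + SCA-only clamp

        return score;
    }
}
```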
---

## 4) Canonical schemas (what to build first)

### 4.1 ReachabilityReport (per artifact + vuln)

Minimum fields:

* `artifactDigest` (sha256 of image or build artifact)
* `graphDigest` (sha256 of canonical call-graph representation)
* `vulnId` (CVE/OSV/etc.)
* `vulnerableSymbol` (fully qualified)
* `entrypoints[]` (HTTP routes, queue consumers, CLI commands, cron handlers)
* `shortestPath`:

  * `hops` (int)
  * `nodes[]` (ordered list of symbols)
  * `edges[]` (optional)
* `gates[]`:

  * `type` (“featureFlag” | “authRequired” | “configNonDefault” | …)
  * `detail` (string)
* `computedAt` (timestamp)
* `toolVersion`

### 4.2 EvidenceBundle (per artifact + vuln)

Evidence items are immutable and deduped by content hash.

* `evidenceId` (content hash)
* `artifactDigest`
* `vulnId`
* `type` (“SCA” | “SAST” | “DAST” | “RUNTIME” | “ADVISORY”)
* `tool` (name/version)
* `timestamp`
* `confidence` (0–100)
* `subject` (package, symbol, endpoint)
* `payloadDigest` (hash of raw payload stored separately)

### 4.3 ProvenanceReport (per artifact)

* `artifactDigest`
* `signatureChecks[]` (who signed, what key, result)
* `sbomDigest` + `sbomType`
* `attestations[]` (DSSE digests + verification result)
* `transparencyLogRefs[]` (optional)
* `reproducibleMatch` (bool)
* `computedAt`
* `toolVersion`
* `verificationLogDigest`

### 4.4 ScoreInput + ScoreResult

**ScoreInput** should include:

* `asOf`
* `policyVersion`
* digests for reachability/evidence/provenance/base severity source

**ScoreResult** should include (both types are sketched as records below):

* `riskScore`, `subscores`
* `explain[]` (deterministic)
* `engineVersion`, `policyDigest`
* `inputs[]` (digests)
* `resultDigest` (hash of canonical ScoreResult)
* `signature` (Authority signs the digest)
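Sketched as immutable C# records (field names follow the lists above; nested detail types are elided):

```csharp
using System;
using System.Collections.Generic;

// Sketch: the two scoring types as immutable records. Detail types omitted.
public sealed record ScoreInput(
    DateTimeOffset AsOf,
    string PolicyVersion,
    string ReachabilityDigest,
    string EvidenceDigest,
    string ProvenanceDigest,
    string BaseSeveritySourceDigest);

public sealed record ScoreResult(
    int RiskScore,
    IReadOnlyDictionary<string, int> Subscores,   // baseSeverity, reachability, ...
    IReadOnlyList<string> Explain,                // deterministically ordered
    string EngineVersion,
    string PolicyDigest,
    IReadOnlyList<string> Inputs,                 // input digests
    string ResultDigest,                          // hash of canonical ScoreResult
    string? Signature);                           // Authority signs the digest
```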
---

## 5) Development implementation plan (phased, with deliverables + acceptance criteria)

### Phase A — Foundations: schemas, hashing, policy format, test harness

**Deliverables**

* Canonical JSON format rules + hashing utilities (shared lib; sketched below)
* JSON Schemas for: ReachabilityReport, EvidenceItem, ProvenanceReport, ScoreInput, ScoreResult
* “Golden fixture” repo: a set of input bundles and expected ScoreResults
* Policy format `score.v1` (YAML or JSON) using **integer bps**

**Acceptance criteria**

* Same input bundle → identical `resultDigest` across:

  * OS (Linux/Windows)
  * CPU (x64/ARM64)
  * runtime versions (supported .NET versions)
* Fixtures run in CI and fail on any byte-level diff
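For the hashing utility, a minimal sketch that sorts object keys before hashing. This is a simplified stand-in for full JCS: scalar normalization is left to the serializer here.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text.Json;

// Sketch: hash a JSON document after recursively sorting object keys.
public static class CanonicalHash
{
    public static string Sha256Hex(string json)
    {
        using var doc = JsonDocument.Parse(json);
        using var ms = new MemoryStream();
        using (var w = new Utf8JsonWriter(ms)) // default options: compact output
            WriteCanonical(doc.RootElement, w);
        return Convert.ToHexString(SHA256.HashData(ms.ToArray()));
    }

    private static void WriteCanonical(JsonElement e, Utf8JsonWriter w)
    {
        switch (e.ValueKind)
        {
            case JsonValueKind.Object:
                w.WriteStartObject();
                foreach (var p in e.EnumerateObject().OrderBy(p => p.Name, StringComparer.Ordinal))
                {
                    w.WritePropertyName(p.Name);
                    WriteCanonical(p.Value, w);
                }
                w.WriteEndObject();
                break;
            case JsonValueKind.Array:
                w.WriteStartArray();
                foreach (var item in e.EnumerateArray()) WriteCanonical(item, w);
                w.WriteEndArray();
                break;
            default:
                e.WriteTo(w); // scalars pass through unchanged
                break;
        }
    }
}
```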
---

### Phase B — Scoring engine (pure function library)

**Deliverables**

* `Stella.ScoreEngine` as a pure library:

  * `ComputeScore(ScoreInputBundle) -> ScoreResult`
  * `Explain(ScoreResult) -> structured explanation` (already embedded)
* Policy parser + validator:

  * weights sum to 10,000
  * bucket tables are monotonic
  * override rules are deterministic and totally ordered

**Acceptance criteria**

* 100% deterministic tests passing (golden fixtures)
* “Explain” always includes:

  * subscores
  * applied buckets
  * applied gate multipliers
  * freshness bucket selected
  * provenance level selected
* No non-deterministic dependencies (time, random, locale, float)

---
### Phase C — Evidence pipeline (Vexer / Evidence Store)

**Deliverables**

* Normalized evidence ingestion adapters:

  * SCA ingest (from your existing scanner output)
  * SAST ingest
  * DAST ingest
  * runtime trace ingest (optional MVP → “symbol hit” events)
* Evidence Store service:

  * immutability (append-only)
  * dedupe by `evidenceId`
  * query by `(artifactDigest, vulnId)`

**Acceptance criteria**

* Ingesting the same evidence twice yields identical state (idempotent)
* Every evidence record can be exported as a bundle with content hashes
* Evidence timestamps preserved; `asOf` drives freshness deterministically

---
### Phase D — Reachability analyzer (Scanner extension)

**Deliverables**

* Call-graph builder and symbol resolver:

  * for .NET: IL-level call graph + ASP.NET route discovery
* Reachability computation:

  * compute shortest-path hops from entrypoints to the vulnerable symbol (see the BFS sketch below)
  * attach gating detections (config/feature/auth heuristics)
* Reachability report emitter:

  * emits ReachabilityReport with stable digests

**Acceptance criteria**

* Given the same build artifact, the reachability report digest is stable
* Paths are replayable and visualizable (nodes are resolvable)
* Unreachable findings are explicitly marked and explainable
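The hop computation itself is a plain multi-source BFS over the call graph; a sketch (the adjacency-list shape is an assumption, not the Scanner's actual model):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: shortest-path hops from any entrypoint to the vulnerable symbol.
// `graph` maps caller symbol -> callee symbols (assumed shape).
public static class HopSearch
{
    public static int? HopsToSymbol(
        IReadOnlyDictionary<string, IReadOnlyList<string>> graph,
        IEnumerable<string> entrypoints,
        string vulnerableSymbol)
    {
        var queue = new Queue<(string Node, int Hops)>();
        var seen = new HashSet<string>(StringComparer.Ordinal);

        foreach (var e in entrypoints.OrderBy(x => x, StringComparer.Ordinal)) // stable start order
            if (seen.Add(e)) queue.Enqueue((e, 0));

        while (queue.Count > 0)
        {
            var (node, hops) = queue.Dequeue();
            if (node == vulnerableSymbol) return hops;

            if (graph.TryGetValue(node, out var callees))
                foreach (var c in callees)
                    if (seen.Add(c)) queue.Enqueue((c, hops + 1));
        }

        return null; // unreachable in this build
    }
}
```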
---

### Phase E — Provenance verification (Authority / Proof‑Graph)

**Deliverables**

* Verification pipeline:

  * signature verification for artifact digest
  * SBOM hash linking
  * attestation verification (DSSE/in‑toto style)
  * optional transparency log reference capture
  * optional reproducible-build comparison input
* ProvenanceReport emitter (signed verification log digest)

**Acceptance criteria**

* Verification is offline-capable if given the necessary bundles
* Any failed check is captured with a deterministic error code + message
* ProvenanceReport digest is stable for the same inputs

---
### Phase F — Orchestration: “score a finding” workflow + VEX output

**Deliverables**

* Orchestrator service (or existing pipeline step) that:

  1. receives a vulnerability finding
  2. fetches reachability/evidence/provenance bundles
  3. builds ScoreInput with `asOf`
  4. computes ScoreResult
  5. signs the ScoreResult digest
  6. emits a VEX note referencing the ScoreResult digest
* Storage for ScoreResult + VEX note (immutable, versioned)

**Acceptance criteria**

* “Recompute” produces the same ScoreResult digest if inputs are unchanged
* VEX note includes:

  * policy version + digest
  * engine version
  * input digests
  * score + subscores
* End-to-end API returns the “why” data in a single round trip (cached)

---
### Phase G — UI: “Why this score?” and replay/export

**Deliverables**

* Findings view enhancements:

  * score badge + risk bucket (Low/Med/High/Critical)
  * click-through “Why this score”
* “Why this score” panel:

  * call path visualization (at least as an ordered list for MVP)
  * evidence list with freshness + confidence
  * provenance checks list (pass/fail)
* Export bundle (inputs + policy + engine version) for audit replay

**Acceptance criteria**

* Any score is explainable in <30 seconds by a human reviewer
* Exported bundle can reproduce the score offline

---
### Phase H — Governance: policy-as-code, versioning, calibration, rollout

**Deliverables**

* Policy registry:

  * store `score.v1` policies by org/project/environment
  * approvals + change log
* Versioning strategy:

  * engine semantic versioning
  * policy digest pinned in ScoreResult
  * migration tooling (e.g., score.v1 → score.v2)
* Rollout mechanics:

  * shadow mode: compute score but don’t enforce
  * enforcement gates: block deploy if score ≥ threshold

**Acceptance criteria**

* Policy changes never rewrite past scores
* You can backfill new scores with a new policy version without ambiguity
* Audit log shows who changed policy, when, and why (optional but recommended)

---
## 6) Engineering backlog (epics → stories → DoD)

### Epic 1: Deterministic core

* Story: implement canonical JSON + hashing
* Story: implement fixed-point math helpers (bps)
* Story: implement score.v1 buckets + overrides
* DoD:

  * no floats
  * golden test suite
  * deterministic explain ordering

### Epic 2: Evidence normalization

* Story: evidence schema + dedupe
* Story: adapters (SCA/SAST/DAST/runtime)
* Story: evidence query API
* DoD:

  * idempotent ingest
  * bundle export with digests

### Epic 3: Reachability

* Story: entrypoint discovery for target frameworks
* Story: call graph extraction
* Story: shortest-path computation
* Story: gating heuristics
* DoD:

  * stable digests
  * replayable paths

### Epic 4: Provenance

* Story: verify signatures
* Story: verify SBOM link
* Story: verify attestations
* Story: reproducible match input support
* DoD:

  * deterministic error codes
  * stable provenance scoring

### Epic 5: End-to-end score + VEX

* Story: orchestration
* Story: ScoreResult signing
* Story: VEX generation and storage
* DoD:

  * recompute parity
  * verifiable signatures

### Epic 6: UI

* Story: score badge + buckets
* Story: “why” panel
* Story: export bundle + recompute button
* DoD:

  * human explainability
  * offline replay works

---
## 7) APIs to implement (minimal but complete)

### 7.1 Compute score (internal)

* `POST /api/score/compute`

  * input: `ScoreInput` + references or inline bundles
  * output: `ScoreResult`

### 7.2 Get score (product)

* `GET /api/findings/{findingId}/score`

  * returns latest ScoreResult + VEX reference

### 7.3 Explain score

* `GET /api/findings/{findingId}/score/explain`

  * returns `explain[]` + call path + evidence list + provenance checks

### 7.4 Export replay bundle

* `GET /api/findings/{findingId}/score/bundle`

  * returns a tar/zip containing:

    * ScoreInput
    * policy file
    * reachability/evidence/provenance reports
    * engine version manifest

---
## 8) Testing strategy (what to automate early)

### Unit tests

* bucket selection correctness
* gate multiplier composition
* evidence freshness bucketing
* provenance level mapping
* override rule ordering

### Golden fixtures

* fixed input bundles → fixed ScoreResult digest
* run on every supported platform/runtime

### Property-based tests

* monotonicity (see the sketch after this list):

  * fewer hops should not reduce R
  * more evidence points should not reduce E
  * stronger provenance should not reduce P
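A sketch of the first property as a plain exhaustive check over the small hop domain, reusing the `Reachability.Score` sketch from §3.1 (shown with an xUnit `[Fact]`, but any framework works):

```csharp
using System;
using Xunit;

// Sketch: exhaustive monotonicity check for the hop-bucket table.
public class MonotonicityTests
{
    [Fact]
    public void FewerHopsNeverReduceR()
    {
        var noGates = Array.Empty<string>();
        for (int hops = 0; hops < 20; hops++)
        {
            int closer = Reachability.Score(hops, noGates);
            int farther = Reachability.Score(hops + 1, noGates);
            Assert.True(closer >= farther, $"R({hops}) < R({hops + 1})");
        }
    }
}
```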
### Integration tests

* full pipeline: finding → bundles → score → VEX
* “recompute” parity tests

---

## 9) Operational concerns and hardening

### Performance

* Cache reachability per `(artifactDigest, vulnId, symbol)`
* Cache provenance per `artifactDigest`
* Evidence queries should be indexed by `(artifactDigest, vulnId, type)`

### Security

* Treat evidence ingestion as untrusted input:

  * strict schema validation
  * content-hash dedupe prevents tampering via overwrite
* Sign ScoreResults and VEX notes
* RBAC:

  * who can change policy
  * who can override scores (if allowed at all)

### Data retention

* Evidence payloads can be large; keep digests + store raw payloads in object storage
* Keep a “minimal replay bundle” always (schemas + digests + policy + engine)

---

## 10) Concrete “MVP first” slice (smallest valuable product)

If you want a crisp MVP that still satisfies “auditable determinism”:

1. Scoring engine (`B + R + E + P`), fixed-point, golden tests
2. Evidence store (SCA + runtime optional)
3. Reachability: only hop depth from HTTP routes to symbol (no fancy gates)
4. Provenance: signed image + SBOM link only
5. UI: score + “why” panel showing:

   * hops/path list
   * evidence list
   * provenance checklist
6. Emit a signed VEX note containing the score + input digests

That MVP already proves the core differentiator: **deterministic, replayable risk scoring**.

---
## 11) Starter policy file (score.v1) using basis points

Here’s a good “real implementation” starting point (int-only):

```yaml
policyVersion: score.v1
weightsBps:
  baseSeverity: 1000
  reachability: 4500
  evidence: 3000
  provenance: 1500

reachability:
  hopBuckets:
    - { maxHops: 2, score: 100 }
    - { maxHops: 3, score: 85 }
    - { maxHops: 4, score: 70 }
    - { maxHops: 5, score: 55 }
    - { maxHops: 6, score: 45 }
    - { maxHops: 7, score: 35 }
    - { maxHops: 9999, score: 20 }
  unreachableScore: 0
  gateMultipliersBps:
    featureFlag: 7000
    authRequired: 8000
    adminOnly: 8500
    nonDefaultConfig: 7500

evidence:
  points:
    runtime: 60
    dast: 30
    sast: 20
    sca: 10
  freshnessBuckets:
    - { maxAgeDays: 7, multiplierBps: 10000 }
    - { maxAgeDays: 30, multiplierBps: 9000 }
    - { maxAgeDays: 90, multiplierBps: 7500 }
    - { maxAgeDays: 180, multiplierBps: 6000 }
    - { maxAgeDays: 365, multiplierBps: 4000 }
    - { maxAgeDays: 99999, multiplierBps: 2000 }

provenance:
  levels:
    unsigned: 0
    signed: 30
    signedWithSbom: 60
    signedWithSbomAndAttestations: 80
    reproducible: 100

overrides:
  - name: knownExploitedAndReachable
    when:
      flags:
        knownExploited: true
      minReachability: 70
    setScore: 95

  - name: unreachableAndOnlySca
    when:
      maxReachability: 0
      maxEvidence: 10
    clampMaxScore: 25
```

---

If you want, I can also include a **repo layout + CI “golden fixture” test runner** (dotnet test + cross-platform determinism checks) and a **.NET 10 ScoreEngine skeleton** that enforces: no floats, canonical JSON hashing, and stable explanation ordering.
---
Here’s a simple, high‑leverage UX metric to add to your pipeline run view that will immediately make DevOps feel faster and calmer:

# Time‑to‑First‑Signal (TTFS)

**What it is:** the time from opening a run’s details page until the UI renders the **first actionable insight** (e.g., “Stage `build` failed – `dotnet restore` 401 – token expired”).
**Why it matters:** engineers don’t need *all* data instantly—just the first trustworthy clue to start acting. Lower TTFS = quicker triage, lower stress, tighter MTTR.

---

## What counts as a “first signal”

* Failed stage + reason (exit code, key log line, failing test name)
* Degraded but actionable status (e.g., flaky test signature)
* Policy gate block with the specific rule that failed
* Reachability‑aware security finding that blocks deploy (one concrete example, not the whole list)

> Not a signal: spinners, generic “loading…”, or unactionable counts.

---
## How to optimize TTFS (practical steps)

1. **Deferred loading (prioritize critical panes):**

   * Render header + failing stage card first; lazy‑load artifacts, full logs, and graphs after.
   * Pre‑expand the *first failing node* in the stage graph.

2. **Log pre‑indexing at ingest:**

   * During CI, stream logs into chunks keyed by `[jobId, phase, severity, firstErrorLine]`.
   * Extract the **first error tuple** (timestamp, step, message) and store it next to the job record.
   * On UI open, fetch only that tuple (sub‑100 ms) before fetching the rest.

3. **Cached summaries:**

   * Persist a tiny JSON “run.summary.v1” (status, first failing stage, first error line, blocking policies) in Redis/Postgres.
   * Invalidate on new job events; always serve this summary first.

4. **Edge prefetch:**

   * When the runs table is visible, prefetch summaries for rows in the viewport so details pages open “warm”.

5. **Compress + cap the first log burst:**

   * Send the first **5–10 error lines** (already extracted) immediately; stream the rest.

---
## Instrumentation (so you can prove it)

Emit these points as telemetry:

* `ttfs_start`: when the run details route is entered (or when the tab becomes visible)
* `ttfs_signal_rendered`: when the first actionable card is in the DOM
* `ttfs_ms = signal_rendered - start`
* Dimensions: `pipeline_provider`, `repo`, `branch`, `run_type` (PR/main), `device`, `release`, `network_state`

**SLO:** *P50 ≤ 700 ms, P95 ≤ 2.5 s* (adjust to your infra).

**Dashboards to track:**

* TTFS distribution (P50/P90/P95) by release
* Correlate TTFS with bounce rate and “open → rerun” delay
* Error budget: % of views with TTFS > 3 s

---
## Minimal backend contract (example)

```json
GET /api/runs/{runId}/first-signal
{
  "runId": "123",
  "firstSignal": {
    "type": "stage_failed",
    "stage": "build",
    "step": "dotnet restore",
    "message": "401 Unauthorized: token expired",
    "at": "2025-12-11T09:22:31Z",
    "artifact": { "kind": "log", "range": { "start": 1880, "end": 1896 } }
  },
  "summaryEtag": "W/\"a1b2c3\""
}
```
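Server-side, that contract is small enough for a minimal-API sketch with ETag handling (`ISignalStore` is a hypothetical stand-in for the cached summary store; assumes the ASP.NET Core web SDK):

```csharp
// Sketch: first-signal endpoint with ETag/304 handling (ASP.NET Core minimal API).
public interface ISignalStore
{
    Task<(string Etag, string Json)?> GetFirstSignalAsync(string runId);
}

public static class FirstSignalEndpoint
{
    public static void Map(WebApplication app) =>
        app.MapGet("/api/runs/{runId}/first-signal",
            async (string runId, HttpRequest req, ISignalStore store) =>
            {
                var found = await store.GetFirstSignalAsync(runId);
                if (found is null) return Results.NoContent(); // signal not computed yet

                var (etag, json) = found.Value;
                if (req.Headers.IfNoneMatch == etag)
                    return Results.StatusCode(StatusCodes.Status304NotModified);

                // A real handler would also set ETag/Cache-Control response headers.
                return Results.Text(json, "application/json");
            });
}
```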
---

## Frontend pattern (Angular 17, signal‑first)

* Fire the `first-signal` request in a route resolver.
* Render `FirstSignalCard` immediately.
* Lazy‑load the stage graph, full logs, and security panes.
* Fire `ttfs_signal_rendered` when `FirstSignalCard` enters the viewport.

---

## CI adapter hints (GitLab/GitHub/Azure)

* Hook on job status webhooks to compute & store the first error tuple.
* For GitLab: scan the `trace` stream for the first `ERRO|FATAL|##[error]` match; store it in a DB table `ci_run_first_signal(run_id, stage, step, message, t)`.

---
## “Good TTFS” acceptance tests

* Run with early fail → first signal < 1 s, shows exact command + exit code.
* Run with policy gate fail → rule name + fix hint visible first.
* Offline/slow network → cached summary still renders an actionable hint.

---

## Copy to put in your UX guidelines

> “Optimize **Time‑to‑First‑Signal (TTFS)** above all. Users must see one trustworthy, actionable clue within 1 second on a warm path—even if the rest of the UI is still loading.”

If you want, I can sketch the exact DB schema for the pre‑indexed log tuples and the Angular resolver + telemetry hooks next.

Below is an extended, end‑to‑end implementation plan for **Time‑to‑First‑Signal (TTFS)** that you can drop into your backlog. It includes architecture, data model, API contracts, frontend work, observability, QA, and rollout—structured as epics/phases with “definition of done” and acceptance criteria.

---
# Scope extension

## What we’re building

A run details experience that renders **one actionable clue** fast—before loading heavy UI like full logs, graphs, artifacts.

**“First signal”** is a small payload derived from run/job events and the earliest meaningful error evidence (stage/step + key log line(s) + reason/classification).

## What we’re extending beyond the initial idea

1. **First‑Signal Quality** (not just speed)

   * Classify error type (auth, dependency, compilation, test, infra, policy, timeout).
   * Identify the “culprit step” and a stable “signature” for dedupe and search.
2. **Progressive disclosure UX**

   * Summary → First signal card → expanded context (stage graph, logs, artifacts).
3. **Provider‑agnostic ingestion**

   * Adapters for GitLab/GitHub/Azure (or your CI provider).
4. **Caching + prefetch**

   * Warm open from the list/table, with ETags and stale‑while‑revalidate.
5. **Observability & SLOs**

   * TTFS metrics, dashboards, alerting, and quality metrics (false signals).
6. **Rollout safety**

   * Feature flags, canary, A/B gating, and a guaranteed fallback path.

---
# Success criteria

## Primary metric

* **TTFS (ms)**: time from details page route enter → first actionable signal rendered.

## Targets (example SLOs)

* **P50 ≤ 700 ms**, **P95 ≤ 2500 ms** on the warm path.
* **Cold path**: P95 ≤ 4000 ms (depends on infra).

## Secondary outcome metrics

* **Open→Action time**: time from opening the run to the first user action (rerun, cancel, assign, open failing log line).
* **Bounce rate**: close page within 10 seconds without interaction.
* **MTTR proxy**: time from failure to first rerun or fix commit.

## Quality metrics

* **Signal availability rate**: % of run views that show a first signal card within 3 s.
* **Signal accuracy score** (sampled): engineer confirms “helpful vs not”.
* **Extractor failure rate**: parsing errors / missing mappings / timeouts.

---
# Architecture overview

## Data flow

1. **CI provider events** (job started, job finished, stage failed, log appended) land in your backend.
2. **Run summarizer** maintains:

   * `run_summary` (small JSON)
   * `first_signal` (small, actionable payload)
3. **UI opens run details**

   * Immediately calls `GET /runs/{id}/first-signal` (or `/summary`).
   * Renders FirstSignalCard as soon as the payload arrives.
4. Background fetches:

   * Stage graph, full logs, artifacts, security scans, trends.

## Key decision: where to compute the first signal

* **Option A: at ingest time (recommended)**
  Compute the first signal when logs/events arrive, store it, serve it instantly.
* **Option B: on demand**
  Compute when the user opens run details (simpler initially, worse TTFS and load).

---
# Data model

## Tables (relational example)

### `ci_run`

* `run_id (pk)`
* `provider`
* `repo_id`
* `branch`
* `status`
* `created_at`, `updated_at`

### `ci_job`

* `job_id (pk)`
* `run_id (fk)`
* `stage_name`
* `job_name`
* `status`
* `started_at`, `finished_at`

### `ci_log_chunk`

* `chunk_id (pk)`
* `job_id (fk)`
* `seq` (monotonic)
* `byte_start`, `byte_end` (range into blob)
* `first_error_line_no` (nullable)
* `first_error_excerpt` (nullable, short)
* `severity_max` (info/warn/error)

### `ci_run_summary`

* `run_id (pk)`
* `version` (e.g., `1`)
* `etag` (hash)
* `summary_json` (small, 1–5 KB)
* `updated_at`

### `ci_first_signal`

* `run_id (pk)`
* `etag`
* `signal_json` (small, 0.5–2 KB)
* `quality_flags` (bitmask or json)
* `updated_at`

## Cache layer

* Redis keys:

  * `run:{runId}:summary:v1`
  * `run:{runId}:first-signal:v1`
* TTL: generous but safe (e.g., 24h) with “write‑through” on event updates.

---
# First signal definition

## `FirstSignal` object (recommended shape)

```json
{
  "runId": "123",
  "computedAt": "2025-12-12T09:22:31Z",
  "status": "failed",
  "firstSignal": {
    "type": "stage_failed",
    "classification": "dependency_auth",
    "stage": "build",
    "job": "build-linux-x64",
    "step": "dotnet restore",
    "message": "401 Unauthorized: token expired",
    "signature": "dotnet-restore-401-unauthorized",
    "log": {
      "jobId": "job-789",
      "lines": [
        "error : Response status code does not indicate success: 401 (Unauthorized).",
        "error : The token is expired."
      ],
      "range": { "start": 1880, "end": 1896 }
    },
    "suggestedActions": [
      { "label": "Rotate token", "type": "doc", "target": "internal://docs/tokens" },
      { "label": "Rerun job", "type": "action", "target": "rerun-job:job-789" }
    ]
  },
  "etag": "W/\"a1b2c3\""
}
```

### Notes

* `signature` should be stable for grouping.
* `suggestedActions` is optional but hugely valuable (even 1–2 actions).

---
# APIs

## 1) First signal endpoint

**GET** `/api/runs/{runId}/first-signal`

Headers:

* `If-None-Match: W/"..."` supported
* Response includes `ETag` and `Cache-Control`

Responses:

* `200`: full first signal object
* `304`: not modified
* `404`: run not found
* `204`: run exists but signal not available yet (rare; should degrade gracefully)

## 2) Summary endpoint (optional but useful)

**GET** `/api/runs/{runId}/summary`

* Includes: status, first failing stage/job, timestamps, blocking policies, artifact counts.

## 3) SSE / WebSocket updates (nice-to-have)

**GET** `/api/runs/{runId}/events` (SSE)

* Push new signal or summary updates in near real-time while the user is on the page.

---
# Frontend implementation plan (Angular 17)

## UX behavior

1. **Route enter**

   * Start the TTFS timer.
2. Render instantly:

   * Title, status badge, pipeline metadata (run id, commit, branch).
   * Skeleton for the details area.
3. Fetch first signal:

   * Render `FirstSignalCard` immediately when available.
   * Fire the telemetry event when the card is **in the DOM and visible**.
4. Lazy-load:

   * Stage graph
   * Full logs viewer
   * Artifacts list
   * Security findings
   * Trends, flaky tests, etc.

## Angular structure

* `RunDetailsResolver` (or `resolveFn`) requests the first signal.
* `RunDetailsComponent` uses signals to render quickly.
* `FirstSignalCardComponent` is standalone + minimal deps.

## Prefetch strategy from the runs list view

* When the runs table is visible, prefetch summaries/first signals for items in the viewport:

  * Use `IntersectionObserver` to prefetch only visible rows.
  * Store results in an in-memory cache (e.g., `Map<runId, FirstSignal>`).
  * Respect ETags to avoid redundant payloads.

## Telemetry hooks

* `ttfs_start`: route activation + tab visible
* `ttfs_signal_rendered`: FirstSignalCard attached and visible
* Dimensions: provider, repo, branch, run_type, release_version, network_state

---
# Backend implementation plan

## Summarizer / First-signal service

A service or module that:

* subscribes to run/job events
* receives log chunks (or pointers)
* computes and stores:

  * `run_summary`
  * `first_signal`
* publishes updates (optional) to an event stream for SSE

### Concurrency rule

The first signal should be set once per run unless a “better” signal appears (sketched below):

* if the current signal is missing → set
* if the current signal is “generic” and the new one is “specific” → replace
* otherwise keep (avoid churn)
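That rule reduces to a tiny pure function; a sketch (the `Specificity` ranking is an assumption; a real build might rank classifications instead):

```csharp
// Sketch: "better signal replaces generic" rule as a pure, order-stable decision.
// Specificity ranking is illustrative, not an existing Stella Ops type.
public enum Specificity { Generic = 0, Classified = 1, ExactStep = 2 }

public static class FirstSignalPolicy
{
    public static bool ShouldReplace(Specificity? current, Specificity candidate)
    {
        if (current is null) return true;   // missing → set
        return candidate > current.Value;   // more specific → replace
        // equal or less specific → keep (avoid churn)
    }
}
```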
---

# Extraction & classification logic

## Minimum viable extractor (Phase 1)

* Heuristics (see the sketch after this section):

  * first match among patterns: `FATAL`, `ERROR`, `##[error]`, `panic:`, `Unhandled exception`, `npm ERR!`, `BUILD FAILED`, etc.
  * plus provider-specific fail markers
* Pull:

  * stage/job/step context (from job metadata or step boundaries)
  * 5–10 log lines around the first error line
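A sketch of that Phase-1 heuristic in C# (the marker list mirrors the bullets above; the window size implements the 5–10-line excerpt):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sketch: first-error extraction over buffered log lines. Tune markers per provider.
public static class FirstErrorExtractor
{
    private static readonly Regex Marker = new(
        @"FATAL|ERROR|##\[error\]|panic:|Unhandled exception|npm ERR!|BUILD FAILED",
        RegexOptions.Compiled);

    // Returns (1-based line number, surrounding excerpt) for the first match, or null.
    public static (int LineNo, string[] Excerpt)? Extract(IReadOnlyList<string> lines)
    {
        for (int i = 0; i < lines.Count; i++)
        {
            if (!Marker.IsMatch(lines[i])) continue;

            int start = Math.Max(0, i - 2);
            int end = Math.Min(lines.Count, i + 8);   // ~5–10 lines of context
            return (i + 1, lines.Skip(start).Take(end - start).ToArray());
        }
        return null; // no error marker seen yet; keep scanning future chunks
    }
}
```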
## Improved extractor (Phase 2+)

* Language/tool-specific rules:

  * dotnet, maven/gradle, npm/yarn/pnpm, python/pytest, go test, docker build, terraform, helm
* Add `classification` and `signature`:

  * normalize common errors:

    * auth expired/forbidden
    * missing dependency / DNS / TLS
    * compilation error
    * test failure (include test name)
    * infra capacity / agent lost
    * policy gate failure

## Guardrails

* **Secret redaction**: before storing excerpts, run your existing redaction pipeline.
* **Payload cap**: cap message length and excerpt lines.
* **PII discipline**: avoid including arbitrary stack traces if they contain sensitive paths; include only key lines.

---
# Development plan by phases (epics)

Each phase below includes deliverables + acceptance criteria. You can treat each as a sprint/iteration.

---

## Phase 0 — Baseline and alignment

### Deliverables

* Baseline TTFS measurement (current behavior)
* Definition of “actionable signal” and priority rules
* Performance budget for the run details view

### Tasks

* Add client-side telemetry for current page load steps:

  * route enter, summary loaded, logs loaded, graph loaded
* Measure a TTFS proxy today (likely “time to status shown”)
* Identify the top 20 failure modes in your CI (from historical logs)

### Acceptance criteria

* Dashboard shows baseline P50/P95 for the current experience.
* “First signal” contract signed off with UI + backend teams.

---
## Phase 1 — Data model and storage

### Deliverables

* DB migrations for `ci_run_summary` and `ci_first_signal`
* Redis cache keys and invalidation strategy
* ADR: where summaries live and how they update

### Tasks

* Create tables and indices:

  * index on `run_id`, `updated_at`, `provider`
* Add serializer/deserializer for `summary_json` and `signal_json`
* Implement ETag generation (hash of the JSON payload)

### Acceptance criteria

* Can store and retrieve summary + first signal for a run in < 50 ms (DB) and < 10 ms (cache).
* ETag works end-to-end.

---
## Phase 2 — Ingestion and first signal computation

### Deliverables

* First-signal computation module
* Provider adapter integration points (webhook consumers)
* “First error tuple” extraction from logs

### Tasks

* On job log append:

  * scan incrementally for first error markers
  * store excerpt + line range + job/stage/step mapping
* On job finish/fail:

  * finalize the first signal with the best known context
* Implement the “better signal replaces generic” rule

### Acceptance criteria

* For a known failing run, the API returns the first signal without reading the full log blob.
* Computation does not exceed a small CPU budget per log chunk (guard with limits).
* Extraction failure rate < 1% for sampled runs (initial).

---
## Phase 3 — API endpoints and caching

### Deliverables

* `/runs/{id}/first-signal` endpoint
* Optional `/runs/{id}/summary`
* Cache-Control + ETag support
* Access control checks consistent with existing run authorization

### Tasks

* Serve the cached first signal first; fall back to the DB
* If missing:

  * return `204` (or a “pending” object) and allow UI fallback
* Add server-side metrics:

  * endpoint latency, cache hit rate, payload size

### Acceptance criteria

* Endpoint P95 latency meets target (e.g., < 200 ms internal).
* Cache hit rate is high for active runs (after prefetch).

---
## Phase 4 — Frontend progressive rendering

### Deliverables

* FirstSignalCard component
* Route resolver + local cache
* Prefetch on the runs list view
* Telemetry for TTFS

### Tasks

* Render the shell immediately
* Fetch and render the first signal
* Lazy-load heavy panels using `@defer` / dynamic imports
* Implement “open failing stage” default behavior

### Acceptance criteria

* In a throttled network test, the first signal card appears significantly earlier than logs and graphs.
* `ttfs_signal_rendered` fires exactly once per view, with correct dimensions.

---
## Phase 5 — Observability, dashboards, and alerting

### Deliverables

* TTFS dashboards by:

  * provider, repo, run type, release version
* Alerts:

  * P95 regression threshold
* Quality dashboard:

  * availability rate, extraction failures, “generic signal rate”

### Tasks

* Create an event pipeline for telemetry into your analytics system
* Define SLO/error budget alerts
* Add tracing (OpenTelemetry) for the endpoint and summarizer

### Acceptance criteria

* You can correlate TTFS with:

  * bounce rate
  * open→action time
* You can pinpoint whether regressions are backend, frontend, or provider‑specific.

---
## Phase 6 — QA, performance testing, rollout

### Deliverables

* Automated tests
* Feature flag + gradual rollout
* A/B experiment (optional)

### Tasks

**Testing**

* Unit tests:

  * extractor patterns
  * classification rules
* Integration tests:

  * simulated job logs with known outcomes
* E2E (Playwright/Cypress):

  * verify the first signal appears before logs
  * verify the fallback path works if the endpoint fails
* Performance tests:

  * cold cache vs. warm cache
  * throttled CPU/network profiles

**Rollout**

* Feature flag:

  * enabled for internal users first
  * ramp by repo or percentage
* Monitor key metrics during the ramp:

  * TTFS P95
  * API error rate
  * UI error rate
  * cache miss spikes

### Acceptance criteria

* No increase in overall error rates.
* TTFS improves at least X% for a meaningful slice of users (define X from the baseline).
* Fallback UX remains usable when signals are unavailable.

---
# Backlog examples (ready-to-create Jira tickets)

## Epic: Run summary and first signal storage

* Create `ci_first_signal` table
* Create `ci_run_summary` table
* Implement ETag hashing
* Implement Redis caching layer
* Add admin/debug endpoint (internal only) to inspect computed signals

## Epic: Log chunk extraction

* Implement incremental log scanning
* Store first error excerpt + range
* Map excerpt to job + step
* Add redaction pass to excerpts

## Epic: Run details progressive UI

* FirstSignalCard UI component
* Lazy-load logs viewer
* Default to opening the failing stage
* Prefetch signals in the runs list

## Epic: Telemetry and dashboards

* Add `ttfs_start` and `ttfs_signal_rendered`
* Add endpoint latency metrics
* Build dashboards + alerts
* Add sampling for “signal helpfulness” feedback

---
# Risk register and mitigations

## Risk: First signal is wrong/misleading

* Mitigation:

  * track “generic signal rate” and “corrected by user” feedback
  * classification confidence scoring
  * always provide quick access to full logs as a fallback

## Risk: Logs contain secrets

* Mitigation:

  * redact excerpts before storing/serving
  * cap excerpt lines and length
  * keep raw logs behind existing permissions

## Risk: Increased ingest CPU cost

* Mitigation:

  * incremental scanning with early stop after the first error is captured
  * limit scanning per chunk
  * sample/skip overly large logs until job completion

## Risk: Cache invalidation bugs

* Mitigation:

  * ETag-based correctness
  * versioned keys: `:v1`
  * “write-through” cache updates from the summarizer

---
# Definition of Done checklist

A phase is “done” when:

* ✅ TTFS measured with reliable client events
* ✅ FirstSignalCard renders from the `/first-signal` endpoint
* ✅ ETag caching works
* ✅ Fallback path is solid (no blank screens)
* ✅ Dashboards exist and are actively watched during rollout
* ✅ Security review completed for log excerpts/redaction
* ✅ Load tests show no unacceptable backend regressions

---
# Optional enhancements after initial launch

1. **Next-step recommendations**
   Add action suggestions and deep links (rotate token, open failing test, open doc).
2. **Flaky test / known issue detection**
   Show “this matches a known flaky signature” with last-seen frequency.
3. **“Compare to last green”**
   Summarize what changed since the last successful run (commit diff, dependency bump).
4. **SSE live updates**
   Update the first signal as soon as a failure occurs while the user watches.

---

If you tell me your current backend stack (Node/Go/.NET), log storage (S3/Elastic/Loki), and which CI providers you support, I can translate this into a concrete set of modules/classes, exact schema migrations, and the Angular routing + signals code structure you’d implement.

---
Here’s a simple, practical idea to make your scans provably repeatable over time and catch drift fast.

# Replay Fidelity (what, why, how)

**What it is:** the share of historical scans that reproduce **bit‑for‑bit** when re‑run using their saved manifests (inputs, versions, rules, seeds). Higher = more deterministic system.

**Why you want it:** it exposes hidden nondeterminism (feed drift, time‑dependent rules, race conditions, unstable dependency resolution) and proves auditability for customers/compliance.

---

## The metric

* **Per‑scan:** `replay_match = 1` if SBOM/VEX/findings + hashes are identical; else `0`.
* **Windowed:** `Replay Fidelity = (Σ replay_match) / (# historical replays in window)`.
* **Breakdown:** also track by scanner, language, image base, feed version, and environment.
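
A minimal sketch of the windowed metric, assuming replay results are already recorded as booleans (the `ReplayResult` type and its `Bucket` field are illustrative, not an existing API):

```go
package fidelity

// ReplayResult is a hypothetical record of one historical replay.
type ReplayResult struct {
	Bucket  string // e.g., scanner, language, or base image
	Matched bool   // bit-for-bit artifact match
}

// Fidelity returns overall and per-bucket replay fidelity in [0,1].
func Fidelity(results []ReplayResult) (overall float64, byBucket map[string]float64) {
	total, matched := 0, 0
	bTotal, bMatched := map[string]int{}, map[string]int{}
	for _, r := range results {
		total++
		bTotal[r.Bucket]++
		if r.Matched {
			matched++
			bMatched[r.Bucket]++
		}
	}
	byBucket = make(map[string]float64, len(bTotal))
	for b, n := range bTotal {
		byBucket[b] = float64(bMatched[b]) / float64(n)
	}
	if total == 0 {
		return 1, byBucket // no replays in the window: vacuously green
	}
	return float64(matched) / float64(total), byBucket
}
```

The per-bucket view is what the pass/fail rules below gate on.
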
---

## What must be captured in the scan manifest

* Exact source refs (image digest / repo SHA), container layers’ digests
* Scanner build ID + config (flags, rules, lattice/policy sets, seeds)
* Feed snapshots (CVE DB, OVAL, vendor advisories) as **content‑addressed** bundles
* Normalization/version of SBOM schema (e.g., CycloneDX 1.6 vs SPDX 3.0.1)
* Platform facts (OS/kernel, tz, locale), toolchain versions, clock policy

---

## Pass/Fail rules you can ship

* **Green:** fidelity ≥ 0.98 over 30 days, and no bucket < 0.95
* **Warn:** any bucket drops by ≥ 2% week‑over‑week
* **Fail the pipeline:** fidelity < 0.90 overall, or any regulated project < 0.95

---

## Minimal replay harness (outline)

1. Pick N historical scans (e.g., the last 200, or stratified by image language).
2. Restore their **frozen** manifest (scanner binary, feed bundle, policy lattice, seeds).
3. Re‑run in a pinned runtime (OCI digest, pinned kernel in VM, fixed TZ/locale).
4. Compare artifacts: SBOM JSON, VEX JSON, findings list, evidence blobs → SHA‑256.
5. Emit: pass/fail, a diff summary, and a “cause” tag if there is a mismatch (feed, policy, runtime, code).
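
A sketch of steps 4–5, assuming each artifact’s canonical SHA‑256 was stored at scan time (the `Hashes` type is illustrative):

```go
package replay

// Hashes maps artifact type ("sbom", "vex", "findings", "evidence") to its canonical SHA-256.
type Hashes map[string]string

// Compare implements steps 4-5: per-artifact hash equality, returning which artifacts drifted.
func Compare(original, replayed Hashes) (pass bool, mismatched []string) {
	for kind, want := range original {
		if replayed[kind] != want {
			mismatched = append(mismatched, kind)
		}
	}
	return len(mismatched) == 0, mismatched
}
```
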
---

## Dashboard (what to show)

* Fidelity % (30/90‑day) + sparkline
* Top offenders (by language/scanner/policy set)
* “Cause of mismatch” histogram (feed vs runtime vs code vs policy)
* Click‑through: deterministic diff (e.g., which CVEs flipped and why)

---

## Quick wins for Stella Ops

* Treat **feeds as immutable snapshots** (content‑addressed tar.zst) and record their digest in each scan.
* Run the scanner in a **repro shell** (OCI image digest + fixed TZ/locale + no network).
* Normalize SBOM/VEX (key order, whitespace, float precision) before hashing.
* Add a `stella replay --from MANIFEST.json` command + a nightly cron to sample replays.
* Store `replay_result` rows; expose `/metrics` for Prometheus and a CI badge: `Replay Fidelity: 99.2%`.

Want me to draft the `stella replay` CLI spec and the DB table (DDL) you can drop into Postgres?

---

Below is an extended “Replay Fidelity” design **plus a concrete development implementation plan** you can hand to engineering. I’m assuming Stella Ops is doing container/app security scans that output SBOM + findings (and optionally VEX), and uses vulnerability “feeds” and policy/lattice/rules.

---

## 1) Extend the concept: Replay Fidelity as a product capability

### 1.1 Fidelity levels (so you can be strict without being brittle)

Instead of a single yes/no, define **tiers** that you can report and gate on:

1. **Bitwise Fidelity (BF)**

   * *Definition:* all primary artifacts (SBOM, findings, VEX, evidence) match **byte-for-byte** after canonicalization.
   * *Use:* strongest auditability; catches ordering/nondeterminism.

2. **Semantic Fidelity (SF)**

   * *Definition:* the *meaning* matches even if formatting differs (e.g., key order, whitespace, timestamps).
   * *How:* compare normalized objects: same packages, versions, CVEs, fix versions, severities, policy verdicts.
   * *Use:* protects you from “cosmetic diffs” and helps triage.

3. **Policy Fidelity (PF)**

   * *Definition:* the final policy decision (pass/fail + reason codes) matches.
   * *Use:* useful when outputs may evolve but the governance outcome must remain stable.

**Recommended reporting:**

* Dashboard shows BF, SF, and PF together.
* Default engineering SLO: **BF ≥ 0.98**; compliance SLO: **BF ≥ 0.95** for regulated projects; PF should be ~1.0 unless policy changed intentionally.

---

### 1.2 “Why did it drift?”—Mismatch classification taxonomy

When a replay fails, auto-tag the cause so humans don’t diff JSON by hand.

**Primary mismatch classes**

* **Feed drift:** CVE/OVAL/vendor advisory snapshot differs.
* **Policy drift:** policy/lattice/rules differ (or the default rule set changed).
* **Runtime drift:** base image / libc / kernel / locale / tz / CPU arch differences.
* **Scanner drift:** scanner binary build differs or dependency versions changed.
* **Nondeterminism:** ordering instability, concurrency race, unseeded RNG, time-based logic.
* **External IO:** network calls, “latest” resolution, remote package registry changes.

**Output:** a `mismatch_reason` plus a short `diff_summary`.
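
A sketch of the auto-tagger as an ordered rule chain, assuming the scan-envelope fields from §1.3 are available on both sides (the `ManifestFacts` type is illustrative):

```go
package replay

// ManifestFacts is a hypothetical slice of the scan envelope used for triage.
type ManifestFacts struct {
	FeedDigest     string
	PolicyDigest   string
	ScannerDigest  string
	EnvFingerprint string // arch + tz + locale + clock mode, hashed
}

// ClassifyMismatch implements the taxonomy above. If every pinned input
// matches, the residual cause must be nondeterminism in the scanner itself.
func ClassifyMismatch(orig, replay ManifestFacts) string {
	switch {
	case orig.FeedDigest != replay.FeedDigest:
		return "feed_drift"
	case orig.PolicyDigest != replay.PolicyDigest:
		return "policy_drift"
	case orig.ScannerDigest != replay.ScannerDigest:
		return "scanner_drift"
	case orig.EnvFingerprint != replay.EnvFingerprint:
		return "runtime_drift"
	default:
		return "nondeterminism" // sub-tag ordering/time/RNG in a second pass
	}
}
```
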
---

### 1.3 Deterministic “scan envelope” design

A replay only works if the scan is fully specified.

**Scan envelope components**

* **Inputs:** image digest, repo commit SHA, build provenance, layer digests.
* **Scanner:** scanner OCI image digest (or binary digest), config flags, feature toggles.
* **Feeds:** content-addressed feed bundle digests (see §2.3).
* **Policy/rules:** git commit SHA + content digest of the compiled rules.
* **Environment:** OS/arch, tz/locale, “clock mode”, network mode, CPU count.
* **Normalization:** the “canonicalization version” for SBOM/VEX/findings.

---

### 1.4 Canonicalization so “bitwise” is meaningful

To make BF achievable:

* Canonical JSON serialization (sorted keys, stable array ordering, normalized floats)
* Strip/normalize volatile fields (timestamps, `scan_duration_ms`, hostnames)
* Stable ordering for lists: packages sorted by `(purl, version)`, vulnerabilities by `(cve_id, affected_purl)`
* Deterministic IDs: if you generate internal IDs, derive them from stable hashes of content (not UUID4)
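
A minimal canonical-hash sketch: Go’s `encoding/json` already emits map keys in sorted order, so round-tripping through a generic value plus stripping volatile fields yields a stable digest (the field names in `volatile` are examples):

```go
package canonical

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// Volatile fields stripped before hashing (example list; extend per schema).
var volatile = map[string]bool{"timestamp": true, "scan_duration_ms": true, "hostname": true}

// Hash returns a canonical SHA-256 of a JSON document: sorted keys
// (json.Marshal sorts map keys), compact output, volatile fields removed.
// Caveat: numbers round-trip through float64, which normalizes their
// representation but can lose precision on very large integers.
func Hash(doc []byte) (string, error) {
	var v interface{}
	if err := json.Unmarshal(doc, &v); err != nil {
		return "", err
	}
	out, err := json.Marshal(strip(v))
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(out)
	return hex.EncodeToString(sum[:]), nil
}

func strip(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, val := range t {
			if volatile[k] {
				delete(t, k)
				continue
			}
			t[k] = strip(val)
		}
	case []interface{}:
		for i, val := range t {
			t[i] = strip(val) // array order is preserved; pre-sort lists upstream
		}
	}
	return t
}
```
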
---

### 1.5 Sampling strategy

You don’t need to replay everything.

**Nightly sample:** stratified by:

* language ecosystem (npm, pip, maven, go, rust…)
* scanner engine
* base OS
* “regulatory tier”
* image size/complexity

**Plus:** always replay “golden canaries” (a fixed set of reference images) after every scanner release and after every feed-ingestion pipeline change.

---

## 2) Technical architecture blueprint

### 2.1 System components

1. **Manifest Writer (in the scan pipeline)**

   * Produces `ScanManifest v1` JSON
   * Records all digests and versions

2. **Artifact Store**

   * Stores SBOM, findings, VEX, evidence blobs
   * Stores canonical hashes for BF checks

3. **Feed Snapshotter**

   * Periodically builds immutable feed bundles
   * Content-addressed (digest-keyed)
   * Stores metadata (source URLs, generation timestamp, signature)

4. **Replay Orchestrator**

   * Chooses historical scans to replay
   * Launches “replay executor” jobs

5. **Replay Executor**

   * Runs the scanner in a pinned container image
   * Network off, tz fixed, clock policy applied
   * Produces new artifacts + hashes

6. **Diff & Scoring Engine**

   * Computes BF/SF/PF
   * Generates mismatch classification + diff summary

7. **Metrics + UI Dashboard**

   * Prometheus metrics
   * UI for drill-down diffs

---

### 2.2 Data model (Postgres-friendly)

**Core tables**

* `scan_manifests`

  * `scan_id (pk)`
  * `manifest_json`
  * `manifest_sha256`
  * `created_at`

* `scan_artifacts`

  * `scan_id (fk)`
  * `artifact_type` (sbom|findings|vex|evidence)
  * `artifact_uri`
  * `canonical_sha256`
  * `schema_version`

* `feed_snapshots`

  * `feed_digest (pk)`
  * `bundle_uri`
  * `sources_json`
  * `generated_at`
  * `signature`

* `replay_runs`

  * `replay_id (pk)`
  * `original_scan_id (fk)`
  * `status` (queued|running|passed|failed)
  * `bf_match bool`, `sf_match bool`, `pf_match bool`
  * `mismatch_reason`
  * `diff_summary_json`
  * `started_at`, `finished_at`
  * `executor_env_json` (arch, tz, cpu, image digest)

**Indexes**

* `(created_at)` for sampling windows
* `(mismatch_reason, finished_at)` for triage
* `(scanner_version, ecosystem)` for breakdown dashboards
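
A sketch of the `replay_runs` row as a Go struct (column names follow the list above; this is an illustration, not a committed schema):

```go
package replay

import "time"

// ReplayRun mirrors one replay_runs row from the data model above.
type ReplayRun struct {
	ReplayID        string    `db:"replay_id"`
	OriginalScanID  string    `db:"original_scan_id"`
	Status          string    `db:"status"` // queued|running|passed|failed
	BFMatch         bool      `db:"bf_match"`
	SFMatch         bool      `db:"sf_match"`
	PFMatch         bool      `db:"pf_match"`
	MismatchReason  string    `db:"mismatch_reason"`
	DiffSummaryJSON []byte    `db:"diff_summary_json"`
	StartedAt       time.Time `db:"started_at"`
	FinishedAt      time.Time `db:"finished_at"`
	ExecutorEnvJSON []byte    `db:"executor_env_json"`
}
```
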
---

### 2.3 Feed Snapshotting (the key to long-term replay)

**Feed bundle format**

* `feeds/<source>/<date>/...` inside a tar.zst
* A manifest file inside the bundle, `feed_bundle_manifest.json`, containing:

  * source URLs
  * retrieval commit/etag (if any)
  * file hashes
  * generated_by version

**Content addressing**

* The digest of the entire bundle (`sha256(tar.zst)`) is the reference.
* Scans record only the digest + URI.

**Immutability**

* Store bundles in object storage with WORM / retention if you need compliance.
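
Computing the bundle reference is just a streaming file hash; a minimal sketch:

```go
package feeds

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"os"
)

// BundleDigest returns "sha256:<hex>" for a feed bundle (tar.zst) on disk.
// This digest is what each scan manifest records.
func BundleDigest(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return "sha256:" + hex.EncodeToString(h.Sum(nil)), nil
}
```
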
---

### 2.4 Replay execution sandbox

For determinism, enforce:

* **No network** (K8s NetworkPolicy, firewall rules, or container runtime flags)
* **Fixed TZ/locale**
* **Pinned container image digest**
* **Clock policy**

  * either “real time but recorded” or “frozen time at the original scan timestamp”
  * if scanner logic uses the current date for severity windows, freeze time

---

## 3) Development implementation plan

I’ll lay this out as **workstreams** plus **a sprint plan**. Compress or expand depending on team size.

### Workstream A — Scan Manifest & Canonical Artifacts

**Goal:** every scan is replayable on paper, even before replays run.

**Deliverables**

* `ScanManifest v1` schema + writer integrated into the scan pipeline
* Canonicalization library + canonical hashing for all artifacts

**Acceptance criteria**

* Every scan stores: input digests, scanner digest, policy digest, feed digest placeholders
* Artifact hashes are stable across repeated runs in the same environment

---

### Workstream B — Feed Snapshotting & Policy Versioning

**Goal:** eliminate “feed drift” by pinning immutable inputs.

**Deliverables**

* Feed bundle builder + signer + uploader
* Policy/rules bundler (compiled rules bundle, digest recorded)

**Acceptance criteria**

* New scans reference feed bundle digests (not “latest”)
* A scan can be re-run with the same feed bundle and policy bundle

---

### Workstream C — Replay Runner & Diff Engine

**Goal:** execute historical scans and score BF/SF/PF with actionable diffs.

**Deliverables**

* `stella replay --from manifest.json`
* Orchestrator job to schedule replays
* Diff engine + mismatch classifier
* Storage of replay results

**Acceptance criteria**

* Replay produces deterministic artifacts in a pinned environment
* Dashboard/CLI shows BF/SF/PF + a diff summary for failures

---

### Workstream D — Observability, Dashboard, and CI Gates

**Goal:** make fidelity visible and enforceable.

**Deliverables**

* Prometheus metrics: `replay_fidelity_bf`, `replay_fidelity_sf`, `replay_fidelity_pf` (see the sketch after this section)
* Breakdown labels (scanner, ecosystem, policy_set, base_os)
* Alerts for drop thresholds
* CI gate option: “block release if BF < threshold on the canary set”

**Acceptance criteria**

* Engineering can see drift within 24h
* Releases are blocked when fidelity regressions occur
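
A sketch of the metrics contract using the Prometheus Go client (`prometheus/client_golang`); the metric and label names follow the deliverables above but are not finalized:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Labels mirror the breakdown dimensions from the deliverables list.
var labels = []string{"scanner", "ecosystem", "policy_set", "base_os"}

var (
	ReplayFidelityBF = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "replay_fidelity_bf",
		Help: "Windowed bitwise replay fidelity (0..1).",
	}, labels)
	ReplayFidelitySF = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "replay_fidelity_sf",
		Help: "Windowed semantic replay fidelity (0..1).",
	}, labels)
	ReplayFidelityPF = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "replay_fidelity_pf",
		Help: "Windowed policy replay fidelity (0..1).",
	}, labels)
)

// Example update after a nightly window is scored:
//   ReplayFidelityBF.WithLabelValues("stella", "npm", "prod-default", "debian").Set(0.992)
```
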
---

## 4) Suggested sprint plan with concrete tasks

### Sprint 0 — Design lock + baseline

**Tasks**

* Define the manifest schema: `ScanManifest v1` fields + versioning rules
* Decide canonicalization rules (what is normalized vs preserved)
* Choose the initial “golden canary” scan set (10–20 representative targets)
* Add a “replay-fidelity” epic with ownership & SLIs/SLOs

**Exit criteria**

* Approved schema + canonicalization spec
* Canary set stored and tagged

---

### Sprint 1 — Manifest writer + artifact hashing (MVP)

**Tasks**

* Implement the manifest writer in the scan pipeline
* Store `manifest_json` + `manifest_sha256`
* Implement canonicalization + hashing for:

  * the findings list (sorted)
  * the SBOM (normalized)
  * VEX (if present)

* Persist canonical hashes in `scan_artifacts`

**Exit criteria**

* Two identical scans in the same environment yield identical artifact hashes
* A “manifest export” endpoint/CLI works:

  * `stella scan --emit-manifest out.json`

---

### Sprint 2 — Feed snapshotter + policy bundling

**Tasks**

* Build the feed bundler job:

  * pull raw sources
  * normalize layout
  * generate `feed_bundle_manifest.json`
  * tar.zst + sha256
  * upload + record in `feed_snapshots`

* Update the scan pipeline:

  * resolve the feed bundle digest at scan start
  * record the digest in the scan manifest

* Bundle policy/lattice:

  * compile rules into an immutable artifact
  * record the policy bundle digest in the manifest

**Exit criteria**

* Scans reference immutable feed + policy digests
* You can fetch a feed bundle by digest and reproduce the same feed inputs

---

### Sprint 3 — Replay executor + “no network” sandbox

**Tasks**

* Create the replay container image / runtime wrapper
* Implement `stella replay --from MANIFEST.json`:

  * pulls the scanner image by digest
  * mounts the feed bundle + policy bundle
  * runs in network-off mode
  * applies tz/locale + clock mode

* Store replay outputs as artifacts (`replay_scan_id` or `replay_id` linkage)

**Exit criteria**

* Replay runs end-to-end for canary scans
* Deterministic runtime controls verified (no DNS egress, fixed tz)

---

### Sprint 4 — Diff engine + mismatch classification

**Tasks**

* Implement the BF compare (canonical hashes)
* Implement the SF compare (semantic JSON/object comparison)
* Implement the PF compare (policy decision equivalence)
* Implement mismatch classification rules:

  * if the feed digest differs → feed drift
  * if the scanner digest differs → scanner drift
  * if the environment differs → runtime drift
  * else → nondeterminism (with sub-tags for ordering/time/RNG)

* Generate `diff_summary_json`:

  * top N changed CVEs
  * packages added/removed
  * policy verdict changes

**Exit criteria**

* Every failed replay has a cause tag and a diff summary that’s useful in under 2 minutes
* Engineers can reproduce failures locally with the manifest

---

### Sprint 5 — Dashboard + alerts + CI gate

**Tasks**

* Expose Prometheus metrics from the replay service
* Build the dashboard:

  * BF/SF/PF trends
  * breakdown by ecosystem/scanner/policy
  * mismatch-cause histogram

* Add alerting rules (drop threshold, bucket regression)
* Add a CI gate mode:

  * “run replays on the canary set for this release candidate”
  * block the merge if BF < target

**Exit criteria**

* Fidelity is visible to leadership and engineering
* The release process is protected by canary replays

---

### Sprint 6 — Hardening + compliance polish

**Tasks**

* Backward-compatible manifest upgrades:

  * `manifest_version` bump rules
  * migration support

* Artifact signing / integrity:

  * sign the manifest hash
  * optional transparency log later

* Storage & retention policies (cost controls)
* Runbook + on-call playbook

**Exit criteria**

* The audit story is complete: “show me exactly how scan X was produced”
* Operational load is manageable and cost-bounded

---

## 5) Engineering specs you can start implementing immediately

### 5.1 `ScanManifest v1` skeleton (example)

```json
{
  "manifest_version": "1.0",
  "scan_id": "scan_123",
  "created_at": "2025-12-12T10:15:30Z",

  "input": {
    "type": "oci_image",
    "image_ref": "registry/app@sha256:...",
    "layers": ["sha256:...", "sha256:..."],
    "source_provenance": {"repo_sha": "abc123", "build_id": "ci-999"}
  },

  "scanner": {
    "engine": "stella",
    "scanner_image_digest": "sha256:...",
    "scanner_version": "2025.12.0",
    "config_digest": "sha256:...",
    "flags": ["--deep", "--vex"]
  },

  "feeds": {
    "vuln_feed_bundle_digest": "sha256:...",
    "license_db_digest": "sha256:..."
  },

  "policy": {
    "policy_bundle_digest": "sha256:...",
    "policy_set": "prod-default"
  },

  "environment": {
    "arch": "amd64",
    "os": "linux",
    "tz": "UTC",
    "locale": "C",
    "network": "disabled",
    "clock_mode": "frozen",
    "clock_value": "2025-12-12T10:15:30Z"
  },

  "normalization": {
    "canonicalizer_version": "1.2.0",
    "sbom_schema": "cyclonedx-1.6",
    "vex_schema": "cyclonedx-vex-1.0"
  }
}
```

---

### 5.2 CLI spec (minimal)

* `stella scan ... --emit-manifest MANIFEST.json --emit-artifacts-dir out/`
* `stella replay --from MANIFEST.json --out-dir replay_out/`
* `stella diff --a out/ --b replay_out/ --mode bf|sf|pf --json`

|

## 6) Testing strategy (to prevent determinism regressions)

### Unit tests

* Canonicalization: same object → same bytes
* Sorting stability: randomized input order → stable output
* Hash determinism

### Integration tests

* Golden canaries:

  * run a scan twice in the same runner → BF match
  * replay from the manifest → BF match

* “Network leak” test:

  * DNS requests must be zero

* “Clock leak” test:

  * freeze time; ensure outputs do not include real timestamps

### Chaos tests

* Vary CPU count, run concurrency, and run order → still a BF match
* Randomized scheduling / thread interleavings to find races

---

## 7) Operational policies (so it stays useful)

### Retention & cost controls

* Keep full artifacts for regulated scans (e.g., 1–7 years)
* For non-regulated scans:

  * keep manifests + canonical hashes long-term
  * expire heavy evidence blobs after N days

* Compress large artifacts and dedupe by digest

### Alerting examples

* BF drops by ≥2% week-over-week (in any major bucket) → warn
* BF < 0.90 overall, or regulated BF < 0.95 → page / block release

### Triage workflow

* A failed replay auto-creates a ticket with:

  * manifest link
  * mismatch_reason
  * diff_summary
  * reproduction command

---

## 8) What “done” looks like (definition of success)

* Any customer/auditor can pick a scan from 6 months ago and you can:

  1. retrieve the manifest + feed bundle + policy bundle by digest
  2. replay it in a pinned sandbox
  3. show BF/SF/PF results and diffs

* Engineering sees drift quickly and can attribute it to feed vs scanner vs runtime.

---

If you want, I can also provide:

* a **Postgres DDL** for the tables above,
* a **Prometheus metrics contract** (names + labels + example queries),
* and a **diff_summary_json schema** that supports a UI “diff view” without reprocessing artifacts.

---

Here’s a quick, plain‑English idea you can use right away: **not all code diffs are equal**—some actually change what’s *reachable* at runtime (and thus security posture), while others just refactor internals. A “**Smart‑Diff**” pipeline flags only the diffs that open or close attack paths by combining (1) call‑stack traces, (2) dependency graphs, and (3) dataflow.

---

### Why this matters (background)

* Text diffs ≠ behavior diffs. A rename or refactor can look big in Git but do nothing to the flows reachable from external entry points (HTTP, gRPC, CLI, message consumers).
* Security triage gets noisy because scanners attach CVEs to all present packages, not to the code paths you can actually hit.
* **Dataflow‑aware diffs** shrink the noise and make VEX generation honest: “vuln present but **not exploitable** because the sink is unreachable from any policy‑defined entrypoint.”

---

### Minimal architecture (fits Stella Ops)

1. **Entrypoint map** (per service): controllers, handlers, consumers.
2. **Call graph + dataflow** (per commit): Roslyn for C#, `golang.org/x/tools/go/callgraph` for Go, plus taint rules (source→sink).
3. **Reachability cache** keyed by (commit, entrypoint, package@version).
4. **Smart‑Diff** = `reachable_paths(commit_B) – reachable_paths(commit_A)`.

   * If a path to a sensitive sink is newly reachable → **High**.
   * If a path disappears → auto‑generate **VEX “not affected (no reachable path)”**.

---

### Tiny working seeds

**C# (.NET 10) — Roslyn skeleton to diff call‑reachability**

```csharp
// SmartDiff.csproj targets net10.0
// NuGet: Microsoft.CodeAnalysis.Workspaces.MSBuild, Microsoft.Build.Locator
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.FindSymbols;
using Microsoft.CodeAnalysis.MSBuild;

public static class SmartDiff
{
    public static async Task<HashSet<string>> ReachableSinks(string solutionPath, string[] entrypoints, string[] sinks)
    {
        // Call Microsoft.Build.Locator.MSBuildLocator.RegisterDefaults() once at startup.
        var workspace = MSBuildWorkspace.Create();
        var solution = await workspace.OpenSolutionAsync(solutionPath);
        var index = new HashSet<string>();

        foreach (var proj in solution.Projects)
        {
            var comp = await proj.GetCompilationAsync();
            if (comp is null) continue;

            // Resolve entrypoints & sinks by fully-qualified symbol name
            var epSymbols = comp.GlobalNamespace.GetMembers().SelectMany(Descend)
                .OfType<IMethodSymbol>().Where(m => entrypoints.Contains(m.ToDisplayString())).ToList();
            var sinkSymbols = comp.GlobalNamespace.GetMembers().SelectMany(Descend)
                .OfType<IMethodSymbol>().Where(m => sinks.Contains(m.ToDisplayString())).ToList();

            foreach (var ep in epSymbols)
            foreach (var sink in sinkSymbols)
            {
                // Heuristic reachability: cheap reference-existence check via SymbolFinder
                var refs = await SymbolFinder.FindReferencesAsync(sink, solution);
                if (refs.SelectMany(r => r.Locations).Any()) // replace with a real graph walk
                    index.Add($"{ep.ToDisplayString()} -> {sink.ToDisplayString()}");
            }
        }
        return index;

        static IEnumerable<ISymbol> Descend(INamespaceOrTypeSymbol sym)
        {
            foreach (var m in sym.GetMembers())
            {
                yield return m;
                if (m is INamespaceOrTypeSymbol nt) foreach (var x in Descend(nt)) yield return x;
            }
        }
    }
}
```

**Go — SSA & callgraph seed**

```go
// go.mod: require golang.org/x/tools latest
package main

import (
	"fmt"

	"golang.org/x/tools/go/callgraph/cha"
	"golang.org/x/tools/go/packages"
	"golang.org/x/tools/go/ssa"
	"golang.org/x/tools/go/ssa/ssautil"
)

func main() {
	cfg := &packages.Config{Mode: packages.LoadAllSyntax, Tests: false}
	pkgs, err := packages.Load(cfg, "./...")
	if err != nil {
		panic(err)
	}
	// Build SSA form for all loaded packages.
	prog, _ := ssautil.AllPackages(pkgs, ssa.BuilderMode(0))
	prog.Build()

	// CHA: a fast, conservative whole-program call graph.
	cg := cha.CallGraph(prog)
	// TODO: map entrypoints & sinks, then walk cg from EPs to sinks
	fmt.Println("nodes:", len(cg.Nodes))
}
```

---

### How to use it in your pipeline (fast win)

* **Pre‑merge job**:

  1. Build the call graph for `HEAD` and `HEAD^`.
  2. Compute the Smart‑Diff.
  3. If any *new* EP→sink path appears, fail with a short, proof‑linked note:
     “New reachable path: `POST /Invoices -> PdfExporter.Save(string path)` (writes outside sandbox).”

* **Post‑scan VEX**:

  * For each CVE on a package, mark it **Affected** only if some EP can reach a symbol that uses that package’s vulnerable surface.

---

### Evidence to show in the UI

* “**Path card**”: EP → … → sink, with a file:line hop list and the commit hash.
* “**What changed**”: before/after path diff (green removed, red added).
* “**Why it matters**”: sink classification (network write, file write, deserialization, SQL, crypto).

---

### Developer checklist (Stella Ops style)

* [ ] Define entrypoints per service (attribute or YAML).
* [ ] Define a sink taxonomy (FS, NET, DESER, SQL, CRYPTO).
* [ ] Implement language adapters: `.NET (Roslyn)`, `Go (SSA)`, later `Java (Soot/WALA)`.
* [ ] Add a **ReachabilityCache** (Postgres table keyed by commit+lang+service).
* [ ] Wire a `SmartDiffJob` in CI; emit SARIF + the CycloneDX `vulnerability-assertions` extension or OpenVEX.
* [ ] Gate merges on **newly‑reachable sensitive sinks**; auto‑VEX when paths disappear.

If you want, I can turn this into a small repo scaffold (Roslyn + Go adapters, Postgres schema, a GitLab/GitHub pipeline, and a minimal UI “path card”).

---

Below is a concrete **development implementation plan** to take the “Smart‑Diff” idea (reachability + dataflow + dependency/vuln context) into a shippable product integrated into your pipeline (Stella Ops style). I’ll assume the initial languages are **.NET (C#)** and **Go**, and the initial goal is **PR gating + VEX automation** with strong evidence (paths + file/line hops).
---

## 1) Product definition

### Problem you’re solving

Security noise comes from:

* “vuln exists in dependency” ≠ “vuln exploitable from any entrypoint”
* Git diffs look big even when behavior is unchanged
* teams struggling to triage “is this change actually risky?”

### What Smart‑Diff should do (core behavior)

Given **base commit A** and **head commit B**:

1. Identify **entrypoints** (web handlers, RPC methods, message consumers, CLI commands).
2. Identify **sinks** (file write, command exec, SQL, SSRF, deserialization, crypto misuse, templating, etc.).
3. Compute **reachable paths** from entrypoints → sinks (call graph + dataflow/taint).
4. Emit the **Smart‑Diff**:

   * **newly reachable** EP→sink paths (risk ↑)
   * **removed** EP→sink paths (risk ↓)
   * **changed** paths (same sink but different sanitization/guards)

5. Attach **dependency vulnerability context**:

   * if a vulnerable API surface is reachable (or data reaches it), mark “affected/exploitable”
   * otherwise generate **VEX**: “not affected” / “not exploitable” with evidence

### MVP definition (minimum shippable)

A PR check that:

* Flags **new** reachable paths to a small set of high‑risk sinks (e.g., command exec, unsafe deserialization, filesystem write, SSRF/network dial, raw SQL).
* Produces:

  * a SARIF report (for the code scanning UI)
  * a JSON artifact containing proof paths (EP → … → sink with file:line)
  * an optional VEX statement for dependency vulnerabilities (if you already have an SCA feed)

---

## 2) Architecture you can actually build

### High‑level components

1. **Policy & Taxonomy Service**

   * Defines entrypoints, sources, sinks, sanitizers, confidence rules
   * Versioned and centrally managed (but supports repo overrides)

2. **Analyzer Workers (language adapters)**

   * .NET analyzer (Roslyn + control flow)
   * Go analyzer (SSA + callgraph)
   * Outputs a standardized IR (intermediate representation)

3. **Graph Store + Reachability Engine**

   * Stores symbol nodes + call edges + dataflow edges
   * Computes reachable sinks per entrypoint
   * Computes the diff between commits A and B

4. **Vulnerability Mapper + VEX Generator**

   * Maps vulnerable packages/functions → “surfaces”
   * Joins them with reachability results
   * Emits OpenVEX (or CycloneDX VEX) with evidence links

5. **CI/PR Integrations**

   * CLI that runs in CI
   * Optional server mode (cache + incremental processing)

6. **UI/API**

   * Path cards: “what changed”, “why it matters”, “proof”
   * Filters by sink class, confidence, service, entrypoint

### Data contracts (standardized IR)

Make every analyzer output the same shapes so the rest of the pipeline is language‑agnostic (a `symbol_id` hashing sketch follows this list):

* **Symbols**

  * `symbol_id`: stable hash of (lang, module, fully-qualified name, signature)
  * metadata: file, line ranges, kind (method/function), accessibility

* **Edges**

  * call edge: `caller_symbol_id -> callee_symbol_id`
  * dataflow edge: `source_symbol_id -> sink_symbol_id` with variable/parameter traces
  * edge metadata: type, confidence, reason (static, reflection guess, interface dispatch, etc.)

* **Entrypoints / Sources / Sinks**

  * entrypoint: (symbol_id, route/topic/command metadata)
  * sink: (symbol_id, sink_type, severity, optional CWE mapping)

* **Paths**

  * `entrypoint -> ... -> sink`
  * hop list: symbol_id + file:line, plus “dataflow step evidence” when relevant
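
A sketch of the stable `symbol_id` derivation (the separator and truncation length are arbitrary choices here, not a fixed spec):

```go
package ir

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// SymbolID derives a stable, language-agnostic symbol identifier from the
// fields listed above. Any change to one component changes the ID.
func SymbolID(lang, module, fqn, signature string) string {
	// "\x1f" (unit separator) avoids ambiguity between concatenated fields.
	key := strings.Join([]string{lang, module, fqn, signature}, "\x1f")
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])[:16] // truncated for readability
}
```
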
---

## 3) Workstreams and deliverables

### Workstream A — Policy, taxonomy, configuration

**Deliverables**

* `smartdiff.policy.yaml` schema and validator
* A default sink taxonomy:

  * `CMD_EXEC`, `UNSAFE_DESER`, `SQL_RAW`, `SSRF`, `FILE_WRITE`, `PATH_TRAVERSAL`, `TEMPLATE_INJECTION`, `CRYPTO_WEAK`, `AUTHZ_BYPASS` (expand later)

* Initial sanitizer patterns:

  * for example: parameter validation, safe deserialization wrappers, ORM parameterized APIs, path normalization, allowlists

**Implementation notes**

* Start strict and small: 10–20 sinks, 10 sources, 10 sanitizers.
* Provide repo-level overrides:

  * `smartdiff.policy.yaml` in the repo root
  * central policies referenced by version tag

**Acceptance criteria**

* A service can onboard by configuring:

  * entrypoint discovery mode (auto + manual)
  * sink classes to enforce
  * the severity threshold that fails a PR

---

### Workstream B — .NET analyzer (Roslyn)

**Deliverables**

* Build pipeline that produces:

  * a call graph (methods and invocations)
  * basic control-flow guards for reachability (optional for MVP)
  * taint propagation for common patterns (MVP: parameter → sink)

* Entrypoint discovery for:

  * ASP.NET controllers (`[HttpGet]`, `[HttpPost]`)
  * minimal APIs (`MapGet`/`MapPost`)
  * gRPC service methods
  * message consumers (configurable attributes/interfaces)

**Implementation notes (practical path)**

* MVP static callgraph:

  * use the Roslyn semantic model to resolve invocation targets
  * for virtual/interface calls: conservative resolution to possible implementations within the compilation

* MVP taint:

  * “sources”: request params/body, headers, query string, message payloads
  * “sinks”: wrappers around `Process.Start`, `SqlCommand`, `File.WriteAllText`, `HttpClient.Send`, deserializers, etc.
  * propagate taint across:

    * parameter → local → argument
    * return values
    * simple assignments and concatenations (heuristic)

* Confidence scoring:

  * direct static call resolution: high
  * reflection/dynamic: low (flag separately)

**Acceptance criteria**

* On a demo ASP.NET service, if a PR adds
  `HttpPost /upload` → `File.WriteAllBytes(userPath, ...)`,
  Smart‑Diff flags a **new EP→FILE_WRITE path** and shows the hops with file/line.

---

### Workstream C — Go analyzer (SSA)

**Deliverables**

* SSA build + callgraph extraction
* Entrypoint discovery for:

  * `net/http` handlers
  * common routers (Gin/Echo/Chi) via adapter rules
  * gRPC methods
  * consumers (Kafka/NATS/etc.) by config

**Implementation notes**

* Use `golang.org/x/tools/go/packages` + the `ssa` builder
* Callgraph:

  * start with CHA (Class Hierarchy Analysis) for speed
  * later add pointer analysis for precision on interfaces

* Taint:

  * sources: `http.Request`, router params, message payloads
  * sinks: `os/exec`, raw `database/sql` queries, file I/O, outbound `net/http`, unsafe deserialization libs

**Acceptance criteria**

* A PR that adds `exec.Command(req.FormValue("cmd"))` becomes a **new EP→CMD_EXEC** finding.

---

### Workstream D — Graph store + reachability computation

**Deliverables**

* Schema in Postgres (recommended first) for:

  * commits, services, languages
  * symbols, edges, entrypoints, sinks
  * computed reachability “facts” (entrypoint→sink with shortest path(s))

* Reachability engine:

  * BFS/DFS per entrypoint with early cutoffs (sketched below)
  * path-reconstruction storage (store a predecessor map, or store k-shortest paths)

**Implementation notes**

* Don’t start with a graph DB unless you must.
* Use Postgres tables + indexes:

  * `edges(from_symbol, to_symbol, commit_id, kind)`
  * `symbols(symbol_id, lang, module, fqn, file, line_start, line_end)`
  * `reachability(entrypoint_id, sink_id, commit_id, path_hash, confidence, severity, evidence_json)`

* Cache:

  * keyed by (commit, policy_version, analyzer_version)
  * avoids recompute on re-runs
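
A sketch of the BFS with predecessor-based path reconstruction, over an in-memory adjacency map (loading edges from the tables above is left out):

```go
package reach

// Graph maps a symbol_id to its callees (loaded from the edges table).
type Graph map[string][]string

// PathTo returns one entrypoint→sink proof path as a hop list,
// or nil if the sink is unreachable. maxDepth is the early cutoff.
func PathTo(g Graph, entrypoint, sink string, maxDepth int) []string {
	type item struct {
		node  string
		depth int
	}
	pred := map[string]string{entrypoint: ""} // predecessor map doubles as "visited"
	queue := []item{{entrypoint, 0}}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur.node == sink {
			// Reconstruct the hop list by walking predecessors backwards.
			var rev []string
			for n := sink; n != ""; n = pred[n] {
				rev = append(rev, n)
			}
			path := make([]string, len(rev))
			for i, n := range rev {
				path[len(rev)-1-i] = n
			}
			return path
		}
		if cur.depth == maxDepth {
			continue
		}
		for _, next := range g[cur.node] {
			if _, seen := pred[next]; !seen {
				pred[next] = cur.node
				queue = append(queue, item{next, cur.depth + 1})
			}
		}
	}
	return nil
}
```
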
**Acceptance criteria**

* For any analyzed commit, you can answer:

  * “Which sinks are reachable from these entrypoints?”
  * “Show me one proof path per (entrypoint, sink_type).”

---

### Workstream E — Smart‑Diff engine (the “diff” part)

**Deliverables**

* Diff algorithm producing three buckets:

  * `added_paths`, `removed_paths`, `changed_paths`

* “Changed” means:

  * same entrypoint + sink type, but the path differs, OR taint/sanitization differs, OR the confidence changes

**Implementation notes**

* Identify a path by a stable fingerprint (sketched below):

  * `path_id = hash(entrypoint_symbol + sink_symbol + sink_type + policy_version + analyzer_version)`

* Store:

  * the top-k paths for each pair as evidence (k=1 for MVP; add more later)

* Severity gating rules, for example:

  * new path to `CMD_EXEC` = fail
  * new path to `FILE_WRITE` = warn unless under a `/tmp` allowlist
  * new path to `SQL_RAW` = fail unless a parameterized sanitizer is present
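
A sketch of the fingerprint plus the added/removed bucketing as a set difference (the struct shape is illustrative):

```go
package smartdiff

import (
	"crypto/sha256"
	"encoding/hex"
)

// Path is the (entrypoint, sink) pair plus the versions that scope the result.
type Path struct {
	Entrypoint, Sink, SinkType     string
	PolicyVersion, AnalyzerVersion string
}

// ID is the stable fingerprint described above.
func (p Path) ID() string {
	sum := sha256.Sum256([]byte(p.Entrypoint + "|" + p.Sink + "|" + p.SinkType +
		"|" + p.PolicyVersion + "|" + p.AnalyzerVersion))
	return hex.EncodeToString(sum[:])
}

// Diff buckets head-vs-base reachable paths into added and removed sets.
func Diff(base, head []Path) (added, removed []Path) {
	baseIDs := map[string]bool{}
	for _, p := range base {
		baseIDs[p.ID()] = true
	}
	headIDs := map[string]bool{}
	for _, p := range head {
		headIDs[p.ID()] = true
		if !baseIDs[p.ID()] {
			added = append(added, p)
		}
	}
	for _, p := range base {
		if !headIDs[p.ID()] {
			removed = append(removed, p)
		}
	}
	return added, removed
}
```
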
**Acceptance criteria**

* Given commits A and B, if B introduces a newly reachable sink, CI fails with a single actionable card:

  * **EP**: route / handler
  * **Sink**: type + symbol
  * **Proof**: hop list
  * **Why**: the policy rule triggered

---

### Workstream F — Vulnerability mapping + VEX

**Deliverables**

* Ingest the dependency inventory (SBOM or lockfiles)
* Map vulnerabilities to “surfaces”:

  * package → vulnerable module/function patterns
  * minimal version/range matching (from your existing vuln feed)

* Decision logic:

  * **Affected** if any reachable path intersects the vulnerable surface OR dataflow reaches a vulnerable sink
  * else **Not affected / Not exploitable** with justification

**Implementation notes**

* Start with a pragmatic approach:

  * package‑level reachability: “is any symbol in that package reachable?”
  * then iterate toward function‑level surfaces

* VEX output:

  * include the commit hash, policy version, and evidence paths
  * embed links to internal “path card” URLs if available

**Acceptance criteria**

* For a known vulnerable dependency, the system emits a VEX “not affected” if the package’s code is never reached from any entrypoint, with proof references.

|

### Workstream G — CI integration + developer UX

**Deliverables**

* A single CLI:

  * `smartdiff analyze --commit <sha> --service <svc> --lang <dotnet|go>`
  * `smartdiff diff --base <shaA> --head <shaB> --out sarif`

* CI templates for:

  * GitHub Actions / GitLab CI

* Outputs:

  * SARIF
  * a JSON evidence bundle
  * an optional OpenVEX file

**Acceptance criteria**

* Teams can enable Smart‑Diff by adding a CI job + config file, with no additional infra required for MVP (local artifacts mode).
* When infra is available, enable server caching mode for speed.

|

### Workstream H — UI “Path Cards”

**Deliverables**

* UI components:

  * a path-card list with filters (sink type, severity, confidence)
  * a “what changed” diff view:

    * red = added hops
    * green = removed hops

  * an “evidence” panel:

    * file:line for each hop
    * code snippets (optional)

* APIs:

  * `GET /smartdiff/{repo}/{pr}/findings`
  * `GET /smartdiff/{repo}/{commit}/path/{path_id}`

**Acceptance criteria**

* A developer can click one finding and understand:

  * how the data got there
  * exactly what line introduced the risk
  * how to fix it (sanitize/guard/allowlist)

|

## 4) Milestone plan (sequenced, no time promises)

### Milestone 0 — Foundation

* Repo scaffolding:

  * `smartdiff-cli/`
  * `analyzers/dotnet/`
  * `analyzers/go/`
  * `core-ir/` (schemas + validation)
  * `server/` (optional; can come later)

* Define the IR JSON schema + versioning rules
* Implement the policy YAML + validator + sample policies
* Implement “local mode” artifact output

**Exit criteria**

* You can run `smartdiff analyze` and get a valid IR file for at least one trivial repo.

---

### Milestone 1 — Callgraph reachability MVP

* .NET: build call edges + basic entrypoint discovery
* Go: build call edges + basic entrypoint discovery
* Graph store: in-memory, or local SQLite/Postgres
* Compute reachable sinks (callgraph only, no taint)

**Exit criteria**

* On a demo repo, you can list:

  * entrypoints
  * reachable sinks (callgraph reachability only)
  * a proof path (hop list)

---

### Milestone 2 — Smart‑Diff MVP (PR gating)

* Compute the diff between base/head reachable-sink sets
* Produce SARIF (a minimal emission sketch follows) with:

  * rule id = sink type
  * a message that includes the entrypoint + sink + a link to the evidence JSON

* CI templates + documentation

**Exit criteria**

* In PR checks, the job fails on new EP→sink paths and links to a proof.
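
A sketch of emitting one minimal SARIF 2.1.0 result in Go (the structs cover only the fields used here; the real schema has many more, including locations for file/line):

```go
package sarif

import "encoding/json"

// Minimal SARIF 2.1.0 shapes for one result.
type Log struct {
	Version string `json:"version"`
	Schema  string `json:"$schema"`
	Runs    []Run  `json:"runs"`
}
type Run struct {
	Tool    Tool     `json:"tool"`
	Results []Result `json:"results"`
}
type Tool struct {
	Driver struct {
		Name string `json:"name"`
	} `json:"driver"`
}
type Result struct {
	RuleID  string `json:"ruleId"` // sink type, e.g. "CMD_EXEC"
	Level   string `json:"level"`  // "error" or "warning"
	Message struct {
		Text string `json:"text"`
	} `json:"message"`
}

// NewFinding renders one "new EP→sink path" finding as SARIF JSON.
func NewFinding(sinkType, entrypoint, sink, evidenceURL string) ([]byte, error) {
	var res Result
	res.RuleID = sinkType
	res.Level = "error"
	res.Message.Text = "New reachable path: " + entrypoint + " -> " + sink +
		" (evidence: " + evidenceURL + ")"
	log := Log{
		Version: "2.1.0",
		Schema:  "https://json.schemastore.org/sarif-2.1.0.json",
		Runs:    []Run{{Results: []Result{res}}},
	}
	log.Runs[0].Tool.Driver.Name = "smartdiff"
	return json.MarshalIndent(log, "", "  ")
}
```
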
---

### Milestone 3 — Taint/dataflow MVP (high-value sinks only)

* Add taint propagation to reduce false positives:

  * differentiate “sink reachable” vs “untrusted data reaches sink”

* Add sanitizer recognition
* Add confidence scoring + suppression mechanisms (policy allowlists)

**Exit criteria**

* A sink is only “high severity” if it is both reachable and tainted (or policy says otherwise).

---

### Milestone 4 — VEX integration MVP

* Join reachability with dependency vulnerabilities
* Emit OpenVEX (and/or CycloneDX VEX)
* Store evidence references (paths) inside the VEX justification

**Exit criteria**

* For a repo with a vulnerable dependency, you can automatically produce an affected/not-affected decision with evidence.

---

### Milestone 5 — Scale and precision improvements

* Incremental analysis (only analyze changed projects/packages)
* Better dynamic-dispatch handling (Go pointer analysis, .NET interface-dispatch expansion)
* Optional runtime telemetry integration:

  * import production traces to prioritize “actually observed” entrypoints

**Exit criteria**

* Works on large services with acceptable run time and stable noise levels.

---

## 5) Backlog you can paste into Jira (epics + key stories)

### Epic: Policy & taxonomy

* Story: Define the `smartdiff.policy.yaml` schema and validator
  **AC:** invalid configs fail with clear errors; configs are versioned.
* Story: Provide a default sink list and severities
  **AC:** at least 10 sink rules with test cases.

### Epic: .NET analyzer

* Story: Resolve method invocations to symbols (Roslyn)
  **AC:** correct targets for direct calls; conservative handling for virtual calls.
* Story: Discover ASP.NET routes and bind them to entrypoint symbols
  **AC:** entrypoints include route/method metadata.

### Epic: Go analyzer

* Story: SSA build and callgraph extraction
  **AC:** function nodes and edges generated for a multi-package repo.
* Story: net/http entrypoint discovery
  **AC:** handler functions recognized as entrypoints with path labels.

### Epic: Reachability engine

* Story: Compute reachable sinks per entrypoint
  **AC:** store at least one path with a hop list.
* Story: Smart‑Diff A vs B
  **AC:** added/removed paths computed deterministically.

### Epic: CI/SARIF

* Story: Emit SARIF results
  **AC:** findings appear in the code scanning UI and include file/line.

### Epic: Taint analysis

* Story: Propagate taint from request to sink for 3 sink classes
  **AC:** produces “tainted” evidence with a variable/argument trace.
* Story: Sanitizer recognition
  **AC:** path marked “sanitized” and downgraded per policy.

### Epic: VEX

* Story: Generate OpenVEX statements from reachability + the vuln feed
  **AC:** “not affected” statements include justification and evidence references.

---

## 6) Key engineering decisions (recommended defaults)

### Storage

* Start with **Postgres** (or even local SQLite for MVP) for simplicity.
* Introduce a graph DB only if:

  * you need very large multi-commit graph queries at low latency
  * Postgres performance becomes a hard blocker

### Confidence model

Every edge/path should carry:

* `confidence`: High/Med/Low
* `reasons`: e.g., `DirectCall`, `InterfaceDispatch`, `ReflectionGuess`, `RouterHeuristic`

This lets you:

* gate only on high-confidence paths in the early rollout
* keep low-confidence paths as “informational”

### Suppression model

* Local suppressions:

  * `smartdiff.suppress.yaml` with rule id + symbol id + reason + expiry

* Policy allowlists:

  * allow file writes only under certain directories
  * allow outbound network only to configured domains

---

## 7) Testing strategy (to avoid “cool demo, unusable tool”)

### Unit tests

* Symbol-hashing stability tests
* Call-resolution tests:

  * overloads, generics, interfaces, lambdas

* Policy parsing/validation tests

### Integration tests (must-have)

* Golden repos in `testdata/`:

  * one ASP.NET minimal API
  * one MVC controller app
  * one Go net/http app + one Gin app

* Golden outputs:

  * expected entrypoints
  * expected reachable sinks
  * expected diff between commits

### Regression tests

* A curated corpus of “known issues”:

  * false positives you fixed should never return
  * false negatives: ensure a known risky path is always found

### Performance tests

* Measure:

  * analysis time per 50k LOC
  * peak memory
  * graph size

* Budget enforcement:

  * if over budget, degrade gracefully (lower precision, mark low confidence)

---

## 8) Example configs and outputs (to make onboarding easy)

### Example policy YAML (minimal)

```yaml
version: 1
service: invoices-api
entrypoints:
  autodiscover:
    dotnet:
      aspnet: true
    go:
      net_http: true

sinks:
  - type: CMD_EXEC
    severity: high
    match:
      dotnet:
        symbols:
          - "System.Diagnostics.Process.Start(string)"
      go:
        symbols:
          - "os/exec.Command"
  - type: FILE_WRITE
    severity: medium
    match:
      dotnet:
        namespaces: ["System.IO"]
      go:
        symbols: ["os.WriteFile"]

gating:
  fail_on:
    - sink_type: CMD_EXEC
      when: "added && confidence >= medium"
    - sink_type: FILE_WRITE
      when: "added && tainted && confidence >= medium"
```

### Evidence JSON shape (what the UI consumes)

```json
{
  "commit": "abc123",
  "entrypoint": {"symbol": "InvoicesController.Upload()", "route": "POST /upload"},
  "sink": {"type": "FILE_WRITE", "symbol": "System.IO.File.WriteAllBytes"},
  "confidence": "high",
  "tainted": true,
  "path": [
    {"symbol": "InvoicesController.Upload()", "file": "Controllers/InvoicesController.cs", "line": 42},
    {"symbol": "UploadService.Save()", "file": "Services/UploadService.cs", "line": 18},
    {"symbol": "System.IO.File.WriteAllBytes", "file": null, "line": null}
  ]
}
```
---

## 9) Risks and mitigations (explicit)

1. **Dynamic behavior (reflection, DI, router magic)**

   * Mitigation: conservative fallbacks + confidence labels + optional runtime traces later

2. **Noise from huge callgraphs**

   * Mitigation: sink-first slicing (compute reachability backwards from sinks), entrypoint scoping, k‑shortest paths only

3. **Large-repo build failures**

   * Mitigation: run analyzers inside build containers; allow partial analysis with an explicit “incomplete” result flag

4. **Teams rejecting gating**

   * Mitigation: staged rollout: observe-only mode → warn-only → fail only on high-confidence CMD_EXEC/UNSAFE_DESER

---

## 10) Definition of done (what “implemented” means)

You should consider Smart‑Diff “implemented” when:

* A repo can enable it with one config + one CI job.
* PRs get:

  * a small number of **actionable** findings (not hundreds)
  * a proof path with file/line hops for each finding

* It reliably detects at least:

  * new command-execution paths
  * new unsafe-deserialization paths
  * new tainted filesystem-write paths

* It can optionally emit VEX decisions backed by reachability evidence.

---

If you want the next step, I can also give you:

* a **concrete repo layout** with module boundaries,
* the **Postgres schema** (tables + indexes),
* and a **language adapter interface** (so adding Java/Python later is straightforward).