diff --git a/docs/product-advisories/12-Dec-2025 - Designing a Deterministic Vulnerability Scoring Matrix.md b/docs/product-advisories/12-Dec-2025 - Designing a Deterministic Vulnerability Scoring Matrix.md new file mode 100644 index 000000000..37fdb79fa --- /dev/null +++ b/docs/product-advisories/12-Dec-2025 - Designing a Deterministic Vulnerability Scoring Matrix.md @@ -0,0 +1,750 @@ +Here’s a simple, practical way to score vulnerabilities that’s more auditable than plain CVSS: build a **deterministic score** from three reproducible inputs—**Reachability**, **Evidence**, and **Provenance**—so every number is explainable and replayable. + +--- + +### Why move beyond CVSS? + +* **CVSS is context-light**: it rarely knows *your* call paths, configs, or runtime. +* **Audits need proof**: regulators and customers increasingly ask, “show me how you got this score.” +* **Teams need consistency**: the same image should get the same score across environments when inputs are identical. + +--- + +### The scoring idea (plain English) + +Score = a weighted function of: + +1. **Reachability Depth (R)** — how close the vulnerable function is to a real entry point in *your* app (e.g., public HTTP route → handler → library call). +2. **Evidence Density (E)** — how much concrete proof you have (stack traces, symbol hits, config toggles, feature flags, SCA vs. SAST vs. DAST vs. runtime). +3. **Provenance Integrity (P)** — how trustworthy the artifact chain is (signed SBOM, DSSE attestations, SLSA/Rekor entries, reproducible build match). + +A compact, auditable formula you can start with: + +``` +NormalizedScore = W_R * f(R) + W_E * g(E) + W_P * h(P) +``` + +* Pick monotonic, bounded transforms (e.g., map to 0..1): + + * f(R): inverse of hops (shorter path ⇒ higher value) + * g(E): weighted sum of evidence types (runtime>DAST>SAST>SCA, with decay for stale data) + * h(P): cryptographic/provenance checks (unsigned < signed < signed+attested < signed+attested+reproducible) + +Keep **W_R + W_E + W_P = 1** (e.g., 0.5, 0.35, 0.15 for reachability-first triage). + +--- + +### What makes this “deterministic”? + +* Inputs are **machine-replayable**: call-graph JSON, evidence bundle (hashes + timestamps), provenance attestations. +* The score is **purely a function of those inputs**, so anyone can recompute it later and match your result byte-for-byte. + +--- + +### Minimal rubric (ready to implement) + +* **Reachability (R, 0..1)** + + * 1.00 = vulnerable symbol called on a hot path from a public route (≤3 hops) + * 0.66 = reachable but behind uncommon feature flag or deep path (4–7 hops) + * 0.33 = only theoretically reachable (code present, no discovered path) + * 0.00 = dead/unreferenced code in this build +* **Evidence (E, 0..1)** (sum, capped at 1.0) + + * +0.6 runtime trace hitting the symbol + * +0.3 DAST/integ test activating vulnerable behavior + * +0.2 SAST precise sink match + * +0.1 SCA presence only (no call evidence) + * (Apply 10–30% decay if older than N days) +* **Provenance (P, 0..1)** + + * 0.0 unsigned/unknown origin + * 0.3 signed image only + * 0.6 signed + SBOM (hash-linked) + * 1.0 signed + SBOM + DSSE attestations + reproducible build match + +Example weights: `W_R=0.5, W_E=0.35, W_P=0.15`. + +--- + +### How this plugs into **Stella Ops** + +* **Scanner** produces call-graphs & symbol maps (R). +* **Vexer**/Evidence store aggregates SCA/SAST/DAST/runtime proofs with timestamps (E). +* **Authority/Proof‑Graph** verifies signatures, SBOM↔image hash links, DSSE/Rekor (P). 
+* **Policy Engine** applies the scoring formula (YAML policy) and emits a signed VEX note with the score + input hashes. +* **Replay**: any audit can re-run the same policy with the same inputs and get the same score. + +--- + +### Developer checklist (do this first) + +* Emit a **Reachability JSON** per build: entrypoints, hops, functions, edges, timestamps, hashes. +* Normalize **Evidence Types** with IDs, confidence, freshness, and content hashes. +* Record **Provenance Facts** (signing certs, SBOM digest, DSSE bundle, reproducible-build fingerprint). +* Implement the **score function as a pure library** (no I/O), version it (e.g., `score.v1`), and include the version + inputs’ hashes in every VEX note. +* Add a **30‑sec “Time‑to‑Evidence” UI**: click a score → see the exact call path, evidence list, and provenance checks. + +--- + +### Why this helps compliance & sales + +* Every number is **auditable** (inputs + function are transparent). +* Scores remain **consistent across air‑gapped sites** (deterministic, no hidden heuristics). +* You can **prove reduction** after a fix (paths disappear, evidence decays, provenance improves). + +If you want, I can draft the YAML policy schema and a tiny .NET 10 library stub for `score.v1` so you can drop it into Stella Ops today. +Below is an extended, **developer-ready implementation plan** to build the deterministic vulnerability score into **Stella Ops** (Scanner → Evidence/Vexer → Authority/Proof‑Graph → Policy Engine → UI/VEX output). I’m assuming a .NET-centric stack (since you mentioned .NET 10 earlier), but everything is laid out so the scoring core stays language-agnostic. + +--- + +## 1) Extend the scoring model into a stable, “auditable primitive” + +### 1.1 Outputs you should standardize on + +Produce **two** signed artifacts per finding (plus optional UI views): + +1. **ScoreResult** (primary): + +* `riskScore` (0–100 integer) +* `subscores` (each 0–100 integer): `baseSeverity`, `reachability`, `evidence`, `provenance` +* `explain[]` (structured reasons, ordered deterministically) +* `inputs` (digests of all upstream inputs) +* `policy` (policy version + digest) +* `engine` (engine version + digest) +* `asOf` timestamp (the only “time” allowed to affect the result) + +2. **VEX note** (OpenVEX/CSAF-compatible wrapper): + +* references ScoreResult digest +* embeds the score (optional) + the input digests +* signed by Stella Ops Authority + +> Key audit requirement: anyone can recompute the score **offline** from the input bundle + policy + engine version. + +--- + +## 2) Make determinism non-negotiable + +### 2.1 Determinism rules (implement as “engineering constraints”) + +These are the common ways deterministic systems become non-deterministic: + +* **No floating point** in scoring math. Use integer “basis points” and integer bucket tables. +* **No implicit time**. Scoring takes `asOf` as an explicit input. Evidence “freshness” is computed as `asOf - evidence.timestamp`. +* **Canonical serialization** for hashing: + + * Use RFC-style canonical JSON (e.g., JCS) or a strict canonical CBOR profile. + * Sort keys and arrays deterministically. +* **Stable ordering** for explanation lists: + + * Always sort factors by `(factorId, contributingObjectDigest)`. 
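
If it helps to make those constraints concrete, here is a minimal C# sketch of canonical-JSON hashing plus the stable explanation ordering. The type names (`CanonicalJson`, `ExplainEntry`, `ExplainOrdering`) are illustrative assumptions, not an existing Stella Ops API, and a production canonicalizer would implement RFC 8785 (JCS) precisely rather than this shortcut:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

// Sketch: recursively sort object keys, serialize without indentation, hash the bytes.
// Score payloads are integer-only, so JCS number-formatting rules are mostly moot here.
// Requires .NET 8+ for JsonNode.DeepClone().
public static class CanonicalJson
{
    public static string Serialize(JsonNode? node) =>
        Normalize(node)?.ToJsonString() ?? "null";

    public static string Sha256Hex(JsonNode? node) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(Serialize(node))))
               .ToLowerInvariant();

    private static JsonNode? Normalize(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(
            obj.OrderBy(p => p.Key, StringComparer.Ordinal)
               .Select(p => KeyValuePair.Create(p.Key, Normalize(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(Normalize).ToArray()),
        _ => node?.DeepClone() // leaf values: clone so the new tree owns them
    };
}

// Stable ordering for explanation entries, per the rule above.
public sealed record ExplainEntry(string FactorId, string ContributingObjectDigest, string Detail);

public static class ExplainOrdering
{
    public static IReadOnlyList<ExplainEntry> Sort(IEnumerable<ExplainEntry> entries) =>
        entries.OrderBy(e => e.FactorId, StringComparer.Ordinal)
               .ThenBy(e => e.ContributingObjectDigest, StringComparer.Ordinal)
               .ToList();
}
```

Hashing the canonical bytes (not the raw serializer output) is what lets two independently built engines agree on `resultDigest` byte-for-byte.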
+ +### 2.2 Fixed-point scoring approach (recommended) + +Represent weights and multipliers as **basis points** (bps): + +* 100% = 10,000 bps +* 1% = 100 bps + +Example: `totalScore = (wB*B + wR*R + wE*E + wP*P) / 10000` + +--- + +## 3) Extended score definition (v1) + +### 3.1 Subscores (0–100 integers) + +#### BaseSeverity (B) + +* Source: CVSS if present, else vendor severity, else default. +* Normalize to 0–100: + + * CVSS 0.0–10.0 → 0–100 by `B = round(CVSS * 10)` + +Keep it small weight so you’re “beyond CVSS” but still anchored. + +#### Reachability (R) + +Computed from reachability report (call-path depth + gating conditions). + +**Hop buckets** (example): + +* 0–2 hops: 100 +* 3 hops: 85 +* 4 hops: 70 +* 5 hops: 55 +* 6 hops: 45 +* 7 hops: 35 +* 8+ hops: 20 +* unreachable: 0 + +**Gate multipliers** (apply multiplicatively in bps): + +* behind feature flag: ×7000 +* auth required: ×8000 +* only admin role: ×8500 +* non-default config: ×7500 + +Final: `R = bucketScore * gateMultiplier / 10000` + +#### Evidence (E) + +Sum evidence “points” capped at 100, then apply freshness multiplier. + +Evidence points (example): + +* runtime trace hitting vulnerable symbol: +60 +* DAST / integration test triggers behavior: +30 +* SAST precise sink match: +20 +* SCA presence only: +10 + +Freshness bucket multiplier (example): + +* age ≤ 7 days: ×10000 +* ≤ 30 days: ×9000 +* ≤ 90 days: ×7500 +* ≤ 180 days: ×6000 +* ≤ 365 days: ×4000 +* > 365: ×2000 + +Final: `E = min(100, sum(points)) * freshness / 10000` + +#### Provenance (P) + +Based on verified supply-chain checks. + +Levels: + +* unsigned/unknown: 0 +* signed image: 30 +* signed + SBOM hash-linked to image: 60 +* signed + SBOM + DSSE attestations verified: 80 +* above + reproducible build match: 100 + +### 3.2 Total score and overrides + +Weights (example): + +* `wB=1000` (10%) +* `wR=4500` (45%) +* `wE=3000` (30%) +* `wP=1500` (15%) + +Total: + +* `riskScore = (wB*B + wR*R + wE*E + wP*P) / 10000` + +Override examples (still deterministic, because they depend on evidence flags): + +* If `knownExploited=true` AND `R >= 70` → force score to 95+ +* If unreachable (`R=0`) AND only SCA evidence (`E<=10`) → clamp score ≤ 25 + +--- + +## 4) Canonical schemas (what to build first) + +### 4.1 ReachabilityReport (per artifact + vuln) + +Minimum fields: + +* `artifactDigest` (sha256 of image or build artifact) +* `graphDigest` (sha256 of canonical call-graph representation) +* `vulnId` (CVE/OSV/etc) +* `vulnerableSymbol` (fully-qualified) +* `entrypoints[]` (HTTP routes, queue consumers, CLI commands, cron handlers) +* `shortestPath`: + + * `hops` (int) + * `nodes[]` (ordered list of symbols) + * `edges[]` (optional) +* `gates[]`: + + * `type` (“featureFlag” | “authRequired” | “configNonDefault” | …) + * `detail` (string) +* `computedAt` (timestamp) +* `toolVersion` + +### 4.2 EvidenceBundle (per artifact + vuln) + +Evidence items are immutable and deduped by content hash. 
+ +* `evidenceId` (content hash) +* `artifactDigest` +* `vulnId` +* `type` (“SCA” | “SAST” | “DAST” | “RUNTIME” | “ADVISORY”) +* `tool` (name/version) +* `timestamp` +* `confidence` (0–100) +* `subject` (package, symbol, endpoint) +* `payloadDigest` (hash of raw payload stored separately) + +### 4.3 ProvenanceReport (per artifact) + +* `artifactDigest` +* `signatureChecks[]` (who signed, what key, result) +* `sbomDigest` + `sbomType` +* `attestations[]` (DSSE digests + verification result) +* `transparencyLogRefs[]` (optional) +* `reproducibleMatch` (bool) +* `computedAt` +* `toolVersion` +* `verificationLogDigest` + +### 4.4 ScoreInput + ScoreResult + +**ScoreInput** should include: + +* `asOf` +* `policyVersion` +* digests for reachability/evidence/provenance/base severity source + +**ScoreResult** should include: + +* `riskScore`, `subscores` +* `explain[]` (deterministic) +* `engineVersion`, `policyDigest` +* `inputs[]` (digests) +* `resultDigest` (hash of canonical ScoreResult) +* `signature` (Authority signs the digest) + +--- + +## 5) Development implementation plan (phased, with deliverables + acceptance criteria) + +### Phase A — Foundations: schemas, hashing, policy format, test harness + +**Deliverables** + +* Canonical JSON format rules + hashing utilities (shared lib) +* JSON Schemas for: ReachabilityReport, EvidenceItem, ProvenanceReport, ScoreInput, ScoreResult +* “Golden fixture” repo: a set of input bundles and expected ScoreResults +* Policy format `score.v1` (YAML or JSON) using **integer bps** + +**Acceptance criteria** + +* Same input bundle → identical `resultDigest` across: + + * OS (Linux/Windows) + * CPU (x64/ARM64) + * runtime versions (supported .NET versions) +* Fixtures run in CI and fail on any byte-level diff + +--- + +### Phase B — Scoring engine (pure function library) + +**Deliverables** + +* `Stella.ScoreEngine` as a pure library: + + * `ComputeScore(ScoreInputBundle) -> ScoreResult` + * `Explain(ScoreResult) -> structured explanation` (already embedded) +* Policy parser + validator: + + * weights sum to 10,000 + * bucket tables monotonic + * override rules deterministic and total order + +**Acceptance criteria** + +* 100% deterministic tests passing (golden fixtures) +* “Explain” always includes: + + * subscores + * applied buckets + * applied gate multipliers + * freshness bucket selected + * provenance level selected +* No non-deterministic dependencies (time, random, locale, float) + +--- + +### Phase C — Evidence pipeline (Vexer / Evidence Store) + +**Deliverables** + +* Normalized evidence ingestion adapters: + + * SCA ingest (from your existing scanner output) + * SAST ingest + * DAST ingest + * runtime trace ingest (optional MVP → “symbol hit” events) +* Evidence Store service: + + * immutability (append-only) + * dedupe by `evidenceId` + * query by `(artifactDigest, vulnId)` + +**Acceptance criteria** + +* Ingesting the same evidence twice yields identical state (idempotent) +* Every evidence record can be exported as a bundle with content hashes +* Evidence timestamps preserved; `asOf` drives freshness deterministically + +--- + +### Phase D — Reachability analyzer (Scanner extension) + +**Deliverables** + +* Call-graph builder and symbol resolver: + + * for .NET: IL-level call graph + ASP.NET route discovery +* Reachability computation: + + * compute shortest path hops from entrypoints to vulnerable symbol + * attach gating detections (config/feature/auth heuristics) +* Reachability report emitter: + + * emits ReachabilityReport with stable 
digests + +**Acceptance criteria** + +* Given the same build artifact, reachability report digest is stable +* Paths are replayable and visualizable (nodes are resolvable) +* Unreachable findings are explicitly marked and explainable + +--- + +### Phase E — Provenance verification (Authority / Proof‑Graph) + +**Deliverables** + +* Verification pipeline: + + * signature verification for artifact digest + * SBOM hash linking + * attestation verification (DSSE/in‑toto style) + * optional transparency log reference capture + * optional reproducible-build comparison input +* ProvenanceReport emitter (signed verification log digest) + +**Acceptance criteria** + +* Verification is offline-capable if given the necessary bundles +* Any failed check is captured with a deterministic error code + message +* ProvenanceReport digest is stable for same inputs + +--- + +### Phase F — Orchestration: “score a finding” workflow + VEX output + +**Deliverables** + +* Orchestrator service (or existing pipeline step) that: + + 1. receives a vulnerability finding + 2. fetches reachability/evidence/provenance bundles + 3. builds ScoreInput with `asOf` + 4. computes ScoreResult + 5. signs ScoreResult digest + 6. emits VEX note referencing ScoreResult digest +* Storage for ScoreResult + VEX note (immutable, versioned) + +**Acceptance criteria** + +* “Recompute” produces same ScoreResult digest if inputs unchanged +* VEX note includes: + + * policy version + digest + * engine version + * input digests + * score + subscores +* End-to-end API returns “why” data in <1 round trip (cached) + +--- + +### Phase G — UI: “Why this score?” and replay/export + +**Deliverables** + +* Findings view enhancements: + + * score badge + risk bucket (Low/Med/High/Critical) + * click-through “Why this score” +* “Why this score” panel: + + * call path visualization (at least as an ordered list for MVP) + * evidence list with freshness + confidence + * provenance checks list (pass/fail) + * export bundle (inputs + policy + engine version) for audit replay + +**Acceptance criteria** + +* Any score is explainable in <30 seconds by a human reviewer +* Exported bundle can reproduce score offline + +--- + +### Phase H — Governance: policy-as-code, versioning, calibration, rollout + +**Deliverables** + +* Policy registry: + + * store `score.v1` policies by org/project/environment + * approvals + change log +* Versioning strategy: + + * engine semantic versioning + * policy digest pinned in ScoreResult + * migration tooling (e.g., score.v1 → score.v2) +* Rollout mechanics: + + * shadow mode: compute score but don’t enforce + * enforcement gates: block deploy if score ≥ threshold + +**Acceptance criteria** + +* Policy changes never rewrite past scores +* You can backfill new scores with a new policy version without ambiguity +* Audit log shows: who changed policy, when, why (optional but recommended) + +--- + +## 6) Engineering backlog (epics → stories → DoD) + +### Epic 1: Deterministic core + +* Story: implement canonical JSON + hashing +* Story: implement fixed-point math helpers (bps) +* Story: implement score.v1 buckets + overrides +* DoD: + + * no floats + * golden test suite + * deterministic explain ordering + +### Epic 2: Evidence normalization + +* Story: evidence schema + dedupe +* Story: adapters (SCA/SAST/DAST/runtime) +* Story: evidence query API +* DoD: + + * idempotent ingest + * bundle export with digests + +### Epic 3: Reachability + +* Story: entrypoint discovery for target frameworks +* Story: call graph extraction +* Story: 
shortest-path computation +* Story: gating heuristics +* DoD: + + * stable digests + * replayable paths + +### Epic 4: Provenance + +* Story: verify signatures +* Story: verify SBOM link +* Story: verify attestations +* Story: reproducible match input support +* DoD: + + * deterministic error codes + * stable provenance scoring + +### Epic 5: End-to-end score + VEX + +* Story: orchestration +* Story: ScoreResult signing +* Story: VEX generation and storage +* DoD: + + * recompute parity + * verifiable signatures + +### Epic 6: UI + +* Story: score badge + buckets +* Story: why panel +* Story: export bundle + recompute button +* DoD: + + * human explainability + * offline replay works + +--- + +## 7) APIs to implement (minimal but complete) + +### 7.1 Compute score (internal) + +* `POST /api/score/compute` + + * input: `ScoreInput` + references or inline bundles + * output: `ScoreResult` + +### 7.2 Get score (product) + +* `GET /api/findings/{findingId}/score` + + * returns latest ScoreResult + VEX reference + +### 7.3 Explain score + +* `GET /api/findings/{findingId}/score/explain` + + * returns `explain[]` + call path + evidence list + provenance checks + +### 7.4 Export replay bundle + +* `GET /api/findings/{findingId}/score/bundle` + + * returns a tar/zip containing: + + * ScoreInput + * policy file + * reachability/evidence/provenance reports + * engine version manifest + +--- + +## 8) Testing strategy (what to automate early) + +### Unit tests + +* bucket selection correctness +* gate multiplier composition +* evidence freshness bucketing +* provenance level mapping +* override rule ordering + +### Golden fixtures + +* fixed input bundles → fixed ScoreResult digest +* run on every supported platform/runtime + +### Property-based tests + +* monotonicity: + + * fewer hops should not reduce R + * more evidence points should not reduce E + * stronger provenance should not reduce P + +### Integration tests + +* full pipeline: finding → bundles → score → VEX +* “recompute” parity tests + +--- + +## 9) Operational concerns and hardening + +### Performance + +* Cache reachability per `(artifactDigest, vulnId, symbol)` +* Cache provenance per `artifactDigest` +* Evidence queries should be indexed by `(artifactDigest, vulnId, type)` + +### Security + +* Treat evidence ingestion as untrusted input: + + * strict schema validation + * content-hash dedupe prevents tampering via overwrite +* Sign ScoreResults and VEX notes +* RBAC: + + * who can change policy + * who can override scores (if allowed at all) + +### Data retention + +* Evidence payloads can be large; keep digests + store raw payloads in object storage +* Keep a “minimal replay bundle” always (schemas + digests + policy + engine) + +--- + +## 10) Concrete “MVP first” slice (smallest valuable product) + +If you want a crisp MVP that still satisfies “auditable determinism”: + +1. Scoring engine (`B + R + E + P`), fixed-point, golden tests +2. Evidence store (SCA + runtime optional) +3. Reachability: only hop depth from HTTP routes to symbol (no fancy gates) +4. Provenance: signed image + SBOM link only +5. UI: score + “why” panel showing: + + * hops/path list + * evidence list + * provenance checklist +6. Emit a signed VEX note containing the score + input digests + +That MVP already proves the core differentiator: **deterministic, replayable risk scoring**. 
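
For item 1 of that slice, here is a minimal sketch of the pure, integer-only scoring core described in sections 2–3. The type and member names (`ScorePolicy`, `ScoreInput`, `ScoreEngine`) are illustrative assumptions, not the final `Stella.ScoreEngine` surface:

```csharp
using System;

// Pure scoring core: integer-only math, weights in basis points (10,000 bps = 100%),
// no I/O, no clock access. Subscores arrive already normalized to 0–100 integers.
public sealed record ScorePolicy(int WBaseBps, int WReachBps, int WEvidenceBps, int WProvenanceBps)
{
    public int TotalBps => WBaseBps + WReachBps + WEvidenceBps + WProvenanceBps; // must equal 10,000
}

public sealed record ScoreInput(
    int BaseSeverity, int Reachability, int Evidence, int Provenance, bool KnownExploited);

public static class ScoreEngine
{
    public static int ComputeRiskScore(ScorePolicy policy, ScoreInput input)
    {
        if (policy.TotalBps != 10_000)
            throw new ArgumentException("Weights must sum to 10,000 bps.", nameof(policy));

        // Weighted sum in basis points, then integer division back into the 0–100 range.
        int weighted =
            policy.WBaseBps * input.BaseSeverity +
            policy.WReachBps * input.Reachability +
            policy.WEvidenceBps * input.Evidence +
            policy.WProvenanceBps * input.Provenance;
        int score = weighted / 10_000;

        // Deterministic overrides (section 3.2): driven only by evidence flags and subscores.
        if (input.KnownExploited && input.Reachability >= 70)
            score = Math.Max(score, 95);
        if (input.Reachability == 0 && input.Evidence <= 10)
            score = Math.Min(score, 25);

        return score;
    }
}
```

With the example weights (1,000 / 4,500 / 3,000 / 1,500 bps) and subscores B=80, R=100, E=70, P=60, the weighted sum is 830,000 and `ComputeRiskScore` returns 83.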
+ +--- + +## 11) Starter policy file (score.v1) using basis points + +Here’s a good “real implementation” starting point (int-only): + +```yaml +policyVersion: score.v1 +weightsBps: + baseSeverity: 1000 + reachability: 4500 + evidence: 3000 + provenance: 1500 + +reachability: + hopBuckets: + - { maxHops: 2, score: 100 } + - { maxHops: 3, score: 85 } + - { maxHops: 4, score: 70 } + - { maxHops: 5, score: 55 } + - { maxHops: 6, score: 45 } + - { maxHops: 7, score: 35 } + - { maxHops: 9999, score: 20 } + unreachableScore: 0 + gateMultipliersBps: + featureFlag: 7000 + authRequired: 8000 + adminOnly: 8500 + nonDefaultConfig: 7500 + +evidence: + points: + runtime: 60 + dast: 30 + sast: 20 + sca: 10 + freshnessBuckets: + - { maxAgeDays: 7, multiplierBps: 10000 } + - { maxAgeDays: 30, multiplierBps: 9000 } + - { maxAgeDays: 90, multiplierBps: 7500 } + - { maxAgeDays: 180, multiplierBps: 6000 } + - { maxAgeDays: 365, multiplierBps: 4000 } + - { maxAgeDays: 99999, multiplierBps: 2000 } + +provenance: + levels: + unsigned: 0 + signed: 30 + signedWithSbom: 60 + signedWithSbomAndAttestations: 80 + reproducible: 100 + +overrides: + - name: knownExploitedAndReachable + when: + flags: + knownExploited: true + minReachability: 70 + setScore: 95 + + - name: unreachableAndOnlySca + when: + maxReachability: 0 + maxEvidence: 10 + clampMaxScore: 25 +``` + +--- + +If you want, I can also include a **repo layout + CI “golden fixture” test runner** (dotnet test + cross-platform determinism checks) and a **.NET 10 ScoreEngine skeleton** that enforces: no floats, canonical JSON hashing, and stable explanation ordering. diff --git a/docs/product-advisories/12-Dec-2025 - Measure UX Efficiency Through TTFS.md b/docs/product-advisories/12-Dec-2025 - Measure UX Efficiency Through TTFS.md new file mode 100644 index 000000000..df155e5e9 --- /dev/null +++ b/docs/product-advisories/12-Dec-2025 - Measure UX Efficiency Through TTFS.md @@ -0,0 +1,744 @@ +Here’s a simple, high‑leverage UX metric to add to your pipeline run view that will immediately make DevOps feel faster and calmer: + +# Time‑to‑First‑Signal (TTFS) + +**What it is:** the time from opening a run’s details page until the UI renders the **first actionable insight** (e.g., “Stage `build` failed – `dotnet restore` 401 – token expired”). +**Why it matters:** engineers don’t need *all* data instantly—just the first trustworthy clue to start acting. Lower TTFS = quicker triage, lower stress, tighter MTTR. + +--- + +## What counts as a “first signal” + +* Failed stage + reason (exit code, key log line, failing test name) +* Degraded but actionable status (e.g., flaky test signature) +* Policy gate block with the specific rule that failed +* Reachability‑aware security finding that blocks deploy (one concrete example, not the whole list) + +> Not a signal: spinners, generic “loading…”, or unactionable counts. + +--- + +## How to optimize TTFS (practical steps) + +1. **Deferred loading (prioritize critical panes):** + + * Render header + failing stage card first; lazy‑load artifacts, full logs, and graphs after. + * Pre‑expand the *first failing node* in the stage graph. + +2. **Log pre‑indexing at ingest:** + + * During CI, stream logs into chunks keyed by `[jobId, phase, severity, firstErrorLine]`. + * Extract the **first error tuple** (timestamp, step, message) and store it next to the job record. + * On UI open, fetch only that tuple (sub‑100 ms) before fetching the rest. + +3. 
**Cached summaries:** + + * Persist a tiny JSON “run.summary.v1” (status, first failing stage, first error line, blocking policies) in Redis/Postgres. + * Invalidate on new job events; always serve this summary first. + +4. **Edge prefetch:** + + * When the runs table is visible, prefetch summaries for rows in viewport so details pages open “warm”. + +5. **Compress + cap first log burst:** + + * Send the first **5–10 error lines** (already extracted) immediately; stream the rest. + +--- + +## Instrumentation (so you can prove it) + +Emit these points as telemetry: + +* `ttfs_start`: when the run details route is entered (or when tab becomes visible) +* `ttfs_signal_rendered`: when the first actionable card is in the DOM +* `ttfs_ms = signal_rendered - start` +* Dimensions: `pipeline_provider`, `repo`, `branch`, `run_type` (PR/main), `device`, `release`, `network_state` + +**SLO:** *P50 ≤ 700 ms, P95 ≤ 2.5 s* (adjust to your infra). + +**Dashboards to track:** + +* TTFS distribution (P50/P90/P95) by release +* Correlate TTFS with bounce rate and “open → rerun” delay +* Error budget: % of views with TTFS > 3 s + +--- + +## Minimal backend contract (example) + +```json +GET /api/runs/{runId}/first-signal +{ + "runId": "123", + "firstSignal": { + "type": "stage_failed", + "stage": "build", + "step": "dotnet restore", + "message": "401 Unauthorized: token expired", + "at": "2025-12-11T09:22:31Z", + "artifact": { "kind": "log", "range": {"start": 1880, "end": 1896} } + }, + "summaryEtag": "W/\"a1b2c3\"" +} +``` + +--- + +## Frontend pattern (Angular 17, signal‑first) + +* Fire `first-signal` request in route resolver. +* Render `FirstSignalCard` immediately. +* Lazy‑load stage graph, full logs, security panes. +* Fire `ttfs_signal_rendered` when `FirstSignalCard` enters viewport. + +--- + +## CI adapter hints (GitLab/GitHub/Azure) + +* Hook on job status webhooks to compute & store the first error tuple. +* For GitLab: scan `trace` stream for first `ERRO|FATAL|##[error]` match; store to DB table `ci_run_first_signal(run_id, stage, step, message, t)`. + +--- + +## “Good TTFS” acceptance tests + +* Run with early fail → first signal < 1 s, shows exact command + exit code. +* Run with policy gate fail → rule name + fix hint visible first. +* Offline/slow network → cached summary still renders an actionable hint. + +--- + +## Copy to put in your UX guidelines + +> “Optimize **Time‑to‑First‑Signal (TTFS)** above all. Users must see one trustworthy, actionable clue within 1 second on a warm path—even if the rest of the UI is still loading.” + +If you want, I can sketch the exact DB schema for the pre‑indexed log tuples and the Angular resolver + telemetry hooks next. +Below is an extended, end‑to‑end implementation plan for **Time‑to‑First‑Signal (TTFS)** that you can drop into your backlog. It includes architecture, data model, API contracts, frontend work, observability, QA, and rollout—structured as epics/phases with “definition of done” and acceptance criteria. + +--- + +# Scope extension + +## What we’re building + +A run details experience that renders **one actionable clue** fast—before loading heavy UI like full logs, graphs, artifacts. + +**“First signal”** is a small payload derived from run/job events and the earliest meaningful error evidence (stage/step + key log line(s) + reason/classification). + +## What we’re extending beyond the initial idea + +1. **First‑Signal Quality** (not just speed) + + * Classify error type (auth, dependency, compilation, test, infra, policy, timeout). 
+ * Identify “culprit step” and a stable “signature” for dedupe and search. +2. **Progressive disclosure UX** + + * Summary → First signal card → expanded context (stage graph, logs, artifacts). +3. **Provider‑agnostic ingestion** + + * Adapters for GitLab/GitHub/Azure (or your CI provider). +4. **Caching + prefetch** + + * Warm open from list/table, with ETags and stale‑while‑revalidate. +5. **Observability & SLOs** + + * TTFS metrics, dashboards, alerting, and quality metrics (false signals). +6. **Rollout safety** + + * Feature flags, canary, A/B gating, and a guaranteed fallback path. + +--- + +# Success criteria + +## Primary metric + +* **TTFS (ms)**: time from details page route enter → first actionable signal rendered. + +## Targets (example SLOs) + +* **P50 ≤ 700 ms**, **P95 ≤ 2500 ms** on warm path. +* **Cold path**: P95 ≤ 4000 ms (depends on infra). + +## Secondary outcome metrics + +* **Open→Action time**: time from opening run to first user action (rerun, cancel, assign, open failing log line). +* **Bounce rate**: close page within 10 seconds without interaction. +* **MTTR proxy**: time from failure to first rerun or fix commit. + +## Quality metrics + +* **Signal availability rate**: % of run views that show a first signal card within 3s. +* **Signal accuracy score** (sampled): engineer confirms “helpful vs not”. +* **Extractor failure rate**: parsing errors / missing mappings / timeouts. + +--- + +# Architecture overview + +## Data flow + +1. **CI provider events** (job started, job finished, stage failed, log appended) land in your backend. +2. **Run summarizer** maintains: + + * `run_summary` (small JSON) + * `first_signal` (small, actionable payload) +3. **UI opens run details** + + * Immediately calls `GET /runs/{id}/first-signal` (or `/summary`). + * Renders FirstSignalCard as soon as payload arrives. +4. Background fetches: + + * Stage graph, full logs, artifacts, security scans, trends. + +## Key decision: where to compute first signal + +* **Option A: at ingest time (recommended)** + Compute first signal when logs/events arrive, store it, serve it instantly. +* **Option B: on demand** + Compute when user opens run details (simpler initially, worse TTFS and load). + +--- + +# Data model + +## Tables (relational example) + +### `ci_run` + +* `run_id (pk)` +* `provider` +* `repo_id` +* `branch` +* `status` +* `created_at`, `updated_at` + +### `ci_job` + +* `job_id (pk)` +* `run_id (fk)` +* `stage_name` +* `job_name` +* `status` +* `started_at`, `finished_at` + +### `ci_log_chunk` + +* `chunk_id (pk)` +* `job_id (fk)` +* `seq` (monotonic) +* `byte_start`, `byte_end` (range into blob) +* `first_error_line_no` (nullable) +* `first_error_excerpt` (nullable, short) +* `severity_max` (info/warn/error) + +### `ci_run_summary` + +* `run_id (pk)` +* `version` (e.g., `1`) +* `etag` (hash) +* `summary_json` (small, 1–5 KB) +* `updated_at` + +### `ci_first_signal` + +* `run_id (pk)` +* `etag` +* `signal_json` (small, 0.5–2 KB) +* `quality_flags` (bitmask or json) +* `updated_at` + +## Cache layer + +* Redis keys: + + * `run:{runId}:summary:v1` + * `run:{runId}:first-signal:v1` +* TTL: generous but safe (e.g., 24h) with “write‑through” on event updates. 
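
A minimal write-through sketch for the cache layer above, assuming StackExchange.Redis on the backend (the `ISummaryRepository` and `RunSummaryCache` names are illustrative, not existing services):

```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

// Write-through update: the summarizer persists the small summary JSON to the database
// (source of truth) and then refreshes the versioned Redis key, so run-details opens
// hit a warm, consistent cache.
public interface ISummaryRepository
{
    Task UpsertSummaryAsync(string runId, string summaryJson, string etag);
}

public sealed class RunSummaryCache
{
    private static readonly TimeSpan Ttl = TimeSpan.FromHours(24);

    private readonly IDatabase _redis;
    private readonly ISummaryRepository _repository;

    public RunSummaryCache(IDatabase redis, ISummaryRepository repository)
    {
        _redis = redis;
        _repository = repository;
    }

    public async Task WriteThroughAsync(string runId, string summaryJson, string etag)
    {
        await _repository.UpsertSummaryAsync(runId, summaryJson, etag);              // 1. durable store first
        await _redis.StringSetAsync($"run:{runId}:summary:v1", summaryJson, Ttl);     // 2. refresh the cache key
    }

    public async Task<string?> TryGetSummaryAsync(string runId)
    {
        RedisValue cached = await _redis.StringGetAsync($"run:{runId}:summary:v1");
        return cached.HasValue ? cached.ToString() : null;
    }
}
```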
+ +--- + +# First signal definition + +## `FirstSignal` object (recommended shape) + +```json +{ + "runId": "123", + "computedAt": "2025-12-12T09:22:31Z", + "status": "failed", + "firstSignal": { + "type": "stage_failed", + "classification": "dependency_auth", + "stage": "build", + "job": "build-linux-x64", + "step": "dotnet restore", + "message": "401 Unauthorized: token expired", + "signature": "dotnet-restore-401-unauthorized", + "log": { + "jobId": "job-789", + "lines": [ + "error : Response status code does not indicate success: 401 (Unauthorized).", + "error : The token is expired." + ], + "range": { "start": 1880, "end": 1896 } + }, + "suggestedActions": [ + { "label": "Rotate token", "type": "doc", "target": "internal://docs/tokens" }, + { "label": "Rerun job", "type": "action", "target": "rerun-job:job-789" } + ] + }, + "etag": "W/\"a1b2c3\"" +} +``` + +### Notes + +* `signature` should be stable for grouping. +* `suggestedActions` is optional but hugely valuable (even 1–2 actions). + +--- + +# APIs + +## 1) First signal endpoint + +**GET** `/api/runs/{runId}/first-signal` + +Headers: + +* `If-None-Match: W/"..."` supported +* Response includes `ETag` and `Cache-Control` + +Responses: + +* `200`: full first signal object +* `304`: not modified +* `404`: run not found +* `204`: run exists but signal not available yet (rare; should degrade gracefully) + +## 2) Summary endpoint (optional but useful) + +**GET** `/api/runs/{runId}/summary` + +* Includes: status, first failing stage/job, timestamps, blocking policies, artifact counts. + +## 3) SSE / WebSocket updates (nice-to-have) + +**GET** `/api/runs/{runId}/events` (SSE) + +* Push new signal or summary updates in near real-time while user is on the page. + +--- + +# Frontend implementation plan (Angular 17) + +## UX behavior + +1. **Route enter** + + * Start TTFS timer. +2. Render instantly: + + * Title, status badge, pipeline metadata (run id, commit, branch). + * Skeleton for details area. +3. Fetch first signal: + + * Render `FirstSignalCard` immediately when available. + * Fire telemetry event when card is **in DOM and visible**. +4. Lazy-load: + + * Stage graph + * Full logs viewer + * Artifacts list + * Security findings + * Trends, flaky tests, etc. + +## Angular structure + +* `RunDetailsResolver` (or `resolveFn`) requests first signal. +* `RunDetailsComponent` uses signals to render quickly. +* `FirstSignalCardComponent` is standalone + minimal deps. + +## Prefetch strategy from runs list view + +* When the runs table is visible, prefetch summaries/first signals for items in viewport: + + * Use `IntersectionObserver` to prefetch only visible rows. + * Store results in an in-memory cache (e.g., `Map`). + * Respect ETag to avoid redundant payloads. 
+ +## Telemetry hooks + +* `ttfs_start`: route activation + tab visible +* `ttfs_signal_rendered`: FirstSignalCard attached and visible +* Dimensions: provider, repo, branch, run_type, release_version, network_state + +--- + +# Backend implementation plan + +## Summarizer / First-signal service + +A service or module that: + +* subscribes to run/job events +* receives log chunks (or pointers) +* computes and stores: + + * `run_summary` + * `first_signal` +* publishes updates (optional) to an event stream for SSE + +### Concurrency rule + +First signal should be set once per run unless a “better” signal appears: + +* if current signal is missing → set +* if current signal is “generic” and new one is “specific” → replace +* otherwise keep (avoid churn) + +--- + +# Extraction & classification logic + +## Minimum viable extractor (Phase 1) + +* Heuristics: + + * first match among patterns: `FATAL`, `ERROR`, `##[error]`, `panic:`, `Unhandled exception`, `npm ERR!`, `BUILD FAILED`, etc. + * plus provider-specific fail markers +* Pull: + + * stage/job/step context (from job metadata or step boundaries) + * 5–10 log lines around first error line + +## Improved extractor (Phase 2+) + +* Language/tool specific rules: + + * dotnet, maven/gradle, npm/yarn/pnpm, python/pytest, go test, docker build, terraform, helm +* Add `classification` and `signature`: + + * normalize common errors: + + * auth expired/forbidden + * missing dependency / DNS / TLS + * compilation error + * test failure (include test name) + * infra capacity / agent lost + * policy gate failure + +## Guardrails + +* **Secret redaction**: before storing excerpts, run your existing redaction pipeline. +* **Payload cap**: cap message length and excerpt lines. +* **PII discipline**: avoid including arbitrary stack traces if they contain sensitive paths; include only key lines. + +--- + +# Development plan by phases (epics) + +Each phase below includes deliverables + acceptance criteria. You can treat each as a sprint/iteration. + +--- + +## Phase 0 — Baseline and alignment + +### Deliverables + +* Baseline TTFS measurement (current behavior) +* Definition of “actionable signal” and priority rules +* Performance budget for run details view + +### Tasks + +* Add client-side telemetry for current page load steps: + + * route enter, summary loaded, logs loaded, graph loaded +* Measure TTFS proxy today (likely “time to status shown”) +* Identify top 20 failure modes in your CI (from historical logs) + +### Acceptance criteria + +* Dashboard shows baseline P50/P95 for current experience. +* “First signal” contract signed off with UI + backend teams. + +--- + +## Phase 1 — Data model and storage + +### Deliverables + +* DB migrations for `ci_run_summary` and `ci_first_signal` +* Redis cache keys and invalidation strategy +* ADR: where summaries live and how they update + +### Tasks + +* Create tables and indices: + + * index on `run_id`, `updated_at`, `provider` +* Add serializer/deserializer for `summary_json` and `signal_json` +* Implement ETag generation (hash of JSON payload) + +### Acceptance criteria + +* Can store and retrieve summary + first signal for a run in < 50ms (DB) and < 10ms (cache). +* ETag works end-to-end. 
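
Because the Phase 1 ETag deliverable is small, here is a minimal sketch of one way to derive it from the stored JSON payload (the helper name is an illustrative assumption; the requirement is only "hash of the JSON payload"):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// ETag = weak validator derived from a SHA-256 of the stored summary/signal JSON:
// any payload change yields a new tag; unchanged payloads keep serving 304s.
public static class SummaryETag
{
    public static string FromJson(string payloadJson)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(payloadJson));

        // A truncated digest keeps the header short; keep the full hash on the row if needed.
        string shortHash = Convert.ToHexString(hash)[..16].ToLowerInvariant();
        return $"W/\"{shortHash}\"";
    }
}
```

The `/first-signal` and `/summary` endpoints can then compare this value against `If-None-Match` and answer `304 Not Modified` when it matches.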
+ +--- + +## Phase 2 — Ingestion and first signal computation + +### Deliverables + +* First-signal computation module +* Provider adapter integration points (webhook consumers) +* “first error tuple” extraction from logs + +### Tasks + +* On job log append: + + * scan incrementally for first error markers + * store excerpt + line range + job/stage/step mapping +* On job finish/fail: + + * finalize first signal with best known context +* Implement the “better signal replaces generic” rule + +### Acceptance criteria + +* For a known failing run, API returns first signal without reading full log blob. +* Computation does not exceed a small CPU budget per log chunk (guard with limits). +* Extraction failure rate < 1% for sampled runs (initial). + +--- + +## Phase 3 — API endpoints and caching + +### Deliverables + +* `/runs/{id}/first-signal` endpoint +* Optional `/runs/{id}/summary` +* Cache-control + ETag support +* Access control checks consistent with existing run authorization + +### Tasks + +* Serve cached first signal first; fallback to DB +* If missing: + + * return `204` (or a “pending” object) and allow UI fallback +* Add server-side metrics: + + * endpoint latency, cache hit rate, payload size + +### Acceptance criteria + +* Endpoint P95 latency meets target (e.g., < 200ms internal). +* Cache hit rate is high for active runs (after prefetch). + +--- + +## Phase 4 — Frontend progressive rendering + +### Deliverables + +* FirstSignalCard component +* Route resolver + local cache +* Prefetch on runs list view +* Telemetry for TTFS + +### Tasks + +* Render shell immediately +* Fetch and render first signal +* Lazy-load heavy panels using `@defer` / dynamic imports +* Implement “open failing stage” default behavior + +### Acceptance criteria + +* In throttled network test, first signal card appears significantly earlier than logs and graphs. +* `ttfs_signal_rendered` fires exactly once per view, with correct dimensions. + +--- + +## Phase 5 — Observability, dashboards, and alerting + +### Deliverables + +* TTFS dashboards by: + + * provider, repo, run type, release version +* Alerts: + + * P95 regression threshold +* Quality dashboard: + + * availability rate, extraction failures, “generic signal rate” + +### Tasks + +* Create event pipeline for telemetry into your analytics system +* Define SLO/error budget alerts +* Add tracing (OpenTelemetry) for endpoint and summarizer + +### Acceptance criteria + +* You can correlate TTFS with: + + * bounce rate + * open→action time +* You can pinpoint whether regressions are backend, frontend, or provider‑specific. + +--- + +## Phase 6 — QA, performance testing, rollout + +### Deliverables + +* Automated tests +* Feature flag + gradual rollout +* A/B experiment (optional) + +### Tasks + +**Testing** + +* Unit tests: + + * extractor patterns + * classification rules +* Integration tests: + + * simulated job logs with known outcomes +* E2E (Playwright/Cypress): + + * verify first signal appears before logs + * verify fallback path works if endpoint fails +* Performance tests: + + * cold cache vs warm cache + * throttled CPU/network profiles + +**Rollout** + +* Feature flag: + + * enabled for internal users first + * ramp by repo or percentage +* Monitor key metrics during ramp: + + * TTFS P95 + * API error rate + * UI error rate + * cache miss spikes + +### Acceptance criteria + +* No increase in overall error rates. +* TTFS improves at least X% for a meaningful slice of users (define X from baseline). 
+* Fallback UX remains usable when signals are unavailable. + +--- + +# Backlog examples (ready-to-create Jira tickets) + +## Epic: Run summary and first signal storage + +* Create `ci_first_signal` table +* Create `ci_run_summary` table +* Implement ETag hashing +* Implement Redis caching layer +* Add admin/debug endpoint (internal only) to inspect computed signals + +## Epic: Log chunk extraction + +* Implement incremental log scanning +* Store first error excerpt + range +* Map excerpt to job + step +* Add redaction pass to excerpts + +## Epic: Run details progressive UI + +* FirstSignalCard UI component +* Lazy-load logs viewer +* Default to opening failing stage +* Prefetch signals in runs list + +## Epic: Telemetry and dashboards + +* Add `ttfs_start` and `ttfs_signal_rendered` +* Add endpoint latency metrics +* Build dashboards + alerts +* Add sampling for “signal helpfulness” feedback + +--- + +# Risk register and mitigations + +## Risk: First signal is wrong/misleading + +* Mitigation: + + * track “generic signal rate” and “corrected by user” feedback + * classification confidence scoring + * always provide quick access to full logs as fallback + +## Risk: Logs contain secrets + +* Mitigation: + + * redact excerpts before storing/serving + * cap excerpt lines and length + * keep raw logs behind existing permissions + +## Risk: Increased ingest CPU cost + +* Mitigation: + + * incremental scanning with early stop after first error captured + * limit scanning per chunk + * sample/skip overly large logs until job completion + +## Risk: Cache invalidation bugs + +* Mitigation: + + * ETag-based correctness + * versioned keys: `:v1` + * “write-through” cache updates from summarizer + +--- + +# Definition of Done checklist + +A phase is “done” when: + +* ✅ TTFS measured with reliable client events +* ✅ FirstSignalCard renders from `/first-signal` endpoint +* ✅ ETag caching works +* ✅ Fallback path is solid (no blank screens) +* ✅ Dashboards exist and are actively watched during rollout +* ✅ Security review completed for log excerpts/redaction +* ✅ Load tests show no unacceptable backend regressions + +--- + +# Optional enhancements after initial launch + +1. **Next-step recommendations** + Add action suggestions and deep links (rotate token, open failing test, open doc). +2. **Flaky test / known issue detection** + Show “this matches known flaky signature” with last-seen frequency. +3. **“Compare to last green”** + Summarize what changed since last successful run (commit diff, dependency bump). +4. **SSE live updates** + Update first signal as soon as failure occurs while user watches. + +--- + +If you tell me your current backend stack (Node/Go/.NET), log storage (S3/Elastic/Loki), and which CI providers you support, I can translate this into a concrete set of modules/classes, exact schema migrations, and the Angular routing + signals code structure you’d implement. diff --git a/docs/product-advisories/12-Dec-2025 - Replay Fidelity as a Proof Metric.md b/docs/product-advisories/12-Dec-2025 - Replay Fidelity as a Proof Metric.md new file mode 100644 index 000000000..f1fe0b04e --- /dev/null +++ b/docs/product-advisories/12-Dec-2025 - Replay Fidelity as a Proof Metric.md @@ -0,0 +1,643 @@ +Here’s a simple, practical idea to make your scans provably repeatable over time and catch drift fast. + +# Replay Fidelity (what, why, how) + +**What it is:** the share of historical scans that reproduce **bit‑for‑bit** when re‑run using their saved manifests (inputs, versions, rules, seeds). 
Higher = more deterministic system. + +**Why you want it:** it exposes hidden nondeterminism (feed drift, time‑dependent rules, race conditions, unstable dependency resolution) and proves auditability for customers/compliance. + +--- + +## The metric + +* **Per‑scan:** `replay_match = 1` if SBOM/VEX/findings + hashes are identical; else `0`. +* **Windowed:** `Replay Fidelity = (Σ replay_match) / (# historical replays in window)`. +* **Breakdown:** also track by scanner, language, image base, feed version, and environment. + +--- + +## What must be captured in the scan manifest + +* Exact source refs (image digest / repo SHA), container layers’ digests +* Scanner build ID + config (flags, rules, lattice/policy sets, seeds) +* Feed snapshots (CVE DB, OVAL, vendor advisories) as **content‑addressed** bundles +* Normalization/version of SBOM schema (e.g., CycloneDX 1.6 vs SPDX 3.0.1) +* Platform facts (OS/kernel, tz, locale), toolchain versions, clock policy + +--- + +## Pass/Fail rules you can ship + +* **Green:** Fidelity ≥ 0.98 over 30 days, and no bucket < 0.95 +* **Warn:** Any bucket drops by ≥ 2% week‑over‑week +* **Fail the pipeline:** If fidelity < 0.90 or any regulated project < 0.95 + +--- + +## Minimal replay harness (outline) + +1. Pick N historical scans (e.g., last 200 or stratified by image language). +2. Restore their **frozen** manifest (scanner binary, feed bundle, policy lattice, seeds). +3. Re‑run in a pinned runtime (OCI digest, pinned kernel in VM, fixed TZ/locale). +4. Compare artifacts: SBOM JSON, VEX JSON, findings list, evidence blobs → SHA‑256. +5. Emit: pass/fail, diff summary, and the “cause” tag if mismatch (feed, policy, runtime, code). + +--- + +## Dashboard (what to show) + +* Fidelity % (30/90‑day) + sparkline +* Top offenders (by language/scanner/policy set) +* “Cause of mismatch” histogram (feed vs runtime vs code vs policy) +* Click‑through: deterministic diff (e.g., which CVEs flipped and why) + +--- + +## Quick wins for Stella Ops + +* Treat **feeds as immutable snapshots** (content‑addressed tar.zst) and record their digest in each scan. +* Run scanner in a **repro shell** (OCI image digest + fixed TZ/locale + no network). +* Normalize SBOM/VEX (key order, whitespace, float precision) before hashing. +* Add a `stella replay --from MANIFEST.json` command + nightly cron to sample replays. +* Store `replay_result` rows; expose `/metrics` for Prometheus and a CI badge: `Replay Fidelity: 99.2%`. + +Want me to draft the `stella replay` CLI spec and the DB table (DDL) you can drop into Postgres? +Below is an extended “Replay Fidelity” design **plus a concrete development implementation plan** you can hand to engineering. I’m assuming Stella Ops is doing container/app security scans that output SBOM + findings (and optionally VEX), and uses vulnerability “feeds” and policy/lattice/rules. + +--- + +## 1) Extend the concept: Replay Fidelity as a product capability + +### 1.1 Fidelity levels (so you can be strict without being brittle) + +Instead of a single yes/no, define **tiers** that you can report and gate on: + +1. **Bitwise Fidelity (BF)** + + * *Definition:* All primary artifacts (SBOM, findings, VEX, evidence) match **byte-for-byte** after canonicalization. + * *Use:* strongest auditability, catch ordering/nondeterminism. + +2. **Semantic Fidelity (SF)** + + * *Definition:* The *meaning* matches even if formatting differs (e.g., key order, whitespace, timestamps). 
+ * *How:* compare normalized objects: same packages, versions, CVEs, fix versions, severities, policy verdicts. + * *Use:* protects you from “cosmetic diffs” and helps triage. + +3. **Policy Fidelity (PF)** + + * *Definition:* Final policy decision (pass/fail + reason codes) matches. + * *Use:* useful when outputs may evolve but governance outcome must remain stable. + +**Recommended reporting:** + +* Dashboard shows BF, SF, PF together. +* Default engineering SLO: **BF ≥ 0.98**; compliance SLO: **BF ≥ 0.95** for regulated projects; PF should be ~1.0 unless policy changed intentionally. + +--- + +### 1.2 “Why did it drift?”—Mismatch classification taxonomy + +When a replay fails, auto-tag the cause so humans don’t diff JSON by hand. + +**Primary mismatch classes** + +* **Feed drift:** CVE/OVAL/vendor advisory snapshot differs. +* **Policy drift:** policy/lattice/rules differ (or default rule set changed). +* **Runtime drift:** base image / libc / kernel / locale / tz / CPU arch differences. +* **Scanner drift:** scanner binary build differs or dependency versions changed. +* **Nondeterminism:** ordering instability, concurrency race, unseeded RNG, time-based logic. +* **External IO:** network calls, “latest” resolution, remote package registry changes. + +**Output:** a `mismatch_reason` plus a short `diff_summary`. + +--- + +### 1.3 Deterministic “scan envelope” design + +A replay only works if the scan is fully specified. + +**Scan envelope components** + +* **Inputs:** image digest, repo commit SHA, build provenance, layers digests. +* **Scanner:** scanner OCI image digest (or binary digest), config flags, feature toggles. +* **Feeds:** content-addressed feed bundle digests (see §2.3). +* **Policy/rules:** git commit SHA + content digest of compiled rules. +* **Environment:** OS/arch, tz/locale, “clock mode”, network mode, CPU count. +* **Normalization:** “canonicalization version” for SBOM/VEX/findings. + +--- + +### 1.4 Canonicalization so “bitwise” is meaningful + +To make BF achievable: + +* Canonical JSON serialization (sorted keys, stable array ordering, normalized floats) +* Strip/normalize volatile fields (timestamps, “scan_duration_ms”, hostnames) +* Stable ordering for lists: packages sorted by `(purl, version)`, vulnerabilities by `(cve_id, affected_purl)` +* Deterministic IDs: if you generate internal IDs, derive from stable hashes of content (not UUID4) + +--- + +### 1.5 Sampling strategy + +You don’t need to replay everything. + +**Nightly sample:** stratified by: + +* language ecosystem (npm, pip, maven, go, rust…) +* scanner engine +* base OS +* “regulatory tier” +* image size/complexity + +**Plus:** always replay “golden canaries” (a fixed set of reference images) after every scanner release and feed ingestion pipeline change. + +--- + +## 2) Technical architecture blueprint + +### 2.1 System components + +1. **Manifest Writer (in the scan pipeline)** + + * Produces `ScanManifest v1` JSON + * Records all digests and versions + +2. **Artifact Store** + + * Stores SBOM, findings, VEX, evidence blobs + * Stores canonical hashes for BF checks + +3. **Feed Snapshotter** + + * Periodically builds immutable feed bundles + * Content-addressed (digest-keyed) + * Stores metadata (source URLs, generation timestamp, signature) + +4. **Replay Orchestrator** + + * Chooses historical scans to replay + * Launches “replay executor” jobs + +5. 
**Replay Executor** + + * Runs scanner in pinned container image + * Network off, tz fixed, clock policy applied + * Produces new artifacts + hashes + +6. **Diff & Scoring Engine** + + * Computes BF/SF/PF + * Generates mismatch classification + diff summary + +7. **Metrics + UI Dashboard** + + * Prometheus metrics + * UI for drill-down diffs + +--- + +### 2.2 Data model (Postgres-friendly) + +**Core tables** + +* `scan_manifests` + + * `scan_id (pk)` + * `manifest_json` + * `manifest_sha256` + * `created_at` +* `scan_artifacts` + + * `scan_id (fk)` + * `artifact_type` (sbom|findings|vex|evidence) + * `artifact_uri` + * `canonical_sha256` + * `schema_version` +* `feed_snapshots` + + * `feed_digest (pk)` + * `bundle_uri` + * `sources_json` + * `generated_at` + * `signature` +* `replay_runs` + + * `replay_id (pk)` + * `original_scan_id (fk)` + * `status` (queued|running|passed|failed) + * `bf_match bool`, `sf_match bool`, `pf_match bool` + * `mismatch_reason` + * `diff_summary_json` + * `started_at`, `finished_at` + * `executor_env_json` (arch, tz, cpu, image digest) + +**Indexes** + +* `(created_at)` for sampling windows +* `(mismatch_reason, finished_at)` for triage +* `(scanner_version, ecosystem)` for breakdown dashboards + +--- + +### 2.3 Feed Snapshotting (the key to long-term replay) + +**Feed bundle format** + +* `feeds///...` inside a tar.zst +* manifest file inside bundle: `feed_bundle_manifest.json` containing: + + * source URLs + * retrieval commit/etag (if any) + * file hashes + * generated_by version + +**Content addressing** + +* Digest of the entire bundle (`sha256(tar.zst)`) is the reference. +* Scans record only the digest + URI. + +**Immutability** + +* Store bundles in object storage with WORM / retention if you need compliance. + +--- + +### 2.4 Replay execution sandbox + +For determinism, enforce: + +* **No network** (K8s NetworkPolicy, firewall rules, or container runtime flags) +* **Fixed TZ/locale** +* **Pinned container image digest** +* **Clock policy** + + * Either “real time but recorded” or “frozen time at original scan timestamp” + * If scanner logic uses current date for severity windows, freeze time + +--- + +## 3) Development implementation plan + +I’ll lay this out as **workstreams** + **a sprinted plan**. You can compress/expand depending on team size. + +### Workstream A — Scan Manifest & Canonical Artifacts + +**Goal:** every scan is replayable on paper, even before replays run. + +**Deliverables** + +* `ScanManifest v1` schema + writer integrated into scan pipeline +* Canonicalization library + canonical hashing for all artifacts + +**Acceptance criteria** + +* Every scan stores: input digests, scanner digest, policy digest, feed digest placeholders +* Artifact hashes are stable across repeated runs in the same environment + +--- + +### Workstream B — Feed Snapshotting & Policy Versioning + +**Goal:** eliminate “feed drift” by pinning immutable inputs. + +**Deliverables** + +* Feed bundle builder + signer + uploader +* Policy/rules bundler (compiled rules bundle, digest recorded) + +**Acceptance criteria** + +* New scans reference feed bundle digests (not “latest”) +* A scan can be re-run with the same feed bundle and policy bundle + +--- + +### Workstream C — Replay Runner & Diff Engine + +**Goal:** execute historical scans and score BF/SF/PF with actionable diffs. 
+ +**Deliverables** + +* `stella replay --from manifest.json` +* Orchestrator job to schedule replays +* Diff engine + mismatch classifier +* Storage of replay results + +**Acceptance criteria** + +* Replay produces deterministic artifacts in a pinned environment +* Dashboard/CLI shows BF/SF/PF + diff summary for failures + +--- + +### Workstream D — Observability, Dashboard, and CI Gates + +**Goal:** make fidelity visible and enforceable. + +**Deliverables** + +* Prometheus metrics: `replay_fidelity_bf`, `replay_fidelity_sf`, `replay_fidelity_pf` +* Breakdown labels (scanner, ecosystem, policy_set, base_os) +* Alerts for drop thresholds +* CI gate option: “block release if BF < threshold on canary set” + +**Acceptance criteria** + +* Engineering can see drift within 24h +* Releases are blocked when fidelity regressions occur + +--- + +## 4) Suggested sprint plan with concrete tasks + +### Sprint 0 — Design lock + baseline + +**Tasks** + +* Define manifest schema: `ScanManifest v1` fields + versioning rules +* Decide canonicalization rules (what is normalized vs preserved) +* Choose initial “golden canary” scan set (10–20 representative targets) +* Add “replay-fidelity” epic with ownership & SLIs/SLOs + +**Exit criteria** + +* Approved schema + canonicalization spec +* Canary set stored and tagged + +--- + +### Sprint 1 — Manifest writer + artifact hashing (MVP) + +**Tasks** + +* Implement manifest writer in scan pipeline +* Store `manifest_json` + `manifest_sha256` +* Implement canonicalization + hashing for: + + * findings list (sorted) + * SBOM (normalized) + * VEX (if present) +* Persist canonical hashes in `scan_artifacts` + +**Exit criteria** + +* Two identical scans in the same environment yield identical artifact hashes +* A “manifest export” endpoint/CLI works: + + * `stella scan --emit-manifest out.json` + +--- + +### Sprint 2 — Feed snapshotter + policy bundling + +**Tasks** + +* Build feed bundler job: + + * pull raw sources + * normalize layout + * generate `feed_bundle_manifest.json` + * tar.zst + sha256 + * upload + record in `feed_snapshots` +* Update scan pipeline: + + * resolve feed bundle digest at scan start + * record digest in scan manifest +* Bundle policy/lattice: + + * compile rules into an immutable artifact + * record policy bundle digest in manifest + +**Exit criteria** + +* Scans reference immutable feed + policy digests +* You can fetch feed bundle by digest and reproduce the same feed inputs + +--- + +### Sprint 3 — Replay executor + “no network” sandbox + +**Tasks** + +* Create replay container image / runtime wrapper +* Implement `stella replay --from MANIFEST.json` + + * pulls scanner image by digest + * mounts feed bundle + policy bundle + * runs in network-off mode + * applies tz/locale + clock mode +* Store replay outputs as artifacts (`replay_scan_id` or `replay_id` linkage) + +**Exit criteria** + +* Replay runs end-to-end for canary scans +* Deterministic runtime controls verified (no DNS egress, fixed tz) + +--- + +### Sprint 4 — Diff engine + mismatch classification + +**Tasks** + +* Implement BF compare (canonical hashes) +* Implement SF compare (semantic JSON/object comparison) +* Implement PF compare (policy decision equivalence) +* Implement mismatch classification rules: + + * if feed digest differs → feed drift + * if scanner digest differs → scanner drift + * if environment differs → runtime drift + * else → nondeterminism (with sub-tags for ordering/time/RNG) +* Generate `diff_summary_json`: + + * top N changed CVEs + * packages 
added/removed + * policy verdict changes + +**Exit criteria** + +* Every failed replay has a cause tag and a diff summary that’s useful in <2 minutes +* Engineers can reproduce failures locally with the manifest + +--- + +### Sprint 5 — Dashboard + alerts + CI gate + +**Tasks** + +* Expose Prometheus metrics from replay service +* Build dashboard: + + * BF/SF/PF trends + * breakdown by ecosystem/scanner/policy + * mismatch cause histogram +* Add alerting rules (drop threshold, bucket regression) +* Add CI gate mode: + + * “run replays on canary set for this release candidate” + * block merge if BF < target + +**Exit criteria** + +* Fidelity visible to leadership and engineering +* Release process is protected by canary replays + +--- + +### Sprint 6 — Hardening + compliance polish + +**Tasks** + +* Backward compatible manifest upgrades: + + * `manifest_version` bump rules + * migration support +* Artifact signing / integrity: + + * sign manifest hash + * optional transparency log later +* Storage & retention policies (cost controls) +* Runbook + oncall playbook + +**Exit criteria** + +* Audit story is complete: “show me exactly how scan X was produced” +* Operational load is manageable and cost-bounded + +--- + +## 5) Engineering specs you can start implementing immediately + +### 5.1 `ScanManifest v1` skeleton (example) + +```json +{ + "manifest_version": "1.0", + "scan_id": "scan_123", + "created_at": "2025-12-12T10:15:30Z", + + "input": { + "type": "oci_image", + "image_ref": "registry/app@sha256:...", + "layers": ["sha256:...", "sha256:..."], + "source_provenance": {"repo_sha": "abc123", "build_id": "ci-999"} + }, + + "scanner": { + "engine": "stella", + "scanner_image_digest": "sha256:...", + "scanner_version": "2025.12.0", + "config_digest": "sha256:...", + "flags": ["--deep", "--vex"] + }, + + "feeds": { + "vuln_feed_bundle_digest": "sha256:...", + "license_db_digest": "sha256:..." + }, + + "policy": { + "policy_bundle_digest": "sha256:...", + "policy_set": "prod-default" + }, + + "environment": { + "arch": "amd64", + "os": "linux", + "tz": "UTC", + "locale": "C", + "network": "disabled", + "clock_mode": "frozen", + "clock_value": "2025-12-12T10:15:30Z" + }, + + "normalization": { + "canonicalizer_version": "1.2.0", + "sbom_schema": "cyclonedx-1.6", + "vex_schema": "cyclonedx-vex-1.0" + } +} +``` + +--- + +### 5.2 CLI spec (minimal) + +* `stella scan ... 
--emit-manifest MANIFEST.json --emit-artifacts-dir out/` +* `stella replay --from MANIFEST.json --out-dir replay_out/` +* `stella diff --a out/ --b replay_out/ --mode bf|sf|pf --json` + +--- + +## 6) Testing strategy (to prevent determinism regressions) + +### Unit tests + +* Canonicalization: same object → same bytes +* Sorting stability: randomized input order → stable output +* Hash determinism + +### Integration tests + +* Golden canaries: + + * run scan twice in same runner → BF match + * replay from manifest → BF match +* “Network leak” test: + + * DNS requests must be zero +* “Clock leak” test: + + * freeze time; ensure outputs do not include real timestamps + +### Chaos tests + +* Vary CPU count, run concurrency, run order → still BF match +* Randomized scheduling / thread interleavings to find races + +--- + +## 7) Operational policies (so it stays useful) + +### Retention & cost controls + +* Keep full artifacts for regulated scans (e.g., 1–7 years) +* For non-regulated: + + * keep manifests + canonical hashes long-term + * expire heavy evidence blobs after N days +* Compress large artifacts and dedupe by digest + +### Alerting examples + +* BF drops by ≥2% week-over-week (any major bucket) → warn +* BF < 0.90 overall or regulated BF < 0.95 → page / block release + +### Triage workflow + +* Failed replay auto-creates a ticket with: + + * manifest link + * mismatch_reason + * diff_summary + * reproduction command + +--- + +## 8) What “done” looks like (definition of success) + +* Any customer/auditor can pick a scan from 6 months ago and you can: + + 1. retrieve manifest + feed bundle + policy bundle by digest + 2. replay in a pinned sandbox + 3. show BF/SF/PF results and diffs +* Engineering sees drift quickly and can attribute it to feed vs scanner vs runtime. + +--- + +If you want, I can also provide: + +* a **Postgres DDL** for the tables above, +* a **Prometheus metrics contract** (names + labels + example queries), +* and a **diff_summary_json schema** that supports a UI “diff view” without reprocessing artifacts. diff --git a/docs/product-advisories/12-Dec-2025 - Smart‑Diff Detects Meaningful Risk Shifts.md b/docs/product-advisories/12-Dec-2025 - Smart‑Diff Detects Meaningful Risk Shifts.md new file mode 100644 index 000000000..0b1f88939 --- /dev/null +++ b/docs/product-advisories/12-Dec-2025 - Smart‑Diff Detects Meaningful Risk Shifts.md @@ -0,0 +1,840 @@ +Here’s a quick, plain‑English idea you can use right away: **not all code diffs are equal**—some actually change what’s *reachable* at runtime (and thus security posture), while others just refactor internals. A “**Smart‑Diff**” pipeline flags only the diffs that open or close attack paths by combining (1) call‑stack traces, (2) dependency graphs, and (3) dataflow. + +--- + +### Why this matters (background) + +* Text diffs ≠ behavior diffs. A rename or refactor can look big in Git but do nothing to reachable flows from external entry points (HTTP, gRPC, CLI, message consumers). +* Security triage gets noisy because scanners attach CVEs to all present packages, not to the code paths you can actually hit. +* **Dataflow‑aware diffs** shrink noise and make VEX generation honest: “vuln present but **not exploitable** because the sink is unreachable from any policy‑defined entrypoint.” + +--- + +### Minimal architecture (fits Stella Ops) + +1. **Entrypoint map** (per service): controllers, handlers, consumers. +2. 
**Call graph + dataflow** (per commit): Roslyn for C#, `golang.org/x/tools/go/callgraph` for Go, plus taint rules (source→sink).
+3. **Reachability cache** keyed by (commit, entrypoint, package@version).
+4. **Smart‑Diff** = `reachable_paths(commit_B) – reachable_paths(commit_A)`.
+
+   * If a path to a sensitive sink is newly reachable → **High**.
+   * If a path disappears → auto‑generate **VEX “not affected (no reachable path)”**.
+
+---
+
+### Tiny working seeds
+
+**C# (.NET 10) — Roslyn skeleton to diff call‑reachability**
+
+```csharp
+// SmartDiff.csproj targets net10.0
+using System.Collections.Generic;
+using System.Linq;
+using System.Threading.Tasks;
+using Microsoft.CodeAnalysis;
+using Microsoft.CodeAnalysis.CSharp;
+using Microsoft.CodeAnalysis.FindSymbols;
+using Microsoft.CodeAnalysis.MSBuild;
+
+public static class SmartDiff
+{
+    public static async Task<HashSet<string>> ReachableSinks(string solutionPath, string[] entrypoints, string[] sinks)
+    {
+        var workspace = MSBuildWorkspace.Create();
+        var solution = await workspace.OpenSolutionAsync(solutionPath);
+        var index = new HashSet<string>();
+
+        foreach (var proj in solution.Projects)
+        {
+            var comp = await proj.GetCompilationAsync();
+            if (comp is null) continue;
+
+            // Resolve entrypoints & sinks by symbol name
+            var epSymbols = comp.GlobalNamespace.GetMembers().SelectMany(Descend)
+                .OfType<IMethodSymbol>().Where(m => entrypoints.Contains(m.ToDisplayString())).ToList();
+            var sinkSymbols = comp.GlobalNamespace.GetMembers().SelectMany(Descend)
+                .OfType<IMethodSymbol>().Where(m => sinks.Contains(m.ToDisplayString())).ToList();
+
+            foreach (var ep in epSymbols)
+            foreach (var sink in sinkSymbols)
+            {
+                // Heuristic reachability: cheap path search via SymbolFinder
+                var refs = await SymbolFinder.FindReferencesAsync(sink, solution);
+                if (refs.SelectMany(r => r.Locations).Any()) // replace with real graph walk
+                    index.Add($"{ep.ToDisplayString()} -> {sink.ToDisplayString()}");
+            }
+        }
+        return index;
+
+        static IEnumerable<ISymbol> Descend(INamespaceOrTypeSymbol sym)
+        {
+            foreach (var m in sym.GetMembers())
+            {
+                yield return m;
+                if (m is INamespaceOrTypeSymbol nt) foreach (var x in Descend(nt)) yield return x;
+            }
+        }
+    }
+}
+```
+
+**Go — SSA & callgraph seed**
+
+```go
+// go.mod: require golang.org/x/tools latest
+package main
+
+import (
+    "fmt"
+
+    "golang.org/x/tools/go/callgraph/cha"
+    "golang.org/x/tools/go/packages"
+    "golang.org/x/tools/go/ssa"
+    "golang.org/x/tools/go/ssa/ssautil"
+)
+
+func main() {
+    cfg := &packages.Config{Mode: packages.LoadAllSyntax, Tests: false}
+    pkgs, err := packages.Load(cfg, "./...")
+    if err != nil {
+        panic(err)
+    }
+
+    // Build SSA for all loaded packages, then a CHA call graph.
+    prog, _ := ssautil.AllPackages(pkgs, ssa.BuilderMode(0))
+    prog.Build()
+
+    cg := cha.CallGraph(prog)
+    // TODO: map entrypoints & sinks, then walk cg from EPs to sinks
+    fmt.Println("nodes:", len(cg.Nodes))
+}
+```
+
+---
+
+### How to use it in your pipeline (fast win)
+
+* **Pre‑merge job**:
+
+  1. Build call graph for `HEAD` and `HEAD^`.
+  2. Compute Smart‑Diff.
+  3. If any *new* EP→sink path appears, fail with a short, proof‑linked note:
+     “New reachable path: `POST /Invoices -> PdfExporter.Save(string path)` (writes outside sandbox).”
+* **Post‑scan VEX**:
+
+  * For each CVE on a package, mark **Affected** only if any EP can reach a symbol that uses that package’s vulnerable surface.
+
+---
+
+### Evidence to show in the UI
+
+* “**Path card**”: EP → … → Sink, with file:line hop‑list and commit hash.
+* “**What changed**”: before/after path diff (green removed, red added).
+* “**Why it matters**”: sink classification (network write, file write, deserialization, SQL, crypto).
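+
+The “what changed” view above is essentially a set difference over stable path fingerprints. A minimal sketch (names are illustrative; it assumes the reachability pass emits one record per EP→sink path):
+
+```csharp
+// Sketch only: Smart-Diff as a set difference over path fingerprints.
+using System.Collections.Generic;
+using System.Linq;
+
+public sealed record ReachablePath(string Entrypoint, string Sink, string SinkType, IReadOnlyList<string> Hops)
+{
+    // Stable identity so the same logical path compares equal across commits.
+    public string Fingerprint => $"{Entrypoint}|{Sink}|{SinkType}";
+}
+
+public static class SmartDiffEngine
+{
+    public static (IReadOnlyList<ReachablePath> Added, IReadOnlyList<ReachablePath> Removed) Diff(
+        IReadOnlyCollection<ReachablePath> baseCommit, IReadOnlyCollection<ReachablePath> headCommit)
+    {
+        var baseFps = baseCommit.Select(p => p.Fingerprint).ToHashSet();
+        var headFps = headCommit.Select(p => p.Fingerprint).ToHashSet();
+
+        var added = headCommit.Where(p => !baseFps.Contains(p.Fingerprint)).ToList();   // red: gate on these
+        var removed = baseCommit.Where(p => !headFps.Contains(p.Fingerprint)).ToList(); // green: auto-VEX candidates
+
+        return (added, removed);
+    }
+}
+```
+
+Gating then reduces to checking `Added` against the sink policy, while `Removed` paths become the candidates for auto‑generated “not affected (no reachable path)” VEX notes.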
+ +--- + +### Developer checklist (Stella Ops style) + +* [ ] Define entrypoints per service (attribute or YAML). +* [ ] Define sink taxonomy (FS, NET, DESER, SQL, CRYPTO). +* [ ] Implement language adapters: `.NET (Roslyn)`, `Go (SSA)`, later `Java (Soot/WALA)`. +* [ ] Add a **ReachabilityCache** (Postgres table keyed by commit+lang+service). +* [ ] Wire a `SmartDiffJob` in CI; emit SARIF + CycloneDX `vulnerability-assertions` extension or OpenVEX. +* [ ] Gate merges on **newly‑reachable sensitive sinks**; auto‑VEX when paths disappear. + +If you want, I can turn this into a small repo scaffold (Roslyn + Go adapters, Postgres schema, a GitLab/GitHub pipeline, and a minimal UI “path card”). +Below is a concrete **development implementation plan** to take the “Smart‑Diff” idea (reachability + dataflow + dependency/vuln context) into a shippable product integrated into your pipeline (Stella Ops style). I’ll assume the initial languages are **.NET (C#)** and **Go**, and the initial goal is **PR gating + VEX automation** with strong evidence (paths + file/line hops). + +--- + +## 1) Product definition + +### Problem you’re solving + +Security noise comes from: + +* “Vuln exists in dependency” ≠ “vuln exploitable from any entrypoint” +* Git diffs look big even when behavior is unchanged +* Teams struggle to triage “is this change actually risky?” + +### What Smart‑Diff should do (core behavior) + +Given **base commit A** and **head commit B**: + +1. Identify **entrypoints** (web handlers, RPC methods, message consumers, CLI commands). +2. Identify **sinks** (file write, command exec, SQL, SSRF, deserialization, crypto misuse, templating, etc.). +3. Compute **reachable paths** from entrypoints → sinks (call graph + dataflow/taint). +4. Emit **Smart‑Diff**: + + * **Newly reachable** EP→sink paths (risk ↑) + * **Removed** EP→sink paths (risk ↓) + * **Changed** paths (same sink but different sanitization/guards) +5. Attach **dependency vulnerability context**: + + * If a vulnerable API surface is reachable (or data reaches it), mark “affected/exploitable” + * Otherwise generate **VEX**: “not affected” / “not exploitable” with evidence + +### MVP definition (minimum shippable) + +A PR check that: + +* Flags **new** reachable paths to a small set of high‑risk sinks (e.g., command exec, unsafe deserialization, filesystem write, SSRF/network dial, raw SQL). +* Produces: + + * SARIF report (for code scanning UI) + * JSON artifact containing proof paths (EP → … → sink with file:line) + * Optional VEX statement for dependency vulnerabilities (if you already have an SCA feed) + +--- + +## 2) Architecture you can actually build + +### High‑level components + +1. **Policy & Taxonomy Service** + + * Defines entrypoints, sources, sinks, sanitizers, confidence rules + * Versioned and centrally managed (but supports repo overrides) + +2. **Analyzer Workers (language adapters)** + + * .NET analyzer (Roslyn + control flow) + * Go analyzer (SSA + callgraph) + * Outputs standardized IR (Intermediate Representation) + +3. **Graph Store + Reachability Engine** + + * Stores symbol nodes + call edges + dataflow edges + * Computes reachable sinks per entrypoint + * Computes diff between commits A and B + +4. **Vulnerability Mapper + VEX Generator** + + * Maps vulnerable packages/functions → “surfaces” + * Joins with reachability results + * Emits OpenVEX (or CycloneDX VEX) with evidence links + +5. **CI/PR Integrations** + + * CLI that runs in CI + * Optional server mode (cache + incremental processing) + +6. 
**UI/API** + + * Path cards: “what changed”, “why it matters”, “proof” + * Filters by sink class, confidence, service, entrypoint + +### Data contracts (standardized IR) + +Make every analyzer output the same shapes so the rest of the pipeline is language‑agnostic: + +* **Symbols** + + * `symbol_id`: stable hash of (lang, module, fully-qualified name, signature) + * metadata: file, line ranges, kind (method/function), accessibility + +* **Edges** + + * Call edge: `caller_symbol_id -> callee_symbol_id` + * Dataflow edge: `source_symbol_id -> sink_symbol_id` with variable/parameter traces + * Edge metadata: type, confidence, reason (static, reflection guess, interface dispatch, etc.) + +* **Entrypoints / Sources / Sinks** + + * entrypoint: (symbol_id, route/topic/command metadata) + * sink: (symbol_id, sink_type, severity, cwe mapping optional) + +* **Paths** + + * `entrypoint -> ... -> sink` + * hop list: symbol_id + file:line, plus “dataflow step evidence” when relevant + +--- + +## 3) Workstreams and deliverables + +### Workstream A — Policy, taxonomy, configuration + +**Deliverables** + +* `smartdiff.policy.yaml` schema and validator +* A default sink taxonomy: + + * `CMD_EXEC`, `UNSAFE_DESER`, `SQL_RAW`, `SSRF`, `FILE_WRITE`, `PATH_TRAVERSAL`, `TEMPLATE_INJECTION`, `CRYPTO_WEAK`, `AUTHZ_BYPASS` (expand later) +* Initial sanitizer patterns: + + * For example: parameter validation, safe deserialization wrappers, ORM parameterized APIs, path normalization, allowlists + +**Implementation notes** + +* Start strict and small: 10–20 sinks, 10 sources, 10 sanitizers. +* Provide repo-level overrides: + + * `smartdiff.policy.yaml` in repo root + * Central policies referenced by version tag + +**Acceptance criteria** + +* A service can onboard by configuring: + + * entrypoint discovery mode (auto + manual) + * sink classes to enforce + * severity threshold to fail PR + +--- + +### Workstream B — .NET analyzer (Roslyn) + +**Deliverables** + +* Build pipeline that produces: + + * call graph (methods and invocations) + * basic control-flow guards for reachability (optional for MVP) + * taint propagation for common patterns (MVP: parameter → sink) +* Entry point discovery for: + + * ASP.NET controllers (`[HttpGet]`, `[HttpPost]`) + * Minimal APIs (`MapGet/MapPost`) + * gRPC service methods + * message consumers (configurable attributes/interfaces) + +**Implementation notes (practical path)** + +* MVP static callgraph: + + * Use Roslyn semantic model to resolve invocation targets + * For virtual/interface calls: conservative resolution to possible implementations within the compilation +* MVP taint: + + * “Sources”: request params/body, headers, query string, message payloads + * “Sinks”: wrappers around `Process.Start`, `SqlCommand`, `File.WriteAllText`, `HttpClient.Send`, deserializers, etc. + * Propagate taint across: + + * parameter → local → argument + * return values + * simple assignments and concatenations (heuristic) +* Confidence scoring: + + * Direct static call resolution: high + * Reflection/dynamic: low (flag separately) + +**Acceptance criteria** + +* On a demo ASP.NET service, if a PR adds: + + * `HttpPost /upload` → `File.WriteAllBytes(userPath, ...)` + Smart‑Diff flags **new EP→FILE_WRITE path** and shows hops with file/line. + +--- + +### Workstream C — Go analyzer (SSA) + +**Deliverables** + +* SSA build + callgraph extraction +* Entrypoint discovery for: + + * `net/http` handlers + * common routers (Gin/Echo/Chi) via adapter rules + * gRPC methods + * consumers (Kafka/NATS/etc.) 
by config + +**Implementation notes** + +* Use `golang.org/x/tools/go/packages` + `ssa` build +* Callgraph: + + * start with CHA (Class Hierarchy Analysis) for speed + * later add pointer analysis for precision on interfaces +* Taint: + + * sources: `http.Request`, router params, message payloads + * sinks: `os/exec`, `database/sql` raw query, file I/O, `net/http` outbound, unsafe deserialization libs + +**Acceptance criteria** + +* A PR that adds `exec.Command(req.FormValue("cmd"))` becomes a **new EP→CMD_EXEC** finding. + +--- + +### Workstream D — Graph store + reachability computation + +**Deliverables** + +* Schema in Postgres (recommended first) for: + + * commits, services, languages + * symbols, edges, entrypoints, sinks + * computed reachable “facts” (entrypoint→sink with shortest path(s)) +* Reachability engine: + + * BFS/DFS per entrypoint with early cutoffs + * path reconstruction storage (store predecessor map or store k-shortest paths) + +**Implementation notes** + +* Don’t start with a graph DB unless you must. +* Use Postgres tables + indexes: + + * `edges(from_symbol, to_symbol, commit_id, kind)` + * `symbols(symbol_id, lang, module, fqn, file, line_start, line_end)` + * `reachability(entrypoint_id, sink_id, commit_id, path_hash, confidence, severity, evidence_json)` +* Cache: + + * keyed by (commit, policy_version, analyzer_version) + * avoids recompute on re-runs + +**Acceptance criteria** + +* For any analyzed commit, you can answer: + + * “Which sinks are reachable from these entrypoints?” + * “Show me one proof path per (entrypoint, sink_type).” + +--- + +### Workstream E — Smart‑Diff engine (the “diff” part) + +**Deliverables** + +* Diff algorithm producing three buckets: + + * `added_paths`, `removed_paths`, `changed_paths` +* “Changed” means: + + * same entrypoint + sink type, but path differs OR taint/sanitization differs OR confidence changes + +**Implementation notes** + +* Identify a path by a stable fingerprint: + + * `path_id = hash(entrypoint_symbol + sink_symbol + sink_type + policy_version + analyzer_version)` +* Store: + + * top-k paths for each pair for evidence (k=1 for MVP, add more later) +* Severity gating rules: + + * Example: + + * New path to `CMD_EXEC` = fail + * New path to `FILE_WRITE` = warn unless under `/tmp` allowlist + * New path to `SQL_RAW` = fail unless parameterized sanitizer present + +**Acceptance criteria** + +* Given commits A and B: + + * If B introduces a new reachable sink, CI fails with a single actionable card: + + * **EP**: route / handler + * **Sink**: type + symbol + * **Proof**: hop list + * **Why**: policy rule triggered + +--- + +### Workstream F — Vulnerability mapping + VEX + +**Deliverables** + +* Ingest dependency inventory (SBOM or lockfiles) +* Map vulnerabilities to “surfaces” + + * package → vulnerable module/function patterns + * minimal version/range matching (from your existing vuln feed) +* Decision logic: + + * **Affected** if any reachable path intersects vulnerable surface OR dataflow reaches vulnerable sink + * else **Not affected / Not exploitable** with justification + +**Implementation notes** + +* Start with a pragmatic approach: + + * package‑level reachability: “is any symbol in that package reachable?” + * then iterate toward function‑level surfaces +* VEX output: + + * include commit hash, policy version, evidence paths + * embed links to internal “path card” URLs if available + +**Acceptance criteria** + +* For a known vulnerable dependency, the system emits: + + * VEX “not affected” if package 
code is never reached from any entrypoint, with proof references. + +--- + +### Workstream G — CI integration + developer UX + +**Deliverables** + +* A single CLI: + + * `smartdiff analyze --commit --service --lang ` + * `smartdiff diff --base --head --out sarif` +* CI templates for: + + * GitHub Actions / GitLab CI +* Outputs: + + * SARIF + * JSON evidence bundle + * optional OpenVEX file + +**Acceptance criteria** + +* Teams can enable Smart‑Diff by adding: + + * CI job + config file + * no additional infra required for MVP (local artifacts mode) +* When infra is available, enable server caching mode for speed. + +--- + +### Workstream H — UI “Path Cards” + +**Deliverables** + +* UI components: + + * Path card list with filters (sink type, severity, confidence) + * “What changed” diff view: + + * red = added hops + * green = removed hops + * “Evidence” panel: + + * file:line for each hop + * code snippets (optional) +* APIs: + + * `GET /smartdiff/{repo}/{pr}/findings` + * `GET /smartdiff/{repo}/{commit}/path/{path_id}` + +**Acceptance criteria** + +* A developer can click one finding and understand: + + * how the data got there + * exactly what line introduced the risk + * how to fix (sanitize/guard/allowlist) + +--- + +## 4) Milestone plan (sequenced, no time promises) + +### Milestone 0 — Foundation + +* Repo scaffolding: + + * `smartdiff-cli/` + * `analyzers/dotnet/` + * `analyzers/go/` + * `core-ir/` (schemas + validation) + * `server/` (optional; can come later) +* Define IR JSON schema + versioning rules +* Implement policy YAML + validator + sample policies +* Implement “local mode” artifact output + +**Exit criteria** + +* You can run `smartdiff analyze` and get a valid IR file for at least one trivial repo. + +--- + +### Milestone 1 — Callgraph reachability MVP + +* .NET: build call edges + entrypoint discovery (basic) +* Go: build call edges + entrypoint discovery (basic) +* Graph store: in-memory or local sqlite/postgres +* Compute reachable sinks (callgraph only, no taint) + +**Exit criteria** + +* On a demo repo, you can list: + + * entrypoints + * reachable sinks (callgraph reachability only) + * a proof path (hop list) + +--- + +### Milestone 2 — Smart‑Diff MVP (PR gating) + +* Compute diff between base/head reachable sink sets +* Produce SARIF with: + + * rule id = sink type + * message includes entrypoint + sink + link to evidence JSON +* CI templates + documentation + +**Exit criteria** + +* In PR checks, the job fails on new EP→sink paths and links to a proof. + +--- + +### Milestone 3 — Taint/dataflow MVP (high-value sinks only) + +* Add taint propagation to reduce false positives: + + * differentiate “sink reachable” vs “untrusted data reaches sink” +* Add sanitizer recognition +* Add confidence scoring + suppression mechanisms (policy allowlists) + +**Exit criteria** + +* A sink is only “high severity” if it is both reachable and tainted (or policy says otherwise). + +--- + +### Milestone 4 — VEX integration MVP + +* Join reachability with dependency vulnerabilities +* Emit OpenVEX (and/or CycloneDX VEX) +* Store evidence references (paths) inside VEX justification + +**Exit criteria** + +* For a repo with a vulnerable dependency, you can automatically produce: + + * affected/not affected with evidence. 
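+
+A minimal sketch of that decision logic, assuming the pragmatic package-level reachability described above (symbol strings are illustrative; `vulnerable_code_not_in_execute_path` is a standard VEX justification):
+
+```csharp
+// Sketch only: join reachability facts with a vulnerable surface to pick a VEX status.
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+public enum VexStatus { Affected, NotAffected }
+
+public static class VexDecision
+{
+    // reachableSymbols: fully-qualified symbols reachable from any entrypoint at this commit.
+    // vulnerableSurface: symbols (or package prefixes) the advisory is mapped to.
+    public static (VexStatus Status, string Justification) Decide(
+        IReadOnlySet<string> reachableSymbols,
+        IReadOnlyCollection<string> vulnerableSurface)
+    {
+        var hit = vulnerableSurface.FirstOrDefault(surface =>
+            reachableSymbols.Contains(surface) ||
+            reachableSymbols.Any(sym => sym.StartsWith(surface + ".", StringComparison.Ordinal)));
+
+        return hit is null
+            ? (VexStatus.NotAffected, "vulnerable_code_not_in_execute_path")
+            : (VexStatus.Affected, $"entrypoint reaches vulnerable surface: {hit}");
+    }
+}
+```
+
+The hop lists stored by the reachability engine can then be attached as evidence references in the emitted OpenVEX statement, as described in Workstream F.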
+ +--- + +### Milestone 5 — Scale and precision improvements + +* Incremental analysis (only analyze changed projects/packages) +* Better dynamic dispatch handling (Go pointer analysis, .NET interface dispatch expansion) +* Optional runtime telemetry integration: + + * import production traces to prioritize “actually observed” entrypoints + +**Exit criteria** + +* Works on large services with acceptable run time and stable noise levels. + +--- + +## 5) Backlog you can paste into Jira (epics + key stories) + +### Epic: Policy & taxonomy + +* Story: Define `smartdiff.policy.yaml` schema and validator + **AC:** invalid configs fail with clear errors; configs are versioned. +* Story: Provide default sink list and severities + **AC:** at least 10 sink rules with test cases. + +### Epic: .NET analyzer + +* Story: Resolve method invocations to symbols (Roslyn) + **AC:** correct targets for direct calls; conservative handling for virtual calls. +* Story: Discover ASP.NET routes and bind to entrypoint symbols + **AC:** entrypoints include route/method metadata. + +### Epic: Go analyzer + +* Story: SSA build and callgraph extraction + **AC:** function nodes and edges generated for a multi-package repo. +* Story: net/http entrypoint discovery + **AC:** handler functions recognized as entrypoints with path labels. + +### Epic: Reachability engine + +* Story: Compute reachable sinks per entrypoint + **AC:** store at least one path with hop list. +* Story: Smart‑Diff A vs B + **AC:** added/removed paths computed deterministically. + +### Epic: CI/SARIF + +* Story: Emit SARIF results + **AC:** findings appear in code scanning UI; include file/line. + +### Epic: Taint analysis + +* Story: Propagate taint from request to sink for 3 sink classes + **AC:** produces “tainted” evidence with a variable/argument trace. +* Story: Sanitizer recognition + **AC:** path marked “sanitized” and downgraded per policy. + +### Epic: VEX + +* Story: Generate OpenVEX statements from reachability + vuln feed + **AC:** for “not affected” includes justification and evidence references. + +--- + +## 6) Key engineering decisions (recommended defaults) + +### Storage + +* Start with **Postgres** (or even local sqlite for MVP) for simplicity. 
+* Introduce a graph DB only if: + + * you need very large multi-commit graph queries at low latency + * Postgres performance becomes a hard blocker + +### Confidence model + +Every edge/path should carry: + +* `confidence`: High/Med/Low +* `reasons`: e.g., `DirectCall`, `InterfaceDispatch`, `ReflectionGuess`, `RouterHeuristic` + This lets you: +* gate only on high-confidence paths in early rollout +* keep low-confidence as “informational” + +### Suppression model + +* Local suppressions: + + * `smartdiff.suppress.yaml` with rule id + symbol id + reason + expiry +* Policy allowlists: + + * allow file writes only under certain directories + * allow outbound network only to configured domains + +--- + +## 7) Testing strategy (to avoid “cool demo, unusable tool”) + +### Unit tests + +* Symbol hashing stability tests +* Call resolution tests: + + * overloads, generics, interfaces, lambdas +* Policy parsing/validation tests + +### Integration tests (must-have) + +* Golden repos in `testdata/`: + + * one ASP.NET minimal API + * one MVC controller app + * one Go net/http + one Gin app +* Golden outputs: + + * expected entrypoints + * expected reachable sinks + * expected diff between commits + +### Regression tests + +* A curated corpus of “known issues”: + + * false positives you fixed should never return + * false negatives: ensure known risky path is always found + +### Performance tests + +* Measure: + + * analysis time per 50k LOC + * memory peak + * graph size +* Budget enforcement: + + * if over budget, degrade gracefully (lower precision, mark low confidence) + +--- + +## 8) Example configs and outputs (to make onboarding easy) + +### Example policy YAML (minimal) + +```yaml +version: 1 +service: invoices-api +entrypoints: + autodiscover: + dotnet: + aspnet: true + go: + net_http: true + +sinks: + - type: CMD_EXEC + severity: high + match: + dotnet: + symbols: + - "System.Diagnostics.Process.Start(string)" + go: + symbols: + - "os/exec.Command" + - type: FILE_WRITE + severity: medium + match: + dotnet: + namespaces: ["System.IO"] + go: + symbols: ["os.WriteFile"] + +gating: + fail_on: + - sink_type: CMD_EXEC + when: "added && confidence >= medium" + - sink_type: FILE_WRITE + when: "added && tainted && confidence >= medium" +``` + +### Evidence JSON shape (what the UI consumes) + +```json +{ + "commit": "abc123", + "entrypoint": {"symbol": "InvoicesController.Upload()", "route": "POST /upload"}, + "sink": {"type": "FILE_WRITE", "symbol": "System.IO.File.WriteAllBytes"}, + "confidence": "high", + "tainted": true, + "path": [ + {"symbol": "InvoicesController.Upload()", "file": "Controllers/InvoicesController.cs", "line": 42}, + {"symbol": "UploadService.Save()", "file": "Services/UploadService.cs", "line": 18}, + {"symbol": "System.IO.File.WriteAllBytes", "file": null, "line": null} + ] +} +``` + +--- + +## 9) Risks and mitigations (explicit) + +1. **Dynamic behavior (reflection, DI, router magic)** + + * Mitigation: conservative fallbacks + confidence labels + optional runtime traces later + +2. **Noise from huge callgraphs** + + * Mitigation: sink-first slicing (compute reachability backwards from sinks), entrypoint scoping, k‑shortest paths only + +3. **Large repo build failures** + + * Mitigation: analyzer runs inside build containers; allow partial analysis with explicit “incomplete” result flag + +4. 
**Teams rejecting gating** + + * Mitigation: staged rollout: + + * Observe-only mode → warn-only → fail-only for high-confidence CMD_EXEC/UNSAFE_DESER + +--- + +## 10) Definition of done (what “implemented” means) + +You should consider Smart‑Diff “implemented” when: + +* A repo can enable it with one config + one CI job. +* PRs get: + + * a small number of **actionable** findings (not hundreds) + * each finding has a proof path with file/line hops +* It reliably detects at least: + + * new command execution paths + * new unsafe deserialization paths + * new tainted filesystem write paths +* It can optionally emit VEX decisions backed by reachability evidence. + +--- + +If you want the next step, I can also give you: + +* a **concrete repo layout** with module boundaries, +* the **Postgres schema** (tables + indexes), +* and a **language adapter interface** (so adding Java/Python later is straightforward).
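+
+For orientation, one possible shape for that language adapter interface, built around the standardized IR from section 2 (all type names are illustrative, not a committed API):
+
+```csharp
+// Sketch only: each language analyzer implements this and emits the shared IR.
+using System.Collections.Generic;
+using System.Threading;
+using System.Threading.Tasks;
+
+public sealed record IrSymbol(string SymbolId, string Lang, string Fqn, string File, int LineStart, int LineEnd);
+public sealed record IrEdge(string FromSymbolId, string ToSymbolId, string Kind, string Confidence, string Reason);
+public sealed record IrEntrypoint(string SymbolId, string Route);
+public sealed record IrSink(string SymbolId, string SinkType);
+
+public sealed record AnalyzerOutput(
+    IReadOnlyList<IrSymbol> Symbols,
+    IReadOnlyList<IrEdge> Edges,
+    IReadOnlyList<IrEntrypoint> Entrypoints,
+    IReadOnlyList<IrSink> Sinks);
+
+public interface ILanguageAdapter
+{
+    string Language { get; }   // "dotnet", "go", later "java", "python"
+    Task<AnalyzerOutput> AnalyzeAsync(string repoRoot, string commit, CancellationToken ct);
+}
+```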